Ben Kirwin created SAMZA-459:
--------------------------------
Summary: Explicit flush for individual output streams
Key: SAMZA-459
URL: https://issues.apache.org/jira/browse/SAMZA-459
Project: Samza
Issue Type: Improvement
Reporter: Ben Kirwin
Priority: Minor
>From the mailing list:
http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201411.mbox/%3CCACuX-D8-CS7867ob47fqytCAdvGURc4owv82Rhg2oEJYmr8hpg%40mail.gmail.com%3E
At the moment, the only way to trigger a flush of the output streams is to call
TaskCoordinator.commit, which also flushes the state and saves the checkpoints.
There are a few cases where more granularity would be useful: writing out a
single stream can be much faster than doing a full commit, and if a user cares
about the order in which messages are published, they can disable the
autocommit and trigger flushes manually.
I'd anticipate this to look something like
TaskCoordinator.flush(systemStream). It looks like the TaskCoordinator normally
only queues up work, instead of doing it synchronously -- if that's the case,
it should be enough to buffer up all the requested flushes, then perform them
in order when the moment comes.
Note: you could get *almost* the same effect by switching to a synchronous
system and letting the user send a batch of messages all at once, much as the
underlying Kafka client does. This woudn't let you flush a changelog stream,
though.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)