[ https://issues.apache.org/jira/browse/SAMZA-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297402#comment-14297402 ]
Chris Riccomini commented on SAMZA-541:
---------------------------------------

Need to think this through a bit. It's unclear that there'd be much advantage to supporting this.

> Passing coordinator to producer chains
> --------------------------------------
>
>                 Key: SAMZA-541
>                 URL: https://issues.apache.org/jira/browse/SAMZA-541
>             Project: Samza
>          Issue Type: Improvement
>          Components: container
>            Reporter: Jae Hyeon Bae
>            Assignee: Jae Hyeon Bae
>
> StreamTask can control SamzaContainer.commit() through the task coordinator,
> but SystemProducer can call flush() without commit(), which can create
> duplicate data on container failure. For example:
> T=30s: flush
> T=45s: flush
> T=60s: flush && commit
> T=65s: flush
> "If there's a failure before 60s, the messages that were flushed at 30s and
> 45s will be duplicated when the container reprocesses" (quoted from
> [~criccomini]'s response).
> If SystemProducer can access the TaskCoordinator created by the RunLoop in
> SamzaContainer, it becomes possible to control 'exactly-once delivery' more
> flexibly. The following interfaces would need to change:
> {code}
> MessageCollector.send(envelope, coordinator)
> SystemProducers.send(source, envelope, coordinator)
> SystemProducer.send(source, envelope, coordinator)
> {code}
> The new Kafka 0.8 Java producer cannot be synchronized with
> TaskInstance.commit because it does not do batched flushing; whether
> 'exactly-once delivery' can be guaranteed depends on the SystemProducer
> implementation.
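For illustration, here is a minimal Java sketch of what the proposed signatures might look like. This is only a sketch of the change described above, not committed Samza API; in particular, the idea of the producer invoking commit immediately after a durable flush is an assumption about how an implementation might use the coordinator:

{code}
// Sketch only: the proposal threads the TaskCoordinator through the send()
// chain so a SystemProducer implementation can align flush() with commit().
public interface MessageCollector {
  // Proposed: a coordinator parameter alongside the envelope.
  void send(OutgoingMessageEnvelope envelope, TaskCoordinator coordinator);
}

public interface SystemProducer {
  void start();
  void stop();
  void register(String source);

  // Proposed: with the coordinator in hand, an implementation could request
  // a commit right after a durable flush, shrinking the
  // flushed-but-uncommitted window that causes duplicates on failure.
  void send(String source, OutgoingMessageEnvelope envelope, TaskCoordinator coordinator);

  void flush(String source);
}
{code}

Even with this change, the guarantee still depends on the producer: an implementation that flushes durably and commits in step with the checkpoint can close the window, while one whose flushes happen independently of the commit (like the Kafka 0.8 new Java producer) cannot.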