[ 
https://issues.apache.org/jira/browse/SAMZA-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297402#comment-14297402
 ] 

Chris Riccomini commented on SAMZA-541:
---------------------------------------

Need to think this through a bit. It's unclear that there'd be much advantage 
to supporting this.

> Passing coordinator to producer chains
> --------------------------------------
>
>                 Key: SAMZA-541
>                 URL: https://issues.apache.org/jira/browse/SAMZA-541
>             Project: Samza
>          Issue Type: Improvement
>          Components: container
>            Reporter: Jae Hyeon Bae
>            Assignee: Jae Hyeon Bae
>
> StreamTask can trigger SamzaContainer.commit() through the task coordinator,
> but SystemProducer can call flush() without a commit(), which can create
> duplicate data on container failure. For example:
> T=30s: flush
> T=45s: flush
> T=60s: flush && commit
> T=65s: flush
> "If there's a failure before 60s, the messages that were flushed at 30s and
> 45s will be duplicated when the container reprocesses" (quoted from
> [~criccomini]'s response).
> If SystemProducer could access the TaskCoordinator created by the RunLoop in
> SamzaContainer, implementations would have the flexibility to control
> exactly-once delivery.
> The following interfaces should be changed:
> {code}
> MessageCollector.send(envelope, coordinator)
> SystemProducers.send(source, envelope, coordinator)
> SystemProducer.send(source, envelope, coordinator)
> {code}
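> As a rough sketch, the SystemProducer change might look like the following
> (hypothetical signature; the other methods are from the existing
> org.apache.samza.system.SystemProducer interface):
> {code}
> import org.apache.samza.system.OutgoingMessageEnvelope;
> import org.apache.samza.task.TaskCoordinator;
>
> public interface SystemProducer {
>   void start();
>   void stop();
>   void register(String source);
>   // Proposed: thread the coordinator through send() so an implementation
>   // can request a checkpoint at the moment it actually flushes.
>   void send(String source, OutgoingMessageEnvelope envelope, TaskCoordinator coordinator);
>   void flush(String source);
> }
> {code}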
> The new Kafka 0.8 Java producer cannot be synchronized with
> TaskInstance.commit because it does not expose a batch flush. Depending on
> the SystemProducer implementation, though, this change could make it possible
> to guarantee exactly-once delivery. A minimal sketch follows.
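> To make the flexibility concrete, here is a minimal sketch of a batching
> producer that lines the checkpoint up with its own flush (hypothetical
> implementation against the proposed signature above; assumes the
> TaskCoordinator.commit(RequestScope) method of the current task API):
> {code}
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.samza.system.OutgoingMessageEnvelope;
> import org.apache.samza.task.TaskCoordinator;
>
> public class BufferedSystemProducer implements SystemProducer {
>   private static final int MAX_BATCH_SIZE = 1000;
>   private final List<OutgoingMessageEnvelope> buffer = new ArrayList<>();
>
>   public void send(String source, OutgoingMessageEnvelope envelope, TaskCoordinator coordinator) {
>     buffer.add(envelope);
>     if (buffer.size() >= MAX_BATCH_SIZE) {
>       flush(source);
>       // With the coordinator in hand, the producer checkpoints exactly when
>       // it flushes, so a replay after failure cannot re-send messages that
>       // were flushed before the last checkpoint.
>       coordinator.commit(TaskCoordinator.RequestScope.CURRENT_TASK);
>     }
>   }
>
>   public void flush(String source) {
>     // Write the buffered envelopes to the underlying system, then:
>     buffer.clear();
>   }
>
>   public void start() {}
>   public void stop() {}
>   public void register(String source) {}
> }
> {code}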



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
