[ 
https://issues.apache.org/jira/browse/SAMZA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128695#comment-14128695
 ] 

Chris Riccomini commented on SAMZA-405:
---------------------------------------

A detailed post on different implementation styles for "exactly once messaging" 
written by [~jkreps] can be found 
[here|http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201409.mbox/%3CCAOeJiJg%2Bc7Ei%3DgzCuOz30DD3G5Hm9yFY%3DUJ6SafdNUFbvRgorg%40mail.gmail.com%3E].

> Trying for deterministic behavior on recovery and rewind
> --------------------------------------------------------
>
>                 Key: SAMZA-405
>                 URL: https://issues.apache.org/jira/browse/SAMZA-405
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Roger Hoover
>
> Ideally, we want streaming tasks to produce exactly the same output on recovery 
> or rewind as they did (or would) during normal operation.  After thinking harder 
> about this, I don't believe it's possible with at-least-once semantics.  I think 
> duplicates break ordering guarantees.  Any message that updates local 
> state can always be surrounded on both sides by duplicates of another 
> message that negates it.  Nonetheless, we can get closer now, and if 
> Kafka later supports idempotent producers, we'll have what we want.
> See discussion here: 
> http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201409.mbox/%3CCAPOm=tpsevpxludaxintxj9z54gyeanocv5dk7nbotsgpd-...@mail.gmail.com%3E
> Here are three changes that seem to make sense for Samza to support in order 
> to achieve this.
> 1) Bootstrapping is only appropriate on cold start, not when restoring saved 
> state.  On recovery, local state will be restored from the change log.
> 2) Local state should be saved and restored atomically with checkpoint state. 
>  This may require support for transactions in Kafka.
> 3) Ability to store and replay message chooser history.  Samza could have a 
> configuration option to save a history of the messages a task has processed.  
> This log could be used during recovery or rewind to replay messages in a 
> deterministic order. 
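
As a rough illustration of change 3 above, here is a minimal sketch of recording and replaying chooser history. It is not Samza's MessageChooser API: the class names, the `pick` policy, the stream names, and the in-memory list standing in for a durable history log are all assumptions made for the example.

```python
from collections import deque


class RecordingChooser:
    """Chooses the next message and records which stream it came from."""

    def __init__(self, pick):
        self.pick = pick      # hypothetical policy: ready stream names -> one name
        self.history = []     # stand-in for a durable history log

    def choose(self, queues):
        ready = [name for name, q in queues.items() if q]
        if not ready:
            return None
        name = self.pick(ready)
        self.history.append(name)
        return name, queues[name].popleft()


class ReplayChooser:
    """Ignores any policy and follows a previously recorded history."""

    def __init__(self, history):
        self.history = deque(history)

    def choose(self, queues):
        if not self.history:
            return None
        name = self.history[0]
        if not queues[name]:
            return None       # recorded stream has no message available yet
        self.history.popleft()
        return name, queues[name].popleft()


def drain(chooser, queues):
    """Collects choices until the chooser has nothing left to return."""
    out = []
    while True:
        choice = chooser.choose(queues)
        if choice is None:
            return out
        out.append(choice)


def make_queues():
    # Two hypothetical input streams with the same contents on every run.
    return {"clicks": deque(["c1", "c2"]), "views": deque(["v1", "v2"])}


# Arbitrary policy for the original run: prefer the alphabetically last stream.
original = RecordingChooser(pick=lambda ready: sorted(ready)[-1])
first_run = drain(original, make_queues())

# On recovery or rewind, replay the recorded order instead of re-deciding.
replayed = drain(ReplayChooser(original.history), make_queues())
```

The replay side never consults the original policy, so the processing order no longer depends on message arrival timing, only on the saved history. This sketch does not address duplicates or atomic checkpointing (changes 1 and 2).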



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
