[
https://issues.apache.org/jira/browse/SAMZA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128695#comment-14128695
]
Chris Riccomini commented on SAMZA-405:
---------------------------------------
A detailed post on different implementation styles for "exactly once messaging"
written by [~jkreps] can be found
[here|http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201409.mbox/%3CCAOeJiJg%2Bc7Ei%3DgzCuOz30DD3G5Hm9yFY%3DUJ6SafdNUFbvRgorg%40mail.gmail.com%3E].
> Trying for deterministic behavior on recovery and rewind
> --------------------------------------------------------
>
> Key: SAMZA-405
> URL: https://issues.apache.org/jira/browse/SAMZA-405
> Project: Samza
> Issue Type: Improvement
> Reporter: Roger Hoover
>
> Ideally, we want streaming tasks to produce the exact same output on recovery
> or rewind as they did/would during normal operation. After thinking harder
> on this, I don't believe it's possible with at-least-once semantics. I think
> duplicates break ordering guarantees. For any message that updates local
> state, it can always be surrounded on both sides by duplicates of another
> message that negates it. Nonetheless, we can get closer now, and if
> Kafka later supports idempotent producers, we'll have what we want.
> See discussion here:
> http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201409.mbox/%3CCAPOm=tpsevpxludaxintxj9z54gyeanocv5dk7nbotsgpd-...@mail.gmail.com%3E
> Here are three changes that seem to make sense for Samza to support in order
> to achieve this.
> 1) Bootstrapping is only appropriate on cold start, not when restoring saved
> state. On recovery, local state will be restored from the change log.
> 2) Local state should be saved and restored atomically with checkpoint state.
> This may require support for transactions in Kafka.
> 3) Ability to store and replay message chooser history. Samza could have a
> configuration option to save a history of the messages a task has processed.
> This log could be used during recovery or rewind to replay messages in a
> deterministic order.
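
As a rough illustration of item 3 in the description above, the sketch below
records the order in which a chooser hands out messages and then replays that
order. It is not Samza's MessageChooser API: RecordingChooser, replay, the
in-memory history list, and the stream names are all hypothetical, and a real
implementation would presumably persist the history to a durable log rather
than keep it in memory.

```python
import random
from collections import deque


class RecordingChooser:
    """Interleaves several input streams and logs every choice it makes.

    Hypothetical sketch only; not Samza's MessageChooser interface.
    """

    def __init__(self, streams, pick):
        # streams: dict of stream name -> list of messages (list index = offset)
        self.queues = {name: deque(enumerate(msgs)) for name, msgs in streams.items()}
        # pick: policy taking the list of non-empty stream names, returning one
        self.pick = pick
        # (stream, offset) pairs in the order messages were handed to the task
        self.history = []

    def __iter__(self):
        while True:
            ready = sorted(name for name, q in self.queues.items() if q)
            if not ready:
                return
            name = self.pick(ready)
            offset, msg = self.queues[name].popleft()
            self.history.append((name, offset))
            yield name, offset, msg


def replay(streams, history):
    """Re-deliver messages in exactly the recorded order."""
    for name, offset in history:
        yield name, offset, streams[name][offset]


# Usage: a random (nondeterministic) interleaving, then a deterministic replay.
streams = {"orders": ["o0", "o1"], "payments": ["p0", "p1", "p2"]}
chooser = RecordingChooser(streams, random.Random().choice)
first_run = list(chooser)
second_run = list(replay(streams, chooser.history))
assert first_run == second_run
```

The replay only fixes the order in which inputs are delivered. It does nothing
about duplicated messages or about restoring local state atomically with the
checkpoint, which items 1 and 2 address separately.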