[ https://issues.apache.org/jira/browse/SPARK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tathagata Das updated SPARK-4964:
---------------------------------
    Summary: Exactly-once + WAL-free Kafka Support in Spark Streaming
             (was: Exactly-once semantics for Kafka)

> Exactly-once + WAL-free Kafka Support in Spark Streaming
> --------------------------------------------------------
>
>                 Key: SPARK-4964
>                 URL: https://issues.apache.org/jira/browse/SPARK-4964
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Cody Koeninger
>
> For background, see
> http://apache-spark-developers-list.1001551.n3.nabble.com/Which-committers-care-about-Kafka-td9827.html
>
> Requirements:
> - Allow client code to implement exactly-once end-to-end semantics for
>   Kafka messages in cases where the output storage is either idempotent
>   or transactional.
> - Allow client code access to Kafka offsets, rather than committing them
>   automatically.
> - Do not assume ZooKeeper as the repository for offsets (in the
>   transactional case, offsets need to be stored in the same store as the
>   data).
> - Allow failure recovery without lost or duplicated messages, even in
>   cases where a checkpoint cannot be restored (for instance, because the
>   code must be updated).
>
> Design:
> The basic idea is to make an RDD in which each partition corresponds to a
> given Kafka topic, partition, starting offset, and ending offset. That
> allows deterministic replay of data from Kafka (as long as there is
> enough log retention).
> Client code is responsible for committing offsets, either transactionally
> to the same store the data is being written to, or, for idempotent
> output, after the data has been written.
> A PR with a sample implementation for both the batch and DStream cases is
> forthcoming.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
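The core of the design above is that a partition is fully described by a (topic, partition, starting offset, ending offset) tuple, so re-reading that range yields the same records every time. A minimal sketch of that idea, outside Spark and with hypothetical names (`OffsetRange`, `FakeKafkaLog` stand in for the real Kafka client and are not Spark's API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OffsetRange:
    """Defines one RDD partition: a fixed slice of a Kafka topic-partition."""
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive

class FakeKafkaLog:
    """Stand-in for a Kafka topic-partition: an append-only message log."""
    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)

    def fetch(self, r: OffsetRange):
        # Reading a fixed offset range is deterministic, as long as the
        # broker still retains those offsets (log retention).
        return self.messages[r.from_offset:r.until_offset]

log = FakeKafkaLog()
for i in range(10):
    log.append(f"msg-{i}")

r = OffsetRange("events", 0, 2, 5)
# Replaying the same range twice returns identical data, which is what
# makes recomputing a lost partition safe.
assert log.fetch(r) == log.fetch(r) == ["msg-2", "msg-3", "msg-4"]
```

Because the range, not a consumer-group position, defines the partition, a failed task can simply be re-run against the same range instead of relying on a write-ahead log.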
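For the transactional case described above, the offsets live in the same store as the results and are committed in the same transaction, so a crash persists both or neither. An illustrative sketch (not Spark code; SQLite stands in for whatever transactional sink the client uses, and all names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("CREATE TABLE offsets (topic TEXT, part INTEGER, "
             "until_offset INTEGER, PRIMARY KEY (topic, part))")

def process_batch(conn, topic, part, from_offset, messages):
    # "with conn" wraps the writes in one transaction: commit on success,
    # rollback on any exception, so results and offsets stay in lockstep.
    with conn:
        for i, msg in enumerate(messages):
            conn.execute("INSERT OR REPLACE INTO results VALUES (?, ?)",
                         (f"{topic}-{part}-{from_offset + i}", msg))
        # Offsets are stored in the SAME store, inside the SAME transaction,
        # rather than being auto-committed to ZooKeeper.
        conn.execute("INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)",
                     (topic, part, from_offset + len(messages)))

process_batch(conn, "events", 0, 0, ["a", "b", "c"])

# On restart, recovery reads the committed offset and resumes from there;
# no checkpoint is needed, and no messages are lost or duplicated.
(resume_from,) = conn.execute(
    "SELECT until_offset FROM offsets WHERE topic='events' AND part=0"
).fetchone()
```

For an idempotent sink the transaction is unnecessary: the client writes the data first (re-writes are harmless) and commits the offset afterwards.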