Github user hmcl commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2380#discussion_r147012486

    --- Diff: docs/storm-kafka-client.md ---
    @@ -298,25 +298,44 @@ Currently the Kafka spout has has the following default values, which have been
     * max.uncommitted.offsets = 10000000 <br/>
     
    -# Messaging reliability modes
    +# Processing Guarantees
     
    -In some cases you may not need or want the spout to guarantee at-least-once processing of messages. The spout also supports at-most-once and any-times modes. At-most-once guarantees that any tuple emitted to the topology will never be reemitted. Any-times makes no guarantees, but may reduce the overhead of committing offsets to Kafka in cases where you truly don't care how many times a message is processed.
    +The `KafkaSpoutConfig.ProcessingGuarantee` enum parameter controls when the tuple with the `ConsumerRecord` for an offset is marked
    +as processed, i.e. when the offset is committed to Kafka. For AT_LEAST_ONCE and AT_MOST_ONCE guarantees the spout controls when
    +the commit happens. When the guarantee is NONE Kafka controls when the commit happens.
    +
    +* AT_LEAST_ONCE - an offset is ready to commit only after the corresponding tuple has been processed (at-least-once)
    +  and acked. If a tuple fails or times-out it will be re-emitted. A tuple can be processed more than once if for instance
    +  the ack gets lost.
    +
    +* AT_MOST_ONCE - every offset will be committed to Kafka right after being polled but before being emitted
    +  to the downstream components of the topology. It guarantees that the offset is processed at-most-once because it
    +  won't retry tuples that fail or timeout after the commit to Kafka has been done.
    --- End diff --

Done
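For context, a minimal sketch of how a topology would select one of the guarantees discussed in the diff, assuming the `KafkaSpoutConfig.Builder#setProcessingGuarantee` API that this pull request's documentation describes (the bootstrap server address and topic name here are placeholders, not from the PR):

```java
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.kafka.spout.KafkaSpoutConfig.ProcessingGuarantee;

public class SpoutGuaranteeExample {
    public static KafkaSpout<String, String> buildSpout() {
        // Choose when offsets are committed: AT_LEAST_ONCE commits only after
        // tuples are acked; AT_MOST_ONCE commits right after polling, before emit.
        KafkaSpoutConfig<String, String> spoutConfig = KafkaSpoutConfig
            .builder("localhost:9092", "my-topic")   // hypothetical broker/topic
            .setProcessingGuarantee(ProcessingGuarantee.AT_LEAST_ONCE)
            .build();
        return new KafkaSpout<>(spoutConfig);
    }
}
```

This is a configuration fragment only; it requires the `storm-kafka-client` artifact on the classpath, and the exact builder signature should be checked against the Storm release being used.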
---