Github user hmcl commented on a diff in the pull request: https://github.com/apache/storm/pull/2380#discussion_r147022902 --- Diff: docs/storm-kafka-client.md --- @@ -298,25 +298,44 @@ Currently the Kafka spout has has the following default values, which have been * max.uncommitted.offsets = 10000000 <br/> -# Messaging reliability modes +# Processing Guarantees -In some cases you may not need or want the spout to guarantee at-least-once processing of messages. The spout also supports at-most-once and any-times modes. At-most-once guarantees that any tuple emitted to the topology will never be reemitted. Any-times makes no guarantees, but may reduce the overhead of committing offsets to Kafka in cases where you truly don't care how many times a message is processed. +The `KafkaSpoutConfig.ProcessingGuarantee` enum parameter controls when the tuple with the `ConsumerRecord` for an offset is marked +as processed, i.e. when the offset is committed to Kafka. For AT_LEAST_ONCE and AT_MOST_ONCE guarantees the spout controls when +the commit happens. When the guarantee is NONE Kafka controls when the commit happens. + +* AT_LEAST_ONCE - an offset is ready to commit only after the corresponding tuple has been processed (at-least-once) + and acked. If a tuple fails or times-out it will be re-emitted. A tuple can be processed more than once if for instance + the ack gets lost. + +* AT_MOST_ONCE - every offset will be committed to Kafka right after being polled but before being emitted + to the downstream components of the topology. It guarantees that the offset is processed at-most-once because it + won't retry tuples that fail or timeout after the commit to Kafka has been done. + +* NONE - the polled offsets are committed to Kafka periodically as controlled by the Kafka properties + "enable.auto.commit" and "auto.commit.interval.ms". Because the spout does not control when the commit happens + it cannot give any message processing guarantees, i.e. a message may be processed 0, 1 or more times. + This option requires "enable.auto.commit=true". If "enable.auto.commit=false" an exception will be thrown. + +To set the processing guarantee use the `KafkaSpoutConfig.Builder.setProcessingGuarantee` method as follows: -To set the processing guarantee, use the KafkaSpoutConfig.Builder.setProcessingGuarantee method, e.g. ```java KafkaSpoutConfig<String, String> kafkaConf = KafkaSpoutConfig .builder(String bootstrapServers, String ... topics) .setProcessingGuarantee(ProcessingGuarantee.AT_MOST_ONCE) ``` -The spout will disable tuple tracking for emitted tuples by default when you use at-most-once or any-times. In some cases you may want to enable tracking anyway, because tuple tracking is necessary for some features of Storm, e.g. showing complete latency in Storm UI, or enabling backpressure through the `Config.TOPOLOGY_MAX_SPOUT_PENDING` parameter. +# Tuple Tracking + +By default the spout only tracks emitted tuples when the processing guarantee is AT_LEAST_ONCE. It may be necessary to track +emitted tuples with other processing guarantees to benefit of Storm features such as showing complete latency in the UI, --- End diff -- Done
---