Github user hmcl commented on a diff in the pull request:
https://github.com/apache/storm/pull/2380#discussion_r147012506
--- Diff: docs/storm-kafka-client.md ---
@@ -298,25 +298,44 @@ Currently the Kafka spout has the following default values, which have been
* max.uncommitted.offsets = 10000000
<br/>
-# Messaging reliability modes
+# Processing Guarantees
-In some cases you may not need or want the spout to guarantee at-least-once processing of messages. The spout also supports at-most-once and any-times modes. At-most-once guarantees that any tuple emitted to the topology will never be reemitted. Any-times makes no guarantees, but may reduce the overhead of committing offsets to Kafka in cases where you truly don't care how many times a message is processed.
+The `KafkaSpoutConfig.ProcessingGuarantee` enum parameter controls when the tuple with the `ConsumerRecord` for an offset is marked
+as processed, i.e. when the offset is committed to Kafka. For the AT_LEAST_ONCE and AT_MOST_ONCE guarantees the spout controls
+when the commit happens. When the guarantee is NONE, Kafka controls when the commit happens.
+
+* AT_LEAST_ONCE - an offset is ready to commit only after the corresponding tuple has been processed (at-least-once) and acked.
+  If a tuple fails or times out it will be re-emitted. A tuple can be processed more than once if, for instance, the ack gets lost.
+
+* AT_MOST_ONCE - every offset will be committed to Kafka right after being polled, but before being emitted to the downstream
+  components of the topology. This guarantees that the offset is processed at most once, because the spout won't retry tuples
+  that fail or time out after the commit to Kafka has been done.
+
+* NONE - the polled offsets are committed to Kafka periodically, as controlled by the Kafka properties "enable.auto.commit" and
+  "auto.commit.interval.ms". Because the spout does not control when the commit happens, it cannot give any message processing
+  guarantee, i.e. a message may be processed 0, 1 or more times. This option requires "enable.auto.commit=true". If
+  "enable.auto.commit=false", an exception will be thrown.
+
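As an illustrative sketch (not part of the storm-kafka-client API), the NONE guarantee relies on the standard Kafka consumer auto-commit properties; the class name and the 5000 ms interval below are example values only:

```java
import java.util.Properties;

public class NoneGuaranteeProps {
    // Consumer properties the NONE guarantee depends on: auto-commit must be
    // enabled, and the interval controls how often the polled offsets are committed.
    static Properties buildProps() {
        Properties props = new Properties();
        props.setProperty("enable.auto.commit", "true");      // required for NONE
        props.setProperty("auto.commit.interval.ms", "5000"); // example interval
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildProps().getProperty("enable.auto.commit")); // prints "true"
    }
}
```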
+To set the processing guarantee, use the `KafkaSpoutConfig.Builder.setProcessingGuarantee` method as follows:
-To set the processing guarantee, use the KafkaSpoutConfig.Builder.setProcessingGuarantee method, e.g.
```java
KafkaSpoutConfig<String, String> kafkaConf = KafkaSpoutConfig
    .builder(bootstrapServers, topics)
    .setProcessingGuarantee(ProcessingGuarantee.AT_MOST_ONCE)
    .build();
```
-The spout will disable tuple tracking for emitted tuples by default when you use at-most-once or any-times. In some cases you may want to enable tracking anyway, because tuple tracking is necessary for some features of Storm, e.g. showing complete latency in Storm UI, or enabling backpressure through the `Config.TOPOLOGY_MAX_SPOUT_PENDING` parameter.
+# Tuple Tracking
+
+By default the spout only tracks emitted tuples when the processing guarantee is AT_LEAST_ONCE. It may be necessary to track
+emitted tuples with other processing guarantees to benefit from Storm features such as showing complete latency in the UI,
+or enabling backpressure with `Config.TOPOLOGY_MAX_SPOUT_PENDING`.
-If you need to enable tracking, use the KafkaSpoutConfig.Builder.setForceEnableTupleTracking method, e.g.
+If you need to enable tracking, use the `KafkaSpoutConfig.Builder.setTupleTrackingEnforced` method, e.g.
```java
KafkaSpoutConfig<String, String> kafkaConf = KafkaSpoutConfig
    .builder(bootstrapServers, topics)
    .setProcessingGuarantee(ProcessingGuarantee.AT_MOST_ONCE)
-    .setForceEnableTupleTracking(true)
+    .setTupleTrackingEnforced(true)
    .build();
```
-Note that this setting has no effect in at-least-once mode, where tuple tracking is always enabled.
\ No newline at end of file
+Note: This setting has no effect with the AT_LEAST_ONCE processing guarantee, where tuple tracking is required and therefore always enabled.
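The interaction between the processing guarantee and tuple tracking can be sketched with a hypothetical helper; `isTupleTrackingEnabled` is illustrative only, not part of the storm-kafka-client API:

```java
public class TupleTrackingRule {
    enum ProcessingGuarantee { AT_LEAST_ONCE, AT_MOST_ONCE, NONE }

    // AT_LEAST_ONCE always tracks emitted tuples; the other guarantees
    // track them only when tuple tracking is explicitly enforced.
    static boolean isTupleTrackingEnabled(ProcessingGuarantee guarantee, boolean tupleTrackingEnforced) {
        return guarantee == ProcessingGuarantee.AT_LEAST_ONCE || tupleTrackingEnforced;
    }

    public static void main(String[] args) {
        System.out.println(isTupleTrackingEnabled(ProcessingGuarantee.AT_MOST_ONCE, true)); // prints "true"
    }
}
```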
--- End diff --
Done
---