[GitHub] storm pull request #2380: STORM-2781: Refactor storm-kafka-client KafkaSpout...

hmcl Wed, 25 Oct 2017 17:39:06 -0700

Github user hmcl commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2380#discussion_r147022902
  
    --- Diff: docs/storm-kafka-client.md ---
    @@ -298,25 +298,44 @@ Currently the Kafka spout has has the following 
default values, which have been
     * max.uncommitted.offsets = 10000000
     <br/>
     
    -# Messaging reliability modes
    +# Processing Guarantees
     
    -In some cases you may not need or want the spout to guarantee 
at-least-once processing of messages. The spout also supports at-most-once and 
any-times modes. At-most-once guarantees that any tuple emitted to the topology 
will never be reemitted. Any-times makes no guarantees, but may reduce the 
overhead of committing offsets to Kafka in cases where you truly don't care how 
many times a message is processed.
    +The `KafkaSpoutConfig.ProcessingGuarantee` enum parameter controls when 
the tuple with the `ConsumerRecord` for an offset is marked
    +as processed, i.e. when the offset is committed to Kafka. For 
AT_LEAST_ONCE and AT_MOST_ONCE guarantees the spout controls when
    +the commit happens. When the guarantee is NONE Kafka controls when the 
commit happens.
    +
    +* AT_LEAST_ONCE - an offset is ready to commit only after the 
corresponding tuple has been processed (at-least-once)
    +     and acked. If a tuple fails or times-out it will be re-emitted. A 
tuple can be processed more than once if for instance
    +     the ack gets lost.
    +
    +* AT_MOST_ONCE - every offset will be committed to Kafka right after being 
polled but before being emitted
    +     to the downstream components of the topology. It guarantees that the 
offset is processed at-most-once because it
    +     won't retry tuples that fail or timeout after the commit to Kafka has 
been done.
    +
    +* NONE - the polled offsets are committed to Kafka periodically as 
controlled by the Kafka properties
    +     "enable.auto.commit" and "auto.commit.interval.ms". Because the spout 
does not control when the commit happens
    +     it cannot give any message processing guarantees, i.e. a message may 
be processed 0, 1 or more times.
    +     This option requires "enable.auto.commit=true". If 
"enable.auto.commit=false" an exception will be thrown.
    +
    +To set the processing guarantee use the 
`KafkaSpoutConfig.Builder.setProcessingGuarantee` method as follows:
     
    -To set the processing guarantee, use the 
KafkaSpoutConfig.Builder.setProcessingGuarantee method, e.g.
     ```java
     KafkaSpoutConfig<String, String> kafkaConf = KafkaSpoutConfig
       .builder(String bootstrapServers, String ... topics)
       .setProcessingGuarantee(ProcessingGuarantee.AT_MOST_ONCE)
     ```
     
    -The spout will disable tuple tracking for emitted tuples by default when 
you use at-most-once or any-times. In some cases you may want to enable 
tracking anyway, because tuple tracking is necessary for some features of 
Storm, e.g. showing complete latency in Storm UI, or enabling backpressure 
through the `Config.TOPOLOGY_MAX_SPOUT_PENDING` parameter.
    +# Tuple Tracking
    +
    +By default the spout only tracks emitted tuples when the processing 
guarantee is AT_LEAST_ONCE. It may be necessary to track
    +emitted tuples with other processing guarantees to benefit of Storm 
features such as showing complete latency in the UI,
    --- End diff --
    
    Done

---

[GitHub] storm pull request #2380: STORM-2781: Refactor storm-kafka-client KafkaSpout...

Reply via email to