GitHub user srdo opened a pull request:

    https://github.com/apache/storm/pull/2249

    WIP: STORM-2648/STORM-2357: Add storm-kafka-client support for
    at-most-once processing and a toggle for whether messages should be
    emitted with a message id when not using at-least-once
    
    See https://issues.apache.org/jira/browse/STORM-2357 and
    https://issues.apache.org/jira/browse/STORM-2648.
    
    I'd like to get some opinions on whether this approach is a good idea, or
    whether I've overlooked a better option, before finishing this up with
    some tests. I don't love that we'll end up with three different commit
    behaviors.
    
    In STORM-2357 it was noted that the spout doesn't currently support true
    at-most-once processing. With Kafka's auto commit option, the spout can
    receive a record, emit it to the topology, then crash and recover before
    the offset has been auto committed, and so receive and emit the same
    record again. The linked issue suggests solving this by committing the
    polled offsets before emitting the corresponding tuples to the topology,
    which is an option added here.
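    To illustrate why the commit order matters, here is a minimal sketch that
    uses neither Kafka nor Storm: a hypothetical CommitOrderDemo class models
    one partition as a list of records and tracks the committed offset a
    restarted consumer would resume from. Committing after the emit (roughly
    what auto commit does, since its periodic commit lags the emit) re-emits
    the record that was in flight at the crash; committing before the emit
    does not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not storm-kafka-client code: a single partition is a
// List<String>, and `committed` is the offset a restarted consumer resumes from.
final class CommitOrderDemo {
    private final List<String> log;
    private final boolean commitBeforeEmit;
    private int committed = 0;                        // next offset to read after a restart
    private final List<String> emitted = new ArrayList<>();

    CommitOrderDemo(List<String> log, boolean commitBeforeEmit) {
        this.log = log;
        this.commitBeforeEmit = commitBeforeEmit;
    }

    /** Reads from the committed offset; "crashes" right after emitting crashAfterOffset (-1 = never). */
    void consume(int crashAfterOffset) {
        for (int offset = committed; offset < log.size(); offset++) {
            if (commitBeforeEmit) {
                committed = offset + 1;               // commit the offset first...
            }
            emitted.add(log.get(offset));             // ...then emit the tuple
            if (offset == crashAfterOffset) {
                return;                               // simulated crash: nothing below runs
            }
            if (!commitBeforeEmit) {
                committed = offset + 1;               // commit lags the emit, as with auto commit
            }
        }
    }

    /** Runs once with a crash, then restarts from the committed offset and runs to the end. */
    static List<String> crashAndRecover(List<String> log, boolean commitBeforeEmit, int crashAfterOffset) {
        CommitOrderDemo demo = new CommitOrderDemo(log, commitBeforeEmit);
        demo.consume(crashAfterOffset);
        demo.consume(-1);
        return demo.emitted;
    }
}
```

    For a log of "a", "b", "c" and a crash right after emitting offset 1,
    crashAndRecover returns [a, b, b, c] when committing after the emit and
    [a, b, c] when committing before it. The flip side is that a crash between
    the commit and the emit would lose that record, which is the trade-off
    at-most-once accepts.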
    
    STORM-2648 notes that there is currently no way to make Storm track tuples
    when using auto commit with this spout. This prevents Storm UI from
    showing the complete latency for the spout, and I would assume it also
    prevents topology.max.spout.pending from having an effect, since only
    tuples emitted with a message id count as pending. I've added a toggle to
    KafkaSpoutConfig that forces the spout to emit tuples with message ids,
    even when using auto commit or the at-most-once option. The spout does
    nothing on ack or fail when not doing at-least-once.
    
    I'd like to keep the spout config simple for the user, so I've added a
    single processing guarantee setting with three options, corresponding to
    the standard at-least-once code path, the path that uses auto commit, and
    the path that commits offsets before emitting any tuples.
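    A rough sketch of how the three options and the STORM-2648 toggle could
    interact. The Guarantee enum values, the SpoutSketch class and the
    forceMessageIds flag are hypothetical names for illustration, not the
    config API in this PR. Tuples carry a message id under at-least-once or
    when the toggle is set, only the auto commit option turns on the
    consumer's enable.auto.commit property, and ack/fail are no-ops unless the
    guarantee is at-least-once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for illustration only; not the PR's KafkaSpoutConfig API.
enum Guarantee {
    AT_LEAST_ONCE,      // standard path: commit offsets once their tuples are acked
    AUTO_COMMIT,        // let the KafkaConsumer commit offsets periodically
    COMMIT_BEFORE_EMIT  // at-most-once: commit polled offsets before emitting
}

final class SpoutSketch {
    private final Guarantee guarantee;
    private final boolean forceMessageIds;  // hypothetical STORM-2648 toggle
    final List<String> acked = new ArrayList<>();
    final List<String> toRetry = new ArrayList<>();

    SpoutSketch(Guarantee guarantee, boolean forceMessageIds) {
        this.guarantee = guarantee;
        this.forceMessageIds = forceMessageIds;
    }

    /** Value for the consumer's enable.auto.commit property under this guarantee. */
    boolean kafkaAutoCommit() {
        return guarantee == Guarantee.AUTO_COMMIT;
    }

    /** Whether tuples are emitted with a message id, i.e. tracked by Storm. */
    boolean emitsWithMessageId() {
        return guarantee == Guarantee.AT_LEAST_ONCE || forceMessageIds;
    }

    void ack(String msgId) {
        if (guarantee != Guarantee.AT_LEAST_ONCE) {
            return;  // offsets are committed elsewhere; nothing to do
        }
        acked.add(msgId);  // stand-in for marking the offset as committable
    }

    void fail(String msgId) {
        if (guarantee != Guarantee.AT_LEAST_ONCE) {
            return;  // no replay outside at-least-once
        }
        toRetry.add(msgId);  // stand-in for scheduling a retry
    }
}
```

    The toggle only changes whether Storm tracks the tuples (for complete
    latency and max spout pending); it does not change when offsets are
    committed.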

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srdo/storm STORM-2648

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2249
    
----
commit 4fc4b71f9720f506be20740f780dfef93f2dd036
Author: Stig Rohde Døssing <[email protected]>
Date:   2017-07-31T18:26:55Z

    STORM-2648/STORM-2357: Add storm-kafka-client support for at-most-once
    processing and a toggle for whether messages should be emitted with a
    message id when not using at-least-once

----


