[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352004#comment-16352004 ] Jungtaek Lim commented on STORM-2914: - Thanks [~Srdo], merged into master. Since there's some diverge between master and 1.x-branch for storm-kafka-client, I'd like to wait for [~Srdo] to submit pull request against 1.x branch. We can merge it afterwards. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Attachments: storm-kafka-modes.ods > > Time Spent: 4h 40m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352003#comment-16352003 ] Jungtaek Lim commented on STORM-2914: - Thanks [~Srdo], merged into master. Since there's some diverge between master and 1.x-branch for storm-kafka-client, I'd like to wait for [~Srdo] to submit pull request against 1.x branch. We can merge it afterwards. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Attachments: storm-kafka-modes.ods > > Time Spent: 4h 40m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351584#comment-16351584 ] Hugo Louro commented on STORM-2914: --- [~Srdo] Thanks for working on this over the weekend. I am going to start reviewing your patch now. We are in agreement on the best timing to create an interface to support pluggable offsets storage. I also don't thing we should do it now, i.e. for 1.2.0, hence why I hinted at 1.2.x, 1.3+.x and 2.x, basically some release after 1.2.0. Thanks. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Attachments: storm-kafka-modes.ods > > Time Spent: 0.5h > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351308#comment-16351308 ] Alexandre Vermeerbergen commented on STORM-2914: Hello [~hmclouro] +1 for your proposal to get things decided quickly ! In your 7 questions lists, my vote is: +1 for item #5 : because I want to trust you guys on the meaning of "identical" in this sentence: "_Document that ProcessingGuarantee.NO_GUARANTEES has a behavior identical to auto.commit.enable=true but disregards this property altogether (if auto.commit.enable is set the KafkaSpoutConfig will throw an exception)_". And I have no problem changing the way I configure Kafka spout to get this "identical" behavior if documentation is crystal clear about it. The documentation needs to be very clear about how to replace legacy use of auto.commit.enable and related setting by newer ones. + 0 for other items, meaning I am neutral there, as I'm selfishly concerned by my need to keep a behavior "identical" to auto.commit.enable=true. I just hope that "identical" means "no noticeable difference in performance". Best regards, Alexandre Vermeerbergen PS: I also hope that in next RC, the toollib/ directory will be cleaned up from *src* and *javadoc* artifacts, which seems to break the display of Kafka spout statistics in Nimbus UI in 1.2.0 RC2 (see my previous comment about this in this thread).** > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351147#comment-16351147 ] Hugo Louro commented on STORM-2914: --- [~Srdo][~kabhwan][~avermeerbergen] Let’s try to quickly agree on the specific action points that will help us move past any blockers and release Storm 1.2.0. I suggest the following, which will give us a good balance between making this release stable, but at the same time will give us the flexibility to accommodate changes that make sense in the future. # (Optional) Rename ProcessingGuarantee.NONE to ProcessingGuarantee.NO_GUARANTEES or UNDEFINED (as Spark uses). I think that either of these names are clearer and emphasize that this option is likely not what the users wants # Annotate the enum ProcessingGuarantee with *@Unstable* and document that ProcessingGuarantee.NO_GUARANTEES may be removed in the near future. # For now we keep supporting NO_GUARANTEES mode, but there is a good chance that we will remove it down the line, unless for performance reasons. We will do a benchmark for this. # NO_GUARANTEES will use *kafkaConsumer.commitAsync(offsetsToCommit, null);* as [~Srdo] # proposed in the PR. # Document that ProcessingGuarantee.NO_GUARANTEES has a behavior identical to auto.commit.enable=true but disregards this property altogether (if auto.commit.enable is set the KafkaSpoutConfig will throw an exception). Furthermore, the commits will not be based on a timer as controlled by auto.commit.interval.ms, but rather using the following strategy ([~Srdo] # perhaps you can add the details in here). # AT_MOST_ONCE will use *kafkaConsumer.commitSync(offsetsToCommit);* as [~Srdo] # proposed in the PR # AT_LEAST_ONCE will stay as is (in the current code and in the PR) because this option requires no change. # Slightly unrelated with this PR but to address [~kabhwan]’s comment on how to persist the offsets. The next version of the KafkaSpout (1.2.x and 1.3+.x) should have offsets persistence pluggable and abstracted behind an interface. The first implementation of this interface will commit offsets to Kafka as it is currently doing. In the future we can add support to write offsets to arbitrary durable storage. If we agree on this, let's incorporate these changes in the PRs [~Srdo] created and target having them merged by end of Monday. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349477#comment-16349477 ] Priyank Shah commented on STORM-2914: - [~avermeerbergen] Not sure how the javadoc and sources jar got in to the binary distribution. On 1.1.1 binary distribution package i only see storm-kafka-monitor-1.1.1.jar That should be okay and not create any issue in loading spout offset lags in storm ui. I will verify in the latest RCs as well. Can you double check as well to see if the other jars are present and if that is creating an issue in displaying the offset lags? > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349373#comment-16349373 ] Jungtaek Lim commented on STORM-2914: - [~Srdo] You've got my points. Thanks for reading my comment thoughtfully. Regarding commit in ack(), we may need to take into consideration that unlike nextTuple() it is called only when spout receives ack. It should be easiest one but we don't want to commit every ack (acks may not come sequentially, and performance matters). But if we apply timer to ack(), we need to deal with edge case: not yet timed-out but received some acks afterwards which are "last" acks. Possible to deal with, but might be tricky. That's why I mentioned hybrid approach - allowing commit to Kafka in both nextTuple() and ack(), which still has such edge case as well but the chance is much smaller. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349315#comment-16349315 ] Alexandre Vermeerbergen commented on STORM-2914: Hello [~Srdo] Regarding storm-kafka-monitoring*.jar : thanks for your explanation, it was indeed already in our toollib/ directory from Storm 1.2.0RC2: {{[root@ip-172-31-18-84 storm-stable]# ls toollib/}} {{storm-kafka-monitor-1.2.0.jar storm-kafka-monitor-1.2.0-javadoc.jar storm-kafka-monitor-1.2.0-sources.jar}} But since I was no longer seeing Kafka spout stats in Nimbus UI's log, and it seem that this was broken because of some missing class issue (very strange that it's trying to load a class from *javadoc.jar" file, isn't it ?) : {{org.apache.storm.utils.ShellUtils$ExitCodeException: Error: Could not find or load main class .usr.local.Storm.storm-stable.toollib.storm-kafka-monitor-1.2.0-javadoc.jar}} {{ at org.apache.storm.utils.ShellUtils.runCommand(ShellUtils.java:231) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.utils.ShellUtils.run(ShellUtils.java:161) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.utils.ShellUtils$ShellCommandExecutor.execute(ShellUtils.java:371) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.utils.ShellUtils.execCommand(ShellUtils.java:461) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.utils.ShellUtils.execCommand(ShellUtils.java:444) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.utils.TopologySpoutLag.getLagResultForKafka(TopologySpoutLag.java:163) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.utils.TopologySpoutLag.getLagResultForNewKafkaSpout(TopologySpoutLag.java:189) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.utils.TopologySpoutLag.lag(TopologySpoutLag.java:57) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.ui.core$topology_lag.invoke(core.clj:805) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.ui.core$fn__9572.invoke(core.clj:1165) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.compojure.core$make_route$fn__5965.invoke(core.clj:100) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.compojure.core$if_route$fn__5953.invoke(core.clj:46) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.compojure.core$if_method$fn__5946.invoke(core.clj:31) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.compojure.core$routing$fn__5971.invoke(core.clj:113) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at clojure.core$some.invoke(core.clj:2570) ~[clojure-1.7.0.jar:?]}} {{ at org.apache.storm.shade.compojure.core$routing.doInvoke(core.clj:113) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at clojure.lang.RestFn.applyTo(RestFn.java:139) ~[clojure-1.7.0.jar:?]}} {{ at clojure.core$apply.invoke(core.clj:632) ~[clojure-1.7.0.jar:?]}} {{ at org.apache.storm.shade.compojure.core$routes$fn__5975.invoke(core.clj:118) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.ring.middleware.cors$wrap_cors$fn__8880.invoke(cors.clj:149) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.ring.middleware.json$wrap_json_params$fn__8827.invoke(json.clj:56) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__6607.invoke(multipart_params.clj:118) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.ring.middleware.reload$wrap_reload$fn__7890.invoke(reload.clj:22) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.ui.helpers$requests_middleware$fn__6860.invoke(helpers.clj:52) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.ui.core$catch_errors$fn__9747.invoke(core.clj:1428) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.ring.middleware.keyword_params$wrap_keyword_params$fn__6527.invoke(keyword_params.clj:35) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.ring.middleware.nested_params$wrap_nested_params$fn__6570.invoke(nested_params.clj:84) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.ring.middleware.params$wrap_params$fn__6499.invoke(params.clj:64) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__6607.invoke(multipart_params.clj:118) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.ring.middleware.flash$wrap_flash$fn__6822.invoke(flash.clj:35) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.ring.middleware.session$wrap_session$fn__6808.invoke(session.clj:98) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at org.apache.storm.shade.ring.util.servlet$make_service_method$fn__6357.invoke(servlet.clj:127) ~[storm-core-1.2.0.jar:1.2.0]}} {{ at
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349137#comment-16349137 ] Stig Rohde Døssing commented on STORM-2914: --- [~avermeerbergen] Yes, those logs are expected to occur, but only once per spout instance. Once the spout commits, the log should not be shown again. [~kabhwan] Sure, if we're moving to a stateful checkpointing model we might want to use that mechanism's storage to commit Kafka offsets as well. I don't think there's a reason to switch off of Kafka storage yet though. If I'm understanding you correctly, you're saying that it's fine to restrict how users can configure the spout and consumer, because some configurations don't make sense for Storm (right?) If so, I agree and I think it's what we're trying to do with e.g. the code around enable.auto.commit already ([https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L461),] or by not offering an option to use the subscribe API. This issue has largely the same kind of intent. I'm hoping to have a bit of time to compare NONE and AT_MOST_ONCE this weekend. I'm wondering if [~avermeerbergen] has any interest in comparing these two settings as well, since you have a real workload? Regarding the spout only committing when nextTuple is called, you have a point. I'm not sure why we couldn't move commit to happen during acks for the at-least-once case (the only one where acks matter). I don't have any insight into whether the case you describe would be likely to happen, but I'd think that it would often be possible to "luck into" a commit during the tuple sequence that leads up to triggering backpressure. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349071#comment-16349071 ] Priyank Shah commented on STORM-2914: - [~avermeerbergen] That jar is for displaying the Kafka spout offset lag in Storm UI if topology has a Kafka Spout. The jar should already be present in the toollib directory of your installation and you should not need to include it in your topologies' jar. Can you confirm that? > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348563#comment-16348563 ] Alexandre Vermeerbergen commented on STORM-2914: [~Srdo] Also I noticed that in the artifacts built by Maven from your 1.x branch, there's this new JAR file: storm-kafka-monitor-1.2.1-SNAPSHOT.jar What is it meant for? Should I include it into my topologies "BigJars" ? Worth mentioning: in Storm UI, the stats on Kafka consumption are no longer displayed.. is it related to this new JAR ? Best regards, Alexandre > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348558#comment-16348558 ] Alexandre Vermeerbergen commented on STORM-2914: Hello [~Srdo] I have build storm-kafka-client.jar from your 1.x branch, tests are starting now. While it's too early to say if it's OK or not, I noticed the following warning in one of our topologies's log: 2018-02-01 13:08:25.016 o.a.k.c.u.AppInfoParser Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [INFO] Kafka version : 0.10.2.1 2018-02-01 13:08:25.016 o.a.k.c.u.AppInfoParser Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [INFO] Kafka commitId : e89bffd6b2eff799 2018-02-01 13:08:25.076 o.a.s.k.s.KafkaSpout Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [INFO] Partitions revoked. [consumer-group=StormPodsyncTopology_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@5cb084d9, topic-partitions=[]] 2018-02-01 13:08:25.077 o.a.s.k.s.KafkaSpout Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [INFO] Partitions reassignment. [task-ID=2, consumer-group=StormPodsyncTopology_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@5cb084d9, topic-partitions=[podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-15, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-4, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-3, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-6, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-5, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-0, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-2, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-1, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-12, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-11, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-14, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-13, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-8, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-7, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-10, podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-9]] 2018-02-01 13:08:25.083 o.a.k.c.c.i.AbstractCoordinator Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [INFO] Discovered coordinator ec2-34-242-207-227.eu-west-1.compute.amazonaws.com:9092 (id: 2147483644 rack: null) for group StormPodsyncTopology_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ. 2018-02-01 13:08:25.116 o.a.s.k.s.i.CommitMetadata Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [WARN] Failed to deserialize [OffsetAndMetadata\{offset=63683, metadata='{topic-partition=podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-15, offset=63682, numFails=0, thread='Thread-4-podOrTenantsFromKafkaSpout-executor[2 2]'}'}]. Error likely occurred because the last commit for this topic-partition was done using an earlier version of Storm. Defaulting to behavior compatible with earlier version 2018-02-01 13:08:25.125 o.a.s.k.s.i.CommitMetadata Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [WARN] Failed to deserialize [OffsetAndMetadata\{offset=67701, metadata='{topic-partition=podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-4, offset=67700, numFails=0, thread='Thread-4-podOrTenantsFromKafkaSpout-executor[2 2]'}'}]. Error likely occurred because the last commit for this topic-partition was done using an earlier version of Storm. Defaulting to behavior compatible with earlier version 2018-02-01 13:08:25.130 o.a.s.k.s.i.CommitMetadata Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [WARN] Failed to deserialize [OffsetAndMetadata\{offset=69382, metadata='{topic-partition=podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-3, offset=69381, numFails=0, thread='Thread-4-podOrTenantsFromKafkaSpout-executor[2 2]'}'}]. Error likely occurred because the last commit for this topic-partition was done using an earlier version of Storm. Defaulting to behavior compatible with earlier version 2018-02-01 13:08:25.134 o.a.s.k.s.i.CommitMetadata Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [WARN] Failed to deserialize [OffsetAndMetadata\{offset=78943, metadata='{topic-partition=podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-6, offset=78942, numFails=0, thread='Thread-4-podOrTenantsFromKafkaSpout-executor[2 2]'}'}]. Error likely occurred because the last commit for this topic-partition was done using an earlier version of Storm. Defaulting to behavior compatible with earlier version 2018-02-01 13:08:25.135 o.a.s.k.s.i.CommitMetadata Thread-4-podOrTenantsFromKafkaSpout-executor[2 2] [WARN] Failed to deserialize [OffsetAndMetadata\{offset=76603, metadata='{topic-partition=podsync_RealTimeSupervision_zDHZGCMKTHGySBqVw9DiqQ-5, offset=76602, numFails=0, thread='Thread-4-podOrTenantsFromKafkaSpout-executor[2 2]'}'}]. Error likely occurred because the last commit for this
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347797#comment-16347797 ] Jungtaek Lim commented on STORM-2914: - Correction: If new model of backpressure doesn't adjust a rate but just block calling nextTuple(), when backpressure bubbles up to spout it may encounter same situation. New model doesn't always block spout in backpressure state, so the chance to encounter will become much less. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347675#comment-16347675 ] Jungtaek Lim commented on STORM-2914: - [~hmclouro] [~Srdo] As I showed the example of Flink and Spark and I believe others are doing it, we need to make data source being "adapted" to Storm's nature when implementing spout. We are making KafkaSpout being adapted to how Spout is working, and if we are going to support stateful exactly-once via checkpoint and need to make storm-kafka-client supporting this, we may also need to have a proper strategy like Spark and Flink, get rid of committing to kafka or at least not relying on that for checkpoint, since it is ideal to store checkpoint into "same" durable storage. We need to take this into account while designing the spout and restrict some features if necessary, instead of letting users to use whatever they want to, but with the risks we don't guide whichever will be. If we really want to provide the feature which doesn't conform to Storm's nature, we should warn users, guiding the possible risks are best, but at least guiding that they should know what is exactly they're doing now. That's why I'm not in favor of having NONE processing guarantee semantic only for storm-kafka-client, but if it really has enough value to take the risks (if there's huge gap between at-most-once and none in point of performance view) we may need to provide the feature, with guiding the risks. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346235#comment-16346235 ] Jungtaek Lim commented on STORM-2914: - Whatever we decide, it may be better to document such guide so that we could let users to be aware of any effects when enabling such option. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346232#comment-16346232 ] Jungtaek Lim commented on STORM-2914: - [~hmclouro] https://github.com/apache/flink/blob/1440e4febd589e320f846a2725e98aec8ee43e7f/docs/dev/connectors/kafka.md#kafka-consumers-offset-committing-behaviour-configuration Flink clearly documents how "enable.auto.commit" option plays with Flink's checkpoint mechanism. Key statement is below: {quote} "Note that the Flink Kafka Consumer does not rely on the committed offsets for fault tolerance guarantees. The committed offsets are only a means to expose the consumer's progress for monitoring purposes." {quote} Based on the statement, Flink doesn't have any concern about committing to Kafka, which is very different to Storm since storm-kafka-client is completely relying on Kafka committed offset. Here's Spark doc for streaming Kafka source: https://github.com/apache/spark/blob/55b8cfe6e6a6759d65bf219ff570fd6154197ec4/docs/streaming-kafka-0-10-integration.md#storing-offsets Relevant statements are below: {quote} Kafka has an offset commit API that stores offsets in a special Kafka topic. By default, the new consumer will periodically auto-commit offsets. This is almost certainly not what you want, because messages successfully polled by the consumer may not yet have resulted in a Spark output operation, resulting in undefined semantics. This is why the stream example above sets "enable.auto.commit" to false. {quote} Spark community also avoids to couple with "enable.auto.commit". Checkpoint would be done with different way, and even guide regarding storing offsets to kafka itself, they don't recommend using that option. cc. [~Srdo] [~avermeerbergen] > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346215#comment-16346215 ] Hugo Louro commented on STORM-2914: --- [~kabhwan] I am OK with supporting only at-most-once or at-least-once. Do you know how other frameworks deal with Kafka auto.commit mode? If they don't allow it altogether then Storm would be in an equal playing field and performance would not matter much at that point in my opinion. It will simply become a feature that is not supported. As far as the WARN level, initially when I created the PR with that change I had it with level DEBUG and I never felt that it should be WARN to begin with. During reviews someone asked for it to be WARN and I did it, but never felt it should be WARN, and I still don't think it should be. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346102#comment-16346102 ] Hugo Louro commented on STORM-2914: --- [~Srdo] [~kabhwan] [~avermeerbergen] the decision to keep or not keep NONE basically boils down to, do we want to support Kafka's enable.auto.commit or no? If we want to support this option, then the processing guarantee is technically NONE because a given record/tuple can be processed 0, 1, 2 or more times. In this case we cannot remove NONE. That leads me to ask why are we removing the option to set the Kafka property enable.auto.commit and throwing an exception. This is a Kafka feature that is available to every Kafka consumer, so why should the KafkaSpout, which is technically a Kafka consumer, be any different? If the goal is to avoid the WARN exception that was getting printed when enable.auto.commit=true, the obvious thing to do in my opinion is to simply not log the message if the processing guarantee is not AT_LEAST_ONCE. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345533#comment-16345533 ] Hugo Louro commented on STORM-2914: --- [~Srdo] I am reviewing this patches today. I have going over the discussion since yesterday to put all the pieces together. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345522#comment-16345522 ] Stig Rohde Døssing commented on STORM-2914: --- [~avermeerbergen] I have a 1.x branch here https://github.com/srdo/storm/tree/STORM-2914-2913-1.x with both 2914 and 2913 fixed. I'd appreciate if you would build and try out this fix. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342987#comment-16342987 ] Alexandre Vermeerbergen commented on STORM-2914: [~Srdo] We do set topology.max.spout.pending, and we already had a discussion in the past that until Storm 2.x, we had to use our by-pass to overcome the limitations of Storm 1.x back-pressure: See: [https://mail-archives.apache.org/mod_mbox/storm-user/201705.mbox/%3CCADeKz6qZG12mN-=gf+mpta1jwxdk8_wwz1npbcgslx7fga4...@mail.gmail.com%3E] (at subsequent posts). We observe no OOM, just the consumption from Spouts stops, regardless of whether it our own Kafka spout, the old Storm Kafka client or the newer Storm Kafka Client. Back to Storm 1.2.0 : any idea when I could test a fix which removes the cycling "WARN" message about metadata ? Thanks! > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342365#comment-16342365 ] Stig Rohde Døssing commented on STORM-2914: --- {quote} My belief is that our alerting topology gets stuck when there's a huge peak of messages rate because somehow we saturate Storm by exceeding its capacity, and/or maybe we hit some race condition that hands the spouts in a way that "normal load" would never trigger. {quote} I'd also add that what you're describing here is something you can likely avoid by configuring your topologies differently. For example, you can set topology.max.spout.pending to limit how many pending tuples will be allowed into the topology at a time, so a spike in messages written to Kafka doesn't trigger a flood of messages in Storm. Even without this configuration, the topology shouldn't hang (more likely an OOME would occur). I'd encourage you to try to get some more information (e.g. Storm UI screenshots, thread dumps of the workers) if it happens again, and raise a new issue here so we can try to figure out what's happening. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342327#comment-16342327 ] Jungtaek Lim commented on STORM-2914: - [~Srdo] Got it. Thanks for explanation. So it could make much difference if committing offset is costly. Basically I'm against keeping NONE, since not only it adds more complexity, but also it adds the issues to investigate (that guarantee is not we've supported before in other spouts), but we may still want to support it if the cost of committing is considerable. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342319#comment-16342319 ] Stig Rohde Døssing commented on STORM-2914: --- [~kabhwan] It would be good to figure out why Alexandre was seeing that behavior. Just to be clear though, this issue isn't related to fixing that. I simply wanted to know whether people would object to losing the NONE guarantee. To be a little more concrete about the difference between autocommit and how the spout acts with AT_MOST_ONCE, when the consumer is configured to autocommit, it will set an internal timestamp. If we call poll when the timestamp is sufficiently old, the consumer will asynchronously commit the offsets it has returned in earlier poll calls by effectively calling commitAsync on the consumer. When the spout is set to use AT_MOST_ONCE, the spout immediately calls commitSync after calling poll. I don't believe there's any difference between what goes over the network in the two calls, the only differences should be that one call is blocking and one is not, and that the AT_MOST_ONCE setting commits after every poll, while autocommit commits only on polls where the timer has expired. The linked PR removes the option to use autocommit, and makes the spout emulate autocommit's behavior when using NONE. After writing the code I don't think it's a big deal if we want to keep NONE, I thought it would add more complexity. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342316#comment-16342316 ] Jungtaek Lim commented on STORM-2914: - FYI to [~avermeerbergen], the behavior of back pressure will be completely renewed in Storm 2.0.0. https://issues.apache.org/jira/browse/STORM-2306 Please refer the doc in the issue. I guess it makes more sense to you, but please comment to the issue if you find some concerns. The change is included in huge patch, so there's no chance to port back to 1.x version line. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342312#comment-16342312 ] Jungtaek Lim commented on STORM-2914: - So I don't think we even need to have NONE processing guarantee: AT-MOST-ONCE should be able to cover that. Topology processing can get delayed more and more based on throughput vs incoming, but should never stuck, even we may encounter OOME instead if we fail to adjust the parameters. I'm not sure we may want to support this as official feature, but it may make sense that spout discards current offset and pull from latest, if gap between current offset and biggest offset exceeds specific threshold (or some other conditions are met). That should work similar as [~avermeerbergen]'s self healing cron. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342305#comment-16342305 ] Jungtaek Lim commented on STORM-2914: - I feel [~avermeerbergen]'s issue is unlikely to having auto-commit or not, since there's only a slight difference between AT-MOST-ONCE and auto-commit as [~Srdo] explained. There might be little difference for the performance perspective since auto-commit is handled by Kafka hence they can make it as non-round-trip, but I guess it would be not a big deal. I think investigating [~avermeerbergen]'s issue would be more helpful rather than supporting kind of workaround. The thing is why it was stuck, and why it couldn't be restored afterwards, and how does automatic backpressure play and how it affects the spout. What does "topology is stuck" mean? Even in back pressure state, non-spout should process tuples to reduce tuples in waiting queue so that back pressure flag can be turned off eventually. Topology shouldn't be stuck (except idle) at any chance. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342254#comment-16342254 ] Stig Rohde Døssing commented on STORM-2914: --- [~hmclouro] Thanks. I have most of the code changes ready for this and 2913, they're fairly small. I'll put up a pair of PRs soon. Happy to change them if they turn out to not be how we want to solve these issues. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342243#comment-16342243 ] Hugo Louro commented on STORM-2914: --- [~Srdo] I am currently with limited Internet access. I will get my input on this by tomorrow. Thanks. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342242#comment-16342242 ] Stig Rohde Døssing commented on STORM-2914: --- [~avermeerbergen] Regarding your use case, you might try at-least-once mode plus a filter to discard old metrics, plus the topology.max.spout.pending parameter to limit how many messages go into your topology at a time. That said, I'll continue on with the assumption that your current configuration is what you want to keep. I think it's likely that use cases like yours where you don't need at-least-once would be equally well supported by using the AT_MOST_ONCE processing guarantee https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L165. The main difference between enable.auto.commit and AT_MOST_ONCE is that AT_MOST_ONCE synchronously commits offsets to Kafka once it has polled, but before the tuples are emitted, while enable.auto.commit commits every once in a while before the consumer polls. What this means is that enable.auto.commit may emit a tuple 0, 1, 2, etc. times, while AT_MOST_ONCE will emit a tuple 0 or 1 times. Since users of enable.offset.commit don't care how many times a message is processed, they should be fine with the AT_MOST_ONCE guarantee. My only concern with dropping support for enable.auto.commit and asking people to use AT_MOST_ONCE instead, would be the switch to synchronous committing, which may have some impact on performance. I would be surprised if the impact were meaningful though. For now I think we should try to make the spout emulate enable.auto.commit when the processing guarantee is NONE, without actually setting enable.auto.commit on the consumer. That way we can specify metadata when committing, which allows us to fix STORM-2913 properly. If we decide later that enable.auto.commit is unnecessary, we can remove NONE as an option. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342221#comment-16342221 ] Alexandre Vermeerbergen commented on STORM-2914: Hello [~Srdo], We use auto-commit when consuming Kafka messages in our "real-time alerting" topology. This topology doesn't uses acking not anchoring Indeed, the purpose of our "real-time alerting" topology is to evaluate metrics read from Kafka through expressions (called triggers, like in Zabbix) into "alerting severities", which we write to both a Redis database (for rendering in an alerting web app) and into other Kafka topics (for notifications / persistency purposes handled by other topologies). We observed that sometimes our alerting topology is flooded by an excessive rate of metrics = Kafka messages read by kafka spout, because sometimes a remote Kafka cluster which was temporarily unable to replicate data to our central Kafka cluster (the one plugged to our topologies) comes "back to life" and sends all metrics at once, including metrics which are too old to be relevant for real-time alerting. And because if Storm 1.x back-pressure is kind of ... well, say "limited", the result of such flooding is that our topology is just stuck and no longer consumes any metric. So we have a simple self-healer "cron" which periodically checks whether or not our topology consumes metrics over a sliding windows of 10 minutes, and when consumption is stopped for 10 minutes, this cron will restart our topology with "LATEST " strategy. No with this context, after trying many combinations we found out that with Storm 1.1.0 and with our own Kafka spout (working in auto-commit mode, but not as powerful as storm kafka client), we had to enable auto-commit otherwise our topology would get stuck quite too often. We had been waiting for Storm 1.2.0 to use the official storm-kafka-client with auto-commit support, because our own spout was quite limited. Now, in order to answer your question about storm-kafka-client removal of auto-commit support, I am unsure about what I should answer: what would guarantee me that my real-time alerting topology's stability won't be worse than it was when I used storm-kafka-client before the following commit broke it :[https://github.com/apache/storm/commit/a3899b75a79781602fa58b90de6c8aa784af5332#diff-7d7cbc8f5444fa7ada7962033fc31c5e |https://github.com/apache/storm/commit/a3899b75a79781602fa58b90de6c8aa784af5332#diff-7d7cbc8f5444fa7ada7962033fc31c5e]? Best regards, Alexandre Vermeerbergen > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342145#comment-16342145 ] Stig Rohde Døssing commented on STORM-2914: --- Ping [~hmclouro] and [~avermeerbergen], I'd appreciate your opinions on this. > Remove enable.auto.commit support from storm-kafka-client > - > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client >Affects Versions: 2.0.0, 1.2.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > before poll, reassignment and close, and never use the enable.auto.commit > option. This allows us to include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)