[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347688#comment-16347688 ]
Jungtaek Lim commented on STORM-2914: ------------------------------------- Btw, not a blocker but just food for thought is that the timing of commitment. We are committing in nextTuple() which would not be called in several states, flooding messages exceed max spout pending, in backpressure state, etc. Sometimes it can get somewhat longer even acks are coming, and if there's crash, topology will discard acks completely while nextTuple() is not called and reprocess from start of backpressure at any luck or even worse, which might bring another backpressure and make loop if backpressure comes from the characteristic of records in poll. May need to double check about the model of backpressure which STORM-2306 will bring (I guess we could get rid of this situation with new backpressure model), but at least 1.x version line it will persist. I don't think handling commit in ack() with timer would work, so current approach would be our best effort in current situation (handling both ack()/nextTuple() might be an alternative solution to mitigate), but we may want to consider such case to improve spout's lifecycle and behavior in future (not KafkaSpout but Storm's spout). > Remove enable.auto.commit support from storm-kafka-client > --------------------------------------------------------- > > Key: STORM-2914 > URL: https://issues.apache.org/jira/browse/STORM-2914 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka-client > Affects Versions: 2.0.0, 1.2.0 > Reporter: Stig Rohde Døssing > Assignee: Stig Rohde Døssing > Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The enable.auto.commit option causes the KafkaConsumer to periodically commit > the latest offsets it has returned from poll(). It is convenient for use > cases where messages are polled from Kafka and processed synchronously, in a > loop. > Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to > store some metadata in Kafka when the spout commits. This is not possible > with enable.auto.commit. I took at look at what that setting actually does, > and it just causes the KafkaConsumer to call commitAsync during poll (and > during a few other operations, e.g. close and assign) with some interval. > Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think > ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely > almost as fast. The primary difference between them is that AT_MOST_ONCE > commits synchronously. > If we really want to keep ProcessingGuarantee.NONE, I think we should make > our ProcessingGuarantee.NONE setting cause the spout to call commitAsync > after poll, and never use the enable.auto.commit option. This allows us to > include metadata in the commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)