[jira] [Comment Edited] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

Jungtaek Lim (JIRA) Wed, 31 Jan 2018 14:17:32 -0800

    [ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347688#comment-16347688
 ]


Jungtaek Lim edited comment on STORM-2914 at 1/31/18 10:16 PM:
---------------------------------------------------------------

Btw, not a blocker but just food for thought is that the timing of commitment. 
We are committing in nextTuple() which would not be called in several states, 
flooding messages exceed max spout pending, in backpressure state, etc. 
Sometimes it can get somewhat longer even acks are coming (making timer being 
ineffective at that moment), and if there's crash, topology will discard acks 
completely while nextTuple() is not called and reprocess from start of 
backpressure at any luck or even worse, which might bring another backpressure 
and make loop if backpressure comes from the characteristic of records in poll.

May need to double check about the model of backpressure which STORM-2306 will 
bring (I guess we could get rid of this situation with new backpressure model), 
but at least 1.x version line it will persist. I don't think handling commit in 
ack() with timer would work, so current approach would be our best effort in 
current situation (handling it from both ack()/nextTuple() might be an 
alternative solution to mitigate), but we may want to consider such case to 
improve spout's lifecycle and behavior in future (not only KafkaSpout but also 
Storm's spout).


was (Author: kabhwan):
Btw, not a blocker but just food for thought is that the timing of commitment. 
We are committing in nextTuple() which would not be called in several states, 
flooding messages exceed max spout pending, in backpressure state, etc. 
Sometimes it can get somewhat longer even acks are coming, and if there's 
crash, topology will discard acks completely while nextTuple() is not called 
and reprocess from start of backpressure at any luck or even worse, which might 
bring another backpressure and make loop if backpressure comes from the 
characteristic of records in poll.

May need to double check about the model of backpressure which STORM-2306 will 
bring (I guess we could get rid of this situation with new backpressure model), 
but at least 1.x version line it will persist. I don't think handling commit in 
ack() with timer would work, so current approach would be our best effort in 
current situation (handling it from both ack()/nextTuple() might be an 
alternative solution to mitigate), but we may want to consider such case to 
improve spout's lifecycle and behavior in future (not only KafkaSpout but also 
Storm's spout).

> Remove enable.auto.commit support from storm-kafka-client
> ---------------------------------------------------------
>
>                 Key: STORM-2914
>                 URL: https://issues.apache.org/jira/browse/STORM-2914
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-kafka-client
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took at look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (STORM-2914) Remove enable.auto.commit support from storm-kafka-client

Reply via email to