[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342354#comment-16342354
 ] 

Stig Rohde Døssing commented on STORM-2914:
-------------------------------------------

[~avermeerbergen] 

It seems I need to clarify a bit.

The current proposal (the code currently in the PR) should not have a 
performance hit compared to the pre-1.2.0 code. It simply moves a bit of commit 
logic from the KafkaConsumer to the spout.

When we're talking about removing NONE, it is with the understanding that we'd 
want to benchmark it first so we can tell if using AT_MOST_ONCE instead of NONE 
has any real performance impact. I haven't benchmarked anything yet, so when I 
say that moving to AT_MOST_ONCE might incur a performance hit, I'm just 
speculating.

As Jungtaek mentioned, we're not going to remove NONE if it is a lot faster 
than AT_MOST_ONCE. But if it turns out there's no measurable difference between 
the two (e.g. because the time spent sending commit requests to Kafka is 
negligible compared to the time spent on other tasks), we would have no reason 
to keep NONE.

> Remove enable.auto.commit support from storm-kafka-client
> ---------------------------------------------------------
>
>                 Key: STORM-2914
>                 URL: https://issues.apache.org/jira/browse/STORM-2914
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-kafka-client
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took a look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.
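
To illustrate the "commit after poll, with metadata" idea from the description above, here is a minimal sketch of how a spout might work out what to commit for a polled batch. It is not the code from the PR. PolledRecord and CommitEntry are hypothetical stand-ins for Kafka's ConsumerRecord and OffsetAndMetadata, the partition names and the metadata string are made up, and the only Kafka behaviour assumed is that a committed offset names the next record to consume (last polled offset + 1).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CommitAfterPollSketch {

    /** Hypothetical stand-in for a polled record: partition name plus offset. */
    static final class PolledRecord {
        final String topicPartition;
        final long offset;

        PolledRecord(String topicPartition, long offset) {
            this.topicPartition = topicPartition;
            this.offset = offset;
        }
    }

    /** Hypothetical stand-in for Kafka's OffsetAndMetadata. */
    static final class CommitEntry {
        final long offset;
        final String metadata;

        CommitEntry(long offset, String metadata) {
            this.offset = offset;
            this.metadata = metadata;
        }
    }

    /**
     * For each partition in the batch, pick the highest polled offset + 1
     * and attach the given metadata string to it.
     */
    static Map<String, CommitEntry> offsetsToCommit(List<PolledRecord> polled, String metadata) {
        Map<String, CommitEntry> result = new LinkedHashMap<>();
        for (PolledRecord record : polled) {
            long next = record.offset + 1;
            CommitEntry current = result.get(record.topicPartition);
            if (current == null || next > current.offset) {
                result.put(record.topicPartition, new CommitEntry(next, metadata));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<PolledRecord> batch = new ArrayList<>();
        batch.add(new PolledRecord("events-0", 41));
        batch.add(new PolledRecord("events-0", 42));
        batch.add(new PolledRecord("events-1", 7));

        // The metadata content is a made-up example.
        Map<String, CommitEntry> commits = offsetsToCommit(batch, "topology=demo");
        for (Map.Entry<String, CommitEntry> entry : commits.entrySet()) {
            System.out.println(entry.getKey() + " -> offset " + entry.getValue().offset
                    + ", metadata " + entry.getValue().metadata);
        }
    }
}
```

In a real spout the resulting map would correspond to the Map<TopicPartition, OffsetAndMetadata> argument of KafkaConsumer.commitAsync(offsets, callback). With enable.auto.commit the consumer builds and sends that commit itself, so there is no place to attach metadata.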



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
