[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351447#comment-16351447
 ] 

Stig Rohde Døssing commented on STORM-2914:
-------------------------------------------

[~hmclouro] [~kabhwan]

I had time to do some tests. I've added a storm-perf topology that uses 
storm-kafka-client instead of storm-kafka, here: 
[https://github.com/apache/storm/pull/2545].

I ran the topology from the mentioned PR for 10 minutes, with NONE and 
AT_MOST_ONCE against a Kafka instance with small messages preloaded 
("testMessage"). Because the commitSync call in AT_MOST_ONCE is blocking and 
thus likely affected by latency to Kafka, I tried running the topology both 
against a local Kafka node, and against one set up in an Azure VM.

I've attached the raw results: [^storm-kafka-modes.ods]. Here are the averages 
for spout acks/s for each test:
* NONE local: 124778.6
* AT_MOST_ONCE local: 120226.6

* NONE Azure (25-40 ms latency): 34593.6
* AT_MOST_ONCE Azure: 10374.8

As you can see it's not too expensive to use AT_MOST_ONCE locally instead of 
NONE, but once latency is introduced, it looks much cheaper to use NONE if 
possible. I'm a little surprised the difference is so large; I would have 
expected something more like 50% throughput for AT_MOST_ONCE compared to NONE 
due to doubling the number of blocking calls in nextTuple when polling.
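
To put numbers on that comparison, the relative throughput can be worked out directly from the averages listed above. This is only a minimal sketch of the arithmetic; the dictionary layout and the relative_throughput helper are names made up for illustration and are not part of storm-perf.

```python
# Average spout acks/s, copied from the results listed above.
acks_per_sec = {
    ("local", "NONE"): 124778.6,
    ("local", "AT_MOST_ONCE"): 120226.6,
    ("azure", "NONE"): 34593.6,
    ("azure", "AT_MOST_ONCE"): 10374.8,
}


def relative_throughput(env):
    """AT_MOST_ONCE throughput as a fraction of NONE in the same environment."""
    return acks_per_sec[(env, "AT_MOST_ONCE")] / acks_per_sec[(env, "NONE")]


for env in ("local", "azure"):
    print(f"{env}: AT_MOST_ONCE reaches {relative_throughput(env):.1%} of NONE")
```

On these averages AT_MOST_ONCE comes out at roughly 96% of NONE locally and roughly 30% against the Azure node, which is the gap from the expected 50% mentioned above.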

I'm not too experienced with benchmarking, so it's possible that I've made a 
mistake somewhere, or this benchmark may be too synthetic to really show 
anything.

For now I think we should keep NONE. I'll do as Hugo suggests and rename it and 
mark it unstable. I agree with Hugo's list, but we should consider waiting to 
hide committing/loading commits behind an interface until we actually need to 
store commits outside Kafka. Once we know what a second implementation needs, 
it'll probably be easier to make a good generic interface, rather than one that 
only fits our current use case, and I also would prefer not to add pluggability 
if we end up not needing it. 

> Remove enable.auto.commit support from storm-kafka-client
> ---------------------------------------------------------
>
>                 Key: STORM-2914
>                 URL: https://issues.apache.org/jira/browse/STORM-2914
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-kafka-client
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: storm-kafka-modes.ods
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took a look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
