[ 
https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342333#comment-16342333
 ] 

Alexandre Vermeerbergen commented on STORM-2914:
------------------------------------------------

Hello [~kabhwan],

I am aware that back-pressure will be very different in Storm 2.x, and I hope 
it will let us improve our alerting topology's design (using a cron entry to 
detect when the topology is stuck and automatically restart it is quite a poor 
man's workaround, I think we all agree on this).

My belief is that our alerting topology gets stuck when there is a huge peak 
in message rate, because somehow we saturate Storm by exceeding its capacity, 
and/or we hit some race condition that hangs the spouts in a way that 
"normal load" would never trigger.

I would love to share a sample, but currently our code is too tightly tied to 
a Redis-based inventory of VMs.
Our metrics are identified by the EC2 ID of the VM from which they were 
measured: we use Redis to add metadata such as the "pod" identifier and the 
pod's name & version (to get the list of triggers defined for this pod, etc.). 
We plan to rewrite our system with self-contained metrics (i.e., send metrics 
directly with all this metadata) so that our alerting topology will eventually 
no longer depend on our custom Redis inventory. Once it's all done and clean, 
I'd like to share it as a "Storm sample a little richer than the Exclamation 
topology :)"

But this rewrite will take time, and I'm currently struggling to switch our 
topologies to the Kafka spout from storm-kafka-client 1.2.0. My "plan B" (if 
the storm-kafka-client behavior in auto-commit mode can't be fixed ASAP) is to 
revert to storm-kafka-client as it was before JSON metadata on commits was 
introduced, because it worked pretty well for us (more than 1 month in our 
pre-production environment monitoring 10,000+ VMs, that's a decent test).

I am scared to read in the comments that the current proposal to fix this 
issue with auto-commit will come with a performance hit, because the 
performance we get with Storm is the key reason why we keep using it, in spite 
of pressure from our developers who find programming with Flink simpler and 
more powerful (lambdas, strong data types, easy-to-use windowed operations, to 
name a few). When I read about Storm 2.x's expected performance (posts from 
Roshan, ...), I have high hopes that our choice of Storm will continue to make 
sense as the "streaming platform" for developers who are ready to be "closer 
to the machine" rather than "using higher levels of abstraction". It reminds 
me of the "C/C++" vs "Java" camps.

Long story, but it's meant to explain why I put my hope in you guys (the 
talented Storm developers) to keep Storm a platform that continues to let us 
build stream processing applications with the best possible performance... 

I hope my long statement will help you make wise decisions with regard to this 
issue.

Again I'm ready to test whatever you'd like me to test.

Best regards,

Alexandre Vermeerbergen


> Remove enable.auto.commit support from storm-kafka-client
> ---------------------------------------------------------
>
>                 Key: STORM-2914
>                 URL: https://issues.apache.org/jira/browse/STORM-2914
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-kafka-client
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit 
> the latest offsets it has returned from poll(). It is convenient for use 
> cases where messages are polled from Kafka and processed synchronously, in a 
> loop. 
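> For illustration, the kind of plain KafkaConsumer loop that this setting is 
> designed for might look roughly like the sketch below (topic name, group id 
> and properties are made-up examples, not storm-kafka-client code):
> 
>     import java.util.Collections;
>     import java.util.Properties;
>     import org.apache.kafka.clients.consumer.ConsumerRecord;
>     import org.apache.kafka.clients.consumer.ConsumerRecords;
>     import org.apache.kafka.clients.consumer.KafkaConsumer;
> 
>     public class AutoCommitLoopExample {
>         public static void main(String[] args) {
>             Properties props = new Properties();
>             props.put("bootstrap.servers", "localhost:9092");
>             props.put("group.id", "example-group");
>             // Offsets returned by poll() are committed in the background at this interval.
>             props.put("enable.auto.commit", "true");
>             props.put("auto.commit.interval.ms", "5000");
>             props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
>             props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
>             try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
>                 consumer.subscribe(Collections.singletonList("example-topic"));
>                 while (true) {
>                     // Everything returned here will eventually be auto-committed,
>                     // whether or not the synchronous processing below succeeds.
>                     ConsumerRecords<String, String> records = consumer.poll(100);
>                     for (ConsumerRecord<String, String> record : records) {
>                         System.out.println(record.offset() + ": " + record.value());
>                     }
>                 }
>             }
>         }
>     }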
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to 
> store some metadata in Kafka when the spout commits. This is not possible 
> with enable.auto.commit. I took a look at what that setting actually does, 
> and it just causes the KafkaConsumer to call commitAsync during poll (and 
> during a few other operations, e.g. close and assign) with some interval. 
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think 
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely 
> almost as fast. The primary difference between them is that AT_MOST_ONCE 
> commits synchronously.
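> As a side note, a topology opts into one of these guarantees through the 
> spout config builder, roughly as sketched below (broker address and topic 
> name are placeholders, and the exact builder method names may differ):
> 
>     import org.apache.storm.kafka.spout.KafkaSpout;
>     import org.apache.storm.kafka.spout.KafkaSpoutConfig;
> 
>     // Choose the guarantee explicitly instead of relying on enable.auto.commit.
>     KafkaSpoutConfig<String, String> spoutConfig =
>             KafkaSpoutConfig.builder("localhost:9092", "example-topic")
>                     .setProcessingGuarantee(KafkaSpoutConfig.ProcessingGuarantee.AT_MOST_ONCE)
>                     .build();
>     KafkaSpout<String, String> spout = new KafkaSpout<>(spoutConfig);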
> If we really want to keep ProcessingGuarantee.NONE, I think we should make 
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync 
> after poll, and never use the enable.auto.commit option. This allows us to 
> include metadata in the commit.
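> At the consumer level that would look roughly like the sketch below, 
> continuing from the plain consumer loop above (the partition, offset and 
> metadata string are made-up placeholders; the real JSON payload would be 
> whatever gets defined for STORM-2913):
> 
>     import java.util.HashMap;
>     import java.util.Map;
>     import org.apache.kafka.clients.consumer.OffsetAndMetadata;
>     import org.apache.kafka.common.TopicPartition;
> 
>     // After a poll, commit the next offset to read together with an arbitrary
>     // metadata string, which is something enable.auto.commit cannot carry.
>     Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
>     toCommit.put(new TopicPartition("example-topic", 0),
>             new OffsetAndMetadata(42L, "{\"exampleMetadata\":\"...\"}"));
>     consumer.commitAsync(toCommit, (offsets, exception) -> {
>         if (exception != null) {
>             // Best effort: without a processing guarantee, a failed async
>             // commit can simply be retried on a later cycle.
>             System.err.println("Async commit failed: " + exception);
>         }
>     });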



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
