[ https://issues.apache.org/jira/browse/STORM-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342333#comment-16342333 ]
Alexandre Vermeerbergen commented on STORM-2914:
------------------------------------------------

Hello [~kabhwan],

I am aware that back-pressure will be very different in Storm 2.x, and I hope it will allow us to improve our alerting topology's design (using a cron entry to detect that the topology is stuck and automatically restart it is quite a poor man's workaround, I think we all agree on this). My belief is that our alerting topology gets stuck when there is a huge peak in message rate, because somehow we saturate Storm by exceeding its capacity, and/or maybe we hit some race condition that hangs the spouts in a way that a "normal load" would never trigger.

I would love to share a sample, but currently our code is too tightly tied to a Redis-based inventory of VMs. Currently, our metrics are identified by the EC2 ID of the VM from which they were measured: we use Redis to add metadata like the "pod" identifier and the pod's name and version (to get the list of triggers defined for this pod, etc.). We plan to rewrite our system with self-contained metrics (i.e., send metrics directly with all this metadata) so that our alerting topology will eventually no longer depend on our custom Redis inventory. Once it is all done and clean, I'd like to share it as a "Storm sample a little bit richer than the Exclamation topology :)"

But this rewrite will take time, and I'm currently struggling to switch our topologies to the storm-kafka-client spout from storm-kafka-client 1.2.0. My "plan B" (if the storm-kafka-client behavior in auto-commit mode can't be fixed ASAP) is to revert to storm-kafka-client as it was before JSON metadata on commits was introduced, because it worked pretty well for us (more than one month in our pre-production environment monitoring 10000+ VMs, which is a decent test).
I am worried to read in the current comments that the proposed fix for this auto-commit issue will come with a performance hit, because the performance we get from Storm is the key reason we keep using it, despite pressure from our developers, who find that programming with Flink is both simpler and more powerful (lambdas, strong datatypes, and easy-to-use windowed operations, to name a few). When I read about the expected performance of Storm 2.x (posts from Roshan, ...), I have high hopes that our choice of Storm will continue to make sense as the "streaming platform" for developers who are ready to be "closer to the machine" rather than "using higher levels of abstraction". It reminds me of the "C/C++" vs "Java" camps.

Long story, but it is meant to explain why I put my hope in you guys (the talented Storm developers) to keep Storm a platform that lets us build stream-processing applications with the best possible performance. I hope my long statement helps you make wise decisions with regard to this issue. Again, I'm ready to test whatever you'd like me to test.

Best regards,
Alexandre Vermeerbergen

> Remove enable.auto.commit support from storm-kafka-client
> ---------------------------------------------------------
>
>                 Key: STORM-2914
>                 URL: https://issues.apache.org/jira/browse/STORM-2914
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-kafka-client
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The enable.auto.commit option causes the KafkaConsumer to periodically commit
> the latest offsets it has returned from poll(). It is convenient for use
> cases where messages are polled from Kafka and processed synchronously, in a
> loop.
> Due to https://issues.apache.org/jira/browse/STORM-2913 we'd really like to
> store some metadata in Kafka when the spout commits.
> This is not possible with enable.auto.commit. I took a look at what that
> setting actually does, and it just causes the KafkaConsumer to call
> commitAsync during poll (and during a few other operations, e.g. close and
> assign) at some interval.
>
> Ideally I'd like to get rid of ProcessingGuarantee.NONE, since I think
> ProcessingGuarantee.AT_MOST_ONCE covers the same use cases, and is likely
> almost as fast. The primary difference between them is that AT_MOST_ONCE
> commits synchronously.
>
> If we really want to keep ProcessingGuarantee.NONE, I think we should make
> our ProcessingGuarantee.NONE setting cause the spout to call commitAsync
> after poll, and never use the enable.auto.commit option. This allows us to
> include metadata in the commit.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
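To make the trade-off in the description concrete: with the real Kafka client, an explicit commit is made via KafkaConsumer.commitAsync with a map of TopicPartition to OffsetAndMetadata, and OffsetAndMetadata has a (long offset, String metadata) constructor, which is the slot the spout would use; broker-driven auto-commit writes bare offsets with no such slot. The sketch below is a toy model of that difference, not the storm-kafka-client implementation (class and field names here are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the commit behaviors discussed in this issue. It is NOT the
// storm-kafka-client code: it only contrasts enable.auto.commit (bare offset,
// no metadata) with an explicit commit that can carry a metadata string,
// which is what the proposal wants ProcessingGuarantee.NONE to do by calling
// commitAsync after poll.
public class CommitModel {

    // One committed entry per partition: the offset plus its metadata string.
    static final class Committed {
        final long offset;
        final String metadata;
        Committed(long offset, String metadata) {
            this.offset = offset;
            this.metadata = metadata;
        }
    }

    // Simulated committed-offsets store, keyed by partition number.
    static final Map<Integer, Committed> committed = new HashMap<>();

    // What enable.auto.commit effectively does: periodically commit the
    // latest polled offset; there is no slot for spout-supplied metadata.
    static void autoCommit(int partition, long offset) {
        committed.put(partition, new Committed(offset, ""));
    }

    // What an explicit commit lets the spout do: attach metadata (e.g. a
    // JSON blob identifying the committing topology) to the offset.
    static void explicitCommit(int partition, long offset, String metadata) {
        committed.put(partition, new Committed(offset, metadata));
    }

    public static void main(String[] args) {
        autoCommit(0, 100L);
        System.out.println("auto-commit metadata: \""
                + committed.get(0).metadata + "\"");

        explicitCommit(0, 100L, "{\"topologyId\":\"alerting\"}");
        System.out.println("explicit commit metadata: "
                + committed.get(0).metadata);
    }
}
```

The synchronous-vs-asynchronous distinction between AT_MOST_ONCE and the proposed NONE behavior is orthogonal to this: both could attach metadata; they differ only in whether the spout waits for the commit to complete before continuing.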