Github user roshannaik commented on the issue: https://github.com/apache/storm/pull/2241

Have updated the PR with these two major changes, in addition to addressing many of the smaller TODOs I had.

1) **Introduced Wait Strategy** - for two cases: a bolt having no incoming data, and BackPressure mode, when a spout/bolt can't write to a downstream bolt's full queue. The wait strategy implementation for the case when a spout has no data to emit remains unchanged. The new "Progressive" wait strategy is a dynamic strategy that can go into deeper wait states progressively. It can be tuned to be more or less conservative if the user wants to make different tradeoffs between latency, CPU usage and throughput. (A rough sketch of the idea is included below.)

2) **Changes to default settings** - So far my defaults were aggressively geared towards optimizing high-throughput cases. Based on the concerns I have received, I have tweaked them to favor tight latency (batchSz=1) and to conserve CPU usage under low/medium throughput (sleep strategy settings). So high-throughput modes may see some impact, depending on the topology.

@revans2 To summarize the problems you have brought up so far:

- **At low throughput, latency is poor.** This was due to defaults not being suited for these modes (primarily the flush-tuple period of 5 seconds). The new defaults should work better. For high-throughput modes, the batching and flush-tuple rate will need to be customized.
- **At low throughput, high CPU usage.** This was due to the missing wait strategy, so it should also be addressed by this introduction.

A few points to consider for test runs:

- Ideally, settings need to be tweaked appropriately for low-throughput vs. high-throughput vs. latency-centric runs. Using the same defaults for all of them will yield suboptimal numbers at some end of the spectrum. My tips for this PR: for higher-throughput modes, use batchSz=~1000, add additional ACKers, and finally consider less frequent flushing if it is not impacting latency (see the configuration sketch below).
- My rule of thumb for optimal executor count: if spout/bolt executors are likely to be very busy (high-throughput modes), then 1 executor per physical core is a good sweet spot for overall perf. ACKers (if enabled) and the WorkerTransferThd (in multi-worker mode) would require their own cores, as they tend to stay very busy.
- max.spout.pending will be needed in high-throughput multi-worker runs due to the Netty client issue raised by Bobby.
- Bottlenecks: the ACKer is a known throughput bottleneck. The inter-worker path bottlenecks at around ~2.8 mil/sec with STORM-2306.
- It is worth trying some runs with ACKing disabled, as well as with BackPressure mode.
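To make the "Progressive" idea concrete, here is a minimal, hedged sketch of a wait strategy that escalates from spinning to yielding to parking as the idle streak grows. The class name, method name, and thresholds are illustrative only and are not taken from the PR's implementation.

```java
// Sketch only: the caller invokes idle() each time it finds nothing to do,
// and the strategy escalates into deeper wait states as the idle count grows.
import java.util.concurrent.locks.LockSupport;

public class ProgressiveWaitSketch {
    private final int spinCount;   // iterations to busy-spin before yielding
    private final int yieldCount;  // iterations to Thread.yield() before parking
    private final long parkNanos;  // park time once in the deepest wait state

    public ProgressiveWaitSketch(int spinCount, int yieldCount, long parkNanos) {
        this.spinCount = spinCount;
        this.yieldCount = yieldCount;
        this.parkNanos = parkNanos;
    }

    /** Called with the current idle-iteration count; returns the next count. */
    public int idle(int idleCounter) {
        if (idleCounter < spinCount) {
            // Level 1: stay hot on the CPU for lowest latency.
        } else if (idleCounter < spinCount + yieldCount) {
            // Level 2: give up the time slice but stay runnable.
            Thread.yield();
        } else {
            // Level 3: deepest state, trade latency for CPU savings.
            LockSupport.parkNanos(parkNanos);
        }
        return idleCounter + 1;
    }
}
```

A bolt's event loop would call `idle()` whenever its receive queue is empty and reset the counter to zero as soon as a tuple arrives; the three thresholds are the knobs that make the strategy more or less conservative.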
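And here is a hedged sketch of how a high-throughput run might be tuned per the tips above. The batch-size and flush-interval keys are ones this PR is introducing, so the exact strings and values below are assumptions; the other settings use the long-standing `Config` helpers.

```java
// Sketch of a high-throughput topology config; key names marked "assumed" may differ.
import org.apache.storm.Config;

public class HighThroughputConfSketch {
    public static Config build() {
        Config conf = new Config();
        // Larger producer batches for throughput (the new default favors latency with batchSz=1).
        conf.put("topology.producer.batch.size", 1000);        // key name assumed
        // Flush less frequently than the latency-friendly default, if latency allows.
        conf.put("topology.batch.flush.interval.millis", 100); // key name and value assumed
        // Extra ACKers, since the ACKer is a known throughput bottleneck.
        conf.setNumAckers(4);
        // Cap in-flight tuples in high-throughput multi-worker runs (Netty client issue).
        conf.setMaxSpoutPending(5000);
        conf.setNumWorkers(2);
        return conf;
    }
}
```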