Github user roshannaik commented on the issue:
https://github.com/apache/storm/pull/2241
I have updated the PR with the following two major changes, in addition to addressing many of the smaller TODOs I had:
1) **Introduced Wait Strategy** - for two cases: a bolt having no
incoming data, and back-pressure mode, when a spout/bolt can't write to a
downstream bolt's full queue. The wait strategy implementation for the case
when the spout has no data to emit remains unchanged. The new "Progressive" wait
strategy is a dynamic strategy that can move into deeper wait states
progressively. It can be tuned to be more or less conservative if the user
wants to make different tradeoffs between latency, CPU usage, and throughput.
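To make the "deeper wait states" idea concrete, here is a minimal sketch of a progressive/escalating wait strategy. This is illustrative only, not the PR's actual implementation; the class name, thresholds, and park duration are all hypothetical knobs:

```java
// Illustrative sketch of a progressive (escalating) wait strategy.
// NOT the actual Storm implementation; names and thresholds are hypothetical.
import java.util.concurrent.locks.LockSupport;

public class ProgressiveWaitSketch {
    // Hypothetical tuning knobs: how many consecutive idle polls before escalating.
    private static final int SPIN_LIMIT = 100;    // below this: busy spin (level 1)
    private static final int YIELD_LIMIT = 1000;  // below this: Thread.yield() (level 2)

    private int idleCount = 0;

    /** Called on each poll that finds no work; waits harder the longer we stay idle. */
    public void idle() {
        idleCount++;
        if (idleCount < SPIN_LIMIT) {
            // Level 1: pure spin -- cheapest wake-up latency, highest CPU.
        } else if (idleCount < YIELD_LIMIT) {
            Thread.yield();                     // Level 2: give up the time slice.
        } else {
            LockSupport.parkNanos(1_000_000L);  // Level 3: park ~1ms -- lowest CPU.
        }
    }

    /** Drop back to the most aggressive level as soon as work arrives. */
    public void reset() {
        idleCount = 0;
    }

    /** Current wait level (1 = spin, 2 = yield, 3 = park), for inspection. */
    public int level() {
        return idleCount < SPIN_LIMIT ? 1 : (idleCount < YIELD_LIMIT ? 2 : 3);
    }
}
```

Tuning the limits toward larger values keeps the strategy in the cheap-latency levels longer (favoring throughput/latency at the cost of CPU); smaller values make it conserve CPU sooner.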
2) **Changes to default settings:** - So far my defaults were aggressively
geared towards optimizing high-throughput cases. Based on the concerns I have
received, I have tweaked them to favor tight latency (batchSz=1) and to
conserve CPU usage under low/medium throughput (sleep-strategy settings). So
high-throughput modes may see some impact, depending on the topology.
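As a rough sketch, the latency-vs-throughput tradeoff in the defaults comes down to two knobs. Note these key names are placeholders, not necessarily the PR's exact config keys; check the PR's `Config` additions for the real names:

```yaml
# Hypothetical key names -- the PR defines the actual ones.
topology.producer.batch.size: 1        # new default: batchSz=1 favors tight latency
topology.flush.tuple.freq.millis: 1000 # flush buffered tuples more often than the old 5s period
```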
@revans2 To summarize the problems you have brought up so far:
- **At low throughput, latency is poor**: This issue was due to defaults
not being suited for these modes (primarily the flush-tuple period of 5 seconds). The new
defaults should work better. For high-throughput modes, the batching and flush-tuple
rate will need to be customized.
- **At low throughput, high CPU usage**: This was due to the missing wait
strategy, and should be addressed by the new wait strategies introduced above.
I want to make a few points to be considered for test runs:
- Ideally, settings need to be tweaked appropriately for low-throughput
vs. high-throughput vs. latency-centric runs. Using the same defaults for all
will yield suboptimal numbers at some end of the spectrum. My tips for this
PR: for high-throughput modes, use batchSz=~1000, add additional ackers, and
consider less frequent flushing if it is not impacting latency.
- My rule of thumb for optimal executor count: if spout/bolt executors are
likely to be very busy (high-throughput modes), then 1 executor per physical
core is a good sweet spot for overall perf. ACKers (if enabled) and the
WorkerTransferThd (in multi-worker mode) would require their own cores, as they
tend to stay very busy.
- max.spout.pending will be needed in high-throughput multi-worker runs, due
to the netty client issue Bobby raised.
- Bottlenecks: the ACKer is a known throughput bottleneck. The inter-worker
path bottlenecks at around ~2.8 mil/sec with STORM-2306.
- It is worth trying some runs with ACKing disabled, as well as with back-pressure mode.
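For test runs, the high-throughput tips above could be wired into a topology config roughly as follows. `topology.max.spout.pending` and `topology.acker.executors` are long-standing Storm config keys; the batch-size key is new in this PR and its exact name may differ, so treat it (and all the values) as placeholders to tune per topology:

```java
// Hedged sketch of a high-throughput test-run configuration.
// Values are illustrative; the batch-size key name is a placeholder.
import java.util.HashMap;
import java.util.Map;

public class HighThroughputConfSketch {
    public static Map<String, Object> build() {
        Map<String, Object> conf = new HashMap<>();
        // Bound in-flight tuples to avoid the netty client issue in multi-worker runs.
        conf.put("topology.max.spout.pending", 5000);
        // Extra ackers, since the ACKer is a known throughput bottleneck.
        conf.put("topology.acker.executors", 4);
        // Placeholder key: batchSz ~1000 for high-throughput modes.
        conf.put("topology.producer.batch.size", 1000);
        return conf;
    }
}
```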