Github user roshannaik commented on the issue:

    https://github.com/apache/storm/pull/2241
  
    I have updated the PR with these two major changes, in addition to addressing many of the smaller TODOs I had.
    
    1) **Introduced Wait Strategy** - Added for two cases: a bolt with no incoming data, and BackPressure mode when a spout/bolt can't write to a downstream bolt's full queue. The wait strategy implementation for the case where the spout has no data to emit remains unchanged. The new "Progressive" wait strategy is a dynamic strategy that moves into deeper wait states progressively. It can be tuned to be more or less conservative, if the user wants to make different tradeoffs among latency, CPU usage, and throughput.
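    The "Progressive" strategy described above could be sketched roughly as below. This is an illustrative back-off, not the PR's actual implementation; the class name, stage thresholds, and doubling schedule are all assumptions:
    
    ```java
    // Illustrative sketch of a progressive wait strategy: repeated idle calls
    // escalate from busy-spinning, to yielding, to sleeping with a doubling
    // back-off. Names and thresholds here are hypothetical, not Storm's API.
    public class ProgressiveWaitStrategy {
        private final int spinCount;   // stage 1: busy-spin for this many idle calls
        private final int yieldCount;  // stage 2: Thread.yield() for this many calls
        private final long maxSleepMs; // stage 3: cap on the sleep back-off
    
        public ProgressiveWaitStrategy(int spinCount, int yieldCount, long maxSleepMs) {
            this.spinCount = spinCount;
            this.yieldCount = yieldCount;
            this.maxSleepMs = maxSleepMs;
        }
    
        /** Sleep duration (ms) for a given idle count; 0 means spin or yield. */
        public long sleepFor(int idleCount) {
            if (idleCount < spinCount + yieldCount) {
                return 0; // still in the spin or yield stages
            }
            int over = idleCount - spinCount - yieldCount;
            return Math.min(maxSleepMs, 1L << Math.min(over, 20)); // 1, 2, 4, ... capped
        }
    
        /** Called on each idle iteration; callers reset the counter on progress. */
        public int idle(int idleCount) throws InterruptedException {
            long ms = sleepFor(idleCount);
            if (ms == 0 && idleCount >= spinCount) {
                Thread.yield();   // stage 2: give up the time slice
            } else if (ms > 0) {
                Thread.sleep(ms); // stage 3: progressively deeper sleeps
            }                     // stage 1: fall through and busy-spin
            return idleCount + 1;
        }
    }
    ```
    
    Tuning the three parameters trades latency against CPU: larger spin/yield counts keep latency tight, while a lower sleep cap conserves CPU under sustained idle.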
    
    2) **Changes to default settings:** So far my defaults were aggressively geared toward optimizing high-throughput cases. Based on the concerns I have received, I have tweaked them to favor tight latency (batchSz=1) and to conserve CPU usage under low/medium throughput (sleep-strategy settings). High-throughput modes may therefore see some impact, depending on the topology.
    
    @revans2 To summarize the problems you have brought up so far:
    
    - **At low throughput, latency is poor**: This was due to defaults not being suited for those modes (primarily the flush-tuple period of 5 seconds). The new defaults should work better. For high-throughput modes, the batch size and flush-tuple rate will need to be customized.
    - **At low throughput, CPU usage is high**: This was due to the missing wait strategy, and should be addressed by its introduction.
    
    
    
    I want to make a few points to be considered for test runs:
    -  Ideally, settings need to be tweaked appropriately for low-throughput vs. high-throughput vs. latency-centric runs. Using the same defaults for all of them will yield suboptimal numbers at some end of the spectrum. My tips for this PR in higher-throughput modes: set batchSz=~1000, add more ackers, and consider less frequent flushing if it does not hurt latency.
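    As a concrete sketch of those high-throughput tweaks, the overrides might look like the following. The config key names are illustrative assumptions, not verified Storm config names; check the PR's defaults for the actual keys:
    
    ```java
    import java.util.HashMap;
    import java.util.Map;
    
    // Sketch of high-throughput overrides matching the tips above.
    // Key names are assumptions for illustration only.
    public class HighThroughputTuning {
        public static Map<String, Object> overrides() {
            Map<String, Object> conf = new HashMap<>();
            conf.put("topology.producer.batch.size", 1000);    // batchSz=~1000
            conf.put("topology.acker.executors", 2);           // additional ackers
            conf.put("topology.flush.tuple.freq.millis", 100); // flush less frequently
            return conf;
        }
    }
    ```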
    
    - My rule of thumb for optimal executor count: if spout/bolt executors are likely to be very busy (high-throughput modes), then one executor per physical core is a good sweet spot for overall perf. ACKers (if enabled) and the WorkerTransferThd (in multi-worker mode) would each require their own core, as they tend to stay very busy.
    
    - max.spout.pending will be needed in high-throughput multi-worker runs, due to the Netty client issue raised by Bobby.
    
    - Bottlenecks: the ACKer is a known throughput bottleneck. The inter-worker path bottlenecks at around ~2.8 million tuples/sec with STORM-2306.
    
    - Worth trying some runs with ACKing disabled, as well as with Back Pressure mode.


