Github user roshannaik commented on a diff in the pull request:
https://github.com/apache/storm/pull/2241#discussion_r158906210
--- Diff: docs/Performance.md ---
@@ -0,0 +1,128 @@
+---
+title: Performance Tuning
+layout: documentation
+documentation: true
+---
+
+Latency, throughput and CPU consumption are the three key dimensions involved in performance tuning.
+In the following sections we discuss the settings that can be used to tune along these dimensions and understand the trade-offs.
+
+It is important to understand that these settings can vary depending on the topology, the type of hardware and the number of hosts used by the topology.
+
+## 1. Batch Size
+Spouts and Bolts communicate with each other via concurrent message queues. The batch size determines the number of messages to be buffered before
+the producer (spout/bolt) attempts to actually write to the downstream component's message queue. Inserting messages into downstream
+queues in batches helps reduce the number of synchronization operations required for the inserts. Consequently, this helps achieve higher throughput. However,
+it may sometimes take a while for the buffer to fill up before it is flushed into the downstream queue. This implies that the buffered messages
+will take longer to become visible to the downstream consumer that is waiting to process them. This can increase the average end-to-end latency for
+these messages. The latency can degrade significantly if the batch sizes are large and the topology is not experiencing high traffic.
+
+`topology.producer.batch.size` : The batch size for writes into the receive queue of any spout/bolt is controlled via this setting. This setting
+impacts the communication within a worker process. Each upstream producer maintains a separate batch for a component's receive queue. So if two spout
+instances are writing to the same downstream bolt instance, each of the spout instances will maintain a separate batch.
+
+`topology.transfer.batch.size` : Messages that are destined for a spout/bolt running on a different worker process are sent to a queue called
+the **Worker Transfer Queue**. The Worker Transfer Thread is responsible for draining the messages in this queue and sending them to the appropriate
+worker process over the network. This setting controls the batch size for writes into the Worker Transfer Queue. This impacts the communication
+between worker processes.
+
+#### Guidance
+
+**For Low latency:** Set batch size to 1. This basically disables batching. It is likely to reduce peak sustainable throughput under heavy traffic, but
+not likely to impact throughput much under low/medium traffic situations.
+**For High throughput:** Set batch size > 1. Try values like 10, 100, 1000 or even higher and see what yields the best throughput for the topology.
+Beyond a certain point the throughput is likely to get worse.
+**Varying throughput:** Topologies often experience fluctuating amounts of incoming traffic over the day. Other topologies may experience higher traffic in some
+paths and lower traffic in other paths simultaneously. If latency is not a concern, a small batch size (e.g. 10) in conjunction with the right flush
+frequency may provide a reasonable compromise for such scenarios. For meeting stricter latency SLAs, consider setting the batch size to 1.
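+For example, assuming a typical topology setup where an `org.apache.storm.Config` instance is built and passed in when submitting the topology, a throughput-oriented configuration might look like the following sketch (the values shown are purely illustrative, not recommendations):
+
+```java
+import org.apache.storm.Config;
+
+Config conf = new Config();
+// Favor throughput: buffer up to 100 messages per producer before writing to a downstream receive queue
+conf.put("topology.producer.batch.size", 100);
+// Batch writes into the Worker Transfer Queue (inter-worker traffic) as well
+conf.put("topology.transfer.batch.size", 100);
+// For strict latency SLAs, set both of the above to 1 to disable batching instead
+```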
+
+
+## 2. Flush Tuple Frequency
+In low/medium traffic situations or when the batch size is too large, the batches may take too long to fill up and consequently the messages could take an unacceptably
+long time to become visible to downstream components. In such cases, periodic flushing of batches is necessary to keep the messages moving and avoid compromising
+latencies when batching is enabled.
+
+When batching has been enabled, special messages called *flush tuples* are inserted periodically into the receive queues of all spout and bolt instances.
+This causes each spout/bolt instance to flush all its outstanding batches to their respective downstream components.
+
+`topology.flush.tuple.freq.millis` : This setting controls how often the flush tuples are generated. Flush tuples are not generated if this configuration is
+set to 0 or if (`topology.producer.batch.size`=1 and `topology.transfer.batch.size`=1).
+
+
+#### Guidance
+The flushing interval can be used as a tool to retain the higher throughput benefits of batching while avoiding batched messages getting stuck for too long waiting for their
+batch to fill. Preferably this value should be larger than the average execute latencies of the bolts in the topology. Trying to flush the queues more frequently than
+the amount of time it takes to produce the messages may hurt performance. Understanding the average execute latencies of each bolt will help determine the average
+number of messages in the queues between two flushes.
+
+**For Low latency:** A smaller value helps achieve tighter latency SLAs.
+**For High throughput:** When trying to maximize throughput under high traffic situations, the batches are likely to get filled and flushed automatically.
+To optimize for such cases, this value can be set to a higher number.
+**Varying throughput:** If latency is not a concern, a larger value will optimize for high traffic situations. For meeting tighter latency SLAs, set this to a lower
+value.
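+Continuing the earlier sketch, a compromise configuration for fluctuating traffic could pair a small batch size with a modest flush interval (again, the values are purely illustrative):
+
+```java
+// Small batches plus periodic flushing: a compromise for fluctuating traffic
+conf.put("topology.producer.batch.size", 10);
+conf.put("topology.transfer.batch.size", 10);
+// Flush any partially filled batches every 10 ms so messages do not linger under low traffic
+conf.put("topology.flush.tuple.freq.millis", 10);
+```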
+
+
+## 3. Wait Strategy
+Wait strategies are used to conserve CPU usage by trading off some latency and throughput. They are applied in the following situations:
+
+3.1 **Spout Wait:** In low/no traffic situations, a Spout's nextTuple() may not produce any new emits. To avoid repeatedly invoking the Spout's nextTuple() in a tight loop,
+this wait strategy is used between nextTuple() calls, allowing the spout's executor thread to idle and conserve CPU. Select a strategy using `topology.spout.wait.strategy`.
+
+3.2 **Bolt Wait:** When a bolt polls its receive queue for new messages to process, it is possible that the queue is empty. This typically happens
+in low/no traffic situations or when the upstream spout/bolt is inherently slower. This wait strategy is used in such cases. It avoids high CPU usage
+due to the bolt continuously checking a typically empty queue. Select a strategy using `topology.bolt.wait.strategy`. The chosen strategy can be further configured
+using the `topology.bolt.wait.*` settings.
+
+3.3 **Backpressure Wait:** When a spout/bolt tries to write to a downstream component's receive queue,
+there is a possibility that the queue is full. In such cases the write needs to be retried. This wait strategy is used to induce some idling in between re-attempts to
+conserve CPU. Select a strategy using `topology.backpressure.wait.strategy`. The chosen strategy can be further configured using the `topology.backpressure.wait.*` settings.
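+As an illustration, each of the three situations can be pointed at one of the built-in strategies described in the next section (which combination works best depends on the topology):
+
+```java
+// Illustrative selection of a wait strategy for each situation
+conf.put("topology.spout.wait.strategy", "org.apache.storm.spout.SleepSpoutWaitStrategy");
+conf.put("topology.bolt.wait.strategy", "org.apache.storm.policy.WaitStrategyProgressive");
+conf.put("topology.backpressure.wait.strategy", "org.apache.storm.policy.WaitStrategyPark");
+```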
+
+
+#### Built-in wait strategies:
+
+- **SleepSpoutWaitStrategy** : This is the only built-in strategy available for Spout Wait. It cannot be applied to other Wait situations. It is a simple static strategy that
+calls Thread.sleep() each time. Set `topology.spout.wait.strategy` to `org.apache.storm.spout.SleepSpoutWaitStrategy` to use it. `topology.sleep.spout.wait.strategy.time.ms`
+configures the sleep time.
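+For example, continuing the earlier sketch (the 1 ms value is illustrative only):
+
+```java
+// Sleep for 1 ms between nextTuple() calls when the spout has nothing to emit
+conf.put("topology.sleep.spout.wait.strategy.time.ms", 1);
+```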
+
+- **ProgressiveWaitStrategy** : This strategy can be used for Bolt Wait or Backpressure Wait situations. Set the strategy to `org.apache.storm.policy.WaitStrategyProgressive` to
+select this wait strategy. This is a dynamic wait strategy that enters progressively deeper states of CPU conservation if the Backpressure Wait or Bolt Wait situations persist.
+It has 3 levels of idling and allows configuring how long to stay at each level (a configuration sketch follows the list):
+
+ 1. No Waiting - The first few times it will return immediately. This does not conserve any CPU. The number of times it remains in this state is configured using
+ `topology.bolt.wait.progressive.level1.count` or `topology.backpressure.wait.progressive.level1.count` depending on which situation it is being used in.
+
+ 2. Park Nanos - In this state it disables the current thread for thread scheduling purposes, for 1 nanosecond, using LockSupport.parkNanos(). This puts the CPU in a minimal
+ conservation state. It remains in this state for `topology.bolt.wait.progressive.level2.count` or `topology.backpressure.wait.progressive.level2.count` iterations.
+
+ 3. Thread.sleep() - In this state it calls Thread.sleep() with the value specified in `topology.backpressure.wait.progressive.level3.sleep.millis` or in
+ `topology.bolt.wait.progressive.level3.sleep.millis` based on the Wait situation it is used in. This is the most CPU-conserving level; it remains at this level for
+ the remaining iterations.
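+For instance, the progressive levels for Bolt Wait could be tuned as follows, continuing the earlier sketch (the counts and sleep time are examples only, not recommendations):
+
+```java
+// Bolt Wait with WaitStrategyProgressive: stay longer in the cheaper idling levels
+conf.put("topology.bolt.wait.progressive.level1.count", 1);        // level 1: return immediately this many times
+conf.put("topology.bolt.wait.progressive.level2.count", 1000);     // level 2: parkNanos(1) for this many iterations
+conf.put("topology.bolt.wait.progressive.level3.sleep.millis", 1); // level 3: Thread.sleep(1) thereafter
+```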
+
+
+- **ParkWaitStrategy** : This strategy can be used for Bolt Wait or Backpressure Wait situations. Set the strategy to `org.apache.storm.policy.WaitStrategyPark` to use this.
+This strategy disables the current thread for thread scheduling purposes by calling LockSupport.parkNanos(). The amount of park time is configured using either
+`topology.bolt.wait.park.microsec` or `topology.backpressure.wait.park.microsec` based on the wait situation it is used in. Setting the park time to 0 effectively disables
+the invocation of LockSupport.parkNanos(), and this mode can be used to achieve busy polling (which, at the cost of high CPU utilization even when idle, may improve latency and/or throughput).
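+For example, Backpressure Wait could use this strategy with a small park time, continuing the earlier sketch (the value is illustrative; 0 would mean busy polling):
+
+```java
+// Backpressure Wait with WaitStrategyPark: park 100 microseconds between write retries
+conf.put("topology.backpressure.wait.park.microsec", 100);
+// Setting this to 0 busy-polls instead, trading higher CPU usage for potentially lower latency
+```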
+
+
+## 4. Max.spout.pending
+The back pressure mechanism no longer requires `topology.max.spout.pending`. It is recommended to set this to null (the default).
--- End diff --
Thanks for catching that. This comment is no longer true. Will fix.
---