Matthias J. Sax created STORM-855:
-------------------------------------

             Summary: Add tuple batching
                 Key: STORM-855
                 URL: https://issues.apache.org/jira/browse/STORM-855
             Project: Apache Storm
          Issue Type: Improvement
            Reporter: Matthias J. Sax
            Assignee: Matthias J. Sax
            Priority: Minor


In order to increase Storm's throughput, multiple tuples can be grouped 
together into a batch of tuples (i.e., a fat-tuple) and transferred from 
producer to consumer at once.

The initial idea is taken from https://github.com/mjsax/aeolus. However, we aim 
to integrate this feature deep into the system (in contrast to building it on 
top), which has multiple advantages:
  - batching can be even more transparent to the user (e.g., no extra 
direct-streams are needed to mimic Storm's data distribution patterns)
  - fault-tolerance (anchoring/acking) can be done at tuple granularity (not 
at batch granularity, which would lead to many more replayed tuples -- and 
result duplicates -- in case of failure)

The aim is to extend the TopologyBuilder interface with an additional parameter 
'batch_size' to expose this feature to the user. By default, batching will be 
disabled.
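
For illustration, the exposed API could look roughly as follows. This is a 
hypothetical sketch only: the trailing batch_size argument does not exist yet 
(it is exactly what this issue proposes), and SentenceSpout/SplitBolt/CountBolt 
are made-up user components; only the plain setSpout/setBolt/grouping calls are 
existing Storm API.

  import backtype.storm.topology.TopologyBuilder;
  import backtype.storm.tuple.Fields;

  public class BatchingTopologySketch {
      public static void main(String[] args) {
          TopologyBuilder builder = new TopologyBuilder();

          // existing API: no batch_size given, so the spout's output is not batched
          builder.setSpout("sentences", new SentenceSpout(), 2);

          // hypothetical overload: the trailing batch_size groups up to 100 output
          // tuples of the "split" bolt into one fat-tuple before transfer
          builder.setBolt("split", new SplitBolt(), 4, 100)
                 .shuffleGrouping("sentences");

          // no batch_size given -> batching disabled (default) for this bolt's
          // output, even though its input from "split" arrives batched
          builder.setBolt("count", new CountBolt(), 4)
                 .fieldsGrouping("split", new Fields("word"));
      }
  }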

This batching feature serves a pure tuple-transport purpose, i.e., 
tuple-by-tuple processing semantics are preserved. An output batch is assembled 
at the producer and completely disassembled at the consumer. The consumer's 
output can be batched again, independently of whether its input was batched or 
not. Thus, batch sizes can differ for each producer-consumer pair. Furthermore, 
a consumer can receive batches of different sizes from different producers 
(including regular, non-batched input).
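
A minimal conceptual sketch of the transport container (made-up code, not part 
of Storm; real tuples are Tuple instances, simplified here to lists of values):

  import java.util.ArrayList;
  import java.util.List;

  // A fat-tuple: a plain container of regular tuples that travels as one message.
  class TupleBatch {
      private final List<List<Object>> tuples = new ArrayList<List<Object>>();
      private final int batchSize;

      TupleBatch(int batchSize) { this.batchSize = batchSize; }

      // Producer side: collect output tuples; returns true once the batch is
      // full and should be emitted as a single message.
      boolean add(List<Object> tupleValues) {
          tuples.add(tupleValues);
          return tuples.size() >= batchSize;
      }

      // Consumer side: the batch is completely disassembled, so the receiving
      // bolt still processes (and anchors/acks) each tuple individually.
      List<List<Object>> disassemble() {
          return tuples;
      }
  }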



