Matthias J. Sax created STORM-855:
-------------------------------------
Summary: Add tuple batching
Key: STORM-855
URL: https://issues.apache.org/jira/browse/STORM-855
Project: Apache Storm
Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor
In order to increase Storm's throughput, multiple tuples can be grouped
together into a batch (i.e., a fat-tuple) and transferred from producer to
consumer at once.
The initial idea is taken from https://github.com/mjsax/aeolus. However, we aim
to integrate this feature deeply into the system (in contrast to building it on
top), which has multiple advantages:
 - batching can be even more transparent to the user (e.g., no extra
direct-streams are needed to mimic Storm's data distribution patterns)
 - fault-tolerance (anchoring/acking) can be done at tuple granularity (not at
batch granularity, which leads to many more replayed tuples -- and result
duplicates -- in case of failure); see the bolt sketch below
The aim is to extend the TopologyBuilder interface with an additional parameter
'batch_size' to expose this feature to the user. By default, batching will be
disabled.
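A rough sketch of how this could look from the user's perspective. The trailing
batch_size argument is the proposed addition and does not exist in the current
TopologyBuilder API; MySpout and MyBolt are placeholders:

{code:java}
import backtype.storm.topology.TopologyBuilder;

public class BatchingExample {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("source", new MySpout(), 2);

        // Proposed (hypothetical) overload with a trailing batch_size argument.
        // Omitting it, or passing 1, keeps batching disabled (today's behavior).
        builder.setBolt("sink", new MyBolt(), 4 /* parallelism */, 100 /* batch_size */)
               .shuffleGrouping("source");
    }
}
{code}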
This batching feature serves a pure tuple-transport purpose, i.e., tuple-by-tuple
processing semantics are preserved. An output batch is assembled at the producer
and completely disassembled at the consumer. The consumer's output can be batched
again, independently of whether its input was batched or not. Thus, batches can
be of a different size for each producer-consumer pair. Furthermore, a consumer
can receive batches of different sizes from different producers (including
regular, non-batched input).
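As a minimal, non-authoritative sketch of the transport-only idea (independent
of Storm's actual internals): the producer side collects outgoing tuples until
the configured batch size is reached and ships them as one message, while the
consumer side simply unpacks the batch and handles every tuple on its own, so
per-tuple processing, acking, and re-batching of output are untouched:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Producer side: buffers outgoing tuples and hands back a full batch for transfer.
class TupleBatcher<T> {
    private final int batchSize;
    private final List<T> buffer = new ArrayList<T>();

    TupleBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Returns a complete batch to transfer, or null while still collecting. */
    List<T> add(T tuple) {
        buffer.add(tuple);
        if (buffer.size() >= batchSize) {
            List<T> batch = new ArrayList<T>(buffer);
            buffer.clear();
            return batch;
        }
        return null;
    }
}

// Consumer side: disassembles a received batch and processes each tuple individually.
abstract class BatchUnpacker<T> {
    void onBatch(List<T> batch) {
        for (T tuple : batch) {
            process(tuple); // tuple-by-tuple semantics, incl. acking and re-batching of output
        }
    }

    abstract void process(T tuple);
}
{code}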