[ https://issues.apache.org/jira/browse/STORM-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710005#comment-14710005 ]

ASF GitHub Bot commented on STORM-855:
--------------------------------------

Github user mjsax commented on a diff in the pull request:

    https://github.com/apache/storm/pull/694#discussion_r37800211
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/executor.clj ---
    @@ -445,6 +445,32 @@
         ret
         ))
     
    +(defn init-batching-buffer [worker-context component-id]
    +  (let [batchSize (.get_batch_size (.getComponentCommon worker-context component-id))]
    +    (if (> batchSize 1)
    +      (let [consumer-ids (flatten (for [cids (vals (.getTargets worker-context component-id))] (keys cids)))
    +            consumer-task-ids (flatten (for [cid consumer-ids :when (not (.startsWith cid "__"))] (into '() (.getComponentTasks worker-context cid))))]
    +        (HashMap. (zipmap consumer-task-ids (repeatedly (count consumer-task-ids) #(Batch. batchSize)))))
    +      (HashMap.)
    +      )))
    +
    +(defn emit-msg [out-task out-tuple overflow-buffer output-batch-buffer transfer-fn]
    +  (let [out-batch (.get output-batch-buffer out-task)]
    +    (if out-batch
    --- End diff --
    
    Your observation is right for the current state of the code. However, I 
would like to extend it with the possibility to use different batch sizes for 
different output streams (including mixed-mode batching/non-batching). 
Nevertheless, we could have three functions: non-batching, batching, and 
mixed-mode. Thus, we could choose the correct function at setup time; the 
branching overhead is then avoided for both main cases (most of the time there 
is only a single output stream). See the sketch below.
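
    A minimal sketch of that setup-time selection (the three emit-msg-* 
variants are hypothetical stand-ins, assumed to share the signature of 
emit-msg from the diff above):

        ;; Sketch only: pick the emit strategy once at executor setup, so the
        ;; per-tuple hot path carries no batching/non-batching branch.
        ;; emit-msg-non-batching, emit-msg-batching, and emit-msg-mixed are
        ;; hypothetical variants, not part of the PR.
        (declare emit-msg-non-batching emit-msg-batching emit-msg-mixed)

        (defn pick-emit-fn [^java.util.HashMap output-batch-buffer all-out-task-ids]
          (cond
            ;; no consumer task batches -> plain single-tuple emit
            (.isEmpty output-batch-buffer)
            emit-msg-non-batching
            ;; every consumer task has a Batch -> pure batching emit
            (every? #(.containsKey output-batch-buffer %) all-out-task-ids)
            emit-msg-batching
            ;; some tasks batch, some do not -> mixed-mode emit
            :else
            emit-msg-mixed))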


> Add tuple batching
> ------------------
>
>                 Key: STORM-855
>                 URL: https://issues.apache.org/jira/browse/STORM-855
>             Project: Apache Storm
>          Issue Type: New Feature
>            Reporter: Matthias J. Sax
>            Assignee: Matthias J. Sax
>            Priority: Minor
>
> In order to increase Storm's throughput, multiple tuples can be grouped 
> together in a batch of tuples (i.e., a fat-tuple) and transferred from 
> producer to consumer at once.
> The initial idea is taken from https://github.com/mjsax/aeolus. However, we 
> aim to integrate this feature deeply into the system (in contrast to building 
> it on top), which has multiple advantages:
>   - batching can be even more transparent to the user (e.g., no extra 
> direct-streams needed to mimic Storm's data distribution patterns)
>   - fault-tolerance (anchoring/acking) can be done at tuple granularity 
> (not at batch granularity, which leads to many more replayed tuples -- and 
> result duplicates -- in case of failure)
> The aim is to extend the TopologyBuilder interface with an additional 
> parameter 'batch_size' to expose this feature to the user. By default, 
> batching will be disabled.
> This batching feature serves a pure tuple-transport purpose, i.e., 
> tuple-by-tuple processing semantics are preserved. An output batch is 
> assembled at the producer and completely disassembled at the consumer. The 
> consumer's output can be batched again, independently of whether its input 
> was batched. Thus, batches can be of different sizes for each 
> producer-consumer pair. Furthermore, consumers can receive batches of 
> different sizes from different producers (including regular, non-batched 
> input).
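
A toy model of the assemble/disassemble semantics described in the issue 
(plain Clojure, no Storm classes; all names are illustrative, not from the 
PR): the producer groups tuples into fat-tuples for transport, and the 
consumer unpacks them again, so processing stays tuple-by-tuple.

    ;; Producer side: group outgoing tuples into fat-tuples of at most
    ;; batch-size; only the transport unit changes, not the tuples.
    (defn assemble-batches [batch-size tuples]
      (partition-all batch-size tuples))

    ;; Consumer side: unpack a fat-tuple back into individual tuples, so
    ;; anchoring/acking can stay at tuple granularity.
    (defn disassemble-batch [batch]
      (seq batch))

    ;; Batches of size 3 in transit still yield 5 individually processed tuples:
    (mapcat disassemble-batch (assemble-batches 3 [:a :b :c :d :e]))
    ;; => (:a :b :c :d :e)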



