Github user mjsax commented on the pull request:
https://github.com/apache/storm/pull/694#issuecomment-134656876
I just checked some older benchmark results for batching in user land, i.e., on top of Storm (=> Aeolus). In that case, a batch size of 100 increased the spout output rate by a factor of 6 (instead of the 1.5 the benchmark above shows). The benchmark should thus yield more than 70M tuples per 30 seconds (not about 19M).
Of course, batching is done a little differently now. In Aeolus, a fat tuple is used as the batch, so the system sees only a single batch-tuple. Now, the system sees all tuples, but emitting is delayed until the batch is full (this still saves the overhead of going through the disruptor for each tuple). However, we generate a tuple ID for each tuple in the batch, instead of a single ID per batch. I am not sure how expensive this is. Since acking was not enabled, it should not be too expensive, because the IDs do not have to be "registered" with the ackers (right?).
As a further optimization, it might be a good idea not to batch whole tuples, but only the `Values` and the tuple ID. The `worker-context`, `task-id`, and `outstream-id` are the same for all tuples within a batch. I will try this out and push a new version in the next few days if it works.
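To make the idea concrete, here is a minimal sketch of that optimization, assuming hypothetical names (`EmitBatcher`, `Batch`, `downstream`) that are not part of Storm's actual API: only the per-tuple payload (values and tuple ID) is buffered, while the fields shared by every tuple in a batch (task ID, stream ID) are stored once per batch, so a full batch is handed downstream in a single call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer per-tuple values and tuple IDs only;
// batch-level fields (task id, stream id) are stored once per batch.
final class EmitBatcher {
    static final class Batch {
        final int taskId;       // shared by all tuples in the batch
        final String streamId;  // shared by all tuples in the batch
        final List<List<Object>> values = new ArrayList<>(); // per-tuple payload
        final List<Long> tupleIds = new ArrayList<>();       // per-tuple id

        Batch(int taskId, String streamId) {
            this.taskId = taskId;
            this.streamId = streamId;
        }
    }

    private final int taskId;
    private final String streamId;
    private final int batchSize;
    private final Consumer<Batch> downstream; // e.g. the hand-off into the disruptor
    private Batch current;

    EmitBatcher(int taskId, String streamId, int batchSize, Consumer<Batch> downstream) {
        this.taskId = taskId;
        this.streamId = streamId;
        this.batchSize = batchSize;
        this.downstream = downstream;
        this.current = new Batch(taskId, streamId);
    }

    // Buffer one tuple; flush automatically once the batch is full.
    void emit(List<Object> tupleValues, long tupleId) {
        current.values.add(tupleValues);
        current.tupleIds.add(tupleId);
        if (current.values.size() >= batchSize) {
            flush();
        }
    }

    // Hand the current batch downstream and start a fresh one.
    void flush() {
        if (!current.values.isEmpty()) {
            downstream.accept(current);
            current = new Batch(taskId, streamId);
        }
    }
}
```

The point of the sketch is that one `downstream.accept(...)` call replaces `batchSize` individual hand-offs, while the shared metadata is not duplicated per tuple.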