[ 
https://issues.apache.org/jira/browse/SAMZA-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977123#comment-13977123
 ] 

Chris Riccomini commented on SAMZA-220:
---------------------------------------

Also, I ran this patch on an internal job that was exhibiting poor performance 
on large partition counts. With this patch, it's performing much better (4x or 
more msgs/sec, per container).

> SystemConsumers is slow when consuming from a large number of partitions
> ------------------------------------------------------------------------
>
>                 Key: SAMZA-220
>                 URL: https://issues.apache.org/jira/browse/SAMZA-220
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>            Assignee: Chris Riccomini
>         Attachments: SAMZA-220.0.patch, SAMZA-220.1.patch, SAMZA-220.2.patch, 
> SAMZA-220.3.patch, with-SAMZA-220.2.png, without-SAMZA-220.2.png
>
>
> We have observed poor throughput when a SamzaContainer is consuming many 
> partitions (100s). The more partitions, the worse the performance gets.
> When hooking up VisualVM, two operations take up more than 65% of the CPU in 
> SystemConsumers:
> {code}
>     refresh.maybeCall()
>     updateMessageChooser
> {code}
> The problem is that we run each of these operations once before every 
> process() call to a StreamTask. Both of these operations iterate over *all* 
> SystemStreamPartitions that the SystemConsumers is consuming from. If you 
> have hundreds of partitions, it means you do two loops of 100+ items for 
> every message you process. This is true even if the SystemConsumers buffer 
> has a lot of messages (10,000+), and also true even if most 
> systemStreamPartitions have no messages available.
> I have two proposed solutions to this problem:
> 1. Only call refresh.maybeCall() when the total number of buffered messages 
> in the SystemConsumers has dropped below some low watermark.
> 2. Only have updateMessageChooser call messageChooser.update for 
> systemStreamPartitions that actually *have* a message.
> I have implemented this and deployed it on a few jobs, and I am seeing 
> significant performance improvement. From 10k-20k msgs/sec to 50k+.
> The trade off, as I see it is really around (1), which will introduce a 
> little latency for topics that are low volume. In such a case, the time from 
> when a message arrives to when it gets refreshed in the buffer, and updated 
> in the chooser increases.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to