[ 
https://issues.apache.org/jira/browse/STORM-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-2733:
----------------------------------
    Labels: pull-request-available  (was: )

> Make Load Aware Shuffle much better at really bad situations
> ------------------------------------------------------------
>
>                 Key: STORM-2733
>                 URL: https://issues.apache.org/jira/browse/STORM-2733
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-client
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We recently had an issue where some bolts got badly backed up and started to 
> die from OOMs.  The problem turned out to be twofold.
> First, GC slowed the worker down so much that it could not keep up even with 
> < 1% of the traffic that was still being sent to it, which made it almost 
> impossible to recover.
> Second, serializing the tuples took a lot longer than processing them, so the 
> send queue filled up much more quickly than the receive queue.
> To fix this I plan to make two changes.  First, we need a better algorithm 
> that can actually shut off the flow entirely to a very slow bolt.  Second, we 
> need to take the send queue into account when shuffling.
> This is not the full set of changes needed by STORM-2686, but it is a step in 
> that direction.  I am going to try to set it up so that the two algorithms 
> work nicely together.
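
For illustration only, the two ideas in the description (cutting a very slow target off completely, and counting the send queue as part of the load) could be sketched roughly as below. This is not the code from the pull request and not Storm's API: the class name LoadAwareSketch, the CUTOFF value of 0.9, the use of max() to combine the two queue loads, and the least-loaded fallback are all assumptions made for the example.

```java
import java.util.Random;

/** Hypothetical sketch; names and thresholds are illustrative, not Storm's API. */
final class LoadAwareSketch {
    /** Load at or above this value cuts a target off completely (assumed value). */
    static final double CUTOFF = 0.9;

    /**
     * Effective load of one target: the worse of the downstream receive-queue
     * fill and the local send-queue fill, both expected in [0.0, 1.0].
     */
    static double effectiveLoad(double receiveQueueLoad, double sendQueueLoad) {
        return Math.max(receiveQueueLoad, sendQueueLoad);
    }

    /** Weight 0 removes the target from rotation; otherwise weight is spare capacity. */
    static double weight(double load) {
        return load >= CUTOFF ? 0.0 : 1.0 - load;
    }

    /**
     * Picks a target index at random, proportionally to weight.
     * If every target is over the cutoff, falls back to the least-loaded one
     * so that tuples still have somewhere to go.
     */
    static int choose(double[] receiveLoad, double[] sendLoad, Random rng) {
        int n = receiveLoad.length;
        double[] weights = new double[n];
        double total = 0.0;
        int leastLoaded = 0;
        double minLoad = Double.MAX_VALUE;
        for (int i = 0; i < n; i++) {
            double load = effectiveLoad(receiveLoad[i], sendLoad[i]);
            weights[i] = weight(load);
            total += weights[i];
            if (load < minLoad) {
                minLoad = load;
                leastLoaded = i;
            }
        }
        if (total == 0.0) {
            return leastLoaded;
        }
        double r = rng.nextDouble() * total;
        int lastPositive = leastLoaded;
        for (int i = 0; i < n; i++) {
            if (weights[i] <= 0.0) {
                continue; // target is shut off entirely
            }
            lastPositive = i;
            r -= weights[i];
            if (r < 0.0) {
                return i;
            }
        }
        return lastPositive; // guards against floating-point rounding
    }
}
```

With receive loads {0.1, 0.95, 0.2} and send loads {0.1, 0.0, 0.95}, only the first target keeps a non-zero weight: the second is cut off by its receive queue and the third by its send queue, which is the case the description says the old algorithm missed.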



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
