[ https://issues.apache.org/jira/browse/STORM-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Joseph Evans resolved STORM-2733.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

I pulled this into 2.x.

> Make Load Aware Shuffle much better at really bad situations
> ------------------------------------------------------------
>
>                 Key: STORM-2733
>                 URL: https://issues.apache.org/jira/browse/STORM-2733
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-client
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>             Fix For: 2.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We recently had an issue where some bolts got really backed up and started to
> die from OOMs. The issue ended up being twofold.
> First, GC slowed the worker down so much that it could not keep up even with
> < 1% of the traffic that was still being sent to it, which made it almost
> impossible to recover.
> The second issue was that the serialization of the tuples took a lot longer
> than the processing, which resulted in the send queue filling up much more
> quickly than the receive queue.
> I plan to address this in two ways. First, we need a better algorithm that
> can actually shut off the flow entirely to a very slow bolt. Second, we need
> to take the send queue into account when shuffling.
> This is not the full set of changes needed by STORM-2686, but it is a step in
> that direction. I am going to try to set it up so that the two algorithms
> work nicely together.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)