Robert Joseph Evans created STORM-2786:
------------------------------------------

             Summary: Ackers leak tracking info on failure and lots of other cases.
                 Key: STORM-2786
                 URL: https://issues.apache.org/jira/browse/STORM-2786
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-client, storm-core
    Affects Versions: 0.9.1-incubating, 0.10.0, 1.0.0, 2.0.0
            Reporter: Robert Joseph Evans
            Assignee: Robert Joseph Evans
            Priority: Critical


Over the weekend we had an incident where ackers were running out of memory at 
a really scary rate.  It turns out that they were seeing a lot of failures, for 
an unrelated reason, but each of those failures was leaking tuple tracking info 
because... 

We don't send ticks to any system components ever...

https://github.com/apache/storm/blob/124acb92dff04a57b530ab4d95a698abc8ff46d9/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L384

and ackers are system components.
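
A minimal sketch of the kind of gate described above, for illustration only. 
TickGateSketch, isSystemId and receivesTicks are hypothetical names, not the 
code at the link, and the "__" prefix for system component ids (e.g. "__acker") 
is an assumption here.

```java
final class TickGateSketch {
    // Assumption: system component ids carry a "__" prefix, e.g. "__acker".
    static boolean isSystemId(String componentId) {
        return componentId.startsWith("__");
    }

    // Hypothetical gate mirroring the behavior described above:
    // a component only gets ticks if ticks are enabled AND it is not a system component.
    static boolean receivesTicks(String componentId, int tickFreqSecs) {
        return tickFreqSecs > 0 && !isSystemId(componentId);
    }
}
```

With a gate like this, receivesTicks("__acker", 10) is false no matter what 
tick frequency is configured.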

So the tracking map was never rotated and all failed tuples

https://github.com/apache/storm/blob/124acb92dff04a57b530ab4d95a698abc8ff46d9/storm-client/src/jvm/org/apache/storm/daemon/Acker.java#L97-L103

were never deleted from the map.
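
To show why the missing ticks matter, here is a minimal sketch of a rotating 
map, assuming entries only expire when rotate() is called on a tick. 
RotatingSketch and its methods are hypothetical names for illustration, not the 
map used in Acker.java.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a bucketed, tick-rotated map; NOT Storm's actual implementation.
final class RotatingSketch<K, V> {
    // First element is the newest bucket, last element is the oldest.
    private final Deque<Map<K, V>> buckets = new ArrayDeque<>();

    RotatingSketch(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) {
            buckets.addLast(new HashMap<>());
        }
    }

    // New or updated entries always land in the newest bucket.
    void put(K key, V value) {
        for (Map<K, V> bucket : buckets) {
            bucket.remove(key);
        }
        buckets.peekFirst().put(key, value);
    }

    // Explicit removal, e.g. when a tuple tree is fully acked.
    V remove(K key) {
        for (Map<K, V> bucket : buckets) {
            if (bucket.containsKey(key)) {
                return bucket.remove(key);
            }
        }
        return null;
    }

    // Meant to run once per tick: drop the oldest bucket and open a fresh newest one.
    Map<K, V> rotate() {
        Map<K, V> expired = buckets.removeLast();
        buckets.addFirst(new HashMap<>());
        return expired;
    }

    int size() {
        int total = 0;
        for (Map<K, V> bucket : buckets) {
            total += bucket.size();
        }
        return total;
    }
}
```

In this sketch an entry that is never explicitly removed only goes away after 
numBuckets calls to rotate().  If nothing ever calls rotate(), which is what 
happens when no ticks arrive, size() can only grow.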

This leak eventually made the ackers crash, and when they came back up the 
other components kept blasting them with messages that would never be fully 
acked, which also leaked because of the tick problem.

Looking back, this has been in every release since 0.9.1-incubating.  It appears 
to have been introduced by 
https://github.com/apache/storm/commit/483ce454a3b2cd31b5d1c34e9365346459b358a8

So every Apache release has this problem (which is the only reason I have not 
marked this as a blocker, because apparently it is not so bad that anyone has 
noticed in the past 4 years).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
