I have promoted https://issues.apache.org/jira/browse/SPARK-9202 to a blocker to ensure that we get a fix for it before 1.5.0. I'm pretty swamped with other tasks for the next few days, but I'd be happy to shepherd a bugfix patch for this (it should be pretty straightforward, and the JIRA ticket contains a sketch of how I'd do it).
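As context for the kind of fix under discussion, here is a minimal, hypothetical sketch of what capping retained finished executors in the Worker could look like. The FinishedExecutorLog class, the retainedExecutors parameter, and the trimming logic are illustrative assumptions, not the actual patch or the ticket's sketch:

    import scala.collection.mutable

    // Simplified stand-in for the Worker's bookkeeping of finished
    // executors; the real map lives in
    // org.apache.spark.deploy.worker.Worker.
    case class ExecutorSummary(appId: String, execId: Int, state: String)

    class FinishedExecutorLog(retainedExecutors: Int) {
      // LinkedHashMap preserves insertion order, so the oldest entries
      // are the first ones seen when trimming.
      private val finished =
        mutable.LinkedHashMap.empty[String, ExecutorSummary]

      def add(fullId: String, summary: ExecutorSummary): Unit = {
        finished(fullId) = summary
        // Drop the oldest entries once the cap is exceeded, so the map
        // can no longer grow without bound on a long-lived Worker.
        if (finished.size > retainedExecutors) {
          val excess = finished.size - retainedExecutors
          finished.keys.take(excess).toList.foreach(finished.remove)
        }
      }

      // Read path, analogous to what the Worker web UI and REST API use.
      def snapshot: Seq[ExecutorSummary] = finished.values.toSeq
    }

In a real patch the cap would presumably come from SparkConf with a sensible default (something like a spark.worker.ui.retainedExecutors setting, to speculate on a name) rather than being hard-wired by the caller.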
On Mon, Jul 20, 2015 at 12:37 PM, Richard Marscher <rmarsc...@localytics.com> wrote:

> Hi,
>
> Thanks for the follow-up. You are right regarding the invalidation of
> observation #2. I later realized the Worker UI page directly displays
> the entries in the executors map, and I can see in our production UI
> that it's in a proper state.
>
> As for the Killed vs. Exited question, it's less relevant now that the
> theory about the executors map has been invalidated. However, to answer
> your question: the current setup is that the SparkContext lifecycle
> encapsulates exactly one application. That is, we create a single
> context per application submitted and close it upon success/failure
> completion of the application.
>
> Thanks,
>
> On Mon, Jul 20, 2015 at 3:20 PM, Josh Rosen <joshro...@databricks.com>
> wrote:
>
>> Hi Richard,
>>
>> Thanks for your detailed investigation of this issue. I agree with
>> your observation that the finishedExecutors hashmap is a source of
>> memory leaks for very-long-lived clusters. It looks like the
>> finishedExecutors map is only read when rendering the Worker web UI
>> and when constructing REST API responses. I think that we could
>> address this leak by adding a configuration to cap the maximum number
>> of retained executors, applications, etc. We already have similar
>> caps in the driver UI. If we add this configuration, I think that we
>> should pick some sensible default value rather than an unlimited one.
>> This is technically a user-facing behavior change, but I think it's
>> okay since the current behavior is to crash / OOM.
>>
>> Regarding `KillExecutor`, I think that there might be some asynchrony
>> and indirection masking the cleanup here. Based on a quick glance
>> through the code, it looks like ExecutorRunner's thread will send an
>> ExecutorStateChanged RPC back to the Worker after the executor is
>> killed, so I think that the cleanup will be triggered by that RPC.
>> Since this isn't clear from reading the code, though, it would be
>> great to add some comments to the code to explain this, plus a unit
>> test to make sure that this indirect cleanup mechanism isn't broken
>> in the future.
>>
>> I'm not sure what's causing the Killed vs. Exited issue, but I have
>> one theory: does the behavior vary based on whether your application
>> cleanly shuts down the SparkContext via SparkContext.stop()? It's
>> possible that omitting the stop() could lead to a "Killed" exit
>> status, but I don't know for sure. (This could probably also be
>> clarified with a unit test.)
>>
>> To my knowledge, the spark-perf suite does not contain the sort of
>> scale-testing workload that would expose these types of memory leaks;
>> we have some tests for very long-lived individual applications, but
>> not tests for long-lived clusters that run thousands of applications
>> between restarts. I'm going to create some tickets to add such tests.
>>
>> I've filed https://issues.apache.org/jira/browse/SPARK-9202 to follow
>> up on the finishedExecutors leak.
>>
>> - Josh
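As a concrete reference for the stop() question above, a context-per-application lifecycle like the one described in this thread typically looks something like the sketch below. The app name, master URL, and job body are placeholders, and whether omitting stop() actually produces the "Killed" status is, per the thread, unverified:

    import org.apache.spark.{SparkConf, SparkContext}

    object SingleAppLifecycle {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("example-app")              // placeholder
          .setMaster("spark://master-host:7077")  // placeholder standalone URL
        val sc = new SparkContext(conf)
        try {
          // One SparkContext per submitted application; the job body
          // here is a trivial stand-in.
          sc.parallelize(1 to 100).map(_ * 2).count()
        } finally {
          // Stopping the context on both success and failure is the
          // "clean shutdown" case asked about above; skipping it is the
          // case that might yield a "Killed" final state.
          sc.stop()
        }
      }
    }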
>> On Mon, Jul 20, 2015 at 9:56 AM, Richard Marscher
>> <rmarsc...@localytics.com> wrote:
>>
>>> Hi,
>>>
>>> We have been experiencing issues in production over the past couple
>>> of weeks with Spark Standalone Worker JVMs that appear to have
>>> memory leaks. They accumulate Old Gen until it reaches the max, then
>>> enter a failed state that starts critically failing some
>>> applications running against the cluster.
>>>
>>> I've done some exploration of the Spark code base related to the
>>> Worker in search of potential sources of this problem, and I'm
>>> looking for commentary on a couple of theories I have:
>>>
>>> Observation 1: The `finishedExecutors` HashMap seems to only
>>> accumulate new entries over time, unbounded. It only seems to be
>>> appended to and is never periodically purged or cleaned of older
>>> executors, in line with something like the worker cleanup scheduler.
>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala#L473
>>>
>>> I feel somewhat confident that over time this will exhibit a "leak".
>>> I quote the word because it may be intentional to hold these
>>> references to support functionality, versus a true leak where you
>>> accidentally hold onto memory.
>>>
>>> Observation 2: I feel much less certain about this, but it seems
>>> like if the Worker is messaged with `KillExecutor` then it only
>>> kills the `ExecutorRunner` but does not clean it up from the
>>> executors map.
>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala#L492
>>>
>>> I haven't been able to sort out whether I'm missing something
>>> indirect that cleans that executor from the map before or after the
>>> kill. However, if nothing does, then this map may be leaking
>>> references.
>>>
>>> One final observation, related to our production metrics rather than
>>> the codebase itself: we used to periodically see that our completed
>>> applications had the status of "Killed" instead of "Exited" for all
>>> executors. Now, however, we see every completed application finish
>>> with a state of "Killed" for all executors. I might speculatively
>>> correlate this with Observation 2 as a potential reason we have
>>> started seeing this issue more recently.
>>>
>>> We also have a larger and increasing workload over the past few
>>> weeks, and possibly code changes to the application description,
>>> that could be exacerbating these potential underlying issues. We run
>>> a lot of smaller applications per day, on the order of hundreds to
>>> maybe 1,000 applications per day with 16 executors per application.
>>>
>>> Thanks,
>>>
>>> --
>>> Richard Marscher
>>> Software Engineer
>>> Localytics
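To make the indirect cleanup path discussed in this thread concrete, here is a heavily simplified, self-contained model of the flow Josh describes; none of these types match the real Worker internals, and the real message passing is asynchronous RPC rather than a direct callback:

    import scala.collection.mutable

    object WorkerCleanupModel {
      sealed trait ExecutorState
      case object Running extends ExecutorState
      case object Killed extends ExecutorState

      // Toy stand-in for ExecutorRunner: killing it reports a state
      // change back to the Worker, as the real runner's thread does via
      // an ExecutorStateChanged RPC.
      final case class Runner(fullId: String) {
        def kill(onStateChanged: (String, ExecutorState) => Unit): Unit =
          onStateChanged(fullId, Killed)
      }

      val executors = mutable.HashMap.empty[String, Runner]
      val finishedExecutors = mutable.HashMap.empty[String, ExecutorState]

      // Analogue of handling a KillExecutor message: note the map entry
      // is *not* removed here...
      def handleKillExecutor(fullId: String): Unit =
        executors.get(fullId).foreach(_.kill(handleExecutorStateChanged))

      // ...it is removed here, when the state-change message arrives.
      def handleExecutorStateChanged(fullId: String,
                                     state: ExecutorState): Unit =
        executors.remove(fullId).foreach { _ =>
          finishedExecutors(fullId) = state
        }

      def main(args: Array[String]): Unit = {
        executors("app-1/0") = Runner("app-1/0")
        handleKillExecutor("app-1/0")
        assert(executors.isEmpty)
        assert(finishedExecutors("app-1/0") == Killed)
      }
    }

This also illustrates why the unit test suggested above is useful: nothing in handleKillExecutor itself makes the cleanup obvious from local reading.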