Hi Matthias,

This doesn't look possible now.  It may be worth filing an improvement
jira for it.

But I'm trying to understand what you're trying to do a little better.  So
you intentionally have each thread create a new unique pool when it
submits a job?  That pool will then just get the default pool configuration,
and you will see lots of these messages in your logs?

https://github.com/apache/spark/blob/6ade5cbb498f6c6ea38779b97f2325d5cf5013f2/core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala#L196-L200
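To make sure we're talking about the same pattern, here's roughly what I
think you're doing (just my own sketch of it -- the pool naming, threading
setup, and object name are my assumptions, not your actual code):

    import java.util.UUID
    import org.apache.spark.sql.SparkSession

    object PoolPerThreadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("pool-per-thread-sketch")
          .master("local[*]")
          .config("spark.scheduler.mode", "FAIR")
          .getOrCreate()
        val sc = spark.sparkContext

        // One short-lived worker thread per parallel section, each using a
        // pool name that isn't declared in any fair scheduler XML file.
        val worker = new Thread(new Runnable {
          override def run(): Unit = {
            val poolName = s"pool-${UUID.randomUUID()}"
            // The scheduler builds this pool on the fly with the default
            // config (FIFO, minShare 0, weight 1) and logs the warning
            // linked above.
            sc.setLocalProperty("spark.scheduler.pool", poolName)
            try {
              sc.parallelize(1 to 1000).count() // a short parallel section
            } finally {
              // Clears only the thread-local property; the pool itself
              // stays registered under rootPool.
              sc.setLocalProperty("spark.scheduler.pool", null)
            }
          }
        })
        worker.start()
        worker.join()
      }
    }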

What is the use case for creating pools this way?

Also, if I understand correctly, it doesn't even matter if the thread dies
-- that pool will still stay around, as the rootPool will retain a
reference to it (the pools aren't actually tied to specific
threads).
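
A quick way to see that from the driver, after the worker threads have
exited, would be something like this (a rough sketch continuing from the
snippet above, using the @DeveloperApi pool accessors on SparkContext):

    // Nothing ever removes the per-thread pools from rootPool, so this
    // count keeps growing with every unique pool name that was ever used.
    println(s"pools still registered: ${sc.getAllPools.size}")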

Imran

On Thu, Apr 5, 2018 at 9:46 PM, Matthias Boehm <mboe...@gmail.com> wrote:

> Hi all,
>
> for concurrent Spark jobs spawned from the driver, we use Spark's fair
> scheduler pools, which are set and unset in a thread-local manner by
> each worker thread. Typically (for rather long jobs), this works very
> well. Unfortunately, in an application with lots of very short
> parallel sections, we see 1000s of these pools remaining in the Spark
> UI, which indicates some kind of leak. Each worker cleans up its local
> property by setting it to null, but not all pools are properly
> removed. I've checked and reproduced this behavior with Spark 2.1-2.3.
>
> Now my question: Is there a way to explicitly remove these pools,
> either globally, or locally while the thread is still alive?
>
> Regards,
> Matthias
>
