Re: Fair scheduler pool leak

2018-04-09 Thread Imran Rashid
If I understand what you're trying to do correctly, I think you really just want one pool, but you want to change the mode *within* the pool to be FAIR as well (see https://spark.apache.org/docs/latest/job-scheduling.html#configuring-pool-properties). You'd still need to change the conf file to set up
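A minimal sketch of that suggestion (the pool name "shared" and the SparkContext variable sc are assumptions for illustration): the pool is declared with schedulingMode FAIR in the allocation file, and every worker thread routes its jobs into that one pool via a thread-local property.

    // Assumes a pool named "shared" is declared with <schedulingMode>FAIR</schedulingMode>
    // in the fairscheduler.xml referenced by spark.scheduler.allocation.file.
    sc.setLocalProperty("spark.scheduler.pool", "shared")  // thread-local: set per worker thread
    try {
      sc.parallelize(1 to 1000000).map(_ * 2).count()      // jobs from this thread land in "shared"
    } finally {
      sc.setLocalProperty("spark.scheduler.pool", null)    // unset when the thread is done
    }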

Re: Fair scheduler pool leak

2018-04-07 Thread Matthias Boehm
No, these pools are not created per job but per parfor worker and are thus used to execute many jobs. For all scripts with a single top-level parfor, this is equivalent to static initialization. However, yes, we create these pools dynamically on demand to avoid unnecessary initialization and handle
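A sketch of that per-worker pattern as described here (the class and pool names are hypothetical): each parfor worker claims one pool for its whole lifetime and submits many jobs into it, rather than creating a pool per job.

    import org.apache.spark.SparkContext

    // Hypothetical parfor worker: one pool per worker, many jobs per pool.
    class ParforWorker(sc: SparkContext, id: Int) extends Runnable {
      def run(): Unit = {
        sc.setLocalProperty("spark.scheduler.pool", s"parfor$id")  // pool is created on demand by the scheduler
        try {
          for (_ <- 1 to 100) {                                    // many short jobs in the same pool
            sc.parallelize(1 to 1000).count()
          }
        } finally {
          sc.setLocalProperty("spark.scheduler.pool", null)
        }
      }
    }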

Re: Fair scheduler pool leak

2018-04-07 Thread Matthias Boehm
well, the point was "in a programmatic way without the need for additional configuration files which is a hassle for a library" - anyway, I appreciate your comments. Regards, Matthias On Sat, Apr 7, 2018 at 3:43 PM, Mark Hamstra wrote: >> Providing a way to set the mode

Re: Fair scheduler pool leak

2018-04-07 Thread Mark Hamstra
> > Providing a way to set the mode of the default scheduler would be awesome. That's trivial: Just use the pool configuration XML file and define a pool named "default" with the characteristics that you want (including schedulingMode FAIR). You only get the default construction of the pool
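For reference, a sketch of what that could look like end to end; the temp-file trick is an assumption added here to keep it programmatic, not something proposed in the thread. The XML redefines the "default" pool with schedulingMode FAIR, and spark.scheduler.allocation.file points Spark at it before the context is created.

    import java.nio.file.Files
    import org.apache.spark.{SparkConf, SparkContext}

    // Pool configuration that redefines the "default" pool as internally FAIR.
    val poolXml =
      """<?xml version="1.0"?>
        |<allocations>
        |  <pool name="default">
        |    <schedulingMode>FAIR</schedulingMode>
        |    <weight>1</weight>
        |    <minShare>0</minShare>
        |  </pool>
        |</allocations>""".stripMargin

    // Write the configuration to a temp file so no static config file has to ship with the library.
    val allocFile = Files.createTempFile("fairscheduler", ".xml")
    Files.write(allocFile, poolXml.getBytes("UTF-8"))

    val conf = new SparkConf()
      .setMaster("local[*]").setAppName("fair-default-pool")
      .set("spark.scheduler.mode", "FAIR")                          // FAIR across pools
      .set("spark.scheduler.allocation.file", allocFile.toString)   // FAIR within "default"
    val sc = new SparkContext(conf)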

Re: Fair scheduler pool leak

2018-04-07 Thread Mark Hamstra
Sorry, but I'm still not understanding this use case. Are you somehow creating additional scheduling pools dynamically as Jobs execute? If so, that is a very unusual thing to do. Scheduling pools are intended to be statically configured -- initialized, living and dying with the Application. On

Re: Fair scheduler pool leak

2018-04-07 Thread Matthias Boehm
Thanks for the clarification Imran - that helped. I was mistakenly assuming that these pools are removed via weak references, as the ContextCleaner does for RDDs, broadcasts, accumulators, etc. For the time being, we'll just work around it, but I'll file a nice-to-have improvement JIRA. Also,
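One way such a workaround could look (purely an assumption about the approach, not necessarily what was done here): reuse a small, fixed set of pool names instead of a fresh name per thread, so only a bounded number of pools is ever created.

    import org.apache.spark.SparkContext

    // Hypothetical workaround: bound the number of distinct pool names.
    val poolNames = Array("parfor0", "parfor1", "parfor2", "parfor3")

    def runInPool[T](sc: SparkContext, workerId: Int)(body: => T): T = {
      sc.setLocalProperty("spark.scheduler.pool", poolNames(workerId % poolNames.length))
      try body
      finally sc.setLocalProperty("spark.scheduler.pool", null)
    }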

Re: Fair scheduler pool leak

2018-04-06 Thread Imran Rashid
Hi Matthias, This doesn't look possible now. It may be worth filing an improvement JIRA for this. But I'm trying to understand what you're trying to do a little better. So you intentionally have each thread create a new unique pool when it submits a job? So that pool will just get the default

Fair scheduler pool leak

2018-04-05 Thread Matthias Boehm
Hi all, for concurrent Spark jobs spawned from the driver, we use Spark's fair scheduler pools, which are set and unset in a thread-local manner by each worker thread. Typically (for rather long jobs), this works very well. Unfortunately, in an application with lots of very short parallel
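To make the failure mode concrete, here is a sketch of the pattern (the naming scheme is hypothetical): if each short-lived worker thread picks a fresh pool name, the scheduler builds a new pool for every name it sees and, as discussed in the replies above, never removes it, so pools accumulate over the lifetime of the application.

    import java.util.concurrent.atomic.AtomicLong
    import org.apache.spark.SparkContext

    val poolCounter = new AtomicLong()

    // Each invocation selects a fresh pool name; the scheduler creates the pool
    // on first use and keeps it for the lifetime of the application.
    def runShortJob(sc: SparkContext): Long = {
      val pool = "tmpPool-" + poolCounter.incrementAndGet()
      sc.setLocalProperty("spark.scheduler.pool", pool)
      try {
        sc.parallelize(1 to 100).count()
      } finally {
        // This clears the thread-local property, but the pool itself stays
        // registered in the scheduler, so pools pile up over time.
        sc.setLocalProperty("spark.scheduler.pool", null)
      }
    }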