*Tl;dr: *SparkContext.setLocalProperty is implemented with
InheritableThreadLocal.
This has unexpected consequences, not least because the method
documentation doesn't say anything about it:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L605

I'd like to propose that we do one of: (1) document explicitly that these
properties are inheritable; (2) stop them being inheritable; or (3)
introduce the option to set these in a non-inheritable way.

*Motivation: *This started with me investigating a last vestige of the
leaking spark.sql.execution.id issue in Spark 1.5.2 (it's not reproducible
under controlled conditions, and given the many and excellent fixes on this
issue it's completely mysterious that this hangs around; the bug itself is
largely beside the point).

The specific contribution that inheritable localProperties makes to this
problem is that if a localProperty like spark.sql.execution.id leaks (i.e.
remains set when it shouldn't) because those properties are inherited by
spawned threads, that pollution affects all subsequently spawned threads.

This doesn't sound like a big deal - why would worker threads be spawning
other threads? It turns out that Java's ThreadPoolExecutor has worker
threads spawn other worker threads (it has no master dispatcher thread; the
workers themselves run all the housekeeping). JavaDoc here:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
and source code here:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/concurrent/ThreadPoolExecutor.java#ThreadPoolExecutor

Accordingly, if using Scala Futures and any kind of thread pool that comes
built-in with Java, it's impossible to avoid localproperties propagating
haphazardly to different threads. For localProperties explicitly set by
user code this isn't nice, and requires work arounds like explicitly
clearing known properties at the start of every future, or in a
beforeExecute hook on the threadpool. For leaky properties the work around
is pretty much the same: defensively clear them in the threadpool.

*Options:*
(0) Do nothing at all. Unattractive, because documenting this would still
be better;
(1) Update the scaladoc to explicitly say that localProperties are
inherited by spawned threads and note that caution should be exercised with
thread pools.
(2) Switch to using ordinary, non-inheritable thread locals. I assume this
would break something for somebody, but if not, this would be my preferred
option. Also a very simple change to implement if no-one is relying on
property inheritance.
(3) Introduce a second localProperty facility which is not inherited. This
would not break any existing code, and should not be too hard to implement.
localProperties which need cleanup could be migrated to using this
non-inheritable facility, helping to limit the impact of failing to clean
up.
The way I envisage this working is that non-inheritable localProperties
would be checked first, then inheritable, then global properties.

*Actions:*
I'm happy to do the coding and open such Jira tickets as desirable or
necessary. Before I do any of that, I'd like to know if there's any support
for this, and ideally secure a committer who can help shepherd this change
through.

Marcin Tustin

-- 
Want to work at Handy? Check out our culture deck and open roles 
<http://www.handy.com/careers>
Latest news <http://www.handy.com/press> at Handy
Handy just raised $50m 
<http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
 led 
by Fidelity

Reply via email to