Re: life if an executor

2014-05-20 Thread Koert Kuipers
if they are tied to the spark context, then why can the subprocess not be
started up with the extra jars (sc.addJars) already on the classpath? that way
a switch like user-jars-first would be a simple rearranging of the classpath
for the subprocess, and the messing with classloaders that is currently done
in the executor (which forces people to use reflection in certain situations
and is broken if you want user jars first) would be history.
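
a rough sketch of the idea in scala (illustrative only, not spark's actual
executor launcher; the point is just that ordering the subprocess classpath
would replace the custom classloader logic):

  // hypothetical helper: build the executor subprocess classpath with the
  // user jars (added via sc.addJar) placed before the spark jars, so no
  // special classloader is needed inside the executor jvm
  def executorClassPath(userJars: Seq[String], sparkJars: Seq[String],
                        userJarsFirst: Boolean): String = {
    val ordered = if (userJarsFirst) userJars ++ sparkJars else sparkJars ++ userJars
    ordered.mkString(java.io.File.pathSeparator) // handed to java -cp for the subprocess
  }
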
On May 20, 2014 1:07 AM, Matei Zaharia matei.zaha...@gmail.com wrote:

 They’re tied to the SparkContext (application) that launched them.

 Matei

 On May 19, 2014, at 8:44 PM, Koert Kuipers ko...@tresata.com wrote:

 from looking at the source code i see executors run in their own jvm
 subprocesses.

 how long do they live for? as long as the worker/slave? or are they tied
 to the sparkcontext and live/die with it?

 thx





Re: life if an executor

2014-05-20 Thread Koert Kuipers
just for my clarification: off-heap storage cannot hold java objects, correct?
so we are always talking about serialized off-heap storage?
On May 20, 2014 1:27 AM, Tathagata Das tathagata.das1...@gmail.com
wrote:

 That's one of the main motivations for using Tachyon ;)
 http://tachyon-project.org/

 It gives off-heap in-memory caching. And starting with Spark 0.9, you can cache
 any RDD in Tachyon just by specifying the appropriate StorageLevel.

 TD




 On Mon, May 19, 2014 at 10:22 PM, Mohit Jaggi mohitja...@gmail.comwrote:

 I guess it needs to be this way to benefit from caching of RDDs in
 memory. It would be nice, however, if the RDD cache could be dissociated from
 the JVM heap so that, in cases where garbage collection is difficult to
 tune, one could choose to discard the JVM and run the next operation in a
 fresh one.


 On Mon, May 19, 2014 at 10:06 PM, Matei Zaharia 
 matei.zaha...@gmail.comwrote:

 They’re tied to the SparkContext (application) that launched them.

 Matei

 On May 19, 2014, at 8:44 PM, Koert Kuipers ko...@tresata.com wrote:

 from looking at the source code i see executors run in their own jvm
 subprocesses.

 how long do they live for? as long as the worker/slave? or are they tied
 to the sparkcontext and live/die with it?

 thx







Re: life if an executor

2014-05-20 Thread Aaron Davidson
One issue is that new jars can be added during the lifetime of a
SparkContext, which can mean after executors are already started. Off-heap
storage is always serialized, correct.
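
For concreteness, a minimal sketch (the master URL and jar path are illustrative):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf().setAppName("example").setMaster("spark://master:7077")
  val sc = new SparkContext(conf)
  sc.parallelize(1 to 10).count()      // executors for this application are already running
  sc.addJar("/path/to/extra-lib.jar")  // added mid-application; fetched by those executors
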


On Tue, May 20, 2014 at 6:48 AM, Koert Kuipers ko...@tresata.com wrote:

 just for my clarification: off-heap storage cannot hold java objects, correct?
 so we are always talking about serialized off-heap storage?
 On May 20, 2014 1:27 AM, Tathagata Das tathagata.das1...@gmail.com
 wrote:

 That's one of the main motivations for using Tachyon ;)
 http://tachyon-project.org/

 It gives off-heap in-memory caching. And starting with Spark 0.9, you can
 cache any RDD in Tachyon just by specifying the appropriate StorageLevel.

 TD




 On Mon, May 19, 2014 at 10:22 PM, Mohit Jaggi mohitja...@gmail.comwrote:

 I guess it needs to be this way to benefit from caching of RDDs in
 memory. It would be nice, however, if the RDD cache could be dissociated from
 the JVM heap so that, in cases where garbage collection is difficult to
 tune, one could choose to discard the JVM and run the next operation in a
 fresh one.


 On Mon, May 19, 2014 at 10:06 PM, Matei Zaharia matei.zaha...@gmail.com
  wrote:

 They’re tied to the SparkContext (application) that launched them.

 Matei

 On May 19, 2014, at 8:44 PM, Koert Kuipers ko...@tresata.com wrote:

 from looking at the source code i see executors run in their own jvm
 subprocesses.

 how long do they live for? as long as the worker/slave? or are they
 tied to the sparkcontext and live/die with it?

 thx







Re: life if an executor

2014-05-20 Thread Koert Kuipers
interesting, so it sounds to me like spark is forced to choose between the
ability to add jars during an application's lifetime and the ability to run
tasks with the user classpath first (which is important for running jobs on
spark clusters not under your control, and so for the viability of 3rd party
spark apps).
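
for reference, a sketch of what the user-classpath-first switch looks like as
plain configuration; note the property name has varied across spark versions
(e.g. spark.executor.userClassPathFirst in later releases), so check the docs
for the version you run:

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("third-party-app")
    .set("spark.executor.userClassPathFirst", "true") // experimental switch
  val sc = new SparkContext(conf)
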


On Tue, May 20, 2014 at 1:06 PM, Aaron Davidson ilike...@gmail.com wrote:

 One issue is that new jars can be added during the lifetime of a
 SparkContext, which can mean after executors are already started. Off-heap
 storage is always serialized, correct.


 On Tue, May 20, 2014 at 6:48 AM, Koert Kuipers ko...@tresata.com wrote:

 just for my clarification: off-heap storage cannot hold java objects, correct?
 so we are always talking about serialized off-heap storage?
 On May 20, 2014 1:27 AM, Tathagata Das tathagata.das1...@gmail.com
 wrote:

 That's one of the main motivations for using Tachyon ;)
 http://tachyon-project.org/

 It gives off-heap in-memory caching. And starting with Spark 0.9, you can
 cache any RDD in Tachyon just by specifying the appropriate StorageLevel.

 TD




 On Mon, May 19, 2014 at 10:22 PM, Mohit Jaggi mohitja...@gmail.comwrote:

 I guess it needs to be this way to benefit from caching of RDDs in
 memory. It would be nice, however, if the RDD cache could be dissociated from
 the JVM heap so that, in cases where garbage collection is difficult to
 tune, one could choose to discard the JVM and run the next operation in a
 fresh one.


 On Mon, May 19, 2014 at 10:06 PM, Matei Zaharia 
 matei.zaha...@gmail.com wrote:

 They’re tied to the SparkContext (application) that launched them.

 Matei

 On May 19, 2014, at 8:44 PM, Koert Kuipers ko...@tresata.com wrote:

 from looking at the source code i see executors run in their own jvm
 subprocesses.

 how long do they live for? as long as the worker/slave? or are they
 tied to the sparkcontext and live/die with it?

 thx








life if an executor

2014-05-19 Thread Koert Kuipers
from looking at the source code i see executors run in their own jvm
subprocesses.

how long do they live for? as long as the worker/slave? or are they tied to
the sparkcontext and live/die with it?

thx


Re: life if an executor

2014-05-19 Thread Matei Zaharia
They’re tied to the SparkContext (application) that launched them.

Matei
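
To illustrate (a trivial sketch; the master URL is just a placeholder):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf().setAppName("lifetime-demo").setMaster("spark://master:7077")
  val sc = new SparkContext(conf)  // executors are launched for this application
  sc.parallelize(1 to 100).count() // and run its jobs
  sc.stop()                        // stopping the context ends the application and its executors
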

On May 19, 2014, at 8:44 PM, Koert Kuipers ko...@tresata.com wrote:

 from looking at the source code i see executors run in their own jvm 
 subprocesses.
 
 how long do they live for? as long as the worker/slave? or are they tied to
 the sparkcontext and live/die with it?
 
 thx



Re: life if an executor

2014-05-19 Thread Mohit Jaggi
I guess it needs to be this way to benefit from caching of RDDs in
memory. It would be nice, however, if the RDD cache could be dissociated from
the JVM heap so that, in cases where garbage collection is difficult to
tune, one could choose to discard the JVM and run the next operation in a
fresh one.
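
As a partial mitigation that already exists, the cache can be kept serialized
even on-heap, which leaves the garbage collector far fewer live objects to
track. A small sketch, with `records` standing in for any RDD:

  import org.apache.spark.storage.StorageLevel

  val cached = records.persist(StorageLevel.MEMORY_ONLY_SER) // serialized, on-heap cache
  cached.count()                                             // materializes the cache
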


On Mon, May 19, 2014 at 10:06 PM, Matei Zaharia matei.zaha...@gmail.comwrote:

 They’re tied to the SparkContext (application) that launched them.

 Matei

 On May 19, 2014, at 8:44 PM, Koert Kuipers ko...@tresata.com wrote:

 from looking at the source code i see executors run in their own jvm
 subprocesses.

 how long do they live for? as long as the worker/slave? or are they tied
 to the sparkcontext and live/die with it?

 thx





Re: life if an executor

2014-05-19 Thread Tathagata Das
That's one of the main motivations for using Tachyon ;)
http://tachyon-project.org/

It gives off-heap in-memory caching. And starting with Spark 0.9, you can cache
any RDD in Tachyon just by specifying the appropriate StorageLevel.

TD
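
In code it looks roughly like this (a sketch; `sc` is an existing SparkContext,
the path is illustrative, and the exact storage level constant can differ by
version; in later releases it is StorageLevel.OFF_HEAP):

  import org.apache.spark.storage.StorageLevel

  val data = sc.textFile("hdfs:///some/path") // any RDD works
  data.persist(StorageLevel.OFF_HEAP)         // cached serialized, outside the executor heap
  data.count()                                // materializes the off-heap cache
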




On Mon, May 19, 2014 at 10:22 PM, Mohit Jaggi mohitja...@gmail.com wrote:

 I guess it needs to be this way to benefit from caching of RDDs in
 memory. It would be nice, however, if the RDD cache could be dissociated from
 the JVM heap so that, in cases where garbage collection is difficult to
 tune, one could choose to discard the JVM and run the next operation in a
 fresh one.


 On Mon, May 19, 2014 at 10:06 PM, Matei Zaharia 
 matei.zaha...@gmail.comwrote:

 They’re tied to the SparkContext (application) that launched them.

 Matei

 On May 19, 2014, at 8:44 PM, Koert Kuipers ko...@tresata.com wrote:

 from looking at the source code i see executors run in their own jvm
 subprocesses.

 how long do they live for? as long as the worker/slave? or are they tied
 to the sparkcontext and live/die with it?

 thx