Re: Reconnect to an application/RDD

2014-06-29 Thread Chris Fregly
Tachyon is another option - this is the off-heap StorageLevel you specify
when persisting RDDs:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.storage.StorageLevel

Or just use HDFS. This requires subsequent applications/SparkContexts to
reload the data from disk, of course.
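
For illustration, here is a minimal Scala sketch of both approaches. The
paths, the app name, and the Tachyon config are placeholders, and the
OFF_HEAP level assumes a Tachyon store is configured for the cluster:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistExample {
  def main(args: Array[String]): Unit = {
    // Assumes spark.tachyonStore.url points at a running Tachyon master
    // if you use the OFF_HEAP level below.
    val sc = new SparkContext(new SparkConf().setAppName("persist-example"))

    val rdd = sc.textFile("hdfs:///data/input")

    // Option 1: off-heap storage - blocks go to Tachyon rather than the
    // executor JVM heap.
    rdd.persist(StorageLevel.OFF_HEAP)
    rdd.count() // force materialization

    // Option 2: write to HDFS - any later SparkContext can reload it
    // with sc.textFile("hdfs:///data/shared-rdd").
    rdd.saveAsTextFile("hdfs:///data/shared-rdd")

    sc.stop()
  }
}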


On Tue, Jun 3, 2014 at 6:58 AM, Gerard Maas gerard.m...@gmail.com wrote:

 I don't think that's supported by default: when the standalone context
 closes, the related RDDs will be GC'ed.

 You should explore Spark-Job Server, which lets you cache RDDs by name
 and reuse them within a context.

 https://github.com/ooyala/spark-jobserver

 -kr, Gerard.


 On Tue, Jun 3, 2014 at 3:45 PM, Oleg Proudnikov oleg.proudni...@gmail.com
  wrote:

 Hi All,

 Is it possible to run a standalone app that would compute and
 persist/cache an RDD and then run other standalone apps that would gain
 access to that RDD?

 --
 Thank you,
 Oleg





Reconnect to an application/RDD

2014-06-03 Thread Oleg Proudnikov
Hi All,

Is it possible to run a standalone app that would compute and persist/cache
an RDD and then run other standalone apps that would gain access to that
RDD?

-- 
Thank you,
Oleg


Re: Reconnect to an application/RDD

2014-06-03 Thread Gerard Maas
I don't think that's supported by default: when the standalone context
closes, the related RDDs will be GC'ed.

You should explore Spark-Job Server, which lets you cache RDDs by name and
reuse them within a context.

https://github.com/ooyala/spark-jobserver
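
To make that concrete, here is a rough sketch of two jobserver jobs sharing
an RDD by name. The object names, the RDD name, and the input path are
placeholders; the NamedRddSupport API shown follows the project's README:

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

// First job: build an RDD and register it under a name in the context.
object BuildWords extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    val words = sc.textFile("hdfs:///data/input").flatMap(_.split("\\s+"))
    namedRdds.update("words", words) // cached and kept alive by the context
    words.count()
  }
}

// Second job, submitted later to the SAME long-lived context: look the
// RDD up by name instead of recomputing it.
object CountWords extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any =
    namedRdds.get[String]("words").map(_.count()).getOrElse(0L)
}

Note that both jobs must be submitted to a pre-created, long-lived context
(not the per-job default) for the named RDD to survive between them.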

-kr, Gerard.


On Tue, Jun 3, 2014 at 3:45 PM, Oleg Proudnikov oleg.proudni...@gmail.com
wrote:

 Hi All,

 Is it possible to run a standalone app that would compute and
 persist/cache an RDD and then run other standalone apps that would gain
 access to that RDD?

 --
 Thank you,
 Oleg