No, the current RDD persistence mechanism does not support putting data on HDFS. Persisted blocks go under the local directories configured by spark.local.dir.
Instead you can use checkpoint() to save the RDD on HDFS.

Thanks
Jerry

From: Larry Liu [mailto:larryli...@gmail.com]
Sent: Monday, January 26, 2015 3:08 PM
To: Charles Feduke
Cc: u...@spark.incubator.apache.org
Subject: Re: where storagelevel DISK_ONLY persists RDD to

Hi, Charles

Thanks for your reply. Is it possible to persist an RDD to HDFS? What is the default location to persist an RDD with storage level DISK_ONLY?

On Sun, Jan 25, 2015 at 6:26 AM, Charles Feduke <charles.fed...@gmail.com> wrote:

I think you want to instead use `.saveAsSequenceFile` to save an RDD to someplace like HDFS or NFS if you are attempting to interoperate with another system, such as Hadoop. `.persist` is for keeping the contents of an RDD around so future uses of that particular RDD don't need to recalculate its composite parts.

On Sun Jan 25 2015 at 3:36:31 AM Larry Liu <larryli...@gmail.com> wrote:

I would like to persist an RDD to HDFS or an NFS mount. How do I change the location?