Can anybody answer this? Do I have to have HDFS to achieve this?
Regards,
Ningjun Wang
Consulting Software Engineer
LexisNexis
121 Chanlon Road
New Providence, NJ 07974-1541
From: Wang, Ningjun (LNG-NPV) [mailto:ningjun.w...@lexisnexis.com]
Sent: Friday, January 16, 2015 1:15 PM
To: Imran
If the dataset is not huge (a few GB), you can set up NFS instead of
HDFS (which is much harder to set up):
1. export a directory on the master (or any node in the cluster)
2. mount it at the same path on all slaves
3. read/write from it via file:///path/to/mountpoint
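The steps above can be sketched as shell commands. This is a minimal sketch, assuming the master's hostname is `master` and the shared directory is `/srv/spark-share` (both hypothetical names; adjust to your cluster):

```shell
# Step 1 -- on the master (NFS server): export the shared directory.
# Append an export rule and reload the export table.
echo '/srv/spark-share *(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra

# Step 2 -- on every slave (NFS client): mount it at the SAME path.
sudo mkdir -p /srv/spark-share
sudo mount -t nfs master:/srv/spark-share /srv/spark-share

# Step 3 -- Spark jobs on any node can now address the shared path as:
#   file:///srv/spark-share/my-rdd
```

Because every node sees the same directory at the same path, a `file://` URI behaves like shared storage from Spark's point of view.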
On Tue, Jan 20, 2015 at
I don’t think it will work without HDFS.
Mohammed
From: Wang, Ningjun (LNG-NPV) [mailto:ningjun.w...@lexisnexis.com]
Sent: Tuesday, January 20, 2015 7:55 AM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: RE: Can I save RDD to local file system and then read it back on spark
I'm not positive, but I think this is very unlikely to work.
First, when you call sc.objectFile(...), I think the *driver* will need to
know something about the file, e.g. to know how many tasks to create. But it
won't even be able to see the file, since it only lives on the local
filesystem of the nodes that wrote it.
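To illustrate the failure mode described above, here is a hedged sketch in Scala (the path and app name are hypothetical). Saving with a `file://` URI writes each partition to the local disk of whichever executor held it, so a later read cannot see all the part files unless the path is on storage shared by every node:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalFileRoundTrip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("local-file-roundtrip"))

    val rdd = sc.parallelize(1 to 100, numSlices = 4)

    // Each executor writes its partitions to ITS OWN local disk.
    // Unless /tmp/my-rdd is on shared storage (NFS, HDFS, ...),
    // the part files end up scattered across machines.
    rdd.saveAsObjectFile("file:///tmp/my-rdd")

    // On read, the driver must list the directory to plan tasks, and
    // the executors scheduled to read may be different machines; they
    // will fail to find part files that live on other nodes' disks.
    val restored = sc.objectFile[Int]("file:///tmp/my-rdd")
    println(restored.count())

    sc.stop()
  }
}
```

With the NFS approach suggested earlier in the thread, the same code works unchanged, because `file:///path/to/mountpoint/...` resolves to the same shared directory on every node.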
I need to save an RDD to the file system and then restore that RDD from the
file system later. I don’t have an HDFS file system and don’t want to go
through the hassle of setting one up. So how can I achieve this? The
application needs to run on a cluster with multiple nodes.
Regards,