Based on your code:

sparkContext.addFile("/home/files/data.txt");
List<String> file = sparkContext.textFile(SparkFiles.get("data.txt")).collect();

I’m assuming the file “/home/files/data.txt” exists and is readable on the
driver’s filesystem.
Did you try just doing this:

List<String> file = sparkContext.textFile("/home/files/data.txt").collect();
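
If you do need a local copy of the file on each worker, the usual pattern is to call addFile on the driver and then resolve the path with SparkFiles.get *inside* the task closure, so the lookup runs on the worker. A minimal sketch (the app name, the parallelized dummy range, and the line-counting lambda are illustrative, not from your code):

```java
import java.util.Arrays;
import java.util.List;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkFilesSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark-files-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Ship the driver-local file to every executor's scratch directory.
        sc.addFile("/home/files/data.txt");

        // Resolve the path inside the task closure: SparkFiles.get then
        // executes on the worker and returns that worker's local copy.
        JavaRDD<Integer> lineCounts = sc.parallelize(Arrays.asList(1, 2, 3))
            .map(i -> {
                String localPath = SparkFiles.get("data.txt");
                List<String> lines = Files.readAllLines(Paths.get(localPath));
                return lines.size();
            });

        System.out.println(lineCounts.collect());
        sc.stop();
    }
}
```

Calling SparkFiles.get on the driver (outside any closure) returns the driver-side copy’s path, which is exactly why feeding it into textFile fails on the workers.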

> On Mar 8, 2016, at 1:20 PM, Ashik Vetrivelu <vcas...@gmail.com> wrote:
> 
> Hey, yeah I also tried by setting sc.textFile() with a local path and it 
> still throws the exception when trying to use collect().
> 
> Sorry I am new to spark and I am just messing around with it.
> 
> On Mar 8, 2016 10:23 PM, "Tristan Nixon" <st...@memeticlabs.org> wrote:
> My understanding of the model is that you’re supposed to execute 
> SparkFiles.get(…) on each worker node, not on the driver.
> 
> Since you already know where the file is on the driver, you can load it 
> into an RDD with SparkContext.textFile, which will distribute it out to 
> the workers; there’s no need to use SparkContext.addFile for that.
> 
> If you have functions that run on workers and expect local file resources, 
> you can use SparkContext.addFile to distribute the files into each worker’s 
> local storage, then call SparkFiles.get separately on each worker to 
> retrieve the local copy (it will return a different path on each worker).
> 
> > On Mar 8, 2016, at 5:31 AM, ashikvc <vcas...@gmail.com> wrote:
> >
> > I am trying to play a little bit with apache-spark cluster mode.
> > My cluster consists of a driver on my machine, and a worker and manager on a
> > host machine (a separate machine).
> >
> > I send a text file using `sparkContext.addFile(filepath)`, where filepath
> > is the path of my text file on the local machine, for which I get the
> > following output:
> >
> >    INFO Utils: Copying /home/files/data.txt to
> > /tmp/spark-b2e2bb22-487b-412b-831d-19d7aa96f275/userFiles-147c9552-1a77-427e-9b17-cb0845807860/data.txt
> >
> >    INFO SparkContext: Added file /home/files/data.txt at
> > http://192.XX.XX.164:58143/files/data.txt with timestamp 1457432207649
> >
> > But when I try to access the same file using `SparkFiles.get("data.txt")`, I
> > get the path to the file on my driver instead of on the worker.
> > I am setting up my file like this:
> >
> >    SparkConf conf = new SparkConf().setAppName("spark-play")
> >        .setMaster("spark://192.XX.XX.172:7077");
> >    conf.setJars(new String[]{"jars/SparkWorker.jar"});
> >    JavaSparkContext sparkContext = new JavaSparkContext(conf);
> >    sparkContext.addFile("/home/files/data.txt");
> >    List<String> file = sparkContext.textFile(SparkFiles.get("data.txt")).collect();
> > I am getting a FileNotFoundException here.
> >
> >
> >
> >
> >
> > --
> > View this message in context: 
> > http://apache-spark-user-list.1001560.n3.nabble.com/SparkFiles-get-returns-with-driver-path-Instead-of-Worker-Path-tp26428.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> 
