Based on your code:

    sparkContext.addFile("/home/files/data.txt");
    List<String> file = sparkContext.textFile(SparkFiles.get("data.txt")).collect();
I’m assuming the file “/home/files/data.txt” exists and is readable on the driver’s filesystem. Did you try just doing this:

    List<String> file = sparkContext.textFile("/home/files/data.txt").collect();

> On Mar 8, 2016, at 1:20 PM, Ashik Vetrivelu <vcas...@gmail.com> wrote:
>
> Hey, yeah, I also tried setting sc.textFile() with a local path, and it
> still throws the exception when trying to use collect().
>
> Sorry, I am new to Spark and I am just messing around with it.
>
> On Mar 8, 2016 10:23 PM, "Tristan Nixon" <st...@memeticlabs.org> wrote:
> My understanding of the model is that you’re supposed to execute
> SparkFiles.get(…) on each worker node, not on the driver.
>
> Since you already know where the files are on the driver, if you want to load
> them into an RDD with SparkContext.textFile, then this will distribute them
> out to the workers; there’s no need to use SparkContext.addFile to do this.
>
> If you have some functions that run on workers and expect local file
> resources, then you can use SparkContext.addFile to distribute the files into
> worker-local storage, and then execute SparkFiles.get separately on each
> worker to retrieve those local files (it will give a different path on each
> worker).
>
> > On Mar 8, 2016, at 5:31 AM, ashikvc <vcas...@gmail.com> wrote:
> >
> > I am trying to play a little bit with apache-spark cluster mode.
> > My cluster consists of a driver on my machine, and a worker and manager on a
> > separate host machine.
> >
> > I send a text file using `sparkContext.addFile(filepath)`, where filepath
> > is the path of the text file on my local machine, for which I get the
> > following output:
> >
> > INFO Utils: Copying /home/files/data.txt to
> > /tmp/spark-b2e2bb22-487b-412b-831d-19d7aa96f275/userFiles-147c9552-1a77-427e-9b17-cb0845807860/data.txt
> >
> > INFO SparkContext: Added file /home/files/data.txt at
> > http://192.XX.XX.164:58143/files/data.txt with timestamp 1457432207649
> >
> > But when I try to access the same file using `SparkFiles.get("data.txt")`, I
> > get the path to the file on my driver instead of on the worker.
> > I am setting up my file like this:
> >
> > SparkConf conf = new
> > SparkConf().setAppName("spark-play").setMaster("spark://192.XX.XX.172:7077");
> > conf.setJars(new String[]{"jars/SparkWorker.jar"});
> > JavaSparkContext sparkContext = new JavaSparkContext(conf);
> > sparkContext.addFile("/home/files/data.txt");
> > List<String> file = sparkContext.textFile(SparkFiles.get("data.txt")).collect();
> >
> > I am getting a FileNotFoundException here.
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/SparkFiles-get-returns-with-driver-path-Instead-of-Worker-Path-tp26428.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
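For anyone finding this thread later, here is a minimal sketch of the two patterns described above, in one Java program. The master URL, jar path, and file paths are placeholders taken from the thread, not verified values, and this assumes a Spark standalone cluster where the driver-local path is readable:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkFilesSketch {
    public static void main(String[] args) {
        // Placeholder master URL and jar path copied from the thread.
        SparkConf conf = new SparkConf()
                .setAppName("spark-play")
                .setMaster("spark://192.XX.XX.172:7077");
        conf.setJars(new String[]{"jars/SparkWorker.jar"});
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Pattern 1: read the file straight into an RDD. textFile itself
        // distributes the contents to the workers, so addFile is not needed.
        // (With a non-local master, the path must be reachable from the
        // executors, e.g. via a shared filesystem or HDFS.)
        List<String> lines = sc.textFile("/home/files/data.txt").collect();

        // Pattern 2: ship a file to every executor with addFile, then call
        // SparkFiles.get(...) INSIDE a closure so it executes on the workers.
        // Each task resolves the file in its own executor-local directory.
        sc.addFile("/home/files/data.txt");
        List<String> workerPaths = sc.parallelize(Arrays.asList(1, 2, 3))
                .map(x -> SparkFiles.get("data.txt"))
                .collect();

        sc.stop();
    }
}
```

The key difference from the original snippet is where SparkFiles.get runs: inside the map closure it returns each worker's local copy, whereas on the driver it returns a driver-local path that the workers cannot see, which is what produced the FileNotFoundException.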