All (three) of them. It's kind of cool -- when I re-run collect(), a different executor shows up as the first to encounter the error.
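(A minimal sketch of one way to check this, assuming the /home/peter/datashare sshfs mount point described further down the thread: spread a few throwaway tasks across the cluster and have each executor report which files it can actually see at that path. The helper names here are illustrative, not from the thread.)

import java.io.File
import java.net.InetAddress

// Assumed mount point from the messages below; change to your own path.
val dir = "/home/peter/datashare"

// With enough partitions the tasks should land on every executor,
// though Spark's scheduling gives no hard guarantee of that.
val visibility = sc.parallelize(1 to 100, sc.defaultParallelism)
  .map { _ =>
    val host = InetAddress.getLocalHost.getHostName
    val files = Option(new File(dir).listFiles())
      .map(_.map(_.getName).sorted.mkString(", "))
      .getOrElse("<directory not visible>")
    (host, files)
  }
  .distinct()
  .collect()

visibility.foreach { case (host, files) => println(s"$host: $files") }
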
On Wed, Sep 7, 2016 at 8:20 PM, ayan guha <guha.a...@gmail.com> wrote:

> Hi
>
> Is it happening on all executors or one?
>
> On Thu, Sep 8, 2016 at 10:46 AM, Peter Figliozzi <pete.figlio...@gmail.com> wrote:
>
>> Yes indeed (see below). Just to reiterate, I am not running Hadoop. The
>> "curly" node name mentioned in the stack trace is the name of one of the
>> worker nodes. I've mounted the same directory "datashare", containing two
>> text files, on all worker nodes with sshfs. The Spark documentation
>> suggests that this should work:
>>
>> *If using a path on the local filesystem, the file must also be
>> accessible at the same path on worker nodes. Either copy the file to all
>> workers or use a network-mounted shared file system.*
>>
>> I was hoping someone else could try this and see if it works.
>>
>> Here's what I did to generate the error:
>>
>> val data = sc.textFile("file:///home/peter/datashare/*.txt")
>> data.collect()
>>
>> It's working to some extent: if I put a bogus path in, I get a different
>> (correct) error (InvalidInputException: Input Pattern
>> file:/home/peter/ddatashare/*.txt matches 0 files).
>>
>> Here's the stack trace when I use a valid path:
>>
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task
>> 1 in stage 18.0 failed 4 times, most recent failure: Lost task 1.3 in stage
>> 18.0 (TID 792, curly): java.io.FileNotFoundException: File
>> file:/home/peter/datashare/f1.txt does not exist
>> at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
>> at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
>> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
>> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
>> at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
>> at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
>> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
>> at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
>> at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
>> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:246)
>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>> at org.apache.spark.scheduler.Task.run(Task.scala:85)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>>
>> On Wed, Sep 7, 2016 at 9:50 AM, Yong Zhang <java8...@hotmail.com> wrote:
>>
>>> What error do you get? FileNotFoundException?
>>>
>>> Please paste the stack trace here.
>>>
>>> Yong
>>>
>>> ------------------------------
>>> *From:* Peter Figliozzi <pete.figlio...@gmail.com>
>>> *Sent:* Wednesday, September 7, 2016 10:18 AM
>>> *To:* ayan guha
>>> *Cc:* Lydia Ickler; user.spark
>>> *Subject:* Re: distribute work (files)
>>>
>>> That's failing for me. Can someone please try this -- is this even
>>> supposed to work?
>>>
>>> - create a directory somewhere and add two text files to it
>>> - mount that directory on the Spark worker machines with sshfs
>>> - read the text files into one data structure using a file URL with
>>>   a wildcard
>>>
>>> Thanks,
>>>
>>> Pete
>>>
>>> On Tue, Sep 6, 2016 at 11:20 PM, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>>> To access a local file, try a file:// URI.
>>>>
>>>> On Wed, Sep 7, 2016 at 8:52 AM, Peter Figliozzi <pete.figlio...@gmail.com> wrote:
>>>>
>>>>> This is a great question. Basically you don't have to worry about the
>>>>> details -- just give a wildcard in your call to textFile. See the
>>>>> Programming Guide
>>>>> <http://spark.apache.org/docs/latest/programming-guide.html> section
>>>>> entitled "External Datasets". The Spark framework will distribute your
>>>>> data across the workers. Note that:
>>>>>
>>>>>> *If using a path on the local filesystem, the file must also be
>>>>>> accessible at the same path on worker nodes. Either copy the file to all
>>>>>> workers or use a network-mounted shared file system.*
>>>>>
>>>>> In your case this would mean the directory of files.
>>>>>
>>>>> Curiously, I cannot get this to work when I mount a directory with
>>>>> sshfs on all of my worker nodes. It says "file not found" even
>>>>> though the file clearly exists at the specified path on all workers.
>>>>> Anyone care to try and comment on this?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Pete
>>>>>
>>>>> On Tue, Sep 6, 2016 at 9:51 AM, Lydia Ickler <ickle...@googlemail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> maybe this is a stupid question:
>>>>>>
>>>>>> I have a list of files. I want to take each file as an input for an
>>>>>> ML algorithm. All files are independent of one another.
>>>>>> My question is how to distribute the work so that each worker takes
>>>>>> a block of files and runs the algorithm on them one by one.
>>>>>> I hope somebody can point me in the right direction! :)
>>>>>>
>>>>>> Best regards,
>>>>>> Lydia
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Ayan Guha
>>>
>>
>
> --
> Best Regards,
> Ayan Guha
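
(For Lydia's original question at the bottom of the thread, a rough sketch of the pattern Peter describes: let Spark treat each file as a single record with wholeTextFiles and map the per-file algorithm over those records. It assumes the files are readable at the same path on every worker; the path, wildcard, and runAlgorithm placeholder are illustrative, not from the thread.)

// Hypothetical stand-in for the per-file ML step; substitute the real algorithm.
def runAlgorithm(contents: String): Double = contents.length.toDouble

// wholeTextFiles yields one (path, fullContents) pair per file, so Spark
// schedules whole files (not individual lines) across the workers.
val results = sc.wholeTextFiles("file:///home/peter/datashare/*.txt")
  .map { case (path, contents) => (path, runAlgorithm(contents)) }
  .collect()

results.foreach { case (path, score) => println(s"$path -> $score") }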