Re: distribute work (files)

2016-09-07 Thread Peter Figliozzi
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache...

Re: distribute work (files)

2016-09-07 Thread ayan guha
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool...

Re: distribute work (files)

2016-09-07 Thread Peter Figliozzi
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

On Wed, Sep 7, 2016 at 9:50 AM, Yong...

Re: distribute work (files)

2016-09-07 Thread ayan guha
    at java.lang.Thread.run(Thread.java:745)

On Wed, Sep 7, 2016 at 9:50 AM, Yong Zhang <java8...@hotmail.com> wrote:
> What error do you get? FileNotFoundException?
> Please paste the stacktrace here.
> Yong

Fwd: distribute work (files)

2016-09-07 Thread Peter Figliozzi
> What error do you get? FileNotFoundException?
> Please paste the stacktrace here.
> Yong
>
> From: Peter Figliozzi <pete.figlio...@gmail.com>
> Sent: Wednesday, September 7, 2016 10:18 AM
> To: ayan guha
> Cc: Lydia Ickler; ...

Re: distribute work (files)

2016-09-07 Thread Yong Zhang
What error do you get? FileNotFoundException?

Please paste the stacktrace here.

Yong

From: Peter Figliozzi <pete.figlio...@gmail.com>
Sent: Wednesday, September 7, 2016 10:18 AM
To: ayan guha
Cc: Lydia Ickler; user.spark
Subject: Re: distribute work (files)

Re: distribute work (files)

2016-09-07 Thread Peter Figliozzi
That's failing for me. Can someone please try this -- I'm not sure it's even supposed to work:

- create a directory somewhere and add two text files to it
- mount that directory on the Spark worker machines with sshfs
- read the text files into one data structure using a file URL with a wildcard (see the sketch below)
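A minimal sketch of that repro in Scala, assuming the sshfs mount lives at /mnt/shared on the driver and on every worker (the path and app name are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    object ReadSharedDir {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("read-shared-dir"))
        // file:// URL with a wildcard; /mnt/shared must be visible at the
        // same path on the driver and on every worker (the sshfs mount above)
        val lines = sc.textFile("file:///mnt/shared/*.txt")
        println(s"line count: ${lines.count()}")
        sc.stop()
      }
    }

If only some nodes have the mount, tasks scheduled on the other nodes will fail to open the files, which is the failure mode Yong asks about above.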

Re: distribute work (files)

2016-09-06 Thread ayan guha
To access a local file, try a file:// URI.

On Wed, Sep 7, 2016 at 8:52 AM, Peter Figliozzi wrote:
> This is a great question. Basically you don't have to worry about the
> details -- just give a wildcard in your call to textFile. See the
> Programming Guide ...
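For example, in the spark-shell (where sc is predefined; the paths here are hypothetical, and the file must exist at the same path on every node):

    // no scheme: resolution depends on the cluster's default filesystem
    // (often HDFS), which is usually not what you want for a local file
    val viaDefault = sc.textFile("/data/input.txt")

    // explicit file:// scheme forces the local filesystem on each node
    val viaLocal = sc.textFile("file:///data/input.txt")
    println(viaLocal.count())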

Re: distribute work (files)

2016-09-06 Thread Peter Figliozzi
This is a great question. Basically you don't have to worry about the details -- just give a wildcard in your call to textFile. See the Programming Guide section entitled "External Datasets". The Spark framework will distribute your ...
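A sketch of that wildcard call from the spark-shell (the directory and glob pattern are hypothetical):

    // one logical dataset built from every file matching the glob;
    // Spark splits the matching files into partitions and spreads
    // the partitions across the workers
    val all = sc.textFile("hdfs:///data/inputs/*.txt")
    println(all.count())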

distribute work (files)

2016-09-06 Thread Lydia Ickler
Hi, maybe this is a stupid question: I have a list of files, and I want to use each file as the input to an ML algorithm. All files are independent of one another. My question now is how do I distribute the work so that each worker takes a block of files and just runs the algorithm on them one by one?
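One common pattern for this (a sketch, not from the thread itself; runModel and the input path are hypothetical stand-ins) is wholeTextFiles, which yields one (path, content) pair per file, so each file is handled as a single unit by whichever worker gets it:

    import org.apache.spark.{SparkConf, SparkContext}

    object PerFileML {
      // hypothetical stand-in for the real ML algorithm
      def runModel(content: String): Double = content.length.toDouble

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("per-file-ml"))
        // one (path, content) record per file; independent files become
        // independent records, so Spark parallelizes across them freely
        val results = sc.wholeTextFiles("hdfs:///data/files/")
          .mapValues(runModel)
        results.collect().foreach { case (path, score) =>
          println(s"$path -> $score")
        }
        sc.stop()
      }
    }

Note that wholeTextFiles reads each file whole, so it fits best when every individual file fits comfortably in an executor's memory.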