That's failing for me. Can someone please try this -- is this even supposed to work:

- create a directory somewhere and add two text files to it
- mount that directory on the Spark worker machines with sshfs
- read the text files into one data structure using a file URL with a wildcard
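Concretely, here is a minimal sketch of what I'm running (the mount point /mnt/shared, the app name, and the *.txt pattern are made up for illustration; my real paths differ):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("WildcardReadTest").getOrCreate()
    val sc = spark.sparkContext

    // /mnt/shared is the sshfs mount, visible at the same absolute
    // path on the driver and on every worker node. textFile accepts
    // Hadoop glob patterns, so *.txt should pick up both files and
    // spread the partitions across the workers.
    val lines = sc.textFile("file:///mnt/shared/*.txt")

    // count() forces the read; this is where I get "file not found".
    println(s"line count: ${lines.count()}")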
Thanks,

Pete

On Tue, Sep 6, 2016 at 11:20 PM, ayan guha <guha.a...@gmail.com> wrote:

> To access a local file, try with a file:// URI.
>
> On Wed, Sep 7, 2016 at 8:52 AM, Peter Figliozzi <pete.figlio...@gmail.com>
> wrote:
>
>> This is a great question. Basically you don't have to worry about the
>> details -- just give a wildcard in your call to textFile. See the section
>> entitled "External Datasets" in the Programming Guide
>> <http://spark.apache.org/docs/latest/programming-guide.html>. The Spark
>> framework will distribute your data across the workers. Note that:
>>
>>> *If using a path on the local filesystem, the file must also be
>>> accessible at the same path on worker nodes. Either copy the file to all
>>> workers or use a network-mounted shared file system.*
>>
>> In your case this would mean the directory of files.
>>
>> Curiously, I cannot get this to work when I mount a directory with sshfs
>> on all of my worker nodes. It says "file not found" even though the file
>> clearly exists at the specified path on all workers. Anyone care to try
>> this and comment?
>>
>> Thanks,
>>
>> Pete
>>
>> On Tue, Sep 6, 2016 at 9:51 AM, Lydia Ickler <ickle...@googlemail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> maybe this is a stupid question:
>>>
>>> I have a list of files, and I want to use each file as an input to an
>>> ML algorithm. All files are independent of one another.
>>> My question now is how to distribute the work so that each worker
>>> takes a block of files and just runs the algorithm on them one by one.
>>> I hope somebody can point me in the right direction! :)
>>>
>>> Best regards,
>>> Lydia
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>
> --
> Best Regards,
> Ayan Guha
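Lydia, coming back to your original question quoted above: since each file has to be fed to the algorithm whole, sc.wholeTextFiles may be a better fit than textFile -- it gives you one (path, content) pair per file, and Spark partitions those pairs across the workers. A sketch, assuming a shared directory /mnt/shared that all workers can see; runAlgorithm is a made-up stand-in for your real ML code:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("PerFileML").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical stand-in for the real per-file ML algorithm.
    def runAlgorithm(text: String): Double = text.length.toDouble

    // One (path, content) pair per file; each file stays intact and
    // the pairs are spread across the workers.
    val results = sc
      .wholeTextFiles("file:///mnt/shared/*")
      .mapValues(runAlgorithm)

    results.collect().foreach { case (path, score) =>
      println(s"$path -> $score")
    }

One caveat: wholeTextFiles reads each file fully into memory on a worker, so it suits many smallish files rather than a few huge ones.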