Re: One map per folder in Spark or Hadoop

2016-07-07 Thread Deepak Sharma
You have to distribute the files on a distributed file system such as HDFS, or else copy the files to every executor's local file system and make sure to mention the file scheme in the URI explicitly. Thanks, Deepak
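A minimal sketch of the URI-scheme point, assuming a live SparkContext sc; both paths below are hypothetical:

    // The scheme in the URI decides where executors look for the data.

    // HDFS: the file is visible to every executor through the cluster FS.
    val fromHdfs = sc.textFile("hdfs:///user/bala/input/data.txt")

    // Local: works only if the same file exists at the same path on every
    // worker node, hence the advice to copy it to all executors first.
    val fromLocal = sc.textFile("file:///data/local/data.txt")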

Re: One map per folder in Spark or Hadoop

2016-07-07 Thread Balachandar R.A.
Hi, thanks for the code snippet. Is it possible for the executable inside the map task to access directories and files present in the local file system? I know the tasks run on a slave node in a temporary working directory, and I can think of the distributed cache, but still would like to…
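For context, Spark's rough analogue of the Hadoop distributed cache is SparkContext.addFile paired with SparkFiles.get. A minimal sketch, assuming an existing SparkContext sc and a hypothetical config file:

    import org.apache.spark.SparkFiles

    // Ship a file from the driver to every executor's scratch directory.
    sc.addFile("/home/bala/config/settings.conf")

    val results = sc.parallelize(1 to 10).map { i =>
      // Resolve the shipped file's local path on whichever executor runs this task.
      val localPath = SparkFiles.get("settings.conf")
      s"task $i can read $localPath"
    }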

Re: One map per folder in Spark or Hadoop

2016-06-30 Thread Balachandar R.A.
Thank you very much. I will try this code and update you. Regards, Bala

Re: One map per folder in Spark or Hadoop

2016-06-30 Thread Sun Rui
Say you have got all of your folder paths into a val folders: Seq[String], then:

    val add = sc.parallelize(folders, folders.size).mapPartitions { iter =>
      val folder = iter.next
      val status: Int = ??? // invoke your executable on `folder`; the archive elided this step
      Seq(status).toIterator
    }
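A self-contained sketch of the same pattern, with the elided step filled in via scala.sys.process; the executable path and folder layout are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import scala.sys.process._

    object OneTaskPerFolder {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("one-task-per-folder"))

        // Hypothetical folder paths; in practice, list them on the driver.
        val folders: Seq[String] = (1 to 100).map(i => s"/data/folders/folder$i")

        // folders.size partitions => exactly one task per folder.
        val statuses = sc.parallelize(folders, folders.size).mapPartitions { iter =>
          val folder = iter.next()
          // Run the black-box executable on this folder and keep its exit code.
          val status: Int = Seq("/opt/tools/process_folder", folder).!
          Iterator.single((folder, status))
        }

        statuses.collect().foreach { case (f, s) => println(s"$f exited with $s") }
        sc.stop()
      }
    }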

One map per folder in Spark or Hadoop

2016-06-30 Thread Balachandar R.A.
Hello, I have some 100 folders, and each folder contains 5 files. I have an executable that processes one folder. The executable is a black box and hence cannot be modified. I would like to process the 100 folders in parallel using Apache Spark, so that I can spawn one map task per folder. Can…
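For completeness, one way to collect the folder paths on the driver before parallelizing them; a minimal sketch with a hypothetical root directory:

    import java.io.File

    // Hypothetical root directory holding the ~100 folders.
    val root = new File("/data/folders")
    val folders: Seq[String] = root.listFiles()
      .filter(_.isDirectory)
      .map(_.getAbsolutePath)
      .toSeq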