Hello,

The variable argsList is an array defined above the parallel block. This variable is accessed inside the map function. Launcher.main is not thread-safe.

Is it not possible to tell Spark that every folder needs to be processed as a separate process in a separate working directory?
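On the separate-process idea: one option (my own sketch, not something proposed in this thread; the helper name is hypothetical) is to fork a child OS process per folder from inside the map function using the JDK's ProcessBuilder, which lets each task set its own working directory. The Spark-free part of that looks like:

```java
import java.io.File;
import java.io.IOException;

public class PerFolderProcess {
    // Hypothetical helper: run `command` as a separate OS process whose
    // working directory is `folder`, and return its exit code.
    static int runInFolder(File folder, String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(folder);   // each invocation gets its own working directory
        pb.inheritIO();         // forward the child's stdout/stderr to this JVM
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Demo outside Spark: run a trivial Unix command in the temp dir.
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        int exit = runInFolder(tmp, "pwd");
        System.out.println("exit=" + exit);
    }
}
```

Inside the map function you would call something like runInFolder(folderDir, "java", "-cp", ..., "Launcher", ...) so that each folder is handled by a fresh JVM rather than by threads sharing one executor. That sidesteps any thread-safety issue in Launcher.main, at the cost of per-task JVM startup time.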
Where is argsList defined? Is Launcher.main() thread-safe? Note that if
multiple folders are processed on a node, multiple threads may run concurrently
in the executor, each processing a folder.
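If Launcher.main is indeed not thread-safe, one workaround is to serialize the calls within each executor JVM. A minimal sketch (my own stand-in code, not from this thread; `launch` stands in for the non-thread-safe Launcher.main):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SerializedLauncher {
    private static final Object LOCK = new Object();
    private static int concurrent = 0;     // threads currently inside launch()
    private static int maxConcurrent = 0;  // highest concurrency ever observed

    // Stand-in for a non-thread-safe Launcher.main: the lock guarantees at
    // most one thread runs inside at a time, even when several map tasks
    // execute in the same executor JVM.
    static void launch(String folder) throws InterruptedException {
        synchronized (LOCK) {
            concurrent++;
            maxConcurrent = Math.max(maxConcurrent, concurrent);
            Thread.sleep(10);              // simulate processing one folder
            concurrent--;
        }
    }

    public static void main(String[] args) throws Exception {
        // Six concurrent tasks, like six folders landing on one node.
        ExecutorService pool = Executors.newFixedThreadPool(6);
        for (int i = 0; i < 6; i++) {
            final String folder = "folder" + i;
            pool.submit(() -> {
                try {
                    launch(folder);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("maxConcurrent=" + maxConcurrent);
    }
}
```

Note the caveat: a lock like this only serializes within one JVM, so tasks on different executors still run in parallel, which is exactly the per-node concurrency Ted describes above.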
> On Jul 14, 2016, at 12:28, Balachandar R.A. wrote:
>
> Hello Ted,
>
>
> Thanks for the response. Here is the additional information.
> I am using spark 1.6.1 (spark-1.6.1-bin-hadoop2.6)
>
> Here is the code snippet
>
> JavaRDD add = jsc.parallelize(listFolders, listFolders.size());
>
> JavaRDD test = add.map(new
Which Spark release are you using?
Can you disclose what the folder processing does (a code snippet would be better)?
Thanks
On Wed, Jul 13, 2016 at 9:44 AM, Balachandar R.A. wrote:
> Hello
>
> In one of my use cases, I need to process a list of folders in parallel. I
> used
Hello

In one of my use cases, I need to process a list of folders in parallel. I used

sc.parallelize(list, list.size).map("logic to process the folder").

I have a six-node cluster and there are six folders to process. Ideally I
expect each of my nodes to process one folder. But, I see that a