Re: Issue in Spark job. Remote RPC client disassociated

2016-07-14 Thread Balachandar R.A.
Hello, the variable argsList is an array defined above the parallel block. This variable is accessed inside the map function. Launcher.main is not thread-safe. Is it not possible to tell Spark that every folder needs to be processed as a separate process, in a separate working directory?
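A minimal sketch of that idea, assuming a hypothetical Launcher class with a main(String[]) entry point and the application jar already on every worker's classpath: fork one child JVM per folder from inside the map function, with the folder as the child's working directory, so tasks share no in-process state.

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: "Launcher" and argsList come from the thread; everything else
// is an assumption. Runs Launcher.main in its own JVM with the folder as the
// working directory, so concurrent tasks cannot share static state.
public final class FolderProcessor {
    public static int processInChildJvm(String folder, String[] argsList)
            throws IOException, InterruptedException {
        String javaBin = System.getProperty("java.home") + "/bin/java";
        String classpath = System.getProperty("java.class.path");

        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("Launcher");                  // assumed entry-point class name
        cmd.addAll(Arrays.asList(argsList));

        Process p = new ProcessBuilder(cmd)
                .directory(new File(folder))  // separate working directory per task
                .inheritIO()                  // child output shows up in executor logs
                .start();
        return p.waitFor();                   // non-zero exit code signals failure
    }
}

Inside the map function each task would then call FolderProcessor.processInChildJvm(folder, argsList) and throw on a non-zero exit code so Spark marks the task as failed.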

Re: Issue in Spark job. Remote RPC client disassociated

2016-07-14 Thread Sun Rui
Where is argsList defined? Is Launcher.main() thread-safe? Note that if multiple folders are processed on a node, multiple threads may run concurrently in the executor, each processing a folder.
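For context, an executor runs spark.executor.cores / spark.task.cpus tasks concurrently, so single-threaded per-executor processing can be forced through configuration. A sketch with illustrative values (the app name is an assumption):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public final class OneTaskPerExecutor {
    public static void main(String[] args) {
        // With executor-cores equal to task-cpus, each executor runs exactly
        // one task (i.e. one folder) at a time.
        SparkConf conf = new SparkConf()
                .setAppName("folder-processing")
                .set("spark.executor.cores", "1")
                .set("spark.task.cpus", "1");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        // ... build the RDD of folders and map over it here ...
        jsc.stop();
    }
}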

Re: Issue in Spark job. Remote RPC client disassociated

2016-07-13 Thread Balachandar R.A.
Hello Ted, thanks for the response. Here is the additional information. I am using Spark 1.6.1 (spark-1.6.1-bin-hadoop2.6). Here is the code snippet:

JavaRDD add = jsc.parallelize(listFolders, listFolders.size());
JavaRDD test = add.map(new …
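The snippet is truncated in the archive; a hedged reconstruction of the pattern it describes, assuming String elements and a placeholder map body, might look like:

import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

// Sketch only: the element type (String) and the map body are assumptions,
// since the original snippet is cut off in the archive.
public final class ParallelFolders {
    public static JavaRDD<String> process(JavaSparkContext jsc, List<String> listFolders) {
        // One partition per folder, so each folder can become its own task.
        JavaRDD<String> add = jsc.parallelize(listFolders, listFolders.size());
        JavaRDD<String> test = add.map(new Function<String, String>() {
            @Override
            public String call(String folder) throws Exception {
                // placeholder for the real per-folder logic
                return "processed " + folder;
            }
        });
        return test;
    }
}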

Re: Issue in Spark job. Remote RPC client disassociated

2016-07-13 Thread Ted Yu
Which Spark release are you using? Can you disclose what the folder processing does (a code snippet is better)? Thanks.

Issue in Spark job. Remote RPC client disassociated

2016-07-13 Thread Balachandar R.A.
Hello. In one of my use cases, I need to process a list of folders in parallel. I used sc.parallelize(list, list.size).map("logic to process the folder"). I have a six-node cluster and there are six folders to process. Ideally, I expect that each of my nodes processes one folder. But I see that a…
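Whether the six partitions actually land on six different nodes depends on the available executor slots, not on the node count. A small diagnostic sketch (class and method names here are assumptions) can log which host handles each partition:

import java.net.InetAddress;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function2;

// Diagnostic sketch: prints which host processed each partition, to check
// whether the six folders really land on six different nodes.
public final class PartitionPlacement {
    public static void logPlacement(JavaRDD<String> folders) {
        List<String> lines = folders.mapPartitionsWithIndex(
            new Function2<Integer, Iterator<String>, Iterator<String>>() {
                @Override
                public Iterator<String> call(Integer idx, Iterator<String> it) throws Exception {
                    String host = InetAddress.getLocalHost().getHostName();
                    List<String> out = new ArrayList<>();
                    while (it.hasNext()) {
                        out.add("partition " + idx + " on " + host + ": " + it.next());
                    }
                    return out.iterator();
                }
            }, true).collect();
        for (String line : lines) {
            System.out.println(line);
        }
    }
}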