Which Spark release are you using? Can you share what the folder processing does (a code snippet would be better)?
Thanks

On Wed, Jul 13, 2016 at 9:44 AM, Balachandar R.A. <balachandar...@gmail.com> wrote:

> Hello
>
> In one of my use cases, I need to process a list of folders in parallel. I used
>
>     sc.parallelize(list, list.size).map( /* logic to process the folder */ )
>
> I have a six-node cluster and there are six folders to process. Ideally, I
> expect each of my nodes to process one folder. But I see that one node
> processes multiple folders while one or two of the nodes do not get any
> work. In the end, spark-submit crashes with the exception "Remote RPC
> client disassociated". Can someone give me a hint on what's going wrong
> here?
>
> Please note that this issue does not arise if I comment out my logic that
> processes the folder and simply print the folder name. In that case, every
> node gets one folder to process. I inserted a sleep of 40 seconds inside
> the map; no issue. But when I uncomment my logic, I see this issue. Also,
> before crashing it does process some of the folders successfully.
> "Successfully" means the business logic generates a file in a shared file
> system.
>
> Regards
> Bala
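For reference, the pattern described in the quoted message looks roughly like the sketch below. This is only a minimal reconstruction, since the actual folder-processing logic was not posted; `processFolder` and the folder paths are placeholders, not the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FolderJob {
  // Hypothetical stand-in for the poster's business logic, which reportedly
  // writes a result file to a shared file system for each folder.
  def processFolder(path: String): String = {
    // ... folder-processing logic goes here ...
    path
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("folder-processing"))

    // Placeholder folder list; the real one comes from the poster's use case.
    val folders = List("/data/f1", "/data/f2", "/data/f3",
                       "/data/f4", "/data/f5", "/data/f6")

    // One partition per folder, as in the original post. Note that Spark
    // assigns partitions to executors, not to physical nodes, so two
    // partitions can still land on the same node: a one-folder-per-node
    // layout is not guaranteed by parallelize alone.
    val results = sc.parallelize(folders, folders.size)
                    .map(processFolder)
                    .collect()

    results.foreach(println)
    sc.stop()
  }
}
```

As a side note, "Remote RPC client disassociated" is the message the driver logs when an executor dies unexpectedly, which is why the question above asks what the per-folder logic actually does.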