Hello

In one of my use cases, I need to process a list of folders in parallel. I
used:

sc.parallelize(list, list.size).map(/* logic to process the folder */)
I have a six-node cluster and there are six folders to process. Ideally I
expect each node to process one folder. But I see that one node processes
multiple folders while one or two of the nodes do not get any task at all.
In the end, the spark-submit crashes with the exception "Remote RPC
client disassociated". Can someone give me a hint on what's going wrong here?
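For reference, a minimal sketch of the pattern I'm using (the folder paths and the processFolder function below are placeholders for my actual business logic):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ProcessFolders {

  // Hypothetical stand-in for the real per-folder processing, which
  // writes a result file to a shared file system.
  def processFolder(folder: String): String =
    s"processed $folder"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("process-folders")
    val sc   = new SparkContext(conf)

    // Six folders; one partition per folder so that, ideally,
    // each of the six nodes picks up exactly one task.
    val folders = List("/data/f1", "/data/f2", "/data/f3",
                       "/data/f4", "/data/f5", "/data/f6")

    sc.parallelize(folders, folders.size)
      .map(processFolder)
      .collect()

    sc.stop()
  }
}
```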
Please note that this issue does not arise if I comment out the logic that
processes the folder and simply print the folder name; in that case, every
node gets one folder to process. I also inserted a sleep of 40 seconds inside
the map: no issue. But when I uncomment my logic, I see this problem. Also,
before crashing it does process some of the folders successfully, where
"successfully" means the business logic generates a file in a shared file
system.

Regards
Bala
