Which Spark release are you using?

Can you disclose what the folder processing does (a code snippet would be better)?

Thanks

On Wed, Jul 13, 2016 at 9:44 AM, Balachandar R.A. <balachandar...@gmail.com>
wrote:

> Hello
>
> In one of my use cases, I need to process a list of folders in parallel. I
> used
> sc.parallelize(list, list.size).map(<logic to process each folder>).
> I have a six-node cluster and there are six folders to process. Ideally, I
> expect each node to process one folder. But I see that one node processes
> multiple folders while one or two nodes do not get any task. In the end,
> spark-submit crashes with the exception "Remote RPC client disassociated".
> Can someone give me a hint on what's going wrong here? Please note that
> this issue does not arise if I comment out my folder-processing logic and
> simply print the folder name; in that case, every node gets one folder to
> process. I also inserted a sleep of 40 seconds inside the map: no issue.
> But when I uncomment my logic, I see this issue. Also, before crashing, it
> does process some of the folders successfully. "Successfully" means the
> business logic generates a file on a shared file system.
>
> Regards
> Bala
>
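For reference, a minimal Scala sketch of the pattern described above (the
folder paths and the processFolder body are hypothetical placeholders, not
the poster's actual code):

import org.apache.spark.{SparkConf, SparkContext}

object FolderProcessor {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("FolderProcessor"))

    // Hypothetical folder list; one element per folder to process.
    val folders = List("/data/f1", "/data/f2", "/data/f3",
                       "/data/f4", "/data/f5", "/data/f6")

    // One partition per folder, so each task handles exactly one folder.
    val results = sc.parallelize(folders, folders.size)
      .map(processFolder)
      .collect()

    results.foreach(println)
    sc.stop()
  }

  // Placeholder for the business logic that writes a file to shared storage.
  def processFolder(path: String): String = {
    // ... real per-folder work goes here ...
    s"processed $path"
  }
}

Note that even with one partition per element, Spark does not guarantee one
task per node: task placement depends on which executors have free cores, so
a single executor with several free cores may pick up more than one of the
six tasks.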
