Re: How to set Spark to perform only one map at once at each cluster node

2014-10-31 Thread jan.zikes
Yes, I would expect it to work as you say, with executor-cores set to 1. But it seems to me that when I do use executor-cores=1, it still runs more than one task on each machine at the same time (at least based on what top shows).

How to set Spark to perform only one map at once at each cluster node

2014-10-28 Thread jan.zikes
Hi, I am currently struggling with how to set Spark so that it performs only one map, flatMap, etc. at a time. In other words, my map uses a multi-core algorithm, so I would like only one map to run at a time so that it can use all of the machine's cores. Thank you in advance for your advice and replies.  Jan

Re: How to set Spark to perform only one map at once at each cluster node

2014-10-28 Thread Yanbo Liang
The number of tasks is decided by the number of input partitions. If you want only one map or flatMap to run at once, just call coalesce() or repartition() to move the data into a single partition. However, this is not recommended, because it will not execute in parallel efficiently.
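For example, a minimal sketch (the input path and the map function are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    object SinglePartitionExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("single-partition-example")
        val sc = new SparkContext(conf)

        // Placeholder input; substitute your own data source.
        val rdd = sc.textFile("hdfs:///data/input")

        // coalesce(1) moves all data into a single partition, so the map
        // stage runs as exactly one task, but on exactly one node.
        val result = rdd.coalesce(1).map(_.toUpperCase)
        result.saveAsTextFile("hdfs:///data/output")

        sc.stop()
      }
    }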

Re: How to set Spark to perform only one map at once at each cluster node

2014-10-28 Thread jan.zikes
But I guess that this creates only one task across all the cluster's nodes. I would like to run several tasks, but I would like Spark not to run more than one map on each of my nodes at a time. That is, I would like to have, say, 4 different tasks and 2 nodes, where each node has 2 cores.

Re: How to set Spark to perform only one map at once at each cluster node

2014-10-28 Thread Yanbo Liang
It's not very difficult to achieve by properly setting the application's parameters. Some basic knowledge you should know: an application can have only one executor on each machine or container (YARN). So just set executor-cores to 1, and each executor will run only one task at a time.
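Concretely, when submitting on YARN, something like the following should give single-core executors, each running one task at a time (a sketch; the class name and jar are placeholders):

    spark-submit \
      --master yarn \
      --num-executors 2 \
      --executor-cores 1 \
      --class com.example.MyApp \
      my-app.jar

Note that --executor-cores only limits how many tasks are scheduled concurrently per executor; it does not pin the JVM to one core, so a multi-threaded map can still use the whole machine (unless YARN enforces CPU limits via cgroups).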