Re: HPC with Spark? Simultaneous, parallel one to one mapping of partition to vcore

2016-11-19 Thread Stephen Boesch
While "apparently" saturating the N available workers using your proposed N partitions - the "actual" distribution of workers to tasks is controlled by the scheduler. If my past experience were of service - you can *not *trust the default Fair Scheduler to ensure the round-robin scheduling of the

HPC with Spark? Simultaneous, parallel one to one mapping of partition to vcore

2016-11-19 Thread Adam Smith
Dear community, I have an RDD with N rows and N partitions. I want to ensure that the partitions all run at the same time, by setting the number of vcores (spark-yarn) to N. The partitions need to talk to each other via some socket-based sync, which is why I need them to run more or less
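A minimal sketch of the setup described above, assuming a toy value for N and the standard spark-on-YARN submission flags; the socket-based sync is only a placeholder for the actual coordination logic:

import org.apache.spark.{SparkConf, SparkContext}

object OnePartitionPerVcore {
  def main(args: Array[String]): Unit = {
    val n = 8 // assumed N for illustration; one row per partition
    val sc = new SparkContext(new SparkConf().setAppName("one-partition-per-vcore"))

    // N rows split into N partitions, so every task owns exactly one row.
    val rdd = sc.parallelize(1 to n, n)

    rdd.foreachPartition { rows =>
      // Placeholder for the socket-based sync between concurrently running tasks.
      rows.foreach(row => println(s"partition holding row $row is running"))
    }
    sc.stop()
  }
}

// Example submission asking YARN for N vcores in total (4 executors x 2 cores = 8):
//   spark-submit --master yarn --num-executors 4 --executor-cores 2 \
//     --class OnePartitionPerVcore app.jar

Even with N vcores granted, whether all N tasks actually start at the same instant depends on the scheduler, which is the concern raised in the reply above.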