Hi Daniel,
Take a look at .coalesce()
I’ve seen good results by coalescing to (number of executors * 10), but I’m still trying to figure out the optimal number of partitions per executor.

To get the number of executors: sc.getConf.getInt("spark.executor.instances", -1)
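Something like this sketch, assuming you already have an RDD called rdd (the multiplier of 10 is just what has worked for me, not a documented rule):

```scala
// Read the configured executor count; -1 if the setting is absent
// (e.g. with dynamic allocation, where this key may not be set).
val numExecutors = sc.getConf.getInt("spark.executor.instances", -1)

// coalesce with shuffle = false (the default) only merges partitions
// that already sit on the same node, so no data crosses the network.
val coalesced =
  if (numExecutors > 0) rdd.coalesce(numExecutors * 10)
  else rdd // fall back: leave partitioning unchanged
```

Note that coalesce can only reduce the partition count without a shuffle; if you ask for more partitions than you have, you'd need shuffle = true.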


Cheers,

Doug

> On Jul 20, 2015, at 5:04 AM, Daniel Haviv <daniel.ha...@veracity-group.com> 
> wrote:
> 
> Hi,
> My data is constructed from a lot of small files which results in a lot of 
> partitions per RDD.
> Is there some way to locally repartition the RDD without shuffling so that 
> all of the partitions that reside on a specific node will become X partitions 
> on the same node ?
> 
> Thank you.
> Daniel


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
