Hi Daniel,

Take a look at .coalesce(). I've seen good results by coalescing to num executors * 10, but I'm still trying to figure out the optimal number of partitions per executor.

To get the number of executors:

sc.getConf.getInt("spark.executor.instances", -1)
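Putting the two together, a minimal sketch (assuming a running SparkContext named `sc` and an input RDD named `rdd` — both placeholder names, not from your code):

```scala
// Assumed context: `sc` is an active SparkContext, `rdd` is the RDD built
// from the many small files.
val numExecutors = sc.getConf.getInt("spark.executor.instances", -1)

// coalesce() with shuffle = false (the default) merges existing partitions
// on the nodes where they already live, avoiding a full network shuffle.
val merged = rdd.coalesce(numExecutors * 10)
```

Note the default value of -1 means the key was not set (e.g. with dynamic allocation), so you may want to guard against that before multiplying.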
Cheers,
Doug

> On Jul 20, 2015, at 5:04 AM, Daniel Haviv <daniel.ha...@veracity-group.com> wrote:
>
> Hi,
> My data is constructed from a lot of small files, which results in a lot of partitions per RDD.
> Is there some way to locally repartition the RDD without shuffling, so that all of the partitions that reside on a specific node will become X partitions on the same node?
>
> Thank you,
> Daniel