RE: Repartitioning by partition size, not by number of partitions.

2014-10-31 Thread Ganelin, Ilya
Hi Jan. I've actually written a function recently to do precisely that, using the RDD.randomSplit function. You just need to calculate how big each element of your data is, and then how many elements can fit in each RDD, to populate the weights you pass to randomSplit. Unfortunately, in my case I wind up
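
A minimal sketch of that idea in Scala (the function name, the targetBytes parameter, and the use of UTF-8 byte length as a size proxy are my assumptions, not Ilya's actual code; substitute whatever size estimate fits your records):

    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD

    // Split an RDD into chunks of roughly targetBytes each (assumption: the
    // UTF-8 byte length of each String element approximates its real size).
    def splitBySize(rdd: RDD[String], targetBytes: Long): Array[RDD[String]] = {
      // Estimate the total payload size of the RDD.
      val totalBytes = rdd.map(_.getBytes("UTF-8").length.toDouble).sum()

      // Number of roughly equal chunks needed to stay near targetBytes each.
      val numChunks = math.max(1, math.ceil(totalBytes / targetBytes).toInt)

      // randomSplit takes relative weights; equal weights give equally sized
      // chunks only in expectation, since the split is random.
      val weights = Array.fill(numChunks)(1.0 / numChunks)
      rdd.randomSplit(weights, seed = 42L)
    }

Note that this costs an extra pass over the data to compute the sizes, and each resulting chunk is itself an RDD rather than a partition of the original one.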

RE: Repartitioning by partition size, not by number of partitions.

2014-10-31 Thread jan.zikes
Hi Ilya, that seems to me like quite a complicated solution. I'm thinking that an easier (though not optimal) approach might be, for example, to heuristically use something like RDD.coalesce(RDD.getNumPartitions() / N), but it keeps me wondering that Spark does not have something like
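
For reference, a sketch of that coalesce heuristic in Scala (N is an assumed shrink factor, i.e. how many existing partitions should be merged into one; the function name is mine):

    import org.apache.spark.rdd.RDD

    // Merge roughly every n existing partitions into one (assumption: the
    // current partitions are about equal in size, so partition size scales
    // with the shrink factor).
    def coalesceByFactor[T](rdd: RDD[T], n: Int): RDD[T] = {
      val targetPartitions = math.max(1, rdd.partitions.length / n)
      // Without shuffle = true, coalesce only merges existing partitions,
      // so it is cheap, but the resulting partition sizes are approximate.
      rdd.coalesce(targetPartitions)
    }

This avoids a shuffle and an extra pass over the data, at the price of only approximating the desired partition size.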