Hi All,
The code in question is from RangePartitioner:
// This is the sample size we need to have roughly balanced output partitions, capped at 1M.
val sampleSize = math.min(20.0 * partitions, 1e6)
// Assume the input partitions are roughly balanced and over-sample a little bit.
val sampleSizePerPartition = math.ceil(3.0 * sampleSize / rdd.partitions.length).toInt
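
To make the numbers concrete, here is a minimal standalone sketch of the same arithmetic. The object name, the helper sampleSizes, the parameter numInputPartitions (standing in for rdd.partitions.length), and the example partition counts are assumptions for illustration only; none of them are part of Spark.

```scala
object SampleSizeSketch {
  // Hypothetical helper that repeats the two quoted lines.
  // partitions: number of output partitions requested.
  // numInputPartitions: stands in for rdd.partitions.length.
  def sampleSizes(partitions: Int, numInputPartitions: Int): (Double, Int) = {
    // Total sample size: 20 samples per output partition, capped at 1M.
    val sampleSize = math.min(20.0 * partitions, 1e6)
    // Per-input-partition sample size, over-sampled by a factor of 3.
    val sampleSizePerPartition =
      math.ceil(3.0 * sampleSize / numInputPartitions).toInt
    (sampleSize, sampleSizePerPartition)
  }

  def main(args: Array[String]): Unit = {
    // Example values chosen arbitrarily: 200 output, 100 input partitions.
    val (total, perPartition) = sampleSizes(partitions = 200, numInputPartitions = 100)
    println(s"sampleSize=$total, sampleSizePerPartition=$perPartition")
  }
}
```

With 200 output partitions and 100 input partitions, this gives a total sample size of 4000 and 120 samples per input partition, so the two constants directly scale how many records are sampled.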
The constants 20.0 and 3.0 are hardcoded. Why are these particular values fixed?
Do they come from a white paper or some other research?
Regards
-Raintung Li
