Reynold Xin created SPARK-2568:
----------------------------------

             Summary: RangePartitioner should go through the data only once
                 Key: SPARK-2568
                 URL: https://issues.apache.org/jira/browse/SPARK-2568
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.0.0
            Reporter: Reynold Xin


As of Spark 1.0, RangePartitioner goes through the data twice: once to compute
the count and once to sample. As a result, sortByKey goes through the data
three times (once to count, once to sample, and once to sort).

RangePartitioner should go through data only once (remove the count step).
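
To illustrate why a separate count pass is not strictly needed, here is a minimal sketch in plain Python (not Spark code, and not necessarily the approach the eventual fix takes). It uses reservoir sampling, which draws a fixed-size uniform sample and learns the element count in the same pass. The names reservoir_sample_and_count and range_bounds are hypothetical helpers introduced only for this example.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def reservoir_sample_and_count(
    items: Iterable[T], k: int, seed: int = 0
) -> Tuple[List[T], int]:
    """Return (uniform sample of up to k items, total item count) in one pass."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    count = 0
    for item in items:
        count += 1
        if len(reservoir) < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / count.
            j = rng.randrange(count)
            if j < k:
                reservoir[j] = item
    return reservoir, count


def range_bounds(sample: List[T], num_partitions: int) -> List[T]:
    """Pick num_partitions - 1 split points from the sorted sample."""
    ordered = sorted(sample)
    if not ordered or num_partitions <= 1:
        return []
    step = len(ordered) / num_partitions
    return [
        ordered[min(int(step * i), len(ordered) - 1)]
        for i in range(1, num_partitions)
    ]


if __name__ == "__main__":
    sample, count = reservoir_sample_and_count(range(1000), k=20, seed=1)
    print(count, range_bounds(sample, num_partitions=4))
```

In a distributed setting each partition would presumably run such a pass independently, and the per-partition counts could then be used to weight the samples when choosing bounds; that part is an assumption and is not shown here.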






