Kevin Mader created SPARK-4640:
----------------------------------

             Summary: FixedRangePartitioner for partitioning items with a known range
                 Key: SPARK-4640
                 URL: https://issues.apache.org/jira/browse/SPARK-4640
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: Kevin Mader


For the large datasets I work with, it is common to have lightweight keys and
very heavy values (for example, integer keys with large double arrays as
values). The set of keys, however, is known in advance and does not change. It
would be useful if Spark had a built-in partitioner that could take advantage of
this; a FixedRangePartitioner[T](keys: Seq[T], partitions: Int) would be ideal.
This partitioner type could further be extended to a PartitionerWithKnownKeys
with a getAllKeys function, allowing the list of keys to be obtained without
scanning the entire RDD.
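A minimal sketch of what the proposed classes might look like, assuming keys are distinct and are split into contiguous ranges in the order the caller supplies them. FixedRangePartitioner, PartitionerWithKnownKeys and getAllKeys are the names proposed in this issue, not existing Spark APIs. In Spark the class would be declared `extends org.apache.spark.Partitioner` (whose abstract members are `numPartitions: Int` and `getPartition(key: Any): Int`); it is kept standalone here so the sketch compiles without Spark on the classpath.

```scala
/** Hypothetical trait from the proposal: a partitioner that can list every key it routes. */
trait PartitionerWithKnownKeys[T] {
  def getAllKeys: Seq[T]
}

/**
 * Hypothetical sketch: assigns a fixed, known set of keys to contiguous ranges.
 * In Spark this would also `extends org.apache.spark.Partitioner`.
 */
class FixedRangePartitioner[T](keys: Seq[T], partitions: Int)
    extends PartitionerWithKnownKeys[T] with Serializable {
  require(partitions > 0, "partitions must be positive")
  require(keys.nonEmpty, "keys must be non-empty")
  require(keys.distinct.size == keys.size, "keys must be distinct")

  // Position of each key in the caller-supplied ordering (typed as Any to match getPartition).
  private val index: Map[Any, Int] =
    keys.zipWithIndex.map { case (k, i) => (k: Any) -> i }.toMap

  // Never create more partitions than there are keys, so no partition is empty.
  val numPartitions: Int = math.min(partitions, keys.size)

  /** Same shape as Spark's Partitioner.getPartition(key: Any): Int. */
  def getPartition(key: Any): Int = index.get(key) match {
    // Key i of n goes to partition floor(i * numPartitions / n): contiguous, near-equal ranges.
    case Some(i) => (i.toLong * numPartitions / keys.size).toInt
    // Assumption: keys outside the known set are treated as an error rather than hashed.
    case None => throw new IllegalArgumentException(s"Unknown key: $key")
  }

  /** Returns the known keys without touching any RDD data. */
  def getAllKeys: Seq[T] = keys
}
```

For example, `new FixedRangePartitioner(0 until 10, 3)` places keys 0 to 3 in partition 0, keys 4 to 6 in partition 1 and keys 7 to 9 in partition 2. Once it extends `Partitioner`, such an instance could be passed to `partitionBy` on a pair RDD. Throwing on unknown keys is one possible design choice; falling back to hashing would be another.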



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
