Kevin Mader created SPARK-4640:
----------------------------------

             Summary: FixedRangePartitioner for partitioning items with a known range
                 Key: SPARK-4640
                 URL: https://issues.apache.org/jira/browse/SPARK-4640
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: Kevin Mader

For the large datasets I work with, it is common to have lightweight keys and very heavy values (integers and large double arrays, for example). The keys, however, are known in advance and do not change. It would be nice if Spark had a built-in partitioner that could take advantage of this.

A FixedRangePartitioner[T](keys: Seq[T], partitions: Int) would be ideal. Furthermore, this partitioner type could be extended to a PartitionerWithKnownKeys with a getAllKeys function, so that the list of keys could be obtained without scanning the entire RDD.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
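
To make the proposal concrete, here is a minimal sketch of what such a partitioner might look like. It is not an existing Spark class. It extends Spark's org.apache.spark.Partitioner and uses the constructor signature suggested above. The contiguous-range assignment, the handling of duplicate keys, and the choice to throw on an unknown key are all assumptions made for illustration.

```scala
import org.apache.spark.Partitioner

// Hypothetical sketch of the proposed FixedRangePartitioner; not part of Spark.
// Assumption: keys are assigned to partitions in contiguous ranges, following
// the order in which the caller supplies them.
class FixedRangePartitioner[T](keys: Seq[T], partitions: Int) extends Partitioner {
  require(partitions > 0, "partitions must be positive")
  require(keys.nonEmpty, "keys must not be empty")

  // Assumption: duplicate keys are collapsed, keeping the first occurrence.
  private val knownKeys: Seq[T] = keys.distinct

  // Position of each known key in the supplied order.
  private val indexOf: Map[Any, Int] =
    knownKeys.zipWithIndex.map { case (k, i) => (k: Any) -> i }.toMap

  override def numPartitions: Int = partitions

  // Scale the key's position into [0, partitions) so that neighbouring keys
  // land in the same partition. Assumption: unknown keys are an error.
  override def getPartition(key: Any): Int = {
    val i = indexOf.getOrElse(key, throw new NoSuchElementException(s"Unknown key: $key"))
    ((i.toLong * partitions) / knownKeys.size).toInt
  }

  // The getAllKeys idea from the proposal: the key list is available from the
  // partitioner itself, without scanning the RDD.
  def getAllKeys: Seq[T] = knownKeys
}
```

With a pair RDD whose keys are known to be 0 to 9, this could be applied as rdd.partitionBy(new FixedRangePartitioner(0 until 10, 3)). A real implementation would probably also define equals and hashCode, since Spark compares partitioners to decide whether two RDDs are already co-partitioned.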