[ https://issues.apache.org/jira/browse/SPARK-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058322#comment-14058322 ]
Aaron commented on SPARK-911:
-----------------------------

I decided to take a look at this myself (disclaimer: I haven't worked on Spark's source code before). A couple of things I noticed:

1. Since compute is called per split for all RDDs, the best I could figure out with that approach is to return an empty iterator for the partitions that can be pruned.
2. You could check more efficiently which partitions might contain keys in the requested range, but with only an iterator you have to search linearly.

> Support map pruning on sorted (K, V) RDD's
> ------------------------------------------
>
> Key: SPARK-911
> URL: https://issues.apache.org/jira/browse/SPARK-911
> Project: Spark
> Issue Type: Bug
> Reporter: Patrick Wendell
>
> If someone has sorted a (K, V) rdd, we should offer them a way to filter a
> range of the partitions that employs map pruning. This would be simple using
> a small range index within the rdd itself. A good example is I sort my
> dataset by time and then I want to serve queries that are restricted to a
> certain time range.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
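The two approaches in the comment, together with the "small range index" from the issue description, can be illustrated with a minimal, Spark-free Python sketch. It assumes the data is globally sorted by key across partitions (as after a `sortByKey`) and models each partition as an in-memory list of (key, value) pairs; `RangeIndex`, `compute_partition` and `filter_by_range` are hypothetical names for illustration, not Spark APIs. `compute_partition` mirrors point 1 (a per-split compute that returns an empty iterator for pruned partitions and otherwise scans linearly), while `RangeIndex.candidates` shows the more efficient alternative from point 2: binary-searching per-partition key bounds before any partition is scanned.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical model: a "partition" is an in-memory list of (key, value) pairs.
# Assumes keys are sorted within and across partitions (no Spark APIs used).
Partition = List[Tuple[int, str]]


class RangeIndex:
    """Hypothetical small range index: (min_key, max_key) per non-empty partition."""

    def __init__(self, partitions: Sequence[Partition]) -> None:
        self.ids: List[int] = []    # original partition ids, in order
        self.mins: List[int] = []   # first (smallest) key of each partition
        self.maxs: List[int] = []   # last (largest) key of each partition
        self.bounds: Dict[int, Tuple[int, int]] = {}
        for pid, part in enumerate(partitions):
            if not part:
                continue  # empty partitions can never match, so leave them out
            lo, hi = part[0][0], part[-1][0]
            self.ids.append(pid)
            self.mins.append(lo)
            self.maxs.append(hi)
            self.bounds[pid] = (lo, hi)

    def candidates(self, lower: int, upper: int) -> List[int]:
        """Point 2: ids of partitions that may hold keys in [lower, upper].

        Global sorting makes mins and maxs non-decreasing, so two binary
        searches replace a scan over every partition.
        """
        first = bisect_left(self.maxs, lower)      # first partition with max >= lower
        last = bisect_right(self.mins, upper) - 1  # last partition with min <= upper
        return self.ids[first:last + 1]


def compute_partition(part: Partition, pid: int, index: RangeIndex,
                      lower: int, upper: int) -> Iterator[Tuple[int, str]]:
    """Point 1: per-split compute that returns an empty iterator when pruned."""
    bounds = index.bounds.get(pid)
    if bounds is None or bounds[1] < lower or bounds[0] > upper:
        return iter(())  # pruned: the partition's data is never read
    # Only an iterator is available here, so matching keys are found linearly.
    return (kv for kv in part if lower <= kv[0] <= upper)


def filter_by_range(partitions: Sequence[Partition], index: RangeIndex,
                    lower: int, upper: int) -> List[Tuple[int, str]]:
    """Scan only the candidate partitions chosen up front by the index."""
    result: List[Tuple[int, str]] = []
    for pid in index.candidates(lower, upper):
        result.extend(kv for kv in partitions[pid] if lower <= kv[0] <= upper)
    return result


# Example: four partitions of data sorted by key; partition 1 is empty.
parts: List[Partition] = [[(1, "a"), (3, "b")], [], [(5, "c"), (7, "d")], [(9, "e"), (12, "f")]]
idx = RangeIndex(parts)
print(idx.candidates(4, 9))               # [2, 3]
print(filter_by_range(parts, idx, 4, 9))  # [(5, 'c'), (7, 'd'), (9, 'e')]
```

Because the data is sorted, the per-partition minimum and maximum keys never decrease from one partition to the next, which is what makes the binary search valid; on unsorted data every index entry would have to be checked. A query such as `candidates(4, 4)` returns no partitions at all, since key 4 falls in the gap between partitions 0 and 2.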