[ https://issues.apache.org/jira/browse/SPARK-12931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-12931:
--------------------------------
    Assignee: Michael Armbrust  (was: Reynold Xin)

> Improve bucket read path to only create one single RDD
> ------------------------------------------------------
>
>                 Key: SPARK-12931
>                 URL: https://issues.apache.org/jira/browse/SPARK-12931
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Wenchen Fan
>            Assignee: Michael Armbrust
>
> Currently we create one RDD per bucket, coalesce each one to a single
> partition, and finally union them into a final RDD. We should create a single
> RDD instead. This requires modifying the data source interface slightly and
> abstracting the reader logic out so that it is decoupled from the RDD.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
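
The idea in the issue description can be illustrated with a small sketch. This is not Spark code and not the change that was made for this issue; it is a plain-Python model, assuming hypothetical file names that end in `bucket<N>` and a hypothetical `read_file` function standing in for the reader logic. It shows one dataset object whose partition i reads every file of bucket i, instead of one dataset per bucket that is coalesced and then unioned.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical in-memory "files" (path -> rows); names are illustrative only.
FILES: Dict[str, List[int]] = {
    "part-0_bucket0": [1, 4],
    "part-1_bucket0": [7],
    "part-0_bucket1": [2, 5],
    "part-1_bucket2": [3],
}


def bucket_id(path: str) -> int:
    """Extract the bucket number from the assumed '<name>bucket<N>' naming."""
    return int(path.rsplit("bucket", 1)[1])


def read_file(path: str) -> Iterator[int]:
    """Reader logic kept separate from the dataset class (assumed design)."""
    yield from FILES[path]


def group_by_bucket(paths, num_buckets: int) -> List[List[str]]:
    """Return one list of file paths per bucket, indexed by bucket id."""
    groups: List[List[str]] = [[] for _ in range(num_buckets)]
    for path in sorted(paths):
        groups[bucket_id(path)].append(path)
    return groups


class SingleBucketedDataset:
    """Stand-in for a single RDD: partition i covers all files of bucket i."""

    def __init__(self, partitions: List[List[str]],
                 reader: Callable[[str], Iterator[int]]):
        self.partitions = partitions
        self.reader = reader

    @property
    def num_partitions(self) -> int:
        return len(self.partitions)

    def compute(self, index: int) -> Iterator[int]:
        # Read every file of this bucket within one partition,
        # so no per-bucket dataset, coalesce, or union is needed.
        for path in self.partitions[index]:
            yield from self.reader(path)


dataset = SingleBucketedDataset(group_by_bucket(FILES, 3), read_file)
for i in range(dataset.num_partitions):
    print(i, list(dataset.compute(i)))
```

Because the reader is passed in as a plain function, the same dataset class could in principle be reused with a different file format, which is the kind of decoupling the description asks for.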