[ https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115076#comment-15115076 ]
Takeshi Yamamuro edited comment on SPARK-12890 at 1/25/16 11:39 AM:
--------------------------------------------------------------------

I looked over the related code; the partition pruning optimization itself is already implemented in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L74.
However, DataFrameReader#parquet has no interface for passing partition information (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L321).

was (Author: maropu):
I looked over the related codes; ISTM that partition pruning optimization itself has been implemented in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L74.
However, there is no interface in DataFrame#parquet to pass partition information (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L321).

> Spark SQL query related to only partition fields should not scan the whole data.
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-12890
>                 URL: https://issues.apache.org/jira/browse/SPARK-12890
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Prakash Chockalingam
>
> I have a SQL query which has only partition fields. The query ends up
> scanning all the data which is unnecessary.
> Example: select max(date) from table, where the table is partitioned by date.
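
To illustrate what the quoted issue is asking for, here is a minimal Python sketch of answering `select max(date)` from partition directory names alone, without opening any data file. It is not Spark code: the function names are hypothetical, and it assumes a Hive-style `column=value` directory layout, ISO-formatted dates (so string order matches date order), and that every listed partition holds at least one row.

```python
from typing import Iterable, List, Optional


def partition_values(paths: Iterable[str], column: str) -> List[str]:
    """Collect the values of `column` from Hive-style paths such as
    'table/date=2016-01-24/part-00000.parquet'. No file is opened."""
    prefix = column + "="
    values = []
    for path in paths:
        for segment in path.split("/"):
            if segment.startswith(prefix):
                values.append(segment[len(prefix):])
                break  # one value per path is enough
    return values


def max_partition_value(paths: Iterable[str], column: str) -> Optional[str]:
    """Answer `select max(column)` from directory names only.

    Assumption: each partition directory is non-empty; an empty
    partition would make this answer differ from a real scan.
    """
    values = partition_values(paths, column)
    return max(values) if values else None


# Hypothetical file listing for a table partitioned by date.
example_paths = [
    "table/date=2016-01-23/part-00000.parquet",
    "table/date=2016-01-25/part-00000.parquet",
    "table/date=2016-01-24/part-00001.parquet",
]
latest = max_partition_value(example_paths, "date")
```

The point of the sketch is only that the answer depends on the file listing, not on the file contents, which is why a full scan is unnecessary for queries that touch partition fields alone.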