[ https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115076#comment-15115076 ]

Takeshi Yamamuro edited comment on SPARK-12890 at 1/25/16 1:32 PM:
-------------------------------------------------------------------

I looked over the related code; the partition pruning optimization itself is 
already implemented in 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L74.
However, DataFrameReader#parquet has no interface for passing partition 
information 
(https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L321).
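For reference, a minimal sketch of the reported behaviour against the 1.6-era 
API (the path and table name below are hypothetical):

    // A Parquet dataset partitioned by `date`; DataFrameReader#parquet only
    // accepts paths (roughly `def parquet(paths: String*): DataFrame`), so a
    // partition predicate cannot be handed over at read time.
    val df = sqlContext.read.parquet("/data/events")  // discovers date=* directories
    df.registerTempTable("events")
    // References only the partition column, yet scans every partition:
    sqlContext.sql("SELECT max(date) FROM events").show()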


> Spark SQL query related to only partition fields should not scan the whole 
> data.
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-12890
>                 URL: https://issues.apache.org/jira/browse/SPARK-12890
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Prakash Chockalingam
>
> I have a SQL query that references only partition fields. The query ends up 
> scanning all the data, which is unnecessary.
> Example: select max(date) from table, where the table is partitioned by date.
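For context, the partition values in such a layout are already encoded in the 
directory names found by partition discovery (hypothetical paths below), so a 
query that touches only the partition column could in principle be answered 
from the discovered partition metadata without reading any Parquet files:

    /table/date=2016-01-01/part-00000.gz.parquet
    /table/date=2016-01-02/part-00000.gz.parquet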


