[ https://issues.apache.org/jira/browse/SPARK-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307073#comment-14307073 ]

Apache Spark commented on SPARK-5614:
-------------------------------------

User 'ianluyan' has created a pull request for this issue:
https://github.com/apache/spark/pull/4394

> Predicate pushdown through Generate
> -----------------------------------
>
>                 Key: SPARK-5614
>                 URL: https://issues.apache.org/jira/browse/SPARK-5614
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Lu Yan
>
> Currently, Catalyst's optimizer rules cannot push predicates through 
> "Generate" nodes. Furthermore, partition pruning in HiveTableScan cannot be 
> applied to queries that involve "Generate". This makes such queries very 
> inefficient.
> For example, the physical plan for the query
> {quote}
> select len, bk
> from s_server lateral view explode(len_arr) len_table as len 
> where len > 5 and day = '20150102';
> {quote}
> where 'day' is a partition column in the metastore, looks like this in the 
> current version of Spark SQL:
> {quote}
> Project [len, bk]
> Filter ((len > "5") && (day = "20150102"))
> Generate explode(len_arr), true, false
> HiveTableScan [bk, len_arr, day], (MetastoreRelation default, s_server, None), None
> {quote}
> But theoretically the plan should look like this:
> {quote}
> Project [len, bk]
> Filter (len > "5")
> Generate explode(len_arr), true, false
> HiveTableScan [bk, len_arr, day], (MetastoreRelation default, s_server, None), Some(day = "20150102")
> {quote} 
> Here the partition-pruning predicate is pushed down to the HiveTableScan node.
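> The fix could work roughly like the optimizer rule sketched below: split the 
> Filter's conjuncts and push down those that reference only the Generate's 
> child, keeping the rest above the Generate. This is only a minimal sketch 
> against 1.2-era Catalyst APIs; the rule name PushPredicateThroughGenerate is 
> illustrative and not necessarily what the pull request above uses.
> {quote}
> import org.apache.spark.sql.catalyst.expressions.{And, PredicateHelper}
> import org.apache.spark.sql.catalyst.plans.logical.{Filter, Generate, LogicalPlan}
> import org.apache.spark.sql.catalyst.rules.Rule
>
> object PushPredicateThroughGenerate extends Rule[LogicalPlan] with PredicateHelper {
>   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
>     case filter @ Filter(condition, g: Generate) =>
>       // Conjuncts that reference only the Generate's child (e.g. the
>       // partition column 'day') can be evaluated below the Generate;
>       // conjuncts on generated output (e.g. 'len') must stay above it.
>       val (pushDown, stayUp) = splitConjunctivePredicates(condition)
>         .partition(_.references subsetOf g.child.outputSet)
>       if (pushDown.isEmpty) {
>         filter
>       } else {
>         val pushed = g.copy(child = Filter(pushDown.reduce(And), g.child))
>         if (stayUp.isEmpty) pushed else Filter(stayUp.reduce(And), pushed)
>       }
>   }
> }
> {quote}
> With such a rule in place, the (day = "20150102") conjunct ends up directly 
> above the HiveTableScan, where partition pruning can pick it up as in the 
> plan above.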
> I've developed a solution for this issue. If you guys do not have a plan for 
> this already, I could merge the solution back into master.
> There is also a problem with column pruning for "Generate"; I will file 
> another issue about that.


