[ https://issues.apache.org/jira/browse/SPARK-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307073#comment-14307073 ]
Apache Spark commented on SPARK-5614: ------------------------------------- User 'ianluyan' has created a pull request for this issue: https://github.com/apache/spark/pull/4394 > Predicate pushdown through Generate > ----------------------------------- > > Key: SPARK-5614 > URL: https://issues.apache.org/jira/browse/SPARK-5614 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.2.0 > Reporter: Lu Yan > > Now in Catalyst's rules, predicates can not be pushed through "Generate" > nodes. Further more, partition pruning in HiveTableScan can not be applied on > those queries involves "Generate". This makes such queries very inefficient. > For example, physical plan for query > {quote} > select len, bk > from s_server lateral view explode(len_arr) len_table as len > where len > 5 and day = '20150102'; > {quote} > where 'day' is a partition column in metastore is like this in current > version of Spark SQL: > {quote} > Project [len, bk] > Filter ((len > "5") && "(day = "20150102")") > Generate explode(len_arr), true, false > HiveTableScan [bk, len_arr, day], (MetastoreRelation default, s_server, > None), None > {quote} > But theoretically the plan should be like this > {quote} > Project [len, bk] > Filter (len > "5") > Generate explode(len_arr), true, false > HiveTableScan [bk, len_arr, day], (MetastoreRelation default, s_server, > None), Some(day = "20150102") > {quote} > Where partition pruning predicates can be pushed to HiveTableScan nodes. > I've developed a solution on this issue. If you guys do not have a plan for > this already, I could merge the solution back to master. > And there is also a problem on column pruning for "Generate", I would file > another issue about that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org