[ https://issues.apache.org/jira/browse/SPARK-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-8287: ------------------------------------ Summary: Filters not pushed with substitution through aggregation (was: Filter not push down through Subquery or View) > Filters not pushed with substitution through aggregation > -------------------------------------------------------- > > Key: SPARK-8287 > URL: https://issues.apache.org/jira/browse/SPARK-8287 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.3.1 > Reporter: Li Sheng > Fix For: 1.4.0 > > Original Estimate: 40h > Remaining Estimate: 40h > > Filter not push down through Subquery or View. Assume we have two big > partitioned table join inner a Subquery or a View and filter not push down, > this will cause a full partition join and will cause performance issues. > Let me give and example that can reproduce the problem: > {code:sql} > create table src(key int, value string); > -- Creates partitioned table and imports data > CREATE TABLE src_partitioned1 (key int, value STRING) PARTITIONED BY (ds > STRING); > insert overwrite table src_partitioned1 PARTITION (ds='1') select key, value > from src; > insert overwrite table src_partitioned1 PARTITION (ds='2') select key, value > from src; > CREATE TABLE src_partitioned2 (key int, value STRING) PARTITIONED BY (ds > STRING); > insert overwrite table src_partitioned2 PARTITION (ds='1') select key, value > from src; > insert overwrite table src_partitioned2 PARTITION (ds='2') select key, value > from src; > -- Creates views > create view src_view as select sum(a.key) s1, sum(b.key) s2, b.ds ds from > src_partitioned1 a join src_partitioned2 b on a.ds = b.ds group by b.ds > create view src_view_1 as select sum(a.key) s1, sum(b.key) s2, b.ds my_ds > from src_partitioned1 a join src_partitioned2 b on a.ds = b.ds group by b.ds > -- QueryExecution > select * from dw.src_view where ds='2' > {code} > {noformat} > sql("select * from dw.src_view where ds='2' ").queryExecution > == Parsed Logical Plan == > 'Project [*] > 'Filter ('ds = 2) > 'UnresolvedRelation [dw,src_view], None > == Analyzed Logical Plan == > Project [s1#60L,s2#61L,ds#62] > Filter (ds#62 = 2) > Subquery src_view > Aggregate [ds#66], [SUM(CAST(key#64, LongType)) AS s1#60L,SUM(CAST(key#67, > LongType)) AS s2#61L,ds#66 AS ds#62] > Join Inner, Some((ds#63 = ds#66)) > MetastoreRelation dw, src_partitioned1, Some(a) > MetastoreRelation dw, src_partitioned2, Some(b) > == Optimized Logical Plan == > Filter (ds#62 = 2) > Aggregate [ds#66], [SUM(CAST(key#64, LongType)) AS s1#60L,SUM(CAST(key#67, > LongType)) AS s2#61L,ds#66 AS ds#62] > Project [ds#66,key#64,key#67] > Join Inner, Some((ds#63 = ds#66)) > Project [key#64,ds#63] > MetastoreRelation dw, s... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org