[ https://issues.apache.org/jira/browse/IMPALA-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fang-Yu Rao updated IMPALA-9281: -------------------------------- Comment: was deleted (was: After some initial investigation, I found that the field of {{assignedConjunctsByTupleId}} of {{Analyzer}}, which according to the comment stores all registered inferred conjuncts is always empty throughout every query described above (https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L358-L362). Moreover, the method {{createInferredEqPred()}} at https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L1364-L1375 has never been called during the planning. In the following I also provide the queries I have tested. {code:java} select * from myview_1_on_2_parquet_tables a, myview_2_on_2_parquet_tables b where a.c2 = b.c2a and a.c2 = 'one'; {code} I expected to see an inferred predicate of {{b.c2a = 'one'}} for the query above but did not see it after the planning. {code:java} select * from pt1, pta1 where pt1.c1 = pta1.c1a and pt1.c1 = 1; {code} I expected to see an inferred predicate of {{pta1.c1a = 1}} for the query above but did not see it after the planning. {code:java} select * from pt1, pta1 where pt1.c2 = pta1.c2a and pt1.c2 = 'one'; {code} I expected to see an inferred predicate of {{pta1.c2a = 'one'}} for the query above but did not see it after the planning. ) > Inferred predicates not assigned to scan nodes when views are involved > ---------------------------------------------------------------------- > > Key: IMPALA-9281 > URL: https://issues.apache.org/jira/browse/IMPALA-9281 > Project: IMPALA > Issue Type: Bug > Components: Frontend > Affects Versions: Impala 3.4.0 > Reporter: Fang-Yu Rao > Assignee: Fang-Yu Rao > Priority: Major > Attachments: profile_query_1_parquet.txt, profile_query_2_parquet.txt > > > When a query involves the join of views each created based on multiple > tables, the inferred predicate(s) is(are) not assigned to the scan node(s). > This issue is/seems related to > https://issues.apache.org/jira/browse/IMPALA-4578#. > In the following a minimum example to reproduce the phenomenon. > {code:java} > CREATE TABLE default.pt1 ( > c1 INT, > c2 STRING > ) > STORED AS PARQUET; > insert into pt1 values (1, 'one'); > CREATE TABLE default.pt2 ( > c1 INT, > c2 STRING > ) > STORED AS PARQUET; > insert into pt2 values (2, 'two'); > CREATE TABLE default.pta1 ( > c1a INT, > c2a STRING > ) > STORED AS PARQUET; > insert into pta1 values (1,'one'); > CREATE TABLE default.pta2 ( > c1a INT, > c2a STRING > ) > STORED AS PARQUET; > insert into pta2 values (2,'two'); > CREATE VIEW myview_1_on_2_parquet_tables AS > SELECT 'ONE' table_source, c1, c2 FROM `default`.pt1 > UNION ALL > SELECT 'TWO' table_source, c1, c2 FROM `default`.pt2; > CREATE VIEW myview_2_on_2_parquet_tables AS > SELECT 'ONE' table_source_a, c1a, c2a FROM `default`.pta1 > UNION ALL > SELECT 'TWO' table_source_a, c1a, c2a FROM `default`.pta2; > {code} > For easy reference, the contents of tables {{pt1}}, {{pt2}}, {{pta1}}, > {{pta2}}, and views {{myview_1_on_2_tables}}, {{myview_2_on_2_tables}} are > also given as follows. > Contents of table {{pt1}} afterwards: > {code:java} > +----+-----+ > | c1 | c2 | > +----+-----+ > | 1 | one | > +----+-----+ > {code} > Contents of table {{pt2}} afterwards: > {code:java} > +----+-----+ > | c1 | c2 | > +----+-----+ > | 2 | two | > +----+-----+ > {code} > Contents of table {{pta1}} afterwards: > {code:java} > +-----+-----+ > | c1a | c2a | > +-----+-----+ > | 1 | one | > +-----+-----+ > {code} > Contents of table {{pta2}} afterwards: > {code:java} > +-----+-----+ > | c1a | c2a | > +-----+-----+ > | 2 | two | > +-----+-----+ > {code} > Contents in {{myview_1_on_2_parquet_tables}} (union of tables {{t1}} and > {{t2}}): > {code:java} > +--------------+----+-----+ > | table_source | c1 | c2 | > +--------------+----+-----+ > | ONE | 1 | one | > | TWO | 2 | two | > +--------------+----+-----+ > {code} > Contents in {{myview_2_on_2_parquet_tables}} (union of tables {{ta1}} and > {{ta2}}): > {code:java} > +----------------+-----+-----+ > | table_source_a | c1a | c2a | > +----------------+-----+-----+ > | ONE | 1 | one | > | TWO | 2 | two | > +----------------+-----+-----+ > {code} > After creating the related tables and views described above, we consider the > following 2 queries. > Query 1: > {code:java} > select * > from default.myview_1_on_2_parquet_tables a, myview_2_on_2_parquet_tables b > where a.table_source = 'ONE' > and a.table_source = b.table_source_a; > {code} > Query 2: > {code:java} > select * > from default.myview_1_on_2_parquet_tables a, myview_2_on_2_parquet_tables b > where a.table_source = 'ONE' > and b.table_source_a = 'ONE' > and a.table_source = b.table_source_a; > {code} > Both queries join those 2 views on the column {{table_source}} and filter out > those rows not satisfying {{table_source = 'ONE'}}. Both queries produce the > same result set as the following. > {code:java} > +--------------+----+-----+----------------+-----+-----+ > | table_source | c1 | c2 | table_source_a | c1a | c2a | > +--------------+----+-----+----------------+-----+-----+ > | ONE | 1 | one | ONE | 1 | one | > +--------------+----+-----+----------------+-----+-----+ > {code} > However, according to the query profile, Query 1 results in 3 scans on tables > {{pt1}}, {{pta1}}, and {{pta2}}, respectively. On the other hand, Query 2 > that incorporates the additional/redundant predicate "{{b.table_source_a = > 'ONE}}'" only involves 2 scans on tables {{pt1}} and {{pta1}}, respectively > due to this seemingly redundant predicate on {{b.table_source_a}}. > Hence, it can be seen that the plan generated from Query 1 is sub-optimal > since a table that cannot contain any row in the result set is still scanned, > i.e., table {{pta2}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org