[ 
https://issues.apache.org/jira/browse/IMPALA-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016353#comment-17016353
 ] 

Fang-Yu Rao commented on IMPALA-9281:
-------------------------------------

After some initial investigation, I found that the field of 
{{assignedConjunctsByTupleId}} of {{Analyzer}}, which according to the comment 
stores all registered inferred conjuncts is always empty throughout every query 
described above 
(https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L358-L362).

Moreover, the method {{createInferredEqPred()}} at 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L1364-L1375
 has never been called during the planning.

Since the example provided in the problem description is Kudu specific and the 
problem we are having here is not only specific to Kudu, I will update the 
description accordingly to make it easier for us to reproduce the issue.


> Inferred predicates not assigned to scan nodes when views are involved
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-9281
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9281
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major
>         Attachments: profile_query_1_kudu.txt, profile_query_1_parquet.txt, 
> profile_query_1_predicate_on_table_column_kudu.txt, 
> profile_query_1_predicate_on_table_column_parquet.txt, 
> profile_query_2_kudu.txt, profile_query_2_parquet.txt
>
>
> When a query involves the join of views each created based on multiple 
> tables, the inferred predicate(s) is(are) not assigned to the scan node(s). 
> This issue is/seems related to 
> https://issues.apache.org/jira/browse/IMPALA-4578#.
> In the following a minimum example to reproduce the phenomenon using views 
> created based on Kudu tables is given. However, we note that this issue is 
> not Kudu specific, it also exists when the underlying tables are of Parquet 
> format. The respective query profiles are also attached.
> To create the (Kudu) tables in the provided example below, please replace the 
> address(es) of the Kudu master(s) accordingly.
> {code:java}
> CREATE TABLE default.t1 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU 
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into t1 values (1, 'one');
> CREATE TABLE default.t2 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU 
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into t2 values (2, 'two');
> CREATE TABLE default.ta1 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into ta1 values (1,'one');
> CREATE TABLE default.ta2 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into ta2 values (2,'two');
> CREATE VIEW myview_1_on_2_tables AS  
> SELECT 'ONE' table_source, c1, c2 FROM `default`.t1 
> UNION ALL 
> SELECT 'TWO' table_source, c1, c2 FROM `default`.t2;
> CREATE VIEW myview_2_on_2_tables AS  
> SELECT 'ONE' table_source, c1, c2 FROM `default`.ta1 
> UNION ALL 
> SELECT 'TWO' table_source, c1, c2 FROM `default`.ta2;
> {code}
> For easy reference, the contents of tables {{t1}}, {{t2}}, {{ta1}}, {{ta2}}, 
> and views {{myview_1_on_2_tables}}, {{myview_2_on_2_tables}} are also given 
> as follows.
> Contents of table {{t1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 1  | one |
> +----+-----+
> {code}
> Contents of table {{t2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 2  | two |
> +----+-----+
> {code}
> Contents of table {{ta1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 1  | one |
> +----+-----+
> {code}
> Contents of table {{ta2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 2  | two |
> +----+-----+
> {code}
> Contents in {{default.myview_1_on_2_tables}} (union of tables {{t1}} and 
> {{t2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2  |
> +--------------+----+-----+
> | ONE          | 1  | one |
> | TWO          | 2  | two |
> +--------------+----+-----+
> {code}
> Contents in {{default.myview_2_on_2_tables}} (union of tables {{ta1}} and 
> {{ta2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2  |
> +--------------+----+-----+
> | ONE          | 1  | one |
> | TWO          | 2  | two |
> +--------------+----+-----+
> {code}
> After creating the related tables and views described above, we consider the 
> following 2 queries.
> Query 1:
> {code:java}
> select * 
> from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
> where a.table_source = 'ONE' 
> and a.table_source = b.table_source 
> and a.c2 = 'one';
> {code}
> Query 2:
> {code:java}
> select * 
> from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
> where a.table_source = 'ONE' 
> and b.table_source = 'ONE' 
> and a.table_source = b.table_source 
> and a.c2 = 'one';
> {code}
> Both queries join those 2 views on the column {{table_source}} and filter out 
> those rows not satisfying {{table_source = 'ONE'}}. Both queries produce the 
> same result set as the following.
> {code:java}
> +--------------+----+-----+--------------+----+-----+
> | table_source | c1 | c2  | table_source | c1 | c2  |
> +--------------+----+-----+--------------+----+-----+
> | ONE          | 1  | one | ONE          | 1  | one |
> +--------------+----+-----+--------------+----+-----+
> {code}
> However, according to the query profile, Query 1 results in 3 Kudu scans on 
> tables {{t1}}, {{ta1}}, and {{ta2}}, respectively. On the other hand, Query 2 
> that incorporates the additional/redundant predicate "{{b.table_source = 
> 'ONE}}'" only involves 2 Kudu scans on tables {{t1}} and {{ta1}}, 
> respectively due to this seemingly redundant predicate on {{b.table_source}}.
> Hence, it can be seen that the plan generated from Query 1 is sub-optimal 
> since a table that cannot contain any row in the result set is still scanned, 
> i.e., table {{ta2}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to