[jira] [Issue Comment Deleted] (IMPALA-9281) Inferred predicates not assigned to scan nodes when views are involved

Fang-Yu Rao (Jira) Wed, 15 Jan 2020 17:54:29 -0800


     [ 
https://issues.apache.org/jira/browse/IMPALA-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Fang-Yu Rao updated IMPALA-9281:
--------------------------------
    Comment: was deleted

(was: After some initial investigation, I found that the field of 
{{assignedConjunctsByTupleId}} of {{Analyzer}}, which according to the comment 
stores all registered inferred conjuncts is always empty throughout every query 
described above 
(https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L358-L362).

Moreover, the method {{createInferredEqPred()}} at 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L1364-L1375
 has never been called during the planning.

In the following I also provide the queries I have tested.

{code:java}
select * 
from myview_1_on_2_parquet_tables a, myview_2_on_2_parquet_tables b 
where a.c2 = b.c2a 
and a.c2 = 'one';
{code}
I expected to see an inferred predicate of {{b.c2a = 'one'}} for the query 
above but did not see it after the planning.

{code:java}
select * 
from pt1, pta1 
where pt1.c1 = pta1.c1a  
and pt1.c1 = 1;
{code}
I expected to see an inferred predicate of {{pta1.c1a = 1}} for the query above 
but did not see it after the planning.

{code:java}
select * 
from pt1, pta1 
where pt1.c2 = pta1.c2a 
and pt1.c2 = 'one';
{code}
I expected to see an inferred predicate of {{pta1.c2a = 'one'}} for the query 
above but did not see it after the planning.
)

> Inferred predicates not assigned to scan nodes when views are involved
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-9281
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9281
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major
>         Attachments: profile_query_1_parquet.txt, profile_query_2_parquet.txt
>
>
> When a query involves the join of views each created based on multiple 
> tables, the inferred predicate(s) is(are) not assigned to the scan node(s). 
> This issue is/seems related to 
> https://issues.apache.org/jira/browse/IMPALA-4578#.
> In the following a minimum example to reproduce the phenomenon.
> {code:java}
> CREATE TABLE default.pt1 (
>    c1 INT,
>    c2 STRING
> ) 
> STORED AS PARQUET;
> insert into pt1 values (1, 'one');
> CREATE TABLE default.pt2 (
>    c1 INT,
>    c2 STRING
> ) 
> STORED AS PARQUET;
> insert into pt2 values (2, 'two');
> CREATE TABLE default.pta1 (
>    c1a INT, 
>    c2a STRING
> )
> STORED AS PARQUET;
> insert into pta1 values (1,'one');
> CREATE TABLE default.pta2 (
>    c1a INT, 
>    c2a STRING
> )
> STORED AS PARQUET;
> insert into pta2 values (2,'two');
> CREATE VIEW myview_1_on_2_parquet_tables AS 
> SELECT 'ONE' table_source, c1, c2 FROM `default`.pt1 
> UNION ALL 
> SELECT 'TWO' table_source, c1, c2 FROM `default`.pt2;
> CREATE VIEW myview_2_on_2_parquet_tables AS  
> SELECT 'ONE' table_source_a, c1a, c2a FROM `default`.pta1 
> UNION ALL 
> SELECT 'TWO' table_source_a, c1a, c2a FROM `default`.pta2;
> {code}
> For easy reference, the contents of tables {{pt1}}, {{pt2}}, {{pta1}}, 
> {{pta2}}, and views {{myview_1_on_2_tables}}, {{myview_2_on_2_tables}} are 
> also given as follows.
> Contents of table {{pt1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 1  | one |
> +----+-----+
> {code}
> Contents of table {{pt2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 2  | two |
> +----+-----+
> {code}
> Contents of table {{pta1}} afterwards:
> {code:java}
> +-----+-----+
> | c1a | c2a |
> +-----+-----+
> | 1   | one |
> +-----+-----+
> {code}
> Contents of table {{pta2}} afterwards:
> {code:java}
> +-----+-----+
> | c1a | c2a |
> +-----+-----+
> | 2   | two |
> +-----+-----+
> {code}
> Contents in {{myview_1_on_2_parquet_tables}} (union of tables {{t1}} and 
> {{t2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2  |
> +--------------+----+-----+
> | ONE          | 1  | one |
> | TWO          | 2  | two |
> +--------------+----+-----+
> {code}
> Contents in {{myview_2_on_2_parquet_tables}} (union of tables {{ta1}} and 
> {{ta2}}):
> {code:java}
> +----------------+-----+-----+
> | table_source_a | c1a | c2a |
> +----------------+-----+-----+
> | ONE            | 1   | one |
> | TWO            | 2   | two |
> +----------------+-----+-----+
> {code}
> After creating the related tables and views described above, we consider the 
> following 2 queries.
> Query 1:
> {code:java}
> select * 
> from default.myview_1_on_2_parquet_tables a, myview_2_on_2_parquet_tables b 
> where a.table_source = 'ONE' 
> and a.table_source = b.table_source_a;
> {code}
> Query 2:
> {code:java}
> select * 
> from default.myview_1_on_2_parquet_tables a, myview_2_on_2_parquet_tables b 
> where a.table_source = 'ONE' 
> and b.table_source_a = 'ONE' 
> and a.table_source = b.table_source_a;
> {code}
> Both queries join those 2 views on the column {{table_source}} and filter out 
> those rows not satisfying {{table_source = 'ONE'}}. Both queries produce the 
> same result set as the following.
> {code:java}
> +--------------+----+-----+----------------+-----+-----+
> | table_source | c1 | c2  | table_source_a | c1a | c2a |
> +--------------+----+-----+----------------+-----+-----+
> | ONE          | 1  | one | ONE            | 1   | one |
> +--------------+----+-----+----------------+-----+-----+
> {code}
> However, according to the query profile, Query 1 results in 3 scans on tables 
> {{pt1}}, {{pta1}}, and {{pta2}}, respectively. On the other hand, Query 2 
> that incorporates the additional/redundant predicate "{{b.table_source_a = 
> 'ONE}}'" only involves 2 scans on tables {{pt1}} and {{pta1}}, respectively 
> due to this seemingly redundant predicate on {{b.table_source_a}}.
> Hence, it can be seen that the plan generated from Query 1 is sub-optimal 
> since a table that cannot contain any row in the result set is still scanned, 
> i.e., table {{pta2}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Issue Comment Deleted] (IMPALA-9281) Inferred predicates not assigned to scan nodes when views are involved

Reply via email to