[ 
https://issues.apache.org/jira/browse/IMPALA-10064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189818#comment-17189818
 ] 

ASF subversion and git services commented on IMPALA-10064:
----------------------------------------------------------

Commit 5e9f10d34cc2ba6e18b469a3a5ae3ed9f5f306b1 in impala's branch 
refs/heads/master from Aman Sinha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5e9f10d ]

IMPALA-10064: Support constant propagation for eligible range predicates

This patch adds support for constant propagation of range predicates
involving date and timestamp constants. Previously, only equality
predicates were considered for propagation. The new type of propagation
is shown by the following example:

Before constant propagation:
 WHERE date_col = CAST(timestamp_col as DATE)
  AND timestamp_col BETWEEN '2019-01-01' AND '2020-01-01'
After constant propagation:
 WHERE date_col >= '2019-01-01' AND date_col <= '2020-01-01'
  AND timestamp_col >= '2019-01-01' AND timestamp_col <= '2020-01-01'
  AND date_col = CAST(timestamp_col as DATE)

As a consequence, since Impala supports table partitioning by date
columns but not timestamp columns, the above propagation enables
partition pruning based on timestamp ranges.

Existing code for equality based constant propagation was refactored
and consolidated into a new class which handles both equality and
range based constant propagation. Range based propagation is only
applied to date and timestamp columns.
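
For illustration only, the effect of the range propagation on partition
pruning can be sketched outside Impala. This is a minimal Python model, not
the planner code from this patch: the function names and the sample partition
list are hypothetical. It relies on one assumption, that CAST(ts AS DATE)
drops the time of day and is therefore non-decreasing in ts, so inclusive
timestamp bounds carry over to inclusive date bounds.

```python
from datetime import date, datetime


def date_range_from_timestamp_range(ts_low, ts_high):
    """Derive inclusive DATE bounds implied by an inclusive TIMESTAMP range.

    Hypothetical helper: models CAST(ts AS DATE) as truncation to the day,
    which preserves ordering (ts1 <= ts2 implies date(ts1) <= date(ts2)).
    """
    return ts_low.date(), ts_high.date()


def prune_partitions(partition_dates, ts_low, ts_high):
    """Keep only the date partitions that can hold rows in the timestamp range."""
    lo, hi = date_range_from_timestamp_range(ts_low, ts_high)
    return [d for d in partition_dates if lo <= d <= hi]


# Hypothetical partition values for a table partitioned by a date column.
partitions = [date(2018, 12, 31), date(2019, 7, 1), date(2020, 1, 1), date(2020, 1, 2)]

# Models: timestamp_col BETWEEN '2019-01-01' AND '2020-01-01'
kept = prune_partitions(partitions, datetime(2019, 1, 1), datetime(2020, 1, 1))
print(kept)
```

Without the derived date bounds, every partition in the list would have to
be scanned and filtered row by row on the timestamp predicate.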

Testing:
 - Added new range constant propagation tests to PlannerTest.
 - Added e2e test for range constant propagation based on a newly
   added date partitioned table.
 - Ran precommit tests.

Change-Id: I811a1f8d605c27c7704d7fc759a91510c6db3c2b
Reviewed-on: http://gerrit.cloudera.org:8080/16346
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Support constant propagation for range predicates
> -------------------------------------------------
>
>                 Key: IMPALA-10064
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10064
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>
> Consider the following table schema, view and 2 queries on the view:
> {noformat}
> create table tt1 (a1 int, b1 int, ts timestamp) partitioned by (mydate date);
> create view tt1_view as (select a1, b1, ts from tt1 where mydate = cast(ts as date));
> // query 1:  (Good) constant on ts gets propagated
> explain select * from tt1_view where ts = '2019-07-01';
> 00:SCAN HDFS [db1.tt1]
>    partition predicates: mydate = DATE '2019-07-01'
>    HDFS partitions=1/3 files=2 size=48B
>    predicates: db1.tt1.ts = TIMESTAMP '2019-07-01 00:00:00'
>    row-size=24B cardinality=1
> // query 2: (Not good) constant on ts does not get propagated
> explain select * from tt1_view where ts > '2019-07-01';
> 00:SCAN HDFS [db1.tt1]
>    HDFS partitions=3/3 files=4 size=96B
>    predicates: db1.tt1.ts > TIMESTAMP '2019-07-01 00:00:00', mydate = CAST(ts AS DATE)
>    row-size=28B cardinality=1
> {noformat}
> Note that in query 1, with the equality condition on 'ts', the constant value 
> is propagated to the 'mydate = CAST(ts as date)' predicate, which is then 
> applied as a partition predicate.  In query 2, which has a range 
> predicate, the constant is not propagated and no partition predicate is 
> created for the scan.  We should support constant propagation in the second 
> case as well.  Range predicates (>, >=, <, <=) involving date 
> or timestamp literals should be considered, but we have to analyze the cases 
> where the propagation is valid.  E.g., with date_add, date_diff type of 
> functions, is there a potential for incorrect propagation?
> Note that a predicate can be a BETWEEN condition such as:
> {noformat}
> WHERE ts >= '2019-07-01' AND ts <= '2020-07-01'
> {noformat}
> In this case both predicates need to be propagated.
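
One validity case for strict comparisons can be shown with a small sketch.
It is written in Python rather than Impala SQL and does not describe how the
planner actually handles this; the function name is hypothetical. Assuming
CAST(ts AS DATE) drops the time of day, a strict bound on ts only implies a
non-strict bound on the date: ts > '2019-07-01 00:00:00' still admits rows
whose date is 2019-07-01.

```python
from datetime import date, datetime


def propagate_bound(op, ts_const):
    """Map `ts <op> ts_const` to a sound bound on CAST(ts AS DATE).

    Hypothetical helper: strict operators are relaxed to their inclusive
    form, because truncating to the day can map distinct timestamps to the
    same date.
    """
    relaxed = {">": ">=", ">=": ">=", "<": "<=", "<=": "<="}[op]
    return relaxed, ts_const.date()


bound = propagate_bound(">", datetime(2019, 7, 1, 0, 0, 0))

# Counterexample to keeping the strict operator on the date side:
ts = datetime(2019, 7, 1, 0, 0, 1)               # one second after midnight
satisfies_ts_pred = ts > datetime(2019, 7, 1)    # the row passes ts > const
strict_date_pred = ts.date() > date(2019, 7, 1)  # a strict date bound would drop it
relaxed_date_pred = ts.date() >= bound[1]        # the relaxed bound keeps it
```

The relaxed bound is looser than the original predicate, so the timestamp
predicate still has to be evaluated on the rows that survive pruning.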



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
