[ 
https://issues.apache.org/jira/browse/IMPALA-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2108.
-----------------------------------
      Assignee:     (was: Bharath Vissapragada)
    Resolution: Duplicate

> Improve partition pruning by extracting partition-column filters from 
> non-trivial disjunctions.
> -----------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-2108
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2108
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 1.2.4, Impala 1.3, Impala 1.4, Impala 2.1, Impala 
> 2.2
>            Reporter: Alexander Behm
>            Priority: Minor
>              Labels: newbie, performance
>
> *Problem Statement*
> Impala fails to prune partitions if the partition-column filters are part of 
> a "non-trivial" disjunction where each disjunct itself consists of conjuncts 
> referencing both partition and non-partition columns.
> Consider the following example:
> {code}
> create table test_table (c1 INT, c2 STRING) PARTITIONED BY (pc INT);
> [localhost.localdomain:21000] > explain select c1 from test_table where (pc=1 
> and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c');
> Query: explain select c1 from test_table where (pc=1 and c2='a') or (pc=2 and 
> c2='b') or (pc=3 and c2='c') <-- Partition-column filters inside non-trivial 
> djsiunctions
> +----------------------------------------------------------------------------------------+
> | Explain String                                                              
>            |
> +----------------------------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=176.00MB VCores=1                   
>            |
> | WARNING: The following tables are missing relevant table and/or column 
> statistics.     |
> | default.test_table                                                          
>            |
> |                                                                             
>            |
> | 01:EXCHANGE [UNPARTITIONED]                                                 
>            |
> | |                                                                           
>            |
> | 00:SCAN HDFS [default.test_table]                                           
>            |
> |    partitions=5/5 files=9 size=36B                                          
>            |
> |    predicates: (pc = 1 AND c2 = 'a') OR (pc = 2 AND c2 = 'b') OR (pc = 3 
> AND c2 = 'c') |
> +----------------------------------------------------------------------------------------+
> Fetched 9 row(s) in 0.04s
> [localhost.localdomain:21000] > 
> {code}
> *Cause*
> This is a limitation in how Impala filters partitions.
> *Workaround*
> The above example can be fixed by manually rewriting the predicate as follows:
> {code}
> select c1 from test_table where ((pc=1 and c2='a') or (pc=2 and c2='b') or 
> (pc=3 and c2='c')) and (pc=1 OR pc=2 OR pc=3);
> {code}
> *Proposed fix*
> The proposed fix is for Impala to automatically do what is stated in the 
> workaround above:
> Extract the partition-column filters from the disjunctions, create a new 
> predicate with all those partition-column filters connected with OR, and add 
> the new predicate to the original one with AND.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to