[ https://issues.apache.org/jira/browse/IMPALA-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-2108. ----------------------------------- Assignee: (was: Bharath Vissapragada) Resolution: Duplicate > Improve partition pruning by extracting partition-column filters from > non-trivial disjunctions. > ----------------------------------------------------------------------------------------------- > > Key: IMPALA-2108 > URL: https://issues.apache.org/jira/browse/IMPALA-2108 > Project: IMPALA > Issue Type: Improvement > Components: Frontend > Affects Versions: Impala 1.2.4, Impala 1.3, Impala 1.4, Impala 2.1, Impala > 2.2 > Reporter: Alexander Behm > Priority: Minor > Labels: newbie, performance > > *Problem Statement* > Impala fails to prune partitions if the partition-column filters are part of > a "non-trivial" disjunction where each disjunct itself consists of conjuncts > referencing both partition and non-partition columns. > Consider the following example: > {code} > create table test_table (c1 INT, c2 STRING) PARTITIONED BY (pc INT); > [localhost.localdomain:21000] > explain select c1 from test_table where (pc=1 > and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c'); > Query: explain select c1 from test_table where (pc=1 and c2='a') or (pc=2 and > c2='b') or (pc=3 and c2='c') <-- Partition-column filters inside non-trivial > djsiunctions > +----------------------------------------------------------------------------------------+ > | Explain String > | > +----------------------------------------------------------------------------------------+ > | Estimated Per-Host Requirements: Memory=176.00MB VCores=1 > | > | WARNING: The following tables are missing relevant table and/or column > statistics. | > | default.test_table > | > | > | > | 01:EXCHANGE [UNPARTITIONED] > | > | | > | > | 00:SCAN HDFS [default.test_table] > | > | partitions=5/5 files=9 size=36B > | > | predicates: (pc = 1 AND c2 = 'a') OR (pc = 2 AND c2 = 'b') OR (pc = 3 > AND c2 = 'c') | > +----------------------------------------------------------------------------------------+ > Fetched 9 row(s) in 0.04s > [localhost.localdomain:21000] > > {code} > *Cause* > This is a limitation in how Impala filters partitions. > *Workaround* > The above example can be fixed by manually rewriting the predicate as follows: > {code} > select c1 from test_table where ((pc=1 and c2='a') or (pc=2 and c2='b') or > (pc=3 and c2='c')) and (pc=1 OR pc=2 OR pc=3); > {code} > *Proposed fix* > The proposed fix is for Impala to automatically do what is stated in the > workaround above: > Extract the partition-column filters from the disjunctions, create a new > predicate with all those partition-column filters connected with OR, and add > the new predicate to the original one with AND. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org