Rahul Challapalli created DRILL-3418: ----------------------------------------
Summary: Partition Pruning : We are over-pruning and this leads to wrong results Key: DRILL-3418 URL: https://issues.apache.org/jira/browse/DRILL-3418 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Rahul Challapalli Assignee: Steven Phillips Priority: Critical Fix For: 1.1.0 git.commit.id.abbrev=c199860 We are over-pruning based on the below plan. {code} explain plan for select * from `existing_partition_pruning/lineitem_hierarchical_intstring` where (dir0=1993 or dir1='jun') and (dir0=1991 or dir1='aug' or columns[0] > 5000); 00-00 Screen 00-01 Project(*=[$0]) 00-02 Project(T17¦¦*=[$0]) 00-03 SelectionVectorRemover 00-04 Filter(condition=[AND(OR(=($1, 1993), =($2, 'jun')), OR(=($1, 1991), =($2, 'aug'), >(ITEM($3, 0), 5000)))]) 00-05 Project(T17¦¦*=[$0], dir0=[$1], dir1=[$2], columns=[$3]) 00-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitem_hierarchical_intstring/0_0_26.parquet], ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitem_hierarchical_intstring/0_0_7.parquet]], selectionRoot=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitem_hierarchical_intstring, numFiles=2, columns=[`*`]]]) {code} We have 7 year partitions and each year has 12 further month partitions. I executed a count(*) query based on the above filters with & without partitioning in place and the values were different Without Partitions : {code} select count(*) from tbl_nopartitions where (dir0=1993 or dir1='jun') and (dir0=1991 or dir1='aug' or columns[0] > 5000); +---------+ | EXPR$0 | +---------+ | 14515 | +---------+ 1 row selected (0.903 seconds) {code} With Partitions : {code} select count(*) from `existing_partition_pruning/lineitem_hierarchical_intstring` where (dir0=1993 or dir1='jun') and (dir0=1991 or dir1='aug' or columns[0] > 5000); +---------+ | EXPR$0 | +---------+ | 1800 | +---------+ 1 row selected (0.49 seconds) {code} The data is larger than 10 MB to upload here -- This message was sent by Atlassian JIRA (v6.3.4#6332)