[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't
[ https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053774#comment-15053774 ]

Rahul Challapalli commented on DRILL-3410:
------------------------------------------

Verified the fix and added a test case.

> Partition Pruning : We are doing a prune when we shouldn't
> -----------------------------------------------------------
>
>                 Key: DRILL-3410
>                 URL: https://issues.apache.org/jira/browse/DRILL-3410
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Rahul Challapalli
>            Assignee: Steven Phillips
>            Priority: Critical
>             Fix For: 1.1.0
>
>         Attachments: DRILL-3410.patch, DRILL-3410_part2.patch, DRILL-3410_part2.patch, DRILL-3410_part2.patch
>
>
> git.commit.id.abbrev=60bc945
> The below plan does not look right. It should scan all the files based on the filters in the query. Also hive returned more rows than drill
> {code}
> explain plan for select * from `existing_partition_pruning/lineitempart` where (dir0=1993 and columns[0] >29600) or (dir0=1994 or columns[0]>29700);
> | 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(T70¦¦*=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[OR(AND(=($1, 1993), >(ITEM($2, 0), 29600)), =($1, 1994), >(ITEM($2, 0), 29700))])
> 00-05              Project(T70¦¦*=[$0], dir0=[$1], columns=[$2])
> 00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitempart/0_0_3.parquet], ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitempart/0_0_4.parquet]], selectionRoot=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitempart, numFiles=2, columns=[`*`]]])
> |
> {code}
> I attached the data set used. Let me know if you need anything more

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't
[ https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604818#comment-14604818 ]

Steven Phillips commented on DRILL-3410:
----------------------------------------

[~amansinha100], [~jnadeau], [~jni], could one of you review the new patch?

> Partition Pruning : We are doing a prune when we shouldn't
> -----------------------------------------------------------
>
>                 Key: DRILL-3410
>                 URL: https://issues.apache.org/jira/browse/DRILL-3410
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Rahul Challapalli
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 1.1.0
>
>         Attachments: DRILL-3410.patch, DRILL-3410_part2.patch
[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't
[ https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604956#comment-14604956 ]

Aman Sinha commented on DRILL-3410:
-----------------------------------

It would be better to use the utility methods RexUtil.composeConjunction() and composeDisjunction() since they handle empty lists etc. Do you need to use a LinkedList for the list of conjuncts/disjuncts? ArrayList should work.

> Partition Pruning : We are doing a prune when we shouldn't
> -----------------------------------------------------------
>
>                 Key: DRILL-3410
>                 URL: https://issues.apache.org/jira/browse/DRILL-3410
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Rahul Challapalli
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 1.1.0
>
>         Attachments: DRILL-3410.patch, DRILL-3410_part2.patch, DRILL-3410_part2.patch
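The review comment about RexUtil.composeConjunction() and composeDisjunction() handling empty lists can be illustrated with a small stand-alone model. This is only a sketch of the edge cases such a helper has to cover, using plain strings for predicates; the class `ComposeSketch` and its methods `conjunction` and `disjunction` are hypothetical names for illustration and are not Calcite's implementation or signatures.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a "compose" helper; predicates are plain strings here. */
final class ComposeSketch {

    /** AND of all terms: an empty list is trivially TRUE, a single term is returned as-is. */
    static String conjunction(List<String> terms) {
        return compose("AND", "TRUE", terms);
    }

    /** OR of all terms: an empty list is trivially FALSE, a single term is returned as-is. */
    static String disjunction(List<String> terms) {
        return compose("OR", "FALSE", terms);
    }

    private static String compose(String op, String identity, List<String> terms) {
        if (terms.isEmpty()) {
            return identity;          // nothing to combine
        }
        if (terms.size() == 1) {
            return terms.get(0);      // no operator needed around a single term
        }
        return op + "(" + String.join(", ", terms) + ")";
    }

    public static void main(String[] args) {
        // An ArrayList is enough here: terms are only appended and then iterated.
        List<String> conjuncts = new ArrayList<>();
        System.out.println(conjunction(conjuncts));
        conjuncts.add("=($1, 1993)");
        System.out.println(conjunction(conjuncts));
        conjuncts.add(">(ITEM($2, 0), 29600)");
        System.out.println(conjunction(conjuncts));
    }
}
```

Hand-rolled code that always wraps its operands in an operator call tends to miss the empty and single-element cases, which is presumably why the reviewer prefers the shared utility.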
[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't
[ https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605068#comment-14605068 ]

Aman Sinha commented on DRILL-3410:
-----------------------------------

Looks like there's an extraneous call: RexUtil.composeConjunction(builder, call.getOperands(), true);
Otherwise LGTM. +1.

> Partition Pruning : We are doing a prune when we shouldn't
> -----------------------------------------------------------
>
>                 Key: DRILL-3410
>                 URL: https://issues.apache.org/jira/browse/DRILL-3410
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Rahul Challapalli
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 1.1.0
>
>         Attachments: DRILL-3410.patch, DRILL-3410_part2.patch, DRILL-3410_part2.patch, DRILL-3410_part2.patch
[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't
[ https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604301#comment-14604301 ]

Steven Phillips commented on DRILL-3410:
----------------------------------------

Created reviewboard https://reviews.apache.org/r/35973/

> Partition Pruning : We are doing a prune when we shouldn't
> -----------------------------------------------------------
>
>                 Key: DRILL-3410
>                 URL: https://issues.apache.org/jira/browse/DRILL-3410
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Rahul Challapalli
>            Assignee: Steven Phillips
>            Priority: Critical
>             Fix For: 1.1.0
>
>         Attachments: DRILL-3410.patch
[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't
[ https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604302#comment-14604302 ]

Steven Phillips commented on DRILL-3410:
----------------------------------------

[~amansinha100], can you please review?

> Partition Pruning : We are doing a prune when we shouldn't
> -----------------------------------------------------------
>
>                 Key: DRILL-3410
>                 URL: https://issues.apache.org/jira/browse/DRILL-3410
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Rahul Challapalli
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 1.1.0
>
>         Attachments: DRILL-3410.patch
[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't
[ https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603909#comment-14603909 ]

Steven Phillips commented on DRILL-3410:
----------------------------------------

This appears to be because the FindPartitionConditions class, the code that walks the expression tree and determines whether pruning is valid, assumes that the binary operators OR and AND have only two arguments. But you can see from the expression in the plan:
{code}
OR(AND(=($1, 1993), >(ITEM($2, 0), 29600)), =($1, 1994), >(ITEM($2, 0), 29700))
{code}
that the expression was rewritten as a single OR operator with 3 arguments. Rewriting the expression with true binary operators seems to fix the problem. I will have a patch available shortly.

> Partition Pruning : We are doing a prune when we shouldn't
> -----------------------------------------------------------
>
>                 Key: DRILL-3410
>                 URL: https://issues.apache.org/jira/browse/DRILL-3410
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Rahul Challapalli
>            Assignee: Steven Phillips
>            Priority: Critical
>             Fix For: 1.1.0
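The rewrite described in the comment above, turning an OR or AND with more than two arguments into nested two-argument operators, might look roughly like the following. This is a minimal sketch under stated assumptions: `NaryToBinary` and its `Expr` type are a hypothetical toy expression tree, not Drill's FindPartitionConditions or Calcite's RexNode, and the left-nested fold is one possible choice rather than what the actual patch necessarily does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical mini expression tree used only to illustrate the rewrite. */
final class NaryToBinary {

    /** A node is either a leaf (no operands) or an operator call such as AND/OR. */
    static final class Expr {
        final String op;            // "AND", "OR", or the leaf text
        final List<Expr> operands;  // empty for leaves

        Expr(String op, List<Expr> operands) {
            this.op = op;
            this.operands = operands;
        }

        static Expr leaf(String text) {
            return new Expr(text, new ArrayList<>());
        }

        static Expr call(String op, Expr... operands) {
            return new Expr(op, Arrays.asList(operands));
        }

        @Override
        public String toString() {
            if (operands.isEmpty()) {
                return op;
            }
            StringBuilder sb = new StringBuilder(op).append('(');
            for (int i = 0; i < operands.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(operands.get(i));
            }
            return sb.append(')').toString();
        }
    }

    /** Recursively rewrites every AND/OR with more than two operands into left-nested binary calls. */
    static Expr toBinary(Expr e) {
        if (e.operands.isEmpty()) {
            return e;
        }
        List<Expr> rewritten = new ArrayList<>();
        for (Expr operand : e.operands) {
            rewritten.add(toBinary(operand));
        }
        boolean isLogical = e.op.equals("AND") || e.op.equals("OR");
        if (!isLogical || rewritten.size() <= 2) {
            return new Expr(e.op, rewritten);
        }
        // Fold left: OR(a, b, c) becomes OR(OR(a, b), c).
        Expr acc = rewritten.get(0);
        for (int i = 1; i < rewritten.size(); i++) {
            acc = Expr.call(e.op, acc, rewritten.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        // The three-argument OR from the plan in this issue.
        Expr filter = Expr.call("OR",
            Expr.call("AND", Expr.leaf("=($1, 1993)"), Expr.leaf(">(ITEM($2, 0), 29600)")),
            Expr.leaf("=($1, 1994)"),
            Expr.leaf(">(ITEM($2, 0), 29700)"));
        System.out.println(toBinary(filter));
    }
}
```

After such a rewrite, a tree walker that inspects only the first two operands of each AND/OR still sees every branch, including the `columns[0] > 29700` condition, which does not reference a partition column and so should prevent pruning for this query.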