[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't

2015-12-11 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053774#comment-15053774
 ] 

Rahul Challapalli commented on DRILL-3410:
--

Verified the fix and added a test case.

> Partition Pruning : We are doing a prune when we shouldn't
> --
>
> Key: DRILL-3410
> URL: https://issues.apache.org/jira/browse/DRILL-3410
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
>Priority: Critical
> Fix For: 1.1.0
>
> Attachments: DRILL-3410.patch, DRILL-3410_part2.patch, 
> DRILL-3410_part2.patch, DRILL-3410_part2.patch
>
>
> git.commit.id.abbrev=60bc945
> The plan below does not look right. Based on the filters in the query, it 
> should scan all the files. Also, Hive returned more rows than Drill.
> {code}
> explain plan for select * from `existing_partition_pruning/lineitempart` 
> where (dir0=1993 and columns[0] >29600) or (dir0=1994 or columns[0]>29700);
> | 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(T70¦¦*=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[OR(AND(=($1, 1993), >(ITEM($2, 0), 29600)), =($1, 1994), >(ITEM($2, 0), 29700))])
> 00-05              Project(T70¦¦*=[$0], dir0=[$1], columns=[$2])
> 00-06                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitempart/0_0_3.parquet],
>  ReadEntryWithPath 
> [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitempart/0_0_4.parquet]],
>  selectionRoot=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitempart,
>  numFiles=2, columns=[`*`]]])
>  |
> {code}
> I attached the data set used. Let me know if you need anything more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't

2015-06-28 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604818#comment-14604818
 ] 

Steven Phillips commented on DRILL-3410:


[~amansinha100], [~jnadeau], [~jni], could one of you review the new patch?

 Partition Pruning : We are doing a prune when we shouldn't
 --

 Key: DRILL-3410
 URL: https://issues.apache.org/jira/browse/DRILL-3410
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Rahul Challapalli
Assignee: Aman Sinha
Priority: Critical
 Fix For: 1.1.0

 Attachments: DRILL-3410.patch, DRILL-3410_part2.patch




[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't

2015-06-28 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604956#comment-14604956
 ] 

Aman Sinha commented on DRILL-3410:
---

It would be better to use the utility methods RexUtil.composeConjunction() and 
composeDisjunction(), since they handle empty lists and similar edge cases. Do 
you need a LinkedList for the list of conjuncts/disjuncts? An ArrayList should 
work.
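
The point about empty lists can be illustrated with a toy helper. This is only a sketch: ComposeSketch and its composeDisjunction method are hypothetical stand-ins that work on plain strings, not Calcite's RexUtil, and the choice to return null or "false" for an empty list is an assumption made for the illustration. It shows the three cases a hand-rolled loop tends to get wrong (no terms, one term, several terms), using an ArrayList to hold the disjuncts.

```java
import java.util.ArrayList;
import java.util.List;

final class ComposeSketch {

    // Hypothetical helper, loosely modelled on the idea of a
    // "compose disjunction" utility. It operates on plain strings,
    // not on Calcite RexNode objects.
    static String composeDisjunction(List<String> terms, boolean nullOnEmpty) {
        if (terms.isEmpty()) {
            // Assumption for this sketch: an empty OR means either
            // "no condition" (null) or a constant false.
            return nullOnEmpty ? null : "false";
        }
        if (terms.size() == 1) {
            // A single term needs no OR wrapper.
            return terms.get(0);
        }
        return "OR(" + String.join(", ", terms) + ")";
    }

    public static void main(String[] args) {
        List<String> disjuncts = new ArrayList<>();
        System.out.println(composeDisjunction(disjuncts, true));
        disjuncts.add("=($1, 1994)");
        System.out.println(composeDisjunction(disjuncts, true));
        disjuncts.add(">(ITEM($2, 0), 29700)");
        System.out.println(composeDisjunction(disjuncts, true));
    }
}
```

Centralising these cases in one helper means the caller never builds an OR with zero or one operand by accident.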

 Partition Pruning : We are doing a prune when we shouldn't
 --

 Key: DRILL-3410
 URL: https://issues.apache.org/jira/browse/DRILL-3410
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Rahul Challapalli
Assignee: Aman Sinha
Priority: Critical
 Fix For: 1.1.0

 Attachments: DRILL-3410.patch, DRILL-3410_part2.patch, 
 DRILL-3410_part2.patch




[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't

2015-06-28 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605068#comment-14605068
 ] 

Aman Sinha commented on DRILL-3410:
---

Looks like there's an extraneous call: RexUtil.composeConjunction(builder, 
call.getOperands(), true);
Otherwise LGTM. 
+1. 

 Partition Pruning : We are doing a prune when we shouldn't
 --

 Key: DRILL-3410
 URL: https://issues.apache.org/jira/browse/DRILL-3410
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Rahul Challapalli
Assignee: Aman Sinha
Priority: Critical
 Fix For: 1.1.0

 Attachments: DRILL-3410.patch, DRILL-3410_part2.patch, 
 DRILL-3410_part2.patch, DRILL-3410_part2.patch




[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't

2015-06-27 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604301#comment-14604301
 ] 

Steven Phillips commented on DRILL-3410:


Created reviewboard https://reviews.apache.org/r/35973/


 Partition Pruning : We are doing a prune when we shouldn't
 --

 Key: DRILL-3410
 URL: https://issues.apache.org/jira/browse/DRILL-3410
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Rahul Challapalli
Assignee: Steven Phillips
Priority: Critical
 Fix For: 1.1.0

 Attachments: DRILL-3410.patch




[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't

2015-06-27 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604302#comment-14604302
 ] 

Steven Phillips commented on DRILL-3410:


[~amansinha100], can you please review?

 Partition Pruning : We are doing a prune when we shouldn't
 --

 Key: DRILL-3410
 URL: https://issues.apache.org/jira/browse/DRILL-3410
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Rahul Challapalli
Assignee: Aman Sinha
Priority: Critical
 Fix For: 1.1.0

 Attachments: DRILL-3410.patch




[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't

2015-06-26 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603909#comment-14603909
 ] 

Steven Phillips commented on DRILL-3410:


The cause appears to be that the FindPartitionConditions class, which walks the 
expression tree and determines whether pruning is valid, assumes that the 
operators OR and AND always have exactly two arguments. But you can see from 
the expression in the plan:

{code}
OR(AND(=($1, 1993), >(ITEM($2, 0), 29600)), =($1, 1994), >(ITEM($2, 0), 29700))
{code}

that the expression was rewritten as a single OR operator with 3 arguments.

Rewriting the expression with true binary operators seems to fix the problem. I 
will have a patch available shortly.
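
For readers unfamiliar with the idea, rewriting an n-ary AND/OR into nested binary calls can be sketched as below. This is a minimal illustration, not the patch attached to this issue: Expr, Leaf, Call and toBinary are hypothetical names in a toy expression tree that stands in for Calcite's RexNode/RexCall, and the left-to-right nesting order is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy expression tree. Expr, Leaf and Call are hypothetical stand-ins
// for Calcite's RexNode/RexCall; they are not Drill classes.
final class BinaryRewriteSketch {

    interface Expr {}

    static final class Leaf implements Expr {
        final String text;
        Leaf(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    static final class Call implements Expr {
        final String op;            // "AND" or "OR"
        final List<Expr> operands;  // may hold more than two operands
        Call(String op, List<Expr> operands) {
            this.op = op;
            this.operands = operands;
        }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder(op).append('(');
            for (int i = 0; i < operands.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(operands.get(i));
            }
            return sb.append(')').toString();
        }
    }

    // Rewrites every AND/OR call so that it has exactly two operands,
    // nesting to the left: OR(a, b, c) becomes OR(OR(a, b), c).
    static Expr toBinary(Expr e) {
        if (!(e instanceof Call)) {
            return e;
        }
        Call call = (Call) e;
        if (call.operands.isEmpty()) {
            throw new IllegalArgumentException("call without operands: " + call.op);
        }
        List<Expr> rewritten = new ArrayList<>();
        for (Expr operand : call.operands) {
            rewritten.add(toBinary(operand));
        }
        Expr result = rewritten.get(0);
        for (int i = 1; i < rewritten.size(); i++) {
            result = new Call(call.op, Arrays.asList(result, rewritten.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        Expr eq1993 = new Leaf("=($1, 1993)");
        Expr gt29600 = new Leaf(">(ITEM($2, 0), 29600)");
        Expr eq1994 = new Leaf("=($1, 1994)");
        Expr gt29700 = new Leaf(">(ITEM($2, 0), 29700)");
        Expr and = new Call("AND", Arrays.asList(eq1993, gt29600));
        Expr or = new Call("OR", Arrays.asList(and, eq1994, gt29700));
        // Prints:
        // OR(OR(AND(=($1, 1993), >(ITEM($2, 0), 29600)), =($1, 1994)), >(ITEM($2, 0), 29700))
        System.out.println(toBinary(or));
    }
}
```

After the rewrite, a tree walker that looks only at the first two operands of each AND/OR sees every term of the original condition.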

 Partition Pruning : We are doing a prune when we shouldn't
 --

 Key: DRILL-3410
 URL: https://issues.apache.org/jira/browse/DRILL-3410
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Rahul Challapalli
Assignee: Steven Phillips
Priority: Critical
 Fix For: 1.1.0

