[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-12-21 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727104#comment-16727104
 ] 

Bridget Bevens commented on DRILL-6381:
---

Added the following content to the Apache Drill docs. I created content and
posted it to https://drill.apache.org/docs/querying-indexes/
with the following supporting pages:
https://drill.apache.org/docs/querying-indexes-introduction/
https://drill.apache.org/docs/queries-that-qualify-for-index-based-query-plans/
https://drill.apache.org/docs/types-of-indexes/
https://drill.apache.org/docs/index-selection/
https://drill.apache.org/docs/designing-indexes-for-your-queries/
https://drill.apache.org/docs/configuring-index-planning/
https://drill.apache.org/docs/verifying-index-use/

Please let me know if I need to make any changes.

Thanks,
Bridget

> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, the Drill planner should be enhanced to provide an
> abstraction layer that expresses the index metadata and statistics.  Further,
> a cost-based index selection is needed to decide which index(es) are
> suitable.
> On the execution side, appropriate operator enhancements would be needed to
> handle different categories of indexes, such as covering and non-covering
> indexes, taking into consideration that the index data may not be co-located
> with the primary table, i.e., a global index.
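The covering/non-covering distinction and the cost-based selection described above can be illustrated with a minimal Java sketch. It is not Drill's planner code: `IndexInfo`, `isCovering`, `estimateCost`, `chooseIndex`, the single flat `selectivity` value, and the per-row `lookupPenalty` for fetching rows back from the primary table are all hypothetical. The sketch assumes an index is covering when every column the query references is available from the index itself, so no lookup into the primary table is needed; such lookups get more expensive still when the index is global and not co-located with the table. It uses records, so it assumes Java 16 or later.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class IndexSelectionSketch {

  /** Hypothetical index description: key columns, included (non-key) columns, and an estimated selectivity. */
  record IndexInfo(String name, List<String> indexedColumns, Set<String> includedColumns, double selectivity) {
    /** Every column whose value can be read from the index alone. */
    Set<String> availableColumns() {
      Set<String> all = new HashSet<>(indexedColumns);
      all.addAll(includedColumns);
      return all;
    }
  }

  /** Covering: all columns referenced by the query are available in the index. */
  static boolean isCovering(IndexInfo index, Set<String> referencedColumns) {
    return index.availableColumns().containsAll(referencedColumns);
  }

  /**
   * Toy cost model (an assumption): rows read from the index, multiplied by (1 + lookupPenalty)
   * when the index is non-covering and every qualifying row must be fetched from the primary table.
   */
  static double estimateCost(IndexInfo index, Set<String> referencedColumns,
                             double tableRows, double lookupPenalty) {
    double rowsFromIndex = index.selectivity() * tableRows;
    return isCovering(index, referencedColumns) ? rowsFromIndex : rowsFromIndex * (1 + lookupPenalty);
  }

  /** Cheapest index that beats a full table scan (toy cost = tableRows); empty means "scan the table". */
  static Optional<IndexInfo> chooseIndex(List<IndexInfo> candidates, Set<String> referencedColumns,
                                         double tableRows, double lookupPenalty) {
    return candidates.stream()
        .filter(i -> estimateCost(i, referencedColumns, tableRows, lookupPenalty) < tableRows)
        .min(Comparator.comparingDouble(i -> estimateCost(i, referencedColumns, tableRows, lookupPenalty)));
  }

  public static void main(String[] args) {
    Set<String> referenced = Set.of("city", "zip");
    IndexInfo byCity = new IndexInfo("idx_city", List.of("city"), Set.of("zip"), 0.01);
    IndexInfo byState = new IndexInfo("idx_state", List.of("state"), Set.of(), 0.05);
    System.out.println(isCovering(byCity, referenced));   // true
    System.out.println(isCovering(byState, referenced));  // false
    System.out.println(chooseIndex(List.of(byCity, byState), referenced, 1_000_000, 10.0)
        .map(IndexInfo::name).orElse("full scan"));        // idx_city
  }
}
```

A real planner would derive selectivity from index statistics and model the lookup cost from data placement; the fixed penalty here only shows why a covering index tends to win when selectivities are comparable.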



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670397#comment-16670397
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

dvjyothsna closed pull request #1516: Fixed imports for DRILL-6381
URL: https://github.com/apache/drill/pull/1516
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBIndexDescriptor.java b/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBIndexDescriptor.java
index 75e6bc23973..ae386ab93a0 100644
--- a/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBIndexDescriptor.java
+++ b/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBIndexDescriptor.java
@@ -22,8 +22,8 @@
 import java.util.List;
 import java.util.Set;
 
-import com.google.common.collect.Lists;
-import com.google.common.collect.Sets;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
 
 import org.apache.calcite.plan.RelOptCost;
 import org.apache.calcite.plan.RelOptPlanner;
@@ -40,8 +40,8 @@
 import org.apache.drill.exec.util.EncodedSchemaPathSet;
 import org.apache.drill.common.expression.LogicalExpression;
 
-import com.google.common.base.Preconditions;
-import com.google.common.collect.ImmutableSet;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableSet;
 
 public class MapRDBIndexDescriptor extends DrillIndexDescriptor {
 
diff --git a/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBIndexDiscover.java b/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBIndexDiscover.java
index aed3e045a02..f828ba02daf 100644
--- a/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBIndexDiscover.java
+++ b/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBIndexDiscover.java
@@ -18,7 +18,7 @@
 
 package org.apache.drill.exec.planner.index;
 
-import com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
 import com.mapr.db.Admin;
 import com.mapr.db.MapRDB;
 import com.mapr.db.exceptions.DBException;
diff --git a/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBStatistics.java b/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBStatistics.java
index e129b968bf7..6fedaffd092 100644
--- a/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBStatistics.java
+++ b/contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBStatistics.java
@@ -17,11 +17,11 @@
  */
 package org.apache.drill.exec.planner.index;
 
-import com.google.common.base.Charsets;
-import com.google.common.base.Preconditions;
-import com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
 
-import com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
 import org.apache.calcite.plan.RelOptUtil;
 import org.apache.calcite.rel.RelNode;
 import org.apache.calcite.rel.metadata.RelMdUtil;
diff --git a/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBTableCache.java b/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBTableCache.java
index f35a4c41668..924d9c0e8e7 100644
--- a/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBTableCache.java
+++ b/contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBTableCache.java
@@ -17,11 +17,11 @@
  */
 package org.apache.drill.exec.store.mapr.db;
 
-import com.google.common.cache.CacheBuilder;
-import com.google.common.cache.CacheLoader;
-import com.google.common.cache.LoadingCache;
-import com.google.common.cache.RemovalListener;
-import com.google.common.cache.RemovalNotification;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheBuilder;
+import org.apache.drill.shaded.guava.com.google.common.cache.CacheLoader;
+import org.apache.drill.shaded.guava.com.google.common.cache.LoadingCache;
+import org.apache.drill.shaded.guava.com.google.common.cache.RemovalListener;
+import org.apache.drill.shaded.guava.com.google.common.cache.RemovalNotification;
 import com.mapr.db.Table;
 import com.mapr.db.impl.MapRDBImpl;
 

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670391#comment-16670391
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

dvjyothsna opened a new pull request #1516: Fixed imports for DRILL-6381
URL: https://github.com/apache/drill/pull/1516
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.15.0





[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664403#comment-16664403
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on issue #1466: DRILL-6381: Add support for index based 
planning and execution
URL: https://github.com/apache/drill/pull/1466#issuecomment-433234882
 
 
   Opened the new PR https://github.com/apache/drill/pull/1512 to do the
final merge.




> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0





[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664401#comment-16664401
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on issue #1512: DRILL-6381: Add support for index based 
planning and execution
URL: https://github.com/apache/drill/pull/1512#issuecomment-433234277
 
 
   Since the original PR has received +1 from committers,  I will be merging 
this one directly. 




> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0





[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664400#comment-16664400
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 opened a new pull request #1512: DRILL-6381: Add support for index 
based planning and execution
URL: https://github.com/apache/drill/pull/1512
 
 
   This PR is a replacement for the original PR
https://github.com/apache/drill/pull/1466. Please see that PR for all review
comments and resolutions. I created this new PR after addressing review
comments, doing a full rebase, resolving merge conflicts, and squashing a
subset of the commits. Pushing these to the original PR would have caused some
comments to be lost, hence the new one.
   




> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0





[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664386#comment-16664386
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on issue #1466: DRILL-6381: Add support for index based 
planning and execution
URL: https://github.com/apache/drill/pull/1466#issuecomment-433231667
 
 
   I will close this PR and open a separate one that is rebased on Apache
master. The reason is that after rebasing and squashing some commits, I would
have to force-push, which may clobber some of the review comments, and I want
to avoid that.




> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0





[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664369#comment-16664369
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r228358390
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java
 ##
 @@ -57,7 +57,9 @@ public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
 
 final int  rowWidth = child.getRowType().getFieldCount() * DrillCostBase.AVG_FIELD_WIDTH;
 final double cpuCost = broadcastFactor * DrillCostBase.SVR_CPU_COST * inputRows;
-final double networkCost = broadcastFactor * DrillCostBase.BYTE_NETWORK_COST * inputRows * rowWidth * numEndPoints;
+
+//we assume localhost network cost is 1/10 of regular network cost
+final double networkCost = broadcastFactor * DrillCostBase.BYTE_NETWORK_COST * inputRows * rowWidth * (numEndPoints - 0.9);
 
 Review comment:
   I have added comments in the code regarding this. 




> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0





[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664362#comment-16664362
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r228357573
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java
 ##
 @@ -57,7 +57,9 @@ public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
 
 final int  rowWidth = child.getRowType().getFieldCount() * DrillCostBase.AVG_FIELD_WIDTH;
 final double cpuCost = broadcastFactor * DrillCostBase.SVR_CPU_COST * inputRows;
-final double networkCost = broadcastFactor * DrillCostBase.BYTE_NETWORK_COST * inputRows * rowWidth * numEndPoints;
+
+//we assume localhost network cost is 1/10 of regular network cost
+final double networkCost = broadcastFactor * DrillCostBase.BYTE_NETWORK_COST * inputRows * rowWidth * (numEndPoints - 0.9);
 
 Review comment:
   @gparai, I forgot to respond to this. The cost formula is:
     (cost of broadcasting num_bytes to N - 1 nodes)
       + (cost of local broadcast to all minor fragments on my own node)
   = (C * num_bytes * (N - 1)) + (C * num_bytes * 0.1)
   = C * num_bytes * (N - 0.9)
   where the 0.1 factor comes from the assumption that a local broadcast costs
   10% of the network cost of a remote broadcast.

   While the formula seems reasonable, it biases the cost in favor of Broadcast
over HashPartition. We should revisit this, and ideally a similar change should
be made for HashPartition as well. I don't recall the exact use case that
motivated the change; it may have been Index Intersection. I can create a JIRA
to revisit this.
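The two network-cost formulas discussed in this comment can be compared with a small Java sketch. Here `c` stands in for `broadcastFactor * DrillCostBase.BYTE_NETWORK_COST` and `numBytes` for `inputRows * rowWidth`; the class and method names are hypothetical, and only the 0.1 local-broadcast factor comes from the comment itself.

```java
public class BroadcastCostSketch {

  /** Local (same-node) broadcast is assumed to cost 10% of a remote broadcast, per the review comment. */
  static final double LOCAL_COST_FACTOR = 0.1;

  /** Original formula: every endpoint, including the local one, is charged the full remote cost. */
  static double originalNetworkCost(double c, double numBytes, int numEndPoints) {
    return c * numBytes * numEndPoints;
  }

  /** Revised formula: (N - 1) remote sends plus one local send at 10% cost, i.e. C * numBytes * (N - 0.9). */
  static double revisedNetworkCost(double c, double numBytes, int numEndPoints) {
    double remote = c * numBytes * (numEndPoints - 1);
    double local = c * numBytes * LOCAL_COST_FACTOR;
    return remote + local;
  }

  public static void main(String[] args) {
    double c = 1.0;
    double numBytes = 1000.0;
    int n = 4;
    System.out.println(originalNetworkCost(c, numBytes, n)); // 4000.0
    System.out.println(revisedNetworkCost(c, numBytes, n));  // 3100.0
  }
}
```

Since only the broadcast side receives the (N - 0.9) discount, a broadcast plan becomes slightly cheaper relative to an unchanged hash-partition plan, which is the bias noted above.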




> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0





[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664336#comment-16664336
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r228350448
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java
 ##
 @@ -166,4 +169,25 @@ public void onMatch(RelOptRuleCall call) {
 return list;
   }
 
+  public static Project replace(Project topProject, Project bottomProject) {
 
 Review comment:
   Added javadoc. 




> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0





[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664335#comment-16664335
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r228349995
 
 

 ##
 File path: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelTrait;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.rules.ProjectRemoveRule;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil.ProjectPushInfo;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.List;
+
+public abstract class MapRDBPushProjectIntoScan extends StoragePluginOptimizerRule {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapRDBPushProjectIntoScan.class);
+
+  private MapRDBPushProjectIntoScan(RelOptRuleOperand operand, String description) {
+    super(operand, description);
+  }
+
+  public static final StoragePluginOptimizerRule PROJECT_ON_SCAN = new MapRDBPushProjectIntoScan(
+      RelOptHelper.some(ProjectPrel.class, RelOptHelper.any(ScanPrel.class)), "MapRDBPushProjIntoScan:Proj_On_Scan") {
+    @Override
+    public void onMatch(RelOptRuleCall call) {
+      final ScanPrel scan = (ScanPrel) call.rel(1);
+      final ProjectPrel project = (ProjectPrel) call.rel(0);
+      if (!(scan.getGroupScan() instanceof MapRDBGroupScan)) {
+        return;
+      }
+      doPushProjectIntoGroupScan(call, project, scan, (MapRDBGroupScan) scan.getGroupScan());
+      if (scan.getGroupScan() instanceof BinaryTableGroupScan) {
+        BinaryTableGroupScan groupScan = (BinaryTableGroupScan) scan.getGroupScan();
+
+      } else {
+        assert (scan.getGroupScan() instanceof JsonTableGroupScan);
+        JsonTableGroupScan groupScan = (JsonTableGroupScan) scan.getGroupScan();
+
+        doPushProjectIntoGroupScan(call, project, scan, groupScan);
+      }
+    }
+
+    @Override
+    public boolean matches(RelOptRuleCall call) {
+      final ScanPrel scan = (ScanPrel) call.rel(1);
+      if (scan.getGroupScan() instanceof BinaryTableGroupScan ||
+          scan.getGroupScan() instanceof JsonTableGroupScan) {
+        return super.matches(call);
+      }
+      return false;
+    }
+  };
+
+  protected void doPushProjectIntoGroupScan(RelOptRuleCall call,
+      ProjectPrel project, ScanPrel scan, MapRDBGroupScan groupScan) {
+    try {
+
+      DrillRelOptUtil.ProjectPushInfo columnInfo =
+          DrillRelOptUtil.getFieldsInformation(scan.getRowType(), project.getProjects());
+      if (columnInfo == null || Utilities.isStarQuery(columnInfo.getFields()) //
+          || !groupScan.canPushdownProjects(columnInfo.getFields())) {
+        return;
+      }
+      RelTraitSet newTraits = call.getPlanner().emptyTraitSet();
+      // Clear out collation trait
+      for (RelTrait trait : scan.getTraitSet()) {
+  
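Stripped of the Calcite types, the early-return guard in `doPushProjectIntoGroupScan` amounts to three conditions: column information must exist, the query must not be a star query, and the group scan must accept the projected columns. The Java sketch below restates only that guard. `ProjectPushdownGuardSketch`, `shouldPushProject`, and `isStarQuery` are hypothetical stand-ins; Drill's `Utilities.isStarQuery` works on `SchemaPath` objects, and this sketch approximates it with a simple check for a column literally named `*`.

```java
import java.util.List;
import java.util.function.Predicate;

public class ProjectPushdownGuardSketch {

  /** Simplified assumption: a star query projects a column literally named "*". */
  static boolean isStarQuery(List<String> fields) {
    return fields.contains("*");
  }

  /**
   * True only when the projection can be pushed into the scan: field info is present,
   * the query is not a star query, and the (hypothetical) group-scan predicate accepts the fields.
   */
  static boolean shouldPushProject(List<String> projectedFields,
                                   Predicate<List<String>> canPushdownProjects) {
    return projectedFields != null
        && !isStarQuery(projectedFields)
        && canPushdownProjects.test(projectedFields);
  }

  public static void main(String[] args) {
    Predicate<List<String>> acceptsAll = cols -> true;
    System.out.println(shouldPushProject(List.of("a", "b"), acceptsAll)); // true
    System.out.println(shouldPushProject(List.of("*"), acceptsAll));      // false
    System.out.println(shouldPushProject(null, acceptsAll));              // false
  }
}
```

The null check comes first so that the star-query test never runs on missing field information, mirroring the short-circuit order of the original `if`.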

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664320#comment-16664320
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r228345668
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/Partitioner.java
 ##
 @@ -29,6 +29,8 @@
 import org.apache.drill.exec.record.RecordBatch;
 
 public interface Partitioner {
+  int DEFAULT_RECORD_BATCH_SIZE = (1 << 10) - 1;
 
 Review comment:
   I think that was an omission.  I have re-inserted the comment but with some 
modifications because the batch-sizing project now allows operators to set the 
output batch size in terms of Mbytes rather than `recordCount`.  It is not yet 
applied across the board, so this `DEFAULT_RECORD_BATCH_SIZE` is still relevant 
for the exchange operators. 
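The constant `(1 << 10) - 1` evaluates to 1023, so the exchange operators still cap each outgoing batch at 1023 records. The Java sketch below shows that arithmetic and, purely as a hypothetical illustration, how a byte-based output budget (the direction the batch-sizing work is taking) could be turned into a row count that never exceeds the record-count default. `BatchSizeSketch`, `rowsPerBatch`, and its parameters are assumptions, not Drill's batch-sizing API.

```java
public class BatchSizeSketch {

  /** Same expression as Partitioner.DEFAULT_RECORD_BATCH_SIZE in the diff: 2^10 - 1 = 1023 records. */
  static final int DEFAULT_RECORD_BATCH_SIZE = (1 << 10) - 1;

  /**
   * Hypothetical: rows that fit in a byte budget, capped at the record-count default
   * and never below one row. Guards against a zero or negative row-width estimate.
   */
  static int rowsPerBatch(long outputBatchBytes, int estimatedRowWidthBytes) {
    long byBytes = outputBatchBytes / Math.max(1, estimatedRowWidthBytes);
    return (int) Math.max(1, Math.min(DEFAULT_RECORD_BATCH_SIZE, byBytes));
  }

  public static void main(String[] args) {
    System.out.println(DEFAULT_RECORD_BATCH_SIZE);            // 1023
    System.out.println(rowsPerBatch(16L * 1024 * 1024, 100)); // 1023 (capped by the default)
    System.out.println(rowsPerBatch(64L * 1024, 1024));       // 64 (limited by the byte budget)
  }
}
```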




> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning  
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, the Drill planner should be enhanced to provide an 
> abstraction layer that expresses the index metadata and statistics.  Further, 
> a cost-based index selection is needed to decide which index(es) are 
> suitable.  
> On the execution side, appropriate operator enhancements would be needed to 
> handle different categories of indexes such as covering and non-covering 
> indexes, taking into consideration that the index data may not be co-located with 
> the primary table, i.e. a global index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664316#comment-16664316
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r228343223
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdSelectivity.java
 ##
 @@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.cost;
+
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMdSelectivity;
+import org.apache.calcite.rel.metadata.RelMdUtil;
+import org.apache.calcite.rel.metadata.RelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+
+import java.util.List;
+
+public class DrillRelMdSelectivity extends RelMdSelectivity {
+  private static final DrillRelMdSelectivity INSTANCE = new DrillRelMdSelectivity();
+
+  public static final RelMetadataProvider SOURCE = ReflectiveRelMetadataProvider.reflectiveSource(BuiltInMethod.SELECTIVITY.method, INSTANCE);
+
+
+  public Double getSelectivity(RelNode rel, RexNode predicate) {
 
 Review comment:
   There were customizations needed for RelSubset, DrillScanRel and ScanPrel 
selectivities.  For all other nodes, we are calling the 
`super.getSelectivity()`. 
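The override pattern described here (special handling for a few node types, delegation to the parent for everything else) can be sketched without Calcite. All types below are hypothetical stand-ins: `SubsetNode` plays the role of `RelSubset`, `ScanNode` the role of `DrillScanRel`/`ScanPrel`, and `BaseSelectivity` the role of `RelMdSelectivity`. The 0.25 default is a placeholder, not a claim about Calcite's estimates, and delegating a subset to a "best" node is an assumption for illustration.

```java
// Hypothetical stand-ins for Calcite/Drill node types (not the real classes).
interface Node {}

final class SubsetNode implements Node {           // role of RelSubset
  final Node best;
  SubsetNode(Node best) { this.best = best; }
}

final class ScanNode implements Node {             // role of DrillScanRel / ScanPrel
  final double indexSelectivity;                   // e.g. an estimate from index statistics
  ScanNode(double indexSelectivity) { this.indexSelectivity = indexSelectivity; }
}

final class OtherNode implements Node {}

class BaseSelectivity {                            // role of RelMdSelectivity
  Double getSelectivity(Node rel, String predicate) {
    return predicate == null ? 1.0 : 0.25;         // placeholder default guess
  }
}

class CustomSelectivity extends BaseSelectivity {  // role of DrillRelMdSelectivity
  @Override
  Double getSelectivity(Node rel, String predicate) {
    if (rel instanceof SubsetNode) {
      // Delegate to the subset's chosen node.
      return getSelectivity(((SubsetNode) rel).best, predicate);
    }
    if (rel instanceof ScanNode && predicate != null) {
      return ((ScanNode) rel).indexSelectivity;
    }
    return super.getSelectivity(rel, predicate);   // all other nodes: default behavior
  }
}
```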




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664314#comment-16664314
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r228342002
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/mergereceiver/MergingRecordBatch.java
 ##
 @@ -535,7 +535,10 @@ public FragmentContext getContext() {
 
   @Override
   public BatchSchema getSchema() {
-    return outgoingContainer.getSchema();
+    if (outgoingContainer.hasSchema()) {
+      return outgoingContainer.getSchema();
+    }
+    return null;
 
 Review comment:
   A null value can occur, for example, when the `VectorContainer` already has 
vectors but the operator has not yet called `buildSchema()` on the container.  
There's a `Preconditions.checkNotNull` in 
`VectorContainer.getSchema()`. 
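The guard in the patched `getSchema()` can be illustrated with minimal stand-ins. In this sketch `ContainerSketch` and `BatchSketch` are hypothetical classes, a `String` stands in for `BatchSchema`, and `Objects.requireNonNull` substitutes for Guava's `Preconditions.checkNotNull` so the example stays self-contained.

```java
import java.util.Objects;

// Hypothetical stand-in for VectorContainer: the schema exists only after buildSchema().
final class ContainerSketch {
  private String schema;              // null until buildSchema() is called

  void buildSchema(String s) { this.schema = s; }

  boolean hasSchema() { return schema != null; }

  // Fails fast, like the checkNotNull precondition mentioned above.
  String getSchema() {
    return Objects.requireNonNull(schema, "schema has not been built");
  }
}

// Hypothetical stand-in for MergingRecordBatch.
final class BatchSketch {
  final ContainerSketch outgoing = new ContainerSketch();

  // Same pattern as the patched getSchema(): return null instead of
  // tripping the precondition when no schema has been built yet.
  String getSchema() {
    return outgoing.hasSchema() ? outgoing.getSchema() : null;
  }
}
```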




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664297#comment-16664297
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r228337517
 
 

 ##
 File path: pom.xml
 ##
 @@ -53,8 +53,8 @@
 2.9.5
 2.9.5
 3.4.12
-5.2.1-mapr
-1.1
+6.0.1-mapr
 
 Review comment:
   For this version update, I will let the PR you referenced [1] handle it. 
There are functional test failures with that PR that need to be addressed. 
   
   [1] https://github.com/apache/drill/pull/1489




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663931#comment-16663931
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r228229608
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Map;
+import java.util.Set;
+
+public class MapRDBFunctionalIndexInfo implements FunctionalIndexInfo {
+
+  final private IndexDescriptor indexDesc;
+
+  private boolean hasFunctionalField = false;
+
+  //when we scan schemaPath in groupscan's columns, we check if this column(schemaPath) should be rewritten to '$N',
+  //When there are more than two functions on the same column in index, CAST(a.b as INT), CAST(a.b as VARCHAR),
+  // then we should map SchemaPath a.b to a set of SchemaPath, e.g. $1, $2
+  private Map<SchemaPath, Set<SchemaPath>> columnToConvert;
+
+  // map of functional index expression to destination SchemaPath e.g. $N
+  private Map<LogicalExpression, LogicalExpression> exprToConvert;
+
+  //map of SchemaPath involved in a functional field
+  private Map<LogicalExpression, Set<SchemaPath>> pathsInExpr;
+
+  private Set<SchemaPath> newPathsForIndexedFunction;
+
+  private Set<SchemaPath> allPathsInFunction;
+
+  public MapRDBFunctionalIndexInfo(IndexDescriptor indexDesc) {
+    this.indexDesc = indexDesc;
+    columnToConvert = Maps.newHashMap();
 
 Review comment:
   Ah, that explains it.  Presumably we will fix it for 1.15.  Thanks. 
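The field comments in the `MapRDBFunctionalIndexInfo` hunk quoted above describe rewriting each functional index expression to a `$N` path and mapping one column to several such paths when the index has more than one function on it (for example `CAST(a.b as INT)` and `CAST(a.b as VARCHAR)`). A minimal sketch of that bookkeeping, assuming strings in place of `SchemaPath`/`LogicalExpression`; `FunctionalIndexMapping` and `addFunctionalField` are hypothetical names, and numbering from `$1` simply follows the comment's example.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; strings stand in for SchemaPath and LogicalExpression.
final class FunctionalIndexMapping {
  // column -> every placeholder path that functional fields on it were rewritten to
  final Map<String, Set<String>> columnToConvert = new LinkedHashMap<>();
  // functional index expression -> its placeholder path
  final Map<String, String> exprToConvert = new LinkedHashMap<>();
  private int next = 1;

  // Register one functional index field, e.g. CAST(a.b AS INT) over column a.b.
  String addFunctionalField(String expr, String column) {
    String placeholder = exprToConvert.get(expr);
    if (placeholder == null) {
      placeholder = "$" + next++;            // first time this expression is seen
      exprToConvert.put(expr, placeholder);
    }
    columnToConvert.computeIfAbsent(column, c -> new LinkedHashSet<>()).add(placeholder);
    return placeholder;
  }
}
```

Registering the same expression twice returns its existing placeholder, so the per-column set stays free of duplicates.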




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663479#comment-16663479
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vvysotskyi commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r228097346
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Map;
+import java.util.Set;
+
+public class MapRDBFunctionalIndexInfo implements FunctionalIndexInfo {
+
+  final private IndexDescriptor indexDesc;
+
+  private boolean hasFunctionalField = false;
+
+  //when we scan schemaPath in groupscan's columns, we check if this column(schemaPath) should be rewritten to '$N',
+  //When there are more than two functions on the same column in index, CAST(a.b as INT), CAST(a.b as VARCHAR),
+  // then we should map SchemaPath a.b to a set of SchemaPath, e.g. $1, $2
+  private Map<SchemaPath, Set<SchemaPath>> columnToConvert;
+
+  // map of functional index expression to destination SchemaPath e.g. $N
+  private Map<LogicalExpression, LogicalExpression> exprToConvert;
+
+  //map of SchemaPath involved in a functional field
+  private Map<LogicalExpression, Set<SchemaPath>> pathsInExpr;
+
+  private Set<SchemaPath> newPathsForIndexedFunction;
+
+  private Set<SchemaPath> allPathsInFunction;
+
+  public MapRDBFunctionalIndexInfo(IndexDescriptor indexDesc) {
+    this.indexDesc = indexDesc;
+    columnToConvert = Maps.newHashMap();
 
 Review comment:
   Checkstyle-validation passed for this module because of this Jira: 
[DRILL-6691](https://issues.apache.org/jira/browse/DRILL-6691)




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663352#comment-16663352
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r228055114
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Map;
+import java.util.Set;
+
+public class MapRDBFunctionalIndexInfo implements FunctionalIndexInfo {
+
+  final private IndexDescriptor indexDesc;
+
+  private boolean hasFunctionalField = false;
+
+  //when we scan schemaPath in groupscan's columns, we check if this column(schemaPath) should be rewritten to '$N',
+  //When there are more than two functions on the same column in index, CAST(a.b as INT), CAST(a.b as VARCHAR),
+  // then we should map SchemaPath a.b to a set of SchemaPath, e.g. $1, $2
+  private Map<SchemaPath, Set<SchemaPath>> columnToConvert;
+
+  // map of functional index expression to destination SchemaPath e.g. $N
+  private Map<LogicalExpression, LogicalExpression> exprToConvert;
+
+  //map of SchemaPath involved in a functional field
+  private Map<LogicalExpression, Set<SchemaPath>> pathsInExpr;
+
+  private Set<SchemaPath> newPathsForIndexedFunction;
+
+  private Set<SchemaPath> allPathsInFunction;
+
+  public MapRDBFunctionalIndexInfo(IndexDescriptor indexDesc) {
+    this.indexDesc = indexDesc;
+    columnToConvert = Maps.newHashMap();
 
 Review comment:
   When I created the PR, `checkstyle-validation` flagged a number of non-shaded 
Guava imports in the core classes, and I fixed those by switching to the shaded 
version. Oddly, though, it did not complain about the MapR-DB plugin classes. 
Is this check not enforced for plugins? I am not sure, but in any case I went 
through the new code added to the MapR-DB plugin and changed its imports to the 
shaded Guava classes.




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663120#comment-16663120
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r228020096
 
 

 ##
 File path: 
protocol/src/main/java/org/apache/drill/exec/proto/beans/CoreOperatorType.java
 ##
 @@ -78,7 +78,8 @@
 SEQUENCE_SUB_SCAN(53),
 PARTITION_LIMIT(54),
 PCAPNG_SUB_SCAN(55),
-RUNTIME_FILTER(56);
+RUNTIME_FILTER(56),
+ROWKEY_JOIN(57);
 
 Review comment:
   I have regenerated the protobuf files for the native C++ client.  (As an 
aside, the steps for doing this seem complicated compared with the Java version. In 
either case, we really should not need to rebuild the client side if a new 
operator has been added on the server side). 
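One way a client could avoid a rebuild for every new server-side operator, as the aside above suggests, is to treat operator ids as open-ended and fall back to a generic label for ids it was not built with. A minimal sketch: `OperatorTypeSketch` and `nameFor` are hypothetical and are not the generated protobuf classes; only the three ids are taken from the diff.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side copy of a few operator ids from the diff above.
enum OperatorTypeSketch {
  PCAPNG_SUB_SCAN(55),
  RUNTIME_FILTER(56),
  ROWKEY_JOIN(57);

  private static final Map<Integer, OperatorTypeSketch> BY_ID = new HashMap<>();
  static {
    for (OperatorTypeSketch t : values()) {
      BY_ID.put(t.id, t);
    }
  }

  final int id;

  OperatorTypeSketch(int id) { this.id = id; }

  // Returns a readable name, falling back for ids this client does not know.
  static String nameFor(int id) {
    OperatorTypeSketch t = BY_ID.get(id);
    return t != null ? t.name() : "UNKNOWN_OPERATOR(" + id + ")";
  }
}
```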




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662956#comment-16662956
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r227928390
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Map;
+import java.util.Set;
+
+public class MapRDBFunctionalIndexInfo implements FunctionalIndexInfo {
+
+  final private IndexDescriptor indexDesc;
+
+  private boolean hasFunctionalField = false;
+
+  //when we scan schemaPath in groupscan's columns, we check if this column(schemaPath) should be rewritten to '$N',
+  //When there are more than two functions on the same column in index, CAST(a.b as INT), CAST(a.b as VARCHAR),
+  // then we should map SchemaPath a.b to a set of SchemaPath, e.g. $1, $2
+  private Map<SchemaPath, Set<SchemaPath>> columnToConvert;
+
+  // map of functional index expression to destination SchemaPath e.g. $N
+  private Map<LogicalExpression, LogicalExpression> exprToConvert;
+
+  //map of SchemaPath involved in a functional field
+  private Map<LogicalExpression, Set<SchemaPath>> pathsInExpr;
+
+  private Set<SchemaPath> newPathsForIndexedFunction;
+
+  private Set<SchemaPath> allPathsInFunction;
+
+  public MapRDBFunctionalIndexInfo(IndexDescriptor indexDesc) {
+    this.indexDesc = indexDesc;
+    columnToConvert = Maps.newHashMap();
 
 Review comment:
   Drill currently uses a shaded Guava 23 for internal purposes; see more in 
[DRILL-6422](https://issues.apache.org/jira/browse/DRILL-6422).
   Using the non-shaded Guava 19 causes a `BUILD FAILURE` in 
`checkstyle-validation`. Use the shaded imports instead:
   ```
   import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
   import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
   ```
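The rule described above (Guava must be imported through the shaded package) can be sketched as a simple line scan. This is a hypothetical illustration only: `GuavaImportCheck` is an invented class and does not reproduce Drill's actual checkstyle configuration; it just flags `com.google.common` imports and rewrites them to the shaded prefix shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the shaded-Guava import rule; not Drill's checkstyle config.
final class GuavaImportCheck {
  static final String UNSHADED = "import com.google.common.";
  static final String UNSHADED_STATIC = "import static com.google.common.";
  static final String SHADED = "import org.apache.drill.shaded.guava.com.google.common.";

  // Returns the 1-based line numbers of non-shaded Guava imports.
  static List<Integer> findUnshadedImports(List<String> sourceLines) {
    List<Integer> offenders = new ArrayList<>();
    for (int i = 0; i < sourceLines.size(); i++) {
      String line = sourceLines.get(i).trim();
      if (line.startsWith(UNSHADED) || line.startsWith(UNSHADED_STATIC)) {
        offenders.add(i + 1);
      }
    }
    return offenders;
  }

  // Suggested shaded replacement for a plain (non-static) offending import line.
  static String toShaded(String importLine) {
    return importLine.trim().replaceFirst("^import com\\.google\\.common\\.", SHADED);
  }
}
```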




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662952#comment-16662952
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r227984433
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/Partitioner.java
 ##
 @@ -29,6 +29,8 @@
 import org.apache.drill.exec.record.RecordBatch;
 
 public interface Partitioner {
+  int DEFAULT_RECORD_BATCH_SIZE = (1 << 10) - 1;
 
 Review comment:
   I see, thanks for the clarification.
   Is the comment regarding `recordCount` no longer necessary?




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662957#comment-16662957
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r227982664
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/IndexPlanTest.java
 ##
 @@ -0,0 +1,1715 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import com.mapr.db.Admin;
+import com.mapr.drill.maprdb.tests.MaprDBTestsSuite;
+import com.mapr.drill.maprdb.tests.json.BaseJsonTest;
+import com.mapr.tests.annotations.ClusterTest;
+import org.apache.drill.PlanTestBase;
+import org.joda.time.DateTime;
+import org.joda.time.format.DateTimeFormat;
+import org.apache.drill.common.config.DrillConfig;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.FixMethodOrder;
+import org.junit.Ignore;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runners.MethodSorters;
+import java.util.Properties;
+
+
+@FixMethodOrder(MethodSorters.NAME_ASCENDING)
+@Category(ClusterTest.class)
+public class IndexPlanTest extends BaseJsonTest {
+
+  final static String PRIMARY_TABLE_NAME = "/tmp/index_test_primary";
+
+  final static int PRIMARY_TABLE_SIZE = 1;
+  private static final String sliceTargetSmall = "alter session set `planner.slice_target` = 1";
+  private static final String sliceTargetDefault = "alter session reset `planner.slice_target`";
+  private static final String noIndexPlan = "alter session set `planner.enable_index_planning` = false";
+  private static final String defaultHavingIndexPlan = "alter session reset `planner.enable_index_planning`";
+  private static final String disableHashAgg = "alter session set `planner.enable_hashagg` = false";
+  private static final String enableHashAgg =  "alter session set `planner.enable_hashagg` = true";
+  private static final String defaultnonCoveringSelectivityThreshold = "alter session set `planner.index.noncovering_selectivity_threshold` = 0.025";
+  private static final String incrnonCoveringSelectivityThreshold = "alter session set `planner.index.noncovering_selectivity_threshold` = 0.25";
+  private static final String disableFTS = "alter session set `planner.disable_full_table_scan` = true";
+  private static final String enableFTS = "alter session reset `planner.disable_full_table_scan`";
+  private static final String preferIntersectPlans = "alter session set `planner.index.prefer_intersect_plans` = true";
+  private static final String defaultIntersectPlans = "alter session reset `planner.index.prefer_intersect_plans`";
+  private static final String lowRowKeyJoinBackIOFactor
+  = "alter session set `planner.index.rowkeyjoin_cost_factor` = 0.01";
+  private static final String defaultRowKeyJoinBackIOFactor
+  = "alter session reset `planner.index.rowkeyjoin_cost_factor`";
+
+  /**
+   *  A sample row of this 10K table:
+   --+-++
+   | 1012  | {"city":"pfrrs","state":"pc"}  | {"email":"kffzkuz...@gmail.com","phone":"655471"}  |
+   {"ssn":"17423"}  | {"fname":"KfFzK","lname":"UZwNk"}  | {"age":53.0,"income":45.0}  | 1012   |
+   *
+   * This test suite generates random content to fill all the rows. Since the random function always starts from
+   * the same seed for different runs, the data in the table will always be the same as long as the row count is unchanged,
+   * so the query results can be predicted and verified.
+   */
+
+  @BeforeClass
+  public static void setupTableIndexes() throws Exception {
+
+    Properties overrideProps = new Properties();
+    overrideProps.setProperty("format-maprdb.json.useNumRegionsForDistribution", "true");
+    updateTestCluster(1, DrillConfig.create(overrideProps));
+
+    MaprDBTestsSuite.setupTests();
+    MaprDBTestsSuite.createPluginAndGetConf(getDrillbitContext());
+
+
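The test-data comment in this diff relies on seeded pseudo-randomness: a fixed seed yields the same sequence on every run, so the generated table, and therefore the expected query results, stay stable. A minimal sketch of that idea, assuming values are drawn from `java.util.Random` with a fixed seed; the `SeededRowGenerator` class and its parameters are hypothetical and are not the test suite's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical generator: the same seed and row count give the same content on every run. */
class SeededRowGenerator {
  static List<String> generateNames(long seed, int rows, int length) {
    Random random = new Random(seed); // fixed seed => deterministic sequence
    List<String> names = new ArrayList<>(rows);
    for (int i = 0; i < rows; i++) {
      StringBuilder sb = new StringBuilder(length);
      for (int j = 0; j < length; j++) {
        sb.append((char) ('a' + random.nextInt(26))); // random lowercase letter
      }
      names.add(sb.toString());
    }
    return names;
  }
}
```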

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662953#comment-16662953
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r227985900
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java
 ##
 @@ -166,4 +169,25 @@ public void onMatch(RelOptRuleCall call) {
 return list;
   }
 
+  public static Project replace(Project topProject, Project bottomProject) {
 
 Review comment:
   It makes sense. A small javadoc can be useful here.


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662954#comment-16662954
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r227928626
 
 

 ##
 File path: contrib/format-maprdb/pom.xml
 ##
 @@ -83,6 +83,41 @@
   
 
   
+  
+org.apache.maven.plugins
+maven-jar-plugin
+
+  
+
+**/core-site.xml
+**/logback.xml
+  
+
+  
+
+  
+  
+org.codehaus.mojo
+build-helper-maven-plugin
+1.9.1
 
 Review comment:
   In any case, it is not major.


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662955#comment-16662955
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r227978263
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelTrait;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.rules.ProjectRemoveRule;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil.ProjectPushInfo;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.List;
+
+public abstract class MapRDBPushProjectIntoScan extends StoragePluginOptimizerRule {
 
 Review comment:
   It makes sense, thank you for clarifications.


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662951#comment-16662951
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r227980606
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java
 ##
 @@ -517,6 +388,15 @@ private static FieldPath getFieldPathForProjection(SchemaPath column) {
 return new FieldPath(child);
   }
 
+  public static boolean includesIdField(Collection projected) {
+return Iterables.tryFind(projected, new Predicate() {
 
 Review comment:
   Agreed. 
   Vova started working on that, but did not finish it.
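For comparison, the anonymous Guava `Predicate` passed to `Iterables.tryFind(...)` in the diff can also be expressed with Java 8 streams. A minimal sketch, assuming the projected columns are plain strings rather than `SchemaPath` objects and that the field being searched for is named `_id`; both are simplifications for illustration, and `ProjectionUtil` is a hypothetical class.

```java
import java.util.Collection;

class ProjectionUtil {
  // Assumption: the document ID field is named "_id".
  static final String ID_FIELD = "_id";

  /** Returns true if any projected column is the ID field, using a stream instead of a Guava Predicate. */
  static boolean includesIdField(Collection<String> projected) {
    return projected.stream().anyMatch(ID_FIELD::equals);
  }
}
```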


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659715#comment-16659715
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r227140803
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -568,25 +603,13 @@ public IterOutcome innerNext() {
 
   } else {
     // Our build side is empty, we won't have any matches, clear the probe side
-    if (leftUpstream == IterOutcome.OK_NEW_SCHEMA || leftUpstream == IterOutcome.OK) {
-      for (final VectorWrapper wrapper : probeBatch) {
-        wrapper.getValueVector().clear();
-      }
-      probeBatch.kill(true);
-      leftUpstream = next(HashJoinHelper.LEFT_INPUT, probeBatch);
-      while (leftUpstream == IterOutcome.OK_NEW_SCHEMA || leftUpstream == IterOutcome.OK) {
-        for (final VectorWrapper wrapper : probeBatch) {
-          wrapper.getValueVector().clear();
-        }
-        leftUpstream = next(HashJoinHelper.LEFT_INPUT, probeBatch);
-      }
-    }
+    drainLeft();
 
 Review comment:
   I have resolved this merge conflict after rebasing on my local branch. I 
haven't pushed it to the remote yet (I need to see if I can avoid a force-push). 


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657661#comment-16657661
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

Ben-Zvi commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r226813040
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -1212,6 +1238,81 @@ private void updateStats() {
 this.stats.setLongStat(Metric.SPILLED_PARTITIONS, numSpilled);
   }
 
+  private void drainLeft() {
 
 Review comment:
   Follow-on (after #1480): these two methods (`drainLeft()` and 
`drainRight()`) can be eliminated.
   


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657658#comment-16657658
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

Ben-Zvi commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r226813004
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -568,25 +603,13 @@ public IterOutcome innerNext() {
 
   } else {
     // Our build side is empty, we won't have any matches, clear the probe side
-    if (leftUpstream == IterOutcome.OK_NEW_SCHEMA || leftUpstream == IterOutcome.OK) {
-      for (final VectorWrapper wrapper : probeBatch) {
-        wrapper.getValueVector().clear();
-      }
-      probeBatch.kill(true);
-      leftUpstream = next(HashJoinHelper.LEFT_INPUT, probeBatch);
-      while (leftUpstream == IterOutcome.OK_NEW_SCHEMA || leftUpstream == IterOutcome.OK) {
-        for (final VectorWrapper wrapper : probeBatch) {
-          wrapper.getValueVector().clear();
-        }
-        leftUpstream = next(HashJoinHelper.LEFT_INPUT, probeBatch);
-      }
-    }
+    drainLeft();
 
 Review comment:
   Note the expected merge conflict here (from DRILL-6755 = #1480): the new 
code defines similar methods (e.g. `killAndDrainLeftUpstream()`), so please 
follow the new code from #1480. 
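The removed loop in the diff (clear the current probe batch, kill the upstream, then keep pulling and clearing batches while the outcome is `OK` or `OK_NEW_SCHEMA`) can be sketched in isolation. The `Outcome` enum and `FakeUpstream` class below are hypothetical stand-ins for Drill's `IterOutcome` and the probe-side record batch; this mirrors the removed code only and is not the `killAndDrainLeftUpstream()` implementation from #1480.

```java
import java.util.Iterator;
import java.util.List;

/** Simplified stand-in for Drill's IterOutcome: only the values this sketch needs. */
enum Outcome { OK_NEW_SCHEMA, OK, NONE }

/** Hypothetical upstream that hands out batch outcomes; not Drill's RecordBatch API. */
class FakeUpstream {
  private final Iterator<Outcome> outcomes;
  int batchesCleared = 0;
  boolean killed = false;

  FakeUpstream(List<Outcome> outcomes) { this.outcomes = outcomes.iterator(); }

  Outcome next() { return outcomes.hasNext() ? outcomes.next() : Outcome.NONE; }
  void clearCurrentBatch() { batchesCleared++; } // stands in for clearing every value vector
  void kill() { killed = true; }
}

class DrainSketch {
  /** Clears the current probe batch, kills upstream, then consumes and clears whatever remains. */
  static Outcome killAndDrain(FakeUpstream probe, Outcome current) {
    if (current == Outcome.OK_NEW_SCHEMA || current == Outcome.OK) {
      probe.clearCurrentBatch();
      probe.kill();
      current = probe.next();
      while (current == Outcome.OK_NEW_SCHEMA || current == Outcome.OK) {
        probe.clearCurrentBatch();
        current = probe.next();
      }
    }
    return current; // a terminal outcome, e.g. NONE
  }
}
```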


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657538#comment-16657538
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

chunhui-shi commented on issue #1466: DRILL-6381: Add support for index based 
planning and execution
URL: https://github.com/apache/drill/pull/1466#issuecomment-431522766
 
 
   +1. I think you have already addressed all the comments and just need to 
resolve the rebase conflicts.


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657517#comment-16657517
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on issue #1466: DRILL-6381: Add support for index based 
planning and execution
URL: https://github.com/apache/drill/pull/1466#issuecomment-431516459
 
 
   Reviewers, I am hoping that this PR can be merged soon, since I am 
already having trouble rebasing this large amount of code on the latest Apache 
master. Note that the feature has been extensively tested by Drill's QA 
(several thousand tests), both functionally and for performance. I have 
tried to address most of the review comments and made a judgement call on a few 
remaining ones. If there are blocking issues in terms of code organization or 
style, let me know. Thanks for your reviews! 
   
   cc  @vdiravka, @Ben-Zvi, @arina-ielchiieva , @gparai, @HanumathRao 


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657470#comment-16657470
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r226787783
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/Partitioner.java
 ##
 @@ -29,6 +29,8 @@
 import org.apache.drill.exec.record.RecordBatch;
 
 public interface Partitioner {
+  int DEFAULT_RECORD_BATCH_SIZE = (1 << 10) - 1;
 
 Review comment:
   The DEFAULT_RECORD_BATCH_SIZE is already in the current master branch here 
[1].  I simply moved it into the `Partitioner` interface. 
   
   [1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/PartitionerTemplate.java#L58
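Moving the constant into the interface works because interface fields are implicitly `public static final`, so implementations and callers share a single definition. A minimal sketch with hypothetical names (`PartitionerSketch`, `PartitionerTemplateSketch`); note that `(1 << 10) - 1` evaluates to 1023.

```java
/** Hypothetical names; mirrors moving a constant from an implementation class into its interface. */
interface PartitionerSketch {
  // Interface fields are implicitly public, static and final.
  int DEFAULT_RECORD_BATCH_SIZE = (1 << 10) - 1; // 1024 - 1 = 1023

  int batchSize();
}

class PartitionerTemplateSketch implements PartitionerSketch {
  // The implementation refers to the inherited constant without redeclaring it.
  @Override
  public int batchSize() {
    return DEFAULT_RECORD_BATCH_SIZE;
  }
}
```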


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652657#comment-16652657
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r225740876
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java
 ##
 @@ -166,4 +169,25 @@ public void onMatch(RelOptRuleCall call) {
 return list;
   }
 
+  public static Project replace(Project topProject, Project bottomProject) {
 
 Review comment:
   They can be merged into the top-level Project that allows 
duplicates; however, the purpose of the `replace()` method here is simply to 
allow the caller to replace one project with another, on the assumption that 
callers know exactly what they are doing.  This does not apply the full-fledged 
`DrillMergeProjectRule`. 




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652618#comment-16652618
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r225737406
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/IndexPlanTest.java
 ##
 @@ -0,0 +1,1715 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import com.mapr.db.Admin;
+import com.mapr.drill.maprdb.tests.MaprDBTestsSuite;
+import com.mapr.drill.maprdb.tests.json.BaseJsonTest;
+import com.mapr.tests.annotations.ClusterTest;
+import org.apache.drill.PlanTestBase;
+import org.joda.time.DateTime;
+import org.joda.time.format.DateTimeFormat;
+import org.apache.drill.common.config.DrillConfig;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.FixMethodOrder;
+import org.junit.Ignore;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runners.MethodSorters;
+import java.util.Properties;
+
+
+@FixMethodOrder(MethodSorters.NAME_ASCENDING)
+@Category(ClusterTest.class)
+public class IndexPlanTest extends BaseJsonTest {
+
+  final static String PRIMARY_TABLE_NAME = "/tmp/index_test_primary";
+
+  final static int PRIMARY_TABLE_SIZE = 1;
+  private static final String sliceTargetSmall = "alter session set `planner.slice_target` = 1";
+  private static final String sliceTargetDefault = "alter session reset `planner.slice_target`";
+  private static final String noIndexPlan = "alter session set `planner.enable_index_planning` = false";
+  private static final String defaultHavingIndexPlan = "alter session reset `planner.enable_index_planning`";
+  private static final String disableHashAgg = "alter session set `planner.enable_hashagg` = false";
+  private static final String enableHashAgg =  "alter session set `planner.enable_hashagg` = true";
+  private static final String defaultnonCoveringSelectivityThreshold = "alter session set `planner.index.noncovering_selectivity_threshold` = 0.025";
+  private static final String incrnonCoveringSelectivityThreshold = "alter session set `planner.index.noncovering_selectivity_threshold` = 0.25";
+  private static final String disableFTS = "alter session set `planner.disable_full_table_scan` = true";
+  private static final String enableFTS = "alter session reset `planner.disable_full_table_scan`";
+  private static final String preferIntersectPlans = "alter session set `planner.index.prefer_intersect_plans` = true";
+  private static final String defaultIntersectPlans = "alter session reset `planner.index.prefer_intersect_plans`";
+  private static final String lowRowKeyJoinBackIOFactor
+  = "alter session set `planner.index.rowkeyjoin_cost_factor` = 0.01";
+  private static final String defaultRowKeyJoinBackIOFactor
+  = "alter session reset `planner.index.rowkeyjoin_cost_factor`";
+
+  /**
+   *  A sample row of this 10K table:
+   --+-++
+   | 1012  | {"city":"pfrrs","state":"pc"}  | {"email":"kffzkuz...@gmail.com","phone":"655471"}  |
+   {"ssn":"17423"}  | {"fname":"KfFzK","lname":"UZwNk"}  | {"age":53.0,"income":45.0}  | 1012   |
+   *
+   * This test suite generate random content to fill all the rows, since the random function always start from
+   * the same seed for different runs, when the row count is not changed, the data in table will always be the same,
+   * thus the query result could be predicted and verified.
+   */
+
+  @BeforeClass
+  public static void setupTableIndexes() throws Exception {
+
+Properties overrideProps = new Properties();
+overrideProps.setProperty("format-maprdb.json.useNumRegionsForDistribution", "true");
+updateTestCluster(1, DrillConfig.create(overrideProps));
+
+MaprDBTestsSuite.setupTests();
+MaprDBTestsSuite.createPluginAndGetConf(getDrillbitContext());
+
+
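The option constants in this test follow two statement shapes. ALTER SESSION SET takes a backquoted option name and a value. ALTER SESSION RESET takes just the name and restores the option's default. The sketch below is a hypothetical helper that builds both shapes. `SessionOptions` is not part of Drill or of this test; the option names are copied from the constants above.

```java
// Hypothetical helper (not part of Drill or IndexPlanTest) that builds the two
// ALTER SESSION statement shapes used by the test constants.
class SessionOptionsDemo {
  public static void main(String[] args) {
    System.out.println(SessionOptions.set("planner.enable_index_planning", false));
    System.out.println(SessionOptions.set("planner.index.noncovering_selectivity_threshold", 0.25));
    System.out.println(SessionOptions.reset("planner.enable_index_planning"));
  }
}

final class SessionOptions {
  private SessionOptions() {}

  // Sets a session-scoped option; the value is rendered with String.valueOf.
  static String set(String option, Object value) {
    return "alter session set `" + option + "` = " + String.valueOf(value);
  }

  // Resets a session-scoped option back to its default.
  static String reset(String option) {
    return "alter session reset `" + option + "`";
  }
}
```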

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652510#comment-16652510
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r225721627
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/JoinControl.java
 ##
 @@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.common;
+
+/**
+ * For the int type control,
+ * the meaning of each bit start from lowest:
+ * bit 0: intersect or not, 0 -- default(no intersect), 1 -- INTERSECT (DISTINCT as default)
+ * bit 1: intersect type, 0 -- default (DISTINCT), 1 -- INTERSECT_ALL
+ */
+public class JoinControl {
 
 Review comment:
   @Ben-Zvi I have incorporated the joinControl logic as part of the hash join 
probe phase.  Please see the changes in [1].  I haven't done it for the build phase 
since it will be superseded by the semi-join changes.  Can you please review this 
change?
   
   [1] 
https://github.com/apache/drill/pull/1466/commits/5330fd684a20587e9e860d7894e3d3602a6d0495
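The `JoinControl` javadoc packs two flags into the low bits of an int. The sketch below shows how such a control value could be decoded, assuming exactly the layout the javadoc describes: bit 0 selects intersect, and bit 1 selects INTERSECT_ALL over DISTINCT. The class, constant and method names are hypothetical and need not match the real `JoinControl`.

```java
// Sketch of decoding the bit layout described in the JoinControl javadoc:
//   bit 0: 0 = no intersect (default), 1 = intersect
//   bit 1: 0 = DISTINCT (default),     1 = INTERSECT_ALL
// JoinControlBits and its members are hypothetical names, not Drill's API.
class JoinControlDemo {
  public static void main(String[] args) {
    int[] controls = {JoinControlBits.DEFAULT, JoinControlBits.INTERSECT_DISTINCT, JoinControlBits.INTERSECT_ALL};
    for (int c : controls) {
      System.out.printf("control=%d intersect=%b distinct=%b all=%b%n",
          c, JoinControlBits.isIntersect(c), JoinControlBits.isIntersectDistinct(c),
          JoinControlBits.isIntersectAll(c));
    }
  }
}

final class JoinControlBits {
  static final int DEFAULT = 0x0;             // bit 0 clear: plain join
  static final int INTERSECT_DISTINCT = 0x1;  // bit 0 set, bit 1 clear
  static final int INTERSECT_ALL = 0x3;       // bit 0 set, bit 1 set
  private static final int INTERSECT_BIT = 0x1;
  private static final int MASK = 0x3;        // the two bits this sketch interprets

  private JoinControlBits() {}

  static boolean isIntersect(int control) {
    return (control & INTERSECT_BIT) != 0;
  }

  // Bit 1 only selects the intersect type when bit 0 is also set.
  static boolean isIntersectDistinct(int control) {
    return (control & MASK) == INTERSECT_DISTINCT;
  }

  static boolean isIntersectAll(int control) {
    return (control & MASK) == INTERSECT_ALL;
  }
}
```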




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652285#comment-16652285
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

gparai commented on a change in pull request #1466: DRILL-6381: Add support for 
index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r225675265
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 ##
 @@ -114,6 +114,28 @@
   public static final String UNIONALL_DISTRIBUTE_KEY = "planner.enable_unionall_distribute";
   public static final BooleanValidator UNIONALL_DISTRIBUTE = new BooleanValidator(UNIONALL_DISTRIBUTE_KEY, null);
 
+  // --- Index planning related options BEGIN --
+  public static final String USE_SIMPLE_OPTIMIZER_KEY = "planner.use_simple_optimizer";
+  public static final BooleanValidator USE_SIMPLE_OPTIMIZER = new BooleanValidator(USE_SIMPLE_OPTIMIZER_KEY, null);
+  public static final BooleanValidator INDEX_PLANNING = new BooleanValidator("planner.enable_index_planning", null);
+  public static final BooleanValidator ENABLE_STATS = new BooleanValidator("planner.enable_statistics", null);
 
 Review comment:
   Yes, that was the intent - allow statistics use for any source with one 
option.




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648530#comment-16648530
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224928601
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java
 ##
 @@ -517,6 +388,15 @@ private static FieldPath getFieldPathForProjection(SchemaPath column) {
 return new FieldPath(child);
   }
 
+  public static boolean includesIdField(Collection projected) {
+return Iterables.tryFind(projected, new Predicate() {
 
 Review comment:
   Since this is a syntactic modification, I am inclined to leave it as-is, 
partly because I might cause an unintentional side effect/regression and partly 
because such functional utilities appear elsewhere in the Drill and Calcite 
code.  Ideally, if we want to move to lambdas, we should revisit all such 
implementations.  
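For comparison, the sketch below shows the anonymous-predicate style discussed here next to its lambda equivalent. It uses `java.util.function.Predicate` rather than whichever `Predicate` type the original imports. It also assumes, hypothetically, that projected columns are plain strings and that the id field is named `_id`. The real method takes a `Collection` of projected columns whose element type is elided in this message.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.function.Predicate;

class IdFieldDemo {
  public static void main(String[] args) {
    System.out.println(IdFieldCheck.includesIdFieldLambda(Arrays.asList("_id", "name")));         // true
    System.out.println(IdFieldCheck.includesIdFieldAnonymous(Collections.singletonList("name"))); // false
  }
}

final class IdFieldCheck {
  // Hypothetical: projected columns are plain strings and the row key is "_id".
  static final String ID_FIELD = "_id";

  private IdFieldCheck() {}

  // Anonymous-class style, similar in shape to the predicate under review.
  static boolean includesIdFieldAnonymous(Collection<String> projected) {
    Predicate<String> isIdField = new Predicate<String>() {
      @Override
      public boolean test(String column) {
        return ID_FIELD.equals(column);
      }
    };
    for (String column : projected) {
      if (isIdField.test(column)) {
        return true;
      }
    }
    return false;
  }

  // Lambda/stream style; returns the same result for any input.
  static boolean includesIdFieldLambda(Collection<String> projected) {
    return projected.stream().anyMatch(ID_FIELD::equals);
  }
}
```

Both forms behave identically. This supports the point above: switching to lambdas is a purely syntactic change and could be done uniformly later.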




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648514#comment-16648514
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224926241
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushFilterIntoScan.java
 ##
 @@ -137,11 +137,10 @@ protected void doPushFilterIntoJsonGroupScan(RelOptRuleCall call,
   return; //no filter pushdown ==> No transformation.
 }
 
-// clone the groupScan with the newScanSpec.
-final JsonTableGroupScan newGroupsScan = groupScan.clone(newScanSpec);
+final JsonTableGroupScan newGroupsScan = (JsonTableGroupScan) groupScan.clone(newScanSpec);
 
 Review comment:
   Currently, the `clone()` method returns an instance of `GroupScan`, so this 
cast is needed. Did you mean changing the original `clone()` implementation itself? 
That would require some additional changes, since both the binary table and the JSON 
table implement it. 
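On the cast question: Java allows an overriding method to narrow its declared return type (a covariant return). This is one way the cast could be avoided at call sites that already hold the subclass type. The sketch below uses hypothetical stand-in classes, not Drill's actual `GroupScan` hierarchy, and a `String` in place of the real scan spec type.

```java
// Hypothetical stand-ins for the GroupScan hierarchy; only the return types matter here.
class CovariantCloneDemo {
  public static void main(String[] args) {
    JsonTableGroupScanSketch scan = new JsonTableGroupScanSketch("a > 1");
    // No cast: the override's declared return type is the subclass.
    JsonTableGroupScanSketch pushed = scan.clone("a > 1 AND b < 2");
    System.out.println(pushed.scanSpec); // a > 1 AND b < 2

    // Through a base-class reference the static type is still the base type,
    // so a cast would still be needed there to reach subclass-only members.
    GroupScanSketch base = scan;
    GroupScanSketch copy = base.clone("c = 3");
    System.out.println(copy instanceof JsonTableGroupScanSketch); // true
  }
}

abstract class GroupScanSketch {
  abstract GroupScanSketch clone(String newScanSpec);
}

class JsonTableGroupScanSketch extends GroupScanSketch {
  final String scanSpec;

  JsonTableGroupScanSketch(String scanSpec) {
    this.scanSpec = scanSpec;
  }

  // Covariant return type: narrowing GroupScanSketch to the subclass is a legal override.
  @Override
  JsonTableGroupScanSketch clone(String newScanSpec) {
    return new JsonTableGroupScanSketch(newScanSpec);
  }
}
```

Each subclass whose callers should avoid the cast would need to declare the narrower return type on its own override. This matches the concern that both table implementations are affected.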




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648507#comment-16648507
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224925146
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/generators/IndexIntersectPlanGenerator.java
 ##
 @@ -0,0 +1,350 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index.generators;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.InvalidRelException;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.util.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.JoinControl;
+import org.apache.drill.exec.planner.index.IndexLogicalPlanCallContext;
+import org.apache.drill.exec.planner.index.IndexDescriptor;
+import org.apache.drill.exec.planner.index.FunctionalIndexInfo;
+import org.apache.drill.exec.planner.index.FunctionalIndexHelper;
+import org.apache.drill.exec.planner.index.IndexPlanUtils;
+import org.apache.drill.exec.planner.index.IndexConditionInfo;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.DrillDistributionTraitDef;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait.DistributionType;
+import org.apache.drill.exec.planner.physical.FilterPrel;
+import org.apache.drill.exec.planner.physical.HashJoinPrel;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.Prule;
+import org.apache.drill.exec.planner.physical.RowKeyJoinPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * IndexScanIntersectGenerator is to generate index plan against multiple index tables,
+ * the input indexes are assumed to be ranked by selectivity(low to high) already.
+ */
+public class IndexIntersectPlanGenerator extends AbstractIndexPlanGenerator {
+
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(IndexIntersectPlanGenerator.class);
+
+  final Map indexInfoMap;
+
+  public IndexIntersectPlanGenerator(IndexLogicalPlanCallContext indexContext,
+ Map indexInfoMap,
+ RexBuilder builder,
+ PlannerSettings settings) {
+super(indexContext, null, null, builder, settings);
+this.indexInfoMap = indexInfoMap;
+  }
+
+  public RelNode buildRowKeyJoin(RelNode left, RelNode right, boolean isRowKeyJoin, int htControl)
+  throws InvalidRelException {
+final int leftRowKeyIdx = getRowKeyIndex(left.getRowType(), origScan);
+final int rightRowKeyIdx = 0; // only rowkey field is being projected from right side
+
+assert leftRowKeyIdx >= 0;
+
+List leftJoinKeys = ImmutableList.of(leftRowKeyIdx);
+List rightJoinKeys = ImmutableList.of(rightRowKeyIdx);
+
+logger.trace(String.format(
+"buildRowKeyJoin: leftIdx: %d, rightIdx: %d",
+
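The generator's javadoc says the input indexes arrive ranked by selectivity, most selective first. The core idea of an index intersect is to keep only the row keys that every index produced. The sketch below models this with in-memory sets under that ranking assumption. The plan generator itself works differently: it builds joins on the row key in the physical plan. `RowKeyIntersect` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RowKeyIntersectDemo {
  public static void main(String[] args) {
    List<List<String>> perIndex = Arrays.asList(
        Arrays.asList("k3", "k1"),        // most selective index first
        Arrays.asList("k1", "k2", "k3"),
        Arrays.asList("k1", "k3", "k4"));
    System.out.println(RowKeyIntersect.intersect(perIndex)); // [k3, k1]
  }
}

final class RowKeyIntersect {
  private RowKeyIntersect() {}

  // Hypothetical: each inner list holds the row keys one index scan returned,
  // ordered from most to least selective.
  static List<String> intersect(List<List<String>> rowKeysPerIndex) {
    if (rowKeysPerIndex.isEmpty()) {
      return new ArrayList<>();
    }
    // Seed with the most selective index so the working set starts small.
    Set<String> result = new LinkedHashSet<>(rowKeysPerIndex.get(0));
    for (int i = 1; i < rowKeysPerIndex.size() && !result.isEmpty(); i++) {
      // Keep only row keys the next index also produced (set/DISTINCT semantics).
      result.retainAll(new HashSet<>(rowKeysPerIndex.get(i)));
    }
    return new ArrayList<>(result);
  }
}
```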

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648504#comment-16648504
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224925023
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 ##
 @@ -114,6 +114,28 @@
   public static final String UNIONALL_DISTRIBUTE_KEY = "planner.enable_unionall_distribute";
   public static final BooleanValidator UNIONALL_DISTRIBUTE = new BooleanValidator(UNIONALL_DISTRIBUTE_KEY, null);
 
+  // --- Index planning related options BEGIN --
+  public static final String USE_SIMPLE_OPTIMIZER_KEY = "planner.use_simple_optimizer";
+  public static final BooleanValidator USE_SIMPLE_OPTIMIZER = new BooleanValidator(USE_SIMPLE_OPTIMIZER_KEY, null);
+  public static final BooleanValidator INDEX_PLANNING = new BooleanValidator("planner.enable_index_planning", null);
+  public static final BooleanValidator ENABLE_STATS = new BooleanValidator("planner.enable_statistics", null);
 
 Review comment:
   I will let @gparai elaborate on this, but my understanding is that by keeping 
a general option name such as `planner.enable_statistics`, we could potentially 
use it for any type of data source (including Parquet) once the broader stats 
capability is added to Drill. 




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648486#comment-16648486
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224921473
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/InvalidIndexDefinitionException.java
 ##
 @@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+
+public class InvalidIndexDefinitionException extends DrillRuntimeException {
 
 Review comment:
   Done




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648482#comment-16648482
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224921316
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBFormatPlugin.java
 ##
 @@ -57,34 +64,74 @@ public MapRDBFormatPlugin(String name, DrillbitContext context, Configuration fs
 hbaseConf = HBaseConfiguration.create(fsConf);
 hbaseConf.set(ConnectionFactory.DEFAULT_DB, ConnectionFactory.MAPR_ENGINE2);
 connection = ConnectionFactory.createConnection(hbaseConf);
+jsonTableCache = new MapRDBTableCache(context.getConfig());
+int scanRangeSizeMBConfig = context.getConfig().getInt(PluginConstants.JSON_TABLE_SCAN_SIZE_MB);
+if (scanRangeSizeMBConfig < 32 || scanRangeSizeMBConfig > 8192) {
 
 Review comment:
   Done
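The diff shows only the bounds check on the configured scan range size (32 to 8192 MB). The handling of an out-of-range value is cut off. The sketch below is one hypothetical way to validate and convert such a setting, rejecting out-of-range values. The class name, the choice of exception and the conversion to bytes are all assumptions.

```java
class ScanRangeDemo {
  public static void main(String[] args) {
    System.out.println(ScanRangeConfig.validatedScanRangeBytes(32));   // 33554432
    System.out.println(ScanRangeConfig.validatedScanRangeBytes(8192)); // 8589934592
  }
}

final class ScanRangeConfig {
  // Bounds taken from the condition in the diff; everything else is hypothetical.
  static final int MIN_MB = 32;
  static final int MAX_MB = 8192;

  private ScanRangeConfig() {}

  static long validatedScanRangeBytes(int scanRangeSizeMB) {
    if (scanRangeSizeMB < MIN_MB || scanRangeSizeMB > MAX_MB) {
      throw new IllegalArgumentException("scan range size must be between " + MIN_MB
          + " and " + MAX_MB + " MB, got " + scanRangeSizeMB);
    }
    // Multiply as long: 8192 MB in bytes (2^33) would overflow an int.
    return scanRangeSizeMB * 1024L * 1024L;
  }
}
```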




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648461#comment-16648461
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224918663
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushFilterIntoScan.java
 ##
 @@ -187,7 +186,7 @@ protected void doPushFilterIntoBinaryGroupScan(final RelOptRuleCall call,
 groupScan.getTableStats());
 newGroupsScan.setFilterPushedDown(true);
 
-final ScanPrel newScanPrel = ScanPrel.create(scan, filter.getTraitSet(), newGroupsScan, scan.getRowType());
+final ScanPrel newScanPrel = ScanPrel.create(scan, filter.getTraitSet(), newGroupsScan, scan.getRowType(), scan.getTable());
 
 Review comment:
   Changed to use the constructor. @vdiravka, why not remove the `create` 
method altogether? [Updated] Never mind, I see that the latest master has 
already removed it.




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648457#comment-16648457
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224918663
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushFilterIntoScan.java
 ##
 @@ -187,7 +186,7 @@ protected void doPushFilterIntoBinaryGroupScan(final RelOptRuleCall call,
 groupScan.getTableStats());
 newGroupsScan.setFilterPushedDown(true);
 
-final ScanPrel newScanPrel = ScanPrel.create(scan, filter.getTraitSet(), newGroupsScan, scan.getRowType());
+final ScanPrel newScanPrel = ScanPrel.create(scan, filter.getTraitSet(), newGroupsScan, scan.getRowType(), scan.getTable());
 
 Review comment:
   Changed to use the constructor. @vdiravka, why not remove the `create` 
method altogether?




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648451#comment-16648451
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224917712
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelTrait;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.rules.ProjectRemoveRule;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil.ProjectPushInfo;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.List;
+
+public abstract class MapRDBPushProjectIntoScan extends StoragePluginOptimizerRule {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapRDBPushProjectIntoScan.class);
+
+  private MapRDBPushProjectIntoScan(RelOptRuleOperand operand, String description) {
+super(operand, description);
+  }
+
+  public static final StoragePluginOptimizerRule PROJECT_ON_SCAN = new MapRDBPushProjectIntoScan(
+  RelOptHelper.some(ProjectPrel.class, RelOptHelper.any(ScanPrel.class)), "MapRDBPushProjIntoScan:Proj_On_Scan") {
+@Override
+public void onMatch(RelOptRuleCall call) {
+  final ScanPrel scan = (ScanPrel) call.rel(1);
+  final ProjectPrel project = (ProjectPrel) call.rel(0);
+  if (!(scan.getGroupScan() instanceof MapRDBGroupScan)) {
+return;
+  }
+  doPushProjectIntoGroupScan(call, project, scan, (MapRDBGroupScan) scan.getGroupScan());
+  if (scan.getGroupScan() instanceof BinaryTableGroupScan) {
 
 Review comment:
   I have changed this method and the rest of this class to check only for 
`JsonTableGroupScan`. Projection pushdown for binary tables is already handled 
by the logical planning rule `DrillPushProjectIntoScan`. If we want to extend 
this rule to binary tables in the future, we will need to make sure it is tested.
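The review comment above restricts this physical-plan rule to JSON tables. Binary tables continue to rely on `DrillPushProjectIntoScan` in the logical plan. The following toy sketch shows that guard. It uses hypothetical stand-in classes (`ToyGroupScan`, `ToyJsonTableScan`, `ToyBinaryTableScan`) instead of Drill's planner types, and it models "pushing the project" simply as narrowing the scan's column list:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for group scans; these are not Drill's real classes.
abstract class ToyGroupScan {
  final List<String> columns;

  ToyGroupScan(List<String> columns) {
    this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
  }

  abstract ToyGroupScan withColumns(List<String> cols);
}

final class ToyJsonTableScan extends ToyGroupScan {
  ToyJsonTableScan(List<String> cols) { super(cols); }
  @Override ToyJsonTableScan withColumns(List<String> cols) { return new ToyJsonTableScan(cols); }
}

final class ToyBinaryTableScan extends ToyGroupScan {
  ToyBinaryTableScan(List<String> cols) { super(cols); }
  @Override ToyBinaryTableScan withColumns(List<String> cols) { return new ToyBinaryTableScan(cols); }
}

final class ProjectPushdownSketch {
  /** Narrows a JSON table scan to the projected columns; any other scan is returned untouched. */
  static ToyGroupScan pushProject(List<String> projected, ToyGroupScan scan) {
    if (!(scan instanceof ToyJsonTableScan)) {
      return scan; // binary tables: left to the logical-plan rule, as described in the review
    }
    return scan.withColumns(projected);
  }
}
```

Putting the type check at the top of the rule means that an unsupported scan type falls through unchanged. That is a safe default when the other scan types have not been tested with the rule.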



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648448#comment-16648448
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224916314
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushLimitIntoScan.java
 ##
 @@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.physical.LimitPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.RowKeyJoinPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.hbase.HBaseScanSpec;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.RestrictedJsonTableGroupScan;
+
+public abstract class MapRDBPushLimitIntoScan extends StoragePluginOptimizerRule {
 
 Review comment:
   The `DrillPushLimitToScanRule` applies to the logical plan. We introduced 
this MapR-DB-specific rule to apply a similar pushdown to the physical plan, 
because additional pushdowns may become possible after index planning is done.
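A toy plan tree can illustrate the idea of re-applying limit pushdown to the physical plan. After index planning has produced a new scan, a `LIMIT n` that sits directly above that scan can be folded into it. The node classes below are hypothetical stand-ins, not Drill's `LimitPrel` or `ScanPrel`. The sketch handles only the simplest shape, where the Limit node's direct input is a scan:

```java
// Hypothetical toy physical-plan nodes; not Drill's Prel classes.
interface PlanNode {}

final class ScanNode implements PlanNode {
  final String table;
  final long rowLimit; // -1 means no limit has been pushed into the scan yet

  ScanNode(String table, long rowLimit) {
    this.table = table;
    this.rowLimit = rowLimit;
  }
}

final class LimitNode implements PlanNode {
  final long fetch;
  final PlanNode input;

  LimitNode(long fetch, PlanNode input) {
    this.fetch = fetch;
    this.input = input;
  }
}

final class LimitPushdownSketch {
  /** Folds a LIMIT directly above a scan into the scan's row limit; other shapes are returned unchanged. */
  static PlanNode pushLimit(PlanNode node) {
    if (node instanceof LimitNode && ((LimitNode) node).input instanceof ScanNode) {
      LimitNode limit = (LimitNode) node;
      ScanNode scan = (ScanNode) limit.input;
      // Never loosen a limit that an earlier pushdown already applied.
      long newLimit = scan.rowLimit < 0 ? limit.fetch : Math.min(scan.rowLimit, limit.fetch);
      // The outer Limit is kept so the total row count is still capped.
      return new LimitNode(limit.fetch, new ScanNode(scan.table, newLimit));
    }
    return node;
  }
}
```

Keeping the outer limit is the conservative choice. If the scan runs as several parallel fragments, each fragment may return up to n rows, so a final limit is still needed to cap the total.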




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647511#comment-16647511
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223668844
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexConditionInfo.java
 ##
 @@ -0,0 +1,250 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.logical.partition.RewriteCombineBinaryOperators;
+import org.apache.calcite.rel.RelNode;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class IndexConditionInfo {
+  public final RexNode indexCondition;
+  public final RexNode remainderCondition;
+  public final boolean hasIndexCol;
+
+  public IndexConditionInfo(RexNode indexCondition, RexNode remainderCondition, boolean hasIndexCol) {
+this.indexCondition = indexCondition;
+this.remainderCondition = remainderCondition;
+this.hasIndexCol = hasIndexCol;
+  }
+
+  public static Builder newBuilder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan) {
+return new Builder(condition, indexes, builder, scan);
+  }
+
+  public static class Builder {
+final RexBuilder builder;
+final RelNode scan;
+final Iterable indexes;
+private RexNode condition;
+
+public Builder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan
+) {
+  this.condition = condition;
+  this.builder = builder;
+  this.scan = scan;
+  this.indexes = indexes;
+}
+
+public Builder(RexNode condition,
+   IndexDescriptor index,
+   RexBuilder builder,
+   DrillScanRel scan
+) {
+  this.condition = condition;
+  this.builder = builder;
+  this.scan = scan;
+  this.indexes = Lists.newArrayList(index);
+}
+
+/**
+ * Get a single IndexConditionInfo in which indexCondition has field  on all indexes in this.indexes
+ * @return
+ */
+public IndexConditionInfo getCollectiveInfo(IndexLogicalPlanCallContext indexContext) {
+  Set paths = Sets.newLinkedHashSet();
+  for ( IndexDescriptor index : indexes ) {
+paths.addAll(index.getIndexColumns());
+//paths.addAll(index.getNonIndexColumns());
+  }
+  return indexConditionRelatedToFields(Lists.newArrayList(paths), condition);
+}
+
+/*
+ * A utility function to check whether the given index hint is valid.
+ */
+public boolean isValidIndexHint(IndexLogicalPlanCallContext indexContext) {
+  if (indexContext.indexHint.equals("")) { return false; }
+
+  for ( IndexDescriptor index: indexes ) {
 
 Review comment:
   formatting
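For reference, here are the hint check from this hunk and the column union performed in `getCollectiveInfo`, written with conventional spacing (`for (IndexDescriptor index : indexes)`). This standalone sketch makes three substitutions. It replaces `IndexDescriptor` with a hypothetical `ToyIndex`, passes the hint as a plain string instead of through `IndexLogicalPlanCallContext`, and treats a null hint as invalid, whereas the original `indexHint.equals("")` would throw on null:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for IndexDescriptor: an index name plus its indexed columns.
final class ToyIndex {
  final String name;
  final List<String> indexColumns;

  ToyIndex(String name, String... indexColumns) {
    this.name = name;
    this.indexColumns = Arrays.asList(indexColumns);
  }
}

final class IndexHintSketch {
  /** True only if the hint is non-empty and names one of the candidate indexes. */
  static boolean isValidIndexHint(String indexHint, Iterable<ToyIndex> indexes) {
    if (indexHint == null || indexHint.isEmpty()) {
      return false;
    }
    for (ToyIndex index : indexes) {
      if (indexHint.equals(index.name)) {
        return true;
      }
    }
    return false;
  }

  /** Union of all indexed columns, deduplicated, in first-seen order (LinkedHashSet). */
  static List<String> collectIndexColumns(Iterable<ToyIndex> indexes) {
    Set<String> paths = new LinkedHashSet<>();
    for (ToyIndex index : indexes) {
      paths.addAll(index.indexColumns);
    }
    return new ArrayList<>(paths);
  }
}
```

The `LinkedHashSet` mirrors the original's `Sets.newLinkedHashSet()`. It removes duplicate columns shared by several indexes while keeping a deterministic order.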



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647512#comment-16647512
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223668951
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexConditionInfo.java
 ##
 @@ -0,0 +1,250 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.logical.partition.RewriteCombineBinaryOperators;
+import org.apache.calcite.rel.RelNode;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class IndexConditionInfo {
+  public final RexNode indexCondition;
+  public final RexNode remainderCondition;
+  public final boolean hasIndexCol;
+
+  public IndexConditionInfo(RexNode indexCondition, RexNode remainderCondition, boolean hasIndexCol) {
+this.indexCondition = indexCondition;
+this.remainderCondition = remainderCondition;
+this.hasIndexCol = hasIndexCol;
+  }
+
+  public static Builder newBuilder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan) {
+return new Builder(condition, indexes, builder, scan);
+  }
+
+  public static class Builder {
+final RexBuilder builder;
+final RelNode scan;
+final Iterable indexes;
+private RexNode condition;
+
+public Builder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan
+) {
+  this.condition = condition;
+  this.builder = builder;
+  this.scan = scan;
+  this.indexes = indexes;
+}
+
+public Builder(RexNode condition,
+   IndexDescriptor index,
+   RexBuilder builder,
+   DrillScanRel scan
+) {
+  this.condition = condition;
+  this.builder = builder;
+  this.scan = scan;
+  this.indexes = Lists.newArrayList(index);
+}
+
+/**
+ * Get a single IndexConditionInfo in which indexCondition has field  on all indexes in this.indexes
+ * @return
+ */
+public IndexConditionInfo getCollectiveInfo(IndexLogicalPlanCallContext indexContext) {
+  Set paths = Sets.newLinkedHashSet();
+  for ( IndexDescriptor index : indexes ) {
+paths.addAll(index.getIndexColumns());
+//paths.addAll(index.getNonIndexColumns());
+  }
+  return indexConditionRelatedToFields(Lists.newArrayList(paths), condition);
+}
+
+/*
+ * A utility function to check whether the given index hint is valid.
+ */
+public boolean isValidIndexHint(IndexLogicalPlanCallContext indexContext) {
+  if (indexContext.indexHint.equals("")) { return false; }
+
+  for ( IndexDescriptor index: indexes ) {
+if ( indexContext.indexHint.equals(index.getIndexName())) {
+  return true;
+}
+  }
+  return false;
+}
+
+/**
+ * Get a map of Index=>IndexConditionInfo, each IndexConditionInfo has the separated condition and remainder condition.
+ * The map is ordered, so the last IndexDescriptor will have the final remainderCondition after separating conditions
+ * that are relevant to this.indexes. The conditions are separated on 

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647507#comment-16647507
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223655243
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/udf/mapr/db/NotTypeOfPlaceholder.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udf.mapr.db;
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import org.apache.drill.exec.expr.holders.BitHolder;
+import org.apache.drill.exec.expr.holders.IntHolder;
+
+/**
+ * This is a placeholder for the nottypeof() function.
+ *
+ * At this time, this function can only be used in predicates. The placeholder
+ * is here to prevent calcite from complaining; the function will get pushed down
+ * by the storage plug-in into DB. That process will go through JsonConditionBuilder.java,
+ * which will replace this function with the real OJAI equivalent to be pushed down.
+ * Therefore, there's no implementation here.
+ */
+@FunctionTemplate(
 
 Review comment:
   Same comment here regarding the `FunctionTemplate` formatting.
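The Javadoc in this hunk describes a placeholder function. It exists only so that Calcite accepts `nottypeof()` in a predicate, and the real work happens when the storage plugin translates that predicate into a database condition. The following toy sketch shows only that translation step. `ToyCall`, `PlaceholderPushdownSketch`, and the `NOT_TYPE_OF(...)` output syntax are all hypothetical; this is not `JsonConditionBuilder`, and it is not the OJAI API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy expression: a function name applied to already-rendered arguments.
final class ToyCall {
  final String function;
  final List<String> args;

  ToyCall(String function, String... args) {
    this.function = function;
    this.args = Arrays.asList(args);
  }
}

final class PlaceholderPushdownSketch {
  /**
   * Translates a placeholder call into a storage-side condition string (made-up syntax).
   * The placeholder has no Drill-side implementation, so anything untranslatable is an error.
   */
  static String toStorageCondition(ToyCall call) {
    if ("nottypeof".equalsIgnoreCase(call.function) && call.args.size() == 2) {
      return "NOT_TYPE_OF(" + call.args.get(0) + ", " + call.args.get(1) + ")";
    }
    throw new UnsupportedOperationException("no pushdown translation for " + call.function);
  }
}
```

In this sketch, an untranslatable call raises an error. This reflects the Javadoc's point that the placeholder itself has no implementation to fall back on.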




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647510#comment-16647510
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223659414
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/LargeTableGen.java
 ##
 @@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import static com.mapr.drill.maprdb.tests.MaprDBTestsSuite.INDEX_FLUSH_TIMEOUT;
+
+import java.io.InputStream;
+import java.io.StringBufferInputStream;
+
+import org.apache.hadoop.fs.Path;
+import org.ojai.DocumentStream;
+import org.ojai.json.Json;
+
+import com.mapr.db.Admin;
+import com.mapr.db.Table;
+import com.mapr.db.TableDescriptor;
+import com.mapr.db.impl.MapRDBImpl;
+import com.mapr.db.impl.TableDescriptorImpl;
+import com.mapr.db.tests.utils.DBTests;
+import com.mapr.fs.utils.ssh.TestCluster;
+
+/**
+ * This class is to generate a MapR json table of this schema:
+ * {
+ *   "address" : {
+ *  "city":"wtj",
+ *  "state":"ho"
+ *   }
+ *   "contact" : {
+ *  "email":"vcfahj...@gmail.com",
+ *  "phone":"655583"
+ *   }
+ *   "id" : {
+ *  "ssn":"15461"
+ *   }
+ *   "name" : {
+ *  "fname":"VcFahj",
+ *  "lname":"RfM"
+ *   }
+ * }
+ *
+ */
+public class LargeTableGen extends LargeTableGenBase {
+
+  static final int SPLIT_SIZE = 5000;
+  private Admin admin;
+
+  public LargeTableGen(Admin dbadmin) {
+admin = dbadmin;
+  }
+
+  Table createOrGetTable(String tableName, int recordNum) {
+if (admin.tableExists(tableName)) {
+  return MapRDBImpl.getTable(tableName);
+  //admin.deleteTable(tableName);
+}
+else {
+  TableDescriptor desc = new TableDescriptorImpl(new Path(tableName));
+
+  int splits = (recordNum / SPLIT_SIZE) - (((recordNum % SPLIT_SIZE) > 1)? 0 : 1);
+
+  String[] splitsStr = new String[splits];
+  StringBuilder strBuilder = new StringBuilder("Splits:");
+  for(int i=0; i

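The `splits` expression in the `LargeTableGen` hunk is easy to misread, so it is worth checking numerically. With `SPLIT_SIZE = 5000`, 12000 records give 2 split points (3 regions) and 10000 records give 1 split point (2 regions), as expected. However, a remainder of exactly 1 is treated like a remainder of 0: 10001 records give 1 split point rather than 2. A `recordNum` of 0 or 1 yields -1, which would make `new String[splits]` throw `NegativeArraySizeException`. The hunk does not say whether these edge cases matter for the generated test tables. The ceiling-based variant below is a hypothetical comparison only, not a proposed change from the PR:

```java
// Reproduces the split-count expression from the LargeTableGen hunk (SPLIT_SIZE = 5000)
// alongside a ceiling-based count, for comparison only.
final class SplitCountSketch {
  static final int SPLIT_SIZE = 5000;

  private SplitCountSketch() {}

  /** The expression exactly as written in the hunk. */
  static int splitsAsWritten(int recordNum) {
    return (recordNum / SPLIT_SIZE) - (((recordNum % SPLIT_SIZE) > 1) ? 0 : 1);
  }

  /** Split points needed for ceil(recordNum / SPLIT_SIZE) regions, never negative. */
  static int splitsForCeilRegions(int recordNum) {
    int regions = (recordNum + SPLIT_SIZE - 1) / SPLIT_SIZE;
    return Math.max(regions - 1, 0);
  }
}
```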

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647505#comment-16647505
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224675962
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Map;
+import java.util.Set;
+
+public class MapRDBFunctionalIndexInfo implements FunctionalIndexInfo {
+
+  final private IndexDescriptor indexDesc;
+
+  private boolean hasFunctionalField = false;
+
+  //when we scan schemaPath in groupscan's columns, we check if this column(schemaPath) should be rewritten to '$N',
+  //When there are more than two functions on the same column in index, CAST(a.b as INT), CAST(a.b as VARCHAR),
+  // then we should map SchemaPath a.b to a set of SchemaPath, e.g. $1, $2
+  private Map> columnToConvert;
+
+  // map of functional index expression to destination SchemaPath e.g. $N
+  private Map exprToConvert;
+
+  //map of SchemaPath involved in a functional field
+  private Map> pathsInExpr;
+
+  private Set newPathsForIndexedFunction;
+
+  private Set allPathsInFunction;
+
+  public MapRDBFunctionalIndexInfo(IndexDescriptor indexDesc) {
+this.indexDesc = indexDesc;
+columnToConvert = Maps.newHashMap();
 
 Review comment:
   You can use it, but as of the latest Guava version used by Drill (23.0), the Javadoc for this factory method carries the following note:
   > Note for Java 7 and later: this method is now unnecessary and
   > should be treated as deprecated. Instead, use the {@code HashMap}
   > constructor directly, taking advantage of the new
   > "diamond" syntax (http://goo.gl/iz2Wi).
   
https://google.github.io/guava/releases/snapshot/api/docs/com/google/common/collect/Maps.html#newHashMap--
   
   This note was not present in the Guava version Drill used previously (19.0), which is why some usages of this method remain.
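For illustration, a minimal Java sketch of the suggested change applied to the kind of bookkeeping the diff comments describe: one column such as `a.b` mapped to several generated names such as `$1` and `$2` when the index holds more than one function over it. This is not the code in the PR; the class and method names are hypothetical, and plain `String`s stand in for `SchemaPath` and `LogicalExpression`.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: Strings stand in for Drill's SchemaPath and LogicalExpression.
class FunctionalIndexMappingSketch {
  // Diamond syntax instead of Maps.newHashMap(): the compiler infers the type arguments.
  private final Map<String, Set<String>> columnToConvert = new HashMap<>();
  private final Map<String, String> exprToConvert = new HashMap<>();
  private int nextId = 1;

  /** Assigns (or reuses) a generated name such as "$1" for an indexed functional expression. */
  String register(String column, String functionalExpr) {
    String generated = exprToConvert.get(functionalExpr);
    if (generated == null) {
      generated = "$" + nextId++;
      exprToConvert.put(functionalExpr, generated);
    }
    // One column can feed several indexed functions, e.g. two different CASTs of a.b.
    columnToConvert.computeIfAbsent(column, k -> new LinkedHashSet<>()).add(generated);
    return generated;
  }

  /** Returns the generated names recorded for a column, in registration order. */
  Set<String> generatedPathsFor(String column) {
    return columnToConvert.getOrDefault(column, new LinkedHashSet<>());
  }

  public static void main(String[] args) {
    FunctionalIndexMappingSketch info = new FunctionalIndexMappingSketch();
    info.register("a.b", "CAST(a.b AS INT)");
    info.register("a.b", "CAST(a.b AS VARCHAR)");
    System.out.println(info.generatedPathsFor("a.b")); // prints [$1, $2]
  }
}
```

The `HashMap` constructor with `<>` behaves the same as `Maps.newHashMap()` here; it just drops the Guava dependency for this call.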


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647500#comment-16647500
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r217477279
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBScanBatchCreator.java
 ##
 @@ -33,7 +33,9 @@
 
 import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
 
-public class MapRDBScanBatchCreator implements BatchCreator {
+public class MapRDBScanBatchCreator implements BatchCreator{
 
 Review comment:
   Add a space before the curly brace.


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647496#comment-16647496
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223680969
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java
 ##
 @@ -166,4 +169,25 @@ public void onMatch(RelOptRuleCall call) {
 return list;
   }
 
+  public static Project replace(Project topProject, Project bottomProject) {
+final List newProjects =
+RelOptUtil.pushPastProject(topProject.getProjects(), bottomProject);
+
+// replace the two projects with a combined projection
+if(topProject instanceof DrillProjectRel) {
 
 Review comment:
   Add a space after `if`.
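The `replace` method in this hunk merges two stacked projects via `RelOptUtil.pushPastProject`, which rewrites the top project's expressions in terms of the bottom project's input. A minimal sketch of that substitution idea, using a hypothetical toy expression tree (`Expr`, `FieldRef`, `Call`) rather than Calcite's `RexNode`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy expression tree; Calcite's RexNode and RelOptUtil are far richer.
class MergeProjectSketch {
  interface Expr {
    /** Replaces every input reference with the corresponding expression of the lower project. */
    Expr substitute(List<Expr> lowerProjects);
  }

  static final class FieldRef implements Expr {
    final int index;
    FieldRef(int index) { this.index = index; }
    @Override public Expr substitute(List<Expr> lowerProjects) { return lowerProjects.get(index); }
    @Override public String toString() { return "$" + index; }
  }

  static final class Call implements Expr {
    final String op;
    final List<Expr> operands;
    Call(String op, Expr... operands) { this.op = op; this.operands = Arrays.asList(operands); }
    @Override public Expr substitute(List<Expr> lowerProjects) {
      Expr[] rewritten = new Expr[operands.size()];
      for (int i = 0; i < rewritten.length; i++) {
        rewritten[i] = operands.get(i).substitute(lowerProjects);
      }
      return new Call(op, rewritten);
    }
    @Override public String toString() { return op + operands; }
  }

  /** Combines Project(top) over Project(bottom) into one expression list over bottom's input. */
  static List<Expr> merge(List<Expr> topProjects, List<Expr> bottomProjects) {
    List<Expr> merged = new ArrayList<>();
    for (Expr e : topProjects) {
      merged.add(e.substitute(bottomProjects));
    }
    return merged;
  }

  public static void main(String[] args) {
    // bottom: [$2, PLUS($0, $1)]; top refers to bottom's outputs.
    List<Expr> bottom = Arrays.<Expr>asList(new FieldRef(2), new Call("PLUS", new FieldRef(0), new FieldRef(1)));
    List<Expr> top = Arrays.<Expr>asList(new FieldRef(1), new Call("TIMES", new FieldRef(0), new FieldRef(1)));
    System.out.println(merge(top, bottom)); // prints [PLUS[$0, $1], TIMES[$2, PLUS[$0, $1]]]
  }
}
```

Each `$i` in the top list is replaced by the bottom project's i-th expression, so the merged list reads directly from the bottom project's input and the intermediate project can be dropped.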


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647495#comment-16647495
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223680822
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/rules/MatchFunction.java
 ##
 @@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index.rules;
+
+import org.apache.calcite.plan.RelOptRuleCall;
+
+public interface MatchFunction {
+  boolean match(RelOptRuleCall call);
+  T onMatch(RelOptRuleCall call);
+}
 
 Review comment:
   Add a newline at the end of the file.
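`MatchFunction` pairs a cheap `match()` test with an `onMatch()` that produces a result. The type parameter was lost in this archive, so the sketch below assumes the interface is generic in the return type `T`. It shows one way a caller might use that contract, with a hypothetical `RuleCall` in place of Calcite's `RelOptRuleCall` and a hypothetical `fire` helper:

```java
import java.util.Optional;

// Hypothetical sketch of the match()/onMatch() protocol.
class MatchFunctionSketch {
  // Stand-in for Calcite's RelOptRuleCall.
  static final class RuleCall {
    final String scanKind;
    RuleCall(String scanKind) { this.scanKind = scanKind; }
  }

  // Same shape as MatchFunction in the diff, assuming it is generic in the result type T.
  interface MatchFunction<T> {
    boolean match(RuleCall call);
    T onMatch(RuleCall call);
  }

  /** Runs the cheap match() check first and only builds the result when it succeeds. */
  static <T> Optional<T> fire(MatchFunction<T> fn, RuleCall call) {
    return fn.match(call) ? Optional.ofNullable(fn.onMatch(call)) : Optional.empty();
  }

  public static void main(String[] args) {
    MatchFunction<String> dbScanOnly = new MatchFunction<String>() {
      @Override public boolean match(RuleCall call) { return "db".equals(call.scanKind); }
      @Override public String onMatch(RuleCall call) { return "index context for " + call.scanKind; }
    };
    System.out.println(fire(dbScanOnly, new RuleCall("db")));   // prints Optional[index context for db]
    System.out.println(fire(dbScanOnly, new RuleCall("file"))); // prints Optional.empty
  }
}
```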


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647493#comment-16647493
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223679873
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/generators/IndexIntersectPlanGenerator.java
 ##
 @@ -0,0 +1,350 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index.generators;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.InvalidRelException;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.util.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.JoinControl;
+import org.apache.drill.exec.planner.index.IndexLogicalPlanCallContext;
+import org.apache.drill.exec.planner.index.IndexDescriptor;
+import org.apache.drill.exec.planner.index.FunctionalIndexInfo;
+import org.apache.drill.exec.planner.index.FunctionalIndexHelper;
+import org.apache.drill.exec.planner.index.IndexPlanUtils;
+import org.apache.drill.exec.planner.index.IndexConditionInfo;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.DrillDistributionTraitDef;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait.DistributionType;
+import org.apache.drill.exec.planner.physical.FilterPrel;
+import org.apache.drill.exec.planner.physical.HashJoinPrel;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.Prule;
+import org.apache.drill.exec.planner.physical.RowKeyJoinPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * IndexScanIntersectGenerator is to generate index plan against multiple index tables,
+ * the input indexes are assumed to be ranked by selectivity(low to high) already.
+ */
+public class IndexIntersectPlanGenerator extends AbstractIndexPlanGenerator {
+
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(IndexIntersectPlanGenerator.class);
+
+  final Map indexInfoMap;
+
+  public IndexIntersectPlanGenerator(IndexLogicalPlanCallContext indexContext,
+ Map indexInfoMap,
+ RexBuilder builder,
+ PlannerSettings settings) {
+super(indexContext, null, null, builder, settings);
+this.indexInfoMap = indexInfoMap;
+  }
+
+  public RelNode buildRowKeyJoin(RelNode left, RelNode right, boolean isRowKeyJoin, int htControl)
+  throws InvalidRelException {
+final int leftRowKeyIdx = getRowKeyIndex(left.getRowType(), origScan);
+final int rightRowKeyIdx = 0; // only rowkey field is being projected from right side
+
+assert leftRowKeyIdx >= 0;
+
+List leftJoinKeys = ImmutableList.of(leftRowKeyIdx);
+List rightJoinKeys = ImmutableList.of(rightRowKeyIdx);
+
+logger.trace(String.format(
+"buildRowKeyJoin: leftIdx: %d, rightIdx: %d",
+

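The class Javadoc above describes building a plan over several index tables whose inputs are already ranked by selectivity, and `buildRowKeyJoin` joins on the row-key column. A minimal sketch of only the underlying set operation (not Drill's `RowKeyJoinPrel`/`HashJoinPrel` plan), assuming each index lookup yields a set of row keys and that the smallest set comes first; the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: each Set holds the row keys that one index lookup returned.
class IndexIntersectSketch {
  /**
   * Intersects row keys from several index lookups. Inputs are assumed to be ordered
   * with the smallest result first, so the working set shrinks early and can short-circuit.
   */
  static Set<String> intersectRowKeys(List<Set<String>> rowKeysPerIndex) {
    if (rowKeysPerIndex.isEmpty()) {
      return new LinkedHashSet<>();
    }
    // Copy the first input so the caller's sets are never modified.
    Set<String> result = new LinkedHashSet<>(rowKeysPerIndex.get(0));
    for (int i = 1; i < rowKeysPerIndex.size() && !result.isEmpty(); i++) {
      // Keep only keys present on both sides, like an inner join on the row-key column.
      result.retainAll(rowKeysPerIndex.get(i));
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> byCity = new LinkedHashSet<>(Arrays.asList("r1", "r4"));
    Set<String> byAge = new LinkedHashSet<>(Arrays.asList("r1", "r2", "r3", "r4"));
    System.out.println(intersectRowKeys(Arrays.asList(byCity, byAge))); // prints [r1, r4]
  }
}
```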
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647492#comment-16647492
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223677718
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/SimpleRexRemap.java
 ##
 @@ -0,0 +1,300 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexCorrelVariable;
+import org.apache.calcite.rex.RexDynamicParam;
+import org.apache.calcite.rex.RexFieldAccess;
+
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexLocalRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexOver;
+import org.apache.calcite.rex.RexRangeRef;
+import org.apache.calcite.rex.RexShuttle;
+import org.apache.calcite.rex.RexVisitorImpl;
+
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.calcite.util.NlsString;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.PathSegment;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Rewrite RexNode with these policies:
+ * 1) field renamed. The input field was named differently in index table,
+ * 2) field is in different position of underlying rowtype
+ *
+ * TODO: 3) certain operator needs rewriting. e.g. CAST function
+ * This class for now applies to only filter on scan, for filter-on-project-on-scan. A stack of
+ * rowType is required.
+ */
+public class SimpleRexRemap {
+  final RelNode origRel;
+  final RelDataType origRowType;
+  final RelDataType newRowType;
+
+  private RexBuilder builder;
+  private Map destExprMap;
+
+  public SimpleRexRemap(RelNode origRel,
+RelDataType newRowType, RexBuilder builder) {
+super();
+this.origRel = origRel;
+this.origRowType = origRel.getRowType();
+this.newRowType = newRowType;
+this.builder = builder;
+this.destExprMap = Maps.newHashMap();
+  }
+
+  /**
+   * Set the map of src expression to target expression, expressions not in the map do not have assigned destinations
+   * @param exprMap
+   * @return
+   */
+  public SimpleRexRemap setExpressionMap(Map  exprMap) {
+destExprMap.putAll(exprMap);
+return this;
+  }
+
+  public RexNode rewriteEqualOnCharToLike(RexNode expr,
+  Map equalOnCastCharExprs) {
+Map srcToReplace = Maps.newIdentityHashMap();
+for(Map.Entry entry: equalOnCastCharExprs.entrySet()) {
+  RexNode equalOp = entry.getKey();
+  LogicalExpression opInput = entry.getValue();
+
+  final List operands = ((RexCall)equalOp).getOperands();
+  RexLiteral newLiteral = null;
+  RexNode input = null;
+  if(operands.size() == 2 ) {
+RexLiteral oplit = null;
+if (operands.get(0) instanceof RexLiteral) {
+  oplit = (RexLiteral) operands.get(0);
+  if(oplit.getTypeName() == SqlTypeName.CHAR) {
+newLiteral = builder.makeLiteral(((NlsString) oplit.getValue()).getValue() + "%");
+input = operands.get(1);
+  }
+}
+else if (operands.get(1) instanceof RexLiteral) {
+  oplit = (RexLiteral) operands.get(1);
+  if(oplit.getTypeName() == SqlTypeName.CHAR) {
 
 Review comment:
   Add a space after `if`.
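The visible part of `rewriteEqualOnCharToLike` looks for a `CHAR` literal on either side of an equality and builds a new literal with `%` appended, turning the comparison into a prefix `LIKE`. A minimal sketch of just that step, with a hypothetical `CharLiteral` class in place of `RexLiteral` and strings in place of `RexNode`; the conditions under which Drill applies the rewrite are cut off in this message and are not modeled:

```java
// Hypothetical sketch; CharLiteral stands in for a CHAR-typed RexLiteral, strings for RexNodes.
class EqualToLikeSketch {
  static final class CharLiteral {
    final String value;
    CharLiteral(String value) { this.value = value; }
  }

  /**
   * Rewrites "input = 'abc'" (literal on either side) into "input LIKE 'abc%'".
   * Returns null when neither operand is a CHAR literal, so no rewrite applies.
   */
  static String rewrite(Object left, Object right) {
    final CharLiteral literal;
    final Object input;
    if (left instanceof CharLiteral) {
      literal = (CharLiteral) left;
      input = right;
    } else if (right instanceof CharLiteral) {
      literal = (CharLiteral) right;
      input = left;
    } else {
      return null;
    }
    // Appending '%' turns the equality into a prefix match, as makeLiteral(... + "%") does in the diff.
    return input + " LIKE '" + literal.value + "%'";
  }

  public static void main(String[] args) {
    System.out.println(rewrite("$1", new CharLiteral("abc"))); // prints $1 LIKE 'abc%'
    System.out.println(rewrite(new CharLiteral("abc"), "$1")); // prints $1 LIKE 'abc%'
  }
}
```

Like the visible diff lines, the sketch does not escape `%` or `_` characters that may already appear in the literal.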


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647494#comment-16647494
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223680267
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/rules/AbstractMatchFunction.java
 ##
 @@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index.rules;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+
+public abstract class AbstractMatchFunction implements MatchFunction {
+  public boolean checkScan(DrillScanRel scanRel) {
+GroupScan groupScan = scanRel.getGroupScan();
+if (groupScan instanceof DbGroupScan) {
+  DbGroupScan dbscan = ((DbGroupScan) groupScan);
+  //if we already applied index convert rule, and this scan is indexScan or restricted scan already,
+  //no more trying index convert rule
+  return dbscan.supportsSecondaryIndex() && (!dbscan.isIndexScan()) && (!dbscan.isRestrictedScan());
+}
+return false;
+  }
+
+  public boolean checkScan(GroupScan groupScan) {
+if (groupScan instanceof DbGroupScan) {
+  DbGroupScan dbscan = ((DbGroupScan) groupScan);
+  //if we already applied index convert rule, and this scan is indexScan or restricted scan already,
+  //no more trying index convert rule
+  return dbscan.supportsSecondaryIndex() &&
+ !dbscan.isRestrictedScan() &&
+  (!dbscan.isFilterPushedDown() || dbscan.isIndexScan()) &&
+ !containsStar(dbscan);
+}
+return false;
+  }
+
+  public static boolean containsStar(DbGroupScan dbscan) {
+for (SchemaPath column : dbscan.getColumns()) {
+  if (column.getRootSegment().getPath().startsWith("*")) {
+return true;
+  }
+}
+return false;
+  }
+}
 
 Review comment:
   Add a newline at the end of the file.
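`checkScan(GroupScan)` accepts a scan only if it supports secondary indexes, is not a restricted scan, has no pushed-down filter unless it is already an index scan, and does not project `*`. A minimal sketch of the same boolean logic, with a hypothetical `ScanInfo` standing in for `DbGroupScan`:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the DbGroupScan properties that checkScan(GroupScan) consults.
class CheckScanSketch {
  static final class ScanInfo {
    final boolean supportsSecondaryIndex;
    final boolean indexScan;
    final boolean restrictedScan;
    final boolean filterPushedDown;
    final List<String> columnRoots; // root segment of each projected column path

    ScanInfo(boolean supportsSecondaryIndex, boolean indexScan, boolean restrictedScan,
             boolean filterPushedDown, List<String> columnRoots) {
      this.supportsSecondaryIndex = supportsSecondaryIndex;
      this.indexScan = indexScan;
      this.restrictedScan = restrictedScan;
      this.filterPushedDown = filterPushedDown;
      this.columnRoots = columnRoots;
    }
  }

  /** Mirrors containsStar: true when any projected column's root path starts with "*". */
  static boolean containsStar(ScanInfo scan) {
    for (String root : scan.columnRoots) {
      if (root.startsWith("*")) {
        return true;
      }
    }
    return false;
  }

  /** Same boolean structure as checkScan(GroupScan) in the diff. */
  static boolean checkScan(ScanInfo scan) {
    return scan.supportsSecondaryIndex
        && !scan.restrictedScan
        // A scan with a pushed-down filter only qualifies if it is already an index scan.
        && (!scan.filterPushedDown || scan.indexScan)
        && !containsStar(scan);
  }

  public static void main(String[] args) {
    List<String> cols = Arrays.asList("a", "b");
    System.out.println(checkScan(new ScanInfo(true, false, false, false, cols)));                // prints true
    System.out.println(checkScan(new ScanInfo(true, false, false, true, cols)));                 // prints false
    System.out.println(checkScan(new ScanInfo(true, false, false, false, Arrays.asList("*")))); // prints false
  }
}
```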


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647491#comment-16647491
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224673328
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ##
 @@ -22,7 +22,7 @@
 import java.util.LinkedList;
 import java.util.List;
 
-import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.calcite.rel.type.RelDataType;
 
 Review comment:
   Agreed. I will make a note to do this as part of the next Calcite version update in Drill.


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647444#comment-16647444
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223670914
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexPlanUtils.java
 ##
 @@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.index;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.DrillProjectRelBase;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+import org.apache.drill.exec.planner.fragment.DistributionAffinity;
+import org.apache.drill.exec.planner.logical.DrillOptiq;
+import org.apache.drill.exec.planner.logical.DrillParseContext;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.common.OrderedRel;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+
+public class IndexPlanUtils {
+
+  public enum ConditionIndexed {
+    NONE,
+    PARTIAL,
+    FULL}
+
+  /**
+   * Check if any of the fields of the index are present in a list of LogicalExpressions supplied
+   * as part of IndexableExprMarker
+   * @param exprMarker, the marker that has analyzed original index condition on top of original scan
+   * @param indexDesc
+   * @return ConditionIndexed.FULL, PARTIAL or NONE depending on whether all, some or no columns
+   * of the indexDesc are present in the list of LogicalExpressions supplied as part of exprMarker
+   *
+   */
+  static public ConditionIndexed conditionIndexed(IndexableExprMarker exprMarker, IndexDescriptor indexDesc) {
+    Map mapRexExpr = exprMarker.getIndexableExpression();
+    List infoCols = Lists.newArrayList();
+    infoCols.addAll(mapRexExpr.values());
+    if (indexDesc.allColumnsIndexed(infoCols)) {
+      return ConditionIndexed.FULL;
+    } else if (indexDesc.someColumnsIndexed(infoCols)) {
+      return ConditionIndexed.PARTIAL;
+    } else {
+      return ConditionIndexed.NONE;
+    }
+  }
+
+  /**
+   * check if we want to apply index rules on this scan,
+   * if group scan is not instance of DbGroupScan, or this DbGroupScan instance does not support secondary index, or
+   *    this scan is already an index scan or Restricted Scan, do not

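Editor's note: the `conditionIndexed` cascade above classifies a filter against one index as FULL, PARTIAL or NONE. The following is a minimal, self-contained sketch of that three-way check, not Drill code. `SimpleIndexDescriptor` is a hypothetical stand-in that models an index as a set of column names, and the semantics of `allColumnsIndexed` ("every column referenced by the condition is indexed") and `someColumnsIndexed` ("at least one is") are assumptions inferred from the method names.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class ConditionIndexedSketch {

  enum ConditionIndexed { NONE, PARTIAL, FULL }

  // Hypothetical stand-in for Drill's IndexDescriptor: only the indexed column names.
  static final class SimpleIndexDescriptor {
    private final Set<String> indexColumns;

    SimpleIndexDescriptor(Set<String> indexColumns) {
      this.indexColumns = indexColumns;
    }

    // Assumed semantics: every column referenced by the condition is an index column.
    boolean allColumnsIndexed(Collection<String> cols) {
      return indexColumns.containsAll(cols);
    }

    // Assumed semantics: at least one referenced column is an index column.
    boolean someColumnsIndexed(Collection<String> cols) {
      return !Collections.disjoint(indexColumns, cols);
    }
  }

  // Same FULL -> PARTIAL -> NONE cascade as IndexPlanUtils.conditionIndexed.
  static ConditionIndexed conditionIndexed(List<String> conditionCols, SimpleIndexDescriptor idx) {
    if (idx.allColumnsIndexed(conditionCols)) {
      return ConditionIndexed.FULL;
    } else if (idx.someColumnsIndexed(conditionCols)) {
      return ConditionIndexed.PARTIAL;
    } else {
      return ConditionIndexed.NONE;
    }
  }

  public static void main(String[] args) {
    SimpleIndexDescriptor idx = new SimpleIndexDescriptor(Set.of("a", "b"));
    System.out.println(conditionIndexed(List.of("a"), idx));      // FULL
    System.out.println(conditionIndexed(List.of("a", "c"), idx)); // PARTIAL
    System.out.println(conditionIndexed(List.of("c"), idx));      // NONE
  }
}
```

Ordering matters: the "all" test must run before the "some" test, because a fully indexed condition also satisfies the "some" predicate.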
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647446#comment-16647446
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223671209

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexPlanUtils.java
 ##
 @@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.index;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.DrillProjectRelBase;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+import org.apache.drill.exec.planner.fragment.DistributionAffinity;
+import org.apache.drill.exec.planner.logical.DrillOptiq;
+import org.apache.drill.exec.planner.logical.DrillParseContext;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.common.OrderedRel;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+
+public class IndexPlanUtils {
+
+  public enum ConditionIndexed {
+    NONE,
+    PARTIAL,
+    FULL}
+
+  /**
+   * Check if any of the fields of the index are present in a list of LogicalExpressions supplied
+   * as part of IndexableExprMarker
+   * @param exprMarker, the marker that has analyzed original index condition on top of original scan
+   * @param indexDesc
+   * @return ConditionIndexed.FULL, PARTIAL or NONE depending on whether all, some or no columns
+   * of the indexDesc are present in the list of LogicalExpressions supplied as part of exprMarker
+   *
+   */
+  static public ConditionIndexed conditionIndexed(IndexableExprMarker exprMarker, IndexDescriptor indexDesc) {
+    Map mapRexExpr = exprMarker.getIndexableExpression();
+    List infoCols = Lists.newArrayList();
+    infoCols.addAll(mapRexExpr.values());
+    if (indexDesc.allColumnsIndexed(infoCols)) {
+      return ConditionIndexed.FULL;
+    } else if (indexDesc.someColumnsIndexed(infoCols)) {
+      return ConditionIndexed.PARTIAL;
+    } else {
+      return ConditionIndexed.NONE;
+    }
+  }
+
+  /**
+   * check if we want to apply index rules on this scan,
+   * if group scan is not instance of DbGroupScan, or this DbGroupScan instance does not support secondary index, or
+   *    this scan is already an index scan or Restricted Scan, do not

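Editor's note: the truncated Javadoc above lists when index rules should be skipped: the group scan is not a `DbGroupScan`, it does not support secondary indexes, or it is already an index scan or a restricted scan. A minimal sketch of that predicate follows. The interfaces and their method names (`supportsSecondaryIndex`, `isIndexScan`, `isRestrictedScan`) are hypothetical stand-ins chosen to mirror the prose, not the real Drill interfaces, and `FakeDbScan` exists only for the usage example.

```java
public class IndexRuleApplicabilitySketch {

  // Hypothetical stand-ins for Drill's GroupScan hierarchy; not the real interfaces.
  interface GroupScan { }

  interface DbGroupScan extends GroupScan {
    boolean supportsSecondaryIndex();
    boolean isIndexScan();
    boolean isRestrictedScan();
  }

  // Index rules apply only to a DB scan that supports secondary indexes and has not
  // already been rewritten into an index scan or a restricted scan.
  static boolean shouldApplyIndexRules(GroupScan scan) {
    if (!(scan instanceof DbGroupScan)) {
      return false;
    }
    DbGroupScan db = (DbGroupScan) scan;
    return db.supportsSecondaryIndex() && !db.isIndexScan() && !db.isRestrictedScan();
  }

  // Test double used only by the usage example below.
  record FakeDbScan(boolean secondary, boolean index, boolean restricted) implements DbGroupScan {
    public boolean supportsSecondaryIndex() { return secondary; }
    public boolean isIndexScan() { return index; }
    public boolean isRestrictedScan() { return restricted; }
  }

  public static void main(String[] args) {
    System.out.println(shouldApplyIndexRules(new GroupScan() { }));                // false
    System.out.println(shouldApplyIndexRules(new FakeDbScan(true, false, false))); // true
    System.out.println(shouldApplyIndexRules(new FakeDbScan(true, true, false)));  // false
  }
}
```

Rejecting scans that are already index or restricted scans keeps the planner from re-applying the index rewrite to its own output.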
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647442#comment-16647442
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223670746

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexPlanUtils.java
 ##
 @@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.index;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.DrillProjectRelBase;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+import org.apache.drill.exec.planner.fragment.DistributionAffinity;
+import org.apache.drill.exec.planner.logical.DrillOptiq;
+import org.apache.drill.exec.planner.logical.DrillParseContext;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.common.OrderedRel;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+
+public class IndexPlanUtils {
+
+  public enum ConditionIndexed {
+    NONE,
+    PARTIAL,
+    FULL}
+
+  /**
+   * Check if any of the fields of the index are present in a list of LogicalExpressions supplied
+   * as part of IndexableExprMarker
+   * @param exprMarker, the marker that has analyzed original index condition on top of original scan
+   * @param indexDesc
+   * @return ConditionIndexed.FULL, PARTIAL or NONE depending on whether all, some or no columns
+   * of the indexDesc are present in the list of LogicalExpressions supplied as part of exprMarker
+   *
+   */
+  static public ConditionIndexed conditionIndexed(IndexableExprMarker exprMarker, IndexDescriptor indexDesc) {
+    Map mapRexExpr = exprMarker.getIndexableExpression();
+    List infoCols = Lists.newArrayList();
+    infoCols.addAll(mapRexExpr.values());
+    if (indexDesc.allColumnsIndexed(infoCols)) {
+      return ConditionIndexed.FULL;
+    } else if (indexDesc.someColumnsIndexed(infoCols)) {
+      return ConditionIndexed.PARTIAL;
+    } else {
+      return ConditionIndexed.NONE;
+    }
+  }
+
+  /**
+   * check if we want to apply index rules on this scan,
+   * if group scan is not instance of DbGroupScan, or this DbGroupScan instance does not support secondary index, or
+   *    this scan is already an index scan or Restricted Scan, do not

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647441#comment-16647441
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223670701

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexPlanUtils.java
 ##
 @@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.index;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.DrillProjectRelBase;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+import org.apache.drill.exec.planner.fragment.DistributionAffinity;
+import org.apache.drill.exec.planner.logical.DrillOptiq;
+import org.apache.drill.exec.planner.logical.DrillParseContext;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.common.OrderedRel;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+
+public class IndexPlanUtils {
+
+  public enum ConditionIndexed {
+    NONE,
+    PARTIAL,
+    FULL}
+
+  /**
+   * Check if any of the fields of the index are present in a list of LogicalExpressions supplied
+   * as part of IndexableExprMarker
+   * @param exprMarker, the marker that has analyzed original index condition on top of original scan
+   * @param indexDesc
+   * @return ConditionIndexed.FULL, PARTIAL or NONE depending on whether all, some or no columns
+   * of the indexDesc are present in the list of LogicalExpressions supplied as part of exprMarker
+   *
+   */
+  static public ConditionIndexed conditionIndexed(IndexableExprMarker exprMarker, IndexDescriptor indexDesc) {
+    Map mapRexExpr = exprMarker.getIndexableExpression();
+    List infoCols = Lists.newArrayList();
+    infoCols.addAll(mapRexExpr.values());
+    if (indexDesc.allColumnsIndexed(infoCols)) {
+      return ConditionIndexed.FULL;
+    } else if (indexDesc.someColumnsIndexed(infoCols)) {
+      return ConditionIndexed.PARTIAL;
+    } else {
+      return ConditionIndexed.NONE;
+    }
+  }
+
+  /**
+   * check if we want to apply index rules on this scan,
+   * if group scan is not instance of DbGroupScan, or this DbGroupScan instance does not support secondary index, or
+   *    this scan is already an index scan or Restricted Scan, do not

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647447#comment-16647447
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223671338

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexSelector.java
 ##
 @@ -0,0 +1,766 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptPlanner;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.metadata.RelMdUtil;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.planner.common.DrillJoinRelBase;
+import org.apache.drill.exec.planner.cost.DrillCostBase;
+import org.apache.drill.exec.planner.cost.PluginCost;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+
+public class IndexSelector  {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(IndexSelector.class);
+  private static final double COVERING_TO_NONCOVERING_FACTOR = 100.0;
+  private RexNode indexCondition;   // filter condition on indexed columns
+  private RexNode otherRemainderCondition;  // remainder condition on all other columns
+  private double totalRows;
+  private Statistics stats; // a Statistics instance that will be used to get estimated rowcount for filter conditions
+  private IndexConditionInfo.Builder builder;
+  private List indexPropList;
+  private DrillScanRelBase primaryTableScan;
+  private IndexCallContext indexContext;
+  private RexBuilder rexBuilder;
+
+  public IndexSelector(RexNode indexCondition,
+      RexNode otherRemainderCondition,
+      IndexCallContext indexContext,
+      IndexCollection collection,
+      RexBuilder rexBuilder,
+      double totalRows) {
+    this.indexCondition = indexCondition;
+    this.otherRemainderCondition = otherRemainderCondition;
+    this.indexContext = indexContext;
+    this.totalRows = totalRows;
+    this.stats = indexContext.getGroupScan().getStatistics();
+    this.rexBuilder = rexBuilder;
+    this.builder =
+        IndexConditionInfo.newBuilder(indexCondition, collection, rexBuilder, indexContext.getScan());
+    this.primaryTableScan = indexContext.getScan();
+    this.indexPropList = Lists.newArrayList();
+  }
+
+  /**
+   * This constructor is to build selector for no index condition case (no filter)
+   * @param indexContext
+   */
+  public IndexSelector(IndexCallContext indexContext) {
+    this.indexCondition = null;
+    this.otherRemainderCondition = null;
+    this.indexContext = indexContext;
+    this.totalRows = Statistics.ROWCOUNT_UNKNOWN;
+    this.stats = indexContext.getGroupScan().getStatistics();
+    this.rexBuilder = indexContext.getScan().getCluster().getRexBuilder();
+    this.builder = null;
+    this.primaryTableScan = indexContext.getScan();
+    this.indexPropList = Lists.newArrayList();
+  }
+
+  public void addIndex(IndexDescriptor indexDesc, boolean isCovering, int numProjectedFields) {
+    IndexProperties indexProps = new DrillIndexProperties(indexDesc,

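Editor's note: `IndexSelector.addIndex(indexDesc, isCovering, numProjectedFields)` collects candidate indexes before one is chosen. The sketch below illustrates that collect-then-rank shape under simplifying assumptions; it is not Drill's selection logic. `Candidate`, `isCovering(...)` and the ranking rule (covering indexes first, then fewer estimated rows) are hypothetical, and the quoted diff does not show how Drill actually uses `COVERING_TO_NONCOVERING_FACTOR`, so the factor is deliberately not modeled here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class IndexSelectorSketch {

  // Hypothetical, simplified candidate; Drill's IndexProperties carries much more.
  // numProjectedFields is kept only to mirror addIndex's signature; it is unused in ranking.
  record Candidate(String name, boolean isCovering, int numProjectedFields, double estimatedRows) { }

  private final List<Candidate> candidates = new ArrayList<>();

  // An index "covers" the query when it stores every projected column,
  // so no lookup into the primary table is needed.
  static boolean isCovering(Set<String> indexStoredCols, Set<String> projectedCols) {
    return indexStoredCols.containsAll(projectedCols);
  }

  void addIndex(String name, boolean isCovering, int numProjectedFields, double estimatedRows) {
    candidates.add(new Candidate(name, isCovering, numProjectedFields, estimatedRows));
  }

  // Illustrative ranking only: covering indexes first, then fewer estimated qualifying rows.
  List<Candidate> ranked() {
    List<Candidate> copy = new ArrayList<>(candidates);
    copy.sort(Comparator.comparing((Candidate c) -> !c.isCovering())
        .thenComparingDouble(Candidate::estimatedRows));
    return copy;
  }

  public static void main(String[] args) {
    Set<String> projected = Set.of("a", "b");
    IndexSelectorSketch selector = new IndexSelectorSketch();
    selector.addIndex("idx_a", isCovering(Set.of("a"), projected), 2, 50.0);
    selector.addIndex("idx_ab", isCovering(Set.of("a", "b"), projected), 2, 500.0);
    selector.addIndex("idx_abc", isCovering(Set.of("a", "b", "c"), projected), 2, 200.0);
    // Prints idx_abc, idx_ab, idx_a
    selector.ranked().forEach(c -> System.out.println(c.name()));
  }
}
```

A strict "covering first" rule can lose to a far more selective non-covering index; a cost-based selector would instead weigh the extra primary-table lookups against the row reduction.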
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647445#comment-16647445
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223671173

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexPlanUtils.java
 ##
 @@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.index;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.DrillProjectRelBase;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+import org.apache.drill.exec.planner.fragment.DistributionAffinity;
+import org.apache.drill.exec.planner.logical.DrillOptiq;
+import org.apache.drill.exec.planner.logical.DrillParseContext;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.common.OrderedRel;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+
+public class IndexPlanUtils {
+
+  public enum ConditionIndexed {
+    NONE,
+    PARTIAL,
+    FULL}
+
+  /**
+   * Check if any of the fields of the index are present in a list of LogicalExpressions supplied
+   * as part of IndexableExprMarker
+   * @param exprMarker, the marker that has analyzed original index condition on top of original scan
+   * @param indexDesc
+   * @return ConditionIndexed.FULL, PARTIAL or NONE depending on whether all, some or no columns
+   * of the indexDesc are present in the list of LogicalExpressions supplied as part of exprMarker
+   *
+   */
+  static public ConditionIndexed conditionIndexed(IndexableExprMarker exprMarker, IndexDescriptor indexDesc) {
+    Map mapRexExpr = exprMarker.getIndexableExpression();
+    List infoCols = Lists.newArrayList();
+    infoCols.addAll(mapRexExpr.values());
+    if (indexDesc.allColumnsIndexed(infoCols)) {
+      return ConditionIndexed.FULL;
+    } else if (indexDesc.someColumnsIndexed(infoCols)) {
+      return ConditionIndexed.PARTIAL;
+    } else {
+      return ConditionIndexed.NONE;
+    }
+  }
+
+  /**
+   * check if we want to apply index rules on this scan,
+   * if group scan is not instance of DbGroupScan, or this DbGroupScan instance does not support secondary index, or
+   *    this scan is already an index scan or Restricted Scan, do not

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647443#comment-16647443
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223670799

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexPlanUtils.java
 ##
 @@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.index;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.DrillProjectRelBase;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+import org.apache.drill.exec.planner.fragment.DistributionAffinity;
+import org.apache.drill.exec.planner.logical.DrillOptiq;
+import org.apache.drill.exec.planner.logical.DrillParseContext;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.common.OrderedRel;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+
+public class IndexPlanUtils {
+
+  public enum ConditionIndexed {
+    NONE,
+    PARTIAL,
+    FULL}
+
+  /**
+   * Check if any of the fields of the index are present in a list of LogicalExpressions supplied
+   * as part of IndexableExprMarker
+   * @param exprMarker, the marker that has analyzed original index condition on top of original scan
+   * @param indexDesc
+   * @return ConditionIndexed.FULL, PARTIAL or NONE depending on whether all, some or no columns
+   * of the indexDesc are present in the list of LogicalExpressions supplied as part of exprMarker
+   *
+   */
+  static public ConditionIndexed conditionIndexed(IndexableExprMarker exprMarker, IndexDescriptor indexDesc) {
+    Map mapRexExpr = exprMarker.getIndexableExpression();
+    List infoCols = Lists.newArrayList();
+    infoCols.addAll(mapRexExpr.values());
+    if (indexDesc.allColumnsIndexed(infoCols)) {
+      return ConditionIndexed.FULL;
+    } else if (indexDesc.someColumnsIndexed(infoCols)) {
+      return ConditionIndexed.PARTIAL;
+    } else {
+      return ConditionIndexed.NONE;
+    }
+  }
+
+  /**
+   * check if we want to apply index rules on this scan,
+   * if group scan is not instance of DbGroupScan, or this DbGroupScan instance does not support secondary index, or
+   *    this scan is already an index scan or Restricted Scan, do not

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647440#comment-16647440
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223668629

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexConditionInfo.java
 ##
 @@ -0,0 +1,250 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.logical.partition.RewriteCombineBinaryOperators;
+import org.apache.calcite.rel.RelNode;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class IndexConditionInfo {
+  public final RexNode indexCondition;
+  public final RexNode remainderCondition;
+  public final boolean hasIndexCol;
+
+  public IndexConditionInfo(RexNode indexCondition, RexNode remainderCondition, boolean hasIndexCol) {
+this.indexCondition = indexCondition;
+this.remainderCondition = remainderCondition;
+this.hasIndexCol = hasIndexCol;
+  }
+
+  public static Builder newBuilder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan) {
+return new Builder(condition, indexes, builder, scan);
+  }
+
+  public static class Builder {
+final RexBuilder builder;
+final RelNode scan;
+final Iterable indexes;
+private RexNode condition;
+
+public Builder(RexNode condition,
+   Iterable indexes,
+   RexBuilder builder,
+   RelNode scan
+) {
 
 Review comment:
   formatting
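`IndexConditionInfo` carries an `indexCondition`, a `remainderCondition`, and a `hasIndexCol` flag. A hypothetical sketch of how a flat AND-only filter could be partitioned into those parts, assuming each conjunct references a single column and that a conjunct can go to the index when its column is an index key. The real builder works on Calcite `RexNode`s and handles more cases; `ConditionSplitSketch` and its nested types are illustrative only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: strings stand in for Calcite RexNode conjuncts.
class ConditionSplitSketch {
  /** One conjunct of an AND-only filter, tagged with the single column it references. */
  static final class Pred {
    final String column;
    final String text;
    Pred(String column, String text) { this.column = column; this.text = text; }
  }

  /** Same three fields as IndexConditionInfo; an empty string means "no condition". */
  static final class Split {
    final String indexCondition;
    final String remainderCondition;
    final boolean hasIndexCol;
    Split(String indexCondition, String remainderCondition, boolean hasIndexCol) {
      this.indexCondition = indexCondition;
      this.remainderCondition = remainderCondition;
      this.hasIndexCol = hasIndexCol;
    }
  }

  static Split split(List<Pred> conjuncts, Set<String> indexCols) {
    List<String> onIndex = new ArrayList<>();
    List<String> remainder = new ArrayList<>();
    for (Pred p : conjuncts) {
      if (indexCols.contains(p.column)) {
        onIndex.add(p.text);   // can be evaluated against the index
      } else {
        remainder.add(p.text); // must be re-applied after fetching rows
      }
    }
    return new Split(String.join(" AND ", onIndex), String.join(" AND ", remainder),
        !onIndex.isEmpty());
  }

  public static void main(String[] args) {
    Set<String> index = new HashSet<>(Arrays.asList("address.state", "address.city"));
    Split s = split(Arrays.asList(
        new Pred("address.state", "address.state = 'CA'"),
        new Pred("name.fname", "name.fname = 'VcFahj'")), index);
    System.out.println(s.indexCondition);     // address.state = 'CA'
    System.out.println(s.remainderCondition); // name.fname = 'VcFahj'
    System.out.println(s.hasIndexCol);        // true
  }
}
```
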


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, Drill planner should be enhanced to provide an 
> abstraction layer which expresses the index metadata and statistics.  Further, 
> a cost-based index selection is needed to decide which index(es) are 
> suitable.  
> On the execution side, appropriate operator enhancements would be needed to 
> handle different categories of indexes such as covering, non-covering 
> indexes, taking into consideration the index data may not be co-located with 
> the primary table, i.e. a global index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647439#comment-16647439
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223667409
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/DrillIndexDefinition.java
 ##
 @@ -0,0 +1,278 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelFieldCollation.NullDirection;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class DrillIndexDefinition implements IndexDefinition {
+  /**
+   * The indexColumns is the list of column(s) on which this index is created. If there is more than 1 column,
+   * the order of the columns is important: index on {a, b} is not the same as index on {b, a}
+   * NOTE: the indexed column could be of type columnfamily.column
+   */
+  @JsonProperty
+  protected final List indexColumns;
+
+  /**
+   * nonIndexColumns: the list of columns that are included in the index as 'covering'
+   * columns but are not themselves indexed.  These are useful for covering indexes where the
+   * query request can be satisfied directly by the index and avoid accessing the table altogether.
+   */
+  @JsonProperty
+  protected final List nonIndexColumns;
+
+  @JsonIgnore
+  protected final Set allIndexColumns;
+
+  @JsonProperty
+  protected final List rowKeyColumns;
+
+  @JsonProperty
+  protected final CollationContext indexCollationContext;
+
+  /**
+   * indexName: name of the index that should be unique within the scope of a table
+   */
+  @JsonProperty
+  protected final String indexName;
+
+  protected final String tableName;
+
+  @JsonProperty
+  protected final IndexDescriptor.IndexType indexType;
+
+  @JsonProperty
+  protected final NullDirection nullsDirection;
+
+  public DrillIndexDefinition(List indexCols,
+  CollationContext indexCollationContext,
+  List nonIndexCols,
+  List rowKeyColumns,
+  String indexName,
+  String tableName,
+  IndexType type,
+  NullDirection nullsDirection) {
+this.indexColumns = indexCols;
+this.nonIndexColumns = nonIndexCols;
+this.rowKeyColumns = rowKeyColumns;
+this.indexName = indexName;
+this.tableName = tableName;
+this.indexType = type;
+this.allIndexColumns = Sets.newHashSet(indexColumns);
+this.allIndexColumns.addAll(nonIndexColumns);
+this.indexCollationContext = indexCollationContext;
+this.nullsDirection = nullsDirection;
+
+  }
+
+  @Override
+  public int getIndexColumnOrdinal(LogicalExpression path) {
+int id = indexColumns.indexOf(path);
+return id;
+  }
+
+  @Override
+  public boolean isCoveringIndex(List columns) {
+return allIndexColumns.containsAll(columns);
+  }
+
+  @Override
+  public boolean allColumnsIndexed(Collection columns) {
+return columnsInIndexFields(columns, indexColumns);
+  }
+
+  @Override
+  public boolean someColumnsIndexed(Collection columns) {
+return someColumnsInIndexFields(columns, indexColumns);
+  }
+
+  public boolean pathExactIn(SchemaPath path, Collection exprs) {
+for (LogicalExpression expr : exprs) {
+  if (expr instanceof SchemaPath) {
+if (((SchemaPath) 

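The `DrillIndexDefinition` fields above keep the key columns as an ordered `List` but fold key and covering columns into an unordered `Set`. A simplified sketch of why that split matters, assuming string column names; `IndexDefinitionSketch` and `sameKeyAs` are hypothetical, while `isCoveringIndex` and `getIndexColumnOrdinal` mirror the logic of the methods with the same names:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch mirroring the indexColumns (List) / allIndexColumns (Set) split.
class IndexDefinitionSketch {
  final List<String> indexColumns;   // ordered key columns: {a, b} != {b, a}
  final Set<String> allIndexColumns; // key columns plus included (covering) columns

  IndexDefinitionSketch(List<String> indexCols, List<String> nonIndexCols) {
    this.indexColumns = indexCols;
    this.allIndexColumns = new HashSet<>(indexCols);
    this.allIndexColumns.addAll(nonIndexCols);
  }

  /** Covering: every column the query needs is stored in the index (order irrelevant). */
  boolean isCoveringIndex(List<String> queryCols) {
    return allIndexColumns.containsAll(queryCols);
  }

  /** Position of a column among the key columns, or -1 if it is not a key column. */
  int getIndexColumnOrdinal(String col) {
    return indexColumns.indexOf(col);
  }

  /** List.equals is order-sensitive, so {a, b} and {b, a} are different keys. */
  boolean sameKeyAs(IndexDefinitionSketch other) {
    return indexColumns.equals(other.indexColumns);
  }

  public static void main(String[] args) {
    IndexDefinitionSketch ab = new IndexDefinitionSketch(Arrays.asList("a", "b"), Arrays.asList("c"));
    IndexDefinitionSketch ba = new IndexDefinitionSketch(Arrays.asList("b", "a"), Arrays.asList("c"));
    System.out.println(ab.sameKeyAs(ba));                            // false
    System.out.println(ab.isCoveringIndex(Arrays.asList("c", "a"))); // true
    System.out.println(ab.getIndexColumnOrdinal("b"));               // 1
  }
}
```

Both definitions cover exactly the same columns, yet they are different indexes because the key order differs.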
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647437#comment-16647437
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223664484
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/RowKeyJoinBatch.java
 ##
 @@ -0,0 +1,284 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.join;
+
+
+import java.util.List;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.config.RowKeyJoinPOP;
+import org.apache.drill.exec.record.AbstractRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector2;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.Iterables;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+
+public class RowKeyJoinBatch extends AbstractRecordBatch implements RowKeyJoin {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RowKeyJoinBatch.class);
+
+  // primary table side record batch
+  private final RecordBatch left;
+
+  // index table side record batch
+  private final RecordBatch right;
+
+  private boolean hasRowKeyBatch;
+  private IterOutcome leftUpstream = IterOutcome.NONE;
+  private IterOutcome rightUpstream = IterOutcome.NONE;
+  private final List transfers = Lists.newArrayList();
+  private int recordCount = 0;
+  private SchemaChangeCallBack callBack = new SchemaChangeCallBack();
+  private RowKeyJoinState rkJoinState = RowKeyJoinState.INITIAL;
+
+  public RowKeyJoinBatch(RowKeyJoinPOP config, FragmentContext context, RecordBatch left, RecordBatch right)
+  throws OutOfMemoryException {
+super(config, context, true /* need to build schema */);
+this.left = left;
+this.right = right;
+this.hasRowKeyBatch = false;
+  }
+
+  @Override
+  public int getRecordCount() {
+if (state == BatchState.DONE) {
+  return 0;
+}
+return recordCount;
+  }
+
+  @Override
+  public SelectionVector2 getSelectionVector2() {
+throw new UnsupportedOperationException("RowKeyJoinBatch does not support selection vector");
+  }
+
+  @Override
+  public SelectionVector4 getSelectionVector4() {
+throw new UnsupportedOperationException("RowKeyJoinBatch does not support selection vector");
+  }
+
+  @Override
+  protected void buildSchema() throws SchemaChangeException {
+container.clear();
+
+rightUpstream = next(right);
+
+if (leftUpstream == IterOutcome.STOP || rightUpstream == IterOutcome.STOP) {
+  state = BatchState.STOP;
+  return;
+}
+
+if (right.getRecordCount() > 0) {
+  // set the hasRowKeyBatch flag such that calling next() on the left input
+  // would see the correct status
+  hasRowKeyBatch = true;
+}
+
+leftUpstream = next(left);
+
+if (leftUpstream == IterOutcome.OUT_OF_MEMORY || rightUpstream == IterOutcome.OUT_OF_MEMORY) {
+  state = BatchState.OUT_OF_MEMORY;
+  return;
+}
+
+for(final VectorWrapper v : left) {
 
 Review comment:
   space
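Going by the field comments in `RowKeyJoinBatch`, the right input carries row keys from the index table and the left input reads the primary table. A hypothetical in-memory sketch of that index-then-fetch pattern, assuming one `Map` per table and point lookups by row key in place of a full scan; none of these names are Drill APIs:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Maps stand in for the index table and the primary table.
class RowKeyJoinSketch {
  /** Index side: secondary-key value -> row keys of matching primary-table rows. */
  static List<String> rowKeysFor(Map<String, List<String>> index, String value) {
    return index.getOrDefault(value, Collections.<String>emptyList());
  }

  /** Primary side: fetch only the requested rows by key instead of scanning every row. */
  static List<String> restrictedLookup(Map<String, String> primary, List<String> rowKeys) {
    List<String> rows = new ArrayList<>();
    for (String key : rowKeys) {
      String row = primary.get(key);
      if (row != null) { // skip keys with no matching primary-table row
        rows.add(row);
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    Map<String, String> primary = new HashMap<>();
    primary.put("r1", "{state: CA, fname: Ann}");
    primary.put("r2", "{state: NY, fname: Bob}");
    primary.put("r3", "{state: CA, fname: Cid}");
    Map<String, List<String>> stateIndex = new HashMap<>();
    stateIndex.put("CA", Arrays.asList("r1", "r3"));
    stateIndex.put("NY", Arrays.asList("r2"));
    // prints the two CA rows, in index order
    System.out.println(restrictedLookup(primary, rowKeysFor(stateIndex, "CA")));
  }
}
```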


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647435#comment-16647435
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223661752
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/IndexGroupScan.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.base;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.planner.index.Statistics;
+
+
+import java.util.List;
+
+/**
+ * An IndexGroupScan operator represents the scan associated with an Index.
+ */
+public interface IndexGroupScan extends GroupScan {
+
+  /**
+   * Get the column ordinal of the rowkey column from the output schema of the IndexGroupScan
+   * @return
+   */
+  @JsonIgnore
+  public int getRowKeyOrdinal();
+
+  /**
+   * Set the artificial row count after applying the {@link RexNode} condition
+   * Mainly used for debugging
+   * @param condition
+   * @param count
+   * @param capRowCount
+   */
+  @JsonIgnore
+  public void setRowCount(RexNode condition, double count, double capRowCount);
+
+  /**
+   * Get the row count after applying the {@link RexNode} condition
+   * @param condition, filter to apply
+   * @return row count post filtering
+   */
+  @JsonIgnore
+  public double getRowCount(RexNode condition, RelNode scanRel);
+
+  /**
+   * Set the statistics for {@link IndexGroupScan}
+   * @param statistics
+   */
+  @JsonIgnore
+  public void setStatistics(Statistics statistics);
+
+  @JsonIgnore
+  public void setColumns(List columns);
+
+  @JsonIgnore
+  public List getColumns();
+
+  @JsonIgnore
+  public void setParallelizationWidth(int width);
+
+}
 
 Review comment:
   new line
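`getRowKeyOrdinal()` exposes where the row key sits in the index scan's output schema, so a consumer can pull row keys out of each index record. A hypothetical trimmed-down illustration, assuming the output schema is just the projected column list and that the caller supplies the row key column's name (`_id` below is only an example value); `SimpleIndexScan` and `ListIndexScan` are not Drill types:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical trimmed-down contract; not Drill's GroupScan/IndexGroupScan.
interface SimpleIndexScan {
  List<String> getColumns();
  int getRowKeyOrdinal();
}

class ListIndexScan implements SimpleIndexScan {
  private final List<String> columns; // projected output columns, in order
  private final String rowKeyColumn;  // assumed name of the row key column

  ListIndexScan(List<String> columns, String rowKeyColumn) {
    this.columns = columns;
    this.rowKeyColumn = rowKeyColumn;
  }

  @Override
  public List<String> getColumns() {
    return columns;
  }

  /** Position of the row key in the output schema; fails fast if it was not projected. */
  @Override
  public int getRowKeyOrdinal() {
    int ordinal = columns.indexOf(rowKeyColumn);
    if (ordinal < 0) {
      throw new IllegalStateException("row key column not projected: " + rowKeyColumn);
    }
    return ordinal;
  }

  public static void main(String[] args) {
    SimpleIndexScan scan = new ListIndexScan(Arrays.asList("address.state", "_id"), "_id");
    System.out.println(scan.getRowKeyOrdinal()); // 1
  }
}
```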


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647438#comment-16647438
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223664534
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/RowKeyJoinBatch.java
 ##
 @@ -0,0 +1,284 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.join;
+
+
+import java.util.List;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.config.RowKeyJoinPOP;
+import org.apache.drill.exec.record.AbstractRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector2;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.Iterables;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+
+public class RowKeyJoinBatch extends AbstractRecordBatch implements RowKeyJoin {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RowKeyJoinBatch.class);
+
+  // primary table side record batch
+  private final RecordBatch left;
+
+  // index table side record batch
+  private final RecordBatch right;
+
+  private boolean hasRowKeyBatch;
+  private IterOutcome leftUpstream = IterOutcome.NONE;
+  private IterOutcome rightUpstream = IterOutcome.NONE;
+  private final List transfers = Lists.newArrayList();
+  private int recordCount = 0;
+  private SchemaChangeCallBack callBack = new SchemaChangeCallBack();
+  private RowKeyJoinState rkJoinState = RowKeyJoinState.INITIAL;
+
+  public RowKeyJoinBatch(RowKeyJoinPOP config, FragmentContext context, RecordBatch left, RecordBatch right)
+  throws OutOfMemoryException {
+super(config, context, true /* need to build schema */);
+this.left = left;
+this.right = right;
+this.hasRowKeyBatch = false;
+  }
+
+  @Override
+  public int getRecordCount() {
+if (state == BatchState.DONE) {
+  return 0;
+}
+return recordCount;
+  }
+
+  @Override
+  public SelectionVector2 getSelectionVector2() {
+throw new UnsupportedOperationException("RowKeyJoinBatch does not support selection vector");
+  }
+
+  @Override
+  public SelectionVector4 getSelectionVector4() {
+throw new UnsupportedOperationException("RowKeyJoinBatch does not support selection vector");
+  }
+
+  @Override
+  protected void buildSchema() throws SchemaChangeException {
+container.clear();
+
+rightUpstream = next(right);
+
+if (leftUpstream == IterOutcome.STOP || rightUpstream == IterOutcome.STOP) {
+  state = BatchState.STOP;
+  return;
+}
+
+if (right.getRecordCount() > 0) {
+  // set the hasRowKeyBatch flag such that calling next() on the left input
+  // would see the correct status
+  hasRowKeyBatch = true;
+}
+
+leftUpstream = next(left);
+
+if (leftUpstream == IterOutcome.OUT_OF_MEMORY || rightUpstream == IterOutcome.OUT_OF_MEMORY) {
+  state = BatchState.OUT_OF_MEMORY;
+  return;
+}
+
+for(final VectorWrapper v : left) {
+  final TransferPair pair = v.getValueVector().makeTransferPair(
+  container.addOrGet(v.getField(), callBack));
+  transfers.add(pair);
+}
+
+container.buildSchema(left.getSchema().getSelectionVectorMode());
+  }
+
+  @Override
+  public IterOutcome innerNext() {
+if (state == BatchState.DONE) {
+  return IterOutcome.NONE;
+}
+try {
+  if (state == BatchState.FIRST && 

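In `buildSchema()` above, the right (index) input is pulled before the left (primary-table) input, and `hasRowKeyBatch` is set in between so that the left side "would see the correct status". A hypothetical single-threaded sketch of that ordering, with plain lists replacing Drill's `RecordBatch`/`IterOutcome` protocol; `RowKeyJoinOrderSketch`, `step`, and `restrictedNext` are illustrative names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the right-then-left call order; not Drill's operator API.
class RowKeyJoinOrderSketch {
  private boolean hasRowKeyBatch = false;
  private List<String> currentKeys = Collections.emptyList();

  boolean hasRowKeyBatch() {
    return hasRowKeyBatch;
  }

  /** Left input: a restricted scan that reads only when the join reports pending row keys. */
  static List<String> restrictedNext(RowKeyJoinOrderSketch join, Map<String, String> table) {
    List<String> rows = new ArrayList<>();
    if (!join.hasRowKeyBatch()) {
      return rows; // nothing to fetch for this round
    }
    for (String key : join.currentKeys) {
      String row = table.get(key);
      if (row != null) {
        rows.add(row);
      }
    }
    return rows;
  }

  /** One round: consume the right (index) batch, set the flag, then call the left input. */
  List<String> step(List<String> rightBatch, Map<String, String> table) {
    currentKeys = rightBatch;
    hasRowKeyBatch = !rightBatch.isEmpty(); // set before the left side runs, as in buildSchema()
    return restrictedNext(this, table);
  }

  public static void main(String[] args) {
    Map<String, String> table = new HashMap<>();
    table.put("r1", "row1");
    table.put("r2", "row2");
    RowKeyJoinOrderSketch join = new RowKeyJoinOrderSketch();
    System.out.println(join.step(Arrays.asList("r2"), table));             // [row2]
    System.out.println(join.step(Collections.<String>emptyList(), table)); // []
  }
}
```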
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647433#comment-16647433
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223660467
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/LargeTableGenBase.java
 ##
 @@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import org.apache.commons.lang3.RandomStringUtils;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Random;
+import java.util.Set;
+
+public class LargeTableGenBase {
+
+  private boolean dict_ready = false;
+
+  protected List firstnames;
+  protected List lastnames;
+  protected List cities;
+  protected int[] randomized;
+
+  protected synchronized void  initDictionary() {
+initDictionaryWithRand();
+  }
+
+  protected void initDictionaryWithRand() {
+{
+  firstnames = new ArrayList<>();
+  lastnames = new ArrayList<>();
+  cities = new ArrayList<>();
+  List states = new ArrayList<>();
+
+  int fnNum = 2000; //2k
+  int lnNum = 20;//200k
+  int cityNum = 1;//10k
+  int stateNum = 50;
+  Random rand = new Random(2017);
+  int i;
+  try {
+Set strSet = new LinkedHashSet<>();
+while(strSet.size() < stateNum) {
+  strSet.add(RandomStringUtils.random(2, 0, 0, true, false, null, rand));
+}
+states.addAll(strSet);
+
+strSet = new LinkedHashSet<>();
+while(strSet.size() < cityNum) {
+  int len = 3 + strSet.size() % 6;
+  strSet.add(RandomStringUtils.random(len, 0, 0, true, false, null, rand));
+}
+
+Iterator it = strSet.iterator();
+for(i=0; i

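`initDictionaryWithRand()` builds its name, city, and state lists from a `Random` seeded with 2017 and collects them in a `LinkedHashSet`, which drops duplicates while keeping insertion order, so every run produces the same dictionary. A self-contained sketch of that technique; it swaps Commons Lang's `RandomStringUtils` for a plain `java.util.Random` over ASCII letters, so it will not reproduce the test's actual strings, and `DictionarySketch`/`uniqueWords` are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of seeded, duplicate-free word generation.
class DictionarySketch {
  private static final String LETTERS =
      "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

  static String randomLetters(int len, Random rand) {
    StringBuilder sb = new StringBuilder(len);
    for (int i = 0; i < len; i++) {
      sb.append(LETTERS.charAt(rand.nextInt(LETTERS.length())));
    }
    return sb.toString();
  }

  /**
   * Returns count distinct words. Word length is baseLen + (words so far % lenCycle),
   * e.g. (2, 1) for fixed 2-letter states or (3, 6) for 3..8-letter cities as above.
   * count must not exceed the number of possible words, or the loop never ends.
   */
  static List<String> uniqueWords(int count, long seed, int baseLen, int lenCycle) {
    Random rand = new Random(seed);
    Set<String> words = new LinkedHashSet<>(); // dedupes, keeps first-seen order
    while (words.size() < count) {
      words.add(randomLetters(baseLen + words.size() % lenCycle, rand));
    }
    return new ArrayList<>(words);
  }

  public static void main(String[] args) {
    List<String> states = uniqueWords(50, 2017, 2, 1);
    System.out.println(states.size());                              // 50
    System.out.println(states.equals(uniqueWords(50, 2017, 2, 1))); // true: same seed
  }
}
```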

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647432#comment-16647432
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223659334
 
 

 ##
 File path: 
contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/index/LargeTableGen.java
 ##
 @@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.mapr.drill.maprdb.tests.index;
+
+import static com.mapr.drill.maprdb.tests.MaprDBTestsSuite.INDEX_FLUSH_TIMEOUT;
+
+import java.io.InputStream;
+import java.io.StringBufferInputStream;
+
+import org.apache.hadoop.fs.Path;
+import org.ojai.DocumentStream;
+import org.ojai.json.Json;
+
+import com.mapr.db.Admin;
+import com.mapr.db.Table;
+import com.mapr.db.TableDescriptor;
+import com.mapr.db.impl.MapRDBImpl;
+import com.mapr.db.impl.TableDescriptorImpl;
+import com.mapr.db.tests.utils.DBTests;
+import com.mapr.fs.utils.ssh.TestCluster;
+
+/**
+ * This class is to generate a MapR json table of this schema:
+ * {
+ *   "address" : {
+ *  "city":"wtj",
+ *  "state":"ho"
+ *   }
+ *   "contact" : {
+ *  "email":"vcfahj...@gmail.com",
+ *  "phone":"655583"
+ *   }
+ *   "id" : {
+ *  "ssn":"15461"
+ *   }
+ *   "name" : {
+ *  "fname":"VcFahj",
+ *  "lname":"RfM"
+ *   }
+ * }
+ *
+ */
+public class LargeTableGen extends LargeTableGenBase {
+
+  static final int SPLIT_SIZE = 5000;
+  private Admin admin;
+
+  public LargeTableGen(Admin dbadmin) {
+admin = dbadmin;
+  }
+
+  Table createOrGetTable(String tableName, int recordNum) {
+if (admin.tableExists(tableName)) {
+  return MapRDBImpl.getTable(tableName);
+  //admin.deleteTable(tableName);
+}
+else {
+  TableDescriptor desc = new TableDescriptorImpl(new Path(tableName));
+
+  int splits = (recordNum / SPLIT_SIZE) - (((recordNum % SPLIT_SIZE) > 1)? 0 : 1);
+
+  String[] splitsStr = new String[splits];
+  StringBuilder strBuilder = new StringBuilder("Splits:");
+  for(int i=0; i

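The split count in `createOrGetTable()` appears to follow from the fact that N split points cut a table into N + 1 regions: one region per full `SPLIT_SIZE` block of records, so one split point fewer than the number of blocks, unless a remainder (here, more than 1 record) needs a region of its own. The same expression, lifted into a standalone method and evaluated for a few sample sizes (the class name `SplitCountSketch` is hypothetical):

```java
// Standalone copy of the split-count expression from LargeTableGen.createOrGetTable().
class SplitCountSketch {
  static final int SPLIT_SIZE = 5000;

  static int splits(int recordNum) {
    return (recordNum / SPLIT_SIZE) - (((recordNum % SPLIT_SIZE) > 1) ? 0 : 1);
  }

  public static void main(String[] args) {
    int[] sizes = {0, 4000, 5001, 10000, 12000, 100000};
    for (int n : sizes) {
      // 0 -> -1, 4000 -> 0, 5001 -> 0, 10000 -> 1, 12000 -> 2, 100000 -> 19
      System.out.println(n + " records -> " + splits(n) + " split points");
    }
  }
}
```

A remainder of exactly 1 record is treated like an exact multiple, and a `recordNum` of 0 or 1 yields -1, which would make `new String[splits]` throw `NegativeArraySizeException`.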

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647428#comment-16647428
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223654776
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/RestrictedJsonTableGroupScan.java
 ##
 @@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db.json;
+
+import java.util.List;
+import java.util.NavigableMap;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.physical.base.ScanStats.GroupScanProperty;
+import org.apache.drill.exec.planner.index.MapRDBStatistics;
+import org.apache.drill.exec.planner.cost.PluginCost;
+import org.apache.drill.exec.planner.index.Statistics;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.mapr.db.MapRDBFormatPlugin;
+import org.apache.drill.exec.store.mapr.db.MapRDBSubScan;
+import org.apache.drill.exec.store.mapr.db.MapRDBSubScanSpec;
+import org.apache.drill.exec.store.mapr.db.RestrictedMapRDBSubScan;
+import org.apache.drill.exec.store.mapr.db.RestrictedMapRDBSubScanSpec;
+import org.apache.drill.exec.store.mapr.db.TabletFragmentInfo;
+
+/**
+ * A RestrictedJsonTableGroupScan encapsulates (along with a subscan) the functionality
+ * for doing restricted (i.e. skip) scan rather than sequential scan.  The skipping is based
+ * on a supplied set of row keys (primary keys) from a join operator.
+ */
+@JsonTypeName("restricted-json-scan")
+public class RestrictedJsonTableGroupScan extends JsonTableGroupScan {
+
+  @JsonCreator
+  public RestrictedJsonTableGroupScan(@JsonProperty("userName") String userName,
+                                      @JsonProperty("storage") FileSystemPlugin storagePlugin,
+                                      @JsonProperty("format") MapRDBFormatPlugin formatPlugin,
+                                      @JsonProperty("scanSpec") JsonScanSpec scanSpec, /* scan spec of the original table */
+                                      @JsonProperty("columns") List columns,
+                                      @JsonProperty("")MapRDBStatistics statistics) {
 
 Review comment:
   space


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
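The class Javadoc above contrasts a restricted (skip) scan driven by row keys from a join with an ordinary sequential scan. A minimal sketch of that difference, assuming the table is modeled as an in-memory `NavigableMap` from row key to document (a stand-in, not the MapR-DB API): the sequential scan visits every row, while the restricted scan performs one point lookup per supplied key and skips everything else.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;

/** Toy comparison of sequential and restricted (skip) scans over a key-sorted table. */
final class RestrictedScanSketch {

  /** Sequential scan: visits every row and keeps those whose row key is wanted. */
  static List<String> sequentialScan(NavigableMap<String, String> table, Set<String> wantedKeys) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, String> e : table.entrySet()) {
      if (wantedKeys.contains(e.getKey())) {
        out.add(e.getValue());
      }
    }
    return out;
  }

  /** Restricted scan: one point lookup per supplied row key; all other rows are skipped. */
  static List<String> restrictedScan(NavigableMap<String, String> table, List<String> rowKeys) {
    List<String> out = new ArrayList<>();
    for (String key : rowKeys) {
      String doc = table.get(key);
      if (doc != null) { // keys with no matching row are simply skipped
        out.add(doc);
      }
    }
    return out;
  }
}
```

Both return the same documents for sorted, duplicate-free keys; the restricted scan's work grows with the number of supplied keys rather than with the table size.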



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647430#comment-16647430
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223654821
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/RestrictedJsonTableGroupScan.java
 ##
 @@ -0,0 +1,184 @@
+package org.apache.drill.exec.store.mapr.db.json;
+
+import java.util.List;
+import java.util.NavigableMap;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.physical.base.ScanStats.GroupScanProperty;
+import org.apache.drill.exec.planner.index.MapRDBStatistics;
+import org.apache.drill.exec.planner.cost.PluginCost;
+import org.apache.drill.exec.planner.index.Statistics;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.mapr.db.MapRDBFormatPlugin;
+import org.apache.drill.exec.store.mapr.db.MapRDBSubScan;
+import org.apache.drill.exec.store.mapr.db.MapRDBSubScanSpec;
+import org.apache.drill.exec.store.mapr.db.RestrictedMapRDBSubScan;
+import org.apache.drill.exec.store.mapr.db.RestrictedMapRDBSubScanSpec;
+import org.apache.drill.exec.store.mapr.db.TabletFragmentInfo;
+
+/**
+ * A RestrictedJsonTableGroupScan encapsulates (along with a subscan) the functionality
+ * for doing restricted (i.e. skip) scan rather than sequential scan.  The skipping is based
+ * on a supplied set of row keys (primary keys) from a join operator.
+ */
+@JsonTypeName("restricted-json-scan")
+public class RestrictedJsonTableGroupScan extends JsonTableGroupScan {
+
+  @JsonCreator
+  public RestrictedJsonTableGroupScan(@JsonProperty("userName") String userName,
+                                      @JsonProperty("storage") FileSystemPlugin storagePlugin,
+                                      @JsonProperty("format") MapRDBFormatPlugin formatPlugin,
+                                      @JsonProperty("scanSpec") JsonScanSpec scanSpec, /* scan spec of the original table */
+                                      @JsonProperty("columns") List columns,
+                                      @JsonProperty("")MapRDBStatistics statistics) {
+    super(userName, storagePlugin, formatPlugin, scanSpec, columns, statistics);
+  }
+
+  // TODO:  this method needs to be fully implemented
+  protected RestrictedMapRDBSubScanSpec getSubScanSpec(TabletFragmentInfo tfi) {
+    JsonScanSpec spec = scanSpec;
+    RestrictedMapRDBSubScanSpec subScanSpec =
+        new RestrictedMapRDBSubScanSpec(
+            spec.getTableName(),
+            getRegionsToScan().get(tfi), spec.getSerializedFilter(), getUserName());
+    return subScanSpec;
+  }
+
+  protected NavigableMap getRegionsToScan() {
+    return getRegionsToScan(formatPlugin.getRestrictedScanRangeSizeMB());
+  }
+
+  @Override
+  public MapRDBSubScan getSpecificScan(int minorFragmentId) {
+    assert minorFragmentId < endpointFragmentMapping.size() : String.format(
+        "Mappings length [%d] should be greater than minor fragment id [%d] but it isn't.", endpointFragmentMapping.size(),
+        minorFragmentId);
+    RestrictedMapRDBSubScan subscan =
+        new RestrictedMapRDBSubScan(getUserName(), formatPlugin,
+            getEndPointFragmentMapping(minorFragmentId), columns, maxRecordsToRead, TABLE_JSON);
+
+    return subscan;
+  }
+
+  private List getEndPointFragmentMapping(int minorFragmentId) {
+    List restrictedSubScanSpecList = Lists.newArrayList();
+    List subScanSpecList = 

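Both `getSpecificScan` implementations in this change guard the fragment lookup with a Java `assert`, which only fires when the JVM runs with assertions enabled (`-ea`). A minimal sketch of the same bounds check written as an explicit precondition, assuming a plain list-of-lists in place of `endpointFragmentMapping` (the helper name `specsFor` is hypothetical):

```java
import java.util.List;

/** Hypothetical helper: look up the sub-scan specs assigned to one minor fragment. */
final class FragmentMappingSketch {

  /** Returns the specs for minorFragmentId, failing fast even when assertions are disabled. */
  static <T> List<T> specsFor(List<List<T>> endpointFragmentMapping, int minorFragmentId) {
    if (minorFragmentId < 0 || minorFragmentId >= endpointFragmentMapping.size()) {
      throw new IllegalArgumentException(String.format(
          "Mappings length [%d] should be greater than minor fragment id [%d] but it isn't.",
          endpointFragmentMapping.size(), minorFragmentId));
    }
    return endpointFragmentMapping.get(minorFragmentId);
  }
}
```

Whether an explicit check is preferable to `assert` here is a judgment call; the assertion costs nothing in production, while the explicit check turns a later `IndexOutOfBoundsException` or `null` into a clear message.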
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647421#comment-16647421
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223651337
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java
 ##
 @@ -179,16 +295,126 @@ public MapRDBSubScan getSpecificScan(int minorFragmentId) {
     assert minorFragmentId < endpointFragmentMapping.size() : String.format(
         "Mappings length [%d] should be greater than minor fragment id [%d] but it isn't.", endpointFragmentMapping.size(),
         minorFragmentId);
-    return new MapRDBSubScan(getUserName(), formatPlugin, endpointFragmentMapping.get(minorFragmentId), columns, TABLE_JSON);
+    return new MapRDBSubScan(getUserName(), formatPlugin, endpointFragmentMapping.get(minorFragmentId), columns, maxRecordsToRead, TABLE_JSON);
   }
 
   @Override
   public ScanStats getScanStats() {
-    //TODO: look at stats for this.
-    long rowCount = (long) ((scanSpec.getSerializedFilter() != null ? .5 : 1) * totalRowCount);
-    int avgColumnSize = 10;
-    int numColumns = (columns == null || columns.isEmpty()) ? 100 : columns.size();
-    return new ScanStats(GroupScanProperty.NO_EXACT_ROW_COUNT, rowCount, 1, avgColumnSize * numColumns * rowCount);
+    if (isIndexScan()) {
+      return indexScanStats();
+    }
+    return fullTableScanStats();
+  }
+
+  private ScanStats fullTableScanStats() {
+    PluginCost pluginCostModel = formatPlugin.getPluginCostModel();
+    final int avgColumnSize = pluginCostModel.getAverageColumnSize(this);
+    final int numColumns = (columns == null || columns.isEmpty()) ? STAR_COLS : columns.size();
+    // index will be NULL for FTS
+    double rowCount = stats.getRowCount(scanSpec.getCondition(), null);
+    // rowcount based on _id predicate. If NO _id predicate present in condition, then the
+    // rowcount should be same as totalRowCount. Equality b/w the two rowcounts should not be
+    // construed as NO _id predicate since stats are approximate.
+    double leadingRowCount = stats.getLeadingRowCount(scanSpec.getCondition(), null);
+    double avgRowSize = stats.getAvgRowSize(null, true);
+    double totalRowCount = stats.getRowCount(null, null);
+    logger.debug("GroupScan {} with stats {}: rowCount={}, condition={}, totalRowCount={}, fullTableRowCount={}",
+        System.identityHashCode(this), System.identityHashCode(stats), rowCount,
+        scanSpec.getCondition()==null?"null":scanSpec.getCondition(),
 
 Review comment:
   formatting
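The heuristic removed in this hunk can be lifted into a small standalone helper, which makes it easy to see what the statistics-based `fullTableScanStats()` replaces: a fixed 50% selectivity for any filter, and a disk cost of `avgColumnSize * numColumns * rowCount`. This is a sketch; `defaultColumns` stands in for the hard-coded 100 in the old code (the value of the new `STAR_COLS` constant is not shown in the hunk), and a star query is modeled as `projectedColumns == 0`.

```java
/** Standalone copy of the row-count and disk-cost heuristic removed from getScanStats(). */
final class ScanCostSketch {

  /** Old rule: any pushed-down filter is assumed to keep half of the rows. */
  static long heuristicRowCount(boolean hasFilter, long totalRowCount) {
    return (long) ((hasFilter ? 0.5 : 1) * totalRowCount);
  }

  /**
   * Old disk-cost formula: avgColumnSize * numColumns * rowCount, where a star query
   * (no explicit projection, modeled as projectedColumns == 0) falls back to defaultColumns.
   */
  static double diskCost(double rowCount, int avgColumnSize, int projectedColumns, int defaultColumns) {
    int numColumns = projectedColumns > 0 ? projectedColumns : defaultColumns;
    return (double) avgColumnSize * numColumns * rowCount;
  }
}
```

For a 1,000-row table with a filter, the old code always estimated 500 rows, no matter how selective the filter really was; that blind spot is what the per-condition `stats.getRowCount(...)` call addresses.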


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647423#comment-16647423
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223651509
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java
 ##
 @@ -214,11 +445,323 @@ public boolean canPushdownProjects(List columns) {
 
   @Override
   public String toString() {
-    return "JsonTableGroupScan [ScanSpec=" + scanSpec + ", columns=" + columns + "]";
+    return "JsonTableGroupScan [ScanSpec=" + scanSpec + ", columns=" + columns
+        + (maxRecordsToRead>0? ", limit=" + maxRecordsToRead : "")
 
 Review comment:
   formatting
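For reference, the flagged conditional reads more easily with spaces around `>` and `?`. A minimal sketch of the same `toString` logic, assuming plain `String` stand-ins for `scanSpec` and `columns`, and assuming the expression closes with `"]"` as the old version did (the rest of the new line is cut off in the hunk):

```java
/** Sketch of the toString() pattern above, with the optional limit suffix. */
final class ToStringSketch {

  static String describe(String scanSpec, String columns, int maxRecordsToRead) {
    return "JsonTableGroupScan [ScanSpec=" + scanSpec + ", columns=" + columns
        + (maxRecordsToRead > 0 ? ", limit=" + maxRecordsToRead : "") // omitted when no limit is set
        + "]";
  }
}
```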


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647429#comment-16647429
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223654400
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/OjaiFunctionsProcessor.java
 ##
 @@ -0,0 +1,214 @@
+package org.apache.drill.exec.store.mapr.db.json;
+
+import org.apache.commons.codec.binary.Base64;
+
+import org.apache.drill.common.expression.FunctionCall;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions.IntExpression;
+import org.apache.drill.common.expression.ValueExpressions.LongExpression;
+import org.apache.drill.common.expression.ValueExpressions.QuotedString;
+import org.apache.drill.common.expression.visitors.AbstractExprVisitor;
+
+import org.ojai.Value;
+import org.ojai.store.QueryCondition;
+
+import com.google.common.collect.ImmutableMap;
+import com.mapr.db.impl.ConditionImpl;
+import com.mapr.db.impl.MapRDBImpl;
+
+import java.nio.ByteBuffer;
+
+class OjaiFunctionsProcessor extends AbstractExprVisitor {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(OjaiFunctionsProcessor.class);
+  private QueryCondition queryCond;
+
+  private OjaiFunctionsProcessor() {
+  }
+
+  private static String getStackTrace() {
+    final Throwable throwable = new Throwable();
+    final StackTraceElement[] ste = throwable.getStackTrace();
+    final StringBuilder sb = new StringBuilder();
+    for(int i = 1; i < ste.length; ++i) {
 
 Review comment:
   space
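The hunk stops at the loop header, so the rest of `getStackTrace()` is not shown. A minimal sketch of how such a helper is commonly completed, assuming it should return one frame per line and skip frame 0 (the helper itself, since `new Throwable()` captures the stack at the point it is constructed); the class name is hypothetical:

```java
/** Hypothetical completion of a getStackTrace()-style debugging helper. */
final class StackTraceSketch {

  /** Returns the caller's stack as text, one frame per line, omitting this helper's own frame. */
  static String getStackTrace() {
    final StackTraceElement[] ste = new Throwable().getStackTrace();
    final StringBuilder sb = new StringBuilder();
    for (int i = 1; i < ste.length; ++i) { // i = 1 skips getStackTrace() itself
      sb.append(ste[i]).append('\n');
    }
    return sb.toString();
  }
}
```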


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647424#comment-16647424
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223652043
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java
 ##
 @@ -98,91 +114,181 @@
   private final boolean disableCountOptimization;
   private final boolean nonExistentColumnsProjection;
 
-  public MaprDBJsonRecordReader(MapRDBSubScanSpec subScanSpec,
-      MapRDBFormatPluginConfig formatPluginConfig,
-      List projectedColumns, FragmentContext context) {
+  protected final MapRDBSubScanSpec subScanSpec;
+  protected final MapRDBFormatPlugin formatPlugin;
+
+  protected OjaiValueWriter valueWriter;
+  protected DocumentReaderVectorWriter documentWriter;
+  protected int maxRecordsToRead = -1;
+
+  public MaprDBJsonRecordReader(MapRDBSubScanSpec subScanSpec, MapRDBFormatPlugin formatPlugin,
+                                List projectedColumns, FragmentContext context, int maxRecords) {
+    this(subScanSpec, formatPlugin, projectedColumns, context);
+    this.maxRecordsToRead = maxRecords;
+  }
+
+  protected MaprDBJsonRecordReader(MapRDBSubScanSpec subScanSpec, MapRDBFormatPlugin formatPlugin,
+                                   List projectedColumns, FragmentContext context) {
     buffer = context.getManagedBuffer();
-    projectedFields = null;
-    tableName = Preconditions.checkNotNull(subScanSpec, "MapRDB reader needs a sub-scan spec").getTableName();
-    documentReaderIterators = null;
-    includeId = false;
-    idOnly= false;
+    final Path tablePath = new Path(Preconditions.checkNotNull(subScanSpec,
+      "MapRDB reader needs a sub-scan spec").getTableName());
+    this.subScanSpec = subScanSpec;
+    this.formatPlugin = formatPlugin;
+    final IndexDesc indexDesc = subScanSpec.getIndexDesc();
     byte[] serializedFilter = subScanSpec.getSerializedFilter();
     condition = null;
 
     if (serializedFilter != null) {
       condition = com.mapr.db.impl.ConditionImpl.parseFrom(ByteBufs.wrap(serializedFilter));
     }
 
-    disableCountOptimization = formatPluginConfig.disableCountOptimization();
+    disableCountOptimization = formatPlugin.getConfig().disableCountOptimization();
+    // Below call will set the scannedFields and includeId correctly
     setColumns(projectedColumns);
-    unionEnabled = context.getOptions().getBoolean(ExecConstants.ENABLE_UNION_TYPE_KEY);
-    readNumbersAsDouble = formatPluginConfig.isReadAllNumbersAsDouble();
-    allTextMode = formatPluginConfig.isAllTextMode();
-    ignoreSchemaChange = formatPluginConfig.isIgnoreSchemaChange();
-    disablePushdown = !formatPluginConfig.isEnablePushdown();
-    nonExistentColumnsProjection = formatPluginConfig.isNonExistentFieldSupport();
+    unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    readNumbersAsDouble = formatPlugin.getConfig().isReadAllNumbersAsDouble();
+    allTextMode = formatPlugin.getConfig().isAllTextMode();
+    ignoreSchemaChange = formatPlugin.getConfig().isIgnoreSchemaChange();
+    disablePushdown = !formatPlugin.getConfig().isEnablePushdown();
+    nonExistentColumnsProjection = formatPlugin.getConfig().isNonExistentFieldSupport();
+
+    // Do not use a cached table handle, for two reasons:
+    // 1. A cached table handle's default timeout is 60 min, after which the handle becomes stale.
+    //    Since execution can run for longer than 60 min, we want to get a new table handle and use it
+    //    instead of the one from the cache.
+    // 2. Since we are setting some table options, we do not want to use shared handles.
+    //
+    // Call it here instead of in setup to make sure it is called under the correct UGI block when
+    // impersonation is enabled and the table is used both with and without views.
+    table = (indexDesc == null ? MapRDBImpl.getTable(tablePath) : MapRDBImpl.getIndexTable(indexDesc));
+
+    if (condition != null) {
+      logger.debug("Created record reader with query condition {}", condition.toString());
+    } else {
+      logger.debug("Created record reader with query condition NULL");
+    }
   }
 
   @Override
   protected Collection transformColumns(Collection columns) {
     Set transformed = Sets.newLinkedHashSet();
+    Set encodedSchemaPathSet = Sets.newLinkedHashSet();
+
     if (disablePushdown) {
       transformed.add(SchemaPath.STAR_COLUMN);
       includeId = true;
-      return transformed;
-    }
+    } else {
+      if (isStarQuery()) {
+        transformed.add(SchemaPath.STAR_COLUMN);
+        includeId = true;
+        if (isSkipQuery() && !disableCountOptimization) {
+          // `SELECT 

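The reader now carries a `maxRecordsToRead` field that defaults to `-1`, and `toString()` in the group scan only reports a limit when the value is positive. A minimal sketch of honoring such a limit while filling one batch, assuming non-positive means "no limit" as those two snippets suggest; `readBatch` is a hypothetical helper, and a real reader would also have to carry the count across batches:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of a per-batch record limit; not the actual MaprDBJsonRecordReader logic. */
final class RecordLimitSketch {

  /** Reads items until the source is exhausted or maxRecordsToRead items (if positive) are read. */
  static <T> List<T> readBatch(Iterator<T> docs, int maxRecordsToRead) {
    List<T> batch = new ArrayList<>();
    while (docs.hasNext() && (maxRecordsToRead <= 0 || batch.size() < maxRecordsToRead)) {
      batch.add(docs.next());
    }
    return batch;
  }
}
```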
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647426#comment-16647426
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223651862
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableRangePartitionFunction.java
 ##
 @@ -0,0 +1,237 @@
+package org.apache.drill.exec.store.mapr.db.json;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.exec.planner.physical.AbstractRangePartitionFunction;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.store.mapr.db.MapRDBFormatPlugin;
+import org.apache.drill.exec.vector.ValueVector;
+import org.ojai.store.QueryCondition;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import com.mapr.db.Table;
+import com.mapr.db.impl.ConditionImpl;
+import com.mapr.db.impl.IdCodec;
+import com.mapr.db.impl.ConditionNode.RowkeyRange;
+import com.mapr.db.scan.ScanRange;
+import com.mapr.fs.jni.MapRConstants;
+import com.mapr.org.apache.hadoop.hbase.util.Bytes;
+
+@JsonTypeName("jsontable-range-partition-function")
+public class JsonTableRangePartitionFunction extends AbstractRangePartitionFunction {
+
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(JsonTableRangePartitionFunction.class);
+
+  @JsonProperty("refList")
+  protected List refList;
+
+  @JsonProperty("tableName")
+  protected String tableName;
+
+  @JsonIgnore
+  protected String userName;
+
+  @JsonIgnore
+  protected ValueVector partitionKeyVector = null;
+
+  // List of start keys of the scan ranges for the table.
+  @JsonProperty
+  protected List startKeys = null;
+
+  // List of stop keys of the scan ranges for the table.
+  @JsonProperty
+  protected List stopKeys = null;
+
+  @JsonCreator
+  public JsonTableRangePartitionFunction(
+      @JsonProperty("refList") List refList,
+      @JsonProperty("tableName") String tableName,
+      @JsonProperty("startKeys") List startKeys,
+      @JsonProperty("stopKeys") List stopKeys) {
+    this.refList = refList;
+    this.tableName = tableName;
+    this.startKeys = startKeys;
+    this.stopKeys = stopKeys;
+  }
+
+  public JsonTableRangePartitionFunction(List refList,
+      String tableName, String userName, MapRDBFormatPlugin formatPlugin) {
+    this.refList = refList;
+    this.tableName = tableName;
+    this.userName = userName;
+    initialize(formatPlugin);
+  }
+
+  @JsonProperty("refList")
+  @Override
+  public List getPartitionRefList() {
+    return refList;
+  }
+
+  @Override
+  public void setup(List> partitionKeys) {
+    if (partitionKeys.size() != 1) {
+      throw new UnsupportedOperationException(
+          "Range partitioning function supports exactly one partition column; encountered " + partitionKeys.size());
+    }
+
+    VectorWrapper v = partitionKeys.get(0);
+
+    partitionKeyVector = v.getValueVector();
+
+    Preconditions.checkArgument(partitionKeyVector != null, "Found null partitionKeyVector.");
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    if (this == obj) {
+      return true;
+    }
+    if (obj instanceof JsonTableRangePartitionFunction) {
+      JsonTableRangePartitionFunction rpf = (JsonTableRangePartitionFunction) obj;
+      List thisPartRefList = this.getPartitionRefList();
+      List otherPartRefList = rpf.getPartitionRefList();
+      if (thisPartRefList.size() != otherPartRefList.size()) {
+        return false;
+      }
+      for (int refIdx=0; refIdx= 0 ||
 
 Review comment:
   formatting



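`JsonTableRangePartitionFunction` keeps the start and stop keys of the table's scan ranges, which suggests routing each row key to the range that contains it. A minimal sketch of such a lookup, assuming sorted, non-overlapping `[start, stop)` ranges, unsigned lexicographic byte order for row keys, and an empty stop key meaning "unbounded" (all three are assumptions, not taken from the patch):

```java
import java.util.List;

/** Hypothetical range lookup over sorted scan-range boundaries. */
final class RangePartitionSketch {

  /** Lexicographic comparison of byte arrays, treating bytes as unsigned. */
  static int compareKeys(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  /** Returns the index of the range whose [start, stop) contains rowKey, or -1 if none does. */
  static int partitionFor(List<byte[]> startKeys, List<byte[]> stopKeys, byte[] rowKey) {
    int lo = 0, hi = startKeys.size() - 1, candidate = -1;
    while (lo <= hi) { // binary search for the last range whose start key is <= rowKey
      int mid = (lo + hi) >>> 1;
      if (compareKeys(startKeys.get(mid), rowKey) <= 0) {
        candidate = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    if (candidate < 0) {
      return -1;
    }
    byte[] stop = stopKeys.get(candidate);
    return (stop.length == 0 || compareKeys(rowKey, stop) < 0) ? candidate : -1;
  }
}
```

Binary search keeps the per-row cost at O(log n) in the number of ranges, which matters because the partitioner runs once per row.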
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647427#comment-16647427
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223651821
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableRangePartitionFunction.java
 ##
 @@ -0,0 +1,237 @@
+package org.apache.drill.exec.store.mapr.db.json;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.exec.planner.physical.AbstractRangePartitionFunction;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.store.mapr.db.MapRDBFormatPlugin;
+import org.apache.drill.exec.vector.ValueVector;
+import org.ojai.store.QueryCondition;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import com.mapr.db.Table;
+import com.mapr.db.impl.ConditionImpl;
+import com.mapr.db.impl.IdCodec;
+import com.mapr.db.impl.ConditionNode.RowkeyRange;
+import com.mapr.db.scan.ScanRange;
+import com.mapr.fs.jni.MapRConstants;
+import com.mapr.org.apache.hadoop.hbase.util.Bytes;
+
+@JsonTypeName("jsontable-range-partition-function")
+public class JsonTableRangePartitionFunction extends AbstractRangePartitionFunction {
+
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(JsonTableRangePartitionFunction.class);
+
+  @JsonProperty("refList")
+  protected List refList;
+
+  @JsonProperty("tableName")
+  protected String tableName;
+
+  @JsonIgnore
+  protected String userName;
+
+  @JsonIgnore
+  protected ValueVector partitionKeyVector = null;
+
+  // List of start keys of the scan ranges for the table.
+  @JsonProperty
+  protected List startKeys = null;
+
+  // List of stop keys of the scan ranges for the table.
+  @JsonProperty
+  protected List stopKeys = null;
+
+  @JsonCreator
+  public JsonTableRangePartitionFunction(
+  @JsonProperty("refList") List refList,
+  @JsonProperty("tableName") String tableName,
+  @JsonProperty("startKeys") List startKeys,
+  @JsonProperty("stopKeys") List stopKeys) {
+this.refList = refList;
+this.tableName = tableName;
+this.startKeys = startKeys;
+this.stopKeys = stopKeys;
+  }
+
+  public JsonTableRangePartitionFunction(List refList,
+  String tableName, String userName, MapRDBFormatPlugin formatPlugin) {
+this.refList = refList;
+this.tableName = tableName;
+this.userName = userName;
+initialize(formatPlugin);
+  }
+
+  @JsonProperty("refList")
+  @Override
+  public List getPartitionRefList() {
+return refList;
+  }
+
+  @Override
+  public void setup(List> partitionKeys) {
+if (partitionKeys.size() != 1) {
+  throw new UnsupportedOperationException(
+  "Range partitioning function supports exactly one partition column; encountered " + partitionKeys.size());
+}
+
+VectorWrapper v = partitionKeys.get(0);
+
+partitionKeyVector = v.getValueVector();
+
+Preconditions.checkArgument(partitionKeyVector != null, "Found null partitionKeyVector.");
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+if (this == obj) {
+  return true;
+}
+if (obj instanceof JsonTableRangePartitionFunction) {
+  JsonTableRangePartitionFunction rpf = (JsonTableRangePartitionFunction) obj;
+  List thisPartRefList = this.getPartitionRefList();
+  List otherPartRefList = rpf.getPartitionRefList();
+  if (thisPartRefList.size() != otherPartRefList.size()) {
+return false;
+  }
+  for (int refIdx=0; refIdx
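The range partition function above carries the start and stop keys of the table's scan ranges. As a rough illustration of how such keys can route a row key to a partition, the hypothetical class below binary-searches a sorted list of start keys. This is not Drill's or MapR-DB's implementation: the class and method names, the unsigned byte ordering, the assumption that ranges are contiguous (so stop keys are not needed), and the convention that an empty start key means "unbounded" are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assign a row key to the scan range that contains it.
// Assumes contiguous ranges sorted by start key; an empty start key sorts before every key.
final class RangePartitionSketch {
  private final List<byte[]> startKeys;

  RangePartitionSketch(List<byte[]> startKeys) {
    this.startKeys = new ArrayList<>(startKeys);
  }

  /** Lexicographic comparison treating each byte as unsigned (0..255). */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  /** Returns the index of the last range whose start key is <= rowKey. */
  int getPartitionId(byte[] rowKey) {
    int lo = 0;
    int hi = startKeys.size() - 1;
    int result = 0;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (compareUnsigned(startKeys.get(mid), rowKey) <= 0) {
        result = mid;   // candidate range; look for a later one
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return result;
  }
}
```

Binary search keeps routing at O(log n) per row, which matters because the function is evaluated for every incoming record.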

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647419#comment-16647419
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r217475464
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBIndexDescriptor.java
 ##
 @@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+
+import java.util.Collection;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptPlanner;
+import org.apache.calcite.rel.RelFieldCollation.NullDirection;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.expr.CloneVisitor;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.cost.DrillCostBase;
+import org.apache.drill.exec.planner.cost.DrillCostBase.DrillCostFactory;
+import org.apache.drill.exec.planner.cost.PluginCost;
+import org.apache.drill.exec.planner.index.IndexProperties;
+import org.apache.drill.exec.store.mapr.PluginConstants;
+import org.apache.drill.exec.util.EncodedSchemaPathSet;
+import org.apache.drill.common.expression.LogicalExpression;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableSet;
+
+public class MapRDBIndexDescriptor extends DrillIndexDescriptor {
+
+  protected final Object desc;
+  protected final Set allFields;
+  protected final Set indexedFields;
+  protected MapRDBFunctionalIndexInfo functionalInfo;
+  protected PluginCost pluginCost;
+
+  public MapRDBIndexDescriptor(List indexCols,
+   CollationContext indexCollationContext,
+   List nonIndexCols,
+   List rowKeyColumns,
+   String indexName,
+   String tableName,
+   IndexType type,
+   Object desc,
+   DbGroupScan scan,
+   NullDirection nullsDirection) {
+super(indexCols, indexCollationContext, nonIndexCols, rowKeyColumns, indexName, tableName, type, nullsDirection);
+this.desc = desc;
+this.indexedFields = ImmutableSet.copyOf(indexColumns);
+this.allFields = new ImmutableSet.Builder()
+.add(PluginConstants.DOCUMENT_SCHEMA_PATH)
+.addAll(indexColumns)
+.addAll(nonIndexColumns)
+.build();
+this.pluginCost = scan.getPluginCostModel();
+  }
+
+  public Object getOriginalDesc(){
+return desc;
+  }
+
+  @Override
+  public boolean isCoveringIndex(List expressions) {
+List decodedCols = new DecodePathinExpr().parseExpressions(expressions);
+return columnsInIndexFields(decodedCols, allFields);
+  }
+
+  @Override
+  public boolean allColumnsIndexed(Collection expressions) {
+List decodedCols = new DecodePathinExpr().parseExpressions(expressions);
+return columnsInIndexFields(decodedCols, indexedFields);
+  }
+
+  @Override
+  public boolean someColumnsIndexed(Collection columns) {
+return columnsIndexed(columns, false);
+  }
+
+  private boolean columnsIndexed(Collection expressions, boolean allColsIndexed) {
+List decodedCols = new DecodePathinExpr().parseExpressions(expressions);
+if (allColsIndexed) {
+  return columnsInIndexFields(decodedCols, indexedFields);
+} else {
+  return someColumnsInIndexFields(decodedCols, indexedFields);
+}
+  }
+
+  public FunctionalIndexInfo getFunctionalInfo() {
+if (this.functionalInfo == null) {
+  this.functionalInfo = new MapRDBFunctionalIndexInfo(this);
+}
+return this.functionalInfo;
+  }
+
+  /**
+   * Search through a LogicalExpression, finding all 
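`isCoveringIndex`, `allColumnsIndexed` and `someColumnsIndexed` above reduce to set-containment checks over the decoded field paths. The sketch below is a simplified, hypothetical version that uses plain `String` paths instead of `SchemaPath`/`LogicalExpression`, and it omits the document-level path that `MapRDBIndexDescriptor` also adds to `allFields`; the class name is an assumption.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of covering/indexed checks using String field paths.
final class IndexCoverageSketch {
  private final Set<String> indexedFields; // index key columns only
  private final Set<String> allFields;     // index key columns plus included (non-key) columns

  IndexCoverageSketch(Collection<String> indexCols, Collection<String> nonIndexCols) {
    this.indexedFields = new LinkedHashSet<>(indexCols);
    this.allFields = new LinkedHashSet<>(indexCols);
    this.allFields.addAll(nonIndexCols);
  }

  /** Covering: every referenced field can be served by the index without reading the table. */
  boolean isCoveringIndex(Collection<String> referencedFields) {
    return allFields.containsAll(referencedFields);
  }

  /** True if every given field is an index key column. */
  boolean allColumnsIndexed(Collection<String> fields) {
    return indexedFields.containsAll(fields);
  }

  /** True if at least one given field is an index key column. */
  boolean someColumnsIndexed(Collection<String> fields) {
    for (String f : fields) {
      if (indexedFields.contains(f)) {
        return true;
      }
    }
    return false;
  }
}
```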

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647420#comment-16647420
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r217477453
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBSubScanSpec.java
 ##
 @@ -19,32 +19,39 @@
 
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
+import com.mapr.db.index.IndexDesc;
 import com.mapr.fs.jni.MapRConstants;
 import com.mapr.org.apache.hadoop.hbase.util.Bytes;
 
-public class MapRDBSubScanSpec {
+public class MapRDBSubScanSpec implements Comparable{
 
 Review comment:
   space


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add capability to do index based planning and execution
> ---
>
> Key: DRILL-6381
> URL: https://issues.apache.org/jira/browse/DRILL-6381
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning  
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, Drill planner should be enhanced to provide an
> abstraction layer which expresses the index metadata and statistics.  Further,
> a cost-based index selection is needed to decide which index(es) are
> suitable.
> On the execution side, appropriate operator enhancements would be needed to
> handle different categories of indexes such as covering, non-covering
> indexes, taking into consideration that the index data may not be co-located
> with the primary table, i.e. a global index.
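The description calls for a cost-based choice among covering and non-covering indexes. The toy model below, with made-up cost constants and hypothetical names (not Drill's cost model), illustrates why a non-covering index only pays off when it is highly selective: every matching row costs an extra lookup into the primary table.

```java
import java.util.List;

// Hypothetical toy cost model; the constants are illustrative, not Drill's.
final class IndexSelectionSketch {
  static final class Candidate {
    final String name;
    final double selectivity; // fraction of table rows that match the index condition
    final boolean covering;   // true if the index holds every column the query needs

    Candidate(String name, double selectivity, boolean covering) {
      this.name = name;
      this.selectivity = selectivity;
      this.covering = covering;
    }
  }

  static final double SCAN_ROW_COST = 1.0;   // reading one row sequentially
  static final double LOOKUP_ROW_COST = 4.0; // one random lookup into the primary table

  static double cost(Candidate c, long tableRows) {
    double matchedRows = c.selectivity * tableRows;
    double indexScan = matchedRows * SCAN_ROW_COST;
    // A non-covering index must fetch the remaining columns from the primary table.
    return c.covering ? indexScan : indexScan + matchedRows * LOOKUP_ROW_COST;
  }

  /** Returns the cheapest index name, or "fullTableScan" if no index beats scanning the table. */
  static String choose(List<Candidate> candidates, long tableRows) {
    String best = "fullTableScan";
    double bestCost = tableRows * SCAN_ROW_COST;
    for (Candidate c : candidates) {
      double candidateCost = cost(c, tableRows);
      if (candidateCost < bestCost) {
        best = c.name;
        bestCost = candidateCost;
      }
    }
    return best;
  }
}
```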



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647418#comment-16647418
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r217475314
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,163 @@
+package org.apache.drill.exec.planner.index;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Map;
+import java.util.Set;
+
+public class MapRDBFunctionalIndexInfo implements FunctionalIndexInfo {
+
+  final private IndexDescriptor indexDesc;
+
+  private boolean hasFunctionalField = false;
+
+  //when we scan schemaPath in groupscan's columns, we check if this column(schemaPath) should be rewritten to '$N',
+  //When there are two or more functions on the same column in index, CAST(a.b as INT), CAST(a.b as VARCHAR),
+  // then we should map SchemaPath a.b to a set of SchemaPath, e.g. $1, $2
+  private Map> columnToConvert;
+
+  // map of functional index expression to destination SchemaPath e.g. $N
+  private Map exprToConvert;
+
+  //map of SchemaPath involved in a functional field
+  private Map> pathsInExpr;
+
+  private Set newPathsForIndexedFunction;
+
+  private Set allPathsInFunction;
+
+  public MapRDBFunctionalIndexInfo(IndexDescriptor indexDesc) {
+this.indexDesc = indexDesc;
+columnToConvert = Maps.newHashMap();
+exprToConvert = Maps.newHashMap();
+pathsInExpr = Maps.newHashMap();
+//keep the order of new paths, it may be related to the naming policy
+newPathsForIndexedFunction = Sets.newLinkedHashSet();
+allPathsInFunction = Sets.newHashSet();
+init();
+  }
+
+  private void init() {
+int count = 0;
+for(LogicalExpression indexedExpr : indexDesc.getIndexColumns()) {
+  if( !(indexedExpr instanceof SchemaPath) ) {
+hasFunctionalField = true;
+SchemaPath functionalFieldPath = SchemaPath.getSimplePath("$"+count);
+newPathsForIndexedFunction.add(functionalFieldPath);
+
+//now we handle only cast expression
+if(indexedExpr instanceof CastExpression) {
+  //We handle only CAST directly on SchemaPath for now.
+  SchemaPath pathBeingCasted = (SchemaPath)((CastExpression) indexedExpr).getInput();
+  addTargetPathForOriginalPath(pathBeingCasted, functionalFieldPath);
+  addPathInExpr(indexedExpr, pathBeingCasted);
+  exprToConvert.put(indexedExpr, functionalFieldPath);
+  allPathsInFunction.add(pathBeingCasted);
+}
+
+count++;
+  }
+}
+  }
+
+  private void addPathInExpr(LogicalExpression expr, SchemaPath path) {
+if (!pathsInExpr.containsKey(expr)) {
+  Set newSet = Sets.newHashSet();
+  newSet.add(path);
+  pathsInExpr.put(expr, newSet);
+}
+else {
+  pathsInExpr.get(expr).add(path);
+}
+  }
+
+  private void addTargetPathForOriginalPath(SchemaPath origPath, SchemaPath newPath) {
+if (!columnToConvert.containsKey(origPath)) {
+  Set newSet = Sets.newHashSet();
+  newSet.add(newPath);
+  columnToConvert.put(origPath, newSet);
+}
+else {
+  columnToConvert.get(origPath).add(newPath);
+}
+  }
+
+
+  public boolean hasFunctional() {
+return hasFunctionalField;
+  }
+
+  public IndexDescriptor getIndexDesc() {
+return indexDesc;
+  }
+
+  /**
+   * getNewPath: for an original path, return the renamed '$N' path. Note that there could be multiple renamed paths
+   * if multiple functional indexes refer to the original path.
+   * @param path
+   * @return
+   */
+  public SchemaPath getNewPath(SchemaPath path) {
+if(columnToConvert.containsKey(path)) {
 
 Review 
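`MapRDBFunctionalIndexInfo.init()` renames each functional (currently CAST-only) index column to a synthetic `$N` path and records which original field it came from; as in `init()`, the counter advances only for functional columns. A hypothetical, simplified model of that bookkeeping, using strings in place of Drill's `SchemaPath` and `CastExpression` classes, might look like this (all names are assumptions):

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of '$N' renaming for CAST-based functional index columns.
final class FunctionalIndexSketch {
  /** An index column: a plain field path, or CAST(path AS castType) when castType is non-null. */
  static final class IndexColumn {
    final String path;
    final String castType;

    IndexColumn(String path, String castType) {
      this.path = path;
      this.castType = castType;
    }

    String exprText() {
      return castType == null ? path : "CAST(" + path + " AS " + castType + ")";
    }
  }

  // Original field path -> every '$N' path derived from it (e.g. a.b -> {$0, $1}).
  final Map<String, Set<String>> columnToConvert = new LinkedHashMap<>();
  // Functional expression text -> its '$N' path.
  final Map<String, String> exprToConvert = new LinkedHashMap<>();

  FunctionalIndexSketch(List<IndexColumn> indexColumns) {
    int count = 0; // advances only for functional columns
    for (IndexColumn col : indexColumns) {
      if (col.castType == null) {
        continue; // plain field path: nothing to rename
      }
      String newPath = "$" + count;
      columnToConvert.computeIfAbsent(col.path, k -> new LinkedHashSet<>()).add(newPath);
      exprToConvert.put(col.exprText(), newPath);
      count++;
    }
  }

  boolean hasFunctional() {
    return !exprToConvert.isEmpty();
  }
}
```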

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647417#comment-16647417
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r217475224
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,163 @@
+  private void init() {
+int count = 0;
+for(LogicalExpression indexedExpr : indexDesc.getIndexColumns()) {
+  if( !(indexedExpr instanceof SchemaPath) ) {
+hasFunctionalField = true;
+SchemaPath functionalFieldPath = SchemaPath.getSimplePath("$"+count);
+newPathsForIndexedFunction.add(functionalFieldPath);
+
+//now we handle only cast expression
 
 Review comment:
   Please add white spaces in the comments



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647416#comment-16647416
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r217475035
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,163 @@
+  private void init() {
+int count = 0;
+for(LogicalExpression indexedExpr : indexDesc.getIndexColumns()) {
+  if( !(indexedExpr instanceof SchemaPath) ) {
 
 Review comment:
   remove spaces please




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647205#comment-16647205
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

Ben-Zvi commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224639518
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/generators/IndexIntersectPlanGenerator.java
 ##
 @@ -0,0 +1,350 @@
+package org.apache.drill.exec.planner.index.generators;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+
+import org.apache.calcite.plan.RelOptUtil;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.InvalidRelException;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.util.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.JoinControl;
+import org.apache.drill.exec.planner.index.IndexLogicalPlanCallContext;
+import org.apache.drill.exec.planner.index.IndexDescriptor;
+import org.apache.drill.exec.planner.index.FunctionalIndexInfo;
+import org.apache.drill.exec.planner.index.FunctionalIndexHelper;
+import org.apache.drill.exec.planner.index.IndexPlanUtils;
+import org.apache.drill.exec.planner.index.IndexConditionInfo;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.DrillDistributionTraitDef;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait.DistributionType;
+import org.apache.drill.exec.planner.physical.FilterPrel;
+import org.apache.drill.exec.planner.physical.HashJoinPrel;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.Prule;
+import org.apache.drill.exec.planner.physical.RowKeyJoinPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * IndexIntersectPlanGenerator generates an index plan against multiple index tables;
+ * the input indexes are assumed to be already ranked by selectivity (low to high).
+ */
+public class IndexIntersectPlanGenerator extends AbstractIndexPlanGenerator {
+
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(IndexIntersectPlanGenerator.class);
+
+  final Map indexInfoMap;
+
+  public IndexIntersectPlanGenerator(IndexLogicalPlanCallContext indexContext,
+ Map indexInfoMap,
+ RexBuilder builder,
+ PlannerSettings settings) {
+super(indexContext, null, null, builder, settings);
+this.indexInfoMap = indexInfoMap;
+  }
+
+  public RelNode buildRowKeyJoin(RelNode left, RelNode right, boolean isRowKeyJoin, int htControl)
+  throws InvalidRelException {
+final int leftRowKeyIdx = getRowKeyIndex(left.getRowType(), origScan);
+final int rightRowKeyIdx = 0; // only rowkey field is being projected from right side
+
+assert leftRowKeyIdx >= 0;
+
+List leftJoinKeys = ImmutableList.of(leftRowKeyIdx);
+List rightJoinKeys = ImmutableList.of(rightRowKeyIdx);
+
+logger.trace(String.format(
+"buildRowKeyJoin: leftIdx: %d, rightIdx: %d",
+
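`IndexIntersectPlanGenerator` joins the row keys produced by several index scans (via `buildRowKeyJoin`) and expects its input indexes to be ranked by selectivity already. An in-memory analog of that intersection, not the planner's actual join, is sketched below; the class and method names and the use of `String` row keys are assumptions. Starting from the most selective index keeps the working set small, and the loop stops early once it becomes empty.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory analog of intersecting row keys from several index scans.
final class IndexIntersectSketch {
  /** Inputs are assumed ordered most selective (smallest) first; inputs are not modified. */
  static Set<String> intersectRowKeys(List<Set<String>> rowKeysPerIndex) {
    if (rowKeysPerIndex.isEmpty()) {
      return new HashSet<>();
    }
    Set<String> result = new HashSet<>(rowKeysPerIndex.get(0));
    for (int i = 1; i < rowKeysPerIndex.size() && !result.isEmpty(); i++) {
      result.retainAll(rowKeysPerIndex.get(i)); // keep only keys present in this index too
    }
    return result;
  }
}
```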

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645796#comment-16645796
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224287561
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ##
 @@ -22,7 +22,7 @@
 import java.util.LinkedList;
 import java.util.List;
 
-import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.calcite.rel.type.RelDataType;
 
 Review comment:
   It probably does. Can you create a JIRA for the renaming, since it is independent of this PR?




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645792#comment-16645792
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224286776
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
 ##
 @@ -0,0 +1,145 @@
+package org.apache.drill.exec.store.mapr.db;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelTrait;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.rules.ProjectRemoveRule;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil.ProjectPushInfo;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.List;
+
+public abstract class MapRDBPushProjectIntoScan extends StoragePluginOptimizerRule {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(MapRDBPushProjectIntoScan.class);
+
+  private MapRDBPushProjectIntoScan(RelOptRuleOperand operand, String description) {
+    super(operand, description);
+  }
+
+  public static final StoragePluginOptimizerRule PROJECT_ON_SCAN = new MapRDBPushProjectIntoScan(
+      RelOptHelper.some(ProjectPrel.class, RelOptHelper.any(ScanPrel.class)), "MapRDBPushProjIntoScan:Proj_On_Scan") {
+    @Override
+    public void onMatch(RelOptRuleCall call) {
+      final ScanPrel scan = (ScanPrel) call.rel(1);
+      final ProjectPrel project = (ProjectPrel) call.rel(0);
+      if (!(scan.getGroupScan() instanceof MapRDBGroupScan)) {
+        return;
+      }
+      doPushProjectIntoGroupScan(call, project, scan, (MapRDBGroupScan) scan.getGroupScan());
+      if (scan.getGroupScan() instanceof BinaryTableGroupScan) {
 
 Review comment:
   Good catch. This pushdown was only intended for `JsonTableGroupScan`, so I'm 
not sure why the extra call before the `if` block is there. I will look into 
its origins. 
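The reply says this pushdown is meant only for `JsonTableGroupScan`. In practice that means checking the concrete scan type first and doing the pushdown only after the check passes. The quoted code instead pushes down unconditionally and branches afterwards. Below is a minimal sketch of the check-first shape. `ScanSketch`, `JsonScanSketch`, `BinaryScanSketch` and `pushProjectIfJson` are hypothetical stand-ins for the Drill classes in the diff, and the returned string only records what would happen.

```java
/** Hypothetical stand-ins for the group-scan hierarchy in the quoted diff. */
abstract class ScanSketch {}

final class JsonScanSketch extends ScanSketch {}

final class BinaryScanSketch extends ScanSketch {}

final class ProjectPushdownGuard {
  private ProjectPushdownGuard() {}

  /**
   * Mirrors the shape of onMatch(): bail out early for scans the rule does
   * not handle, and only then perform the pushdown (modeled as a string).
   */
  static String pushProjectIfJson(ScanSketch scan, String projection) {
    if (!(scan instanceof JsonScanSketch)) {
      // Binary (and any other) scans are left untouched.
      return "no pushdown";
    }
    return "pushed " + projection + " into JSON scan";
  }
}
```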


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645764#comment-16645764
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224282950
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mapr.db;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelTrait;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.rules.ProjectRemoveRule;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.logical.RelOptHelper;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil.ProjectPushInfo;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.store.StoragePluginOptimizerRule;
+import org.apache.drill.exec.store.mapr.db.binary.BinaryTableGroupScan;
+import org.apache.drill.exec.store.mapr.db.json.JsonTableGroupScan;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.List;
+
+public abstract class MapRDBPushProjectIntoScan extends StoragePluginOptimizerRule {
 
 Review comment:
   The main difference is that DrillPushProjectIntoScanRule is applied during the 
logical planning phase, whereas MapRDBPushProjectIntoScan is applied during 
physical planning, because we want to ensure that the projection pushdown is 
done even after new physical plans (such as index-based plans) are created. 
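A toy model shows why the phase matters. If a projection pushdown runs only during logical planning, any scan introduced later misses it, for example an index scan created during physical planning. Running the same rule after physical plan generation reaches every scan. Everything below is a hypothetical simplification, not Drill's planner. That includes `PhaseOrderSketch`, its string-based plans, and the single `IndexScan` it adds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Toy model (hypothetical names): a plan is a list of operator labels.
 * Physical planning may add new scans that did not exist in the logical plan.
 */
final class PhaseOrderSketch {
  private PhaseOrderSketch() {}

  // "Pushes the projection" into every scan currently present in the plan.
  static final UnaryOperator<List<String>> PUSH_PROJECT = plan -> {
    List<String> out = new ArrayList<>();
    for (String op : plan) {
      out.add(op.endsWith("Scan") ? op + "[projected]" : op);
    }
    return out;
  };

  // Physical planning introduces a brand-new scan, e.g. an index-based plan.
  static List<String> generatePhysical(List<String> logicalPlan) {
    List<String> physical = new ArrayList<>(logicalPlan);
    physical.add("IndexScan");
    return physical;
  }

  /** Runs the pushdown either before or after physical plan generation. */
  static List<String> plan(List<String> input, boolean pushInPhysicalPhase) {
    List<String> logical = pushInPhysicalPhase ? input : PUSH_PROJECT.apply(input);
    List<String> physical = generatePhysical(logical);
    return pushInPhysicalPhase ? PUSH_PROJECT.apply(physical) : physical;
  }
}
```

With the pushdown only in the logical phase, the `IndexScan` added later stays unprojected. Applying the rule in the physical phase covers both scans.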


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645759#comment-16645759
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224281194
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/planner/index/MapRDBFunctionalIndexInfo.java
 ##
 @@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.CastExpression;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+
+import java.util.Map;
+import java.util.Set;
+
+public class MapRDBFunctionalIndexInfo implements FunctionalIndexInfo {
+
+  final private IndexDescriptor indexDesc;
+
+  private boolean hasFunctionalField = false;
+
+  //when we scan schemaPath in groupscan's columns, we check if this column(schemaPath) should be rewritten to '$N',
+  //When there are more than two functions on the same column in index, CAST(a.b as INT), CAST(a.b as VARCHAR),
+  // then we should map SchemaPath a.b to a set of SchemaPath, e.g. $1, $2
+  private Map> columnToConvert;
+
+  // map of functional index expression to destination SchemaPath e.g. $N
+  private Map exprToConvert;
+
+  //map of SchemaPath involved in a functional field
+  private Map> pathsInExpr;
+
+  private Set newPathsForIndexedFunction;
+
+  private Set allPathsInFunction;
+
+  public MapRDBFunctionalIndexInfo(IndexDescriptor indexDesc) {
+this.indexDesc = indexDesc;
+columnToConvert = Maps.newHashMap();
 
 Review comment:
   I am inclined to leave these as-is, since there are quite a few places where 
we use Google's Guava collections package (`com.google.common.collect`) to 
create not just HashMaps but also Lists, Sets and other collections. 
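The comments in `MapRDBFunctionalIndexInfo` describe two pieces of bookkeeping. Each indexed function expression is rewritten to a synthetic field `$N`. A column used by several indexed functions maps to a set of such fields. The sketch below models this with plain strings. `FunctionalIndexMappingSketch` is hypothetical and only mirrors the two maps named in the diff. It uses the JDK's `new HashMap<>()` so the example has no dependencies. That constructor builds the same empty `HashMap` that Guava's `Maps.newHashMap()` returns.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, string-based model of functional-index bookkeeping:
 * expression -> "$N", and column -> set of "$N" fields derived from it.
 */
final class FunctionalIndexMappingSketch {
  // e.g. "CAST(a.b AS INT)" -> "$1"
  private final Map<String, String> exprToConvert = new HashMap<>();
  // e.g. "a.b" -> {"$1", "$2"}
  private final Map<String, Set<String>> columnToConvert = new HashMap<>();

  /** Registers an indexed function over {@code column}; returns its "$N" name. */
  String addFunctionalField(String column, String expression) {
    String target = exprToConvert.get(expression);
    if (target == null) {
      // A new expression gets the next synthetic field number.
      target = "$" + (exprToConvert.size() + 1);
      exprToConvert.put(expression, target);
    }
    columnToConvert.computeIfAbsent(column, c -> new HashSet<>()).add(target);
    return target;
  }

  /** All synthetic fields computed from {@code column} (empty if none). */
  Set<String> targetsFor(String column) {
    return columnToConvert.getOrDefault(column, Collections.emptySet());
  }
}
```

With `CAST(a.b AS INT)` and `CAST(a.b AS VARCHAR)` both indexed, column `a.b` maps to `{$1, $2}`. That is the case the quoted comment describes.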


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645741#comment-16645741
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224278895
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/DrillIndexDescriptor.java
 ##
 @@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.index;
+
+import org.apache.calcite.rel.RelFieldCollation.NullDirection;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.cost.PluginCost;
+import org.apache.drill.exec.planner.logical.DrillTable;
+
+import java.io.IOException;
+import java.util.List;
+
+public class DrillIndexDescriptor extends AbstractIndexDescriptor {
+
+  /**
+   * The name of Drill's Storage Plugin on which the Index was stored
+   */
+  private String storage;
+
+  private DrillTable table;
+
+  public DrillIndexDescriptor(List indexCols,
+                              CollationContext indexCollationContext,
+                              List nonIndexCols,
+                              List rowKeyColumns,
+                              String indexName,
+                              String tableName,
+                              IndexType type,
+                              NullDirection nullsDirection) {
+    super(indexCols, indexCollationContext, nonIndexCols, rowKeyColumns, indexName, tableName, type, nullsDirection);
+  }
+
+  public DrillIndexDescriptor(DrillIndexDefinition def) {
+    this(def.indexColumns, def.indexCollationContext, def.nonIndexColumns, def.rowKeyColumns, def.indexName,
+        def.getTableName(), def.getIndexType(), def.nullsDirection);
+  }
+
+  @Override
+  public double getRows(RelNode scan, RexNode indexCondition) {
+    //TODO: real implementation is to use Drill's stats implementation. for now return fake value 1.0
+    return 1.0;
+  }
+
+  @Override
+  public IndexGroupScan getIndexGroupScan() {
+    try {
+      final DrillTable idxTable = getDrillTable();
+      GroupScan scan = idxTable.getGroupScan();
+
+      if (!(scan instanceof IndexGroupScan)){
+        logger.error("The Groupscan from table {} is not an IndexGroupScan", idxTable.toString());
+        return null;
+      }
+      return (IndexGroupScan)scan;
+    }
+    catch(IOException e) {
+      logger.error("Error in getIndexGroupScan ", e);
+    }
+    return null;
+  }
+
+  public void attach(String storageName, DrillTable inTable) {
 
 Review comment:
   Removed attach() since it was not being used. Added Javadoc or `@Override` 
for a few others. 
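The `getRows()` stub in the quoted diff returns a fixed 1.0, with a TODO to use Drill's statistics instead. One common cost-model approach multiplies the table's row count by the selectivity of the index condition. The sketch below shows that approach only as a hypothetical illustration. `RowEstimateSketch` and `estimateRows` are not Drill's API. The clamping and the one-row floor are design assumptions of this sketch.

```java
/**
 * Hypothetical sketch of a statistics-based row estimate:
 * estimated rows = table row count * predicate selectivity.
 */
final class RowEstimateSketch {
  private RowEstimateSketch() {}

  static double estimateRows(double tableRowCount, double selectivity) {
    if (tableRowCount < 0) {
      throw new IllegalArgumentException("row count must be non-negative");
    }
    // Clamp selectivity to [0, 1] so bad statistics cannot yield a negative
    // estimate or more rows than the table holds.
    double s = Math.max(0.0, Math.min(1.0, selectivity));
    // Never report fewer than one row (the same value the stub returns).
    return Math.max(1.0, tableRowCount * s);
  }
}
```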


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645732#comment-16645732
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on issue #1466: DRILL-6381: Add support for index based 
planning and execution
URL: https://github.com/apache/drill/pull/1466#issuecomment-428770114
 
 
   @vdiravka I think I have addressed all the formatting issues in commit 
5e0fa72 and added some missing Javadocs (btw, thanks for pointing out the 
formatting issues. I don't think I introduced them, but I should have caught 
them before creating the PR). I will go ahead and 'hide' your comments related 
to these and leave your other comments open. 


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645148#comment-16645148
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r224127034
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java
 ##
 @@ -517,6 +388,15 @@ private static FieldPath getFieldPathForProjection(SchemaPath column) {
 return new FieldPath(child);
   }
 
+  public static boolean includesIdField(Collection projected) {
+return Iterables.tryFind(projected, new Predicate() {
 
 Review comment:
   Sure. The Javadoc for Guava's `Predicate` references 
https://github.com/google/guava/wiki/FunctionalExplained, which says:
   > While Guava's functional utilities are usable on Java versions prior to 
Java 8, functional programming without Java 8 requires awkward and verbose use 
of anonymous classes.
   
   Personally, I think the lambda version looks better. My IDE also sometimes 
suggests how to convert old-style code to Java 8:
   ```
   return Iterables.tryFind(projected, path -> Preconditions.checkNotNull(path).equals(ID_FIELD))
       .isPresent();
   ```
   Also, `FieldPath` could be null here, resulting in an NPE. Is that expected? If 
not, you can avoid it by using `Objects.equals()` instead of 
`Preconditions.checkNotNull()` followed by the usual `Object.equals()` (_Note_: 
`Objects` vs `Object`).
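Here is a sketch of the null-safe lambda form suggested above. Some details are assumptions: the element type is simplified to `String`, and `ID_FIELD` is a stand-in set to `"_id"`. The quoted code uses Guava's `Iterables.tryFind`, but this version swaps in the JDK Stream API (`anyMatch`) so it has no dependencies. The same lambda can also be passed to `Iterables.tryFind(...).isPresent()`, because Guava's `Predicate` has a single abstract method.

```java
import java.util.Collection;
import java.util.Objects;

/** Hypothetical sketch of includesIdField() in Java 8 style. */
final class IdFieldCheckSketch {
  // Stand-in for the real ID_FIELD constant (assumption).
  static final String ID_FIELD = "_id";

  private IdFieldCheckSketch() {}

  static boolean includesIdField(Collection<String> projected) {
    // Objects.equals(a, b) is null-safe: a null element simply does not match,
    // whereas checkNotNull(path).equals(ID_FIELD) throws a NullPointerException.
    return projected.stream().anyMatch(path -> Objects.equals(path, ID_FIELD));
  }
}
```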
   


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644152#comment-16644152
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

amansinha100 commented on a change in pull request #1466: DRILL-6381: Add 
support for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223878408
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java
 ##
 @@ -517,6 +388,15 @@ private static FieldPath getFieldPathForProjection(SchemaPath column) {
 return new FieldPath(child);
   }
 
+  public static boolean includesIdField(Collection projected) {
+return Iterables.tryFind(projected, new Predicate() {
 
 Review comment:
   @vdiravka could you elaborate on this a bit? I haven't done much with 
lambda expressions in Java 8, but if you show how to rewrite this statement, and 
assuming it has an advantage over the existing implementation, I would be happy 
to do so. Thanks. 


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643915#comment-16643915
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223672779
 
 

 ##
 File path: pom.xml
 ##
 @@ -53,8 +53,8 @@
 2.9.5
 2.9.5
 3.4.12
-5.2.1-mapr
-1.1
+6.0.1-mapr
 
 Review comment:
   Use the 6.1.0-mapr version.
   It will be introduced once the following PR is merged:
   
https://github.com/apache/drill/pull/1489/files#diff-600376dffeb79835ede4a0b285078036R56


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643913#comment-16643913
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223667043
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdSelectivity.java
 ##
 @@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.cost;
+
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMdSelectivity;
+import org.apache.calcite.rel.metadata.RelMdUtil;
+import org.apache.calcite.rel.metadata.RelMetadataProvider;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+
+import java.util.List;
+
+public class DrillRelMdSelectivity extends RelMdSelectivity {
+  private static final DrillRelMdSelectivity INSTANCE = new DrillRelMdSelectivity();
+
+  public static final RelMetadataProvider SOURCE =
+      ReflectiveRelMetadataProvider.reflectiveSource(BuiltInMethod.SELECTIVITY.method, INSTANCE);
+
+
+  public Double getSelectivity(RelNode rel, RexNode predicate) {
 
 Review comment:
   Why can't the superclass methods be used instead? Is it necessary to improve 
them in Calcite's RelMdSelectivity.java?
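A common reason to override a metadata handler, rather than rely on the superclass, is to handle a case the generic logic cannot see. An example is a scan whose storage plugin can estimate selectivity itself; everything else is delegated to `super`. The imports in the quoted file (`DbGroupScan`, `ScanPrel`, `DrillScanRel`) suggest this may be the intent, but that is only an assumption. The sketch uses hypothetical classes, not Calcite's `RelMdSelectivity`, and its selectivity values are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a generic selectivity handler. */
class GenericSelectivitySketch {
  /** Generic guess; 1.0 for no predicate, 0.25 otherwise (illustrative only). */
  Double getSelectivity(String relKind, String predicate) {
    return predicate == null ? 1.0 : 0.25;
  }
}

/**
 * Overrides only the case it knows more about (a database scan whose storage
 * reports an estimate for this predicate) and otherwise defers to super.
 */
class StorageAwareSelectivitySketch extends GenericSelectivitySketch {
  private final Map<String, Double> storageEstimates;

  StorageAwareSelectivitySketch(Map<String, Double> storageEstimates) {
    this.storageEstimates = new HashMap<>(storageEstimates);
  }

  @Override
  Double getSelectivity(String relKind, String predicate) {
    if ("DbScan".equals(relKind) && predicate != null) {
      Double fromStorage = storageEstimates.get(predicate);
      if (fromStorage != null) {
        return fromStorage;
      }
    }
    // Anything the storage layer cannot answer falls back to the generic logic.
    return super.getSelectivity(relKind, predicate);
  }
}
```

If the generic handler could be taught about storage-provided estimates, the override would become unnecessary. That is essentially the question of whether the improvement belongs in Calcite.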


[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643338#comment-16643338
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223675289
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 ##
 @@ -114,6 +114,28 @@
   public static final String UNIONALL_DISTRIBUTE_KEY = "planner.enable_unionall_distribute";
   public static final BooleanValidator UNIONALL_DISTRIBUTE = new BooleanValidator(UNIONALL_DISTRIBUTE_KEY, null);
 
+  // --- Index planning related options BEGIN --
+  public static final String USE_SIMPLE_OPTIMIZER_KEY = "planner.use_simple_optimizer";
+  public static final BooleanValidator USE_SIMPLE_OPTIMIZER = new BooleanValidator(USE_SIMPLE_OPTIMIZER_KEY, null);
+  public static final BooleanValidator INDEX_PLANNING = new BooleanValidator("planner.enable_index_planning", null);
+  public static final BooleanValidator ENABLE_STATS = new BooleanValidator("planner.enable_statistics", null);
 
 Review comment:
   Is it used for obtaining MapR-DB and Parquet table statistics?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643334#comment-16643334
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223670746
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/IndexPlanUtils.java
 ##
 @@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.planner.index;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.plan.volcano.RelSubset;
+import org.apache.calcite.rel.RelCollation;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Sort;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.DbGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.IndexGroupScan;
+import org.apache.drill.exec.planner.common.DrillProjectRelBase;
+import org.apache.drill.exec.planner.common.DrillScanRelBase;
+import org.apache.drill.exec.planner.fragment.DistributionAffinity;
+import org.apache.drill.exec.planner.logical.DrillOptiq;
+import org.apache.drill.exec.planner.logical.DrillParseContext;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.physical.DrillDistributionTrait;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.common.OrderedRel;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+
+public class IndexPlanUtils {
+
+  public enum ConditionIndexed {
+    NONE,
+    PARTIAL,
+    FULL}
+
+  /**
+   * Check if any of the fields of the index are present in a list of LogicalExpressions supplied
+   * as part of IndexableExprMarker
+   * @param exprMarker, the marker that has analyzed original index condition on top of original scan
+   * @param indexDesc
+   * @return ConditionIndexed.FULL, PARTIAL or NONE depending on whether all, some or no columns
+   * of the indexDesc are present in the list of LogicalExpressions supplied as part of exprMarker
+   *
+   */
+  static public ConditionIndexed conditionIndexed(IndexableExprMarker exprMarker, IndexDescriptor indexDesc) {
+    Map mapRexExpr = exprMarker.getIndexableExpression();
+    List infoCols = Lists.newArrayList();
+    infoCols.addAll(mapRexExpr.values());
+    if (indexDesc.allColumnsIndexed(infoCols)) {
+      return ConditionIndexed.FULL;
+    } else if (indexDesc.someColumnsIndexed(infoCols)) {
+      return ConditionIndexed.PARTIAL;
+    } else {
+      return ConditionIndexed.NONE;
+    }
+  }
+
+  /**
+   * check if we want to apply index rules on this scan,
+   * if group scan is not instance of DbGroupScan, or this DbGroupScan instance does not support secondary index, or
+   *this scan is already an index scan or Restricted Scan, do not 

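The FULL / PARTIAL / NONE classification in `conditionIndexed` can be illustrated with a self-contained sketch. For illustration only, it assumes that an index descriptor can be reduced to a set of indexed column names. It also assumes that `allColumnsIndexed` and `someColumnsIndexed` mean that "every" and "at least one" of the condition's columns are indexed, respectively. `ConditionIndexedSketch` and its parameters are hypothetical stand-ins for `IndexDescriptor` and `IndexableExprMarker`, not Drill's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for IndexPlanUtils.conditionIndexed: condition columns
// and index columns are modeled as plain column names.
public class ConditionIndexedSketch {

  enum ConditionIndexed { NONE, PARTIAL, FULL }

  static ConditionIndexed conditionIndexed(List<String> conditionColumns, Set<String> indexColumns) {
    // "all": every column referenced by the condition is an index column.
    boolean all = indexColumns.containsAll(conditionColumns);
    // "some": at least one referenced column is an index column.
    boolean some = conditionColumns.stream().anyMatch(indexColumns::contains);
    if (all) {
      return ConditionIndexed.FULL;
    } else if (some) {
      return ConditionIndexed.PARTIAL;
    } else {
      return ConditionIndexed.NONE;
    }
  }

  public static void main(String[] args) {
    Set<String> index = new TreeSet<>(Arrays.asList("a", "b"));
    System.out.println(conditionIndexed(Arrays.asList("a", "b"), index)); // FULL
    System.out.println(conditionIndexed(Arrays.asList("a", "c"), index)); // PARTIAL
    System.out.println(conditionIndexed(Arrays.asList("c"), index));      // NONE
  }
}
```

In this sketch, an empty condition list classifies as FULL, because `containsAll` of an empty collection is true. A caller would therefore want to rule that case out first.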
[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643321#comment-16643321
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223652706
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java
 ##
 @@ -517,6 +388,15 @@ private static FieldPath getFieldPathForProjection(SchemaPath column) {
     return new FieldPath(child);
   }
 
+  public static boolean includesIdField(Collection projected) {
+    return Iterables.tryFind(projected, new Predicate() {
 
 Review comment:
   replace with lambda

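The reviewer is suggesting that a lambda be passed instead of an anonymous `Predicate` class. The body of the original predicate is cut off in the diff above. The sketch below therefore assumes, hypothetically, that the predicate tests whether a projected column is the MapR-DB `_id` field. It uses `java.util.stream` and plain strings so that it compiles on its own. With Guava, the same lambda could be passed directly to `Iterables.tryFind(...)` followed by `.isPresent()`, because Guava's `Predicate` has a single abstract method.

```java
import java.util.Arrays;
import java.util.Collection;

public class IncludesIdFieldSketch {

  // Assumed name of the row-key field; the real check is not visible in the diff.
  static final String ID_FIELD = "_id";

  // Lambda-based form: true if any projected column is the ID field.
  static boolean includesIdField(Collection<String> projected) {
    return projected.stream().anyMatch(column -> ID_FIELD.equals(column));
  }

  public static void main(String[] args) {
    System.out.println(includesIdField(Arrays.asList("_id", "name")));  // true
    System.out.println(includesIdField(Arrays.asList("name", "city"))); // false
  }
}
```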



[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

2018-10-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643317#comment-16643317
 ] 

ASF GitHub Bot commented on DRILL-6381:
---

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223654960
 
 

 ##
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/udf/mapr/db/DecodeFieldPath.java
 ##
 @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udf.mapr.db;
+
+import javax.inject.Inject;
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import io.netty.buffer.DrillBuf;
+
+@FunctionTemplate(name = "maprdb_decode_fieldpath", scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
+public class DecodeFieldPath implements DrillSimpleFunc {
+  @Param  VarCharHolder input;
+  @Output VarCharHolder   out;
+
+  @Inject DrillBuf buffer;
+
+  @Override
+  public void setup() {
+  }
+
+  @Override
+  public void eval() {
+    String[] encodedPaths = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.
+        toStringFromUTF8(input.start, input.end, input.buffer).split(",");
+    String[] decodedPaths = org.apache.drill.exec.util.EncodedSchemaPathSet.decode(encodedPaths);
+    java.util.Arrays.sort(decodedPaths);
+
+    StringBuilder sb = new StringBuilder();
+    for(String decodedPath : decodedPaths) {
 
 Review comment:
   space

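The visible part of `eval()` splits the comma-separated input, decodes each path, sorts the result and then starts appending to a `StringBuilder`. A plain-Java sketch of that flow, with the loop written with a space after `for` as the reviewer asks, might look like the following. `EncodedSchemaPathSet.decode` is replaced by a hypothetical identity `decode` helper. Joining the output with commas is an assumption, because the rest of the loop is not shown in the diff.

```java
import java.util.Arrays;

public class DecodeFieldPathSketch {

  // Hypothetical stand-in for EncodedSchemaPathSet.decode: returns the paths unchanged.
  static String[] decode(String[] encodedPaths) {
    return encodedPaths.clone();
  }

  // Split on commas, decode, sort, then join the sorted paths back with commas.
  static String decodeAndSort(String input) {
    String[] decodedPaths = decode(input.split(","));
    Arrays.sort(decodedPaths);

    StringBuilder sb = new StringBuilder();
    boolean first = true;
    for (String decodedPath : decodedPaths) {
      if (!first) {
        sb.append(',');
      }
      sb.append(decodedPath);
      first = false;
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(decodeAndSort("c.d,a,b")); // a,b,c.d
  }
}
```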
