[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

ASF GitHub Bot (JIRA) Tue, 09 Oct 2018 05:54:46 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643345#comment-16643345
 ]


ASF GitHub Bot commented on DRILL-6381:
---------------------------------------

vdiravka commented on a change in pull request #1466: DRILL-6381: Add support 
for index based planning and execution
URL: https://github.com/apache/drill/pull/1466#discussion_r223651337
 
 

 ##########
 File path: 
contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java
 ##########
 @@ -179,16 +295,126 @@ public MapRDBSubScan getSpecificScan(int 
minorFragmentId) {
     assert minorFragmentId < endpointFragmentMapping.size() : String.format(
         "Mappings length [%d] should be greater than minor fragment id [%d] 
but it isn't.", endpointFragmentMapping.size(),
         minorFragmentId);
-    return new MapRDBSubScan(getUserName(), formatPlugin, 
endpointFragmentMapping.get(minorFragmentId), columns, TABLE_JSON);
+    return new MapRDBSubScan(getUserName(), formatPlugin, 
endpointFragmentMapping.get(minorFragmentId), columns, maxRecordsToRead, 
TABLE_JSON);
   }
 
   @Override
   public ScanStats getScanStats() {
-    //TODO: look at stats for this.
-    long rowCount = (long) ((scanSpec.getSerializedFilter() != null ? .5 : 1) 
* totalRowCount);
-    int avgColumnSize = 10;
-    int numColumns = (columns == null || columns.isEmpty()) ? 100 : 
columns.size();
-    return new ScanStats(GroupScanProperty.NO_EXACT_ROW_COUNT, rowCount, 1, 
avgColumnSize * numColumns * rowCount);
+    if (isIndexScan()) {
+      return indexScanStats();
+    }
+    return fullTableScanStats();
+  }
+
+  private ScanStats fullTableScanStats() {
+    PluginCost pluginCostModel = formatPlugin.getPluginCostModel();
+    final int avgColumnSize = pluginCostModel.getAverageColumnSize(this);
+    final int numColumns = (columns == null || columns.isEmpty()) ? STAR_COLS 
: columns.size();
+    // index will be NULL for FTS
+    double rowCount = stats.getRowCount(scanSpec.getCondition(), null);
+    // rowcount based on _id predicate. If NO _id predicate present in 
condition, then the
+    // rowcount should be same as totalRowCount. Equality b/w the two 
rowcounts should not be
+    // construed as NO _id predicate since stats are approximate.
+    double leadingRowCount = stats.getLeadingRowCount(scanSpec.getCondition(), 
null);
+    double avgRowSize = stats.getAvgRowSize(null, true);
+    double totalRowCount = stats.getRowCount(null, null);
+    logger.debug("GroupScan {} with stats {}: rowCount={}, condition={}, 
totalRowCount={}, fullTableRowCount={}",
+            System.identityHashCode(this), System.identityHashCode(stats), 
rowCount,
+            scanSpec.getCondition()==null?"null":scanSpec.getCondition(),
 
 Review comment:
   formatting

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add capability to do index based planning and execution
> -------------------------------------------------------
>
>                 Key: DRILL-6381
>                 URL: https://issues.apache.org/jira/browse/DRILL-6381
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Execution - Relational Operators, Query Planning &amp; 
> Optimization
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>             Fix For: 1.15.0
>
>
> If the underlying data source supports indexes (primary and secondary 
> indexes), Drill should leverage those during planning and execution in order 
> to improve query performance.  
> On the planning side, Drill planner should be enhanced to provide an 
> abstraction layer which express the index metadata and statistics.  Further, 
> a cost-based index selection is needed to decide which index(es) are 
> suitable.  
> On the execution side, appropriate operator enhancements would be needed to 
> handle different categories of indexes such as covering, non-covering 
> indexes, taking into consideration the index data may not be co-located with 
> the primary table, i.e a global index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-6381) Add capability to do index based planning and execution

Reply via email to