vdiravka commented on a change in pull request #1640: DRILL-7038: Queries on 
partitioned columns scan the entire datasets
URL: https://github.com/apache/drill/pull/1640#discussion_r263464327
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java
 ##########
 @@ -553,4 +571,220 @@ private static void setPruneStatus(MetadataContext metaContext, PruneStatus prun
     }
   }
 
+  /**
+   * A rule which transforms {@link TableScan} into {@link DrillValuesRel} to avoid
+   * unnecessary scanning of selected files. The rule is applied when query references
+   * directory columns only and has {@code DISTINCT} or {@code GROUP BY} operation.
+   *
+   * Resulting {@link DrillValuesRel} will be populated with constant literals obtained from:
+   * <ol>
+   *   <li>metadata directory file if it exists</li>
+   *   <li>or from file selection</li>
+   * </ol>
+   */
+  private static class PartitionColumnScanPruningRule extends PruneScanRule {
 
 Review comment:
   1. It looks like this rule can be applied not only to partitions, but to directories too.
   2. It is not really pruning, since pruning excludes only part of the data from scanning, whereas this rule avoids the scan entirely.
   This rule is closer to the Convert rules (for example, ConvertCountToDirectScan).
   
   Therefore I suggest the following description:
   A rule which converts {@link Aggregate} on {@link TableScan} with partitions or directories into {@link DrillValuesRel} to avoid scanning at all.
   Please also try to reflect this in the name of the rule (and in the getter name), for example `ConvertAggScanToValuesRule`.
   The words `partitions` and `directories` can be omitted from the rule name, since this rule can be applied to any column whose values are known at the planning stage.
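
   To illustrate the last point, here is a minimal standalone sketch of why directory column values are available at planning time: they can be derived from the selected file paths alone, without reading any file. This is not Drill code. The class `DirValuesSketch`, the method `distinctDirValues`, and the sample paths are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical standalone sketch, not Drill code: all names are illustrative.
public class DirValuesSketch {

  // Returns the distinct values of the directory column at the given depth
  // (0 for dir0, 1 for dir1, ...) for the files under tableRoot, using only
  // the file paths. A file that is not nested deeply enough contributes null.
  static List<String> distinctDirValues(String tableRoot, List<String> files, int depth) {
    String prefix = tableRoot.endsWith("/") ? tableRoot : tableRoot + "/";
    Set<String> values = new LinkedHashSet<>();
    for (String file : files) {
      if (!file.startsWith(prefix)) {
        continue; // not part of this table's file selection
      }
      String[] parts = file.substring(prefix.length()).split("/");
      // The last part is the file name, so directories are parts[0..length-2].
      values.add(depth < parts.length - 1 ? parts[depth] : null);
    }
    return new ArrayList<>(values);
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList(
        "/data/orders/2018/q1/a.parquet",
        "/data/orders/2018/q2/b.parquet",
        "/data/orders/2019/q1/c.parquet");
    System.out.println(distinctDirValues("/data/orders", files, 0));
    System.out.println(distinctDirValues("/data/orders", files, 1));
  }
}
```

   In this sketch, the result of something like `SELECT DISTINCT dir0` would be the list returned for depth 0, which a rule could place into a values node as constant literals instead of scanning.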

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
