[GitHub] drill pull request #637: Drill 1950 : Parquet row group filter pushdown.

jinfengni Thu, 03 Nov 2016 14:41:43 -0700

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/637#discussion_r86449453
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
    @@ -1000,6 +1053,81 @@ public long getColumnValueCount(SchemaPath column) {
     
       @Override
       public List<SchemaPath> getPartitionColumns() {
    -    return new ArrayList<>(columnTypeMap.keySet());
    +    return new ArrayList<>(partitionColTypeMap.keySet());
       }
    +
    +  public GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities,
    +      FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager) {
    +    if (fileSet.size() == 1 || ! (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v3)) {
    +      return null; // no pruning for 1 single parquet file or metadata is prior v3.
    +    }
    +
    +    final Set<SchemaPath> schemaPathsInExpr = filterExpr.accept(new ParquetRGFilterEvaluator.FieldReferenceFinder(), null);
    +
    +    final List<RowGroupMetadata> qualifiedRGs = new ArrayList<>(parquetTableMetadata.getFiles().size());
    +    Set<String> qualifiedFileNames = Sets.newHashSet(); // HashSet keeps a fileName unique.
    +
    +    ParquetFilterPredicate filterPredicate = null;
    +
    +    for (ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
    +      final ImplicitColumnExplorer columnExplorer = new ImplicitColumnExplorer(optionManager, this.columns);
    +      Map<String, String> implicitColValues = columnExplorer.populateImplicitColumns(file.getPath(), selectionRoot);
    +
    +      for (RowGroupMetadata rowGroup : file.getRowGroups()) {
    +        ParquetMetaStatCollector statCollector = new ParquetMetaStatCollector(
    +            parquetTableMetadata,
    +            rowGroup.getColumns(),
    +            implicitColValues);
    +
    +        Map<SchemaPath, ColumnStatistics> columnStatisticsMap = statCollector.collectColStat(schemaPathsInExpr);
    --- End diff --
    
    Right. The filter predicate should be built only once. It is inside the
loop only because we need the column type information during filter
expression materialization, for both regular columns and implicit columns.
    
    I put a check, if (filterPredicate == null), inside the loop so that the
filter predicate is built only once.
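
    The build-once pattern described in this comment can be sketched roughly
as follows. This is a minimal, self-contained illustration, not the code from
the pull request: the class LazyPredicateSketch, the RowGroup stand-in, the
buildPredicate helper, and the "col > threshold" filter are all hypothetical
names and simplifications introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class LazyPredicateSketch {

  // Hypothetical stand-in for row group metadata: a name, the type of a
  // single column, and that column's min/max statistics.
  static final class RowGroup {
    final String name;
    final String columnType;
    final int min;
    final int max;

    RowGroup(String name, int min, int max) {
      this.name = name;
      this.columnType = "INT";
      this.min = min;
      this.max = max;
    }
  }

  // Counts how often the predicate is built, to show it happens only once.
  static int buildCount = 0;

  // Hypothetical "materialization" step for the filter "col > threshold".
  // It needs the column type, which is only known once metadata is at hand.
  static Predicate<RowGroup> buildPredicate(String columnType, int threshold) {
    buildCount++;
    if (!"INT".equals(columnType)) {
      throw new IllegalArgumentException("unsupported type: " + columnType);
    }
    // A row group may contain matching rows only if its max exceeds the threshold.
    return rg -> rg.max > threshold;
  }

  static List<String> prune(List<RowGroup> rowGroups, int threshold) {
    Predicate<RowGroup> filterPredicate = null;
    List<String> qualified = new ArrayList<>();
    for (RowGroup rg : rowGroups) {
      // Built lazily on the first iteration, then reused for every row group.
      if (filterPredicate == null) {
        filterPredicate = buildPredicate(rg.columnType, threshold);
      }
      if (filterPredicate.test(rg)) {
        qualified.add(rg.name);
      }
    }
    return qualified;
  }

  public static void main(String[] args) {
    List<RowGroup> rowGroups = Arrays.asList(
        new RowGroup("rg0", 0, 10),
        new RowGroup("rg1", 11, 20),
        new RowGroup("rg2", 21, 30));
    System.out.println("qualified: " + prune(rowGroups, 15));
    System.out.println("predicate builds: " + buildCount);
  }
}
```

    The null check keeps the construction next to the data it depends on
(the column types seen inside the loop) while still paying the build cost a
single time, which is the trade-off the comment describes.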



