Github user jinfengni commented on a diff in the pull request:
https://github.com/apache/drill/pull/637#discussion_r86449453
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
---
@@ -1000,6 +1053,81 @@ public long getColumnValueCount(SchemaPath column) {
@Override
public List<SchemaPath> getPartitionColumns() {
- return new ArrayList<>(columnTypeMap.keySet());
+ return new ArrayList<>(partitionColTypeMap.keySet());
}
+
+ public GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities,
+ FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager) {
+ if (fileSet.size() == 1 || ! (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v3)) {
+ return null; // no pruning for 1 single parquet file or metadata is prior v3.
+ }
+
+ final Set<SchemaPath> schemaPathsInExpr = filterExpr.accept(new ParquetRGFilterEvaluator.FieldReferenceFinder(), null);
+
+ final List<RowGroupMetadata> qualifiedRGs = new ArrayList<>(parquetTableMetadata.getFiles().size());
+ Set<String> qualifiedFileNames = Sets.newHashSet(); // HashSet keeps a fileName unique.
+
+ ParquetFilterPredicate filterPredicate = null;
+
+ for (ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
+ final ImplicitColumnExplorer columnExplorer = new ImplicitColumnExplorer(optionManager, this.columns);
+ Map<String, String> implicitColValues = columnExplorer.populateImplicitColumns(file.getPath(), selectionRoot);
+
+ for (RowGroupMetadata rowGroup : file.getRowGroups()) {
+ ParquetMetaStatCollector statCollector = new ParquetMetaStatCollector(
+ parquetTableMetadata,
+ rowGroup.getColumns(),
+ implicitColValues);
+
+ Map<SchemaPath, ColumnStatistics> columnStatisticsMap = statCollector.collectColStat(schemaPathsInExpr);
--- End diff --
Right. The filter predicate should be built only once. It is inside the loop
only because we need the column type information during filter expression
materialization, for both regular columns and implicit columns.
I put a check, if (filterPredicate == null), inside the loop so that the
filter predicate is built only once.
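
For illustration only, here is a minimal sketch of that build-once-inside-the-loop pattern. It does not use Drill's classes: RowGroupStats, buildPredicate, and prune are hypothetical stand-ins, and the predicate is a simple lower-bound check on a max-value statistic. The point is that the builder needs information (here, the column type) that is only available once the first row group has been read, so the predicate is created lazily behind a null check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class LazyPredicateSketch {
  // Counts how many times the hypothetical predicate builder runs.
  public static int buildCount = 0;

  // Hypothetical stand-in for per-row-group column statistics.
  public static final class RowGroupStats {
    public final String columnType;
    public final long maxValue;

    public RowGroupStats(String columnType, long maxValue) {
      this.columnType = columnType;
      this.maxValue = maxValue;
    }
  }

  // Hypothetical builder. A real one would use columnType to materialize
  // typed comparisons; this sketch only records that it was called.
  public static Predicate<RowGroupStats> buildPredicate(String columnType, long lowerBound) {
    buildCount++;
    return stats -> stats.maxValue >= lowerBound;
  }

  // Keeps the row groups whose max value can satisfy "col >= lowerBound".
  public static List<RowGroupStats> prune(List<RowGroupStats> rowGroups, long lowerBound) {
    List<RowGroupStats> qualified = new ArrayList<>();
    Predicate<RowGroupStats> filterPredicate = null;

    for (RowGroupStats stats : rowGroups) {
      // The column type is only known inside the loop, so the predicate is
      // built on the first iteration and reused afterwards.
      if (filterPredicate == null) {
        filterPredicate = buildPredicate(stats.columnType, lowerBound);
      }
      if (filterPredicate.test(stats)) {
        qualified.add(stats);
      }
    }
    return qualified;
  }

  public static void main(String[] args) {
    List<RowGroupStats> groups = Arrays.asList(
        new RowGroupStats("BIGINT", 5L),
        new RowGroupStats("BIGINT", 20L),
        new RowGroupStats("BIGINT", 30L));

    List<RowGroupStats> kept = prune(groups, 10L);
    System.out.println(kept.size() + " row group(s) kept, predicate built "
        + buildCount + " time(s)");
  }
}
```

The null check keeps the construction cost at one call regardless of how many files and row groups are scanned, at the price of assuming the column types seen on the first iteration hold for the rest.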
---