bvaradar commented on code in PR #8452:
URL: https://github.com/apache/hudi/pull/8452#discussion_r1188099833
########## hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java: ##########
@@ -58,13 +62,21 @@ public class FileSystemBackedTableMetadata implements HoodieTableMetadata {
   private final SerializableConfiguration hadoopConf;
   private final String datasetBasePath;
   private final boolean assumeDatePartitioning;
+  private final boolean hiveStylePartitioningEnabled;
+  private final boolean urlEncodePartitioningEnabled;

   public FileSystemBackedTableMetadata(HoodieEngineContext engineContext, SerializableConfiguration conf,
                                        String datasetBasePath, boolean assumeDatePartitioning) {
     this.engineContext = engineContext;
     this.hadoopConf = conf;
     this.datasetBasePath = datasetBasePath;
     this.assumeDatePartitioning = assumeDatePartitioning;
+    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()

Review Comment:
   The superclass already instantiates the meta client. Please move the hiveStylePartitioningEnabled and urlEncodePartitioningEnabled members there so that they can be reused by HoodieBackedTableMetadata.

########## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/FilterGenVisitor.java: ##########
@@ -42,9 +43,10 @@ private String quoteStringLiteral(String value) {
     }
   }

-  private String visitAnd(Expression left, Expression right) {
     String leftResult = left.accept(this);
     String rightResult = right.accept(this);
+  @Override
+  public String visitAnd(Predicates.And and) {

Review Comment:
   Is case sensitivity the same between hive-sync and the Spark integration?
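For context on the refactor the hunk above shows, here is a minimal self-contained sketch of the visitor shape FilterGenVisitor moves to: instead of a private visitAnd(left, right) helper, each typed predicate node accepts the visitor and the visitor receives the whole node. Everything below except the FilterGenVisitor/visitAnd names is a simplified, hypothetical stand-in, not Hudi's real Expression/Predicates API:

```java
// Simplified stand-in for Hudi's expression tree (hypothetical types).
interface Expression {
  String accept(FilterGenVisitor visitor);
}

class Literal implements Expression {
  final String sql;
  Literal(String sql) { this.sql = sql; }
  public String accept(FilterGenVisitor v) { return v.visitLiteral(this); }
}

class And implements Expression {
  final Expression left, right;
  And(Expression left, Expression right) { this.left = left; this.right = right; }
  public String accept(FilterGenVisitor v) { return v.visitAnd(this); }
}

class FilterGenVisitor {
  String visitLiteral(Literal l) { return l.sql; }

  // Mirrors the new signature in the diff: the visitor receives the whole
  // And node and recurses into both children itself.
  String visitAnd(And and) {
    String leftResult = and.left.accept(this);
    String rightResult = and.right.accept(this);
    if (leftResult.isEmpty())  { return rightResult; }
    if (rightResult.isEmpty()) { return leftResult; }
    return "(" + leftResult + " AND " + rightResult + ")";
  }
}

public class FilterGenSketch {
  public static void main(String[] args) {
    Expression expr = new And(new Literal("dt = '2023-05-01'"), new Literal("region = 'us'"));
    System.out.println(expr.accept(new FilterGenVisitor()));
    // (dt = '2023-05-01' AND region = 'us')
  }
}
```

Note the empty-result handling: if one side of the conjunction cannot be translated into a Hive filter string, the sketch degrades to the other side rather than emitting a malformed predicate.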
########## hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/adapter/Spark2Adapter.scala: ##########
@@ -186,4 +186,13 @@ class Spark2Adapter extends SparkAdapter {
     case OFF_HEAP => "OFF_HEAP"
     case _ => throw new IllegalArgumentException(s"Invalid StorageLevel: $level")
   }
+
+  override def translateFilter(predicate: Expression, supportNestedPredicatePushdown: Boolean = false): Option[Filter] = {
+    if (supportNestedPredicatePushdown) {

Review Comment:
   Is this expected to fail any Spark 2 queries?

########## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java: ##########
@@ -153,6 +156,20 @@ protected Option<HoodieRecord<HoodieMetadataPayload>> getRecordByKey(String key,
     return recordsByKeys.size() == 0 ? Option.empty() : recordsByKeys.get(0).getValue();
   }

+  @Override
+  public List<String> getPartitionPathByExpression(List<String> relativePathPrefixes,
+                                                   Types.RecordType partitionFields,
+                                                   Expression expression) throws IOException {
+    Expression boundedExpr = expression.accept(new BindVisitor(partitionFields, false));
+    boolean hiveStylePartitioningEnabled = Boolean.parseBoolean(dataMetaClient.getTableConfig().getHiveStylePartitioningEnable());

Review Comment:
   Once we move hiveStylePartitioningEnabled and urlEncodePartitioningEnabled to the base class, reuse them instead of recomputing this each time.
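The flag-hoisting asked for above can be sketched roughly as follows: parse the two table-config flags once in the shared base-class constructor so both FileSystemBackedTableMetadata and HoodieBackedTableMetadata reuse the same fields. All class names and getters below are simplified stand-ins for Hudi's actual types (HoodieTableConfig etc.), not the real API:

```java
// Hypothetical stand-in for HoodieTableConfig, which stores flags as strings.
class TableConfig {
  private final String hiveStyle;
  private final String urlEncode;
  TableConfig(String hiveStyle, String urlEncode) {
    this.hiveStyle = hiveStyle;
    this.urlEncode = urlEncode;
  }
  String getHiveStylePartitioningEnable() { return hiveStyle; }
  String getUrlEncodePartitioning() { return urlEncode; }
}

abstract class BaseTableMetadata {
  // Parsed once here, instead of calling Boolean.parseBoolean on the config
  // inside every getPartitionPathByExpression invocation.
  protected final boolean hiveStylePartitioningEnabled;
  protected final boolean urlEncodePartitioningEnabled;

  protected BaseTableMetadata(TableConfig tableConfig) {
    this.hiveStylePartitioningEnabled =
        Boolean.parseBoolean(tableConfig.getHiveStylePartitioningEnable());
    this.urlEncodePartitioningEnabled =
        Boolean.parseBoolean(tableConfig.getUrlEncodePartitioning());
  }
}

// Both metadata implementations would extend the base and read the
// protected fields directly.
class FileSystemBackedSketch extends BaseTableMetadata {
  FileSystemBackedSketch(TableConfig cfg) { super(cfg); }
  boolean hiveStyle() { return hiveStylePartitioningEnabled; }
  boolean urlEncode() { return urlEncodePartitioningEnabled; }
}

public class PartitioningFlagsSketch {
  public static void main(String[] args) {
    FileSystemBackedSketch md = new FileSystemBackedSketch(new TableConfig("true", "false"));
    System.out.println(md.hiveStyle() + " " + md.urlEncode());
    // true false
  }
}
```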
########## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala: ##########
@@ -307,8 +318,20 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
       Seq(new PartitionPath(relativePartitionPathPrefix, staticPartitionColumnNameValuePairs.map(_._2._2.asInstanceOf[AnyRef]).toArray))
     } else {
       // Otherwise, compile extracted partition values (from query predicates) into a sub-path which is a prefix
-      // of the complete partition path, do listing for this prefix-path only
-      listPartitionPaths(Seq(relativePartitionPathPrefix).toList.asJava).asScala
+      // of the complete partition path, do listing for this prefix-path and filter them with partitionPredicates
+      Try {
+        SparkFilterHelper.convertDataType(partitionSchema).asInstanceOf[RecordType]
+      } match {
+        case Success(partitionRecordType) if partitionRecordType.fields().size() == _partitionSchemaFromProperties.size =>
+          val convertedFilters = SparkFilterHelper.convertFilters(
+            partitionColumnPredicates.flatMap {
+              expr => sparkAdapter.translateFilter(expr)
+            })
+          listPartitionPaths(Seq(relativePartitionPathPrefix).toList.asJava, partitionRecordType, convertedFilters).asScala

Review Comment:
   If we encounter an exception, such as in the Conversions.fromPartitionString default case, we should fall back to listing by prefix without filtering.

########## hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java: ##########
@@ -84,6 +96,19 @@ public List<String> getAllPartitionPaths() throws IOException {
     return getPartitionPathWithPathPrefixes(Collections.singletonList(""));
   }

+  @Override
+  public List<String> getPartitionPathByExpression(List<String> relativePathPrefixes,

Review Comment:
   Let's have a standalone test for this method in this class and in HoodieBackedTableMetadata, as this is the crux of this PR.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org