Re: [PR] [HUDI-7291] Pushing Down Partition Pruning Conditions to Column Stats Earlier During Data Skipping [hudi]

via GitHub Mon, 15 Jan 2024 22:17:50 -0800


majian1998 commented on code in PR #10493:
URL: https://github.com/apache/hudi/pull/10493#discussion_r1452969248



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala:
##########
@@ -106,14 +107,23 @@ class ColumnStatsIndexSupport(spark: SparkSession,
    *
    * Please check out scala-doc of the [[transpose]] method explaining this view in more details
    */
-  def loadTransposed[T](targetColumns: Seq[String], shouldReadInMemory: Boolean)(block: DataFrame => T): T = {
+  def loadTransposed[T](targetColumns: Seq[String], shouldReadInMemory: Boolean, prunedPartitionFileNames: Set[String] = Set.empty)(block: DataFrame => T): T = {
     cachedColumnStatsIndexViews.get(targetColumns) match {
       case Some(cachedDF) =>
         block(cachedDF)
 
       case None =>
-        val colStatsRecords: HoodieData[HoodieMetadataColumnStats] =
+        val colStatsRecords: HoodieData[HoodieMetadataColumnStats] = if (prunedPartitionFileNames.isEmpty) {
+          // NOTE: In order to ensure that testing and unexpected logic are normal, judgment logic is added.
           loadColumnStatsIndexRecords(targetColumns, shouldReadInMemory)
+        } else {
+          val filterFunction = new SerializableFunction[HoodieMetadataColumnStats, java.lang.Boolean] {

Review Comment:
   I did not add new tests because the existing data skipping tests are 
already comprehensive enough to cover the modifications made here. I think 
ensuring that the existing tests still pass should suffice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
