Re: [PR] [MINOR] Fix minor issues in tests, lower wait threshold for remote markers [hudi]

via GitHub Wed, 11 Jun 2025 11:37:57 -0700


the-other-tim-brown commented on code in PR #13419:
URL: https://github.com/apache/hudi/pull/13419#discussion_r2140561135



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestRecordLevelIndex.scala:
##########
@@ -229,19 +229,14 @@ class TestRecordLevelIndex extends 
RecordLevelIndexTestBase {
   }
 
   @ParameterizedTest
-  @CsvSource(value = Array(
-    "COPY_ON_WRITE,6", "COPY_ON_WRITE,8", "MERGE_ON_READ,6", "MERGE_ON_READ,8"
-  ))
-  def testRLIWithDelete(tableType: String, tableVersion: Int): Unit = {
-    val hudiOpts = commonOpts ++ Map(
-      DataSourceWriteOptions.TABLE_TYPE.key -> tableType,
-      HoodieTableConfig.VERSION.key() -> tableVersion.toString,
-      HoodieWriteConfig.WRITE_TABLE_VERSION.key() -> tableVersion.toString)
+  @EnumSource(classOf[HoodieTableType])

Review Comment:
   The full suite runs with v6 with `TestRecordLevelIndexTableVersionSix` so we 
don't need to override at this level as well. This lead to 4 repeated test 
cases.



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestPartitionStatsPruning.scala:
##########
@@ -32,9 +32,11 @@ import org.apache.hudi.storage.StoragePath
 
 import org.apache.spark.sql.SaveMode
 import org.junit.jupiter.api.Assertions.assertTrue
+import org.junit.jupiter.api.Tag
 import org.junit.jupiter.params.ParameterizedTest
 import org.junit.jupiter.params.provider.MethodSource
 
+@Tag("functional-b")

Review Comment:
   Not required since this extends a tagged class but this makes it more 
explicit for what is meant to be in each tagged set of tests.



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestRecordLevelIndexTableVersionSix.scala:
##########
@@ -21,6 +21,9 @@ package org.apache.hudi.functional
 import org.apache.hudi.common.table.HoodieTableConfig
 import org.apache.hudi.config.HoodieWriteConfig
 
+import org.junit.jupiter.api.Tag
+
+@Tag("functional")

Review Comment:
   Not required since this extends a tagged class but this makes it more 
explicit for what is meant to be in each tagged set of tests.



##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/HoodieWriterClientTestHarness.java:
##########
@@ -255,6 +255,7 @@ public HoodieWriteConfig.Builder getConfigBuilder(String 
schemaStr, HoodieIndex.
         
.withStorageConfig(HoodieStorageConfig.newBuilder().hfileMaxFileSize(1024 * 
1024).parquetMaxFileSize(1024 * 1024).orcMaxFileSize(1024 * 1024).build())
         .forTable(RAW_TRIPS_TEST_NAME)
         
.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(indexType).build())
+        .withMarkersTimelineServerBasedBatchIntervalMs(10)

Review Comment:
   In many places in the PR, the batch interval is decreased from the default 
of 50ms to 10ms. This shaves the overhead of each write helping trim the 
runtime of our tests that execute a series of writes.



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestColumnStatsIndexWithSQL.scala:
##########
@@ -41,15 +41,16 @@ import org.apache.spark.sql.catalyst.expressions.{And, 
AttributeReference, Bitwi
 import org.apache.spark.sql.catalyst.expressions.Literal.TrueLiteral
 import org.apache.spark.sql.hudi.DataSkippingUtils
 import org.apache.spark.sql.types.StringType
+import org.junit.jupiter.api.{Tag, Test}
 import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, assertTrue}
-import org.junit.jupiter.api.Test
 import org.junit.jupiter.params.ParameterizedTest
 import org.junit.jupiter.params.provider.MethodSource
 
 import java.io.File
 
 import scala.collection.JavaConverters._
 
+@Tag("functional-b")

Review Comment:
   Not required since this extends a tagged class but this makes it more 
explicit for what is meant to be in each tagged set of tests.



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/RecordLevelIndexTestBase.scala:
##########
@@ -35,7 +35,8 @@ import scala.util.Using
 class RecordLevelIndexTestBase extends HoodieStatsIndexTestBase {
   val metadataOpts: Map[String, String] = Map(
     HoodieMetadataConfig.ENABLE.key -> "true",
-    HoodieMetadataConfig.RECORD_INDEX_ENABLE_PROP.key -> "true"
+    HoodieMetadataConfig.RECORD_INDEX_ENABLE_PROP.key -> "true",
+    HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key -> "false"

Review Comment:
   Col stats adds overhead to our tests so disabling for the common case. We 
have some cases that explicitly test the combination of RLI and other indexes 
like: `testRLIWithOtherMetadataPartitions`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [MINOR] Fix minor issues in tests, lower wait threshold for remote markers [hudi]

Reply via email to