yihua commented on code in PR #12602:
URL: https://github.com/apache/hudi/pull/12602#discussion_r1908003760


##########
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/HoodieDataSourceHelpers.java:
##########
@@ -98,6 +106,20 @@ public static String latestCommit(HoodieStorage storage, String basePath) {
     return timeline.lastInstant().get().requestedTime();
   }
 
+  /**
+   * Returns the last successful write operation's completed instant.
+   */
+  @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
+  public static HoodieInstant latestCompletedCommitCompletionTime(FileSystem fs, String basePath) {

Review Comment:
   Similar comment here: consider extracting the functionality shared with `latestCommit` (two methods above) into a common helper.
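   To illustrate the shape of that refactor, here is a minimal sketch. The `HoodieInstant` stand-in and the `TimelineHelpers` class below are hypothetical simplifications (the real Hudi methods take a `HoodieStorage`/`FileSystem` plus a `basePath` and read the timeline from storage); the point is just that both public accessors can delegate to one private lookup instead of duplicating it:

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.Optional;

   public class TimelineHelpers {
     // Hypothetical stand-in for HoodieInstant, reduced to the two fields used here.
     static class HoodieInstant {
       final String requestedTime;
       final String completionTime;
       HoodieInstant(String requestedTime, String completionTime) {
         this.requestedTime = requestedTime;
         this.completionTime = completionTime;
       }
     }

     // Shared lookup: both public accessors delegate here, so the
     // "resolve the latest completed instant" logic lives in one place.
     private static Optional<HoodieInstant> latestCompletedInstant(List<HoodieInstant> timeline) {
       return timeline.isEmpty()
           ? Optional.empty()
           : Optional.of(timeline.get(timeline.size() - 1));
     }

     // Mirrors latestCommit: maps the shared lookup to the requested time.
     static String latestCommit(List<HoodieInstant> timeline) {
       return latestCompletedInstant(timeline)
           .map(i -> i.requestedTime)
           .orElseThrow(IllegalStateException::new);
     }

     // The new accessor: returns the instant itself from the same lookup.
     static HoodieInstant latestCompletedCommit(List<HoodieInstant> timeline) {
       return latestCompletedInstant(timeline).orElseThrow(IllegalStateException::new);
     }

     public static void main(String[] args) {
       List<HoodieInstant> timeline = Arrays.asList(
           new HoodieInstant("001", "005"),
           new HoodieInstant("002", "006"));
       System.out.println(latestCommit(timeline));                         // 002
       System.out.println(latestCompletedCommit(timeline).completionTime); // 006
     }
   }
   ```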



##########
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/HoodieDataSourceHelpers.java:
##########
@@ -76,14 +76,22 @@ public static List<String> listCommitsSince(HoodieStorage storage, String basePa
 
  // this is used in the integration test script: docker/demo/sparksql-incremental.commands
  public static List<String> listCompletionTimeSince(FileSystem fs, String basePath,
-      String instantTimestamp) {
+                                                     String instantTimestamp) {

Review Comment:
   Could this method build on `listCompletedInstantSince` (returning `Stream<HoodieInstant>`) and map the result, to avoid duplicating the same filtering logic?
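   A minimal sketch of that reuse, with hypothetical stand-in types (the real methods take a `FileSystem`/`HoodieStorage` and `basePath` and read the timeline from storage, so the signatures below are assumptions): `listCompletionTimeSince` becomes a thin `map` + `collect` over the `Stream`-returning method, so the "completed since timestamp" filter exists only once.

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.stream.Collectors;
   import java.util.stream.Stream;

   public class CompletionTimeHelpers {
     // Hypothetical stand-in for HoodieInstant, reduced to the one field used here.
     static class HoodieInstant {
       final String completionTime;
       HoodieInstant(String completionTime) { this.completionTime = completionTime; }
     }

     // The Stream-returning method the review suggests reusing (assumed shape):
     // all completed instants strictly after the given timestamp.
     static Stream<HoodieInstant> listCompletedInstantSince(List<HoodieInstant> timeline,
                                                            String instantTimestamp) {
       return timeline.stream().filter(i -> i.completionTime.compareTo(instantTimestamp) > 0);
     }

     // listCompletionTimeSince then just maps instants to completion times.
     static List<String> listCompletionTimeSince(List<HoodieInstant> timeline,
                                                 String instantTimestamp) {
       return listCompletedInstantSince(timeline, instantTimestamp)
           .map(i -> i.completionTime)
           .collect(Collectors.toList());
     }

     public static void main(String[] args) {
       List<HoodieInstant> timeline = Arrays.asList(
           new HoodieInstant("001"), new HoodieInstant("002"), new HoodieInstant("003"));
       System.out.println(listCompletionTimeSince(timeline, "001")); // [002, 003]
     }
   }
   ```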



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSparkSqlCoreFlow.scala:
##########
@@ -30,16 +30,15 @@ import org.apache.hudi.common.testutils.RawTripTestPayload.recordsToStrings
 import org.apache.hudi.hadoop.fs.HadoopFSUtils
 import org.apache.hudi.keygen.NonpartitionedKeyGenerator
 import org.apache.hudi.testutils.HoodieClientTestUtils.createMetaClient
-import org.apache.hudi.{DataSourceReadOptions, HoodieSparkUtils}
-
+import org.apache.hudi.DataSourceReadOptions

Review Comment:
   nit: keep import grouping



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSparkSqlCoreFlow.scala:
##########
@@ -91,87 +90,101 @@ class TestSparkSqlCoreFlow extends HoodieSparkSqlTestBase {
    val dataGen = new HoodieTestDataGenerator(HoodieTestDataGenerator.TRIP_NESTED_EXAMPLE_SCHEMA, 0xDEED)
 
     //Bulk insert first set of records
-    val inputDf0 = generateInserts(dataGen, "000", 100).cache()
+    val inputDf0 = generateInserts(dataGen, "000", 10).cache()

Review Comment:
   keep the record count the same as before



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
