[GitHub] [hudi] xushiyan merged pull request #9017: [HUDI-6393] Enable MOR support for Record index with functional test cases

2023-06-29 Thread via GitHub


xushiyan merged PR #9017:
URL: https://github.com/apache/hudi/pull/9017


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xushiyan commented on pull request #9017: [HUDI-6393] Enable MOR support for Record index with functional test cases

2023-06-29 Thread via GitHub


xushiyan commented on PR #9017:
URL: https://github.com/apache/hudi/pull/9017#issuecomment-1613510452

   CI is timing out as expected. The newly added test case is passing. Will land this now.





[GitHub] [hudi] nsivabalan merged pull request #9089: [MINOR] Increase timeout for Azure CI: UT spark-datasource to 240 minutes

2023-06-29 Thread via GitHub


nsivabalan merged PR #9089:
URL: https://github.com/apache/hudi/pull/9089





[hudi] branch master updated (8def3e68ae5 -> 05435bb0344)

2023-06-29 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 8def3e68ae5 [MINOR] Improve CollectionUtils helper methods (#9088)
 add 05435bb0344 [MINOR] Increase timeout for Azure CI: UT spark-datasource to 240 minutes (#9089)

No new revisions were added by this update.

Summary of changes:
 azure-pipelines-20230430.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[GitHub] [hudi] hudi-bot commented on pull request #9083: PKLess Merge Into

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9083:
URL: https://github.com/apache/hudi/pull/9083#issuecomment-1613500714

   
   ## CI report:
   
   * be6801e9ca41f00576a511c7d3ffe144e90717ee Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18179)
   * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN
   * 767eb9cc26d98ed8e64632f98ab688aa4145e5aa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18204)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9058:
URL: https://github.com/apache/hudi/pull/9058#issuecomment-1613500487

   
   ## CI report:
   
   * 1697d1bfa095ca16a9361e3728a77331d3a28037 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18195)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9083: PKLess Merge Into

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9083:
URL: https://github.com/apache/hudi/pull/9083#issuecomment-1613490095

   
   ## CI report:
   
   * be6801e9ca41f00576a511c7d3ffe144e90717ee Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18179)
   * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN
   * 767eb9cc26d98ed8e64632f98ab688aa4145e5aa UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613489865

   
   ## CI report:
   
   * 4775dce07f2f3237b32f22b360f3423b1eafce85 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18191)
   * af66542fd96990611c79e90c943a18341442 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18203)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613476228

   
   ## CI report:
   
   * 4775dce07f2f3237b32f22b360f3423b1eafce85 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18191)
   * af66542fd96990611c79e90c943a18341442 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9083: PKLess Merge Into

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9083:
URL: https://github.com/apache/hudi/pull/9083#issuecomment-1613476471

   
   ## CI report:
   
   * be6801e9ca41f00576a511c7d3ffe144e90717ee Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18179)
   * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9017: [HUDI-6393] Enable MOR support for Record index with functional test cases

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9017:
URL: https://github.com/apache/hudi/pull/9017#issuecomment-1613475988

   
   ## CI report:
   
   * ceffe7d8146f48e1c6c083613646463c1404a77f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18194)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] BBency opened a new issue, #9094: Async Clustering failing with errors for MOR table

2023-06-29 Thread via GitHub


BBency opened a new issue, #9094:
URL: https://github.com/apache/hudi/issues/9094

   **Problem Description**
   
   We have a MOR table which is partitioned by yearmonth (MM). We would like to trigger async clustering after the end-of-day compaction so that we can stitch small files together into larger files. Async clustering for the table is failing. Below are the different approaches I tried and the error messages I got.
   
   **Hudi Config Used**
   
   ```
   "hoodie.table.name" -> hudiTableName,
   "hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
   "hoodie.datasource.write.precombine.field" -> preCombineKey,
   "hoodie.datasource.write.recordkey.field" -> recordKey,
   "hoodie.datasource.write.operation" -> writeOperation,
   "hoodie.datasource.write.row.writer.enable" -> "true",
   "hoodie.datasource.write.reconcile.schema" -> "true",
   "hoodie.datasource.write.partitionpath.field" -> partitionColumnName,
   "hoodie.datasource.write.hive_style_partitioning" -> "true",
   "hoodie.bulkinsert.sort.mode" -> "GLOBAL_SORT",
   "hoodie.datasource.hive_sync.enable" -> "true",
   "hoodie.datasource.hive_sync.table" -> hudiTableName,
   "hoodie.datasource.hive_sync.database" -> databaseName,
   "hoodie.datasource.hive_sync.partition_fields" -> partitionColumnName,
   "hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
   "hoodie.datasource.hive_sync.use_jdbc" -> "false",
   "hoodie.combine.before.upsert" -> "true",
   "hoodie.index.type" -> "BLOOM",
   "spark.hadoop.parquet.avro.write-old-list-structure" -> "false",
   "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
   "hoodie.compact.inline" -> "false",
   "hoodie.compact.schedule.inline" -> "true",
   "hoodie.compact.inline.trigger.strategy" -> "NUM_COMMITS",
   "hoodie.compact.inline.max.delta.commits" -> "5",
   "hoodie.cleaner.policy" -> "KEEP_LATEST_COMMITS",
   "hoodie.cleaner.commits.retained" -> "3",
   "hoodie.clustering.async.enabled" -> "true",
   "hoodie.clustering.async.max.commits" -> "2",
   "hoodie.clustering.execution.strategy.class" -> "org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy",
   "hoodie.clustering.plan.strategy.sort.columns" -> recordKey,
   "hoodie.clustering.plan.strategy.small.file.limit" -> "67108864",
   "hoodie.clustering.plan.strategy.target.file.max.bytes" -> "134217728",
   "hoodie.clustering.plan.strategy.max.bytes.per.group" -> "2147483648",
   "hoodie.clustering.plan.strategy.max.num.groups" -> "150",
   "hoodie.clustering.preserve.commit.metadata" -> "true"
   ```
   
   **Approaches Tried**
   
   1. Triggered a clustering job with running mode as scheduleAndExecute
   **Code Used**
   
   ```
   val hudiClusterConfig = new HoodieClusteringJob.Config
   hudiClusterConfig.basePath = 
   hudiClusterConfig.tableName = 
   hudiClusterConfig.runningMode = "scheduleAndExecute"
   hudiClusterConfig.retryLastFailedClusteringJob = true
   val configList: util.List[String] = new util.ArrayList()
   configList.add("hoodie.clustering.async.enabled=true")
   configList.add("hoodie.clustering.async.max.commits=2")
   configList.add("hoodie.clustering.execution.strategy.class=org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy")
   configList.add("hoodie.clustering.plan.strategy.sort.columns=")
   configList.add("hoodie.clustering.plan.strategy.small.file.limit=67108864")
   configList.add("hoodie.clustering.plan.strategy.target.file.max.bytes=134217728")
   configList.add("hoodie.clustering.plan.strategy.max.bytes.per.group=2147483648")
   configList.add("hoodie.clustering.plan.strategy.max.num.groups=150")
   configList.add("hoodie.clustering.preserve.commit.metadata=true")
   hudiClusterConfig.configs = configList
   val hudiClusterJob = new HoodieClusteringJob(jsc, hudiClusterConfig)
   val clusterStatus = hudiClusterJob.cluster(1)
   println(clusterStatus)
   ```
   
**Stacktrace**
   
   ShuffleMapStage 87 (sortBy at RDDCustomColumnsSortPartitioner.java:64) failed in 1.098 s due to Job aborted due to stage failure: task 0.0 in stage 28.0 (TID 367) had a not serializable result: org.apache.avro.generic.GenericData$Record
   Serialization stack:
   - object not serializable (class: org.apache.avro.generic.GenericData$Record, value:
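   `GenericData$Record` does not implement `java.io.Serializable`, which is why the shuffle in `RDDCustomColumnsSortPartitioner` fails here. A common workaround for this class of error (an assumption, not a confirmed fix for this report) is to run the job with Kryo serialization enabled. A minimal sketch of the relevant Spark confs, rendered as `spark-submit` flags:

   ```python
   # Sketch only: Kryo serialization is a common workaround when a shuffle
   # fails with "not serializable: org.apache.avro.generic.GenericData$Record".
   # The conf keys are standard Spark settings; whether they resolve this
   # particular clustering failure is an assumption.
   kryo_confs = {
       "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
       # Registration is optional; leaving it off trades speed for convenience.
       "spark.kryo.registrationRequired": "false",
   }

   def to_spark_submit_args(confs):
       """Render conf entries as spark-submit --conf flags, sorted for stability."""
       return [f"--conf {k}={v}" for k, v in sorted(confs.items())]

   for flag in to_spark_submit_args(kryo_confs):
       print(flag)
   ```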

   2. Used the procedure run_clustering to schedule and trigger clustering. We found that the replacecommit created through the procedure run contained less data than the one created when scheduled from the code in approach 1.
   **Code Used**

   ```
   query_run_clustering = f"call run_clustering(path => '{path}')"
   spark_df_run_clustering = spark.sql(query_run_clustering)
   spark_df_run_clustering.show()
   ```
   
   **Stacktrace**
   
   An 

[GitHub] [hudi] xushiyan commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


xushiyan commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613418110

   Manually verified the flow 0.13.1 -> 0.14.0-SNAPSHOT (this PR)
   
   before upgrade
   
   ```
   hoodie.table.version=5
   hoodie.table.metadata.partitions=files
   ```
   
   upgrade
   
   ```
   ./hudi-cli.sh
   connect --path /tmp/hudi_trips_13_1_to_14_0_COPY_ON_WRITE
   upgrade table --toVersion 6 --sparkMaster 'local[2]'
   ```
   
   after upgrade
   
   ```
   hoodie.table.version=6
   hoodie.table.metadata.partitions=files
   ```
   
   write data with RLI enabled
   
   ```
   hoodie.table.version=6
   hoodie.table.metadata.partitions=files,record_index
   ```
   
   RLI partition and hfiles created
   
   downgrade
   
   ```
   downgrade table --toVersion 5 --sparkMaster 'local[2]'
   ```
   
   after downgrade
   
   ```
   hoodie.table.version=5
   hoodie.table.metadata.partitions=files
   ```
   
   RLI partition is removed
   
   ```
   ➜ ll /tmp/hudi_trips_13_1_to_14_0_COPY_ON_WRITE/.hoodie/metadata/record_index
   ls: /tmp/hudi_trips_13_1_to_14_0_COPY_ON_WRITE/.hoodie/metadata/record_index: No such file or directory
   ```
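   The version flips in the walkthrough above can also be checked programmatically. A small sketch (a hypothetical helper, not part of Hudi) that parses the `hoodie.table.version` entry out of the text of `.hoodie/hoodie.properties`:

   ```python
   # Hypothetical helper: extract the table version from the contents of
   # .hoodie/hoodie.properties, as inspected manually in the walkthrough above.
   def read_table_version(props_text: str):
       for line in props_text.splitlines():
           key, sep, value = line.partition("=")
           if sep and key.strip() == "hoodie.table.version":
               return int(value.strip())
       return None

   after_upgrade = "hoodie.table.version=6\nhoodie.table.metadata.partitions=files,record_index"
   print(read_table_version(after_upgrade))  # 6
   ```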
   
   
   





[GitHub] [hudi] nsivabalan commented on issue #9079: [SUPPORT] Hudi delete not working when using UuidKeyGenerator

2023-06-29 Thread via GitHub


nsivabalan commented on issue #9079:
URL: https://github.com/apache/hudi/issues/9079#issuecomment-1613408548

   This is a known limitation of the UUID key generator: it is generally meant to be used only for immutable data.
   With 0.14.0, we are adding pk-less (primary-key-less) tables, where you can use Spark SQL DELETE to delete records. This is coming in 0.14.0; there is no such support in prior versions.
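   To make the 0.14.0 path concrete, here is a minimal sketch of the Spark SQL DELETE mentioned above. The table and column names are invented for illustration; in a real session the statement would be passed to `spark.sql(...)` against a pk-less Hudi table:

   ```python
   # Hypothetical illustration: on a 0.14.0 pk-less table, deletes can be
   # expressed as plain Spark SQL. Table and column names are made up.
   def build_delete_sql(table: str, predicate: str) -> str:
       return f"DELETE FROM {table} WHERE {predicate}"

   stmt = build_delete_sql("hudi_events", "event_uuid = 'abc-123'")
   print(stmt)  # DELETE FROM hudi_events WHERE event_uuid = 'abc-123'
   ```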





[GitHub] [hudi] noahtaite commented on issue #9067: [SUPPORT] Manual Glue sync for large, highly partitioned table failing

2023-06-29 Thread via GitHub


noahtaite commented on issue #9067:
URL: https://github.com/apache/hudi/issues/9067#issuecomment-1613377080

   Hello @danny0405 @ad1happy2go, I can confirm 0.13.1 works nicely, as the HMS sync mode now supports batching and boolean values (conditional sync). Thank you for the support!





[GitHub] [hudi] noahtaite closed issue #9067: [SUPPORT] Manual Glue sync for large, highly partitioned table failing

2023-06-29 Thread via GitHub


noahtaite closed issue #9067: [SUPPORT] Manual Glue sync for large, highly partitioned table failing
URL: https://github.com/apache/hudi/issues/9067





[GitHub] [hudi] gamblewin opened a new issue, #9093: [SUPPORT] Is it allowed using Flink Table API sqlQuery() to read data from hudi tables?

2023-06-29 Thread via GitHub


gamblewin opened a new issue, #9093:
URL: https://github.com/apache/hudi/issues/9093

   **Describe the problem you faced**
   
   I'm trying to use the Flink Table API sqlQuery() to read data from a Hudi table, but it is not working. Am I doing something wrong, or does Hudi not support this way of querying data?
   
   **Code**
   ```java
   sEnv = StreamExecutionEnvironment.getExecutionEnvironment();
   sTableEnv = StreamTableEnvironment.create(sEnv);
   sEnv.setParallelism(1);
   sEnv.enableCheckpointing(3000);
   // create table
   String createTableSql = "create table dept(\n" +
   "  dept_id BIGINT PRIMARY KEY NOT ENFORCED,\n" +
   "  dept_name varchar(10),\n" +
   "  ts timestamp(3)\n" +
   ")\n" +
   "with (\n" +
   "  'connector' = 'hudi',\n" +
   "  'path' = 'hdfs://localhost:9000/hudi/dept',\n" +
   "  'table.type' = 'MERGE_ON_READ'\n" +
   ")";
   sTableEnv.executeSql(createTableSql);
   // insert data
   sTableEnv.executeSql("insert into dept values (1, 'a', NOW()), (2, 'b', NOW())");
   // query data
   Table table = sTableEnv.sqlQuery("select * from dept");
   DataStream dataStream = sTableEnv.toDataStream(table);
   // there's nothing to print
   dataStream.print();
   ```
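   One thing worth checking (an assumption about the cause, not a confirmed diagnosis): in a continuously running job, the Hudi Flink connector only tails new data when streaming read is enabled on the source table. A sketch of the table options with the streaming-read settings added, rendered from Python for brevity (the option names come from Hudi's Flink documentation; verify them against your Hudi version):

   ```python
   # Sketch: Hudi Flink source options with streaming read enabled.
   # 'read.streaming.enabled' / 'read.streaming.check-interval' are documented
   # Hudi Flink options; confirm them for the Hudi version in use.
   table_options = {
       "connector": "hudi",
       "path": "hdfs://localhost:9000/hudi/dept",
       "table.type": "MERGE_ON_READ",
       "read.streaming.enabled": "true",       # continuously poll for new commits
       "read.streaming.check-interval": "4",   # seconds between polls
   }

   def render_with_clause(options: dict) -> str:
       """Render options as the WITH (...) clause of a Flink CREATE TABLE."""
       body = ",\n".join(f"  '{k}' = '{v}'" for k, v in options.items())
       return f"with (\n{body}\n)"

   print(render_with_clause(table_options))
   ```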
   
   **Environment Description**
   
   * Hudi version : 1.12.0
   
   * Hadoop version : 3.1.3
   
   * Flink version: 1.13.6
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9038:
URL: https://github.com/apache/hudi/pull/9038#issuecomment-1613357939

   
   ## CI report:
   
   * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN
   * 34f8823f48712c57058bc37c8936a276c1457557 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18187) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18193) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18188)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] ad1happy2go commented on issue #9086: [SUPPORT]How to build with scala 2.11 for spark and scala2.12 for flink

2023-06-29 Thread via GitHub


ad1happy2go commented on issue #9086:
URL: https://github.com/apache/hudi/issues/9086#issuecomment-1613339717

   @bigdata-spec I don't think we can build with different Scala versions in a single build. You may need to build it twice and then use the Spark and Flink jars from the separate artifacts.





[GitHub] [hudi] ad1happy2go commented on issue #9091: [BUG] Use NonpartitionedKeyGenerator WriteOperationType BULK_INSERT and UPSERT get different _hoodie_record_key format

2023-06-29 Thread via GitHub


ad1happy2go commented on issue #9091:
URL: https://github.com/apache/hudi/issues/9091#issuecomment-1613328306

   @lipusheng This is a known issue that was fixed in Hudi 0.13.x.
   
   Refer to this GitHub issue: https://github.com/apache/hudi/issues/8981





[GitHub] [hudi] hudi-bot commented on pull request #9082: [HUDI-6445] Distribute spark ds func tests

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9082:
URL: https://github.com/apache/hudi/pull/9082#issuecomment-1613268792

   
   ## CI report:
   
   * c529c624afdca331514a2bdfb78cc6e18ab9f57a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18185)
   * 474ce7e9a78909fe90b0641f7be1b059084bb11a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18202)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9082: [HUDI-6445] Distribute spark ds func tests

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9082:
URL: https://github.com/apache/hudi/pull/9082#issuecomment-1613254158

   
   ## CI report:
   
   * c529c624afdca331514a2bdfb78cc6e18ab9f57a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18185)
   * 474ce7e9a78909fe90b0641f7be1b059084bb11a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9041:
URL: https://github.com/apache/hudi/pull/9041#issuecomment-1613253794

   
   ## CI report:
   
   * b681df04a7ad0febbcd9235622c2ee7f98759cf9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18082)
   * 2f139383c54f93669342539af77dca9b3a352be3 UNKNOWN
   * a1458e17e5749a89948be8f60387eeecd4c0f87c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18201)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] KenjiFujima commented on pull request #8933: [HUDI-5329] Spark reads table error when Flink creates table without record key and primary key

2023-06-29 Thread via GitHub


KenjiFujima commented on PR #8933:
URL: https://github.com/apache/hudi/pull/8933#issuecomment-1613251280

   @danny0405, I have addressed the above comments. PTAL.





[GitHub] [hudi] xushiyan commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


xushiyan commented on code in PR #9041:
URL: https://github.com/apache/hudi/pull/9041#discussion_r1246658153


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -459,11 +459,6 @@ private Pair> 
initializeRecordIndexPartition()
 final HoodieMetadataFileSystemView fsView = new 
HoodieMetadataFileSystemView(dataMetaClient,
 dataMetaClient.getActiveTimeline(), metadata);
 
-// MOR tables are not supported
-if (!dataMetaClient.getTableType().equals(HoodieTableType.COPY_ON_WRITE)) {
-  throw new HoodieMetadataException("Only COW tables are supported with 
record index");
-}
-

Review Comment:
   This change will be included in the functional test PR (which should be merged first). I include it here for CI to pass. When merging, this diff should be auto-resolved.






[GitHub] [hudi] xushiyan commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


xushiyan commented on code in PR #9041:
URL: https://github.com/apache/hudi/pull/9041#discussion_r1246655960


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##
@@ -310,6 +312,56 @@ public static  HoodieData> 
mergeForPartitionUpdates(
 return Arrays.asList(deleteRecord, getTaggedRecord(merged, 
Option.empty())).iterator();
   }
 });
-return taggedUpdatingRecords.union(newRecords);
+return taggedUpdatingRecords.union(taggedNewRecords);
+  }
+
+  public static  HoodieData> tagGlobalLocationBackToRecords(
+  HoodieData> incomingRecords,
+  HoodiePairData 
keyAndExistingLocations,
+  boolean mayContainDuplicateLookup,
+  boolean shouldUpdatePartitionPath,
+  HoodieWriteConfig config,
+  HoodieTable table) {
+final HoodieRecordMerger merger = config.getRecordMerger();
+
+HoodiePairData> keyAndIncomingRecords =
+incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), 
record));
+
+// Pair of incoming record and the global location if meant for merged 
lookup in later stage
+HoodieData, Option>> 
incomingRecordsAndLocations
+= keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values()
+.map(v -> {
+  final HoodieRecord incomingRecord = v.getLeft();
+  Option currentLocOpt = 
Option.ofNullable(v.getRight().orElse(null));
+  if (currentLocOpt.isPresent()) {
+HoodieRecordGlobalLocation currentLoc = currentLocOpt.get();
+boolean shouldPerformMergedLookUp = mayContainDuplicateLookup
+|| !Objects.equals(incomingRecord.getPartitionPath(), 
currentLoc.getPartitionPath());
+if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) {
+  return Pair.of(incomingRecord, currentLocOpt);
+} else {
+  // - When update partition path is set to false,
+  //   the incoming record will be tagged to the existing record's 
partition regardless of being equal or not.
+  // - When update partition path is set to true,
+  //   the incoming record will be tagged to the existing record's 
partition
+  //   when partition is not updated and the look-up won't have 
duplicates (e.g. COW, or using RLI).
+  return Pair.of((HoodieRecord) getTaggedRecord(

Review Comment:
   This previously had Option.empty() as the right side of the pair, so those records won't be merged-lookup candidates.






[GitHub] [hudi] codope commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


codope commented on code in PR #9041:
URL: https://github.com/apache/hudi/pull/9041#discussion_r1246655286


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##
@@ -310,6 +312,56 @@ public static <R> HoodieData<HoodieRecord<R>> mergeForPartitionUpdates(
             return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator();
           }
         });
-    return taggedUpdatingRecords.union(newRecords);
+    return taggedUpdatingRecords.union(taggedNewRecords);
+  }
+
+  public static <R> HoodieData<HoodieRecord<R>> tagGlobalLocationBackToRecords(
+      HoodieData<HoodieRecord<R>> incomingRecords,
+      HoodiePairData<String, HoodieRecordGlobalLocation> keyAndExistingLocations,
+      boolean mayContainDuplicateLookup,
+      boolean shouldUpdatePartitionPath,
+      HoodieWriteConfig config,
+      HoodieTable table) {
+    final HoodieRecordMerger merger = config.getRecordMerger();
+
+    HoodiePairData<String, HoodieRecord<R>> keyAndIncomingRecords =
+        incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record));
+
+    // Pair of incoming record and the global location if meant for merged lookup in later stage
+    HoodieData<Pair<HoodieRecord<R>, Option<HoodieRecordGlobalLocation>>> incomingRecordsAndLocations
+        = keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values()
+        .map(v -> {
+          final HoodieRecord<R> incomingRecord = v.getLeft();
+          Option<HoodieRecordGlobalLocation> currentLocOpt = Option.ofNullable(v.getRight().orElse(null));
+          if (currentLocOpt.isPresent()) {
+            HoodieRecordGlobalLocation currentLoc = currentLocOpt.get();
+            boolean shouldPerformMergedLookUp = mayContainDuplicateLookup
+                || !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath());
+            if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) {
+              return Pair.of(incomingRecord, currentLocOpt);
+            } else {
+              // - When update partition path is set to false,
+              //   the incoming record will be tagged to the existing record's partition regardless of being equal or not.
+              // - When update partition path is set to true,
+              //   the incoming record will be tagged to the existing record's partition
+              //   when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI).
+              return Pair.of((HoodieRecord<R>) getTaggedRecord(
+                      createNewHoodieRecord(incomingRecord, currentLoc, merger), Option.of(currentLoc)),
+                  Option.empty());
+            }
+          } else {
+            return Pair.of(getTaggedRecord(incomingRecord, Option.empty()), Option.empty());
+          }
+        });
+    return shouldUpdatePartitionPath
+        ? mergeForPartitionUpdatesIfNeeded(incomingRecordsAndLocations, config, table)

Review Comment:
   yeah we need to consider duplicates, otherwise we'll have to special-case 
for RLI.
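
   The decision being discussed — when a tagged record must go through the merged look-up path — can be sketched standalone. This is an illustrative snippet only; the method name and parameters mirror the PR's `shouldPerformMergedLookUp` expression but this is not the actual Hudi class.

   ```java
   import java.util.Objects;

   public class MergedLookupDecision {

     // A record needs the merged lookup when the index may return duplicate
     // locations for a key (e.g. MOR look-ups) or when its partition path
     // differs from the partition recorded in the index.
     static boolean shouldPerformMergedLookup(boolean mayContainDuplicateLookup,
                                              String incomingPartition,
                                              String existingPartition) {
       return mayContainDuplicateLookup
           || !Objects.equals(incomingPartition, existingPartition);
     }

     public static void main(String[] args) {
       // Same partition, no duplicate look-ups: tag directly, skip merged lookup.
       System.out.println(shouldPerformMergedLookup(false, "2023/06/29", "2023/06/29")); // false
       // Partition changed: merged lookup required.
       System.out.println(shouldPerformMergedLookup(false, "2023/06/30", "2023/06/29")); // true
     }
   }
   ```

   Under this formulation, forcing `mayContainDuplicateLookup` to true is exactly the special-casing the comment wants to avoid.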



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -459,11 +459,6 @@ private Pair<Integer, HoodieData<HoodieRecord>> initializeRecordIndexPartition()
     final HoodieMetadataFileSystemView fsView = new HoodieMetadataFileSystemView(dataMetaClient,
         dataMetaClient.getActiveTimeline(), metadata);
 
-    // MOR tables are not supported
-    if (!dataMetaClient.getTableType().equals(HoodieTableType.COPY_ON_WRITE)) {
-      throw new HoodieMetadataException("Only COW tables are supported with record index");
-    }
-

Review Comment:
   Would prefer to land it in a separate commit. I guess #9017 will land 
earlier anyway.






[GitHub] [hudi] xushiyan commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


xushiyan commented on code in PR #9041:
URL: https://github.com/apache/hudi/pull/9041#discussion_r1241059669


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##
@@ -310,6 +312,56 @@ public static <R> HoodieData<HoodieRecord<R>> mergeForPartitionUpdates(
             return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator();
           }
         });
-    return taggedUpdatingRecords.union(newRecords);
+    return taggedUpdatingRecords.union(taggedNewRecords);
+  }
+
+  public static <R> HoodieData<HoodieRecord<R>> tagGlobalLocationBackToRecords(
+      HoodieData<HoodieRecord<R>> incomingRecords,
+      HoodiePairData<String, HoodieRecordGlobalLocation> keyAndExistingLocations,
+      boolean mayContainDuplicateLookup,
+      boolean shouldUpdatePartitionPath,
+      HoodieWriteConfig config,
+      HoodieTable table) {
+    final HoodieRecordMerger merger = config.getRecordMerger();
+
+    HoodiePairData<String, HoodieRecord<R>> keyAndIncomingRecords =
+        incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record));
+
+    // Pair of incoming record and the global location if meant for merged lookup in later stage
+    HoodieData<Pair<HoodieRecord<R>, Option<HoodieRecordGlobalLocation>>> incomingRecordsAndLocations
+        = keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values()
+        .map(v -> {
+          final HoodieRecord<R> incomingRecord = v.getLeft();
+          Option<HoodieRecordGlobalLocation> currentLocOpt = Option.ofNullable(v.getRight().orElse(null));
+          if (currentLocOpt.isPresent()) {
+            HoodieRecordGlobalLocation currentLoc = currentLocOpt.get();
+            boolean shouldPerformMergedLookUp = mayContainDuplicateLookup
+                || !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath());
+            if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) {
+              return Pair.of(incomingRecord, currentLocOpt);
+            } else {
+              // - When update partition path is set to false,
+              //   the incoming record will be tagged to the existing record's partition regardless of being equal or not.
+              // - When update partition path is set to true,
+              //   the incoming record will be tagged to the existing record's partition
+              //   when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI).
+              return Pair.of((HoodieRecord<R>) getTaggedRecord(

Review Comment:
   new record creation needs optimization; i have not finished it yet.



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##
@@ -310,6 +312,56 @@ public static <R> HoodieData<HoodieRecord<R>> mergeForPartitionUpdates(
             return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator();
           }
         });
-    return taggedUpdatingRecords.union(newRecords);
+    return taggedUpdatingRecords.union(taggedNewRecords);
+  }
+
+  public static <R> HoodieData<HoodieRecord<R>> tagGlobalLocationBackToRecords(
+      HoodieData<HoodieRecord<R>> incomingRecords,
+      HoodiePairData<String, HoodieRecordGlobalLocation> keyAndExistingLocations,
+      boolean mayContainDuplicateLookup,
+      boolean shouldUpdatePartitionPath,
+      HoodieWriteConfig config,
+      HoodieTable table) {
+    final HoodieRecordMerger merger = config.getRecordMerger();
+
+    HoodiePairData<String, HoodieRecord<R>> keyAndIncomingRecords =
+        incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record));
+
+    // Pair of incoming record and the global location if meant for merged lookup in later stage
+    HoodieData<Pair<HoodieRecord<R>, Option<HoodieRecordGlobalLocation>>> incomingRecordsAndLocations
+        = keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values()
+        .map(v -> {
+          final HoodieRecord<R> incomingRecord = v.getLeft();
+          Option<HoodieRecordGlobalLocation> currentLocOpt = Option.ofNullable(v.getRight().orElse(null));
+          if (currentLocOpt.isPresent()) {
+            HoodieRecordGlobalLocation currentLoc = currentLocOpt.get();
+            boolean shouldPerformMergedLookUp = mayContainDuplicateLookup
+                || !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath());
+            if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) {
+              return Pair.of(incomingRecord, currentLocOpt);
+            } else {
+              // - When update partition path is set to false,
+              //   the incoming record will be tagged to the existing record's partition regardless of being equal or not.
+              // - When update partition path is set to true,
+              //   the incoming record will be tagged to the existing record's partition
+              //   when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI).
+              return Pair.of((HoodieRecord<R>) getTaggedRecord(

Review Comment:
   refactored




[GitHub] [hudi] xushiyan commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


xushiyan commented on code in PR #9041:
URL: https://github.com/apache/hudi/pull/9041#discussion_r1241059710


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##
@@ -310,6 +312,56 @@ public static <R> HoodieData<HoodieRecord<R>> mergeForPartitionUpdates(
             return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator();
           }
         });
-    return taggedUpdatingRecords.union(newRecords);
+    return taggedUpdatingRecords.union(taggedNewRecords);
+  }
+
+  public static <R> HoodieData<HoodieRecord<R>> tagGlobalLocationBackToRecords(
+      HoodieData<HoodieRecord<R>> incomingRecords,
+      HoodiePairData<String, HoodieRecordGlobalLocation> keyAndExistingLocations,
+      boolean mayContainDuplicateLookup,
+      boolean shouldUpdatePartitionPath,
+      HoodieWriteConfig config,
+      HoodieTable table) {
+    final HoodieRecordMerger merger = config.getRecordMerger();
+
+    HoodiePairData<String, HoodieRecord<R>> keyAndIncomingRecords =
+        incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record));
+
+    // Pair of incoming record and the global location if meant for merged lookup in later stage
+    HoodieData<Pair<HoodieRecord<R>, Option<HoodieRecordGlobalLocation>>> incomingRecordsAndLocations
+        = keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values()
+        .map(v -> {
+          final HoodieRecord<R> incomingRecord = v.getLeft();
+          Option<HoodieRecordGlobalLocation> currentLocOpt = Option.ofNullable(v.getRight().orElse(null));
+          if (currentLocOpt.isPresent()) {
+            HoodieRecordGlobalLocation currentLoc = currentLocOpt.get();
+            boolean shouldPerformMergedLookUp = mayContainDuplicateLookup
+                || !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath());
+            if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) {
+              return Pair.of(incomingRecord, currentLocOpt);
+            } else {
+              // - When update partition path is set to false,
+              //   the incoming record will be tagged to the existing record's partition regardless of being equal or not.
+              // - When update partition path is set to true,
+              //   the incoming record will be tagged to the existing record's partition
+              //   when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI).
+              return Pair.of((HoodieRecord<R>) getTaggedRecord(
+                      createNewHoodieRecord(incomingRecord, currentLoc, merger), Option.of(currentLoc)),
+                  Option.empty());
+            }
+          } else {
+            return Pair.of(getTaggedRecord(incomingRecord, Option.empty()), Option.empty());
+          }
+        });
+    return shouldUpdatePartitionPath
+        ? mergeForPartitionUpdatesIfNeeded(incomingRecordsAndLocations, config, table)
+        : incomingRecordsAndLocations.map(Pair::getLeft);
+  }
+
+  public static HoodieRecord createNewHoodieRecord(HoodieRecord oldRecord, HoodieRecordGlobalLocation location, HoodieRecordMerger merger) {
+    HoodieKey recordKey = new HoodieKey(oldRecord.getRecordKey(), location.getPartitionPath());
+    return merger.getRecordType() == HoodieRecordType.AVRO

Review Comment:
   new record creation needs optimization; i have not finished it yet.






[GitHub] [hudi] codope commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


codope commented on code in PR #9041:
URL: https://github.com/apache/hudi/pull/9041#discussion_r1246648546


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieGlobalSimpleIndex.java:
##
@@ -72,85 +68,37 @@ public <R> HoodieData<HoodieRecord<R>> tagLocation(
   protected <R> HoodieData<HoodieRecord<R>> tagLocationInternal(
       HoodieData<HoodieRecord<R>> inputRecords, HoodieEngineContext context,
       HoodieTable hoodieTable) {
-
-    HoodiePairData<String, HoodieRecord<R>> keyedInputRecords =
-        inputRecords.mapToPair(entry -> new ImmutablePair<>(entry.getRecordKey(), entry));
-    HoodiePairData<String, HoodieRecordGlobalLocation> allRecordLocationsInTable =
-        fetchAllRecordLocations(context, hoodieTable, config.getGlobalSimpleIndexParallelism());
-    return getTaggedRecords(keyedInputRecords, allRecordLocationsInTable, hoodieTable);
+    List<Pair<String, HoodieBaseFile>> latestBaseFiles = getAllBaseFilesInTable(context, hoodieTable);

Review Comment:
   +1






[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613184716

   
   ## CI report:
   
   * 4775dce07f2f3237b32f22b360f3423b1eafce85 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18191)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9041:
URL: https://github.com/apache/hudi/pull/9041#issuecomment-1613184534

   
   ## CI report:
   
   * b681df04a7ad0febbcd9235622c2ee7f98759cf9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18082)
 
   * 2f139383c54f93669342539af77dca9b3a352be3 UNKNOWN
   * a1458e17e5749a89948be8f60387eeecd4c0f87c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-6459) Add Rollback test for Record Level Index

2023-06-29 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HUDI-6459:
--
Summary: Add Rollback test for Record Level Index  (was: Add Rollback 
validation for Record Level Index)

> Add Rollback test for Record Level Index
> 
>
> Key: HUDI-6459
> URL: https://issues.apache.org/jira/browse/HUDI-6459
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>
> The Jira aims to add validation for rollback with record level index. The 
> validation is added in TestRecordLevelIndex test.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6459) Add Rollback validation for Record Level Index

2023-06-29 Thread Lokesh Jain (Jira)
Lokesh Jain created HUDI-6459:
-

 Summary: Add Rollback validation for Record Level Index
 Key: HUDI-6459
 URL: https://issues.apache.org/jira/browse/HUDI-6459
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Lokesh Jain
Assignee: Lokesh Jain


The Jira aims to add validation for rollback with record level index. The 
validation is added in TestRecordLevelIndex test.





[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9064:
URL: https://github.com/apache/hudi/pull/9064#issuecomment-1613172562

   
   ## CI report:
   
   * 2b572a55998c0e1c4eca7970e8f63ed79254161c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18127)
 
   * b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN
   * 9c6d2bf222b7247bc926302045123bad69157d39 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18198)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613172489

   
   ## CI report:
   
   * 4775dce07f2f3237b32f22b360f3423b1eafce85 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9041:
URL: https://github.com/apache/hudi/pull/9041#issuecomment-1613172359

   
   ## CI report:
   
   * b681df04a7ad0febbcd9235622c2ee7f98759cf9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18082)
 
   * 2f139383c54f93669342539af77dca9b3a352be3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9092: [MINOR] Enable log compaction by default for MDT

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9092:
URL: https://github.com/apache/hudi/pull/9092#issuecomment-1613159780

   
   ## CI report:
   
   * 408e9f946e0a0647b0fc9f8e220d55ad2fbde62d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18199)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9089: [MINOR] Increase timeout for Azure CI: UT spark-datasource to 240 minutes

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9089:
URL: https://github.com/apache/hudi/pull/9089#issuecomment-1613159726

   
   ## CI report:
   
   * 4d2e8926188ce5aa2342054aeb99bf1d31eaf0e3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18190)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9064:
URL: https://github.com/apache/hudi/pull/9064#issuecomment-1613159516

   
   ## CI report:
   
   * 2b572a55998c0e1c4eca7970e8f63ed79254161c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18127)
 
   * b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN
   * 9c6d2bf222b7247bc926302045123bad69157d39 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613159448

   
   ## CI report:
   
   * 4775dce07f2f3237b32f22b360f3423b1eafce85 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18191)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] codope commented on a diff in pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


codope commented on code in PR #9058:
URL: https://github.com/apache/hudi/pull/9058#discussion_r1246592430


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##
@@ -209,9 +211,10 @@ public class HoodieMetadataPayload implements HoodieRecordPayload<HoodieMetadataPayload> {
-  public HoodieMetadataPayload(GenericRecord record, Comparable orderingVal) {
-    this(Option.of(record));
+  public HoodieMetadataPayload(@Nullable GenericRecord record, Comparable orderingVal) {
+    this(Option.ofNullable(record));

Review Comment:
   Where is this constructor used?



##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##
@@ -283,6 +285,8 @@ public HoodieMetadataPayload(Option<GenericRecord> recordOpt) {
             Integer.parseInt(recordIndexRecord.get(RECORD_INDEX_FIELD_FILE_INDEX).toString()),
             Long.parseLong(recordIndexRecord.get(RECORD_INDEX_FIELD_INSTANT_TIME).toString()));
       }
+    } else {
+      this.isDeletedRecord = true;
+  this.isDeletedRecord = true;

Review Comment:
   I would favor `isDeleted` field in `HoodieRecordIndexInfo` in the schema.
   
   1. It keeps the schema consistent wrt deletes for different MDT index types. 
Let's say some index types have `isDeleted` and some don't, then it's an added 
mental burden for developers and also not easy to maintain as we add more 
indexes. 
   2. It gives enough flexibility to have separate delete handling logic for 
different index types.
   3. Let's consider the semantics of the if-else in the 
`HoodieMetadataPayload` constructor. It is based on different index types. By 
setting `this.isDeletedRecord = true` in the last else-block we're saying that 
for all index types other than the ones above, consider the record to be 
deleted. It does not make much sense from the pov of adding more index types in 
the future.
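
   A minimal sketch of the suggested design — an explicit delete flag carried in the index entry, with delete handling kept per entry rather than inferred from a missing record. Class, field, and method names here are hypothetical illustrations, not the actual `HoodieMetadataPayload` schema.

   ```java
   public class RecordIndexEntrySketch {

     final String recordKey;
     final String partition;
     final boolean isDeleted; // explicit flag, consistent across index types

     RecordIndexEntrySketch(String recordKey, String partition, boolean isDeleted) {
       this.recordKey = recordKey;
       this.partition = partition;
       this.isDeleted = isDeleted;
     }

     // Illustrative merge semantics: a delete tombstone wins over a location entry.
     RecordIndexEntrySketch preCombine(RecordIndexEntrySketch older) {
       return this.isDeleted ? this : (older.isDeleted ? older : this);
     }

     public static void main(String[] args) {
       RecordIndexEntrySketch loc = new RecordIndexEntrySketch("k1", "p1", false);
       RecordIndexEntrySketch del = new RecordIndexEntrySketch("k1", "p1", true);
       System.out.println(del.preCombine(loc).isDeleted); // true: tombstone survives
     }
   }
   ```

   With the flag in the schema, each index type can choose its own tombstone semantics without a catch-all else-branch in the payload constructor.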
   
   






[GitHub] [hudi] xushiyan commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


xushiyan commented on code in PR #9041:
URL: https://github.com/apache/hudi/pull/9041#discussion_r1246579974


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##
@@ -310,6 +312,56 @@ public static <R> HoodieData<HoodieRecord<R>> mergeForPartitionUpdates(
             return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator();
           }
         });
-    return taggedUpdatingRecords.union(newRecords);
+    return taggedUpdatingRecords.union(taggedNewRecords);
+  }
+
+  public static <R> HoodieData<HoodieRecord<R>> tagGlobalLocationBackToRecords(
+      HoodieData<HoodieRecord<R>> incomingRecords,
+      HoodiePairData<String, HoodieRecordGlobalLocation> keyAndExistingLocations,
+      boolean mayContainDuplicateLookup,
+      boolean shouldUpdatePartitionPath,
+      HoodieWriteConfig config,
+      HoodieTable table) {
+    final HoodieRecordMerger merger = config.getRecordMerger();
+
+    HoodiePairData<String, HoodieRecord<R>> keyAndIncomingRecords =
+        incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record));
+
+    // Pair of incoming record and the global location if meant for merged lookup in later stage
+    HoodieData<Pair<HoodieRecord<R>, Option<HoodieRecordGlobalLocation>>> incomingRecordsAndLocations
+        = keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values()
+        .map(v -> {
+          final HoodieRecord<R> incomingRecord = v.getLeft();
+          Option<HoodieRecordGlobalLocation> currentLocOpt = Option.ofNullable(v.getRight().orElse(null));
+          if (currentLocOpt.isPresent()) {
+            HoodieRecordGlobalLocation currentLoc = currentLocOpt.get();
+            boolean shouldPerformMergedLookUp = mayContainDuplicateLookup
+                || !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath());
+            if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) {
+              return Pair.of(incomingRecord, currentLocOpt);
+            } else {
+              // - When update partition path is set to false,
+              //   the incoming record will be tagged to the existing record's partition regardless of being equal or not.
+              // - When update partition path is set to true,
+              //   the incoming record will be tagged to the existing record's partition
+              //   when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI).
+              return Pair.of((HoodieRecord<R>) getTaggedRecord(
+                      createNewHoodieRecord(incomingRecord, currentLoc, merger), Option.of(currentLoc)),
+                  Option.empty());
+            }
+          } else {
+            return Pair.of(getTaggedRecord(incomingRecord, Option.empty()), Option.empty());

Review Comment:
   refactored relevant helper methods






[GitHub] [hudi] xushiyan commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index

2023-06-29 Thread via GitHub


xushiyan commented on code in PR #9041:
URL: https://github.com/apache/hudi/pull/9041#discussion_r1246579590


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##
@@ -310,6 +312,56 @@ public static <R> HoodieData<HoodieRecord<R>> mergeForPartitionUpdates(
             return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator();
           }
         });
-    return taggedUpdatingRecords.union(newRecords);
+    return taggedUpdatingRecords.union(taggedNewRecords);
+  }
+
+  public static <R> HoodieData<HoodieRecord<R>> tagGlobalLocationBackToRecords(
+      HoodieData<HoodieRecord<R>> incomingRecords,
+      HoodiePairData<String, HoodieRecordGlobalLocation> keyAndExistingLocations,
+      boolean mayContainDuplicateLookup,
+      boolean shouldUpdatePartitionPath,
+      HoodieWriteConfig config,
+      HoodieTable table) {
+    final HoodieRecordMerger merger = config.getRecordMerger();
+
+    HoodiePairData<String, HoodieRecord<R>> keyAndIncomingRecords =
+        incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record));
+
+    // Pair of incoming record and the global location if meant for merged lookup in later stage
+    HoodieData<Pair<HoodieRecord<R>, Option<HoodieRecordGlobalLocation>>> incomingRecordsAndLocations
+        = keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values()
+        .map(v -> {
+          final HoodieRecord<R> incomingRecord = v.getLeft();
+          Option<HoodieRecordGlobalLocation> currentLocOpt = Option.ofNullable(v.getRight().orElse(null));
+          if (currentLocOpt.isPresent()) {
+            HoodieRecordGlobalLocation currentLoc = currentLocOpt.get();
+            boolean shouldPerformMergedLookUp = mayContainDuplicateLookup
+                || !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath());
+            if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) {
+              return Pair.of(incomingRecord, currentLocOpt);
+            } else {
+              // - When update partition path is set to false,
+              //   the incoming record will be tagged to the existing record's partition regardless of being equal or not.
+              // - When update partition path is set to true,
+              //   the incoming record will be tagged to the existing record's partition
+              //   when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI).
+              return Pair.of((HoodieRecord<R>) getTaggedRecord(
+                      createNewHoodieRecord(incomingRecord, currentLoc, merger), Option.of(currentLoc)),

Review Comment:
   fixed






[hudi] branch master updated: [MINOR] Improve CollectionUtils helper methods (#9088)

2023-06-29 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 8def3e68ae5 [MINOR] Improve CollectionUtils helper methods (#9088)
8def3e68ae5 is described below

commit 8def3e68ae5a0b72eefe26db49b6d33226f7b4c0
Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Thu Jun 29 05:35:19 2023 -0700

[MINOR] Improve CollectionUtils helper methods (#9088)
---
 .../action/clean/CleanPlanActionExecutor.java  |  4 +--
 .../action/commit/TestSchemaEvolutionClient.java   |  3 +-
 .../table/action/rollback/TestRollbackUtils.java   |  3 +-
 .../table/functional/TestCleanPlanExecutor.java|  2 +-
 .../apache/hudi/common/util/CollectionUtils.java   | 35 +++---
 .../hudi/common/table/TestTimelineUtils.java   |  2 +-
 .../table/view/TestIncrementalFSViewSync.java  |  3 +-
 .../hudi/common/testutils/HoodieTestTable.java |  8 ++---
 8 files changed, 23 insertions(+), 37 deletions(-)
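
The commit swaps the removed `CollectionUtils.createImmutableMap()` call sites for the standard `java.util.Collections` factories, which already return immutable maps. A minimal demo of that behavior (demo class name is invented for illustration):

```java
import java.util.Collections;
import java.util.Map;

public class ImmutableMapDemo {
  public static void main(String[] args) {
    // Collections.emptyMap() and Collections.singletonMap() both return
    // immutable views, so no custom helper is needed for these cases.
    Map<String, String> empty = Collections.emptyMap();
    Map<String, String> single = Collections.singletonMap("hoodie.table.name", "hoodie_test_table");

    System.out.println(empty.size());                     // 0
    System.out.println(single.get("hoodie.table.name")); // hoodie_test_table

    try {
      single.put("k", "v"); // mutation is rejected
    } catch (UnsupportedOperationException e) {
      System.out.println("immutable");
    }
  }
}
```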

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
index 043db1acbf9..ba7c71b1356 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
@@ -29,7 +29,6 @@ import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
 import org.apache.hudi.common.util.CleanerUtils;
-import org.apache.hudi.common.util.CollectionUtils;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieWriteConfig;
@@ -42,6 +41,7 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.io.IOException;
+import java.util.Collections;
 import java.util.List;
 import java.util.Map;
 import java.util.stream.Collectors;
@@ -132,7 +132,7 @@ public class CleanPlanActionExecutor extends 
BaseActionExecutor new HoodieActionInstant(x.getTimestamp(), x.getAction(), 
x.getState().name())).orElse(null),
   planner.getLastCompletedCommitTimestamp(),
-  config.getCleanerPolicy().name(), 
CollectionUtils.createImmutableMap(),
+  config.getCleanerPolicy().name(), Collections.emptyMap(),
   CleanPlanner.LATEST_CLEAN_PLAN_VERSION, cleanOps, 
partitionsToDelete);
 } catch (IOException e) {
   throw new HoodieIOException("Failed to schedule clean operation", e);
diff --git 
a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/table/action/commit/TestSchemaEvolutionClient.java
 
b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/table/action/commit/TestSchemaEvolutionClient.java
index bf825df570f..dc45a80754b 100644
--- 
a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/table/action/commit/TestSchemaEvolutionClient.java
+++ 
b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/table/action/commit/TestSchemaEvolutionClient.java
@@ -24,7 +24,6 @@ import org.apache.hudi.common.model.HoodieAvroRecord;
 import org.apache.hudi.common.model.HoodieKey;
 import org.apache.hudi.common.table.TableSchemaResolver;
 import org.apache.hudi.common.testutils.RawTripTestPayload;
-import org.apache.hudi.common.util.CollectionUtils;
 import org.apache.hudi.config.HoodieWriteConfig;
 import org.apache.hudi.internal.schema.Types;
 import org.apache.hudi.testutils.HoodieJavaClientTestHarness;
@@ -72,7 +71,7 @@ public class TestSchemaEvolutionClient extends 
HoodieJavaClientTestHarness {
 .withEngineType(EngineType.JAVA)
 .withPath(basePath)
 .withSchema(SCHEMA.toString())
-
.withProps(CollectionUtils.createImmutableMap(HoodieWriteConfig.TBL_NAME.key(), 
"hoodie_test_table"))
+.withProps(Collections.singletonMap(HoodieWriteConfig.TBL_NAME.key(), 
"hoodie_test_table"))
 .build();
 return new HoodieJavaWriteClient<>(context, config);
   }
diff --git 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackUtils.java
 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackUtils.java
index f03d9f3967d..c22a2aef424 100644
--- 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackUtils.java
+++ 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackUtils.java
@@ -30,6 +30,7 @@ import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.permission.FsPermission;
 import org.junit.jupiter.api.Test;
 
+import 

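The hunks above replace Hudi's `CollectionUtils.createImmutableMap()` helpers with the JDK's `Collections` factories. A minimal sketch of the behavior being relied on — both factory results are unmodifiable — with an illustrative class name, not code from the Hudi tree:

```java
import java.util.Collections;
import java.util.Map;

public class MapSketch {
    // Returns the shared unmodifiable empty map, matching the
    // Collections.emptyMap() substitution in the diff above.
    public static Map<String, String> emptyProps() {
        return Collections.emptyMap();
    }

    // Returns an unmodifiable one-entry map, matching the
    // Collections.singletonMap(...) substitution in the diff above.
    public static Map<String, String> singleProp(String k, String v) {
        return Collections.singletonMap(k, v);
    }

    public static void main(String[] args) {
        Map<String, String> props = singleProp("hoodie.table.name", "hoodie_test_table");
        System.out.println(props.get("hoodie.table.name")); // hoodie_test_table
        try {
            emptyProps().put("k", "v"); // both factory results reject mutation
        } catch (UnsupportedOperationException e) {
            System.out.println("unmodifiable");
        }
    }
}
```

Both calls avoid allocating a custom immutable wrapper, which is the point of the cleanup.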
[GitHub] [hudi] xushiyan merged pull request #9088: [MINOR] Improve CollectionUtils helper methods

2023-06-29 Thread via GitHub


xushiyan merged PR #9088:
URL: https://github.com/apache/hudi/pull/9088


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #9092: [MINOR] Enable log compaction by default for MDT

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9092:
URL: https://github.com/apache/hudi/pull/9092#issuecomment-1613076306

   
   ## CI report:
   
   * 408e9f946e0a0647b0fc9f8e220d55ad2fbde62d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9064:
URL: https://github.com/apache/hudi/pull/9064#issuecomment-1613075951

   
   ## CI report:
   
   * 2b572a55998c0e1c4eca7970e8f63ed79254161c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18127)
 
   * b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8609: [HUDI-6154] Introduced retry while reading hoodie.properties to deal with parallel updates.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8609:
URL: https://github.com/apache/hudi/pull/8609#issuecomment-1613056925

   
   ## CI report:
   
   * e14bd41edf6cc961d77087eea67f755f23590834 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17992)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18115)
 
   * a64034d612fa64c99dd8d319ac00680924773f53 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18197)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Created] (HUDI-6458) Scheduling jobs should not fail when there is no completed commits

2023-06-29 Thread kwang (Jira)
kwang created HUDI-6458:
---

 Summary: Scheduling jobs should not fail when there is no 
completed commits
 Key: HUDI-6458
 URL: https://issues.apache.org/jira/browse/HUDI-6458
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang
 Fix For: 0.14.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] zaza commented on a diff in pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString

2023-06-29 Thread via GitHub


zaza commented on code in PR #9064:
URL: https://github.com/apache/hudi/pull/9064#discussion_r1246538265


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/HoodieCDCRDD.scala:
##
@@ -561,7 +561,7 @@ class HoodieCDCRDD(
   originTableSchema.structTypeSchema.zipWithIndex.foreach {
 case (field, idx) =>
   if (field.dataType.isInstanceOf[StringType]) {
-map(field.name) = record.getString(idx)
+map(field.name) = 
Option(record.getUTF8String(idx)).map(_.toString).orNull
   } else {

Review Comment:
   This is what I have based on my limited knowledge of Hudi: 
https://github.com/apache/hudi/pull/9064/commits/c88aee0f26afa779594a9981d86aeb3d06727d4b
   
   I'm more than happy to make further adjustments when needed.
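A narrow unit test could target just the null-safe conversion itself. As a sketch only — a Java analogue of the Scala one-liner under review, with an illustrative class name, not the actual CDC code path:

```java
import java.util.Optional;

public class NullSafeString {
    // Mirrors Option(record.getUTF8String(idx)).map(_.toString).orNull:
    // convert to String only when the value is non-null, otherwise keep
    // null instead of throwing a NullPointerException.
    public static String toStringOrNull(Object value) {
        return Optional.ofNullable(value).map(Object::toString).orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(toStringOrNull(null)); // null
        System.out.println(toStringOrNull(42));   // 42
    }
}
```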






[GitHub] [hudi] codope opened a new pull request, #9092: [MINOR] Enable log compaction by default for MDT

2023-06-29 Thread via GitHub


codope opened a new pull request, #9092:
URL: https://github.com/apache/hudi/pull/9092

   ### Change Logs
   
   Enable log compaction on metadata table by default.
   
   ### Impact
   
   Will compact log blocks to produce another log file every 5 log blocks.
   
   ### Risk level (write none, low medium or high below)
   
   medium
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #9088: [MINOR] Improve CollectionUtils helper methods

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9088:
URL: https://github.com/apache/hudi/pull/9088#issuecomment-1613041272

   
   ## CI report:
   
   * fb282b7602962846c4f561cd101033fca41e43d6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18182)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8609: [HUDI-6154] Introduced retry while reading hoodie.properties to deal with parallel updates.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8609:
URL: https://github.com/apache/hudi/pull/8609#issuecomment-1613038827

   
   ## CI report:
   
   * e14bd41edf6cc961d77087eea67f755f23590834 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17992)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18115)
 
   * a64034d612fa64c99dd8d319ac00680924773f53 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Created] (HUDI-6457) Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned

2023-06-29 Thread kwang (Jira)
kwang created HUDI-6457:
---

 Summary: Keep JavaSizeBasedClusteringPlanStrategy and 
SparkSizeBasedClusteringPlanStrategy aligned
 Key: HUDI-6457
 URL: https://issues.apache.org/jira/browse/HUDI-6457
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang
 Fix For: 0.14.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] zaza commented on a diff in pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString

2023-06-29 Thread via GitHub


zaza commented on code in PR #9064:
URL: https://github.com/apache/hudi/pull/9064#discussion_r1246504222


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/HoodieCDCRDD.scala:
##
@@ -561,7 +561,7 @@ class HoodieCDCRDD(
   originTableSchema.structTypeSchema.zipWithIndex.foreach {
 case (field, idx) =>
   if (field.dataType.isInstanceOf[StringType]) {
-map(field.name) = record.getString(idx)
+map(field.name) = 
Option(record.getUTF8String(idx)).map(_.toString).orNull
   } else {

Review Comment:
   Absolutely, the only problem is that I don't see any unit tests for the cdc 
package so it's hard to follow existing examples. I tried implementing a test 
that extends `HoodieClientTestBase` but that was getting me far from the 
requested "unit test". What would be the best way to start with tests for this 
particular issue?






[GitHub] [hudi] lipusheng opened a new issue, #9091: [SUPPORT]

2023-06-29 Thread via GitHub


lipusheng opened a new issue, #9091:
URL: https://github.com/apache/hudi/issues/9091

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   When I use Spark to sync Hive table data into a Hudi table, I set 
`KeyGeneratorOptions.RECORDKEY_FIELD_NAME` to "id,user_id", set the key 
generator class (`KEYGENERATOR_CLASS`) to `NonpartitionedKeyGenerator`, and set 
`hoodie.datasource.write.operation` to `WriteOperationType.BULK_INSERT`. In this 
case the `_hoodie_record_key` of the written data is "125230088,6941". When I 
later ingest Kafka data, I only change `hoodie.datasource.write.operation` to 
`WriteOperationType.UPSERT`, but the `_hoodie_record_key` format changes to 
"user_id:125230088,id:6941", and duplicate data appears at query time.
   
![image](https://github.com/apache/hudi/assets/57984409/f45c37a8-b38c-4457-9677-2fcbe3bac178)
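   The two key encodings the reporter observed can be illustrated as plain string formatting. This is only an illustration of the symptom, not Hudi's actual KeyGenerator code; the class and method names below are assumptions:

```java
import java.util.Arrays;
import java.util.List;

public class RecordKeySketch {
    // Values only, e.g. "125230088,6941" (the bulk_insert-style key reported).
    public static String valuesOnly(List<String> values) {
        return String.join(",", values);
    }

    // field:value pairs, e.g. "user_id:125230088,id:6941" (the upsert-style key reported).
    public static String withFieldNames(List<String> fields, List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(fields.get(i)).append(':').append(values.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("user_id", "id");
        List<String> values = Arrays.asList("125230088", "6941");
        System.out.println(valuesOnly(values));             // 125230088,6941
        System.out.println(withFieldNames(fields, values)); // user_id:125230088,id:6941
    }
}
```

   Since `_hoodie_record_key` is compared as a plain string, the two spellings never match, so the same row ends up as two distinct records — consistent with the duplication described above.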
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.12.0
   
   * Spark version : 3.3.1
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : OSS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   





[GitHub] [hudi] codope commented on a diff in pull request #8609: [HUDI-6154] Introduced retry while reading hoodie.properties to deal with parallel updates.

2023-06-29 Thread via GitHub


codope commented on code in PR #8609:
URL: https://github.com/apache/hudi/pull/8609#discussion_r1246489239


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##
@@ -334,22 +337,43 @@ public HoodieTableConfig() {
 super();
   }
 
-  private void fetchConfigs(FileSystem fs, String metaPath) throws IOException 
{
+  private static TypedProperties fetchConfigs(FileSystem fs, String metaPath) 
throws IOException {
 Path cfgPath = new Path(metaPath, HOODIE_PROPERTIES_FILE);
-try (FSDataInputStream is = fs.open(cfgPath)) {
-  props.load(is);
-} catch (IOException ioe) {
-  if (!fs.exists(cfgPath)) {
-LOG.warn("Run `table recover-configs` if config update/delete failed 
midway. Falling back to backed up configs.");
-// try the backup. this way no query ever fails if update fails midway.
-Path backupCfgPath = new Path(metaPath, HOODIE_PROPERTIES_FILE_BACKUP);
-try (FSDataInputStream is = fs.open(backupCfgPath)) {
+Path backupCfgPath = new Path(metaPath, HOODIE_PROPERTIES_FILE_BACKUP);
+int readRetryCount = 0;
+boolean found = false;
+
+TypedProperties props = new TypedProperties();
+while (readRetryCount++ < MAX_READ_RETRIES) {
+  for (Path path : Arrays.asList(cfgPath, backupCfgPath)) {
+// Read the properties and validate that it is a valid file
+try (FSDataInputStream is = fs.open(path)) {
+  props.clear();
   props.load(is);
+  found = true;
+  ValidationUtils.checkArgument(validateChecksum(props));
+  return props;
+} catch (IOException e) {
+  LOG.warn(String.format("Could not read properties from %s: %s", 
path, e));
+} catch (IllegalArgumentException e) {
+  LOG.warn(String.format("Invalid properties file %s: %s", path, 
props));
 }
-  } else {
-throw ioe;
+  }
+
+  // Failed to read all files so wait before retrying. This can happen in 
cases of parallel updates to the properties.
+  try {
+Thread.sleep(READ_RETRY_DELAY_MSEC);
+  } catch (InterruptedException e) {
+LOG.warn("Interrupted while waiting");
   }
 }
+
+// If we are here then after all retries either no hoodie.properties was 
found or only an invalid file was found.
+if (found) {
+  throw new IllegalArgumentException("hoodie.properties file seems 
invalid. Please check for left over `.updated` files if any, manually copy it 
to hoodie.properties and retry");
+} else {
+  throw new HoodieIOException("Could not load Hoodie properties from " + 
cfgPath);

Review Comment:
   Fixed the deltastreamer tests by modifying the exception message here, since 
deltastreamer depends on the specific message. Pitfalls of depending on exception 
messages as business logic! We should try to avoid that as much as possible.
   
https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L695-L697
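   The retry-plus-backup read pattern in the diff above can be sketched independently of Hudi. This is a simplified illustration: the retry count, delay, class name, and the `validate()` stand-in for Hudi's checksum validation are assumptions, not the actual constants or logic:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Properties;

public class RetryingPropsReader {
    static final int MAX_READ_RETRIES = 5;       // illustrative, not Hudi's value
    static final long READ_RETRY_DELAY_MSEC = 100;

    // Tries the primary file, then the backup, retrying the pair a few times.
    // Retrying covers the case where a concurrent writer is mid-update.
    public static Properties read(Path primary, Path backup) throws IOException {
        for (int attempt = 0; attempt < MAX_READ_RETRIES; attempt++) {
            for (Path p : Arrays.asList(primary, backup)) {
                try (InputStream is = Files.newInputStream(p)) {
                    Properties props = new Properties();
                    props.load(is);
                    if (validate(props)) {
                        return props;
                    }
                } catch (IOException e) {
                    // missing or half-written file: fall through to the next candidate
                }
            }
            try {
                Thread.sleep(READ_RETRY_DELAY_MSEC);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        throw new IOException("Could not load properties from " + primary);
    }

    // Placeholder for checksum validation; here we only require non-empty props.
    static boolean validate(Properties props) {
        return !props.isEmpty();
    }
}
```

   Throwing a typed exception at the end, rather than encoding the failure mode in the message text, avoids exactly the message-matching pitfall noted above.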






[GitHub] [hudi] hudi-bot commented on pull request #9082: [HUDI-6445] Distribute spark ds func tests

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9082:
URL: https://github.com/apache/hudi/pull/9082#issuecomment-1612915683

   
   ## CI report:
   
   * c529c624afdca331514a2bdfb78cc6e18ab9f57a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18185)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1612915077

   
   ## CI report:
   
   * 3b6d13a83efdae5e46eebe9ae168ba7e0d8e9f34 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18189)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] LINGQ1991 commented on issue #8903: [SUPPORT] aws spark3.2.1 & hudi 0.13.1 with java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile

2023-06-29 Thread via GitHub


LINGQ1991 commented on issue #8903:
URL: https://github.com/apache/hudi/issues/8903#issuecomment-1612912367

   > @ad1happy2go I use emr-6.5.0. It's error with " 
java.lang.NoSuchMethodError: 
org.apache.spark.sql.execution.datasources.PartitionedFile".
   > 
   > But i have package with oss spark and hudi bundle. Work ok now.
   > 
   > ```xml
   > <plugin>
   >   <groupId>org.apache.maven.plugins</groupId>
   >   <artifactId>maven-shade-plugin</artifactId>
   >   <version>3.2.1</version>
   >   <configuration>
   >     <finalName>hudi-${spark.version}-plugin</finalName>
   >     <createDependencyReducedPom>false</createDependencyReducedPom>
   >   </configuration>
   >   <executions>
   >     <execution>
   >       <phase>package</phase>
   >       <goals>
   >         <goal>shade</goal>
   >       </goals>
   >       <configuration>
   >         <relocations>
   >           <relocation>
   >             <pattern>org.apache.spark.sql.execution.datasources.PartitionedFile</pattern>
   >             <shadedPattern>org.local.spark.sql.execution.datasources.PartitionedFile</shadedPattern>
   >           </relocation>
   >           <relocation>
   >             <pattern>org.apache.curator</pattern>
   >             <shadedPattern>org.local.curator</shadedPattern>
   >           </relocation>
   >         </relocations>
   >         <transformers>
   >           <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"/>
   >         </transformers>
   >         <filters>
   >           <filter>
   >             <artifact>*:*</artifact>
   >             <excludes>
   >               <exclude>module-info.class</exclude>
   >               <exclude>org/apache/spark/unused/**</exclude>
   >             </excludes>
   >           </filter>
   >           <filter>
   >             <artifact>*:*</artifact>
   >             <excludes>
   >               <exclude>META-INF/*.SF</exclude>
   >               <exclude>META-INF/*.DSA</exclude>
   >               <exclude>META-INF/*.RSA</exclude>
   >             </excludes>
   >           </filter>
   >         </filters>
   >       </configuration>
   >     </execution>
   >   </executions>
   > </plugin>
   > ```
   
   I packaged with the hudi bundle, but the following error occurred:
   `Caused by: java.lang.ClassCastException: 
org.apache.hudi.spark.org.apache.spark.sql.execution.datasources.PartitionedFile
 cannot be cast to org.apache.spark.sql.execution.datasources.PartitionedFile
at 
org.apache.hudi.HoodieMergeOnReadRDD.read(HoodieMergeOnReadRDD.scala:113)
at 
org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)`
   
   





[GitHub] [hudi] flashJd commented on pull request #9048: [HUDI-6434] Fix illegalArgumentException when do read_optimized read in Flink

2023-06-29 Thread via GitHub


flashJd commented on PR #9048:
URL: https://github.com/apache/hudi/pull/9048#issuecomment-1612907539

   > The `DeltaCommitWriteHandleFactory` can be tweaked for the purpose, I'm 
wondering what's the engine conflicts you are talking about?
   
   Sorry for the late reply.
   ## engine conflicts:
   In v0.12.2, when Spark insert-overwrites a partition after Flink has written 
only log files for the buckets in that partition, 
https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java#L160
 throws, but I found this was fixed on master.
   ## other considerations:
   If the logic for creating the first base file is aligned, a lot of code can 
be simplified, for example:
   
https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L362
   
https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/CompactionExecutionHelper.java#L63
   
https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java#L200
   etc.
   What's your opinion? Looking forward to your reply.







[GitHub] [hudi] beyond1920 opened a new issue, #9090: [SUPPORT]

2023-06-29 Thread via GitHub


beyond1920 opened a new issue, #9090:
URL: https://github.com/apache/hudi/issues/9090

   I cherry-picked [HUDI-1517](https://issues.apache.org/jira/browse/HUDI-1517) 
into our internal HUDI version, 
   and found a FileNotFoundException while reading the latest snapshot of a MOR table.
   
![1688033363329](https://github.com/apache/hudi/assets/1525333/9330203d-866e-4c3d-96a8-922960afc152)
   
   The exception can happen when the Spark speculative execution feature is enabled 
and there are concurrent writers and readers. For example:
   1. Job1 is writing to a MOR table and has not finished yet. It has Spark 
speculative execution enabled.
   2. Job2 is reading the latest snapshot of the MOR table; when it calls 
getLatestMergedFileSlicesBeforeOrOn, it might list log files that were 
written by a speculative attempt task in Job1.
   3. Job1 finishes and deletes the log files written by the slow 
speculative tasks.
   4. Job2 throws the FileNotFoundException when it reads a log file that was 
already deleted in step 3
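   The list-then-delete-then-read race above can be reproduced in miniature with plain java.nio — this is only an analogue of the scenario, with illustrative file names, not Hudi's file-slice code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ListThenReadRace {
    public static void main(String[] args) throws IOException {
        // Step 2: the reader captures the log file's path while it still exists.
        Path logFile = Files.createTempFile("speculative-attempt", ".log");
        Files.write(logFile, new byte[] {1, 2, 3});

        // Step 3: the writer finishes and cleans up the slow attempt's file.
        Files.delete(logFile);

        // Step 4: the reader opens the stale path it listed earlier and fails.
        try {
            Files.readAllBytes(logFile);
        } catch (NoSuchFileException e) {
            System.out.println("read failed, file already deleted: " + e.getFile());
        }
    }
}
```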





[GitHub] [hudi] beyond1920 commented on pull request #4913: [HUDI-1517] create marker file for every log file

2023-06-29 Thread via GitHub


beyond1920 commented on PR #4913:
URL: https://github.com/apache/hudi/pull/4913#issuecomment-1612808066

   I cherry-picked this PR into our internal HUDI and found a 
`FileNotFoundException` while reading the latest snapshot of a MOR table.
   
![1688033363329](https://github.com/apache/hudi/assets/1525333/99459239-1dbf-4067-8020-d4e20bae0bd1)
   The exception can happen when the Spark speculative execution feature is enabled, 
under the following scenario.
   1. Job1 is writing to a MOR table and has not finished yet. It has Spark 
speculative execution enabled.
   2. Job2 is reading the latest snapshot of the MOR table; when it calls 
`getLatestMergedFileSlicesBeforeOrOn`, it might list log files that were 
written by a speculative attempt task in Job1. 
   3. Job1 finishes and deletes the log files written by the slow 
speculative tasks.
   4. Job2 throws the `FileNotFoundException` when it reads a log file that 
was already deleted in step 3.





[GitHub] [hudi] hudi-bot commented on pull request #9066: [HUDI-6452] Add MOR snapshot reader to integrate with query engines without using Hadoop APIs

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9066:
URL: https://github.com/apache/hudi/pull/9066#issuecomment-1612807150

   
   ## CI report:
   
   * 8662958e8ccb7203d320dc33445f9f2dbc28fb0c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18159)
 
   * 60c1b8c5885fdda28e07f3ba79290f01dc60a9c4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18196)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8933: [HUDI-5329] Spark reads table error when Flink creates table without record key and primary key

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8933:
URL: https://github.com/apache/hudi/pull/8933#issuecomment-1612806333

   
   ## CI report:
   
   * d1564f421664fd2dee15dfdbdae4dec07baedf92 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18186)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9066: [HUDI-6452] Add MOR snapshot reader to integrate with query engines without using Hadoop APIs

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9066:
URL: https://github.com/apache/hudi/pull/9066#issuecomment-1612791679

   
   ## CI report:
   
   * 8662958e8ccb7203d320dc33445f9f2dbc28fb0c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18159)
 
   * 60c1b8c5885fdda28e07f3ba79290f01dc60a9c4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9058:
URL: https://github.com/apache/hudi/pull/9058#issuecomment-1612791490

   
   ## CI report:
   
   * 345482ba6529fc3bf0ac9f50ce0c1d79a3accd37 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18163)
 
   * 1697d1bfa095ca16a9361e3728a77331d3a28037 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18195)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9058:
URL: https://github.com/apache/hudi/pull/9058#issuecomment-1612774450

   
   ## CI report:
   
   * 345482ba6529fc3bf0ac9f50ce0c1d79a3accd37 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18163)
 
   * 1697d1bfa095ca16a9361e3728a77331d3a28037 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9017: [HUDI-6393] Add functional tests for RecordLevelIndex

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9017:
URL: https://github.com/apache/hudi/pull/9017#issuecomment-1612701307

   
   ## CI report:
   
   * a3c1d99e2266ec68d9082fe4c76c4bf62070f5a9 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18184)
 
   * ceffe7d8146f48e1c6c083613646463c1404a77f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18194)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] xushiyan commented on a diff in pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-29 Thread via GitHub


xushiyan commented on code in PR #9058:
URL: https://github.com/apache/hudi/pull/9058#discussion_r1246371700


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieIndex.java:
##
@@ -749,6 +749,67 @@ public void testRecordIndexTagLocationAndUpdate(boolean 
populateMetaFields) thro
 assertEquals(newInsertsCount, recordLocations.filter(entry -> 
newPartitionPath.equalsIgnoreCase(entry._1.getPartitionPath())).count());
   }
 
+  @ParameterizedTest
+  @ValueSource(strings = "INMEMORY")

Review Comment:
   fixed






[GitHub] [hudi] hudi-bot commented on pull request #9088: [MINOR] Improve CollectionUtils helper methods

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9088:
URL: https://github.com/apache/hudi/pull/9088#issuecomment-1612690821

   
   ## CI report:
   
   * fb282b7602962846c4f561cd101033fca41e43d6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18182)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9038:
URL: https://github.com/apache/hudi/pull/9038#issuecomment-1612690558

   
   ## CI report:
   
   * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN
   * 34f8823f48712c57058bc37c8936a276c1457557 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18187)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18193)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18188)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9017: [HUDI-6393] Add functional tests for RecordLevelIndex

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9017:
URL: https://github.com/apache/hudi/pull/9017#issuecomment-1612690440

   
   ## CI report:
   
   * d0b2f2457cf648b1b631c75bd64cc1320af69393 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18030)
 
   * a3c1d99e2266ec68d9082fe4c76c4bf62070f5a9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18184)
 
   * ceffe7d8146f48e1c6c083613646463c1404a77f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9088: [MINOR] Improve CollectionUtils helper methods

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9088:
URL: https://github.com/apache/hudi/pull/9088#issuecomment-1612678874

   
   ## CI report:
   
   * fb282b7602962846c4f561cd101033fca41e43d6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1612678677

   
   ## CI report:
   
   * 69b2bb853be0f79845efd56f68b934b9f69ae22a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18160)
 
   * 4775dce07f2f3237b32f22b360f3423b1eafce85 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18191)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9038:
URL: https://github.com/apache/hudi/pull/9038#issuecomment-1612678539

   
   ## CI report:
   
   * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN
   * 34f8823f48712c57058bc37c8936a276c1457557 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18188)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18187)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18193)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Closed] (HUDI-6151) Rollback previously applied commits to MDT when operations are retried.

2023-06-29 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6151.

Resolution: Fixed

Fixed via master branch: b95248e011931f4748a7a9fbb8298cbbb71bda88

> Rollback previously applied commits to MDT when operations are retried.
> ---
>
> Key: HUDI-6151
> URL: https://issues.apache.org/jira/browse/HUDI-6151
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Prashant Wason
>Assignee: Prashant Wason
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> Operations like Clean, Compaction are retried after failures with the same 
> instant time. If the previous run of the operation successfully committed to 
> the MDT but failed to commit to the dataset, then the operation will be 
> retried later with the same instantTime causing duplicate updates applied to 
> MDT.
> Currently, we simply delete the completed deltacommit without rolling back 
> the deltacommit.
> To handle this, we detect a replay of operation and rollback any changes from 
> that operation in MDT.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
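The rollback-on-replay idea behind HUDI-6151 can be sketched in isolation. The 
sketch below is illustrative only: the `Timeline` class and its 
`containsCompleted`/`rollback`/`commit` methods are simplified stand-ins for 
the metadata-table timeline, not Hudi's actual `HoodieActiveTimeline` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the metadata-table (MDT) timeline.
// Class and method names here are illustrative stand-ins, not Hudi APIs.
class Timeline {
  private final List<String> completedInstants = new ArrayList<>();

  boolean containsCompleted(String instantTime) {
    return completedInstants.contains(instantTime);
  }

  void commit(String instantTime) {
    completedInstants.add(instantTime);
  }

  // Undo a previously completed instant so a retry can re-apply it cleanly.
  void rollback(String instantTime) {
    completedInstants.remove(instantTime);
  }
}

public class ReplayDetectionSketch {
  // A retried operation reuses the same instant time. If that instant already
  // completed in the MDT, roll it back first instead of merely deleting the
  // completed instant file, so duplicate updates are not left behind.
  static void apply(Timeline mdt, String instantTime) {
    if (mdt.containsCompleted(instantTime)) {
      mdt.rollback(instantTime); // detected replay: undo the earlier commit
    }
    mdt.commit(instantTime);     // (re-)apply with the same instant time
  }

  public static void main(String[] args) {
    Timeline mdt = new Timeline();
    apply(mdt, "001"); // first attempt: commits to MDT, fails on the dataset
    apply(mdt, "001"); // retry: replay detected, rolled back, re-applied
    System.out.println(mdt.containsCompleted("001")); // prints: true
  }
}
```

A retried compaction reusing instant `001` first rolls back the earlier 
partial commit, so the MDT ends up with the operation applied exactly once.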


[hudi] branch master updated: [HUDI-6151] Rollback previously applied commits to MDT when operations are retried (#8604)

2023-06-29 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new b95248e0119 [HUDI-6151] Rollback previously applied commits to MDT 
when operations are retried (#8604)
b95248e0119 is described below

commit b95248e011931f4748a7a9fbb8298cbbb71bda88
Author: Prashant Wason 
AuthorDate: Thu Jun 29 01:59:08 2023 -0700

[HUDI-6151] Rollback previously applied commits to MDT when operations are 
retried (#8604)

Operations like Clean, Compaction are retried after failures with the same 
instant time. If the previous run of the operation successfully committed to 
the MDT but failed to commit to the dataset, then the operation will be retried 
later with the same instantTime causing duplicate updates applied to MDT.

Currently, we simply delete the completed deltacommit without rolling back 
the deltacommit.

To handle this, we detect a replay of operation and rollback any changes 
from that operation in MDT.

-

Co-authored-by: Sagar Sumit 
---
 .../FlinkHoodieBackedTableMetadataWriter.java  | 50 
 .../SparkHoodieBackedTableMetadataWriter.java  | 38 ++--
 .../functional/TestHoodieBackedMetadata.java   | 68 +-
 3 files changed, 113 insertions(+), 43 deletions(-)

diff --git 
a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java
 
b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java
index 7dd32e2916e..6edeac05a74 100644
--- 
a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java
+++ 
b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java
@@ -32,9 +32,13 @@ import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.ValidationUtils;
 import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieMetadataException;
 import org.apache.hudi.exception.HoodieNotSupportedException;
 
 import org.apache.hadoop.conf.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import java.util.Collections;
 import java.util.HashMap;
 import java.util.List;
@@ -46,7 +50,7 @@ import static 
org.apache.hudi.common.model.HoodieFailedWritesCleaningPolicy.EAGE
  * Flink hoodie backed table metadata writer.
  */
 public class FlinkHoodieBackedTableMetadataWriter extends 
HoodieBackedTableMetadataWriter {
-
+  private static final Logger LOG = 
LoggerFactory.getLogger(FlinkHoodieBackedTableMetadataWriter.class);
   private transient BaseHoodieWriteClient writeClient;
 
   public static HoodieTableMetadataWriter create(Configuration conf, 
HoodieWriteConfig writeConfig,
@@ -118,33 +122,31 @@ public class FlinkHoodieBackedTableMetadataWriter extends 
HoodieBackedTableMetad
 
   if 
(!metadataMetaClient.getActiveTimeline().containsInstant(instantTime)) {
 // if this is a new commit being applied to metadata for the first time
-writeClient.startCommitWithTime(instantTime);
-
metadataMetaClient.getActiveTimeline().transitionRequestedToInflight(HoodieActiveTimeline.DELTA_COMMIT_ACTION,
 instantTime);
+LOG.info("New commit at " + instantTime + " being applied to MDT.");
   } else {
-Option alreadyCompletedInstant = 
metadataMetaClient.getActiveTimeline().filterCompletedInstants().filter(entry 
-> entry.getTimestamp().equals(instantTime)).lastInstant();
-if (alreadyCompletedInstant.isPresent()) {
-  // this code path refers to a re-attempted commit that got committed 
to metadata table, but failed in datatable.
-  // for eg, lets say compaction c1 on 1st attempt succeeded in 
metadata table and failed before committing to datatable.
-  // when retried again, data table will first rollback pending 
compaction. these will be applied to metadata table, but all changes
-  // are upserts to metadata table and so only a new delta commit will 
be created.
-  // once rollback is complete, compaction will be retried again, 
which will eventually hit this code block where the respective commit is
-  // already part of completed commit. So, we have to manually remove 
the completed instant and proceed.
-  // and it is for the same reason we enabled 
withAllowMultiWriteOnSameInstant for metadata table.
-  HoodieActiveTimeline.deleteInstantFile(metadataMetaClient.getFs(), 
metadataMetaClient.getMetaPath(), alreadyCompletedInstant.get());
-  metadataMetaClient.reloadActiveTimeline();
+// this code path refers to a re-attempted commit that:
+//   1. got committed to 

[GitHub] [hudi] danny0405 merged pull request #8604: [HUDI-6151] Rollback previously applied commits to MDT when operations are retried.

2023-06-29 Thread via GitHub


danny0405 merged PR #8604:
URL: https://github.com/apache/hudi/pull/8604





[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #9017: [HUDI-6393] Add functional tests for RecordLevelIndex

2023-06-29 Thread via GitHub


lokeshj1703 commented on code in PR #9017:
URL: https://github.com/apache/hudi/pull/9017#discussion_r1246314270


##
pom.xml:
##
@@ -175,7 +175,7 @@
 2.12.10
 ${scala12.version}
 2.8.1
-2.12
+2.11

Review Comment:
   Sorry! Forgot to remove this change. This was only for fixing the issues.






[GitHub] [hudi] xushiyan commented on a diff in pull request #9017: [HUDI-6393] Add functional tests for RecordLevelIndex

2023-06-29 Thread via GitHub


xushiyan commented on code in PR #9017:
URL: https://github.com/apache/hudi/pull/9017#discussion_r1246304418


##
pom.xml:
##
@@ -175,7 +175,7 @@
 2.12.10
 ${scala12.version}
 2.8.1
-2.12
+2.11

Review Comment:
This is the default value, which should be 2.12 because Spark 3 is the 
default now. If this is causing a problem, it means the test setup with the 
Spark 2.4 profile has a gap, which we need to fix only for that profile/setup 






[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1612621031

   
   ## CI report:
   
   * 69b2bb853be0f79845efd56f68b934b9f69ae22a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18160)
 
   * 4775dce07f2f3237b32f22b360f3423b1eafce85 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Commented] (HUDI-5608) Support decimals w/ precision > 30 in Column Stats

2023-06-29 Thread Jira


[ 
https://issues.apache.org/jira/browse/HUDI-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738431#comment-17738431
 ] 

赵富午 commented on HUDI-5608:
---

Is there any new progress?

> Support decimals w/ precision > 30 in Column Stats
> --
>
> Key: HUDI-5608
> URL: https://issues.apache.org/jira/browse/HUDI-5608
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 0.12.2
>Reporter: Alexey Kudinkin
>Priority: Critical
> Fix For: 0.14.0
>
>
> As reported in: [https://github.com/apache/hudi/issues/7732]
>  
> Currently we've limited precision of the supported decimals at 30 assuming 
> that this number is reasonably high to cover 99% of use-cases, but it seems 
> like there's still a demand for even larger Decimals.
> The challenge is however to balance the need to support longer Decimals vs 
> storage space we have to provision for each one of them.
>  





[GitHub] [hudi] hudi-bot commented on pull request #8604: [HUDI-6151] Rollback previously applied commits to MDT when operations are retried.

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8604:
URL: https://github.com/apache/hudi/pull/8604#issuecomment-1612619567

   
   ## CI report:
   
   * eb39bc7559945e199e43a2a3d51e1ab15b4e3e2f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18183)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9087: [HUDI-6329] Write pipelines for table with consistent bucket index would detect whether clustering service occurs and automatically adjust the

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9087:
URL: https://github.com/apache/hudi/pull/9087#issuecomment-1612610932

   
   ## CI report:
   
   * 1bc4ea70966fd2c2cbd7cea126f4fd6b5c875077 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18181)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9088: [MINOR] Improve CollectionUtils helper methods

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9088:
URL: https://github.com/apache/hudi/pull/9088#issuecomment-1612610988

   
   ## CI report:
   
   * fb282b7602962846c4f561cd101033fca41e43d6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18182)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] liudu2326526 commented on issue #6297: [SUPPORT] Flink SQL client cow table query error "org/apache/parquet/column/ColumnDescriptor" (but mor table query normal)

2023-06-29 Thread via GitHub


liudu2326526 commented on issue #6297:
URL: https://github.com/apache/hudi/issues/6297#issuecomment-1612608779

   
   I also encountered this problem when reading Hudi tables. It ran 
locally, but failed to run on the cluster.
   
   Caused by: java.lang.LinkageError: loader constraint violation: when 
resolving method 'void 
org.apache.flink.formats.parquet.vector.reader.BytesColumnReader.(org.apache.parquet.column.ColumnDescriptor,
 org.apache.parquet.column.page.PageReader)' the class loader 
org.apache.flink.util.ChildFirstClassLoader @d1be487 of the current class, 
org/apache/hudi/table/format/cow/ParquetSplitReaderUtil, and the class loader 
'app' for the method's defining class, 
org/apache/flink/formats/parquet/vector/reader/BytesColumnReader, have 
different Class objects for the type org/apache/parquet/column/ColumnDescriptor 
used in the signature (org.apache.hudi.table.format.cow.ParquetSplitReaderUtil 
is in unnamed module of loader org.apache.flink.util.ChildFirstClassLoader 
@d1be487, parent loader 'app'; 
org.apache.flink.formats.parquet.vector.reader.BytesColumnReader is in unnamed 
module of loader 'app')
   
   Hudi version :0.13.1
   Flink version :1.16.2
   Storage (HDFS/S3/GCS..) : huawei cloud OBS
   Running on Docker? (yes/no) :no
   flink runs in standalon mode
   
   Step 1: Write data
   sTableEnv.executeSql("CREATE TABLE t2(\n"
   + "  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,\n"
   + "  name VARCHAR(10),\n"
   + "  age INT,\n"
   + "  ts TIMESTAMP(3),\n"
   + "  `partition` VARCHAR(20)\n"
   + ")\n"
   + "PARTITIONED BY (`partition`)\n" +
   "with (\n" +
   "  'connector' = 'hudi'\n" +
   "  ,'path' = 
'obs://donson-mip-data-warehouse/dev/liudu/data/hudi_data'\n" +
   //"  ,'path' = 
'file:///Users/macbook/Downloads/obsa-hdfs-flink-obs/flink-hudi/src/test/hudi_data'\n"
 +
   //"  ,'table.type' = 'MERGE_ON_READ'\n" +
   ")");
   
   //sTableEnv.executeSql("insert into t2 select * from sourceT");
   
   sTableEnv.executeSql("INSERT INTO t2 VALUES\n"
   + "  ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),\n"
   + "  ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),\n"
   + "  ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),\n"
   + "  ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),\n"
   + "  ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),\n"
   + "  ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),\n"
   + "  ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),\n"
   + "  ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');");
   Step 2: Read data
   sTableEnv.executeSql("CREATE TABLE t2(\n"
   + "  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,\n"
   + "  name VARCHAR(10),\n"
   + "  age INT,\n"
   + "  ts TIMESTAMP(3),\n"
   + "  `partition` VARCHAR(20)\n"
   + ")\n"
   + "PARTITIONED BY (`partition`)\n" +
   "with (\n" +
   "  'connector' = 'hudi'\n" +
   "  ,'path' = 
'obs://donson-mip-data-warehouse/dev/liudu/data/hudi_data'\n" +
   //"  ,'path' = 
'file:///Users/macbook/Downloads/obsa-hdfs-flink-obs/flink-hudi/src/test/hudi_data'\n"
 +
   //"  ,'table.type' = 'MERGE_ON_READ'\n" +
   ")");
   
   sTableEnv.executeSql("select * from t2 ").print();





[GitHub] [hudi] codope commented on a diff in pull request #8526: [HUDI-6116] Optimize log block reading by removing seeks to check corrupted blocks.

2023-06-29 Thread via GitHub


codope commented on code in PR #8526:
URL: https://github.com/apache/hudi/pull/8526#discussion_r1246115508


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieLogBlock.java:
##
@@ -264,8 +267,9 @@ public static Option 
tryReadContent(FSDataInputStream inputStream, Integ
 
 // TODO re-use buffer if stream is backed by buffer
 // Read the contents in memory
-byte[] content = new byte[contentLength];
-inputStream.readFully(content, 0, contentLength);
+ValidationUtils.checkArgument(contentLength <= Integer.MAX_VALUE, 
String.format("Content length %d exceeds maximum value of %d", contentLength, 
Integer.MAX_VALUE));

Review Comment:
   What's the point of changing the `contentLength` from int to long type and 
then validating that it's less than Integer.MAX_VALUE?
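   For context, that validation only makes sense when the length arrives as a 
`long` and must be narrowed to an `int` before allocating the buffer that 
`readFully(byte[], int, int)` requires. A minimal sketch of that narrowing, 
with a hypothetical helper name (`toIntContentLength`, not from the Hudi 
codebase):

```java
// Illustrative sketch only (not the actual Hudi code): when a log block's
// content length is carried as a long, it must be proven to fit in an int
// before the cast that byte[] allocation and readFully(byte[], int, int)
// require. The helper name toIntContentLength is hypothetical.
public class ContentLengthCheck {
  static int toIntContentLength(long contentLength) {
    if (contentLength < 0 || contentLength > Integer.MAX_VALUE) {
      throw new IllegalArgumentException(String.format(
          "Content length %d exceeds maximum value of %d",
          contentLength, Integer.MAX_VALUE));
    }
    return (int) contentLength; // safe: value proven to fit in an int
  }

  public static void main(String[] args) {
    System.out.println(toIntContentLength(1024L)); // prints: 1024
    try {
      toIntContentLength(Integer.MAX_VALUE + 1L);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected oversized length");
    }
  }
}
```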






[GitHub] [hudi] hudi-bot commented on pull request #9089: [MINOR] Increase timeout for Azure CI: UT spark-datasource to 240 minutes

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9089:
URL: https://github.com/apache/hudi/pull/9089#issuecomment-1612600328

   
   ## CI report:
   
   * 4d2e8926188ce5aa2342054aeb99bf1d31eaf0e3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18190)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] thomasg19930417 commented on issue #9084: [SUPPORT] Historical Clean and RollBack commits are not archived

2023-06-29 Thread via GitHub


thomasg19930417 commented on issue #9084:
URL: https://github.com/apache/hudi/issues/9084#issuecomment-1612598176

   @danny0405 Are there any parameters to control this, or are there any 
instructions in the documentation?





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1612553846

   
   ## CI report:
   
   * 3360fa18333a0097fa762824f02eb9cd6c4bad5d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17958)
 
   * 3b6d13a83efdae5e46eebe9ae168ba7e0d8e9f34 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18189)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9089: [MINOR] Increase timeout for Azure CI: UT spark-datasource to 240 minutes

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9089:
URL: https://github.com/apache/hudi/pull/9089#issuecomment-1612545698

   
   ## CI report:
   
   * 4d2e8926188ce5aa2342054aeb99bf1d31eaf0e3 UNKNOWN
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9038:
URL: https://github.com/apache/hudi/pull/9038#issuecomment-1612545458

   
   ## CI report:
   
   * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN
   * 34f8823f48712c57058bc37c8936a276c1457557 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18188) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18187)
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1612545219

   
   ## CI report:
   
   * 3360fa18333a0097fa762824f02eb9cd6c4bad5d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17958)
   * 3b6d13a83efdae5e46eebe9ae168ba7e0d8e9f34 UNKNOWN
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9038:
URL: https://github.com/apache/hudi/pull/9038#issuecomment-1612536986

   
   ## CI report:
   
   * 1f61b83797a35d3d960f4bee865b14772931a4d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18178)
   * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN
   * 34f8823f48712c57058bc37c8936a276c1457557 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18187) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18188)
   
   





[GitHub] [hudi] lokeshj1703 opened a new pull request, #9089: [MINOR] Increase timeout for Azure CI: UT spark-datasource to 240 minutes

2023-06-29 Thread via GitHub


lokeshj1703 opened a new pull request, #9089:
URL: https://github.com/apache/hudi/pull/9089

   ### Change Logs
   
   The Azure CI UT spark-datasource job frequently hits the current 3-hour timeout. This PR increases the timeout to 4 hours (240 minutes).
   
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=17956=logs=b1544eb9-7ff1-5db9-0187-3e05abf459bc=e0ae894b-41c9-5f4b-7ed2-bdf5243b02e7
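   For context, the kind of edit this implies can be sketched in an Azure Pipelines YAML job definition; the job name, previous value, and test step below are illustrative assumptions, not copied from Hudi's actual azure-pipelines.yml (`timeoutInMinutes` is the standard Azure Pipelines job-level timeout key):

   ```yaml
   jobs:
     - job: UT_spark_datasource          # illustrative job name
       timeoutInMinutes: 240             # assumed raised from 180 (3 h) to 240 (4 h)
       steps:
         - script: mvn test -pl hudi-spark-datasource   # placeholder test step
   ```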
 
   
   ### Impact
   
   NA
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant

2023-06-29 Thread via GitHub


hudi-bot commented on PR #9038:
URL: https://github.com/apache/hudi/pull/9038#issuecomment-1612497554

   
   ## CI report:
   
   * 1f61b83797a35d3d960f4bee865b14772931a4d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18178)
   * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN
   * 34f8823f48712c57058bc37c8936a276c1457557 UNKNOWN
   
   





[GitHub] [hudi] hudi-bot commented on pull request #8933: [HUDI-5329] Spark reads table error when Flink creates table without record key and primary key

2023-06-29 Thread via GitHub


hudi-bot commented on PR #8933:
URL: https://github.com/apache/hudi/pull/8933#issuecomment-1612497373

   
   ## CI report:
   
   * 9ab390d9f29c63cdd7a07da37ce1899cb43ce330 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17912)
   * d1564f421664fd2dee15dfdbdae4dec07baedf92 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18186)
   
   




