[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6
hudi-bot commented on PR #9063: URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613601746 ## CI report: * 4775dce07f2f3237b32f22b360f3423b1eafce85 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18191) * af66542fd96990611c79e90c943a18341442 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18203) * 2aafcc1737e74d9569531d5efc5faf8c5d1b33ec UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] MathurCodes1 opened a new issue, #9096: [SUPPORT] Unable to alter column name for a Hudi table.
MathurCodes1 opened a new issue, #9096: URL: https://github.com/apache/hudi/issues/9096 **Describe the problem you faced** I'm unable to alter a column name of a Hudi table. Running spark.sql("ALTER TABLE customer_db.customer RENAME COLUMN subid TO subidentifier") fails to change the column name and throws the following error: **RENAME COLUMN is only supported with v2 tables** **To Reproduce** ```
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.{GlueArgParser, Job}
import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.JavaConverters._
import scala.collection.mutable

object ReportingJob {

  var spark: SparkSession = _
  var glueContext: GlueContext = _

  def main(inputParams: Array[String]): Unit = {
    val args: Map[String, String] = GlueArgParser.getResolvedOptions(inputParams, Seq("JOB_NAME").toArray)
    val sysArgs: mutable.Map[String, String] = scala.collection.mutable.Map(args.toSeq: _*)
    implicit val glueContext: GlueContext = init(sysArgs)
    implicit val spark: SparkSession = glueContext.getSparkSession
    import spark.implicits._

    val partitionColumnName: String = "id"
    val hudiTableName: String = "Customer"
    val preCombineKey: String = "id"
    val recordKey = "id"
    val basePath = "s3://aws-amazon-uk/customer/production/"

    val df = Seq((123, "1", "seq1"), (124, "0", "seq2")).toDF("id", "subid", "subseq")

    val hudiCommonOptions: Map[String, String] = Map(
      "hoodie.table.name" -> hudiTableName,
      "hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
      "hoodie.datasource.write.precombine.field" -> preCombineKey,
      "hoodie.datasource.write.recordkey.field" -> recordKey,
      "hoodie.datasource.write.operation" -> "bulk_insert",
      //"hoodie.datasource.write.operation" -> "upsert",
      "hoodie.datasource.write.row.writer.enable" -> "true",
      "hoodie.datasource.write.reconcile.schema" -> "true",
      "hoodie.datasource.write.partitionpath.field" -> partitionColumnName,
      "hoodie.datasource.write.hive_style_partitioning" -> "true",
      // "hoodie.bulkinsert.shuffle.parallelism" -> "2000",
      // "hoodie.upsert.shuffle.parallelism" -> "400",
      "hoodie.datasource.hive_sync.enable" -> "true",
      "hoodie.datasource.hive_sync.table" -> hudiTableName,
      "hoodie.datasource.hive_sync.database" -> "customer_db",
      "hoodie.datasource.hive_sync.partition_fields" -> partitionColumnName,
      "hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
      "hoodie.datasource.hive_sync.use_jdbc" -> "false",
      "hoodie.combine.before.upsert" -> "true",
      "hoodie.avro.schema.external.transformation" -> "true",
      "hoodie.schema.on.read.enable" -> "true",
      "hoodie.datasource.write.schema.allow.auto.evolution.column.drop" -> "true",
      "hoodie.index.type" -> "BLOOM",
      "spark.hadoop.parquet.avro.write-old-list-structure" -> "false",
      DataSourceWriteOptions.TABLE_TYPE.key() -> "COPY_ON_WRITE"
    )

    df.write.format("org.apache.hudi")
      .options(hudiCommonOptions)
      .mode(SaveMode.Overwrite)
      .save(basePath + hudiTableName)

    spark.sql("ALTER TABLE customer_db.customer RENAME COLUMN subid TO subidentifier")

    commit()
  }

  def commit(): Unit = {
    Job.commit()
  }

  def init(sysArgs: mutable.Map[String, String]): GlueContext = {
    val conf = new SparkConf()
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "CORRECTED")
    conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")
    conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")
    conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
    conf.set("spark.sql.avro.datetimeRebaseModeInRead", "CORRECTED")
    val sparkContext = new SparkContext(conf)
    glueContext = new GlueContext(sparkContext)
    Job.init(sysArgs("JOB_NAME"), glueContext, sysArgs.asJava)
    glueContext
  }
}
```
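For context, Spark raises "RENAME COLUMN is only supported with v2 tables" when the session is not routed through Hudi's SQL extensions and catalog. Below is a minimal sketch of the usual remedy based on Hudi's documented Spark SQL / schema-on-read settings; whether it applies in the Glue environment above is an assumption, and the table/column names are simply the ones from this report:

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch: register Hudi's SQL extensions and catalog so that
// schema-evolution DDL such as RENAME COLUMN is handled by Hudi rather
// than rejected by Spark's v1 catalog.
val spark = SparkSession.builder()
  .appName("hudi-rename-column")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
  .getOrCreate()

// Schema-on-read must be enabled for this kind of evolution.
spark.sql("set hoodie.schema.on.read.enable=true")
spark.sql("ALTER TABLE customer_db.customer RENAME COLUMN subid TO subidentifier")
```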
[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.
hudi-bot commented on PR #8837: URL: https://github.com/apache/hudi/pull/8837#issuecomment-1613593548 ## CI report: * 9751b6399ebf6b629f3940d612bdfe2e2005a25f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18172) * 50a92342798b808ebe521d82b99e4622eeb77ce8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18207) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a diff in pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.
nsivabalan commented on code in PR #9058: URL: https://github.com/apache/hudi/pull/9058#discussion_r1246951472 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java: ## @@ -209,9 +211,10 @@ public class HoodieMetadataPayload implements HoodieRecordPayload orderingVal) { -this(Option.of(record)); + public HoodieMetadataPayload(@Nullable GenericRecord record, Comparable orderingVal) { +this(Option.ofNullable(record)); Review Comment: https://github.com/apache/hudi/blob/dc3aa399ffc4875abba7be5833ebabca222eb6ff/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java#L292 We issue deletes to RLI using EmptyRecordPayload, which goes in as a Delete Log Block. When we deserialize this (read path), it goes here, where we try to instantiate the respective payload using reflection. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a diff in pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.
nsivabalan commented on code in PR #9058: URL: https://github.com/apache/hudi/pull/9058#discussion_r1246949788 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java: ## @@ -283,6 +285,8 @@ public HoodieMetadataPayload(Option recordOpt) { Integer.parseInt(recordIndexRecord.get(RECORD_INDEX_FIELD_FILE_INDEX).toString()), Long.parseLong(recordIndexRecord.get(RECORD_INDEX_FIELD_INSTANT_TIME).toString())); } +} else { + this.isDeletedRecord = true; Review Comment: Hey folks, here is the context. I feel we should go this route; there may also be opportunities to optimize col stats and bloom filter records. Generally, any payload should have a key and, preferably, a top-level field to denote isDeleted. So, if an entire record needs to be deleted, we can rely on the top-level isDeleted field. This is unavoidable since we write using EmptyHoodieRecordPayload in some flows (delete) but read back using the specific payload class, so every payload has to support deserializing an EmptyRecordPayload. Now, let's go into the specifics. RLI: commit1 adds key1 to the RLI partition; rolling back commit1 deletes key1 from the RLI partition. From a HoodieRecord standpoint, it's as simple as adding a new entry and then deleting the same one. It's simpler, and our getInsertValue or combineAndGetUpdateValue will be fast. If we push isDeleted into HoodieRecordIndexInfo, then we need to explicitly set the type, parse the HoodieRecordIndexInfo data, and only then deduce that it's deleted. Again, with EmptyRecordPayload this is not even doable, so we have to go with this approach. Why did we not have this issue before? With FILES, the keys are partitions, and hence, except for delete_partition, no FILES records are deleted in their entirety. With col stats, a delete, while writing to the MDT partition, is yet another upsert record with isDeleted set within the ColumnStats metadata, so our getInsertValue or combineAndGetUpdateValue needs to deserialize the entire record and then deduce that it's deleted. The right fix there would also be to do what we are doing with RLI in this patch: in commit1, add col1_part1_file1 : value to the MDT; in some later commit X, when file1 is deleted, just delete col1_part1_file1 from the col stats partition in the MDT using EmptyRecordPayload. Then log record reading and compaction will be fast. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
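To make the delete-handling idea above concrete, here is a small, hypothetical sketch (not the actual HoodieMetadataPayload code; all names are illustrative): a payload constructed from a nullable record marks itself deleted when the record is absent, which is exactly what deserializing an EmptyRecordPayload produces, and no inner metadata needs to be parsed to detect the delete.

```scala
// Hypothetical simplification of the pattern discussed above.
case class MetadataEntry(key: String, payload: Map[String, String])

class SimplifiedMetadataPayload(recordOpt: Option[MetadataEntry]) {

  // Secondary constructor mirroring Option.ofNullable(record): a null record means "delete".
  def this(nullableRecord: MetadataEntry) = this(Option(nullableRecord))

  // Top-level delete flag: the key is known to be gone without parsing the inner metadata.
  val isDeletedRecord: Boolean = recordOpt.isEmpty

  // Merge semantics: a delete wins over whatever was stored before.
  def combine(previous: SimplifiedMetadataPayload): SimplifiedMetadataPayload =
    if (isDeletedRecord) this else new SimplifiedMetadataPayload(recordOpt)
}

object SimplifiedMetadataPayloadExample extends App {
  val inserted = new SimplifiedMetadataPayload(MetadataEntry("key1", Map("fileId" -> "f1")))
  val deleted  = new SimplifiedMetadataPayload(null.asInstanceOf[MetadataEntry])
  println(inserted.isDeletedRecord) // false
  println(deleted.isDeletedRecord)  // true
}
```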
[jira] [Updated] (HUDI-6426) Upgrade Spark 3.4.1
[ https://issues.apache.org/jira/browse/HUDI-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated HUDI-6426: Fix Version/s: 0.14.0 Priority: Blocker (was: Major) > Upgrade Spark 3.4.1 > --- > > Key: HUDI-6426 > URL: https://issues.apache.org/jira/browse/HUDI-6426 > Project: Apache Hudi > Issue Type: Task >Reporter: Rahil Chertara >Priority: Blocker > Fix For: 0.14.0 > > > Spark 3.4.1 rc1 is out [https://github.com/apache/spark/tree/v3.4.1-rc1] we > should start the upgrade process for this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.
hudi-bot commented on PR #8837: URL: https://github.com/apache/hudi/pull/8837#issuecomment-1613571050 ## CI report: * 9751b6399ebf6b629f3940d612bdfe2e2005a25f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18172) * 50a92342798b808ebe521d82b99e4622eeb77ce8 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8609: [HUDI-6154] Introduced retry while reading hoodie.properties to deal with parallel updates.
hudi-bot commented on PR #8609: URL: https://github.com/apache/hudi/pull/8609#issuecomment-1613560816 ## CI report: * a64034d612fa64c99dd8d319ac00680924773f53 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18197) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] splate commented on pull request #3391: [HUDI-83] Fix Timestamp/Date type read by Hive3
splate commented on PR #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1613554365 Would this bug also exist in the Spark Hudi libraries used in AWS Glue? My issue is that I am trying to use Spark SQL to query a Hudi table and put the result into a Spark DataFrame, and I am getting a casting exception ("java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable"). Could that be related to this issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9066: [HUDI-6452] Add MOR snapshot reader to integrate with query engines without using Hadoop APIs
hudi-bot commented on PR #9066: URL: https://github.com/apache/hudi/pull/9066#issuecomment-1613551942 ## CI report: * 60c1b8c5885fdda28e07f3ba79290f01dc60a9c4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18196) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (05435bb0344 -> dc3aa399ffc)
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 05435bb0344 [MINOR] Increase timeout for Azure CI: UT spark-datasource to 240 minutes (#9089) add dc3aa399ffc [HUDI-6393] Enable MOR support for Record index with functional test cases (#9017) No new revisions were added by this update. Summary of changes: .../metadata/HoodieBackedTableMetadataWriter.java | 5 - .../hudi/metadata/HoodieBackedTableMetadata.java | 4 + .../hudi/functional/TestRecordLevelIndex.scala | 608 + .../org/apache/hudi/util/JavaConversions.scala | 23 +- 4 files changed, 625 insertions(+), 15 deletions(-) create mode 100644 hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestRecordLevelIndex.scala copy hudi-utilities/src/main/java/org/apache/hudi/utilities/exception/HoodieIncrementalPullException.java => hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/util/JavaConversions.scala (65%)
[GitHub] [hudi] xushiyan merged pull request #9017: [HUDI-6393] Enable MOR support for Record index with functional test cases
xushiyan merged PR #9017: URL: https://github.com/apache/hudi/pull/9017 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on pull request #9017: [HUDI-6393] Enable MOR support for Record index with functional test cases
xushiyan commented on PR #9017: URL: https://github.com/apache/hudi/pull/9017#issuecomment-1613510452 CI is timing out as expected. The newly added test case is passing. Will land this now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan merged pull request #9089: [MINOR] Increase timeout for Azure CI: UT spark-datasource to 240 minutes
nsivabalan merged PR #9089: URL: https://github.com/apache/hudi/pull/9089 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (8def3e68ae5 -> 05435bb0344)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 8def3e68ae5 [MINOR] Improve CollectionUtils helper methods (#9088) add 05435bb0344 [MINOR] Increase timeout for Azure CI: UT spark-datasource to 240 minutes (#9089) No new revisions were added by this update. Summary of changes: azure-pipelines-20230430.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[GitHub] [hudi] hudi-bot commented on pull request #9083: PKLess Merge Into
hudi-bot commented on PR #9083: URL: https://github.com/apache/hudi/pull/9083#issuecomment-1613500714 ## CI report: * be6801e9ca41f00576a511c7d3ffe144e90717ee Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18179) * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN * 767eb9cc26d98ed8e64632f98ab688aa4145e5aa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18204) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.
hudi-bot commented on PR #9058: URL: https://github.com/apache/hudi/pull/9058#issuecomment-1613500487 ## CI report: * 1697d1bfa095ca16a9361e3728a77331d3a28037 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18195) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9083: PKLess Merge Into
hudi-bot commented on PR #9083: URL: https://github.com/apache/hudi/pull/9083#issuecomment-1613490095 ## CI report: * be6801e9ca41f00576a511c7d3ffe144e90717ee Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18179) * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN * 767eb9cc26d98ed8e64632f98ab688aa4145e5aa UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6
hudi-bot commented on PR #9063: URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613489865 ## CI report: * 4775dce07f2f3237b32f22b360f3423b1eafce85 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18191) * af66542fd96990611c79e90c943a18341442 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18203) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6
hudi-bot commented on PR #9063: URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613476228 ## CI report: * 4775dce07f2f3237b32f22b360f3423b1eafce85 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18191) * af66542fd96990611c79e90c943a18341442 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9083: PKLess Merge Into
hudi-bot commented on PR #9083: URL: https://github.com/apache/hudi/pull/9083#issuecomment-1613476471 ## CI report: * be6801e9ca41f00576a511c7d3ffe144e90717ee Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18179) * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9017: [HUDI-6393] Enable MOR support for Record index with functional test cases
hudi-bot commented on PR #9017: URL: https://github.com/apache/hudi/pull/9017#issuecomment-1613475988 ## CI report: * ceffe7d8146f48e1c6c083613646463c1404a77f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18194) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] BBency opened a new issue, #9094: Async Clustering failing with errors for MOR table
BBency opened a new issue, #9094: URL: https://github.com/apache/hudi/issues/9094 **Problem Description** We have a MOR table which is partitioned by yearmonth(MM). We would like to trigger async clustering after doing the compaction at the end of the day, so that we can stitch small files together into larger files. Async clustering for the table is failing. Below are the different approaches I tried and the error messages I got. **Hudi Config Used** ```
"hoodie.table.name" -> hudiTableName,
"hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
"hoodie.datasource.write.precombine.field" -> preCombineKey,
"hoodie.datasource.write.recordkey.field" -> recordKey,
"hoodie.datasource.write.operation" -> writeOperation,
"hoodie.datasource.write.row.writer.enable" -> "true",
"hoodie.datasource.write.reconcile.schema" -> "true",
"hoodie.datasource.write.partitionpath.field" -> partitionColumnName,
"hoodie.datasource.write.hive_style_partitioning" -> "true",
"hoodie.bulkinsert.sort.mode" -> "GLOBAL_SORT",
"hoodie.datasource.hive_sync.enable" -> "true",
"hoodie.datasource.hive_sync.table" -> hudiTableName,
"hoodie.datasource.hive_sync.database" -> databaseName,
"hoodie.datasource.hive_sync.partition_fields" -> partitionColumnName,
"hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
"hoodie.datasource.hive_sync.use_jdbc" -> "false",
"hoodie.combine.before.upsert" -> "true",
"hoodie.index.type" -> "BLOOM",
"spark.hadoop.parquet.avro.write-old-list-structure" -> "false",
"hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
"hoodie.compact.inline" -> "false",
"hoodie.compact.schedule.inline" -> "true",
"hoodie.compact.inline.trigger.strategy" -> "NUM_COMMITS",
"hoodie.compact.inline.max.delta.commits" -> "5",
"hoodie.cleaner.policy" -> "KEEP_LATEST_COMMITS",
"hoodie.cleaner.commits.retained" -> "3",
"hoodie.clustering.async.enabled" -> "true",
"hoodie.clustering.async.max.commits" -> "2",
"hoodie.clustering.execution.strategy.class" -> "org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy",
"hoodie.clustering.plan.strategy.sort.columns" -> recordKey,
"hoodie.clustering.plan.strategy.small.file.limit" -> "67108864",
"hoodie.clustering.plan.strategy.target.file.max.bytes" -> "134217728",
"hoodie.clustering.plan.strategy.max.bytes.per.group" -> "2147483648",
"hoodie.clustering.plan.strategy.max.num.groups" -> "150",
"hoodie.clustering.preserve.commit.metadata" -> "true"
``` **Approaches Tried** 1. Triggered a clustering job with running mode scheduleAndExecute. **Code Used** ```
val hudiClusterConfig = new HoodieClusteringJob.Config
hudiClusterConfig.basePath =
hudiClusterConfig.tableName =
hudiClusterConfig.runningMode = "scheduleAndExecute"
hudiClusterConfig.retryLastFailedClusteringJob = true

val configList: util.List[String] = new util.ArrayList()
configList.add("hoodie.clustering.async.enabled=true")
configList.add("hoodie.clustering.async.max.commits=2")
configList.add("hoodie.clustering.execution.strategy.class=org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy")
configList.add("hoodie.clustering.plan.strategy.sort.columns=")
configList.add("hoodie.clustering.plan.strategy.small.file.limit=67108864")
configList.add("hoodie.clustering.plan.strategy.target.file.max.bytes=134217728")
configList.add("hoodie.clustering.plan.strategy.max.bytes.per.group=2147483648")
configList.add("hoodie.clustering.plan.strategy.max.num.groups=150")
configList.add("hoodie.clustering.preserve.commit.metadata=true")
hudiClusterConfig.configs = configList

val hudiClusterJob = new HoodieClusteringJob(jsc, hudiClusterConfig)
val clusterStatus = hudiClusterJob.cluster(1)
println(clusterStatus)
``` **Stacktrace** ShuffleMapStage 87 (sortBy at RDDCustomColumnsSortPartitioner.java:64) failed in 1.098 s due to Job aborted due to stage failure: task 0.0 in stage 28.0 (TID 367) had a not serializable result: org.apache.avro.generic.GenericData$Record Serialization stack: - object not serializable (class: org.apache.avro.generic.GenericData$Record, value: 2. Used the procedure run_clustering to schedule and trigger clustering. We found that the replacecommit created through the procedure run had less data than the one created when clustering was scheduled from the code in approach 1. **Code Used** ```
query_run_clustering = f"call run_clustering(path => '{path}')"
spark_df_run_clustering = spark.sql(query_run_clustering)
spark_df_run_clustering.show()
``` **Stacktrace** An error
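The "had a not serializable result: org.apache.avro.generic.GenericData$Record" failure above is the kind of error that typically appears when the clustering job runs with Spark's default Java serializer. A hedged sketch of launching the standalone clustering job with Kryo enabled follows; the class names match the snippet above, but whether this resolves the reporter's case is an assumption.

```scala
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.{SparkConf, SparkContext}

// Hedged sketch: build the SparkContext used by HoodieClusteringJob with Kryo,
// since GenericData$Record is not Java-serializable.
val conf = new SparkConf()
  .setAppName("hoodie-clustering")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val jsc = new JavaSparkContext(new SparkContext(conf))
// ...then construct HoodieClusteringJob(jsc, hudiClusterConfig) exactly as in the snippet above.
```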
[GitHub] [hudi] xushiyan commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6
xushiyan commented on PR #9063: URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613418110 Manually verified the flow 0.13.1 -> 0.14.0-SNAPSHOT (this PR).

Before upgrade:
```
hoodie.table.version=5
hoodie.table.metadata.partitions=files
```
Upgrade:
```
./hudi-cli.sh
connect --path /tmp/hudi_trips_13_1_to_14_0_COPY_ON_WRITE
upgrade table --toVersion 6 --sparkMaster 'local[2]'
```
After upgrade:
```
hoodie.table.version=6
hoodie.table.metadata.partitions=files
```
Write data with RLI enabled:
```
hoodie.table.version=6
hoodie.table.metadata.partitions=files,record_index
```
RLI partition and hfiles created.

Downgrade:
```
downgrade table --toVersion 5 --sparkMaster 'local[2]'
```
After downgrade:
```
hoodie.table.version=5
hoodie.table.metadata.partitions=files
```
RLI partition is removed:
```
➜ ll /tmp/hudi_trips_13_1_to_14_0_COPY_ON_WRITE/.hoodie/metadata/record_index
ls: /tmp/hudi_trips_13_1_to_14_0_COPY_ON_WRITE/.hoodie/metadata/record_index: No such file or directory
```
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
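For reference, the "write data with RLI enabled" step above corresponds to a write of roughly this shape. The option keys and field names are assumptions based on the 0.14.0-SNAPSHOT configs and the quickstart-style table used in the verification; check them against the build under test.

```scala
// Hedged sketch (assumed option names): append a batch with the record-level index enabled,
// which is what populates the record_index partition under .hoodie/metadata.
df.write.format("hudi")
  .option("hoodie.table.name", "hudi_trips_13_1_to_14_0_COPY_ON_WRITE")
  .option("hoodie.datasource.write.recordkey.field", "uuid")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.metadata.enable", "true")
  .option("hoodie.metadata.record.index.enable", "true")
  .option("hoodie.index.type", "RECORD_INDEX")
  .mode("append")
  .save("/tmp/hudi_trips_13_1_to_14_0_COPY_ON_WRITE")
```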
[GitHub] [hudi] nsivabalan commented on issue #9079: [SUPPORT] Hudi delete not working when using UuidKeyGenerator
nsivabalan commented on issue #9079: URL: https://github.com/apache/hudi/issues/9079#issuecomment-1613408548 This is a known limitation of the UUID key generator. This key gen is generally meant to be used only for immutable data. With 0.14.0, we are adding pk-less (primary-key-less) tables, where you can use Spark SQL DELETE to delete records. But this is coming in 0.14.0, and we don't have any such support in prior versions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
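As a rough illustration of what that 0.14.0 workflow would look like (the table name and predicate are made up, and the pk-less DELETE support described above is the assumption here):

```scala
// Hedged sketch: with a pk-less table in 0.14.0, deletes are expressed as plain Spark SQL.
// Requires Hudi's SQL extensions to be registered on the SparkSession.
spark.sql("DELETE FROM events WHERE event_id = 'e-123'")

// Prior to 0.14.0, with UuidKeyGenerator the generated record keys are not reproducible
// from the incoming data, so there is no way to target the same record again for deletion.
```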
[GitHub] [hudi] noahtaite commented on issue #9067: [SUPPORT] Manual Glue sync for large, highly partitioned table failing
noahtaite commented on issue #9067: URL: https://github.com/apache/hudi/issues/9067#issuecomment-1613377080 Hello @danny0405 @ad1happy2go I can confirm 0.13.1 works nicely as the HMS sync mode now supports batching and boolean values (conditional sync). thank you for the support -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] noahtaite closed issue #9067: [SUPPORT] Manual Glue sync for large, highly partitioned table failing
noahtaite closed issue #9067: [SUPPORT] Manual Glue sync for large, highly partitioned table failing URL: https://github.com/apache/hudi/issues/9067 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] gamblewin opened a new issue, #9093: [SUPPORT] Is it allowed using Flink Table API sqlQuery() to read data from hudi tables?
gamblewin opened a new issue, #9093: URL: https://github.com/apache/hudi/issues/9093 **Describe the problem you faced** I'm trying to use the Flink Table API sqlQuery to read data from a Hudi table, but it is not working. Am I doing it wrong, or does Hudi not support querying data this way? **Code** ```java
sEnv = StreamExecutionEnvironment.getExecutionEnvironment();
sTableEnv = StreamTableEnvironment.create(sEnv);
sEnv.setParallelism(1);
sEnv.enableCheckpointing(3000);

// create table
String createTableSql = "create table dept(\n" +
    " dept_id BIGINT PRIMARY KEY NOT ENFORCED,\n" +
    " dept_name varchar(10),\n" +
    " ts timestamp(3)\n" +
    ")\n" +
    "with (\n" +
    " 'connector' = 'hudi',\n" +
    " 'path' = 'hdfs://localhost:9000/hudi/dept',\n" +
    " 'table.type' = 'MERGE_ON_READ'\n" +
    ")";
sTableEnv.executeSql(createTableSql);

// insert data
sTableEnv.executeSql("insert into dept values (1, 'a', NOW()), (2, 'b', NOW())");

// query data
Table table = sTableEnv.sqlQuery("select * from dept");
DataStream dataStream = sTableEnv.toDataStream(table);
// there's nothing to print
dataStream.print();
``` **Environment Description** * Hudi version : 1.12.0 * Hadoop version : 3.1.3 * Flink version: 1.13.6 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
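One thing that often explains empty output in code like the above (a hedged guess, not a confirmed diagnosis): `executeSql` submits the INSERT asynchronously, and a DataStream sink never runs without an explicit `execute()`. A sketch of the same flow with those two calls added, continuing the Java snippet above (reusing its `sEnv`/`sTableEnv` variables and assuming Flink 1.13 APIs):

```java
// Assumes the enclosing main() declares `throws Exception`.
// Wait for the async INSERT job to finish before querying the table.
TableResult insertResult = sTableEnv.executeSql(
    "insert into dept values (1, 'a', NOW()), (2, 'b', NOW())");
insertResult.await();

// Bounded snapshot read of the Hudi table via sqlQuery.
Table table = sTableEnv.sqlQuery("select * from dept");
DataStream<Row> dataStream = sTableEnv.toDataStream(table);
dataStream.print();

// The print() sink only produces output once the job is actually executed.
sEnv.execute("query-dept");
```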
[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant
hudi-bot commented on PR #9038: URL: https://github.com/apache/hudi/pull/9038#issuecomment-1613357939 ## CI report: * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN * 34f8823f48712c57058bc37c8936a276c1457557 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18187) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18193) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18188) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] ad1happy2go commented on issue #9086: [SUPPORT]How to build with scala 2.11 for spark and scala2.12 for flink
ad1happy2go commented on issue #9086: URL: https://github.com/apache/hudi/issues/9086#issuecomment-1613339717 @bigdata-spec I don't think we can build with different Scala versions in a single build. You may need to build it twice and then use the Spark and Flink jars from the separate artifacts. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
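Building twice might look roughly like this. This is only a sketch: the exact profile/property names depend on the Hudi version's pom.xml and are an assumption here.

```
# Build the Spark bundle against Scala 2.11, then the Flink bundle against Scala 2.12.
mvn clean package -DskipTests -Dscala-2.11
mvn clean package -DskipTests -Dscala-2.12
# Pick hudi-spark-bundle from the first build and hudi-flink-bundle from the second.
```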
[GitHub] [hudi] ad1happy2go commented on issue #9091: [BUG] Use NonpartitionedKeyGenerator WriteOperationType BULK_INSERT and UPSERT get different _hoodie_record_key format
ad1happy2go commented on issue #9091: URL: https://github.com/apache/hudi/issues/9091#issuecomment-1613328306 @lipusheng This is a known issue which was fixed in Hudi 0.13.x. Refer to this GitHub issue - https://github.com/apache/hudi/issues/8981 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9082: [HUDI-6445] Distribute spark ds func tests
hudi-bot commented on PR #9082: URL: https://github.com/apache/hudi/pull/9082#issuecomment-1613268792 ## CI report: * c529c624afdca331514a2bdfb78cc6e18ab9f57a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18185) * 474ce7e9a78909fe90b0641f7be1b059084bb11a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18202) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9082: [HUDI-6445] Distribute spark ds func tests
hudi-bot commented on PR #9082: URL: https://github.com/apache/hudi/pull/9082#issuecomment-1613254158 ## CI report: * c529c624afdca331514a2bdfb78cc6e18ab9f57a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18185) * 474ce7e9a78909fe90b0641f7be1b059084bb11a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9041: [HUDI-6431] Support update partition path in record-level index
hudi-bot commented on PR #9041: URL: https://github.com/apache/hudi/pull/9041#issuecomment-1613253794 ## CI report: * b681df04a7ad0febbcd9235622c2ee7f98759cf9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18082) * 2f139383c54f93669342539af77dca9b3a352be3 UNKNOWN * a1458e17e5749a89948be8f60387eeecd4c0f87c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18201) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] KenjiFujima commented on pull request #8933: [HUDI-5329] Spark reads table error when Flink creates table without record key and primary key
KenjiFujima commented on PR #8933: URL: https://github.com/apache/hudi/pull/8933#issuecomment-1613251280 @danny0405, I have addressed the above comments. PTAL. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index
xushiyan commented on code in PR #9041: URL: https://github.com/apache/hudi/pull/9041#discussion_r1246658153 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -459,11 +459,6 @@ private Pair> initializeRecordIndexPartition() final HoodieMetadataFileSystemView fsView = new HoodieMetadataFileSystemView(dataMetaClient, dataMetaClient.getActiveTimeline(), metadata); -// MOR tables are not supported -if (!dataMetaClient.getTableType().equals(HoodieTableType.COPY_ON_WRITE)) { - throw new HoodieMetadataException("Only COW tables are supported with record index"); -} - Review Comment: this change will be included in functional test PR (which should be merged first). i include it here for CI to pass. when merging, this diff should be auto-resolved. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index
xushiyan commented on code in PR #9041: URL: https://github.com/apache/hudi/pull/9041#discussion_r1246655960 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java: ## @@ -310,6 +312,56 @@ public static HoodieData> mergeForPartitionUpdates( return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator(); } }); -return taggedUpdatingRecords.union(newRecords); +return taggedUpdatingRecords.union(taggedNewRecords); + } + + public static HoodieData> tagGlobalLocationBackToRecords( + HoodieData> incomingRecords, + HoodiePairData keyAndExistingLocations, + boolean mayContainDuplicateLookup, + boolean shouldUpdatePartitionPath, + HoodieWriteConfig config, + HoodieTable table) { +final HoodieRecordMerger merger = config.getRecordMerger(); + +HoodiePairData> keyAndIncomingRecords = +incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record)); + +// Pair of incoming record and the global location if meant for merged lookup in later stage +HoodieData, Option>> incomingRecordsAndLocations += keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values() +.map(v -> { + final HoodieRecord incomingRecord = v.getLeft(); + Option currentLocOpt = Option.ofNullable(v.getRight().orElse(null)); + if (currentLocOpt.isPresent()) { +HoodieRecordGlobalLocation currentLoc = currentLocOpt.get(); +boolean shouldPerformMergedLookUp = mayContainDuplicateLookup +|| !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath()); +if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) { + return Pair.of(incomingRecord, currentLocOpt); +} else { + // - When update partition path is set to false, + // the incoming record will be tagged to the existing record's partition regardless of being equal or not. + // - When update partition path is set to true, + // the incoming record will be tagged to the existing record's partition + // when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI). + return Pair.of((HoodieRecord) getTaggedRecord( Review Comment: this was having Option.empty() as right of the pair and it won't be merged-lookup candidates -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index
codope commented on code in PR #9041: URL: https://github.com/apache/hudi/pull/9041#discussion_r1246655286 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java: ## @@ -310,6 +312,56 @@ public static HoodieData> mergeForPartitionUpdates( return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator(); } }); -return taggedUpdatingRecords.union(newRecords); +return taggedUpdatingRecords.union(taggedNewRecords); + } + + public static HoodieData> tagGlobalLocationBackToRecords( + HoodieData> incomingRecords, + HoodiePairData keyAndExistingLocations, + boolean mayContainDuplicateLookup, + boolean shouldUpdatePartitionPath, + HoodieWriteConfig config, + HoodieTable table) { +final HoodieRecordMerger merger = config.getRecordMerger(); + +HoodiePairData> keyAndIncomingRecords = +incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record)); + +// Pair of incoming record and the global location if meant for merged lookup in later stage +HoodieData, Option>> incomingRecordsAndLocations += keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values() +.map(v -> { + final HoodieRecord incomingRecord = v.getLeft(); + Option currentLocOpt = Option.ofNullable(v.getRight().orElse(null)); + if (currentLocOpt.isPresent()) { +HoodieRecordGlobalLocation currentLoc = currentLocOpt.get(); +boolean shouldPerformMergedLookUp = mayContainDuplicateLookup +|| !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath()); +if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) { + return Pair.of(incomingRecord, currentLocOpt); +} else { + // - When update partition path is set to false, + // the incoming record will be tagged to the existing record's partition regardless of being equal or not. + // - When update partition path is set to true, + // the incoming record will be tagged to the existing record's partition + // when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI). + return Pair.of((HoodieRecord) getTaggedRecord( + createNewHoodieRecord(incomingRecord, currentLoc, merger), Option.of(currentLoc)), + Option.empty()); +} + } else { +return Pair.of(getTaggedRecord(incomingRecord, Option.empty()), Option.empty()); + } +}); +return shouldUpdatePartitionPath +? mergeForPartitionUpdatesIfNeeded(incomingRecordsAndLocations, config, table) Review Comment: yeah we need to consider duplicates, otherwise we'll have to special-case for RLI. ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -459,11 +459,6 @@ private Pair> initializeRecordIndexPartition() final HoodieMetadataFileSystemView fsView = new HoodieMetadataFileSystemView(dataMetaClient, dataMetaClient.getActiveTimeline(), metadata); -// MOR tables are not supported -if (!dataMetaClient.getTableType().equals(HoodieTableType.COPY_ON_WRITE)) { - throw new HoodieMetadataException("Only COW tables are supported with record index"); -} - Review Comment: Would prefer to land it in a separate commit. I guess #9017 will land earlier anyway. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index
xushiyan commented on code in PR #9041: URL: https://github.com/apache/hudi/pull/9041#discussion_r1241059669 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java: ## @@ -310,6 +312,56 @@ public static HoodieData> mergeForPartitionUpdates( return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator(); } }); -return taggedUpdatingRecords.union(newRecords); +return taggedUpdatingRecords.union(taggedNewRecords); + } + + public static HoodieData> tagGlobalLocationBackToRecords( + HoodieData> incomingRecords, + HoodiePairData keyAndExistingLocations, + boolean mayContainDuplicateLookup, + boolean shouldUpdatePartitionPath, + HoodieWriteConfig config, + HoodieTable table) { +final HoodieRecordMerger merger = config.getRecordMerger(); + +HoodiePairData> keyAndIncomingRecords = +incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record)); + +// Pair of incoming record and the global location if meant for merged lookup in later stage +HoodieData, Option>> incomingRecordsAndLocations += keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values() +.map(v -> { + final HoodieRecord incomingRecord = v.getLeft(); + Option currentLocOpt = Option.ofNullable(v.getRight().orElse(null)); + if (currentLocOpt.isPresent()) { +HoodieRecordGlobalLocation currentLoc = currentLocOpt.get(); +boolean shouldPerformMergedLookUp = mayContainDuplicateLookup +|| !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath()); +if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) { + return Pair.of(incomingRecord, currentLocOpt); +} else { + // - When update partition path is set to false, + // the incoming record will be tagged to the existing record's partition regardless of being equal or not. + // - When update partition path is set to true, + // the incoming record will be tagged to the existing record's partition + // when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI). + return Pair.of((HoodieRecord) getTaggedRecord( Review Comment: new record creation needs optimization; i have not finished it yet. 
## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java: ## @@ -310,6 +312,56 @@ public static HoodieData> mergeForPartitionUpdates( return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator(); } }); -return taggedUpdatingRecords.union(newRecords); +return taggedUpdatingRecords.union(taggedNewRecords); + } + + public static HoodieData> tagGlobalLocationBackToRecords( + HoodieData> incomingRecords, + HoodiePairData keyAndExistingLocations, + boolean mayContainDuplicateLookup, + boolean shouldUpdatePartitionPath, + HoodieWriteConfig config, + HoodieTable table) { +final HoodieRecordMerger merger = config.getRecordMerger(); + +HoodiePairData> keyAndIncomingRecords = +incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record)); + +// Pair of incoming record and the global location if meant for merged lookup in later stage +HoodieData, Option>> incomingRecordsAndLocations += keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values() +.map(v -> { + final HoodieRecord incomingRecord = v.getLeft(); + Option currentLocOpt = Option.ofNullable(v.getRight().orElse(null)); + if (currentLocOpt.isPresent()) { +HoodieRecordGlobalLocation currentLoc = currentLocOpt.get(); +boolean shouldPerformMergedLookUp = mayContainDuplicateLookup +|| !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath()); +if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) { + return Pair.of(incomingRecord, currentLocOpt); +} else { + // - When update partition path is set to false, + // the incoming record will be tagged to the existing record's partition regardless of being equal or not. + // - When update partition path is set to true, + // the incoming record will be tagged to the existing record's partition + // when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI). + return Pair.of((HoodieRecord) getTaggedRecord( Review Comment: refactored -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen
[GitHub] [hudi] xushiyan commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index
xushiyan commented on code in PR #9041: URL: https://github.com/apache/hudi/pull/9041#discussion_r1241059710 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java: ## @@ -310,6 +312,56 @@ public static HoodieData> mergeForPartitionUpdates( return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator(); } }); -return taggedUpdatingRecords.union(newRecords); +return taggedUpdatingRecords.union(taggedNewRecords); + } + + public static HoodieData> tagGlobalLocationBackToRecords( + HoodieData> incomingRecords, + HoodiePairData keyAndExistingLocations, + boolean mayContainDuplicateLookup, + boolean shouldUpdatePartitionPath, + HoodieWriteConfig config, + HoodieTable table) { +final HoodieRecordMerger merger = config.getRecordMerger(); + +HoodiePairData> keyAndIncomingRecords = +incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record)); + +// Pair of incoming record and the global location if meant for merged lookup in later stage +HoodieData, Option>> incomingRecordsAndLocations += keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values() +.map(v -> { + final HoodieRecord incomingRecord = v.getLeft(); + Option currentLocOpt = Option.ofNullable(v.getRight().orElse(null)); + if (currentLocOpt.isPresent()) { +HoodieRecordGlobalLocation currentLoc = currentLocOpt.get(); +boolean shouldPerformMergedLookUp = mayContainDuplicateLookup +|| !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath()); +if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) { + return Pair.of(incomingRecord, currentLocOpt); +} else { + // - When update partition path is set to false, + // the incoming record will be tagged to the existing record's partition regardless of being equal or not. + // - When update partition path is set to true, + // the incoming record will be tagged to the existing record's partition + // when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI). + return Pair.of((HoodieRecord) getTaggedRecord( + createNewHoodieRecord(incomingRecord, currentLoc, merger), Option.of(currentLoc)), + Option.empty()); +} + } else { +return Pair.of(getTaggedRecord(incomingRecord, Option.empty()), Option.empty()); + } +}); +return shouldUpdatePartitionPath +? mergeForPartitionUpdatesIfNeeded(incomingRecordsAndLocations, config, table) +: incomingRecordsAndLocations.map(Pair::getLeft); + } + + public static HoodieRecord createNewHoodieRecord(HoodieRecord oldRecord, HoodieRecordGlobalLocation location, HoodieRecordMerger merger) { +HoodieKey recordKey = new HoodieKey(oldRecord.getRecordKey(), location.getPartitionPath()); +return merger.getRecordType() == HoodieRecordType.AVRO Review Comment: new record creation needs optimization; i have not finished it yet. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index
codope commented on code in PR #9041: URL: https://github.com/apache/hudi/pull/9041#discussion_r1246648546 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieGlobalSimpleIndex.java: ## @@ -72,85 +68,37 @@ public HoodieData> tagLocation( protected HoodieData> tagLocationInternal( HoodieData> inputRecords, HoodieEngineContext context, HoodieTable hoodieTable) { - -HoodiePairData> keyedInputRecords = -inputRecords.mapToPair(entry -> new ImmutablePair<>(entry.getRecordKey(), entry)); -HoodiePairData allRecordLocationsInTable = -fetchAllRecordLocations(context, hoodieTable, config.getGlobalSimpleIndexParallelism()); -return getTaggedRecords(keyedInputRecords, allRecordLocationsInTable, hoodieTable); +List> latestBaseFiles = getAllBaseFilesInTable(context, hoodieTable); Review Comment: +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6
hudi-bot commented on PR #9063: URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613184716 ## CI report: * 4775dce07f2f3237b32f22b360f3423b1eafce85 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18191) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9041: [HUDI-6431] Support update partition path in record-level index
hudi-bot commented on PR #9041: URL: https://github.com/apache/hudi/pull/9041#issuecomment-1613184534 ## CI report: * b681df04a7ad0febbcd9235622c2ee7f98759cf9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18082) * 2f139383c54f93669342539af77dca9b3a352be3 UNKNOWN * a1458e17e5749a89948be8f60387eeecd4c0f87c UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-6459) Add Rollback test for Record Level Index
[ https://issues.apache.org/jira/browse/HUDI-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HUDI-6459: -- Summary: Add Rollback test for Record Level Index (was: Add Rollback validation for Record Level Index) > Add Rollback test for Record Level Index > > > Key: HUDI-6459 > URL: https://issues.apache.org/jira/browse/HUDI-6459 > Project: Apache Hudi > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > > The Jira aims to add validation for rollback with record level index. The > validation is added in TestRecordLevelIndex test. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6459) Add Rollback validation for Record Level Index
Lokesh Jain created HUDI-6459: - Summary: Add Rollback validation for Record Level Index Key: HUDI-6459 URL: https://issues.apache.org/jira/browse/HUDI-6459 Project: Apache Hudi Issue Type: Bug Reporter: Lokesh Jain Assignee: Lokesh Jain The Jira aims to add validation for rollback with record level index. The validation is added in TestRecordLevelIndex test. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString
hudi-bot commented on PR #9064: URL: https://github.com/apache/hudi/pull/9064#issuecomment-1613172562 ## CI report: * 2b572a55998c0e1c4eca7970e8f63ed79254161c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18127) * b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN * 9c6d2bf222b7247bc926302045123bad69157d39 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18198) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6
hudi-bot commented on PR #9063: URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613172489 ## CI report: * 4775dce07f2f3237b32f22b360f3423b1eafce85 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9041: [HUDI-6431] Support update partition path in record-level index
hudi-bot commented on PR #9041: URL: https://github.com/apache/hudi/pull/9041#issuecomment-1613172359 ## CI report: * b681df04a7ad0febbcd9235622c2ee7f98759cf9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18082) * 2f139383c54f93669342539af77dca9b3a352be3 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9092: [MINOR] Enable log compaction by default for MDT
hudi-bot commented on PR #9092: URL: https://github.com/apache/hudi/pull/9092#issuecomment-1613159780 ## CI report: * 408e9f946e0a0647b0fc9f8e220d55ad2fbde62d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18199) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9089: [MINOR] Increase timeout for Azure CI: UT spark-datasource to 240 minutes
hudi-bot commented on PR #9089: URL: https://github.com/apache/hudi/pull/9089#issuecomment-1613159726 ## CI report: * 4d2e8926188ce5aa2342054aeb99bf1d31eaf0e3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18190) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString
hudi-bot commented on PR #9064: URL: https://github.com/apache/hudi/pull/9064#issuecomment-1613159516 ## CI report: * 2b572a55998c0e1c4eca7970e8f63ed79254161c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18127) * b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN * 9c6d2bf222b7247bc926302045123bad69157d39 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6
hudi-bot commented on PR #9063: URL: https://github.com/apache/hudi/pull/9063#issuecomment-1613159448 ## CI report: * 4775dce07f2f3237b32f22b360f3423b1eafce85 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18191) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope commented on a diff in pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.
codope commented on code in PR #9058: URL: https://github.com/apache/hudi/pull/9058#discussion_r1246592430 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java: ## @@ -209,9 +211,10 @@ public class HoodieMetadataPayload implements HoodieRecordPayload orderingVal) { -this(Option.of(record)); + public HoodieMetadataPayload(@Nullable GenericRecord record, Comparable orderingVal) { +this(Option.ofNullable(record)); Review Comment: Where is this constructor used? ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java: ## @@ -283,6 +285,8 @@ public HoodieMetadataPayload(Option recordOpt) { Integer.parseInt(recordIndexRecord.get(RECORD_INDEX_FIELD_FILE_INDEX).toString()), Long.parseLong(recordIndexRecord.get(RECORD_INDEX_FIELD_INSTANT_TIME).toString())); } +} else { + this.isDeletedRecord = true; Review Comment: I would favor `isDeleted` field in `HoodieRecordIndexInfo` in the schema. 1. It keeps the schema consistent wrt deletes for different MDT index types. Let's say some index types have `isDeleted` and some don't, then it's an added mental burden for developers and also not easy to maintain as we add more indexes. 2. It gives enough flexibility to have separate delete handling logic for different index types. 3. Let's consider the semantics of the if-else in the `HoodieMetadataPayload` constructor. It is based on different index types. By setting `this.isDeletedRecord = true` in the last else-block we're saying that for all index types other than the ones above, consider the record to be deleted. It does not make much sense from the pov of adding more index types in the future. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
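A rough sketch of the trade-off discussed in the review above, assuming hypothetical types (this is not the real `HoodieMetadataPayload` or `HoodieRecordIndexInfo` schema): the PR infers a delete from an absent payload in the constructor's final else-block, while the reviewer suggests an explicit per-payload flag so each metadata-table index type keeps a consistent schema and its own delete semantics.

```java
import java.util.Optional;

public class DeleteMarkerSketch {

  // Hypothetical stand-in for a record-index payload carrying an explicit delete flag.
  static class RecordIndexInfo {
    final String partitionPath;
    final String fileId;
    final boolean isDeleted;

    RecordIndexInfo(String partitionPath, String fileId, boolean isDeleted) {
      this.partitionPath = partitionPath;
      this.fileId = fileId;
      this.isDeleted = isDeleted;
    }
  }

  // Approach in the PR under review: any index type without a payload is treated as deleted.
  static boolean deletedByInference(Optional<RecordIndexInfo> payload) {
    return !payload.isPresent();
  }

  // Approach suggested in the review: the payload itself marks whether it is a tombstone.
  static boolean deletedByExplicitFlag(RecordIndexInfo payload) {
    return payload.isDeleted;
  }

  public static void main(String[] args) {
    RecordIndexInfo live = new RecordIndexInfo("2023/06/29", "file-1", false);
    RecordIndexInfo tombstone = new RecordIndexInfo("2023/06/29", "file-1", true);
    System.out.println(deletedByInference(Optional.empty()));   // true
    System.out.println(deletedByExplicitFlag(live));            // false
    System.out.println(deletedByExplicitFlag(tombstone));       // true
  }
}
```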
[GitHub] [hudi] xushiyan commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index
xushiyan commented on code in PR #9041: URL: https://github.com/apache/hudi/pull/9041#discussion_r1246579974 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java: ## @@ -310,6 +312,56 @@ public static HoodieData> mergeForPartitionUpdates( return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator(); } }); -return taggedUpdatingRecords.union(newRecords); +return taggedUpdatingRecords.union(taggedNewRecords); + } + + public static HoodieData> tagGlobalLocationBackToRecords( + HoodieData> incomingRecords, + HoodiePairData keyAndExistingLocations, + boolean mayContainDuplicateLookup, + boolean shouldUpdatePartitionPath, + HoodieWriteConfig config, + HoodieTable table) { +final HoodieRecordMerger merger = config.getRecordMerger(); + +HoodiePairData> keyAndIncomingRecords = +incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record)); + +// Pair of incoming record and the global location if meant for merged lookup in later stage +HoodieData, Option>> incomingRecordsAndLocations += keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values() +.map(v -> { + final HoodieRecord incomingRecord = v.getLeft(); + Option currentLocOpt = Option.ofNullable(v.getRight().orElse(null)); + if (currentLocOpt.isPresent()) { +HoodieRecordGlobalLocation currentLoc = currentLocOpt.get(); +boolean shouldPerformMergedLookUp = mayContainDuplicateLookup +|| !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath()); +if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) { + return Pair.of(incomingRecord, currentLocOpt); +} else { + // - When update partition path is set to false, + // the incoming record will be tagged to the existing record's partition regardless of being equal or not. + // - When update partition path is set to true, + // the incoming record will be tagged to the existing record's partition + // when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI). + return Pair.of((HoodieRecord) getTaggedRecord( + createNewHoodieRecord(incomingRecord, currentLoc, merger), Option.of(currentLoc)), + Option.empty()); +} + } else { +return Pair.of(getTaggedRecord(incomingRecord, Option.empty()), Option.empty()); Review Comment: refactored relevant helper methods -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on a diff in pull request #9041: [HUDI-6431] Support update partition path in record-level index
xushiyan commented on code in PR #9041: URL: https://github.com/apache/hudi/pull/9041#discussion_r1246579590 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java: ## @@ -310,6 +312,56 @@ public static HoodieData> mergeForPartitionUpdates( return Arrays.asList(deleteRecord, getTaggedRecord(merged, Option.empty())).iterator(); } }); -return taggedUpdatingRecords.union(newRecords); +return taggedUpdatingRecords.union(taggedNewRecords); + } + + public static HoodieData> tagGlobalLocationBackToRecords( + HoodieData> incomingRecords, + HoodiePairData keyAndExistingLocations, + boolean mayContainDuplicateLookup, + boolean shouldUpdatePartitionPath, + HoodieWriteConfig config, + HoodieTable table) { +final HoodieRecordMerger merger = config.getRecordMerger(); + +HoodiePairData> keyAndIncomingRecords = +incomingRecords.mapToPair(record -> Pair.of(record.getRecordKey(), record)); + +// Pair of incoming record and the global location if meant for merged lookup in later stage +HoodieData, Option>> incomingRecordsAndLocations += keyAndIncomingRecords.leftOuterJoin(keyAndExistingLocations).values() +.map(v -> { + final HoodieRecord incomingRecord = v.getLeft(); + Option currentLocOpt = Option.ofNullable(v.getRight().orElse(null)); + if (currentLocOpt.isPresent()) { +HoodieRecordGlobalLocation currentLoc = currentLocOpt.get(); +boolean shouldPerformMergedLookUp = mayContainDuplicateLookup +|| !Objects.equals(incomingRecord.getPartitionPath(), currentLoc.getPartitionPath()); +if (shouldUpdatePartitionPath && shouldPerformMergedLookUp) { + return Pair.of(incomingRecord, currentLocOpt); +} else { + // - When update partition path is set to false, + // the incoming record will be tagged to the existing record's partition regardless of being equal or not. + // - When update partition path is set to true, + // the incoming record will be tagged to the existing record's partition + // when partition is not updated and the look-up won't have duplicates (e.g. COW, or using RLI). + return Pair.of((HoodieRecord) getTaggedRecord( + createNewHoodieRecord(incomingRecord, currentLoc, merger), Option.of(currentLoc)), Review Comment: fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated: [MINOR] Improve CollectionUtils helper methods (#9088)
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 8def3e68ae5 [MINOR] Improve CollectionUtils helper methods (#9088) 8def3e68ae5 is described below commit 8def3e68ae5a0b72eefe26db49b6d33226f7b4c0 Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com> AuthorDate: Thu Jun 29 05:35:19 2023 -0700 [MINOR] Improve CollectionUtils helper methods (#9088) --- .../action/clean/CleanPlanActionExecutor.java | 4 +-- .../action/commit/TestSchemaEvolutionClient.java | 3 +- .../table/action/rollback/TestRollbackUtils.java | 3 +- .../table/functional/TestCleanPlanExecutor.java| 2 +- .../apache/hudi/common/util/CollectionUtils.java | 35 +++--- .../hudi/common/table/TestTimelineUtils.java | 2 +- .../table/view/TestIncrementalFSViewSync.java | 3 +- .../hudi/common/testutils/HoodieTestTable.java | 8 ++--- 8 files changed, 23 insertions(+), 37 deletions(-) diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java index 043db1acbf9..ba7c71b1356 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java @@ -29,7 +29,6 @@ import org.apache.hudi.common.table.timeline.HoodieInstant; import org.apache.hudi.common.table.timeline.HoodieTimeline; import org.apache.hudi.common.table.timeline.TimelineMetadataUtils; import org.apache.hudi.common.util.CleanerUtils; -import org.apache.hudi.common.util.CollectionUtils; import org.apache.hudi.common.util.Option; import org.apache.hudi.common.util.collection.Pair; import org.apache.hudi.config.HoodieWriteConfig; @@ -42,6 +41,7 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; +import java.util.Collections; import java.util.List; import java.util.Map; import java.util.stream.Collectors; @@ -132,7 +132,7 @@ public class CleanPlanActionExecutor extends BaseActionExecutor new HoodieActionInstant(x.getTimestamp(), x.getAction(), x.getState().name())).orElse(null), planner.getLastCompletedCommitTimestamp(), - config.getCleanerPolicy().name(), CollectionUtils.createImmutableMap(), + config.getCleanerPolicy().name(), Collections.emptyMap(), CleanPlanner.LATEST_CLEAN_PLAN_VERSION, cleanOps, partitionsToDelete); } catch (IOException e) { throw new HoodieIOException("Failed to schedule clean operation", e); diff --git a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/table/action/commit/TestSchemaEvolutionClient.java b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/table/action/commit/TestSchemaEvolutionClient.java index bf825df570f..dc45a80754b 100644 --- a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/table/action/commit/TestSchemaEvolutionClient.java +++ b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/table/action/commit/TestSchemaEvolutionClient.java @@ -24,7 +24,6 @@ import org.apache.hudi.common.model.HoodieAvroRecord; import org.apache.hudi.common.model.HoodieKey; import org.apache.hudi.common.table.TableSchemaResolver; import org.apache.hudi.common.testutils.RawTripTestPayload; -import org.apache.hudi.common.util.CollectionUtils; import 
org.apache.hudi.config.HoodieWriteConfig; import org.apache.hudi.internal.schema.Types; import org.apache.hudi.testutils.HoodieJavaClientTestHarness; @@ -72,7 +71,7 @@ public class TestSchemaEvolutionClient extends HoodieJavaClientTestHarness { .withEngineType(EngineType.JAVA) .withPath(basePath) .withSchema(SCHEMA.toString()) - .withProps(CollectionUtils.createImmutableMap(HoodieWriteConfig.TBL_NAME.key(), "hoodie_test_table")) +.withProps(Collections.singletonMap(HoodieWriteConfig.TBL_NAME.key(), "hoodie_test_table")) .build(); return new HoodieJavaWriteClient<>(context, config); } diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackUtils.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackUtils.java index f03d9f3967d..c22a2aef424 100644 --- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackUtils.java +++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestRollbackUtils.java @@ -30,6 +30,7 @@ import org.apache.hadoop.fs.Path; import org.apache.hadoop.fs.permission.FsPermission; import org.junit.jupiter.api.Test; +import jav
[GitHub] [hudi] xushiyan merged pull request #9088: [MINOR] Improve CollectionUtils helper methods
xushiyan merged PR #9088: URL: https://github.com/apache/hudi/pull/9088 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9092: [MINOR] Enable log compaction by default for MDT
hudi-bot commented on PR #9092: URL: https://github.com/apache/hudi/pull/9092#issuecomment-1613076306 ## CI report: * 408e9f946e0a0647b0fc9f8e220d55ad2fbde62d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString
hudi-bot commented on PR #9064: URL: https://github.com/apache/hudi/pull/9064#issuecomment-1613075951 ## CI report: * 2b572a55998c0e1c4eca7970e8f63ed79254161c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18127) * b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8609: [HUDI-6154] Introduced retry while reading hoodie.properties to deal with parallel updates.
hudi-bot commented on PR #8609: URL: https://github.com/apache/hudi/pull/8609#issuecomment-1613056925 ## CI report: * e14bd41edf6cc961d77087eea67f755f23590834 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17992) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18115) * a64034d612fa64c99dd8d319ac00680924773f53 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18197) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-6458) Scheduling jobs should not fail when there is no completed commits
kwang created HUDI-6458: --- Summary: Scheduling jobs should not fail when there is no completed commits Key: HUDI-6458 URL: https://issues.apache.org/jira/browse/HUDI-6458 Project: Apache Hudi Issue Type: Improvement Reporter: kwang Fix For: 0.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] zaza commented on a diff in pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString
zaza commented on code in PR #9064: URL: https://github.com/apache/hudi/pull/9064#discussion_r1246538265 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/HoodieCDCRDD.scala: ## @@ -561,7 +561,7 @@ class HoodieCDCRDD( originTableSchema.structTypeSchema.zipWithIndex.foreach { case (field, idx) => if (field.dataType.isInstanceOf[StringType]) { -map(field.name) = record.getString(idx) +map(field.name) = Option(record.getUTF8String(idx)).map(_.toString).orNull } else { Review Comment: This is what I have based on my limited knowledge of Hudi: https://github.com/apache/hudi/pull/9064/commits/c88aee0f26afa779594a9981d86aeb3d06727d4b I'm more than happy to make further adjustments when needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
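A minimal, self-contained illustration of why the change under discussion matters, written against Spark's `InternalRow` API rather than the Hudi test harness: `getString(i)` dereferences the underlying `UTF8String`, so a null string column throws, whereas `getUTF8String(i)` can be null-checked first (the Java equivalent of the Scala fix `Option(record.getUTF8String(idx)).map(_.toString).orNull`). This is a sketch only, not the unit test requested in the thread.

```java
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow;
import org.apache.spark.unsafe.types.UTF8String;

public class NullStringHandlingSketch {
  public static void main(String[] args) {
    // A row with one populated string column and one null string column.
    InternalRow row = new GenericInternalRow(new Object[] {
        UTF8String.fromString("some-value"), null });

    // Null-safe pattern used by the fix above.
    String col0 = row.getUTF8String(0) == null ? null : row.getUTF8String(0).toString();
    String col1 = row.getUTF8String(1) == null ? null : row.getUTF8String(1).toString();
    System.out.println(col0); // some-value
    System.out.println(col1); // null

    try {
      row.getString(1); // the old code path: throws on a null string column
    } catch (NullPointerException e) {
      System.out.println("getString(1) throws on null string columns");
    }
  }
}
```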
[GitHub] [hudi] codope opened a new pull request, #9092: [MINOR] Enable log compaction by default for MDT
codope opened a new pull request, #9092: URL: https://github.com/apache/hudi/pull/9092 ### Change Logs Enable log compaction on metadata table by default. ### Impact Will compact log blocks to produce another log file every 5 log blocks. ### Risk level (write none, low medium or high below) medium ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9088: [MINOR] Improve CollectionUtils helper methods
hudi-bot commented on PR #9088: URL: https://github.com/apache/hudi/pull/9088#issuecomment-1613041272 ## CI report: * fb282b7602962846c4f561cd101033fca41e43d6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18182) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8609: [HUDI-6154] Introduced retry while reading hoodie.properties to deal with parallel updates.
hudi-bot commented on PR #8609: URL: https://github.com/apache/hudi/pull/8609#issuecomment-1613038827 ## CI report: * e14bd41edf6cc961d77087eea67f755f23590834 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17992) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18115) * a64034d612fa64c99dd8d319ac00680924773f53 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-6457) Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned
kwang created HUDI-6457: --- Summary: Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned Key: HUDI-6457 URL: https://issues.apache.org/jira/browse/HUDI-6457 Project: Apache Hudi Issue Type: Improvement Reporter: kwang Fix For: 0.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] zaza commented on a diff in pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString
zaza commented on code in PR #9064: URL: https://github.com/apache/hudi/pull/9064#discussion_r1246504222 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/HoodieCDCRDD.scala: ## @@ -561,7 +561,7 @@ class HoodieCDCRDD( originTableSchema.structTypeSchema.zipWithIndex.foreach { case (field, idx) => if (field.dataType.isInstanceOf[StringType]) { -map(field.name) = record.getString(idx) +map(field.name) = Option(record.getUTF8String(idx)).map(_.toString).orNull } else { Review Comment: Absolutely, the only problem is that I don't see any unit tests for the cdc package so it's hard to follow existing examples. I tried implementing a test that extends `HoodieClientTestBase` but that was getting me far from the requested "unit test". What would be the best way to start with tests for this particular issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] lipusheng opened a new issue, #9091: [SUPPORT]
lipusheng opened a new issue, #9091: URL: https://github.com/apache/hudi/issues/9091 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org. - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly. **Describe the problem you faced** When I use Spark to sync Hive table data into a Hudi table, I set `KeyGeneratorOptions.RECORDKEY_FIELD_NAME` to "id,user_id", set the key generator class to `NonpartitionedKeyGenerator`, and set `hoodie.datasource.write.operation` to `WriteOperationType.BULK_INSERT`. In this case the `_hoodie_record_key` written is "125230088,6941". When I later ingest Kafka data, I only change `hoodie.datasource.write.operation` to `WriteOperationType.UPSERT`, but the `_hoodie_record_key` format changes to "user_id:125230088,id:6941", and duplicate data shows up in queries. ![image](https://github.com/apache/hudi/assets/57984409/f45c37a8-b38c-4457-9677-2fcbe3bac178) **To Reproduce** Steps to reproduce the behavior: 1. 2. 3. 4. **Expected behavior** A clear and concise description of what you expected to happen. **Environment Description** * Hudi version : 0.12.0 * Spark version : 3.3.1 * Hive version : 3.1.3 * Hadoop version : 3.2.1 * Storage (HDFS/S3/GCS..) : OSS * Running on Docker? (yes/no) : no **Additional context** Add any other context about the problem here. **Stacktrace** ```Add the stacktrace of the error.``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
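The two record-key encodings reported in this issue can be reproduced with a small illustration (values taken from the report; the rendering logic below is a simplified stand-in, not Hudi's key-generator code): the bulk_insert row-writer path apparently wrote bare values, while the upsert path wrote `field:value` pairs, so the same business key yields two distinct `_hoodie_record_key` values and therefore duplicates on read.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class RecordKeyFormats {
  public static void main(String[] args) {
    Map<String, String> keyFields = new LinkedHashMap<>();
    keyFields.put("user_id", "125230088");
    keyFields.put("id", "6941");

    // Format observed from the bulk_insert row-writer path: values only.
    String bareKey = String.join(",", keyFields.values());          // 125230088,6941

    // Format observed from the later upsert: "field:value" pairs.
    String namedKey = keyFields.entrySet().stream()
        .map(e -> e.getKey() + ":" + e.getValue())
        .collect(Collectors.joining(","));                          // user_id:125230088,id:6941

    // The two encodings never match, which is why the table ends up with duplicates.
    System.out.println(bareKey.equals(namedKey));                   // false
  }
}
```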
[GitHub] [hudi] codope commented on a diff in pull request #8609: [HUDI-6154] Introduced retry while reading hoodie.properties to deal with parallel updates.
codope commented on code in PR #8609: URL: https://github.com/apache/hudi/pull/8609#discussion_r1246489239 ## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java: ## @@ -334,22 +337,43 @@ public HoodieTableConfig() { super(); } - private void fetchConfigs(FileSystem fs, String metaPath) throws IOException { + private static TypedProperties fetchConfigs(FileSystem fs, String metaPath) throws IOException { Path cfgPath = new Path(metaPath, HOODIE_PROPERTIES_FILE); -try (FSDataInputStream is = fs.open(cfgPath)) { - props.load(is); -} catch (IOException ioe) { - if (!fs.exists(cfgPath)) { -LOG.warn("Run `table recover-configs` if config update/delete failed midway. Falling back to backed up configs."); -// try the backup. this way no query ever fails if update fails midway. -Path backupCfgPath = new Path(metaPath, HOODIE_PROPERTIES_FILE_BACKUP); -try (FSDataInputStream is = fs.open(backupCfgPath)) { +Path backupCfgPath = new Path(metaPath, HOODIE_PROPERTIES_FILE_BACKUP); +int readRetryCount = 0; +boolean found = false; + +TypedProperties props = new TypedProperties(); +while (readRetryCount++ < MAX_READ_RETRIES) { + for (Path path : Arrays.asList(cfgPath, backupCfgPath)) { +// Read the properties and validate that it is a valid file +try (FSDataInputStream is = fs.open(path)) { + props.clear(); props.load(is); + found = true; + ValidationUtils.checkArgument(validateChecksum(props)); + return props; +} catch (IOException e) { + LOG.warn(String.format("Could not read properties from %s: %s", path, e)); +} catch (IllegalArgumentException e) { + LOG.warn(String.format("Invalid properties file %s: %s", path, props)); } - } else { -throw ioe; + } + + // Failed to read all files so wait before retrying. This can happen in cases of parallel updates to the properties. + try { +Thread.sleep(READ_RETRY_DELAY_MSEC); + } catch (InterruptedException e) { +LOG.warn("Interrupted while waiting"); } } + +// If we are here then after all retries either no hoodie.properties was found or only an invalid file was found. +if (found) { + throw new IllegalArgumentException("hoodie.properties file seems invalid. Please check for left over `.updated` files if any, manually copy it to hoodie.properties and retry"); +} else { + throw new HoodieIOException("Could not load Hoodie properties from " + cfgPath); Review Comment: Fixed the deltastreamer tests by modifying the exception message here, as deltastreamer depends on the specific message. Pitfalls of depending on the exception message as business logic! We should try to avoid that as much as possible. https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L695-L697 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
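The review comment above warns against matching on exception message text. A rough sketch of the alternative it hints at, using hypothetical class and method names (these are not Hudi APIs): throw and catch a dedicated exception type so callers stay robust to message wording changes.

```java
public class TypedExceptionSketch {

  // Hypothetical dedicated exception type for "table properties could not be loaded".
  static class TablePropertiesNotFoundException extends RuntimeException {
    TablePropertiesNotFoundException(String path) {
      super("Could not load Hoodie properties from " + path);
    }
  }

  // Stand-in for a config loader that fails after exhausting its retries.
  static void loadTableConfig(String metaPath) {
    throw new TablePropertiesNotFoundException(metaPath + "/hoodie.properties");
  }

  public static void main(String[] args) {
    try {
      loadTableConfig("/tmp/hudi_table/.hoodie");
    } catch (TablePropertiesNotFoundException e) {
      // Branching on the exception type, not on e.getMessage() string contents.
      System.out.println("Table not initialized yet: " + e.getMessage());
    }
  }
}
```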
[GitHub] [hudi] hudi-bot commented on pull request #9082: [HUDI-6445] Distribute spark ds func tests
hudi-bot commented on PR #9082: URL: https://github.com/apache/hudi/pull/9082#issuecomment-1612915683 ## CI report: * c529c624afdca331514a2bdfb78cc6e18ab9f57a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18185) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction
hudi-bot commented on PR #9007: URL: https://github.com/apache/hudi/pull/9007#issuecomment-1612915077 ## CI report: * 3b6d13a83efdae5e46eebe9ae168ba7e0d8e9f34 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18189) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] LINGQ1991 commented on issue #8903: [SUPPORT] aws spark3.2.1 & hudi 0.13.1 with java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile
LINGQ1991 commented on issue #8903: URL: https://github.com/apache/hudi/issues/8903#issuecomment-1612912367 > @ad1happy2go I use emr-6.5.0. It's error with " java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile". > > But i have package with oss spark and hudi bundle. Work ok now. > > ```java > > org.apache.maven.plugins > maven-shade-plugin > 3.2.1 > > hudi-${spark.version}-plugin > false > > > > package > > shade > > > > > org.apache.spark.sql.execution.datasources.PartitionedFile > org.local.spark.sql.execution.datasources.PartitionedFile > > > org.apache.curator > org.local.curator > > > > implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"/> > > > > *:* > > module-info.class > org/apache/spark/unused/** > > > > *:* > > META-INF/*.SF > META-INF/*.DSA > META-INF/*.RSA > > > > > > > > ``` I have package with hudi bundle. But the following error occurred `Caused by: java.lang.ClassCastException: org.apache.hudi.spark.org.apache.spark.sql.execution.datasources.PartitionedFile cannot be cast to org.apache.spark.sql.execution.datasources.PartitionedFile at org.apache.hudi.HoodieMergeOnReadRDD.read(HoodieMergeOnReadRDD.scala:113) at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:65) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:131) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] flashJd commented on pull request #9048: [HUDI-6434] Fix illegalArgumentException when do read_optimized read in Flink
flashJd commented on PR #9048: URL: https://github.com/apache/hudi/pull/9048#issuecomment-1612907539 > The `DeltaCommitWriteHandleFactory` can be tweaked for the purpose, I'm wondering what's the engine conflicts you are talking about? Sorry for the late reply. ## Engine conflicts: On v0.12.2, when Spark insert-overwrites a partition after Flink has written only log files for a bucket in that partition, https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java#L160 throws, but I found this is already fixed on master. ## Other considerations: If we align the logic for creating the first base file, a lot of code can be simplified, for example: https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L362 https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/CompactionExecutionHelper.java#L63 https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java#L200 etc. What's your opinion? Looking forward to your reply. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] flashJd commented on pull request #9048: [HUDI-6434] Fix illegalArgumentException when do read_optimized read in Flink
flashJd commented on PR #9048: URL: https://github.com/apache/hudi/pull/9048#issuecomment-1612904526 > sry to reply late ## engine conflicts: v0.12.2 when spark insert overwrite a partition after flink write the log files only bucket in this partition, https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java#L160 throws, but I found it was fixed in the master ## other consideration: If align the first create base file logic, many codes can be simplified, like: https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L362 https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/CompactionExecutionHelper.java#L63 https://github.com/apache/hudi/blob/b95248e011931f4748a7a9fbb8298cbbb71bda88/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java#L200 etc. what's your opinion, looking forward to your reply -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] beyond1920 opened a new issue, #9090: [SUPPORT]
beyond1920 opened a new issue, #9090: URL: https://github.com/apache/hudi/issues/9090 I cherry-picked [HUDI-1517](https://issues.apache.org/jira/browse/HUDI-1517) into our internal Hudi version and hit a FileNotFoundException while reading the latest snapshot of a MOR table. ![1688033363329](https://github.com/apache/hudi/assets/1525333/9330203d-866e-4c3d-96a8-922960afc152) The exception can happen when Spark speculative execution is enabled and there are concurrent writers and readers. For example: 1. Job1 is writing to a MOR table and has not finished yet; it has Spark speculative execution enabled. 2. Job2 is reading the latest snapshot of the MOR table; when it calls getLatestMergedFileSlicesBeforeOrOn, it may list log files written by speculative attempt tasks of Job1. 3. Job1 finishes and deletes the log files written by the slow speculative tasks. 4. Job2 throws the FileNotFoundException when it reads a log file that was already deleted in step 3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] beyond1920 commented on pull request #4913: [HUDI-1517] create marker file for every log file
beyond1920 commented on PR #4913: URL: https://github.com/apache/hudi/pull/4913#issuecomment-1612808066 I cherry-picked this PR into our internal Hudi and hit a `FileNotFoundException` while reading the latest snapshot of a MOR table. ![1688033363329](https://github.com/apache/hudi/assets/1525333/99459239-1dbf-4067-8020-d4e20bae0bd1) The exception can happen when Spark speculative execution is enabled, under the following case: 1. Job1 is writing to a MOR table and has not finished yet; it has Spark speculative execution enabled. 2. Job2 is reading the latest snapshot of the MOR table; when it calls `getLatestMergedFileSlicesBeforeOrOn`, it may list log files written by speculative attempt tasks of Job1. 3. Job1 finishes and deletes the log files written by the slow speculative tasks. 4. Job2 throws the `FileNotFoundException` when it reads a log file that was already deleted in step 3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9066: [HUDI-6452] Add MOR snapshot reader to integrate with query engines without using Hadoop APIs
hudi-bot commented on PR #9066: URL: https://github.com/apache/hudi/pull/9066#issuecomment-1612807150 ## CI report: * 8662958e8ccb7203d320dc33445f9f2dbc28fb0c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18159) * 60c1b8c5885fdda28e07f3ba79290f01dc60a9c4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18196) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8933: [HUDI-5329] Spark reads table error when Flink creates table without record key and primary key
hudi-bot commented on PR #8933: URL: https://github.com/apache/hudi/pull/8933#issuecomment-1612806333 ## CI report: * d1564f421664fd2dee15dfdbdae4dec07baedf92 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18186) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9066: [HUDI-6452] Add MOR snapshot reader to integrate with query engines without using Hadoop APIs
hudi-bot commented on PR #9066: URL: https://github.com/apache/hudi/pull/9066#issuecomment-1612791679 ## CI report: * 8662958e8ccb7203d320dc33445f9f2dbc28fb0c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18159) * 60c1b8c5885fdda28e07f3ba79290f01dc60a9c4 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.
hudi-bot commented on PR #9058: URL: https://github.com/apache/hudi/pull/9058#issuecomment-1612791490 ## CI report: * 345482ba6529fc3bf0ac9f50ce0c1d79a3accd37 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18163) * 1697d1bfa095ca16a9361e3728a77331d3a28037 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18195) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.
hudi-bot commented on PR #9058: URL: https://github.com/apache/hudi/pull/9058#issuecomment-1612774450 ## CI report: * 345482ba6529fc3bf0ac9f50ce0c1d79a3accd37 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18163) * 1697d1bfa095ca16a9361e3728a77331d3a28037 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9017: [HUDI-6393] Add functional tests for RecordLevelIndex
hudi-bot commented on PR #9017: URL: https://github.com/apache/hudi/pull/9017#issuecomment-1612701307 ## CI report: * a3c1d99e2266ec68d9082fe4c76c4bf62070f5a9 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18184) * ceffe7d8146f48e1c6c083613646463c1404a77f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18194) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on a diff in pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.
xushiyan commented on code in PR #9058: URL: https://github.com/apache/hudi/pull/9058#discussion_r1246371700 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieIndex.java: ## @@ -749,6 +749,67 @@ public void testRecordIndexTagLocationAndUpdate(boolean populateMetaFields) thro assertEquals(newInsertsCount, recordLocations.filter(entry -> newPartitionPath.equalsIgnoreCase(entry._1.getPartitionPath())).count()); } + @ParameterizedTest + @ValueSource(strings = "INMEMORY") Review Comment: fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9088: [MINOR] Improve CollectionUtils helper methods
hudi-bot commented on PR #9088: URL: https://github.com/apache/hudi/pull/9088#issuecomment-1612690821 ## CI report: * fb282b7602962846c4f561cd101033fca41e43d6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18182) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant
hudi-bot commented on PR #9038: URL: https://github.com/apache/hudi/pull/9038#issuecomment-1612690558 ## CI report: * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN * 34f8823f48712c57058bc37c8936a276c1457557 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18187) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18193) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18188) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9017: [HUDI-6393] Add functional tests for RecordLevelIndex
hudi-bot commented on PR #9017: URL: https://github.com/apache/hudi/pull/9017#issuecomment-1612690440 ## CI report: * d0b2f2457cf648b1b631c75bd64cc1320af69393 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18030) * a3c1d99e2266ec68d9082fe4c76c4bf62070f5a9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18184) * ceffe7d8146f48e1c6c083613646463c1404a77f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9088: [MINOR] Improve CollectionUtils helper methods
hudi-bot commented on PR #9088: URL: https://github.com/apache/hudi/pull/9088#issuecomment-1612678874 ## CI report: * fb282b7602962846c4f561cd101033fca41e43d6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6
hudi-bot commented on PR #9063: URL: https://github.com/apache/hudi/pull/9063#issuecomment-1612678677 ## CI report: * 69b2bb853be0f79845efd56f68b934b9f69ae22a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18160) * 4775dce07f2f3237b32f22b360f3423b1eafce85 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18191) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant
hudi-bot commented on PR #9038: URL: https://github.com/apache/hudi/pull/9038#issuecomment-1612678539 ## CI report: * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN * 34f8823f48712c57058bc37c8936a276c1457557 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18188) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18187) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18193) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-6151) Rollback previously applied commits to MDT when operations are retried.
[ https://issues.apache.org/jira/browse/HUDI-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-6151.
----------------------------
    Resolution: Fixed

Fixed via master branch: b95248e011931f4748a7a9fbb8298cbbb71bda88

> Rollback previously applied commits to MDT when operations are retried.
> ------------------------------------------------------------------------
>
>                 Key: HUDI-6151
>                 URL: https://issues.apache.org/jira/browse/HUDI-6151
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Assignee: Prashant Wason
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
> Operations like Clean, Compaction are retried after failures with the same instant time. If the previous run of the operation successfully committed to the MDT but failed to commit to the dataset, then the operation will be retried later with the same instantTime causing duplicate updates applied to MDT.
> Currently, we simply delete the completed deltacommit without rolling back the deltacommit.
> To handle this, we detect a replay of operation and rollback any changes from that operation in MDT.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[hudi] branch master updated: [HUDI-6151] Rollback previously applied commits to MDT when operations are retried (#8604)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new b95248e0119 [HUDI-6151] Rollback previously applied commits to MDT when operations are retried (#8604)
b95248e0119 is described below

commit b95248e011931f4748a7a9fbb8298cbbb71bda88
Author: Prashant Wason
AuthorDate: Thu Jun 29 01:59:08 2023 -0700

    [HUDI-6151] Rollback previously applied commits to MDT when operations are retried (#8604)

    Operations like Clean, Compaction are retried after failures with the same instant time. If the previous run of the operation successfully committed to the MDT but failed to commit to the dataset, then the operation will be retried later with the same instantTime causing duplicate updates applied to MDT.

    Currently, we simply delete the completed deltacommit without rolling back the deltacommit.

    To handle this, we detect a replay of operation and rollback any changes from that operation in MDT.

    Co-authored-by: Sagar Sumit
---
 .../FlinkHoodieBackedTableMetadataWriter.java | 50
 .../SparkHoodieBackedTableMetadataWriter.java | 38 ++--
 .../functional/TestHoodieBackedMetadata.java  | 68 +-
 3 files changed, 113 insertions(+), 43 deletions(-)

diff --git a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java
index 7dd32e2916e..6edeac05a74 100644
--- a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java
+++ b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java
@@ -32,9 +32,13 @@ import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.ValidationUtils;
 import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieMetadataException;
 import org.apache.hudi.exception.HoodieNotSupportedException;

 import org.apache.hadoop.conf.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import java.util.Collections;
 import java.util.HashMap;
 import java.util.List;
@@ -46,7 +50,7 @@ import static org.apache.hudi.common.model.HoodieFailedWritesCleaningPolicy.EAGE
  * Flink hoodie backed table metadata writer.
  */
 public class FlinkHoodieBackedTableMetadataWriter extends HoodieBackedTableMetadataWriter {
-
+  private static final Logger LOG = LoggerFactory.getLogger(FlinkHoodieBackedTableMetadataWriter.class);
   private transient BaseHoodieWriteClient writeClient;

   public static HoodieTableMetadataWriter create(Configuration conf, HoodieWriteConfig writeConfig,
@@ -118,33 +122,31 @@ public class FlinkHoodieBackedTableMetadataWriter extends HoodieBackedTableMetad
     if (!metadataMetaClient.getActiveTimeline().containsInstant(instantTime)) {
       // if this is a new commit being applied to metadata for the first time
-      writeClient.startCommitWithTime(instantTime);
-      metadataMetaClient.getActiveTimeline().transitionRequestedToInflight(HoodieActiveTimeline.DELTA_COMMIT_ACTION, instantTime);
+      LOG.info("New commit at " + instantTime + " being applied to MDT.");
     } else {
-      Option alreadyCompletedInstant = metadataMetaClient.getActiveTimeline().filterCompletedInstants().filter(entry -> entry.getTimestamp().equals(instantTime)).lastInstant();
-      if (alreadyCompletedInstant.isPresent()) {
-        // this code path refers to a re-attempted commit that got committed to metadata table, but failed in datatable.
-        // for eg, lets say compaction c1 on 1st attempt succeeded in metadata table and failed before committing to datatable.
-        // when retried again, data table will first rollback pending compaction. these will be applied to metadata table, but all changes
-        // are upserts to metadata table and so only a new delta commit will be created.
-        // once rollback is complete, compaction will be retried again, which will eventually hit this code block where the respective commit is
-        // already part of completed commit. So, we have to manually remove the completed instant and proceed.
-        // and it is for the same reason we enabled withAllowMultiWriteOnSameInstant for metadata table.
-        HoodieActiveTimeline.deleteInstantFile(metadataMetaClient.getFs(), metadataMetaClient.getMetaPath(), alreadyCompletedInstant.get());
-        metadataMetaClient.reloadActiveTimeline();
+      // this code path refers to a re-attempted commit that:
+      // 1. got committed to metadat
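As a rough, self-contained illustration of the replay handling described in the commit message, here is a toy model with hypothetical names; it does not use Hudi's actual timeline or writer APIs. The idea: a retried clean/compaction reuses its instant time, so an instant that is already completed on the metadata table signals a replay, and its earlier changes are rolled back before the retry is applied.

```
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of the metadata-table (MDT) commit path; hypothetical types only.
class MetadataTableModel {
  private final Set<String> completedInstants = new LinkedHashSet<>();
  private final List<String> appliedChanges = new ArrayList<>();

  // A retried operation reuses the same instant time, so an instant that is
  // already completed here marks a replay: roll back its earlier changes
  // instead of only deleting the completed deltacommit.
  void commit(String instantTime, List<String> records) {
    if (completedInstants.contains(instantTime)) {
      rollback(instantTime);
    }
    for (String record : records) {
      appliedChanges.add(instantTime + ":" + record);
    }
    completedInstants.add(instantTime);
  }

  private void rollback(String instantTime) {
    appliedChanges.removeIf(change -> change.startsWith(instantTime + ":"));
    completedInstants.remove(instantTime);
  }

  List<String> appliedChanges() {
    return appliedChanges;
  }
}
```

With this model, calling commit("001", records) a second time after a simulated data-table failure leaves one copy of each record in appliedChanges rather than duplicates.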
[GitHub] [hudi] danny0405 merged pull request #8604: [HUDI-6151] Rollback previously applied commits to MDT when operations are retried.
danny0405 merged PR #8604: URL: https://github.com/apache/hudi/pull/8604 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #9017: [HUDI-6393] Add functional tests for RecordLevelIndex
lokeshj1703 commented on code in PR #9017: URL: https://github.com/apache/hudi/pull/9017#discussion_r1246314270

## pom.xml: ##

@@ -175,7 +175,7 @@
     2.12.10
     ${scala12.version}
     2.8.1
-    2.12
+    2.11

Review Comment:
   Sorry! Forgot to remove this change. This was only for fixing the issues.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on a diff in pull request #9017: [HUDI-6393] Add functional tests for RecordLevelIndex
xushiyan commented on code in PR #9017: URL: https://github.com/apache/hudi/pull/9017#discussion_r1246304418

## pom.xml: ##

@@ -175,7 +175,7 @@
     2.12.10
     ${scala12.version}
     2.8.1
-    2.12
+    2.11

Review Comment:
   this is the default value which should be 2.12 because spark 3 is default now. If this is causing a problem, it means the test setup with spark 2.4 profile has some gap, which we need to only fix for that profile/setup

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6
hudi-bot commented on PR #9063: URL: https://github.com/apache/hudi/pull/9063#issuecomment-1612621031 ## CI report: * 69b2bb853be0f79845efd56f68b934b9f69ae22a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18160) * 4775dce07f2f3237b32f22b360f3423b1eafce85 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-5608) Support decimals w/ precision > 30 in Column Stats
[ https://issues.apache.org/jira/browse/HUDI-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738431#comment-17738431 ]

赵富午 commented on HUDI-5608:
---------------------------

Is there any new progress?

> Support decimals w/ precision > 30 in Column Stats
> ---------------------------------------------------
>
>                 Key: HUDI-5608
>                 URL: https://issues.apache.org/jira/browse/HUDI-5608
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 0.12.2
>            Reporter: Alexey Kudinkin
>            Priority: Critical
>             Fix For: 0.14.0
>
> As reported in: [https://github.com/apache/hudi/issues/7732]
>
> Currently we've limited precision of the supported decimals at 30 assuming that this number is reasonably high to cover 99% of use-cases, but it seems like there's still a demand for even larger Decimals.
> The challenge is however to balance the need to support longer Decimals vs storage space we have to provision for each one of them.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
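To make the precision-vs-storage trade-off in the issue above concrete, here is a small worked example. It is an illustration, not Hudi's column-stats code, and it assumes the unscaled value is stored as a signed two's-complement fixed-width field, as Avro/Parquet-style decimal encodings do: precision 30 needs 13 bytes per value, while precision 38 already needs 16.

```
public class DecimalWidth {

  // Smallest byte count n such that 10^precision - 1 fits in a signed
  // two's-complement field of n bytes, i.e. ceil((precision * log2(10) + 1) / 8).
  static int minBytesForPrecision(int precision) {
    double log2of10 = Math.log(10) / Math.log(2);
    return (int) Math.ceil((precision * log2of10 + 1) / 8.0);
  }

  public static void main(String[] args) {
    for (int p : new int[] {10, 20, 30, 38, 50}) {
      System.out.println("precision " + p + " -> " + minBytesForPrecision(p) + " bytes");
    }
  }
}
```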
[GitHub] [hudi] hudi-bot commented on pull request #8604: [HUDI-6151] Rollback previously applied commits to MDT when operations are retried.
hudi-bot commented on PR #8604: URL: https://github.com/apache/hudi/pull/8604#issuecomment-1612619567 ## CI report: * eb39bc7559945e199e43a2a3d51e1ab15b4e3e2f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18183) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9087: [HUDI-6329] Write pipelines for table with consistent bucket index would detect whether clustering service occurs and automatically adjust the
hudi-bot commented on PR #9087: URL: https://github.com/apache/hudi/pull/9087#issuecomment-1612610932 ## CI report: * 1bc4ea70966fd2c2cbd7cea126f4fd6b5c875077 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18181) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9088: [MINOR] Improve CollectionUtils helper methods
hudi-bot commented on PR #9088: URL: https://github.com/apache/hudi/pull/9088#issuecomment-1612610988 ## CI report: * fb282b7602962846c4f561cd101033fca41e43d6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18182) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org