[GitHub] [hudi] nsivabalan commented on issue #8132: [SUPPORT] data loss in new base file after compaction

2023-04-27 Thread via GitHub


nsivabalan commented on issue #8132:
URL: https://github.com/apache/hudi/issues/8132#issuecomment-1527027664

   Hey @coffee34, can you share any more information on this? We are taking a 
serious look at all data consistency issues, so we want to get to the bottom 
of it. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan closed issue #7733: [SUPPORT] Duplicate rows found in Hudi non partitioned table.

2023-04-27 Thread via GitHub


nsivabalan closed issue #7733: [SUPPORT] Duplicate rows found in Hudi non 
partitioned table.
URL: https://github.com/apache/hudi/issues/7733





[GitHub] [hudi] nsivabalan commented on issue #7733: [SUPPORT] Duplicate rows found in Hudi non partitioned table.

2023-04-27 Thread via GitHub


nsivabalan commented on issue #7733:
URL: https://github.com/apache/hudi/issues/7733#issuecomment-1527022328

   The linked patch contains unit tests. I also tried reproducing the issue 
locally, including some failure scenarios, and could not reproduce it. 
   Closing the issue as no longer valid. 





[GitHub] [hudi] hudi-bot commented on pull request #8592: [MINOR] Improve TestStreamWriteOperatorCoordinator#testCommitOnEmptyBatch

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8592:
URL: https://github.com/apache/hudi/pull/8592#issuecomment-1527011496

   
   ## CI report:
   
   * f3c45931ee531eb35d339a0844cda68cb297cc7c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16719)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] ad1happy2go commented on issue #6869: [SUPPORT] Incremental upsert or merge is not working

2023-04-27 Thread via GitHub


ad1happy2go commented on issue #6869:
URL: https://github.com/apache/hudi/issues/6869#issuecomment-1527009817

   Closing this issue as we were not able to reproduce it, even with an earlier 
version.
   Please reopen if you see the issue again.





[GitHub] [hudi] nsivabalan closed issue #6869: [SUPPORT] Incremental upsert or merge is not working

2023-04-27 Thread via GitHub


nsivabalan closed issue #6869: [SUPPORT] Incremental upsert or merge is not 
working
URL: https://github.com/apache/hudi/issues/6869





[jira] [Updated] (HUDI-6072) Fix NPE when upsert merger and null map or array

2023-04-27 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6072:
-
Fix Version/s: 0.13.1
   0.14.0

> Fix NPE when upsert merger and null map or array
> 
>
> Key: HUDI-6072
> URL: https://issues.apache.org/jira/browse/HUDI-6072
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.1, 0.14.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-6072) Fix NPE when upsert merger and null map or array

2023-04-27 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6072.

Resolution: Fixed

Fixed via master branch: 9f8d4d0130dbe78598f24f00e7fa75c13737fc79

> Fix NPE when upsert merger and null map or array
> 
>
> Key: HUDI-6072
> URL: https://issues.apache.org/jira/browse/HUDI-6072
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.1, 0.14.0
>
>






[hudi] branch master updated: [HUDI-6072] Fix NPE when upsert merger and null map or array (#8432)

2023-04-27 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 9f8d4d0130d [HUDI-6072] Fix NPE when upsert merger and null map or array (#8432)
9f8d4d0130d is described below

commit 9f8d4d0130dbe78598f24f00e7fa75c13737fc79
Author: Nicolas Paris 
AuthorDate: Fri Apr 28 07:23:46 2023 +0200

[HUDI-6072] Fix NPE when upsert merger and null map or array (#8432)

Co-authored-by: Danny Chan 
---
 .../apache/spark/sql/HoodieInternalRowUtils.scala  |   5 +-
 .../apache/hudi/functional/TestCOWDataSource.scala | 101 ++---
 2 files changed, 73 insertions(+), 33 deletions(-)

diff --git a/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieInternalRowUtils.scala b/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieInternalRowUtils.scala
index 3ea801177fb..b56b0b1e4ce 100644
--- a/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieInternalRowUtils.scala
+++ b/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieInternalRowUtils.scala
@@ -188,7 +188,10 @@ object HoodieInternalRowUtils {
   null
 }
 
-fieldWriters(pos)(fieldUpdater, pos, prevValue)
+if(prevValue == null)
+  fieldUpdater.setNullAt(pos)
+else
+  fieldWriters(pos)(fieldUpdater, pos, prevValue)
 pos += 1
   }
 }
diff --git a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala
index ae1f62b7e61..6b1773807fe 100644
--- a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala
+++ b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala
@@ -60,6 +60,8 @@ import java.sql.{Date, Timestamp}
 import java.util.function.Consumer
 import scala.collection.JavaConversions._
 import scala.collection.JavaConverters._
+import org.junit.jupiter.api.Assertions.assertDoesNotThrow
+import org.junit.jupiter.api.function.Executable
 
 
 /**
@@ -151,22 +153,22 @@ class TestCOWDataSource extends HoodieSparkClientTestBase with ScalaAssertionSup
   @Test
   def testInferPartitionBy(): Unit = {
 val (writeOpts, readOpts) = getWriterReaderOpts(HoodieRecordType.AVRO, Map())
-  // Insert Operation
-  val records = recordsToStrings(dataGen.generateInserts("000", 100)).toList
-  val inputDF = spark.read.json(spark.sparkContext.parallelize(records, 2))
+// Insert Operation
+val records = recordsToStrings(dataGen.generateInserts("000", 100)).toList
+val inputDF = spark.read.json(spark.sparkContext.parallelize(records, 2))
 
-  val commonOptsNoPreCombine = Map(
-"hoodie.insert.shuffle.parallelism" -> "4",
-"hoodie.upsert.shuffle.parallelism" -> "4",
-DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
-HoodieWriteConfig.TBL_NAME.key -> "hoodie_test"
-  ) ++ writeOpts
+val commonOptsNoPreCombine = Map(
+  "hoodie.insert.shuffle.parallelism" -> "4",
+  "hoodie.upsert.shuffle.parallelism" -> "4",
+  DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+  HoodieWriteConfig.TBL_NAME.key -> "hoodie_test"
+) ++ writeOpts
 
-  inputDF.write.partitionBy("partition").format("hudi")
-.options(commonOptsNoPreCombine)
-.option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
-.mode(SaveMode.Overwrite)
-.save(basePath)
+inputDF.write.partitionBy("partition").format("hudi")
+  .options(commonOptsNoPreCombine)
+  .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
+  .mode(SaveMode.Overwrite)
+  .save(basePath)
 
 val snapshot0 = spark.read.format("org.apache.hudi").options(readOpts).load(basePath)
 snapshot0.cache()
@@ -195,10 +197,10 @@ class TestCOWDataSource extends HoodieSparkClientTestBase with ScalaAssertionSup
 val records2 = recordsToStrings(dataGen.generateInserts("000", 200)).toList
 val inputDF2 = spark.read.json(spark.sparkContext.parallelize(records2, 2))
 // hard code the value for rider and fare so that we can verify the partitions paths with hudi
-val toInsertDf = inputDF1.withColumn("fare",lit(100)).withColumn("rider",lit("rider-123"))
-  .union(inputDF2.withColumn("fare",lit(200)).withColumn("rider",lit("rider-456")))
+val toInsertDf = inputDF1.withColumn("fare", lit(100)).withColumn("rider", lit("rider-123"))
+  .union(inputDF2.withColumn("fare", lit(200)).withColumn("rider", lit("rider-456")))
 
-toInsertDf.write.partitionBy("fare","rider").format("hudi")
+

[GitHub] [hudi] danny0405 merged pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-27 Thread via GitHub


danny0405 merged PR #8432:
URL: https://github.com/apache/hudi/pull/8432
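   The core of commit 9f8d4d0130d (HUDI-6072) is a null check in `HoodieInternalRowUtils`: when `prevValue` is `null`, the updater calls `setNullAt(pos)` instead of handing the value to the per-field writer. The sketch below models that pattern in plain Scala under simplifying assumptions; `MutableRow`, `FieldWriter`, `stringWriter`, `mapWriter` and `writeRow` are hypothetical stand-ins, not Spark or Hudi classes, and the writers only imitate converters that dereference their input.

   ```scala
   // Hypothetical model of the null guard added in HUDI-6072; MutableRow and
   // FieldWriter are illustrative stand-ins, not Spark or Hudi types.
   final class MutableRow(size: Int) {
     private val values = Array.fill[Any](size)(null)
     def setNullAt(pos: Int): Unit = values(pos) = null
     def update(pos: Int, v: Any): Unit = values(pos) = v
     def get(pos: Int): Any = values(pos)
   }

   object NullSafeRowWriter {
     // A per-position writer: (target row, position, source value) => Unit
     type FieldWriter = (MutableRow, Int, Any) => Unit

     // Like a real converter, these dereference their input and throw an NPE on null.
     val stringWriter: FieldWriter = (row, pos, v) => row.update(pos, v.toString)
     val mapWriter: FieldWriter = (row, pos, v) =>
       row.update(pos, v.asInstanceOf[Map[String, Int]].map { case (k, n) => k.toUpperCase -> n })

     def writeRow(writers: IndexedSeq[FieldWriter], prev: IndexedSeq[Any]): MutableRow = {
       val row = new MutableRow(writers.length)
       var pos = 0
       while (pos < writers.length) {
         val prevValue = prev(pos)
         // The guard: a null source value becomes a null field and never
         // reaches a writer that would dereference it.
         if (prevValue == null) row.setNullAt(pos)
         else writers(pos)(row, pos, prevValue)
         pos += 1
       }
       row
     }
   }
   ```

   Without the guard, passing the `null` map straight to `mapWriter` throws a `NullPointerException`, which mirrors the NPE reported for a `MapType` column in issue #8431.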





[GitHub] [hudi] danny0405 closed issue #8431: [SUPPORT] NPE with MapType and new hudi merger

2023-04-27 Thread via GitHub


danny0405 closed issue #8431: [SUPPORT] NPE with MapType and new hudi merger
URL: https://github.com/apache/hudi/issues/8431





[GitHub] [hudi] nsivabalan closed issue #5869: [SUPPORT] There are duplicate values in HUDI MOR table for different partition and not updating values in same partition for GLOBAL_BLOOM

2023-04-27 Thread via GitHub


nsivabalan closed issue #5869: [SUPPORT] There are duplicate values in HUDI MOR 
table for different partition and not updating values in same partition for 
GLOBAL_BLOOM
URL: https://github.com/apache/hudi/issues/5869





[GitHub] [hudi] nsivabalan commented on issue #5869: [SUPPORT] There are duplicate values in HUDI MOR table for different partition and not updating values in same partition for GLOBAL_BLOOM

2023-04-27 Thread via GitHub


nsivabalan commented on issue #5869:
URL: https://github.com/apache/hudi/issues/5869#issuecomment-1527003611

   From what I can glean from the description, the query is a read-optimized 
(RO) query and `hoodie.bloom.index.update.partition.path` is set to true. So with 
the 2nd commit, the delete for record key 100 went to a log file in partition 
creation_date=2015-01-01, while the new insert for the same record key went to 
the new partition creation_date=2015-01-02. An RO query reads only base files 
and does not merge log files, so it returns duplicates. Triggering compaction 
merges the delete into the base file and should resolve this. This is a known 
limitation of RO queries. 
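   To make the RO-query explanation concrete, here is a toy model of the table state after the 2nd commit. It assumes, purely for illustration, that a table is a list of base-file records plus a set of log-file deletes keyed by (record key, partition); `Rec`, `TableState` and `Hudi5869Model` are hypothetical and ignore everything else Hudi does (file groups, log block ordering, precombine).

   ```scala
   // Toy model only: Rec and TableState are hypothetical, not Hudi classes.
   final case class Rec(key: String, partition: String, value: String)

   final case class TableState(baseFiles: Seq[Rec], logDeletes: Set[(String, String)]) {
     // Read-optimized query: scans base files only, so log-file deletes are invisible.
     def readOptimized: Seq[Rec] = baseFiles

     // Snapshot query: merges log files on read, so the delete hides the stale copy.
     def snapshot: Seq[Rec] =
       baseFiles.filterNot(r => logDeletes.contains((r.key, r.partition)))

     // Compaction: folds the log files into new base files, after which
     // both query types agree.
     def compact: TableState = TableState(snapshot, Set.empty)
   }

   object Hudi5869Model {
     // State after the 2nd commit: key 100 moved from 2015-01-01 to 2015-01-02.
     val afterCommit2: TableState = TableState(
       baseFiles = Seq(
         Rec("100", "2015-01-01", "a"), // original copy, still in the old base file
         Rec("100", "2015-01-02", "b")  // new insert in the new partition's base file
       ),
       logDeletes = Set(("100", "2015-01-01")) // delete written to a log file
     )
   }
   ```

   In the model, `readOptimized` returns two rows for key 100, while `snapshot` and any read after `compact` return only the 2015-01-02 row.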
   
   
   Also, if you prefer not to update the partition path, e.g., if you wish to 
retain the record with record key 100 in partition 2015-01-01, set 
`hoodie.bloom.index.update.partition.path` = false. 
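   As a sketch, the index-related writer options for that choice might look like the following; only the two index keys are shown, and the remaining options (table name, record key, precombine and partition fields) are assumed to stay as in the reporter's setup. `GlobalBloomOptions` is a hypothetical holder name.

   ```scala
   // Sketch of Hudi writer options; merge them into the options passed to
   // df.write.format("hudi").options(...).
   object GlobalBloomOptions {
     val options: Map[String, String] = Map(
       "hoodie.index.type" -> "GLOBAL_BLOOM",
       // false: an upsert whose partition value changed updates the record in the
       // partition where it already lives, instead of deleting it there and
       // inserting it into the new partition.
       "hoodie.bloom.index.update.partition.path" -> "false"
     )
   }
   ```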
   
   





[GitHub] [hudi] nsivabalan commented on issue #5869: [SUPPORT] There are duplicate values in HUDI MOR table for different partition and not updating values in same partition for GLOBAL_BLOOM

2023-04-27 Thread via GitHub


nsivabalan commented on issue #5869:
URL: https://github.com/apache/hudi/issues/5869#issuecomment-1527002010

   Not reproducible with https://github.com/apache/hudi/pull/8490
   
   ```
   // run in spark-shell
   import org.apache.spark.sql.SaveMode._
   import org.apache.hudi.DataSourceReadOptions._
   import org.apache.hudi.DataSourceWriteOptions._
   import org.apache.spark.sql.{DataFrame, Row, SparkSession}
   import scala.collection.mutable
   
   val tableName = "hudi5869"
   val spark = SparkSession.builder.enableHiveSupport.getOrCreate
   val basePath = "/tmp/hudi5869/"
   import spark.implicits._
   
   // MOR table with a global bloom index; update.partition.path=true moves a
   // record to its new partition when its partition value changes
   val hudiOptions = mutable.Map(
     "hoodie.table.name" -> tableName,
     "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
     "hoodie.datasource.write.operation" -> "upsert",
     "hoodie.datasource.write.recordkey.field" -> "id",
     "hoodie.datasource.write.precombine.field" -> "last_update_time",
     "hoodie.datasource.write.partitionpath.field" -> "creation_date",
     "hoodie.index.type" -> "GLOBAL_BLOOM",
     "hoodie.bloom.index.update.partition.path" -> "true",
     "hoodie.compact.inline" -> "true",
     "hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator"
   )
   
   // Commit 1: records 100 and 101 in partition 2015-01-01
   val df = Seq(
     ("100", "2015-01-01", "1", "a"),
     ("101", "2015-01-01", "1", "a")
   ).toDF("id", "creation_date", "last_update_time", "new_col")
   
   df.write.format("hudi").
     options(hudiOptions).
     mode(Append).
     save(basePath)
   spark.read.format("hudi").load(basePath).show(false)
   
   // Commit 2: record 100 moves to partition 2015-01-02; record 101 is updated in place
   val df1 = Seq(
     ("100", "2015-01-02", "2", "b"),
     ("101", "2015-01-01", "2", "b")
   ).toDF("id", "creation_date", "last_update_time", "new_col")
   
   df1.write.format("hudi").
     options(hudiOptions).
     mode(Append).
     save(basePath)
   spark.read.format("hudi").load(basePath).show(false)
   ```
   
   Output:
   ```
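   scala> spark.read.format("hudi").load(basePath).show(false)
   +-------------------+---------------------+------------------+----------------------+---------------------------------------------------------------------------+---+-------------+----------------+-------+
   |_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                          |id |creation_date|last_update_time|new_col|
   +-------------------+---------------------+------------------+----------------------+---------------------------------------------------------------------------+---+-------------+----------------+-------+
   |20230427220827516  |20230427220827516_0_1|id:101            |2015-01-01            |f183954a-9d23-4192-a1ed-8efc25e4e77f-0                                     |101|2015-01-01   |2               |b      |
   |20230427220827516  |20230427220827516_1_0|id:100            |2015-01-02            |b395a368-8e9a-46c8-8660-c78cfd53d06f-0_1-275-1770_20230427220827516.parquet|100|2015-01-02   |2               |b      |
   +-------------------+---------------------+------------------+----------------------+---------------------------------------------------------------------------+---+-------------+----------------+-------+
   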
   ```





[GitHub] [hudi] nsivabalan commented on issue #6591: [SUPPORT]Duplicate records in MOR

2023-04-27 Thread via GitHub


nsivabalan commented on issue #6591:
URL: https://github.com/apache/hudi/issues/6591#issuecomment-1526991688

   It's already fixed by this patch: https://github.com/apache/hudi/pull/8490
   
   ```
   scala> spark.read.format("hudi").load(basePath).show(false)
   +-------------------+---------------------+------------------+----------------------+---------------------------------------------------------------------------+----------------+-------+------------+-----------------------+----------+
   |_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                          |game_schedule_id|game_id|game_date_cn|insert_date            |dt        |
   +-------------------+---------------------+------------------+----------------------+---------------------------------------------------------------------------+----------------+-------+------------+-----------------------+----------+
   |20230427215728276  |20230427215728276_0_3|game_schedule_id:5|2022-08-30            |bd4d1121-57bc-4103-91a0-5541a474ef9e-0_0-28-369_20230427215728276.parquet  |5               |10005  |2022-08-31  |2022-08-30 12:00:00.000|2022-08-30|
   |20230427215728276  |20230427215728276_0_4|game_schedule_id:6|2022-08-30            |bd4d1121-57bc-4103-91a0-5541a474ef9e-0_0-28-369_20230427215728276.parquet  |6               |10006  |2022-08-31  |2022-08-30 12:00:00.000|2022-08-30|
   |20230427215728276  |20230427215728276_0_5|game_schedule_id:1|2022-08-30            |bd4d1121-57bc-4103-91a0-5541a474ef9e-0_0-28-369_20230427215728276.parquet  |1               |10001  |2022-08-30  |2022-08-30 12:00:00.000|2022-08-30|
   |20230427215753406  |20230427215753406_1_0|game_schedule_id:2|2022-08-30            |484347af-e681-4b1e-ad99-e7c1cd9adeea-0_1-150-1051_20230427215753406.parquet|2               |10002  |2022-08-31  |2022-08-30 12:00:00.000|2022-08-30|
   |20230427215753406  |20230427215753406_1_1|game_schedule_id:3|2022-08-30            |484347af-e681-4b1e-ad99-e7c1cd9adeea-0_1-150-1051_20230427215753406.parquet|3               |10003  |2022-08-31  |2022-08-30 12:00:00.000|2022-08-30|
   |20230427215753406  |20230427215753406_1_2|game_schedule_id:4|2022-08-30            |484347af-e681-4b1e-ad99-e7c1cd9adeea-0_1-150-1051_20230427215753406.parquet|4               |10004  |2022-08-31  |2022-08-30 12:00:00.000|2022-08-30|
   +-------------------+---------------------+------------------+----------------------+---------------------------------------------------------------------------+----------------+-------+------------+-----------------------+----------+
   
   
   scala> spark.read.format("hudi").load(basePath).count
   res9: Long = 6
   
   ```





[GitHub] [hudi] nsivabalan closed issue #6591: [SUPPORT]Duplicate records in MOR

2023-04-27 Thread via GitHub


nsivabalan closed issue #6591: [SUPPORT]Duplicate records in MOR
URL: https://github.com/apache/hudi/issues/6591





[GitHub] [hudi] hudi-bot commented on pull request #8594: [HUDI-6148] Recreate StreamWriteOperatorCoordinator for global failovers

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8594:
URL: https://github.com/apache/hudi/pull/8594#issuecomment-1526984708

   
   ## CI report:
   
   * 4cae8a025b6427876f2bdc3d618e315ca59c547d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16726)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8594: [HUDI-6148] Recreate StreamWriteOperatorCoordinator for global failovers

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8594:
URL: https://github.com/apache/hudi/pull/8594#issuecomment-1526979733

   
   ## CI report:
   
   * 4cae8a025b6427876f2bdc3d618e315ca59c547d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] xccui opened a new pull request, #8594: [HUDI-6148] Recreate StreamWriteOperatorCoordinator for global failovers

2023-04-27 Thread via GitHub


xccui opened a new pull request, #8594:
URL: https://github.com/apache/hudi/pull/8594

   ### Change Logs
   
   When a global failover is triggered for a Flink job, it's safer to recreate 
the `StreamWriteOperatorCoordinator`. Otherwise, failures caused by the 
coordinator itself will never auto-heal.
   
   Flink offers a 
[`RecreateOnResetOperatorCoordinator`](https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/StateAssignmentOperation.java)
 that can be used to recreate the coordinator when the job is reset to a 
checkpoint (i.e., when a global failover is triggered).
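   The idea can be sketched in plain Scala: a wrapper owns a factory and, on reset, closes the current coordinator and builds a fresh one, so state corrupted by an earlier exception does not survive the failover. `Coordinator`, `RecreatingCoordinator` and `FlakyCoordinator` are hypothetical names for illustration; this is neither Flink's `RecreateOnResetOperatorCoordinator` API nor the change in this PR.

   ```scala
   // Hypothetical sketch of the recreate-on-reset pattern; none of these types
   // are Flink or Hudi classes.
   trait Coordinator {
     def handle(event: String): Unit
     def close(): Unit
   }

   // Wraps a coordinator and replaces it with a fresh instance on every reset.
   final class RecreatingCoordinator(factory: () => Coordinator) {
     private var current: Coordinator = factory()
     private var resets = 0

     def handle(event: String): Unit = current.handle(event)

     // Called when the job is reset to a checkpoint (a global failover):
     // discard the old instance, whatever state it is in, and start clean.
     def resetToCheckpoint(): Unit = {
       current.close()
       current = factory()
       resets += 1
     }

     def resetCount: Int = resets
   }

   // Breaks permanently after a "poison" event, standing in for a coordinator
   // whose internal state an exception has corrupted.
   final class FlakyCoordinator extends Coordinator {
     private var broken = false
     def handle(event: String): Unit = {
       if (broken) throw new IllegalStateException("coordinator is in a bad state")
       if (event == "poison") broken = true
     }
     def close(): Unit = ()
   }
   ```

   Reusing the same instance across the reset would keep throwing on every event, which is the "will not auto-heal" behavior the change log describes; recreating it clears the bad state.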
   
   This fixes #8554.
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low, medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] prm-xingcan closed pull request #8593: [HUDI-6148] Recreate StreamWriteOperatorCoordinator for global failovers

2023-04-27 Thread via GitHub


prm-xingcan closed pull request #8593: [HUDI-6148] Recreate 
StreamWriteOperatorCoordinator for global failovers
URL: https://github.com/apache/hudi/pull/8593





[jira] [Updated] (HUDI-6148) Recreate StreamWriteOperatorCoordinator for global failovers

2023-04-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6148:
-
Labels: pull-request-available  (was: )

> Recreate StreamWriteOperatorCoordinator for global failovers
> 
>
> Key: HUDI-6148
> URL: https://issues.apache.org/jira/browse/HUDI-6148
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Xingcan Cui
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.1, 0.12.3
>
>
> When a global failover for a Flink job is triggered, it's safer to recreate the 
> {{StreamWriteOperatorCoordinator}}. Otherwise, failures caused by the 
> coordinator itself will never auto-heal.





[GitHub] [hudi] prm-xingcan opened a new pull request, #8593: [HUDI-6148] Recreate StreamWriteOperatorCoordinator for global failovers

2023-04-27 Thread via GitHub


prm-xingcan opened a new pull request, #8593:
URL: https://github.com/apache/hudi/pull/8593

   ### Change Logs
   
   When a global failover is triggered for a Flink job, it's safer to recreate 
the `StreamWriteOperatorCoordinator`. Otherwise, failures caused by the 
coordinator itself will never auto-heal. 
   
   Flink offers a `RecreateOnResetOperatorCoordinator` that can be used to 
recreate the coordinator when the job is reset to a checkpoint (i.e., when a 
global failover is triggered).
   
   This should fix #8554.
   
   ### Impact
   
   No
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-6148) Recreate StreamWriteOperatorCoordinator for global failovers

2023-04-27 Thread Xingcan Cui (Jira)
Xingcan Cui created HUDI-6148:
-

 Summary: Recreate StreamWriteOperatorCoordinator for global 
failovers
 Key: HUDI-6148
 URL: https://issues.apache.org/jira/browse/HUDI-6148
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Reporter: Xingcan Cui
 Fix For: 0.13.1, 0.12.3


When a global failover for a Flink job is triggered, it's safer to recreate the 
{{StreamWriteOperatorCoordinator}}. Otherwise, failures caused by the 
coordinator itself will never auto-heal.





[GitHub] [hudi] hudi-bot commented on pull request #8587: [HUDI-6145] Fix the flink table create schema to be compatible with S…

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8587:
URL: https://github.com/apache/hudi/pull/8587#issuecomment-1526944107

   
   ## CI report:
   
   * 4ae48627662446f99d5aa84dae43725ea4e7a579 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16711)
 
   * e6c9e7f8adcae2128954243d47210552d2d104a5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16724)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With StringBuilder

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8478:
URL: https://github.com/apache/hudi/pull/8478#issuecomment-1526943843

   
   ## CI report:
   
   * 6d15b09d5d9fff2e0cdbb15faa73f62ad2e5e852 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16718)
 
   * 1b207dfb87f2e63eca74f81b85e4effa41794e2b UNKNOWN
   * dce3b83ad78abe83a70a83ef5a72c32ebb517bb3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16723)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8432:
URL: https://github.com/apache/hudi/pull/8432#issuecomment-1526943724

   
   ## CI report:
   
   * 8a67f296a325ef968f5a29ac5cd0c75a0f7c83c6 UNKNOWN
   * 22fb6052ee84211a79a360d04600c98697d80afa Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16710)
 
   * 2e3f72f1a66e7108fbb1167dfde17c248639638c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16720)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2023-04-27 Thread via GitHub


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1526942779

   
   ## CI report:
   
   * 6c5e047614106881b40c05cbaf4972e82a7c9440 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16705)
 
   * 136ee7be160746b1cf67f956d80e2bf2bfbb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16722)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7173: [HUDI-5189] Make HiveAvroSerializer compatible with hive3

2023-04-27 Thread via GitHub


hudi-bot commented on PR #7173:
URL: https://github.com/apache/hudi/pull/7173#issuecomment-1526942681

   
   ## CI report:
   
   * 650d19166b9d80abed32b07abf49dd4fb087aeeb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16573)
 
   * 33e116e83e6ca348dc6039db0f76ed5df50a731f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16721)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] danny0405 commented on pull request #8587: [HUDI-6145] Fix the flink table create schema to be compatible with S…

2023-04-27 Thread via GitHub


danny0405 commented on PR #8587:
URL: https://github.com/apache/hudi/pull/8587#issuecomment-1526941833

   The failed test case: 
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=16711=logs=3b6e910d-b98f-5de6-b9cb-1e5ff571f5de=30b5aae4-0ea0-5566-42d0-febf71a7061a=713634
   
   is flaky and is likely not caused by this patch; I validated it multiple 
times and could not reproduce the failure.





[GitHub] [hudi] hudi-bot commented on pull request #8587: [HUDI-6145] Fix the flink table create schema to be compatible with S…

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8587:
URL: https://github.com/apache/hudi/pull/8587#issuecomment-1526939303

   
   ## CI report:
   
   * 4ae48627662446f99d5aa84dae43725ea4e7a579 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16711)
 
   * e6c9e7f8adcae2128954243d47210552d2d104a5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8432:
URL: https://github.com/apache/hudi/pull/8432#issuecomment-1526938835

   
   ## CI report:
   
   * 8a67f296a325ef968f5a29ac5cd0c75a0f7c83c6 UNKNOWN
   * 22fb6052ee84211a79a360d04600c98697d80afa Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16710)
 
   * 2e3f72f1a66e7108fbb1167dfde17c248639638c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2023-04-27 Thread via GitHub


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1526937817

   
   ## CI report:
   
   * 6c5e047614106881b40c05cbaf4972e82a7c9440 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16705)
 
   * 136ee7be160746b1cf67f956d80e2bf2bfbb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7173: [HUDI-5189] Make HiveAvroSerializer compatible with hive3

2023-04-27 Thread via GitHub


hudi-bot commented on PR #7173:
URL: https://github.com/apache/hudi/pull/7173#issuecomment-1526937691

   
   ## CI report:
   
   * 650d19166b9d80abed32b07abf49dd4fb087aeeb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16573)
 
   * 33e116e83e6ca348dc6039db0f76ed5df50a731f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Closed] (HUDI-6114) Rollback handling in AbstractHoodieLogRecordReader may not work correctly when multi-writer is enabled

2023-04-27 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6114.

Resolution: Fixed

Fixed via master branch: a8c7d48325a74663dd4a4de8d1d8c0407eb7c258

> Rollback handling in AbstractHoodieLogRecordReader may not work correctly 
> when multi-writer is enabled
> --
>
> Key: HUDI-6114
> URL: https://issues.apache.org/jira/browse/HUDI-6114
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Prashant Wason
>Assignee: Prashant Wason
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.1, 0.14.0
>
>
> When a ROLLBACK command block is encountered, only the last log block is 
> potentially rolled back. This may not work in the case of multiple writers, where the 
> rollback may be applicable to an older block.
> E.g. assume two processes P1 and P2 are writing data to the MOR table. 
> P1 started at time t1 and P2 started at t2. Let's assume P1 writes its log 
> block and then P2 writes its log block.
>  
> So the log file now has two blocks: [LBlock1(instantTime=t1), 
> LBlock2(instantTime=t2)]
> If P1 failed after writing to the log file but before the commit could be 
> created, the inflight commit at t1 would eventually be rolled back. In that 
> case a rollback block will be written, and the log file would look like this:
> [LBlock1(instantTime=t1), LBlock2(instantTime=t2), LBlock(Rollback block with 
> targetInstantTime=t1)]
>  
> The current AbstractHoodieLogRecordReader code will not roll back LBlock1, as 
> it only applies the rollback to the last block.
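
The behaviour the issue asks for, matching a rollback block to data blocks by instant time instead of discarding only the most recent block, can be sketched roughly as below. `LogBlock`, `applyRollbacks`, and the string instant times are hypothetical simplifications for illustration, not the actual `AbstractHoodieLogRecordReader` code or the fix merged in #8523.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of rollback handling in a log reader;
// not the actual AbstractHoodieLogRecordReader implementation.
class RollbackSketch {

  // A data block has targetInstantTime == null; a rollback command block
  // names the instant whose data blocks must be discarded.
  record LogBlock(String name, String instantTime, String targetInstantTime) {
    boolean isRollback() {
      return targetInstantTime != null;
    }
  }

  // Returns the data blocks that survive once all rollback blocks are applied.
  // A rollback discards every data block written by its target instant,
  // wherever it sits in the file, not only the most recent block.
  static List<LogBlock> applyRollbacks(List<LogBlock> blocks) {
    Set<String> rolledBackInstants = new HashSet<>();
    for (LogBlock block : blocks) {
      if (block.isRollback()) {
        rolledBackInstants.add(block.targetInstantTime());
      }
    }
    List<LogBlock> valid = new ArrayList<>();
    for (LogBlock block : blocks) {
      if (!block.isRollback() && !rolledBackInstants.contains(block.instantTime())) {
        valid.add(block);
      }
    }
    return valid;
  }

  public static void main(String[] args) {
    // Scenario from the issue: P1 (t1) and P2 (t2) interleave, then t1 is rolled back.
    // The rollback block's own instant time "t3" is an assumption for illustration.
    List<LogBlock> logFile = List.of(
        new LogBlock("LBlock1", "t1", null),
        new LogBlock("LBlock2", "t2", null),
        new LogBlock("Rollback", "t3", "t1"));
    System.out.println(applyRollbacks(logFile).stream().map(LogBlock::name).toList()); // [LBlock2]
  }
}
```

Collecting the rollback targets in a first pass keeps the result independent of where the rollback block appears; a reader that streams blocks in a single pass would instead have to buffer candidate blocks until it knows no later rollback targets them.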



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6114) Rollback handling in AbstractHoodieLogRecordReader may not work correctly when multi-writer is enabled

2023-04-27 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6114:
-
Fix Version/s: 0.13.1
   0.14.0



[GitHub] [hudi] danny0405 commented on pull request #8523: [HUDI-6114] Fixed rollback of blocks in scanInternalV1

2023-04-27 Thread via GitHub


danny0405 commented on PR #8523:
URL: https://github.com/apache/hudi/pull/8523#issuecomment-1526936486

   The failed test: 
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=16702=logs=3b6e910d-b98f-5de6-b9cb-1e5ff571f5de=30b5aae4-0ea0-5566-42d0-febf71a7061a=717193
   
   is a known flaky test that has been fixed on master.





[GitHub] [hudi] rohan-uptycs commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-27 Thread via GitHub


rohan-uptycs commented on code in PR #8503:
URL: https://github.com/apache/hudi/pull/8503#discussion_r1179895882


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##
@@ -509,7 +509,15 @@ private Stream getCommitInstantsToArchive() throws IOException {
   }
 
   private Stream getInstantsToArchive() throws IOException {
-Stream instants = Stream.concat(getCleanInstantsToArchive(), getCommitInstantsToArchive());
+List commitInstantsToArchive = getCommitInstantsToArchive().collect(Collectors.toList());
+Stream instants = Stream.concat(getCleanInstantsToArchive(), commitInstantsToArchive.stream());
+HoodieInstant hoodieOldestInstantToArchive = commitInstantsToArchive.stream().max(Comparator.comparing(maxInstant -> maxInstant.getTimestamp())).orElse(null);
+/**
+ * if hoodieOldestInstantToArchive is null that means nothing is getting archived, so no need to update metadata
+ */
+if (hoodieOldestInstantToArchive != null) {
+  table.getIndex().updateMetadata(table, Option.of(hoodieOldestInstantToArchive));

Review Comment:
   @SteNicholas, yeah, it can be invoked, but I see a few problems with it.
   What if the underlying file system is down and **updateMetadata** fails to sync 
the metadata? Then there is no mechanism to bring it back in sync with the latest 
committed metadata, archival will eventually remove the replace commit, and the 
table will end up in an inconsistent state.
   On the other hand, **in the archival process, it will eventually be in sync with 
the committed metadata** before the replace commit gets archived.
   I think the consistent hashing metadata has a strong dependency on the archival 
process, as it depends on the replace commits in the active timeline to load its metadata.
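
The guard in the diff above, which picks the instant with the greatest timestamp among the commits about to be archived and skips the metadata update when nothing is archived, can be sketched with `Optional` instead of a nullable local. `ArchiveGuardSketch`, `Instant`, and the `Consumer` standing in for `updateMetadata` are hypothetical; the sketch assumes fixed-width timestamp strings that sort lexicographically, as Hudi instant times do, and the example timestamps are made up.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical sketch of "update index metadata only if something is archived";
// not the HoodieTimelineArchiver code under review.
class ArchiveGuardSketch {

  // Simplified stand-in for an instant on the timeline.
  record Instant(String action, String timestamp) {}

  // Latest instant (by timestamp) among those selected for archival, if any.
  static Optional<Instant> latestToArchive(List<Instant> toArchive) {
    return toArchive.stream().max(Comparator.comparing(Instant::timestamp));
  }

  // Invokes the metadata update only when at least one instant is archived;
  // returns whether the update was triggered.
  static boolean maybeUpdateMetadata(List<Instant> toArchive, Consumer<Instant> updateMetadata) {
    Optional<Instant> latest = latestToArchive(toArchive);
    latest.ifPresent(updateMetadata);
    return latest.isPresent();
  }

  public static void main(String[] args) {
    List<Instant> toArchive = List.of(
        new Instant("commit", "20230427101010123"),
        new Instant("replacecommit", "20230427111010123"));
    maybeUpdateMetadata(toArchive, i -> System.out.println("update up to " + i.timestamp()));
    // Empty selection: the callback is never invoked.
    System.out.println(maybeUpdateMetadata(List.of(), i -> System.out.println("unexpected")));
  }
}
```

Note that `max` yields the newest of the instants being archived; whether the index should be synced up to that instant is exactly the design question discussed in the comment above.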
   






[hudi] branch master updated (77039ae734a -> a8c7d48325a)

2023-04-27 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 77039ae734a [HUDI-5517] HoodieTimeline support filter instants by 
state transition time (#7627)
 add a8c7d48325a [HUDI-6114] Fixed rollback of blocks in scanInternalV1 
(#8523)

No new revisions were added by this update.

Summary of changes:
 .../table/log/AbstractHoodieLogRecordReader.java   |  50 ++--
 .../common/functional/TestHoodieLogFormat.java | 309 -
 2 files changed, 134 insertions(+), 225 deletions(-)



[GitHub] [hudi] danny0405 merged pull request #8523: [HUDI-6114] Fixed rollback of blocks in scanInternalV1

2023-04-27 Thread via GitHub


danny0405 merged PR #8523:
URL: https://github.com/apache/hudi/pull/8523





[GitHub] [hudi] hudi-bot commented on pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8557:
URL: https://github.com/apache/hudi/pull/8557#issuecomment-1526934343

   
   ## CI report:
   
   * 75d4c5b67703349573569677d88deb2b1eb647fd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16717)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With StringBuilder

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8478:
URL: https://github.com/apache/hudi/pull/8478#issuecomment-1526934101

   
   ## CI report:
   
   * 6d15b09d5d9fff2e0cdbb15faa73f62ad2e5e852 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16718)
 
   * 1b207dfb87f2e63eca74f81b85e4effa41794e2b UNKNOWN
   * dce3b83ad78abe83a70a83ef5a72c32ebb517bb3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] danny0405 commented on issue #7602: [SUPPORT] When does the Spark engine's bulk insert mode support bucket index

2023-04-27 Thread via GitHub


danny0405 commented on issue #7602:
URL: https://github.com/apache/hudi/issues/7602#issuecomment-1526921599

   Is this the fix you want? https://github.com/apache/hudi/pull/7834





[GitHub] [hudi] hudi-bot commented on pull request #8592: [MINOR] Improve TestStreamWriteOperatorCoordinator#testCommitOnEmptyBatch

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8592:
URL: https://github.com/apache/hudi/pull/8592#issuecomment-1526903880

   
   ## CI report:
   
   * f3c45931ee531eb35d339a0844cda68cb297cc7c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16719)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With StringBuilder

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8478:
URL: https://github.com/apache/hudi/pull/8478#issuecomment-1526903651

   
   ## CI report:
   
   * e529b3409ce663965bd925e7389a99f323d3ef2d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16707)
 
   * 6d15b09d5d9fff2e0cdbb15faa73f62ad2e5e852 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16718)
 
   * 1b207dfb87f2e63eca74f81b85e4effa41794e2b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] slfan1989 commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With StringBuilder

2023-04-27 Thread via GitHub


slfan1989 commented on PR #8478:
URL: https://github.com/apache/hudi/pull/8478#issuecomment-1526902500

   @danny0405 Thank you very much for helping to review the code and providing 
suggestions for improvement! I have refactored this part of the code using 
`StringBuilder`, as we discussed, to improve its readability. I also added some 
comments to provide additional context and clarification. Sorry for the delayed 
response.
   
   When you have time, could you please review this PR again? Thank you very 
much!
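
Since the refactored method itself is not quoted in this thread, the following is only a generic illustration of assembling a Hive DDL string with `StringBuilder`. `createTableDdl` and its parameters are hypothetical and are not the signature of `HiveSchemaUtil#generateCreateDDL`; the SerDe and input/output format clauses a real Hudi-generated DDL carries are omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration of building a CREATE TABLE statement with StringBuilder;
// not Hudi's HiveSchemaUtil implementation.
class CreateDdlSketch {

  // Builds a Hive CREATE EXTERNAL TABLE statement; column maps keep insertion order.
  static String createTableDdl(String db, String table, Map<String, String> columns,
                               Map<String, String> partitionColumns, String location) {
    StringBuilder sb = new StringBuilder("CREATE EXTERNAL TABLE IF NOT EXISTS ");
    sb.append('`').append(db).append("`.`").append(table).append("`(");
    sb.append(joinColumns(columns)).append(')');
    if (!partitionColumns.isEmpty()) {
      sb.append(" PARTITIONED BY (").append(joinColumns(partitionColumns)).append(')');
    }
    sb.append(" LOCATION '").append(location).append('\'');
    return sb.toString();
  }

  // Renders "`name` type" pairs separated by commas.
  private static String joinColumns(Map<String, String> cols) {
    StringJoiner joiner = new StringJoiner(", ");
    cols.forEach((name, type) -> joiner.add("`" + name + "` " + type));
    return joiner.toString();
  }

  public static void main(String[] args) {
    Map<String, String> cols = new LinkedHashMap<>();
    cols.put("id", "bigint");
    cols.put("name", "string");
    Map<String, String> parts = new LinkedHashMap<>();
    parts.put("dt", "string");
    System.out.println(createTableDdl("default", "trips", cols, parts, "/tmp/trips"));
  }
}
```

Appending each clause in order to one builder keeps the statement's structure readable top to bottom, which is the readability benefit the comment refers to.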





[GitHub] [hudi] hudi-bot commented on pull request #8592: [MINOR] Improve TestStreamWriteOperatorCoordinator#testCommitOnEmptyBatch

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8592:
URL: https://github.com/apache/hudi/pull/8592#issuecomment-1526899039

   
   ## CI report:
   
   * f3c45931ee531eb35d339a0844cda68cb297cc7c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With StringBuilder

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8478:
URL: https://github.com/apache/hudi/pull/8478#issuecomment-1526898655

   
   ## CI report:
   
   * e529b3409ce663965bd925e7389a99f323d3ef2d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16707)
 
   * 6d15b09d5d9fff2e0cdbb15faa73f62ad2e5e852 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8591: [MINOR] Enhancing validate staged bundles script

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8591:
URL: https://github.com/apache/hudi/pull/8591#issuecomment-1526892070

   
   ## CI report:
   
   * 3f368d58afd0982615405d965d29422b57a1 UNKNOWN
   * d7a6f5fbb3c280f2c2baaa76cc77d33b247730a3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16716)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-04-27 Thread via GitHub


hudi-bot commented on PR #7632:
URL: https://github.com/apache/hudi/pull/7632#issuecomment-1526891136

   
   ## CI report:
   
   * 07f5163e41234081bb0c9f9cd9885ed150487d06 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16714)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] stream2000 opened a new pull request, #8592: [MINOR] Improve TestStreamWriteOperatorCoordinator#testCommitOnEmptyBatch

2023-04-27 Thread via GitHub


stream2000 opened a new pull request, #8592:
URL: https://github.com/apache/hudi/pull/8592

   ### Change Logs
   
   Improve TestStreamWriteOperatorCoordinator#testCommitOnEmptyBatch. The test now 
sends all write events before committing the empty instant.
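
The ordering this test exercises, where every write task reports (possibly with no data) before the empty instant is committed, can be modelled roughly as follows. `CoordinatorSketch` and `WriteEvent` are hypothetical simplifications, not the classes in Hudi's Flink module, and real commit logic is reduced to a counter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified model of a streaming write coordinator;
// not Hudi's StreamWriteOperatorCoordinator.
class CoordinatorSketch {

  // One task's report for the current instant; an empty list models an empty batch.
  record WriteEvent(int taskId, List<String> writtenFiles) {}

  private final WriteEvent[] eventBuffer;
  private int commits = 0;

  CoordinatorSketch(int parallelism) {
    this.eventBuffer = new WriteEvent[parallelism];
  }

  // Buffers the event and commits once every task has reported,
  // even if every report is empty.
  void handleEvent(WriteEvent event) {
    eventBuffer[event.taskId()] = event;
    if (Arrays.stream(eventBuffer).allMatch(Objects::nonNull)) {
      commits++;
      Arrays.fill(eventBuffer, null); // reset for the next instant
    }
  }

  int commits() {
    return commits;
  }

  public static void main(String[] args) {
    CoordinatorSketch coordinator = new CoordinatorSketch(2);
    coordinator.handleEvent(new WriteEvent(0, List.of()));
    System.out.println(coordinator.commits()); // 0: task 1 has not reported yet
    coordinator.handleEvent(new WriteEvent(1, List.of()));
    System.out.println(coordinator.commits()); // 1: the empty instant is committed
  }
}
```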
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8557:
URL: https://github.com/apache/hudi/pull/8557#issuecomment-1526863201

   
   ## CI report:
   
   * a79402c2cb4e8482b02cd8aaad613774703ceb3d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16632)
 
   * 75d4c5b67703349573569677d88deb2b1eb647fd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16717)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8557:
URL: https://github.com/apache/hudi/pull/8557#issuecomment-1526858818

   
   ## CI report:
   
   * a79402c2cb4e8482b02cd8aaad613774703ceb3d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16632)
 
   * 75d4c5b67703349573569677d88deb2b1eb647fd UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8590: [HUDI-3545] [UBER] Make HoodieAvroWriteSupport class configurable

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8590:
URL: https://github.com/apache/hudi/pull/8590#issuecomment-1526854211

   
   ## CI report:
   
   * c9ffac46b68ebb82eb65cb867d579345a2abeac9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16713)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8591: [MINOR] Enhancing validate staged bundles script

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8591:
URL: https://github.com/apache/hudi/pull/8591#issuecomment-1526814179

   
   ## CI report:
   
   * 3f368d58afd0982615405d965d29422b57a1 UNKNOWN
   * d7a6f5fbb3c280f2c2baaa76cc77d33b247730a3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16716)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1526812836

   
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * 76394b7ed5df01286d5085e5f1b43a47e52baa5d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8591: [MINOR] Enhancing validate staged bundles script

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8591:
URL: https://github.com/apache/hudi/pull/8591#issuecomment-1526782981

   
   ## CI report:
   
   * 3f368d58afd0982615405d965d29422b57a1 UNKNOWN
   * d7a6f5fbb3c280f2c2baaa76cc77d33b247730a3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1526782528

   
   ## CI report:
   
   * c6908a16bf2f1fb46735781f8d969177eadc23a4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668)
 
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * 76394b7ed5df01286d5085e5f1b43a47e52baa5d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8591: [MINOR] Enhancing validate staged bundles script

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8591:
URL: https://github.com/apache/hudi/pull/8591#issuecomment-1526772068

   
   ## CI report:
   
   * 3f368d58afd0982615405d965d29422b57a1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1526770534

   
   ## CI report:
   
   * c6908a16bf2f1fb46735781f8d969177eadc23a4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668)
 
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   
   



[GitHub] [hudi] nsivabalan opened a new pull request, #8591: [MINOR] Enhancing validate staged bundles script

2023-04-27 Thread via GitHub


nsivabalan opened a new pull request, #8591:
URL: https://github.com/apache/hudi/pull/8591

   ### Change Logs
   
   Enhancing validate staged bundles script
   
   ### Impact
   
   Enhancing validate staged bundles script
   
   ### Risk level (write none, low, medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs is changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8589: [6147] Deltastreamer finish failed compaction before ingestion

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8589:
URL: https://github.com/apache/hudi/pull/8589#issuecomment-1526758337

   
   ## CI report:
   
   * 9ee4505209c15331987695061f05d0b8e6d06848 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16712)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-04-27 Thread via GitHub


hudi-bot commented on PR #7632:
URL: https://github.com/apache/hudi/pull/7632#issuecomment-1526754977

   
   ## CI report:
   
   * c7fce967ece6f3cef5cb34ddc2ff7ddffa66727f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16565)
 
   * 07f5163e41234081bb0c9f9cd9885ed150487d06 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16714)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-04-27 Thread via GitHub


hudi-bot commented on PR #7632:
URL: https://github.com/apache/hudi/pull/7632#issuecomment-1526671395

   
   ## CI report:
   
   * c7fce967ece6f3cef5cb34ddc2ff7ddffa66727f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16565)
 
   * 07f5163e41234081bb0c9f9cd9885ed150487d06 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8590: [HUDI-3545] [UBER] Make HoodieAvroWriteSupport class configurable

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8590:
URL: https://github.com/apache/hudi/pull/8590#issuecomment-1526659953

   
   ## CI report:
   
   * c9ffac46b68ebb82eb65cb867d579345a2abeac9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16713)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8590: [HUDI-3545] [UBER] Make HoodieAvroWriteSupport class configurable

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8590:
URL: https://github.com/apache/hudi/pull/8590#issuecomment-1526645612

   
   ## CI report:
   
   * c9ffac46b68ebb82eb65cb867d579345a2abeac9 UNKNOWN
   
   



[jira] [Updated] (HUDI-3545) Make HoodieAvroWriteSupport class configurable

2023-04-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3545:
-
Labels: pull-request-available  (was: )

> Make HoodieAvroWriteSupport class configurable
> --
>
> Key: HUDI-3545
> URL: https://issues.apache.org/jira/browse/HUDI-3545
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Surya Prasanna Yalla
>Assignee: Surya Prasanna Yalla
>Priority: Major
>  Labels: pull-request-available
>
> Make the HoodieAvroWriteSupport class configurable so that it can be 
> overridden by custom write support classes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] suryaprasanna opened a new pull request, #8590: [HUDI-3545] [UBER] Make HoodieAvroWriteSupport class configurable

2023-04-27 Thread via GitHub


suryaprasanna opened a new pull request, #8590:
URL: https://github.com/apache/hudi/pull/8590

   ### Change Logs
   
   All custom write support classes should extend HoodieAvroWriteSupport. The change 
uses reflection to load subclasses of HoodieAvroWriteSupport, which makes it 
possible to override writeContext and other methods.
   
   ### Impact
   
   By default, HoodieAvroWriteSupport is still used, so there is no impact.
   
   ### Risk level (write none, low, medium or high below)
   
   None
   
   ### Documentation Update
   
   A new config, `hoodie.avro.write.support.class`, is added to the 
HoodieStorageConfig class; it is used to load subclasses of 
HoodieAvroWriteSupport at runtime.
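A minimal sketch of the reflection-based loading described above, under simplifying assumptions: `BaseWriteSupport` and `CustomWriteSupport` are hypothetical stand-ins for HoodieAvroWriteSupport and a user subclass, the properties are a plain `Map`, and both classes use no-arg constructors (the real write support classes take constructor arguments that are omitted here). Only the config key `hoodie.avro.write.support.class` comes from the PR description; this is an illustration, not the PR's implementation.

```java
import java.util.Map;

// Minimal sketch of reflection-based class loading with a default fallback.
// BaseWriteSupport / CustomWriteSupport are hypothetical stand-ins for
// HoodieAvroWriteSupport and a user-provided subclass.
public class WriteSupportLoader {

  // Config key named in the PR description.
  public static final String WRITE_SUPPORT_CLASS_KEY = "hoodie.avro.write.support.class";

  public static class BaseWriteSupport {
    public String name() {
      return "base";
    }
  }

  public static class CustomWriteSupport extends BaseWriteSupport {
    @Override
    public String name() {
      return "custom";
    }
  }

  // Instantiates the configured class, or the base class when the key is unset.
  public static BaseWriteSupport load(Map<String, String> props) {
    String className =
        props.getOrDefault(WRITE_SUPPORT_CLASS_KEY, BaseWriteSupport.class.getName());
    try {
      Class<?> clazz = Class.forName(className);
      // Reject classes that do not extend the base class before instantiating them.
      if (!BaseWriteSupport.class.isAssignableFrom(clazz)) {
        throw new IllegalArgumentException(
            className + " does not extend " + BaseWriteSupport.class.getName());
      }
      return clazz.asSubclass(BaseWriteSupport.class).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(load(Map.of()).name());
    System.out.println(
        load(Map.of(WRITE_SUPPORT_CLASS_KEY, CustomWriteSupport.class.getName())).name());
  }
}
```

Checking `isAssignableFrom` before instantiating gives a clear error when the configured class does not extend the base class, and falling back to the base class name keeps the default behavior unchanged when the key is unset.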
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8589: [6147] Deltastreamer finish failed compaction before ingestion

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8589:
URL: https://github.com/apache/hudi/pull/8589#issuecomment-1526415903

   
   ## CI report:
   
   * 9ee4505209c15331987695061f05d0b8e6d06848 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16712)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8432:
URL: https://github.com/apache/hudi/pull/8432#issuecomment-1526414671

   
   ## CI report:
   
   * 8a67f296a325ef968f5a29ac5cd0c75a0f7c83c6 UNKNOWN
   * 22fb6052ee84211a79a360d04600c98697d80afa Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16710)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8587: [HUDI-6145] Fix the flink table create schema to be compatible with S…

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8587:
URL: https://github.com/apache/hudi/pull/8587#issuecomment-1526399286

   
   ## CI report:
   
   * 4ae48627662446f99d5aa84dae43725ea4e7a579 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16711)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8589: [6147] Deltastreamer finish failed compaction before ingestion

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8589:
URL: https://github.com/apache/hudi/pull/8589#issuecomment-1526399416

   
   ## CI report:
   
   * 9ee4505209c15331987695061f05d0b8e6d06848 UNKNOWN
   
   



[GitHub] [hudi] jonvex opened a new pull request, #8589: [6147] Deltastreamer finish failed compaction before ingestion

2023-04-27 Thread via GitHub


jonvex opened a new pull request, #8589:
URL: https://github.com/apache/hudi/pull/8589

   ### Change Logs
   
   If a compaction failed previously, Deltastreamer currently rolls it back and 
redoes the compaction after the next deltacommit. Now, if the config 
`--retry-last-pending-inline-compaction` is set, Deltastreamer completes the 
pending compaction before the commit.
   
   ### Impact
   
   Improvement
   
   ### Risk level (write none, low, medium or high below)
   
   None
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8574:
URL: https://github.com/apache/hudi/pull/8574#issuecomment-1526246719

   
   ## CI report:
   
   * 080184949c2e1ece7e272e71f0844597a41c8a0b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16709)
 
   
   



[jira] [Created] (HUDI-6147) Retry Pending Compactions before ingestion in Deltastreamer

2023-04-27 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-6147:
-

 Summary: Retry Pending Compactions before ingestion in 
Deltastreamer
 Key: HUDI-6147
 URL: https://issues.apache.org/jira/browse/HUDI-6147
 Project: Apache Hudi
  Issue Type: Improvement
  Components: compaction, deltastreamer
Reporter: Jonathan Vexler
Assignee: Jonathan Vexler


Currently, the timeline looks like this, with the compaction completed only 
after the deltacommit:
{code:java}
drwxr-xr-x  2 jon  staff    64 Apr 27 15:15 archived
-rw-r--r--  1 jon  staff     0 Apr 27 15:15 20230427151550027.deltacommit.requested
drwxr-xr-x  4 jon  staff   128 Apr 27 15:15 metadata
-rw-r--r--  1 jon  staff   898 Apr 27 15:15 hoodie.properties
-rw-r--r--  1 jon  staff  2199 Apr 27 15:15 20230427151550027.deltacommit.inflight
-rw-r--r--  1 jon  staff  5269 Apr 27 15:15 20230427151550027.deltacommit
-rw-r--r--  1 jon  staff     0 Apr 27 15:15 20230427151558008.deltacommit.requested
-rw-r--r--  1 jon  staff  4365 Apr 27 15:15 20230427151558008.deltacommit.inflight
-rw-r--r--  1 jon  staff  6177 Apr 27 15:16 20230427151558008.deltacommit
-rw-r--r--  1 jon  staff  3211 Apr 27 15:16 20230427151602293.compaction.requested
-rw-r--r--  1 jon  staff     0 Apr 27 15:16 20230427151604955.deltacommit.requested
-rw-r--r--  1 jon  staff  4365 Apr 27 15:16 20230427151604955.deltacommit.inflight
-rw-r--r--  1 jon  staff  5977 Apr 27 15:16 20230427151604955.deltacommit
-rw-r--r--  1 jon  staff  1229 Apr 27 15:16 20230427151609434.rollback.requested
-rw-r--r--  1 jon  staff     0 Apr 27 15:16 20230427151609434.rollback.inflight
-rw-r--r--  1 jon  staff  1413 Apr 27 15:16 20230427151609434.rollback
-rw-r--r--  1 jon  staff     0 Apr 27 15:16 20230427151602293.compaction.inflight
-rw-r--r--  1 jon  staff  5271 Apr 27 15:16 20230427151602293.commit
{code}
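To make the ordering in the listing above easier to check, the following sketch derives which compaction instants are still pending from timeline file names. It assumes the `<instant>.<action>[.<state>]` naming visible in the listing and treats a compaction as complete once a `<instant>.commit` file with the same instant time exists, as happens for `20230427151602293` above. It is a standalone illustration, not Hudi's timeline API.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Standalone sketch (not Hudi's timeline API): finds compaction instants that were
// scheduled (compaction.requested / compaction.inflight) but have no completed <instant>.commit.
public class PendingCompactions {

  public static Set<String> pending(List<String> timelineFiles) {
    Set<String> scheduled = new TreeSet<>();
    Set<String> committed = new TreeSet<>();
    for (String name : timelineFiles) {
      String[] parts = name.split("\\.");
      // Skip entries that are not instants, e.g. "archived", "metadata", "hoodie.properties".
      if (parts.length < 2 || !parts[0].matches("\\d+")) {
        continue;
      }
      String instant = parts[0];
      String action = parts[1];
      if (action.equals("compaction")) {
        scheduled.add(instant);
      } else if (action.equals("commit") && parts.length == 2) {
        // A finished compaction appears as "<instant>.commit" in the listing.
        committed.add(instant);
      }
    }
    scheduled.removeAll(committed);
    return scheduled; // sorted by instant time
  }

  public static void main(String[] args) {
    List<String> files = List.of(
        "hoodie.properties",
        "20230427151558008.deltacommit",
        "20230427151602293.compaction.requested",
        "20230427151604955.deltacommit",
        "20230427151609434.rollback");
    System.out.println(pending(files));
  }
}
```

Applied to the files up to `20230427151609434.rollback`, the sketch reports `20230427151602293` as pending, which matches the listing: that compaction only completes after the later deltacommit `20230427151604955`.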





[GitHub] [hudi] sydneyhoran commented on pull request #7913: Adding support for EPOCHMICROSECONDS in TimestampBasedAvroKeyGenerator

2023-04-27 Thread via GitHub


sydneyhoran commented on PR #7913:
URL: https://github.com/apache/hudi/pull/7913#issuecomment-1526194783

   @bvaradar Hi there, I just realized I made this PR using my old GitHub 
account, so I missed these notifications. I will look at addressing this test 
case as soon as I have a chance.





[GitHub] [hudi] nsivabalan commented on pull request #8588: [MINOR] Release 0.12.3 script update

2023-04-27 Thread via GitHub


nsivabalan commented on PR #8588:
URL: https://github.com/apache/hudi/pull/8588#issuecomment-1526166427

   When we release 0.12.4, we might need these scripts, since the 0.12.x line has 
different bundles compared to 0.13.0. 
   





[GitHub] [hudi] nsivabalan opened a new pull request, #8588: [MINOR] Release 0.12.3 script update

2023-04-27 Thread via GitHub


nsivabalan opened a new pull request, #8588:
URL: https://github.com/apache/hudi/pull/8588

   ### Change Logs
   
   Some release script updates for 0.12.3
   
   ### Impact
   
   Some release script updates for 0.12.3
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8514: [HUDI-6113] Support multiple transformers using the same config keys in DeltaStreamer

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8514:
URL: https://github.com/apache/hudi/pull/8514#issuecomment-1526142239

   
   ## CI report:
   
   * 272e1bd85c19968502784da9eadc14624059ed7f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16708)
 
   
   



[GitHub] [hudi] nsivabalan commented on a diff in pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-04-27 Thread via GitHub


nsivabalan commented on code in PR #7632:
URL: https://github.com/apache/hudi/pull/7632#discussion_r1179510832


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##
@@ -483,6 +483,15 @@ object DataSourceWriteOptions {
   + "This could introduce the potential issue that the job is 
restart(`batch id` is lost) while spark checkpoint write fails, "
   + "causing spark will retry and rewrite the data.")
 
+  val STREAMING_DISABLE_COMPACTION: ConfigProperty[String] = ConfigProperty
+.key("hoodie.datasource.write.streaming.disable.compaction")
+.defaultValue("false")
+.sinceVersion("0.13.0")

Review Comment:
   Let's fix the version.






[GitHub] [hudi] hudi-bot commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With StringBuilder

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8478:
URL: https://github.com/apache/hudi/pull/8478#issuecomment-1526072712

   
   ## CI report:
   
   * e529b3409ce663965bd925e7389a99f323d3ef2d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16707)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2023-04-27 Thread via GitHub


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1526070638

   
   ## CI report:
   
   * 6c5e047614106881b40c05cbaf4972e82a7c9440 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16705)
 
   
   



[GitHub] [hudi] vedantkhandelwal closed issue #8396: [SUPPORT] Cleaner configs not working . Need to clean .hoodie files after certain interval/batch

2023-04-27 Thread via GitHub


vedantkhandelwal closed issue #8396: [SUPPORT] Cleaner configs not working . 
Need to clean .hoodie files after certain interval/batch
URL: https://github.com/apache/hudi/issues/8396





[GitHub] [hudi] vedantkhandelwal commented on issue #8396: [SUPPORT] Cleaner configs not working . Need to clean .hoodie files after certain interval/batch

2023-04-27 Thread via GitHub


vedantkhandelwal commented on issue #8396:
URL: https://github.com/apache/hudi/issues/8396#issuecomment-1526014493

   I've figured out the issue. We ran our data pipelines on Hudi 0.8 (from Feb 2022 to July 2022) and then migrated to Hudi 0.9 (July 2022 to mid-Feb 2023). We noticed that archive files were created in both the .hoodie and .hoodie/archived/ directories, 
   for instance:
   .hoodie/.commits_.archive.110_1-0-1
   .hoodie/archived/.commits_.archive.110_1-0-1 
   
   Then we migrated to Hudi 0.11.1; there the cleaner was working, but the number of S3 files kept increasing.
   Then we deleted all archive, requested, clean, rollback, deltacommit, compaction, commit, and inflight files older than 2 days (retaining only 2 days of files).
   Then we migrated to Hudi 0.12.2; the cleaner works fine there as well, and the number of files now stays bounded.
   





[GitHub] [hudi] the-other-tim-brown commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-04-27 Thread via GitHub


the-other-tim-brown commented on issue #8519:
URL: https://github.com/apache/hudi/issues/8519#issuecomment-1526004722

   @sydneyhoran I'm still trying to come up to speed on the errors you are 
seeing, but I can chime in on the behavior of the PostgresDebeziumSource and 
the payload. The 
[source](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/debezium/PostgresDebeziumSource.java#L79)
 does something similar to selecting `before.*` or `after.*`, along with 
pulling out some metadata from the Debezium record. The payload marks 
the row for deletion if the `op` is `d`. To properly delete the 
record, the `before` field must be set for deletions so that the 
proper `id`, `inserted_at`, and `updated_at` values can be extracted; with them, Hudi knows which record 
to delete, which partition it is in, and whether it is the latest update for 
that record.
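The delete handling described above can be sketched as follows. This is a hypothetical illustration, not the PostgresDebeziumSource or payload code: a change event is reduced to its `op`, `before`, and `after` images as plain maps, a delete (`op` = `d`) takes its fields from `before`, and every other operation takes them from `after`. The fields `id`, `inserted_at`, and `updated_at` from the comment are required, mirroring the point that they identify the record, its partition, and whether it is the latest update.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: picks the Debezium image (before/after) that a delete or upsert would use.
// Not the PostgresDebeziumSource or payload implementation.
public class DebeziumImage {

  // Fields from the comment that identify the record, its partition, and its ordering.
  public static final List<String> REQUIRED_FIELDS = List.of("id", "inserted_at", "updated_at");

  // Flattened row plus a flag telling the writer whether to delete it.
  public static final class Flattened {
    public final Map<String, Object> fields;
    public final boolean isDelete;

    Flattened(Map<String, Object> fields, boolean isDelete) {
      this.fields = fields;
      this.isDelete = isDelete;
    }
  }

  public static Flattened flatten(String op, Map<String, Object> before, Map<String, Object> after) {
    boolean isDelete = "d".equals(op);
    // Deletes carry their last known values in "before"; inserts and updates in "after".
    Map<String, Object> image = isDelete ? before : after;
    String imageName = isDelete ? "before" : "after";
    if (image == null) {
      throw new IllegalArgumentException("'" + imageName + "' image is required for op=" + op);
    }
    for (String field : REQUIRED_FIELDS) {
      if (!image.containsKey(field)) {
        throw new IllegalArgumentException("'" + imageName + "' image is missing " + field);
      }
    }
    return new Flattened(image, isDelete);
  }

  public static void main(String[] args) {
    Map<String, Object> row =
        Map.of("id", 42, "inserted_at", "2023-04-01", "updated_at", "2023-04-27");
    Flattened delete = flatten("d", row, null);
    System.out.println(delete.isDelete + " " + delete.fields.get("id"));
  }
}
```

Failing fast when `before` is missing on a delete surfaces the misconfiguration described in the comment instead of producing a record without the fields Hudi needs to locate it.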





[GitHub] [hudi] hudi-bot commented on pull request #8587: [HUDI-6145] Fix the flink table create schema to be compatible with S…

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8587:
URL: https://github.com/apache/hudi/pull/8587#issuecomment-1525933114

   
   ## CI report:
   
   * 8db0bfcd2ce5aee94771f35ceb8c0eeb905d2003 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16704)
 
   * 4ae48627662446f99d5aa84dae43725ea4e7a579 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16711)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8587: [HUDI-6145] Fix the flink table create schema to be compatible with S…

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8587:
URL: https://github.com/apache/hudi/pull/8587#issuecomment-1525920793

   
   ## CI report:
   
   * 8db0bfcd2ce5aee94771f35ceb8c0eeb905d2003 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16704)
 
   * 4ae48627662446f99d5aa84dae43725ea4e7a579 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8432:
URL: https://github.com/apache/hudi/pull/8432#issuecomment-1525919896

   
   ## CI report:
   
   * 3e9388ee9a6edaa6caab4f738b093f82744bc7dc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16650)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16693)
 
   * 8a67f296a325ef968f5a29ac5cd0c75a0f7c83c6 UNKNOWN
   * 22fb6052ee84211a79a360d04600c98697d80afa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16710)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8432:
URL: https://github.com/apache/hudi/pull/8432#issuecomment-1525905963

   
   ## CI report:
   
   * 3e9388ee9a6edaa6caab4f738b093f82744bc7dc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16650)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16693)
 
   * 8a67f296a325ef968f5a29ac5cd0c75a0f7c83c6 UNKNOWN
   * 22fb6052ee84211a79a360d04600c98697d80afa UNKNOWN
   
   



[GitHub] [hudi] parisni commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-27 Thread via GitHub


parisni commented on PR #8432:
URL: https://github.com/apache/hudi/pull/8432#issuecomment-1525833564

   @danny0405 I added a commit that reproduces the exact context from the issue. The 
previous tests did not fail without the patch.





[GitHub] [hudi] xiarixiaoyao commented on pull request #8587: [HUDI-6145] Fix the flink table create schema to be compatible with S…

2023-04-27 Thread via GitHub


xiarixiaoyao commented on PR #8587:
URL: https://github.com/apache/hudi/pull/8587#issuecomment-1525755093

   nice work



[GitHub] [hudi] hudi-bot commented on pull request #8523: [HUDI-6114] Fixed rollback of blocks in scanInternalV1

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8523:
URL: https://github.com/apache/hudi/pull/8523#issuecomment-1525735385

   
   ## CI report:
   
   * 1450f0a041a5b2b5e8730ccc34c320ea7847abf6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16702)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8574:
URL: https://github.com/apache/hudi/pull/8574#issuecomment-1525693379

   
   ## CI report:
   
   * 5e6936d61afac645b8372b9a2564981c5aad9dca Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16706)
   * 080184949c2e1ece7e272e71f0844597a41c8a0b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16709)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8503:
URL: https://github.com/apache/hudi/pull/8503#issuecomment-1525692925

   
   ## CI report:
   
   * 0738d975df341763e384b9ac9bcad14b006c9c47 UNKNOWN
   * f6a34e0433d9a3f2f7d7c618fe1e94edd3fc82cb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16701)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-27 Thread via GitHub


SteNicholas commented on code in PR #8503:
URL: https://github.com/apache/hudi/pull/8503#discussion_r1179114163


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##
@@ -509,7 +509,15 @@ private Stream<HoodieInstant> getCommitInstantsToArchive() throws IOException {
   }
 
   private Stream<HoodieInstant> getInstantsToArchive() throws IOException {
-Stream<HoodieInstant> instants = Stream.concat(getCleanInstantsToArchive(), getCommitInstantsToArchive());
+List<HoodieInstant> commitInstantsToArchive = getCommitInstantsToArchive().collect(Collectors.toList());
+Stream<HoodieInstant> instants = Stream.concat(getCleanInstantsToArchive(), commitInstantsToArchive.stream());
+HoodieInstant hoodieOldestInstantToArchive = commitInstantsToArchive.stream().max(Comparator.comparing(maxInstant -> maxInstant.getTimestamp())).orElse(null);
+/**
+ * if hoodieOldestInstantToArchive is null that means nothing is getting archived, so no need to update metadata
+ */
+if (hoodieOldestInstantToArchive != null) {
+  table.getIndex().updateMetadata(table, Option.of(hoodieOldestInstantToArchive));

Review Comment:
   @rohan-uptycs, `updateMetadata` could be invoked once `postCommit` executes successfully.
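The diff collects `getCommitInstantsToArchive()` into a `List` before building the combined stream. A Java `Stream` can only be traversed once, and the commit instants are needed twice: once for `Stream.concat` and once for `max`. Below is a minimal sketch of that pattern. It assumes a hypothetical `SketchInstant` class in place of `HoodieInstant`, and the `ArchivePlan`/`ArchiveSketch` names are also hypothetical. It is not Hudi's archiver code and omits the actual `updateMetadata` call.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for Hudi's HoodieInstant; only action and timestamp are modeled.
final class SketchInstant {
    private final String action;
    private final String timestamp;

    SketchInstant(String action, String timestamp) {
        this.action = action;
        this.timestamp = timestamp;
    }

    String getTimestamp() {
        return timestamp;
    }

    @Override
    public String toString() {
        return action + "@" + timestamp;
    }
}

// Hypothetical result holder: everything to archive, plus the latest commit
// instant (empty when no commit instants are being archived).
final class ArchivePlan {
    final List<SketchInstant> toArchive;
    final Optional<SketchInstant> latestCommit;

    ArchivePlan(List<SketchInstant> toArchive, Optional<SketchInstant> latestCommit) {
        this.toArchive = toArchive;
        this.latestCommit = latestCommit;
    }
}

final class ArchiveSketch {
    static ArchivePlan plan(Stream<SketchInstant> cleanInstants,
                            Stream<SketchInstant> commitInstants) {
        // A Stream can be consumed only once, so materialize the commit
        // instants before using them twice below.
        List<SketchInstant> commits = commitInstants.collect(Collectors.toList());

        // First use: combine clean and commit instants into one archive list.
        List<SketchInstant> toArchive = Stream.concat(cleanInstants, commits.stream())
                .collect(Collectors.toList());

        // Second use: find the latest commit. Assumes fixed-width numeric
        // timestamps, so lexicographic string order matches time order.
        Optional<SketchInstant> latest = commits.stream()
                .max(Comparator.comparing(SketchInstant::getTimestamp));

        return new ArchivePlan(toArchive, latest);
    }
}
```

A caller would run its metadata update only inside `plan.latestCommit.ifPresent(...)`, which mirrors the `!= null` check in the diff. Returning an `Optional` instead of `orElse(null)` is a stylistic choice here, not something the PR does.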




[GitHub] [hudi] hudi-bot commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8574:
URL: https://github.com/apache/hudi/pull/8574#issuecomment-1525630390

   
   ## CI report:
   
   * cf4e7358763e10aab951d16d7270f7592f7c62b0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16645)
   * 5e6936d61afac645b8372b9a2564981c5aad9dca Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16706)
   * 080184949c2e1ece7e272e71f0844597a41c8a0b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #8514: [HUDI-6113] Support multiple transformers using the same config keys in DeltaStreamer

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8514:
URL: https://github.com/apache/hudi/pull/8514#issuecomment-1525630032

   
   ## CI report:
   
   * 524a060f0e5fcf31d43cc538dbc80004d64c9b52 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16685)
   * 272e1bd85c19968502784da9eadc14624059ed7f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16708)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With StringBuilder

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8478:
URL: https://github.com/apache/hudi/pull/8478#issuecomment-1525629794

   
   ## CI report:
   
   * 6605f4759c25ff79eb43928cdbc97b086a905534 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16698)
   * e529b3409ce663965bd925e7389a99f323d3ef2d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16707)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #8514: [HUDI-6113] Support multiple transformers using the same config keys in DeltaStreamer

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8514:
URL: https://github.com/apache/hudi/pull/8514#issuecomment-1525619167

   
   ## CI report:
   
   * 524a060f0e5fcf31d43cc538dbc80004d64c9b52 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16685)
   * 272e1bd85c19968502784da9eadc14624059ed7f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With StringBuilder

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8478:
URL: https://github.com/apache/hudi/pull/8478#issuecomment-1525618855

   
   ## CI report:
   
   * 6605f4759c25ff79eb43928cdbc97b086a905534 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16698)
   * e529b3409ce663965bd925e7389a99f323d3ef2d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #8585: [DOC]Improve documentation of org.apache.hudi.common.table.view.Abstr…

2023-04-27 Thread via GitHub


hudi-bot commented on PR #8585:
URL: https://github.com/apache/hudi/pull/8585#issuecomment-1525604467

   
   ## CI report:
   
   * ef0b65c6471448ba86899c587618e60a6377d3c8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16700)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   

