Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


beyond1920 commented on PR #10826:
URL: https://github.com/apache/hudi/pull/10826#issuecomment-1980284804

   @jonvex It seems a little heavy to run the optimizer here just to get case
insensitivity.
   Besides, if we wrap an optimize phase here, users might miss information
about the plan conversion in the Spark SQL web UI, right?
   Is there a better solution?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


beyond1920 commented on code in PR #10826:
URL: https://github.com/apache/hudi/pull/10826#discussion_r1513982842


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala:
##
@@ -95,7 +95,9 @@ object InsertIntoHoodieTableCommand extends Logging with ProvidesHoodieConfig wi
 }
 val config = buildHoodieInsertConfig(catalogTable, sparkSession, isOverWritePartition, isOverWriteTable, partitionSpec, extraOptions, staticOverwritePartitionPathOpt)
 
-val alignedQuery = alignQueryOutput(query, catalogTable, partitionSpec, sparkSession.sessionState.conf)
+val optimizer = sparkSession.sessionState.optimizer
+val optimizerPlan = optimizer.execute(query)
+val alignedQuery = alignQueryOutput(optimizerPlan, catalogTable, partitionSpec, sparkSession.sessionState.conf)

Review Comment:
   It seems a little heavy to run the optimizer just to get case insensitivity.
   Besides, if we wrap an optimize phase here, users might miss something in
the Spark SQL web UI.
   Is there a better solution?









Re: [PR] [HUDI-7457] Remove runtime shutdown hook from HoodieLogFormatWriter [hudi]

2024-03-05 Thread via GitHub


bvaradar commented on code in PR #10789:
URL: https://github.com/apache/hudi/pull/10789#discussion_r1513977038


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java:
##
@@ -62,15 +61,14 @@ public class HoodieLogFormatWriter implements HoodieLogFormat.Writer {
   Short replication,
   Long sizeThreshold,
   String rolloverLogWriteToken,
-  LogFileCreationCallback fileCreationHook) {
+  LogFileCreationCallback fileCreationCallback) {

Review Comment:
   Looking at other places such as HoodieWriteHandle#createLogWriter, we pass
the LogFormatWriter back to the caller, so it does not look like we can get
away with a try-with-resources refactoring alone.
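
The ownership point above can be sketched in Java. This is a minimal illustration under stated assumptions, not Hudi code: `DemoWriter`, `createWriter`, and `createWriterClosedTooEarly` are hypothetical stand-ins for a log writer and its factory method. It shows why try-with-resources cannot live inside a method that returns the writer: the resource is closed before the caller ever sees it.

```java
import java.util.ArrayList;
import java.util.List;

final class WriterOwnershipSketch {

  /** Hypothetical stand-in for a log writer; not Hudi's HoodieLogFormatWriter. */
  static final class DemoWriter implements AutoCloseable {
    private final List<String> blocks = new ArrayList<>();
    private boolean closed = false;

    void append(String block) {
      if (closed) {
        throw new IllegalStateException("writer already closed");
      }
      blocks.add(block);
    }

    int blockCount() {
      return blocks.size();
    }

    boolean isClosed() {
      return closed;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Broken pattern: the writer is closed before the caller receives it. */
  static DemoWriter createWriterClosedTooEarly() {
    try (DemoWriter writer = new DemoWriter()) {
      return writer; // close() still runs before this method returns
    }
  }

  /** Factory that transfers ownership: the caller becomes responsible for close(). */
  static DemoWriter createWriter() {
    return new DemoWriter();
  }

  /** Caller-side try-with-resources works because the caller owns the writer. */
  static int writeTwoBlocks() {
    try (DemoWriter writer = createWriter()) {
      writer.append("block-1");
      writer.append("block-2");
      return writer.blockCount();
    }
  }
}
```

In this sketch, cleanup only becomes reliable when every caller that receives a writer closes it, which is a wider change than refactoring the factory alone.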






Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


danny0405 commented on code in PR #10826:
URL: https://github.com/apache/hudi/pull/10826#discussion_r1513958254


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala:
##
@@ -95,7 +95,9 @@ object InsertIntoHoodieTableCommand extends Logging with ProvidesHoodieConfig wi
 }
 val config = buildHoodieInsertConfig(catalogTable, sparkSession, isOverWritePartition, isOverWriteTable, partitionSpec, extraOptions, staticOverwritePartitionPathOpt)
 
-val alignedQuery = alignQueryOutput(query, catalogTable, partitionSpec, sparkSession.sessionState.conf)
+val optimizer = sparkSession.sessionState.optimizer
+val optimizerPlan = optimizer.execute(query)
+val alignedQuery = alignQueryOutput(optimizerPlan, catalogTable, partitionSpec, sparkSession.sessionState.conf)

Review Comment:
   Yeah, we need some clarification here.






Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


danny0405 commented on code in PR #10826:
URL: https://github.com/apache/hudi/pull/10826#discussion_r1513954128


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:
##
@@ -346,12 +346,14 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
   Project(incomingDataCols, joinData)
 }
 
-val projectedJoinOutput = projectedJoinPlan.output
+val optimizer = sparkSession.sessionState.optimizer
+val projectedJoinOptimizerPlan = optimizer.execute(projectedJoinPlan)
+val projectedJoinOptimizerOutput = projectedJoinOptimizerPlan.output

Review Comment:
   So `optimizer.execute` is the critical step that fixes the case-sensitivity
issue.






Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980238002

   
   ## CI report:
   
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   * 9e94942e0c82e9ed8e41744ab0fa3033fe5c0e39 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22810)
 
   * a3ed6c818a477182fe075a8e06efe3f80353ce43 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22812)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [I] If Sanitastiion Enabled In HudiStreamer It is taking too much time [hudi]

2024-03-05 Thread via GitHub


Amar1404 commented on issue #10466:
URL: https://github.com/apache/hudi/issues/10466#issuecomment-1980235660

   @ad1happy2go - Any update on this?





Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-03-05 Thread via GitHub


yihua closed pull request #10360: [HUDI-6497] WIP HoodieStorage abstraction
URL: https://github.com/apache/hudi/pull/10360





Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-03-05 Thread via GitHub


yihua commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1980235486

   This PR is replaced by #10591 as the last main piece of the storage 
abstraction.





Re: [PR] [HUDI-7059][WIP] Read record positions with filter pushdown using Spark parquet reader [hudi]

2024-03-05 Thread via GitHub


yihua commented on PR #10030:
URL: https://github.com/apache/hudi/pull/10030#issuecomment-1980233153

   The functionality is implemented in #10167.





Re: [PR] [HUDI-7059][WIP] Read record positions with filter pushdown using Spark parquet reader [hudi]

2024-03-05 Thread via GitHub


yihua closed pull request #10030: [HUDI-7059][WIP] Read record positions with 
filter pushdown using Spark parquet reader
URL: https://github.com/apache/hudi/pull/10030





Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980229022

   
   ## CI report:
   
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   * 9e94942e0c82e9ed8e41744ab0fa3033fe5c0e39 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22810)
 
   * a3ed6c818a477182fe075a8e06efe3f80353ce43 UNKNOWN
   
   



Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980220475

   
   ## CI report:
   
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   * 9e94942e0c82e9ed8e41744ab0fa3033fe5c0e39 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22810)
 
   
   



Re: [PR] [HUDI-7457] Remove runtime shutdown hook from HoodieLogFormatWriter [hudi]

2024-03-05 Thread via GitHub


danny0405 commented on PR #10789:
URL: https://github.com/apache/hudi/pull/10789#issuecomment-1980202198

   Hi, @nsivabalan , can you help to review this PR?





Re: [PR] [HUDI-7457] Remove runtime shutdown hook from HoodieLogFormatWriter [hudi]

2024-03-05 Thread via GitHub


danny0405 commented on code in PR #10789:
URL: https://github.com/apache/hudi/pull/10789#discussion_r1513920873


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java:
##
@@ -62,15 +61,14 @@ public class HoodieLogFormatWriter implements HoodieLogFormat.Writer {
   Short replication,
   Long sizeThreshold,
   String rolloverLogWriteToken,
-  LogFileCreationCallback fileCreationHook) {
+  LogFileCreationCallback fileCreationCallback) {

Review Comment:
   Hi, @bvaradar @nbalajee can you help to confirm whether it is safe to remove 
this shutdown hook from the log format writer?
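
For context on what is being weighed here, the following is a minimal Java sketch of a resource that guards itself with a JVM shutdown hook and deregisters the hook on explicit close. It is an assumption-laden illustration, not the code in HoodieLogFormatWriter: `HookedResource` and its methods are hypothetical, and only `Runtime.addShutdownHook` and `Runtime.removeShutdownHook` are real JDK calls.

```java
final class ShutdownHookSketch {

  /** Hypothetical resource that registers a JVM shutdown hook as a safety net. */
  static final class HookedResource implements AutoCloseable {
    // The hook only runs if the JVM exits while the resource is still open.
    private final Thread hook = new Thread(this::markClosed, "hooked-resource-cleanup");
    private volatile boolean closed = false;
    private boolean hookRemoved = false;

    HookedResource() {
      Runtime.getRuntime().addShutdownHook(hook);
    }

    private void markClosed() {
      closed = true;
    }

    boolean isClosed() {
      return closed;
    }

    boolean wasHookRemoved() {
      return hookRemoved;
    }

    @Override
    public void close() {
      if (closed) {
        return; // idempotent: a second close() does nothing
      }
      markClosed();
      // Deregister so the hook thread does not stay referenced until JVM exit.
      hookRemoved = Runtime.getRuntime().removeShutdownHook(hook);
    }
  }
}
```

In this sketch, every open resource keeps a registered thread alive until it is closed. Dropping the hook removes that bookkeeping, but cleanup then depends entirely on callers closing the resource, which is the safety question raised above.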






Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980167738

   
   ## CI report:
   
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   * 6fa1f1c34c6007131603081330cc6cd878df4d75 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22809)
 
   * 9e94942e0c82e9ed8e41744ab0fa3033fe5c0e39 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22810)
 
   
   



Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


yihua commented on code in PR #10826:
URL: https://github.com/apache/hudi/pull/10826#discussion_r1513878332


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:
##
@@ -372,17 +374,23 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
 // In case when we're not adding new columns we need to make sure that the casing of the key attributes'
 // matches to that one of the target table. This is necessary b/c unlike Spark, Avro is case-sensitive
 // and therefore would fail downstream if case of corresponding columns don't match
+val partitionColumns = hoodieCatalogTable.tableConfig.getPartitionFieldProp.split(",").toSeq
 val existingAttributes = existingAttributesMap.map(_._1)
-val adjustedSourceTableOutput = projectedJoinOutput.map { attr =>
+val adjustedSourceTableOutput = projectedJoinOptimizerOutput.map { attr =>
   existingAttributes.find(keyAttr => resolver(keyAttr.name, attr.name)) match {
 // To align the casing we just rename the attribute to match that one of the
 // target table
 case Some(keyAttr) => attr.withName(keyAttr.name)
-case _ => attr
+// additional check for partition columns because they are not required,
+// but we still care about casing because of keygenerator
+case _ => partitionColumns.find(colName => resolver(colName, attr.name)) match {

Review Comment:
   Can we add a test around this case?
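
The lookup order in the hunk above (key attributes first, then partition columns, otherwise keep the source name) can be sketched in plain Java. This is an illustration under assumptions, not the PR's Scala code: `alignCasing` and `findMatch` are hypothetical names, and `String::equalsIgnoreCase` stands in for Spark's resolver when case sensitivity is disabled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

final class CasingAlignmentSketch {

  /** Returns the first candidate the resolver considers equal to name, or null. */
  private static String findMatch(List<String> candidates, String name,
                                  BiPredicate<String, String> resolver) {
    for (String candidate : candidates) {
      if (resolver.test(candidate, name)) {
        return candidate;
      }
    }
    return null;
  }

  /**
   * Renames each source column to the casing used by the target table:
   * key columns are checked first, then partition columns; unmatched
   * columns keep their original name.
   */
  static List<String> alignCasing(List<String> sourceCols, List<String> keyCols,
                                  List<String> partitionCols,
                                  BiPredicate<String, String> resolver) {
    List<String> aligned = new ArrayList<>();
    for (String col : sourceCols) {
      String match = findMatch(keyCols, col, resolver);
      if (match == null) {
        match = findMatch(partitionCols, col, resolver);
      }
      aligned.add(match != null ? match : col);
    }
    return aligned;
  }
}
```

A test for the partition-column branch could follow the same shape: feed a source column whose casing differs from the table's partition field and check that the output carries the table's casing.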



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala:
##
@@ -2448,6 +2449,50 @@ class TestInsertTable extends HoodieSparkSqlTestBase {
 })
   }
 
+  test("Test query with Foldable Propagation expression") {

Review Comment:
   Make the test name more readable






Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980160838

   
   ## CI report:
   
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   * 6fa1f1c34c6007131603081330cc6cd878df4d75 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22809)
 
   * 9e94942e0c82e9ed8e41744ab0fa3033fe5c0e39 UNKNOWN
   
   



Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


yihua commented on code in PR #10826:
URL: https://github.com/apache/hudi/pull/10826#discussion_r1513876839


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala:
##
@@ -95,7 +95,9 @@ object InsertIntoHoodieTableCommand extends Logging with ProvidesHoodieConfig wi
 }
 val config = buildHoodieInsertConfig(catalogTable, sparkSession, isOverWritePartition, isOverWriteTable, partitionSpec, extraOptions, staticOverwritePartitionPathOpt)
 
-val alignedQuery = alignQueryOutput(query, catalogTable, partitionSpec, sparkSession.sessionState.conf)
+val optimizer = sparkSession.sessionState.optimizer
+val optimizerPlan = optimizer.execute(query)
+val alignedQuery = alignQueryOutput(optimizerPlan, catalogTable, partitionSpec, sparkSession.sessionState.conf)

Review Comment:
   Is this required for case insensitivity, or is it a performance
optimization?






Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


yihua commented on code in PR #10826:
URL: https://github.com/apache/hudi/pull/10826#discussion_r1513874685


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala:
##
@@ -2448,6 +2449,50 @@ class TestInsertTable extends HoodieSparkSqlTestBase {
 })
   }
 
+  test("Test query with Foldable Propagation expression") {
+withRecordType(Seq(HoodieRecordType.AVRO))(withTempDir { tmp =>

Review Comment:
   Remove `withRecordType(Seq(HoodieRecordType.AVRO))(` as it's not required.






Re: [PR] [MINOR] Fix Azure publishing of JUnit results [hudi]

2024-03-05 Thread via GitHub


zhangyue19921010 merged PR #10817:
URL: https://github.com/apache/hudi/pull/10817





(hudi) branch master updated (f5284964e29 -> b710c07c0e8)

2024-03-05 Thread zhangyue19921010
This is an automated email from the ASF dual-hosted git repository.

zhangyue19921010 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from f5284964e29 [HUDI-7418] Create a common method for filtering in S3 and 
GCS sources and add tests for filtering out extensions (#10724)
 add b710c07c0e8 [MINOR] Fix Azure publishing of JUnit results (#10817)

No new revisions were added by this update.

Summary of changes:
 azure-pipelines-20230430.yml | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)



Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980153644

   
   ## CI report:
   
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   * 6fa1f1c34c6007131603081330cc6cd878df4d75 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22809)
 
   
   



Re: [I] [SUPPORT] - Hudi 0.12.1 - production job slowing down [hudi]

2024-03-05 Thread via GitHub


ad1happy2go commented on issue #10822:
URL: https://github.com/apache/hudi/issues/10822#issuecomment-1980135128

   @joshhamann Can you please provide the writer configuration so we can look
into this further?
   
   If you are using the upsert operation type, a load into a new Hudi table is
expected to run faster, because there is no existing dataset to join with to
identify which records need to be upserted. So for the 5 min vs 15 min
benchmark, was the Hudi table empty, or did it hold the same amount of existing
data as the old table?
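
The reasoning above can be illustrated with a toy Java model. It is a sketch under assumptions, not Hudi's index lookup: `tagRecords` is a hypothetical helper that classifies incoming record keys against the keys already stored in the table. With an empty table, every record is an insert and there is nothing to match against.

```java
import java.util.List;
import java.util.Set;

final class UpsertTaggingSketch {

  /**
   * Classifies incoming keys against the keys already in the table.
   * Returns a two-element array: {inserts, updates}.
   */
  static int[] tagRecords(Set<String> existingKeys, List<String> incomingKeys) {
    int inserts = 0;
    int updates = 0;
    for (String key : incomingKeys) {
      if (existingKeys.contains(key)) {
        updates++; // record exists: existing data must be located and rewritten
      } else {
        inserts++; // new record: it can simply be written out
      }
    }
    return new int[] {inserts, updates};
  }
}
```

In a real table the matching step is where the extra time goes, so a benchmark is only comparable when both tables hold a similar amount of existing data.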





Re: [I] [SUPPORT] Slashes in partition columns [hudi]

2024-03-05 Thread via GitHub


ad1happy2go commented on issue #10754:
URL: https://github.com/apache/hudi/issues/10754#issuecomment-1980114792

   A similar Jira was raised to fix this issue:
https://issues.apache.org/jira/browse/HUDI-7484





Re: [PR] [MINOR] Fix Azure publishing of JUnit results [hudi]

2024-03-05 Thread via GitHub


yihua commented on PR #10817:
URL: https://github.com/apache/hudi/pull/10817#issuecomment-1980109582

   @stream2000 @zhangyue19921010 @leesf I'd appreciate it if one of you could
review and land this.





(hudi) branch branch-0.x updated: [HUDI-7463] Bump Spark 3.5 version to Spark 3.5.1 (#10788)

2024-03-05 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch branch-0.x
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/branch-0.x by this push:
 new 2b4e6588079 [HUDI-7463] Bump Spark 3.5 version to Spark 3.5.1 (#10788)
2b4e6588079 is described below

commit 2b4e658807933bde0a31f5fe565bd80f11d13f31
Author: Shawn Chang <42792772+c...@users.noreply.github.com>
AuthorDate: Tue Mar 5 21:11:40 2024 -0800

[HUDI-7463] Bump Spark 3.5 version to Spark 3.5.1 (#10788)

Co-authored-by: Shawn Chang 
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 903d3a58714..9b76ec7e95d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -166,7 +166,7 @@
 3.2.3
 3.3.1
 3.4.1
-3.5.0
+3.5.1
 hudi-spark3.2.x
 

Re: [PR] [HUDI-7463][branch-0.x] Bump Spark 3.5 version to Spark 3.5.1 [hudi]

2024-03-05 Thread via GitHub


yihua merged PR #10788:
URL: https://github.com/apache/hudi/pull/10788





Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980098240

   
   ## CI report:
   
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   * b75239bfa677b1640ff49dcd705acb3db4263a69 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22806)
 
   * 3a23e19c1f719b79092bc65c7047c6c83bae657c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22808)
 
   * 6fa1f1c34c6007131603081330cc6cd878df4d75 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22809)
 
   
   



Re: [PR] updated video content [hudi]

2024-03-05 Thread via GitHub


nfarah86 commented on PR #10827:
URL: https://github.com/apache/hudi/pull/10827#issuecomment-1980084074

   https://github.com/apache/hudi/assets/5392555/598bce23-f2f5-48e5-835f-bbd262a5f1ef
   
   @bhasudha video blogs are ready





[PR] updated video content [hudi]

2024-03-05 Thread via GitHub


nfarah86 opened a new pull request, #10827:
URL: https://github.com/apache/hudi/pull/10827

   ### Change Logs
   
   updated video content
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980064748

   
   ## CI report:
   
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   * b75239bfa677b1640ff49dcd705acb3db4263a69 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22806)
 
   * 3a23e19c1f719b79092bc65c7047c6c83bae657c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22808)
 
   * 6fa1f1c34c6007131603081330cc6cd878df4d75 UNKNOWN
   
   



Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980059762

   
   ## CI report:
   
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   * b75239bfa677b1640ff49dcd705acb3db4263a69 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22806)
 
   * 3a23e19c1f719b79092bc65c7047c6c83bae657c UNKNOWN
   
   



Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10826:
URL: https://github.com/apache/hudi/pull/10826#issuecomment-1980054774

   
   ## CI report:
   
   * 0c2acd76a0d937ac926d5fdabafbfc1d66b61e2f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22807)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980054735

   
   ## CI report:
   
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   * b75239bfa677b1640ff49dcd705acb3db4263a69 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22806)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979998052

   
   ## CI report:
   
   * d8ffb6b051f147146e927f1241efd015bf758c6a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22804)
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   * b75239bfa677b1640ff49dcd705acb3db4263a69 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22806)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7529] fix multiple tasks get the lock at the same time when use… [hudi]

2024-03-05 Thread via GitHub


KnightChess closed pull request #10412: [HUDI-7529] fix multiple tasks get the 
lock at the same time when use…
URL: https://github.com/apache/hudi/pull/10412





Re: [PR] [HUDI-7529] fix multiple tasks get the lock at the same time when use… [hudi]

2024-03-05 Thread via GitHub


KnightChess commented on PR #10412:
URL: https://github.com/apache/hudi/pull/10412#issuecomment-1979984556

   close it





Re: [PR] [WIP][HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


KnightChess closed pull request #10582: [WIP][HUDI-6472] fix spark sql does not 
ignore case
URL: https://github.com/apache/hudi/pull/10582





Re: [PR] [MINOR] change hive/adb tool not auto create database default [hudi]

2024-03-05 Thread via GitHub


KnightChess closed pull request #9640: [MINOR] change hive/adb tool not auto 
create database default
URL: https://github.com/apache/hudi/pull/9640





Re: [PR] [HUDI-5956] Simple repair spark sql dag ui display problem [hudi]

2024-03-05 Thread via GitHub


KnightChess closed pull request #8233: [HUDI-5956] Simple repair spark sql dag 
ui display problem
URL: https://github.com/apache/hudi/pull/8233





Re: [PR] [WIP][HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


KnightChess commented on PR #10582:
URL: https://github.com/apache/hudi/pull/10582#issuecomment-1979982764

   Sorry for the late reply. @jonvex I will close this PR; thank you for your 
work on it.





Re: [I] [SUPPORT] Data loss due to incorrect selection of log file during compaction [hudi]

2024-03-05 Thread via GitHub


Ytimetravel commented on issue #10803:
URL: https://github.com/apache/hudi/issues/10803#issuecomment-1979974997

   Thank you very much for following up on the issue and providing feedback. I 
will use the tool you provided to obtain some meta info about our log blocks 
and records, and will get back to you later~





Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10826:
URL: https://github.com/apache/hudi/pull/10826#issuecomment-1979962655

   
   ## CI report:
   
   * 0c2acd76a0d937ac926d5fdabafbfc1d66b61e2f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22807)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979962618

   
   ## CI report:
   
   * d8ffb6b051f147146e927f1241efd015bf758c6a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22804)
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   * b75239bfa677b1640ff49dcd705acb3db4263a69 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-2706] refactor spark-sql to make consistent with DataFrame api [hudi]

2024-03-05 Thread via GitHub


boneanxs commented on code in PR #3936:
URL: https://github.com/apache/hudi/pull/3936#discussion_r1513731553


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/DeleteHoodieTableCommand.scala:
##
@@ -56,32 +57,36 @@ case class DeleteHoodieTableCommand(deleteTable: DeleteFromTable) extends Runnab
   }
 
   private def buildHoodieConfig(sparkSession: SparkSession): Map[String, String] = {
-val targetTable = sparkSession.sessionState.catalog
-  .getTableMetadata(tableId)
+val targetTable = sparkSession.sessionState.catalog.getTableMetadata(tableId)
+val tblProperties = targetTable.storage.properties ++ targetTable.properties
 val path = getTableLocation(targetTable, sparkSession)
 val conf = sparkSession.sessionState.newHadoopConf()
 val metaClient = HoodieTableMetaClient.builder()
   .setBasePath(path)
   .setConf(conf)
   .build()
 val tableConfig = metaClient.getTableConfig
-val primaryColumns = HoodieOptionConfig.getPrimaryColumns(targetTable.storage.properties)

Review Comment:
   As far as I know, Hudi currently uses `spark.sql.caseSensitive` to decide 
whether names are resolved case-sensitively during the analysis stage, and it 
defaults to false, so it seems reasonable to respect that configuration here as well?
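
To make the suggestion above concrete, here is a minimal sketch of name resolution that honours a case-sensitivity flag. It is plain Java with no Spark dependency; `ColumnResolver` and its methods are hypothetical names used only for illustration and are not part of Hudi or Spark. In real code the flag would be read from the session's `spark.sql.caseSensitive` setting.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not a Hudi or Spark class): resolves a requested
// column name against the table schema, honouring a case-sensitivity flag.
final class ColumnResolver {
    private final boolean caseSensitive;

    ColumnResolver(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // Compare two names according to the configured mode.
    boolean matches(String left, String right) {
        return caseSensitive ? left.equals(right) : left.equalsIgnoreCase(right);
    }

    // Return the schema's own spelling of the column, if it resolves.
    Optional<String> resolve(String requested, List<String> schemaColumns) {
        return schemaColumns.stream()
                .filter(column -> matches(column, requested))
                .findFirst();
    }
}
```

With the flag set to false (the default mentioned above), `resolve("ID", ...)` against a schema containing `id` returns `id`; with the flag set to true it returns an empty result.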






Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10826:
URL: https://github.com/apache/hudi/pull/10826#issuecomment-1979957408

   
   ## CI report:
   
   * 0c2acd76a0d937ac926d5fdabafbfc1d66b61e2f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979957377

   
   ## CI report:
   
   * d8ffb6b051f147146e927f1241efd015bf758c6a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22804)
   * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979951480

   
   ## CI report:
   
   * d8ffb6b051f147146e927f1241efd015bf758c6a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22804)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Resolved] (HUDI-7475) Disable ITs in hudi-aws module

2024-03-05 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov resolved HUDI-7475.
-

> Disable ITs in hudi-aws module
> --
>
> Key: HUDI-7475
> URL: https://issues.apache.org/jira/browse/HUDI-7475
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
>
> The tests do not work.  Disabling them to unblock Azure CI.
> {code:java}
> [ERROR] Errors: 
> [ERROR]   ITTestGluePartitionPushdown.setUp:96 » Execution 
> software.amazon.awssdk.core.e...
> [ERROR]   ITTestGluePartitionPushdown.setUp:96 » Execution 
> software.amazon.awssdk.core.e...
> [ERROR]   ITTestGluePartitionPushdown.setUp:96 » Execution 
> software.amazon.awssdk.core.e...
> [ERROR]   
> ITTestDynamoDBBasedLockProvider.setup:66->getDynamoClientWithLocalEndpoint:110
>  IllegalState
> [INFO] 
> [ERROR] Tests run: 9, Failures: 0, Errors: 4, Skipped: 0
> 2024-03-04T04:55:22.6893321Z [ERROR] 
> org.apache.hudi.aws.transaction.integ.ITTestDynamoDBBasedLockProvider  Time 
> elapsed: 0.019 s  <<< ERROR!
> 2024-03-04T04:55:22.6893739Z java.lang.IllegalStateException: 
> dynamodb-local.endpoint system property not set
> 2024-03-04T04:55:22.6894356Z  at 
> org.apache.hudi.aws.transaction.integ.ITTestDynamoDBBasedLockProvider.getDynamoClientWithLocalEndpoint(ITTestDynamoDBBasedLockProvider.java:110)
> 2024-03-04T04:55:22.6894867Z  at 
> org.apache.hudi.aws.transaction.integ.ITTestDynamoDBBasedLockProvider.setup(ITTestDynamoDBBasedLockProvider.java:66)
> 2024-03-04T04:55:22.6895225Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2024-03-04T04:55:22.6895711Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2024-03-04T04:55:22.6896080Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2024-03-04T04:55:22.6896418Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2024-03-04T04:55:22.6896755Z  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
> 2024-03-04T04:55:22.6897322Z  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2024-03-04T04:55:22.6897911Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2024-03-04T04:55:22.6971261Z  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2024-03-04T04:55:22.6971737Z  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:126)
> 2024-03-04T04:55:22.6972156Z  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptBeforeAllMethod(TimeoutExtension.java:68)
> 2024-03-04T04:55:22.6972608Z  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2024-03-04T04:55:22.6973048Z  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2024-03-04T04:55:22.6973483Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2024-03-04T04:55:22.6974121Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2024-03-04T04:55:22.6974562Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2024-03-04T04:55:22.6975257Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2024-03-04T04:55:22.6975649Z  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2024-03-04T04:55:22.6976025Z  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2024-03-04T04:55:22.6976454Z  at 
> org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllMethods$9(ClassBasedTestDescriptor.java:384)
> 2024-03-04T04:55:22.6976901Z  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> 2024-03-04T04:55:22.6977341Z  at 
> org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.invokeBeforeAllMethods(ClassBasedTestDescriptor.java:382)
> 2024-03-04T04:55:22.6977781Z  at 
> org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:196)
> 2024-03-04T04:55:22.6978194Z  at 
> org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:78)
> 2024-03-04T04:55:22.6978624Z  at 
> org.junit.platform.engine.support.hierarchical.NodeTest
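
The trace above shows the test setup failing with an IllegalStateException because the `dynamodb-local.endpoint` system property is not set. A minimal sketch of that kind of guard follows, assuming the check is a plain system-property lookup; `LocalEndpoint` and `require` are hypothetical names, and this is not the actual ITTestDynamoDBBasedLockProvider code.

```java
// Hypothetical guard (not the Hudi test code): fail fast when the
// system property that points at a local DynamoDB endpoint is missing.
final class LocalEndpoint {
    static final String PROPERTY = "dynamodb-local.endpoint";

    static String require() {
        String endpoint = System.getProperty(PROPERTY);
        if (endpoint == null || endpoint.isEmpty()) {
            // Same message as the one reported in the CI log.
            throw new IllegalStateException(PROPERTY + " system property not set");
        }
        return endpoint;
    }
}
```

A test guarded this way only passes when the JVM is started with the property, for example `-Ddynamodb-local.endpoint=http://localhost:8000` (the address is an example value), which would explain why it errors on a CI job that does not provide one.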

[jira] [Created] (HUDI-7484) Fix partitioning style when partition is inferred from partitionBy

2024-03-05 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-7484:
-

 Summary: Fix partitioning style when partition is inferred from 
partitionBy
 Key: HUDI-7484
 URL: https://issues.apache.org/jira/browse/HUDI-7484
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit
 Fix For: 1.0.0


When inferring the partition from partitionBy() arguments with hive-style 
partitioning enabled, we observe that the partitioning style is not 
uniform across the levels of a multi-level partition. The directory structure is as follows:
partition=2015
                       |- 03
                             |- 15
                             |- 16
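
For comparison, a uniformly hive-styled layout would use a `column=value` directory name at every level, not only the first. The sketch below builds such a path; it is only an illustration, and the column names `year`, `month` and `day` as well as the `HiveStylePath` class are assumptions, since the issue does not name the partition columns.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not Hudi code): builds a hive-style partition path
// in which every level is rendered as column=value.
final class HiveStylePath {
    // The map must preserve insertion order (e.g. LinkedHashMap) so that
    // the partition levels come out in the intended order.
    static String build(Map<String, String> partitionValues) {
        return partitionValues.entrySet().stream()
                .map(entry -> entry.getKey() + "=" + entry.getValue())
                .collect(Collectors.joining("/"));
    }
}
```

Given year=2015, month=03 and day=15 in that order, `build` returns `year=2015/month=03/day=15`, whereas the structure reported above applies the `column=value` form only to the top-level directory.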



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [I] [SUPPORT] Dataloss in FlinkCDC into Hudi without any exception or other infomation [hudi]

2024-03-05 Thread via GitHub


xuzifu666 commented on issue #10542:
URL: https://github.com/apache/hudi/issues/10542#issuecomment-1979941864

   > Hey @xuzifu666: do you happen to still have the old data that suffered the 
data loss intact? We would like to find the root cause. The 0.x release line will 
be used by a lot of OSS users, so we really want to get to the bottom of this and fix it.
   > 
   > We would greatly appreciate it if you can help us triage this.
   > 
   > * Do you happen to know exactly when the data loss happens? Do you see 
anything interesting in the timeline around that time?
   > * Is it a single writer or multi-writer?
   > * We do have some suspicion around log record reading that we are chasing. 
Ref ticket: [[SUPPORT] Data loss due to incorrect selection of log file during 
compaction #10803](https://github.com/apache/hudi/issues/10803). But I do not 
want to bias this one; let's get more info about when exactly the data loss is seen.
   > * Are there any task retries in general? I am not familiar with Flink, but 
in Spark we might have task retries. Is anything like that happening 
in your pipeline?
   > * Is it happening occasionally across all pipelines, or only in very few pipelines? 
If it is only a few, are there any common characteristics (index type, 
metadata enabled, etc.) compared to the other pipelines that do not have the 
data loss issue?
   > * Can you confirm that these pipelines were running without any issues on 
older versions of Hudi?
   > * Are you able to reproduce this in a deterministic manner?
   
   Hi @nsivabalan, thanks for your attention. Answering your questions in 
order:
   1. Judging from the timestamps of all the lost records, the loss happens around 
the time a Flink job checkpoint finishes, but the job state is OK and there is no 
exception in the timeline. Because of this, it is hard to pin down the root cause.
   2. In our case, the data loss happened in a single-writer job.
   3. I read https://github.com/apache/hudi/issues/10803 recently, but that 
issue occurs in a compaction scenario. We tested all of these scenarios: a. 
Flink job with online compaction; b. Flink job without compaction; c. Flink job 
with compaction done by Spark. Data loss could happen in all of these 
scenarios.
   4. The job is stable the whole time, without any exception, and there were no 
retries while it was running.
   5. There are about 4 or 5 pipelines; we did not use MDT, the table type is 
MOR, and the index type is bucket.
   6. We use Hudi version 0.14.0.
   7. Since now we had get a deterministic manner to reproduce it because job 
state is very well and timeline state is OK.
   If you have any other questions, feel free to ask anytime.
   





[jira] [Updated] (HUDI-7480) initializeFunctionalIndexPartition is called multiple times

2024-03-05 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7480:
--
Fix Version/s: 1.0.0

> initializeFunctionalIndexPartition is called multiple times
> ---
>
> Key: HUDI-7480
> URL: https://issues.apache.org/jira/browse/HUDI-7480
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Vinaykumar Bhat
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>
> This is due to an issue in 
> initializeFromFilesystem(), which checks whether an MDT partition needs to be 
> initialized based on the absence of its partition-type. But for a functional index, 
> the partition-type actually stores only the prefix (func_index_), hence the check 
> always fails and we try to re-initialize the same functional index partition again.
>  
> Simple test:
> {quote}spark.sql(
> s"""
> |create table $tableName (
> | id int,
> | name string,
> | price double,
> | ts long
> |) using hudi
> | options (
> | primaryKey ='id',
> | type = '$tableType',
> | preCombineField = 'ts',
> | hoodie.metadata.record.index.enable = 'true',
> | hoodie.datasource.write.recordkey.field = 'id'
> | )
> | partitioned by(ts)
> | location '$basePath'
> """.stripMargin)
> spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
> spark.sql(s"insert into $tableName values(2, 'a2', 10, 1001)")
> spark.sql(s"insert into $tableName values(3, 'a3', 10, 1002)")
>  
> var createIndexSql = s"create index idx_datestr on $tableName using 
> column_stats(ts) options(func='from_unixtime', format='-MM-dd')"
> spark.sql(createIndexSql)
>  
> -- This insert throws null-pointer exception
> spark.sql(s"insert into $tableName values(4, 'a4', 10, 1004)"){quote}
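
The check described in the HUDI-7480 report can be illustrated with a small self-contained sketch. It is not the Hudi implementation: `PartitionInitCheck`, both methods, and the concrete partition name `func_index_idx_datestr` are assumptions made for illustration (only the `func_index_` prefix and the index name `idx_datestr` come from the report).

```java
import java.util.Set;

// Hypothetical model (not Hudi code) of an "is this metadata partition
// already initialized?" check against a set of initialized partition names.
final class PartitionInitCheck {
    static final String FUNCTIONAL_INDEX_PREFIX = "func_index_";

    // Looks up the partition type's own path. For a functional index that
    // path is only the prefix, so it never equals a concrete partition name
    // and the partition always appears to need initialization.
    static boolean needsInitByTypePath(Set<String> initialized, String typePath) {
        return !initialized.contains(typePath);
    }

    // Looks up the concrete partition name (prefix + index name) instead.
    static boolean needsInitByPartitionName(Set<String> initialized, String indexName) {
        return !initialized.contains(FUNCTIONAL_INDEX_PREFIX + indexName);
    }
}
```

With `func_index_idx_datestr` already in the initialized set, the first check still reports that initialization is needed, which matches the repeated initialization described in the report; the second check reports that it is not. The second method is only one possible way to express the intended behaviour, not necessarily how the issue was fixed.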





Re: [PR] [WIP][HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


jonvex commented on PR #10582:
URL: https://github.com/apache/hudi/pull/10582#issuecomment-1979936406

   made some changes to this pr and put them into a new one 
https://github.com/apache/hudi/pull/10826. @danny0405 how should we proceed? 





Re: [I] [SUPPORT] Schema file too large and keeps growing, OOM when http handle it [hudi]

2024-03-05 Thread via GitHub


lei-su-awx commented on issue #10816:
URL: https://github.com/apache/hudi/issues/10816#issuecomment-1979934695

   @ad1happy2go got it, thanks for your reply.





[PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

2024-03-05 Thread via GitHub


jonvex opened a new pull request, #10826:
URL: https://github.com/apache/hudi/pull/10826

   ### Change Logs
   
   https://github.com/apache/hudi/pull/10582
   with the following changes:
   
   - HoodieSpark32PlusAnalysis: made this change much less complex
   - correct capitalization of partition column names for keygen
   
   ### Impact
   
   allow merge into to ignore case
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Commented] (HUDI-60) [UMBRELLA] Support Apache Beam / Hudi IO

2024-03-05 Thread xy (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823826#comment-17823826
 ] 

xy commented on HUDI-60:


:) I agree that for the HudiIO work, table services should be disabled on the writer, 
letting users run table services asynchronously offline. Decoupling the logic is a good fit.

> [UMBRELLA] Support Apache Beam / Hudi IO
> 
>
> Key: HUDI-60
> URL: https://issues.apache.org/jira/browse/HUDI-60
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: spark, Utilities
>Reporter: Vinoth Chandar
>Assignee: xy
>Priority: Major
>  Labels: gsoc, gsoc2021, hudi-umbrellas, mentor
> Fix For: 0.15.0
>
>
> We would like to add a HudiIO for Beam, along the lines of 
> [https://github.com/apache/beam/blob/master/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java]
>  
> For the initial cut, we can leave the table services turned off on the 
> writer and advise users to run them independently?
> During this work, we can also look into anything that needs to be fixed in the 
> java-client module, which works with GenericRecords as well (used by the 
> Kafka Connect Sink). So if that's in shape, this can be much easier.





Re: [PR] [HUDI-7418] Create a common method for filtering in S3 and GCS sources and add tests for filtering out extensions [hudi]

2024-03-05 Thread via GitHub


yihua merged PR #10724:
URL: https://github.com/apache/hudi/pull/10724





(hudi) branch master updated (31613745168 -> f5284964e29)

2024-03-05 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 31613745168 [HUDI-7475] Disable ITs in hudi-aws module (#10821)
 add f5284964e29 [HUDI-7418] Create a common method for filtering in S3 and 
GCS sources and add tests for filtering out extensions (#10724)

No new revisions were added by this update.

Summary of changes:
 .../hudi/utilities/config/CloudSourceConfig.java   |  4 +-
 .../config/S3EventsHoodieIncrSourceConfig.java |  6 ++
 .../sources/GcsEventsHoodieIncrSource.java |  8 +--
 .../sources/S3EventsHoodieIncrSource.java  | 50 +++-
 .../helpers/CloudObjectsSelectorCommon.java| 68 ++
 .../helpers/gcs/GcsObjectMetadataFetcher.java  | 39 +
 .../sources/TestGcsEventsHoodieIncrSource.java | 42 +
 .../sources/TestS3EventsHoodieIncrSource.java  |  6 +-
 8 files changed, 124 insertions(+), 99 deletions(-)



Re: [PR] initial commit for hudi blogs [hudi]

2024-03-05 Thread via GitHub


nfarah86 commented on PR #10719:
URL: https://github.com/apache/hudi/pull/10719#issuecomment-1979916357

   should be good @bhasudha 





Re: [PR] initial commit for hudi blogs [hudi]

2024-03-05 Thread via GitHub


nfarah86 commented on code in PR #10719:
URL: https://github.com/apache/hudi/pull/10719#discussion_r1513695794


##
website/blog/2024-01-18-Deleting-Items-from-Apache-Hudi-using-Delta-Streamer-in-UPSERT-Mode-with-Kafka-Avro-Messages.mdx:
##
@@ -0,0 +1,23 @@
+---
+title: "Deleting Items from Apache Hudi using Delta Streamer in UPSERT Mode 
with Kafka Avro Messages"
+excerpt: "Deleting Items from Apache Hudi using Delta Streamer in UPSERT Mode 
with Kafka Avro Messages"
+author: Soumil Shah
+category: blog
+image: 
/assets/images/blog/2024-01-18-Deleting-Items-from-Apache-Hudi-using-Delta-Streamer-in-UPSERT-Mode-with-Kafka-Avro-Messages.png
+tags:
+- blog
+- apache hudi
+- linkedin
+- beginner
+- hudi streamer
+- deltastreamer
+- apache kafka
+- apache avro
+- upsert

Review Comment:
   added the singular version






Re: [I] [SUPPORT] Dataloss in FlinkCDC into Hudi without any exception or other infomation [hudi]

2024-03-05 Thread via GitHub


nsivabalan commented on issue #10542:
URL: https://github.com/apache/hudi/issues/10542#issuecomment-1979914443

   Hey @xuzifu666, 
   do you happen to still have the old data that suffered the data loss intact? We would 
like to find the root cause. The 0.x release line will be used by a lot of OSS users, so 
we really want to get to the bottom of this and fix it. 
   
   We would greatly appreciate it if you can help us triage this. 
   
   - Do you happen to know exactly when the data loss happens? Do you see 
anything interesting in the timeline around that time? 
   - Is it a single writer or multi-writer? 
   - We do have some suspicion around log record reading that we are chasing. 
Ref ticket: https://github.com/apache/hudi/issues/10803 But I do not want to 
bias this one; let's get more info about when exactly the data loss is seen. 
   - Are there any task retries in general? I am not familiar with Flink, but in 
Spark we might have task retries. Is anything like that happening in 
your pipeline? 
   - Is it happening occasionally across all pipelines, or only in very few pipelines? 
If it is only a few, are there any common characteristics (index type, 
metadata enabled, etc.) compared to the other pipelines that do not have the 
data loss issue? 
   - Can you confirm that these pipelines were running without any issues on 
older versions of Hudi? 
   - Are you able to reproduce this in a deterministic manner? 





Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979913265

   
   ## CI report:
   
   * 251ba740bd933494dafdfcd6be5393400c10bd0f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22803)
   * d8ffb6b051f147146e927f1241efd015bf758c6a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22804)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] initial commit for hudi blogs [hudi]

2024-03-05 Thread via GitHub


nfarah86 commented on code in PR #10719:
URL: https://github.com/apache/hudi/pull/10719#discussion_r1513693521


##
website/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.mdx:
##
@@ -0,0 +1,16 @@
+---
+title: "How a POC became a production-ready Hudi data lakehouse through close 
team collaboration"
+excerpt: "How a POC became a production-ready Hudi data lakehouse through 
close team collaboration"
+author: leboncoin tech blog
+category: blog
+image: 
/assets/images/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.png
+tags:
+- blog
+- apache hudi
+- medium

Review Comment:
   I added the singular version.






Re: [PR] initial commit for hudi blogs [hudi]

2024-03-05 Thread via GitHub


nfarah86 commented on code in PR #10719:
URL: https://github.com/apache/hudi/pull/10719#discussion_r1513692010


##
website/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.mdx:
##
@@ -0,0 +1,16 @@
+---
+title: "How a POC became a production-ready Hudi data lakehouse through close 
team collaboration"
+excerpt: "How a POC became a production-ready Hudi data lakehouse through 
close team collaboration"
+author: leboncoin tech blog
+category: blog
+image: 
/assets/images/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.png
+tags:
+- blog
+- apache hudi
+- medium

Review Comment:
   I found the blog on Medium:
https://medium.com/leboncoin-tech-blog/how-a-poc-became-a-production-ready-hudi-data-lakehouse-through-close-team-collaboration-c7f33eb746a8






Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979907408

   
   ## CI report:
   
   * 251ba740bd933494dafdfcd6be5393400c10bd0f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22803)
   * d8ffb6b051f147146e927f1241efd015bf758c6a UNKNOWN
   
   
   





Re: [PR] [HUDI-7418] Create a common method for filtering in S3 and GCS sources and add tests for filtering out extensions [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10724:
URL: https://github.com/apache/hudi/pull/10724#issuecomment-1979907220

   
   ## CI report:
   
   * ead7905ee6e86f7ad3f3ad63f954592fde08502b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22797)
   
   
   





Re: [I] [SUPPORT] Data loss due to incorrect selection of log file during compaction [hudi]

2024-03-05 Thread via GitHub


nsivabalan commented on issue #10803:
URL: https://github.com/apache/hudi/issues/10803#issuecomment-1979906166

   Hey, I wrote a tool that can print some meta info about the log
blocks and records. Here is the branch:
   https://github.com/nsivabalan/hudi/tree/printAllVersionsOfRecordTool
   
   Can you help us by running the tool and sharing the output?
   
   It's a spark-submit command. It logs some info about the log files
we are interested in.
   
   Sample command:
   ```
   ./bin/spark-submit \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
     --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
     --class org.apache.hudi.utilities.PrintRecordsTool \
     PATH_TO_BUNDLE/hudi-utilities-bundle_2.12-0.15.0-SNAPSHOT.jar \
     --props /tmp/props.in \
     --base-path /tmp/hudi_trips_mor/ \
     --partition-path asia/india/chennai \
     --file-id c3ef010f-61ae-4aa3-a033-25b278da17c6-0 \
     --base-instant-time 20240302002723362 \
     --print-log-blocks-info
   ```
   
   ```
   cat /tmp/props.in 
   hoodie.datasource.write.recordkey.field=uuid
   hoodie.datasource.write.partitionpath.field=partitionpath
   hoodie.datasource.write.precombine.field=ts
   ```
   
   Make sure you set the right values for the partition path, file ID, and base
instant time.
   This should help with our triaging.
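
If several file groups need to be inspected, the sample command can be assembled programmatically. The following is only a sketch: the helper name `build_print_records_command` is hypothetical, and the flags are copied from the sample command above rather than from any documented interface of the tool.

```python
import shlex


def build_print_records_command(bundle_jar, props_file, base_path,
                                partition_path, file_id, base_instant_time):
    """Assemble the spark-submit argv for one file group.

    Hypothetical helper: every flag below is taken verbatim from the
    sample command in this thread, nothing else is assumed about the tool.
    """
    return [
        "./bin/spark-submit",
        "--conf", "spark.serializer=org.apache.spark.serializer.KryoSerializer",
        "--conf", "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
        "--class", "org.apache.hudi.utilities.PrintRecordsTool",
        bundle_jar,
        "--props", props_file,
        "--base-path", base_path,
        "--partition-path", partition_path,
        "--file-id", file_id,
        "--base-instant-time", base_instant_time,
        "--print-log-blocks-info",
    ]


cmd = build_print_records_command(
    bundle_jar="PATH_TO_BUNDLE/hudi-utilities-bundle_2.12-0.15.0-SNAPSHOT.jar",
    props_file="/tmp/props.in",
    base_path="/tmp/hudi_trips_mor/",
    partition_path="asia/india/chennai",
    file_id="c3ef010f-61ae-4aa3-a033-25b278da17c6-0",
    base_instant_time="20240302002723362",
)

# Print a copy-pasteable, shell-quoted command line.
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so values containing spaces do not need manual quoting when passed to `subprocess.run`.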





Re: [PR] initial commit for blogs [hudi]

2024-03-05 Thread via GitHub


nfarah86 commented on PR #10825:
URL: https://github.com/apache/hudi/pull/10825#issuecomment-1979904950

   @bhasudha 
   
   https://github.com/apache/hudi/assets/5392555/0e6b0c32-be3b-43ca-855e-e44a8aa2405d
   





[PR] initial commit for blogs [hudi]

2024-03-05 Thread via GitHub


nfarah86 opened a new pull request, #10825:
URL: https://github.com/apache/hudi/pull/10825

   ### Change Logs
   
   updated blog
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   low
   ### Documentation Update
   
   updated blogs
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] initial commit for hudi blogs [hudi]

2024-03-05 Thread via GitHub


bhasudha commented on code in PR #10719:
URL: https://github.com/apache/hudi/pull/10719#discussion_r1513686982


##
website/blog/2024-01-17-Enforce-fine-grained-access-control-on-Open-Table-Formats-via-Amazon-EMR-integrated-with-AWS-Lake-Formation.mdx:
##
@@ -0,0 +1,23 @@
+---
+title: "Enforce fine-grained access control on Open Table Formats via Amazon 
EMR integrated with AWS Lake Formation"
+excerpt: "Enforce fine-grained access control on Open Table Formats via Amazon 
EMR integrated with AWS Lake Formation"
+author: Raymond Lai, Aditya Shah, Bin Wang, and Melody Yang
+category: blog
+image: 
/assets/images/blog/2024-01-17-Enforce-fine-grained-access-control-on-Open-Table-Formats-via-Amazon-EMR-integrated-with-AWS-Lake-Formation.png
+tags:
+- blog
+- apache hudi
+- aws
+- intermediate
+- amazon emr
+- aws lake formation
+- aws glue
+- aws s3
+- amazon sagemaker
+- aws cloud9
+- amazon athena

Review Comment:
   Can we add `access control` as well?






Re: [PR] [HUDI-7418] Create a common method for filtering in S3 and GCS sources and add tests for filtering out extensions [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10724:
URL: https://github.com/apache/hudi/pull/10724#issuecomment-1979901293

   
   ## CI report:
   
   * ead7905ee6e86f7ad3f3ad63f954592fde08502b UNKNOWN
   
   
   





Re: [PR] initial commit for hudi blogs [hudi]

2024-03-05 Thread via GitHub


bhasudha commented on code in PR #10719:
URL: https://github.com/apache/hudi/pull/10719#discussion_r1513685994


##
website/blog/2024-01-18-Deleting-Items-from-Apache-Hudi-using-Delta-Streamer-in-UPSERT-Mode-with-Kafka-Avro-Messages.mdx:
##
@@ -0,0 +1,23 @@
+---
+title: "Deleting Items from Apache Hudi using Delta Streamer in UPSERT Mode 
with Kafka Avro Messages"
+excerpt: "Deleting Items from Apache Hudi using Delta Streamer in UPSERT Mode 
with Kafka Avro Messages"
+author: Soumil Shah
+category: blog
+image: 
/assets/images/blog/2024-01-18-Deleting-Items-from-Apache-Hudi-using-Delta-Streamer-in-UPSERT-Mode-with-Kafka-Avro-Messages.png
+tags:
+- blog
+- apache hudi
+- linkedin
+- beginner
+- hudi streamer
+- deltastreamer
+- apache kafka
+- apache avro
+- upsert

Review Comment:
   add `deletes`






Re: [PR] initial commit for hudi blogs [hudi]

2024-03-05 Thread via GitHub


bhasudha commented on code in PR #10719:
URL: https://github.com/apache/hudi/pull/10719#discussion_r1513683042


##
website/blog/2024-01-20-Data-Engineering-Bootstrapping-Data-lake-with-Apache-Hudi.mdx:
##
@@ -0,0 +1,20 @@
+---
+title: "Data Engineering: Bootstrapping Data lake with Apache Hudi"
+excerpt: "Data Engineering: Bootstrapping Data lake with Apache Hudi"
+author: Krishna Prasad
+category: blog
+image: 
/assets/images/blog/2024-01-20-Data-Engineering-Bootstrapping-Data-lake-with-Apache-Hudi.png
+tags:
+- blog
+- apache hudi
+- medium
+- intermediate

Review Comment:
   this seems beginner level.






Re: [PR] [HUDI-2706] refactor spark-sql to make consistent with DataFrame api [hudi]

2024-03-05 Thread via GitHub


danny0405 commented on code in PR #3936:
URL: https://github.com/apache/hudi/pull/3936#discussion_r1513682928


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/DeleteHoodieTableCommand.scala:
##
@@ -56,32 +57,36 @@ case class DeleteHoodieTableCommand(deleteTable: 
DeleteFromTable) extends Runnab
   }
 
   private def buildHoodieConfig(sparkSession: SparkSession): Map[String, 
String] = {
-val targetTable = sparkSession.sessionState.catalog
-  .getTableMetadata(tableId)
+val targetTable = 
sparkSession.sessionState.catalog.getTableMetadata(tableId)
+val tblProperties = targetTable.storage.properties ++ 
targetTable.properties
 val path = getTableLocation(targetTable, sparkSession)
 val conf = sparkSession.sessionState.newHadoopConf()
 val metaClient = HoodieTableMetaClient.builder()
   .setBasePath(path)
   .setConf(conf)
   .build()
 val tableConfig = metaClient.getTableConfig
-val primaryColumns = 
HoodieOptionConfig.getPrimaryColumns(targetTable.storage.properties)

Review Comment:
   cc @boneanxs for taking a look if you have time~






Re: [PR] initial commit for hudi blogs [hudi]

2024-03-05 Thread via GitHub


bhasudha commented on code in PR #10719:
URL: https://github.com/apache/hudi/pull/10719#discussion_r1513680939


##
website/blog/2024-01-30-Leverage-Partition-Paths-of-your-data-lake-tables-to-Optimize-Data-Retrieval-Costs-on-the-cloud.mdx:
##
@@ -0,0 +1,19 @@
+---
+title: "Leverage Partition Paths of your data lake tables to Optimize Data 
Retrieval Costs on the cloud"
+excerpt: "Leverage Partition Paths of your data lake tables to Optimize Data 
Retrieval Costs on the cloud"
+author: Krishna Prasad
+category: blog
+image: 
/assets/images/blog/2024-01-30-Leverage-Partition-Paths-of-your-data-lake-tables-to-Optimize-Data-Retrieval-Costs-on-the-cloud.png
+tags:
+- blog
+- apache hudi
+- medium
+- intermediate
+- aws glue

Review Comment:
   add `partition` tag?






Re: [PR] [HUDI-2706] refactor spark-sql to make consistent with DataFrame api [hudi]

2024-03-05 Thread via GitHub


jonvex commented on code in PR #3936:
URL: https://github.com/apache/hudi/pull/3936#discussion_r1513680318


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/DeleteHoodieTableCommand.scala:
##
@@ -56,32 +57,36 @@ case class DeleteHoodieTableCommand(deleteTable: 
DeleteFromTable) extends Runnab
   }
 
   private def buildHoodieConfig(sparkSession: SparkSession): Map[String, 
String] = {
-val targetTable = sparkSession.sessionState.catalog
-  .getTableMetadata(tableId)
+val targetTable = 
sparkSession.sessionState.catalog.getTableMetadata(tableId)
+val tblProperties = targetTable.storage.properties ++ 
targetTable.properties
 val path = getTableLocation(targetTable, sparkSession)
 val conf = sparkSession.sessionState.newHadoopConf()
 val metaClient = HoodieTableMetaClient.builder()
   .setBasePath(path)
   .setConf(conf)
   .build()
 val tableConfig = metaClient.getTableConfig
-val primaryColumns = 
HoodieOptionConfig.getPrimaryColumns(targetTable.storage.properties)

Review Comment:
   @YannByron @xushiyan @danny0405 @leesf: Do we have context on why the
case sensitivity was changed here?
   It looks like case sensitivity is currently broken with spark-sql Merge Into.
   We are working towards a fix, but want to make sure we don't
unintentionally break something else in case this piece of code was written
this way on purpose.
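
To make the concern concrete, here is a toy sketch of case-insensitive column resolution. It is not the Hudi or Spark implementation: `resolve_columns` and its `case_sensitive` flag are hypothetical names, loosely modeled on how Spark SQL resolves identifiers when `spark.sql.caseSensitive` is left at its default of `false`.

```python
def resolve_columns(requested, schema_fields, case_sensitive=False):
    """Map user-supplied column names onto the exact names in the schema.

    Hypothetical helper for illustration only. With case_sensitive=False,
    names are compared after lower-casing; the returned names always use
    the casing stored in the schema.
    """
    fold = (lambda name: name) if case_sensitive else str.lower

    lookup = {}
    for field in schema_fields:
        key = fold(field)
        if key in lookup:
            # Two schema fields collapse to the same key, e.g. "id" and "ID".
            raise ValueError(f"ambiguous column {field!r} in schema")
        lookup[key] = field

    resolved = []
    for name in requested:
        match = lookup.get(fold(name))
        if match is None:
            raise KeyError(f"column {name!r} not found in {schema_fields}")
        resolved.append(match)
    return resolved


schema = ["id", "name", "ts"]

# Case-insensitive: "ID" and "Ts" resolve to the schema's own spelling.
primary_keys = resolve_columns(["ID", "Ts"], schema)

# Case-sensitive: the same request fails because "ID" != "id".
try:
    resolve_columns(["ID"], schema, case_sensitive=True)
    strict_lookup_failed = False
except KeyError:
    strict_lookup_failed = True
```

A comparison like the strict branch is the kind of behavior that would make a `MERGE INTO` statement fail when the SQL text and the table properties disagree only in letter case.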
   






Re: [PR] [HUDI-7138] Fix the conversion of typed properties to map in scala [hudi]

2024-03-05 Thread via GitHub


rmahindra123 closed pull request #10416: [HUDI-7138] Fix the conversion of 
typed properties to map in scala
URL: https://github.com/apache/hudi/pull/10416





Re: [PR] [HUDI-7478] Fix max delta commits guard check w/ MDT [hudi]

2024-03-05 Thread via GitHub


danny0405 commented on code in PR #10820:
URL: https://github.com/apache/hudi/pull/10820#discussion_r1513671566


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -830,7 +830,7 @@ private static void 
deletePendingIndexingInstant(HoodieTableMetaClient metaClien
   protected static void checkNumDeltaCommits(HoodieTableMetaClient metaClient, 
int maxNumDeltaCommitsWhenPending) {
 final HoodieActiveTimeline activeTimeline = 
metaClient.reloadActiveTimeline();
 Option<HoodieInstant> lastCompaction = 
activeTimeline.filterCompletedInstants()
-.filter(s -> s.getAction().equals(COMPACTION_ACTION)).lastInstant();
+.filter(s -> s.getAction().equals(COMMIT_ACTION)).lastInstant();

Review Comment:
   I'm wondering whether we should use the `COMPACTION_ACTION` for committed 
instants in release 1.0.x, cc @vinothchandar ~
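
The effect of the one-line change in the diff can be illustrated with a toy timeline. This is a sketch, not Hudi code: the `Instant` class and both helpers are hypothetical, and it assumes (as the diff implies) that a finished compaction appears on the completed timeline under the `commit` action, while `compaction` only labels pending instants.

```python
from dataclasses import dataclass

# Action names mirror the constants referenced in the diff; values are assumed.
COMMIT_ACTION = "commit"
DELTA_COMMIT_ACTION = "deltacommit"
COMPACTION_ACTION = "compaction"


@dataclass(frozen=True)
class Instant:
    timestamp: str
    action: str
    completed: bool


def last_completed(timeline, action):
    """Return the latest completed instant with the given action, or None."""
    matches = [i for i in timeline if i.completed and i.action == action]
    return max(matches, key=lambda i: i.timestamp) if matches else None


def delta_commits_since(timeline, action):
    """Count completed delta commits after the last completed `action` instant."""
    last = last_completed(timeline, action)
    floor = last.timestamp if last else ""
    return sum(
        1 for i in timeline
        if i.completed and i.action == DELTA_COMMIT_ACTION and i.timestamp > floor
    )


timeline = [
    Instant("001", DELTA_COMMIT_ACTION, True),
    Instant("002", DELTA_COMMIT_ACTION, True),
    Instant("003", COMMIT_ACTION, True),       # completed compaction (assumed "commit")
    Instant("004", DELTA_COMMIT_ACTION, True),
    Instant("005", COMPACTION_ACTION, False),  # pending compaction, not completed
]

# Old filter: no completed instant carries the "compaction" action,
# so every delta commit is counted.
count_with_compaction_filter = delta_commits_since(timeline, COMPACTION_ACTION)

# New filter: only delta commits after the last completed "commit" are counted.
count_with_commit_filter = delta_commits_since(timeline, COMMIT_ACTION)
```

Under that assumption the old filter never finds a completed compaction, so the guard counts all delta commits since the start of the timeline rather than since the last compaction.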






Re: [PR] initial commit for hudi blogs [hudi]

2024-03-05 Thread via GitHub


bhasudha commented on code in PR #10719:
URL: https://github.com/apache/hudi/pull/10719#discussion_r1513668790


##
website/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.mdx:
##
@@ -0,0 +1,16 @@
+---
+title: "How a POC became a production-ready Hudi data lakehouse through close 
team collaboration"
+excerpt: "How a POC became a production-ready Hudi data lakehouse through 
close team collaboration"
+author: leboncoin tech blog

Review Comment:
   Author info obtained from the blog:
   `By Xiaoxiao Rey, Data Engineer, and [Hussein 
Awala](https://medium.com/@hussein-awala), Senior Data Engineer`
   
   Can we use these author names?






Re: [PR] initial commit for hudi blogs [hudi]

2024-03-05 Thread via GitHub


bhasudha commented on code in PR #10719:
URL: https://github.com/apache/hudi/pull/10719#discussion_r1513667248


##
website/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.mdx:
##
@@ -0,0 +1,16 @@
+---
+title: "How a POC became a production-ready Hudi data lakehouse through close 
team collaboration"
+excerpt: "How a POC became a production-ready Hudi data lakehouse through 
close team collaboration"
+author: leboncoin tech blog
+category: blog
+image: 
/assets/images/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.png
+tags:
+- blog
+- apache hudi
+- medium
+- beginner

Review Comment:
   change to `use-case` ?






Re: [PR] initial commit for hudi blogs [hudi]

2024-03-05 Thread via GitHub


bhasudha commented on code in PR #10719:
URL: https://github.com/apache/hudi/pull/10719#discussion_r151373


##
website/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.mdx:
##
@@ -0,0 +1,16 @@
+---
+title: "How a POC became a production-ready Hudi data lakehouse through close 
team collaboration"
+excerpt: "How a POC became a production-ready Hudi data lakehouse through 
close team collaboration"
+author: leboncoin tech blog
+category: blog
+image: 
/assets/images/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.png
+tags:
+- blog
+- apache hudi
+- medium

Review Comment:
   This should be `leboncoin-tech-blog` instead of `medium`






Re: [PR] initial commit for hudi blogs [hudi]

2024-03-05 Thread via GitHub


bhasudha commented on code in PR #10719:
URL: https://github.com/apache/hudi/pull/10719#discussion_r1513664527


##
website/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.mdx:
##
@@ -0,0 +1,16 @@
+---
+title: "How a POC became a production-ready Hudi data lakehouse through close 
team collaboration"
+excerpt: "How a POC became a production-ready Hudi data lakehouse through 
close team collaboration"
+author: leboncoin tech blog
+category: blog
+image: 
/assets/images/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.png
+tags:
+- blog
+- apache hudi
+- medium

Review Comment:
   Add tags such as `deletes` `gdpr deletion` `upserts` ?






Re: [I] [SUPPORT] Data loss due to incorrect selection of log file during compaction [hudi]

2024-03-05 Thread via GitHub


nsivabalan commented on issue #10803:
URL: https://github.com/apache/hudi/issues/10803#issuecomment-1979863694

   Sorry about the many follow-up questions.
   Can you tell us which storage scheme you are using? Partial write failures
should not happen with S3 or other cloud stores; with HDFS, we could see
partial write failures. We are trying to gauge whether the 2nd log file was
properly formed or was corrupted due to a partial write failure.
   
   
   If you have a backup of the data, let us know. We can share a tool that
prints info about the log files (valid log blocks, number of valid records,
etc.), which might help with our triaging.
   





Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979854060

   
   ## CI report:
   
   * 251ba740bd933494dafdfcd6be5393400c10bd0f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22803)
   
   
   





Re: [PR] [HUDI-7478] Fix max delta commits guard check w/ MDT [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10820:
URL: https://github.com/apache/hudi/pull/10820#issuecomment-1979854100

   
   ## CI report:
   
   * 800fcd378e0bb95c53a81c3c60796c97ea53d821 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22794)
   
   
   





Re: [PR] [MINOR][Test only] 1 [hudi]

2024-03-05 Thread via GitHub


yihua closed pull request #10824: [MINOR][Test only] 1
URL: https://github.com/apache/hudi/pull/10824





[PR] [MINOR][Test only] 1 [hudi]

2024-03-05 Thread via GitHub


yihua opened a new pull request, #10824:
URL: https://github.com/apache/hudi/pull/10824

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7478] Fix max delta commits guard check w/ MDT [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10820:
URL: https://github.com/apache/hudi/pull/10820#issuecomment-1979847082

   
   ## CI report:
   
   * 800fcd378e0bb95c53a81c3c60796c97ea53d821 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22794)
   
   
   





Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979847047

   
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * 251ba740bd933494dafdfcd6be5393400c10bd0f UNKNOWN
   
   



Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


yihua commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979831673

   @hudi-bot run azure





Re: [PR] [HUDI-7478] Fix max delta commits guard check w/ MDT [hudi]

2024-03-05 Thread via GitHub


wombatu-kun commented on PR #10820:
URL: https://github.com/apache/hudi/pull/10820#issuecomment-1979821628

   @hudi-bot run azure





[jira] [Updated] (HUDI-7483) TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict

2024-03-05 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-7483:
--
Description: 
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=22801&view=logs&j=600e7de6-e133-5e69-e615-50ee129b3c08&t=bbbd7bcc-ae73-56b8-887a-cd2d6deaafc7&s=859b8d9a-8fd6-5a5c-6f5e-f84f1990894e
{code:java}
[ERROR] Tests run: 29, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.331 s <<< FAILURE! - in org.apache.hudi.client.TestHoodieClientMultiWriter
[ERROR] testMultiWriterWithAsyncTableServicesWithConflict{HoodieTableType, Class, ConflictResolutionStrategy}[6]  Time elapsed: 16.083 s  <<< ERROR!
java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: Expected org.apache.hudi.exception.HoodieWriteConflictException to be thrown, but nothing was thrown.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.client.TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict(TestHoodieClientMultiWriter.java:565)
{code}

  was:
{code:java}
[ERROR] Tests run: 29, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.331 s <<< FAILURE! - in org.apache.hudi.client.TestHoodieClientMultiWriter
[ERROR] testMultiWriterWithAsyncTableServicesWithConflict{HoodieTableType, Class, ConflictResolutionStrategy}[6]  Time elapsed: 16.083 s  <<< ERROR!
java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: Expected org.apache.hudi.exception.HoodieWriteConflictException to be thrown, but nothing was thrown.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.client.TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict(TestHoodieClientMultiWriter.java:565)
{code}


> TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict
> -
>
> Key: HUDI-7483
> URL: https://issues.apache.org/jira/browse/HUDI-7483
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lin Liu
>Priority: Major
>
> https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=22801&view=logs&j=600e7de6-e133-5e69-e615-50ee129b3c08&t=bbbd7bcc-ae73-56b8-887a-cd2d6deaafc7&s=859b8d9a-8fd6-5a5c-6f5e-f84f1990894e
> {code:java}
> [ERROR] Tests run: 29, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.331 s <<< FAILURE! - in org.apache.hudi.client.TestHoodieClientMultiWriter
> [ERROR] testMultiWriterWithAsyncTableServicesWithConflict{HoodieTableType, Class, ConflictResolutionStrategy}[6]  Time elapsed: 16.083 s  <<< ERROR!
> java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: Expected org.apache.hudi.exception.HoodieWriteConflictException to be thrown, but nothing was thrown.
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.hudi.client.TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict(TestHoodieClientMultiWriter.java:565)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7483) TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict

2024-03-05 Thread Lin Liu (Jira)
Lin Liu created HUDI-7483:
-

 Summary: 
TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict
 Key: HUDI-7483
 URL: https://issues.apache.org/jira/browse/HUDI-7483
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Lin Liu


{code:java}
[ERROR] Tests run: 29, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.331 s <<< FAILURE! - in org.apache.hudi.client.TestHoodieClientMultiWriter
[ERROR] testMultiWriterWithAsyncTableServicesWithConflict{HoodieTableType, Class, ConflictResolutionStrategy}[6]  Time elapsed: 16.083 s  <<< ERROR!
java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: Expected org.apache.hudi.exception.HoodieWriteConflictException to be thrown, but nothing was thrown.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.client.TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict(TestHoodieClientMultiWriter.java:565)
{code}
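
The trace above reports an assertion failure wrapped in a `java.util.concurrent.ExecutionException`, which is what `Future.get()` does with anything thrown on a worker thread. The following is only a minimal sketch of that wrapping, not the Hudi test itself: the class `FutureFailureDemo`, the method `causeOfFailure`, and the use of a plain `AssertionError` in place of `org.opentest4j.AssertionFailedError` are all assumptions made for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureFailureDemo {

    // Hypothetical stand-in for a test body that runs on a worker thread
    // and fails an "expected exception" assertion there.
    static String causeOfFailure() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Void> writer = () -> {
                // Stand-in for the failed assertThrows check inside the task.
                throw new AssertionError(
                    "Expected exception to be thrown, but nothing was thrown.");
            };
            Future<Void> future = pool.submit(writer);
            future.get(); // rethrows the task's failure as ExecutionException
            return "none";
        } catch (ExecutionException e) {
            // The original failure is available as the cause.
            return e.getCause().getClass().getName();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(causeOfFailure());
    }
}
```

Under these assumptions, the outer exception type only says that the failure happened inside a task; the cause carries the actual assertion message, which matches the shape of the reported trace.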





Re: [I] [SUPPORT] java.lang.NoClassDefFoundError: org/apache/hudi/com/fasterxml/jackson/module/scala/DefaultScalaModule$ when doing an Incremental CDC Query in 0.14.1 [hudi]

2024-03-05 Thread via GitHub


Tyler-Rendina commented on issue #10590:
URL: https://github.com/apache/hudi/issues/10590#issuecomment-1979738484

   Is there a way to manually add the class after importing the spark bundle?





Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1979723667

   
   ## CI report:
   
   * c7c575df44ea9bf7f7b26587e26116d93955b2e2 UNKNOWN
   * 572c0ce5761d4e06fdf8ceebf808a9850d499bbb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22799)
 
   
   



Re: [I] hudi 0.14.1 and hudi 0.14.0 build issue [hudi]

2024-03-05 Thread via GitHub


yihua commented on issue #10808:
URL: https://github.com/apache/hudi/issues/10808#issuecomment-1979684160

   I've updated the Spark 3.5 support PR to have label `release-0.15.0` instead 
of `release-0.14.1`.





Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979663466

   
   ## CI report:
   
   * 04833166c6b9d859f7d0d7b26eb54ec6938a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22802)
 
   
   



Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]

2024-03-05 Thread via GitHub


hudi-bot commented on PR #10818:
URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979652823

   
   ## CI report:
   
   * c2ee3d3e53250fc6757172510018026a026d0bbe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22800)
 
   * 04833166c6b9d859f7d0d7b26eb54ec6938a UNKNOWN
   
   


