Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]
beyond1920 commented on PR #10826: URL: https://github.com/apache/hudi/pull/10826#issuecomment-1980284804 @jonvex It seems a little heavy to invoke the optimizer here just for case insensitivity. Besides, if we wrap an optimize phase here, users might miss information about the plan conversion in the Spark SQL web UI, right? Is there a better solution? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]
beyond1920 commented on code in PR #10826: URL: https://github.com/apache/hudi/pull/10826#discussion_r1513982842
## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala: ##
@@ -95,7 +95,9 @@ object InsertIntoHoodieTableCommand extends Logging with ProvidesHoodieConfig wi
 }
 val config = buildHoodieInsertConfig(catalogTable, sparkSession, isOverWritePartition, isOverWriteTable, partitionSpec, extraOptions, staticOverwritePartitionPathOpt)
-val alignedQuery = alignQueryOutput(query, catalogTable, partitionSpec, sparkSession.sessionState.conf)
+val optimizer = sparkSession.sessionState.optimizer
+val optimizerPlan = optimizer.execute(query)
+val alignedQuery = alignQueryOutput(optimizerPlan, catalogTable, partitionSpec, sparkSession.sessionState.conf)
Review Comment: It seems a little heavy to use the optimizer just for case insensitivity. Besides, if we wrap an optimize phase here, users might miss something in the Spark SQL web UI. Is there a better solution?
Re: [PR] [HUDI-7457] Remove runtime shutdown hook from HoodieLogFormatWriter [hudi]
bvaradar commented on code in PR #10789: URL: https://github.com/apache/hudi/pull/10789#discussion_r1513977038
## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java: ##
@@ -62,15 +61,14 @@ public class HoodieLogFormatWriter implements HoodieLogFormat.Writer {
 Short replication, Long sizeThreshold, String rolloverLogWriteToken,
- LogFileCreationCallback fileCreationHook) {
+ LogFileCreationCallback fileCreationCallback) {
Review Comment: Looking at other places like HoodieWriteHandle#createLogWriter, we are passing the LogFormatWriter to the caller. So it does not look like we can get away with a try-with-resources refactoring alone.
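The ownership point in the comment above can be illustrated with a minimal Java sketch. It does not use Hudi classes: `SketchLogWriter` and `WriterOwnershipSketch` are hypothetical stand-ins, and the assumption is only that the real writer behaves like an ordinary `AutoCloseable`. When a factory method returns the writer, wrapping its creation in try-with-resources closes it before the caller can use it, so the caller has to own the close.

```java
// Hypothetical stand-in for a log writer; not the Hudi HoodieLogFormatWriter.
final class SketchLogWriter implements AutoCloseable {
    private final StringBuilder buffer = new StringBuilder();
    private boolean closed = false;

    void append(String record) {
        if (closed) {
            throw new IllegalStateException("writer already closed");
        }
        buffer.append(record).append('\n');
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class WriterOwnershipSketch {
    // Broken factory: try-with-resources closes the writer when the block
    // exits, so the caller receives a writer that is already closed.
    static SketchLogWriter createWriterClosedTooEarly() {
        try (SketchLogWriter writer = new SketchLogWriter()) {
            return writer;
        }
    }

    // Factory that hands ownership to the caller; the caller must close it.
    static SketchLogWriter createWriter() {
        return new SketchLogWriter();
    }

    public static void main(String[] args) {
        SketchLogWriter broken = createWriterClosedTooEarly();
        System.out.println("closed before use: " + broken.isClosed());

        // try-with-resources only helps at the site that owns the writer.
        try (SketchLogWriter writer = createWriter()) {
            writer.append("record-1");
            System.out.println("open while in use: " + !writer.isClosed());
        }
    }
}
```

In this sketch the cleanup responsibility moves to whichever code receives the writer, which is why a refactoring confined to the creation site would not be enough on its own.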
Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]
danny0405 commented on code in PR #10826: URL: https://github.com/apache/hudi/pull/10826#discussion_r1513958254
## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala: ##
@@ -95,7 +95,9 @@ object InsertIntoHoodieTableCommand extends Logging with ProvidesHoodieConfig wi
 }
 val config = buildHoodieInsertConfig(catalogTable, sparkSession, isOverWritePartition, isOverWriteTable, partitionSpec, extraOptions, staticOverwritePartitionPathOpt)
-val alignedQuery = alignQueryOutput(query, catalogTable, partitionSpec, sparkSession.sessionState.conf)
+val optimizer = sparkSession.sessionState.optimizer
+val optimizerPlan = optimizer.execute(query)
+val alignedQuery = alignQueryOutput(optimizerPlan, catalogTable, partitionSpec, sparkSession.sessionState.conf)
Review Comment: Yeah, we need some clarification here.
Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]
danny0405 commented on code in PR #10826: URL: https://github.com/apache/hudi/pull/10826#discussion_r1513954128
## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala: ##
@@ -346,12 +346,14 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
 Project(incomingDataCols, joinData)
 }
-val projectedJoinOutput = projectedJoinPlan.output
+val optimizer = sparkSession.sessionState.optimizer
+val projectedJoinOptimizerPlan = optimizer.execute(projectedJoinPlan)
+val projectedJoinOptimizerOutput = projectedJoinOptimizerPlan.output
Review Comment: So `optimizer.execute` is the critical step for fixing the case sensitivity.
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980238002
## CI report:
* 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
* 9e94942e0c82e9ed8e41744ab0fa3033fe5c0e39 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22810)
* a3ed6c818a477182fe075a8e06efe3f80353ce43 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22812)
Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [I] If Sanitastiion Enabled In HudiStreamer It is taking too much time [hudi]
Amar1404 commented on issue #10466: URL: https://github.com/apache/hudi/issues/10466#issuecomment-1980235660 @ad1happy2go - Any update on this?
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
yihua closed pull request #10360: [HUDI-6497] WIP HoodieStorage abstraction URL: https://github.com/apache/hudi/pull/10360
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
yihua commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1980235486 This PR is replaced by #10591 as the last main piece of the storage abstraction.
Re: [PR] [HUDI-7059][WIP] Read record positions with filter pushdown using Spark parquet reader [hudi]
yihua commented on PR #10030: URL: https://github.com/apache/hudi/pull/10030#issuecomment-1980233153 The functionality is implemented in #10167.
Re: [PR] [HUDI-7059][WIP] Read record positions with filter pushdown using Spark parquet reader [hudi]
yihua closed pull request #10030: [HUDI-7059][WIP] Read record positions with filter pushdown using Spark parquet reader URL: https://github.com/apache/hudi/pull/10030
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980229022
## CI report:
* 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
* 9e94942e0c82e9ed8e41744ab0fa3033fe5c0e39 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22810)
* a3ed6c818a477182fe075a8e06efe3f80353ce43 UNKNOWN
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980220475
## CI report:
* 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
* 9e94942e0c82e9ed8e41744ab0fa3033fe5c0e39 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22810)
Re: [PR] [HUDI-7457] Remove runtime shutdown hook from HoodieLogFormatWriter [hudi]
danny0405 commented on PR #10789: URL: https://github.com/apache/hudi/pull/10789#issuecomment-1980202198 Hi @nsivabalan, could you help review this PR?
Re: [PR] [HUDI-7457] Remove runtime shutdown hook from HoodieLogFormatWriter [hudi]
danny0405 commented on code in PR #10789: URL: https://github.com/apache/hudi/pull/10789#discussion_r1513920873
## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java: ##
@@ -62,15 +61,14 @@ public class HoodieLogFormatWriter implements HoodieLogFormat.Writer {
 Short replication, Long sizeThreshold, String rolloverLogWriteToken,
- LogFileCreationCallback fileCreationHook) {
+ LogFileCreationCallback fileCreationCallback) {
Review Comment: Hi @bvaradar @nbalajee, could you help confirm whether it is safe to remove this shutdown hook from the log format writer?
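For readers unfamiliar with the mechanism under discussion, here is a minimal Java sketch of a per-writer JVM shutdown hook. It is not the Hudi implementation: `ShutdownHookSketch` and `HookedWriter` are hypothetical names, and the thread does not state why the hook is being removed. The sketch only shows the standard `Runtime.addShutdownHook` and `Runtime.removeShutdownHook` calls, and that a registered hook keeps a reference to the writer until it is deregistered or the JVM exits.

```java
final class ShutdownHookSketch {
    // Hypothetical writer that registers a shutdown hook for itself.
    static final class HookedWriter implements AutoCloseable {
        private final Thread hook;
        private boolean closed = false;

        HookedWriter() {
            // The hook thread holds a reference to this writer until removed.
            hook = new Thread(this::markClosed, "sketch-writer-shutdown-hook");
            Runtime.getRuntime().addShutdownHook(hook);
        }

        private synchronized void markClosed() {
            closed = true;
        }

        synchronized boolean isClosed() {
            return closed;
        }

        // Closes the writer and deregisters the hook. Returns true only if
        // the hook was still registered at the time of the call.
        boolean closeAndDeregister() {
            markClosed();
            return Runtime.getRuntime().removeShutdownHook(hook);
        }

        @Override
        public void close() {
            closeAndDeregister();
        }
    }

    public static void main(String[] args) {
        HookedWriter writer = new HookedWriter();
        System.out.println("hook removed on close: " + writer.closeAndDeregister());
        System.out.println("hook removed again: " + writer.closeAndDeregister());
    }
}
```

If every writer is closed explicitly by its owner, as in this sketch, the hook becomes a safety net only for abnormal exits; whether that safety net is still needed in Hudi is exactly the question raised in the review comment.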
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980167738
## CI report:
* 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
* 6fa1f1c34c6007131603081330cc6cd878df4d75 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22809)
* 9e94942e0c82e9ed8e41744ab0fa3033fe5c0e39 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22810)
Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]
yihua commented on code in PR #10826: URL: https://github.com/apache/hudi/pull/10826#discussion_r1513878332
## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala: ##
@@ -372,17 +374,23 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
 // In case when we're not adding new columns we need to make sure that the casing of the key attributes'
 // matches to that one of the target table. This is necessary b/c unlike Spark, Avro is case-sensitive
 // and therefore would fail downstream if case of corresponding columns don't match
+val partitionColumns = hoodieCatalogTable.tableConfig.getPartitionFieldProp.split(",").toSeq
 val existingAttributes = existingAttributesMap.map(_._1)
-val adjustedSourceTableOutput = projectedJoinOutput.map { attr =>
+val adjustedSourceTableOutput = projectedJoinOptimizerOutput.map { attr =>
 existingAttributes.find(keyAttr => resolver(keyAttr.name, attr.name)) match {
 // To align the casing we just rename the attribute to match that one of the
 // target table
 case Some(keyAttr) => attr.withName(keyAttr.name)
-case _ => attr
+// additional check for partition columns because they are not required,
+// but we still care about casing because of keygenerator
+case _ => partitionColumns.find(colName => resolver(colName, attr.name)) match {
Review Comment: Can we add a test around this case?
## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala: ##
@@ -2448,6 +2449,50 @@ class TestInsertTable extends HoodieSparkSqlTestBase {
 })
 }
+ test("Test query with Foldable Propagation expression") {
Review Comment: Make the test name more readable.
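The casing alignment in the hunk above can be sketched outside Spark in plain Java. This is an illustration under stated assumptions, not the Hudi code: `CaseAlignSketch`, `resolves`, `findMatch`, and `alignCasing` are hypothetical names, column names are plain strings rather than Spark attributes, and `equalsIgnoreCase` stands in for the session resolver when case-insensitive resolution is in effect. Each source column is renamed to the casing of the matching target key column, then falls back to the matching partition column, and is otherwise left unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class CaseAlignSketch {
    // Stand-in for a case-insensitive resolver (assumption: the session is
    // configured for case-insensitive name resolution).
    static boolean resolves(String left, String right) {
        return left.equalsIgnoreCase(right);
    }

    // First candidate whose name matches ignoring case, if any.
    static Optional<String> findMatch(List<String> candidates, String name) {
        return candidates.stream().filter(c -> resolves(c, name)).findFirst();
    }

    // Renames each source column to the target table's casing: key columns
    // first, then partition columns, otherwise the name is kept as-is.
    static List<String> alignCasing(List<String> sourceCols,
                                    List<String> targetKeyCols,
                                    List<String> partitionCols) {
        List<String> aligned = new ArrayList<>();
        for (String col : sourceCols) {
            String renamed = findMatch(targetKeyCols, col)
                .orElseGet(() -> findMatch(partitionCols, col).orElse(col));
            aligned.add(renamed);
        }
        return aligned;
    }

    public static void main(String[] args) {
        List<String> aligned = alignCasing(
            List.of("ID", "Ts", "REGION", "extra"),
            List.of("id", "ts"),
            List.of("region"));
        System.out.println(aligned); // prints [id, ts, region, extra]
    }
}
```

The partition-column fallback mirrors the new `case _` branch in the diff: partition columns are not among the key attributes, but a case-sensitive consumer downstream would still see a mismatched name if their casing were left untouched.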
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980160838
## CI report:
* 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
* 6fa1f1c34c6007131603081330cc6cd878df4d75 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22809)
* 9e94942e0c82e9ed8e41744ab0fa3033fe5c0e39 UNKNOWN
Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]
yihua commented on code in PR #10826: URL: https://github.com/apache/hudi/pull/10826#discussion_r1513876839
## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala: ##
@@ -95,7 +95,9 @@ object InsertIntoHoodieTableCommand extends Logging with ProvidesHoodieConfig wi
 }
 val config = buildHoodieInsertConfig(catalogTable, sparkSession, isOverWritePartition, isOverWriteTable, partitionSpec, extraOptions, staticOverwritePartitionPathOpt)
-val alignedQuery = alignQueryOutput(query, catalogTable, partitionSpec, sparkSession.sessionState.conf)
+val optimizer = sparkSession.sessionState.optimizer
+val optimizerPlan = optimizer.execute(query)
+val alignedQuery = alignQueryOutput(optimizerPlan, catalogTable, partitionSpec, sparkSession.sessionState.conf)
Review Comment: Is this required for case insensitivity, or is it for performance optimization?
Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]
yihua commented on code in PR #10826: URL: https://github.com/apache/hudi/pull/10826#discussion_r1513874685
## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala: ##
@@ -2448,6 +2449,50 @@ class TestInsertTable extends HoodieSparkSqlTestBase {
 })
 }
+ test("Test query with Foldable Propagation expression") {
+withRecordType(Seq(HoodieRecordType.AVRO))(withTempDir { tmp =>
Review Comment: Remove `withRecordType(Seq(HoodieRecordType.AVRO))(` as it's not required.
Re: [PR] [MINOR] Fix Azure publishing of JUnit results [hudi]
zhangyue19921010 merged PR #10817: URL: https://github.com/apache/hudi/pull/10817
(hudi) branch master updated (f5284964e29 -> b710c07c0e8)
This is an automated email from the ASF dual-hosted git repository.
zhangyue19921010 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git
from f5284964e29 [HUDI-7418] Create a common method for filtering in S3 and GCS sources and add tests for filtering out extensions (#10724)
add b710c07c0e8 [MINOR] Fix Azure publishing of JUnit results (#10817)
No new revisions were added by this update.
Summary of changes:
azure-pipelines-20230430.yml | 15 +--
1 file changed, 5 insertions(+), 10 deletions(-)
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980153644
## CI report:
* 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
* 6fa1f1c34c6007131603081330cc6cd878df4d75 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22809)
Re: [I] [SUPPORT] - Hudi 0.12.1 - production job slowing down [hudi]
ad1happy2go commented on issue #10822: URL: https://github.com/apache/hudi/issues/10822#issuecomment-1980135128 @joshhamann Can you please provide the writer configuration so we can look into this further? If you are using the upsert operation type, the load into a new Hudi table is expected to run faster, as there is no existing dataset to join with to identify which records need to be upserted. So when you benchmarked 5 min vs 15 min, was the Hudi table empty, or did it have the same amount of existing data as the old table?
Re: [I] [SUPPORT] Slashes in partition columns [hudi]
ad1happy2go commented on issue #10754: URL: https://github.com/apache/hudi/issues/10754#issuecomment-1980114792 A similar Jira was raised to fix this issue: https://issues.apache.org/jira/browse/HUDI-7484
Re: [PR] [MINOR] Fix Azure publishing of JUnit results [hudi]
yihua commented on PR #10817: URL: https://github.com/apache/hudi/pull/10817#issuecomment-1980109582 @stream2000 @zhangyue19921010 @leesf appreciate it if one of you could review and land this.
(hudi) branch branch-0.x updated: [HUDI-7463] Bump Spark 3.5 version to Spark 3.5.1 (#10788)
This is an automated email from the ASF dual-hosted git repository.
yihua pushed a commit to branch branch-0.x in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/branch-0.x by this push:
new 2b4e6588079 [HUDI-7463] Bump Spark 3.5 version to Spark 3.5.1 (#10788)
2b4e6588079 is described below
commit 2b4e658807933bde0a31f5fe565bd80f11d13f31
Author: Shawn Chang <42792772+c...@users.noreply.github.com>
AuthorDate: Tue Mar 5 21:11:40 2024 -0800
[HUDI-7463] Bump Spark 3.5 version to Spark 3.5.1 (#10788)
Co-authored-by: Shawn Chang
---
pom.xml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pom.xml b/pom.xml
index 903d3a58714..9b76ec7e95d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -166,7 +166,7 @@
 3.2.3
 3.3.1
 3.4.1
-3.5.0
+3.5.1
 hudi-spark3.2.x
Re: [PR] [HUDI-7463][branch-0.x] Bump Spark 3.5 version to Spark 3.5.1 [hudi]
yihua merged PR #10788: URL: https://github.com/apache/hudi/pull/10788
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980098240
## CI report:
* 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
* b75239bfa677b1640ff49dcd705acb3db4263a69 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22806)
* 3a23e19c1f719b79092bc65c7047c6c83bae657c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22808)
* 6fa1f1c34c6007131603081330cc6cd878df4d75 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22809)
Re: [PR] updated video content [hudi]
nfarah86 commented on PR #10827: URL: https://github.com/apache/hudi/pull/10827#issuecomment-1980084074 https://github.com/apache/hudi/assets/5392555/598bce23-f2f5-48e5-835f-bbd262a5f1ef @bhasudha video blogs are ready
[PR] updated video content [hudi]
nfarah86 opened a new pull request, #10827: URL: https://github.com/apache/hudi/pull/10827
### Change Logs
updated video content
### Impact
_Describe any public API or user-facing feature change or any performance impact._
### Risk level (write none, low medium or high below)
none
### Documentation Update
none
- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980064748
## CI report:
* 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
* b75239bfa677b1640ff49dcd705acb3db4263a69 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22806)
* 3a23e19c1f719b79092bc65c7047c6c83bae657c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22808)
* 6fa1f1c34c6007131603081330cc6cd878df4d75 UNKNOWN
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980059762
## CI report:
* 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
* b75239bfa677b1640ff49dcd705acb3db4263a69 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22806)
* 3a23e19c1f719b79092bc65c7047c6c83bae657c UNKNOWN
Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]
hudi-bot commented on PR #10826: URL: https://github.com/apache/hudi/pull/10826#issuecomment-1980054774
## CI report:
* 0c2acd76a0d937ac926d5fdabafbfc1d66b61e2f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22807)
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1980054735
## CI report:
* 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
* b75239bfa677b1640ff49dcd705acb3db4263a69 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22806)
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979998052
## CI report:
* d8ffb6b051f147146e927f1241efd015bf758c6a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22804)
* 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN
* b75239bfa677b1640ff49dcd705acb3db4263a69 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22806)
Re: [PR] [HUDI-7529] fix multiple tasks get the lock at the same time when use… [hudi]
KnightChess closed pull request #10412: [HUDI-7529] fix multiple tasks get the lock at the same time when use… URL: https://github.com/apache/hudi/pull/10412 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7529] fix multiple tasks get the lock at the same time when use… [hudi]
KnightChess commented on PR #10412: URL: https://github.com/apache/hudi/pull/10412#issuecomment-1979984556 close it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [WIP][HUDI-6472] fix spark sql does not ignore case [hudi]
KnightChess closed pull request #10582: [WIP][HUDI-6472] fix spark sql does not ignore case URL: https://github.com/apache/hudi/pull/10582 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR] change hive/adb tool not auto create database default [hudi]
KnightChess closed pull request #9640: [MINOR] change hive/adb tool not auto create database default URL: https://github.com/apache/hudi/pull/9640 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-5956] Simple repair spark sql dag ui display problem [hudi]
KnightChess closed pull request #8233: [HUDI-5956] Simple repair spark sql dag ui display problem URL: https://github.com/apache/hudi/pull/8233 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [WIP][HUDI-6472] fix spark sql does not ignore case [hudi]
KnightChess commented on PR #10582: URL: https://github.com/apache/hudi/pull/10582#issuecomment-1979982764 Sorry for the late reply. @jonvex I will close this PR. Thank you for your work on it.
Re: [I] [SUPPORT] Data loss due to incorrect selection of log file during compaction [hudi]
Ytimetravel commented on issue #10803: URL: https://github.com/apache/hudi/issues/10803#issuecomment-1979974997 Thank you very much for following up on the issue and providing feedback. I will use the tool you provided to obtain some meta info about our log blocks and records, and will get back to you later~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]
hudi-bot commented on PR #10826: URL: https://github.com/apache/hudi/pull/10826#issuecomment-1979962655 ## CI report: * 0c2acd76a0d937ac926d5fdabafbfc1d66b61e2f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22807) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979962618 ## CI report: * d8ffb6b051f147146e927f1241efd015bf758c6a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22804) * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN * b75239bfa677b1640ff49dcd705acb3db4263a69 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-2706] refactor spark-sql to make consistent with DataFrame api [hudi]
boneanxs commented on code in PR #3936: URL: https://github.com/apache/hudi/pull/3936#discussion_r1513731553 ## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/DeleteHoodieTableCommand.scala: ## @@ -56,32 +57,36 @@ case class DeleteHoodieTableCommand(deleteTable: DeleteFromTable) extends Runnab } private def buildHoodieConfig(sparkSession: SparkSession): Map[String, String] = { -val targetTable = sparkSession.sessionState.catalog - .getTableMetadata(tableId) +val targetTable = sparkSession.sessionState.catalog.getTableMetadata(tableId) +val tblProperties = targetTable.storage.properties ++ targetTable.properties val path = getTableLocation(targetTable, sparkSession) val conf = sparkSession.sessionState.newHadoopConf() val metaClient = HoodieTableMetaClient.builder() .setBasePath(path) .setConf(conf) .build() val tableConfig = metaClient.getTableConfig -val primaryColumns = HoodieOptionConfig.getPrimaryColumns(targetTable.storage.properties) Review Comment: As far as I know, Hudi currently uses `spark.sql.caseSensitive` to decide whether name resolution is case-sensitive during the analysis stage, and it is false by default, so it seems reasonable to respect that configuration here as well?
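To illustrate the point in the review comment above, here is a minimal Java sketch of resolving a column name against a schema while honoring a case-sensitivity flag, such as the value read from `spark.sql.caseSensitive`. It is not code from the PR: the class `ColumnNameResolver` and its `resolve` method are hypothetical names made up for this example and are not part of Hudi or Spark.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not a Hudi or Spark API.
class ColumnNameResolver {

    // Returns the schema field matching `name`, comparing case-sensitively
    // only when `caseSensitive` is true.
    static Optional<String> resolve(List<String> schemaFields, String name, boolean caseSensitive) {
        return schemaFields.stream()
            .filter(f -> caseSensitive ? f.equals(name) : f.equalsIgnoreCase(name))
            .findFirst();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "name", "ts");
        // With the flag off (Spark's default), "ID" resolves to the field "id".
        System.out.println(resolve(fields, "ID", false).isPresent()); // true
        // With the flag on, "ID" does not match any field.
        System.out.println(resolve(fields, "ID", true).isPresent());  // false
    }
}
```

Returning the schema's own spelling of the field, rather than the user's, means later lookups that are themselves case-sensitive still succeed.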
Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]
hudi-bot commented on PR #10826: URL: https://github.com/apache/hudi/pull/10826#issuecomment-1979957408 ## CI report: * 0c2acd76a0d937ac926d5fdabafbfc1d66b61e2f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979957377 ## CI report: * d8ffb6b051f147146e927f1241efd015bf758c6a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22804) * 8193cde8b9c587ab66928080de4f70c50d64dca4 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979951480 ## CI report: * d8ffb6b051f147146e927f1241efd015bf758c6a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22804) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (HUDI-7475) Disable ITs in hudi-aws module
[ https://issues.apache.org/jira/browse/HUDI-7475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov resolved HUDI-7475. - > Disable ITs in hudi-aws module > -- > > Key: HUDI-7475 > URL: https://issues.apache.org/jira/browse/HUDI-7475 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Vova Kolmakov >Priority: Major > Labels: pull-request-available > > The tests do not work. Disabling them to unblock Azure CI. > {code:java} > [ERROR] Errors: > [ERROR] ITTestGluePartitionPushdown.setUp:96 » Execution > software.amazon.awssdk.core.e... > [ERROR] ITTestGluePartitionPushdown.setUp:96 » Execution > software.amazon.awssdk.core.e... > [ERROR] ITTestGluePartitionPushdown.setUp:96 » Execution > software.amazon.awssdk.core.e... > [ERROR] > ITTestDynamoDBBasedLockProvider.setup:66->getDynamoClientWithLocalEndpoint:110 > IllegalState > [INFO] > [ERROR] Tests run: 9, Failures: 0, Errors: 4, Skipped: 0 > 2024-03-04T04:55:22.6893321Z [ERROR] > org.apache.hudi.aws.transaction.integ.ITTestDynamoDBBasedLockProvider Time > elapsed: 0.019 s <<< ERROR! 
> 2024-03-04T04:55:22.6893739Z java.lang.IllegalStateException: > dynamodb-local.endpoint system property not set > 2024-03-04T04:55:22.6894356Z at > org.apache.hudi.aws.transaction.integ.ITTestDynamoDBBasedLockProvider.getDynamoClientWithLocalEndpoint(ITTestDynamoDBBasedLockProvider.java:110) > 2024-03-04T04:55:22.6894867Z at > org.apache.hudi.aws.transaction.integ.ITTestDynamoDBBasedLockProvider.setup(ITTestDynamoDBBasedLockProvider.java:66) > 2024-03-04T04:55:22.6895225Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2024-03-04T04:55:22.6895711Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2024-03-04T04:55:22.6896080Z at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2024-03-04T04:55:22.6896418Z at > java.lang.reflect.Method.invoke(Method.java:498) > 2024-03-04T04:55:22.6896755Z at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) > 2024-03-04T04:55:22.6897322Z at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2024-03-04T04:55:22.6897911Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2024-03-04T04:55:22.6971261Z at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2024-03-04T04:55:22.6971737Z at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:126) > 2024-03-04T04:55:22.6972156Z at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptBeforeAllMethod(TimeoutExtension.java:68) > 2024-03-04T04:55:22.6972608Z at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2024-03-04T04:55:22.6973048Z at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 
2024-03-04T04:55:22.6973483Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2024-03-04T04:55:22.6974121Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2024-03-04T04:55:22.6974562Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2024-03-04T04:55:22.6975257Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2024-03-04T04:55:22.6975649Z at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2024-03-04T04:55:22.6976025Z at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 2024-03-04T04:55:22.6976454Z at > org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllMethods$9(ClassBasedTestDescriptor.java:384) > 2024-03-04T04:55:22.6976901Z at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > 2024-03-04T04:55:22.6977341Z at > org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.invokeBeforeAllMethods(ClassBasedTestDescriptor.java:382) > 2024-03-04T04:55:22.6977781Z at > org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:196) > 2024-03-04T04:55:22.6978194Z at > org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:78) > 2024-03-04T04:55:22.6978624Z at > org.junit.platform.engine.support.hierarchical.NodeTest
[jira] [Created] (HUDI-7484) Fix partitioning style when partition is inferred from partitionBy
Sagar Sumit created HUDI-7484: - Summary: Fix partitioning style when partition is inferred from partitionBy Key: HUDI-7484 URL: https://issues.apache.org/jira/browse/HUDI-7484 Project: Apache Hudi Issue Type: Task Reporter: Sagar Sumit Fix For: 1.0.0 When the partition is inferred from partitionBy() arguments and hive-style partitioning is enabled, we observe that the partitioning style is not uniform across the levels of a multi-level partition. The directory structure is as follows: partition=2015 |- 03 |- 15 |- 16 -- This message was sent by Atlassian Jira (v8.20.10#820010)
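As a rough illustration of what a uniform layout would look like, the following Java sketch builds a multi-level partition path in which every level uses the same style. This is an assumption about the intended behavior, not Hudi's key generator code: the class `HiveStylePartitionPath`, its `build` method, and the field names `year`, `month`, and `day` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not Hudi's key generator.
class HiveStylePartitionPath {

    // Joins one segment per partition field. With hiveStyle = true every
    // level is written as field=value; otherwise every level is the bare value.
    static String build(List<String> fields, List<String> values, boolean hiveStyle) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("fields and values must have the same length");
        }
        List<String> segments = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            segments.add(hiveStyle ? fields.get(i) + "=" + values.get(i) : values.get(i));
        }
        return String.join("/", segments);
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("year", "month", "day");
        List<String> values = Arrays.asList("2015", "03", "15");
        System.out.println(build(fields, values, true));  // year=2015/month=03/day=15
        System.out.println(build(fields, values, false)); // 2015/03/15
    }
}
```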
Re: [I] [SUPPORT] Dataloss in FlinkCDC into Hudi without any exception or other infomation [hudi]
xuzifu666 commented on issue #10542: URL: https://github.com/apache/hudi/issues/10542#issuecomment-1979941864 > hey @xuzifu666 : do you happen to still have the old data that had data loss intact? We would like to root-cause this. The 0.x release line will be used by a lot of OSS users, so we really want to get to the bottom of it and fix it. > > Would greatly appreciate it if you can help us triage this. > > * Do you happen to know when exactly the data loss happens? Do you see anything interesting in the timeline around the time the data loss happens? > * Is it a single writer or multi-writer? > * We do have some suspicion around log record reading that we are chasing. Ref ticket: [[SUPPORT] Data loss due to incorrect selection of log file during compaction #10803](https://github.com/apache/hudi/issues/10803) But I do not want to bias this one. Let's get more info about when exactly data loss is seen. > * Are there any task retries in general? I am not familiar w/ Flink, but in Spark we might have task retries. Is anything like that happening in your pipeline? > * Is it happening across all pipelines occasionally, or only in very few pipelines? If it's very few, are there any common characteristics (index type, metadata enabled, etc.) compared to the pipelines that do not have the data loss issue? > * Can you confirm that these pipelines were running w/o any issues w/ older versions of Hudi? > * Are you able to reproduce this in a deterministic manner? Hi @nsivabalan Thanks for your attention. Following the conditions you raised, my answers are: 1. Judging from the timestamps of all the lost records, the loss happens around the time a Flink job checkpoint finishes, but the job state is OK and there is no exception in the timeline. Because of this it is hard to pin down the root cause. 2. In our case, the data loss happened in a single-writer job. 3. I read https://github.com/apache/hudi/issues/10803 recently, but that issue occurs in the compaction scenario. We have tested all of these scenarios: a. Flink job with online compaction; b. Flink job without compaction; c. Flink job with compaction run by Spark. Data loss can happen in all of them. 4. The job is stable the whole time, without any exception and without any retries while running. 5. There are about 4 or 5 pipelines; we do not use MDT, the table type is MOR, and the index type is bucket. 6. The Hudi version we use is 0.14.0. 7. So far we have not found a deterministic way to reproduce it, because the job state and the timeline state both look fine. If you have any other questions, feel free to ask anytime.
[jira] [Updated] (HUDI-7480) initializeFunctionalIndexPartition is called multiple times
[ https://issues.apache.org/jira/browse/HUDI-7480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7480: -- Fix Version/s: 1.0.0 > initializeFunctionalIndexPartition is called multiple times > --- > > Key: HUDI-7480 > URL: https://issues.apache.org/jira/browse/HUDI-7480 > Project: Apache Hudi > Issue Type: Bug >Reporter: Vinaykumar Bhat >Assignee: Sagar Sumit >Priority: Major > Fix For: 1.0.0 > > > This is due to an issue in > initializeFromFilesystem(), which checks whether an MDT partition needs to be > initialized based on the absence of its partition-type. But for a functional index, > the partition-type actually stores only the prefix (func_index_), hence the check > always fails and we try to re-initialize the same functional index partition again. > > Simple test: > {quote}spark.sql( > s""" > |create table $tableName ( > | id int, > | name string, > | price double, > | ts long > |) using hudi > | options ( > | primaryKey ='id', > | type = '$tableType', > | preCombineField = 'ts', > | hoodie.metadata.record.index.enable = 'true', > | hoodie.datasource.write.recordkey.field = 'id' > | ) > | partitioned by(ts) > | location '$basePath' > """.stripMargin) > spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)") > spark.sql(s"insert into $tableName values(2, 'a2', 10, 1001)") > spark.sql(s"insert into $tableName values(3, 'a3', 10, 1002)") > > var createIndexSql = s"create index idx_datestr on $tableName using > column_stats(ts) options(func='from_unixtime', format='-MM-dd')" > spark.sql(createIndexSql) > > -- This insert throws null-pointer exception > spark.sql(s"insert into $tableName values(4, 'a4', 10, 1004)"){quote}
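The failure mode described in HUDI-7480 can be sketched in a few lines of Java. This is only a simplified model under stated assumptions, not the actual Hudi metadata-writer code: the class `PartitionInitCheck`, both method names, and the representation of initialized partitions as a set of path strings are hypothetical. Only the `func_index_` prefix and the index name `idx_datestr` come from the ticket.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified, hypothetical model of the check; not the actual Hudi code.
class PartitionInitCheck {

    static final String FUNC_INDEX_PREFIX = "func_index_";

    // Exact lookup: works when the partition-type path is the full partition
    // name, but never matches when the type only holds a prefix.
    static boolean isInitializedExact(Set<String> existingPartitions, String partitionTypePath) {
        return existingPartitions.contains(partitionTypePath);
    }

    // Prefix lookup: treats any existing partition starting with the prefix
    // as evidence that the functional index is already initialized.
    static boolean isInitializedByPrefix(Set<String> existingPartitions, String prefix) {
        return existingPartitions.stream().anyMatch(p -> p.startsWith(prefix));
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList("files", "func_index_idx_datestr"));
        // The exact check reports "not initialized", which would trigger a re-init.
        System.out.println(isInitializedExact(existing, FUNC_INDEX_PREFIX));    // false
        // The prefix check recognizes the existing functional index partition.
        System.out.println(isInitializedByPrefix(existing, FUNC_INDEX_PREFIX)); // true
    }
}
```

Whether a prefix match is the right fix depends on how several functional indexes on one table should be distinguished; the sketch only shows why an exact comparison against a prefix can never succeed.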
Re: [PR] [WIP][HUDI-6472] fix spark sql does not ignore case [hudi]
jonvex commented on PR #10582: URL: https://github.com/apache/hudi/pull/10582#issuecomment-1979936406 made some changes to this pr and put them into a new one https://github.com/apache/hudi/pull/10826. @danny0405 how should we proceed? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Schema file too large and keeps growing, OOM when http handle it [hudi]
lei-su-awx commented on issue #10816: URL: https://github.com/apache/hudi/issues/10816#issuecomment-1979934695 @ad1happy2go got it, thanks for your reply. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [HUDI-6472] fix spark sql does not ignore case [hudi]
jonvex opened a new pull request, #10826: URL: https://github.com/apache/hudi/pull/10826 ### Change Logs https://github.com/apache/hudi/pull/10582 with the following changes: - HoodieSpark32PlusAnalysis: made this change much less complex - correct capitalization of partition column names for keygen ### Impact allow merge into to ignore case ### Risk level (write none, low medium or high below) low ### Documentation Update N/A ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-60) [UMBRELLA] Support Apache Beam / Hudi IO
[ https://issues.apache.org/jira/browse/HUDI-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823826#comment-17823826 ] xy commented on HUDI-60: :) I agree that during the work on HudiIO, table services should be disabled, and users should run table services asynchronously offline. Decoupling the logic is appropriate. > [UMBRELLA] Support Apache Beam / Hudi IO > > > Key: HUDI-60 > URL: https://issues.apache.org/jira/browse/HUDI-60 > Project: Apache Hudi > Issue Type: Epic > Components: spark, Utilities >Reporter: Vinoth Chandar >Assignee: xy >Priority: Major > Labels: gsoc, gsoc2021, hudi-umbrellas, mentor > Fix For: 0.15.0 > > > We would like to add a HudiIO for Beam, along the lines of > [https://github.com/apache/beam/blob/master/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java] > > For the initial cut: we can leave the table services turned off on the > writer and advise users to run them independently? > During this work, we can also look into anything that needs to be fixed in the > java-client module, which works with GenericRecords as well (used by the > Kafka Connect Sink). So if that's in shape, this can be much easier.
Re: [PR] [HUDI-7418] Create a common method for filtering in S3 and GCS sources and add tests for filtering out extensions [hudi]
yihua merged PR #10724: URL: https://github.com/apache/hudi/pull/10724 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(hudi) branch master updated (31613745168 -> f5284964e29)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 31613745168 [HUDI-7475] Disable ITs in hudi-aws module (#10821) add f5284964e29 [HUDI-7418] Create a common method for filtering in S3 and GCS sources and add tests for filtering out extensions (#10724) No new revisions were added by this update. Summary of changes: .../hudi/utilities/config/CloudSourceConfig.java | 4 +- .../config/S3EventsHoodieIncrSourceConfig.java | 6 ++ .../sources/GcsEventsHoodieIncrSource.java | 8 +-- .../sources/S3EventsHoodieIncrSource.java | 50 +++- .../helpers/CloudObjectsSelectorCommon.java| 68 ++ .../helpers/gcs/GcsObjectMetadataFetcher.java | 39 + .../sources/TestGcsEventsHoodieIncrSource.java | 42 + .../sources/TestS3EventsHoodieIncrSource.java | 6 +- 8 files changed, 124 insertions(+), 99 deletions(-)
Re: [PR] initial commit for hudi blogs [hudi]
nfarah86 commented on PR #10719: URL: https://github.com/apache/hudi/pull/10719#issuecomment-1979916357 should be good @bhasudha -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] initial commit for hudi blogs [hudi]
nfarah86 commented on code in PR #10719: URL: https://github.com/apache/hudi/pull/10719#discussion_r1513695794 ## website/blog/2024-01-18-Deleting-Items-from-Apache-Hudi-using-Delta-Streamer-in-UPSERT-Mode-with-Kafka-Avro-Messages.mdx: ## @@ -0,0 +1,23 @@ +--- +title: "Deleting Items from Apache Hudi using Delta Streamer in UPSERT Mode with Kafka Avro Messages" +excerpt: "Deleting Items from Apache Hudi using Delta Streamer in UPSERT Mode with Kafka Avro Messages" +author: Soumil Shah +category: blog +image: /assets/images/blog/2024-01-18-Deleting-Items-from-Apache-Hudi-using-Delta-Streamer-in-UPSERT-Mode-with-Kafka-Avro-Messages.png +tags: +- blog +- apache hudi +- linkedin +- beginner +- hudi streamer +- deltastreamer +- apache kafka +- apache avro +- upsert Review Comment: added the singular version -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Dataloss in FlinkCDC into Hudi without any exception or other infomation [hudi]
nsivabalan commented on issue #10542: URL: https://github.com/apache/hudi/issues/10542#issuecomment-1979914443 hey @xuzifu666 : do you happen to still have the old data that had data loss intact? We would like to root-cause this. The 0.x release line will be used by a lot of OSS users, so we really want to get to the bottom of it and fix it. Would greatly appreciate it if you can help us triage this. - Do you happen to know when exactly the data loss happens? Do you see anything interesting in the timeline around the time the data loss happens? - Is it a single writer or multi-writer? - We do have some suspicion around log record reading that we are chasing. Ref ticket: https://github.com/apache/hudi/issues/10803 But I do not want to bias this one. Let's get more info about when exactly data loss is seen. - Are there any task retries in general? I am not familiar w/ Flink, but in Spark we might have task retries. Is anything like that happening in your pipeline? - Is it happening across all pipelines occasionally, or only in very few pipelines? If it's very few, are there any common characteristics (index type, metadata enabled, etc.) compared to the pipelines that do not have the data loss issue? - Can you confirm that these pipelines were running w/o any issues w/ older versions of Hudi? - Are you able to reproduce this in a deterministic manner?
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979913265 ## CI report: * 251ba740bd933494dafdfcd6be5393400c10bd0f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22803) * d8ffb6b051f147146e927f1241efd015bf758c6a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22804) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] initial commit for hudi blogs [hudi]
nfarah86 commented on code in PR #10719: URL: https://github.com/apache/hudi/pull/10719#discussion_r1513693521 ## website/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.mdx: ## @@ -0,0 +1,16 @@ +--- +title: "How a POC became a production-ready Hudi data lakehouse through close team collaboration" +excerpt: "How a POC became a production-ready Hudi data lakehouse through close team collaboration" +author: leboncoin tech blog +category: blog +image: /assets/images/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.png +tags: +- blog +- apache hudi +- medium Review Comment: i put the singular version -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] initial commit for hudi blogs [hudi]
nfarah86 commented on code in PR #10719: URL: https://github.com/apache/hudi/pull/10719#discussion_r1513692010 ## website/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.mdx: ## @@ -0,0 +1,16 @@ +--- +title: "How a POC became a production-ready Hudi data lakehouse through close team collaboration" +excerpt: "How a POC became a production-ready Hudi data lakehouse through close team collaboration" +author: leboncoin tech blog +category: blog +image: /assets/images/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.png +tags: +- blog +- apache hudi +- medium Review Comment: i found the blog on medium- https://medium.com/leboncoin-tech-blog/how-a-poc-became-a-production-ready-hudi-data-lakehouse-through-close-team-collaboration-c7f33eb746a8 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979907408 ## CI report: * 251ba740bd933494dafdfcd6be5393400c10bd0f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22803) * d8ffb6b051f147146e927f1241efd015bf758c6a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7418] Create a common method for filtering in S3 and GCS sources and add tests for filtering out extensions [hudi]
hudi-bot commented on PR #10724: URL: https://github.com/apache/hudi/pull/10724#issuecomment-1979907220 ## CI report: * ead7905ee6e86f7ad3f3ad63f954592fde08502b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22797) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Data loss due to incorrect selection of log file during compaction [hudi]
nsivabalan commented on issue #10803:
URL: https://github.com/apache/hudi/issues/10803#issuecomment-1979906166

Hey, I wrote a tool that prints some meta info about our log blocks and records. The branch is here: https://github.com/nsivabalan/hudi/tree/printAllVersionsOfRecordTool

Can you run the tool and share the output with us? It's a spark-submit command that logs some info about the log files we are interested in.

Sample command:
```
./bin/spark-submit \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --class org.apache.hudi.utilities.PrintRecordsTool \
  PATH_TO_BUNDLE/hudi-utilities-bundle_2.12-0.15.0-SNAPSHOT.jar \
  --props /tmp/props.in \
  --base-path /tmp/hudi_trips_mor/ \
  --partition-path asia/india/chennai \
  --file-id c3ef010f-61ae-4aa3-a033-25b278da17c6-0 \
  --base-instant-time 20240302002723362 \
  --print-log-blocks-info
```

```
cat /tmp/props.in
hoodie.datasource.write.recordkey.field=uuid
hoodie.datasource.write.partitionpath.field=partitionpath
hoodie.datasource.write.precombine.field=ts
```

Make sure you set the right values for the partition path, file ID and base instant time. This should help with our triaging.
Re: [PR] initial commit for blogs [hudi]
nfarah86 commented on PR #10825: URL: https://github.com/apache/hudi/pull/10825#issuecomment-1979904950 @bhasudha https://github.com/apache/hudi/assets/5392555/0e6b0c32-be3b-43ca-855e-e44a8aa2405d
[PR] initial commit for blogs [hudi]
nfarah86 opened a new pull request, #10825:
URL: https://github.com/apache/hudi/pull/10825

### Change Logs

updated blog

### Impact

none

### Risk level (write none, low medium or high below)

low

### Documentation Update

updated blogs

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Re: [PR] initial commit for hudi blogs [hudi]
bhasudha commented on code in PR #10719: URL: https://github.com/apache/hudi/pull/10719#discussion_r1513686982 ## website/blog/2024-01-17-Enforce-fine-grained-access-control-on-Open-Table-Formats-via-Amazon-EMR-integrated-with-AWS-Lake-Formation.mdx: ## @@ -0,0 +1,23 @@ +--- +title: "Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation" +excerpt: "Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation" +author: Raymond Lai, Aditya Shah, Bin Wang, and Melody Yang +category: blog +image: /assets/images/blog/2024-01-17-Enforce-fine-grained-access-control-on-Open-Table-Formats-via-Amazon-EMR-integrated-with-AWS-Lake-Formation.png +tags: +- blog +- apache hudi +- aws +- intermediate +- amazon emr +- aws lake formation +- aws glue +- aws s3 +- amazon sagemaker +- aws cloud9 +- amazon athena Review Comment: can we add `access control` as well ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7418] Create a common method for filtering in S3 and GCS sources and add tests for filtering out extensions [hudi]
hudi-bot commented on PR #10724: URL: https://github.com/apache/hudi/pull/10724#issuecomment-1979901293 ## CI report: * ead7905ee6e86f7ad3f3ad63f954592fde08502b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] initial commit for hudi blogs [hudi]
bhasudha commented on code in PR #10719: URL: https://github.com/apache/hudi/pull/10719#discussion_r1513685994 ## website/blog/2024-01-18-Deleting-Items-from-Apache-Hudi-using-Delta-Streamer-in-UPSERT-Mode-with-Kafka-Avro-Messages.mdx: ## @@ -0,0 +1,23 @@ +--- +title: "Deleting Items from Apache Hudi using Delta Streamer in UPSERT Mode with Kafka Avro Messages" +excerpt: "Deleting Items from Apache Hudi using Delta Streamer in UPSERT Mode with Kafka Avro Messages" +author: Soumil Shah +category: blog +image: /assets/images/blog/2024-01-18-Deleting-Items-from-Apache-Hudi-using-Delta-Streamer-in-UPSERT-Mode-with-Kafka-Avro-Messages.png +tags: +- blog +- apache hudi +- linkedin +- beginner +- hudi streamer +- deltastreamer +- apache kafka +- apache avro +- upsert Review Comment: add `deletes` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] initial commit for hudi blogs [hudi]
bhasudha commented on code in PR #10719: URL: https://github.com/apache/hudi/pull/10719#discussion_r1513683042 ## website/blog/2024-01-20-Data-Engineering-Bootstrapping-Data-lake-with-Apache-Hudi.mdx: ## @@ -0,0 +1,20 @@ +--- +title: "Data Engineering: Bootstrapping Data lake with Apache Hudi" +excerpt: "Data Engineering: Bootstrapping Data lake with Apache Hudi" +author: Krishna Prasad +category: blog +image: /assets/images/blog/2024-01-20-Data-Engineering-Bootstrapping-Data-lake-with-Apache-Hudi.png +tags: +- blog +- apache hudi +- medium +- intermediate Review Comment: this seems beginner level. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-2706] refactor spark-sql to make consistent with DataFrame api [hudi]
danny0405 commented on code in PR #3936:
URL: https://github.com/apache/hudi/pull/3936#discussion_r1513682928

## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/DeleteHoodieTableCommand.scala: ##
@@ -56,32 +57,36 @@ case class DeleteHoodieTableCommand(deleteTable: DeleteFromTable) extends Runnab
   }

   private def buildHoodieConfig(sparkSession: SparkSession): Map[String, String] = {
-    val targetTable = sparkSession.sessionState.catalog
-      .getTableMetadata(tableId)
+    val targetTable = sparkSession.sessionState.catalog.getTableMetadata(tableId)
+    val tblProperties = targetTable.storage.properties ++ targetTable.properties
     val path = getTableLocation(targetTable, sparkSession)
     val conf = sparkSession.sessionState.newHadoopConf()
     val metaClient = HoodieTableMetaClient.builder()
       .setBasePath(path)
       .setConf(conf)
       .build()
     val tableConfig = metaClient.getTableConfig
-    val primaryColumns = HoodieOptionConfig.getPrimaryColumns(targetTable.storage.properties)

Review Comment:
cc @boneanxs for taking a look if you have time~
Re: [PR] initial commit for hudi blogs [hudi]
bhasudha commented on code in PR #10719: URL: https://github.com/apache/hudi/pull/10719#discussion_r1513680939 ## website/blog/2024-01-30-Leverage-Partition-Paths-of-your-data-lake-tables-to-Optimize-Data-Retrieval-Costs-on-the-cloud.mdx: ## @@ -0,0 +1,19 @@ +--- +title: "Leverage Partition Paths of your data lake tables to Optimize Data Retrieval Costs on the cloud" +excerpt: "Leverage Partition Paths of your data lake tables to Optimize Data Retrieval Costs on the cloud" +author: Krishna Prasad +category: blog +image: /assets/images/blog/2024-01-30-Leverage-Partition-Paths-of-your-data-lake-tables-to-Optimize-Data-Retrieval-Costs-on-the-cloud.png +tags: +- blog +- apache hudi +- medium +- intermediate +- aws glue Review Comment: add `partition` tag? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-2706] refactor spark-sql to make consistent with DataFrame api [hudi]
jonvex commented on code in PR #3936:
URL: https://github.com/apache/hudi/pull/3936#discussion_r1513680318

## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/DeleteHoodieTableCommand.scala: ##
@@ -56,32 +57,36 @@ case class DeleteHoodieTableCommand(deleteTable: DeleteFromTable) extends Runnab
   }

   private def buildHoodieConfig(sparkSession: SparkSession): Map[String, String] = {
-    val targetTable = sparkSession.sessionState.catalog
-      .getTableMetadata(tableId)
+    val targetTable = sparkSession.sessionState.catalog.getTableMetadata(tableId)
+    val tblProperties = targetTable.storage.properties ++ targetTable.properties
     val path = getTableLocation(targetTable, sparkSession)
     val conf = sparkSession.sessionState.newHadoopConf()
     val metaClient = HoodieTableMetaClient.builder()
       .setBasePath(path)
       .setConf(conf)
       .build()
     val tableConfig = metaClient.getTableConfig
-    val primaryColumns = HoodieOptionConfig.getPrimaryColumns(targetTable.storage.properties)

Review Comment:
@YannByron @xushiyan @danny0405 @leesf: Do we have context on why the case sensitivity was changed here? It looks like case sensitivity is currently broken with spark-sql Merge Into. We are working towards a fix, but want to make sure we don't unintentionally break something else in case this piece of code was written this way on purpose.
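To make the case-sensitivity question concrete, here is a minimal sketch of resolving a user-supplied column name against a table's declared fields. It is not Hudi or Spark code: `ColumnNameResolver` and `resolve` are hypothetical names invented for illustration, and the `caseSensitive` flag only mirrors the idea behind Spark's `spark.sql.caseSensitive` setting.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not part of Hudi or Spark): resolves a requested
// column name against the field names declared on the table.
final class ColumnNameResolver {
    private ColumnNameResolver() {}

    // Returns the field name as declared in the table, or empty if no field matches.
    static Optional<String> resolve(List<String> tableFields, String requested, boolean caseSensitive) {
        for (String field : tableFields) {
            boolean matches = caseSensitive
                ? field.equals(requested)
                : field.equalsIgnoreCase(requested);
            if (matches) {
                return Optional.of(field);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Field names are illustrative only.
        List<String> fields = List.of("uuid", "partitionpath", "ts");
        System.out.println(resolve(fields, "UUID", false));
        System.out.println(resolve(fields, "UUID", true));
    }
}
```

Returning the declared name rather than the requested one lets downstream code keep using the table's own spelling, which is one plausible way to avoid mismatches between a SQL statement and stored table properties.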
Re: [PR] [HUDI-7138] Fix the conversion of typed properties to map in scala [hudi]
rmahindra123 closed pull request #10416: [HUDI-7138] Fix the conversion of typed properties to map in scala URL: https://github.com/apache/hudi/pull/10416 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7478] Fix max delta commits guard check w/ MDT [hudi]
danny0405 commented on code in PR #10820:
URL: https://github.com/apache/hudi/pull/10820#discussion_r1513671566

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ##
@@ -830,7 +830,7 @@ private static void deletePendingIndexingInstant(HoodieTableMetaClient metaClien
   protected static void checkNumDeltaCommits(HoodieTableMetaClient metaClient, int maxNumDeltaCommitsWhenPending) {
     final HoodieActiveTimeline activeTimeline = metaClient.reloadActiveTimeline();
     Option lastCompaction = activeTimeline.filterCompletedInstants()
-        .filter(s -> s.getAction().equals(COMPACTION_ACTION)).lastInstant();
+        .filter(s -> s.getAction().equals(COMMIT_ACTION)).lastInstant();

Review Comment:
I'm wondering whether we should use the `COMPACTION_ACTION` for committed instant in release 1.0.x, cc @vinothchandar ~
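The effect of choosing one action name over another in that filter can be sketched with a toy timeline. This is not the Hudi implementation: `DeltaCommitGuard`, `Instant`, and the action strings used below are simplified stand-ins assumed for illustration, and the list is assumed to be sorted by timestamp.

```java
import java.util.List;
import java.util.Optional;

// Toy model of a completed timeline; all names here are hypothetical.
final class DeltaCommitGuard {
    static final class Instant {
        final String timestamp;
        final String action;

        Instant(String timestamp, String action) {
            this.timestamp = timestamp;
            this.action = action;
        }
    }

    // Last completed instant with the given action, if any.
    static Optional<Instant> lastWithAction(List<Instant> completed, String action) {
        return completed.stream()
            .filter(i -> i.action.equals(action))
            .reduce((first, second) -> second);
    }

    // Counts delta commits strictly after the last instant with anchorAction.
    // If no such instant exists, every delta commit is counted.
    static long deltaCommitsSince(List<Instant> completed, String anchorAction) {
        Optional<Instant> anchor = lastWithAction(completed, anchorAction);
        return completed.stream()
            .filter(i -> i.action.equals("deltacommit"))
            .filter(i -> anchor.map(a -> i.timestamp.compareTo(a.timestamp) > 0).orElse(true))
            .count();
    }

    public static void main(String[] args) {
        List<Instant> timeline = List.of(
            new Instant("001", "deltacommit"),
            new Instant("002", "commit"),
            new Instant("003", "deltacommit"),
            new Instant("004", "deltacommit"));
        System.out.println(deltaCommitsSince(timeline, "commit"));
        System.out.println(deltaCommitsSince(timeline, "compaction"));
    }
}
```

In this toy timeline, filtering on an action name that never appears among completed instants leaves the guard without an anchor, so it counts every delta commit. That is the kind of behavior difference the one-line change in the diff is concerned with.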
Re: [PR] initial commit for hudi blogs [hudi]
bhasudha commented on code in PR #10719: URL: https://github.com/apache/hudi/pull/10719#discussion_r1513668790 ## website/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.mdx: ## @@ -0,0 +1,16 @@ +--- +title: "How a POC became a production-ready Hudi data lakehouse through close team collaboration" +excerpt: "How a POC became a production-ready Hudi data lakehouse through close team collaboration" +author: leboncoin tech blog Review Comment: Author Info obtained from the blog : `By Xiaoxiao Rey, Data Engineer, and [Hussein Awala](https://medium.com/@hussein-awala), Senior Data Engineer` Can we use these author names? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] initial commit for hudi blogs [hudi]
bhasudha commented on code in PR #10719: URL: https://github.com/apache/hudi/pull/10719#discussion_r1513667248 ## website/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.mdx: ## @@ -0,0 +1,16 @@ +--- +title: "How a POC became a production-ready Hudi data lakehouse through close team collaboration" +excerpt: "How a POC became a production-ready Hudi data lakehouse through close team collaboration" +author: leboncoin tech blog +category: blog +image: /assets/images/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.png +tags: +- blog +- apache hudi +- medium +- beginner Review Comment: change to `use-case` ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] initial commit for hudi blogs [hudi]
bhasudha commented on code in PR #10719: URL: https://github.com/apache/hudi/pull/10719#discussion_r151373 ## website/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.mdx: ## @@ -0,0 +1,16 @@ +--- +title: "How a POC became a production-ready Hudi data lakehouse through close team collaboration" +excerpt: "How a POC became a production-ready Hudi data lakehouse through close team collaboration" +author: leboncoin tech blog +category: blog +image: /assets/images/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.png +tags: +- blog +- apache hudi +- medium Review Comment: This should be `leboncoin-tech-blog` instead of `medium` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] initial commit for hudi blogs [hudi]
bhasudha commented on code in PR #10719: URL: https://github.com/apache/hudi/pull/10719#discussion_r1513664527 ## website/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.mdx: ## @@ -0,0 +1,16 @@ +--- +title: "How a POC became a production-ready Hudi data lakehouse through close team collaboration" +excerpt: "How a POC became a production-ready Hudi data lakehouse through close team collaboration" +author: leboncoin tech blog +category: blog +image: /assets/images/blog/2024-02-12-How-a-POC-became-a-production-ready-Hudi-data-lakehouse-through-close-team-collaboration.png +tags: +- blog +- apache hudi +- medium Review Comment: Add tags such as `deletes` `gdpr deletion` `upserts` ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Data loss due to incorrect selection of log file during compaction [hudi]
nsivabalan commented on issue #10803: URL: https://github.com/apache/hudi/issues/10803#issuecomment-1979863694 Sorry about all the follow-up questions. Can you tell us which storage scheme you are using? Partial write failures should not happen with S3 or other cloud stores; with HDFS, we could see partial write failures. We are trying to gauge whether the second log file was properly formed or was corrupted by a partial write failure. If you have a backup of the data, let us know. We can share a tool that prints info about the log files (valid log blocks, number of valid records, etc.), which might help with our triaging.
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979854060 ## CI report: * 251ba740bd933494dafdfcd6be5393400c10bd0f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22803) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7478] Fix max delta commits guard check w/ MDT [hudi]
hudi-bot commented on PR #10820: URL: https://github.com/apache/hudi/pull/10820#issuecomment-1979854100 ## CI report: * 800fcd378e0bb95c53a81c3c60796c97ea53d821 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22794) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR][Test only] 1 [hudi]
yihua closed pull request #10824: [MINOR][Test only] 1 URL: https://github.com/apache/hudi/pull/10824 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [MINOR][Test only] 1 [hudi]
yihua opened a new pull request, #10824:
URL: https://github.com/apache/hudi/pull/10824

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

### Risk level (write none, low medium or high below)

_If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Re: [PR] [HUDI-7478] Fix max delta commits guard check w/ MDT [hudi]
hudi-bot commented on PR #10820: URL: https://github.com/apache/hudi/pull/10820#issuecomment-1979847082 ## CI report: * 800fcd378e0bb95c53a81c3c60796c97ea53d821 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22794) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979847047 ## CI report: * Unknown: [CANCELED](TBD) * 251ba740bd933494dafdfcd6be5393400c10bd0f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
yihua commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979831673 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7478] Fix max delta commits guard check w/ MDT [hudi]
wombatu-kun commented on PR #10820: URL: https://github.com/apache/hudi/pull/10820#issuecomment-1979821628 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7483) TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict
[ https://issues.apache.org/jira/browse/HUDI-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lin Liu updated HUDI-7483:
--
Description:
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=22801&view=logs&j=600e7de6-e133-5e69-e615-50ee129b3c08&t=bbbd7bcc-ae73-56b8-887a-cd2d6deaafc7&s=859b8d9a-8fd6-5a5c-6f5e-f84f1990894e

{code:java}
[ERROR] Tests run: 29, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.331 s <<< FAILURE! - in org.apache.hudi.client.TestHoodieClientMultiWriter
[ERROR] testMultiWriterWithAsyncTableServicesWithConflict{HoodieTableType, Class, ConflictResolutionStrategy}[6] Time elapsed: 16.083 s <<< ERROR!
java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: Expected org.apache.hudi.exception.HoodieWriteConflictException to be thrown, but nothing was thrown.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.client.TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict(TestHoodieClientMultiWriter.java:565)
{code}

was:
{code:java}
[ERROR] Tests run: 29, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.331 s <<< FAILURE! - in org.apache.hudi.client.TestHoodieClientMultiWriter
[ERROR] testMultiWriterWithAsyncTableServicesWithConflict{HoodieTableType, Class, ConflictResolutionStrategy}[6] Time elapsed: 16.083 s <<< ERROR!
java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: Expected org.apache.hudi.exception.HoodieWriteConflictException to be thrown, but nothing was thrown.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.client.TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict(TestHoodieClientMultiWriter.java:565)
{code}

> TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict
> -
>
> Key: HUDI-7483
> URL: https://issues.apache.org/jira/browse/HUDI-7483
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Lin Liu
> Priority: Major
>
> https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=22801&view=logs&j=600e7de6-e133-5e69-e615-50ee129b3c08&t=bbbd7bcc-ae73-56b8-887a-cd2d6deaafc7&s=859b8d9a-8fd6-5a5c-6f5e-f84f1990894e
> {code:java}
> [ERROR] Tests run: 29, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.331 s <<< FAILURE! - in org.apache.hudi.client.TestHoodieClientMultiWriter
> [ERROR] testMultiWriterWithAsyncTableServicesWithConflict{HoodieTableType, Class, ConflictResolutionStrategy}[6] Time elapsed: 16.083 s <<< ERROR!
> java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: Expected org.apache.hudi.exception.HoodieWriteConflictException to be thrown, but nothing was thrown.
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.hudi.client.TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict(TestHoodieClientMultiWriter.java:565)
> {code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7483) TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict
Lin Liu created HUDI-7483:
-
Summary: TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict
Key: HUDI-7483
URL: https://issues.apache.org/jira/browse/HUDI-7483
Project: Apache Hudi
Issue Type: Bug
Reporter: Lin Liu

{code:java}
[ERROR] Tests run: 29, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.331 s <<< FAILURE! - in org.apache.hudi.client.TestHoodieClientMultiWriter
[ERROR] testMultiWriterWithAsyncTableServicesWithConflict{HoodieTableType, Class, ConflictResolutionStrategy}[6] Time elapsed: 16.083 s <<< ERROR!
java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: Expected org.apache.hudi.exception.HoodieWriteConflictException to be thrown, but nothing was thrown.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.client.TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict(TestHoodieClientMultiWriter.java:565)
{code}
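The shape of that stack trace (an `ExecutionException` raised from `FutureTask.get` with an assertion error as its cause) can be reproduced with plain JDK classes. The sketch below is an illustration only, not the Hudi test: `FutureFailureDemo` and `causeOfFailure` are hypothetical names, and a plain `AssertionError` stands in for `org.opentest4j.AssertionFailedError`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class: shows how a failure inside a submitted task
// surfaces on the calling thread when the Future is read.
final class FutureFailureDemo {
    private FutureFailureDemo() {}

    // Runs the task on another thread and returns whatever it threw,
    // or null if it completed normally.
    static Throwable causeOfFailure(Runnable task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = pool.submit(task);
            future.get(); // rethrows task failures wrapped in ExecutionException
            return null;
        } catch (ExecutionException e) {
            return e.getCause(); // the original error raised inside the task
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Throwable cause = causeOfFailure(() -> {
            // Stand-in for an assertion that expected an exception which never came.
            throw new AssertionError("Expected exception to be thrown, but nothing was thrown.");
        });
        System.out.println(cause);
    }
}
```

Reading the cause rather than the wrapper is what points at the real problem in the report above: the assertion inside the concurrent writer expected a `HoodieWriteConflictException` that was never raised.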
Re: [I] [SUPPORT] java.lang.NoClassDefFoundError: org/apache/hudi/com/fasterxml/jackson/module/scala/DefaultScalaModule$ when doing an Incremental CDC Query in 0.14.1 [hudi]
Tyler-Rendina commented on issue #10590: URL: https://github.com/apache/hudi/issues/10590#issuecomment-1979738484 Is there a way to manually add the class after importing the spark bundle? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7473] Rebalance CI [hudi]
hudi-bot commented on PR #10805: URL: https://github.com/apache/hudi/pull/10805#issuecomment-1979723667 ## CI report: * c7c575df44ea9bf7f7b26587e26116d93955b2e2 UNKNOWN * 572c0ce5761d4e06fdf8ceebf808a9850d499bbb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22799) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] hudi 0.14.1 and hudi 0.14.0 build issue [hudi]
yihua commented on issue #10808: URL: https://github.com/apache/hudi/issues/10808#issuecomment-1979684160 I've updated the Spark 3.5 support PR to have label `release-0.15.0` instead of `release-0.14.1`.
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979663466 ## CI report: * 04833166c6b9d859f7d0d7b26eb54ec6938a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22802) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [MINOR] Publish test results from the containerized job to Azure [hudi]
hudi-bot commented on PR #10818: URL: https://github.com/apache/hudi/pull/10818#issuecomment-1979652823 ## CI report: * c2ee3d3e53250fc6757172510018026a026d0bbe Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22800) * 04833166c6b9d859f7d0d7b26eb54ec6938a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build