[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1269321846

## CI report:

* 288d166c49602a4593b1e97763a467811903737d UNKNOWN
* f21eab07069aa87544e04b115e7463126cd9c472 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12015)
* 3dd9e31fa787ee2c4308bca9b2fe691566c51ec5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12022)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1269318141

## CI report:

* 288d166c49602a4593b1e97763a467811903737d UNKNOWN
* f21eab07069aa87544e04b115e7463126cd9c472 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12015)
* 3dd9e31fa787ee2c4308bca9b2fe691566c51ec5 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6836: [HUDI-4952] Fixing reading from metadata table when there are no inflight commits
hudi-bot commented on PR #6836:
URL: https://github.com/apache/hudi/pull/6836#issuecomment-1269280015

## CI report:

* e246d65957362860b850f1af9ef973b85bf1a4eb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12017)
[GitHub] [hudi] hudi-bot commented on pull request #6745: Fix comment in RFC46
hudi-bot commented on PR #6745:
URL: https://github.com/apache/hudi/pull/6745#issuecomment-1269274673

## CI report:

* af8e58757bed12e53907076da02add1ba98b220c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12014)
* 261adecadc91712a222905082cad122befe81566 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12021)
[GitHub] [hudi] hudi-bot commented on pull request #6745: Fix comment in RFC46
hudi-bot commented on PR #6745:
URL: https://github.com/apache/hudi/pull/6745#issuecomment-1269271854

## CI report:

* af8e58757bed12e53907076da02add1ba98b220c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12014)
* 261adecadc91712a222905082cad122befe81566 UNKNOWN
[jira] [Updated] (HUDI-4605) Upgrade hudi-presto-bundle version to 0.12.0
[ https://issues.apache.org/jira/browse/HUDI-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4605:
-----------------------------
    Fix Version/s: 0.12.2

> Upgrade hudi-presto-bundle version to 0.12.0
> --------------------------------------------
>
>          Key: HUDI-4605
>          URL: https://issues.apache.org/jira/browse/HUDI-4605
>      Project: Apache Hudi
>   Issue Type: Task
>     Reporter: Sagar Sumit
>     Assignee: Ethan Guo
>     Priority: Major
>      Fix For: 0.12.2

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6876: [MINOR] Handling null event time
hudi-bot commented on PR #6876:
URL: https://github.com/apache/hudi/pull/6876#issuecomment-1269268926

## CI report:

* 6ce255ff0537ecb4ecf9bf7cf7f2534f7021337b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12019)
[jira] [Updated] (HUDI-4522) [DOCS] Set presto session prop to use parquet column names in case of type mismatch
[ https://issues.apache.org/jira/browse/HUDI-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4522:
-----------------------------
    Fix Version/s: 0.13.0
        (was: 0.12.0)

> [DOCS] Set presto session prop to use parquet column names in case of type mismatch
>
>          Key: HUDI-4522
>          URL: https://issues.apache.org/jira/browse/HUDI-4522
>      Project: Apache Hudi
>   Issue Type: Task
>     Reporter: Sagar Sumit
>     Assignee: Léo Biscassi
>     Priority: Major
>       Labels: pull-request-available
>      Fix For: 0.13.0
>
> See https://github.com/apache/hudi/issues/6142
[GitHub] [hudi] hudi-bot commented on pull request #6862: [HUDI-4989] fixing deltastreamer init failures
hudi-bot commented on PR #6862:
URL: https://github.com/apache/hudi/pull/6862#issuecomment-1269268848

## CI report:

* 149aec6ea8ff6d895da07b0226be1efdf920e3d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12016)
[jira] [Updated] (HUDI-4522) [DOCS] Set presto session prop to use parquet column names in case of type mismatch
[ https://issues.apache.org/jira/browse/HUDI-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4522:
-----------------------------
    Fix Version/s: 0.12.0
        (was: 0.13.0)

> [DOCS] Set presto session prop to use parquet column names in case of type mismatch
>
>          Key: HUDI-4522
>          URL: https://issues.apache.org/jira/browse/HUDI-4522
>      Project: Apache Hudi
>   Issue Type: Task
>     Reporter: Sagar Sumit
>     Assignee: Léo Biscassi
>     Priority: Major
>       Labels: pull-request-available
>      Fix For: 0.12.0
>
> See https://github.com/apache/hudi/issues/6142
[jira] [Updated] (HUDI-3210) [UMBRELLA] Native Presto connector for Hudi
[ https://issues.apache.org/jira/browse/HUDI-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3210:
-----------------------------
    Summary: [UMBRELLA] Native Presto connector for Hudi  (was: [UMBRELLA] A new Presto connector for Hudi)

> [UMBRELLA] Native Presto connector for Hudi
> -------------------------------------------
>
>          Key: HUDI-3210
>          URL: https://issues.apache.org/jira/browse/HUDI-3210
>      Project: Apache Hudi
>   Issue Type: Epic
>   Components: trino-presto
>     Reporter: Todd Gao
>     Assignee: Sagar Sumit
>     Priority: Major
>      Fix For: 0.13.0, 1.0.0
>
> This JIRA tracks all the tasks related to building a new Hudi connector in Presto.
[jira] [Closed] (HUDI-4988) Add Docs regarding Hudi RecordMerger
[ https://issues.apache.org/jira/browse/HUDI-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank Wong closed HUDI-4988.
----------------------------
    Resolution: Fixed

> Add Docs regarding Hudi RecordMerger
>
>          Key: HUDI-4988
>          URL: https://issues.apache.org/jira/browse/HUDI-4988
>      Project: Apache Hudi
>   Issue Type: Improvement
>     Reporter: Alexey Kudinkin
>     Assignee: Frank Wong
>     Priority: Critical
>
> We need to make sure that we're adding docs explaining:
> - the RecordMerger component, its API and lifecycle
> - its relationship w/ the Merging Strategy
> - its current limitations (and future evolution)
[jira] [Assigned] (HUDI-4988) Add Docs regarding Hudi RecordMerger
[ https://issues.apache.org/jira/browse/HUDI-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank Wong reassigned HUDI-4988:
--------------------------------
    Assignee: Frank Wong

> Add Docs regarding Hudi RecordMerger
[jira] [Assigned] (HUDI-3217) RFC-46: Optimize Record Payload handling
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank Wong reassigned HUDI-3217:
--------------------------------
    Assignee: Alexey Kudinkin  (was: Frank Wong)

> RFC-46: Optimize Record Payload handling
> ----------------------------------------
>
>          Key: HUDI-3217
>          URL: https://issues.apache.org/jira/browse/HUDI-3217
>      Project: Apache Hudi
>   Issue Type: Epic
>   Components: storage-management, writer-core
>     Reporter: Alexey Kudinkin
>     Assignee: Alexey Kudinkin
>     Priority: Blocker
>       Labels: hudi-umbrellas, pull-request-available
>      Fix For: 0.13.0
>
> Currently Hudi is biased t/w the assumption of a particular payload representation (Avro); long-term we would like to steer away from this and keep the record payload completely opaque, so that:
> # We can keep the record payload representation engine-specific
> # We avoid unnecessary serde loops (Engine-specific > Avro > Engine-specific > Binary)
>
> h2. *Proposal*
>
> *Phase 2: Revisiting Record Handling*
> {_}T-shirt{_}: 2-2.5 weeks
> {_}Goal{_}: Avoid tight coupling with a particular record representation on the read path (currently Avro) and enable:
> * Revisit RecordPayload APIs
> ** Deprecate the {{getInsertValue}} and {{combineAndGetUpdateValue}} APIs, replacing them w/ new "opaque" APIs (not returning Avro payloads)
> ** Rebase the RecordPayload hierarchy to be engine-specific:
> *** Common engine-specific base abstracting common functionality (Spark, Flink, Java)
> *** Each feature-specific semantic will have to be implemented for all engines
> ** Introduce new APIs:
> *** To access keys (record, partition)
> *** To convert a record to Avro (for backwards compatibility)
> * Revisit RecordPayload handling
> ** In WriteHandles:
> *** The API will accept an opaque RecordPayload (no Avro conversion)
> *** Can do (opaque) record merging if necessary
> *** Passes the RecordPayload as-is to the FileWriter
> ** In FileWriters:
> *** Will accept the RecordPayload interface
> *** Should be engine-specific (to handle the internal record representation)
> ** In RecordReaders:
> *** The API will provide an opaque RecordPayload (no Avro conversion)
>
> REF: https://app.clickup.com/18029943/v/dc/h67bq-1900/h67bq-6680
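The proposal above deprecates the Avro-returning payload APIs in favor of engine-opaque record handling. Purely as an illustration of that idea, here is a hypothetical Java sketch; `OpaqueRecord`, `RecordMerger`, `MapRecord`, and `OverwriteMerger` are invented names for this sketch, not actual Hudi classes:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an engine-opaque record plus a merge API that never
// round-trips through Avro. Names are illustrative, not Hudi's real API.
interface OpaqueRecord {
    String recordKey();      // access keys without materializing an Avro record
    String partitionPath();
}

interface RecordMerger<R extends OpaqueRecord> {
    // Merge an older and newer version of the same key; the representation
    // stays engine-specific (Spark row, Flink row, plain Java object, ...).
    R merge(R older, R newer);
}

// A trivial stand-in "engine": a record is a key/partition plus a field map.
final class MapRecord implements OpaqueRecord {
    final String key;
    final String partition;
    final Map<String, Object> fields;
    MapRecord(String key, String partition, Map<String, Object> fields) {
        this.key = key;
        this.partition = partition;
        this.fields = fields;
    }
    public String recordKey() { return key; }
    public String partitionPath() { return partition; }
}

final class OverwriteMerger implements RecordMerger<MapRecord> {
    // Latest-wins semantics: fields of the newer record overwrite the older ones.
    public MapRecord merge(MapRecord older, MapRecord newer) {
        Map<String, Object> merged = new HashMap<>(older.fields);
        merged.putAll(newer.fields);
        return new MapRecord(newer.recordKey(), newer.partitionPath(), merged);
    }
}
```

The point of the shape: merge logic depends only on the opaque interface, so no step in the write path is forced through an Avro conversion.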
[jira] [Reopened] (HUDI-3217) RFC-46: Optimize Record Payload handling
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank Wong reopened HUDI-3217:
------------------------------

> RFC-46: Optimize Record Payload handling
[jira] [Updated] (HUDI-3217) RFC-46: Optimize Record Payload handling
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank Wong updated HUDI-3217:
-----------------------------
    Status: In Progress  (was: Reopened)

> RFC-46: Optimize Record Payload handling
[hudi] branch master updated: [MINOR] Fix deploy script for flink 1.15 (#6872)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new fd8a947e61 [MINOR] Fix deploy script for flink 1.15 (#6872)

commit fd8a947e6158ed848c7bb2efb272d833ae5c6442
Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Thu Oct 6 10:52:38 2022 +0800

    [MINOR] Fix deploy script for flink 1.15 (#6872)
---
 scripts/release/deploy_staging_jars.sh     | 2 +-
 scripts/release/validate_staged_bundles.sh | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/release/deploy_staging_jars.sh b/scripts/release/deploy_staging_jars.sh
index df4ac84efa..b6b035877e 100755
--- a/scripts/release/deploy_staging_jars.sh
+++ b/scripts/release/deploy_staging_jars.sh
@@ -43,7 +43,7 @@ declare -a ALL_VERSION_OPTS=(
 "-Dscala-2.11 -Dspark2.4 -Dflink1.13"
 "-Dscala-2.11 -Dspark2.4 -Dflink1.14"
 "-Dscala-2.12 -Dspark2.4 -Dflink1.13"
-"-Dscala-2.12 -Dspark3.3 -Dflink1.14"
+"-Dscala-2.12 -Dspark3.3 -Dflink1.15"
 "-Dscala-2.12 -Dspark3.2 -Dflink1.14"
 "-Dscala-2.12 -Dspark3.1 -Dflink1.14" # run this last to make sure utilities bundle has spark 3.1
 )

diff --git a/scripts/release/validate_staged_bundles.sh b/scripts/release/validate_staged_bundles.sh
index db99dcba12..baf506f944 100755
--- a/scripts/release/validate_staged_bundles.sh
+++ b/scripts/release/validate_staged_bundles.sh
@@ -34,6 +34,7 @@ declare -a BUNDLE_URLS=(
 "${STAGING_REPO}/hudi-flink1.13-bundle_2.12/${VERSION}/hudi-flink1.13-bundle_2.12-${VERSION}.jar"
 "${STAGING_REPO}/hudi-flink1.14-bundle_2.11/${VERSION}/hudi-flink1.14-bundle_2.11-${VERSION}.jar"
 "${STAGING_REPO}/hudi-flink1.14-bundle_2.12/${VERSION}/hudi-flink1.14-bundle_2.12-${VERSION}.jar"
+"${STAGING_REPO}/hudi-flink1.15-bundle/${VERSION}/hudi-flink1.15-bundle-${VERSION}.jar"
 "${STAGING_REPO}/hudi-gcp-bundle/${VERSION}/hudi-gcp-bundle-${VERSION}.jar"
 "${STAGING_REPO}/hudi-hadoop-mr-bundle/${VERSION}/hudi-hadoop-mr-bundle-${VERSION}.jar"
 "${STAGING_REPO}/hudi-hive-sync-bundle/${VERSION}/hudi-hive-sync-bundle-${VERSION}.jar"
[GitHub] [hudi] xushiyan merged pull request #6872: [MINOR] Fix deploy script for flink 1.15
xushiyan merged PR #6872:
URL: https://github.com/apache/hudi/pull/6872
[GitHub] [hudi] hudi-bot commented on pull request #6876: [MINOR] Handling null event time
hudi-bot commented on PR #6876:
URL: https://github.com/apache/hudi/pull/6876#issuecomment-1269234032

## CI report:

* 6ce255ff0537ecb4ecf9bf7cf7f2534f7021337b UNKNOWN
[jira] [Updated] (HUDI-4986) Enhance hudi integ test readme for multi-writer tests
[ https://issues.apache.org/jira/browse/HUDI-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4986:
---------------------------------
    Labels: pull-request-available  (was: )

> Enhance hudi integ test readme for multi-writer tests
> -----------------------------------------------------
>
>          Key: HUDI-4986
>          URL: https://issues.apache.org/jira/browse/HUDI-4986
>      Project: Apache Hudi
>   Issue Type: Improvement
>   Components: docs, tests-ci
>     Reporter: sivabalan narayanan
>     Assignee: sivabalan narayanan
>     Priority: Major
>       Labels: pull-request-available
>      Fix For: 0.13.0
[hudi] branch master updated: Enhancing README for multi-writer tests (#6870)
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 280194d3b6 Enhancing README for multi-writer tests (#6870)

commit 280194d3b6ef6e2181a137dd709f0c8e80d5de3a
Author: Sivabalan Narayanan
AuthorDate: Wed Oct 5 19:41:52 2022 -0700

    Enhancing README for multi-writer tests (#6870)
---
 hudi-integ-test/README.md | 52 ++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/hudi-integ-test/README.md b/hudi-integ-test/README.md
index 687ad9a2a9..bea9219294 100644
--- a/hudi-integ-test/README.md
+++ b/hudi-integ-test/README.md
@@ -525,7 +525,44 @@ Spark submit with the flag:
 ### Multi-writer tests
 Integ test framework also supports multi-writer tests.
-Multi-writer tests with deltastreamer and a spark data source writer.
+Multi-writer tests with deltastreamer and a spark data source writer.
+
+Props of interest
+
+Top-level configs:
+- --target-base-path refers to the target hudi table base path.
+- --input-base-paths comma-separated input paths. If you plan to spin up two writers, this should contain the input dir for both.
+- --props-paths comma-separated property file paths. Again, if you plan to spin up two writers, this should contain the property file for each writer.
+- --workload-yaml-paths comma-separated workload yaml files for each writer.
+
+Configs in the property file:
+- hoodie.deltastreamer.source.dfs.root: should refer to the input dir for each writer in its corresponding property file. In other words, this should match w/ --input-base-paths.
+- hoodie.deltastreamer.schemaprovider.target.schema.file: refers to the target schema. If you are running in docker, do copy the source avsc file to docker as well.
+- hoodie.deltastreamer.schemaprovider.source.schema.file: refers to the source schema. Same as above (copy to docker if needed).
+
+We have sample properties files to use based on whether InProcessLockProvider or ZookeeperBasedLockProvider is used.
+
+multi-writer-local-1.properties
+multi-writer-local-2.properties
+multi-writer-local-3.properties
+multi-writer-local-4.properties
+
+These have configs that use InProcessLockProvider. The config specific to InProcessLockProvider is:
+hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider
+
+multi-writer-1.properties
+multi-writer-2.properties
+
+These have configs that use ZookeeperBasedLockProvider. Setting up zookeeper is outside the scope of this README. Ensure
+zookeeper is up before running these. Configs specific to ZookeeperBasedLockProvider:
+
+hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+hoodie.write.lock.zookeeper.url=zookeeper:2181
+hoodie.write.lock.zookeeper.port=2181
+hoodie.write.lock.zookeeper.lock_key=locks
+hoodie.write.lock.zookeeper.base_path=/tmp/.locks
+
+If you are running locally, ensure you update the schema file accordingly.
 Sample spark-submit command to test one delta streamer and a spark data source writer.
 ```shell
@@ -593,6 +630,19 @@ Sample spark-submit command to test one delta streamer and a spark data source w
 --use-hudi-data-to-generate-updates
 ```
+
+Properties that differ between the previous scenario and this one:
+- --input-base-paths refers to 4 paths instead of 2.
+- --props-paths again refers to 4 paths instead of 2. Each property file will contain properties for one spark datasource writer.
+- --workload-yaml-paths refers to 4 paths instead of 2. Each yaml file uses a different range of partitions so that there won't be any conflicts while doing concurrent writes.
+
+MOR table:
+Running multi-writer tests for COW works for the entire iteration, but w/ a MOR table, sometimes one of the writers could fail stating that
+there is already a scheduled delta commit. In general, while scheduling compaction, there should not be inflight delta commits.
+But w/ multiple threads ingesting at their own frequency, this is unavoidable. After a few iterations, one of your threads could
+die because there is an inflight delta commit from another writer.
+
 ===
 ### Testing async table services
 We can test async table services with deltastreamer using below command. 3 additional arguments are required to test async
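Pulling together the configs the README changes above describe, a single writer's property file might look like the following minimal sketch. This is an illustration only: the paths are placeholders, and the lock-provider line is the InProcessLockProvider variant named in the README.

```properties
# Hypothetical property file for writer 1 (all paths are placeholders)
hoodie.deltastreamer.source.dfs.root=/tmp/hudi-integ-test/input/writer-1
hoodie.deltastreamer.schemaprovider.source.schema.file=/tmp/hudi-integ-test/source.avsc
hoodie.deltastreamer.schemaprovider.target.schema.file=/tmp/hudi-integ-test/source.avsc
# Single-process multi-writer runs can use the in-process lock provider:
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider
```

Each additional writer would get its own copy with a distinct `hoodie.deltastreamer.source.dfs.root`, matching the order of `--input-base-paths`.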
[GitHub] [hudi] codope merged pull request #6870: [HUDI-4986] Enhancing README for multi-writer tests
codope merged PR #6870:
URL: https://github.com/apache/hudi/pull/6870
[hudi] branch master updated: [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create (#6857)
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new fb4f026580 [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create (#6857)

commit fb4f02658050a74179338d4cfba07ceabe688c53
Author: Sagar Sumit
AuthorDate: Thu Oct 6 08:11:35 2022 +0530

    [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create (#6857)
---
 .../apache/hudi/cli/commands/TestUpgradeDowngradeCommand.java | 6 +++---
 .../org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java | 6 +++---
 .../main/java/org/apache/hudi/common/config/HoodieConfig.java | 9 +
 hudi-kafka-connect/README.md | 11 +++
 .../sql/hudi/procedure/TestUpgradeOrDowngradeProcedure.scala | 5 +++--
 5 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestUpgradeDowngradeCommand.java b/hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestUpgradeDowngradeCommand.java
index ed4c952824..ff983d44ae 100644
--- a/hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestUpgradeDowngradeCommand.java
+++ b/hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestUpgradeDowngradeCommand.java
@@ -164,9 +164,9 @@ public class TestUpgradeDowngradeCommand extends CLIFunctionalTestHarness {
     Path propertyFile = new Path(metaClient.getMetaPath() + "/" + HoodieTableConfig.HOODIE_PROPERTIES_FILE);
     // Load the properties and verify
     FSDataInputStream fsDataInputStream = metaClient.getFs().open(propertyFile);
-    HoodieConfig hoodieConfig = HoodieConfig.create(fsDataInputStream);
+    HoodieConfig config = new HoodieConfig();
+    config.getProps().load(fsDataInputStream);
     fsDataInputStream.close();
-    assertEquals(Integer.toString(expectedVersion.versionCode()), hoodieConfig
-        .getString(HoodieTableConfig.VERSION));
+    assertEquals(Integer.toString(expectedVersion.versionCode()), config.getString(HoodieTableConfig.VERSION));
   }
 }

diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java
index 39dbacabac..64ee23c35e 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/upgrade/TestUpgradeDowngrade.java
@@ -770,9 +770,9 @@ public class TestUpgradeDowngrade extends HoodieClientTestBase {
     Path propertyFile = new Path(metaClient.getMetaPath() + "/" + HoodieTableConfig.HOODIE_PROPERTIES_FILE);
     // Load the properties and verify
     FSDataInputStream fsDataInputStream = metaClient.getFs().open(propertyFile);
-    HoodieConfig hoodieConfig = HoodieConfig.create(fsDataInputStream);
+    HoodieConfig config = new HoodieConfig();
+    config.getProps().load(fsDataInputStream);
     fsDataInputStream.close();
-    assertEquals(Integer.toString(expectedVersion.versionCode()), hoodieConfig
-        .getString(HoodieTableConfig.VERSION));
+    assertEquals(Integer.toString(expectedVersion.versionCode()), config.getString(HoodieTableConfig.VERSION));
   }
 }

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieConfig.java b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieConfig.java
index 366d19fe6e..91f0671cf9 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieConfig.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieConfig.java
@@ -18,15 +18,14 @@
 package org.apache.hudi.common.config;

-import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.ReflectionUtils;
 import org.apache.hudi.common.util.StringUtils;
 import org.apache.hudi.exception.HoodieException;
+
 import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;

-import java.io.IOException;
 import java.io.Serializable;
 import java.lang.reflect.Modifier;
 import java.util.Arrays;
@@ -42,12 +41,6 @@ public class HoodieConfig implements Serializable {

   protected static final String CONFIG_VALUES_DELIMITER = ",";

-  public static HoodieConfig create(FSDataInputStream inputStream) throws IOException {
-    HoodieConfig config = new HoodieConfig();
-    config.props.load(inputStream);
-    return config;
-  }
-
   protected TypedProperties props;

   public HoodieConfig() {

diff --git a/hudi-kafka-connect/README.md b/hudi-kafka-connect/README.md
index 449236ea5c..a1d6f812c1 100644
--- a/hudi-kafka-connect/README.md
+++ b/hudi-kafka-connect/README.md
@@ -36,10 +36,10 @@ After installing these dependencies, follow steps based on your requirement.

 ### 1 -
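The refactor above removes the `HoodieConfig.create(FSDataInputStream)` helper; callers now construct the config and load the stream into its properties directly (`config.getProps().load(fsDataInputStream)`). Under the hood that is plain `java.util.Properties.load`. A minimal self-contained sketch of the pattern, using an in-memory stream as a stand-in for the `hoodie.properties` file (the class name and sample keys are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropsLoadSketch {
    // Mirrors the post-refactor call site: construct first, then load the stream.
    static Properties load(InputStream in) throws IOException {
        Properties props = new Properties(); // stand-in for new HoodieConfig().getProps()
        props.load(in);                      // java.util.Properties.load parses key=value lines
        in.close();
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for metaClient.getFs().open(propertyFile)
        InputStream in = new ByteArrayInputStream(
            "hoodie.table.version=5\nhoodie.table.name=demo\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(in).getProperty("hoodie.table.version"));
    }
}
```

The design upside of dropping the static factory is that `HoodieConfig` no longer needs a compile-time dependency on Hadoop's `FSDataInputStream`: any `InputStream` source works.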
[GitHub] [hudi] codope merged pull request #6857: [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create
codope merged PR #6857:
URL: https://github.com/apache/hudi/pull/6857
[GitHub] [hudi] codope commented on pull request #6857: [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create
codope commented on PR #6857:
URL: https://github.com/apache/hudi/pull/6857#issuecomment-1269231905

Landing it. Just a readme and test update.
[hudi] branch master updated: Revert "[HUDI-4915] improve avro serializer/deserializer (#6788)" (#6809)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 067cc24d88 Revert "[HUDI-4915] improve avro serializer/deserializer (#6788)" (#6809)
067cc24d88 is described below

commit 067cc24d88fd299f1dfc8b96a1995621799613d4
Author: Yann Byron
AuthorDate: Thu Oct 6 10:40:55 2022 +0800

    Revert "[HUDI-4915] improve avro serializer/deserializer (#6788)" (#6809)

    This reverts commit 79b3e2b899cc303490c22610fda0e5ac2013cf02.
---
 .../org/apache/spark/sql/avro/AvroDeserializer.scala | 20 +---
 .../org/apache/spark/sql/avro/AvroSerializer.scala   | 17 +++--
 .../org/apache/spark/sql/avro/AvroDeserializer.scala | 20 +---
 .../org/apache/spark/sql/avro/AvroSerializer.scala   | 19 ---
 .../org/apache/spark/sql/avro/AvroDeserializer.scala | 20 +---
 .../org/apache/spark/sql/avro/AvroSerializer.scala   | 19 ---
 .../org/apache/spark/sql/avro/AvroDeserializer.scala | 20 +---
 .../org/apache/spark/sql/avro/AvroSerializer.scala   | 19 ---
 8 files changed, 99 insertions(+), 55 deletions(-)

diff --git a/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala b/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
index 921e6deb58..9725fb63f5 100644
--- a/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
+++ b/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
@@ -49,27 +49,33 @@ import scala.collection.mutable.ArrayBuffer
 class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
 
   private lazy val decimalConversions = new DecimalConversion()
 
-  def deserialize(data: Any): Any = rootCatalystType match {
+  private val converter: Any => Any = rootCatalystType match {
     // A shortcut for empty schema.
     case st: StructType if st.isEmpty =>
-      InternalRow.empty
+      (data: Any) => InternalRow.empty
 
     case st: StructType =>
       val resultRow = new SpecificInternalRow(st.map(_.dataType))
       val fieldUpdater = new RowUpdater(resultRow)
       val writer = getRecordWriter(rootAvroType, st, Nil)
-      val record = data.asInstanceOf[GenericRecord]
-      writer(fieldUpdater, record)
-      resultRow
+      (data: Any) => {
+        val record = data.asInstanceOf[GenericRecord]
+        writer(fieldUpdater, record)
+        resultRow
+      }
 
     case _ =>
       val tmpRow = new SpecificInternalRow(Seq(rootCatalystType))
       val fieldUpdater = new RowUpdater(tmpRow)
       val writer = newWriter(rootAvroType, rootCatalystType, Nil)
-      writer(fieldUpdater, 0, data)
-      tmpRow.get(0, rootCatalystType)
+      (data: Any) => {
+        writer(fieldUpdater, 0, data)
+        tmpRow.get(0, rootCatalystType)
+      }
   }
 
+  def deserialize(data: Any): Any = converter(data)
+
   /**
    * Creates a writer to write avro values to Catalyst values at the given ordinal with the given
    * updater.
diff --git a/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala b/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala
index e0c7344138..2b88be8165 100644
--- a/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala
+++ b/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala
@@ -47,6 +47,10 @@ import org.apache.spark.sql.types._
 class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable: Boolean) {
 
   def serialize(catalystData: Any): Any = {
+    converter.apply(catalystData)
+  }
+
+  private val converter: Any => Any = {
     val actualAvroType = resolveNullableType(rootAvroType, nullable)
     val baseConverter = rootCatalystType match {
       case st: StructType =>
@@ -59,13 +63,14 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
         converter.apply(tmpRow, 0)
     }
     if (nullable) {
-      if (catalystData == null) {
-        null
-      } else {
-        baseConverter.apply(catalystData)
-      }
+      (data: Any) =>
+        if (data == null) {
+          null
+        } else {
+          baseConverter.apply(data)
+        }
     } else {
-      baseConverter.apply(catalystData)
+      baseConverter
    }
  }

diff --git a/hudi-spark-datasource/hudi-spark3.1.x/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala b/hudi-spark-datasource/hudi-spark3.1.x/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
index 61482ab96f..5fb6d907bd 100644
---
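The commit restores a pattern worth noting: instead of re-deciding the conversion strategy on every `serialize()`/`deserialize()` call, the converter closure is built once at construction time and the hot path just applies it. A minimal stand-alone illustration of that refactor, in Java rather than the Scala above (class and method names here are ours, not Spark's):

```java
import java.util.function.Function;

// Build the converter once in the constructor; each convert() call only
// applies the cached closure instead of re-branching on nullability/type.
class CachedConverter {
  private final Function<Object, Object> converter;

  CachedConverter(boolean nullable) {
    // "expensive" per-schema setup happens once, here
    Function<Object, Object> base = data -> "converted:" + data;
    this.converter = nullable
        ? data -> (data == null ? null : base.apply(data))  // null check baked in
        : base;
  }

  Object convert(Object data) {
    return converter.apply(data);  // hot path: just apply the cached closure
  }

  public static void main(String[] args) {
    CachedConverter c = new CachedConverter(true);
    System.out.println(c.convert("x"));   // converted:x
    System.out.println(c.convert(null));  // null
  }
}
```

For per-record codecs like Avro serializers, hoisting this decision out of the per-call path is a common micro-optimization, which is presumably why the original change was contentious enough to revert and re-land.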
[GitHub] [hudi] xushiyan merged pull request #6809: Revert "[HUDI-4915] improve avro serializer/deserializer (#6788)"
xushiyan merged PR #6809: URL: https://github.com/apache/hudi/pull/6809
[GitHub] [hudi] hudi-bot commented on pull request #6836: [HUDI-4952] Fixing reading from metadata table when there are no inflight commits
hudi-bot commented on PR #6836: URL: https://github.com/apache/hudi/pull/6836#issuecomment-1269231396

## CI report:

* 23d923e6b8c75781053f3f7bbc811084141f7786 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11978)
* e246d65957362860b850f1af9ef973b85bf1a4eb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12017)
[GitHub] [hudi] nsivabalan opened a new pull request, #6876: [MINOR] Handling null event time
nsivabalan opened a new pull request, #6876: URL: https://github.com/apache/hudi/pull/6876

   ### Change Logs

   Seeing noisy debug logs (`Fail to parse event time value`) with tests. Fixing null event time handling.

   ### Impact

   _Describe any public API or user-facing feature change or any performance impact._

   **Risk level: none | low | medium | high**

   _Choose one. If medium or high, explain what verification was done to mitigate the risks._

   ### Documentation Update

   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_

   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

   ### Contributor's checklist

   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #6836: [HUDI-4952] Fixing reading from metadata table when there are no inflight commits
hudi-bot commented on PR #6836: URL: https://github.com/apache/hudi/pull/6836#issuecomment-1269228467

## CI report:

* 23d923e6b8c75781053f3f7bbc811084141f7786 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11978)
* e246d65957362860b850f1af9ef973b85bf1a4eb UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1269225293

## CI report:

* 288d166c49602a4593b1e97763a467811903737d UNKNOWN
* f21eab07069aa87544e04b115e7463126cd9c472 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12015)
[GitHub] [hudi] nsivabalan commented on a diff in pull request #6836: [HUDI-4952] Fixing reading from metadata table when there are no inflight commits
nsivabalan commented on code in PR #6836: URL: https://github.com/apache/hudi/pull/6836#discussion_r988495539

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:
@@ -293,16 +293,20 @@ object HoodieFileIndex extends Logging {
     schema.fieldNames.filter { colName => refs.exists(r => resolver.apply(colName, r.name)) }
   }
 
-  def getConfigProperties(spark: SparkSession, options: Map[String, String]) = {
+  private def isFilesPartitionAvailable(metaClient: HoodieTableMetaClient): Boolean = {
+    metaClient.getTableConfig.getMetadataPartitions.contains(HoodieTableMetadataUtil.PARTITION_NAME_FILES)
+  }
+
+  def getConfigProperties(spark: SparkSession, options: Map[String, String], metaClient: HoodieTableMetaClient) = {
     val sqlConf: SQLConf = spark.sessionState.conf
     val properties = new TypedProperties()
 
     // To support metadata listing via Spark SQL we allow users to pass the config via SQL Conf in spark session. Users
     // would be able to run SET hoodie.metadata.enable=true in the spark sql session to enable metadata listing.
-    properties.setProperty(HoodieMetadataConfig.ENABLE.key(),
-      sqlConf.getConfString(HoodieMetadataConfig.ENABLE.key(),
-        HoodieMetadataConfig.DEFAULT_METADATA_ENABLE_FOR_READERS.toString))
-    properties.putAll(options.filter(p => p._2 != null).asJava)
+    val isMetadataFilesPartitionAvailable = isFilesPartitionAvailable(metaClient) && sqlConf.getConfString(HoodieMetadataConfig.ENABLE.key(),

Review Comment:
   Isn't this the entry point to metadata table on the read path?
[GitHub] [hudi] hudi-bot commented on pull request #6862: [HUDI-4989] fixing deltastreamer init failures
hudi-bot commented on PR #6862: URL: https://github.com/apache/hudi/pull/6862#issuecomment-1269183442

## CI report:

* 149aec6ea8ff6d895da07b0226be1efdf920e3d8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12016)
[jira] [Updated] (HUDI-4989) Deltastreamer fails if table instantiation failed mid-way in prior attempt
[ https://issues.apache.org/jira/browse/HUDI-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4989:
---------------------------------
    Labels: pull-request-available  (was: )

> Deltastreamer fails if table instantiation failed mid-way in prior attempt
> --------------------------------------------------------------------------
>
>                 Key: HUDI-4989
>                 URL: https://issues.apache.org/jira/browse/HUDI-4989
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: deltastreamer
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>             Labels: pull-request-available
>            Fix For: 0.13.0
>
> If table instantiation failed mid-way, and if deltastreamer is restarted, it
> could fail if hoodie.properties does not exist.
>
> we need to make it resilient in such cases.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6862: [HUDI-4989] fixing deltastreamer init failures
hudi-bot commented on PR #6862: URL: https://github.com/apache/hudi/pull/6862#issuecomment-1269180039

## CI report:

* 149aec6ea8ff6d895da07b0226be1efdf920e3d8 UNKNOWN
[jira] [Updated] (HUDI-4989) Deltastreamer fails if table instantiation failed mid-way in prior attempt
[ https://issues.apache.org/jira/browse/HUDI-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-4989:
--------------------------------------
    Priority: Critical  (was: Major)

> Deltastreamer fails if table instantiation failed mid-way in prior attempt
> --------------------------------------------------------------------------
>
>                 Key: HUDI-4989
>                 URL: https://issues.apache.org/jira/browse/HUDI-4989
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: deltastreamer
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>            Fix For: 0.13.0
>
> If table instantiation failed mid-way, and if deltastreamer is restarted, it
> could fail if hoodie.properties does not exist.
>
> we need to make it resilient in such cases.
[jira] [Created] (HUDI-4989) Deltastreamer fails if table instantiation failed mid-way in prior attempt
sivabalan narayanan created HUDI-4989:
-----------------------------------------

             Summary: Deltastreamer fails if table instantiation failed mid-way in prior attempt
                 Key: HUDI-4989
                 URL: https://issues.apache.org/jira/browse/HUDI-4989
             Project: Apache Hudi
          Issue Type: Improvement
          Components: deltastreamer
            Reporter: sivabalan narayanan


If table instantiation failed mid-way, and if deltastreamer is restarted, it could fail if hoodie.properties does not exist.

we need to make it resilient in such cases.
[jira] [Updated] (HUDI-4989) Deltastreamer fails if table instantiation failed mid-way in prior attempt
[ https://issues.apache.org/jira/browse/HUDI-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-4989:
--------------------------------------
    Fix Version/s: 0.13.0

> Deltastreamer fails if table instantiation failed mid-way in prior attempt
> --------------------------------------------------------------------------
>
>                 Key: HUDI-4989
>                 URL: https://issues.apache.org/jira/browse/HUDI-4989
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: deltastreamer
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>            Fix For: 0.13.0
>
> If table instantiation failed mid-way, and if deltastreamer is restarted, it
> could fail if hoodie.properties does not exist.
>
> we need to make it resilient in such cases.
[jira] [Assigned] (HUDI-4989) Deltastreamer fails if table instantiation failed mid-way in prior attempt
[ https://issues.apache.org/jira/browse/HUDI-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reassigned HUDI-4989:
-----------------------------------------
    Assignee: sivabalan narayanan

> Deltastreamer fails if table instantiation failed mid-way in prior attempt
> --------------------------------------------------------------------------
>
>                 Key: HUDI-4989
>                 URL: https://issues.apache.org/jira/browse/HUDI-4989
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: deltastreamer
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>
> If table instantiation failed mid-way, and if deltastreamer is restarted, it
> could fail if hoodie.properties does not exist.
>
> we need to make it resilient in such cases.
[GitHub] [hudi] yihua commented on a diff in pull request #6003: [HUDI-1575][RFC-56] Early Conflict Detection For Multi-writer
yihua commented on code in PR #6003: URL: https://github.com/apache/hudi/pull/6003#discussion_r988454552

## rfc/rfc-56/rfc-56.md:
@@ -0,0 +1,238 @@
+
+
+# RFC-56: Early Conflict Detection For Multi-writer
+
+## Proposers
+
+- @zhangyue19921010
+
+## Approvers
+
+- @yihua
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-1575
+
+## Abstract
+
+At present, Hudi implements an OCC (Optimistic Concurrency Control) based on the timeline to ensure data consistency,
+integrity and correctness between multi-writers. OCC detects the conflict at Hudi's file group level, i.e., two
+concurrent writers updating the same file group are detected as a conflict. Currently, the conflict detection is
+performed before committing metadata and after the data writing is completed. If any conflict is detected, it leads to a
+waste of cluster resources because computing and writing were finished already.
+
+To solve this problem, this RFC proposes an early conflict detection mechanism to detect the conflict during the data
+writing phase and abort the writing early if a conflict is detected, using Hudi's marker mechanism. Before writing each
+data file, the writer creates a corresponding marker to mark that the file is created, so that the writer can use the
+markers to automatically clean up uncommitted data in failure and rollback scenarios. We propose to use the markers to
+identify the conflict at the file group level while writing data. There are some subtle differences in the early conflict
+detection workflow between the different types of marker maintainers. For direct markers, Hudi lists the necessary marker
+files directly and does conflict checking before the writers create markers and before they start to write the
+corresponding data file. For the timeline-server-based markers, Hudi just gets the result of the marker conflict check
+before the writers create markers and before they start to write the corresponding data files. The conflicts are
+asynchronously and periodically checked so that the writing conflicts can be detected as early as possible. Both writers
+may still write the data files of the same file slice, until the conflict is detected in the next round of checking.
+
+What's more, Hudi can stop writing earlier because of early conflict detection and release the resources to the cluster,
+improving resource utilization.
+
+Note that the early conflict detection proposed by this RFC operates within OCC. Any conflict detection outside the
+scope of OCC is not handled. For example, the current OCC for multiple writers cannot detect the conflict if two concurrent
+writers perform INSERT operations for the same set of record keys, because the writers write to different file groups.
+This RFC does not intend to address this problem.
+
+## Background
+
+As we know, transactions and multi-writers on data lakes are becoming key characteristics of building a Lakehouse
+these days. Quoting the inspiring blog "Lakehouse Concurrency Control: Are we too optimistic?" directly:
+https://hudi.apache.org/blog/2021/12/16/lakehouse-concurrency-control-are-we-too-optimistic/
+
+> "Hudi implements a file level, log based concurrency control protocol on the Hudi timeline, which in-turn relies
+> on bare minimum atomic puts to cloud storage. By building on an event log as the central piece for inter process
+> coordination, Hudi is able to offer a few flexible deployment models that offer greater concurrency over pure OCC
+> approaches that just track table snapshots."
+
+In the multi-writer scenario, Hudi's existing conflict detection occurs after the writer finishes writing the data and
+before committing the metadata. In other words, the writer only detects the conflict when it starts to commit,
+although all calculations and data writing have been completed, which causes a waste of resources.
+
+For example:
+
+Now there are two writing jobs: job1 writes 10M of data to the Hudi table, including updates to file group 1. Another job2
+writes 100G to the Hudi table, and also updates the same file group 1.
+
+Job1 finishes and commits to Hudi successfully. After a few hours, job2 finishes writing the data files (100G) and starts to
+commit metadata. At this time, a conflict with job1 is found, and job2 has to be aborted and re-run after the failure.
+Obviously, a lot of computing resources and time are wasted on job2.
+
+Hudi currently has two important mechanisms, the marker mechanism and the heartbeat mechanism:
+
+1. The marker mechanism can track all the files that are part of an active write.
+2. The heartbeat mechanism can track all active writers to a Hudi table.
+
+Based on markers and heartbeats, this RFC proposes a new conflict detection: Early Conflict Detection. Before the writer
+creates the marker and before it starts to write the file, Hudi performs this new conflict detection, trying to detect
+the writing conflict directly (for direct markers) or get the async conflict
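The file-group-level check the RFC describes can be sketched in a few lines. This is a hypothetical in-memory model, not Hudi's actual marker API: markers here are just (writer, file group) pairs, and the check asks whether another active writer already holds a marker on the file group about to be written.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative model of marker-based early conflict detection: before a writer
// creates a marker for a data file, it checks whether any other writer has
// already created a marker for the same file group.
class EarlyConflictSketch {
  // markers created so far: writer id -> file group ids it touches
  private final Map<String, Set<String>> markersByWriter = new HashMap<>();

  /** Returns true if another writer already holds a marker on this file group. */
  boolean hasConflict(String writerId, String fileGroupId) {
    for (Map.Entry<String, Set<String>> e : markersByWriter.entrySet()) {
      if (!e.getKey().equals(writerId) && e.getValue().contains(fileGroupId)) {
        return true;  // abort before writing, not after commit-time OCC
      }
    }
    return false;
  }

  void createMarker(String writerId, String fileGroupId) {
    markersByWriter.computeIfAbsent(writerId, k -> new HashSet<>()).add(fileGroupId);
  }

  public static void main(String[] args) {
    EarlyConflictSketch markers = new EarlyConflictSketch();
    markers.createMarker("job1", "fg-1");
    // job2 wants to update the same file group: caught before writing 100G
    System.out.println(markers.hasConflict("job2", "fg-1"));  // true
    System.out.println(markers.hasConflict("job2", "fg-2"));  // false
  }
}
```

In the real design the marker store is the filesystem (direct markers) or the timeline server, and for the latter the check result is computed asynchronously, so the detection is eventual rather than immediate, as the RFC notes.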
[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1269130978

## CI report:

* 288d166c49602a4593b1e97763a467811903737d UNKNOWN
* 91655a8009d60f6337939f87d6e2e01922877848 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12002)
* f21eab07069aa87544e04b115e7463126cd9c472 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12015)
[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1269127142

## CI report:

* 288d166c49602a4593b1e97763a467811903737d UNKNOWN
* 91655a8009d60f6337939f87d6e2e01922877848 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12002)
* f21eab07069aa87544e04b115e7463126cd9c472 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6745: Fix comment in RFC46
hudi-bot commented on PR #6745: URL: https://github.com/apache/hudi/pull/6745#issuecomment-1269065121

## CI report:

* af8e58757bed12e53907076da02add1ba98b220c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12014)
[GitHub] [hudi] the-other-tim-brown commented on a diff in pull request #6806: [HUDI-4905] Improve type handling in proto schema conversion
the-other-tim-brown commented on code in PR #6806: URL: https://github.com/apache/hudi/pull/6806#discussion_r988360500

## hudi-utilities/pom.xml:
@@ -85,7 +85,6 @@
       com.google.protobuf
       protobuf-java-util
-      test

Review Comment:
   Should I revert that last commit then?
[GitHub] [hudi] hudi-bot commented on pull request #6745: Fix comment in RFC46
hudi-bot commented on PR #6745: URL: https://github.com/apache/hudi/pull/6745#issuecomment-1268931103

## CI report:

* 7525a09b2415fbf4e5e7de7c71cfffd8afc8c410 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12011)
* af8e58757bed12e53907076da02add1ba98b220c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12014)
[GitHub] [hudi] hudi-bot commented on pull request #6745: Fix comment in RFC46
hudi-bot commented on PR #6745: URL: https://github.com/apache/hudi/pull/6745#issuecomment-1268925125

## CI report:

* 7525a09b2415fbf4e5e7de7c71cfffd8afc8c410 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12011)
* af8e58757bed12e53907076da02add1ba98b220c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6806: [HUDI-4905] Improve type handling in proto schema conversion
hudi-bot commented on PR #6806: URL: https://github.com/apache/hudi/pull/6806#issuecomment-1268918966

## CI report:

* f03f9610cf4e2c490d33ca734ca9b3241b2be778 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12012)
[GitHub] [hudi] hudi-bot commented on pull request #6858: [HUDI-4824]RANGE BUCKET index, base logic and test.
hudi-bot commented on PR #6858: URL: https://github.com/apache/hudi/pull/6858#issuecomment-1265746656

## CI report:

* 76d14eb325d62a026248cc5c30de0d415e0c92a2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11975)
[GitHub] [hudi] hudi-bot commented on pull request #6858: [HUDI-4824]RANGE BUCKET index, base logic and test.
hudi-bot commented on PR #6858: URL: https://github.com/apache/hudi/pull/6858#issuecomment-1265741058

## CI report:

* 76d14eb325d62a026248cc5c30de0d415e0c92a2 UNKNOWN
[GitHub] [hudi] wqwl611 commented on pull request #6636: [HUDI-4824]Add new index RANGE_BUCKET , when primary key is auto-increment like most mysql table
wqwl611 commented on PR #6636: URL: https://github.com/apache/hudi/pull/6636#issuecomment-1265711269

   I accidentally deleted my previous fork, so I opened a new PR and modified the code according to the previous review requirements. Please help me review it, thanks. @danny0405 @YuweiXiao @xushiyan @alexeykudinkin https://github.com/apache/hudi/pull/6858
[GitHub] [hudi] wqwl611 commented on pull request #6858: [HUDI-4824]RANGE Bucket index, base logic and test.
wqwl611 commented on PR #6858: URL: https://github.com/apache/hudi/pull/6858#issuecomment-1265709727

   I accidentally deleted my previous fork, so I opened a new PR and modified the code according to the previous review requirements. Please help me review it, thanks. @danny0405 @YuweiXiao @xushiyan @alexeykudinkin
[GitHub] [hudi] wqwl611 opened a new pull request, #6858: [HUDI-4824]RANGE Bucket index, base logic and test.
wqwl611 opened a new pull request, #6858: URL: https://github.com/apache/hudi/pull/6858 ### Change Logs The rangeBucket is mainly used in the scenario of sync mysql tables to hudi in near real time, which avoids the disadvantage of the fixed number of buckets in simpleBucket. Usually, in the mysql table, there is an auto-increment id primary key field. In the mysql cdc synchronization scenario, we can use the database name and table name as the partition field of hudi, and id as the primary key field of the hudi table,This can deal with sub-library and sub-table. In order to reach better sync performance, we usually use bucket index, but if we use simple bucket index, because the number of buckets is fixed, it is difficult for us to determine a suitable number of buckets, and as the table grows, The previous number of buckets will no longer be appropriate. So, I propose rangeBucekt, in the simpleBucket index, the bucket number is (hash % bucketNum), and in rangetBucket, we will use ( id / fixedStep) to determine the bucket number, so that as the id grows,The number of buckets also increases. For example, if step = 10 is set, then, because the id is self-increasing, a bucket will be generated for every 10 pieces of data. In the actual scenario, I set step=1,000,000, the usual size of each mysql record is similar, then the approximate size of each bucket will be 50M ~ 350M, which avoids the disadvantage of the fixed number of buckets in simpleBucket ### Impact Introduce a new index RANGE_BUCKET, people can ust it like following: option(HoodieIndexConfig.INDEX_TYPE.key, IndexType.BUCKET.name()). option(HoodieIndexConfig.BUCKET_INDEX_ENGINE_TYPE.key, "RANGE_BUCKET"). option(HoodieIndexConfig.RANGE_BUCKET_STEP_SIZE.key, 2). option(HoodieLayoutConfig.LAYOUT_TYPE.key, "BUCKET"). **Risk level: none | low | medium | high** low -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
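The bucket-number arithmetic described in the PR above can be sketched as follows. This is a minimal illustration of the two formulas (hash % bucketNum vs. id / fixedStep), not the actual Hudi implementation; the function names are made up for clarity.

```python
def simple_bucket(record_key_hash: int, bucket_num: int) -> int:
    """SIMPLE bucket index: bucket count is fixed at table creation."""
    return record_key_hash % bucket_num


def range_bucket(record_id: int, step: int) -> int:
    """Proposed RANGE bucket: bucket number grows with the auto-increment id."""
    return record_id // step


# With step = 10, ids 0..9 land in bucket 0, ids 10..19 in bucket 1, and so on,
# so new buckets appear automatically as the id keeps increasing.
```

The key difference is that `simple_bucket` can never produce more than `bucket_num` distinct buckets, while `range_bucket` produces a new bucket every `step` records, which is why the PR argues it suits ever-growing CDC-sourced tables.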
[GitHub] [hudi] hudi-bot commented on pull request #6857: [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create
hudi-bot commented on PR #6857: URL: https://github.com/apache/hudi/pull/6857#issuecomment-1265637355 ## CI report: * d4f9276e7a3802fb2df5c3b7e28c224e4a1e7f15 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11974) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope commented on issue #6332: Avoid creating Configuration copies in Hudi
codope commented on issue #6332: URL: https://github.com/apache/hudi/issues/6332#issuecomment-1265613884 Synced up with @pratyakshsharma regarding this issue. First of all, the issue affects queries on Hudi tables via the presto-hive connector. We need to see if we can use the config provided by the engine itself while instantiating the meta client; created HUDI-4974 to track that. For now, we have a mitigation: we will unwrap the wrapper config object and pass that. Closing the issue as it has been triaged and we have an interim solution. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope closed issue #6332: Avoid creating Configuration copies in Hudi
codope closed issue #6332: Avoid creating Configuration copies in Hudi URL: https://github.com/apache/hudi/issues/6332 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-4970) hudi-kafka-connect-bundle: Could not initialize class org.apache.hadoop.security.UserGroupInformation
[ https://issues.apache.org/jira/browse/HUDI-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4970: Labels: pull-request-available (was: )
> hudi-kafka-connect-bundle: Could not initialize class org.apache.hadoop.security.UserGroupInformation
>
> Key: HUDI-4970
> URL: https://issues.apache.org/jira/browse/HUDI-4970
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Sagar Sumit
> Assignee: Sagar Sumit
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.12.2
>
> The Kafka connect sink loads successfully but fails to sync the Hudi table due to NoClassDefFoundError: Could not initialize class org.apache.hadoop.security.UserGroupInformation
> {code:java}
> [2022-10-03 14:31:49,872] INFO The value of hoodie.datasource.write.keygenerator.type is empty, using SIMPLE (org.apache.hudi.keygen.factory.HoodieAvroKeyGeneratorFactory:63)
> [2022-10-03 14:31:49,872] INFO Setting record key volume and partition fields date for table file:///tmp/hoodie/hudi-test-topichudi-test-topic (org.apache.hudi.connect.writers.KafkaConnectTransactionServices:93)
> [2022-10-03 14:31:49,872] INFO Initializing file:///tmp/hoodie/hudi-test-topic as hoodie table file:///tmp/hoodie/hudi-test-topic (org.apache.hudi.common.table.HoodieTableMetaClient:424)
> [2022-10-03 14:31:49,872] INFO Existing partitions deleted [hudi-test-topic-0] (org.apache.hudi.connect.HoodieSinkTask:156)
> [2022-10-03 14:31:49,872] ERROR WorkerSinkTask{id=hudi-sink-3} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:184)
> java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.security.UserGroupInformation
>     at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3431)
>     at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3421)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3263)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:475)
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
>     at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:110)
>     at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:103)
>     at org.apache.hudi.common.table.HoodieTableMetaClient.initTableAndGetMetaClient(HoodieTableMetaClient.java:426)
>     at org.apache.hudi.common.table.HoodieTableMetaClient$PropertyBuilder.initTable(HoodieTableMetaClient.java:1110)
>     at org.apache.hudi.connect.writers.KafkaConnectTransactionServices.<init>(KafkaConnectTransactionServices.java:104)
>     at org.apache.hudi.connect.transaction.ConnectTransactionCoordinator.<init>(ConnectTransactionCoordinator.java:88)
>     at org.apache.hudi.connect.HoodieSinkTask.bootstrap(HoodieSinkTask.java:191)
>     at org.apache.hudi.connect.HoodieSinkTask.open(HoodieSinkTask.java:151)
>     at org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:635)
>     at org.apache.kafka.connect.runtime.WorkerSinkTask.access$1000(WorkerSinkTask.java:71){code}
> Follow [https://github.com/apache/hudi/tree/master/hudi-kafka-connect#readme] to reproduce. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6857: [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create
hudi-bot commented on PR #6857: URL: https://github.com/apache/hudi/pull/6857#issuecomment-1265411902 ## CI report: * d4f9276e7a3802fb2df5c3b7e28c224e4a1e7f15 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11974) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6857: [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create
hudi-bot commented on PR #6857: URL: https://github.com/apache/hudi/pull/6857#issuecomment-1265405407 ## CI report: * d4f9276e7a3802fb2df5c3b7e28c224e4a1e7f15 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6850: [Draft][HUDI-4964] inline all the getter methods that have no logic …
hudi-bot commented on PR #6850: URL: https://github.com/apache/hudi/pull/6850#issuecomment-1265398895 ## CI report: * 4b1c2e6a4a256989d070a105cdd88ef02aaa8fc1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11972) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope opened a new pull request, #6857: [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create
codope opened a new pull request, #6857: URL: https://github.com/apache/hudi/pull/6857 ### Change Logs Update the Kafka-connect setup guide with some details. Also, remove `HoodieConfig#create`, which is used only in tests and unnecessarily forces the class loader to load `org.apache.hadoop.fs.FSDataInputStream`. ### Impact Refactoring does not have any impact in this case. **Risk level: none** _Choose one. If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6838: [MINOR] Update azure image and balance CI jobs
hudi-bot commented on PR #6838: URL: https://github.com/apache/hudi/pull/6838#issuecomment-1265314521 ## CI report: * 3b01f5fd8a8be1d5b7dfca7adc882771f7fa787d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11970) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope merged pull request #6846: [HUDI-4962] Move cloud dependencies to cloud modules
codope merged PR #6846: URL: https://github.com/apache/hudi/pull/6846 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] pramodbiligiri commented on pull request #6846: [HUDI-4962] Move cloud dependencies to cloud modules
pramodbiligiri commented on PR #6846: URL: https://github.com/apache/hudi/pull/6846#issuecomment-1265279684 Tested that this build worked locally. Built this branch as follows and ran the sync; results after the sync are pasted below. Build: `$ mvn -DskipTests -Dspark3.2 -Dscala-2.12 -Dcheckstyle.skip -Drat.skip clean install` Testing the sync (each file has 10 records):
```
Before Sync:
select max(id) from gcs_data;
+--------+
|  _c0   |
+--------+
| 20370  |
+--------+
select max(name) from gcs_meta_hive;
+----------------------------------+
|               _c0                |
+----------------------------------+
| country=IN/data-file-2040.jsonl  |
+----------------------------------+

After Sync:
select max(id) from gcs_data;
+--------+
|  _c0   |
+--------+
| 20450  |
+--------+
select max(name) from gcs_meta_hive;
+----------------------------------+
|               _c0                |
+----------------------------------+
| country=IN/data-file-2045.jsonl  |
+----------------------------------+
```
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6850: [Draft][HUDI-4964] inline all the getter methods that have no logic …
hudi-bot commented on PR #6850: URL: https://github.com/apache/hudi/pull/6850#issuecomment-1265246778 ## CI report: * 13a464e9dca3394ed7d946c0e682ad02f7edfc43 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11971) * 4b1c2e6a4a256989d070a105cdd88ef02aaa8fc1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11972) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6856: [HUDI-4968] Update misleading read.streaming.skip_compaction config
hudi-bot commented on PR #6856: URL: https://github.com/apache/hudi/pull/6856#issuecomment-1265241676 ## CI report: * 7fbe39a558949b0e0e8938546aad96e5ba0c1956 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11969) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] dragonH commented on issue #6832: [SUPPORT] AWS Glue 3.0 fail to write dataset with hudi (hive sync issue)
dragonH commented on issue #6832: URL: https://github.com/apache/hudi/issues/6832#issuecomment-1265226296 @kazdy thanks for the information! I wonder if there's a better way to avoid this kind of issue, e.g. - add a new config property `AWSGlueDataCatalogEnabled` and, if it is set to `True`, convert the table name to lowercase up front - or edit the hint of the table-name-related config properties to highlight this (if using the AWS Glue Data Catalog, the table name should be lowercase), because the original hint and the raised exception don't really explain this. Thanks for your help! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
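The first suggestion in the comment above can be sketched as a small pre-sync guard. This is a hypothetical helper, not an existing Hudi option: the function name, the `glue_catalog_enabled` flag, and the behavior are all illustrative of the proposal (Glue stores identifiers in lowercase, so normalize before hive sync).

```python
def normalize_glue_table_name(table_name: str, glue_catalog_enabled: bool) -> str:
    """Lowercase the table name when syncing to the AWS Glue Data Catalog.

    Glue lowercases identifiers, so a mixed-case Hudi table name would not
    match the catalog entry; normalizing up front avoids the mismatch.
    """
    if glue_catalog_enabled and table_name != table_name.lower():
        # A warning could be logged here so the rename is not silent.
        return table_name.lower()
    return table_name
```

A hint along these lines in the table-name config description (the comment's second suggestion) would make the failure mode discoverable even without a new flag.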
[GitHub] [hudi] hudi-bot commented on pull request #6850: [Draft][HUDI-4964] inline all the getter methods that have no logic …
hudi-bot commented on PR #6850: URL: https://github.com/apache/hudi/pull/6850#issuecomment-1265171581 ## CI report: * 13a464e9dca3394ed7d946c0e682ad02f7edfc43 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11971) * 4b1c2e6a4a256989d070a105cdd88ef02aaa8fc1 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6838: [MINOR] Update azure image and balance CI jobs
hudi-bot commented on PR #6838: URL: https://github.com/apache/hudi/pull/6838#issuecomment-1265165246 ## CI report: * b4875afb16a2a8bdd0bce03f518af4fee9ada2a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11948) * 3b01f5fd8a8be1d5b7dfca7adc882771f7fa787d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11970) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6838: [MINOR] Update azure image and balance CI jobs
hudi-bot commented on PR #6838: URL: https://github.com/apache/hudi/pull/6838#issuecomment-1265159045 ## CI report: * b4875afb16a2a8bdd0bce03f518af4fee9ada2a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11948) * 3b01f5fd8a8be1d5b7dfca7adc882771f7fa787d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6850: [Draft][HUDI-4964] inline all the getter methods that have no logic …
hudi-bot commented on PR #6850: URL: https://github.com/apache/hudi/pull/6850#issuecomment-1265159167 ## CI report: * e3aef767db19eed24222f8fff89ae4c59d0799c2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11956) * 13a464e9dca3394ed7d946c0e682ad02f7edfc43 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6850: [Draft][HUDI-4964] inline all the getter methods that have no logic …
hudi-bot commented on PR #6850: URL: https://github.com/apache/hudi/pull/6850#issuecomment-1265165358 ## CI report: * e3aef767db19eed24222f8fff89ae4c59d0799c2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11956) * 13a464e9dca3394ed7d946c0e682ad02f7edfc43 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11971) * 4b1c2e6a4a256989d070a105cdd88ef02aaa8fc1 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6856: [HUDI-4968] Update misleading read.streaming.skip_compaction config
hudi-bot commented on PR #6856: URL: https://github.com/apache/hudi/pull/6856#issuecomment-1265078737 ## CI report: * 7fbe39a558949b0e0e8938546aad96e5ba0c1956 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11969) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] voonhous opened a new pull request, #6856: [HUDI-4968] Update misleading read.streaming.skip_compaction config
voonhous opened a new pull request, #6856: URL: https://github.com/apache/hudi/pull/6856 ### Change Logs Update misleading `read.streaming.skip_compaction` config. ### Impact _Describe any public API or user-facing feature change or any performance impact._ **Risk level: none | low | medium | high** _Choose one. If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hkszn commented on issue #6718: [SUPPORT] Deltastreamer concurrent writes in continuous mode
hkszn commented on issue #6718: URL: https://github.com/apache/hudi/issues/6718#issuecomment-1265049923 Thank you for your reply. > If you are interested, I can guide you on how to achieve this. Yes, I would like to try it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6856: [HUDI-4968] Update misleading read.streaming.skip_compaction config
hudi-bot commented on PR #6856: URL: https://github.com/apache/hudi/pull/6856#issuecomment-1265072776 ## CI report: * 7fbe39a558949b0e0e8938546aad96e5ba0c1956 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] kazdy commented on issue #6832: [SUPPORT] AWS Glue 3.0 fail to write dataset with hudi (hive sync issue)
kazdy commented on issue #6832: URL: https://github.com/apache/hudi/issues/6832#issuecomment-1265036326 Btw, on EMR you'll get the same error because the Glue client is the same. I got this error when running Hudi on EMR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-4968) Fix ambiguous stream read config
[ https://issues.apache.org/jira/browse/HUDI-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4968: Labels: pull-request-available (was: )
> Fix ambiguous stream read config
>
> Key: HUDI-4968
> URL: https://issues.apache.org/jira/browse/HUDI-4968
> Project: Apache Hudi
> Issue Type: Task
> Reporter: voon
> Priority: Major
> Labels: pull-request-available
>
> Fix ambiguous stream read configs by:
> # Updating the relevant versioned (versions >= 0.11.0) markdown pages to change _read.streaming.start-commit_ to _read.start-commit_
> # Updating the _read.streaming.skip_compaction_ description to accurately describe when it should be used. -- This message was sent by Atlassian Jira (v8.20.10#820010)
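For context, the renamed key appears in Flink SQL source definitions. The sketch below is illustrative only: the column list, table name, and path are made up, and only the option keys come from the issue above.

```sql
-- Illustrative Flink SQL source for a Hudi table; only the read.* option
-- keys are the ones discussed in HUDI-4968.
CREATE TABLE hudi_source (
  id BIGINT,
  name STRING
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_table',
  'read.streaming.enabled' = 'true',
  'read.start-commit' = 'earliest',          -- formerly read.streaming.start-commit
  'read.streaming.skip_compaction' = 'true'  -- description clarified by this issue
);
```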
[GitHub] [hudi] hudi-bot commented on pull request #6838: [MINOR] Update azure image and balance CI jobs
hudi-bot commented on PR #6838: URL: https://github.com/apache/hudi/pull/6838#issuecomment-1265000971 ## CI report: * b4875afb16a2a8bdd0bce03f518af4fee9ada2a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11948) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] voonhous opened a new pull request, #6855: [HUDI-4968] Update old config keys
voonhous opened a new pull request, #6855: URL: https://github.com/apache/hudi/pull/6855 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ **Risk level: none | low | medium | high** _Choose one. If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on pull request #6852: [MINOR] Fix testUpdateRejectForClustering
xushiyan commented on PR #6852: URL: https://github.com/apache/hudi/pull/6852#issuecomment-1264955767 Test fix works. CI failure is irrelevant. Landing this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6838: [MINOR] Update azure image and balance CI jobs
hudi-bot commented on PR #6838: URL: https://github.com/apache/hudi/pull/6838#issuecomment-1264952448 ## CI report: * b4875afb16a2a8bdd0bce03f518af4fee9ada2a7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on a diff in pull request #6732: [HUDI-4148] Add client for hudi table management service
xushiyan commented on code in PR #6732: URL: https://github.com/apache/hudi/pull/6732#discussion_r985359891 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -1052,13 +875,29 @@ public void dropIndex(List partitionTypes) { } } + /** + * Performs Clustering for the workload stored in instant-time. + * + * @param clusteringInstantTime Clustering Instant Time + * @return Collection of WriteStatus to inspect errors and counts + */ + public HoodieWriteMetadata cluster(String clusteringInstantTime) { +if (delegateToTableManagerService(config, ActionType.replacecommit)) { + throw new HoodieException(ActionType.replacecommit.name() + " delegate to table management service!"); Review Comment: please align on the name ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseTableServiceClient.java: ## @@ -0,0 +1,432 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieCleanerPlan;
+import org.apache.hudi.avro.model.HoodieClusteringPlan;
+import org.apache.hudi.avro.model.HoodieCompactionPlan;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.ActionType;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieWriteStat;
+import org.apache.hudi.common.model.TableServiceType;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.ClusteringUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieClusteringConfig;
+import org.apache.hudi.config.HoodieCompactionConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieCommitException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metrics.HoodieMetrics;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.HoodieWriteMetadata;
+
+import com.codahale.metrics.Timer;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.util.List;
+import java.util.Map;
+
+public abstract class BaseTableServiceClient extends CommonHoodieClient {

Review Comment: BaseHoodieTableServiceClient to align with BaseHoodieWriteClient

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/table/manager/HoodieTableManagerClient.java:
## @@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.table.manager;
+
+import org.apache.hudi.common.config.HoodieTableManagerConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.ClusteringUtils;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.exception.HoodieRemoteException;
+
[GitHub] [hudi] xushiyan merged pull request #6852: [MINOR] Fix testUpdateRejectForClustering
xushiyan merged PR #6852: URL: https://github.com/apache/hudi/pull/6852 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6838: [MINOR] Update azure image and balance CI jobs
hudi-bot commented on PR #6838: URL: https://github.com/apache/hudi/pull/6838#issuecomment-1264955736

## CI report:

* b4875afb16a2a8bdd0bce03f518af4fee9ada2a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11948)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] xushiyan commented on issue #6281: [SUPPORT] AwsGlueCatalogSyncTool -The number of partition keys do not match the number of partition values
xushiyan commented on issue #6281: URL: https://github.com/apache/hudi/issues/6281#issuecomment-1264943872 @crutis closing this as explained by @yihua. let us know how it works
[GitHub] [hudi] xushiyan closed issue #6281: [SUPPORT] AwsGlueCatalogSyncTool -The number of partition keys do not match the number of partition values
xushiyan closed issue #6281: [SUPPORT] AwsGlueCatalogSyncTool - The number of partition keys do not match the number of partition values URL: https://github.com/apache/hudi/issues/6281
[GitHub] [hudi] xushiyan merged pull request #6851: [HUDI-4966] Add a partition extractor to handle partition values with slashes
xushiyan merged PR #6851: URL: https://github.com/apache/hudi/pull/6851
[GitHub] [hudi] xushiyan commented on pull request #6732: [HUDI-4148] Add client for hudi table management service
xushiyan commented on PR #6732: URL: https://github.com/apache/hudi/pull/6732#issuecomment-1264932668 @yuzhaojing please also fill up the PR description properly. as discussed, a class diagram to show the new hierarchy expedites the review.
[GitHub] [hudi] zhangyue19921010 commented on a diff in pull request #6003: [HUDI-1575][RFC-56] Early Conflict Detection For Multi-writer
zhangyue19921010 commented on code in PR #6003: URL: https://github.com/apache/hudi/pull/6003#discussion_r985349298

## rfc/rfc-56/rfc-56.md:
## @@ -0,0 +1,238 @@

# RFC-56: Early Conflict Detection For Multi-writer

## Proposers

- @zhangyue19921010

## Approvers

- @yihua

## Status

JIRA: https://issues.apache.org/jira/browse/HUDI-1575

## Abstract

At present, Hudi implements OCC (Optimistic Concurrency Control) based on the timeline to ensure data consistency, integrity and correctness between multi-writers. OCC detects conflicts at Hudi's file group level, i.e., two concurrent writers updating the same file group are detected as a conflict. Currently, the conflict detection is performed before committing metadata and after the data writing is completed. If any conflict is detected, it leads to a waste of cluster resources because computing and writing were finished already.

To solve this problem, this RFC proposes an early conflict detection mechanism to detect the conflict during the data writing phase and abort the writing early if a conflict is detected, using Hudi's marker mechanism. Before writing each data file, the writer creates a corresponding marker to mark that the file is created, so that the writer can use the markers to automatically clean up uncommitted data in failure and rollback scenarios. We propose to use the markers to identify conflicts at the file group level while writing data. There are some subtle differences in the early conflict detection workflow between different types of marker maintainers. For direct markers, Hudi lists the necessary marker files directly and does conflict checking before the writers create markers and before starting to write the corresponding data file. For timeline-server-based markers, Hudi just gets the result of marker conflict checking before the writers create markers and before starting to write the corresponding data files. The conflicts are asynchronously and periodically checked so that writing conflicts can be detected as early as possible. Both writers may still write the data files of the same file slice, until the conflict is detected in the next round of checking.

What's more, Hudi can stop writing earlier because of early conflict detection and release the resources back to the cluster, improving resource utilization.

Note that the early conflict detection proposed by this RFC operates within OCC. Any conflict detection outside the scope of OCC is not handled. For example, the current OCC for multiple writers cannot detect the conflict if two concurrent writers perform INSERT operations for the same set of record keys, because the writers write to different file groups. This RFC does not intend to address this problem.

## Background

As we know, transactions and multi-writers on data lakes are becoming key characteristics of building a Lakehouse these days. Quoting the inspiring blog "Lakehouse Concurrency Control: Are we too optimistic?" directly: https://hudi.apache.org/blog/2021/12/16/lakehouse-concurrency-control-are-we-too-optimistic/

> "Hudi implements a file level, log based concurrency control protocol on the Hudi timeline, which in-turn relies on bare minimum atomic puts to cloud storage. By building on an event log as the central piece for inter process coordination, Hudi is able to offer a few flexible deployment models that offer greater concurrency over pure OCC approaches that just track table snapshots."

In the multi-writer scenario, Hudi's existing conflict detection occurs after the writer finishes writing the data and before committing the metadata. In other words, the writer only detects the occurrence of the conflict when it starts to commit, although all calculations and data writing have been completed, which causes a waste of resources.

For example:

Now there are two writing jobs: job1 writes 10M of data to the Hudi table, including updates to file group 1. Another job, job2, writes 100G to the Hudi table, and also updates the same file group 1.

Job1 finishes and commits to Hudi successfully. After a few hours, job2 finishes writing its data files (100G) and starts to commit metadata. At this time, a conflict with job1 is found, and job2 has to be aborted and re-run after failure. Obviously, a lot of computing resources and time are wasted for job2.

Hudi currently has two important mechanisms, the marker mechanism and the heartbeat mechanism:

1. The marker mechanism can track all the files that are part of an active write.
2. The heartbeat mechanism can track all active writers to a Hudi table.

Based on markers and heartbeats, this RFC proposes a new conflict detection: Early Conflict Detection. Before the writer creates the marker and before it starts to write the file, Hudi performs this new conflict detection, trying to detect the writing conflict directly (for direct markers) or get the async
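The direct-marker flow the RFC describes — look at the markers of other active writers, and abort before creating a new marker if one already covers the same file group — can be pictured with a small self-contained sketch. This is an illustration only, not Hudi's actual marker classes; the `"partition/fileId_instantTime.marker"` naming convention used here is a simplifying assumption.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified early-conflict check (hypothetical, not Hudi's real implementation).
// A marker name is assumed to look like "partition/fileId_instantTime.marker";
// two writers conflict when their markers touch the same file group (partition + fileId).
public class EarlyConflictCheck {

    // Strip the "_instantTime.marker" suffix to recover "partition/fileId".
    static String fileGroupOf(String marker) {
        int underscore = marker.lastIndexOf('_');
        return marker.substring(0, underscore);
    }

    /** True if any marker this writer wants to create collides with another active writer's markers. */
    public static boolean hasConflict(Collection<String> myMarkers, Collection<String> otherWritersMarkers) {
        Set<String> othersFileGroups = new HashSet<>();
        for (String m : otherWritersMarkers) {
            othersFileGroups.add(fileGroupOf(m));
        }
        for (String m : myMarkers) {
            if (othersFileGroups.contains(fileGroupOf(m))) {
                return true; // abort early, before any data file is written
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same file group "2022/10/01/fg1" touched by both writers -> conflict.
        System.out.println(hasConflict(
            List.of("2022/10/01/fg1_t2.marker"),
            List.of("2022/10/01/fg1_t1.marker")));
    }
}
```

The point of the sketch is the ordering: the check runs on marker names alone, so a 100G write like job2 in the example above can fail in seconds instead of hours.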
[GitHub] [hudi] xushiyan commented on a diff in pull request #4309: [HUDI-3016][RFC-43] Proposal to implement Table Management Service
xushiyan commented on code in PR #4309: URL: https://github.com/apache/hudi/pull/4309#discussion_r985348379

## rfc/rfc-43/rfc-43.md:
## @@ -0,0 +1,369 @@

# RFC-43: Implement Table Management Service for Hudi

## Proposers

- @yuzhaojing

## Approvers

- @vinothchandar
- @Raymond

## Status

JIRA: [https://issues.apache.org/jira/browse/HUDI-3016](https://issues.apache.org/jira/browse/HUDI-3016)

## Abstract

Hudi tables need table management operations. Currently, there are three ways to schedule these jobs:

- Inline: execute the table service jobs and the writing job in the same application, performing them serially.

- Async: execute the table service jobs and the writing job in the same application, with async parallel execution of the service jobs and the write job.

- Independent compaction/clustering job: execute an async compaction/clustering job in another application.

With the increase in the number of Hudi tables, due to a lack of management capabilities, maintenance costs will become higher. This proposal is to implement an independent compaction/clustering service to manage Hudi compaction/clustering jobs.

## Background

In the current implementation, if a Hudi table needs to compact/cluster, there are only three ways:

1. Use inline compaction/clustering; in this mode the job will block the writing job.

2. Use async compaction/clustering; in this mode the job executes async but also shares resources with the Hudi write job, which may affect the stability of the writing job — not what the user wants to see.

3. Use an independent compaction/clustering job, which is a better way to schedule the job; in this mode the job executes async and does not share resources with the writing job, but it also has some issues:
   1. Users have to enable lock service providers so that there is no data loss. Especially when compaction/clustering is getting scheduled, no other writes should proceed concurrently and hence a lock is required.
   2. The user needs to manually start an async compaction/clustering application, which means that the user needs to maintain two jobs.
   3. With the increase in the number of Hudi jobs, there is no unified service to manage compaction/clustering jobs (monitor, retry, history, etc...), which will make maintenance costs increase.

With this effort, we want to provide an independent compaction/clustering service with these abilities:

- Provides a pluggable execution interface that can adapt to multiple execution engines, such as Spark and Flink.

- Has the ability to failover; compaction/clustering messages need to be persisted.

- Complete metrics, reusing HoodieMetric to expose them to the outside.

- Provides automatic failure retry for compaction/clustering jobs.

## Implementation

### Processing mode

Different processing modes apply depending on whether the meta server is enabled:

- Hudi metaserver is used
  - The pull-based mechanism works for fewer tables. Scanning 1000s of tables for possible services is going to induce lots of listing load.
  - The meta server provides a listener that takes as input the URIs of the Table Management Service and triggers a callback through the hook at each instant commit, thereby calling the Table Management Service to do the scheduling/execution for the table.
  ![](service_with_meta_server.png)

- Hudi metaserver is not used
  - For every write/commit on the table, the table management server is notified.
  - Each request to the table management server carries all pending instants matching the current action type.
  ![](service_without_meta_server.png)

### Processing flow

- If hudi metaserver is used, after receiving the request, the table management server schedules the relevant table service to the table's timeline

Review Comment:
> schedules the relevant table service to the table's timeline

need to make it explicit: this is the table timeline managed in metaserver, right? it can confuse with the table timeline on storage. Should also mention how metaserver interacts with storage in this case.

## rfc/rfc-43/rfc-43.md:
## @@ -0,0 +1,316 @@

# RFC-43: Implement Table Management Service for Hudi
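The "each request carries all pending instants matching the current action type" step in the metaserver-less mode above can be sketched as a plain filter over the timeline. The `Instant` class below is a hypothetical stand-in for Hudi's `HoodieInstant`, and the filtering logic is an assumption about the request payload, not the service's real code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PendingInstantFilter {

    // Hypothetical stand-in for Hudi's HoodieInstant.
    public static class Instant {
        final String timestamp;
        final String action;    // e.g. "compaction", "clustering"
        final boolean completed;

        public Instant(String timestamp, String action, boolean completed) {
            this.timestamp = timestamp;
            this.action = action;
            this.completed = completed;
        }
    }

    /** Timestamps of pending (not yet completed) instants of the given action type, oldest first. */
    public static List<String> pendingInstants(List<Instant> timeline, String action) {
        List<String> pending = new ArrayList<>();
        for (Instant i : timeline) {
            if (i.action.equals(action) && !i.completed) {
                pending.add(i.timestamp);
            }
        }
        Collections.sort(pending);
        return pending;
    }
}
```

Pushing this filtered list in each notification is what lets the table management server avoid the "scan 1000s of tables" listing load mentioned for the pull-based mode.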
[GitHub] [hudi] yesemsanthoshkumar commented on a diff in pull request #6726: [HUDI-4630] Add transformer capability to individual feeds in MultiTableDeltaStreamer
yesemsanthoshkumar commented on code in PR #6726: URL: https://github.com/apache/hudi/pull/6726#discussion_r985346731

## hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java:
## @@ -135,6 +135,7 @@ private void populateTableExecutionContextList(TypedProperties properties, Strin
 if (cfg.enableMetaSync && StringUtils.isNullOrEmpty(tableProperties.getString(HoodieSyncConfig.META_SYNC_TABLE_NAME.key(), ""))) {
   throw new HoodieException("Meta sync table field not provided!");
 }
+populateTransformerProps(cfg, tableProperties);

Review Comment: @yihua Sure. I'm new to this. I'll work over this weekend.
[GitHub] [hudi] zhangyue19921010 commented on pull request #6003: [HUDI-1575][RFC-56] Early Conflict Detection For Multi-writer
zhangyue19921010 commented on PR #6003: URL: https://github.com/apache/hudi/pull/6003#issuecomment-1264887516 Hi @yihua and @pratyakshsharma. Really appreciate your attention here! Addressed the comments. PTAL :)
[GitHub] [hudi] dragonH commented on issue #6832: [SUPPORT] AWS Glue 3.0 fail to write dataset with hudi (hive sync issue)
dragonH commented on issue #6832: URL: https://github.com/apache/hudi/issues/6832#issuecomment-1264882911

hi @codope, sure, will also do the latest hudi testing with EMR and share the result here. thanks for the help.

hi @kazdy, thanks for the help. i acknowledge the behavior of aws glue that converts the table name and column names to lowercase, but was surprised that this caused the issue. after converting the table name to lowercase, the data was written successfully:

https://user-images.githubusercontent.com/18332044/193495139-64ddc70d-1468-49c8-8772-ba7af43e80dc.png

just curious about the steps of how hudi creates and syncs the table, because we could see the table was created with the lowercase name; how come it used the original name (with upper case) to find and compare the partition 🤔
[GitHub] [hudi] hudi-bot commented on pull request #6854: [HUDI-4631] Adding retries to spark datasource writes on conflict failures:
hudi-bot commented on PR #6854: URL: https://github.com/apache/hudi/pull/6854#issuecomment-1264756277

## CI report:

* 3fd99e92b8be748fa52e025f8bc6bbf6681df359 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11966)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-4631) Enhance retries for failed writes w/ write conflicts in a multi writer scenarios
[ https://issues.apache.org/jira/browse/HUDI-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4631:
---------------------------------
    Labels: pull-request-available  (was: )

> Enhance retries for failed writes w/ write conflicts in a multi writer scenarios
>
>                 Key: HUDI-4631
>                 URL: https://issues.apache.org/jira/browse/HUDI-4631
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: multi-writer
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
> lets say there are two writers from t0 to t5. so hudi fails w2 and succeeds w1. and user restarts w2 and for next 5 mins, lets say there are no other overlapping writers. So the same write from w2 will now succeed. so, whenever there is a write conflict and pipeline fails, all user needs to do is, just restart the pipeline or retry to ingest the same batch.
>
> Ask: can we add retries within hudi during such failures. Anyways, in most cases, users just restart the pipeline in such cases.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6854: [HUDI-4631] Adding retries to spark datasource writes on conflict failures:
hudi-bot commented on PR #6854: URL: https://github.com/apache/hudi/pull/6854#issuecomment-1264730004

## CI report:

* 3fd99e92b8be748fa52e025f8bc6bbf6681df359 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11966)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6854: [HUDI-4631] Adding retries to spark datasource writes on conflict failures:
hudi-bot commented on PR #6854: URL: https://github.com/apache/hudi/pull/6854#issuecomment-1264729199

## CI report:

* 3fd99e92b8be748fa52e025f8bc6bbf6681df359 UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] yihua commented on pull request #6851: [HUDI-4966] Add a partition extractor to handle partition values with slashes
yihua commented on PR #6851: URL: https://github.com/apache/hudi/pull/6851#issuecomment-1264718033

> @yihua thanks for looking into this. I think the user's problem can also be resolved by using `SlashEncodedDayPartitionValueExtractor`? probably need to follow the [migration guide](https://hudi.apache.org/releases/release-0.12.0#configuration-updates)

If the date output format is “/MM/dd”, yes. But we should also allow users to specify any format, e.g., “MM/dd/HH” or “MM/dd/”, which don't have any corresponding partition extractor that works. The new partition extractor addresses this problem. Even for “/MM/dd”, there is no need to specify an extractor after the fix.
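The core difficulty discussed above is that a date formatted with slashes (e.g. "MM/dd/HH") makes a naive split on '/' produce more path segments than partition keys. A rough, self-contained sketch of how the segments can be regrouped using the configured date format — a hypothetical simplification, not the extractor added in the PR — assuming a single repeated format and no trailing separator:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlashAwarePartitionValues {
    /**
     * Regroup path segments so a date value formatted with slashes
     * (e.g. format "MM/dd/HH" -> path "10/01/12") comes back as ONE value.
     * Assumes every partition value in the path uses the same format and
     * the format does not end in '/'.
     */
    public static List<String> extract(String partitionPath, String outputDateFormat) {
        // how many '/' one formatted value contains
        int slashesPerValue = (int) outputDateFormat.chars().filter(c -> c == '/').count();
        String[] segments = partitionPath.split("/");
        List<String> values = new ArrayList<>();
        int step = slashesPerValue + 1; // segments consumed per partition value
        for (int i = 0; i + step <= segments.length; i += step) {
            values.add(String.join("/", Arrays.copyOfRange(segments, i, i + step)));
        }
        return values;
    }
}
```

With this regrouping, the number of extracted values matches the number of partition keys again, which is exactly the mismatch reported in issue #6281.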
[GitHub] [hudi] nsivabalan opened a new pull request, #6854: [HUDI-4631] Adding retries to spark datasource writes on conflict failures:
nsivabalan opened a new pull request, #6854: URL: https://github.com/apache/hudi/pull/6854

### Change Logs

With Hudi's OCC, one of the commits is expected to fail if there are overlapping writes. From a user's standpoint, very likely the user will retry the failed write w/o any additional action. So, this adds retry functionality to spark datasource writes with hudi automatically in case of conflict failures.

### Impact

User experience w/ multi-writers will be improved with these automatic retries.

**Risk level: medium**

Users should enable the retries w/ caution since it could keep retrying the failed commit again until max retries are exhausted. Could result in some compute cost for large batches.

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- Configs introduced:
  - `hoodie.write.lock.retry.on.conflict.failures`: to enable retries on conflict failures. Default is false.
  - `hoodie.write.lock.num.retries.on.conflict.failures`: max number of times to retry on conflict failures.
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
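The semantics of the two configs described in the PR can be pictured as a generic retry loop around the whole write. This is an illustrative, self-contained sketch only — `ConflictException` and `writeWithRetries` are hypothetical stand-ins, not Hudi APIs — showing "retry the same batch until it succeeds or retries are exhausted":

```java
import java.util.function.Supplier;

public class ConflictRetry {

    // Hypothetical stand-in for a write-conflict failure.
    public static class ConflictException extends RuntimeException {}

    /**
     * Run the write; on a conflict failure, re-run the same batch.
     * retryEnabled mirrors hoodie.write.lock.retry.on.conflict.failures,
     * maxRetries mirrors hoodie.write.lock.num.retries.on.conflict.failures.
     */
    public static <T> T writeWithRetries(Supplier<T> write, boolean retryEnabled, int maxRetries) {
        int attempt = 0;
        while (true) {
            try {
                return write.get();
            } catch (ConflictException e) {
                if (!retryEnabled || attempt >= maxRetries) {
                    throw e; // exhausted: surface the conflict to the caller
                }
                attempt++; // re-ingest the same batch, as a user would do manually
            }
        }
    }
}
```

Note the risk called out in the PR description: with retries enabled, a batch that keeps conflicting is recomputed on every attempt, so large batches can burn compute until `maxRetries` is reached.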
[GitHub] [hudi] hudi-bot commented on pull request #6852: [MINOR] Fix testUpdateRejectForClustering
hudi-bot commented on PR #6852: URL: https://github.com/apache/hudi/pull/6852#issuecomment-1264701402

## CI report:

* 8a6ef11a8c573fa3cd49217157a6a8bb7f112395 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11965)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build