svn commit: r69529 - in /release/hudi/0.15.0: ./ hudi-0.15.0.src.tgz hudi-0.15.0.src.tgz.asc hudi-0.15.0.src.tgz.sha512
Author: yihua
Date: Tue Jun 4 06:13:26 2024
New Revision: 69529

Log:
Add Apache Hudi 0.15.0 source release

Added:
    release/hudi/0.15.0/
    release/hudi/0.15.0/hudi-0.15.0.src.tgz   (with props)
    release/hudi/0.15.0/hudi-0.15.0.src.tgz.asc
    release/hudi/0.15.0/hudi-0.15.0.src.tgz.sha512

Added: release/hudi/0.15.0/hudi-0.15.0.src.tgz
==============================================================================
Binary file - no diff available.

Propchange: release/hudi/0.15.0/hudi-0.15.0.src.tgz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: release/hudi/0.15.0/hudi-0.15.0.src.tgz.asc
==============================================================================
--- release/hudi/0.15.0/hudi-0.15.0.src.tgz.asc (added)
+++ release/hudi/0.15.0/hudi-0.15.0.src.tgz.asc Tue Jun 4 06:13:26 2024
@@ -0,0 +1,16 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCAAdFiEEiIqTQeYA64VQqs1e+xt1BPf3cMkFAmZerO0ACgkQ+xt1BPf3
+cMkCwQ/+KPVteMK7Q7gH5oCzOrGeafKkm9i4IySfSg+BkjyYwncTgyCnoOn+bQLm
+nQIiIPFpF+ROcnaD+iQqd+VPeuX/V5JS3MzeZddw/k1MueXuTGuqd2q84LNp3LYe
+hXmCmUCG6nQHWulcgRVIKrrfdBMUeZDTL3WR7JgirNJovOkrdP9k3prl5jeHes2v
+fEu1WwTq6NajPYtRM8csRMPVWIO5oDW9oJTF45OGQvjZTc0pb8AgudP6f7CRcutW
+QQz/EX4AFISe0azaHw7NHLJoR75h4Iz+Onzo520d5fKlowDKVEXVyYLgY9ThEFks
+GboqpO2LQDiGyzdwVM6KAfQsVOwNWJ+4VgItlWHlfe4NE/wZr61OfdU2fIonFGfu
+SN1Z3wKyJC5SmAeRsRRm9L791CbGab5D4ZYI+r2MCO0kvzOKG8yl+bPjCt94Qc62
+1TrnsaVz5k9CmIl3A8dxtiCtG/g/W/68qliKgXX8TivMJ8Gr2LFJCEygu9JplxUl
+R0sb6+4Bmftyu8NHF6j4LWcL6Ae3ySQf0oN8q3laekMjf4rrcqoGKzH/A6GAAdtO
+D17JDreky3ARU6aksbFTzoKM6nwKQTsva3gD6xjCmcaIMfoOTUa7QdCQgNQC2Afa
++EiQvGq7touqwlfxUwKLfyx0BMD1DdjPZ07a3oJln2odf5kI4TQ=
+=W9Sg
+-----END PGP SIGNATURE-----

Added: release/hudi/0.15.0/hudi-0.15.0.src.tgz.sha512
==============================================================================
--- release/hudi/0.15.0/hudi-0.15.0.src.tgz.sha512 (added)
+++ release/hudi/0.15.0/hudi-0.15.0.src.tgz.sha512 Tue Jun 4 06:13:26 2024
@@ -0,0 +1 @@
+5e8627c69b9c13c6e0b3849ca829d0a941ff55fd7fbc2697bc98bcde155ff28642ad6fc7f9f234e262acf13069667ba71f74437433edb59e7904bc4a5086f6bf hudi-0.15.0.src.tgz
svn commit: r69528 - in /dev/hudi/hudi-0.15.0: ./ hudi-0.15.0.src.tgz hudi-0.15.0.src.tgz.asc hudi-0.15.0.src.tgz.sha512
Author: yihua
Date: Tue Jun 4 06:00:37 2024
New Revision: 69528

Log:
Add Apache Hudi 0.15.0 source release

Added:
    dev/hudi/hudi-0.15.0/
    dev/hudi/hudi-0.15.0/hudi-0.15.0.src.tgz   (with props)
    dev/hudi/hudi-0.15.0/hudi-0.15.0.src.tgz.asc
    dev/hudi/hudi-0.15.0/hudi-0.15.0.src.tgz.sha512

Added: dev/hudi/hudi-0.15.0/hudi-0.15.0.src.tgz
==============================================================================
Binary file - no diff available.

Propchange: dev/hudi/hudi-0.15.0/hudi-0.15.0.src.tgz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/hudi/hudi-0.15.0/hudi-0.15.0.src.tgz.asc
==============================================================================
--- dev/hudi/hudi-0.15.0/hudi-0.15.0.src.tgz.asc (added)
+++ dev/hudi/hudi-0.15.0/hudi-0.15.0.src.tgz.asc Tue Jun 4 06:00:37 2024
@@ -0,0 +1,16 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCAAdFiEEiIqTQeYA64VQqs1e+xt1BPf3cMkFAmZerO0ACgkQ+xt1BPf3
+cMkCwQ/+KPVteMK7Q7gH5oCzOrGeafKkm9i4IySfSg+BkjyYwncTgyCnoOn+bQLm
+nQIiIPFpF+ROcnaD+iQqd+VPeuX/V5JS3MzeZddw/k1MueXuTGuqd2q84LNp3LYe
+hXmCmUCG6nQHWulcgRVIKrrfdBMUeZDTL3WR7JgirNJovOkrdP9k3prl5jeHes2v
+fEu1WwTq6NajPYtRM8csRMPVWIO5oDW9oJTF45OGQvjZTc0pb8AgudP6f7CRcutW
+QQz/EX4AFISe0azaHw7NHLJoR75h4Iz+Onzo520d5fKlowDKVEXVyYLgY9ThEFks
+GboqpO2LQDiGyzdwVM6KAfQsVOwNWJ+4VgItlWHlfe4NE/wZr61OfdU2fIonFGfu
+SN1Z3wKyJC5SmAeRsRRm9L791CbGab5D4ZYI+r2MCO0kvzOKG8yl+bPjCt94Qc62
+1TrnsaVz5k9CmIl3A8dxtiCtG/g/W/68qliKgXX8TivMJ8Gr2LFJCEygu9JplxUl
+R0sb6+4Bmftyu8NHF6j4LWcL6Ae3ySQf0oN8q3laekMjf4rrcqoGKzH/A6GAAdtO
+D17JDreky3ARU6aksbFTzoKM6nwKQTsva3gD6xjCmcaIMfoOTUa7QdCQgNQC2Afa
++EiQvGq7touqwlfxUwKLfyx0BMD1DdjPZ07a3oJln2odf5kI4TQ=
+=W9Sg
+-----END PGP SIGNATURE-----

Added: dev/hudi/hudi-0.15.0/hudi-0.15.0.src.tgz.sha512
==============================================================================
--- dev/hudi/hudi-0.15.0/hudi-0.15.0.src.tgz.sha512 (added)
+++ dev/hudi/hudi-0.15.0/hudi-0.15.0.src.tgz.sha512 Tue Jun 4 06:00:37 2024
@@ -0,0 +1 @@
+5e8627c69b9c13c6e0b3849ca829d0a941ff55fd7fbc2697bc98bcde155ff28642ad6fc7f9f234e262acf13069667ba71f74437433edb59e7904bc4a5086f6bf hudi-0.15.0.src.tgz
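For anyone picking up these staged artifacts, here is a minimal sketch (not the official release-validation script) of checking the published SHA-512 against the tarball in Java; it assumes both files sit in the working directory, and the PGP signature in the .asc file would still be verified separately with GPG against the Hudi KEYS file.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class VerifySha512 {
    public static void main(String[] args) throws Exception {
        // Expected digest: the first token of hudi-0.15.0.src.tgz.sha512.
        String expected = new String(Files.readAllBytes(Paths.get("hudi-0.15.0.src.tgz.sha512")))
                .trim().split("\\s+")[0];

        // Stream the tarball through SHA-512 rather than loading it in one go.
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(Paths.get("hudi-0.15.0.src.tgz"))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }

        // Render the digest as lowercase hex and compare with the published value.
        StringBuilder actual = new StringBuilder();
        for (byte b : md.digest()) {
            actual.append(String.format("%02x", b));
        }
        System.out.println(expected.equals(actual.toString()) ? "sha512 OK" : "sha512 MISMATCH");
    }
}
```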
(hudi) annotated tag release-0.15.0 updated (38832854be3 -> 3b2205a3e49)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to annotated tag release-0.15.0
in repository https://gitbox.apache.org/repos/asf/hudi.git

*** WARNING: tag release-0.15.0 was modified! ***

    from 38832854be3 (commit)
      to 3b2205a3e49 (tag)
 tagging 38832854be37cb78ad1edd87f515f01ca5ea6a8a (commit)
replaces release-0.15.0-rc3
      by Y Ethan Guo
      on Mon Jun 3 22:54:32 2024 -0700

- Log -----------------------------------------------------------------
0.15.0
-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEEDE0xZCfsqnGiCtlma+HUVMkPXqUFAmZerBgACgkQa+HUVMkP
XqW2kxAAlAQJ8w7WbKJCy4c/YY+lPOc2jX0b/Q4+lMoIV/aQJu8XDFtBNEin7GBE
b2g4iLEag0SDAu3dzpR5YqmPCrGPGfkP4ZHOeYWsuxXiHn/UKQGLZR3hBvjZQXSE
fpe2C0B/h7/U6u4In31cqAL4N9DNXJcQt+780R+SUJbRWbyqRZfU3ddHIOkOZNJg
lZ1UrJ7rFlF/VNUWpb6BDIHPm8+7p0jygt0YJKOtefL55tJSA2PNy/FhOAt6Fs2A
FXRTLQn7lbNb9DAX6xpu+wdgt/KGW1RzvPy4CnlfSC/3h4NZ7SDDOt3/nG7Sz0M+
5slho5iolnsMNGkuRebGH/V/zOPKhI7bLLrpAKtxrPtRyoOj+io0queqVR0fkuQZ
q315iUAGWRIzEZ9TbvR23dm2zitlUSgP0+2dETs7oVZ6c5jL1ojyZ1MlKcWMZS86
0xrv0vLNdnCHr0u+rCkhiaz/FqWxmnl6sQIRpicrHpGnZpYyNFauJ1TLS9OjOud7
86sMEK6atM8emXl+iYJfcEyJXpDHR0dncJrpDEk1XIDh/Bg2aZL1yvwbTFrDVp/x
PDrOC5JufvfY05XZHMU2HTaIquVfBlJ0GdsWHL45TJ5HtZ4hBPVL1rknjNjgkx4N
nF2b3zQuCzMr/+7D6VbDNIlWsAg8wvmY8bjKo4XtTEldnNUMR30=
=E0hr
-----END PGP SIGNATURE-----
-----------------------------------------------------------------------

No new revisions were added by this update.

Summary of changes:
(hudi) branch release-0.15.0 updated: [MINOR] Update release version to reflect published version 0.15.0
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch release-0.15.0
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/release-0.15.0 by this push:
     new 38832854be3 [MINOR] Update release version to reflect published version 0.15.0

38832854be3 is described below

commit 38832854be37cb78ad1edd87f515f01ca5ea6a8a
Author: Y Ethan Guo
AuthorDate: Mon Jun 3 22:49:24 2024 -0700

    [MINOR] Update release version to reflect published version 0.15.0
---
 docker/hoodie/hadoop/base/pom.xml | 2 +-
 docker/hoodie/hadoop/base_java11/pom.xml | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml | 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml | 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml | 2 +-
 docker/hoodie/hadoop/namenode/pom.xml | 2 +-
 docker/hoodie/hadoop/pom.xml | 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml | 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml | 2 +-
 docker/hoodie/hadoop/trinobase/pom.xml | 2 +-
 docker/hoodie/hadoop/trinocoordinator/pom.xml | 2 +-
 docker/hoodie/hadoop/trinoworker/pom.xml | 2 +-
 hudi-aws/pom.xml | 4 ++--
 hudi-cli/pom.xml | 2 +-
 hudi-client/hudi-client-common/pom.xml | 4 ++--
 hudi-client/hudi-flink-client/pom.xml | 4 ++--
 hudi-client/hudi-java-client/pom.xml | 4 ++--
 hudi-client/hudi-spark-client/pom.xml | 4 ++--
 hudi-client/pom.xml | 2 +-
 hudi-common/pom.xml | 2 +-
 hudi-examples/hudi-examples-common/pom.xml | 2 +-
 hudi-examples/hudi-examples-flink/pom.xml | 2 +-
 hudi-examples/hudi-examples-java/pom.xml | 2 +-
 hudi-examples/hudi-examples-spark/pom.xml | 2 +-
 hudi-examples/pom.xml | 2 +-
 hudi-flink-datasource/hudi-flink/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.14.x/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.15.x/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.16.x/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.17.x/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.18.x/pom.xml | 4 ++--
 hudi-flink-datasource/pom.xml | 4 ++--
 hudi-gcp/pom.xml | 2 +-
 hudi-hadoop-common/pom.xml | 2 +-
 hudi-hadoop-mr/pom.xml | 2 +-
 hudi-integ-test/pom.xml | 2 +-
 hudi-io/pom.xml | 2 +-
 hudi-kafka-connect/pom.xml | 4 ++--
 hudi-platform-service/hudi-metaserver/hudi-metaserver-client/pom.xml | 2 +-
 hudi-platform-service/hudi-metaserver/hudi-metaserver-server/pom.xml | 2 +-
 hudi-platform-service/hudi-metaserver/pom.xml | 4 ++--
 hudi-platform-service/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark-common/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark2-common/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark2/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark3-common/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark3.0.x/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark3.1.x/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark3.2.x/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark3.2plus-common/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark3.3.x/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark3.4.x/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark3.5.x/pom.xml | 4 ++--
Re: [PR] [MINOR] Correct order of test services start in `UtilitiesTestBase` [hudi]
hudi-bot commented on PR #11387:
URL: https://github.com/apache/hudi/pull/11387#issuecomment-2146639225

## CI report:

* 5c44161ef57a474b702de93418e0d601da490897 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24211)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7819] Fix OptionsResolver#allowCommitOnEmptyBatch default value bug [hudi]
hudi-bot commented on PR #11370:
URL: https://github.com/apache/hudi/pull/11370#issuecomment-2146639091

## CI report:

* deba838f4432bea1bf8b5ca914cfebd272821f24 UNKNOWN
* e1c37e6afc4869b7af4f5746ef5baf0512fba58f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24209)
Re: [PR] [MINOR] Correct order of test services start in `UtilitiesTestBase` [hudi]
hudi-bot commented on PR #11387:
URL: https://github.com/apache/hudi/pull/11387#issuecomment-2146631915

## CI report:

* 5c44161ef57a474b702de93418e0d601da490897 UNKNOWN
Re: [I] [SUPPORT]Trying to find org.apache.hudi.com.google.common.base.Preconditions when using ZookeeperBasedLockProvider [hudi]
Gatsby-Lee commented on issue #8723:
URL: https://github.com/apache/hudi/issues/8723#issuecomment-2146629737

I have the same issue with the Hudi Jar in the Amazon EMR Image 7.1.0. hmmm
Re: [PR] [HUDI-7819] Fix OptionsResolver#allowCommitOnEmptyBatch default value bug [hudi]
hudi-bot commented on PR #11370:
URL: https://github.com/apache/hudi/pull/11370#issuecomment-2146624701

## CI report:

* a7f8320f47046e457868ced0c56e98f2f35001e4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24187)
* deba838f4432bea1bf8b5ca914cfebd272821f24 UNKNOWN
* e1c37e6afc4869b7af4f5746ef5baf0512fba58f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24209)
[PR] [MINOR] Correct order of test services start in `UtilitiesTestBase` [hudi]
geserdugarov opened a new pull request, #11387:
URL: https://github.com/apache/hudi/pull/11387

For now, all uses of `initTestServices()` pass `needsHdfs=false`, `needsHive=false`, `needsZookeeper=false`, except `HoodieDeltaStreamerTestBase`, where `needsHive=true`. So there is no problem with `initTestServices()` today. But if we switch `needsZookeeper` to `true`, we will hit errors in `HoodieDeltaStreamerTestBase` when Hive tries to connect to Zookeeper.

The correct order of service startup is (sketched after this message):
1. Zookeeper
2. HDFS
3. Hive

Also fixed the ordering in `cleanUpUtilitiesTestServices()`, which is now the reverse of the corresponding initialization.

### Impact

No impact

### Risk level (write none, low medium or high below)

None

### Documentation Update

No need

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [x] CI passed
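A runnable sketch of the ordering described above; `TestService` and the named start/stop helpers are hypothetical stand-ins for the actual Zookeeper/HDFS/Hive test-service wrappers in `UtilitiesTestBase`, not the real API.

```java
public class OrderedTestServicesDemo {
    // Hypothetical stand-in for the real Zookeeper/HDFS/Hive test-service wrappers.
    interface TestService {
        void start();
        void stop();
    }

    static TestService named(String name) {
        return new TestService() {
            public void start() { System.out.println("starting " + name); }
            public void stop() { System.out.println("stopping " + name); }
        };
    }

    public static void main(String[] args) {
        TestService zookeeper = named("zookeeper");
        TestService hdfs = named("hdfs");
        TestService hive = named("hive");

        // Startup order from the PR description: Hive registers with Zookeeper
        // and stores data on HDFS, so both must be up before Hive starts.
        zookeeper.start();
        hdfs.start();
        hive.start();

        // Cleanup in reverse order of initialization.
        hive.stop();
        hdfs.stop();
        zookeeper.stop();
    }
}
```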
Re: [I] [SUPPORT] Hive Sync tool fails to sync Hoodi table written using Flink 1.16 to HMS [hudi]
alberttwong commented on issue #8848:
URL: https://github.com/apache/hudi/issues/8848#issuecomment-2146593572

@danny0405 I'm documenting my process at https://github.com/apache/incubator-xtable/discussions/457
Re: [PR] [HUDI-7819] Fix OptionsResolver#allowCommitOnEmptyBatch default value bug [hudi]
hudi-bot commented on PR #11370:
URL: https://github.com/apache/hudi/pull/11370#issuecomment-2146580024

## CI report:

* a7f8320f47046e457868ced0c56e98f2f35001e4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24187)
* deba838f4432bea1bf8b5ca914cfebd272821f24 UNKNOWN
* e1c37e6afc4869b7af4f5746ef5baf0512fba58f UNKNOWN
Re: [PR] [HUDI-7819] Fix OptionsResolver#allowCommitOnEmptyBatch default value bug [hudi]
hudi-bot commented on PR #11370:
URL: https://github.com/apache/hudi/pull/11370#issuecomment-2146572269

## CI report:

* a7f8320f47046e457868ced0c56e98f2f35001e4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24187)
* deba838f4432bea1bf8b5ca914cfebd272821f24 UNKNOWN
[jira] [Closed] (HUDI-7824) Fix incremental partitions fetch logic when savepoint is removed for Incr cleaner
[ https://issues.apache.org/jira/browse/HUDI-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit closed HUDI-7824.
-----------------------------
    Fix Version/s: 1.0.0
                   0.16.0
       Resolution: Fixed

> Fix incremental partitions fetch logic when savepoint is removed for Incr cleaner
> ---------------------------------------------------------------------------------
>
>          Key: HUDI-7824
>          URL: https://issues.apache.org/jira/browse/HUDI-7824
>      Project: Apache Hudi
>   Issue Type: Bug
>   Components: cleaning
>     Reporter: sivabalan narayanan
>     Assignee: sivabalan narayanan
>     Priority: Major
>       Labels: pull-request-available
>      Fix For: 1.0.0, 0.16.0
>
> With the incremental cleaner, if a savepoint is blocking cleanup of a commit and the cleaner has moved ahead w.r.t. the earliest commit to retain, then when the savepoint is later removed, the cleaner should account for cleaning up the commit of interest.
>
> Let's ensure the clean planner accounts for all partitions when such a savepoint removal is detected.
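A rough, self-contained sketch of the detection the description asks for; names and types here are illustrative only, and the real change lives in `CleanPlanner` (merged via #11375 below).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SavepointRemovalCheck {
    // Partitions guarded by a savepoint that was tracked in the last clean but
    // no longer exists must be reconsidered for cleaning.
    static List<String> partitionsFromRemovedSavepoints(
            Map<String, List<String>> savepointsTrackedInLastClean,
            Map<String, List<String>> currentSavepoints) {
        List<String> toClean = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : savepointsTrackedInLastClean.entrySet()) {
            if (!currentSavepoints.containsKey(e.getKey())) {
                toClean.addAll(e.getValue()); // savepoint removed since last clean
            }
        }
        return toClean;
    }

    public static void main(String[] args) {
        Map<String, List<String>> lastClean = new HashMap<>();
        lastClean.put("savepoint2", Collections.singletonList("partition1"));
        lastClean.put("savepoint3", Collections.singletonList("partition2"));
        // savepoint3 has been deleted since the last clean ran.
        Map<String, List<String>> current =
            Collections.singletonMap("savepoint2", Collections.singletonList("partition1"));
        System.out.println(partitionsFromRemovedSavepoints(lastClean, current)); // [partition2]
    }
}
```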
(hudi) branch master updated (d0c7de050a8 -> ffd4f52b9ab)
This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from d0c7de050a8 [HUDI-7822] Bump io.airlift:aircompressor from 0.25 to 0.27 (#11380)
     add ffd4f52b9ab [HUDI-7824] Fixing incr cleaner with savepoint removal (#11375)

No new revisions were added by this update.

Summary of changes:
 .../hudi/table/action/clean/CleanPlanner.java | 58 +-
 .../apache/hudi/table/action/TestCleanPlanner.java | 55 +++-
 2 files changed, 55 insertions(+), 58 deletions(-)
Re: [PR] [HUDI-7824] Fixing incr cleaner with savepoint removal [hudi]
codope merged PR #11375:
URL: https://github.com/apache/hudi/pull/11375
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
wombatu-kun commented on PR #11385:
URL: https://github.com/apache/hudi/pull/11385#issuecomment-2146538109

> Let me know if you prefer to address the `toString()` calls in this PR. Also, could you raise another PR against `branch-0.x` with the same changes?

Ok, I'll do it.
[jira] [Assigned] (HUDI-7782) Task not serializable due to DynamoDBBasedLockProvider and HiveMetastoreBasedLockProvider in clean action
[ https://issues.apache.org/jira/browse/HUDI-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vova Kolmakov reassigned HUDI-7782:
-----------------------------------
    Assignee: Vova Kolmakov

> Task not serializable due to DynamoDBBasedLockProvider and HiveMetastoreBasedLockProvider in clean action
> ---------------------------------------------------------------------------------------------------------
>
>          Key: HUDI-7782
>          URL: https://issues.apache.org/jira/browse/HUDI-7782
>      Project: Apache Hudi
>   Issue Type: Bug
>     Reporter: hector
>     Assignee: Vova Kolmakov
>     Priority: Major
>
> Caused by: java.io.NotSerializableException: org.apache.hudi.hive.transaction.lock.HiveMetastoreBasedLockProvider
> Serialization stack:
> - object not serializable (class: org.apache.hudi.hive.transaction.lock.HiveMetastoreBasedLockProvider, value: org.apache.hudi.hive.transaction.lock.HiveMetastoreBasedLockProvider@1355d2ca)
>
> Like HUDI-3638, which only fixed this issue for ZookeeperBasedLockProvider.
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
yihua commented on code in PR #11385:
URL: https://github.com/apache/hudi/pull/11385#discussion_r1625298506

##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java:
##########

@@ -150,8 +150,8 @@ private void validateBeforeScheduling() {
   private void abort(HoodieInstant indexInstant) {
     // delete metadata partition
     partitionIndexTypes.forEach(partitionType -> {
-      if (metadataPartitionExists(table.getMetaClient().getBasePath(), context, partitionType.getPartitionPath())) {
-        deleteMetadataPartition(table.getMetaClient().getBasePath(), context, partitionType.getPartitionPath());
+      if (metadataPartitionExists(table.getMetaClient().getBasePath().toString(), context, partitionType.getPartitionPath())) {

Review Comment:
   It would be great to get rid of most of the `toString()` calls.
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
yihua commented on code in PR #11385:
URL: https://github.com/apache/hudi/pull/11385#discussion_r1625295987

##########
hudi-cli/src/main/java/org/apache/hudi/cli/commands/FileSystemViewCommand.java:
##########

@@ -239,9 +239,9 @@ private HoodieTableFileSystemView buildFileSystemView(String globRegex, String m
     HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();
     HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
         .setConf(client.getStorageConf().newInstance())
-        .setBasePath(client.getBasePath()).setLoadActiveTimelineOnLoad(true).build();
+        .setBasePath(client.getBasePath().toString()).setLoadActiveTimelineOnLoad(true).build();

Review Comment:
   nit: can some of the `toString()` calls be avoided by directly passing the `StoragePath` instance?
Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]
hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2146485218

## CI report:

* dd052193e61243e0f2228fe8993851ac066dbdda Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24208)
[jira] [Assigned] (HUDI-7827) Bump io.airlift:aircompressor from 0.25 to 0.27
[ https://issues.apache.org/jira/browse/HUDI-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vova Kolmakov reassigned HUDI-7827:
-----------------------------------
    Assignee: (was: Vova Kolmakov)

> Bump io.airlift:aircompressor from 0.25 to 0.27
> -----------------------------------------------
>
>          Key: HUDI-7827
>          URL: https://issues.apache.org/jira/browse/HUDI-7827
>      Project: Apache Hudi
>   Issue Type: Bug
>     Reporter: Ethan Guo
>     Priority: Major
[jira] [Assigned] (HUDI-7827) Bump io.airlift:aircompressor from 0.25 to 0.27
[ https://issues.apache.org/jira/browse/HUDI-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vova Kolmakov reassigned HUDI-7827:
-----------------------------------
    Assignee: Vova Kolmakov

> Bump io.airlift:aircompressor from 0.25 to 0.27
> -----------------------------------------------
>
>          Key: HUDI-7827
>          URL: https://issues.apache.org/jira/browse/HUDI-7827
>      Project: Apache Hudi
>   Issue Type: Bug
>     Reporter: Ethan Guo
>     Assignee: Vova Kolmakov
>     Priority: Major
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
hudi-bot commented on PR #11385:
URL: https://github.com/apache/hudi/pull/11385#issuecomment-2146435632

## CI report:

* 064b5310f709e5886dd7e278d1ebf9cdcfbe70c7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24206)
Re: [PR] [HUDI-7824] Fixing incr cleaner with savepoint removal [hudi]
hudi-bot commented on PR #11375:
URL: https://github.com/apache/hudi/pull/11375#issuecomment-2146435535

## CI report:

* f9f468cff5a5cb05c822e3dc0c349b60217fb208 UNKNOWN
* 8475945ae37ea4e76e388a89f1c0c908bc943508 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24207)
Re: [PR] [HUDI-7825]Support Report pending clustering and compaction plan metric [hudi]
danny0405 commented on code in PR #11377:
URL: https://github.com/apache/hudi/pull/11377#discussion_r1625240199

##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java:
##########

@@ -272,6 +275,28 @@ public void notifyCheckpointComplete(long checkpointId) {
     );
   }

+  private void emitCompactionAndClusteringMetrics(Configuration conf,
+      HoodieTableMetaClient metaClient, HoodieFlinkWriteClient writeClient) {
+    if (conf.getBoolean(FlinkOptions.CLUSTERING_SCHEDULE_ENABLED)
+        && !conf.getBoolean(FlinkOptions.CLUSTERING_ASYNC_ENABLED)) {
+      HoodieTimeline pendingReplaceTimeline = metaClient.getActiveTimeline()
+          .filterPendingReplaceTimeline();
+      HoodieMetrics metrics = writeClient.getMetrics();
+      if (metrics != null) {
+        metrics.setPendingClusteringCount(pendingReplaceTimeline.countInstants());
+      }
+    }
+    if (conf.getBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED)

Review Comment:
   Yeah, you are right, there are two sets of metrics for Flink now; we might need to unify them.
Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]
the-other-tim-brown commented on code in PR #11152:
URL: https://github.com/apache/hudi/pull/11152#discussion_r1625236004

##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/HoodieWriterClientTestHarness.java:
##########

@@ -165,71 +247,1183 @@ public HoodieWriteConfig.Builder getConfigBuilder(String schemaStr, HoodieIndex.
     return builder;
   }

-  public void assertPartitionMetadataForRecords(String basePath, List inputRecords,
-                                                HoodieStorage storage) throws IOException {
-    Set partitionPathSet = inputRecords.stream()
-        .map(HoodieRecord::getPartitionPath)
-        .collect(Collectors.toSet());
-    assertPartitionMetadata(basePath, partitionPathSet.stream().toArray(String[]::new), storage);
+  // Functional Interfaces for passing lambda and Hoodie Write API contexts
+
+  @FunctionalInterface
+  public interface Function2 {

Review Comment:
   I'm fine with keeping it as is. I didn't realize the difference
Re: [PR] [HUDI-7824] Fixing incr cleaner with savepoint removal [hudi]
hudi-bot commented on PR #11375:
URL: https://github.com/apache/hudi/pull/11375#issuecomment-2146392324

## CI report:

* f9f468cff5a5cb05c822e3dc0c349b60217fb208 UNKNOWN
* eb047ef0d7f79002d87338b776e03923de161dee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24185)
* 8475945ae37ea4e76e388a89f1c0c908bc943508 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24207)
Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]
hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2146391979

## CI report:

* 9946df9dd0aca2e4e8613b36265462d76397c8d8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24194)
* dd052193e61243e0f2228fe8993851ac066dbdda Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24208)
Re: [PR] [HUDI-7824] Fixing incr cleaner with savepoint removal [hudi]
hudi-bot commented on PR #11375:
URL: https://github.com/apache/hudi/pull/11375#issuecomment-2146386048

## CI report:

* f9f468cff5a5cb05c822e3dc0c349b60217fb208 UNKNOWN
* eb047ef0d7f79002d87338b776e03923de161dee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24185)
* 8475945ae37ea4e76e388a89f1c0c908bc943508 UNKNOWN
Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]
hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2146385756

## CI report:

* 9946df9dd0aca2e4e8613b36265462d76397c8d8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24194)
* dd052193e61243e0f2228fe8993851ac066dbdda UNKNOWN
Re: [PR] [HUDI-5956] fix spark DAG ui when write [hudi]
danny0405 commented on code in PR #11376:
URL: https://github.com/apache/hudi/pull/11376#discussion_r1625210183

##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########

@@ -174,32 +174,43 @@ class HoodieSparkSqlWriterInternal {
           sourceDf: DataFrame,
           streamingWritesParamsOpt: Option[StreamingWriteParams] = Option.empty,
           hoodieWriteClient: Option[SparkRDDWriteClient[_]] = Option.empty):
-  (Boolean, HOption[String], HOption[String], HOption[String], SparkRDDWriteClient[_], HoodieTableConfig) = {
-    var succeeded = false
-    var counter = 0
-    val maxRetry: Integer = Integer.parseInt(optParams.getOrElse(HoodieWriteConfig.NUM_RETRIES_ON_CONFLICT_FAILURES.key(), HoodieWriteConfig.NUM_RETRIES_ON_CONFLICT_FAILURES.defaultValue().toString))
-    var toReturn: (Boolean, HOption[String], HOption[String], HOption[String], SparkRDDWriteClient[_], HoodieTableConfig) = null
-    while (counter <= maxRetry && !succeeded) {
-      try {
-        toReturn = writeInternal(sqlContext, mode, optParams, sourceDf, streamingWritesParamsOpt, hoodieWriteClient)
-        if (counter > 0) {
-          log.warn(s"Succeeded with attempt no $counter")
-        }
-        succeeded = true
-      } catch {
-        case e: HoodieWriteConflictException =>
-          val writeConcurrencyMode = optParams.getOrElse(HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(), HoodieWriteConfig.WRITE_CONCURRENCY_MODE.defaultValue())
-          if (WriteConcurrencyMode.supportsMultiWriter(writeConcurrencyMode) && counter < maxRetry) {
-            counter += 1
-            log.warn(s"Conflict found. Retrying again for attempt no $counter")
-          } else {
-            throw e
+    val retryWrite: () => (Boolean, HOption[String], HOption[String], HOption[String], SparkRDDWriteClient[_], HoodieTableConfig) = () => {
+      var succeeded = false
+      var counter = 0
+      val maxRetry: Integer = Integer.parseInt(optParams.getOrElse(HoodieWriteConfig.NUM_RETRIES_ON_CONFLICT_FAILURES.key(), HoodieWriteConfig.NUM_RETRIES_ON_CONFLICT_FAILURES.defaultValue().toString))
+      var toReturn: (Boolean, HOption[String], HOption[String], HOption[String], SparkRDDWriteClient[_], HoodieTableConfig) = null
+
+      while (counter <= maxRetry && !succeeded) {
+        try {
+          toReturn = writeInternal(sqlContext, mode, optParams, sourceDf, streamingWritesParamsOpt, hoodieWriteClient)
+          if (counter > 0) {
+            log.warn(s"Succeeded with attempt no $counter")
+          }
+          succeeded = true
+        } catch {
+          case e: HoodieWriteConflictException =>
+            val writeConcurrencyMode = optParams.getOrElse(HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(), HoodieWriteConfig.WRITE_CONCURRENCY_MODE.defaultValue())
+            if (WriteConcurrencyMode.supportsMultiWriter(writeConcurrencyMode) && counter < maxRetry) {
+              counter += 1
+              log.warn(s"Conflict found. Retrying again for attempt no $counter")
+            } else {
+              throw e
+            }
+        }
+      }
+      toReturn
+    }
+
+    val executionId = getExecutionId(sqlContext.sparkContext, sourceDf.queryExecution)
+    if (executionId.isEmpty) {
+      sparkAdapter.sqlExecutionWithNewExecutionId(sourceDf.sparkSession, sourceDf.queryExecution, Option("Hudi Command"))(

Review Comment:
   @jonvex do you have interest in reviewing?
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
danny0405 commented on code in PR #11162:
URL: https://github.com/apache/hudi/pull/11162#discussion_r1625209429

##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/common/HoodieSparkEngineContext.java:
##########

@@ -229,6 +231,13 @@ public void cancelAllJobs() {
     javaSparkContext.cancelAllJobs();
   }

+  @Override
+  public O aggregate(HoodieData data, O zeroValue, Functions.Function2 seqOp, Functions.Function2 combOp) {
+    Function2 seqOpFunc = seqOp::apply;
+    Function2 combOpFunc = combOp::apply;
+    return HoodieJavaRDD.getJavaRDD(data).aggregate(zeroValue, seqOpFunc, combOpFunc);

Review Comment:
   I didn't see changes for `HoodieMetadataMergedLogRecordScanner` that switch from a string cache key to a serializable one, so how do we support non-string secondary index fields?
Re: [I] [SUPPORT] Hive Sync tool fails to sync Hoodi table written using Flink 1.16 to HMS [hudi]
danny0405 commented on issue #8848:
URL: https://github.com/apache/hudi/issues/8848#issuecomment-2146377029

@alberttwong Did you package the jar manually with the hive profile?
Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]
wombatu-kun commented on code in PR #11152:
URL: https://github.com/apache/hudi/pull/11152#discussion_r1625204089

##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/HoodieWriterClientTestHarness.java:
##########

@@ -165,71 +247,1183 @@ public HoodieWriteConfig.Builder getConfigBuilder(String schemaStr, HoodieIndex.
     return builder;
   }

-  public void assertPartitionMetadataForRecords(String basePath, List inputRecords,
-                                                HoodieStorage storage) throws IOException {
-    Set partitionPathSet = inputRecords.stream()
-        .map(HoodieRecord::getPartitionPath)
-        .collect(Collectors.toSet());
-    assertPartitionMetadata(basePath, partitionPathSet.stream().toArray(String[]::new), storage);
+  // Functional Interfaces for passing lambda and Hoodie Write API contexts
+
+  @FunctionalInterface
+  public interface Function2 {

Review Comment:
   The order of the type parameters is different from BiFunction: here the result type comes first, but in BiFunction it is last. I did not add Function2; it was already in the code before this refactoring and is used a lot (>50 usages). There is also Function3, with the result type in first place and >80 usages. If we replaced Function2 with BiFunction, we should also reorder the type params in the Function3 declaration and its usages for consistency. Is it really necessary?
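For reference, the ordering (and checked-exception) difference discussed above, side by side; `Function2` here is a local copy of the harness interface quoted in the diff, so the snippet compiles on its own.

```java
import java.io.IOException;
import java.util.function.BiFunction;

public class FunctionOrderDemo {
    // Copy of the harness interface quoted above: the result type R comes first,
    // and apply() may throw IOException (BiFunction's apply may not).
    @FunctionalInterface
    interface Function2<R, T1, T2> {
        R apply(T1 v1, T2 v2) throws IOException;
    }

    public static void main(String[] args) throws IOException {
        // JDK BiFunction: the result type comes last.
        BiFunction<String, Integer, Boolean> jdk = (s, i) -> s.length() == i;
        // Harness Function2: same lambda, but the type arguments are reordered.
        Function2<Boolean, String, Integer> harness = (s, i) -> s.length() == i;

        System.out.println(jdk.apply("hudi", 4));      // true
        System.out.println(harness.apply("hudi", 4));  // true
    }
}
```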
Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]
wombatu-kun commented on code in PR #11152:
URL: https://github.com/apache/hudi/pull/11152#discussion_r1625200070

##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/HoodieWriterClientTestHarness.java:
##########

@@ -165,71 +247,1183 @@ public HoodieWriteConfig.Builder getConfigBuilder(String schemaStr, HoodieIndex.
     return builder;
   }

+  /* Auxiliary methods for testing CopyOnWriteStorage with Spark and Java clients
+     to avoid code duplication in TestHoodieClientOnCopyOnWriteStorage and TestHoodieJavaClientOnCopyOnWriteStorage */
+
+  protected List writeAndVerifyBatch(BaseHoodieWriteClient client, List inserts, String commitTime, boolean populateMetaFields, boolean autoCommitOff) throws IOException {
+    // override in subclasses if needed
+    return Collections.emptyList();

Review Comment:
   Ok, made it abstract, moved its implementations from TestHoodieJavaClientOnCopyOnWriteStorage to HoodieJavaClientTestHarness, and from TestHoodieClientOnCopyOnWriteStorage to HoodieSparkClientTestHarness.
Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]
wombatu-kun commented on code in PR #11152:
URL: https://github.com/apache/hudi/pull/11152#discussion_r1625198786

##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/testutils/Assertions.java:
##########

@@ -51,4 +67,88 @@ public static void assertFileSizesEqual(List statuses, CheckedFunct
         assertEquals(fileSizeGetter.apply(status), status.getStat().getFileSizeInBytes())));
   }

+  public static void assertPartitionMetadataForRecords(String basePath, List inputRecords,
+                                                       HoodieStorage storage) throws IOException {
+    Set partitionPathSet = inputRecords.stream()
+        .map(HoodieRecord::getPartitionPath)
+        .collect(Collectors.toSet());
+    assertPartitionMetadata(basePath, partitionPathSet.stream().toArray(String[]::new), storage);

Review Comment:
   done
Re: [PR] [HUDI-7824] Fixing incr cleaner with savepoint removal [hudi]
nsivabalan commented on code in PR #11375:
URL: https://github.com/apache/hudi/pull/11375#discussion_r1625198742

##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java:
##########

@@ -393,23 +417,23 @@ static Stream keepLatestByHoursOrCommitsArgsIncrCleanPartitions() {
     Map> latestSavepoints = new HashMap<>();
     latestSavepoints.put(savepoint2, Collections.singletonList(PARTITION1));
     latestSavepoints.put(savepoint3, Collections.singletonList(PARTITION1));
-    arguments.addAll(buildArgumentsForCleanByHoursAndCommitsIncrCleanPartitionsCases(
+    arguments.addAll(buildArgumentsForCleanByHoursAndCommitsIncrCleanPartitionsCases(true,
         earliestInstant, lastCompletedInLastClean, lastCleanInstant, earliestInstantInLastClean, Collections.singletonList(PARTITION1),
         Collections.singletonMap(savepoint2, Collections.singletonList(PARTITION1)), activeInstantsPartitionsMap2, latestSavepoints, twoPartitionsInActiveTimeline, false));

     // 2 savepoints were tracked in previous clean. one of them is removed in latest. A partition which was part of the removed savepoint should be added in final
     // list of partitions to clean
     Map> previousSavepoints = new HashMap<>();
-    latestSavepoints.put(savepoint2, Collections.singletonList(PARTITION1));
-    latestSavepoints.put(savepoint3, Collections.singletonList(PARTITION2));
-    arguments.addAll(buildArgumentsForCleanByHoursAndCommitsIncrCleanPartitionsCases(
+    previousSavepoints.put(savepoint2, Collections.singletonList(PARTITION1));
+    previousSavepoints.put(savepoint3, Collections.singletonList(PARTITION2));
+    arguments.addAll(buildArgumentsForCleanByHoursAndCommitsIncrCleanPartitionsCases(true,
         earliestInstant, lastCompletedInLastClean, lastCleanInstant, earliestInstantInLastClean, Collections.singletonList(PARTITION1),
-        previousSavepoints, activeInstantsPartitionsMap2, Collections.singletonMap(savepoint3, Collections.singletonList(PARTITION2)), twoPartitionsInActiveTimeline, false));
+        previousSavepoints, activeInstantsPartitionsMap2, Collections.singletonMap(savepoint3, Collections.singletonList(PARTITION2)), threePartitionsInActiveTimeline, false));

Review Comment:
   sure. makes sense
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
hudi-bot commented on PR #11385:
URL: https://github.com/apache/hudi/pull/11385#issuecomment-2146337860

## CI report:

* 8605f0fd0fa5bc1c82a26eac8147fc521040f53a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24201)
* 064b5310f709e5886dd7e278d1ebf9cdcfbe70c7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24206)
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
hudi-bot commented on PR #11385:
URL: https://github.com/apache/hudi/pull/11385#issuecomment-2146330844

## CI report:

* 8605f0fd0fa5bc1c82a26eac8147fc521040f53a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24201)
* 064b5310f709e5886dd7e278d1ebf9cdcfbe70c7 UNKNOWN
Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]
the-other-tim-brown commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2146293807

Just a couple of minor nitpicks, but the refactor looks good to me.
Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]
the-other-tim-brown commented on code in PR #11152:
URL: https://github.com/apache/hudi/pull/11152#discussion_r1625155568

##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/testutils/Assertions.java:
##########

@@ -51,4 +67,88 @@ public static void assertFileSizesEqual(List statuses, CheckedFunct
         assertEquals(fileSizeGetter.apply(status), status.getStat().getFileSizeInBytes())));
   }

+  public static void assertPartitionMetadataForRecords(String basePath, List inputRecords,
+                                                       HoodieStorage storage) throws IOException {
+    Set partitionPathSet = inputRecords.stream()
+        .map(HoodieRecord::getPartitionPath)
+        .collect(Collectors.toSet());
+    assertPartitionMetadata(basePath, partitionPathSet.stream().toArray(String[]::new), storage);

Review Comment:
   For this line and 83, you can simplify this by not collecting to a set and just use `distinct()` on the stream

##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/HoodieWriterClientTestHarness.java:
##########

@@ -165,71 +247,1183 @@ public HoodieWriteConfig.Builder getConfigBuilder(String schemaStr, HoodieIndex.
     return builder;
   }

+  /* Auxiliary methods for testing CopyOnWriteStorage with Spark and Java clients
+     to avoid code duplication in TestHoodieClientOnCopyOnWriteStorage and TestHoodieJavaClientOnCopyOnWriteStorage */
+
+  protected List writeAndVerifyBatch(BaseHoodieWriteClient client, List inserts, String commitTime, boolean populateMetaFields, boolean autoCommitOff) throws IOException {
+    // override in subclasses if needed
+    return Collections.emptyList();

Review Comment:
   Should this just be abstract? Returning empty list by default may be misleading to other developers in the future that extend this class

##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/HoodieWriterClientTestHarness.java:
##########

@@ -165,71 +247,1183 @@ public HoodieWriteConfig.Builder getConfigBuilder(String schemaStr, HoodieIndex.
     return builder;
   }

+  // Functional Interfaces for passing lambda and Hoodie Write API contexts
+
+  @FunctionalInterface
+  public interface Function2 {

Review Comment:
   There is already a BiFunction in java that does the same thing, can we just use that?
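A runnable sketch of the `distinct()` simplification suggested in the first comment, with a plain string list standing in for `HoodieRecord::getPartitionPath` over a batch of records.

```java
import java.util.Arrays;
import java.util.List;

public class DistinctDemo {
    public static void main(String[] args) {
        // Stand-in for the partition paths extracted from a batch of records.
        List<String> partitionPaths = Arrays.asList("2024/06/01", "2024/06/02", "2024/06/01");

        // Suggested form: deduplicate in the stream and materialize the array
        // directly, instead of collecting to a Set and re-streaming it.
        String[] distinctPartitions =
            partitionPaths.stream().distinct().toArray(String[]::new);

        System.out.println(Arrays.toString(distinctPartitions)); // [2024/06/01, 2024/06/02]
    }
}
```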
Re: [PR] [HUDI-7823] Simplify dependency management on exclusions [hudi]
hudi-bot commented on PR #11374:
URL: https://github.com/apache/hudi/pull/11374#issuecomment-2146288529

## CI report:

* 05fab0df29530420f0a77abf46be996b70c1bc25 UNKNOWN
* 1e6955bbac8cc18f6774360c7b3ef4e307c1c397 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24205)
Re: [PR] [HUDI-7823] Simplify dependency management on exclusions [hudi]
hudi-bot commented on PR #11374:
URL: https://github.com/apache/hudi/pull/11374#issuecomment-2146281871

## CI report:

* 05fab0df29530420f0a77abf46be996b70c1bc25 UNKNOWN
* 70cb1fe3bf55810cb26a89147fad92594537388c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24204)
* 1e6955bbac8cc18f6774360c7b3ef4e307c1c397 UNKNOWN
Re: [PR] [HUDI-7823] Simplify dependency management on exclusions [hudi]
hudi-bot commented on PR #11374:
URL: https://github.com/apache/hudi/pull/11374#issuecomment-2146274832

## CI report:

* 05fab0df29530420f0a77abf46be996b70c1bc25 UNKNOWN
* 6abd40f1b77feb86cdc95d58cd2285c546a1f63e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24180)
* 70cb1fe3bf55810cb26a89147fad92594537388c UNKNOWN
Re: [PR] [HUDI-7824] Fixing incr cleaner with savepoint removal [hudi]
the-other-tim-brown commented on code in PR #11375:
URL: https://github.com/apache/hudi/pull/11375#discussion_r1625111858

##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java:
##########

@@ -160,6 +178,12 @@ void testPartitionsForIncrCleaning(HoodieWriteConfig config, String earliestInst
     mockLastCleanCommit(mockHoodieTable, lastCleanInstant, earliestInstantsInLastClean, activeTimeline, cleanMetadataOptionPair);
     mockFewActiveInstants(mockHoodieTable, activeInstantsPartitions, savepointsTrackedInLastClean, areCommitsForSavepointsRemoved);

+    // mock getAllPartitions
+    HoodieStorage storage = mock(HoodieStorage.class);
+    when(mockHoodieTable.getStorage()).thenReturn(storage);
+    mockedStatic.when(() -> FSUtils.getAllPartitionPaths(context, storage, config.getMetadataConfig(), config.getBasePath()))

Review Comment:
   we could also update the CleanPlanner to use `hoodieTable.getMetadataTable().getAllPartitionPaths()` which could make the test setup cleaner as well

##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java:
##########

@@ -393,23 +417,23 @@ static Stream keepLatestByHoursOrCommitsArgsIncrCleanPartitions() {
     Map> latestSavepoints = new HashMap<>();
     latestSavepoints.put(savepoint2, Collections.singletonList(PARTITION1));
     latestSavepoints.put(savepoint3, Collections.singletonList(PARTITION1));
-    arguments.addAll(buildArgumentsForCleanByHoursAndCommitsIncrCleanPartitionsCases(
+    arguments.addAll(buildArgumentsForCleanByHoursAndCommitsIncrCleanPartitionsCases(true,
         earliestInstant, lastCompletedInLastClean, lastCleanInstant, earliestInstantInLastClean, Collections.singletonList(PARTITION1),
         Collections.singletonMap(savepoint2, Collections.singletonList(PARTITION1)), activeInstantsPartitionsMap2, latestSavepoints, twoPartitionsInActiveTimeline, false));

     // 2 savepoints were tracked in previous clean. one of them is removed in latest. A partition which was part of the removed savepoint should be added in final
     // list of partitions to clean
     Map> previousSavepoints = new HashMap<>();
-    latestSavepoints.put(savepoint2, Collections.singletonList(PARTITION1));
-    latestSavepoints.put(savepoint3, Collections.singletonList(PARTITION2));
-    arguments.addAll(buildArgumentsForCleanByHoursAndCommitsIncrCleanPartitionsCases(
+    previousSavepoints.put(savepoint2, Collections.singletonList(PARTITION1));
+    previousSavepoints.put(savepoint3, Collections.singletonList(PARTITION2));
+    arguments.addAll(buildArgumentsForCleanByHoursAndCommitsIncrCleanPartitionsCases(true,
         earliestInstant, lastCompletedInLastClean, lastCleanInstant, earliestInstantInLastClean, Collections.singletonList(PARTITION1),
-        previousSavepoints, activeInstantsPartitionsMap2, Collections.singletonMap(savepoint3, Collections.singletonList(PARTITION2)), twoPartitionsInActiveTimeline, false));
+        previousSavepoints, activeInstantsPartitionsMap2, Collections.singletonMap(savepoint3, Collections.singletonList(PARTITION2)), threePartitionsInActiveTimeline, false));

Review Comment:
   Should the descriptions in the comments be updated to match the changes in the expected partitions?
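A rough sketch of the setup the first comment proposes, assuming the accessor it names can be mocked as an instance instead of the static `FSUtils` helper; the interfaces below are illustrative stand-ins, not the actual Hudi types or signatures.

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-ins for the real HoodieTable/HoodieTableMetadata types.
interface HoodieTableMetadataLike {
    List<String> getAllPartitionPaths() throws IOException;
}

interface HoodieTableLike {
    HoodieTableMetadataLike getMetadataTable();
}

class CleanPlannerTestSetupSketch {
    static void mockPartitions(HoodieTableLike mockHoodieTable) throws IOException {
        // Instance mocks replace the MockedStatic setup around FSUtils.getAllPartitionPaths.
        HoodieTableMetadataLike metadataTable = mock(HoodieTableMetadataLike.class);
        when(mockHoodieTable.getMetadataTable()).thenReturn(metadataTable);
        when(metadataTable.getAllPartitionPaths()).thenReturn(Arrays.asList("partition1", "partition2"));
    }
}
```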
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
hudi-bot commented on PR #11162: URL: https://github.com/apache/hudi/pull/11162#issuecomment-2145850157 ## CI report: * b342d8f8e10f77419bf1bd0bc9f626a596ad65f9 UNKNOWN * ab2875d506fbb642636ca10d044fa9b9e5c951ae Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24203) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Implement Support for Ursa Frame ingestion pipelines [hudi]
balaji-varadarajan closed pull request #11386: Implement Support for Ursa Frame ingestion pipelines URL: https://github.com/apache/hudi/pull/11386 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
hudi-bot commented on PR #11162: URL: https://github.com/apache/hudi/pull/11162#issuecomment-2145748432 ## CI report: * b342d8f8e10f77419bf1bd0bc9f626a596ad65f9 UNKNOWN * b00831e0a0506714d27bc2a64e58084b357a83cc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24196) * ab2875d506fbb642636ca10d044fa9b9e5c951ae Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24203) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Hive Sync tool fails to sync Hoodi table written using Flink 1.16 to HMS [hudi]
alberttwong commented on issue #8848: URL: https://github.com/apache/hudi/issues/8848#issuecomment-2145746488 After adding in https://mvnrepository.com/artifact/org.apache.thrift/libfb303, I now hit a URISyntaxException:
```
Running Command : java -cp /hive/lib/hive-metastore-3.1.3.jar::/hive/lib/hive-service-3.1.3.jar::/hive/lib/hive-exec-3.1.3.jar::/hive/lib/hive-jdbc-3.1.3.jar:/hive/lib/hive-jdbc-handler-3.1.3.jar::/hive/lib/jackson-annotations-2.12.0.jar:/hive/lib/jackson-core-2.12.0.jar:/hive/lib/jackson-core-asl-1.9.13.jar:/hive/lib/jackson-databind-2.12.0.jar:/hive/lib/jackson-dataformat-smile-2.12.0.jar:/hive/lib/jackson-mapper-asl-1.9.13.jar:/hive/lib/jackson-module-scala_2.11-2.12.0.jar::/hadoop/share/hadoop/common/*:/hadoop/share/hadoop/mapreduce/*:/hadoop/share/hadoop/hdfs/*:/hadoop/share/hadoop/common/lib/*:/hadoop/share/hadoop/hdfs/lib/*:/root/.ivy2/jars/*:/hadoop/etc/hadoop:/opt/hudi/hudi-sync/hudi-hive-sync/../../packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-1.0.0-SNAPSHOT.jar org.apache.hudi.hive.HiveSyncTool --metastore-uris thrift://hive-metastore:9083 --partitioned-by city --base-path s3a://warehouse/people --database hudi_db --table people --sync-mode hms
2024-06-03 17:15:25,270 INFO [main] conf.HiveConf (HiveConf.java:findConfigFile(187)) - Found configuration file null
2024-06-03 17:15:25,444 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(60)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-06-03 17:15:25,550 INFO [main] impl.MetricsConfig (MetricsConfig.java:loadFirst(120)) - Loaded properties from hadoop-metrics2.properties
2024-06-03 17:15:25,581 INFO [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(378)) - Scheduled Metric snapshot period at 10 second(s).
2024-06-03 17:15:25,581 INFO [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:start(191)) - s3a-file-system metrics system started
2024-06-03 17:15:26,025 INFO [main] table.HoodieTableMetaClient (HoodieTableMetaClient.java:<init>(148)) - Loading HoodieTableMetaClient from s3a://warehouse/people
2024-06-03 17:15:26,120 INFO [main] table.HoodieTableConfig (HoodieTableConfig.java:<init>(309)) - Loading table properties from s3a://warehouse/people/.hoodie/hoodie.properties
2024-06-03 17:15:26,140 INFO [main] table.HoodieTableMetaClient (HoodieTableMetaClient.java:<init>(169)) - Finished Loading Table of type COPY_ON_WRITE(version=1) from s3a://warehouse/people
2024-06-03 17:15:26,140 INFO [main] table.HoodieTableMetaClient (HoodieTableMetaClient.java:<init>(171)) - Loading Active commit timeline for s3a://warehouse/people
2024-06-03 17:15:26,159 INFO [main] timeline.HoodieActiveTimeline (HoodieActiveTimeline.java:<init>(177)) - Loaded instants upto : Option{val=[20240603170053432__commit__COMPLETED]}
2024-06-03 17:15:26,229 ERROR [main] utils.MetaStoreUtils (MetaStoreUtils.java:logAndThrowMetaException(166)) - Got exception: java.net.URISyntaxException Illegal character in hostname at index 35: thrift://demo-hive-metastore-1.demo_default:9083
java.net.URISyntaxException: Illegal character in hostname at index 35: thrift://demo-hive-metastore-1.demo_default:9083
```
 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
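The failing value is the Docker Compose hostname `demo-hive-metastore-1.demo_default`: the underscore in the network segment is not a legal hostname character, which is what the metastore client trips over. A minimal, Hudi-independent reproduction (the URI string is taken from the log above; `MetastoreUriCheck` is just an illustrative class name):

```
import java.net.URI;
import java.net.URISyntaxException;

public class MetastoreUriCheck {
  public static void main(String[] args) {
    try {
      // The single-argument constructor tolerates the underscore (it falls back
      // to a registry-based authority), but forcing a server-based authority,
      // as the metastore client path effectively does, rejects it: RFC 2396
      // hostnames may contain only letters, digits, and hyphens.
      new URI("thrift://demo-hive-metastore-1.demo_default:9083").parseServerAuthority();
    } catch (URISyntaxException e) {
      // Prints: Illegal character in hostname at index 35: thrift://demo-hive-metastore-1.demo_default:9083
      System.out.println(e.getMessage());
    }
  }
}
```

If that diagnosis holds, renaming the Compose project or network so the generated domain contains no underscores should avoid the exception.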
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
hudi-bot commented on PR #11162: URL: https://github.com/apache/hudi/pull/11162#issuecomment-2145735114 ## CI report: * b342d8f8e10f77419bf1bd0bc9f626a596ad65f9 UNKNOWN * b00831e0a0506714d27bc2a64e58084b357a83cc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24196) * ab2875d506fbb642636ca10d044fa9b9e5c951ae UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Hive Sync tool fails to sync Hoodi table written using Flink 1.16 to HMS [hudi]
alberttwong commented on issue #8848: URL: https://github.com/apache/hudi/issues/8848#issuecomment-2145731084 After adding https://mvnrepository.com/artifact/org.apache.calcite/calcite-core, I ran into:
```
root@spark:/opt/hudi/hudi-sync/hudi-hive-sync# ./run_sync_tool.sh --metastore-uris 'thrift://hive-metastore:9083' --partitioned-by city --base-path 's3a://warehouse/people' --database hudi_db --table people --sync-mode hms
setting hadoop conf dir
Running Command : java -cp /hive/lib/hive-metastore-3.1.3.jar::/hive/lib/hive-service-3.1.3.jar::/hive/lib/hive-exec-3.1.3.jar::/hive/lib/hive-jdbc-3.1.3.jar:/hive/lib/hive-jdbc-handler-3.1.3.jar::/hive/lib/jackson-annotations-2.12.0.jar:/hive/lib/jackson-core-2.12.0.jar:/hive/lib/jackson-core-asl-1.9.13.jar:/hive/lib/jackson-databind-2.12.0.jar:/hive/lib/jackson-dataformat-smile-2.12.0.jar:/hive/lib/jackson-mapper-asl-1.9.13.jar:/hive/lib/jackson-module-scala_2.11-2.12.0.jar::/hadoop/share/hadoop/common/*:/hadoop/share/hadoop/mapreduce/*:/hadoop/share/hadoop/hdfs/*:/hadoop/share/hadoop/common/lib/*:/hadoop/share/hadoop/hdfs/lib/*:/root/.ivy2/jars/*:/hadoop/etc/hadoop:/opt/hudi/hudi-sync/hudi-hive-sync/../../packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-1.0.0-SNAPSHOT.jar org.apache.hudi.hive.HiveSyncTool --metastore-uris thrift://hive-metastore:9083 --partitioned-by city --base-path s3a://warehouse/people --database hudi_db --table people --sync-mode hms
2024-06-03 17:10:42,515 INFO [main] conf.HiveConf (HiveConf.java:findConfigFile(187)) - Found configuration file null
2024-06-03 17:10:42,707 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(60)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-06-03 17:10:42,824 INFO [main] impl.MetricsConfig (MetricsConfig.java:loadFirst(120)) - Loaded properties from hadoop-metrics2.properties
2024-06-03 17:10:42,858 INFO [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(378)) - Scheduled Metric snapshot period at 10 second(s).
2024-06-03 17:10:42,858 INFO [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:start(191)) - s3a-file-system metrics system started
2024-06-03 17:10:43,304 INFO [main] table.HoodieTableMetaClient (HoodieTableMetaClient.java:<init>(148)) - Loading HoodieTableMetaClient from s3a://warehouse/people
2024-06-03 17:10:43,395 INFO [main] table.HoodieTableConfig (HoodieTableConfig.java:<init>(309)) - Loading table properties from s3a://warehouse/people/.hoodie/hoodie.properties
2024-06-03 17:10:43,413 INFO [main] table.HoodieTableMetaClient (HoodieTableMetaClient.java:<init>(169)) - Finished Loading Table of type COPY_ON_WRITE(version=1) from s3a://warehouse/people
2024-06-03 17:10:43,413 INFO [main] table.HoodieTableMetaClient (HoodieTableMetaClient.java:<init>(171)) - Loading Active commit timeline for s3a://warehouse/people
2024-06-03 17:10:43,431 INFO [main] timeline.HoodieActiveTimeline (HoodieActiveTimeline.java:<init>(177)) - Loaded instants upto : Option{val=[20240603170053432__commit__COMPLETED]}
Exception in thread "main" java.lang.NoClassDefFoundError: com/facebook/fb303/FacebookService$Iface
```
 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
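The missing `com.facebook.fb303.FacebookService$Iface` class ships in the `org.apache.thrift:libfb303` artifact, which the follow-up comment earlier in this digest adds to the classpath. A purely diagnostic sketch to confirm whether the jar is visible before re-running the sync tool (not part of the tool itself):

```
public class Fb303Check {
  public static void main(String[] args) {
    try {
      // FacebookService$Iface is packaged in org.apache.thrift:libfb303, which
      // Hive's metastore Thrift client needs at runtime.
      Class.forName("com.facebook.fb303.FacebookService$Iface");
      System.out.println("libfb303 is on the classpath");
    } catch (ClassNotFoundException e) {
      System.out.println("libfb303 is missing; add the jar to the -cp list in run_sync_tool.sh");
    }
  }
}
```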
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
hudi-bot commented on PR #11385: URL: https://github.com/apache/hudi/pull/11385#issuecomment-2145719815 ## CI report: * 8605f0fd0fa5bc1c82a26eac8147fc521040f53a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24201) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
codope commented on code in PR #11162: URL: https://github.com/apache/hudi/pull/11162#discussion_r1624781058 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/common/HoodieSparkEngineContext.java: ## @@ -229,6 +231,13 @@ public void cancelAllJobs() { javaSparkContext.cancelAllJobs(); } + @Override + public O aggregate(HoodieData data, O zeroValue, Functions.Function2 seqOp, Functions.Function2 combOp) { +Function2 seqOpFunc = seqOp::apply; +Function2 combOpFunc = combOp::apply; +return HoodieJavaRDD.getJavaRDD(data).aggregate(zeroValue, seqOpFunc, combOpFunc); Review Comment: This is based on https://github.com/apache/spark/blob/7e8b60b5ae7d6453bc1ce51b5112c975f9aa8757/core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala#L426 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
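For readers unfamiliar with the delegated contract: `seqOp` folds each element into a partition-local accumulator and `combOp` merges the per-partition accumulators. A hypothetical usage of the new method, assuming the signature in the diff above lands as shown (`AggregateSketch` and the sample data are illustrative):

```
import java.util.Arrays;

import org.apache.hudi.client.common.HoodieSparkEngineContext;
import org.apache.hudi.common.data.HoodieData;
import org.apache.hudi.data.HoodieJavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

class AggregateSketch {
  // seqOp folds each element into the partition-local accumulator; combOp
  // merges the per-partition accumulators, the JavaRDDLike#aggregate contract.
  static Integer sum(JavaSparkContext jsc) {
    HoodieSparkEngineContext context = new HoodieSparkEngineContext(jsc);
    HoodieData<Integer> values = HoodieJavaRDD.of(jsc.parallelize(Arrays.asList(1, 2, 3, 4), 2));
    return context.aggregate(values, 0, (acc, x) -> acc + x, (a, b) -> a + b); // 10
  }
}
```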
[PR] Implement Support for Ursa Frame ingestion pipelines [hudi]
balaji-varadarajan opened a new pull request, #11386: URL: https://github.com/apache/hudi/pull/11386 ### Change Logs As described in https://docs.google.com/document/d/1sY1Kimyom_qL9-a5Z7lVf43SkZDZCi_wlwxs7J95WMU/edit, this implements pipelines to ingest Ursa frame-gen and other process output into the lakehouse. ### Impact As described in https://docs.google.com/document/d/1sY1Kimyom_qL9-a5Z7lVf43SkZDZCi_wlwxs7J95WMU/edit, this implements pipelines to ingest Ursa frame-gen and other process output into the lakehouse. ### Risk level (write none, low medium or high below) none ### Documentation Update none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
codope commented on code in PR #11162: URL: https://github.com/apache/hudi/pull/11162#discussion_r1624771184 ## hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java: ## @@ -312,6 +312,33 @@ public Map> readRecordIndex(List + * If the Metadata Table is not enabled, an exception is thrown to distinguish this from the absence of the key. + * + * @param secondaryKeys The list of secondary keys to read + */ + @Override + public Map> readSecondaryIndex(List secondaryKeys) { + ValidationUtils.checkState(dataMetaClient.getTableConfig().isMetadataPartitionAvailable(MetadataPartitionType.RECORD_INDEX), +"Record index is not initialized in MDT"); +ValidationUtils.checkState( + dataMetaClient.getTableConfig().getMetadataPartitions().stream().anyMatch(partitionName -> partitionName.startsWith(MetadataPartitionType.SECONDARY_INDEX.getPartitionPath())), +"Secondary index is not initialized in MDT"); +// Fetch secondary-index records +Map>> secondaryKeyRecords = getSecondaryIndexRecords(secondaryKeys, MetadataPartitionType.SECONDARY_INDEX.getPartitionPath()); +// Now collect the record-keys and fetch the RLI records Review Comment: No, here it is RLI. Secondary index contains mapping from secondary key to primary key. So, we need to lookup RLI to get the files for those matching primary keys. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
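Schematically, the two-step resolution described above looks like the following (a simplified sketch of the control flow inside `BaseTableMetadata`; the stream plumbing is illustrative, not the PR's exact code):

```
// Step 1: the secondary index partition maps secondary key -> metadata records,
// whose record keys are the data table's primary keys.
Map<String, List<HoodieRecord<HoodieMetadataPayload>>> bySecondaryKey =
    getSecondaryIndexRecords(secondaryKeys, MetadataPartitionType.SECONDARY_INDEX.getPartitionPath());
List<String> primaryKeys = bySecondaryKey.values().stream()
    .flatMap(List::stream)
    .map(HoodieRecord::getRecordKey)
    .collect(Collectors.toList());
// Step 2: the record-level index maps those primary keys to file locations.
Map<String, HoodieRecordGlobalLocation> locations = readRecordIndex(primaryKeys);
```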
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
codope commented on code in PR #11162: URL: https://github.com/apache/hudi/pull/11162#discussion_r1624769566 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java: ## @@ -851,6 +852,158 @@ private Map reverseLookupSecondaryKeys(String partitionName, Lis +return recordKeyMap; + } + + @Override + protected Map>> getSecondaryIndexRecords(List keys, String partitionName) { +if (keys.isEmpty()) { + return Collections.emptyMap(); +} + +Map>> result = new HashMap<>(); + +// Load the file slices for the partition. Each file slice is a shard which saves a portion of the keys. +List partitionFileSlices = partitionFileSliceMap.computeIfAbsent(partitionName, +k -> HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(metadataMetaClient, metadataFileSystemView, partitionName)); +final int numFileSlices = partitionFileSlices.size(); +ValidationUtils.checkState(numFileSlices > 0, "Number of file slices for partition " + partitionName + " should be > 0"); + +// Lookup keys from each file slice +// TODO: parallelize this loop +for (FileSlice partition : partitionFileSlices) { + Map>> currentFileSliceResult = lookupSecondaryKeysFromFileSlice(partitionName, keys, partition); + + currentFileSliceResult.forEach((secondaryKey, secondaryRecords) -> { +result.merge(secondaryKey, secondaryRecords, (oldRecords, newRecords) -> { + newRecords.addAll(oldRecords); + return newRecords; +}); + }); +} + +return result; + } + + /** + * Lookup list of keys from a single file slice. + * + * @param partitionName Name of the partition + * @param secondaryKeys The list of secondary keys to lookup + * @param fileSlice The file slice to read + * @return A {@code Map} of secondary-key to list of {@code HoodieRecord} for the secondary-keys which were found in the file slice + */ + private Map>> lookupSecondaryKeysFromFileSlice(String partitionName, List secondaryKeys, FileSlice fileSlice) { +Map> logRecordsMap = new HashMap<>(); + +Pair, HoodieMetadataLogRecordReader> readers = getOrCreateReaders(partitionName, fileSlice); +try { + List timings = new ArrayList<>(1); + HoodieSeekingFileReader baseFileReader = readers.getKey(); + HoodieMetadataLogRecordReader logRecordScanner = readers.getRight(); + if (baseFileReader == null && logRecordScanner == null) { +return Collections.emptyMap(); + } + + // Sort it here once so that we don't need to sort individually for base file and for each individual log files. + Set secondaryKeySet = new HashSet<>(secondaryKeys.size()); + List sortedSecondaryKeys = new ArrayList<>(secondaryKeys); + Collections.sort(sortedSecondaryKeys); Review Comment: Good point! I have now parallelized the lookup through engineContext. So, this sorting would only be limited to a single partition of data, which should not spill to disk. But even if it does, Spark will handle it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
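The parallelization mentioned in the reply could plausibly fan the per-file-slice lookups out through the engine context along these lines (an illustrative sketch assuming access to the `engineContext` field; the PR's actual code may differ):

```
// Replace the sequential per-file-slice loop with a fan-out through the
// engine context; each task looks up the (pre-sorted) keys in one shard.
List<Map<String, List<HoodieRecord<HoodieMetadataPayload>>>> perSliceResults =
    engineContext.map(
        partitionFileSlices,
        fileSlice -> lookupSecondaryKeysFromFileSlice(partitionName, sortedSecondaryKeys, fileSlice),
        Math.max(1, partitionFileSlices.size()));
// The per-slice maps still need the same merge step the loop performs today.
```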
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
codope commented on code in PR #11162: URL: https://github.com/apache/hudi/pull/11162#discussion_r1624767426 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSecondaryIndexWithSql.scala: ## @@ -95,4 +97,39 @@ class TestSecondaryIndexWithSql extends SecondaryIndexTestBase { private def checkAnswer(sql: String)(expects: Seq[Any]*): Unit = { assertResult(expects.map(row => Row(row: _*)).toArray.sortBy(_.toString()))(spark.sql(sql).collect().sortBy(_.toString())) } + + @Test + def testSecondaryIndexWithInFilter(): Unit = { +if (HoodieSparkUtils.gteqSpark3_2) { + var hudiOpts = commonOpts + hudiOpts = hudiOpts + ( +DataSourceWriteOptions.TABLE_TYPE.key -> HoodieTableType.COPY_ON_WRITE.name(), +DataSourceReadOptions.ENABLE_DATA_SKIPPING.key -> "true") + + spark.sql( +s""" + |create table $tableName ( + | record_key_col string, + | not_record_key_col string, + | partition_key_col string + |) using hudi + | options ( + | primaryKey ='record_key_col', + | hoodie.metadata.enable = 'true', + | hoodie.metadata.record.index.enable = 'true', + | hoodie.datasource.write.recordkey.field = 'record_key_col', + | hoodie.enable.data.skipping = 'true' + | ) + | partitioned by(partition_key_col) + | location '$basePath' + """.stripMargin) + spark.sql(s"insert into $tableName values('row1', 'abc', 'p1')") Review Comment: I've added it now, but I discovered one issue while testing which I am still fixing. ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SecondaryIndexSupport.scala: ## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi + +import org.apache.hudi.RecordLevelIndexSupport.filterQueryWithRecordKey +import org.apache.hudi.SecondaryIndexSupport.filterQueriesWithSecondaryKey +import org.apache.hudi.common.config.HoodieMetadataConfig +import org.apache.hudi.common.fs.FSUtils +import org.apache.hudi.common.model.FileSlice +import org.apache.hudi.common.table.HoodieTableMetaClient +import org.apache.hudi.metadata.HoodieTableMetadataUtil.PARTITION_NAME_SECONDARY_INDEX +import org.apache.hudi.storage.StoragePath +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.expressions.Expression + +import scala.collection.JavaConverters._ +import scala.collection.{JavaConverters, mutable} + +class SecondaryIndexSupport(spark: SparkSession, +metadataConfig: HoodieMetadataConfig, +metaClient: HoodieTableMetaClient) extends RecordLevelIndexSupport(spark, metadataConfig, metaClient) { + override def getIndexName: String = SecondaryIndexSupport.INDEX_NAME Review Comment: yes -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
hudi-bot commented on PR #11385: URL: https://github.com/apache/hudi/pull/11385#issuecomment-2145635715 ## CI report: * 8605f0fd0fa5bc1c82a26eac8147fc521040f53a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24201) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
hudi-bot commented on PR #11385: URL: https://github.com/apache/hudi/pull/11385#issuecomment-2145618615 ## CI report: * 8605f0fd0fa5bc1c82a26eac8147fc521040f53a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(hudi) branch branch-0.x updated: [HUDI-7816] Provide SourceProfileSupplier option into the SnapshotLoadQuerySplitter (branch-0.x) (#11379)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch branch-0.x in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/branch-0.x by this push: new 63773c58018 [HUDI-7816] Provide SourceProfileSupplier option into the SnapshotLoadQuerySplitter (branch-0.x) (#11379) 63773c58018 is described below commit 63773c58018efe3414c941d65ed78958fcf6d32f Author: Matthew Wong AuthorDate: Mon Jun 3 09:07:56 2024 -0700 [HUDI-7816] Provide SourceProfileSupplier option into the SnapshotLoadQuerySplitter (branch-0.x) (#11379) --- .../apache/hudi/utilities/sources/HoodieIncrSource.java | 2 +- .../utilities/sources/SnapshotLoadQuerySplitter.java | 16 .../hudi/utilities/sources/helpers/QueryRunner.java | 2 +- .../sources/helpers/TestSnapshotQuerySplitterImpl.java | 3 ++- 4 files changed, 16 insertions(+), 7 deletions(-) diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java index 768e4c3c3fc..79264c6fd6e 100644 --- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java +++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java @@ -203,7 +203,7 @@ public class HoodieIncrSource extends RowSource { .option(DataSourceReadOptions.QUERY_TYPE().key(), DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL()) .load(srcPath); if (snapshotLoadQuerySplitter.isPresent()) { -queryInfo = snapshotLoadQuerySplitter.get().getNextCheckpoint(snapshot, queryInfo); +queryInfo = snapshotLoadQuerySplitter.get().getNextCheckpoint(snapshot, queryInfo, sourceProfileSupplier); } source = snapshot // add filtering so that only interested records are returned. diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/SnapshotLoadQuerySplitter.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/SnapshotLoadQuerySplitter.java index ca299122ec7..f0fd1fed904 100644 --- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/SnapshotLoadQuerySplitter.java +++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/SnapshotLoadQuerySplitter.java @@ -18,10 +18,14 @@ package org.apache.hudi.utilities.sources; +import org.apache.hudi.ApiMaturityLevel; +import org.apache.hudi.PublicAPIClass; +import org.apache.hudi.PublicAPIMethod; import org.apache.hudi.common.config.TypedProperties; import org.apache.hudi.common.util.Option; import org.apache.hudi.common.util.ReflectionUtils; import org.apache.hudi.utilities.sources.helpers.QueryInfo; +import org.apache.hudi.utilities.streamer.SourceProfileSupplier; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; @@ -30,6 +34,7 @@ import static org.apache.hudi.utilities.sources.SnapshotLoadQuerySplitter.Config /** * Abstract splitter responsible for managing the snapshot load query operations. */ +@PublicAPIClass(maturity = ApiMaturityLevel.EVOLVING) public abstract class SnapshotLoadQuerySplitter { /** @@ -61,20 +66,23 @@ public abstract class SnapshotLoadQuerySplitter { * * @param df The dataset to process. * @param beginCheckpointStr The starting checkpoint string. + * @param sourceProfileSupplier An Option of a SourceProfileSupplier to use in load splitting implementation * @return The next checkpoint as an Option. 
*/ - public abstract Option getNextCheckpoint(Dataset df, String beginCheckpointStr); + @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING) + public abstract Option getNextCheckpoint(Dataset df, String beginCheckpointStr, Option sourceProfileSupplier); /** - * Retrieves the next checkpoint based on query information. + * Retrieves the next checkpoint based on query information and a SourceProfileSupplier. * * @param df The dataset to process. * @param queryInfo The query information object. + * @param sourceProfileSupplier An Option of a SourceProfileSupplier to use in load splitting implementation * @return Updated query information with the next checkpoint, in case of empty checkpoint, * returning endPoint same as queryInfo.getEndInstant(). */ - public QueryInfo getNextCheckpoint(Dataset df, QueryInfo queryInfo) { -return getNextCheckpoint(df, queryInfo.getStartInstant()) + public QueryInfo getNextCheckpoint(Dataset df, QueryInfo queryInfo, Option sourceProfileSupplier) { +return getNextCheckpoint(df, queryInfo.getStartInstant(), sourceProfileSupplier) .map(checkpoint -> queryInfo.withUpdatedEndInstant(checkpoint)) .orElse(queryInfo); } diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/QueryRunner.java b/hudi-utilities/src/main/java/org/apache/hudi/utilitie
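For context on the EVOLVING API in the commit above, here is a sketch of a downstream splitter written against the new three-argument signature; the class name and the fall-back behavior are illustrative, not part of the commit:

```
import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.utilities.sources.SnapshotLoadQuerySplitter;
import org.apache.hudi.utilities.streamer.SourceProfileSupplier;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Illustrative subclass: honors the new Option<SourceProfileSupplier> argument
// and degrades to a full snapshot load when no profile is available.
public class ProfileAwareSplitter extends SnapshotLoadQuerySplitter {

  public ProfileAwareSplitter(TypedProperties properties) {
    super(properties);
  }

  @Override
  public Option<String> getNextCheckpoint(Dataset<Row> df, String beginCheckpointStr,
                                          Option<SourceProfileSupplier> sourceProfileSupplier) {
    if (!sourceProfileSupplier.isPresent()) {
      // No profile to size the split with: read everything up to the end instant.
      return Option.empty();
    }
    // A real implementation would derive a bounded end instant from df
    // (e.g. a capped max over _hoodie_commit_time) using the supplied profile.
    return Option.empty();
  }
}
```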
Re: [PR] [HUDI-7816] Provide SourceProfileSupplier option into the SnapshotLoadQuerySplitter (branch-0.x) [hudi]
yihua merged PR #11379: URL: https://github.com/apache/hudi/pull/11379 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(hudi) branch dependabot/maven/io.airlift-aircompressor-0.27 deleted (was 5042e73eb65)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/io.airlift-aircompressor-0.27 in repository https://gitbox.apache.org/repos/asf/hudi.git was 5042e73eb65 Bump io.airlift:aircompressor from 0.25 to 0.27 The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
(hudi) branch master updated: [HUDI-7822] Bump io.airlift:aircompressor from 0.25 to 0.27 (#11380)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new d0c7de050a8 [HUDI-7822] Bump io.airlift:aircompressor from 0.25 to 0.27 (#11380) d0c7de050a8 is described below commit d0c7de050a8900a29f5d127093b378b96f9c5158 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> AuthorDate: Mon Jun 3 09:07:28 2024 -0700 [HUDI-7822] Bump io.airlift:aircompressor from 0.25 to 0.27 (#11380) Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pom.xml b/pom.xml index 8e86c154e86..b3eade21971 100644 --- a/pom.xml +++ b/pom.xml @@ -130,7 +130,7 @@ 1.6.0 1.5.6 0.9.47 -0.25 +0.27 0.13.0 0.8.0 4.5.13
Re: [PR] [HUDI-7822] Bump io.airlift:aircompressor from 0.25 to 0.27 [hudi]
yihua merged PR #11380: URL: https://github.com/apache/hudi/pull/11380 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (HUDI-7822) Resolve the conflicts between mixed hdfs and local path in Flink tests
[ https://issues.apache.org/jira/browse/HUDI-7822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7822: --- Assignee: Ethan Guo > Resolve the conflicts between mixed hdfs and local path in Flink tests > -- > > Key: HUDI-7822 > URL: https://issues.apache.org/jira/browse/HUDI-7822 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > Bumps the dependency to mitigate vulnerability. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7827) Bump io.airlift:aircompressor from 0.25 to 0.27
Ethan Guo created HUDI-7827: --- Summary: Bump io.airlift:aircompressor from 0.25 to 0.27 Key: HUDI-7827 URL: https://issues.apache.org/jira/browse/HUDI-7827 Project: Apache Hudi Issue Type: Bug Reporter: Ethan Guo -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7822) Resolve the conflicts between mixed hdfs and local path in Flink tests
[ https://issues.apache.org/jira/browse/HUDI-7822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7822: Fix Version/s: 0.16.0 > Resolve the conflicts between mixed hdfs and local path in Flink tests > -- > > Key: HUDI-7822 > URL: https://issues.apache.org/jira/browse/HUDI-7822 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0, 0.16.0 > > > Bumps the dependency to mitigate vulnerability. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7822) Resolve the conflicts between mixed hdfs and local path in Flink tests
[ https://issues.apache.org/jira/browse/HUDI-7822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7822: Description: Bumps the dependency to mitigate vulnerability. > Resolve the conflicts between mixed hdfs and local path in Flink tests > -- > > Key: HUDI-7822 > URL: https://issues.apache.org/jira/browse/HUDI-7822 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > Bumps the dependency to mitigate vulnerability. -- This message was sent by Atlassian Jira (v8.20.10#820010)
(hudi) branch branch-0.x updated: [MINOR] Avoid logging full commit metadata at info level (#11382)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch branch-0.x in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/branch-0.x by this push: new 4684082406c [MINOR] Avoid logging full commit metadata at info level (#11382) 4684082406c is described below commit 4684082406c4d23c97b25e96297b7c05fd653208 Author: Tim Brown AuthorDate: Mon Jun 3 11:01:42 2024 -0500 [MINOR] Avoid logging full commit metadata at info level (#11382) --- .../org/apache/hudi/client/BaseHoodieTableServiceClient.java | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java index 7dcff3bd6f2..ff0f635b06e 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java @@ -327,7 +327,8 @@ public abstract class BaseHoodieTableServiceClient extends BaseHoodieCl finalizeWrite(table, compactionCommitTime, writeStats); // commit to data table after committing to metadata table. writeTableMetadata(table, compactionCommitTime, metadata, context.emptyHoodieData()); - LOG.info("Committing Compaction {}. Finished with result {}", compactionCommitTime, metadata); + LOG.info("Committing Compaction {}", compactionCommitTime); + LOG.debug("Compaction {} finished with result: {}", compactionCommitTime, metadata); CompactHelpers.getInstance().completeInflightCompaction(table, compactionCommitTime, metadata); } finally { this.txnManager.endTransaction(Option.of(compactionInstant)); @@ -388,7 +389,8 @@ public abstract class BaseHoodieTableServiceClient extends BaseHoodieCl finalizeWrite(table, logCompactionCommitTime, writeStats); // commit to data table after committing to metadata table. writeTableMetadata(table, logCompactionCommitTime, metadata, context.emptyHoodieData()); - LOG.info("Committing Log Compaction {}. Finished with result {}", logCompactionCommitTime, metadata); + LOG.info("Committing Log Compaction {}", logCompactionCommitTime); + LOG.debug("Log Compaction {} finished with result {}", logCompactionCommitTime, metadata); CompactHelpers.getInstance().completeInflightLogCompaction(table, logCompactionCommitTime, metadata); } finally { this.txnManager.endTransaction(Option.of(logCompactionInstant)); @@ -513,7 +515,8 @@ public abstract class BaseHoodieTableServiceClient extends BaseHoodieCl // Update table's metadata (table) writeTableMetadata(table, clusteringInstant.getTimestamp(), metadata, writeStatuses.orElseGet(context::emptyHoodieData)); - LOG.info("Committing Clustering {}. Finished with result {}", clusteringCommitTime, metadata); + LOG.info("Committing Clustering {}", clusteringCommitTime); + LOG.debug("Clustering {} finished with result {}", clusteringCommitTime, metadata); table.getActiveTimeline().transitionReplaceInflightToComplete( clusteringInstant,
Re: [PR] [MINOR] Avoid logging full commit metadata at info level [hudi]
yihua merged PR #11382: URL: https://github.com/apache/hudi/pull/11382 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [DISCUSSION] Deltastreamer - Reading commit checkpoint from Kafka instead of latest Hoodie commit [hudi]
KishanFairmatic commented on issue #11268: URL: https://github.com/apache/hudi/issues/11268#issuecomment-2145507599 @danny0405 : In master, i.e. Hudi version 1.0.0, there is a flag `--ignore-checkpoint` that does the same thing. We are on version 0.13.0, so we will use this approach until 1.0.0 is stable and we are ready to upgrade. But by default, when auto.offset.reset = group and there are no committed offsets in Kafka on the first attempt, it falls back to latest, which can mean data loss. Either the fallback should be earliest, or there should be an option to choose earliest to avoid missing any data. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
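To make the default-offset concern concrete, a hypothetical source configuration pinning the reset policy (the topic name is illustrative; the keys are the standard Kafka consumer config and the 0.13.x Deltastreamer source topic key):

```
import org.apache.hudi.common.config.TypedProperties;

class SourcePropsSketch {
  static TypedProperties earliestOnFirstRun() {
    TypedProperties props = new TypedProperties();
    // Example topic name; the key below is the 0.13.x Deltastreamer config.
    props.setProperty("hoodie.deltastreamer.source.kafka.topic", "input-events");
    // Standard Kafka consumer setting: with no committed offsets for the group,
    // start from the beginning of the topic instead of jumping to 'latest'.
    props.setProperty("auto.offset.reset", "earliest");
    return props;
  }
}
```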
Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2145498112 ## CI report: * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN * 22846139475031d663fc6bb2b1a554dd1b2e637e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24200) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
hudi-bot commented on PR #11385: URL: https://github.com/apache/hudi/pull/11385#issuecomment-2145441835 ## CI report: * 11cd96c4d0e7727918907e231c3eef8c997f0476 UNKNOWN * 8605f0fd0fa5bc1c82a26eac8147fc521040f53a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24201) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
hudi-bot commented on PR #11385: URL: https://github.com/apache/hudi/pull/11385#issuecomment-2145334253 ## CI report: * 11cd96c4d0e7727918907e231c3eef8c997f0476 UNKNOWN * 8605f0fd0fa5bc1c82a26eac8147fc521040f53a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24201) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2145332448 ## CI report: * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN * ec6fa62945094d548dce7d7e8e6ef2363ba0d05f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24179) * 22846139475031d663fc6bb2b1a554dd1b2e637e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24200) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7825]Support Report pending clustering and compaction plan metric [hudi]
LXin96 commented on code in PR #11377: URL: https://github.com/apache/hudi/pull/11377#discussion_r1624539082 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java: ## @@ -272,6 +275,28 @@ public void notifyCheckpointComplete(long checkpointId) { ); } + private void emitCompactionAndClusteringMetrics(Configuration conf, + HoodieTableMetaClient metaClient, HoodieFlinkWriteClient writeClient) { +if (conf.getBoolean(FlinkOptions.CLUSTERING_SCHEDULE_ENABLED) +&& !conf.getBoolean(FlinkOptions.CLUSTERING_ASYNC_ENABLED)) { + HoodieTimeline pendingReplaceTimeline = metaClient.getActiveTimeline() + .filterPendingReplaceTimeline(); + HoodieMetrics metrics = writeClient.getMetrics(); + if (metrics != null) { + metrics.setPendingClusteringCount(pendingReplaceTimeline.countInstants()); + } +} +if (conf.getBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED) Review Comment: @danny0405 I get you, but this is another situation: when FlinkOptions.COMPACTION_SCHEDULE_ENABLED is set to true and FlinkOptions.COMPACTION_ASYNC_ENABLED is set to false, the CompactionPlanOperator is not added to the pipeline, so the pending compaction plan is never reported; the situation is the same as for clustering. This setup is used to offload clustering or compaction to a separate job. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
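The offload scenario described in this comment corresponds to a configuration like the following sketch: plans are scheduled inline but executed by a separate job, so no plan operator (and thus no pending-plan metric source) lives in the writing pipeline (illustrative wiring, not code from the PR):

```
import org.apache.flink.configuration.Configuration;
import org.apache.hudi.configuration.FlinkOptions;

class OffloadedServicesConf {
  static Configuration scheduleInlineExecuteElsewhere() {
    Configuration conf = new Configuration();
    // Plans are scheduled by the writing job...
    conf.set(FlinkOptions.COMPACTION_SCHEDULE_ENABLED, true);
    conf.set(FlinkOptions.CLUSTERING_SCHEDULE_ENABLED, true);
    // ...but executed by a separate offline job, so this pipeline contains no
    // CompactionPlanOperator or clustering operator to report pending plans.
    conf.set(FlinkOptions.COMPACTION_ASYNC_ENABLED, false);
    conf.set(FlinkOptions.CLUSTERING_ASYNC_ENABLED, false);
    return conf;
  }
}
```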
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
hudi-bot commented on PR #11385: URL: https://github.com/apache/hudi/pull/11385#issuecomment-2145315579 ## CI report: * 11cd96c4d0e7727918907e231c3eef8c997f0476 UNKNOWN * 8605f0fd0fa5bc1c82a26eac8147fc521040f53a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2145313418 ## CI report: * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN * ec6fa62945094d548dce7d7e8e6ef2363ba0d05f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24179) * 22846139475031d663fc6bb2b1a554dd1b2e637e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
hudi-bot commented on PR #11385: URL: https://github.com/apache/hudi/pull/11385#issuecomment-2145294738 ## CI report: * 11cd96c4d0e7727918907e231c3eef8c997f0476 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]
jonvex commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1624489092 ## hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java: ## @@ -231,7 +231,13 @@ private static Option findNestedField(Schema schema, String[] fiel if (!nestedPart.isPresent()) { return Option.empty(); } -return nestedPart; +boolean isUnion = false; Review Comment: hudi-common/src/test/java/org/apache/hudi/avro/TestAvroSchemaUtils.java I uncommented the test in this pr -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7747) In MetaClient remove getBasePathV2() and return StoragePath from getBasePath()
[ https://issues.apache.org/jira/browse/HUDI-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7747: - Labels: pull-request-available (was: ) > In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() > -- > > Key: HUDI-7747 > URL: https://issues.apache.org/jira/browse/HUDI-7747 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Jonathan Vexler >Assignee: Vova Kolmakov >Priority: Major > Labels: pull-request-available > > In HoodieTableMetaClient remove getBasePathV2() and return StoragePath from > getBasePath(). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
wombatu-kun opened a new pull request, #11385: URL: https://github.com/apache/hudi/pull/11385 ### Change Logs In HoodieTableMetaClient remove getBasePathV2() and return StoragePath from getBasePath(). ### Impact none ### Risk level (write none, low medium or high below) none ### Documentation Update none - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-5956] fix spark DAG ui when write [hudi]
KnightChess commented on code in PR #11376: URL: https://github.com/apache/hudi/pull/11376#discussion_r1624306145 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -174,32 +174,43 @@ class HoodieSparkSqlWriterInternal { sourceDf: DataFrame, streamingWritesParamsOpt: Option[StreamingWriteParams] = Option.empty, hoodieWriteClient: Option[SparkRDDWriteClient[_]] = Option.empty): - (Boolean, HOption[String], HOption[String], HOption[String], SparkRDDWriteClient[_], HoodieTableConfig) = { -var succeeded = false -var counter = 0 -val maxRetry: Integer = Integer.parseInt(optParams.getOrElse(HoodieWriteConfig.NUM_RETRIES_ON_CONFLICT_FAILURES.key(), HoodieWriteConfig.NUM_RETRIES_ON_CONFLICT_FAILURES.defaultValue().toString)) -var toReturn: (Boolean, HOption[String], HOption[String], HOption[String], SparkRDDWriteClient[_], HoodieTableConfig) = null -while (counter <= maxRetry && !succeeded) { - try { -toReturn = writeInternal(sqlContext, mode, optParams, sourceDf, streamingWritesParamsOpt, hoodieWriteClient) -if (counter > 0) { - log.warn(s"Succeeded with attempt no $counter") -} -succeeded = true - } catch { -case e: HoodieWriteConflictException => - val writeConcurrencyMode = optParams.getOrElse(HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(), HoodieWriteConfig.WRITE_CONCURRENCY_MODE.defaultValue()) - if (WriteConcurrencyMode.supportsMultiWriter(writeConcurrencyMode) && counter < maxRetry) { -counter += 1 -log.warn(s"Conflict found. Retrying again for attempt no $counter") - } else { -throw e +val retryWrite: () => (Boolean, HOption[String], HOption[String], HOption[String], SparkRDDWriteClient[_], HoodieTableConfig) = () => { + var succeeded = false + var counter = 0 + val maxRetry: Integer = Integer.parseInt(optParams.getOrElse(HoodieWriteConfig.NUM_RETRIES_ON_CONFLICT_FAILURES.key(), HoodieWriteConfig.NUM_RETRIES_ON_CONFLICT_FAILURES.defaultValue().toString)) + var toReturn: (Boolean, HOption[String], HOption[String], HOption[String], SparkRDDWriteClient[_], HoodieTableConfig) = null + + while (counter <= maxRetry && !succeeded) { +try { + toReturn = writeInternal(sqlContext, mode, optParams, sourceDf, streamingWritesParamsOpt, hoodieWriteClient) + if (counter > 0) { +log.warn(s"Succeeded with attempt no $counter") } + succeeded = true +} catch { + case e: HoodieWriteConflictException => +val writeConcurrencyMode = optParams.getOrElse(HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(), HoodieWriteConfig.WRITE_CONCURRENCY_MODE.defaultValue()) +if (WriteConcurrencyMode.supportsMultiWriter(writeConcurrencyMode) && counter < maxRetry) { + counter += 1 + log.warn(s"Conflict found. Retrying again for attempt no $counter") +} else { + throw e +} +} } + toReturn +} + +val executionId = getExecutionId(sqlContext.sparkContext, sourceDf.queryExecution) +if (executionId.isEmpty) { + sparkAdapter.sqlExecutionWithNewExecutionId(sourceDf.sparkSession, sourceDf.queryExecution, Option("Hudi Command"))( Review Comment: this executionId will be sub-list in rootExecutionId after this pr https://github.com/apache/spark/pull/40403, so ignore this `TODO` https://github.com/apache/hudi/pull/8233#discussion_r1298071684, cc @codope -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR] Fix LoggerName for BaseHoodieTableServiceClient [hudi]
danny0405 merged PR #11384: URL: https://github.com/apache/hudi/pull/11384 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(hudi) branch master updated: [MINOR] Fix LoggerName for BaseHoodieTableServiceClient (#11384)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new e131a049de2 [MINOR] Fix LoggerName for BaseHoodieTableServiceClient (#11384) e131a049de2 is described below commit e131a049de25ebc08d86eb3148e49bd2c1f87b54 Author: wuzhenhua <102498303+wuzhenhu...@users.noreply.github.com> AuthorDate: Mon Jun 3 18:36:26 2024 +0800 [MINOR] Fix LoggerName for BaseHoodieTableServiceClient (#11384) Co-authored-by: Admin --- .../main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java index a05a236f31d..23dfec7dee3 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java @@ -97,7 +97,7 @@ import static org.apache.hudi.metadata.HoodieTableMetadataUtil.isIndexingCommit; */ public abstract class BaseHoodieTableServiceClient extends BaseHoodieClient implements RunsTableService { - private static final Logger LOG = LoggerFactory.getLogger(BaseHoodieWriteClient.class); + private static final Logger LOG = LoggerFactory.getLogger(BaseHoodieTableServiceClient.class); protected transient Timer.Context compactionTimer; protected transient Timer.Context clusteringTimer;
Re: [PR] [MINOR] Fix LoggerName for BaseHoodieTableServiceClient [hudi]
hudi-bot commented on PR #11384: URL: https://github.com/apache/hudi/pull/11384#issuecomment-2144815626 ## CI report: * b54b7755e458b8d4da262febd4d2cf9f0607ada8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24197) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR] Fix LoggerName for BaseHoodieTableServiceClient [hudi]
hudi-bot commented on PR #11384: URL: https://github.com/apache/hudi/pull/11384#issuecomment-2144794744 ## CI report: * b54b7755e458b8d4da262febd4d2cf9f0607ada8 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
hudi-bot commented on PR #11162: URL: https://github.com/apache/hudi/pull/11162#issuecomment-2144793701 ## CI report: * b342d8f8e10f77419bf1bd0bc9f626a596ad65f9 UNKNOWN * b00831e0a0506714d27bc2a64e58084b357a83cc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24196) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [MINOR] Fix LoggerName for BaseHoodieTableServiceClient [hudi]
wuzhenhua01 opened a new pull request, #11384:
URL: https://github.com/apache/hudi/pull/11384

   ### Change Logs

   Fix LoggerName for BaseHoodieTableServiceClient

   ### Impact

   No

   ### Risk level (write none, low medium or high below)

   none

   ### Documentation Update

   ### Contributor's checklist

   - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
hudi-bot commented on PR #11162:
URL: https://github.com/apache/hudi/pull/11162#issuecomment-2144669166

   ## CI report:

   * df12fa59cbba5b14bb98d66dffb510f5b1659177 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24195)
   * b342d8f8e10f77419bf1bd0bc9f626a596ad65f9 UNKNOWN
   * b00831e0a0506714d27bc2a64e58084b357a83cc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24196)

   Bot commands

   @hudi-bot supports the following commands:

   - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]
wombatu-kun commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2144664083

   I'm tired of resolving conflicts for this PR again and again. Somebody review it and merge, please!
Re: [I] [DISCUSSION] Deltastreamer - Reading commit checkpoint from Kafka instead of latest Hoodie commit [hudi]
danny0405 commented on issue #11268:
URL: https://github.com/apache/hudi/issues/11268#issuecomment-2144599040

   Usually we do not create PRs against a released tag; can you file a new one against master?
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
danny0405 commented on code in PR #11162:
URL: https://github.com/apache/hudi/pull/11162#discussion_r1624007332

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSecondaryIndexWithSql.scala: ##

@@ -95,4 +97,39 @@ class TestSecondaryIndexWithSql extends SecondaryIndexTestBase {
   private def checkAnswer(sql: String)(expects: Seq[Any]*): Unit = {
     assertResult(expects.map(row => Row(row: _*)).toArray.sortBy(_.toString()))(spark.sql(sql).collect().sortBy(_.toString()))
   }
+
+  @Test
+  def testSecondaryIndexWithInFilter(): Unit = {
+    if (HoodieSparkUtils.gteqSpark3_2) {
+      var hudiOpts = commonOpts
+      hudiOpts = hudiOpts + (
+        DataSourceWriteOptions.TABLE_TYPE.key -> HoodieTableType.COPY_ON_WRITE.name(),
+        DataSourceReadOptions.ENABLE_DATA_SKIPPING.key -> "true")
+
+      spark.sql(
+        s"""
+           |create table $tableName (
+           |  record_key_col string,
+           |  not_record_key_col string,
+           |  partition_key_col string
+           |) using hudi
+           | options (
+           |  primaryKey ='record_key_col',
+           |  hoodie.metadata.enable = 'true',
+           |  hoodie.metadata.record.index.enable = 'true',
+           |  hoodie.datasource.write.recordkey.field = 'record_key_col',
+           |  hoodie.enable.data.skipping = 'true'
+           | )
+           | partitioned by(partition_key_col)
+           | location '$basePath'
+         """.stripMargin)
+      spark.sql(s"insert into $tableName values('row1', 'abc', 'p1')")

Review Comment:
   Do we have a test case for non-string values?
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
danny0405 commented on code in PR #11162:
URL: https://github.com/apache/hudi/pull/11162#discussion_r1624004588

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SecondaryIndexSupport.scala: ##

@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.hudi.RecordLevelIndexSupport.filterQueryWithRecordKey
+import org.apache.hudi.SecondaryIndexSupport.filterQueriesWithSecondaryKey
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.hudi.common.model.FileSlice
+import org.apache.hudi.common.table.HoodieTableMetaClient
+import org.apache.hudi.metadata.HoodieTableMetadataUtil.PARTITION_NAME_SECONDARY_INDEX
+import org.apache.hudi.storage.StoragePath
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.expressions.Expression
+
+import scala.collection.JavaConverters._
+import scala.collection.{JavaConverters, mutable}
+
+class SecondaryIndexSupport(spark: SparkSession,
+                            metadataConfig: HoodieMetadataConfig,
+                            metaClient: HoodieTableMetaClient) extends RecordLevelIndexSupport(spark, metadataConfig, metaClient) {
+  override def getIndexName: String = SecondaryIndexSupport.INDEX_NAME

Review Comment:
   This is also code reused from RLI, right?
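The review question is about the reuse pattern: SecondaryIndexSupport inherits RecordLevelIndexSupport's lookup flow and overrides only the pieces that differ. A hedged Java sketch of that template-method style of reuse; every class and method name here is illustrative, not Hudi's actual API:

```java
import java.util.ArrayList;
import java.util.List;

public class IndexReuseSketch {

  // Template method: the base class owns the shared pruning flow; subclasses
  // override only the partition they query (mirroring how SecondaryIndexSupport
  // inherits RecordLevelIndexSupport's machinery in the diff above).
  abstract static class IndexSupportBase {
    final List<String> candidateFiles(List<String> keys) {
      return lookupFilesInPartition(indexPartition(), keys);
    }

    abstract String indexPartition();

    // Stub: a real implementation would query a metadata-table partition.
    List<String> lookupFilesInPartition(String partition, List<String> keys) {
      List<String> files = new ArrayList<>();
      for (String key : keys) {
        files.add(partition + "/file-for-" + key);
      }
      return files;
    }
  }

  static class RecordLevelIndexSketch extends IndexSupportBase {
    @Override
    String indexPartition() {
      return "record_index";
    }
  }

  // Reuses the entire flow; only the partition (and, in Hudi, the key
  // extraction) differs from the record-level index.
  static class SecondaryIndexSketch extends IndexSupportBase {
    @Override
    String indexPartition() {
      return "secondary_index";
    }
  }

  public static void main(String[] args) {
    System.out.println(new SecondaryIndexSketch().candidateFiles(List.of("sk-1")));
    // -> [secondary_index/file-for-sk-1]
  }
}
```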
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
danny0405 commented on code in PR #11162:
URL: https://github.com/apache/hudi/pull/11162#discussion_r1624001312

## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java: ##

@@ -851,6 +852,158 @@ private Map reverseLookupSecondaryKeys(String partitionName, Lis
+  @Override
+  protected Map<String, List<HoodieRecord<HoodieMetadataPayload>>> getSecondaryIndexRecords(List<String> keys, String partitionName) {
+    if (keys.isEmpty()) {
+      return Collections.emptyMap();
+    }
+
+    Map<String, List<HoodieRecord<HoodieMetadataPayload>>> result = new HashMap<>();
+
+    // Load the file slices for the partition. Each file slice is a shard which saves a portion of the keys.
+    List<FileSlice> partitionFileSlices = partitionFileSliceMap.computeIfAbsent(partitionName,
+        k -> HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(metadataMetaClient, metadataFileSystemView, partitionName));
+    final int numFileSlices = partitionFileSlices.size();
+    ValidationUtils.checkState(numFileSlices > 0, "Number of file slices for partition " + partitionName + " should be > 0");
+
+    // Lookup keys from each file slice
+    // TODO: parallelize this loop
+    for (FileSlice partition : partitionFileSlices) {
+      Map<String, List<HoodieRecord<HoodieMetadataPayload>>> currentFileSliceResult = lookupSecondaryKeysFromFileSlice(partitionName, keys, partition);
+
+      currentFileSliceResult.forEach((secondaryKey, secondaryRecords) -> {
+        result.merge(secondaryKey, secondaryRecords, (oldRecords, newRecords) -> {
+          newRecords.addAll(oldRecords);
+          return newRecords;
+        });
+      });
+    }
+
+    return result;
+  }
+
+  /**
+   * Lookup list of keys from a single file slice.
+   *
+   * @param partitionName Name of the partition
+   * @param secondaryKeys The list of secondary keys to lookup
+   * @param fileSlice     The file slice to read
+   * @return A {@code Map} of secondary-key to list of {@code HoodieRecord} for the secondary-keys which were found in the file slice
+   */
+  private Map<String, List<HoodieRecord<HoodieMetadataPayload>>> lookupSecondaryKeysFromFileSlice(String partitionName, List<String> secondaryKeys, FileSlice fileSlice) {
+    Map<String, HoodieRecord<HoodieMetadataPayload>> logRecordsMap = new HashMap<>();
+
+    Pair<HoodieSeekingFileReader<?>, HoodieMetadataLogRecordReader> readers = getOrCreateReaders(partitionName, fileSlice);
+    try {
+      List<Long> timings = new ArrayList<>(1);
+      HoodieSeekingFileReader<?> baseFileReader = readers.getKey();
+      HoodieMetadataLogRecordReader logRecordScanner = readers.getRight();
+      if (baseFileReader == null && logRecordScanner == null) {
+        return Collections.emptyMap();
+      }
+
+      // Sort it here once so that we don't need to sort individually for base file and for each individual log files.
+      Set<String> secondaryKeySet = new HashSet<>(secondaryKeys.size());
+      List<String> sortedSecondaryKeys = new ArrayList<>(secondaryKeys);
+      Collections.sort(sortedSecondaryKeys);

Review Comment:
   Wondering if we have a general sort solution that supports spilling to disk; the in-memory sort is slow and also carries a risk of OOM.
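The concern in the review comment is the usual external-sort tradeoff: `Collections.sort` holds every key in memory, so a very large secondary-key list risks OOM, whereas a spill-to-disk sort bounds memory by sorting fixed-size chunks and merging the sorted runs. A self-contained Java sketch of that idea, purely illustrative and not a Hudi utility:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSortSketch {

  // Holds the current head element of one sorted run during the merge.
  private static final class RunHead {
    String value;
    final BufferedReader reader;
    RunHead(String value, BufferedReader reader) { this.value = value; this.reader = reader; }
  }

  // Sorts arbitrarily many keys while keeping at most `chunkSize` of them in
  // memory: sort each chunk, spill it as a sorted "run" file, then k-way merge
  // the runs with a priority queue. (Temp files are left behind for brevity.)
  static List<String> externalSort(Iterator<String> input, int chunkSize) throws IOException {
    List<Path> runs = new ArrayList<>();
    List<String> chunk = new ArrayList<>(chunkSize);
    while (input.hasNext()) {
      chunk.add(input.next());
      if (chunk.size() == chunkSize || !input.hasNext()) {
        Collections.sort(chunk);                          // bounded in-memory sort
        Path run = Files.createTempFile("sort-run-", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);  // spill sorted run to disk
        runs.add(run);
        chunk.clear();
      }
    }

    // Seed the merge with the smallest element of every run.
    PriorityQueue<RunHead> heads = new PriorityQueue<>(Comparator.comparing((RunHead h) -> h.value));
    for (Path run : runs) {
      BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
      String first = reader.readLine();
      if (first != null) {
        heads.add(new RunHead(first, reader));
      }
    }

    List<String> sorted = new ArrayList<>();              // a real impl would stream this out
    while (!heads.isEmpty()) {
      RunHead smallest = heads.poll();
      sorted.add(smallest.value);
      String next = smallest.reader.readLine();
      if (next == null) {
        smallest.reader.close();
      } else {
        smallest.value = next;                            // advance this run and re-enqueue
        heads.add(smallest);
      }
    }
    return sorted;
  }

  public static void main(String[] args) throws IOException {
    List<String> keys = List.of("sk-banana", "sk-apple", "sk-cherry", "sk-apricot");
    System.out.println(externalSort(keys.iterator(), 2));
    // -> [sk-apple, sk-apricot, sk-banana, sk-cherry]
  }
}
```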
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
hudi-bot commented on PR #11162:
URL: https://github.com/apache/hudi/pull/11162#issuecomment-2144576528

   ## CI report:

   * df12fa59cbba5b14bb98d66dffb510f5b1659177 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24195)
   * b342d8f8e10f77419bf1bd0bc9f626a596ad65f9 UNKNOWN
   * b00831e0a0506714d27bc2a64e58084b357a83cc UNKNOWN

   Bot commands

   @hudi-bot supports the following commands:

   - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
danny0405 commented on code in PR #11162:
URL: https://github.com/apache/hudi/pull/11162#discussion_r1623995661

## hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java: ##

@@ -312,6 +312,33 @@ public Map<String, HoodieRecordGlobalLocation> readRecordIndex(List<String>
+   * If the Metadata Table is not enabled, an exception is thrown to distinguish this from the absence of the key.
+   *
+   * @param secondaryKeys The list of secondary keys to read
+   */
+  @Override
+  public Map<String, HoodieRecordGlobalLocation> readSecondaryIndex(List<String> secondaryKeys) {
+    ValidationUtils.checkState(dataMetaClient.getTableConfig().isMetadataPartitionAvailable(MetadataPartitionType.RECORD_INDEX),
+        "Record index is not initialized in MDT");
+    ValidationUtils.checkState(
+        dataMetaClient.getTableConfig().getMetadataPartitions().stream().anyMatch(partitionName -> partitionName.startsWith(MetadataPartitionType.SECONDARY_INDEX.getPartitionPath())),
+        "Secondary index is not initialized in MDT");
+    // Fetch secondary-index records
+    Map<String, List<HoodieRecord<HoodieMetadataPayload>>> secondaryKeyRecords = getSecondaryIndexRecords(secondaryKeys, MetadataPartitionType.SECONDARY_INDEX.getPartitionPath());
+    // Now collect the record-keys and fetch the RLI records

Review Comment:
   Do you mean secondary index records?
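The diff above performs a two-step lookup: each secondary key resolves to the record keys indexed under it, and those record keys are then pushed through the record-level index to obtain locations. A self-contained Java sketch of that composition, with plain `String` maps standing in for Hudi's record and location types (all data below is made up):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SecondaryLookupSketch {

  // Step 1 stand-in: secondary key -> record keys indexed under it
  // (what a getSecondaryIndexRecords-style call provides, minus Hudi's types).
  static Map<String, List<String>> readSecondaryIndexKeys(List<String> secondaryKeys) {
    Map<String, List<String>> index = new HashMap<>();
    index.put("city=SF", List.of("rk-1", "rk-7"));   // made-up index contents
    Map<String, List<String>> hits = new HashMap<>();
    for (String secondaryKey : secondaryKeys) {
      if (index.containsKey(secondaryKey)) {
        hits.put(secondaryKey, index.get(secondaryKey));
      }
    }
    return hits;
  }

  // Step 2 stand-in: record key -> file location (the RLI lookup).
  static Map<String, String> readRecordIndex(List<String> recordKeys) {
    Map<String, String> rli = Map.of("rk-1", "p1/file-3", "rk-7", "p2/file-9"); // made-up
    Map<String, String> located = new HashMap<>();
    for (String recordKey : recordKeys) {
      if (rli.containsKey(recordKey)) {
        located.put(recordKey, rli.get(recordKey));
      }
    }
    return located;
  }

  public static void main(String[] args) {
    // Compose the two lookups the way the diff's comments describe:
    // fetch secondary-index entries, collect record keys, then hit the RLI.
    Map<String, List<String>> secondaryHits = readSecondaryIndexKeys(List.of("city=SF"));
    List<String> recordKeys = secondaryHits.values().stream()
        .flatMap(List::stream).distinct().collect(Collectors.toList());
    System.out.println(readRecordIndex(recordKeys)); // e.g. {rk-1=p1/file-3, rk-7=p2/file-9}
  }
}
```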
Re: [PR] [HUDI-7146] Integrate secondary index on reader path [hudi]
hudi-bot commented on PR #11162:
URL: https://github.com/apache/hudi/pull/11162#issuecomment-2144549129

   ## CI report:

   * 3c52961bdbcb210e4c7140f5939143cfda7adb50 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24151)
   * df12fa59cbba5b14bb98d66dffb510f5b1659177 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24195)
   * b342d8f8e10f77419bf1bd0bc9f626a596ad65f9 UNKNOWN

   Bot commands

   @hudi-bot supports the following commands:

   - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]
hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2144531002

   ## CI report:

   * 9946df9dd0aca2e4e8613b36265462d76397c8d8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24194)

   Bot commands

   @hudi-bot supports the following commands:

   - `@hudi-bot run azure` re-run the last Azure build