[GitHub] [hudi] TengHuo commented on pull request #6000: [HUDI-4340] fix not parsable text DateTimeParseException in HoodieInstantTimeGenerator.parseDateFromInstantTime
TengHuo commented on PR #6000: URL: https://github.com/apache/hudi/pull/6000#issuecomment-1226840804

@hudi-bot run azure
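For context on the bug named in the PR title, here is an illustrative sketch using `java.time` — not the PR's actual fix. Hudi instant times are compact timestamps, and parsing one against a pattern it does not match throws `DateTimeParseException`; the pattern below is an assumption for illustration, with `HoodieInstantTimeGenerator` holding the authoritative formats.

```java
// Illustrative sketch only (not the PR's fix): parsing a Hudi-style instant time.
// The "yyyyMMddHHmmss" pattern is an assumption for illustration; check
// HoodieInstantTimeGenerator for the formats Hudi actually uses.
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class InstantTimeParseDemo {
  private static final DateTimeFormatter SECOND_GRANULARITY =
      DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

  public static void main(String[] args) {
    // A well-formed second-granularity instant parses cleanly.
    System.out.println(LocalDateTime.parse("20220825143015", SECOND_GRANULARITY));
    try {
      // A longer (e.g. millisecond-suffixed) instant leaves unparsed text behind
      // and fails with the DateTimeParseException named in the PR title.
      LocalDateTime.parse("20220825143015123", SECOND_GRANULARITY);
    } catch (DateTimeParseException e) {
      System.out.println("not parsable: " + e.getMessage());
    }
  }
}
```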
[GitHub] [hudi] hudi-bot commented on pull request #6493: [HUDI-4715] Needed To ReSync Hive In StreamWriteOperatorCoordinator's…
hudi-bot commented on PR #6493: URL: https://github.com/apache/hudi/pull/6493#issuecomment-1226819031

## CI report:

* 1168aaac3b5f8339c7f366ea486d5bf7f8ca0259 UNKNOWN
[GitHub] [hudi] xushiyan opened a new pull request, #6494: Hudi 4696 fix flaky
xushiyan opened a new pull request, #6494: URL: https://github.com/apache/hudi/pull/6494

### Change Logs

Fix flakiness:

```text
[ERROR] org.apache.hudi.hadoop.functional.TestHoodieCombineHiveInputFormat  Time elapsed: 0.675 s <<< ERROR!
java.lang.NullPointerException
	at org.apache.hudi.common.testutils.minicluster.HdfsTestService.configureDFSCluster(HdfsTestService.java:135)
	at org.apache.hudi.common.testutils.minicluster.HdfsTestService.start(HdfsTestService.java:87)
	at org.apache.hudi.common.testutils.minicluster.MiniClusterUtil.setUp(MiniClusterUtil.java:41)
```

### Impact

**Risk level: none**

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
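The stack trace points at a null dereference during shared mini-cluster configuration. As a general illustration of this class of fix — an assumption, not the actual patch in #6494 — the usual remedy is to resolve required settings up front with an explicit default instead of dereferencing a possibly-null lookup mid-setup:

```java
// Minimal sketch (hypothetical names; NOT the actual patch in #6494) of the usual
// remedy for NPE flakiness in test-service setup: never hand a possibly-null
// lookup result to later configuration code.
import java.util.Map;

public class DfsTestServiceSketch {
  private final Map<String, String> env;

  public DfsTestServiceSketch(Map<String, String> env) {
    this.env = env;
  }

  public String bindAddress() {
    // getOrDefault never returns null here, so later string operations are safe
    // even when the environment omits the key (the situation an NPE hints at).
    return env.getOrDefault("HDFS_BIND_ADDRESS", "127.0.0.1");
  }

  public static void main(String[] args) {
    System.out.println(new DfsTestServiceSketch(Map.of()).bindAddress());
  }
}
```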
[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6484: URL: https://github.com/apache/hudi/pull/6484#issuecomment-1226815507

## CI report:

* cab8551200693c9633c54b3e2acc9ab68ddc44cf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10930)
* 5fbf569007c321799dcc80b827c0a2417e1dbf49 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10936)
[jira] [Updated] (HUDI-4715) Needed To ReSync Hive In StreamWriteOperatorCoordinator's initInstant method
[ https://issues.apache.org/jira/browse/HUDI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4715:
    Labels: pull-request-available  (was: )

> Needed To ReSync Hive In StreamWriteOperatorCoordinator's initInstant method
>
> Key: HUDI-4715
> URL: https://issues.apache.org/jira/browse/HUDI-4715
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: yuemeng
> Priority: Major
> Labels: pull-request-available
>
> Currently, the coordinator recommits the last inflight instant if the write metadata checkpoint succeeded but the instant was not committed due to some rare cases. When we recommit the instant, we also need to re-sync Hive, if Hive sync is enabled.
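The change the issue describes is easiest to see as code. Below is a self-contained sketch with hypothetical interfaces standing in for the real Hudi classes (the actual change would live in StreamWriteOperatorCoordinator#initInstant): after recommitting a dangling inflight instant, also trigger Hive sync when it is enabled.

```java
// Sketch of the proposed behavior, with hypothetical stand-in interfaces — the
// real code path is StreamWriteOperatorCoordinator#initInstant in hudi-flink.
public class RecommitSketch {
  interface WriteClient { void commit(String instant); }
  interface HiveSyncer { void sync(); }

  private final WriteClient client;
  private final HiveSyncer hiveSyncer;
  private final boolean hiveSyncEnabled;

  RecommitSketch(WriteClient client, HiveSyncer hiveSyncer, boolean hiveSyncEnabled) {
    this.client = client;
    this.hiveSyncer = hiveSyncer;
    this.hiveSyncEnabled = hiveSyncEnabled;
  }

  void recommitIfNeeded(String inflightInstant) {
    client.commit(inflightInstant);  // existing behavior: recommit the dangling instant
    if (hiveSyncEnabled) {
      hiveSyncer.sync();             // proposed addition: re-sync Hive as well
    }
  }

  public static void main(String[] args) {
    new RecommitSketch(
        instant -> System.out.println("recommitted " + instant),
        () -> System.out.println("hive re-synced"),
        true).recommitIfNeeded("20220825143015");
  }
}
```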
[jira] [Assigned] (HUDI-4715) Needed To ReSync Hive In StreamWriteOperatorCoordinator's initInstant method
[ https://issues.apache.org/jira/browse/HUDI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yuemeng reassigned HUDI-4715:
    Assignee: yuemeng

> Needed To ReSync Hive In StreamWriteOperatorCoordinator's initInstant method
>
> Key: HUDI-4715
> URL: https://issues.apache.org/jira/browse/HUDI-4715
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: yuemeng
> Assignee: yuemeng
> Priority: Major
> Labels: pull-request-available
>
> Currently, the coordinator recommits the last inflight instant if the write metadata checkpoint succeeded but the instant was not committed due to some rare cases. When we recommit the instant, we also need to re-sync Hive, if Hive sync is enabled.
[GitHub] [hudi] JerryYue-M opened a new pull request, #6493: [HUDI-4715] Needed To ReSync Hive In StreamWriteOperatorCoordinator's…
JerryYue-M opened a new pull request, #6493: URL: https://github.com/apache/hudi/pull/6493

… initInstant method

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

**Risk level: none | low | medium | high**

_Choose one. If medium or high, explain what verification was done to mitigate the risks._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6484: URL: https://github.com/apache/hudi/pull/6484#issuecomment-1226812308

## CI report:

* cab8551200693c9633c54b3e2acc9ab68ddc44cf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10930)
* 5fbf569007c321799dcc80b827c0a2417e1dbf49 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6490: [HUDI-4713] Fix flaky ITTestHoodieDataSource#testAppendWrite
hudi-bot commented on PR #6490: URL: https://github.com/apache/hudi/pull/6490#issuecomment-1226806258

## CI report:

* 9d140233d69c7b5d3c7e68d31c3824e7082e40cb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10931)
[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6484: URL: https://github.com/apache/hudi/pull/6484#issuecomment-1226806206

## CI report:

* cab8551200693c9633c54b3e2acc9ab68ddc44cf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10930)
[jira] [Closed] (HUDI-4665) Flip default for "ignore.failed.batch" for streaming sink
[ https://issues.apache.org/jira/browse/HUDI-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu closed HUDI-4665.
    Fix Version/s: 0.12.1
       Resolution: Fixed

> Flip default for "ignore.failed.batch" for streaming sink
>
> Key: HUDI-4665
> URL: https://issues.apache.org/jira/browse/HUDI-4665
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.1
[hudi] branch master updated (5f92221655 -> c188852f49)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 5f92221655 [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies (#6170)
     add c188852f49 [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false (#6450)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/configuration/FlinkOptions.java | 3 ++-
 .../src/main/scala/org/apache/hudi/DataSourceOptions.scala | 7 ---
 .../src/main/scala/org/apache/hudi/HoodieStreamingSink.scala | 2 +-
 .../hudi-spark/src/test/java/HoodieJavaStreamingApp.java | 1 +
 4 files changed, 8 insertions(+), 5 deletions(-)
[GitHub] [hudi] xushiyan merged pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
xushiyan merged PR #6450: URL: https://github.com/apache/hudi/pull/6450
[jira] [Updated] (HUDI-4716) Avoid bundle parquet in hadoop-mr
[ https://issues.apache.org/jira/browse/HUDI-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4716:
    Fix Version/s: 0.13.0
           (was: 0.12.1)

> Avoid bundle parquet in hadoop-mr
>
> Key: HUDI-4716
> URL: https://issues.apache.org/jira/browse/HUDI-4716
> Project: Apache Hudi
> Issue Type: Improvement
> Components: dependencies
> Reporter: Raymond Xu
> Priority: Blocker
> Fix For: 0.13.0
>
> As per discussion in https://github.com/apache/hudi/pull/5250#discussion_r930144788
> This will reduce the bundle size and uphold the principle of not bundling file storage format.
[jira] [Created] (HUDI-4716) Avoid bundle parquet in hadoop-mr
Raymond Xu created HUDI-4716:

Summary: Avoid bundle parquet in hadoop-mr
Key: HUDI-4716
URL: https://issues.apache.org/jira/browse/HUDI-4716
Project: Apache Hudi
Issue Type: Improvement
Components: dependencies
Reporter: Raymond Xu
Fix For: 0.12.1

As per discussion in https://github.com/apache/hudi/pull/5250#discussion_r930144788

This will reduce the bundle size and uphold the principle of not bundling file storage format.
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1226773397

## CI report:

* 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN
* e5b86079b7405e9bb1604c9960d85349855db23a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10929)
* 5613f14b3d5f1c8aaf8de1730e2f21b78a657150 UNKNOWN
* b26294c07ac06186c66a10444e7677656be94037 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10934)
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1226770952

## CI report:

* 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN
* e5b86079b7405e9bb1604c9960d85349855db23a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10929)
* 5613f14b3d5f1c8aaf8de1730e2f21b78a657150 UNKNOWN
* b26294c07ac06186c66a10444e7677656be94037 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6135: [HUDI-4418] Add support for ProtoKafkaSource
hudi-bot commented on PR #6135: URL: https://github.com/apache/hudi/pull/6135#issuecomment-1226770645

## CI report:

* d36fed637603d9959e8d049ac0815b9c729eb246 UNKNOWN
* 194396032e698ac4210d8652a969c7e58832d5db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10925)
* 9a4df85fd5b1787af587817cba959c536d904fde Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10933)
[GitHub] [hudi] zjuwangg opened a new pull request, #6492: Update compaction.md
zjuwangg opened a new pull request, #6492: URL: https://github.com/apache/hudi/pull/6492

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

**Risk level: none | low | medium | high**

_Choose one. If medium or high, explain what verification was done to mitigate the risks._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226768249

## CI report:

* 7deae45265d4ade6c324d4477e6144289f1714fd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10927)
[jira] [Created] (HUDI-4715) Needed To ReSync Hive In StreamWriteOperatorCoordinator's initInstant method
yuemeng created HUDI-4715:

Summary: Needed To ReSync Hive In StreamWriteOperatorCoordinator's initInstant method
Key: HUDI-4715
URL: https://issues.apache.org/jira/browse/HUDI-4715
Project: Apache Hudi
Issue Type: Bug
Reporter: yuemeng

Currently, the coordinator recommits the last inflight instant if the write metadata checkpoint succeeded but the instant was not committed due to some rare cases. When we recommit the instant, we also need to re-sync Hive, if Hive sync is enabled.
[GitHub] [hudi] hudi-bot commented on pull request #6491: [HUDI-4714] HoodieFlinkWriteClient can't load callback config to Hood…
hudi-bot commented on PR #6491: URL: https://github.com/apache/hudi/pull/6491#issuecomment-1226737763

## CI report:

* 25604fb76beaec486a84061f29f63260b3412521 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10932)
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1226737727

## CI report:

* 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN
* e5b86079b7405e9bb1604c9960d85349855db23a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10929)
* 5613f14b3d5f1c8aaf8de1730e2f21b78a657150 UNKNOWN
[GitHub] [hudi] WangCHX commented on issue #6487: [SUPPORT] Primary key check in deltastreamer
WangCHX commented on issue #6487: URL: https://github.com/apache/hudi/issues/6487#issuecomment-1226735410

thanks. @yihua
[GitHub] [hudi] hudi-bot commented on pull request #6491: [HUDI-4714] HoodieFlinkWriteClient can't load callback config to Hood…
hudi-bot commented on PR #6491: URL: https://github.com/apache/hudi/pull/6491#issuecomment-1226735211

## CI report:

* 25604fb76beaec486a84061f29f63260b3412521 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6490: [HUDI-4713] Fix flaky ITTestHoodieDataSource#testAppendWrite
hudi-bot commented on PR #6490: URL: https://github.com/apache/hudi/pull/6490#issuecomment-1226735186

## CI report:

* 9d140233d69c7b5d3c7e68d31c3824e7082e40cb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10931)
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1226735169

## CI report:

* 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN
* e5b86079b7405e9bb1604c9960d85349855db23a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10929)
[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6484: URL: https://github.com/apache/hudi/pull/6484#issuecomment-1226735140

## CI report:

* 5fa2c9cab3ed92e80292666043cbbd71cb24dc23 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10915)
* cab8551200693c9633c54b3e2acc9ab68ddc44cf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10930)
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226735073

## CI report:

* 7deae45265d4ade6c324d4477e6144289f1714fd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10927)
* Unknown: [CANCELED](TBD)
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1226732509

## CI report:

* 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN
* e5b86079b7405e9bb1604c9960d85349855db23a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6484: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6484: URL: https://github.com/apache/hudi/pull/6484#issuecomment-1226732486

## CI report:

* 5fa2c9cab3ed92e80292666043cbbd71cb24dc23 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10915)
* cab8551200693c9633c54b3e2acc9ab68ddc44cf UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6490: [HUDI-4713] Fix flaky ITTestHoodieDataSource#testAppendWrite
hudi-bot commented on PR #6490: URL: https://github.com/apache/hudi/pull/6490#issuecomment-1226732541

## CI report:

* 9d140233d69c7b5d3c7e68d31c3824e7082e40cb UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226732418

## CI report:

* 7deae45265d4ade6c324d4477e6144289f1714fd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10927)
* Unknown: [CANCELED](TBD)
[GitHub] [hudi] hudi-bot commented on pull request #6135: [HUDI-4418] Add support for ProtoKafkaSource
hudi-bot commented on PR #6135: URL: https://github.com/apache/hudi/pull/6135#issuecomment-1226732135

## CI report:

* d36fed637603d9959e8d049ac0815b9c729eb246 UNKNOWN
* 194396032e698ac4210d8652a969c7e58832d5db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10925)
* 9a4df85fd5b1787af587817cba959c536d904fde UNKNOWN
[jira] [Updated] (HUDI-4714) HoodieFlinkWriteClient can't load callback config to HoodieWriteConfig
[ https://issues.apache.org/jira/browse/HUDI-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4714:
    Labels: pull-request-available  (was: )

> HoodieFlinkWriteClient can't load callback config to HoodieWriteConfig
>
> Key: HUDI-4714
> URL: https://issues.apache.org/jira/browse/HUDI-4714
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: yuemeng
> Assignee: yuemeng
> Priority: Major
> Labels: pull-request-available
>
> Currently, StreamerUtil's getHoodieClientConfig method does not load the callback config into the write config, so in the Hudi Flink write client the callback never works:
>
> {code}
> HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder()
>     .withEngineType(EngineType.FLINK)
>     .withPath(conf.getString(FlinkOptions.PATH))
>     .combineInput(conf.getBoolean(FlinkOptions.PRE_COMBINE), true)
>     .withMergeAllowDuplicateOnInserts(OptionsResolver.insertClustering(conf))
>     .withClusteringConfig(
>         HoodieClusteringConfig.newBuilder()
>             .withAsyncClustering(conf.getBoolean(FlinkOptions.CLUSTERING_ASYNC_ENABLED))
>             .withClusteringPlanStrategyClass(conf.getString(FlinkOptions.CLUSTERING_PLAN_STRATEGY_CLASS))
>             .withClusteringPlanPartitionFilterMode(
>                 ClusteringPlanPartitionFilterMode.valueOf(conf.getString(FlinkOptions.CLUSTERING_PLAN_PARTITION_FILTER_MODE_NAME)))
>             .withClusteringTargetPartitions(conf.getInteger(FlinkOptions.CLUSTERING_TARGET_PARTITIONS))
>             .withClusteringMaxNumGroups(conf.getInteger(FlinkOptions.CLUSTERING_MAX_NUM_GROUPS))
>             .withClusteringTargetFileMaxBytes(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_TARGET_FILE_MAX_BYTES))
>             .withClusteringPlanSmallFileLimit(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT) * 1024 * 1024L)
>             .withClusteringSkipPartitionsFromLatest(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SKIP_PARTITIONS_FROM_LATEST))
>             .withAsyncClusteringMaxCommits(conf.getInteger(FlinkOptions.CLUSTERING_DELTA_COMMITS))
>             .build())
>     .withCleanConfig(HoodieCleanConfig.newBuilder()
>         .withAsyncClean(conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED))
>         .retainCommits(conf.getInteger(FlinkOptions.CLEAN_RETAIN_COMMITS))
>         .cleanerNumHoursRetained(conf.getInteger(FlinkOptions.CLEAN_RETAIN_HOURS))
>         .retainFileVersions(conf.getInteger(FlinkOptions.CLEAN_RETAIN_FILE_VERSIONS))
>         // override and hardcode to 20,
>         // actually Flink cleaning is always with parallelism 1 now
>         .withCleanerParallelism(20)
>         .withCleanerPolicy(HoodieCleaningPolicy.valueOf(conf.getString(FlinkOptions.CLEAN_POLICY)))
>         .build())
>     .withArchivalConfig(HoodieArchivalConfig.newBuilder()
>         .archiveCommitsWith(conf.getInteger(FlinkOptions.ARCHIVE_MIN_COMMITS), conf.getInteger(FlinkOptions.ARCHIVE_MAX_COMMITS))
>         .build())
>     .withCompactionConfig(HoodieCompactionConfig.newBuilder()
>         .withTargetIOPerCompactionInMB(conf.getLong(FlinkOptions.COMPACTION_TARGET_IO))
>         .withInlineCompactionTriggerStrategy(
>             CompactionTriggerStrategy.valueOf(conf.getString(FlinkOptions.COMPACTION_TRIGGER_STRATEGY).toUpperCase(Locale.ROOT)))
>         .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.COMPACTION_DELTA_COMMITS))
>         .withMaxDeltaSecondsBeforeCompaction(conf.getInteger(FlinkOptions.COMPACTION_DELTA_SECONDS))
>         .build())
>     .withMemoryConfig(
>         HoodieMemoryConfig.newBuilder()
>             .withMaxMemoryMaxSize(
>                 conf.getInteger(FlinkOptions.WRITE_MERGE_MAX_MEMORY) * 1024 * 1024L,
>                 conf.getInteger(FlinkOptions.COMPACTION_MAX_MEMORY) * 1024 * 1024L
>             ).build())
>     .forTable(conf.getString(FlinkOptions.TABLE_NAME))
>     .withStorageConfig(HoodieStorageConfig.newBuilder()
>         .logFileDataBlockMaxSize(conf.getInteger(FlinkOptions.WRITE_LOG_BLOCK_SIZE) * 1024 * 1024)
>         .logFileMaxSize(conf.getLong(FlinkOptions.WRITE_LOG_MAX_SIZE) * 1024 * 1024)
>         .parquetBlockSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_BLOCK_SIZE) * 1024 * 1024)
>         .parquetPageSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_PAGE_SIZE) * 1024 * 1024)
>         .parquetMaxFileSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_MAX_FILE_SIZE) * 1024 * 1024L)
>         .build())
>     .withMetadataConfig(HoodieMetadataConfig.newBuilder()
>         .enable(conf.getBoolean(FlinkOptions.METADATA_ENABLED))
>         .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.METADATA_COMPACTION_DELTA_COMMITS))
>         .build())
>     .withLockConfig(HoodieLockConfig.newBuilder()
>         .withLockProvider(FileSystemBasedLockProvider.class)
>         .withLockWaitTimeInMillis(2000L) // 2s
>         .withFileSystemLockExpire(1) // 1 minute
>         .withClientNumRetries(30)
>         .withFileSystemLockPath(StreamerUtil.getAuxiliaryPath(conf))
>         .build())
>     .withPayloadConfig(getPayloadConfig(conf))
>     .withEmbeddedTimelineServerEnabled(enableEmbeddedTimelineService)
>     .withEmbeddedTimelineServerReuseEnabled(true) // make write client embedded timeline service singleton
>     .withAutoCommit(false)
>     .withAllowOperationMetadataField(conf.getBoolean(FlinkOptions.CHANGELOG_ENABLED))
>     .withProps(flinkConf2TypedProperties(conf))
>     .withSchema(getSourceSchema(conf).toString());
> {code}
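A hedged sketch of the kind of fix the issue implies: pass the commit-callback settings through to HoodieWriteConfig so the Flink write client sees them. The config keys below match Hudi's documented commit-callback options and HoodieWriteCommitHttpCallback is a real built-in implementation, but treat the exact wiring (and whether `withProps` alone suffices here) as an assumption, not the merged patch.

```java
// Sketch, not the merged patch: carry the callback properties into
// HoodieWriteConfig. Requires hudi-client-common on the classpath; verify the
// key names against your Hudi version.
import java.util.Properties;
import org.apache.hudi.config.HoodieWriteConfig;

public class CallbackConfigSketch {
  public static HoodieWriteConfig buildWithCallback(String basePath) {
    Properties props = new Properties();
    // Documented commit-callback options (assumed unchanged in your version):
    props.setProperty("hoodie.write.commit.callback.on", "true");
    props.setProperty("hoodie.write.commit.callback.class",
        "org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback");
    return HoodieWriteConfig.newBuilder()
        .withPath(basePath)
        .forTable("demo_table")       // hypothetical table name for illustration
        .withProps(props)             // carries the callback settings into the config
        .build();
  }
}
```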
[GitHub] [hudi] JerryYue-M opened a new pull request, #6491: [HUDI-4714] HoodieFlinkWriteClient can't load callback config to Hood…
JerryYue-M opened a new pull request, #6491: URL: https://github.com/apache/hudi/pull/6491

…ieWriteConfig

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

**Risk level: none | low | medium | high**

_Choose one. If medium or high, explain what verification was done to mitigate the risks._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1226729451

## CI report:

* 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226729350

## CI report:

* 7deae45265d4ade6c324d4477e6144289f1714fd UNKNOWN
[GitHub] [hudi] xushiyan commented on a diff in pull request #6409: [HUDI-4629] Create hive table from existing hoodie table failed when the table schema is not defined
xushiyan commented on code in PR #6409: URL: https://github.com/apache/hudi/pull/6409#discussion_r954466771

hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala:

@@ -129,7 +129,17 @@ class HoodieCatalogTable(val spark: SparkSession, var table: CatalogTable) exten
   /**
    * Table schema
    */
-  lazy val tableSchema: StructType = table.schema
+  lazy val tableSchema: StructType = if (table.schema.nonEmpty) {
+    table.schema
+  } else {
+    val schemaFromMetaOpt = loadTableSchemaByMetaClient()

Review Comment:
@jinxing64 can you also have a look pls? given this is based on your previous change
[GitHub] [hudi] xushiyan commented on a diff in pull request #6409: [HUDI-4629] Create hive table from existing hoodie table failed when the table schema is not defined
xushiyan commented on code in PR #6409: URL: https://github.com/apache/hudi/pull/6409#discussion_r954466051

hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala:

@@ -129,7 +129,17 @@ class HoodieCatalogTable(val spark: SparkSession, var table: CatalogTable) exten
   /**
    * Table schema
    */
-  lazy val tableSchema: StructType = table.schema
+  lazy val tableSchema: StructType = if (table.schema.nonEmpty) {
+    table.schema
+  } else {
+    val schemaFromMetaOpt = loadTableSchemaByMetaClient()

Review Comment:
this is already handled in `parseSchemaAndConfigs()`, isn't it?
[jira] [Created] (HUDI-4714) HoodieFlinkWriteClient can't load callback config to HoodieWriteConfig
yuemeng created HUDI-4714:

Summary: HoodieFlinkWriteClient can't load callback config to HoodieWriteConfig
Key: HUDI-4714
URL: https://issues.apache.org/jira/browse/HUDI-4714
Project: Apache Hudi
Issue Type: Bug
Reporter: yuemeng

Currently, StreamerUtil's getHoodieClientConfig method does not load the callback config into the write config, so in the Hudi Flink write client the callback never works:

{code} HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder() .withEngineType(EngineType.FLINK) .withPath(conf.getString(FlinkOptions.PATH)) .combineInput(conf.getBoolean(FlinkOptions.PRE_COMBINE), true) .withMergeAllowDuplicateOnInserts(OptionsResolver.insertClustering(conf)) .withClusteringConfig( HoodieClusteringConfig.newBuilder() .withAsyncClustering(conf.getBoolean(FlinkOptions.CLUSTERING_ASYNC_ENABLED)) .withClusteringPlanStrategyClass(conf.getString(FlinkOptions.CLUSTERING_PLAN_STRATEGY_CLASS)) .withClusteringPlanPartitionFilterMode( ClusteringPlanPartitionFilterMode.valueOf(conf.getString(FlinkOptions.CLUSTERING_PLAN_PARTITION_FILTER_MODE_NAME))) .withClusteringTargetPartitions(conf.getInteger(FlinkOptions.CLUSTERING_TARGET_PARTITIONS)) .withClusteringMaxNumGroups(conf.getInteger(FlinkOptions.CLUSTERING_MAX_NUM_GROUPS)) .withClusteringTargetFileMaxBytes(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_TARGET_FILE_MAX_BYTES)) .withClusteringPlanSmallFileLimit(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT) * 1024 * 1024L) .withClusteringSkipPartitionsFromLatest(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SKIP_PARTITIONS_FROM_LATEST)) .withAsyncClusteringMaxCommits(conf.getInteger(FlinkOptions.CLUSTERING_DELTA_COMMITS)) .build()) .withCleanConfig(HoodieCleanConfig.newBuilder() .withAsyncClean(conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) .retainCommits(conf.getInteger(FlinkOptions.CLEAN_RETAIN_COMMITS)) .cleanerNumHoursRetained(conf.getInteger(FlinkOptions.CLEAN_RETAIN_HOURS)) .retainFileVersions(conf.getInteger(FlinkOptions.CLEAN_RETAIN_FILE_VERSIONS)) // override and hardcode to 20, // actually Flink cleaning is always with parallelism 1 now .withCleanerParallelism(20) .withCleanerPolicy(HoodieCleaningPolicy.valueOf(conf.getString(FlinkOptions.CLEAN_POLICY))) .build()) .withArchivalConfig(HoodieArchivalConfig.newBuilder() .archiveCommitsWith(conf.getInteger(FlinkOptions.ARCHIVE_MIN_COMMITS), conf.getInteger(FlinkOptions.ARCHIVE_MAX_COMMITS)) .build()) .withCompactionConfig(HoodieCompactionConfig.newBuilder() .withTargetIOPerCompactionInMB(conf.getLong(FlinkOptions.COMPACTION_TARGET_IO)) .withInlineCompactionTriggerStrategy( CompactionTriggerStrategy.valueOf(conf.getString(FlinkOptions.COMPACTION_TRIGGER_STRATEGY).toUpperCase(Locale.ROOT))) .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.COMPACTION_DELTA_COMMITS)) .withMaxDeltaSecondsBeforeCompaction(conf.getInteger(FlinkOptions.COMPACTION_DELTA_SECONDS)) .build()) .withMemoryConfig( HoodieMemoryConfig.newBuilder() .withMaxMemoryMaxSize( conf.getInteger(FlinkOptions.WRITE_MERGE_MAX_MEMORY) * 1024 * 1024L, conf.getInteger(FlinkOptions.COMPACTION_MAX_MEMORY) * 1024 * 1024L ).build()) .forTable(conf.getString(FlinkOptions.TABLE_NAME)) .withStorageConfig(HoodieStorageConfig.newBuilder() .logFileDataBlockMaxSize(conf.getInteger(FlinkOptions.WRITE_LOG_BLOCK_SIZE) * 1024 * 1024) .logFileMaxSize(conf.getLong(FlinkOptions.WRITE_LOG_MAX_SIZE) * 1024 * 1024) .parquetBlockSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_BLOCK_SIZE) * 1024 * 1024) .parquetPageSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_PAGE_SIZE) * 1024 * 1024) .parquetMaxFileSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_MAX_FILE_SIZE) * 1024 * 1024L) .build()) .withMetadataConfig(HoodieMetadataConfig.newBuilder() .enable(conf.getBoolean(FlinkOptions.METADATA_ENABLED)) .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.METADATA_COMPACTION_DELTA_COMMITS)) .build()) .withLockConfig(HoodieLockConfig.newBuilder() .withLockProvider(FileSystemBasedLockProvider.class) .withLockWaitTimeInMillis(2000L) // 2s .withFileSystemLockExpire(1) // 1 minute .withClientNumRetries(30) .withFileSystemLockPath(StreamerUtil.getAuxiliaryPath(conf)) .build()) .withPayloadConfig(getPayloadConfig(conf)) .withEmbeddedTimelineServerEnabled(enableEmbeddedTimelineService) .withEmbeddedTimelineServerReuseEnabled(true) // make write client embedded timeline service singleton .withAutoCommit(false) .withAllowOperationMetadataField(conf.getBoolean(FlinkOptions.CHANGELOG_ENABLED)) .withProps(flinkConf2TypedProperties(conf)) .withSchema(getSourceSchema(conf).toString()); {code}
[jira] [Assigned] (HUDI-4714) HoodieFlinkWriteClient can't load callback config to HoodieWriteConfig
[ https://issues.apache.org/jira/browse/HUDI-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yuemeng reassigned HUDI-4714:
    Assignee: yuemeng

> HoodieFlinkWriteClient can't load callback config to HoodieWriteConfig
>
> Key: HUDI-4714
> URL: https://issues.apache.org/jira/browse/HUDI-4714
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: yuemeng
> Assignee: yuemeng
> Priority: Major
>
> Currently, StreamerUtil's getHoodieClientConfig method does not load the callback config into the write config, so in the Hudi Flink write client the callback never works:
>
> {code} HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder() .withEngineType(EngineType.FLINK) .withPath(conf.getString(FlinkOptions.PATH)) .combineInput(conf.getBoolean(FlinkOptions.PRE_COMBINE), true) .withMergeAllowDuplicateOnInserts(OptionsResolver.insertClustering(conf)) .withClusteringConfig( HoodieClusteringConfig.newBuilder() .withAsyncClustering(conf.getBoolean(FlinkOptions.CLUSTERING_ASYNC_ENABLED)) .withClusteringPlanStrategyClass(conf.getString(FlinkOptions.CLUSTERING_PLAN_STRATEGY_CLASS)) .withClusteringPlanPartitionFilterMode( ClusteringPlanPartitionFilterMode.valueOf(conf.getString(FlinkOptions.CLUSTERING_PLAN_PARTITION_FILTER_MODE_NAME))) .withClusteringTargetPartitions(conf.getInteger(FlinkOptions.CLUSTERING_TARGET_PARTITIONS)) .withClusteringMaxNumGroups(conf.getInteger(FlinkOptions.CLUSTERING_MAX_NUM_GROUPS)) .withClusteringTargetFileMaxBytes(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_TARGET_FILE_MAX_BYTES)) .withClusteringPlanSmallFileLimit(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT) * 1024 * 1024L) .withClusteringSkipPartitionsFromLatest(conf.getInteger(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SKIP_PARTITIONS_FROM_LATEST)) .withAsyncClusteringMaxCommits(conf.getInteger(FlinkOptions.CLUSTERING_DELTA_COMMITS)) .build()) .withCleanConfig(HoodieCleanConfig.newBuilder() .withAsyncClean(conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) .retainCommits(conf.getInteger(FlinkOptions.CLEAN_RETAIN_COMMITS)) .cleanerNumHoursRetained(conf.getInteger(FlinkOptions.CLEAN_RETAIN_HOURS)) .retainFileVersions(conf.getInteger(FlinkOptions.CLEAN_RETAIN_FILE_VERSIONS)) // override and hardcode to 20, // actually Flink cleaning is always with parallelism 1 now .withCleanerParallelism(20) .withCleanerPolicy(HoodieCleaningPolicy.valueOf(conf.getString(FlinkOptions.CLEAN_POLICY))) .build()) .withArchivalConfig(HoodieArchivalConfig.newBuilder() .archiveCommitsWith(conf.getInteger(FlinkOptions.ARCHIVE_MIN_COMMITS), conf.getInteger(FlinkOptions.ARCHIVE_MAX_COMMITS)) .build()) .withCompactionConfig(HoodieCompactionConfig.newBuilder() .withTargetIOPerCompactionInMB(conf.getLong(FlinkOptions.COMPACTION_TARGET_IO)) .withInlineCompactionTriggerStrategy( CompactionTriggerStrategy.valueOf(conf.getString(FlinkOptions.COMPACTION_TRIGGER_STRATEGY).toUpperCase(Locale.ROOT))) .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.COMPACTION_DELTA_COMMITS)) .withMaxDeltaSecondsBeforeCompaction(conf.getInteger(FlinkOptions.COMPACTION_DELTA_SECONDS)) .build()) .withMemoryConfig( HoodieMemoryConfig.newBuilder() .withMaxMemoryMaxSize( conf.getInteger(FlinkOptions.WRITE_MERGE_MAX_MEMORY) * 1024 * 1024L, conf.getInteger(FlinkOptions.COMPACTION_MAX_MEMORY) * 1024 * 1024L ).build()) .forTable(conf.getString(FlinkOptions.TABLE_NAME)) .withStorageConfig(HoodieStorageConfig.newBuilder() .logFileDataBlockMaxSize(conf.getInteger(FlinkOptions.WRITE_LOG_BLOCK_SIZE) * 1024 * 1024) .logFileMaxSize(conf.getLong(FlinkOptions.WRITE_LOG_MAX_SIZE) * 1024 * 1024) .parquetBlockSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_BLOCK_SIZE) * 1024 * 1024) .parquetPageSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_PAGE_SIZE) * 1024 * 1024) .parquetMaxFileSize(conf.getInteger(FlinkOptions.WRITE_PARQUET_MAX_FILE_SIZE) * 1024 * 1024L) .build()) .withMetadataConfig(HoodieMetadataConfig.newBuilder() .enable(conf.getBoolean(FlinkOptions.METADATA_ENABLED)) .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.METADATA_COMPACTION_DELTA_COMMITS)) .build()) .withLockConfig(HoodieLockConfig.newBuilder() .withLockProvider(FileSystemBasedLockProvider.class) .withLockWaitTimeInMillis(2000L) // 2s .withFileSystemLockExpire(1) // 1 minute .withClientNumRetries(30) .withFileSystemLockPath(StreamerUtil.getAuxiliaryPath(conf)) .build()) .withPayloadConfig(getPayloadConfig(conf)) .withEmbeddedTimelineServerEnabled(enableEmbeddedTimelineService) .withEmbeddedTimelineServerReuseEnabled(true) // make write client embedded timeline service singleton .withAutoCommit(false) .withAllowOperationMetadataField(conf.getBoolean(FlinkOptions.CHANGELOG_ENABLED)) .withProps(flinkConf2TypedProperties(conf)) .withSchema(getSourceSchema(conf).toString()); {code}
[jira] [Updated] (HUDI-4713) Fix flaky ITTestHoodieDataSource#testAppendWrite
[ https://issues.apache.org/jira/browse/HUDI-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4713:
    Labels: pull-request-available  (was: )

> Fix flaky ITTestHoodieDataSource#testAppendWrite
>
> Key: HUDI-4713
> URL: https://issues.apache.org/jira/browse/HUDI-4713
> Project: Apache Hudi
> Issue Type: Test
> Components: flink
> Reporter: Danny Chen
> Priority: Major
> Labels: pull-request-available
[GitHub] [hudi] danny0405 opened a new pull request, #6490: [HUDI-4713] Fix flaky ITTestHoodieDataSource#testAppendWrite
danny0405 opened a new pull request, #6490: URL: https://github.com/apache/hudi/pull/6490

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

**Risk level: none | low | medium | high**

_Choose one. If medium or high, explain what verification was done to mitigate the risks._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Created] (HUDI-4713) Fix flaky ITTestHoodieDataSource#testAppendWrite
Danny Chen created HUDI-4713:

Summary: Fix flaky ITTestHoodieDataSource#testAppendWrite
Key: HUDI-4713
URL: https://issues.apache.org/jira/browse/HUDI-4713
Project: Apache Hudi
Issue Type: Test
Components: flink
Reporter: Danny Chen
[GitHub] [hudi] wzx140 commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
wzx140 commented on PR #6486: URL: https://github.com/apache/hudi/pull/6486#issuecomment-1226708070

@xiarixiaoyao Could you please review it? Really thanks!
[GitHub] [hudi] xushiyan commented on a diff in pull request #6240: [HUDI-4482][HUDI-4483] fix checkstyle error and remove guava
xushiyan commented on code in PR #6240: URL: https://github.com/apache/hudi/pull/6240#discussion_r954451365

hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowHoodieLogFileMetadataProcedure.scala:

@@ -30,6 +29,7 @@
 import org.apache.parquet.avro.AvroSchemaConverter
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{DataTypes, Metadata, StructField, StructType}

+import java.util

Review Comment:
to be consistent with other code like in hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowFileSystemViewProcedure.scala we'd prefer to use `java.util.HashMap` (explicit) over `util.HashMap`.
[jira] [Assigned] (HUDI-4392) Flink MOR table inline compaction plan execution sequence should be configurable
[ https://issues.apache.org/jira/browse/HUDI-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yuemeng reassigned HUDI-4392:
    Assignee: yuemeng

> Flink MOR table inline compaction plan execution sequence should be configurable
>
> Key: HUDI-4392
> URL: https://issues.apache.org/jira/browse/HUDI-4392
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink
> Reporter: yuemeng
> Assignee: yuemeng
> Priority: Major
> Labels: pull-request-available
>
> In some cases too many pending compactions pile up. Flink inline compaction always executes the earliest plan first, one by one, so the job can fail under the weight of too many compact operations, and we can do nothing about it.
> The execution sequence of Flink MOR table inline compaction plans should therefore be configurable, so that a backlog of pending compactions does not fail the job.
> When there are a large number of compaction plans to execute, the inline compact operation could handle only the latest plan to ensure stability, leaving some external job (offline, or a compaction server) to handle the remaining plans.
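A self-contained sketch of the configurable ordering the issue asks for; all names here are hypothetical stand-ins, not the actual Flink option or Hudi API. Hudi instants sort lexicographically because they are fixed-width timestamps, so a plain string comparator suffices.

```java
// Hypothetical sketch of the proposed knob: choose which pending compaction plan
// the inline compactor executes next — earliest-first (current behavior) or
// latest-first (the issue's proposal). Names are illustrative only.
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class CompactionPlanSelector {
  enum Order { EARLIEST_FIRST, LATEST_FIRST }

  static Optional<String> nextPlanInstant(List<String> pendingInstants, Order order) {
    Comparator<String> byInstant = Comparator.naturalOrder(); // instants sort lexicographically
    if (order == Order.LATEST_FIRST) {
      byInstant = byInstant.reversed(); // min under the reversed order = latest instant
    }
    return pendingInstants.stream().min(byInstant);
  }

  public static void main(String[] args) {
    List<String> pending = List.of("20220825120000", "20220825130000", "20220825140000");
    System.out.println(nextPlanInstant(pending, Order.EARLIEST_FIRST)); // Optional[20220825120000]
    System.out.println(nextPlanInstant(pending, Order.LATEST_FIRST));   // Optional[20220825140000]
  }
}
```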
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4485:
    Labels: pull-request-available  (was: )

> Hudi cli got empty result for command show fsview all
>
> Key: HUDI-4485
> URL: https://issues.apache.org/jira/browse/HUDI-4485
> Project: Apache Hudi
> Issue Type: Bug
> Components: cli
> Affects Versions: 0.11.1
> Environment: Hudi version : 0.11.1
>   Spark version : 3.1.1
>   Hive version : 3.1.0
>   Hadoop version : 3.1.1
> Reporter: Yao Zhang
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.13.0
> Attachments: spring-shell-1.2.0.RELEASE.jar
>
> This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177]
>
> **Describe the problem you faced**
>
> Hudi cli got an empty result after running the command `show fsview all`.
> (screenshot: https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png)
>
> The type of table t1 is COW and I am sure that the parquet files are actually generated inside the data folder. Also, the parquet files are not damaged, as the data could be retrieved correctly by reading as a Hudi table or by directly reading each parquet file (using Spark).
>
> **To Reproduce**
>
> Steps to reproduce the behavior:
> 1. Enter Flink SQL client.
> 2. Execute the SQL and check the data was written successfully.
>
> ```sql
> CREATE TABLE t1(
>   uuid VARCHAR(20),
>   name VARCHAR(10),
>   age INT,
>   ts TIMESTAMP(3),
>   `partition` VARCHAR(20)
> )
> PARTITIONED BY (`partition`)
> WITH (
>   'connector' = 'hudi',
>   'path' = 'hdfs:///path/to/table/',
>   'table.type' = 'COPY_ON_WRITE'
> );
>
> -- insert data using values
> INSERT INTO t1 VALUES
> ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
> ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
> ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
> ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
> ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
> ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
> ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
> ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
> ```
>
> 3. Enter Hudi cli and execute `show fsview all`.
>
> **Expected behavior**
>
> `show fsview all` in Hudi cli should return all file slices.
>
> **Environment Description**
> * Hudi version : 0.11.1
> * Spark version : 3.1.1
> * Hive version : 3.1.0
> * Hadoop version : 3.1.1
> * Storage (HDFS/S3/GCS..) : HDFS
> * Running on Docker? (yes/no) : no
>
> **Additional context**: No.
>
> **Stacktrace**: N/A
>
> Temporary solution:
> I modified and recompiled spring-shell 1.2.0.RELEASE. Please download the attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/.
[GitHub] [hudi] paul8263 opened a new pull request, #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
paul8263 opened a new pull request, #6489: URL: https://github.com/apache/hudi/pull/6489

…value for show fsview all pathRegex parameter.

### Change Logs

In order to fix [HUDI-4485](https://issues.apache.org/jira/projects/HUDI/issues/HUDI-4485), we bumped spring shell to 2.1.1 and updated the default value for the `show fsview all` pathRegex parameter.

### Impact

Public API and user-facing features are not affected, but it may have a performance impact.

**Risk level: medium**

Updated the unit test; all hudi-cli tests pass. Its functionality has also been verified in a real-world environment.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226691384 ## CI report: * 7deae45265d4ade6c324d4477e6144289f1714fd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10927) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan merged pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
xushiyan merged PR #6170: URL: https://github.com/apache/hudi/pull/6170 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yihua commented on issue #6487: [SUPPORT] Primary key check in deltastreamer
yihua commented on issue #6487: URL: https://github.com/apache/hudi/issues/6487#issuecomment-1226658157 @WangCHX Let me check the code. There should be validation of the write config against the table config to make sure they are consistent. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
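For illustration, a minimal sketch of the kind of guard being discussed, comparing the record key in the write config against the table config and failing fast on a mismatch; the class and method names are hypothetical, though the two property keys are standard Hudi config names:

```java
// Hypothetical fail-fast check: reject writes whose record key config
// disagrees with the key the table was created with.
import java.util.Objects;
import java.util.Properties;

public class RecordKeyGuard {

  /** Throws if the writer's record key differs from hoodie.properties. */
  public static void validateRecordKey(Properties tableConfig, Properties writeConfig) {
    String tableKey = tableConfig.getProperty("hoodie.table.recordkey.fields");
    String writeKey = writeConfig.getProperty("hoodie.datasource.write.recordkey.field");
    if (tableKey != null && writeKey != null && !Objects.equals(tableKey, writeKey)) {
      throw new IllegalArgumentException(
          "Record key mismatch: table uses [" + tableKey + "], write config uses [" + writeKey + "]");
    }
  }
}
```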
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226656369 ## CI report: * 214c313d89e0d8abc5ea356d0fc10c475b138ad2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10923) * 7deae45265d4ade6c324d4477e6144289f1714fd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10927) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1226621598 ## CI report: * 214c313d89e0d8abc5ea356d0fc10c475b138ad2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10923) * 7deae45265d4ade6c324d4477e6144289f1714fd UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (1e162bb73a -> e5584b3735)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 1e162bb73a HUDI-4687 add show_invalid_parquet procedure (#6480) add e5584b3735 [HUDI-4584] Fixing `SQLConf` not being propagated to executor (#6352) No new revisions were added by this update. Summary of changes: .../scala/org/apache/hudi/HoodieSparkUtils.scala | 15 +- .../spark/sql/execution/SQLConfInjectingRDD.scala | 61 ++ 2 files changed, 74 insertions(+), 2 deletions(-) create mode 100644 hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/SQLConfInjectingRDD.scala
[GitHub] [hudi] yihua merged pull request #6352: [HUDI-4584] Fixing `SQLConf` not being propagated to executor
yihua merged PR #6352: URL: https://github.com/apache/hudi/pull/6352 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yihua commented on issue #5765: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"
yihua commented on issue #5765: URL: https://github.com/apache/hudi/issues/5765#issuecomment-1226572795 @nsivabalan We can document the workaround, yet it's still not ideal for users relying on releases. I'll check if we can fix it via dependency management within Hudi. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch asf-site updated: [DOCS] Fix youtube image (#6488)
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 37aeb19151 [DOCS] Fix youtube image (#6488) 37aeb19151 is described below commit 37aeb19151447e0a3905b092b2ba93ffe408447e Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com> AuthorDate: Thu Aug 25 03:32:19 2022 +0530 [DOCS] Fix youtube image (#6488) --- website/src/css/custom.css| 2 +- website/static/assets/images/youtube.jpeg | Bin 8825 -> 0 bytes website/static/assets/images/youtube.png | Bin 0 -> 970 bytes 3 files changed, 1 insertion(+), 1 deletion(-) diff --git a/website/src/css/custom.css b/website/src/css/custom.css index 21c02d3bbb..2d9bd50881 100644 --- a/website/src/css/custom.css +++ b/website/src/css/custom.css @@ -94,7 +94,7 @@ html[data-theme='dark'] .docusaurus-highlight-code-line { } .header-youtube-link:before { - background: url(/assets/images/youtube.jpeg) no-repeat; + background: url(/assets/images/youtube.png) no-repeat; content: ""; display: flex; height: 30px; diff --git a/website/static/assets/images/youtube.jpeg b/website/static/assets/images/youtube.jpeg deleted file mode 100644 index 0b5cf9732a..00 Binary files a/website/static/assets/images/youtube.jpeg and /dev/null differ diff --git a/website/static/assets/images/youtube.png b/website/static/assets/images/youtube.png new file mode 100644 index 00..0700bfb1b6 Binary files /dev/null and b/website/static/assets/images/youtube.png differ
[GitHub] [hudi] xushiyan merged pull request #6488: [DOCS] Fix youtube image in header
xushiyan merged PR #6488: URL: https://github.com/apache/hudi/pull/6488 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance
alexeykudinkin commented on code in PR #6046: URL: https://github.com/apache/hudi/pull/6046#discussion_r951930671 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ## @@ -98,10 +110,18 @@ public HoodieWriteMetadata> performClustering(final Hood // execute clustering for each group async and collect WriteStatus Stream> writeStatusesStream = FutureUtils.allOf( clusteringPlan.getInputGroups().stream() -.map(inputGroup -> runClusteringForGroupAsync(inputGroup, -clusteringPlan.getStrategy().getStrategyParams(), - Option.ofNullable(clusteringPlan.getPreserveHoodieMetadata()).orElse(false), -instantTime)) +.map(inputGroup -> { + if (Boolean.parseBoolean(getWriteConfig().getString(HoodieClusteringConfig.CLUSTERING_AS_ROW))) { Review Comment: Let's abstract this as a method in `WriteConfig` ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RowSpatialCurveSortPartitioner.java: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.execution.bulkinsert; + +import org.apache.hudi.config.HoodieClusteringConfig; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.sort.SpaceCurveSortingHelper; +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Row; + +import java.util.Arrays; +import java.util.List; + +public class RowSpatialCurveSortPartitioner extends RowCustomColumnsSortPartitioner { Review Comment: Why do we inherit from `RowCustomColumnsSortPartitioner` ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ## @@ -273,6 +398,62 @@ private HoodieData> readRecordsForGroupBaseFiles(JavaSparkContex .map(record -> transform(record, writeConfig))); } + /** + * Get dataset of all records for the group. This includes all records from file slice (Apply updates from log files, if any). 
+ */ + private Dataset readRecordsForGroupAsRow(JavaSparkContext jsc, + HoodieClusteringGroup clusteringGroup, + String instantTime) { +List clusteringOps = clusteringGroup.getSlices().stream() +.map(ClusteringOperation::create).collect(Collectors.toList()); +boolean hasLogFiles = clusteringOps.stream().anyMatch(op -> op.getDeltaFilePaths().size() > 0); +SQLContext sqlContext = new SQLContext(jsc.sc()); + +String[] baseFilePaths = clusteringOps +.stream() +.map(op -> { + ArrayList pairs = new ArrayList<>(); + if (op.getBootstrapFilePath() != null) { +pairs.add(op.getBootstrapFilePath()); + } + if (op.getDataFilePath() != null) { +pairs.add(op.getDataFilePath()); + } + return pairs; +}) +.flatMap(Collection::stream) +.filter(path -> !path.isEmpty()) +.toArray(String[]::new); +String[] deltaPaths = clusteringOps +.stream() +.filter(op -> !op.getDeltaFilePaths().isEmpty()) +.flatMap(op -> op.getDeltaFilePaths().stream()) +.toArray(String[]::new); + +Dataset inputRecords; +if (hasLogFiles) { + String compactionFractor = Option.ofNullable(getWriteConfig().getString("compaction.memory.fraction")) + .orElse("0.75"); + String[] paths = new String[baseFilePaths.length + deltaPaths.length]; + System.arraycopy(baseFilePaths, 0, paths, 0, baseFilePaths.length); + System.arraycopy(deltaPaths, 0, paths, baseFilePaths.length, deltaPaths.length); Review Comment: You can use `CollectionUtils.combine` ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieInternalWriteStatusCoordinator.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional inf
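For reference, the `CollectionUtils.combine` suggestion in the review above points at the standard array-concatenation pattern; a self-contained generic sketch follows (this is not necessarily the exact signature of Hudi's utility):

```java
// Generic array concatenation, the pattern the reviewer suggests extracting
// instead of repeating System.arraycopy inline.
import java.util.Arrays;

public class ArrayCombine {

  /** Returns a new array containing all elements of left followed by right. */
  public static <T> T[] combine(T[] left, T[] right) {
    T[] combined = Arrays.copyOf(left, left.length + right.length);
    System.arraycopy(right, 0, combined, left.length, right.length);
    return combined;
  }

  public static void main(String[] args) {
    String[] base = {"base1.parquet", "base2.parquet"};
    String[] logs = {"file.log.1", "file.log.2"};
    // Prints [base1.parquet, base2.parquet, file.log.1, file.log.2]
    System.out.println(Arrays.toString(combine(base, logs)));
  }
}
```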
[GitHub] [hudi] hudi-bot commented on pull request #6135: [HUDI-4418] Add support for ProtoKafkaSource
hudi-bot commented on PR #6135: URL: https://github.com/apache/hudi/pull/6135#issuecomment-1226410197 ## CI report: * d36fed637603d9959e8d049ac0815b9c729eb246 UNKNOWN * 194396032e698ac4210d8652a969c7e58832d5db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10925) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] bhasudha opened a new pull request, #6488: [DOCS] Fix youtube image in header
bhasudha opened a new pull request, #6488: URL: https://github.com/apache/hudi/pull/6488 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ **Risk level: none | low | medium | high** _Choose one. If medium or high, explain what verification was done to mitigate the risks._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on pull request #6216: [HUDI-4475] fix create table with not exists hoodie properties file
xushiyan commented on PR #6216: URL: https://github.com/apache/hudi/pull/6216#issuecomment-1226338378 > > > What kind of operations would need to cause this case? > > > > > > Because now the production environment finds that re-deleting the table (non-purge) and some other abnormal compaction scenarios will cause the hudi properties file to be deleted, and then the re-creation of the table will fail. > > I think we need to figure out the root cause of why compaction deletes `hoodie.properties`. Also, can we add a UT for deleting the `hoodie.properties` manually and see whether the writing works well? @leesf @XuQianJin-Stars I agree that we should fix the root cause. This patch is more like treating the symptom. @XuQianJin-Stars let's close this and try to reproduce the problem you had? If there is any corner case where the properties file was unexpectedly deleted, it should most likely be fixed around the code in https://github.com/apache/hudi/blob/a75cc02273ae87c383ae1ed46f95006c366f70fc/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java#L344 as mentioned by @nsivabalan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
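As an illustration of the kind of hardening being discussed, a minimal sketch of a backup-then-rename update for `hoodie.properties`, so a reader never observes the file as missing; the paths and helper names here are assumptions, not the actual `HoodieTableConfig` code:

```java
// Sketch: update hoodie.properties without a window where the file is absent.
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

public class SafePropertiesUpdate {

  public static void update(Path metaDir, Properties props) throws IOException {
    Path target = metaDir.resolve("hoodie.properties");
    Path backup = metaDir.resolve("hoodie.properties.backup");
    // Keep a backup first so a crashed writer leaves something to recover from.
    if (Files.exists(target)) {
      Files.copy(target, backup, StandardCopyOption.REPLACE_EXISTING);
    }
    // Write the new content to a temp file, then swap it in with a rename,
    // which is atomic on POSIX file systems.
    Path tmp = metaDir.resolve("hoodie.properties.tmp");
    try (OutputStream out = Files.newOutputStream(tmp)) {
      props.store(out, "updated table config");
    }
    Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    Files.deleteIfExists(backup);
  }
}
```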
[GitHub] [hudi] alexeykudinkin commented on pull request #6432: [HUDI-4586] Improve metadata fetching in bloom index
alexeykudinkin commented on PR #6432: URL: https://github.com/apache/hudi/pull/6432#issuecomment-1226334636 Following up on my previous comment: taking a deeper look, I see the following issues in our code at the moment: 1. When creating the `HoodieBackedTableMetadata` instance within `HoodieTable`, we don't specify that it should reuse MT readers. 2. `HoodieMetadataMergedLogRecordReader.getRecordsByKeys` always clears previously computed `records` and always scans from scratch, while instead we should NOT re-process records that have already been processed, and should just incrementally process the missing ones. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
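A minimal sketch of the incremental-lookup idea in point 2: cache records that were already materialized and scan only for keys not seen before, instead of clearing and re-reading everything on each call; all names below are hypothetical stand-ins, not the `HoodieMetadataMergedLogRecordReader` API:

```java
// Sketch: memoize looked-up records; each call scans only the missing keys.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class IncrementalKeyLookup<V> {

  private final Map<String, V> cached = new HashMap<>();
  // Stand-in for the expensive log-block scan that resolves keys to records.
  private final Function<List<String>, Map<String, V>> scanner;

  public IncrementalKeyLookup(Function<List<String>, Map<String, V>> scanner) {
    this.scanner = scanner;
  }

  public Map<String, V> getRecordsByKeys(List<String> keys) {
    List<String> missing = new ArrayList<>();
    for (String key : keys) {
      if (!cached.containsKey(key)) {
        missing.add(key);
      }
    }
    if (!missing.isEmpty()) {
      cached.putAll(scanner.apply(missing)); // scan only the delta
    }
    Map<String, V> result = new HashMap<>();
    for (String key : keys) {
      V value = cached.get(key);
      if (value != null) {
        result.put(key, value);
      }
    }
    return result;
  }
}
```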
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6432: [HUDI-4586] Improve metadata fetching in bloom index
alexeykudinkin commented on code in PR #6432: URL: https://github.com/apache/hudi/pull/6432#discussion_r954187229 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/BloomIndexFileInfo.java: ## @@ -27,19 +27,20 @@ public class BloomIndexFileInfo implements Serializable { private final String fileId; - + private final String filename; private final String minRecordKey; - private final String maxRecordKey; - public BloomIndexFileInfo(String fileId, String minRecordKey, String maxRecordKey) { + public BloomIndexFileInfo(String fileId, String filename, String minRecordKey, String maxRecordKey) { this.fileId = fileId; +this.filename = filename; Review Comment: nit: `fileName` ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/BloomIndexFileInfo.java: ## @@ -27,19 +27,20 @@ public class BloomIndexFileInfo implements Serializable { private final String fileId; - + private final String filename; Review Comment: Do we really need to store both file-id and file-name? I think we can just store the file-name, and then convert it to file-id wherever necessary ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bloom/HoodieMetadataBloomIndexCheckFunction.java: ## @@ -83,37 +89,64 @@ protected void start() { @Override protected List computeNext() { // Partition path and file name pair to list of keys - final Map, List> fileToKeysMap = new HashMap<>(); - final Map fileIDBaseFileMap = new HashMap<>(); + final Map, List> batchFileToKeysMap = new HashMap<>(); final List resultList = new ArrayList<>(); + String lastFileId = null; + + try { +// Here we batch process the lookup of bloom filters in metadata table +// assuming the partition path and file name pairs are already sorted by the corresponding key +while (inputItr.hasNext()) { + Tuple2, HoodieKey> entry = inputItr.next(); + final String partitionPath = entry._2.getPartitionPath(); + final String fileId = entry._1._1(); + final String filename = entry._1._2(); + + if (lastFileId == null || !lastFileId.equals(fileId)) { +if (processedFileIdSet.contains(fileId)) { + LOG.warn(String.format("Fetching the bloom filter for file ID %s again. 
" + + " The input pairs of file ID and record key are not sorted.", fileId)); +} +lastFileId = fileId; +processedFileIdSet.add(fileId); + } + + batchFileToKeysMap.computeIfAbsent(Pair.of(partitionPath, filename), k -> new ArrayList<>()).add(entry._2); - while (inputItr.hasNext()) { -Tuple2 entry = inputItr.next(); -final String partitionPath = entry._2.getPartitionPath(); -final String fileId = entry._1; -if (!fileIDBaseFileMap.containsKey(fileId)) { - Option baseFile = hoodieTable.getBaseFileOnlyView().getLatestBaseFile(partitionPath, fileId); - if (!baseFile.isPresent()) { -throw new HoodieIndexException("Failed to find the base file for partition: " + partitionPath -+ ", fileId: " + fileId); + if (batchFileToKeysMap.size() == batchSize) { +resultList.addAll(lookupKeysInBloomFilters(batchFileToKeysMap)); +batchFileToKeysMap.clear(); } - fileIDBaseFileMap.put(fileId, baseFile.get()); } -fileToKeysMap.computeIfAbsent(Pair.of(partitionPath, fileIDBaseFileMap.get(fileId).getFileName()), -k -> new ArrayList<>()).add(entry._2); -if (fileToKeysMap.size() > BLOOM_FILTER_CHECK_MAX_FILE_COUNT_PER_BATCH) { - break; + +if (batchFileToKeysMap.size() > 0) { + resultList.addAll(lookupKeysInBloomFilters(batchFileToKeysMap)); + batchFileToKeysMap.clear(); } + +return resultList; + } catch (Throwable e) { +if (e instanceof HoodieException) { + throw e; +} +throw new HoodieIndexException("Error checking bloom filter using metadata table.", e); } - if (fileToKeysMap.isEmpty()) { -return Collections.emptyList(); - } +} + +@Override +protected void end() { +} - List> partitionNameFileNameList = new ArrayList<>(fileToKeysMap.keySet()); +private List lookupKeysInBloomFilters( +Map, List> fileToKeysMap) { + List resultList = new ArrayList<>(); + List> partitionPathFileNameList = new ArrayList<>(fileToKeysMap.keySet()); + HoodieTimer timer = HoodieTimer.start(); Map, BloomFilter> fileToBloomFilterMap = - hoodieTable.getMetadataTable().getBloomFilters(partitionNameFileNameList); + hoodieTable.getMetadataTable().getBloomFilters(partitionPathFileNameList); + LOG.error(String.format("Took %d ms to look up %s bloom
[jira] [Updated] (HUDI-4712) Flaky Tests w/ azure CI
[ https://issues.apache.org/jira/browse/HUDI-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4712: -- Epic Link: HUDI-4302 > Flaky Tests w/ azure CI > --- > > Key: HUDI-4712 > URL: https://issues.apache.org/jira/browse/HUDI-4712 > Project: Apache Hudi > Issue Type: Test >Reporter: sivabalan narayanan >Priority: Major > > In our azure CI runs, the last 4 to 5 runs have failed. Mostly it's related to > Flink IT tests. > > Tracking the failures here. > [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10911/logs/24] > {code:java} > 2022-08-24T02:57:19.7513911Z [INFO] Running > org.apache.hudi.table.ITTestHoodieDataSource > 2022-08-24T03:13:47.5967799Z [INFO] Tests run: 96, Failures: 0, Errors: 0, > Skipped: 0, Time elapsed: 987.826 s - in > org.apache.hudi.table.ITTestHoodieDataSource > 2022-08-24T03:13:47.5997817Z [INFO] Running > org.apache.hudi.sink.compact.ITTestHoodieFlinkCompactor > 2022-08-24T03:15:01.9378742Z [ERROR] Tests run: 7, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 74.337 s <<< FAILURE! - in > org.apache.hudi.sink.compact.ITTestHoodieFlinkCompactor > 2022-08-24T03:15:01.9381479Z [ERROR] > testHoodieFlinkCompactorWithPlanSelectStrategy{boolean}[1] Time elapsed: > 7.976 s <<< ERROR! > 2022-08-24T03:15:01.9382198Z > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > 2022-08-24T03:15:01.9383171Z Caused by: > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > 2022-08-24T03:15:01.9383730Z Caused by: > org.apache.hudi.exception.HoodieIOException: IOException when reading log > file > 2022-08-24T03:15:01.9384878Z Caused by: java.io.FileNotFoundException: File > file:/tmp/junit4630651021120836419/par4/.d6675c02-f0f4-40ba-9f5e-986b84f73cb6_20220824031447463.log.1_0-4-0 > does not exist > 2022-08-24T03:15:01.9650641Z > 2022-08-24T03:15:01.9651540Z [INFO] Running > org.apache.hudi.sink.cluster.ITTestHoodieFlinkClustering > 2022-08-24T03:15:21.1311853Z [INFO] Tests run: 2, Failures: 0, Errors: 0, > Skipped: 0, Time elapsed: 19.189 s - in > org.apache.hudi.sink.cluster.ITTestHoodieFlinkClustering > 2022-08-24T03:15:21.1324486Z [INFO] Running > org.apache.hudi.sink.ITTestDataStreamWrite > 2022-08-24T03:17:40.5148801Z [ERROR] Tests run: 9, Failures: 1, Errors: 0, > Skipped: 0, Time elapsed: 139.379 s <<< FAILURE! - in > org.apache.hudi.sink.ITTestDataStreamWrite > 2022-08-24T03:17:40.5149895Z [ERROR] > testWriteMergeOnReadWithCompaction{String}[1] Time elapsed: 21.725 s <<< > FAILURE!
> 2022-08-24T03:17:40.5150555Z org.opentest4j.AssertionFailedError: expected: > but was: > 2022-08-24T03:17:40.5151262Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:252) > 2022-08-24T03:17:40.5152067Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:182) > 2022-08-24T03:17:40.5153086Z at > org.apache.hudi.sink.ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction(ITTestDataStreamWrite.java:156) > 2022-08-24T03:17:40.5153593Z > 2022-08-24T03:17:41.0381772Z [INFO] > 2022-08-24T03:17:41.0382573Z [INFO] Results: > 2022-08-24T03:17:41.0383977Z [INFO] > 2022-08-24T03:17:41.0384447Z [ERROR] Failures: > 2022-08-24T03:17:41.0385833Z [ERROR] > ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction:156->testWriteToHoodie:182->testWriteToHoodie:252 > expected: but was: > 2022-08-24T03:17:41.0386560Z [ERROR] Errors: > 2022-08-24T03:17:41.0387330Z [ERROR] > ITTestHoodieFlinkCompactor.testHoodieFlinkCompactorWithPlanSelectStrategy » > JobExecution > 2022-08-24T03:17:41.0387896Z [INFO] {code}
[jira] [Created] (HUDI-4712) Flaky Tests w/ azure CI
sivabalan narayanan created HUDI-4712: - Summary: Flaky Tests w/ azure CI Key: HUDI-4712 URL: https://issues.apache.org/jira/browse/HUDI-4712 Project: Apache Hudi Issue Type: Test Reporter: sivabalan narayanan In our azure CI runs, the last 4 to 5 runs have failed. Mostly it's related to Flink IT tests. Tracking the failures here. [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10911/logs/24] {code:java} 2022-08-24T02:57:19.7513911Z [INFO] Running org.apache.hudi.table.ITTestHoodieDataSource 2022-08-24T03:13:47.5967799Z [INFO] Tests run: 96, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 987.826 s - in org.apache.hudi.table.ITTestHoodieDataSource 2022-08-24T03:13:47.5997817Z [INFO] Running org.apache.hudi.sink.compact.ITTestHoodieFlinkCompactor 2022-08-24T03:15:01.9378742Z [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 74.337 s <<< FAILURE! - in org.apache.hudi.sink.compact.ITTestHoodieFlinkCompactor 2022-08-24T03:15:01.9381479Z [ERROR] testHoodieFlinkCompactorWithPlanSelectStrategy{boolean}[1] Time elapsed: 7.976 s <<< ERROR! 2022-08-24T03:15:01.9382198Z org.apache.flink.runtime.client.JobExecutionException: Job execution failed. 2022-08-24T03:15:01.9383171Z Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy 2022-08-24T03:15:01.9383730Z Caused by: org.apache.hudi.exception.HoodieIOException: IOException when reading log file 2022-08-24T03:15:01.9384878Z Caused by: java.io.FileNotFoundException: File file:/tmp/junit4630651021120836419/par4/.d6675c02-f0f4-40ba-9f5e-986b84f73cb6_20220824031447463.log.1_0-4-0 does not exist 2022-08-24T03:15:01.9650641Z 2022-08-24T03:15:01.9651540Z [INFO] Running org.apache.hudi.sink.cluster.ITTestHoodieFlinkClustering 2022-08-24T03:15:21.1311853Z [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.189 s - in org.apache.hudi.sink.cluster.ITTestHoodieFlinkClustering 2022-08-24T03:15:21.1324486Z [INFO] Running org.apache.hudi.sink.ITTestDataStreamWrite 2022-08-24T03:17:40.5148801Z [ERROR] Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 139.379 s <<< FAILURE! - in org.apache.hudi.sink.ITTestDataStreamWrite 2022-08-24T03:17:40.5149895Z [ERROR] testWriteMergeOnReadWithCompaction{String}[1] Time elapsed: 21.725 s <<< FAILURE!
2022-08-24T03:17:40.5150555Z org.opentest4j.AssertionFailedError: expected: but was: 2022-08-24T03:17:40.5151262Z at org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:252) 2022-08-24T03:17:40.5152067Z at org.apache.hudi.sink.ITTestDataStreamWrite.testWriteToHoodie(ITTestDataStreamWrite.java:182) 2022-08-24T03:17:40.5153086Z at org.apache.hudi.sink.ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction(ITTestDataStreamWrite.java:156) 2022-08-24T03:17:40.5153593Z 2022-08-24T03:17:41.0381772Z [INFO] 2022-08-24T03:17:41.0382573Z [INFO] Results: 2022-08-24T03:17:41.0383977Z [INFO] 2022-08-24T03:17:41.0384447Z [ERROR] Failures: 2022-08-24T03:17:41.0385833Z [ERROR] ITTestDataStreamWrite.testWriteMergeOnReadWithCompaction:156->testWriteToHoodie:182->testWriteToHoodie:252 expected: but was: 2022-08-24T03:17:41.0386560Z [ERROR] Errors: 2022-08-24T03:17:41.0387330Z [ERROR] ITTestHoodieFlinkCompactor.testHoodieFlinkCompactorWithPlanSelectStrategy » JobExecution 2022-08-24T03:17:41.0387896Z [INFO] {code}
[GitHub] [hudi] hudi-bot commented on pull request #6135: [HUDI-4418] Add support for ProtoKafkaSource
hudi-bot commented on PR #6135: URL: https://github.com/apache/hudi/pull/6135#issuecomment-1226143907 ## CI report: * d36fed637603d9959e8d049ac0815b9c729eb246 UNKNOWN * f70abbc3b45005d40e74252814edc0078a50030e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10909) * 194396032e698ac4210d8652a969c7e58832d5db Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10925) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] WangCHX opened a new issue, #6487: [SUPPORT] Primary key check in deltastreamer
WangCHX opened a new issue, #6487: URL: https://github.com/apache/hudi/issues/6487 **Describe the problem you faced** We accidentally configured the wrong primary key in the Spark write config, which caused duplicate data. Wondering if there is a way to avoid this. **To Reproduce** Change the primary key config in the write config and run the Spark job. **Expected behavior** The Spark job should probably be blocked from writing data if the primary key config is different from the primary key of the original table. **Environment Description** * Hudi version : 0.11.0 * Spark version : 3.2.1 * Storage (HDFS/S3/GCS..) : GCS * Running on Docker? (yes/no) : yes, on k8s. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1265) Improving bootstrap and efficient migration of existing non-Hudi dataset
[ https://issues.apache.org/jira/browse/HUDI-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1265: - Due Date: 30/Sep/22 > Improving bootstrap and efficient migration of existing non-Hudi dataset > > > Key: HUDI-1265 > URL: https://issues.apache.org/jira/browse/HUDI-1265 > Project: Apache Hudi > Issue Type: Epic > Components: bootstrap >Reporter: Balaji Varadarajan >Assignee: Ethan Guo >Priority: Blocker > Labels: hudi-umbrellas > Fix For: 0.13.0 > > > This is an EPIC to revisit the logic of bootstrap for efficient migration of > existing non-Hudi datasets, bridging any gaps with new features such as the > metadata table. > Here are the two modes of bootstrap and migration we are supposed to support: > # Onboard for new partitions alone: Given an existing non-Hudi partitioned > dataset (/path/parquet), Hudi manages new partitions under the same table > path (/path/parquet) while keeping non-Hudi partitions untouched in place. > The query engine treats non-Hudi partitions differently when reading the data. > This works perfectly for immutable data where there are no updates to old > partitions and new data is only appended to the new partition. > # Metadata-only and full-record bootstrap: Given an existing parquet dataset > (/path/parquet), Hudi generates the record-level metadata (Hudi meta columns) > during the bootstrap process in a new table path (/path/parquet_hudi) > different from the parquet dataset. There are two modes; they can be chosen > at the granularity of partition in a single bootstrap action. This unlocks > the ability for Hudi to do upsert for all partitions. > ## Metadata-only: generates record-level metadata only per parquet file and > a bootstrap index for mapping, without rewriting the actual data records. > During query execution, the source data is merged with Hudi metadata to > return the results. This is the default mode. > ## Full-record: use bulk insert to generate record-level metadata, copy over > and rewrite the source data with bulk insert. During query execution, > record-level metadata, i.e., meta columns, and the data columns are read from > the same parquet, improving the read performance. 
> Phase 1: Testing and verification of status-quo (1~1.5 week) > Writing: > * Two migration modes above > * COW and MOR > * 1 additional commit after bootstrap doing upsert for metadata-only and > full-record bootstrap > * Spark datasource, Deltastreamer > * Partitioned and non-partitioned table > * Simple/complex key gen > * Hive-style partition > * w/ and w/o metadata table enabled > * Meta sync > Reading: > * Hive QL, Spark SQL, Spark datasource, Presto/Trino > * Snapshot, read-optimized, incremental query > * Queries in the original query testing plan: > [https://docs.google.com/spreadsheets/d/1xVfatk-6-fekwuCCZ-nTHQkewcHSEk89y-ReVV5vHQU/edit#gid=1813901684] > Need to develop a validation tool for automated validation > * Metadata, i.e., meta columns and index in metadata table, is properly > populated > * Data queried from Hudi table matches the parquet data > Add tests when needed > * HUDI-4125 Add integration tests around bootstrapped Hudi table > Phase 2: Functionality and correctness fix, (2~3 weeks) > Known and possible issues: > * Spark cannot see non-Hudi partitions in first onboarding mode > * Bootstrap Relation does not support MOR; HUDI-2071 Support Reading > Bootstrap MOR RT Table In Spark DataSource Table > * HUDI-915 Partition Columns missing in files upserted after Metadata > Bootstrap > * HUDI-992 For hive-style partitioned source data, partition columns synced > with Hive will always have String type > * HUDI-1369 Bootstrap Datasource jobs from hanging via spark-submit > * HUDI-3122 Presto query failed for bootstrap tables > * HUDI-1779 Fail to bootstrap/upsert a table which contains timestamp column > Phase 3: Performance (1~2 weeks) > * HUDI-1157 Optimization whether to query Bootstrapped table using > HoodieBootstrapRelation vs Sparks Parquet datasource > * HUDI-4453 Support partition pruning for tables Bootstrapped from Source > Hive Style partitioned tables > * HUDI-619 Avoid stitching meta columns and only load data columns for > improving read performance > * HUDI-1158 Optimizations in parallelized listing behaviour for markers and > bootstrap source files > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6438: [HUDI-4642] Adding support to hudi-cli to repair depcrated partition
hudi-bot commented on PR #6438: URL: https://github.com/apache/hudi/pull/6438#issuecomment-1226132397 ## CI report: * fea65135a8035ef70929759594da64dc985a2d0a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10924) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch asf-site updated: [DOCS] Add YouTube channel and Office hours page (#6482)
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 50937a8c70 [DOCS] Add YouTube channel and Office hours page (#6482) 50937a8c70 is described below commit 50937a8c7014af79b30c067a4641c6df47cc6889 Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com> AuthorDate: Wed Aug 24 23:58:37 2022 +0530 [DOCS] Add YouTube channel and Office hours page (#6482) --- website/community/office_hours.md | 13 + website/community/syncs.md| 9 - website/community/team.md | 2 +- website/docusaurus.config.js | 14 ++ website/src/css/custom.css| 10 +- website/static/assets/images/youtube.jpeg | Bin 0 -> 8825 bytes 6 files changed, 37 insertions(+), 11 deletions(-) diff --git a/website/community/office_hours.md b/website/community/office_hours.md new file mode 100644 index 00..4ca6efae1c --- /dev/null +++ b/website/community/office_hours.md @@ -0,0 +1,13 @@ +--- +sidebar_position: 3 +title: "Office Hours" +toc: true +--- + +# Weekly Office Hours + +**[ZOOM LINK TO JOIN](https://zoom.us/j/95710395048)** + +Office hours are held every week on Thu, 08:00 AM Pacific Time (US and Canada)([translate to other time zones](https://www.worldtimebuddy.com/?qm=1&lid=5368361,2643743,1264527,1796236&h=5368361&date=2022-8-25&sln=8-9&hf=1)) + +One of the PMC members/committers will hold office hours to help answer questions interactively, on a first-come first-serve basis. diff --git a/website/community/syncs.md b/website/community/syncs.md index b1373e8b33..7cb89ea0f0 100644 --- a/website/community/syncs.md +++ b/website/community/syncs.md @@ -37,12 +37,3 @@ If you would like to present in one of the community calls, please fill out a [f Here are some upcoming calls for convenience. ![Upcoming calls](/assets/images/upcoming-community-calls.png) - - -## Weekly Office Hours - -**[ZOOM LINK TO JOIN](https://zoom.us/j/95710395048)** - -When every week on Thu, 08:00 AM Pacific Time (US and Canada)([translate to other time zones](https://www.worldtimebuddy.com/?qm=1&lid=5368361,2643743,1264527,1796236&h=5368361&date=2021-11-24&sln=8-9&hf=1)) - -One of the PMC members/committers will hold office hours to help answer questions interactively, on a first-come first-serve basis. 
diff --git a/website/community/team.md b/website/community/team.md index a69306e68d..062a6d2cd8 100644 --- a/website/community/team.md +++ b/website/community/team.md @@ -1,5 +1,5 @@ --- -sidebar_position: 3 +sidebar_position: 4 title: "Team" toc: true last_modified_at: 2020-09-01T15:59:57-04:00 diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js index d67bde016f..f3bff36771 100644 --- a/website/docusaurus.config.js +++ b/website/docusaurus.config.js @@ -185,6 +185,10 @@ module.exports = { label: 'Community Syncs', to: '/community/syncs', }, +{ + label: 'Office Hours', + to: '/community/office_hours', +}, { label: 'Team', to: '/community/team', @@ -231,6 +235,12 @@ module.exports = { className: 'header-slack-link', 'aria-label': 'Hudi Slack Channel', }, +{ + href: 'https://www.youtube.com/channel/UCs7AhE0BWaEPZSChrBR-Muw', + position: 'right', + className: 'header-youtube-link', + 'aria-label': 'Hudi YouTube Channel', +}, ], }, footer: { @@ -342,6 +352,10 @@ module.exports = { label: 'Twitter', href: 'https://twitter.com/ApacheHudi', }, +{ + label: 'YouTube', + href: 'https://www.youtube.com/channel/UCs7AhE0BWaEPZSChrBR-Muw', +}, { label: 'Mailing List', to: 'mailto:dev-subscr...@hudi.apache.org?Subject=SubscribeToHudi', diff --git a/website/src/css/custom.css b/website/src/css/custom.css index 94f10971e9..21c02d3bbb 100644 --- a/website/src/css/custom.css +++ b/website/src/css/custom.css @@ -39,7 +39,7 @@ html[data-theme='dark'] .docusaurus-highlight-code-line { } @media (max-width: 767px) { - .hero__img, .header-github-link, .header-slack-link, .header-twitter-link { + .hero__img, .header-github-link, .header-slack-link, .header-twitter-link, .header-youtube-link { display: none; } .hero__title { @@ -93,6 +93,14 @@ html[data-theme='dark'] .docusaurus-highlight-code-line { width: 30px; } +.header-youtube-link:before { + background: url(/assets/images/youtube.jpeg) no-repeat; + content: ""; + display: flex; + height: 30px; + width: 30px; +} + .hero__title { font-size: 4rem; text-ali
[GitHub] [hudi] xushiyan merged pull request #6482: [DOCS] Add youtube channel and Office hours page
xushiyan merged PR #6482: URL: https://github.com/apache/hudi/pull/6482 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6438: [HUDI-4642] Adding support to hudi-cli to repair depcrated partition
hudi-bot commented on PR #6438: URL: https://github.com/apache/hudi/pull/6438#issuecomment-1226069262 ## CI report: * 6e3fa8ca9ca7f5a72bfd9d8c4874183b9ff64586 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10810) * fea65135a8035ef70929759594da64dc985a2d0a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10924) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6438: [HUDI-4642] Adding support to hudi-cli to repair depcrated partition
hudi-bot commented on PR #6438: URL: https://github.com/apache/hudi/pull/6438#issuecomment-1226064753 ## CI report: * 6e3fa8ca9ca7f5a72bfd9d8c4874183b9ff64586 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10810) * fea65135a8035ef70929759594da64dc985a2d0a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6135: [HUDI-4418] Add support for ProtoKafkaSource
hudi-bot commented on PR #6135: URL: https://github.com/apache/hudi/pull/6135#issuecomment-1226064289 ## CI report: * d36fed637603d9959e8d049ac0815b9c729eb246 UNKNOWN * f70abbc3b45005d40e74252814edc0078a50030e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10909) * 194396032e698ac4210d8652a969c7e58832d5db UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-4711) Fix flaky: ITTestHoodieDataSource#testAppendWrite (false)
sivabalan narayanan created HUDI-4711: - Summary: Fix flaky: ITTestHoodieDataSource#testAppendWrite (false) Key: HUDI-4711 URL: https://issues.apache.org/jira/browse/HUDI-4711 Project: Apache Hudi Issue Type: Improvement Reporter: sivabalan narayanan occurrences: # Aug 24th: [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10923/logs/22] # -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4710) Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue
sivabalan narayanan created HUDI-4710: - Summary: Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue Key: HUDI-4710 URL: https://issues.apache.org/jira/browse/HUDI-4710 Project: Apache Hudi Issue Type: Improvement Reporter: sivabalan narayanan Instance occurrence: Aug 24th: [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/10923/logs/22] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4709) [RFC-48] Log Compaction Code review
[ https://issues.apache.org/jira/browse/HUDI-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4709: -- Story Points: 2 > [RFC-48] Log Compaction Code review > --- > > Key: HUDI-4709 > URL: https://issues.apache.org/jira/browse/HUDI-4709 > Project: Apache Hudi > Issue Type: Task >Reporter: Prasanna Rajaperumal >Assignee: sivabalan narayanan >Priority: Major > > Specifically the changes on the merge logic in AbstractHoodieLogRecordReader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4709) [RFC-48] Log Compaction Code review
[ https://issues.apache.org/jira/browse/HUDI-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4709: -- Sprint: 2022/08/22 > [RFC-48] Log Compaction Code review > --- > > Key: HUDI-4709 > URL: https://issues.apache.org/jira/browse/HUDI-4709 > Project: Apache Hudi > Issue Type: Task >Reporter: Prasanna Rajaperumal >Assignee: sivabalan narayanan >Priority: Major > > Specifically the changes on the merge logic in AbstractHoodieLogRecordReader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-4709) [RFC-48] Log Compaction Code review
[ https://issues.apache.org/jira/browse/HUDI-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-4709: - Assignee: sivabalan narayanan > [RFC-48] Log Compaction Code review > --- > > Key: HUDI-4709 > URL: https://issues.apache.org/jira/browse/HUDI-4709 > Project: Apache Hudi > Issue Type: Task >Reporter: Prasanna Rajaperumal >Assignee: sivabalan narayanan >Priority: Major > > Specifically the changes on the merge logic in AbstractHoodieLogRecordReader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
hudi-bot commented on PR #6486: URL: https://github.com/apache/hudi/pull/6486#issuecomment-1225990562 ## CI report: * d6b7c487e76c46460a2fb0c9647aeea901d17995 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10921) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
hudi-bot commented on PR #6486: URL: https://github.com/apache/hudi/pull/6486#issuecomment-1225985254 ## CI report: * d6b7c487e76c46460a2fb0c9647aeea901d17995 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10921) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-4709) [RFC-48] Log Compaction Code review
[ https://issues.apache.org/jira/browse/HUDI-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Rajaperumal updated HUDI-4709: --- Summary: [RFC-48] Log Compaction Code review (was: [RFC-48] Log Compaction Review Code) > [RFC-48] Log Compaction Code review > --- > > Key: HUDI-4709 > URL: https://issues.apache.org/jira/browse/HUDI-4709 > Project: Apache Hudi > Issue Type: Task >Reporter: Prasanna Rajaperumal >Priority: Major > > Specifically the changes on the merge logic in AbstractHoodieLogRecordReader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4709) [RFC-48] Log Compaction Review Code
[ https://issues.apache.org/jira/browse/HUDI-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Rajaperumal updated HUDI-4709: --- Summary: [RFC-48] Log Compaction Review Code (was: [RFC-48] Review Code) > [RFC-48] Log Compaction Review Code > --- > > Key: HUDI-4709 > URL: https://issues.apache.org/jira/browse/HUDI-4709 > Project: Apache Hudi > Issue Type: Task >Reporter: Prasanna Rajaperumal >Priority: Major > > Specifically the changes on the merge logic in AbstractHoodieLogRecordReader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4709) [RFC-48] Review Code
Prasanna Rajaperumal created HUDI-4709: -- Summary: [RFC-48] Review Code Key: HUDI-4709 URL: https://issues.apache.org/jira/browse/HUDI-4709 Project: Apache Hudi Issue Type: Task Reporter: Prasanna Rajaperumal Specifically the changes on the merge logic in AbstractHoodieLogRecordReader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] wzx140 commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
wzx140 commented on PR #6486: URL: https://github.com/apache/hudi/pull/6486#issuecomment-1225940143 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6450: [HUDI-4665] Flipping default for "ignore failed batch" config in streaming sink to false
hudi-bot commented on PR #6450: URL: https://github.com/apache/hudi/pull/6450#issuecomment-1225917271 ## CI report: * 214c313d89e0d8abc5ea356d0fc10c475b138ad2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10923) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] 15663671003 commented on issue #5765: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"
15663671003 commented on issue #5765: URL: https://github.com/apache/hudi/issues/5765#issuecomment-1225911236 > I created a ticket to track the fix: [HUDI-4341](https://issues.apache.org/jira/browse/HUDI-4341). Will the next version consider fixing this problem? It bothers newbies like me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-4696) Flaky: TestHoodieCombineHiveInputFormat.setUpClass:86 » NullPointer
[ https://issues.apache.org/jira/browse/HUDI-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4696: - Status: In Progress (was: Open) > Flaky: TestHoodieCombineHiveInputFormat.setUpClass:86 » NullPointer > > > Key: HUDI-4696 > URL: https://issues.apache.org/jira/browse/HUDI-4696 > Project: Apache Hudi > Issue Type: Task >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Major > Fix For: 0.12.1 > > > https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=10720&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0&t=746585d8-b50a-55c3-26c5-517d93af9934 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-2673) Add integration/e2e test for kafka-connect functionality
[ https://issues.apache.org/jira/browse/HUDI-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2673: - Sprint: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 2022/09/05 (was: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16) > Add integration/e2e test for kafka-connect functionality > > > Key: HUDI-2673 > URL: https://issues.apache.org/jira/browse/HUDI-2673 > Project: Apache Hudi > Issue Type: Task > Components: kafka-connect, tests-ci >Reporter: Ethan Guo >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > The integration test should use bundle jar and run in docker setup. This can > prevent any issue in the bundle, like HUDI-3903, that is not covered by unit > and functional tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-2673) Add integration/e2e test for kafka-connect functionality
[ https://issues.apache.org/jira/browse/HUDI-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2673: - Sprint: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16 (was: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 2022/08/22) > Add integration/e2e test for kafka-connect functionality > > > Key: HUDI-2673 > URL: https://issues.apache.org/jira/browse/HUDI-2673 > Project: Apache Hudi > Issue Type: Task > Components: kafka-connect, tests-ci >Reporter: Ethan Guo >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > The integration test should use bundle jar and run in docker setup. This can > prevent any issue in the bundle, like HUDI-3903, that is not covered by unit > and functional tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4650) Commits Command: Include both active and archive timeline for a given range of instants
[ https://issues.apache.org/jira/browse/HUDI-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4650:
-----------------------------
    Reviewers: sivabalan narayanan

> Commits Command: Include both active and archive timeline for a given range
> of instants
> ----------------------------------------------------------------------------
>
>                 Key: HUDI-4650
>                 URL: https://issues.apache.org/jira/browse/HUDI-4650
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cli
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>             Fix For: 0.12.1
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-4528) Diff tool to compare metadata across snapshots in a given time range
[ https://issues.apache.org/jira/browse/HUDI-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4528:
-----------------------------
    Reviewers: sivabalan narayanan

> Diff tool to compare metadata across snapshots in a given time range
> ---------------------------------------------------------------------
>
>                 Key: HUDI-4528
>                 URL: https://issues.apache.org/jira/browse/HUDI-4528
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: cli
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.12.1
>
>
> A tool that diffs two snapshots at table and partition level and can give
> info about what new file ids got created, deleted, updated and track other
> changes that are captured in write stats.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
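The description amounts to a set difference over file ids between two snapshots. A minimal standalone sketch of that idea (hypothetical names and types, not the Hudi CLI implementation):

```java
// Hypothetical sketch of the diff idea: compare the file ids present in two
// snapshots and classify them as created, deleted, or updated.
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SnapshotDiffSketch {

  // Each snapshot maps file id -> a version token such as the commit time.
  static void diff(Map<String, String> before, Map<String, String> after) {
    Set<String> created = new HashSet<>(after.keySet());
    created.removeAll(before.keySet());

    Set<String> deleted = new HashSet<>(before.keySet());
    deleted.removeAll(after.keySet());

    Set<String> updated = new HashSet<>();
    for (Map.Entry<String, String> e : after.entrySet()) {
      String prev = before.get(e.getKey());
      if (prev != null && !prev.equals(e.getValue())) {
        updated.add(e.getKey());
      }
    }

    System.out.println("created: " + created);
    System.out.println("deleted: " + deleted);
    System.out.println("updated: " + updated);
  }

  public static void main(String[] args) {
    diff(Map.of("f1", "c1", "f2", "c1"),
         Map.of("f2", "c2", "f3", "c2"));
    // prints created: [f3], deleted: [f1], updated: [f2]
  }
}
```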
[jira] [Updated] (HUDI-4633) Add command to trace partition through a range of commits
[ https://issues.apache.org/jira/browse/HUDI-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4633:
-----------------------------
    Reviewers: sivabalan narayanan

> Add command to trace partition through a range of commits
> ----------------------------------------------------------
>
>                 Key: HUDI-4633
>                 URL: https://issues.apache.org/jira/browse/HUDI-4633
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cli
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.12.1
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-4389) Make HoodieStreamingSink idempotent
[ https://issues.apache.org/jira/browse/HUDI-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4389:
-----------------------------
    Status: In Progress  (was: Open)

> Make HoodieStreamingSink idempotent
> -----------------------------------
>
>                 Key: HUDI-4389
>                 URL: https://issues.apache.org/jira/browse/HUDI-4389
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>              Labels: pull-request-available, streaming
>             Fix For: 0.13.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-4389) Make HoodieStreamingSink idempotent
[ https://issues.apache.org/jira/browse/HUDI-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4389:
-----------------------------
    Status: Patch Available  (was: In Progress)

> Make HoodieStreamingSink idempotent
> -----------------------------------
>
>                 Key: HUDI-4389
>                 URL: https://issues.apache.org/jira/browse/HUDI-4389
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>              Labels: pull-request-available, streaming
>             Fix For: 0.13.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-4389) Make HoodieStreamingSink idempotent
[ https://issues.apache.org/jira/browse/HUDI-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4389:
-----------------------------
    Story Points: 1

> Make HoodieStreamingSink idempotent
> -----------------------------------
>
>                 Key: HUDI-4389
>                 URL: https://issues.apache.org/jira/browse/HUDI-4389
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>              Labels: pull-request-available, streaming
>             Fix For: 0.13.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
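For context on what "idempotent" means for a streaming sink: Spark hands each micro-batch a monotonically increasing batch id, and an idempotent sink skips any batch id it has already committed so that replays after a failure do not double-write. A standalone sketch of that pattern (hypothetical code, not HoodieStreamingSink's actual implementation):

```java
// Hypothetical sketch of batch-level idempotence: remember the last
// committed batch id and skip any batch at or below it, making replays
// after a failure a no-op.
import java.util.List;

public class IdempotentSinkSketch {
  private long lastCommittedBatchId = -1L; // would be persisted in practice

  void addBatch(long batchId, List<String> rows) {
    if (batchId <= lastCommittedBatchId) {
      return; // this batch was already written; replaying it is safe
    }
    // ... write rows atomically together with the batch id ...
    lastCommittedBatchId = batchId; // record the id only after the write
  }

  public static void main(String[] args) {
    IdempotentSinkSketch sink = new IdempotentSinkSketch();
    sink.addBatch(0, List.of("a"));
    sink.addBatch(0, List.of("a")); // replay of batch 0: skipped
    sink.addBatch(1, List.of("b"));
  }
}
```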
[jira] [Commented] (HUDI-2673) Add integration/e2e test for kafka-connect functionality
[ https://issues.apache.org/jira/browse/HUDI-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584323#comment-17584323 ]

Raymond Xu commented on HUDI-2673:
----------------------------------

Pivot to make KC into docker demo

> Add integration/e2e test for kafka-connect functionality
> ---------------------------------------------------------
>
>                 Key: HUDI-2673
>                 URL: https://issues.apache.org/jira/browse/HUDI-2673
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: kafka-connect, tests-ci
>            Reporter: Ethan Guo
>            Assignee: Raymond Xu
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> The integration test should use the bundle jar and run in a docker setup. This can
> prevent any issue in the bundle, like HUDI-3903, that is not covered by unit
> and functional tests.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
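The thrust of the ticket, per the comment above, is to exercise the kafka-connect bundle jar inside a docker setup. As a loose sketch of the shape such a setup could take (the images, versions, and bundle path below are illustrative assumptions, not the Hudi demo's actual configuration):

```yaml
# Illustrative compose sketch only. Broker settings are abbreviated; a real
# setup also needs the usual Kafka Connect worker configuration (group id,
# storage topics, converters, and so on).
services:
  kafka:
    image: confluentinc/cp-kafka:7.0.1
  connect:
    image: confluentinc/cp-kafka-connect:7.0.1
    depends_on:
      - kafka
    volumes:
      # Mount the Hudi kafka-connect bundle so the worker can load the sink
      # connector (the local jar path is hypothetical).
      - ./hudi-kafka-connect-bundle.jar:/usr/share/java/kafka-connect-hudi/hudi-kafka-connect-bundle.jar
```

Running the e2e test against the mounted bundle, rather than individual module jars, is what would catch bundle-packaging regressions of the kind the ticket cites.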