[GitHub] [hudi] hudi-bot commented on pull request #7467: [MINOR] fixed Flink's DataStream does not support creating managed table
hudi-bot commented on PR #7467: URL: https://github.com/apache/hudi/pull/7467#issuecomment-1352682386 ## CI report: * 825867864b847cb097b17f281824096a2ce41c42 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13753) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13755) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7468: [HUDI-5394] Fix RowCustomColumnsSortPartitioner
hudi-bot commented on PR #7468: URL: https://github.com/apache/hudi/pull/7468#issuecomment-1352682419 ## CI report: * e42cf58d7a37de5f724673c71ea350937709041a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13754) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] qifanlili commented on pull request #7467: [MINOR] fixed Flink's DataStream does not support creating managed table
qifanlili commented on PR #7467: URL: https://github.com/apache/hudi/pull/7467#issuecomment-1352648693 Exactly the same as [#7222](https://github.com/apache/hudi/pull/7222) , please take another look
[GitHub] [hudi] loukey-lj commented on pull request #6612: [RFC-58][HUDI-4790] a more effective HoodieMergeHandler for COW table with parquet
loukey-lj commented on PR #6612: URL: https://github.com/apache/hudi/pull/6612#issuecomment-1352637903 > I don't know if I can fully support schema evolution. I hope to improve this function with the help of the community. I will write a small demo as soon as possible
[jira] [Updated] (HUDI-5386) Cleaning conflicts in occ mode
[ https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5386: Summary: Cleaning conflicts in occ mode (was: Rollback conflict in occ mode) > Cleaning conflicts in occ mode > -- > > Key: HUDI-5386 > URL: https://issues.apache.org/jira/browse/HUDI-5386 > Project: Apache Hudi > Issue Type: Bug >Reporter: HunterXHunter >Priority: Major > Attachments: image-2022-12-14-11-26-21-995.png, > image-2022-12-14-11-26-37-252.png > > > {code:java} > configuration parameter: > 'hoodie.cleaner.policy.failed.writes' = 'LAZY' > 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code} > Because `getInstantsToRollback` is not locked, multiple writers can obtain the same > `instantsToRollback`. The same `instant` is then deleted multiple times and > the same `rollback.inflight` is created multiple times. > !image-2022-12-14-11-26-37-252.png! > !image-2022-12-14-11-26-21-995.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
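The race described in HUDI-5386 can be illustrated with a minimal sketch. It assumes the fix is to select and claim the instants to roll back inside one critical section, which the issue implies but does not spell out. `RollbackClaimSketch` and `claimInstantsToRollback` are hypothetical names, not Hudi APIs, and the in-process `ReentrantLock` only stands in for whatever table-level lock the concurrent writers actually share.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only: not the Hudi implementation.
class RollbackClaimSketch {
    // Stand-in for a lock shared by all writers of the table.
    private final ReentrantLock lock = new ReentrantLock();
    // Failed instants that still need a rollback, in timeline order.
    private final Set<String> failedInstants = new LinkedHashSet<>();

    RollbackClaimSketch(List<String> failedInstants) {
        this.failedInstants.addAll(failedInstants);
    }

    // Selecting and claiming happen under the same lock, so two writers
    // can never receive the same instant.
    List<String> claimInstantsToRollback() {
        lock.lock();
        try {
            List<String> claimed = new ArrayList<>(failedInstants);
            failedInstants.clear();
            return claimed;
        } finally {
            lock.unlock();
        }
    }
}
```

With this shape, the first writer to call `claimInstantsToRollback` receives every pending instant and later callers receive an empty list, so each instant is rolled back once.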
[GitHub] [hudi] Zouxxyy commented on pull request #7468: [HUDI-5394] Fix RowCustomColumnsSortPartitioner
Zouxxyy commented on PR #7468: URL: https://github.com/apache/hudi/pull/7468#issuecomment-1352631288 @boneanxs
[GitHub] [hudi] hudi-bot commented on pull request #7468: [HUDI-5394] Fix RowCustomColumnsSortPartitioner
hudi-bot commented on PR #7468: URL: https://github.com/apache/hudi/pull/7468#issuecomment-1352630472 ## CI report: * e42cf58d7a37de5f724673c71ea350937709041a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7467: [MINOR] fixed Flink's DataStream does not support creating managed table
hudi-bot commented on PR #7467: URL: https://github.com/apache/hudi/pull/7467#issuecomment-1352630435 ## CI report: * 825867864b847cb097b17f281824096a2ce41c42 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7377: [HUDI-4827] Rebase Azure Image on Ubuntu 22.04
hudi-bot commented on PR #7377: URL: https://github.com/apache/hudi/pull/7377#issuecomment-1352630075 ## CI report: * c204b495505dd07be32da89de25e9bb19ceb19d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13462) * 676c3d09549bd933a2f722989fa4e431a822418e UNKNOWN * 62c8ca2b5fb40e5dd62859b0437004f236535b11 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13751) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-5394) RowCustomColumnsSortPartitioner should not use sortWithinPartitions
[ https://issues.apache.org/jira/browse/HUDI-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5394: - Labels: pull-request-available (was: ) > RowCustomColumnsSortPartitioner should not use sortWithinPartitions > --- > > Key: HUDI-5394 > URL: https://issues.apache.org/jira/browse/HUDI-5394 > Project: Apache Hudi > Issue Type: Bug >Reporter: zouxxyy >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] qifanlili commented on pull request #7467: [MINOR] fixed Flink's DataStream does not support creating managed table
qifanlili commented on PR #7467: URL: https://github.com/apache/hudi/pull/7467#issuecomment-1352626937 @hudi-bot run azure
[GitHub] [hudi] Zouxxyy opened a new pull request, #7468: [HUDI-5394] Fix RowCustomColumnsSortPartitioner
Zouxxyy opened a new pull request, #7468: URL: https://github.com/apache/hudi/pull/7468 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update None ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7455: [DO_NOT_MERGE] Release 0.12.2 blockers candidate
hudi-bot commented on PR #7455: URL: https://github.com/apache/hudi/pull/7455#issuecomment-1352626034 ## CI report: * ee8c9dfe97b6f4fad9824244d93bd81718d56511 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13732) * 738f67333ccb21b6c96540ded3477cb379ce0a57 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13752) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7377: [HUDI-4827] Rebase Azure Image on Ubuntu 22.04
hudi-bot commented on PR #7377: URL: https://github.com/apache/hudi/pull/7377#issuecomment-1352625879 ## CI report: * c204b495505dd07be32da89de25e9bb19ceb19d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13462) * 676c3d09549bd933a2f722989fa4e431a822418e UNKNOWN * 62c8ca2b5fb40e5dd62859b0437004f236535b11 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] qifanlili commented on pull request #7467: [MINOR] fixed Flink's DataStream does not support creating managed table
qifanlili commented on PR #7467: URL: https://github.com/apache/hudi/pull/7467#issuecomment-1352625050 ![Uploading image.png…]() Setting `hoodie.datasource.hive_sync.create_managed_table = true` does not take effect. The reason is the following code, which is always true. This property was not set when the HiveSyncContext was created. ![Uploading image.png…]()
[jira] [Created] (HUDI-5394) RowCustomColumnsSortPartitioner should not use sortWithinPartitions
zouxxyy created HUDI-5394: - Summary: RowCustomColumnsSortPartitioner should not use sortWithinPartitions Key: HUDI-5394 URL: https://issues.apache.org/jira/browse/HUDI-5394 Project: Apache Hudi Issue Type: Bug Reporter: zouxxyy -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] qifanlili opened a new pull request, #7467: [MINOR] fixed Flink's DataStream does not support creating managed table
qifanlili opened a new pull request, #7467: URL: https://github.com/apache/hudi/pull/7467 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-5090) throw runtime Exception when flink streming job checkpoint abort
[ https://issues.apache.org/jira/browse/HUDI-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-5090: - Fix Version/s: (was: 0.12.2) > throw runtime Exception when flink streming job checkpoint abort > > > Key: HUDI-5090 > URL: https://issues.apache.org/jira/browse/HUDI-5090 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: chenfengLiu >Assignee: chenfengLiu >Priority: Major > Labels: pull-request-available > > When a write task in a Flink job wants to flush data, it must first observe that > a new instant has been started. If there is no new instant, the > TM waits until the timeout. > The code is at > [https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/common/AbstractStreamWriteFunction.java#L252]. > Now there is a case where, when the JM fails to start a new instant, the JM does not retry > this work, so all the write tasks hang. -- This message was sent by Atlassian Jira (v8.20.10#820010)
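The behaviour the HUDI-5090 title asks for (fail with a runtime exception instead of hanging) could look roughly like the sketch below. It is an assumption-laden illustration, not the code in `AbstractStreamWriteFunction`: `InstantWaitSketch`, `awaitNewInstant`, and all of its parameters are hypothetical names, and the `Supplier` stands in for however the write task reads the latest instant.

```java
import java.util.function.Supplier;

// Hypothetical illustration only: not the Hudi/Flink implementation.
class InstantWaitSketch {
    // Polls until the supplier reports an instant different from lastInstant.
    // Throws a RuntimeException subtype once timeoutMs has elapsed, so the
    // caller fails fast instead of waiting forever.
    static String awaitNewInstant(Supplier<String> currentInstant,
                                  String lastInstant,
                                  long timeoutMs,
                                  long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            String instant = currentInstant.get();
            if (instant != null && !instant.equals(lastInstant)) {
                return instant;
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException(
                    "Timed out after " + timeoutMs + " ms waiting for an instant newer than " + lastInstant);
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for a new instant", e);
            }
        }
    }
}
```

Whether the exception should fail the whole job or trigger a retry on the JM side is a design decision the issue leaves open.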
[GitHub] [hudi] hudi-bot commented on pull request #7466: [HUDI-5393] Remove the reuse of metadata table writer for flink write…
hudi-bot commented on PR #7466: URL: https://github.com/apache/hudi/pull/7466#issuecomment-1352621750 ## CI report: * 699dea79216b47ec98c271b346b48be2ab112571 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13750) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7465: [HUDI-3661] Flink async compaction is not thread safe when use waterm…
hudi-bot commented on PR #7465: URL: https://github.com/apache/hudi/pull/7465#issuecomment-1352621725 ## CI report: * 25fa9a09aa18a39692fc7885de2361fdd0057f7f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13749) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7464: [HUDI-5366] Closing metadata writer from within writeClient (0.12.2)
hudi-bot commented on PR #7464: URL: https://github.com/apache/hudi/pull/7464#issuecomment-1352621705 ## CI report: * 76da506b6afafdd8138445333d988a6bc5a5cd0e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13748) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7455: [DO_NOT_MERGE] Release 0.12.2 blockers candidate
hudi-bot commented on PR #7455: URL: https://github.com/apache/hudi/pull/7455#issuecomment-1352621635 ## CI report: * ee8c9dfe97b6f4fad9824244d93bd81718d56511 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13732) * 738f67333ccb21b6c96540ded3477cb379ce0a57 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7377: [HUDI-4827] Rebase Azure Image on Ubuntu 22.04
hudi-bot commented on PR #7377: URL: https://github.com/apache/hudi/pull/7377#issuecomment-1352621407 ## CI report: * c204b495505dd07be32da89de25e9bb19ceb19d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13462) * 676c3d09549bd933a2f722989fa4e431a822418e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup
hudi-bot commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1352621117 ## CI report: * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN * 45cf7f3c242e20f49b95242a06efe1e24649edc7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13739) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] guanziyue commented on pull request #6612: [RFC-58][HUDI-4790] a more effective HoodieMergeHandler for COW table with parquet
guanziyue commented on PR #6612: URL: https://github.com/apache/hudi/pull/6612#issuecomment-1352617591 > > @loukey-lj : can you respond to @guanziyue 's comment above. I will review this patch by this week. > > Yes, this optimization is applicable to other frameworks. For hudi, its advantage is that it can get rowgroups and store them in the index while updating the index. For schema evolution, we currently only support adding fields. Different rowgroups in the Parquet file can have different schemas, but this is unknown to the query side. If schema changes are not considered, I can submit a small demo Thanks for your reply. I agree that this idea can, in theory, improve performance a lot. My worry is that the current Parquet implementation or interface cannot fully support it. Looking forward to this RFC!
[GitHub] [hudi] hudi-bot commented on pull request #7455: [DO_NOT_MERGE] Release 0.12.2 blockers candidate
hudi-bot commented on PR #7455: URL: https://github.com/apache/hudi/pull/7455#issuecomment-1352616734 ## CI report: * ee8c9dfe97b6f4fad9824244d93bd81718d56511 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13732) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch release-0.12.2-blockers-candidate updated (ee8c9dfe97b -> 738f67333cc)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch release-0.12.2-blockers-candidate
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from ee8c9dfe97b Fixing schemas used for bootstrap reader
     add 738f67333cc [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex (#7450)

No new revisions were added by this update.

Summary of changes:
 hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[GitHub] [hudi] nsivabalan merged pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
nsivabalan merged PR #7450: URL: https://github.com/apache/hudi/pull/7450
[jira] [Updated] (HUDI-5365) Add TOS StorageScheme to support Volcengine Object Storage
[ https://issues.apache.org/jira/browse/HUDI-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-5365: - Fix Version/s: 0.13.0 > Add TOS StorageScheme to support Volcengine Object Storage > -- > > Key: HUDI-5365 > URL: https://issues.apache.org/jira/browse/HUDI-5365 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Zhiping Wu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > [TOS|https://www.volcengine.com/product/tos] is an object storage service from > Volcengine, and [CFS|https://www.volcengine.com/product/cfs] is a cloud HDFS > from Volcengine. Hudi's StorageSchemes does not currently support them, so > we cannot integrate any data processing engine with Hudi on TOS/CFS. I > suggest adding support for them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HUDI-5365) Add TOS StorageScheme to support Volcengine Object Storage
[ https://issues.apache.org/jira/browse/HUDI-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-5365. -- > Add TOS StorageScheme to support Volcengine Object Storage > -- > > Key: HUDI-5365 > URL: https://issues.apache.org/jira/browse/HUDI-5365 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Zhiping Wu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > [TOS|https://www.volcengine.com/product/tos] is an object storage service from > Volcengine, and [CFS|https://www.volcengine.com/product/cfs] is a cloud HDFS > from Volcengine. Hudi's StorageSchemes does not currently support them, so > we cannot integrate any data processing engine with Hudi on TOS/CFS. I > suggest adding support for them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-5365) Add TOS StorageScheme to support Volcengine Object Storage
[ https://issues.apache.org/jira/browse/HUDI-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17647845#comment-17647845 ] Danny Chen commented on HUDI-5365: -- Fixed via master branch: 6ef477238b4818b3a4da07f1426ea0dd296b7dbb > Add TOS StorageScheme to support Volcengine Object Storage > -- > > Key: HUDI-5365 > URL: https://issues.apache.org/jira/browse/HUDI-5365 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Zhiping Wu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > [TOS|https://www.volcengine.com/product/tos] is an object storage service from > Volcengine, and [CFS|https://www.volcengine.com/product/cfs] is a cloud HDFS > from Volcengine. Hudi's StorageSchemes does not currently support them, so > we cannot integrate any data processing engine with Hudi on TOS/CFS. I > suggest adding support for them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[hudi] branch master updated: [HUDI-5365] Add Volcengine Object Storage(tos) and Cloud HDFS(cfs) (#7425)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 6ef477238b4 [HUDI-5365] Add Volcengine Object Storage(tos) and Cloud HDFS(cfs) (#7425)
6ef477238b4 is described below

commit 6ef477238b4818b3a4da07f1426ea0dd296b7dbb
Author: stayrascal
AuthorDate: Thu Dec 15 13:23:14 2022 +0800

    [HUDI-5365] Add Volcengine Object Storage(tos) and Cloud HDFS(cfs) (#7425)

    Co-authored-by: wuzhiping
---
 .../src/main/java/org/apache/hudi/common/fs/StorageSchemes.java     | 6 +-
 .../src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java | 2 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java b/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
index 10619f8b3af..9b5af8bc648 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
@@ -69,7 +69,11 @@ public enum StorageSchemes {
   // Baidu Object Storage
   BOS("bos", false),
   // Oracle Cloud Infrastructure Object Storage
-  OCI("oci", false);
+  OCI("oci", false),
+  // Volcengine Object Storage
+  TOS("tos", false),
+  // Volcengine Cloud HDFS
+  CFS("cfs", true);
 
   private String scheme;
   private boolean supportsAppend;
diff --git a/hudi-common/src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java b/hudi-common/src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java
index 354ad6d0cca..7f2e0c2f8de 100644
--- a/hudi-common/src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java
+++ b/hudi-common/src/test/java/org/apache/hudi/common/fs/TestStorageSchemes.java
@@ -52,6 +52,8 @@ public class TestStorageSchemes {
     assertFalse(StorageSchemes.isAppendSupported("ks3"));
     assertTrue(StorageSchemes.isAppendSupported("ofs"));
     assertFalse(StorageSchemes.isAppendSupported("oci"));
+    assertFalse(StorageSchemes.isAppendSupported("tos"));
+    assertTrue(StorageSchemes.isAppendSupported("cfs"));
     assertThrows(IllegalArgumentException.class, () -> {
       StorageSchemes.isAppendSupported("s2");
     }, "Should throw exception for unsupported schemes");
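For readers without the repository at hand, the following is a self-contained sketch of how an enum like the one in the diff can back the `isAppendSupported` checks exercised by the test. Only the three entries and the two fields visible in the diff are taken from the commit. `StorageSchemeSketch` is a hypothetical stand-in, and the body of `isAppendSupported` is an assumption written to match the test's expectations, not a copy of the Hudi source.

```java
import java.util.Arrays;

// Hypothetical stand-in for the enum in the diff; only a few entries are shown.
enum StorageSchemeSketch {
    // Oracle Cloud Infrastructure Object Storage
    OCI("oci", false),
    // Volcengine Object Storage
    TOS("tos", false),
    // Volcengine Cloud HDFS
    CFS("cfs", true);

    private final String scheme;
    private final boolean supportsAppend;

    StorageSchemeSketch(String scheme, boolean supportsAppend) {
        this.scheme = scheme;
        this.supportsAppend = supportsAppend;
    }

    // Assumed lookup logic: find the entry by scheme name and report its
    // append flag; unknown schemes are rejected, as the test expects.
    static boolean isAppendSupported(String scheme) {
        return Arrays.stream(values())
            .filter(s -> s.scheme.equals(scheme))
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException("Unsupported scheme: " + scheme))
            .supportsAppend;
    }
}
```

Under these assumptions, `isAppendSupported("tos")` is false, `isAppendSupported("cfs")` is true, and an unknown scheme such as `"s2"` raises `IllegalArgumentException`, mirroring the assertions added to `TestStorageSchemes`.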
[GitHub] [hudi] danny0405 merged pull request #7425: [HUDI-5365] Add Volcengine Object Storage(tos) and Cloud HDFS(cfs)
danny0405 merged PR #7425: URL: https://github.com/apache/hudi/pull/7425
[GitHub] [hudi] hudi-bot commented on pull request #7466: [HUDI-5393] Remove the reuse of metadata table writer for flink write…
hudi-bot commented on PR #7466: URL: https://github.com/apache/hudi/pull/7466#issuecomment-1352575059 ## CI report: * 699dea79216b47ec98c271b346b48be2ab112571 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7462: [HUDI-5290] Remove the lock in HoodieFlinkWriteClient#writeTableMetatata (0.12.2)
hudi-bot commented on PR #7462: URL: https://github.com/apache/hudi/pull/7462#issuecomment-1352575027 ## CI report: * b296d6ba677a4211e1c0927cd7228e8ff25a5d94 UNKNOWN * 448285015964bd681d1291cb1545ebdec605a3e8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13746) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7464: [HUDI-5366] Closing metadata writer from within writeClient (0.12.2)
hudi-bot commented on PR #7464: URL: https://github.com/apache/hudi/pull/7464#issuecomment-1352575036 ## CI report: * 76da506b6afafdd8138445333d988a6bc5a5cd0e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7465: [HUDI-3661] Flink async compaction is not thread safe when use waterm…
hudi-bot commented on PR #7465: URL: https://github.com/apache/hudi/pull/7465#issuecomment-1352575047 ## CI report: * 25fa9a09aa18a39692fc7885de2361fdd0057f7f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on pull request #7428: [HUDI-5368] decouple GlueCatalogSyncTool by using reflecting instead of import class directly.
danny0405 commented on PR #7428: URL: https://github.com/apache/hudi/pull/7428#issuecomment-1352573297 @xushiyan Can you take a look if you have time :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7462: [HUDI-5290] Remove the lock in HoodieFlinkWriteClient#writeTableMetadata (0.12.2)
hudi-bot commented on PR #7462: URL: https://github.com/apache/hudi/pull/7462#issuecomment-1352571760 ## CI report: * b296d6ba677a4211e1c0927cd7228e8ff25a5d94 UNKNOWN * 448285015964bd681d1291cb1545ebdec605a3e8 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file
hudi-bot commented on PR #7440: URL: https://github.com/apache/hudi/pull/7440#issuecomment-1352571674 ## CI report: * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN * 391cc64f7aaabdc0f72c85fa3ac03036d09ef43a UNKNOWN * 63fe7e7c8a882d757cbea6a4d26b7aba4bdad748 UNKNOWN * 6158087d7e518a7bef01ba01b993029436bf429d Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13740) * a509172c60864820f6758716c4e832645a97a57f UNKNOWN * 155476af5ac3169cc11c9b4fab5057ca407995f8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13745) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a diff in pull request #7419: [WIP][HUDI-5357] Optimize deployment of release artifacts
danny0405 commented on code in PR #7419: URL: https://github.com/apache/hudi/pull/7419#discussion_r1049222602

## scripts/release/deploy_staging_jars.sh: ##

@@ -37,15 +37,26 @@ if [ "$#" -gt "1" ]; then
 fi
 declare -a ALL_VERSION_OPTS=(
-"-Dscala-2.11 -Dspark2 -Dflink1.13" # for legacy bundle name
-"-Dscala-2.12 -Dspark2 -Dflink1.13" # for legacy bundle name
-"-Dscala-2.12 -Dspark3 -Dflink1.14" # for legacy bundle name
-"-Dscala-2.11 -Dspark2.4 -Dflink1.13"
-"-Dscala-2.11 -Dspark2.4 -Dflink1.14"
-"-Dscala-2.12 -Dspark2.4 -Dflink1.13"
-"-Dscala-2.12 -Dspark3.3 -Dflink1.15"
-"-Dscala-2.12 -Dspark3.2 -Dflink1.14"
-"-Dscala-2.12 -Dspark3.1 -Dflink1.14" # run this last to make sure utilities bundle has spark 3.1
+# upload all module jars and bundle jars
+"-Dscala-2.11 -Dspark2.4"
+"-Dscala-2.12 -Dspark2.4"
+"-Dscala-2.12 -Dspark3.1"
+"-Dscala-2.12 -Dspark3.2"
+"-Dscala-2.12 -Dspark3.3"
+
+# spark bundles (legacy) (not overwriting previous uploads as these jar names are unique)
+"-Dscala-2.11 -Dspark2 -pl packaging/hudi-spark-bundle" # for legacy bundle name hudi-spark-bundle_2.11
+"-Dscala-2.12 -Dspark2 -pl packaging/hudi-spark-bundle" # for legacy bundle name hudi-spark-bundle_2.12
+"-Dscala-2.12 -Dspark3 -pl packaging/hudi-spark-bundle" # for legacy bundle name hudi-spark3-bundle_2.12
+
+# utilities bundles (legacy) (overwriting previous uploads)
+"-Dscala-2.11 -Dspark2.4 -pl packaging/hudi-utilities-bundle" # utilities-bundle_2.11 is for spark 2.4 only
+"-Dscala-2.12 -Dspark3.1 -pl packaging/hudi-utilities-bundle" # utilities-bundle_2.12 is for spark 3.1 only
+
+# flink bundles (overwriting previous uploads)
+"-Dscala-2.12 -Dflink1.13 -Davro.version=1.10.0 -pl packaging/hudi-flink-bundle"
+"-Dscala-2.12 -Dflink1.14 -Davro.version=1.10.0 -pl packaging/hudi-flink-bundle"
+"-Dscala-2.12 -Dflink1.15 -Davro.version=1.10.0 -pl packaging/hudi-flink-bundle"

Review Comment: The hard-coded avro version is hard to maintain.
[GitHub] [hudi] danny0405 commented on pull request #7464: [HUDI-5366] Closing metadata writer from within writeClient (0.12.2)
danny0405 commented on PR #7464: URL: https://github.com/apache/hudi/pull/7464#issuecomment-1352568899 Thanks for the cherry-pick. I have filed a follow-up fix: https://github.com/apache/hudi/pull/7466 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7462: [HUDI-5290] Remove the lock in HoodieFlinkWriteClient#writeTableMetadata (0.12.2)
hudi-bot commented on PR #7462: URL: https://github.com/apache/hudi/pull/7462#issuecomment-1352567892 ## CI report: * b296d6ba677a4211e1c0927cd7228e8ff25a5d94 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1352567402 ## CI report: * c60978aaf0dd183b05139dda6bd741ea43877f42 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13715) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13736) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13744) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-5393) Remove the reuse of metadata table writer for flink write client
[ https://issues.apache.org/jira/browse/HUDI-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5393: - Labels: pull-request-available (was: ) > Remove the reuse of metadata table writer for flink write client > > > Key: HUDI-5393 > URL: https://issues.apache.org/jira/browse/HUDI-5393 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
hudi-bot commented on PR #7450: URL: https://github.com/apache/hudi/pull/7450#issuecomment-1352567780 ## CI report: * 3d4f6bf574764b5e8c962b94fa1b7fbfd6e735b5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13737) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7423: [HUDI-5384] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`
hudi-bot commented on PR #7423: URL: https://github.com/apache/hudi/pull/7423#issuecomment-1352567585 ## CI report: * 09b901a56869b8282c92d6c05ad746f98f2d6a01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13735) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 opened a new pull request, #7466: [HUDI-5393] Remove the reuse of metadata table writer for flink write…
danny0405 opened a new pull request, #7466: URL: https://github.com/apache/hudi/pull/7466 … client ### Change Logs After HUDI-5366, the writer is closed after each write, so there is no need to reuse the writer anymore. Even though reuse can reduce some cost, the state is hard to keep correct. ### Impact No ### Risk level (write none, low medium or high below) none ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
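The trade-off named in this pull request description (a fresh writer per write instead of a reused one) can be sketched roughly as follows. This is a minimal illustration and not Hudi's code: `SketchMetadataWriter` and `SketchWriteClient` are hypothetical names, and the only assumption taken from the description is that the writer is closed after each write.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a metadata table writer; not a Hudi class.
final class SketchMetadataWriter implements AutoCloseable {
    private final List<String> sink;
    private boolean closed = false;

    SketchMetadataWriter(List<String> sink) {
        this.sink = sink;
    }

    void write(String commit) {
        if (closed) {
            // A cached writer that was closed elsewhere fails here; this is
            // the kind of state that is hard to keep correct under reuse.
            throw new IllegalStateException("writer already closed");
        }
        sink.add(commit);
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical write client that never caches its writer.
final class SketchWriteClient {
    private final List<String> metadataLog = new ArrayList<>();

    // Creates a fresh writer for every write and closes it right away,
    // so no writer state survives between writes.
    void writeOnce(String commit) {
        try (SketchMetadataWriter writer = new SketchMetadataWriter(metadataLog)) {
            writer.write(commit);
        }
    }

    List<String> metadataLog() {
        return metadataLog;
    }
}
```

Creating the writer inside `writeOnce` costs an allocation per write, but it removes the need to track whether a cached instance is still open.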
[jira] [Updated] (HUDI-5393) Remove the reuse of metadata table writer for flink write client
[ https://issues.apache.org/jira/browse/HUDI-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-5393: - Fix Version/s: 0.12.2 0.13.0 > Remove the reuse of metadata table writer for flink write client > > > Key: HUDI-5393 > URL: https://issues.apache.org/jira/browse/HUDI-5393 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Reporter: Danny Chen >Priority: Major > Fix For: 0.12.2, 0.13.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] xicm commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field
xicm commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1352559067 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on a diff in pull request #7175: [HUDI-5191] Fix compatibility with avro 1.10
xushiyan commented on code in PR #7175: URL: https://github.com/apache/hudi/pull/7175#discussion_r1049214021 ## .github/workflows/bot.yml: ## @@ -73,6 +73,14 @@ jobs: run: | HUDI_VERSION=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout) ./packaging/bundle-validation/ci_run.sh $HUDI_VERSION + - name: Common Test Review Comment: There is a deeper issue with this: hudi-common is tightly coupled with the Avro models, which vary with the Spark profile. Currently the hudi-common jar is not compatible across all engine profiles: e.g., if built with spark3.3 (avro 1.11), it won't work with spark 2 (avro 1.8) or flink (avro 1.10). This needs to be decoupled first. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
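The incompatibility described in this review comment can be written down as a small lookup. The profile names and Avro versions below are the ones quoted in the comment; the class `AvroProfileSketch`, its method, and the rule that a jar is usable only under a profile with the same Avro version are assumptions made for illustration, not part of Hudi's build.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hypothetical helper, not part of Hudi.
final class AvroProfileSketch {
    // Profile -> Avro version, as quoted in the review comment.
    private static final Map<String, String> AVRO_BY_PROFILE = new LinkedHashMap<>();

    static {
        AVRO_BY_PROFILE.put("spark2", "1.8");
        AVRO_BY_PROFILE.put("flink", "1.10");
        AVRO_BY_PROFILE.put("spark3.3", "1.11");
    }

    // Assumption for this sketch: a jar built under one profile is usable
    // only under a profile that resolves to the same Avro version.
    static boolean usableTogether(String builtWith, String runsOn) {
        String built = AVRO_BY_PROFILE.get(builtWith);
        String runtime = AVRO_BY_PROFILE.get(runsOn);
        if (built == null || runtime == null) {
            throw new IllegalArgumentException("unknown profile");
        }
        return built.equals(runtime);
    }
}
```

Under these assumptions, `usableTogether("spark3.3", "spark2")` and `usableTogether("spark3.3", "flink")` are both false, which matches the two failing combinations the comment lists.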
[jira] [Created] (HUDI-5393) Remove the reuse of metadata table writer for flink write client
Danny Chen created HUDI-5393: Summary: Remove the reuse of metadata table writer for flink write client Key: HUDI-5393 URL: https://issues.apache.org/jira/browse/HUDI-5393 Project: Apache Hudi Issue Type: Bug Components: flink Reporter: Danny Chen -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] danny0405 opened a new pull request, #7465: [HUDI-3661] Flink async compaction is not thread safe when use waterm…
danny0405 opened a new pull request, #7465: URL: https://github.com/apache/hudi/pull/7465 …ark (#7399) (cherry picked from commit 86d1e39fb4e971b11e8c6394f6611b7bd7089bd4) ### Change Logs This is a bug fix cherry pick for release 0.12.2. ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan opened a new pull request, #7464: [HUDI-5366] Closing metadata writer from within writeClient (0.12.2)
nsivabalan opened a new pull request, #7464: URL: https://github.com/apache/hudi/pull/7464 ### Change Logs Re-applying https://github.com/apache/hudi/pull/7437 against 0.12.2 branch. Closing metadata writer wherever possible. Stacked on top of https://github.com/apache/hudi/pull/7462 ### Impact Closing open file handles to MDT. ### Risk level (write none, low medium or high below) low. ### Documentation Update N/A ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] abhishekkh opened a new pull request, #7463: add jsontoavro converter
abhishekkh opened a new pull request, #7463: URL: https://github.com/apache/hudi/pull/7463 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] loukey-lj commented on a diff in pull request #7336: [HUDI-5297][HUDI-5298] Refactoring WriteStatus
loukey-lj commented on code in PR #7336: URL: https://github.com/apache/hudi/pull/7336#discussion_r1049198455 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java: ## @@ -86,6 +88,7 @@ protected HoodieTimer timer; protected WriteStatus writeStatus; + protected HoodieRecordLocation newRecordLocation; Review Comment: I got it wrong: I thought that a WriteStatus has only one location instance. Now it seems that this is not much different from obtaining the location from HoodieRecord. But in some partition-change scenarios, we may need to use HoodieRecord's operation to determine the CRUD operation on the index. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] qifanlili closed pull request #7222: [MINOR] fixed Flink's DataStream does not support creating managed table
qifanlili closed pull request #7222: [MINOR] fixed Flink's DataStream does not support creating managed table URL: https://github.com/apache/hudi/pull/7222 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan opened a new pull request, #7462: [HUDI-5290] Remove the lock in HoodieFlinkWriteClient#writeTableMetad…
nsivabalan opened a new pull request, #7462: URL: https://github.com/apache/hudi/pull/7462 …ata (#7320) ### Change Logs Re-applying https://github.com/apache/hudi/pull/7320 against the 0.12.2 branch: remove the lock in #writeTableMetadata ### Impact Support metadata table in Flink ### Risk level (write none, low medium or high below) low ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file
hudi-bot commented on PR #7440: URL: https://github.com/apache/hudi/pull/7440#issuecomment-1352527902 ## CI report: * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN * 391cc64f7aaabdc0f72c85fa3ac03036d09ef43a UNKNOWN * 63fe7e7c8a882d757cbea6a4d26b7aba4bdad748 UNKNOWN * 6158087d7e518a7bef01ba01b993029436bf429d Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13740) * a509172c60864820f6758716c4e832645a97a57f UNKNOWN * 155476af5ac3169cc11c9b4fab5057ca407995f8 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7461: [HUDI-5392] Fixing Bootstrapping flow handling of arrays
hudi-bot commented on PR #7461: URL: https://github.com/apache/hudi/pull/7461#issuecomment-1352524838 ## CI report: * 8305782809d957b5fc7d280414a4e700a47138d6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13734) * f152865c372e5e57fbb0acd23b3f704b73c1cd5f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13743) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7456: [HUDI-4917][FOLLOW_UP]Optimize codes logic to not break the old class meaning
hudi-bot commented on PR #7456: URL: https://github.com/apache/hudi/pull/7456#issuecomment-1352524818 ## CI report: * 2d48757d98f331de07db0796554bf8da73de1ffc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13716) * c4041c7446cd25b9809f616b640c069ef2959107 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13742) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file
hudi-bot commented on PR #7440: URL: https://github.com/apache/hudi/pull/7440#issuecomment-1352524771 ## CI report: * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN * 391cc64f7aaabdc0f72c85fa3ac03036d09ef43a UNKNOWN * 63fe7e7c8a882d757cbea6a4d26b7aba4bdad748 UNKNOWN * 65e21b863cdfec85ffc17beb3f0a6560796a0a09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13729) * 6158087d7e518a7bef01ba01b993029436bf429d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13740) * a509172c60864820f6758716c4e832645a97a57f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup
hudi-bot commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1352524563 ## CI report: * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN * 6fd8a8f6cd9907dfe4f25164f2e0240af65cab5e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13680) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13738) * 45cf7f3c242e20f49b95242a06efe1e24649edc7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13739) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6361: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s
hudi-bot commented on PR #6361: URL: https://github.com/apache/hudi/pull/6361#issuecomment-1352524227 ## CI report: * a8dd96042f42ca74fa8789decdea7397072ec890 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13388) * a28a39f44afe5561fbf33a6381721e98911a01db Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13741) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7461: [HUDI-5392] Fixing Bootstrapping flow handling of arrays
hudi-bot commented on PR #7461: URL: https://github.com/apache/hudi/pull/7461#issuecomment-1352520613 ## CI report: * 8305782809d957b5fc7d280414a4e700a47138d6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13734) * f152865c372e5e57fbb0acd23b3f704b73c1cd5f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7456: [HUDI-4917][FOLLOW_UP]Optimize codes logic to not break the old class meaning
hudi-bot commented on PR #7456: URL: https://github.com/apache/hudi/pull/7456#issuecomment-1352520580 ## CI report: * 2d48757d98f331de07db0796554bf8da73de1ffc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13716) * c4041c7446cd25b9809f616b640c069ef2959107 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a diff in pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
nsivabalan commented on code in PR #7450: URL: https://github.com/apache/hudi/pull/7450#discussion_r1049189154 ## hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java: ## @@ -1598,7 +1597,7 @@ public void testAvroLogRecordReaderWithMixedInsertsCorruptsAndRollback(ExternalS scanner.close(); } - @ParameterizedTest + /*@ParameterizedTest Review Comment: I based this patch on our release branch, which had this test unintentionally pulled in. But I guess that with the latest state of the branch it's not an issue: this test is not pulled in at all. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file
hudi-bot commented on PR #7440: URL: https://github.com/apache/hudi/pull/7440#issuecomment-1352520525 ## CI report: * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN * 391cc64f7aaabdc0f72c85fa3ac03036d09ef43a UNKNOWN * 63fe7e7c8a882d757cbea6a4d26b7aba4bdad748 UNKNOWN * 65e21b863cdfec85ffc17beb3f0a6560796a0a09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13729) * 6158087d7e518a7bef01ba01b993029436bf429d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup
hudi-bot commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1352520269 ## CI report: * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN * 6fd8a8f6cd9907dfe4f25164f2e0240af65cab5e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13680) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13738) * 45cf7f3c242e20f49b95242a06efe1e24649edc7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6361: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s
hudi-bot commented on PR #6361: URL: https://github.com/apache/hudi/pull/6361#issuecomment-1352519867 ## CI report: * a8dd96042f42ca74fa8789decdea7397072ec890 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13388) * a28a39f44afe5561fbf33a6381721e98911a01db UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup
hudi-bot commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1352517388 ## CI report: * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN * 6fd8a8f6cd9907dfe4f25164f2e0240af65cab5e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13680) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13738) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] loukey-lj commented on a diff in pull request #7336: [HUDI-5297][HUDI-5298] Refactoring WriteStatus
loukey-lj commented on code in PR #7336: URL: https://github.com/apache/hudi/pull/7336#discussion_r1049177064 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java: ## @@ -86,6 +88,7 @@ protected HoodieTimer timer; protected WriteStatus writeStatus; + protected HoodieRecordLocation newRecordLocation; Review Comment: I think it is also necessary to obtain the location from every record. The index can not only reach the file level, but also the rowGroup level and even the row level -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] melin commented on issue #7406: [SUPPORT] Support Debezium JSON
melin commented on issue #7406: URL: https://github.com/apache/hudi/issues/7406#issuecomment-1352506799 > @melin can you elaborate the use case pls? Avro format, which relies on kafka schema registry, increases deployment and maintenance costs. It's more convenient if it's json -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] zhuanshenbsj1 commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup
zhuanshenbsj1 commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1352506224 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] loukey-lj commented on pull request #6612: [RFC-58][HUDI-4790] a more effective HoodieMergeHandler for COW table with parquet
loukey-lj commented on PR #6612: URL: https://github.com/apache/hudi/pull/6612#issuecomment-1352505945 > @loukey-lj : can you respond to @guanziyue 's comment above. I will review this patch by this week. Yes, this optimization is applicable to other frameworks. For Hudi, its advantage is that it can get the row groups and store them in the index while updating the index. For schema evolution, we currently only support adding fields. Different row groups in a Parquet file can have different schemas, but this is unknown to the query side. If schema changes are not considered, I can submit a small demo. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-3636) Clustering fails due to marker creation failure
[ https://issues.apache.org/jira/browse/HUDI-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3636:
--
Fix Version/s: 0.13.0 (was: 0.12.2)

> Clustering fails due to marker creation failure
> ---
>
> Key: HUDI-3636
> URL: https://issues.apache.org/jira/browse/HUDI-3636
> Project: Apache Hudi
> Issue Type: Bug
> Components: multi-writer
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Scenario: multi-writer test, one writer doing ingesting with Deltastreamer
> continuous mode, COW, inserts, async clustering and cleaning (partitions
> under 2022/1, 2022/2), another writer with Spark datasource doing backfills
> to different partitions (2021/12).
> 0.10.0 no MT, clustering instant is inflight (failing it in the middle before
> upgrade) ➝ 0.11 MT, with multi-writer configuration the same as before.
> The clustering/replace instant cannot make progress due to marker creation
> failure, failing the DS ingestion as well. Need to investigate if this is
> timeline-server-based marker related or MT related.
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 46.0 failed 1 times, most recent failure: Lost task 2.0 in stage 46.0 (TID 277) (192.168.70.231 executor driver): java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file 2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
> Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused)
>   at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>   at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
>   at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>   at scala.collection.Iterator.foreach(Iterator.scala:943)
>   at scala.collection.Iterator.foreach$(Iterator.scala:943)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>   at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>   at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
>   at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
>   at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1431)
>   at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
>   at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
>   at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
>   at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
>   at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
>   at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:131)
>   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file 2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
> Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused)
>   at
>
[GitHub] [hudi] hudi-bot commented on pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
hudi-bot commented on PR #7450: URL: https://github.com/apache/hudi/pull/7450#issuecomment-1352477380 ## CI report: * f11664234aaf6c74c98c1d75a364770931f9c00b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13706) * 3d4f6bf574764b5e8c962b94fa1b7fbfd6e735b5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13737) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
hudi-bot commented on PR #7450: URL: https://github.com/apache/hudi/pull/7450#issuecomment-1352474318 ## CI report: * f11664234aaf6c74c98c1d75a364770931f9c00b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13706) * 3d4f6bf574764b5e8c962b94fa1b7fbfd6e735b5 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1352471026 ## CI report: * c60978aaf0dd183b05139dda6bd741ea43877f42 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13715) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13736) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xicm commented on pull request #7355: [HUDI-5308] [testing]Hive3 query returns null when the where clause has a partition field
xicm commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1352444013 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex
alexeykudinkin commented on code in PR #7450: URL: https://github.com/apache/hudi/pull/7450#discussion_r1049125968 ## hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java: ## @@ -1598,7 +1597,7 @@ public void testAvroLogRecordReaderWithMixedInsertsCorruptsAndRollback(ExternalS scanner.close(); } - @ParameterizedTest + /*@ParameterizedTest Review Comment: Why changing this one? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7423: [HUDI-5384] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`
hudi-bot commented on PR #7423: URL: https://github.com/apache/hudi/pull/7423#issuecomment-1352430240 ## CI report: * 2905580eede076436b472c22da2f2d6af27d1e1e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13699) * 09b901a56869b8282c92d6c05ad746f98f2d6a01 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13735) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7461: [HUDI-5392] Fixing Bootstrapping flow handling of arrays
hudi-bot commented on PR #7461: URL: https://github.com/apache/hudi/pull/7461#issuecomment-1352423852 ## CI report: * 8305782809d957b5fc7d280414a4e700a47138d6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13734) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7423: [HUDI-5384] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`
hudi-bot commented on PR #7423: URL: https://github.com/apache/hudi/pull/7423#issuecomment-1352423665 ## CI report: * 2905580eede076436b472c22da2f2d6af27d1e1e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13699) * 09b901a56869b8282c92d6c05ad746f98f2d6a01 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] kasured commented on issue #7246: [SUPPORT] Controlling the Archival process retention
kasured commented on issue #7246: URL: https://github.com/apache/hudi/issues/7246#issuecomment-1352420251 Sorry for the late update. I should have followed up on this earlier. Right after the issue was created we decided to go with the presumably least risky option of decreasing the batch size 'hoodie.commits.archival.batch'. It helped to eliminate the issue with that particular table. However, the remaining concern for us at the moment is that, regardless of the options I have listed (unless there are others), either the number of archive files will keep increasing (if the archive merge is disabled) or the overall archival size will keep accumulating (if archival is enabled). Therefore we can close the issue as there is no OOM anymore; on the other hand, there seems to be no way to control the growth of the archival files/overall size. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
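The batch-size change described in the comment above could be passed to a writer as a plain key/value option. This is a minimal sketch, not the reporter's actual setup: only the key `hoodie.commits.archival.batch` comes from the comment, while the class name, the helper name, and the value 5 are hypothetical and chosen for illustration.

```java
import java.util.Properties;

public class ArchivalBatchSketch {

    // Key quoted in the issue comment; everything else here is illustrative.
    static final String ARCHIVAL_BATCH_KEY = "hoodie.commits.archival.batch";

    /** Builds writer overrides that lower the archival batch size (hypothetical helper). */
    static Properties writerOverrides(int archivalBatch) {
        if (archivalBatch < 1) {
            throw new IllegalArgumentException("archival batch must be positive");
        }
        Properties props = new Properties();
        props.setProperty(ARCHIVAL_BATCH_KEY, Integer.toString(archivalBatch));
        return props;
    }

    public static void main(String[] args) {
        // 5 is an assumed example value, not a recommendation from the thread.
        Properties props = writerOverrides(5);
        System.out.println(ARCHIVAL_BATCH_KEY + "=" + props.getProperty(ARCHIVAL_BATCH_KEY));
    }
}
```

As the comment notes, a smaller batch addressed the OOM for that table but does not by itself bound the total size of the archived timeline.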
[GitHub] [hudi] hudi-bot commented on pull request #7461: [HUDI-5392] Fixing Bootstrapping flow handling of arrays
hudi-bot commented on PR #7461: URL: https://github.com/apache/hudi/pull/7461#issuecomment-1352419381 ## CI report: * c74b3094a3b1cf632e569635ee570bd53ebcde1e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13733) * 8305782809d957b5fc7d280414a4e700a47138d6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7461: [HUDI-5392] Fixing Bootstrapping flow handling of arrays
hudi-bot commented on PR #7461: URL: https://github.com/apache/hudi/pull/7461#issuecomment-1352415760 ## CI report: * c74b3094a3b1cf632e569635ee570bd53ebcde1e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13733) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format
[ https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-5392:
--
Sprint: 2022/12/12

> Fix Bootstrap files reader to configure arrays to be read in the new format
> ---
>
> Key: HUDI-5392
> URL: https://issues.apache.org/jira/browse/HUDI-5392
> Project: Apache Hudi
> Issue Type: Bug
> Components: bootstrap
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
>
> When writing Bootstrap file we’re using Spark writer that writes arrays in
> the new format, while Hudi reads it in the old (Avro compatible) format:
> {code:java}
> // Old
> optional group tip_history (LIST) {
>   repeated group array {
>     optional double amount;
>     optional binary currency (UTF8);
>   }
> }
> // new
> optional group tip_history (LIST) {
>   repeated group list {
>     optional group element {
>       optional double amount;
>       optional binary currency (UTF8);
>     }
>   }
> } {code}
>
> To fix that we need to make sure that Bootstrap files are *always* read in a
> new format (Spark default) unlike Hudi's Parquet files
> We also need to fix TestDataSourceForBootstrap, as it currently doesn't
> actually assert that the records are written correctly.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
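To make the difference between the two layouts in the issue concrete, the sketch below derives the dotted column path of a leaf field under each layout. It is an illustration only and does not use the Parquet or Hudi APIs: the intermediate group names `array`, `list`, and `element` are taken from the schemas quoted in the issue, while the class, enum, and method names are hypothetical.

```java
public class ParquetListPathSketch {

    /** The two list layouts shown in the issue description. */
    enum ListLayout { OLD_TWO_LEVEL, NEW_THREE_LEVEL }

    /**
     * Returns the dotted path of a leaf column inside a LIST-annotated group.
     * Old layout:  <list>.array.<leaf>
     * New layout:  <list>.list.element.<leaf>
     */
    static String leafPath(String listField, String leaf, ListLayout layout) {
        switch (layout) {
            case OLD_TWO_LEVEL:
                return String.join(".", listField, "array", leaf);
            case NEW_THREE_LEVEL:
                return String.join(".", listField, "list", "element", leaf);
            default:
                throw new IllegalArgumentException("unknown layout: " + layout);
        }
    }

    public static void main(String[] args) {
        // A reader that expects one layout will not find columns written in the other.
        System.out.println(leafPath("tip_history", "amount", ListLayout.OLD_TWO_LEVEL));
        System.out.println(leafPath("tip_history", "amount", ListLayout.NEW_THREE_LEVEL));
    }
}
```

Because the two paths differ for the same logical field, a reader configured for the old layout cannot resolve columns in a file written with the new one, which is consistent with the mismatch the issue describes.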
[jira] [Updated] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format
[ https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5392: -- Status: In Progress (was: Open) > Fix Bootstrap files reader to configure arrays to be read in the new format > --- > > Key: HUDI-5392 > URL: https://issues.apache.org/jira/browse/HUDI-5392 > Project: Apache Hudi > Issue Type: Bug > Components: bootstrap >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > When writing Bootstrap file we’re using Spark writer that writes arrays in > the new format, while Hudi reads it in the old (Avro compatible) format: > {code:java} > // Old > optional group tip_history (LIST) { > repeated group array { > optional double amount; > optional binary currency (UTF8); > } > } > // new > optional group tip_history (LIST) { > repeated group list { > optional group element { > optional double amount; > optional binary currency (UTF8); > } > } > } {code} > > To fix that we need to make sure that Bootstrap files are *always* read in a > new format (Spark default) unlike Hudi's Parquet files > We also need to fix TestDataSourceForBootstrap, as it currently doesn't > actually assert that the records are written correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #7461: [HUDI-5392] Fixing Bootstrapping flow handling of arrays
hudi-bot commented on PR #7461: URL: https://github.com/apache/hudi/pull/7461#issuecomment-1352368542 ## CI report: * c74b3094a3b1cf632e569635ee570bd53ebcde1e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format
[ https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5392: - Labels: pull-request-available (was: ) > Fix Bootstrap files reader to configure arrays to be read in the new format > --- > > Key: HUDI-5392 > URL: https://issues.apache.org/jira/browse/HUDI-5392 > Project: Apache Hudi > Issue Type: Bug > Components: bootstrap >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > When writing Bootstrap file we’re using Spark writer that writes arrays in > the new format, while Hudi reads it in the old (Avro compatible) format: > {code:java} > // Old > optional group tip_history (LIST) { > repeated group array { > optional double amount; > optional binary currency (UTF8); > } > } > // new > optional group tip_history (LIST) { > repeated group list { > optional group element { > optional double amount; > optional binary currency (UTF8); > } > } > } {code} > > To fix that we need to make sure that Bootstrap files are *always* read in a > new format (Spark default) unlike Hudi's Parquet files > We also need to fix TestDataSourceForBootstrap, as it currently doesn't > actually assert that the records are written correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] alexeykudinkin opened a new pull request, #7461: [HUDI-5392] Fixing Bootstrapping flow'
alexeykudinkin opened a new pull request, #7461: URL: https://github.com/apache/hudi/pull/7461 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7455: [DO_NOT_MERGE] Release 0.12.2 blockers candidate
hudi-bot commented on PR #7455: URL: https://github.com/apache/hudi/pull/7455#issuecomment-1352364791 ## CI report: * 292630b480861b993951eca862f25f0c9b861ec1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13731) * ee8c9dfe97b6f4fad9824244d93bd81718d56511 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13732) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7455: [DO_NOT_MERGE] Release 0.12.2 blockers candidate
hudi-bot commented on PR #7455: URL: https://github.com/apache/hudi/pull/7455#issuecomment-1352361142 ## CI report: * 292630b480861b993951eca862f25f0c9b861ec1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13731) * ee8c9dfe97b6f4fad9824244d93bd81718d56511 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format
[ https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5392: - Assignee: Alexey Kudinkin (was: Ethan Guo) > Fix Bootstrap files reader to configure arrays to be read in the new format > --- > > Key: HUDI-5392 > URL: https://issues.apache.org/jira/browse/HUDI-5392 > Project: Apache Hudi > Issue Type: Bug > Components: bootstrap >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Fix For: 0.13.0 > > > When writing Bootstrap file we’re using Spark writer that writes arrays in > the new format, while Hudi reads it in the old (Avro compatible) format: > {code:java} > // Old > optional group tip_history (LIST) { > repeated group array { > optional double amount; > optional binary currency (UTF8); > } > } > // new > optional group tip_history (LIST) { > repeated group list { > optional group element { > optional double amount; > optional binary currency (UTF8); > } > } > } {code} > > To fix that we need to make sure that Bootstrap files are *always* read in a > new format (Spark default) unlike Hudi's Parquet files > We also need to fix TestDataSourceForBootstrap, as it currently doesn't > actually assert that the records are written correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format
[ https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5392: -- Story Points: 8 (was: 2) > Fix Bootstrap files reader to configure arrays to be read in the new format > --- > > Key: HUDI-5392 > URL: https://issues.apache.org/jira/browse/HUDI-5392 > Project: Apache Hudi > Issue Type: Bug > Components: bootstrap >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Fix For: 0.13.0 > > > When writing Bootstrap file we’re using Spark writer that writes arrays in > the new format, while Hudi reads it in the old (Avro compatible) format: > {code:java} > // Old > optional group tip_history (LIST) { > repeated group array { > optional double amount; > optional binary currency (UTF8); > } > } > // new > optional group tip_history (LIST) { > repeated group list { > optional group element { > optional double amount; > optional binary currency (UTF8); > } > } > } {code} > > To fix that we need to make sure that Bootstrap files are *always* read in a > new format (Spark default) unlike Hudi's Parquet files > We also need to fix TestDataSourceForBootstrap, as it currently doesn't > actually assert that the records are written correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[hudi] branch release-0.12.2-blockers-candidate updated (292630b4808 -> ee8c9dfe97b)
This is an automated email from the ASF dual-hosted git repository. akudinkin pushed a change to branch release-0.12.2-blockers-candidate in repository https://gitbox.apache.org/repos/asf/hudi.git from 292630b4808 Avoiding costly lookups into the schema cache in `SqlTypedRecord` add ee8c9dfe97b Fixing schemas used for bootstrap reader No new revisions were added by this update. Summary of changes: .../org/apache/hudi/table/action/commit/HoodieMergeHelper.java | 10 -- .../org/apache/hudi/table/action/commit/FlinkMergeHelper.java | 9 - .../org/apache/hudi/table/action/commit/JavaMergeHelper.java | 9 - 3 files changed, 24 insertions(+), 4 deletions(-)
[jira] [Commented] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format
[ https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17647735#comment-17647735 ] Alexey Kudinkin commented on HUDI-5392: --- Another contributing issue is that when reading the Bootstrap file we don't specify the expected schema, so records from the Bootstrap file are read using the schema decoded from the file's Parquet schema. This is problematic because when we validate the Avro schemas, their corresponding names are checked, and this creates mismatches since Parquet schemas don't carry the names/namespaces of the structs. > Fix Bootstrap files reader to configure arrays to be read in the new format > --- > > Key: HUDI-5392 > URL: https://issues.apache.org/jira/browse/HUDI-5392 > Project: Apache Hudi > Issue Type: Bug > Components: bootstrap >Reporter: Alexey Kudinkin >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.13.0 > > > When writing Bootstrap file we’re using Spark writer that writes arrays in > the new format, while Hudi reads it in the old (Avro compatible) format: > {code:java} > // Old > optional group tip_history (LIST) { > repeated group array { > optional double amount; > optional binary currency (UTF8); > } > } > // new > optional group tip_history (LIST) { > repeated group list { > optional group element { > optional double amount; > optional binary currency (UTF8); > } > } > } {code} > > To fix that we need to make sure that Bootstrap files are *always* read in a > new format (Spark default) unlike Hudi's Parquet files > We also need to fix TestDataSourceForBootstrap, as it currently doesn't > actually assert that the records are written correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
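The name mismatch mentioned in the comment above can be illustrated with a toy model. This sketch deliberately avoids the Avro API: `RecordShape` is a hypothetical stand-in for a record schema, and both record names are made up. It only shows why a comparison that includes the record's name fails for two schemas that are otherwise structurally identical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class SchemaNameMismatchSketch {

    /** Toy stand-in for a record schema: a full name plus ordered field names. */
    static final class RecordShape {
        final String fullName;
        final List<String> fields;

        RecordShape(String fullName, List<String> fields) {
            this.fullName = fullName;
            this.fields = fields;
        }
    }

    /** Name-sensitive check: the record names must match as well as the fields. */
    static boolean strictEquals(RecordShape a, RecordShape b) {
        return Objects.equals(a.fullName, b.fullName) && a.fields.equals(b.fields);
    }

    /** Structural check: only the fields are compared. */
    static boolean structuralEquals(RecordShape a, RecordShape b) {
        return a.fields.equals(b.fields);
    }

    public static void main(String[] args) {
        // Hypothetical names: one from an expected (writer-side) schema,
        // one synthesized when decoding a file schema that carries no record names.
        RecordShape expected = new RecordShape("example.ns.tip_history", Arrays.asList("amount", "currency"));
        RecordShape decoded = new RecordShape("element", Arrays.asList("amount", "currency"));

        System.out.println("strict match:     " + strictEquals(expected, decoded));
        System.out.println("structural match: " + structuralEquals(expected, decoded));
    }
}
```

Supplying the expected schema to the reader, as the comment suggests, would sidestep the problem by making both sides use the same names in the first place.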
[jira] [Assigned] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format
[ https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5392: - Assignee: Ethan Guo (was: Alexey Kudinkin)
[jira] [Assigned] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format
[ https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5392: - Assignee: Alexey Kudinkin