Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]
hudi-bot commented on PR #10426: URL: https://github.com/apache/hudi/pull/10426#issuecomment-1882542666 ## CI report: * 0510337de5adb626429e72da5539b9f23231974f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21880) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7276] FILE_GROUP_READER_ENABLED should be disable for query [hudi]
linliu-code commented on PR #10455: URL: https://github.com/apache/hudi/pull/10455#issuecomment-1882525739 @xuzifu666 do you have any sample data and the query that generates the error? With that I can try to fix the problem asap. Thanks!
Re: [PR] [HUDI-1623] Solid completion time on timeline [hudi]
waywtdcc commented on PR #9617: URL: https://github.com/apache/hudi/pull/9617#issuecomment-1882506165
> No, we would fix the upgrade for 1.0.0 GA release.
Does this mean that the official version 1.0.0 will support automatic upgrade and fix this bug?
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882481681 ## CI report: * 9c61cc3b1ff124314bb7cacb82bb141762678d54 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21878)
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882474976 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 4d1c6b4d5d83eaed954174ce26a83e23be62bb20 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21882)
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882441755 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 6a208e28d1b9a9a34616f80bfb9e0ab1ead14dc3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21881) * 4d1c6b4d5d83eaed954174ce26a83e23be62bb20 UNKNOWN
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882436037 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 88c81170920a6eaf95e826e979ce6e526d8b33f9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21879) * 6a208e28d1b9a9a34616f80bfb9e0ab1ead14dc3 UNKNOWN
Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]
hudi-bot commented on PR #10426: URL: https://github.com/apache/hudi/pull/10426#issuecomment-1882430499 ## CI report: * 30f390339ad03845b0068256f7938155cf9e08e9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21861) * 0510337de5adb626429e72da5539b9f23231974f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21880)
Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]
hudi-bot commented on PR #10223: URL: https://github.com/apache/hudi/pull/10223#issuecomment-1882430134 ## CI report: * 4827a8d0481f67243920efee57eda41b8a8210a7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21876)
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882430073 ## CI report: * Unknown: [CANCELED](TBD) * 9c61cc3b1ff124314bb7cacb82bb141762678d54 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21878)
Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]
hudi-bot commented on PR #10426: URL: https://github.com/apache/hudi/pull/10426#issuecomment-1882401300 ## CI report: * 30f390339ad03845b0068256f7938155cf9e08e9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21861) * 0510337de5adb626429e72da5539b9f23231974f UNKNOWN
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882401167 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 88c81170920a6eaf95e826e979ce6e526d8b33f9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21879)
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882396258 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * b925f08b8056623f82bfec8e8031db0536d7bbee Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21877) * 88c81170920a6eaf95e826e979ce6e526d8b33f9 UNKNOWN
Re: [PR] [HUDI-7251] Support partial schema for CDC workload [hudi]
hudi-bot commented on PR #10462: URL: https://github.com/apache/hudi/pull/10462#issuecomment-1882391258 ## CI report: * 1698e7f7a5410b6f7c46fc21e9e73e1362e5f5ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21873)
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882355671 ## CI report: * 378a6a619dc288301c70275483bbb0ecfa73a7f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21875) * Unknown: [CANCELED](TBD) * 9c61cc3b1ff124314bb7cacb82bb141762678d54 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21878)
[jira] [Assigned] (HUDI-7280) Add/Drop/Rename table properties hoodie.properties
[ https://issues.apache.org/jira/browse/HUDI-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lin Liu reassigned HUDI-7280:
Assignee: (was: Lin Liu)
> Add/Drop/Rename table properties hoodie.properties
> ---
>
> Key: HUDI-7280
> URL: https://issues.apache.org/jira/browse/HUDI-7280
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Lin Liu
> Priority: Major
> Fix For: 1.0.0
-- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882340593 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * b925f08b8056623f82bfec8e8031db0536d7bbee Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21877)
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882339654 ## CI report: * 378a6a619dc288301c70275483bbb0ecfa73a7f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21875) * Unknown: [CANCELED](TBD) * 9c61cc3b1ff124314bb7cacb82bb141762678d54 UNKNOWN
Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]
beyond1920 commented on code in PR #10426: URL: https://github.com/apache/hudi/pull/10426#discussion_r1444512704
## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java: ##
@@ -971,6 +952,59 @@ public void alterPartitionColumnStatistics(
     throw new HoodieCatalogException("Not supported.");
   }

+  public boolean isUpdatePermissible(ObjectPath tablePath, CatalogBaseTable newCatalogTable, boolean ignoreIfNotExists) throws TableNotExistException {
+    if (!newCatalogTable.getOptions().getOrDefault(CONNECTOR.key(), "").equalsIgnoreCase("hudi")) {
+      throw new HoodieCatalogException(String.format("The %s is not hoodie table", tablePath.getObjectName()));
+    }
+    if (newCatalogTable instanceof CatalogView) {
+      throw new HoodieCatalogException("Hoodie catalog does not support to ALTER VIEW");
+    }
+
+    try {
+      Table hiveTable = getHiveTable(tablePath);
+      if (!sameOptions(hiveTable.getParameters(), newCatalogTable.getOptions(), FlinkOptions.TABLE_TYPE)
+          || !sameOptions(hiveTable.getParameters(), newCatalogTable.getOptions(), FlinkOptions.INDEX_TYPE)) {

Review Comment:
@xiarixiaoyao After I forbade modifying `primaryKeys` and `preCombinekeys`, some tests failed in `TestHoodieHiveCatalog.testCreateAndGetHoodieTable` and `TestHoodieHiveCatalog.testCreateAndGetHoodieTable`. See more info in [Failed pipelines](https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=21861=logs=600e7de6-e133-5e69-e615-50ee129b3c08=bbbd7bcc-ae73-56b8-887a-cd2d6deaafc7). Real users might also modify `primaryKeys` and `preCombinekeys` after creating the table, so I will remove those limitations.
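As a rough illustration of the option check in the quoted diff: the body of `sameOptions` is not shown in the comment, so the sketch below is a hypothetical stand-in that compares a single option across the stored and the proposed option maps, falling back to a default when the key is absent. The class name, the method signature, and the example key and default in the usage note are assumptions for illustration, not Hudi's actual code.

```java
import java.util.Map;
import java.util.Objects;

final class OptionCheckSketch {
  private OptionCheckSketch() {
  }

  // Hypothetical helper: returns true when the option identified by `key`
  // resolves to the same value in both maps. A missing key is treated as
  // `defaultValue`, so an omitted option and an explicit default compare equal.
  static boolean sameOption(Map<String, String> existing,
                            Map<String, String> updated,
                            String key,
                            String defaultValue) {
    String before = existing.getOrDefault(key, defaultValue);
    String after = updated.getOrDefault(key, defaultValue);
    return Objects.equals(before, after);
  }
}
```

For instance, with an assumed key `table.type` and an assumed default `COPY_ON_WRITE`, a stored value of `MERGE_ON_READ` compared against a new option map that omits the key would be reported as a change, while two maps that both omit the key would be reported as unchanged.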
Re: [PR] [HUDI-7240] Clean delete logic [hudi]
hudi-bot commented on PR #10398: URL: https://github.com/apache/hudi/pull/10398#issuecomment-1882325265 ## CI report: * 3d71d1c0e3220f0639b702d91539e1d070e93cca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21874)
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882324940 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 3192ba3b71bdfe1421d92369e8787efbee821f90 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21871) * b925f08b8056623f82bfec8e8031db0536d7bbee UNKNOWN
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
rohitmittapalli commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882325117 @hudi-bot run azure
[PR] add parquet merge schema config [hudi]
rohitmittapalli opened a new pull request, #10463: URL: https://github.com/apache/hudi/pull/10463
### Change Logs
Adding Parquet DFS source config class from this PR: https://github.com/apache/hudi/pull/10199/files
### Impact
Adding public documentation.
### Risk level (write none, low medium or high below)
Low
### Documentation Update
### Contributor's checklist
- [X] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [X] Change Logs and Impact were stated clearly
- [X] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Updated] (HUDI-7280) Add/Drop/Rename table properties hoodie.properties
[ https://issues.apache.org/jira/browse/HUDI-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lin Liu updated HUDI-7280:
Issue Type: Improvement (was: Bug)
> Add/Drop/Rename table properties hoodie.properties
> ---
>
> Key: HUDI-7280
> URL: https://issues.apache.org/jira/browse/HUDI-7280
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Lin Liu
> Assignee: Lin Liu
> Priority: Major
> Fix For: 1.0.0
[jira] [Created] (HUDI-7280) Add/Drop/Rename table properties hoodie.properties
Lin Liu created HUDI-7280:
Summary: Add/Drop/Rename table properties hoodie.properties
Key: HUDI-7280
URL: https://issues.apache.org/jira/browse/HUDI-7280
Project: Apache Hudi
Issue Type: Bug
Reporter: Lin Liu
Assignee: Lin Liu
Fix For: 1.0.0
Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]
hudi-bot commented on PR #10223: URL: https://github.com/apache/hudi/pull/10223#issuecomment-1882239577 ## CI report: * 536833e03706c665b00e88986596bf9f44aa2c47 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21801) * 4827a8d0481f67243920efee57eda41b8a8210a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21876)
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882239471 ## CI report: * f7566099db43c39a06db5e4ae905a65dfd69a7ca Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21190) * 378a6a619dc288301c70275483bbb0ecfa73a7f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21875)
Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]
hudi-bot commented on PR #10223: URL: https://github.com/apache/hudi/pull/10223#issuecomment-1882224888 ## CI report: * 536833e03706c665b00e88986596bf9f44aa2c47 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21801) * 4827a8d0481f67243920efee57eda41b8a8210a7 UNKNOWN
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882224728 ## CI report: * f7566099db43c39a06db5e4ae905a65dfd69a7ca Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21190) * 378a6a619dc288301c70275483bbb0ecfa73a7f1 UNKNOWN
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
hudi-bot commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882210046 ## CI report: * f7566099db43c39a06db5e4ae905a65dfd69a7ca Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21190)
Re: [PR] [HUDI-7251] Support partial schema for CDC workload [hudi]
hudi-bot commented on PR #10462: URL: https://github.com/apache/hudi/pull/10462#issuecomment-1882211360 ## CI report: * b06d46d4d50ffaa766d50e299f5fc3b3dd536c3e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21872) * 1698e7f7a5410b6f7c46fc21e9e73e1362e5f5ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21873)
Re: [PR] Merge schema in ParuqetDFSSource [hudi]
rohitmittapalli commented on PR #10199: URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882191866 @hudi-bot run azure
Re: [PR] [HUDI-7247]Spark truncate table supports concurrency [hudi]
stream2000 commented on code in PR #10390: URL: https://github.com/apache/hudi/pull/10390#discussion_r144086
## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/TruncateHoodieTableCommand.scala: ##
@@ -68,7 +71,12 @@ case class TruncateHoodieTableCommand(
       val targetPath = new Path(basePath)
       val engineContext = new HoodieSparkEngineContext(sparkSession.sparkContext)
       val fs = FSUtils.getFs(basePath, sparkSession.sparkContext.hadoopConfiguration)
+      val hoodieWriteConfig = HoodieWriteConfig.newBuilder().withPath(basePath).withProps(properties).withEngineType(EngineType.SPARK)
+        .build()
+      val transactionManager = new TransactionManager(hoodieWriteConfig, fs)
+      transactionManager.beginTransaction(org.apache.hudi.common.util.Option.empty(), org.apache.hudi.common.util.Option.empty())
       FSUtils.deleteDir(engineContext, fs, targetPath, sparkSession.sparkContext.defaultParallelism)
+      transactionManager.endTransaction(org.apache.hudi.common.util.Option.empty())

Review Comment:
Can we use a replace commit here to provide the transaction mechanism? We have already done that for partitioned tables.
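The quoted diff calls `beginTransaction`, `deleteDir` and `endTransaction` one after another. As a general illustration only (this is neither the PR's code nor the replace-commit approach proposed in the review), the sketch below shows the usual try/finally shape for guarding a delete so that the release step still runs when the delete throws. It uses a plain `ReentrantLock` as a stand-in for the transaction manager; the class and method names are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

final class GuardedTruncateSketch {
  // Stand-in for a table-level transaction; not Hudi's TransactionManager.
  private static final ReentrantLock TABLE_LOCK = new ReentrantLock();

  private GuardedTruncateSketch() {
  }

  // Runs the delete action while holding the lock and always releases it,
  // even when the action throws.
  static void truncate(Runnable deleteAction) {
    TABLE_LOCK.lock();
    try {
      deleteAction.run();
    } finally {
      TABLE_LOCK.unlock();
    }
  }

  // Exposed so callers can confirm the lock was released.
  static boolean isLocked() {
    return TABLE_LOCK.isLocked();
  }
}
```

Without the `finally` block, an exception from the delete step would leave the lock held, which is the failure mode a sequential begin/delete/end call order is exposed to.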
Re: [PR] [HUDI-7240] Clean delete logic [hudi]
hudi-bot commented on PR #10398: URL: https://github.com/apache/hudi/pull/10398#issuecomment-1882131119 ## CI report: * c3779c92492d0deeb5a261c63710a96ade6f0524 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21684) * 3d71d1c0e3220f0639b702d91539e1d070e93cca Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21874)
Re: [PR] [HUDI-7251] Support partial schema for CDC workload [hudi]
hudi-bot commented on PR #10462: URL: https://github.com/apache/hudi/pull/10462#issuecomment-1882124489 ## CI report: * b06d46d4d50ffaa766d50e299f5fc3b3dd536c3e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21872) * 1698e7f7a5410b6f7c46fc21e9e73e1362e5f5ce Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21873)
Re: [PR] [HUDI-7240] Clean delete logic [hudi]
hudi-bot commented on PR #10398: URL: https://github.com/apache/hudi/pull/10398#issuecomment-1882124334 ## CI report: * c3779c92492d0deeb5a261c63710a96ade6f0524 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21684) * 3d71d1c0e3220f0639b702d91539e1d070e93cca UNKNOWN
Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]
danny0405 commented on code in PR #10426: URL: https://github.com/apache/hudi/pull/10426#discussion_r1445532868
## hudi-flink-datasource/hudi-flink1.14.x/src/main/java/org/apache/hudi/adapter/Utils.java: ##
@@ -57,4 +64,8 @@ public static BinaryExternalSorter getBinaryExternalSorter(
     return new BinaryExternalSorter(owner, memoryManager, reservedMemorySize, ioManager, inputSerializer, serializer, normalizedKeyComputer, comparator, conf);
   }
+
+  public static InternalSchema applyTableChange(InternalSchema oldSchema, List changes, Function convertFunc) {
+    throw new RuntimeException("There is no possible to hit this method!");

Review Comment:
Can we just throw an AssertionError with a simple message, ‘unexpected’?
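The suggestion in this comment could look roughly like the sketch below. The generic type arguments were lost in the quoted diff, so the parameter types here are placeholders and the class name is hypothetical; only the `AssertionError` with the message `unexpected` comes from the review.

```java
import java.util.List;

final class UnreachableAdapterSketch {
  private UnreachableAdapterSketch() {
  }

  // Placeholder signature: Object and List<?> stand in for the real schema
  // and change types, which are not recoverable from the quoted diff.
  static Object applyTableChange(Object oldSchema, List<?> changes) {
    // AssertionError marks a code path that should never be reached,
    // as opposed to a runtime condition a caller might handle.
    throw new AssertionError("unexpected");
  }
}
```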
Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]
danny0405 commented on code in PR #10426: URL: https://github.com/apache/hudi/pull/10426#discussion_r1445531226
## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java: ##
@@ -131,6 +138,13 @@ public class HoodieHiveCatalog extends AbstractCatalog {
   // optional catalog base path: used for db/table path inference.
   private final String catalogPath;
   private final boolean external;
+  private static final Map> IMMUTABLE_CONFS = new HashMap<>();
+  {

Review Comment:
Don’t think we need this. A simple if-else check is enough.
Re: [PR] [HUDI-7251] Support partial schema for CDC workload [hudi]
hudi-bot commented on PR #10462: URL: https://github.com/apache/hudi/pull/10462#issuecomment-1882118575 ## CI report: * b06d46d4d50ffaa766d50e299f5fc3b3dd536c3e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21872) * 1698e7f7a5410b6f7c46fc21e9e73e1362e5f5ce UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
(hudi) branch release-0.14.1 updated (66cff7d7642 -> 5b0d67bc798)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch release-0.14.1 in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 66cff7d7642 Bumping release candidate number 2
     add 5b0d67bc798 [MINOR] Update release version to reflect published version 0.14.1

No new revisions were added by this update.

Summary of changes:
 docker/hoodie/hadoop/base/pom.xml | 2 +-
 docker/hoodie/hadoop/base_java11/pom.xml | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml | 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml | 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml | 2 +-
 docker/hoodie/hadoop/namenode/pom.xml | 2 +-
 docker/hoodie/hadoop/pom.xml | 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml | 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml | 2 +-
 docker/hoodie/hadoop/trinobase/pom.xml | 2 +-
 docker/hoodie/hadoop/trinocoordinator/pom.xml | 2 +-
 docker/hoodie/hadoop/trinoworker/pom.xml | 2 +-
 hudi-aws/pom.xml | 4 ++--
 hudi-cli/pom.xml | 2 +-
 hudi-client/hudi-client-common/pom.xml | 4 ++--
 hudi-client/hudi-flink-client/pom.xml | 4 ++--
 hudi-client/hudi-java-client/pom.xml | 4 ++--
 hudi-client/hudi-spark-client/pom.xml | 4 ++--
 hudi-client/pom.xml | 2 +-
 hudi-common/pom.xml | 2 +-
 hudi-examples/hudi-examples-common/pom.xml | 2 +-
 hudi-examples/hudi-examples-flink/pom.xml | 2 +-
 hudi-examples/hudi-examples-java/pom.xml | 2 +-
 hudi-examples/hudi-examples-spark/pom.xml | 2 +-
 hudi-examples/pom.xml | 2 +-
 hudi-flink-datasource/hudi-flink/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.13.x/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.14.x/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.15.x/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.16.x/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.17.x/pom.xml | 4 ++--
 hudi-flink-datasource/pom.xml | 4 ++--
 hudi-gcp/pom.xml | 2 +-
 hudi-hadoop-mr/pom.xml | 2 +-
 hudi-integ-test/pom.xml | 2 +-
 hudi-kafka-connect/pom.xml | 4 ++--
 hudi-platform-service/hudi-metaserver/hudi-metaserver-client/pom.xml | 2 +-
 hudi-platform-service/hudi-metaserver/hudi-metaserver-server/pom.xml | 2 +-
 hudi-platform-service/hudi-metaserver/pom.xml | 4 ++--
 hudi-platform-service/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark-common/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark2-common/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark2/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark3-common/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark3.0.x/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark3.1.x/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark3.2.x/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark3.2plus-common/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark3.3.x/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark3.4.x/pom.xml | 4 ++--
 hudi-spark-datasource/pom.xml | 2 +-
 hudi-sync/hudi-adb-sync/pom.xml | 2 +-
 hudi-sync/hudi-datahub-sync/pom.xml | 2 +-
 hudi-sync/hudi-hive-sync/pom.xml | 2 +-
 hudi-sync/hudi-sync-common/pom.xml | 2 +-
 hudi-sync/pom.xml
Re: [PR] [HUDI-7251] Support partial schema for CDC workload [hudi]
hudi-bot commented on PR #10462: URL: https://github.com/apache/hudi/pull/10462#issuecomment-1882065745 ## CI report: * b06d46d4d50ffaa766d50e299f5fc3b3dd536c3e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21872)
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882065105 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 3192ba3b71bdfe1421d92369e8787efbee821f90 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21871)
Re: [PR] [DOCS] Add OneTable to list of catalogs [hudi]
bhasudha closed pull request #10431: [DOCS] Add OneTable to list of catalogs URL: https://github.com/apache/hudi/pull/10431
(hudi) branch asf-site updated: added onetable support (#10461)
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 0aa1ecc907e added onetable support (#10461)
0aa1ecc907e is described below

commit 0aa1ecc907ea2337e6e6b0e26dad4eba8c26e36a
Author: Sagar Lakshmipathy <18vidhyasa...@gmail.com>
AuthorDate: Mon Jan 8 16:33:53 2024 -0800

    added onetable support (#10461)
---
 website/docs/syncing_onetable.md | 54 ++
 website/sidebars.js | 3 +-
 .../version-0.14.0/syncing_onetable.md | 54 ++
 .../version-0.14.1/syncing_onetable.md | 54 ++
 .../version-0.14.0-sidebars.json | 3 +-
 .../version-0.14.1-sidebars.json | 3 +-
 6 files changed, 168 insertions(+), 3 deletions(-)

diff --git a/website/docs/syncing_onetable.md b/website/docs/syncing_onetable.md
new file mode 100644
index 000..99760e101c0
--- /dev/null
+++ b/website/docs/syncing_onetable.md
@@ -0,0 +1,54 @@
+---
+title: OneTable
+keywords: [onetable, hudi, delta-lake, iceberg, sync]
+---
+
+Hudi (tables created from 0.14.0 onwards) supports syncing to [OneTable](https://onetable.dev/), providing users with the option to interoperate with other table formats like Delta Lake and Apache Iceberg.
+
+## Interoperating with OneTable
+
+If you have tables in one of the supported formats (Delta/Iceberg), you can use OneTable to translate the existing metadata to read as a Hudi table and vice versa.
+
+### Installation
+
+You can work with OneTable by either building the jar from the [source](https://github.com/onetable-io/onetable) or by downloading from [GitHub packages](https://github.com/onetable-io/onetable/packages/1986830).
+
+:::tip Note
+If you're using one of the JVM languages to work with Hudi/Delta/Iceberg, you can directly use OneTable as a [dependency](https://github.com/onetable-io/onetable/packages/1986830) in your project.
+This is highlighted in this [demo](https://onetable.dev/docs/demo/docker).
+:::
+
+### Syncing to OneTable
+
+Once you have the jar, you can simply run it against a Hudi/Delta/Iceberg table to add target table format metadata to the table.
+Below is an example configuration to translate a Hudi table to Delta & Iceberg table.
+
+```shell md title="my_config.yaml"
+sourceFormat: HUDI
+targetFormats:
+  - DELTA
+  - ICEBERG
+datasets:
+  -
+    tableBasePath: path/to/hudi/table
+    tableName: tableName
+    partitionSpec: partition_field_name:VALUE
+```
+
+```shell md title="shell"
+java -jar path/to/bundled-onetable.jar --datasetConfig path/to/my_config.yaml
+```
+
+### Hudi Streamer Extensions
+If you want to use OneTable with Hudi Streamer to sync each commit into other table formats, you have to
+
+1. Add the [extensions jar](https://github.com/onetable-io/onetable/tree/main/hudi-support/extensions) `hudi-extensions-0.1.0-SNAPSHOT-bundled.jar` to your class path.
+2. Add `io.onetable.hudi.sync.OneTableSyncTool` to your list of sync classes
+3. Set the following configurations based on your preferences:
+
+   ```
+   hoodie.onetable.formats: "ICEBERG,DELTA"
+   hoodie.onetable.target.metadata.retention.hr: 168
+   ```
+
+For more examples, you can refer to the [OneTable docs](https://onetable.dev/docs/how-to).
\ No newline at end of file
diff --git a/website/sidebars.js b/website/sidebars.js
index 9ee664a606c..72456f554bb 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -57,7 +57,8 @@ module.exports = {
 'syncing_aws_glue_data_catalog',
 'syncing_datahub',
 'syncing_metastore',
-"gcp_bigquery"
+'gcp_bigquery',
+'syncing_onetable'
 ],
 }
 ],
diff --git a/website/versioned_docs/version-0.14.0/syncing_onetable.md b/website/versioned_docs/version-0.14.0/syncing_onetable.md
new file mode 100644
index 000..99760e101c0
--- /dev/null
+++ b/website/versioned_docs/version-0.14.0/syncing_onetable.md
@@ -0,0 +1,54 @@
+---
+title: OneTable
+keywords: [onetable, hudi, delta-lake, iceberg, sync]
+---
+
+Hudi (tables created from 0.14.0 onwards) supports syncing to [OneTable](https://onetable.dev/), providing users with the option to interoperate with other table formats like Delta Lake and Apache Iceberg.
+
+## Interoperating with OneTable
+
+If you have tables in one of the supported formats (Delta/Iceberg), you can use OneTable to translate the existing metadata to read as a Hudi table and vice versa.
+
+### Installation
+
+You can work with OneTable by either building the jar from the [source](https://github.com/onetable-io/onetable) or by downloading from [GitHub
Re: [PR] [DOCS] Added onetable support [hudi]
bhasudha merged PR #10461: URL: https://github.com/apache/hudi/pull/10461
Re: [PR] [HUDI-7251] Support partial schema for CDC workload [hudi]
hudi-bot commented on PR #10462: URL: https://github.com/apache/hudi/pull/10462#issuecomment-1882050370 ## CI report: * b06d46d4d50ffaa766d50e299f5fc3b3dd536c3e UNKNOWN
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882049695 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 505fda45b13ca3afd7cb8e0ef5fe575456ad14ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21867) * 3192ba3b71bdfe1421d92369e8787efbee821f90 UNKNOWN
Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]
hudi-bot commented on PR #10460: URL: https://github.com/apache/hudi/pull/10460#issuecomment-1882036532 ## CI report: * 1022041ada5bc1b360c463e2b044e232fd8f9749 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21869)
[jira] [Updated] (HUDI-7251) Improve the logic of AppendHandle for partial updates
[ https://issues.apache.org/jira/browse/HUDI-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7251: - Labels: pull-request-available (was: ) > Improve the logic of AppendHandle for partial updates > - > > Key: HUDI-7251 > URL: https://issues.apache.org/jira/browse/HUDI-7251 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Lin Liu >Assignee: Lin Liu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7251] Support partial schema for CDC workload [hudi]
linliu-code opened a new pull request, #10462: URL: https://github.com/apache/hudi/pull/10462

### Change Logs

We pass the partial schema into write config, which will be used in HoodieAppendHandle. Here we assume

1. the inputBatch contains the partial schema which is from the upstream.
2. the inputBatch contains either insert, update or delete records.

### Impact

Low since the partial schema equals the full schema by default.

### Risk level (write none, low medium or high below)

Low.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[PR] [DOCS] Added onetable support [hudi]
sagarlakshmipathy opened a new pull request, #10461: URL: https://github.com/apache/hudi/pull/10461

### Change Logs

Added OneTable to Catalogs page for both current and 0.14.0 (only supported from 0.14 onwards)

### Impact

Doc change, minor.

### Risk level (write none, low medium or high below)

Minor

### Documentation Update

Doc update

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [NA] CI passed
Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]
hudi-bot commented on PR #10460: URL: https://github.com/apache/hudi/pull/10460#issuecomment-1881865528 ## CI report: * e3813a268b2e2270300c4be75e3660594c8f48d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21865) * 1022041ada5bc1b360c463e2b044e232fd8f9749 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21869)
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1881865121 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 505fda45b13ca3afd7cb8e0ef5fe575456ad14ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21867)
Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]
hudi-bot commented on PR #10460: URL: https://github.com/apache/hudi/pull/10460#issuecomment-1881854982 ## CI report: * e3813a268b2e2270300c4be75e3660594c8f48d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21865) * 1022041ada5bc1b360c463e2b044e232fd8f9749 UNKNOWN
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1881854545 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 940711369818ddd1525a7e9b255fa48ea3de4152 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21868) * 505fda45b13ca3afd7cb8e0ef5fe575456ad14ca UNKNOWN
Re: [PR] [MINOR] fix violations of Sonarqube rule java:S2184 [hudi]
bvaradar commented on PR #10444: URL: https://github.com/apache/hudi/pull/10444#issuecomment-1881846064 @KUTEJiang: Can you fix the PR so that it passes validation?
Re: [PR] [HUDI-7229] support partial updates for cdc payload demo [hudi]
linliu-code commented on PR #10384: URL: https://github.com/apache/hudi/pull/10384#issuecomment-1881811126 This is not the right direction for the CDC demo.
Re: [PR] [HUDI-7229] support partial updates for cdc payload demo [hudi]
linliu-code closed pull request #10384: [HUDI-7229] support partial updates for cdc payload demo URL: https://github.com/apache/hudi/pull/10384
Re: [PR] [WIP] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
vinothchandar commented on code in PR #10422: URL: https://github.com/apache/hudi/pull/10422#discussion_r1445060501

## hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java: ##
@@ -116,6 +117,7 @@ public void setUp() {
 hadoopConf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
 baseJobConf = new JobConf(hadoopConf);
 baseJobConf.set(HoodieMemoryConfig.MAX_DFS_STREAM_BUFFER_SIZE.key(), String.valueOf(1024 * 1024));
+baseJobConf.set(HoodieReaderConfig.FILE_GROUP_READER_ENABLED.key(), "false");

Review Comment: why "false"?

## packaging/bundle-validation/validate.sh: ##
@@ -93,7 +93,7 @@ test_spark_hadoop_mr_bundles () {
 # save HiveQL query results
 hiveqlresultsdir=/tmp/hadoop-mr-bundle/hiveql/trips/results
 mkdir -p $hiveqlresultsdir
-$HIVE_HOME/bin/beeline --hiveconf hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat \
+$HIVE_HOME/bin/beeline --verbose --hiveconf hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat \

Review Comment: does this need to be checked in?

## hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java: ##
@@ -91,9 +94,42 @@ private void initAvroInputFormat() {
 }
 }
+
+ private static boolean checkTableIsHudi(final InputSplit split, final JobConf job) {

Review Comment: rename: checkIfHudiTable

## hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/ObjectInspectorCache.java: ##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.hadoop.utils;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.hadoop.io.ArrayWritable;
+import org.apache.hadoop.mapred.JobConf;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+public class ObjectInspectorCache {

Review Comment: java docs

## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ##
@@ -227,23 +231,31 @@ private ClosableIterator makeBootstrapBaseFileIterator(HoodieBaseFile baseFil
 BaseFile dataFile = baseFile.getBootstrapBaseFile().get();
 Pair,List> requiredFields = getDataAndMetaCols(requiredSchema);
 Pair,List> allFields = getDataAndMetaCols(dataSchema);
-
-Option> dataFileIterator = requiredFields.getRight().isEmpty() ? Option.empty() :
- Option.of(readerContext.getFileRecordIterator(dataFile.getHadoopPath(), 0, dataFile.getFileLen(),
-createSchemaFromFields(allFields.getRight()), createSchemaFromFields(requiredFields.getRight()), hadoopConf));
-
-Option> skeletonFileIterator = requiredFields.getLeft().isEmpty() ? Option.empty() :
- Option.of(readerContext.getFileRecordIterator(baseFile.getHadoopPath(), 0, baseFile.getFileLen(),
-createSchemaFromFields(allFields.getLeft()), createSchemaFromFields(requiredFields.getLeft()), hadoopConf));
+Option,Schema>> dataFileIterator =

Review Comment: It's cool that we are able to add a new engine without many changes to this class.

## hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/AbstractRealtimeRecordReader.java: ##
@@ -101,6 +102,9 @@ public AbstractRealtimeRecordReader(RealtimeSplit split, JobConf job) {
 throw new HoodieException("Could not create HoodieRealtimeRecordReader on path " + this.split.getPath(), e);
 }
 prepareHiveAvroSerializer();
+if
Re: [PR] [HUDI-7236] Fix mit change partition4 [hudi]
nsivabalan closed pull request #10369: [HUDI-7236] Fix mit change partition4 URL: https://github.com/apache/hudi/pull/10369
Re: [PR] [HUDI-7236] Fix mit change partition5 [hudi]
nsivabalan closed pull request #10370: [HUDI-7236] Fix mit change partition5 URL: https://github.com/apache/hudi/pull/10370
Re: [PR] [HUDI-7236] Fix mit change partition2 [hudi]
nsivabalan closed pull request #10367: [HUDI-7236] Fix mit change partition2 URL: https://github.com/apache/hudi/pull/10367
Re: [PR] [HUDI-7236] Fix mit change partition3 [hudi]
nsivabalan closed pull request #10368: [HUDI-7236] Fix mit change partition3 URL: https://github.com/apache/hudi/pull/10368
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1881771331 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 940711369818ddd1525a7e9b255fa48ea3de4152 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21868)
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1881761174 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 6e1bcd405ba3a0d3edd11560da1e5377a16c7d33 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21863) * 940711369818ddd1525a7e9b255fa48ea3de4152 UNKNOWN
[jira] [Updated] (HUDI-7251) Improve the logic of AppendHandle for partial updates
[ https://issues.apache.org/jira/browse/HUDI-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-7251: -- Summary: Improve the logic of AppendHandle for partial updates (was: Improve the logic of AppenHandle for partial updates) > Improve the logic of AppendHandle for partial updates > - > > Key: HUDI-7251 > URL: https://issues.apache.org/jira/browse/HUDI-7251 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Lin Liu >Assignee: Lin Liu >Priority: Major
Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]
hudi-bot commented on PR #10460: URL: https://github.com/apache/hudi/pull/10460#issuecomment-1881686424 ## CI report: * e3813a268b2e2270300c4be75e3660594c8f48d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21865)
[jira] [Updated] (HUDI-7251) Improve the logic of AppenHandle for partial updates
[ https://issues.apache.org/jira/browse/HUDI-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-7251: -- Status: In Progress (was: Open) > Improve the logic of AppenHandle for partial updates > > > Key: HUDI-7251 > URL: https://issues.apache.org/jira/browse/HUDI-7251 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Lin Liu >Assignee: Lin Liu >Priority: Major
Re: [PR] [HUDI-7229] support partial updates for cdc payload demo [hudi]
bvaradar commented on code in PR #10384: URL: https://github.com/apache/hudi/pull/10384#discussion_r1445146024 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/SourceFormatAdapter.java: ## @@ -158,8 +160,9 @@ public InputBatch> fetchNewDataInAvroFormat(Option> r = ((Source>) source).fetchNext(lastCkptStr, sourceLimit); -return new InputBatch<>(Option.ofNullable(r.getBatch().map( +// InputBatch> r = ((Source>) source).fetchNext(lastCkptStr, sourceLimit); +List>> rs = ((Source>) source).fetchNextForParitialUpdate(lastCkptStr, sourceLimit); Review Comment: Can you keep fetchNext as the main API to be called? Whether a partial update or a full update is processed needs to be encapsulated inside the Source implementation.
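The encapsulation requested in this comment can be sketched as follows, under heavy simplification. `SourceSketch`, its two subclasses, and `AdapterSketch` are hypothetical stand-ins, not Hudi's actual `Source`, `InputBatch`, or `SourceFormatAdapter` classes, and records are modeled as plain strings. The point is only that the caller invokes a single `fetchNext` and each source decides internally how its records are produced.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a source abstraction.
abstract class SourceSketch {
  // The single entry point that callers use, whatever the update mode.
  abstract List<String> fetchNext(String lastCheckpoint, long sourceLimit);
}

// A source that emits full records.
final class FullUpdateSourceSketch extends SourceSketch {
  @Override
  List<String> fetchNext(String lastCheckpoint, long sourceLimit) {
    List<String> batch = new ArrayList<>();
    batch.add("full:" + lastCheckpoint);
    return batch;
  }
}

// A source that emits partial records. The partial-update handling stays
// private, so no extra method leaks into the caller's API.
final class PartialUpdateSourceSketch extends SourceSketch {
  @Override
  List<String> fetchNext(String lastCheckpoint, long sourceLimit) {
    return readPartialRecords(lastCheckpoint);
  }

  private List<String> readPartialRecords(String lastCheckpoint) {
    List<String> batch = new ArrayList<>();
    batch.add("partial:" + lastCheckpoint);
    return batch;
  }
}

// The adapter never branches on the update mode; it only calls fetchNext.
final class AdapterSketch {
  private AdapterSketch() {
  }

  static List<String> fetchNewData(SourceSketch source, String lastCheckpoint, long sourceLimit) {
    return source.fetchNext(lastCheckpoint, sourceLimit);
  }
}
```

With this shape, adding another update mode would mean adding a source implementation rather than a new method on the adapter's call path.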
(hudi) branch master updated: [MINOR] Disable flaky test (#10449)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new b54381ff9e6 [MINOR] Disable flaky test (#10449)
b54381ff9e6 is described below

commit b54381ff9e6d34a41eaf756555c7967eb7146380
Author: Jon Vexler
AuthorDate: Mon Jan 8 13:23:17 2024 -0500

    [MINOR] Disable flaky test (#10449)

    Co-authored-by: Jonathan Vexler <=>
---
 .../src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala
index 38221cc05c7..599e8ae9708 100644
--- a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala
+++ b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala
@@ -40,7 +40,7 @@ import org.apache.spark.sql.functions.{expr, lit}
 import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
 import org.apache.spark.sql.hudi.command.SqlKeyGenerator
 import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, assertNotNull, assertNull, assertTrue, fail}
-import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Disabled, Test}
 import org.junit.jupiter.params.ParameterizedTest
 import org.junit.jupiter.params.provider.Arguments.arguments
 import org.junit.jupiter.params.provider._
@@ -1341,8 +1341,9 @@ def testBulkInsertForDropPartitionColumn(): Unit = {
   /*
    * Test case for instant is generated with commit timezone when TIMELINE_TIMEZONE set to UTC
    * related to HUDI-5978
+   * Issue [HUDI-7275] is tracking this test being disabled
    */
-  @Test
+  @Disabled
   def testInsertDatasetWithTimelineTimezoneUTC(): Unit = {
     val defaultTimezone = TimeZone.getDefault
     try {
Re: [PR] [MINOR] Disable org.apache.hudi.TestHoodieSparkSqlWriter#testInsertDatasetWithTimelineTimezoneUTC [hudi]
nsivabalan merged PR #10449: URL: https://github.com/apache/hudi/pull/10449
Re: [PR] [MINOR] Fixed unit tests [hudi]
bvaradar commented on PR #10362: URL: https://github.com/apache/hudi/pull/10362#issuecomment-1881544848 @yihua: If you are OK with this change, can you land it?
Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]
hudi-bot commented on PR #10460: URL: https://github.com/apache/hudi/pull/10460#issuecomment-1881544202 ## CI report: * e3813a268b2e2270300c4be75e3660594c8f48d2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21865)
Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]
hudi-bot commented on PR #10352: URL: https://github.com/apache/hudi/pull/10352#issuecomment-1881543595 ## CI report: * 0279dd4b1ab59776cbb5024810f5bb6a00fd2164 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21864)
Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]
hudi-bot commented on PR #10460: URL: https://github.com/apache/hudi/pull/10460#issuecomment-1881530821 ## CI report: * e3813a268b2e2270300c4be75e3660594c8f48d2 UNKNOWN
[PR] Add parallel listing of existing partitions [hudi]
VitoMakarevich opened a new pull request, #10460: URL: https://github.com/apache/hudi/pull/10460

### Change Logs

Currently, the sync works as follows (simplified to explain this particular problem):
1. Read the partitions that were upserted (create and update are indistinguishable) or deleted from the commit timeline since the previous sync.
2. Read the existing partitions from Glue.
3. Iterate to work out which AWS call is needed for each changed partition (create/update/delete).

To facilitate this, there are two clients: `AWSGlueCatalogSyncClient` and `HoodieHiveSyncClient`. In the Spark-EMR case, the second relies on AWS's Hive-Glue interface implementation. The code of the current version is not public, but that of an earlier version is; the interesting part is [this](https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/d885a996137b19df6a128cb950ebc43711374e83/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/metastore/DefaultAWSGlueMetastore.java#L301). In short, AWS Glue has a mechanism of [segments](https://docs.aws.amazon.com/glue/latest/webapi/API_Segment.html) that allows long lists to be queried in parallel. I added code similar to what exists in the AWS Hive-Glue implementation.

Our real use case: we had been using `HiveSyncTool` on a table with ~200k partitions, and it was slow when many partitions changed or on a resync. We therefore tried `AWSGlueCatalogSyncClient`, which gave about a 2x boost in write speed, but we then noticed a significant degradation in listing speed (roughly 40 sec vs 210 on about the same number of partitions, 200k). This improvement should bring the same listing speed to the tool.

### Impact

A new config is added, with its default value taken from AWS Glue-Hive (parallelism of 5). Listing was sequential before, so I don't expect failures, but we may set the parallelism to 1 to stay backward-compatible if needed.

### Risk level (write none, low medium or high below)

Low

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

**The website needs to be adjusted; please let me know if you are OK with the changes and I'll proceed.**

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
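The segment-based parallel listing described in this pull request can be sketched as below. It is a minimal illustration, not the PR's code: `SegmentFetcher` is a hypothetical stand-in for one paginated Glue `GetPartitions` call restricted to a single segment (the Glue API identifies a segment by a segment number and a total segment count), and partitions are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SegmentedPartitionLister {

  /** Hypothetical stand-in for one segmented catalog call; not a Hudi or AWS SDK type. */
  interface SegmentFetcher {
    List<String> fetch(int segmentNumber, int totalSegments);
  }

  /** Fetches every segment concurrently and concatenates the results in segment order. */
  static List<String> listAll(SegmentFetcher fetcher, int totalSegments) {
    if (totalSegments < 1) {
      throw new IllegalArgumentException("totalSegments must be at least 1");
    }
    ExecutorService pool = Executors.newFixedThreadPool(totalSegments);
    try {
      List<CompletableFuture<List<String>>> futures = new ArrayList<>();
      for (int i = 0; i < totalSegments; i++) {
        final int segment = i;
        futures.add(CompletableFuture.supplyAsync(
            () -> fetcher.fetch(segment, totalSegments), pool));
      }
      List<String> all = new ArrayList<>();
      for (CompletableFuture<List<String>> future : futures) {
        // join() rethrows a failed segment as an unchecked CompletionException.
        all.addAll(future.join());
      }
      return all;
    } finally {
      pool.shutdown();
    }
  }
}
```

Calling `listAll(fetcher, 1)` degenerates to a single sequential listing, which matches the backward-compatible setting mentioned under Impact.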
Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]
hudi-bot commented on PR #10352: URL: https://github.com/apache/hudi/pull/10352#issuecomment-1881441184 ## CI report: * b5d7c6285d8bc0a0809beebae19a2a789fc55661 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21858) * 0279dd4b1ab59776cbb5024810f5bb6a00fd2164 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21864)
Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]
hudi-bot commented on PR #10352: URL: https://github.com/apache/hudi/pull/10352#issuecomment-1881424054 ## CI report: * b5d7c6285d8bc0a0809beebae19a2a789fc55661 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21858) * 0279dd4b1ab59776cbb5024810f5bb6a00fd2164 UNKNOWN
Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]
vinothchandar commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1444903624

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/PartitionStatsIndexSupport.scala: ##
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.hudi.HoodieConversionUtils.toScalaOption
+import org.apache.hudi.avro.model.{HoodieMetadataColumnStats, HoodieMetadataRecord}
+import org.apache.hudi.client.common.HoodieSparkEngineContext
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.data.HoodieData
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.common.table.HoodieTableMetaClient
+import org.apache.hudi.common.util.ValidationUtils.checkState
+import org.apache.hudi.common.util.hash.ColumnIndexID
+import org.apache.hudi.metadata.{HoodieMetadataPayload, HoodieTableMetadata, HoodieTableMetadataUtil}
+import org.apache.hudi.util.JFunction
+import org.apache.spark.api.java.JavaSparkContext
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.types.StructType
+
+import scala.collection.JavaConverters._
+
+class PartitionStatsIndexSupport(spark: SparkSession,
+                                 tableSchema: StructType,
+                                 @transient metadataConfig: HoodieMetadataConfig,
+                                 @transient metaClient: HoodieTableMetaClient,
+                                 allowCaching: Boolean = false)
+  extends ColumnStatsIndexSupport(spark, tableSchema, metadataConfig, metaClient, allowCaching) {
+
+  @transient private lazy val engineCtx = new HoodieSparkEngineContext(new JavaSparkContext(spark.sparkContext))
+  @transient private lazy val metadataTable: HoodieTableMetadata =
+    HoodieTableMetadata.create(engineCtx, metadataConfig, metaClient.getBasePathV2.toString)
+
+  override def isIndexAvailable: Boolean = {
+    checkState(metadataConfig.enabled, "Metadata Table support has to be enabled")
+    metaClient.getTableConfig.getMetadataPartitions.contains(HoodieTableMetadataUtil.PARTITION_NAME_PARTITION_STATS)
+  }
+
+  override def loadColumnStatsIndexRecords(targetColumns: Seq[String], shouldReadInMemory: Boolean): HoodieData[HoodieMetadataColumnStats] = {
+    checkState(targetColumns.nonEmpty)
+    val encodedTargetColumnNames = targetColumns.map(colName => new ColumnIndexID(colName).asBase64EncodedString())
+    val metadataRecords: HoodieData[HoodieRecord[HoodieMetadataPayload]] =
+      metadataTable.getRecordsByKeyPrefixes(encodedTargetColumnNames.asJava, HoodieTableMetadataUtil.PARTITION_NAME_PARTITION_STATS, shouldReadInMemory)
+    val columnStatsRecords: HoodieData[HoodieMetadataColumnStats] =
+      // NOTE: Explicit conversion is required for Scala 2.11

Review Comment: Side note: I think we can drop support for 2.11?

## hudi-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java: ##
@@ -454,6 +454,51 @@ private static BigDecimal extractDecimal(Object val, DecimalMetadata decimalMeta
     }
   }
+  /**
+   * Aggregate column range statistics across files in a partition.
+   *
+   * @param fileRanges List of column range statistics for each file in a partition
+   */
+  public > HoodieColumnRangeMetadata getColumnRangeInPartition(@Nonnull List> fileRanges) {

Review Comment: This leaks upper-level context (files and ranges) into ParquetUtils. This class ought to be only about reading various things out of Parquet files, the actual columnar file format. Please relocate this method and add unit tests.

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/PartitionStatsIndexTestBase.scala: ##
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may
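For readers following the second review comment, aggregating per-file column ranges into one partition-level range can be sketched as a small helper that lives outside any Parquet utility class. This is an assumption-laden illustration: `ColumnRange` and its fields are hypothetical and are not `HoodieColumnRangeMetadata`, and the merge rule shown (smallest minimum, largest maximum, summed counts) is the usual way to combine range statistics rather than a copy of the PR's implementation.

```java
import java.util.List;

/** Hypothetical per-file column statistics; not the Hudi HoodieColumnRangeMetadata class. */
final class ColumnRange<T extends Comparable<T>> {
  final String columnName;
  final T min;            // null when the file holds only nulls for this column
  final T max;            // null when the file holds only nulls for this column
  final long nullCount;
  final long valueCount;

  ColumnRange(String columnName, T min, T max, long nullCount, long valueCount) {
    this.columnName = columnName;
    this.min = min;
    this.max = max;
    this.nullCount = nullCount;
    this.valueCount = valueCount;
  }

  /** Merges the ranges of all files in a partition into one partition-level range. */
  static <T extends Comparable<T>> ColumnRange<T> merge(List<ColumnRange<T>> fileRanges) {
    if (fileRanges.isEmpty()) {
      throw new IllegalArgumentException("at least one file range is required");
    }
    String name = fileRanges.get(0).columnName;
    T min = null;
    T max = null;
    long nulls = 0;
    long values = 0;
    for (ColumnRange<T> range : fileRanges) {
      if (!name.equals(range.columnName)) {
        throw new IllegalArgumentException("ranges must describe the same column");
      }
      // Keep the smallest minimum and the largest maximum seen so far, skipping nulls.
      if (range.min != null && (min == null || range.min.compareTo(min) < 0)) {
        min = range.min;
      }
      if (range.max != null && (max == null || range.max.compareTo(max) > 0)) {
        max = range.max;
      }
      nulls += range.nullCount;
      values += range.valueCount;
    }
    return new ColumnRange<>(name, min, max, nulls, values);
  }
}
```

Keeping the merge in its own class makes it straightforward to unit test without touching any file format code.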
Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]
vinothchandar commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1444890638

## hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java: ##
@@ -330,6 +330,27 @@ public final class HoodieMetadataConfig extends HoodieConfig {
       .sinceVersion("1.0.0")
       .withDocumentation("Parallelism to use, when generating functional index.");
+  public static final ConfigProperty ENABLE_METADATA_INDEX_PARTITION_STATS = ConfigProperty

Review Comment: Can we turn this on and get all tests to pass?
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1881241846 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 6e1bcd405ba3a0d3edd11560da1e5377a16c7d33 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21863)
Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]
hudi-bot commented on PR #10360: URL: https://github.com/apache/hudi/pull/10360#issuecomment-1881226057 ## CI report: * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN * 1dded8d1f774627a49661bc2d1ca812da3caa540 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21686) * 6e1bcd405ba3a0d3edd11560da1e5377a16c7d33 UNKNOWN
Re: [I] Hudi dbt incremental materialization not working during incremental dbt run with spark [hudi]
ad1happy2go commented on issue #10448: URL: https://github.com/apache/hudi/issues/10448#issuecomment-1881216828 Synced with @jetansi and ran his models in my setup; all of them work fine. So this should not be a Hudi issue but a setup issue. I shared the dbt setup configs I am using, and he will try to fix his setup.
Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]
hudi-bot commented on PR #10459: URL: https://github.com/apache/hudi/pull/10459#issuecomment-1881211302 ## CI report: * e0a43ce9e388b4c8daf83c4ced333f8435de9991 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21862)
Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]
hudi-bot commented on PR #10352: URL: https://github.com/apache/hudi/pull/10352#issuecomment-1881116180 ## CI report: * b5d7c6285d8bc0a0809beebae19a2a789fc55661 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21858)
Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]
hudi-bot commented on PR #10426: URL: https://github.com/apache/hudi/pull/10426#issuecomment-1881101054 ## CI report: * 30f390339ad03845b0068256f7938155cf9e08e9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21861)
Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]
hudi-bot commented on PR #10459: URL: https://github.com/apache/hudi/pull/10459#issuecomment-1881101296 ## CI report: * 889a89640b0db39545469a625c0b961829f0aa0a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21860) * e0a43ce9e388b4c8daf83c4ced333f8435de9991 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21862)
Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]
hudi-bot commented on PR #10352: URL: https://github.com/apache/hudi/pull/10352#issuecomment-1881100486 ## CI report: * b5d7c6285d8bc0a0809beebae19a2a789fc55661 UNKNOWN
Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]
hudi-bot commented on PR #10459: URL: https://github.com/apache/hudi/pull/10459#issuecomment-1881023369 ## CI report: * 889a89640b0db39545469a625c0b961829f0aa0a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21860) * e0a43ce9e388b4c8daf83c4ced333f8435de9991 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21862)
Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]
hudi-bot commented on PR #10459: URL: https://github.com/apache/hudi/pull/10459#issuecomment-1881009514 ## CI report: * 889a89640b0db39545469a625c0b961829f0aa0a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21860) * e0a43ce9e388b4c8daf83c4ced333f8435de9991 UNKNOWN
Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]
waitingF commented on code in PR #10459: URL: https://github.com/apache/hudi/pull/10459#discussion_r1444569349

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ##
@@ -353,6 +353,18 @@ public class HoodieWriteConfig extends HoodieConfig {
       .markAdvanced()
       .withDocumentation("Size of in-memory buffer used for parallelizing network reads and lake storage writes.");
+  public static final ConfigProperty WRITE_BUFFER_RECORD_SAMPLING_RATE = ConfigProperty
+      .key("hoodie.write.buffer.record.sampling.rate")
+      .defaultValue(String.valueOf(64))
+      .markAdvanced()

Review Comment: Sure, done.
Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]
hudi-bot commented on PR #10426: URL: https://github.com/apache/hudi/pull/10426#issuecomment-1880929710 ## CI report: * 1a4507ec8f1ef19381266816fd8dd58e9b73abdc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21856) * 30f390339ad03845b0068256f7938155cf9e08e9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21861)
Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]
hudi-bot commented on PR #10426: URL: https://github.com/apache/hudi/pull/10426#issuecomment-1880917599 ## CI report: * 1a4507ec8f1ef19381266816fd8dd58e9b73abdc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21856) * 30f390339ad03845b0068256f7938155cf9e08e9 UNKNOWN
Re: [PR] [HUDI-7278] make bloom filter skippable for CPU saving [hudi]
hudi-bot commented on PR #10457: URL: https://github.com/apache/hudi/pull/10457#issuecomment-1880899016 ## CI report: * 7c668bbb0b7cafeb9b6c4d302d6154c91beb366e Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21859)
Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]
beyond1920 commented on code in PR #10426: URL: https://github.com/apache/hudi/pull/10426#discussion_r1444512704

## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java: ##
@@ -971,6 +952,59 @@ public void alterPartitionColumnStatistics(
     throw new HoodieCatalogException("Not supported.");
   }
+  public boolean isUpdatePermissible(ObjectPath tablePath, CatalogBaseTable newCatalogTable, boolean ignoreIfNotExists) throws TableNotExistException {
+    if (!newCatalogTable.getOptions().getOrDefault(CONNECTOR.key(), "").equalsIgnoreCase("hudi")) {
+      throw new HoodieCatalogException(String.format("The %s is not hoodie table", tablePath.getObjectName()));
+    }
+    if (newCatalogTable instanceof CatalogView) {
+      throw new HoodieCatalogException("Hoodie catalog does not support to ALTER VIEW");
+    }
+
+    try {
+      Table hiveTable = getHiveTable(tablePath);
+      if (!sameOptions(hiveTable.getParameters(), newCatalogTable.getOptions(), FlinkOptions.TABLE_TYPE)
+          || !sameOptions(hiveTable.getParameters(), newCatalogTable.getOptions(), FlinkOptions.INDEX_TYPE)) {

Review Comment: Done
Re: [I] [SUPPORT] HUDI baseFile is empty String and this causes IllegalArgumentException [hudi]
ad1happy2go commented on issue #10458: URL: https://github.com/apache/hudi/issues/10458#issuecomment-1880868263 @nicholasxu Thanks for raising this. I am also getting this error when querying with both 'read.streaming.enabled' and 'cdc.enabled' set to true. Normal reads run fine. We will look into it.
Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]
hudi-bot commented on PR #10459: URL: https://github.com/apache/hudi/pull/10459#issuecomment-1880838207 ## CI report: * 889a89640b0db39545469a625c0b961829f0aa0a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21860)
Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]
hudi-bot commented on PR #10459: URL: https://github.com/apache/hudi/pull/10459#issuecomment-1880827643 ## CI report: * 889a89640b0db39545469a625c0b961829f0aa0a UNKNOWN
Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]
codope commented on code in PR #10459: URL: https://github.com/apache/hudi/pull/10459#discussion_r148100

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ##
@@ -353,6 +353,18 @@ public class HoodieWriteConfig extends HoodieConfig {
       .markAdvanced()
       .withDocumentation("Size of in-memory buffer used for parallelizing network reads and lake storage writes.");
+  public static final ConfigProperty WRITE_BUFFER_RECORD_SAMPLING_RATE = ConfigProperty
+      .key("hoodie.write.buffer.record.sampling.rate")
+      .defaultValue(String.valueOf(64))
+      .markAdvanced()

Review Comment: Please also add `.sinceVersion("1.0.0")` for both configs.
Re: [PR] [HUDI-7278] make bloom filter skippable for CPU saving [hudi]
waitingF commented on code in PR #10457: URL: https://github.com/apache/hudi/pull/10457#discussion_r135982

## hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroFileWriterFactory.java: ##
@@ -51,7 +51,7 @@ protected HoodieFileWriter newParquetFileWriter(
     String instantTime, Path path, Configuration conf, HoodieConfig config, Schema schema, TaskContextSupplier taskContextSupplier) throws IOException {
     boolean populateMetaFields = config.getBooleanOrDefault(HoodieTableConfig.POPULATE_META_FIELDS);
-    boolean enableBloomFilter = populateMetaFields;
+    boolean enableBloomFilter = populateMetaFields && config.getBooleanOrDefault(HoodieStorageConfig.PARQUET_WITH_BLOOM_FILTER_ENABLED);

Review Comment: Nice advice, will adjust
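The changed line in the diff above combines two boolean settings, each falling back to a default when unset. A minimal stand-alone sketch of that pattern follows, using `java.util.Properties` in place of `HoodieConfig`. The two property keys and the default of `true` for both are placeholders chosen for illustration; they are assumptions, not the actual Hudi config keys or defaults.

```java
import java.util.Properties;

final class BloomFilterToggle {
  // Placeholder keys for illustration only; not the real Hudi config keys.
  static final String POPULATE_META_FIELDS_KEY = "example.populate.meta.fields";
  static final String BLOOM_FILTER_ENABLED_KEY = "example.parquet.bloom.filter.enabled";

  /** Reads a boolean property, falling back to the given default when the key is absent. */
  static boolean getBooleanOrDefault(Properties props, String key, boolean defaultValue) {
    String raw = props.getProperty(key);
    return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
  }

  /** The bloom filter is written only when meta fields are populated and the toggle is on. */
  static boolean enableBloomFilter(Properties props) {
    boolean populateMetaFields = getBooleanOrDefault(props, POPULATE_META_FIELDS_KEY, true);
    boolean bloomFilterEnabled = getBooleanOrDefault(props, BLOOM_FILTER_ENABLED_KEY, true);
    return populateMetaFields && bloomFilterEnabled;
  }
}
```

With both defaults assumed to be `true`, existing writers keep their behaviour unless the new toggle is explicitly switched off.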