[GitHub] [hudi] hudi-bot commented on pull request #4913: [WIP][HUDI-1517] create marker file for every log file
hudi-bot commented on pull request #4913: URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051787441 ## CI report: * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341) * 6816a4b47b88108172b46fece160e4e078345687 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6345) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4913: [WIP][HUDI-1517] create marker file for every log file
hudi-bot removed a comment on pull request #4913: URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051783953 ## CI report: * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341) * 6816a4b47b88108172b46fece160e4e078345687 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4913: [WIP][HUDI-1517] create marker file for every log file
hudi-bot removed a comment on pull request #4913: URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051595547 ## CI report: * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4913: [WIP][HUDI-1517] create marker file for every log file
hudi-bot commented on pull request #4913: URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051783953 ## CI report: * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341) * 6816a4b47b88108172b46fece160e4e078345687 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink
hudi-bot removed a comment on pull request #4909: URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051523638 ## CI report: * 33493276318c24c12d2e78ed719b0cb794c8b656 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6335) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink
hudi-bot commented on pull request #4909: URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051780284 ## CI report: * 33493276318c24c12d2e78ed719b0cb794c8b656 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6335) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6344) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot commented on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051780334 ## CI report: * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342) * 758d417cc8f02537d8174f19c904c062b0873646 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6343) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot removed a comment on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051759517 ## CI report: * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342) * 758d417cc8f02537d8174f19c904c062b0873646 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] cuibo01 commented on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink
cuibo01 commented on pull request #4909: URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051779248 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot commented on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051759517 ## CI report: * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342) * 758d417cc8f02537d8174f19c904c062b0873646 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot removed a comment on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051756102 ## CI report: * e909b66fb05a4cdad405b144b041554f45664d3e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340) * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342) * 758d417cc8f02537d8174f19c904c062b0873646 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot removed a comment on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051704555 ## CI report: * e909b66fb05a4cdad405b144b041554f45664d3e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340) * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot commented on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051756102 ## CI report: * e909b66fb05a4cdad405b144b041554f45664d3e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340) * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342) * 758d417cc8f02537d8174f19c904c062b0873646 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot commented on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051704555 ## CI report: * e909b66fb05a4cdad405b144b041554f45664d3e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340) * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot removed a comment on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051701135 ## CI report: * e909b66fb05a4cdad405b144b041554f45664d3e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340) * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot commented on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051701135 ## CI report: * e909b66fb05a4cdad405b144b041554f45664d3e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340) * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot removed a comment on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051572873 ## CI report: * e909b66fb05a4cdad405b144b041554f45664d3e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4913: [WIP][HUDI-1517] create marker file for every log file
hudi-bot commented on pull request #4913: URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051595547 ## CI report: * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4913: [WIP][HUDI-1517] create marker file for every log file
hudi-bot removed a comment on pull request #4913: URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051565083 ## CI report: * ea1621d1d17e2c85fe9f69f6b39aaa08f61871d7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6328) * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot commented on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051572873 ## CI report: * e909b66fb05a4cdad405b144b041554f45664d3e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot removed a comment on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051543590 ## CI report: * 4e04e2076296e16e1b5b60b510ed85d536873a93 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6316) * e909b66fb05a4cdad405b144b041554f45664d3e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] XuQianJin-Stars commented on pull request #4901: [HUDI-3445] Supporting Clustering Command Based on Call Procedure Command for Spark SQL
XuQianJin-Stars commented on pull request #4901: URL: https://github.com/apache/hudi/pull/4901#issuecomment-1051567205
> @XuQianJin-Stars do you have some time to review this PR? Thanks. Maybe we can add this to [HUDI-3161](https://issues.apache.org/jira/browse/HUDI-3161).
Sure, let me review this PR.
[GitHub] [hudi] stayrascal commented on a change in pull request #4724: [HUDI-2815] add partial overwrite payload to support partial overwrit…
stayrascal commented on a change in pull request #4724: URL: https://github.com/apache/hudi/pull/4724#discussion_r815265938
## File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java ##
@@ -58,6 +58,31 @@ default T preCombine(T oldValue, Properties properties) {
     return preCombine(oldValue);
   }
 
+  /**
+   *When more than one HoodieRecord have the same HoodieKey in the incoming batch, this function combines them before attempting to insert/upsert by taking in a property map.
+   *
+   * @param oldValue instance of the old {@link HoodieRecordPayload} to be combined with.
+   * @param properties Payload related properties. For example pass the ordering field(s) name to extract from value in storage.
+   * @param schema Schema used for record
+   * @return the combined value
+   */
+  @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
+  default T preCombine(T oldValue, Properties properties, Schema schema) {
Review comment: BTW, thanks a lot for your time. I will ping you on Slack.
[GitHub] [hudi] stayrascal commented on a change in pull request #4724: [HUDI-2815] add partial overwrite payload to support partial overwrit…
stayrascal commented on a change in pull request #4724: URL: https://github.com/apache/hudi/pull/4724#discussion_r815265857
## File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java ##
@@ -58,6 +58,31 @@ default T preCombine(T oldValue, Properties properties) {
     return preCombine(oldValue);
   }
 
+  /**
+   *When more than one HoodieRecord have the same HoodieKey in the incoming batch, this function combines them before attempting to insert/upsert by taking in a property map.
+   *
+   * @param oldValue instance of the old {@link HoodieRecordPayload} to be combined with.
+   * @param properties Payload related properties. For example pass the ordering field(s) name to extract from value in storage.
+   * @param schema Schema used for record
+   * @return the combined value
+   */
+  @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
+  default T preCombine(T oldValue, Properties properties, Schema schema) {
Review comment: Hi @alexeykudinkin, thanks a lot for your detailed clarification.
1. Regarding the design of `preCombine`, I'm clear now. I'm sorry, I don't know the details of RFC-46, and I couldn't find a link to RFC-46 from [here](https://cwiki.apache.org/confluence/display/HUDI/RFC+Process). Could you please share the link?
2. Regarding the requirements for partial updates/overwrites, I have seen similar requirements from the community. In my case, we want to build a customer profile with multiple attributes that may come from different systems. One system might provide only some of the attributes in an event/record, and two systems might send events/records with different attributes. We should not simply choose the most recent record; we need to merge them before writing to disk. Otherwise, we have to keep all the change logs and then start a new job to dedup and merge these attributes across the change logs.
For example, suppose we have 10 attributes a1-a10 (all of them optional), where source system A only has a1-a5 and source system B only has a6-a10. The result we expect is a final record containing a1-a10, not only a1-a5 or a6-a10. And because we might receive the two events/records at the same time, they might land in the same batch; that's why we want to merge them before `combineAndGetUpdateValue`.
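The a1-a10 example above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in this pull request and not Hudi's `HoodieRecordPayload` API: records are modeled as plain maps from attribute name to value, and the names `PartialMergeSketch`, `mergeNonNull`, and `partialRecord` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: hypothetical names, not part of Hudi. */
final class PartialMergeSketch {

    /**
     * Merges two partial records field by field: a non-null value in
     * {@code newer} wins, otherwise the value from {@code older} is kept.
     */
    static Map<String, Object> mergeNonNull(Map<String, Object> older, Map<String, Object> newer) {
        Map<String, Object> merged = new LinkedHashMap<>(older);
        for (Map.Entry<String, Object> e : newer.entrySet()) {
            if (e.getValue() != null) {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    /** Builds a record with attributes a1..a10, setting only a[from]..a[to]; the rest stay null. */
    static Map<String, Object> partialRecord(String source, int from, int to) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (int i = 1; i <= 10; i++) {
            record.put("a" + i, (i >= from && i <= to) ? source + "-" + i : null);
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> fromA = partialRecord("A", 1, 5);   // system A only knows a1-a5
        Map<String, Object> fromB = partialRecord("B", 6, 10);  // system B only knows a6-a10
        // Both records arrive in the same batch, so they are merged before any write.
        System.out.println(mergeNonNull(fromA, fromB));
    }
}
```

Choosing only the most recent record would keep a6-a10 and lose a1-a5; the field-wise merge keeps all ten attributes, which is the behavior asked for in the comment.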
[GitHub] [hudi] hudi-bot commented on pull request #4913: [WIP][HUDI-1517] create marker file for every log file
hudi-bot commented on pull request #4913: URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051565083 ## CI report: * ea1621d1d17e2c85fe9f69f6b39aaa08f61871d7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6328) * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4913: [WIP][HUDI-1517] create marker file for every log file
hudi-bot removed a comment on pull request #4913: URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051564339 ## CI report: * ea1621d1d17e2c85fe9f69f6b39aaa08f61871d7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6328) * 875ec8b00cd379e669498fe7575503b192f0de5e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4913: [WIP][HUDI-1517] create marker file for every log file
hudi-bot commented on pull request #4913: URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051564339 ## CI report: * ea1621d1d17e2c85fe9f69f6b39aaa08f61871d7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6328) * 875ec8b00cd379e669498fe7575503b192f0de5e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4913: [WIP][HUDI-1517] create marker file for every log file
hudi-bot removed a comment on pull request #4913: URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051223344 ## CI report: * ea1621d1d17e2c85fe9f69f6b39aaa08f61871d7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6328) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot removed a comment on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051543457 ## CI report: * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337) * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN * 2c85afe1c7a84f3e7d55a8111bc6a6e9a0214c16 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6339) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot commented on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051558049 ## CI report: * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN * 2c85afe1c7a84f3e7d55a8111bc6a6e9a0214c16 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6339) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao removed a comment on pull request #4901: [HUDI-3445] Supporting Clustering Command Based on Call Procedure Command for Spark SQL
xiarixiaoyao removed a comment on pull request #4901: URL: https://github.com/apache/hudi/pull/4901#issuecomment-1051549434 @XuQianJin-Stars do you have some time to review this PR? Thanks. Maybe we can add this to HUDI-3161.
[GitHub] [hudi] hudi-bot commented on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS
hudi-bot commented on pull request #4739: URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051552922 ## CI report: * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6338) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS
hudi-bot removed a comment on pull request #4739: URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051495717 ## CI report: * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6338) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on pull request #4901: [HUDI-3445] Supporting Clustering Command Based on Call Procedure Command for Spark SQL
xiarixiaoyao commented on pull request #4901: URL: https://github.com/apache/hudi/pull/4901#issuecomment-1051550139 @huberylee Thank you very much for contributing. Please fix the CI build. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on pull request #4901: [HUDI-3445] Supporting Clustering Command Based on Call Procedure Command for Spark SQL
xiarixiaoyao commented on pull request #4901: URL: https://github.com/apache/hudi/pull/4901#issuecomment-1051549434 @XuQianJin-Stars do you have some time to review this PR? Thanks. Maybe we can add this to HUDI-3161. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on pull request #4901: [HUDI-3445] Supporting Clustering Command Based on Call Procedure Command for Spark SQL
xiarixiaoyao commented on pull request #4901: URL: https://github.com/apache/hudi/pull/4901#issuecomment-1051549041 @XuQianJin-Stars do you have some time to review this PR? Thanks. Maybe we can add this to HUDI-3161. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot commented on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051543457 ## CI report: * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337) * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN * 2c85afe1c7a84f3e7d55a8111bc6a6e9a0214c16 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6339) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot commented on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051543590 ## CI report: * 4e04e2076296e16e1b5b60b510ed85d536873a93 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6316) * e909b66fb05a4cdad405b144b041554f45664d3e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot removed a comment on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051499404 ## CI report: * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337) * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN * 2c85afe1c7a84f3e7d55a8111bc6a6e9a0214c16 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot removed a comment on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051540581 ## CI report: * 4e04e2076296e16e1b5b60b510ed85d536873a93 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6316) * e909b66fb05a4cdad405b144b041554f45664d3e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot commented on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051540581 ## CI report: * 4e04e2076296e16e1b5b60b510ed85d536873a93 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6316) * e909b66fb05a4cdad405b144b041554f45664d3e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark
hudi-bot removed a comment on pull request #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1050966417 ## CI report: * 4e04e2076296e16e1b5b60b510ed85d536873a93 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6316) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink
hudi-bot commented on pull request #4909: URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051523638 ## CI report: * 33493276318c24c12d2e78ed719b0cb794c8b656 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6335) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink
hudi-bot removed a comment on pull request #4909: URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051437141 ## CI report: * 8e1f10b144a87e8e262c19d1e408cdb4829c8fcf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6310) * 33493276318c24c12d2e78ed719b0cb794c8b656 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6335) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4911: [HUDI-3460]Flink TM memory Optimization
hudi-bot commented on pull request #4911: URL: https://github.com/apache/hudi/pull/4911#issuecomment-1051516503 ## CI report: * f275704b3dc2fbe99be692bc8c4d2cef383664f0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6336) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4911: [HUDI-3460]Flink TM memory Optimization
hudi-bot removed a comment on pull request #4911: URL: https://github.com/apache/hudi/pull/4911#issuecomment-1051437185 ## CI report: * 17d293ebacd7125656795cc04815e5bd41a1666a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6313) * f275704b3dc2fbe99be692bc8c4d2cef383664f0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6336) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on a change in pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
xushiyan commented on a change in pull request #4866: URL: https://github.com/apache/hudi/pull/4866#discussion_r815258027

## File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
@@ -269,25 +300,25 @@ public static GenericRecord generateGenericRecord(String rowKey, String partitio
     rec.put("partition_path", partitionPath);
     rec.put("rider", riderName);
     rec.put("driver", driverName);
-    rec.put("begin_lat", RAND.nextDouble());
-    rec.put("begin_lon", RAND.nextDouble());
-    rec.put("end_lat", RAND.nextDouble());
-    rec.put("end_lon", RAND.nextDouble());
+    rec.put("begin_lat", r.nextDouble());
+    rec.put("begin_lon", r.nextDouble());
+    rec.put("end_lat", r.nextDouble());
+    rec.put("end_lon", r.nextDouble());
     if (isFlattened) {
-      rec.put("fare", RAND.nextDouble() * 100);
+      rec.put("fare", r.nextDouble() * 100);
       rec.put("currency", "USD");
     } else {
-      rec.put("distance_in_meters", RAND.nextInt());
-      rec.put("seconds_since_epoch", RAND.nextLong());
-      rec.put("weight", RAND.nextFloat());
+      rec.put("distance_in_meters", r.nextInt());
+      rec.put("seconds_since_epoch", r.nextLong());
+      rec.put("weight", r.nextFloat());
       byte[] bytes = "Canada".getBytes();
       rec.put("nation", ByteBuffer.wrap(bytes));
-      long currentTimeMillis = System.currentTimeMillis();
-      Date date = new Date(currentTimeMillis);
+      long randomMillis = genRandomTimeMillis(r);
+      Date date = new Date(randomMillis);

Review comment: It's probably LocalDate we need

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
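The review comment above suggests `LocalDate` in place of `java.util.Date`. A minimal sketch of that conversion follows, assuming the generated value is epoch milliseconds and that UTC is an acceptable zone; the class and method names are hypothetical and do not come from the pull request.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper, not code from the pull request: converts epoch
// milliseconds to a calendar date without going through java.util.Date.
class MillisToLocalDateSketch {

    // Assumption: UTC is used so the result does not depend on the
    // default time zone of the machine running the tests.
    static LocalDate toUtcLocalDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneOffset.UTC)
                .toLocalDate();
    }

    public static void main(String[] args) {
        System.out.println(toUtcLocalDate(0L));                  // 1970-01-01
        System.out.println(toUtcLocalDate(1_577_836_800_000L)); // 2020-01-01
    }
}
```

Pinning the zone explicitly matters for reproducibility: `ZoneId.systemDefault()` could map the same millisecond value to different dates on different build machines.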
[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot removed a comment on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051483851 ## CI report: * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337) * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot commented on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051499404 ## CI report: * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337) * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN * 2c85afe1c7a84f3e7d55a8111bc6a6e9a0214c16 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS
hudi-bot commented on pull request #4739: URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051495717 ## CI report: * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6338) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS
hudi-bot removed a comment on pull request #4739: URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051413537 ## CI report: * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] alexeykudinkin commented on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS
alexeykudinkin commented on pull request #4739: URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051494793 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
alexeykudinkin commented on a change in pull request #4866: URL: https://github.com/apache/hudi/pull/4866#discussion_r815255941

## File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
@@ -269,25 +300,25 @@ public static GenericRecord generateGenericRecord(String rowKey, String partitio
     rec.put("partition_path", partitionPath);
     rec.put("rider", riderName);
     rec.put("driver", driverName);
-    rec.put("begin_lat", RAND.nextDouble());
-    rec.put("begin_lon", RAND.nextDouble());
-    rec.put("end_lat", RAND.nextDouble());
-    rec.put("end_lon", RAND.nextDouble());
+    rec.put("begin_lat", r.nextDouble());
+    rec.put("begin_lon", r.nextDouble());
+    rec.put("end_lat", r.nextDouble());
+    rec.put("end_lon", r.nextDouble());
     if (isFlattened) {
-      rec.put("fare", RAND.nextDouble() * 100);
+      rec.put("fare", r.nextDouble() * 100);
       rec.put("currency", "USD");
     } else {
-      rec.put("distance_in_meters", RAND.nextInt());
-      rec.put("seconds_since_epoch", RAND.nextLong());
-      rec.put("weight", RAND.nextFloat());
+      rec.put("distance_in_meters", r.nextInt());
+      rec.put("seconds_since_epoch", r.nextLong());
+      rec.put("weight", r.nextFloat());
       byte[] bytes = "Canada".getBytes();
       rec.put("nation", ByteBuffer.wrap(bytes));
-      long currentTimeMillis = System.currentTimeMillis();
-      Date date = new Date(currentTimeMillis);
+      long randomMillis = genRandomTimeMillis(r);
+      Date date = new Date(randomMillis);

Review comment: Yeah, no problem. Which one are you referring to? There's no such thing as `DateTime` in java.time

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
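The diff above swaps a shared `RAND` and `System.currentTimeMillis()` for values drawn from a `Random` instance `r` that is passed in. The sketch below illustrates why that makes generated data reproducible. It is an illustration only: the class name, the time bounds, and `randomTimeMillis` are hypothetical, and the actual body of `genRandomTimeMillis` is not shown in the message, so this is not a reconstruction of it.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch, not code from the pull request.
class SeededRecordSketch {
    // Assumed bounds, chosen only for illustration.
    static final long MIN_MILLIS = 1_577_836_800_000L; // 2020-01-01T00:00:00Z
    static final long MAX_MILLIS = 1_609_459_200_000L; // 2021-01-01T00:00:00Z

    // Draws a timestamp in [MIN_MILLIS, MAX_MILLIS) from the supplied generator
    // instead of reading the wall clock.
    static long randomTimeMillis(Random r) {
        long span = MAX_MILLIS - MIN_MILLIS;
        return MIN_MILLIS + (long) (r.nextDouble() * span);
    }

    // Mirrors the four coordinate fields in the diff: begin/end lat/lon.
    static double[] randomCoordinates(Random r) {
        return new double[] {r.nextDouble(), r.nextDouble(), r.nextDouble(), r.nextDouble()};
    }

    public static void main(String[] args) {
        // Two generators with the same seed yield the same sequence of values.
        Random first = new Random(42L);
        Random second = new Random(42L);
        boolean sameCoords = Arrays.equals(randomCoordinates(first), randomCoordinates(second));
        boolean sameMillis = randomTimeMillis(first) == randomTimeMillis(second);
        System.out.println("coordinates match: " + sameCoords); // true
        System.out.println("timestamps match: " + sameMillis);  // true
    }
}
```

Because `java.util.Random` is deterministic for a given seed, a test that fixes the seed sees the same records on every run, whereas a value taken from `System.currentTimeMillis()` differs between runs.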
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4818: [HUDI-3396] Make sure `BaseFileOnlyViewRelation` only reads projected columns
alexeykudinkin commented on a change in pull request #4818: URL: https://github.com/apache/hudi/pull/4818#discussion_r815255066

## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java
@@ -348,6 +355,21 @@ protected HoodieWriteConfig getConfig(Boolean autoCommit, Boolean rollbackUsingM
         .withRollbackUsingMarkers(rollbackUsingMarkers);
   }

+  protected Dataset toDataset(List records) {

Review comment: Good call

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileScanRDD.scala
@@ -18,56 +18,37 @@
 package org.apache.hudi

-import org.apache.spark.{Partition, TaskContext}
-import org.apache.spark.rdd.RDD
-import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.execution.QueryExecutionException
-import org.apache.spark.sql.{Row, SparkSession}
 import org.apache.spark.sql.execution.datasources.{FilePartition, PartitionedFile, SchemaColumnConvertNotSupportedException}
-import org.apache.spark.sql.types.StructType
+import org.apache.spark.{Partition, TaskContext}

 /**
- * Similar to [[org.apache.spark.sql.execution.datasources.FileScanRDD]].
- *
- * This class will extract the fields needed according to [[requiredColumns]] and
- * return iterator of [[org.apache.spark.sql.Row]] directly.
+ * TODO eval if we actually need it

Review comment: This would be cleaned up in a stacked on PR

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileScanRDD.scala
@@ -93,17 +74,8 @@ class HoodieFileScanRDD(
     // Register an on-task-completion callback to close the input stream.
     context.addTaskCompletionListener[Unit](_ => iterator.close())

-    // extract required columns from row
-    val iterAfterExtract = HoodieDataSourceHelper.extractRequiredSchema(

Review comment: This utility itself would be cleaned up in a stacked PR

## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala
@@ -0,0 +1,293 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.avro.Schema
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, DefaultSource, HoodieBaseRelation, HoodieSparkUtils, HoodieUnsafeRDD}
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.common.testutils.{HadoopMapRedUtils, HoodieTestDataGenerator}
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.keygen.NonpartitionedKeyGenerator
+import org.apache.hudi.testutils.SparkClientFunctionalTestHarness
+import org.apache.parquet.hadoop.util.counters.BenchmarkCounter
+import org.apache.spark.HoodieUnsafeRDDUtils
+import org.apache.spark.sql.{Dataset, Row, SaveMode}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{Tag, Test}
+
+import scala.collection.JavaConverters._
+
+@Tag("functional")
+class TestParquetColumnProjection extends SparkClientFunctionalTestHarness {
+
+  val defaultWriteOpts = Map(
+    "hoodie.insert.shuffle.parallelism" -> "4",
+    "hoodie.upsert.shuffle.parallelism" -> "4",
+    "hoodie.bulkinsert.shuffle.parallelism" -> "2",
+    "hoodie.delete.shuffle.parallelism" -> "1",
+    DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+    DataSourceWriteOptions.PRECOMBINE_FIELD.key -> "timestamp",
+    HoodieWriteConfig.TBL_NAME.key -> "hoodie_test",
+    HoodieMetadataConfig.ENABLE.key -> "true",
+    // NOTE: It's critical that we use non-partitioned table, since the way we track amount of bytes read
+    // is not robust, and works most reliably only when we read just a single file. As such, making table
+    // non-partitioned makes it much more likely just a single file will be written
+    DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key ->
[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot removed a comment on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051481439 ## CI report: * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251) * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337) * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot commented on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051483851 ## CI report: * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337) * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot commented on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051481439 ## CI report: * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251) * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337) * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot removed a comment on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051475251 ## CI report: * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251) * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot removed a comment on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051472573 ## CI report: * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251) * 19e5414e6de587e6c941e818e6961a96057b5e7f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot commented on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051475251 ## CI report: * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251) * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot removed a comment on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1049425839 ## CI report: * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds
hudi-bot commented on pull request #4866: URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051472573 ## CI report: * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251) * 19e5414e6de587e6c941e818e6961a96057b5e7f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-347) Fix TestHoodieClientOnCopyOnWriteStorage Tests with modular private methods
[ https://issues.apache.org/jira/browse/HUDI-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498382#comment-17498382 ]

sivabalan narayanan commented on HUDI-347:
--
yes, makes sense. please go ahead.

> Fix TestHoodieClientOnCopyOnWriteStorage Tests with modular private methods
> ---
>
> Key: HUDI-347
> URL: https://issues.apache.org/jira/browse/HUDI-347
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Testing, writer-core
> Reporter: sivabalan narayanan
> Assignee: Rajesh
> Priority: Major
> Labels: new-to-hudi
> Original Estimate: 24h
> Remaining Estimate: 24h
>

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HUDI-3409) Expose Timeline Server Metrics
[ https://issues.apache.org/jira/browse/HUDI-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498381#comment-17498381 ]

sivabalan narayanan commented on HUDI-3409:
---
yes, sure makes sense.

> Expose Timeline Server Metrics
> --
>
> Key: HUDI-3409
> URL: https://issues.apache.org/jira/browse/HUDI-3409
> Project: Apache Hudi
> Issue Type: Improvement
> Components: timeline-server
> Reporter: DarAmani Swift
> Assignee: Rajesh
> Priority: Major
> Labels: new-to-hudi
>
> Timeline server metrics are pushed to local registry but never going to
> reporters. Exposing these metrics would greatly improve debugging latency
> around async processes and timeline server syncs.
> Metrics are already captured in the [Request
> Handler|https://github.com/apache/hudi/blob/master/hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/RequestHandler.java#L527-L531]
>
[GitHub] [hudi] hudi-bot removed a comment on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink
hudi-bot removed a comment on pull request #4909: URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051434239 ## CI report: * 8e1f10b144a87e8e262c19d1e408cdb4829c8fcf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6310) * 33493276318c24c12d2e78ed719b0cb794c8b656 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4911: [HUDI-3460]Flink TM memory Optimization
hudi-bot commented on pull request #4911: URL: https://github.com/apache/hudi/pull/4911#issuecomment-1051437185 ## CI report: * 17d293ebacd7125656795cc04815e5bd41a1666a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6313) * f275704b3dc2fbe99be692bc8c4d2cef383664f0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6336) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink
hudi-bot commented on pull request #4909: URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051437141 ## CI report: * 8e1f10b144a87e8e262c19d1e408cdb4829c8fcf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6310) * 33493276318c24c12d2e78ed719b0cb794c8b656 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6335) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4911: [HUDI-3460]Flink TM memory Optimization
hudi-bot removed a comment on pull request #4911: URL: https://github.com/apache/hudi/pull/4911#issuecomment-1051434289 ## CI report: * 17d293ebacd7125656795cc04815e5bd41a1666a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6313) * f275704b3dc2fbe99be692bc8c4d2cef383664f0 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4911: [HUDI-3460]Flink TM memory Optimization
hudi-bot removed a comment on pull request #4911: URL: https://github.com/apache/hudi/pull/4911#issuecomment-1050892174 ## CI report: * 17d293ebacd7125656795cc04815e5bd41a1666a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6313) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4911: [HUDI-3460]Flink TM memory Optimization
hudi-bot commented on pull request #4911: URL: https://github.com/apache/hudi/pull/4911#issuecomment-1051434289 ## CI report: * 17d293ebacd7125656795cc04815e5bd41a1666a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6313) * f275704b3dc2fbe99be692bc8c4d2cef383664f0 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink
hudi-bot removed a comment on pull request #4909: URL: https://github.com/apache/hudi/pull/4909#issuecomment-1050837761 ## CI report: * 8e1f10b144a87e8e262c19d1e408cdb4829c8fcf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6310) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink
hudi-bot commented on pull request #4909: URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051434239 ## CI report: * 8e1f10b144a87e8e262c19d1e408cdb4829c8fcf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6310) * 33493276318c24c12d2e78ed719b0cb794c8b656 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] allenxyang commented on a change in pull request #4679: [HUDI-3315] RFC-35 Part-1 Support bucket index in Flink writer
allenxyang commented on a change in pull request #4679:
URL: https://github.com/apache/hudi/pull/4679#discussion_r814511291

## File path: hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java ##
@@ -189,21 +191,32 @@
   }
   public static DataStream hoodieStreamWrite(Configuration conf, int defaultParallelism, DataStream dataStream) {
-    WriteOperatorFactory operatorFactory = StreamWriteOperator.getFactory(conf);
-    return dataStream
+    if (OptionsResolver.isBucketIndexType(conf)) {

Review comment:
After applying this patch, I get the following error, and I don't know why:

Caused by: java.lang.RuntimeException: The timer service has not been initialized.
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.getInternalTimerService(AbstractStreamOperator.java:616)
    at org.apache.flink.streaming.api.operators.KeyedProcessOperator.open(KeyedProcessOperator.java:62)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:428)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$2(StreamTask.java:555)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:545)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:585)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:765)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:580)
[GitHub] [hudi] hudi-bot commented on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9
hudi-bot commented on pull request #4907: URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051421561 ## CI report: * 55db32bf4b6aa3796be90879815328ae376dd606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6334) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9
hudi-bot removed a comment on pull request #4907: URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051371368 ## CI report: * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331) * 55db32bf4b6aa3796be90879815328ae376dd606 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6334) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS
hudi-bot commented on pull request #4739: URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051413537 ## CI report: * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS
hudi-bot removed a comment on pull request #4739: URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051369841 ## CI report: * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN * 2790e24a229e808602113c7ed80932b09e56c8fd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6255) * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] nsivabalan commented on a change in pull request #4818: [HUDI-3396] Make sure `BaseFileOnlyViewRelation` only reads projected columns
nsivabalan commented on a change in pull request #4818:
URL: https://github.com/apache/hudi/pull/4818#discussion_r815229255

## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java ##
@@ -348,6 +355,21 @@ protected HoodieWriteConfig getConfig(Boolean autoCommit, Boolean rollbackUsingM
         .withRollbackUsingMarkers(rollbackUsingMarkers);
   }
+  protected Dataset toDataset(List records) {

Review comment:
Can we take in the Avro schema as an argument? Maybe create another overloaded method that calls into this one with some default for the Avro schema.

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyViewRelation.scala ##
@@ -89,18 +108,46 @@ class BaseFileOnlyViewRelation(
       inMemoryFileIndex.listFiles(partitionFilters, dataFilters)
     }
-    val partitionFiles = partitionDirectories.flatMap { partition =>
+    val partitions = partitionDirectories.flatMap { partition =>
       partition.files.flatMap { file =>
+        // TODO move to adapter
+        // TODO fix, currently assuming parquet as underlying format
         HoodieDataSourceHelper.splitFiles(
           sparkSession = sparkSession,
           file = file,
-          partitionValues = partition.values
+          // TODO clarify why this is required
+          partitionValues = InternalRow.empty

Review comment:
I see why we are doing this. Do you think we can change HoodieDataSourceHelper.splitFiles() itself to drop the last argument and set InternalRow.empty directly when creating the PartitionedFile? I guess this is the only usage/caller, so it should be safe to do.

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieDataSourceHelper.scala ##
@@ -33,43 +33,6 @@ import scala.collection.JavaConverters._
 object HoodieDataSourceHelper extends PredicateHelper {
-  /**
-   * Partition the given condition into two sequence of conjunctive predicates:
-   * - predicates that can be evaluated using metadata only.
-   * - other predicates.
-   */
-  def splitPartitionAndDataPredicates(
-      spark: SparkSession,
-      condition: Expression,
-      partitionColumns: Seq[String]): (Seq[Expression], Seq[Expression]) = {
-    splitConjunctivePredicates(condition).partition(
-      isPredicateMetadataOnly(spark, _, partitionColumns))
-  }
-
-  /**
-   * Check if condition can be evaluated using only metadata. In Delta, this means the condition
-   * only references partition columns and involves no subquery.
-   */
-  def isPredicateMetadataOnly(

Review comment:
I guess it was copied over from here.

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieUnsafeRDD.scala ##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.{Partition, SparkContext, TaskContext}
+
+/**
+ * !!! PLEASE READ CAREFULLY !!!
+ *
+ * Base class for all of the custom low-overhead RDD implementations for Hudi.
+ *
+ * To keep memory allocation footprint as low as possible, each inheritor of this RDD base class
+ *
+ *
+ * 1. Does NOT deserialize from [[InternalRow]] to [[Row]] (therefore only providing access to
+ * Catalyst internal representations (often mutable) of the read row)
+ *
+ * 2. DOES NOT COPY UNDERLYING ROW OUT OF THE BOX, meaning that
+ *
+ * a) access to this RDD is NOT thread-safe
+ *
+ * b) iterating over it reference to a _mutable_ underlying instance (of [[InternalRow]]) is
+ * returned, entailing that after [[Iterator#next()]] is invoked on the provided iterator,
+ * previous reference becomes **invalid**. Therefore, you will have to copy underlying mutable
+ * instance of [[InternalRow]] if you plan to access it after [[Iterator#next()]] is invoked (filling
+ * it with the next row's payload)
+ *
+ * c) due to item b) above, no operation other than the iteration will produce meaningful
+ * results on it and
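Editor's illustration of item 2 b) in the `HoodieUnsafeRDD` doc comment quoted above: a minimal, Spark-free Java sketch of what can go wrong when an iterator keeps handing out the same mutable row. `ReusedRowDemo`, `MutableRow`, and `reusingIterator` are hypothetical names invented for this example; they are stand-ins for the idea, not Hudi or Spark APIs, and the sketch only assumes the reuse behaviour that the doc comment describes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ReusedRowDemo {

    /** Hypothetical stand-in for a mutable row (think of Catalyst's InternalRow). */
    static final class MutableRow {
        int value;

        /** Returns an independent snapshot of this row. */
        MutableRow copy() {
            MutableRow snapshot = new MutableRow();
            snapshot.value = value;
            return snapshot;
        }
    }

    /** Iterator that overwrites and returns the SAME row instance on every next(). */
    static Iterator<MutableRow> reusingIterator(int[] payloads) {
        final MutableRow shared = new MutableRow();
        return new Iterator<MutableRow>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < payloads.length;
            }

            @Override
            public MutableRow next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = payloads[pos++]; // fills the shared row with the next payload
                return shared;
            }
        };
    }

    /** Keeps the returned references: every element ends up aliasing the last row. */
    static List<Integer> collectWithoutCopy(int[] payloads) {
        List<MutableRow> rows = new ArrayList<>();
        reusingIterator(payloads).forEachRemaining(rows::add);
        List<Integer> values = new ArrayList<>();
        for (MutableRow row : rows) {
            values.add(row.value);
        }
        return values;
    }

    /** Copies each row before advancing, so earlier rows stay valid. */
    static List<Integer> collectWithCopy(int[] payloads) {
        List<MutableRow> rows = new ArrayList<>();
        reusingIterator(payloads).forEachRemaining(row -> rows.add(row.copy()));
        List<Integer> values = new ArrayList<>();
        for (MutableRow row : rows) {
            values.add(row.value);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] payloads = {1, 2, 3};
        System.out.println(collectWithoutCopy(payloads)); // prints [3, 3, 3]
        System.out.println(collectWithCopy(payloads));    // prints [1, 2, 3]
    }
}
```

Retaining the references without copying yields three aliases of the last row, which is the hazard the doc comment warns about; copying before the next `next()` call avoids it at the cost of one allocation per row.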
[GitHub] [hudi] hudi-bot removed a comment on pull request #4915: [WIP] Allow loading external configs while querying Hudi tables with Spark
hudi-bot removed a comment on pull request #4915: URL: https://github.com/apache/hudi/pull/4915#issuecomment-1051367237 ## CI report: * 402156b00dfb5d49593f80df9d18656e08fd3dcb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6332) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4915: [WIP] Allow loading external configs while querying Hudi tables with Spark
hudi-bot commented on pull request #4915: URL: https://github.com/apache/hudi/pull/4915#issuecomment-1051405803 ## CI report: * 402156b00dfb5d49593f80df9d18656e08fd3dcb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6332) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] danny0405 closed issue #4757: [SUPPORT] Support flink 1.14 in future
danny0405 closed issue #4757: URL: https://github.com/apache/hudi/issues/4757
[GitHub] [hudi] hudi-bot removed a comment on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9
hudi-bot removed a comment on pull request #4907: URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051369994 ## CI report: * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331) * 55db32bf4b6aa3796be90879815328ae376dd606 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9
hudi-bot commented on pull request #4907: URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051371368 ## CI report: * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331) * 55db32bf4b6aa3796be90879815328ae376dd606 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6334) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9
hudi-bot removed a comment on pull request #4907: URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051333479 ## CI report: * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9
hudi-bot commented on pull request #4907: URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051369994 ## CI report: * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331) * 55db32bf4b6aa3796be90879815328ae376dd606 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS
hudi-bot commented on pull request #4739: URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051369841 ## CI report: * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN * 2790e24a229e808602113c7ed80932b09e56c8fd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6255) * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS
hudi-bot removed a comment on pull request #4739: URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051359116 ## CI report: * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN * 2790e24a229e808602113c7ed80932b09e56c8fd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6255) * 2f65bc5fe37942aa72721f19a483194da02ba912 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4915: [WIP] Allow loading external configs while querying Hudi tables with Spark
hudi-bot commented on pull request #4915: URL: https://github.com/apache/hudi/pull/4915#issuecomment-1051367237 ## CI report: * 402156b00dfb5d49593f80df9d18656e08fd3dcb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6332) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4915: [WIP] Allow loading external configs while querying Hudi tables with Spark
hudi-bot removed a comment on pull request #4915: URL: https://github.com/apache/hudi/pull/4915#issuecomment-1051365831 ## CI report: * 402156b00dfb5d49593f80df9d18656e08fd3dcb UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4915: [WIP] Allow loading external configs while querying Hudi tables with Spark
hudi-bot commented on pull request #4915: URL: https://github.com/apache/hudi/pull/4915#issuecomment-1051365831 ## CI report: * 402156b00dfb5d49593f80df9d18656e08fd3dcb UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] zhedoubushishi opened a new pull request #4915: [WIP] Allow loading external configs while querying Hudi tables with Spark
zhedoubushishi opened a new pull request #4915: URL: https://github.com/apache/hudi/pull/4915 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] hudi-bot commented on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS
hudi-bot commented on pull request #4739: URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051359116 ## CI report: * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN * 2790e24a229e808602113c7ed80932b09e56c8fd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6255) * 2f65bc5fe37942aa72721f19a483194da02ba912 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS
hudi-bot removed a comment on pull request #4739: URL: https://github.com/apache/hudi/pull/4739#issuecomment-1049487198 ## CI report: * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN * 2790e24a229e808602113c7ed80932b09e56c8fd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6255) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Created] (HUDI-3519) Make sure every public Hudi Client Method invokes necessary prologue
Alexey Kudinkin created HUDI-3519:
Summary: Make sure every public Hudi Client Method invokes necessary prologue
Key: HUDI-3519
URL: https://issues.apache.org/jira/browse/HUDI-3519
Project: Apache Hudi
Issue Type: Bug
Reporter: Alexey Kudinkin

Right now, only a handful of operations actually invoke the "prologue" method, which performs, for example:
# Checks on whether the table needs to be upgraded
# Bootstrapping of the MDT (if necessary)

as well as some other minor book-keeping.

As part of [https://github.com/apache/hudi/pull/4739], I had to address that and introduced a universal method `initTable` that serves as such a prologue. However, while I've injected it into most major public methods of the Hudi Client's base class, we need to carefully and holistically review all remaining exposed *public* methods and make sure that all _public-facing_ operations (insert, upsert, commit, delete, rollback, clean, etc.) invoke the prologue properly.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
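The pattern the issue asks for can be sketched as follows. This is a minimal illustration under stated assumptions, not Hudi's actual client: the class `PrologueSketchClient`, the helper `withPrologue`, and the method bodies are all hypothetical, and only the name `initTable` is taken from the issue text. The sketch routes every public operation through one private helper, so a newly added public method is less likely to skip the prologue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical client used only to illustrate the pattern; not Hudi's actual write client.
class PrologueSketchClient {
  private final List<String> prologueCalls = new ArrayList<>();
  private boolean tableUpgraded = false;
  private boolean metadataTableBootstrapped = false;

  // Shared prologue (named after `initTable` in the issue; the body is a stand-in).
  private void initTable(String operation) {
    if (!tableUpgraded) {
      tableUpgraded = true;              // stand-in for the "needs upgrade?" check
    }
    if (!metadataTableBootstrapped) {
      metadataTableBootstrapped = true;  // stand-in for bootstrapping the MDT
    }
    prologueCalls.add(operation);        // book-keeping, recorded so it can be inspected
  }

  // Hypothetical single entry point: runs the prologue, then the operation body.
  private <T> T withPrologue(String operation, Supplier<T> body) {
    initTable(operation);
    return body.get();
  }

  public String upsert(String record) {
    return withPrologue("upsert", () -> "upserted:" + record);
  }

  public String delete(String key) {
    return withPrologue("delete", () -> "deleted:" + key);
  }

  public String clean() {
    return withPrologue("clean", () -> "cleaned");
  }

  // Returns a copy of the operations for which the prologue ran, in call order.
  public List<String> prologueCalls() {
    return new ArrayList<>(prologueCalls);
  }

  public static void main(String[] args) {
    PrologueSketchClient client = new PrologueSketchClient();
    client.upsert("r1");
    client.clean();
    System.out.println(client.prologueCalls());
  }
}
```

Whether a wrapper like `withPrologue` fits the real client is an open design question; the issue itself only asks that each public-facing operation be reviewed so that it calls the prologue.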
[GitHub] [hudi] hudi-bot removed a comment on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9
hudi-bot removed a comment on pull request #4907: URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051299304 ## CI report: * b4cc5c73732106ad1528ef52b52e3dedfcc305d1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6325) * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9
hudi-bot commented on pull request #4907: URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051333479 ## CI report: * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4468: [HUDI-3130] Fixing Hive getSchema for RT tables addressing different partitions having different schemas
hudi-bot commented on pull request #4468: URL: https://github.com/apache/hudi/pull/4468#issuecomment-1051316141 ## CI report: * 2461ceb7bb042694782128370df1e96f05a71391 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6330) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4468: [HUDI-3130] Fixing Hive getSchema for RT tables addressing different partitions having different schemas
hudi-bot removed a comment on pull request #4468: URL: https://github.com/apache/hudi/pull/4468#issuecomment-1051280091 ## CI report: * 59331875dcbfb6d64a6cca6e2794c7be565bc97d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6329) * 2461ceb7bb042694782128370df1e96f05a71391 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6330) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4724: [HUDI-2815] add partial overwrite payload to support partial overwrit…
alexeykudinkin commented on a change in pull request #4724: URL: https://github.com/apache/hudi/pull/4724#discussion_r815187048

## File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java ##
@@ -58,6 +58,31 @@ default T preCombine(T oldValue, Properties properties) {
 return preCombine(oldValue);
 }
+ /**
+ *When more than one HoodieRecord have the same HoodieKey in the incoming batch, this function combines them before attempting to insert/upsert by taking in a property map.
+ *
+ * @param oldValue instance of the old {@link HoodieRecordPayload} to be combined with.
+ * @param properties Payload related properties. For example pass the ordering field(s) name to extract from value in storage.
+ * @param schema Schema used for record
+ * @return the combined value
+ */
+ @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
+ default T preCombine(T oldValue, Properties properties, Schema schema) {

Review comment:

Let me try to clarify a few things: `preCombine` has a _very specific_ semantic: it de-duplicates by picking the "most recent" among the records in the batch. The expectation is always that, when handed 2 records, it **has to** return one of them. It cannot produce a new record. If we want to revisit this semantic, that is a far larger change that will surely require writing an RFC and a broader discussion of the merits of such a migration. Please also keep in mind that as of RFC-46 there's an effort underway to abstract the whole "record combination/merging" semantic out of the `RecordPayload` hierarchy into a standalone Combination/Merge Engine API.

> First, from the description of the preCombine method, it is used for combining multiple records with the same HoodieKey before attempting to insert/upsert to disk. The "combine multiple records" might not mean only choosing one of them; we can also combine and merge them into a new one. It just depends on how the sub-class implements the preCombine logic (please correct me if my understanding is wrong :) ). Yeah, it might be a little bit confusing that we need Schema if we are trying to merge them.

Please see my comment regarding the `preCombine` semantic above. I certainly agree with you that the name is confusing, but I've tried to clear up that confusion. Let me know if you have more questions about it.

> Second, I checked where we call the preCombine method: it is used to de-duplicate records with the same HoodieKey before insert/update to disk, especially in the Flink write case. Even though the de-duplication logic is to choose the latest record, we need to ensure that one HoodieKey maps to only one record before comparing to the existing record and writing to disk; otherwise, some records will be missed. For example, in HoodieMergeHandle.init(fileId, newRecordsIter), it will convert the record iterator to a map and treat the recordKey as key. So we may not be able to stop the de-duping logic and merge them against what is on disk unless we change the logic here. Also, if we implement another class/method to handle the merge logic and switch the existing de-duping logic from calling preCombine to the new class/method, we have to add a condition to control whether we should call preCombine or not, which I think might not be a good way. Instead, we should handle it in the preCombine method through different payload implementations.

You're bringing up good points, so let's dive into them one by one. Currently we have 2 mechanisms:
1. `preCombine`, which selects the "most recent" record among those having the same key within the batch
2. `combineAndGetUpdateValue`, which combines the previous or "historical" record (on disk) with the new incoming one (all partial merging semantic is currently implemented in this method)

You rightfully mention that one of the current invariants is that the batch is de-duped at a certain level (because we have to maintain PK uniqueness on disk), and so we might need to shift that to accommodate the case that you have. And that's exactly what my question was: if you can elaborate on the use-case you're trying to solve with this PR, I would be able to better understand where you're coming from and what the best path forward is for us here. The questions I'm looking for answers to are basically the following:
1. What's the nature of your use-case? (domain, record types, frequency, size, etc.)
2. Where are the requirements for partial updates coming from?

I'm happy to set up 30 minutes to talk in person about this, or to connect on Slack and discuss it there.
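The two mechanisms described in the review comment can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Hudi API: the class `CombineSketch`, the type `Rec`, and both method signatures are hypothetical (the real `HoodieRecordPayload` methods work on payloads and Avro records, not on maps). A `null` field value stands for "not provided in this partial update".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the two steps; signatures differ from Hudi's real API.
class CombineSketch {
  static final class Rec {
    final String key;
    final long orderingValue;           // e.g. an event timestamp
    final Map<String, String> fields;   // null value = field not provided

    Rec(String key, long orderingValue, Map<String, String> fields) {
      this.key = key;
      this.orderingValue = orderingValue;
      this.fields = fields;
    }
  }

  // Step 1 (within the batch): pick the "most recent" of two records with the same key.
  // It returns one of its inputs and never builds a new record.
  static Rec preCombine(Rec a, Rec b) {
    return b.orderingValue >= a.orderingValue ? b : a;
  }

  // Step 2 (against storage): combine the stored record with the incoming one.
  // This step may build a new record, which is where partial merging can live.
  static Rec combineAndGetUpdateValue(Rec stored, Rec incoming) {
    Map<String, String> merged = new HashMap<>(stored.fields);
    for (Map.Entry<String, String> e : incoming.fields.entrySet()) {
      if (e.getValue() != null) {
        merged.put(e.getKey(), e.getValue());  // only provided fields overwrite
      }
    }
    long ordering = Math.max(stored.orderingValue, incoming.orderingValue);
    return new Rec(stored.key, ordering, merged);
  }
}
```

In this model, if a batch holds two partial updates for the same key that touch different fields, step 1 keeps only the later one, so the earlier update never reaches step 2. That appears to be the gap the quoted concern about partial updates points at.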
[GitHub] [hudi] hudi-bot removed a comment on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9
hudi-bot removed a comment on pull request #4907: URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051297656 ## CI report: * b4cc5c73732106ad1528ef52b52e3dedfcc305d1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6325) * 90895141bcecf4a6f966faf73b2c0fa609290281 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build