[GitHub] [hudi] hudi-bot commented on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
hudi-bot commented on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024858566 ## CI report: * 6e9036e89041d7ee6cf995b29502cbc10bfbff8d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5579) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5591) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5602) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
hudi-bot removed a comment on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024850140 ## CI report: * 6e9036e89041d7ee6cf995b29502cbc10bfbff8d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5579) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5591) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5602)
[GitHub] [hudi] hudi-bot commented on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot commented on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024851007 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5590) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5600) * 9bdd8a60f95123f82a5f30f5e68929f5b7e1f8da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5603)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot removed a comment on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024849640 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5590) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5600) * 9bdd8a60f95123f82a5f30f5e68929f5b7e1f8da UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
hudi-bot commented on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024850140 ## CI report: * 6e9036e89041d7ee6cf995b29502cbc10bfbff8d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5579) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5591) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5602)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
hudi-bot removed a comment on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024822171 ## CI report: * 6e9036e89041d7ee6cf995b29502cbc10bfbff8d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5579) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5591)
[GitHub] [hudi] hudi-bot commented on pull request #4714: [HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re…
hudi-bot commented on pull request #4714: URL: https://github.com/apache/hudi/pull/4714#issuecomment-1024850151 ## CI report: * e8999e4928debb876332f287a1584cc7cbd69c85 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5599)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4714: [HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re…
hudi-bot removed a comment on pull request #4714: URL: https://github.com/apache/hudi/pull/4714#issuecomment-1024839306 ## CI report: * 1dc6c2f464e45fcbffe5d7fde2fbb1c66a6fca34 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5582) * e8999e4928debb876332f287a1584cc7cbd69c85 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5599)
[GitHub] [hudi] cuibo01 commented on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
cuibo01 commented on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024849931 @hudi-bot run azure
[GitHub] [hudi] cuibo01 commented on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
cuibo01 commented on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024849829

> Thanks, can you explain a little why we need these `hadoop.` prefixed config options?

Within the same application, all jobs use the same core-site/hdfs-site files, but some configurations may need to differ from job to job, for example the compaction memory size, the FileSystem...
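The per-job override described in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: the class `HadoopPrefixSketch`, the method `effectiveConf`, and the choice to strip the `hadoop.` prefix before applying an option are all hypothetical, and plain maps stand in for Hadoop's `Configuration` object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT Hudi's implementation: it only illustrates how
// `hadoop.`-prefixed job options could override shared site-level settings.
class HadoopPrefixSketch {
    static final String PREFIX = "hadoop.";

    // Copies the shared (site-wide) settings, then applies every job option whose
    // key starts with "hadoop." (prefix stripped) as a per-job override.
    static Map<String, String> effectiveConf(Map<String, String> siteConf,
                                             Map<String, String> jobOptions) {
        Map<String, String> conf = new HashMap<>(siteConf);
        for (Map.Entry<String, String> e : jobOptions.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                conf.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        // Settings every job in the application would read from the site files.
        Map<String, String> site = new HashMap<>();
        site.put("fs.defaultFS", "hdfs://cluster-a");
        site.put("dfs.replication", "3");

        // Options of one job; only the prefixed key is treated as an override.
        Map<String, String> job = new HashMap<>();
        job.put("hadoop.dfs.replication", "2");
        job.put("path", "/tmp/t1"); // not prefixed, so it is ignored here

        System.out.println(effectiveConf(site, job));
    }
}
```

Because the shared map is copied before overrides are applied, one job's overrides would not leak into another job that starts from the same site settings.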
[GitHub] [hudi] hudi-bot removed a comment on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot removed a comment on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024849226 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5590) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5600) * 9bdd8a60f95123f82a5f30f5e68929f5b7e1f8da UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot commented on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024849640 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5590) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5600) * 9bdd8a60f95123f82a5f30f5e68929f5b7e1f8da UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot removed a comment on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024845763 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5590) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5600)
[GitHub] [hudi] hudi-bot commented on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot commented on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024849226 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5590) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5600) * 9bdd8a60f95123f82a5f30f5e68929f5b7e1f8da UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot removed a comment on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024837778 ## CI report: * 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN * bb05ceb91b62d86e5f06ae6f00a28eb57c17aa6d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5593) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5598)
[GitHub] [hudi] hudi-bot commented on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot commented on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024848858 ## CI report: * 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN * bb05ceb91b62d86e5f06ae6f00a28eb57c17aa6d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5593) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5598)
[GitHub] [hudi] hudi-bot commented on pull request #4705: [HUDI-3337] Fixing Parquet Column Range metadata extraction
hudi-bot commented on pull request #4705: URL: https://github.com/apache/hudi/pull/4705#issuecomment-1024848478 ## CI report: * fc6f1f4af2201fb5541aeae70c745e7b6dc3981e UNKNOWN * 1ce429a085e29115d1f97ba2b4dd5a8e8ecc867c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5564) * 2fa66c4290ee2555973e29934ca7ecb8e4a0e709 UNKNOWN * 1c680c1381798065c51a5e6a7958238b1b540027 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5601)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4705: [HUDI-3337] Fixing Parquet Column Range metadata extraction
hudi-bot removed a comment on pull request #4705: URL: https://github.com/apache/hudi/pull/4705#issuecomment-1024841207 ## CI report: * fc6f1f4af2201fb5541aeae70c745e7b6dc3981e UNKNOWN * 1ce429a085e29115d1f97ba2b4dd5a8e8ecc867c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5564) * 2fa66c4290ee2555973e29934ca7ecb8e4a0e709 UNKNOWN * 1c680c1381798065c51a5e6a7958238b1b540027 UNKNOWN
[GitHub] [hudi] dcoliversun removed a comment on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
dcoliversun removed a comment on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024845948 /assign @leesf
[GitHub] [hudi] dcoliversun commented on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
dcoliversun commented on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024845948 /assign @leesf
[GitHub] [hudi] hudi-bot commented on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot commented on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024845763 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5590) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5600)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot removed a comment on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024819714 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5590)
[GitHub] [hudi] alexeykudinkin commented on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
alexeykudinkin commented on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024845655 @hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #4705: [HUDI-3337] Fixing Parquet Column Range metadata extraction
hudi-bot commented on pull request #4705: URL: https://github.com/apache/hudi/pull/4705#issuecomment-1024841207 ## CI report: * fc6f1f4af2201fb5541aeae70c745e7b6dc3981e UNKNOWN * 1ce429a085e29115d1f97ba2b4dd5a8e8ecc867c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5564) * 2fa66c4290ee2555973e29934ca7ecb8e4a0e709 UNKNOWN * 1c680c1381798065c51a5e6a7958238b1b540027 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4705: [HUDI-3337] Fixing Parquet Column Range metadata extraction
hudi-bot removed a comment on pull request #4705: URL: https://github.com/apache/hudi/pull/4705#issuecomment-1024840853 ## CI report: * fc6f1f4af2201fb5541aeae70c745e7b6dc3981e UNKNOWN * 1ce429a085e29115d1f97ba2b4dd5a8e8ecc867c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5564) * 2fa66c4290ee2555973e29934ca7ecb8e4a0e709 UNKNOWN
[GitHub] [hudi] nsivabalan commented on a change in pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups
nsivabalan commented on a change in pull request #4352: URL: https://github.com/apache/hudi/pull/4352#discussion_r795001136

## File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java ##
@@ -399,4 +775,116 @@ public static int mapRecordKeyToFileGroupIndex(String recordKey, int numFileGrou
     return fileSliceStream.sorted((s1, s2) -> s1.getFileId().compareTo(s2.getFileId())).collect(Collectors.toList());
   }
+  public static List convertMetadataToColumnStatsRecords(HoodieCommitMetadata commitMetadata,

Review comment: Regarding L760 to L767: I see we are instantiating the HoodieTableFileSystemView every time here. Maybe we can move the fsView instantiation to the caller and thus amortize the cost. For example,
```
enablePartition(MetadataPartitionType.FILES, metadataConfig, metaClient, isBootstrapCompleted);
```
we make similar calls three times with this patch, so we can instantiate it just once outside and reuse it here. Please also check the other callers to see whether the same amortization is possible.

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java ##
@@ -101,4 +116,34 @@ public static HoodieRecord getTaggedRecord(HoodieRecord inputRecord, Option filterKeysFromFile(Path filePath, List candidateRecordKeys,
+      Configuration configuration) throws HoodieIndexException {
+    ValidationUtils.checkArgument(FSUtils.isBaseFile(filePath));
+    List foundRecordKeys = new ArrayList<>();
+    try {
+      // Load all rowKeys from the file, to double-confirm
+      if (!candidateRecordKeys.isEmpty()) {
+        HoodieTimer timer = new HoodieTimer().startTimer();
+        HoodieFileReader fileReader = HoodieFileReaderFactory.getFileReader(configuration, filePath);
+        Set fileRowKeys = fileReader.filterRowKeys(new TreeSet<>(candidateRecordKeys));

Review comment: For the non-metadata path, we can avoid sorting the candidateRecordKeys here, just to keep the behavior the same as before.
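The first review comment above (build the file-system view once in the caller instead of on every call) can be illustrated with a small sketch. Everything in it is a stand-in chosen for illustration: `ExpensiveView` is not `HoodieTableFileSystemView`, `enableEager` and `enableShared` are hypothetical names rather than Hudi methods, and the construction counter exists only to make the cost visible.

```java
// Illustrative stand-ins only; none of these names are Hudi APIs.
class AmortizeSketch {
    // Counts how many times the "expensive" object is built.
    static int constructions = 0;

    static class ExpensiveView {
        ExpensiveView() {
            constructions++; // pretend this performs a costly file listing
        }

        boolean hasFiles(String partition) {
            return !partition.isEmpty();
        }
    }

    // Before: every call pays for its own view.
    static boolean enableEager(String partition) {
        return new ExpensiveView().hasFiles(partition);
    }

    // After: the caller builds the view once and passes it in.
    static boolean enableShared(String partition, ExpensiveView view) {
        return view.hasFiles(partition);
    }

    public static void main(String[] args) {
        // Example partition labels, used here purely as sample inputs.
        String[] partitions = {"files", "bloom_filters", "column_stats"};

        for (String p : partitions) {
            enableEager(p);
        }
        System.out.println("per-call constructions: " + constructions); // 3

        constructions = 0;
        ExpensiveView shared = new ExpensiveView();
        for (String p : partitions) {
            enableShared(p, shared);
        }
        System.out.println("shared constructions: " + constructions); // 1
    }
}
```

The trade-off in this sketch is that the callee's signature grows by one parameter, in exchange for the construction cost being paid once per caller rather than once per call.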
## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ##
@@ -1435,6 +1435,14 @@ public boolean useBloomIndexBucketizedChecking() {
     return getBoolean(HoodieIndexConfig.BLOOM_INDEX_BUCKETIZED_CHECKING);
   }
+  public boolean isMetadataBloomFilterIndexEnabled() {
+    return isMetadataTableEnabled() && getMetadataConfig().isBloomFilterIndexEnabled();
+  }
+
+  public boolean isMetadataIndexColumnStatsForAllColumnsEnabled() {
+    return isMetadataTableEnabled() && getMetadataConfig().isMetadataColumnStatsIndexForAllColumnsEnabled();

Review comment: Supporting a user-configurable subset of columns is not part of this patch, is it? Do we have a tracking ticket?

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieKeyLookupHandle.java ##
@@ -46,52 +51,58 @@
   private static final Logger LOG = LogManager.getLogger(HoodieKeyLookupHandle.class);
-  private final HoodieTableType tableType;
-  private final BloomFilter bloomFilter;
-  private final List candidateRecordKeys;
-
+  private final boolean useMetadataTableIndex;
+  private Option fileName = Option.empty();
   private long totalKeysChecked;
   public HoodieKeyLookupHandle(HoodieWriteConfig config, HoodieTable hoodieTable,
-      Pair partitionPathFilePair) {
-    super(config, null, hoodieTable, partitionPathFilePair);
-    this.tableType = hoodieTable.getMetaClient().getTableType();
+      Pair partitionPathFileIDPair) {
+    this(config, hoodieTable, partitionPathFileIDPair, Option.empty(), false);
+  }
+
+  public HoodieKeyLookupHandle(HoodieWriteConfig config, HoodieTable hoodieTable,
+      Pair partitionPathFileIDPair, Option fileName,
+      boolean useMetadataTableIndex) {
+    super(config, hoodieTable, partitionPathFileIDPair);
     this.candidateRecordKeys = new ArrayList<>();
     this.totalKeysChecked = 0;
-    HoodieTimer timer = new HoodieTimer().startTimer();
-
-    try {
-      this.bloomFilter = createNewFileReader().readBloomFilter();
-    } catch (IOException e) {
-      throw new HoodieIndexException(String.format("Error reading bloom filter from %s: %s", partitionPathFilePair, e));
+    if (fileName.isPresent()) {
+      ValidationUtils.checkArgument(FSUtils.getFileId(fileName.get()).equals(getFileId()),
+          "File name '" + fileName.get() + "' doesn't match this lookup handle fileid '" + getFileId() + "'");
+      this.fileName = fileName;
     }
-    LOG.info(String.format("Read bloom filter from %s in %d ms", partitionPathFilePair, timer.endTimer()));
+    this.useMetadataTableIndex = useMetadataTableIndex;
+    this.bloomFilter = getBloomFilter();
   }
-  /**
-   * Given a list of row keys and one file, r
[GitHub] [hudi] hudi-bot commented on pull request #4705: [HUDI-3337] Fixing Parquet Column Range metadata extraction
hudi-bot commented on pull request #4705: URL: https://github.com/apache/hudi/pull/4705#issuecomment-1024840853 ## CI report: * fc6f1f4af2201fb5541aeae70c745e7b6dc3981e UNKNOWN * 1ce429a085e29115d1f97ba2b4dd5a8e8ecc867c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5564) * 2fa66c4290ee2555973e29934ca7ecb8e4a0e709 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
hudi-bot commented on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024840867 ## CI report: * 6f9c5632c5647c27e9184dee53e52036150d4bea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5592) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5596)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
hudi-bot removed a comment on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024827774 ## CI report: * 6f9c5632c5647c27e9184dee53e52036150d4bea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5592) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5596) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4705: [HUDI-3337] Fixing Parquet Column Range metadata extraction
hudi-bot removed a comment on pull request #4705: URL: https://github.com/apache/hudi/pull/4705#issuecomment-1023728937 ## CI report: * fc6f1f4af2201fb5541aeae70c745e7b6dc3981e UNKNOWN * 1ce429a085e29115d1f97ba2b4dd5a8e8ecc867c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5564) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4714: [HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re…
hudi-bot removed a comment on pull request #4714: URL: https://github.com/apache/hudi/pull/4714#issuecomment-1024838933 ## CI report: * 1dc6c2f464e45fcbffe5d7fde2fbb1c66a6fca34 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5582) * e8999e4928debb876332f287a1584cc7cbd69c85 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4714: [HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re…
hudi-bot commented on pull request #4714: URL: https://github.com/apache/hudi/pull/4714#issuecomment-1024839306 ## CI report: * 1dc6c2f464e45fcbffe5d7fde2fbb1c66a6fca34 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5582) * e8999e4928debb876332f287a1584cc7cbd69c85 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5599) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4714: [HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re…
hudi-bot removed a comment on pull request #4714: URL: https://github.com/apache/hudi/pull/4714#issuecomment-1024409363 ## CI report: * 1dc6c2f464e45fcbffe5d7fde2fbb1c66a6fca34 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5582) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4714: [HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re…
hudi-bot commented on pull request #4714: URL: https://github.com/apache/hudi/pull/4714#issuecomment-1024838933 ## CI report: * 1dc6c2f464e45fcbffe5d7fde2fbb1c66a6fca34 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5582) * e8999e4928debb876332f287a1584cc7cbd69c85 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot removed a comment on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024832837 ## CI report: * 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN * bb05ceb91b62d86e5f06ae6f00a28eb57c17aa6d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5593) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot commented on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024837778 ## CI report: * 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN * bb05ceb91b62d86e5f06ae6f00a28eb57c17aa6d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5593) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5598) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] alexeykudinkin commented on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
alexeykudinkin commented on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024837464 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
danny0405 commented on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024833689 Thanks, can you explain a little why we need these `hadoop.` prefixed config options? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot removed a comment on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024812581 ## CI report: * 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN * 059ba5b6bc6ec995bf66538da8fa63bac494fd69 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5589) * bb05ceb91b62d86e5f06ae6f00a28eb57c17aa6d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5593) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot commented on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024832837 ## CI report: * 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN * bb05ceb91b62d86e5f06ae6f00a28eb57c17aa6d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5593) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a change in pull request #4665: [HUDI-2733] Add support for Thrift sync
nsivabalan commented on a change in pull request #4665: URL: https://github.com/apache/hudi/pull/4665#discussion_r794996423 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/ThriftDDLExecutor.java ## @@ -0,0 +1,254 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi.hive.ddl; + +import org.apache.hudi.common.fs.FSUtils; +import org.apache.hudi.common.fs.StorageSchemes; +import org.apache.hudi.hive.HiveSyncConfig; +import org.apache.hudi.hive.HoodieHiveSyncException; +import org.apache.hudi.hive.PartitionValueExtractor; +import org.apache.hudi.hive.thrift.HMSClient; +import org.apache.hudi.hive.util.HiveSchemaUtil; + +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.common.StatsSetupConst; +import org.apache.hadoop.hive.metastore.TableType; +import org.apache.hadoop.hive.metastore.api.Database; +import org.apache.hadoop.hive.metastore.api.EnvironmentContext; +import org.apache.hadoop.hive.metastore.api.FieldSchema; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.SerDeInfo; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; +import org.apache.parquet.schema.MessageType; +import org.apache.thrift.TException; + +import javax.security.auth.login.LoginException; + +import java.io.IOException; +import java.net.URI; +import java.net.URISyntaxException; +import java.util.HashMap; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; + +public class ThriftDDLExecutor implements DDLExecutor { + + private static final Logger LOG = LogManager.getLogger(ThriftDDLExecutor.class); + private final HMSClient client; Review comment: Can we name this HMSThriftClient. We already have hms mode and calling this HMSClient and not being used with "hms" mode doesn't sit well. ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/ThriftDDLExecutor.java ## @@ -0,0 +1,254 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.hive.ddl; + +import org.apache.hudi.common.fs.FSUtils; +import org.apache.hudi.common.fs.StorageSchemes; +import org.apache.hudi.hive.HiveSyncConfig; +import org.apache.hudi.hive.HoodieHiveSyncException; +import org.apache.hudi.hive.PartitionValueExtractor; +import org.apache.hudi.hive.thrift.HMSClient; +import org.apache.hudi.hive.util.HiveSchemaUtil; + +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.common.StatsSetupConst; +import org.apache.hadoop.hive.metastore.TableType; +import org.apache.hadoop.hive.metastore.api.Database; +import org.apache.hadoop.hive.metastore.api.EnvironmentContext; +import org.apache.hadoop.hive.metastore.api.FieldSchema; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.SerDeInfo; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; +import org.apache.parquet.schema.MessageType; +import org.apache.thrift.TException; + +import javax.security.auth.login.LoginException; + +
[hudi] branch master updated (c0e8b03 -> ed7aa13)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from c0e8b03 [HUDI-1977] Fix Hudi CLI tempview query issue (#4626) add ed7aa13 [MINOR] Added log to debug checkpoint resumption when set to 0 (#4650) No new revisions were added by this update. Summary of changes: .../main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java| 2 ++ 1 file changed, 2 insertions(+)
[GitHub] [hudi] nsivabalan merged pull request #4650: [MINOR] Added log to debug checkpoint resumption when set to 0
nsivabalan merged pull request #4650: URL: https://github.com/apache/hudi/pull/4650 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a change in pull request #4712: [HUDI-2809] Introduce a checksum mechanism for validating hoodie.properties
nsivabalan commented on a change in pull request #4712: URL: https://github.com/apache/hudi/pull/4712#discussion_r794995087 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -272,15 +287,29 @@ private static void modify(FileSystem fs, Path metadataFolder, Properties modify /// 2. delete the properties file, reads will go to the backup, until we are done. fs.delete(cfgPath, false); // 3. read current props, upsert and save back. + String checksum; try (FSDataInputStream in = fs.open(backupCfgPath); FSDataOutputStream out = fs.create(cfgPath, true)) { -Properties props = new Properties(); +Properties props = new TypedProperties(); props.load(in); modifyFn.accept(props, modifyProps); +if (props.containsKey(TABLE_CHECKSUM.key()) && validateChecksum(props)) { + checksum = props.getProperty(TABLE_CHECKSUM.key()); +} else { + checksum = String.valueOf(generateChecksum(props)); +} +props.setProperty(TABLE_CHECKSUM.key(), checksum); Review comment: Shouldn't we move this into the else block only? ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -493,6 +544,13 @@ public String getUrlEncodePartitioning() { return getString(URL_ENCODE_PARTITIONING); } + /** + * Read the table checksum. + */ + public Long getTableChecksum() { Review comment: Why public? ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -272,15 +287,29 @@ private static void modify(FileSystem fs, Path metadataFolder, Properties modify /// 2. delete the properties file, reads will go to the backup, until we are done. fs.delete(cfgPath, false); // 3. read current props, upsert and save back. 
+ String checksum; try (FSDataInputStream in = fs.open(backupCfgPath); FSDataOutputStream out = fs.create(cfgPath, true)) { -Properties props = new Properties(); +Properties props = new TypedProperties(); props.load(in); modifyFn.accept(props, modifyProps); +if (props.containsKey(TABLE_CHECKSUM.key()) && validateChecksum(props)) { + checksum = props.getProperty(TABLE_CHECKSUM.key()); +} else { + checksum = String.valueOf(generateChecksum(props)); +} +props.setProperty(TABLE_CHECKSUM.key(), checksum); props.store(out, "Updated at " + System.currentTimeMillis()); } // 4. verify and remove backup. // FIXME(vc): generate a hash for verification. + try (FSDataInputStream in = fs.open(cfgPath)) { +Properties props = new TypedProperties(); +props.load(in); +if (!props.containsKey(TABLE_CHECKSUM.key()) || !props.getProperty(TABLE_CHECKSUM.key()).equals(checksum)) { + throw new HoodieIOException("Checksum property missing or does not match."); Review comment: is it not possible to regenerate from backup rather than failing here? ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -272,15 +287,29 @@ private static void modify(FileSystem fs, Path metadataFolder, Properties modify /// 2. delete the properties file, reads will go to the backup, until we are done. fs.delete(cfgPath, false); // 3. read current props, upsert and save back. + String checksum; try (FSDataInputStream in = fs.open(backupCfgPath); FSDataOutputStream out = fs.create(cfgPath, true)) { -Properties props = new Properties(); +Properties props = new TypedProperties(); props.load(in); modifyFn.accept(props, modifyProps); +if (props.containsKey(TABLE_CHECKSUM.key()) && validateChecksum(props)) { + checksum = props.getProperty(TABLE_CHECKSUM.key()); +} else { + checksum = String.valueOf(generateChecksum(props)); +} +props.setProperty(TABLE_CHECKSUM.key(), checksum); props.store(out, "Updated at " + System.currentTimeMillis()); } // 4. verify and remove backup. 
// FIXME(vc): generate a hash for verification. + try (FSDataInputStream in = fs.open(cfgPath)) { Review comment: remove L 305 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
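For readers following the review thread on #4712, the checksum handling under discussion can be sketched roughly as below. This is a minimal illustration, not the PR's code: the class name `PropsChecksumSketch`, the key `example.table.checksum`, and the choice of CRC32 over sorted `key=value` pairs are all assumptions made for the example. It follows the reviewer's suggestion of writing the property only when the stored checksum is missing or no longer valid.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.TreeSet;
import java.util.zip.CRC32;

// Hypothetical helper for illustration only; not part of Hudi.
final class PropsChecksumSketch {
  // Assumed key name; the PR uses its own TABLE_CHECKSUM config key.
  static final String CHECKSUM_KEY = "example.table.checksum";

  // CRC32 over all entries except the checksum itself, in sorted key order
  // so the result does not depend on Properties iteration order.
  static long generateChecksum(Properties props) {
    CRC32 crc = new CRC32();
    for (String key : new TreeSet<>(props.stringPropertyNames())) {
      if (key.equals(CHECKSUM_KEY)) {
        continue;
      }
      String entry = key + "=" + props.getProperty(key) + "\n";
      crc.update(entry.getBytes(StandardCharsets.UTF_8));
    }
    return crc.getValue();
  }

  // True only if a checksum is stored and matches the current entries.
  static boolean validateChecksum(Properties props) {
    String stored = props.getProperty(CHECKSUM_KEY);
    return stored != null && stored.equals(String.valueOf(generateChecksum(props)));
  }

  // Sets the checksum only when it is missing or stale, then returns it.
  static String upsertChecksum(Properties props) {
    if (!validateChecksum(props)) {
      props.setProperty(CHECKSUM_KEY, String.valueOf(generateChecksum(props)));
    }
    return props.getProperty(CHECKSUM_KEY);
  }
}
```

A caller would run `upsertChecksum(props)` before `props.store(...)`, then reload the written file and call `validateChecksum` on it before removing the backup. Whether to fail or fall back to the backup on a mismatch is the open question raised in the review.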
[GitHub] [hudi] hudi-bot commented on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
hudi-bot commented on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024827774 ## CI report: * 6f9c5632c5647c27e9184dee53e52036150d4bea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5592) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5596) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
hudi-bot removed a comment on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024826599 ## CI report: * 6f9c5632c5647c27e9184dee53e52036150d4bea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5592) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] dcoliversun commented on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
dcoliversun commented on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024827509 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
hudi-bot removed a comment on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024812075 ## CI report: * 6f9c5632c5647c27e9184dee53e52036150d4bea Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5592) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
hudi-bot commented on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024826599 ## CI report: * 6f9c5632c5647c27e9184dee53e52036150d4bea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5592) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
hudi-bot commented on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024822171 ## CI report: * 6e9036e89041d7ee6cf995b29502cbc10bfbff8d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5579) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5591) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
hudi-bot removed a comment on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024801677 ## CI report: * 6e9036e89041d7ee6cf995b29502cbc10bfbff8d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5579) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5591) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot commented on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024819714 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5590) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot removed a comment on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024802263 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * dc1b033f40cea9642f71b8fcff74aa82106726b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5484) * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5590) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (e78b2f1 -> c0e8b03)
This is an automated email from the ASF dual-hosted git repository. mengtao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from e78b2f1 [HUDI-2943] Complete pending clustering before deltastreamer sync (#4572) add c0e8b03 [HUDI-1977] Fix Hudi CLI tempview query issue (#4626) No new revisions were added by this update. Summary of changes: .../hudi/cli/utils/SparkTempViewProvider.java | 25 +- 1 file changed, 15 insertions(+), 10 deletions(-)
[GitHub] [hudi] xiarixiaoyao merged pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue
xiarixiaoyao merged pull request #4626: URL: https://github.com/apache/hudi/pull/4626 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue
xiarixiaoyao commented on pull request #4626: URL: https://github.com/apache/hudi/pull/4626#issuecomment-1024813478 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups
hudi-bot removed a comment on pull request #4352: URL: https://github.com/apache/hudi/pull/4352#issuecomment-1024786842 ## CI report: * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN * 4d38e462c4fc79432b3cef2691cb76229d054cab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5568) * fa4c161d0c8131af8205d070996f14956f7692c9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5588) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups
hudi-bot commented on pull request #4352: URL: https://github.com/apache/hudi/pull/4352#issuecomment-1024812991 ## CI report: * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN * fa4c161d0c8131af8205d070996f14956f7692c9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5588)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot removed a comment on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024801043 ## CI report: * 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN * 059ba5b6bc6ec995bf66538da8fa63bac494fd69 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5589) * bb05ceb91b62d86e5f06ae6f00a28eb57c17aa6d UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot commented on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024812581 ## CI report: * 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN * 059ba5b6bc6ec995bf66538da8fa63bac494fd69 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5589) * bb05ceb91b62d86e5f06ae6f00a28eb57c17aa6d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5593)
[GitHub] [hudi] dcoliversun commented on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
dcoliversun commented on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024812233 cc @leesf It would be good if you could take a look when you have time, thanks!
[hudi] branch master updated (2b52a56 -> e78b2f1)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git.

    from 2b52a56 [HUDI-2688][RFC-40] A new Hudi connector for Trino (#3957)
     add e78b2f1 [HUDI-2943] Complete pending clustering before deltastreamer sync (#4572)

No new revisions were added by this update.

Summary of changes:
 .../hudi/utilities/deltastreamer/DeltaSync.java    | 22 +++
 .../deltastreamer/HoodieDeltaStreamer.java         |  3 ++
 .../functional/TestHoodieDeltaStreamer.java        | 32 ++
 3 files changed, 57 insertions(+)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
hudi-bot removed a comment on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024811593 ## CI report: * 6f9c5632c5647c27e9184dee53e52036150d4bea UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
hudi-bot commented on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024812075 ## CI report: * 6f9c5632c5647c27e9184dee53e52036150d4bea Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5592)
[GitHub] [hudi] nsivabalan merged pull request #4572: [HUDI-2943] Complete pending clustering before deltastreamer sync
nsivabalan merged pull request #4572: URL: https://github.com/apache/hudi/pull/4572
[GitHub] [hudi] hudi-bot commented on pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
hudi-bot commented on pull request #4717: URL: https://github.com/apache/hudi/pull/4717#issuecomment-1024811593 ## CI report: * 6f9c5632c5647c27e9184dee53e52036150d4bea UNKNOWN
[jira] [Updated] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala
[ https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-3344: - Labels: pull-request-available (was: ) > Standard code format for HoodieDataSourceExample.scala > --- > > Key: HUDI-3344 > URL: https://issues.apache.org/jira/browse/HUDI-3344 > Project: Apache Hudi > Issue Type: Improvement >Reporter: qian >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] dcoliversun opened a new pull request #4717: [HUDI-3344] Standard format for HoodieDataSourceExample.scala
dcoliversun opened a new pull request #4717: URL: https://github.com/apache/hudi/pull/4717

## What is the purpose of the pull request

This pull request standardizes the code format of HoodieDataSourceExample.scala.

## Brief change log

- *Align code in HoodieDataSourceExample.scala*

## Verify this pull request

This pull request is a trivial rework / code cleanup without any test coverage.

## Committer checklist

- [x] Has a corresponding JIRA in PR title & commit
- [x] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] YannByron commented on pull request #4714: [HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re…
YannByron commented on pull request #4714: URL: https://github.com/apache/hudi/pull/4714#issuecomment-1024810709 @nsivabalan @leesf could you help review this?
[jira] [Created] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala
qian created HUDI-3344: -- Summary: Standard code format for HoodieDataSourceExample.scala Key: HUDI-3344 URL: https://issues.apache.org/jira/browse/HUDI-3344 Project: Apache Hudi Issue Type: Improvement Reporter: qian -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] peanut-chenzhong commented on pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue
peanut-chenzhong commented on pull request #4626: URL: https://github.com/apache/hudi/pull/4626#issuecomment-1024804519 @n3nash please help review this PR
[GitHub] [hudi] hudi-bot removed a comment on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot removed a comment on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024801588 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * dc1b033f40cea9642f71b8fcff74aa82106726b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5484) * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot commented on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024802263 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * dc1b033f40cea9642f71b8fcff74aa82106726b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5484) * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5590)
[GitHub] [hudi] hudi-bot commented on pull request #4711: [WIP][HUDI-1295][HUDI-3166] Hoodie Index Type Metadata Bloom implementation
hudi-bot commented on pull request #4711: URL: https://github.com/apache/hudi/pull/4711#issuecomment-1024801695 ## CI report: * c15ad0d2d77cda49cba5961a1baaf20b592499b4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5587)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4711: [WIP][HUDI-1295][HUDI-3166] Hoodie Index Type Metadata Bloom implementation
hudi-bot removed a comment on pull request #4711: URL: https://github.com/apache/hudi/pull/4711#issuecomment-1024770458 ## CI report: * 1f2e400c0f77ac70906cba487f31cc9b9daf7915 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5576) * c15ad0d2d77cda49cba5961a1baaf20b592499b4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5587)
[GitHub] [hudi] hudi-bot commented on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
hudi-bot commented on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024801677 ## CI report: * 6e9036e89041d7ee6cf995b29502cbc10bfbff8d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5579) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5591)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
hudi-bot removed a comment on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024249663 ## CI report: * 6e9036e89041d7ee6cf995b29502cbc10bfbff8d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5579)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot removed a comment on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024799065 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * dc1b033f40cea9642f71b8fcff74aa82106726b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5484) * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4556: [HUDI-3191][Stacked on 4716] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot commented on pull request #4556: URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024801588 ## CI report: * 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN * 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN * 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN * 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN * c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN * dc1b033f40cea9642f71b8fcff74aa82106726b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5484) * 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN * 13702fc2232ebd9300966b0a917a3358d8d0f18c UNKNOWN
[GitHub] [hudi] cuibo01 commented on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
cuibo01 commented on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024801521 @danny0405 pls review thx
[GitHub] [hudi] xiarixiaoyao commented on pull request #4699: [HUDI-3336][HUDI-FLINK] Configurations transferred through Flink SQL canno…
xiarixiaoyao commented on pull request #4699: URL: https://github.com/apache/hudi/pull/4699#issuecomment-1024801186 @hudi-bot run azure
[GitHub] [hudi] hudi-bot removed a comment on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot removed a comment on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024799819 ## CI report: * 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN * 059ba5b6bc6ec995bf66538da8fa63bac494fd69 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5589)
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
alexeykudinkin commented on a change in pull request #4716:
URL: https://github.com/apache/hudi/pull/4716#discussion_r794979942

## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java
## @@ -442,8 +443,11 @@ private void updateTableMetadata(HoodieTable>, JavaRD
           metaClient, config, context, SparkUpgradeDowngradeHelper.getInstance())
           .run(HoodieTableVersion.current(), instantTime);
       metaClient.reloadActiveTimeline();
-      initializeMetadataTable(Option.of(instantTime));
     }
+    // Initialize Metadata Table to make sure it's bootstrapped _before_ the operation,
+    // if it didn't exist before
+    // See https://issues.apache.org/jira/browse/HUDI-3343 for more details
+    initializeMetadataTable(Option.of(instantTime));

Review comment: This is addressing HUDI-3343

## File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
## @@ -264,17 +265,6 @@ private static void processRollbackMetadata(HoodieActiveTimeline metadataTableTi
           partitionToAppendedFiles.get(partition).merge(new Path(path).getName(), size, fileMergeFn);
         });
       }
-
-      if (pm.getWrittenLogFiles() != null && !pm.getWrittenLogFiles().isEmpty()) {

Review comment: This is addressing HUDI-3322
[GitHub] [hudi] hudi-bot commented on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot commented on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024801043 ## CI report: * 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN * 059ba5b6bc6ec995bf66538da8fa63bac494fd69 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5589) * bb05ceb91b62d86e5f06ae6f00a28eb57c17aa6d UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot removed a comment on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024799152 ## CI report: * 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN * 059ba5b6bc6ec995bf66538da8fa63bac494fd69 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot commented on pull request #4716: URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024799819 ## CI report: * 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN * 059ba5b6bc6ec995bf66538da8fa63bac494fd69 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5589)
[jira] [Closed] (HUDI-3279) Metadata table stores incorrect file sizes after Restore
[ https://issues.apache.org/jira/browse/HUDI-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin closed HUDI-3279.
    Resolution: Duplicate

> Metadata table stores incorrect file sizes after Restore
>
> Key: HUDI-3279
> URL: https://issues.apache.org/jira/browse/HUDI-3279
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Fix For: 0.11.0
> Attachments: Screen Shot 2022-01-19 at 12.17.21 PM.png, Screen Shot 2022-01-19 at 12.18.27 PM.png, Screen Shot 2022-01-19 at 7.56.37 PM.png
>
> While working on https://github.com/apache/hudi/pull/4556, I stumbled upon an issue of the LogBlock Scanner EOF-ing on the log files in tests after performing a Restore operation.
> The root cause turned out to be the Metadata Table storing incorrect sizes of the files after Restore (sizes in MT are essentially 2x of what is in FS):
> !Screen Shot 2022-01-19 at 12.17.21 PM.png!
> !Screen Shot 2022-01-19 at 12.18.27 PM.png!
>
> This seems to occur due to the following:
> # Metadata table treats new Records for the same file as "deltas", appending the file-size to its records (https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java#L227)
> # Upon Restore (which is handled simply as a collection of Rollbacks) we pick *max* of the sizes of the files before and after the operation, regardless of which state we're actually rolling back to (https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java#L254).
>
> *Proposal*
> Instead of simply always picking the max size, we should pick the size of the file as it was right before.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
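The interaction of the two steps listed in HUDI-3279 can be illustrated with a small, self-contained sketch. This is one possible reading of the report, not Hudi's actual code: the class `FileSizeMergeSketch`, the methods `mergeAsDelta` and `pickMaxOnRollback`, and the sample sizes are all hypothetical and were chosen only to show how adding a "max" size on top of an already-stored size could yield roughly twice the on-disk size.

```java
public class FileSizeMergeSketch {

    // Hypothetical stand-in for step 1: a new record for an existing file is
    // treated as a delta, so its size is added to the size already stored.
    static long mergeAsDelta(long storedSize, long incomingSize) {
        return storedSize + incomingSize;
    }

    // Hypothetical stand-in for step 2: during rollback the larger of the
    // "before" and "after" sizes is picked.
    static long pickMaxOnRollback(long sizeBefore, long sizeAfter) {
        return Math.max(sizeBefore, sizeAfter);
    }

    public static void main(String[] args) {
        long sizeOnDisk = 1_000L;        // assumed size of the file in the file system
        long storedInMetadata = 1_000L;  // assumed: metadata starts out in sync

        // The rollback record carries the max size rather than a true delta ...
        long rollbackRecordSize = pickMaxOnRollback(sizeOnDisk, sizeOnDisk);
        // ... and is then merged additively into the stored size.
        long afterRestore = mergeAsDelta(storedInMetadata, rollbackRecordSize);

        System.out.println(afterRestore); // prints 2000, i.e. 2x the on-disk size
    }
}
```

Under these assumptions the stored size ends up at twice the size in the file system, which matches the symptom described in the issue; the real code paths are the ones linked in the report.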
[jira] [Updated] (HUDI-3318) Write RFC regarding proposed changes to the RecordPayload hierarchy
[ https://issues.apache.org/jira/browse/HUDI-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3318: -- Status: Patch Available (was: In Progress) > Write RFC regarding proposed changes to the RecordPayload hierarchy > --- > > Key: HUDI-3318 > URL: https://issues.apache.org/jira/browse/HUDI-3318 > Project: Apache Hudi > Issue Type: Task > Components: writer-core >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3337) ParquetUtils fails extracting Parquet Column Range Metadata
[ https://issues.apache.org/jira/browse/HUDI-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3337:
    Status: Patch Available (was: In Progress)

> ParquetUtils fails extracting Parquet Column Range Metadata
>
> Key: HUDI-3337
> URL: https://issues.apache.org/jira/browse/HUDI-3337
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.11.0
>
> [~manojpec] discovered following issue while testing MT flows, with {{TestHoodieBackedMetadata#testTableOperationsWithMetadataIndex}} failing with:
>
> {code:java}
> 17400 [Executor task launch worker for task 240] ERROR org.apache.hudi.metadata.HoodieTableMetadataUtil - Failed to read column stats for /var/folders/t7/kr69rlvx5rdd824m61zjqkjrgn/T/junit2402861080324269156/dataset/2016/03/15/44396fda-48db-4d10-9f47-275c39317115-0_0-101-234_003.parquet
> java.lang.ClassCastException: org.apache.parquet.io.api.Binary$ByteArrayBackedBinary cannot be cast to java.lang.Integer
>     at org.apache.hudi.common.util.ParquetUtils.convertToNativeJavaType(ParquetUtils.java:369)
>     at org.apache.hudi.common.util.ParquetUtils.lambda$null$2(ParquetUtils.java:305)
>     at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>     at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
>     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>     at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
>     at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
>     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
>     at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
>     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>     at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
>     at org.apache.hudi.common.util.ParquetUtils.readRangeFromParquetMetadata(ParquetUtils.java:313)
>     at org.apache.hudi.metadata.HoodieTableMetadataUtil.getColumnStats(HoodieTableMetadataUtil.java:878)
>     at org.apache.hudi.metadata.HoodieTableMetadataUtil.translateWriteStatToColumnStats(HoodieTableMetadataUtil.java:858)
>     at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$createColumnStatsFromWriteStats$7e2376a$1(HoodieTableMetadataUtil.java:819)
>     at org.apache.hudi.client.common.HoodieSparkEngineContext.lambda$flatMap$7d470b86$1(HoodieSparkEngineContext.java:134)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)
>     at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>     at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>     at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>     at scala.collection.AbstractIterator.to(Iterator.scala:1334)
>     at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>     at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
>     at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>     at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
>     at org.apache.spark.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot removed a comment on pull request #4716:
URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024798480

## CI report:

* 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot commented on pull request #4716:
URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024799152

## CI report:

* 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN
* 059ba5b6bc6ec995bf66538da8fa63bac494fd69 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4556: [HUDI-3191] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot removed a comment on pull request #4556:
URL: https://github.com/apache/hudi/pull/4556#issuecomment-1020724471

## CI report:

* 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN
* 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN
* 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN
* 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN
* c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN
* dc1b033f40cea9642f71b8fcff74aa82106726b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5484)
[GitHub] [hudi] hudi-bot commented on pull request #4556: [HUDI-3191] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
hudi-bot commented on pull request #4556:
URL: https://github.com/apache/hudi/pull/4556#issuecomment-1024799065

## CI report:

* 77d11131baabd1c4e3cc2050337daca4df5f6427 UNKNOWN
* 3d9c2ae28da858d1e8476052c99391015effb7db UNKNOWN
* 31b0669d7b638bd65a17b22a2ceb772f2627512c UNKNOWN
* 28a5a4f537544d35dfcd8700a7b97fb7216682ce UNKNOWN
* c09e228f7cce78a7dbbc394e93b3cf8c6c3d4d5f UNKNOWN
* dc1b033f40cea9642f71b8fcff74aa82106726b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5484)
* 5b8f5819fff8fec34864eb409fd429b95be17b9b UNKNOWN
[jira] [Updated] (HUDI-3322) Rollback Plan for Delta Commits constructed incorrectly
[ https://issues.apache.org/jira/browse/HUDI-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3322:
--
    Reviewers: sivabalan narayanan, Y Ethan Guo

> Rollback Plan for Delta Commits constructed incorrectly
> ---
>
> Key: HUDI-3322
> URL: https://issues.apache.org/jira/browse/HUDI-3322
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Diving deeper into HUDI-3279, I've realized that the root cause of the
> problem is that the Rollback Plan for Delta Commits is composed incorrectly
> for MOR tables. Consider the case below (we will continue to rely on the test
> {{TestHoodieSparkMergeOnReadTableRollback#testMORTableRestore}}):
> Hoodie Timeline:
> {code:java}
> alexey.kudinkin@alexeys-mbp junit5494198038159268501 % ls -la .hoodie
> total 400
> drwxr-xr-x 52 alexey.kudinkin staff 1664 Jan 25 13:08 .
> drwx-- 5 alexey.kudinkin staff 160 Jan 25 12:56 ..
> -rw-r--r-- 1 alexey.kudinkin staff 48 Jan 25 12:56 .001.deltacommit.crc
> -rw-r--r-- 1 alexey.kudinkin staff 28 Jan 25 12:56 .001.deltacommit.inflight.crc
> -rw-r--r-- 1 alexey.kudinkin staff 8 Jan 25 12:56 .001.deltacommit.requested.crc
> -rw-r--r-- 1 alexey.kudinkin staff 52 Jan 25 12:56 .002.deltacommit.crc
> -rw-r--r-- 1 alexey.kudinkin staff 48 Jan 25 12:56 .002.deltacommit.inflight.crc
> -rw-r--r-- 1 alexey.kudinkin staff 8 Jan 25 12:56 .002.deltacommit.requested.crc
> -rw-r--r-- 1 alexey.kudinkin staff 56 Jan 25 12:57 .003.deltacommit.crc
> -rw-r--r-- 1 alexey.kudinkin staff 48 Jan 25 12:57 .003.deltacommit.inflight.crc
> -rw-r--r-- 1 alexey.kudinkin staff 8 Jan 25 12:56 .003.deltacommit.requested.crc
> -rw-r--r-- 1 alexey.kudinkin staff 56 Jan 25 12:57 .004.deltacommit.crc
> -rw-r--r-- 1 alexey.kudinkin staff 48 Jan 25 12:57 .004.deltacommit.inflight.crc
> -rw-r--r-- 1 alexey.kudinkin staff 8 Jan 25 12:57 .004.deltacommit.requested.crc
> -rw-r--r-- 1 alexey.kudinkin staff 48 Jan 25 12:57 .005.commit.crc
> -rw-r--r-- 1 alexey.kudinkin staff 8 Jan 25 12:57 .005.compaction.inflight.crc
> -rw-r--r-- 1 alexey.kudinkin staff 28 Jan 25 12:57 .005.compaction.requested.crc
> -rw-r--r-- 1 alexey.kudinkin staff 52 Jan 25 12:57 .006.deltacommit.crc
> -rw-r--r-- 1 alexey.kudinkin staff 48 Jan 25 12:57 .006.deltacommit.inflight.crc
> -rw-r--r-- 1 alexey.kudinkin staff 8 Jan 25 12:57 .006.deltacommit.requested.crc
> -rw-r--r-- 1 alexey.kudinkin staff 52 Jan 25 12:57 .007.deltacommit.crc
> -rw-r--r-- 1 alexey.kudinkin staff 48 Jan 25 12:57 .007.deltacommit.inflight.crc
> -rw-r--r-- 1 alexey.kudinkin staff 8 Jan 25 12:57 .007.deltacommit.requested.crc
> -rw-r--r-- 1 alexey.kudinkin staff 8 Jan 25 13:08 .20220125130818473.restore.inflight.crc
> drwxr-xr-x 5 alexey.kudinkin staff 160 Jan 25 12:57 .aux
> -rw-r--r-- 1 alexey.kudinkin staff 12 Jan 25 12:56 .hoodie.properties.crc
> drwxr-xr-x 2 alexey.kudinkin staff 64 Jan 25 12:57 .temp
> -rw-r--r-- 1 alexey.kudinkin staff 4822 Jan 25 12:56 001.deltacommit
> -rw-r--r-- 1 alexey.kudinkin staff 2499 Jan 25 12:56 001.deltacommit.inflight
> -rw-r--r-- 1 alexey.kudinkin staff 0 Jan 25 12:56 001.deltacommit.requested
> -rw-r--r-- 1 alexey.kudinkin staff 5451 Jan 25 12:56 002.deltacommit
> -rw-r--r-- 1 alexey.kudinkin staff 4620 Jan 25 12:56 002.deltacommit.inflight
> -rw-r--r-- 1 alexey.kudinkin staff 0 Jan 25 12:56 002.deltacommit.requested
> -rw-r--r-- 1 alexey.kudinkin staff 5646 Jan 25 12:57 003.deltacommit
> -rw-r--r-- 1 alexey.kudinkin staff 4620 Jan 25 12:57 003.deltacommit.inflight
> -rw-r--r-- 1 alexey.kudinkin staff 0 Jan 25 12:56 003.deltacommit.requested
> -rw-r--r-- 1 alexey.kudinkin staff 5835 Jan 25 12:57 004.deltacommit
> -rw-r--r-- 1 alexey.kudinkin staff 4620 Jan 25 12:57 004.deltacommit.inflight
> -rw-r--r-- 1 alexey.kudinkin staff 0 Jan 25 12:57 004.deltacommit.requested
> -rw-r--r-- 1 alexey.kudinkin staff 4756 Jan 25 12:57 005.commit
> -rw-r--r-- 1 alexey.kudinkin staff 0 Jan 25 12:57 005.compaction.inflight
> -rw-r--r-- 1 alexey.kudinkin staff 2507 Jan 25 12:57 005.compaction.requested
> -rw-r--r-- 1 alexey.kudinkin staff 5362 Jan 25 12:57 006.deltacommit
> -rw-r--r-- 1 alexey.kudinkin staff 4620 Jan 25 12:57 006.deltacommit.inflight
> -rw-r--r-- 1 alexey.kudinkin staff 0 Jan 25 12:57 006.deltacommit.requested
> -rw-r--r-- 1 alexey.kudinkin staff 5551 Jan 25 12:57
[jira] [Updated] (HUDI-3322) Rollback Plan for Delta Commits constructed incorrectly
[ https://issues.apache.org/jira/browse/HUDI-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3322:
--
    Status: Patch Available  (was: In Progress)
[jira] [Updated] (HUDI-3343) Metadata Table includes Uncommitted Log Files during Bootstrap
[ https://issues.apache.org/jira/browse/HUDI-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3343:
--
    Status: Patch Available  (was: In Progress)

> Metadata Table includes Uncommitted Log Files during Bootstrap
> --
>
> Key: HUDI-3343
> URL: https://issues.apache.org/jira/browse/HUDI-3343
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Fix For: 0.11.0
>
>
> While working on a fix for HUDI-3322, I discovered the following issue:
> If we bootstrap the MT while a Rollback operation is pending (this can
> happen when the previous writer had the MT *disabled* when writing the data),
> then, because bootstrapping is done _after_ the Rollback is executed (with its
> side effects already reflected on the FS), bootstrapping incorrectly includes
> intermediary files created by the Rollback (such as log files created with a
> Rollback Command Block appended).
>
> Filtering of the files is performed here:
> https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java#L752
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
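The filtering idea in the HUDI-3343 description above can be illustrated with a minimal sketch. It assumes each listed file can be attributed to the instant that wrote it, and it keeps only files whose instant has completed, so files written by a still-pending Rollback are excluded. The class `BootstrapFileFilter`, the type `ListedFile`, and the method `committedFileNames` are hypothetical names for illustration; they are not Hudi classes, and the message does not show how Hudi actually performs this filtering.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Illustrative only: none of these types are part of Hudi. */
public class BootstrapFileFilter {

    /** Hypothetical view of a listed file: its name and the instant that wrote it. */
    public static final class ListedFile {
        final String name;
        final String instantTime;

        public ListedFile(String name, String instantTime) {
            this.name = name;
            this.instantTime = instantTime;
        }
    }

    /**
     * Returns the names of files whose writing instant is in the completed set.
     * Files written by pending instants (for example, an in-flight rollback) are dropped.
     */
    public static List<String> committedFileNames(List<ListedFile> listing,
                                                  Set<String> completedInstants) {
        return listing.stream()
                .filter(f -> completedInstants.contains(f.instantTime))
                .map(f -> f.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed example data: instant "008" stands for a pending rollback.
        List<ListedFile> listing = Arrays.asList(
                new ListedFile("base-file-a", "001"),
                new ListedFile("log-file-b", "002"),
                new ListedFile("log-file-with-rollback-block", "008"));
        Set<String> completed = new HashSet<>(Arrays.asList("001", "002"));

        System.out.println(committedFileNames(listing, completed));
    }
}
```

Filtering against the set of completed instants, rather than against whatever is currently on the file system, is the design choice being illustrated: the listing may already reflect side effects of an operation that has not yet been committed.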
[jira] [Updated] (HUDI-3343) Metadata Table includes Uncommitted Log Files during Bootstrap
[ https://issues.apache.org/jira/browse/HUDI-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3343:
--
    Reviewers: sivabalan narayanan, Y Ethan Guo
[jira] [Updated] (HUDI-3343) Metadata Table includes Uncommitted Log Files during Bootstrap
[ https://issues.apache.org/jira/browse/HUDI-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3343:
--
    Status: In Progress  (was: Open)
[GitHub] [hudi] hudi-bot commented on pull request #4716: [HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issues
hudi-bot commented on pull request #4716:
URL: https://github.com/apache/hudi/pull/4716#issuecomment-1024798480

## CI report:

* 62b022ec6642cef208e7fb370f95ed0ed1c5ba83 UNKNOWN
[jira] [Updated] (HUDI-3322) Rollback Plan for Delta Commits constructed incorrectly
[ https://issues.apache.org/jira/browse/HUDI-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-3322:
-
    Labels: pull-request-available  (was: )