[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer
hudi-bot commented on PR #9517: URL: https://github.com/apache/hudi/pull/9517#issuecomment-1712432849 ## CI report: * 6628c5285d60b62c44d928eacb67507aab68d5ed Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19577) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9538: [HUDI-6738] - Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource
hudi-bot commented on PR #9538: URL: https://github.com/apache/hudi/pull/9538#issuecomment-1712431281 ## CI report: * 4d0957dc91e137aa6f2302619a60a317001a7017 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19772)
[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer
hudi-bot commented on PR #9517: URL: https://github.com/apache/hudi/pull/9517#issuecomment-1712431262 ## CI report: * 6628c5285d60b62c44d928eacb67507aab68d5ed UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution
hudi-bot commented on PR #9482: URL: https://github.com/apache/hudi/pull/9482#issuecomment-1712431248 ## CI report: * 51c9a626b176d644b6294a57b3d59d70103a892f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19769)
[GitHub] [hudi] nsivabalan commented on a diff in pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries
nsivabalan commented on code in PR #9611: URL: https://github.com/apache/hudi/pull/9611#discussion_r1320485055 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/FlinkTaskContextSupplier.java: ## @@ -62,4 +62,9 @@ public Option getProperty(EngineProperty prop) { return Option.empty(); } + @Override + public Supplier getAttemptNumberSupplier() { +return () -> -1; Review Comment: Yes, we have disabled it for Flink as of now. In Java, anyway, MOR is not fully functional from what I know, but I am open to disabling it for Java as well. It is mainly an issue for ExpressionPayload and any other custom payloads; most of the other payloads are idempotent even if there are duplicate log blocks.
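The remark about idempotent payloads can be illustrated with a toy model. This is only a sketch and not Hudi's payload API: `DuplicateLogBlockDemo`, `applyBlocks`, `OVERWRITE`, and `INCREMENT` are hypothetical names. It replays a list of log block values in which a retried task has written the last block twice. An overwrite-style merge gives the same result either way, while an accumulate-style merge (standing in for a non-idempotent custom payload) double counts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Toy model only: these names are hypothetical and are not Hudi classes.
final class DuplicateLogBlockDemo {

  // Overwrite-style merge: the newest value wins, so replaying a block is harmless.
  static final BinaryOperator<Integer> OVERWRITE = (current, incoming) -> incoming;

  // Accumulate-style merge: replaying a block changes the result.
  static final BinaryOperator<Integer> INCREMENT = (current, incoming) -> current + incoming;

  // Folds every log block value into the current record value, in order.
  static int applyBlocks(int base, List<Integer> blocks, BinaryOperator<Integer> merge) {
    int current = base;
    for (int block : blocks) {
      current = merge.apply(current, block);
    }
    return current;
  }

  public static void main(String[] args) {
    List<Integer> clean = Arrays.asList(5, 7);
    List<Integer> withRetryDuplicate = Arrays.asList(5, 7, 7); // last block written twice

    System.out.println(applyBlocks(0, clean, OVERWRITE));              // 7
    System.out.println(applyBlocks(0, withRetryDuplicate, OVERWRITE)); // 7
    System.out.println(applyBlocks(0, clean, INCREMENT));              // 12
    System.out.println(applyBlocks(0, withRetryDuplicate, INCREMENT)); // 19
  }
}
```

The two `INCREMENT` results differ, which is the kind of divergence that makes spurious log blocks a correctness problem for non-idempotent payloads only.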
[GitHub] [hudi] nsivabalan commented on a diff in pull request #9667: [HUDI-6836] Delta streamer: close metrics for metadata table
nsivabalan commented on code in PR #9667: URL: https://github.com/apache/hudi/pull/9667#discussion_r1320486278 ## hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java: ## @@ -726,10 +734,9 @@ public void testBulkInsertsAndUpsertsWithBootstrap(HoodieRecordType recordType) cfg.configs.add("hoodie.datasource.write.hive_style_partitioning=true"); cfg.configs.add("hoodie.bootstrap.parallelism=5"); cfg.targetBasePath = newDatasetBasePath; -new HoodieDeltaStreamer(cfg, jsc).sync(); +HoodieDeltaStreamer ds = new HoodieDeltaStreamer(cfg, jsc); +ds.sync(); Review Comment: Do you think we can add a try/finally and move ds.shutdownGracefully into the finally block? ## hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamerMetrics.java: ## @@ -157,6 +159,11 @@ public void shutdown() { if (metrics != null) { metrics.shutdown(); } +// if metadata table is enabled, make sure to shut down the metrics for that table as well +if (writeConfig.getMetadataConfig().enabled()) { + HoodieWriteConfig metadataWriteConfig = HoodieMetadataWriteUtils.createMetadataWriteConfig(writeConfig, HoodieFailedWritesCleaningPolicy.LAZY); Review Comment: A better option is probably to close the metrics within WriteClient when the write client is closed. While we are at it, can you also take care of the following? We also have HoodieMetadataMetrics within HoodieBackedTableMetadataWriter, which also needs to be closed when the MetadataWriter is closed.
## hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java: ## @@ -780,10 +788,12 @@ public void testModifiedTableConfigs() throws Exception { } private void syncAndAssertRecordCount(HoodieDeltaStreamer.Config cfg, Integer expected, String tableBasePath, String metadata, Integer totalCommits) throws Exception { -new HoodieDeltaStreamer(cfg, jsc).sync(); +HoodieDeltaStreamer ds = new HoodieDeltaStreamer(cfg, jsc); +ds.sync(); TestHelpers.assertRecordCount(expected, tableBasePath, sqlContext); TestHelpers.assertDistanceCount(expected, tableBasePath, sqlContext); TestHelpers.assertCommitMetadata(metadata, tableBasePath, fs, totalCommits); +ds.shutdownGracefully(); Review Comment: Same suggestion as above.
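The try/finally suggestion in this review could look roughly like the sketch below. It assumes a stand-in class rather than the real `HoodieDeltaStreamer`: `FakeStreamer`, `syncAndAssert`, and `TryFinallyShutdownDemo` are hypothetical names used only for illustration. The point is that the shutdown call runs even when `sync()` or a later assertion throws.

```java
// Hypothetical stand-in for a streamer-like object; not the real HoodieDeltaStreamer.
final class TryFinallyShutdownDemo {

  static final class FakeStreamer {
    private final boolean failOnSync;
    boolean shutDown = false;

    FakeStreamer(boolean failOnSync) {
      this.failOnSync = failOnSync;
    }

    void sync() {
      if (failOnSync) {
        throw new IllegalStateException("sync failed");
      }
    }

    void shutdownGracefully() {
      shutDown = true;
    }
  }

  // Runs sync plus the follow-up checks, and always shuts the streamer down afterwards.
  static void syncAndAssert(FakeStreamer ds, Runnable assertions) {
    try {
      ds.sync();
      assertions.run();
    } finally {
      ds.shutdownGracefully();
    }
  }

  public static void main(String[] args) {
    FakeStreamer failing = new FakeStreamer(true);
    try {
      syncAndAssert(failing, () -> { });
    } catch (IllegalStateException expected) {
      // The failure still propagates to the caller.
    }
    System.out.println("shut down after failure: " + failing.shutDown);
  }
}
```

With the shutdown call placed after the assertions instead, as in the diff above, a failed assertion would skip it and leave the streamer's resources open for the rest of the test run.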
[GitHub] [hudi] hudi-bot commented on pull request #9669: [HUDI-6838] Fix file writers to honor bloom filter configs
hudi-bot commented on PR #9669: URL: https://github.com/apache/hudi/pull/9669#issuecomment-1712420254 ## CI report: * 718f6b15ca75566c6ea6188e0ce98c45ab3a1732 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19767)
[GitHub] [hudi] hudi-bot commented on pull request #9668: [HUDI-6839] Github actions improvements
hudi-bot commented on PR #9668: URL: https://github.com/apache/hudi/pull/9668#issuecomment-1712420242 ## CI report: * 259298768a177b37bfbb304becb104a278fbbdb2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19766)
[GitHub] [hudi] hudi-bot commented on pull request #9667: [HUDI-6836] Delta streamer: close metrics for metadata table
hudi-bot commented on PR #9667: URL: https://github.com/apache/hudi/pull/9667#issuecomment-1712412336 ## CI report: * 098db07e60f504b10dc861dca14c97f852507a0d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19765)
[GitHub] [hudi] hudi-bot commented on pull request #9618: [HUDI-6753] Fix parquet inline reading flaky test
hudi-bot commented on PR #9618: URL: https://github.com/apache/hudi/pull/9618#issuecomment-1712410973 ## CI report: * 64f49a57daad6a7927182f97a8219bc20f20df11 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19774) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19771)
[GitHub] [hudi] hudi-bot commented on pull request #9538: [HUDI-6738] - Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource
hudi-bot commented on PR #9538: URL: https://github.com/apache/hudi/pull/9538#issuecomment-1712410900 ## CI report: * e23804783126d93786c26ba43c3ec8f003bb977e Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19755) * 4d0957dc91e137aa6f2302619a60a317001a7017 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19772)
[GitHub] [hudi] hudi-bot commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries
hudi-bot commented on PR #9611: URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712410962 ## CI report: * 375c15de065dc9244a458b2324832e1f2b0f8bf9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19773) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19770)
[GitHub] [hudi] hudi-bot commented on pull request #9618: [HUDI-6753] Fix parquet inline reading flaky test
hudi-bot commented on PR #9618: URL: https://github.com/apache/hudi/pull/9618#issuecomment-1712409627 ## CI report: * ad2e4fda3ffb21219ba4101fdc7c331572524a83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19656) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19753) * 64f49a57daad6a7927182f97a8219bc20f20df11 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19771) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19774)
[GitHub] [hudi] hudi-bot commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries
hudi-bot commented on PR #9611: URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712409616 ## CI report: * e86aa76be3833bbcec446d3ed65e05d4b7a94049 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19759) * 375c15de065dc9244a458b2324832e1f2b0f8bf9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19770) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19773)
[GitHub] [hudi] Zouxxyy commented on pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…
Zouxxyy commented on PR #9554: URL: https://github.com/apache/hudi/pull/9554#issuecomment-1712404491 Here is the error from the integration tests. I don't know much about the integration-testing environment; can anyone help? ``` 2023-09-08T05:11:59.7764700Z Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.SelfDescribingInputFormatInterface 2023-09-08T05:11:59.7764906Z at java.net.URLClassLoader.findClass(URLClassLoader.java:382) 2023-09-08T05:11:59.7765092Z at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 2023-09-08T05:11:59.7765284Z at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 2023-09-08T05:11:59.7765373Z ... 58 more 2023-09-08T05:11:59.7765560Z 23/09/08 05:11:59 INFO util.ShutdownHookManager: Shutdown hook called 2023-09-08T05:11:59.7766126Z 23/09/08 05:11:59 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-b81218a3-32e6-4851-9b25-b15373acd05b 2023-09-08T05:11:59.7766507Z 23/09/08 05:11:59 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-9b58a267-201d-4404-baeb-49e617b23ad1 2023-09-08T05:11:59.7766647Z 2023-09-08T05:11:59.7766919Z Sep 08, 2023 5:11:59 AM org.glassfish.jersey.internal.Errors logErrors 2023-09-08T05:11:59.7767534Z WARNING: The following warnings have been detected: WARNING: Cannot create new registration for component type class com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider: Existing previous registration found for the type. 2023-09-08T05:11:59.7767548Z 2023-09-08T05:11:59.7768090Z [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 96.75 s <<< FAILURE! - in org.apache.hudi.integ.command.ITTestHoodieSyncCommand ```
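One way to narrow down a `ClassNotFoundException` like the one in this trace is to probe the classpath of the failing process for the class it names. The following is a minimal sketch, assuming it is run with the same classpath as the integration test; `ClasspathProbe` and `isPresent` are hypothetical names and not part of Hudi or Hive.

```java
// Hypothetical diagnostic helper; not part of Hudi or Hive.
final class ClasspathProbe {

  // Returns true if the named class can be located by the given loader, without initializing it.
  static boolean isPresent(String className, ClassLoader loader) {
    try {
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    ClassLoader loader = ClasspathProbe.class.getClassLoader();
    String name = "org.apache.hadoop.hive.ql.io.SelfDescribingInputFormatInterface";
    System.out.println(name + " present: " + isPresent(name, loader));
  }
}
```

If the probe reports the class as absent, the Hive jars visible to that environment do not contain the interface, which points at the environment's dependency versions and not at the code under test.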
[GitHub] [hudi] hudi-bot commented on pull request #9538: [HUDI-6738] - Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource
hudi-bot commented on PR #9538: URL: https://github.com/apache/hudi/pull/9538#issuecomment-1712402903 ## CI report: * e23804783126d93786c26ba43c3ec8f003bb977e Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19755) * 4d0957dc91e137aa6f2302619a60a317001a7017 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution
hudi-bot commented on PR #9482: URL: https://github.com/apache/hudi/pull/9482#issuecomment-1712402873 ## CI report: * Unknown: [CANCELED](TBD) * 51c9a626b176d644b6294a57b3d59d70103a892f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19769)
[GitHub] [hudi] nsivabalan commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries
nsivabalan commented on PR #9611: URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712401788 @hudi-bot run azure
[GitHub] [hudi] nsivabalan commented on pull request #9618: [HUDI-6753] Fix parquet inline reading flaky test
nsivabalan commented on PR #9618: URL: https://github.com/apache/hudi/pull/9618#issuecomment-1712401748 @hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #9618: [HUDI-6753] Fix parquet inline reading flaky test
hudi-bot commented on PR #9618: URL: https://github.com/apache/hudi/pull/9618#issuecomment-1712401396 ## CI report: * ad2e4fda3ffb21219ba4101fdc7c331572524a83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19656) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19753) * 64f49a57daad6a7927182f97a8219bc20f20df11 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9666: [HUDI-6834] Fixing time travel queries when overlaps with cleaner and archival time window
hudi-bot commented on PR #9666: URL: https://github.com/apache/hudi/pull/9666#issuecomment-1712401455 ## CI report: * 9eb5b9774aff4779e33feac92a513a9543be9752 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19764)
[GitHub] [hudi] hudi-bot commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries
hudi-bot commented on PR #9611: URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712401387 ## CI report: * e86aa76be3833bbcec446d3ed65e05d4b7a94049 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19759) * 375c15de065dc9244a458b2324832e1f2b0f8bf9 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution
hudi-bot commented on PR #9482: URL: https://github.com/apache/hudi/pull/9482#issuecomment-1712401316 ## CI report: * Unknown: [CANCELED](TBD) * 51c9a626b176d644b6294a57b3d59d70103a892f UNKNOWN
[GitHub] [hudi] nsivabalan commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution
nsivabalan commented on PR #9482: URL: https://github.com/apache/hudi/pull/9482#issuecomment-1712400125 @hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #9669: [HUDI-6838] Fix file writers to honor bloom filter configs
hudi-bot commented on PR #9669: URL: https://github.com/apache/hudi/pull/9669#issuecomment-171232 ## CI report: * 718f6b15ca75566c6ea6188e0ce98c45ab3a1732 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19767)
[GitHub] [hudi] hudi-bot commented on pull request #9668: [HUDI-6839] Github actions improvements
hudi-bot commented on PR #9668: URL: https://github.com/apache/hudi/pull/9668#issuecomment-1712399989 ## CI report: * 259298768a177b37bfbb304becb104a278fbbdb2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19766)
[hudi] branch master updated: [HUDI-6820] Fixing CI stability issues (#9661)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 2acaa752db2 [HUDI-6820] Fixing CI stability issues (#9661) 2acaa752db2 is described below commit 2acaa752db2ff1664d82f12ea54d4a76fba05e21 Author: Lokesh Jain AuthorDate: Sat Sep 9 08:43:29 2023 +0530 [HUDI-6820] Fixing CI stability issues (#9661) - We face frequent flakiness around 2 modules (hudi-hadoop-mr and hudi-java-client). so, moving them out to github actions from azure CI. - Added explicit timeouts for few of deltastreamer continuous tests so that those fail instead of timing out. - Co-authored-by: sivabalan --- .github/workflows/bot.yml | 32 ++ azure-pipelines-20230430.yml | 2 ++ .../deltastreamer/TestHoodieDeltaStreamer.java | 5 3 files changed, 39 insertions(+) diff --git a/.github/workflows/bot.yml b/.github/workflows/bot.yml index 0811c828e49..acd51b8e123 100644 --- a/.github/workflows/bot.yml +++ b/.github/workflows/bot.yml @@ -112,6 +112,38 @@ jobs: run: mvn test -Pfunctional-tests -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -pl "$SPARK_COMMON_MODULES,$SPARK_MODULES" $MVN_ARGS + test-hudi-hadoop-mr-and-hudi-java-client: +runs-on: ubuntu-latest +strategy: + matrix: +include: + - scalaProfile: "scala-2.12" +sparkProfile: "spark3.2" +flinkProfile: "flink1.17" + +steps: + - uses: actions/checkout@v3 + - name: Set up JDK 8 +uses: actions/setup-java@v3 +with: + java-version: '8' + distribution: 'adopt' + architecture: x64 + - name: Build Project +env: + SCALA_PROFILE: ${{ matrix.scalaProfile }} + SPARK_PROFILE: ${{ matrix.sparkProfile }} + FLINK_PROFILE: ${{ matrix.flinkProfile }} +run: + mvn clean install -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -D"FLINK_PROFILE" -DskipTests=true -Phudi-platform-service $MVN_ARGS + - name: UT - hudi-hadoop-mr and hudi-client/hudi-java-client +env: + SCALA_PROFILE: ${{ matrix.scalaProfile }} 
+ SPARK_PROFILE: ${{ matrix.sparkProfile }} + FLINK_PROFILE: ${{ matrix.flinkProfile }} +run: + mvn test -Punit-tests -fae -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -D"FLINK_PROFILE" -pl hudi-hadoop-mr,hudi-client/hudi-java-client $MVN_ARGS + test-spark-java17: runs-on: ubuntu-latest strategy: diff --git a/azure-pipelines-20230430.yml b/azure-pipelines-20230430.yml index 2da5ab0d4f9..25a149b5cf4 100644 --- a/azure-pipelines-20230430.yml +++ b/azure-pipelines-20230430.yml @@ -53,6 +53,8 @@ parameters: - name: job4UTModules type: object default: + - '!hudi-hadoop-mr' + - '!hudi-client/hudi-java-client' - '!hudi-client/hudi-spark-client' - '!hudi-common' - '!hudi-examples' diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java index 6324fb83fc9..2a7db25647e 100644 --- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java +++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java @@ -120,6 +120,7 @@ import org.apache.spark.sql.types.StructField; import org.junit.jupiter.api.Assertions; import org.junit.jupiter.api.Disabled; import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.Timeout; import org.junit.jupiter.params.ParameterizedTest; import org.junit.jupiter.params.provider.Arguments; import org.junit.jupiter.params.provider.CsvSource; @@ -869,6 +870,7 @@ public class TestHoodieDeltaStreamer extends HoodieDeltaStreamerTestBase { defaultSchemaProviderClassName = FilebasedSchemaProvider.class.getName(); } + @Timeout(600) @ParameterizedTest @EnumSource(value = HoodieRecordType.class, names = {"AVRO", "SPARK"}) public void testUpsertsCOWContinuousMode(HoodieRecordType recordType) throws Exception { @@ -892,12 +894,14 @@ public class TestHoodieDeltaStreamer extends HoodieDeltaStreamerTestBase { 
UtilitiesTestBase.Helpers.deleteFileFromDfs(fs, tableBasePath); } + @Timeout(600) @ParameterizedTest @EnumSource(value = HoodieRecordType.class, names = {"AVRO"}) public void testUpsertsMORContinuousModeShutdownGracefully(HoodieRecordType recordType) throws Exception { testUpsertsContinuousMode(HoodieTableType.MERGE_ON_READ, "continuous_cow", true, recordType); } + @Timeout(600) @ParameterizedTest @EnumSource(value = HoodieRecordType.class, names = {"AVRO", "SPARK"}) public void testUpsertsMORContin
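The `@Timeout(600)` annotations added in this commit bound each continuous-mode test; in JUnit 5 the annotation's default unit is seconds, so 600 is ten minutes. The same fail-fast idea can be sketched in plain Java without JUnit, which may help when reasoning about what happens to a hung test. `TimeoutGuardDemo` and `runWithTimeout` are hypothetical names for illustration and are not taken from the Hudi code base or from JUnit's implementation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper showing a fail-fast time limit around a task.
final class TimeoutGuardDemo {

  // Runs the task on a worker thread and throws TimeoutException if it exceeds the limit.
  static <T> T runWithTimeout(Callable<T> task, long limit, TimeUnit unit) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<T> future = executor.submit(task);
    try {
      return future.get(limit, unit);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck task
      throw e;
    } catch (ExecutionException e) {
      // Surface the task's own failure where possible.
      Throwable cause = e.getCause();
      if (cause instanceof Exception) {
        throw (Exception) cause;
      }
      throw e;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runWithTimeout(() -> "done", 5, TimeUnit.SECONDS));
    try {
      runWithTimeout(() -> { Thread.sleep(5_000); return "never"; }, 100, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      System.out.println("timed out");
    }
  }
}
```

A per-test limit like this turns a hang into a single attributable test failure, where otherwise the whole CI job would run until the pipeline-level timeout.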
[GitHub] [hudi] nsivabalan merged pull request #9661: [HUDI-6820] Fixing CI stability issues
nsivabalan merged PR #9661: URL: https://github.com/apache/hudi/pull/9661
[GitHub] [hudi] nsivabalan commented on pull request #9661: [HUDI-6820] Fixing CI stability issues
nsivabalan commented on PR #9661: URL: https://github.com/apache/hudi/pull/9661#issuecomment-1712397965 https://github.com/apache/hudi/assets/513218/2007cdde-0353-4281-9bd8-94b8b35aea85
[GitHub] [hudi] hudi-bot commented on pull request #9669: [HUDI-6838] Fix file writers to honor bloom filter configs
hudi-bot commented on PR #9669: URL: https://github.com/apache/hudi/pull/9669#issuecomment-1712391641 ## CI report: * 718f6b15ca75566c6ea6188e0ce98c45ab3a1732 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9668: [HUDI-6839] Github actions improvements
hudi-bot commented on PR #9668: URL: https://github.com/apache/hudi/pull/9668#issuecomment-1712391635 ## CI report: * 259298768a177b37bfbb304becb104a278fbbdb2 UNKNOWN
[GitHub] [hudi] the-other-tim-brown commented on pull request #9650: [HUDI-6831] Add back missing project_id to query statement in BigQuerySyncTool
the-other-tim-brown commented on PR #9650: URL: https://github.com/apache/hudi/pull/9650#issuecomment-1712390246 > @the-other-tim-brown Are you interested in reviewing this PR? LGTM
[jira] [Updated] (HUDI-6838) Fix file writers to honor bloom filter configs
[ https://issues.apache.org/jira/browse/HUDI-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6838: - Labels: pull-request-available (was: ) > Fix file writers to honor bloom filter configs > -- > > Key: HUDI-6838 > URL: https://issues.apache.org/jira/browse/HUDI-6838 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.14.0 > > > Bloom filter configs are hard-coded in `HoodieFileWriterFactory` > {code:java} > protected BloomFilter createBloomFilter(HoodieConfig config) { > return BloomFilterFactory.createBloomFilter(6, 0.1, 10, > BloomFilterTypeCode.DYNAMIC_V0.name()); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] yihua opened a new pull request, #9669: [HUDI-6838] Fix file writers to honor bloom filter configs
yihua opened a new pull request, #9669: URL: https://github.com/apache/hudi/pull/9669 ### Change Logs This fixes the Hudi file writers to honor bloom filter configs. Before this fix, the bloom filter parameters were hard-coded in `HoodieFileWriterFactory`. Because `HoodieFileWriterFactory` is in the `hudi-common` module, the bloom filter-related configs are moved from `HoodieIndexConfig` (in the `hudi-client-common` module) to `HoodieStorageConfig` (in the `hudi-common` module) so that the factory can reference them. ### Impact Makes sure bloom filter configs take effect. ### Risk level low ### Documentation Update We need to add a regression note. ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
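To illustrate the change described in this PR, here is a minimal sketch of resolving bloom filter parameters from configuration rather than from literals. It is not the code from the PR: the class `BloomFilterSettings` and the `example.bloom.*` keys are hypothetical names chosen for this sketch, and the fallback values simply repeat the literals as they appear in the snippet quoted in HUDI-6838.

```java
import java.util.Properties;

/** Illustrative only: resolves bloom filter parameters from configuration. */
final class BloomFilterSettings {
    // Hypothetical keys for this sketch; they are not claimed to be Hudi's config keys.
    static final String NUM_ENTRIES_KEY = "example.bloom.num_entries";
    static final String FPP_KEY = "example.bloom.fpp";
    static final String DYNAMIC_MAX_ENTRIES_KEY = "example.bloom.dynamic.max_entries";
    static final String TYPE_KEY = "example.bloom.type";

    final int numEntries;
    final double falsePositiveRate;
    final int dynamicMaxEntries;
    final String typeCode;

    private BloomFilterSettings(int numEntries, double falsePositiveRate,
                                int dynamicMaxEntries, String typeCode) {
        this.numEntries = numEntries;
        this.falsePositiveRate = falsePositiveRate;
        this.dynamicMaxEntries = dynamicMaxEntries;
        this.typeCode = typeCode;
    }

    /** Reads each parameter from props and falls back to a default only when the key is absent. */
    static BloomFilterSettings fromProperties(Properties props) {
        return new BloomFilterSettings(
            Integer.parseInt(props.getProperty(NUM_ENTRIES_KEY, "6")),
            Double.parseDouble(props.getProperty(FPP_KEY, "0.1")),
            Integer.parseInt(props.getProperty(DYNAMIC_MAX_ENTRIES_KEY, "10")),
            props.getProperty(TYPE_KEY, "DYNAMIC_V0"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(NUM_ENTRIES_KEY, "50000"); // user override takes effect
        BloomFilterSettings settings = fromProperties(props);
        System.out.println(settings.numEntries + " " + settings.falsePositiveRate
            + " " + settings.dynamicMaxEntries + " " + settings.typeCode);
    }
}
```

The point of the sketch is only that a user-supplied value overrides the fallback; which module owns the config class is a separate concern that the PR addresses by moving the configs.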
[jira] [Updated] (HUDI-6839) Github Actions Workflow Improvements
[ https://issues.apache.org/jira/browse/HUDI-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6839: - Labels: pull-request-available (was: ) > Github Actions Workflow Improvements > > > Key: HUDI-6839 > URL: https://issues.apache.org/jira/browse/HUDI-6839 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Major > Labels: pull-request-available > > # Leverage maven cache option for build speed > # Use parallel build when packaging jars for tests > # Cancel inflight tests when updates to branches are pushed to save on costs -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] the-other-tim-brown opened a new pull request, #9668: [HUDI-6839] Github actions improvements
the-other-tim-brown opened a new pull request, #9668: URL: https://github.com/apache/hudi/pull/9668 ### Change Logs - Cancel running tests when a PR is updated to save on cost of maintaining the project - Use maven cache option for setup java https://github.com/actions/setup-java#caching-packages-dependencies - Use maven build parallelism when running `install` with `-DskipTests` to build modules in parallel ### Impact Improve CI times ### Risk level (write none, low medium or high below) None ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #9664: [HUDI-6835] Adjust spark sql core flow test scenarios
hudi-bot commented on PR #9664: URL: https://github.com/apache/hudi/pull/9664#issuecomment-1712388169 ## CI report: * 3d758a3723cc125009b6129f526c5df1d35bbf8d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19761)
[GitHub] [hudi] danny0405 commented on pull request #9650: [HUDI-6831] Add back missing project_id to query statement in BigQuerySyncTool
danny0405 commented on PR #9650: URL: https://github.com/apache/hudi/pull/9650#issuecomment-1712388181 @the-other-tim-brown Are you interested in reviewing this PR?
[GitHub] [hudi] danny0405 commented on a diff in pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries
danny0405 commented on code in PR #9611: URL: https://github.com/apache/hudi/pull/9611#discussion_r1320459648 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/FlinkTaskContextSupplier.java: ## @@ -62,4 +62,9 @@ public Option getProperty(EngineProperty prop) { return Option.empty(); } + @Override + public Supplier getAttemptNumberSupplier() { +return () -> -1; Review Comment: > only updates go to log files. That is only true for Spark, so you are fixing a bug that depends on a Spark write characteristic.
[jira] [Assigned] (HUDI-6839) Github Actions Workflow Improvements
[ https://issues.apache.org/jira/browse/HUDI-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Brown reassigned HUDI-6839: --- Assignee: Timothy Brown > Github Actions Workflow Improvements > > > Key: HUDI-6839 > URL: https://issues.apache.org/jira/browse/HUDI-6839 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Major > > # Leverage maven cache option for build speed > # Use parallel build when packaging jars for tests > # Cancel inflight tests when updates to branches are pushed to save on costs -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6839) Github Actions Workflow Improvements
Timothy Brown created HUDI-6839: --- Summary: Github Actions Workflow Improvements Key: HUDI-6839 URL: https://issues.apache.org/jira/browse/HUDI-6839 Project: Apache Hudi Issue Type: Improvement Reporter: Timothy Brown # Leverage maven cache option for build speed # Use parallel build when packaging jars for tests # Cancel inflight tests when updates to branches are pushed to save on costs -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6838) Fix file writers to honor bloom filter configs
[ https://issues.apache.org/jira/browse/HUDI-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6838: Priority: Blocker (was: Major) > Fix file writers to honor bloom filter configs > -- > > Key: HUDI-6838 > URL: https://issues.apache.org/jira/browse/HUDI-6838 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Priority: Blocker > Fix For: 0.14.0 > > > Bloom filter configs are hard-coded in `HoodieFileWriterFactory` > {code:java} > protected BloomFilter createBloomFilter(HoodieConfig config) { > return BloomFilterFactory.createBloomFilter(6, 0.1, 10, > BloomFilterTypeCode.DYNAMIC_V0.name()); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6838) Fix file writers to honor bloom filter configs
[ https://issues.apache.org/jira/browse/HUDI-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6838: Description: Bloom filter configs are hard-coded in `HoodieFileWriterFactory` {code:java} protected BloomFilter createBloomFilter(HoodieConfig config) { return BloomFilterFactory.createBloomFilter(6, 0.1, 10, BloomFilterTypeCode.DYNAMIC_V0.name()); } {code} > Fix file writers to honor bloom filter configs > -- > > Key: HUDI-6838 > URL: https://issues.apache.org/jira/browse/HUDI-6838 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Priority: Major > Fix For: 0.14.0 > > > Bloom filter configs are hard-coded in `HoodieFileWriterFactory` > {code:java} > protected BloomFilter createBloomFilter(HoodieConfig config) { > return BloomFilterFactory.createBloomFilter(6, 0.1, 10, > BloomFilterTypeCode.DYNAMIC_V0.name()); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-6838) Fix file writers to honor bloom filter configs
[ https://issues.apache.org/jira/browse/HUDI-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-6838: --- Assignee: Ethan Guo > Fix file writers to honor bloom filter configs > -- > > Key: HUDI-6838 > URL: https://issues.apache.org/jira/browse/HUDI-6838 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.14.0 > > > Bloom filter configs are hard-coded in `HoodieFileWriterFactory` > {code:java} > protected BloomFilter createBloomFilter(HoodieConfig config) { > return BloomFilterFactory.createBloomFilter(6, 0.1, 10, > BloomFilterTypeCode.DYNAMIC_V0.name()); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6838) Fix file writers to honor bloom filter configs
[ https://issues.apache.org/jira/browse/HUDI-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6838: Fix Version/s: 0.14.0 > Fix file writers to honor bloom filter configs > -- > > Key: HUDI-6838 > URL: https://issues.apache.org/jira/browse/HUDI-6838 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Priority: Major > Fix For: 0.14.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6838) Fix file writers to honor bloom filter configs
Ethan Guo created HUDI-6838: --- Summary: Fix file writers to honor bloom filter configs Key: HUDI-6838 URL: https://issues.apache.org/jira/browse/HUDI-6838 Project: Apache Hudi Issue Type: Bug Reporter: Ethan Guo -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #9661: [HUDI-6820] Fixing CI stability issues
hudi-bot commented on PR #9661: URL: https://github.com/apache/hudi/pull/9661#issuecomment-1712374663 ## CI report: * 1932225fb7c7c1d6b9518fbd3b455050002ccc63 UNKNOWN * ef9d12359f8d84c024812c262843e0a6699c4540 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19760) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19758)
[GitHub] [hudi] hudi-bot commented on pull request #9667: [HUDI-6836] Delta streamer: close metrics for metadata table
hudi-bot commented on PR #9667: URL: https://github.com/apache/hudi/pull/9667#issuecomment-1712361736 ## CI report: * 098db07e60f504b10dc861dca14c97f852507a0d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19765)
[GitHub] [hudi] hudi-bot commented on pull request #9665: [HUDI-6478] Deduce op as upsert for INSERT INTO
hudi-bot commented on PR #9665: URL: https://github.com/apache/hudi/pull/9665#issuecomment-1712361720 ## CI report: * 867f25a6378b6e522a35e11f7e4e622efc6e360a UNKNOWN * 18d03141e69fc59e700e9abf9b14ba4028ce307d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19763)
[GitHub] [hudi] hudi-bot commented on pull request #9665: [HUDI-6478] Deduce op as upsert for INSERT INTO
hudi-bot commented on PR #9665: URL: https://github.com/apache/hudi/pull/9665#issuecomment-1712359057 ## CI report: * b134855bf923e5932c76b3a5117515cd3e49afe7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19762) * 867f25a6378b6e522a35e11f7e4e622efc6e360a UNKNOWN * 18d03141e69fc59e700e9abf9b14ba4028ce307d UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9667: [HUDI-6836] Delta streamer: close metrics for metadata table
hudi-bot commented on PR #9667: URL: https://github.com/apache/hudi/pull/9667#issuecomment-1712359087 ## CI report: * 098db07e60f504b10dc861dca14c97f852507a0d UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9666: [HUDI-6834] Fixing time travel queries when overlaps with cleaner and archival time window
hudi-bot commented on PR #9666: URL: https://github.com/apache/hudi/pull/9666#issuecomment-1712359070 ## CI report: * 9eb5b9774aff4779e33feac92a513a9543be9752 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19764)
[GitHub] [hudi] hudi-bot commented on pull request #9665: [HUDI-6478] Deduce op as upsert for INSERT INTO
hudi-bot commented on PR #9665: URL: https://github.com/apache/hudi/pull/9665#issuecomment-1712355980 ## CI report: * b134855bf923e5932c76b3a5117515cd3e49afe7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19762) * 867f25a6378b6e522a35e11f7e4e622efc6e360a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9666: [HUDI-6834] Fixing time travel queries when overlaps with cleaner and archival time window
hudi-bot commented on PR #9666: URL: https://github.com/apache/hudi/pull/9666#issuecomment-1712356002 ## CI report: * 9eb5b9774aff4779e33feac92a513a9543be9752 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #9664: [HUDI-6835] Adjust spark sql core flow test scenarios
hudi-bot commented on PR #9664: URL: https://github.com/apache/hudi/pull/9664#issuecomment-1712355945 ## CI report: * 3d758a3723cc125009b6129f526c5df1d35bbf8d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19761)
[jira] [Updated] (HUDI-6784) Clean Merger API and its invocations
[ https://issues.apache.org/jira/browse/HUDI-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-6784: -- Summary: Clean Merger API and its invocations (was: Support custom logic for deletion) > Clean Merger API and its invocations > > > Key: HUDI-6784 > URL: https://issues.apache.org/jira/browse/HUDI-6784 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Lin Liu >Assignee: Lin Liu >Priority: Major > Fix For: 1.0.0 > > > Add `Optional<>` for newer parameter in merger. If newer is empty, then it > means this is a deletion operation. > > The goal of this task is: > # Clean up the API design of the merger, which should support all operations > on a record. > # Invoke it in the correct places, so that both default logic and > custom logic are supported. -- This message was sent by Atlassian Jira (v8.20.10#820010)
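The proposal in HUDI-6784 (an optional `newer` argument, where empty means deletion) can be sketched roughly as follows. This is an assumption-laden illustration, not Hudi's merger API: the interface `Merger`, the holder class `MergerSketch`, and both factory methods are hypothetical, and `java.util.Optional` is used only because the issue text writes `Optional<>`.

```java
import java.util.Optional;

/** Illustrative only: a merger whose `newer` argument is empty for deletions. */
final class MergerSketch {
    /** Hypothetical contract: an empty result means the record is removed. */
    interface Merger<T> {
        Optional<T> merge(T older, Optional<T> newer);
    }

    /** Default logic in this sketch: the newer value wins, and an empty newer deletes. */
    static <T> Merger<T> latestWins() {
        return (older, newer) -> newer;
    }

    /** Example of custom logic: deletions are ignored and the older record is kept. */
    static <T> Merger<T> ignoreDeletes() {
        return (older, newer) -> newer.isPresent() ? newer : Optional.of(older);
    }

    public static void main(String[] args) {
        Merger<String> defaults = latestWins();
        Merger<String> custom = ignoreDeletes();
        System.out.println(defaults.merge("v1", Optional.of("v2")));
        System.out.println(defaults.merge("v1", Optional.empty()));
        System.out.println(custom.merge("v1", Optional.empty()));
    }
}
```

Routing updates and deletions through one method is what lets a custom implementation change delete behavior without the caller needing a separate code path.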
[jira] [Updated] (HUDI-6784) Support custom logic for deletion
[ https://issues.apache.org/jira/browse/HUDI-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-6784: -- Description: Add `Optional<>` for newer parameter in merger. If newer is empty, then it means this is a deletion operation. The goal of this task is: # Clean up the API design of the merger, which should support all operations on a record. # Invoke it in the correct places, so that both default logic and custom logic are supported. was:Add `Optional<>` for newer parameter in merger. If newer is empty, then it means this is a deletion operation. > Support custom logic for deletion > - > > Key: HUDI-6784 > URL: https://issues.apache.org/jira/browse/HUDI-6784 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Lin Liu >Assignee: Lin Liu >Priority: Major > Fix For: 1.0.0 > > > Add `Optional<>` for newer parameter in merger. If newer is empty, then it > means this is a deletion operation. > > The goal of this task is: > # Clean up the API design of the merger, which should support all operations > on a record. > # Invoke it in the correct places, so that both default logic and > custom logic are supported. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-6837) Ensure the getInsertValue is wrapped correctly
[ https://issues.apache.org/jira/browse/HUDI-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu closed HUDI-6837. - Resolution: Resolved > Ensure the getInsertValue is wrapped correctly > -- > > Key: HUDI-6837 > URL: https://issues.apache.org/jira/browse/HUDI-6837 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Lin Liu >Priority: Major > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-6837) Ensure the getInsertValue is wrapped correctly
[ https://issues.apache.org/jira/browse/HUDI-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763288#comment-17763288 ] Lin Liu commented on HUDI-6837: --- The usage of getInsertValue has been verified. The only leftover of this function is for HoodieMetadataRecord, which will be handled separately. So we can close this task. > Ensure the getInsertValue is wrapped correctly > -- > > Key: HUDI-6837 > URL: https://issues.apache.org/jira/browse/HUDI-6837 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Lin Liu >Priority: Major > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6836) Shutdown metrics for metadata table writer in deltastreamer
[ https://issues.apache.org/jira/browse/HUDI-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6836: - Labels: pull-request-available (was: ) > Shutdown metrics for metadata table writer in deltastreamer > --- > > Key: HUDI-6836 > URL: https://issues.apache.org/jira/browse/HUDI-6836 > Project: Apache Hudi > Issue Type: Bug >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Minor > Labels: pull-request-available > > When debugging some Deltastreamer tests, I noticed that there is still a > running metrics instance for the metadata table path. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6837) Ensure the getInsertValue is wrapped correctly
Lin Liu created HUDI-6837: - Summary: Ensure the getInsertValue is wrapped correctly Key: HUDI-6837 URL: https://issues.apache.org/jira/browse/HUDI-6837 Project: Apache Hudi Issue Type: New Feature Reporter: Lin Liu Fix For: 1.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] the-other-tim-brown opened a new pull request, #9667: [HUDI-6836] Delta streamer: close metrics for metadata table
the-other-tim-brown opened a new pull request, #9667: URL: https://github.com/apache/hudi/pull/9667 ### Change Logs Updates the shutdown of the delta streamer to shut down the metrics for the metadata table writer, in addition to the metrics registered for the main base path, if the metadata table is enabled. ### Impact Shuts down a metrics reporter ### Risk level (write none, low medium or high below) none ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
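The shutdown behavior described in this PR might look roughly like the sketch below. Everything in it is a stand-in: `MetricsShutdownSketch` is a hypothetical in-memory registry rather than Hudi's metrics classes, and the `/.hoodie/metadata` suffix for the metadata table path is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: stops reporters for both the main table and its metadata table. */
final class MetricsShutdownSketch {
    // Stand-in registry: table path -> whether a reporter is still running.
    private final Map<String, Boolean> running = new HashMap<>();

    /** Assumed layout for this sketch: the metadata table lives under the base path. */
    static String metadataTablePath(String basePath) {
        return basePath + "/.hoodie/metadata";
    }

    void start(String path) {
        running.put(path, Boolean.TRUE);
    }

    boolean isRunning(String path) {
        return running.getOrDefault(path, Boolean.FALSE);
    }

    /** Stops the main reporter and, when enabled, the metadata table reporter too. */
    void shutdown(String basePath, boolean metadataTableEnabled) {
        running.put(basePath, Boolean.FALSE);
        if (metadataTableEnabled) {
            running.put(metadataTablePath(basePath), Boolean.FALSE);
        }
    }

    public static void main(String[] args) {
        MetricsShutdownSketch metrics = new MetricsShutdownSketch();
        String base = "/tmp/example_table";
        metrics.start(base);
        metrics.start(metadataTablePath(base));
        metrics.shutdown(base, true);
        System.out.println(metrics.isRunning(base) + " " + metrics.isRunning(metadataTablePath(base)));
    }
}
```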
[jira] [Comment Edited] (HUDI-6702) Extend merge API to support all merging operations
[ https://issues.apache.org/jira/browse/HUDI-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763287#comment-17763287 ] Lin Liu edited comment on HUDI-6702 at 9/8/23 11:46 PM: The first step of the task is to ensure the HoodieRecordPayload.getInsertValue is called correctly. I have checked all the current design, and the calling of the getInsertValue has been wrapped into HoodieAvroRecord class. So the first goal of this task has been done. Next subtask is to ensure that we apply merge api to the places where the custom logic should be added. was (Author: JIRAUSER301185): The first step of the task is to ensure the HoodieRecordPayload.getInsertValue is called correctly. I have checked all the current design, and the calling of the getInsertValue has been wrapped into HoodieAvroRecord class. So this task has been done. > Extend merge API to support all merging operations > -- > > Key: HUDI-6702 > URL: https://issues.apache.org/jira/browse/HUDI-6702 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Sagar Sumit >Assignee: Lin Liu >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.0 > > > See this issue for more details- [https://github.com/apache/hudi/issues/9430] > We may have to introduce a new API or figure out a way for the current merger > to skip empty records. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6836) Shutdown metrics for metadata table writer in deltastreamer
Timothy Brown created HUDI-6836: --- Summary: Shutdown metrics for metadata table writer in deltastreamer Key: HUDI-6836 URL: https://issues.apache.org/jira/browse/HUDI-6836 Project: Apache Hudi Issue Type: Bug Reporter: Timothy Brown When debugging some Deltastreamer tests, I noticed that there is still a running metrics instance for the metadata table path. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-6836) Shutdown metrics for metadata table writer in deltastreamer
[ https://issues.apache.org/jira/browse/HUDI-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Brown reassigned HUDI-6836: --- Assignee: Timothy Brown > Shutdown metrics for metadata table writer in deltastreamer > --- > > Key: HUDI-6836 > URL: https://issues.apache.org/jira/browse/HUDI-6836 > Project: Apache Hudi > Issue Type: Bug >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Minor > > When debugging some Deltastreamer tests, I noticed that there is still a > running metrics instance for the metadata table path. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-6702) Extend merge API to support all merging operations
[ https://issues.apache.org/jira/browse/HUDI-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763287#comment-17763287 ] Lin Liu commented on HUDI-6702: --- The first step of the task is to ensure the HoodieRecordPayload.getInsertValue is called correctly. I have checked all the current design, and the calling of the getInsertValue has been wrapped into HoodieAvroRecord class. So this task has been done. > Extend merge API to support all merging operations > -- > > Key: HUDI-6702 > URL: https://issues.apache.org/jira/browse/HUDI-6702 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Sagar Sumit >Assignee: Lin Liu >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.0 > > > See this issue for more details- [https://github.com/apache/hudi/issues/9430] > We may have to introduce a new API or figure out a way for the current merger > to skip empty records. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] xushiyan commented on a diff in pull request #9665: [HUDI-6478] Deduce op as upsert for INSERT INTO
xushiyan commented on code in PR #9665: URL: https://github.com/apache/hudi/pull/9665#discussion_r1320425108 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/AutoRecordKeyGenerationUtils.scala: ## @@ -32,29 +31,29 @@ object AutoRecordKeyGenerationUtils { private val log = LoggerFactory.getLogger(getClass) def mayBeValidateParamsForAutoGenerationOfRecordKeys(parameters: Map[String, String], hoodieConfig: HoodieConfig): Unit = { -val autoGenerateRecordKeys = isAutoGenerateRecordKeys(parameters) -// hudi will auto generate. -if (autoGenerateRecordKeys) { Review Comment: The diff here is just for an early return.
[GitHub] [hudi] hudi-bot commented on pull request #9665: [HUDI-6478] Deduce op as upsert for INSERT INTO
hudi-bot commented on PR #9665: URL: https://github.com/apache/hudi/pull/9665#issuecomment-1712334697 ## CI report: * b134855bf923e5932c76b3a5117515cd3e49afe7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9664: [HUDI-6835] Adjust spark sql core flow test scenarios
hudi-bot commented on PR #9664: URL: https://github.com/apache/hudi/pull/9664#issuecomment-1712334662 ## CI report: * 3d758a3723cc125009b6129f526c5df1d35bbf8d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan opened a new pull request, #9666: [HUDI-6834] Fixing time travel queries when overlaps with cleaner and archival time window
nsivabalan opened a new pull request, #9666: URL: https://github.com/apache/hudi/pull/9666 ### Change Logs When a time travel query overlaps with the cleaner or archival window, we should explicitly fail the query. Otherwise, we might end up serving partial/wrong results or empty rows. ### Impact Time travel queries that overlap with the cleaner or archival window fail explicitly instead of serving partial/wrong results or empty rows. ### Risk level (write none, low medium or high below) low ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
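The fail-fast behaviour described in PR #9666 can be sketched roughly as follows. This is an illustration only, not the code in the PR: the function `validate_time_travel_instant`, its parameters, and the `TimeTravelQueryError` type are hypothetical, and the sketch assumes instant times are fixed-width timestamp strings so that string comparison matches chronological order.

```python
class TimeTravelQueryError(Exception):
    """Hypothetical error for a time travel query that cannot be served."""


def validate_time_travel_instant(as_of_instant, earliest_retained_instant):
    """Return the requested instant, or raise if it is no longer retained.

    Both arguments are assumed to be fixed-width timestamp strings such as
    "20230908120000". `earliest_retained_instant` stands in for the oldest
    instant that has not been cleaned or archived; how a table derives that
    value is outside this sketch.
    """
    if as_of_instant < earliest_retained_instant:
        # Fail explicitly rather than serving partial, wrong, or empty results.
        raise TimeTravelQueryError(
            f"Instant {as_of_instant} is older than the earliest retained "
            f"instant {earliest_retained_instant}"
        )
    return as_of_instant
```

The point of raising here is that the caller sees a clear error instead of a result set that silently misses file slices.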
[jira] [Updated] (HUDI-6834) Time travel query for an instant not in active timeline should throw exception
[ https://issues.apache.org/jira/browse/HUDI-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6834: - Labels: pull-request-available (was: ) > Time travel query for an instant not in active timeline should throw > exception > --- > > Key: HUDI-6834 > URL: https://issues.apache.org/jira/browse/HUDI-6834 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > with timestamp as of query, if the timestamp being requested is cleaned or > archived, we should throw exception. > since none of the file slices might be available to serve. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit
hudi-bot commented on PR #9473: URL: https://github.com/apache/hudi/pull/9473#issuecomment-1712330556 ## CI report: * 9498e0dc2757dee0736d9500ddf2c7e9ff64cca5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19537) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan opened a new pull request, #9665: [HUDI-6478] Deduce op as upsert for INSERT INTO
xushiyan opened a new pull request, #9665: URL: https://github.com/apache/hudi/pull/9665 ### Change Logs When a user explicitly defines `primaryKey` and `preCombineField` in `CREATE TABLE`, subsequent `INSERT INTO` statements deduce the operation as `UPSERT`. ### Impact Spark `INSERT INTO` semantics. ### Risk level Medium ### Documentation Update NA ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
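The deduction rule in PR #9665 can be pictured with a small sketch. It is not the PR's implementation: the function `deduce_insert_into_operation` and the shape of `table_props` are hypothetical, and falling back to `insert` when either property is missing is an assumption taken from the HUDI-6478 description rather than from this PR.

```python
def deduce_insert_into_operation(table_props):
    """Pick the write operation for INSERT INTO from table properties.

    `table_props` is a hypothetical dict of the properties given at
    CREATE TABLE time. The operation is deduced as "upsert" only when both
    primaryKey and preCombineField were set explicitly.
    """
    has_primary_key = bool(table_props.get("primaryKey"))
    has_precombine = bool(table_props.get("preCombineField"))
    if has_primary_key and has_precombine:
        return "upsert"
    # Assumed fallback; the PR text only states the upsert case.
    return "insert"
```

For example, `deduce_insert_into_operation({"primaryKey": "id", "preCombineField": "ts"})` yields `"upsert"`, while a table declared with only a primary key stays on the assumed `"insert"` fallback.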
[jira] [Updated] (HUDI-6835) Adjust spark sql core flow test scenarios
[ https://issues.apache.org/jira/browse/HUDI-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6835: - Labels: pull-request-available (was: ) > Adjust spark sql core flow test scenarios > - > > Key: HUDI-6835 > URL: https://issues.apache.org/jira/browse/HUDI-6835 > Project: Apache Hudi > Issue Type: Test > Components: spark-sql, tests-ci >Reporter: Raymond Xu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > - Adjust the scenarios to cover GLOBAL_SIMPLE and GLOBAL_BLOOM cases. > - For MDT disabled/enabled case, disable/enable both writer and reader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] xushiyan commented on pull request #9664: [HUDI-6835] Adjust spark sql core flow test scenarios
xushiyan commented on PR #9664: URL: https://github.com/apache/hudi/pull/9664#issuecomment-1712321505 @jonvex @nsivabalan can you please take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan opened a new pull request, #9664: [HUDI-6835] Adjust spark sql core flow test scenarios
xushiyan opened a new pull request, #9664: URL: https://github.com/apache/hudi/pull/9664 ### Change Logs - Adjust the scenarios to cover GLOBAL_SIMPLE and GLOBAL_BLOOM cases. - For MDT disabled/enabled case, disable/enable both writer and reader ### Impact Manual spark sql tests (as it's not running by CI) ### Risk level None. Test only. ### Documentation Update NA ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-6835) Adjust spark sql core flow test scenarios
[ https://issues.apache.org/jira/browse/HUDI-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-6835: - Description: - Adjust the scenarios to cover GLOBAL_SIMPLE and GLOBAL_BLOOM cases. - For MDT disabled/enabled case, disable/enable both writer and reader > Adjust spark sql core flow test scenarios > - > > Key: HUDI-6835 > URL: https://issues.apache.org/jira/browse/HUDI-6835 > Project: Apache Hudi > Issue Type: Test > Components: spark-sql, tests-ci >Reporter: Raymond Xu >Priority: Major > Fix For: 0.14.0 > > > - Adjust the scenarios to cover GLOBAL_SIMPLE and GLOBAL_BLOOM cases. > - For MDT disabled/enabled case, disable/enable both writer and reader -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6835) Adjust spark sql core flow test scenarios
Raymond Xu created HUDI-6835: Summary: Adjust spark sql core flow test scenarios Key: HUDI-6835 URL: https://issues.apache.org/jira/browse/HUDI-6835 Project: Apache Hudi Issue Type: Test Components: spark-sql, tests-ci Reporter: Raymond Xu Fix For: 0.14.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-6478) Simplify INSERT_INTO configs
[ https://issues.apache.org/jira/browse/HUDI-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-6478: - Fix Version/s: 0.14.0 > Simplify INSERT_INTO configs > > > Key: HUDI-6478 > URL: https://issues.apache.org/jira/browse/HUDI-6478 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > We have 2 to 3 diff configs in the mix for INSERT_INTO command. lets try to > simplify them. > > hoodie.sql.insert.mode, drop dups, hoodie.sql.bulk.insert.enable and > datasource.operation.type. > > Rough notes: > > hoodie.sql.bulk.insert.enable: true | false. > > hoodie.sql.insert.mode: STRICT| NON_STRICT | UPSERT > STRICT: we can't re-ingest same record again. will throw if found duplicates > to be ingested again. > NON_STRICT: no such constraints. but has to be set along w/ bulk_insert(if > its enabled). if not, exception will be thrown. > UPSERT: default insert.mode(until a week back where in we switch to make > bulk_insert the default for INSERT_INTO). will take care of de-dup. will use > OverwriteWithLatestAvroPayload(which means that we can update an existing > record across batches). > > datasource.operation.type: insert, bulk_insert, upsert > > drop.dups: Drop new incoming records if it already exists. > > Proposal: > > * We will introduce a new config named "hoodie.sql.write.operation" which > will have 3 values ("insert", "bulk_insert" and "upsert"). Default value will > be "insert" for INSERT_INTO. > ** Deprecate hoodie.sql.insert.mode and "hoodie.sql.bulk.insert.enable". > * Also, enable "hoodie.merge.allow.duplicate.on.inserts" = true if operation > type is "Insert" for both spark-sql and spark-ds. This will maintain > duplicates but still help w/ small file management with "insert"s. 
> * Introduce a new config named "hoodie.datasource.insert.dedupe.policy" > whose valid values are "ignore, fail and drop". Make "ignore" as default. > "fail" will mimic "STRICT" mode we support as of now. Even spark-ds users can > use the fail/STRICT behavior if need be. > ** Deprecate hoodie.datasource.insert.drop.dups. -- This message was sent by Atlassian Jira (v8.20.10#820010)
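The three values proposed for "hoodie.datasource.insert.dedupe.policy" can be illustrated with a minimal sketch. It reflects one reading of the proposal, not Hudi code: the function `apply_dedupe_policy`, the record shape (dicts with a `"key"` field), and the use of `ValueError` are all hypothetical.

```python
def apply_dedupe_policy(incoming, existing_keys, policy="ignore"):
    """Apply an insert dedupe policy to a batch of incoming records.

    incoming      -- list of dicts, each with a "key" entry (hypothetical shape)
    existing_keys -- set of record keys already present in the table
    policy        -- "ignore" (default), "fail", or "drop"
    """
    if policy not in ("ignore", "fail", "drop"):
        raise ValueError(f"Unknown dedupe policy: {policy}")
    if policy == "ignore":
        # No duplicate handling: every incoming record is written.
        return list(incoming)
    duplicates = [r for r in incoming if r["key"] in existing_keys]
    if policy == "fail":
        # Mimics the STRICT insert mode: reject re-ingestion of a known key.
        if duplicates:
            raise ValueError(f"{len(duplicates)} duplicate record(s) found")
        return list(incoming)
    # "drop": discard incoming records whose key already exists.
    return [r for r in incoming if r["key"] not in existing_keys]
```

With `existing_keys = {"a"}` and incoming keys `a` and `b`, "ignore" writes both records, "drop" writes only `b`, and "fail" raises.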
[GitHub] [hudi] hudi-bot commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries
hudi-bot commented on PR #9611: URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712286204 ## CI report: * e86aa76be3833bbcec446d3ed65e05d4b7a94049 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19759) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] linliu-code commented on pull request #9572: [HUDI-6702] Remove unnecessary calls of `getInsertValue` api from HoodieRecordPayload
linliu-code commented on PR #9572: URL: https://github.com/apache/hudi/pull/9572#issuecomment-1712275348 This issue should have been fixed together with PR: https://github.com/apache/hudi/pull/9593 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] linliu-code closed pull request #9572: [HUDI-6702] Remove unnecessary calls of `getInsertValue` api from HoodieRecordPayload
linliu-code closed pull request #9572: [HUDI-6702] Remove unnecessary calls of `getInsertValue` api from HoodieRecordPayload URL: https://github.com/apache/hudi/pull/9572 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-6834) Time travel query for an instant not in active timeline should throw exception
sivabalan narayanan created HUDI-6834: - Summary: Time travel query for an instant not in active timeline should throw exception Key: HUDI-6834 URL: https://issues.apache.org/jira/browse/HUDI-6834 Project: Apache Hudi Issue Type: Bug Components: reader-core Reporter: sivabalan narayanan For a timestamp-as-of query, if the requested timestamp has been cleaned or archived, we should throw an exception, since none of the file slices might be available to serve the query. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] yihua commented on a diff in pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit
yihua commented on code in PR #9473: URL: https://github.com/apache/hudi/pull/9473#discussion_r1320360360 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java: ## @@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP } }); -String previousInstantTime = beginInstantTime; +String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP; Review Comment: I think there's still a hole here. If `beginInstantTime` is the first commit in the active timeline and there are archived commits, `previousInstantTime` should not be set to `DEFAULT_BEGIN_TIMESTAMP`? Could you check this case? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yihua commented on a diff in pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit
yihua commented on code in PR #9473: URL: https://github.com/apache/hudi/pull/9473#discussion_r1320357906 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java: ## @@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP } }); Review Comment: ```suggestion // When `beginInstantTime` is `DEFAULT_BEGIN_TIMESTAMP` (due to missing checkpoint), `previousInstantTime` is set to `DEFAULT_BEGIN_TIMESTAMP` as well. ``` ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java: ## @@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP } }); -String previousInstantTime = beginInstantTime; +String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP; if (!beginInstantTime.equals(DEFAULT_BEGIN_TIMESTAMP)) { Review Comment: ```suggestion // When `beginInstantTime` is present, `previousInstantTime` is set to the completed commit before `beginInstantTime` if that exists. If there is no completed commit before `beginInstantTime`, e.g., `beginInstantTime` is the first commit in the active timeline, `previousInstantTime` is set to `DEFAULT_BEGIN_TIMESTAMP`. if (!beginInstantTime.equals(DEFAULT_BEGIN_TIMESTAMP)) { ``` ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java: ## @@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP } }); -String previousInstantTime = beginInstantTime; +String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP; Review Comment: Have you also gone through the logic for `MissingCheckpointStrategy.READ_LATEST` and see if there's any gap? 
## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java: ## @@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP } }); -String previousInstantTime = beginInstantTime; +String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP; Review Comment: Basically, the only change here is when `beginInstantTime === `, the `previousInstantTime` is `DEFAULT_BEGIN_TIMESTAMP` now whereas `` before this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
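The rule spelled out in the suggested comments above can be sketched as a standalone function. This is an illustration of the described behaviour, not the code in `IncrSourceHelper.java`: the function `resolve_previous_instant` is hypothetical, the value `"000"` for `DEFAULT_BEGIN_TIMESTAMP` is an assumption, and instants are assumed to be strings that sort chronologically.

```python
from bisect import bisect_left

# Assumed placeholder value; the real constant lives in the Hudi code base.
DEFAULT_BEGIN_TIMESTAMP = "000"


def resolve_previous_instant(completed_instants, begin_instant_time):
    """Pick the instant immediately before `begin_instant_time`.

    completed_instants -- completed commit times visible to the caller
    begin_instant_time -- checkpointed begin instant, or the default when
                          no checkpoint exists
    """
    if begin_instant_time == DEFAULT_BEGIN_TIMESTAMP:
        # Missing checkpoint: previous instant is the default as well.
        return DEFAULT_BEGIN_TIMESTAMP
    ordered = sorted(completed_instants)
    # Index of the first instant >= begin; everything before it is earlier.
    idx = bisect_left(ordered, begin_instant_time)
    if idx == 0:
        # No completed commit precedes the begin instant.
        return DEFAULT_BEGIN_TIMESTAMP
    return ordered[idx - 1]
```

The sketch only looks at the instants it is given, so it does not settle the case raised in review where `beginInstantTime` is the first commit in the active timeline but archived commits exist.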
[GitHub] [hudi] hudi-bot commented on pull request #9661: [HUDI-6820] Fixing CI stability issues
hudi-bot commented on PR #9661: URL: https://github.com/apache/hudi/pull/9661#issuecomment-1712249218 ## CI report: * 1932225fb7c7c1d6b9518fbd3b455050002ccc63 UNKNOWN * ef9d12359f8d84c024812c262843e0a6699c4540 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19760) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19758) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit
hudi-bot commented on PR #9473: URL: https://github.com/apache/hudi/pull/9473#issuecomment-1712248462 ## CI report: * 9498e0dc2757dee0736d9500ddf2c7e9ff64cca5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19537) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9661: [HUDI-6820] Fixing CI stability issues
hudi-bot commented on PR #9661: URL: https://github.com/apache/hudi/pull/9661#issuecomment-1712240913 ## CI report: * 1932225fb7c7c1d6b9518fbd3b455050002ccc63 UNKNOWN * e0ec295431ea2d1004b8540179d8d4815519e6f7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19754) * ef9d12359f8d84c024812c262843e0a6699c4540 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19758) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19760) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries
hudi-bot commented on PR #9611: URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712240456 ## CI report: * 5620dc41d25ce155b3c57d083880e9f0e697ce9c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19756) * e86aa76be3833bbcec446d3ed65e05d4b7a94049 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19759) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit
hudi-bot commented on PR #9473: URL: https://github.com/apache/hudi/pull/9473#issuecomment-1712239965 ## CI report: * 9498e0dc2757dee0736d9500ddf2c7e9ff64cca5 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #9661: [HUDI-6820] Fixing CI stability issues
nsivabalan commented on PR #9661: URL: https://github.com/apache/hudi/pull/9661#issuecomment-1712236128 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yihua commented on a diff in pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit
yihua commented on code in PR #9473: URL: https://github.com/apache/hudi/pull/9473#discussion_r1320330466 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java: ## @@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP } }); -String previousInstantTime = beginInstantTime; +String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP; Review Comment: OK. Could you add some docs here on different scenarios, how `previous`, `start`, and `end` instants for `QueryInfo` are decided? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9661: [HUDI-6820] Fixing CI stability issues
hudi-bot commented on PR #9661: URL: https://github.com/apache/hudi/pull/9661#issuecomment-1712199593 ## CI report: * 1932225fb7c7c1d6b9518fbd3b455050002ccc63 UNKNOWN * 41e8307d90997eddcd62a7215825be48cd54919b Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19750) * e0ec295431ea2d1004b8540179d8d4815519e6f7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19754) * ef9d12359f8d84c024812c262843e0a6699c4540 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries
hudi-bot commented on PR #9611: URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712199208 ## CI report: * f2cc702a15efade40dfd14402c9e3d87f311054f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19694) * 5620dc41d25ce155b3c57d083880e9f0e697ce9c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19756) * e86aa76be3833bbcec446d3ed65e05d4b7a94049 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] jmnatzaganian opened a new issue, #9663: [SUPPORT] Flink MOR only creates log files
jmnatzaganian opened a new issue, #9663: URL: https://github.com/apache/hudi/issues/9663 **Describe the problem you faced** When using Flink with an MOR table, only log files are created. This is different from Spark, which only uses log files when changes to a file group occur. Log files are not optimized for large payloads, which results in a costly performance hit when flushing the data. The end result is a pipeline which quickly buffers data, blocks while converting the data, and then flushes. For base files, the heavy lifting is passed to the parquet writer, which automatically addresses these complications by incrementally processing data at the row group level. In that situation, the expectation is a relatively flat CPU usage, since serialization and compression occur in parallel with reads. Over-provisioning helps to a degree, but is limited by the batch size. Reducing the batch size helps significantly but results in too many small files. Compaction can be used to help with this, but this adds a large cost in the I/O to S3 and in the compaction job. **To Reproduce** See the attached file, [Demo.java](https://github.com/apache/hudi/files/12563405/Demo.java.txt), from @kzdravkov. This is a self-contained Flink example illustrating the behavior. **Expected behavior** See the attached file [hudi_spark_mor.py](https://github.com/apache/hudi/files/12563406/hudi_spark_mor.py.txt). This is a self-contained PySpark example that shows the behavior that is expected. In short - whenever inserts occur it's expected that a base file will be created. It's additionally expected that the file bin packing logic will occur as appropriate. **Environment Description** * Hudi version: 0.13.1 * Spark version: 3.1.1 * Flink version: 1.13.1 * Hive version: N/A * Hadoop version: * Storage (HDFS/S3/GCS..): Local, but S3 in prod * Running on Docker? (yes/no): No. Flink is running in k8s. Data is in S3. Catalog uses Glue.
For the purposes of this issue, local is sufficient for demonstration. **Additional context** @kzdravkov initially started this discussion in slack [here](https://apache-hudi.slack.com/archives/C01ULJQCXJ5/p1693575957361609) with @danny0405 and others. The thread has some additional details about our use case, but I'll include some important notes below: The data volume is modest at 50-100k rps with 100 mb/s uncompressed data. We are trying to migrate an existing Flink job that is a basic parquet table to Hudi. The job is insert heavy, but does have upserts and as such needs to be set to `upsert`. Our data model requires a global index to ensure we don't have duplicates (record --> partition mapping can change). This has been mitigated by using a pseudo-global index by setting the `FLINK_STATE` index to address most of the duplicates. All of the table services are done independently via Spark (currently Glue jobs). Given that this is blocking a rollout, we are planning to test our job with COW to see if it's good enough, and then migrate to MOR once this is addressed. For anyone curious about the log file overhead, please see the above slack thread. I believe this is also an issue, but assuming log blocks are kept small then it's minor and likely not worth the effort. I'd like to exclude that from this issue, since it's a separate and secondary issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9662: [HUDI-6820] Set timeout for TestHoodieDeltaStreamer continuous mode tests
hudi-bot commented on PR #9662: URL: https://github.com/apache/hudi/pull/9662#issuecomment-1712190098 ## CI report: * a297a4dd27cda4491ea6dc79afc57c979feaefcf Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19751) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9618: [HUDI-6753] Fix parquet inline reading flaky test
hudi-bot commented on PR #9618: URL: https://github.com/apache/hudi/pull/9618#issuecomment-1712189655 ## CI report: * ad2e4fda3ffb21219ba4101fdc7c331572524a83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19656) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19753) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution
hudi-bot commented on PR #9482: URL: https://github.com/apache/hudi/pull/9482#issuecomment-1712189148 ## CI report: * bd99a6ce89b2ab0946f9409c7caf493c95d3befa Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19748) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #9538: [HUDI-6738] - Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource
hudi-bot commented on PR #9538: URL: https://github.com/apache/hudi/pull/9538#issuecomment-1712189232 ## CI report: * e23804783126d93786c26ba43c3ec8f003bb977e Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19755) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org