[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9517:
URL: https://github.com/apache/hudi/pull/9517#issuecomment-1712432849

   
   ## CI report:
   
   * 6628c5285d60b62c44d928eacb67507aab68d5ed Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19577)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #9538: [HUDI-6738] - Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9538:
URL: https://github.com/apache/hudi/pull/9538#issuecomment-1712431281

   
   ## CI report:
   
   * 4d0957dc91e137aa6f2302619a60a317001a7017 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19772)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9517:
URL: https://github.com/apache/hudi/pull/9517#issuecomment-1712431262

   
   ## CI report:
   
   * 6628c5285d60b62c44d928eacb67507aab68d5ed UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9482:
URL: https://github.com/apache/hudi/pull/9482#issuecomment-1712431248

   
   ## CI report:
   
   * 51c9a626b176d644b6294a57b3d59d70103a892f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19769)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] nsivabalan commented on a diff in pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-08 Thread via GitHub


nsivabalan commented on code in PR #9611:
URL: https://github.com/apache/hudi/pull/9611#discussion_r1320485055


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/FlinkTaskContextSupplier.java:
##
@@ -62,4 +62,9 @@ public Option<String> getProperty(EngineProperty prop) {
     return Option.empty();
   }
 
+  @Override
+  public Supplier<Integer> getAttemptNumberSupplier() {
+    return () -> -1;

Review Comment:
   Yes, we have disabled it for Flink for now. In Java, MOR is not fully functional anyway, as far as I know, but I am open to disabling it for Java as well. It is mainly an issue for ExpressionPayload and other custom payloads; most of the other payloads are idempotent even if there are duplicate log blocks.
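   To make the idempotency point concrete, here is a minimal sketch with hypothetical merge functions (not Hudi's payload API): replaying a duplicated update is harmless when the merge simply keeps the latest value, but it changes the result when the merge derives the new value from the current one, as an update expression such as `count = count + 5` would.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration (not Hudi's API) of why duplicate log blocks are
 * harmless for "keep the latest value" merges but not for merges that compute
 * the new value from the current one.
 */
public class DuplicateLogBlockDemo {

  /** A merge function: combines the current value with an incoming update. */
  interface Merge {
    long apply(long current, long update);
  }

  // Idempotent under replay: applying the same update twice gives the same result.
  static final Merge OVERWRITE = (current, update) -> update;

  // Not idempotent under replay: behaves like "SET count = count + update".
  static final Merge INCREMENT = (current, update) -> current + update;

  /** Replays a sequence of log-block updates on top of a base value. */
  static long replay(long base, List<Long> updates, Merge merge) {
    long value = base;
    for (long u : updates) {
      value = merge.apply(value, u);
    }
    return value;
  }

  public static void main(String[] args) {
    List<Long> once = Arrays.asList(5L);
    // The same block written twice, e.g. by a retried task attempt.
    List<Long> retried = Arrays.asList(5L, 5L);
    System.out.println(replay(10, once, OVERWRITE) + " vs " + replay(10, retried, OVERWRITE));
    System.out.println(replay(10, once, INCREMENT) + " vs " + replay(10, retried, INCREMENT));
  }
}
```

   Running `main` prints `5 vs 5` for the overwrite merge and `15 vs 20` for the increment merge.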
   






[GitHub] [hudi] nsivabalan commented on a diff in pull request #9667: [HUDI-6836] Delta streamer: close metrics for metadata table

2023-09-08 Thread via GitHub


nsivabalan commented on code in PR #9667:
URL: https://github.com/apache/hudi/pull/9667#discussion_r1320486278


##
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java:
##
@@ -726,10 +734,9 @@ public void testBulkInsertsAndUpsertsWithBootstrap(HoodieRecordType recordType)
     cfg.configs.add("hoodie.datasource.write.hive_style_partitioning=true");
     cfg.configs.add("hoodie.bootstrap.parallelism=5");
     cfg.targetBasePath = newDatasetBasePath;
-    new HoodieDeltaStreamer(cfg, jsc).sync();
+    HoodieDeltaStreamer ds = new HoodieDeltaStreamer(cfg, jsc);
+    ds.sync();

Review Comment:
   Do you think we can add a try/finally and move ds.shutdownGracefully() into the finally block?
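   A minimal sketch of the suggested pattern, assuming only the `sync()` / `shutdownGracefully()` call shape shown in the diff. `FakeStreamer` is a hypothetical stand-in that always fails in `sync()`, to show that the shutdown still runs.

```java
/**
 * Sketch of the try/finally suggestion: release the streamer even when sync()
 * or a later assertion throws. FakeStreamer is hypothetical and only mimics the
 * sync()/shutdownGracefully() call pattern used in the test.
 */
public class TryFinallyShutdownDemo {

  static class FakeStreamer {
    boolean shutDown = false;

    void sync() {
      throw new IllegalStateException("simulated sync failure");
    }

    void shutdownGracefully() {
      shutDown = true;
    }
  }

  /** Runs sync() and guarantees shutdownGracefully() runs afterwards. */
  static void syncAndShutdown(FakeStreamer ds) {
    try {
      ds.sync();
      // assertions on the synced table would go here
    } finally {
      ds.shutdownGracefully();
    }
  }

  public static void main(String[] args) {
    FakeStreamer ds = new FakeStreamer();
    try {
      syncAndShutdown(ds);
    } catch (IllegalStateException e) {
      System.out.println("sync failed: " + e.getMessage());
    }
    System.out.println("shut down: " + ds.shutDown); // prints "shut down: true"
  }
}
```

   Without the finally block, the exception from `sync()` would skip the shutdown call and leave the streamer's resources (metrics included) open for the rest of the test run.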



##
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamerMetrics.java:
##
@@ -157,6 +159,11 @@ public void shutdown() {
     if (metrics != null) {
       metrics.shutdown();
     }
+    // if metadata table is enabled, make sure to shut down the metrics for that table as well
+    if (writeConfig.getMetadataConfig().enabled()) {
+      HoodieWriteConfig metadataWriteConfig = HoodieMetadataWriteUtils.createMetadataWriteConfig(writeConfig, HoodieFailedWritesCleaningPolicy.LAZY);

Review Comment:
   A better option is probably to close the metrics within the WriteClient when the write client is closed.
   
   While we are at it, can you also take care of the following?
   We also have HoodieMetadataMetrics within HoodieBackedTableMetadataWriter, which also needs to be closed when the MetadataWriter is closed.
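   A minimal sketch of the ownership pattern described above: each component closes the metrics it created in its own `close()`, so callers only close the top-level client. All classes here are hypothetical stand-ins, not Hudi's `HoodieMetrics`, `HoodieMetadataMetrics`, or write-client classes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: whoever creates a metrics object closes it in its own
 * close(). These classes are stand-ins, not Hudi's classes.
 */
public class MetricsOwnershipDemo {

  // Records the order in which metrics objects were closed.
  static final List<String> CLOSED = new ArrayList<>();

  static class Metrics implements AutoCloseable {
    private final String name;

    Metrics(String name) {
      this.name = name;
    }

    @Override
    public void close() {
      CLOSED.add(name);
    }
  }

  /** Stand-in for a metadata writer that owns its own metadata metrics. */
  static class MetadataWriter implements AutoCloseable {
    private final Metrics metadataMetrics = new Metrics("metadata-metrics");

    @Override
    public void close() {
      metadataMetrics.close();
    }
  }

  /** Stand-in for a write client that owns its metrics and, optionally, a metadata writer. */
  static class WriteClient implements AutoCloseable {
    private final Metrics metrics = new Metrics("write-client-metrics");
    private final MetadataWriter metadataWriter;

    WriteClient(boolean metadataEnabled) {
      this.metadataWriter = metadataEnabled ? new MetadataWriter() : null;
    }

    @Override
    public void close() {
      // Close owned resources in reverse order of creation.
      if (metadataWriter != null) {
        metadataWriter.close();
      }
      metrics.close();
    }
  }

  public static void main(String[] args) {
    try (WriteClient client = new WriteClient(true)) {
      // ... writes would happen here ...
    }
    System.out.println(CLOSED); // prints [metadata-metrics, write-client-metrics]
  }
}
```

   With this shape, a caller such as the streamer's metrics shutdown would not need to rebuild a metadata write config just to close the metadata table's metrics.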



##
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java:
##
@@ -780,10 +788,12 @@ public void testModifiedTableConfigs() throws Exception {
   }
 
   private void syncAndAssertRecordCount(HoodieDeltaStreamer.Config cfg, Integer expected, String tableBasePath, String metadata, Integer totalCommits) throws Exception {
-    new HoodieDeltaStreamer(cfg, jsc).sync();
+    HoodieDeltaStreamer ds = new HoodieDeltaStreamer(cfg, jsc);
+    ds.sync();
     TestHelpers.assertRecordCount(expected, tableBasePath, sqlContext);
     TestHelpers.assertDistanceCount(expected, tableBasePath, sqlContext);
     TestHelpers.assertCommitMetadata(metadata, tableBasePath, fs, totalCommits);
+    ds.shutdownGracefully();

Review Comment:
   same suggestion as above






[GitHub] [hudi] hudi-bot commented on pull request #9669: [HUDI-6838] Fix file writers to honor bloom filter configs

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9669:
URL: https://github.com/apache/hudi/pull/9669#issuecomment-1712420254

   
   ## CI report:
   
   * 718f6b15ca75566c6ea6188e0ce98c45ab3a1732 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19767)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9668: [HUDI-6839] Github actions improvements

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9668:
URL: https://github.com/apache/hudi/pull/9668#issuecomment-1712420242

   
   ## CI report:
   
   * 259298768a177b37bfbb304becb104a278fbbdb2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19766)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] nsivabalan commented on a diff in pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-08 Thread via GitHub


nsivabalan commented on code in PR #9611:
URL: https://github.com/apache/hudi/pull/9611#discussion_r1320485055


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/FlinkTaskContextSupplier.java:
##
@@ -62,4 +62,9 @@ public Option<String> getProperty(EngineProperty prop) {
     return Option.empty();
   }
 
+  @Override
+  public Supplier<Integer> getAttemptNumberSupplier() {
+    return () -> -1;

Review Comment:
   Yes, we have disabled it for Flink for now. In Java, MOR is not fully functional anyway, as far as I know, but I am open to disabling it for Java as well. It is mainly an issue for ExpressionPayload and other custom payloads; most of the other payloads are idempotent even if there are duplicate log blocks.
   






[GitHub] [hudi] hudi-bot commented on pull request #9667: [HUDI-6836] Delta streamer: close metrics for metadata table

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9667:
URL: https://github.com/apache/hudi/pull/9667#issuecomment-1712412336

   
   ## CI report:
   
   * 098db07e60f504b10dc861dca14c97f852507a0d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19765)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9618: [HUDI-6753] Fix parquet inline reading flaky test

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9618:
URL: https://github.com/apache/hudi/pull/9618#issuecomment-1712410973

   
   ## CI report:
   
   * 64f49a57daad6a7927182f97a8219bc20f20df11 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19774)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19771)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9538: [HUDI-6738] - Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9538:
URL: https://github.com/apache/hudi/pull/9538#issuecomment-1712410900

   
   ## CI report:
   
   * e23804783126d93786c26ba43c3ec8f003bb977e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19755)
 
   * 4d0957dc91e137aa6f2302619a60a317001a7017 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19772)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9611:
URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712410962

   
   ## CI report:
   
   * 375c15de065dc9244a458b2324832e1f2b0f8bf9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19773)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19770)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9618: [HUDI-6753] Fix parquet inline reading flaky test

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9618:
URL: https://github.com/apache/hudi/pull/9618#issuecomment-1712409627

   
   ## CI report:
   
   * ad2e4fda3ffb21219ba4101fdc7c331572524a83 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19656)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19753)
 
   * 64f49a57daad6a7927182f97a8219bc20f20df11 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19771)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19774)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9611:
URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712409616

   
   ## CI report:
   
   * e86aa76be3833bbcec446d3ed65e05d4b7a94049 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19759)
 
   * 375c15de065dc9244a458b2324832e1f2b0f8bf9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19770)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19773)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] Zouxxyy commented on pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-09-08 Thread via GitHub


Zouxxyy commented on PR #9554:
URL: https://github.com/apache/hudi/pull/9554#issuecomment-1712404491

   Here is the error in the integration tests. I don't know much about the integration-testing environment; can anyone help?
   
   ```
   2023-09-08T05:11:59.7764700Z Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.SelfDescribingInputFormatInterface
   2023-09-08T05:11:59.7764906Z at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
   2023-09-08T05:11:59.7765092Z at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
   2023-09-08T05:11:59.7765284Z at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
   2023-09-08T05:11:59.7765373Z ... 58 more
   2023-09-08T05:11:59.7765560Z 23/09/08 05:11:59 INFO util.ShutdownHookManager: Shutdown hook called
   2023-09-08T05:11:59.7766126Z 23/09/08 05:11:59 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-b81218a3-32e6-4851-9b25-b15373acd05b
   2023-09-08T05:11:59.7766507Z 23/09/08 05:11:59 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-9b58a267-201d-4404-baeb-49e617b23ad1
   2023-09-08T05:11:59.7766647Z 
   2023-09-08T05:11:59.7766919Z Sep 08, 2023 5:11:59 AM org.glassfish.jersey.internal.Errors logErrors
   2023-09-08T05:11:59.7767534Z WARNING: The following warnings have been detected: WARNING: Cannot create new registration for component type class com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider: Existing previous registration found for the type.
   2023-09-08T05:11:59.7767548Z 
   2023-09-08T05:11:59.7768090Z [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 96.75 s <<< FAILURE! - in org.apache.hudi.integ.command.ITTestHoodieSyncCommand
   ```
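   The `ClassNotFoundException` suggests that the Hive jars on the integration-test classpath do not contain `SelfDescribingInputFormatInterface` (for example, because that environment ships an older Hive version; this is an assumption the log alone does not confirm). A small diagnostic like the following, run with the same classpath, could confirm it. The class `ClasspathCheck` is hypothetical.

```java
/**
 * Diagnostic sketch: reports whether each given class can be loaded from the
 * current classpath. Defaults to the interface missing in the CI log.
 */
public class ClasspathCheck {

  static boolean isLoadable(String className) {
    try {
      // initialize=false: only resolve the class, do not run static initializers
      Class.forName(className, false, ClasspathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String[] names = args.length > 0
        ? args
        : new String[] {"org.apache.hadoop.hive.ql.io.SelfDescribingInputFormatInterface"};
    for (String name : names) {
      System.out.println(name + " -> " + (isLoadable(name) ? "found" : "NOT found"));
    }
  }
}
```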





[GitHub] [hudi] hudi-bot commented on pull request #9538: [HUDI-6738] - Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9538:
URL: https://github.com/apache/hudi/pull/9538#issuecomment-1712402903

   
   ## CI report:
   
   * e23804783126d93786c26ba43c3ec8f003bb977e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19755)
 
   * 4d0957dc91e137aa6f2302619a60a317001a7017 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9482:
URL: https://github.com/apache/hudi/pull/9482#issuecomment-1712402873

   
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * 51c9a626b176d644b6294a57b3d59d70103a892f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19769)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] nsivabalan commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-08 Thread via GitHub


nsivabalan commented on PR #9611:
URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712401788

   @hudi-bot run azure





[GitHub] [hudi] nsivabalan commented on pull request #9618: [HUDI-6753] Fix parquet inline reading flaky test

2023-09-08 Thread via GitHub


nsivabalan commented on PR #9618:
URL: https://github.com/apache/hudi/pull/9618#issuecomment-1712401748

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #9618: [HUDI-6753] Fix parquet inline reading flaky test

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9618:
URL: https://github.com/apache/hudi/pull/9618#issuecomment-1712401396

   
   ## CI report:
   
   * ad2e4fda3ffb21219ba4101fdc7c331572524a83 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19656)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19753)
 
   * 64f49a57daad6a7927182f97a8219bc20f20df11 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9666: [HUDI-6834] Fixing time travel queries when overlaps with cleaner and archival time window

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9666:
URL: https://github.com/apache/hudi/pull/9666#issuecomment-1712401455

   
   ## CI report:
   
   * 9eb5b9774aff4779e33feac92a513a9543be9752 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19764)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9611:
URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712401387

   
   ## CI report:
   
   * e86aa76be3833bbcec446d3ed65e05d4b7a94049 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19759)
 
   * 375c15de065dc9244a458b2324832e1f2b0f8bf9 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9482:
URL: https://github.com/apache/hudi/pull/9482#issuecomment-1712401316

   
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * 51c9a626b176d644b6294a57b3d59d70103a892f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] nsivabalan commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution

2023-09-08 Thread via GitHub


nsivabalan commented on PR #9482:
URL: https://github.com/apache/hudi/pull/9482#issuecomment-1712400125

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #9669: [HUDI-6838] Fix file writers to honor bloom filter configs

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9669:
URL: https://github.com/apache/hudi/pull/9669#issuecomment-171232

   
   ## CI report:
   
   * 718f6b15ca75566c6ea6188e0ce98c45ab3a1732 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19767)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9668: [HUDI-6839] Github actions improvements

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9668:
URL: https://github.com/apache/hudi/pull/9668#issuecomment-1712399989

   
   ## CI report:
   
   * 259298768a177b37bfbb304becb104a278fbbdb2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19766)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[hudi] branch master updated: [HUDI-6820] Fixing CI stability issues (#9661)

2023-09-08 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 2acaa752db2 [HUDI-6820] Fixing CI stability issues (#9661)
2acaa752db2 is described below

commit 2acaa752db2ff1664d82f12ea54d4a76fba05e21
Author: Lokesh Jain 
AuthorDate: Sat Sep 9 08:43:29 2023 +0530

[HUDI-6820] Fixing CI stability issues (#9661)

- We see frequent flakiness in two modules (hudi-hadoop-mr and hudi-java-client), so move them from Azure CI to GitHub Actions.
- Add explicit timeouts to a few of the deltastreamer continuous-mode tests so that they fail instead of hanging until the CI job times out.
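JUnit 5 interprets the `@Timeout` value in seconds by default, so `@Timeout(600)` fails each annotated test after 10 minutes. The fail-fast idea can be sketched in plain Java without JUnit; this is only an illustration of the behavior, not how JUnit implements `@Timeout`, and `FailFastTimeoutDemo` is a hypothetical name.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Plain-Java sketch of failing fast: bound how long possibly-hanging work may
 * run, and turn an overrun into a test-style failure (AssertionError).
 */
public class FailFastTimeoutDemo {

  /** Runs the task, throwing AssertionError if it fails or exceeds the limit. */
  static void runWithTimeout(Runnable task, long timeout, TimeUnit unit) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<?> future = executor.submit(task);
      try {
        future.get(timeout, unit);
      } catch (TimeoutException e) {
        future.cancel(true); // interrupt the stuck task
        throw new AssertionError("task exceeded " + timeout + " " + unit, e);
      } catch (ExecutionException e) {
        throw new AssertionError("task failed", e.getCause());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new AssertionError("interrupted while waiting", e);
      }
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // Stand-in for a continuous-mode loop that never finishes on its own.
    Runnable hangingLoop = () -> {
      try {
        while (true) {
          Thread.sleep(50);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // exit once the task is cancelled
      }
    };
    try {
      runWithTimeout(hangingLoop, 200, TimeUnit.MILLISECONDS);
    } catch (AssertionError e) {
      System.out.println("failed fast: " + e.getMessage()); // failed fast: task exceeded 200 MILLISECONDS
    }
  }
}
```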

-

Co-authored-by: sivabalan 
---
 .github/workflows/bot.yml  | 32 ++
 azure-pipelines-20230430.yml   |  2 ++
 .../deltastreamer/TestHoodieDeltaStreamer.java |  5 
 3 files changed, 39 insertions(+)

diff --git a/.github/workflows/bot.yml b/.github/workflows/bot.yml
index 0811c828e49..acd51b8e123 100644
--- a/.github/workflows/bot.yml
+++ b/.github/workflows/bot.yml
@@ -112,6 +112,38 @@ jobs:
         run:
           mvn test -Pfunctional-tests -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -pl "$SPARK_COMMON_MODULES,$SPARK_MODULES" $MVN_ARGS
 
+  test-hudi-hadoop-mr-and-hudi-java-client:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        include:
+          - scalaProfile: "scala-2.12"
+            sparkProfile: "spark3.2"
+            flinkProfile: "flink1.17"
+
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up JDK 8
+        uses: actions/setup-java@v3
+        with:
+          java-version: '8'
+          distribution: 'adopt'
+          architecture: x64
+      - name: Build Project
+        env:
+          SCALA_PROFILE: ${{ matrix.scalaProfile }}
+          SPARK_PROFILE: ${{ matrix.sparkProfile }}
+          FLINK_PROFILE: ${{ matrix.flinkProfile }}
+        run:
+          mvn clean install -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -D"FLINK_PROFILE" -DskipTests=true -Phudi-platform-service $MVN_ARGS
+      - name: UT - hudi-hadoop-mr and hudi-client/hudi-java-client
+        env:
+          SCALA_PROFILE: ${{ matrix.scalaProfile }}
+          SPARK_PROFILE: ${{ matrix.sparkProfile }}
+          FLINK_PROFILE: ${{ matrix.flinkProfile }}
+        run:
+          mvn test -Punit-tests -fae -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -D"FLINK_PROFILE" -pl hudi-hadoop-mr,hudi-client/hudi-java-client $MVN_ARGS
+
   test-spark-java17:
     runs-on: ubuntu-latest
     strategy:
diff --git a/azure-pipelines-20230430.yml b/azure-pipelines-20230430.yml
index 2da5ab0d4f9..25a149b5cf4 100644
--- a/azure-pipelines-20230430.yml
+++ b/azure-pipelines-20230430.yml
@@ -53,6 +53,8 @@ parameters:
   - name: job4UTModules
     type: object
     default:
+      - '!hudi-hadoop-mr'
+      - '!hudi-client/hudi-java-client'
       - '!hudi-client/hudi-spark-client'
       - '!hudi-common'
       - '!hudi-examples'
diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java
index 6324fb83fc9..2a7db25647e 100644
--- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java
+++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java
@@ -120,6 +120,7 @@ import org.apache.spark.sql.types.StructField;
 import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.Disabled;
 import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.Timeout;
 import org.junit.jupiter.params.ParameterizedTest;
 import org.junit.jupiter.params.provider.Arguments;
 import org.junit.jupiter.params.provider.CsvSource;
@@ -869,6 +870,7 @@ public class TestHoodieDeltaStreamer extends HoodieDeltaStreamerTestBase {
     defaultSchemaProviderClassName = FilebasedSchemaProvider.class.getName();
   }
 
+  @Timeout(600)
   @ParameterizedTest
   @EnumSource(value = HoodieRecordType.class, names = {"AVRO", "SPARK"})
   public void testUpsertsCOWContinuousMode(HoodieRecordType recordType) throws Exception {
@@ -892,12 +894,14 @@ public class TestHoodieDeltaStreamer extends HoodieDeltaStreamerTestBase {
     UtilitiesTestBase.Helpers.deleteFileFromDfs(fs, tableBasePath);
   }
 
+  @Timeout(600)
   @ParameterizedTest
   @EnumSource(value = HoodieRecordType.class, names = {"AVRO"})
   public void testUpsertsMORContinuousModeShutdownGracefully(HoodieRecordType recordType) throws Exception {
     testUpsertsContinuousMode(HoodieTableType.MERGE_ON_READ, "continuous_cow", true, recordType);
   }
 
+  @Timeout(600)
   @ParameterizedTest
   @EnumSource(value = HoodieRecordType.class, names = {"AVRO", "SPARK"})
   public void testUpsertsMORContin

[GitHub] [hudi] nsivabalan merged pull request #9661: [HUDI-6820] Fixing CI stability issues

2023-09-08 Thread via GitHub


nsivabalan merged PR #9661:
URL: https://github.com/apache/hudi/pull/9661


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #9661: [HUDI-6820] Fixing CI stability issues

2023-09-08 Thread via GitHub


nsivabalan commented on PR #9661:
URL: https://github.com/apache/hudi/pull/9661#issuecomment-1712397965

   https://github.com/apache/hudi/assets/513218/2007cdde-0353-4281-9bd8-94b8b35aea85
   





[GitHub] [hudi] hudi-bot commented on pull request #9669: [HUDI-6838] Fix file writers to honor bloom filter configs

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9669:
URL: https://github.com/apache/hudi/pull/9669#issuecomment-1712391641

   
   ## CI report:
   
   * 718f6b15ca75566c6ea6188e0ce98c45ab3a1732 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9668: [HUDI-6839] Github actions improvements

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9668:
URL: https://github.com/apache/hudi/pull/9668#issuecomment-1712391635

   
   ## CI report:
   
   * 259298768a177b37bfbb304becb104a278fbbdb2 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] the-other-tim-brown commented on pull request #9650: [HUDI-6831] Add back missing project_id to query statement in BigQuerySyncTool

2023-09-08 Thread via GitHub


the-other-tim-brown commented on PR #9650:
URL: https://github.com/apache/hudi/pull/9650#issuecomment-1712390246

   > @the-other-tim-brown Do you have interest in reviewing this PR?
   
   LGTM





[jira] [Updated] (HUDI-6838) Fix file writers to honor bloom filter configs

2023-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6838:
-
Labels: pull-request-available  (was: )

> Fix file writers to honor bloom filter configs
> --
>
> Key: HUDI-6838
> URL: https://issues.apache.org/jira/browse/HUDI-6838
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> Bloom filter configs are hard-coded in `HoodieFileWriterFactory`
> {code:java}
>   protected BloomFilter createBloomFilter(HoodieConfig config) {
>     return BloomFilterFactory.createBloomFilter(60000, 0.000000001, 100000,
>         BloomFilterTypeCode.DYNAMIC_V0.name());
>   }
> {code}
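The factory call above hard-codes, among other arguments, the expected number of entries and the target false-positive rate, which together decide how large each filter is and how accurate it can be. As a rough illustration of why those values should come from configuration, the sketch below evaluates the textbook sizing formulas for a standard Bloom filter, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. It is not Hudi's `BloomFilterFactory`; the class name `BloomSizing` and the sample inputs are hypothetical, and Hudi's dynamic filter type may grow or size itself differently.

```java
// Textbook Bloom filter sizing; an illustration, not Hudi's BloomFilterFactory.
final class BloomSizing {
  private BloomSizing() {}

  /** Bits needed: m = -n * ln(p) / (ln 2)^2, rounded up. */
  static long optimalBits(long expectedEntries, double falsePositiveRate) {
    if (expectedEntries <= 0 || falsePositiveRate <= 0 || falsePositiveRate >= 1) {
      throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
    }
    double ln2 = Math.log(2);
    return (long) Math.ceil(-expectedEntries * Math.log(falsePositiveRate) / (ln2 * ln2));
  }

  /** Hash functions: k = (m / n) * ln 2, rounded, at least 1. */
  static int optimalHashes(long bits, long expectedEntries) {
    return Math.max(1, (int) Math.round((double) bits / expectedEntries * Math.log(2)));
  }

  public static void main(String[] args) {
    long n = 60_000;  // hypothetical expected entry count
    double p = 1e-9;  // hypothetical target false-positive rate
    long m = optimalBits(n, p);
    System.out.printf("bits=%d (about %d KiB), hashes=%d%n", m, m / 8 / 1024, optimalHashes(m, n));
  }
}
```

The bit count grows linearly in n and in ln(1/p), so a single hard-coded pair cannot suit tables with very different file sizes or accuracy needs.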



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] yihua opened a new pull request, #9669: [HUDI-6838] Fix file writers to honor bloom filter configs

2023-09-08 Thread via GitHub


yihua opened a new pull request, #9669:
URL: https://github.com/apache/hudi/pull/9669

   ### Change Logs
   
   This fixes the Hudi file writers to honor bloom filter configs. Before this fix, the bloom filter parameters were hard-coded in `HoodieFileWriterFactory`.
   
   Because `HoodieFileWriterFactory` is in the `hudi-common` module, the bloom filter-related configs are moved from `HoodieIndexConfig` (in the `hudi-client-common` module) to `HoodieStorageConfig` (in the `hudi-common` module) so that they can be referenced.
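A minimal sketch of the idea behind the fix: read each bloom filter parameter from the writer's configuration and fall back to a default only when the key is unset, instead of passing literals. It uses plain `java.util.Properties` rather than Hudi's `HoodieConfig`/`HoodieStorageConfig`, and the `BloomFilterParams` class is hypothetical. The key names are the Hudi bloom index keys (`hoodie.index.bloom.num_entries`, `hoodie.index.bloom.fpp`, `hoodie.bloom.index.filter.dynamic.max.entries`, `hoodie.bloom.index.filter.type`); the default values shown are assumptions to check against the release in use.

```java
import java.util.Properties;

// Hypothetical holder for bloom filter settings; not a Hudi class.
final class BloomFilterParams {
  // Hudi bloom index config keys; the defaults used below are assumptions.
  static final String NUM_ENTRIES = "hoodie.index.bloom.num_entries";
  static final String FPP = "hoodie.index.bloom.fpp";
  static final String DYNAMIC_MAX_ENTRIES = "hoodie.bloom.index.filter.dynamic.max.entries";
  static final String FILTER_TYPE = "hoodie.bloom.index.filter.type";

  final int numEntries;
  final double fpp;
  final int dynamicMaxEntries;
  final String filterType;

  private BloomFilterParams(int numEntries, double fpp, int dynamicMaxEntries, String filterType) {
    this.numEntries = numEntries;
    this.fpp = fpp;
    this.dynamicMaxEntries = dynamicMaxEntries;
    this.filterType = filterType;
  }

  /** Reads each parameter from the config, using a default only when the key is absent. */
  static BloomFilterParams fromConfig(Properties props) {
    return new BloomFilterParams(
        Integer.parseInt(props.getProperty(NUM_ENTRIES, "60000")),
        Double.parseDouble(props.getProperty(FPP, "0.000000001")),
        Integer.parseInt(props.getProperty(DYNAMIC_MAX_ENTRIES, "100000")),
        props.getProperty(FILTER_TYPE, "DYNAMIC_V0"));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(NUM_ENTRIES, "200000"); // a user override that should now take effect
    BloomFilterParams p = fromConfig(props);
    System.out.println(p.numEntries + " " + p.fpp + " " + p.dynamicMaxEntries + " " + p.filterType);
  }
}
```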
   
   ### Impact
   
   Make sure bloom filter configs take effect.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   We need to add a regression note.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-6839) Github Actions Workflow Improvements

2023-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6839:
-
Labels: pull-request-available  (was: )

> Github Actions Workflow Improvements
> 
>
> Key: HUDI-6839
> URL: https://issues.apache.org/jira/browse/HUDI-6839
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> # Leverage maven cache option for build speed
>  # Use parallel build when packaging jars for tests
>  # Cancel inflight tests when updates to branches are pushed to save on costs





[GitHub] [hudi] the-other-tim-brown opened a new pull request, #9668: [HUDI-6839] Github actions improvements

2023-09-08 Thread via GitHub


the-other-tim-brown opened a new pull request, #9668:
URL: https://github.com/apache/hudi/pull/9668

   ### Change Logs
   
   - Cancel running tests when a PR is updated, to save on the cost of maintaining the project
   - Use the Maven cache option of `actions/setup-java`: https://github.com/actions/setup-java#caching-packages-dependencies
   - Use Maven build parallelism when running `install` with `-DskipTests` so that modules are built in parallel
   
   ### Impact
   
   Improve CI times
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #9664: [HUDI-6835] Adjust spark sql core flow test scenarios

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9664:
URL: https://github.com/apache/hudi/pull/9664#issuecomment-1712388169

   
   ## CI report:
   
   * 3d758a3723cc125009b6129f526c5df1d35bbf8d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19761)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] danny0405 commented on pull request #9650: [HUDI-6831] Add back missing project_id to query statement in BigQuerySyncTool

2023-09-08 Thread via GitHub


danny0405 commented on PR #9650:
URL: https://github.com/apache/hudi/pull/9650#issuecomment-1712388181

   @the-other-tim-brown Do you have interest in reviewing this PR?





[GitHub] [hudi] danny0405 commented on a diff in pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-08 Thread via GitHub


danny0405 commented on code in PR #9611:
URL: https://github.com/apache/hudi/pull/9611#discussion_r1320459648


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/FlinkTaskContextSupplier.java:
##
@@ -62,4 +62,9 @@ public Option getProperty(EngineProperty prop) {
 return Option.empty();
   }
 
+  @Override
+  public Supplier getAttemptNumberSupplier() {
+return () -> -1;

Review Comment:
   > only updates go to log files.
   
   Only true for Spark, so you are fixing a bug that depends on a Spark write characteristic.
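The diff under review makes the Flink supplier return `-1` for the attempt number. As a hypothetical sketch of how an attempt number can be used to drop log blocks left behind by retried tasks, assume each block records the task and attempt that wrote it, and that the highest attempt of a task is the one whose output should survive (speculative execution can break that assumption). `LogBlockRef` and `dropSpuriousBlocks` are invented names, not Hudi APIs, and treating `-1` as "attempt unknown, keep the block" is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical reference to a log block and the task attempt that wrote it.
final class LogBlockRef {
  final String taskId;
  final int attempt; // -1: the engine did not report an attempt number
  final String name;

  LogBlockRef(String taskId, int attempt, String name) {
    this.taskId = taskId;
    this.attempt = attempt;
    this.name = name;
  }
}

final class SpuriousBlockFilter {
  private SpuriousBlockFilter() {}

  /**
   * Keeps, for each task, only the blocks written by its highest attempt number.
   * Blocks with attempt -1 carry no retry information and are always kept.
   */
  static List<LogBlockRef> dropSpuriousBlocks(List<LogBlockRef> blocks) {
    Map<String, Integer> latestAttempt = new HashMap<>();
    for (LogBlockRef b : blocks) {
      if (b.attempt >= 0) {
        latestAttempt.merge(b.taskId, b.attempt, Integer::max);
      }
    }
    List<LogBlockRef> kept = new ArrayList<>();
    for (LogBlockRef b : blocks) {
      // For attempt >= 0 the map always has an entry for this task, so unboxing is safe.
      if (b.attempt < 0 || b.attempt == latestAttempt.get(b.taskId)) {
        kept.add(b);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Supplier<Integer> sparkRetry = () -> 1;    // e.g. the second attempt of a Spark task
    Supplier<Integer> flinkAttempt = () -> -1; // mirrors the Flink supplier in the diff
    List<LogBlockRef> blocks = new ArrayList<>();
    blocks.add(new LogBlockRef("task-0", 0, "block-a")); // left behind by the failed first attempt
    blocks.add(new LogBlockRef("task-0", sparkRetry.get(), "block-b"));
    blocks.add(new LogBlockRef("task-7", flinkAttempt.get(), "block-c"));
    for (LogBlockRef b : dropSpuriousBlocks(blocks)) {
      System.out.println(b.name); // prints block-b, then block-c
    }
  }
}
```

With a `-1` supplier every block is kept, so on such an engine this kind of filter silently does nothing, which is one reason an engine-specific assumption deserves scrutiny.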






[jira] [Assigned] (HUDI-6839) Github Actions Workflow Improvements

2023-09-08 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6839:
---

Assignee: Timothy Brown

> Github Actions Workflow Improvements
> 
>
> Key: HUDI-6839
> URL: https://issues.apache.org/jira/browse/HUDI-6839
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>
> # Leverage maven cache option for build speed
>  # Use parallel build when packaging jars for tests
>  # Cancel inflight tests when updates to branches are pushed to save on costs





[jira] [Created] (HUDI-6839) Github Actions Workflow Improvements

2023-09-08 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6839:
---

 Summary: Github Actions Workflow Improvements
 Key: HUDI-6839
 URL: https://issues.apache.org/jira/browse/HUDI-6839
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


# Leverage maven cache option for build speed
 # Use parallel build when packaging jars for tests
 # Cancel inflight tests when updates to branches are pushed to save on costs





[jira] [Updated] (HUDI-6838) Fix file writers to honor bloom filter configs

2023-09-08 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6838:

Priority: Blocker  (was: Major)

> Fix file writers to honor bloom filter configs
> --
>
> Key: HUDI-6838
> URL: https://issues.apache.org/jira/browse/HUDI-6838
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Priority: Blocker
> Fix For: 0.14.0
>
>
> Bloom filter configs are hard-coded in `HoodieFileWriterFactory`
> {code:java}
>   protected BloomFilter createBloomFilter(HoodieConfig config) {
>     return BloomFilterFactory.createBloomFilter(60000, 0.000000001, 100000,
>         BloomFilterTypeCode.DYNAMIC_V0.name());
>   }
> {code}





[jira] [Updated] (HUDI-6838) Fix file writers to honor bloom filter configs

2023-09-08 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6838:

Description: 
Bloom filter configs are hard-coded in `HoodieFileWriterFactory`

{code:java}
  protected BloomFilter createBloomFilter(HoodieConfig config) {
    return BloomFilterFactory.createBloomFilter(60000, 0.000000001, 100000,
        BloomFilterTypeCode.DYNAMIC_V0.name());
  }
{code}


> Fix file writers to honor bloom filter configs
> --
>
> Key: HUDI-6838
> URL: https://issues.apache.org/jira/browse/HUDI-6838
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.14.0
>
>
> Bloom filter configs are hard-coded in `HoodieFileWriterFactory`
> {code:java}
>   protected BloomFilter createBloomFilter(HoodieConfig config) {
>     return BloomFilterFactory.createBloomFilter(60000, 0.000000001, 100000,
>         BloomFilterTypeCode.DYNAMIC_V0.name());
>   }
> {code}





[jira] [Assigned] (HUDI-6838) Fix file writers to honor bloom filter configs

2023-09-08 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-6838:
---

Assignee: Ethan Guo

> Fix file writers to honor bloom filter configs
> --
>
> Key: HUDI-6838
> URL: https://issues.apache.org/jira/browse/HUDI-6838
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.14.0
>
>
> Bloom filter configs are hard-coded in `HoodieFileWriterFactory`
> {code:java}
>   protected BloomFilter createBloomFilter(HoodieConfig config) {
>     return BloomFilterFactory.createBloomFilter(60000, 0.000000001, 100000,
>         BloomFilterTypeCode.DYNAMIC_V0.name());
>   }
> {code}





[jira] [Updated] (HUDI-6838) Fix file writers to honor bloom filter configs

2023-09-08 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6838:

Fix Version/s: 0.14.0

> Fix file writers to honor bloom filter configs
> --
>
> Key: HUDI-6838
> URL: https://issues.apache.org/jira/browse/HUDI-6838
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Created] (HUDI-6838) Fix file writers to honor bloom filter configs

2023-09-08 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6838:
---

 Summary: Fix file writers to honor bloom filter configs
 Key: HUDI-6838
 URL: https://issues.apache.org/jira/browse/HUDI-6838
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Ethan Guo








[GitHub] [hudi] hudi-bot commented on pull request #9661: [HUDI-6820] Fixing CI stability issues

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9661:
URL: https://github.com/apache/hudi/pull/9661#issuecomment-1712374663

   
   ## CI report:
   
   * 1932225fb7c7c1d6b9518fbd3b455050002ccc63 UNKNOWN
   * ef9d12359f8d84c024812c262843e0a6699c4540 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19760)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19758)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9667: [HUDI-6836] Delta streamer: close metrics for metadata table

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9667:
URL: https://github.com/apache/hudi/pull/9667#issuecomment-1712361736

   
   ## CI report:
   
   * 098db07e60f504b10dc861dca14c97f852507a0d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19765)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9665: [HUDI-6478] Deduce op as upsert for INSERT INTO

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9665:
URL: https://github.com/apache/hudi/pull/9665#issuecomment-1712361720

   
   ## CI report:
   
   * 867f25a6378b6e522a35e11f7e4e622efc6e360a UNKNOWN
   * 18d03141e69fc59e700e9abf9b14ba4028ce307d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19763)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9665: [HUDI-6478] Deduce op as upsert for INSERT INTO

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9665:
URL: https://github.com/apache/hudi/pull/9665#issuecomment-1712359057

   
   ## CI report:
   
   * b134855bf923e5932c76b3a5117515cd3e49afe7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19762)
 
   * 867f25a6378b6e522a35e11f7e4e622efc6e360a UNKNOWN
   * 18d03141e69fc59e700e9abf9b14ba4028ce307d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9667: [HUDI-6836] Delta streamer: close metrics for metadata table

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9667:
URL: https://github.com/apache/hudi/pull/9667#issuecomment-1712359087

   
   ## CI report:
   
   * 098db07e60f504b10dc861dca14c97f852507a0d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9666: [HUDI-6834] Fixing time travel queries when overlaps with cleaner and archival time window

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9666:
URL: https://github.com/apache/hudi/pull/9666#issuecomment-1712359070

   
   ## CI report:
   
   * 9eb5b9774aff4779e33feac92a513a9543be9752 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19764)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9665: [HUDI-6478] Deduce op as upsert for INSERT INTO

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9665:
URL: https://github.com/apache/hudi/pull/9665#issuecomment-1712355980

   
   ## CI report:
   
   * b134855bf923e5932c76b3a5117515cd3e49afe7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19762)
 
   * 867f25a6378b6e522a35e11f7e4e622efc6e360a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9666: [HUDI-6834] Fixing time travel queries when overlaps with cleaner and archival time window

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9666:
URL: https://github.com/apache/hudi/pull/9666#issuecomment-1712356002

   
   ## CI report:
   
   * 9eb5b9774aff4779e33feac92a513a9543be9752 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9664: [HUDI-6835] Adjust spark sql core flow test scenarios

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9664:
URL: https://github.com/apache/hudi/pull/9664#issuecomment-1712355945

   
   ## CI report:
   
   * 3d758a3723cc125009b6129f526c5df1d35bbf8d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19761)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-6784) Clean Merger API and its invocations

2023-09-08 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-6784:
--
Summary: Clean Merger API and its invocations  (was: Support custom logic 
for deletion)

> Clean Merger API and its invocations
> 
>
> Key: HUDI-6784
> URL: https://issues.apache.org/jira/browse/HUDI-6784
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Add `Optional<>` for the newer parameter in the merger. If newer is empty, it means this is a deletion operation.
>  
> The goal of this task is:
>  # Clean up the API design of the merger so that it supports all operations for a record.
>  # Insert calls to it in the correct places, so that both default logic and custom logic can be supported.
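A minimal sketch of the API shape the description proposes, where an empty `newer` value signals a delete and the merge result decides whether a record survives. The `OptionalMerger` interface and the two policies are hypothetical and deliberately simplified (plain strings instead of Hudi records, `java.util.Optional` instead of Hudi's own `Option`); they are not the signature of Hudi's `HoodieRecordMerger`.

```java
import java.util.Optional;

// Hypothetical merger contract: an empty newer value means the incoming operation is a delete,
// and an empty result means no record survives the merge.
interface OptionalMerger<R> {
  Optional<R> merge(Optional<R> older, Optional<R> newer);
}

// Default-style policy: the newer value wins, so an empty newer value deletes the record.
final class LatestWinsMerger implements OptionalMerger<String> {
  @Override
  public Optional<String> merge(Optional<String> older, Optional<String> newer) {
    return newer;
  }
}

// Custom policy: deletes are ignored and the older value is kept.
final class IgnoreDeletesMerger implements OptionalMerger<String> {
  @Override
  public Optional<String> merge(Optional<String> older, Optional<String> newer) {
    return newer.isPresent() ? newer : older;
  }
}

final class MergerDemo {
  public static void main(String[] args) {
    OptionalMerger<String> latest = new LatestWinsMerger();
    OptionalMerger<String> ignoreDeletes = new IgnoreDeletesMerger();
    System.out.println(latest.merge(Optional.of("v1"), Optional.empty()));        // Optional.empty
    System.out.println(ignoreDeletes.merge(Optional.of("v1"), Optional.empty())); // Optional[v1]
    System.out.println(latest.merge(Optional.of("v1"), Optional.of("v2")));       // Optional[v2]
  }
}
```

Because deletion is expressed through the same `merge` call, a custom policy can change delete semantics without a separate code path; the trade-off is that `Optional` parameters are often discouraged in Java style guides.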





[jira] [Updated] (HUDI-6784) Support custom logic for deletion

2023-09-08 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-6784:
--
Description: 
Add `Optional<>` for the newer parameter in the merger. If newer is empty, it means this is a deletion operation.

 

The goal of this task is:
 # Clean up the API design of the merger so that it supports all operations for a record.
 # Insert calls to it in the correct places, so that both default logic and custom logic can be supported.

  was:Add `Optional<>` for newer parameter in merger. If newer is empty, then 
it means this is a deletion operation.


> Support custom logic for deletion
> -
>
> Key: HUDI-6784
> URL: https://issues.apache.org/jira/browse/HUDI-6784
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Add `Optional<>` for the newer parameter in the merger. If newer is empty, it means this is a deletion operation.
>  
> The goal of this task is:
>  # Clean up the API design of the merger so that it supports all operations for a record.
>  # Insert calls to it in the correct places, so that both default logic and custom logic can be supported.





[jira] [Closed] (HUDI-6837) Ensure the getInsertValue is wrapped correctly

2023-09-08 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu closed HUDI-6837.
-
Resolution: Resolved

> Ensure the getInsertValue is wrapped correctly
> --
>
> Key: HUDI-6837
> URL: https://issues.apache.org/jira/browse/HUDI-6837
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Commented] (HUDI-6837) Ensure the getInsertValue is wrapped correctly

2023-09-08 Thread Lin Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763288#comment-17763288
 ] 

Lin Liu commented on HUDI-6837:
---

The usage of getInsertValue has been verified. The only remaining use of this function is for HoodieMetadataRecord, which will be handled separately, so we can close this task.

> Ensure the getInsertValue is wrapped correctly
> --
>
> Key: HUDI-6837
> URL: https://issues.apache.org/jira/browse/HUDI-6837
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-6836) Shutdown metrics for metadata table writer in deltastreamer

2023-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6836:
-
Labels: pull-request-available  (was: )

> Shutdown metrics for metadata table writer in deltastreamer
> ---
>
> Key: HUDI-6836
> URL: https://issues.apache.org/jira/browse/HUDI-6836
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>  Labels: pull-request-available
>
> When debugging some Deltastreamer tests, I noticed that there is still a 
> running metrics instance for the metadata table path.





[jira] [Created] (HUDI-6837) Ensure the getInsertValue is wrapped correctly

2023-09-08 Thread Lin Liu (Jira)
Lin Liu created HUDI-6837:
-

 Summary: Ensure the getInsertValue is wrapped correctly
 Key: HUDI-6837
 URL: https://issues.apache.org/jira/browse/HUDI-6837
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Lin Liu
 Fix For: 1.0.0








[GitHub] [hudi] the-other-tim-brown opened a new pull request, #9667: [HUDI-6836] Delta streamer: close metrics for metadata table

2023-09-08 Thread via GitHub


the-other-tim-brown opened a new pull request, #9667:
URL: https://github.com/apache/hudi/pull/9667

   ### Change Logs
   
   Updates the shutdown of the delta streamer to shut down the metrics for the metadata table writer as well as the metrics registered for the main base path if the metadata table is enabled.
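A minimal sketch of that shutdown pattern: close every reporter the writer registered, one for the main base path and one for the metadata table path, and keep closing the rest even if one fails. It assumes each reporter can be treated as an `AutoCloseable`; `MetricsReporterHandle` and `shutdownAll` are hypothetical names, not Hudi's metrics classes, and the paths are examples only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a metrics reporter bound to one table base path.
final class MetricsReporterHandle implements AutoCloseable {
  final String basePath;
  boolean closed;

  MetricsReporterHandle(String basePath) {
    this.basePath = basePath;
  }

  @Override
  public void close() {
    closed = true;
  }
}

final class MetricsShutdown {
  private MetricsShutdown() {}

  /** Closes every reporter, even if some throw, then rethrows the first failure. */
  static void shutdownAll(List<? extends AutoCloseable> reporters) throws Exception {
    Exception first = null;
    for (AutoCloseable r : reporters) {
      try {
        r.close();
      } catch (Exception e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }

  public static void main(String[] args) throws Exception {
    List<MetricsReporterHandle> reporters = new ArrayList<>();
    reporters.add(new MetricsReporterHandle("/tmp/table"));                  // main base path (example)
    reporters.add(new MetricsReporterHandle("/tmp/table/.hoodie/metadata")); // metadata table path (example)
    shutdownAll(reporters);
    for (MetricsReporterHandle r : reporters) {
      System.out.println(r.basePath + " closed=" + r.closed);
    }
  }
}
```

Later failures are attached to the first one as suppressed exceptions, so a single misbehaving reporter cannot leave the others running unnoticed.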
   
   ### Impact
   
   Shuts down a metrics reporter
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Comment Edited] (HUDI-6702) Extend merge API to support all merging operations

2023-09-08 Thread Lin Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763287#comment-17763287
 ] 

Lin Liu edited comment on HUDI-6702 at 9/8/23 11:46 PM:


The first step of the task is to ensure that HoodieRecordPayload.getInsertValue
is called correctly. I have checked the current design, and the calls to
getInsertValue have been wrapped in the HoodieAvroRecord class. So the first
goal of this task has been met.

The next subtask is to ensure that we apply the merge API in the places where
custom logic should be added.


was (Author: JIRAUSER301185):
The first step of the task is to ensure that HoodieRecordPayload.getInsertValue
is called correctly. I have checked the current design, and the calls to
getInsertValue have been wrapped in the HoodieAvroRecord class. So this task
has been done.

> Extend merge API to support all merging operations
> --
>
> Key: HUDI-6702
> URL: https://issues.apache.org/jira/browse/HUDI-6702
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Sagar Sumit
>Assignee: Lin Liu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> See this issue for more details- [https://github.com/apache/hudi/issues/9430]
> We may have to introduce a new API or figure out a way for the current merger 
> to skip empty records.





[jira] [Created] (HUDI-6836) Shutdown metrics for metadata table writer in deltastreamer

2023-09-08 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6836:
---

 Summary: Shutdown metrics for metadata table writer in 
deltastreamer
 Key: HUDI-6836
 URL: https://issues.apache.org/jira/browse/HUDI-6836
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


When debugging some Deltastreamer tests, I noticed that there is still a 
running metrics instance for the metadata table path.





[jira] [Assigned] (HUDI-6836) Shutdown metrics for metadata table writer in deltastreamer

2023-09-08 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6836:
---

Assignee: Timothy Brown

> Shutdown metrics for metadata table writer in deltastreamer
> ---
>
> Key: HUDI-6836
> URL: https://issues.apache.org/jira/browse/HUDI-6836
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>
> When debugging some Deltastreamer tests, I noticed that there is still a 
> running metrics instance for the metadata table path.





[jira] [Commented] (HUDI-6702) Extend merge API to support all merging operations

2023-09-08 Thread Lin Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763287#comment-17763287
 ] 

Lin Liu commented on HUDI-6702:
---

The first step of the task is to ensure that HoodieRecordPayload.getInsertValue
is called correctly. I have checked the current design, and the calls to
getInsertValue have been wrapped in the HoodieAvroRecord class. So this task
has been done.

> Extend merge API to support all merging operations
> --
>
> Key: HUDI-6702
> URL: https://issues.apache.org/jira/browse/HUDI-6702
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Sagar Sumit
>Assignee: Lin Liu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> See this issue for more details: [https://github.com/apache/hudi/issues/9430]
> We may have to introduce a new API or figure out a way for the current merger 
> to skip empty records.





[GitHub] [hudi] xushiyan commented on a diff in pull request #9665: [HUDI-6478] Deduce op as upsert for INSERT INTO

2023-09-08 Thread via GitHub


xushiyan commented on code in PR #9665:
URL: https://github.com/apache/hudi/pull/9665#discussion_r1320425108


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/AutoRecordKeyGenerationUtils.scala:
##
@@ -32,29 +31,29 @@ object AutoRecordKeyGenerationUtils {
   private val log = LoggerFactory.getLogger(getClass)
 
   def mayBeValidateParamsForAutoGenerationOfRecordKeys(parameters: Map[String, String], hoodieConfig: HoodieConfig): Unit = {
-val autoGenerateRecordKeys = isAutoGenerateRecordKeys(parameters)
-// hudi will auto generate.
-if (autoGenerateRecordKeys) {

Review Comment:
   The diff here is just for the early return.






[GitHub] [hudi] hudi-bot commented on pull request #9665: [HUDI-6478] Deduce op as upsert for INSERT INTO

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9665:
URL: https://github.com/apache/hudi/pull/9665#issuecomment-1712334697

   
   ## CI report:
   
   * b134855bf923e5932c76b3a5117515cd3e49afe7 UNKNOWN
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9664: [HUDI-6835] Adjust spark sql core flow test scenarios

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9664:
URL: https://github.com/apache/hudi/pull/9664#issuecomment-1712334662

   
   ## CI report:
   
   * 3d758a3723cc125009b6129f526c5df1d35bbf8d UNKNOWN
   
   





[GitHub] [hudi] nsivabalan opened a new pull request, #9666: [HUDI-6834] Fixing time travel queries when overlaps with cleaner and archival time window

2023-09-08 Thread via GitHub


nsivabalan opened a new pull request, #9666:
URL: https://github.com/apache/hudi/pull/9666

   ### Change Logs
   
   When a time travel query overlaps with the cleaner or archival window, we should
explicitly fail the query. Otherwise, we might end up serving partial or wrong
results, or empty rows.
   
   ### Impact
   
   When a time travel query overlaps with the cleaner or archival window, we should
explicitly fail the query. Otherwise, we might end up serving partial or wrong
results, or empty rows.
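   A minimal Java sketch of such a guard, assuming instant times are fixed-width timestamp strings that sort chronologically, and that the caller already knows both the first instant of the active timeline and the earliest commit the cleaner still retains. `TimeTravelGuardSketch` and its methods are hypothetical and are not the code in this PR.

```java
// Hypothetical guard for a time travel ("as of") query. Instant times are assumed
// to be fixed-width timestamp strings (e.g. yyyyMMddHHmmssSSS), so string
// comparison matches chronological order.
final class TimeTravelGuardSketch {

    // Earliest instant whose file slices should still exist: the later of the first
    // active-timeline instant (archival bound) and the earliest commit the cleaner
    // retains (cleaning bound). Both inputs are assumed to come from the caller.
    static String earliestServableInstant(String firstActiveInstant, String earliestCleanerRetained) {
        return firstActiveInstant.compareTo(earliestCleanerRetained) >= 0
            ? firstActiveInstant
            : earliestCleanerRetained;
    }

    // Fail the query explicitly instead of silently returning partial or empty results.
    static void validateAsOfInstant(String asOfInstant, String earliestServable) {
        if (asOfInstant.compareTo(earliestServable) < 0) {
            throw new IllegalArgumentException("Time travel instant " + asOfInstant
                + " is older than the earliest servable instant " + earliestServable
                + "; its file slices may have been cleaned or archived");
        }
    }
}
```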
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website.
Please create a Jira ticket, attach the ticket number here and follow the
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to
make changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-6834) Time travel query for an instant not in active timeline should throw exception

2023-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6834:
-
Labels: pull-request-available  (was: )

> Time travel query for an instant not in active timeline should throw 
> exception 
> ---
>
> Key: HUDI-6834
> URL: https://issues.apache.org/jira/browse/HUDI-6834
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: reader-core
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> With a timestamp-as-of query, if the requested timestamp has been cleaned or
> archived, we should throw an exception, since none of the file slices might
> be available to serve it.





[GitHub] [hudi] hudi-bot commented on pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9473:
URL: https://github.com/apache/hudi/pull/9473#issuecomment-1712330556

   
   ## CI report:
   
   * 9498e0dc2757dee0736d9500ddf2c7e9ff64cca5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19537)
 
   
   





[GitHub] [hudi] xushiyan opened a new pull request, #9665: [HUDI-6478] Deduce op as upsert for INSERT INTO

2023-09-08 Thread via GitHub


xushiyan opened a new pull request, #9665:
URL: https://github.com/apache/hudi/pull/9665

   ### Change Logs
   
   When users explicitly define `primaryKey` and `preCombineField` in `CREATE
TABLE`, a subsequent `INSERT INTO` will deduce the operation as `UPSERT`.
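   A minimal Java sketch of the deduction rule as stated above, assuming the table properties are available as a plain map. `InsertIntoOperationSketch` is hypothetical, and falling back to `insert` when either property is missing is an assumption, not something this PR states.

```java
import java.util.Map;

// Hypothetical sketch of the INSERT INTO operation deduction described in the PR.
final class InsertIntoOperationSketch {

    // `primaryKey` and `preCombineField` are the table properties set in CREATE TABLE.
    // If both are present, INSERT INTO is deduced as an upsert. The "insert" fallback
    // is an assumption, and any explicitly configured operation is ignored here.
    static String deduceOperation(Map<String, String> tableProperties) {
        boolean hasPrimaryKey = !tableProperties.getOrDefault("primaryKey", "").isEmpty();
        boolean hasPreCombineField = !tableProperties.getOrDefault("preCombineField", "").isEmpty();
        return (hasPrimaryKey && hasPreCombineField) ? "upsert" : "insert";
    }
}
```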
   
   ### Impact
   
   Spark `INSERT INTO` semantics.
   
   ### Risk level
   
   Medium
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-6835) Adjust spark sql core flow test scenarios

2023-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6835:
-
Labels: pull-request-available  (was: )

> Adjust spark sql core flow test scenarios
> -
>
> Key: HUDI-6835
> URL: https://issues.apache.org/jira/browse/HUDI-6835
> Project: Apache Hudi
>  Issue Type: Test
>  Components: spark-sql, tests-ci
>Reporter: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> - Adjust the scenarios to cover GLOBAL_SIMPLE and GLOBAL_BLOOM cases.
> - For MDT disabled/enabled case, disable/enable both writer and reader





[GitHub] [hudi] xushiyan commented on pull request #9664: [HUDI-6835] Adjust spark sql core flow test scenarios

2023-09-08 Thread via GitHub


xushiyan commented on PR #9664:
URL: https://github.com/apache/hudi/pull/9664#issuecomment-1712321505

   @jonvex @nsivabalan can you please take a look?





[GitHub] [hudi] xushiyan opened a new pull request, #9664: [HUDI-6835] Adjust spark sql core flow test scenarios

2023-09-08 Thread via GitHub


xushiyan opened a new pull request, #9664:
URL: https://github.com/apache/hudi/pull/9664

   ### Change Logs
   
   - Adjust the scenarios to cover GLOBAL_SIMPLE and GLOBAL_BLOOM cases.
   - For MDT disabled/enabled case, disable/enable both writer and reader
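   A rough Java sketch of the resulting scenario matrix, assuming options are passed as plain string maps. `hoodie.index.type` and `hoodie.metadata.enable` are Hudi config keys; `CoreFlowScenario` and `CoreFlowScenarios` are hypothetical names, and the real test suite likely varies more dimensions than these two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// One hypothetical test scenario: options for the writer and for the reader.
final class CoreFlowScenario {
    final String indexType;
    final Map<String, String> writerOptions;
    final Map<String, String> readerOptions;

    CoreFlowScenario(String indexType, boolean metadataEnabled) {
        this.indexType = indexType;
        String mdt = Boolean.toString(metadataEnabled);
        // The same metadata-table flag goes to both sides, so a scenario never mixes
        // an MDT-enabled writer with an MDT-disabled reader (or vice versa).
        this.writerOptions = Map.of("hoodie.index.type", indexType, "hoodie.metadata.enable", mdt);
        this.readerOptions = Map.of("hoodie.metadata.enable", mdt);
    }
}

final class CoreFlowScenarios {
    // Cross product of the two global index types named above and the MDT flag.
    static List<CoreFlowScenario> all() {
        List<CoreFlowScenario> scenarios = new ArrayList<>();
        for (String indexType : List.of("GLOBAL_SIMPLE", "GLOBAL_BLOOM")) {
            for (boolean metadataEnabled : new boolean[] {false, true}) {
                scenarios.add(new CoreFlowScenario(indexType, metadataEnabled));
            }
        }
        return scenarios;
    }
}
```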
   
   ### Impact
   
   Manual Spark SQL tests (they are not run by CI).
   
   ### Risk level
   
   None. Test only.
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-6835) Adjust spark sql core flow test scenarios

2023-09-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-6835:
-
Description: 
- Adjust the scenarios to cover GLOBAL_SIMPLE and GLOBAL_BLOOM cases.
- For MDT disabled/enabled case, disable/enable both writer and reader

> Adjust spark sql core flow test scenarios
> -
>
> Key: HUDI-6835
> URL: https://issues.apache.org/jira/browse/HUDI-6835
> Project: Apache Hudi
>  Issue Type: Test
>  Components: spark-sql, tests-ci
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.14.0
>
>
> - Adjust the scenarios to cover GLOBAL_SIMPLE and GLOBAL_BLOOM cases.
> - For MDT disabled/enabled case, disable/enable both writer and reader





[jira] [Created] (HUDI-6835) Adjust spark sql core flow test scenarios

2023-09-08 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-6835:


 Summary: Adjust spark sql core flow test scenarios
 Key: HUDI-6835
 URL: https://issues.apache.org/jira/browse/HUDI-6835
 Project: Apache Hudi
  Issue Type: Test
  Components: spark-sql, tests-ci
Reporter: Raymond Xu
 Fix For: 0.14.0








[jira] [Updated] (HUDI-6478) Simplify INSERT_INTO configs

2023-09-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-6478:
-
Fix Version/s: 0.14.0

> Simplify INSERT_INTO configs
> 
>
> Key: HUDI-6478
> URL: https://issues.apache.org/jira/browse/HUDI-6478
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> We have two or three different configs in the mix for the INSERT_INTO command.
> Let's try to simplify them.
>  
> hoodie.sql.insert.mode, drop dups, hoodie.sql.bulk.insert.enable and 
> datasource.operation.type.
>  
> Rough notes:
>  
> hoodie.sql.bulk.insert.enable: true | false.
>  
> hoodie.sql.insert.mode: STRICT | NON_STRICT | UPSERT
> STRICT: the same record can't be ingested again; an exception is thrown if
> duplicates of existing records are ingested.
> NON_STRICT: no such constraint, but it has to be set along with bulk_insert (if
> bulk_insert is enabled); otherwise an exception is thrown.
> UPSERT: the default insert.mode (until a week ago, when we switched to making
> bulk_insert the default for INSERT_INTO). It takes care of de-duplication and
> uses OverwriteWithLatestAvroPayload (which means that an existing record can be
> updated across batches).
>  
> datasource.operation.type: insert, bulk_insert, upsert
>  
> drop.dups: Drop new incoming records if they already exist.
>  
> Proposal:
>  
>  * We will introduce a new config named "hoodie.sql.write.operation" which 
> will have 3 values ("insert", "bulk_insert" and "upsert"). Default value will 
> be "insert" for INSERT_INTO.
>  ** Deprecate hoodie.sql.insert.mode and "hoodie.sql.bulk.insert.enable".
>  * Also, enable "hoodie.merge.allow.duplicate.on.inserts" = true if operation 
> type is "Insert" for both spark-sql and spark-ds. This will maintain 
> duplicates but still help w/ small file management with "insert"s.
>  * Introduce a new config named "hoodie.datasource.insert.dedupe.policy"
> whose valid values are "ignore", "fail" and "drop". Make "ignore" the default.
> "fail" will mimic the "STRICT" mode we support as of now. Even spark-ds users
> can use the fail/STRICT behavior if need be.
>  ** Deprecate hoodie.datasource.insert.drop.dups.
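A minimal Java sketch of how the three proposed dedupe policies could behave on incoming record keys, assuming the keys already in the table are known up front. `DedupePolicySketch` is hypothetical; the proposal only names the values ignore, fail and drop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed "hoodie.datasource.insert.dedupe.policy" values.
final class DedupePolicySketch {
    enum Policy { IGNORE, FAIL, DROP }

    // Decides which incoming record keys get written, given the keys already in the table.
    // IGNORE keeps duplicates, FAIL mimics the STRICT insert mode, and DROP mimics
    // drop.dups. Duplicates within the incoming batch itself are not handled here.
    static List<String> keysToWrite(Policy policy, List<String> incomingKeys, Set<String> existingKeys) {
        List<String> toWrite = new ArrayList<>();
        for (String key : incomingKeys) {
            boolean alreadyExists = existingKeys.contains(key);
            if (!alreadyExists || policy == Policy.IGNORE) {
                toWrite.add(key);
            } else if (policy == Policy.FAIL) {
                throw new IllegalStateException("Record key already exists: " + key);
            }
            // Policy.DROP: the incoming duplicate is skipped.
        }
        return toWrite;
    }
}
```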





[GitHub] [hudi] hudi-bot commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9611:
URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712286204

   
   ## CI report:
   
   * e86aa76be3833bbcec446d3ed65e05d4b7a94049 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19759)
 
   
   





[GitHub] [hudi] linliu-code commented on pull request #9572: [HUDI-6702] Remove unnecessary calls of `getInsertValue` api from HoodieRecordPayload

2023-09-08 Thread via GitHub


linliu-code commented on PR #9572:
URL: https://github.com/apache/hudi/pull/9572#issuecomment-1712275348

   This issue should already have been fixed together with PR:
https://github.com/apache/hudi/pull/9593





[GitHub] [hudi] linliu-code closed pull request #9572: [HUDI-6702] Remove unnecessary calls of `getInsertValue` api from HoodieRecordPayload

2023-09-08 Thread via GitHub


linliu-code closed pull request #9572: [HUDI-6702] Remove unnecessary calls of 
`getInsertValue` api from HoodieRecordPayload
URL: https://github.com/apache/hudi/pull/9572





[jira] [Created] (HUDI-6834) Time travel query for an instant not in active timeline should throw exception

2023-09-08 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-6834:
-

 Summary: Time travel query for an instant not in active timeline 
should throw exception 
 Key: HUDI-6834
 URL: https://issues.apache.org/jira/browse/HUDI-6834
 Project: Apache Hudi
  Issue Type: Bug
  Components: reader-core
Reporter: sivabalan narayanan


With a timestamp-as-of query, if the requested timestamp has been cleaned or
archived, we should throw an exception, since none of the file slices might be
available to serve it.





[GitHub] [hudi] yihua commented on a diff in pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit

2023-09-08 Thread via GitHub


yihua commented on code in PR #9473:
URL: https://github.com/apache/hudi/pull/9473#discussion_r1320360360


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java:
##
@@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP
   }
 });
 
-String previousInstantTime = beginInstantTime;
+String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP;

Review Comment:
   I think there's still a hole here.  If `beginInstantTime` is the first 
commit in the active timeline and there are archived commits, 
`previousInstantTime` should not be set to `DEFAULT_BEGIN_TIMESTAMP`?  Could 
you check this case?






[GitHub] [hudi] yihua commented on a diff in pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit

2023-09-08 Thread via GitHub


yihua commented on code in PR #9473:
URL: https://github.com/apache/hudi/pull/9473#discussion_r1320357906


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java:
##
@@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP
   }
 });
 

Review Comment:
   ```suggestion
   
   // When `beginInstantTime` is `DEFAULT_BEGIN_TIMESTAMP` (due to missing checkpoint),
   // `previousInstantTime` is set to `DEFAULT_BEGIN_TIMESTAMP` as well.
   ```



##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java:
##
@@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP
   }
 });
 
-String previousInstantTime = beginInstantTime;
+String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP;
 if (!beginInstantTime.equals(DEFAULT_BEGIN_TIMESTAMP)) {

Review Comment:
   ```suggestion
   // When `beginInstantTime` is present, `previousInstantTime` is set to the completed
   // commit before `beginInstantTime` if that exists.  If there is no completed commit
   // before `beginInstantTime`, e.g., `beginInstantTime` is the first commit in the
   // active timeline, `previousInstantTime` is set to `DEFAULT_BEGIN_TIMESTAMP`.
   if (!beginInstantTime.equals(DEFAULT_BEGIN_TIMESTAMP)) {
   ```
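   A minimal Java sketch of the rule spelled out in the suggested comment, assuming the completed instants of the active timeline are given as an ascending list of timestamp strings. `PreviousInstantSketch` is hypothetical, and the `"000"` used for `DEFAULT_BEGIN_TIMESTAMP` is only a placeholder, not the real constant's value.

```java
import java.util.List;

// Hypothetical sketch of how `previousInstantTime` is chosen, per the suggested comment.
final class PreviousInstantSketch {
    // Illustrative placeholder; the real DEFAULT_BEGIN_TIMESTAMP value may differ.
    static final String DEFAULT_BEGIN_TIMESTAMP = "000";

    // completedInstants: completed commits of the active timeline, sorted ascending.
    static String previousInstantTime(String beginInstantTime, List<String> completedInstants) {
        // Missing checkpoint: previous uses the default as well.
        if (beginInstantTime.equals(DEFAULT_BEGIN_TIMESTAMP)) {
            return DEFAULT_BEGIN_TIMESTAMP;
        }
        // Otherwise take the last completed commit strictly before beginInstantTime,
        // falling back to the default when beginInstantTime is the first active commit.
        String previous = DEFAULT_BEGIN_TIMESTAMP;
        for (String instant : completedInstants) {
            if (instant.compareTo(beginInstantTime) >= 0) {
                break;
            }
            previous = instant;
        }
        return previous;
    }
}
```

   Because the sketch only inspects the active timeline, the last case falls back to `DEFAULT_BEGIN_TIMESTAMP` even when older commits were archived, which is the gap raised about archived commits in the review comment on this same hunk.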



##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java:
##
@@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP
   }
 });
 
-String previousInstantTime = beginInstantTime;
+String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP;

Review Comment:
   Have you also gone through the logic for
`MissingCheckpointStrategy.READ_LATEST` to see if there's any gap?



##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java:
##
@@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP
   }
 });
 
-String previousInstantTime = beginInstantTime;
+String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP;

Review Comment:
   Basically, the only change here is when `beginInstantTime === `, the `previousInstantTime` is `DEFAULT_BEGIN_TIMESTAMP` now whereas 
`` before this PR.






[GitHub] [hudi] hudi-bot commented on pull request #9661: [HUDI-6820] Fixing CI stability issues

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9661:
URL: https://github.com/apache/hudi/pull/9661#issuecomment-1712249218

   
   ## CI report:
   
   * 1932225fb7c7c1d6b9518fbd3b455050002ccc63 UNKNOWN
   * ef9d12359f8d84c024812c262843e0a6699c4540 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19760)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19758)
 
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9473:
URL: https://github.com/apache/hudi/pull/9473#issuecomment-1712248462

   
   ## CI report:
   
   * 9498e0dc2757dee0736d9500ddf2c7e9ff64cca5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19537)
 
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9661: [HUDI-6820] Fixing CI stability issues

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9661:
URL: https://github.com/apache/hudi/pull/9661#issuecomment-1712240913

   
   ## CI report:
   
   * 1932225fb7c7c1d6b9518fbd3b455050002ccc63 UNKNOWN
   * e0ec295431ea2d1004b8540179d8d4815519e6f7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19754)
 
   * ef9d12359f8d84c024812c262843e0a6699c4540 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19758)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19760)
 
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9611:
URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712240456

   
   ## CI report:
   
   * 5620dc41d25ce155b3c57d083880e9f0e697ce9c Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19756)
 
   * e86aa76be3833bbcec446d3ed65e05d4b7a94049 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19759)
 
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9473:
URL: https://github.com/apache/hudi/pull/9473#issuecomment-1712239965

   
   ## CI report:
   
   * 9498e0dc2757dee0736d9500ddf2c7e9ff64cca5 UNKNOWN
   
   





[GitHub] [hudi] nsivabalan commented on pull request #9661: [HUDI-6820] Fixing CI stability issues

2023-09-08 Thread via GitHub


nsivabalan commented on PR #9661:
URL: https://github.com/apache/hudi/pull/9661#issuecomment-1712236128

   @hudi-bot run azure





[GitHub] [hudi] yihua commented on a diff in pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit

2023-09-08 Thread via GitHub


yihua commented on code in PR #9473:
URL: https://github.com/apache/hudi/pull/9473#discussion_r1320330466


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java:
##
@@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP
   }
 });
 
-String previousInstantTime = beginInstantTime;
+String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP;

Review Comment:
   OK.  Could you add some docs here on the different scenarios, i.e., how the
`previous`, `start`, and `end` instants for `QueryInfo` are decided?






[GitHub] [hudi] hudi-bot commented on pull request #9661: [HUDI-6820] Fixing CI stability issues

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9661:
URL: https://github.com/apache/hudi/pull/9661#issuecomment-1712199593

   
   ## CI report:
   
   * 1932225fb7c7c1d6b9518fbd3b455050002ccc63 UNKNOWN
   * 41e8307d90997eddcd62a7215825be48cd54919b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19750)
 
   * e0ec295431ea2d1004b8540179d8d4815519e6f7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19754)
 
   * ef9d12359f8d84c024812c262843e0a6699c4540 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




[GitHub] [hudi] hudi-bot commented on pull request #9611: [HUDI-6758] Fixing deducing spurious log blocks due to spark retries

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9611:
URL: https://github.com/apache/hudi/pull/9611#issuecomment-1712199208

   
   ## CI report:
   
   * f2cc702a15efade40dfd14402c9e3d87f311054f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19694)
 
   * 5620dc41d25ce155b3c57d083880e9f0e697ce9c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19756)
 
   * e86aa76be3833bbcec446d3ed65e05d4b7a94049 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




[GitHub] [hudi] jmnatzaganian opened a new issue, #9663: [SUPPORT] Flink MOR only creates log files

2023-09-08 Thread via GitHub


jmnatzaganian opened a new issue, #9663:
URL: https://github.com/apache/hudi/issues/9663

   **Describe the problem you faced**
   
   When using Flink with an MOR table, only log files are created. This is 
different from Spark, which only uses log files when changes to a file group 
occur. Log files are not optimized for large payloads, which results in a costly 
performance hit when flushing the data. The end result is a pipeline that 
quickly buffers data, blocks while converting the data, and then flushes. For 
base files, the heavy lifting is passed to the Parquet writer, which 
automatically addresses these complications by incrementally processing data at 
the row group level. In that situation, the expected CPU usage is relatively 
flat, since serialization and compression occur in parallel with reads.
   
   Over-provisioning helps to a degree, but is limited by the batch size. 
Reducing the batch size helps significantly but results in too many small 
files. Compaction can help with this, but it adds a large cost in S3 I/O and 
in the compaction job itself.
   
   **To Reproduce**
   
   See the attached file, 
[Demo.java](https://github.com/apache/hudi/files/12563405/Demo.java.txt), from 
@kzdravkov. This is a self-contained Flink example illustrating the behavior.
   
   **Expected behavior**
   
   See the attached file 
[hudi_spark_mor.py](https://github.com/apache/hudi/files/12563406/hudi_spark_mor.py.txt). 
This is a self-contained PySpark example that shows the expected behavior. In 
short, whenever inserts occur, a base file is expected to be created. The file 
bin-packing logic is additionally expected to apply as appropriate.
   
   **Environment Description**
   
   * Hudi version: 0.13.1
   * Spark version: 3.1.1
   * Flink version: 1.13.1
   * Hive version: N/A
   * Hadoop version: 
   * Storage (HDFS/S3/GCS..): Local, but S3 in prod
   * Running on Docker? (yes/no): No.
   
   Flink is running in k8s. Data is in S3. Catalog uses Glue. For the purposes 
of this issue, local is sufficient for demonstration.
   
   **Additional context**
   
   @kzdravkov initially started this discussion in Slack 
[here](https://apache-hudi.slack.com/archives/C01ULJQCXJ5/p1693575957361609) 
with @danny0405 and others. The thread has some additional details about our 
use case, but I'll include some important notes below:
   
   The data volume is modest at 50-100k rps with 100 mb/s of uncompressed data. 
We are trying to migrate an existing Flink job that writes a basic Parquet table 
to Hudi. The job is insert-heavy but does have upserts, so the operation needs to 
be set to `upsert`. Our data model requires a global index to ensure we don't 
have duplicates (the record --> partition mapping can change). This has been 
mitigated by setting the index type to `FLINK_STATE`, which acts as a 
pseudo-global index and addresses most of the duplicates. All of the table 
services are run independently via Spark (currently Glue jobs).
   
   Given that this is blocking a rollout, we are planning to test our job with 
COW to see if it's good enough, and then migrate to MOR once this is addressed.
   
   For anyone curious about the log file overhead, please see the Slack thread 
above. I believe this is also an issue, but assuming log blocks are kept small, 
it's minor and likely not worth the effort. I'd like to exclude it from this 
issue, since it's a separate and secondary one.




[GitHub] [hudi] hudi-bot commented on pull request #9662: [HUDI-6820] Set timeout for TestHoodieDeltaStreamer continuous mode tests

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9662:
URL: https://github.com/apache/hudi/pull/9662#issuecomment-1712190098

   
   ## CI report:
   
   * a297a4dd27cda4491ea6dc79afc57c979feaefcf Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19751)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




[GitHub] [hudi] hudi-bot commented on pull request #9618: [HUDI-6753] Fix parquet inline reading flaky test

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9618:
URL: https://github.com/apache/hudi/pull/9618#issuecomment-1712189655

   
   ## CI report:
   
   * ad2e4fda3ffb21219ba4101fdc7c331572524a83 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19656)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19753)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9482:
URL: https://github.com/apache/hudi/pull/9482#issuecomment-1712189148

   
   ## CI report:
   
   * bd99a6ce89b2ab0946f9409c7caf493c95d3befa Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19748)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




[GitHub] [hudi] hudi-bot commented on pull request #9538: [HUDI-6738] - Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource

2023-09-08 Thread via GitHub


hudi-bot commented on PR #9538:
URL: https://github.com/apache/hudi/pull/9538#issuecomment-1712189232

   
   ## CI report:
   
   * e23804783126d93786c26ba43c3ec8f003bb977e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19755)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



