[GitHub] [hudi] hudi-bot commented on pull request #4925: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single sink table from multiple source tables

2022-03-30 Thread GitBox


hudi-bot commented on pull request #4925:
URL: https://github.com/apache/hudi/pull/4925#issuecomment-1084164580


   
   ## CI report:
   
   * 333da7447af7d602ffa3067a759cecc62e4365d8 UNKNOWN
   * f7d371bcdc435fb2ca6738c706233e387fa37e1a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7601)
   * 2390118e95cdd87eb191ea4c736b0479bee9aae4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4925: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single sink table from multiple source tables

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #4925:
URL: https://github.com/apache/hudi/pull/4925#issuecomment-1084104818


   
   ## CI report:
   
   * 333da7447af7d602ffa3067a759cecc62e4365d8 UNKNOWN
   * f7d371bcdc435fb2ca6738c706233e387fa37e1a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7601)
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5181: [HUDI-3664] Fixing Column Stats Index composition

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5181:
URL: https://github.com/apache/hudi/pull/5181#issuecomment-1084162549


   
   ## CI report:
   
   * cae55044a706a9cc245fabf280f1b201718d79ff UNKNOWN
   * b5721e004c19e7b24ad70696f66a9860e481fb72 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7587)
   * 63f6ef63329e3b727e91fec5c4caab76b3688485 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #5181: [HUDI-3664] Fixing Column Stats Index composition

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5181:
URL: https://github.com/apache/hudi/pull/5181#issuecomment-1083954447


   
   ## CI report:
   
   * cae55044a706a9cc245fabf280f1b201718d79ff UNKNOWN
   * b5721e004c19e7b24ad70696f66a9860e481fb72 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7587)
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084162447


   
   ## CI report:
   
   * ff55dc8ddd8a3f167ab11ab3d454841f8dffc389 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7618)
   * 4999bc9229273673625abb31ed7826b05f58722d UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084160398


   
   ## CI report:
   
   * 7580e998c628309f0e37a292a4cbacbc83cad438 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7613)
   * ff55dc8ddd8a3f167ab11ab3d454841f8dffc389 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7618)
   * 4999bc9229273673625abb31ed7826b05f58722d UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084160398


   
   ## CI report:
   
   * 7580e998c628309f0e37a292a4cbacbc83cad438 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7613)
   * ff55dc8ddd8a3f167ab11ab3d454841f8dffc389 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7618)
   * 4999bc9229273673625abb31ed7826b05f58722d UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084157940


   
   ## CI report:
   
   * 7580e998c628309f0e37a292a4cbacbc83cad438 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7613)
   * ff55dc8ddd8a3f167ab11ab3d454841f8dffc389 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7618)
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084157940


   
   ## CI report:
   
   * 7580e998c628309f0e37a292a4cbacbc83cad438 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7613)
   * ff55dc8ddd8a3f167ab11ab3d454841f8dffc389 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7618)
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084101901


   
   ## CI report:
   
   * 7580e998c628309f0e37a292a4cbacbc83cad438 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7613)
   * ff55dc8ddd8a3f167ab11ab3d454841f8dffc389 UNKNOWN
   
   




[GitHub] [hudi] KnightChess commented on pull request #5134: [HUDI-3750] Fix NPE when build HoodieFileIndex

2022-03-30 Thread GitBox


KnightChess commented on pull request #5134:
URL: https://github.com/apache/hudi/pull/5134#issuecomment-1084153004


   @yihua ok, thanks :)






[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1084145291


   
   ## CI report:
   
   * 52be34d7d5e025180415c46e64a3e2145c29e498 UNKNOWN
   * 78e86dd1953cc4d6bf10ca808a7bcffe22b4b587 UNKNOWN
   * fa9cee18b16f1b11ed039a9da3c490f017775e8d UNKNOWN
   * 60d9cf848b623c27078e1d0b9db329eb8f4cec94 UNKNOWN
   * 9729597c54733bc6518b14418bdbe1cf7febb80b UNKNOWN
   * a543ce26adebe58f6f0954a54524a4cb393c0a0c UNKNOWN
   * f7a1729d2a9529f03b1f3d259b1b1ba4920f137e UNKNOWN
   * 1816108f1144b1c918988022fc7147fbe7bb8f9d UNKNOWN
   * 85cc0c336425a2b1c70694a5f4222c63f98fc3e5 UNKNOWN
   * c41514d513eb6adc831ea580d48a65fd77f49da6 UNKNOWN
   * 52b0671b08edd5d21053b4210e9001e11a7cca01 UNKNOWN
   * d9cc545cf661d7e3adc886ef70542e37426eee0d UNKNOWN
   * 4096466ae627f1c4ca46cae8899507c801e71d1f UNKNOWN
   * 222bf09f67a09c3e30d57796cf435e9a2c594831 UNKNOWN
   * ce6743ba070142bde59f8eaac5b911e6339c2212 UNKNOWN
   * 7ff8b85d83908184608cb59e67fb9236fcad26ea UNKNOWN
   * b3d94a15f601a19a94beda555618eb8e5ea66e33 UNKNOWN
   * 7c80ff0e9ca20acc155b1ea631df9e5efe828adf UNKNOWN
   * 5c147a6a52c3f118a154c9b36c09b9f245dd20e2 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7612)
   * 54aa92b9ddba8d567ea3b593700601460bb6ea53 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-03-30 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1084149764


   
   ## CI report:
   
   * 52be34d7d5e025180415c46e64a3e2145c29e498 UNKNOWN
   * 78e86dd1953cc4d6bf10ca808a7bcffe22b4b587 UNKNOWN
   * fa9cee18b16f1b11ed039a9da3c490f017775e8d UNKNOWN
   * 60d9cf848b623c27078e1d0b9db329eb8f4cec94 UNKNOWN
   * 9729597c54733bc6518b14418bdbe1cf7febb80b UNKNOWN
   * a543ce26adebe58f6f0954a54524a4cb393c0a0c UNKNOWN
   * f7a1729d2a9529f03b1f3d259b1b1ba4920f137e UNKNOWN
   * 1816108f1144b1c918988022fc7147fbe7bb8f9d UNKNOWN
   * 85cc0c336425a2b1c70694a5f4222c63f98fc3e5 UNKNOWN
   * c41514d513eb6adc831ea580d48a65fd77f49da6 UNKNOWN
   * 52b0671b08edd5d21053b4210e9001e11a7cca01 UNKNOWN
   * d9cc545cf661d7e3adc886ef70542e37426eee0d UNKNOWN
   * 4096466ae627f1c4ca46cae8899507c801e71d1f UNKNOWN
   * 222bf09f67a09c3e30d57796cf435e9a2c594831 UNKNOWN
   * ce6743ba070142bde59f8eaac5b911e6339c2212 UNKNOWN
   * 7ff8b85d83908184608cb59e67fb9236fcad26ea UNKNOWN
   * b3d94a15f601a19a94beda555618eb8e5ea66e33 UNKNOWN
   * 7c80ff0e9ca20acc155b1ea631df9e5efe828adf UNKNOWN
   * 5c147a6a52c3f118a154c9b36c09b9f245dd20e2 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7612)
   * 54aa92b9ddba8d567ea3b593700601460bb6ea53 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7616)
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5157: [HUDI-3732] Fixing rollback validation

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5157:
URL: https://github.com/apache/hudi/pull/5157#issuecomment-1084148086


   
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * 6d049f532a187f840f35851ed3b48a1d7de754b3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7615)
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #5157: [HUDI-3732] Fixing rollback validation

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5157:
URL: https://github.com/apache/hudi/pull/5157#issuecomment-1084145636


   
   ## CI report:
   
   * 768e5fd2a93b912412aa29231e07157823e61148 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7604)
   *  Unknown: [CANCELED](TBD) 
   * 6d049f532a187f840f35851ed3b48a1d7de754b3 UNKNOWN
   
   




[hudi] branch master updated: [HUDI-3692] MetadataFileSystemView includes compaction in timeline (#5110)

2022-03-30 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new ce45f7f  [HUDI-3692] MetadataFileSystemView includes compaction in timeline (#5110)
ce45f7f is described below

commit ce45f7f129e4d83705fc6a2a3dc6cfd7414480e1
Author: Yuwei XIAO 
AuthorDate: Thu Mar 31 14:24:59 2022 +0800

[HUDI-3692] MetadataFileSystemView includes compaction in timeline (#5110)
---
 .../TestHoodieSparkMergeOnReadTableCompaction.java | 56 ++
 .../common/table/view/FileSystemViewManager.java   |  2 +-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java
index f4f47d3..3b30c5b 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java
@@ -21,6 +21,9 @@ package org.apache.hudi.table.functional;
 
 import org.apache.hudi.client.SparkRDDWriteClient;
 import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieTableType;
 import org.apache.hudi.common.model.HoodieWriteStat;
 import org.apache.hudi.common.table.HoodieTableMetaClient;
@@ -43,12 +46,16 @@ import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Tag;
 import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
 
 import java.io.IOException;
 import java.nio.file.Paths;
 import java.util.Arrays;
 import java.util.List;
 import java.util.stream.Collectors;
+import java.util.stream.Stream;
 
 import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA;
 import static org.apache.hudi.config.HoodieWriteConfig.AUTO_COMMIT_ENABLE;
@@ -56,6 +63,17 @@ import static org.apache.hudi.config.HoodieWriteConfig.AUTO_COMMIT_ENABLE;
 @Tag("functional")
 public class TestHoodieSparkMergeOnReadTableCompaction extends SparkClientFunctionalTestHarness {
 
+  private static Stream<Arguments> writeLogTest() {
+    // enable metadata table, enable embedded time line server
+    Object[][] data = new Object[][] {
+        {true, true},
+        {true, false},
+        {false, true},
+        {false, false}
+    };
+    return Stream.of(data).map(Arguments::of);
+  }
+
+
   private HoodieTestDataGenerator dataGen;
   private SparkRDDWriteClient client;
   private HoodieTableMetaClient metaClient;
@@ -104,6 +122,44 @@ public class TestHoodieSparkMergeOnReadTableCompaction extends SparkClientFuncti
     Assertions.assertEquals(300, readTableTotalRecordsNum());
   }
 
+  @ParameterizedTest
+  @MethodSource("writeLogTest")
+  public void testWriteLogDuringCompaction(boolean enableMetadataTable, boolean enableTimelineServer) throws IOException {
+    HoodieWriteConfig config = HoodieWriteConfig.newBuilder()
+        .forTable("test-trip-table")
+        .withPath(basePath())
+        .withSchema(TRIP_EXAMPLE_SCHEMA)
+        .withParallelism(2, 2)
+        .withAutoCommit(true)
+        .withEmbeddedTimelineServerEnabled(enableTimelineServer)
+        .withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(enableMetadataTable).build())
+        .withCompactionConfig(HoodieCompactionConfig.newBuilder()
+            .withMaxNumDeltaCommitsBeforeCompaction(1).build())
+        .withLayoutConfig(HoodieLayoutConfig.newBuilder()
+            .withLayoutType(HoodieStorageLayout.LayoutType.BUCKET.name())
+            .withLayoutPartitioner(SparkBucketIndexPartitioner.class.getName()).build())
+        .withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BUCKET).withBucketNum("1").build()).build();
+    metaClient = getHoodieMetaClient(HoodieTableType.MERGE_ON_READ, config.getProps());
+    client = getHoodieWriteClient(config);
+
+    final List<HoodieRecord> records = dataGen.generateInserts("001", 100);
+    JavaRDD<HoodieRecord> writeRecords = jsc().parallelize(records, 2);
+
+    // initialize 100 records
+    client.upsert(writeRecords, client.startCommit());
+    // update 100 records
+    client.upsert(writeRecords, client.startCommit());
+    // schedule compaction
+    client.scheduleCompaction(Option.empty());
+    // delete 50 records
+    List<HoodieKey> toBeDeleted = records.stream().map(HoodieRecord::getKey).limit(50).collec

[GitHub] [hudi] nsivabalan merged pull request #5110: [HUDI-3692] Fix lost log writes during async compaction when using metadata table

2022-03-30 Thread GitBox


nsivabalan merged pull request #5110:
URL: https://github.com/apache/hudi/pull/5110


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #5157: [HUDI-3732] Fixing rollback validation

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5157:
URL: https://github.com/apache/hudi/pull/5157#issuecomment-1083995710


   
   ## CI report:
   
   * c442e8ad60d47462f8c754b47c9f55098217f03b Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7593) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7477)
   * 768e5fd2a93b912412aa29231e07157823e61148 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7604)
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5157: [HUDI-3732] Fixing rollback validation

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5157:
URL: https://github.com/apache/hudi/pull/5157#issuecomment-1084145636


   
   ## CI report:
   
   * 768e5fd2a93b912412aa29231e07157823e61148 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7604)
   *  Unknown: [CANCELED](TBD) 
   * 6d049f532a187f840f35851ed3b48a1d7de754b3 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-03-30 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1084145291


   
   ## CI report:
   
   * 52be34d7d5e025180415c46e64a3e2145c29e498 UNKNOWN
   * 78e86dd1953cc4d6bf10ca808a7bcffe22b4b587 UNKNOWN
   * fa9cee18b16f1b11ed039a9da3c490f017775e8d UNKNOWN
   * 60d9cf848b623c27078e1d0b9db329eb8f4cec94 UNKNOWN
   * 9729597c54733bc6518b14418bdbe1cf7febb80b UNKNOWN
   * a543ce26adebe58f6f0954a54524a4cb393c0a0c UNKNOWN
   * f7a1729d2a9529f03b1f3d259b1b1ba4920f137e UNKNOWN
   * 1816108f1144b1c918988022fc7147fbe7bb8f9d UNKNOWN
   * 85cc0c336425a2b1c70694a5f4222c63f98fc3e5 UNKNOWN
   * c41514d513eb6adc831ea580d48a65fd77f49da6 UNKNOWN
   * 52b0671b08edd5d21053b4210e9001e11a7cca01 UNKNOWN
   * d9cc545cf661d7e3adc886ef70542e37426eee0d UNKNOWN
   * 4096466ae627f1c4ca46cae8899507c801e71d1f UNKNOWN
   * 222bf09f67a09c3e30d57796cf435e9a2c594831 UNKNOWN
   * ce6743ba070142bde59f8eaac5b911e6339c2212 UNKNOWN
   * 7ff8b85d83908184608cb59e67fb9236fcad26ea UNKNOWN
   * b3d94a15f601a19a94beda555618eb8e5ea66e33 UNKNOWN
   * 7c80ff0e9ca20acc155b1ea631df9e5efe828adf UNKNOWN
   * 5c147a6a52c3f118a154c9b36c09b9f245dd20e2 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7612)
   * 54aa92b9ddba8d567ea3b593700601460bb6ea53 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1084143034


   
   ## CI report:
   
   * 52be34d7d5e025180415c46e64a3e2145c29e498 UNKNOWN
   * 78e86dd1953cc4d6bf10ca808a7bcffe22b4b587 UNKNOWN
   * fa9cee18b16f1b11ed039a9da3c490f017775e8d UNKNOWN
   * 60d9cf848b623c27078e1d0b9db329eb8f4cec94 UNKNOWN
   * 9729597c54733bc6518b14418bdbe1cf7febb80b UNKNOWN
   * a543ce26adebe58f6f0954a54524a4cb393c0a0c UNKNOWN
   * f7a1729d2a9529f03b1f3d259b1b1ba4920f137e UNKNOWN
   * 1816108f1144b1c918988022fc7147fbe7bb8f9d UNKNOWN
   * 85cc0c336425a2b1c70694a5f4222c63f98fc3e5 UNKNOWN
   * c41514d513eb6adc831ea580d48a65fd77f49da6 UNKNOWN
   * 52b0671b08edd5d21053b4210e9001e11a7cca01 UNKNOWN
   * d9cc545cf661d7e3adc886ef70542e37426eee0d UNKNOWN
   * 4096466ae627f1c4ca46cae8899507c801e71d1f UNKNOWN
   * 222bf09f67a09c3e30d57796cf435e9a2c594831 UNKNOWN
   * ce6743ba070142bde59f8eaac5b911e6339c2212 UNKNOWN
   * 7ff8b85d83908184608cb59e67fb9236fcad26ea UNKNOWN
   * b3d94a15f601a19a94beda555618eb8e5ea66e33 UNKNOWN
   * 7c80ff0e9ca20acc155b1ea631df9e5efe828adf UNKNOWN
   * 54f7f1cbb353fa5a14b1d0c97dde3cb5c7f48f38 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7541)
   * 5c147a6a52c3f118a154c9b36c09b9f245dd20e2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7612)
   * 54aa92b9ddba8d567ea3b593700601460bb6ea53 UNKNOWN
   
   




[GitHub] [hudi] nsivabalan commented on pull request #5157: [HUDI-3732] Fixing rollback validation

2022-03-30 Thread GitBox


nsivabalan commented on pull request #5157:
URL: https://github.com/apache/hudi/pull/5157#issuecomment-1084143423


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1084090687


   
   ## CI report:
   
   * 52be34d7d5e025180415c46e64a3e2145c29e498 UNKNOWN
   * 78e86dd1953cc4d6bf10ca808a7bcffe22b4b587 UNKNOWN
   * fa9cee18b16f1b11ed039a9da3c490f017775e8d UNKNOWN
   * 60d9cf848b623c27078e1d0b9db329eb8f4cec94 UNKNOWN
   * 9729597c54733bc6518b14418bdbe1cf7febb80b UNKNOWN
   * a543ce26adebe58f6f0954a54524a4cb393c0a0c UNKNOWN
   * f7a1729d2a9529f03b1f3d259b1b1ba4920f137e UNKNOWN
   * 1816108f1144b1c918988022fc7147fbe7bb8f9d UNKNOWN
   * 85cc0c336425a2b1c70694a5f4222c63f98fc3e5 UNKNOWN
   * c41514d513eb6adc831ea580d48a65fd77f49da6 UNKNOWN
   * 52b0671b08edd5d21053b4210e9001e11a7cca01 UNKNOWN
   * d9cc545cf661d7e3adc886ef70542e37426eee0d UNKNOWN
   * 4096466ae627f1c4ca46cae8899507c801e71d1f UNKNOWN
   * 222bf09f67a09c3e30d57796cf435e9a2c594831 UNKNOWN
   * ce6743ba070142bde59f8eaac5b911e6339c2212 UNKNOWN
   * 7ff8b85d83908184608cb59e67fb9236fcad26ea UNKNOWN
   * b3d94a15f601a19a94beda555618eb8e5ea66e33 UNKNOWN
   * 7c80ff0e9ca20acc155b1ea631df9e5efe828adf UNKNOWN
   * 54f7f1cbb353fa5a14b1d0c97dde3cb5c7f48f38 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7541)
   * 5c147a6a52c3f118a154c9b36c09b9f245dd20e2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7612)
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-03-30 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1084143034


   
   ## CI report:
   
   * 52be34d7d5e025180415c46e64a3e2145c29e498 UNKNOWN
   * 78e86dd1953cc4d6bf10ca808a7bcffe22b4b587 UNKNOWN
   * fa9cee18b16f1b11ed039a9da3c490f017775e8d UNKNOWN
   * 60d9cf848b623c27078e1d0b9db329eb8f4cec94 UNKNOWN
   * 9729597c54733bc6518b14418bdbe1cf7febb80b UNKNOWN
   * a543ce26adebe58f6f0954a54524a4cb393c0a0c UNKNOWN
   * f7a1729d2a9529f03b1f3d259b1b1ba4920f137e UNKNOWN
   * 1816108f1144b1c918988022fc7147fbe7bb8f9d UNKNOWN
   * 85cc0c336425a2b1c70694a5f4222c63f98fc3e5 UNKNOWN
   * c41514d513eb6adc831ea580d48a65fd77f49da6 UNKNOWN
   * 52b0671b08edd5d21053b4210e9001e11a7cca01 UNKNOWN
   * d9cc545cf661d7e3adc886ef70542e37426eee0d UNKNOWN
   * 4096466ae627f1c4ca46cae8899507c801e71d1f UNKNOWN
   * 222bf09f67a09c3e30d57796cf435e9a2c594831 UNKNOWN
   * ce6743ba070142bde59f8eaac5b911e6339c2212 UNKNOWN
   * 7ff8b85d83908184608cb59e67fb9236fcad26ea UNKNOWN
   * b3d94a15f601a19a94beda555618eb8e5ea66e33 UNKNOWN
   * 7c80ff0e9ca20acc155b1ea631df9e5efe828adf UNKNOWN
   * 54f7f1cbb353fa5a14b1d0c97dde3cb5c7f48f38 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7541)
   * 5c147a6a52c3f118a154c9b36c09b9f245dd20e2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7612)
   * 54aa92b9ddba8d567ea3b593700601460bb6ea53 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Created] (HUDI-3756) Clean up indexing APIs in write client

2022-03-30 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-3756:
-

 Summary: Clean up indexing APIs in write client
 Key: HUDI-3756
 URL: https://issues.apache.org/jira/browse/HUDI-3756
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit


See HUDI-3755 for more details.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-3753) Support a CLI command to delete any metadata partition and clean index commits from timeline

2022-03-30 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-3753:
-

Assignee: (was: Sagar Sumit)

> Support a CLI command to delete any metadata partition and clean index  
> commits from timeline
> -
>
> Key: HUDI-3753
> URL: https://issues.apache.org/jira/browse/HUDI-3753
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Critical
>






[jira] [Assigned] (HUDI-3754) Cover all failure scenarios in indexer unit tests

2022-03-30 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-3754:
-

Assignee: (was: Sagar Sumit)

> Cover all failure scenarios in indexer unit tests
> -
>
> Key: HUDI-3754
> URL: https://issues.apache.org/jira/browse/HUDI-3754
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Critical
>






[jira] [Created] (HUDI-3755) Change the index plan to capture everything that is needed to create index

2022-03-30 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-3755:
-

 Summary: Change the index plan to capture everything that is needed to create index
 Key: HUDI-3755
 URL: https://issues.apache.org/jira/browse/HUDI-3755
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit


Currently, we capture the MetadataPartitionType and the base instant up to which 
there are no holes in the timeline. We can also write other options to the plan, 
e.g. the columns that are going to be indexed, the catch-up start instant, etc. 
This will make it easier to reindex: just read the plan, map it to configs, and 
start indexing. This also makes the write client APIs cleaner.
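
The idea above could be sketched roughly as follows. This is only an illustrative sketch, not Hudi's actual index plan: the class `IndexPlanSketch`, its field names, and the `sketch.index.*` config keys are all hypothetical, and the real plan would presumably be persisted on the timeline rather than held in a plain Java object.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical index plan that records everything needed to (re)build an index.
 * Not part of Hudi; names and config keys are made up for illustration.
 */
final class IndexPlanSketch {
  final String metadataPartitionType; // which metadata partition to index
  final String baseInstant;           // instant up to which the timeline has no holes
  final List<String> indexedColumns;  // columns that are going to be indexed
  final String catchupStartInstant;   // instant from which catch-up should start

  IndexPlanSketch(String metadataPartitionType, String baseInstant,
                  List<String> indexedColumns, String catchupStartInstant) {
    this.metadataPartitionType = metadataPartitionType;
    this.baseInstant = baseInstant;
    this.indexedColumns = indexedColumns;
    this.catchupStartInstant = catchupStartInstant;
  }

  /** Maps the plan to configs so that reindexing only needs the plan itself. */
  Properties toConfigs() {
    Properties props = new Properties();
    props.setProperty("sketch.index.partition.type", metadataPartitionType);
    props.setProperty("sketch.index.base.instant", baseInstant);
    props.setProperty("sketch.index.columns", String.join(",", indexedColumns));
    props.setProperty("sketch.index.catchup.start.instant", catchupStartInstant);
    return props;
  }

  public static void main(String[] args) {
    // Placeholder values; real instants and partition types come from the table.
    IndexPlanSketch plan = new IndexPlanSketch(
        "COLUMN_STATS", "001", Arrays.asList("rider", "driver"), "002");
    System.out.println(plan.toConfigs().getProperty("sketch.index.columns"));
  }
}
```

With a plan of this shape, a reindex path would only need to read the plan and call something like `toConfigs()`, so the write client would not need separate parameters for each option.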





[jira] [Updated] (HUDI-3755) Change the index plan to capture everything that is needed to create index

2022-03-30 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-3755:
--
Priority: Critical  (was: Major)

> Change the index plan to capture everything that is needed to create index
> --
>
> Key: HUDI-3755
> URL: https://issues.apache.org/jira/browse/HUDI-3755
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Critical
>
> Currently, we capture the MetadataPartitionType and the base instant up to which 
> there are no holes in the timeline. We can also write other options to the plan, 
> e.g. the columns that are going to be indexed, the catch-up start instant, etc. 
> This will make it easier to reindex: just read the plan, map it to configs, and 
> start indexing. This also makes the write client APIs cleaner.





[GitHub] [hudi] YuweiXiao commented on a change in pull request #5110: [HUDI-3692] Fix lost log writes during async compaction when using metadata table

2022-03-30 Thread GitBox


YuweiXiao commented on a change in pull request #5110:
URL: https://github.com/apache/hudi/pull/5110#discussion_r839219512



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java
##
@@ -104,6 +122,44 @@ public void testWriteDuringCompaction() throws IOException 
{
 Assertions.assertEquals(300, readTableTotalRecordsNum());
   }
 
+  @ParameterizedTest
+  @MethodSource("writeLogTest")
+  public void testWriteLogDuringCompaction(boolean enableMetadataTable, 
boolean enableTimelineServer) throws IOException {
+HoodieWriteConfig config = HoodieWriteConfig.newBuilder()
+.forTable("test-trip-table")
+.withPath(basePath())
+.withSchema(TRIP_EXAMPLE_SCHEMA)
+.withParallelism(2, 2)
+.withAutoCommit(true)
+.withEmbeddedTimelineServerEnabled(enableTimelineServer)
+
.withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(enableMetadataTable).build())
+.withCompactionConfig(HoodieCompactionConfig.newBuilder()
+.withMaxNumDeltaCommitsBeforeCompaction(1).build())
+.withLayoutConfig(HoodieLayoutConfig.newBuilder()
+.withLayoutType(HoodieStorageLayout.LayoutType.BUCKET.name())
+
.withLayoutPartitioner(SparkBucketIndexPartitioner.class.getName()).build())
+
.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BUCKET).withBucketNum("1").build()).build();
+metaClient = getHoodieMetaClient(HoodieTableType.MERGE_ON_READ, 
config.getProps());
+client = getHoodieWriteClient(config);
+
+final List records = dataGen.generateInserts("001", 100);
+JavaRDD writeRecords = jsc().parallelize(records, 2);
+
+// initialize 100 records
+client.upsert(writeRecords, client.startCommit());
+// update 100 records
+client.upsert(writeRecords, client.startCommit());
+// schedule compaction
+client.scheduleCompaction(Option.empty());
+// delete 50 records
+List toBeDeleted = 
records.stream().map(HoodieRecord::getKey).limit(50).collect(Collectors.toList());
+JavaRDD deleteRecords = jsc().parallelize(toBeDeleted, 2);
+client.delete(deleteRecords, client.startCommit());
+// insert the same 100 records again
+client.upsert(writeRecords, client.startCommit());
+Assertions.assertEquals(100, readTableTotalRecordsNum());

Review comment:
   Yes, it will fail








[GitHub] [hudi] nsivabalan commented on a change in pull request #5110: [HUDI-3692] Fix lost log writes during async compaction when using metadata table

2022-03-30 Thread GitBox


nsivabalan commented on a change in pull request #5110:
URL: https://github.com/apache/hudi/pull/5110#discussion_r839217950



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java
##
@@ -104,6 +122,44 @@ public void testWriteDuringCompaction() throws IOException 
{
 Assertions.assertEquals(300, readTableTotalRecordsNum());
   }
 
+  @ParameterizedTest
+  @MethodSource("writeLogTest")
+  public void testWriteLogDuringCompaction(boolean enableMetadataTable, 
boolean enableTimelineServer) throws IOException {
+HoodieWriteConfig config = HoodieWriteConfig.newBuilder()
+.forTable("test-trip-table")
+.withPath(basePath())
+.withSchema(TRIP_EXAMPLE_SCHEMA)
+.withParallelism(2, 2)
+.withAutoCommit(true)
+.withEmbeddedTimelineServerEnabled(enableTimelineServer)
+
.withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(enableMetadataTable).build())
+.withCompactionConfig(HoodieCompactionConfig.newBuilder()
+.withMaxNumDeltaCommitsBeforeCompaction(1).build())
+.withLayoutConfig(HoodieLayoutConfig.newBuilder()
+.withLayoutType(HoodieStorageLayout.LayoutType.BUCKET.name())
+
.withLayoutPartitioner(SparkBucketIndexPartitioner.class.getName()).build())
+
.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BUCKET).withBucketNum("1").build()).build();
+metaClient = getHoodieMetaClient(HoodieTableType.MERGE_ON_READ, 
config.getProps());
+client = getHoodieWriteClient(config);
+
+final List records = dataGen.generateInserts("001", 100);
+JavaRDD writeRecords = jsc().parallelize(records, 2);
+
+// initialize 100 records
+client.upsert(writeRecords, client.startCommit());
+// update 100 records
+client.upsert(writeRecords, client.startCommit());
+// schedule compaction
+client.scheduleCompaction(Option.empty());
+// delete 50 records
+List toBeDeleted = 
records.stream().map(HoodieRecord::getKey).limit(50).collect(Collectors.toList());
+JavaRDD deleteRecords = jsc().parallelize(toBeDeleted, 2);
+client.delete(deleteRecords, client.startCommit());
+// insert the same 100 records again
+client.upsert(writeRecords, client.startCommit());
+Assertions.assertEquals(100, readTableTotalRecordsNum());

Review comment:
   @YuweiXiao: was this test failing without the fix?








[jira] [Created] (HUDI-3754) Cover all failure scenarios in indexer unit tests

2022-03-30 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-3754:
-

 Summary: Cover all failure scenarios in indexer unit tests
 Key: HUDI-3754
 URL: https://issues.apache.org/jira/browse/HUDI-3754
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit
Assignee: Sagar Sumit








[jira] [Created] (HUDI-3753) Support a CLI command to delete any metadata partition and clean index commits from timeline

2022-03-30 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-3753:
-

 Summary: Support a CLI command to delete any metadata partition and clean index commits from timeline
 Key: HUDI-3753
 URL: https://issues.apache.org/jira/browse/HUDI-3753
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit
Assignee: Sagar Sumit








[GitHub] [hudi] codope commented on a change in pull request #4489: [HUDI-3135] Make delete partitions lazy to be executed by the cleaner

2022-03-30 Thread GitBox


codope commented on a change in pull request #4489:
URL: https://github.com/apache/hudi/pull/4489#discussion_r839211271



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##
@@ -350,8 +355,12 @@ public CleanPlanner(HoodieEngineContext context, HoodieTable hoodieT
   }
 }
   }
+  // if there are no valid file groups for the partition, mark it to be deleted
+  if (fileGroups.isEmpty()) {

Review comment:
   +1








[GitHub] [hudi] hudi-bot removed a comment on pull request #4925: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single sink table from multiple source tables

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #4925:
URL: https://github.com/apache/hudi/pull/4925#issuecomment-1083984283


   
   ## CI report:
   
   * 333da7447af7d602ffa3067a759cecc62e4365d8 UNKNOWN
   * a270fcd27ae23cd7da02629fc5e24fcb27837f17 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7569)
   * f7d371bcdc435fb2ca6738c706233e387fa37e1a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7601)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4925: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single sink table from multiple source tables

2022-03-30 Thread GitBox


hudi-bot commented on pull request #4925:
URL: https://github.com/apache/hudi/pull/4925#issuecomment-1084104818


   
   ## CI report:
   
   * 333da7447af7d602ffa3067a759cecc62e4365d8 UNKNOWN
   * f7d371bcdc435fb2ca6738c706233e387fa37e1a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7601)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084101901


   
   ## CI report:
   
   * 7580e998c628309f0e37a292a4cbacbc83cad438 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7613)
   * ff55dc8ddd8a3f167ab11ab3d454841f8dffc389 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084100355


   
   ## CI report:
   
   * b630adc621c074df9b8ada2e5c4435f037484bd2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7313)
   * 7580e998c628309f0e37a292a4cbacbc83cad438 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7613)
   * ff55dc8ddd8a3f167ab11ab3d454841f8dffc389 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084097341


   
   ## CI report:
   
   * b630adc621c074df9b8ada2e5c4435f037484bd2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7313)
   * 7580e998c628309f0e37a292a4cbacbc83cad438 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7613)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084100355


   
   ## CI report:
   
   * b630adc621c074df9b8ada2e5c4435f037484bd2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7313)
   * 7580e998c628309f0e37a292a4cbacbc83cad438 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7613)
   * ff55dc8ddd8a3f167ab11ab3d454841f8dffc389 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[hudi] branch master updated (f6ff95f -> 4569734)

2022-03-30 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from f6ff95f  [MINOR][DOCS] Update hudi-utilities-slim-bundle docs (#5184)
 add 4569734  [HUDI-3713] Guarding archival for multi-writer  (#5138)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/client/BaseHoodieWriteClient.java  |  20 +++--
 .../apache/hudi/client/HoodieTimelineArchiver.java |  29 +-
 .../apache/hudi/client/HoodieFlinkWriteClient.java |   5 +-
 .../apache/hudi/client/HoodieJavaWriteClient.java  |   2 +-
 .../apache/hudi/client/SparkRDDWriteClient.java|   2 +-
 .../apache/hudi/io/TestHoodieTimelineArchiver.java | 100 -
 6 files changed, 138 insertions(+), 20 deletions(-)


[GitHub] [hudi] nsivabalan merged pull request #5138: [HUDI-3713] Guarding archival for multi-writer

2022-03-30 Thread GitBox


nsivabalan merged pull request #5138:
URL: https://github.com/apache/hudi/pull/5138


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084095771


   
   ## CI report:
   
   * b630adc621c074df9b8ada2e5c4435f037484bd2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7313)
   * 7580e998c628309f0e37a292a4cbacbc83cad438 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084097341


   
   ## CI report:
   
   * b630adc621c074df9b8ada2e5c4435f037484bd2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7313)
   * 7580e998c628309f0e37a292a4cbacbc83cad438 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7613)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] vingov commented on a change in pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


vingov commented on a change in pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#discussion_r839193909



##
File path: 
hudi-sync/hudi-bigquery-sync/src/main/java/org/apache/hudi/bigquery/HoodieBigQueryClient.java
##
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.bigquery;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.sync.common.AbstractSyncHoodieClient;
+
+import com.google.cloud.bigquery.BigQuery;
+import com.google.cloud.bigquery.BigQueryException;
+import com.google.cloud.bigquery.BigQueryOptions;
+import com.google.cloud.bigquery.CsvOptions;
+import com.google.cloud.bigquery.ExternalTableDefinition;
+import com.google.cloud.bigquery.Field;
+import com.google.cloud.bigquery.FormatOptions;
+import com.google.cloud.bigquery.HivePartitioningOptions;
+import com.google.cloud.bigquery.Schema;
+import com.google.cloud.bigquery.StandardSQLTypeName;
+import com.google.cloud.bigquery.TableId;
+import com.google.cloud.bigquery.TableInfo;
+import com.google.cloud.bigquery.ViewDefinition;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.parquet.schema.MessageType;
+
+import java.util.List;
+import java.util.Map;
+
+public class HoodieBigQueryClient extends AbstractSyncHoodieClient {
+  private static final Logger LOG = 
LogManager.getLogger(HoodieBigQueryClient.class);
+  private transient BigQuery bigquery;
+
+  public HoodieBigQueryClient(final BigQuerySyncConfig syncConfig, final 
FileSystem fs) {
+super(syncConfig.basePath, syncConfig.assumeDatePartitioning, 
syncConfig.useFileListingFromMetadata,
+false, fs);
+this.createBigQueryConnection();
+  }
+
+  private void createBigQueryConnection() {
+if (bigquery == null) {
+  try {
+// Initialize client that will be used to send requests. This client 
only needs to be created
+// once, and can be reused for multiple requests.
+bigquery = BigQueryOptions.getDefaultInstance().getService();
+LOG.info("Successfully established BigQuery connection.");
+  } catch (BigQueryException e) {
+throw new HoodieException("Cannot create bigQuery connection ", e);
+  }
+}
+  }
+
+  @Override
+  public void createTable(final String tableName, final MessageType 
storageSchema, final String inputFormatClass,
+  final String outputFormatClass, final String 
serdeClass,
+  final Map serdeProperties, final 
Map tableProperties) {
+// bigQuery create table arguments are different, so do nothing.
+  }
+
+  public void createVersionsTable(
+  String projectId, String datasetName, String tableName, String 
sourceUri, String sourceUriPrefix, List partitionFields) {
+try {
+  ExternalTableDefinition customTable;
+  TableId tableId = TableId.of(projectId, datasetName, tableName);
+
+  if (partitionFields != null) {
+// Configuring partitioning options for partitioned table.
+HivePartitioningOptions hivePartitioningOptions =
+HivePartitioningOptions.newBuilder()
+.setMode("AUTO")
+.setRequirePartitionFilter(false)
+.setSourceUriPrefix(sourceUriPrefix)
+.build();
+customTable =
+ExternalTableDefinition.newBuilder(sourceUri, 
FormatOptions.parquet())
+.setAutodetect(true)
+.setHivePartitioningOptions(hivePartitioningOptions)
+.setIgnoreUnknownValues(true)
+.setMaxBadRecords(0)
+.build();
+  } else {
+customTable =
+ExternalTableDefinition.newBuilder(sourceUri, 
FormatOptions.parquet())
+.setAutodetect(true)
+.setIgnoreUnknownValues(true)
+.setMaxBadRecords(0)
+.build();
+  }
+
+  bigquery.create(TableInfo.of(tableId, customTable));
+  LOG.info("External table created using hivepartitioningoptions");
+} catch (BigQueryException e) {
+  throw new HoodieException(

[GitHub] [hudi] vingov commented on a change in pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


vingov commented on a change in pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#discussion_r839193430



##
File path: 
hudi-sync/hudi-bigquery-sync/src/main/java/org/apache/hudi/bigquery/BigQuerySyncTool.java
##
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.bigquery;
+
+import org.apache.hudi.bigquery.util.Utils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.InvalidTableException;
+import org.apache.hudi.sync.common.AbstractSyncTool;
+
+import com.beust.jcommander.JCommander;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.util.List;
+import java.util.Properties;
+
+/**
+ * Tool to sync a hoodie table with a big query table. Either use it as an api
+ * BigQuerySyncTool.syncHoodieTable(BigQuerySyncConfig) or as a command line 
java -cp hoodie-hive.jar BigQuerySyncTool [args]
+ * 
+ * This utility will get the schema from the latest commit and will sync big 
query table schema
+ */
+public class BigQuerySyncTool extends AbstractSyncTool {
+  private static final Logger LOG = 
LogManager.getLogger(BigQuerySyncTool.class);
+  public final BigQuerySyncConfig cfg;
+  public final HoodieBigQueryClient hoodieBigQueryClient;
+  public String projectId;
+  public String datasetName;
+  public String manifestTableName;
+  public String versionsTableName;
+  public String snapshotViewName;
+  public String sourceUri;
+  public String sourceUriPrefix;
+  public List partitionFields;
+
+  private BigQuerySyncTool(Properties properties, Configuration conf, 
FileSystem fs) {
+super(new TypedProperties(properties), conf, fs);
+hoodieBigQueryClient = new 
HoodieBigQueryClient(Utils.propertiesToConfig(properties), fs);
+cfg = Utils.propertiesToConfig(properties);
+switch (hoodieBigQueryClient.getTableType()) {
+  case COPY_ON_WRITE:
+projectId = cfg.projectId;
+datasetName = cfg.datasetName;
+manifestTableName = cfg.tableName + "_manifest";
+versionsTableName = cfg.tableName + "_versions";
+snapshotViewName = cfg.tableName;
+sourceUri = cfg.sourceUri;
+sourceUriPrefix = cfg.sourceUriPrefix;
+partitionFields = cfg.partitionFields;
+break;
+  case MERGE_ON_READ:
+LOG.error("Not supported table type " + 
hoodieBigQueryClient.getTableType());
+throw new InvalidTableException(hoodieBigQueryClient.getBasePath());
+  default:
+LOG.error("Unknown table type " + hoodieBigQueryClient.getTableType());
+throw new InvalidTableException(hoodieBigQueryClient.getBasePath());
+}
+  }
+
+  public static void main(String[] args) {
+// parse the params
+BigQuerySyncConfig cfg = new BigQuerySyncConfig();
+JCommander cmd = new JCommander(cfg, null, args);
+if (cfg.help || args.length == 0) {
+  cmd.usage();
+  System.exit(1);
+}
+FileSystem fs = FSUtils.getFs(cfg.basePath, new Configuration());
+new BigQuerySyncTool(Utils.configToProperties(cfg), new Configuration(), 
fs).syncHoodieTable();
+  }
+
+  @Override
+  public void syncHoodieTable() {
+try {
+  switch (hoodieBigQueryClient.getTableType()) {
+case COPY_ON_WRITE:
+  syncCoWTable();
+  break;
+case MERGE_ON_READ:
+  LOG.error("Not supported table type " + 
hoodieBigQueryClient.getTableType());
+  throw new InvalidTableException(hoodieBigQueryClient.getBasePath());
+default:
+  LOG.error("Unknown table type " + 
hoodieBigQueryClient.getTableType());
+  throw new InvalidTableException(hoodieBigQueryClient.getBasePath());
+  }
+} catch (RuntimeException re) {
+  throw new HoodieException("Got runtime exception when big query syncing 
" + cfg.tableName, re);
+} finally {
+  hoodieBigQueryClient.close();

Review comment:
   Since the client is created on a different method, it's hard to use the 
try-with-resource block in th

[GitHub] [hudi] vingov commented on a change in pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


vingov commented on a change in pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#discussion_r839192680



##
File path: 
hudi-sync/hudi-bigquery-sync/src/main/java/org/apache/hudi/bigquery/BigQuerySyncTool.java
##
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.bigquery;
+
+import org.apache.hudi.bigquery.util.Utils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.InvalidTableException;
+import org.apache.hudi.sync.common.AbstractSyncTool;
+
+import com.beust.jcommander.JCommander;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.util.List;
+import java.util.Properties;
+
+/**
+ * Tool to sync a hoodie table with a big query table. Either use it as an api
+ * BigQuerySyncTool.syncHoodieTable(BigQuerySyncConfig) or as a command line 
java -cp hoodie-hive.jar BigQuerySyncTool [args]
+ * 
+ * This utility will get the schema from the latest commit and will sync big 
query table schema
+ */
+public class BigQuerySyncTool extends AbstractSyncTool {
+  private static final Logger LOG = 
LogManager.getLogger(BigQuerySyncTool.class);
+  public final BigQuerySyncConfig cfg;
+  public final HoodieBigQueryClient hoodieBigQueryClient;
+  public String projectId;
+  public String datasetName;
+  public String manifestTableName;
+  public String versionsTableName;
+  public String snapshotViewName;
+  public String sourceUri;
+  public String sourceUriPrefix;
+  public List partitionFields;
+
+  private BigQuerySyncTool(Properties properties, Configuration conf, 
FileSystem fs) {
+super(new TypedProperties(properties), conf, fs);
+hoodieBigQueryClient = new 
HoodieBigQueryClient(Utils.propertiesToConfig(properties), fs);
+cfg = Utils.propertiesToConfig(properties);
+switch (hoodieBigQueryClient.getTableType()) {
+  case COPY_ON_WRITE:
+projectId = cfg.projectId;
+datasetName = cfg.datasetName;
+manifestTableName = cfg.tableName + "_manifest";
+versionsTableName = cfg.tableName + "_versions";
+snapshotViewName = cfg.tableName;
+sourceUri = cfg.sourceUri;
+sourceUriPrefix = cfg.sourceUriPrefix;
+partitionFields = cfg.partitionFields;
+break;
+  case MERGE_ON_READ:
+LOG.error("Not supported table type " + 
hoodieBigQueryClient.getTableType());
+throw new InvalidTableException(hoodieBigQueryClient.getBasePath());
+  default:
+LOG.error("Unknown table type " + hoodieBigQueryClient.getTableType());
+throw new InvalidTableException(hoodieBigQueryClient.getBasePath());
+}
+  }
+
+  public static void main(String[] args) {
+// parse the params
+BigQuerySyncConfig cfg = new BigQuerySyncConfig();
+JCommander cmd = new JCommander(cfg, null, args);
+if (cfg.help || args.length == 0) {
+  cmd.usage();
+  System.exit(1);
+}
+FileSystem fs = FSUtils.getFs(cfg.basePath, new Configuration());
+new BigQuerySyncTool(Utils.configToProperties(cfg), new Configuration(), 
fs).syncHoodieTable();
+  }
+
+  @Override
+  public void syncHoodieTable() {
+try {
+  switch (hoodieBigQueryClient.getTableType()) {
+case COPY_ON_WRITE:
+  syncCoWTable();
+  break;
+case MERGE_ON_READ:
+  LOG.error("Not supported table type " + 
hoodieBigQueryClient.getTableType());
+  throw new InvalidTableException(hoodieBigQueryClient.getBasePath());
+default:
+  LOG.error("Unknown table type " + 
hoodieBigQueryClient.getTableType());
+  throw new InvalidTableException(hoodieBigQueryClient.getBasePath());
+  }
+} catch (RuntimeException re) {

Review comment:
   Resolved.





[GitHub] [hudi] hudi-bot commented on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1084095771


   
   ## CI report:
   
   * b630adc621c074df9b8ada2e5c4435f037484bd2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7313)
 
   * 7580e998c628309f0e37a292a4cbacbc83cad438 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#issuecomment-1078568661


   
   ## CI report:
   
   * b630adc621c074df9b8ada2e5c4435f037484bd2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7313)
 
   
   




[GitHub] [hudi] vingov commented on a change in pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


vingov commented on a change in pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#discussion_r839192064



##
File path: 
hudi-sync/hudi-bigquery-sync/src/main/java/org/apache/hudi/bigquery/BigQuerySyncTool.java
##
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.bigquery;
+
+import org.apache.hudi.bigquery.util.Utils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.InvalidTableException;
+import org.apache.hudi.sync.common.AbstractSyncTool;
+
+import com.beust.jcommander.JCommander;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.util.List;
+import java.util.Properties;
+
+/**
+ * Tool to sync a hoodie table with a big query table. Either use it as an api
+ * BigQuerySyncTool.syncHoodieTable(BigQuerySyncConfig) or as a command line 
java -cp hoodie-hive.jar BigQuerySyncTool [args]
+ * 
+ * This utility will get the schema from the latest commit and will sync big 
query table schema
+ */
+public class BigQuerySyncTool extends AbstractSyncTool {
+  private static final Logger LOG = 
LogManager.getLogger(BigQuerySyncTool.class);
+  public final BigQuerySyncConfig cfg;
+  public final HoodieBigQueryClient hoodieBigQueryClient;
+  public String projectId;
+  public String datasetName;
+  public String manifestTableName;
+  public String versionsTableName;
+  public String snapshotViewName;
+  public String sourceUri;
+  public String sourceUriPrefix;
+  public List partitionFields;
+
+  private BigQuerySyncTool(Properties properties, Configuration conf, 
FileSystem fs) {
+super(new TypedProperties(properties), conf, fs);
+hoodieBigQueryClient = new 
HoodieBigQueryClient(Utils.propertiesToConfig(properties), fs);
+cfg = Utils.propertiesToConfig(properties);
+switch (hoodieBigQueryClient.getTableType()) {
+  case COPY_ON_WRITE:
+projectId = cfg.projectId;
+datasetName = cfg.datasetName;
+manifestTableName = cfg.tableName + "_manifest";
+versionsTableName = cfg.tableName + "_versions";
+snapshotViewName = cfg.tableName;
+sourceUri = cfg.sourceUri;
+sourceUriPrefix = cfg.sourceUriPrefix;
+partitionFields = cfg.partitionFields;
+break;
+  case MERGE_ON_READ:
+LOG.error("Not supported table type " + 
hoodieBigQueryClient.getTableType());
+throw new InvalidTableException(hoodieBigQueryClient.getBasePath());
+  default:
+LOG.error("Unknown table type " + hoodieBigQueryClient.getTableType());
+throw new InvalidTableException(hoodieBigQueryClient.getBasePath());
+}
+  }
+
+  public static void main(String[] args) {
+// parse the params
+BigQuerySyncConfig cfg = new BigQuerySyncConfig();
+JCommander cmd = new JCommander(cfg, null, args);
+if (cfg.help || args.length == 0) {
+  cmd.usage();
+  System.exit(1);
+}
+FileSystem fs = FSUtils.getFs(cfg.basePath, new Configuration());
+new BigQuerySyncTool(Utils.configToProperties(cfg), new Configuration(), 
fs).syncHoodieTable();

Review comment:
   Updated the code.








[GitHub] [hudi] vingov commented on a change in pull request #5125: [HUDI-3357] MVP implementation of BigQuerySyncTool

2022-03-30 Thread GitBox


vingov commented on a change in pull request #5125:
URL: https://github.com/apache/hudi/pull/5125#discussion_r839191258



##
File path: 
hudi-sync/hudi-bigquery-sync/src/main/java/org/apache/hudi/bigquery/BigQuerySyncConfig.java
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.bigquery;
+
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+
+import com.beust.jcommander.Parameter;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Configs needed to sync data into BigQuery.
+ */
+public class BigQuerySyncConfig implements Serializable {
+
+  @Parameter(names = {"--help", "-h"}, help = true)
+  public final Boolean help = false;

Review comment:
   Agreed, updated the code, thanks for flagging.








[GitHub] [hudi] hudi-bot removed a comment on pull request #5138: [HUDI-3713] Guarding archival for multi-writer

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5138:
URL: https://github.com/apache/hudi/pull/5138#issuecomment-1083982800


   
   ## CI report:
   
   * 95deb535a66091bdc7702a3da49a3d7a6a4f58e5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7597)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7596)
 
   * 720872b477abf379ccc889de5d88079fdf96c1cd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7600)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5138: [HUDI-3713] Guarding archival for multi-writer

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5138:
URL: https://github.com/apache/hudi/pull/5138#issuecomment-1084092535


   
   ## CI report:
   
   * 720872b477abf379ccc889de5d88079fdf96c1cd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7600)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-03-30 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1084090687


   
   ## CI report:
   
   * 52be34d7d5e025180415c46e64a3e2145c29e498 UNKNOWN
   * 78e86dd1953cc4d6bf10ca808a7bcffe22b4b587 UNKNOWN
   * fa9cee18b16f1b11ed039a9da3c490f017775e8d UNKNOWN
   * 60d9cf848b623c27078e1d0b9db329eb8f4cec94 UNKNOWN
   * 9729597c54733bc6518b14418bdbe1cf7febb80b UNKNOWN
   * a543ce26adebe58f6f0954a54524a4cb393c0a0c UNKNOWN
   * f7a1729d2a9529f03b1f3d259b1b1ba4920f137e UNKNOWN
   * 1816108f1144b1c918988022fc7147fbe7bb8f9d UNKNOWN
   * 85cc0c336425a2b1c70694a5f4222c63f98fc3e5 UNKNOWN
   * c41514d513eb6adc831ea580d48a65fd77f49da6 UNKNOWN
   * 52b0671b08edd5d21053b4210e9001e11a7cca01 UNKNOWN
   * d9cc545cf661d7e3adc886ef70542e37426eee0d UNKNOWN
   * 4096466ae627f1c4ca46cae8899507c801e71d1f UNKNOWN
   * 222bf09f67a09c3e30d57796cf435e9a2c594831 UNKNOWN
   * ce6743ba070142bde59f8eaac5b911e6339c2212 UNKNOWN
   * 7ff8b85d83908184608cb59e67fb9236fcad26ea UNKNOWN
   * b3d94a15f601a19a94beda555618eb8e5ea66e33 UNKNOWN
   * 7c80ff0e9ca20acc155b1ea631df9e5efe828adf UNKNOWN
   * 54f7f1cbb353fa5a14b1d0c97dde3cb5c7f48f38 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7541)
 
   * 5c147a6a52c3f118a154c9b36c09b9f245dd20e2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7612)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1084069843


   
   ## CI report:
   
   * 52be34d7d5e025180415c46e64a3e2145c29e498 UNKNOWN
   * 78e86dd1953cc4d6bf10ca808a7bcffe22b4b587 UNKNOWN
   * fa9cee18b16f1b11ed039a9da3c490f017775e8d UNKNOWN
   * 60d9cf848b623c27078e1d0b9db329eb8f4cec94 UNKNOWN
   * 9729597c54733bc6518b14418bdbe1cf7febb80b UNKNOWN
   * a543ce26adebe58f6f0954a54524a4cb393c0a0c UNKNOWN
   * f7a1729d2a9529f03b1f3d259b1b1ba4920f137e UNKNOWN
   * 1816108f1144b1c918988022fc7147fbe7bb8f9d UNKNOWN
   * 85cc0c336425a2b1c70694a5f4222c63f98fc3e5 UNKNOWN
   * c41514d513eb6adc831ea580d48a65fd77f49da6 UNKNOWN
   * 52b0671b08edd5d21053b4210e9001e11a7cca01 UNKNOWN
   * d9cc545cf661d7e3adc886ef70542e37426eee0d UNKNOWN
   * 4096466ae627f1c4ca46cae8899507c801e71d1f UNKNOWN
   * 222bf09f67a09c3e30d57796cf435e9a2c594831 UNKNOWN
   * ce6743ba070142bde59f8eaac5b911e6339c2212 UNKNOWN
   * 7ff8b85d83908184608cb59e67fb9236fcad26ea UNKNOWN
   * b3d94a15f601a19a94beda555618eb8e5ea66e33 UNKNOWN
   * 7c80ff0e9ca20acc155b1ea631df9e5efe828adf UNKNOWN
   * 54f7f1cbb353fa5a14b1d0c97dde3cb5c7f48f38 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7541)
 
   * 5c147a6a52c3f118a154c9b36c09b9f245dd20e2 UNKNOWN
   
   




[hudi] branch master updated (2dbb273 -> f6ff95f)

2022-03-30 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 2dbb273  [HUDI-3721] Delete MDT if necessary when trigger rollback to 
savepoint (#5173)
 add f6ff95f  [MINOR][DOCS] Update hudi-utilities-slim-bundle docs (#5184)

No new revisions were added by this update.

Summary of changes:
 README.md  |  8 +---
 packaging/hudi-utilities-slim-bundle/README.md | 22 ++
 2 files changed, 23 insertions(+), 7 deletions(-)
 create mode 100644 packaging/hudi-utilities-slim-bundle/README.md


[GitHub] [hudi] xushiyan merged pull request #5184: [MINOR][DOCS] Update hudi-utilities-slim-bundle docs

2022-03-30 Thread GitBox


xushiyan merged pull request #5184:
URL: https://github.com/apache/hudi/pull/5184


   






[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-03-30 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1084069843


   
   ## CI report:
   
   * 52be34d7d5e025180415c46e64a3e2145c29e498 UNKNOWN
   * 78e86dd1953cc4d6bf10ca808a7bcffe22b4b587 UNKNOWN
   * fa9cee18b16f1b11ed039a9da3c490f017775e8d UNKNOWN
   * 60d9cf848b623c27078e1d0b9db329eb8f4cec94 UNKNOWN
   * 9729597c54733bc6518b14418bdbe1cf7febb80b UNKNOWN
   * a543ce26adebe58f6f0954a54524a4cb393c0a0c UNKNOWN
   * f7a1729d2a9529f03b1f3d259b1b1ba4920f137e UNKNOWN
   * 1816108f1144b1c918988022fc7147fbe7bb8f9d UNKNOWN
   * 85cc0c336425a2b1c70694a5f4222c63f98fc3e5 UNKNOWN
   * c41514d513eb6adc831ea580d48a65fd77f49da6 UNKNOWN
   * 52b0671b08edd5d21053b4210e9001e11a7cca01 UNKNOWN
   * d9cc545cf661d7e3adc886ef70542e37426eee0d UNKNOWN
   * 4096466ae627f1c4ca46cae8899507c801e71d1f UNKNOWN
   * 222bf09f67a09c3e30d57796cf435e9a2c594831 UNKNOWN
   * ce6743ba070142bde59f8eaac5b911e6339c2212 UNKNOWN
   * 7ff8b85d83908184608cb59e67fb9236fcad26ea UNKNOWN
   * b3d94a15f601a19a94beda555618eb8e5ea66e33 UNKNOWN
   * 7c80ff0e9ca20acc155b1ea631df9e5efe828adf UNKNOWN
   * 54f7f1cbb353fa5a14b1d0c97dde3cb5c7f48f38 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7541)
 
   * 5c147a6a52c3f118a154c9b36c09b9f245dd20e2 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1082709362


   
   ## CI report:
   
   * 52be34d7d5e025180415c46e64a3e2145c29e498 UNKNOWN
   * 78e86dd1953cc4d6bf10ca808a7bcffe22b4b587 UNKNOWN
   * fa9cee18b16f1b11ed039a9da3c490f017775e8d UNKNOWN
   * 60d9cf848b623c27078e1d0b9db329eb8f4cec94 UNKNOWN
   * 9729597c54733bc6518b14418bdbe1cf7febb80b UNKNOWN
   * a543ce26adebe58f6f0954a54524a4cb393c0a0c UNKNOWN
   * f7a1729d2a9529f03b1f3d259b1b1ba4920f137e UNKNOWN
   * 1816108f1144b1c918988022fc7147fbe7bb8f9d UNKNOWN
   * 85cc0c336425a2b1c70694a5f4222c63f98fc3e5 UNKNOWN
   * c41514d513eb6adc831ea580d48a65fd77f49da6 UNKNOWN
   * 52b0671b08edd5d21053b4210e9001e11a7cca01 UNKNOWN
   * d9cc545cf661d7e3adc886ef70542e37426eee0d UNKNOWN
   * 4096466ae627f1c4ca46cae8899507c801e71d1f UNKNOWN
   * 222bf09f67a09c3e30d57796cf435e9a2c594831 UNKNOWN
   * ce6743ba070142bde59f8eaac5b911e6339c2212 UNKNOWN
   * 7ff8b85d83908184608cb59e67fb9236fcad26ea UNKNOWN
   * b3d94a15f601a19a94beda555618eb8e5ea66e33 UNKNOWN
   * 7c80ff0e9ca20acc155b1ea631df9e5efe828adf UNKNOWN
   * 54f7f1cbb353fa5a14b1d0c97dde3cb5c7f48f38 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7541)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #5182: [MINOR] Fixing parquet reader iterator close

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5182:
URL: https://github.com/apache/hudi/pull/5182#issuecomment-1083799817


   
   ## CI report:
   
   * 67d0c2b5abb88b41089780d6b9014b7c8aa40c03 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7592)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5182: [MINOR] Fixing parquet reader iterator close

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5182:
URL: https://github.com/apache/hudi/pull/5182#issuecomment-1084051167


   
   ## CI report:
   
   * 67d0c2b5abb88b41089780d6b9014b7c8aa40c03 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7592)
 
   
   




[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

2022-03-30 Thread GitBox


codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839147377



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/ThreeToFourUpgradeHandler.java
##
@@ -35,7 +40,12 @@
   @Override
   public Map upgrade(HoodieWriteConfig config, 
HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade 
upgradeDowngradeHelper) {
 Map tablePropsToAdd = new Hashtable<>();
-tablePropsToAdd.put(HoodieTableConfig.TABLE_CHECKSUM, 
String.valueOf(HoodieTableConfig.generateChecksum(config.getProps())));
+tablePropsToAdd.put(TABLE_CHECKSUM, 
String.valueOf(HoodieTableConfig.generateChecksum(config.getProps())));
+// if metadata is enabled and files partition exist then update 
TABLE_METADATA_INDEX_COMPLETED
+// schema for the files partition is same between the two versions
+if (config.isMetadataTableEnabled() && 
metadataPartitionExists(config.getBasePath(), context, 
MetadataPartitionType.FILES)) {
+  tablePropsToAdd.put(TABLE_METADATA_PARTITIONS, 
MetadataPartitionType.FILES.getPartitionPath());
+}

Review comment:
   @zhangyue19921010 Good question! If no upgrade is required, or, say, you 
upgraded to the current version with metadata disabled and enabled metadata a 
few commits later, then this table config will get updated in the metadata 
initialization path, i.e. where 
`HoodieBackedTableMetadataWriter#updateInitializedPartitionsInTableConfig` is 
called.
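
   A rough sketch of the two paths, for illustration only. Every name here 
(`TableConfigSketch`, `upgradeThreeToFour`, `initializeMetadata`, the 
`metadata.partitions` key and the `files` value) is a hypothetical stand-in, 
not the actual Hudi code. It assumes the partition gets recorded either during 
the upgrade, when metadata is already enabled, or later, when metadata is first 
initialized:

```java
import java.util.HashMap;
import java.util.Map;

public class TableConfigSketch {
  // Hypothetical key standing in for TABLE_METADATA_PARTITIONS.
  static final String METADATA_PARTITIONS_KEY = "metadata.partitions";

  // Upgrade path: record the files partition only when metadata is enabled
  // and the partition already exists at upgrade time.
  static Map<String, String> upgradeThreeToFour(boolean metadataEnabled, boolean filesPartitionExists) {
    Map<String, String> propsToAdd = new HashMap<>();
    if (metadataEnabled && filesPartitionExists) {
      propsToAdd.put(METADATA_PARTITIONS_KEY, "files");
    }
    return propsToAdd;
  }

  // Initialization path: runs when metadata is enabled after the upgrade,
  // and records the partition if the upgrade did not.
  static void initializeMetadata(Map<String, String> tableConfig) {
    tableConfig.putIfAbsent(METADATA_PARTITIONS_KEY, "files");
  }

  public static void main(String[] args) {
    // Upgrade with metadata disabled: nothing is recorded.
    Map<String, String> tableConfig = new HashMap<>(upgradeThreeToFour(false, false));
    System.out.println(tableConfig.containsKey(METADATA_PARTITIONS_KEY)); // false

    // Metadata enabled a few commits later: the initialization path records it.
    initializeMetadata(tableConfig);
    System.out.println(tableConfig.get(METADATA_PARTITIONS_KEY)); // files
  }
}
```

   Either path leaves the table config in the same state, which is why the 
upgrade handler only needs to cover tables that already have the partition.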








[GitHub] [hudi] hudi-bot removed a comment on pull request #5184: [MINOR][DOCS] Update hudi-utilities-slim-bundle docs

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5184:
URL: https://github.com/apache/hudi/pull/5184#issuecomment-1084042321


   
   ## CI report:
   
   * b1ca0e49a1598e0a5a6d55c05c14b10e464acab8 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5184: [MINOR][DOCS] Update hudi-utilities-slim-bundle docs

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5184:
URL: https://github.com/apache/hudi/pull/5184#issuecomment-1084043450


   
   ## CI report:
   
   * b1ca0e49a1598e0a5a6d55c05c14b10e464acab8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7610)
 
   
   




[jira] [Updated] (HUDI-3752) Update website content based on 0.11 new features

2022-03-30 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3752:
-
Description: 
content to update
- utilities slim bundle https://github.com/apache/hudi/pull/5184/files

  was:Update 


> Update website content based on 0.11 new features
> -
>
> Key: HUDI-3752
> URL: https://issues.apache.org/jira/browse/HUDI-3752
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.11.0
>
>
> content to update
> - utilities slim bundle https://github.com/apache/hudi/pull/5184/files



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3752) Update website content based on 0.11 new features

2022-03-30 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3752:
-
Fix Version/s: 0.11.0

> Update website content based on 0.11 new features
> -
>
> Key: HUDI-3752
> URL: https://issues.apache.org/jira/browse/HUDI-3752
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.11.0
>
>
> Update 





[jira] [Updated] (HUDI-3752) Update website content based on 0.11 new features

2022-03-30 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3752:
-
Description: Update 

> Update website content based on 0.11 new features
> -
>
> Key: HUDI-3752
> URL: https://issues.apache.org/jira/browse/HUDI-3752
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Raymond Xu
>Priority: Major
>
> Update 





[GitHub] [hudi] hudi-bot commented on pull request #5168: [HUDI-3729][SPARK] fixed the per regression by enable vectorizeReader for parquet file

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5168:
URL: https://github.com/apache/hudi/pull/5168#issuecomment-1084042283


   
   ## CI report:
   
   * 6ceac1ae15e9b2ef1411050d060f4acce5737305 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7547)
 
   * 121cb81a9551d980601397638577dc8c24ff9869 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7609)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #5168: [HUDI-3729][SPARK] fixed the per regression by enable vectorizeReader for parquet file

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5168:
URL: https://github.com/apache/hudi/pull/5168#issuecomment-1084041048


   
   ## CI report:
   
   * 6ceac1ae15e9b2ef1411050d060f4acce5737305 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7547)
 
   * 121cb81a9551d980601397638577dc8c24ff9869 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #5184: [MINOR][DOCS] Update hudi-utilities-slim-bundle docs

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5184:
URL: https://github.com/apache/hudi/pull/5184#issuecomment-1084042321


   
   ## CI report:
   
   * b1ca0e49a1598e0a5a6d55c05c14b10e464acab8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #5168: [HUDI-3729][SPARK] fixed the per regression by enable vectorizeReader for parquet file

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5168:
URL: https://github.com/apache/hudi/pull/5168#issuecomment-1084041048


   
   ## CI report:
   
   * 6ceac1ae15e9b2ef1411050d060f4acce5737305 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7547)
 
   * 121cb81a9551d980601397638577dc8c24ff9869 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #5168: [HUDI-3729][SPARK] fixed the per regression by enable vectorizeReader for parquet file

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5168:
URL: https://github.com/apache/hudi/pull/5168#issuecomment-1082902906


   
   ## CI report:
   
   * 6ceac1ae15e9b2ef1411050d060f4acce5737305 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7547)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Created] (HUDI-3752) Update website content based on 0.11 new features

2022-03-30 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-3752:


 Summary: Update website content based on 0.11 new features
 Key: HUDI-3752
 URL: https://issues.apache.org/jira/browse/HUDI-3752
 Project: Apache Hudi
  Issue Type: Task
  Components: docs
Reporter: Raymond Xu








[GitHub] [hudi] yihua opened a new pull request #5184: [DOCS] Update hudi-utilities-slim-bundle docs

2022-03-30 Thread GitBox


yihua opened a new pull request #5184:
URL: https://github.com/apache/hudi/pull/5184


   ## What is the purpose of the pull request
   
   As above, this PR moves the hudi-utilities-slim-bundle docs to the module.
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] xushiyan commented on a change in pull request #5176: [HUDI-3700] Add hudi-utilities-slim-bundle excluding hudi-spark-datasource modules

2022-03-30 Thread GitBox


xushiyan commented on a change in pull request #5176:
URL: https://github.com/apache/hudi/pull/5176#discussion_r839140245



##
File path: README.md
##
@@ -94,6 +94,12 @@ mvn clean package -DskipTests -Dspark3.1.x
 
 Starting from versions 0.11, Hudi no longer requires `spark-avro` to be 
specified using `--packages`
 
+### Usage of hudi-utilities-slim-bundle
+
+Starting from versions 0.11, Hudi provides hudi-utilities-slim-bundle which 
excludes hudi-spark-datasource modules.
+This new bundle is intended to be used with Hudi Spark bundle together, if 
using hudi-utilities-bundle solely
+introduces problems for a specific Spark version.
+

Review comment:
   As discussed, we can move this to the package's own README instead of 
the repo README.








[jira] [Updated] (HUDI-3751) Hive count throw exception after truncate table

2022-03-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HUDI-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

董可伦 updated HUDI-3751:
--
Description: 
{code:java}
// Spark-Sql 
// create table
create table test_hudi_table (
  id int,
  name string,
  price double,
  ts long,
  year string,
  month string,
  day string
) using hudi
partitioned by (year,month,day)
 options (
  primaryKey = 'id',
  preCombineField = 'ts',
  type = 'cow'
 );

// insert
insert into test_hudi_table values (1,'hudi', 10.0,1000,'2022','03','31');
// truncate
truncate table test_hudi_table ;{code}
{code:java}
 // hive tez
select count(1) from
test_hudi_table;
{code}
 

 
{code:java}
//then throw exception
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1648681063719_0012_1_00, diagnostics=[Vertex 
vertex_1648681063719_0012_1_00 [Map 1] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: test_hudi_table initializer failed, 
vertex=vertex_1648681063719_0012_1_00 [Map 1], java.io.FileNotFoundException: 
File does not exist: 
/test_hudi/test_hudi_table/year=2022/month=03/day=31/dd21e7de-430d-4368-8667-07a71e078e3a-0_0-33-1616_20220331095249651.parquet
        at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
        at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:158)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1931)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:426)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
        at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
        at 
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:864)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:851)
        at 
org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:908)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:274)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:271)
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:281)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:255)
        at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:361)
        at 
org.apache.hudi.hadoop.HoodieParquetInputFormatBase.getSplits(HoodieParquetInputFormatBase.java:68)
        at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
        at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:779)
        at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
        at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
        at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformatio

[jira] [Created] (HUDI-3751) Hive count throw exception after truncate table

2022-03-30 Thread Jira
董可伦 created HUDI-3751:
-

 Summary: Hive count throw exception after truncate table
 Key: HUDI-3751
 URL: https://issues.apache.org/jira/browse/HUDI-3751
 Project: Apache Hudi
  Issue Type: Bug
  Components: hive
Reporter: 董可伦
Assignee: 董可伦








[hudi] branch master updated: [HUDI-3721] Delete MDT if necessary when trigger rollback to savepoint (#5173)

2022-03-30 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 2dbb273  [HUDI-3721] Delete MDT if necessary when trigger rollback to 
savepoint (#5173)
2dbb273 is described below

commit 2dbb273d26ea38ecf28af704876861554250449c
Author: YueZhang <69956021+zhangyue19921...@users.noreply.github.com>
AuthorDate: Thu Mar 31 11:26:37 2022 +0800

[HUDI-3721] Delete MDT if necessary when trigger rollback to savepoint 
(#5173)

Co-authored-by: yuezhang 
---
 .../hudi/cli/integ/ITTestSavepointsCommand.java| 53 ++
 .../apache/hudi/client/BaseHoodieWriteClient.java  | 45 ++
 .../apache/hudi/client/HoodieFlinkWriteClient.java |  2 +-
 .../apache/hudi/client/HoodieJavaWriteClient.java  |  2 +-
 .../apache/hudi/client/SparkRDDWriteClient.java| 12 +++--
 .../hudi/client/TestTableSchemaEvolution.java  |  4 +-
 .../functional/TestHoodieBackedMetadata.java   |  2 +-
 .../TestHoodieClientOnCopyOnWriteStorage.java  |  2 +-
 .../TestHoodieSparkMergeOnReadTableRollback.java   |  8 ++--
 9 files changed, 106 insertions(+), 24 deletions(-)

diff --git 
a/hudi-cli/src/test/java/org/apache/hudi/cli/integ/ITTestSavepointsCommand.java 
b/hudi-cli/src/test/java/org/apache/hudi/cli/integ/ITTestSavepointsCommand.java
index 5f8021a..7de1c2d 100644
--- 
a/hudi-cli/src/test/java/org/apache/hudi/cli/integ/ITTestSavepointsCommand.java
+++ 
b/hudi-cli/src/test/java/org/apache/hudi/cli/integ/ITTestSavepointsCommand.java
@@ -22,6 +22,8 @@ import org.apache.hadoop.fs.Path;
 import org.apache.hudi.cli.HoodieCLI;
 import org.apache.hudi.cli.commands.TableCommand;
 import org.apache.hudi.cli.testutils.AbstractShellIntegrationTest;
+import org.apache.hudi.client.common.HoodieSparkEngineContext;
+import org.apache.hudi.common.config.HoodieMetadataConfig;
 import org.apache.hudi.common.model.HoodieTableType;
 import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
 import org.apache.hudi.common.table.timeline.HoodieInstant;
@@ -29,6 +31,9 @@ import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion;
 import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
 
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
 import org.springframework.shell.core.CommandResult;
@@ -119,6 +124,54 @@ public class ITTestSavepointsCommand extends 
AbstractShellIntegrationTest {
   }
 
   /**
+   * Test case of command 'savepoint rollback' with metadata table bootstrap.
+   */
+  @Test
+  public void testRollbackToSavepointWithMetadataTableEnable() throws 
IOException {
+// generate for savepoints
+for (int i = 101; i < 105; i++) {
+  String instantTime = String.valueOf(i);
+  HoodieTestDataGenerator.createCommitFile(tablePath, instantTime, 
jsc.hadoopConfiguration());
+}
+
+// generate one savepoint at 102
+String savepoint = "102";
+HoodieTestDataGenerator.createSavepointFile(tablePath, savepoint, 
jsc.hadoopConfiguration());
+
+// re-bootstrap metadata table
+// delete first
+String basePath = metaClient.getBasePath();
+Path metadataTableBasePath = new 
Path(HoodieTableMetadata.getMetadataTableBasePath(basePath));
+metaClient.getFs().delete(metadataTableBasePath, true);
+
+// then bootstrap metadata table at instant 104
+HoodieWriteConfig writeConfig = 
HoodieWriteConfig.newBuilder().withPath(HoodieCLI.basePath)
+
.withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(true).build()).build();
+SparkHoodieBackedTableMetadataWriter.create(HoodieCLI.conf, writeConfig, 
new HoodieSparkEngineContext(jsc));
+
+assertTrue(HoodieCLI.fs.exists(metadataTableBasePath));
+
+// roll back to savepoint
+CommandResult cr = getShell().executeCommand(
+String.format("savepoint rollback --savepoint %s --sparkMaster %s", 
savepoint, "local"));
+
+assertAll("Command run failed",
+() -> assertTrue(cr.isSuccess()),
+() -> assertEquals(
+String.format("Savepoint \"%s\" rolled back", savepoint), 
cr.getResult().toString()));
+
+// there is 1 restore instant
+HoodieActiveTimeline timeline = 
HoodieCLI.getTableMetaClient().getActiveTimeline();
+assertEquals(1, timeline.getRestoreTimeline().countInstants());
+
+// 103 and 104 instant had rollback
+assertFalse(timeline.getCommitTimeline().containsInstant(
+new HoodieInstant(HoodieInstant.State.COMPLETED, "commit", "103")));
+assertFalse(timeline.getCommitTimeline().containsInstant(
+new HoodieInstant(HoodieInstant.State.COMPLETED, "commit

[GitHub] [hudi] yihua merged pull request #5173: [HUDI-3721] Delete MDT if necessary when trigger rollback to savepoint

2022-03-30 Thread GitBox


yihua merged pull request #5173:
URL: https://github.com/apache/hudi/pull/5173


   






[GitHub] [hudi] yihua commented on a change in pull request #5173: [HUDI-3721] Delete MDT if necessary when trigger rollback to savepoint

2022-03-30 Thread GitBox


yihua commented on a change in pull request #5173:
URL: https://github.com/apache/hudi/pull/5173#discussion_r839137184



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java
##
@@ -428,11 +428,13 @@ private void updateTableMetadata(HoodieTable table, 
HoodieCommitMetadata commitM
   }
 
   @Override
-  protected HoodieTable doInitTable(HoodieTableMetaClient metaClient, 
Option instantTime) {
-// Initialize Metadata Table to make sure it's bootstrapped _before_ the 
operation,
-// if it didn't exist before
-// See https://issues.apache.org/jira/browse/HUDI-3343 for more details
-initializeMetadataTable(instantTime);
+  protected HoodieTable doInitTable(HoodieTableMetaClient metaClient, 
Option instantTime, boolean initialMetadataTableIfNecessary) {
+if (initialMetadataTableIfNecessary) {
+  // Initialize Metadata Table to make sure it's bootstrapped _before_ the 
operation,
+  // if it didn't exist before
+  // See https://issues.apache.org/jira/browse/HUDI-3343 for more details
+  initializeMetadataTable(instantTime);
+}
 
 // Create a Hoodie table which encapsulated the commits and files visible
 return HoodieSparkTable.create(config, (HoodieSparkEngineContext) context, 
metaClient, config.isMetadataTableEnabled());

Review comment:
   Per discussion, there is no change required here.
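
The guard in this hunk can be pictured with a minimal, self-contained Java sketch. It is not Hudi's code: `TableInitSketch`, `initTable` and the recorded step names are hypothetical stand-ins. Only the idea is taken from the diff above, namely a boolean parameter that lets a caller (for example a rollback to savepoint) skip metadata table bootstrap.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a write client; not the real SparkRDDWriteClient. */
public class TableInitSketch {
  // Records the steps performed so the effect of the flag is observable.
  private final List<String> steps = new ArrayList<>();

  /**
   * Mirrors the shape of doInitTable(..., boolean initialMetadataTableIfNecessary):
   * the metadata table is bootstrapped only when the caller asks for it.
   */
  public List<String> initTable(boolean initialMetadataTableIfNecessary) {
    if (initialMetadataTableIfNecessary) {
      steps.add("initializeMetadataTable");
    }
    steps.add("createTable");
    return steps;
  }

  public static void main(String[] args) {
    // Regular write path: bootstrap first, then create the table handle.
    System.out.println(new TableInitSketch().initTable(true));
    // Path that must not re-bootstrap the metadata table.
    System.out.println(new TableInitSketch().initTable(false));
  }
}
```

Passing the flag explicitly keeps the default write path unchanged while letting a single caller opt out, which is the trade-off the hunk appears to make.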








[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

2022-03-30 Thread GitBox


zhangyue19921010 commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839136167



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/ThreeToFourUpgradeHandler.java
##
@@ -35,7 +40,12 @@
   @Override
   public Map upgrade(HoodieWriteConfig config, 
HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade 
upgradeDowngradeHelper) {
 Map tablePropsToAdd = new Hashtable<>();
-tablePropsToAdd.put(HoodieTableConfig.TABLE_CHECKSUM, 
String.valueOf(HoodieTableConfig.generateChecksum(config.getProps())));
+tablePropsToAdd.put(TABLE_CHECKSUM, 
String.valueOf(HoodieTableConfig.generateChecksum(config.getProps())));
+// if metadata is enabled and files partition exist then update 
TABLE_METADATA_INDEX_COMPLETED
+// schema for the files partition is same between the two versions
+if (config.isMetadataTableEnabled() && 
metadataPartitionExists(config.getBasePath(), context, 
MetadataPartitionType.FILES)) {
+  tablePropsToAdd.put(TABLE_METADATA_PARTITIONS, 
MetadataPartitionType.FILES.getPartitionPath());
+}

Review comment:
   Hi @codope, just thinking: when users set the current version to 4, there 
is no need for an upgrade/downgrade. How can we then update the 
`TABLE_METADATA_PARTITIONS` column here?
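
The concern raised here can be made concrete with a small sketch. It assumes, as the comment does, that upgrade handlers run only when the table's stored version is lower than the target version. `UpgradeSketch`, `upgradeIfNeeded`, `threeToFour` and the property key string are hypothetical and are not Hudi APIs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of version-gated upgrade handlers. */
public class UpgradeSketch {
  static final int TARGET_VERSION = 4;
  // Made-up key used only for this example.
  static final String PARTITIONS_KEY = "example.table.metadata.partitions";

  /** Stand-in for a three-to-four handler that adds the property under discussion. */
  static void threeToFour(Map<String, String> tableProps) {
    tableProps.put(PARTITIONS_KEY, "files");
  }

  /** Runs the handler only if the stored version is below the target version. */
  static Map<String, String> upgradeIfNeeded(int storedVersion, Map<String, String> tableProps) {
    Map<String, String> result = new HashMap<>(tableProps);
    if (storedVersion < TARGET_VERSION) {
      threeToFour(result);
    }
    return result;
  }

  public static void main(String[] args) {
    // Table at version 3: the handler runs and the property is added.
    System.out.println(upgradeIfNeeded(3, new HashMap<>()));
    // Table already at version 4: the handler is skipped and the property stays absent.
    System.out.println(upgradeIfNeeded(4, new HashMap<>()));
  }
}
```

Under that assumption, a table that starts at version 4 never passes through the handler, so the property would have to be set on some other path.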








[GitHub] [hudi] xiarixiaoyao commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-03-30 Thread GitBox


xiarixiaoyao commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1084032214


   > @xiarixiaoyao : while you are fixing the integ-test bundle issue that 
@nsivabalan pointed out, can you rename the variables we use to look up the 
schema to instantTime consistently? (There are places, e.g. 
AbstractHoodieLogRecordReader.java, where we are using names like "currentTime".) 
Thanks. We should use the name instant to represent a HoodieInstant and 
instantTime to represent the string representation of the time referred to by 
that HoodieInstant.
   
   Sorry for the late reply; let me fix those problems. Thanks very much.






[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

2022-03-30 Thread GitBox


zhangyue19921010 commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839136167



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/ThreeToFourUpgradeHandler.java
##
@@ -35,7 +40,12 @@
   @Override
   public Map upgrade(HoodieWriteConfig config, 
HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade 
upgradeDowngradeHelper) {
 Map tablePropsToAdd = new Hashtable<>();
-tablePropsToAdd.put(HoodieTableConfig.TABLE_CHECKSUM, 
String.valueOf(HoodieTableConfig.generateChecksum(config.getProps())));
+tablePropsToAdd.put(TABLE_CHECKSUM, 
String.valueOf(HoodieTableConfig.generateChecksum(config.getProps())));
+// if metadata is enabled and files partition exist then update 
TABLE_METADATA_INDEX_COMPLETED
+// schema for the files partition is same between the two versions
+if (config.isMetadataTableEnabled() && 
metadataPartitionExists(config.getBasePath(), context, 
MetadataPartitionType.FILES)) {
+  tablePropsToAdd.put(TABLE_METADATA_PARTITIONS, 
MetadataPartitionType.FILES.getPartitionPath());
+}

Review comment:
   Hi @codope, just thinking: when the user sets the current version to 4, 
there is no need for an upgrade/downgrade. How can we then update the 
`TABLE_METADATA_PARTITIONS` column here?








[GitHub] [hudi] xiarixiaoyao commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-03-30 Thread GitBox


xiarixiaoyao commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1084031826


   > Hey @xiarixiaoyao : I tried to run the integ test for this patch and ran 
into a NoClassDefFoundError when metadata table compaction kicks in. This will 
likely happen for any MOR table compaction. Can you take a look and fix it?
   > 
   > ```
   > 22/03/30 23:52:42 ERROR HoodieTestSuiteJob: Failed to run Test Suite 
   > java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: 
com/github/benmanes/caffeine/cache/Caffeine
   > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
   > at java.util.concurrent.FutureTask.get(FutureTask.java:206)
   > at 
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:113)
   > at 
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:68)
   > at 
org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:213)
   > at 
org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:180)
   > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   > at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   > at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   > at java.lang.reflect.Method.invoke(Method.java:498)
   > at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
   > at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
   > at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
   > at 
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
   > at 
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
   > at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
   > at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
   > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   > at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
   > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   > Caused by: java.lang.NoClassDefFoundError: 
com/github/benmanes/caffeine/cache/Caffeine
   > at 
org.apache.hudi.common.util.InternalSchemaCache.(InternalSchemaCache.java:61)
   > at 
org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:79)
   > at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.compact(HoodieSparkMergeOnReadTable.java:142)
   > at 
org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:345)
   > at 
org.apache.hudi.client.BaseHoodieWriteClient.compact(BaseHoodieWriteClient.java:974)
   > at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.compactIfNecessary(HoodieBackedTableMetadataWriter.java:825)
   > at 
org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:136)
   > at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:674)
   > at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:686)
   > at 
org.apache.hudi.client.BaseHoodieWriteClient.lambda$writeTableMetadata$0(BaseHoodieWriteClient.java:324)
   > at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
   > at 
org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:324)
   > at 
org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:280)
   > at 
org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:226)
   > at 
org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:122)
   > at 
org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:644)
   > at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:309)
   > at 
org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:161)
   > at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   > at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
   > at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
   > at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
   > at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands

[GitHub] [hudi] hudi-bot commented on pull request #4489: [HUDI-3135] Make delete partitions lazy to be executed by the cleaner

2022-03-30 Thread GitBox


hudi-bot commented on pull request #4489:
URL: https://github.com/apache/hudi/pull/4489#issuecomment-1084022140


   
   ## CI report:
   
   * 8573748e2c9d8e77a8cc2294d322c9ca3f84c3e9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7589)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4489: [HUDI-3135] Make delete partitions lazy to be executed by the cleaner

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #4489:
URL: https://github.com/apache/hudi/pull/4489#issuecomment-1083762050


   
   ## CI report:
   
   * 227baaac22b57d11566bf9307596b79d4763baf8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7567)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7588)
 
   * 8573748e2c9d8e77a8cc2294d322c9ca3f84c3e9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7589)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #5179: [HUDI-3290] Different file formats for the partition metadata file.

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5179:
URL: https://github.com/apache/hudi/pull/5179#issuecomment-1084014134


   
   ## CI report:
   
   * 9d0aa2c64de04d05976bad60f58fae84d5b79c8b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7591)
 
   * a6afcea229b6ecfcb2e6ea8de506424e22da1133 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7594)
 
   * 1504f8e3a3e27527758ceaeb35f9779014bc46fb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #5179: [HUDI-3290] Different file formats for the partition metadata file.

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5179:
URL: https://github.com/apache/hudi/pull/5179#issuecomment-1084015678


   
   ## CI report:
   
   * a6afcea229b6ecfcb2e6ea8de506424e22da1133 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7594)
 
   * 1504f8e3a3e27527758ceaeb35f9779014bc46fb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7607)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #5179: [HUDI-3290] Different file formats for the partition metadata file.

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5179:
URL: https://github.com/apache/hudi/pull/5179#issuecomment-1084014134


   
   ## CI report:
   
   * 9d0aa2c64de04d05976bad60f58fae84d5b79c8b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7591)
 
   * a6afcea229b6ecfcb2e6ea8de506424e22da1133 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7594)
 
   * 1504f8e3a3e27527758ceaeb35f9779014bc46fb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #5179: [HUDI-3290] Different file formats for the partition metadata file.

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5179:
URL: https://github.com/apache/hudi/pull/5179#issuecomment-1083865512


   
   ## CI report:
   
   * 9d0aa2c64de04d05976bad60f58fae84d5b79c8b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7591)
 
   * a6afcea229b6ecfcb2e6ea8de506424e22da1133 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7594)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #5158: [HUDI-3733] Adding HoodieFailedWritesCleaningPolicy for restore with hudi-cli

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5158:
URL: https://github.com/apache/hudi/pull/5158#issuecomment-1084010103


   
   ## CI report:
   
   * 77bfef4579876bf79d8ed971946da7c15a1e80e5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7474)
 
   * 35481045e9f77ced3079a717ae53791986de2a94 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7606)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #5158: [HUDI-3733] Adding HoodieFailedWritesCleaningPolicy for restore with hudi-cli

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5158:
URL: https://github.com/apache/hudi/pull/5158#issuecomment-1084008529


   
   ## CI report:
   
   * 77bfef4579876bf79d8ed971946da7c15a1e80e5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7474)
 
   * 35481045e9f77ced3079a717ae53791986de2a94 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #5158: [HUDI-3733] Adding HoodieFailedWritesCleaningPolicy for restore with hudi-cli

2022-03-30 Thread GitBox


hudi-bot commented on pull request #5158:
URL: https://github.com/apache/hudi/pull/5158#issuecomment-1084008529


   
   ## CI report:
   
   * 77bfef4579876bf79d8ed971946da7c15a1e80e5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7474)
 
   * 35481045e9f77ced3079a717ae53791986de2a94 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #5158: [HUDI-3733] Adding HoodieFailedWritesCleaningPolicy for restore with hudi-cli

2022-03-30 Thread GitBox


hudi-bot removed a comment on pull request #5158:
URL: https://github.com/apache/hudi/pull/5158#issuecomment-1081209282


   
   ## CI report:
   
   * 77bfef4579876bf79d8ed971946da7c15a1e80e5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7474)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-3744) NoSuchMethodError of getReadStatistics with Spark 3.2/Hadoop 3.2 using HBase

2022-03-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-3744:

Status: In Progress  (was: Open)

> NoSuchMethodError of getReadStatistics with Spark 3.2/Hadoop 3.2 using HBase 
> -
>
> Key: HUDI-3744
> URL: https://issues.apache.org/jira/browse/HUDI-3744
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.11.0
>
>
> Environment: Hadoop 3.2.1 & Spark-3.2.1 
> Hudi compiled from commit f2a93ead3b5a6964a72b3543ada58aa334edef9c.
> Run spark-sql with the default job configuration and execute "show partitions 
> [hudi_table_name];"
> {code:java}
> // command
> spark-sql  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer 
> --conf 
> spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension 
> --conf 
> spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
> // spark-sql
> spark-sql> show partitions hudi_partition_table;
> {code}
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
>     at 
> org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
>     at 
> org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
>     at 
> org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
>     at 
> org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
>     at 
> org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
>     at 
> org.apache.hudi.io.storage.HoodieHFileReader.close(HoodieHFileReader.java:423)
>     at 
> org.apache.hudi.metadata.HoodieBackedTableMetadata.close(HoodieBackedTableMetadata.java:435)
>     at 
> org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$0(HoodieBackedTableMetadata.java:162)
>     at java.util.HashMap.forEach(HashMap.java:1290)
>     at 
> org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:138)
>     at 
> org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:128)
>     at 
> org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:281)
>     at 
> org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:111)
>     at 
> org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:308)
>     at 
> org.apache.spark.sql.hudi.HoodieSqlCommonUtils$.getAllPartitionPaths(HoodieSqlCommonUtils.scala:81)
>     at 
> org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable.getPartitionPaths(HoodieCatalogTable.scala:157)
>     at 
> org.apache.spark.sql.hudi.command.ShowHoodieTablePartitionsCommand.run(ShowHoodieTablePartitionsCommand.scala:51)
>     at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>     at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
>     at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
>     at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>     at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
>     at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (HUDI-3681) Provision additional bundles aliased to Spark minor version

2022-03-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-3681.
---
Resolution: Fixed

> Provision additional bundles aliased to Spark minor version
> ---
>
> Key: HUDI-3681
> URL: https://issues.apache.org/jira/browse/HUDI-3681
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Alexey Kudinkin
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> To follow on the updated model of compatibility with Spark, we need to 
> provision additional bundles as part of the 0.11 release referencing Spark 
> minor version in the bundle name ("spark3.1", "spark3.2", "spark2.4") 
> As such following bundles will be released in 0.11
>  * "spark2.4" (Scala 2.11/2.12)
>  * "spark3.1"
>  * "spark3.2"
>  * "spark3" (this is the same as "spark3.2")
>  * "spark2" (Scala 2.11/2.12, this is the same as "spark2.4")
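
The alias relationship in the list above can be sketched as a small lookup. This is only an illustration of the naming scheme described in the issue: the class name SparkBundleAlias and the resolve method are hypothetical and are not part of Hudi or its build.

```java
import java.util.Map;

// Hypothetical helper, not part of Hudi: illustrates the alias scheme from the issue text.
final class SparkBundleAlias {
    // Alias -> minor-version bundle name, exactly as listed in the issue description.
    private static final Map<String, String> ALIASES = Map.of(
        "spark3", "spark3.2",
        "spark2", "spark2.4");

    /** Returns the minor-version bundle an alias points to, or the input itself if it is not an alias. */
    static String resolve(String bundle) {
        return ALIASES.getOrDefault(bundle, bundle);
    }

    public static void main(String[] args) {
        for (String b : new String[] {"spark2", "spark2.4", "spark3", "spark3.1", "spark3.2"}) {
            System.out.println(b + " -> " + resolve(b));
        }
    }
}
```

Names that are already minor-version bundles ("spark3.1", for instance) pass through unchanged, so a caller never needs to know whether it was handed an alias.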



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (HUDI-3700) Revisit hudi-utilities-bundle build wrt Spark versions

2022-03-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-3700.
---
Resolution: Fixed

> Revisit hudi-utilities-bundle build wrt Spark versions
> --
>
> Key: HUDI-3700
> URL: https://issues.apache.org/jira/browse/HUDI-3700
> Project: Apache Hudi
>  Issue Type: Task
>  Components: dependencies, spark
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> When we build hudi-utilities-bundle, the Spark profile can affect the bundle 
> jar.  This causes incompatibility between hudi-utilities-bundle and some 
> Spark versions.  When the hudi-utilities-bundle is built with the Spark 
> version that is going to be used for the ingestion, there is no error.
>  
> For example:
> When running deltastreamer with hudi-utilities-bundle_2.12-0.10.1.jar using 
> Spark 3.1.2, the ingestion job throws java.lang.ClassNotFoundException: 
> org.apache.spark.sql.adapter.Spark3Adapter.
> {code:java}
> /Users/ethan/Work/lib/spark-3.1.2-bin-hadoop3.2/bin/spark-submit \
>         --master local[6] \
>         --driver-memory 6g --executor-memory 2g --num-executors 6 
> --executor-cores 1 \
>         --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
>         --conf spark.sql.catalogImplementation=hive \
>         --conf spark.driver.maxResultSize=1g \
>         --conf spark.speculation=true \
>         --conf spark.speculation.multiplier=1.0 \
>         --conf spark.speculation.quantile=0.5 \
>         --conf spark.ui.port=6679 \
>         --conf spark.eventLog.enabled=true \
>         --conf spark.eventLog.dir=/Users/ethan/Work/data/hudi/spark-logs \
>         --packages org.apache.spark:spark-avro_2.12:3.1.2 \
>         --jars 
> /Users/ethan/Work/repo/hudi-benchmarks/target/hudi-benchmarks-0.1-SNAPSHOT.jar
>  \
>         --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
>         
> /Users/ethan/Work/lib/hudi_releases/0.10.1/hudi-utilities-bundle_2.12-0.10.1.jar
>  \
>         --props 
> /Users/ethan/Work/scripts/hbase-upgrade-testing/hudi_0_10_1_cow/ds_cow_before.properties
>  \
>         --source-class BenchmarkDataSource \
>         --source-ordering-field ts \
>         --target-base-path 
> file:/Users/ethan/Work/scripts/hbase-upgrade-testing/hudi_0_10_1_cow/test_table
>  \
>         --target-table test_table \
>         --table-type COPY_ON_WRITE \
>         --op INSERT >> ds_before.log 2>&1 {code}
>  
> {code:java}
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.spark.sql.adapter.Spark3Adapter
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>     at 
> org.apache.hudi.SparkAdapterSupport.sparkAdapter(SparkAdapterSupport.scala:35)
>     at 
> org.apache.hudi.SparkAdapterSupport.sparkAdapter$(SparkAdapterSupport.scala:29)
>     at 
> org.apache.hudi.HoodieSparkUtils$.sparkAdapter$lzycompute(HoodieSparkUtils.scala:48)
>     at 
> org.apache.hudi.HoodieSparkUtils$.sparkAdapter(HoodieSparkUtils.scala:48)
>     at 
> org.apache.hudi.HoodieSparkUtils$.createRddInternal(HoodieSparkUtils.scala:144)
>     at org.apache.hudi.HoodieSparkUtils$.createRdd(HoodieSparkUtils.scala:136)
>     at org.apache.hudi.HoodieSparkUtils.createRdd(HoodieSparkUtils.scala)
>     at 
> org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.lambda$fetchNewDataInAvroFormat$1(SourceFormatAdapter.java:79)
>     at org.apache.hudi.common.util.Option.map(Option.java:107)
>     at 
> org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:70)
>     at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:425)
>     at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:290)
>     at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:193)
>     at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
>     at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:191)
>     at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:514)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at 
> org.apache.spa

[jira] [Closed] (HUDI-3750) Fix NPE when build HoodieFileIndex

2022-03-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-3750.
---
Resolution: Fixed

> Fix NPE when build HoodieFileIndex
> --
>
> Key: HUDI-3750
> URL: https://issues.apache.org/jira/browse/HUDI-3750
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink, spark
>Reporter: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Fix the NPE thrown when Spark reads a table that was initialized by Flink.
> The configuration *hoodie.datasource.write.partitionpath.field (unpartitioned 
> table)* is not written into hoodie.properties;
> or, when Flink calls initTable, set hoodie.datasource.write.partitionpath.field 
> to an empty string
> [https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java#L267]
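
As a rough sketch of the failure mode described above, a reader can treat a missing key and an empty string the same way, as "unpartitioned". The property key is taken from the issue text; the class PartitionFieldsSketch and its method are hypothetical, and this is not the actual change made in HoodieFileIndex.scala.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the patch applied in Hudi.
final class PartitionFieldsSketch {
    // Key as named in the issue description.
    static final String KEY = "hoodie.datasource.write.partitionpath.field";

    /** Returns the partition fields, or an empty list when the key is absent or blank. */
    static List<String> partitionFields(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            // Absent key (never written) and empty string both mean "unpartitioned".
            return Collections.emptyList();
        }
        return Arrays.stream(raw.split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Properties missing = new Properties();
        Properties empty = new Properties();
        empty.setProperty(KEY, "");
        Properties set = new Properties();
        set.setProperty(KEY, "region,dt");
        System.out.println(partitionFields(missing));
        System.out.println(partitionFields(empty));
        System.out.println(partitionFields(set));
    }
}
```

Handling both cases on the read side means the reader does not depend on which engine initialized the table.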



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3750) Fix NPE when build HoodieFileIndex

2022-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3750:
-
Labels: pull-request-available  (was: )

> Fix NPE when build HoodieFileIndex
> --
>
> Key: HUDI-3750
> URL: https://issues.apache.org/jira/browse/HUDI-3750
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink, spark
>Reporter: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Fix the NPE thrown when Spark reads a table that was initialized by Flink.
> The configuration *hoodie.datasource.write.partitionpath.field (unpartitioned 
> table)* is not written into hoodie.properties;
> or, when Flink calls initTable, set hoodie.datasource.write.partitionpath.field 
> to an empty string
> [https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java#L267]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[hudi] branch master updated (d80c806 -> 2c4554f)

2022-03-30 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from d80c806  [MINOR] Fixing flakiness in 
TestHoodieSparkMergeOnReadTableRollback.testRollbackWithDeltaAndCompactionCommit
 (#5183)
 add 2c4554f  [HUDI-3750] Fix NPE when build HoodieFileIndex (#5134)

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/hudi/HoodieFileIndex.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

