[GitHub] [hudi] junyuc25 closed pull request #6466: Shutdown CloudWatch reporter when query completes

2022-09-02 Thread GitBox


junyuc25 closed pull request #6466: Shutdown CloudWatch reporter when query 
completes
URL: https://github.com/apache/hudi/pull/6466


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


hudi-bot commented on PR #6566:
URL: https://github.com/apache/hudi/pull/6566#issuecomment-1236054693

   
   ## CI report:
   
   * b10c9d062f03c2c2675866c6f4bf6346dc03ea49 UNKNOWN
   * a2dcd81f74603e88c4db895900d43eee6702a6da UNKNOWN
   * c404647afc6d26bc0e69a7a8ef93f378b397bb96 UNKNOWN
   * e8100c4d856971de8dd42ba239a4f029d6ce676e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5)
 
   * 257a2f2acf08448c082c89510cd731b4d8f1b877 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11130)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-02 Thread GitBox


hudi-bot commented on PR #6550:
URL: https://github.com/apache/hudi/pull/6550#issuecomment-1236054684

   
   ## CI report:
   
   * 2e05253a64130a6a74ad67e639acc12b3319187b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11126)
 
   * 684ca9bdec8d75a27bf78ec09bf2ba31f67bdda4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11132)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6016: [HUDI-4465] Optimizing file-listing sequence of Metadata Table

2022-09-02 Thread GitBox


hudi-bot commented on PR #6016:
URL: https://github.com/apache/hudi/pull/6016#issuecomment-1236054584

   
   ## CI report:
   
   * ea2c947722271521c860ee1244586654b20cead0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11129)
 
   * 8be611a24503d2bd26a924b815b0b92aac4e787a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11131)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


hudi-bot commented on PR #6566:
URL: https://github.com/apache/hudi/pull/6566#issuecomment-1236054019

   
   ## CI report:
   
   * b10c9d062f03c2c2675866c6f4bf6346dc03ea49 UNKNOWN
   * a2dcd81f74603e88c4db895900d43eee6702a6da UNKNOWN
   * c404647afc6d26bc0e69a7a8ef93f378b397bb96 UNKNOWN
   * e8100c4d856971de8dd42ba239a4f029d6ce676e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5)
 
   * 257a2f2acf08448c082c89510cd731b4d8f1b877 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-02 Thread GitBox


hudi-bot commented on PR #6550:
URL: https://github.com/apache/hudi/pull/6550#issuecomment-1236054014

   
   ## CI report:
   
   * 2e05253a64130a6a74ad67e639acc12b3319187b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11126)
 
   * 684ca9bdec8d75a27bf78ec09bf2ba31f67bdda4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6016: [HUDI-4465] Optimizing file-listing sequence of Metadata Table

2022-09-02 Thread GitBox


hudi-bot commented on PR #6016:
URL: https://github.com/apache/hudi/pull/6016#issuecomment-1236053887

   
   ## CI report:
   
   * ea2c947722271521c860ee1244586654b20cead0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11129)
 
   * 8be611a24503d2bd26a924b815b0b92aac4e787a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] voonhous commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


voonhous commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962105607


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -215,37 +217,63 @@ protected Pair 
startService() {
   }, executor), executor);
 }
 
+/**
+ * Follows the same execution methodology of HoodieFlinkCompactor, where 
only one clustering job is allowed to be
+ * executed at any point in time.
+ * 
+ * If there is an inflight clustering job, it will be rolled back and 
re-attempted.
+ * 
+ * A clustering plan will be generated if `schedule` is true.
+ *
+ * @throws Exception
+ * @see HoodieFlinkCompactor
+ */
 private void cluster() throws Exception {
   table.getMetaClient().reloadActiveTimeline();
 
-  // judges whether there are operations
-  // to compute the clustering instant time and exec clustering.
   if (cfg.schedule) {
+// create a clustering plan on the timeline
 ClusteringUtil.validateClusteringScheduling(conf);
-String clusteringInstantTime = 
HoodieActiveTimeline.createNewInstantTime();
+
+String clusteringInstantTime = cfg.clusteringInstantTime != null ? 
cfg.clusteringInstantTime
+: HoodieActiveTimeline.createNewInstantTime();
+
+LOG.info("Creating a clustering plan for instant [" + 
clusteringInstantTime + "]");
 boolean scheduled = 
writeClient.scheduleClusteringAtInstant(clusteringInstantTime, Option.empty());
 if (!scheduled) {
   // do nothing.
   LOG.info("No clustering plan for this job");
+  executeDummyPipeline();
   return;
 }
 table.getMetaClient().reloadActiveTimeline();
   }
 
   // fetch the instant based on the configured execution sequence
-  List<HoodieInstant> instants = ClusteringUtils.getPendingClusteringInstantTimes(table.getMetaClient()).stream()
-      .filter(instant -> instant.getState() == HoodieInstant.State.REQUESTED).collect(Collectors.toList());
+  List<HoodieInstant> instants = ClusteringUtils.getPendingClusteringInstantTimes(table.getMetaClient());
   if (instants.isEmpty()) {
 // do nothing.
 LOG.info("No clustering plan scheduled, turns on the clustering plan 
schedule with --schedule option");
+executeDummyPipeline();
 return;
   }
 
-  HoodieInstant clusteringInstant = 
CompactionUtil.isLIFO(cfg.clusteringSeq) ? instants.get(instants.size() - 1) : 
instants.get(0);
+  HoodieInstant reqClusteringInstant;
+  if (cfg.clusteringInstantTime != null) {

Review Comment:
   Fixed
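The instant-selection rule shown in the diff above (`CompactionUtil.isLIFO(cfg.clusteringSeq) ? instants.get(instants.size() - 1) : instants.get(0)`) can be sketched standalone in plain Java. The timestamp strings and the `selectInstant` helper below are illustrative stand-ins for Hudi's `HoodieInstant` objects and config handling, not the actual Hudi API:

```java
import java.util.List;

public class InstantSelection {
    // Simplified stand-in for the LIFO/FIFO choice driven by
    // CompactionUtil.isLIFO(cfg.clusteringSeq): LIFO executes the most
    // recently scheduled pending instant first, FIFO the oldest.
    static String selectInstant(List<String> pendingInstants, boolean isLifo) {
        return isLifo
                ? pendingInstants.get(pendingInstants.size() - 1)  // newest instant
                : pendingInstants.get(0);                          // oldest instant
    }

    public static void main(String[] args) {
        List<String> pending = List.of("20220902101010", "20220902111111", "20220902121212");
        System.out.println(selectInstant(pending, true));   // LIFO -> 20220902121212
        System.out.println(selectInstant(pending, false));  // FIFO -> 20220902101010
    }
}
```

The pending-instant list returned by the timeline is ordered oldest-first, which is why LIFO takes the last element.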






[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-02 Thread GitBox


alexeykudinkin commented on code in PR #6550:
URL: https://github.com/apache/hudi/pull/6550#discussion_r962089348


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala:
##
@@ -94,10 +90,10 @@ object HoodieAnalysis {
   //
   // It's critical for this rules to follow in this order, so that 
DataSource V2 to V1 fallback
   // is performed prior to other rules being evaluated
-  rules ++= Seq(dataSourceV2ToV1Fallback, spark3Analysis, 
spark3ResolveReferences, resolveAlterTableCommands)
+  rules ++= Seq(dataSourceV2ToV1Fallback, spark3Analysis, 
resolveAlterTableCommands)
 
 } else if (HoodieSparkUtils.gteqSpark3_1) {
-  val spark31ResolveAlterTableCommandsClass = 
"org.apache.spark.sql.hudi.Spark312ResolveHudiAlterTableCommand"
+  val spark31ResolveAlterTableCommandsClass = 
"org.apache.spark.sql.hudi.Spark31ResolveHudiAlterTableCommand"

Review Comment:
   Yeah, in this case class renames don't have any impact






[GitHub] [hudi] voonhous commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


voonhous commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962104627


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/FlinkClusteringConfig.java:
##
@@ -69,13 +83,14 @@ public class FlinkClusteringConfig extends Configuration {
   required = false)
   public Integer archiveMaxCommits = 30;
 
-  @Parameter(names = {"--schedule", "-sc"}, description = "Not recommended. 
Schedule the clustering plan in this job.\n"
-  + "There is a risk of losing data when scheduling clustering outside the 
writer job.\n"
-  + "Scheduling clustering in the writer job and only let this job do the 
clustering execution is recommended.\n"
-  + "Default is true", required = false)
-  public Boolean schedule = true;
+  @Parameter(names = {"--schedule", "-sc"}, description = "Schedule the 
clustering plan in this job.\n"
+  + "Default is false", required = false)
+  public Boolean schedule = false;
+
+  @Parameter(names = {"--instant-time", "-it"}, description = "Clustering 
Instant time")
+  public String clusteringInstantTime = null;

Review Comment:
   From `HoodieClusteringJob.java`
   ```
   @Parameter(names = {"--instant-time", "-it"}, description = "Clustering 
Instant time, only used when set --mode execute. "
   + "If the instant time is not provided with --mode execute, "
   + "the earliest scheduled clustering instant time is used by 
default. "
   + "When set \"--mode scheduleAndExecute\" this instant-time will be 
ignored.")
   public String clusteringInstantTime = null;
   ```
   
   Should we standardise the parameter? Given that the Spark job already uses 
`--instant-time`, we should keep both parameters the same to avoid 
confusion.






[GitHub] [hudi] voonhous commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


voonhous commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962104352


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -335,5 +391,17 @@ public void shutdownAsyncService(boolean error) {
 public void shutDown() {
   shutdownAsyncService(false);
 }
+
+/**
+ * Execute a dummy pipeline to prevent "no execute() calls" exceptions 
from being thrown if
+ * clustering is not performed.
+ */

Review Comment:
   How do we do that? 
   
   Do we call `shutDown(false)` to achieve this?






[GitHub] [hudi] voonhous commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


voonhous commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962104203


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -215,82 +226,126 @@ protected Pair 
startService() {
   }, executor), executor);
 }
 
+/**
+ * Follows the same execution methodology of HoodieFlinkCompactor, where 
only one clustering job
+ * is allowed to be executed at any point in time.
+ * 
+ * If there is an inflight clustering job, it will be rolled back and 
re-attempted.
+ * 
+ * A clustering plan will be generated if `schedule` is true.
+ *
+ * @throws Exception
+ * @see HoodieFlinkCompactor
+ */
 private void cluster() throws Exception {
   table.getMetaClient().reloadActiveTimeline();
 
-  // judges whether there are operations
-  // to compute the clustering instant time and exec clustering.
   if (cfg.schedule) {
+// create a clustering plan on the timeline
 ClusteringUtil.validateClusteringScheduling(conf);
-String clusteringInstantTime = 
HoodieActiveTimeline.createNewInstantTime();
-boolean scheduled = 
writeClient.scheduleClusteringAtInstant(clusteringInstantTime, Option.empty());
+
+String clusteringInstantTime = cfg.clusteringInstantTime != null ? 
cfg.clusteringInstantTime
+: HoodieActiveTimeline.createNewInstantTime();
+
+LOG.info("Creating a clustering plan for instant [" + 
clusteringInstantTime + "]");
+boolean scheduled = 
writeClient.scheduleClusteringAtInstant(clusteringInstantTime,
+Option.empty());
 if (!scheduled) {
   // do nothing.
   LOG.info("No clustering plan for this job");
+  executeDummyPipeline();
   return;
 }
 table.getMetaClient().reloadActiveTimeline();
   }
 
   // fetch the instant based on the configured execution sequence
-  List<HoodieInstant> instants = ClusteringUtils.getPendingClusteringInstantTimes(table.getMetaClient()).stream()
-      .filter(instant -> instant.getState() == HoodieInstant.State.REQUESTED).collect(Collectors.toList());
+  List<HoodieInstant> instants = ClusteringUtils.getPendingClusteringInstantTimes(
+      table.getMetaClient());
   if (instants.isEmpty()) {
 // do nothing.
-LOG.info("No clustering plan scheduled, turns on the clustering plan 
schedule with --schedule option");
+LOG.info(
+"No clustering plan scheduled, turns on the clustering plan 
schedule with --schedule option");
+executeDummyPipeline();
 return;
   }
 
-  HoodieInstant clusteringInstant = 
CompactionUtil.isLIFO(cfg.clusteringSeq) ? instants.get(instants.size() - 1) : 
instants.get(0);
+  HoodieInstant reqClusteringInstant;
+  if (cfg.clusteringInstantTime != null) {
+List<HoodieInstant> reqHoodieInstant = instants
+.stream()
+.filter(i -> i.getTimestamp().equals(cfg.clusteringInstantTime))

Review Comment:
   Using this instead:
   
   ```java
   reqClusteringInstant = instants.stream()
   .filter(i -> i.getTimestamp().equals(cfg.clusteringInstantTime))
   .findFirst()
   .orElseThrow(() -> new HoodieException("Clustering instant [" + cfg.clusteringInstantTime + "] not found"));
   ```
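The `filter(...).findFirst().orElseThrow(...)` pipeline suggested above can be exercised in isolation. The plain `String` timestamps and `IllegalStateException` below are simplified placeholders for Hudi's `HoodieInstant` and `HoodieException` types:

```java
import java.util.List;

public class FindInstant {
    // Mirrors the suggested stream pipeline: pick the pending instant whose
    // timestamp matches the one requested on the command line, or fail fast
    // with a descriptive error instead of silently clustering the wrong instant.
    static String findInstant(List<String> pendingInstants, String requestedTime) {
        return pendingInstants.stream()
                .filter(t -> t.equals(requestedTime))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException(
                        "Clustering instant [" + requestedTime + "] not found"));
    }

    public static void main(String[] args) {
        List<String> pending = List.of("20220902101010", "20220902121212");
        System.out.println(findInstant(pending, "20220902121212"));
    }
}
```

Failing fast here surfaces a mistyped `--instant-time` immediately, rather than falling through to some other pending instant.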






[GitHub] [hudi] hudi-bot commented on pull request #6016: [HUDI-4465] Optimizing file-listing sequence of Metadata Table

2022-09-02 Thread GitBox


hudi-bot commented on PR #6016:
URL: https://github.com/apache/hudi/pull/6016#issuecomment-1236047005

   
   ## CI report:
   
   * ea2c947722271521c860ee1244586654b20cead0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11129)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6016: [HUDI-4465] Optimizing file-listing sequence of Metadata Table

2022-09-02 Thread GitBox


hudi-bot commented on PR #6016:
URL: https://github.com/apache/hudi/pull/6016#issuecomment-1236045116

   
   ## CI report:
   
   * 95ce817e050387177ba9620d33868eae1d04306c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11127)
 
   * ea2c947722271521c860ee1244586654b20cead0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11129)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] cshuo commented on a diff in pull request #6325: [MINOR] Improve flink dummySink's parallelism

2022-09-02 Thread GitBox


cshuo commented on code in PR #6325:
URL: https://github.com/apache/hudi/pull/6325#discussion_r962096207


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java:
##
@@ -432,7 +432,9 @@ public static DataStreamSink clean(Configuration 
conf, DataStream dummySink(DataStream 
dataStream) {
-return dataStream.addSink(Pipelines.DummySink.INSTANCE).name("dummy");
+return dataStream.addSink(Pipelines.DummySink.INSTANCE)
+.setParallelism(1)
+.name("dummy");

Review Comment:
   Sorry to jump in... But I think it would be more appropriate to set the parallelism of 
the dummy sink to `FlinkOptions.WRITE_TASKS`, so that the dummy sink can be chained 
with the hoodie_write_task, which would reduce resource cost in some cases, e.g., 
when slot sharing is disabled.






[GitHub] [hudi] hudi-bot commented on pull request #6016: [HUDI-4465] Optimizing file-listing sequence of Metadata Table

2022-09-02 Thread GitBox


hudi-bot commented on PR #6016:
URL: https://github.com/apache/hudi/pull/6016#issuecomment-1236038010

   
   ## CI report:
   
   * 95ce817e050387177ba9620d33868eae1d04306c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11127)
 
   * ea2c947722271521c860ee1244586654b20cead0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-09-02 Thread GitBox


hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1236035286

   
   ## CI report:
   
   * 8915ca346137d319276026dd7aa396a9c7bd2b29 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11128)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] danny0405 commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


danny0405 commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962089533


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -215,37 +217,63 @@ protected Pair 
startService() {
   }, executor), executor);
 }
 
+/**
+ * Follows the same execution methodology of HoodieFlinkCompactor, where 
only one clustering job is allowed to be
+ * executed at any point in time.
+ * 
+ * If there is an inflight clustering job, it will be rolled back and 
re-attempted.
+ * 
+ * A clustering plan will be generated if `schedule` is true.
+ *
+ * @throws Exception
+ * @see HoodieFlinkCompactor
+ */
 private void cluster() throws Exception {
   table.getMetaClient().reloadActiveTimeline();
 
-  // judges whether there are operations
-  // to compute the clustering instant time and exec clustering.
   if (cfg.schedule) {
+// create a clustering plan on the timeline
 ClusteringUtil.validateClusteringScheduling(conf);
-String clusteringInstantTime = 
HoodieActiveTimeline.createNewInstantTime();
+
+String clusteringInstantTime = cfg.clusteringInstantTime != null ? 
cfg.clusteringInstantTime
+: HoodieActiveTimeline.createNewInstantTime();
+
+LOG.info("Creating a clustering plan for instant [" + 
clusteringInstantTime + "]");
 boolean scheduled = 
writeClient.scheduleClusteringAtInstant(clusteringInstantTime, Option.empty());
 if (!scheduled) {
   // do nothing.
   LOG.info("No clustering plan for this job");
+  executeDummyPipeline();
   return;
 }
 table.getMetaClient().reloadActiveTimeline();
   }
 
   // fetch the instant based on the configured execution sequence
-  List<HoodieInstant> instants = ClusteringUtils.getPendingClusteringInstantTimes(table.getMetaClient()).stream()
-      .filter(instant -> instant.getState() == HoodieInstant.State.REQUESTED).collect(Collectors.toList());
+  List<HoodieInstant> instants = ClusteringUtils.getPendingClusteringInstantTimes(table.getMetaClient());
   if (instants.isEmpty()) {
 // do nothing.
 LOG.info("No clustering plan scheduled, turns on the clustering plan 
schedule with --schedule option");
+executeDummyPipeline();
 return;
   }
 
-  HoodieInstant clusteringInstant = 
CompactionUtil.isLIFO(cfg.clusteringSeq) ? instants.get(instants.size() - 1) : 
instants.get(0);
+  HoodieInstant reqClusteringInstant;
+  if (cfg.clusteringInstantTime != null) {

Review Comment:
   Let's name it back to `clusteringInstant`






[GitHub] [hudi] danny0405 commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


danny0405 commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962089276


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -259,26 +271,31 @@ private void compact() throws Exception {
   if (compactionPlans.isEmpty()) {
 // No compaction plan, do nothing and return.
 LOG.info("No compaction plan for instant " + String.join(",", 
compactionInstantTimes));
+executeDummyPipeline();
 return;
   }
 
-  List<HoodieInstant> instants = compactionInstantTimes.stream().map(HoodieTimeline::getCompactionRequestedInstant).collect(Collectors.toList());
+  List<HoodieInstant> instants = compactionInstantTimes.stream()
+      .map(HoodieTimeline::getCompactionRequestedInstant).collect(Collectors.toList());
   for (HoodieInstant instant : instants) {
 if (!pendingCompactionTimeline.containsInstant(instant)) {
   // this means that the compaction plan was written to auxiliary 
path(.tmp)
   // but not the meta path(.hoodie), this usually happens when the job 
crush
   // exceptionally.
   // clean the compaction plan in auxiliary path and cancels the 
compaction.
-  LOG.warn("The compaction plan was fetched through the auxiliary 
path(.tmp) but not the meta path(.hoodie).\n"
-  + "Clean the compaction plan in auxiliary path and cancels the 
compaction");
+  LOG.warn(
+  "The compaction plan was fetched through the auxiliary 
path(.tmp) but not the meta path(.hoodie).\n"
+  + "Clean the compaction plan in auxiliary path and cancels 
the compaction");
   CompactionUtil.cleanInstant(table.getMetaClient(), instant);

Review Comment:
   Revert all the unnecessary change.



##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -259,26 +271,31 @@ private void compact() throws Exception {
   if (compactionPlans.isEmpty()) {
 // No compaction plan, do nothing and return.
 LOG.info("No compaction plan for instant " + String.join(",", 
compactionInstantTimes));
+executeDummyPipeline();
 return;
   }
 
-  List<HoodieInstant> instants = compactionInstantTimes.stream().map(HoodieTimeline::getCompactionRequestedInstant).collect(Collectors.toList());
+  List<HoodieInstant> instants = compactionInstantTimes.stream()
+      .map(HoodieTimeline::getCompactionRequestedInstant).collect(Collectors.toList());
   for (HoodieInstant instant : instants) {
 if (!pendingCompactionTimeline.containsInstant(instant)) {
   // this means that the compaction plan was written to auxiliary 
path(.tmp)
   // but not the meta path(.hoodie), this usually happens when the job 
crush
   // exceptionally.
   // clean the compaction plan in auxiliary path and cancels the 
compaction.
-  LOG.warn("The compaction plan was fetched through the auxiliary 
path(.tmp) but not the meta path(.hoodie).\n"
-  + "Clean the compaction plan in auxiliary path and cancels the 
compaction");
+  LOG.warn(
+  "The compaction plan was fetched through the auxiliary 
path(.tmp) but not the meta path(.hoodie).\n"
+  + "Clean the compaction plan in auxiliary path and cancels 
the compaction");
   CompactionUtil.cleanInstant(table.getMetaClient(), instant);
+  executeDummyPipeline();
   return;
 }
   }
 
   // get compactionParallelism.
   int compactionParallelism = 
conf.getInteger(FlinkOptions.COMPACTION_TASKS) == -1
-  ? Math.toIntExact(compactionPlans.stream().mapToLong(pair -> 
pair.getRight().getOperations().size()).sum())
+  ? Math.toIntExact(
+  compactionPlans.stream().mapToLong(pair -> 
pair.getRight().getOperations().size()).sum())

Review Comment:
   Revert all the unnecessary change.






[GitHub] [hudi] danny0405 commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


danny0405 commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962089264


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -259,26 +271,31 @@ private void compact() throws Exception {
   if (compactionPlans.isEmpty()) {
 // No compaction plan, do nothing and return.
 LOG.info("No compaction plan for instant " + String.join(",", 
compactionInstantTimes));
+executeDummyPipeline();
 return;
   }
 
-  List<HoodieInstant> instants = compactionInstantTimes.stream().map(HoodieTimeline::getCompactionRequestedInstant).collect(Collectors.toList());
+  List<HoodieInstant> instants = compactionInstantTimes.stream()
+      .map(HoodieTimeline::getCompactionRequestedInstant).collect(Collectors.toList());

Review Comment:
   Revert all the unnecessary change.






[GitHub] [hudi] danny0405 commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


danny0405 commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962089250


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -247,9 +257,11 @@ private void compact() throws Exception {
   List<Pair<String, HoodieCompactionPlan>> compactionPlans = compactionInstantTimes.stream()
   .map(timestamp -> {
 try {
-  return Pair.of(timestamp, 
CompactionUtils.getCompactionPlan(table.getMetaClient(), timestamp));
+  return Pair.of(timestamp,

Review Comment:
   Revert all the unnecessary changes.



##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -247,9 +257,11 @@ private void compact() throws Exception {
   List> compactionPlans = compactionInstantTimes.stream()
   .map(timestamp -> {
 try {
-  return Pair.of(timestamp, CompactionUtils.getCompactionPlan(table.getMetaClient(), timestamp));
+  return Pair.of(timestamp,
+  CompactionUtils.getCompactionPlan(table.getMetaClient(), timestamp));
 } catch (IOException e) {
-  throw new HoodieException("Get compaction plan at instant " + timestamp + " error", e);
+  throw new HoodieException("Get compaction plan at instant " + timestamp + " error",
+  e);

Review Comment:
   Revert all the unnecessary changes.






[GitHub] [hudi] danny0405 commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


danny0405 commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962089218


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -211,28 +214,35 @@ private void compact() throws Exception {
 
   // checks the compaction plan and do compaction.
   if (cfg.schedule) {
-Option compactionInstantTimeOption = CompactionUtil.getCompactionInstantTime(metaClient);
+Option compactionInstantTimeOption = CompactionUtil.getCompactionInstantTime(
+metaClient);
 if (compactionInstantTimeOption.isPresent()) {
   boolean scheduled = writeClient.scheduleCompactionAtInstant(compactionInstantTimeOption.get(), Option.empty());
   if (!scheduled) {
 // do nothing.
 LOG.info("No compaction plan for this job ");
+executeDummyPipeline();
 return;
   }
   table.getMetaClient().reloadActiveTimeline();
 }
   }
 
   // fetch the instant based on the configured execution sequence
-  HoodieTimeline pendingCompactionTimeline = table.getActiveTimeline().filterPendingCompactionTimeline();
-  List requested = CompactionPlanStrategies.getStrategy(cfg).select(pendingCompactionTimeline);
+  HoodieTimeline pendingCompactionTimeline = table.getActiveTimeline()
+  .filterPendingCompactionTimeline();
+  List requested = CompactionPlanStrategies.getStrategy(cfg)
+  .select(pendingCompactionTimeline);
   if (requested.isEmpty()) {
 // do nothing.
-LOG.info("No compaction plan scheduled, turns on the compaction plan schedule with --schedule option");
+LOG.info(
+"No compaction plan scheduled, turns on the compaction plan schedule with --schedule option");

Review Comment:
   Revert all the unnecessary changes.



##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -211,28 +214,35 @@ private void compact() throws Exception {
 
   // checks the compaction plan and do compaction.
   if (cfg.schedule) {
-Option compactionInstantTimeOption = CompactionUtil.getCompactionInstantTime(metaClient);
+Option compactionInstantTimeOption = CompactionUtil.getCompactionInstantTime(
+metaClient);
 if (compactionInstantTimeOption.isPresent()) {
   boolean scheduled = writeClient.scheduleCompactionAtInstant(compactionInstantTimeOption.get(), Option.empty());
   if (!scheduled) {
 // do nothing.
 LOG.info("No compaction plan for this job ");
+executeDummyPipeline();
 return;
   }
   table.getMetaClient().reloadActiveTimeline();
 }
   }
 
   // fetch the instant based on the configured execution sequence
-  HoodieTimeline pendingCompactionTimeline = table.getActiveTimeline().filterPendingCompactionTimeline();
-  List requested = CompactionPlanStrategies.getStrategy(cfg).select(pendingCompactionTimeline);
+  HoodieTimeline pendingCompactionTimeline = table.getActiveTimeline()
+  .filterPendingCompactionTimeline();
+  List requested = CompactionPlanStrategies.getStrategy(cfg)
+  .select(pendingCompactionTimeline);
   if (requested.isEmpty()) {
 // do nothing.
-LOG.info("No compaction plan scheduled, turns on the compaction plan schedule with --schedule option");
+LOG.info(
+"No compaction plan scheduled, turns on the compaction plan schedule with --schedule option");
+executeDummyPipeline();
 return;
   }
 
-  List compactionInstantTimes = requested.stream().map(HoodieInstant::getTimestamp).collect(Collectors.toList());
+  List compactionInstantTimes = requested.stream().map(HoodieInstant::getTimestamp)
+  .collect(Collectors.toList());

Review Comment:
   Revert all the unnecessary changes.






[GitHub] [hudi] danny0405 commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


danny0405 commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962089161


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -301,7 +332,8 @@ private void cluster() throws Exception {
   long ckpTimeout = env.getCheckpointConfig().getCheckpointTimeout();
   conf.setLong(FlinkOptions.WRITE_COMMIT_ACK_TIMEOUT, ckpTimeout);
 
-  DataStream dataStream = env.addSource(new ClusteringPlanSourceFunction(clusteringInstant.getTimestamp(), clusteringPlan))
+  DataStream dataStream = env.addSource(
+  new ClusteringPlanSourceFunction(clusteringInstant.getTimestamp(), clusteringPlan))

Review Comment:
   Revert all the unnecessary changes.



##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -158,7 +160,8 @@ public static class AsyncCompactionService extends HoodieAsyncTableService {
  */
 private final ExecutorService executor;
 
-public AsyncCompactionService(FlinkCompactionConfig cfg, Configuration conf, StreamExecutionEnvironment env) throws Exception {
+public AsyncCompactionService(FlinkCompactionConfig cfg, Configuration conf,
+StreamExecutionEnvironment env) throws Exception {

Review Comment:
   Revert all the unnecessary changes.



##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -211,28 +214,35 @@ private void compact() throws Exception {
 
   // checks the compaction plan and do compaction.
   if (cfg.schedule) {
-Option compactionInstantTimeOption = CompactionUtil.getCompactionInstantTime(metaClient);
+Option compactionInstantTimeOption = CompactionUtil.getCompactionInstantTime(
+metaClient);

Review Comment:
   Revert all the unnecessary changes.






[GitHub] [hudi] danny0405 commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


danny0405 commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962089147


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -279,18 +310,18 @@ private void cluster() throws Exception {
 // exceptionally.
 
 // clean the clustering plan in auxiliary path and cancels the clustering.
-
 LOG.warn("The clustering plan was fetched through the auxiliary path(.tmp) but not the meta path(.hoodie).\n"
 + "Clean the clustering plan in auxiliary path and cancels the clustering");
 CompactionUtil.cleanInstant(table.getMetaClient(), instant);
+executeDummyPipeline();
 return;
   }
 
   // get clusteringParallelism.
   int clusteringParallelism = conf.getInteger(FlinkOptions.CLUSTERING_TASKS) == -1
   ? clusteringPlan.getInputGroups().size() : conf.getInteger(FlinkOptions.CLUSTERING_TASKS);
 
-  // Mark instant as clustering inflight
+  // mark instant as clustering inflight

Review Comment:
   Revert all the unnecessary changes.






[GitHub] [hudi] danny0405 commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


danny0405 commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962089036


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -211,28 +214,35 @@ private void compact() throws Exception {
 
   // checks the compaction plan and do compaction.
   if (cfg.schedule) {
-Option compactionInstantTimeOption = CompactionUtil.getCompactionInstantTime(metaClient);
+Option compactionInstantTimeOption = CompactionUtil.getCompactionInstantTime(
+metaClient);
 if (compactionInstantTimeOption.isPresent()) {
   boolean scheduled = writeClient.scheduleCompactionAtInstant(compactionInstantTimeOption.get(), Option.empty());
   if (!scheduled) {
 // do nothing.
 LOG.info("No compaction plan for this job ");
+executeDummyPipeline();
 return;
   }
   table.getMetaClient().reloadActiveTimeline();
 }
   }
 
   // fetch the instant based on the configured execution sequence
-  HoodieTimeline pendingCompactionTimeline = table.getActiveTimeline().filterPendingCompactionTimeline();
-  List requested = CompactionPlanStrategies.getStrategy(cfg).select(pendingCompactionTimeline);
+  HoodieTimeline pendingCompactionTimeline = table.getActiveTimeline()
+  .filterPendingCompactionTimeline();
+  List requested = CompactionPlanStrategies.getStrategy(cfg)

Review Comment:
   Revert all the unnecessary changes.






[GitHub] [hudi] danny0405 commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


danny0405 commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962089018


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -335,5 +391,17 @@ public void shutdownAsyncService(boolean error) {
 public void shutDown() {
   shutdownAsyncService(false);
 }
+
+/**
+ * Execute a dummy pipeline to prevent "no execute() calls" exceptions from being thrown if
+ * clustering is not performed.
+ */

Review Comment:
   Should we just stop submitting the job to the cluster then?
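For context, a minimal sketch of what such a dummy pipeline could look like (hypothetical; the class and job name below are ours, not Hudi's). In Flink application mode, a submitted job that never calls `env.execute()` fails with a "no execute() calls" error, so one workaround is to run a trivial source-to-discarding-sink pipeline:

```java
// Hypothetical sketch, not the actual Hudi implementation: run a trivial
// pipeline so env.execute() is always invoked, avoiding Flink's
// "no execute() calls" error when there is no clustering/compaction work.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

public class DummyPipeline {
  public static void executeDummyPipeline(StreamExecutionEnvironment env) throws Exception {
    env.fromElements(0)                    // single throwaway element
        .addSink(new DiscardingSink<>());  // sink that drops every record
    env.execute("dummy-pipeline");
  }
}
```

The reviewer's alternative, simply not submitting the job at all when there is nothing to do, avoids scheduling this no-op work on the cluster entirely.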






[GitHub] [hudi] danny0405 commented on a diff in pull request #6566: [HUDI-4766] Fix HoodieFlinkClusteringJob

2022-09-02 Thread GitBox


danny0405 commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962088631


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/FlinkClusteringConfig.java:
##
@@ -69,13 +83,14 @@ public class FlinkClusteringConfig extends Configuration {
   required = false)
   public Integer archiveMaxCommits = 30;
 
-  @Parameter(names = {"--schedule", "-sc"}, description = "Not recommended. Schedule the clustering plan in this job.\n"
-  + "There is a risk of losing data when scheduling clustering outside the writer job.\n"
-  + "Scheduling clustering in the writer job and only let this job do the clustering execution is recommended.\n"
-  + "Default is true", required = false)
-  public Boolean schedule = true;
+  @Parameter(names = {"--schedule", "-sc"}, description = "Schedule the clustering plan in this job.\n"
+  + "Default is false", required = false)
+  public Boolean schedule = false;
+
+  @Parameter(names = {"--instant-time", "-it"}, description = "Clustering Instant time")
+  public String clusteringInstantTime = null;

Review Comment:
   Let's rename `--instant-time` to `--instant`






[GitHub] [hudi] danny0405 commented on a diff in pull request #6567: [HUDI-4767] Fix non partition table in hudi-flink ignore KEYGEN_CLASS…

2022-09-02 Thread GitBox


danny0405 commented on code in PR #6567:
URL: https://github.com/apache/hudi/pull/6567#discussion_r962087299


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##
@@ -217,31 +217,33 @@ private static void setupHoodieKeyOptions(Configuration conf, CatalogTable table
   }
 }
 
-// tweak the key gen class if possible
-final String[] partitions = conf.getString(FlinkOptions.PARTITION_PATH_FIELD).split(",");
-final String[] pks = conf.getString(FlinkOptions.RECORD_KEY_FIELD).split(",");
-if (partitions.length == 1) {
-  final String partitionField = partitions[0];
-  if (partitionField.isEmpty()) {
-conf.setString(FlinkOptions.KEYGEN_CLASS_NAME, NonpartitionedAvroKeyGenerator.class.getName());
-LOG.info("Table option [{}] is reset to {} because this is a non-partitioned table",
-FlinkOptions.KEYGEN_CLASS_NAME.key(), NonpartitionedAvroKeyGenerator.class.getName());
-return;
+if (StringUtils.isNullOrEmpty(conf.get(FlinkOptions.KEYGEN_CLASS_NAME))) {
+  // tweak the key gen class if possible

Review Comment:
   In https://github.com/apache/hudi/pull/5815, we fixed Spark SQL to use `NonpartitionedKeyGenerator` for non-partitioned tables.






[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-09-02 Thread GitBox


hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1236026461

   
   ## CI report:
   
   * 3e4361accdd100bebd942d54151236ed971046e1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10787)
   * 8915ca346137d319276026dd7aa396a9c7bd2b29 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11128)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-09-02 Thread GitBox


hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1236025829

   
   ## CI report:
   
   * 3e4361accdd100bebd942d54151236ed971046e1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10787)
   * 8915ca346137d319276026dd7aa396a9c7bd2b29 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[hudi] branch asf-site updated: [DOCS] Add support for Apache Doris and StarRocks (#6570)

2022-09-02 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 09a8207f65 [DOCS] Add support for Apache Doris and StarRocks (#6570)
09a8207f65 is described below

commit 09a8207f65a678cd825ecc06e6810f2221b545e1
Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com>
AuthorDate: Fri Sep 2 17:43:32 2022 -0700

[DOCS] Add support for Apache Doris and StarRocks (#6570)
---
 website/docs/query_engine_setup.md |  15 +++
 website/docs/querying_data.md  |   7 +++
 website/static/assets/images/hudi-lake.png | Bin 152033 -> 356832 bytes
 .../version-0.10.0/query_engine_setup.md   |  14 ++
 .../versioned_docs/version-0.10.0/querying_data.md |   8 
 .../version-0.10.1/query_engine_setup.md   |   6 ++
 .../versioned_docs/version-0.10.1/querying_data.md |   4 
 .../version-0.11.0/query_engine_setup.md   |   6 ++
 .../versioned_docs/version-0.11.0/querying_data.md |   4 
 .../version-0.11.1/query_engine_setup.md   |   6 ++
 .../versioned_docs/version-0.11.1/querying_data.md |   5 +
 .../version-0.12.0/query_engine_setup.md   |   6 ++
 .../versioned_docs/version-0.12.0/querying_data.md |   4 
 13 files changed, 85 insertions(+)

diff --git a/website/docs/query_engine_setup.md b/website/docs/query_engine_setup.md
index 63978797a6..b73e8a9d4a 100644
--- a/website/docs/query_engine_setup.md
+++ b/website/docs/query_engine_setup.md
@@ -99,3 +99,18 @@ Hudi tables are supported only when AWS Glue Data Catalog is used. It's not supp
 
 Please refer to [Redshift Spectrum Integration with Apache Hudi](https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html#c-spectrum-column-mapping-hudi)
 for more details.
+
+## Doris
+Copy on Write Tables in Hudi version 0.10.0 can be queried via Doris external tables starting from Doris version 1.1.
+Please refer to [Doris Hudi external table](https://doris.apache.org/docs/ecosystem/external-table/hudi-external-table/ )
+for more details on the setup.
+
+:::note
+The current default supported version of Hudi is 0.10.0 and has not been tested in other versions. More versions will be supported in the future.
+:::
+
+## StarRocks
+Copy on Write tables in Apache Hudi 0.10.0 and above can be queried via StarRocks external tables from StarRocks version 2.2.0.
+Only snapshot queries are supported currently. In future releases Merge on Read tables will also be supported.
+Please refer to [StarRocks Hudi external table](https://docs.starrocks.com/en-us/2.2/using_starrocks/External_table#hudi-external-table)
+for more details on the setup.
diff --git a/website/docs/querying_data.md b/website/docs/querying_data.md
index 024c84f5df..27551e5235 100644
--- a/website/docs/querying_data.md
+++ b/website/docs/querying_data.md
@@ -269,6 +269,11 @@ REFRESH database.table_name
 ## Redshift Spectrum
 To set up Redshift Spectrum for querying Hudi, see the [Query Engine Setup](/docs/next/query_engine_setup#redshift-spectrum) page.
 
+## Doris 
+To set up Doris for querying Hudi, see the [Query Engine Setup](/docs/next/query_engine_setup#doris) page.
+
+## StarRocks
+To set up StarRocks for querying Hudi, see the [Query Engine Setup](/docs/next/query_engine_setup#starrocks) page.
 
 ## Support Matrix
 
@@ -286,6 +291,8 @@ Following tables show whether a given query is supported on specific query engin
 | **Trino** |Y|N|
 | **Impala**|Y|N|
 | **Redshift Spectrum** |Y|N|
+| **Doris** |Y|N|
+| **StarRocks** |Y|N|
 
 
 
diff --git a/website/static/assets/images/hudi-lake.png b/website/static/assets/images/hudi-lake.png
index 4e6f9cf0f3..82e628125c 100644
Binary files a/website/static/assets/images/hudi-lake.png and b/website/static/assets/images/hudi-lake.png differ
diff --git a/website/versioned_docs/version-0.10.0/query_engine_setup.md b/website/versioned_docs/version-0.10.0/query_engine_setup.md
index 6e4f60b496..80197ea528 100644
--- a/website/versioned_docs/version-0.10.0/query_engine_setup.md
+++ b/website/versioned_docs/version-0.10.0/query_engine_setup.md
@@ -81,3 +81,17 @@ Hudi tables are supported only when AWS Glue Data Catalog is used. It's not supp
 Please refer to [Redshift Spectrum Integration with Apache Hudi](https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html#c-spectrum-column-mapping-hudi)
 for more details.
 
+## Doris
+Copy on Write Tables in Hudi version 0.10.0 can be queried via Doris external tables starting from Doris version 1.1.
+Please refer to [Doris Hudi external table](https://doris.apache.org/docs/ecosystem/external-table/hudi-external-table/ )
+for more details on the setup.
+
+:::note
+The current default supp

[GitHub] [hudi] yihua merged pull request #6570: [DOCS] Add support for Apache Doris and StarRocks

2022-09-02 Thread GitBox


yihua merged PR #6570:
URL: https://github.com/apache/hudi/pull/6570





[jira] [Updated] (HUDI-4468) Simplify TimeTravel logic for Spark 3.3

2022-09-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-4468:
--
Sprint:   (was: 2022/09/19)

> Simplify TimeTravel logic for Spark 3.3
> ---
>
> Key: HUDI-4468
> URL: https://issues.apache.org/jira/browse/HUDI-4468
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Shawn Chang
>Assignee: Alexey Kudinkin
>Priority: Major
> Fix For: 0.12.1
>
>
> Existing Hudi relies on .g4 files and antlr classes to make time travel work 
> for Spark 3.2. 
> As time travel is natively supported in Spark 3.3, that logic can be greatly 
> simplified and some of it can also be removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4497) Vet all critical code paths for double-checked locking

2022-09-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-4497:
--
Sprint: 2022/09/19

> Vet all critical code paths for double-checked locking
> --
>
> Key: HUDI-4497
> URL: https://issues.apache.org/jira/browse/HUDI-4497
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Alexey Kudinkin
>Priority: Major
> Fix For: 0.13.0
>
>
> Based on the followup mentioned in 
> https://github.com/apache/hudi/pull/5523#discussion_r927125192



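As background for what "vetting" means here, a minimal sketch of the correct double-checked locking idiom in Java (illustrative class name, not code from Hudi). The `volatile` modifier is the piece such audits typically look for: without it, a second thread may observe a non-null but partially constructed instance.

```java
// Correct double-checked locking: the field must be volatile so that the
// write of the fully constructed object happens-before any unsynchronized read.
public class LazyHolder {
  private static volatile LazyHolder instance;

  private LazyHolder() {}

  public static LazyHolder getInstance() {
    LazyHolder local = instance;           // first check, no lock taken
    if (local == null) {
      synchronized (LazyHolder.class) {
        local = instance;                  // second check, under the lock
        if (local == null) {
          instance = local = new LazyHolder();
        }
      }
    }
    return local;
  }
}
```

Reading `instance` into the local variable first is a small optimization: the common (already initialized) path performs a single volatile read.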


[jira] [Updated] (HUDI-619) Investigate and implement mechanism to have hive/presto/sparksql queries avoid stitching and return null values for hoodie columns

2022-09-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-619:
-
Sprint: 2022/09/05  (was: 2022/09/19)

> Investigate and implement mechanism to have hive/presto/sparksql queries 
> avoid stitching and return null values for hoodie columns 
> ---
>
> Key: HUDI-619
> URL: https://issues.apache.org/jira/browse/HUDI-619
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive, spark, trino-presto
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> This idea was suggested by Vinoth during RFC review. This ticket is to track 
> the feasibility and implementation of it. 





[jira] [Assigned] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-992:
--

Assignee: Ethan Guo  (was: Udit Mehrotra)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, meta-sync
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



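To illustrate the issue above: with hive-style partitioning, the partition values live only in the directory names (`partition_key=partition_value`), so any consumer that cannot recover the column type from a schema falls back to treating every value as a string. A minimal sketch of extracting the raw values from a path (class and method names are ours, for illustration only, not Hudi code):

```java
// Illustrative only: extract hive-style partition key/value pairs from a
// relative file path. The values come back as raw strings; inferring the
// real type (int, date, ...) requires extra schema information, which is
// exactly what the sync path lacks in the scenario described above.
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPathParser {
  public static Map<String, String> parse(String relativePath) {
    Map<String, String> values = new LinkedHashMap<>();
    for (String segment : relativePath.split("/")) {
      int eq = segment.indexOf('=');
      if (eq > 0) { // only hive-style "key=value" segments carry partition data
        values.put(segment.substring(0, eq), segment.substring(eq + 1));
      }
    }
    return values;
  }
}
```

For a path like `year=2022/month=09/part-0001.parquet`, this yields `{year=2022, month=09}`, both as strings regardless of the source column types.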


[jira] [Assigned] (HUDI-4453) Support partition pruning for tables Bootstrapped from Source Hive Style partitioned tables

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-4453:
---

Assignee: Ethan Guo

> Support partition pruning for tables Bootstrapped from Source Hive Style 
> partitioned tables
> ---
>
> Key: HUDI-4453
> URL: https://issues.apache.org/jira/browse/HUDI-4453
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> As of now the *Bootstrap* feature determines the source schema by reading it 
> from the source parquet files => 
> [https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/bootstrap/ParquetBootstrapMetadataHandler.java#L61]
> This does not consider parquet tables which might be Hive style partitioned. 
> Thus, from the source schema partition columns would be missed and not 
> written to the target Hudi table either. Also because of this partition 
> pruning does not work, as we are unable to prune out source partitions. We 
> should improve this logic to determine partition schema correctly from the 
> partition paths in case of hive style partitioned tables and write the 
> partition column values correctly in the target Hudi table.





[jira] [Assigned] (HUDI-3122) presto query failed for bootstrap tables

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-3122:
---

Assignee: Ethan Guo

> presto query failed for bootstrap tables
> 
>
> Key: HUDI-3122
> URL: https://issues.apache.org/jira/browse/HUDI-3122
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: trino-presto
>Reporter: Wenning Ding
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
>  
> {{java.lang.NoClassDefFoundError: 
> org/apache/hudi/org/apache/hadoop/hbase/io/hfile/CacheConfig
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.createReader(HFileBootstrapIndex.java:181)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.access$400(HFileBootstrapIndex.java:76)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader.partitionIndexReader(HFileBootstrapIndex.java:272)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader.fetchBootstrapIndexInfo(HFileBootstrapIndex.java:262)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader.initIndexInfo(HFileBootstrapIndex.java:252)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader.(HFileBootstrapIndex.java:243)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.createReader(HFileBootstrapIndex.java:191)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$addFilesToView$2(AbstractTableFileSystemView.java:137)
> at java.util.HashMap.forEach(HashMap.java:1290)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:134)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:294)
> at 
> java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:281)}}





[jira] [Assigned] (HUDI-915) Partition Columns missing in files upserted after Metadata Bootstrap

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-915:
--

Assignee: Ethan Guo  (was: Udit Mehrotra)

> Partition Columns missing in files upserted after Metadata Bootstrap
> 
>
> Key: HUDI-915
> URL: https://issues.apache.org/jira/browse/HUDI-915
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Common Core
>Affects Versions: 0.9.0
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> This issue happens in when the source data is partitioned using _*hive-style 
> partitioning*_ which is also the default behavior of spark when it writes the 
> data. With this partitioning, the partition column/schema is never stored in 
> the files but instead retrieved on the fly from the file paths which have 
> partition folder in the form *_partition_key=partition_value_*.
> Now, during metadata bootstrap we store only the metadata columns in the hudi 
> table folder. Also the *bootstrap schema* we are computing directly reads 
> schema from the source data file which does not have the *partition column 
> schema* in it. Thus it is not complete.
> All this manifests into issues when we ultimately do *upserts* on these 
> bootstrapped files and they become fully bootstrapped. During upsert time the 
> schema evolves because the upsert dataframe needs to have partition column in 
> it for performing upserts. Thus ultimately the *upserted rows* have the 
> correct partition column value stored, while the other records which are 
> simply copied over from the metadata bootstrap file have missing partition 
> column in them. Thus, we observe a different behavior here with 
> *bootstrapped* vs *non-bootstrapped* tables.
> While this is not at the moment creating issues with *Hive* because it is 
> able to determine the partition columns because of all the metadata it 
> stores, however it creates a problem with other engines like *Spark* where 
> the partition columns will show up as *null* when the upserted files are read.
> Thus, the proposal is to fix the following issues:
>  * When performing bootstrap, figure out the partition schema and store it in 
> the *bootstrap schema* in the commit metadata file. This would provide the 
> following benefits:
>  ** From a completeness perspective this is good so that there is no 
> behavioral changes between bootstrapped vs non-bootstrapped tables.
>  ** In spark bootstrap relation and incremental query relation where we need 
> to figure out the latest schema, one can simply get the accurate schema from 
> the commit metadata file instead of having to determine whether or not 
> partition column is present in the schema obtained from the metadata file and 
> if not figure out the partition schema every time and merge (which can be 
> expensive).
>  * When doing upsert on files that are metadata bootstrapped, the partition 
> column values should be correctly determined and copied to the upserted file 
> to avoid missing and null values.
>  ** Again this is consistent behavior with non-bootstrapped tables and even 
> though Hive seems to somehow handle this, we should consider other engines 
> like *Spark* where it cannot be automatically handled.
>  ** Without this it will be significantly more complicated to be able to 
> provide the partition value on read side in spark, to be able to determine 
> every time whether the partition value is null and somehow filling it in.
>  ** Once the table is fully bootstrapped at some point in future, and the 
> bootstrap commit is say cleaned up and spark querying happens through 
> *parquet* datasource instead of *new bootstrapped datasource*, the *parquet 
> datasource* will return null values wherever it finds the missing partition 
> values. In that case, we have no control over the *parquet* datasource as it 
> is simply reading from the file. 
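To illustrate the hive-style layout described above, where the partition values live only in the folder names and never inside the data files, here is a minimal sketch of recovering those values from a relative partition path. The helper is illustrative only, not Hudi's actual reader code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (not Hudi's implementation): recover partition
// column values from a hive-style path such as "year=2020/month=06".
// Because the values are encoded in the folder names, they are absent
// from the data files and must be filled in at read time.
public class HiveStylePartitions {
  public static Map<String, String> parse(String relativePath) {
    Map<String, String> values = new LinkedHashMap<>();
    for (String segment : relativePath.split("/")) {
      int eq = segment.indexOf('=');
      if (eq > 0) {
        values.put(segment.substring(0, eq), segment.substring(eq + 1));
      }
    }
    return values;
  }
}
```

A reader that skips this step, or an upsert path that copies rows without it, is exactly how the null partition values described in this issue surface.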



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-1001) Add implementation to translate source partition paths when doing metadata bootstrap

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-1001:
---

Assignee: Ethan Guo

> Add implementation to translate source partition paths when doing metadata 
> bootstrap
> 
>
> Key: HUDI-1001
> URL: https://issues.apache.org/jira/browse/HUDI-1001
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: bootstrap
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> While doing metadata bootstrap, we can provide the ability to change the 
> partition-path name. It will still be 1-1 between source and bootstrapped 
> table but we can make the partition-path adhere to hive style.
> E.g. /src_base_path/2020/06/05/ can be mapped to 
> /bootstrap_base_path/ds=2020%2F06%2F05/
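The mapping in the example above can be sketched with plain URL encoding, which turns the slashes of the source path into a single hive-style partition value. The class and method names here are illustrative, not a proposed Hudi API:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative sketch of the proposed 1-1 path translation:
// "2020/06/05" with column name "ds" becomes "ds=2020%2F06%2F05",
// matching the example in the issue description.
public class BootstrapPathTranslator {
  public static String translate(String sourcePartitionPath, String columnName) {
    try {
      // URLEncoder encodes '/' as "%2F", so the whole source path
      // collapses into one hive-style folder name.
      return columnName + "=" + URLEncoder.encode(sourcePartitionPath, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }
}
```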



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-2071) Support Reading Bootstrap MOR RT Table In Spark DataSource Table

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-2071:
---

Assignee: Ethan Guo

> Support Reading Bootstrap MOR RT Table  In Spark DataSource Table
> -
>
> Key: HUDI-2071
> URL: https://issues.apache.org/jira/browse/HUDI-2071
> Project: Apache Hudi
>  Issue Type: Task
>  Components: spark
>Reporter: pengzhiwei
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> Currently the Spark datasource table uses the HoodieBootstrapRelation to read 
> a bootstrap table.
> However, the bootstrap MOR RT table is not supported yet.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-1157) Optimization whether to query Bootstrapped table using HoodieBootstrapRelation vs Sparks Parquet datasource

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-1157:
---

Assignee: Ethan Guo

> Optimization whether to query Bootstrapped table using 
> HoodieBootstrapRelation vs Sparks Parquet datasource
> ---
>
> Key: HUDI-1157
> URL: https://issues.apache.org/jira/browse/HUDI-1157
> Project: Apache Hudi
>  Issue Type: Task
>  Components: bootstrap
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> This has been discussed in 
> [https://github.com/apache/hudi/pull/1702#discussion_r466317612]
> As of now, while querying using *DataSource* we are checking if the table has 
> been bootstrapped by the presence of *bootstrap base path* in 
> *hoodie.properties* file, and based on that query the table using 
> *HoodieBootstrapRelation*  vs *Spark Parquet Data Source*. However, there 
> could be a scenario where all the files in the originally bootstrapped table 
> have either been *upserted/deleted* and thus have been fully bootstrapped and 
> their data has been moved over to the target hoodie table. For such tables, 
> we can start querying them using *Spark Parquet Data Source* which will be 
> faster with all of spark's optimizations.
> So, basically we need a way to check if all of the files have been fully 
> bootstrapped and moved over to the target location.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-619) Investigate and implement mechanism to have hive/presto/sparksql queries avoid stitching and return null values for hoodie columns

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-619:
--

Assignee: Ethan Guo

> Investigate and implement mechanism to have hive/presto/sparksql queries 
> avoid stitching and return null values for hoodie columns 
> ---
>
> Key: HUDI-619
> URL: https://issues.apache.org/jira/browse/HUDI-619
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive, spark, trino-presto
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> This idea is suggested by Vinoth during RFC review. This ticket is to track 
> the feasibility and implementation of it. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-1158) Optimizations in parallelized listing behaviour for markers and bootstrap source files

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-1158:
---

Assignee: Ethan Guo

> Optimizations in parallelized listing behaviour for markers and bootstrap 
> source files
> --
>
> Key: HUDI-1158
> URL: https://issues.apache.org/jira/browse/HUDI-1158
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Udit Mehrotra
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> * Extract out the common inner logic
>  * Parallelize not just at top directory level, but at the leaf partition 
> folders level



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-621) Presto Integration for supporting Bootstrapped table

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-621:
--

Assignee: Ethan Guo  (was: Udit Mehrotra)

> Presto Integration for supporting Bootstrapped table
> 
>
> Key: HUDI-621
> URL: https://issues.apache.org/jira/browse/HUDI-621
> Project: Apache Hudi
>  Issue Type: Task
>  Components: trino-presto
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-4573) Fix HoodieMultiTableDeltaStreamer to write all tables in continuous mode

2022-09-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-4573:
-

Assignee: sivabalan narayanan  (was: Ethan Guo)

> Fix HoodieMultiTableDeltaStreamer to write all tables in continuous mode
> 
>
> Key: HUDI-4573
> URL: https://issues.apache.org/jira/browse/HUDI-4573
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: Ethan Guo
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.12.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-2580) Ability to clean up dangling data files using hudi-cli

2022-09-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2580:
-

Assignee: Sagar Sumit  (was: Ethan Guo)

> Ability to clean up dangling data files using hudi-cli
> --
>
> Key: HUDI-2580
> URL: https://issues.apache.org/jira/browse/HUDI-2580
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: sev:normal, user-support-issues
> Fix For: 0.12.1
>
>
> See https://github.com/apache/hudi/issues/3739
> Scenario: commits archived but data files not cleaned up because the cleaning 
> frequency is less than that of archival.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4739) Wrong value returned when length equals 1

2022-09-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-4739:
--
Reviewers: sivabalan narayanan  (was: Ethan Guo)

> Wrong value returned when length equals 1
> -
>
> Key: HUDI-4739
> URL: https://issues.apache.org/jira/browse/HUDI-4739
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: wuwenchi
>Assignee: wuwenchi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>
> The "KeyGenUtils#extractRecordKeys" function should return only the value 
> corresponding to each key, but when the length is equal to 1, the combined 
> key and value are returned.
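A simplified sketch of the reported behavior (the real KeyGenUtils source differs; the names and the "field:value" key format here are illustrative):

```java
// Illustrative sketch of the bug: splitting a composite record key such
// as "id:1,name:foo" into its values. The buggy shape skips the
// "strip the field name" step when there is exactly one field.
public class RecordKeySketch {
  // Buggy shape: the single-field case returns the raw "key:value" token.
  public static String[] extractBuggy(String recordKey) {
    String[] parts = recordKey.split(",");
    if (parts.length == 1) {
      return parts; // returns "id:1" instead of "1"
    }
    return stripNames(parts);
  }

  // Fixed shape: always strip the field-name prefix.
  public static String[] extractFixed(String recordKey) {
    return stripNames(recordKey.split(","));
  }

  private static String[] stripNames(String[] parts) {
    String[] values = new String[parts.length];
    for (int i = 0; i < parts.length; i++) {
      int sep = parts[i].indexOf(':');
      values[i] = sep < 0 ? parts[i] : parts[i].substring(sep + 1);
    }
    return values;
  }
}
```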



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-3636) Clustering fails due to marker creation failure

2022-09-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-3636:
-

Assignee: Ethan Guo  (was: sivabalan narayanan)

> Clustering fails due to marker creation failure
> ---
>
> Key: HUDI-3636
> URL: https://issues.apache.org/jira/browse/HUDI-3636
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: multi-writer
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>
> Scenario: multi-writer test, one writer doing ingestion with Deltastreamer in 
> continuous mode, COW, inserts, async clustering and cleaning (partitions 
> under 2022/1, 2022/2), another writer with Spark datasource doing backfills 
> to different partitions (2021/12).  
> 0.10.0 no MT, clustering instant is inflight (failing it in the middle before 
> upgrade) ➝ 0.11 MT, with multi-writer configuration the same as before.
> The clustering/replace instant cannot make progress due to marker creation 
> failure, failing the DS ingestion as well.  Need to investigate if this is 
> timeline-server-based marker related or MT related.
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in 
> stage 46.0 failed 1 times, most recent failure: Lost task 2.0 in stage 46.0 
> (TID 277) (192.168.70.231 executor driver): java.lang.RuntimeException: 
> org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file 
> 2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
> Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
> failed: Connection refused (Connection refused)
>     at 
> org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>     at 
> scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
>     at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at scala.collection.Iterator.foreach(Iterator.scala:943)
>     at scala.collection.Iterator.foreach$(Iterator.scala:943)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>     at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>     at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>     at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
>     at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
>     at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
>     at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
>     at scala.collection.AbstractIterator.to(Iterator.scala:1431)
>     at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
>     at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
>     at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
>     at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
>     at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
>     at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
>     at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
>     at 
> org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file 
> 2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
> Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
> failed: Connection refused (Connection refused)
>     at 
> org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.jav

[jira] [Updated] (HUDI-956) Test MOR : Presto Realtime Query with metadata bootstrap

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-956:
---
Sprint:   (was: 2022/08/22)

> Test MOR : Presto Realtime Query with metadata bootstrap
> 
>
> Key: HUDI-956
> URL: https://issues.apache.org/jira/browse/HUDI-956
> Project: Apache Hudi
>  Issue Type: Task
>  Components: trino-presto
>Reporter: Balaji Varadarajan
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-3636) Clustering fails due to marker creation failure

2022-09-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-3636:
-

Assignee: sivabalan narayanan  (was: Ethan Guo)

> Clustering fails due to marker creation failure
> ---
>
> Key: HUDI-3636
> URL: https://issues.apache.org/jira/browse/HUDI-3636
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: multi-writer
>Reporter: Ethan Guo
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>
> Scenario: multi-writer test, one writer doing ingestion with Deltastreamer in 
> continuous mode, COW, inserts, async clustering and cleaning (partitions 
> under 2022/1, 2022/2), another writer with Spark datasource doing backfills 
> to different partitions (2021/12).  
> 0.10.0 no MT, clustering instant is inflight (failing it in the middle before 
> upgrade) ➝ 0.11 MT, with multi-writer configuration the same as before.
> The clustering/replace instant cannot make progress due to marker creation 
> failure, failing the DS ingestion as well.  Need to investigate if this is 
> timeline-server-based marker related or MT related.
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in 
> stage 46.0 failed 1 times, most recent failure: Lost task 2.0 in stage 46.0 
> (TID 277) (192.168.70.231 executor driver): java.lang.RuntimeException: 
> org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file 
> 2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
> Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
> failed: Connection refused (Connection refused)
>     at 
> org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>     at 
> scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
>     at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at scala.collection.Iterator.foreach(Iterator.scala:943)
>     at scala.collection.Iterator.foreach$(Iterator.scala:943)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>     at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>     at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>     at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
>     at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
>     at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
>     at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
>     at scala.collection.AbstractIterator.to(Iterator.scala:1431)
>     at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
>     at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
>     at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
>     at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
>     at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
>     at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
>     at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
>     at 
> org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file 
> 2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
> Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
> failed: Connection refused (Connection refused)
>     at 
> org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIt

[jira] [Updated] (HUDI-955) Test MOR : Presto Read Optimized Query with metadata bootstrap

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-955:
---
Sprint:   (was: 2022/08/22)

> Test MOR : Presto Read Optimized Query with metadata bootstrap
> --
>
> Key: HUDI-955
> URL: https://issues.apache.org/jira/browse/HUDI-955
> Project: Apache Hudi
>  Issue Type: Task
>  Components: trino-presto
>Reporter: Balaji Varadarajan
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4658) Test MOR: Deltastreamer metadata-only and full-record bootstrap operation

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-4658:

Sprint:   (was: 2022/08/22)

> Test MOR: Deltastreamer metadata-only and full-record bootstrap operation
> -
>
> Key: HUDI-4658
> URL: https://issues.apache.org/jira/browse/HUDI-4658
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4653) Test MOR: Spark datasource writing with non-Hudi partitions

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-4653:

Sprint:   (was: 2022/08/22)

> Test MOR: Spark datasource writing with non-Hudi partitions
> ---
>
> Key: HUDI-4653
> URL: https://issues.apache.org/jira/browse/HUDI-4653
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4654) Test MOR: Deltastreamer writing with non-Hudi partitions

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-4654:

Sprint:   (was: 2022/08/22)

> Test MOR: Deltastreamer writing with non-Hudi partitions
> 
>
> Key: HUDI-4654
> URL: https://issues.apache.org/jira/browse/HUDI-4654
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4657) Test MOR: Spark datasource metadata-only and full-record bootstrap operation

2022-09-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-4657:

Sprint:   (was: 2022/08/22)

> Test MOR: Spark datasource metadata-only and full-record bootstrap operation
> 
>
> Key: HUDI-4657
> URL: https://issues.apache.org/jira/browse/HUDI-4657
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4772) Revisit dropped Partition Columns handling

2022-09-02 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-4772:
-

 Summary: Revisit dropped Partition Columns handling
 Key: HUDI-4772
 URL: https://issues.apache.org/jira/browse/HUDI-4772
 Project: Apache Hudi
  Issue Type: Bug
  Components: writer-core
Affects Versions: 0.13.0
Reporter: Alexey Kudinkin
Assignee: Alexey Kudinkin


Currently, dropping partition columns (controlled by 
"hoodie.datasource.write.drop.partition.columns") is handled in a piecemeal 
fashion, which unfortunately may lead to very subtle and hard-to-troubleshoot 
issues when used.

For example, currently in HoodieSparkSqlWriter this affects what is persisted 
as the writer's schema: in case partition columns are dropped from the data 
file, we persist the "reduced" schema as the one that was used by the Writer, 
which is invalid since the Writer was actually using the full schema; the 
partition columns simply weren't persisted in the Data Files (ie dropped, 
since they're already encoded into the partition path).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #6502: HUDI-4722 Added locking metrics for Hudi

2022-09-02 Thread GitBox


hudi-bot commented on PR #6502:
URL: https://github.com/apache/hudi/pull/6502#issuecomment-1235946893

   
   ## CI report:
   
   * fbedf9a29c4c574ad4d69406416dbb057c080345 UNKNOWN
   * 8b1585464429a60d9eff4cfa2cb9f937b1ac6f0d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10956)
 
   * c0ea0012654e4e190bcc7a09dd6836ab64fb8ea2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11125)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6502: HUDI-4722 Added locking metrics for Hudi

2022-09-02 Thread GitBox


hudi-bot commented on PR #6502:
URL: https://github.com/apache/hudi/pull/6502#issuecomment-1235942821

   
   ## CI report:
   
   * fbedf9a29c4c574ad4d69406416dbb057c080345 UNKNOWN
   * 8b1585464429a60d9eff4cfa2cb9f937b1ac6f0d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10956)
 
   * c0ea0012654e4e190bcc7a09dd6836ab64fb8ea2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11125)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] jsbali commented on pull request #6502: HUDI-4722 Added locking metrics for Hudi

2022-09-02 Thread GitBox


jsbali commented on PR #6502:
URL: https://github.com/apache/hudi/pull/6502#issuecomment-1235939365

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6016: [HUDI-4465] Optimizing file-listing sequence of Metadata Table

2022-09-02 Thread GitBox


hudi-bot commented on PR #6016:
URL: https://github.com/apache/hudi/pull/6016#issuecomment-1235900840

   
   ## CI report:
   
   * 95ce817e050387177ba9620d33868eae1d04306c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11127)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6196: [HUDI-4071] Enable schema reconciliation by default

2022-09-02 Thread GitBox


alexeykudinkin commented on code in PR #6196:
URL: https://github.com/apache/hudi/pull/6196#discussion_r961984500


##
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java:
##
@@ -38,7 +38,7 @@ public class HoodieCommonConfig extends HoodieConfig {
 
   public static final ConfigProperty RECONCILE_SCHEMA = ConfigProperty
   .key("hoodie.datasource.write.reconcile.schema")
-  .defaultValue(false)
+  .defaultValue(true)

Review Comment:
   Initially I was not in favor of this change, but thinking about it a 
little more, and especially in light of 
https://github.com/apache/hudi/pull/6358, I think this is the right thing to 
do: for example, after #6358, we'd be allowing writes to go through which 
might have columns dropped in the new batch. Now, there are 2 scenarios based 
on whether the reconciliation is enabled or not:
   
   1. If reconciliation is _enabled_: we will be favoring the table's schema 
and using it as the _writer-schema_. So in that case we will rewrite the 
incoming batch into the table's schema before applying it to the table.
   
   2. If reconciliation is _disabled_: we will be favoring the incoming 
batch's schema and using it as the _writer-schema_. In this case, for example 
for COW, we will be reading the table in its existing schema, but the new base 
files will be written in the writer's schema (ie w/ the column dropped).
   
   Both of these approaches are legitimate and could be preferred in different 
circumstances. What's important here for us is to pick the right default 
setting that would minimize the _surprise effect_. 
   
   Having reflected on this for some time now, I think that enabling 
reconciliation by default makes more sense, as it protects the table's schema 
from accidental mishaps in the incoming batches. And if somebody prefers flow 
#2, they could easily opt in by simply disabling the reconciliation.
   
   WDYT?
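The two flows above can be sketched with a toy model (plain Python, invented names, not Hudi's actual implementation) of how the writer-schema is derived from the `hoodie.datasource.write.reconcile.schema` flag:

```python
# Toy model of writer-schema selection; illustrative only.
def pick_writer_schema(table_schema, batch_schema, reconcile_schema):
    """Return the schema new files would be written with."""
    if reconcile_schema:
        # Flow 1: favor the table's schema; the incoming batch is
        # rewritten into it before being applied to the table.
        return table_schema
    # Flow 2: favor the incoming batch's schema; new base files are
    # written with it, even if columns were dropped upstream.
    return batch_schema

table = ["id", "name", "email"]
batch = ["id", "name"]  # "email" accidentally dropped in the new batch

print(pick_writer_schema(table, batch, reconcile_schema=True))   # ['id', 'name', 'email']
print(pick_writer_schema(table, batch, reconcile_schema=False))  # ['id', 'name']
```

With reconciliation on, the accidental drop cannot propagate into the table's schema, which is the "least surprise" argument above.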



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6502: HUDI-4722 Added locking metrics for Hudi

2022-09-02 Thread GitBox


hudi-bot commented on PR #6502:
URL: https://github.com/apache/hudi/pull/6502#issuecomment-1235865144

   
   ## CI report:
   
   * fbedf9a29c4c574ad4d69406416dbb057c080345 UNKNOWN
   * 8b1585464429a60d9eff4cfa2cb9f937b1ac6f0d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10956)
 
   * c0ea0012654e4e190bcc7a09dd6836ab64fb8ea2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11125)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6016: [HUDI-4465] Optimizing file-listing sequence of Metadata Table

2022-09-02 Thread GitBox


hudi-bot commented on PR #6016:
URL: https://github.com/apache/hudi/pull/6016#issuecomment-1235857674

   
   ## CI report:
   
   * 705660efda3e17a13071c7ab3550daceefa9d3b8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10440)
 
   * 95ce817e050387177ba9620d33868eae1d04306c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11127)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6016: [HUDI-4465] Optimizing file-listing sequence of Metadata Table

2022-09-02 Thread GitBox


hudi-bot commented on PR #6016:
URL: https://github.com/apache/hudi/pull/6016#issuecomment-1235853711

   
   ## CI report:
   
   * 705660efda3e17a13071c7ab3550daceefa9d3b8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10440)
 
   * 95ce817e050387177ba9620d33868eae1d04306c UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6575: [HUDI-4754] Add compliance check in github actions

2022-09-02 Thread GitBox


hudi-bot commented on PR #6575:
URL: https://github.com/apache/hudi/pull/6575#issuecomment-1235850579

   
   ## CI report:
   
   * 1600e31836157c8d05e3bc8b9e08e1717471f1a6 UNKNOWN
   * 4d02f2c64a5fc4b89889677ee639a20b53cec26a UNKNOWN
   * 48147d19c835e7868102fd2d083659e6ee2ac343 UNKNOWN
   * 0cb0b8ff84a5880e2718c9eb177019457b3e00c9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11124)
 
   
   



[GitHub] [hudi] bhasudha commented on pull request #6570: [DOCS] Add support for Apache Doris and StarRocks

2022-09-02 Thread GitBox


bhasudha commented on PR #6570:
URL: https://github.com/apache/hudi/pull/6570#issuecomment-1235831710

   > 
   
   @nsivabalan  The PR for the Redshift Spectrum fix is here: 
https://github.com/apache/hudi/pull/6577 





[GitHub] [hudi] bhasudha opened a new pull request, #6577: [DOCS] Fix Redshift spectrum Hudi version details

2022-09-02 Thread GitBox


bhasudha opened a new pull request, #6577:
URL: https://github.com/apache/hudi/pull/6577

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to 
mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-02 Thread GitBox


hudi-bot commented on PR #6550:
URL: https://github.com/apache/hudi/pull/6550#issuecomment-1235816942

   
   ## CI report:
   
   * 2e05253a64130a6a74ad67e639acc12b3319187b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11126)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-02 Thread GitBox


hudi-bot commented on PR #6550:
URL: https://github.com/apache/hudi/pull/6550#issuecomment-1235813490

   
   ## CI report:
   
   * 8a43a078a076f64b7e66ecb7d9471ec5d7c86646 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11103)
 
   * 2e05253a64130a6a74ad67e639acc12b3319187b UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6502: HUDI-4722 Added locking metrics for Hudi

2022-09-02 Thread GitBox


hudi-bot commented on PR #6502:
URL: https://github.com/apache/hudi/pull/6502#issuecomment-1235813381

   
   ## CI report:
   
   * fbedf9a29c4c574ad4d69406416dbb057c080345 UNKNOWN
   * 8b1585464429a60d9eff4cfa2cb9f937b1ac6f0d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10956)
 
   * c0ea0012654e4e190bcc7a09dd6836ab64fb8ea2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11125)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6502: HUDI-4722 Added locking metrics for Hudi

2022-09-02 Thread GitBox


hudi-bot commented on PR #6502:
URL: https://github.com/apache/hudi/pull/6502#issuecomment-1235809905

   
   ## CI report:
   
   * fbedf9a29c4c574ad4d69406416dbb057c080345 UNKNOWN
   * 8b1585464429a60d9eff4cfa2cb9f937b1ac6f0d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10956)
 
   * c0ea0012654e4e190bcc7a09dd6836ab64fb8ea2 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6575: [HUDI-4754] Add compliance check in github actions

2022-09-02 Thread GitBox


hudi-bot commented on PR #6575:
URL: https://github.com/apache/hudi/pull/6575#issuecomment-1235806516

   
   ## CI report:
   
   * 353d9b6f7c4bcf1defb4eead956a334443c89b31 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11123)
 
   * 1600e31836157c8d05e3bc8b9e08e1717471f1a6 UNKNOWN
   * 4d02f2c64a5fc4b89889677ee639a20b53cec26a UNKNOWN
   * 48147d19c835e7868102fd2d083659e6ee2ac343 UNKNOWN
   * 0cb0b8ff84a5880e2718c9eb177019457b3e00c9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11124)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6442: [HUDI-4449] Support DataSourceV2 Read for Spark3.2

2022-09-02 Thread GitBox


hudi-bot commented on PR #6442:
URL: https://github.com/apache/hudi/pull/6442#issuecomment-1235806130

   
   ## CI report:
   
   * 208824bed3a3c3b2723663a49ab2e8a8c68a095b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11122)
 
   
   



[GitHub] [hudi] jsbali commented on pull request #6502: HUDI-4722 Added locking metrics for Hudi

2022-09-02 Thread GitBox


jsbali commented on PR #6502:
URL: https://github.com/apache/hudi/pull/6502#issuecomment-1235780001

   > 
   
   Yes, I have verified it.





[GitHub] [hudi] jsbali commented on pull request #6502: HUDI-4722 Added locking metrics for Hudi

2022-09-02 Thread GitBox


jsbali commented on PR #6502:
URL: https://github.com/apache/hudi/pull/6502#issuecomment-1235779553

   Thanks for the review, @nsivabalan. I have taken care of all the comments. PTAL.
   





[GitHub] [hudi] bhasudha commented on pull request #6570: [DOCS] Add support for Apache Doris and StarRocks

2022-09-02 Thread GitBox


bhasudha commented on PR #6570:
URL: https://github.com/apache/hudi/pull/6570#issuecomment-1235776907

   > 
   @nsivabalan  Thanks for noting that. I can follow up on Redshift Spectrum 
separately in a different PR after gathering those details. If nothing else is 
blocking, can you approve this one?
   





[GitHub] [hudi] nsivabalan commented on pull request #6570: [DOCS] Add support for Apache Doris and StarRocks

2022-09-02 Thread GitBox


nsivabalan commented on PR #6570:
URL: https://github.com/apache/hudi/pull/6570#issuecomment-1235773755

   For Hudi 0.10.0 or above, I guess there is some minimal version of Redshift 
Spectrum one has to use. Can you add those details, please?





[GitHub] [hudi] jsbali commented on a diff in pull request #6502: HUDI-4722 Added locking metrics for Hudi

2022-09-02 Thread GitBox


jsbali commented on code in PR #6502:
URL: https://github.com/apache/hudi/pull/6502#discussion_r961900715


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java:
##
@@ -470,8 +471,14 @@ protected void preCommit(HoodieInstant inflightInstant, 
HoodieCommitMetadata met
 // Create a Hoodie table after startTxn which encapsulated the commits and 
files visible.
 // Important to create this after the lock to ensure the latest commits 
show up in the timeline without need for reload
 HoodieTable table = createTable(config, hadoopConf);
-TransactionUtils.resolveWriteConflictIfAny(table, 
this.txnManager.getCurrentTransactionOwner(),
-Option.of(metadata), config, 
txnManager.getLastCompletedTransactionOwner(), false, 
this.pendingInflightAndRequestedInstants);
+try {
+  TransactionUtils.resolveWriteConflictIfAny(table, 
this.txnManager.getCurrentTransactionOwner(),
+  Option.of(metadata), config, 
txnManager.getLastCompletedTransactionOwner(), false, 
this.pendingInflightAndRequestedInstants);
+  metrics.emitConflictResolutionSuccessful();

Review Comment:
   Added



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/LockManager.java:
##
@@ -64,10 +69,14 @@ public void lock() {
   boolean acquired = false;
   while (retryCount <= maxRetries) {
 try {
+  metrics.startLockApiTimerContext();
   acquired = 
lockProvider.tryLock(writeConfig.getLockAcquireWaitTimeoutInMs(), 
TimeUnit.MILLISECONDS);
   if (acquired) {
+metrics.updateLockAcquiredMetric();
+metrics.startLockHeldTimerContext();
 break;
   }
+  metrics.updateLockNotAcquiredMetric();

Review Comment:
   Moved it to the catch block as well. But I can't remove it from here, as 
tryLock can simply return false for some implementations rather than throw an 
exception.
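To illustrate why the not-acquired metric has to be recorded on both paths, here is a minimal Python sketch (invented names, not the actual `LockManager`) of a retry loop where `try_lock` can either return false or raise:

```python
# Sketch of a lock-retry loop that records a metric on both failure
# paths: try_lock returning False, and try_lock raising an exception.
def acquire_with_metrics(try_lock, max_retries, metrics):
    for _ in range(max_retries + 1):
        metrics["attempts"] += 1
        try:
            if try_lock():
                metrics["acquired"] += 1
                return True
            # Some lock providers simply return False on failure...
            metrics["not_acquired"] += 1
        except Exception:
            # ...while others signal failure by raising.
            metrics["not_acquired"] += 1
    return False

metrics = {"attempts": 0, "acquired": 0, "not_acquired": 0}
outcomes = iter([False, RuntimeError("lock busy"), True])

def flaky_try_lock():
    # Simulated provider: fails twice (once quietly, once loudly), then succeeds.
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome

acquired = acquire_with_metrics(flaky_try_lock, max_retries=2, metrics=metrics)
print(acquired)  # True
print(metrics)   # {'attempts': 3, 'acquired': 1, 'not_acquired': 2}
```

Counting only in the catch block would miss the quiet `False` path; counting only after the `if` would miss the exception path.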






[GitHub] [hudi] jsbali commented on a diff in pull request #6502: HUDI-4722 Added locking metrics for Hudi

2022-09-02 Thread GitBox


jsbali commented on code in PR #6502:
URL: https://github.com/apache/hudi/pull/6502#discussion_r961900521


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/metrics/HoodieLockMetrics.java:
##
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.transaction.lock.metrics;
+
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.metrics.Metrics;
+
+import com.codahale.metrics.Counter;
+import com.codahale.metrics.MetricRegistry;
+import com.codahale.metrics.SlidingWindowReservoir;
+import com.codahale.metrics.Timer;
+
+import java.util.concurrent.TimeUnit;
+
+public class HoodieLockMetrics {
+
+  private final HoodieWriteConfig writeConfig;
+  private final boolean isMetricsEnabled;
+  private final int keepLastNtimes = 100;
+  private final transient HoodieTimer lockDurationTimer = HoodieTimer.create();
+  private final transient HoodieTimer lockApiRequestDurationTimer = 
HoodieTimer.create();
+  private transient Counter lockAttempts;
+  private transient Counter succesfulLockAttempts;
+  private transient Counter failedLockAttempts;
+  private transient Timer lockDuration;
+  private transient Timer lockApiRequestDuration;
+
+  public HoodieLockMetrics(HoodieWriteConfig writeConfig) {
+this.isMetricsEnabled = writeConfig.isMetricsOn();
+this.writeConfig = writeConfig;
+
+if (writeConfig.isMetricsOn()) {

Review Comment:
   Done added






[GitHub] [hudi] hudi-bot commented on pull request #6098: [HUDI-4389] Make HoodieStreamingSink idempotent

2022-09-02 Thread GitBox


hudi-bot commented on PR #6098:
URL: https://github.com/apache/hudi/pull/6098#issuecomment-1235754483

   
   ## CI report:
   
   * 1d2a193ac4bf4df359d1f6f6de7a3ec4d427025a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11121)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6485: [HUDI-4528] Add diff tool to compare commit metadata

2022-09-02 Thread GitBox


hudi-bot commented on PR #6485:
URL: https://github.com/apache/hudi/pull/6485#issuecomment-1235706796

   
   ## CI report:
   
   * 2f23cec83200a6410e53a1030d26095cae663f61 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11120)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6575: [HUDI-4754] Add compliance check in github actions

2022-09-02 Thread GitBox


hudi-bot commented on PR #6575:
URL: https://github.com/apache/hudi/pull/6575#issuecomment-1235702007

   
   ## CI report:
   
   * 353d9b6f7c4bcf1defb4eead956a334443c89b31 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11123)
 
   * 1600e31836157c8d05e3bc8b9e08e1717471f1a6 UNKNOWN
   * 4d02f2c64a5fc4b89889677ee639a20b53cec26a UNKNOWN
   * 48147d19c835e7868102fd2d083659e6ee2ac343 UNKNOWN
   * 0cb0b8ff84a5880e2718c9eb177019457b3e00c9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11124)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6575: [HUDI-4754] Add compliance check in github actions

2022-09-02 Thread GitBox


hudi-bot commented on PR #6575:
URL: https://github.com/apache/hudi/pull/6575#issuecomment-1235696750

   
   ## CI report:
   
   * 353d9b6f7c4bcf1defb4eead956a334443c89b31 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11123)
 
   * 1600e31836157c8d05e3bc8b9e08e1717471f1a6 UNKNOWN
   * 4d02f2c64a5fc4b89889677ee639a20b53cec26a UNKNOWN
   * 48147d19c835e7868102fd2d083659e6ee2ac343 UNKNOWN
   * 0cb0b8ff84a5880e2718c9eb177019457b3e00c9 UNKNOWN
   
   



[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-02 Thread GitBox


alexeykudinkin commented on code in PR #6550:
URL: https://github.com/apache/hudi/pull/6550#discussion_r961841532


##
pom.xml:
##
@@ -1938,7 +1910,8 @@
 ${scala12.version}
 2.12
 hudi-spark3.2.x
-hudi-spark3-common
+
+
hudi-spark3*-common

Review Comment:
   Yeah, not a big fan of globbing (it's quite brittle). Let me try to have a 
separate property (the only reason I opted for globbing initially was because 
I wasn't sure whether Maven would be able to handle an empty clause, since, 
for example, this parameter would be empty for Spark 2 and 3.1).
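For illustration, the glob-free alternative could look roughly like this for the Spark 3.2 profile (module names are taken from this discussion; the exact layout is an assumption, not the final change):

```xml
<!-- Illustrative sketch only: list each shared module explicitly per
     profile instead of relying on a glob like hudi-spark3*-common. -->
<profile>
  <id>spark3.2</id>
  <modules>
    <module>hudi-spark-datasource/hudi-spark3-common</module>
    <module>hudi-spark-datasource/hudi-spark3.2plus-common</module>
  </modules>
</profile>
```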






[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-02 Thread GitBox


alexeykudinkin commented on code in PR #6550:
URL: https://github.com/apache/hudi/pull/6550#discussion_r961841532


##
pom.xml:
##
@@ -1938,7 +1910,8 @@
 ${scala12.version}
 2.12
 hudi-spark3.2.x
-hudi-spark3-common
+
+
hudi-spark3*-common

Review Comment:
   Yeah, not a big fan of globbing (it's quite brittle). Let me try to have a 
separate module (the only reason I opted for globbing initially was because 
I wasn't sure whether Maven would be able to handle an empty clause, since, 
for example, this parameter would be empty for Spark 2 and 3.1).






[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-02 Thread GitBox


alexeykudinkin commented on code in PR #6550:
URL: https://github.com/apache/hudi/pull/6550#discussion_r961840674


##
hudi-spark-datasource/hudi-spark3.2plus-common/src/main/scala/org/apache/hudi/Spark32PlusDefaultSource.scala:
##
@@ -25,7 +25,7 @@ import org.apache.spark.sql.sources.DataSourceRegister
  *   there are no regressions in performance
  *   Please check out HUDI-4178 for more details
  */
-class Spark3DefaultSource extends DefaultSource with DataSourceRegister /* 
with TableProvider */ {
+class Spark32PlusDefaultSource extends DefaultSource with DataSourceRegister 
/* with TableProvider */ {

Review Comment:
   The plan is to restore it once we migrate to DSv2






[GitHub] [hudi] hudi-bot commented on pull request #6575: [HUDI-4754] Add compliance check in github actions

2022-09-02 Thread GitBox


hudi-bot commented on PR #6575:
URL: https://github.com/apache/hudi/pull/6575#issuecomment-1235691278

   
   ## CI report:
   
   * 353d9b6f7c4bcf1defb4eead956a334443c89b31 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11123)
 
   * 1600e31836157c8d05e3bc8b9e08e1717471f1a6 UNKNOWN
   * 4d02f2c64a5fc4b89889677ee639a20b53cec26a UNKNOWN
   * 48147d19c835e7868102fd2d083659e6ee2ac343 UNKNOWN
   
   



[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-02 Thread GitBox


alexeykudinkin commented on code in PR #6550:
URL: https://github.com/apache/hudi/pull/6550#discussion_r961839256


##
hudi-spark-datasource/hudi-spark3.2.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark32DataSourceUtils.scala:
##
@@ -1,77 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.execution.datasources.parquet
-
-import org.apache.spark.sql.SPARK_VERSION_METADATA_KEY
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy
-import org.apache.spark.util.Utils
-
-object Spark32DataSourceUtils {

Review Comment:
   Nope, not propagating the config was a miss before (because of the 
duplication of these classes, it was handled in 3.2 but not in 3.1).





[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-02 Thread GitBox


alexeykudinkin commented on code in PR #6550:
URL: https://github.com/apache/hudi/pull/6550#discussion_r961838761


##
hudi-spark-datasource/hudi-spark3.1.x/src/main/scala/org/apache/spark/sql/hudi/command/Spark31AlterTableCommand.scala:
##
@@ -52,7 +52,7 @@ import scala.collection.JavaConverters._
 import scala.util.control.NonFatal
 
 // TODO: we should remove this file when we support datasourceV2 for hoodie on 
spark3.1x
-case class AlterTableCommand312(table: CatalogTable, changes: 
Seq[TableChange], changeType: ColumnChangeID) extends RunnableCommand with 
Logging {
+case class Spark31AlterTableCommand(table: CatalogTable, changes: 
Seq[TableChange], changeType: ColumnChangeID) extends RunnableCommand with 
Logging {

Review Comment:
   We did refine our Spark compatibility mode in 0.11: we now promise we'll stay 
compatible w/ ALL versions w/in a minor branch.





[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-02 Thread GitBox


alexeykudinkin commented on code in PR #6550:
URL: https://github.com/apache/hudi/pull/6550#discussion_r961838079


##
hudi-spark-datasource/hudi-spark3.1.x/src/main/scala/org/apache/spark/sql/HoodieSpark31CatalogUtils.scala:
##
@@ -15,19 +15,16 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.catalyst.plans.logical
+package org.apache.spark.sql
 
-import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
+import org.apache.spark.sql.connector.expressions.{BucketTransform, 
NamedReference, Transform}
 
-case class TimeTravelRelation(
-   table: LogicalPlan,
-   timestamp: Option[Expression],
-   version: Option[String]) extends Command {
-  override def children: Seq[LogicalPlan] = Seq.empty
+object HoodieSpark31CatalogUtils extends HoodieSpark3CatalogUtils {
 
-  override def output: Seq[Attribute] = Nil
+  override def unapplyBucketTransform(t: Transform): Option[(Int, 
Seq[NamedReference], Seq[NamedReference])] =
+t match {
+  case BucketTransform(numBuckets, ref) => Some(numBuckets, Seq(ref), 
Seq.empty)

Review Comment:
   Correct, they for whatever reason just submit a single ref
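
   For readers following the thread: the shape difference this comment refers to 
can be sketched with a tiny self-contained extractor. This is an illustrative 
sketch only -- `Ref`, `Bucket31`, and `BucketCols` below are made-up stand-ins 
for Spark's `NamedReference`/`BucketTransform` types, not Hudi's actual API. 
The point is that a Spark 3.1-style transform hands back a single column ref, 
which gets normalized into the common `(numBuckets, bucketCols, sortCols)` 
triple by wrapping the ref in a `Seq` and leaving the sort columns empty:

   ```scala
   // Illustrative stand-ins for Spark's NamedReference / BucketTransform.
   final case class Ref(name: String)
   final case class Bucket31(numBuckets: Int, ref: Ref)

   object BucketCols {
     // Normalize the single-ref shape into (numBuckets, bucketCols, sortCols).
     def unapply(t: Bucket31): Option[(Int, Seq[Ref], Seq[Ref])] =
       Some((t.numBuckets, Seq(t.ref), Seq.empty))
   }

   object BucketDemo extends App {
     Bucket31(4, Ref("id")) match {
       case BucketCols(n, bucketCols, sortCols) =>
         println(s"$n buckets over ${bucketCols.map(_.name)}, sortCols=$sortCols")
     }
   }
   ```

   Later Spark versions expose bucket columns and sort columns separately, which 
is why the common signature carries both `Seq`s even when one is always empty.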






[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-02 Thread GitBox


alexeykudinkin commented on code in PR #6550:
URL: https://github.com/apache/hudi/pull/6550#discussion_r961837710


##
hudi-spark-datasource/hudi-spark/pom.xml:
##
@@ -203,41 +207,6 @@
   hudi-sync-common
   ${project.version}
 
-
-    <dependency>
-      <groupId>org.apache.hudi</groupId>
-      <artifactId>hudi-spark-common_${scala.binary.version}</artifactId>
-      <version>${project.version}</version>
-      <exclusions>
-        <exclusion>
-          <groupId>org.apache.curator</groupId>
-          <artifactId>*</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-
-    <dependency>
-      <groupId>org.apache.hudi</groupId>
-      <artifactId>${hudi.spark.module}_${scala.binary.version}</artifactId>
-      <version>${project.version}</version>
-      <exclusions>
-        <exclusion>
-          <groupId>org.apache.hudi</groupId>
-          <artifactId>*</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-
-    <dependency>
-      <groupId>org.apache.hudi</groupId>
-      <artifactId>${hudi.spark.common.module}</artifactId>

Review Comment:
   It's a transitive dependency, no need to list it directly
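
   As a sketch of why the explicit entry is redundant (artifact names below are 
simplified stand-ins for illustration, not the exact coordinates in this pom): 
once a module declares the dependency that itself depends on the common module, 
Maven puts the common module on the consumer's classpath transitively, so 
listing it a second time adds nothing.

   ```xml
   <!-- Illustrative only: declaring A is enough; B arrives transitively
        because A's own pom declares it as a compile-scope dependency. -->
   <dependency>
     <groupId>org.apache.hudi</groupId>
     <artifactId>hudi-spark_${scala.binary.version}</artifactId> <!-- "A" -->
     <version>${project.version}</version>
   </dependency>
   <!-- no separate entry for the common module ("B") needed here -->
   ```

   Running `mvn dependency:tree` on the module is the usual way to confirm what 
actually lands on the classpath after such a cleanup.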






[GitHub] [hudi] fengjian428 opened a new pull request, #6576: [Draft][WIP][Hudi-4678] [RFC-61] Snapshot view management

2022-09-02 Thread GitBox


fengjian428 opened a new pull request, #6576:
URL: https://github.com/apache/hudi/pull/6576

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to 
mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #6575: [HUDI-4754] Add compliance check in github actions

2022-09-02 Thread GitBox


hudi-bot commented on PR #6575:
URL: https://github.com/apache/hudi/pull/6575#issuecomment-1235637784

   
   ## CI report:
   
   * 353d9b6f7c4bcf1defb4eead956a334443c89b31 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11123)
 
   * 1600e31836157c8d05e3bc8b9e08e1717471f1a6 UNKNOWN
   * 4d02f2c64a5fc4b89889677ee639a20b53cec26a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6575: [HUDI-4754] Add compliance check in github actions

2022-09-02 Thread GitBox


hudi-bot commented on PR #6575:
URL: https://github.com/apache/hudi/pull/6575#issuecomment-1235632474

   
   ## CI report:
   
   * 353d9b6f7c4bcf1defb4eead956a334443c89b31 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11123)
 
   * 1600e31836157c8d05e3bc8b9e08e1717471f1a6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6575: [HUDI-4754] Add compliance check in github actions

2022-09-02 Thread GitBox


hudi-bot commented on PR #6575:
URL: https://github.com/apache/hudi/pull/6575#issuecomment-1235626668

   
   ## CI report:
   
   * 353d9b6f7c4bcf1defb4eead956a334443c89b31 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11123)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6442: [HUDI-4449] Support DataSourceV2 Read for Spark3.2

2022-09-02 Thread GitBox


hudi-bot commented on PR #6442:
URL: https://github.com/apache/hudi/pull/6442#issuecomment-1235626225

   
   ## CI report:
   
   * 30a035fe4b878d05d345614932abe9d4cbcd0051 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6)
 
   * 208824bed3a3c3b2723663a49ab2e8a8c68a095b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11122)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6575: [HUDI-4754] Add compliance check in github actions

2022-09-02 Thread GitBox


hudi-bot commented on PR #6575:
URL: https://github.com/apache/hudi/pull/6575#issuecomment-1235621099

   
   ## CI report:
   
   * 353d9b6f7c4bcf1defb4eead956a334443c89b31 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6442: [HUDI-4449] Support DataSourceV2 Read for Spark3.2

2022-09-02 Thread GitBox


hudi-bot commented on PR #6442:
URL: https://github.com/apache/hudi/pull/6442#issuecomment-1235620591

   
   ## CI report:
   
   * 30a035fe4b878d05d345614932abe9d4cbcd0051 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6)
 
   * 208824bed3a3c3b2723663a49ab2e8a8c68a095b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6569: [HUDI-4648] Support rename partition through CLI

2022-09-02 Thread GitBox


hudi-bot commented on PR #6569:
URL: https://github.com/apache/hudi/pull/6569#issuecomment-1235614805

   
   ## CI report:
   
   * 059a2a712adb33786ecb39164e681ebaec4ecf94 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




