[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8418:
URL: https://github.com/apache/hudi/pull/8418#issuecomment-1510798747

   
   ## CI report:
   
   * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN
   * 79bc2ba070ecbcbdeab6ff88f9424cbd5a6b5e49 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16381)
   * 5f063bb1db0459bb6a62dd55a57279cf45f4e9fa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16385)
   * 596c673ec020e47bff03f76e121fad2c2ae2daee UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-16 Thread via GitHub


hudi-bot commented on PR #7627:
URL: https://github.com/apache/hudi/pull/7627#issuecomment-1510797365

   
   ## CI report:
   
   * 693fe2d347bf62a4340ad5d397d29caf092a5cb9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15267)
   * d5f7e0f51ce768b34aa01ff970da9c187c4f8c16 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16388)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8461: [HUDI-6070] Files pruning for bucket index table pk filtering queries

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8461:
URL: https://github.com/apache/hudi/pull/8461#issuecomment-1510792176

   
   ## CI report:
   
   * 6f63e6704b12a56cbdcef44c34ba5595b163acfa Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16364) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16390)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8437: [HUDI-6066] HoodieTableSource supports parquet predicate push down

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8437:
URL: https://github.com/apache/hudi/pull/8437#issuecomment-1510791996

   
   ## CI report:
   
   * 4fdb9dc536d97832f1dc16dd1c754ce7015b1bc6 UNKNOWN
   * 7308c0d7f8ce875fa9ffcff9af197a87d836cb04 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16287)
   * 6af209c352d9665ad1f8a0243a27f50e0d26b43a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] XuQianJin-Stars commented on pull request #8461: [HUDI-6070] Files pruning for bucket index table pk filtering queries

2023-04-16 Thread via GitHub


XuQianJin-Stars commented on PR #8461:
URL: https://github.com/apache/hudi/pull/8461#issuecomment-1510791508

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-16 Thread via GitHub


hudi-bot commented on PR #7627:
URL: https://github.com/apache/hudi/pull/7627#issuecomment-1510790494

   
   ## CI report:
   
   * 693fe2d347bf62a4340ad5d397d29caf092a5cb9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15267)
   * d5f7e0f51ce768b34aa01ff970da9c187c4f8c16 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] danny0405 commented on a diff in pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-16 Thread via GitHub


danny0405 commented on code in PR #7627:
URL: https://github.com/apache/hudi/pull/7627#discussion_r1168233922


##
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java:
##
@@ -67,6 +67,13 @@ public class HoodieCommonConfig extends HoodieConfig {
   .defaultValue(true)
   .withDocumentation("Turn on compression for BITCASK disk map used by the 
External Spillable Map");
 
+  public static final ConfigProperty<Boolean> INCREMENTAL_FETCH_INSTANT_BY_STATE_TRANSITION_TIME = ConfigProperty
+      .key("hoodie.incremental.fetch.instant.by.state.transition.time")
+      .defaultValue(false)
+      .sinceVersion("0.13.0")

Review Comment:
   Need to fix the `sinceVersion` to 0.14.0.
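
   For reference, a minimal sketch of the corrected declaration (assuming the property is a `ConfigProperty<Boolean>`, as `defaultValue(false)` suggests; the documentation string is hypothetical):

   ```java
   // Same declaration with the since-version bumped to 0.14.0, as requested above.
   public static final ConfigProperty<Boolean> INCREMENTAL_FETCH_INSTANT_BY_STATE_TRANSITION_TIME = ConfigProperty
       .key("hoodie.incremental.fetch.instant.by.state.transition.time")
       .defaultValue(false)
       .sinceVersion("0.14.0") // 0.13.0 has already been released
       .withDocumentation("Whether incremental queries fetch instants by their state "
           + "transition time instead of the instant (commit) time."); // hypothetical wording
   ```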



##
hudi-common/src/test/java/org/apache/hudi/common/table/timeline/TestHoodieInstant.java:
##
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.table.timeline;
+
+import org.apache.hudi.common.testutils.HoodieCommonTestHarness;
+import org.apache.hudi.common.util.Option;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.testutils.Assertions.assertStreamEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+public class TestHoodieInstant extends HoodieCommonTestHarness {
+
+  @Test
+  public void testExtractTimestamp() {
+    String fileName = "20230104152218702.inflight";
+    assertEquals("20230104152218702", HoodieInstant.extractTimestamp(fileName));
+
+    fileName = "20230104152218702.commit.request";
+    assertEquals("20230104152218702", HoodieInstant.extractTimestamp(fileName));
+  }
+
+  @Test
+  public void testGetTimelineFileExtension() {
+    String fileName = "20230104152218702.inflight";
+    assertEquals(".inflight", HoodieInstant.getTimelineFileExtension(fileName));
+
+    fileName = "20230104152218702.commit.request";
+    assertEquals(".commit.request", HoodieInstant.getTimelineFileExtension(fileName));
+  }
+
+  @Test
+  public void testCreateHoodieInstantByFileStatus() throws IOException {
+    try {
+      initMetaClient();
+      HoodieInstant instantRequested =
+          new HoodieInstant(HoodieInstant.State.REQUESTED, HoodieTimeline.COMMIT_ACTION, "001");
+      HoodieInstant instantCommitted =
+          new HoodieInstant(HoodieInstant.State.COMPLETED, HoodieTimeline.COMMIT_ACTION, "001");
+      HoodieActiveTimeline timeline = metaClient.getActiveTimeline();
+      timeline.createNewInstant(instantRequested);
+      timeline.transitionRequestedToInflight(instantRequested, Option.empty());
+      timeline.saveAsComplete(
+          new HoodieInstant(true, instantRequested.getAction(), instantRequested.getTimestamp()),
+          Option.empty());
+      metaClient.reloadActiveTimeline();

Review Comment:
   We also need some tests for the incremental data source.



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala:
##
@@ -133,8 +133,16 @@ trait HoodieIncrementalRelationTrait extends 
HoodieBaseRelation {
   // Validate this Incremental implementation is properly configured
   validate()
 
+  private val useStateTransitionTime = optParams.get(DataSourceReadOptions.INCREMENTAL_FETCH_INSTANT_BY_STATE_TRANSITION_TIME.key)
+    .map(_.toBoolean)
+    .getOrElse(DataSourceReadOptions.INCREMENTAL_FETCH_INSTANT_BY_STATE_TRANSITION_TIME.defaultValue)
+
   protected def startTimestamp: String = optParams(DataSourceReadOptions.BEGIN_INSTANTTIME.key)
-  protected def endTimestamp: String = optParams.getOrElse(DataSourceReadOptions.END_INSTANTTIME.key, super.timeline.lastInstant().get.getTimestamp)
+  protected def endTimestamp: String = if (useStateTransitionTime) {
+    optParams.getOrElse(DataSourceReadOptions.END_INSTANTTIME.key, super.timeline.lastInstant().get.getStateTransitionTime)
+  } else {

Review Comment:
   We need some clarification on how the semantics of the two options change when `useStateTransitionTime` is set to true: we should document the new semantics of `BEGIN_INSTANTTIME` and `END_INSTANTTIME`.



##
hudi-spark-data

[GitHub] [hudi] voonhous commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


voonhous commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1168238938


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/parquet/ParquetRowDataWriter.java:
##
@@ -31,6 +31,7 @@
 import org.apache.flink.table.types.logical.RowType;
 import org.apache.flink.table.types.logical.TimestampType;
 import org.apache.flink.util.Preconditions;
+import org.apache.hudi.common.util.ValidationUtils;
 import org.apache.parquet.io.api.Binary;

Review Comment:
   Man, after adding the Hudi checkstyles, it's adding blank lines instead of ordering the imports...
   
   ETA:
   Nvm, fixed it. I was reformatting code instead of optimizing imports... My bad



##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/cluster/ITTestHoodieFlinkClustering.java:
##
@@ -18,6 +18,11 @@
 
 package org.apache.hudi.sink.cluster;
 
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.api.TableResult;
+import org.apache.flink.table.api.ValidationException;

Review Comment:
   Done






[GitHub] [hudi] boneanxs commented on a diff in pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-16 Thread via GitHub


boneanxs commented on code in PR #7627:
URL: https://github.com/apache/hudi/pull/7627#discussion_r1168236489


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelation.scala:
##
@@ -205,6 +218,9 @@ class IncrementalRelation(val sqlContext: SQLContext,
   val endInstantArchived = commitTimeline.isBeforeTimelineStarts(endInstantTime)
 
   val scanDf = if (fallbackToFullTableScan && (startInstantArchived || endInstantArchived)) {
+    if (useStateTransitionTime) {
+      throw new HoodieException("Cannot use stateTransitionTime while enables full table scan")

Review Comment:
   If `useStateTransitionTime` is enabled, it means the user has provided state transition times for `startInstantTime` and `endInstantTime`, but those values would then be compared with `_hoodie_commit_time`, so the result would not be accurate. So is it better to error out here, or to force `useStateTransitionTime` back to false?



##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstant.java:
##
@@ -46,14 +55,35 @@ public class HoodieInstant implements Serializable, Comparable<HoodieInstant> {
   public static final Comparator<HoodieInstant> COMPARATOR = Comparator.comparing(HoodieInstant::getTimestamp)
       .thenComparing(ACTION_COMPARATOR).thenComparing(HoodieInstant::getState);
 
+  public static final Comparator<HoodieInstant> STATE_TRANSITION_COMPARATOR =
+      Comparator.comparing(HoodieInstant::getStateTransitionTime)

Review Comment:
   We need to use `STATE_TRANSITION_COMPARATOR` in `Timeline.getInstantsOrderedByStateTransitionTs`, which should sort by `stateTransitionTime` first, so we cannot reuse `COMPARATOR` directly here.
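
   A sketch of how that comparator could be completed (only the first `comparing` key appears in the diff above; chaining the existing `COMPARATOR` as the tie-break is an assumption):

   ```java
   // Inside HoodieInstant: order by state-transition time first, then fall back to
   // the existing timestamp/action/state ordering for instants that transitioned
   // at the same time. The .thenComparing(COMPARATOR) tie-break is hypothetical.
   public static final Comparator<HoodieInstant> STATE_TRANSITION_COMPARATOR =
       Comparator.comparing(HoodieInstant::getStateTransitionTime)
           .thenComparing(COMPARATOR);
   ```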



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelation.scala:
##
@@ -82,9 +87,17 @@ class IncrementalRelation(val sqlContext: SQLContext,
 
   private val lastInstant = commitTimeline.lastInstant().get()
 
-  private val commitsTimelineToReturn = commitTimeline.findInstantsInRange(
-    optParams(DataSourceReadOptions.BEGIN_INSTANTTIME.key),
-    optParams.getOrElse(DataSourceReadOptions.END_INSTANTTIME.key(), lastInstant.getTimestamp))
+  private val commitsTimelineToReturn = {
+    if (useStateTransitionTime) {

Review Comment:
   If I understand correctly, we only use the begin and end instant times (which are `stateTransitionTime` values when enabled) to filter instants, and still use the instants' commit time to compare with `_hoodie_commit_time`.



##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstant.java:
##
@@ -70,32 +100,38 @@ public enum State {
 NIL
   }
 
-  private State state = State.COMPLETED;
-  private String action;
-  private String timestamp;
+  private final State state;
+  private final String action;
+  private final String timestamp;
+  private final String stateTransitionTime;
 
   /**
    * Load the instant from the meta FileStatus.
    */
   public HoodieInstant(FileStatus fileStatus) {
     // First read the instant timestamp. [==>20170101193025<==].commit
     String fileName = fileStatus.getPath().getName();
-    String fileExtension = getTimelineFileExtension(fileName);
-    timestamp = fileName.replace(fileExtension, "");
-
-    // Next read the action for this marker
-    action = fileExtension.replaceFirst(".", "");
-    if (action.equals("inflight")) {
-      // This is to support backwards compatibility on how in-flight commit files were written
-      // General rule is inflight extension is ..inflight, but for commit it is .inflight
-      action = "commit";
-      state = State.INFLIGHT;
-    } else if (action.contains(HoodieTimeline.INFLIGHT_EXTENSION)) {
-      state = State.INFLIGHT;
-      action = action.replace(HoodieTimeline.INFLIGHT_EXTENSION, "");
-    } else if (action.contains(HoodieTimeline.REQUESTED_EXTENSION)) {
-      state = State.REQUESTED;
-      action = action.replace(HoodieTimeline.REQUESTED_EXTENSION, "");
+    Matcher matcher = NAME_FORMAT.matcher(fileName);
+    if (matcher.find()) {
+      timestamp = matcher.group(1);
+      if (matcher.group(2).equals(HoodieTimeline.INFLIGHT_EXTENSION)) {
+        // This is to support backwards compatibility on how in-flight commit files were written
+        // General rule is inflight extension is ..inflight, but for commit it is .inflight
+        action = "commit";
+        state = State.INFLIGHT;
+      } else {
+        action = matcher.group(2).replaceFirst(".", "");
+        if (matcher.groupCount() == 3 && matcher.group(3) != null) {
+          state = State.valueOf(matcher.group(3).replaceFirst(".", "").toUpperCase());
+        } else {
+          // Like 20230104152218702.commit
+          state = State.COMPLETED;
+        }
+      }
+      stateTransitionTime =

Review Comment:
   Thank you, @vinothchandar, for bringing up this important question. We apologize for not considering this aspect earlier. Since the timeline of the new migrated table coul




[GitHub] [hudi] Guanpx opened a new issue, #8475: [SUPPORT] ERROR HoodieMetadataException with spark clean

2023-04-16 Thread via GitHub


Guanpx opened a new issue, #8475:
URL: https://github.com/apache/hudi/issues/8475

   
   
   **Describe the problem you faced**
   
   When I use org.apache.hudi.utilities.HoodieCleaner to clean up old file versions in a Hudi table, I hit the error below.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   ```
   spark-submit --master yarn \
   --name B_P1_HUDI_clean_$1 \
   --deploy-mode cluster \
   --driver-memory 2g \
   --executor-memory 500m \
   --executor-cores 2 \
   --num-executors 1 \
   --conf spark.default.parallelism=200 \
   --conf spark.dynamicAllocation.enabled=false \
   --class  org.apache.hudi.utilities.HoodieCleaner \
   /home/cdh/pxguan/spark_offline/hudi/hudi-utilities-bundle_2.12-0.12.0.jar 
   --target-base-path path_to_table --props props
   ```
   
   
   **Expected behavior**
   
   **There are 8000+ files in the table path; with only ~100 files, this error does not occur.**
   
   **Environment Description**
   
   * Hudi version : 0.12.0
   
   * Spark version : 2.4.3
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   
   **Stacktrace**
   
   ```
   ERROR service.RequestHandler: Got runtime exception servicing request partition=&maxinstant=20230417133654798&basepath=hdfs%3A%2Fhudi%2Fdw%2Frds.db%table_path&lastinstantts=20230417140011979&timelinehash=bb0d1f56fec5e7b3f00202df2d61e989975ae7a568d6fe4dd0965615c431715b
   org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve files in partition hdfs:/hudi/dw/rds.db/table_path from metadata
    at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:137)
    at org.apache.hudi.metadata.HoodieMetadataFileSystemView.listPartition(HoodieMetadataFileSystemView.java:65)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:305)
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:296)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroupsIncludingReplaced(AbstractTableFileSystemView.java:744)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getReplacedFileGroupsBefore(AbstractTableFileSystemView.java:758)
    at org.apache.hudi.timeline.service.handlers.FileSliceHandler.getReplacedFileGroupsBefore(FileSliceHandler.java:102)
    at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$21(RequestHandler.java:402)
    at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:498)
    at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
    at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
    at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
    at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
    at io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
    at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
    at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
    at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
    at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.apache.hudi.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
    at org.apache.hudi.org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
    at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.apache.hudi.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
    at org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.apache.hudi.org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
    at org.apache.hudi.org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
    at org.apache.hudi.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.apache.hudi.org.eclipse.jetty.server.Server.handle(Server.java:502)
    at org.apache.hudi.org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
    at org.apache.hudi.org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
    at org.apache.hudi.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
    at org.apache.hudi.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.apache.hudi.org.eclipse.jetty.io.ChannelEndPoint$2.run(Cha

[GitHub] [hudi] danny0405 commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


danny0405 commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r116823


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/cluster/ITTestHoodieFlinkClustering.java:
##
@@ -18,6 +18,11 @@
 
 package org.apache.hudi.sink.cluster;
 
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.api.TableResult;
+import org.apache.flink.table.api.ValidationException;

Review Comment:
   The hudi package should be in the first block, isolated from the other blocks.
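
   To illustrate, a sketch of the layout being asked for, using the imports from the diffs in this thread (the `ValidationUtils` import comes from the related ParquetRowDataWriter diff; the ordering of the non-hudi groups is an assumption based on this thread and the linked checkstyle rule):

   ```java
   package org.apache.hudi.sink.cluster;

   // org.apache.hudi imports come first, in their own block...
   import org.apache.hudi.common.util.ValidationUtils;

   // ...followed by third-party imports...
   import org.apache.flink.table.api.DataTypes;
   import org.apache.flink.table.api.TableResult;
   import org.apache.flink.table.api.ValidationException;

   // ...with java/javax imports last.
   import java.util.List;
   import java.util.stream.Collectors;
   ```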






[GitHub] [hudi] voonhous commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


voonhous commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1168231169


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/parquet/ParquetRowDataWriter.java:
##
@@ -31,6 +31,7 @@
 import org.apache.flink.table.types.logical.RowType;
 import org.apache.flink.table.types.logical.TimestampType;
 import org.apache.flink.util.Preconditions;
+import org.apache.hudi.common.util.ValidationUtils;
 import org.apache.parquet.io.api.Binary;

Review Comment:
   Got it, will fix that.






[GitHub] [hudi] danny0405 commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


danny0405 commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1168230549


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/parquet/ParquetRowDataWriter.java:
##
@@ -31,6 +31,7 @@
 import org.apache.flink.table.types.logical.RowType;
 import org.apache.flink.table.types.logical.TimestampType;
 import org.apache.flink.util.Preconditions;
+import org.apache.hudi.common.util.ValidationUtils;
 import org.apache.parquet.io.api.Binary;

Review Comment:
   Take other Java files as a reference for the import order.






[GitHub] [hudi] danny0405 commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


danny0405 commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1168230208


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/parquet/ParquetRowDataWriter.java:
##
@@ -31,6 +31,7 @@
 import org.apache.flink.table.types.logical.RowType;
 import org.apache.flink.table.types.logical.TimestampType;
 import org.apache.flink.util.Preconditions;
+import org.apache.hudi.common.util.ValidationUtils;
 import org.apache.parquet.io.api.Binary;

Review Comment:
   The org.apache.hudi package should always come first in the import order; you may need to import the checkstyle config from this file: https://github.com/apache/hudi/blob/d9b29e540314f842d8d3aa86f8429cca8a8cf786/style/checkstyle.xml#L291








[GitHub] [hudi] danny0405 commented on a diff in pull request #8335: [HUDI-6009] Let the jetty server in TimelineService create daemon threads

2023-04-16 Thread via GitHub


danny0405 commented on code in PR #8335:
URL: https://github.com/apache/hudi/pull/8335#discussion_r1168227069


##
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/TimelineService.java:
##
@@ -342,8 +341,19 @@ private int startServiceOnPort(int port) throws 
IOException {
   }
 
   public int startService() throws IOException {
-    final Server server = timelineServerConf.numThreads == DEFAULT_NUM_THREADS ? new JettyServer(new JavalinConfig()).server() :
-        new Server(new QueuedThreadPool(timelineServerConf.numThreads));
+    int maxThreads;
+    if (timelineServerConf.numThreads > 0) {

Review Comment:
   `int maxThreads = timelineServerConf.numThreads > 0 ? timelineServerConf.numThreads : 250;`
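
   Applied to the code from the diff, the suggestion would read roughly as follows (a sketch; the 250 fallback mirrors `io.javalin.jetty.JettyUtil.defaultThreadPool`, per the comment in the PR):

   ```java
   // Collapse the if/else into a ternary, as suggested.
   int maxThreads = timelineServerConf.numThreads > 0 ? timelineServerConf.numThreads : 250;
   QueuedThreadPool pool = new QueuedThreadPool(maxThreads, 8, 60_000);
   pool.setDaemon(true); // daemon threads so the timeline server never blocks JVM shutdown
   final Server server = new Server(pool);
   ```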






[GitHub] [hudi] danny0405 commented on a diff in pull request #8335: [HUDI-6009] Let the jetty server in TimelineService create daemon threads

2023-04-16 Thread via GitHub


danny0405 commented on code in PR #8335:
URL: https://github.com/apache/hudi/pull/8335#discussion_r1168224408


##
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/TimelineService.java:
##
@@ -342,8 +341,19 @@ private int startServiceOnPort(int port) throws 
IOException {
   }
 
   public int startService() throws IOException {
-    final Server server = timelineServerConf.numThreads == DEFAULT_NUM_THREADS ? new JettyServer(new JavalinConfig()).server() :
-        new Server(new QueuedThreadPool(timelineServerConf.numThreads));
+    int maxThreads;
+    if (timelineServerConf.numThreads > 0) {
+      maxThreads = timelineServerConf.numThreads;
+    } else {
+      // io.javalin.jetty.JettyUtil.defaultThreadPool
+      maxThreads = 250;
+    }
+    QueuedThreadPool pool = new QueuedThreadPool(maxThreads, 8, 60_000);
+    pool.setDaemon(true);
+    final Server server = new Server(pool);
+    ScheduledExecutorScheduler scheduler = new ScheduledExecutorScheduler("TimelineService-JettyScheduler", true, 8);
+    server.addBean(scheduler);
+

Review Comment:
   So `#addBean` is used for the lifecycle management of the server components.






[GitHub] [hudi] hudi-bot commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #7614:
URL: https://github.com/apache/hudi/pull/7614#issuecomment-1510750239

   
   ## CI report:
   
   * 058ab2703bda207fc9f5861d5e4b865e83ee1b45 UNKNOWN
   * 9b853e4f449343ffae850eae895191fbdfb94b12 UNKNOWN
   * 60b50fe79f0e316d591dbecff68cbc3c2c5b4a4b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16193)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8449: [HUDI-6071] If the Flink Hive Catalog is used and the table type is B…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8449:
URL: https://github.com/apache/hudi/pull/8449#issuecomment-1510745606

   
   ## CI report:
   
   * c8acee7666a4cabce9b9eb76b1da71b1f6826bf9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16326)
   * 576bfa355295722a082fd176cd1f9a8e81dea94e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16386)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8418:
URL: https://github.com/apache/hudi/pull/8418#issuecomment-1510745435

   
   ## CI report:
   
   * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN
   * 79bc2ba070ecbcbdeab6ff88f9424cbd5a6b5e49 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16381)
   * 5f063bb1db0459bb6a62dd55a57279cf45f4e9fa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16385)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #7614:
URL: https://github.com/apache/hudi/pull/7614#issuecomment-1510744321

   
   ## CI report:
   
   * 058ab2703bda207fc9f5861d5e4b865e83ee1b45 UNKNOWN
   * 9b853e4f449343ffae850eae895191fbdfb94b12 UNKNOWN
   * d49c1d2e4ce8b6e2cc83da12a74ccf27912013b7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16377)
   * 60b50fe79f0e316d591dbecff68cbc3c2c5b4a4b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8449: [HUDI-6071] If the Flink Hive Catalog is used and the table type is B…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8449:
URL: https://github.com/apache/hudi/pull/8449#issuecomment-1510739107

   
   ## CI report:
   
   * c8acee7666a4cabce9b9eb76b1da71b1f6826bf9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16326)
   * 576bfa355295722a082fd176cd1f9a8e81dea94e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8418:
URL: https://github.com/apache/hudi/pull/8418#issuecomment-1510738769

   
   ## CI report:
   
   * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN
   * be4235b4f42ffe845ec17dad7113af2de7a94332 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16380)
   * 79bc2ba070ecbcbdeab6ff88f9424cbd5a6b5e49 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16381)
   * 5f063bb1db0459bb6a62dd55a57279cf45f4e9fa UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] ChestnutQiang commented on a diff in pull request #8449: [HUDI-6071] If the Flink Hive Catalog is used and the table type is B…

2023-04-16 Thread via GitHub


ChestnutQiang commented on code in PR #8449:
URL: https://github.com/apache/hudi/pull/8449#discussion_r1168193918


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/util/ExpressionUtilsTest.java:
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.util;
+
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.expressions.CallExpression;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.flink.table.expressions.FieldReferenceExpression;
+import org.apache.flink.table.expressions.ValueLiteralExpression;
+import org.apache.flink.table.functions.BuiltInFunctionDefinitions;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.RowType;
+import org.junit.jupiter.api.Test;
+
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+class ExpressionUtilsTest {
+

Review Comment:
   @danny0405






[GitHub] [hudi] voonhous commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


voonhous commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1168176154


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/parquet/ParquetRowDataWriter.java:
##
@@ -31,6 +31,7 @@
 import org.apache.flink.table.types.logical.RowType;
 import org.apache.flink.table.types.logical.TimestampType;
 import org.apache.flink.util.Preconditions;
+import org.apache.hudi.common.util.ValidationUtils;
 import org.apache.parquet.io.api.Binary;

Review Comment:
   Hmmm, I don't think there's anything wrong with this: `h` comes after `f`, so `org.apache.hudi` should come after `org.apache.flink`, right?






[GitHub] [hudi] ChestnutQiang commented on a diff in pull request #8449: [HUDI-6071] If the Flink Hive Catalog is used and the table type is B…

2023-04-16 Thread via GitHub


ChestnutQiang commented on code in PR #8449:
URL: https://github.com/apache/hudi/pull/8449#discussion_r1168173285


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/util/ExpressionUtilsTest.java:
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.util;
+
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.expressions.CallExpression;
+import org.apache.flink.table.expressions.Expression;
+import org.apache.flink.table.expressions.FieldReferenceExpression;
+import org.apache.flink.table.expressions.ValueLiteralExpression;
+import org.apache.flink.table.functions.BuiltInFunctionDefinitions;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.RowType;
+import org.junit.jupiter.api.Test;
+
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+class ExpressionUtilsTest {
+

Review Comment:
   I have added tests for non-null literals in TestExpressionUtils.






[GitHub] [hudi] voonhous commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


voonhous commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1168173022


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/cluster/ITTestHoodieFlinkClustering.java:
##
@@ -18,6 +18,11 @@
 
 package org.apache.hudi.sink.cluster;
 
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.api.TableResult;
+import org.apache.flink.table.api.ValidationException;

Review Comment:
   Done






[GitHub] [hudi] danny0405 commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


danny0405 commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1168171291


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/cluster/ITTestHoodieFlinkClustering.java:
##
@@ -18,6 +18,11 @@
 
 package org.apache.hudi.sink.cluster;
 
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.api.TableResult;
+import org.apache.flink.table.api.ValidationException;

Review Comment:
   Fix the import sequence.






[GitHub] [hudi] danny0405 commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


danny0405 commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1168171167


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/parquet/ParquetRowDataWriter.java:
##
@@ -31,6 +31,7 @@
 import org.apache.flink.table.types.logical.RowType;
 import org.apache.flink.table.types.logical.TimestampType;
 import org.apache.flink.util.Preconditions;
+import org.apache.hudi.common.util.ValidationUtils;
 import org.apache.parquet.io.api.Binary;

Review Comment:
   You might need to fix the import-order settings in your IDEA.






[GitHub] [hudi] hudi-bot commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #7614:
URL: https://github.com/apache/hudi/pull/7614#issuecomment-1510702491

   
   ## CI report:
   
   * 058ab2703bda207fc9f5861d5e4b865e83ee1b45 UNKNOWN
   * 9b853e4f449343ffae850eae895191fbdfb94b12 UNKNOWN
   * d49c1d2e4ce8b6e2cc83da12a74ccf27912013b7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16377)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8455:
URL: https://github.com/apache/hudi/pull/8455#issuecomment-1510696166

   
   ## CI report:
   
   * f0215afb8f8298848391fa8168189832c614a667 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16342)
   * e91532fbc17af62a0227bd780426330dfe92d533 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16384)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8385:
URL: https://github.com/apache/hudi/pull/8385#issuecomment-1510695513

   
   ## CI report:
   
   * 3874447e48c21cb336f28625e1682b8f229f623c UNKNOWN
   * 1cd0db680780d02ff786121f394dccfcd621d37d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16378)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8380: [HUDI-6033] Fix rounding exception when to decimal casting

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8380:
URL: https://github.com/apache/hudi/pull/8380#issuecomment-1510695444

   
   ## CI report:
   
   * 4127079fc6162fee6b08501c700cf9b835a38d3c UNKNOWN
   * ddf99d1d66b9b98deeadc09136e07a0aaceb5c8a UNKNOWN
   * 27d656870879682bdabebbbf2c2b00a98d1fa579 UNKNOWN
   * 09c2c89681611cd6a142822a69d01ddb11cb96bc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16313)
   * 3bf8c8558ce88f9fe97efe290444a81a8f6a UNKNOWN
   * 07418ebae94ef7b69eb0aec3d9964548046be44c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16383)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8455:
URL: https://github.com/apache/hudi/pull/8455#issuecomment-1510690326

   
   ## CI report:
   
   * f0215afb8f8298848391fa8168189832c614a667 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16342)
   * e91532fbc17af62a0227bd780426330dfe92d533 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8380: [HUDI-6033] Fix rounding exception when to decimal casting

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8380:
URL: https://github.com/apache/hudi/pull/8380#issuecomment-1510690134

   
   ## CI report:
   
   * 4127079fc6162fee6b08501c700cf9b835a38d3c UNKNOWN
   * ddf99d1d66b9b98deeadc09136e07a0aaceb5c8a UNKNOWN
   * 27d656870879682bdabebbbf2c2b00a98d1fa579 UNKNOWN
   * 09c2c89681611cd6a142822a69d01ddb11cb96bc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16313)
   * 3bf8c8558ce88f9fe97efe290444a81a8f6a UNKNOWN
   * 07418ebae94ef7b69eb0aec3d9964548046be44c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8418:
URL: https://github.com/apache/hudi/pull/8418#issuecomment-1510686260

   
   ## CI report:
   
   * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN
   * be4235b4f42ffe845ec17dad7113af2de7a94332 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16380)
   * 79bc2ba070ecbcbdeab6ff88f9424cbd5a6b5e49 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16381)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8394:
URL: https://github.com/apache/hudi/pull/8394#issuecomment-1510686209

   
   ## CI report:
   
   * 36fc037a6ef0e2c6c3409694a50a196b738d4e4d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16307)
   * 79ec658c4c5280dc2ba0fbc5d4570a76fe09952a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16382)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8380: [HUDI-6033] Fix rounding exception when to decimal casting

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8380:
URL: https://github.com/apache/hudi/pull/8380#issuecomment-1510686119

   
   ## CI report:
   
   * 4127079fc6162fee6b08501c700cf9b835a38d3c UNKNOWN
   * ddf99d1d66b9b98deeadc09136e07a0aaceb5c8a UNKNOWN
   * 27d656870879682bdabebbbf2c2b00a98d1fa579 UNKNOWN
   * 09c2c89681611cd6a142822a69d01ddb11cb96bc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16313)
   * 3bf8c8558ce88f9fe97efe290444a81a8f6a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] Zouxxyy commented on pull request #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte

2023-04-16 Thread via GitHub


Zouxxyy commented on PR #8455:
URL: https://github.com/apache/hudi/pull/8455#issuecomment-1510680652

   @danny0405 done





[GitHub] [hudi] voonhous commented on pull request #8380: [HUDI-6033] Fix rounding exception when to decimal casting

2023-04-16 Thread via GitHub


voonhous commented on PR #8380:
URL: https://github.com/apache/hudi/pull/8380#issuecomment-1510662033

   @danny0405 Can you please help to review this PR again? resolved the merge 
conflicts caused by changes in javadoc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-04-16 Thread via GitHub


danny0405 commented on code in PR #8472:
URL: https://github.com/apache/hudi/pull/8472#discussion_r1168133213


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordStatus.java:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.Serializable;
+
+public class HoodieRecordStatus implements Serializable, KryoSerializable {
+
+

Review Comment:
   key + location actually form an index item; just rename it to `HoodieIndexItem`?
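   For context, a minimal sketch of what the suggested shape could look like, assuming the class only carries the record key plus its optional location; the field set and names here are illustrative, not taken from the PR:

   ```java
   // Illustrative sketch only; not PR #8472's actual code.
   import org.apache.hudi.common.model.HoodieKey;
   import org.apache.hudi.common.model.HoodieRecordLocation;
   import org.apache.hudi.common.util.Option;

   import java.io.Serializable;

   public class HoodieIndexItem implements Serializable {
     private final HoodieKey key;                          // record key + partition path
     private final Option<HoodieRecordLocation> location;  // known file location, if tagged

     public HoodieIndexItem(HoodieKey key, Option<HoodieRecordLocation> location) {
       this.key = key;
       this.location = location;
     }

     public HoodieKey getKey() {
       return key;
     }

     public Option<HoodieRecordLocation> getLocation() {
       return location;
     }
   }
   ```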



##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordStatus.java:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.Serializable;
+
+public class HoodieRecordStatus implements Serializable, KryoSerializable {
+
+

Review Comment:
   key + location actually form an index item; just rename it to `IndexItem`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8418:
URL: https://github.com/apache/hudi/pull/8418#issuecomment-1510657026

   
   ## CI report:
   
   * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN
   * cb05421be9bb950f7dadfc6a5cdfa4c07e5de6a3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16340)
 
   * be4235b4f42ffe845ec17dad7113af2de7a94332 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16380)
 
   * 79bc2ba070ecbcbdeab6ff88f9424cbd5a6b5e49 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8394:
URL: https://github.com/apache/hudi/pull/8394#issuecomment-1510656865

   
   ## CI report:
   
   * 36fc037a6ef0e2c6c3409694a50a196b738d4e4d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16307)
 
   * 79ec658c4c5280dc2ba0fbc5d4570a76fe09952a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte

2023-04-16 Thread via GitHub


danny0405 commented on PR #8455:
URL: https://github.com/apache/hudi/pull/8455#issuecomment-1510655063

   > just fix the bug and not change the unit
   
   Let's fix the bug first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #8467: [HUDI-6084] Added FailOnFirstErrorWriteStatus for MDT to ensure that write operations fail fast on any error.

2023-04-16 Thread via GitHub


danny0405 commented on code in PR #8467:
URL: https://github.com/apache/hudi/pull/8467#discussion_r1168129395


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -170,9 +171,12 @@ protected HoodieBackedTableMetadataWriter(Configu
       "Cleaning is controlled internally for Metadata table.");
   ValidationUtils.checkArgument(!this.metadataWriteConfig.inlineCompactionEnabled(),
       "Compaction is controlled internally for metadata table.");
-  // Metadata Table cannot have metadata listing turned on. (infinite loop, much?)
+  // Auto commit is required
   ValidationUtils.checkArgument(this.metadataWriteConfig.shouldAutoCommit(),
       "Auto commit is required for Metadata Table");
+  ValidationUtils.checkArgument(this.metadataWriteConfig.getWriteStatusClassName().equals(FailOnFirstErrorWriteStatus.class.getName()),
+      "MDT should use " + FailOnFirstErrorWriteStatus.class.getName());
+  // Metadata Table cannot have metadata listing turned on. (infinite loop, much?)

Review Comment:
   Not sure where the MDT is configured.
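   For context, a custom write-status class is normally wired in through the write config; a hedged sketch, assuming HoodieWriteConfig.Builder#withWriteStatusClass as in current Hudi, with a hypothetical base-path variable:

   ```java
   // Sketch only: the usual wiring point for a WriteStatus implementation.
   // Where exactly the MDT writer builds its config is the open question above.
   HoodieWriteConfig metadataWriteConfig = HoodieWriteConfig.newBuilder()
       .withPath(metadataTableBasePath) // hypothetical variable
       .withWriteStatusClass(FailOnFirstErrorWriteStatus.class)
       .build();
   ```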



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8418:
URL: https://github.com/apache/hudi/pull/8418#issuecomment-1510653019

   
   ## CI report:
   
   * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN
   * cb05421be9bb950f7dadfc6a5cdfa4c07e5de6a3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16340)
 
   * be4235b4f42ffe845ec17dad7113af2de7a94332 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on issue #5857: [SUPPORT]Problem using Multiple writers(flink spark) to write to hudi

2023-04-16 Thread via GitHub


danny0405 commented on issue #5857:
URL: https://github.com/apache/hudi/issues/5857#issuecomment-1510652578

   > > Flink multi writers (OCC) is not supported yet.
   > 
   > Do you mean if there are two flink writer to write hudi, there would have 
error?
   
   Yes, we are trying to support lockless multi-writer in the next release.
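   For reference, multi-writer on the Spark side today goes through OCC plus an external lock provider; a hedged sketch of the usual write options follows (the keys are standard Hudi configs, the ZooKeeper endpoint and paths are placeholders):

   ```java
   // Typical Spark-side OCC settings; Flink-side multi-writer is what this
   // thread says is not supported yet. Values below are illustrative.
   import java.util.HashMap;
   import java.util.Map;

   Map<String, String> occOptions = new HashMap<>();
   occOptions.put("hoodie.write.concurrency.mode", "optimistic_concurrency_control");
   occOptions.put("hoodie.cleaner.policy.failed.writes", "LAZY");
   occOptions.put("hoodie.write.lock.provider",
       "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider");
   occOptions.put("hoodie.write.lock.zookeeper.url", "zk-host");        // placeholder
   occOptions.put("hoodie.write.lock.zookeeper.port", "2181");
   occOptions.put("hoodie.write.lock.zookeeper.lock_key", "my_table");  // placeholder
   occOptions.put("hoodie.write.lock.zookeeper.base_path", "/hudi/locks");
   ```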


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] voonhous commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


voonhous commented on PR #8418:
URL: https://github.com/apache/hudi/pull/8418#issuecomment-1510652533

   Added more checkstyle fixes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


zhuanshenbsj1 commented on code in PR #8394:
URL: https://github.com/apache/hudi/pull/8394#discussion_r1168119639


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -358,6 +358,9 @@ private void cluster() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_clustering_" + clusteringInstant.getTimestamp());
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+    writeClient.clean();
+  }

Review Comment:
   Oh, I see. My Hudi version does not include PR 
https://github.com/apache/hudi/pull/6515, so CleanFunction#open only performs the cleaning 
when OptionsResolver.isInsertOverwrite(conf) is true.
   
   ```
   @Override
   public void open(Configuration parameters) throws Exception {
     super.open(parameters);
     if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
       this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
       this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

       if (OptionsResolver.isInsertOverwrite(conf)) {
         String instantTime = HoodieActiveTimeline.createNewInstantTime();
         LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
         executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
       }
     }
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8385:
URL: https://github.com/apache/hudi/pull/8385#issuecomment-1510649769

   
   ## CI report:
   
   * 3874447e48c21cb336f28625e1682b8f229f623c UNKNOWN
   * 6fd073a2061145c3800023590c50f55837a59171 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16299)
 
   * 1cd0db680780d02ff786121f394dccfcd621d37d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16378)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


zhuanshenbsj1 commented on code in PR #8394:
URL: https://github.com/apache/hudi/pull/8394#discussion_r1168119189


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -314,6 +314,9 @@ private void compact() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_compaction_" + String.join(",", 
compactionInstantTimes));
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && 
conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+writeClient.clean();
+  }

Review Comment:
   Oh, I see. My Hudi version does not include PR 
https://github.com/apache/hudi/pull/6515, so CleanFunction#open only performs the cleaning 
when OptionsResolver.isInsertOverwrite(conf) is true.
   
```
@Override
public void open(Configuration parameters) throws Exception {
  super.open(parameters);
  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

    if (OptionsResolver.isInsertOverwrite(conf)) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
      executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
    }
  }
}
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-6080) Use hoodie.properties to determine if the table exists

2023-04-16 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6080.

Fix Version/s: 0.14.0
   Resolution: Fixed

Fixed via master branch: d9b29e540314f842d8d3aa86f8429cca8a8cf786

> Use hoodie.properties to determine if the table exists
> --
>
> Key: HUDI-6080
> URL: https://issues.apache.org/jira/browse/HUDI-6080
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: xi chaomin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> After HUDI-6005, if OCC is enabled and WRITE_CLIENT_ID is not set, Flink will 
> create the .hoodie dir before initializing the table. But since we check whether the 
> table exists via the .hoodie dir, we will get an exception when we write to Hudi.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[hudi] branch master updated: [HUDI-6080] Use hoodie.properties to determine if the table exists (#8462)

2023-04-16 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new d9b29e54031 [HUDI-6080] Use hoodie.properties to determine if the 
table exists (#8462)
d9b29e54031 is described below

commit d9b29e540314f842d8d3aa86f8429cca8a8cf786
Author: Manu <36392121+x...@users.noreply.github.com>
AuthorDate: Mon Apr 17 11:35:09 2023 +0800

[HUDI-6080] Use hoodie.properties to determine if the table exists (#8462)
---
 .../java/org/apache/hudi/util/StreamerUtil.java |  3 ++-
 .../org/apache/hudi/utils/TestStreamerUtil.java | 21 +
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java
index 1b032bf73ee..61643e68214 100644
--- a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java
+++ b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java
@@ -236,7 +236,8 @@ public class StreamerUtil {
     // Hadoop FileSystem
     FileSystem fs = FSUtils.getFs(basePath, hadoopConf);
     try {
-      return fs.exists(new Path(basePath, HoodieTableMetaClient.METAFOLDER_NAME));
+      return fs.exists(new Path(basePath, HoodieTableMetaClient.METAFOLDER_NAME))
+          && fs.exists(new Path(new Path(basePath, HoodieTableMetaClient.METAFOLDER_NAME), HoodieTableConfig.HOODIE_PROPERTIES_FILE));
     } catch (IOException e) {
       throw new HoodieException("Error while checking whether table exists under path:" + basePath, e);
     }
diff --git a/hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/utils/TestStreamerUtil.java b/hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/utils/TestStreamerUtil.java
index a641811bb73..d3bdc479d31 100644
--- a/hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/utils/TestStreamerUtil.java
+++ b/hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/utils/TestStreamerUtil.java
@@ -18,13 +18,18 @@
 
 package org.apache.hudi.utils;
 
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableConfig;
 import org.apache.hudi.common.table.HoodieTableMetaClient;
 import org.apache.hudi.common.util.FileIOUtils;
 import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.configuration.HadoopConfigurations;
 import org.apache.hudi.keygen.SimpleAvroKeyGenerator;
 import org.apache.hudi.util.StreamerUtil;
 
 import org.apache.flink.configuration.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
 import org.junit.jupiter.api.Test;
 import org.junit.jupiter.api.io.TempDir;
 
@@ -101,5 +106,21 @@ public class TestStreamerUtil {
     long diff = StreamerUtil.instantTimeDiffSeconds(higher, lower);
     assertThat(diff, is(75L));
   }
+
+  @Test
+  void testTableExist() throws IOException {
+    Configuration conf = TestConfigurations.getDefaultConf(tempFile.getAbsolutePath());
+    String basePath = tempFile.getAbsolutePath();
+
+    assertFalse(StreamerUtil.tableExists(basePath, HadoopConfigurations.getHadoopConf(conf)));
+
+    try (FileSystem fs = FSUtils.getFs(basePath, HadoopConfigurations.getHadoopConf(conf))) {
+      fs.mkdirs(new Path(basePath, HoodieTableMetaClient.METAFOLDER_NAME));
+      assertFalse(StreamerUtil.tableExists(basePath, HadoopConfigurations.getHadoopConf(conf)));
+
+      fs.create(new Path(new Path(basePath, HoodieTableMetaClient.METAFOLDER_NAME), HoodieTableConfig.HOODIE_PROPERTIES_FILE));
+      assertTrue(StreamerUtil.tableExists(basePath, HadoopConfigurations.getHadoopConf(conf)));
+    }
+  }
 }
 



[GitHub] [hudi] danny0405 merged pull request #8462: [HUDI-6080] Use hoodie.properties to determine if the table exists

2023-04-16 Thread via GitHub


danny0405 merged PR #8462:
URL: https://github.com/apache/hudi/pull/8462


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] voonhous commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


voonhous commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1168124406


##
hudi-client/hudi-flink-client/src/test/java/org/apache/hudi/io/storage/row/parquet/TestParquetSchemaConverter.java:
##
@@ -25,6 +25,7 @@
 
 import static org.hamcrest.CoreMatchers.is;
 import static org.hamcrest.MatcherAssert.assertThat;
+import static org.junit.jupiter.api.Assertions.assertThrows;
 

Review Comment:
   Done.



##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowDataParquetWriteSupport.java:
##
@@ -18,12 +18,11 @@
 
 package org.apache.hudi.io.storage.row;
 
-import org.apache.hudi.avro.HoodieBloomFilterWriteSupport;
-import org.apache.hudi.common.bloom.BloomFilter;
-
 import org.apache.flink.table.data.RowData;
 import org.apache.flink.table.types.logical.RowType;
 import org.apache.hadoop.conf.Configuration;
+import org.apache.hudi.avro.HoodieBloomFilterWriteSupport;
+import org.apache.hudi.common.bloom.BloomFilter;
 import org.apache.hudi.common.util.Option;

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] huangxiaopingRD closed pull request #8353: [MINOR] Remove unused code

2023-04-16 Thread via GitHub


huangxiaopingRD closed pull request #8353: [MINOR] Remove unused code
URL: https://github.com/apache/hudi/pull/8353


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] huangxiaopingRD commented on pull request #8353: [MINOR] Remove unused code

2023-04-16 Thread via GitHub


huangxiaopingRD commented on PR #8353:
URL: https://github.com/apache/hudi/pull/8353#issuecomment-1510645688

   > @huangxiaopingRD : Can you merge all the refactoring code to a single PR. 
Makes it easy to review and land.
   > 
   > Thanks, Balaji.V
   
   Thanks for your advice, I will do that.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


danny0405 commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1168123583


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowDataParquetWriteSupport.java:
##
@@ -18,12 +18,11 @@
 
 package org.apache.hudi.io.storage.row;
 
-import org.apache.hudi.avro.HoodieBloomFilterWriteSupport;
-import org.apache.hudi.common.bloom.BloomFilter;
-
 import org.apache.flink.table.data.RowData;
 import org.apache.flink.table.types.logical.RowType;
 import org.apache.hadoop.conf.Configuration;
+import org.apache.hudi.avro.HoodieBloomFilterWriteSupport;
+import org.apache.hudi.common.bloom.BloomFilter;
 import org.apache.hudi.common.util.Option;

Review Comment:
   Unnecessary change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


zhuanshenbsj1 commented on code in PR #8394:
URL: https://github.com/apache/hudi/pull/8394#discussion_r1168119189


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -314,6 +314,9 @@ private void compact() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_compaction_" + String.join(",", 
compactionInstantTimes));
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && 
conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+writeClient.clean();
+  }

Review Comment:
   CleanFunction#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
```
@Override
public void open(Configuration parameters) throws Exception {
  super.open(parameters);
  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

    if (OptionsResolver.isInsertOverwrite(conf)) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
      executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
    }
  }
}
   ```
   
   I think the real cleaning actually happens in CompactionCommitSink#doCommit, and 
it depends on whether the CLEAN_ASYNC_ENABLED configuration is enabled.
   
   ```
private void doCommit(String instant, Collection<CompactionCommitEvent> events) throws IOException {
  List<WriteStatus> statuses = events.stream()
      .map(CompactionCommitEvent::getWriteStatuses)
      .flatMap(Collection::stream)
      .collect(Collectors.toList());

  HoodieCommitMetadata metadata = CompactHelpers.getInstance().createCompactionMetadata(
      table, instant, HoodieListData.eager(statuses), writeClient.getConfig().getSchema());

  // commit the compaction
  this.writeClient.commitCompaction(instant, metadata, Option.empty());

  // Whether to clean up the old log file when compaction
  if (!conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient.clean();
  }
}
   ```
   
   Online async-clean and offline clean use the same config, so I added 
configuration CLEAN_OFFLINE_ENABLE to distinguish it from configuration 
CLEAN_ASYNC_ENABLED.
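   For illustration, such an option would sit alongside the existing FlinkOptions declarations; a hypothetical sketch follows (the key name and default actually chosen in PR #8394 may differ):

   ```java
   // Hypothetical declaration, following Flink's ConfigOptions builder; not the PR's exact code.
   public static final ConfigOption<Boolean> CLEAN_OFFLINE_ENABLE = ConfigOptions
       .key("clean.offline.enabled")
       .booleanType()
       .defaultValue(false)
       .withDescription("Whether an offline compaction/clustering job also triggers cleaning after it commits");
   ```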
   



##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -358,6 +358,9 @@ private void cluster() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_clustering_" + clusteringInstant.getTimestamp());
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+    writeClient.clean();
+  }

Review Comment:
   CleanFunction#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
   ```
@Override
public void open(Configuration parameters) throws Exception {
  super.open(parameters);
  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

    if (OptionsResolver.isInsertOverwrite(conf)) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
      executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
    }
  }
}
   ```
   I think the real cleaning actually happens in ClusteringCommitSink#doCommit, and 
it depends on whether the CLEAN_ASYNC_ENABLED configuration is enabled.
   
   ```
private void doCommit(String instant, HoodieClusteringPlan clusteringPlan, List<ClusteringCommitEvent> events) {
  List<WriteStatus> statuses = events.stream()
      .map(ClusteringCommitEvent::getWriteStatuses)
      .flatMap(Collection::stream)
      .collect(Collectors.toList());

  HoodieWriteMetadata<List<WriteStatus>> writeMetadata = new HoodieWriteMetadata<>();
  writeMetadata.setWriteStatuses(statuses);
  writeMetadata.setWriteStats(statuses.stream().map(WriteStatus::getStat).collect(Collectors.toList()));
  writeMetadata.setPartitionToReplaceFileIds(getPartitionToReplacedFileIds(clusteringPlan, writeMetadata));
  validateWriteResult(clusteringPlan, instant, writeMetadata);
  if (!writeMetadata.getCommitMetadata().isPresent()) {
    HoodieCommitMetadata commitMetadata = CommitUtils.buildMetadata(
        writeMetadata.getWriteStats().get(),
        writeMetadata.getPartitionToReplaceFileIds(),
        Option.empty(),
        WriteOperationType.CLUSTER,
        this.writeClient.getConfig().getSchema(),
        HoodieTimeline.REPLACE_COMMIT_ACTION);
    writeMetadata.setCommitMetadata(Option.of(commitMetadata));
  }
  // commit the clustering
  this.table.getMetaClient().reloadActiveTimeline();
  this.writeClient.completeTableService(
      TableServiceType.CLUSTER, writeMetadata.getCommitMetadata().get(), table, instant);

  // whether to clean up the input base parquet files used for clustering
  if (!conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    LOG.info("Running inline clean");
    this.writeClient.clean();
  }
}
   ```

[GitHub] [hudi] danny0405 commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-16 Thread via GitHub


danny0405 commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1168123430


##
hudi-client/hudi-flink-client/src/test/java/org/apache/hudi/io/storage/row/parquet/TestParquetSchemaConverter.java:
##
@@ -25,6 +25,7 @@
 
 import static org.hamcrest.CoreMatchers.is;
 import static org.hamcrest.MatcherAssert.assertThat;
+import static org.junit.jupiter.api.Assertions.assertThrows;
 

Review Comment:
   Useless import



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


zhuanshenbsj1 commented on code in PR #8394:
URL: https://github.com/apache/hudi/pull/8394#discussion_r1168119189


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -314,6 +314,9 @@ private void compact() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_compaction_" + String.join(",", 
compactionInstantTimes));
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && 
conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+writeClient.clean();
+  }

Review Comment:
   CleanFunction#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
```
@Override
public void open(Configuration parameters) throws Exception {
  super.open(parameters);
  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

    if (OptionsResolver.isInsertOverwrite(conf)) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
      executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
    }
  }
}
   ```
   
   I think the real cleaning actually happens in CompactionCommitSink#doCommit, and 
it depends on whether the CLEAN_ASYNC_ENABLED configuration is enabled.
   
   ```
private void doCommit(String instant, Collection<CompactionCommitEvent> events) throws IOException {
  List<WriteStatus> statuses = events.stream()
      .map(CompactionCommitEvent::getWriteStatuses)
      .flatMap(Collection::stream)
      .collect(Collectors.toList());

  HoodieCommitMetadata metadata = CompactHelpers.getInstance().createCompactionMetadata(
      table, instant, HoodieListData.eager(statuses), writeClient.getConfig().getSchema());

  // commit the compaction
  this.writeClient.commitCompaction(instant, metadata, Option.empty());

  // Whether to clean up the old log file when compaction
  if (!conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient.clean();
  }
}
   ```
   
   Online async-clean and offline clean use the same config, so I added 
configuration CLEAN_OFFLINE_ENABLE to distinguish it from configuration 
CLEAN_ASYNC_ENABLED.
   



##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -358,6 +358,9 @@ private void cluster() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_clustering_" + clusteringInstant.getTimestamp());
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+    writeClient.clean();
+  }

Review Comment:
   CleanFunction#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
   ```
@Override
public void open(Configuration parameters) throws Exception {
  super.open(parameters);
  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

    if (OptionsResolver.isInsertOverwrite(conf)) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
      executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
    }
  }
}
   ```
   I think the real cleaning actually happens in ClusteringCommitSink#doCommit, and 
it depends on whether the CLEAN_ASYNC_ENABLED configuration is enabled.
   
   ```
private void doCommit(String instant, HoodieClusteringPlan clusteringPlan, List<ClusteringCommitEvent> events) {
  List<WriteStatus> statuses = events.stream()
      .map(ClusteringCommitEvent::getWriteStatuses)
      .flatMap(Collection::stream)
      .collect(Collectors.toList());

  HoodieWriteMetadata<List<WriteStatus>> writeMetadata = new HoodieWriteMetadata<>();
  writeMetadata.setWriteStatuses(statuses);
  writeMetadata.setWriteStats(statuses.stream().map(WriteStatus::getStat).collect(Collectors.toList()));
  writeMetadata.setPartitionToReplaceFileIds(getPartitionToReplacedFileIds(clusteringPlan, writeMetadata));
  validateWriteResult(clusteringPlan, instant, writeMetadata);
  if (!writeMetadata.getCommitMetadata().isPresent()) {
    HoodieCommitMetadata commitMetadata = CommitUtils.buildMetadata(
        writeMetadata.getWriteStats().get(),
        writeMetadata.getPartitionToReplaceFileIds(),
        Option.empty(),
        WriteOperationType.CLUSTER,
        this.writeClient.getConfig().getSchema(),
        HoodieTimeline.REPLACE_COMMIT_ACTION);
    writeMetadata.setCommitMetadata(Option.of(commitMetadata));
  }
  // commit the clustering
  this.table.getMetaClient().reloadActiveTimeline();
  this.writeClient.completeTableService(
      TableServiceType.CLUSTER, writeMetadata.getCommitMetadata().get(), table, instant);

  // whether to clean up the input base parquet files used for clustering
  if (!conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    LOG.info("Running inline clean");
    this.writeClient.clean();
  }
}
   ```

[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


zhuanshenbsj1 commented on code in PR #8394:
URL: https://github.com/apache/hudi/pull/8394#discussion_r1168119189


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -314,6 +314,9 @@ private void compact() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_compaction_" + String.join(",", 
compactionInstantTimes));
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && 
conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+writeClient.clean();
+  }

Review Comment:
   CleanFunction#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
```
@Override
public void open(Configuration parameters) throws Exception {
  super.open(parameters);
  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

    if (OptionsResolver.isInsertOverwrite(conf)) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
      executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
    }
  }
}
   ```
   
   I think the real cleaning actually happens in CompactionCommitSink#doCommit, and 
it depends on whether the CLEAN_ASYNC_ENABLED configuration is enabled.
   
   ```
private void doCommit(String instant, Collection<CompactionCommitEvent> events) throws IOException {
  List<WriteStatus> statuses = events.stream()
      .map(CompactionCommitEvent::getWriteStatuses)
      .flatMap(Collection::stream)
      .collect(Collectors.toList());

  HoodieCommitMetadata metadata = CompactHelpers.getInstance().createCompactionMetadata(
      table, instant, HoodieListData.eager(statuses), writeClient.getConfig().getSchema());

  // commit the compaction
  this.writeClient.commitCompaction(instant, metadata, Option.empty());

  // Whether to clean up the old log file when compaction
  if (!conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient.clean();
  }
}
   ```
   
   So I added configuration CLEAN_OFFLINE_ENABLE to distinguish it from 
configuration CLEAN_ASYNC_ENABLED.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


zhuanshenbsj1 commented on code in PR #8394:
URL: https://github.com/apache/hudi/pull/8394#discussion_r1168119639


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -358,6 +358,9 @@ private void cluster() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_clustering_" + clusteringInstant.getTimestamp());
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+    writeClient.clean();
+  }

Review Comment:
   CleanFunction#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
   ```
@Override
public void open(Configuration parameters) throws Exception {
  super.open(parameters);
  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

    if (OptionsResolver.isInsertOverwrite(conf)) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
      executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
    }
  }
}
   ```
   I think the real cleaning actually happens in ClusteringCommitSink#doCommit, and 
it depends on whether the CLEAN_ASYNC_ENABLED configuration is enabled.
   
   ```
private void doCommit(String instant, HoodieClusteringPlan clusteringPlan, List<ClusteringCommitEvent> events) {
  List<WriteStatus> statuses = events.stream()
      .map(ClusteringCommitEvent::getWriteStatuses)
      .flatMap(Collection::stream)
      .collect(Collectors.toList());

  HoodieWriteMetadata<List<WriteStatus>> writeMetadata = new HoodieWriteMetadata<>();
  writeMetadata.setWriteStatuses(statuses);
  writeMetadata.setWriteStats(statuses.stream().map(WriteStatus::getStat).collect(Collectors.toList()));
  writeMetadata.setPartitionToReplaceFileIds(getPartitionToReplacedFileIds(clusteringPlan, writeMetadata));
  validateWriteResult(clusteringPlan, instant, writeMetadata);
  if (!writeMetadata.getCommitMetadata().isPresent()) {
    HoodieCommitMetadata commitMetadata = CommitUtils.buildMetadata(
        writeMetadata.getWriteStats().get(),
        writeMetadata.getPartitionToReplaceFileIds(),
        Option.empty(),
        WriteOperationType.CLUSTER,
        this.writeClient.getConfig().getSchema(),
        HoodieTimeline.REPLACE_COMMIT_ACTION);
    writeMetadata.setCommitMetadata(Option.of(commitMetadata));
  }
  // commit the clustering
  this.table.getMetaClient().reloadActiveTimeline();
  this.writeClient.completeTableService(
      TableServiceType.CLUSTER, writeMetadata.getCommitMetadata().get(), table, instant);

  // whether to clean up the input base parquet files used for clustering
  if (!conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    LOG.info("Running inline clean");
    this.writeClient.clean();
  }
}
   ```
   So I added configuration CLEAN_OFFLINE_ENABLE to distinguish it from 
configuration CLEAN_ASYNC_ENABLED.



##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -314,6 +314,9 @@ private void compact() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_compaction_" + String.join(",", 
compactionInstantTimes));
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && 
conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+writeClient.clean();
+  }

Review Comment:
   CompactionCommitSink#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
```
@Override
public void open(Configuration parameters) throws Exception {
  super.open(parameters);
  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

    if (OptionsResolver.isInsertOverwrite(conf)) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
      executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
    }
  }
}
   ```
   
   I think the real cleaning actually happens in CompactionCommitSink#doCommit, and 
it depends on whether the CLEAN_ASYNC_ENABLED configuration is enabled.
   
   ```
private void doCommit(String instant, Collection<CompactionCommitEvent> events) throws IOException {
  List<WriteStatus> statuses = events.stream()
      .map(CompactionCommitEvent::getWriteStatuses)
      .flatMap(Collection::stream)
      .collect(Collectors.toList());

  HoodieCommitMetadata metadata = CompactHelpers.getInstance().createCompactionMetadata(
      table, instant, HoodieListData.eager(statuses), writeClient.getConfig().getSchema());

  // commit the compaction
  this.writeClient.commitCompaction(instant, metadata, Option.empty());

  // Whether to clean up the old log file when compaction
  if (!conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient.clean();
  }
}
   ```

[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


zhuanshenbsj1 commented on code in PR #8394:
URL: https://github.com/apache/hudi/pull/8394#discussion_r1168119189


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -314,6 +314,9 @@ private void compact() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_compaction_" + String.join(",", 
compactionInstantTimes));
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && 
conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+writeClient.clean();
+  }

Review Comment:
   CompactionCommitSink#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
```
@Override
public void open(Configuration parameters) throws Exception {
  super.open(parameters);
  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

    if (OptionsResolver.isInsertOverwrite(conf)) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
      executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
    }
  }
}
   ```
   
   I think the real cleaning actually happens in CompactionCommitSink#doCommit, and 
it depends on whether the CLEAN_ASYNC_ENABLED configuration is enabled.
   
   ```
private void doCommit(String instant, Collection<CompactionCommitEvent> events) throws IOException {
  List<WriteStatus> statuses = events.stream()
      .map(CompactionCommitEvent::getWriteStatuses)
      .flatMap(Collection::stream)
      .collect(Collectors.toList());

  HoodieCommitMetadata metadata = CompactHelpers.getInstance().createCompactionMetadata(
      table, instant, HoodieListData.eager(statuses), writeClient.getConfig().getSchema());

  // commit the compaction
  this.writeClient.commitCompaction(instant, metadata, Option.empty());

  // Whether to clean up the old log file when compaction
  if (!conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient.clean();
  }
}
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


zhuanshenbsj1 commented on code in PR #8394:
URL: https://github.com/apache/hudi/pull/8394#discussion_r1168119189


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -314,6 +314,9 @@ private void compact() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_compaction_" + String.join(",", 
compactionInstantTimes));
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && 
conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+writeClient.clean();
+  }

Review Comment:
   CompactionCommitSink#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
```
@Override
public void open(Configuration parameters) throws Exception {
  super.open(parameters);
  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

    if (OptionsResolver.isInsertOverwrite(conf)) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
      executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
    }
  }
}
   ```
   
   I think the real cleaning actually happens in CompactionCommitSink#doCommit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


zhuanshenbsj1 commented on code in PR #8394:
URL: https://github.com/apache/hudi/pull/8394#discussion_r1168119189


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -314,6 +314,9 @@ private void compact() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_compaction_" + String.join(",", 
compactionInstantTimes));
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && 
conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+writeClient.clean();
+  }

Review Comment:
   CompactionCommitSink#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
```
@Override
public void open(Configuration parameters) throws Exception {
  super.open(parameters);
  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

    if (OptionsResolver.isInsertOverwrite(conf)) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
      executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
    }
  }
}
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


zhuanshenbsj1 commented on code in PR #8394:
URL: https://github.com/apache/hudi/pull/8394#discussion_r1168119639


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -358,6 +358,9 @@ private void cluster() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_clustering_" + clusteringInstant.getTimestamp());
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+    writeClient.clean();
+  }

Review Comment:
   CleanFunction#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
   ```
@Override
public void open(Configuration parameters) throws Exception {
  super.open(parameters);
  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

    if (OptionsResolver.isInsertOverwrite(conf)) {
      String instantTime = HoodieActiveTimeline.createNewInstantTime();
      LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
      executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
    }
  }
}
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


zhuanshenbsj1 commented on code in PR #8394:
URL: https://github.com/apache/hudi/pull/8394#discussion_r1168119639


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/HoodieFlinkClusteringJob.java:
##
@@ -358,6 +358,9 @@ private void cluster() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_clustering_" + clusteringInstant.getTimestamp());
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+    writeClient.clean();
+  }

Review Comment:
   CleanFunction#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
    @Override
    public void open(Configuration parameters) throws Exception {
      super.open(parameters);
      if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
        this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
        this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

        if (OptionsResolver.isInsertOverwrite(conf)) {
          String instantTime = HoodieActiveTimeline.createNewInstantTime();
          LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
          executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
        }
      }
    }



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-6085) Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6085:
-
Labels: pull-request-available  (was: )

> Eliminate cleaning tasks for flink mor table if async cleaning is disabled
> --
>
> Key: HUDI-6085
> URL: https://issues.apache.org/jira/browse/HUDI-6085
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8394: [HUDI-6085] Eliminate cleaning tasks for flink mor table if async cleaning is disabled

2023-04-16 Thread via GitHub


zhuanshenbsj1 commented on code in PR #8394:
URL: https://github.com/apache/hudi/pull/8394#discussion_r1168119189


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java:
##
@@ -314,6 +314,9 @@ private void compact() throws Exception {
   .setParallelism(1);
 
   env.execute("flink_hudi_compaction_" + String.join(",", 
compactionInstantTimes));
+  if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED) && 
conf.getBoolean(FlinkOptions.CLEAN_OFFLINE_ENABLE)){
+writeClient.clean();
+  }

Review Comment:
   CompactionCommitSink#open only performs the cleaning when 
OptionsResolver.isInsertOverwrite(conf) is true.
   
    @Override
    public void open(Configuration parameters) throws Exception {
      super.open(parameters);
      if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
        this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
        this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();

        if (OptionsResolver.isInsertOverwrite(conf)) {
          String instantTime = HoodieActiveTimeline.createNewInstantTime();
          LOG.info(String.format("exec sync clean with instant time %s...", instantTime));
          executor.execute(() -> writeClient.clean(instantTime), "wait for sync cleaning finish");
        }
      }
    }



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] coffee34 opened a new issue, #8474: [SUPPORT] Duplicate records caused by misclassification as insert during upsert after spark executor loss

2023-04-16 Thread via GitHub


coffee34 opened a new issue, #8474:
URL: https://github.com/apache/hudi/issues/8474

   **Describe the problem you faced**
   
   I have observed that when upserting existing records to the same partition, 
   an executor loss during the "Building workload profile" stage causes the 
   stage to re-run, and a record may then be misclassified as an insert 
   instead of an update. As a result, two records with the same record key 
   end up in the same partition. This issue only occurs when a Spark executor 
   is lost and the stage re-runs; otherwise there are no duplicates.
   https://user-images.githubusercontent.com/64056509/232363759-65c40f3b-443e-40fe-9c28-eba848616e85.png
   
   My hypothesis is that Hudi encounters an issue when re-running the 
   tagLocation phase, which results in a failure to find the corresponding 
   base file for the record.
   
   This has happened several times in our production environment after executor 
loss, but I have been unable to reproduce it in our staging environment. I have 
modified the Spark configuration to prevent executor loss, so the issue is not 
currently occurring.
   
   I would greatly appreciate it if you could provide some insight into why 
this might happen and any possible solutions or workarounds to address this 
issue.
   
   **Environment Description**
   
   * Hudi version : 0.11.1
   
   * Spark version : 3.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Here is the config used when writing to hudi
   ```
   --write-type upsert
   --load.hudi.record-key   id
   --load.options   {"hoodie.upsert.shuffle.parallelism":200}
   
   DataSourceWriteOptions.ASYNC_COMPACT_ENABLE -> false,
   DataSourceWriteOptions.HIVE_STYLE_PARTITIONING -> true,
   DataSourceWriteOptions.TABLE_TYPE -> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL,
   FileSystemViewStorageConfig.INCREMENTAL_TIMELINE_SYNC_ENABLE -> false,
   HoodieCompactionConfig.CLEANER_FILE_VERSIONS_RETAINED -> 6,
   HoodieCompactionConfig.CLEANER_POLICY -> HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS,
   HoodieCompactionConfig.INLINE_COMPACT -> true,
   HoodiePayloadConfig.EVENT_TIME_FIELD -> Columns.InternalTimestamp,
   HoodiePayloadConfig.ORDERING_FIELD -> Columns.InternalTimestamp,
   HoodieIndexConfig.INDEX_TYPE -> HoodieIndex.IndexType.BLOOM,
   HoodieWriteConfig.MARKERS_TYPE -> MarkerType.DIRECT,
   HoodieWriteConfig.ROLLBACK_USING_MARKERS_ENABLE -> false,
   HoodieWriteCommitCallbackConfig.TURN_CALLBACK_ON -> true
   
   ```
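   
   To check whether a table is affected, one can scan for duplicate record 
   keys per partition. A minimal sketch (an illustration, not from the 
   original report) that groups on Hudi's meta columns, so it works 
   regardless of the record key field; the table path is a placeholder:
   
   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;
   
   import static org.apache.spark.sql.functions.*;
   
   public class FindHudiDuplicates {
     public static void main(String[] args) {
       SparkSession spark = SparkSession.builder().appName("find-hudi-duplicates").getOrCreate();
       // Placeholder path; point this at the affected table.
       Dataset<Row> df = spark.read().format("hudi").load("s3://bucket/path/to/table");
       // Any (partition path, record key) pair seen more than once is a duplicate.
       df.groupBy(col("_hoodie_partition_path"), col("_hoodie_record_key"))
         .agg(count(lit(1)).alias("cnt"))
         .filter(col("cnt").gt(1))
         .show(100, false);
     }
   }
   ```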


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder

2023-04-16 Thread via GitHub


hudi-bot commented on PR #8385:
URL: https://github.com/apache/hudi/pull/8385#issuecomment-1510616319

   
   ## CI report:
   
   * 3874447e48c21cb336f28625e1682b8f229f623c UNKNOWN
   * 6fd073a2061145c3800023590c50f55837a59171 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16299)
 
   * 1cd0db680780d02ff786121f394dccfcd621d37d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #7614:
URL: https://github.com/apache/hudi/pull/7614#issuecomment-1510615128

   
   ## CI report:
   
   * 058ab2703bda207fc9f5861d5e4b865e83ee1b45 UNKNOWN
   * 9b853e4f449343ffae850eae895191fbdfb94b12 UNKNOWN
   * 14afb18aba30b3c4fd323e23e0a6e42e2bac841e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16376)
 
   * d49c1d2e4ce8b6e2cc83da12a74ccf27912013b7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16377)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #7614:
URL: https://github.com/apache/hudi/pull/7614#issuecomment-1510610385

   
   ## CI report:
   
   * 058ab2703bda207fc9f5861d5e4b865e83ee1b45 UNKNOWN
   * 9b853e4f449343ffae850eae895191fbdfb94b12 UNKNOWN
   * 14afb18aba30b3c4fd323e23e0a6e42e2bac841e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16376)
 
   * d49c1d2e4ce8b6e2cc83da12a74ccf27912013b7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] Mulavar commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder

2023-04-16 Thread via GitHub


Mulavar commented on PR #8385:
URL: https://github.com/apache/hudi/pull/8385#issuecomment-1510607484

   @bvaradar Yeah, I agree that we need to add some test cases to verify the 
   upgrader/downgrader, but I'm not sure there's a need to run a subsequent 
   compaction, since we do not change anything about the compaction plan 
   content. It may be better to verify that we can still read the compaction 
   plan from the .hoodie folder after deleting compaction.requested from the 
   .aux folder. The upgrade path is already covered by 
   org.apache.hudi.common.table.timeline.TestHoodieActiveTimeline#testLoadingInstantsFromFiles,
   and we could add some tests to verify SixToFiveDowngradeHandler. WDYT?
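   
   A minimal sketch of that verification (a sketch under assumptions, not the 
   actual test: JUnit 5, a table at a placeholder `basePath` with a pending 
   compaction; CompactionUtils.getCompactionPlan reads the plan through the 
   timeline):
   
   ```java
   import org.apache.hadoop.fs.Path;
   import org.apache.hudi.avro.model.HoodieCompactionPlan;
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   import org.apache.hudi.common.table.timeline.HoodieInstant;
   import org.apache.hudi.common.util.CompactionUtils;
   
   import static org.junit.jupiter.api.Assertions.assertNotNull;
   
   // ... inside a test that has scheduled a compaction on the table at basePath ...
   HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
       .setConf(hadoopConf)   // placeholder Hadoop configuration
       .setBasePath(basePath) // placeholder table base path
       .build();
   HoodieInstant requested = metaClient.getActiveTimeline()
       .filterPendingCompactionTimeline()
       .filter(i -> i.getState() == HoodieInstant.State.REQUESTED)
       .firstInstant().get();
   // Delete the plan copy under .aux ...
   metaClient.getFs().delete(new Path(metaClient.getMetaAuxiliaryPath(), requested.getFileName()), false);
   // ... then assert the plan is still readable from the .hoodie timeline.
   HoodieCompactionPlan plan = CompactionUtils.getCompactionPlan(metaClient, requested.getTimestamp());
   assertNotNull(plan);
   ```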


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] Zouxxyy closed pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

2023-04-16 Thread via GitHub


Zouxxyy closed pull request #8456: [HUDI-6078] Make clean controlled by 
parameter in flink
URL: https://github.com/apache/hudi/pull/8456


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2023-04-16 Thread Henri Yandell (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712848#comment-17712848
 ] 

Henri Yandell commented on HUDI-662:


Noting that 
[https://repo1.maven.org/maven2/org/apache/hudi/hudi-trino-bundle/0.13.0/] 
appears to include an openjdk.jol jol-core dependency under GPL-2.0 WITH 
Classpath-exception-2.0. You should remove this or open a LEGAL Jira item to 
discuss it.

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: dependencies, Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> [https://www.apache.org/legal/resolved.html] is the comprehensive guide here.
>  [http://www.apache.org/dev/licensing-howto.html] is the comprehensive guide 
> here.
> [http://www.apache.org/legal/src-headers.html] also 
>  
> Previously, we asked about some specific dependencies here
>  https://issues.apache.org/jira/browse/LEGAL-461



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-5298) Optimize WriteStatus storing HoodieRecord

2023-04-16 Thread xiaochen zhou (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712845#comment-17712845
 ] 

xiaochen zhou commented on HUDI-5298:
-

I have opened the PR; please help review it. Thanks a lot.

https://github.com/apache/hudi/pull/8472
 

> Optimize WriteStatus storing HoodieRecord
> -
>
> Key: HUDI-5298
> URL: https://issues.apache.org/jira/browse/HUDI-5298
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available
>
> WriteStatus stores the entire HoodieRecord. We can optimize it to store just 
> the required info (record key, partition path, location).
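
A hypothetical slim holder illustrating the idea (a sketch; the class and field 
names are assumptions, not the actual Hudi types, except HoodieRecordLocation, 
which is an existing Hudi class):

```java
import java.io.Serializable;

import org.apache.hudi.common.model.HoodieRecordLocation;

// Keeps only what WriteStatus consumers need, instead of the full HoodieRecord.
public final class WrittenRecordDelegate implements Serializable {
  private final String recordKey;
  private final String partitionPath;
  private final HoodieRecordLocation newLocation;

  public WrittenRecordDelegate(String recordKey, String partitionPath, HoodieRecordLocation newLocation) {
    this.recordKey = recordKey;
    this.partitionPath = partitionPath;
    this.newLocation = newLocation;
  }

  public String getRecordKey() { return recordKey; }
  public String getPartitionPath() { return partitionPath; }
  public HoodieRecordLocation getNewLocation() { return newLocation; }
}
```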



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-6035) Make simple index parallelism auto inferred

2023-04-16 Thread xiaochen zhou (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712844#comment-17712844
 ] 

xiaochen zhou commented on HUDI-6035:
-

Thanks for your reply. I have opened the PR; please help review it. Thanks a 
lot. https://github.com/apache/hudi/pull/8468

> Make simple index parallelism auto inferred
> ---
>
> Key: HUDI-6035
> URL: https://issues.apache.org/jira/browse/HUDI-6035
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: index
>Reporter: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
>
> simple index parallelism still defaults to 100, while bloom index parallelism 
> is auto-inferred from input partitions (default 0). We should fix simple 
> index parallelism the same way, as it's the default index type.
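
A workaround sketch until auto-inference lands (assumptions: 
hoodie.simple.index.parallelism is the real config key; table name, record key 
field, and paths are placeholders):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SimpleIndexParallelismWorkaround {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("simple-index-parallelism").getOrCreate();
    Dataset<Row> df = spark.read().parquet("s3://bucket/input"); // placeholder input
    df.write().format("hudi")
        .option("hoodie.table.name", "my_table")                 // placeholder
        .option("hoodie.datasource.write.recordkey.field", "id") // placeholder
        .option("hoodie.index.type", "SIMPLE")
        // Pin the index parallelism to the input's partition count
        // instead of relying on the static default of 100.
        .option("hoodie.simple.index.parallelism", String.valueOf(df.javaRDD().getNumPartitions()))
        .mode(SaveMode.Append)
        .save("s3://bucket/hudi/my_table");                      // placeholder
  }
}
```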



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] bvaradar commented on a diff in pull request #8378: [HUDI-6031] fix bug: checkpoint lost after changing cow to mor

2023-04-16 Thread via GitHub


bvaradar commented on code in PR #8378:
URL: https://github.com/apache/hudi/pull/8378#discussion_r1167991935


##
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java:
##
@@ -2604,6 +2605,59 @@ public void testForceEmptyMetaSync() throws Exception {
     assertTrue(hiveClient.tableExists(tableName), "Table " + tableName + " should exist");
   }
 
+  @Test
+  public void testResumeCheckpointAfterChangingCOW2MOR() throws Exception {
+    String tableBasePath = basePath + "/test_resume_checkpoint_after_changing_cow_to_mor";
+    // default table type is COW
+    HoodieDeltaStreamer.Config cfg = TestHelpers.makeConfig(tableBasePath, WriteOperationType.BULK_INSERT);
+    new HoodieDeltaStreamer(cfg, jsc).sync();
+    TestHelpers.assertRecordCount(1000, tableBasePath, sqlContext);
+    TestHelpers.assertCommitMetadata("0", tableBasePath, fs, 1);
+    TestHelpers.assertAtLeastNCommits(1, tableBasePath, fs);
+
+    // change cow to mor
+    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
+        .setConf(new Configuration(fs.getConf()))
+        .setBasePath(cfg.targetBasePath)
+        .setLoadActiveTimelineOnLoad(false)
+        .build();
+    Properties hoodieProps = new Properties();
+    hoodieProps.load(fs.open(new Path(cfg.targetBasePath + "/.hoodie/hoodie.properties")));
+    LOG.info("old props: {}", hoodieProps);
+    hoodieProps.put("hoodie.table.type", HoodieTableType.MERGE_ON_READ.name());
+    LOG.info("new props: {}", hoodieProps);
+    Path metaPathDir = new Path(metaClient.getBasePathV2(), METAFOLDER_NAME);
+    HoodieTableConfig.create(metaClient.getFs(), metaPathDir, hoodieProps);
+
+    // continue deltastreamer
+    cfg = TestHelpers.makeConfig(tableBasePath, WriteOperationType.UPSERT);
+    cfg.tableType = HoodieTableType.MERGE_ON_READ.name();
+    new HoodieDeltaStreamer(cfg, jsc).sync();
+    // out of 1000 new records, 500 are inserts, 450 are updates and 50 are deletes.

Review Comment:
   Does this test fail with current master? Can you please provide the 
   assertion failure with current master?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] bvaradar commented on a diff in pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder

2023-04-16 Thread via GitHub


bvaradar commented on code in PR #8385:
URL: https://github.com/apache/hudi/pull/8385#discussion_r1167981665


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngrade.java:
##
@@ -87,7 +87,10 @@ public boolean needsUpgradeOrDowngrade(HoodieTableVersion toVersion) {
    * pre 0.6.0 -> v0
    * 0.6.0 to 0.8.0 -> v1
    * 0.9.0 -> v2
-   * 0.10.0 to current -> v3
+   * 0.10.0 -> v3

Review Comment:
   Thanks for updating this.



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToSixUpgradeHandler.java:
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieUpgradeDowngradeException;
+import org.apache.hudi.table.HoodieTable;
+
+import java.io.IOException;
+import java.util.Map;
+
+/**
+ * Upgrade handle to assist in upgrading hoodie table from version 5 to 6.
+ */
+public class FiveToSixUpgradeHandler implements UpgradeHandler {
+
+  @Override
+  public Map<ConfigProperty, String> upgrade(HoodieWriteConfig config, HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
+    HoodieTable table = upgradeDowngradeHelper.getTable(config, context);
+    HoodieTableMetaClient metaClient = table.getMetaClient();
+    // delete compaction requested file from .aux
+    HoodieTimeline compactionTimeline = metaClient.getActiveTimeline().filterPendingCompactionTimeline()
+        .filter(instant -> instant.getState() == HoodieInstant.State.REQUESTED);
+    compactionTimeline.getInstants().stream().forEach(instant -> {
+      String fileName = instant.getFileName();
+      try {
+        metaClient.getFs().delete(new Path(metaClient.getMetaAuxiliaryPath(), fileName));

Review Comment:
   Yes, good point. We should only clean up the compaction plans in the .aux 
   folder and remove the compaction-only callers.
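   
   A sketch of that narrowing (an assumption about the eventual fix, not the 
   merged code): delete only the compaction plan files actually present under 
   .aux, leaving everything else in the folder alone.
   
   ```java
   compactionTimeline.getInstants().stream().forEach(instant -> {
     Path auxPlan = new Path(metaClient.getMetaAuxiliaryPath(), instant.getFileName());
     try {
       // Guarding on existence keeps the upgrade step idempotent and avoids
       // touching anything other than compaction plans.
       if (metaClient.getFs().exists(auxPlan)) {
         metaClient.getFs().delete(auxPlan, false);
       }
     } catch (IOException e) {
       throw new HoodieUpgradeDowngradeException("Failed to delete compaction plan from .aux: " + auxPlan, e);
     }
   });
   ```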



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] melin opened a new issue, #8473: [SUPPORT] Spark 3.4.0

2023-04-16 Thread via GitHub


melin opened a new issue, #8473:
URL: https://github.com/apache/hudi/issues/8473

   https://spark.apache.org/releases/spark-release-3-4-0.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #7614:
URL: https://github.com/apache/hudi/pull/7614#issuecomment-1510383594

   
   ## CI report:
   
   * 058ab2703bda207fc9f5861d5e4b865e83ee1b45 UNKNOWN
   * 9b853e4f449343ffae850eae895191fbdfb94b12 UNKNOWN
   * 14afb18aba30b3c4fd323e23e0a6e42e2bac841e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16376)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] ad1happy2go commented on issue #8447: [SUPPORT] Docker Demo Issue With Current master(0.14.0-SNAPSHOT)

2023-04-16 Thread via GitHub


ad1happy2go commented on issue #8447:
URL: https://github.com/apache/hudi/issues/8447#issuecomment-1510379843

   @agrawalreetika Is there any special requirement that you need to use 
   master? Can you please try the latest stable version, 0.13.0?
   
   Actually, the Docker demo given here 
   (https://hudi.apache.org/docs/docker_demo/) is tested only on that version.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] ad1happy2go commented on issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

2023-04-16 Thread via GitHub


ad1happy2go commented on issue #8436:
URL: https://github.com/apache/hudi/issues/8436#issuecomment-1510378693

   @alexone95 Just checking: does the above command work for you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] chenbodeng719 commented on issue #8060: [SUPPORT] An instant exception occurs when the flink job is restarted

2023-04-16 Thread via GitHub


chenbodeng719 commented on issue #8060:
URL: https://github.com/apache/hudi/issues/8060#issuecomment-1510352656

   > I found that the task resumed after the task checkpointed three times; 
   > I'll check the detailed task log later.
   > 
   > ![5317ab0d6be939bd610a2129f97f750](https://user-images.githubusercontent.com/30795397/222034915-5375709e-12a3-4388-ac63-cc0f135a70be.png)
   
   I think this is because Hudi fixes up the MOR table log file that already 
   exists.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #7614:
URL: https://github.com/apache/hudi/pull/7614#issuecomment-1510343733

   
   ## CI report:
   
   * 058ab2703bda207fc9f5861d5e4b865e83ee1b45 UNKNOWN
   * 9b853e4f449343ffae850eae895191fbdfb94b12 UNKNOWN
   * 1da0ef054a9905a0ba789324ee529456858956ff Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16368)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16369)
 
   * 14afb18aba30b3c4fd323e23e0a6e42e2bac841e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16376)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-04-16 Thread via GitHub


hudi-bot commented on PR #7614:
URL: https://github.com/apache/hudi/pull/7614#issuecomment-1510333712

   
   ## CI report:
   
   * 058ab2703bda207fc9f5861d5e4b865e83ee1b45 UNKNOWN
   * 9b853e4f449343ffae850eae895191fbdfb94b12 UNKNOWN
   * 1da0ef054a9905a0ba789324ee529456858956ff Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16368)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16369)
 
   * 14afb18aba30b3c4fd323e23e0a6e42e2bac841e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org