[GitHub] [hudi] hudi-bot commented on pull request #9472: [HUDI-6719]Fix data inconsistency issues caused by concurrent clustering and delete partition.

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9472:
URL: https://github.com/apache/hudi/pull/9472#issuecomment-1692866533

   
   ## CI report:
   
   * d403d1696e76ee6daf6ad6ddc62e9a218a5accab Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19417) 
   * 1264a3a18c712b6ab80e7256d628c46338062433 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19460) 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] majian1998 commented on a diff in pull request #9472: [HUDI-6719]Fix data inconsistency issues caused by concurrent clustering and delete partition.

2023-08-24 Thread via GitHub


majian1998 commented on code in PR #9472:
URL: https://github.com/apache/hudi/pull/9472#discussion_r1305253669


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/ReplaceCommitValidateUtil.java:
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.action.cluster;
+
+import org.apache.hudi.common.model.HoodieReplaceCommitMetadata;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.JsonUtils;
+import org.apache.hudi.exception.HoodieException;
+
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.util.FileIOUtils.LOG;
+
+public class ReplaceCommitValidateUtil {
+  public static final String REPLACE_COMMIT_FILE_IDS = "replaceCommitFileIds";
+  public static void validateReplaceCommit(HoodieTableMetaClient metaClient) {
+    metaClient.reloadActiveTimeline();
+    Set<String> replaceFileids = new HashSet<>();
+
+    // Verify pending and completed replace commit
+    Stream.concat(metaClient.getActiveTimeline().getCompletedReplaceTimeline().getInstantsAsStream(),
+        metaClient.getActiveTimeline().filterInflights().filterPendingReplaceTimeline().getInstantsAsStream()).map(instant -> {
+      try {
+        HoodieReplaceCommitMetadata replaceCommitMetadata =
+            HoodieReplaceCommitMetadata.fromBytes(metaClient.getActiveTimeline().getInstantDetails(instant).get(),
+                HoodieReplaceCommitMetadata.class);
+        if (!instant.isCompleted()) {
+          return JsonUtils.getObjectMapper().readValue(replaceCommitMetadata.getExtraMetadata().get(REPLACE_COMMIT_FILE_IDS),
+              String[].class);
+        } else {
+          return replaceCommitMetadata.getPartitionToReplaceFileIds().values().stream()
+              .flatMap(List::stream)
+              .toArray(String[]::new);
+        }
+      } catch (Exception e) {
+        // If the key does not exist or there is a JSON parsing error, LOG reports an error and ignores it.
+        LOG.error("Error when reading REPLACECOMMIT meta", e);
+        return null;

Review Comment:
   In other words, I expect this validation never to terminate the program unless it finds duplicate file IDs.
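The policy described in this comment can be sketched without any Hudi dependencies: unreadable replace metadata is logged and skipped, and only a repeated file ID aborts. This is a hypothetical illustration, not the PR's code. Each replace instant is modeled as a `Supplier<String[]>` that may throw when its metadata cannot be read, and `ReplaceFileIdCheck` and `DuplicateFileIdException` are made-up names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch of the "best-effort" validation policy: unreadable
// replace metadata is logged and skipped; only a duplicate file ID aborts.
final class ReplaceFileIdCheck {

  // Raised only when two replace instants claim the same file ID.
  static final class DuplicateFileIdException extends RuntimeException {
    DuplicateFileIdException(String fileId) {
      super("Replace commit involves duplicate file ID: " + fileId);
    }
  }

  private ReplaceFileIdCheck() {
  }

  // Each supplier stands in for "read the file IDs of one replace instant" and
  // may throw if the metadata is missing or cannot be parsed.
  static Set<String> validate(List<Supplier<String[]>> instants) {
    Set<String> seen = new HashSet<>();
    for (Supplier<String[]> instant : instants) {
      String[] fileIds;
      try {
        fileIds = instant.get();
      } catch (RuntimeException e) {
        // Best effort: report the unreadable instant and move on.
        System.err.println("Skipping unreadable replace metadata: " + e.getMessage());
        continue;
      }
      if (fileIds == null) {
        continue; // nothing recorded for this instant
      }
      for (String fileId : fileIds) {
        // Set.add returns false if the file ID was already claimed.
        if (!seen.add(fileId)) {
          throw new DuplicateFileIdException(fileId);
        }
      }
    }
    return seen;
  }
}
```

The duplicate check relies on `Set.add` returning `false` for an element that is already present. This is the same idiom the diff uses in `.filter(fileId -> !replaceFileids.add(fileId))`.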






[GitHub] [hudi] hudi-bot commented on pull request #9472: [HUDI-6719]Fix data inconsistency issues caused by concurrent clustering and delete partition.

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9472:
URL: https://github.com/apache/hudi/pull/9472#issuecomment-1692857813

   
   ## CI report:
   
   * d403d1696e76ee6daf6ad6ddc62e9a218a5accab Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19417) 
   * 1264a3a18c712b6ab80e7256d628c46338062433 UNKNOWN
   
   
   





[GitHub] [hudi] majian1998 commented on a diff in pull request #9472: [HUDI-6719]Fix data inconsistency issues caused by concurrent clustering and delete partition.

2023-08-24 Thread via GitHub


majian1998 commented on code in PR #9472:
URL: https://github.com/apache/hudi/pull/9472#discussion_r1305249985


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/ReplaceCommitValidateUtil.java:
##
@@ -0,0 +1,68 @@
+package org.apache.hudi.table.action.cluster;
+
+import org.apache.hudi.common.model.HoodieReplaceCommitMetadata;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.JsonUtils;
+import org.apache.hudi.exception.HoodieException;
+
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.util.FileIOUtils.LOG;
+
+public class ReplaceCommitValidateUtil {
+  public static final String REPLACE_COMMIT_FILE_IDS = "replaceCommitFileIds";
+  public static void validateReplaceCommit(HoodieTableMetaClient metaClient) {
+    metaClient.reloadActiveTimeline();
+    Set<String> replaceFileids = new HashSet<>();
+
+    // Verify pending and completed replace commit
+    Stream.concat(metaClient.getActiveTimeline().getCompletedReplaceTimeline().getInstantsAsStream(),
+        metaClient.getActiveTimeline().filterInflights().filterPendingReplaceTimeline().getInstantsAsStream()).map(instant -> {
+      try {
+        HoodieReplaceCommitMetadata replaceCommitMetadata =
+            HoodieReplaceCommitMetadata.fromBytes(metaClient.getActiveTimeline().getInstantDetails(instant).get(),
+                HoodieReplaceCommitMetadata.class);
+        if (!instant.isCompleted()) {
+          return JsonUtils.getObjectMapper().readValue(replaceCommitMetadata.getExtraMetadata().get(REPLACE_COMMIT_FILE_IDS),
+              String[].class);
+        } else {
+          return replaceCommitMetadata.getPartitionToReplaceFileIds().values().stream()
+              .flatMap(List::stream)
+              .toArray(String[]::new);
+        }
+      } catch (Exception e) {
+        // If the key does not exist or there is a JSON parsing error, LOG reports an error and ignores it.
+        LOG.error("Error when reading REPLACECOMMIT meta", e);
+        return null;

Review Comment:
   The only issue observed so far is that the key `REPLACE_COMMIT_FILE_IDS` has no value in the inflight metadata, which makes the code throw. Some test scenarios do not create the inflight REPLACECOMMIT through `table.cluster`. Instead, they call `transitionReplaceRequestedToInflight` directly, so `REPLACE_COMMIT_FILE_IDS` is never written into the inflight metadata and the lookup fails. No other issues have been identified so far.
   I catch this exception instead of terminating the program because this is an additional validation step. I don't want it to affect the original execution logic unless the validation itself fails. For the same reason, any other unexpected exception is also skipped here. Happy to discuss how this situation could be handled better.  ^_^
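One way to avoid relying on a catch-all `catch (Exception e)` for the missing-key case described here is to check for the key explicitly before parsing. The following is a minimal sketch under stated assumptions. `extraMetadata` stands in for the map returned by `getExtraMetadata()` in the diff. The JSON deserialization that the diff performs with `JsonUtils.getObjectMapper().readValue(...)` is replaced by a pluggable `Function<String, String[]>`. `InflightFileIds` is a hypothetical name.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: distinguish "key absent" from "value unparsable" instead of
// folding both into one catch-all exception handler.
final class InflightFileIds {
  // Same key name as the constant in the diff under review.
  static final String REPLACE_COMMIT_FILE_IDS = "replaceCommitFileIds";

  private InflightFileIds() {
  }

  // `extraMetadata` stands in for the instant's extra metadata map;
  // `parser` stands in for the JSON deserialization step.
  static Optional<String[]> read(Map<String, String> extraMetadata,
                                 Function<String, String[]> parser) {
    String raw = extraMetadata == null ? null : extraMetadata.get(REPLACE_COMMIT_FILE_IDS);
    if (raw == null || raw.isEmpty()) {
      // Key absent (e.g. inflight instant not written by table.cluster): nothing to check.
      return Optional.empty();
    }
    // Parse errors are deliberately not caught here; the caller chooses the policy.
    return Optional.of(parser.apply(raw));
  }
}
```

With this split, an absent key, such as an inflight instant created directly through `transitionReplaceRequestedToInflight`, yields `Optional.empty()`. A genuinely malformed value still surfaces as an exception, and the caller can decide how to treat it.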






[GitHub] [hudi] hudi-bot commented on pull request #9533: [HUDI-6445] Fixing metrics in tests

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9533:
URL: https://github.com/apache/hudi/pull/9533#issuecomment-1692849353

   
   ## CI report:
   
   * 78ba0da62104725400b6aec04775f1e2c7dfd113 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19459) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19455) 
   
   
   





[GitHub] [hudi] voonhous commented on a diff in pull request #9472: [HUDI-6719]Fix data inconsistency issues caused by concurrent clustering and delete partition.

2023-08-24 Thread via GitHub


voonhous commented on code in PR #9472:
URL: https://github.com/apache/hudi/pull/9472#discussion_r1305231230


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/ReplaceCommitValidateUtil.java:
##
@@ -0,0 +1,68 @@
+package org.apache.hudi.table.action.cluster;
+
+import org.apache.hudi.common.model.HoodieReplaceCommitMetadata;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.JsonUtils;
+import org.apache.hudi.exception.HoodieException;
+
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.util.FileIOUtils.LOG;
+
+public class ReplaceCommitValidateUtil {
+  public static final String REPLACE_COMMIT_FILE_IDS = "replaceCommitFileIds";
+  public static void validateReplaceCommit(HoodieTableMetaClient metaClient) {
+    metaClient.reloadActiveTimeline();
+    Set<String> replaceFileids = new HashSet<>();
+
+    // Verify pending and completed replace commit
+    Stream.concat(metaClient.getActiveTimeline().getCompletedReplaceTimeline().getInstantsAsStream(),
+        metaClient.getActiveTimeline().filterInflights().filterPendingReplaceTimeline().getInstantsAsStream()).map(instant -> {
+      try {
+        HoodieReplaceCommitMetadata replaceCommitMetadata =
+            HoodieReplaceCommitMetadata.fromBytes(metaClient.getActiveTimeline().getInstantDetails(instant).get(),
+                HoodieReplaceCommitMetadata.class);
+        if (!instant.isCompleted()) {
+          return JsonUtils.getObjectMapper().readValue(replaceCommitMetadata.getExtraMetadata().get(REPLACE_COMMIT_FILE_IDS),
+              String[].class);
+        } else {
+          return replaceCommitMetadata.getPartitionToReplaceFileIds().values().stream()
+              .flatMap(List::stream)
+              .toArray(String[]::new);
+        }
+      } catch (Exception e) {
+        // If the key does not exist or there is a JSON parsing error, LOG reports an error and ignores it.
+        LOG.error("Error when reading REPLACECOMMIT meta", e);
+        return null;

Review Comment:
   In what cases are we not able to read the REPLACECOMMIT meta here? If we are 
not able to read the meta here, is it safe to proceed with the REPLACECOMMIT 
that the current writer is performing?
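A policy switch makes the trade-off raised in this question explicit. This is a hypothetical sketch: the `OnUnreadable` enum and the `ReplaceValidationPolicy` class are illustrative names, not Hudi APIs. `SKIP` keeps the best-effort behavior of the current diff. `FAIL` refuses to proceed when any replace commit's metadata cannot be read, because in that case the duplicate check is incomplete.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch contrasting fail-open and fail-closed handling of
// replace-commit metadata that cannot be read.
final class ReplaceValidationPolicy {
  enum OnUnreadable { SKIP, FAIL }

  private ReplaceValidationPolicy() {
  }

  static void validate(List<Supplier<String[]>> instants, OnUnreadable policy) {
    Set<String> seen = new HashSet<>();
    for (Supplier<String[]> instant : instants) {
      String[] fileIds;
      try {
        fileIds = instant.get();
      } catch (RuntimeException e) {
        if (policy == OnUnreadable.FAIL) {
          // Fail closed: without this instant's file IDs the duplicate check is incomplete.
          throw new IllegalStateException("Cannot read replace commit metadata", e);
        }
        continue; // Fail open: same best-effort behavior as the diff under review.
      }
      if (fileIds == null) {
        continue; // treat a null result as "no file IDs recorded"
      }
      for (String fileId : fileIds) {
        if (!seen.add(fileId)) {
          throw new IllegalStateException("Duplicate file ID across replace commits: " + fileId);
        }
      }
    }
  }
}
```

Failing closed is the conservative choice for a writer about to commit a REPLACECOMMIT. The cost is that incomplete metadata from other writers, such as the test-only path described earlier in this thread, would block the operation.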






[GitHub] [hudi] majian1998 commented on a diff in pull request #9472: [HUDI-6719]Fix data inconsistency issues caused by concurrent clustering and delete partition.

2023-08-24 Thread via GitHub


majian1998 commented on code in PR #9472:
URL: https://github.com/apache/hudi/pull/9472#discussion_r1305217059


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/ReplaceCommitValidateUtil.java:
##
@@ -0,0 +1,70 @@
+package org.apache.hudi.table.action.cluster;
+
+import org.apache.hudi.common.model.HoodieReplaceCommitMetadata;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.JsonUtils;
+import org.apache.hudi.exception.HoodieException;
+
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.util.FileIOUtils.LOG;
+
+public class ReplaceCommitValidateUtil {
+  public static final String REPLACE_COMMIT_FILE_IDS = "replaceCommitFileIds";
+  public static void validateReplaceCommit(HoodieTableMetaClient metaClient) {
+    metaClient.reloadActiveTimeline();
+    Set<String> replaceFileids = new HashSet<>();
+
+    // Verify pending and completed replace commit
+    Stream.concat(metaClient.getActiveTimeline().getCompletedReplaceTimeline().getInstantsAsStream(),
+        metaClient.getActiveTimeline().filterInflights().filterPendingReplaceTimeline().getInstantsAsStream()).map(instant -> {
+      try {
+        HoodieReplaceCommitMetadata replaceCommitMetadata =
+            HoodieReplaceCommitMetadata.fromBytes(metaClient.getActiveTimeline().getInstantDetails(instant).get(),
+                HoodieReplaceCommitMetadata.class);
+        if (!instant.isCompleted()) {
+          return JsonUtils.getObjectMapper().readValue(replaceCommitMetadata.getExtraMetadata().get(REPLACE_COMMIT_FILE_IDS),
+              String[].class);
+        } else {
+          return replaceCommitMetadata.getPartitionToReplaceFileIds().values().stream()
+              .flatMap(List::stream)
+              .toArray(String[]::new);
+        }
+      } catch (Exception e) {
+        // If the key does not exist or there is a JSON parsing error, LOG reports an error and ignores it.
+        LOG.error("Error when reading replace commit meta", e);
+        return null;
+      }
+    }).filter(Objects::nonNull)
+        .forEach(fileIdArray -> {
+          Arrays.stream(fileIdArray)
+              .filter(fileId -> !replaceFileids.add(fileId))
+              .findFirst()
+              .ifPresent(s -> {
+                throw new HoodieException("Replace commit involves duplicate file ID, which will cause the table status to be abnormal. The replace commit has been terminated.");

Review Comment:
   done~






[jira] [Updated] (HUDI-6753) Fix parquet inline reading flaky test

2023-08-24 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-6753:
--
Description: 
We occasionally see flakiness around parquet inline reading.

 

Ref: 
[https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/19457/logs/8]

 

 
{code:java}
2023-08-25T05:00:14.1359469Z 1389627 [Executor task launch worker for task 1.0 in stage 4124.0 (TID 5621)] ERROR org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader [] - Got exception when reading log file
2023-08-25T05:00:14.1360427Z org.apache.hudi.exception.HoodieException: unable to read next record from parquet file 
2023-08-25T05:00:14.1361525Z    at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:54) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1362403Z    at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1363340Z    at org.apache.hudi.common.util.collection.MappingIterator.hasNext(MappingIterator.java:39) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1364854Z    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:625) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1365985Z    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:667) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1367473Z    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternalV1(AbstractHoodieLogRecordReader.java:362) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1368371Z    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:220) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1369127Z    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:201) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1369901Z    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:117) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1370633Z    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:76) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1371380Z    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:466) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1372312Z    at org.apache.hudi.LogFileIterator$.scanLog(Iterators.scala:371) ~[hudi-spark-common_2.12-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1372915Z    at org.apache.hudi.LogFileIterator.<init>(Iterators.scala:110) ~[hudi-spark-common_2.12-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1373549Z    at org.apache.hudi.RecordMergingFileIterator.<init>(Iterators.scala:201) ~[hudi-spark-common_2.12-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1374172Z    at org.apache.hudi.RecordMergingFileIterator.<init>(Iterators.scala:212) ~[hudi-spark-common_2.12-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1374809Z    at org.apache.hudi.RecordMergingFileIterator.<init>(Iterators.scala:217) ~[hudi-spark-common_2.12-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1375480Z    at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:109) ~[hudi-spark-common_2.12-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-25T05:00:14.1376156Z    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) ~[spark-core_2.12-3.2.3.jar:3.2.3]
2023-08-25T05:00:14.1376653Z    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) ~[spark-core_2.12-3.2.3.jar:3.2.3]
2023-08-25T05:00:14.1377283Z    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.2.3.jar:3.2.3]
2023-08-25T05:00:14.1377837Z    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) ~[spark-core_2.12-3.2.3.jar:3.2.3]
2023-08-25T05:00:14.1378323Z    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) ~[spark-core_2.12-3.2.3.jar:3.2.3]
2023-08-25T05:00:14.1378855Z    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.2.3.jar:3.2.3]
2023-08-25T05:00:14.1379397Z    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) ~[spark-core_2.12-3.2.3.jar:3.2.3]
2023-08-25T05:00:14.1379899Z    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) ~[spark-core_2.12-3.2.3.jar:3.2.3]
2023-08-25T05:00:14.1380446Z    at org.apache.spark.shuffle.ShuffleWritePr

[jira] [Created] (HUDI-6754) Fix NullPointerException w/ AbstractRealTimRecordReader

2023-08-24 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-6754:
-

 Summary: Fix NullPointerException w/ AbstractRealTimRecordReader
 Key: HUDI-6754
 URL: https://issues.apache.org/jira/browse/HUDI-6754
 Project: Apache Hudi
  Issue Type: Bug
  Components: reader-core
Reporter: sivabalan narayanan


Recently, we have been seeing NPEs in AbstractRealtimeRecordReader during our CI test runs. We need to track them down and put in a fix.

Ref: 
[https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/19420/logs/35]

 
{code:java}
2023-08-23T08:16:08.7070674Z [INFO] Running org.apache.hudi.hadoop.realtime.TestHoodieMergeOnReadSnapshotReader
2023-08-23T08:16:09.4083756Z 10259 [main] WARN  org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader [] - fall to init HiveAvroSerializer to support payload merge
2023-08-23T08:16:09.4084752Z java.lang.NullPointerException: null
2023-08-23T08:16:09.4085512Z    at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.prepareHiveAvroSerializer(AbstractRealtimeRecordReader.java:116) ~[classes/:?]
2023-08-23T08:16:09.4086265Z    at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:103) ~[classes/:?]
2023-08-23T08:16:09.4087168Z    at org.apache.hudi.hadoop.realtime.HoodieMergeOnReadSnapshotReader.<init>(HoodieMergeOnReadSnapshotReader.java:106) ~[classes/:?]
2023-08-23T08:16:09.4087997Z    at org.apache.hudi.hadoop.realtime.TestHoodieMergeOnReadSnapshotReader.lambda$testReaderInternal$0(TestHoodieMergeOnReadSnapshotReader.java:162) ~[test-classes/:?]
2023-08-23T08:16:09.4088379Z    at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_382]
2023-08-23T08:16:09.4089029Z    at org.apache.hudi.hadoop.realtime.TestHoodieMergeOnReadSnapshotReader.testReaderInternal(TestHoodieMergeOnReadSnapshotReader.java:131) ~[test-classes/:?]
2023-08-23T08:16:09.4089707Z    at org.apache.hudi.hadoop.realtime.TestHoodieMergeOnReadSnapshotReader.testSnapshotReaderPartitioned(TestHoodieMergeOnReadSnapshotReader.java:102) ~[test-classes/:?]
2023-08-23T08:16:09.4090092Z    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_382]
2023-08-23T08:16:09.4090857Z    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_382]
2023-08-23T08:16:09.4091226Z    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_382]
2023-08-23T08:16:09.4091546Z    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_382]
2023-08-23T08:16:09.4092052Z    at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) ~[junit-platform-commons-1.7.2.jar:1.7.2]
2023-08-23T08:16:09.4092633Z    at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) ~[junit-jupiter-engine-5.7.2.jar:5.7.2]
2023-08-23T08:16:09.4093293Z    at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) ~[junit-jupiter-engine-5.7.2.jar:5.7.2]
2023-08-23T08:16:09.4093917Z    at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) ~[junit-jupiter-engine-5.7.2.jar:5.7.2]
2023-08-23T08:16:09.4094722Z    at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) ~[junit-jupiter-engine-5.7.2.jar:5.7.2]
2023-08-23T08:16:09.4095348Z    at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) ~[junit-jupiter-engine-5.7.2.jar:5.7.2]
2023-08-23T08:16:09.4096018Z    at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) ~[junit-jupiter-engine-5.7.2.jar:5.7.2]
2023-08-23T08:16:09.4096648Z    at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) ~[junit-jupiter-engine-5.7.2.jar:5.7.2]
2023-08-23T08:16:09.4097318Z    at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) ~[junit-jupiter-engine-5.7.2.jar:5.7.2]
2023-08-23T08:16:09.4097975Z    at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) ~[junit-jupiter-engine-5.7.2.jar:5.7.2]
2023-08-23T08:16:09.4098632Z    at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) ~[junit-jupiter-engine-5.7.2.jar:5.7.2]
2023-08-23T08:16:09.4099252Z    at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) ~[junit-jupiter-engine-5.7.2.jar:5.7.2]
2023-08-23T08:16:09.4099846Z    at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) ~[junit-jupiter

[jira] [Updated] (HUDI-6753) Fix parquet inline reading flaky test

2023-08-24 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-6753:
--
Description: 
We occasionally see flakiness around parquet inline reading.

 

Ref: 
[https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/19457/logs/8]

 

 
{code:java}
2023-08-24T23:58:28.3525728Z org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: file:/tmp/junit97984765308488449/java_client_tests1692921505110/2016/03/15/a6427627-cf2b-46b4-9be4-53b6ad29cfb5-0_0-0-0_006.parquet
2023-08-24T23:58:28.3526765Z    at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3527818Z    at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:334) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3528456Z    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:399) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3528999Z    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:462) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3529661Z    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:441) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3530139Z    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1072) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3530588Z    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1053) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3531161Z    at org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$create$12(HoodieWrapperFileSystem.java:317) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3531857Z    at org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:114) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3532528Z    at org.apache.hudi.common.fs.HoodieWrapperFileSystem.create(HoodieWrapperFileSystem.java:315) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3533506Z    at org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74) ~[parquet-hadoop-1.12.2.jar:1.12.2]
2023-08-24T23:58:28.3534047Z    at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:329) ~[parquet-hadoop-1.12.2.jar:1.12.2]
2023-08-24T23:58:28.3534743Z    at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:292) ~[parquet-hadoop-1.12.2.jar:1.12.2]
2023-08-24T23:58:28.3535303Z    at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:658) ~[parquet-hadoop-1.12.2.jar:1.12.2]
2023-08-24T23:58:28.3535885Z    at org.apache.hudi.io.storage.HoodieBaseParquetWriter.<init>(HoodieBaseParquetWriter.java:80) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3536699Z    at org.apache.hudi.io.storage.HoodieAvroParquetWriter.<init>(HoodieAvroParquetWriter.java:54) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3537374Z    at org.apache.hudi.io.storage.HoodieAvroFileWriterFactory.newParquetFileWriter(HoodieAvroFileWriterFactory.java:69) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3538051Z    at org.apache.hudi.io.storage.HoodieFileWriterFactory.getFileWriterByFormat(HoodieFileWriterFactory.java:80) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3538711Z    at org.apache.hudi.io.storage.HoodieFileWriterFactory.getFileWriter(HoodieFileWriterFactory.java:67) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3539119Z    at org.apache.hudi.io.HoodieMergeHandle.init(HoodieMergeHandle.java:201) ~[classes/:?]
2023-08-24T23:58:28.3539449Z    at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:149) ~[classes/:?]
2023-08-24T23:58:28.3540049Z    at org.apache.hudi.client.functional.TestHoodieJavaClientOnCopyOnWriteStorage.testUpsertsInternal(TestHoodieJavaClientOnCopyOnWriteStorage.java:591) ~[test-classes/:?]
2023-08-24T23:58:28.3540753Z    at org.apache.hudi.client.functional.TestHoodieJavaClientOnCopyOnWriteStorage.testUpsertsPrepped(TestHoodieJavaClientOnCopyOnWriteStorage.java:455) ~[test-classes/:?]
2023-08-24T23:58:28.3541154Z    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_382]
2023-08-24T23:58:28.3541450Z    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_382]
2023-08-24T23:58:28.3541817Z    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_382]
2023-08-24T23:58:28.3542148Z    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_382]
2023-08-24T23:58:28.3542646Z    at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) ~[junit-platform-commons-1.7.2.jar:1.7.

[GitHub] [hudi] voonhous commented on a diff in pull request #9472: [HUDI-6719]Fix data inconsistency issues caused by concurrent clustering and delete partition.

2023-08-24 Thread via GitHub


voonhous commented on code in PR #9472:
URL: https://github.com/apache/hudi/pull/9472#discussion_r1305209537


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/ReplaceCommitValidateUtil.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.action.cluster;
+
+import org.apache.hudi.common.model.HoodieReplaceCommitMetadata;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.JsonUtils;
+import org.apache.hudi.exception.HoodieException;
+
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.util.FileIOUtils.LOG;
+
+public class ReplaceCommitValidateUtil {
+  public static final String REPLACE_COMMIT_FILE_IDS = "replaceCommitFileIds";
+  public static void validateReplaceCommit(HoodieTableMetaClient metaClient) {
+    metaClient.reloadActiveTimeline();
+    Set<String> replaceFileids = new HashSet<>();
+
+    // Verify pending and completed replace commit
+    Stream.concat(metaClient.getActiveTimeline().getCompletedReplaceTimeline().getInstantsAsStream(),
+        metaClient.getActiveTimeline().filterInflights().filterPendingReplaceTimeline().getInstantsAsStream()).map(instant -> {
+          try {
+            HoodieReplaceCommitMetadata replaceCommitMetadata =
+                HoodieReplaceCommitMetadata.fromBytes(metaClient.getActiveTimeline().getInstantDetails(instant).get(),
+                    HoodieReplaceCommitMetadata.class);
+            if (!instant.isCompleted()) {
+              return JsonUtils.getObjectMapper().readValue(replaceCommitMetadata.getExtraMetadata().get(REPLACE_COMMIT_FILE_IDS), String[].class);
+            } else {
+              return replaceCommitMetadata.getPartitionToReplaceFileIds().values().stream()
+                  .flatMap(List::stream)
+                  .toArray(String[]::new);
+            }
+          } catch (Exception e) {
+            // If the key does not exist or there is a JSON parsing error, LOG reports an error and ignores it.
+            LOG.error("Error when reading replace commit meta", e);
+            return null;
+          }
+        }).filter(Objects::nonNull)
+        .forEach(fileIdArray -> {
+          Arrays.stream(fileIdArray)
+              .filter(fileId -> !replaceFileids.add(fileId))
+              .findFirst()
+              .ifPresent(s -> {
+                throw new HoodieException("Replace commit involves duplicate file ID, which will cause the table status to be abnormal. The replace commit has been terminated.");

Review Comment:
   Reporting only the first duplicate file ID should be enough (the `s` variable here already holds it).

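A minimal sketch of this suggestion, outside of Hudi: the exception message carries only the first duplicate file ID, i.e. the value bound to `s`. The class name `FirstDuplicateReport`, the plain `String[]` arrays standing in for the replace-commit metadata, and `IllegalStateException` standing in for `HoodieException` are assumptions made to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the validation loop: each String[] plays the role of
// the file IDs touched by one replace commit.
final class FirstDuplicateReport {
  private FirstDuplicateReport() {
  }

  static void validate(List<String[]> fileIdArrays) {
    Set<String> seenFileIds = new HashSet<>();
    for (String[] fileIdArray : fileIdArrays) {
      Arrays.stream(fileIdArray)
          // Set.add returns false when the ID was already seen, i.e. it is a duplicate.
          .filter(fileId -> !seenFileIds.add(fileId))
          .findFirst()
          .ifPresent(s -> {
            // Only the first duplicate is reported, which keeps the message short.
            throw new IllegalStateException("Replace commit involves duplicate file ID " + s
                + "; the replace commit has been terminated.");
          });
    }
  }
}
```

Because `findFirst()` short-circuits, the stream stops at the first duplicate, so the message never grows with the number of conflicts.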


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] majian1998 commented on a diff in pull request #9472: [HUDI-6719]Fix data inconsistency issues caused by concurrent clustering and delete partition.

2023-08-24 Thread via GitHub


majian1998 commented on code in PR #9472:
URL: https://github.com/apache/hudi/pull/9472#discussion_r1305191846


Review Comment:
   This is a great suggestion, but would printing all file IDs in case of 
multiple conflicts make the log bloated and difficult to read? Alternatively, 
could I perform some additional processing in this stream to pass along the 
instant and print the conflicting instant instead? However, adding more 
processing may slightly reduce the readability of this stream.
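
One way to name the conflicting instants without printing every duplicate file ID is to remember which instant first claimed each file ID. The sketch below is a hypothetical, self-contained illustration: a `Map<String, List<String>>` from instant time to replaced file IDs stands in for the timeline and the replace-commit metadata, and none of the names are Hudi APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: instantToFileIds maps an instant time to the file IDs its
// replace commit touches (a stand-in for reading HoodieReplaceCommitMetadata).
final class ConflictingInstantReport {
  private ConflictingInstantReport() {
  }

  static void validate(Map<String, List<String>> instantToFileIds) {
    // Remembers which instant claimed each file ID first.
    Map<String, String> owner = new HashMap<>();
    for (Map.Entry<String, List<String>> entry : instantToFileIds.entrySet()) {
      for (String fileId : entry.getValue()) {
        String previous = owner.putIfAbsent(fileId, entry.getKey());
        if (previous != null) {
          // Report the first conflict only: one file ID and the two instants involved.
          throw new IllegalStateException("File ID " + fileId + " is replaced by both instant "
              + previous + " and instant " + entry.getKey());
        }
      }
    }
  }
}
```

Keeping the loop explicit, rather than threading the instant through the stream, trades a little brevity for readability, which is the concern raised above.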






[jira] [Updated] (HUDI-6753) Fix parquet inline reading flaky test

2023-08-24 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-6753:
--
Epic Link: HUDI-4302

> Fix parquet inline reading flaky test
> -
>
> Key: HUDI-6753
> URL: https://issues.apache.org/jira/browse/HUDI-6753
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: reader-core
>Reporter: sivabalan narayanan
>Priority: Major
>
> We sometimes see flaky failures around parquet inline reading.
>  
> {code:java}
> 2023-08-24T23:58:28.3525728Z org.apache.hadoop.fs.FileAlreadyExistsException: 
> File already exists: 
> file:/tmp/junit97984765308488449/java_client_tests1692921505110/2016/03/15/a6427627-cf2b-46b4-9be4-53b6ad29cfb5-0_0-0-0_006.parquet
> 2023-08-24T23:58:28.3526765Z  at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296) 
> ~[hadoop-common-2.10.1.jar:?]
> 2023-08-24T23:58:28.3527818Z  at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:334) 
> ~[hadoop-common-2.10.1.jar:?]
> 2023-08-24T23:58:28.3528456Z  at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:399)
>  ~[hadoop-common-2.10.1.jar:?]
> 2023-08-24T23:58:28.3528999Z  at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:462) 
> ~[hadoop-common-2.10.1.jar:?]
> 2023-08-24T23:58:28.3529661Z  at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:441) 
> ~[hadoop-common-2.10.1.jar:?]
> 2023-08-24T23:58:28.3530139Z  at 
> org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1072) 
> ~[hadoop-common-2.10.1.jar:?]
> 2023-08-24T23:58:28.3530588Z  at 
> org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1053) 
> ~[hadoop-common-2.10.1.jar:?]
> 2023-08-24T23:58:28.3531161Z  at 
> org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$create$12(HoodieWrapperFileSystem.java:317)
>  ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
> 2023-08-24T23:58:28.3531857Z  at 
> org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:114)
>  ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
> 2023-08-24T23:58:28.3532528Z  at 
> org.apache.hudi.common.fs.HoodieWrapperFileSystem.create(HoodieWrapperFileSystem.java:315)
>  ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
> 2023-08-24T23:58:28.3533506Z  at 
> org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74)
>  ~[parquet-hadoop-1.12.2.jar:1.12.2]
> 2023-08-24T23:58:28.3534047Z  at 
> org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:329)
>  ~[parquet-hadoop-1.12.2.jar:1.12.2]
> 2023-08-24T23:58:28.3534743Z  at 
> org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:292) 
> ~[parquet-hadoop-1.12.2.jar:1.12.2]
> 2023-08-24T23:58:28.3535303Z  at 
> org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:658) 
> ~[parquet-hadoop-1.12.2.jar:1.12.2]
> 2023-08-24T23:58:28.3535885Z  at 
> org.apache.hudi.io.storage.HoodieBaseParquetWriter.<init>(HoodieBaseParquetWriter.java:80)
>  ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
> 2023-08-24T23:58:28.3536699Z  at 
> org.apache.hudi.io.storage.HoodieAvroParquetWriter.<init>(HoodieAvroParquetWriter.java:54)
>  ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
> 2023-08-24T23:58:28.3537374Z  at 
> org.apache.hudi.io.storage.HoodieAvroFileWriterFactory.newParquetFileWriter(HoodieAvroFileWriterFactory.java:69)
>  ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
> 2023-08-24T23:58:28.3538051Z  at 
> org.apache.hudi.io.storage.HoodieFileWriterFactory.getFileWriterByFormat(HoodieFileWriterFactory.java:80)
>  ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
> 2023-08-24T23:58:28.3538711Z  at 
> org.apache.hudi.io.storage.HoodieFileWriterFactory.getFileWriter(HoodieFileWriterFactory.java:67)
>  ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
> 2023-08-24T23:58:28.3539119Z  at 
> org.apache.hudi.io.HoodieMergeHandle.init(HoodieMergeHandle.java:201) 
> ~[classes/:?]
> 2023-08-24T23:58:28.3539449Z  at 
> org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:149) 
> ~[classes/:?]
> 2023-08-24T23:58:28.3540049Z  at 
> org.apache.hudi.client.functional.TestHoodieJavaClientOnCopyOnWriteStorage.testUpsertsInternal(TestHoodieJavaClientOnCopyOnWriteStorage.java:591)
>  ~[test-classes/:?]
> 2023-08-24T23:58:28.3540753Z  at 
> org.apache.hudi.client.functional.TestHoodieJavaClientOnCopyOnWriteStorage.testUpsertsPrepped(TestHoodieJavaClientOnCopyOnWriteStorage.java:455)
>  ~[test-classes/:?]
> 2023-08-24T23:58:28.3541154Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_382]
> 2023-08-24T23:58:28.3541450Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_382]
> 2023-08-24T23:58:28.3541817Z  at 
> sun.reflect.De

[jira] [Created] (HUDI-6753) Fix parquet inline reading flaky test

2023-08-24 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-6753:
-

 Summary: Fix parquet inline reading flaky test
 Key: HUDI-6753
 URL: https://issues.apache.org/jira/browse/HUDI-6753
 Project: Apache Hudi
  Issue Type: Bug
  Components: reader-core
Reporter: sivabalan narayanan


We sometimes see flaky failures around parquet inline reading.

 
{code:java}
2023-08-24T23:58:28.3525728Z org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: file:/tmp/junit97984765308488449/java_client_tests1692921505110/2016/03/15/a6427627-cf2b-46b4-9be4-53b6ad29cfb5-0_0-0-0_006.parquet
2023-08-24T23:58:28.3526765Z  at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3527818Z  at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:334) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3528456Z  at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:399) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3528999Z  at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:462) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3529661Z  at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:441) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3530139Z  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1072) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3530588Z  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1053) ~[hadoop-common-2.10.1.jar:?]
2023-08-24T23:58:28.3531161Z  at org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$create$12(HoodieWrapperFileSystem.java:317) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3531857Z  at org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:114) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3532528Z  at org.apache.hudi.common.fs.HoodieWrapperFileSystem.create(HoodieWrapperFileSystem.java:315) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3533506Z  at org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74) ~[parquet-hadoop-1.12.2.jar:1.12.2]
2023-08-24T23:58:28.3534047Z  at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:329) ~[parquet-hadoop-1.12.2.jar:1.12.2]
2023-08-24T23:58:28.3534743Z  at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:292) ~[parquet-hadoop-1.12.2.jar:1.12.2]
2023-08-24T23:58:28.3535303Z  at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:658) ~[parquet-hadoop-1.12.2.jar:1.12.2]
2023-08-24T23:58:28.3535885Z  at org.apache.hudi.io.storage.HoodieBaseParquetWriter.<init>(HoodieBaseParquetWriter.java:80) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3536699Z  at org.apache.hudi.io.storage.HoodieAvroParquetWriter.<init>(HoodieAvroParquetWriter.java:54) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3537374Z  at org.apache.hudi.io.storage.HoodieAvroFileWriterFactory.newParquetFileWriter(HoodieAvroFileWriterFactory.java:69) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3538051Z  at org.apache.hudi.io.storage.HoodieFileWriterFactory.getFileWriterByFormat(HoodieFileWriterFactory.java:80) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3538711Z  at org.apache.hudi.io.storage.HoodieFileWriterFactory.getFileWriter(HoodieFileWriterFactory.java:67) ~[hudi-common-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
2023-08-24T23:58:28.3539119Z  at org.apache.hudi.io.HoodieMergeHandle.init(HoodieMergeHandle.java:201) ~[classes/:?]
2023-08-24T23:58:28.3539449Z  at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:149) ~[classes/:?]
2023-08-24T23:58:28.3540049Z  at org.apache.hudi.client.functional.TestHoodieJavaClientOnCopyOnWriteStorage.testUpsertsInternal(TestHoodieJavaClientOnCopyOnWriteStorage.java:591) ~[test-classes/:?]
2023-08-24T23:58:28.3540753Z  at org.apache.hudi.client.functional.TestHoodieJavaClientOnCopyOnWriteStorage.testUpsertsPrepped(TestHoodieJavaClientOnCopyOnWriteStorage.java:455) ~[test-classes/:?]
2023-08-24T23:58:28.3541154Z  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_382]
2023-08-24T23:58:28.3541450Z  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_382]
2023-08-24T23:58:28.3541817Z  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_382]
2023-08-24T23:58:28.3542148Z  at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_382]
2023-08-24T23:58:28.3542646Z  at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) ~[juni

[GitHub] [hudi] majian1998 commented on a diff in pull request #9472: [HUDI-6719]Fix data inconsistency issues caused by concurrent clustering and delete partition.

2023-08-24 Thread via GitHub


majian1998 commented on code in PR #9472:
URL: https://github.com/apache/hudi/pull/9472#discussion_r1305179990


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/ReplaceCommitValidateUtil.java:
##
+.forEach(fileIdArray -> {

Review Comment:
   done 




[jira] [Comment Edited] (HUDI-1623) Support start_commit_time & end_commit_times for serializable incremental pull

2023-08-24 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758834#comment-17758834
 ] 

Vinoth Chandar edited comment on HUDI-1623 at 8/25/23 5:40 AM:
---

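For TrueTime, we add a new _TrueTimeGenerator_ interface.

By default, it relies on the existing lock provider interface: 
{code:java}
class WaitBasedTrueTimeGenerator implements TrueTimeGenerator {
   long maxExpectedClockSkewMs;
   LockProvider lock;

   long generateTime() throws InterruptedException {
     lock.lock();
     try {
       long ts = System.currentTimeMillis();
       // Hold the lock for the maximum expected clock skew, so that a timestamp later
       // generated under this lock on any host (within that skew) is not smaller than ts.
       Thread.sleep(maxExpectedClockSkewMs);
       return ts;
     } finally {
       lock.unlock();
     }
   }
}
{code}
An alternative that does not rely on a clock-skew bound persists the latest generated time: 
{code:java}
class StatefulTrueTimeGenerator implements TrueTimeGenerator {
   String timeStampFilePath = ".../.hoodie/truetime_latest";
   LockProvider lock;

   long generateTime() {
     lock.lock();
     try {
       long ts = System.currentTimeMillis();
       long currentMaxTrueTime = readAsLong(timeStampFilePath);
       // Never go backwards: move at least 100 ms past the last persisted value.
       long newTrueTime = Math.max(ts, currentMaxTrueTime + 100);
       writeAsLong(timeStampFilePath, newTrueTime);
       return newTrueTime;
     } finally {
       lock.unlock();
     }
   }
}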
{code}
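
A self-contained, single-JVM sketch of the stateful variant follows. It is an illustration under stated assumptions, not Hudi code: a `java.util.concurrent.locks.ReentrantLock` stands in for the `LockProvider`, a local file passed to the constructor stands in for `.../.hoodie/truetime_latest`, and `readAsLong`/`writeAsLong` are implemented here with `java.nio.file.Files`. The 100 ms step is taken from the pseudocode above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a file-backed generator whose timestamps never go backwards,
// even if the local clock does.
final class StatefulTrueTimeGenerator {
  private static final long MIN_STEP_MS = 100L;
  private final Path timestampFile;
  private final Lock lock = new ReentrantLock(); // stand-in for a distributed LockProvider

  StatefulTrueTimeGenerator(Path timestampFile) {
    this.timestampFile = timestampFile;
  }

  long generateTime() {
    lock.lock();
    try {
      long ts = System.currentTimeMillis();
      long currentMaxTrueTime = readAsLong();
      // Move past the last issued value even if the local clock lags behind it.
      long newTrueTime = Math.max(ts, currentMaxTrueTime + MIN_STEP_MS);
      writeAsLong(newTrueTime);
      return newTrueTime;
    } finally {
      lock.unlock();
    }
  }

  private long readAsLong() {
    try {
      if (!Files.exists(timestampFile)) {
        return 0L; // nothing issued yet
      }
      return Long.parseLong(new String(Files.readAllBytes(timestampFile), StandardCharsets.UTF_8).trim());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private void writeAsLong(long value) {
    try {
      Files.write(timestampFile, Long.toString(value).getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the last value lives in the file, a fresh generator instance pointed at the same file continues from where the previous one stopped; across processes, the in-JVM lock would have to be replaced by a real distributed lock.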



> Support start_commit_time & end_commit_times for serializable incremental pull
> --
>
> Key: HUDI-1623
> URL: https://issues.apache.org/jira/browse/HUDI-1623
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Nishith Agarwal
>Assignee: Danny Chen
>Priority: Critical
> Fix For: 1.0.0
>
>
> We suggest a new naming scheme for the *completed* metadata file:
> ${start_time}.${action}.${completion_time}
>  
> We also need a global *Time Generator* that guarantees monotonically 
> increasing timestamps, for example by holding a mutex lock and keeping the 
> last generated timestamp alongside it. Say it holds a lock 
> {*}L1{*}. Every instant time generation must be guarded by this lock.
>  
> Before creating the completed file, we also need to acquire L1.
>  
> Things to note:
> 1. we only add the completion timestamp to the completed metadata file;
> 2. we only guard the creation of the completed metadata file with the lock, not the 
> whole committing procedure;
> 3. for regular instant time generation, we also need a lock (that we should 
> ship out by default)
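
A minimal, single-JVM sketch of the time generator described above: one mutex (the "L1" of the description) guards both instant-time generation and completion-file naming, and the last generated timestamp is kept so values are strictly increasing. It assumes millisecond `long` timestamps rather than Hudi's instant-time strings and a hypothetical `completedFileName` helper that only builds the `${start_time}.${action}.${completion_time}` name; a multi-writer deployment would need a distributed lock instead of a `ReentrantLock`.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, single JVM only: "L1" from the description is modelled as a ReentrantLock.
final class MonotonicTimeGenerator {
  private final ReentrantLock l1 = new ReentrantLock();
  private long lastGenerated = Long.MIN_VALUE;

  // Every instant time is generated under L1 and is strictly greater than the previous one.
  long nextTime() {
    l1.lock();
    try {
      long now = System.currentTimeMillis();
      lastGenerated = Math.max(now, lastGenerated + 1);
      return lastGenerated;
    } finally {
      l1.unlock();
    }
  }

  // Only the naming (and, in a real implementation, the creation) of the completed
  // metadata file happens under L1, not the whole commit.
  String completedFileName(long startTime, String action) {
    l1.lock();
    try {
      long completionTime = nextTime(); // ReentrantLock lets the same thread re-acquire L1
      // A real implementation would create the completed file here, while L1 is held.
      return startTime + "." + action + "." + completionTime;
    } finally {
      l1.unlock();
    }
  }
}
```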



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #9531: [HUDI-6445] Triage ci flakiness aug24 2023 attempt7

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9531:
URL: https://github.com/apache/hudi/pull/9531#issuecomment-1692790494

   
   ## CI report:
   
   * 2b4485a87838bc599b6d5c8c868d4098a34feae0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19457)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19453)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] tchow-notion commented on issue #9289: [SUPPORT] ConcurrentModificationException in BaseHoodieTableFileIndex.getTotalCachedFilesSize()

2023-08-24 Thread via GitHub


tchow-notion commented on issue #9289:
URL: https://github.com/apache/hudi/issues/9289#issuecomment-1692786968

   Hi folks, is there an update on this issue? We are experiencing this with 
concurrent reads.
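
For context on the reported exception: iterating a plain `HashMap` while another thread modifies it can throw `ConcurrentModificationException`, whereas `ConcurrentHashMap` iterators are weakly consistent and do not throw it. The sketch below is a hypothetical illustration of that pattern only; the actual fields of `BaseHoodieTableFileIndex` are not shown here, and the cache layout (partition path mapped to file sizes) is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache: partition path -> sizes (in bytes) of the files listed for it.
// Assumes each list is not mutated after it is put into the map.
final class CachedFileSizes {
  private final Map<String, List<Long>> cachedFiles = new ConcurrentHashMap<>();

  void put(String partitionPath, List<Long> fileSizes) {
    cachedFiles.put(partitionPath, fileSizes);
  }

  // Safe to call while other threads call put(): ConcurrentHashMap's views are weakly
  // consistent, so the sum may miss concurrent updates but will not throw.
  long totalCachedFilesSize() {
    return cachedFiles.values().stream()
        .flatMap(List::stream)
        .mapToLong(Long::longValue)
        .sum();
  }
}
```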





[jira] [Commented] (HUDI-1623) Support start_commit_time & end_commit_times for serializable incremental pull

2023-08-24 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758834#comment-17758834
 ] 

Vinoth Chandar commented on HUDI-1623:
--

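For TrueTime, we add a new _TrueTimeGenerator_ interface.

By default, it relies on the existing lock provider interface: 
{code:java}
class WaitBasedTrueTimeGenerator implements TrueTimeGenerator {
   long maxExpectedClockSkewMs;
   LockProvider lock;

   long generateTime() throws InterruptedException {
     lock.lock();
     try {
       long ts = System.currentTimeMillis();
       // Hold the lock for the maximum expected clock skew, so that a timestamp later
       // generated under this lock on any host (within that skew) is not smaller than ts.
       Thread.sleep(maxExpectedClockSkewMs);
       return ts;
     } finally {
       lock.unlock();
     }
   }
}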
{code}
 






[jira] [Commented] (HUDI-1623) Support start_commit_time & end_commit_times for serializable incremental pull

2023-08-24 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758833#comment-17758833
 ] 

Vinoth Chandar commented on HUDI-1623:
--

On the naming of the active timeline instants: 

Let's use _${start_time}_${completion_time}.${action}_ for completed instants, and leave the requested/inflight instants unchanged.

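To make the proposed layout concrete, here is a small, hypothetical helper that formats and parses completed-instant file names of the form `${start_time}_${completion_time}.${action}`. The class name and the use of plain strings for timestamps are assumptions for illustration, and it assumes timestamps are digit strings containing no `_` or `.`; requested/inflight file names are left untouched, as proposed.

```java
// Hypothetical helper for the proposed completed-instant naming:
// ${start_time}_${completion_time}.${action}
final class CompletedInstantName {
  final String startTime;
  final String completionTime;
  final String action;

  CompletedInstantName(String startTime, String completionTime, String action) {
    this.startTime = startTime;
    this.completionTime = completionTime;
    this.action = action;
  }

  String toFileName() {
    return startTime + "_" + completionTime + "." + action;
  }

  // Rejects names without the start/completion separator, e.g. requested or inflight files.
  static CompletedInstantName parse(String fileName) {
    int dot = fileName.indexOf('.');
    int underscore = fileName.indexOf('_');
    if (dot < 0 || underscore < 0 || underscore > dot) {
      throw new IllegalArgumentException("Not a completed instant file name: " + fileName);
    }
    return new CompletedInstantName(
        fileName.substring(0, underscore),
        fileName.substring(underscore + 1, dot),
        fileName.substring(dot + 1));
  }
}
```
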





[jira] [Updated] (HUDI-6708) Support Record Index with the Async Indexer

2023-08-24 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-6708:
-
Reviewers: Vinoth Chandar

> Support Record Index with the Async Indexer
> ---
>
> Key: HUDI-6708
> URL: https://issues.apache.org/jira/browse/HUDI-6708
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Record index can be created using the async indexer if there are no inflight 
> commits. With inflight commits, the catch-up task will [throw the 
> UnsupportedException|https://github.com/apache/hudi/blob/616b663c19eb059438dc3a7f80c5c84b39a6ef6f/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java#L380-L382]
>  as we don't have WriteStatus (which is needed to build record index) at that 
> time. Hudi should be able to build the complete record index concurrently 
> with ingestion (except for the limitations that currently exist for 
> concurrent indexing).
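
Below is a hypothetical model of the limitation described above. RecordIndexCatchUp and its types are illustrative, not Hudi's RunIndexActionExecutor. The model assumes the catch-up task sees the list of instants written after indexing started. Completed instants can be replayed, but a pending instant has no WriteStatus yet, so building the record index must fail. UnsupportedOperationException stands in for the exception thrown by the linked code.

```java
import java.util.List;

// Hypothetical model, not Hudi code: captures only the rule that record-index
// catch-up cannot proceed past a commit that is still pending.
final class RecordIndexCatchUp {
  enum State { REQUESTED, INFLIGHT, COMPLETED }

  static final class Instant {
    final String timestamp;
    final State state;

    Instant(String timestamp, State state) {
      this.timestamp = timestamp;
      this.state = state;
    }
  }

  // Throws when the record index is being built and any instant to catch up on is pending.
  static void validate(List<Instant> instantsToCatchUp, boolean buildingRecordIndex) {
    if (!buildingRecordIndex) {
      return; // other index types are not subject to this limitation in this sketch
    }
    for (Instant instant : instantsToCatchUp) {
      if (instant.state != State.COMPLETED) {
        throw new UnsupportedOperationException(
            "Cannot catch up record index: no WriteStatus for pending instant " + instant.timestamp);
      }
    }
  }
}
```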





[GitHub] [hudi] hudi-bot commented on pull request #9533: [HUDI-6445] Fixing metrics in tests

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9533:
URL: https://github.com/apache/hudi/pull/9533#issuecomment-1692750092

   
   ## CI report:
   
   * 78ba0da62104725400b6aec04775f1e2c7dfd113 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19459)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19455)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #9532: [HUDI-6445] Triage ci flakiness aug24 2023 attempt8

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9532:
URL: https://github.com/apache/hudi/pull/9532#issuecomment-1692750065

   
   ## CI report:
   
   * c2c0d27633185197448a772bf437424c6c8b212b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19458)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19454)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9531: [HUDI-6445] Triage ci flakiness aug24 2023 attempt7

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9531:
URL: https://github.com/apache/hudi/pull/9531#issuecomment-1692750049

   
   ## CI report:
   
   * 2b4485a87838bc599b6d5c8c868d4098a34feae0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19457)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19453)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9530: [HUDI-6445] Triage ci flakiness aug24 2023 attempt6

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9530:
URL: https://github.com/apache/hudi/pull/9530#issuecomment-1692750002

   
   ## CI report:
   
   * 9c10982d6aefc758e345826ab7a42d5c48371cb5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19456)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19452)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] danny0405 commented on a diff in pull request #9209: [HUDI-6539] New LSM tree style archived timeline

2023-08-24 Thread via GitHub


danny0405 commented on code in PR #9209:
URL: https://github.com/apache/hudi/pull/9209#discussion_r1305108862


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##
@@ -424,9 +159,7 @@ private Stream getCommitInstantsToArchive() throws IOException {
   LESSER_THAN, oldestPendingInstant.get().getTimestamp())).findFirst());
   // Check if the completed instant is higher than the oldest inflight instant
   // in that case update the oldestCommitToRetain to oldestInflight commit time.
-  if (!completedCommitBeforeOldestPendingInstant.isPresent()
-  || HoodieTimeline.compareTimestamps(oldestPendingInstant.get().getTimestamp(),
-  LESSER_THAN, completedCommitBeforeOldestPendingInstant.get().getTimestamp())) {
+  if (!completedCommitBeforeOldestPendingInstant.isPresent()) {

Review Comment:
   Because the check
   ```java
   HoodieTimeline.compareTimestamps(oldestPendingInstant.get().getTimestamp(),
       LESSER_THAN, completedCommitBeforeOldestPendingInstant.get().getTimestamp())
   ```
   
   is always false. I can revert the change.
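
Below is a small model of why the removed condition cannot hold. It assumes instant timestamps are fixed-width digit strings, so they compare like plain strings under String.compareTo. OldestPendingCheck is a hypothetical class. completedCommitBeforeOldestPendingInstant is selected with a LESSER_THAN oldestPending filter, so "oldestPending LESSER_THAN that instant" can never be true.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of the archiver logic quoted above, not Hudi code.
// Assumes fixed-width digit timestamps, so String.compareTo gives chronological order.
final class OldestPendingCheck {
  // Latest completed instant strictly before the oldest pending one.
  static Optional<String> completedBeforeOldestPending(List<String> completed, String oldestPending) {
    return completed.stream()
        .sorted(Comparator.reverseOrder())
        .filter(ts -> ts.compareTo(oldestPending) < 0) // LESSER_THAN oldestPending
        .findFirst();
  }

  // The removed second condition: oldestPending LESSER_THAN completedBefore.
  // By construction completedBefore < oldestPending, so this never returns true.
  static boolean removedCondition(String oldestPending, Optional<String> completedBefore) {
    return completedBefore.isPresent() && oldestPending.compareTo(completedBefore.get()) < 0;
  }
}
```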






[GitHub] [hudi] hudi-bot commented on pull request #9533: [HUDI-6445] Fixing metrics in tests

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9533:
URL: https://github.com/apache/hudi/pull/9533#issuecomment-1692722141

   
   ## CI report:
   
   * 78ba0da62104725400b6aec04775f1e2c7dfd113 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19455)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19459)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9532: [HUDI-6445] Triage ci flakiness aug24 2023 attempt8

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9532:
URL: https://github.com/apache/hudi/pull/9532#issuecomment-1692722079

   
   ## CI report:
   
   * c2c0d27633185197448a772bf437424c6c8b212b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19454)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19458)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9531: [HUDI-6445] Triage ci flakiness aug24 2023 attempt7

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9531:
URL: https://github.com/apache/hudi/pull/9531#issuecomment-1692722033

   
   ## CI report:
   
   * 2b4485a87838bc599b6d5c8c868d4098a34feae0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19453)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19457)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9530: [HUDI-6445] Triage ci flakiness aug24 2023 attempt6

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9530:
URL: https://github.com/apache/hudi/pull/9530#issuecomment-1692721989

   
   ## CI report:
   
   * 9c10982d6aefc758e345826ab7a42d5c48371cb5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19452)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19456)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-6722) Performance and API improvement on record merging

2023-08-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6722:

Epic Name: Performance and API improvement on record merging

> Performance and API improvement on record merging
> -
>
> Key: HUDI-6722
> URL: https://issues.apache.org/jira/browse/HUDI-6722
> Project: Apache Hudi
>  Issue Type: Epic
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-6752) Scope out the work for file group reading and writing with record merging in Spark

2023-08-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6752:

Status: In Progress  (was: Open)

> Scope out the work for file group reading and writing with record merging in 
> Spark
> --
>
> Key: HUDI-6752
> URL: https://issues.apache.org/jira/browse/HUDI-6752
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Created] (HUDI-6752) Scope out the work for file group reading and writing with record merging in Spark

2023-08-24 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6752:
---

 Summary: Scope out the work for file group reading and writing 
with record merging in Spark
 Key: HUDI-6752
 URL: https://issues.apache.org/jira/browse/HUDI-6752
 Project: Apache Hudi
  Issue Type: Task
Reporter: Ethan Guo








[jira] [Updated] (HUDI-6752) Scope out the work for file group reading and writing with record merging in Spark

2023-08-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6752:

Fix Version/s: 1.0.0

> Scope out the work for file group reading and writing with record merging in 
> Spark
> --
>
> Key: HUDI-6752
> URL: https://issues.apache.org/jira/browse/HUDI-6752
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Assigned] (HUDI-6752) Scope out the work for file group reading and writing with record merging in Spark

2023-08-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-6752:
---

Assignee: Ethan Guo

> Scope out the work for file group reading and writing with record merging in 
> Spark
> --
>
> Key: HUDI-6752
> URL: https://issues.apache.org/jira/browse/HUDI-6752
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Updated] (HUDI-6751) Scope out remaining work for the record merging API

2023-08-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6751:

Status: In Progress  (was: Open)

> Scope out remaining work for the record merging API
> ---
>
> Key: HUDI-6751
> URL: https://issues.apache.org/jira/browse/HUDI-6751
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Assigned] (HUDI-6751) Scope out remaining work for the record merging API

2023-08-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-6751:
---

Assignee: Ethan Guo

> Scope out remaining work for the record merging API
> ---
>
> Key: HUDI-6751
> URL: https://issues.apache.org/jira/browse/HUDI-6751
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Created] (HUDI-6751) Scope out remaining work for the record merging API

2023-08-24 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6751:
---

 Summary: Scope out remaining work for the record merging API
 Key: HUDI-6751
 URL: https://issues.apache.org/jira/browse/HUDI-6751
 Project: Apache Hudi
  Issue Type: Task
Reporter: Ethan Guo








[jira] [Updated] (HUDI-6751) Scope out remaining work for the record merging API

2023-08-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6751:

Fix Version/s: 1.0.0

> Scope out remaining work for the record merging API
> ---
>
> Key: HUDI-6751
> URL: https://issues.apache.org/jira/browse/HUDI-6751
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Closed] (HUDI-6721) Prototype and benchmark partial updates in MOR log merging

2023-08-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-6721.
---
Resolution: Fixed

> Prototype and benchmark partial updates in MOR log merging
> --
>
> Key: HUDI-6721
> URL: https://issues.apache.org/jira/browse/HUDI-6721
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Closed] (HUDI-6723) Prototype and benchmark event-time based in MOR log merging

2023-08-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-6723.
---
Resolution: Fixed

> Prototype and benchmark event-time based in MOR log merging
> ---
>
> Key: HUDI-6723
> URL: https://issues.apache.org/jira/browse/HUDI-6723
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Closed] (HUDI-6720) Prototype and benchmark position- and key-based updates and deletes in MOR

2023-08-24 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-6720.
---
Resolution: Fixed

> Prototype and benchmark position- and key-based updates and deletes in MOR
> --
>
> Key: HUDI-6720
> URL: https://issues.apache.org/jira/browse/HUDI-6720
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[GitHub] [hudi] hudi-bot commented on pull request #9532: [HUDI-6445] Triage ci flakiness aug24 2023 attempt8

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9532:
URL: https://github.com/apache/hudi/pull/9532#issuecomment-1692716897

   
   ## CI report:
   
   * c2c0d27633185197448a772bf437424c6c8b212b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9530: [HUDI-6445] Triage ci flakiness aug24 2023 attempt6

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9530:
URL: https://github.com/apache/hudi/pull/9530#issuecomment-1692716804

   
   ## CI report:
   
   * 9c10982d6aefc758e345826ab7a42d5c48371cb5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9533: [HUDI-6445] Fixing metrics in tests

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9533:
URL: https://github.com/apache/hudi/pull/9533#issuecomment-1692716928

   
   ## CI report:
   
   * 78ba0da62104725400b6aec04775f1e2c7dfd113 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9531: [HUDI-6445] Triage ci flakiness aug24 2023 attempt7

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9531:
URL: https://github.com/apache/hudi/pull/9531#issuecomment-1692716851

   
   ## CI report:
   
   * 2b4485a87838bc599b6d5c8c868d4098a34feae0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9515: [HUDI-2141] Support flink compaction metrics

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9515:
URL: https://github.com/apache/hudi/pull/9515#issuecomment-1692716707

   
   ## CI report:
   
   * 4ccfb5f43f82a7c57de83794ebaa9c5a051c679a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19436)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19451)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] guanziyue commented on a diff in pull request #9523: [DO NOT MEGRE] HUDI-1517 Adding support for Per log file marker to track all log files added by a commit and to assist with rollbac

2023-08-24 Thread via GitHub


guanziyue commented on code in PR #9523:
URL: https://github.com/apache/hudi/pull/9523#discussion_r1305098875


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/MarkerBasedRollbackStrategy.java:
##
@@ -103,35 +120,47 @@ public List getRollbackRequests(HoodieInstant instantToRo
 }
   }
 
-  protected HoodieRollbackRequest getRollbackRequestForAppend(HoodieInstant instantToRollback, String markerFilePath) throws IOException {
-Path baseFilePathForAppend = new Path(basePath, markerFilePath);
-String fileId = FSUtils.getFileIdFromFilePath(baseFilePathForAppend);
-String baseCommitTime = FSUtils.getCommitTime(baseFilePathForAppend.getName());
-String relativePartitionPath = FSUtils.getRelativePartitionPath(new Path(basePath), baseFilePathForAppend.getParent());
-Path partitionPath = FSUtils.getPartitionPath(config.getBasePath(), relativePartitionPath);
-
-// NOTE: Since we're rolling back incomplete Delta Commit, it only could have appended its
-//   block to the latest log-file
-// TODO(HUDI-1517) use provided marker-file's path instead
-Option latestLogFileOption = FSUtils.getLatestLogFile(table.getMetaClient().getFs(), partitionPath, fileId,
-HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime);
-
-// Log file can be deleted if the commit to rollback is also the commit that created the fileGroup
-if (latestLogFileOption.isPresent() && baseCommitTime.equals(instantToRollback.getTimestamp())) {
-  Path fullDeletePath = new Path(partitionPath, latestLogFileOption.get().getFileName());
-  return new HoodieRollbackRequest(relativePartitionPath, EMPTY_STRING, EMPTY_STRING,
-  Collections.singletonList(fullDeletePath.toString()),
-  Collections.emptyMap());
+  protected HoodieRollbackRequest getRollbackRequestForAppend(String markerFilePath) throws IOException {
+Path filePath = new Path(basePath, markerFilePath);
+String fileId;
+String baseCommitTime;
+String relativePartitionPath;
+Option latestLogFileOption;
+
+// Old marker files may be generated from base file name before HUDI-1517. keep compatible with them.
+// TODO: deprecated in HUDI-1517, may be removed in the future. @guanziyue.gzy
+if (FSUtils.isBaseFile(filePath)) {

Review Comment:
   Yes. But it would also be a good idea to do this during upgrade/downgrade. I did it this way because I had no chance to test upgrade or downgrade with real data at the time. If you prefer that approach, I can change it.
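
Below is a hypothetical sketch of the compatibility branch discussed here. AppendMarkerTarget is illustrative, not the PR's code. The sketch assumes the usual naming conventions: {fileId}_{writeToken}_{instantTime}.parquet for base files and .{fileId}_{baseCommitTime}.log.{version}_{writeToken} for log files. A legacy APPEND marker named after a base file only identifies the file group. A per-log-file marker names the exact log file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the PR's code. Assumed file-name conventions:
//   base file: {fileId}_{writeToken}_{instantTime}.parquet
//   log file:  .{fileId}_{baseCommitTime}.log.{version}_{writeToken}
final class AppendMarkerTarget {
  private static final Pattern BASE_FILE = Pattern.compile("^([^_]+)_([^_]+)_(\\d+)\\.parquet$");
  private static final Pattern LOG_FILE = Pattern.compile("^\\.([^_]+)_(\\d+)\\.log\\.(\\d+)_(.+)$");

  final String fileId;
  final String baseCommitTime;
  // true: legacy marker named after the base file, so the exact log file is unknown
  // and must be looked up (e.g. the latest log file of the file group).
  final boolean legacyBaseFileMarker;

  private AppendMarkerTarget(String fileId, String baseCommitTime, boolean legacyBaseFileMarker) {
    this.fileId = fileId;
    this.baseCommitTime = baseCommitTime;
    this.legacyBaseFileMarker = legacyBaseFileMarker;
  }

  static AppendMarkerTarget fromDataFileName(String fileName) {
    Matcher base = BASE_FILE.matcher(fileName);
    if (base.matches()) {
      // Pre-HUDI-1517 marker: the base commit time is the base file's instant time.
      return new AppendMarkerTarget(base.group(1), base.group(3), true);
    }
    Matcher log = LOG_FILE.matcher(fileName);
    if (log.matches()) {
      // Per-log-file marker: the log file to roll back is named directly.
      return new AppendMarkerTarget(log.group(1), log.group(2), false);
    }
    throw new IllegalArgumentException("Unrecognized data file name: " + fileName);
  }
}
```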






[GitHub] [hudi] nsivabalan commented on pull request #9532: [HUDI-6445] Triage ci flakiness aug24 2023 attempt8

2023-08-24 Thread via GitHub


nsivabalan commented on PR #9532:
URL: https://github.com/apache/hudi/pull/9532#issuecomment-1692713081

   @hudi-bot run azure





[GitHub] [hudi] nsivabalan commented on pull request #9531: [HUDI-6445] Triage ci flakiness aug24 2023 attempt7

2023-08-24 Thread via GitHub


nsivabalan commented on PR #9531:
URL: https://github.com/apache/hudi/pull/9531#issuecomment-1692713000

   @hudi-bot run azure





[GitHub] [hudi] nsivabalan commented on pull request #9530: [HUDI-6445] Triage ci flakiness aug24 2023 attempt6

2023-08-24 Thread via GitHub


nsivabalan commented on PR #9530:
URL: https://github.com/apache/hudi/pull/9530#issuecomment-1692712824

   @hudi-bot run azure





[GitHub] [hudi] stream2000 commented on pull request #9515: [HUDI-2141] Support flink compaction metrics

2023-08-24 Thread via GitHub


stream2000 commented on PR #9515:
URL: https://github.com/apache/hudi/pull/9515#issuecomment-1692712087

   @hudi-bot run azure





[GitHub] [hudi] nsivabalan opened a new pull request, #9533: [HUDI-6445] Fixing metrics in tests

2023-08-24 Thread via GitHub


nsivabalan opened a new pull request, #9533:
URL: https://github.com/apache/hudi/pull/9533

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low, medium, or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default values of configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #9528: [HUDI-6445] Triage ci flakiness aug24 2023 attempt5

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9528:
URL: https://github.com/apache/hudi/pull/9528#issuecomment-1692711970

   
   ## CI report:
   
   * 198827246a0508c418eed0824ea4772f5e14303b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19450)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] guanziyue commented on a diff in pull request #9523: [DO NOT MEGRE] HUDI-1517 Adding support for Per log file marker to track all log files added by a commit and to assist with rollbac

2023-08-24 Thread via GitHub


guanziyue commented on code in PR #9523:
URL: https://github.com/apache/hudi/pull/9523#discussion_r1305094068


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/DirectWriteMarkers.java:
##
@@ -130,6 +132,53 @@ public Set createdAndMergedDataPaths(HoodieEngineContext context, int pa
 return dataFiles;
   }
 
+  public Set appendedLogPaths(HoodieEngineContext context, int parallelism) throws IOException {

Review Comment:
   Currently createdAndMergedDataPaths does not return any log file markers, because the change to generate 'create'-type markers for log files has been reverted. I noticed you also commented on the code for this change; let me explain it there.
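
Below is a hypothetical sketch of the split described in this comment. MarkerClassifier is illustrative, not DirectWriteMarkers. It assumes marker files are named {dataFileName}.marker.{IOType}, with IOType one of CREATE, MERGE or APPEND. Log files currently receive only APPEND markers, so a separate query returns them instead of the query for created/merged data files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Hudi code. Assumes markers are named
// {dataFileName}.marker.{IOType} and that log files only get APPEND markers.
final class MarkerClassifier {
  private static final String MARKER_INFIX = ".marker.";

  // Data files this commit wrote as new or merged base files.
  static List<String> createdOrMerged(List<String> markerPaths) {
    return dataPathsOfType(markerPaths, Set.of("CREATE", "MERGE"));
  }

  // Log files this commit appended to; queried separately since they carry no CREATE/MERGE marker.
  static List<String> appended(List<String> markerPaths) {
    return dataPathsOfType(markerPaths, Set.of("APPEND"));
  }

  private static List<String> dataPathsOfType(List<String> markerPaths, Set<String> ioTypes) {
    List<String> result = new ArrayList<>();
    for (String marker : markerPaths) {
      int idx = marker.lastIndexOf(MARKER_INFIX);
      if (idx < 0) {
        continue; // not a marker file
      }
      String ioType = marker.substring(idx + MARKER_INFIX.length());
      if (ioTypes.contains(ioType)) {
        result.add(marker.substring(0, idx)); // strip ".marker.{IOType}"
      }
    }
    return result;
  }
}
```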






[GitHub] [hudi] nsivabalan closed pull request #9527: [HUDI-6445] Triage ci flakiness aug24 2023 attempt4

2023-08-24 Thread via GitHub


nsivabalan closed pull request #9527: [HUDI-6445] Triage ci flakiness aug24 
2023 attempt4
URL: https://github.com/apache/hudi/pull/9527





[GitHub] [hudi] nsivabalan closed pull request #9526: [HUDI-6445] Triage ci flakiness aug24 2023 attempt3

2023-08-24 Thread via GitHub


nsivabalan closed pull request #9526: [HUDI-6445] Triage ci flakiness aug24 
2023 attempt3
URL: https://github.com/apache/hudi/pull/9526





[GitHub] [hudi] nsivabalan opened a new pull request, #9532: [HUDI-6445] Triage ci flakiness aug24 2023 attempt8

2023-08-24 Thread via GitHub


nsivabalan opened a new pull request, #9532:
URL: https://github.com/apache/hudi/pull/9532

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] nsivabalan opened a new pull request, #9531: [HUDI-6445] Triage ci flakiness aug24 2023 attempt7

2023-08-24 Thread via GitHub


nsivabalan opened a new pull request, #9531:
URL: https://github.com/apache/hudi/pull/9531






[GitHub] [hudi] nsivabalan opened a new pull request, #9530: [HUDI-6445] Triage ci flakiness aug24 2023 attempt6

2023-08-24 Thread via GitHub


nsivabalan opened a new pull request, #9530:
URL: https://github.com/apache/hudi/pull/9530






[jira] [Updated] (HUDI-6710) Initial investigation to unearth gaps and unearth more open items.

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6710:
--
Status: Patch Available  (was: In Progress)

> Initial investigation to unearth gaps and unearth more open items.
> --
>
> Key: HUDI-6710
> URL: https://issues.apache.org/jira/browse/HUDI-6710
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Vinoth Chandar
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-6710) Initial investigation to unearth gaps and unearth more open items.

2023-08-24 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758801#comment-17758801
 ] 

Sagar Sumit commented on HUDI-6710:
---

I think we are clear on what we need to do to support multi-table transactions, 
in general and for Spark in particular. Timeline and catalog changes are sorted 
out. Read-write and write-write conflicts are also sorted out, assuming different 
concurrency control mechanisms. I will put out an RFC that covers all of this in 
more detail. 

> Initial investigation to unearth gaps and unearth more open items.
> --
>
> Key: HUDI-6710
> URL: https://issues.apache.org/jira/browse/HUDI-6710
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Vinoth Chandar
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Closed] (HUDI-6710) Initial investigation to unearth gaps and unearth more open items.

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-6710.
-
Resolution: Done

> Initial investigation to unearth gaps and unearth more open items.
> --
>
> Key: HUDI-6710
> URL: https://issues.apache.org/jira/browse/HUDI-6710
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Vinoth Chandar
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>






[GitHub] [hudi] danny0405 commented on a diff in pull request #9209: [HUDI-6539] New LSM tree style archived timeline

2023-08-24 Thread via GitHub


danny0405 commented on code in PR #9209:
URL: https://github.com/apache/hudi/pull/9209#discussion_r1305083315


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/ActiveAction.java:
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.utils;

Review Comment:
   done






[jira] [Updated] (HUDI-6749) Support BEGIN...END SQL syntax for multi-table transaction

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6749:
--
Fix Version/s: 1.0.0

> Support BEGIN...END SQL syntax for multi-table transaction
> --
>
> Key: HUDI-6749
> URL: https://issues.apache.org/jira/browse/HUDI-6749
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>
> Need to add this grammar. Spark doesn't natively support it.
> {code:java}
> BEGIN
>  INSERT INTO $table1 VALUES (1, 'a1', 10, 1000);
>  UPDATE TABLE $table2 SET name = 'a2' where id = 1;
> END {code}
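The ticket notes that Spark does not natively support this grammar. One possible first step is to split the block into its individual statements and then execute them under a single transaction. The Java sketch below covers only that splitting step and rests on stated assumptions. `BeginEndBlockSketch` and `splitBlock` are hypothetical names. The sketch also ignores SQL comments, nested blocks, and double-quoted identifiers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BeginEndBlockSketch {
  // Splits "BEGIN <stmt>; <stmt>; END" into its inner statements.
  // Semicolons inside single-quoted literals are not treated as separators.
  static List<String> splitBlock(String sql) {
    String trimmed = sql.trim();
    String upper = trimmed.toUpperCase(Locale.ROOT);
    if (!upper.startsWith("BEGIN") || !upper.endsWith("END")) {
      throw new IllegalArgumentException("Expected a BEGIN ... END block");
    }
    String body = trimmed.substring("BEGIN".length(), trimmed.length() - "END".length());
    List<String> statements = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuote = false;
    for (char c : body.toCharArray()) {
      if (c == '\'') {
        inQuote = !inQuote; // an escaped '' toggles twice, leaving the state unchanged
      }
      if (c == ';' && !inQuote) {
        addIfNotBlank(statements, current);
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    addIfNotBlank(statements, current);
    return statements;
  }

  private static void addIfNotBlank(List<String> out, StringBuilder sb) {
    String stmt = sb.toString().trim();
    if (!stmt.isEmpty()) {
      out.add(stmt);
    }
  }

  public static void main(String[] args) {
    String block = "BEGIN\n"
        + "  INSERT INTO table1 VALUES (1, 'a1', 10, 1000);\n"
        + "  UPDATE table2 SET name = 'a;2' WHERE id = 1;\n"
        + "END";
    for (String stmt : splitBlock(block)) {
      System.out.println(stmt);
    }
  }
}
```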





[jira] [Updated] (HUDI-6750) Enhance ConflictResolutionStrategy to resolve conflicts at database level

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6750:
--
Fix Version/s: 1.0.0

> Enhance ConflictResolutionStrategy to resolve conflicts at database level
> -
>
> Key: HUDI-6750
> URL: https://issues.apache.org/jira/browse/HUDI-6750
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-6748) Enhance TransactionManager to support multi-table transaction

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6748:
--
Fix Version/s: 1.0.0

> Enhance TransactionManager to support multi-table transaction
> -
>
> Key: HUDI-6748
> URL: https://issues.apache.org/jira/browse/HUDI-6748
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>
> Transaction Manager should be database timeline-aware and return a 
> transaction instance when a transaction is started, which can be used with 
> dataframe writes as below.
> {code:java}
> val transaction = HoodieTransactionManager.beginTransaction()
> // similar to catalog.addTableToTransaction and execute the write.
> // Then pass the control to HoodieTransactionManager.
> df1.write.format("hudi").withTransaction(transaction).options(opts1).save(basePath1)
> // Same as above. If this failed, then rollback and pass the control to HoodieTransactionManager.
> // HoodieTransactionManager will rollback all previous transactions associated with transaction id.
> df2.write.format("hudi").withTransaction(transaction).options(opts2).save(basePath2)
>  
> // All success, go ahead and commit or finish rollback (ROLLED_BACK state on timeline is written in that case)
> transaction.commit() {code}
>  





[jira] [Updated] (HUDI-6748) Enhance TransactionManager to support multi-table transaction

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6748:
--
Description: 
Transaction Manager should be database timeline-aware and return a transaction 
instance when a transaction is started, which can be used with dataframe writes 
as below.
{code:java}
val transaction = HoodieTransactionManager.beginTransaction()
df1.write.format("hudi").withTransaction(transaction).options(opts1).save(basePath1)
df2.write.format("hudi").withTransaction(transaction).options(opts2).save(basePath2)
 
// All success, go ahead and commit or finish rollback (ROLLED_BACK state on timeline is written in that case)
transaction.commit() {code}
 

  was:
Transaction Manager should be database timeline-aware and return a transaction 
instance when a transaction is started, which can be used with dataframe writes 
as below.
{code:java}
val transaction = HoodieTransactionManager.beginTransaction()
// similar to catalog.addTableToTransaction and execute the write.
// Then pass the control to HoodieTransactionManager.
df1.write.format("hudi").withTransaction(transaction).options(opts1).save(basePath1)
// Same as above. If this failed, then rollback and pass the control to HoodieTransactionManager.
// HoodieTransactionManager will rollback all previous transactions associated with transaction id.
df2.write.format("hudi").withTransaction(transaction).options(opts2).save(basePath2)
 
// All success, go ahead and commit or finish rollback (ROLLED_BACK state on timeline is written in that case)
transaction.commit() {code}
 


> Enhance TransactionManager to support multi-table transaction
> -
>
> Key: HUDI-6748
> URL: https://issues.apache.org/jira/browse/HUDI-6748
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>
> Transaction Manager should be database timeline-aware and return a 
> transaction instance when a transaction is started, which can be used with 
> dataframe writes as below.
> {code:java}
> val transaction = HoodieTransactionManager.beginTransaction()
> df1.write.format("hudi").withTransaction(transaction).options(opts1).save(basePath1)
> df2.write.format("hudi").withTransaction(transaction).options(opts2).save(basePath2)
>  
> // All success, go ahead and commit or finish rollback (ROLLED_BACK state on timeline is written in that case)
> transaction.commit() {code}
>  
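The sketch below models the all-or-nothing semantics described above, entirely in memory. It does not use Hudi or Spark. `beginTransaction`, `write`, and `commit` are hypothetical stand-ins for the proposed `HoodieTransactionManager` API. Each table's timeline is modeled as a plain list of instant states, keyed by base path. Every write first records a PENDING instant. `commit()` then marks all of these instants COMMITTED, or marks all of them ROLLED_BACK if any write failed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiTableTransactionSketch {
  enum State { PENDING, COMMITTED, ROLLED_BACK }

  // Hypothetical in-memory stand-in for per-table timelines: base path -> instant states.
  final Map<String, List<State>> timelines = new LinkedHashMap<>();

  Transaction beginTransaction() {
    return new Transaction();
  }

  class Transaction {
    private final List<String> paths = new ArrayList<>();
    private final List<Integer> instants = new ArrayList<>();
    private boolean failed = false;

    // Records a PENDING instant on the table's timeline, then runs the write.
    void write(String basePath, Runnable writeAction) {
      if (failed) {
        return; // an earlier write failed; the whole transaction will roll back
      }
      List<State> timeline = timelines.computeIfAbsent(basePath, k -> new ArrayList<>());
      timeline.add(State.PENDING);
      paths.add(basePath);
      instants.add(timeline.size() - 1);
      try {
        writeAction.run();
      } catch (RuntimeException e) {
        failed = true;
      }
    }

    // All-or-nothing: mark every instant COMMITTED, or every one ROLLED_BACK.
    State commit() {
      State outcome = failed ? State.ROLLED_BACK : State.COMMITTED;
      for (int i = 0; i < paths.size(); i++) {
        timelines.get(paths.get(i)).set(instants.get(i), outcome);
      }
      return outcome;
    }
  }

  public static void main(String[] args) {
    MultiTableTransactionSketch manager = new MultiTableTransactionSketch();

    MultiTableTransactionSketch.Transaction ok = manager.beginTransaction();
    ok.write("/tmp/table1", () -> { });
    ok.write("/tmp/table2", () -> { });
    System.out.println(ok.commit() + " " + manager.timelines);

    MultiTableTransactionSketch.Transaction bad = manager.beginTransaction();
    bad.write("/tmp/table1", () -> { });
    bad.write("/tmp/table2", () -> { throw new RuntimeException("write failed"); });
    System.out.println(bad.commit() + " " + manager.timelines);
  }
}
```

The sketch skips the remaining writes after the first failure, which is one possible design choice. The ticket itself states only that earlier writes associated with the transaction are rolled back.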





[jira] [Updated] (HUDI-6745) Make Spark HoodieCatalog transaction-aware

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6745:
--
Fix Version/s: 1.0.0

> Make Spark HoodieCatalog transaction-aware
> --
>
> Key: HUDI-6745
> URL: https://issues.apache.org/jira/browse/HUDI-6745
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-6746) Add database timeline and new actions to support multi-table transactions

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6746:
--
Fix Version/s: 1.0.0

> Add database timeline and new actions to support multi-table transactions
> -
>
> Key: HUDI-6746
> URL: https://issues.apache.org/jira/browse/HUDI-6746
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-6744) Support notion of database in HoodieCatalog

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6744:
--
Fix Version/s: 1.0.0

> Support notion of database in HoodieCatalog
> ---
>
> Key: HUDI-6744
> URL: https://issues.apache.org/jira/browse/HUDI-6744
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Assigned] (HUDI-6744) Support notion of database in HoodieCatalog

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-6744:
-

Assignee: Sagar Sumit

> Support notion of database in HoodieCatalog
> ---
>
> Key: HUDI-6744
> URL: https://issues.apache.org/jira/browse/HUDI-6744
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>






[GitHub] [hudi] voonhous commented on a diff in pull request #9472: [HUDI-6719]Fix data inconsistency issues caused by concurrent clustering and delete partition.

2023-08-24 Thread via GitHub


voonhous commented on code in PR #9472:
URL: https://github.com/apache/hudi/pull/9472#discussion_r1305077926


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/ReplaceCommitValidateUtil.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.action.cluster;
+
+import org.apache.hudi.common.model.HoodieReplaceCommitMetadata;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.JsonUtils;
+import org.apache.hudi.exception.HoodieException;
+
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.util.FileIOUtils.LOG;
+
+public class ReplaceCommitValidateUtil {
+  public static final String REPLACE_COMMIT_FILE_IDS = "replaceCommitFileIds";
+  public static void validateReplaceCommit(HoodieTableMetaClient metaClient) {
+    metaClient.reloadActiveTimeline();
+    Set replaceFileids = new HashSet<>();
+
+    // Verify pending and completed replace commit
+    Stream.concat(metaClient.getActiveTimeline().getCompletedReplaceTimeline().getInstantsAsStream(),
+        metaClient.getActiveTimeline().filterInflights().filterPendingReplaceTimeline().getInstantsAsStream()).map(instant -> {
+      try {
+        HoodieReplaceCommitMetadata replaceCommitMetadata =
+            HoodieReplaceCommitMetadata.fromBytes(metaClient.getActiveTimeline().getInstantDetails(instant).get(),
+                HoodieReplaceCommitMetadata.class);
+        if (!instant.isCompleted()) {
+          return JsonUtils.getObjectMapper().readValue(replaceCommitMetadata.getExtraMetadata().get(REPLACE_COMMIT_FILE_IDS), String[].class);
+        } else {
+          return replaceCommitMetadata.getPartitionToReplaceFileIds().values().stream()
+              .flatMap(List::stream)
+              .toArray(String[]::new);
+        }
+      } catch (Exception e) {
+        // If the key does not exist or there is a JSON parsing error, LOG reports an error and ignores it.
+        LOG.error("Error when reading replace commit meta", e);

Review Comment:
   NIT: ~~replace commit~~ ->  `REPLACECOMMIT`
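The quoted hunk gathers file IDs from completed and pending REPLACECOMMIT instants into `replaceFileids`. The rest of the method falls outside this excerpt. Below is a minimal stdlib-only sketch of the overlap check that such a set makes possible. It assumes that validation should fail when two replace instants target the same file ID. For example, a pending clustering and a delete partition might both cover `fg-2`. `ReplaceFileIdOverlapSketch` and `validateNoOverlap` are hypothetical names, and the exact condition used in the PR is not visible here.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReplaceFileIdOverlapSketch {
  // Collects file IDs from each replace instant and fails on the first ID
  // that is claimed by more than one instant.
  static Set<String> validateNoOverlap(List<String[]> fileIdsPerInstant) {
    Set<String> seen = new LinkedHashSet<>();
    for (String[] fileIds : fileIdsPerInstant) {
      // De-duplicate within one instant so only cross-instant overlap is reported.
      for (String fileId : new LinkedHashSet<>(Arrays.asList(fileIds))) {
        if (!seen.add(fileId)) {
          throw new IllegalStateException(
              "File ID " + fileId + " is targeted by more than one replace instant");
        }
      }
    }
    return seen;
  }

  public static void main(String[] args) {
    String[] pendingClustering = {"fg-1", "fg-2"};
    String[] deletePartition = {"fg-2", "fg-3"};
    try {
      validateNoOverlap(List.of(pendingClustering, deletePartition));
    } catch (IllegalStateException e) {
      System.out.println("Conflict detected: " + e.getMessage());
    }
  }
}
```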




[jira] [Assigned] (HUDI-6749) Support BEGIN...END SQL syntax for multi-table transaction

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-6749:
-

Assignee: Sagar Sumit

> Support BEGIN...END SQL syntax for multi-table transaction
> --
>
> Key: HUDI-6749
> URL: https://issues.apache.org/jira/browse/HUDI-6749
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>
> Need to add this grammar. Spark doesn't natively support it.
> {code:java}
> BEGIN
>  INSERT INTO $table1 VALUES (1, 'a1', 10, 1000);
>  UPDATE TABLE $table2 SET name = 'a2' where id = 1;
> END {code}





[jira] [Assigned] (HUDI-6748) Enhance TransactionManager to support multi-table transaction

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-6748:
-

Assignee: Sagar Sumit

> Enhance TransactionManager to support multi-table transaction
> -
>
> Key: HUDI-6748
> URL: https://issues.apache.org/jira/browse/HUDI-6748
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>
> Transaction Manager should be database timeline-aware and return a 
> transaction instance when a transaction is started, which can be used with 
> dataframe writes as below.
> {code:java}
> val transaction = HoodieTransactionManager.beginTransaction()
> // similar to catalog.addTableToTransaction and execute the write.
> // Then pass the control to HoodieTransactionManager.
> df1.write.format("hudi").withTransaction(transaction).options(opts1).save(basePath1)
> // Same as above. If this failed, then rollback and pass the control to HoodieTransactionManager.
> // HoodieTransactionManager will rollback all previous transactions associated with transaction id.
> df2.write.format("hudi").withTransaction(transaction).options(opts2).save(basePath2)
>  
> // All success, go ahead and commit or finish rollback (ROLLED_BACK state on timeline is written in that case)
> transaction.commit() {code}
>  





[jira] [Assigned] (HUDI-6745) Make Spark HoodieCatalog transaction-aware

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-6745:
-

Assignee: Sagar Sumit

> Make Spark HoodieCatalog transaction-aware
> --
>
> Key: HUDI-6745
> URL: https://issues.apache.org/jira/browse/HUDI-6745
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>






[jira] [Assigned] (HUDI-6750) Enhance ConflictResolutionStrategy to resolve conflicts at database level

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-6750:
-

Assignee: Sagar Sumit

> Enhance ConflictResolutionStrategy to resolve conflicts at database level
> -
>
> Key: HUDI-6750
> URL: https://issues.apache.org/jira/browse/HUDI-6750
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>






[jira] [Assigned] (HUDI-6746) Add database timeline and new actions to support multi-table transactions

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-6746:
-

Assignee: Sagar Sumit

> Add database timeline and new actions to support multi-table transactions
> -
>
> Key: HUDI-6746
> URL: https://issues.apache.org/jira/browse/HUDI-6746
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>






[jira] [Created] (HUDI-6750) Enhance ConflictResolutionStrategy to resolve conflicts at database level

2023-08-24 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-6750:
-

 Summary: Enhance ConflictResolutionStrategy to resolve conflicts 
at database level
 Key: HUDI-6750
 URL: https://issues.apache.org/jira/browse/HUDI-6750
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit








[jira] [Created] (HUDI-6749) Support BEGIN...END SQL syntax for multi-table transaction

2023-08-24 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-6749:
-

 Summary: Support BEGIN...END SQL syntax for multi-table transaction
 Key: HUDI-6749
 URL: https://issues.apache.org/jira/browse/HUDI-6749
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit


Need to add this grammar. Spark doesn't natively support it.
{code:java}
BEGIN
 INSERT INTO $table1 VALUES (1, 'a1', 10, 1000);
 UPDATE TABLE $table2 SET name = 'a2' where id = 1;
END {code}





[GitHub] [hudi] hudi-bot commented on pull request #9525: [HUDI-6445] Triage ci flakiness aug24 2023 attempt2

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9525:
URL: https://github.com/apache/hudi/pull/9525#issuecomment-1692685689

   
   ## CI report:
   
   * 2a619f68600e8bd311102ecd8954698e17c6843d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19447)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Created] (HUDI-6748) Enhance TransactionManager to support multi-table transaction

2023-08-24 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-6748:
-

 Summary: Enhance TransactionManager to support multi-table 
transaction
 Key: HUDI-6748
 URL: https://issues.apache.org/jira/browse/HUDI-6748
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit


Transaction Manager should be database timeline-aware and return a transaction 
instance when a transaction is started, which can be used with dataframe writes 
as below.
{code:java}
val transaction = HoodieTransactionManager.beginTransaction()
// similar to catalog.addTableToTransaction and execute the write.
// Then pass the control to HoodieTransactionManager.
df1.write.format("hudi").withTransaction(transaction).options(opts1).save(basePath1)
// Same as above. If this failed, then rollback and pass the control to HoodieTransactionManager.
// HoodieTransactionManager will rollback all previous transactions associated with transaction id.
df2.write.format("hudi").withTransaction(transaction).options(opts2).save(basePath2)
 
// All success, go ahead and commit or finish rollback (ROLLED_BACK state on timeline is written in that case)
transaction.commit() {code}
 





[GitHub] [hudi] danny0405 commented on a diff in pull request #9209: [HUDI-6539] New LSM tree style archived timeline

2023-08-24 Thread via GitHub


danny0405 commented on code in PR #9209:
URL: https://github.com/apache/hudi/pull/9209#discussion_r1305073510


##
hudi-common/src/main/avro/HoodieLSMTimelineInstant.avsc:
##
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+   "type":"record",
+   "name":"HoodieLSMTimelineInstant",
+   "namespace":"org.apache.hudi.avro.model",
+   "fields":[
+  {
+ "name":"instantTime",
+ "type":["null","string"],
+ "default": null
+  },
+  {
+ "name":"completionTime",
+ "type":["null","string"],
+ "default": null
+  },
+  {
+ "name":"action",
+ "type":["null","string"],
+ "default": null
+  },
+  {
+ "name":"metadata",
+ "type":["null", "bytes"],
+ "default": null
+  },
+  {
+ "name":"plan",
+ "type":["null", "bytes"],

Review Comment:
   Yeah, bytes is more efficient. Filed a follow-up JIRA: 
https://issues.apache.org/jira/browse/HUDI-6747



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-6747) Support decode UDF for plan and commit metadata from the archived timeline

2023-08-24 Thread Danny Chen (Jira)
Danny Chen created HUDI-6747:


 Summary: Support decode UDF for plan and commit metadata from the 
archived timeline
 Key: HUDI-6747
 URL: https://issues.apache.org/jira/browse/HUDI-6747
 Project: Apache Hudi
  Issue Type: Improvement
  Components: core
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 1.0.0








[jira] [Created] (HUDI-6746) Add database timeline and new actions to support multi-table transactions

2023-08-24 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-6746:
-

 Summary: Add database timeline and new actions to support 
multi-table transactions
 Key: HUDI-6746
 URL: https://issues.apache.org/jira/browse/HUDI-6746
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit








[jira] [Updated] (HUDI-6744) Support notion of database in HoodieCatalog

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6744:
--
Summary: Support notion of database in HoodieCatalog  (was: Support 
database and database timeline in HoodieCatalog)

> Support notion of database in HoodieCatalog
> ---
>
> Key: HUDI-6744
> URL: https://issues.apache.org/jira/browse/HUDI-6744
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
>






[jira] [Updated] (HUDI-6745) Make Spark HoodieCatalog transaction-aware

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6745:
--
Summary: Make Spark HoodieCatalog transaction-aware  (was: Make 
HoodieCatalog transaction-aware)

> Make Spark HoodieCatalog transaction-aware
> --
>
> Key: HUDI-6745
> URL: https://issues.apache.org/jira/browse/HUDI-6745
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
>






[GitHub] [hudi] voonhous commented on a diff in pull request #9472: [HUDI-6719]Fix data inconsistency issues caused by concurrent clustering and delete partition.

2023-08-24 Thread via GitHub


voonhous commented on code in PR #9472:
URL: https://github.com/apache/hudi/pull/9472#discussion_r1305067820


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/ReplaceCommitValidateUtil.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.action.cluster;
+
+import org.apache.hudi.common.model.HoodieReplaceCommitMetadata;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.JsonUtils;
+import org.apache.hudi.exception.HoodieException;
+
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.util.FileIOUtils.LOG;
+
+public class ReplaceCommitValidateUtil {
+  public static final String REPLACE_COMMIT_FILE_IDS = "replaceCommitFileIds";
+  public static void validateReplaceCommit(HoodieTableMetaClient metaClient) {
+metaClient.reloadActiveTimeline();
+Set replaceFileids = new HashSet<>();
+
+// Verify pending and completed replace commit
+Stream.concat(metaClient.getActiveTimeline().getCompletedReplaceTimeline().getInstantsAsStream(),
+metaClient.getActiveTimeline().filterInflights().filterPendingReplaceTimeline().getInstantsAsStream()).map(instant -> {
+  try {
+HoodieReplaceCommitMetadata replaceCommitMetadata =
+HoodieReplaceCommitMetadata.fromBytes(metaClient.getActiveTimeline().getInstantDetails(instant).get(),
+HoodieReplaceCommitMetadata.class);
+if (!instant.isCompleted()) {
+  return JsonUtils.getObjectMapper().readValue(replaceCommitMetadata.getExtraMetadata().get(REPLACE_COMMIT_FILE_IDS), String[].class);
+} else {
+  return replaceCommitMetadata.getPartitionToReplaceFileIds().values().stream()
+  .flatMap(List::stream)
+  .toArray(String[]::new);
+}
+  } catch (Exception e) {
+// If the key does not exist or there is a JSON parsing error, LOG reports an error and ignores it.
+LOG.error("Error when reading replace commit meta", e);
+return null;
+  }
+}).filter(Objects::nonNull)
+.forEach(fileIdArray -> {

Review Comment:
   NIT: Use a `flatMap` here instead? 
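The reviewer's `flatMap` suggestion can be illustrated in isolation. The sketch below assumes the per-instant results are already available as a `Stream<String[]>` in which `null` marks an instant whose metadata could not be read, as in the diff; the class and method names (`ReplaceFileIdsSketch`, `collectReplacedFileIds`) are hypothetical. Flattening with `flatMap(Arrays::stream)` lets one collector build the set, assuming the `forEach` body (not shown above) only walks each array to gather its IDs.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReplaceFileIdsSketch {
  /**
   * Hypothetical helper: flattens the file-ID arrays read from each replace
   * instant into a single set. Instants whose metadata could not be read are
   * represented by null (as in the diff) and are skipped.
   */
  static Set<String> collectReplacedFileIds(Stream<String[]> perInstantFileIds) {
    return perInstantFileIds
        .filter(Objects::nonNull)   // drop unreadable instants
        .flatMap(Arrays::stream)    // String[] -> Stream<String>
        .collect(Collectors.toSet());
  }
}
```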






[jira] [Created] (HUDI-6745) Make HoodieCatalog transaction-aware

2023-08-24 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-6745:
-

 Summary: Make HoodieCatalog transaction-aware
 Key: HUDI-6745
 URL: https://issues.apache.org/jira/browse/HUDI-6745
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit








[jira] [Created] (HUDI-6744) Support database and database timeline in HoodieCatalog

2023-08-24 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-6744:
-

 Summary: Support database and database timeline in HoodieCatalog
 Key: HUDI-6744
 URL: https://issues.apache.org/jira/browse/HUDI-6744
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit








[jira] [Updated] (HUDI-6711) Write a RFC for Multi Table Txns

2023-08-24 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6711:
--
Status: In Progress  (was: Open)

> Write a RFC for Multi Table Txns
> 
>
> Key: HUDI-6711
> URL: https://issues.apache.org/jira/browse/HUDI-6711
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Vinoth Chandar
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>






[GitHub] [hudi] hudi-bot commented on pull request #9523: [DO NOT MEGRE] HUDI-1517-new with nsivabalan

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9523:
URL: https://github.com/apache/hudi/pull/9523#issuecomment-1692645105

   
   ## CI report:
   
   * f04d3fb5e5400c0dc6e4a091e7251381fa00a743 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19446)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9521: [HUDI-6736] Revert pr 8849

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9521:
URL: https://github.com/apache/hudi/pull/9521#issuecomment-1692639315

   
   ## CI report:
   
   * 0aa97d414fd91d95e5931d108407dbc2b280b519 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19444)
 
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9526: [HUDI-6445] Triage ci flakiness aug24 2023 attempt3

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9526:
URL: https://github.com/apache/hudi/pull/9526#issuecomment-1692631752

   
   ## CI report:
   
   * 418646378e3c1a7a949188c6ea9ed74629a7f8d4 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19449)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19448)
 
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9527: [HUDI-6445] Triage ci flakiness aug24 2023 attempt4

2023-08-24 Thread via GitHub


hudi-bot commented on PR #9527:
URL: https://github.com/apache/hudi/pull/9527#issuecomment-1692631780

   
   ## CI report:
   
   * 418646378e3c1a7a949188c6ea9ed74629a7f8d4 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19449)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19448)
 
   
   





[jira] [Comment Edited] (HUDI-6712) Implement optimized keyed lookup on parquet files

2023-08-24 Thread Lin Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758787#comment-17758787
 ] 

Lin Liu edited comment on HUDI-6712 at 8/25/23 1:33 AM:


While moving the lake_plumber code into Hudi, we found that lake_plumber uses Parquet 1.13.1, whereas Hudi uses 1.10.1 for Spark 2 and 1.12.2 for Spark 3. Although we can ignore compilation for Spark 2 for now, I have done a few checks for Spark 3:
 # ParquetRewriter compiles against Parquet 1.12.2, and its benchmark ran on a 1.10.1 file without any issues (the benchmark finished successfully).
 # ParqueKeyedLookup compiles against Parquet 1.12.2; however, its benchmark fails to compile. After commenting out the failing part, the benchmark threw a NullPointerException during execution. After checking, the error is related to the page index. Will dig deeper.


was (Author: JIRAUSER301185):
While moving the lake_plumber code into Hudi, we found that lake_plumber uses Parquet 1.13.1, whereas Hudi uses 1.10.1 for Spark 2 and 1.12.2 for Spark 3. Although we cannot ignore compilation for Spark 2, I have done a few checks:
 # ParquetRewriter compiles against Parquet 1.12.2, and its benchmark ran on a 1.10.1 file without any issues (the benchmark finished successfully).
 # ParqueKeyedLookup compiles against Parquet 1.12.2; however, its benchmark fails to compile. After commenting out the failing part, the benchmark threw a NullPointerException during execution. After checking, the error is related to the page index. Will dig deeper.

> Implement optimized keyed lookup on parquet files
> -
>
> Key: HUDI-6712
> URL: https://issues.apache.org/jira/browse/HUDI-6712
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Vinoth Chandar
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Parquet performs poorly when looking up specific records based on a single
> key lookup column.
> e.g.: select * from parquet where key in ("a", "b", "c") (SQL)
> e.g.: List lookup(parquetFile, Set keys) (code)
> Let's implement a reader that is optimized for this pattern by scanning the
> least amount of data.
> Requirements:
> 1. Need to support multiple values for the same key.
> 2. Can assume the file is sorted by the key/lookup field.
> 3. Should handle non-existent keys.
> 4. Should leverage parquet metadata (bloom filters, column index, ...) to
> minimize reads.
> 5. Must make the minimum number of RPC calls to cloud storage.
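Requirements 1 to 3 can be illustrated without Parquet itself. The sketch below is a hypothetical, in-memory stand-in, not the planned reader: it assumes the key column of one file has already been decoded into a sorted array, binary-searches for the first occurrence of each requested key, and then scans forward over duplicates. Requirements 4 and 5 (Parquet metadata, RPC counts) are out of scope here, and `SortedKeyLookup` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a keyed reader: it assumes the key column
// has already been decoded into a sorted array (requirement 2). It does not read
// Parquet, use its metadata, or model storage RPCs.
public class SortedKeyLookup {

  /** Index of the first element >= key, searching keys[from..] (binary search). */
  static int lowerBound(String[] keys, String key, int from) {
    int lo = from;
    int hi = keys.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (keys[mid].compareTo(key) < 0) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /**
   * Maps each wanted key to the row positions that hold it: several positions for a
   * repeated key (requirement 1), an empty list for a missing key (requirement 3).
   */
  static Map<String, List<Integer>> lookup(String[] sortedKeys, SortedSet<String> wanted) {
    Map<String, List<Integer>> result = new LinkedHashMap<>();
    int from = 0; // wanted keys are visited in ascending order, so each search starts where the last ended
    for (String key : wanted) {
      int i = lowerBound(sortedKeys, key, from);
      List<Integer> rows = new ArrayList<>();
      while (i < sortedKeys.length && sortedKeys[i].equals(key)) {
        rows.add(i++);
      }
      result.put(key, rows);
      from = i;
    }
    return result;
  }

  public static void main(String[] args) {
    String[] keys = {"a", "b", "b", "d"};
    SortedSet<String> wanted = new TreeSet<>(Arrays.asList("c", "b", "a"));
    System.out.println(lookup(keys, wanted)); // {a=[0], b=[1, 2], c=[]}
  }
}
```

Because both the data and the requested keys are sorted, the search window only moves forward, which is the same property a real reader could use to skip pages or row groups.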





[jira] [Comment Edited] (HUDI-6712) Implement optimized keyed lookup on parquet files

2023-08-24 Thread Lin Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758787#comment-17758787
 ] 

Lin Liu edited comment on HUDI-6712 at 8/25/23 1:31 AM:


While moving the lake_plumber code into Hudi, we found that lake_plumber uses Parquet 1.13.1, whereas Hudi uses 1.10.1 for Spark 2 and 1.12.2 for Spark 3. Although we cannot ignore compilation for Spark 2, I have done a few checks:
 # ParquetRewriter compiles against Parquet 1.12.2, and its benchmark ran on a 1.10.1 file without any issues (the benchmark finished successfully).
 # ParqueKeyedLookup compiles against Parquet 1.12.2; however, its benchmark fails to compile. After commenting out the failing part, the benchmark threw a NullPointerException during execution. After checking, the error is related to the page index. Will dig deeper.


was (Author: JIRAUSER301185):
While moving the lake_plumber code into Hudi, we found that lake_plumber uses Parquet 1.13.1, whereas Hudi uses 1.10.1 for Spark 2 and 1.12.2 for Spark 3. Although we cannot ignore compilation for Spark 2, I have done a few checks:
 # ParquetRewriter compiles against Parquet 1.12.2, and its benchmark ran on a 1.10.1 file without any issues (the benchmark finished successfully).
 # ParqueKeyedLookup compiles against Parquet 1.12.2; however, its benchmark fails to compile. After commenting out the failing part, the benchmark threw a NullPointerException. After checking, the error is related to the page index. Will dig deeper.

> Implement optimized keyed lookup on parquet files
> -
>
> Key: HUDI-6712
> URL: https://issues.apache.org/jira/browse/HUDI-6712
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Vinoth Chandar
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Parquet performs poorly when looking up specific records based on a single
> key lookup column.
> e.g.: select * from parquet where key in ("a", "b", "c") (SQL)
> e.g.: List lookup(parquetFile, Set keys) (code)
> Let's implement a reader that is optimized for this pattern by scanning the
> least amount of data.
> Requirements:
> 1. Need to support multiple values for the same key.
> 2. Can assume the file is sorted by the key/lookup field.
> 3. Should handle non-existent keys.
> 4. Should leverage parquet metadata (bloom filters, column index, ...) to
> minimize reads.
> 5. Must make the minimum number of RPC calls to cloud storage.





[jira] [Comment Edited] (HUDI-6712) Implement optimized keyed lookup on parquet files

2023-08-24 Thread Lin Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758787#comment-17758787
 ] 

Lin Liu edited comment on HUDI-6712 at 8/25/23 1:31 AM:


While moving the lake_plumber code into Hudi, we found that lake_plumber uses Parquet 1.13.1, whereas Hudi uses 1.10.1 for Spark 2 and 1.12.2 for Spark 3. Although we cannot ignore compilation for Spark 2, I have done a few checks:
 # ParquetRewriter compiles against Parquet 1.12.2, and its benchmark ran on a 1.10.1 file without any issues (the benchmark finished successfully).
 # ParqueKeyedLookup compiles against Parquet 1.12.2; however, its benchmark fails to compile. After commenting out the failing part, the benchmark threw a NullPointerException. After checking, the error is related to the page index. Will dig deeper.


was (Author: JIRAUSER301185):
While moving the lake_plumber code into Hudi, we found that lake_plumber uses Parquet 1.13.1, whereas Hudi uses 1.10.1 for Spark 2 and 1.12.2 for Spark 3. Although we cannot ignore compilation for Spark 2, I have done a few checks:
 # ParquetRewriter compiles against Parquet 1.12.2, and its benchmark ran on a 1.10.1 file without any issues (the benchmark finished successfully).
 # ParqueKeyedLookup compiles against Parquet 1.12.2; however, its benchmark fails to compile. After removing the failing part, the benchmark threw a NullPointerException. After checking, the error is related to the page index. Will dig deeper.

> Implement optimized keyed lookup on parquet files
> -
>
> Key: HUDI-6712
> URL: https://issues.apache.org/jira/browse/HUDI-6712
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Vinoth Chandar
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Parquet performs poorly when looking up specific records based on a single
> key lookup column.
> e.g.: select * from parquet where key in ("a", "b", "c") (SQL)
> e.g.: List lookup(parquetFile, Set keys) (code)
> Let's implement a reader that is optimized for this pattern by scanning the
> least amount of data.
> Requirements:
> 1. Need to support multiple values for the same key.
> 2. Can assume the file is sorted by the key/lookup field.
> 3. Should handle non-existent keys.
> 4. Should leverage parquet metadata (bloom filters, column index, ...) to
> minimize reads.
> 5. Must make the minimum number of RPC calls to cloud storage.





[jira] [Commented] (HUDI-6712) Implement optimized keyed lookup on parquet files

2023-08-24 Thread Lin Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758787#comment-17758787
 ] 

Lin Liu commented on HUDI-6712:
---

While moving the lake_plumber code into Hudi, we found that lake_plumber uses Parquet 1.13.1, whereas Hudi uses 1.10.1 for Spark 2 and 1.12.2 for Spark 3. Although we cannot ignore compilation for Spark 2, I have done a few checks:
 # ParquetRewriter compiles against Parquet 1.12.2, and its benchmark ran on a 1.10.1 file without any issues (the benchmark finished successfully).
 # ParqueKeyedLookup compiles against Parquet 1.12.2; however, its benchmark fails to compile. After removing the failing part, the benchmark threw a NullPointerException. After checking, the error is related to the page index. Will dig deeper.

> Implement optimized keyed lookup on parquet files
> -
>
> Key: HUDI-6712
> URL: https://issues.apache.org/jira/browse/HUDI-6712
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Vinoth Chandar
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Parquet performs poorly when looking up specific records based on a single
> key lookup column.
> e.g.: select * from parquet where key in ("a", "b", "c") (SQL)
> e.g.: List lookup(parquetFile, Set keys) (code)
> Let's implement a reader that is optimized for this pattern by scanning the
> least amount of data.
> Requirements:
> 1. Need to support multiple values for the same key.
> 2. Can assume the file is sorted by the key/lookup field.
> 3. Should handle non-existent keys.
> 4. Should leverage parquet metadata (bloom filters, column index, ...) to
> minimize reads.
> 5. Must make the minimum number of RPC calls to cloud storage.





[GitHub] [hudi] njalan opened a new issue, #9529: S3 storage cpu is high when spark streaming is upserting data into hudi

2023-08-24 Thread via GitHub


njalan opened a new issue, #9529:
URL: https://github.com/apache/hudi/issues/9529

   The CPU usage of our S3 storage is high when Spark Streaming is upserting data into Hudi.
   The storage is a local, S3-compatible object store. As the number of Spark Streaming jobs grows, its CPU usage becomes quite high, while I/O stays low.
   
   
   
   **Environment Description**
   
   * Hudi version : 0.9
   
   * Spark version : 3.0.1
   
   * Hive version : 3
   
   * Hadoop version : 3.2.2
   
   * Storage (HDFS/S3/GCS..) : s3
   
   * Running on Docker? (yes/no) :no
   
   Is there any way to monitor the Hudi ingestion process? I would like to get more performance metrics.





