Re: [PR] [HUDI-7624] Fixing index tagging duration [hudi]

2024-05-05 Thread via GitHub


nsivabalan commented on code in PR #11035:
URL: https://github.com/apache/hudi/pull/11035#discussion_r1590607676


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/HoodieMetrics.java:
##
@@ -207,6 +210,13 @@ public Timer.Context getIndexCtx() {
 return indexTimer == null ? null : indexTimer.time();
   }
 
+  public Timer.Context getPreWriteTimerCtx() {
+if (config.isMetricsOn() && preWriteTimer == null) {
+  preWriteTimer = createTimer(preWriteTimerName);
+}

Review Comment:
   sure. do you have any good suggestion for this metric name. 
   As you might be aware, it spans from reading from source up until completion 
of tag location. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7429] Fixing average record size estimation for delta commits [hudi]

2024-05-05 Thread via GitHub


nsivabalan commented on code in PR #10763:
URL: https://github.com/apache/hudi/pull/10763#discussion_r1590594514


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/AverageRecordSizeUtils.java:
##
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.commit;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Iterator;
+import java.util.concurrent.atomic.AtomicLong;
+
+import static 
org.apache.hudi.common.table.timeline.HoodieTimeline.COMMIT_ACTION;
+import static 
org.apache.hudi.common.table.timeline.HoodieTimeline.DELTA_COMMIT_ACTION;
+import static 
org.apache.hudi.common.table.timeline.HoodieTimeline.REPLACE_COMMIT_ACTION;
+
+/**
+ * Util class to assist with fetching average record size.
+ */
+public class AverageRecordSizeUtils {
+  private static final Logger LOG = 
LoggerFactory.getLogger(AverageRecordSizeUtils.class);
+
+  /**
+   * Obtains the average record size based on records written during previous 
commits. Used for estimating how many
+   * records pack into one file.
+   */
+  static long averageBytesPerRecord(HoodieTimeline commitTimeline, 
HoodieWriteConfig hoodieWriteConfig) {
+long avgSize = hoodieWriteConfig.getCopyOnWriteRecordSizeEstimate();
+long fileSizeThreshold = (long) 
(hoodieWriteConfig.getRecordSizeEstimationThreshold() * 
hoodieWriteConfig.getParquetSmallFileLimit());
+try {

Review Comment:
   addressed it. you can take a look



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7716] Adding more logs around index lookup, kafka sources. [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11157:
URL: https://github.com/apache/hudi/pull/11157#issuecomment-2095279910

   
   ## CI report:
   
   * 8bfd0fc5c69d82e649512df5f4ae5d286e457137 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11151:
URL: https://github.com/apache/hudi/pull/11151#issuecomment-2095279790

   
   ## CI report:
   
   * 2985ea2ec2f8a0b62086a6ac9933654051a65738 UNKNOWN
   * eb1c4d47c1464b870719109da1944cec21cd4327 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23681)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11137:
URL: https://github.com/apache/hudi/pull/11137#issuecomment-2095279710

   
   ## CI report:
   
   * adc1380cb496881fd2f1c8b30aa059759c7c5c9c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23662)
 
   * 4c6c60d3b1bdde2c780374b44962e77a6120015f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23683)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2095279242

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * 51bee45de0d12f1613d7af314914fceb585f4282 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23680)
 
   * 196a076af315f9036459dea07c13df6267de296b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23682)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11137:
URL: https://github.com/apache/hudi/pull/11137#issuecomment-2095271629

   
   ## CI report:
   
   * adc1380cb496881fd2f1c8b30aa059759c7c5c9c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23662)
 
   * 4c6c60d3b1bdde2c780374b44962e77a6120015f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2095271131

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * 51bee45de0d12f1613d7af314914fceb585f4282 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23680)
 
   * 196a076af315f9036459dea07c13df6267de296b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7716) Add more logs around index lookup

2024-05-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7716:
-
Labels: pull-request-available  (was: )

> Add more logs around index lookup
> -
>
> Key: HUDI-7716
> URL: https://issues.apache.org/jira/browse/HUDI-7716
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: index
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7716] Adding more logs around index lookup, kafka sources. [hudi]

2024-05-05 Thread via GitHub


nsivabalan opened a new pull request, #11157:
URL: https://github.com/apache/hudi/pull/11157

   ### Change Logs
   
   Adding more logs around index lookup, kafka sources. 
   
   ### Impact
   
   Adding more logs around index lookup, kafka sources. 
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7716) Add more logs around index lookup

2024-05-05 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-7716:
-

 Summary: Add more logs around index lookup
 Key: HUDI-7716
 URL: https://issues.apache.org/jira/browse/HUDI-7716
 Project: Apache Hudi
  Issue Type: Improvement
  Components: index
Reporter: sivabalan narayanan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2095263558

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * 51bee45de0d12f1613d7af314914fceb585f4282 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23680)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


xuzifu666 commented on code in PR #10898:
URL: https://github.com/apache/hudi/pull/10898#discussion_r1590562722


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
##
@@ -61,14 +78,69 @@ public Map 
loadBucketIdToFileIdMappingForPartitio
   if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
 bucketIdToFileIdMapping.put(bucketId, new 
HoodieRecordLocation(commitTime, fileId));
   } else {
+// Finding the instants which conflict with the bucket id
+Set instants = 
findTheConflictBucketIdInPartition(hoodieTable, partition, bucketId);
+
 // Check if bucket data is valid
 throw new HoodieIOException("Find multiple files at partition 
path="
-+ partition + " belongs to the same bucket id = " + bucketId);
++ partition + " belongs to the same bucket id = " + bucketId
++ ", these instants need to rollback: " + instants.toString()
++ ", you can use rollback_to_instant procedure to recovery");
   }
 });
 return bucketIdToFileIdMapping;
   }
 
+  /**
+   * Find out the conflict files in bucket partition with bucket id
+   */
+  public HashSet findTheConflictBucketIdInPartition(HoodieTable 
hoodieTable, String partition, int bucketId) {
+HashSet instants = new HashSet<>();
+HoodieTableMetaClient metaClient = hoodieTable.getMetaClient();
+StoragePath basePath = metaClient.getBasePathV2();
+StoragePath partitionPath = new StoragePath(basePath.toString(), 
partition);
+
+Stream latestFileSlicesIncludingInflight = 
hoodieTable.getSliceView().getLatestFileSlicesIncludingInflight(partition);
+List pendingInstants = 
latestFileSlicesIncludingInflight.map(fileSlice -> 
fileSlice.getLatestInstantTime()).collect(Collectors.toList());

Review Comment:
   > Does this work for you?
   > 
   > ```java
   > pendingInstantSet = filter the timeline by pending instant
   > partitionFileList = 
hoodieTable.getSliceView().getLatestFileSlicesIncludingInflight(partition).map(fileSlice
 -> fileSlice.getAllFiles).collectAsList;
   > 
   > conflict_ids = set()
   > 
   > for (File f in partitionFileList):
   >   if (getCommitTime(f) in pendingInstantSet):
   > conflict_ids.add(getFileId(f))
   > 
   > then report all the msgs about the conflict ids.
   > ```
   
   I had changed as it,only just a little not the same which not impact the 
logic,can help review once. @danny0405 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590551681


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBootstrap.java:
##
@@ -261,12 +260,8 @@ private void bootstrapAndVerifyFailure() throws Exception {
 writeConfig = getWriteConfig(true, true);
 initWriteConfigAndMetatableWriter(writeConfig, true);
 syncTableMetadata(writeConfig);
-try {
-  validateMetadata(testTable);
-  Assertions.fail("Should have failed");
-} catch (IllegalStateException e) {
-  // expected
-}
+Assertions.assertThrows(AssertionFailedError.class, () -> 
validateMetadata(testTable),

Review Comment:
   Yeah, then we need to throw a specific expection from the 
`validateMetadata`. In side `validateMetadata`, we have many assertions for the 
equality of partitions and file list, and here we want to catch the assertion 
error, so just change the class to Java `Error`, and also assert the specific 
error msg.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590551681


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBootstrap.java:
##
@@ -261,12 +260,8 @@ private void bootstrapAndVerifyFailure() throws Exception {
 writeConfig = getWriteConfig(true, true);
 initWriteConfigAndMetatableWriter(writeConfig, true);
 syncTableMetadata(writeConfig);
-try {
-  validateMetadata(testTable);
-  Assertions.fail("Should have failed");
-} catch (IllegalStateException e) {
-  // expected
-}
+Assertions.assertThrows(AssertionFailedError.class, () -> 
validateMetadata(testTable),

Review Comment:
   Yeah, then we need to throw a specific expection from the 
`validateMetadata`. In side `validateMetadata`, we have many assertions for the 
equality of partitions and file list, just change the class to Java `Error`, 
and also assert the specific error msg.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590551681


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBootstrap.java:
##
@@ -261,12 +260,8 @@ private void bootstrapAndVerifyFailure() throws Exception {
 writeConfig = getWriteConfig(true, true);
 initWriteConfigAndMetatableWriter(writeConfig, true);
 syncTableMetadata(writeConfig);
-try {
-  validateMetadata(testTable);
-  Assertions.fail("Should have failed");
-} catch (IllegalStateException e) {
-  // expected
-}
+Assertions.assertThrows(AssertionFailedError.class, () -> 
validateMetadata(testTable),

Review Comment:
   Yeah, then we need to throw a specific expection from the 
`validateMetadata`. Just change the class to Java Error, and also assert the 
specific error msg.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590555917


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBootstrap.java:
##
@@ -261,12 +260,8 @@ private void bootstrapAndVerifyFailure() throws Exception {
 writeConfig = getWriteConfig(true, true);
 initWriteConfigAndMetatableWriter(writeConfig, true);
 syncTableMetadata(writeConfig);
-try {
-  validateMetadata(testTable);
-  Assertions.fail("Should have failed");
-} catch (IllegalStateException e) {
-  // expected
-}
+Assertions.assertThrows(AssertionFailedError.class, () -> 
validateMetadata(testTable),

Review Comment:
   I just revert it back to what it was.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590555917


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBootstrap.java:
##
@@ -261,12 +260,8 @@ private void bootstrapAndVerifyFailure() throws Exception {
 writeConfig = getWriteConfig(true, true);
 initWriteConfigAndMetatableWriter(writeConfig, true);
 syncTableMetadata(writeConfig);
-try {
-  validateMetadata(testTable);
-  Assertions.fail("Should have failed");
-} catch (IllegalStateException e) {
-  // expected
-}
+Assertions.assertThrows(AssertionFailedError.class, () -> 
validateMetadata(testTable),

Review Comment:
   I just revert it back to what it was.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11151:
URL: https://github.com/apache/hudi/pull/11151#issuecomment-2095223554

   
   ## CI report:
   
   * 2985ea2ec2f8a0b62086a6ac9933654051a65738 UNKNOWN
   * bd09a1b36becfbcdc75195427148eb948e384ac5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23664)
 
   * eb1c4d47c1464b870719109da1944cec21cd4327 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23681)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2095223153

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * 4d1e33f471f37c8b0f1b7ff7174dff9f61d502b6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23679)
 
   * 51bee45de0d12f1613d7af314914fceb585f4282 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23680)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590551681


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBootstrap.java:
##
@@ -261,12 +260,8 @@ private void bootstrapAndVerifyFailure() throws Exception {
 writeConfig = getWriteConfig(true, true);
 initWriteConfigAndMetatableWriter(writeConfig, true);
 syncTableMetadata(writeConfig);
-try {
-  validateMetadata(testTable);
-  Assertions.fail("Should have failed");
-} catch (IllegalStateException e) {
-  // expected
-}
+Assertions.assertThrows(AssertionFailedError.class, () -> 
validateMetadata(testTable),

Review Comment:
   Yeah, then we need to throw a specific expection from the `validateMetadata`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11151:
URL: https://github.com/apache/hudi/pull/11151#issuecomment-2095217286

   
   ## CI report:
   
   * 2985ea2ec2f8a0b62086a6ac9933654051a65738 UNKNOWN
   * bd09a1b36becfbcdc75195427148eb948e384ac5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23664)
 
   * eb1c4d47c1464b870719109da1944cec21cd4327 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590551059


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBootstrap.java:
##
@@ -167,20 +167,19 @@ public void testMetadataBootstrapInflightCommit() throws 
Exception {
 HoodieTableType tableType = COPY_ON_WRITE;
 init(tableType, false);
 
+// In real production env, bootstrap action can only happen on empty table,
+// otherwise we need to roll back the previous bootstrap first,
+// see 'SparkBootstrapCommitActionExecutor.execute' for more details.
 doPreBootstrapWriteOperation(testTable, INSERT, "001");
 doPreBootstrapWriteOperation(testTable, "002");
 // add an inflight commit
 HoodieCommitMetadata inflightCommitMeta = 
testTable.doWriteOperation("0007", UPSERT, emptyList(),
-asList("p1", "p2"), 2, true, true);
+asList("p1", "p2"), 2, false, true);

Review Comment:
   I added the comment, the bootstrap action only happens once, that is the 
limination of itself, the tricky part is that the commit metadata for bootstrap 
file metadata only contains the file group id instead of specfic file names.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590550442


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java:
##
@@ -925,6 +925,37 @@ public void 
testMetadataTableCompactionWithPendingInstants() throws Exception {
 
assertEquals(HoodieInstantTimeGenerator.instantTimeMinusMillis(inflightInstant2,
 1L), tableMetadata.getLatestCompactionTime().get());
   }
 
+  @Test
+  public void testInitializeMetadataTableWithPendingInstant() throws Exception 
{
+init(COPY_ON_WRITE, false);

Review Comment:
   The table type is not a factor of the functionality, maybe we can just 
paramatize this test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590550442


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java:
##
@@ -925,6 +925,37 @@ public void 
testMetadataTableCompactionWithPendingInstants() throws Exception {
 
assertEquals(HoodieInstantTimeGenerator.instantTimeMinusMillis(inflightInstant2,
 1L), tableMetadata.getLatestCompactionTime().get());
   }
 
+  @Test
+  public void testInitializeMetadataTableWithPendingInstant() throws Exception 
{
+init(COPY_ON_WRITE, false);

Review Comment:
   There should be no much difference for the table type, maybe we can just 
paramatize this test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590550067


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java:
##
@@ -925,6 +925,37 @@ public void 
testMetadataTableCompactionWithPendingInstants() throws Exception {
 
assertEquals(HoodieInstantTimeGenerator.instantTimeMinusMillis(inflightInstant2,
 1L), tableMetadata.getLatestCompactionTime().get());
   }
 
+  @Test
+  public void testInitializeMetadataTableWithPendingInstant() throws Exception 
{
+init(COPY_ON_WRITE, false);
+initWriteConfigAndMetatableWriter(writeConfig, false);
+doWriteOperation(testTable, metaClient.createNewInstantTime(), INSERT);
+doWriteOperation(testTable, metaClient.createNewInstantTime(), INSERT);
+
+// test multi-writer scenario. let's add 1,2,3,4 where 1,2,4 succeeded, 
but 3 is still inflight. so latest delta commit in MDT is 4, while 3 is still 
pending
+// in DT and not seen by MDT yet. compaction should not trigger until 3 
goes to completion.

Review Comment:
   yeah, I can remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590549815


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -463,11 +460,11 @@ private String generateUniqueCommitInstantTime(String 
initializationTime) {
 if (HoodieTableMetadataUtil.isIndexingCommit(dataIndexTimeline, 
initializationTime)) {
   return initializationTime;
 }
-// Add suffix to initializationTime to find an unused instant time for the 
next index initialization.
+// otherwise yields the timestamp on the fly.
 // This function would be called multiple times in a single application if 
multiple indexes are being
 // initialized one after the other.
 for (int offset = 0; ; ++offset) {
-  final String commitInstantTime = 
HoodieTableMetadataUtil.createIndexInitTimestamp(initializationTime, offset);
+  final String commitInstantTime = 
HoodieInstantTimeGenerator.instantTimePlusMillis(SOLO_COMMIT_TIMESTAMP, offset);

Review Comment:
   For non-empty table, we also switch to start from the 
`SOLO_COMMIT_TIMESTAMP`. That's the main change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #10898:
URL: https://github.com/apache/hudi/pull/10898#discussion_r1590546729


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
##
@@ -61,14 +78,69 @@ public Map 
loadBucketIdToFileIdMappingForPartitio
   if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
 bucketIdToFileIdMapping.put(bucketId, new 
HoodieRecordLocation(commitTime, fileId));
   } else {
+// Finding the instants which conflict with the bucket id
+Set instants = 
findTheConflictBucketIdInPartition(hoodieTable, partition, bucketId);
+
 // Check if bucket data is valid
 throw new HoodieIOException("Find multiple files at partition 
path="
-+ partition + " belongs to the same bucket id = " + bucketId);
++ partition + " belongs to the same bucket id = " + bucketId
++ ", these instants need to rollback: " + instants.toString()
++ ", you can use rollback_to_instant procedure to recovery");
   }
 });
 return bucketIdToFileIdMapping;
   }
 
+  /**
+   * Find out the conflict files in bucket partition with bucket id
+   */
+  public HashSet findTheConflictBucketIdInPartition(HoodieTable 
hoodieTable, String partition, int bucketId) {
+HashSet instants = new HashSet<>();
+HoodieTableMetaClient metaClient = hoodieTable.getMetaClient();
+StoragePath basePath = metaClient.getBasePathV2();
+StoragePath partitionPath = new StoragePath(basePath.toString(), 
partition);
+
+Stream latestFileSlicesIncludingInflight = 
hoodieTable.getSliceView().getLatestFileSlicesIncludingInflight(partition);
+List pendingInstants = 
latestFileSlicesIncludingInflight.map(fileSlice -> 
fileSlice.getLatestInstantTime()).collect(Collectors.toList());

Review Comment:
   Does this work for you?
   
   ```java
   pendingInstantSet = filter the timeline by pending instant
   partitionFileList = 
hoodieTable.getSliceView().getLatestFileSlicesIncludingInflight(partition).map(fileSlice
 -> fileSlice.getAllFiles).collectAsList;
   
   conflict_ids = set()
   
   for (File f in partitionFileList):
 if (getCommitTime(f) in pendingInstantSet):
   conflict_ids.add(getFileId(f))
   
   then report all the msgs about the conflict ids.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #10898:
URL: https://github.com/apache/hudi/pull/10898#discussion_r1590546729


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
##
@@ -61,14 +78,69 @@ public Map 
loadBucketIdToFileIdMappingForPartitio
   if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
 bucketIdToFileIdMapping.put(bucketId, new 
HoodieRecordLocation(commitTime, fileId));
   } else {
+// Finding the instants which conflict with the bucket id
+Set instants = 
findTheConflictBucketIdInPartition(hoodieTable, partition, bucketId);
+
 // Check if bucket data is valid
 throw new HoodieIOException("Find multiple files at partition 
path="
-+ partition + " belongs to the same bucket id = " + bucketId);
++ partition + " belongs to the same bucket id = " + bucketId
++ ", these instants need to rollback: " + instants.toString()
++ ", you can use rollback_to_instant procedure to recovery");
   }
 });
 return bucketIdToFileIdMapping;
   }
 
+  /**
+   * Find out the conflict files in bucket partition with bucket id
+   */
+  public HashSet findTheConflictBucketIdInPartition(HoodieTable 
hoodieTable, String partition, int bucketId) {
+HashSet instants = new HashSet<>();
+HoodieTableMetaClient metaClient = hoodieTable.getMetaClient();
+StoragePath basePath = metaClient.getBasePathV2();
+StoragePath partitionPath = new StoragePath(basePath.toString(), 
partition);
+
+Stream latestFileSlicesIncludingInflight = 
hoodieTable.getSliceView().getLatestFileSlicesIncludingInflight(partition);
+List pendingInstants = 
latestFileSlicesIncludingInflight.map(fileSlice -> 
fileSlice.getLatestInstantTime()).collect(Collectors.toList());

Review Comment:
   pendingInstants = latestFileSlicesIncludingInflight.map(fileSlice -> 
fileSlice.getLatestInstantTime()).filter(instant -> {instant is pending}) 
.collect(Collectors.toList());



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11151:
URL: https://github.com/apache/hudi/pull/11151#discussion_r1590544744


##
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestSimpleConcurrentFileWritesConflictResolutionStrategy.java:
##
@@ -193,6 +198,38 @@ public void 
testConcurrentWritesWithInterleavingScheduledCompaction() throws Exc
 }
   }
 
+  @Test
+  public void testConcurrentWritesWithInterleavingInflightCompaction() throws 
Exception {
+createCommit(metaClient.createNewInstantTime(), metaClient);
+HoodieActiveTimeline timeline = metaClient.getActiveTimeline();
+// consider commits before this are all successful
+Option lastSuccessfulInstant = 
timeline.getCommitsTimeline().filterCompletedInstants().lastInstant();
+// writer 1 starts
+String currentWriterInstant = metaClient.createNewInstantTime();
+createInflightCommit(currentWriterInstant, metaClient);
+// compaction 1 gets scheduled and becomes inflight
+String newInstantTime = metaClient.createNewInstantTime();
+createPendingCompaction(newInstantTime, metaClient);
+
+Option currentInstant = Option.of(new 
HoodieInstant(State.INFLIGHT, HoodieTimeline.DELTA_COMMIT_ACTION, 
currentWriterInstant));
+SimpleConcurrentFileWritesConflictResolutionStrategy strategy = new 
SimpleConcurrentFileWritesConflictResolutionStrategy();
+HoodieCommitMetadata currentMetadata = 
createCommitMetadata(currentWriterInstant);
+metaClient.reloadActiveTimeline();
+List candidateInstants = 
strategy.getCandidateInstants(metaClient, currentInstant.get(), 
lastSuccessfulInstant).collect(
+Collectors.toList());
+// writer 1 conflicts with compaction 1
+Assertions.assertTrue(candidateInstants.size() == 1);
+ConcurrentOperation thatCommitOperation = new 
ConcurrentOperation(candidateInstants.get(0), metaClient);
+ConcurrentOperation thisCommitOperation = new 
ConcurrentOperation(currentInstant.get(), currentMetadata);
+Assertions.assertTrue(strategy.hasConflict(thisCommitOperation, 
thatCommitOperation));
+try {
+  strategy.resolveConflict(null, thisCommitOperation, thatCommitOperation);
+  Assertions.fail("Cannot reach here, should have thrown a conflict");

Review Comment:
   Use `assertThrows` instead, and let's also supplement the msgs for these 
assertions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11151:
URL: https://github.com/apache/hudi/pull/11151#discussion_r1590543447


##
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestSimpleConcurrentFileWritesConflictResolutionStrategy.java:
##
@@ -193,6 +198,38 @@ public void 
testConcurrentWritesWithInterleavingScheduledCompaction() throws Exc
 }
   }
 
+  @Test
+  public void testConcurrentWritesWithInterleavingInflightCompaction() throws 
Exception {
+createCommit(metaClient.createNewInstantTime(), metaClient);
+HoodieActiveTimeline timeline = metaClient.getActiveTimeline();
+// consider commits before this are all successful
+Option lastSuccessfulInstant = 
timeline.getCommitsTimeline().filterCompletedInstants().lastInstant();
+// writer 1 starts
+String currentWriterInstant = metaClient.createNewInstantTime();
+createInflightCommit(currentWriterInstant, metaClient);
+// compaction 1 gets scheduled and becomes inflight
+String newInstantTime = metaClient.createNewInstantTime();
+createPendingCompaction(newInstantTime, metaClient);
+
+Option currentInstant = Option.of(new 
HoodieInstant(State.INFLIGHT, HoodieTimeline.DELTA_COMMIT_ACTION, 
currentWriterInstant));
+SimpleConcurrentFileWritesConflictResolutionStrategy strategy = new 
SimpleConcurrentFileWritesConflictResolutionStrategy();
+HoodieCommitMetadata currentMetadata = 
createCommitMetadata(currentWriterInstant);
+metaClient.reloadActiveTimeline();
+List candidateInstants = 
strategy.getCandidateInstants(metaClient, currentInstant.get(), 
lastSuccessfulInstant).collect(
+Collectors.toList());
+// writer 1 conflicts with compaction 1
+Assertions.assertTrue(candidateInstants.size() == 1);

Review Comment:
   Using `assertEquals` should be better.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7715] Partition TTL for Flink [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11156:
URL: https://github.com/apache/hudi/pull/11156#discussion_r1590541214


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriterWithPartitionTTl.java:
##
@@ -0,0 +1,89 @@
+/*
+ *
+ *  * Licensed to the Apache Software Foundation (ASF) under one
+ *  * or more contributor license agreements.  See the NOTICE file
+ *  * distributed with this work for additional information
+ *  * regarding copyright ownership.  The ASF licenses this file
+ *  * to you under the Apache License, Version 2.0 (the
+ *  * "License"); you may not use this file except in compliance
+ *  * with the License.  You may obtain a copy of the License at
+ *  *
+ *  *  http://www.apache.org/licenses/LICENSE-2.0
+ *  *
+ *  * Unless required by applicable law or agreed to in writing, software
+ *  * distributed under the License is distributed on an "AS IS" BASIS,
+ *  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  * See the License for the specific language governing permissions and
+ *  * limitations under the License.
+ *
+ */
+
+package org.apache.hudi.sink;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.hudi.avro.model.HoodieReplaceCommitMetadata;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.config.HoodieTTLConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.ttl.strategy.KeepByTimeStrategy;
+import org.apache.hudi.util.StreamerUtil;
+import org.apache.hudi.utils.TestData;
+import org.junit.jupiter.api.Test;
+
+import static 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator.fixInstantTimeCompatibility;
+import static 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator.instantTimePlusMillis;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Test cases for partition TTL.
+ */
+public class TestWriterWithPartitionTTl extends TestWriteCopyOnWrite {
+  // The origin PartitionTTLStrategy calculate the expire time by DAYs, it's 
too long for test.
+  // Override the method isPartitionExpired to calculate expire time by 
minutes.
+  public static class FlinkPartitionTTLTestStrategy extends KeepByTimeStrategy 
{
+public FlinkPartitionTTLTestStrategy(HoodieTable hoodieTable, String 
instantTime) {
+  super(hoodieTable, instantTime);
+}
+
+@Override
+protected boolean isPartitionExpired(String referenceTime) {
+  String expiredTime = instantTimePlusMillis(referenceTime, ttlInMilis / 
24 / 60);
+  return fixInstantTimeCompatibility(instantTime).compareTo(expiredTime) > 
0;
+}
+  }
+
+  @Override
+  protected void setUp(Configuration conf) {
+conf.setBoolean(HoodieTTLConfig.INLINE_PARTITION_TTL.key(), true);
+conf.setString(HoodieTTLConfig.DAYS_RETAIN.key(), "1");
+conf.setString(HoodieTTLConfig.PARTITION_TTL_STRATEGY_CLASS_NAME.key(), 
FlinkPartitionTTLTestStrategy.class.getName());
+  }
+
+  @Test
+  public void testFlinkWriterWithPartitionTTL() throws Exception {
+// open the function and ingest data
+preparePipeline(conf)
+.consume(TestData.DATA_SET_PART1)
+.assertEmptyDataFiles()
+.checkpoint(1)
+.assertNextEvent()
+.checkpointComplete(1)
+.end();
+
+Thread.sleep(60 * 1000);

Review Comment:
   you can overeide the TTL to a really short time so that there is no need to 
wait.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7715] Partition TTL for Flink [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11156:
URL: https://github.com/apache/hudi/pull/11156#discussion_r1590540776


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriterWithPartitionTTl.java:
##
@@ -0,0 +1,89 @@
+/*
+ *
+ *  * Licensed to the Apache Software Foundation (ASF) under one
+ *  * or more contributor license agreements.  See the NOTICE file
+ *  * distributed with this work for additional information
+ *  * regarding copyright ownership.  The ASF licenses this file
+ *  * to you under the Apache License, Version 2.0 (the
+ *  * "License"); you may not use this file except in compliance
+ *  * with the License.  You may obtain a copy of the License at
+ *  *
+ *  *  http://www.apache.org/licenses/LICENSE-2.0
+ *  *
+ *  * Unless required by applicable law or agreed to in writing, software
+ *  * distributed under the License is distributed on an "AS IS" BASIS,
+ *  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  * See the License for the specific language governing permissions and
+ *  * limitations under the License.
+ *
+ */
+
+package org.apache.hudi.sink;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.hudi.avro.model.HoodieReplaceCommitMetadata;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.config.HoodieTTLConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.ttl.strategy.KeepByTimeStrategy;
+import org.apache.hudi.util.StreamerUtil;
+import org.apache.hudi.utils.TestData;
+import org.junit.jupiter.api.Test;
+
+import static 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator.fixInstantTimeCompatibility;
+import static 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator.instantTimePlusMillis;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Test cases for partition TTL.
+ */
+public class TestWriterWithPartitionTTl extends TestWriteCopyOnWrite {
+  // The origin PartitionTTLStrategy calculate the expire time by DAYs, it's 
too long for test.

Review Comment:
   Maybe we can extend from `TestWriteBase`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


codope commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590531114


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java:
##
@@ -925,6 +925,37 @@ public void 
testMetadataTableCompactionWithPendingInstants() throws Exception {
 
assertEquals(HoodieInstantTimeGenerator.instantTimeMinusMillis(inflightInstant2,
 1L), tableMetadata.getLatestCompactionTime().get());
   }
 
+  @Test
+  public void testInitializeMetadataTableWithPendingInstant() throws Exception 
{
+init(COPY_ON_WRITE, false);

Review Comment:
   If we are going to test for only one table type, shall we test for MOR 
instead of COW? If the initialization works for MOR, it should work for COW as 
well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7701] Metadata table initailization with pending instants [hudi]

2024-05-05 Thread via GitHub


codope commented on code in PR #11137:
URL: https://github.com/apache/hudi/pull/11137#discussion_r1590529782


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -463,11 +460,11 @@ private String generateUniqueCommitInstantTime(String 
initializationTime) {
 if (HoodieTableMetadataUtil.isIndexingCommit(dataIndexTimeline, 
initializationTime)) {
   return initializationTime;
 }
-// Add suffix to initializationTime to find an unused instant time for the 
next index initialization.
+// otherwise yields the timestamp on the fly.
 // This function would be called multiple times in a single application if 
multiple indexes are being
 // initialized one after the other.
 for (int offset = 0; ; ++offset) {
-  final String commitInstantTime = 
HoodieTableMetadataUtil.createIndexInitTimestamp(initializationTime, offset);
+  final String commitInstantTime = 
HoodieInstantTimeGenerator.instantTimePlusMillis(SOLO_COMMIT_TIMESTAMP, offset);

Review Comment:
   Why do we need the SOLO_COMMIT_TIMESTAMP again here? I think the 
initializationTime passed to this method should already be 
SOLO_COMMIT_TIMESTAMP if the timeline had no completed instants right? We do 
this in 
https://github.com/apache/hudi/blob/d4f55f193cd79f317d4fc021cc741554ce8cf6cd/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java#L250



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -463,11 +460,11 @@ private String generateUniqueCommitInstantTime(String 
initializationTime) {
 if (HoodieTableMetadataUtil.isIndexingCommit(dataIndexTimeline, 
initializationTime)) {
   return initializationTime;
 }
-// Add suffix to initializationTime to find an unused instant time for the 
next index initialization.
+// otherwise yields the timestamp on the fly.

Review Comment:
   let's move this comment to line 459, just before if blok.



##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBootstrap.java:
##
@@ -167,20 +167,19 @@ public void testMetadataBootstrapInflightCommit() throws 
Exception {
 HoodieTableType tableType = COPY_ON_WRITE;
 init(tableType, false);
 
+// In real production env, bootstrap action can only happen on empty table,
+// otherwise we need to roll back the previous bootstrap first,
+// see 'SparkBootstrapCommitActionExecutor.execute' for more details.
 doPreBootstrapWriteOperation(testTable, INSERT, "001");
 doPreBootstrapWriteOperation(testTable, "002");
 // add an inflight commit
 HoodieCommitMetadata inflightCommitMeta = 
testTable.doWriteOperation("0007", UPSERT, emptyList(),
-asList("p1", "p2"), 2, true, true);
+asList("p1", "p2"), 2, false, true);

Review Comment:
   setting to `false` won't do any bootstrap. Can you please explain why 
bootstrap is set to false?



##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java:
##
@@ -925,6 +925,37 @@ public void 
testMetadataTableCompactionWithPendingInstants() throws Exception {
 
assertEquals(HoodieInstantTimeGenerator.instantTimeMinusMillis(inflightInstant2,
 1L), tableMetadata.getLatestCompactionTime().get());
   }
 
+  @Test
+  public void testInitializeMetadataTableWithPendingInstant() throws Exception 
{
+init(COPY_ON_WRITE, false);
+initWriteConfigAndMetatableWriter(writeConfig, false);
+doWriteOperation(testTable, metaClient.createNewInstantTime(), INSERT);
+doWriteOperation(testTable, metaClient.createNewInstantTime(), INSERT);
+
+// test multi-writer scenario. let's add 1,2,3,4 where 1,2,4 succeeded, 
but 3 is still inflight. so latest delta commit in MDT is 4, while 3 is still 
pending
+// in DT and not seen by MDT yet. compaction should not trigger until 3 
goes to completion.

Review Comment:
   Test doesn't seem to do as commented.



##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java:
##
@@ -925,6 +925,37 @@ public void 
testMetadataTableCompactionWithPendingInstants() throws Exception {
 
assertEquals(HoodieInstantTimeGenerator.instantTimeMinusMillis(inflightInstant2,
 1L), tableMetadata.getLatestCompactionTime().get());
   }
 
+  @Test
+  public void testInitializeMetadataTableWithPendingInstant() throws Exception 
{
+init(COPY_ON_WRITE, false);

Review Comment:
   If we are going to for only one table type, shall we test for MOR instead of 
COW? If the initialization works for MOR, it should work for COW as well.



##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBootstrap.java:
##
@@ -261,12 +260,8 @@ private void bootstra

Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2095179748

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * 251687bf32ab0835287ff7eb39bb8995d979d9ac Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23675)
 
   * 4d1e33f471f37c8b0f1b7ff7174dff9f61d502b6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23679)
 
   * 51bee45de0d12f1613d7af314914fceb585f4282 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23680)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2095175067

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * f3f2c2ba6e0725cdb21261813896c1de1571746d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23541)
 
   * 251687bf32ab0835287ff7eb39bb8995d979d9ac Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23675)
 
   * 4d1e33f471f37c8b0f1b7ff7174dff9f61d502b6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23679)
 
   * 51bee45de0d12f1613d7af314914fceb585f4282 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-05 Thread via GitHub


linliu-code commented on code in PR #11151:
URL: https://github.com/apache/hudi/pull/11151#discussion_r1590529084


##
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestSimpleConcurrentFileWritesConflictResolutionStrategy.java:
##
@@ -193,6 +193,38 @@ public void 
testConcurrentWritesWithInterleavingScheduledCompaction() throws Exc
 }
   }
 
+  @Test
+  public void testConcurrentWritesWithInterleavingInflightCompaction() throws 
Exception {
+createCommit(metaClient.createNewInstantTime(), metaClient);
+HoodieActiveTimeline timeline = metaClient.getActiveTimeline();
+// consider commits before this are all successful
+Option lastSuccessfulInstant = 
timeline.getCommitsTimeline().filterCompletedInstants().lastInstant();
+// writer 1 starts
+String currentWriterInstant = metaClient.createNewInstantTime();
+createInflightCommit(currentWriterInstant, metaClient);
+// compaction 1 gets scheduled and becomes inflight
+String newInstantTime = metaClient.createNewInstantTime();
+createPendingCompaction(newInstantTime, metaClient);
+
+Option currentInstant = Option.of(new 
HoodieInstant(State.INFLIGHT, HoodieTimeline.COMMIT_ACTION, 
currentWriterInstant));

Review Comment:
   Second thoughts. Let me use MOR instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7712] Fixing RLI initialization to account for file slices instead of just base files while initializing [hudi]

2024-05-05 Thread via GitHub


codope commented on code in PR #11153:
URL: https://github.com/apache/hudi/pull/11153#discussion_r1590512117


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -575,6 +598,40 @@ private Pair> 
initializeRecordIndexPartition()
 return Pair.of(fileGroupCount, records);
   }
 
+  private static HoodieData 
readRecordKeysFromFileSlices(HoodieEngineContext engineContext,

Review Comment:
   There is already `HoodieTableMetadataUtil#readRecordKeysFromFileSlices`. 
Let's reuse that if possible, or maybe consolidate the two (i guess 
`HoodieMergedReadHandle` is also using `HoodieMergedLogRecordScanner`)?



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestRecordLevelIndex.scala:
##
@@ -55,6 +59,76 @@ class TestRecordLevelIndex extends RecordLevelIndexTestBase {
   saveMode = SaveMode.Overwrite)
   }
 
+  @ParameterizedTest
+  @EnumSource(classOf[HoodieTableType])
+  def testRLIInitializationForMorGlobalIndex(tableType: HoodieTableType): Unit 
= {

Review Comment:
   Is this test useful for COW? Shall we run it only for MOR? I am trying to 
cut down extra test time. I think for COW, the test should pass even with 
current code.



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -575,6 +598,40 @@ private Pair> 
initializeRecordIndexPartition()
 return Pair.of(fileGroupCount, records);
   }
 
+  private static HoodieData 
readRecordKeysFromFileSlices(HoodieEngineContext engineContext,
+  
List> partitionFileSlicePairs,

Review Comment:
   should work.. For COW, file slice won't have any log files for filegroup id, 
which is handled by the HoodieMergedLogRecordScanner.



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestRecordLevelIndex.scala:
##
@@ -55,6 +59,76 @@ class TestRecordLevelIndex extends RecordLevelIndexTestBase {
   saveMode = SaveMode.Overwrite)
   }
 
+  @ParameterizedTest
+  @EnumSource(classOf[HoodieTableType])
+  def testRLIInitializationForMorGlobalIndex(tableType: HoodieTableType): Unit 
= {
+val hudiOpts = commonOpts + (DataSourceWriteOptions.TABLE_TYPE.key -> 
tableType.name()) +
+  (HoodieMetadataConfig.RECORD_INDEX_MIN_FILE_GROUP_COUNT_PROP.key -> "1") 
+
+  (HoodieMetadataConfig.RECORD_INDEX_MAX_FILE_GROUP_COUNT_PROP.key -> "1") 
+
+  (HoodieIndexConfig.INDEX_TYPE.key -> "RECORD_INDEX") +
+  (HoodieIndexConfig.RECORD_INDEX_UPDATE_PARTITION_PATH_ENABLE.key -> 
"true") -
+  HoodieMetadataConfig.RECORD_INDEX_ENABLE_PROP.key
+
+val dataGen1 = HoodieTestDataGenerator.createTestGeneratorFirstPartition()
+val dataGen2 = HoodieTestDataGenerator.createTestGeneratorSecondPartition()
+
+// batch1 inserts
+val instantTime1 = getInstantTime()
+val latestBatch = recordsToStrings(dataGen1.generateInserts(instantTime1, 
5)).asScala
+var operation = INSERT_OPERATION_OPT_VAL
+val latestBatchDf = 
spark.read.json(spark.sparkContext.parallelize(latestBatch, 1))
+latestBatchDf.cache()
+latestBatchDf.write.format("org.apache.hudi")
+  .options(hudiOpts)
+  .mode(SaveMode.Overwrite)
+  .save(basePath)
+val deletedDf1 = calculateMergedDf(latestBatchDf, operation, true)
+deletedDf1.cache()
+
+// batch2. upsert. update few records to 2nd partition from partition1 and 
insert a few to partition2.
+val instantTime2 = getInstantTime()
+
+val latestBatch2_1 = 
recordsToStrings(dataGen1.generateUniqueUpdates(instantTime2, 3)).asScala
+val latestBatchDf2_1 = 
spark.read.json(spark.sparkContext.parallelize(latestBatch2_1, 1))
+val latestBatchDf2_2 = latestBatchDf2_1.withColumn("partition", 
lit(HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH))
+  .withColumn("partition_path", 
lit(HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH))
+val latestBatch2_3 = 
recordsToStrings(dataGen2.generateInserts(instantTime2, 2)).asScala
+val latestBatchDf2_3 = 
spark.read.json(spark.sparkContext.parallelize(latestBatch2_3, 1))
+val latestBatchDf2Final = latestBatchDf2_3.union(latestBatchDf2_2)
+latestBatchDf2Final.cache()
+latestBatchDf2Final.write.format("org.apache.hudi")
+  .options(hudiOpts)
+  .mode(SaveMode.Append)
+  .save(basePath)
+operation = UPSERT_OPERATION_OPT_VAL
+val deletedDf2 = calculateMergedDf(latestBatchDf2Final, operation, true)
+deletedDf2.cache()
+
+val hudiOpts2 = commonOpts + (DataSourceWriteOptions.TABLE_TYPE.key -> 
tableType.name()) +
+  (HoodieMetadataConfig.RECORD_INDEX_MIN_FILE_GROUP_COUNT_PROP.key -> "1") 
+
+  (HoodieMetadataConfig.RECORD_INDEX_MAX_FILE_GROUP_COUNT_PROP.key -> "1") 
+
+  (HoodieIndexConfig.INDEX_TYPE.key -> "RECORD_INDEX") +
+  (HoodieIndexConfig.RECORD_INDEX_UPDATE_PARTI

Re: [PR] [HUDI-7715] Partition TTL for Flink [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11156:
URL: https://github.com/apache/hudi/pull/11156#issuecomment-2095169513

   
   ## CI report:
   
   * d4a130cf98ab8bb42b93ce01a045b74b15117e54 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23678)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] avoid listing files for empty tables [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11155:
URL: https://github.com/apache/hudi/pull/11155#issuecomment-2095169466

   
   ## CI report:
   
   * 83057bf74dc211a7322d704d8711ce5b0c60ae26 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23674)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2095169400

   
   ## CI report:
   
   * 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
   * b47575632f16f8de71fb6efbc3c0318859ec0dbe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23676)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2095169363

   
   ## CI report:
   
   * 0aa5721472d6e2e8afd609185fc3b999a66c8a7f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23677)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2095169061

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * f3f2c2ba6e0725cdb21261813896c1de1571746d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23541)
 
   * 251687bf32ab0835287ff7eb39bb8995d979d9ac Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23675)
 
   * 4d1e33f471f37c8b0f1b7ff7174dff9f61d502b6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-05 Thread via GitHub


linliu-code commented on code in PR #11151:
URL: https://github.com/apache/hudi/pull/11151#discussion_r1590513268


##
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestSimpleConcurrentFileWritesConflictResolutionStrategy.java:
##
@@ -193,6 +193,38 @@ public void 
testConcurrentWritesWithInterleavingScheduledCompaction() throws Exc
 }
   }
 
+  @Test
+  public void testConcurrentWritesWithInterleavingInflightCompaction() throws 
Exception {
+createCommit(metaClient.createNewInstantTime(), metaClient);
+HoodieActiveTimeline timeline = metaClient.getActiveTimeline();
+// consider commits before this are all successful
+Option lastSuccessfulInstant = 
timeline.getCommitsTimeline().filterCompletedInstants().lastInstant();
+// writer 1 starts
+String currentWriterInstant = metaClient.createNewInstantTime();
+createInflightCommit(currentWriterInstant, metaClient);
+// compaction 1 gets scheduled and becomes inflight
+String newInstantTime = metaClient.createNewInstantTime();
+createPendingCompaction(newInstantTime, metaClient);
+
+Option currentInstant = Option.of(new 
HoodieInstant(State.INFLIGHT, HoodieTimeline.COMMIT_ACTION, 
currentWriterInstant));

Review Comment:
   Yeah, I know we should use MOR table for compaction. But this entire test 
class uses COW to test concurrency with compaction. It should be OK since we 
only test the conflict resolution logic.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-05 Thread via GitHub


linliu-code commented on code in PR #11151:
URL: https://github.com/apache/hudi/pull/11151#discussion_r1590513268


##
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestSimpleConcurrentFileWritesConflictResolutionStrategy.java:
##
@@ -193,6 +193,38 @@ public void 
testConcurrentWritesWithInterleavingScheduledCompaction() throws Exc
 }
   }
 
+  @Test
+  public void testConcurrentWritesWithInterleavingInflightCompaction() throws 
Exception {
+createCommit(metaClient.createNewInstantTime(), metaClient);
+HoodieActiveTimeline timeline = metaClient.getActiveTimeline();
+// consider commits before this are all successful
+Option lastSuccessfulInstant = 
timeline.getCommitsTimeline().filterCompletedInstants().lastInstant();
+// writer 1 starts
+String currentWriterInstant = metaClient.createNewInstantTime();
+createInflightCommit(currentWriterInstant, metaClient);
+// compaction 1 gets scheduled and becomes inflight
+String newInstantTime = metaClient.createNewInstantTime();
+createPendingCompaction(newInstantTime, metaClient);
+
+Option currentInstant = Option.of(new 
HoodieInstant(State.INFLIGHT, HoodieTimeline.COMMIT_ACTION, 
currentWriterInstant));

Review Comment:
   Yeah, I know we should use MOR table for compaction. But this entire test 
class uses COW to test concurrency with compaction. I guess it should be OK 
since we only test the conflict resolution logic.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] chore: try v2 codecov action [hudi-rs]

2024-05-05 Thread via GitHub


xushiyan opened a new pull request, #16:
URL: https://github.com/apache/hudi-rs/pull/16

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-05 Thread via GitHub


linliu-code commented on code in PR #11151:
URL: https://github.com/apache/hudi/pull/11151#discussion_r1590513268


##
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestSimpleConcurrentFileWritesConflictResolutionStrategy.java:
##
@@ -193,6 +193,38 @@ public void 
testConcurrentWritesWithInterleavingScheduledCompaction() throws Exc
 }
   }
 
+  @Test
+  public void testConcurrentWritesWithInterleavingInflightCompaction() throws 
Exception {
+createCommit(metaClient.createNewInstantTime(), metaClient);
+HoodieActiveTimeline timeline = metaClient.getActiveTimeline();
+// consider commits before this are all successful
+Option lastSuccessfulInstant = 
timeline.getCommitsTimeline().filterCompletedInstants().lastInstant();
+// writer 1 starts
+String currentWriterInstant = metaClient.createNewInstantTime();
+createInflightCommit(currentWriterInstant, metaClient);
+// compaction 1 gets scheduled and becomes inflight
+String newInstantTime = metaClient.createNewInstantTime();
+createPendingCompaction(newInstantTime, metaClient);
+
+Option currentInstant = Option.of(new 
HoodieInstant(State.INFLIGHT, HoodieTimeline.COMMIT_ACTION, 
currentWriterInstant));

Review Comment:
   Yeah, I know we should use MOR table for compaction. But this entire test 
uses COW to test concurrency with compaction. I guess it should be OK since we 
only test the conflict resolution logic.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi-rs) branch main updated: ci: use cargo tarpaulin to generate code coverage (#15)

2024-05-05 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/hudi-rs.git


The following commit(s) were added to refs/heads/main by this push:
 new 0c6456e  ci: use cargo tarpaulin to generate code coverage (#15)
0c6456e is described below

commit 0c6456e1e68e7dcea1e05cf151b11f362daa60ef
Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Sun May 5 22:53:06 2024 -0500

ci: use cargo tarpaulin to generate code coverage (#15)

Use https://github.com/xd009642/tarpaulin to generate coverage file, which 
will be reported to https://app.codecov.io/gh/apache/hudi-rs

Fixes #6
---
 .github/workflows/ci.yml | 11 +--
 .gitignore   |  4 
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 4bc8235..abe7ec3 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -54,7 +54,14 @@ jobs:
 
   test:
 runs-on: ubuntu-latest
+container:
+  image: xd009642/tarpaulin:0.29.1
+  options: --security-opt seccomp=unconfined
 steps:
   - uses: actions/checkout@v4
-  - name: Unit test
-run: cargo test --no-fail-fast --all-targets --all-features --workspace
+  - name: Unit test with code coverage
+run: cargo tarpaulin --verbose --no-fail-fast --all-features 
--workspace --out xml
+  - name: Upload to codecov.io
+uses: codecov/codecov-action@v4
+with:
+  fail_ci_if_error: true
diff --git a/.gitignore b/.gitignore
index 3c2bbba..5f104d1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -30,3 +30,7 @@ __pycache__
 
 # macOS
 **/.DS_Store
+
+# coverage files
+*.profraw
+cobertura.xml



Re: [I] Configure code coverage reporting [hudi-rs]

2024-05-05 Thread via GitHub


xushiyan closed issue #6: Configure code coverage reporting
URL: https://github.com/apache/hudi-rs/issues/6


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] ci: use cargo tarpaulin to generate code coverage [hudi-rs]

2024-05-05 Thread via GitHub


xushiyan merged PR #15:
URL: https://github.com/apache/hudi-rs/pull/15


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] ci: use cargo tarpaulin to generate code coverage [hudi-rs]

2024-05-05 Thread via GitHub


codecov-commenter commented on PR #15:
URL: https://github.com/apache/hudi-rs/pull/15#issuecomment-2095149560

   ## Welcome to 
[Codecov](https://codecov.io?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 :tada:
   
   Once you merge this PR into your default branch, you're all set! Codecov 
will compare coverage reports and display results in all future pull requests.
   
   Thanks for integrating Codecov - We've got you covered :open_umbrella:


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7715] Partition TTL for Flink [hudi]

2024-05-05 Thread via GitHub


xicm commented on code in PR #11156:
URL: https://github.com/apache/hudi/pull/11156#discussion_r1590503411


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriterWithPartitionTTl.java:
##
@@ -0,0 +1,89 @@
+/*
+ *
+ *  * Licensed to the Apache Software Foundation (ASF) under one
+ *  * or more contributor license agreements.  See the NOTICE file
+ *  * distributed with this work for additional information
+ *  * regarding copyright ownership.  The ASF licenses this file
+ *  * to you under the Apache License, Version 2.0 (the
+ *  * "License"); you may not use this file except in compliance
+ *  * with the License.  You may obtain a copy of the License at
+ *  *
+ *  *  http://www.apache.org/licenses/LICENSE-2.0
+ *  *
+ *  * Unless required by applicable law or agreed to in writing, software
+ *  * distributed under the License is distributed on an "AS IS" BASIS,
+ *  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  * See the License for the specific language governing permissions and
+ *  * limitations under the License.
+ *
+ */
+
+package org.apache.hudi.sink;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.hudi.avro.model.HoodieReplaceCommitMetadata;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.config.HoodieTTLConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.ttl.strategy.KeepByTimeStrategy;
+import org.apache.hudi.util.StreamerUtil;
+import org.apache.hudi.utils.TestData;
+import org.junit.jupiter.api.Test;
+
+import static 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator.fixInstantTimeCompatibility;
+import static 
org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator.instantTimePlusMillis;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Test cases for partition TTL.
+ */
+public class TestWriterWithPartitionTTl extends TestWriteCopyOnWrite {
+  // The origin PartitionTTLStrategy calculate the expire time by DAYs, it's 
too long for test.
+  // Override the method isPartitionExpired to calculate expire time by 
minutes.
+  public static class FlinkPartitionTTLTestStrategy extends KeepByTimeStrategy 
{
+public FlinkPartitionTTLTestStrategy(HoodieTable hoodieTable, String 
instantTime) {
+  super(hoodieTable, instantTime);
+}
+
+@Override
+protected boolean isPartitionExpired(String referenceTime) {
+  String expiredTime = instantTimePlusMillis(referenceTime, ttlInMilis / 
24 / 60);
+  return fixInstantTimeCompatibility(instantTime).compareTo(expiredTime) > 
0;
+}
+  }
+
+  @Override
+  protected void setUp(Configuration conf) {
+conf.setBoolean(HoodieTTLConfig.INLINE_PARTITION_TTL.key(), true);
+conf.setString(HoodieTTLConfig.DAYS_RETAIN.key(), "1");
+conf.setString(HoodieTTLConfig.PARTITION_TTL_STRATEGY_CLASS_NAME.key(), 
FlinkPartitionTTLTestStrategy.class.getName());
+  }
+
+  @Test
+  public void testFlinkWriterWithPartitionTTL() throws Exception {
+// open the function and ingest data
+preparePipeline(conf)
+.consume(TestData.DATA_SET_PART1)
+.assertEmptyDataFiles()
+.checkpoint(1)
+.assertNextEvent()
+.checkpointComplete(1)
+.end();
+
+Thread.sleep(60 * 1000);

Review Comment:
   Is there a better way to test ttl?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7715] Partition TTL for Flink [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11156:
URL: https://github.com/apache/hudi/pull/11156#issuecomment-2095131169

   
   ## CI report:
   
   * d4a130cf98ab8bb42b93ce01a045b74b15117e54 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23678)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


xuzifu666 commented on code in PR #10898:
URL: https://github.com/apache/hudi/pull/10898#discussion_r1590502854


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
##
@@ -61,14 +76,66 @@ public Map 
loadBucketIdToFileIdMappingForPartitio
   if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
 bucketIdToFileIdMapping.put(bucketId, new 
HoodieRecordLocation(commitTime, fileId));
   } else {
+// Finding the instants which conflict with the bucket id
+Set instants = 
findTheConflictBucketIdInPartition(hoodieTable, partition, bucketId);
+
 // Check if bucket data is valid
 throw new HoodieIOException("Find multiple files at partition 
path="
-+ partition + " belongs to the same bucket id = " + bucketId);
++ partition + " belongs to the same bucket id = " + bucketId
++ ", these instants need to rollback: " + instants.toString()
++ ", you can use rollback_to_instant procedure to recovery");
   }
 });
 return bucketIdToFileIdMapping;
   }
 
+  /**
+   * Find out the conflict files in bucket partition with bucket id
+   */
+  public HashSet findTheConflictBucketIdInPartition(HoodieTable 
hoodieTable, String partition, int bucketId) {
+HashSet instants = new HashSet<>();
+HoodieTableMetaClient metaClient = hoodieTable.getMetaClient();
+StoragePath basePath = metaClient.getBasePathV2();
+StoragePath partitionPath = new StoragePath(basePath.toString(), 
partition);
+
+Stream latestFileSlicesIncludingInflight = 
hoodieTable.getSliceView().getLatestFileSlicesIncludingInflight(partition);
+List pendingInstants = 
latestFileSlicesIncludingInflight.map(fileSlice1 -> 
fileSlice1.getBaseInstantTime()).collect(Collectors.toList());
+
+for (String i : pendingInstants) {
+  if (judgeInstantInPath(metaClient, partitionPath, i, bucketId)) {
+instants.add(i);
+// error out director and stop circulate when find out conflict instant
+break;
+  }
+}
+return instants;
+  }
+
+  public Boolean judgeInstantInPath(HoodieTableMetaClient metaClient, 
StoragePath path, String instant, int bucketId) {
+Boolean ret = false;
+try {
+  List fileStatuses = 
metaClient.getStorage().listFiles(path);

Review Comment:
   OK,add a cache list to comfire only once called for each partition. 
@danny0405 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2095131086

   
   ## CI report:
   
   * 893f1bc42e270159a9b941efb25fc945dd71d39d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23666)
 
   * 0aa5721472d6e2e8afd609185fc3b999a66c8a7f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23677)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2095131123

   
   ## CI report:
   
   * 0ca382304b2197d1f32edf3800c4771de9039598 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23673)
 
   * 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
   * b47575632f16f8de71fb6efbc3c0318859ec0dbe Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23676)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2095130724

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * f3f2c2ba6e0725cdb21261813896c1de1571746d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23541)
 
   * 251687bf32ab0835287ff7eb39bb8995d979d9ac Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23675)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


xuzifu666 commented on code in PR #10898:
URL: https://github.com/apache/hudi/pull/10898#discussion_r1590502674


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
##
@@ -61,14 +76,66 @@ public Map 
loadBucketIdToFileIdMappingForPartitio
   if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
 bucketIdToFileIdMapping.put(bucketId, new 
HoodieRecordLocation(commitTime, fileId));
   } else {
+// Finding the instants which conflict with the bucket id
+Set instants = 
findTheConflictBucketIdInPartition(hoodieTable, partition, bucketId);
+
 // Check if bucket data is valid
 throw new HoodieIOException("Find multiple files at partition 
path="
-+ partition + " belongs to the same bucket id = " + bucketId);
++ partition + " belongs to the same bucket id = " + bucketId
++ ", these instants need to rollback: " + instants.toString()
++ ", you can use rollback_to_instant procedure to recovery");
   }
 });
 return bucketIdToFileIdMapping;
   }
 
+  /**
+   * Find out the conflict files in bucket partition with bucket id
+   */
+  public HashSet findTheConflictBucketIdInPartition(HoodieTable 
hoodieTable, String partition, int bucketId) {
+HashSet instants = new HashSet<>();
+HoodieTableMetaClient metaClient = hoodieTable.getMetaClient();
+StoragePath basePath = metaClient.getBasePathV2();
+StoragePath partitionPath = new StoragePath(basePath.toString(), 
partition);
+
+Stream latestFileSlicesIncludingInflight = 
hoodieTable.getSliceView().getLatestFileSlicesIncludingInflight(partition);
+List pendingInstants = 
latestFileSlicesIncludingInflight.map(fileSlice1 -> 
fileSlice1.getBaseInstantTime()).collect(Collectors.toList());

Review Comment:
   You are right, getLatestInstantTime is more fitable



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7715] Partition TTL for Flink [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11156:
URL: https://github.com/apache/hudi/pull/11156#issuecomment-2095126657

   
   ## CI report:
   
   * d4a130cf98ab8bb42b93ce01a045b74b15117e54 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2095126604

   
   ## CI report:
   
   * 0ca382304b2197d1f32edf3800c4771de9039598 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23673)
 
   * 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
   * b47575632f16f8de71fb6efbc3c0318859ec0dbe UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2095126571

   
   ## CI report:
   
   * 893f1bc42e270159a9b941efb25fc945dd71d39d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23666)
 
   * 0aa5721472d6e2e8afd609185fc3b999a66c8a7f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #10898:
URL: https://github.com/apache/hudi/pull/10898#issuecomment-2095126271

   
   ## CI report:
   
   * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN
   * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN
   * f3f2c2ba6e0725cdb21261813896c1de1571746d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23541)
 
   * 251687bf32ab0835287ff7eb39bb8995d979d9ac UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2095122322

   
   ## CI report:
   
   * 0ca382304b2197d1f32edf3800c4771de9039598 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23673)
 
   * 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7715) Partition TTL for Flink

2024-05-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7715:
-
Labels: pull-request-available  (was: )

> Partition TTL for Flink
> ---
>
> Key: HUDI-7715
> URL: https://issues.apache.org/jira/browse/HUDI-7715
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xi chaomin
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7715] Partition TTL for Flink [hudi]

2024-05-05 Thread via GitHub


xicm opened a new pull request, #11156:
URL: https://github.com/apache/hudi/pull/11156

   ### Change Logs
   
   Partition TTL for Flink
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   ### Documentation Update
   
   none
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #10898:
URL: https://github.com/apache/hudi/pull/10898#discussion_r1590494018


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
##
@@ -61,14 +76,66 @@ public Map 
loadBucketIdToFileIdMappingForPartitio
   if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
 bucketIdToFileIdMapping.put(bucketId, new 
HoodieRecordLocation(commitTime, fileId));
   } else {
+// Finding the instants which conflict with the bucket id
+Set instants = 
findTheConflictBucketIdInPartition(hoodieTable, partition, bucketId);
+
 // Check if bucket data is valid
 throw new HoodieIOException("Find multiple files at partition 
path="
-+ partition + " belongs to the same bucket id = " + bucketId);
++ partition + " belongs to the same bucket id = " + bucketId
++ ", these instants need to rollback: " + instants.toString()
++ ", you can use rollback_to_instant procedure to recovery");
   }
 });
 return bucketIdToFileIdMapping;
   }
 
+  /**
+   * Find out the conflict files in bucket partition with bucket id
+   */
+  public HashSet findTheConflictBucketIdInPartition(HoodieTable 
hoodieTable, String partition, int bucketId) {
+HashSet instants = new HashSet<>();
+HoodieTableMetaClient metaClient = hoodieTable.getMetaClient();
+StoragePath basePath = metaClient.getBasePathV2();
+StoragePath partitionPath = new StoragePath(basePath.toString(), 
partition);
+
+Stream latestFileSlicesIncludingInflight = 
hoodieTable.getSliceView().getLatestFileSlicesIncludingInflight(partition);
+List pendingInstants = 
latestFileSlicesIncludingInflight.map(fileSlice1 -> 
fileSlice1.getBaseInstantTime()).collect(Collectors.toList());
+
+for (String i : pendingInstants) {
+  if (judgeInstantInPath(metaClient, partitionPath, i, bucketId)) {
+instants.add(i);
+// error out director and stop circulate when find out conflict instant
+break;
+  }
+}
+return instants;
+  }
+
+  public Boolean judgeInstantInPath(HoodieTableMetaClient metaClient, 
StoragePath path, String instant, int bucketId) {
+Boolean ret = false;
+try {
+  List fileStatuses = 
metaClient.getStorage().listFiles(path);

Review Comment:
   still we should limit the file listing to be only once for each partition.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #10898:
URL: https://github.com/apache/hudi/pull/10898#discussion_r1590493921


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
##
@@ -61,14 +76,66 @@ public Map 
loadBucketIdToFileIdMappingForPartitio
   if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
 bucketIdToFileIdMapping.put(bucketId, new 
HoodieRecordLocation(commitTime, fileId));
   } else {
+// Finding the instants which conflict with the bucket id
+Set instants = 
findTheConflictBucketIdInPartition(hoodieTable, partition, bucketId);
+
 // Check if bucket data is valid
 throw new HoodieIOException("Find multiple files at partition 
path="
-+ partition + " belongs to the same bucket id = " + bucketId);
++ partition + " belongs to the same bucket id = " + bucketId
++ ", these instants need to rollback: " + instants.toString()
++ ", you can use rollback_to_instant procedure to recovery");
   }
 });
 return bucketIdToFileIdMapping;
   }
 
+  /**
+   * Find out the conflict files in bucket partition with bucket id
+   */
+  public HashSet findTheConflictBucketIdInPartition(HoodieTable 
hoodieTable, String partition, int bucketId) {
+HashSet instants = new HashSet<>();
+HoodieTableMetaClient metaClient = hoodieTable.getMetaClient();
+StoragePath basePath = metaClient.getBasePathV2();
+StoragePath partitionPath = new StoragePath(basePath.toString(), 
partition);
+
+Stream latestFileSlicesIncludingInflight = 
hoodieTable.getSliceView().getLatestFileSlicesIncludingInflight(partition);
+List pendingInstants = 
latestFileSlicesIncludingInflight.map(fileSlice1 -> 
fileSlice1.getBaseInstantTime()).collect(Collectors.toList());

Review Comment:
   Should we use `getLatestInstantTime` for each file slice?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7715) Partition TTL for Flink

2024-05-05 Thread xi chaomin (Jira)
xi chaomin created HUDI-7715:


 Summary: Partition TTL for Flink
 Key: HUDI-7715
 URL: https://issues.apache.org/jira/browse/HUDI-7715
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: xi chaomin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7714) Partition TTL for Flink

2024-05-05 Thread xi chaomin (Jira)
xi chaomin created HUDI-7714:


 Summary: Partition TTL for Flink
 Key: HUDI-7714
 URL: https://issues.apache.org/jira/browse/HUDI-7714
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: xi chaomin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7685) Fix delete partition instant commit in partition TTL

2024-05-05 Thread xi chaomin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xi chaomin updated HUDI-7685:
-
Description: 
After cherry pick [https://github.com/apache/hudi/pull/9723] I find the delete 
partition action can't commit. Auto commit in HoodieSparkSqlWriter is disabled, 
while SparkDeletePartitionCommitActionExecutor calls commitOnAutoCommit to 
commit the instant.

 
DataSourceUtils -> createHoodieConfig
{code:java}
// code placeholder
HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder()
.withPath(basePath).withAutoCommit(false).combineInput(combineInserts, 
true); {code}
 

SparkDeletePartitionCommitActionExecutor -> commitOnAutoCommit
{code:java}
// code placeholder
protected void commitOnAutoCommit(HoodieWriteMetadata result) {
  // validate commit action before committing result
  runPrecommitValidators(result);
  if (config.shouldAutoCommit()) {
LOG.info("Auto commit enabled: Committing " + instantTime);
autoCommit(result);
  } else {
LOG.info("Auto commit disabled for " + instantTime);
  }
} {code}

  was:
After cherry pick [https://github.com/apache/hudi/pull/9723] I find the delete 
partition action can‘t commit. Auto commit in HoodieSparkSqlWriter is disabled, 
while SparkDeletePartitionCommitActionExecutor calls commitOnAutoCommit to 
commit the instant.

 
DataSourceUtils -> createHoodieConfig
{code:java}
// code placeholder
HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder()
.withPath(basePath).withAutoCommit(false).combineInput(combineInserts, 
true); {code}
 

SparkDeletePartitionCommitActionExecutor -> commitOnAutoCommit
{code:java}
// code placeholder
protected void commitOnAutoCommit(HoodieWriteMetadata result) {
  // validate commit action before committing result
  runPrecommitValidators(result);
  if (config.shouldAutoCommit()) {
LOG.info("Auto commit enabled: Committing " + instantTime);
autoCommit(result);
  } else {
LOG.info("Auto commit disabled for " + instantTime);
  }
} {code}

Summary: Fix delete partition instant commit in partition TTL   (was: 
Fix bug in partition TTL)

> Fix delete partition instant commit in partition TTL 
> -
>
> Key: HUDI-7685
> URL: https://issues.apache.org/jira/browse/HUDI-7685
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: xi chaomin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> After cherry pick [https://github.com/apache/hudi/pull/9723] I find the 
> delete partition action can't commit. Auto commit in HoodieSparkSqlWriter is 
> disabled, while SparkDeletePartitionCommitActionExecutor calls 
> commitOnAutoCommit to commit the instant.
>  
> DataSourceUtils -> createHoodieConfig
> {code:java}
> // code placeholder
> HoodieWriteConfig.Builder builder = HoodieWriteConfig.newBuilder()
> .withPath(basePath).withAutoCommit(false).combineInput(combineInserts, 
> true); {code}
>  
> SparkDeletePartitionCommitActionExecutor -> commitOnAutoCommit
> {code:java}
> // code placeholder
> protected void commitOnAutoCommit(HoodieWriteMetadata result) {
>   // validate commit action before committing result
>   runPrecommitValidators(result);
>   if (config.shouldAutoCommit()) {
> LOG.info("Auto commit enabled: Committing " + instantTime);
> autoCommit(result);
>   } else {
> LOG.info("Auto commit disabled for " + instantTime);
>   }
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-7703) Clean plan does not need to include partitions with no files to delete

2024-05-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-7703.

Resolution: Fixed

> Clean plan does not need to include partitions with no files to delete
> --
>
> Key: HUDI-7703
> URL: https://issues.apache.org/jira/browse/HUDI-7703
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: table-service
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: Screenshot 2024-04-10 at 2.59.57 PM.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [MINOR] avoid listing files for empty tables [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11155:
URL: https://github.com/apache/hudi/pull/11155#issuecomment-2095096491

   
   ## CI report:
   
   * 0e7b3d3563d1a73b9131335a993e1b25c0ab964e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23671)
 
   * 83057bf74dc211a7322d704d8711ce5b0c60ae26 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23674)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2095096473

   
   ## CI report:
   
   * 235093b2175c43d984b316b3be6fef6d8cfe79f8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23670)
 
   * 0ca382304b2197d1f32edf3800c4771de9039598 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23673)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] avoid listing files for empty tables [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11155:
URL: https://github.com/apache/hudi/pull/11155#issuecomment-2095090412

   
   ## CI report:
   
   * 0e7b3d3563d1a73b9131335a993e1b25c0ab964e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23671)
 
   * 83057bf74dc211a7322d704d8711ce5b0c60ae26 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2095090368

   
   ## CI report:
   
   * 235093b2175c43d984b316b3be6fef6d8cfe79f8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23670)
 
   * 0ca382304b2197d1f32edf3800c4771de9039598 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] avoid listing files for MDT initialization when there are no completed commits [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11155:
URL: https://github.com/apache/hudi/pull/11155#issuecomment-2095048546

   
   ## CI report:
   
   * 0e7b3d3563d1a73b9131335a993e1b25c0ab964e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23671)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2095044371

   
   ## CI report:
   
   * 235093b2175c43d984b316b3be6fef6d8cfe79f8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23670)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Error using the property hoodie.datasource.write.drop.partition.columns [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on issue #11144:
URL: https://github.com/apache/hudi/issues/11144#issuecomment-2095033520

   `hoodie.datasource.write.drop.partition.columns` is setup by default as 
false, which means the data file does not include the partition columns, the 
partition field you declared here should be a field name instead of a value 
`CID@12`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch master updated: [HUDI-7703] Clean plan to exclude partitions with no deleting file (#11136)

2024-05-05 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new d4f55f193cd [HUDI-7703] Clean plan to exclude partitions with no 
deleting file (#11136)
d4f55f193cd is described below

commit d4f55f193cd79f317d4fc021cc741554ce8cf6cd
Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Sun May 5 19:25:48 2024 -0500

[HUDI-7703] Clean plan to exclude partitions with no deleting file (#11136)
---
 .../java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java | 1 +
 1 file changed, 1 insertion(+)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
index 77c96b47f05..0329fc8ddc6 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
@@ -138,6 +138,7 @@ public class CleanPlanActionExecutor extends 
BaseActionExecutor !e.getValue().getValue().isEmpty())
 .collect(Collectors.toMap(Map.Entry::getKey, e -> 
CleanerUtils.convertToHoodieCleanFileInfoList(e.getValue().getValue();
 
 
partitionsToDelete.addAll(cleanOpsWithPartitionMeta.entrySet().stream().filter(entry
 -> entry.getValue().getKey()).map(Map.Entry::getKey)



Re: [PR] [HUDI-7703] Clean plan to exclude partitions with no deleting file [hudi]

2024-05-05 Thread via GitHub


danny0405 merged PR #11136:
URL: https://github.com/apache/hudi/pull/11136


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] avoid listing files for MDT initialization when there are no completed commits [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11155:
URL: https://github.com/apache/hudi/pull/11155#issuecomment-2095016266

   
   ## CI report:
   
   * 0e7b3d3563d1a73b9131335a993e1b25c0ab964e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23671)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2095016252

   
   ## CI report:
   
   * c5bfe41814cd2a4f5c9e9c4b37d625eda57f193e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23669)
 
   * 235093b2175c43d984b316b3be6fef6d8cfe79f8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23670)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Use parent as the glob path when full file path specified [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11150:
URL: https://github.com/apache/hudi/pull/11150#discussion_r1590454993


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##
@@ -457,9 +457,10 @@ private Dataset 
readRecordsForGroupAsRow(JavaSparkContext jsc,
 
 String readPathString =
 String.join(",", 
Arrays.stream(paths).map(StoragePath::toString).toArray(String[]::new));
+String globPathString = String.join(",", 
Arrays.stream(paths).map(StoragePath::getParent).map(StoragePath::toString).distinct().toArray(String[]::new));
 params.put("hoodie.datasource.read.paths", readPathString);
 // Building HoodieFileIndex needs this param to decide query path
-params.put("glob.paths", readPathString);
+params.put("glob.paths", globPathString);
 

Review Comment:
   Not sure whether `TestHoodieSparkMergeOnReadTableClustering` is the 
candidate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7712] Fixing RLI initialization to account for file slices instead of just base files while initializing [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11153:
URL: https://github.com/apache/hudi/pull/11153#discussion_r1590454600


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergedReadHandle.java:
##
@@ -52,23 +59,55 @@
 public class HoodieMergedReadHandle extends HoodieReadHandle {
 
   protected final Schema readerSchema;
+  private final Option fileSliceOpt;
+  private final HoodieTableMetaClient metaClient;
+  private final TaskContextSupplier taskContextSupplier;
 
   public HoodieMergedReadHandle(HoodieWriteConfig config,
 Option instantTime,
 HoodieTable hoodieTable,
 Pair partitionPathFileIDPair) {
+this(config, instantTime, hoodieTable, hoodieTable.getMetaClient(), 
hoodieTable.getTaskContextSupplier(),
+partitionPathFileIDPair, Option.empty());
+  }
+
+  public HoodieMergedReadHandle(HoodieWriteConfig config,
+Option instantTime,
+HoodieTable hoodieTable,
+HoodieTableMetaClient metaClient,
+TaskContextSupplier taskContextSupplier,
+Pair partitionPathFileIDPair,
+Option fileSliceOption) {
 super(config, instantTime, hoodieTable, partitionPathFileIDPair);
 readerSchema = HoodieAvroUtils.addMetadataFields(new 
Schema.Parser().parse(config.getSchema()), 
config.allowOperationMetadataField());
+if (hoodieTable != null) {
+  this.metaClient = hoodieTable.getMetaClient();
+} else {
+  this.metaClient = metaClient;
+}
+if (this.storage == null) {
+  this.storage = this.metaClient.getStorage();
+  this.fs = (FileSystem) this.storage.getFileSystem();
+}
+this.taskContextSupplier = taskContextSupplier;
+fileSliceOpt = fileSliceOption.isPresent() ? fileSliceOption : 
getLatestFileSlice();
+  }
+
+  @Override
+  public HoodieStorage getStorage() {
+// constructor in  HoodieIOHandle calls this. We do have two different 
code paths, where either of HoodieTable is null or meta client.
+// In case metaClient not being null, we can set fileSystem after the 
constructor of HoodieIOHandle is called. Hence return null in the interim.

Review Comment:
   Can we create both of them on the fly, why we need this complexity?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7712] Fixing RLI initialization to account for file slices instead of just base files while initializing [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11153:
URL: https://github.com/apache/hudi/pull/11153#discussion_r1590454451


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -575,6 +598,40 @@ private Pair> 
initializeRecordIndexPartition()
 return Pair.of(fileGroupCount, records);
   }
 
+  private static HoodieData 
readRecordKeysFromFileSlices(HoodieEngineContext engineContext,
+  
List> partitionFileSlicePairs,

Review Comment:
   Does this code work for `COW` table too?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] avoid listing files for MDT initialization when there are no completed commits [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11155:
URL: https://github.com/apache/hudi/pull/11155#issuecomment-2095012701

   
   ## CI report:
   
   * 0e7b3d3563d1a73b9131335a993e1b25c0ab964e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2095012685

   
   ## CI report:
   
   * c5bfe41814cd2a4f5c9e9c4b37d625eda57f193e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23669)
 
   * 235093b2175c43d984b316b3be6fef6d8cfe79f8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-05 Thread via GitHub


danny0405 commented on code in PR #11151:
URL: https://github.com/apache/hudi/pull/11151#discussion_r1590453222


##
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestSimpleConcurrentFileWritesConflictResolutionStrategy.java:
##
@@ -193,6 +193,38 @@ public void 
testConcurrentWritesWithInterleavingScheduledCompaction() throws Exc
 }
   }
 
+  @Test
+  public void testConcurrentWritesWithInterleavingInflightCompaction() throws 
Exception {
+createCommit(metaClient.createNewInstantTime(), metaClient);
+HoodieActiveTimeline timeline = metaClient.getActiveTimeline();
+// consider commits before this are all successful
+Option lastSuccessfulInstant = 
timeline.getCommitsTimeline().filterCompletedInstants().lastInstant();
+// writer 1 starts
+String currentWriterInstant = metaClient.createNewInstantTime();
+createInflightCommit(currentWriterInstant, metaClient);
+// compaction 1 gets scheduled and becomes inflight
+String newInstantTime = metaClient.createNewInstantTime();
+createPendingCompaction(newInstantTime, metaClient);
+
+Option currentInstant = Option.of(new 
HoodieInstant(State.INFLIGHT, HoodieTimeline.COMMIT_ACTION, 
currentWriterInstant));

Review Comment:
   So this is a COW table?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [MINOR] avoid listing files for MDT initialization when there are no completed commits [hudi]

2024-05-05 Thread via GitHub


the-other-tim-brown opened a new pull request, #11155:
URL: https://github.com/apache/hudi/pull/11155

   ### Change Logs
   
   - Avoid extra listing
   - Make listing use a queue instead of copying linked lists
   
   ### Impact
   
   Minor reduction in overhead when initializing MDT
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2094994051

   
   ## CI report:
   
   * c5bfe41814cd2a4f5c9e9c4b37d625eda57f193e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23669)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2094992191

   
   ## CI report:
   
   * ccd797847fbe66e887d43b6e402d8701a6b7e81a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23668)
 
   * c5bfe41814cd2a4f5c9e9c4b37d625eda57f193e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23669)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-05 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2094989856

   
   ## CI report:
   
   * ccd797847fbe66e887d43b6e402d8701a6b7e81a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23668)
 
   * c5bfe41814cd2a4f5c9e9c4b37d625eda57f193e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi-rs) branch main updated: style: enforce rust code style (#14)

2024-05-05 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/hudi-rs.git


The following commit(s) were added to refs/heads/main by this push:
 new 586d210  style: enforce rust code style (#14)
586d210 is described below

commit 586d210f7593dd97f0b3305b5fb0bdbd08799123
Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Sun May 5 17:33:54 2024 -0500

style: enforce rust code style (#14)

Use `rustfmt` and `clippy` to enforce code style through CI check job.
---
 .github/workflows/ci.yml  |  8 +--
 Cargo.toml|  6 +-
 Makefile  | 31 +++
 crates/core/src/error.rs  | 11 
 crates/core/src/file_group/mod.rs | 66 --
 crates/core/src/table/file_system_view.rs | 64 --
 crates/core/src/table/meta_client.rs  | 83 +++-
 crates/core/src/table/mod.rs  | 25 +
 crates/core/src/timeline/mod.rs   | 91 ---
 crates/datafusion/src/bin/main.rs |  4 +-
 crates/datafusion/src/lib.rs  | 25 +++--
 crates/fs/src/file_systems.rs | 29 ++
 rust-toolchain.toml   | 21 +++
 13 files changed, 267 insertions(+), 197 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 948a05b..4bc8235 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -34,9 +34,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
   - uses: actions/checkout@v4
-
-  - name: Check License Header
+  - name: Check license header
 uses: apache/skywalking-eyes/header@v0.6.0
+  - name: Check code style
+run: make check
 
   build:
 runs-on: ${{ matrix.os }}
@@ -55,6 +56,5 @@ jobs:
 runs-on: ubuntu-latest
 steps:
   - uses: actions/checkout@v4
-
-  - name: Unit Test
+  - name: Unit test
 run: cargo test --no-fail-fast --all-targets --all-features --workspace
diff --git a/Cargo.toml b/Cargo.toml
index a24ee76..8c9a163 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -22,7 +22,7 @@ members = [
 resolver = "2"
 
 [workspace.package]
-version = "0.2.0"
+version = "0.1.0"
 edition = "2021"
 license = "Apache-2.0"
 rust-version = "1.75.0"
@@ -31,7 +31,7 @@ rust-version = "1.75.0"
 # arrow
 arrow = { version = "50" }
 arrow-arith = { version = "50" }
-arrow-array = { version = "50", features = ["chrono-tz"]}
+arrow-array = { version = "50", features = ["chrono-tz"] }
 arrow-buffer = { version = "50" }
 arrow-cast = { version = "50" }
 arrow-ipc = { version = "50" }
@@ -68,4 +68,4 @@ uuid = { version = "1" }
 async-trait = { version = "0.1" }
 futures = { version = "0.3" }
 tokio = { version = "1" }
-num_cpus = { version = "1" }
\ No newline at end of file
+num_cpus = { version = "1" }
diff --git a/Makefile b/Makefile
new file mode 100644
index 000..3a5ba3f
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,31 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+.EXPORT_ALL_VARIABLES:
+
+RUST_LOG = debug
+
+build:
+   cargo build
+
+check-fmt:
+   cargo fmt --all -- --check
+
+check-clippy:
+   cargo clippy --all-targets --all-features --workspace -- -D warnings
+
+check: check-fmt check-clippy
diff --git a/crates/core/src/error.rs b/crates/core/src/error.rs
index 9b6e3c0..f7f0bd7 100644
--- a/crates/core/src/error.rs
+++ b/crates/core/src/error.rs
@@ -19,24 +19,15 @@
 
 use std::error::Error;
 use std::fmt::Debug;
-use std::io;
 
 use thiserror::Error;
 
 #[derive(Debug, Error)]
 pub enum HudiFileGroupError {
-#[error("Base File {0} has unsupported format: {1}")]
-UnsupportedBaseFileFormat(String, String),
 #[error("Commit time {0} is already present in File Group {1}")]
 CommitTimeAlreadyExists(String, String),
 }
 
-#[derive(Debug, Error)]
-pub enum HudiTimelineError {
-#[error("Error in reading commit metadata: {0}")]
-FailToReadCommitMetadata(io::Error),
-}
-
 #[derive(Debug, Error)]
 pub enum HudiFileSystemViewError {
 #[error("Error in loading partitions: {0}")]
@@ -

Re: [PR] style: enforce rust code style [hudi-rs]

2024-05-05 Thread via GitHub


xushiyan merged PR #14:
URL: https://github.com/apache/hudi-rs/pull/14


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Enforce Rust code style [hudi-rs]

2024-05-05 Thread via GitHub


xushiyan closed issue #13: Enforce Rust code style
URL: https://github.com/apache/hudi-rs/issues/13


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] style: enforce rust code style [hudi-rs]

2024-05-05 Thread via GitHub


xushiyan opened a new pull request, #14:
URL: https://github.com/apache/hudi-rs/pull/14

   Use `rustfmt` and `clippy` to enforce code style. Enforce code style in CI 
check job.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7710] Use compaction.requested during conflict resolution [hudi]

2024-05-05 Thread via GitHub


linliu-code commented on PR #11151:
URL: https://github.com/apache/hudi/pull/11151#issuecomment-2094949761

   @yihua @nsivabalan Please review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi-rs) branch main updated: chore: add commit linting (#12)

2024-05-05 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/hudi-rs.git


The following commit(s) were added to refs/heads/main by this push:
 new 374a125  chore: add commit linting (#12)
374a125 is described below

commit 374a125db27893409d576608fdaccd7a97f70d9f
Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Sun May 5 15:12:44 2024 -0500

chore: add commit linting (#12)

Add github action job to enforce commit message conventional specified by 
https://www.conventionalcommits.org/en/v1.0.0/
---
 .commitlintrc.yaml   | 46 ++
 .github/workflows/pr.yml | 37 +
 2 files changed, 83 insertions(+)

diff --git a/.commitlintrc.yaml b/.commitlintrc.yaml
new file mode 100644
index 000..20ffc70
--- /dev/null
+++ b/.commitlintrc.yaml
@@ -0,0 +1,46 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+---
+# The rules below have been manually copied from 
@commitlint/config-conventional
+# and match the v1.0.0 specification:
+# https://www.conventionalcommits.org/en/v1.0.0/#specification
+#
+# You can remove them and uncomment the config below when the following issue 
is
+# fixed: https://github.com/conventional-changelog/commitlint/issues/613
+#
+# extends:
+#   - '@commitlint/config-conventional'
+rules:
+  body-leading-blank: [1, always]
+  body-max-line-length: [2, always, 100]
+  footer-leading-blank: [1, always]
+  footer-max-line-length: [2, always, 100]
+  header-max-length: [2, always, 100]
+  subject-case:
+- 2
+- never
+- [sentence-case, start-case, pascal-case, upper-case]
+  subject-empty: [2, never]
+  subject-full-stop: [2, never, "."]
+  type-case: [2, always, lower-case]
+  type-empty: [2, never]
+  type-enum:
+- 2
+- always
+- [build, chore, ci, docs, feat, fix, perf, refactor, revert, style, test]
diff --git a/.github/workflows/pr.yml b/.github/workflows/pr.yml
new file mode 100644
index 000..13f3c13
--- /dev/null
+++ b/.github/workflows/pr.yml
@@ -0,0 +1,37 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: PR
+
+on:
+  pull_request:
+types: [ opened, edited, reopened, synchronize ]
+branches:
+  - main
+
+jobs:
+  check_title:
+runs-on: ubuntu-latest
+steps:
+  - uses: actions/checkout@v4
+with:
+  node-version: '20.x'
+  - name: Linting
+run: |
+  npm i -g conventional-changelog-conventionalcommits
+  npm i -g commitlint@latest
+  echo ${{ github.event.pull_request.title }} | npx commitlint



Re: [PR] chore: add commit linting flow [hudi-rs]

2024-05-05 Thread via GitHub


xushiyan merged PR #12:
URL: https://github.com/apache/hudi-rs/pull/12


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



  1   2   >