date:20240609

[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7852:

Story Points: 1

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting 
> them to the long value, which may not be safe for Float and Double.  We 
> should limit the allowed cases to avoid wrong results.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7852:

Status: In Progress  (was: Open)

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting 
> them to the long value, which may not be safe for Float and Double.  We 
> should limit the allowed cases to avoid wrong results.




Re: [PR] [HUDI-7851] Fix java doc of DeltaWriteProfile [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #11422:
URL: https://github.com/apache/hudi/pull/11422#issuecomment-2157229627

   
   ## CI report:
   
   * c7b9a3e72f987f3de9fa15917526fbb6f55d8d1b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24326)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [HUDI-7851] Fix java doc of DeltaWriteProfile [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #11422:
URL: https://github.com/apache/hudi/pull/11422#issuecomment-2157189600

   
   ## CI report:
   
   * c7b9a3e72f987f3de9fa15917526fbb6f55d8d1b UNKNOWN
   
   

Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



yihua commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1632554676


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -147,6 +153,37 @@ public void close() {
 records.clear();
   }
 
+  /**
+   * Compares two {@link Comparable}s.  If both are numbers, converts them to 
{@link Long} for comparison.
+   * If one of the {@link Comparable}s is a String, assumes that both are 
String values for comparison.
+   *
+   * @param o1 {@link Comparable} object.
+   * @param o2 other {@link Comparable} object to compare to.
+   * @return comparison result.
+   */
+  @VisibleForTesting
+  static int compareTo(Comparable o1, Comparable o2) {
+// TODO(HUDI-7848): fix the delete records to contain the correct ordering 
value type
+//  so this util with the number comparison is not necessary.
+try {
+  return o1.compareTo(o2);
+} catch (ClassCastException e) {
+  if (o1 instanceof Number && o2 instanceof Number) {
+Long o1LongValue = ((Number) o1).longValue();
+Long o2LongValue = ((Number) o2).longValue();
+return o1LongValue.compareTo(o2LongValue);

Review Comment:
   HUDI-7852 to track.




[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7852:

Description: HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers 
by casting them to the long value, which may not be safe for Float and Double.  
We should limit the allowed cases to avoid wrong results.

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting 
> them to the long value, which may not be safe for Float and Double.  We 
> should limit the allowed cases to avoid wrong results.




[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7852:

Fix Version/s: 1.0.0

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting 
> them to the long value, which may not be safe for Float and Double.  We 
> should limit the allowed cases to avoid wrong results.




[jira] [Assigned] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7852:
---

Assignee: Ethan Guo

> Constrain the comparison of different types of ordering values to limited 
> cases
> ---
>
> Key: HUDI-7852
> URL: https://issues.apache.org/jira/browse/HUDI-7852
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>
> HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting 
> them to the long value, which may not be safe for Float and Double.  We 
> should limit the allowed cases to avoid wrong results.




[jira] [Created] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases

2024-06-09 Thread Ethan Guo (Jira)

Ethan Guo created HUDI-7852:
---

 Summary: Constrain the comparison of different types of ordering 
values to limited cases
 Key: HUDI-7852
 URL: https://issues.apache.org/jira/browse/HUDI-7852
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo







Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



yihua commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1632553603


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -147,6 +153,37 @@ public void close() {
 records.clear();
   }
 
+  /**
+   * Compares two {@link Comparable}s.  If both are numbers, converts them to 
{@link Long} for comparison.
+   * If one of the {@link Comparable}s is a String, assumes that both are 
String values for comparison.
+   *
+   * @param o1 {@link Comparable} object.
+   * @param o2 other {@link Comparable} object to compare to.
+   * @return comparison result.
+   */
+  @VisibleForTesting
+  static int compareTo(Comparable o1, Comparable o2) {
+// TODO(HUDI-7848): fix the delete records to contain the correct ordering 
value type
+//  so this util with the number comparison is not necessary.
+try {
+  return o1.compareTo(o2);
+} catch (ClassCastException e) {
+  if (o1 instanceof Number && o2 instanceof Number) {
+Long o1LongValue = ((Number) o1).longValue();
+Long o2LongValue = ((Number) o2).longValue();
+return o1LongValue.compareTo(o2LongValue);

Review Comment:
   We can constrain the comparison to Long and Integer only to limit the 
possibility of wrong results.  I'll create a follow-up PR to fix this.
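A minimal sketch of what such a constrained comparison could look like, assuming the fallback is allowed only when both values are `Integer` or `Long`. The class and method names (`OrderingValueComparison`, `compareOrderingValues`) are hypothetical and are not taken from the PR or from the follow-up fix.

```java
import java.util.Objects;

/** Hypothetical helper for illustration; not the actual Hudi implementation. */
final class OrderingValueComparison {

  private OrderingValueComparison() {
  }

  /** True only for the two integral types the review comment proposes to allow. */
  private static boolean isIntegerOrLong(Object o) {
    return o instanceof Integer || o instanceof Long;
  }

  @SuppressWarnings({"rawtypes", "unchecked"})
  static int compareOrderingValues(Comparable o1, Comparable o2) {
    Objects.requireNonNull(o1, "o1");
    Objects.requireNonNull(o2, "o2");
    // Same runtime class: the natural ordering can be used directly.
    if (o1.getClass() == o2.getClass()) {
      return o1.compareTo(o2);
    }
    // Mixed Integer/Long: widening to long is lossless, so the result is exact.
    if (isIntegerOrLong(o1) && isIntegerOrLong(o2)) {
      return Long.compare(((Number) o1).longValue(), ((Number) o2).longValue());
    }
    // Any other mix (for example Float vs. Long) is rejected rather than truncated.
    throw new IllegalArgumentException(
        "Cannot compare ordering values of types "
            + o1.getClass().getName() + " and " + o2.getClass().getName());
  }

  public static void main(String[] args) {
    System.out.println(compareOrderingValues(5, 7L) < 0);   // Integer vs. Long
    System.out.println(compareOrderingValues(7L, 7) == 0);  // Long vs. Integer
  }
}
```

Failing fast on unsupported type pairs is one possible design choice here; it trades a hard error for the silent mis-ordering that a lossy cast can produce.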




Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



yihua commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1632551875


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -147,6 +153,37 @@ public void close() {
 records.clear();
   }
 
+  /**
+   * Compares two {@link Comparable}s.  If both are numbers, converts them to 
{@link Long} for comparison.
+   * If one of the {@link Comparable}s is a String, assumes that both are 
String values for comparison.
+   *
+   * @param o1 {@link Comparable} object.
+   * @param o2 other {@link Comparable} object to compare to.
+   * @return comparison result.
+   */
+  @VisibleForTesting
+  static int compareTo(Comparable o1, Comparable o2) {
+// TODO(HUDI-7848): fix the delete records to contain the correct ordering 
value type

Review Comment:
   Yes, based on the test cases this only happens when the ordering field value 
is deserialized from the delete records.  We need to check if the existing 
Avro-based merging logic has done schema handling to make this work (which may 
also incur additional overhead).




Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



yihua commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157165840

   CI is green.
   https://github.com/apache/hudi/assets/2497195/6d8f4fa9-3e64-4914-9a46-05e8783cd458
   



[jira] [Updated] (HUDI-7851) Fix java doc of DeltaWriteProfile

2024-06-09 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7851:
-
Labels: pull-request-available  (was: )

> Fix java doc of DeltaWriteProfile
> -
>
> Key: HUDI-7851
> URL: https://issues.apache.org/jira/browse/HUDI-7851
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: bradley
>Priority: Major
>  Labels: pull-request-available
>





[PR] [HUDI-7851] Fix java doc of DeltaWriteProfile [hudi]

2024-06-09 Thread via GitHub



usberkeley opened a new pull request, #11422:
URL: https://github.com/apache/hudi/pull/11422

   ### Change Logs
   
   Fix java doc of DeltaWriteProfile
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [1] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [1] Change Logs and Impact were stated clearly
   - [1] Adequate tests were added if applicable
   - [1] CI passed
   



(hudi) branch master updated: [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader (#9894)

2024-06-09 Thread codope

This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c0576131759 [HUDI-6798] Add record merging mode and implement 
event-time ordering in the new file group reader (#9894)
c0576131759 is described below

commit c05761317596585a3c0c3cc69a34b4407843351c
Author: Y Ethan Guo 
AuthorDate: Sun Jun 9 20:48:09 2024 -0700

[HUDI-6798] Add record merging mode and implement event-time ordering in 
the new file group reader (#9894)

This PR adds a new table config `hoodie.record.merge.mode` to control the
record merging mode and behavior in the new file group reader
(`HoodieFileGroupReader`) and implements event-time ordering in it.
The config `hoodie.record.merge.mode` is going to be the single config that
determines how the record merging happens in release 1.0 and beyond.

-

Co-authored-by: Sagar Sumit 
---
 .../hudi/client/TestTableSchemaEvolution.java  |   3 +
 .../hudi/common/config/HoodieCommonConfig.java |   3 +
 .../apache/hudi/common/config/RecordMergeMode.java |  36 
 .../hudi/common/table/HoodieTableConfig.java   |  13 +-
 .../hudi/common/table/HoodieTableMetaClient.java   | 114 ++-
 .../table/log/BaseHoodieLogRecordReader.java   |   7 +
 .../table/log/HoodieMergedLogRecordReader.java |  13 +-
 .../read/HoodieBaseFileGroupRecordBuffer.java  | 209 -
 .../common/table/read/HoodieFileGroupReader.java   |  26 ++-
 .../table/read/TestHoodieFileGroupReaderBase.java  |  77 ++--
 .../common/table/TestHoodieTableMetaClient.java| 144 ++
 .../hudi/common/table/read/TestCustomMerger.java   |   4 +
 .../common/table/read/TestEventTimeMerging.java|   4 +
 ...stHoodiePositionBasedFileGroupRecordBuffer.java |   6 +-
 .../read/TestHoodieFileGroupReaderOnSpark.scala|  11 +-
 15 files changed, 588 insertions(+), 82 deletions(-)

diff --git 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
index f5fa70c6668..496b42c13d6 100644
--- 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
+++ 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
@@ -20,6 +20,7 @@ package org.apache.hudi.client;
 
 import org.apache.hudi.avro.AvroSchemaUtils;
 import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.config.RecordMergeMode;
 import org.apache.hudi.common.model.HoodieAvroRecord;
 import org.apache.hudi.common.model.HoodieKey;
 import org.apache.hudi.common.model.HoodieRecord;
@@ -48,6 +49,7 @@ import java.io.IOException;
 import java.util.List;
 import java.util.stream.Collectors;
 
+import static 
org.apache.hudi.common.config.HoodieCommonConfig.RECORD_MERGE_MODE;
 import static 
org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion.VERSION_1;
 import static 
org.apache.hudi.common.testutils.HoodieTestDataGenerator.EXTRA_TYPE_SCHEMA;
 import static 
org.apache.hudi.common.testutils.HoodieTestDataGenerator.FARE_NESTED_SCHEMA;
@@ -165,6 +167,7 @@ public class TestTableSchemaEvolution extends 
HoodieClientTestBase {
 HoodieTableMetaClient.withPropertyBuilder()
 .fromMetaClient(metaClient)
 .setTableType(HoodieTableType.MERGE_ON_READ)
+
.setRecordMergeMode(RecordMergeMode.valueOf(RECORD_MERGE_MODE.defaultValue()))
 .setTimelineLayoutVersion(VERSION_1)
 .initTable(metaClient.getStorageConf().newInstance(), 
metaClient.getBasePath());
 
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java
index 1a4c2e31780..c96b07ee4f0 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java
@@ -18,6 +18,7 @@
 
 package org.apache.hudi.common.config;
 
+import org.apache.hudi.common.table.HoodieTableConfig;
 import 
org.apache.hudi.common.table.timeline.TimelineUtils.HollowCommitHandling;
 import org.apache.hudi.common.util.collection.ExternalSpillableMap;
 
@@ -81,6 +82,8 @@ public class HoodieCommonConfig extends HoodieConfig {
   + " operation will fail schema compatibility check. Set this option 
to true will make the missing "
   + " column be filled with null values to successfully complete the 
write operation.");
 
+  public static final ConfigProperty RECORD_MERGE_MODE = 
HoodieTableConfig.RECORD_MERGE_MODE;
+
   public static final ConfigProperty 
SPILLABLE_DISK_MAP_TYPE = ConfigProperty
   .key("hoodie.common.spillable.diskmap.type")

Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



codope merged PR #9894:
URL: https://github.com/apache/hudi/pull/9894



[jira] [Created] (HUDI-7851) Fix java doc of DeltaWriteProfile

2024-06-09 Thread bradley (Jira)

bradley created HUDI-7851:
-

 Summary: Fix java doc of DeltaWriteProfile
 Key: HUDI-7851
 URL: https://issues.apache.org/jira/browse/HUDI-7851
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: bradley







Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157141857

   
   ## CI report:
   
   * 3a1ec4524a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24313)
 
   
   

Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157136586

   
   ## CI report:
   
   * ca01c48cd352583dbf024006de57c9f6827b237b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24324)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24323)
 
   * 3a1ec4524a UNKNOWN
   
   

Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



codope commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1632524037


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -147,6 +153,37 @@ public void close() {
 records.clear();
   }
 
+  /**
+   * Compares two {@link Comparable}s.  If both are numbers, converts them to 
{@link Long} for comparison.
+   * If one of the {@link Comparable}s is a String, assumes that both are 
String values for comparison.
+   *
+   * @param o1 {@link Comparable} object.
+   * @param o2 other {@link Comparable} object to compare to.
+   * @return comparison result.
+   */
+  @VisibleForTesting
+  static int compareTo(Comparable o1, Comparable o2) {
+// TODO(HUDI-7848): fix the delete records to contain the correct ordering 
value type

Review Comment:
   does this happen only for delete records?



##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -147,6 +153,37 @@ public void close() {
 records.clear();
   }
 
+  /**
+   * Compares two {@link Comparable}s.  If both are numbers, converts them to 
{@link Long} for comparison.
+   * If one of the {@link Comparable}s is a String, assumes that both are 
String values for comparison.
+   *
+   * @param o1 {@link Comparable} object.
+   * @param o2 other {@link Comparable} object to compare to.
+   * @return comparison result.
+   */
+  @VisibleForTesting
+  static int compareTo(Comparable o1, Comparable o2) {
+// TODO(HUDI-7848): fix the delete records to contain the correct ordering 
value type
+//  so this util with the number comparison is not necessary.
+try {
+  return o1.compareTo(o2);
+} catch (ClassCastException e) {
+  if (o1 instanceof Number && o2 instanceof Number) {
+Long o1LongValue = ((Number) o1).longValue();
+Long o2LongValue = ((Number) o2).longValue();
+return o1LongValue.compareTo(o2LongValue);

Review Comment:
   can possibly lead to wrong result with float/double?
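To make the concern concrete, here is a small self-contained demonstration of how comparing through `longValue()` can hide real differences between floating-point values. `LongCastPitfall` and `compareViaLong` are illustrative names only; the method mirrors the cast-to-long fallback in the diff above rather than reproducing the Hudi code.

```java
/** Illustration only: mirrors the cast-to-long fallback discussed in the review. */
final class LongCastPitfall {

  private LongCastPitfall() {
  }

  /** Compares two numbers after narrowing both to long, as the fallback does. */
  static int compareViaLong(Number a, Number b) {
    return Long.compare(a.longValue(), b.longValue());
  }

  public static void main(String[] args) {
    // 1.2f and 1.7d both truncate to 1L, so they look equal.
    System.out.println(compareViaLong(1.2f, 1.7d));   // prints 0

    // -0.5 and 0.5 both truncate toward zero.
    System.out.println(compareViaLong(-0.5d, 0.5d));  // prints 0

    // Values beyond the long range saturate at Long.MAX_VALUE.
    System.out.println(compareViaLong(1e30d, 1e31d)); // prints 0

    // A direct floating-point comparison still sees the difference.
    System.out.println(Double.compare(1.2f, 1.7d) < 0); // prints true
  }
}
```

With event-time ordering, a comparison result of 0 for values that actually differ could pick the wrong record during merging, which is the risk the question points at.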




Re: [PR] [HUDI-7826] Make column nullable when setNullForMissingColumns is true [hudi]

2024-06-09 Thread via GitHub



the-other-tim-brown commented on code in PR #11381:
URL: https://github.com/apache/hudi/pull/11381#discussion_r1632525375


##
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/AvroSchemaEvolutionUtils.java:
##
@@ -113,6 +120,21 @@ public static InternalSchema reconcileSchema(Schema 
incomingSchema, InternalSche
   typeChange.updateColumnType(col, inComingInternalSchema.findType(col));
 });
 
+// mark columns missing from incoming schema as nullable
+Set visited = new HashSet<>();
+diffFromOldSchema.stream()
+// ignore meta fields
+.filter(col -> !META_FIELD_NAMES.contains(col))
+.sorted()
+.forEach(col -> {
+  // if parent is marked as nullable, only update the parent and not 
all the missing children field
+  String parent = TableChangesHelper.getParentName(col);
+  if (!visited.contains(parent)) {
+typeChange.updateColumnNullability(col, true);
+  }
+  visited.add(col);
+});

Review Comment:
   @nsivabalan I've updated the PR to include the boolean
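The parent-skipping logic in the hunk above can be tried in isolation with the following sketch. It assumes dot-separated full column names and uses a hypothetical `parentName` helper as a stand-in for `TableChangesHelper.getParentName`; the meta-field filter is omitted for brevity, and the method returns the columns that would be marked nullable instead of calling `updateColumnNullability`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Standalone sketch of the "mark missing columns nullable" traversal. */
final class MissingColumnNullability {

  private MissingColumnNullability() {
  }

  // Hypothetical stand-in for TableChangesHelper.getParentName:
  // "a.b.c" -> "a.b"; a top-level name is assumed to map to "".
  static String parentName(String col) {
    int idx = col.lastIndexOf('.');
    return idx > 0 ? col.substring(0, idx) : "";
  }

  /** Returns the missing columns that would be updated, in sorted order. */
  static List<String> columnsToMarkNullable(Collection<String> missingColumns) {
    Set<String> visited = new HashSet<>();
    List<String> toUpdate = new ArrayList<>();
    // Sorting puts a parent before its children because the parent is a prefix.
    List<String> sorted = missingColumns.stream().sorted().collect(Collectors.toList());
    for (String col : sorted) {
      // Only update a column whose parent was not itself a missing column.
      if (!visited.contains(parentName(col))) {
        toUpdate.add(col);
      }
      // Always record the column so that its own children are skipped too.
      visited.add(col);
    }
    return toUpdate;
  }

  public static void main(String[] args) {
    System.out.println(columnsToMarkNullable(Arrays.asList("a.b.c", "d.e", "a", "a.b")));
    // prints [a, d.e]
  }
}
```

Here `a.b` and `a.b.c` are skipped because their ancestor `a` is already missing and gets marked nullable, while `d.e` is updated because `d` itself is present in the incoming schema.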




[jira] [Assigned] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat

2024-06-09 Thread Vova Kolmakov (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7849:
---

Assignee: Vova Kolmakov

> Reduce time spent on running testFiltersInFileFormat
> 
>
> Key: HUDI-7849
> URL: https://issues.apache.org/jira/browse/HUDI-7849
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
> Fix For: 1.0.0
>
>
> Below shows the top long-running tests in the job "UT flink & FT common & 
> flink & spark-client & hudi-spark" in Azure CI.  The time running 
> testFiltersInFileFormat should be reduced.
> {code:java}
> /usr/bin/bash --noprofile --norc 
> /home/vsts/work/_temp/4fa77791-00bc-40cc-82d7-1fb635914a0f.sh
> grep: */target/surefire-reports/*.xml: No such file or directory
> 366.474 boolean) [2] false(testFiltersInFileFormat
> 223.221 boolean) [1] true(testFiltersInFileFormat
> 80.903 HoodieTableType, Integer) [3] MERGE_ON_READ, 2(testNewParquetFileFormat
> 65.48 boolean) [2] true(testDeletePartitionAndArchive
> 56.558 boolean) [1] false(testDeletePartitionAndArchive{code}




[jira] [Assigned] (HUDI-7847) Infer record merge mode during table upgrade

2024-06-09 Thread Geser Dugarov (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov reassigned HUDI-7847:
---

Assignee: Geser Dugarov

> Infer record merge mode during table upgrade
> 
>
> Key: HUDI-7847
> URL: https://issues.apache.org/jira/browse/HUDI-7847
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Geser Dugarov
>Priority: Major
> Fix For: 1.0.0
>
>
> Record merge mode is required to dictate the merging behavior in release 1.x, 
> playing the same role as the payload class config in the release 0.x.  During 
> table upgrade, we need to infer the record merge mode based on the payload 
> class so it's correctly set.
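One way such an inference could be sketched is shown below. The mapping is an assumption made for illustration: the local `MergeMode` enum and its constants are not claimed to match Hudi's `RecordMergeMode`, and the choice of which payload class maps to which mode is a guess based on the payload class names, not on the ticket.

```java
/** Hypothetical sketch of inferring a merge mode from a payload class name. */
final class MergeModeInference {

  /** Illustrative enum; not claimed to match Hudi's RecordMergeMode constants. */
  enum MergeMode {
    OVERWRITE_WITH_LATEST,
    EVENT_TIME_ORDERING,
    CUSTOM
  }

  // Payload class names from release 0.x; the mapping below is an assumption.
  static final String OVERWRITE_PAYLOAD =
      "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload";
  static final String DEFAULT_PAYLOAD =
      "org.apache.hudi.common.model.DefaultHoodieRecordPayload";

  private MergeModeInference() {
  }

  static MergeMode infer(String payloadClassName) {
    if (OVERWRITE_PAYLOAD.equals(payloadClassName)) {
      // The latest write wins regardless of the ordering field.
      return MergeMode.OVERWRITE_WITH_LATEST;
    }
    if (DEFAULT_PAYLOAD.equals(payloadClassName)) {
      // The record with the larger ordering value wins.
      return MergeMode.EVENT_TIME_ORDERING;
    }
    // Any other payload class is treated as user-defined merging logic.
    return MergeMode.CUSTOM;
  }

  public static void main(String[] args) {
    System.out.println(infer(DEFAULT_PAYLOAD));        // prints EVENT_TIME_ORDERING
    System.out.println(infer("com.example.MyPayload")); // prints CUSTOM
  }
}
```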




[jira] [Assigned] (HUDI-7838) Use Config hoodie.schema.cache.enable in HoodieBaseFileGroupRecordBuffer and AbstractHoodieLogRecordReader

2024-06-09 Thread Vova Kolmakov (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7838:
---

Assignee: Vova Kolmakov

> Use Config hoodie.schema.cache.enable in HoodieBaseFileGroupRecordBuffer and  
> AbstractHoodieLogRecordReader
> ---
>
> Key: HUDI-7838
> URL: https://issues.apache.org/jira/browse/HUDI-7838
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: reader-core
>Reporter: Jonathan Vexler
>Assignee: Vova Kolmakov
>Priority: Major
>
> hoodie.schema.cache.enable should be used to decide if we want to use the 
> schema cache. Currently it is hardcoded to false.
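The requested behavior amounts to reading the flag from the configuration instead of passing a literal `false`. A minimal sketch using plain `java.util.Properties` follows; the `SchemaCacheSetting` class is hypothetical, and the default of `false` is an assumption chosen to match the currently hardcoded value.

```java
import java.util.Properties;

/** Hypothetical helper showing a config-driven schema cache switch. */
final class SchemaCacheSetting {

  static final String SCHEMA_CACHE_ENABLE_KEY = "hoodie.schema.cache.enable";

  private SchemaCacheSetting() {
  }

  /** Returns the configured value, falling back to false (assumed default). */
  static boolean isSchemaCacheEnabled(Properties props) {
    return Boolean.parseBoolean(props.getProperty(SCHEMA_CACHE_ENABLE_KEY, "false"));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(isSchemaCacheEnabled(props)); // prints false
    props.setProperty(SCHEMA_CACHE_ENABLE_KEY, "true");
    System.out.println(isSchemaCacheEnabled(props)); // prints true
  }
}
```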




Re: [PR] [HUDI-7826] Make column nullable when setNullForMissingColumns is true [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #11381:
URL: https://github.com/apache/hudi/pull/11381#issuecomment-2157083778

   
   ## CI report:
   
   * 0d1802d42d4b67cc791cbd8d8c4619dd7a52d319 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24320)
 
   
   

[jira] [Assigned] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write

2024-06-09 Thread Geser Dugarov (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov reassigned HUDI-7850:
---

Assignee: Geser Dugarov

> Makes hoodie.record.merge.mode mandatory upon creating the table and first 
> write
> 
>
> Key: HUDI-7850
> URL: https://issues.apache.org/jira/browse/HUDI-7850
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Geser Dugarov
>Priority: Major
> Fix For: 1.0.0
>
>
> Right now, "hoodie.record.merge.mode" is optional during writes as it is 
> inferred from the payload class name, payload type, and the record merger 
> strategy during the creation of the table properties.  We should make this 
> config mandatory in release 1.0 and make other merge configs optional to 
> simplify the configuration experience.
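Making the config mandatory could be enforced with a fail-fast check at table creation or first write. The sketch below uses plain `java.util.Properties`; the `MergeModeValidation` class and its error message are hypothetical, and only the key `hoodie.record.merge.mode` comes from the ticket.

```java
import java.util.Properties;

/** Hypothetical validation sketch for a mandatory merge mode config. */
final class MergeModeValidation {

  static final String RECORD_MERGE_MODE_KEY = "hoodie.record.merge.mode";

  private MergeModeValidation() {
  }

  /** Returns the trimmed merge mode, or throws if it is missing or blank. */
  static String requireMergeMode(Properties tableProps) {
    String value = tableProps.getProperty(RECORD_MERGE_MODE_KEY);
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException(
          RECORD_MERGE_MODE_KEY + " must be set when creating the table");
    }
    return value.trim();
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(RECORD_MERGE_MODE_KEY, "EVENT_TIME_ORDERING");
    System.out.println(requireMergeMode(props)); // prints EVENT_TIME_ORDERING
  }
}
```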




Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157082849

   
   ## CI report:
   
   * ca01c48cd352583dbf024006de57c9f6827b237b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24324)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24323)
 
   
   

[jira] [Commented] (HUDI-7839) Can not find props file when using HoodieDeltaStreamer with Hudi 0.14.1

2024-06-09 Thread Vova Kolmakov (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853545#comment-17853545
 ] 

Vova Kolmakov commented on HUDI-7839:
-

Fixed via master branch: 9f9064761bac766cc7884027432568c06817ddd7

> Can not find props file when using HoodieDeltaStreamer with Hudi 0.14.1
> ---
>
> Key: HUDI-7839
> URL: https://issues.apache.org/jira/browse/HUDI-7839
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Xiaoxuan Li
>Assignee: Vova Kolmakov
>Priority: Major
>
> When using HoodieDeltaStreamer with Hudi 0.14.1, the following error was thrown:
> {noformat}
> Cannot read properties from dfs from file 
> file:/mnt1/yarn/usercache/hadoop/appcache/application_1717399456895_0009/container_1717399456895_0009_02_01/src/test/resources/streamer-config/dfs-source.properties{noformat}
>  
> It works fine on Hudi 0.14.0. It might be related to a change introduced in 
> 0.14.1 -> [https://github.com/apache/hudi/pull/9913]
>  
> error log:
> {code:java}
> 24/06/06 22:42:09 INFO Client:
> client token: N/A
> diagnostics: User class threw exception: org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs from file file:/mnt1/yarn/usercache/hadoop/appcache/application_1717399456895_0009/container_1717399456895_0009_02_01/src/test/resources/streamer-config/dfs-source.properties
>     at org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:166)
>     at org.apache.hudi.common.config.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:85)
>     at org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:232)
>     at org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:437)
>     at org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:656)
>     at org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:632)
>     at org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:525)
>     at org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:498)
>     at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:404)
>     at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:850)
>     at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
>     at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
>     at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:207)
>     at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:592)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:741)
> Caused by: java.io.FileNotFoundException: File file:/mnt1/yarn/usercache/hadoop/appcache/application_1717399456895_0009/container_1717399456895_0009_02_01/src/test/resources/streamer-config/dfs-source.properties does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:968)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1289)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:958)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:472)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:188)
>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:581)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:1004)
>     at org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:161)
>     ... 18 more
> ApplicationMaster host: ip-172-31-75-55.ec2.internal
> ApplicationMaster RPC port: 43905
> queue: default
> start time: 1717713711465
> final status: FAILED
> tracking URL: http://ip-172-31-69-122.ec2.internal:20888/proxy/application_1717399456895_0009/
> user: hadoop
> 24/06/06 22:42:09 ERROR Client: Application diagnostics message: User class threw exception: org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs from file file:/mnt1/yarn/usercache/hadoop/appcache/application_1717399456895_0009/container_1717399456895_0009_02_01/src/test/resources/streamer-config/dfs-source.properties
>     at
>

Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157034573

   
   ## CI report:
   
   * 7b6c9d86accaf976f4db0185fa1a203c82f04446 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24322)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24321)
 
   * ca01c48cd352583dbf024006de57c9f6827b237b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157025978

   
   ## CI report:
   
   * a6ffe1240055d6135a517dfcada59edc95383423 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24318)
 
   * 7b6c9d86accaf976f4db0185fa1a203c82f04446 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[jira] [Updated] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7850:

Fix Version/s: 1.0.0

> Makes hoodie.record.merge.mode mandatory upon creating the table and first 
> write
> 
>
> Key: HUDI-7850
> URL: https://issues.apache.org/jira/browse/HUDI-7850
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> Right now, "hoodie.record.merge.mode" is optional during writes as it is 
> inferred from the payload class name, payload type, and the record merger 
> strategy during the creation of the table properties.  We should make this 
> config mandatory in release 1.0 and make other merge configs optional to 
> simplify the configuration experience.




[jira] [Updated] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7850:

Description: Right now, "hoodie.record.merge.mode" is optional during 
writes as it is inferred from the payload class name, payload type, and the 
record merger strategy during the creation of the table properties.  We should 
make this config mandatory in release 1.0 and make other merge configs optional 
to simplify the configuration experience.  (was: Right now )

> Makes hoodie.record.merge.mode mandatory upon creating the table and first 
> write
> 
>
> Key: HUDI-7850
> URL: https://issues.apache.org/jira/browse/HUDI-7850
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
>
> Right now, "hoodie.record.merge.mode" is optional during writes as it is 
> inferred from the payload class name, payload type, and the record merger 
> strategy during the creation of the table properties.  We should make this 
> config mandatory in release 1.0 and make other merge configs optional to 
> simplify the configuration experience.




[jira] [Updated] (HUDI-7850) Makes `hoodie.record.merge.mode` mandatory upon creating the table and first write

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7850:

Description: Right now 

> Makes `hoodie.record.merge.mode` mandatory upon creating the table and first 
> write
> --
>
> Key: HUDI-7850
> URL: https://issues.apache.org/jira/browse/HUDI-7850
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
>
> Right now 




[jira] [Created] (HUDI-7850) Makes `hoodie.record.merge.mode` mandatory upon creating the table and first write

2024-06-09 Thread Ethan Guo (Jira)

Ethan Guo created HUDI-7850:
---

 Summary: Makes `hoodie.record.merge.mode` mandatory upon creating 
the table and first write
 Key: HUDI-7850
 URL: https://issues.apache.org/jira/browse/HUDI-7850
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Ethan Guo







[jira] [Updated] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7850:

Summary: Makes hoodie.record.merge.mode mandatory upon creating the table 
and first write  (was: Makes `hoodie.record.merge.mode` mandatory upon creating 
the table and first write)

> Makes hoodie.record.merge.mode mandatory upon creating the table and first 
> write
> 
>
> Key: HUDI-7850
> URL: https://issues.apache.org/jira/browse/HUDI-7850
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
>
> Right now 




Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157018634

   
   ## CI report:
   
   * a6ffe1240055d6135a517dfcada59edc95383423 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[jira] [Updated] (HUDI-7842) Update docs on the new record merge mode config

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7842:

Summary: Update docs on the new record merge mode config  (was: Update docs 
with the new record merge mode config)

> Update docs on the new record merge mode config
> ---
>
> Key: HUDI-7842
> URL: https://issues.apache.org/jira/browse/HUDI-7842
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> We should educate users on the new record merge mode config introduced by 
> HUDI-6798 that simplifies configs controlling the merging behavior.




[jira] [Updated] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7849:

Description: 
The longest-running tests in the job "UT flink & FT common & flink 
& spark-client & hudi-spark" in Azure CI are listed below.  The time spent running 
testFiltersInFileFormat should be reduced.
{code:java}
/usr/bin/bash --noprofile --norc 
/home/vsts/work/_temp/4fa77791-00bc-40cc-82d7-1fb635914a0f.sh
grep: */target/surefire-reports/*.xml: No such file or directory
366.474 boolean) [2] false(testFiltersInFileFormat
223.221 boolean) [1] true(testFiltersInFileFormat
80.903 HoodieTableType, Integer) [3] MERGE_ON_READ, 2(testNewParquetFileFormat
65.48 boolean) [2] true(testDeletePartitionAndArchive
56.558 boolean) [1] false(testDeletePartitionAndArchive{code}

> Reduce time spent on running testFiltersInFileFormat
> 
>
> Key: HUDI-7849
> URL: https://issues.apache.org/jira/browse/HUDI-7849
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> The longest-running tests in the job "UT flink & FT common & 
> flink & spark-client & hudi-spark" in Azure CI are listed below.  The time spent running 
> testFiltersInFileFormat should be reduced.
> {code:java}
> /usr/bin/bash --noprofile --norc 
> /home/vsts/work/_temp/4fa77791-00bc-40cc-82d7-1fb635914a0f.sh
> grep: */target/surefire-reports/*.xml: No such file or directory
> 366.474 boolean) [2] false(testFiltersInFileFormat
> 223.221 boolean) [1] true(testFiltersInFileFormat
> 80.903 HoodieTableType, Integer) [3] MERGE_ON_READ, 2(testNewParquetFileFormat
> 65.48 boolean) [2] true(testDeletePartitionAndArchive
> 56.558 boolean) [1] false(testDeletePartitionAndArchive{code}




Re: [I] [SUPPORT] Data deduplication caused by drawback in the delete invalid files before commit [hudi]

2024-06-09 Thread via GitHub



danny0405 commented on issue #11419:
URL: https://github.com/apache/hudi/issues/11419#issuecomment-2156988506

   Hmm, would you mind submitting a fix for it?



[jira] [Updated] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7849:

Fix Version/s: 1.0.0

> Reduce time spent on running testFiltersInFileFormat
> 
>
> Key: HUDI-7849
> URL: https://issues.apache.org/jira/browse/HUDI-7849
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>





[jira] [Created] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat

2024-06-09 Thread Ethan Guo (Jira)

Ethan Guo created HUDI-7849:
---

 Summary: Reduce time spent on running testFiltersInFileFormat
 Key: HUDI-7849
 URL: https://issues.apache.org/jira/browse/HUDI-7849
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo







Re: [PR] [HUDI-7826] Make column nullable when setNullForMissingColumns is true [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #11381:
URL: https://github.com/apache/hudi/pull/11381#issuecomment-2156963475

   
   ## CI report:
   
   * 7ac5620ea218b34184ba918f6197339f2f695eb9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24317)
 
   * 0d1802d42d4b67cc791cbd8d8c4619dd7a52d319 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24320)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



Re: [PR] [HUDI-7826] Make column nullable when setNullForMissingColumns is true [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #11381:
URL: https://github.com/apache/hudi/pull/11381#issuecomment-2156937303

   
   ## CI report:
   
   * 7ac5620ea218b34184ba918f6197339f2f695eb9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24317)
 
   * 0d1802d42d4b67cc791cbd8d8c4619dd7a52d319 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156933007

   
   ## CI report:
   
   * 3a1ec4524a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24313)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156929070

   
   ## CI report:
   
   * a6ffe1240055d6135a517dfcada59edc95383423 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24319)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24318)
 
   * 3a1ec4524a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[jira] [Closed] (HUDI-7759) Remove Hadoop dependencies in hudi-common module

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7759.
---
Resolution: Fixed

> Remove Hadoop dependencies in hudi-common module
> 
>
> Key: HUDI-7759
> URL: https://issues.apache.org/jira/browse/HUDI-7759
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>





[jira] [Closed] (HUDI-7752) Abstract serializeRecords for log writing

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7752.
---
Resolution: Fixed

> Abstract serializeRecords for log writing
> -
>
> Key: HUDI-7752
> URL: https://issues.apache.org/jira/browse/HUDI-7752
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>





[jira] [Closed] (HUDI-7754) Remove AvroWriteSupport and ParquetReaderIterator from hudi-common

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7754.
---
Resolution: Fixed

> Remove AvroWriteSupport and ParquetReaderIterator from hudi-common
> --
>
> Key: HUDI-7754
> URL: https://issues.apache.org/jira/browse/HUDI-7754
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Two classes with Hadoop dependencies that can be moved to hadoop common and 
> aren't covered by other PRs.




[jira] [Closed] (HUDI-7750) Move HoodieLogFormatWriter class to hoodie-hadoop-common module

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7750.
---
Resolution: Fixed

> Move HoodieLogFormatWriter class to hoodie-hadoop-common module
> ---
>
> Key: HUDI-7750
> URL: https://issues.apache.org/jira/browse/HUDI-7750
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-09 Thread via GitHub



hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156876724

   
   ## CI report:
   
   * a6ffe1240055d6135a517dfcada59edc95383423 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24319)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24318)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[jira] [Closed] (HUDI-4732) Leverage Schema Registry for reading proto messages from kafka

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-4732.
---
Resolution: Fixed

> Leverage Schema Registry for reading proto messages from kafka
> --
>
> Key: HUDI-4732
> URL: https://issues.apache.org/jira/browse/HUDI-4732
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> If you use the Confluent Schema Registry, they provide a way to deserialize 
> the kafka message value without providing the protobuf class name. The first 
> cut of ProtoKafkaSource requires users to specify a classname but we want to 
> allow users the flexibility to use this other method of deserializing the 
> message.
>  
> Docs: 
> https://docs.confluent.io/platform/current/schema-registry/serdes-develop/serdes-protobuf.html




[jira] [Updated] (HUDI-7739) Shutdown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7739:

Fix Version/s: 0.15.0

> Shutdown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy
> --
>
> Key: HUDI-7739
> URL: https://issues.apache.org/jira/browse/HUDI-7739
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xinyu Zou
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>





[jira] [Updated] (HUDI-7699) Support STS external ids and configurable session names in the AWS StsAssumeRoleCredentialsProvider

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7699:

Fix Version/s: 0.15.0

> Support STS external ids and configurable session names in the AWS 
> StsAssumeRoleCredentialsProvider
> ---
>
> Key: HUDI-7699
> URL: https://issues.apache.org/jira/browse/HUDI-7699
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ian Streeter
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> [HUDI-6695|https://issues.apache.org/jira/browse/HUDI-6695] added an AWS 
> credentials provider to support assuming a role when syncing to Glue.
> 
> We use Hudi in a multi-tenant environment, and our customers give us 
> delegated access to their Glue catalog.  In this multi-tenant setup it is 
> important to use [an external 
> ID|https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html]
>  to improve security when assuming IAM roles.
> 
> Furthermore, the STS session name is currently hard-coded to "hoodie".  
> It is helpful for us to have configurable session names so we have better 
> traceability of which entities are creating STS sessions in the cloud.
> 
> Currently, the assumed role is configured with the 
> {{hoodie.aws.role.arn}} config property.  I would like to add the following 
> extra optional config properties, which will be used by the 
> {{HoodieConfigAWSAssumedRoleCredentialsProvider}}:
> 
> - {{hoodie.aws.role.external.id}}
> - {{hoodie.aws.role.session.name}}
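
The proposal above can be sketched roughly as follows. This is a minimal sketch, assuming the settings arrive as java.util.Properties; the class AssumeRoleSettings and its fields are hypothetical and are not the HoodieConfigAWSAssumedRoleCredentialsProvider implementation. The three property keys and the "hoodie" default session name are taken from the description; the role ARN and external ID in main are made-up sample values.

```java
import java.util.Optional;
import java.util.Properties;

/** Hypothetical holder for the assume-role settings; for illustration only. */
final class AssumeRoleSettings {
    // Property keys named in the issue description.
    static final String ROLE_ARN = "hoodie.aws.role.arn";
    static final String EXTERNAL_ID = "hoodie.aws.role.external.id";
    static final String SESSION_NAME = "hoodie.aws.role.session.name";
    // The description says the session name is currently hard-coded to this value.
    static final String DEFAULT_SESSION_NAME = "hoodie";

    final String roleArn;
    final Optional<String> externalId;
    final String sessionName;

    AssumeRoleSettings(Properties props) {
        roleArn = props.getProperty(ROLE_ARN);
        if (roleArn == null) {
            throw new IllegalArgumentException(ROLE_ARN + " is required");
        }
        // Both new properties are optional; absence keeps today's behavior.
        externalId = Optional.ofNullable(props.getProperty(EXTERNAL_ID));
        sessionName = props.getProperty(SESSION_NAME, DEFAULT_SESSION_NAME);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(ROLE_ARN, "arn:aws:iam::123456789012:role/example-role");
        props.setProperty(EXTERNAL_ID, "example-external-id");
        AssumeRoleSettings settings = new AssumeRoleSettings(props);
        System.out.println(settings.sessionName);
        System.out.println(settings.externalId.isPresent());
    }
}
```

Keeping the external ID as an Optional and defaulting the session name means existing deployments that only set the role ARN would behave as before.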




[jira] [Updated] (HUDI-7738) FileStreamReader need set Charset with UTF-8

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7738:

Fix Version/s: 0.15.0

> FileStreamReader need set Charset with UTF-8
> 
>
> Key: HUDI-7738
> URL: https://issues.apache.org/jira/browse/HUDI-7738
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> FileStreamReader needs to set its Charset to UTF-8
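
The issue gives no code, so the following is only a sketch of the general technique: pass an explicit charset when wrapping a byte stream in a reader, instead of relying on the JVM default, which is not guaranteed to be UTF-8 on older Java versions. The class Utf8ReadSketch and the method readFirstLine are hypothetical names and do not reflect the actual change in the Hudi CLI.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical example; not the Hudi CLI code. */
final class Utf8ReadSketch {
    /** Reads the first line of a file, decoding its bytes explicitly as UTF-8. */
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("utf8-sketch", ".txt");
        try {
            // Unicode escapes keep this source file ASCII-only.
            Files.write(tmp, "h\u00e9llo w\u00f6rld\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(readFirstLine(tmp));
        } finally {
            Files.delete(tmp);
        }
    }
}
```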




[jira] [Updated] (HUDI-7737) Bump Spark 3.4 version to Spark 3.4.3

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7737:

Fix Version/s: 0.15.0

> Bump Spark 3.4 version to Spark 3.4.3
> -
>
> Key: HUDI-7737
> URL: https://issues.apache.org/jira/browse/HUDI-7737
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Geser Dugarov
>Assignee: Geser Dugarov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Spark 3.4.3 has been released: https://github.com/apache/spark/tree/v3.4.3




[jira] [Updated] (HUDI-7715) Partition TTL for Flink

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7715:

Fix Version/s: 1.0.0

> Partition TTL for Flink
> ---
>
> Key: HUDI-7715
> URL: https://issues.apache.org/jira/browse/HUDI-7715
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xi chaomin
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>





[jira] [Updated] (HUDI-7720) Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7720:

Fix Version/s: 0.15.0
   1.0.0

> Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups
> -
>
> Key: HUDI-7720
> URL: https://issues.apache.org/jira/browse/HUDI-7720
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: 1280X1280.PNG
>
>
> Job aborted due to stage failure: Task 3 in stage 35.0 failed 4 times, most 
> recent failure: Lost task 3.3 in stage 35.0 (TID 32175) (10-222-33-34.lan 
> executor 204): java.lang.NullPointerException
>     at java.util.ArrayList.<init>(ArrayList.java:178)
>     at org.apache.hudi.common.table.view.HoodieTableFileSystemView.fetchAllStoredFileGroups(HoodieTableFileSystemView.java:308)
>     at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroupsIncludingReplaced(AbstractTableFileSystemView.java:976)
>     at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getReplacedFileGroupsBefore(AbstractTableFileSystemView.java:989)
>     at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:104)
>     at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getReplacedFileGroupsBefore(PriorityBasedFileSystemView.java:232)
>     at org.apache.hudi.table.action.clean.CleanPlanner.getReplacedFilesEligibleToClean(CleanPlanner.java:441)
>     at org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:330)
>     at org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:295)
>     at org.apache.hudi.table.action.clean.CleanPlanner.getDeletePaths(CleanPlanner.java:493)
>     at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.lambda$requestClean$af5da5d2$1(CleanPlanActionExecutor.java:122)
>     at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
>     at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
>     at scala.collection.Iterator.foreach(Iterator.scala:943)
>     at scala.collection.Iterator.foreach$(Iterator.scala:943)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>     at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>     at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
>     at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
>     at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
>     at scala.collection.AbstractIterator.to(Iterator.scala:1431)
>     at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
>     at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
>     at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
>     at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
>     at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
>     at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
>     at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
>     at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1480)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)




[jira] [Updated] (HUDI-7721) Fix broken build on master

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7721:

Fix Version/s: 0.15.0

> Fix broken build on master
> --
>
> Key: HUDI-7721
> URL: https://issues.apache.org/jira/browse/HUDI-7721
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> TestHoodieDeltaStreamer is invalid due to 
> [https://github.com/apache/hudi/pull/11099].




[jira] [Closed] (HUDI-7720) Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7720.
---
Resolution: Fixed

> Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups
> -
>
> Key: HUDI-7720
> URL: https://issues.apache.org/jira/browse/HUDI-7720
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: 1280X1280.PNG
>
>
> Job aborted due to stage failure: Task 3 in stage 35.0 failed 4 times, most 
> recent failure: Lost task 3.3 in stage 35.0 (TID 32175) (10-222-33-34.lan 
> executor 204): java.lang.NullPointerException
> at java.util.ArrayList.<init>(ArrayList.java:178)
> at org.apache.hudi.common.table.view.HoodieTableFileSystemView.fetchAllStoredFileGroups(HoodieTableFileSystemView.java:308)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroupsIncludingReplaced(AbstractTableFileSystemView.java:976)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getReplacedFileGroupsBefore(AbstractTableFileSystemView.java:989)
> at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:104)
> at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getReplacedFileGroupsBefore(PriorityBasedFileSystemView.java:232)
> at org.apache.hudi.table.action.clean.CleanPlanner.getReplacedFilesEligibleToClean(CleanPlanner.java:441)
> at org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:330)
> at org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:295)
> at org.apache.hudi.table.action.clean.CleanPlanner.getDeletePaths(CleanPlanner.java:493)
> at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.lambda$requestClean$af5da5d2$1(CleanPlanActionExecutor.java:122)
> at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
> at scala.collection.Iterator.foreach(Iterator.scala:943)
> at scala.collection.Iterator.foreach$(Iterator.scala:943)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
> at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
> at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
> at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
> at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
> at scala.collection.AbstractIterator.to(Iterator.scala:1431)
> at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
> at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
> at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
> at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
> at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
> at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:131)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1480)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
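
The top frames of the trace above point at the ArrayList copy constructor, which throws a NullPointerException when it is handed a null collection. The following is only a minimal sketch of that failure mode and one possible guard, not the change that was merged for this issue; the class `FileGroupLookup`, its map, and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a view that stores file groups per partition.
class FileGroupLookup {
  private final Map<String, List<String>> partitionToFileGroups = new HashMap<>();

  void put(String partition, List<String> fileGroups) {
    partitionToFileGroups.put(partition, fileGroups);
  }

  // Unsafe: the map returns null for an unknown partition, and
  // new ArrayList<>(null) throws NullPointerException.
  List<String> fetchUnsafe(String partition) {
    return new ArrayList<>(partitionToFileGroups.get(partition));
  }

  // Guarded: an unknown partition yields an empty list instead of an NPE.
  List<String> fetchGuarded(String partition) {
    List<String> stored = partitionToFileGroups.get(partition);
    return stored == null ? Collections.<String>emptyList() : new ArrayList<>(stored);
  }
}
```

Whether an empty result or an explicit error is the right behaviour for a missing partition depends on the caller; the sketch only shows where the null check would sit.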




[jira] [Updated] (HUDI-7641) Add metrics to track what partitions are enabled in MDT

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7641:

Fix Version/s: 1.0.0

> Add metrics to track what partitions are enabled in MDT
> ---
>
> Key: HUDI-7641
> URL: https://issues.apache.org/jira/browse/HUDI-7641
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>





[jira] [Closed] (HUDI-7467) TestHoodieDeltaStreamer.testAutoGenerateRecordKeys

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7467.
---
Resolution: Fixed

> TestHoodieDeltaStreamer.testAutoGenerateRecordKeys
> ---
>
> Key: HUDI-7467
> URL: https://issues.apache.org/jira/browse/HUDI-7467
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: tests-ci
>Reporter: Lin Liu
>Assignee: tao pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> This test is flaky and sometimes it fails in Azure CI.  We need to reproduce 
> it locally and check why it is flaky (if there is any bug causing it, or it's 
> due to test setup).
> [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=22725=logs=dcedfe73-9485-5cc5-817a-73b61fc5dcb0=9df7def4-004b-5fb7-f042-da5d723783ad=859b8d9a-8fd6-5a5c-6f5e-f84f1990894e]
> {code:java}
> [ERROR] Tests run: 131, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 2,459.289 s <<< FAILURE! - in org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer
> [ERROR] testAutoGenerateRecordKeys  Time elapsed: 14.248 s  <<< FAILURE!
> org.opentest4j.AssertionFailedError: expected: <300> but was: <500>
>   at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
>   at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
>   at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
>   at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
>   at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:611)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerTestBase.assertRecordCount(HoodieDeltaStreamerTestBase.java:486)
>   at org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer.testAutoGenerateRecordKeys(TestHoodieDeltaStreamer.java:2823)
>  {code}




[jira] [Updated] (HUDI-7467) TestHoodieDeltaStreamer.testAutoGenerateRecordKeys

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7467:

Fix Version/s: 0.15.0
   1.0.0

> TestHoodieDeltaStreamer.testAutoGenerateRecordKeys
> ---
>
> Key: HUDI-7467
> URL: https://issues.apache.org/jira/browse/HUDI-7467
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: tests-ci
>Reporter: Lin Liu
>Assignee: tao pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> This test is flaky and sometimes it fails in Azure CI.  We need to reproduce 
> it locally and check why it is flaky (if there is any bug causing it, or it's 
> due to test setup).
> [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=22725=logs=dcedfe73-9485-5cc5-817a-73b61fc5dcb0=9df7def4-004b-5fb7-f042-da5d723783ad=859b8d9a-8fd6-5a5c-6f5e-f84f1990894e]
> {code:java}
> [ERROR] Tests run: 131, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 2,459.289 s <<< FAILURE! - in org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer
> [ERROR] testAutoGenerateRecordKeys  Time elapsed: 14.248 s  <<< FAILURE!
> org.opentest4j.AssertionFailedError: expected: <300> but was: <500>
>   at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
>   at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
>   at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
>   at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
>   at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:611)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerTestBase.assertRecordCount(HoodieDeltaStreamerTestBase.java:486)
>   at org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer.testAutoGenerateRecordKeys(TestHoodieDeltaStreamer.java:2823)
>  {code}




[jira] [Closed] (HUDI-7641) Add metrics to track what partitions are enabled in MDT

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7641.
---
Resolution: Fixed

> Add metrics to track what partitions are enabled in MDT
> ---
>
> Key: HUDI-7641
> URL: https://issues.apache.org/jira/browse/HUDI-7641
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>





[jira] [Updated] (HUDI-7710) BugFix: Remove compaction.inflight from conflict resolution

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7710:

Fix Version/s: 0.15.0
   1.0.0

> BugFix: Remove compaction.inflight from conflict resolution
> ---
>
> Key: HUDI-7710
> URL: https://issues.apache.org/jira/browse/HUDI-7710
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: compaction
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> During conflict resolution, compaction.inflight files can be found; since 
> they don't contain any plan information, this could cause an NPE.




[jira] [Closed] (HUDI-7688) Avoid always repeated inflate when encounter InterruptedIOException

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7688.
---
Resolution: Fixed

> Avoid always repeated inflate when encounter InterruptedIOException
> ---
>
> Key: HUDI-7688
> URL: https://issues.apache.org/jira/browse/HUDI-7688
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jing Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: image-2024-04-30-11-25-41-671.png, 
> image-2024-04-30-11-27-59-572.png
>
>
> !image-2024-04-30-11-25-41-671.png!
> !image-2024-04-30-11-27-59-572.png!
> We should avoid repeatedly retrying inflate when encountering an InterruptedIOException.
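
The attached screenshots are not available in this archive, so the exact code path is not shown here. As a rough sketch of the general idea only, a retry loop can rethrow InterruptedIOException immediately and keep retrying other IOExceptions; `RetryingReader`, `IoAction`, and `runWithRetry` are hypothetical names and this is not the Hudi implementation.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

class RetryingReader {
  // Hypothetical operation that may fail with an IOException.
  interface IoAction<T> {
    T run() throws IOException;
  }

  // Retries plain IOExceptions up to maxAttempts, but rethrows
  // InterruptedIOException at once: an interrupt signals that the work
  // should stop, so repeating the same operation would not help.
  static <T> T runWithRetry(IoAction<T> action, int maxAttempts) throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.run();
      } catch (InterruptedIOException e) {
        throw e;
      } catch (IOException e) {
        last = e;
      }
    }
    throw last != null ? last : new IOException("maxAttempts must be at least 1");
  }
}
```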




[jira] [Updated] (HUDI-7688) Avoid always repeated inflate when encounter InterruptedIOException

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7688:

Fix Version/s: 0.15.0
   1.0.0

> Avoid always repeated inflate when encounter InterruptedIOException
> ---
>
> Key: HUDI-7688
> URL: https://issues.apache.org/jira/browse/HUDI-7688
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jing Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: image-2024-04-30-11-25-41-671.png, 
> image-2024-04-30-11-27-59-572.png
>
>
> !image-2024-04-30-11-25-41-671.png!
> !image-2024-04-30-11-27-59-572.png!
> We should avoid repeatedly retrying inflate when encountering an InterruptedIOException.




[jira] [Updated] (HUDI-7667) Create util method to get offset range for fetching new data in KafkaSource

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7667:

Fix Version/s: 0.15.0

> Create util method to get offset range for fetching new data in KafkaSource
> ---
>
> Key: HUDI-7667
> URL: https://issues.apache.org/jira/browse/HUDI-7667
> Project: Apache Hudi
>  Issue Type: Wish
>  Components: deltastreamer
>Reporter: Vinish Reddy
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>





[jira] [Updated] (HUDI-7684) Sort the records for Flink metadata table bulk_insert

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7684:

Fix Version/s: 0.15.0

> Sort the records for Flink metadata table bulk_insert
> -
>
> Key: HUDI-7684
> URL: https://issues.apache.org/jira/browse/HUDI-7684
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> The HFile writer requires the input to be sorted; without the sort, 
> re-enabling MDT on an existing table could cause issues.
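
To illustrate why the sort matters, here is a small sketch of a writer that rejects out-of-order keys, together with a helper that sorts before writing. `OrderedWriter` is a hypothetical stand-in, not the HFile writer, and plain string ordering is assumed for the keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortedWriteSketch {
  // Hypothetical stand-in for a writer that requires ascending keys.
  static class OrderedWriter {
    private String lastKey = null;
    final List<String> written = new ArrayList<>();

    void append(String key) {
      // Reject any key that sorts before the previously written one.
      if (lastKey != null && key.compareTo(lastKey) < 0) {
        throw new IllegalStateException("key " + key + " added after " + lastKey);
      }
      written.add(key);
      lastKey = key;
    }
  }

  // Sorts a copy of the keys, then hands them to the writer in order.
  static List<String> writeSorted(List<String> keys) {
    List<String> sorted = new ArrayList<>(keys);
    sorted.sort(Comparator.naturalOrder());
    OrderedWriter writer = new OrderedWriter();
    for (String key : sorted) {
      writer.append(key);
    }
    return writer.written;
  }
}
```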



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (HUDI-7682) Remove the files copy in Azure CI tests report

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7682:

Fix Version/s: 0.15.0

> Remove the files copy in Azure CI tests report
> --
>
> Key: HUDI-7682
> URL: https://issues.apache.org/jira/browse/HUDI-7682
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: compile
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>





[jira] [Closed] (HUDI-7511) Offset range calculation in kafka should return all topic partitions

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7511.
---
Resolution: Fixed

> Offset range calculation in kafka should return all topic partitions 
> -
>
> Key: HUDI-7511
> URL: https://issues.apache.org/jira/browse/HUDI-7511
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> After [https://github.com/apache/hudi/pull/10869] landed, we no longer 
> return every topic partition in the final ranges. But for checkpointing 
> purposes, we need to have every Kafka topic partition in the final ranges, 
> even if we are not consuming anything from it.
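
A minimal sketch of the behaviour asked for here: every partition appears in the result, and a partition that gets no events receives an empty range (from == until). `OffsetRangeSketch`, `Range`, and `computeRanges` are hypothetical, the allocation is a simple first-come budget rather than Hudi's actual strategy, and a start offset of 0 for unseen partitions is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OffsetRangeSketch {
  // Hypothetical range type: [from, until) for one topic partition.
  static final class Range {
    final long from;
    final long until;

    Range(long from, long until) {
      this.from = from;
      this.until = until;
    }

    long count() {
      return until - from;
    }
  }

  // Allocates up to maxEvents across partitions in iteration order.
  // Partitions that receive no events still get an empty range, so a
  // checkpoint built from the result covers every partition.
  static Map<Integer, Range> computeRanges(Map<Integer, Long> fromOffsets,
                                           Map<Integer, Long> latestOffsets,
                                           long maxEvents) {
    Map<Integer, Range> ranges = new LinkedHashMap<>();
    long remaining = maxEvents;
    for (Map.Entry<Integer, Long> entry : latestOffsets.entrySet()) {
      int partition = entry.getKey();
      // Assumption: a partition without a stored offset starts at 0.
      long from = fromOffsets.getOrDefault(partition, 0L);
      long available = Math.max(0L, entry.getValue() - from);
      long take = Math.min(available, remaining);
      remaining -= take;
      ranges.put(partition, new Range(from, from + take));
    }
    return ranges;
  }
}
```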




[jira] [Updated] (HUDI-7658) Log time taken when meta sync fails in stream sync

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7658:

Fix Version/s: 0.15.0
   1.0.0

> Log time taken when meta sync fails in stream sync
> --
>
> Key: HUDI-7658
> URL: https://issues.apache.org/jira/browse/HUDI-7658
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Time taken is only printed in log statements on success, but it is useful to 
> see it on failure as well.
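
One common way to get the elapsed time on both paths is to record it in a finally block. The sketch below assumes that approach; `TimedSyncSketch` and `runTimed` are hypothetical, and the in-memory list only stands in for a real logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

class TimedSyncSketch {
  // Collected messages stand in for a real logger in this sketch.
  static final List<String> LOG = new ArrayList<>();

  // Runs the sync and records the elapsed time whether it succeeds or throws.
  static void runTimed(String name, Runnable sync) {
    long start = System.nanoTime();
    boolean success = false;
    try {
      sync.run();
      success = true;
    } finally {
      long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      LOG.add(name + (success ? " succeeded" : " failed") + " after " + elapsedMs + " ms");
    }
  }
}
```

The exception still propagates to the caller; only the timing message is added on the failure path.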




[jira] [Closed] (HUDI-7658) Log time taken when meta sync fails in stream sync

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7658.
---
Resolution: Fixed

> Log time taken when meta sync fails in stream sync
> --
>
> Key: HUDI-7658
> URL: https://issues.apache.org/jira/browse/HUDI-7658
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Time taken is only printed in log statements on success, but it is useful to 
> see it on failure as well.




[jira] [Updated] (HUDI-7511) Offset range calculation in kafka should return all topic partitions

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7511:

Fix Version/s: 0.15.0
   1.0.0

> Offset range calculation in kafka should return all topic partitions 
> -
>
> Key: HUDI-7511
> URL: https://issues.apache.org/jira/browse/HUDI-7511
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> After [https://github.com/apache/hudi/pull/10869] landed, we no longer 
> return every topic partition in the final ranges. But for checkpointing 
> purposes, we need to have every Kafka topic partition in the final ranges, 
> even if we are not consuming anything from it.




[jira] [Updated] (HUDI-7672) Fix the Hive server scratch dir for tests in hudi-utilities

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7672:

Fix Version/s: 0.15.0

> Fix the Hive server scratch dir for tests in hudi-utilities
> ---
>
> Key: HUDI-7672
> URL: https://issues.apache.org/jira/browse/HUDI-7672
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Currently a null/hive/${user} dir is left over when the tests finish, 
> which also introduces some permission issues for Azure CI test reports.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (HUDI-7648) Refactor MetadataPartitionType so as to enhance reuse across metadata writer

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7648:

Fix Version/s: (was: 0.15.0)

> Refactor MetadataPartitionType so as to enhance reuse across metadata writer
> -
>
> Key: HUDI-7648
> URL: https://issues.apache.org/jira/browse/HUDI-7648
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10352#discussion_r1569972641




[jira] [Updated] (HUDI-7645) Optimize BQ sync tool for MDT

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7645:

Fix Version/s: 0.15.0
   1.0.0

> Optimize BQ sync tool for MDT
> -
>
> Key: HUDI-7645
> URL: https://issues.apache.org/jira/browse/HUDI-7645
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: meta-sync
>Reporter: sivabalan narayanan
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> It looks like BQ sync polls fsview for the latest files sequentially, one 
> partition at a time. 
>  
> When MDT is enabled, we could load all partitions in one call. 




[jira] [Updated] (HUDI-7648) Refactor MetadataPartitionType so as to enhance reuse across metadata writer

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7648:

Fix Version/s: 0.15.0

> Refactor MetadataPartitionType so as to enhance reuse across metadata writer
> -
>
> Key: HUDI-7648
> URL: https://issues.apache.org/jira/browse/HUDI-7648
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> https://github.com/apache/hudi/pull/10352#discussion_r1569972641




[jira] [Updated] (HUDI-7632) Remove FileSystem usage in HoodieLogFormatWriter

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7632:

Fix Version/s: 0.15.0

> Remove FileSystem usage in HoodieLogFormatWriter
> 
>
> Key: HUDI-7632
> URL: https://issues.apache.org/jira/browse/HUDI-7632
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> https://github.com/apache/hudi/pull/10591#discussion_r1569173014




[jira] [Updated] (HUDI-6386) Fix testArchivalWithMultiWriters when metadata enabled

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6386:

Fix Version/s: 0.15.0
   1.0.0
   (was: 1.1.0)

> Fix testArchivalWithMultiWriters when metadata enabled
> --
>
> Key: HUDI-6386
> URL: https://issues.apache.org/jira/browse/HUDI-6386
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> From base RLI patch, we found this test failing.




[jira] [Closed] (HUDI-7235) Checkpoint in S3/GCS when there are no commits left to consume in S3 metadata table

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7235.
---
Resolution: Fixed

> Checkpoint in S3/GCS when there are no commits left to consume in S3 metadata 
> table
> ---
>
> Key: HUDI-7235
> URL: https://issues.apache.org/jira/browse/HUDI-7235
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Vinish Reddy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>





[jira] [Updated] (HUDI-7235) Checkpoint in S3/GCS when there are no commits left to consume in S3 metadata table

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7235:

Fix Version/s: 0.15.0
   1.0.0

> Checkpoint in S3/GCS when there are no commits left to consume in S3 metadata 
> table
> ---
>
> Key: HUDI-7235
> URL: https://issues.apache.org/jira/browse/HUDI-7235
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Vinish Reddy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>





[jira] [Closed] (HUDI-6386) Fix testArchivalWithMultiWriters when metadata enabled

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-6386.
---
Resolution: Fixed

> Fix testArchivalWithMultiWriters when metadata enabled
> --
>
> Key: HUDI-6386
> URL: https://issues.apache.org/jira/browse/HUDI-6386
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> From base RLI patch, we found this test failing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (HUDI-7655) Support configuration for clean to fail execution if there is at least one file is marked as a failed delete

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7655:

Fix Version/s: 0.15.0

> Support configuration for clean to fail execution if there is at least one 
> file is marked as a failed delete
> 
>
> Key: HUDI-7655
> URL: https://issues.apache.org/jira/browse/HUDI-7655
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Krishen Bhan
>Assignee: sivabalan narayanan
>Priority: Minor
>  Labels: clean, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> When a HUDI clean plan is executed, any targeted file that was not confirmed 
> as deleted (or non-existing) will be marked as a "failed delete". Although 
> these failed deletes will be added to `.clean` metadata, if incremental clean 
> is used then these files might never be picked up again by a future clean 
> plan, unless a "full-scan" clean ends up being scheduled. In addition to 
> leaving more files unnecessarily taking up storage space for longer, this 
> can lead to the following dataset consistency issue for COW datasets:
>  # Insert at C1 creates file group f1 in partition
>  # Replacecommit at RC2 creates file group f2 in partition, and replaces f1
>  # Any reader of partition that calls HUDI API (with or without using MDT) 
> will recognize that f1 should be ignored, as it has been replaced. This is 
> because the RC2 instant file is in the active timeline
>  # Some completed instants later, an incremental clean is scheduled. It moves 
> the "earliest commit to retain" to a time after instant time RC2, so it 
> targets f1 for deletion. But during execution of the plan, it fails to delete 
> f1.
>  # An archive job is eventually triggered, and archives C1 and RC2. Note that 
> f1 is still in partition
> At this point, any job/query that reads the aforementioned partition directly 
> through DFS file system calls (without directly using MDT FILES partition) 
> will consider both f1 and f2 as valid file groups, since RC2 is no longer in 
> the active timeline. This is a data consistency issue, and will only be 
> resolved if a "full-scan" clean is triggered and deletes f1.
> This specific scenario can be avoided if the user can configure HUDI clean to 
> fail execution of a clean plan unless all files are confirmed as deleted (or 
> not existing in DFS already), "blocking" the clean. The next clean attempt 
> will re-execute this existing plan, since clean plans cannot be "rolled 
> back". 
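
The proposed "blocking" behaviour can be sketched as follows: collect the paths whose deletion could not be confirmed and, when a flag is set, abort instead of completing the clean. `BlockingCleanSketch`, `executePlan`, `clean`, and the `failOnFailedDelete` flag are all hypothetical names for illustration; the real Hudi config key is not given in this message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class BlockingCleanSketch {
  // Applies the delete function to each path (true = confirmed deleted or
  // already absent) and returns the paths that could not be confirmed.
  static List<String> executePlan(List<String> pathsToDelete, Predicate<String> delete) {
    List<String> failed = new ArrayList<>();
    for (String path : pathsToDelete) {
      if (!delete.test(path)) {
        failed.add(path);
      }
    }
    return failed;
  }

  // With failOnFailedDelete set (a hypothetical flag), any failed delete
  // aborts the clean so that the same plan is retried by the next attempt.
  static void clean(List<String> pathsToDelete, Predicate<String> delete,
                    boolean failOnFailedDelete) {
    List<String> failed = executePlan(pathsToDelete, delete);
    if (failOnFailedDelete && !failed.isEmpty()) {
      throw new IllegalStateException("Clean failed to delete: " + failed);
    }
  }
}
```

In the scenario above, f1 would stay targeted by the unfinished plan rather than being left behind once C1 and RC2 are archived.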




[jira] [Closed] (HUDI-7655) Support configuration for clean to fail execution if there is at least one file is marked as a failed delete

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7655.
---
Resolution: Fixed

> Support configuration for clean to fail execution if there is at least one 
> file is marked as a failed delete
> 
>
> Key: HUDI-7655
> URL: https://issues.apache.org/jira/browse/HUDI-7655
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Krishen Bhan
>Assignee: sivabalan narayanan
>Priority: Minor
>  Labels: clean, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> When a HUDI clean plan is executed, any targeted file that was not confirmed 
> as deleted (or non-existing) will be marked as a "failed delete". Although 
> these failed deletes will be added to `.clean` metadata, if incremental clean 
> is used then these files might not ever be picked up again as a future clean 
> plan, unless a "full-scan" clean ends up being scheduled. In addition to 
> leading to more files unnecessarily taking up storage space for longer, this 
> can lead to the following dataset consistency issue for COW datasets:
>  # Insert at C1 creates file group f1 in partition
>  # Replacecommit at RC2 creates file group f2 in partition, and replaces f1
>  # Any reader of partition that calls HUDI API (with or without using MDT) 
> will recognize that f1 should be ignored, as it has been replaced. This is 
> because the RC2 instant file is in the active timeline
>  # Some completed instants later an incremental clean is scheduled. It moves 
> the "earliest commit to retain" to a time after instant time RC2, so it 
> targets f1 for deletion. But during execution of the plan, it fails to delete 
> f1.
>  # An archive job eventually is triggered, and archives C1 and RC2. Note that 
> f1 is still in partition
> At this point, any job/query that reads the aforementioned partition directly 
> via DFS file system calls (without directly using MDT FILES partition) 
> will consider both f1 and f2 as valid file groups, since RC2 is no longer in 
> the active timeline. This is a data consistency issue, and will only be 
> resolved if a "full-scan" clean is triggered and deletes f1.
> This specific scenario can be avoided if the user can configure HUDI clean to 
> fail execution of a clean plan unless all files are confirmed as deleted (or 
> not existing in DFS already), "blocking" the clean. The next clean attempt 
> will re-execute this existing plan, since clean plans cannot be "rolled 
> back". 
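
A minimal Java sketch of the "blocking" behavior proposed above. The class name, the `failOnFailedDelete` flag, and the exception type are hypothetical names chosen for illustration; they are not Hudi's actual configuration key or API.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class BlockingCleanSketch {

  /** Hypothetical exception raised when a clean is blocked by failed deletes. */
  public static final class CleanBlockedException extends RuntimeException {
    public CleanBlockedException(String message) {
      super(message);
    }
  }

  /**
   * @param deleteResults      file path -> true if the file was confirmed deleted
   *                           (or already absent), false if the delete failed
   * @param failOnFailedDelete hypothetical flag that turns on the blocking behavior
   * @return the sorted paths recorded as failed deletes
   */
  public static List<String> finishClean(Map<String, Boolean> deleteResults,
                                         boolean failOnFailedDelete) {
    List<String> failedDeletes = deleteResults.entrySet().stream()
        .filter(entry -> !entry.getValue())
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList());
    if (failOnFailedDelete && !failedDeletes.isEmpty()) {
      // Fail the execution so the same plan is retried on the next clean attempt.
      throw new CleanBlockedException("Clean blocked, failed deletes: " + failedDeletes);
    }
    return failedDeletes;
  }
}
```

With the flag off, the failed deletes are only reported, which matches the behavior the issue describes as risky under incremental clean.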




[jira] [Updated] (HUDI-7647) READ_UTC_TIMEZONE doesn't affect log files for MOR tables

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7647:

Fix Version/s: 0.15.0

> READ_UTC_TIMEZONE doesn't affect log files for MOR tables
> -
>
> Key: HUDI-7647
> URL: https://issues.apache.org/jira/browse/HUDI-7647
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Mark Bukhner
>Assignee: Danny Chen
>Priority: Major
>  Labels: flink, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Write COPY_ON_WRITE table:
> {code:java}
> tableEnv.executeSql("CREATE TABLE test_2(\n"
> + "  uuid VARCHAR(40),\n"
> + "  name VARCHAR(10),\n"
> + "  age INT,\n"
> + "  ts TIMESTAMP(3),\n"
> + "  `partition` VARCHAR(20)\n"
> + ")\n"
> + "PARTITIONED BY (`partition`)\n"
> + "WITH (\n"
> + "  'connector' = 'hudi',\n"
> + "  'path' = '...',\n"
> + "  'table.type' = 'COPY_ON_WRITE',\n"
> + "  'write.utc-timezone' = 'true',\n"
> + "  'index.type' = 'INMEMORY'\n"
> + ");").await(); 
> tableEnv.executeSql("insert into test_2 \n" 
> + "values ('ab', 'cccx', 12, TIMESTAMP '1972-01-01 00:00:01', 'xx'),\n"
> + " ('ab', 'cccx', 12, TIMESTAMP '1970-01-01 00:00:01', 'xx');").await();
> {code}
> Then reading the COW table with READ_UTC_TIMEZONE will receive:
> {code:java}
> +I[ab, cccx, 12, 1972-01-01T00:00:01, xx] // if READ_UTC_TIMEZONE = 'true' 
> +I[ab, cccx, 12, 1972-01-01T07:00:01, xx] // if READ_UTC_TIMEZONE = 'false' 
> {code}
> But if the table is created and written with 'table.type' = 'MERGE_ON_READ', 
> the read will receive:
> {code:java}
> +I[ab, cccx, 12, 1972-01-01T00:00:01, xx] // if READ_UTC_TIMEZONE = 'true'
> +I[ab, cccx, 12, 1972-01-01T00:00:01, xx] // if READ_UTC_TIMEZONE = 'false'
> {code}
> There is no difference between READ_UTC_TIMEZONE set to true or false when 
> reading log files (MOR table), but there is a 7h difference when reading the 
> COW table.
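
The expected effect of the option can be sketched with plain `java.time`. This is only an illustration, not Hudi or Flink code: the class and method names are hypothetical, and the UTC+7 local offset is an assumption inferred from the 7h difference reported above.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class UtcTimezoneReadSketch {

  /**
   * Renders a stored epoch-millis timestamp either in UTC or in the local zone,
   * which is what a read-side UTC switch is expected to control.
   */
  public static LocalDateTime render(long epochMillis, boolean utcTimezone, ZoneId localZone) {
    ZoneId zone = utcTimezone ? ZoneOffset.UTC : localZone;
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
  }

  public static void main(String[] args) {
    // Value written with the UTC write option on: 1972-01-01 00:00:01 UTC.
    long stored = LocalDateTime.of(1972, 1, 1, 0, 0, 1).toInstant(ZoneOffset.UTC).toEpochMilli();
    ZoneId assumedLocalZone = ZoneOffset.ofHours(7); // assumption: reader runs at UTC+7
    System.out.println(render(stored, true, assumedLocalZone));
    System.out.println(render(stored, false, assumedLocalZone));
  }
}
```

A reader that honors the flag should show the same two-way split for base files and log files alike.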




[jira] [Updated] (HUDI-7628) Rename FSUtils.getPartitionPath to constructAbsolutePath

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7628:

Fix Version/s: 0.15.0

> Rename FSUtils.getPartitionPath to constructAbsolutePath
> 
>
> Key: HUDI-7628
> URL: https://issues.apache.org/jira/browse/HUDI-7628
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> [https://github.com/apache/hudi/pull/10591#discussion_r1483632718]
> Rename FSUtils.getPartitionPath to constructAbsolutePath and partitionPath 
> argument to relativePartitionPath so that the naming reflects the 
> functionality.  This has to be merged after HUDI-6497 and the above PR to 
> reduce merge conflicts.




[jira] [Updated] (HUDI-7631) Clean up usage of `CachingPath` outside hudi-common module

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7631:

Fix Version/s: 0.15.0

> Clean up usage of `CachingPath` outside hudi-common module
> --
>
> Key: HUDI-7631
> URL: https://issues.apache.org/jira/browse/HUDI-7631
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> https://github.com/apache/hudi/pull/10591#discussion_r1484923458




[jira] [Closed] (HUDI-7618) Add ability to ignore checkpoints in delta streamer

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7618.
---
Resolution: Fixed

> Add ability to ignore checkpoints in delta streamer
> ---
>
> Key: HUDI-7618
> URL: https://issues.apache.org/jira/browse/HUDI-7618
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sampan s nayak
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Adding a new parameter to ignore checkpoints.
> Use case: when we want to switch the source topic, path, etc., it will often 
> be hard to compute the exact checkpoint we want to start ingesting from 
> the updated source. With this change, we will just have to pass the ignore 
> checkpoint and then use some other source-specific property (e.g., Kafka 
> starting offsets) to decide how to start ingesting newer data.
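
A rough Java sketch of the decision described above. The class, method, and parameter names are hypothetical and do not reflect the actual streamer option or its implementation.

```java
import java.util.Optional;

final class IgnoreCheckpointSketch {

  /**
   * @param lastCheckpoint        checkpoint stored with the last commit, if any
   * @param ignoreCheckpoint      hypothetical flag: true means do not resume from lastCheckpoint
   * @param sourceConfiguredStart start position taken from a source-specific
   *                              property (for example, Kafka starting offsets)
   * @return the position ingestion should start from
   */
  public static String resolveStartPosition(Optional<String> lastCheckpoint,
                                            boolean ignoreCheckpoint,
                                            String sourceConfiguredStart) {
    if (!ignoreCheckpoint && lastCheckpoint.isPresent()) {
      return lastCheckpoint.get();
    }
    // Either the checkpoint is ignored or none exists yet: defer to the source.
    return sourceConfiguredStart;
  }
}
```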




[jira] [Updated] (HUDI-7623) Refactoring of RemoteHoodieTableFileSystemView and RequestHandler

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7623:

Fix Version/s: 0.15.0

> Refactoring of RemoteHoodieTableFileSystemView and RequestHandler
> -
>
> Key: HUDI-7623
> URL: https://issues.apache.org/jira/browse/HUDI-7623
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: code-quality
>Reporter: Vova Kolmakov
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Found a lot of code duplicates and inconsistent naming in 
> RemoteHoodieTableFileSystemView and RequestHandler classes.
>  * remove code duplicates;
>  * fix logging;
>  * fix naming of constants (request urls);
>  * move some methods to right places




[jira] [Updated] (HUDI-7618) Add ability to ignore checkpoints in delta streamer

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7618:

Fix Version/s: 0.15.0
   1.0.0

> Add ability to ignore checkpoints in delta streamer
> ---
>
> Key: HUDI-7618
> URL: https://issues.apache.org/jira/browse/HUDI-7618
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sampan s nayak
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Adding a new parameter to ignore checkpoints.
> Use case: when we want to switch the source topic, path, etc., it will often 
> be hard to compute the exact checkpoint we want to start ingesting from 
> the updated source. With this change, we will just have to pass the ignore 
> checkpoint and then use some other source-specific property (e.g., Kafka 
> starting offsets) to decide how to start ingesting newer data.




[jira] [Updated] (HUDI-4228) Clean up literal usage in Hudi CLI argument check

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-4228:

Fix Version/s: 0.15.0

> Clean up literal usage in Hudi CLI argument check
> -
>
> Key: HUDI-4228
> URL: https://issues.apache.org/jira/browse/HUDI-4228
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> In "org.apache.hudi.cli.commands.SparkMain", the logic for checking number of 
> arguments for different Hudi CLI commands is hardcoded with literals like 
> this:
> {code:java}
> case COMPACT_RUN:
>   assert (args.length >= 10);
>   propsFilePath = null;
>   if (!StringUtils.isNullOrEmpty(args[9])) {
>     propsFilePath = args[9];
>   }
>   configs = new ArrayList<>();
>   if (args.length > 10) {
>     configs.addAll(Arrays.asList(args).subList(9, args.length));
>   }
>   returnCode = compact(jsc, args[3], args[4], args[5], Integer.parseInt(args[6]), args[7],
>       Integer.parseInt(args[8]), HoodieCompactor.EXECUTE, propsFilePath, configs);
>   break;
> {code}
> We should have a better way of validating this.
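
One possible direction, sketched in Java under stated assumptions: the enum and its `validate` method are hypothetical and are not the structure used in SparkMain. The minimum of 10 arguments is taken from the snippet above. The sketch throws an exception instead of using `assert`, because Java assertions are skipped unless the JVM is started with `-ea`.

```java
final class ArgCountValidationSketch {

  /** Hypothetical per-command metadata replacing the hardcoded literals. */
  public enum Command {
    COMPACT_RUN(10);

    private final int minArgs;

    Command(int minArgs) {
      this.minArgs = minArgs;
    }

    /** Fails fast with a descriptive message when too few arguments are passed. */
    public void validate(String[] args) {
      if (args.length < minArgs) {
        throw new IllegalArgumentException(
            name() + " expects at least " + minArgs + " arguments but got " + args.length);
      }
    }
  }
}
```

Keeping the expected count next to the command name means the check and the command definition cannot drift apart silently.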




[jira] [Closed] (HUDI-7606) Ensure that rdds persisted by table services are released in SparkRDDWriteClient

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7606.
---
Resolution: Fixed

> Ensure that rdds persisted by table services are released in 
> SparkRDDWriteClient
> 
>
> Key: HUDI-7606
> URL: https://issues.apache.org/jira/browse/HUDI-7606
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Rajesh Mahindra
>Assignee: Rajesh Mahindra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Ensure that RDDs persisted by table services are released in 
> SparkRDDWriteClient, since the RDDs are currently released prior to the table 
> services.




[jira] [Updated] (HUDI-7619) Remove code duplicates in HoodieTableMetadataUtil

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7619:

Fix Version/s: 0.15.0

> Remove code duplicates in HoodieTableMetadataUtil
> -
>
> Key: HUDI-7619
> URL: https://issues.apache.org/jira/browse/HUDI-7619
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: code-quality
>Reporter: Vova Kolmakov
>Assignee: Vova Kolmakov
>Priority: Minor
>  Labels: pull-request-available, refactoring
> Fix For: 0.15.0, 1.0.0
>
>
> Remove code duplication in HoodieTableMetadataUtil by extracting 
> {{ClosableIterator}} creation into a separate method.




[jira] [Updated] (HUDI-6762) Remove usages of MetadataRecordsGenerationParams

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6762:

Fix Version/s: 0.15.0

> Remove usages of MetadataRecordsGenerationParams
> 
>
> Key: HUDI-6762
> URL: https://issues.apache.org/jira/browse/HUDI-6762
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Vova Kolmakov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> MetadataRecordsGenerationParams is deprecated. We already rely on table 
> config for enabled mdt partition types. See if we can remove this POJO.




[jira] [Closed] (HUDI-7605) Unable to set merger strategy with DataSourceWriteOptions.RECORD_MERGER_STRATEGY

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7605.
---
Resolution: Fixed

> Unable to set merger strategy with 
> DataSourceWriteOptions.RECORD_MERGER_STRATEGY
> 
>
> Key: HUDI-7605
> URL: https://issues.apache.org/jira/browse/HUDI-7605
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark, spark-sql
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> DataSourceWriteOptions.RECORD_MERGER_STRATEGY.key() should change the 
> strategy set in the tableconfigs




[jira] [Updated] (HUDI-7290) filterPendingReplaceTimeline used incorrectly in various places

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7290:

Fix Version/s: 0.15.0
   1.0.0

> filterPendingReplaceTimeline used incorrectly in various places
> ---
>
> Key: HUDI-7290
> URL: https://issues.apache.org/jira/browse/HUDI-7290
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: clustering, table-service
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> filterPendingReplaceTimeline is used assuming that replace commits are 
> clustering. There are several other actions that are uncommon that also use 
> replace commits. Fix usage to not make that assumption. Possibly remove the 
> filterPendingReplaceTimeline method so that this issue doesn't happen again.
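
The distinction can be sketched in Java as follows. All types here are hypothetical stand-ins, not Hudi's timeline classes, and the operation names are illustrative examples of actions that produce replace commits.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PendingClusteringFilterSketch {

  /** Illustrative operations that can produce a replace commit. */
  public enum Operation { CLUSTER, INSERT_OVERWRITE, DELETE_PARTITION }

  /** Hypothetical stand-in for a pending replace instant and the operation behind it. */
  public static final class PendingReplace {
    final String instantTime;
    final Operation operation;

    public PendingReplace(String instantTime, Operation operation) {
      this.instantTime = instantTime;
      this.operation = operation;
    }
  }

  /** Keeps only the pending replace instants that really are clustering. */
  public static List<String> pendingClusteringInstants(List<PendingReplace> pendingReplaces) {
    return pendingReplaces.stream()
        .filter(replace -> replace.operation == Operation.CLUSTER)
        .map(replace -> replace.instantTime)
        .collect(Collectors.toList());
  }
}
```

The point of the sketch is that the operation type has to be checked explicitly; the replace-commit action alone does not identify clustering.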




[jira] [Closed] (HUDI-7391) HoodieMetadataMetrics should use Metrics instance for metrics registry

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7391.
---
Resolution: Fixed

> HoodieMetadataMetrics should use Metrics instance for metrics registry
> --
>
> Key: HUDI-7391
> URL: https://issues.apache.org/jira/browse/HUDI-7391
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata, metrics
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Currently HoodieMetadataMetrics stores metrics in memory and these metrics 
> are not pushed by the metric reporters. The metric reporters are configured 
> within Metrics instance. List of changes in the PR:
> 1. Metrics related classes have been moved from hudi-client-common to 
> hudi-common.
> 2. HoodieMetadataMetrics now uses Metrics class so that all the reporters can 
> be supported with it.
> 3. Some gaps in configs which are added in HoodieMetadataWriteUtils
> 4. Some metrics-related APIs and functionality have been moved to 
> HoodieMetricsConfig. The HoodieWriteConfig APIs now delegate to 
> HoodieMetricsConfig for the functionality.




[jira] [Closed] (HUDI-7290) filterPendingReplaceTimeline used incorrectly in various places

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7290.
---
Resolution: Fixed

> filterPendingReplaceTimeline used incorrectly in various places
> ---
>
> Key: HUDI-7290
> URL: https://issues.apache.org/jira/browse/HUDI-7290
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: clustering, table-service
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> filterPendingReplaceTimeline is used assuming that replace commits are 
> clustering. There are several other actions that are uncommon that also use 
> replace commits. Fix usage to not make that assumption. Possibly remove the 
> filterPendingReplaceTimeline method so that this issue doesn't happen again.




[jira] [Updated] (HUDI-7605) Unable to set merger strategy with DataSourceWriteOptions.RECORD_MERGER_STRATEGY

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7605:

Fix Version/s: 0.15.0

> Unable to set merger strategy with 
> DataSourceWriteOptions.RECORD_MERGER_STRATEGY
> 
>
> Key: HUDI-7605
> URL: https://issues.apache.org/jira/browse/HUDI-7605
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark, spark-sql
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> DataSourceWriteOptions.RECORD_MERGER_STRATEGY.key() should change the 
> strategy set in the tableconfigs




[jira] [Updated] (HUDI-7391) HoodieMetadataMetrics should use Metrics instance for metrics registry

2024-06-09 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7391:

Fix Version/s: 0.15.0
   1.0.0

> HoodieMetadataMetrics should use Metrics instance for metrics registry
> --
>
> Key: HUDI-7391
> URL: https://issues.apache.org/jira/browse/HUDI-7391
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata, metrics
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Currently HoodieMetadataMetrics stores metrics in memory and these metrics 
> are not pushed by the metric reporters. The metric reporters are configured 
> within Metrics instance. List of changes in the PR:
> 1. Metrics related classes have been moved from hudi-client-common to 
> hudi-common.
> 2. HoodieMetadataMetrics now uses Metrics class so that all the reporters can 
> be supported with it.
> 3. Some gaps in configs which are added in HoodieMetadataWriteUtils
> 4. Some metrics-related APIs and functionality have been moved to 
> HoodieMetricsConfig. The HoodieWriteConfig APIs now delegate to 
> HoodieMetricsConfig for the functionality.



