[jira] [Updated] (HUDI-7557) NoSuchElementException when commit corresponding to savepoint has been removed or archived

2024-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7557:
-
Labels: pull-request-available  (was: )

> NoSuchElementException when commit corresponding to savepoint has been 
> removed or archived
> --
>
> Key: HUDI-7557
> URL: https://issues.apache.org/jira/browse/HUDI-7557
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>
> This 
> [block|https://github.com/apache/hudi/blob/26c00a3adefff9217187ca0ab9a5b2a7c9e42199/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L246-L249]
>  of code is buggy when the commit that was savepointed has been removed or
> archived.
>  
> {code:java}
> if (!instantOption.isPresent()) {
>   LOG.warn("Skipping to process a commit for which savepoint was removed as the instant moved to archived timeline already");
> }
> HoodieInstant instant = instantOption.get(); {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7557] Fix incremental cleaner when commit for savepoint removed [hudi]

2024-03-31 Thread via GitHub


codope opened a new pull request, #10946:
URL: https://github.com/apache/hudi/pull/10946

   ### Change Logs
   
   This 
[block](https://github.com/apache/hudi/blob/26c00a3adefff9217187ca0ab9a5b2a7c9e42199/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L246-L249)
 of code is buggy when the commit that was savepointed has been removed or
archived. The PR handles the empty `Option`. This code path is exercised only
when incremental cleaning is enabled and there are savepoints in the timeline.
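   For illustration, here is a minimal sketch of the guard using plain `java.util.Optional` and hypothetical names (`CleanPlanner` uses Hudi's own `Option` and `HoodieInstant` types): when the savepointed commit is no longer on the active timeline, skip it instead of calling `get()` on an empty option, which is what raised the `NoSuchElementException`.

   ```java
   import java.util.Optional;

   public class SavepointCleanSketch {

       // Stand-in for resolving a savepointed commit's instant. The empty case
       // means the commit was already archived, so we skip it rather than call
       // get() on an empty Optional (which throws NoSuchElementException).
       static String resolveInstant(Optional<String> instantOption, String fallback) {
           if (!instantOption.isPresent()) {
               System.out.println("WARN: savepointed commit already archived; skipping");
               return fallback;
           }
           return instantOption.get();
       }

       public static void main(String[] args) {
           System.out.println(resolveInstant(Optional.of("20240331120000"), "skip"));
           System.out.println(resolveInstant(Optional.empty(), "skip"));
       }
   }
   ```

   The shape of the fix is the same whether the empty branch skips to the next savepoint or falls back to a default: the point is that the empty case returns before `get()` is reached.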
   
   ### Impact
   
   Bug fix for incremental cleaner.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]

2024-03-31 Thread via GitHub


hudi-bot commented on PR #10945:
URL: https://github.com/apache/hudi/pull/10945#issuecomment-2029227726

   
   ## CI report:
   
   * bf8eba5011f8ff4762e4da92aa57057873bafeab Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23063)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Assigned] (HUDI-4444) Refactor DataSourceInternalWriterHelper

2024-03-31 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-4444:
---

Assignee: (was: Vova Kolmakov)

> Refactor DataSourceInternalWriterHelper
> ---
>
> Key: HUDI-4444
> URL: https://issues.apache.org/jira/browse/HUDI-4444
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: code-quality
>Reporter: Raymond Xu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> The DataSourceInternalWriterHelper constructor writes files (through
> writeClient.startCommitWithTime and writeClient.preWrite), which is an
> anti-pattern. We should refactor this part.
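A sketch of the direction such a refactor could take, with hypothetical names rather than Hudi's actual API: keep the constructor free of side effects and move the startCommit/preWrite work into an explicit factory method, where failures surface before any object escapes.

```java
import java.util.ArrayList;
import java.util.List;

public class WriterHelperSketch {
    // Records which side-effecting calls ran, standing in for actual file writes.
    final List<String> sideEffects = new ArrayList<>();
    private final String instantTime;

    // Constructor only assigns fields; no I/O happens here.
    private WriterHelperSketch(String instantTime) {
        this.instantTime = instantTime;
    }

    // Side effects live in the factory, so a failure never leaves behind a
    // half-constructed object and the work is explicit at the call site.
    static WriterHelperSketch create(String instantTime) {
        WriterHelperSketch helper = new WriterHelperSketch(instantTime);
        helper.sideEffects.add("startCommitWithTime:" + instantTime);
        helper.sideEffects.add("preWrite:" + instantTime);
        return helper;
    }

    public static void main(String[] args) {
        System.out.println(WriterHelperSketch.create("20240331").sideEffects);
    }
}
```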





[jira] [Updated] (HUDI-7557) NoSuchElementException when commit corresponding to savepoint has been removed or archived

2024-03-31 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7557:
--
Summary: NoSuchElementException when commit corresponding to savepoint has 
been removed or archived  (was: NoSuchElementException when savepoint has been 
removed or archived)

> NoSuchElementException when commit corresponding to savepoint has been 
> removed or archived
> --
>
> Key: HUDI-7557
> URL: https://issues.apache.org/jira/browse/HUDI-7557
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Sagar Sumit
>Priority: Major
> Fix For: 0.15.0
>
>
> This 
> [block|https://github.com/apache/hudi/blob/26c00a3adefff9217187ca0ab9a5b2a7c9e42199/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L246-L249]
>  of code is buggy when the commit that was savepointed has been removed or
> archived.
>  
> {code:java}
> if (!instantOption.isPresent()) {
>   LOG.warn("Skipping to process a commit for which savepoint was removed as the instant moved to archived timeline already");
> }
> HoodieInstant instant = instantOption.get(); {code}
>  
>  





Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]

2024-03-31 Thread via GitHub


hudi-bot commented on PR #10945:
URL: https://github.com/apache/hudi/pull/10945#issuecomment-2029221444

   
   ## CI report:
   
   * 1fdb25272d5d41970393eb9bc7632a697ca879af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23062)
 
   * bf8eba5011f8ff4762e4da92aa57057873bafeab Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23063)
 
   
   





Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]

2024-03-31 Thread via GitHub


hudi-bot commented on PR #10945:
URL: https://github.com/apache/hudi/pull/10945#issuecomment-2029213925

   
   ## CI report:
   
   * 1fdb25272d5d41970393eb9bc7632a697ca879af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23062)
 
   * bf8eba5011f8ff4762e4da92aa57057873bafeab UNKNOWN
   
   





Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]

2024-03-31 Thread via GitHub


danny0405 commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545988384


##
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, boolean pretty) throws IOException {
     DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(record.getSchema());
     ByteArrayOutputStream out = new ByteArrayOutputStream();
-    JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-    writer.write(record, jsonEncoder);
-    jsonEncoder.flush();
-    return out;
+    try {
+      JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+      writer.write(record, jsonEncoder);
+      jsonEncoder.flush();
+      return out;
+    } catch (ClassCastException | NullPointerException ex) {
+      // NullPointerException will be thrown in cases where the field values are missing
+      // ClassCastException will be thrown in cases where the field values do not match the schema type
+      // Fallback to using `toString` which also returns json but without a pretty-print option
+      out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
   If the schema does not really change for that, it is okay, maybe we can add 
some use cases.






Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]

2024-03-31 Thread via GitHub


hudi-bot commented on PR #10945:
URL: https://github.com/apache/hudi/pull/10945#issuecomment-2029207026

   
   ## CI report:
   
   * 1fdb25272d5d41970393eb9bc7632a697ca879af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23062)
 
   
   





[jira] [Closed] (HUDI-6538) Refactor methods in TimelineDiffHelper class

2024-03-31 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6538.

Resolution: Fixed

Fixed via master branch: 44ab6f32bffbab8cd250bd0430d9591360f118e7

> Refactor methods in TimelineDiffHelper class
> 
>
> Key: HUDI-6538
> URL: https://issues.apache.org/jira/browse/HUDI-6538
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Surya Prasanna Yalla
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Refactor methods in TimelineDiffHelper class to address the following comment in
> [PR-9007|https://github.com/apache/hudi/pull/9007]
>  
> {code:java}
> The methods getPendingReplaceCommitTransitions and 
> getPendingLogCompactionTransitions look almost the same except the action 
> type, can we abstract a little to merge them altogether?{code}
>  





[jira] [Updated] (HUDI-6538) Refactor methods in TimelineDiffHelper class

2024-03-31 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6538:
-
Fix Version/s: 1.0.0

> Refactor methods in TimelineDiffHelper class
> 
>
> Key: HUDI-6538
> URL: https://issues.apache.org/jira/browse/HUDI-6538
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Surya Prasanna Yalla
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Refactor methods in TimelineDiffHelper class to address the following comment in
> [PR-9007|https://github.com/apache/hudi/pull/9007]
>  
> {code:java}
> The methods getPendingReplaceCommitTransitions and 
> getPendingLogCompactionTransitions look almost the same except the action 
> type, can we abstract a little to merge them altogether?{code}
>  





(hudi) branch master updated: [HUDI-6538] Refactor methods in TimelineDiffHelper class (#10938)

2024-03-31 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 44ab6f32bff [HUDI-6538] Refactor methods in TimelineDiffHelper class 
(#10938)
44ab6f32bff is described below

commit 44ab6f32bffbab8cd250bd0430d9591360f118e7
Author: wombatu-kun 
AuthorDate: Mon Apr 1 12:47:27 2024 +0700

[HUDI-6538] Refactor methods in TimelineDiffHelper class (#10938)
---
 .../common/table/timeline/TimelineDiffHelper.java  | 66 +++---
 1 file changed, 21 insertions(+), 45 deletions(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineDiffHelper.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineDiffHelper.java
index aa7e2a30754..a98b71aa571 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineDiffHelper.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineDiffHelper.java
@@ -37,8 +37,11 @@ public class TimelineDiffHelper {
 
   private static final Logger LOG = 
LoggerFactory.getLogger(TimelineDiffHelper.class);
 
+  private TimelineDiffHelper() {
+  }
+
   public static TimelineDiffResult getNewInstantsForIncrementalSync(HoodieTimeline oldTimeline,
-      HoodieTimeline newTimeline) {
+                                                                    HoodieTimeline newTimeline) {
 
 HoodieTimeline oldT = oldTimeline.filterCompletedAndCompactionInstants();
 HoodieTimeline newT = newTimeline.filterCompletedAndCompactionInstants();
@@ -57,14 +60,14 @@ public class TimelineDiffHelper {
      List<HoodieInstant> newInstants = new ArrayList<>();
 
   // Check If any pending compaction is lost. If so, do not allow 
incremental timeline sync
-      List<Pair<HoodieInstant, HoodieInstant>> compactionInstants = getPendingCompactionTransitions(oldT, newT);
+      List<Pair<HoodieInstant, HoodieInstant>> compactionInstants = getPendingActionTransitions(oldT.filterPendingCompactionTimeline(),
+          newT, HoodieTimeline.COMMIT_ACTION, HoodieTimeline.COMPACTION_ACTION);
      List<HoodieInstant> lostPendingCompactions = compactionInstants.stream()
          .filter(instantPair -> instantPair.getValue() == null).map(Pair::getKey).collect(Collectors.toList());
   if (!lostPendingCompactions.isEmpty()) {
 // If a compaction is unscheduled, fall back to complete refresh of fs 
view since some log files could have been
 // moved. Its unsafe to incrementally sync in that case.
-        LOG.warn("Some pending compactions are no longer in new timeline (unscheduled ?). They are :"
-            + lostPendingCompactions);
+        LOG.warn("Some pending compactions are no longer in new timeline (unscheduled ?). They are: {}", lostPendingCompactions);
 return TimelineDiffResult.UNSAFE_SYNC_RESULT;
   }
      List<HoodieInstant> finishedCompactionInstants = compactionInstants.stream()
@@ -74,7 +77,8 @@ public class TimelineDiffHelper {
 
   newTimeline.getInstantsAsStream().filter(instant -> 
!oldTimelineInstants.contains(instant)).forEach(newInstants::add);
 
-      List<Pair<HoodieInstant, HoodieInstant>> logCompactionInstants = getPendingLogCompactionTransitions(oldTimeline, newTimeline);
+      List<Pair<HoodieInstant, HoodieInstant>> logCompactionInstants = getPendingActionTransitions(oldTimeline.filterPendingLogCompactionTimeline(),
+          newTimeline, HoodieTimeline.DELTA_COMMIT_ACTION, HoodieTimeline.LOG_COMPACTION_ACTION);
      List<HoodieInstant> finishedOrRemovedLogCompactionInstants = logCompactionInstants.stream()
   .filter(instantPair -> !instantPair.getKey().isCompleted()
   && (instantPair.getValue() == null || 
instantPair.getValue().isCompleted()))
@@ -87,52 +91,24 @@ public class TimelineDiffHelper {
 }
   }
 
-  /**
-   * Getting pending log compaction transitions.
-   */
-  private static List<Pair<HoodieInstant, HoodieInstant>> getPendingLogCompactionTransitions(HoodieTimeline oldTimeline,
-                                                                                             HoodieTimeline newTimeline) {
-    Set<HoodieInstant> newTimelineInstants = newTimeline.getInstantsAsStream().collect(Collectors.toSet());
-
-    return oldTimeline.filterPendingLogCompactionTimeline().getInstantsAsStream().map(instant -> {
-      if (newTimelineInstants.contains(instant)) {
-        return Pair.of(instant, instant);
-      } else {
-        HoodieInstant logCompacted =
-            new HoodieInstant(State.COMPLETED, HoodieTimeline.DELTA_COMMIT_ACTION, instant.getTimestamp());
-        if (newTimelineInstants.contains(logCompacted)) {
-          return Pair.of(instant, logCompacted);
-        }
-        HoodieInstant inflightLogCompacted =
-            new HoodieInstant(State.INFLIGHT, HoodieTimeline.LOG_COMPACTION_ACTION, instant.getTimestamp());
-        if (newTimelineInstants.contains(inflightLogCompacted)) {
-          return Pair.of(instant, inflightLogCompacted);
-        }
-        return Pair.of(instant, null);
-      }
-    }).colle

Re: [PR] [HUDI-6538] Refactor methods in TimelineDiffHelper class [hudi]

2024-03-31 Thread via GitHub


danny0405 merged PR #10938:
URL: https://github.com/apache/hudi/pull/10938





Re: [I] Data lose after writing [hudi]

2024-03-31 Thread via GitHub


ad1happy2go commented on issue #10935:
URL: https://github.com/apache/hudi/issues/10935#issuecomment-2029175623

   @wangzhongz The Hudi version you are using is too old. Is it possible for you to upgrade?





Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]

2024-03-31 Thread via GitHub


hudi-bot commented on PR #10945:
URL: https://github.com/apache/hudi/pull/10945#issuecomment-2029165793

   
   ## CI report:
   
   * 1fdb25272d5d41970393eb9bc7632a697ca879af Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23062)
 
   
   





Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]

2024-03-31 Thread via GitHub


hudi-bot commented on PR #10945:
URL: https://github.com/apache/hudi/pull/10945#issuecomment-2029160232

   
   ## CI report:
   
   * 1fdb25272d5d41970393eb9bc7632a697ca879af UNKNOWN
   
   





[jira] [Closed] (HUDI-7510) Loosen the compaction scheduling and rollback check for MDT

2024-03-31 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7510.

Resolution: Fixed

Fixed via master branch: 9b094e628d6e4b1157cdee6e5ae951a99d32921a

> Loosen the compaction scheduling and rollback check for MDT
> ---
>
> Key: HUDI-7510
> URL: https://issues.apache.org/jira/browse/HUDI-7510
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core, metadata, table-service
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






(hudi) branch master updated (26c00a3adef -> 9b094e628d6)

2024-03-31 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 26c00a3adef [HUDI-7187] Fix integ test props to honor new streamer 
properties (#10866)
 add 9b094e628d6 [HUDI-7510] Loosen the compaction scheduling and rollback 
check for MDT (#10874)

No new revisions were added by this update.

Summary of changes:
 .../metadata/HoodieBackedTableMetadataWriter.java  |  74 -
 .../common/testutils/HoodieMetadataTestTable.java  |   1 -
 .../FlinkHoodieBackedTableMetadataWriter.java  |  19 ---
 .../hudi/client/TestJavaHoodieBackedMetadata.java  |  34 ++---
 .../hudi/testutils/TestHoodieMetadataBase.java |   2 +-
 .../functional/TestHoodieBackedMetadata.java   |  95 +++-
 .../apache/hudi/io/TestHoodieTimelineArchiver.java | 165 +
 .../table/action/compact/CompactionTestBase.java   |   2 +-
 8 files changed, 202 insertions(+), 190 deletions(-)



Re: [PR] [HUDI-7510] Loosen the compaction scheduling and rollback check for MDT [hudi]

2024-03-31 Thread via GitHub


danny0405 merged PR #10874:
URL: https://github.com/apache/hudi/pull/10874





Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]

2024-03-31 Thread via GitHub


the-other-tim-brown commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545960527


##
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, boolean pretty) throws IOException {
     DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(record.getSchema());
     ByteArrayOutputStream out = new ByteArrayOutputStream();
-    JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-    writer.write(record, jsonEncoder);
-    jsonEncoder.flush();
-    return out;
+    try {
+      JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+      writer.write(record, jsonEncoder);
+      jsonEncoder.flush();
+      return out;
+    } catch (ClassCastException | NullPointerException ex) {
+      // NullPointerException will be thrown in cases where the field values are missing
+      // ClassCastException will be thrown in cases where the field values do not match the schema type
+      // Fallback to using `toString` which also returns json but without a pretty-print option
+      out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
   Ok, I'm not familiar with that flow. If this is breaking that flow, I can just make a new method.






Re: [PR] [HUDI-6538] Refactor methods in TimelineDiffHelper class [hudi]

2024-03-31 Thread via GitHub


wombatu-kun commented on PR #10938:
URL: https://github.com/apache/hudi/pull/10938#issuecomment-2029157601

   @nsivabalan this refactoring was made to address the code you proposed in a comment on another PR.  
   Could you please review it?





Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]

2024-03-31 Thread via GitHub


wombatu-kun commented on code in PR #10942:
URL: https://github.com/apache/hudi/pull/10942#discussion_r1545958016


##
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaGlobalSortPartitioner.java:
##
@@ -31,12 +32,21 @@
  *
  * @param <T> HoodieRecordPayload type
  */
-public class JavaGlobalSortPartitioner<T>
-    implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+public class JavaGlobalSortPartitioner<T> implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+
+  public JavaGlobalSortPartitioner() {
+  }
+
+  /**
+   * Constructor to create as UserDefinedBulkInsertPartitioner class via reflection.
+   * @param config HoodieWriteConfig

Review Comment:
   Yes, in this case HoodieWriteConfig is ignored just because this Partitioner is not configurable at all, but that does not mean it should not be used as `UserDefinedBulkInsertPartitioner`.  
   So I think the purpose of this task is not to make all BulkInsertPartitioners customizable with HoodieWriteConfig, but only to make them instantiable via reflection with the already existing common approach for UserDefinedBulkInsertPartitioner (a constructor with HoodieWriteConfig as the only parameter).  
   @nsivabalan am I right?
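   As a sketch of the mechanism under discussion, with stand-in classes rather than Hudi's real `HoodieWriteConfig` and `UserDefinedBulkInsertPartitioner` types: reflective instantiation only requires that the one-argument constructor exist, even when the implementation ignores the config.

   ```java
   import java.lang.reflect.Constructor;

   public class PartitionerReflectionSketch {

       // Stand-in for HoodieWriteConfig.
       public static class Config {}

       // Stand-in for a non-configurable partitioner: the Config argument is
       // ignored, but the one-arg constructor makes reflective creation possible.
       public static class GlobalSortPartitioner {
           public GlobalSortPartitioner(Config config) {}
       }

       // Load the class by name and invoke its (Config) constructor, the way a
       // user-defined partitioner class would typically be instantiated.
       static Object instantiate(String className, Config config) throws Exception {
           Class<?> clazz = Class.forName(className);
           Constructor<?> ctor = clazz.getConstructor(Config.class);
           return ctor.newInstance(config);
       }

       public static void main(String[] args) throws Exception {
           Object p = instantiate("PartitionerReflectionSketch$GlobalSortPartitioner", new Config());
           System.out.println(p.getClass().getSimpleName());
       }
   }
   ```

   Under this reading, adding the one-arg constructor to each sort partitioner is enough; whether the config is actually consulted is a separate concern per implementation.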






Re: [PR] [MINOR] Changing the Properties to Load From Both Default Path and Enviorment [hudi]

2024-03-31 Thread via GitHub


Amar1404 commented on PR #10835:
URL: https://github.com/apache/hudi/pull/10835#issuecomment-2029144280

   @CTTY : In EMR the default conf is getting applied, but per the Hudi documentation, a conf specified via the HUDI_DEFAULT_CONF environment variable was not getting applied due to a bug in the code, which I have fixed. Now the conf from the current thread is loaded, then the environment variable, and then the local system.
   
   
   The existing EMR configuration is still applied; the change only allows it to be set via the ENV variable as well.





[jira] [Created] (HUDI-7557) NoSuchElementException when savepoint has been removed or archived

2024-03-31 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-7557:
-

 Summary: NoSuchElementException when savepoint has been removed or 
archived
 Key: HUDI-7557
 URL: https://issues.apache.org/jira/browse/HUDI-7557
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Sagar Sumit
 Fix For: 0.15.0


This 
[block|https://github.com/apache/hudi/blob/26c00a3adefff9217187ca0ab9a5b2a7c9e42199/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L246-L249]
 of code is buggy when the commit that was savepointed has been removed or
archived.

 
{code:java}
if (!instantOption.isPresent()) {
  LOG.warn("Skipping to process a commit for which savepoint was removed as the instant moved to archived timeline already");
}
HoodieInstant instant = instantOption.get(); {code}
 

 





Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]

2024-03-31 Thread via GitHub


danny0405 commented on code in PR #10942:
URL: https://github.com/apache/hudi/pull/10942#discussion_r1545945929


##
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaGlobalSortPartitioner.java:
##
@@ -31,12 +32,21 @@
  *
  * @param <T> HoodieRecordPayload type
  */
-public class JavaGlobalSortPartitioner<T>
-    implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+public class JavaGlobalSortPartitioner<T> implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+
+  public JavaGlobalSortPartitioner() {
+  }
+
+  /**
+   * Constructor to create as UserDefinedBulkInsertPartitioner class via reflection.
+   * @param config HoodieWriteConfig

Review Comment:
   I got confused because the "customized" `HoodieWriteConfig` does not really 
play a role here and it is ignored?






Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]

2024-03-31 Thread via GitHub


danny0405 commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545945200


##
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, boolean pretty) throws IOException {
     DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(record.getSchema());
     ByteArrayOutputStream out = new ByteArrayOutputStream();
-    JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-    writer.write(record, jsonEncoder);
-    jsonEncoder.flush();
-    return out;
+    try {
+      JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+      writer.write(record, jsonEncoder);
+      jsonEncoder.flush();
+      return out;
+    } catch (ClassCastException | NullPointerException ex) {
+      // NullPointerException will be thrown in cases where the field values are missing
+      // ClassCastException will be thrown in cases where the field values do not match the schema type
+      // Fallback to using `toString` which also returns json but without a pretty-print option
+      out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
   > You get a string that represents the json of the object, it does not do any validation on types/nullability
   
   I seem to remember we have some cases that convert the json into avro and then back to json again for our commit metadata.






[jira] [Updated] (HUDI-7552) Remove the suffix for MDT table service instants

2024-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7552:
-
Labels: pull-request-available  (was: )

> Remove the suffix for MDT table service instants
> 
>
> Key: HUDI-7552
> URL: https://issues.apache.org/jira/browse/HUDI-7552
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]

2024-03-31 Thread via GitHub


danny0405 opened a new pull request, #10945:
URL: https://github.com/apache/hudi/pull/10945

   ### Change Logs
   
   Remove the suffix of MDT table operation instants (the async index operation 
is kept because there is still some validation on it, the suffix is used for 
efficient filtering).
   
   Also simplify the logic for MDT delta instant validation for log reader.
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   low medium
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]

2024-03-31 Thread via GitHub


wombatu-kun commented on code in PR #10942:
URL: https://github.com/apache/hudi/pull/10942#discussion_r1545900885


##
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaGlobalSortPartitioner.java:
##
@@ -31,12 +32,21 @@
  *
  * @param <T> HoodieRecordPayload type
  */
-public class JavaGlobalSortPartitioner<T>
-implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+public class JavaGlobalSortPartitioner<T> implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+
+  public JavaGlobalSortPartitioner() {
+  }
+
+  /**
+   * Constructor used to instantiate this class as a UserDefinedBulkInsertPartitioner via reflection
+   * @param config HoodieWriteConfig

Review Comment:
   This partitioner will be instantiated when the user defines the write config property 
`hoodie.bulkinsert.user.defined.partitioner.class=org.apache.hudi.execution.bulkinsert.JavaGlobalSortPartitioner`.
  
   This constructor will be called via reflection in the DataSourceUtils methods 
`createUserDefinedBulkInsertPartitioner(HoodieWriteConfig config)` and 
`createUserDefinedBulkInsertPartitionerWithRows(HoodieWriteConfig config)`.  
   There is nothing to customize in this JavaGlobalSortPartitioner, but, for 
example, the provided writeConfig is used for customization of 
RowSpatialCurveSortPartitioner.






Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]

2024-03-31 Thread via GitHub


wombatu-kun commented on code in PR #10942:
URL: https://github.com/apache/hudi/pull/10942#discussion_r1545900885


##
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaGlobalSortPartitioner.java:
##
@@ -31,12 +32,21 @@
  *
  * @param <T> HoodieRecordPayload type
  */
-public class JavaGlobalSortPartitioner<T>
-implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+public class JavaGlobalSortPartitioner<T> implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+
+  public JavaGlobalSortPartitioner() {
+  }
+
+  /**
+   * Constructor used to instantiate this class as a UserDefinedBulkInsertPartitioner via reflection
+   * @param config HoodieWriteConfig

Review Comment:
   This partitioner will be instantiated when the user defines the write config property 
`hoodie.bulkinsert.user.defined.partitioner.class=org.apache.hudi.execution.bulkinsert.JavaGlobalSortPartitioner`.
 This constructor will be called via reflection in the DataSourceUtils methods 
`createUserDefinedBulkInsertPartitioner(HoodieWriteConfig config)` and 
`createUserDefinedBulkInsertPartitionerWithRows(HoodieWriteConfig config)`:

   `private static Option<BulkInsertPartitioner> createUserDefinedBulkInsertPartitioner(HoodieWriteConfig config) throws HoodieException {
   String bulkInsertPartitionerClass = config.getUserDefinedBulkInsertPartitionerClass();
   try {
 return StringUtils.isNullOrEmpty(bulkInsertPartitionerClass)
 ? Option.empty() :
 Option.of((BulkInsertPartitioner) ReflectionUtils.loadClass(bulkInsertPartitionerClass, config));
   } catch (Throwable e) {
 throw new HoodieException("Could not create UserDefinedBulkInsertPartitioner class " + bulkInsertPartitionerClass, e);
   }
 }`

   There is nothing to customize in this JavaGlobalSortPartitioner, but, for 
example, the provided writeConfig is used for customization of 
RowSpatialCurveSortPartitioner.
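To illustrate the reflective loading described above outside of Hudi, here is a minimal, self-contained sketch: a class name read from configuration is resolved and instantiated through a config-accepting constructor. The names `SimpleConfig`, `DemoPartitioner`, and `loadPartitioner` are illustrative stand-ins, not Hudi APIs.

```java
import java.lang.reflect.Constructor;
import java.util.Optional;

// Sketch of reflection-based partitioner loading (illustrative names, not Hudi APIs).
public class ReflectionSketch {
  public static class SimpleConfig {
    public final String partitionerClass;
    public SimpleConfig(String partitionerClass) { this.partitionerClass = partitionerClass; }
  }

  // Stand-in for a user-defined partitioner: must expose a (SimpleConfig) constructor.
  public static class DemoPartitioner {
    public final SimpleConfig config;
    public DemoPartitioner(SimpleConfig config) { this.config = config; }
  }

  public static Optional<Object> loadPartitioner(SimpleConfig config) {
    if (config.partitionerClass == null || config.partitionerClass.isEmpty()) {
      return Optional.empty();
    }
    try {
      Class<?> clazz = Class.forName(config.partitionerClass);
      // This lookup fails unless the class declares a constructor taking the config,
      // which is exactly why the PR adds such constructors to the sort partitioners.
      Constructor<?> ctor = clazz.getConstructor(SimpleConfig.class);
      return Optional.of(ctor.newInstance(config));
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException("Could not create partitioner " + config.partitionerClass, e);
    }
  }

  public static void main(String[] args) {
    SimpleConfig cfg = new SimpleConfig(DemoPartitioner.class.getName());
    Object p = loadPartitioner(cfg).get();
    System.out.println(p.getClass().getSimpleName()); // prints DemoPartitioner
  }
}
```

A partitioner class without the config-accepting constructor would fail the `getConstructor` lookup, mirroring the failure this PR fixes.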






Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]

2024-03-31 Thread via GitHub


wombatu-kun commented on code in PR #10942:
URL: https://github.com/apache/hudi/pull/10942#discussion_r1545900885


##
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaGlobalSortPartitioner.java:
##
@@ -31,12 +32,21 @@
  *
  * @param <T> HoodieRecordPayload type
  */
-public class JavaGlobalSortPartitioner<T>
-implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+public class JavaGlobalSortPartitioner<T> implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+
+  public JavaGlobalSortPartitioner() {
+  }
+
+  /**
+   * Constructor used to instantiate this class as a UserDefinedBulkInsertPartitioner via reflection
+   * @param config HoodieWriteConfig

Review Comment:
   This partitioner will be instantiated when the user defines the write config property 
`hoodie.bulkinsert.user.defined.partitioner.class=org.apache.hudi.execution.bulkinsert.JavaGlobalSortPartitioner`.
 This constructor will be called via reflection in the DataSourceUtils methods 
`createUserDefinedBulkInsertPartitioner(HoodieWriteConfig config)` and 
`createUserDefinedBulkInsertPartitionerWithRows(HoodieWriteConfig config)`.  
   There is nothing to customize in this JavaGlobalSortPartitioner, but, for 
example, the provided writeConfig is used for customization of 
RowSpatialCurveSortPartitioner.






Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-03-31 Thread via GitHub


xuzifu666 commented on code in PR #10898:
URL: https://github.com/apache/hudi/pull/10898#discussion_r1545892451


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
##
@@ -61,14 +75,63 @@ public Map<Integer, HoodieRecordLocation> loadBucketIdToFileIdMappingForPartitio
   if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
 bucketIdToFileIdMapping.put(bucketId, new HoodieRecordLocation(commitTime, fileId));
   } else {
+// Find the instants which conflict with this bucket id
+Set<String> instants = findTheConflictBucketIdInPartition(hoodieTable, partition, bucketId);
+
 // Check if bucket data is valid
 throw new HoodieIOException("Find multiple files at partition path="
-+ partition + " belongs to the same bucket id = " + bucketId);
++ partition + " belongs to the same bucket id = " + bucketId
++ ", these instants need to be rolled back: " + instants.toString()
++ ", you can use the rollback_to_instant procedure to recover");
   }
 });
 return bucketIdToFileIdMapping;
   }
 
+
+  /**
+   * Find out the conflicting files in a bucket partition for a given bucket id
+   */
+  public HashSet<String> findTheConflictBucketIdInPartition(HoodieTable hoodieTable, String partition, int bucketId) {
+HashSet<String> instants = new HashSet<>();

Review Comment:
   From another view: at this point the TableFileSystemView obtained from the 
HoodieTable is not guaranteed to be a HoodieTableFileSystemView, and it does not 
return all pending instants either, so I think getting the pending instants from 
the timeline may be better @danny0405 
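To make the conflict being detected concrete, here is a self-contained sketch using plain collections. It assumes, as a simplification of Hudi's file-naming convention, that each file id starts with a zero-padded bucket-id prefix and maps to the instant (commit time) that created it; this illustrates the idea only and is not Hudi code.

```java
import java.util.*;

// Sketch: detect multiple file groups claiming the same bucket id in one partition,
// and collect the instants that wrote them (simplified file-id encoding assumed).
public class BucketConflictSketch {
  // Returns the instants whose files collide on a bucket id.
  public static Set<String> findConflictInstants(Map<String, String> fileIdToInstant) {
    Map<Integer, List<String>> byBucket = new HashMap<>();
    for (String fileId : fileIdToInstant.keySet()) {
      int bucketId = Integer.parseInt(fileId.substring(0, 8)); // prefix convention (simplified)
      byBucket.computeIfAbsent(bucketId, k -> new ArrayList<>()).add(fileId);
    }
    Set<String> conflictInstants = new TreeSet<>();
    for (List<String> fileIds : byBucket.values()) {
      if (fileIds.size() > 1) { // more than one file group for one bucket: conflict
        for (String fileId : fileIds) {
          conflictInstants.add(fileIdToInstant.get(fileId));
        }
      }
    }
    return conflictInstants;
  }

  public static void main(String[] args) {
    Map<String, String> files = new HashMap<>();
    files.put("00000001-aaaa", "20240330101010"); // bucket 1, first writer
    files.put("00000001-bbbb", "20240330111111"); // bucket 1, conflicting writer
    files.put("00000002-cccc", "20240330101010"); // bucket 2, no conflict
    System.out.println(findConflictInstants(files)); // prints [20240330101010, 20240330111111]
  }
}
```

Surfacing the conflicting instants in the error message, as the PR does, lets the operator feed them directly to a rollback procedure.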






Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]

2024-03-31 Thread via GitHub


xuzifu666 commented on code in PR #10898:
URL: https://github.com/apache/hudi/pull/10898#discussion_r1545892451


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
##
@@ -61,14 +75,63 @@ public Map<Integer, HoodieRecordLocation> loadBucketIdToFileIdMappingForPartitio
   if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
 bucketIdToFileIdMapping.put(bucketId, new HoodieRecordLocation(commitTime, fileId));
   } else {
+// Find the instants which conflict with this bucket id
+Set<String> instants = findTheConflictBucketIdInPartition(hoodieTable, partition, bucketId);
+
 // Check if bucket data is valid
 throw new HoodieIOException("Find multiple files at partition path="
-+ partition + " belongs to the same bucket id = " + bucketId);
++ partition + " belongs to the same bucket id = " + bucketId
++ ", these instants need to be rolled back: " + instants.toString()
++ ", you can use the rollback_to_instant procedure to recover");
   }
 });
 return bucketIdToFileIdMapping;
   }
 
+
+  /**
+   * Find out the conflicting files in a bucket partition for a given bucket id
+   */
+  public HashSet<String> findTheConflictBucketIdInPartition(HoodieTable hoodieTable, String partition, int bucketId) {
+HashSet<String> instants = new HashSet<>();

Review Comment:
   From another view: at this point the TableFileSystemView obtained from the 
HoodieTable is not guaranteed to be a HoodieTableFileSystemView, and it does not 
return all pending instants either, so I think getting the pending instants may 
be better @danny0405 



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
##
@@ -61,14 +75,63 @@ public Map<Integer, HoodieRecordLocation> loadBucketIdToFileIdMappingForPartitio
   if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
 bucketIdToFileIdMapping.put(bucketId, new HoodieRecordLocation(commitTime, fileId));
   } else {
+// Find the instants which conflict with this bucket id
+Set<String> instants = findTheConflictBucketIdInPartition(hoodieTable, partition, bucketId);
+
 // Check if bucket data is valid
 throw new HoodieIOException("Find multiple files at partition path="
-+ partition + " belongs to the same bucket id = " + bucketId);
++ partition + " belongs to the same bucket id = " + bucketId
++ ", these instants need to be rolled back: " + instants.toString()
++ ", you can use the rollback_to_instant procedure to recover");
   }
 });
 return bucketIdToFileIdMapping;
   }
 
+
+  /**
+   * Find out the conflicting files in a bucket partition for a given bucket id
+   */
+  public HashSet<String> findTheConflictBucketIdInPartition(HoodieTable hoodieTable, String partition, int bucketId) {
+HashSet<String> instants = new HashSet<>();

Review Comment:
   Had Tried HoodieTableFileSystemView#fetchLatestFileSlicesIncludingInflight 
get Fileslice of the partition,but seems
not filter the error write instant from fileslices,current logic can 
confirm find out conflict instant,could we keep it? @danny0405 






Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]

2024-03-31 Thread via GitHub


hudi-bot commented on PR #10943:
URL: https://github.com/apache/hudi/pull/10943#issuecomment-2029024584

   
   ## CI report:
   
   * 70a35f705b74db87648f3f6a7e504614db6416aa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23061)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[I] [SUGGEST] Can the community version be updated regularly and faster? The roadmap should also be updated regularly and synchronized. [hudi]

2024-03-31 Thread via GitHub


zyclove opened a new issue, #10944:
URL: https://github.com/apache/hudi/issues/10944

   Version updates are too slow; there has not been a new release for a long time.
   Many problems need to be fixed in a timely manner, but support is not available.
   
   https://github.com/apache/hudi/assets/15028279/96958d33-83ea-4282-afe6-b994ce9ff905
   
   https://github.com/apache/hudi/assets/15028279/09386936-55b9-48d2-a144-18aafb12ca29
   
   1.0 is currently a beta version with many problems. There has been no new 
release for so long; when will the official version be available?
   
   The hudi roadmap has not been updated for a long time.
   https://hudi.apache.org/roadmap
   https://github.com/apache/hudi/assets/15028279/c3bded86-4445-4707-83c9-eb56f40be918
   
   
   I am very optimistic about Hudi's positioning and development, and I 
sincerely hope Hudi keeps getting better and truly solves the pain points of 
data lake workloads.
   
   Best regards





(hudi) branch asf-site updated: rs - cow snap, mor ro; starrocks - cow snap, mor rt, ro (#10940)

2024-03-31 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new e8d498d2199 rs - cow snap, mor ro; starrocks - cow snap, mor rt, ro 
(#10940)
e8d498d2199 is described below

commit e8d498d21998ea1c1005e6350e8828eb6842dcba
Author: Sagar Lakshmipathy <18vidhyasa...@gmail.com>
AuthorDate: Sun Mar 31 18:44:24 2024 -0700

rs - cow snap, mor ro; starrocks - cow snap, mor rt, ro (#10940)
---
 website/docs/sql_queries.md| 36 ++
 .../version-0.12.0/query_engine_setup.md   |  6 ++--
 .../versioned_docs/version-0.12.0/querying_data.md |  2 ++
 .../version-0.12.1/query_engine_setup.md   |  6 ++--
 .../versioned_docs/version-0.12.1/querying_data.md |  2 ++
 .../version-0.12.2/query_engine_setup.md   |  6 ++--
 .../versioned_docs/version-0.12.2/querying_data.md |  2 ++
 .../version-0.12.3/query_engine_setup.md   |  6 ++--
 .../versioned_docs/version-0.12.3/querying_data.md |  3 +-
 .../version-0.13.0/query_engine_setup.md   |  6 ++--
 .../versioned_docs/version-0.13.0/querying_data.md |  3 +-
 .../versioned_docs/version-0.13.1/querying_data.md | 31 +--
 .../versioned_docs/version-0.14.0/sql_queries.md   | 36 ++
 .../versioned_docs/version-0.14.1/sql_queries.md   | 36 ++
 14 files changed, 85 insertions(+), 96 deletions(-)

diff --git a/website/docs/sql_queries.md b/website/docs/sql_queries.md
index d833831169b..2180b40a48d 100644
--- a/website/docs/sql_queries.md
+++ b/website/docs/sql_queries.md
@@ -344,10 +344,8 @@ The current default supported version of Hudi is 0.10.0 ~ 
0.13.1, and has not be
 
 ## StarRocks
 
-Copy on Write tables in Apache Hudi 0.10.0 and above can be queried via 
StarRocks external tables from StarRocks version
-2.2.0. Only snapshot queries are supported currently. In future releases Merge 
on Read tables will also be supported.
-Please refer to [StarRocks Hudi external 
table](https://docs.starrocks.io/en-us/latest/using_starrocks/External_table#hudi-external-table)
-for more details on the setup.
+For Copy-on-Write tables StarRocks provides support for Snapshot queries and 
for Merge-on-Read tables, StarRocks provides support for Snapshot and Read 
Optimized queries.
+Please refer [StarRocks 
docs](https://docs.starrocks.io/docs/data_source/catalog/hudi_catalog/) for 
more details.
 
 ## ClickHouse
 
@@ -386,20 +384,20 @@ Following tables show whether a given query is supported 
on specific query engin
 
 ### Merge-On-Read tables
 
-| Query Engine|Snapshot Queries|Incremental Queries|Read Optimized 
Queries|
-|-||---|--|
-| **Hive**|Y|Y|Y|
-| **Spark SQL**   |Y|Y|Y|
-| **Spark Datasource** |Y|Y|Y|
-| **Flink SQL**   |Y|Y|Y|
-| **PrestoDB**|Y|N|Y|
-| **AWS Athena**  |Y|N|Y|
-| **Big Query**   |Y|N|Y|
-| **Trino**   |N|N|Y|
-| **Impala**  |N|N|Y|
-| **Redshift Spectrum** |N|N|N|
-| **Doris**   |Y|N|Y|
-| **StarRocks**   |N|N|N|
-| **ClickHouse**  |N|N|N|
+| Query Engine| Snapshot Queries |Incremental Queries| Read Optimized 
Queries |
+|-|--|---||
+| **Hive**| Y|Y| Y  |
+| **Spark SQL**   | Y|Y| Y  |
+| **Spark Datasource** | Y|Y| Y  |
+| **Flink SQL**   | Y|Y| Y  |
+| **PrestoDB**| Y|N| Y  |
+| **AWS Athena**  | Y|N| Y  |
+| **Big Query**   | Y|N| Y  |
+| **Trino**   | N|N| Y  |
+| **Impala**  | N|N| Y  |
+| **Redshift Spectrum** | N|N| Y  |
+| **Doris**   | Y|N| Y  |
+| **StarRocks**   | Y|N| Y  |
+| **ClickHouse**  | N|N| N  |
 
 
diff --git a/website/versioned_docs/version-0.12.0/query_engine_setup.md 
b/website/versioned_docs/version-0.12.0/query_engine_setup.md
index 79dfaf81233..47eaeaa27c5 100644
--- a/website/versioned_docs/version-0.12.0/query_engine_setup.md
+++ b/website/versioned_docs/version-0.12.0/query_engine_setup.md
@@ -127,7 +127,5 @@ Please refer to [Redshift Spectrum Integration with Apache 
Hudi](https://docs.aw
 for more details.
 
 ## StarRocks
-Copy on Write tables in Apache Hudi 0.10.0 and above can be queried via 
StarRocks external tables from StarRocks version 2.2.0.
-Only snapshot queries are supported currently

Re: [PR] [MINOR] [DOCS] changes to redshift & starrocks compat matrix [hudi]

2024-03-31 Thread via GitHub


bhasudha merged PR #10940:
URL: https://github.com/apache/hudi/pull/10940





Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]

2024-03-31 Thread via GitHub


the-other-tim-brown commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545880874


##
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, 
boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, 
boolean pretty) throws IOException {
 DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(record.getSchema());
 ByteArrayOutputStream out = new ByteArrayOutputStream();
-JsonEncoder jsonEncoder = 
EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-writer.write(record, jsonEncoder);
-jsonEncoder.flush();
-return out;
+try {
+  JsonEncoder jsonEncoder = 
EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+  writer.write(record, jsonEncoder);
+  jsonEncoder.flush();
+  return out;
+} catch (ClassCastException | NullPointerException ex) {
+  // NullPointerException will be thrown in cases where the field values 
are missing
+  // ClassCastException will be thrown in cases where the field values do 
not match the schema type
+  // Fallback to using `toString` which also returns json but without a 
pretty-print option
+  out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
   > So when the transformed JSON string got converted back into avro, the 
schema could change right?
   
   The case here is when you have some data and are trying to convert it to 
avro and it fails. 
https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamerUtils.java#L164






Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]

2024-03-31 Thread via GitHub


the-other-tim-brown commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545880266


##
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, 
boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, 
boolean pretty) throws IOException {
 DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(record.getSchema());
 ByteArrayOutputStream out = new ByteArrayOutputStream();
-JsonEncoder jsonEncoder = 
EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-writer.write(record, jsonEncoder);
-jsonEncoder.flush();
-return out;
+try {
+  JsonEncoder jsonEncoder = 
EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+  writer.write(record, jsonEncoder);
+  jsonEncoder.flush();
+  return out;
+} catch (ClassCastException | NullPointerException ex) {
+  // NullPointerException will be thrown in cases where the field values 
are missing
+  // ClassCastException will be thrown in cases where the field values do 
not match the schema type
+  // Fallback to using `toString` which also returns json but without a 
pretty-print option
+  out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
   You get a string that represents the JSON of the object; it does not do any 
validation on types/nullability. See the added tests for a sample.
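The pattern under discussion, a strict schema-validating encoding with a lenient toString-style fallback, can be sketched independently of Avro. All names below are illustrative, not Hudi or Avro APIs; the strict path mimics schema validation by throwing the same exception types the Hudi change catches.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of "strict encode, lenient fallback" (illustrative names, not Hudi/Avro APIs):
// the strict path rejects malformed records, the fallback emits a best-effort JSON
// string without validating types or required fields, like GenericRecord#toString.
public class LenientJsonSketch {
  // Strict path: requires "id" to be an Integer, mirroring schema validation.
  static String strictEncode(Map<String, Object> record) {
    Object id = record.get("id");
    if (id == null) {
      throw new NullPointerException("missing required field: id");
    }
    if (!(id instanceof Integer)) {
      throw new ClassCastException("field id is not an int");
    }
    return "{\"id\": " + id + "}";
  }

  // Lenient fallback: renders whatever is there, with no validation.
  static String lenientEncode(Map<String, Object> record) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (Map.Entry<String, Object> e : record.entrySet()) {
      if (!first) sb.append(", ");
      first = false;
      sb.append('"').append(e.getKey()).append("\": ").append(e.getValue());
    }
    return sb.append('}').toString();
  }

  static byte[] toJsonBytes(Map<String, Object> record) {
    try {
      return strictEncode(record).getBytes(StandardCharsets.UTF_8);
    } catch (ClassCastException | NullPointerException ex) {
      // Fall back to the lenient rendering for malformed records.
      return lenientEncode(record).getBytes(StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) {
    Map<String, Object> bad = new LinkedHashMap<>();
    bad.put("id", "not-an-int"); // wrong type: strict path throws, fallback kicks in
    System.out.println(new String(toJsonBytes(bad), StandardCharsets.UTF_8));
  }
}
```

As the review thread notes, the fallback output is not schema-validated, so a round trip back to Avro may not preserve the original schema.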






Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]

2024-03-31 Thread via GitHub


danny0405 commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545879905


##
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, 
boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, 
boolean pretty) throws IOException {
 DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(record.getSchema());
 ByteArrayOutputStream out = new ByteArrayOutputStream();
-JsonEncoder jsonEncoder = 
EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-writer.write(record, jsonEncoder);
-jsonEncoder.flush();
-return out;
+try {
+  JsonEncoder jsonEncoder = 
EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+  writer.write(record, jsonEncoder);
+  jsonEncoder.flush();
+  return out;
+} catch (ClassCastException | NullPointerException ex) {
+  // NullPointerException will be thrown in cases where the field values 
are missing
+  // ClassCastException will be thrown in cases where the field values do 
not match the schema type
+  // Fallback to using `toString` which also returns json but without a 
pretty-print option
+  out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
   So when the transformed JSON string got converted back into avro, the schema 
could change right?






Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]

2024-03-31 Thread via GitHub


danny0405 commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545879712


##
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, 
boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, 
boolean pretty) throws IOException {
 DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(record.getSchema());
 ByteArrayOutputStream out = new ByteArrayOutputStream();
-JsonEncoder jsonEncoder = 
EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-writer.write(record, jsonEncoder);
-jsonEncoder.flush();
-return out;
+try {
+  JsonEncoder jsonEncoder = 
EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+  writer.write(record, jsonEncoder);
+  jsonEncoder.flush();
+  return out;
+} catch (ClassCastException | NullPointerException ex) {
+  // NullPointerException will be thrown in cases where the field values 
are missing
+  // ClassCastException will be thrown in cases where the field values do 
not match the schema type
+  // Fallback to using `toString` which also returns json but without a 
pretty-print option
+  out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
   Hmm, seems like a `null` constant for empty field.






Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]

2024-03-31 Thread via GitHub


hudi-bot commented on PR #10943:
URL: https://github.com/apache/hudi/pull/10943#issuecomment-2028989292

   
   ## CI report:
   
   * 70a35f705b74db87648f3f6a7e504614db6416aa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23061)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]

2024-03-31 Thread via GitHub


danny0405 commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545879498


##
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, 
boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, 
boolean pretty) throws IOException {
 DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(record.getSchema());
 ByteArrayOutputStream out = new ByteArrayOutputStream();
-JsonEncoder jsonEncoder = 
EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-writer.write(record, jsonEncoder);
-jsonEncoder.flush();
-return out;
+try {
+  JsonEncoder jsonEncoder = 
EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+  writer.write(record, jsonEncoder);
+  jsonEncoder.flush();
+  return out;
+} catch (ClassCastException | NullPointerException ex) {
+  // NullPointerException will be thrown in cases where the field values 
are missing
+  // ClassCastException will be thrown in cases where the field values do 
not match the schema type
+  // Fallback to using `toString` which also returns json but without a 
pretty-print option
+  out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
   What do we get for `record.toString` when `NullPointerException` is thrown?






Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]

2024-03-31 Thread via GitHub


danny0405 commented on code in PR #10942:
URL: https://github.com/apache/hudi/pull/10942#discussion_r1545879218


##
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaGlobalSortPartitioner.java:
##
@@ -31,12 +32,21 @@
  *
  * @param <T> HoodieRecordPayload type
  */
-public class JavaGlobalSortPartitioner<T>
-implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+public class JavaGlobalSortPartitioner<T> implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+
+  public JavaGlobalSortPartitioner() {
+  }
+
+  /**
+   * Constructor used to instantiate this class as a UserDefinedBulkInsertPartitioner via reflection
+   * @param config HoodieWriteConfig

Review Comment:
   Can you give an example how this partitioner got instantiated and customized?






Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]

2024-03-31 Thread via GitHub


hudi-bot commented on PR #10943:
URL: https://github.com/apache/hudi/pull/10943#issuecomment-2028985048

   
   ## CI report:
   
   * 70a35f705b74db87648f3f6a7e504614db6416aa UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[PR] [MINOR] Handle cases of malformed records when converting to json [hudi]

2024-03-31 Thread via GitHub


the-other-tim-brown opened a new pull request, #10943:
URL: https://github.com/apache/hudi/pull/10943

   ### Change Logs
   
   Handles cases of missing required fields and bad input values when 
converting to JSON. This conversion is used in combination with the Error Table 
so you cannot assume that the records are properly formatted.
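   A minimal sketch of the fallback strategy described above, with hypothetical stand-in names (`RecordLike`, `strictJson` are placeholders, not Hudi's classes): prefer the schema-aware conversion and fall back to `toString` when a malformed record makes it throw.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: RecordLike/strictJson stand in for the schema-aware
// pretty-printing conversion that the change guards against failures.
public class SafeJsonConverter {

  public interface RecordLike {
    // Schema-aware pretty-printed JSON; may throw on malformed field values.
    String strictJson();
  }

  // Prefer the strict conversion; fall back to toString(), which yields
  // plain (non-pretty) JSON, when field values do not match the schema.
  public static byte[] toJsonBytes(RecordLike record) {
    try {
      return record.strictJson().getBytes(StandardCharsets.UTF_8);
    } catch (ClassCastException | NullPointerException e) {
      return record.toString().getBytes(StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) {
    // A malformed record whose strict conversion fails.
    RecordLike malformed = new RecordLike() {
      @Override public String strictJson() {
        throw new ClassCastException("field value does not match schema type");
      }
      @Override public String toString() { return "{\"id\": \"raw\"}"; }
    };
    System.out.println(new String(toJsonBytes(malformed), StandardCharsets.UTF_8));
    // prints {"id": "raw"}
  }
}
```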
   
   ### Impact
   
   Avoids throwing exceptions when malformed input data is sent to the error table writer
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]

2024-03-31 Thread via GitHub


hudi-bot commented on PR #10942:
URL: https://github.com/apache/hudi/pull/10942#issuecomment-2028820679

   
   ## CI report:
   
   * ea11f68c1778f9ec23eab6a887076e51f60caa0b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23060)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]

2024-03-31 Thread via GitHub


hudi-bot commented on PR #10942:
URL: https://github.com/apache/hudi/pull/10942#issuecomment-2028787238

   
   ## CI report:
   
   * ea11f68c1778f9ec23eab6a887076e51f60caa0b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-7526) Fix constructors for all bulk insert sort partitioners to ensure we could use it as user defined partitioners

2024-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7526:
-
Labels: pull-request-available  (was: )

> Fix constructors for all bulk insert sort partitioners to ensure we could use 
> it as user defined partitioners 
> --
>
> Key: HUDI-7526
> URL: https://issues.apache.org/jira/browse/HUDI-7526
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: sivabalan narayanan
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
>
> Our constructor contract for user-defined sort partitioners takes in the 
> write config, while some of the partitioners used in the out-of-the-box sort 
> modes do not account for it.
>  
> Let's fix the sort partitioners so that any of them can be used as 
> user-defined partitioners.
> For example, NoneSortMode does not have a constructor that takes in the 
> write config.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]

2024-03-31 Thread via GitHub


wombatu-kun opened a new pull request, #10942:
URL: https://github.com/apache/hudi/pull/10942

   ### Change Logs
   Our constructor contract for user-defined sort partitioners takes in the 
write config, while some of the partitioners used in the out-of-the-box sort 
modes do not account for it.   
   Let's fix the sort partitioners so that any of them can be used as 
user-defined partitioners. 
   For example, NoneSortMode does not have a constructor that takes in the 
write config. 
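   A sketch of the constructor shape this fix calls for, using simplified stand-in types (these are illustrative placeholders, not Hudi's real signatures): each partitioner keeps its no-arg constructor and gains one taking the write config, so either construction path works.

```java
// Simplified stand-ins for HoodieWriteConfig / BulkInsertPartitioner,
// for illustration only.
class HoodieWriteConfig { }

interface BulkInsertPartitioner {
  boolean arePartitionRecordsSorted();
}

// A "none" sort-mode partitioner with both constructors, so it can also be
// plugged in as a user-defined partitioner (created reflectively with the
// write config).
public class NonSortPartitionerSketch implements BulkInsertPartitioner {

  public NonSortPartitionerSketch() {
  }

  public NonSortPartitionerSketch(HoodieWriteConfig config) {
    // Config is accepted to satisfy the reflective contract, even if unused.
  }

  @Override
  public boolean arePartitionRecordsSorted() {
    return false;
  }

  public static void main(String[] args) throws Exception {
    // Both construction paths work.
    BulkInsertPartitioner a = new NonSortPartitionerSketch();
    BulkInsertPartitioner b = NonSortPartitionerSketch.class
        .getConstructor(HoodieWriteConfig.class)
        .newInstance(new HoodieWriteConfig());
    System.out.println(a.arePartitionRecordsSorted() == b.arePartitionRecordsSorted());
    // prints "true"
  }
}
```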
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Assigned] (HUDI-7526) Fix constructors for all bulk insert sort partitioners to ensure we could use it as user defined partitioners

2024-03-31 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7526:
---

Assignee: Vova Kolmakov

> Fix constructors for all bulk insert sort partitioners to ensure we could use 
> it as user defined partitioners 
> --
>
> Key: HUDI-7526
> URL: https://issues.apache.org/jira/browse/HUDI-7526
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: sivabalan narayanan
>Assignee: Vova Kolmakov
>Priority: Major
>
> Our constructor contract for user-defined sort partitioners takes in the 
> write config, while some of the partitioners used in the out-of-the-box sort 
> modes do not account for it.
>  
> Let's fix the sort partitioners so that any of them can be used as 
> user-defined partitioners.
> For example, NoneSortMode does not have a constructor that takes in the 
> write config.





[jira] [Updated] (HUDI-7526) Fix constructors for all bulk insert sort partitioners to ensure we could use it as user defined partitioners

2024-03-31 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov updated HUDI-7526:

Status: In Progress  (was: Open)

> Fix constructors for all bulk insert sort partitioners to ensure we could use 
> it as user defined partitioners 
> --
>
> Key: HUDI-7526
> URL: https://issues.apache.org/jira/browse/HUDI-7526
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: sivabalan narayanan
>Assignee: Vova Kolmakov
>Priority: Major
>
> Our constructor contract for user-defined sort partitioners takes in the 
> write config, while some of the partitioners used in the out-of-the-box sort 
> modes do not account for it.
>  
> Let's fix the sort partitioners so that any of them can be used as 
> user-defined partitioners.
> For example, NoneSortMode does not have a constructor that takes in the 
> write config.


