[jira] [Updated] (HUDI-7557) NoSuchElementException when commit corresponding to savepoint has been removed or archived
[ https://issues.apache.org/jira/browse/HUDI-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7557:
---------------------------------
    Labels: pull-request-available  (was: )

> NoSuchElementException when commit corresponding to savepoint has been removed or archived
> ------------------------------------------------------------------------------------------
>
>                 Key: HUDI-7557
>                 URL: https://issues.apache.org/jira/browse/HUDI-7557
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Sagar Sumit
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0
>
> This [block|https://github.com/apache/hudi/blob/26c00a3adefff9217187ca0ab9a5b2a7c9e42199/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L246-L249] of code is buggy when the commit that was savepointed has been removed or archived.
>
> {code:java}
> if (!instantOption.isPresent()) {
>   LOG.warn("Skipping to process a commit for which savepoint was removed as the instant moved to archived timeline already");
> }
> HoodieInstant instant = instantOption.get(); {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[PR] [HUDI-7557] Fix incremental cleaner when commit for savepoint removed [hudi]
codope opened a new pull request, #10946:
URL: https://github.com/apache/hudi/pull/10946

### Change Logs

This [block](https://github.com/apache/hudi/blob/26c00a3adefff9217187ca0ab9a5b2a7c9e42199/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L246-L249) of code is buggy when the commit that was savepointed has been removed or archived. The PR handles the empty `Option`. This code path is exercised only when incremental cleaning is enabled and there are savepoints in the timeline.

### Impact

Bug fix for the incremental cleaner.

### Risk level (write none, low medium or high below)

low

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
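For readers following the bug report: the failure mode and the shape of a fix can be sketched without Hudi's own types. This is a minimal sketch under stated assumptions — `java.util.Optional` stands in for Hudi's `Option`, and `findInstant` is a hypothetical timeline lookup, not actual Hudi code.

```java
import java.util.List;
import java.util.Optional;

public class SavepointCleanSketch {

    // Hypothetical stand-in for a timeline lookup: returns empty when the
    // savepointed commit has been removed or archived.
    static Optional<String> findInstant(List<String> activeTimeline, String ts) {
        return activeTimeline.stream().filter(ts::equals).findFirst();
    }

    // The buggy shape from the report: logs a warning on empty,
    // but falls through and calls get() anyway.
    static String buggy(List<String> timeline, String ts) {
        Optional<String> instantOption = findInstant(timeline, ts);
        if (!instantOption.isPresent()) {
            System.out.println("Skipping commit whose savepoint was archived");
        }
        return instantOption.get(); // NoSuchElementException when empty
    }

    // A fixed shape: actually skip when the Option is empty.
    static String fixed(List<String> timeline, String ts) {
        Optional<String> instantOption = findInstant(timeline, ts);
        if (!instantOption.isPresent()) {
            return null; // caller treats null as "skip this savepoint"
        }
        return instantOption.get();
    }
}
```

The point is that the `if` block in the original code only logs; without an early return (or `continue` in a loop), the subsequent `get()` still runs on an empty `Option`.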
Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]
hudi-bot commented on PR #10945:
URL: https://github.com/apache/hudi/pull/10945#issuecomment-2029227726

## CI report:

* bf8eba5011f8ff4762e4da92aa57057873bafeab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23063)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Assigned] (HUDI-4444) Refactor DataSourceInternalWriterHelper
[ https://issues.apache.org/jira/browse/HUDI-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vova Kolmakov reassigned HUDI-4444:
-----------------------------------
    Assignee: (was: Vova Kolmakov)

> Refactor DataSourceInternalWriterHelper
> ----------------------------------------
>
>                 Key: HUDI-4444
>                 URL: https://issues.apache.org/jira/browse/HUDI-4444
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: code-quality
>            Reporter: Raymond Xu
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
> The DataSourceInternalWriterHelper constructor is writing files (through
> writeClient.startCommitWithTime and writeClient.preWrite), which is an
> anti-pattern. We should refactor this part.
[jira] [Updated] (HUDI-7557) NoSuchElementException when commit corresponding to savepoint has been removed or archived
[ https://issues.apache.org/jira/browse/HUDI-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-7557:
------------------------------
    Summary: NoSuchElementException when commit corresponding to savepoint has been removed or archived  (was: NoSuchElementException when savepoint has been removed or archived)

> NoSuchElementException when commit corresponding to savepoint has been removed or archived
> ------------------------------------------------------------------------------------------
>
>                 Key: HUDI-7557
>                 URL: https://issues.apache.org/jira/browse/HUDI-7557
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Sagar Sumit
>            Priority: Major
>             Fix For: 0.15.0
>
> This [block|https://github.com/apache/hudi/blob/26c00a3adefff9217187ca0ab9a5b2a7c9e42199/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L246-L249] of code is buggy when the commit that was savepointed has been removed or archived.
>
> {code:java}
> if (!instantOption.isPresent()) {
>   LOG.warn("Skipping to process a commit for which savepoint was removed as the instant moved to archived timeline already");
> }
> HoodieInstant instant = instantOption.get(); {code}
Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]
hudi-bot commented on PR #10945:
URL: https://github.com/apache/hudi/pull/10945#issuecomment-2029221444

## CI report:

* 1fdb25272d5d41970393eb9bc7632a697ca879af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23062)
* bf8eba5011f8ff4762e4da92aa57057873bafeab Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23063)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]
hudi-bot commented on PR #10945:
URL: https://github.com/apache/hudi/pull/10945#issuecomment-2029213925

## CI report:

* 1fdb25272d5d41970393eb9bc7632a697ca879af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23062)
* bf8eba5011f8ff4762e4da92aa57057873bafeab UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
danny0405 commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545988384

## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:

@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, boolean pretty) throws IOException

 private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, boolean pretty) throws IOException {
   DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(record.getSchema());
   ByteArrayOutputStream out = new ByteArrayOutputStream();
-  JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-  writer.write(record, jsonEncoder);
-  jsonEncoder.flush();
-  return out;
+  try {
+    JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+    writer.write(record, jsonEncoder);
+    jsonEncoder.flush();
+    return out;
+  } catch (ClassCastException | NullPointerException ex) {
+    // NullPointerException will be thrown in cases where the field values are missing
+    // ClassCastException will be thrown in cases where the field values do not match the schema type
+    // Fallback to using `toString` which also returns json but without a pretty-print option
+    out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
   If the schema does not really change for that, it is okay; maybe we can add some use cases.
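The try/catch fallback in the hunk above can be reduced to a minimal, Avro-free sketch. `strictEncode` here is an assumption standing in for the `GenericDatumWriter`/`JsonEncoder` path (which rejects records that do not match the schema); only the catch-and-fall-back-to-`toString` pattern is taken from the PR.

```java
import java.nio.charset.StandardCharsets;

public class JsonFallbackSketch {

    // Hypothetical stand-in for the strict Avro JSON encoder: rejects null
    // values the way the schema-driven writer does for non-nullable fields.
    static String strictEncode(Object value) {
        if (value == null) {
            throw new NullPointerException("null value for non-nullable field");
        }
        return "{\"value\":\"" + value + "\"}";
    }

    // Pattern from the PR: try the strict encoder first; on the two
    // exception types malformed records trigger, fall back to toString()
    // (still JSON-like output, but without a pretty-print option).
    static byte[] toJsonBytes(Object value) {
        try {
            return strictEncode(value).getBytes(StandardCharsets.UTF_8);
        } catch (ClassCastException | NullPointerException ex) {
            return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
        }
    }
}
```

The design question raised in the review thread is exactly about this fallback: the `toString()` output skips schema validation, so any caller that round-trips the JSON back to Avro has to tolerate output that the strict encoder would have rejected.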
Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]
hudi-bot commented on PR #10945:
URL: https://github.com/apache/hudi/pull/10945#issuecomment-2029207026

## CI report:

* 1fdb25272d5d41970393eb9bc7632a697ca879af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23062)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Closed] (HUDI-6538) Refactor methods in TimelineDiffHelper class
[ https://issues.apache.org/jira/browse/HUDI-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-6538.
----------------------------
    Resolution: Fixed

Fixed via master branch: 44ab6f32bffbab8cd250bd0430d9591360f118e7

> Refactor methods in TimelineDiffHelper class
> --------------------------------------------
>
>                 Key: HUDI-6538
>                 URL: https://issues.apache.org/jira/browse/HUDI-6538
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Surya Prasanna Yalla
>            Assignee: Vova Kolmakov
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
> Refactor methods in the TimelineDiffHelper class to address the following comment in [PR-9007|https://github.com/apache/hudi/pull/9007]:
>
> {code:java}
> The methods getPendingReplaceCommitTransitions and
> getPendingLogCompactionTransitions look almost the same except the action
> type, can we abstract a little to merge them altogether?{code}
[jira] [Updated] (HUDI-6538) Refactor methods in TimelineDiffHelper class
[ https://issues.apache.org/jira/browse/HUDI-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-6538:
-----------------------------
    Fix Version/s: 1.0.0

> Refactor methods in TimelineDiffHelper class
> --------------------------------------------
>
>                 Key: HUDI-6538
>                 URL: https://issues.apache.org/jira/browse/HUDI-6538
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Surya Prasanna Yalla
>            Assignee: Vova Kolmakov
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
> Refactor methods in the TimelineDiffHelper class to address the following comment in [PR-9007|https://github.com/apache/hudi/pull/9007]:
>
> {code:java}
> The methods getPendingReplaceCommitTransitions and
> getPendingLogCompactionTransitions look almost the same except the action
> type, can we abstract a little to merge them altogether?{code}
(hudi) branch master updated: [HUDI-6538] Refactor methods in TimelineDiffHelper class (#10938)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 44ab6f32bff [HUDI-6538] Refactor methods in TimelineDiffHelper class (#10938)

44ab6f32bff is described below

commit 44ab6f32bffbab8cd250bd0430d9591360f118e7
Author: wombatu-kun
AuthorDate: Mon Apr 1 12:47:27 2024 +0700

    [HUDI-6538] Refactor methods in TimelineDiffHelper class (#10938)
---
 .../common/table/timeline/TimelineDiffHelper.java | 66 +++---
 1 file changed, 21 insertions(+), 45 deletions(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineDiffHelper.java b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineDiffHelper.java
index aa7e2a30754..a98b71aa571 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineDiffHelper.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineDiffHelper.java
@@ -37,8 +37,11 @@ public class TimelineDiffHelper {

   private static final Logger LOG = LoggerFactory.getLogger(TimelineDiffHelper.class);

+  private TimelineDiffHelper() {
+  }
+
   public static TimelineDiffResult getNewInstantsForIncrementalSync(HoodieTimeline oldTimeline,
-      HoodieTimeline newTimeline) {
+                                                                    HoodieTimeline newTimeline) {
     HoodieTimeline oldT = oldTimeline.filterCompletedAndCompactionInstants();
     HoodieTimeline newT = newTimeline.filterCompletedAndCompactionInstants();
@@ -57,14 +60,14 @@ public class TimelineDiffHelper {
     List<HoodieInstant> newInstants = new ArrayList<>();

     // Check if any pending compaction is lost. If so, do not allow incremental timeline sync
-    List<Pair<HoodieInstant, HoodieInstant>> compactionInstants = getPendingCompactionTransitions(oldT, newT);
+    List<Pair<HoodieInstant, HoodieInstant>> compactionInstants = getPendingActionTransitions(oldT.filterPendingCompactionTimeline(),
+        newT, HoodieTimeline.COMMIT_ACTION, HoodieTimeline.COMPACTION_ACTION);
     List<HoodieInstant> lostPendingCompactions = compactionInstants.stream()
         .filter(instantPair -> instantPair.getValue() == null).map(Pair::getKey).collect(Collectors.toList());
     if (!lostPendingCompactions.isEmpty()) {
       // If a compaction is unscheduled, fall back to complete refresh of fs view since some log files could have been
       // moved. Its unsafe to incrementally sync in that case.
-      LOG.warn("Some pending compactions are no longer in new timeline (unscheduled ?). They are :" + lostPendingCompactions);
+      LOG.warn("Some pending compactions are no longer in new timeline (unscheduled ?). They are: {}", lostPendingCompactions);
       return TimelineDiffResult.UNSAFE_SYNC_RESULT;
     }
     List<HoodieInstant> finishedCompactionInstants = compactionInstants.stream()
@@ -74,7 +77,8 @@ public class TimelineDiffHelper {
     newTimeline.getInstantsAsStream().filter(instant -> !oldTimelineInstants.contains(instant)).forEach(newInstants::add);

-    List<Pair<HoodieInstant, HoodieInstant>> logCompactionInstants = getPendingLogCompactionTransitions(oldTimeline, newTimeline);
+    List<Pair<HoodieInstant, HoodieInstant>> logCompactionInstants = getPendingActionTransitions(oldTimeline.filterPendingLogCompactionTimeline(),
+        newTimeline, HoodieTimeline.DELTA_COMMIT_ACTION, HoodieTimeline.LOG_COMPACTION_ACTION);
     List<HoodieInstant> finishedOrRemovedLogCompactionInstants = logCompactionInstants.stream()
         .filter(instantPair -> !instantPair.getKey().isCompleted()
             && (instantPair.getValue() == null || instantPair.getValue().isCompleted()))
@@ -87,52 +91,24 @@ public class TimelineDiffHelper {
     }
   }

-  /**
-   * Getting pending log compaction transitions.
-   */
-  private static List<Pair<HoodieInstant, HoodieInstant>> getPendingLogCompactionTransitions(HoodieTimeline oldTimeline,
-                                                                                             HoodieTimeline newTimeline) {
-    Set<HoodieInstant> newTimelineInstants = newTimeline.getInstantsAsStream().collect(Collectors.toSet());
-
-    return oldTimeline.filterPendingLogCompactionTimeline().getInstantsAsStream().map(instant -> {
-      if (newTimelineInstants.contains(instant)) {
-        return Pair.of(instant, instant);
-      } else {
-        HoodieInstant logCompacted =
-            new HoodieInstant(State.COMPLETED, HoodieTimeline.DELTA_COMMIT_ACTION, instant.getTimestamp());
-        if (newTimelineInstants.contains(logCompacted)) {
-          return Pair.of(instant, logCompacted);
-        }
-        HoodieInstant inflightLogCompacted =
-            new HoodieInstant(State.INFLIGHT, HoodieTimeline.LOG_COMPACTION_ACTION, instant.getTimestamp());
-        if (newTimelineInstants.contains(inflightLogCompacted)) {
-          return Pair.of(instant, inflightLogCompacted);
-        }
-        return Pair.of(instant, null);
-      }
-    }).colle
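The merged helper introduced by this commit parameterizes the completed/inflight action names instead of keeping two near-identical copies of the lookup. A self-contained sketch of that pattern follows; `Instant` and `Map.Entry` are plain stand-ins assumed here in place of Hudi's `HoodieInstant` and `Pair`, so this is an illustration of the refactoring idea, not the actual Hudi code.

```java
import java.util.AbstractMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

public class TransitionSketch {

    // Minimal stand-in for HoodieInstant: state + action + timestamp.
    public static final class Instant {
        final String state, action, ts;
        public Instant(String state, String action, String ts) {
            this.state = state; this.action = action; this.ts = ts;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Instant)) return false;
            Instant i = (Instant) o;
            return state.equals(i.state) && action.equals(i.action) && ts.equals(i.ts);
        }
        @Override public int hashCode() { return Objects.hash(state, action, ts); }
    }

    // One method covers both compaction and log-compaction transitions:
    // the completed/inflight action names are parameters rather than being
    // hard-coded in two almost-identical methods. Each pending instant is
    // paired with whatever it transitioned to in the new timeline
    // (itself, a completed instant, an inflight instant, or null if gone).
    public static List<Map.Entry<Instant, Instant>> getPendingActionTransitions(
            List<Instant> pendingOld, Set<Instant> newTimeline,
            String completedAction, String pendingAction) {
        return pendingOld.stream().<Map.Entry<Instant, Instant>>map(instant -> {
            if (newTimeline.contains(instant)) {
                return new AbstractMap.SimpleEntry<>(instant, instant);
            }
            Instant completed = new Instant("COMPLETED", completedAction, instant.ts);
            if (newTimeline.contains(completed)) {
                return new AbstractMap.SimpleEntry<>(instant, completed);
            }
            Instant inflight = new Instant("INFLIGHT", pendingAction, instant.ts);
            if (newTimeline.contains(inflight)) {
                return new AbstractMap.SimpleEntry<>(instant, inflight);
            }
            return new AbstractMap.SimpleEntry<Instant, Instant>(instant, null);
        }).collect(Collectors.toList());
    }
}
```

For compaction the caller would pass the commit/compaction action names; for log compaction, the delta-commit/log-compaction names — which is exactly the shape of the two call sites in the diff above.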
Re: [PR] [HUDI-6538] Refactor methods in TimelineDiffHelper class [hudi]
danny0405 merged PR #10938:
URL: https://github.com/apache/hudi/pull/10938
Re: [I] Data lose after writing [hudi]
ad1happy2go commented on issue #10935:
URL: https://github.com/apache/hudi/issues/10935#issuecomment-2029175623

@wangzhongz The Hudi version you are using is too old. Is it possible for you to upgrade?
Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]
hudi-bot commented on PR #10945:
URL: https://github.com/apache/hudi/pull/10945#issuecomment-2029165793

## CI report:

* 1fdb25272d5d41970393eb9bc7632a697ca879af Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23062)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]
hudi-bot commented on PR #10945:
URL: https://github.com/apache/hudi/pull/10945#issuecomment-2029160232

## CI report:

* 1fdb25272d5d41970393eb9bc7632a697ca879af UNKNOWN

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Closed] (HUDI-7510) Loosen the compaction scheduling and rollback check for MDT
[ https://issues.apache.org/jira/browse/HUDI-7510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-7510.
----------------------------
    Resolution: Fixed

Fixed via master branch: 9b094e628d6e4b1157cdee6e5ae951a99d32921a

> Loosen the compaction scheduling and rollback check for MDT
> -----------------------------------------------------------
>
>                 Key: HUDI-7510
>                 URL: https://issues.apache.org/jira/browse/HUDI-7510
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: core, metadata, table-service
>            Reporter: Danny Chen
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
(hudi) branch master updated (26c00a3adef -> 9b094e628d6)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 26c00a3adef  [HUDI-7187] Fix integ test props to honor new streamer properties (#10866)
     add 9b094e628d6  [HUDI-7510] Loosen the compaction scheduling and rollback check for MDT (#10874)

No new revisions were added by this update.

Summary of changes:
 .../metadata/HoodieBackedTableMetadataWriter.java  | 74 -
 .../common/testutils/HoodieMetadataTestTable.java  | 1 -
 .../FlinkHoodieBackedTableMetadataWriter.java      | 19 ---
 .../hudi/client/TestJavaHoodieBackedMetadata.java  | 34 ++---
 .../hudi/testutils/TestHoodieMetadataBase.java     | 2 +-
 .../functional/TestHoodieBackedMetadata.java       | 95 +++-
 .../apache/hudi/io/TestHoodieTimelineArchiver.java | 165 +
 .../table/action/compact/CompactionTestBase.java   | 2 +-
 8 files changed, 202 insertions(+), 190 deletions(-)
Re: [PR] [HUDI-7510] Loosen the compaction scheduling and rollback check for MDT [hudi]
danny0405 merged PR #10874:
URL: https://github.com/apache/hudi/pull/10874
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
the-other-tim-brown commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545960527

## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:

Review Comment:
   Ok, I'm not familiar with that flow. If this is breaking that flow, I can just make a new method.
Re: [PR] [HUDI-6538] Refactor methods in TimelineDiffHelper class [hudi]
wombatu-kun commented on PR #10938:
URL: https://github.com/apache/hudi/pull/10938#issuecomment-2029157601

@nsivabalan this refactoring was made to address the code you proposed in a comment on another PR. Could you please review it?
Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]
wombatu-kun commented on code in PR #10942:
URL: https://github.com/apache/hudi/pull/10942#discussion_r1545958016

## hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaGlobalSortPartitioner.java:

@@ -31,12 +32,21 @@
  *
  * @param <T> HoodieRecordPayload type
  */
-public class JavaGlobalSortPartitioner<T>
-    implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+public class JavaGlobalSortPartitioner<T> implements BulkInsertPartitioner<List<HoodieRecord<T>>> {
+
+  public JavaGlobalSortPartitioner() {
+  }
+
+  /**
+   * Constructor to create as UserDefinedBulkInsertPartitioner class via reflection
+   * @param config HoodieWriteConfig
+   */

Review Comment:
   Yes, in this case the HoodieWriteConfig is ignored just because this partitioner is not configurable at all, but that does not mean it should not be usable as a `UserDefinedBulkInsertPartitioner`. So I think the purpose of this task is not to make all BulkInsertPartitioners customizable with HoodieWriteConfig, but only to make them instantiable via reflection with the already existing common approach for UserDefinedBulkInsertPartitioner (a constructor with HoodieWriteConfig as the only parameter). @nsivabalan am I right?
Re: [PR] [MINOR] Changing the Properties to Load From Both Default Path and Enviorment [hudi]
Amar1404 commented on PR #10835:
URL: https://github.com/apache/hudi/pull/10835#issuecomment-2029144280

@CTTY: In EMR the default conf is applied, but per the Hudi documentation, a conf specified via the ENV HUDI_DEFAULT_CONF was not applied due to the bug in the code, which I have fixed. Now the conf from the current thread is loaded first, then from the environment variable, then from the local system. The existing EMR configuration is still applied; the change only makes setting it via the ENV variable work as well.
[jira] [Created] (HUDI-7557) NoSuchElementException when savepoint has been removed or archived
Sagar Sumit created HUDI-7557:
---------------------------------

             Summary: NoSuchElementException when savepoint has been removed or archived
                 Key: HUDI-7557
                 URL: https://issues.apache.org/jira/browse/HUDI-7557
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Sagar Sumit
             Fix For: 0.15.0

This [block|https://github.com/apache/hudi/blob/26c00a3adefff9217187ca0ab9a5b2a7c9e42199/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L246-L249] of code is buggy when the commit that was savepointed has been removed or archived.

{code:java}
if (!instantOption.isPresent()) {
  LOG.warn("Skipping to process a commit for which savepoint was removed as the instant moved to archived timeline already");
}
HoodieInstant instant = instantOption.get(); {code}
Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]
danny0405 commented on code in PR #10942:
URL: https://github.com/apache/hudi/pull/10942#discussion_r1545945929

## hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaGlobalSortPartitioner.java:

Review Comment:
   I got confused because the "customized" `HoodieWriteConfig` does not really play a role here and is ignored?
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
danny0405 commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545945200

## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:

Review Comment:
   > You get a string that represents the json of the object, it does not do any validation on types/nullability

   I kind of remember we have some cases of converting the json into avro and then back to json again for our commit metadata.
[jira] [Updated] (HUDI-7552) Remove the suffix for MDT table service instants
[ https://issues.apache.org/jira/browse/HUDI-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7552:
---------------------------------
    Labels: pull-request-available  (was: )

> Remove the suffix for MDT table service instants
> -------------------------------------------------
>
>                 Key: HUDI-7552
>                 URL: https://issues.apache.org/jira/browse/HUDI-7552
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: core
>            Reporter: Danny Chen
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
[PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]
danny0405 opened a new pull request, #10945:
URL: https://github.com/apache/hudi/pull/10945

### Change Logs

Remove the suffix of MDT table operation instants (the async index operation is kept because there is still some validation on it; the suffix is used for efficient filtering). Also simplify the logic for MDT delta instant validation for the log reader.

### Impact

none

### Risk level (write none, low medium or high below)

low medium

### Documentation Update

none

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]
wombatu-kun commented on code in PR #10942:
URL: https://github.com/apache/hudi/pull/10942#discussion_r1545900885

## hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaGlobalSortPartitioner.java:
@@ -31,12 +32,21 @@
  *
  * @param HoodieRecordPayload type
  */
-public class JavaGlobalSortPartitioner
-    implements BulkInsertPartitioner>> {
+public class JavaGlobalSortPartitioner implements BulkInsertPartitioner>> {
+
+  public JavaGlobalSortPartitioner() {
+  }
+
+  /**
+   * Constructor to create as UserDefinedBulkInsertPartitioner class via reflection
+   * @param config HoodieWriteConfig

Review Comment:
This partitioner is instantiated when the user sets the write config property `hoodie.bulkinsert.user.defined.partitioner.class=org.apache.hudi.execution.bulkinsert.JavaGlobalSortPartitioner`. The constructor is called via reflection in the DataSourceUtils methods `createUserDefinedBulkInsertPartitioner(HoodieWriteConfig config)` and `createUserDefinedBulkInsertPartitionerWithRows(HoodieWriteConfig config)`:

private static Option createUserDefinedBulkInsertPartitioner(HoodieWriteConfig config) throws HoodieException {
  String bulkInsertPartitionerClass = config.getUserDefinedBulkInsertPartitionerClass();
  try {
    return StringUtils.isNullOrEmpty(bulkInsertPartitionerClass)
        ? Option.empty()
        : Option.of((BulkInsertPartitioner) ReflectionUtils.loadClass(bulkInsertPartitionerClass, config));
  } catch (Throwable e) {
    throw new HoodieException("Could not create UserDefinedBulkInsertPartitioner class " + bulkInsertPartitionerClass, e);
  }
}

There is nothing to customize in JavaGlobalSortPartitioner itself, but the provided write config is used, for example, to customize RowSpatialCurveSortPartitioner.
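The reflection path described in the comment above can be sketched in plain Java. This is a minimal, self-contained illustration, not Hudi code: `WriteConfig` and `MyPartitioner` are hypothetical stand-ins for `HoodieWriteConfig` and a user's partitioner class, and `loadClass` mirrors what `ReflectionUtils.loadClass(className, config)` does when `hoodie.bulkinsert.user.defined.partitioner.class` is set.

```java
import java.lang.reflect.Constructor;

// Sketch of reflection-based partitioner instantiation (hypothetical classes).
public class ReflectionLoadSketch {

  // Stand-in for HoodieWriteConfig.
  public static class WriteConfig {
  }

  // Stand-in for a user-defined partitioner. The config-taking constructor is
  // what HUDI-7526 adds to each partitioner, so any of them can be named in
  // hoodie.bulkinsert.user.defined.partitioner.class.
  public static class MyPartitioner {
    private final WriteConfig config;

    public MyPartitioner(WriteConfig config) {
      this.config = config;
    }
  }

  // Mirrors the loader: resolve the class by name, then invoke its
  // (WriteConfig) constructor reflectively.
  public static Object loadClass(String className, WriteConfig config) throws Exception {
    Class<?> clazz = Class.forName(className);
    Constructor<?> ctor = clazz.getConstructor(WriteConfig.class);
    return ctor.newInstance(config);
  }

  public static void main(String[] args) throws Exception {
    Object partitioner = loadClass("ReflectionLoadSketch$MyPartitioner", new WriteConfig());
    System.out.println(partitioner.getClass().getSimpleName());
  }
}
```

A partitioner without such a constructor fails here with `NoSuchMethodException`, which is the bug the PR fixes.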
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
xuzifu666 commented on code in PR #10898:
URL: https://github.com/apache/hudi/pull/10898#discussion_r1545892451

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
@@ -61,14 +75,63 @@ public Map loadBucketIdToFileIdMappingForPartitio
   if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
     bucketIdToFileIdMapping.put(bucketId, new HoodieRecordLocation(commitTime, fileId));
   } else {
+    // Finding the instants which conflict with the bucket id
+    Set instants = findTheConflictBucketIdInPartition(hoodieTable, partition, bucketId);
+
     // Check if bucket data is valid
     throw new HoodieIOException("Find multiple files at partition path="
-        + partition + " belongs to the same bucket id = " + bucketId);
+        + partition + " belongs to the same bucket id = " + bucketId
+        + ", these instants need to rollback: " + instants.toString()
+        + ", you can use rollback_to_instant procedure to recovery");
   }
 });
 return bucketIdToFileIdMapping;
}
+
+/**
+ * Find out the conflict files in bucket partition with bucekt id
+ */
+public HashSet findTheConflictBucketIdInPartition(HoodieTable hoodieTable, String partition, int bucketId) {
+  HashSet instants = new HashSet<>();

Review Comment:
From another view, this code path gets a TableFileSystemView from the HoodieTable without confirming the subclass is HoodieTableFileSystemView, and it does not return all pending instants, so I think getting the pending instants from the timeline may be better @danny0405

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java:
@@ -61,14 +75,63 @@ public Map loadBucketIdToFileIdMappingForPartitio
   if (!bucketIdToFileIdMapping.containsKey(bucketId)) {
     bucketIdToFileIdMapping.put(bucketId, new HoodieRecordLocation(commitTime, fileId));
   } else {
+    // Finding the instants which conflict with the bucket id
+    Set instants = findTheConflictBucketIdInPartition(hoodieTable, partition, bucketId);
+
     // Check if bucket data is valid
     throw new HoodieIOException("Find multiple files at partition path="
-        + partition + " belongs to the same bucket id = " + bucketId);
+        + partition + " belongs to the same bucket id = " + bucketId
+        + ", these instants need to rollback: " + instants.toString()
+        + ", you can use rollback_to_instant procedure to recovery");
   }
 });
 return bucketIdToFileIdMapping;
}
+
+/**
+ * Find out the conflict files in bucket partition with bucekt id
+ */
+public HashSet findTheConflictBucketIdInPartition(HoodieTable hoodieTable, String partition, int bucketId) {
+  HashSet instants = new HashSet<>();

Review Comment:
I tried HoodieTableFileSystemView#fetchLatestFileSlicesIncludingInflight to get the FileSlices of the partition, but it does not seem to filter the erroneous write instants out of the file slices. The current logic reliably finds the conflicting instants; could we keep it? @danny0405
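The duplicate-bucket-id check that this thread discusses can be illustrated without Hudi types. This is a simplified sketch under stated assumptions: plain `String` file ids stand in for `HoodieRecordLocation`, and the conflict-collection step is only hinted at in a comment; the real logic lives in `HoodieSimpleBucketIndex#loadBucketIdToFileIdMappingForPartition`.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the first file seen for a bucket id wins; a second file mapped to
// the same bucket id signals a corrupted bucket layout and aborts the load.
public class BucketMappingSketch {

  public static Map<Integer, String> mapBucketsToFiles(int[] bucketIds, String[] fileIds) {
    Map<Integer, String> mapping = new HashMap<>();
    for (int i = 0; i < bucketIds.length; i++) {
      if (mapping.containsKey(bucketIds[i])) {
        // The PR additionally collects the conflicting instants here so the
        // error message can tell the user what to roll back.
        throw new IllegalStateException("Found multiple files for bucket id = " + bucketIds[i]);
      }
      mapping.put(bucketIds[i], fileIds[i]);
    }
    return mapping;
  }

  public static void main(String[] args) {
    Map<Integer, String> ok = mapBucketsToFiles(new int[] {0, 1}, new String[] {"f0", "f1"});
    System.out.println(ok.size());
  }
}
```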
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
hudi-bot commented on PR #10943:
URL: https://github.com/apache/hudi/pull/10943#issuecomment-2029024584

## CI report:

* 70a35f705b74db87648f3f6a7e504614db6416aa Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23061)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[I] [SUGGEST] Can the community version be updated regularly and faster? The roadmap should also be updated regularly and synchronized. [hudi]
zyclove opened a new issue, #10944:
URL: https://github.com/apache/hudi/issues/10944

The version updates are too slow; there has not been a release for a long time, and many problems have to be solved in time but support is not available.

https://github.com/apache/hudi/assets/15028279/96958d33-83ea-4282-afe6-b994ce9ff905
https://github.com/apache/hudi/assets/15028279/09386936-55b9-48d2-a144-18aafb12ca29

1.0 is originally a beta version with many problems. There has been no new version for so long, so when will the official version be available? The Hudi roadmap has also not been updated for a long time: https://hudi.apache.org/roadmap

https://github.com/apache/hudi/assets/15028279/c3bded86-4445-4707-83c9-eb56f40be918

I am very optimistic about the positioning and development of Hudi, and I sincerely hope that Hudi will develop better and better and truly solve the pain points of data lake business.

Best regards
(hudi) branch asf-site updated: rs - cow snap, mor ro; starrocks - cow snap, mor rt, ro (#10940)
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new e8d498d2199 rs - cow snap, mor ro; starrocks - cow snap, mor rt, ro (#10940)
e8d498d2199 is described below

commit e8d498d21998ea1c1005e6350e8828eb6842dcba
Author: Sagar Lakshmipathy <18vidhyasa...@gmail.com>
AuthorDate: Sun Mar 31 18:44:24 2024 -0700

    rs - cow snap, mor ro; starrocks - cow snap, mor rt, ro (#10940)
---
 website/docs/sql_queries.md                        | 36 ++
 .../version-0.12.0/query_engine_setup.md           |  6 ++--
 .../versioned_docs/version-0.12.0/querying_data.md |  2 ++
 .../version-0.12.1/query_engine_setup.md           |  6 ++--
 .../versioned_docs/version-0.12.1/querying_data.md |  2 ++
 .../version-0.12.2/query_engine_setup.md           |  6 ++--
 .../versioned_docs/version-0.12.2/querying_data.md |  2 ++
 .../version-0.12.3/query_engine_setup.md           |  6 ++--
 .../versioned_docs/version-0.12.3/querying_data.md |  3 +-
 .../version-0.13.0/query_engine_setup.md           |  6 ++--
 .../versioned_docs/version-0.13.0/querying_data.md |  3 +-
 .../versioned_docs/version-0.13.1/querying_data.md | 31 +--
 .../versioned_docs/version-0.14.0/sql_queries.md   | 36 ++
 .../versioned_docs/version-0.14.1/sql_queries.md   | 36 ++
 14 files changed, 85 insertions(+), 96 deletions(-)

diff --git a/website/docs/sql_queries.md b/website/docs/sql_queries.md
index d833831169b..2180b40a48d 100644
--- a/website/docs/sql_queries.md
+++ b/website/docs/sql_queries.md
@@ -344,10 +344,8 @@ The current default supported version of Hudi is 0.10.0 ~ 0.13.1, and has not be
 ## StarRocks
-Copy on Write tables in Apache Hudi 0.10.0 and above can be queried via StarRocks external tables from StarRocks version
-2.2.0. Only snapshot queries are supported currently. In future releases Merge on Read tables will also be supported.
-Please refer to [StarRocks Hudi external table](https://docs.starrocks.io/en-us/latest/using_starrocks/External_table#hudi-external-table)
-for more details on the setup.
+For Copy-on-Write tables StarRocks provides support for Snapshot queries and for Merge-on-Read tables, StarRocks provides support for Snapshot and Read Optimized queries.
+Please refer [StarRocks docs](https://docs.starrocks.io/docs/data_source/catalog/hudi_catalog/) for more details.
 ## ClickHouse
@@ -386,20 +384,20 @@ Following tables show whether a given query is supported on specific query engin
 ### Merge-On-Read tables
-| Query Engine|Snapshot Queries|Incremental Queries|Read Optimized Queries|
-|-||---|--|
-| **Hive**|Y|Y|Y|
-| **Spark SQL** |Y|Y|Y|
-| **Spark Datasource** |Y|Y|Y|
-| **Flink SQL** |Y|Y|Y|
-| **PrestoDB**|Y|N|Y|
-| **AWS Athena** |Y|N|Y|
-| **Big Query** |Y|N|Y|
-| **Trino** |N|N|Y|
-| **Impala** |N|N|Y|
-| **Redshift Spectrum** |N|N|N|
-| **Doris** |Y|N|Y|
-| **StarRocks** |N|N|N|
-| **ClickHouse** |N|N|N|
+| Query Engine| Snapshot Queries |Incremental Queries| Read Optimized Queries |
+|-|--|---||
+| **Hive**| Y|Y| Y |
+| **Spark SQL** | Y|Y| Y |
+| **Spark Datasource** | Y|Y| Y |
+| **Flink SQL** | Y|Y| Y |
+| **PrestoDB**| Y|N| Y |
+| **AWS Athena** | Y|N| Y |
+| **Big Query** | Y|N| Y |
+| **Trino** | N|N| Y |
+| **Impala** | N|N| Y |
+| **Redshift Spectrum** | N|N| Y |
+| **Doris** | Y|N| Y |
+| **StarRocks** | Y|N| Y |
+| **ClickHouse** | N|N| N |

diff --git a/website/versioned_docs/version-0.12.0/query_engine_setup.md b/website/versioned_docs/version-0.12.0/query_engine_setup.md
index 79dfaf81233..47eaeaa27c5 100644
--- a/website/versioned_docs/version-0.12.0/query_engine_setup.md
+++ b/website/versioned_docs/version-0.12.0/query_engine_setup.md
@@ -127,7 +127,5 @@ Please refer to [Redshift Spectrum Integration with Apache Hudi](https://docs.aw
 for more details.
 ## StarRocks
-Copy on Write tables in Apache Hudi 0.10.0 and above can be queried via StarRocks external tables from StarRocks version 2.2.0.
-Only snapshot queries are supported currently
Re: [PR] [MINOR] [DOCS] changes to redshift & starrocks compat matrix [hudi]
bhasudha merged PR #10940:
URL: https://github.com/apache/hudi/pull/10940
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
the-other-tim-brown commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545880874

## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, boolean pretty) throws IOException {
     DatumWriter writer = new GenericDatumWriter<>(record.getSchema());
     ByteArrayOutputStream out = new ByteArrayOutputStream();
-    JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-    writer.write(record, jsonEncoder);
-    jsonEncoder.flush();
-    return out;
+    try {
+      JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+      writer.write(record, jsonEncoder);
+      jsonEncoder.flush();
+      return out;
+    } catch (ClassCastException | NullPointerException ex) {
+      // NullPointerException will be thrown in cases where the field values are missing
+      // ClassCastException will be thrown in cases where the field values do not match the schema type
+      // Fallback to using `toString` which also returns json but without a pretty-print option
+      out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
> So when the transformed JSON string got converted back into avro, the schema could change right?

The case here is when you have some data and are trying to convert it to avro and it fails. https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamerUtils.java#L164
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
the-other-tim-brown commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545880266

## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, boolean pretty) throws IOException {
     DatumWriter writer = new GenericDatumWriter<>(record.getSchema());
     ByteArrayOutputStream out = new ByteArrayOutputStream();
-    JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-    writer.write(record, jsonEncoder);
-    jsonEncoder.flush();
-    return out;
+    try {
+      JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+      writer.write(record, jsonEncoder);
+      jsonEncoder.flush();
+      return out;
+    } catch (ClassCastException | NullPointerException ex) {
+      // NullPointerException will be thrown in cases where the field values are missing
+      // ClassCastException will be thrown in cases where the field values do not match the schema type
+      // Fallback to using `toString` which also returns json but without a pretty-print option
+      out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
You get a string that represents the json of the object; it does not do any validation on types/nullability. See the tests that are added for a sample.
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
danny0405 commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545879905

## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, boolean pretty) throws IOException {
     DatumWriter writer = new GenericDatumWriter<>(record.getSchema());
     ByteArrayOutputStream out = new ByteArrayOutputStream();
-    JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-    writer.write(record, jsonEncoder);
-    jsonEncoder.flush();
-    return out;
+    try {
+      JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+      writer.write(record, jsonEncoder);
+      jsonEncoder.flush();
+      return out;
+    } catch (ClassCastException | NullPointerException ex) {
+      // NullPointerException will be thrown in cases where the field values are missing
+      // ClassCastException will be thrown in cases where the field values do not match the schema type
+      // Fallback to using `toString` which also returns json but without a pretty-print option
+      out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
So when the transformed JSON string got converted back into avro, the schema could change right?
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
danny0405 commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545879712

## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, boolean pretty) throws IOException {
     DatumWriter writer = new GenericDatumWriter<>(record.getSchema());
     ByteArrayOutputStream out = new ByteArrayOutputStream();
-    JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-    writer.write(record, jsonEncoder);
-    jsonEncoder.flush();
-    return out;
+    try {
+      JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+      writer.write(record, jsonEncoder);
+      jsonEncoder.flush();
+      return out;
+    } catch (ClassCastException | NullPointerException ex) {
+      // NullPointerException will be thrown in cases where the field values are missing
+      // ClassCastException will be thrown in cases where the field values do not match the schema type
+      // Fallback to using `toString` which also returns json but without a pretty-print option
+      out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
Hmm, seems like a `null` constant for an empty field.
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
hudi-bot commented on PR #10943:
URL: https://github.com/apache/hudi/pull/10943#issuecomment-2028989292

## CI report:

* 70a35f705b74db87648f3f6a7e504614db6416aa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23061)
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
danny0405 commented on code in PR #10943:
URL: https://github.com/apache/hudi/pull/10943#discussion_r1545879498

## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, boolean pretty) throws IOException {
     DatumWriter writer = new GenericDatumWriter<>(record.getSchema());
     ByteArrayOutputStream out = new ByteArrayOutputStream();
-    JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-    writer.write(record, jsonEncoder);
-    jsonEncoder.flush();
-    return out;
+    try {
+      JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+      writer.write(record, jsonEncoder);
+      jsonEncoder.flush();
+      return out;
+    } catch (ClassCastException | NullPointerException ex) {
+      // NullPointerException will be thrown in cases where the field values are missing
+      // ClassCastException will be thrown in cases where the field values do not match the schema type
+      // Fallback to using `toString` which also returns json but without a pretty-print option
+      out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
What do we get for `record.toString` when `NullPointerException` is thrown?
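The fallback pattern debated in this thread can be sketched without the Avro dependency. This is a hedged, self-contained illustration, not Hudi code: `strictEncode` is a hypothetical stand-in for Avro's schema-validating `JsonEncoder` path, which throws `NullPointerException` for a missing field value and `ClassCastException` for a mistyped one, while the catch block falls back to the record's lenient `toString()` representation.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of "strict encode, fall back to lenient toString" (hypothetical types).
public class JsonFallbackSketch {

  // Stand-in for the schema-aware JsonEncoder path.
  static String strictEncode(Map<String, Object> record) {
    Object id = record.get("id");
    if (id == null) {
      // Mirrors the NullPointerException thrown for a missing field value.
      throw new NullPointerException("null value for required field: id");
    }
    // Mirrors the ClassCastException for a value mismatching the schema type.
    long idValue = (Long) id;
    return "{\"id\": " + idValue + "}";
  }

  public static byte[] toJson(Map<String, Object> record) {
    try {
      return strictEncode(record).getBytes(StandardCharsets.UTF_8);
    } catch (ClassCastException | NullPointerException ex) {
      // Lenient fallback: still JSON-like, but with no type/nullability
      // validation and no pretty-print, as described in the review thread.
      return record.toString().getBytes(StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) {
    Map<String, Object> good = new LinkedHashMap<>();
    good.put("id", 7L);
    System.out.println(new String(toJson(good), StandardCharsets.UTF_8));

    Map<String, Object> malformed = new LinkedHashMap<>();
    malformed.put("id", "not-a-long");
    System.out.println(new String(toJson(malformed), StandardCharsets.UTF_8));
  }
}
```

This also answers the shape of danny0405's question above: the fallback output is whatever the record's `toString()` produces, with no guarantee it round-trips through the schema.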
Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]
danny0405 commented on code in PR #10942:
URL: https://github.com/apache/hudi/pull/10942#discussion_r1545879218

## hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaGlobalSortPartitioner.java:
@@ -31,12 +32,21 @@
  *
  * @param HoodieRecordPayload type
  */
-public class JavaGlobalSortPartitioner
-    implements BulkInsertPartitioner>> {
+public class JavaGlobalSortPartitioner implements BulkInsertPartitioner>> {
+
+  public JavaGlobalSortPartitioner() {
+  }
+
+  /**
+   * Constructor to create as UserDefinedBulkInsertPartitioner class via reflection
+   * @param config HoodieWriteConfig

Review Comment:
Can you give an example of how this partitioner gets instantiated and customized?
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
hudi-bot commented on PR #10943:
URL: https://github.com/apache/hudi/pull/10943#issuecomment-2028985048

## CI report:

* 70a35f705b74db87648f3f6a7e504614db6416aa UNKNOWN
[PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
the-other-tim-brown opened a new pull request, #10943:
URL: https://github.com/apache/hudi/pull/10943

### Change Logs

Handles cases of missing required fields and bad input values when converting to JSON. This conversion is used in combination with the Error Table, so you cannot assume that the records are properly formatted.

### Impact

Avoids exceptions being thrown for malformed input data being sent to the error table writer

### Risk level (write none, low medium or high below)

None

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]
hudi-bot commented on PR #10942:
URL: https://github.com/apache/hudi/pull/10942#issuecomment-2028820679

## CI report:

* ea11f68c1778f9ec23eab6a887076e51f60caa0b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23060)
Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]
hudi-bot commented on PR #10942:
URL: https://github.com/apache/hudi/pull/10942#issuecomment-2028787238

## CI report:

* ea11f68c1778f9ec23eab6a887076e51f60caa0b UNKNOWN
[jira] [Updated] (HUDI-7526) Fix constructors for all bulk insert sort partitioners to ensure we could use it as user defined partitioners
[ https://issues.apache.org/jira/browse/HUDI-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7526: - Labels: pull-request-available (was: ) > Fix constructors for all bulk insert sort partitioners to ensure we could use > it as user defined partitioners > -- > > Key: HUDI-7526 > URL: https://issues.apache.org/jira/browse/HUDI-7526 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Reporter: sivabalan narayanan >Assignee: Vova Kolmakov >Priority: Major > Labels: pull-request-available > > Our constructor for user defined sort partitioner takes in write config, > while some of the partitioners used in out of the box sort mode, does not > account for it. > > Lets fix the sort partitioners to ensure anything can be used as user defined > partitioners. > For eg, NoneSortMode does not have a constructor that takes in write config -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]
wombatu-kun opened a new pull request, #10942: URL: https://github.com/apache/hudi/pull/10942 ### Change Logs The constructor for a user-defined sort partitioner takes in the write config, while some of the partitioners used in the out-of-the-box sort modes do not account for it. Let's fix the sort partitioners so that any of them can be used as a user-defined partitioner. For example, NoneSortMode does not have a constructor that takes in the write config. ### Impact none ### Risk level (write none, low medium or high below) none ### Documentation Update none - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
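The constructor mismatch the PR describes can be sketched generically. This is an illustrative example only: the class names below are stand-ins, not Hudi's actual `BulkInsertPartitioner` implementations or its reflection utility. The pattern is to prefer a constructor that accepts the write config and fall back to the no-arg constructor, so a partitioner like the `NoneSortMode` one (which lacks a config constructor) can still be instantiated by the same code path.

```java
import java.lang.reflect.Constructor;

// Illustrative sketch (not Hudi's actual reflection utility): instantiate a
// partitioner by preferring a config-accepting constructor and falling back
// to the no-arg constructor when none exists.
public class PartitionerLoader {
    // Stand-in for HoodieWriteConfig in this sketch.
    public static class Config { }

    // Example partitioner with only a no-arg constructor, analogous to the
    // NoneSortMode case described in the PR.
    public static class NoOpPartitioner {
        public NoOpPartitioner() { }
    }

    // Example partitioner whose constructor accepts the write config.
    public static class ConfigAwarePartitioner {
        final Config config;
        public ConfigAwarePartitioner(Config config) { this.config = config; }
    }

    static Object instantiate(Class<?> clazz, Config config) throws Exception {
        try {
            // Preferred path: constructor that takes the write config.
            Constructor<?> ctor = clazz.getConstructor(Config.class);
            return ctor.newInstance(config);
        } catch (NoSuchMethodException e) {
            // Fallback: no-arg constructor.
            return clazz.getConstructor().newInstance();
        }
    }

    public static void main(String[] args) throws Exception {
        Config cfg = new Config();
        // Both partitioners instantiate successfully through the same entry point.
        System.out.println(instantiate(NoOpPartitioner.class, cfg).getClass().getSimpleName());
        System.out.println(instantiate(ConfigAwarePartitioner.class, cfg).getClass().getSimpleName());
    }
}
```

The fix in the PR goes the other direction (adding the missing config-accepting constructors to the built-in partitioners), but either side of the contract has to give: the instantiation code and the partitioner constructors must agree on an accepted signature.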
[jira] [Assigned] (HUDI-7526) Fix constructors for all bulk insert sort partitioners to ensure we could use it as user defined partitioners
[ https://issues.apache.org/jira/browse/HUDI-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov reassigned HUDI-7526: --- Assignee: Vova Kolmakov > Fix constructors for all bulk insert sort partitioners to ensure we could use > it as user defined partitioners > -- > > Key: HUDI-7526 > URL: https://issues.apache.org/jira/browse/HUDI-7526 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Reporter: sivabalan narayanan >Assignee: Vova Kolmakov >Priority: Major > > Our constructor for user defined sort partitioner takes in write config, > while some of the partitioners used in out of the box sort mode, does not > account for it. > > Lets fix the sort partitioners to ensure anything can be used as user defined > partitioners. > For eg, NoneSortMode does not have a constructor that takes in write config
[jira] [Updated] (HUDI-7526) Fix constructors for all bulk insert sort partitioners to ensure we could use it as user defined partitioners
[ https://issues.apache.org/jira/browse/HUDI-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov updated HUDI-7526: Status: In Progress (was: Open) > Fix constructors for all bulk insert sort partitioners to ensure we could use > it as user defined partitioners > -- > > Key: HUDI-7526 > URL: https://issues.apache.org/jira/browse/HUDI-7526 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Reporter: sivabalan narayanan >Assignee: Vova Kolmakov >Priority: Major > > Our constructor for user defined sort partitioner takes in write config, > while some of the partitioners used in out of the box sort mode, does not > account for it. > > Lets fix the sort partitioners to ensure anything can be used as user defined > partitioners. > For eg, NoneSortMode does not have a constructor that takes in write config