Re: [PR] [HUDI-6757] Fix compaction execution terminated in async threads in flink bounded… [hudi]
flashJd commented on PR #9544: URL: https://github.com/apache/hudi/pull/9544#issuecomment-2031185874 > yeah, @flashJd is there any possibility you can rebase with the latest master? @danny0405 you can collaborate, I haven't paid attention to Hudi for several months -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT [hudi]
hudi-bot commented on PR #10949: URL: https://github.com/apache/hudi/pull/10949#issuecomment-2031142489 ## CI report: * c344e38bfcfea10fb1556a4d335af1b5b92da6ee Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23077) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]
melin commented on PR #9717: URL: https://github.com/apache/hudi/pull/9717#issuecomment-2031122042 > The 0.15.0 release branch is planned to be cut this month once we verify engine integrations. When will it be released?
Re: [PR] [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT [hudi]
hudi-bot commented on PR #10949: URL: https://github.com/apache/hudi/pull/10949#issuecomment-2031095571 ## CI report: * 1984e34cf984ca5088cd921e26cd3d74421afb03 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23070) * c344e38bfcfea10fb1556a4d335af1b5b92da6ee Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23077) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT [hudi]
hudi-bot commented on PR #10949: URL: https://github.com/apache/hudi/pull/10949#issuecomment-2031089981 ## CI report: * 1984e34cf984ca5088cd921e26cd3d74421afb03 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23070) * c344e38bfcfea10fb1556a4d335af1b5b92da6ee UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
hudi-bot commented on PR #10898: URL: https://github.com/apache/hudi/pull/10898#issuecomment-2031039502 ## CI report: * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN * 208249c7f8164e434a8760d64678ab86295a26fc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23076) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
hudi-bot commented on PR #10898: URL: https://github.com/apache/hudi/pull/10898#issuecomment-2031034219 ## CI report: * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN * 029a6466f51d1ad0103521c45639aaf2e47240c9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23075) * 208249c7f8164e434a8760d64678ab86295a26fc UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
hudi-bot commented on PR #10898: URL: https://github.com/apache/hudi/pull/10898#issuecomment-2031028647 ## CI report: * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN * 029a6466f51d1ad0103521c45639aaf2e47240c9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23075) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
hudi-bot commented on PR #10898: URL: https://github.com/apache/hudi/pull/10898#issuecomment-2030997720 ## CI report: * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN * 3dc06097b480a32194508bb1d1edd6f4806feeec Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23074) * 029a6466f51d1ad0103521c45639aaf2e47240c9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Resolved] (HUDI-4699) Primary key-less data model
[ https://issues.apache.org/jira/browse/HUDI-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-4699. --- > Primary key-less data model > --- > > Key: HUDI-4699 > URL: https://issues.apache.org/jira/browse/HUDI-4699 > Project: Apache Hudi > Issue Type: Epic > Components: writer-core >Reporter: Sagar Sumit >Priority: Major > Labels: pull-request-available > > Hudi requires users to specify a primary key field. Can we do away with this > requirement? This epic tracks the work to support use cases which does not > require primary key based data modelling. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-4699) Primary key-less data model
[ https://issues.apache.org/jira/browse/HUDI-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-4699. - Fix Version/s: 0.14.0 Resolution: Fixed > Primary key-less data model > --- > > Key: HUDI-4699 > URL: https://issues.apache.org/jira/browse/HUDI-4699 > Project: Apache Hudi > Issue Type: Epic > Components: writer-core >Reporter: Sagar Sumit >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > Hudi requires users to specify a primary key field. Can we do away with this > requirement? This epic tracks the work to support use cases which does not > require primary key based data modelling.
[jira] [Reopened] (HUDI-4699) Primary key-less data model
[ https://issues.apache.org/jira/browse/HUDI-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reopened HUDI-4699: --- Assignee: sivabalan narayanan > Primary key-less data model > --- > > Key: HUDI-4699 > URL: https://issues.apache.org/jira/browse/HUDI-4699 > Project: Apache Hudi > Issue Type: Epic > Components: writer-core >Reporter: Sagar Sumit >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > Hudi requires users to specify a primary key field. Can we do away with this > requirement? This epic tracks the work to support use cases which does not > require primary key based data modelling.
Re: [I] [SUGGEST] Can the community version be updated regularly and faster? The roadmap should also be updated regularly and synchronized. [hudi]
danny0405 commented on issue #10944: URL: https://github.com/apache/hudi/issues/10944#issuecomment-2030996714 Thanks for the note. We are working hard to prepare a GA release for 1.0 and want it to be in good shape, which is why the wait is long compared to other releases. We will update the roadmap soon; thanks again for the reminder.
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
danny0405 commented on code in PR #10943: URL: https://github.com/apache/hudi/pull/10943#discussion_r1547081375

## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java: ##
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, boolean pretty) throws IOException {
     DatumWriter writer = new GenericDatumWriter<>(record.getSchema());
     ByteArrayOutputStream out = new ByteArrayOutputStream();
-    JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-    writer.write(record, jsonEncoder);
-    jsonEncoder.flush();
-    return out;
+    try {
+      JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+      writer.write(record, jsonEncoder);
+      jsonEncoder.flush();
+      return out;
+    } catch (ClassCastException | NullPointerException ex) {
+      // NullPointerException will be thrown in cases where the field values are missing
+      // ClassCastException will be thrown in cases where the field values do not match the schema type
+      // Fallback to using `toString` which also returns json but without a pretty-print option
+      out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
> think I've convinced myself there should just be a new method like "safeToJson" that does not throw an exception that we use in the error table/writer cases since those are not as critical to Hudi.

+1
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
hudi-bot commented on PR #10898: URL: https://github.com/apache/hudi/pull/10898#issuecomment-2030992946 ## CI report: * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN * 627ddbeabf3e1886f64f1432499003f39ddba49c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23073) * 3dc06097b480a32194508bb1d1edd6f4806feeec UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
hudi-bot commented on PR #10898: URL: https://github.com/apache/hudi/pull/10898#issuecomment-2030985254 ## CI report: * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN * 627ddbeabf3e1886f64f1432499003f39ddba49c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23073) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-7559) Fix functional index (on column stats): Handle NPE in filterQueriesWithRecordKey(...)
[ https://issues.apache.org/jira/browse/HUDI-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinaykumar Bhat updated HUDI-7559: -- Description: `RecordLevelIndexSupport::filterQueryWithRecordKey(...)` throws NPE which is then subsequently ignored by `lookupCandidateFilesInMetadataTable()` rendering every other index (like FunctionalIndex, ColStat Index) to not be used for data skipping (i.e pruning files) (was: `RecordLevelIndexSupport::filterQueryWithRecordKey(...)` throws NPE which is then subsequently `lookupCandidateFilesInMetadataTable()` rendering every other index (like FunctionalIndex, ColStat Index) to not be used for data skipping (i.e pruning files)) > Fix functional index (on column stats): Handle NPE in > filterQueriesWithRecordKey(...) > - > > Key: HUDI-7559 > URL: https://issues.apache.org/jira/browse/HUDI-7559 > Project: Apache Hudi > Issue Type: Bug >Reporter: Vinaykumar Bhat >Assignee: Vinaykumar Bhat >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > `RecordLevelIndexSupport::filterQueryWithRecordKey(...)` throws NPE which is > then subsequently ignored by `lookupCandidateFilesInMetadataTable()` > rendering every other index (like FunctionalIndex, ColStat Index) to not be > used for data skipping (i.e pruning files)
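The failure mode described in the issue above, where one index's NPE is swallowed and no other index gets used, can be illustrated with a small sketch. This is not the Hudi code: `IndexLookupSketch`, `lookupAllOrNothing`, and `lookupIsolated` are hypothetical names, and each index is modelled as a plain `Supplier` that returns candidate files. It only shows, under those assumptions, why the scope of the exception handling matters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

final class IndexLookupSketch {

    // One try block around every index lookup: an NPE from the first index
    // is swallowed and the remaining indexes are never consulted.
    static Optional<String> lookupAllOrNothing(List<Supplier<Optional<String>>> indexes) {
        try {
            for (Supplier<Optional<String>> index : indexes) {
                Optional<String> candidates = index.get();
                if (candidates.isPresent()) {
                    return candidates;
                }
            }
        } catch (NullPointerException e) {
            // Ignored: falls through to "no data skipping".
        }
        return Optional.empty();
    }

    // Failure is contained per index, so later indexes still get a chance.
    static Optional<String> lookupIsolated(List<Supplier<Optional<String>>> indexes) {
        for (Supplier<Optional<String>> index : indexes) {
            try {
                Optional<String> candidates = index.get();
                if (candidates.isPresent()) {
                    return candidates;
                }
            } catch (NullPointerException e) {
                // Skip only the index that failed.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Supplier<Optional<String>>> indexes = Arrays.asList(
            () -> { throw new NullPointerException("record key filter"); },
            () -> Optional.of("file-1"));
        System.out.println(lookupAllOrNothing(indexes).isPresent());
        System.out.println(lookupIsolated(indexes).isPresent());
    }
}
```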
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
xuzifu666 closed pull request #10898: [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple URL: https://github.com/apache/hudi/pull/10898
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
hudi-bot commented on PR #10898: URL: https://github.com/apache/hudi/pull/10898#issuecomment-2030954340 ## CI report: * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN * de9c573008c76367234cd859ca80ee165556e954 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23072) * 627ddbeabf3e1886f64f1432499003f39ddba49c UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]
hudi-bot commented on PR #10945: URL: https://github.com/apache/hudi/pull/10945#issuecomment-2030948488 ## CI report: * 6c3830bb4de1887f41aebc139b3fc837e446ead5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23071) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
hudi-bot commented on PR #10898: URL: https://github.com/apache/hudi/pull/10898#issuecomment-2030948345 ## CI report: * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN * bc4fe83062daefe310b394a0d9b698a8c950c068 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23049) * de9c573008c76367234cd859ca80ee165556e954 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23072) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
hudi-bot commented on PR #10898: URL: https://github.com/apache/hudi/pull/10898#issuecomment-2030941589 ## CI report: * e9fc630d3a8999c7ef0db7bd94da910b1f77df7d UNKNOWN * b7011691a07deb288ce0341dcd55bb6feeb4101d UNKNOWN * bc4fe83062daefe310b394a0d9b698a8c950c068 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23049) * de9c573008c76367234cd859ca80ee165556e954 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-7559) Fix functional index (on column stats): Handle NPE in filterQueriesWithRecordKey(...)
[ https://issues.apache.org/jira/browse/HUDI-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832988#comment-17832988 ] Vinoth Chandar commented on HUDI-7559: -- [~codope] Hows this different from what we tested for beta1? > Fix functional index (on column stats): Handle NPE in > filterQueriesWithRecordKey(...) > - > > Key: HUDI-7559 > URL: https://issues.apache.org/jira/browse/HUDI-7559 > Project: Apache Hudi > Issue Type: Bug >Reporter: Vinaykumar Bhat >Assignee: Vinaykumar Bhat >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > `RecordLevelIndexSupport::filterQueryWithRecordKey(...)` throws NPE which is > then subsequently `lookupCandidateFilesInMetadataTable()` rendering every > other index (like FunctionalIndex, ColStat Index) to not be used for data > skipping (i.e pruning files)
[jira] [Updated] (HUDI-7559) Fix functional index (on column stats): Handle NPE in filterQueriesWithRecordKey(...)
[ https://issues.apache.org/jira/browse/HUDI-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-7559: - Sprint: Sprint 2024-03-25 > Fix functional index (on column stats): Handle NPE in > filterQueriesWithRecordKey(...) > - > > Key: HUDI-7559 > URL: https://issues.apache.org/jira/browse/HUDI-7559 > Project: Apache Hudi > Issue Type: Bug >Reporter: Vinaykumar Bhat >Assignee: Vinaykumar Bhat >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > `RecordLevelIndexSupport::filterQueryWithRecordKey(...)` throws NPE which is > then subsequently `lookupCandidateFilesInMetadataTable()` rendering every > other index (like FunctionalIndex, ColStat Index) to not be used for data > skipping (i.e pruning files)
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
xuzifu666 commented on PR #10898: URL: https://github.com/apache/hudi/pull/10898#issuecomment-2030921118

> @xuzifu666 @danny0405 @beyond1920 I think we should solve the root cause of bucket duplication. There are currently three situations in which bucket file duplication occurs:
>
> 1. Spark speculative execution. Turning off speculative execution solves this problem.
> 2. The Hudi archiver deleting the completed timeline in parallel. 1.0 has solved this problem.
> 3. Concurrent insert overwrite from multiple Spark writers. This is a bug that needs to be fixed.
>
> Now focus on scenario 3: concurrent insert overwrite from multiple Spark writers. When Hudi builds a file slice, it calls isFileSliceCommitted to determine whether the current file is committed.
>
> ```
>   /**
>    * A FileSlice is considered committed, if one of the following is true - There is a committed data file - There are
>    * some log files, that are based off a commit or delta commit.
>    */
>   private boolean isFileSliceCommitted(FileSlice slice) {
>     if (!compareTimestamps(slice.getBaseInstantTime(), LESSER_THAN_OR_EQUALS, lastInstant.get().getTimestamp())) {
>       return false;
>     }
>
>     return timeline.containsOrBeforeTimelineStarts(slice.getBaseInstantTime());
>   }
> ```
>
> This is fine for the single-writer scenario, but with multiple writers the logic of isFileSliceCommitted has a bug. If a file has a commit time smaller than the smallest completed commit, Hudi directly treats the file as committed, even if it is a garbage file (a file generated by a failed write).
>
> E.g.: two Spark apps insert overwrite a Hudi BUCKET table in the same partition.
> app1: starts write commit at 0001, writes file 0--uuid1.parquet
> app2: starts write commit at 0002, writes file 0--uuid2.parquet
> app1 may fail to write due to OCC/cancel/OOM, but 0--uuid1.parquet is already written. When Hudi builds the file slice, 0--uuid1.parquet is considered committed, since its commit time 0001 < smallest completed commit 0002.
> This is wrong: commit time 0001 is not committed. Maybe we can modify isFileSliceCommitted like this:
>
> ```
>   private boolean isFileSliceCommitted(FileSlice slice) {
>     if (!compareTimestamps(slice.getBaseInstantTime(), LESSER_THAN_OR_EQUALS, lastInstant.get().getTimestamp())) {
>       return false;
>     }
>
>     return timeline.containsOrBeforeTimelineStarts(slice.getBaseInstantTime()) && UncompleteTimelineNotContains(slice.getBaseInstantTime());
>   }
> ```
>
> Finally, I think Hudi's file slices should be managed uniformly, just like Iceberg/Delta Lake, rather than being obtained through list operations.

Thanks for your advice. I have tested it in multiple-writer scenarios and it works as expected @xiarixiaoyao
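The two-writer scenario in the quoted comment can be walked through with a simplified model. This is a sketch under stated assumptions, not Hudi's implementation: `FileSliceCommitSketch` and its methods are hypothetical, instants are plain strings compared lexicographically, and the timeline is reduced to a sorted set of completed instants plus a set of incomplete ones. It assumes `containsOrBeforeTimelineStarts` means "is on the completed timeline, or sorts before its first instant".

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class FileSliceCommitSketch {
    private final TreeSet<String> completed;  // completed instants, sorted
    private final Set<String> incomplete;     // inflight or failed instants

    FileSliceCommitSketch(Collection<String> completed, Collection<String> incomplete) {
        this.completed = new TreeSet<>(completed);
        this.incomplete = new HashSet<>(incomplete);
    }

    // Models the existing check: anything older than the first completed
    // instant is assumed to be committed.
    boolean isCommittedCurrent(String baseInstant) {
        if (completed.isEmpty() || baseInstant.compareTo(completed.last()) > 0) {
            return false;
        }
        return completed.contains(baseInstant) || baseInstant.compareTo(completed.first()) < 0;
    }

    // Models the proposed check: additionally require that the instant is
    // not known to be incomplete.
    boolean isCommittedProposed(String baseInstant) {
        return isCommittedCurrent(baseInstant) && !incomplete.contains(baseInstant);
    }

    public static void main(String[] args) {
        // app2 completed 0002; app1 failed at 0001 after writing its file.
        FileSliceCommitSketch timeline =
            new FileSliceCommitSketch(Arrays.asList("0002"), Arrays.asList("0001"));
        System.out.println("current:  " + timeline.isCommittedCurrent("0001"));
        System.out.println("proposed: " + timeline.isCommittedProposed("0001"));
    }
}
```

In this model the extra check only helps while the failed instant is still visible as incomplete; what happens after it is rolled back or archived is outside the sketch.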
Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]
hudi-bot commented on PR #10945: URL: https://github.com/apache/hudi/pull/10945#issuecomment-2030897587 ## CI report: * bf8eba5011f8ff4762e4da92aa57057873bafeab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23063) * 6c3830bb4de1887f41aebc139b3fc837e446ead5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23071) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7552] Remove the suffix for MDT table service instants [hudi]
hudi-bot commented on PR #10945: URL: https://github.com/apache/hudi/pull/10945#issuecomment-2030892213 ## CI report: * bf8eba5011f8ff4762e4da92aa57057873bafeab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23063) * 6c3830bb4de1887f41aebc139b3fc837e446ead5 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-1455) Hudi integration with project nessie
[ https://issues.apache.org/jira/browse/HUDI-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832970#comment-17832970 ] Wenrui Meng commented on HUDI-1455: --- Is there any plan for this issue? > Hudi integration with project nessie > > > Key: HUDI-1455 > URL: https://issues.apache.org/jira/browse/HUDI-1455 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Vinoth Chandar >Priority: Major > > [https://github.com/apache/hudi/issues/2330#issuecomment-743423398] > Follow up from this.
Re: [PR] [HUDI-7466] Add tests to AWSGlueCatalogSyncClient [hudi]
parisni commented on PR #10897: URL: https://github.com/apache/hudi/pull/10897#issuecomment-2030589638 > Can we make repairing tests a separate effort? makes sense. thanks for your insight
[PR] [DOCS] Update roadmap [hudi]
xushiyan opened a new pull request, #10950: URL: https://github.com/apache/hudi/pull/10950 Update roadmap.
Re: [PR] [HUDI-7235] Fix checkpoint bug for S3/GCS Incremental Source [hudi]
bvaradar commented on PR #10336: URL: https://github.com/apache/hudi/pull/10336#issuecomment-2030335892 @vinishjail97 : Can you address these comments and land it?
Re: [PR] [MINOR] use Temurin jdk [hudi]
bvaradar commented on PR #10948: URL: https://github.com/apache/hudi/pull/10948#issuecomment-2030325960 Will land once the CI tests succeed
[jira] [Updated] (HUDI-3431) Certify Hudi against Spark3 Hive3 Hadoop3
[ https://issues.apache.org/jira/browse/HUDI-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3431: - Fix Version/s: 0.15.0 > Certify Hudi against Spark3 Hive3 Hadoop3 > - > > Key: HUDI-3431 > URL: https://issues.apache.org/jira/browse/HUDI-3431 > Project: Apache Hudi > Issue Type: Epic > Components: dependencies >Reporter: Raymond Xu >Assignee: Rahil Chertara >Priority: Blocker > Fix For: 0.15.0 > >
Re: [PR] [MINOR] Handle cases of malformed records when converting to json [hudi]
the-other-tim-brown commented on code in PR #10943: URL: https://github.com/apache/hudi/pull/10943#discussion_r1546657481

## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java: ##
@@ -209,10 +210,18 @@ public static byte[] avroToJson(GenericRecord record, boolean pretty) throws IOE
   private static ByteArrayOutputStream avroToJsonHelper(GenericRecord record, boolean pretty) throws IOException {
     DatumWriter writer = new GenericDatumWriter<>(record.getSchema());
     ByteArrayOutputStream out = new ByteArrayOutputStream();
-    JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
-    writer.write(record, jsonEncoder);
-    jsonEncoder.flush();
-    return out;
+    try {
+      JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema(), out, pretty);
+      writer.write(record, jsonEncoder);
+      jsonEncoder.flush();
+      return out;
+    } catch (ClassCastException | NullPointerException ex) {
+      // NullPointerException will be thrown in cases where the field values are missing
+      // ClassCastException will be thrown in cases where the field values do not match the schema type
+      // Fallback to using `toString` which also returns json but without a pretty-print option
+      out.write(record.toString().getBytes(StandardCharsets.UTF_8));

Review Comment:
One concern I have is that this could hide some exception and then we don't catch something in our initial testing for some more critical timeline related flow. I think I've convinced myself there should just be a new method like "safeToJson" that does not throw an exception that we use in the error table/writer cases since those are not as critical to Hudi.
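The split suggested in the review comment above, where the strict conversion keeps throwing and a separate tolerant variant serves the error-table callers, could look roughly like the following. This is a minimal sketch and not the Hudi implementation: `SafeJsonSketch`, `StrictEncoder`, and this `safeToJson` signature are hypothetical, and plain JDK types stand in for Avro's `GenericRecord` and `JsonEncoder`.

```java
import java.nio.charset.StandardCharsets;

final class SafeJsonSketch {

    /** Hypothetical stand-in for the strict conversion path, which may throw on malformed records. */
    interface StrictEncoder<T> {
        byte[] encode(T record);
    }

    /**
     * Tolerant variant for non-critical callers: tries the strict encoder
     * first and falls back to the record's toString() on the two exception
     * types named in the diff. Other exceptions still propagate.
     */
    static <T> byte[] safeToJson(T record, StrictEncoder<T> strict) {
        try {
            return strict.encode(record);
        } catch (ClassCastException | NullPointerException ex) {
            return String.valueOf(record).getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // A strict encoder that always fails, as it might for a malformed record.
        StrictEncoder<String> failing = r -> { throw new ClassCastException("schema mismatch"); };
        byte[] bytes = safeToJson("{\"id\": 1}", failing);
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Keeping the fallback in its own method leaves the strict path unchanged, so critical flows still surface malformed records instead of silently degrading.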
(hudi) branch master updated: [HUDI-7557] Fix incremental cleaner when commit for savepoint removed (#10946)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 9efced37f81 [HUDI-7557] Fix incremental cleaner when commit for savepoint removed (#10946) 9efced37f81 is described below commit 9efced37f819ae59b51099ee43dc75e1a876a855 Author: Sagar Sumit AuthorDate: Mon Apr 1 23:00:19 2024 +0530 [HUDI-7557] Fix incremental cleaner when commit for savepoint removed (#10946) --- .../hudi/table/action/clean/CleanPlanner.java | 1 + .../apache/hudi/table/action/TestCleanPlanner.java | 89 -- 2 files changed, 51 insertions(+), 39 deletions(-) diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java index 48ec8f9baa1..753f8c8253d 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java @@ -245,6 +245,7 @@ public class CleanPlanner implements Serializable { Option instantOption = hoodieTable.getCompletedCommitsTimeline().filter(instant -> instant.getTimestamp().equals(savepointCommit)).firstInstant(); if (!instantOption.isPresent()) { LOG.warn("Skipping to process a commit for which savepoint was removed as the instant moved to archived timeline already"); +return Stream.empty(); } HoodieInstant instant = instantOption.get(); return getPartitionsForInstants(instant); diff --git a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java index 8052572fcea..9989273b723 100644 --- 
a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java +++ b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java @@ -138,14 +138,14 @@ public class TestCleanPlanner { void testPartitionsForIncrCleaning(HoodieWriteConfig config, String earliestInstant, String lastCompletedTimeInLastClean, String lastCleanInstant, String earliestInstantsInLastClean, List partitionsInLastClean, Map> savepointsTrackedInLastClean, Map> activeInstantsPartitions, - Map> savepoints, List expectedPartitions) throws IOException { + Map> savepoints, List expectedPartitions, boolean areCommitsForSavepointsRemoved) throws IOException { HoodieActiveTimeline activeTimeline = mock(HoodieActiveTimeline.class); when(mockHoodieTable.getActiveTimeline()).thenReturn(activeTimeline); // setup savepoint mocks Set savepointTimestamps = savepoints.keySet().stream().collect(Collectors.toSet()); when(mockHoodieTable.getSavepointTimestamps()).thenReturn(savepointTimestamps); if (!savepoints.isEmpty()) { - for (Map.Entry> entry: savepoints.entrySet()) { + for (Map.Entry> entry : savepoints.entrySet()) { Pair> savepointMetadataOptionPair = getSavepointMetadata(entry.getValue()); HoodieInstant instant = new HoodieInstant(false, HoodieTimeline.SAVEPOINT_ACTION, entry.getKey()); when(activeTimeline.getInstantDetails(instant)).thenReturn(savepointMetadataOptionPair.getRight()); @@ -156,7 +156,7 @@ public class TestCleanPlanner { Pair> cleanMetadataOptionPair = getCleanCommitMetadata(partitionsInLastClean, lastCleanInstant, earliestInstantsInLastClean, lastCompletedTimeInLastClean, savepointsTrackedInLastClean.keySet()); mockLastCleanCommit(mockHoodieTable, lastCleanInstant, earliestInstantsInLastClean, activeTimeline, cleanMetadataOptionPair); -mockFewActiveInstants(mockHoodieTable, activeInstantsPartitions, savepointsTrackedInLastClean); +mockFewActiveInstants(mockHoodieTable, activeInstantsPartitions, 
savepointsTrackedInLastClean, areCommitsForSavepointsRemoved); // Trigger clean and validate partitions to clean. CleanPlanner cleanPlanner = new CleanPlanner<>(context, mockHoodieTable, config); @@ -332,7 +332,7 @@ public class TestCleanPlanner { static Stream keepLatestByHoursOrCommitsArgsIncrCleanPartitions() { String earliestInstant = "20231204194919610"; -String earliestInstantPlusTwoDays = "20231206194919610"; +String earliestInstantPlusTwoDays = "20231206194919610"; String lastCleanInstant = earliestInstantPlusTwoDays; String earliestInstantMinusThreeDays = "20231201194919610"; String earliestInstantMinusFourDays = "20231130194919610"; @@ -340,9 +340,9 @@ public class T
Re: [PR] [HUDI-7557] Fix incremental cleaner when commit for savepoint removed [hudi]
nsivabalan merged PR #10946: URL: https://github.com/apache/hudi/pull/10946
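The one-line change to CleanPlanner in the HUDI-7557 commit above adds `return Stream.empty();` after the warning, so the method no longer falls through to `instantOption.get()` when the instant is absent. The sketch below reproduces that control flow with `java.util.Optional` and a plain map standing in for the completed-commits timeline. The class `SavepointPartitionsSketch`, the method name, and the map-based timeline are assumptions for illustration only, not Hudi code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SavepointPartitionsSketch {

  /**
   * Returns the partitions written by the commit behind a savepoint, or an
   * empty stream when that commit is no longer in the (hypothetical) timeline.
   */
  public static Stream<String> partitionsForSavepoint(
      Map<String, List<String>> completedCommits, String savepointCommit) {
    Optional<List<String>> commit = Optional.ofNullable(completedCommits.get(savepointCommit));
    if (!commit.isPresent()) {
      // Without this early return, commit.get() below would throw
      // NoSuchElementException for a commit that was already archived.
      return Stream.empty();
    }
    return commit.get().stream();
  }

  public static void main(String[] args) {
    Map<String, List<String>> completed =
        Collections.singletonMap("20231204194919610", Arrays.asList("p1", "p2"));

    // Commit still on the timeline: its partitions are returned.
    System.out.println(
        partitionsForSavepoint(completed, "20231204194919610").collect(Collectors.toList()));
    // Commit already archived: empty result instead of an exception.
    System.out.println(
        partitionsForSavepoint(completed, "20231201194919610").collect(Collectors.toList()));
  }
}
```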
Re: [PR] [HUDI-7486] Classify schema exceptions when converting from avro to spark row representation [hudi]
hudi-bot commented on PR #10778: URL: https://github.com/apache/hudi/pull/10778#issuecomment-2030174858 ## CI report: * 51380200fafd1b3917658c549ab3caa3e5a408f5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23069) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT [hudi]
hudi-bot commented on PR #10949: URL: https://github.com/apache/hudi/pull/10949#issuecomment-2030152748 ## CI report: * 1984e34cf984ca5088cd921e26cd3d74421afb03 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23070) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [I] [SUPPORT] Column comments not syncing to AWS Glue Catalog [hudi]
TrustOkoroego commented on issue #8857: URL: https://github.com/apache/hudi/issues/8857#issuecomment-2030087508 @cbts-alec-johnson I need to implement this. Could you please tell me your configuration to sync the comments?
Re: [I] [SUPPORT] Incremental query not working on COW table [hudi]
NishantBaheti commented on issue #10850: URL: https://github.com/apache/hudi/issues/10850#issuecomment-2030086053 @ad1happy2go I moved to the MOR table. The COW configurations felt a little unstable, and I had to rush the project to production quickly.
Re: [PR] [MINOR] use Temurin jdk [hudi]
hudi-bot commented on PR #10948: URL: https://github.com/apache/hudi/pull/10948#issuecomment-2030071024 ## CI report: * 3109fe81b4d356316fb2b2837270c226a36ccf50 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23067) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT [hudi]
hudi-bot commented on PR #10949: URL: https://github.com/apache/hudi/pull/10949#issuecomment-2030045443 ## CI report: * 685ba9e778377eb4c1a72016c1c8a745e965551e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23068) * 1984e34cf984ca5088cd921e26cd3d74421afb03 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23070) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7486] Classify schema exceptions when converting from avro to spark row representation [hudi]
hudi-bot commented on PR #10778: URL: https://github.com/apache/hudi/pull/10778#issuecomment-2030044819 ## CI report: * 0e2e1d8ea5829905db3464a97593bb81231bbc08 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23066) * 51380200fafd1b3917658c549ab3caa3e5a408f5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23069) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Created] (HUDI-7562) using -DTest=TestClass will still run scala tests
Jonathan Vexler created HUDI-7562: - Summary: using -DTest=TestClass will still run scala tests Key: HUDI-7562 URL: https://issues.apache.org/jira/browse/HUDI-7562 Project: Apache Hudi Issue Type: Bug Reporter: Jonathan Vexler As a workaround for now, you can set -DwildcardSuites="abdcd" so that all scala tests are filtered out. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT [hudi]
hudi-bot commented on PR #10949: URL: https://github.com/apache/hudi/pull/10949#issuecomment-2029917956 ## CI report: * 685ba9e778377eb4c1a72016c1c8a745e965551e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23068) * 1984e34cf984ca5088cd921e26cd3d74421afb03 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23070) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT [hudi]
hudi-bot commented on PR #10949: URL: https://github.com/apache/hudi/pull/10949#issuecomment-2029905910 ## CI report: * 685ba9e778377eb4c1a72016c1c8a745e965551e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23068) * 1984e34cf984ca5088cd921e26cd3d74421afb03 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7486] Classify schema exceptions when converting from avro to spark row representation [hudi]
hudi-bot commented on PR #10778: URL: https://github.com/apache/hudi/pull/10778#issuecomment-2029905218 ## CI report: * 521ae79c05782ff553c945bc84c27afe33f8e52a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23053) * 0e2e1d8ea5829905db3464a97593bb81231bbc08 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23066) * 51380200fafd1b3917658c549ab3caa3e5a408f5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23069) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7486] Classify schema exceptions when converting from avro to spark row representation [hudi]
hudi-bot commented on PR #10778: URL: https://github.com/apache/hudi/pull/10778#issuecomment-2029891276 ## CI report: * 521ae79c05782ff553c945bc84c27afe33f8e52a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23053) * 0e2e1d8ea5829905db3464a97593bb81231bbc08 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23066) * 51380200fafd1b3917658c549ab3caa3e5a408f5 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [MINOR] use Temurin jdk [hudi]
hudi-bot commented on PR #10948: URL: https://github.com/apache/hudi/pull/10948#issuecomment-2029822794 ## CI report: * 3109fe81b4d356316fb2b2837270c226a36ccf50 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23067) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT [hudi]
hudi-bot commented on PR #10949: URL: https://github.com/apache/hudi/pull/10949#issuecomment-2029822835 ## CI report: * 685ba9e778377eb4c1a72016c1c8a745e965551e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23068) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7486] Classify schema exceptions when converting from avro to spark row representation [hudi]
hudi-bot commented on PR #10778: URL: https://github.com/apache/hudi/pull/10778#issuecomment-2029822297 ## CI report: * 521ae79c05782ff553c945bc84c27afe33f8e52a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23053) * 0e2e1d8ea5829905db3464a97593bb81231bbc08 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23066) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7486] Classify schema exceptions when converting from avro to spark row representation [hudi]
hudi-bot commented on PR #10778: URL: https://github.com/apache/hudi/pull/10778#issuecomment-2029810492 ## CI report: * 521ae79c05782ff553c945bc84c27afe33f8e52a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23053) * 0e2e1d8ea5829905db3464a97593bb81231bbc08 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT [hudi]
hudi-bot commented on PR #10949: URL: https://github.com/apache/hudi/pull/10949#issuecomment-2029811194 ## CI report: * 685ba9e778377eb4c1a72016c1c8a745e965551e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [MINOR] use Temurin jdk [hudi]
hudi-bot commented on PR #10948: URL: https://github.com/apache/hudi/pull/10948#issuecomment-2029811129 ## CI report: * 3109fe81b4d356316fb2b2837270c226a36ccf50 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [I] [SUPPORT] hudi0.14.0: Insert data into hudi with spark or create a new table exception [hudi]
ad1happy2go commented on issue #10838: URL: https://github.com/apache/hudi/issues/10838#issuecomment-2029795850 @SmyxBug Were you able to get it working with the suggestion @CTTY provided? Feel free to close if you are all good here.
[jira] [Updated] (HUDI-6854) Change default keygen type to HOODIE_AVRO_DEFAULT
[ https://issues.apache.org/jira/browse/HUDI-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6854: - Labels: pull-request-available (was: ) > Change default keygen type to HOODIE_AVRO_DEFAULT > - > > Key: HUDI-6854 > URL: https://issues.apache.org/jira/browse/HUDI-6854 > Project: Apache Hudi > Issue Type: Task >Reporter: Sagar Sumit >Assignee: Vova Kolmakov >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > Current default is OVERWRITE_LATEST which instantiates > OverwriteWithLatestAvroPayload but it's not intuitive when latest gets > written and user sets some precombine field and expects to merge records > based on that field.
[PR] [HUDI-6854] Change default payload type to HOODIE_AVRO_DEFAULT [hudi]
wombatu-kun opened a new pull request, #10949: URL: https://github.com/apache/hudi/pull/10949 ### Change Logs Changed default payload type to HOODIE_AVRO_DEFAULT. The current default is OVERWRITE_LATEST, which instantiates OverwriteWithLatestAvroPayload, but that is not intuitive: the latest record gets written even when the user sets a precombine field and expects records to be merged based on that field. ### Impact none ### Risk level (write none, low medium or high below) none ### Documentation Update The default value in the documentation needs to be updated. - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
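The motivation in the PR description above contrasts "latest gets written" with "merge records based on the precombine field". The sketch below models that difference with a toy record type. It is a simplified illustration, not Hudi's payload classes: `PayloadMergeSketch`, `Rec`, and both merge functions are hypothetical names, and the real payload implementations have more cases than shown here.

```java
public class PayloadMergeSketch {

  /** Toy record: a key, an ordering (precombine) value, and a data value. */
  public static final class Rec {
    public final String key;
    public final long orderingValue;
    public final String value;

    public Rec(String key, long orderingValue, String value) {
      this.key = key;
      this.orderingValue = orderingValue;
      this.value = value;
    }
  }

  /** "Latest write wins": the incoming record replaces the stored one unconditionally. */
  public static Rec overwriteWithLatest(Rec stored, Rec incoming) {
    return incoming;
  }

  /** Ordering-aware merge: keep whichever record has the higher ordering value. */
  public static Rec mergeByOrderingField(Rec stored, Rec incoming) {
    return incoming.orderingValue >= stored.orderingValue ? incoming : stored;
  }

  public static void main(String[] args) {
    Rec stored = new Rec("id-1", 10L, "fresh");
    // A late-arriving update whose ordering value is older than the stored one.
    Rec lateArrival = new Rec("id-1", 5L, "stale");

    System.out.println(overwriteWithLatest(stored, lateArrival).value);  // prints "stale"
    System.out.println(mergeByOrderingField(stored, lateArrival).value); // prints "fresh"
  }
}
```

A user who configured a precombine field would usually expect the second outcome, which is the surprise the PR description refers to.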
Re: [I] [SUPPORT] Incremental query not working on COW table [hudi]
ad1happy2go commented on issue #10850: URL: https://github.com/apache/hudi/issues/10850#issuecomment-2029789548 @NishantBaheti Were you able to get it resolved? Can you share the full stack trace? It looks like "Unable to load class" points to some library conflicts.
[jira] [Assigned] (HUDI-6854) Change default keygen type to HOODIE_AVRO_DEFAULT
[ https://issues.apache.org/jira/browse/HUDI-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov reassigned HUDI-6854: --- Assignee: Vova Kolmakov > Change default keygen type to HOODIE_AVRO_DEFAULT > - > > Key: HUDI-6854 > URL: https://issues.apache.org/jira/browse/HUDI-6854 > Project: Apache Hudi > Issue Type: Task >Reporter: Sagar Sumit >Assignee: Vova Kolmakov >Priority: Major > Fix For: 1.0.0 > > > Current default is OVERWRITE_LATEST which instantiates > OverwriteWithLatestAvroPayload but it's not intuitive when latest gets > written and user sets some precombine field and expects to merge records > based on that field.
[jira] [Updated] (HUDI-6854) Change default keygen type to HOODIE_AVRO_DEFAULT
[ https://issues.apache.org/jira/browse/HUDI-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov updated HUDI-6854: Status: In Progress (was: Open) > Change default keygen type to HOODIE_AVRO_DEFAULT > - > > Key: HUDI-6854 > URL: https://issues.apache.org/jira/browse/HUDI-6854 > Project: Apache Hudi > Issue Type: Task >Reporter: Sagar Sumit >Assignee: Vova Kolmakov >Priority: Major > Fix For: 1.0.0 > > > Current default is OVERWRITE_LATEST which instantiates > OverwriteWithLatestAvroPayload but it's not intuitive when latest gets > written and user sets some precombine field and expects to merge records > based on that field.
Re: [I] [SUPPORT] could hudi skip shuffle in SortMergeJoin, like what bucketby does in Spark? [hudi]
ad1happy2go commented on issue #10704: URL: https://github.com/apache/hudi/issues/10704#issuecomment-2029776148 @boneanxs @ziudu Created a JIRA - https://issues.apache.org/jira/browse/HUDI-7561
[jira] [Created] (HUDI-7561) Skip shuffling entire data in SortMergeJoin while upserting
Aditya Goenka created HUDI-7561: --- Summary: Skip shuffling entire data in SortMergeJoin while upserting Key: HUDI-7561 URL: https://issues.apache.org/jira/browse/HUDI-7561 Project: Apache Hudi Issue Type: Improvement Components: writer-core Reporter: Aditya Goenka Fix For: 1.1.0 [https://github.com/apache/hudi/issues/10704]
[PR] [MINOR] use Temurin jdk [hudi]
sullis opened a new pull request, #10948: URL: https://github.com/apache/hudi/pull/10948 ### Change Logs [MINOR] replace AdoptOpenJDK with Temurin jdk ### Impact n/a ### Risk level (write none, low medium or high below) low ### Documentation Update n/a ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
Re: [I] [SUPPORT] Data duplicated in base file on updating record partition [hudi]
codope closed issue #10932: [SUPPORT] Data duplicated in base file on updating record partition URL: https://github.com/apache/hudi/issues/10932
Re: [PR] [HUDI-7559] [1/n] Fix RecordLevelIndexSupport::filterQueryWithRecordKey [hudi]
hudi-bot commented on PR #10947: URL: https://github.com/apache/hudi/pull/10947#issuecomment-2029728713 ## CI report: * 85cbde75f0f652274dc28f940cd0a159096b6aad Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23065) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [I] Nested object support in Hudi Table using Flink [hudi]
ad1happy2go commented on issue #10895: URL: https://github.com/apache/hudi/issues/10895#issuecomment-2029718076 @waytoharish Did you get a chance to try out GenericRowData? Are you still facing the issue?
Re: [I] [SUPPORT] Async Clustering failing for MoR in 0.13.0 [hudi]
ad1happy2go commented on issue #8153: URL: https://github.com/apache/hudi/issues/8153#issuecomment-2029711805 @haripriyarhp I tried with a 0.14.X version and it works fine; I couldn't reproduce the issue. I know I am late. Let me know in case you were able to resolve this issue or need any other help with it.
Re: [I] [SUPPORT] Archival not working for hudi & corresponding hudi metadata table [hudi]
codope closed issue #9478: [SUPPORT] Archival not working for hudi & corresponding hudi metadata table URL: https://github.com/apache/hudi/issues/9478
Re: [I] [SUPPORT] Archival not working for hudi & corresponding hudi metadata table [hudi]
ad1happy2go commented on issue #9478: URL: https://github.com/apache/hudi/issues/9478#issuecomment-2029704490 @PankajKaushal Closing this out. Please reopen or create a new one in case of any more issues.
Re: [I] [SUPPORT] spark stuctrued streaming failed to update MDT metadata [hudi]
ad1happy2go commented on issue #10891: URL: https://github.com/apache/hudi/issues/10891#issuecomment-2029680639 @xicm I will try to reproduce it. Can you provide more details on the steps I can follow?
Re: [I] [SUPPORT] Hudi deltastreamer fails due to Clean [hudi]
codope closed issue #7209: [SUPPORT] Hudi deltastreamer fails due to Clean URL: https://github.com/apache/hudi/issues/7209
Re: [I] [SUPPORT] Hudi deltastreamer fails due to Clean [hudi]
ad1happy2go commented on issue #7209: URL: https://github.com/apache/hudi/issues/7209#issuecomment-2029660841 @koldic Sorry we missed it. You can use multi-writer concurrency control to handle that. https://hudi.apache.org/docs/concurrency_control/#enabling-multi-writing Closing this issue as it was due to multiple writers. Thanks. Feel free to open a new one in case of any new issues.
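For readers following the multi-writer suggestion above, the sketch below collects the write options that, to my understanding of the linked concurrency control page, enable optimistic concurrency control with a ZooKeeper lock provider. Treat the keys and values as assumptions to verify against the docs for your Hudi version. The class `MultiWriterConfigSketch`, the helper name, the ZooKeeper host, and the `/hudi/locks` base path are placeholders introduced here.

```java
import java.util.Properties;

public class MultiWriterConfigSketch {

  /**
   * Builds write options for multiple concurrent writers.
   * Keys are assumed from the Hudi concurrency control docs; check your version.
   */
  public static Properties multiWriterProps(String zkHost, String zkPort, String lockKey) {
    Properties props = new Properties();
    // Detect conflicting writes at commit time instead of assuming a single writer.
    props.setProperty("hoodie.write.concurrency.mode", "optimistic_concurrency_control");
    // Clean up failed writes lazily so one writer does not roll back another's inflight commit.
    props.setProperty("hoodie.cleaner.policy.failed.writes", "LAZY");
    // External lock provider shared by all writers (ZooKeeper in this sketch).
    props.setProperty("hoodie.write.lock.provider",
        "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider");
    props.setProperty("hoodie.write.lock.zookeeper.url", zkHost);
    props.setProperty("hoodie.write.lock.zookeeper.port", zkPort);
    props.setProperty("hoodie.write.lock.zookeeper.lock_key", lockKey);
    props.setProperty("hoodie.write.lock.zookeeper.base_path", "/hudi/locks"); // placeholder path
    return props;
  }

  public static void main(String[] args) {
    // Placeholder host, port, and lock key; every writer on the table must use the same values.
    Properties props = multiWriterProps("zk-host.example", "2181", "my_table");
    props.stringPropertyNames().stream().sorted()
        .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
  }
}
```

The same properties would need to be passed to every writer touching the table, including the deltastreamer job and any separate table-service job.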
Re: [I] [SUPPORT] Historical Clean and RollBack commits are not archived [hudi]
codope closed issue #9084: [SUPPORT] Historical Clean and RollBack commits are not archived URL: https://github.com/apache/hudi/issues/9084
Re: [I] [SUPPORT] Historical Clean and RollBack commits are not archived [hudi]
ad1happy2go commented on issue #9084: URL: https://github.com/apache/hudi/issues/9084#issuecomment-2029654012 @thomasg19930417 Closing this issue. Please reopen in case you still have any doubts on this. Thanks.
Re: [I] [SUPPORT] AWS Athena query fail when compaction is scheduled for MOR table [hudi]
ad1happy2go commented on issue #9907: URL: https://github.com/apache/hudi/issues/9907#issuecomment-2029651546 @brightwon Were you able to identify the root cause? Do let us know in case you still need help here.
Re: [PR] [HUDI-7559] [1/n] Fix RecordLevelIndexSupport::filterQueryWithRecordKey [hudi]
hudi-bot commented on PR #10947: URL: https://github.com/apache/hudi/pull/10947#issuecomment-2029649541 ## CI report: * 85cbde75f0f652274dc28f940cd0a159096b6aad Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23065) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [I] [SUPPORT] Occur bucketid multiple cannot write data to the wrong partition [hudi]
ad1happy2go commented on issue #10899: URL: https://github.com/apache/hudi/issues/10899#issuecomment-2029642719 Also are you using Spark Structured streaming or HudiStreamer?
Re: [I] [SUPPORT] Occur bucketid multiple cannot write data to the wrong partition [hudi]
ad1happy2go commented on issue #10899: URL: https://github.com/apache/hudi/issues/10899#issuecomment-2029642198 @xuzifu666 Can you please post the table/writer configuration you are using?
Re: [PR] [HUDI-7559] [1/n] Fix RecordLevelIndexSupport::filterQueryWithRecordKey [hudi]
hudi-bot commented on PR #10947: URL: https://github.com/apache/hudi/pull/10947#issuecomment-2029642273 ## CI report: * 85cbde75f0f652274dc28f940cd0a159096b6aad UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-7559) Fix functional index (on column stats): Handle NPE in filterQueriesWithRecordKey(...)
[ https://issues.apache.org/jira/browse/HUDI-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinaykumar Bhat updated HUDI-7559:
--
Description: `RecordLevelIndexSupport::filterQueryWithRecordKey(...)` throws NPE which is then subsequently `lookupCandidateFilesInMetadataTable()` rendering every other index (like FunctionalIndex, ColStat Index) to not be used for data skipping (i.e. pruning files)
Summary: Fix functional index (on column stats): Handle NPE in filterQueriesWithRecordKey(...) (was: Fix issues with functional index (on column stats) based pruning)

> Fix functional index (on column stats): Handle NPE in filterQueriesWithRecordKey(...)
> -
>
> Key: HUDI-7559
> URL: https://issues.apache.org/jira/browse/HUDI-7559
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Vinaykumar Bhat
> Assignee: Vinaykumar Bhat
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> `RecordLevelIndexSupport::filterQueryWithRecordKey(...)` throws NPE which is then subsequently `lookupCandidateFilesInMetadataTable()` rendering every other index (like FunctionalIndex, ColStat Index) to not be used for data skipping (i.e. pruning files)

-- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [I] [SUPPORT] IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:33) [hudi]
ad1happy2go commented on issue #10906: URL: https://github.com/apache/hudi/issues/10906#issuecomment-2029636299 @michael1991 Thanks for identifying the root cause. Do you have a fix in mind? Created a tracking Jira for the same: https://issues.apache.org/jira/browse/HUDI-7560 Are you using Spark Structured Streaming to write, or HudiStreamer?
[jira] [Created] (HUDI-7560) Rollback with async cleaning creating deadlocks and failing the subsequent write
Aditya Goenka created HUDI-7560: --- Summary: Rollback with async cleaning creating deadlocks and failing the subsequent write Key: HUDI-7560 URL: https://issues.apache.org/jira/browse/HUDI-7560 Project: Apache Hudi Issue Type: Bug Components: writer-core Reporter: Aditya Goenka Fix For: 1.1.0 Github Issue - [https://github.com/apache/hudi/issues/10906] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7559) Fix issues with functional index (on column stats) based pruning
[ https://issues.apache.org/jira/browse/HUDI-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinaykumar Bhat updated HUDI-7559: -- Status: In Progress (was: Open) > Fix issues with functional index (on column stats) based pruning > > > Key: HUDI-7559 > URL: https://issues.apache.org/jira/browse/HUDI-7559 > Project: Apache Hudi > Issue Type: Bug >Reporter: Vinaykumar Bhat >Assignee: Vinaykumar Bhat >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7559) Fix issues with functional index (on column stats) based pruning
[ https://issues.apache.org/jira/browse/HUDI-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinaykumar Bhat updated HUDI-7559: -- Epic Link: HUDI-512 > Fix issues with functional index (on column stats) based pruning > > > Key: HUDI-7559 > URL: https://issues.apache.org/jira/browse/HUDI-7559 > Project: Apache Hudi > Issue Type: Bug >Reporter: Vinaykumar Bhat >Assignee: Vinaykumar Bhat >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7559) Fix issues with functional index (on column stats) based pruning
[ https://issues.apache.org/jira/browse/HUDI-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinaykumar Bhat updated HUDI-7559: -- Fix Version/s: 1.0.0 > Fix issues with functional index (on column stats) based pruning > > > Key: HUDI-7559 > URL: https://issues.apache.org/jira/browse/HUDI-7559 > Project: Apache Hudi > Issue Type: Bug >Reporter: Vinaykumar Bhat >Assignee: Vinaykumar Bhat >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-7557] Fix incremental cleaner when commit for savepoint removed [hudi]
danny0405 commented on code in PR #10946: URL: https://github.com/apache/hudi/pull/10946#discussion_r1546222049

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:

@@ -245,6 +245,7 @@ private List getPartitionsFromDeletedSavepoint(HoodieCleanMetadata clean
 Option instantOption = hoodieTable.getCompletedCommitsTimeline().filter(instant -> instant.getTimestamp().equals(savepointCommit)).firstInstant();
 if (!instantOption.isPresent()) {
   LOG.warn("Skipping to process a commit for which savepoint was removed as the instant moved to archived timeline already");
+  return Stream.empty();

Review Comment: Does this mean the archived savepoint partition never got cleaned?
Re: [PR] [HUDI-7559] [1/n] Fix RecordLevelIndexSupport::filterQueryWithRecordKey [hudi]
bhat-vinay commented on PR #10947: URL: https://github.com/apache/hudi/pull/10947#issuecomment-2029589170 cc: @codope Please review. This is the first PR in a series of fixes required to prune files (and enable data skipping) using functional index based on column stats.
[jira] [Updated] (HUDI-7559) Fix issues with functional index (on column stats) based pruning
[ https://issues.apache.org/jira/browse/HUDI-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7559: - Labels: pull-request-available (was: ) > Fix issues with functional index (on column stats) based pruning > > > Key: HUDI-7559 > URL: https://issues.apache.org/jira/browse/HUDI-7559 > Project: Apache Hudi > Issue Type: Bug >Reporter: Vinaykumar Bhat >Assignee: Vinaykumar Bhat >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7559] [1/n] Fix RecordLevelIndexSupport::filterQueryWithRecordKey [hudi]
bhat-vinay opened a new pull request, #10947: URL: https://github.com/apache/hudi/pull/10947

RecordLevelIndexSupport::filterQueryWithRecordKey() throws an NPE if the EqualTo query predicate is not of the form `AttributeReference = Literal`. This is because RecordLevelIndexSupport::getAttributeLiteralTuple() returns null in such cases, and that return value is then dereferenced unconditionally. This bug prevented the functional index from being used even when the query predicate contained Spark functions on which the functional index is built. As a result, the column-stats-based functional index was not pruning files.

### Change Logs

This PR makes the following minor changes.
1. Move some methods in RecordLevelIndexSupport into an object to make them static (to aid in unit testing)
2. Fix filterQueryWithRecordKey() by checking for null return values from the call to getAttributeLiteralTuple
3. Add unit tests in TestRecordLevelIndexSupport.scala

### Impact

Bug fix.

### Risk level (write none, low medium or high below)

None

### Documentation Update

None

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
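The null check described in item 2 can be illustrated with a minimal, self-contained Java sketch. This is not the Scala code from the PR: `Expr`, `Attr`, `Lit`, `Call`, `EqualTo`, `attributeLiteralPair` and `recordKeyLiteral` are hypothetical stand-ins, and the sketch only assumes what the description states, namely that the helper returns null for any predicate that is not `attribute = literal`.

```java
import java.util.Optional;

public class RecordKeyFilterSketch {
  // Hypothetical, simplified expression model (not Spark's Catalyst classes).
  interface Expr {}
  static final class Attr implements Expr { final String name; Attr(String name) { this.name = name; } }
  static final class Lit implements Expr { final Object value; Lit(Object value) { this.value = value; } }
  static final class Call implements Expr { final String fn; final Expr arg; Call(String fn, Expr arg) { this.fn = fn; this.arg = arg; } }
  static final class EqualTo { final Expr left; final Expr right; EqualTo(Expr left, Expr right) { this.left = left; this.right = right; } }

  // Assumed behaviour: returns null unless the predicate is `attribute = literal`.
  static Object[] attributeLiteralPair(EqualTo eq) {
    if (eq.left instanceof Attr && eq.right instanceof Lit) {
      return new Object[] {((Attr) eq.left).name, ((Lit) eq.right).value};
    }
    return null;
  }

  // Guarded lookup: a null pair means "this index cannot serve the predicate", not an error.
  static Optional<Object> recordKeyLiteral(EqualTo eq, String recordKeyField) {
    Object[] pair = attributeLiteralPair(eq);
    if (pair == null || !recordKeyField.equals(pair[0])) {
      return Optional.empty();
    }
    return Optional.ofNullable(pair[1]);
  }

  public static void main(String[] args) {
    EqualTo simple = new EqualTo(new Attr("id"), new Lit("k1"));
    EqualTo withFunction = new EqualTo(new Call("lower", new Attr("id")), new Lit("k1"));
    System.out.println(recordKeyLiteral(simple, "id"));       // Optional[k1]
    System.out.println(recordKeyLiteral(withFunction, "id")); // Optional.empty
  }
}
```

Returning an empty result instead of throwing lets a caller fall through to other indexes when the record-level index does not apply, which is the effect the PR description is after.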
Re: [PR] [HUDI-7522] Support find out the conflict instants in bucket partition when bucket id multiple [hudi]
xiarixiaoyao commented on PR #10898: URL: https://github.com/apache/hudi/pull/10898#issuecomment-2029573829

@xuzifu666 @danny0405 @beyond1920 I think we should solve the root cause of bucket duplication. There are currently three situations in which bucket file duplication occurs:
1. Spark speculative execution. Turning off speculative execution solves this problem.
2. The Hoodie archiver deleting the completed timeline in parallel. 1.0 has solved this problem.
3. Concurrent insert overwrite by multiple Spark writers. This is a bug that needs to be fixed.

Now focus on scenario 3: concurrent insert overwrite by multiple Spark writers.

When Hudi builds a file slice, it calls isFileSliceCommitted to determine whether the current file is committed.

```
/**
 * A FileSlice is considered committed, if one of the following is true - There is a committed data file - There are
 * some log files, that are based off a commit or delta commit.
 */
private boolean isFileSliceCommitted(FileSlice slice) {
  if (!compareTimestamps(slice.getBaseInstantTime(), LESSER_THAN_OR_EQUALS, lastInstant.get().getTimestamp())) {
    return false;
  }
  return timeline.containsOrBeforeTimelineStarts(slice.getBaseInstantTime());
}
```

This is fine for the single-writer scenario, but for multiple writers the logic of isFileSliceCommitted has a bug. If a file has a commit time smaller than the smallest completed commit, Hudi directly determines that the file is committed, even if it is a garbage file (a file generated by a failed write).

E.g., two Spark apps insert overwrite a Hudi BUCKET table with the same partition.
app1: starts write commit at 0001, writes file 0--uuid1.parquet
app2: starts write commit at 0002, writes file 0--uuid2.parquet

app1 may fail to write due to OCC/cancel/OOM, but 0--uuid1.parquet is already written. When Hudi builds the file slice, 0--uuid1.parquet is considered committed, since its commit time 0001 < smallest completed commit 0002. This is wrong: commit time 0001 is not committed.

Maybe we can modify isFileSliceCommitted like this:

```
private boolean isFileSliceCommitted(FileSlice slice) {
  if (!compareTimestamps(slice.getBaseInstantTime(), LESSER_THAN_OR_EQUALS, lastInstant.get().getTimestamp())) {
    return false;
  }
  return timeline.containsOrBeforeTimelineStarts(slice.getBaseInstantTime()) && UncompleteTimelineNotContains(slice.getBaseInstantTime());
}
```

Finally, I think Hudi's file slices should be managed uniformly, as in Iceberg/Delta Lake, rather than being obtained through list operations.
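The scenario in the comment above can be replayed with a small stand-alone Java model. It is only a sketch of the reasoning, not Hudi code: `FileSliceCommitSketch`, `isCommittedCurrent` and `isCommittedProposed` are hypothetical names, the timeline is reduced to two sets of instant strings, and instants are assumed to be fixed-width so that string comparison orders them correctly.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class FileSliceCommitSketch {
  private final TreeSet<String> completed; // instants that finished successfully
  private final Set<String> incomplete;    // inflight or failed instants still visible

  public FileSliceCommitSketch(Set<String> completed, Set<String> incomplete) {
    this.completed = new TreeSet<>(completed);
    this.incomplete = new HashSet<>(incomplete);
  }

  // Models the existing check: an instant counts as committed if it is completed
  // or if it is older than the first completed instant.
  public boolean isCommittedCurrent(String baseInstant) {
    if (completed.isEmpty() || baseInstant.compareTo(completed.last()) > 0) {
      return false;
    }
    return completed.contains(baseInstant) || baseInstant.compareTo(completed.first()) < 0;
  }

  // Models the proposed check: additionally reject instants known to be incomplete.
  public boolean isCommittedProposed(String baseInstant) {
    return isCommittedCurrent(baseInstant) && !incomplete.contains(baseInstant);
  }

  public static void main(String[] args) {
    // app1 failed at 0001 after writing its file; app2 completed at 0002.
    FileSliceCommitSketch timeline = new FileSliceCommitSketch(
        Collections.singleton("0002"), Collections.singleton("0001"));
    System.out.println(timeline.isCommittedCurrent("0001"));  // true
    System.out.println(timeline.isCommittedProposed("0001")); // false
    System.out.println(timeline.isCommittedProposed("0002")); // true
  }
}
```

In this model the extra check only helps while instant 0001 is still visible as incomplete; once it disappears from that set, the file would again be treated as older than the timeline start.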
Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]
wombatu-kun commented on code in PR #10942: URL: https://github.com/apache/hudi/pull/10942#discussion_r1546190914

## hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaGlobalSortPartitioner.java:

@@ -31,12 +32,21 @@
  *
  * @param HoodieRecordPayload type
  */
-public class JavaGlobalSortPartitioner
-implements BulkInsertPartitioner>> {
+public class JavaGlobalSortPartitioner implements BulkInsertPartitioner>> {
+
+  public JavaGlobalSortPartitioner() {
+  }
+
+  /**
+   * Constructor to create as UserDefinedBulkInsertPartitioner class via reflection
+   * @param config HoodieWriteConfig

Review Comment: Before this fix, if a user wanted to use JavaGlobalSortPartitioner and set `hoodie.bulkinsert.user.defined.partitioner.class=org.apache.hudi.execution.bulkinsert.JavaGlobalSortPartitioner`, it would not work, because this partitioner could not be instantiated via reflection (it has no constructor with a writeConfig parameter). We add this constructor so that JavaGlobalSortPartitioner can be used as a user-defined partitioner just by setting its class name in the writeConfig. I don't know how to explain it more clearly. Let's wait for the author's reply.
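The instantiation problem described in this review comment can be reproduced in isolation. The following Java sketch uses only `java.lang.reflect`; `WriteConfig`, `Partitioner`, `NoArgOnly`, `WithConfig` and `tryLoad` are hypothetical stand-ins rather than Hudi classes, and it assumes the loader looks up a constructor that takes the config type, as the comment describes.

```java
import java.lang.reflect.Constructor;
import java.util.Optional;

public class ReflectivePartitionerSketch {
  // Hypothetical stand-in for the write config type.
  public static class WriteConfig {}

  public interface Partitioner {}

  // Only a no-arg constructor: a loader that asks for (WriteConfig) cannot find one.
  public static class NoArgOnly implements Partitioner {
    public NoArgOnly() {}
  }

  // Adds a (WriteConfig) constructor, even though the config is not used.
  public static class WithConfig implements Partitioner {
    public WithConfig() {}
    public WithConfig(WriteConfig config) {}
  }

  // Loads a class by name and instantiates it through its (WriteConfig) constructor.
  // Returns empty if the class or the constructor is missing, or instantiation fails.
  public static Optional<Partitioner> tryLoad(String className, WriteConfig config) {
    try {
      Class<?> clazz = Class.forName(className);
      Constructor<?> ctor = clazz.getConstructor(WriteConfig.class);
      return Optional.of((Partitioner) ctor.newInstance(config));
    } catch (ReflectiveOperationException e) {
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    WriteConfig config = new WriteConfig();
    System.out.println(tryLoad(WithConfig.class.getName(), config).isPresent()); // true
    System.out.println(tryLoad(NoArgOnly.class.getName(), config).isPresent());  // false
  }
}
```

An alternative design would be for the loader to fall back to the no-arg constructor when no config constructor exists; the sketch does not do that, since the comment describes a lookup that requires the config parameter.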
Re: [PR] [HUDI-7526] Fix constructors for bulkinsert sort partitioners to ensure we could use it as user defined partitioners [hudi]
danny0405 commented on code in PR #10942: URL: https://github.com/apache/hudi/pull/10942#discussion_r1546174541

## hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaGlobalSortPartitioner.java:

@@ -31,12 +32,21 @@
  *
  * @param HoodieRecordPayload type
  */
-public class JavaGlobalSortPartitioner
-implements BulkInsertPartitioner>> {
+public class JavaGlobalSortPartitioner implements BulkInsertPartitioner>> {
+
+  public JavaGlobalSortPartitioner() {
+  }
+
+  /**
+   * Constructor to create as UserDefinedBulkInsertPartitioner class via reflection
+   * @param config HoodieWriteConfig

Review Comment:

> Yes, in this case HoodieWriteConfig is ignored just because this Partitioner is not configurable at all, but it does not mean that it should not be used as UserDefinedBulkInsertPartitioner

That does not make sense to me.
Re: [PR] [HUDI-7557] Fix incremental cleaner when commit for savepoint removed [hudi]
hudi-bot commented on PR #10946: URL: https://github.com/apache/hudi/pull/10946#issuecomment-2029433733 ## CI report: * cbcbc5182f524886946fdefec86faf75110f35c5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23064) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [I] [SUPPORT] Hudi cdc upserts stopped working after migrating from hudi 13.1 to 14.0 [hudi]
ROOBALJINDAL closed issue #10884: [SUPPORT] Hudi cdc upserts stopped working after migrating from hudi 13.1 to 14.0 URL: https://github.com/apache/hudi/issues/10884
Re: [I] [SUPPORT] Hudi cdc upserts stopped working after migrating from hudi 13.1 to 14.0 [hudi]
ROOBALJINDAL commented on issue #10884: URL: https://github.com/apache/hudi/issues/10884#issuecomment-2029401141 I have found the issue. We were using a custom MssqlDebeziumSource class as the Debezium source, and in its constructor we were using `HoodieStreamerMetrics` instead of `HoodieIngestionMetrics` (which was introduced in hudi 14.0). Once we corrected the class, it started working. We can close this issue.
Re: [I] [SUPPORT] The parquet files for the MOR table have been generated, but the RO table in Hive still cannot query the latest data in the parquet files. [hudi]
ad1happy2go commented on issue #10907: URL: https://github.com/apache/hudi/issues/10907#issuecomment-2029384654 @Toroidals Did you get a chance to check it? Were you able to identify the root cause of the issue?
[jira] [Created] (HUDI-7559) Fix issues with functional index (on column stats) based pruning
Vinaykumar Bhat created HUDI-7559: - Summary: Fix issues with functional index (on column stats) based pruning Key: HUDI-7559 URL: https://issues.apache.org/jira/browse/HUDI-7559 Project: Apache Hudi Issue Type: Bug Reporter: Vinaykumar Bhat -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-7559) Fix issues with functional index (on column stats) based pruning
[ https://issues.apache.org/jira/browse/HUDI-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinaykumar Bhat reassigned HUDI-7559: - Assignee: Vinaykumar Bhat > Fix issues with functional index (on column stats) based pruning > > > Key: HUDI-7559 > URL: https://issues.apache.org/jira/browse/HUDI-7559 > Project: Apache Hudi > Issue Type: Bug >Reporter: Vinaykumar Bhat >Assignee: Vinaykumar Bhat >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [I] [SUPPORT] Requesting Support for insert_overwrite in Delta Streamer [hudi]
ad1happy2go commented on issue #10896: URL: https://github.com/apache/hudi/issues/10896#issuecomment-2029348950 As Sudha suggested, can you also send a mail to the dev list and point to the conversation here? It would be good to hear thoughts on this from others.
Re: [I] [SUPPORT] Requesting Support for insert_overwrite in Delta Streamer [hudi]
ad1happy2go commented on issue #10896: URL: https://github.com/apache/hudi/issues/10896#issuecomment-2029348293 @soumilshah1995 This makes sense. Created a JIRA to track it as well: https://issues.apache.org/jira/browse/HUDI-7558