[jira] [Updated] (HUDI-7221) Move Hudi Option class from hudi-common to hudi-io module
[ https://issues.apache.org/jira/browse/HUDI-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7221: Description: This allows classes in the hudi-io module to also use the Option class. > Move Hudi Option class from hudi-common to hudi-io module > - > > Key: HUDI-7221 > URL: https://issues.apache.org/jira/browse/HUDI-7221 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > This allows classes in the hudi-io module to also use the Option class. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
[ https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7846. Resolution: Fixed > Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build > - > > Key: HUDI-7846 > URL: https://issues.apache.org/jira/browse/HUDI-7846 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.16.0, 1.0.0 > > > The following warning is thrown when running a Maven parallel build with `mvn -T 1C ...`:
{code:java}
[WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
[WARNING] *
[WARNING] * Your build is requesting parallel execution, but this *
[WARNING] * project contains the following plugin(s) that have goals not *
[WARNING] * marked as thread-safe to support parallel execution. *
[WARNING] * While this /may/ work fine, please look for plugin updates *
[WARNING] * and/or request plugins be made thread-safe. *
[WARNING] * If reporting an issue, report it against the plugin in *
[WARNING] * question, not against Apache Maven. *
[WARNING] *
[WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
[WARNING] org.apache.rat:apache-rat-plugin:0.13
{code}
[jira] [Updated] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
[ https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7846: Story Points: 0
[jira] [Updated] (HUDI-6798) Implement event-time-based merging mode in FileGroupReader
[ https://issues.apache.org/jira/browse/HUDI-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6798: Story Points: 10 (was: 3) > Implement event-time-based merging mode in FileGroupReader > -- > > Key: HUDI-6798 > URL: https://issues.apache.org/jira/browse/HUDI-6798 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: hudi-1.0.0-beta2, pull-request-available > Fix For: 1.0.0 > > > To achieve this, we should add a new table config {{hoodie.record.merge.mode}} to control the record merging mode and behavior in the new file group reader ({{HoodieFileGroupReader}}) and implement event-time ordering in it. The table config {{hoodie.record.merge.mode}} is going to be the single config that determines how record merging happens in release 1.0 and beyond. > Three merging modes to define:
> * {{OVERWRITE_WITH_LATEST}}: uses transaction time to merge records, i.e., the record from the later transaction overwrites the earlier record with the same key. This corresponds to the behavior of the existing payload class {{OverwriteWithLatestAvroPayload}}.
> * {{EVENT_TIME_ORDERING}}: uses event time as the ordering to merge records, i.e., the record with the larger event time overwrites the record with the smaller event time on the same key, regardless of transaction time. The event time or preCombine field needs to be specified by the user. This corresponds to the behavior of the existing payload class {{DefaultHoodieRecordPayload}}.
> * {{CUSTOM}}: uses custom merging logic specified by the user. When a user specifies a custom record merger strategy, or a payload class with the Avro record merger, this mode is set so that record merging follows the user-defined logic as before.
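To make the difference between the three modes concrete, here is a minimal Java sketch. It is not Hudi code: `MergeModeSketch`, `Rec` and `merge` are hypothetical names, a record is reduced to a key, an event-time (preCombine) value and a payload string, and the tie-breaking rule for equal event times is an assumption, since the issue does not specify one.

```java
import java.util.function.BinaryOperator;

/**
 * Hypothetical sketch of the three record merge modes described in HUDI-6798.
 * None of these names are Hudi APIs.
 */
class MergeModeSketch {

  enum RecordMergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

  /** One version of a record: its key, its event-time (preCombine) value and a payload. */
  static final class Rec {
    final String key;
    final long eventTime;
    final String value;

    Rec(String key, long eventTime, String value) {
      this.key = key;
      this.eventTime = eventTime;
      this.value = value;
    }
  }

  /**
   * Merges two versions of the same key; {@code older} was written by the earlier
   * transaction and {@code newer} by the later one.
   */
  static Rec merge(RecordMergeMode mode, Rec older, Rec newer, BinaryOperator<Rec> customMerger) {
    if (!older.key.equals(newer.key)) {
      throw new IllegalArgumentException("Only versions of the same key can be merged");
    }
    switch (mode) {
      case OVERWRITE_WITH_LATEST:
        // Transaction time decides: the later write always wins.
        return newer;
      case EVENT_TIME_ORDERING:
        // Event time decides, regardless of transaction order.
        // Assumption for this sketch: on equal event times the later write wins.
        return newer.eventTime >= older.eventTime ? newer : older;
      case CUSTOM:
        // User-defined logic decides.
        return customMerger.apply(older, newer);
      default:
        throw new IllegalStateException("Unknown merge mode: " + mode);
    }
  }
}
```

For example, if commit 1 writes key `k1` with event time 200 and commit 2 later writes `k1` with event time 100, `OVERWRITE_WITH_LATEST` keeps the commit 2 version while `EVENT_TIME_ORDERING` keeps the commit 1 version.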
[jira] [Closed] (HUDI-6798) Implement event-time-based merging mode in FileGroupReader
[ https://issues.apache.org/jira/browse/HUDI-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-6798. Resolution: Fixed
[jira] [Updated] (HUDI-6798) Implement event-time-based merging mode in FileGroupReader
[ https://issues.apache.org/jira/browse/HUDI-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6798: Description: To achieve this, we should add a new table config {{hoodie.record.merge.mode}} to control the record merging mode and behavior in the new file group reader ({{HoodieFileGroupReader}}) and implement event-time ordering in it. The table config {{hoodie.record.merge.mode}} is going to be the single config that determines how record merging happens in release 1.0 and beyond. Three merging modes to define:
* {{OVERWRITE_WITH_LATEST}}: uses transaction time to merge records, i.e., the record from the later transaction overwrites the earlier record with the same key. This corresponds to the behavior of the existing payload class {{OverwriteWithLatestAvroPayload}}.
* {{EVENT_TIME_ORDERING}}: uses event time as the ordering to merge records, i.e., the record with the larger event time overwrites the record with the smaller event time on the same key, regardless of transaction time. The event time or preCombine field needs to be specified by the user. This corresponds to the behavior of the existing payload class {{DefaultHoodieRecordPayload}}.
* {{CUSTOM}}: uses custom merging logic specified by the user. When a user specifies a custom record merger strategy, or a payload class with the Avro record merger, this mode is set so that record merging follows the user-defined logic as before.
was: To achieve this, we should add a new table config {{hoodie.record.merge.mode}} to control the record merging mode and behavior in the new file group reader ({{HoodieFileGroupReader}}) and implement event-time ordering in it. The table config {{hoodie.record.merge.mode}} is going to be the single config that determines how record merging happens in release 1.0 and beyond.
[jira] [Updated] (HUDI-6798) Implement event-time-based merging mode in FileGroupReader
[ https://issues.apache.org/jira/browse/HUDI-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6798: Description: To achieve this, we should add a new table config {{hoodie.record.merge.mode}} to control the record merging mode and behavior in the new file group reader ({{HoodieFileGroupReader}}) and implement event-time ordering in it. The table config {{hoodie.record.merge.mode}} is going to be the single config that determines how record merging happens in release 1.0 and beyond.
[jira] [Closed] (HUDI-7045) Fix new file format and reader for schema evolution
[ https://issues.apache.org/jira/browse/HUDI-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7045. Resolution: Fixed > Fix new file format and reader for schema evolution > --- > > Key: HUDI-7045 > URL: https://issues.apache.org/jira/browse/HUDI-7045 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > When this is implemented, parquet readers should not be created in HoodieFileGroupReaderBasedParquetFileFormat. Additionally, we can uncomment/add the code from this commit: [https://github.com/apache/hudi/pull/10137/commits/b0b711e0c355320da652fa7f2d8669539873d4d6]
Re: [PR] [MINOR] Moving to 0.16.0-SNAPSHOT on branch-0.x [hudi]
hudi-bot commented on PR #11425: URL: https://github.com/apache/hudi/pull/11425#issuecomment-2157467585 ## CI report: * 47b890aa81f7f92f092da13ef7a7999f579f5d03 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7853) Fix missing serDe properties post migration from hiveSync to glueSync
[ https://issues.apache.org/jira/browse/HUDI-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7853: Labels: pull-request-available (was: ) > Fix missing serDe properties post migration from hiveSync to glueSync > - > > Key: HUDI-7853 > URL: https://issues.apache.org/jira/browse/HUDI-7853 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Prathit Malik >Assignee: Prathit Malik >Priority: Major > Labels: pull-request-available > > More info: [https://github.com/apache/hudi/issues/11397] > > After migrating to 0.13.1, the Hudi table path is missing from the SerDe properties, so reading the table from Spark throws the following error: > - org.apache.hudi.exception.HoodieException: 'path' or 'Key: 'hoodie.datasource.read.paths' , default: null description: Comma separated list of file paths to read within a Hudi table. since version: version is not defined deprecated after: version is not defined)' or both must be specified.
Re: [PR] [HUDI-7852] Constrain the comparison of different types of ordering values to limited cases [hudi]
hudi-bot commented on PR #11424: URL: https://github.com/apache/hudi/pull/11424#issuecomment-2157467513 ## CI report: * f90504d5f8ef99b4ea25dd5b05127c54d3f4252e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24328)
Re: [PR] [HUDI-7849] Reduce time spent on running testFiltersInFileFormat [hudi]
hudi-bot commented on PR #11423: URL: https://github.com/apache/hudi/pull/11423#issuecomment-2157467452 ## CI report: * 19caeb8d2270645aa6d0ddbdeaa08b31755d974b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24327)
Re: [PR] [HUDI-7853] Fix missing serDe properties post migration from hiveSync to glueSync [hudi]
hudi-bot commented on PR #11404: URL: https://github.com/apache/hudi/pull/11404#issuecomment-2157467328 ## CI report: * fcafb8766c4b27557d1c40398ce28d8de8aec724 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24243) * 9dcca40f6488ef301b93176293f0f46dcb8ae017 UNKNOWN
[jira] [Commented] (HUDI-7847) Infer record merge mode during table upgrade
[ https://issues.apache.org/jira/browse/HUDI-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853562#comment-17853562 ] Ethan Guo commented on HUDI-7847: Note that HUDI-6798 has added the inference logic for the new record merge mode table config in HoodieTableMetaClient#inferRecordMergeMode (see [https://github.com/apache/hudi/pull/9894]). We can reuse the same logic during the table upgrade from table version 7 to 8 (SevenToEightUpgradeHandler). > Infer record merge mode during table upgrade > > > Key: HUDI-7847 > URL: https://issues.apache.org/jira/browse/HUDI-7847 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Geser Dugarov >Priority: Major > Fix For: 1.0.0 > > > Record merge mode is required to dictate the merging behavior in release 1.x, playing the same role as the payload class config in release 0.x. During table upgrade, we need to infer the record merge mode based on the payload class so that it is set correctly.
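A minimal Java sketch of the kind of payload-class-to-merge-mode mapping an upgrade step would need, following the correspondence stated in HUDI-6798 (OverwriteWithLatestAvroPayload to OVERWRITE_WITH_LATEST, DefaultHoodieRecordPayload to EVENT_TIME_ORDERING, anything else to CUSTOM). It is not the actual HoodieTableMetaClient#inferRecordMergeMode code: the class and method names are hypothetical, the fully qualified payload class names are assumed, and resolving an unset payload class to its default is deliberately left out.

```java
/**
 * Hypothetical sketch: infer a record merge mode from an already-resolved 0.x
 * payload class name. Not Hudi's actual inference logic.
 */
class MergeModeInferenceSketch {

  enum RecordMergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

  // Fully qualified names of the payload classes named in HUDI-6798 (assumed package).
  static final String OVERWRITE_WITH_LATEST_PAYLOAD =
      "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload";
  static final String DEFAULT_PAYLOAD =
      "org.apache.hudi.common.model.DefaultHoodieRecordPayload";

  /** Maps a payload class name to the merge mode with equivalent behavior. */
  static RecordMergeMode inferFromPayloadClass(String payloadClassName) {
    if (payloadClassName == null || payloadClassName.isEmpty()) {
      // Defaulting an unset payload class is out of scope for this sketch.
      throw new IllegalArgumentException("Payload class name must be resolved before inference");
    }
    if (payloadClassName.equals(OVERWRITE_WITH_LATEST_PAYLOAD)) {
      return RecordMergeMode.OVERWRITE_WITH_LATEST;
    }
    if (payloadClassName.equals(DEFAULT_PAYLOAD)) {
      return RecordMergeMode.EVENT_TIME_ORDERING;
    }
    // Any other payload class carries user-defined merge logic.
    return RecordMergeMode.CUSTOM;
  }
}
```

Keeping the mapping in one pure function is what makes reuse between table creation and the version 7 to 8 upgrade straightforward, which is the point of the comment above.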
[jira] [Updated] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
[ https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7846: Sprint: 2024/06/03-16
[jira] [Updated] (HUDI-7853) Fix missing serDe properties post migration from hiveSync to glueSync
[ https://issues.apache.org/jira/browse/HUDI-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prathit Malik updated HUDI-7853: Description: More info: [https://github.com/apache/hudi/issues/11397] After migrating to 0.13.1, the Hudi table path is missing from the SerDe properties, so reading the table from Spark throws the following error: - org.apache.hudi.exception.HoodieException: 'path' or 'Key: 'hoodie.datasource.read.paths' , default: null description: Comma separated list of file paths to read within a Hudi table. since version: version is not defined deprecated after: version is not defined)' or both must be specified. was: More info: [https://github.com/apache/hudi/issues/11397] After migrating to 0.13.1, the Hudi table path is missing from the SerDe properties, so reading the table from Spark throws the following error: ```org.apache.hudi.exception.HoodieException: 'path' or 'Key: 'hoodie.datasource.read.paths' , default: null description: Comma separated list of file paths to read within a Hudi table. since version: version is not defined deprecated after: version is not defined)' or both must be specified.```
Re: [PR] [HUDI-7852] Constrain the comparison of different types of ordering values to limited cases [hudi]
hudi-bot commented on PR #11424: URL: https://github.com/apache/hudi/pull/11424#issuecomment-2157455424 ## CI report: * f90504d5f8ef99b4ea25dd5b05127c54d3f4252e UNKNOWN
Re: [PR] [HUDI-7849] Reduce time spent on running testFiltersInFileFormat [hudi]
hudi-bot commented on PR #11423: URL: https://github.com/apache/hudi/pull/11423#issuecomment-2157455359 ## CI report: * 19caeb8d2270645aa6d0ddbdeaa08b31755d974b UNKNOWN
[jira] [Created] (HUDI-7853) Fix missing serDe properties post migration from hiveSync to glueSync
Prathit Malik created HUDI-7853: Summary: Fix missing serDe properties post migration from hiveSync to glueSync Key: HUDI-7853 URL: https://issues.apache.org/jira/browse/HUDI-7853 Project: Apache Hudi Issue Type: Improvement Reporter: Prathit Malik Assignee: Prathit Malik More info: [https://github.com/apache/hudi/issues/11397] After migrating to 0.13.1, the Hudi table path is missing from the SerDe properties, so reading the table from Spark throws the following error: ```org.apache.hudi.exception.HoodieException: 'path' or 'Key: 'hoodie.datasource.read.paths' , default: null description: Comma separated list of file paths to read within a Hudi table. since version: version is not defined deprecated after: version is not defined)' or both must be specified.```
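The error names the two keys the Spark reader accepts: `path` and `hoodie.datasource.read.paths`. The Java sketch below mirrors that check and the kind of repair the issue implies (putting the table base path back into the SerDe properties). It is an illustration under assumptions, not the hive-sync or Glue-sync code: `SerdePathSketch` and its methods are hypothetical, SerDe properties are modeled as a plain `Map`, and the example base path in the test is made up.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch for HUDI-7853: the condition behind the reader error and
 * a repair that restores the base path in the SerDe properties.
 */
class SerdePathSketch {

  // Keys named in the HoodieException message quoted in HUDI-7853.
  static final String PATH_KEY = "path";
  static final String READ_PATHS_KEY = "hoodie.datasource.read.paths";

  /** True if at least one of the two keys the reader accepts has a non-empty value. */
  static boolean hasReadablePath(Map<String, String> props) {
    return isSet(props.get(PATH_KEY)) || isSet(props.get(READ_PATHS_KEY));
  }

  /** Returns a copy of the SerDe properties with the base path added when it is missing. */
  static Map<String, String> withBasePath(Map<String, String> serdeProps, String basePath) {
    Map<String, String> copy = new HashMap<>(serdeProps);
    if (!isSet(copy.get(PATH_KEY))) {
      copy.put(PATH_KEY, basePath);
    }
    return copy;
  }

  private static boolean isSet(String value) {
    return value != null && !value.isEmpty();
  }
}
```

Returning a copy rather than mutating the input keeps the check and the repair independent, so the same helper could be applied to properties read back from any catalog.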
[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases
[ https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7852: Sprint: 2024/06/03-16 > Constrain the comparison of different types of ordering values to limited cases > --- > > Key: HUDI-7852 > URL: https://issues.apache.org/jira/browse/HUDI-7852 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting them to long values, which may not be safe for Float and Double. We should limit the allowed cases to avoid wrong results.
[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases
[ https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7852: Status: Patch Available (was: In Progress)
[PR] [MINOR] Moving to 0.16.0-SNAPSHOT on branch-0.x [hudi]
yihua opened a new pull request, #11425: URL: https://github.com/apache/hudi/pull/11425 ### Change Logs This PR moves branch-0.x to version 0.16.0-SNAPSHOT. ### Impact Moves to the next 0.x version. ### Risk level none ### Documentation Update none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
Re: [PR] [HUDI-7851] Fix java doc of DeltaWriteProfile [hudi]
hudi-bot commented on PR #11422: URL: https://github.com/apache/hudi/pull/11422#issuecomment-2157444537 ## CI report: * c7b9a3e72f987f3de9fa15917526fbb6f55d8d1b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24326)
[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases
[ https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7852: Labels: pull-request-available (was: )
[PR] [HUDI-7852] Constrain the comparison of different types of ordering values to limited cases [hudi]
yihua opened a new pull request, #11424: URL: https://github.com/apache/hudi/pull/11424 ### Change Logs `HoodieBaseFileGroupRecordBuffer#compareTo` compares the numbers by casting them to long values, which may not be safe for Float and Double. This PR limits the allowed cases of ordering value comparison to avoid wrong results. ### Impact Makes ordering value comparison safe. ### Risk level low ### Documentation Update none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
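The reason casting to long is unsafe for Float and Double is that `Number#longValue()` truncates the fractional part, so distinct ordering values can compare as equal. The Java sketch below shows that failure and one way to constrain comparisons. It is not the code in the PR: the class and method names are hypothetical, and the choice of Integer vs Long as the only allowed mixed-type case is an assumption made for illustration.

```java
/**
 * Hypothetical sketch for HUDI-7852: why comparing ordering values through
 * longValue() is unsafe for Float/Double, and one way to restrict comparisons.
 */
class OrderingValueCompareSketch {

  /** The unsafe pattern: every number is truncated to a long before comparing. */
  static int compareViaLong(Number a, Number b) {
    return Long.compare(a.longValue(), b.longValue());
  }

  /**
   * A constrained comparison: values of the same class are compared natively, and the
   * only mixed-type case allowed here (an assumption) is Integer vs Long.
   */
  @SuppressWarnings({"unchecked", "rawtypes"})
  static int compareOrderingValues(Comparable a, Comparable b) {
    if (a.getClass() == b.getClass()) {
      return a.compareTo(b);
    }
    if (isIntOrLong(a) && isIntOrLong(b)) {
      // Widening int to long is lossless, so this comparison is exact.
      return Long.compare(((Number) a).longValue(), ((Number) b).longValue());
    }
    throw new IllegalArgumentException("Cannot compare ordering values of types "
        + a.getClass().getSimpleName() + " and " + b.getClass().getSimpleName());
  }

  private static boolean isIntOrLong(Object value) {
    return value instanceof Integer || value instanceof Long;
  }
}
```

With this sketch, `compareViaLong(1.9, 1.2)` returns 0 because both values truncate to 1, so a record with ordering value 1.9 would not be recognized as newer than one with 1.2; `compareOrderingValues(1.9, 1.2)` returns a positive value, and `compareOrderingValues(1.5, 2)` is rejected instead of silently producing a wrong result.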
[jira] [Updated] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat
[ https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov updated HUDI-7849: Status: In Progress (was: Open) > Reduce time spent on running testFiltersInFileFormat > > > Key: HUDI-7849 > URL: https://issues.apache.org/jira/browse/HUDI-7849 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Vova Kolmakov >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > The following shows the longest-running tests in the job "UT flink & FT common & flink & spark-client & hudi-spark" in Azure CI. The running time of testFiltersInFileFormat should be reduced.
{code:java}
/usr/bin/bash --noprofile --norc /home/vsts/work/_temp/4fa77791-00bc-40cc-82d7-1fb635914a0f.sh
grep: */target/surefire-reports/*.xml: No such file or directory
366.474 boolean) [2] false(testFiltersInFileFormat
223.221 boolean) [1] true(testFiltersInFileFormat
80.903 HoodieTableType, Integer) [3] MERGE_ON_READ, 2(testNewParquetFileFormat
65.48 boolean) [2] true(testDeletePartitionAndArchive
56.558 boolean) [1] false(testDeletePartitionAndArchive
{code}
[jira] [Updated] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat
[ https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7849: Labels: pull-request-available (was: )
[PR] [HUDI-7849] Reduce time spent on running testFiltersInFileFormat [hudi]
wombatu-kun opened a new pull request, #11423: URL: https://github.com/apache/hudi/pull/11423 ### Change Logs - reduced the number of inserts/updates (from 100/20 to 10/2); - added lib `spark-fast-tests` (in test scope) and used `assertSmallDatasetEquality` for comparing dataframes. ### Impact While running testFiltersInFileFormat locally: before: [true] - 1,26min, [false] - 1,01min; after: [true] - 33sec, [false] - 19 sec. ### Risk level (write none, low medium or high below) none ### Documentation Update none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases
[ https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7852: Story Points: 1 > Constrain the comparison of different types of ordering values to limited > cases > --- > > Key: HUDI-7852 > URL: https://issues.apache.org/jira/browse/HUDI-7852 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting > them to the long value, which may not be safe for Float and Double. We > should limit the allowed cases to avoid wrong results.
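The truncation this issue describes is easy to reproduce with plain JDK types. The sketch below reuses only the arithmetic of the fallback branch quoted in the PR #9894 review (`Number.longValue()` on both sides, then a long comparison); the class and method names are made up for illustration.

```java
class LongCastComparison {

  /** Same arithmetic as the quoted fallback: convert both values with longValue(), then compare. */
  static int compareAsLong(Number o1, Number o2) {
    return Long.compare(o1.longValue(), o2.longValue());
  }

  public static void main(String[] args) {
    // Two distinct event times that both truncate to 1L compare as equal.
    System.out.println(compareAsLong(1.9d, 1.1d));  // 0
    System.out.println(Double.compare(1.9d, 1.1d)); // 1 (the correct ordering)
    // A mixed Float/Long pair loses its ordering the same way: 2.5f truncates to 2L.
    System.out.println(compareAsLong(2.5f, 2L));    // 0
  }
}
```

Integer and Long values survive `longValue()` unchanged, which is why the issue only calls out Float and Double.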
[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases
[ https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7852: Status: In Progress (was: Open) > Constrain the comparison of different types of ordering values to limited > cases > --- > > Key: HUDI-7852 > URL: https://issues.apache.org/jira/browse/HUDI-7852 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting > them to the long value, which may not be safe for Float and Double. We > should limit the allowed cases to avoid wrong results.
Re: [PR] [HUDI-7851] Fix java doc of DeltaWriteProfile [hudi]
hudi-bot commented on PR #11422: URL: https://github.com/apache/hudi/pull/11422#issuecomment-2157229627 ## CI report: * c7b9a3e72f987f3de9fa15917526fbb6f55d8d1b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24326) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7851] Fix java doc of DeltaWriteProfile [hudi]
hudi-bot commented on PR #11422: URL: https://github.com/apache/hudi/pull/11422#issuecomment-2157189600 ## CI report: * c7b9a3e72f987f3de9fa15917526fbb6f55d8d1b UNKNOWN
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
yihua commented on code in PR #9894: URL: https://github.com/apache/hudi/pull/9894#discussion_r1632554676 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java: ## @@ -147,6 +153,37 @@ public void close() { records.clear(); } + /** + * Compares two {@link Comparable}s. If both are numbers, converts them to {@link Long} for comparison. + * If one of the {@link Comparable}s is a String, assumes that both are String values for comparison. + * + * @param o1 {@link Comparable} object. + * @param o2 other {@link Comparable} object to compare to. + * @return comparison result. + */ + @VisibleForTesting + static int compareTo(Comparable o1, Comparable o2) { +// TODO(HUDI-7848): fix the delete records to contain the correct ordering value type +// so this util with the number comparison is not necessary. +try { + return o1.compareTo(o2); +} catch (ClassCastException e) { + if (o1 instanceof Number && o2 instanceof Number) { +Long o1LongValue = ((Number) o1).longValue(); +Long o2LongValue = ((Number) o2).longValue(); +return o1LongValue.compareTo(o2LongValue); Review Comment: HUDI-7852 to track.
[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases
[ https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7852: Description: HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting them to the long value, which may not be safe for Float and Double. We should limit the allowed cases to avoid wrong results. > Constrain the comparison of different types of ordering values to limited > cases > --- > > Key: HUDI-7852 > URL: https://issues.apache.org/jira/browse/HUDI-7852 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > > HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting > them to the long value, which may not be safe for Float and Double. We > should limit the allowed cases to avoid wrong results.
[jira] [Updated] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases
[ https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7852: Fix Version/s: 1.0.0 > Constrain the comparison of different types of ordering values to limited > cases > --- > > Key: HUDI-7852 > URL: https://issues.apache.org/jira/browse/HUDI-7852 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting > them to the long value, which may not be safe for Float and Double. We > should limit the allowed cases to avoid wrong results.
[jira] [Assigned] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases
[ https://issues.apache.org/jira/browse/HUDI-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7852: --- Assignee: Ethan Guo > Constrain the comparison of different types of ordering values to limited > cases > --- > > Key: HUDI-7852 > URL: https://issues.apache.org/jira/browse/HUDI-7852 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > > HoodieBaseFileGroupRecordBuffer#compareTo compares the numbers by casting > them to the long value, which may not be safe for Float and Double. We > should limit the allowed cases to avoid wrong results.
[jira] [Created] (HUDI-7852) Constrain the comparison of different types of ordering values to limited cases
Ethan Guo created HUDI-7852: --- Summary: Constrain the comparison of different types of ordering values to limited cases Key: HUDI-7852 URL: https://issues.apache.org/jira/browse/HUDI-7852 Project: Apache Hudi Issue Type: Improvement Reporter: Ethan Guo
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
yihua commented on code in PR #9894: URL: https://github.com/apache/hudi/pull/9894#discussion_r1632553603 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java: ## @@ -147,6 +153,37 @@ public void close() { records.clear(); } + /** + * Compares two {@link Comparable}s. If both are numbers, converts them to {@link Long} for comparison. + * If one of the {@link Comparable}s is a String, assumes that both are String values for comparison. + * + * @param o1 {@link Comparable} object. + * @param o2 other {@link Comparable} object to compare to. + * @return comparison result. + */ + @VisibleForTesting + static int compareTo(Comparable o1, Comparable o2) { +// TODO(HUDI-7848): fix the delete records to contain the correct ordering value type +// so this util with the number comparison is not necessary. +try { + return o1.compareTo(o2); +} catch (ClassCastException e) { + if (o1 instanceof Number && o2 instanceof Number) { +Long o1LongValue = ((Number) o1).longValue(); +Long o2LongValue = ((Number) o2).longValue(); +return o1LongValue.compareTo(o2LongValue); Review Comment: We can constrain the comparison to Long and Integer only to limit the possibility of wrong results. I'll create a follow-up PR to fix this.
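One way the proposed constraint could look is sketched below. It assumes the goal is to allow the numeric fallback only when both values are Integer or Long (a lossless widening to long) and to reject every other mixed-type pair instead of guessing. The names are hypothetical, the String branch of the original helper is omitted, and this is not the code of the follow-up PR.

```java
class ConstrainedOrderingComparator {

  private static boolean isIntegral(Object o) {
    return o instanceof Integer || o instanceof Long;
  }

  /** Hypothetical helper: natural ordering for same-type values, long comparison only for Integer/Long mixes. */
  @SuppressWarnings({"rawtypes", "unchecked"})
  static int compareOrderingValues(Comparable o1, Comparable o2) {
    if (o1.getClass() == o2.getClass()) {
      return o1.compareTo(o2); // same runtime type: no conversion needed
    }
    if (isIntegral(o1) && isIntegral(o2)) {
      // Integer -> long widening is exact, so this cannot reorder values.
      return Long.compare(((Number) o1).longValue(), ((Number) o2).longValue());
    }
    throw new IllegalArgumentException("Cannot compare ordering values of types "
        + o1.getClass().getName() + " and " + o2.getClass().getName());
  }

  public static void main(String[] args) {
    System.out.println(compareOrderingValues(5, 3L));    // positive
    System.out.println(compareOrderingValues("a", "b")); // negative
    try {
      compareOrderingValues(1.9d, 1L);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing loudly on a Double/Long mix surfaces the type mismatch tracked in HUDI-7848 rather than silently producing a possibly wrong merge result.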
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
yihua commented on code in PR #9894: URL: https://github.com/apache/hudi/pull/9894#discussion_r1632551875 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java: ## @@ -147,6 +153,37 @@ public void close() { records.clear(); } + /** + * Compares two {@link Comparable}s. If both are numbers, converts them to {@link Long} for comparison. + * If one of the {@link Comparable}s is a String, assumes that both are String values for comparison. + * + * @param o1 {@link Comparable} object. + * @param o2 other {@link Comparable} object to compare to. + * @return comparison result. + */ + @VisibleForTesting + static int compareTo(Comparable o1, Comparable o2) { +// TODO(HUDI-7848): fix the delete records to contain the correct ordering value type Review Comment: Yes, based on the test cases this only happens when the ordering field value is deserialized from the delete records. We need to check if the existing Avro-based merging logic has done schema handling to make this work (which may also incur additional overhead).
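Why the helper needs a `ClassCastException` fallback at all can be shown with plain JDK types. Assuming, purely for illustration, that the data record carries its ordering value as a Long while the deserialized delete record carries an Integer, the raw `Comparable.compareTo` call fails at runtime because `Long.compareTo` only accepts another Long. The class and method names are hypothetical.

```java
class MixedTypeOrdering {

  /** Returns true if the raw compareTo call throws ClassCastException for this pair. */
  @SuppressWarnings({"rawtypes", "unchecked"})
  static boolean rawCompareThrows(Comparable o1, Comparable o2) {
    try {
      o1.compareTo(o2);
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Comparable<?> fromDataRecord = 100L;  // ordering value typed as Long (assumed)
    Comparable<?> fromDeleteRecord = 100; // same value deserialized as Integer (assumed)
    System.out.println(rawCompareThrows(fromDataRecord, fromDeleteRecord)); // true
    System.out.println(rawCompareThrows(100L, 99L));                         // false
  }
}
```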
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
yihua commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157165840 CI is green. https://github.com/apache/hudi/assets/2497195/6d8f4fa9-3e64-4914-9a46-05e8783cd458
[jira] [Updated] (HUDI-7851) Fix java doc of DeltaWriteProfile
[ https://issues.apache.org/jira/browse/HUDI-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7851: - Labels: pull-request-available (was: ) > Fix java doc of DeltaWriteProfile > - > > Key: HUDI-7851 > URL: https://issues.apache.org/jira/browse/HUDI-7851 > Project: Apache Hudi > Issue Type: Improvement >Reporter: bradley >Priority: Major > Labels: pull-request-available >
[PR] [HUDI-7851] Fix java doc of DeltaWriteProfile [hudi]
usberkeley opened a new pull request, #11422: URL: https://github.com/apache/hudi/pull/11422 ### Change Logs Fix java doc of DeltaWriteProfile ### Impact none ### Risk level (write none, low medium or high below) none ### Documentation Update none ### Contributor's checklist - [1] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [1] Change Logs and Impact were stated clearly - [1] Adequate tests were added if applicable - [1] CI passed
(hudi) branch master updated: [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader (#9894)
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new c0576131759 [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader (#9894) c0576131759 is described below commit c05761317596585a3c0c3cc69a34b4407843351c Author: Y Ethan Guo AuthorDate: Sun Jun 9 20:48:09 2024 -0700 [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader (#9894) This PR adds a new table config `hoodie.record.merge.mode` to control the record merging mode and behavior in the new file group reader (`HoodieFileGroupReader`) and implements event-time ordering in it. The config `hoodie.record.merge.mode` is going to be the single config that determines how the record merging happens in release 1.0 and beyond. - Co-authored-by: Sagar Sumit --- .../hudi/client/TestTableSchemaEvolution.java | 3 + .../hudi/common/config/HoodieCommonConfig.java | 3 + .../apache/hudi/common/config/RecordMergeMode.java | 36 .../hudi/common/table/HoodieTableConfig.java | 13 +- .../hudi/common/table/HoodieTableMetaClient.java | 114 ++- .../table/log/BaseHoodieLogRecordReader.java | 7 + .../table/log/HoodieMergedLogRecordReader.java | 13 +- .../read/HoodieBaseFileGroupRecordBuffer.java | 209 - .../common/table/read/HoodieFileGroupReader.java | 26 ++- .../table/read/TestHoodieFileGroupReaderBase.java | 77 ++-- .../common/table/TestHoodieTableMetaClient.java| 144 ++ .../hudi/common/table/read/TestCustomMerger.java | 4 + .../common/table/read/TestEventTimeMerging.java| 4 + ...stHoodiePositionBasedFileGroupRecordBuffer.java | 6 +- .../read/TestHoodieFileGroupReaderOnSpark.scala| 11 +- 15 files changed, 588 insertions(+), 82 deletions(-) diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java index f5fa70c6668..496b42c13d6 100644 --- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java +++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java @@ -20,6 +20,7 @@ package org.apache.hudi.client; import org.apache.hudi.avro.AvroSchemaUtils; import org.apache.hudi.avro.HoodieAvroUtils; +import org.apache.hudi.common.config.RecordMergeMode; import org.apache.hudi.common.model.HoodieAvroRecord; import org.apache.hudi.common.model.HoodieKey; import org.apache.hudi.common.model.HoodieRecord; @@ -48,6 +49,7 @@ import java.io.IOException; import java.util.List; import java.util.stream.Collectors; +import static org.apache.hudi.common.config.HoodieCommonConfig.RECORD_MERGE_MODE; import static org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion.VERSION_1; import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.EXTRA_TYPE_SCHEMA; import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.FARE_NESTED_SCHEMA; @@ -165,6 +167,7 @@ public class TestTableSchemaEvolution extends HoodieClientTestBase { HoodieTableMetaClient.withPropertyBuilder() .fromMetaClient(metaClient) .setTableType(HoodieTableType.MERGE_ON_READ) + .setRecordMergeMode(RecordMergeMode.valueOf(RECORD_MERGE_MODE.defaultValue())) .setTimelineLayoutVersion(VERSION_1) .initTable(metaClient.getStorageConf().newInstance(), metaClient.getBasePath()); diff --git a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java index 1a4c2e31780..c96b07ee4f0 100644 --- a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java +++ b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java @@ -18,6 +18,7 @@ package 
org.apache.hudi.common.config; +import org.apache.hudi.common.table.HoodieTableConfig; import org.apache.hudi.common.table.timeline.TimelineUtils.HollowCommitHandling; import org.apache.hudi.common.util.collection.ExternalSpillableMap; @@ -81,6 +82,8 @@ public class HoodieCommonConfig extends HoodieConfig { + " operation will fail schema compatibility check. Set this option to true will make the missing " + " column be filled with null values to successfully complete the write operation."); + public static final ConfigProperty RECORD_MERGE_MODE = HoodieTableConfig.RECORD_MERGE_MODE; + public static final ConfigProperty SPILLABLE_DISK_MAP_TYPE = ConfigProperty .key("hoodie.common.spillable.diskmap.type")
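A minimal sketch of the difference between overwrite-with-latest and event-time ordering for a single record key is shown below. It assumes a record with a key, a long ordering (event-time) value, and a payload. The enum and record types are hypothetical stand-ins, not Hudi's `RecordMergeMode` or `HoodieRecord`, and letting the newer write win on equal ordering values is an assumption of this sketch.

```java
import java.util.Objects;

class EventTimeMergeSketch {

  enum MergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING } // hypothetical subset

  record Rec(String key, long orderingValue, String payload) { }

  /** Merges an older write (e.g. from a base file) with a newer write (e.g. from a log file). */
  static Rec merge(Rec older, Rec newer, MergeMode mode) {
    if (!Objects.equals(older.key(), newer.key())) {
      throw new IllegalArgumentException("records must share a key");
    }
    switch (mode) {
      case OVERWRITE_WITH_LATEST:
        return newer; // write (commit) order decides
      case EVENT_TIME_ORDERING:
        // The larger ordering value wins; ties go to the newer write (assumption).
        return newer.orderingValue() >= older.orderingValue() ? newer : older;
      default:
        throw new IllegalStateException("unhandled mode " + mode);
    }
  }

  public static void main(String[] args) {
    Rec base = new Rec("id1", 20L, "v@20");
    Rec lateArrival = new Rec("id1", 10L, "v@10"); // written later, but with an older event time
    System.out.println(merge(base, lateArrival, MergeMode.OVERWRITE_WITH_LATEST).payload()); // v@10
    System.out.println(merge(base, lateArrival, MergeMode.EVENT_TIME_ORDERING).payload());   // v@20
  }
}
```

The late-arriving record is exactly the case where the two modes disagree: overwrite-with-latest keeps it, while event-time ordering keeps the record with the later event time.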
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
codope merged PR #9894: URL: https://github.com/apache/hudi/pull/9894
[jira] [Created] (HUDI-7851) Fix java doc of DeltaWriteProfile
bradley created HUDI-7851: - Summary: Fix java doc of DeltaWriteProfile Key: HUDI-7851 URL: https://issues.apache.org/jira/browse/HUDI-7851 Project: Apache Hudi Issue Type: Improvement Reporter: bradley
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157141857 ## CI report: * 3a1ec4524a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24313)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157136586 ## CI report: * ca01c48cd352583dbf024006de57c9f6827b237b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24324) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24323) * 3a1ec4524a UNKNOWN
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
codope commented on code in PR #9894: URL: https://github.com/apache/hudi/pull/9894#discussion_r1632524037 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java: ## @@ -147,6 +153,37 @@ public void close() { records.clear(); } + /** + * Compares two {@link Comparable}s. If both are numbers, converts them to {@link Long} for comparison. + * If one of the {@link Comparable}s is a String, assumes that both are String values for comparison. + * + * @param o1 {@link Comparable} object. + * @param o2 other {@link Comparable} object to compare to. + * @return comparison result. + */ + @VisibleForTesting + static int compareTo(Comparable o1, Comparable o2) { +// TODO(HUDI-7848): fix the delete records to contain the correct ordering value type Review Comment: does this happen only for delete records? ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java: ## @@ -147,6 +153,37 @@ public void close() { records.clear(); } + /** + * Compares two {@link Comparable}s. If both are numbers, converts them to {@link Long} for comparison. + * If one of the {@link Comparable}s is a String, assumes that both are String values for comparison. + * + * @param o1 {@link Comparable} object. + * @param o2 other {@link Comparable} object to compare to. + * @return comparison result. + */ + @VisibleForTesting + static int compareTo(Comparable o1, Comparable o2) { +// TODO(HUDI-7848): fix the delete records to contain the correct ordering value type +// so this util with the number comparison is not necessary. +try { + return o1.compareTo(o2); +} catch (ClassCastException e) { + if (o1 instanceof Number && o2 instanceof Number) { +Long o1LongValue = ((Number) o1).longValue(); +Long o2LongValue = ((Number) o2).longValue(); +return o1LongValue.compareTo(o2LongValue); Review Comment: can possibly lead to wrong result with float/double?
Re: [PR] [HUDI-7826] Make column nullable when setNullForMissingColumns is true [hudi]
the-other-tim-brown commented on code in PR #11381: URL: https://github.com/apache/hudi/pull/11381#discussion_r1632525375 ## hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/AvroSchemaEvolutionUtils.java: ## @@ -113,6 +120,21 @@ public static InternalSchema reconcileSchema(Schema incomingSchema, InternalSche typeChange.updateColumnType(col, inComingInternalSchema.findType(col)); }); +// mark columns missing from incoming schema as nullable +Set visited = new HashSet<>(); +diffFromOldSchema.stream() +// ignore meta fields +.filter(col -> !META_FIELD_NAMES.contains(col)) +.sorted() +.forEach(col -> { + // if parent is marked as nullable, only update the parent and not all the missing children field + String parent = TableChangesHelper.getParentName(col); + if (!visited.contains(parent)) { +typeChange.updateColumnNullability(col, true); + } + visited.add(col); +}); Review Comment: @nsivabalan I've updated the PR to include the boolean
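The loop in the quoted `reconcileSchema` change can be exercised on its own. The sketch below keeps its shape (filter out meta fields, sort, skip a column whose parent was already visited) but makes three assumptions: a hypothetical `parentName` stands in for `TableChangesHelper.getParentName` and returns the prefix before the last `.`; selected names are collected instead of calling `updateColumnNullability`; and `forEachOrdered` replaces `forEach` so the sorted order is guaranteed by contract. Sorting matters because a parent name is a prefix of its children's names and therefore sorts before them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MissingColumnNullability {

  /** Hypothetical stand-in for TableChangesHelper.getParentName: "a.b.c" -> "a.b", "a" -> "". */
  static String parentName(String col) {
    int idx = col.lastIndexOf('.');
    return idx < 0 ? "" : col.substring(0, idx);
  }

  /** Returns the missing columns that would be marked nullable, in sorted order. */
  static List<String> columnsToMarkNullable(Set<String> missing, Set<String> metaFields) {
    List<String> toMark = new ArrayList<>();
    Set<String> visited = new HashSet<>();
    missing.stream()
        .filter(col -> !metaFields.contains(col))
        .sorted() // "addr" < "addr.city" < "addr.zip"
        .forEachOrdered(col -> {
          if (!visited.contains(parentName(col))) {
            toMark.add(col); // top-most missing column of its subtree
          }
          visited.add(col); // added even when skipped, so grandchildren are skipped too
        });
    return toMark;
  }

  public static void main(String[] args) {
    Set<String> missing = Set.of("addr", "addr.city", "addr.zip", "tags", "_hoodie_commit_time");
    System.out.println(columnsToMarkNullable(missing, Set.of("_hoodie_commit_time"))); // [addr, tags]
  }
}
```

Because every visited column is recorded whether or not it was marked, only the top-most missing column of each nested subtree is made nullable.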
[jira] [Assigned] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat
[ https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov reassigned HUDI-7849: --- Assignee: Vova Kolmakov > Reduce time spent on running testFiltersInFileFormat > > > Key: HUDI-7849 > URL: https://issues.apache.org/jira/browse/HUDI-7849 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Vova Kolmakov >Priority: Major > Fix For: 1.0.0 > > > Below shows the top long-running tests in the job "UT flink & FT common & > flink & spark-client & hudi-spark" in Azure CI. The time running > testFiltersInFileFormat should be reduced. > {code:java} > /usr/bin/bash --noprofile --norc > /home/vsts/work/_temp/4fa77791-00bc-40cc-82d7-1fb635914a0f.sh > grep: */target/surefire-reports/*.xml: No such file or directory > 366.474 boolean) [2] false(testFiltersInFileFormat > 223.221 boolean) [1] true(testFiltersInFileFormat > 80.903 HoodieTableType, Integer) [3] MERGE_ON_READ, 2(testNewParquetFileFormat > 65.48 boolean) [2] true(testDeletePartitionAndArchive > 56.558 boolean) [1] false(testDeletePartitionAndArchive{code}
[jira] [Assigned] (HUDI-7847) Infer record merge mode during table upgrade
[ https://issues.apache.org/jira/browse/HUDI-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov reassigned HUDI-7847: --- Assignee: Geser Dugarov > Infer record merge mode during table upgrade > > > Key: HUDI-7847 > URL: https://issues.apache.org/jira/browse/HUDI-7847 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Geser Dugarov >Priority: Major > Fix For: 1.0.0 > > > Record merge mode is required to dictate the merging behavior in release 1.x, > playing the same role as the payload class config in release 0.x. During > table upgrade, we need to infer the record merge mode based on the payload > class so it's correctly set.
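A sketch of the inference this issue asks for follows, assuming a three-way mapping: `OverwriteWithLatestAvroPayload` (or an unset payload class) maps to overwrite-with-latest, `DefaultHoodieRecordPayload` maps to event-time ordering, and any other payload maps to a custom mode. The two payload class names are real Hudi classes, but the enum, the mapping, and the fallback for an unset class are assumptions, not the upgrade logic Hudi ships.

```java
class MergeModeInference {

  enum MergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM } // hypothetical

  static final String OVERWRITE_PAYLOAD =
      "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload";
  static final String DEFAULT_PAYLOAD =
      "org.apache.hudi.common.model.DefaultHoodieRecordPayload";

  /** Hypothetical upgrade-time mapping from a 0.x payload class name to a 1.x merge mode. */
  static MergeMode inferFromPayloadClass(String payloadClass) {
    if (payloadClass == null || payloadClass.isEmpty() || OVERWRITE_PAYLOAD.equals(payloadClass)) {
      // Assumption: an unset payload class is treated like OverwriteWithLatestAvroPayload.
      return MergeMode.OVERWRITE_WITH_LATEST;
    }
    if (DEFAULT_PAYLOAD.equals(payloadClass)) {
      return MergeMode.EVENT_TIME_ORDERING; // this payload compares ordering (precombine) values
    }
    return MergeMode.CUSTOM; // any other payload keeps user-defined merge logic
  }

  public static void main(String[] args) {
    System.out.println(inferFromPayloadClass(DEFAULT_PAYLOAD));         // EVENT_TIME_ORDERING
    System.out.println(inferFromPayloadClass("com.example.MyPayload")); // CUSTOM
  }
}
```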
[jira] [Assigned] (HUDI-7838) Use Config hoodie.schema.cache.enable in HoodieBaseFileGroupRecordBuffer and AbstractHoodieLogRecordReader
[ https://issues.apache.org/jira/browse/HUDI-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov reassigned HUDI-7838: --- Assignee: Vova Kolmakov > Use Config hoodie.schema.cache.enable in HoodieBaseFileGroupRecordBuffer and > AbstractHoodieLogRecordReader > --- > > Key: HUDI-7838 > URL: https://issues.apache.org/jira/browse/HUDI-7838 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core >Reporter: Jonathan Vexler >Assignee: Vova Kolmakov >Priority: Major > > hoodie.schema.cache.enable should be used to decide if we want to use the > schema cache. Currently it is hardcoded to false.
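A minimal sketch of honoring the flag instead of hard-coding `false` is shown below. It uses plain `java.util.Properties` rather than Hudi's own config classes, whose exact accessor the issue does not show. Defaulting to `false` when the key is unset is an assumption chosen to preserve the current behavior.

```java
import java.util.Properties;

class SchemaCacheFlag {

  static final String SCHEMA_CACHE_ENABLE_KEY = "hoodie.schema.cache.enable";

  /** Reads the flag; unset or unrecognized values fall back to false (assumed default). */
  static boolean isSchemaCacheEnabled(Properties props) {
    // Boolean.parseBoolean is true only for "true", ignoring case.
    return Boolean.parseBoolean(props.getProperty(SCHEMA_CACHE_ENABLE_KEY, "false"));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(isSchemaCacheEnabled(props)); // false: key unset
    props.setProperty(SCHEMA_CACHE_ENABLE_KEY, "true");
    System.out.println(isSchemaCacheEnabled(props)); // true
  }
}
```

The reader classes named in the issue would then pass this value wherever `false` is currently hard-coded.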
Re: [PR] [HUDI-7826] Make column nullable when setNullForMissingColumns is true [hudi]
hudi-bot commented on PR #11381: URL: https://github.com/apache/hudi/pull/11381#issuecomment-2157083778 ## CI report: * 0d1802d42d4b67cc791cbd8d8c4619dd7a52d319 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24320)
[jira] [Assigned] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write
[ https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov reassigned HUDI-7850: --- Assignee: Geser Dugarov > Makes hoodie.record.merge.mode mandatory upon creating the table and first > write > > > Key: HUDI-7850 > URL: https://issues.apache.org/jira/browse/HUDI-7850 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Assignee: Geser Dugarov >Priority: Major > Fix For: 1.0.0 > > > Right now, "hoodie.record.merge.mode" is optional during writes as it is > inferred from the payload class name, payload type, and the record merger > strategy during the creation of the table properties. We should make this > config mandatory in release 1.0 and make other merge configs optional to > simplify the configuration experience.
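If the config becomes mandatory, table creation could fail fast when it is absent instead of inferring it. The sketch below assumes the value must name one of three hypothetical enum constants, matched case-insensitively; the accepted values and error messages are illustrative only, and plain `java.util.Properties` stands in for Hudi's table properties.

```java
import java.util.Locale;
import java.util.Properties;

class MergeModeValidation {

  static final String RECORD_MERGE_MODE_KEY = "hoodie.record.merge.mode";

  enum MergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM } // hypothetical values

  /** Returns the configured mode, or throws if it is missing or unrecognized. */
  static MergeMode requireMergeMode(Properties props) {
    String value = props.getProperty(RECORD_MERGE_MODE_KEY);
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException(RECORD_MERGE_MODE_KEY + " must be set when creating the table");
    }
    try {
      return MergeMode.valueOf(value.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException("Unsupported " + RECORD_MERGE_MODE_KEY + ": " + value, e);
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(RECORD_MERGE_MODE_KEY, "event_time_ordering");
    System.out.println(requireMergeMode(props)); // EVENT_TIME_ORDERING
    try {
      requireMergeMode(new Properties());
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```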
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157082849 ## CI report: * ca01c48cd352583dbf024006de57c9f6827b237b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24324) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24323)
[jira] [Commented] (HUDI-7839) Can not find props file when using HoodieDeltaStreamer with Hudi 0.14.1
[ https://issues.apache.org/jira/browse/HUDI-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853545#comment-17853545 ] Vova Kolmakov commented on HUDI-7839: - Fixed via master branch: 9f9064761bac766cc7884027432568c06817ddd7 > Can not find props file when using HoodieDeltaStreamer with Hudi 0.14.1 > --- > > Key: HUDI-7839 > URL: https://issues.apache.org/jira/browse/HUDI-7839 > Project: Apache Hudi > Issue Type: Bug >Reporter: Xiaoxuan Li >Assignee: Vova Kolmakov >Priority: Major > > When using HoodieDeltaStreamer with Hudi 0.14.1, the following error is thrown: > {noformat} > Cannot read properties from dfs from file > file:/mnt1/yarn/usercache/hadoop/appcache/application_1717399456895_0009/container_1717399456895_0009_02_01/src/test/resources/streamer-config/dfs-source.properties{noformat} > > It works fine on Hudi 0.14.0. It might be related to a change introduced in > 0.14.1 -> [https://github.com/apache/hudi/pull/9913] > > error log: > {code:java} > 24/06/06 22:42:09 INFO Client:client token: N/Adiagnostics: User class threw > exception: org.apache.hudi.exception.HoodieIOException: Cannot read > properties from dfs from file > file:/mnt1/yarn/usercache/hadoop/appcache/application_1717399456895_0009/container_1717399456895_0009_02_01/src/test/resources/streamer-config/dfs-source.propertiesat > > org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:166)at > > org.apache.hudi.common.config.DFSPropertiesConfiguration.(DFSPropertiesConfiguration.java:85)at > org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:232)at > org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:437)at > > org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:656)at > > org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:632)at > > 
org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:525)at > > org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:498)at > > org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:404)at > org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:850)at > > org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)at > org.apache.hudi.common.util.Option.ifPresent(Option.java:97)at > org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:207)at > > org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:592)at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method)at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)at > > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)at > java.base/java.lang.reflect.Method.invoke(Method.java:568)at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:741)Caused > by: java.io.FileNotFoundException: File > file:/mnt1/yarn/usercache/hadoop/appcache/application_1717399456895_0009/container_1717399456895_0009_02_01/src/test/resources/streamer-config/dfs-source.properties > does not existat > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:968)at > > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1289)at > > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:958)at > > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:472)at > > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:188)at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:581)at > 
org.apache.hadoop.fs.FileSystem.open(FileSystem.java:1004)at > org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:161)... > 18 more > ApplicationMaster host: ip-172-31-75-55.ec2.internalApplicationMaster RPC > port: 43905queue: defaultstart time: 1717713711465final status: > FAILEDtracking URL: > http://ip-172-31-69-122.ec2.internal:20888/proxy/application_1717399456895_0009/user: > hadoop24/06/06 22:42:09 ERROR Client: Application diagnostics message: User > class threw exception: org.apache.hudi.exception.HoodieIOException: Cannot > read properties from dfs from file > file:/mnt1/yarn/usercache/hadoop/appcache/application_1717399456895_0009/container_1717399456895_0009_02_01/src/test/resources/streamer-config/dfs-source.propertiesat > > org.
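The missing file in this log appears to be a default relative path (`src/test/resources/streamer-config/dfs-source.properties`) resolved under the YARN container's working directory. The sketch below is not the change in commit 9f9064761bac766cc7884027432568c06817ddd7; it only illustrates, with a hypothetical `OptionalPropsLoader.loadIfPresent` helper, one defensive way to treat an absent, optional properties file as empty instead of aborting the job:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper (not a Hudi API): load a properties file only if it exists. */
final class OptionalPropsLoader {
  private OptionalPropsLoader() {}

  /**
   * Returns the properties stored at {@code path}, or an empty Properties object
   * when no regular file exists there. Other I/O failures are still propagated.
   */
  static Properties loadIfPresent(Path path) throws IOException {
    Properties props = new Properties();
    if (!Files.isRegularFile(path)) {
      // Missing optional/default file: continue with no properties.
      return props;
    }
    try (InputStream in = Files.newInputStream(path)) {
      props.load(in);
    }
    return props;
  }
}
```

This fallback only makes sense when the path is a built-in default; if a user explicitly passes a props file that does not exist, failing fast is usually the better behavior.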
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157034573 ## CI report: * 7b6c9d86accaf976f4db0185fa1a203c82f04446 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24322) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24321) * ca01c48cd352583dbf024006de57c9f6827b237b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157025978 ## CI report: * a6ffe1240055d6135a517dfcada59edc95383423 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24318) * 7b6c9d86accaf976f4db0185fa1a203c82f04446 UNKNOWN
[jira] [Updated] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write
[ https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7850: Fix Version/s: 1.0.0 > Makes hoodie.record.merge.mode mandatory upon creating the table and first > write > > > Key: HUDI-7850 > URL: https://issues.apache.org/jira/browse/HUDI-7850 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > Right now, "hoodie.record.merge.mode" is optional during writes as it is > inferred from the payload class name, payload type, and the record merger > strategy during the creation of the table properties. We should make this > config mandatory in release 1.0 and make other merge configs optional to > simplify the configuration experience. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write
[ https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7850: Description: Right now, "hoodie.record.merge.mode" is optional during writes as it is inferred from the payload class name, payload type, and the record merger strategy during the creation of the table properties. We should make this config mandatory in release 1.0 and make other merge configs optional to simplify the configuration experience. (was: Right now ) > Makes hoodie.record.merge.mode mandatory upon creating the table and first > write > > > Key: HUDI-7850 > URL: https://issues.apache.org/jira/browse/HUDI-7850 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Priority: Major > > Right now, "hoodie.record.merge.mode" is optional during writes as it is > inferred from the payload class name, payload type, and the record merger > strategy during the creation of the table properties. We should make this > config mandatory in release 1.0 and make other merge configs optional to > simplify the configuration experience. -- This message was sent by Atlassian Jira (v8.20.10#820010)
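A minimal sketch of the proposed fail-fast behavior, assuming table properties arrive as a `java.util.Properties` object. Only the key `hoodie.record.merge.mode` comes from the issue; the `MergeModeValidator` class and the three enum constants are illustrative assumptions, not Hudi's actual API:

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical sketch: require the merge mode instead of inferring it. */
final class MergeModeValidator {
  static final String MERGE_MODE_KEY = "hoodie.record.merge.mode";

  /** Assumed set of modes for illustration; Hudi's real enum may differ. */
  enum RecordMergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

  private MergeModeValidator() {}

  /**
   * Fails fast when the merge mode is absent, rather than deriving it from the
   * payload class name, payload type, and record merger strategy.
   */
  static RecordMergeMode requireMergeMode(Properties tableProps) {
    String value = tableProps.getProperty(MERGE_MODE_KEY);
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException(
          MERGE_MODE_KEY + " must be set when creating the table and on the first write");
    }
    // Enum.valueOf throws IllegalArgumentException for unknown mode names.
    return RecordMergeMode.valueOf(value.trim().toUpperCase(Locale.ROOT));
  }
}
```

With the mode explicit, any remaining merge-related configs (payload class, merger strategy) could then be treated as optional refinements of that choice.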
[jira] [Updated] (HUDI-7850) Makes `hoodie.record.merge.mode` mandatory upon creating the table and first write
[ https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7850: Description: Right now > Makes `hoodie.record.merge.mode` mandatory upon creating the table and first > write > -- > > Key: HUDI-7850 > URL: https://issues.apache.org/jira/browse/HUDI-7850 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Priority: Major > > Right now -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7850) Makes `hoodie.record.merge.mode` mandatory upon creating the table and first write
Ethan Guo created HUDI-7850: --- Summary: Makes `hoodie.record.merge.mode` mandatory upon creating the table and first write Key: HUDI-7850 URL: https://issues.apache.org/jira/browse/HUDI-7850 Project: Apache Hudi Issue Type: New Feature Reporter: Ethan Guo -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write
[ https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7850: Summary: Makes hoodie.record.merge.mode mandatory upon creating the table and first write (was: Makes `hoodie.record.merge.mode` mandatory upon creating the table and first write) > Makes hoodie.record.merge.mode mandatory upon creating the table and first > write > > > Key: HUDI-7850 > URL: https://issues.apache.org/jira/browse/HUDI-7850 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ethan Guo >Priority: Major > > Right now -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2157018634 ## CI report: * a6ffe1240055d6135a517dfcada59edc95383423 UNKNOWN
[jira] [Updated] (HUDI-7842) Update docs on the new record merge mode config
[ https://issues.apache.org/jira/browse/HUDI-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7842: Summary: Update docs on the new record merge mode config (was: Update docs with the new record merge mode config) > Update docs on the new record merge mode config > --- > > Key: HUDI-7842 > URL: https://issues.apache.org/jira/browse/HUDI-7842 > Project: Apache Hudi > Issue Type: Task >Reporter: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > We should educate users on the new record merge mode config introduced by > HUDI-6798 that simplifies configs controlling the merging behavior. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat
[ https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7849: Description: The following are the top long-running tests in the job "UT flink & FT common & flink & spark-client & hudi-spark" in Azure CI. The time spent running testFiltersInFileFormat should be reduced. {code:java} /usr/bin/bash --noprofile --norc /home/vsts/work/_temp/4fa77791-00bc-40cc-82d7-1fb635914a0f.sh grep: */target/surefire-reports/*.xml: No such file or directory 366.474 boolean) [2] false(testFiltersInFileFormat 223.221 boolean) [1] true(testFiltersInFileFormat 80.903 HoodieTableType, Integer) [3] MERGE_ON_READ, 2(testNewParquetFileFormat 65.48 boolean) [2] true(testDeletePartitionAndArchive 56.558 boolean) [1] false(testDeletePartitionAndArchive{code} > Reduce time spent on running testFiltersInFileFormat > > > Key: HUDI-7849 > URL: https://issues.apache.org/jira/browse/HUDI-7849 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > > The following are the top long-running tests in the job "UT flink & FT common & > flink & spark-client & hudi-spark" in Azure CI. The time spent running > testFiltersInFileFormat should be reduced. > {code:java} > /usr/bin/bash --noprofile --norc > /home/vsts/work/_temp/4fa77791-00bc-40cc-82d7-1fb635914a0f.sh > grep: */target/surefire-reports/*.xml: No such file or directory > 366.474 boolean) [2] false(testFiltersInFileFormat > 223.221 boolean) [1] true(testFiltersInFileFormat > 80.903 HoodieTableType, Integer) [3] MERGE_ON_READ, 2(testNewParquetFileFormat > 65.48 boolean) [2] true(testDeletePartitionAndArchive > 56.558 boolean) [1] false(testDeletePartitionAndArchive{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
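The timings above are drawn from Surefire XML reports (`*/target/surefire-reports/*.xml`). As a hedged sketch of how such a ranking could be produced (this is not the script the CI job runs; `SlowTestReport` and `slowest` are hypothetical names), the following parses one report's `testcase` elements and sorts them by their `time` attribute, which Surefire records in seconds:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: rank test cases in one Surefire XML report by duration. */
final class SlowTestReport {
  private SlowTestReport() {}

  /** Returns up to {@code limit} entries formatted as "seconds name", slowest first. */
  static List<String> slowest(String surefireXml, int limit) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(surefireXml.getBytes(StandardCharsets.UTF_8)));
    NodeList cases = doc.getElementsByTagName("testcase");
    List<Element> elements = new ArrayList<>();
    for (int i = 0; i < cases.getLength(); i++) {
      elements.add((Element) cases.item(i));
    }
    // Sort by the numeric value of the "time" attribute, largest first.
    elements.sort(Comparator.comparingDouble(
        (Element e) -> parseSeconds(e.getAttribute("time"))).reversed());
    List<String> result = new ArrayList<>();
    for (Element e : elements.subList(0, Math.min(limit, elements.size()))) {
      result.add(e.getAttribute("time") + " " + e.getAttribute("name"));
    }
    return result;
  }

  private static double parseSeconds(String raw) {
    // A missing attribute comes back as ""; treat it as zero. Grouping commas are stripped defensively.
    return raw.isEmpty() ? 0.0 : Double.parseDouble(raw.replace(",", ""));
  }
}
```

Aggregating across modules would mean running this over every report file and merging the lists before taking the top N.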
Re: [I] [SUPPORT] Data deduplication caused by drawback in the delete invalid files before commit [hudi]
danny0405 commented on issue #11419: URL: https://github.com/apache/hudi/issues/11419#issuecomment-2156988506 hmm, would you mind submitting a fix for it?
[jira] [Updated] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat
[ https://issues.apache.org/jira/browse/HUDI-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7849: Fix Version/s: 1.0.0 > Reduce time spent on running testFiltersInFileFormat > > > Key: HUDI-7849 > URL: https://issues.apache.org/jira/browse/HUDI-7849 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7849) Reduce time spent on running testFiltersInFileFormat
Ethan Guo created HUDI-7849: --- Summary: Reduce time spent on running testFiltersInFileFormat Key: HUDI-7849 URL: https://issues.apache.org/jira/browse/HUDI-7849 Project: Apache Hudi Issue Type: Improvement Reporter: Ethan Guo -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-7826] Make column nullable when setNullForMissingColumns is true [hudi]
hudi-bot commented on PR #11381: URL: https://github.com/apache/hudi/pull/11381#issuecomment-2156963475 ## CI report: * 7ac5620ea218b34184ba918f6197339f2f695eb9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24317) * 0d1802d42d4b67cc791cbd8d8c4619dd7a52d319 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24320)
Re: [PR] [HUDI-7826] Make column nullable when setNullForMissingColumns is true [hudi]
hudi-bot commented on PR #11381: URL: https://github.com/apache/hudi/pull/11381#issuecomment-2156937303 ## CI report: * 7ac5620ea218b34184ba918f6197339f2f695eb9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24317) * 0d1802d42d4b67cc791cbd8d8c4619dd7a52d319 UNKNOWN
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156933007 ## CI report: * 3a1ec4524a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24313)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156929070 ## CI report: * a6ffe1240055d6135a517dfcada59edc95383423 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24319) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24318) * 3a1ec4524a UNKNOWN
[jira] [Closed] (HUDI-7759) Remove Hadoop dependencies in hudi-common module
[ https://issues.apache.org/jira/browse/HUDI-7759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7759. --- Resolution: Fixed > Remove Hadoop dependencies in hudi-common module > > > Key: HUDI-7759 > URL: https://issues.apache.org/jira/browse/HUDI-7759 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-7752) Abstract serializeRecords for log writing
[ https://issues.apache.org/jira/browse/HUDI-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7752. --- Resolution: Fixed > Abstract serializeRecords for log writing > - > > Key: HUDI-7752 > URL: https://issues.apache.org/jira/browse/HUDI-7752 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-7754) Remove AvroWriteSupport and ParquetReaderIterator from hudi-common
[ https://issues.apache.org/jira/browse/HUDI-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7754. --- Resolution: Fixed > Remove AvroWriteSupport and ParquetReaderIterator from hudi-common > -- > > Key: HUDI-7754 > URL: https://issues.apache.org/jira/browse/HUDI-7754 > Project: Apache Hudi > Issue Type: Task >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > Two classes with Hadoop dependencies that can be moved to hadoop common and are not > covered by other PRs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-7750) Move HoodieLogFormatWriter class to hoodie-hadoop-common module
[ https://issues.apache.org/jira/browse/HUDI-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7750. --- Resolution: Fixed > Move HoodieLogFormatWriter class to hoodie-hadoop-common module > --- > > Key: HUDI-7750 > URL: https://issues.apache.org/jira/browse/HUDI-7750 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156876724 ## CI report: * a6ffe1240055d6135a517dfcada59edc95383423 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24319) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24318)
[jira] [Closed] (HUDI-4732) Leverage Schema Registry for reading proto messages from kafka
[ https://issues.apache.org/jira/browse/HUDI-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-4732. --- Resolution: Fixed > Leverage Schema Registry for reading proto messages from kafka > -- > > Key: HUDI-4732 > URL: https://issues.apache.org/jira/browse/HUDI-4732 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Minor > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > If you use the Confluent Schema Registry, they provide a way to deserialize > the kafka message value without providing the protobuf class name. The first > cut of ProtoKafkaSource requires users to specify a classname but we want to > allow users the flexibility to use this other method of deserializing the > message. > > Docs: > https://docs.confluent.io/platform/current/schema-registry/serdes-develop/serdes-protobuf.html -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7739) Shudown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy
[ https://issues.apache.org/jira/browse/HUDI-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7739: Fix Version/s: 0.15.0 > Shudown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy > -- > > Key: HUDI-7739 > URL: https://issues.apache.org/jira/browse/HUDI-7739 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Xinyu Zou >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7699) Support STS external ids and configurable session names in the AWS StsAssumeRoleCredentialsProvider
[ https://issues.apache.org/jira/browse/HUDI-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7699: Fix Version/s: 0.15.0 > Support STS external ids and configurable session names in the AWS > StsAssumeRoleCredentialsProvider > --- > > Key: HUDI-7699 > URL: https://issues.apache.org/jira/browse/HUDI-7699 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Ian Streeter >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > [HUDI-6695|https://issues.apache.org/jira/browse/HUDI-6695] added an AWS > credentials provider to support assuming a role when syncing to Glue. > > We use Hudi in a multi-tenant environment, and our customers give us > delegated access to their Glue catalog. In this multi-tenant setup it is > important to use [an external > ID|https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html] > to improve security when assuming IAM roles. > > Furthermore, the STS session name is currently hard-coded to "hoodie". > It is helpful for us to have configurable session names so we have better > traceability of which entities are creating STS sessions in the cloud. > > Currently, the assumed role is configured with the > {{hoodie.aws.role.arn}} config property. I would like to add the following > extra optional config properties, which will be used by the > {{HoodieConfigAWSAssumedRoleCredentialsProvider}}: > > - {{hoodie.aws.role.external.id}} > - {{hoodie.aws.role.session.name}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
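A minimal sketch of how the three properties named above could be resolved, assuming they arrive in a `java.util.Properties` object. `AssumeRoleSettings` is a hypothetical class, not `HoodieConfigAWSAssumedRoleCredentialsProvider`; the session name falls back to `hoodie` (the value the issue says is currently hard-coded) and the external ID stays optional:

```java
import java.util.Optional;
import java.util.Properties;

/** Hypothetical holder for the assume-role settings proposed in HUDI-7699. */
final class AssumeRoleSettings {
  static final String ROLE_ARN_KEY = "hoodie.aws.role.arn";
  static final String EXTERNAL_ID_KEY = "hoodie.aws.role.external.id";
  static final String SESSION_NAME_KEY = "hoodie.aws.role.session.name";
  // Keeps today's hard-coded session name as the default for backward compatibility.
  static final String DEFAULT_SESSION_NAME = "hoodie";

  final String roleArn;
  final Optional<String> externalId;
  final String sessionName;

  private AssumeRoleSettings(String roleArn, Optional<String> externalId, String sessionName) {
    this.roleArn = roleArn;
    this.externalId = externalId;
    this.sessionName = sessionName;
  }

  static AssumeRoleSettings fromProperties(Properties props) {
    String roleArn = props.getProperty(ROLE_ARN_KEY);
    if (roleArn == null || roleArn.isEmpty()) {
      throw new IllegalArgumentException(ROLE_ARN_KEY + " must be set to assume a role");
    }
    // Only present when the tenant's trust policy requires an external ID.
    Optional<String> externalId =
        Optional.ofNullable(props.getProperty(EXTERNAL_ID_KEY)).filter(id -> !id.isEmpty());
    String sessionName = props.getProperty(SESSION_NAME_KEY, DEFAULT_SESSION_NAME);
    return new AssumeRoleSettings(roleArn, externalId, sessionName);
  }
}
```

The resolved values would then be handed to the STS AssumeRole call; the AWS SDK for Java v2 exposes them on its `AssumeRoleRequest` builder as `roleArn`, `externalId`, and `roleSessionName`.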
[jira] [Updated] (HUDI-7738) FileStreamReader need set Charset with UTF-8
[ https://issues.apache.org/jira/browse/HUDI-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7738: Fix Version/s: 0.15.0 > FileStreamReader need set Charset with UTF-8 > > > Key: HUDI-7738 > URL: https://issues.apache.org/jira/browse/HUDI-7738 > Project: Apache Hudi > Issue Type: Improvement > Components: cli >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > FileStreamReader need set Charset with UTF-8 -- This message was sent by Atlassian Jira (v8.20.10#820010)
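The issue does not say which reader in the CLI is affected, so the following is only a general Java sketch (the `Utf8Lines` helper is hypothetical, not the patch for HUDI-7738). On JDKs before 18, `java.io.FileReader` decodes with the platform default charset; passing `StandardCharsets.UTF_8` explicitly makes decoding independent of the host's locale settings:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: read a text file with an explicit UTF-8 charset. */
final class Utf8Lines {
  private Utf8Lines() {}

  static List<String> read(Path file) throws IOException {
    List<String> lines = new ArrayList<>();
    // Unlike new FileReader(file) on older JDKs, this never depends on the default charset.
    try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    }
    return lines;
  }
}
```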
[jira] [Updated] (HUDI-7737) Bump Spark 3.4 version to Spark 3.4.3
[ https://issues.apache.org/jira/browse/HUDI-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7737: Fix Version/s: 0.15.0 > Bump Spark 3.4 version to Spark 3.4.3 > - > > Key: HUDI-7737 > URL: https://issues.apache.org/jira/browse/HUDI-7737 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Geser Dugarov >Assignee: Geser Dugarov >Priority: Minor > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > Spark 3.4.3 has been released: https://github.com/apache/spark/tree/v3.4.3 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7715) Partition TTL for Flink
[ https://issues.apache.org/jira/browse/HUDI-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7715: Fix Version/s: 1.0.0 > Partition TTL for Flink > --- > > Key: HUDI-7715 > URL: https://issues.apache.org/jira/browse/HUDI-7715 > Project: Apache Hudi > Issue Type: Improvement >Reporter: xi chaomin >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7720) Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups
[ https://issues.apache.org/jira/browse/HUDI-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7720: Fix Version/s: 0.15.0 1.0.0 > Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups > - > > Key: HUDI-7720 > URL: https://issues.apache.org/jira/browse/HUDI-7720 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > Attachments: 1280X1280.PNG > > > Job aborted due to stage failure: Task 3 in stage 35.0 failed 4 times, most > recent failure: Lost task 3.3 in stage 35.0 (TID 32175) (10-222-33-34.lan > executor 204): java.lang.NullPointerException > at java.util.ArrayList.<init>(ArrayList.java:178) > at > org.apache.hudi.common.table.view.HoodieTableFileSystemView.fetchAllStoredFileGroups(HoodieTableFileSystemView.java:308) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroupsIncludingReplaced(AbstractTableFileSystemView.java:976) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.getReplacedFileGroupsBefore(AbstractTableFileSystemView.java:989) > at > org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:104) > at > org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getReplacedFileGroupsBefore(PriorityBasedFileSystemView.java:232) > at > org.apache.hudi.table.action.clean.CleanPlanner.getReplacedFilesEligibleToClean(CleanPlanner.java:441) > at > org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:330) > at > org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:295) > at > org.apache.hudi.table.action.clean.CleanPlanner.getDeletePaths(CleanPlanner.java:493) > at > org.apache.hudi.table.action.clean.CleanPlanActionExecutor.lambda$requestClean$af5da5d2$1(CleanPlanActionExecutor.java:122) > at > 
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at > scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) > at scala.collection.TraversableOnce.to(TraversableOnce.scala:366) > at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364) at > scala.collection.AbstractIterator.to(Iterator.scala:1431) at > scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358) at > scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431) > at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345) > at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339) at > scala.collection.AbstractIterator.toArray(Iterator.scala:1431) at > org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030) > at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at > org.apache.spark.scheduler.Task.run(Task.scala:131) at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1480) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- 
This message was sent by Atlassian Jira (v8.20.10#820010)
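The top frames show `java.util.ArrayList.<init>` receiving a null collection inside `fetchAllStoredFileGroups`. The sketch below uses hypothetical names, with a plain `Map` standing in for the file-system view's internal state, to reproduce that failure mode and one null-safe alternative; it is not the fix that was merged for HUDI-7720:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the NPE pattern in the stack trace above. */
final class NullSafeCopy {
  private NullSafeCopy() {}

  /** Mirrors the failing pattern: new ArrayList<>(null) throws NullPointerException. */
  static <K, V> List<V> unsafeCopy(Map<K, List<V>> byPartition, K partition) {
    return new ArrayList<>(byPartition.get(partition));
  }

  /** Treats a missing key, or a key mapped to null, as an empty list before copying. */
  static <K, V> List<V> safeCopy(Map<K, List<V>> byPartition, K partition) {
    List<V> values = byPartition.get(partition);
    return values == null ? new ArrayList<>() : new ArrayList<>(values);
  }
}
```

An explicit null check is used instead of `getOrDefault`, because `getOrDefault` still returns null when the key is present but explicitly mapped to null.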
[jira] [Updated] (HUDI-7721) Fix broken build on master
[ https://issues.apache.org/jira/browse/HUDI-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7721: Fix Version/s: 0.15.0 > Fix broken build on master > -- > > Key: HUDI-7721 > URL: https://issues.apache.org/jira/browse/HUDI-7721 > Project: Apache Hudi > Issue Type: Bug >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Critical > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > TestHoodieDeltaStreamer is invalid due to > [https://github.com/apache/hudi/pull/11099]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-7720) Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups
[ https://issues.apache.org/jira/browse/HUDI-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo closed HUDI-7720.
Resolution: Fixed

> Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups
>
> Key: HUDI-7720
> URL: https://issues.apache.org/jira/browse/HUDI-7720
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark-sql
> Reporter: xy
> Assignee: xy
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
> Attachments: 1280X1280.PNG
>
> Job aborted due to stage failure: Task 3 in stage 35.0 failed 4 times, most recent failure: Lost task 3.3 in stage 35.0 (TID 32175) (10-222-33-34.lan executor 204): java.lang.NullPointerException
>     at java.util.ArrayList.<init>(ArrayList.java:178)
>     at org.apache.hudi.common.table.view.HoodieTableFileSystemView.fetchAllStoredFileGroups(HoodieTableFileSystemView.java:308)
>     at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroupsIncludingReplaced(AbstractTableFileSystemView.java:976)
>     at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getReplacedFileGroupsBefore(AbstractTableFileSystemView.java:989)
>     at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:104)
>     at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getReplacedFileGroupsBefore(PriorityBasedFileSystemView.java:232)
>     at org.apache.hudi.table.action.clean.CleanPlanner.getReplacedFilesEligibleToClean(CleanPlanner.java:441)
>     at org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:330)
>     at org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:295)
>     at org.apache.hudi.table.action.clean.CleanPlanner.getDeletePaths(CleanPlanner.java:493)
>     at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.lambda$requestClean$af5da5d2$1(CleanPlanActionExecutor.java:122)
>     at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
>     at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
>     at scala.collection.Iterator.foreach(Iterator.scala:943)
>     at scala.collection.Iterator.foreach$(Iterator.scala:943)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>     at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>     at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
>     at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
>     at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
>     at scala.collection.AbstractIterator.to(Iterator.scala:1431)
>     at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
>     at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
>     at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
>     at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
>     at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
>     at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
>     at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
>     at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1480)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)

--
This message was sent by Atlassian Jira (v8.20.10#820010)
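The HUDI-7720 trace ends in the `java.util.ArrayList` copy constructor. That constructor throws `NullPointerException` when the collection passed to it is `null`. The minimal sketch below reproduces that failure mode and shows a null-safe alternative. It assumes, for illustration only, that the source collection comes from a map lookup that can miss. `FileGroupStoreSketch` and its methods are hypothetical stand-ins, not Hudi's `HoodieTableFileSystemView`, and the fix that actually landed for this issue may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-partition file-group index; not Hudi's HoodieTableFileSystemView.
final class FileGroupStoreSketch {
  private final Map<String, List<String>> fileGroupsByPartition = new HashMap<>();

  void put(String partition, List<String> fileGroupIds) {
    fileGroupsByPartition.put(partition, fileGroupIds);
  }

  // Fails like the trace: Map.get returns null for an absent key, and
  // new ArrayList<>(null) throws NullPointerException inside the constructor.
  List<String> fetchUnsafe(String partition) {
    return new ArrayList<>(fileGroupsByPartition.get(partition));
  }

  // Null-safe copy: an absent partition yields an empty list instead of an NPE.
  List<String> fetchSafe(String partition) {
    List<String> groups = fileGroupsByPartition.get(partition);
    return groups == null ? new ArrayList<>() : new ArrayList<>(groups);
  }
}
```

The defensive copy is kept on the non-null path so that callers still receive a snapshot they can modify without affecting the index.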
[jira] [Updated] (HUDI-7641) Add metrics to track what partitions are enabled in MDT
[ https://issues.apache.org/jira/browse/HUDI-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7641:
Fix Version/s: 1.0.0

> Add metrics to track what partitions are enabled in MDT
>
> Key: HUDI-7641
> URL: https://issues.apache.org/jira/browse/HUDI-7641
> Project: Apache Hudi
> Issue Type: Improvement
> Components: metadata
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
[jira] [Closed] (HUDI-7467) TestHoodieDeltaStreamer. testAutoGenerateRecordKeys
[ https://issues.apache.org/jira/browse/HUDI-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo closed HUDI-7467.
Resolution: Fixed

> TestHoodieDeltaStreamer. testAutoGenerateRecordKeys
>
> Key: HUDI-7467
> URL: https://issues.apache.org/jira/browse/HUDI-7467
> Project: Apache Hudi
> Issue Type: Bug
> Components: tests-ci
> Reporter: Lin Liu
> Assignee: tao pan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> This test is flaky and sometimes fails in Azure CI. We need to reproduce it
> locally and determine why it is flaky (whether a bug causes it or the test
> setup does).
> [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=22725&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0&t=9df7def4-004b-5fb7-f042-da5d723783ad&s=859b8d9a-8fd6-5a5c-6f5e-f84f1990894e]
> {code:java}
> [ERROR] Tests run: 131, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 2,459.289 s <<< FAILURE! - in org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer
> [ERROR] testAutoGenerateRecordKeys Time elapsed: 14.248 s <<< FAILURE!
> org.opentest4j.AssertionFailedError: expected: <300> but was: <500>
>     at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
>     at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
>     at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
>     at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
>     at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:611)
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerTestBase.assertRecordCount(HoodieDeltaStreamerTestBase.java:486)
>     at org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer.testAutoGenerateRecordKeys(TestHoodieDeltaStreamer.java:2823)
> {code}
[jira] [Updated] (HUDI-7467) TestHoodieDeltaStreamer. testAutoGenerateRecordKeys
[ https://issues.apache.org/jira/browse/HUDI-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7467:
Fix Version/s: 0.15.0, 1.0.0

> TestHoodieDeltaStreamer. testAutoGenerateRecordKeys
>
> Key: HUDI-7467
> URL: https://issues.apache.org/jira/browse/HUDI-7467
> Project: Apache Hudi
> Issue Type: Bug
> Components: tests-ci
> Reporter: Lin Liu
> Assignee: tao pan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> This test is flaky and sometimes fails in Azure CI. We need to reproduce it
> locally and determine why it is flaky (whether a bug causes it or the test
> setup does).
> [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=22725&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0&t=9df7def4-004b-5fb7-f042-da5d723783ad&s=859b8d9a-8fd6-5a5c-6f5e-f84f1990894e]
> {code:java}
> [ERROR] Tests run: 131, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 2,459.289 s <<< FAILURE! - in org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer
> [ERROR] testAutoGenerateRecordKeys Time elapsed: 14.248 s <<< FAILURE!
> org.opentest4j.AssertionFailedError: expected: <300> but was: <500>
>     at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
>     at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
>     at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
>     at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
>     at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:611)
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerTestBase.assertRecordCount(HoodieDeltaStreamerTestBase.java:486)
>     at org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer.testAutoGenerateRecordKeys(TestHoodieDeltaStreamer.java:2823)
> {code}
[jira] [Closed] (HUDI-7641) Add metrics to track what partitions are enabled in MDT
[ https://issues.apache.org/jira/browse/HUDI-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo closed HUDI-7641.
Resolution: Fixed

> Add metrics to track what partitions are enabled in MDT
>
> Key: HUDI-7641
> URL: https://issues.apache.org/jira/browse/HUDI-7641
> Project: Apache Hudi
> Issue Type: Improvement
> Components: metadata
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
[jira] [Updated] (HUDI-7710) BugFix: Remove compaction.inflight from conflict resolution
[ https://issues.apache.org/jira/browse/HUDI-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7710:
Fix Version/s: 0.15.0, 1.0.0

> BugFix: Remove compaction.inflight from conflict resolution
>
> Key: HUDI-7710
> URL: https://issues.apache.org/jira/browse/HUDI-7710
> Project: Apache Hudi
> Issue Type: Improvement
> Components: compaction
> Reporter: Lin Liu
> Assignee: Lin Liu
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> During conflict resolution, compaction.inflight instants can be picked up.
> Because they do not contain any plan information, this can cause an NPE.
[jira] [Closed] (HUDI-7688) Avoid always repeated inflate when encounter InterruptedIOException
[ https://issues.apache.org/jira/browse/HUDI-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo closed HUDI-7688.
Resolution: Fixed

> Avoid always repeated inflate when encounter InterruptedIOException
>
> Key: HUDI-7688
> URL: https://issues.apache.org/jira/browse/HUDI-7688
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Jing Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
> Attachments: image-2024-04-30-11-25-41-671.png, image-2024-04-30-11-27-59-572.png
>
> !image-2024-04-30-11-25-41-671.png!
> !image-2024-04-30-11-27-59-572.png!
> We should avoid repeatedly retrying the inflate when an InterruptedIOException
> is encountered.
[jira] [Updated] (HUDI-7688) Avoid always repeated inflate when encounter InterruptedIOException
[ https://issues.apache.org/jira/browse/HUDI-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7688:
Fix Version/s: 0.15.0, 1.0.0

> Avoid always repeated inflate when encounter InterruptedIOException
>
> Key: HUDI-7688
> URL: https://issues.apache.org/jira/browse/HUDI-7688
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Jing Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
> Attachments: image-2024-04-30-11-25-41-671.png, image-2024-04-30-11-27-59-572.png
>
> !image-2024-04-30-11-25-41-671.png!
> !image-2024-04-30-11-27-59-572.png!
> We should avoid repeatedly retrying the inflate when an InterruptedIOException
> is encountered.
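HUDI-7688 asks to stop retrying the inflate step on an `InterruptedIOException`. The behavior can be sketched as a retry helper that retries ordinary `IOException`s but gives up immediately on `InterruptedIOException`, restoring the thread's interrupt flag before rethrowing. `RetrySketch` and `IoAction` are hypothetical names. This is a sketch of the pattern, not the code from the Hudi fix.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Hypothetical retry helper illustrating the pattern; not Hudi code.
final class RetrySketch {
  // An I/O step that may be retried, e.g. inflating a compressed block.
  interface IoAction<T> {
    T run() throws IOException;
  }

  static <T> T retry(IoAction<T> action, int maxAttempts) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.run();
      } catch (InterruptedIOException e) {
        // An interrupt means "stop", not "transient failure": restore the
        // interrupt flag and propagate instead of retrying.
        Thread.currentThread().interrupt();
        throw e;
      } catch (IOException e) {
        lastFailure = e; // treated as transient: try again
      }
    }
    // Reached only after maxAttempts ordinary IOExceptions, so this is non-null.
    throw lastFailure;
  }
}
```

One design caveat: `java.net.SocketTimeoutException` is a subclass of `InterruptedIOException`, so this sketch would also stop retrying on socket timeouts. A real implementation may want to handle timeouts separately.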
[jira] [Updated] (HUDI-7667) Create util method to get offset range for fetching new data in KafkaSource
[ https://issues.apache.org/jira/browse/HUDI-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7667:
Fix Version/s: 0.15.0

> Create util method to get offset range for fetching new data in KafkaSource
>
> Key: HUDI-7667
> URL: https://issues.apache.org/jira/browse/HUDI-7667
> Project: Apache Hudi
> Issue Type: Wish
> Components: deltastreamer
> Reporter: Vinish Reddy
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
[jira] [Updated] (HUDI-7684) Sort the records for Flink metadata table bulk_insert
[ https://issues.apache.org/jira/browse/HUDI-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7684:
Fix Version/s: 0.15.0

> Sort the records for Flink metadata table bulk_insert
>
> Key: HUDI-7684
> URL: https://issues.apache.org/jira/browse/HUDI-7684
> Project: Apache Hudi
> Issue Type: Improvement
> Components: core
> Reporter: Danny Chen
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> The HFile writer requires its input to be sorted. Without the sort, re-enabling
> the MDT on an existing table could cause issues.
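The HUDI-7684 description states that HFile requires sorted input. The sketch below models that constraint with a hypothetical `OrderedKeyWriter` that rejects any key not strictly greater than the previous one, and then sorts record keys before writing them. It assumes unique `String` keys compared by natural order. A real HFile compares serialized key bytes, and Hudi's Flink bulk_insert path is not shown here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortedWriteSketch {
  // Hypothetical stand-in for an HFile-style writer: keys must arrive in ascending order.
  static final class OrderedKeyWriter {
    private final List<String> written = new ArrayList<>();
    private String lastKey;

    void append(String key) {
      if (lastKey != null && key.compareTo(lastKey) <= 0) {
        throw new IllegalStateException("Key " + key + " is not greater than previous key " + lastKey);
      }
      written.add(key);
      lastKey = key;
    }

    List<String> written() {
      return written;
    }
  }

  // Sorts the record keys first so that the writer sees them in order.
  static OrderedKeyWriter bulkInsertSorted(List<String> recordKeys) {
    List<String> sorted = new ArrayList<>(recordKeys);
    sorted.sort(Comparator.naturalOrder());
    OrderedKeyWriter writer = new OrderedKeyWriter();
    for (String key : sorted) {
      writer.append(key);
    }
    return writer;
  }
}
```

Feeding the same keys to the writer unsorted fails on the first out-of-order key. This mirrors the kind of issue the description anticipates when the MDT is re-enabled on an existing table.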
[jira] [Updated] (HUDI-7682) Remove the files copy in Azure CI tests report
[ https://issues.apache.org/jira/browse/HUDI-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7682:
Fix Version/s: 0.15.0

> Remove the files copy in Azure CI tests report
>
> Key: HUDI-7682
> URL: https://issues.apache.org/jira/browse/HUDI-7682
> Project: Apache Hudi
> Issue Type: Improvement
> Components: compile
> Reporter: Danny Chen
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
[jira] [Closed] (HUDI-7511) Offset range calculation in kafka should return all topic partitions
[ https://issues.apache.org/jira/browse/HUDI-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo closed HUDI-7511.
Resolution: Fixed

> Offset range calculation in kafka should return all topic partitions
>
> Key: HUDI-7511
> URL: https://issues.apache.org/jira/browse/HUDI-7511
> Project: Apache Hudi
> Issue Type: Bug
> Components: deltastreamer
> Reporter: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> After [https://github.com/apache/hudi/pull/10869] landed, we no longer return
> every topic partition in the final ranges. For checkpointing purposes, however,
> every Kafka topic partition must be present in the final ranges, even if
> nothing is consumed from it.
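The HUDI-7511 requirement can be shown with a simplified model. When a per-batch event budget is spread across partitions, a partition that receives no budget still appears with an empty range (`from == until`), so the checkpoint covers every partition. `OffsetRangeSketch`, `Range`, the greedy allocation in partition order, and the `topic,partition:offset` checkpoint string are all assumptions chosen for brevity. They are not Hudi's Kafka source code or its allocation policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of Kafka offset-range calculation; not Hudi's code.
final class OffsetRangeSketch {
  static final class Range {
    final int partition;
    final long fromOffset;  // inclusive
    final long untilOffset; // exclusive

    Range(int partition, long fromOffset, long untilOffset) {
      this.partition = partition;
      this.fromOffset = fromOffset;
      this.untilOffset = untilOffset;
    }

    long count() {
      return untilOffset - fromOffset;
    }
  }

  // Spreads at most maxEvents across partitions greedily in partition order.
  // A partition that gets no budget keeps an empty range instead of being dropped.
  static List<Range> computeRanges(Map<Integer, Long> fromOffsets,
                                   Map<Integer, Long> latestOffsets,
                                   long maxEvents) {
    List<Range> ranges = new ArrayList<>();
    long remaining = Math.max(0L, maxEvents);
    for (Map.Entry<Integer, Long> e : new TreeMap<>(fromOffsets).entrySet()) {
      int partition = e.getKey();
      long from = e.getValue();
      long available = Math.max(0L, latestOffsets.getOrDefault(partition, from) - from);
      long take = Math.min(available, remaining);
      remaining -= take;
      ranges.add(new Range(partition, from, from + take));
    }
    return ranges;
  }

  // Illustrative checkpoint string: "topic,partition:untilOffset,...".
  static String toCheckpoint(String topic, List<Range> ranges) {
    StringBuilder sb = new StringBuilder(topic);
    for (Range r : ranges) {
      sb.append(',').append(r.partition).append(':').append(r.untilOffset);
    }
    return sb.toString();
  }
}
```

In this model, partition 1 has new data but no remaining budget, so it keeps the range 50..50. If empty ranges were filtered out instead, the checkpoint would silently lose partition 1's position.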
[jira] [Updated] (HUDI-7658) Log time taken when meta sync fails in stream sync
[ https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7658:
Fix Version/s: 0.15.0, 1.0.0

> Log time taken when meta sync fails in stream sync
>
> Key: HUDI-7658
> URL: https://issues.apache.org/jira/browse/HUDI-7658
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer
> Reporter: Jonathan Vexler
> Assignee: Jonathan Vexler
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> The time taken is only printed in log statements on success, but it is useful
> to see it on failure as well.
[jira] [Closed] (HUDI-7658) Log time taken when meta sync fails in stream sync
[ https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo closed HUDI-7658.
Resolution: Fixed

> Log time taken when meta sync fails in stream sync
>
> Key: HUDI-7658
> URL: https://issues.apache.org/jira/browse/HUDI-7658
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer
> Reporter: Jonathan Vexler
> Assignee: Jonathan Vexler
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> The time taken is only printed in log statements on success, but it is useful
> to see it on failure as well.
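HUDI-7658 asks for the time taken to be logged on failure as well as on success. One way to do this is to capture a start time once and log the elapsed time both on the success path and in the catch block before rethrowing. `TimedSyncSketch` and its message format are hypothetical. The `Consumer<String>` logger stands in for a real logging framework so that the example has no dependencies. This is not the code from the Hudi change.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical timing wrapper for a meta sync step; not Hudi's StreamSync code.
final class TimedSyncSketch {
  static void runTimed(String syncName, Runnable sync, Consumer<String> log) {
    long start = System.nanoTime();
    try {
      sync.run();
      log.accept("Meta sync " + syncName + " succeeded in " + elapsedMs(start) + " ms");
    } catch (RuntimeException e) {
      // Log the elapsed time on failure too, then rethrow so that callers still see the error.
      log.accept("Meta sync " + syncName + " failed after " + elapsedMs(start) + " ms: " + e.getMessage());
      throw e;
    }
  }

  private static long elapsedMs(long startNanos) {
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
  }
}
```

`System.nanoTime()` is used rather than wall-clock time because it is monotonic and therefore suited to measuring durations.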