[jira] [Comment Edited] (HUDI-7033) Fix read error for schema evolution + partition value extraction
    [ https://issues.apache.org/jira/browse/HUDI-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859598#comment-17859598 ]

Geser Dugarov edited comment on HUDI-7033 at 6/24/24 7:50 AM:
--------------------------------------------------------------

Merged a4fa3451916de11dc082792076b62013586dadaf in linked MR 9994 refers to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889]


was (Author: JIRAUSER301110):
Merged a4fa3451916de11dc082792076b62013586dadaf refers to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889]

> Fix read error for schema evolution + partition value extraction
> ----------------------------------------------------------------
>
>                 Key: HUDI-7033
>                 URL: https://issues.apache.org/jira/browse/HUDI-7033
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Priority: Major
>              Labels: pull-request-available
>
> After HUDI-6960 is merged, 
> *shouldExtractPartitionValuesFromPartitionPath* will correctly ignore 
> partition columns in requiredSchema.
>  
> When using the configs below, there will be read errors.
>  
> {code:java}
> hoodie.datasource.read.extract.partition.values.from.path = true {code}
>  
> When the config above is added together with:
>  
> {code:java}
> hoodie.schema.on.read.enable = true {code}
>  
> the query schema will be pruned so that it does *NOT* contain any partition 
> columns.
>  
> When rebuilding parquet filters, the file schema's columns are scanned against 
> querySchema. However, Hudi files (file schema) might still contain partition 
> columns. When partition filters are rebuilt with this file schema against 
> the query schema, the partition columns are not found in the query schema, 
> which raises the error below.
>  
> {code:java}
> Caused by: java.lang.IllegalArgumentException: cannot found filter col name:region from querySchema: table {
>   5: id: optional int
>   6: name: optional string
>   7: ts: optional long
> }
>     at org.apache.hudi.internal.schema.utils.InternalSchemaUtils.reBuildFilterName(InternalSchemaUtils.java:180)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Comment Edited] (HUDI-7033) Fix read error for schema evolution + partition value extraction
    [ https://issues.apache.org/jira/browse/HUDI-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859598#comment-17859598 ]

Geser Dugarov edited comment on HUDI-7033 at 6/24/24 7:47 AM:
--------------------------------------------------------------

Merged a4fa3451916de11dc082792076b62013586dadaf refers to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889]


was (Author: JIRAUSER301110):
Merged a4fa3451916de11dc082792076b62013586dadaf refer to [non-merged MR 9889|https://github.com/apache/hudi/pull/9889]
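
The failure described in HUDI-7033 can be modelled in a few lines of plain Java. The sketch below is not Hudi code and does not claim to reproduce the merged fix: the class FilterRebuildSketch and both methods are hypothetical names, schemas are reduced to sets of column names, and the exception message is reworded. It only shows why a strict lookup of filter columns against a pruned query schema fails for a partition column such as region, and one possible way to tolerate it (skipping filters on partition columns that the query schema no longer carries), which is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the read error; all names here are hypothetical.
final class FilterRebuildSketch {

    // Strict rebuild: every filter column must be present in the query schema.
    static List<String> rebuildStrict(List<String> filterCols, Set<String> querySchemaCols) {
        List<String> rebuilt = new ArrayList<>();
        for (String col : filterCols) {
            if (!querySchemaCols.contains(col)) {
                throw new IllegalArgumentException(
                    "filter column " + col + " not found in query schema " + querySchemaCols);
            }
            rebuilt.add(col);
        }
        return rebuilt;
    }

    // Tolerant rebuild (an assumption, not necessarily what Hudi does):
    // drop filters on partition columns that were pruned from the query schema,
    // since their values come from the partition path rather than the file.
    static List<String> rebuildSkippingPartitionCols(List<String> filterCols,
                                                     Set<String> querySchemaCols,
                                                     Set<String> partitionCols) {
        List<String> kept = new ArrayList<>();
        for (String col : filterCols) {
            boolean prunedPartitionCol =
                partitionCols.contains(col) && !querySchemaCols.contains(col);
            if (!prunedPartitionCol) {
                kept.add(col);
            }
        }
        // Remaining columns still go through the strict check.
        return rebuildStrict(kept, querySchemaCols);
    }

    public static void main(String[] args) {
        // Query schema after pruning: the partition column "region" is gone.
        Set<String> querySchema = new LinkedHashSet<>(Arrays.asList("id", "name", "ts"));
        Set<String> partitionCols = Collections.singleton("region");
        List<String> filterCols = Arrays.asList("region", "id");

        try {
            rebuildStrict(filterCols, querySchema);
        } catch (IllegalArgumentException e) {
            System.out.println("strict rebuild failed: " + e.getMessage());
        }
        System.out.println("tolerant rebuild kept: "
            + rebuildSkippingPartitionCols(filterCols, querySchema, partitionCols));
    }
}
```

In this model the strict variant fails on region exactly because the pruned query schema holds only id, name and ts, while the tolerant variant keeps the filter on id. A non-partition column missing from the query schema still raises the exception in both variants.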