[jira] [Updated] (HUDI-6221) Fix flink online clustering exception when using complex type.
[ https://issues.apache.org/jira/browse/HUDI-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Lin updated HUDI-6221: --- Description: When using Flink 1.13.6 and Hudi 0.13.0 in COW + append + clustering mode, if the field list contains a map type and an async clustering job is scheduled, the job throws an exception: {quote}The requested schema is not compatible with the file schema. incompatible types: required binary key (STRING) != optional binary key (STRING) {quote} The root cause is that [HUDI-3378|https://github.com/apache/hudi/pull/7345] changed the parquet reader. The new parquet reader is compatible with Spark but not fully compatible with Flink, because the Flink parquet schema differs from the Spark parquet schema. We will make two patches: the first fixes this bug in 0.13.x; the second fixes the schema difference between Flink parquet and Spark parquet. was: When using flink 1.13.6 and hudi 0.13.0 cow + append + clustering mode, if the field list contains map type and aysnc clustering job scheduled, will throw exception: The requested schema is not compatible with the file schema. incompatible types: required binary key (STRING) != optional binary key (STRING) Root reason is [HUDI-3378|https://github.com/apache/hudi/pull/7345] change parquet reader. The latest parquet reader is compatible with spark but not fully compatible with flink due to flink parquet schema is different from spark parquet schema. We will make two patch, the first patch fix this bug in 0.13.x. The last patch fix diff schema between flink parquet and spark parquet. > Fix flink online clustering exception when using complex type. > -- > > Key: HUDI-6221 > URL: https://issues.apache.org/jira/browse/HUDI-6221 > Project: Apache Hudi > Issue Type: Bug > Components: flink > Reporter: Ying Lin > Priority: Major > > When using Flink 1.13.6 and Hudi 0.13.0 in COW + append + clustering mode, if the field list contains a map type and an async clustering job is scheduled, the job throws an exception: > {quote}The requested schema is not compatible with the file schema. incompatible types: required binary key (STRING) != optional binary key (STRING){quote} > The root cause is that [HUDI-3378|https://github.com/apache/hudi/pull/7345] changed the parquet reader. The new parquet reader is compatible with Spark but not fully compatible with Flink, because the Flink parquet schema differs from the Spark parquet schema. > We will make two patches: the first fixes this bug in 0.13.x; the second fixes the schema difference between Flink parquet and Spark parquet. -- This message was sent by Atlassian Jira (v8.20.10#820010)
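For context, the failure above can be reproduced with a small Flink job against a COW table whose schema contains a MAP field and which has async clustering scheduled. The sketch below is a minimal, assumed setup rather than the reporter's exact job: the table path is hypothetical, and the clustering option keys should be verified against the FlinkOptions of the Hudi 0.13.0 build in use.

{code:java}
// Minimal reproduction sketch (assumed setup, not taken from the ticket).
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class MapTypeClusteringRepro {
  public static void main(String[] args) {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

    // COW table in append mode with async clustering; 'attrs' is the map field that trips
    // the parquet schema check once the clustering job rewrites the files.
    tEnv.executeSql(
        "CREATE TABLE hudi_map_tbl (\n"
            + "  id STRING,\n"
            + "  attrs MAP<STRING, STRING>,\n"
            + "  ts TIMESTAMP(3)\n"
            + ") WITH (\n"
            + "  'connector' = 'hudi',\n"
            + "  'path' = 'file:///tmp/hudi_map_tbl',\n"       // hypothetical path
            + "  'table.type' = 'COPY_ON_WRITE',\n"
            + "  'write.operation' = 'insert',\n"               // append mode
            + "  'clustering.schedule.enabled' = 'true',\n"     // option keys assumed; verify
            + "  'clustering.async.enabled' = 'true'\n"         // against the FlinkOptions in use
            + ")");

    // After enough commits accumulate, the scheduled clustering plan rewrites the parquet
    // files; reading them back through the changed reader is where
    // "required binary key (STRING) != optional binary key (STRING)" surfaces.
    tEnv.executeSql(
        "INSERT INTO hudi_map_tbl VALUES ('k1', MAP['a', 'b'], TIMESTAMP '2023-05-01 00:00:00')");
  }
}
{code}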
[jira] [Created] (HUDI-6221) Fix flink online clustering exception when using complex type.
Ying Lin created HUDI-6221: -- Summary: Fix flink online clustering exception when using complex type. Key: HUDI-6221 URL: https://issues.apache.org/jira/browse/HUDI-6221 Project: Apache Hudi Issue Type: Bug Components: flink Reporter: Ying Lin When using Flink 1.13.6 and Hudi 0.13.0 in COW + append + clustering mode, if the field list contains a map type and an async clustering job is scheduled, the job throws an exception: The requested schema is not compatible with the file schema. incompatible types: required binary key (STRING) != optional binary key (STRING) The root cause is that [HUDI-3378|https://github.com/apache/hudi/pull/7345] changed the parquet reader. The new parquet reader is compatible with Spark but not fully compatible with Flink, because the Flink parquet schema differs from the Spark parquet schema. We will make two patches: the first fixes this bug in 0.13.x; the second fixes the schema difference between Flink parquet and Spark parquet. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-6188) Unify the logic of intra-partition upsert and cross-partition upsert in flink state index.
Ying Lin created HUDI-6188: -- Summary: Unify the logic of intra-partition upsert and cross-partition upsert in flink state index. Key: HUDI-6188 URL: https://issues.apache.org/jira/browse/HUDI-6188 Project: Apache Hudi Issue Type: Improvement Components: index Reporter: Ying Lin Assignee: Ying Lin Currently, for an intra-partition upsert, the {{precombine.field}} parameter determines which record is kept: the record with the largest value wins. This is widely used to handle out-of-order data, by setting {{precombine.field}} to the event time so that the record with the largest event time is kept. However, when using the FLINK_STATE index type, the {{precombine.field}} parameter does not fully take effect once a record changes partition: for cross-partition upserts, the current logic keeps whichever record arrives later, even if its event time is smaller. It may be necessary to unify the logic of intra-partition and cross-partition upserts, which would make the behavior easier for users to understand and use. -- This message was sent by Atlassian Jira (v8.20.10#820010)
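To make the inconsistency concrete, here is a small self-contained Java sketch (not Hudi code; the class and field names are illustrative) contrasting the intra-partition rule, where the larger {{precombine.field}} value wins, with the cross-partition behavior under FLINK_STATE, where the later-arriving record wins regardless.

{code:java}
// Illustrative sketch only; it mimics the two combine rules described in the issue.
import java.util.HashMap;
import java.util.Map;

public class PrecombineSemanticsSketch {

  static class RecordLike {
    final String key;
    final String partition;
    final long eventTime; // stands in for the precombine.field value
    RecordLike(String key, String partition, long eventTime) {
      this.key = key;
      this.partition = partition;
      this.eventTime = eventTime;
    }
  }

  // Intra-partition rule: the record with the larger precombine value wins.
  static RecordLike combineSamePartition(RecordLike oldRec, RecordLike newRec) {
    return newRec.eventTime >= oldRec.eventTime ? newRec : oldRec;
  }

  // Cross-partition behavior described in the issue: the later-arriving record wins
  // unconditionally, even when its precombine value is smaller.
  static RecordLike combineCrossPartition(RecordLike oldRec, RecordLike newRec) {
    return newRec;
  }

  public static void main(String[] args) {
    Map<String, RecordLike> index = new HashMap<>();
    index.put("k1", new RecordLike("k1", "2023-05-01", 100L));

    // A late-arriving record with a smaller event time that also moves to another partition.
    RecordLike late = new RecordLike("k1", "2023-04-30", 50L);
    RecordLike stored = index.get("k1");

    RecordLike kept = stored.partition.equals(late.partition)
        ? combineSamePartition(stored, late)    // precombine honored
        : combineCrossPartition(stored, late);  // precombine ignored today
    index.put("k1", kept);

    // Prints eventTime=50: the cross-partition path drops the record with the larger event
    // time, which is the inconsistency this issue proposes to align with the intra-partition rule.
    System.out.println("kept eventTime=" + kept.eventTime + " partition=" + kept.partition);
  }
}
{code}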
[jira] [Updated] (HUDI-5380) Change table path but table location in metastore will not change after hive-sync.
[ https://issues.apache.org/jira/browse/HUDI-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Lin updated HUDI-5380: --- Summary: Change table path but table location in metastore will not change after hive-sync. (was: If we synchronize an existing table, the location of the table will not change.) > Change table path but table location in metastore will not change after hive-sync. > -- > > Key: HUDI-5380 > URL: https://issues.apache.org/jira/browse/HUDI-5380 > Project: Apache Hudi > Issue Type: Bug > Components: hive, meta-sync > Reporter: Ying Lin > Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5380) If we synchronize an existing table, the location of the table will not change.
Ying Lin created HUDI-5380: -- Summary: If we synchronize an existing table, the location of the table will not change. Key: HUDI-5380 URL: https://issues.apache.org/jira/browse/HUDI-5380 Project: Apache Hudi Issue Type: Bug Components: hive, meta-sync Reporter: Ying Lin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4590) Add hudi-aws dependency to hudi-flink-bundle.
[ https://issues.apache.org/jira/browse/HUDI-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Lin updated HUDI-4590: --- Description: Before 0.11.1, hudi-flink-bundle shaded hudi-aws because hudi-client-common included hudi-aws. But after HUDI-3193, we need to shade it manually. was: Before 0.11.1, hudi-flink-bundle shade hudi-aws because hudi-client-common included hudi-aws. But after HUDI-3193,we need to shade it mannual. > Add hudi-aws dependency to hudi-flink-bundle. > - > > Key: HUDI-4590 > URL: https://issues.apache.org/jira/browse/HUDI-4590 > Project: Apache Hudi > Issue Type: Bug > Components: flink > Reporter: Ying Lin > Assignee: Ying Lin > Priority: Major > > Before 0.11.1, hudi-flink-bundle shaded hudi-aws because hudi-client-common included hudi-aws. > But after HUDI-3193, we need to shade it manually. -- This message was sent by Atlassian Jira (v8.20.10#820010)
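A minimal sketch of the kind of change this implies in packaging/hudi-flink-bundle/pom.xml: declare hudi-aws as an explicit dependency and add it to the shaded artifact set. The fragment below is illustrative only; the exact include patterns and any required relocations should be checked against the bundle's existing maven-shade-plugin configuration.

{code:xml}
<!-- Illustrative fragment for packaging/hudi-flink-bundle/pom.xml (not the actual patch). -->

<!-- 1. Declare hudi-aws explicitly, since hudi-client-common no longer pulls it in. -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-aws</artifactId>
  <version>${project.version}</version>
</dependency>

<!-- 2. Add it to the maven-shade-plugin artifact set so it lands in the bundle jar. -->
<artifactSet>
  <includes>
    <include>org.apache.hudi:hudi-aws</include>
    <!-- ...existing includes... -->
  </includes>
</artifactSet>
{code}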
[jira] [Reopened] (HUDI-4590) Add hudi-aws dependency to hudi-flink-bundle.
[ https://issues.apache.org/jira/browse/HUDI-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Lin reopened HUDI-4590: Assignee: Ying Lin > Add hudi-aws dependency to hudi-flink-bundle. > - > > Key: HUDI-4590 > URL: https://issues.apache.org/jira/browse/HUDI-4590 > Project: Apache Hudi > Issue Type: Bug > Components: flink > Reporter: Ying Lin > Assignee: Ying Lin > Priority: Major > > Before 0.11.1, hudi-flink-bundle shaded hudi-aws because hudi-client-common included hudi-aws. > But after HUDI-3193, we need to shade it manually. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-4590) Add hudi-aws dependency to hudi-flink-bundle.
[ https://issues.apache.org/jira/browse/HUDI-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Lin closed HUDI-4590. -- Resolution: Not A Problem > Add hudi-aws dependency to hudi-flink-bundle. > - > > Key: HUDI-4590 > URL: https://issues.apache.org/jira/browse/HUDI-4590 > Project: Apache Hudi > Issue Type: Bug > Components: flink > Reporter: Ying Lin > Priority: Major > > Before 0.11.1, hudi-flink-bundle shaded hudi-aws because hudi-client-common included hudi-aws. > But after HUDI-3193, we need to shade it manually. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4590) Add hudi-aws dependency to hudi-flink-bundle.
Ying Lin created HUDI-4590: -- Summary: Add hudi-aws dependency to hudi-flink-bundle. Key: HUDI-4590 URL: https://issues.apache.org/jira/browse/HUDI-4590 Project: Apache Hudi Issue Type: Bug Components: flink Reporter: Ying Lin Before 0.11.1, hudi-flink-bundle shaded hudi-aws because hudi-client-common included hudi-aws. But after HUDI-3193, we need to shade it manually. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4570) Fix hive sync path error due to reuse of storage descriptors.
[ https://issues.apache.org/jira/browse/HUDI-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Lin updated HUDI-4570: --- Summary: Fix hive sync path error due to reuse of storage descriptors. (was: Fix the hive sync path error because of reuse storage descriptor.) > Fix hive sync path error due to reuse of storage descriptors. > - > > Key: HUDI-4570 > URL: https://issues.apache.org/jira/browse/HUDI-4570 > Project: Apache Hudi > Issue Type: Bug > Components: hive, meta-sync >Reporter: Ying Lin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4570) Fix the hive sync path error because of reuse storage descriptor.
Ying Lin created HUDI-4570: -- Summary: Fix the hive sync path error because of reuse storage descriptor. Key: HUDI-4570 URL: https://issues.apache.org/jira/browse/HUDI-4570 Project: Apache Hudi Issue Type: Bug Components: hive, meta-sync Reporter: Ying Lin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-4562) Hudi FlinkOptions support configure 'hoodie.index.type'
[ https://issues.apache.org/jira/browse/HUDI-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Lin closed HUDI-4562. -- Resolution: Not A Problem > Hudi FlinkOptions support configure 'hoodie.index.type' > --- > > Key: HUDI-4562 > URL: https://issues.apache.org/jira/browse/HUDI-4562 > Project: Apache Hudi > Issue Type: Improvement > Components: flink-sql >Reporter: Ying Lin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4562) Hudi FlinkOptions support configure 'hoodie.index.type'
Ying Lin created HUDI-4562: -- Summary: Hudi FlinkOptions support configure 'hoodie.index.type' Key: HUDI-4562 URL: https://issues.apache.org/jira/browse/HUDI-4562 Project: Apache Hudi Issue Type: Improvement Components: flink-sql Reporter: Ying Lin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4332) The current instant may be wrong under some extreme conditions in AppendWriteFunction.
Ying Lin created an issue Apache Hudi / HUDI-4332 The current instant may be wrong under some extreme conditions in AppendWriteFunction. Issue Type: Bug Assignee: Unassigned Components: flink Created: 28/Jun/22 01:55 Priority: Major Reporter: Ying Lin -- This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)
[jira] [Created] (HUDI-4314) Improve the performance of reading from the specified instant when the Flink streaming read application starts
Ying Lin created HUDI-4314: -- Summary: Improve the performance of reading from the specified instant when the Flink streaming read application starts Key: HUDI-4314 URL: https://issues.apache.org/jira/browse/HUDI-4314 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: Ying Lin When a Flink streaming read application starts, it begins reading from the specified instant (or resumes from the instant at which it was stopped). We need to filter out file paths that no longer exist, because some files may have been removed by the cleaner. The current implementation performs an _exists_ operation on all files; an optimized approach is to perform the _exists_ operation only on the latest version files. -- This message was sent by Atlassian Jira (v8.20.7#820007)
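A rough sketch of that optimization, written against plain Hadoop FileSystem calls rather than Hudi's internal file-system view; the grouping by file group and the class/field names are illustrative assumptions, not the actual implementation.

{code:java}
// Sketch: probe existence only for the latest version per file group, since older
// versions may already have been removed by the cleaner.
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LatestVersionExistenceFilter {

  /** A (fileGroupId, commitTime, path) triple as it might appear in a commit's write stats. */
  static class BaseFileRef {
    final String fileGroupId;
    final String commitTime;
    final Path path;
    BaseFileRef(String fileGroupId, String commitTime, Path path) {
      this.fileGroupId = fileGroupId;
      this.commitTime = commitTime;
      this.path = path;
    }
  }

  /**
   * Instead of calling exists() on every referenced file, keep only the latest version per
   * file group and probe the file system for those.
   */
  static List<Path> existingLatestFiles(FileSystem fs, List<BaseFileRef> refs) throws IOException {
    Map<String, BaseFileRef> latestPerGroup = new HashMap<>();
    for (BaseFileRef ref : refs) {
      latestPerGroup.merge(ref.fileGroupId, ref,
          (a, b) -> a.commitTime.compareTo(b.commitTime) >= 0 ? a : b);
    }
    List<Path> existing = new ArrayList<>();
    for (BaseFileRef ref : latestPerGroup.values()) {
      if (fs.exists(ref.path)) { // one exists() per file group instead of per file version
        existing.add(ref.path);
      }
    }
    return existing;
  }
}
{code}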
[jira] [Created] (HUDI-4172) Fix the compatibility of flink1.13 when using complex data type.
Ying Lin created HUDI-4172: -- Summary: Fix the compatibility of flink1.13 when using complex data type. Key: HUDI-4172 URL: https://issues.apache.org/jira/browse/HUDI-4172 Project: Apache Hudi Issue Type: Bug Components: flink Reporter: Ying Lin Hudi 0.11.0 is compatible with `flink1.13` and `flink1.14` and supports complex data types. But when using `flink-1.13`, there are class conflicts because the same fully qualified names exist in both `hudi-flink1.13` and `flink-table-runtime`, so we cannot guarantee which class the classloader will load. For example, `org.apache.flink.table.data.ColumnarMapData` exists in both `hudi-flink1.13` and `flink-table-runtime`, but the Flink application's classloader may load the `ColumnarMapData` from `flink-table-runtime`. So we need to shade these classes into a different package when using the `flink1.13` profile. -- This message was sent by Atlassian Jira (v8.20.7#820007)
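One way to express that shading, sketched as a maven-shade-plugin relocation; the target package `org.apache.hudi.table.data` is an illustrative choice, and the real fix may instead copy or rename the classes directly, so treat this only as an example of the technique.

{code:xml}
<!-- Illustrative maven-shade-plugin relocation for the flink1.13 profile (not the actual patch). -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <!-- Move the conflicting columnar data class out of the Flink namespace so the
             classloader cannot confuse it with flink-table-runtime's copy. -->
        <pattern>org.apache.flink.table.data.ColumnarMapData</pattern>
        <shadedPattern>org.apache.hudi.table.data.ColumnarMapData</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
{code}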
[jira] [Resolved] (HUDI-4083) Fix the flink application fails to start due to uncompleted archiving deletion
[ https://issues.apache.org/jira/browse/HUDI-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Lin resolved HUDI-4083. > Fix the flink application fails to start due to uncompleted archiving deletion > -- > > Key: HUDI-4083 > URL: https://issues.apache.org/jira/browse/HUDI-4083 > Project: Apache Hudi > Issue Type: Bug > Components: archiving > Reporter: Ying Lin > Priority: Critical > Labels: pull-request-available > > Suppose a Flink application crashes while archiving; it leaves behind some instant files that should have been deleted. > If the commit file is deleted but the in-flight file is left, then when the Flink application starts, it will scan for the last pending instant, find it, and throw an exception. > So we need to delete the non-commit files first, and then delete the commit files. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Reopened] (HUDI-4083) Fix the flink application fails to start due to uncompleted archiving deletion
[ https://issues.apache.org/jira/browse/HUDI-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Lin reopened HUDI-4083: > Fix the flink application fails to start due to uncompleted archiving deletion > -- > > Key: HUDI-4083 > URL: https://issues.apache.org/jira/browse/HUDI-4083 > Project: Apache Hudi > Issue Type: Bug > Components: archiving > Reporter: Ying Lin > Priority: Critical > > Suppose a Flink application crashes while archiving; it leaves behind some instant files that should have been deleted. > If the commit file is deleted but the in-flight file is left, then when the Flink application starts, it will scan for the last pending instant, find it, and throw an exception. > So we need to delete the non-commit files first, and then delete the commit files. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HUDI-4083) Fix the flink application fails to start due to uncompleted archiving deletion
[ https://issues.apache.org/jira/browse/HUDI-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying Lin resolved HUDI-4083. > Fix the flink application fails to start due to uncompleted archiving deletion > -- > > Key: HUDI-4083 > URL: https://issues.apache.org/jira/browse/HUDI-4083 > Project: Apache Hudi > Issue Type: Bug > Components: archiving > Reporter: Ying Lin > Priority: Critical > > Suppose a Flink application crashes while archiving; it leaves behind some instant files that should have been deleted. > If the commit file is deleted but the in-flight file is left, then when the Flink application starts, it will scan for the last pending instant, find it, and throw an exception. > So we need to delete the non-commit files first, and then delete the commit files. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HUDI-4083) Fix the flink application fails to start due to uncompleted archiving deletion
Ying Lin created HUDI-4083: -- Summary: Fix the flink application fails to start due to uncompleted archiving deletion Key: HUDI-4083 URL: https://issues.apache.org/jira/browse/HUDI-4083 Project: Apache Hudi Issue Type: Bug Components: archiving Reporter: Ying Lin Suppose a Flink application crashes while archiving; it leaves behind some instant files that should have been deleted. If the commit file is deleted but the in-flight file is left, then when the Flink application starts, it will scan for the last pending instant, find it, and throw an exception. So we need to delete the non-commit files first, and then delete the commit files. -- This message was sent by Atlassian Jira (v8.20.7#820007)
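A minimal sketch of that deletion ordering, using plain Hadoop FileSystem calls; the class name, the file-suffix checks, and the way instant files are passed in are illustrative assumptions rather than the actual archival code.

{code:java}
// Sketch: when removing the instant files covered by an archive batch, delete the pending
// (.inflight / .requested) files before the completed commit file, so a crash in the middle
// never leaves a dangling pending instant without its completed counterpart.
import java.io.IOException;
import java.util.Comparator;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ArchivedInstantCleaner {

  /** Completed commit files sort last so pending files are always removed first. */
  static int deletionOrder(Path p) {
    String name = p.getName();
    return (name.endsWith(".inflight") || name.endsWith(".requested")) ? 0 : 1;
  }

  static void deleteArchivedInstantFiles(FileSystem fs, List<Path> instantFiles) {
    instantFiles.stream()
        .sorted(Comparator.comparingInt(ArchivedInstantCleaner::deletionOrder))
        .forEachOrdered(p -> {
          try {
            fs.delete(p, false); // non-recursive: these are single metadata files
          } catch (IOException e) {
            throw new RuntimeException("Failed to delete archived instant file " + p, e);
          }
        });
  }
}
{code}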