[jira] [Updated] (HUDI-6221) Fix flink online clustering exception when using complex type.

2023-05-16 Thread Ying Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Lin updated HUDI-6221:
---
Description: 
When using Flink 1.13.6 and Hudi 0.13.0 in COW + append + clustering mode, if 
the field list contains a map type and an async clustering job is scheduled, 
the following exception is thrown: 
{quote}The requested schema is not compatible with the file schema. 
incompatible types: required binary key (STRING) != optional binary key (STRING)
{quote}
The root cause is that [HUDI-3378|https://github.com/apache/hudi/pull/7345] 
changed the parquet reader. The latest parquet reader is compatible with Spark 
but not fully compatible with Flink, because the Flink parquet schema differs 
from the Spark parquet schema.

We will make two patches: the first fixes this bug in 0.13.x; the second fixes 
the schema difference between Flink parquet and Spark parquet.

  was:
When using Flink 1.13.6 and Hudi 0.13.0 in COW + append + clustering mode, if 
the field list contains a map type and an async clustering job is scheduled, 
the following exception is thrown: 
The requested schema is not compatible with the file schema. incompatible 
types: required binary key (STRING) != optional binary key (STRING)
The root cause is that [HUDI-3378|https://github.com/apache/hudi/pull/7345] 
changed the parquet reader. The latest parquet reader is compatible with Spark 
but not fully compatible with Flink, because the Flink parquet schema differs 
from the Spark parquet schema.

We will make two patches: the first fixes this bug in 0.13.x; the second fixes 
the schema difference between Flink parquet and Spark parquet.


> Fix flink online clustering exception when using complex type.
> --
>
> Key: HUDI-6221
> URL: https://issues.apache.org/jira/browse/HUDI-6221
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Ying Lin
>Priority: Major
>
> When using Flink 1.13.6 and Hudi 0.13.0 in COW + append + clustering mode, if 
> the field list contains a map type and an async clustering job is scheduled, 
> the following exception is thrown: 
> {quote}The requested schema is not compatible with the file schema. 
> incompatible types: required binary key (STRING) != optional binary key 
> (STRING)
> {quote}
> The root cause is that [HUDI-3378|https://github.com/apache/hudi/pull/7345] 
> changed the parquet reader. The latest parquet reader is compatible with 
> Spark but not fully compatible with Flink, because the Flink parquet schema 
> differs from the Spark parquet schema.
> We will make two patches: the first fixes this bug in 0.13.x; the second 
> fixes the schema difference between Flink parquet and Spark parquet.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6221) Fix flink online clustering exception when using complex type.

2023-05-16 Thread Ying Lin (Jira)
Ying Lin created HUDI-6221:
--

 Summary: Fix flink online clustering exception when using complex 
type.
 Key: HUDI-6221
 URL: https://issues.apache.org/jira/browse/HUDI-6221
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Reporter: Ying Lin


When using Flink 1.13.6 and Hudi 0.13.0 in COW + append + clustering mode, if 
the field list contains a map type and an async clustering job is scheduled, 
the following exception is thrown: 
The requested schema is not compatible with the file schema. incompatible 
types: required binary key (STRING) != optional binary key (STRING)
The root cause is that [HUDI-3378|https://github.com/apache/hudi/pull/7345] 
changed the parquet reader. The latest parquet reader is compatible with Spark 
but not fully compatible with Flink, because the Flink parquet schema differs 
from the Spark parquet schema.

We will make two patches: the first fixes this bug in 0.13.x; the second fixes 
the schema difference between Flink parquet and Spark parquet.
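To make the mismatch concrete, here is an illustrative sketch (not Hudi or parquet-mr code; all names are invented for illustration) of why a repetition difference on a map's key field makes two otherwise identical schemas "incompatible" under a strict check like the one in the error above:

```python
# Each field is modeled as (name, physical_type, logical_type, repetition).
def compatible(requested, actual):
    """Strict, repetition-sensitive schema check, as in the reported error."""
    if len(requested) != len(actual):
        return False
    return all(r == a for r, a in zip(requested, actual))

# Schema the clustering reader requests (Spark-style: map key is required).
requested = [("key", "binary", "STRING", "required"),
             ("value", "binary", "STRING", "optional")]

# Schema found in the file written by the Flink writer (key marked optional).
actual = [("key", "binary", "STRING", "optional"),
          ("value", "binary", "STRING", "optional")]

assert not compatible(requested, actual)  # -> the reported exception
```

Only the repetition of `key` differs; the physical and logical types match, which is why the error message prints the same `binary key (STRING)` on both sides.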



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6188) Unify the logic of intra-partition upsert and cross-partition upsert in flink state index.

2023-05-08 Thread Ying Lin (Jira)
Ying Lin created HUDI-6188:
--

 Summary: Unify the logic of intra-partition upsert and 
cross-partition upsert in flink state index.
 Key: HUDI-6188
 URL: https://issues.apache.org/jira/browse/HUDI-6188
 Project: Apache Hudi
  Issue Type: Improvement
  Components: index
Reporter: Ying Lin
Assignee: Ying Lin


Currently, during an upsert within a partition, the record with the largest 
{{precombine.field}} value is kept after upserting.

This is widely used to handle out-of-order data: by setting 
{{precombine.field}} to the event time, the record with the largest event time 
is kept.

However, when using the FLINK_STATE index type, if a cross-partition update 
occurs, the {{precombine.field}} parameter does not fully take effect.

In the cross-partition case, the current logic keeps the record that arrives 
later, even if its event time is smaller.

It may be necessary to unify the logic of intra-partition upsert and 
cross-partition upsert, which would make the behavior easier for users to 
understand and rely on.
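The two merge behaviors described above can be sketched as follows (a minimal illustration, not Hudi APIs; function and field names are invented):

```python
def merge_intra_partition(existing, incoming, precombine_field="ts"):
    # Within a partition: the record with the larger precombine value wins.
    if incoming[precombine_field] >= existing[precombine_field]:
        return incoming
    return existing

def merge_cross_partition(existing, incoming, precombine_field="ts"):
    # Current FLINK_STATE cross-partition behavior: the later-arriving
    # record wins, even if its event time (precombine value) is smaller.
    return incoming

existing = {"key": "k1", "ts": 10}   # already-written record
incoming = {"key": "k1", "ts": 5}    # arrives later, but older event time

assert merge_intra_partition(existing, incoming)["ts"] == 10  # precombine wins
assert merge_cross_partition(existing, incoming)["ts"] == 5   # arrival wins
```

Unifying the two would mean `merge_cross_partition` also comparing the precombine values, so out-of-order data is handled identically in both cases.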



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5380) Change table path but table location in metastore will not change after hive-sync.

2022-12-13 Thread Ying Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Lin updated HUDI-5380:
---
Summary: Change table path but table location in metastore will not change 
after hive-sync.  (was: If we synchronize an existing table, the location of 
the table will not change.)

> Change table path but table location in metastore will not change after 
> hive-sync.
> --
>
> Key: HUDI-5380
> URL: https://issues.apache.org/jira/browse/HUDI-5380
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive, meta-sync
>Reporter: Ying Lin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5380) If we synchronize an existing table, the location of the table will not change.

2022-12-13 Thread Ying Lin (Jira)
Ying Lin created HUDI-5380:
--

 Summary: If we synchronize an existing table, the location of the 
table will not change.
 Key: HUDI-5380
 URL: https://issues.apache.org/jira/browse/HUDI-5380
 Project: Apache Hudi
  Issue Type: Bug
  Components: hive, meta-sync
Reporter: Ying Lin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4590) Add hudi-aws dependency to hudi-flink-bundle.

2022-08-10 Thread Ying Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Lin updated HUDI-4590:
---
Description: 
Before 0.11.1, hudi-flink-bundle shaded hudi-aws because hudi-client-common 
included hudi-aws.

But after HUDI-3193, we need to shade it manually.

  was:
Before 0.11.1, hudi-flink-bundle shaded hudi-aws because hudi-client-common 
included hudi-aws.

But after HUDI-3193, we need to shade it manually.


> Add hudi-aws dependency to hudi-flink-bundle.
> -
>
> Key: HUDI-4590
> URL: https://issues.apache.org/jira/browse/HUDI-4590
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Ying Lin
>Assignee: Ying Lin
>Priority: Major
>
> Before 0.11.1, hudi-flink-bundle shaded hudi-aws because hudi-client-common 
> included hudi-aws.
> But after HUDI-3193, we need to shade it manually.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HUDI-4590) Add hudi-aws dependency to hudi-flink-bundle.

2022-08-10 Thread Ying Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Lin reopened HUDI-4590:

  Assignee: Ying Lin

> Add hudi-aws dependency to hudi-flink-bundle.
> -
>
> Key: HUDI-4590
> URL: https://issues.apache.org/jira/browse/HUDI-4590
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Ying Lin
>Assignee: Ying Lin
>Priority: Major
>
> Before 0.11.1, hudi-flink-bundle shaded hudi-aws because hudi-client-common 
> included hudi-aws.
> But after HUDI-3193, we need to shade it manually.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-4590) Add hudi-aws dependency to hudi-flink-bundle.

2022-08-10 Thread Ying Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Lin closed HUDI-4590.
--
Resolution: Not A Problem

> Add hudi-aws dependency to hudi-flink-bundle.
> -
>
> Key: HUDI-4590
> URL: https://issues.apache.org/jira/browse/HUDI-4590
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Ying Lin
>Priority: Major
>
> Before 0.11.1, hudi-flink-bundle shaded hudi-aws because hudi-client-common 
> included hudi-aws.
> But after HUDI-3193, we need to shade it manually.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4590) Add hudi-aws dependency to hudi-flink-bundle.

2022-08-10 Thread Ying Lin (Jira)
Ying Lin created HUDI-4590:
--

 Summary: Add hudi-aws dependency to hudi-flink-bundle.
 Key: HUDI-4590
 URL: https://issues.apache.org/jira/browse/HUDI-4590
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Reporter: Ying Lin


Before 0.11.1, hudi-flink-bundle shaded hudi-aws because hudi-client-common 
included hudi-aws.

But after HUDI-3193, we need to shade it manually.
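Shading it manually would look roughly like the following pom fragments for the hudi-flink-bundle module (a hypothetical sketch; the exact coordinates and the surrounding maven-shade-plugin configuration are assumed, not taken from the Hudi build):

```xml
<!-- 1) Declare the dependency explicitly, since hudi-client-common no
     longer pulls it in after HUDI-3193 (coordinates assumed). -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-aws</artifactId>
  <version>${project.version}</version>
</dependency>

<!-- 2) Add it to the maven-shade-plugin artifact set so it ends up
     inside the bundle jar. -->
<artifactSet>
  <includes>
    <include>org.apache.hudi:hudi-aws</include>
  </includes>
</artifactSet>
```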



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4570) Fix hive sync path error due to reuse of storage descriptors.

2022-08-08 Thread Ying Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Lin updated HUDI-4570:
---
Summary: Fix hive sync path error due to reuse of storage descriptors.  
(was: Fix the hive sync path error because of reuse storage descriptor.)

> Fix hive sync path error due to reuse of storage descriptors.
> -
>
> Key: HUDI-4570
> URL: https://issues.apache.org/jira/browse/HUDI-4570
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive, meta-sync
>Reporter: Ying Lin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4570) Fix the hive sync path error because of reuse storage descriptor.

2022-08-08 Thread Ying Lin (Jira)
Ying Lin created HUDI-4570:
--

 Summary: Fix the hive sync path error because of reuse storage 
descriptor.
 Key: HUDI-4570
 URL: https://issues.apache.org/jira/browse/HUDI-4570
 Project: Apache Hudi
  Issue Type: Bug
  Components: hive, meta-sync
Reporter: Ying Lin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-4562) Hudi FlinkOptions support configure 'hoodie.index.type'

2022-08-07 Thread Ying Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Lin closed HUDI-4562.
--
Resolution: Not A Problem

> Hudi FlinkOptions support configure 'hoodie.index.type'
> ---
>
> Key: HUDI-4562
> URL: https://issues.apache.org/jira/browse/HUDI-4562
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink-sql
>Reporter: Ying Lin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4562) Hudi FlinkOptions support configure 'hoodie.index.type'

2022-08-07 Thread Ying Lin (Jira)
Ying Lin created HUDI-4562:
--

 Summary: Hudi FlinkOptions support configure 'hoodie.index.type'
 Key: HUDI-4562
 URL: https://issues.apache.org/jira/browse/HUDI-4562
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink-sql
Reporter: Ying Lin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4332) The current instant may be wrong under some extreme conditions in AppendWriteFunction.

2022-06-27 Thread Ying Lin (Jira)
Ying Lin created an issue

Apache Hudi / HUDI-4332

The current instant may be wrong under some extreme conditions in AppendWriteFunction.

Issue Type: Bug
Assignee: Unassigned
Components: flink
Created: 28/Jun/22 01:55
Priority: Major
Reporter: Ying Lin

--
This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)

[jira] [Created] (HUDI-4314) Improve the performance of reading from the specified instant when the Flink streaming read application starts

2022-06-24 Thread Ying Lin (Jira)
Ying Lin created HUDI-4314:
--

 Summary: Improve the performance of reading from the specified 
instant when the Flink streaming read application starts
 Key: HUDI-4314
 URL: https://issues.apache.org/jira/browse/HUDI-4314
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Ying Lin


When a Flink streaming read application starts, it begins reading from the 
specified instant (or resumes from the instant at which it was stopped).

We need to filter out the file paths that do not exist, since some files may 
have been cleaned by the cleaner.

The current implementation performs an _exists_ check on all files; an 
optimized approach is to perform the _exists_ check only on the latest version 
files.
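The optimization can be sketched like this (an illustrative model, not Hudi code; the grouping of file versions is assumed):

```python
def filter_existing_naive(paths, exists):
    # Current approach: one exists() call per file path.
    return [p for p in paths if exists(p)]

def filter_existing_optimized(file_groups, exists):
    # Optimized: probe only the latest version in each file group, since
    # the cleaner removes older versions first.
    result = []
    for versions in file_groups:      # each list sorted oldest -> newest
        latest = versions[-1]
        if exists(latest):
            result.append(latest)
    return result

present = {"fg1/v2", "fg2/v1"}        # files actually on storage
groups = [["fg1/v1", "fg1/v2"], ["fg2/v1"], ["fg3/v1"]]

# 3 exists() probes instead of 4, same surviving latest files.
assert filter_existing_optimized(groups, present.__contains__) == ["fg1/v2", "fg2/v1"]
```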

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HUDI-4172) Fix the compatibility of flink1.13 when using complex data type.

2022-05-31 Thread Ying Lin (Jira)
Ying Lin created HUDI-4172:
--

 Summary: Fix the compatibility of flink1.13 when using complex 
data type.
 Key: HUDI-4172
 URL: https://issues.apache.org/jira/browse/HUDI-4172
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Reporter: Ying Lin


Hudi 0.11.0 is compatible with both `flink1.13` and `flink1.14` and provides 
complex data types.

But when using `flink-1.13`, there are class conflicts because 
`hudi-flink1.13` and `flink-table-runtime` contain classes with the same fully 
qualified names, so there is no guarantee which class the classloader will 
load.

For example, `org.apache.flink.table.data.ColumnarMapData` exists in both 
`hudi-flink1.13` and `flink-table-runtime`, but the Flink application's 
classloader may load the `ColumnarMapData` from `flink-table-runtime`.

So we need to shade these classes into a different package when using the 
`flink1.13` profile.
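Such shading is typically done with a maven-shade-plugin relocation; a hypothetical sketch (the shaded package name is invented, and the real configuration would list every conflicting package):

```xml
<!-- Move the bundled copies of the conflicting classes into a
     Hudi-specific package, so the classloader can no longer confuse
     them with the ones in flink-table-runtime. -->
<relocation>
  <pattern>org.apache.flink.table.data</pattern>
  <shadedPattern>org.apache.hudi.org.apache.flink.table.data</shadedPattern>
</relocation>
```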



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HUDI-4083) Fix the flink application fails to start due to uncompleted archiving deletion

2022-05-25 Thread Ying Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Lin resolved HUDI-4083.


> Fix the flink application fails to start due to uncompleted archiving deletion
> --
>
> Key: HUDI-4083
> URL: https://issues.apache.org/jira/browse/HUDI-4083
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: archiving
>Reporter: Ying Lin
>Priority: Critical
>  Labels: pull-request-available
>
> Suppose a Flink application crashes while archiving; it may leave behind 
> instant files that should have been deleted.
> If the commit file is deleted but the in-flight file is left, when the Flink 
> application starts, it will scan for the last pending instant, find the 
> orphaned in-flight file, and throw an exception.
> So we need to delete the non-commit files first, and then delete the commit 
> file.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Reopened] (HUDI-4083) Fix the flink application fails to start due to uncompleted archiving deletion

2022-05-11 Thread Ying Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Lin reopened HUDI-4083:


> Fix the flink application fails to start due to uncompleted archiving deletion
> --
>
> Key: HUDI-4083
> URL: https://issues.apache.org/jira/browse/HUDI-4083
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: archiving
>Reporter: Ying Lin
>Priority: Critical
>
> Suppose a Flink application crashes while archiving; it may leave behind 
> instant files that should have been deleted.
> If the commit file is deleted but the in-flight file is left, when the Flink 
> application starts, it will scan for the last pending instant, find the 
> orphaned in-flight file, and throw an exception.
> So we need to delete the non-commit files first, and then delete the commit 
> file.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HUDI-4083) Fix the flink application fails to start due to uncompleted archiving deletion

2022-05-11 Thread Ying Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Lin resolved HUDI-4083.


> Fix the flink application fails to start due to uncompleted archiving deletion
> --
>
> Key: HUDI-4083
> URL: https://issues.apache.org/jira/browse/HUDI-4083
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: archiving
>Reporter: Ying Lin
>Priority: Critical
>
> Suppose a Flink application crashes while archiving; it may leave behind 
> instant files that should have been deleted.
> If the commit file is deleted but the in-flight file is left, when the Flink 
> application starts, it will scan for the last pending instant, find the 
> orphaned in-flight file, and throw an exception.
> So we need to delete the non-commit files first, and then delete the commit 
> file.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HUDI-4083) Fix the flink application fails to start due to uncompleted archiving deletion

2022-05-11 Thread Ying Lin (Jira)
Ying Lin created HUDI-4083:
--

 Summary: Fix the flink application fails to start due to 
uncompleted archiving deletion
 Key: HUDI-4083
 URL: https://issues.apache.org/jira/browse/HUDI-4083
 Project: Apache Hudi
  Issue Type: Bug
  Components: archiving
Reporter: Ying Lin


Suppose a Flink application crashes while archiving; it may leave behind 
instant files that should have been deleted.

If the commit file is deleted but the in-flight file is left, when the Flink 
application starts, it will scan for the last pending instant, find the 
orphaned in-flight file, and throw an exception.

So we need to delete the non-commit files first, and then delete the commit 
file.
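The ordering fix can be sketched as follows (an illustrative model, not Hudi code; the instant-file suffixes are assumed to follow Hudi's timeline naming):

```python
def archive_delete(instant_files, delete):
    # Delete the non-commit instant files (requested/inflight) before the
    # commit file: if a crash interrupts us, the surviving state never has
    # an in-flight file without its commit, so startup scanning is safe.
    non_commit = [f for f in instant_files if not f.endswith(".commit")]
    commits = [f for f in instant_files if f.endswith(".commit")]
    for f in non_commit + commits:   # commit file is always deleted last
        delete(f)

deleted = []
archive_delete(["001.inflight", "001.requested", "001.commit"], deleted.append)
assert deleted[-1] == "001.commit"   # commit file removed last
```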



--
This message was sent by Atlassian Jira
(v8.20.7#820007)