[ https://issues.apache.org/jira/browse/HUDI-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen resolved HUDI-4397.
------------------------------

> Flink Inline Cluster and Compact plan distribute strategy changed from
> rebalance to hash to avoid potential multiple threads accessing the same file
> ----------------------------------------------------------------------------
>
>                 Key: HUDI-4397
>                 URL: https://issues.apache.org/jira/browse/HUDI-4397
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: yuemeng
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, the Flink inline cluster and compact plans are distributed with the
> rebalance strategy. When a compact operation fails, it is rolled back and
> executed again later. The rebalance strategy may then send the same file to a
> different thread, so the file is accessed by multiple threads concurrently,
> e.g. by a failed-rollback thread and a normal compact thread. This causes the
> following error:
> {code:java}
> writing record HoodieRecord{key=HoodieKey
> { recordKey=a:100 partitionPath=2022-06-30/18}
> , currentLocation='null', newLocation='null'}
> java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
>     at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:217) ~[?:?]
>     at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:209) ~[?:?]
>     at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:407) ~[?:?]
>     at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:184) ~[?:?]
> {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
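To illustrate the difference the issue describes, here is a minimal, Flink-free sketch in plain Java. It is not the actual Hudi/Flink code: the helper names (`hashPartition`, `rebalancePartition`) and the parallelism value are hypothetical, chosen only to show why hash (key-based) routing pins all plans for one file to the same subtask while round-robin rebalance routing depends on arrival order and can split them across threads.

{code:java}
import java.util.List;

public class PlanDistributionSketch {

    // Hypothetical stand-in for key-based (hash) distribution: the target
    // subtask is a pure function of the file id, so a retried plan for the
    // same file always lands on the same subtask as the original.
    static int hashPartition(String fileId, int parallelism) {
        return Math.abs(fileId.hashCode() % parallelism);
    }

    // Hypothetical stand-in for rebalance (round-robin) distribution: the
    // target subtask depends only on arrival order, so a retried plan for
    // the same file can land on a different subtask/thread.
    static int rebalancePartition(int arrivalIndex, int parallelism) {
        return arrivalIndex % parallelism;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        // Plan for file-A, plan for file-B, then file-A retried after rollback.
        List<String> plans = List.of("file-A", "file-B", "file-A");

        boolean hashSame = hashPartition(plans.get(0), parallelism)
                == hashPartition(plans.get(2), parallelism);
        boolean rebalanceSame = rebalancePartition(0, parallelism)
                == rebalancePartition(2, parallelism);

        // Hash routing sends both file-A plans to the same subtask.
        System.out.println("hash: file-A retry on same subtask = " + hashSame);
        // Rebalance sends them to subtasks 0 and 2 -- two different threads
        // may then write the same file concurrently.
        System.out.println("rebalance: file-A retry on same subtask = " + rebalanceSame);
    }
}
{code}

With hash distribution the retried plan and any in-flight work for `file-A` serialize on one subtask; with rebalance they may run concurrently, which matches the multi-threaded file access described in the issue.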