[ https://issues.apache.org/jira/browse/HUDI-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen resolved HUDI-4397.
------------------------------

> Flink Inline Cluster and Compact plan distribute strategy changed from
> rebalance to hash to avoid potential multiple threads accessing the same file
> ----------------------------------------------------------------------------
>
>                 Key: HUDI-4397
>                 URL: https://issues.apache.org/jira/browse/HUDI-4397
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: yuemeng
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, the Flink inline cluster and compact plans are distributed with the
> rebalance strategy. When a compact operation fails, it is rolled back and
> executed again later. The rebalance strategy may then send the same file to a
> different thread, so the file is accessed by multiple threads concurrently,
> e.g. by a failed-rollback thread and a normal compact thread. This causes the
> following error:
> {code:java}
> writing record HoodieRecord{key=HoodieKey
> { recordKey=a:100 partitionPath=2022-06-30/18}
> , currentLocation='null', newLocation='null'}
> java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
>     at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:217) ~[?:?]
>     at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:209) ~[?:?]
>     at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:407) ~[?:?]
>     at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:184) ~[?:?]
> {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
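To illustrate the difference the issue describes, here is a minimal, Flink-free sketch in plain Java. It is not the actual Hudi/Flink code: the helper names (`hashPartition`, `rebalancePartition`) and the parallelism value are hypothetical, chosen only to show why hash (key-based) routing pins all plans for one file to the same subtask while round-robin rebalance routing depends on arrival order and can split them across threads.

{code:java}
import java.util.List;

public class PlanDistributionSketch {

    // Hypothetical stand-in for key-based (hash) distribution: the target
    // subtask is a pure function of the file id, so a retried plan for the
    // same file always lands on the same subtask as the original.
    static int hashPartition(String fileId, int parallelism) {
        return Math.abs(fileId.hashCode() % parallelism);
    }

    // Hypothetical stand-in for rebalance (round-robin) distribution: the
    // target subtask depends only on arrival order, so a retried plan for
    // the same file can land on a different subtask/thread.
    static int rebalancePartition(int arrivalIndex, int parallelism) {
        return arrivalIndex % parallelism;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        // Plan for file-A, plan for file-B, then file-A retried after rollback.
        List<String> plans = List.of("file-A", "file-B", "file-A");

        boolean hashSame = hashPartition(plans.get(0), parallelism)
                == hashPartition(plans.get(2), parallelism);
        boolean rebalanceSame = rebalancePartition(0, parallelism)
                == rebalancePartition(2, parallelism);

        // Hash routing sends both file-A plans to the same subtask.
        System.out.println("hash: file-A retry on same subtask = " + hashSame);
        // Rebalance sends them to subtasks 0 and 2 -- two different threads
        // may then write the same file concurrently.
        System.out.println("rebalance: file-A retry on same subtask = " + rebalanceSame);
    }
}
{code}

With hash distribution the retried plan and any in-flight work for `file-A` serialize on one subtask; with rebalance they may run concurrently, which matches the multi-threaded file access described in the issue.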