[ https://issues.apache.org/jira/browse/HUDI-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan closed HUDI-4044.
-------------------------------------
    Resolution: Fixed

> When reading data from flink-hudi to external storage, the result is incorrect
> ------------------------------------------------------------------------------
>
>                 Key: HUDI-4044
>                 URL: https://issues.apache.org/jira/browse/HUDI-4044
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>    Affects Versions: 0.11.0
>            Reporter: yanxiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.1, 0.12.0
>
>
> When reading data from flink-hudi to external storage, the result is
> incorrect because of a concurrency issue.
>
> Here's the case:
>
> There is one split_monitor task that polls the timeline for changes every
> N seconds. There are four split_reader tasks that process the changed data
> and sink it to external storage.
>
> (1) First, split_monitor detects the change in Instance1; the corresponding
> fileId is log1. split_monitor distributes the fileId information in
> rebalance mode, and it goes to split_reader task 1 for processing.
>
> (2) Then, split_monitor detects the change in Instance2. The corresponding
> fileId is again log1 (assuming that the changed data has the same primary
> key). This time, rebalance mode sends the fileId information to
> split_reader task 2 for processing.
>
> (3) split_reader task 1 and split_reader task 2 now process data with the
> same primary key, but at different speeds. As a result, the order in which
> the data is written to external storage is not guaranteed. The data
> modified earlier can overwrite the data modified later, producing incorrect
> results.
>
>
> Solution:
> After the split_monitor task detects data changes, it distributes them to
> the split_reader tasks by hashing the fileId instead of using rebalance
> mode. This ensures that files with the same fileId are always processed by
> the same split_reader task, which solves the problem.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
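
The difference between the two distribution modes described in the issue can be illustrated with a small sketch. This is not the actual patch: the class and method names below (SplitRouting, rebalanceTarget, hashTarget) are hypothetical, and the sketch only assumes that rebalance mode behaves like round-robin over the readers while the fix routes by a hash of the fileId.

```java
// Hypothetical illustration only; these names do not come from the Hudi code base.
final class SplitRouting {

    /** Round-robin (rebalance-style) routing: the target depends only on arrival order. */
    static int rebalanceTarget(long sequenceNumber, int numReaders) {
        return (int) (sequenceNumber % numReaders);
    }

    /** Hash routing: the target depends only on the fileId, so it is stable across instants. */
    static int hashTarget(String fileId, int numReaders) {
        // floorMod keeps the result in [0, numReaders) even for negative hash codes.
        return Math.floorMod(fileId.hashCode(), numReaders);
    }

    public static void main(String[] args) {
        int numReaders = 4;
        // Two successive changes that both touch the same fileId, as in steps (1) and (2).
        String[] fileIds = {"log1", "log1"};
        for (int i = 0; i < fileIds.length; i++) {
            System.out.printf(
                "split %d (fileId=%s): rebalance -> reader %d, hash -> reader %d%n",
                i, fileIds[i],
                rebalanceTarget(i, numReaders),
                hashTarget(fileIds[i], numReaders));
        }
    }
}
```

Under round-robin, the two splits for log1 land on different readers, so their relative order at the sink depends on reader speed. Under hash routing, both go to the same reader and are processed in the order they were emitted.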