[ 
https://issues.apache.org/jira/browse/HUDI-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan closed HUDI-4044.
-------------------------------------
    Resolution: Fixed

> When reading data from flink-hudi to external storage, the result is incorrect
> ------------------------------------------------------------------------------
>
>                 Key: HUDI-4044
>                 URL: https://issues.apache.org/jira/browse/HUDI-4044
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>    Affects Versions: 0.11.0
>            Reporter: yanxiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.1, 0.12.0
>
>
> When data is read from flink-hudi and written to external storage, the 
> result is incorrect because of a concurrency issue.
>  
> Here's the case:
>  
> There is a split_monitor task that polls the timeline for changes every 
> N seconds, and there are four split_reader tasks that process the changed 
> data and sink it to external storage:
>  
> (1) First, split_monitor detects a change in Instant1, and the corresponding 
> fileId is log1. split_monitor distributes the fileId information to 
> split_reader task 1 in rebalance mode for processing.
>  
> (2) Then split_monitor detects a change in Instant2. The corresponding 
> fileId is again log1 (assuming that the changed data has the same primary 
> key). The split_monitor task distributes the fileId information to 
> split_reader task 2 in rebalance mode for processing.
>  
> (3) split_reader task 1 and split_reader task 2 process data with the same 
> primary key, but at different speeds. As a result, the order in which the 
> data is written to external storage is not guaranteed. Data modified earlier 
> can overwrite data modified later, resulting in incorrect data.
>  
>  
> Solution:
> After the split_monitor task detects data changes, it distributes them to 
> the split_reader tasks by hashing on the fileId. This ensures that files 
> with the same fileId are always processed by the same split_reader task, 
> which solves this problem.
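
The difference between the two distribution modes described above can be
sketched in plain Java. This is only an illustration under stated
assumptions, not the Hudi or Flink code: the class name
SplitAssignmentSketch and the helper methods rebalanceAssign and hashAssign
are hypothetical, and reader tasks are numbered from 0 here.

```java
// Hypothetical sketch: how a split for a given fileId could be routed to one
// of N reader tasks under round-robin ("rebalance") versus hash distribution.
class SplitAssignmentSketch {

    // Rebalance: the target reader depends only on the order in which the
    // split was emitted, so two splits for the same fileId can land on
    // different readers.
    static int rebalanceAssign(int splitSequenceNumber, int numReaders) {
        return splitSequenceNumber % numReaders;
    }

    // Hash: the target reader depends only on the fileId, so every split for
    // that fileId lands on the same reader and is processed in emit order.
    static int hashAssign(String fileId, int numReaders) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(fileId.hashCode(), numReaders);
    }

    public static void main(String[] args) {
        int numReaders = 4;
        // Two consecutive timeline changes that both touch fileId "log1".
        String[] fileIds = {"log1", "log1"};

        for (int i = 0; i < fileIds.length; i++) {
            System.out.printf(
                "change %d, fileId %s: rebalance -> reader %d, hash -> reader %d%n",
                i + 1, fileIds[i],
                rebalanceAssign(i, numReaders),
                hashAssign(fileIds[i], numReaders));
        }
    }
}
```

With rebalance, the two changes to log1 go to two different readers, so
their relative write order depends on reader speed. With hash distribution,
both go to one reader, which handles them in the order they were emitted.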



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
