[ https://issues.apache.org/jira/browse/HUDI-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenfengLiu reassigned HUDI-4350:
---------------------------------

    Assignee: chenfengLiu

> reduce the shuffle work when we just insert but not update and delete
> ---------------------------------------------------------------------
>
>                 Key: HUDI-4350
>                 URL: https://issues.apache.org/jira/browse/HUDI-4350
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: flink
>            Reporter: chenfengLiu
>            Assignee: chenfengLiu
>            Priority: Major
>
> As discussed in https://issues.apache.org/jira/browse/HUDI-4338, more
> shuffle work adds network overhead and increases the risk of data skew.
> The original plan we build for the Flink data stream that writes to Hudi
> can be improved on this point.
> Currently, to update or delete records, we must first load the index, then
> send the index records and the Hoodie records to the Bucket Assign Operator.
> The Bucket Assign Operator builds the index state it uses to assign a bucket
> to each incoming record.
> If we only insert new records and never update or delete, we don't need this
> work, such as building the index and repartitioning the existing records.
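The proposal can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not Hudi's actual Flink classes: the `Operation` enum, the stage names in `planStages`, and the `BucketAssigner` with its `bucketCapacity` are all assumptions made for illustration. The `HashMap` stands in for the Flink keyed index state. It shows why updates and deletes need the index load and the shuffle by record key (a known key must be routed to the file that already holds it), and why a pure insert can skip both.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the HUDI-4350 idea; names are illustrative, not Hudi APIs.
public class InsertOnlyPipelineSketch {

    // Simplified stand-in for the write operation types (assumption: only three modeled).
    enum Operation { INSERT, UPSERT, DELETE }

    // Returns hypothetical stage names describing the Flink job topology.
    static List<String> planStages(Operation op) {
        List<String> stages = new ArrayList<>();
        stages.add("source");
        if (op != Operation.INSERT) {
            // Updates/deletes must find the file holding each existing key, so the
            // existing index is loaded and all records are shuffled by record key.
            stages.add("index_bootstrap");
            stages.add("keyBy(recordKey)");
            stages.add("bucket_assign");
        } else {
            // Pure inserts never target an existing file: no index load, no key shuffle.
            stages.add("assign_new_bucket");
        }
        stages.add("write");
        return stages;
    }

    // Simplified bucket assignment; the map stands in for the keyed index state.
    static class BucketAssigner {
        private final Map<String, String> indexState = new HashMap<>();
        private final int bucketCapacity; // hypothetical records-per-new-file limit
        private int currentBucket = 0;
        private int currentCount = 0;

        BucketAssigner(int bucketCapacity) {
            this.bucketCapacity = bucketCapacity;
        }

        // Called for index records loaded from existing files during bootstrap.
        void loadIndex(String recordKey, String fileId) {
            indexState.put(recordKey, fileId);
        }

        // Known key: route the update to its existing file.
        // Unknown key: place the insert in the current new bucket and record it in the index.
        String assign(String recordKey) {
            String existing = indexState.get(recordKey);
            if (existing != null) {
                return existing;
            }
            if (currentCount >= bucketCapacity) {
                currentBucket++;
                currentCount = 0;
            }
            currentCount++;
            String fileId = "new-file-" + currentBucket;
            indexState.put(recordKey, fileId);
            return fileId;
        }
    }

    public static void main(String[] args) {
        System.out.println(planStages(Operation.UPSERT));
        System.out.println(planStages(Operation.INSERT));

        BucketAssigner assigner = new BucketAssigner(2);
        assigner.loadIndex("key-1", "file-A");
        System.out.println(assigner.assign("key-1")); // existing key: update goes to file-A
        System.out.println(assigner.assign("key-2")); // new key: insert goes to new-file-0
    }
}
```

In the INSERT branch every record is new by definition, so no lookup against existing files is needed and each writer subtask can fill its own new files locally. That removes both the index bootstrap and the network shuffle discussed above.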



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
