[ https://issues.apache.org/jira/browse/HUDI-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenfengLiu reassigned HUDI-4350:
---------------------------------

    Assignee: chenfengLiu

> reduce the shuffle work when we just insert but not update and delete
> ---------------------------------------------------------------------
>
>                 Key: HUDI-4350
>                 URL: https://issues.apache.org/jira/browse/HUDI-4350
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: flink
>            Reporter: chenfengLiu
>            Assignee: chenfengLiu
>            Priority: Major
>
> As discussed in https://issues.apache.org/jira/browse/HUDI-4338,
> more shuffle work causes network overhead and a risk of data skew.
> So when we build the Flink data stream that writes to Hudi, the original
> plan can be improved on this point.
> Currently, if we want to update or delete records, we need to load the
> index first, then send the index records and the hoodie records to the
> Bucket Assign Operator.
> The Bucket Assign Operator builds the index state used to assign a bucket
> to each incoming record.
> If we only insert new records and do not update or delete, we don't need
> this work, such as building the index and repartitioning the existing
> records.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)