[ https://issues.apache.org/jira/browse/HUDI-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yue Zhang updated HUDI-5651:
----------------------------
    Fix Version/s:     (was: 0.13.1)

> sort the inputs by record keys for bulk insert tasks
> ----------------------------------------------------
>
>                 Key: HUDI-5651
>                 URL: https://issues.apache.org/jira/browse/HUDI-5651
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: vortual
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>
> Add an option to BulkInsert to sort the input by primary key (record key):
> WRITE_BULK_INSERT_SORT_INPUT_BY_RECORD_KEY
> The advantage of sorting the data by primary key is that Flink needs to
> scan fewer files when new data is written later, and memory usage is
> also reduced.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
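The claim that sorted bulk-insert output means "fewer files to scan" can be illustrated with a small simulation. This is a minimal sketch, not Hudi or Flink code: it assumes that later writes locate existing records by pruning files on their min/max record-key range (one plausible mechanism, not confirmed by the issue), and the names `DataFile`, `bulk_insert` and `candidate_files` are hypothetical. The `sort_by_key` flag only mimics the intent of WRITE_BULK_INSERT_SORT_INPUT_BY_RECORD_KEY.

```python
import random
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical base file that remembers the min/max record key it holds."""
    min_key: str
    max_key: str
    keys: list


def bulk_insert(records, file_size, sort_by_key):
    """Split (key, payload) records into files of at most file_size records.

    When sort_by_key is True the input is sorted by record key first,
    mimicking the intent of WRITE_BULK_INSERT_SORT_INPUT_BY_RECORD_KEY.
    The caller's list is never mutated.
    """
    if sort_by_key:
        records = sorted(records, key=lambda r: r[0])
    files = []
    for start in range(0, len(records), file_size):
        chunk = records[start:start + file_size]
        keys = [k for k, _ in chunk]
        files.append(DataFile(min(keys), max(keys), keys))
    return files


def candidate_files(files, key):
    """Files whose [min_key, max_key] range may contain key (assumed pruning rule)."""
    return [f for f in files if f.min_key <= key <= f.max_key]


def main():
    rng = random.Random(42)
    # Zero-padded keys so that string order matches numeric order.
    records = [(f"key-{i:04d}", i) for i in range(1000)]
    rng.shuffle(records)
    for sort_by_key in (False, True):
        files = bulk_insert(records, file_size=100, sort_by_key=sort_by_key)
        hits = len(candidate_files(files, "key-0500"))
        print(f"sort_by_key={sort_by_key}: {hits} of {len(files)} files to scan")


if __name__ == "__main__":
    main()
```

With sorted input every file covers a disjoint key range, so a lookup of "key-0500" touches exactly one of the ten files. With shuffled input each file's range spans most of the key space, so range pruning excludes few or none of them.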