[ https://issues.apache.org/jira/browse/HUDI-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen updated HUDI-5651:
-----------------------------
    Sprint: 0.13.0 Final Sprint 3

> sort the inputs by record keys for bulk insert tasks
> ----------------------------------------------------
>
>                 Key: HUDI-5651
>                 URL: https://issues.apache.org/jira/browse/HUDI-5651
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: vortual
>            Assignee: vortual
>            Priority: Major
>              Labels: pull-request-available
>
> Bulk insert adds an option to sort the input by primary key:
> WRITE_BULK_INSERT_SORT_INPUT_BY_RECORD_KEY
> The advantage of sorting the data by primary key is that Flink needs to
> scan fewer files when new data is added later, and memory usage is also
> reduced.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)