[ https://issues.apache.org/jira/browse/HUDI-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-5321:
-----------------------------
    Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14  (was: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31)

> Fix Bulk Insert ColumnSortPartitioners
> --------------------------------------
>
>                 Key: HUDI-5321
>                 URL: https://issues.apache.org/jira/browse/HUDI-5321
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.12.1
>            Reporter: Alexey Kudinkin
>            Assignee: Jonathan Vexler
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>
> Currently, all of the Custom Bulk Insert ColumnSortPartitioner implementations
> incorrectly return "true" from the "arePartitionRecordsSorted" method, even
> though the records are not necessarily sorted by the partition-path columns,
> as this method requires.
> When such a Partitioner is used and the data is NOT sorted by a list of
> columns that starts with the partition columns, Parquet writers can be closed
> prematurely while writing files, creating a LOT of small files.
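
The small-file effect described above can be illustrated with a toy model. This is only a sketch, not Hudi code: the class PartitionSortSketch and both of its methods are hypothetical names, and it assumes a writer that keeps a single file open and starts a new one whenever the partition path of the incoming record changes, which is the behaviour a "true" result from "arePartitionRecordsSorted" is taken to permit.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; none of these names are Hudi APIs. */
final class PartitionSortSketch {

    /** Returns true when each partition path forms one contiguous run. */
    static boolean isGroupedByPartition(List<String> partitionPaths) {
        Set<String> seen = new HashSet<>();
        String previous = null;
        for (String path : partitionPaths) {
            // A path that shows up again after a different one breaks the grouping.
            if (!path.equals(previous) && !seen.add(path)) {
                return false;
            }
            previous = path;
        }
        return true;
    }

    /**
     * Counts the files opened by a writer that closes its current file
     * every time the partition path changes (assumed behaviour).
     */
    static int countFilesOpened(List<String> partitionPaths) {
        int files = 0;
        String current = null;
        for (String path : partitionPaths) {
            if (!path.equals(current)) {
                files++;          // previous file is closed, a new one is opened
                current = path;
            }
        }
        return files;
    }
}
```

Under these assumptions, input grouped by partition path opens one file per partition, while interleaved input opens a new file on almost every record, which is why a partitioner should only report sorted records when the sort columns actually start with the partition-path columns.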