[ https://issues.apache.org/jira/browse/HUDI-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen updated HUDI-7111:
-----------------------------
    Fix Version/s: 0.14.1

> Performance regression of spark job which written into simple bucket index table
> --------------------------------------------------------------------------------
>
>                 Key: HUDI-7111
>                 URL: https://issues.apache.org/jira/browse/HUDI-7111
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Jing Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.1
>
>         Attachments: image-2023-11-16-23-41-32-729.png
>
>
> After upgrading to version 0.14.0, the performance of a Spark job that
> writes into a simple bucket index table has regressed.
> !image-2023-11-16-23-41-32-729.png!
> The cause is [PR#4480|https://github.com/apache/hudi/pull/4480]: its
> refactoring of the bucket index introduced two unnecessary stages into the
> tagging step for the simple bucket index.
> {code:java}
> List<String> partitions =
>     records.map(HoodieRecord::getPartitionPath).distinct().collectAsList();
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
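
As a rough illustration of why the quoted line is costly: in Spark, `distinct()` requires a shuffle and `collectAsList()` is an action that runs a job and returns its result to the driver. The sketch below is plain Java with no Spark or Hudi dependency. It shows only the general idea that a bucket can be derived from each record in a single pass, without first gathering the distinct partition paths. The class `BucketTagSketch`, the types `Rec` and `Tagged`, and the hashing scheme are hypothetical and are not taken from the Hudi code base or from the fix for this issue.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not Hudi's implementation.
class BucketTagSketch {

    // Hypothetical input record: a partition path plus a record key.
    static final class Rec {
        final String partitionPath;
        final String recordKey;

        Rec(String partitionPath, String recordKey) {
            this.partitionPath = partitionPath;
            this.recordKey = recordKey;
        }
    }

    // Hypothetical tagged record: the input plus the bucket it maps to.
    static final class Tagged {
        final String partitionPath;
        final String recordKey;
        final int bucketId;

        Tagged(String partitionPath, String recordKey, int bucketId) {
            this.partitionPath = partitionPath;
            this.recordKey = recordKey;
            this.bucketId = bucketId;
        }
    }

    // Assumed hashing scheme for illustration: non-negative hash modulo bucket count.
    static int bucketId(String recordKey, int numBuckets) {
        return (recordKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Single pass over the records: each bucket is computed from the record itself,
    // so no list of distinct partition paths has to be gathered beforehand.
    static List<Tagged> tag(List<Rec> records, int numBuckets) {
        return records.stream()
                .map(r -> new Tagged(r.partitionPath, r.recordKey,
                        bucketId(r.recordKey, numBuckets)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Rec> records = List.of(
                new Rec("2023/11/16", "a"),
                new Rec("2023/11/17", "b"));
        for (Tagged t : tag(records, 4)) {
            System.out.println(t.partitionPath + " " + t.recordKey + " -> bucket " + t.bucketId);
        }
    }
}
```

Whether the actual fix takes this shape is not stated in the issue. The sketch only shows that a fixed-bucket mapping needs no global view of the partitions.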