[ https://issues.apache.org/jira/browse/HUDI-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan reopened HUDI-3069:
---------------------------------------

> Improve HoodieMergedLogRecordScanner avoid putting unnecessary hoodie records
> -----------------------------------------------------------------------------
>
>                 Key: HUDI-3069
>                 URL: https://issues.apache.org/jira/browse/HUDI-3069
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Common Core
>            Reporter: scx
>            Priority: Major
>              Labels: performance, pull-request-available
>             Fix For: 0.11.0
>
>
> I found that when the compaction plan is generated, the delta log files
> under each file group are arranged in the natural (ascending) order of
> instant time. In the majority of cases we can assume that the latest data
> is in the latest delta log file, so we sort the files by instant time from
> newest to oldest. This largely avoids rewriting data during compaction and
> thereby shortens compaction time.
> In addition, when reading the delta log files, we compare each record from
> the delta log with the record already held in the ExternalSpillableMap. If
> the old record is selected, there is no need to write it back into the
> ExternalSpillableMap. Such rewrites waste a lot of resources once the map
> has spilled data to disk.
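A minimal sketch of the two ideas in the description, under stated assumptions: the types `LogRecord`, `LogFile`, `Versioned`, `MergeResult` and the method `merge` are hypothetical and are not Hudi's `HoodieMergedLogRecordScanner` code. A plain `HashMap` stands in for `ExternalSpillableMap`, and every `put` is counted as one potentially expensive (possibly on-disk) write. The sample data assumes ordering values grow with instant time, which is the case the issue calls "the majority of cases".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of log merging; none of these types are Hudi classes.
class LogMergeSketch {

    // One record from a delta log: record key, ordering (precombine) value, payload.
    record LogRecord(String key, long orderingVal, String payload) {}

    // One delta log file, tagged with the instant time that wrote it.
    record LogFile(String instantTime, List<LogRecord> records) {}

    // A record plus the instant it came from, as held in the merged map.
    record Versioned(LogRecord rec, String instant) {
        // Higher ordering value wins; ties go to the newer instant (an assumption),
        // so the winner does not depend on the order in which files are scanned.
        boolean beats(Versioned other) {
            if (rec.orderingVal() != other.rec().orderingVal()) {
                return rec.orderingVal() > other.rec().orderingVal();
            }
            return instant.compareTo(other.instant()) > 0;
        }
    }

    // Merged records plus the number of map writes performed.
    record MergeResult(Map<String, Versioned> records, int puts) {}

    static MergeResult merge(List<LogFile> logFiles, boolean newestFirst, boolean skipRedundantPuts) {
        List<LogFile> files = new ArrayList<>(logFiles);
        // Instant times here are fixed-width timestamp strings, so string order is time order.
        Comparator<LogFile> byInstant = Comparator.comparing(LogFile::instantTime);
        files.sort(newestFirst ? byInstant.reversed() : byInstant);

        // HashMap stands in for ExternalSpillableMap; each put models one write.
        Map<String, Versioned> merged = new HashMap<>();
        int puts = 0;
        for (LogFile file : files) {
            for (LogRecord r : file.records()) {
                Versioned incoming = new Versioned(r, file.instantTime());
                Versioned existing = merged.get(r.key());
                if (existing == null || incoming.beats(existing)) {
                    merged.put(r.key(), incoming);
                    puts++;
                } else if (!skipRedundantPuts) {
                    // Behaviour the issue describes: write the unchanged old record back anyway.
                    merged.put(r.key(), existing);
                    puts++;
                }
                // Otherwise the old record wins and the map is left untouched.
            }
        }
        return new MergeResult(merged, puts);
    }

    public static void main(String[] args) {
        List<LogFile> files = List.of(
            new LogFile("20220101000000", List.of(new LogRecord("a", 1, "a-v1"), new LogRecord("b", 1, "b-v1"))),
            new LogFile("20220102000000", List.of(new LogRecord("a", 2, "a-v2"), new LogRecord("b", 2, "b-v2"))),
            new LogFile("20220103000000", List.of(new LogRecord("a", 3, "a-v3"), new LogRecord("b", 3, "b-v3"))));
        System.out.println(merge(files, false, false).puts()); // prints 6
        System.out.println(merge(files, true, true).puts());   // prints 2
    }
}
```

With oldest-first scanning every record overwrites its predecessor, so six records cost six writes. Scanning newest-first and skipping writes when the old record wins costs one write per key. The tie-break on instant time in `beats` is an added assumption so that reversing the scan order cannot change which record wins on equal ordering values; in Hudi that outcome depends on the payload class in use.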