[ https://issues.apache.org/jira/browse/HUDI-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reopened HUDI-3069:
---------------------------------------

> Improve HoodieMergedLogRecordScanner avoid putting unnecessary hoodie records
> -----------------------------------------------------------------------------
>
>                 Key: HUDI-3069
>                 URL: https://issues.apache.org/jira/browse/HUDI-3069
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Common Core
>            Reporter: scx
>            Priority: Major
>              Labels: performance, pull-request-available
>             Fix For: 0.11.0
>
>
> I found that when the compaction plan is generated, the delta log files under
> each file group are arranged in the natural (ascending) order of instant
> time. In the majority of cases we can assume that the latest data is in the
> latest delta log file, so we sort the files from largest to smallest instant
> time. This largely avoids rewriting data during compaction and therefore
> shortens compaction time.
> In addition, when reading the delta log files, we compare the records already
> held in the ExternalSpillableMap with the records from the delta log. If the
> old record is selected, there is no need to write it back into the
> ExternalSpillableMap; such rewrites waste a lot of resources once the map has
> spilled to disk.
>  

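The first idea in the description (read each file group's delta log files newest-first) can be sketched as below. This is a minimal illustration, not Hudi's code: `LogFileOrdering` and its `LogFile` record are hypothetical stand-ins for Hudi's own log file types, and the sketch assumes instant times are fixed-width timestamp strings (e.g. "20211220103000"), so that lexicographic order matches chronological order. It covers only the ordering; whether reading newest-first preserves merge results also depends on how the merge breaks ties between equal ordering values, which the issue does not spell out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch: order a file group's delta log files newest-first
 * by instant time before merging them. LogFile is an illustrative stand-in,
 * not Hudi's log file class.
 */
public class LogFileOrdering {

  // Illustrative holder: a log file path plus the instant time that wrote it.
  public record LogFile(String path, String instantTime) {}

  // Assumes fixed-width timestamp strings, so String order == time order.
  public static List<LogFile> newestFirst(List<LogFile> files) {
    List<LogFile> sorted = new ArrayList<>(files); // copy; input may be immutable
    sorted.sort(Comparator.comparing(LogFile::instantTime).reversed());
    return sorted;
  }

  public static void main(String[] args) {
    List<LogFile> files = List.of(
        new LogFile(".f1_20211220100000.log.1", "20211220100000"),
        new LogFile(".f1_20211220120000.log.1", "20211220120000"),
        new LogFile(".f1_20211220110000.log.1", "20211220110000"));
    // Prints the instant times from newest to oldest.
    for (LogFile f : newestFirst(files)) {
      System.out.println(f.instantTime());
    }
  }
}
```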

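The second idea (skip the write-back when the record already in the map wins) can be illustrated with a small sketch. Everything here is hypothetical: `CountingMap` stands in for Hudi's ExternalSpillableMap and simply counts `put` calls, because once that map has spilled, a `put` may mean serializing the value to disk. `Rec` and its `orderingVal` field are illustrative; the winner is chosen by comparing ordering values, and ties going to the incoming record is an arbitrary choice for this sketch, not necessarily Hudi's rule.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: merge incoming log records into a buffer keyed by
 * record key, writing back only when the winning record actually changes.
 */
public class SkipRedundantPut {

  // Illustrative record: key, ordering (precombine-style) value, payload.
  public record Rec(String key, long orderingVal, String payload) {}

  // Stand-in for a spillable map; counts puts to show avoided writes.
  public static final class CountingMap {
    private final Map<String, Rec> store = new HashMap<>();
    public int puts = 0;

    public Rec get(String key) { return store.get(key); }

    public void put(String key, Rec rec) {
      puts++; // in a spilled map this could be a disk write
      store.put(key, rec);
    }
  }

  // Merge one incoming log record into the buffer.
  public static void merge(CountingMap buffer, Rec incoming) {
    Rec existing = buffer.get(incoming.key());
    if (existing == null) {
      buffer.put(incoming.key(), incoming); // first sighting of this key
      return;
    }
    // Higher ordering value wins; ties go to the incoming record here.
    Rec winner = existing.orderingVal() > incoming.orderingVal() ? existing : incoming;
    // The optimization: skip the write-back when the existing record wins.
    if (winner != existing) {
      buffer.put(incoming.key(), winner);
    }
  }

  public static void main(String[] args) {
    CountingMap buffer = new CountingMap();
    merge(buffer, new Rec("k1", 3, "newest")); // put
    merge(buffer, new Rec("k1", 2, "older"));  // existing wins, no put
    merge(buffer, new Rec("k1", 1, "oldest")); // existing wins, no put
    System.out.println(buffer.get("k1").payload() + " puts=" + buffer.puts);
    // prints "newest puts=1"
  }
}
```

Combined with newest-first reading, most older records lose the comparison, so most merges end without a write.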

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
