cshuo commented on PR #18083: URL: https://github.com/apache/hudi/pull/18083#issuecomment-3949854378
@prashantwason thanks for the detailed explanation.

> Would you like me to add some basic benchmarking to quantify the latency improvements? Or do you have specific concerns about the approach that I should address first?

Regarding append writes with buffer sorting, the repository already has two implementations: `AppendWriteFunctionWithBIMBufferSort` and `AppendWriteFunctionWithDisruptorBufferSort`. Both sort and flush asynchronously to mitigate spikes during the flush. This PR adopts a continuous sorting approach, which can indeed alleviate sorting spikes. However, a flush involves not only sorting but also file-writing overhead, and continuous sorting spreads the sorting cost across every record write. I'd therefore like to understand how much performance gain this PR brings compared to the existing functions.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
