[GitHub] [hudi] prashantwason opened a new pull request, #8526: [HUDI-6116] Optimize log block reading by removing seeks to check corrupted blocks.
prashantwason opened a new pull request, #8526:
URL: https://github.com/apache/hudi/pull/8526

[HUDI-6116] Optimize log block reading by removing seeks to check corrupted blocks.

### Change Logs

1. Removed the eager `isBlockCorrupted` check that ran after reading the block size in `HoodieLogFileReader`.
2. Added validation checks after reading each item (version, size, blockType, content, etc.) from the log block.
3. Added a unit test that generates various corruption scenarios and validates that the corrupted blocks are found.

### Impact

Improved performance when reading a log file over a high-latency connection or when the file contains a large number of log blocks.

### Risk level (write none, low, medium or high below)

None

### Documentation Update

None

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
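The idea in the Change Logs, validating each field as it is read instead of seeking ahead to check for corruption first, might be sketched roughly as follows. This is only an illustration under stated assumptions: the block layout (`version`, `type`, `length`, `content`), the limits, and every name here (`LazyBlockValidation`, `CorruptBlockException`, `readBlock`, `isCorrupt`, `encode`) are hypothetical and do not reflect the actual Hudi log format or the `HoodieLogFileReader` code in this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class LazyBlockValidation {
    // Hypothetical layout, for illustration only:
    // [int version][int type][int length][content bytes]
    static final int HEADER_BYTES = 12;
    static final int MAX_VERSION = 1; // assumed highest known version
    static final int TYPE_COUNT = 3;  // assumed number of block types

    static class CorruptBlockException extends IOException {
        CorruptBlockException(String message) {
            super(message);
        }
    }

    /** Reads one block front to back, validating each field right after it is read. */
    static byte[] readBlock(DataInputStream in, long bytesRemaining) throws IOException {
        try {
            int version = in.readInt();
            if (version < 0 || version > MAX_VERSION) {
                throw new CorruptBlockException("unexpected version " + version);
            }
            int type = in.readInt();
            if (type < 0 || type >= TYPE_COUNT) {
                throw new CorruptBlockException("unexpected block type " + type);
            }
            int length = in.readInt();
            // Compare against the known remaining size; no seek to the block end is needed.
            if (length < 0 || length > bytesRemaining - HEADER_BYTES) {
                throw new CorruptBlockException("implausible content length " + length);
            }
            byte[] content = new byte[length];
            in.readFully(content);
            return content;
        } catch (EOFException e) {
            // Running out of bytes mid-field means the block was truncated.
            throw new CorruptBlockException("truncated block");
        }
    }

    /** Convenience wrapper: true if the bytes do not form one valid block. */
    static boolean isCorrupt(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            readBlock(in, bytes.length);
            return false;
        } catch (CorruptBlockException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a block in the hypothetical layout (big-endian, matching DataInputStream). */
    static byte[] encode(int version, int type, byte[] content) {
        return ByteBuffer.allocate(HEADER_BYTES + content.length)
                .putInt(version)
                .putInt(type)
                .putInt(content.length)
                .put(content)
                .array();
    }

    public static void main(String[] args) {
        byte[] good = encode(1, 2, new byte[] {7, 8, 9});
        System.out.println("valid block corrupt? " + isCorrupt(good));
        System.out.println("bad version corrupt? " + isCorrupt(encode(99, 2, new byte[0])));
    }
}
```

In this sketch a corrupted block is detected at the first field that fails validation, so a healthy file is read in a single forward pass with no extra seeks per block.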