nsivabalan opened a new pull request #3762: URL: https://github.com/apache/hudi/pull/3762
## What is the purpose of the pull request

- Added support for reading HFile log blocks via the inline FileSystem in the metadata table.
- Added support for reading a list of keys (batch get) rather than doing a full scan in the metadata table.

## Brief change log

- Added two new configs to HoodieMetadataConfig: `hoodie.metadata.enable.inline.reading.log.files` and `hoodie.metadata.enable.full.scan.log.files`.
- Since we are adding support for seek-based reads, renamed AbstractHoodieLogRecordScanner to AbstractHoodieLogRecordReader, and renamed HoodieMetadataMergedLogRecordReader to match.
- Added a new method to HoodieMetadataMergedLogRecordReader to read records for a list of keys without doing a full scan.

```
public List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> getRecordsByKeys(List<String> keys) { }
```

- Added a new method to HoodieDataBlock for the same requirement. The base class does not provide an implementation. HoodieHFileDataBlock overrides it with a concrete implementation in which records are read via the inline FileSystem using a seek-based approach.

```
public List<IndexedRecord> getRecords(List<String> keys) throws IOException { }
```

- HoodieDataBlock also adheres to the enableInline config even when not doing a batch get. Three options are possible:
  a. full scan without inlining
  b. full scan with inlining
  c. batch get (with inlining)
- Fixed the metadata reader (HoodieBackedTableMetadata) to use the new APIs based on the config values.

## Verify this pull request

This change added tests and can be verified as follows:

- Added tests to TestHoodieRealtimeRecordReader to verify the change.
- Found some gaps in testing the HFile writer and reader, especially around seek-based reads, and added TestHoodieHFileReaderWriter to cover these cases.
- Enabled inline and batch get reads in one test in TestHoodieBackedMetadata.
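As a rough illustration of the three read options listed in the change log (a, b, c), the sketch below shows one way the two config flags could map to a read mode, along with a toy batch get over a sorted in-memory map. Everything here is hypothetical: `ReadModeSketch`, `ReadMode`, `resolve`, and the `TreeMap` stand-in for an HFile block are not Hudi classes, and the rule that a disabled full scan always implies inline reading is an assumption drawn from option c, not something confirmed by the PR's code.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names are Hudi classes.
class ReadModeSketch {

    // The three options from the change log: a, b, and c.
    enum ReadMode { FULL_SCAN, FULL_SCAN_INLINE, BATCH_GET_INLINE }

    // Maps the two boolean flags to a read mode.
    // Assumption: when full scan is disabled, reads are always inline (option c).
    static ReadMode resolve(boolean enableFullScan, boolean enableInlineReading) {
        if (!enableFullScan) {
            return ReadMode.BATCH_GET_INLINE;
        }
        return enableInlineReading ? ReadMode.FULL_SCAN_INLINE : ReadMode.FULL_SCAN;
    }

    // Toy batch get: looks up each requested key in a sorted map that stands in
    // for a key-sorted block. A key with no record yields Optional.empty().
    static List<Map.Entry<String, Optional<String>>> getRecordsByKeys(
            TreeMap<String, String> block, List<String> keys) {
        List<Map.Entry<String, Optional<String>>> result = new ArrayList<>();
        for (String key : keys) {
            result.add(new AbstractMap.SimpleImmutableEntry<>(
                    key, Optional.ofNullable(block.get(key))));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(resolve(true, false));
        System.out.println(resolve(true, true));
        System.out.println(resolve(false, true));

        TreeMap<String, String> block = new TreeMap<>();
        block.put("key1", "record1");
        block.put("key3", "record3");
        for (Map.Entry<String, Optional<String>> e
                : getRecordsByKeys(block, Arrays.asList("key1", "key2"))) {
            System.out.println(e.getKey() + " present=" + e.getValue().isPresent());
        }
    }
}
```

The result keeps one entry per requested key, in request order, so callers can tell a missing key apart from a key that was never asked for. This mirrors the shape of the `getRecordsByKeys` signature in the change log, which pairs each key with an optional record.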
## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org