nsivabalan opened a new pull request #3762:
URL: https://github.com/apache/hudi/pull/3762


   ## What is the purpose of the pull request
   
   - Added support for reading HFile log blocks via the inline FileSystem in the metadata table.
   - Added support for reading a list of keys (batch get) from the metadata table instead of doing a full scan.
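
To illustrate the difference between a batch get and a full scan, here is a minimal sketch. It is not Hudi code: `BatchGetSketch` is a hypothetical class, and a `TreeMap` stands in for the sorted key index of an HFile block. Each requested key is looked up directly, and missing keys come back as empty values.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted, seekable block of records (NOT a Hudi class).
class BatchGetSketch {
  // The TreeMap plays the role of the sorted key index of an HFile block.
  private final TreeMap<String, String> sortedRecords = new TreeMap<>();

  BatchGetSketch(Map<String, String> records) {
    sortedRecords.putAll(records);
  }

  // Batch get: look up only the requested keys instead of iterating every record.
  // Keys are sorted first so lookups proceed in the same order as the stored data.
  List<Map.Entry<String, Optional<String>>> getRecordsByKeys(List<String> keys) {
    List<String> sortedKeys = new ArrayList<>(keys);
    Collections.sort(sortedKeys);
    List<Map.Entry<String, Optional<String>>> result = new ArrayList<>();
    for (String key : sortedKeys) {
      // A key with no record yields an empty Optional rather than being dropped.
      result.add(new AbstractMap.SimpleImmutableEntry<>(
          key, Optional.ofNullable(sortedRecords.get(key))));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> records = new TreeMap<>();
    records.put("partition-a", "files-a");
    records.put("partition-c", "files-c");
    BatchGetSketch block = new BatchGetSketch(records);
    System.out.println(block.getRecordsByKeys(List.of("partition-c", "partition-b")));
  }
}
```

The cost of a batch get grows with the number of requested keys rather than with the size of the block, which is the motivation for avoiding a full scan when only a few keys are needed.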
   
   ## Brief change log
   - Added two new configs to HoodieMetadataConfig: `hoodie.metadata.enable.inline.reading.log.files` and `hoodie.metadata.enable.full.scan.log.files`.
   - Since we are adding support for seek-based reads, renamed AbstractHoodieLogRecordScanner to AbstractHoodieLogRecordReader. The metadata log record reader has likewise been renamed to HoodieMetadataMergedLogRecordReader.
   - Added a new method to HoodieMetadataMergedLogRecordReader that reads records for a list of keys without doing a full scan.
   ```
   public List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> getRecordsByKeys(List<String> keys) {
   }
   ```
   - Added a new method to HoodieDataBlock for the same purpose. The base class has no implementation; HoodieHFileDataBlock overrides it with a concrete implementation in which records are read via the inline FileSystem using a seek-based approach.
   ```
   public List<IndexedRecord> getRecords(List<String> keys) throws IOException {
   }
   ```
   - HoodieDataBlock also honors the enableInline config even when batch get is not used. Three modes are possible: (a) full scan without inline reading, (b) full scan with inline reading, (c) batch get (with inline reading).
   - Updated the metadata reader (HoodieBackedTableMetadata) to use the new APIs based on the config values.
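
As a rough illustration of how the two configs could map onto the three modes listed above, here is a minimal sketch. `ReadModeSketch` and its `ReadMode` enum are hypothetical names, not Hudi classes. The default values used here (full scan enabled, inline reading disabled) are assumptions, as is the choice to treat batch get as always inline regardless of the inline flag.

```java
import java.util.Properties;

// Hypothetical helper (NOT part of Hudi) mapping the two configs to a read mode.
class ReadModeSketch {
  enum ReadMode { FULL_SCAN, FULL_SCAN_INLINE, BATCH_GET_INLINE }

  // Config keys as named in this pull request.
  static final String INLINE_KEY = "hoodie.metadata.enable.inline.reading.log.files";
  static final String FULL_SCAN_KEY = "hoodie.metadata.enable.full.scan.log.files";

  static ReadMode resolve(Properties props) {
    // Assumed defaults: full scan on, inline reading off.
    boolean fullScan = Boolean.parseBoolean(props.getProperty(FULL_SCAN_KEY, "true"));
    boolean inline = Boolean.parseBoolean(props.getProperty(INLINE_KEY, "false"));
    if (!fullScan) {
      // Assumption: batch get implies inline reading, so the inline flag is ignored.
      return ReadMode.BATCH_GET_INLINE;
    }
    return inline ? ReadMode.FULL_SCAN_INLINE : ReadMode.FULL_SCAN;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(FULL_SCAN_KEY, "false");
    System.out.println(resolve(props));
  }
}
```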
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
     - Added tests to TestHoodieRealtimeRecordReader to verify the change.
     - Found some gaps in the testing of HFileWriter and Reader, especially around seek-based reads, and added TestHoodieHFileReaderWriter to cover these cases.
     - Enabled inline and batch-get reads in one test in TestHoodieBackedMetadata.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

