[ 
https://issues.apache.org/jira/browse/HUDI-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin reassigned HUDI-5236:
-------------------------------------

    Assignee: Alexey Kudinkin

> Implement HoodieBackedTableMetadata v2
> --------------------------------------
>
>                 Key: HUDI-5236
>                 URL: https://issues.apache.org/jira/browse/HUDI-5236
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>             Fix For: 0.13.1
>
>
> *Problem Statement*
> Currently, the performance of the MT (Metadata Table) is hard to predict due 
> to a variety of factors, for example 
> whether the MT is compacted. If the table is NOT compacted, then when loading 
> the "files" partition, for example, we load all of the delta-log files and 
> materialize them in memory, meaning that all subsequent requests are served 
> from memory. However, when the table IS compacted, we prematerialize only the 
> updated records but not the records sitting in the base file, which requires 
> us to fetch from the base HFile every time (even though there's block-level 
> caching implemented inside the HFile reader).
> More generally, `HoodieBackedTableMetadata`, being the primary facade and 
> interface for the MT, currently doesn't have a well thought-through 
> architecture and APIs; instead, it serves simply as an aggregation layer for 
> the lower-level components (LogRecordScanner, FileReader, etc).
> This is problematic, since the MT is a core component whose performance has 
> direct implications for query planning and beyond. As such, it has to have:
>  # {*}Predictable performance{*}: how the state of the MT affects performance 
> should be easy to comprehend and reason about (for example, {_}it's expected 
> that performance could degrade as scale increases or if the table is 
> not compacted for a long time; however, it's totally unexpected that 
> performance could become worse after compaction than it was before{_})
>  # {*}Clear configuration levers{*}: the behavior and performance of the MT 
> should be controlled by crystal-clear configuration levers – whether records 
> are materialized in-memory or loaded dynamically, 
>  
> *Solution*
> To address the aforementioned problems, we propose implementing 
> HoodieBackedTableMetadataV2, providing:
>  * {*}Materialization{*}: it should allow the MT to be read in either of 2 ways
>  ** _Eagerly:_ the whole MT is loaded in-memory before it is accessed
>  ** _Lazily:_ the MT is queried on an ad-hoc basis, with the 
> results of previous queries cached for subsequent use
>  * {*}Configuration{*}: it should be easy to prod 

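The two materialization modes proposed above could be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
actual HoodieBackedTableMetadataV2 design: the class MetadataReaderSketch,
the MaterializationMode enum, and the fullScan / pointLookup callbacks are
all hypothetical names that do not come from the Hudi codebase, and records
are simplified to String key/value pairs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names come from the Hudi codebase. */
final class MetadataReaderSketch {

    /** Hypothetical configuration lever: how records get materialized. */
    enum MaterializationMode { EAGER, LAZY }

    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, Optional<String>> pointLookup;
    private final boolean fullyLoaded;
    private int storageLookups = 0;

    MetadataReaderSketch(MaterializationMode mode,
                         Supplier<Map<String, String>> fullScan,
                         Function<String, Optional<String>> pointLookup) {
        this.pointLookup = pointLookup;
        if (mode == MaterializationMode.EAGER) {
            // Eager: load every record up front, so reads never touch storage.
            cache.putAll(fullScan.get());
            fullyLoaded = true;
        } else {
            // Lazy: start empty and fill the cache on demand.
            fullyLoaded = false;
        }
    }

    Optional<String> get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return Optional.of(cached);
        }
        if (fullyLoaded) {
            // Everything is already in memory, so a miss means "absent".
            return Optional.empty();
        }
        // Lazy miss: go to storage once and remember a successful result.
        // Absent keys are not cached in this sketch.
        storageLookups++;
        Optional<String> loaded = pointLookup.apply(key);
        loaded.ifPresent(value -> cache.put(key, value));
        return loaded;
    }

    /** Number of reads that had to go to storage after construction. */
    int storageLookups() {
        return storageLookups;
    }

    public static void main(String[] args) {
        Map<String, String> storage = Map.of(
            "2022/11/01", "file-a.parquet",
            "2022/11/02", "file-b.parquet");
        MetadataReaderSketch lazy = new MetadataReaderSketch(
            MaterializationMode.LAZY,
            () -> storage,
            key -> Optional.ofNullable(storage.get(key)));
        lazy.get("2022/11/01");
        lazy.get("2022/11/01"); // served from the cache
        System.out.println("lazy storage lookups: " + lazy.storageLookups());
    }
}
```

In this sketch the read cost depends only on the selected mode and on what
has already been requested, which is the kind of predictability the issue
asks for: the eager reader pays the full load cost once, while the lazy
reader pays one storage lookup per distinct key that is found.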


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
