[ https://issues.apache.org/jira/browse/HUDI-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Kudinkin reassigned HUDI-5236:
-------------------------------------

Assignee: Alexey Kudinkin

Implement HoodieBackedTableMetadata v2
--------------------------------------

Key: HUDI-5236
URL: https://issues.apache.org/jira/browse/HUDI-5236
Project: Apache Hudi
Issue Type: Improvement
Reporter: Alexey Kudinkin
Assignee: Alexey Kudinkin
Priority: Blocker
Fix For: 0.13.1

*Problem Statement*
Currently, metadata table (MT) performance is hard to predict because it depends on a variety of factors, for example whether the MT is compacted. If the table is NOT compacted, then when loading the "files" partition (for example) we load all of the delta-log files and materialize them in memory, so all subsequent requests are served from memory. However, when the table IS compacted, we only prematerialize the updated records and not the records sitting in the base file, which means those have to be fetched from the base HFile on every request (even though block-level caching is implemented inside the HFile reader).

More generally, `HoodieBackedTableMetadata`, the primary facade and interface of the MT, currently lacks a well thought-through architecture and API; instead, it simply serves as an aggregation layer over the lower-level components (LogRecordScanner, FileReader, etc.).

This is problematic, since the MT is a core component whose performance directly affects query planning and beyond.
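To make the asymmetry described above concrete, here is a toy Python model of the current read path. It is a sketch under simplifying assumptions, not Hudi code: `ToyBaseFile` and `ToyPartitionReader` are hypothetical names, a dict stands in for the base HFile, and the HFile reader's block cache is deliberately not modeled, so every lookup that misses the in-memory log records counts as a base-file read.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyBaseFile:
    """Hypothetical stand-in for an MT partition's base HFile; counts point reads."""
    records: Dict[str, str]
    reads: int = 0

    def get(self, key: str) -> Optional[str]:
        # Every call goes back to the "file" (block-level caching is not modeled).
        self.reads += 1
        return self.records.get(key)


class ToyPartitionReader:
    """Hypothetical reader mimicking the behavior described above (not Hudi's API)."""

    def __init__(self, base: ToyBaseFile, log_records: Dict[str, str]) -> None:
        self.base = base
        # Delta-log records are merged and materialized in memory when the reader opens.
        self.in_memory = dict(log_records)

    def get(self, key: str) -> Optional[str]:
        if key in self.in_memory:
            return self.in_memory[key]
        # Records that live only in the base file are fetched on every request.
        return self.base.get(key)
```

With an uncompacted partition (all records still in the log, `ToyPartitionReader(ToyBaseFile({}), {"p1": "f1", "p2": "f2", "p3": "f3"})`), ten lookups of `p1` make no base-file reads. After compaction (base holds `p1`..`p3`, log holds only an update to `p3`), the same ten lookups make ten base-file reads, which is the "worse after compaction" effect the problem statement calls out.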
As such, it has to have:
# {*}Predictable performance{*}: how the state of the MT affects performance should be easy to comprehend and reason about (for ex, {_}it's expected that performance could degrade as scale increases, or if the table is not compacted for a long time; however, it's totally unexpected that performance would become worse after compaction than it was before{_})
# {*}Clear configuration levers{*}: the behavior and performance of the MT should be governed by crystal-clear configuration levers, for example whether records are materialized in memory or loaded dynamically

*Solution*
To address the aforementioned problems, we propose to implement HoodieBackedTableMetadataV2, providing:
* {*}Materialization{*}: it should allow the MT to be read in either of two ways
** _Eagerly:_ the whole MT is loaded in memory before being accessed
** _Lazily:_ the MT is queried on an ad-hoc basis, caching the results of previous queries for subsequent use
* {*}Configuration{*}: it should be easy to prod
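The two materialization modes proposed above can be sketched as a single reader whose mode is an explicit configuration lever. This is a toy illustration under stated assumptions, not the HoodieBackedTableMetadataV2 design: `Materialization`, `ToyTableMetadataV2` and `storage_reads` are hypothetical names, dicts stand in for the base HFile and the merged delta-log records, and deletes are ignored.

```python
from enum import Enum
from typing import Dict, Optional


class Materialization(Enum):
    EAGER = "eager"  # materialize the whole partition in memory up front
    LAZY = "lazy"    # fetch keys on demand and cache the results


class ToyTableMetadataV2:
    """Hypothetical sketch of the proposed reader; not Hudi's actual API."""

    def __init__(self, base: Dict[str, str], log: Dict[str, str],
                 mode: Materialization) -> None:
        self._base = base        # stands in for the base HFile
        self._log = log          # stands in for the merged delta-log records
        self.mode = mode
        self.storage_reads = 0   # counts on-demand lookups made by the lazy path
        self._cache: Dict[str, Optional[str]] = {}
        if mode is Materialization.EAGER:
            # Merge everything once at open time; log records override base records.
            self._cache = {**base, **log}

    def get(self, key: str) -> Optional[str]:
        if self.mode is Materialization.EAGER:
            # Always served from memory, regardless of whether the MT is compacted.
            return self._cache.get(key)
        if key not in self._cache:
            # Lazy path: merge the log record over the base record for this key only,
            # then cache the result (including a miss) for subsequent requests.
            self.storage_reads += 1
            self._cache[key] = self._log.get(key, self._base.get(key))
        return self._cache[key]
```

In both modes the cost model no longer depends on compaction state: eager mode pays the full load once and then serves every request from memory, while lazy mode bounds memory to the keys actually requested and hits storage at most once per key. Which trade-off applies is chosen by the caller through `mode`, rather than falling out of how the log and base files happen to be laid out.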