[
https://issues.apache.org/jira/browse/SPARK-56734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-56734:
-----------------------------------
Labels: pull-request-available (was: )
> Optimize RocksDBPersistenceEngine by segregating data into distinct Column
> Families
> -----------------------------------------------------------------------------------
>
> Key: SPARK-56734
> URL: https://issues.apache.org/jira/browse/SPARK-56734
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.3.0
> Reporter: darion yaphet
> Priority: Major
> Labels: pull-request-available
>
> *Motivation*
> Currently, {{RocksDBPersistenceEngine}} in the Spark Master stores all
> metadata (Applications, Workers, Drivers) in a single default Column Family,
> using key prefixes to distinguish them. This causes significant performance
> issues during recovery:
> * *Inefficient Scanning:* Reading a specific type (e.g., Applications)
> requires scanning the entire database and performing expensive string
> prefix matching, leading to *O(N_total)* complexity.
> * *High Overhead:* The current approach wastes CPU on string operations and
> causes cache contention between different data types.
>
> *Proposed Solution*
> Refactor {{RocksDBPersistenceEngine}} to use native *Column Families* for
> data isolation (e.g., separate CFs for Apps, Workers, and Drivers).
> * Eliminate key prefixing logic and route data directly to the corresponding
> {{ColumnFamilyHandle}}.
> * Allow the engine to scan only the relevant Column Family during recovery.
>
> *Benefits*
> * *Faster Recovery:* Optimizes read complexity from *O(N_total)* to
> *O(N_type)*, drastically reducing Master startup time.
> * *Better Performance:* Removes string matching overhead and improves Block
> Cache hit rates.
> * *Granular Control:* Enables independent configuration (e.g., compression,
> TTL) for different metadata types.
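
The difference between the two read paths described above can be sketched
with a small in-memory model. This is an illustration only: it does not use
RocksDB and is not the Spark implementation. The class names
(PrefixedStore, FamilyStore), the "kind_name" key layout, and the
keys_examined counter are all hypothetical, and the prefixed store models
the full-scan behavior the issue describes.

```python
from collections import defaultdict


class PrefixedStore:
    """Toy model: one keyspace, record type encoded as a key prefix."""

    def __init__(self):
        self.data = {}
        self.keys_examined = 0  # hypothetical counter, for illustration

    def put(self, kind, name, value):
        self.data[f"{kind}_{name}"] = value

    def read_all(self, kind):
        prefix = f"{kind}_"
        out = []
        # Every key is visited and string-matched: O(N_total).
        for key, value in self.data.items():
            self.keys_examined += 1
            if key.startswith(prefix):
                out.append(value)
        return out


class FamilyStore:
    """Toy model: one isolated keyspace per record type."""

    def __init__(self):
        self.families = defaultdict(dict)
        self.keys_examined = 0  # hypothetical counter, for illustration

    def put(self, kind, name, value):
        # Route directly to the family; no prefix is added to the key.
        self.families[kind][name] = value

    def read_all(self, kind):
        family = self.families.get(kind, {})
        # Only the requested family is visited: O(N_type).
        self.keys_examined += len(family)
        return list(family.values())


def populate(store):
    """Insert 3 apps, 5 workers, and 2 drivers (10 records in total)."""
    for kind, count in (("app", 3), ("worker", 5), ("driver", 2)):
        for i in range(count):
            store.put(kind, f"{kind}-{i}", f"{kind}-{i}-state")


if __name__ == "__main__":
    prefixed, by_family = PrefixedStore(), FamilyStore()
    populate(prefixed)
    populate(by_family)
    prefixed.read_all("app")
    by_family.read_all("app")
    print("prefixed store examined:", prefixed.keys_examined)
    print("family store examined:", by_family.keys_examined)
```

Both stores return the same application records; the model only shows that
the per-family read touches the records of one type, while the prefixed read
touches every record in the store.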