[ 
https://issues.apache.org/jira/browse/SPARK-56734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-56734:
-----------------------------------
    Labels: pull-request-available  (was: )

> Optimize RocksDBPersistenceEngine by segregating data into distinct Column 
> Families
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-56734
>                 URL: https://issues.apache.org/jira/browse/SPARK-56734
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.3.0
>            Reporter: darion yaphet
>            Priority: Major
>              Labels: pull-request-available
>
> *Motivation*
> Currently, {{RocksDBPersistenceEngine}} in the Spark Master stores all 
> metadata (Applications, Workers, Drivers) in a single default Column Family, 
> using key prefixes to distinguish them. This causes significant performance 
> issues during recovery:
>  * *Inefficient Scanning:* Reading a specific type (e.g., Applications) 
> requires scanning the entire database and performing expensive string prefix 
> matching, leading to *O(N_total)* complexity.
>  * *High Overhead:* The current approach wastes CPU on string operations and 
> causes cache contention between different data types.
> *Proposed Solution*
> Refactor {{RocksDBPersistenceEngine}} to use native *Column Families* for 
> data isolation (e.g., separate CFs for Apps, Workers, and Drivers).
>  * Eliminate key prefixing logic and route data directly to the corresponding 
> {{ColumnFamilyHandle}}.
>  * Allow the engine to scan only the relevant Column Family during recovery.
> *Benefits*
>  * *Faster Recovery:* Lowers read complexity from *O(N_total)* to 
> *O(N_type)*, drastically reducing Master startup time.
>  * *Better Performance:* Removes string matching overhead and improves Block 
> Cache hit rates.
>  * *Granular Control:* Enables independent configuration (e.g., compression, 
> TTL) for different metadata types.
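
The difference between the two layouts described in the issue can be sketched as follows. This is a hypothetical in-memory model, not Spark or RocksDB code: plain sorted maps stand in for column families, the class name, method names, and the "type_id" key format are assumptions made for illustration, and the prefix read deliberately models the full-scan behaviour as the issue describes it. The counter only shows how many entries each read touches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: sorted maps stand in for RocksDB column families.
class ColumnFamilySketch {
    // Layout A: a single keyspace where the metadata type is a key prefix.
    final TreeMap<String, String> single = new TreeMap<>();
    // Layout B: one keyspace per metadata type.
    final Map<String, TreeMap<String, String>> families = new HashMap<>();
    // Number of entries touched by reads since the last reset.
    int visited = 0;

    // Writes the same record into both layouts so the reads can be compared.
    void put(String type, String id, String value) {
        single.put(type + "_" + id, value); // assumed key format
        families.computeIfAbsent(type, t -> new TreeMap<>()).put(id, value);
    }

    // Layout A read as described in the issue: walk everything, match the prefix.
    List<String> readByPrefix(String type) {
        String prefix = type + "_";
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : single.entrySet()) {
            visited++;
            if (e.getKey().startsWith(prefix)) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    // Layout B read: walk only the keyspace that belongs to the requested type.
    List<String> readByFamily(String type) {
        List<String> out = new ArrayList<>();
        TreeMap<String, String> family = families.get(type);
        if (family != null) {
            for (String value : family.values()) {
                visited++;
                out.add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ColumnFamilySketch s = new ColumnFamilySketch();
        s.put("app", "1", "app-1");
        s.put("app", "2", "app-2");
        s.put("worker", "1", "worker-1");
        s.put("worker", "2", "worker-2");
        s.put("worker", "3", "worker-3");
        s.put("driver", "1", "driver-1");

        s.readByPrefix("app");
        System.out.println("prefix scan touched " + s.visited + " entries");

        s.visited = 0;
        s.readByFamily("app");
        System.out.println("per-family scan touched " + s.visited + " entries");
    }
}
```

In this model the prefix read touches every stored entry regardless of type, while the per-family read touches only entries of the requested type, which is the O(N_total) versus O(N_type) distinction the issue draws. How closely a real RocksDB iterator matches either count depends on how the engine seeks and iterates, which this sketch does not attempt to reproduce.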



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
