gaozhangmin opened a new issue, #3973:
URL: https://github.com/apache/bookkeeper/issues/3973

   In a typical BookKeeper deployment, SSDs store the journal, while HDDs 
store ledger data. Writes are first held in memory and then flushed 
asynchronously to the HDD in the background. Because memory is limited, the 
amount of data that can be cached is restricted, so requests for historical 
data ultimately go to the HDD, which becomes a bottleneck for the entire 
BookKeeper cluster. Moreover, during data recovery after a node failure, a 
substantial amount of historical data must be read from the HDD, which 
saturates the disk's I/O and causes significant read delays or failures.
   
   To address these challenges, a new architecture is proposed: a disk cache 
placed between the memory cache and the HDD, using an SSD as an intermediate 
tier so that data stays cached for much longer. The data flow is: 
journal -> write cache -> SSD cache -> HDD. The SSD cache functions as a 
regular LedgerStorage layer and is compatible with all existing LedgerStorage 
implementations. The design is as follows:
   
   1. Data eviction from the disk cache to the Ledger data disk occurs on a 
per-log file basis.
   2. A new configuration parameter, diskCacheRetentionTime, is added to set 
the duration for which hot data is retained. Files with write timestamps older 
than the retention time will be evicted to the Ledger data disk.
   3. A new configuration parameter, diskCacheThreshold, is added. If disk 
cache utilization exceeds the threshold, eviction is accelerated: files are 
evicted to the Ledger data disk in order of write time, oldest first, until 
utilization falls back below the threshold.
   4. A new thread, ColdStorageArchiveThread, is introduced to periodically 
evict data from the disk cache to the Ledger data disk.
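
   The eviction rules in points 1 to 3 can be illustrated with a minimal 
sketch. Everything in it is an assumption made for illustration and not part 
of any existing BookKeeper API: the class and method names are hypothetical, 
diskCacheRetentionTime is taken to be in milliseconds, and 
diskCacheThreshold is treated as a fraction of the SSD cache capacity. It 
only selects which log files one pass of a thread such as 
ColdStorageArchiveThread might move; the actual copy to the HDD is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only; names and units are assumptions, not BookKeeper code. */
final class DiskCacheEvictionSketch {

    /** Hypothetical descriptor for one log file held in the SSD cache. */
    static final class CachedLogFile {
        final long logId;
        final long lastWriteMillis;
        final long sizeBytes;

        CachedLogFile(long logId, long lastWriteMillis, long sizeBytes) {
            this.logId = logId;
            this.lastWriteMillis = lastWriteMillis;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Returns the files to evict in one pass, oldest first.
     * A file is evicted if it is older than the retention time, or if the
     * cache utilization is still above the threshold when it is reached.
     */
    static List<CachedLogFile> selectForEviction(List<CachedLogFile> files,
                                                 long nowMillis,
                                                 long retentionMillis,
                                                 double threshold,
                                                 long capacityBytes) {
        // Eviction is per log file, ordered by write time.
        List<CachedLogFile> oldestFirst = new ArrayList<>(files);
        oldestFirst.sort(Comparator.comparingLong((CachedLogFile f) -> f.lastWriteMillis));

        long usedBytes = 0;
        for (CachedLogFile f : oldestFirst) {
            usedBytes += f.sizeBytes;
        }

        List<CachedLogFile> toEvict = new ArrayList<>();
        for (CachedLogFile f : oldestFirst) {
            boolean expired = nowMillis - f.lastWriteMillis > retentionMillis;
            boolean overThreshold = (double) usedBytes / capacityBytes > threshold;
            if (!expired && !overThreshold) {
                // Files are sorted by age and utilization only shrinks,
                // so no later file can qualify either.
                break;
            }
            toEvict.add(f);
            usedBytes -= f.sizeBytes;
        }
        return toEvict;
    }
}
```

   Sorting once by write time lets a single loop cover both the retention 
rule and the threshold rule, since neither condition can become true again 
after the first file that fails both.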

