Gavin,
This idea looks promising. As Dave mentioned, it could pave the way for
adding support for moving cold data to cheaper cloud storage.

Enrico

On Fri, May 26, 2023 at 06:17 Dave Fisher
<wave4d...@comcast.net> wrote:
>
>
>
> Sent from my iPhone
>
> > On May 25, 2023, at 7:37 PM, Gavin gao <gaozhang...@gmail.com> wrote:
> >
> > In a typical bookkeeper deployment, SSD disks are used to store Journal log
> > data, while HDD disks are used to store Ledger data.
>
> What is used is a deployment choice. I know that when OMB is run, locally 
> attached SSDs are used for both.
>
> I do agree that the choice of SSD and HDD disks can impact Bookkeeper 
> performance. Increasing IOPS and throughput will improve performance 
> significantly. For example, in AWS a default gp3 attached disk will have 
> large latencies, and even with increased provisioned performance it may 
> still be roughly 4x slower than a locally attached SSD.
> > Data writes are
> > initially stored in memory and then asynchronously flushed to the HDD disk
> > in the background. However, due to memory limitations, the amount of data
> > that can be cached is restricted. Consequently, requests for historical
> > data ultimately rely on the HDD disk, which becomes a bottleneck for the
> > entire Bookkeeper cluster. Moreover, during data recovery processes
> > following node failures, a substantial amount of historical data needs to
> > be read from the HDD disk, leading to the disk's I/O utilization reaching
> > maximum capacity and resulting in significant read request delays or
> > failures.
> >
> > To address these challenges, a new architecture is proposed: the
> > introduction of a disk cache between the memory cache and the HDD disk,
> > utilizing an SSD disk as an intermediary medium to significantly extend
> > data caching duration. The data flow is as follows: journal -> write cache
> > -> SSD cache -> HDD disk. The SSD disk cache functions as a regular
> > LedgerStorage layer and is compatible with all existing LedgerStorage
> > implementations.
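
A minimal sketch of the read path implied by that data flow, where a miss in each tier falls through to the next colder one. The class and method names here are illustrative only, not existing BookKeeper APIs:

```java
// Hypothetical sketch of the proposed tiered read path:
//   write cache (memory) -> SSD disk cache -> HDD ledger disk.
// None of these classes exist in BookKeeper; the names are illustrative.
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class StorageTier {
    private final Map<Long, byte[]> entries = new HashMap<>();
    private final StorageTier colder; // next tier, or null for the HDD layer

    StorageTier(StorageTier colder) { this.colder = colder; }

    void put(long entryId, byte[] data) { entries.put(entryId, data); }

    // A read miss falls through to the next (colder) tier.
    Optional<byte[]> read(long entryId) {
        byte[] d = entries.get(entryId);
        if (d != null) return Optional.of(d);
        return colder == null ? Optional.empty() : colder.read(entryId);
    }
}

class TieredReadSketch {
    public static void main(String[] args) {
        StorageTier hdd = new StorageTier(null);
        StorageTier ssdCache = new StorageTier(hdd);
        StorageTier writeCache = new StorageTier(ssdCache);

        hdd.put(1L, "cold".getBytes());       // historical data evicted to HDD
        ssdCache.put(2L, "warm".getBytes());  // data flushed from memory
        writeCache.put(3L, "hot".getBytes()); // just-written data

        // Every read enters at the hot tier and falls through on a miss,
        // so historical reads still resolve without the caller knowing
        // which tier holds the entry.
        System.out.println(new String(writeCache.read(1L).get()));
    }
}
```

Because the SSD tier sits behind the same read interface, this is consistent with the claim that the cache behaves as a regular LedgerStorage layer.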
>
> A different way to look at this is to consider the cold layer as being 
> optional and within HDD or even S3. In S3 you could have advantages with 
> recovery into different AZs. You could also significantly improve replay.
> > The following outlines the process:
> >
> >   1. Data eviction from the disk cache to the Ledger data disk occurs on a
> >   per-log file basis.
> >   2. A new configuration parameter, diskCacheRetentionTime, is added to
> >   set the duration for which hot data is retained. Files with write
> >   timestamps older than the retention time will be evicted to the Ledger 
> > data
> >   disk.
>
> If you can adjust this to use a recency-of-use approach, then very long 
> ledgers can be easily read by predictively moving ledgers from cold to hot.
>
> >   3. A new configuration parameter, diskCacheThreshold, is added. If the
> >   disk cache utilization exceeds the threshold, the eviction process is
> >   accelerated. Data is evicted to the Ledger data disk based on the order of
> >   file writes until the disk space recovers above the threshold.
> >   4. A new thread, ColdStorageArchiveThread, is introduced to periodically
> >   evict data from the disk cache to the Ledger data disk.
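
Steps 2 and 3 above amount to a single per-file eviction predicate that the archive thread could apply periodically. A sketch, assuming the proposed `diskCacheRetentionTime` and `diskCacheThreshold` parameter names; the class itself is hypothetical:

```java
// Hypothetical sketch of the per-file eviction decision from steps 2-3.
// diskCacheRetentionTime and diskCacheThreshold are the proposed
// configuration names; this class is illustrative only.
class ColdCacheEvictionSketch {
    static final long diskCacheRetentionTimeMillis = 3_600_000L; // e.g. 1 hour
    static final double diskCacheThreshold = 0.85;               // 85% usage

    // Returns true if a cache file should be moved to the ledger data disk.
    static boolean shouldEvict(long fileWriteTimeMillis, long nowMillis,
                               double cacheUtilization) {
        // Step 2: evict files whose write timestamp is older than the
        // configured retention time.
        if (nowMillis - fileWriteTimeMillis > diskCacheRetentionTimeMillis) {
            return true;
        }
        // Step 3: above the utilization threshold, files are evicted in
        // write order until usage drops back below the threshold.
        return cacheUtilization > diskCacheThreshold;
    }
}
```

The periodic ColdStorageArchiveThread from step 4 would iterate cache files in write order and apply this check to each.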
>
> Another thread is also needed - ColdStorageRetrievalThread.
>
> Just some thoughts.
>
> Best,
> Dave
