Re: Introduce cold ledger storage layer.

2023-05-25 Thread Enrico Olivelli
Gavin,
This idea looks promising. As Dave mentioned, it could pave the way for
adding support for moving cold data to cheaper cloud storage.

Enrico

On Fri, May 26, 2023 at 06:17 Dave Fisher
 wrote:
>
> > On May 25, 2023, at 7:37 PM, Gavin gao  wrote:
> >
> > In a typical bookkeeper deployment, SSD disks are used to store Journal log
> > data, while HDD disks are used to store Ledger data.
>
> What is used is a deployment choice. I know that when OMB is run, locally
> attached SSDs are used for both.
>
> I do agree that the choice of SSD and HDD disks can impact BookKeeper
> performance, and increasing IOPS and throughput will improve it
> significantly. For example, in AWS a gp3 attached disk at its default
> settings will have large latencies, and even with provisioned IOPS and
> throughput it may still be roughly 4x slower than a locally attached SSD.
> > Data writes are
> > initially stored in memory and then asynchronously flushed to the HDD disk
> > in the background. However, due to memory limitations, the amount of data
> > that can be cached is restricted. Consequently, requests for historical
> > data ultimately rely on the HDD disk, which becomes a bottleneck for the
> > entire Bookkeeper cluster. Moreover, during data recovery processes
> > following node failures, a substantial amount of historical data needs to
> > be read from the HDD disk, leading to the disk's I/O utilization reaching
> > maximum capacity and resulting in significant read request delays or
> > failures.
> >
> > To address these challenges, a new architecture is proposed: the
> > introduction of a disk cache between the memory cache and the HDD disk,
> > utilizing an SSD disk as an intermediary medium to significantly extend
> > data caching duration. The data flow is as follows: journal -> write cache
> > -> SSD cache -> HDD disk. The SSD disk cache functions as a regular
> > LedgerStorage layer and is compatible with all existing LedgerStorage
> > implementations.
>
> A different way to look at this is to consider the cold layer as optional,
> living on HDD or even in S3. In S3 you could have advantages with recovery
> into different AZs. You could also significantly improve replay.
> > The following outlines the process:
> >
> >   1. Data eviction from the disk cache to the Ledger data disk occurs on a
> >   per-log file basis.
> >   2. A new configuration parameter, diskCacheRetentionTime, is added to
> >   set the duration for which hot data is retained. Files with write
> >   timestamps older than the retention time will be evicted to the Ledger
> >   data disk.
>
> If you can adjust this to use a recently-used approach, then very long
> ledgers can be read easily by predictively moving ledgers from cold to hot.
>
> >   3. A new configuration parameter, diskCacheThreshold, is added. If the
> >   disk cache utilization exceeds the threshold, the eviction process is
> >   accelerated. Data is evicted to the Ledger data disk based on the order of
> >   file writes until the disk space recovers above the threshold.
> >   4. A new thread, ColdStorageArchiveThread, is introduced to periodically
> >   evict data from the disk cache to the Ledger data disk.
>
> Another thread is also needed - ColdStorageRetrievalThread.
>
> Just some thoughts.
>
> Best,
> Dave


Re: Introduce cold ledger storage layer.

2023-05-25 Thread Dave Fisher

> On May 25, 2023, at 7:37 PM, Gavin gao  wrote:
> 
> In a typical bookkeeper deployment, SSD disks are used to store Journal log
> data, while HDD disks are used to store Ledger data.

What is used is a deployment choice. I know that when OMB is run, locally
attached SSDs are used for both.

I do agree that the choice of SSD and HDD disks can impact BookKeeper
performance, and increasing IOPS and throughput will improve it
significantly. For example, in AWS a gp3 attached disk at its default
settings will have large latencies, and even with provisioned IOPS and
throughput it may still be roughly 4x slower than a locally attached SSD.
> Data writes are
> initially stored in memory and then asynchronously flushed to the HDD disk
> in the background. However, due to memory limitations, the amount of data
> that can be cached is restricted. Consequently, requests for historical
> data ultimately rely on the HDD disk, which becomes a bottleneck for the
> entire Bookkeeper cluster. Moreover, during data recovery processes
> following node failures, a substantial amount of historical data needs to
> be read from the HDD disk, leading to the disk's I/O utilization reaching
> maximum capacity and resulting in significant read request delays or
> failures.
> 
> To address these challenges, a new architecture is proposed: the
> introduction of a disk cache between the memory cache and the HDD disk,
> utilizing an SSD disk as an intermediary medium to significantly extend
> data caching duration. The data flow is as follows: journal -> write cache
> -> SSD cache -> HDD disk. The SSD disk cache functions as a regular
> LedgerStorage layer and is compatible with all existing LedgerStorage
> implementations.

A different way to look at this is to consider the cold layer as optional,
living on HDD or even in S3. In S3 you could have advantages with recovery
into different AZs. You could also significantly improve replay.
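
For illustration, here is a minimal sketch of what an optional S3-backed
cold tier could look like, using the AWS SDK for Java v2 and moving sealed
entry log files whole. The class, bucket layout, and key names are all
hypothetical, not existing BookKeeper APIs:

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.nio.file.Path;

// Hypothetical S3-backed cold tier that offloads whole entry log files.
public class S3ColdStorage {
    private final S3Client s3 = S3Client.create();
    private final String bucket;

    public S3ColdStorage(String bucket) {
        this.bucket = bucket;
    }

    // Upload a sealed entry log file to the cold tier.
    public void archive(long logId, Path entryLogFile) {
        s3.putObject(PutObjectRequest.builder()
                        .bucket(bucket)
                        .key("entrylogs/" + logId + ".log")
                        .build(),
                RequestBody.fromFile(entryLogFile));
    }

    // Download an entry log file back to local disk, e.g. for recovering
    // a bookie in a different AZ.
    public void retrieve(long logId, Path destination) {
        s3.getObject(GetObjectRequest.builder()
                        .bucket(bucket)
                        .key("entrylogs/" + logId + ".log")
                        .build(),
                destination);
    }
}
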
> The following outlines the process:
> 
>   1. Data eviction from the disk cache to the Ledger data disk occurs on a
>   per-log file basis.
>   2. A new configuration parameter, diskCacheRetentionTime, is added to
>   set the duration for which hot data is retained. Files with write
>   timestamps older than the retention time will be evicted to the Ledger data
>   disk.

If you can adjust this to use a recently-used approach, then very long
ledgers can be read easily by predictively moving ledgers from cold to hot.
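
As a rough sketch of such a recently-used policy (hypothetical, not an
existing BookKeeper structure), an access-ordered LinkedHashMap can track
which entry log files were read most recently, so eviction targets the
least recently used files rather than the oldest-written ones:

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU tracker for entry log files in the SSD cache.
// accessOrder=true keeps iteration order by recency of use, so the
// eldest entry is the least recently read log file.
public class EntryLogLruTracker extends LinkedHashMap<Long, Long> {
    private final int maxTrackedLogs;

    public EntryLogLruTracker(int maxTrackedLogs) {
        super(16, 0.75f, true); // accessOrder = true
        this.maxTrackedLogs = maxTrackedLogs;
    }

    // Record a read of the given entry log file.
    public void touch(long logId) {
        put(logId, System.currentTimeMillis());
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, Long> eldest) {
        // When the tracker overflows, the eldest (least recently used)
        // log file becomes a candidate for eviction to cold storage.
        return size() > maxTrackedLogs;
    }
}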

>   3. A new configuration parameter, diskCacheThreshold, is added. If the
>   disk cache utilization exceeds the threshold, the eviction process is
>   accelerated. Data is evicted to the Ledger data disk based on the order of
>   file writes until the disk space recovers above the threshold.
>   4. A new thread, ColdStorageArchiveThread, is introduced to periodically
>   evict data from the disk cache to the Ledger data disk.

Another thread is also needed - ColdStorageRetrievalThread.
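
A minimal sketch of what that thread might do, assuming read misses enqueue
cold log files to be copied back into the SSD cache (all names hypothetical):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical thread that moves entry log files from cold back to hot.
public class ColdStorageRetrievalThread extends Thread {
    private final BlockingQueue<Path> retrievalQueue = new LinkedBlockingQueue<>();
    private final Path ssdCacheDir;

    public ColdStorageRetrievalThread(Path ssdCacheDir) {
        super("ColdStorageRetrievalThread");
        this.ssdCacheDir = ssdCacheDir;
        setDaemon(true);
    }

    // Called on a read miss to bring a cold log file back into the SSD cache.
    public void requestRetrieval(Path coldLogFile) {
        retrievalQueue.offer(coldLogFile);
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            try {
                Path coldFile = retrievalQueue.take();
                Path hotFile = ssdCacheDir.resolve(coldFile.getFileName());
                Files.copy(coldFile, hotFile, StandardCopyOption.REPLACE_EXISTING);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (Exception e) {
                // A real implementation would log and retry.
            }
        }
    }
}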

Just some thoughts.

Best,
Dave

Re: Introduce cold ledger storage layer.

2023-05-25 Thread Wenbing Shen
Hi Gavin gao,

A very interesting new feature. Our team once discussed implementing an
SSD cache layer in BookKeeper, because our other internal message queues,
such as Kafka, use this architecture. We hoped that BookKeeper storage
could run on the same machine type, but because of other internal work,
this effort has never been formally scheduled.

With an SSD cache layer added, I believe the read and write timeouts
caused by hot traffic on the local HDD disk will be effectively reduced.

I'm really looking forward to this feature. :)

Thanks,
wenbingshen

Gavin gao  wrote on Fri, May 26, 2023 at 10:37:

> In a typical bookkeeper deployment, SSD disks are used to store Journal log
> data, while HDD disks are used to store Ledger data. Data writes are
> initially stored in memory and then asynchronously flushed to the HDD disk
> in the background. However, due to memory limitations, the amount of data
> that can be cached is restricted. Consequently, requests for historical
> data ultimately rely on the HDD disk, which becomes a bottleneck for the
> entire Bookkeeper cluster. Moreover, during data recovery processes
> following node failures, a substantial amount of historical data needs to
> be read from the HDD disk, leading to the disk's I/O utilization reaching
> maximum capacity and resulting in significant read request delays or
> failures.
>
> To address these challenges, a new architecture is proposed: the
> introduction of a disk cache between the memory cache and the HDD disk,
> utilizing an SSD disk as an intermediary medium to significantly extend
> data caching duration. The data flow is as follows: journal -> write cache
> -> SSD cache -> HDD disk. The SSD disk cache functions as a regular
> LedgerStorage layer and is compatible with all existing LedgerStorage
> implementations. The following outlines the process:
>
>1. Data eviction from the disk cache to the Ledger data disk occurs on a
>per-log file basis.
>2. A new configuration parameter, diskCacheRetentionTime, is added to
>set the duration for which hot data is retained. Files with write
>    timestamps older than the retention time will be evicted to the Ledger
>    data disk.
>3. A new configuration parameter, diskCacheThreshold, is added. If the
>disk cache utilization exceeds the threshold, the eviction process is
>    accelerated. Data is evicted to the Ledger data disk based on the order
>    of file writes until the disk space recovers above the threshold.
>4. A new thread, ColdStorageArchiveThread, is introduced to periodically
>evict data from the disk cache to the Ledger data disk.
>


Introduce cold ledger storage layer.

2023-05-25 Thread Gavin gao
In a typical bookkeeper deployment, SSD disks are used to store Journal log
data, while HDD disks are used to store Ledger data. Data writes are
initially stored in memory and then asynchronously flushed to the HDD disk
in the background. However, due to memory limitations, the amount of data
that can be cached is restricted. Consequently, requests for historical
data ultimately rely on the HDD disk, which becomes a bottleneck for the
entire Bookkeeper cluster. Moreover, during data recovery processes
following node failures, a substantial amount of historical data needs to
be read from the HDD disk, leading to the disk's I/O utilization reaching
maximum capacity and resulting in significant read request delays or
failures.

To address these challenges, a new architecture is proposed: the
introduction of a disk cache between the memory cache and the HDD disk,
utilizing an SSD disk as an intermediary medium to significantly extend
data caching duration. The data flow is as follows: journal -> write cache
-> SSD cache -> HDD disk. The SSD disk cache functions as a regular
LedgerStorage layer and is compatible with all existing LedgerStorage
implementations. The following outlines the process:

   1. Data eviction from the disk cache to the Ledger data disk occurs on a
   per-log file basis.
   2. A new configuration parameter, diskCacheRetentionTime, is added to
   set the duration for which hot data is retained. Files with write
   timestamps older than the retention time will be evicted to the Ledger data
   disk.
   3. A new configuration parameter, diskCacheThreshold, is added. If the
   disk cache utilization exceeds the threshold, the eviction process is
   accelerated. Data is evicted to the Ledger data disk based on the order of
   file writes until the disk space recovers above the threshold.
   4. A new thread, ColdStorageArchiveThread, is introduced to periodically
   evict data from the disk cache to the Ledger data disk.
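
For illustration, a minimal sketch of how the ColdStorageArchiveThread
described above might apply the two parameters, evicting whole entry log
files in write order. Everything beyond the proposed names
diskCacheRetentionTime, diskCacheThreshold, and ColdStorageArchiveThread is
a hypothetical assumption, not a committed design:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical periodic eviction from the SSD disk cache to the Ledger data disk.
public class ColdStorageArchiveThread extends Thread {
    private final File ssdCacheDir;
    private final File ledgerDiskDir;
    private final long retentionTimeMs; // diskCacheRetentionTime
    private final double threshold;     // diskCacheThreshold, e.g. 0.85
    private final long intervalMs;      // how often an eviction pass runs

    public ColdStorageArchiveThread(File ssdCacheDir, File ledgerDiskDir,
                                    long retentionTimeMs, double threshold,
                                    long intervalMs) {
        super("ColdStorageArchiveThread");
        this.ssdCacheDir = ssdCacheDir;
        this.ledgerDiskDir = ledgerDiskDir;
        this.retentionTimeMs = retentionTimeMs;
        this.threshold = threshold;
        this.intervalMs = intervalMs;
        setDaemon(true);
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            try {
                File[] logs = ssdCacheDir.listFiles();
                if (logs != null) {
                    // Evict whole log files, oldest writes first (points 1 and 3).
                    Arrays.sort(logs, Comparator.comparingLong(File::lastModified));
                    for (File log : logs) {
                        boolean expired = System.currentTimeMillis()
                                - log.lastModified() > retentionTimeMs; // point 2
                        if (!expired && cacheUsage() <= threshold) {
                            // Remaining files are newer and the cache has room.
                            break;
                        }
                        Files.move(log.toPath(),
                                new File(ledgerDiskDir, log.getName()).toPath(),
                                StandardCopyOption.REPLACE_EXISTING);
                    }
                }
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (IOException e) {
                // A real implementation would log and retry.
            }
        }
    }

    // Fraction of the SSD cache disk currently in use.
    private double cacheUsage() {
        return 1.0 - (double) ssdCacheDir.getUsableSpace() / ssdCacheDir.getTotalSpace();
    }
}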