Things like wear leveling are done by the FTL (flash translation layer) in the
firmware. Other things it does: erase-before-write, logical-to-physical
mapping, erasing blocks, garbage collection (moving live data around to free
up whole blocks), etc. Erase blocks are typically 128KB or larger, but their
exact size seems to be treated as a secret by the SSD companies! NVMe SSDs, at
least, provide 64 or more request queues, each of which can hold a large
number of outstanding requests. There is enough buffering to flush all data to
the flash in case of power failure, but I'm not sure whether that is exposed
to the user (apart from a flush command).
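
To make that remapping concrete, here is a toy sketch, not any real vendor's
FTL: the 4KB page, 128KB erase block, and naive free-page cursor are all
assumptions for illustration. The point is just that a write never goes in
place; it lands on a fresh page, the logical-to-physical map is updated, and
the old page becomes garbage that can only be reclaimed by erasing its whole
block (which is what garbage collection is for).

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define UNMAPPED 0xffffffffu

    enum {
        PAGESZ     = 4096,              /* assumed flash page size */
        PAGEPERBLK = 32,                /* 32 * 4KB = one 128KB erase block */
        NBLK       = 1024,
        NPAGE      = NBLK * PAGEPERBLK,
    };

    static uint32_t l2p[NPAGE];         /* logical page -> physical page */
    static uint8_t  state[NPAGE];       /* 0 free, 1 live, 2 stale (awaiting erase) */
    static uint32_t nextfree;           /* naive free-page cursor */

    static void
    ftlinit(void)
    {
        memset(state, 0, sizeof state);
        for(uint32_t i = 0; i < NPAGE; i++)
            l2p[i] = UNMAPPED;
        nextfree = 0;
    }

    /* Overwriting a logical page programs a fresh physical page and updates
     * the map; the old page is merely marked stale, because flash can only
     * be erased a whole block at a time.  Garbage collection copies the
     * still-live pages out of mostly-stale blocks and then erases them. */
    static int
    ftlwrite(uint32_t lpage)
    {
        if(nextfree >= NPAGE)
            return -1;                  /* a real FTL would run GC here */
        uint32_t old = l2p[lpage];
        l2p[lpage] = nextfree;
        state[nextfree] = 1;
        if(old != UNMAPPED)
            state[old] = 2;
        return nextfree++;
    }

    int
    main(void)
    {
        ftlinit();
        ftlwrite(7);
        ftlwrite(7);                    /* second write of the same logical page */
        printf("logical 7 -> physical %u\n", (unsigned)l2p[7]);
        return 0;
    }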

Apart from not doing seek-related optimizations and placement, you’d probably
want to minimize unnecessary writes, as SSD lifetime is limited by the amount
you write (it seems to be at least about 600 times the capacity, so a 1TB disk
will have a 600TBW lifetime). That means avoiding metadata updates if you can;
deduplication may also help. I have heard that you can never really erase
data, even with a secure erase, so the FS should have an encryption layer. On
the flip side, an SSD may *lose* data if left unpowered for a long time (and
this period goes down fast with increased temperature). JEDEC specifies 1 year
of retention at 30°C for consumer SSDs and 3 months at 40°C for enterprise
SSDs. So maybe an FS driver should do a background scrub on reconnect if the
device has not been powered on for a long time.
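
As a sketch of what such a scrub pass could look like, assuming a plain POSIX
block device; the device path, chunk size, and invocation are made up for
illustration, and a real driver would rewrite bad chunks from redundancy and
rate-limit itself rather than just count errors.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    enum { CHUNK = 128*1024 };          /* scrub in 128KB chunks (made-up size) */

    /* Read the whole device once, counting chunks the drive can no longer
     * return.  This only shows the shape of the pass. */
    static int
    scrub(const char *dev, long long nchunks)
    {
        int fd = open(dev, O_RDONLY);
        if(fd < 0)
            return -1;
        char *buf = malloc(CHUNK);
        if(buf == NULL){
            close(fd);
            return -1;
        }
        long long bad = 0;
        for(long long i = 0; i < nchunks; i++)
            if(pread(fd, buf, CHUNK, i*(off_t)CHUNK) != CHUNK){
                fprintf(stderr, "scrub: read error at chunk %lld\n", i);
                bad++;
            }
        free(buf);
        close(fd);
        return bad ? -1 : 0;
    }

    int
    main(int argc, char **argv)
    {
        if(argc != 3){
            fprintf(stderr, "usage: scrub device nchunks\n");
            return 1;
        }
        return scrub(argv[1], atoll(argv[2])) < 0;
    }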

> On Apr 8, 2023, at 8:12 AM, Dan Cross <cro...@gmail.com> wrote:
> 
> On Sat, Apr 8, 2023 at 10:37 AM Charles Forsyth
> <charles.fors...@gmail.com> wrote:
>> It was the different characteristics of hard drives, even decent SATA, 
>> compared to SSD and nvme that I had in mind.
> 
> Since details have been requested about this, I wouldn't presume to
> speak for Charles, but some of those differences _may_ include:
> 
> 1. Optimizing for the rotational latency of spinning media, and its effects 
> viz.:
>  a. the layout of storage structures on the disk,
>  b. placement of _data_ on the device.
> 2. Effects with respect to things that aren't considerations for rotating 
> disks
>  a. Wear-leveling may be the canonical example here
> 3. Effects at the controller level.
>  a. Caching, and the effect that has on how operations are ordered to
> ensure consistency
>  b. Queuing for related objects written asynchronously and
> assumptions about latency
> 
> In short, when you change storage technologies, assumptions that were
> made when, say, a filesystem was initially written may be invalidated.
> Consider the BSD FFS for example: UFS was written in an era of VAXen
> and slow, 3600 RPM spinning disks like RA81s attached to relatively
> unintelligent controllers; it made a number of fundamental design
> decisions based on that, trying to optimize placement of data and
> metadata near each other (to minimize head travel--this is the whole
> cylinder group thing), implementation that explicitly accounted for
> platter rotation with respect to scheduling operations for the
> underlying storage device, putting multiple copies of the superblock
> in multiple locations in the disk to maximize the chances of recovery
> in the event of the (all-too-common) head crashes of the era, etc.
> They also did very careful ordering of operations for soft-updates in
> UFS2 to ensure filesystem consistency when updating metadata in the
> face of a system crash (or power failure, or whatever). It turns out
> that many of those optimizations become pessimizations (or at least
> irrelevant) when you're all of a sudden writing to a solid-state
> device, never mind battery-backed DRAM on a much more advanced
> controller.
> 
> - Dan C.
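
To make the ordering point concrete, here is a minimal sketch of the idea, not
FFS's actual soft-updates code: never let a pointer reach stable storage before
the thing it points to. The offsets and file name are invented for
illustration, and fsync() stands in for "wait until the device says it is
durable"; whether that really defeats a caching controller is exactly the kind
of assumption that shifts between device generations.

    #include <fcntl.h>
    #include <unistd.h>

    /* Invented offsets, purely for illustration. */
    enum { INODEOFF = 4096, DATAOFF = 1024*1024 };

    /* Write the data block, wait for it to be durable, and only then write
     * the metadata that refers to it, so that a crash in between leaves at
     * worst an unreferenced block rather than a pointer to garbage. */
    static int
    orderedupdate(int fd, const char *data, int dlen, const char *meta, int mlen)
    {
        if(pwrite(fd, data, dlen, DATAOFF) != dlen)
            return -1;
        if(fsync(fd) < 0)               /* data durable before metadata points at it */
            return -1;
        if(pwrite(fd, meta, mlen, INODEOFF) != mlen)
            return -1;
        return fsync(fd);
    }

    int
    main(void)
    {
        int fd = open("testfile", O_RDWR|O_CREAT, 0644);
        if(fd < 0)
            return 1;
        int r = orderedupdate(fd, "data", 4, "meta", 4);
        close(fd);
        return r < 0;
    }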
