On Thu, Jan 18, 2024 at 04:40:47PM +0100, Zdenek Kabelac wrote:
> Cache can contain blocks that are still being 'synchronized' to the cache
> origin. So while the 'writing' process doesn't get ACK for writes - the cache
> may have valid blocks that are 'dirty' in terms of being synchronized to
> origin device.
> 
> And while this is usually not a problem when system works properly,
> it's getting into weird 'state machine' model when i.e. origin device has
> errors - which might be even 'transient' with all the variety of storage
> types and raid arrays with integrity and self-healing and so on...
> 
> So while it's usually not a problem for a laptop with 2 disks, the world is
> more complex...

Ehm, but wouldn't anything other than discarding that block from the cache and
using whatever is on the backing storage introduce unpredictable errors?
As you already said, the write was never ACKed, so the software that tried to
write it never expected it to have been written.
Why exactly are we allowed to use the data from the write-through cache to
modify the data on the backing storage in such cases?
I.e., why can we safely consider it valid data?
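
To make the question concrete, here is a minimal sketch of the semantics I
would expect from a write-through cache. It is only a toy model under my own
assumptions, not how dm-cache or LVM actually implement it; the names Origin,
WriteThroughCache and fail_writes are made up for illustration.

```python
class OriginError(Exception):
    """Raised by the hypothetical origin device when a write fails."""


class Origin:
    """Toy backing device: maps block number -> data."""

    def __init__(self):
        self.blocks = {}
        self.fail_writes = False  # set to True to simulate a (transient) error

    def write(self, block, data):
        if self.fail_writes:
            raise OriginError(f"write to block {block} failed")
        self.blocks[block] = data

    def read(self, block):
        return self.blocks.get(block)


class WriteThroughCache:
    """Toy write-through cache: a write is ACKed only after the origin has it."""

    def __init__(self, origin):
        self.origin = origin
        self.cached = {}

    def write(self, block, data):
        try:
            self.origin.write(block, data)
        except OriginError:
            # The write was never ACKed, so drop any cached copy of the block
            # and let later reads fall back to whatever the origin holds.
            self.cached.pop(block, None)
            raise
        # Only now is the block cached; it is never newer than the origin.
        self.cached[block] = data

    def read(self, block):
        if block in self.cached:
            return self.cached[block]
        data = self.origin.read(block)
        if data is not None:
            self.cached[block] = data
        return data
```

In this model the cache never holds a block that the origin does not also
hold, so discarding the cache (or a single block of it) can only cost read
performance.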

> metadata - so if there is again some 'reboot' and PV with cache appears back
> - it will not interfere with the system (aka providing some historical
> cached blocks,  so just like mirrored leg needs some care...)

Same here: why do we have to consider these blocks at all, and why can't we
discard them? We know when a drive re-appears, so we could simply not use it
without validation or, if the volatile flag I suggested were used, just wipe
it and start over...

After all, I don't know anyone who designs their storage systems with the
assumption that the write-through cache has to be redundant.
Even more, I know enough people in data center environments who reuse their
"failing but still kinda good" SSDs and NVMe drives for write-through caches
on the assumption that their failure at most impacts read performance but not
data safety.

Is there some common misconception at play? Or what exactly am I missing here?

Sincerely,
Klaus Frank
