On Sat, 15 Aug 2015 05:10:57 +0000 (UTC), Duncan <1i5t5.dun...@cox.net>
wrote:

> Marc Joliet posted on Fri, 14 Aug 2015 23:37:37 +0200 as excerpted:
> 
> > (One other thing I found interesting was that "btrfs scrub" didn't care
> > about the link count errors.)
> 
> A lot of people are confused about exactly what btrfs scrub does, and 
> expect it to detect and possibly fix stuff it has nothing to do with.  
> It's *not* an fsck.
> 
> Scrub does one very useful, but limited, thing.  It systematically 
> verifies that the computed checksums for all data and metadata covered by 
> checksums match the corresponding recorded checksums.  For dup/raid1/
> raid10 modes, if there's a match failure, it will look up the other copy 
> and see if it matches, replacing the invalid block with a new copy of the 
> other one, assuming it's valid.  For raid56 modes, it attempts to compute 
> the valid copy from parity and, again assuming a match after doing so, 
> does the replace.  If a valid copy cannot be found or computed, either 
> because it's damaged too or because there's no second copy or parity to 
> fall back on (single and raid0 modes), then scrub will detect but cannot 
> correct the error.
> 
> In routine usage, btrfs automatically does the same thing if it happens 
> to come across checksum errors in its normal IO stream, but it has to 
> come across them first.  Scrub's benefit is that it systematically 
> verifies (and corrects errors where it can) checksums on the entire 
> filesystem, not just the parts that happen to appear in the normal IO 
> stream.

I know all that; I just thought it was interesting and wanted to remark on
it.  After thinking about it a bit, of course, it makes perfect sense and
isn't very interesting at all: scrub just verifies that the checksums match,
no matter whether the underlying (meta)data is valid or not.
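
(For anyone else reading along who hasn't run one by hand, a scrub only
takes a couple of commands; /mnt/data below is just an example mount
point:

    # start a background scrub of the filesystem mounted at /mnt/data
    btrfs scrub start /mnt/data

    # check progress, how many errors were found, and how many of
    # them could be corrected from the other copy
    btrfs scrub status /mnt/data

The scrub runs in the background, so status can be checked while it is
still going or after it has finished.)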

> Such checksum errors can be for a few reasons...
> 
> I have one ssd that's gradually failing and returns checksum errors 
> fairly regularly.  Were I using a normal filesystem I'd have had to 
> replace it some time ago.  But with btrfs in raid1 mode and regular 
> scrubs (and backups, should they be needed; sometimes I let them get a 
> bit stale, but I do have them and am prepared to live with the stale 
> restored data if I have to), I've been able to keep using the failing 
> device.  When the scrubs hit errors and btrfs does the rewrite from the 
> good copy, a block relocation on the failing device is triggered as well, 
> with the bad block taken out of service and a new one, from the set of 
> spares all modern devices have, taking its place.  Currently, smartctl -A 
> reports a reallocated-sectors raw value of 904, with a standardized value of 
> 92.  Before the first reallocated sector, the standardized value was 253, 
> perfect.  With the first reallocated sector, it immediately dropped to 
> 100, apparently the rounded percentage of spare sectors left.  It has 
> gradually dropped since then to its current 92, with a threshold value of 
> 36.  So while it's gradually failing, there are still plenty of spare 
> sectors left.  Normally I would have replaced the device even so, but 
> I've never had the opportunity to actually watch a slow failure keep 
> getting worse over time, and now that I do, I'm a bit curious how 
> things will go, so I'm just letting it happen, tho I do have a 
> replacement device already purchased and ready for when the time comes. 

I'm curious how that will pan out.  My experience with HDDs is that at some
point the sector reallocations start picking up at a somewhat constant (maybe
even accelerating) rate.  I wonder how SSDs behave in this regard.
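
(For anyone who wants to keep an eye on this on their own drives, the
relevant attribute is in the SMART table; /dev/sdX is a placeholder for
whatever the device actually is:

    # dump the SMART attribute table (needs root)
    smartctl -A /dev/sdX

    # or just the reallocated-sector lines
    smartctl -A /dev/sdX | grep -i reallocated

The VALUE and THRESH columns are the standardized and threshold numbers
Duncan quotes above (the 92 and 36), and RAW_VALUE is the actual count
of reallocated sectors (the 904).)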

> So real media failure, bitrot, is one reason for bad checksums.  The data 
> read back from the device simply isn't the same data that was stored to 
> it, and the checksum fails as a result.
> 
> Of course bad connector cables or storage chipset firmware or hardware is 
> another "hardware" cause.
> 
> A sudden reboot or power loss while data is being actively written is 
> yet another reason for checksum failure: one copy is either already 
> updated or not yet touched, while the other is actually being written 
> at the time of the crash, so that write isn't completed.  This one is 
> actually why a scrub can appear to do so much more than it does, because 
> where there's a second copy (or parity) of the data available, scrub can 
> use it to recover the partially written copy (which being partially 
> written fails its checksum verification) to either the completed write 
> state, if the other copy was already written, or the pre-write state, if 
> the other copy hadn't been written at all, yet.  In this way the result 
> is often the same one an fsck would normally produce, detecting and 
> fixing the error, but the mechanism is entirely different -- it only 
> detected and fixed the error because the checksum was bad and it had a 
> good copy to replace it with, not because it had any smarts about how 
> the filesystem actually works that would let it tell what the error was 
> and correct it directly.
> 
> 
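
(On the subject of checksum errors turning up during normal use: the
counters btrfs keeps for these can be read per device with the
following; /mnt/data is again just an example mount point:

    # per-device counters: read/write/flush IO errors, plus
    # corruption (i.e. checksum) and generation errors
    btrfs device stats /mnt/data

A non-zero corruption_errs counter there is the same kind of checksum
mismatch a scrub would find, just noticed during normal reads.)
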
> Meanwhile, in your case the problem was an actual btrfs logic bug -- it 
> didn't track the inode ref-counts correctly, and didn't remove the inode 
> when the last reference to it was deleted, because it still thought there 
> were more references.  So the metadata actually written to storage was 
> incorrect due to the logic flaw, but the checksum covering it was indeed 
> the correct checksum for that metadata, as wrong as the metadata actually 
> happened to be.  So scrub couldn't detect the error, because the error 
> was not in the checksum, which was computed correctly over the metadata, 
> but in the logic of the metadata itself as it was written.  Scrub 
> therefore had nothing to do with that error and was in fact totally 
> oblivious to the fact that the valid checksum covered flawed data in the 
> first place.  Only a tool that follows the actual logic (send, in this 
> case, since it has to follow the logic in order to properly send it) 
> could detect the error, and only btrfs check knew enough about the 
> logic to both detect the problem and correct it -- tho even then, it 
> couldn't totally fix it, as part of the metadata was irretrievably 
> missing, so it simply dropped what it could retrieve in lost-and-found.
> 
> 
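
(For anyone who ends up in a similar situation: btrfs check runs
read-only by default and only reports what it finds; it's only with
--repair that it actually modifies the filesystem, which is generally
recommended only with a backup at hand.  Roughly, with /dev/sdXN being
the unmounted device holding the filesystem:

    # read-only check: reports problems, changes nothing
    btrfs check /dev/sdXN

    # actually attempt repairs (filesystem must be unmounted)
    btrfs check --repair /dev/sdXN

This isn't meant as exactly what I ran, just the general shape of it.)
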
> That should make the answer to the question of why scrub couldn't detect 
> and fix the problem clearer -- scrub only detects and possibly fixes a 
> very specific problem: checksum verification failure, and that's not the 
> problem you had.  As far as scrub was concerned, the checksums were fine, 
> and that's all it knows about, so to it, the data and metadata were fine.

Yeah, that's a more verbose way to put it :) .  Thanks anyway.

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup
