On 2016-06-24 13:52, Chris Murphy wrote:
On Fri, Jun 24, 2016 at 11:21 AM, Andrei Borzenkov <arvidj...@gmail.com> wrote:
24.06.2016 20:06, Chris Murphy wrote:
On Fri, Jun 24, 2016 at 3:52 AM, Andrei Borzenkov <arvidj...@gmail.com> wrote:
On Fri, Jun 24, 2016 at 11:50 AM, Hugo Mills <h...@carfax.org.uk> wrote:
[...] (meta)data, and RAID56 parity is not data.

   Checksums are not parity, correct. However, every data block
(including, I think, the parity) is checksummed and put into the csum
tree. This allows the FS to determine where damage has occurred,
rather than simply detecting that it has occurred (which would be the
case if the parity doesn't match the data, or if the two copies of a
RAID-1 array don't match).
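
To make the distinction concrete, here is a toy sketch in plain C (nothing to do with the real btrfs code; the XOR "parity", the additive "checksum" and all names are made up for illustration): parity alone can only tell you that a stripe is inconsistent somewhere, while a per-block checksum points at the block that is actually damaged.

#include <stdio.h>
#include <stdint.h>

#define STRIPS    3   /* data strips per stripe (parity would be the 4th) */
#define STRIP_LEN 4   /* bytes per strip, tiny on purpose */

/* toy per-strip checksum, standing in for a csum tree entry */
static uint8_t toy_csum(const uint8_t *b)
{
    uint8_t c = 0;
    for (int i = 0; i < STRIP_LEN; i++)
        c += b[i];
    return c;
}

int main(void)
{
    uint8_t data[STRIPS][STRIP_LEN] = {
        {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}
    };
    uint8_t parity[STRIP_LEN] = {0};
    uint8_t csum[STRIPS];

    /* compute parity and per-strip checksums while everything is good */
    for (int s = 0; s < STRIPS; s++) {
        csum[s] = toy_csum(data[s]);
        for (int i = 0; i < STRIP_LEN; i++)
            parity[i] ^= data[s][i];
    }

    data[1][2] ^= 0xff;   /* one byte of strip 1 silently rots on disk */

    /* parity check: only detects that the stripe is inconsistent */
    uint8_t recomputed[STRIP_LEN] = {0};
    for (int s = 0; s < STRIPS; s++)
        for (int i = 0; i < STRIP_LEN; i++)
            recomputed[i] ^= data[s][i];
    for (int i = 0; i < STRIP_LEN; i++)
        if (recomputed[i] != parity[i])
            printf("parity mismatch at byte %d -- but in which strip?\n", i);

    /* per-strip checksums: locate exactly which strip is damaged */
    for (int s = 0; s < STRIPS; s++)
        if (toy_csum(data[s]) != csum[s])
            printf("checksum mismatch: strip %d is the damaged one\n", s);

    return 0;
}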


Yes, that is what I wrote below. But that means that RAID5 with one
degraded disk won't be able to reconstruct data on the degraded disk,
because the reconstructed extent content won't match the checksum. Which
kinda makes RAID5 pointless.

I don't understand this. Whether the failed disk means a stripe is
missing a data strip or a parity strip, if any other strip is damaged, of
course the reconstruction isn't going to match the checksum. This does not
make raid5 pointless.


Yes, you are right. We have a double failure here. Still, in the current
situation we apparently may end up with btrfs reconstructing the missing
block using wrong information. As was mentioned elsewhere, btrfs does not
verify the checksum of the reconstructed block, meaning data corruption.
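
To put a number on that double failure, here is a one-byte toy example in plain C (not btrfs code, values picked arbitrarily): one strip sits on the dead disk, a surviving strip is silently corrupted, and the XOR rebuild quietly produces the wrong byte.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t d0 = 0x05, d1 = 0x06;   /* original data strips (one byte each) */
    uint8_t parity = d0 ^ d1;       /* RAID5-style XOR parity = 0x03 */

    /* the disk holding d1 dies; meanwhile d0 rots on disk unnoticed */
    uint8_t d0_on_disk = 0x04;

    /* degraded read: rebuild d1 from parity and the (bad) surviving strip */
    uint8_t d1_rebuilt = parity ^ d0_on_disk;   /* 0x03 ^ 0x04 = 0x07, not 0x06 */

    /* without a csum check on the rebuilt block, 0x07 is returned as good data */
    printf("original d1 = 0x%02x, rebuilt d1 = 0x%02x (%s)\n",
           d1, d1_rebuilt, d1 == d1_rebuilt ? "ok" : "corrupt");
    return 0;
}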

Well, that'd be bad, but also good in that it would explain a lot of
problems people have when metadata is also raid5. In this whole thread
the premise is that the metadata is raid1, so the fs doesn't totally face
plant; we just get a bunch of weird data corruptions. The metadata
raid5 cases were sorta "WTF happened?" and not much was really said
about them other than telling the user to scrape off what they can and
start over.

Anyway, while this is not good, I still think it is not super problematic
to at least *do* check EXTENT_CSUM after reconstruction from parity
rather than assuming that the reconstruction happened correctly. The data
needed to pass or fail the rebuild is already on the disk; it just needs
to be checked.
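
A minimal sketch of that "check our work" step, in plain C with entirely made-up names (this is not the actual btrfs read path, and the toy checksum stands in for the real csum tree lookup): rebuild the missing strip from parity, then refuse to hand it back unless it matches the checksum that was stored for the block when it was written.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <errno.h>

#define STRIP_LEN 4

/* toy checksum standing in for the stored csum tree entry */
static uint32_t toy_csum(const uint8_t *buf, size_t len)
{
    uint32_t c = 0;
    for (size_t i = 0; i < len; i++)
        c = c * 31 + buf[i];
    return c;
}

/* rebuild the missing strip (two-disk example) and verify the result */
static int rebuild_and_verify(uint8_t *out, const uint8_t *surviving,
                              const uint8_t *parity, uint32_t stored_csum)
{
    for (size_t i = 0; i < STRIP_LEN; i++)
        out[i] = surviving[i] ^ parity[i];

    /* the extra step being argued for: don't trust the reconstruction */
    if (toy_csum(out, STRIP_LEN) != stored_csum)
        return -EIO;   /* rebuild produced garbage; refuse to return it as data */
    return 0;
}

int main(void)
{
    uint8_t d0[STRIP_LEN] = {1, 2, 3, 4};
    uint8_t d1[STRIP_LEN] = {5, 6, 7, 8};
    uint8_t parity[STRIP_LEN];
    for (size_t i = 0; i < STRIP_LEN; i++)
        parity[i] = d0[i] ^ d1[i];

    uint32_t d1_csum = toy_csum(d1, STRIP_LEN);  /* csum recorded at write time */

    d0[0] ^= 0xff;                 /* surviving strip is silently corrupted */

    uint8_t rebuilt[STRIP_LEN];    /* d1's disk is gone; rebuild it */
    int ret = rebuild_and_verify(rebuilt, d0, parity, d1_csum);
    printf("rebuild %s\n", ret ? "failed the csum check (caught it)" : "passed");
    return 0;
}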

Better would be to get parity csummed and put into the csum tree. But
I don't know how much that helps. Think about always computing and
writing csums for parity, which almost never get used, versus keeping
things the way they are now and just *checking our work* after
reconstruction from parity. If there's some obvious major advantage to
checksumming the parity, I'm all ears, but I'm not thinking of it at the
moment.

Well, the obvious major advantage of checksumming parity that comes to mind
for me is that it would let us scrub the parity itself and verify it. I'd
personally much rather know my parity is bad before I need to use it than
after using it to reconstruct data and getting an error there, and I'd be
willing to bet that most seasoned sysadmins working for companies using big
storage arrays feel the same way. I could see it being practical to have an
option to turn this off for performance reasons or similar, but again, I
have a feeling that most people would rather be able to check whether a
rebuild will eat data before trying to rebuild (depending on the situation,
it will sometimes just make more sense to nuke the array and restore from a
backup instead of spending time waiting for it to rebuild).
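
As a rough sketch of what such a scrub could look like (plain C, not btrfs code, and the stored parity checksum is purely hypothetical since btrfs does not keep one today): walk the stripes and compare each parity strip against the checksum recorded when it was written, flagging rotted parity long before a rebuild ever needs it.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define STRIPES   3
#define STRIP_LEN 4

/* same toy checksum as above, standing in for a csum tree entry */
static uint32_t toy_csum(const uint8_t *buf, size_t len)
{
    uint32_t c = 0;
    for (size_t i = 0; i < len; i++)
        c = c * 31 + buf[i];
    return c;
}

int main(void)
{
    uint8_t parity[STRIPES][STRIP_LEN] = {
        {1, 1, 1, 1}, {2, 2, 2, 2}, {3, 3, 3, 3}
    };
    uint32_t parity_csum[STRIPES];

    /* checksums that would have been recorded when the parity was written */
    for (int s = 0; s < STRIPES; s++)
        parity_csum[s] = toy_csum(parity[s], STRIP_LEN);

    parity[1][3] ^= 0x40;   /* parity rots on disk; nothing reads it yet */

    /* scrub pass: verify every parity strip against its stored checksum */
    for (int s = 0; s < STRIPES; s++)
        if (toy_csum(parity[s], STRIP_LEN) != parity_csum[s])
            printf("stripe %d: parity is bad -- fix it now, before a rebuild needs it\n", s);

    return 0;
}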

