On 2019-09-25T14:32:31, Pallissard, Matthew wrote:
> On 2019-09-25T15:05:44, Chris Murphy wrote:
> > On Wed, Sep 25, 2019 at 1:34 PM Pallissard, Matthew <m...@pallissard.net> 
> > wrote:
> > > On 2019-09-25T13:08:34, Chris Murphy wrote:
> > > > On Wed, Sep 25, 2019 at 8:50 AM Pallissard, Matthew 
> > > > <m...@pallissard.net> wrote:
> > > > >
> > > > > Version:
> > > > > Kernel: 5.2.2-arch1-1-ARCH #1 SMP PREEMPT Sun Jul 21 19:18:34 UTC 
> > > > > 2019 x86_64 GNU/Linux
> > > >
> > > > You need to upgrade to Arch kernel 5.2.14 or newer (they backported the
> > > > fix first appearing in stable 5.2.15), or you need to downgrade to the
> > > > 5.1 series.
> > > > https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdman...@kernel.org/T/#u
> > > >
> > > > That's a nasty bug. I don't offhand see evidence that you've hit this
> > > > bug, but I'm not certain. So the first thing should be to use a
> > > > different kernel.
> > >
> > > Interesting, I'll go ahead with a kernel upgrade as that's easy enough.
> > > However, that looks like it's related to a stack trace from a hung
> > > process, which is not the original problem I had.
> > > Based on the output in my previous email, I've been working under the
> > > assumption that there is a problem on disk.  Is that not correct?
> >
> > That bug does cause filesystem corruption that is not repairable.
> > Whether you have that problem or a different problem, I'm not sure.
> > But it's best to avoid combining problems.
> >
> > The file system mounts rw now? Or still only mounts ro?
> 
> It mounts RW, but I have yet to attempt an actual write.
> 
> 
> > I think most of the errors reported by btrfs check, if they still exist
> > after doing a scrub, should be repaired by 'btrfs check --repair', but I
> > don't advise that until later. I'm not a developer; maybe Qu can offer some
> > advice on those errors.
> 
> 
> > > > Next, anytime there is a crash or power failure with Btrfs raid56, you
> > > > need to do a complete scrub of the volume. Obviously it will take time,
> > > > but that's what needs to be done first.
> > >
> > > I'm using raid 10, not 5 or 6.
> >
> > Same advice, but it's not as important for raid10 because it doesn't have
> > the write hole problem.
> 
> 
> > > > OK actually, before the scrub you need to confirm that each drive's SCT 
> > > > ERC time is *less* than the kernel's SCSI command timer. e.g.
> > >
> > > I gather that I should probably do this before any scrub, be it raid 5,
> > > 6, or 10.  But is a scrub the operation I should attempt on this raid 10
> > > array to repair the specific errors mentioned in my previous email?
> >
> > Definitely deal with the timing issue first. If by chance there are bad
> > sectors on any of the drives, they must be properly reported by the drive
> > with a discrete read error in order for Btrfs to do a proper fixup. If the
> > times are mismatched, then Linux can get tired of waiting and do a link
> > reset on the drive before the read error happens. And then the whole
> > command queue is lost and the problem isn't fixed.
> 
> Good to know; that seems like a critical piece of information.  A few
> searches turned up this page: https://wiki.debian.org/Btrfs#FAQ.
> 
> Should this be noted on the 'gotchas' or 'getting started' pages as well?
> I'd be happy to make edits should the powers that be allow it.
> 
> 
> > There are myriad errors, and the advice I'm giving to scrub is a safe first
> > step to make sure the storage stack is sane - or at least we'll know where
> > the simpler problems are. Then we can move to the less simple ones that
> > carry higher risk.  It also changes the volume the least. Everything else -
> > balance, chunk recover, and btrfs check --repair - makes substantial
> > changes to the file system and has a higher risk of making things worse.
> 
> This sounds sensible.
> 
> 
> > In theory, if the storage stack does exactly what Btrfs asks, then at worst
> > you should lose some data, but the file system itself should stay
> > consistent - and that includes power failures. The fact that problems are
> > reported suggests a bug somewhere: it could be Btrfs, it could be device
> > mapper, it could be controller or drive firmware.
> 
> I'll go ahead with the kernel upgrade and make sure the timing issues are
> squared away.  Then I'll kick off a scrub.
> 
> I'll report back when the scrub is complete or something interesting
> happens, whichever comes first.

As a follow-up:
1. I took care of the timing issues.
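  - roughly like this, for anyone searching later (device name is
    illustrative, and not every drive supports SCT ERC):

      # drive-side error recovery timeout, reported in tenths of a second
      smartctl -l scterc /dev/sda
      # set read/write ERC to 7 seconds, well under the kernel's default 30
      smartctl -l scterc,70,70 /dev/sda
      # kernel-side SCSI command timer, in seconds
      cat /sys/block/sda/device/timeout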
2. I ran a scrub.
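  - nothing fancy, something like this (mount point hypothetical):

      btrfs scrub start /mnt/array
      btrfs scrub status /mnt/array    # check progress and error counts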
3. I ran a balance; it kept failing with about 20% left
  - stack traces in dmesg showed spinlock-related code
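  - the balance itself was plain, roughly (same hypothetical mount point):

      btrfs balance start /mnt/array
      btrfs balance status /mnt/array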

4. I got I/O errors on one file during my final backup
  - post-backup hashsums of everything else checked out
  - the errors during the copy were csum mismatches, should anyone care
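  - if anyone wants to find theirs, the affected files show up in dmesg;
    something like:

      dmesg | grep -i 'csum failed'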

5. I ran a bunch of potentially disruptive btrfs check commands in
alphabetical order because "why not at this point?"
  - they had zero effect as far as I can tell; all the same files were
    readable, and the btrfs check errors looked identical (admittedly I
    didn't put them side by side)
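  - if you're comparing at home, the safe baseline is the read-only form,
    run against an unmounted filesystem (device name illustrative):

      btrfs check --readonly /dev/sda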

6. I re-provisioned the array and restored from backups.
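  - which boiled down to a fresh raid10 filesystem plus a restore, roughly
    (device names and paths illustrative):

      mkfs.btrfs -f -m raid10 -d raid10 /dev/sd[abcd]
      mount /dev/sda /mnt/array
      rsync -aHAX /backup/ /mnt/array/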

As I think about it more, the original power outage may not have been the
cause.  I only ran a check after the power outage, so the array could already
have had an issue from an earlier bug; I was on a 5.2.x kernel for several
weeks under high load.  Anyway, there are enough unknowns that a root-cause
analysis isn't worth my time.

Marking this as unresolved for folks in the future who may be looking for
answers.

Matt Pallissard
