Will capture all of that this evening, and try it with the latest
kernel and tools.  Thanks for the input on what info is relevant; I
will gather it ASAP.
On Fri, 23 Nov 2018 at 07:53, Chris Murphy <li...@colorremedies.com> wrote:
>
> On Thu, Nov 22, 2018 at 11:41 PM Andy Leadbetter
> <andy.leadbet...@theleadbetters.com> wrote:
> >
> > I have a failing 2TB disk that is part of a 4-disk RAID 6 system.  I
> > have added a new 2TB disk to the computer and started a BTRFS replace
> > of the old disk with the new one.  The process starts correctly;
> > however, some hours into the job there is an error and a kernel oops.
> > The relevant log is below.
>
> The relevant log is the entire dmesg, not a snippet. It's decently
> likely there's more than one thing going on here. We also need the
> full output of 'smartctl -x' and 'smartctl -l scterc' for all four
> drives, plus 'cat /sys/block/sda/device/timeout' for each of them, and
> which bcache mode you're using.
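> 
> Something like the following would gather all of that in one pass; it
> is only a rough sketch and assumes the four backing drives are sda
> through sdd, so substitute the real device names:
> 
>   dmesg > dmesg-full.txt
>   for d in sda sdb sdc sdd; do
>       smartctl -x /dev/$d        > smartctl-x-$d.txt
>       smartctl -l scterc /dev/$d > scterc-$d.txt
>       cat /sys/block/$d/device/timeout
>   done
>   # cache mode (writethrough/writeback/...) for each bcache device:
>   cat /sys/block/bcache*/bcache/cache_mode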
>
> The call trace provided is from kernel 4.15, which is long enough ago
> that I think any dev working on raid56 would want to see where it's
> getting tripped up on something a lot newer, and this is why:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/fs/btrfs/raid56.c?id=v4.19.3&id2=v4.15.1
>
> That's a lot of changes in just the raid56 code between 4.15 and 4.19.
> And then in your call trace, btrfs_dev_replace_start is found in
> dev-replace.c, which likewise has a lot of changes. But then also, I
> think 4.15 might still be in the era when it was not recommended to
> use 'btrfs dev replace' for raid56, only non-raid56. I'm not sure if
> the problems with device replace were fixed, and if they were, whether
> the fix was on the kernel or progs side. Anyway, the latest I recall,
> the recommendation on raid56 was 'btrfs dev add' and then 'btrfs dev
> remove' (a rough sketch of that sequence is below).
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/fs/btrfs/dev-replace.c?id=v4.19.3&id2=v4.15.1
>
> And that's only a few hundred changes for each. Check out inode.c -
> there are over 2000 changes.
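> 
> To make that add-then-remove route concrete, here's a minimal sketch
> (the device names and mount point are placeholders, not taken from
> your setup):
> 
>   # add the new device first, so redundancy is kept during the move
>   btrfs device add /dev/bcache4 /mnt/pool
>   # then remove the failing device; this relocates its data onto the
>   # remaining devices
>   btrfs device remove /dev/bcache1 /mnt/pool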
>
>
> > The disks are configured on top of bcache, in 5 arrays with a small
> > 128GB SSD cache shared.  The system in this configuration had worked
> > perfectly for 3 years, until csum errors started appearing 2 weeks
> > ago.  I have a CrashPlan backup of all files on the disk, so I am not
> > concerned about data loss, but I would like to avoid rebuilding the
> > system.
>
> btrfs-progs 4.17 still considers raid56 experimental, not for
> production use. And three years ago the current upstream kernel
> release was 4.3, so I'm gonna guess the kernel history of this file
> system goes back at least that far, very close to the birth of the
> raid56 code. And then adding bcache to the mix just makes it all the
> more complicated.
>
>
>
> >
> > btrfs dev stats shows
> > [/dev/bcache0].write_io_errs    0
> > [/dev/bcache0].read_io_errs     0
> > [/dev/bcache0].flush_io_errs    0
> > [/dev/bcache0].corruption_errs  0
> > [/dev/bcache0].generation_errs  0
> > [/dev/bcache1].write_io_errs    0
> > [/dev/bcache1].read_io_errs     20
> > [/dev/bcache1].flush_io_errs    0
> > [/dev/bcache1].corruption_errs  0
> > [/dev/bcache1].generation_errs  14
> > [/dev/bcache3].write_io_errs    0
> > [/dev/bcache3].read_io_errs     0
> > [/dev/bcache3].flush_io_errs    0
> > [/dev/bcache3].corruption_errs  0
> > [/dev/bcache3].generation_errs  19
> > [/dev/bcache2].write_io_errs    0
> > [/dev/bcache2].read_io_errs     0
> > [/dev/bcache2].flush_io_errs    0
> > [/dev/bcache2].corruption_errs  0
> > [/dev/bcache2].generation_errs  2
>
>
> 3 of 4 drives have at least one generation error. While there are no
> corruptions reported, generation errors can be really tricky to
> recover from. If only one device had only read errors, this would be a
> lot less difficult.
>
>
> > I've tried the latest kernel and the latest tools, but nothing will
> > allow me to replace or delete the failed disk.
>
> If the file system is mounted, I would try to make a local backup ASAP
> before you lose the whole volume. Whether it's an LVM pool of two
> drives (linear/concat) with XFS, or Btrfs with -dsingle -mraid1 (also
> basically a concat), doesn't really matter, but I'd get whatever you
> can off the drive. I expect avoiding a rebuild in some form or another
> is very wishful thinking and not very likely.
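> 
> A minimal sketch of the Btrfs variant, assuming two spare drives
> (shown here as sdx and sdy) and that the raid56 volume is still
> mounted at /mnt/pool:
> 
>   mkfs.btrfs -d single -m raid1 /dev/sdx /dev/sdy
>   mkdir -p /mnt/backup && mount /dev/sdx /mnt/backup
>   # copy everything off while the source is still readable
>   rsync -aHAX /mnt/pool/ /mnt/backup/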
>
> The more changes are made to the file system, whether by repair
> attempts or other writes, the lower the chance of recovery.
>
> --
> Chris Murphy
