Will do, thanks!

- mike
On Tue, Dec 4, 2018 at 9:24 PM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>
>
>
> On 2018/12/5 6:33 AM, Mike Javorski wrote:
> > On Tue, Dec 4, 2018 at 2:18 AM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> >>
> >>
> >>
> >> On 2018/12/4 11:32 AM, Mike Javorski wrote:
> >>> Need a bit of advice here, ladies / gents. I am running into an issue
> >>> for which Qu Wenruo seems to have posted a patch several weeks ago
> >>> (see https://patchwork.kernel.org/patch/10694997/).
> >>>
> >>> Here is the relevant dmesg output which led me to Qu's patch.
> >>> ----
> >>> [   10.032475] BTRFS critical (device sdb): corrupt leaf: root=2
> >>> block=24655027060736 slot=20 bg_start=13188988928 bg_len=10804527104,
> >>> invalid block group size, have 10804527104 expect (0, 10737418240]
> >>> [   10.032493] BTRFS error (device sdb): failed to read block groups: -5
> >>> [   10.053365] BTRFS error (device sdb): open_ctree failed
> >>> ----
> >>
> >> Exactly the same symptom.
> >>
> >>>
> >>> This server has a 16-disk btrfs filesystem (RAID6) that I boot
> >>> periodically to btrfs-send snapshots to. This machine is running
> >>> ArchLinux and I had just updated to their latest 4.19.4 kernel
> >>> package (from 4.18.10, which was working fine). I've tried updating to
> >>> the 4.19.6 kernel that is in testing, but that doesn't seem to resolve
> >>> the issue. From what I can see on kernel.org, the patch above has not
> >>> been pushed to stable or to Linus' tree.
> >>>
> >>> At this point the question is what to do. Is my FS toast?
> >>
> >> If there is no other problem at all, your fs is just fine.
> >> The problem is that my original patch is too sensitive (my excuse: I
> >> didn't check the chunk allocator carefully enough).
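> >>
> >> In fact the check only barely trips here. The dmesg line above says
> >> "expect (0, 10737418240]", i.e. a 10 GiB upper bound, and your block
> >> group is only 64 MiB over it. A quick shell sanity check, using the
> >> numbers straight from your dmesg output:
> >>
> >>   # 10 GiB in bytes, the upper bound the checker enforced
> >>   echo $(( 10 * 1024 * 1024 * 1024 ))   # -> 10737418240
> >>   # how far over the bound this block group is
> >>   echo $(( 10804527104 - 10737418240 )) # -> 67108864 bytes = 64 MiB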
> >>
> >> But since you have the downtime, it's never a bad idea to run btrfs
> >> check --readonly to see whether your fs is really OK.
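> >>
> >> Something like the following (assuming /dev/sdb, one of your member
> >> devices, and that the fs is unmounted; for a multi-device fs any one
> >> member device works):
> >>
> >>   # read-only check; reports problems but makes no changes
> >>   btrfs check --readonly /dev/sdb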
> >>
> >
> > After running for 4 hours...
> >
> > UUID: 25b16375-b90b-408e-b592-fb07ed116d58
> > [1/7] checking root items
> > [2/7] checking extents
> > [3/7] checking free space cache
> > [4/7] checking fs roots
> > [5/7] checking only csums items (without verifying data)
> > [6/7] checking root refs
> > [7/7] checking quota groups
> > found 24939616169984 bytes used, no error found
> > total csum bytes: 24321980768
> > total tree bytes: 41129721856
> > total fs tree bytes: 9854648320
> > total extent tree bytes: 737804288
> > btree space waste bytes: 7483785005
> > file data blocks allocated: 212883520618496
> >  referenced 212876546314240
> >
> > So things appear good to go. I will keep an eye out for the patch to
> > land before upgrading the kernel again.
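> >
> > (One way to watch for it, assuming a clone of the stable git tree and
> > that the fix lands in fs/btrfs/tree-checker.c, where this check lives:)
> >
> >   # list tree-checker changes between two stable tags
> >   git log --oneline v4.19.4..v4.19.6 -- fs/btrfs/tree-checker.c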
> >
> >>> Could I
> >>> revert to the 4.18.10 kernel and boot safely?
> >>
> >> If your btrfs check --readonly doesn't report any problems, then it is
> >> completely fine to do so.
> >> Although I would still recommend going with RAID10 rather than RAID5/6.
> >
> > I understand the risk, but don't have the funds to buy sufficient
> > disks to operate in RAID10.
>
> Then my advice would be: after any power loss, please run a full-disk
> scrub (and of course ensure there is not another power loss during
> scrubbing).
>
> I know this sounds silly and slow, but at least it should work around
> the write hole problem.
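>
> Something like this, as a sketch (assuming the fs is mounted at
> /mnt/backup; substitute your real mount point):
>
>   # -B: run in the foreground; -d: print per-device stats when done
>   btrfs scrub start -Bd /mnt/backup
>   # or watch progress from another shell:
>   btrfs scrub status /mnt/backup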
>
> Thanks,
> Qu
>
> > The data is mostly large files and
> > activity is predominantly reads, so the risk is currently acceptable
> > given the backup server. All super-critical data is backed up to (very
> > slow) cloud storage.
> >
> >>
> >> Thanks,
> >> Qu
> >>
> >>> I don't know whether the 4.19
> >>> boot process may have flipped some bits that would make reverting
> >>> problematic.
> >>>
> >>> Thanks much,
> >>>
> >>> - mike
> >>>
> >>
>
