Hello Duncan,

What an amazingly extensive answer you gave me! Thank you so much for it.
See my comments below.

On Mon, 10 Feb 2014 03:34:49 +0000 (UTC), Duncan <1i5t5.dun...@cox.net> wrote:

> > I am experimenting with BTRFS and RAID1 on my Debian Wheezy (with
> > backported kernel 3.12-0.bpo.1-amd64) using a motherboard with
> > UEFI.
>
> My systems don't do UEFI, but I do run GPT partitions and use grub2
> for booting, with grub2-core installed to a BIOS/reserved type
> partition (instead of as an EFI service as it would be with UEFI).
> And I have a root filesystem in btrfs two-device raid1 mode working
> fine here, tested bootable with only one device of the two available.
>
> So while I can't help you directly with UEFI, I know the rest of it
> can/does work.
>
> One more thing: I do have a (small) separate btrfs /boot, actually
> two of them as I set up a separate /boot on each of the two devices
> in order to have a backup /boot, since grub can only point to one
> /boot by default, and while pointing to another in grub's rescue
> mode is possible, I didn't want to have to deal with that if the
> first /boot was corrupted, as it's easier to simply point the BIOS
> at a different drive entirely and load its (independently installed
> and configured) grub and /boot.

Can you explain why you chose to have a dedicated "/boot" partition?
I also read in this thread that it may be better to have a dedicated
/boot partition:
https://bbs.archlinux.org/viewtopic.php?pid=1342893#p1342893

> > However I haven't managed to make the system boot when removing
> > the first hard drive.
> >
> > I have installed Debian with the following partitions on the first
> > hard drive (no BTRFS subvolumes):
> > /dev/sda1: for / (BTRFS)
> > /dev/sda2: for /home (BTRFS)
> > /dev/sda3: for swap
> >
> > Then I added another drive for a RAID1 configuration (with btrfs
> > balance) and I installed grub on the second hard drive with
> > "grub-install /dev/sdb".
>
> Just for clarification as you don't mention it specifically, altho
> your btrfs filesystem show information suggests you did it this way,
> are your partition layouts identical on both drives?
>
> That's what I've done here, and I definitely find that easiest to
> manage and even just to think about, tho it's definitely not a
> requirement. But using different partition layouts does
> significantly increase management complexity, so it's useful to
> avoid if possible. =:^)

Yes, the partition layout is exactly the same on both drives (copied
with sfdisk; I have put a rough sketch of the commands just below). I
also try to keep things simple ;-)
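For reference, here is roughly how I copied the partition table; take
it as a sketch from memory rather than a recipe. Note that the sfdisk
shipped with Wheezy only understands MBR tables; for GPT disks the
usual tool is sgdisk (from the gdisk package):

    # MBR: dump sda's partition table and write it to sdb
    sfdisk -d /dev/sda | sfdisk /dev/sdb

    # GPT equivalent: replicate sda's table onto sdb, then randomize
    # the GUIDs so the two disks don't share identifiers
    sgdisk -R /dev/sdb /dev/sda
    sgdisk -G /dev/sdb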
> > If I boot on sdb, it takes sda1 as the root filesystem.
> > If I switch the cables, it always takes the first hard drive as
> > the root filesystem (now sdb).
>
> That's normal /appearance/, but that /appearance/ doesn't fully
> reflect reality.
>
> The problem is that mount output (and /proc/self/mounts), fstab,
> etc, were designed with single-device filesystems in mind, and
> multi-device btrfs has to be made to fit the existing rules as best
> it can.
>
> So what's actually happening is that for a btrfs composed of
> multiple devices, since there's only one "device slot" for the
> kernel to list devices in, it only displays the first one it happens
> to come across, even tho the filesystem will normally (unless
> degraded) require that all component devices be available and
> logically assembled into the filesystem before it can be mounted.
>
> When you boot on sdb, naturally, the sdb component of the
> multi-device filesystem is the one the kernel finds first, so it's
> the one listed, even tho the filesystem is actually composed of more
> devices, not just that one.

I am not following you; it seems to be the opposite of what you
describe. If I boot on sdb, I expect sdb1 and sdb2 to be the first
components that the kernel finds. However I can see that sda1 and
sda2 are used (using the 'mount' command).

> When you switch the cables, the first one is, at least on your
> system, always the first device component of the filesystem
> detected, so it's always the one occupying the single device slot
> available for display, even tho the filesystem has actually
> assembled all devices into the complete filesystem before mounting.

Normally the two hard drives should be exactly the same (or I have
misunderstood something) except for the UUID_SUB. That's why I don't
understand it: if I switch the cables, I should get exactly the same
results from 'mount'. But that is not the case; the 'mount' command
always points to the same partitions:
- without cable switch: sda1 and sda2
- with cable switch: sdb1 and sdb2
Everything happens as if the system were using the UUID_SUB to pick
its 'favorite' partition.

> > If I disconnect /dev/sda, the system doesn't boot, with a message
> > saying that it hasn't found the UUID:
> >
> > Scanning for BTRFS filesystems...
> > mount:
> > mounting /dev/disk/by-uuid/c64fca2a-5700-4cca-abac-3a61f2f7486c
> > on /root failed: Invalid argument
> >
> > Can you tell me what I have done incorrectly?
> > Is it because of UEFI? If yes, I haven't understood how I can
> > correct it in a simple way.
>
> As you haven't mentioned it and the grub config below doesn't
> mention it either, I'm almost certain that you're simply not aware
> of the "degraded" mount option, and when/how it should be used.

Ah yes, I read about it but didn't understand that it applied to my
situation. Thanks for pointing it out.

> You should be able to mount a two-device btrfs raid1 filesystem with
> only a single device with the degraded mount option, tho I believe
> current kernels refuse a read-write mount in that case, so you'll
> have read-only access until you btrfs device add a second device, so
> it can do normal raid1 mode once again.

Indeed, I managed to boot with the degraded option (I have sketched
the commands further down). However sda1 is mounted by default
read-write (not read-only), and sda2 (my /home) refuses to be
mounted:

Mounting local filesystems...
Error mounting: mount: wrong fs type, bad option, bad superblock on
/dev/sda2, missing codepage or helper program, or other error.

This is with kernel 3.12-0.bpo.1-amd64.

> That should answer your immediate question, but do read up on the
> wiki. In addition to much of the FAQ, you'll want to read the
> sysadmin guide page, particularly the raid and data duplication
> section, and the multiple devices page, since they're directly
> apropos to btrfs multi-device raid modes. You'll probably want to
> read the problem FAQ and gotchas pages just for the heads-up as
> well, and likely at least the raid section of the use cases page
> too.

Will do!
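By the way, going back to the single "device slot" you described:
what helps to see what is really going on is that 'mount' and
/proc/self/mounts only ever show one device per filesystem, while
'btrfs filesystem show' lists every member device. Roughly (the
device names are just from my own setup):

    # shows only one device per btrfs, whichever the kernel
    # registered first
    mount | grep btrfs

    # shows all member devices of each btrfs, with devid and UUID
    btrfs filesystem show

    # (re)scan block devices for btrfs members; the initramfs
    # normally does this before mounting a multi-device root
    btrfs device scan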
> Meanwhile, I don't believe it's on the wiki, but it's worth noting
> my experience with btrfs raid1 mode in my pre-deployment tests.
>
> Actually, with the (I believe) mandatory read-only mount if raid1 is
> degraded below two devices, this problem is going to be harder to
> run into than it was in my testing several kernels ago, but here's
> what I found:
>
> What I did was writable-degraded-mount first one of the btrfs raid1
> pair, then the other (with the other one offline in each case), and
> change a test file with each mount, so that the two copies were
> different, and neither one the same as the original file. Then I
> remounted the filesystem with both devices once again, to see what
> would happen.
>
> Based on my previous history with mdraid and how I knew it to
> behave, I expected some note in the log about the two devices having
> unmatched write generations and possibly an automated resync to
> catch the one back up to the other, or alternatively, dropping the
> one from the mount and requiring me to do some sort of manual sync
> (tho I really didn't know what sort of btrfs command I'd use for
> that, but this was pre-deployment testing and I was experimenting
> with the intent of finding this sort of thing out!).
>
> That's *NOT* what I got!
>
> What I got was NO warnings, simply one of the two new versions
> displayed when I catted the file. I'm not sure whether it could just
> as well have shown me the other one, such that which one it showed
> was random, but that I didn't get a warning was certainly unsettling
> to me.
>
> Then I unmounted and unplugged the one with that version of the
> file, and remounted degraded again, to check if the other copy had
> been silently updated. It was exactly as it had been, so the copies
> were still different.
>
> What I'd do after that today, were I redoing this test, would be
> either a scrub or a balance, which would presumably find and correct
> the difference. However, back then I didn't know enough about what I
> was doing to test that, so I didn't, and I still don't actually know
> how/whether the difference would have been detected and corrected,
> since I never did actually test that.
>
> My takeaway from that test was not to actually play around with
> degraded writable mounts too much, and for SURE if I did, to take
> care that if I was to write-mount one and ever intended to bring
> back the other one, I should be sure it was always the same one I
> was write-mounting and updating, so only one would be changed and
> it'd always be clear which copy was the newest. (Btrfs behavior on
> this point has since been confirmed by a dev: btrfs tracks write
> generation and will always take the higher-sequence write generation
> if there's a difference. If the write generations happened to be the
> same, however, as I understood what he said, it'd depend on which
> one the kernel happened to find first. So always making sure the
> same one was written to was and remains a good idea, so different
> writes don't get done to different devices, with some of those
> writes dropped when they're recombined in an undegraded mount.)
>
> And if there was any doubt, the best action would be to wipe (or
> trim/discard, my devices are SSD so that's the simplest option) the
> one filesystem, and btrfs device add and btrfs balance back to it
> from the other, exactly as if it were a new device, rather than risk
> not knowing which of the two differing versions btrfs would end up
> with.
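For the archives, here is the degraded-mount-and-restore sequence as
I understand it from your explanations. This is only a sketch (the
device names simply match my own layout), and as you note, the
degraded mount must end up writable for the device add to be
possible:

    # mount the surviving half of the raid1 with the degraded option
    mount -o degraded /dev/sda1 /mnt

    # for the root filesystem, the same option can be passed from the
    # bootloader by appending rootflags=degraded to the kernel
    # command line

    # add a replacement device, then rebalance so raid1 is complete
    # again
    btrfs device add /dev/sdb1 /mnt
    btrfs balance start /mnt

    # a scrub afterwards verifies the checksums on both copies
    btrfs scrub start /mnt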
> But as I said, if btrfs only allows read-only mounts of filesystems
> without enough devices to properly complete the raid level, that
> shouldn't be as big an issue these days, since it should be more
> difficult or impossible to get the two devices separately mounted
> writable in the first place, with the consequence that the differing
> copies issue will be difficult or impossible to trigger in the first
> place. =:^)
>
> But that's still a very useful heads-up for anyone using btrfs in
> raid1 mode to know about, particularly when they're working with
> degraded mode, just to keep the possibility in mind and be safe with
> their manipulations to avoid it... unless of course they're testing
> exactly the same sort of thing I was. =:^)

Well, I think I just experienced it the hard way. I tried unplugging
first one hard drive and then the other, to test a little how BTRFS
would react, and after several rounds of this (I absolutely didn't
make any modification whatsoever) the system refused to boot. I tried
everything for hours to restore it (btrfsck, scrub, etc.) but I kept
receiving error messages. In the end I just reinstalled everything
(fortunately it was a test system).

So yes, you are right: as soon as one hard drive fails, you MUST
mount read-only, otherwise it is almost a given that something bad
will happen. I think I will try to experiment again, but this time I
will take snapshots first ;-) (see the P.S. below).

Thanks again for your superb help! It keeps me motivated to keep on
with BTRFS!
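P.S. For anyone wanting to run the same kind of experiment, this is
the sort of snapshot I plan to take beforehand next time. The target
paths are just examples, and -r makes the snapshots read-only:

    # read-only snapshot of the root subvolume, taken before testing
    btrfs subvolume snapshot -r / /snap-before-raid-tests

    # /home is a separate filesystem here, so it needs its own
    # snapshot (the destination must live on the same filesystem)
    btrfs subvolume snapshot -r /home /home/snap-before-raid-tests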