Re: Unexpected raid1 behaviour

Austin S. Hemmelgarn Wed, 20 Dec 2017 05:33:59 -0800

On 2017-12-19 17:23, Tomasz Pala wrote:

On Tue, Dec 19, 2017 at 15:47:03 -0500, Austin S. Hemmelgarn wrote:

Something like this? I had this problem a few months ago; my solution was
accepted upstream:
https://github.com/systemd/systemd/commit/0e8856d25ab71764a279c2377ae593c0f2460d8f

The rationale is in the referenced ticket: udev will not support any more
btrfs logic, so unless btrfs handles this itself at the kernel level
(daemon?), that is all that can be done.

Or maybe systemd can quit trying to treat BTRFS like a volume manager
(which it isn't) and just try to mount the requested filesystem with the
requested options?


Tried that before ("just mount my filesystem, stupid"); it is a no-go.
The source of the problem is not systemd treating BTRFS differently, but
the btrfs kernel logic that it uses. Just to show it:

1. create 2-volume btrfs, e.g. /dev/sda and /dev/sdb,
2. reboot the system into a clean state (init=/bin/sh), (or remove the btrfs-scan tool),
3. try
mount /dev/sda /test - fails
mount /dev/sdb /test - works
4. reboot again and try in reversed order
mount /dev/sdb /test - fails
mount /dev/sda /test - works

THIS readiness is exposed via udev to systemd. And it must be used for
multi-layer setups to work (consider stacked LUKS, LVM, MD, iSCSI, FC etc).

Except BTRFS _IS NOT MULTIPLE LAYERS_. It's one layer at the filesystem
layer, and handles the other 'layers' internally.


In short: until *something* scans all the btrfs components, so the
kernel makes it ready, systemd won't even try to mount it.

Which is the problem here. Systemd needs to treat BTRFS differently,
even if the ioctl it's using gets 'fixed'. Currently it's treating it
like LVM or MD, when it needs to be treated as just a filesystem with an
extra wait condition prior to mount (and needs to trust that the user
knows what they are doing when they mount something by hand). The IOCTL
systemd is using was poorly named; what it really does is say that the
FS is ready to mount normally (that is, without needing 'device=' or
'degraded' mount options). Aside from this being problematic with
degraded volumes, it's got an inherent TOCTOU race condition (so do the
checks with all the other block layers you mentioned, FWIW). If systemd
would just treat BTRFS like a filesystem instead of a volume manager,
and try to mount the volume with the specified options (after waiting
for udev to report that it's done scanning everything) instead of asking
the kernel if it's ready, none of this would be an issue.
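
To make the "wait for udev, then just try the mount" idea concrete, here
is a minimal sketch. The helper name mount_when_settled is hypothetical
and not part of any existing init system; it assumes `udevadm settle` is
an acceptable stand-in for "udev is done scanning", which may not hold
for devices that show up late.

```shell
#!/bin/sh
# Sketch only: wait for udev to drain its event queue, then attempt the
# mount with whatever options the user asked for.
# mount_when_settled is a hypothetical helper name, not an existing tool.
mount_when_settled() {
    dev=$1
    dir=$2
    opts=${3:-defaults}

    # Block until the udev event queue is empty.
    udevadm settle || return 1

    # No separate readiness query: the mount call itself is the check,
    # so there is no gap between "check" and "use" for a race to exploit.
    mount -o "$opts" "$dev" "$dir"
}
```

Called as `mount_when_settled /dev/sda /test degraded`, the outcome is
decided by mount's exit status alone.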

Put slightly differently: I use OpenRC and sysv init. I have a script
that runs right after udev starts and directly scans all fixed disks for
BTRFS signatures, and that's _all_ that I need to do to get multi-device
BTRFS working properly with the standard local filesystem mount script
in Gentoo. I don't have to deal with any of this crap that systemd
users do because Gentoo's OpenRC script for mounting local filesystems
treats BTRFS like any other filesystem, and (sensibly) assumes that if
the call to mount succeeds, things are ready and working.
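
A rough sketch of what such an early-boot scan step could look like is
below. This is not the actual script described above; the function name
scan_btrfs_members is hypothetical, and it simply delegates to
`btrfs device scan` from btrfs-progs.

```shell
#!/bin/sh
# Sketch only: register btrfs member devices with the kernel early in
# boot so a later plain `mount` can find every member of a volume.
# scan_btrfs_members is a hypothetical helper name.
scan_btrfs_members() {
    # With no arguments, `btrfs device scan` probes the block devices it
    # can find; with arguments, only the named devices are scanned.
    btrfs device scan "$@"
}
```

Running `scan_btrfs_members` with no arguments scans everything, while
`scan_btrfs_members /dev/sda /dev/sdb` restricts it to known disks.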

Then you would just be able to specify 'degraded' in
your mount options, and you don't have to care that the kernel refuses
to mount degraded filesystems without being explicitly asked to.


Exactly. But since LP refused to try mounting despite the kernel's
"not-ready" state, it is the kernel that must emit 'ready'. So the
question is: how can I make the kernel mark a degraded array as "ready"?

You can't, because the DEVICE_READY IOCTL is coded to mark the volume
ready when all component devices are ready. IOW, it's there to say
'this mount will work without needing -o degraded or specifying any
devices in the mount options'.
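
As an illustration of those semantics from userspace, the sketch below
uses `btrfs device ready`, which as far as I know issues the same ioctl.
The helper name suggest_mount_opts is hypothetical, and the answer it
prints can be stale by the time mount runs, which is exactly the TOCTOU
problem.

```shell
#!/bin/sh
# Sketch only: show what the readiness check can tell a script.
# suggest_mount_opts is a hypothetical helper name.
suggest_mount_opts() {
    dev=$1
    # Exit status 0 means every member device of the filesystem on $dev
    # has been registered with the kernel.
    if btrfs device ready "$dev" >/dev/null 2>&1; then
        echo "defaults"
    else
        # Not all members are known: a normal mount would fail, so an
        # explicit 'degraded' (an admin decision) would be required.
        echo "degraded"
    fi
}
```

Whether to act on the "degraded" answer automatically is a policy choice
that the sketch deliberately leaves to the caller.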

The issue is the interaction here, not the kernel behavior by itself,
since the kernel behavior produces no issues whatsoever for other init
systems (though I will acknowledge that the ioctl itself is really only
used by systemd, but I contend that that's because everything else is
sensible enough to understand that the ioctl is functionally useless and
just avoid it).


The obvious answer is: do it via kernel command line, just like mdadm
does:
rootflags=device=/dev/sda,device=/dev/sdb
rootflags=device=/dev/sda,device=missing
rootflags=device=/dev/sda,device=/dev/sdb,degraded

If only btrfs.ko recognized this, the kernel would be able to assemble
multivolume btrfs itself. Not only would this allow automated degraded
mounts, it would also allow using initrd-less kernels on such volumes.

Last I checked, the 'device=' options work on upstream kernels just
fine, though I've never tried the degraded option. Of course, I'm also
not using systemd, so it may be some interaction with systemd that's
causing them to not work (and yes, I understand that I'm inclined to
blame systemd most of the time based on significant past experience with
systemd creating issues that never existed before).
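
For reference, a small sketch of assembling such a 'device=' option
string for a manual mount. The helper name btrfs_device_opts is
hypothetical, and the device paths are just the ones used as examples in
this thread.

```shell
#!/bin/sh
# Sketch only: build a btrfs mount option string naming each member
# device explicitly. btrfs_device_opts is a hypothetical helper name.
btrfs_device_opts() {
    opts=""
    for dev in "$@"; do
        # Append "device=<path>", comma-separated after the first entry.
        opts="${opts:+$opts,}device=$dev"
    done
    printf '%s\n' "$opts"
}
```

It would be used along the lines of
`mount -o "$(btrfs_device_opts /dev/sda /dev/sdb)" /dev/sda /test`, with
`,degraded` appended to the option string by hand if that is wanted.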

It doesn't have to be the default; it might be a kernel compile-time knob,
a module parameter or anything else to make the *R*aid work.

There's a mount option for it per-filesystem.  Just add that to all your
mount calls, and you get exactly the same effect.


If only they were passed...


