On Tue, 2 May 2017 21:50:19 +0200, Goffredo Baroncelli <kreij...@inwind.it> wrote:
> On 2017-05-02 20:49, Adam Borowski wrote:
> >> It could be some daemon that waits for btrfs to become complete.
> >> Do we have something?
> >
> > Such a daemon would also have to read the chunk tree.
>
> I don't think that a daemon is necessary. As proof of concept, in the
> past I developed a mount helper [1] which handled the mounting of a
> btrfs filesystem: this helper first checks whether the filesystem is a
> multi-device one; if so, it waits until all the devices have appeared.
> Finally, it mounts the filesystem.
>
> > It's not so simple -- such a btrfs device would have THREE states:
> >
> > 1. not mountable yet (multi-device with not enough disks present)
> > 2. mountable ro / rw-degraded
> > 3. healthy
>
> My mount.btrfs could be "programmed" to wait for a timeout, then
> mount the filesystem as degraded if not all devices are present.
> This is a very simple strategy, but it could be expanded.
>
> I am inclined to think that the current approach doesn't fit the
> btrfs requirements well. The roles and responsibilities are spread
> across too many layers (udev, systemd, mount)... I hoped that my
> helper could be adopted in order to concentrate all the
> responsibility in a single binary; this would reduce the number of
> interfaces with the other subsystems (e.g. systemd, udev).
>
> For example, it would be possible to implement a sanity check that
> prevents mounting a btrfs filesystem if two devices expose the same
> UUID...

Ideally, the btrfs device wouldn't even appear in /dev until it was
assembled by udev. But apparently that's not the case, and I think
this is where the problems come from.

I wish btrfs member devices would not show up in /dev as nodes that
the mount command identifies as btrfs. Instead, btrfs would expose
(probably through udev) a device node at /dev/btrfs/fs_identifier
once it is ready.

Apparently, the core problem of how to handle a degraded btrfs still
remains.
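The timeout-then-degraded strategy described above (together with the three states Adam lists) could be sketched roughly as follows. This is only an illustration of the logic, not actual mount.btrfs code; the function name and the polling approach are my own assumptions:

```python
import time

def wait_for_devices(list_devices, expected, timeout, poll_interval=0.5):
    """Poll until all expected member devices of a multi-device btrfs
    filesystem have appeared, or until the timeout expires.

    list_devices: callable returning the set of member devices currently
                  visible (in reality: scan devices sharing the fs UUID)
    expected:     number of devices recorded in the filesystem metadata
    Returns the extra mount option to use: "" for a normal mount when
    all members showed up, or "degraded" after the timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if len(list_devices()) >= expected:
            return ""  # state 3 (healthy): all members present
        time.sleep(poll_interval)
    # Timeout expired: fall back to a degraded mount (state 2).
    # Whether state 1 (not enough disks even for degraded) applies is
    # left to the actual mount attempt, which will simply fail then.
    return "degraded"
```

A helper built on this would then invoke mount(8) with `-o degraded` appended when the fallback triggers, which matches the "very simple strategy" described in the quoted mail.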
Maybe it could be solved by adding more stages of btrfs device nodes,
like /dev/btrfs-incomplete (for an unusable btrfs), /dev/btrfs-degraded
(for a btrfs still missing devices but with at least one stripe of the
btrfs raid available), and /dev/btrfs as the final stage. That way, a
mount process could wait for a while, and if the final device doesn't
appear, it tries the degraded stage instead. If the fs is opened from
the degraded node, udev (or other processes) that scan for devices
should stop assembling the fs if they are still doing so.

bcache has a similar approach: it hides the fs within a protective
superblock. Unless bcache is set up, the fs won't show up in /dev, and
it won't be visible by other means. Btrfs should do something similar
and only show a single device node once assembled completely. The
component devices would have superblocks ignored by mount, and only the
final node would expose a virtual superblock and the compound device
behind it. Of course, this makes things like compound device resizing
more complicated, maybe even impossible.

If I'm not totally wrong, I think this is also how zfs exposes its
pools: you need user-space tools to make the pools visible in the tree.
If a zfs pool is incomplete, there's nothing to mount, and thus no race
condition. But I never tried zfs seriously, so I do not know.

-- 
Regards,
Kai

Replies to list-only preferred.