Re: APFS improvements (e.g. firm links, volume w/ subvols replication) as ideas for Btrfs?

Chris Murphy Tue, 11 Jun 2019 21:06:01 -0700

On Tue, Jun 11, 2019 at 12:31 PM Neal Gompa <ngomp...@gmail.com> wrote:
>
> Hey,
>
> So Apple held its WWDC event last week, and among other things, they
> talked about improvements they've made to filesystems in macOS[1].
>
> Among other things, one of the things introduced was a concept of
> "firm links", which is something like NTFS' directory junctions,
> except they can cross (sub)volumes.


My understanding is that it's a workaround for APFS's lack of support
for directory hardlinks. Btrfs supports hardlinks for files (Linux does
not allow hardlinks to directories), but a hardlink points to a
particular inode within a particular subvolume (file tree), so it's not
possible to have a hard link that crosses subvolumes. A reflink can
already do this, but it's really just an efficient copy; the resulting
directory is independent. A directory symlink can mirror a directory
across subvolumes, but like any symlink it needs a fixed path at which
it can always find the real deal.
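
To make the hardlink vs. symlink difference concrete, here's a rough
Python sketch of my own (not taken from any Btrfs or APFS code). The
function name and file names are made up for illustration. It runs
inside a single directory on any POSIX filesystem, so it does not
exercise the cross-subvolume case, where I'd expect os.link() to fail
with EXDEV.

```python
import os
import tempfile

def compare_links(workdir):
    """Create a file plus a hardlink and a symlink to it, then move the original."""
    target = os.path.join(workdir, "real.txt")
    hard = os.path.join(workdir, "hard.txt")
    soft = os.path.join(workdir, "soft.txt")
    with open(target, "w") as f:
        f.write("data")

    os.link(target, hard)     # a second name for the same inode
    os.symlink(target, soft)  # only stores the path string

    same_inode = os.stat(target).st_ino == os.stat(hard).st_ino

    # Move the original away: the hardlink still reaches the data,
    # the symlink is left pointing at a path that no longer exists.
    os.rename(target, os.path.join(workdir, "moved.txt"))
    return {
        "same_inode": same_inode,
        "hardlink_survives_move": os.path.exists(hard),
        "symlink_survives_move": os.path.exists(soft),  # exists() follows the link
    }

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(compare_links(d))
```

The symlink breaking after the rename is the "fixed path" dependency
mentioned above.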

I think a firm-link-like thing on Btrfs would require a format change,
but I'm not certain. My best guess of what it'd be is a dir/file
object that gets its own inode but contains a hard reference (not an
independent object) to a subvolid+inode.
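
Purely as a thought experiment, that guess could be modeled like this in
Python. Everything here (InodeRef, FirmLink, resolve, the subvolume and
inode numbers) is hypothetical and invented for illustration; none of it
corresponds to an existing Btrfs on-disk item or kernel structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InodeRef:
    """Hypothetical (subvolid, inode) pair identifying one object."""
    subvolid: int
    inode: int

@dataclass
class FirmLink:
    """Hypothetical firm link: has its own inode, but holds a hard reference."""
    own: InodeRef     # where the link itself lives
    target: InodeRef  # the real object, possibly in another subvolume

def resolve(ref, firm_links):
    """Return the target if `ref` is a firm link, otherwise `ref` itself."""
    link = firm_links.get(ref)
    return link.target if link else ref

if __name__ == "__main__":
    # Made-up numbers: a link in subvolume 258 pointing into subvolume 257.
    link = FirmLink(own=InodeRef(258, 300), target=InodeRef(257, 412))
    table = {link.own: link}
    print(resolve(InodeRef(258, 300), table))
    print(resolve(InodeRef(258, 301), table))
```

Unlike a symlink, nothing in this model stores a path, so there is no
fixed path to go stale.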


> This concept makes it easier to
> handle uglier layouts. While bind mounts work kind of okay for this
> with simpler configurations, they require operating system awareness,
> rather than being set up automatically as the volume is mounted. This
> is less brittle and works better for recovery environments, and helps
> make it easier to do read-only system volumes while supporting
> read-write sections in a more flexible way.

There are a couple of things going on. One is that something between
VFS and Btrfs makes this goofy assumption that bind mounts are
subvolumes, which is definitely not true. I bring this up here:
https://lore.kernel.org/linux-btrfs/CAJCQCtT=-YoFJgEo=bfqfipdtmojcyr3djpsekf+hq22gyg...@mail.gmail.com/

Near as I can tell, Btrfs kernel code just needs to be smarter about
distinguishing between bind mounts of directories and the
behind-the-scenes bind mount used for subvolumes mounted using
-o subvol= or -o subvolid=. I don't think that's difficult; it's just
that someone needs to work through the logic and set aside the
resources to do it.
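
For what it's worth, the distinction can be approximated from userspace
today. The Python sketch below parses a /proc/self/mountinfo line and
compares the "root" field against the subvol= super option. It rests on
an assumption I haven't verified across kernel versions: that subvol=
reports the containing subvolume, so a root that differs from it means a
directory inside the subvolume was bind mounted. The function names and
the sample lines are made up, and this is a heuristic, not the kernel
fix described above.

```python
def parse_mountinfo_line(line):
    """Split one /proc/self/mountinfo line into the fields used here."""
    left, right = line.rstrip("\n").split(" - ", 1)
    fields = left.split()
    fstype, source, super_opts = right.split(" ", 2)
    opts = {}
    for opt in super_opts.split(","):
        key, _, value = opt.partition("=")
        opts[key] = value
    return {
        "root": fields[3],         # path within the filesystem that is mounted
        "mount_point": fields[4],
        "fstype": fstype,
        "source": source,
        "super_opts": opts,
    }

def looks_like_directory_bind(entry):
    """Heuristic (assumption): root differs from subvol= on a Btrfs mount."""
    if entry["fstype"] != "btrfs":
        return False
    subvol = entry["super_opts"].get("subvol")
    if subvol is None:
        return False
    return entry["root"] != subvol

if __name__ == "__main__":
    # Made-up sample lines, not captured from a real system.
    samples = [
        "50 25 0:32 /root/var/lib /mnt/bind rw,relatime shared:1 - btrfs /dev/sda2 rw,ssd,subvolid=257,subvol=/root",
        "51 25 0:32 /home /home rw,relatime shared:2 - btrfs /dev/sda2 rw,ssd,subvolid=258,subvol=/home",
    ]
    for s in samples:
        e = parse_mountinfo_line(s)
        print(e["mount_point"], looks_like_directory_bind(e))
```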

Second, the FHS is a PITA anyway, but it really shows its unhelpful
ways when it comes to read-only, recoverable/resettable systems. Just
see the massively complicated subvolume carveouts openSUSE has to do
when installed on Btrfs, and the even more complicated gymnastics
libostree is doing on the various rpm-ostree variants including Fedora
Silverblue.

Apple, a long long time ago said, fuck that insanity, we're burying
the FHS so mortal users can't see that shit. And we're going to have a
plain language set of directories for, you know, actual people who
need to get work done.

So definitely consider me in the camp of the FHS making life harder, not easier.

>
> For example, this would be useful if a volume has two subvolumes: OS
> and data. OS would have /usr and data would have /var and /home. But,
> importantly, a couple of system data things need to be part of the OS
> that are on /var: /var/lib/rpm and /var/lib/alternatives. These two
> belong with the OS, and it's incredibly difficult to move it around
> due to all kinds of ecosystem knock-on effects. (If you want to know
> more about that, just ask the SUSE kiwi team... it's the gift that
> keeps on giving...). Both /var/lib/rpm and /var/lib/alternatives are
> part of the OS, but they're in /var. It'd be great to stitch that in
> from the read-only OS volume into the /var subvolume so that it's
> actually part of the OS volume even though it looks like it's in the
> data one. It's completely transparent to everything. Supporting atomic
> updates (with something like a dnf plugin) becomes much easier because
> we can trigger snapshot and subvolume mounts while preserving enough
> structure to make things work. In this circumstance, we can flip the
> properties so that the new location has a rw OS and ro data volume
> mount for doing only software updates (or leave data volume rw during
> this transaction and merge the changes back into the OS). We could
> also do creative things with /etc if we so wish...

Is it really best to do this in Btrfs proper, rather than in VFS?


> Another thing that APFS seems to support now is creating linked
> snapshots (snapshots of multiple subvolumes that are paired together
> as a single snapshot) for full system replication. Obviously, with firm
> links, it makes sense to be able to do such a thing so that full
> system replication works properly. As far as I know, it shouldn't be a
> difficult concept to implement in Btrfs, but I guess it wouldn't be
> really necessary if we don't have firm links...

Right now a subvolume is really just a file tree. It's not as
separate from the pool as it might seem, compared to what a ZFS
dataset is, or what I guess is called a volume in APFS. To do this on
Btrfs is probably another disk format change. My guess is something
based on the seed-sprout feature, but without the mandatory 2nd block
device for the sprout, i.e. freeze all the trees.

--
Chris Murphy
