Re: APFS improvements (e.g. firm links, volume w/ subvols replication) as ideas for Btrfs?

David Sterba Wed, 12 Jun 2019 02:59:11 -0700

On Tue, Jun 11, 2019 at 10:03:51PM -0600, Chris Murphy wrote:
> On Tue, Jun 11, 2019 at 12:31 PM Neal Gompa <ngomp...@gmail.com> wrote:
> >
> > Hey,
> >
> > So Apple held its WWDC event last week, and among other things, they
> > talked about improvements they've made to filesystems in macOS[1].
> >
> > Among other things, one of the things introduced was a concept of
> > "firm links", which is something like NTFS' directory junctions,
> > except they can cross (sub)volumes.
> 
> My understanding is it's a work around for the lack of APFS supporting
> directory hardlinks. Btrfs does support directory hardlinks but a


Directory hardlinks are not supported on Linux in general; they are
prohibited at the VFS level (see vfs_link in fs/namei.c, which explicitly
returns -EPERM for a directory).
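
For anyone who wants to see this from user space, here is a minimal sketch
in Python. It assumes a Linux system and a writable temporary directory; the
function name is made up for illustration. It only observes the error that
link(2) returns and does not touch any kernel code.

```python
import errno
import os
import tempfile


def try_directory_hardlink():
    """Try to hard-link a directory; return the failure errno, or None."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "dir")
        dst = os.path.join(tmp, "dir-link")
        os.mkdir(src)
        try:
            # os.link() wraps link(2); on Linux this is expected to fail
            # for a directory regardless of the underlying filesystem.
            os.link(src, dst)
        except OSError as exc:
            return exc.errno
        return None


if __name__ == "__main__":
    err = try_directory_hardlink()
    # Print the symbolic errno name if the link attempt failed.
    print(errno.errorcode.get(err, err))
```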

> hardlink points to a particular inode within a particular subvolume
> (files tree) so it's not possible to have a hard link that crosses
> subvolumes. A reflink can already do this, but it's really just an
> efficient copy, the resulting directory is independent. A directory
> symlink can mirror a directory across subvolumes, but like any symlink
> it must have a fixed path available to always find the real deal.
> 
> I think a firm link like thing on Btrfs would require a format change,
> but I'm not certain. My best guess of what it'd be, is a dir/file
> object that gets its own inode but contains a hard reference (not
> independent object) to a subvolid+inode.
> 
> 
> > This concept makes it easier to
> > handle uglier layouts. While bind mounts work kind of okay for this
> > with simpler configurations, it requires operating system awareness,
> > rather than being set up automatically as the volume is mounted. This
> > is less brittle and works better for recovery environments, and helps
> > make it easier to do read-only system volumes while supporting read-write
> > sections in a more flexible way.
> 
> There are a couple of things going on. One is that something between VFS
> and Btrfs makes this goofy assumption that bind mounts are subvolumes,
> which is definitely not true. I bring this up here:
> https://lore.kernel.org/linux-btrfs/CAJCQCtT=-YoFJgEo=bfqfipdtmojcyr3djpsekf+hq22gyg...@mail.gmail.com/

Subvolume mounts are built on top of the bind mount API internally, but a
subvolume is, or at least should be, a different kind of object.

> Near as I can tell, Btrfs kernel code just needs to be smarter about
> distinguishing between bind mounts of directories versus the behind
> the scene bind mount used for subvolumes mounted using -o subvol= or
> -o subvolid= ; I don't think that's difficult. It's just someone needs
> to work through the logic and set aside the resources to do it.

I tried to fix that and got halfway through, then hit the difficult
problems, mainly with nested subvolumes. For leaf subvolumes, handling the
difference between

  subvolume/dir/dir/dir (bind mounted)

and

  subvolume (mounted with -o)

is a matter of traversing the path backwards until the subvolume is hit,
which in both cases would be 'subvolume'. However, with nested subvolumes
it's not easy to see where to stop:

  subvol1/dir/dir/subvol2/dir/dir/subvol3/dir/dir

and take 3 cases:

  mount -o subvol=subvol1
  mount -o subvol=subvol2
  mount -o subvol=subvol3

In all three, the backward path traversal will always say it's subvol3
(that's wrong from the user's POV). Keeping track of the exact subvolume
that was mounted is not trivial because it partially has to duplicate the
internal VFS information, which makes it hard to keep consistent after moves.
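
To illustrate the traversal problem, here is a toy model in user-space
Python. It is not kernel code and not how Btrfs is implemented: the set of
subvolume roots and the function name are assumptions made up for the
example. It only shows that a backward walk which knows nothing about the
mount stops at the nearest subvolume root.

```python
from pathlib import PurePosixPath

# Hypothetical nested layout, mirroring the example path above.
NESTED_ROOTS = {
    PurePosixPath("subvol1"),
    PurePosixPath("subvol1/dir/dir/subvol2"),
    PurePosixPath("subvol1/dir/dir/subvol2/dir/dir/subvol3"),
}


def nearest_subvolume(path, roots):
    """Walk from path towards the top; return the first subvolume root hit."""
    p = PurePosixPath(path)
    for candidate in (p, *p.parents):
        if candidate in roots:
            return candidate.name
    return None


if __name__ == "__main__":
    # Leaf case: a bind-mounted directory and the subvolume itself both
    # resolve to the same subvolume.
    leaf_roots = {PurePosixPath("subvolume")}
    print(nearest_subvolume("subvolume/dir/dir/dir", leaf_roots))
    print(nearest_subvolume("subvolume", leaf_roots))

    # Nested case: the walk has no input saying which subvolume was
    # mounted, so the answer from the deepest path is always the same.
    deep = "subvol1/dir/dir/subvol2/dir/dir/subvol3/dir/dir"
    for mounted in ("subvol1", "subvol2", "subvol3"):
        print(mounted, "->", nearest_subvolume(deep, NESTED_ROOTS))
```

The walk alone cannot recover which subvolume was mounted; that would have
to be tracked separately, which is the hard part described above.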

There was a concept proposal called 'fs view' that would add a proper
subvolume abstraction to the VFS, but I don't know how far it got.
