On 9/16/16 4:28 PM, Tomasz Sterna wrote:
> Hi.
>
> I have spotted an issue with stat(2) call on files on btrfs.
> It is giving me dev_t st_dev number that does not correspond to any
> mounted filesystem in proc's mountinfo.

That's by design. Your particular file system may only use one device
but, internally, btrfs uses virtualized storage that may be spread
across multiple devices. To make things more complicated, snapshots
mean that:

sled1a:/mnt # btrfs sub list .
ID 257 gen 14 top level 5 path a
ID 258 gen 14 top level 5 path b
sled1a:/mnt # ls -laRi
.:
total 16
256 drwxr-xr-x 1 root root   4 Sep 20 09:08 .
256 drwxr-xr-x 1 root root 220 Sep 16 09:49 ..
256 drwxr-xr-x 1 root root   8 Sep 14 10:24 a
256 drwxr-xr-x 1 root root   8 Sep 14 10:24 b

./a:
total 4112
256 drwxr-xr-x 1 root root       8 Sep 14 10:24 .
256 drwxr-xr-x 1 root root       4 Sep 20 09:08 ..
257 -rw-r--r-- 1 root root 4194304 Sep 14 10:24 file

./b:
total 4112
256 drwxr-xr-x 1 root root       8 Sep 14 10:24 .
256 drwxr-xr-x 1 root root       4 Sep 20 09:08 ..
257 -rw-r--r-- 1 root root 4194304 Sep 14 10:24 file

Under normal circumstances those are two files with the same st_dev and
the same inode number. That would normally correspond to a hard link,
but the files do not (necessarily) correspond to the same file.

... but because we use anonymous device numbers for each subvolume, we
have different device numbers for each one.

sled1a:/mnt # stat --format "%n st_dev=%d" {a,b}/file
a/file st_dev=69
b/file st_dev=70

It's a pretty big usability wart that we don't consistently report the
device number. We do it correctly in stat() but there are other places
in the code that assume that inode->i_sb->s_dev will work. In the SUSE
kernels, we have patches that add a super_operation to report the
correct device number everywhere, but even that is a hack.

> I already attempted a illinformed-patch in fs/btrfs/super.c:
>
> @@ -1127,6 +1127,7 @@ static int btrfs_fill_super(struct super_block *sb,
>                 goto fail_close;
>         }
>
> +       sb->s_dev = inode->i_sb->s_dev;
>         sb->s_root = d_make_root(inode);
>         if (!sb->s_root) {
>                 err = -ENOMEM;
>
> but it didn't help.

It wouldn't. That is assigning a variable to itself.

> I would like to dig deeper and fix it, but first I have to ask:
> - Which number is wrong?
>   The one returned by stat() or the one in mountinfo?

The one in mountinfo, but then that means that the user only sees the
anonymous devices in mount(8), which isn't what we want either.

I'm afraid the correct fix is very involved and requires non-trivial
changes in the VFS layer as well. It's on my long-term TODO list. I
currently have some patches that do the magic with vfsmounts but it's
far from being usable.

-Jeff

-- 
Jeff Mahoney
SUSE Labs
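
For anyone who wants to reproduce the mismatch from userspace, here is
a minimal sketch that is not part of the original message. It assumes
Linux with /proc mounted and the mountinfo field layout documented in
proc(5), where the third field is major:minor and the fifth is the
mount point. The helper names parse_mountinfo, lookup_mount and report
are made up for illustration.

```python
import os


def parse_mountinfo(text):
    """Return {(major, minor): mount_point} from mountinfo-formatted text."""
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        # Third field is "major:minor", fifth field is the mount point.
        major, minor = fields[2].split(":")
        devices[(int(major), int(minor))] = fields[4]
    return devices


def lookup_mount(st_dev, devices):
    """Map a stat() st_dev to a mount point, or None if mountinfo lacks it."""
    return devices.get((os.major(st_dev), os.minor(st_dev)))


def report(paths, mountinfo_path="/proc/self/mountinfo"):
    """Print st_dev for each path and the mountinfo entry it matches, if any."""
    with open(mountinfo_path) as f:
        devices = parse_mountinfo(f.read())
    for path in paths:
        st = os.stat(path)
        print(
            path,
            "st_dev=%d:%d" % (os.major(st.st_dev), os.minor(st.st_dev)),
            "mountinfo=%s" % lookup_mount(st.st_dev, devices),
        )
```

Calling report(["a/file", "b/file"]) from /mnt in the layout above
would be one way to check this. For files inside a btrfs subvolume the
lookup is expected to find no match, which is the discrepancy discussed
in this thread. On most other filesystems it should return the mount
point.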