Hi,

I just added a new disk with btrfs, and my intention was to use the
space of this new disk under multiple mount points of the existing tree.
So after creating a new btrfs filesystem on top of the new device, I
created two subvolumes, srv and lxc, then added the necessary fstab
entries to mount them on /srv and /var/lib/lxc -> mount -a.
So far so good: what I've done is not illegal or wrong, such a trick
is possible, and it works.
Then, checking what I'd just done with the mount command and looking
at the output, I realised that something is really wrong :/
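
For reference, the steps were roughly as follows (a sketch; the
temporary mount point /mnt and the exact fstab lines are my
assumptions, the device /dev/sda2 is taken from the mount output
below):

# mkfs.btrfs /dev/sda2
# mount /dev/sda2 /mnt
# btrfs subvolume create /mnt/srv
# btrfs subvolume create /mnt/lxc
# umount /mnt
# cat >> /etc/fstab <<EOF
/dev/sda2  /srv          btrfs  subvol=/srv  0 0
/dev/sda2  /var/lib/lxc  btrfs  subvol=/lxc  0 0
EOF
# mount -a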

# mount | grep btrfs
/dev/mmcblk0p2 on / type btrfs (rw,relatime,ssd,space_cache,subvolid=257,subvol=/fedora)
/dev/sda2 on /var/lib/lxc type btrfs (rw,relatime,ssd,space_cache,subvolid=257,subvol=/lxc)
/dev/sda2 on /srv type btrfs (rw,relatime,ssd,space_cache,subvolid=259,subvol=/srv)

As you can see, both mount points are listed as mounted from /dev/sda2.
That is a real problem for any monitoring software, which will catch
exactly the same device mounted twice (a findmnt-based workaround is
shown below).
So where is the design issue?
1) btrfs has no concept of a named storage pool like ZFS has
2) btrfs subvolumes are not visible in the mount output as distinct
entities; only the backing device shows up as the source
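
For what it is worth, the duplicated device can still be told apart
by the subvol=/subvolid= mount options, and findmnt shows the mount
root next to the source. That is a workaround, not a fix (illustrative
output for the mounts above):

# findmnt -t btrfs -o TARGET,SOURCE
TARGET        SOURCE
/             /dev/mmcblk0p2[/fedora]
/var/lib/lxc  /dev/sda2[/lxc]
/srv          /dev/sda2[/srv]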

With the zpool name it is possible to present all volumes as
sub-objects of the storage pool, as in this output:

$ zfs list
NAME                              USED  AVAIL  REFER  MOUNTPOINT
rpool                            2.03G   298G   144K  /rpool
rpool/ROOT                       1.96G   298G   144K  legacy
rpool/ROOT/solaris-0              584K   298G  1.63G  /
rpool/ROOT/solaris-0/var          352K   298G   140M  /var
rpool/ROOT/solaris-1             1.96G   298G  1.63G  /
rpool/ROOT/solaris-1/var          225M   298G   124M  /var
rpool/VARSHARE                   31.9M   298G   312K  /var/share
rpool/VARSHARE/pkg                296K   298G   152K  /var/share/pkg
rpool/VARSHARE/pkg/repositories   144K   298G   144K  /var/share/pkg/repositories
rpool/VARSHARE/sstore            22.3M   298G  22.3M  /var/share/sstore/repo
rpool/VARSHARE/tmp               9.07M   298G  9.07M  /var/tmp
rpool/export                     42.2M   298G   152K  /export
rpool/export/home                42.1M   298G   168K  /export/home
rpool/export/home/jacek           176K   298G   176K  /export/home/jacek
rpool/export/home/ss              196K   298G   196K  /export/home/ss
rpool/export/home/tkloczko       41.6M   298G  41.6M  /export/home/tkloczko
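
For comparison, the closest thing btrfs offers is btrfs subvolume
list, which needs an already mounted path and prints only IDs and
paths, with no usage or mountpoint columns (illustrative output for
the two subvolumes created above; the gen numbers are made up):

# btrfs subvolume list /srv
ID 257 gen 10 top level 5 path lxc
ID 259 gen 12 top level 5 path srv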

Moreover, all those ZFS volumes are visible in the mount output, so
they are fully distinguishable:

$ mount | grep rpool
/ on rpool/ROOT/solaris-1
read/write/setuid/nodevices/rstchown/nonbmand/exec/xattr/noatime/mountpoint=/data/HPE-Builder/root//zone=HPE-Builder/nozonemod/sharezone=1/dev=35d002b
on Sun Mar  3 04:17:05 2019
/var on rpool/ROOT/solaris-1/var
read/write/setuid/nodevices/rstchown/nonbmand/exec/xattr/noatime/mountpoint=/data/HPE-Builder/root/var/zone=HPE-Builder/nozonemod/sharezone=1/dev=35d002c
on Sun Mar  3 04:17:07 2019
/var/share on rpool/VARSHARE
read/write/nosetuid/nodevices/rstchown/nonbmand/noexec/noxattr/noatime/zone=HPE-Builder/sharezone=1/dev=35d0031
on Sun Mar  3 04:17:27 2019
/var/tmp on rpool/VARSHARE/tmp
read/write/setuid/nodevices/rstchown/nonbmand/exec/xattr/noatime/zone=HPE-Builder/sharezone=1/dev=35d0032
on Sun Mar  3 04:17:27 2019
/export on rpool/export
read/write/setuid/nodevices/rstchown/nonbmand/exec/xattr/noatime/zone=HPE-Builder/sharezone=1/dev=35d0035
on Sun Mar  3 04:17:35 2019
/export/home on rpool/export/home
read/write/setuid/nodevices/rstchown/nonbmand/exec/xattr/noatime/zone=HPE-Builder/sharezone=1/dev=35d0036
on Sun Mar  3 04:17:35 2019
/export/home/jacek on rpool/export/home/jacek
read/write/setuid/nodevices/rstchown/nonbmand/exec/xattr/noatime/zone=HPE-Builder/sharezone=1/dev=35d0037
on Sun Mar  3 04:17:35 2019
/export/home/ss on rpool/export/home/ss
read/write/setuid/nodevices/rstchown/nonbmand/exec/xattr/noatime/zone=HPE-Builder/sharezone=1/dev=35d0038
on Sun Mar  3 04:17:35 2019
/export/home/tkloczko on rpool/export/home/tkloczko
read/write/setuid/nodevices/rstchown/nonbmand/exec/xattr/noatime/zone=HPE-Builder/sharezone=1/dev=35d0039
on Sun Mar  3 04:17:36 2019
/rpool on rpool
read/write/setuid/nodevices/rstchown/nonbmand/exec/xattr/noatime/zone=HPE-Builder/sharezone=1/dev=35d003a
on Sun Mar  3 04:17:36 2019
/var/share/pkg on rpool/VARSHARE/pkg
read/write/nosetuid/nodevices/rstchown/nonbmand/noexec/noxattr/noatime/zone=HPE-Builder/sharezone=1/dev=35d003f
on Sun Mar  3 04:17:37 2019
/var/share/pkg/repositories on rpool/VARSHARE/pkg/repositories
read/write/nosetuid/nodevices/rstchown/nonbmand/noexec/noxattr/noatime/zone=HPE-Builder/sharezone=1/dev=35d0040
on Sun Mar  3 04:17:37 2019
/var/share/sstore/repo on rpool/VARSHARE/sstore
read/write/nosetuid/nodevices/rstchown/nonbmand/noexec/noxattr/noatime/zone=HPE-Builder/sharezone=1/dev=35d0044
on Sun Mar  3 04:17:39 2019

Because all ZFS volumes are visible at the VFS layer, Solaris is able
to produce per-volume VFS layer statistics. In the case of btrfs
something like this is not possible, because the only things visible
at the VFS layer are the mount points explicitly specified for mount
(manually or via fstab).
For example, I can take dev=35d0044 from the mount output and use it
with kstat:

$ kstat -p | grep 35d0044
unix:1:vopstats_35d0044:aread_bytes     0
unix:1:vopstats_35d0044:aread_time      0
unix:1:vopstats_35d0044:awrite_bytes    0
unix:1:vopstats_35d0044:awrite_time     0
unix:1:vopstats_35d0044:class   misc
unix:1:vopstats_35d0044:crtime  254.04196691
unix:1:vopstats_35d0044:nacancel        0
unix:1:vopstats_35d0044:naccess 107474
unix:1:vopstats_35d0044:naddmap 0
unix:1:vopstats_35d0044:nafsync 0
unix:1:vopstats_35d0044:naread  0
unix:1:vopstats_35d0044:nawrite 0
unix:1:vopstats_35d0044:nclose  107474
unix:1:vopstats_35d0044:ncmp    42
unix:1:vopstats_35d0044:ncreate 58700
unix:1:vopstats_35d0044:ndelmap 0
unix:1:vopstats_35d0044:ndispose        0
unix:1:vopstats_35d0044:ndump   0
unix:1:vopstats_35d0044:ndumpctl        0
unix:1:vopstats_35d0044:nfid    0
unix:1:vopstats_35d0044:nfrlock 0
unix:1:vopstats_35d0044:nfsync  0
unix:1:vopstats_35d0044:ngetattr        156527
unix:1:vopstats_35d0044:ngetpage        0
unix:1:vopstats_35d0044:ngetsecattr     58700
unix:1:vopstats_35d0044:ninactive       58661
unix:1:vopstats_35d0044:nioctl  0
unix:1:vopstats_35d0044:nlink   0
unix:1:vopstats_35d0044:nlookup 1549680
unix:1:vopstats_35d0044:nmap    0
unix:1:vopstats_35d0044:nmkdir  0
unix:1:vopstats_35d0044:nopen   107474
unix:1:vopstats_35d0044:npageio 0
unix:1:vopstats_35d0044:npathconf       0
unix:1:vopstats_35d0044:npoll   0
unix:1:vopstats_35d0044:nputpage        0
unix:1:vopstats_35d0044:nread   48729
unix:1:vopstats_35d0044:nreaddir        87
unix:1:vopstats_35d0044:nreadlink       0
unix:1:vopstats_35d0044:nrealvfs        0
unix:1:vopstats_35d0044:nrealvp 140620
unix:1:vopstats_35d0044:nreflink        0
unix:1:vopstats_35d0044:nreletocache    1585518
unix:1:vopstats_35d0044:nremove 51329
unix:1:vopstats_35d0044:nrename 7738
unix:1:vopstats_35d0044:nreqzcbuf       0
unix:1:vopstats_35d0044:nretzcbuf       0
unix:1:vopstats_35d0044:nrmdir  0
unix:1:vopstats_35d0044:nrwlock 107515
unix:1:vopstats_35d0044:nrwunlock       107515
unix:1:vopstats_35d0044:nseek   0
unix:1:vopstats_35d0044:nsetattr        0
unix:1:vopstats_35d0044:nsetfl  0
unix:1:vopstats_35d0044:nsetsecattr     0
unix:1:vopstats_35d0044:nshrlock        0
unix:1:vopstats_35d0044:nspace  0
unix:1:vopstats_35d0044:nsymlink        0
unix:1:vopstats_35d0044:nvnevent        0
unix:1:vopstats_35d0044:nwrite  58699
unix:1:vopstats_35d0044:read_bytes      550957714
unix:1:vopstats_35d0044:read_time       514114360
unix:1:vopstats_35d0044:readdir_bytes   117480
unix:1:vopstats_35d0044:readdir_time    286027046
unix:1:vopstats_35d0044:snaptime        109395.47355841
unix:1:vopstats_35d0044:write_bytes     595622652
unix:1:vopstats_35d0044:write_time      1543437021

Another thing: it looks like btrfs still does not provide any
per-volume metrics.
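
The subvolume is at least recoverable on Linux from
/proc/self/mountinfo, where the fourth field is the root of the mount
inside the filesystem, but there is no analogue of the per-mount
vopstats counters above (illustrative output matching the mounts
listed earlier; mount IDs and device numbers are made up):

# grep btrfs /proc/self/mountinfo
26 0 0:33 /fedora / rw,relatime shared:1 - btrfs /dev/mmcblk0p2 rw,ssd,space_cache,subvolid=257,subvol=/fedora
65 26 0:40 /lxc /var/lib/lxc rw,relatime shared:30 - btrfs /dev/sda2 rw,ssd,space_cache,subvolid=257,subvol=/lxc
66 26 0:40 /srv /srv rw,relatime shared:31 - btrfs /dev/sda2 rw,ssd,space_cache,subvolid=259,subvol=/srv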

Another thing is that in the case of btrfs it is not possible to have
a subvolume registered in the pool but deliberately kept not mounted.
In the case of ZFS, all volumes with mountpoint "legacy" are not
mounted when the zpool is imported, which creates a perfect platform
for cloning the rootfs, which is always kept not mounted/legacy (see
the sketch below). With multiple clones it is possible to use them as
separate instances of boot environments (BEs). Without such a feature
it looks like using btrfs would be way harder.
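
A minimal sketch of that cloning workflow on the ZFS side (the
solaris-2 name is made up):

# zfs snapshot rpool/ROOT/solaris-1@clone
# zfs clone rpool/ROOT/solaris-1@clone rpool/ROOT/solaris-2
# zfs set mountpoint=legacy rpool/ROOT/solaris-2

The clone now exists in the pool but stays unmounted on import until
someone mounts it explicitly.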
There are of course many more ZFS features which have no analogue in
btrfs. Some people here familiar with ZFS are probably more or less
aware of what is still missing.

Just in case: I'm not complaining. I'm only trying to gently point at
something which will cause some confusion, and maybe to start some
kind of discussion about how to improve the current state.
I think that redesigning btrfs to make subvolumes visible in the
mount output could solve a few things.

Comments?

kloczek
-- 
Tomasz Kłoczko | LinkedIn: http://lnkd.in/FXPWxH
