On 2014-12-02 06:54, Anand Jain wrote:
Stupid thought, why don't we just add blacklisting based on device path like LVM has for pvscan?

On 02/12/2014 19:14, Goffredo Baroncelli wrote:

I further investigated this issue. MegaBrutal reported the following issue: when taking an LVM snapshot of the device of a mounted btrfs fs, the new snapshot device name replaces the name of the original device in the output of /proc/mounts. This confuses tools like grub-probe, which report a wrong root device.

Very good test case indeed, thanks. Actual IO would still go to the original device, until the FS is remounted.

It has to be pointed out that the link under /sys/fs/btrfs/<fsid>/devices is instead correct.

In this context the above sysfs path will be out of sync with reality; it's just a stale sysfs entry.

What happens is that *even if the filesystem is mounted*, doing a "btrfs dev scan" of a snapshot (of the real volume) replaces the device name of the filesystem with the snapshot one.

We have some fundamentally wrong stuff. My original patch tried to fix it, but we later discovered that some external entities like systemd and the boot process are using that bug as a feature, and we had to revert the patch.

Fundamentally, the SCSI inquiry serial number is the only number which is unique to the device (including virtual devices, though there could be some legacy virtual devices which didn't follow that strictly; anyway, those I deem to be a device-side issue). Btrfs depends on the combination of fsid, uuid and devid (and generation number) to identify the unique device volume, which is weak and easy to get wrong.

Anand, with b96de000b, tried to fix it; however further regressions appeared and Chris reverted this commit (see below).

BR
G.Baroncelli

commit b96de000bc8bc9688b3a2abea4332bd57648a49f
Author: Anand Jain <anand.j...@oracle.com>
Date:   Thu Jul 3 18:22:05 2014 +0800

    Btrfs: device_list_add() should not update list when mounted
[...]
commit 0f23ae74f589304bf33233f85737f4fd368549eb
Author: Chris Mason <c...@fb.com>
Date:   Thu Sep 18 07:49:05 2014 -0700

    Revert "Btrfs: device_list_add() should not update list when mounted"

    This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.

    This commit is triggering failures to mount by subvolume id in some configurations. The main problem is how many different ways this scanning function is used, both for scanning while mounted and unmounted. A proper cleanup is too big for late rcs.
[...]

On 12/02/2014 09:28 AM, MegaBrutal wrote:
2014-12-02 8:50 GMT+01:00 Goffredo Baroncelli <kreij...@inwind.it>:
On 12/02/2014 01:15 AM, MegaBrutal wrote:
2014-12-02 0:24 GMT+01:00 Robert White <rwh...@pobox.com>:
On 12/01/2014 02:10 PM, MegaBrutal wrote:

Since having duplicate UUIDs on devices is not a problem for me since I can tell them apart by LVM names, the discussion is of little relevance to my use case. Of course it's interesting and I like to read it along, it is not about the actual problem at hand.

Which is why you use the device= mount option, which would take LVM names and which was repeatedly discussed as solving this very problem. Once you decide to duplicate the UUIDs with LVM snapshots you take up the burden of disambiguating your storage. Which is part of why re-reading was suggested as this was covered in some depth and _is_ _exactly_ about the problem at hand.

Nope.

root@reproduce-1391429:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic root=/dev/mapper/vg-rootlv ro rootflags=device=/dev/mapper/vg-rootlv,subvol=@

Observe, device= mount option is added.

device= options is needed only in a btrfs multi-volume scenario. If you have only one disk, this is not needed

I know. I only did this as a demonstration for Robert. He insisted it will certainly solve the problem.
Well, it doesn't.

root@reproduce-1391429:~# ./reproduce-1391429.sh
#!/bin/sh -v
lvs
  LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
  rootlv vg   -wi-ao---   1.00g
  swap0  vg   -wi-ao--- 256.00m
grub-probe --target=device /
/dev/mapper/vg-rootlv
grep " / " /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-1 / btrfs rw,relatime,space_cache 0 0
lvcreate --snapshot --size=128M --name z vg/rootlv
  Logical volume "z" created
lvs
  LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
  rootlv vg   owi-aos--   1.00g
  swap0  vg   -wi-ao--- 256.00m
  z      vg   swi-a-s-- 128.00m      rootlv   0.11
ls -l /dev/vg/
total 0
lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2
grub-probe --target=device /
/dev/mapper/vg-z
grep " / " /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-2 / btrfs rw,relatime,space_cache 0 0

What /proc/self/mountinfo contains ?

Before creating snapshot:

15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
17 20 0:5 / /dev rw,relatime - devtmpfs udev rw,size=241692k,nr_inodes=60423,mode=755
18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs rw,size=50084k,mode=755
20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-1 rw,space_cache <----- THIS!
21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=5120k
26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=102400k,mode=755
28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw

After creating snapshot:

15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
17 20 0:5 / /dev rw,relatime - devtmpfs udev rw,size=241692k,nr_inodes=60423,mode=755
18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs rw,size=50084k,mode=755
20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-2 rw,space_cache <----- WTF?!
21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=5120k
26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=102400k,mode=755
28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw

So it's consistent with what /proc/mounts reports.

And a more important question: is it only the value returned by /proc/mounts that is wrong, or is the filesystem content affected as well?

I quote my bug report on this:

"The information reported in /proc/mounts is certainly bogus, since still the origin device is being written, the kernel does not actually mix up the devices for write operations, and such, the phenomenon does not cause data corruption. (I did an entire distro release upgrade while the conditions were present, and I certainly would have suffered severe data corruption otherwise. Fortunately, the origin device had the new distro, and the snapshot device had the old one, so besides the mixup in /proc/mounts, no actual damage happened.)"
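
For anyone who wants to check for this condition from a script rather than by eye, here is a minimal Python sketch that pulls the mount source for a given mount point out of /proc/self/mountinfo-style text and compares two captures. The helper names (mount_source, source_changed) and the BEFORE/AFTER constants are made up for illustration; the two sample lines are the "/" entries from the listings above with the arrow annotations removed. It only detects that the reported name changed. As noted above, it says nothing about which device the I/O actually goes to.

```python
from typing import Optional


def mount_source(mountinfo_text: str, mount_point: str = "/") -> Optional[str]:
    """Return the mount source reported for mount_point, or None if absent.

    Each mountinfo line has the form
        ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTS [optional...] - FSTYPE SOURCE SUPEROPTS
    so the line is split at the " - " separator first.
    """
    source = None
    for line in mountinfo_text.splitlines():
        left, sep, right = line.partition(" - ")
        if not sep:
            continue  # not a mountinfo line
        pre = left.split()
        post = right.split()
        if len(pre) < 6 or len(post) < 2:
            continue  # malformed line
        if pre[4] == mount_point:
            # Later entries shadow earlier ones, so keep the last match.
            source = post[1]
    return source


def source_changed(before: str, after: str, mount_point: str = "/") -> bool:
    """True if the reported mount source differs between two captures."""
    return mount_source(before, mount_point) != mount_source(after, mount_point)


# Sample "/" entries copied from the listings in this thread.
BEFORE = "20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-1 rw,space_cache"
AFTER = "20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-2 rw,space_cache"

if __name__ == "__main__":
    print(mount_source(BEFORE), mount_source(AFTER), source_changed(BEFORE, AFTER))
```

On a live system the same functions could be fed the contents of /proc/self/mountinfo captured before and after the lvcreate --snapshot step.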