On Sun, 2015-12-06 at 22:34 +0800, Qu Wenruo wrote:
> Not sure about LVM/MD, but they should suffer the same UUID conflict
> problem.
Well, I actually had that quite often with LVM (i.e. the same UUIDs
visible on the same system), basically because we made clones from one
template VM image, and once such a clone is booted normally, LVM
doesn't allow changing the UUIDs of already active PVs/VGs/LVs (or
maybe just some of these three, I forgot the details).
But there was never any issue: when one set was already in use, LVM on
the host system continues to use it just fine, and the toolset reports
which one it would use (more below).


> The only idea I have can only enhance the behavior, but never fix it.
> For example, if found multiple btrfs devices with same devid, just
> refuse to mount.
> And for already mounted btrfs, ignore any duplicated fsid/devid.
Well, I think that's already a perfectly valid solution... basically
the idea that I had before. I'd call that a 100% fix, not just a
workaround.
If the tools (i.e. btrfstune) then allow changing the UUID of the
duplicate set of devices (perhaps again with the need to specify each
of them via device=/dev/sda, etc.), I'd be completely happy again...
and the show could go on ;)


> The problem can get even tricky for case like device missing for a
> while
> and appear again case.
I had thought about that too:
a) In the non-malicious case, this could e.g. mean that a device from a
   btrfs RAID was missing and a clone with the same UUID / dev ID gets
   added to the system.
   Possible consequences, AFAICS:
   - The data is simply auto-rebuilt on the clone.
   - Some corruption occurs when the clone is older, and data that was
     only on the newer device is now missing (not sure if this can
     happen at all or whether generation IDs prevent it).
b) In the malicious/attack case, one possible scenario could be:
   A device is missing from a btrfs RAID... the machine is left
   unattended. An attacker comes along and plugs in a USB stick with
   the missing UUID.
   Is the rebuild (and thus data leakage) now happening automatically?

In any case though, a simple solution could be that no automatic
assembly happens by default, and the people who still want it are
properly warned about the possible implications in the docs.


> But just as you mentioned, it *IS* a real problem, and we should need
> to
> enhance it.
Should one (or I) add this as a ticket to the kernel bugzilla, or as an
entry to the btrfs wiki?


> I'd like to see how LVM/DM behaves first, at least as a reference if
> they are really so safe.
Well, that's very simple to check. I did it here for the LV case only:

root@lcg-lrz-admin:~# truncate -s 1G image1
root@lcg-lrz-admin:~# losetup -f image1
root@lcg-lrz-admin:~# pvcreate /dev/loop0
  Physical volume "/dev/loop0" successfully created
root@lcg-lrz-admin:~# losetup -d /dev/loop0
root@lcg-lrz-admin:~# cp image1 image2
root@lcg-lrz-admin:~# losetup -f image1
root@lcg-lrz-admin:~# pvscan
  PV /dev/sdb     VG vg_data     lvm2 [50,00 GiB / 0 free]
  PV /dev/sda1    VG vg_system   lvm2 [9,99 GiB / 0 free]
  PV /dev/loop0                  lvm2 [1,00 GiB]
  Total: 3 [60,99 GiB] / in use: 2 [59,99 GiB] / in no VG: 1 [1,00 GiB]
root@lcg-lrz-admin:~# losetup -f image2
root@lcg-lrz-admin:~# pvscan
  Found duplicate PV tSK9Cdpw6bcmocZnxFPD6ThNz1opRXsB: using /dev/loop1 not /dev/loop0
  PV /dev/sdb     VG vg_data     lvm2 [50,00 GiB / 0 free]
  PV /dev/sda1    VG vg_system   lvm2 [9,99 GiB / 0 free]
  PV /dev/loop1                  lvm2 [1,00 GiB]
  Total: 3 [60,99 GiB] / in use: 2 [59,99 GiB] / in no VG: 1 [1,00 GiB]

Obviously, with PVs alone, there is no "x is already used" case.
As one can see, it just says it would ignore one of them, which I think
is rather stupid in that particular case (i.e. none of the devices is
already in use), because it probably just "randomly" decides which one
is to be used, which is ambiguous.


> And what will rescan show if they are not active?
My experience was always (it's just quite late and I don't want to
simulate everything right now, which is trivial anyway):
- It shows warnings about the duplicates in the tools.
- It continues to use the already active devices (if any).
- Unfortunately, while the kernel continues to use the devices already
  in use, the toolset may use another device (kinda stupid, but at
  least it warns, and the devices already in use seem to still be used
  properly).

Continuing from the setup above:

root@lcg-lrz-admin:~# losetup -d /dev/loop1
(now only image1 is seen, as loop0)
root@lcg-lrz-admin:~# vgcreate vg_test /dev/loop0
  Volume group "vg_test" successfully created
root@lcg-lrz-admin:~# lvcreate -n test vg_test -l 100
  Logical volume "test" created
root@lcg-lrz-admin:~# mkfs.ext4 /dev/vg_test/test
mke2fs 1.42.12 (29-Aug-2014)
...
root@lcg-lrz-admin:~# mount /dev/vg_test/test /mnt/
root@lcg-lrz-admin:~# losetup -a
/dev/loop0: [64768]:518297 (/root/image1)
root@lcg-lrz-admin:~# losetup -f image2
root@lcg-lrz-admin:~# vgs
  Found duplicate PV tSK9Cdpw6bcmocZnxFPD6ThNz1opRXsB: using /dev/loop1 not /dev/loop0
  VG        #PV #LV #SN Attr   VSize  VFree
  vg_data     1   1   0 wz--n- 50,00g     0
  vg_system   1   2   0 wz--n-  9,99g     0
root@lcg-lrz-admin:~# lvs
  Found duplicate PV tSK9Cdpw6bcmocZnxFPD6ThNz1opRXsB: using /dev/loop1 not /dev/loop0
  LV   VG        Attr       LSize    Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  data vg_data   -wi-ao----   50,00g
  root vg_system -wi-ao----    9,02g
  swap vg_system -wi-ao---- 1000,00m

As you can see, even though loop0 is in use (by the kernel), the
toolset would use loop1... o.O
Yeah, don't ask me why... I once had a discussion with Alastair from
the LVM people about that, forgot the exact reasons (if there were any)
and was simply happy that it continued to use the already open devices
properly.


> Or after a reboot?
Haven't checked this right now, but I guess it again just decides on
one of them (which is pretty bad).


> > I would expect that in addition to the fs UUID, it needs a form of
> > device ID...
> > so why not simply ignoring any new device for which
> > there
> > already is a matching fs UUID and device ID, unless the respective
> > tool
> > (mount, btrfs, etc.) is explicitly told so via some
> > device=/dev/sda,/dev/sdb option.
>
> IIRC, there were some btrfs-progs patches for such behavior, not sure
> about kernel part though.
> But at least an interesting method to solve the problem.
> (Better than just rejecting mounting any)
Of course, if the user doesn't specify those, it would still need to
reject mounting/using/activating/fsck'ing/etc. ...


> > If that means that less things work out of the box (in the sense of
> > "auto-assembly") well than this is simply necessary.
> > data security and consistency is definitely much more important
> > than
> > any fancy auto-magic.
>
> Can't agree any more.
> Especially when auto leads to wrong behavior (Like kernel version
> based
> probing).
Good to hear... well... you're the developer... spread the word :D


> And after all, this topic makes me remember the bugreport of fuzzed
> (but
> csum recalculated) images.
> I used to ignore them and I think that wouldn't happen.
>
> But the reporter is right, it's a btrfs security problem, and now I'm
> super happy to see such report.
As I've said, I've been quite surprised that no one seems to have
thought about that before (especially the security aspect of the
issue).


> As it's easy to fix, I can always submit some patches if there is no
> other guy faster than me. :)
Awesome... showstopper #1 seems to be about to walk away :D


> So for this one, as long as we find a good behavior to solve it, it
> won't be a big thing.
Great... keep me/us updated :)


Cheers,
Chris.
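
For reference, the policy discussed in this thread (keep using devices
that are already in use and ignore their duplicates, refuse when
nothing disambiguates, accept an explicit device= list) could be
sketched roughly as below. This is only an illustration in Python: Dev,
resolve() and its parameters are made-up names, and it is not what the
btrfs kernel code, btrfs-progs or LVM actually do.

```python
from collections import defaultdict
from typing import NamedTuple


class Dev(NamedTuple):
    """Hypothetical record for one scanned block device."""
    path: str   # e.g. "/dev/loop0"
    fsid: str   # filesystem UUID
    devid: int  # per-filesystem device ID


def resolve(scanned, in_use=frozenset(), explicit=frozenset()):
    """Sketch of the proposed duplicate handling (an assumption, not real code).

    scanned:  iterable of Dev found by a device scan
    in_use:   paths already opened by a mounted filesystem
    explicit: paths the user named, think device=/dev/sda,/dev/sdb

    Returns (chosen, ignored, refused): chosen maps (fsid, devid) to a
    path, ignored lists duplicate paths, refused is the set of fsids
    that must not be assembled automatically.
    """
    groups = defaultdict(list)
    for dev in scanned:
        groups[(dev.fsid, dev.devid)].append(dev)

    chosen, ignored, refused = {}, [], set()
    for key, devs in groups.items():
        active = [d for d in devs if d.path in in_use]
        named = [d for d in devs if d.path in explicit]
        if active:
            pick = active[0]      # already mounted: keep it
        elif len(devs) == 1:
            pick = devs[0]        # no conflict at all
        elif len(named) == 1:
            pick = named[0]       # user disambiguated explicitly
        else:
            refused.add(key[0])   # ambiguous: refuse this filesystem
            continue
        chosen[key] = pick.path
        ignored += [d.path for d in devs if d.path != pick.path]

    # A refused filesystem contributes no devices at all.
    chosen = {k: p for k, p in chosen.items() if k[0] not in refused}
    return chosen, sorted(ignored), refused
```

With two clones carrying the same fsid/devid and neither in use,
resolve() refuses that fsid. Once one of them is passed in in_use or in
explicit, that one is chosen and the other ends up in the ignored list.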