On Sun, 2015-12-06 at 22:34 +0800, Qu Wenruo wrote:
> Not sure about LVM/MD, but they should suffer the same UUID conflict
> problem.
Well, I actually had that quite often with LVM (i.e. the same UUIDs
visible on the same system), basically because we made clones from one
template VM image, and once such a clone is booted normally, LVM doesn't
allow changing the UUIDs of already active PVs/VGs/LVs (or maybe just
some of these three, I forgot the details).

But there was never any issue: when one set was already in use, LVM on
the host system continued to use it just fine, and the toolset reports
which one it would use (more below).


> The only idea I have can only enhance the behavior, but never fix it.
> For example, if found multiple btrfs devices with same devid, just 
> refuse to mount.
> And for already mounted btrfs, ignore any duplicated fsid/devid.
Well I think that's already a perfectly valid solution... basically the
idea that I had before.
I'd call that a 100% fix, not just a workaround.

If the tools (e.g. btrfstune) then allow changing the UUID of the
duplicate set of devices (perhaps again with the need to specify each of
them via device=/dev/sda, etc.), I'd be completely happy again... and the
show could go on ;)
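
To make the proposed behaviour concrete, here is a minimal sketch in Python of the two rules quoted above: refuse to mount when two different devices claim the same fsid/devid, and ignore late-arriving duplicates of devices that are already in use. This is only an illustration of the policy, not btrfs code; the names Device, Registry and DuplicateDeviceError are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Device:
    path: str   # e.g. "/dev/sda" (hypothetical example value)
    fsid: str   # filesystem UUID
    devid: int  # per-filesystem device ID


class DuplicateDeviceError(Exception):
    """Raised when a mount request would be ambiguous."""


@dataclass
class Registry:
    # (fsid, devid) -> Device, for devices of already mounted filesystems
    active: dict = field(default_factory=dict)

    def scan(self, dev):
        """Return True if dev may be registered, False if it is ignored
        because another device with the same fsid/devid is already in use."""
        owner = self.active.get((dev.fsid, dev.devid))
        return owner is None or owner.path == dev.path

    def mount(self, candidates):
        """Refuse to mount if two different paths share one fsid/devid."""
        seen = {}
        for dev in candidates:
            key = (dev.fsid, dev.devid)
            if key in seen and seen[key].path != dev.path:
                raise DuplicateDeviceError(
                    f"{seen[key].path} and {dev.path} share fsid/devid {key}"
                )
            seen[key] = dev
        self.active.update(seen)
        return sorted(d.path for d in seen.values())
```

With this sketch, mounting a set that contains both the original and its clone fails, while a clone that shows up after a successful mount is simply ignored by scan().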

> The problem can get even tricky for case like device missing for a while
> and appear again case.
I had thought about that too:
a) In the non-malicious case, this could e.g. mean that a device from a
btrfs RAID was missing and a clone with the same UUID / dev ID gets
added to the system.
Possible consequences, AFAICS:
- The data is simply auto-rebuilt on the clone.
- Some corruption occurs when the clone is older, and data that was
only on the newer device is now missing (not sure if this can happen at
all or whether generation IDs prevent it).

b) In the malicious/attack case, one possible scenario could be:
A device is missing from a btrfs RAID... the machine is left
unattended. An attacker comes along and plugs in a USB stick carrying
the missing UUID. Does the rebuild (and thus the data leakage) now
happen automatically?

In any case, a simple solution could be that no automatic assembly
happens by default, and that people who still want it are properly
warned about the possible implications in the docs.


> But just as you mentioned, it *IS* a real problem, and we should need to
> enhance it.
Should one (or I) add this as a ticket to the kernel bugzilla, or as an
entry to the btrfs wiki?


> I'd like to see how LVM/DM behaves first, at least as a reference if 
> they are really so safe.
Well, that's very simple to check; I did it here for the PV case only:
root@lcg-lrz-admin:~# truncate -s 1G image1
root@lcg-lrz-admin:~# losetup -f image1 
root@lcg-lrz-admin:~# pvcreate /dev/loop0
  Physical volume "/dev/loop0" successfully created
root@lcg-lrz-admin:~# losetup -d /dev/loop0 
root@lcg-lrz-admin:~# cp image1 image2
root@lcg-lrz-admin:~# losetup -f image1 
root@lcg-lrz-admin:~# pvscan 
  PV /dev/sdb     VG vg_data     lvm2 [50,00 GiB / 0    free]
  PV /dev/sda1    VG vg_system   lvm2 [9,99 GiB / 0    free]
  PV /dev/loop0                  lvm2 [1,00 GiB]
  Total: 3 [60,99 GiB] / in use: 2 [59,99 GiB] / in no VG: 1 [1,00 GiB]
root@lcg-lrz-admin:~# losetup -f image2 
root@lcg-lrz-admin:~# pvscan 
  Found duplicate PV tSK9Cdpw6bcmocZnxFPD6ThNz1opRXsB: using /dev/loop1 not /dev/loop0
  PV /dev/sdb     VG vg_data     lvm2 [50,00 GiB / 0    free]
  PV /dev/sda1    VG vg_system   lvm2 [9,99 GiB / 0    free]
  PV /dev/loop1                  lvm2 [1,00 GiB]
  Total: 3 [60,99 GiB] / in use: 2 [59,99 GiB] / in no VG: 1 [1,00 GiB]

Obviously, with PVs alone, there is no "x is already used" case. As one
can see, it just says it would ignore one of them, which I think is
rather stupid in that particular case (i.e. none of the devices is in
use yet), because it probably just "randomly" decides which one is to
be used, which is ambiguous.
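
Since the tools only print a warning, a script that wants to treat such ambiguity as an error has to look for that warning itself. A minimal sketch in Python, assuming the message format is exactly the one shown in the transcript above (other LVM versions may word it differently); find_duplicate_pvs is a hypothetical helper name, not part of LVM:

```python
import re

# Pattern copied from the warning in the transcript above; "\s+" also
# tolerates the line wrap that mail clients insert after "not".
DUPLICATE_RE = re.compile(r"Found duplicate PV (\S+): using (\S+) not\s+(\S+)")


def find_duplicate_pvs(tool_output):
    """Return one dict per duplicate-PV warning found in the tool output."""
    return [
        {"uuid": uuid, "used": used, "ignored": ignored}
        for uuid, used, ignored in DUPLICATE_RE.findall(tool_output)
    ]
```

A caller could feed it the captured output of pvscan, vgs or lvs and stop as soon as the returned list is non-empty, instead of relying on whichever device the tool happened to pick.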


> And what will rescan show if they are not active?
My experience was always (it's just quite late and I don't want to
simulate everything right now, which is trivial anyway):
- It shows warnings about the duplicates in the tools
- It continues to use the already active devices (if any)
- Unfortunately, while the kernel continues to use the devices already
in use, the toolset may pick another device (kinda stupid, but at least
it warns, and the already used devices still seem to be used properly):

continuation from the setup above:
root@lcg-lrz-admin:~# losetup -d /dev/loop1 
(now only image1 is seen as loop0)
root@lcg-lrz-admin:~# vgcreate vg_test /dev/loop0
  Volume group "vg_test" successfully created
root@lcg-lrz-admin:~# lvcreate -n test vg_test -l 100
  Logical volume "test" created
root@lcg-lrz-admin:~# mkfs.ext4 /dev/vg_test/test 
mke2fs 1.42.12 (29-Aug-2014)
...
root@lcg-lrz-admin:~# mount /dev/vg_test/test /mnt/
root@lcg-lrz-admin:~# losetup -a
/dev/loop0: [64768]:518297 (/root/image1)
root@lcg-lrz-admin:~# losetup -f image2 
root@lcg-lrz-admin:~# vgs
  Found duplicate PV tSK9Cdpw6bcmocZnxFPD6ThNz1opRXsB: using /dev/loop1 not /dev/loop0
  VG        #PV #LV #SN Attr   VSize  VFree
  vg_data     1   1   0 wz--n- 50,00g    0 
  vg_system   1   2   0 wz--n-  9,99g    0 
root@lcg-lrz-admin:~# lvs
  Found duplicate PV tSK9Cdpw6bcmocZnxFPD6ThNz1opRXsB: using /dev/loop1 not /dev/loop0
  LV   VG        Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data vg_data   -wi-ao----   50,00g
  root vg_system -wi-ao----    9,02g
  swap vg_system -wi-ao---- 1000,00m

As you can see, even though loop0 is in use (by the kernel), the toolset
would use loop1... o.O
Yeah, don't ask me why... I once had a discussion with Alastair from
the LVM people about that, forgot the exact reasons (if there were any)
and I was simply happy that it continued to use the already open
devices properly.


>  Or after a reboot?
Haven't checked this right now but I guess it again just decides on one
of them (which is pretty bad).


> > I would expect that in addition to the fs UUID, it needs a form of
> > device ID... so why not simply ignoring any new device for which there
> > already is a matching fs UUID and device ID, unless the respective tool
> > (mount, btrfs, etc.) is explicitly told so via some
> > device=/dev/sda,/dev/sdb option.
> 
> IIRC, there were some btrfs-progs patches for such behavior, not sure
> about kernel part though.
> But at least an interesting method to solve the problem.
> (Better than just rejecting mounting any)
Of course, if the user doesn't specify those, it would still need to
reject mounting/using/activating/fsck'ing/etc. ...
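
The combination of "reject when ambiguous" and "accept an explicit device list" can be sketched as follows. Again this is only an illustration in Python under my own assumptions: select_devices is a hypothetical helper, and the (path, fsid, devid) tuples stand in for whatever a real scan would return.

```python
from collections import defaultdict


def select_devices(scanned, explicit=None):
    """Pick the devices to use from (path, fsid, devid) tuples.

    If explicit is given, only the paths listed there are considered.
    Any remaining fsid/devid collision is rejected, never guessed at.
    """
    if explicit is not None:
        wanted = set(explicit)
        scanned = [dev for dev in scanned if dev[0] in wanted]

    by_key = defaultdict(list)
    for path, fsid, devid in scanned:
        by_key[(fsid, devid)].append(path)

    ambiguous = {key: sorted(paths) for key, paths in by_key.items()
                 if len(paths) > 1}
    if ambiguous:
        raise ValueError(f"ambiguous devices, refusing to proceed: {ambiguous}")
    return sorted(paths[0] for paths in by_key.values())
```

Note that an explicit list does not bypass the check: naming both the original and its clone still results in a refusal.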


> > If that means that less things work out of the box (in the sense of
> > "auto-assembly") well than this is simply necessary.
> > data security and consistency is definitely much more important than
> > any fancy auto-magic.
> 
> Can't agree any more.
> Especially when auto leads to wrong behavior (Like kernel version based
> probing).
Good to hear... well... you're the developer... spread the word :D


> And after all, this topic makes me remember the bugreport of fuzzed (but
> csum recalculated) images.
> I used to ignore them and I think that wouldn't happen.
> 
> But the reporter is right, it's a btrfs security problem, and now I'm
> super happy to see such report.
As I've said, I've been quite surprised that no one seems to have
thought about that before (especially the security aspect of that
issue).


> As it's easy to fix, I can always submit some patches if there is no 
> other guy faster than me. :)
Awesome... showstopper #1 seems to be about to walk away :D


> So for this one, as long as we find a good behavior to solve it, it 
> won't be a big thing.
Great... keep me/us updated :)


Cheers,
Chris.
