On Tue, Nov 25, 2014 at 05:34:15PM +0100, Goffredo Baroncelli wrote:
> On 11/23/2014 01:19 AM, Zygo Blaxell wrote:
> [...]
> > md-raid works as long as you specify the devices, and because it's always
> > the lowest layer it can ignore LVs (snapshot or otherwise).  It's also
> > not a particularly common use case, while making an LV snapshot of a
> > filesystem is a typical use case.
> 
> I fully agree; but you are still considering a *multi-device* btrfs over
> lvm... This is like dm over lvm... which doesn't make sense at all (as
> you already wrote).

It makes sense for btrfs because btrfs can productively use LVs on
different PVs (e.g. btrfs-raid1 on two LVs, one on each PV).  LVM is
the bottom layer because not everything in the world is btrfs--things
like ephemeral /tmp, boot, swap, and temporary backup copies of the btrfs
(e.g.  before running btrfsck) have to live on the same physical drives
as the btrfs filesystems.
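
Roughly the kind of layout I mean (disk, VG, and LV names here are made
up for illustration):

        # two physical drives, with LVM as the bottom layer
        pvcreate /dev/sda1 /dev/sdb1
        vgcreate vg0 /dev/sda1 /dev/sdb1

        # pin each LV to a different PV so btrfs-raid1 really spans two drives
        lvcreate -L 100G -n lv00 vg0 /dev/sda1
        lvcreate -L 100G -n lv01 vg0 /dev/sdb1

        # btrfs raid1 for data and metadata across the two LVs
        mkfs.btrfs -d raid1 -m raid1 /dev/vg0/lv00 /dev/vg0/lv01

Whatever is left in vg0 is then free for swap, /tmp, scratch copies, and
so on.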

> >>> and mounting the filesystem fails at 3.  
> >> Are you sure ?
> > 
> > Yes, I'm sure.  I've had to replace filesystems destroyed this way.
> > 
> >> [working instance snipped]
> > 
> >> On the basis of the example above, in the case where you want to mount
> >> a "single-disk" filesystem, BTRFS seems to me to work properly. You only
> >> have to pay attention not to mount the two filesystems at the same time.
> > 
> > The problem is btrfs stops searching when it sees one disk with each UUID,
> 
> BTRFS doesn't search anything. It is udev which "pushes" the information
> to the kernel module. The btrfs module groups this information by UUID.
> When a new disk is inserted, it overwrites the information of the old one.

Same result:  when presented with multiple devices with the same UUID,
one is chosen arbitrarily instead of rejecting all of them.

> > so the set of disks (snapshot vs origin) that you get is *random*.
> > For a pair of origin + snapshots, there's a 50% chance it works, 50%
> > chance it eats your data.
> 
> Sorry but I have to disagree: the code is quite clear
> (see fs/btrfs/volumes.c, near line 512):
> 
> [...]
> 
>         } else if (!device->name || strcmp(device->name->str, path)) {
>                 /*
>                  * When FS is already mounted.
>                  * 1. If you are here and if the device->name is NULL that
>                  *    means this device was missing at time of FS mount.
>                  * 2. If you are here and if the device->name is different
>                  *    from 'path' that means either
>                  *      a. The same device disappeared and reappeared with
>                  *         different name. or
>                  *      b. The missing-disk-which-was-replaced, has
>                  *         reappeared now.

If the FS is already mounted then there is no issue.  It's when you're trying
to mount the FS that the fun occurs.

>                  *
>                  * We must allow 1 and 2a above. But 2b would be a spurious
>                  * and unintentional.
> 
> [...]
> 
> The case here is 2a; in this case btrfs stores the new name and mounts it.
> 
> Anyway I made a small test: I created one btrfs filesystem and made an
> lvm snapshot. Then I created two different files, one in the snapshot and
> one in the original. I ran a program which randomly mounts one or the
> other and checks whether the correct file is present; after more than 130
> tests I never saw your "50% chance it works": it always works.

The failing case is one btrfs filesystem on two LVs, with a snapshot of
each LV also present.  So you'd have:

        lv00 - btrfs device 1
        lv01 - btrfs device 2
        lv00snap - snapshot of lv00
        lv01snap - snapshot of lv01

If you mount by filesystem UUID then you get one of these results at random:

        lv00 + lv01 - OK
        lv00snap + lv01snap - also OK
        lv00 + lv01snap - failure
        lv00snap + lv01 - failure

2 failures, 2 successes = 50% failure rate.
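
For concreteness, the snapshots could come from something like this
(sizes and names are arbitrary, continuing the made-up vg0 layout above):

        # snapshot both LVs backing the filesystem
        lvcreate -s -L 10G -n lv00snap vg0/lv00
        lvcreate -s -L 10G -n lv01snap vg0/lv01

        # all four LVs now carry the same btrfs filesystem UUID, so a
        # UUID-based mount ends up picking two of them more or less at random
        btrfs device scan
        mount UUID=<the-filesystem-uuid> /mnt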

If you mount by the name of one of the devices then you only get the two
rows of the above table that match the device you named, but you still
get one success row and one failure row.

Which result you get seems to depend on the order in which LVM enumerates
the LVs, so if you are doing a mount/umount loop then you won't see any
problems as btrfs will consistently make the same choice of LVs over
and over again.  Rebooting or creating other LVs in between mounts will
definitely cause problems.
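
For example, a test harness along these lines (made-up paths, continuing
the vg0 example above) exercises only one row of the table above, over
and over:

        # mount by the name of one device; btrfs picks the partner LV itself
        for i in $(seq 1 130); do
            btrfs device scan
            mount /dev/vg0/lv00 /mnt
            # the origin pair and the snapshot pair were seeded with
            # different marker files beforehand
            test -e /mnt/origin-marker && echo "iteration $i: got the origin pair"
            umount /mnt
        done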
