Re: BTRFS messes up snapshot LV with origin

2014-12-01 Thread Zygo Blaxell
On Fri, Nov 28, 2014 at 11:55:07PM -0800, Robert White wrote:
 On 11/28/2014 08:59 PM, Zygo Blaxell wrote:
 On Fri, Nov 28, 2014 at 06:05:48PM +0100, Goffredo Baroncelli wrote:
 On 11/27/2014 05:15 AM, Zygo Blaxell wrote:
 This is a weakness of the current udev and asynchronous device hotplug
 concept:  there is no notion of bus enumeration in progress, so we can be
 trying to assemble multi-device storage before we have all the devices
 visible.  Assembly of aggregate storage (whatever it is--btrfs, md,
 lvm2...) has to wait until all known storage buses are fully enumerated
 in order to detect if there are duplicates.
 
 It is more complex than that. Some devices may appear after the 1st bus
 enumeration.
 
 That case is well handled already--a new enumeration will start with the
 second (and all later) hotplug events.
 
 The problem arises when we try to assemble disk arrays before the
 known end of the 1st (or any) enumeration.  There is no way for an
 enumerating agent to tell other agents "this is definitely not the
 complete list of devices yet, other devices may be inserted imminently"
 and defer all the multi-device assembly until the address space of the
 enumerating bus is fully covered.
 
 MDADM has an attached but not started state for arrays that
 handles this condition during incremental assembly. (see mdadm
 --incremental /dev/whatever),

 [...very complicated mdadm-architecture-invades-the-filesystem-layer
 thing snipped...]

I don't see why it can't all be done in user-space more or less the same
way LVM does.  Scan all the partitions known to be available, build a
table of devices with UUIDs matching the target filesystem, check for
sufficiency, check for uniqueness, and if the configuration passes all the
sanity checks (or we have hints from the user that resolve ambiguity),
submit the entire list of devices to the kernel as a BTRFS filesystem.
If there are UUID duplicates or missing devices, submit nothing to the
kernel at all.
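
A rough sketch of that all-or-nothing check, for readers following along. The `Candidate` record, the `plan_assembly` name, and the idea that the scan results arrive as a simple list are assumptions made for illustration; none of this is an existing btrfs or LVM interface.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    """One scanned block device (hypothetical record, e.g. filled from a scan)."""
    path: str
    fs_uuid: str   # filesystem UUID shared by all member devices
    dev_uuid: str  # per-device UUID, expected to be unique within the filesystem


def plan_assembly(scanned: list[Candidate], fs_uuid: str,
                  expected_devices: int) -> Optional[list[str]]:
    """Return the device paths to submit, or None if any sanity check fails."""
    members = [c for c in scanned if c.fs_uuid == fs_uuid]
    counts = Counter(c.dev_uuid for c in members)
    if any(n > 1 for n in counts.values()):
        return None  # uniqueness failed (e.g. a snapshot next to its origin)
    if len(members) != expected_devices:
        return None  # sufficiency failed: submit nothing at all
    return sorted(c.path for c in members)
```

The function either returns the complete device list or nothing, which mirrors the "submit nothing to the kernel at all" rule above; resolving an ambiguity from user hints is deliberately left out.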

initramfs-less multi-disk configurations can calculate all that in
advance and generate a rootflags parameter for the kernel command line.
It's not necessary to resolve every possible situation in the kernel.
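
As an illustration of precomputing the parameter, here is a minimal sketch that turns an already-verified device list into a rootflags string using the btrfs `device=` mount option. The function name and the example paths are hypothetical.

```python
def btrfs_rootflags(devices: list[str]) -> str:
    """Build a rootflags= kernel parameter naming every member device."""
    if not devices:
        raise ValueError("need at least one member device")
    # One device= option per member, comma-separated as in mount -o.
    return "rootflags=" + ",".join(f"device={d}" for d in devices)


# Example paths are placeholders, not taken from any real system.
print(btrfs_rootflags(["/dev/sda2", "/dev/sdb2"]))
```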





Re: BTRFS messes up snapshot LV with origin

2014-11-29 Thread Robert White

On 11/28/2014 11:35 PM, Goffredo Baroncelli wrote:

I agree with you; but I have to find a default so during the boot
a system can start even if snapshots are present.


No, you really _don't_ need to find such a default.

Better a system that doesn't boot than one that boots based on a guess.

I've been spending a lot of time thinking about booting while writing 
underdog (http://underdog.sourceforge.net) and while booting is fragile, 
an even partially incorrect boot is a system and _security_ nightmare.


If you start making preferential guesses then an intruder could trick
the system into booting from a thumb-drive or other alternate media by
coercing a UUID collision in a way that the system picks the new media.


Conflicts should _never_ be guessed at during boot. Ever.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS messes up snapshot LV with origin

2014-11-29 Thread Robert White

On 11/28/2014 11:29 PM, Duncan wrote:

Since I can't/won't run pretty much anything proprietary, there's little
chance of it being taken as anything but Linux, here.  (Tho I actually
use (c)gdisk for partitioning here and it appears to use a different GUID.
(0700 in its short form which AFAIK is gdisk specific, for MS basic data,
while it uses 8300 for general Linux filesystems.  I could look up the
long form GUIDs, but meh...)


Partition type codes (e.g. 0700, 8300, EF00, etc) have _nothing_ to do 
with UUIDs. They are type codes. They aren't short form of anything 
else at all. In fact 0700 is the _long_ _form_ of the original code of 
7, but in big-endian order now that it went from one byte to two.


Microsoft started using pre-assigned UUIDs as classes, e.g. type codes
they could cram into their various registry files. If you actually read
the registry you'll find a lot of places where a "rational word" is
defined as {some_uuid_here} and then elsewhere {some_uuid_here} has a
bunch of data items attached to it.


So gpartd didn't reuse Microsoft UUIDs.

In some/many of the older formats there was a code for operating system
data (which I think is what 7 was originally). Others came by and said
"since we're going to put in a type code for linux swap (82) then let's
put in a code for linux data as well (83)", and all this before the whole
byte expansion to turn these things from bytes into two-byte words.


Once everybody else picked their own type codes for their data
partitions, everybody just started calling 7 "microsoft data". And linux
doesn't care at all, since it's noise: every partition just ends up
as /dev/[sh]d? anyway.


All this stuff has historical reasons. GNU/Linux attempts to be an 
egalitarian actor so it adapts to whatever you do.



Re: BTRFS messes up snapshot LV with origin

2014-11-29 Thread Duncan
Robert White posted on Sat, 29 Nov 2014 00:20:11 -0800 as excerpted:

 On 11/28/2014 11:29 PM, Duncan wrote:
 (Tho I actually use (c)gdisk for partitioning here and it appears to
 use a different GUID. (0700 in its short form which AFAIK is gdisk
 specific, for MS basic data, while it uses 8300 for general Linux
 filesystems.  I could look up the long form GUIDs, but meh...)
 
 Partition type codes (e.g. 0700, 8300, EF00, etc) have _nothing_ to do
 with UUIDs. They are type codes. They aren't short form of anything
 else at all. In fact 0700 is the _long_ _form_ of the original code of
 7, but in big-endian order now that it went from one byte to two.

You obviously know where the short forms originated (MBR type codes), but 
you haven't the foggiest what you're talking about in relation to gdisk, 
where they're used as 4-hex-char entry shortcuts for the similar GPT/EFI 
GUIDs.  Now that's what I expected with the mention of a different 
partition editor, thus my mention that they were shortcuts for GUIDs, 
apparently gdisk specific, but in gdisk they certainly ARE shortcuts to 
the various GUIDs and you certainly do *NOT* know what you're talking 
about saying they are not even related.

From the gdisk (8) manpage entry for the l/list action:

l   Display a summary of partition types. GPT uses a GUID to
identify partition types for particular OSes and purposes. For
ease of data entry, gdisk compresses these into two-byte
(four-digit hexadecimal) values that are related to their 
equivalent MBR codes.  Specifically, the MBR code is multiplied
by hexadecimal 0x0100. For instance, the code for Linux swap
space in MBR is 0x82, and it's 0x8200 in gdisk. A one-to-one
correspondence is impossible, though. Most notably, the codes
for all varieties of FAT and NTFS partition correspond to a
single GPT code (entered as 0x0700 in sgdisk).  Some OSes use a
single MBR code but employ many more codes in GPT. For these,
gdisk adds code numbers sequentially, such as 0xa500 for a
FreeBSD disklabel, 0xa501 for FreeBSD boot, 0xa502 for FreeBSD
swap, and so on. Note that these two-byte codes are unique to
gdisk.
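
The arithmetic described in that manpage excerpt can be checked in a couple of lines. This is only a sketch of the "multiply by 0x0100" rule; the function name is made up, and it does not cover the sequentially assigned codes such as 0xa501.

```python
def gdisk_code_from_mbr(mbr_type: int) -> int:
    """Apply the manpage rule: gdisk code = one-byte MBR type * 0x0100."""
    if not 0 <= mbr_type <= 0xFF:
        raise ValueError("MBR partition types are a single byte")
    return mbr_type * 0x0100


print(f"{gdisk_code_from_mbr(0x82):04X}")  # Linux swap: prints 8200
```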

See also the gdisk home page:

http://www.rodsbooks.com/gdisk/

In particular, see the gdisk walkthru here:

http://www.rodsbooks.com/gdisk/walkthrough.html

... and the gdisk manpage I quoted above here:

http://www.rodsbooks.com/gdisk/gdisk.html


So as I said, gdisk uses a 4-hexit short code based on the legacy MBR 
type-code as an easy entry and display form referencing the longer and 
much less human readable GUIDs, just like I said, and such usage is gdisk 
specific, just like I said I thought it was.

And you might have known the legacy MBR type-codes from which they were 
derived, but obviously you had no idea what I was talking about here, and 
despite my saying it was gdisk specific you decided to simply claim I 
didn't know what I was talking about without actually checking the 
situation, despite my telling you exactly what app I was referring to and 
that I thought those references were app-specific, giving you plenty of 
chance to actually look it up yourself if you decided to, or simply not 
argue that point if you weren't interested in checking out the app-
specific stuff.

=:^(

 Microsoft started using pre-assigned UUIDs as classes, e.g. type codes
 they could cram into their various registry files. If you actually read
 the registry you'll find a lot of places where a "rational word" is
 defined as {some_uuid_here} and then elsewhere {some_uuid_here} has a
 bunch of data items attached to it.

FWIW I know about the MS registry stuff from actually doing MS-registry
and API related programming (hobbyist/VB level but using the regular API
not just the VB exposed stuff) back before the turn of the century.  I've
not touched it in nearly a decade and a half now and my knowledge is
consequently dated 9x vintage, but it obviously had the registry and I
used to be /quite/ familiar with it, including of course the UUIDs.

 So gpartd didn't reuse Microsoft UUIDs.
 
 In some/many of the older formats there was a code for operating system
 data (which I think is what 7 was originally). Others came by and said
 "since we're going to put in a type code for linux swap (82) then let's
 put in a code for linux data as well (83)", and all this before the whole
 byte expansion to turn these things from bytes into two-byte words.
 
 Once everybody else picked their own type codes for their data
 partitions, everybody just started calling 7 "microsoft data". And linux
 doesn't care at all, since it's noise: every partition just ends up
 as /dev/[sh]d? anyway.
 
 All this stuff has historical reasons. GNU/Linux attempts to be an
 egalitarian actor so it adapts to whatever you do.

This part I have no disagreement with...


-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program 

Re: BTRFS messes up snapshot LV with origin

2014-11-29 Thread Robert White

On 11/29/2014 01:41 AM, Duncan wrote:

Robert White posted on Sat, 29 Nov 2014 00:20:11 -0800 as excerpted:
l   Display a summary of partition types. GPT uses a GUID to
identify partition types for particular OSes and purposes. For
ease of data entry, gdisk compresses these into two-byte
(four-digit hexadecimal) values that are related to their
equivalent MBR codes.  Specifically, the MBR code is multiplied
by hexadecimal 0x0100.


That EFI uses GUIDs is one thing. That the standard allows these to be
selected based on type codes originally derived from ms-dos partition
type codes ("compressed" is the wrong word) is something else. If they
were "compressed" then it would be a relationship that could represent
any GUID at all. It's marginally "hashed", in that there is a table
lookup, but it's not properly hashed, as the hash function is
undefined for virtually all possible input values.



The other partition GUID is actually more interesting.



So as I said, gdisk uses a 4-hexit short code based on the legacy MBR
type-code as an easy entry and display form referencing the longer and
much less human readable GUIDs, just like I said, and such usage is gdisk
specific, just like I said I thought it was.


Which is not what you said. None of the above was mentioned in the email 
to which I responded.


What you actually said ::

[QUOTE]
Since I can't/won't run pretty much anything proprietary, there's little 
chance of it being taken as anything but Linux, here.  (Tho I actually 
use (c)gdisk for partitioning here and it appears to use a different 
GUID. (0700 in its short form which AFAIK is gdisk specific, for MS 
basic data, while it uses 8300 for general Linux filesystems.  I could 
look up the long form GUIDs, but meh...)

[/QUOTE]

None of which is gdisk specific, and all of which is based on EFI and 
the GUID partition table.


What I mistakenly attributed to you and was key to my initial response 
was your extension of Chris Murphy:

 Chris Murphy posted on Fri, 28 Nov 2014 00:10:40 -0700 as excerpted:
 A very good example of WTF reusage of a UUID that irks me to no end is
 GNU parted devs decided to recycle the Microsoft Windows Basic Data
 partition type GUID for Linux partitions. It's like watching someone get
 run over by a zamboni with 50 feet of advance notice...

[So my bad there on the quoting...]

The irking there being dumb because the universally used type GUID has 
nothing to do with the second GUID that universally identifies the 
partition regardless of type.
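
For readers unsure how the two GUIDs sit side by side, here is a small sketch that splits one GPT partition entry into its fields, assuming the standard 128-byte entry layout (type GUID, unique partition GUID, first LBA, last LBA, attributes, UTF-16LE name). The function name and the returned dictionary keys are illustrative only.

```python
import struct
import uuid


def parse_gpt_entry(entry: bytes) -> dict:
    """Split one 128-byte GPT partition entry into its fields."""
    if len(entry) < 128:
        raise ValueError("expected at least 128 bytes for a GPT entry")
    # GUIDs are stored with their first three fields little-endian.
    type_guid = uuid.UUID(bytes_le=entry[0:16])     # what kind of partition
    unique_guid = uuid.UUID(bytes_le=entry[16:32])  # which partition exactly
    first_lba, last_lba, attributes = struct.unpack_from("<QQQ", entry, 32)
    name = entry[56:128].decode("utf-16-le").rstrip("\x00")
    return {
        "type_guid": type_guid,
        "unique_guid": unique_guid,
        "first_lba": first_lba,
        "last_lba": last_lba,
        "attributes": attributes,
        "name": name,
    }
```

The type GUID is shared by every partition of the same kind, while the unique GUID is generated per partition, so only the first one is affected by the choice of type code in a partitioning tool.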


But here is the thing... for all the screed about open and closed 
source... (and I am an open source guy myself) The actual EFI standard 
dictates these partition numbers and whatnot so if you used the 
microsoft tools you'd get the same results.


http://en.wikipedia.org/wiki/GUID_Partition_Table#Partition_type_GUIDs

AND microsoft was one of several principal players in the EFI and its
GUID partition subparts.


So his being "irked to no end" and your agreement and "that's why I used
gdisk" response are both completely misplaced, and potentially
misleading to others.


I just went a little off the rails while trying to explain. /D'oh.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS messes up snapshot LV with origin

2014-11-29 Thread Robert White
To those reading along who don't already know. My explanation below is 
factually inadequate or wrong in various places...


The type codes as presented in the various EFI/GUID disk partitioning 
tools as 0700, 8200, 8300, EF02, and so on are never written to disk as 
such. They are short-hand values (chosen to be deliberately similar to 
the MS-DOS partitioning type codes of 07, 82, 83, etc) to select 
standardized GUIDs for the partition type field.


So there is the two-digit code from the ms-dos partitioning scheme, then
there are the four-digit codes that let you select which type GUID will
be written in an EFI partition scheme.
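
A minimal sketch of that shorthand-to-GUID selection, using a handful of well-known partition type GUIDs. The table is deliberately tiny and the names `GDISK_TYPE_GUIDS` and `type_guid_for` are invented for this example; a real tool carries a much longer list.

```python
import uuid

# Four-digit shorthand -> partition type GUID actually written to disk.
GDISK_TYPE_GUIDS = {
    0x0700: "EBD0A0A2-B9E5-4433-87C0-68B6B72699C7",  # Microsoft basic data
    0x8200: "0657FD6D-A4AB-43C4-84E5-0933C84B4F4F",  # Linux swap
    0x8300: "0FC63DAF-8483-4772-8E79-3D69D8477DE4",  # Linux filesystem
    0xEF00: "C12A7328-F81F-11D2-BA4B-00A0C93EC93B",  # EFI System Partition
    0xEF02: "21686148-6449-6E6F-744E-656564454649",  # BIOS boot partition
}


def type_guid_for(code: int) -> uuid.UUID:
    """Look up the type GUID for a shorthand code; KeyError if unknown."""
    return uuid.UUID(GDISK_TYPE_GUIDS[code])
```

The lookup only works for codes present in the table, which is why the shorthand is a selection aid and not an encoding of the GUID itself.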


The question of reuse is still improper as the type codes were 
assigned by the EFI standard for specific use as type codes. The EFI 
tool used (gdisk, or windows disk partitioning tool, etc) is immaterial 
as the result codes are selected by standard.


I could have, and should have, been _way_ more clear, and/or less wrong. 8-)

http://en.wikipedia.org/wiki/GUID_Partition_Table#Partition_type_GUIDs


On 11/29/2014 12:20 AM, Robert White wrote:

On 11/28/2014 11:29 PM, Duncan wrote:

Since I can't/won't run pretty much anything proprietary, there's little
chance of it being taken as anything but Linux, here.  (Tho I actually
use (c)gdisk for partitioning here and it appears to use a different
GUID.
(0700 in its short form which AFAIK is gdisk specific, for MS basic data,
while it uses 8300 for general Linux filesystems.  I could look up the
long form GUIDs, but meh...)


Partition type codes (e.g. 0700, 8300, EF00, etc) have _nothing_ to do
with UUIDs. They are type codes. They aren't short form of anything
else at all. In fact 0700 is the _long_ _form_ of the original code of
7, but in big-endian order now that it went from one byte to two.

Microsoft started using pre-assigned UUIDs as classes, e.g. type codes
they could cram into their various registry files. If you actually read
the registry you'll find a lot of places where a "rational word" is
defined as {some_uuid_here} and then elsewhere {some_uuid_here} has a
bunch of data items attached to it.

So gpartd didn't reuse Microsoft UUIDs.

In some/many of the older formats there was a code for operating system
data (which I think is what 7 was originally). Others came by and said
"since we're going to put in a type code for linux swap (82) then let's
put in a code for linux data as well (83)", and all this before the whole
byte expansion to turn these things from bytes into two-byte words.

Once everybody else picked their own type codes for their data
partitions, everybody just started calling 7 microsoft data. And linux
doesn't care at all since it's noise since every partition just ends up
as /dev/[sh]d? anyway.

All this stuff has historical reasons. GNU/Linux attempts to be an
egalitarian actor so it adapts to whatever you do.


Re: BTRFS messes up snapshot LV with origin

2014-11-29 Thread Chris Murphy
On Sat, Nov 29, 2014 at 1:20 AM, Robert White rwh...@pobox.com wrote:
 On 11/28/2014 11:29 PM, Duncan wrote:

 Since I can't/won't run pretty much anything proprietary, there's little
 chance of it being taken as anything but Linux, here.  (Tho I actually
 use (c)gdisk for partitioning here and it appears to use a different GUID.
 (0700 in its short form which AFAIK is gdisk specific, for MS basic data,
 while it uses 8300 for general Linux filesystems.  I could look up the
 long form GUIDs, but meh...)


 Partition type codes (e.g. 0700, 8300, EF00, etc) have _nothing_ to do with
 UUIDs. They are type codes. They aren't short form of anything else at
 all. In fact 0700 is the _long_ _form_ of the original code of 7, but in
 big-endian order now that it went from one byte to two.

No, that's not correct. These four-digit type codes are a user-facing
friendly type code; the actual on-disk partition type GUID is a UUID
in that at the time of creation that UUID followed RFC 4122 so it was
unique: no one else was using the UUID. That UUID in the context of a
partition type GUID is intended to describe the purpose of that
partition: what OS, what file system, where it should mount or be used
for, etc. This is elaborately detailed in the GPT (GUID partition
table) portion of the UEFI specification. A 128-bit type code is
rather difficult for humans to remember and interact with, hence gdisk
and recently fdisk now use a four-digit type code as a front end for
the partition type GUID. The selection of four digits was to account for
the fact that there are many, many more type codes now possible,
essentially unlimited.

This is a case where UUIDs are reused effectively.



 Microsoft started using pre-assigned UUIDs as classes, e.g. type codes
 they could cram into their various registry files. If you actually read the
 registry you'll find a lot of places where a "rational word" is defined as
 {some_uuid_here} and then elsewhere {some_uuid_here} has a bunch of data items
 attached to it.

 So gpartd didn't reuse Microsoft UUIDs.

GNU parted absolutely re-used partition type GUID
EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 for Linux, by default. This you
know as gdisk (and friends) type code 0700. It's the same thing as
using type code 07 on an MBR partitioned disk instead of 83. It's
ridiculous that this happened considering we had distinction on MBR
with limited type code availability, and on GPT with unlimited type
codes the decision was to use an already existing type code,
EBD0A0A2-B9E5-4433-87C0-68B6B72699C7.

http://www.rodsbooks.com/linux-fs-code/

The Linux partitiontype GUID is now
0FC63DAF-8483-4772-8E79-3D69D8477DE4. And actually some others have
been created also for encryption, RAID, LVM, swap, and a pile of GUIDs
from the 'discoverable partitions spec' hosted at freedesktop.org for
autodiscovery by systemd. Only very recent versions of parted support
code 0FC63DAF-8483-4772-8E79-3D69D8477DE4.


 All this stuff has historical reasons. GNU/Linux attempts to be an
 egalitarian actor so it adapts to whatever you do.

With respect to this particular reuse of a Windows type code, it did a
total face plant on adaptation. The very decision to reuse that GUID
was a huge, weird mistake that we'll live with for years to come. Data
loss will result from it. And then it was made worse, upon recognition
that the conflict was probably not a good idea, to undermine patching
GNU parted in a timely manner. The patch to fix the problem, from the
gdisk author, sat around for two years before parted upstream merged
it. There really isn't good diplomatic language to use for this. Some
people flat out dropped the ball, and just didn't give a crap.

-- 
Chris Murphy


Re: BTRFS messes up snapshot LV with origin

2014-11-29 Thread Duncan
Robert White posted on Sat, 29 Nov 2014 08:50:57 -0800 as excerpted:

 To those reading along who don't already know. My explanation below is
 factually inadequate or wrong in various places...
 
 The type codes as presented in the various EFI/GUID disk partitioning
 tools as 0700, 8200, 8300, EF02, and so on are never written to disk as
 such. They are short-hand values (chosen to be deliberately similar to
 the MS-DOS partitioning type codes of 07, 82, 83, etc) to select
 standardized GUIDs for the partition type field.

 I could have, and should have, been _way_ more clear, and/or less wrong.
 8-)
 
 http://en.wikipedia.org/wiki/GUID_Partition_Table#Partition_type_GUIDs

Thanks.

While I guess we all end up eating humble pie occasionally, you handled it
with rather more grace than I often do, and by taking such a hard
line myself I didn't make it as easy as I might have.


-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: BTRFS messes up snapshot LV with origin

2014-11-28 Thread Goffredo Baroncelli
On 11/27/2014 05:15 AM, Zygo Blaxell wrote:
 On Wed, Nov 26, 2014 at 06:19:05PM +0100, Goffredo Baroncelli wrote:
 On 11/25/2014 11:21 PM, Zygo Blaxell wrote:
 However, I still don't understand why you want btrfs with multiple disks
 over LVM?
 I want to split a few disks into partitions, but I want to create,
 move, and resize the partitions from time to time.  Only LVM can do
 that without taking the machine down, reducing RAID integrity levels,
 hotplugging drives, or leaving installed drives idle most of the time.

 I want btrfs-raid1 because of its ability to replace corrupted or lost
 data from one disk using the other.  If I run a single-volume btrfs
 on LVM-RAID1 (or dm-RAID1, or RAID1 at any other layer of the storage
 stack), I can detect lost data, but not replace it automatically from
 the other mirror.
 OK, now I understand.

 Anyway, as a workaround, take into account that you can pass the
 devices explicitly as:

 mount -o device=/dev/sda,device=/dev/sdb,device=/dev/sdc /dev/sdd /mnt

 (supposing that the filesystem is on /dev/sda.../dev/sdd)

 I am working on a mount.btrfs helper. The aim of this helper is to manage
 the assembling of multiple devices; the main points will be:
 - wait until all the devices have appeared
 
 ...and make sure there are no duplicate UUIDs.
Yes, in the end I implemented the snapshot detection this way:
if two autodetected devices have the same DISK_UUID (reported as
UUID_SUB by blkid), the mount process stops. I also check the
num_devices field of the superblock.

 
 - allow (if required) to mount in degraded mode after a timeout
 
 This is a terrible idea with current btrfs, at least for read-write
 degraded mounting (fallback to read-only degraded would be OK).
 Mounting a filesystem read-write and degraded is something you only want
 to do immediately before you replace all the missing disks and bring the
 filesystem up to a non-degraded space and after you've ensured that the
 missing disks can never, ever come back; otherwise, btrfs eats your data
 in a slightly different way than we have discussed so far...

I don't care. If the user passes "degraded" in the mount options,
he gets it. Anyway, I hope that this (wrong) btrfs behavior will be
fixed.
 
 - at this point it could/should also skip the lvm-snapshotted devices (but
 before that I have to know how to recognize these)
 
 You don't have to recognize them as snapshots (and it's probably better
 not to treat snapshots specially anyway--how do you know whether the
 snapshot or the origin LVs are wanted for mounting?).  You just have to
 detect duplicate UUIDs at the btrfs subdevice level, and if any are found,
 stop immediately (or get a hint from the admin).

For the disk autodetection, I am still convinced that it is a sane default
to skip the LVM snapshots.

 
 This is a weakness of the current udev and asynchronous device hotplug
 concept:  there is no notion of bus enumeration in progress, so we can be
 trying to assemble multi-device storage before we have all the devices
 visible.  Assembly of aggregate storage (whatever it is--btrfs, md,
 lvm2...) has to wait until all known storage buses are fully enumerated
 in order to detect if there are duplicates.

It is more complex than that. Some devices may appear after the 1st bus
enumeration.


 
 I hope to issue the patches in the next week

 BR
 G.Baroncelli

 -- 
 gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
 Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: BTRFS messes up snapshot LV with origin

2014-11-28 Thread Robert White

On 11/28/2014 09:05 AM, Goffredo Baroncelli wrote:

For the disk autodetection, I am still convinced that it is a sane default
to skip the LVM snapshots.


No... please don't...

Maybe offer an option to select between snapshots or no-snapshots but in 
much the same way there is no _functional_ difference between a 
subvolume and a snapshot in btrfs, there is no degenerate status to an 
LVM snapshot.


It would be way more useful if the helper dumped a message via stderr or
syslog that said something like "UUID= ambiguous, must select
between /dev/AA and /dev/BB using device= to mount filesystem".
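
A sketch of what producing such a message could look like. The function name, the input shape (device path mapped to its UUID), and the exact wording are assumptions for illustration; this is not the output of the actual helper.

```python
import sys
from collections import defaultdict


def ambiguity_messages(devices: dict[str, str]) -> list[str]:
    """Map of device path -> UUID in; one message per UUID seen more than once."""
    by_uuid = defaultdict(list)
    for path, dev_uuid in devices.items():
        by_uuid[dev_uuid].append(path)
    messages = []
    for dev_uuid, paths in sorted(by_uuid.items()):
        if len(paths) > 1:
            choices = " and ".join(sorted(paths))
            messages.append(
                f"UUID={dev_uuid} ambiguous, must select between "
                f"{choices} using device= to mount filesystem"
            )
    return messages


if __name__ == "__main__":
    # Placeholder device names and UUIDs, as in the message suggested above.
    found = {"/dev/AA": "u1", "/dev/BB": "u1", "/dev/CC": "u2"}
    for line in ambiguity_messages(found):
        print(line, file=sys.stderr)
```

Reporting and refusing keeps the decision with the administrator, rather than guessing whether the snapshot or the origin was meant.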





Re: BTRFS messes up snapshot LV with origin

2014-11-28 Thread Zygo Blaxell
On Fri, Nov 28, 2014 at 06:05:48PM +0100, Goffredo Baroncelli wrote:
 On 11/27/2014 05:15 AM, Zygo Blaxell wrote:
  This is a weakness of the current udev and asynchronous device hotplug
  concept:  there is no notion of bus enumeration in progress, so we can be
  trying to assemble multi-device storage before we have all the devices
  visible.  Assembly of aggregate storage (whatever it is--btrfs, md,
  lvm2...) has to wait until all known storage buses are fully enumerated
  in order to detect if there are duplicates.
 
 It is more complex than that. Some devices may appear after the 1st bus
 enumeration.

That case is well handled already--a new enumeration will start with the
second (and all later) hotplug events.

The problem arises when we try to assemble disk arrays before the
known end of the 1st (or any) enumeration.  There is no way for an
enumerating agent to tell other agents "this is definitely not the
complete list of devices yet, other devices may be inserted imminently"
and defer all the multi-device assembly until the address space of the
enumerating bus is fully covered.





Re: BTRFS messes up snapshot LV with origin

2014-11-28 Thread Duncan
Chris Murphy posted on Fri, 28 Nov 2014 00:10:40 -0700 as excerpted:

 On Thu, Nov 27, 2014 at 2:08 AM, Duncan 1i5t5.dun...@cox.net wrote:
 So, umm... kinda late now, but read that "copy" as if it had a footnote
 attached, saying "Yes, I know it's not actual copy, it's two views of
 the same thing using COW, but my point is, from the btrfs perspective
 it's a copy, the universally UNIQUE ID no longer looks unique and
 thus no longer can be properly called a UUID at all."
 
 The copy is sort of a misnomer anyway because up until the computer age
 the copy was a derivative, a facsimile, like a photocopy. But a copy of
 a digital file is actually another original. Therein lies the problem
 with the LVM snapshot in this context, we don't want another original.
 We want a copy, as in we want something we know has been derived from
 something else, and therefore can be discriminated.

Very good point.  I had all the pieces but hadn't put them together yet, 
so thanks. =:^)

 Well RFC 4122 I don't think would say it's not a UUID, the uniqueness is
 only guaranteed at the time of UUID creation. And duplication isn't
 creation so it's not going to say these things are no longer UUIDs,
 they're just UUIDs that have been recycled. That RFC doesn't specify
 workflow, but if it did, I think it'd basically say "oh crap, why'd you
 go and do that?" After all a major point of UUIDs is that they are
 effectively unlimited in quantity, therefore a.) we don't need central
 registry to avoid (unintended) collisions because they're so uncommon,
 b.) we're encouraged to not be attached to specific UUIDs; when in doubt
 just create another one.

Another good point.  One common and less RFC/technical way of putting it,
that I had thought about a few times but hadn't actually posted yet IIRC,
is the old "If it hurts when you bang your head against the wall, quit
banging!" =:^)

IOW, LVM could change the UUIDs in its copies, COWing that bit in
order to do so.  While that wouldn't change the same UUIDs embedded in,
for instance, btrfs internals, it would provide a mechanism to keep initial
scans from confusing things, and filesystems or other UUID applications
that duplicated the number for their own internals would then need to
provide tools that rewrote them to match the LVM-changed master location
UUID.  Those that failed to do so would fail to function unless/until the
master location version was changed back, but the tools likely would
eventually be provided, as I expect they will be here, and the difference
would be that at least it'd keep mixups like this from happening.

 A very good example of WTF reusage of a UUID that irks me to no end is
 GNU parted devs decided to recycle the Microsoft Windows Basic Data
 partition type GUID for Linux partitions. It's like watching someone get
 run over by a zamboni with 50 feet of advance notice...

At least I don't have to worry about that one, since I no longer agree to
"WE REFUSE TO TELL YOU SPECIFICALLY WHAT THIS SOFTWARE DOES AS WE DON'T
SUPPLY THE SOURCES, BUT YOU ARE STILL REQUIRED TO ACCEPT ALL
RESPONSIBILITY FOR IT, REGARDLESS OF WHAT IT DOES AND REGARDLESS OF
WHETHER WE'VE BEEN WARNED" style EULAs, which is basically all of them,
which means I have no legal way to run that software, so I don't.  Note
that the GPL among others has similar liability disclaimer wording (and 
to be fair it'd be hard not to, since the sources are there and the 
original author can hardly be held responsible for later modifications to 
them), but because it actually gives you the sources too, it allows you 
to fairly make your own decision about the responsibility you're about to 
take on.

Since I can't/won't run pretty much anything proprietary, there's little 
chance of it being taken as anything but Linux, here.  (Tho I actually 
use (c)gdisk for partitioning here and it appears to use a different GUID: 
0700 in its short form, which AFAIK is gdisk specific, for MS basic data, 
while it uses 8300 for general Linux filesystems.  I could look up the 
long form GUIDs, but meh...)


-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS messes up snapshot LV with origin

2014-11-28 Thread Goffredo Baroncelli
On 11/29/2014 02:25 AM, Robert White wrote:
 On 11/28/2014 09:05 AM, Goffredo Baroncelli wrote:
 For the disk autodetection, I am still convinced that it is a sane
 default to skip the lvm-snapshot
 
 No... please don't...
 
 Maybe offer an option to select between snapshots or no-snapshots but
 in much the same way there is no _functional_ difference between a
 subvolume and a snapshot in btrfs, there is no degenerate status to
 an LVM snapshot.

I agree with you; but I have to find a default so that during boot
a system can start even if snapshots are present.

And pay attention that there could be cases where multiple
snapshots are present: how to group these? Maybe by generation number?

Anyway, for the moment my helper simply refuses to mount if there is
a conflict of dev_uuid.

 
 It would be way more useful if the helper dumped a message via stderr
 or syslog that said something like UUID= ambiguous, 

This is what is printed when the helper finds a duplicate uuid:

ghigo@emulato:~$ sudo lvdisplay | grep "LV Path"
  LV Path                /dev/test/lv01
  LV Path                /dev/test/lv02
  LV Path                /dev/test/lv02_snap
  LV Path                /dev/test/lv01_snap

ghigo@emulato:~$ sudo mount /dev/test/lv01 /mnt/btrfs1/
ERROR: disk '/dev/mapper/test-lv01' and '/dev/mapper/test-lv01_snap' have the same disk uuid
ERROR: disk '/dev/mapper/test-lv02_snap' and '/dev/mapper/test-lv02' have the same disk uuid

 must
 select between /dev/AA and /dev/BB using device= to mount
 filesystem.

But anyway I can force the disk to mount:

ghigo@emulato:~$ sudo mount /dev/test/lv01_snap -o device=/dev/test/lv02_snap /mnt/btrfs1/

 
 
 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: BTRFS messes up snapshot LV with origin

2014-11-28 Thread MegaBrutal
2014-11-29 2:25 GMT+01:00 Robert White rwh...@pobox.com:

 On 11/28/2014 09:05 AM, Goffredo Baroncelli wrote:

 For the disk autodetection, I am still convinced that it is a sane default
 to skip the lvm-snapshot


 No... please don't...

 Maybe offer an option to select between snapshots or no-snapshots but in much 
 the same way there is no _functional_ difference between a subvolume and a 
 snapshot in btrfs, there is no degenerate status to an LVM snapshot.

 It would be way more useful if the helper dumped a message via stderr or 
 syslog that said something like UUID= ambiguous, must select between 
 /dev/AA and /dev/BB using device= to mount filesystem.



I agree with this. Sometimes people will want to do exactly that:
mount the snapshot devices and not the origins. Listing devices in the
device= mount option sounds perfectly sane.


Re: BTRFS messes up snapshot LV with origin

2014-11-28 Thread Robert White

On 11/28/2014 08:59 PM, Zygo Blaxell wrote:

On Fri, Nov 28, 2014 at 06:05:48PM +0100, Goffredo Baroncelli wrote:

On 11/27/2014 05:15 AM, Zygo Blaxell wrote:

This is a weakness of the current udev and asynchronous device hotplug
concept:  there is no notion of bus enumeration in progress, so we can be
trying to assemble multi-device storage before we have all the devices
visible.  Assembly of aggregate storage (whatever it is--btrfs, md,
lvm2...) has to wait until all known storage buses are fully enumerated
in order to detect if there are duplicates.


It is more complex than that. Some devices may appear after the 1st bus
enumeration.


That case is well handled already--a new enumeration will start with the
second (and all later) hotplug events.

The problem arises when we try to assemble disk arrays before the
known end of the 1st (or any) enumeration.  There is no way for an
enumerating agent to tell other agents "this is definitely not the
complete list of devices yet, other devices may be inserted imminently"
and defer all the multi-device assembly until the address space of the
enumerating bus is fully covered.

MDADM has an "attached but not started" state for arrays that handles 
this condition during incremental assembly (see mdadm --incremental 
/dev/whatever).


To slightly misuse the vocabulary, as each partition is encountered and 
submitted to the system it's checked for a superblock. If one is found 
then it has the identity of an array encoded on it and if that array 
doesn't exist it is allocated, otherwise the device is added to the 
existent array. The array is only started if all the devices are 
accounted for unless an option is added to allow earlier starts, and 
even then enough of the devices must be present to make sense (e.g. 
only one device missing from a RAID5, or a correct pair of devices for a 
RAID10 etc.)
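
The incremental behaviour described above can be modelled roughly as follows. This is only a toy sketch in Python, not mdadm's actual code: the names PendingArray, incremental and max_missing are made up for illustration, and real mdadm reads the member count and RAID level from the superblock rather than taking them as arguments.

```python
from dataclasses import dataclass, field


@dataclass
class PendingArray:
    """Toy model of an array that is attached but not yet started."""
    array_uuid: str
    expected: int            # member count recorded in the superblock
    max_missing: int = 0     # hypothetical: members an early start may lack
    members: set = field(default_factory=set)
    started: bool = False

    def add(self, device: str) -> None:
        self.members.add(device)
        # Start automatically only once every member is accounted for.
        if len(self.members) == self.expected:
            self.started = True

    def force_start(self) -> bool:
        # An early start is allowed only if enough members are present.
        if self.expected - len(self.members) <= self.max_missing:
            self.started = True
        return self.started


arrays = {}


def incremental(device, array_uuid, expected, max_missing=0):
    """Attach one device: allocate the array on first sight, else add to it."""
    arr = arrays.setdefault(
        array_uuid, PendingArray(array_uuid, expected, max_missing))
    arr.add(device)
    return arr
```

The point of the model is that "attached" and "started" are separate states, so a half-seen array can sit there harmlessly until the rest of its members show up or someone forces the issue.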


So we'd need a "partially assembled but not started" state and some 
ioctls to do things like force-start or force-disown a filesystem that 
cannot be finished automatically.


That sort of thing is very easy to do with devices because devices don't 
have to be opened and can reject an open attempt, or at least the 
read/writes after an open and such.


Unfortunately a filesystem can really only exist as a mounted thing, and 
can really only be controlled by remounting thereafter. The most 
efficient way to do this would be to have an alternate file system 
operations structure that was filled mostly with dummy operations that 
would return ENOENT and friends. The remount that finally fulfilled 
the file system's requirements would then switch out that struct for the 
fully functional one. That remount would need an adddev= and some 
other such options (much like AUFS adds layers).


It's all doable. But it stretches the mount paradigm to near breaking. 
You would need an operation that looked like "mount -t btrfs -o 
do_we_need_this /dev/whatever /this/datum/means/nothing" to match and 
attach a device wherever it goes, or you might end up needing to do the 
Cartesian product of trial attachments of each new device to all active 
filesystems to match it up, which is an ugly external scripting requirement.


As far as waiting for the address space to be fully covered. Meh. If a 
ready-or-not, or ready-enough, status is established in the file system 
it would be undesirable for it to know anything about any other subsystem.


We don't care if enumeration is done; we only care if we have a 
"rational set" of storage, and whether that rational set is enough to be 
fully ready, enough to be only read-ready, or just plain not enough.


In theory, the idempotent mount command could be

mount -t btrfs some-uuid-instead-of-device /mount/point
mount -t btrfs some-other-uuid-here /other/mount/point

to create the zero-devices involved entity, followed by

mount -t btrfs -o trydev /dev/something /this/bit/is/ignored

repeated for all possible somethings. /mount/point and 
/other/mount/point would be returning ENOENT for their contents until 
they were ready-enough.


In practice this is very impure compared to how mdadm has the /dev/md- 
namespace in which to build its devices before any actual mount is possible.



Re: BTRFS messes up snapshot LV with origin

2014-11-27 Thread Duncan
Robert White posted on Wed, 26 Nov 2014 14:08:14 -0800 as excerpted:

 On 11/25/2014 07:22 PM, Duncan wrote:
 From my perspective, however, btrfs is simply incompatible with lvm
 snapshots, because the basic assumptions are incompatible.  Btrfs
 assumes UUIDs will be exactly what they say on the label, /unique/,
 while lvm's snapshot feature directly breaks that uniqueness by copying
 the (former) UUID, thus making the former UUID no longer unique and
 thus no longer truly UUID.  Thus, part of the lvm /feature/ of
 snapshots is in direct contradiction to a basic assumption of btrfs,
 that UUIDs are exactly that, unique, making that feature directly
 incompatible with btrfs on a very basic level.
 
 A finer point here. LVM doesn't copy the UUID. An LVM snapshot is a
 copy-on-write entity so it _exposes_ the single sector(s) of the
 superblock(s) in both views of the underlying storage.

I /hate/ it when this happens, which is why my posts often end up so 
long.  People keep saying shorten them, but when I try, invariably I end 
up shortcutting something like this and get called on it! =:^(

So, umm... kinda late now, but read that "copy" as if it had a footnote 
attached, saying "Yes, I know it's not an actual copy, it's two views of the 
same thing using COW, but my point is, from the btrfs perspective it's a 
copy, the universally UNIQUE ID no longer looks unique and thus no 
longer can be properly called a UUID at all."

Which kinda makes most of the rest of what you said, which I agree with 
in general were it the case that I actually thought of it as a literal 
copy, unnecessary...

Tho I can't fault you for catching and pointing out my shortcut as an 
error, because you're absolutely correct in that case, and I'd almost 
certainly be doing the same thing were the situation reversed.

 So while you may have a point about btrfs being unprepared for LVM,
 neither party is particularly at fault in any way.
 
 The damn you photocopier for making photocopies so identically nature
 of your problem with LVM seems to be leading you to misplaced
 conclusions.

Well, to the extent that I tried to take an unwarranted logical shortcut 
and didn't properly describe it...

But... I'd still say LVM is at fault to the extent that anyone is, as 
it /knows/ it's dealing with UUIDs because after all that's part of 
what's /on/ what it's snapshotting, and it doesn't make any effort to 
deal with the situation, despite the at least theoretical (and now in 
fact) confusion that may occur when former UUIDs are no longer unique and 
thus no longer UUIDs.

However, the point remains, they are pretty much incompatible, in that 
one assumes unique means that a second one won't pop up elsewhere and 
depends on exactly that, while the functionality of the other is exactly 
that, to make another view of the same thing, including the otherwise 
unique ID, pop up elsewhere, with COW semantics.

 If you are waiting for someone to code it up perhaps you should do so.

I'm not sure if that was the singular or plural you, but in any case, 
it won't be /me/, because I'm not a coder, simply another sysadmin 
willing to guinea-pig this fascinating new filesystem toy. =:^)

 As previously stated XFS solved this problem by providing a tool that
 would change the UUID of a file system. This tool could then be pointed
 at either (or both) the original and/or snapshot volumes as needed.

I think that'll eventually happen.  Actually, I see it's on the wiki 
project ideas page, now (see 1.2.25 and 1.2.26, online/offline UUID 
changes, respectively):

https://btrfs.wiki.kernel.org/index.php/Project_ideas

There's even POC code. =:^)  Wiki page history says Kdave added that on 
06 Oct. 2014, so the entry is reasonably new, and the POC's encouraging, 
but will it go anywhere from there?

 Given that BTRFS wants to play in the same level of abstraction as LVM,
 it's kind of a given that they'll butt heads over things like conflicting
 definitions of what it means to take a snapshot.

Agreed.

Actually, given btrfs is already doing much of it, it'd be interesting if 
it eventually got the ability to specify where subvolumes went and limit 
them in size (ideally more directly than the existing btrfs quotas 
related functionality does), etc, thus avoiding having to rely on LVM for 
that and eliminating the need for it in scenarios where that's desired.  
Couple that with the better snapshot handling that is already in the 
works, and would there /still/ be a need for LVM under btrfs then?  If so, 
for what, and could it too be integrated into btrfs?

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: BTRFS messes up snapshot LV with origin

2014-11-27 Thread Chris Murphy
On Thu, Nov 27, 2014 at 2:08 AM, Duncan 1i5t5.dun...@cox.net wrote:
 So, umm... kinda late now, but read that copy as if it had a footnote
 attached, saying Yes, I know it's not actual copy, it's two views of the
 same thing using COW, but my point is, from the btrfs perspective it's a
 copy, the universally UNIQUE ID no longer looks unique and thus no
 longer can be properly called a UUID at all.

The word "copy" is sort of a misnomer anyway, because up until the computer
age a copy was a derivative, a facsimile, like a photocopy. But a
copy of a digital file is actually another original. Therein lies the
problem with the LVM snapshot in this context: we don't want another
original. We want a copy, as in we want something we know has been
derived from something else, and therefore can be discriminated.

And that's the same problem with subvolume UUIDs being reused when
creating new Btrfs volumes, which have new volume UUIDs, from a Btrfs
seed device. There are now multiple originals of those subvolumes,
there's no distinguishing them by their UUID alone.


 But... I'd still say LVM is at fault to the extent that anyone is, as
 it /knows/ it's dealing with UUIDs because after all that's part of
 what's /on/ what it's snapshotting, and it doesn't make any effort to
 deal with the situation, despite the at least theoretical (and now in
 fact) confusion that may occur when former UUIDs are no longer unique and
 thus no longer UUIDs.

Well, I don't think RFC 4122 would say it's not a UUID; the uniqueness
is only guaranteed at the time of UUID creation. And duplication isn't
creation, so it's not going to say these things are no longer UUIDs,
they're just UUIDs that have been recycled. That RFC doesn't specify
workflow, but if it did, I think it'd basically say "oh crap, why'd
you go and do that?" After all, a major point of UUIDs is that they are
effectively unlimited in quantity, therefore a.) we don't need a central
registry to avoid (unintended) collisions because they're so uncommon,
b.) we're encouraged to not be attached to specific UUIDs: when in
doubt just create another one.
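
To make the "just create another one" point concrete, here is a minimal Python illustration using the standard uuid module. The variable names are arbitrary and nothing here is btrfs- or LVM-specific; it only shows that minting a fresh identifier needs no registry and no coordination.

```python
import uuid

# Version-4 UUIDs are built from random bits, so minting a fresh one is
# cheap and needs no central registry.
origin_id = uuid.uuid4()
snapshot_id = uuid.uuid4()   # "when in doubt just create another one"

# A collision between two fresh random UUIDs is astronomically unlikely.
print(origin_id != snapshot_id)
```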

A very good example of WTF reusage of a UUID that irks me to no end is
that the GNU parted devs decided to recycle the Microsoft Windows Basic
Data partition type GUID for Linux partitions. It's like watching someone
get run over by a zamboni with 50 feet of advance notice...



-- 
Chris Murphy


Re: BTRFS messes up snapshot LV with origin

2014-11-26 Thread Goffredo Baroncelli
On 11/25/2014 11:21 PM, Zygo Blaxell wrote:
  However I still doesn't understood why you want btrfs-w/multiple disk over 
  LVM ?
 I want to split a few disks into partitions, but I want to create,
 move, and resize the partitions from time to time.  Only LVM can do
 that without taking the machine down, reducing RAID integrity levels,
 hotplugging drives, or leaving installed drives idle most of the time.
 
 I want btrfs-raid1 because of its ability to replace corrupted or lost
 data from one disk using the other.  If I run a single-volume btrfs
 on LVM-RAID1 (or dm-RAID1, or RAID1 at any other layer of the storage
 stack), I can detect lost data, but not replace it automatically from
 the other mirror.
OK, now I have understood.

Anyway, as a workaround, take into account that you can pass the devices
explicitly:

mount -o device=/dev/sda,device=/dev/sdb,device=/dev/sdc /dev/sdd /mnt

(supposing that the filesystem is on /dev/sda.../dev/sdd)
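
For scripting that workaround, a small Python sketch that builds the same command line from a list of member devices might look like this. The function name btrfs_mount_argv is hypothetical; the only thing taken from the example above is the device= option syntax.

```python
def btrfs_mount_argv(devices, mount_point):
    """Build a mount command that names every member device explicitly.

    The last device is passed as the mount source; all the others go into
    device= options, mirroring the command shown above.
    """
    if not devices:
        raise ValueError("at least one device is required")
    *others, last = devices
    argv = ["mount"]
    if others:
        argv += ["-o", ",".join(f"device={d}" for d in others)]
    argv += [last, mount_point]
    return argv


print(" ".join(btrfs_mount_argv(
    ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"], "/mnt")))
```

Returning an argv list rather than a string keeps it safe to hand to subprocess.run() without shell quoting worries.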

I am working on a mount.btrfs helper. The aim of this helper is to manage
the assembling of multiple devices; the main points will be:
- wait until all the devices have appeared
- allow (if required) mounting in degraded mode after a timeout
- at this point it could/should also skip the lvm-snapshotted devices (but
  first I have to know how to recognize these)

I hope to issue the patches in the next week

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: BTRFS messes up snapshot LV with origin

2014-11-26 Thread Robert White

On 11/25/2014 07:22 PM, Duncan wrote:

From my perspective, however, btrfs is simply incompatible with lvm

snapshots, because the basic assumptions are incompatible.  Btrfs assumes
UUIDs will be exactly what they say on the label, /unique/, while lvm's
snapshot feature directly breaks that uniqueness by copying the (former)
UUID, thus making the former UUID no longer unique and thus no longer
truly UUID.  Thus, part of the lvm /feature/ of snapshots is in direct
contradiction to a basic assumption of btrfs, that UUIDs are exactly
that, unique, making that feature directly incompatible with btrfs on a
very basic level.


A finer point here. LVM doesn't copy the UUID. An LVM snapshot is a 
copy-on-write entity so it _exposes_ the single sector(s) of the 
superblock(s) in both views of the underlying storage. This is universal 
to the idea of a snapshot. Just as a "btrfs subvol snap /old /new" 
exposes all the unique elements of /old under the name /new (in 
preparation for the user to implement subsequent divergence), "lvcreate 
--snapshot Old New" causes every block-N of Old to be identically 
available as block-N of New (in preparation for the user to implement 
subsequent divergence).


In point of fact the LVM snapshot operation is a zero-copy operation at 
its heart. After the snapshot is established, when a block is modified 
in Old, its original content is saved in New. When blocks are written 
in New, they are written in place and the reference to the block content 
in Old is overwritten.
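
A toy Python model of those semantics, for anyone who prefers code to prose. It is a sketch of the behaviour described above, not of the device-mapper implementation; the class names and the UUID strings in the usage lines are invented.

```python
class Origin:
    """The original volume ("Old")."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def read(self, n):
        return self.blocks[n]

    def write(self, n, data):
        # Before overwriting, save the old content into every snapshot
        # that has not already diverged at this block.
        for snap in self.snapshots:
            snap.exceptions.setdefault(n, self.blocks[n])
        self.blocks[n] = data


class Snapshot:
    """The snapshot view ("New"); creating it copies no blocks at all."""

    def __init__(self, origin):
        self.origin = origin
        self.exceptions = {}          # block number -> private content
        origin.snapshots.append(self)

    def read(self, n):
        # Unmodified blocks are simply the origin's blocks.
        return self.exceptions.get(n, self.origin.blocks[n])

    def write(self, n, data):
        # Writing replaces the reference to the origin's block content.
        self.exceptions[n] = data


old = Origin(["superblock uuid=1234", "data A"])
new = Snapshot(old)
same_superblock = new.read(0) == old.read(0)   # both views expose one UUID
old.write(1, "data B")                         # snapshot keeps "data A"
new.write(0, "superblock uuid=5678")           # only the snapshot changes
```

Note that block 0, the one carrying the filesystem UUID, is identical in both views until somebody deliberately rewrites it in one of them.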


This is the reason that fsfreeze is unnecessary for things above LVM 
snapshots as the instant-in-time divergence is _instant_. It's not that 
LVM goes out and does an fsfreeze equivalent action, it's that the switch 
to write-divergence is essentially atomic. A bunch of metadata is set up 
and then all-at-once one write behavior is switched with another by 
re-mapping the device access routines.


So while you may have a point about btrfs being unprepared for LVM, 
neither party is particularly at fault in any way.


The "damn you photocopier for making photocopies so identically" nature 
of your problem with LVM seems to be leading you to misplaced conclusions.


If you need to harmonize these sorts of things, you need to be able to 
re-write the blocks in question with disambiguating information (like new 
UUIDs) or restrict your accesses in some other manner.


If you are waiting for someone to code it up perhaps you should do so. 
But it will _never_ be automatic because the use cases that don't match 
your expectations may need the founding assumptions to be as they are today.


In other words, your belief that your position is entirely logical may 
be a little off, particularly if you think LVM is Copying things when 
it does a snapshot.


As previously stated XFS solved this problem by providing a tool that 
would change the UUID of a file system. This tool could then be pointed 
at either (or both) the original and/or snapshot volumes as needed.


I don't see a re-make the btrfs option for changing UUIDs and LVM 
doesn't care _at_ _all_ about what is actually in its volumes (okay, 
lvresize has some fsck nonsense, but that's just messy).


It might even be wrong to try to harmonize those features, like trying 
to put a manual clutch into a car with an automatic transmission... it 
may just not fit.


Given that BTRFS wants to play in the same level of abstraction as LVM, 
it's kind of a given that they'll butt heads over things like conflicting 
definitions of what it means to take a snapshot.



Re: BTRFS messes up snapshot LV with origin

2014-11-26 Thread Zygo Blaxell
On Wed, Nov 26, 2014 at 06:19:05PM +0100, Goffredo Baroncelli wrote:
 On 11/25/2014 11:21 PM, Zygo Blaxell wrote:
   However I still doesn't understood why you want btrfs-w/multiple disk 
   over LVM ?
  I want to split a few disks into partitions, but I want to create,
  move, and resize the partitions from time to time.  Only LVM can do
  that without taking the machine down, reducing RAID integrity levels,
  hotplugging drives, or leaving installed drives idle most of the time.
  
  I want btrfs-raid1 because of its ability to replace corrupted or lost
  data from one disk using the other.  If I run a single-volume btrfs
  on LVM-RAID1 (or dm-RAID1, or RAID1 at any other layer of the storage
  stack), I can detect lost data, but not replace it automatically from
  the other mirror.
 OK, now I have understood.
 
 Anyway as workaround, take in account that you can pass explicitly the
 devices as:
 
 mount -o device=/dev/sda,device=/dev/sdb,device=/dev/sdc /dev/sdd /mnt
 
 (supposing that the filesystem is on /dev/sda.../dev/sdd)
 
 I am working to a mount.btrfs helper. The aim of this helper is to manage
 the assembling of multiple devices; the main points will be:
 - wait until all the devices appeared

...and make sure there are no duplicate UUIDs.

 - allow (if required) to mount in degraded mode after a timeout

This is a terrible idea with current btrfs, at least for read-write
degraded mounting (fallback to read-only degraded would be OK).
Mounting a filesystem read-write and degraded is something you only want
to do immediately before you replace all the missing disks and bring the
filesystem up to a non-degraded space and after you've ensured that the
missing disks can never, ever come back; otherwise, btrfs eats your data
in a slightly different way than we have discussed so far...

 - at this point it could/should also skip the lvm-snapshotted devices (but 
 before 
 I have to know how recognize these) 

You don't have to recognize them as snapshots (and it's probably better
not to treat snapshots specially anyway--how do you know whether the
snapshot or the origin LVs are wanted for mounting?).  You just have to
detect duplicate UUIDs at the btrfs subdevice level, and if any are found,
stop immediately (or get a hint from the admin).
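
A rough Python sketch of that check, assuming some earlier scan has already produced (path, fsid, dev_uuid) tuples for every visible block device. How those tuples are obtained is left out, and the function names are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_devices(scanned):
    """scanned: list of (path, fsid, dev_uuid) tuples from a device scan.

    Returns {(fsid, dev_uuid): [paths]} for every sub-device UUID that is
    claimed by more than one block device.
    """
    seen = defaultdict(list)
    for path, fsid, dev_uuid in scanned:
        seen[(fsid, dev_uuid)].append(path)
    return {key: paths for key, paths in seen.items() if len(paths) > 1}


def devices_for_mount(scanned, fsid):
    """Return the member list for fsid, refusing if any member is ambiguous."""
    dups = {key: paths
            for key, paths in find_duplicate_devices(scanned).items()
            if key[0] == fsid}
    if dups:
        # Stop immediately; the admin has to say which devices are meant.
        raise RuntimeError(
            f"ambiguous devices for {fsid}: {sorted(dups.values())}")
    return sorted(path for path, f, _ in scanned if f == fsid)
```

Nothing here needs to know whether a device is an LVM snapshot, a dd copy, or a cloned disk; a duplicate is a duplicate.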

This is a weakness of the current udev and asynchronous device hotplug
concept:  there is no notion of bus enumeration in progress, so we can be
trying to assemble multi-device storage before we have all the devices
visible.  Assembly of aggregate storage (whatever it is--btrfs, md,
lvm2...) has to wait until all known storage buses are fully enumerated
in order to detect if there are duplicates.

 I hope to issue the patches in the next week
 
 BR
 G.Baroncelli
 
 -- 
 gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
 Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5




Re: BTRFS messes up snapshot LV with origin

2014-11-25 Thread Goffredo Baroncelli
On 11/23/2014 01:19 AM, Zygo Blaxell wrote:
[...]
 md-raid works as long as you specify the devices, and because it's always
 the lowest layer it can ignore LVs (snapshot or otherwise).  It's also
 not a particularly common use case, while making an LV snapshot of a
 filesystem is a typical use case.

I fully agree; but you still consider a *multi-device* btrfs over lvm...
This is like a dm over lvm... which doesn't make sense at all (as you 
already wrote)

 
 and mounting the filesystem fails at 3.  
 Are you sure ?
 
 Yes, I'm sure.  I've had to replace filesystems destroyed this way.
 
 [working instance snipped]
 
 On the basis of the example above, in case you want to mount a 
 single-disk, BTRFS seems me to work properly. You have to pay
 attention only to not mount the two filesystem at the same time.
 
 The problem is btrfs stops searching when it sees one disk with each UUID,

BTRFS doesn't search anything. It is udev which pushes the information
to the kernel module. The btrfs module groups this information by UUID.
When a new disk is inserted, it overwrites the information of the old one.


 so the set of disks (snapshot vs origin) that you get is *random*.
 For a pair of origin + snapshots, there's a 50% chance it works, 50%
 chance it eats your data.

Sorry but I have to disagree: the code is quite clear 
(see fs/btrfs/volumes.c, near line 512):

[...]

        } else if (!device->name || strcmp(device->name->str, path)) {
                /*
                 * When FS is already mounted.
                 * 1. If you are here and if the device->name is NULL that
                 *    means this device was missing at time of FS mount.
                 * 2. If you are here and if the device->name is different
                 *    from 'path' that means either
                 *      a. The same device disappeared and reappeared with
                 *         different name. or
                 *      b. The missing-disk-which-was-replaced, has
                 *         reappeared now.
                 *
                 * We must allow 1 and 2a above. But 2b would be a spurious
                 * and unintentional.

[...]

This is case 2a; here btrfs stores the new name and mounts it.

Anyway I made a small test: I created one btrfs filesystem and made an
lvm-snapshot of it. Then I created two different files, one in the
snapshot and one in the original. I ran a program which randomly mounts
the first or the latter and checks if the correct file is present; after
more than 130 tests I never saw your 50% chance it works: it always works.

BR
G.Baroncelli

 
 BR
 G.Baroncelli


 -- 
 gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
 Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5



Re: BTRFS messes up snapshot LV with origin

2014-11-25 Thread Zygo Blaxell
On Tue, Nov 25, 2014 at 05:34:15PM +0100, Goffredo Baroncelli wrote:
 On 11/23/2014 01:19 AM, Zygo Blaxell wrote:
 [...]
  md-raid works as long as you specify the devices, and because it's always
  the lowest layer it can ignore LVs (snapshot or otherwise).  It's also
  not a particularly common use case, while making an LV snapshot of a
  filesystem is a typical use case.
 
 I fully agree; but you still consider a *multi-device* btrfs over lvm...
 This is like a dm over lvm... which doesn't make sense at all (as you 
 already wrote)

It makes sense for btrfs because btrfs can productively use LVs on
different PVs (e.g. btrfs-raid1 on two LVs, one on each PV).  LVM is
the bottom layer because not everything in the world is btrfs--things
like ephemeral /tmp, boot, swap, and temporary backup copies of the btrfs
(e.g.  before running btrfsck) have to live on the same physical drives
as the btrfs filesystems.

  and mounting the filesystem fails at 3.  
  Are you sure ?
  
  Yes, I'm sure.  I've had to replace filesystems destroyed this way.
  
  [working instance snipped]
  
  On the basis of the example above, in case you want to mount a 
  single-disk, BTRFS seems me to work properly. You have to pay
  attention only to not mount the two filesystem at the same time.
  
  The problem is btrfs stops searching when it sees one disk with each UUID,
 
 BTRFS doens't search anything. It is udev which push the information
 on the kernel module. The btrfs module groups these information by UUID.
 When a new disk is inserted, overwrite the information of the old one.

Same result:  when presented with multiple devices with the same UUID,
one is chosen arbitrarily instead of rejecting all of them.
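
A toy Python model of "last one scanned wins", to show why the result depends on scan order. It models only the behaviour described in this thread, not the kernel's device list code, and the paths and the "fs-A" identifier are invented.

```python
def scan_devices(scan_order):
    """Keep one path per (fsid, devid); a later scan replaces an earlier one."""
    registry = {}
    for path, fsid, devid in scan_order:
        registry[(fsid, devid)] = path
    return registry


devs = [
    ("/dev/test/lv00", "fs-A", 1),
    ("/dev/test/lv01", "fs-A", 2),
    ("/dev/test/lv00snap", "fs-A", 1),   # same identity as lv00
]

forward = scan_devices(devs)              # the snapshot wins for device 1
backward = scan_devices(reversed(devs))   # the origin wins for device 1
```

Neither ordering rejects the duplicate; each silently settles on whichever path arrived last.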

  so the set of disks (snapshot vs origin) that you get is *random*.
  For a pair of origin + snapshots, there's a 50% chance it works, 50%
  chance it eats your data.
 
 Sorry but I have to disagree: the code is quite clear 
 (see fs/btrfs/volume.c, near line 512):
 
 [...]
 
         } else if (!device->name || strcmp(device->name->str, path)) {
                 /*
                  * When FS is already mounted.
                  * 1. If you are here and if the device->name is NULL that
                  *    means this device was missing at time of FS mount.
                  * 2. If you are here and if the device->name is different
                  *    from 'path' that means either
                  *      a. The same device disappeared and reappeared with
                  *         different name. or
                  *      b. The missing-disk-which-was-replaced, has
                  *         reappeared now.

If the FS is already mounted then there is no issue.  It's when you're trying
to mount the FS that the fun occurs.

  *
  * We must allow 1 and 2a above. But 2b would be a spurious
  * and unintentional.
 
 [...]
 
 The case is the 2a; in this case btrfs store the new name and mount it.
 
 Anyway I made a small test: I created 1 btrfs filesystem, and 
 made a lvm-snapshot. Then create two different file in the snapshot and in
 the original one. I run a program which mounts randomly the first or
 the latter, checks if the correct file is present; after more than 130 tests I
 never saw your 50% chance it works: it always works.

One btrfs filesystem on two LVs with a snapshot of each LV also present.
So you'd have:

lv00 - btrfs device 1
lv01 - btrfs device 2
lv00snap - snapshot of lv00
lv01snap - snapshot of lv01

If you mount by device UUID then you get one of these results at random:

lv00 + lv01 - OK
lv00snap + lv01snap - also OK
lv00 + lv01snap - failure
lv00snap + lv01 - failure

2 failures, 2 successes = 50% failure rate.

If you mount by the name of one of the devices then you only get the two
rows of the above table that match the device you named, but you still
get one success row and one failure row.

Which result you get seems to depend on the order in which LVM enumerates
the LVs, so if you are doing a mount/umount loop then you won't see any
problems as btrfs will consistently make the same choice of LVs over
and over again.  Rebooting or creating other LVs in between mounts will
definitely cause problems.
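
The table above can be reproduced by brute force. This Python sketch assumes each of the four pairings is equally likely, which is a simplification, since (as just noted) the real outcome follows LVM's enumeration order rather than a coin toss.

```python
from itertools import product

# For each btrfs device slot, the LVs that carry the same device UUID.
candidates = {
    1: ["lv00", "lv00snap"],
    2: ["lv01", "lv01snap"],
}


def consistent(combo):
    # A set is usable only if every member comes from the same point in
    # time: all origins, or all snapshots.
    return len({name.endswith("snap") for name in combo}) == 1


outcomes = {combo: consistent(combo)
            for combo in product(*candidates.values())}

for combo, ok in outcomes.items():
    print(" + ".join(combo), "-", "OK" if ok else "failure")

failure_rate = sum(not ok for ok in outcomes.values()) / len(outcomes)
```

With more member devices the picture gets worse: still only two consistent sets, but the number of possible pairings doubles with every device added.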

 BR
 G.Baroncelli
 
  
  BR
  G.Baroncelli
 
 
  -- 
  gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
  Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
 
 
 -- 
 gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
 Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
 




Re: BTRFS messes up snapshot LV with origin

2014-11-25 Thread Goffredo Baroncelli
On 11/25/2014 09:29 PM, Zygo Blaxell wrote:
 On Tue, Nov 25, 2014 at 05:34:15PM +0100, Goffredo Baroncelli wrote:
 On 11/23/2014 01:19 AM, Zygo Blaxell wrote:
 [...]
 md-raid works as long as you specify the devices, and because it's always
 the lowest layer it can ignore LVs (snapshot or otherwise).  It's also
 not a particularly common use case, while making an LV snapshot of a
 filesystem is a typical use case.

 I fully agree; but you still consider a *multi-device* btrfs over lvm...
 This is like a dm over lvm... which doesn't make sense at all (as you 
 already wrote)
 
 It makes sense for btrfs because btrfs can productively use LVs on
 different PVs (e.g. btrfs-raid1 on two LVs, one on each PV).  LVM is
 the bottom layer because not everything in the world is btrfs--things
 like ephemeral /tmp, boot, swap, and temporary backup copies of the btrfs
 (e.g.  before running btrfsck) have to live on the same physical drives
 as the btrfs filesystems.

Let me summarize:

1) btrfs-single-disk on lvm works fine
2) btrfs-w/multiple-disk on lvm works fine
3) btrfs-single-disk on lvm works fine even with snapshot

4) btrfs-w/multiple-disk doesn't work with lvm AND snapshot

However, I still don't understand why you want btrfs-w/multiple-disk over LVM.



 
 and mounting the filesystem fails at 3.  
 Are you sure ?

 Yes, I'm sure.  I've had to replace filesystems destroyed this way.

In a previous email you wrote:
 Multi-device btrfs fails at 2, 
So I assumed that the point 3 onwards were related to a single-disk btrfs.



[...]


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS messes up snapshot LV with origin

2014-11-25 Thread Zygo Blaxell
On Tue, Nov 25, 2014 at 10:59:53PM +0100, Goffredo Baroncelli wrote:
 On 11/25/2014 09:29 PM, Zygo Blaxell wrote:
  On Tue, Nov 25, 2014 at 05:34:15PM +0100, Goffredo Baroncelli wrote:
  On 11/23/2014 01:19 AM, Zygo Blaxell wrote:
  [...]
  md-raid works as long as you specify the devices, and because it's always
  the lowest layer it can ignore LVs (snapshot or otherwise).  It's also
  not a particularly common use case, while making an LV snapshot of a
  filesystem is a typical use case.
 
  I fully agree; but you still consider a *multi-device* btrfs over lvm...
  This is like a dm over lvm... which doesn't make sense at all (as you 
  already wrote)
  
  It makes sense for btrfs because btrfs can productively use LVs on
  different PVs (e.g. btrfs-raid1 on two LVs, one on each PV).  LVM is
  the bottom layer because not everything in the world is btrfs--things
  like ephemeral /tmp, boot, swap, and temporary backup copies of the btrfs
  (e.g.  before running btrfsck) have to live on the same physical drives
  as the btrfs filesystems.
 
 Let me summarize:
 
 1) btrfs-single-disk on lvm works fine
 2) btrfs-w/multiple-disk on lvm works fine
 3) btrfs-single-disk on lvm works fine even with snapshot
 
 4) btrfs-w/multiple-disk doesn't work with lvm AND snapshot
 
 However, I still don't understand why you want btrfs-w/multiple-disk over
 LVM.

I want to split a few disks into partitions, but I want to create,
move, and resize the partitions from time to time.  Only LVM can do
that without taking the machine down, reducing RAID integrity levels,
hotplugging drives, or leaving installed drives idle most of the time.

I want btrfs-raid1 because of its ability to replace corrupted or lost
data from one disk using the other.  If I run a single-volume btrfs
on LVM-RAID1 (or dm-RAID1, or RAID1 at any other layer of the storage
stack), I can detect lost data, but not replace it automatically from
the other mirror.

Since I want both things at the same time, I have btrfs w/multiple disks
on LVM.

The LVM snapshots are for providing an 'undo' capability when I experiment
with some btrfs or btrfsck feature that destroys the filesystem.

  and mounting the filesystem fails at 3.  
  Are you sure ?
 
  Yes, I'm sure.  I've had to replace filesystems destroyed this way.
 
 In a previous email you wrote:
  Multi-device btrfs fails at 2, 
 So I assumed that the point 3 onwards were related to a single-disk btrfs.
 
 
 
 [...]
 
 
 -- 
 gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
 Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5




Re: BTRFS messes up snapshot LV with origin

2014-11-25 Thread Chris Murphy
What happens when all btrfs LVs are unmounted, and you lvchange -an
the LVs (the pair) you do not want mounted; and then btrfs dev scan;
and then mount one of the devices? It should only find the matching LV
because the others are deactivated. I know this isn't ideal, but it's
better than corruption.
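The sequence proposed above can be written down as an ordered list of commands. The sketch below only builds the argument lists and executes nothing; the volume group and LV names are hypothetical, and whether the kernel really forgets devices registered by an earlier scan once their LVs are deactivated is exactly the open question in this message.

```python
def isolate_and_mount_plan(unwanted_lvs, device, mountpoint):
    """Return the commands, in order, as argv lists. Nothing is executed."""
    # Deactivate every LV that must not take part in assembly.
    plan = [["lvchange", "-an", lv] for lv in unwanted_lvs]
    # Rescan so btrfs registers only the block devices still visible.
    plan.append(["btrfs", "device", "scan"])
    # Mount by naming one of the remaining member devices.
    plan.append(["mount", device, mountpoint])
    return plan


if __name__ == "__main__":
    # Volume group and LV names below are hypothetical.
    for argv in isolate_and_mount_plan(
        ["vg/lv00snap", "vg/lv01snap"], "/dev/vg/lv00", "/mnt"
    ):
        print(" ".join(argv))
```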


Chris Murphy


Re: BTRFS messes up snapshot LV with origin

2014-11-25 Thread Duncan
Goffredo Baroncelli posted on Tue, 25 Nov 2014 22:59:53 +0100 as
excerpted:

 However, I still don't understand why you want btrfs-w/multiple-disk
 over LVM.

While I'm not an LVM person here, and he already replied with essentially 
the same point, I think it's worth repeating...

Btrfs' checksummed error detection and automatic rewrite from a different 
copy isn't a small thing, and simply isn't available at all with most 
would-be alternatives (zfs being the only similar thing I know of for 
Linux, and of course it has its own issues both technical and social/
legal/license).  That alone is worth running multi-device btrfs to get.  
That makes btrfs a near-mandatory part of the picture, whatever it's on.

And for people wanting LVM's volume management (including partitioning 
without many of the limitations), the direct result is multi-device btrfs 
on lvm.

From my perspective, however, btrfs is simply incompatible with lvm 
snapshots, because the basic assumptions are incompatible.  Btrfs assumes 
UUIDs will be exactly what they say on the label, /unique/, while lvm's 
snapshot feature directly breaks that uniqueness by copying the (former) 
UUID, thus making the former UUID no longer unique and thus no longer 
truly UUID.  Thus, part of the lvm /feature/ of snapshots is in direct 
contradiction to a basic assumption of btrfs, that UUIDs are exactly 
that, unique, making that feature directly incompatible with btrfs on a 
very basic level.

So people can have their btrfs on lvm, but if they do, they have to forego 
LVM snapshots because btrfs isn't compatible with their usage.  To me 
it's as simple as that, and people can choose either btrfs or lvm 
snapshots, but not both, it's one XOR the other.  So for me it's simply 
choose the one you will have the most difficulty doing without and forgo 
the other one.  Not a problem, just make your choice and move on.

OTOH, there's that common signature about the reasonable man folding to 
the circumstance while the unreasonable man insisting on folding the 
circumstance to his wishes instead, so progress depends on the 
unreasonable man...

But that's exactly what I see here, an unreasonable man insisting that 
entirely logical circumstance bend to his will.  Which, given someone to 
actually code it up, it might well do. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: BTRFS messes up snapshot LV with origin

2014-11-25 Thread Chris Murphy
On Tue, Nov 25, 2014 at 7:11 PM, Zygo Blaxell zblax...@furryterror.org wrote:
 On Tue, Nov 25, 2014 at 03:46:32PM -0700, Chris Murphy wrote:
 What happens when all btrfs LVs are unmounted, and you lvchange -an
 the LVs (the pair) you do not want mounted; and then btrfs dev scan;
 and then mount one of the devices? It should only find the matching LV
 because the others are deactivated. I know this isn't ideal, but it's
 better than corruption.

 This is one of two possible ways to assemble the btrfs correctly.
 The other is to explicitly name all of the devices when mounting.

OK I didn't realize it was possible to explicitly name all of them,
the last time I'd tried this (about 9 epochs ago) mount didn't
understand being passed two devices before the mount point.
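Naming all the devices is normally done through the btrfs device= mount option, repeated once per member, rather than by passing two devices before the mount point. The helper below is a small sketch that builds such a command line without running it; the function name and the LV paths are assumptions made for illustration, and it does not verify that the named devices actually belong together.

```python
def btrfs_mount_argv(devices, mountpoint):
    """Build a mount command that lists every member device explicitly."""
    if not devices:
        raise ValueError("at least one device is required")
    if len(set(devices)) != len(devices):
        raise ValueError("device list contains duplicates")
    # One device=<path> entry per member, comma-separated.
    options = ",".join(f"device={dev}" for dev in devices)
    return ["mount", "-t", "btrfs", "-o", options, devices[0], mountpoint]


if __name__ == "__main__":
    # Hypothetical LV paths.
    print(" ".join(btrfs_mount_argv(["/dev/vg/lv00", "/dev/vg/lv01"], "/mnt")))
```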


 The challenge for the poor end-user (or inexperienced sysadmin) is to
 defeat all the defaults in system installers, initramfs-tools, lvm2,
 udev, etc. to prevent btrfs from destroying a filesystem accidentally.

I agree if it finds two identical volumes it should fail to mount with
some coherent error.

-- 
Chris Murphy


Re: BTRFS messes up snapshot LV with origin

2014-11-22 Thread Goffredo Baroncelli
On 11/21/2014 05:28 AM, Zygo Blaxell wrote:
 e.g. if an ext4 filesystem explodes, I can:
 
   1.  make a LVM snapshot of the broken filesystem
 
   2.  run e2fsck on the snapshot
 
   3.  mount and repair the snapshot, e.g. rsync any missing files
   from backups, salvage anything that survived
 
   4.  LVM merge the snapshot to its origin volume
 
   5.  umount the origin volume and mount the merged volume
   (or just reboot)
 
 ...and I can do all of this on a running system, in-place, with only a
 few minutes of downtime in the must-reboot case.
 
 None of the above works with btrfs at all.  Multi-device btrfs fails
 at 2, 

You can't compare ext4 with btrfs if you are talking about a multi-device
filesystem: ext4 doesn't have this capability.
Try to make an md-raid over snapshotted logical volume(s); I never tried
that, but I suppose that there will be the same problems...

 and mounting the filesystem fails at 3.  
Are you sure ?

ghigo@venice:/tmp$ # create a btrfs filesystem in a logical volume
ghigo@venice:/tmp$ sudo truncate -s +10G disk.img
ghigo@venice:/tmp$ sudo losetup -f disk.img 
ghigo@venice:/tmp$ sudo pvcreate /dev/loop0 
ghigo@venice:/tmp$ sudo vgcreate vgtest /dev/loop0 
ghigo@venice:/tmp$ sudo lvcreate -n lvone -L 3G vgtest
ghigo@venice:/tmp$ sudo mkfs.btrfs /dev/vgtest/lvone 
ghigo@venice:/tmp$ mkdir t

ghigo@venice:/tmp$ # create a file inside a btrfs fs
ghigo@venice:/tmp$ sudo mount /dev/vgtest/lvone t/
ghigo@venice:/tmp$ sudo dd if=/dev/zero of=t/disk-orig bs=1M count=1
ghigo@venice:/tmp$ sudo umount t

ghigo@venice:/tmp$ # make a lvm snapshot and add a 2nd file
ghigo@venice:/tmp$ sudo lvcreate -s -n lvone_snap -L 3G vgtest/lvone
ghigo@venice:/tmp$ sudo mount /dev/vgtest/lvone_snap t/
ghigo@venice:/tmp$ sudo dd if=/dev/zero of=t/disk-snap bs=1M count=1
ghigo@venice:/tmp$ sudo umount t

ghigo@venice:/tmp$ # mount the original lv, and check the file
ghigo@venice:/tmp$ sudo mount /dev/vgtest/lvone t/
ghigo@venice:/tmp$ ls -l t
total 1024
-rw-r--r-- 1 root root 1048576 Nov 22 18:11 disk-orig
ghigo@venice:/tmp$ sudo umount t

ghigo@venice:/tmp$ # mount the snapshot lv, and check the files
ghigo@venice:/tmp$ sudo mount /dev/vgtest/lvone_snap t/
ghigo@venice:/tmp$ ls -l t
total 2048
-rw-r--r-- 1 root root 1048576 Nov 22 18:11 disk-orig
-rw-r--r-- 1 root root 1048576 Nov 22 18:12 disk-snap

On the basis of the example above, if you want to mount a single-disk
filesystem, BTRFS seems to me to work properly. You only have to take
care not to mount the two filesystems at the same time.

BR
G.Baroncelli


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Volume/subvolume UUID uniqueness, was: BTRFS messes up snapshot LV with origin

2014-11-22 Thread Chris Murphy
I don't know how to fix this but I've convinced myself there's at
least a small problem. And not just with LVM snapshots as in the
originating thread.

- Via seed device method of creating a Btrfs volume, the resulting
volume gets a new UUID. The volume UUID from the seed device doesn't
pass through, is not inherited / copied. Therefore there's already
recognition that snapshotting a Btrfs volume, which is what volume
creation from a seed device effectively is, should result in the new
volume getting a new UUID.

Therefore it seems reasonable a mechanism to support new volume UUIDs
upon LVM snapshots being taken is needed. Maybe leveraging existing
seed code can help, consider existing volume data a virtual seed
device, and the remaining free space as a virtual added device to
enable changing volume UUID rather than rewriting possibly piles of
UUIDs.


- While the seed device method of creating a Btrfs volume results in a
new volume UUID, subvolume UUIDs from the seed pass through to the new
volume. Since I can create many new volumes from one seed device, in
effect I'm creating many instances of subvolumes with identical UUIDs
and can now no longer be differentiated, locally and remotely. This
seems to be a much bigger problem than the LVM case, since it occurs
with only Btrfs tools being used.

The grandiose idea of UUIDs is persistence in identifying a specific
object/resource for all time, anywhere in the universe. Reducing this
to something practical, it should enable a way to identify an object
or resource within one or two human lifetimes, within our solar
system. Yet the current implementation has broken this on a much
shorter time scale, on a single computer.

Since we recognize subvolume snapshots should get new subvolume UUIDs,
and volume snapshots via seed device method creation of new volumes
get new volume UUIDs; a volume snapshot of course is also snapshotting
the subvolumes too, so the subvolume UUIDs can't pass through the way
they do right now. It's not correct behavior.

Another matter is what to do with parent uuid and snapshot
relationship metadata in the new volume. Assume all subvolumes get new
UUIDs on the new volume, there are three potentials:
1. parent uuid is always blank, no relationships between subvolumes are preserved
2. parent uuid is the uuid of its identical mirror (the original) in
the seed device.
3. parent uuid is the new uuid of its relative parent on the current
new volume, preserving relationships between subvolumes and snapshots.

I think any of those three are better than UUID duplication (recycling
actually). Maybe I'm not thinking of a use case for preserving these
UUIDs but at the moment I think it's specious. We can't be attached to
specific UUIDs: the instant a subvolume is effectively snapshotted by LVM
or a Btrfs seed device, it's a unique object/resource, and should have
its own URN. After all, by default these objects are read/write. Maybe
if by default they were readonly I could be convinced of the validity
of UUID preservation.
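The three parent-uuid policies listed above can be compared on a toy model. This is a sketch under stated assumptions: the Subvol record, the remap function and the policy names are invented for illustration and do not correspond to any btrfs on-disk structure or tool.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subvol:
    # Hypothetical record, not a btrfs structure.
    name: str
    subvol_uuid: uuid.UUID
    parent_uuid: Optional[uuid.UUID]


def remap(subvols, policy):
    """Give every subvolume a fresh UUID; set parent_uuid per `policy`.

    policy: "blank" (option 1), "seed" (option 2), "relative" (option 3).
    """
    if policy not in ("blank", "seed", "relative"):
        raise ValueError(policy)
    # Old UUID -> new UUID, so relationships can be translated.
    new_ids = {sv.subvol_uuid: uuid.uuid4() for sv in subvols}
    out = []
    for sv in subvols:
        if policy == "blank":
            parent = None
        elif policy == "seed":
            # Point back at the original of this subvolume on the seed.
            parent = sv.subvol_uuid
        else:
            # Translate the old parent to its new UUID on this volume.
            parent = new_ids.get(sv.parent_uuid)
        out.append(Subvol(sv.name, new_ids[sv.subvol_uuid], parent))
    return out
```

Under every policy the new volume shares no subvolume UUID with the seed; only the meaning of parent_uuid differs.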


-- 
Chris Murphy


Re: Volume/subvolume UUID uniqueness, was: BTRFS messes up snapshot LV with origin

2014-11-22 Thread Robert White

On 11/22/2014 02:50 PM, Robert White wrote:

Take a couple snapshots of a subvolume, and then
send those subvolumes to another file system with send/receive, and then
do btrfs subvolume list -u -q on the two filesystems and tell me that
mess makes sense. Or try to recreate a subvolume from its snapshot in a
way that doesn't shatter the relationships in your backup scheme. (I'm
researching for a couple patches but I'm not expecting a warm reception
given the silence to date).


(ASIDE: In particular, use btrfs sub send -c SNAP1 SNAP2 and then btrfs 
sub send -c SNAP2 SNAP3 etc. before doing the btrfs sub list -u -q to 
view the mess I speak of.)




Re: BTRFS messes up snapshot LV with origin

2014-11-22 Thread Zygo Blaxell
On Sat, Nov 22, 2014 at 06:34:38PM +0100, Goffredo Baroncelli wrote:
 On 11/21/2014 05:28 AM, Zygo Blaxell wrote:
  e.g. if an ext4 filesystem explodes, I can:
  
  1.  make a LVM snapshot of the broken filesystem
  
  2.  run e2fsck on the snapshot
  
  3.  mount and repair the snapshot, e.g. rsync any missing files
  from backups, salvage anything that survived
  
  4.  LVM merge the snapshot to its origin volume
  
  5.  umount the origin volume and mount the merged volume
  (or just reboot)
  
  ...and I can do all of this on a running system, in-place, with only a
  few minutes of downtime in the must-reboot case.
  
  None of the above works with btrfs at all.  Multi-device btrfs fails
  at 2, 
 
 You can't compare ext4 with btrfs if you are talking about a multi-device
 filesystem: ext4 doesn't have this capability.

btrfs fails this comparison as a single-device filesystem.

 Try to make an md-raid over snapshotted logical volume(s); I never tried
 that, but I suppose that there will be the same problems...

md-raid works as long as you specify the devices, and because it's always
the lowest layer it can ignore LVs (snapshot or otherwise).  It's also
not a particularly common use case, while making an LV snapshot of a
filesystem is a typical use case.

  and mounting the filesystem fails at 3.  
 Are you sure ?

Yes, I'm sure.  I've had to replace filesystems destroyed this way.

[working instance snipped]

 On the basis of the example above, if you want to mount a single-disk
 filesystem, BTRFS seems to me to work properly. You only have to take
 care not to mount the two filesystems at the same time.

The problem is btrfs stops searching when it sees one disk with each UUID,
so the set of disks (snapshot vs origin) that you get is *random*.
For a pair of origin + snapshots, there's a 50% chance it works, 50%
chance it eats your data.
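The order dependence described above can be illustrated with a toy scan that keeps one block device per device UUID. This is a simplified model, not the kernel's logic: the keep parameter covers both descriptions given in this thread (stop at the first device seen for a UUID, or let the last one seen overwrite), and the device names and UUID strings are placeholders.

```python
def assemble(enumeration_order, device_uuid_of, keep="first"):
    """Pick one block device per device UUID, as a naive scan would."""
    if keep not in ("first", "last"):
        raise ValueError(keep)
    chosen = {}
    for dev in enumeration_order:
        dev_uuid = device_uuid_of[dev]
        if keep == "first" and dev_uuid in chosen:
            continue  # stop looking once this UUID has a device
        chosen[dev_uuid] = dev  # with keep="last", later devices overwrite
    return sorted(chosen.values())


# Placeholder UUID strings: a snapshot shares its origin's device UUID.
DEVICE_UUID_OF = {
    "lv00": "dev-1", "lv00snap": "dev-1",
    "lv01": "dev-2", "lv01snap": "dev-2",
}

if __name__ == "__main__":
    for order in (
        ["lv00", "lv01", "lv00snap", "lv01snap"],
        ["lv00", "lv01snap", "lv00snap", "lv01"],
    ):
        print(order, "->", assemble(order, DEVICE_UUID_OF))
```

With the same four LVs, changing only the enumeration order switches the result between a matching pair and a mixed origin/snapshot pair.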

 BR
 G.Baroncelli
 
 
 -- 
 gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
 Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5




Re: BTRFS messes up snapshot LV with origin

2014-11-21 Thread Robert White

On 11/20/2014 10:22 PM, Duncan wrote:

But while other filesystems might allow un-UUIDs (heh, UUUIDs or U3IDs
=:^), because they're no longer unique, requiring them to be unique just
as the label says cannot be considered a bug.  It's simply stricter
enforcement of the rules, which are, after all, plainly stated in the
descriptive name.


You take Us away, not add them

UID = unique ID
GUID = globally unique ID
UUID = universally unique ID


And other file systems have the same issues. XFS, for example, uses UUIDs 
in the same way. It just has a command to re-brand the filesystem's UUID 
which you apply to the LVM snapshot immediately after taking the 
snapshot. (problem long-since established and understood since 2009 or so.)


I don't know if this approach would work for Btrfs with subvolumes.

Example Citation :: 
http://www.miljan.org/main/2009/11/16/lvm-snapshots-and-xfs/


XFS also has the nouuid mount option.

btrfs has device= mount option.

But any system with unique ids will have this identical issue when 
block-snapshot support is added underneath.


-- Rob.


Re: BTRFS messes up snapshot LV with origin

2014-11-21 Thread Duncan
Robert White posted on Fri, 21 Nov 2014 03:35:05 -0800 as excerpted:

 On 11/20/2014 10:22 PM, Duncan wrote:
 But while other filesystems might allow un-UUIDs (heh, UUUIDs or U3IDs
 =:^), because they're no longer unique, requiring them to be unique
 just as the label says cannot be considered a bug.  It's simply
 stricter enforcement of the rules, which are, after all, plainly stated
 in the descriptive name.
 
 You take Us away, not add them
 
 UID = unique ID
 GUID = globally unique ID
 UUID = universally unique ID

I was making a joke, as I happened to notice un-UUID =3 U-s just as I was 
writing that.  Universally unique ID = UUID, un-UUID (not universally 
unique ID) = UUUID = U^3ID. =:^)

Of course formally it'd be NUID (not/non- unique) or some such, but un-
UUID served my purpose well enough, including the joke once I noticed it, 
so...

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: BTRFS messes up snapshot LV with origin

2014-11-21 Thread Zygo Blaxell
On Fri, Nov 21, 2014 at 06:22:57AM +0000, Duncan wrote:
 After all, an LVM block-level snapshot takes the same space as a file 
 containing the same raw data, and if there's room for the data in an LVM 
 snapshot, given a different layout, there's room for exactly the same 
 amount of data as a file on a different filesystem, piped thru some 
 compressor if necessary due to tight datasize constraints.

That isn't true at all.  A repairing fsck can take less than 1% of the
overall volume size, and a full conversion from another filesystem type
can take less than 10%.  Usually I can find enough space by blowing away
the swap LV for a few hours.

I do NOT usually have 13TB of slack space lying around in a 26TB disk
array, nor do I have enough bandwidth to move those 13TB to another
machine without great inconvenience.

 But while other filesystems might allow un-UUIDs (heh, UUUIDs or U3IDs 
 =:^), because they're no longer unique, requiring them to be unique just 
 as the label says cannot be considered a bug.  It's simply stricter 
 enforcement of the rules, which are, after all, plainly stated in the 
 descriptive name.

It's not a bug as long as I can completely control which devices are
searched for UUIDs, and the system behaves sanely when multiple UUIDs
are found through automatic discovery; otherwise, it's not only a bug,
it's a DoS attack security vulnerability.  Consider what happens if
someone looks at /sys/fs/btrfs, reads the non-secret UUIDs, builds a fake
filesystem with those UUIDs, puts the fake filesystem on a USB stick,
and plugs it back into the victim machine...

 -- 
 Duncan - List replies preferred.   No HTML msgs.
 Every nonfree program has a lord, a master --
 and if you use the program, he is your master.  Richard Stallman
 




Re: BTRFS messes up snapshot LV with origin

2014-11-21 Thread Chris Murphy
On Thu, Nov 20, 2014 at 11:22 PM, Duncan 1i5t5.dun...@cox.net wrote:


 When I have such a filesystem level problem, I simply dd from the backing
 device to some other location, generally to a file that's on a different
 filesystem (preferably non-btrfs, I use reiserfs as I've found it very
 resilient, here), in which case btrfs device scan won't see the UUID on
 the copy as it scans block devices, not inside non-device files.

That's hours of dd and you have to find space to do it.


 After all, an LVM block-level snapshot takes the same space as a file
 containing the same raw data, and if there's room for the data in an LVM
 snapshot, given a different layout, there's room for exactly the same
 amount of data as a file on a different filesystem, piped thru some
 compressor if necessary due to tight datasize constraints.

That's not true for thin volume snapshots. They take up next to no
space upon creation, they don't need space reserved in advance.
They're more like a qcow2 snapshot than a conventional LVM snapshot; a
big difference being if you delete the snapshot, or you delete a bunch
of files in a thin volume and follow it with fstrim, the unused
extents are returned to the thin pool.

There has been a fragmentation problem with thin volumes; I don't know
if that's solved yet. And I don't know if it exacerbates things with
Btrfs fragmentation.


-- 
Chris Murphy


Re: BTRFS messes up snapshot LV with origin

2014-11-21 Thread Duncan
Chris Murphy posted on Fri, 21 Nov 2014 11:23:45 -0700 as excerpted:

 On Thu, Nov 20, 2014 at 11:22 PM, Duncan 1i5t5.dun...@cox.net wrote:
 
 
 When I have such a filesystem level problem, I simply dd from the
 backing device to some other location, generally to a file that's on a
 different filesystem (preferably non-btrfs, I use reiserfs as I've
 found it very resilient, here), in which case btrfs device scan won't
 see the UUID on the copy as it scans block devices, not inside
 non-device files.
 
 That's hours of dd and you have to find space to do it.

I did it recently here.  There's a method to my sub-100-GiB partition 
madness! =:^)  The partitions in question were on SSD, and were small 
enough I could simply DD them to files on my media filesystem, which was 
after all designed to be able to take full ISO images, etc.

Additionally, due to size and reasonably consistent linear intra-file 
access patterns, the media filesystem's still on much cheaper spinning 
rust, while most of the system's on much faster to random-access but far 
more expensive SSD, so in this case one side was SSD, the other spinning 
rust.

Tho granted, if you're doing single-partition/filesystem multi-TiB 
filesystems, it does get to be a problem.  As there would have been if 
the filesystem in question was the media filesystem, altho that one's not 
yet btrfs for a reason.  But still, if there's room enough for an LVM 
snapshot in the first place, with a different layout, there'd be room for 
the same data as a file.  That's pretty basic.

 After all, an LVM block-level snapshot takes the same space as a file
 containing the same raw data, and if there's room for the data in an
 LVM snapshot, given a different layout, there's room for exactly the
 same amount of data as a file on a different filesystem, piped thru
 some compressor if necessary due to tight datasize constraints.
 
 That's not true for thin volume snapshots. They take up next to no space
 upon creation, they don't need space reserved in advance.

Thus the mention of compression if necessary.  Thin-volume snapshots are 
effectively compression by another name, and a raw dd from them should 
compress pretty much equally well, depending on compression method 
chosen, of course. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: BTRFS messes up snapshot LV with origin

2014-11-21 Thread Duncan
Zygo Blaxell posted on Fri, 21 Nov 2014 12:56:23 -0500 as excerpted:

 It's not a bug as long as I can completely control which devices are
 searched for UUIDs, and the system behaves sanely when multiple UUIDs
 are found through automatic discovery; otherwise, it's not only a bug,
 it's a DoS attack security vulnerability.  Consider what happens if
 someone looks at /sys/fs/btrfs, reads the non-secret UUIDs, builds a
 fake filesystem with those UUIDs, puts the fake filesystem on a USB
 stick, and plugs it back into the victim machine...

With the current state of USB vulnerability (firmware reprogrammed as an 
input device, etc, the vuln has been all over the tech news for some 
months now), anyone with USB access to the machine is simply another case 
of anyone with physical access to the machine; they're normally assumed 
to be able to at minimum take down the machine, the ultimate 
DoS, in any case, and often to have effective root, tho that can be 
mitigated to some extent with encryption, etc.  It's generally assumed 
that if you have physical access, as required to plug in that USB, game 
over, the machine is effectively p40wn3d.  At the /very/ least, with 
physical access it's vulnerable to the sledgehammer DoS, and there's 
little to be done about that but prevent physical access by all means 
necessary (armed guards, nuclear silo hosting, etc) in the first place.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: BTRFS messes up snapshot LV with origin

2014-11-21 Thread Duncan
Duncan posted on Fri, 21 Nov 2014 22:49:06 +0000 as excerpted:

 Chris Murphy posted...

 That's not true for thin volume snapshots. They take up next to no
 space upon creation, they don't need space reserved in advance.
 
 Thus the mention of compression if necessary.  Thin-volume snapshots are
 effectively compression by another name, and a raw dd from them should
 compress pretty much equally well, depending on compression method
 chosen, of course. =:^)

Oops, I mis-parsed thin.  Good point and thanks, Chris.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: BTRFS messes up snapshot LV with origin

2014-11-21 Thread Duncan
Duncan posted on Fri, 21 Nov 2014 23:41:49 +0000 as excerpted:

 Duncan posted on Fri, 21 Nov 2014 22:49:06 +0000 as excerpted:
 
 Chris Murphy posted...
 
 That's not true for thin volume snapshots. They take up next to no
 space upon creation, they don't need space reserved in advance.
 
 Thus the mention of compression if necessary.  Thin-volume snapshots
 are effectively compression by another name, and a raw dd from them
 should compress pretty much equally well, depending on compression
 method chosen, of course. =:^)
 
 Oops, I mis-parsed thin.  Good point and thanks, Chris.

... And Zygo, who pointed out my error as well. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: BTRFS messes up snapshot LV with origin

2014-11-20 Thread Zygo Blaxell
On Mon, Nov 17, 2014 at 08:04:05PM +0100, Goffredo Baroncelli wrote:
 On 2014-11-17 07:59, Brendan Hide wrote:
  
  That leaves two aspects of this issue which I view as two separate bugs:
  a) Btrfs cannot gracefully handle separate filesystems that have the same 
  UUID. At all.
  b) Grub appears to pick the wrong filesystem when presented with two 
  filesystems with the same UUID.
  
  I feel a) is a btrfs bug.
  I feel b) is a bug that is more about ecosystem design than grub being 
  silly.
 
 Regarding a)
 IIRC, btrfs collects the filesystem information by UUID; if two 
 filesystems have the same UUID (like the LVM-snapshot case), the
 last filesystem discovered overwrites the first one.
 
 Filesystem discovery is done in user-space, so it should be simple
 to skip a filesystem on an LVM snapshot.
 
 Regarding b)
 I am a bit confused: if I understood correctly, the root filesystem was
 picked from an LVM snapshot, so grub-probe *correctly* reported that
 the root device is the snapshot.
 The problem was in filesystem discovery during boot: the *real* device
 was scanned first, then the LVM snapshot; the latter overwrote the
 former, so the system booted from the LVM snapshot.

IMHO if the device UUID search finds multiple devices with the same device
UUID, it should ignore _all_ of them as the identification problem
is unsolvable without further user input.  This is what the 'device='
mount option is for.
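
A minimal sketch of the "ignore _all_ of them" policy, written in Python
for illustration only; it is not btrfs code. The names ScannedDevice and
select_unambiguous, and the device paths and UUID strings, are all
hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class ScannedDevice(NamedTuple):
    path: str      # block device node found by the scan
    fsid: str      # filesystem UUID read from the superblock
    dev_uuid: str  # per-device UUID read from the superblock


def select_unambiguous(devices):
    """Return (usable, rejected).

    Any device UUID that shows up on more than one path is rejected
    entirely instead of guessing which copy is the real one.
    """
    by_dev_uuid = defaultdict(list)
    for dev in devices:
        by_dev_uuid[dev.dev_uuid].append(dev)

    usable, rejected = [], []
    for group in by_dev_uuid.values():
        (usable if len(group) == 1 else rejected).extend(group)
    return usable, rejected


scan = [
    ScannedDevice("/dev/vg/root", "fs-Z", "dev-1"),
    ScannedDevice("/dev/vg/root-snap", "fs-Z", "dev-1"),  # LVM snapshot copy
    ScannedDevice("/dev/vg/data", "fs-Y", "dev-2"),
]
usable, rejected = select_unambiguous(scan)
print("usable:", [d.path for d in usable])
print("needs user input:", [d.path for d in rejected])
```

The rejected list is where an explicit hint from the user (such as the
'device=' mount option) would have to resolve the ambiguity.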

 My conclusion is that we should improve the btrfs scan so that:
 - in udev rules, a partition that is an LVM snapshot is by default
 not scanned by btrfs dev scan
 - btrfs dev scan skips LVM snapshots during partition discovery.

That would mean I can't do this:

1.  lvm snapshot of ext4 filesystem

2.  btrfs-convert the snapshot

3.  mount the snapshot, make sure it's OK

4.  merge LVM snapshot to overwrite original ext4 filesystem

which would be a shame since that's the only way I ever convert ext3/4
filesystems to btrfs (btrfs-convert is a little buggy still).

 BR
 G.Baroncelli
 
 
 
 -- 
 gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
 Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5




Re: BTRFS messes up snapshot LV with origin

2014-11-20 Thread Zygo Blaxell
On Wed, Nov 19, 2014 at 10:20:17AM -0500, Phillip Susi wrote:
 
 On 11/18/2014 9:54 PM, Chris Murphy wrote:
  Why is it silly? Btrfs on a thin volume has practical use case
  aside from just being thinly provisioned, its snapshots are block
  device based, not merely that of an fs tree.
 
 Umm... because one of the big selling points of btrfs is that it is in
 a much better position to make snapshots being aware of the fs tree
 rather than doing it in the block layer.

One of the big selling points of LVM is that it is in a much better
position to make snapshots so you can run btrfsck on the shattered
remains of your broken btrfs filesystem.

The UUID-driven behavior of btrfs is _really extremely annoying_.
No other filesystem forces me to jump through the hoops btrfs does
to get routine admin tasks done.

e.g. if an ext4 filesystem explodes, I can:

1.  make a LVM snapshot of the broken filesystem

2.  run e2fsck on the snapshot

3.  mount and repair the snapshot, e.g. rsync any missing files
from backups, salvage anything that survived

4.  LVM merge the snapshot to its origin volume

5.  umount the origin volume and mount the merged volume
(or just reboot)

...and I can do all of this on a running system, in-place, with only a
few minutes of downtime in the must-reboot case.
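
The ext4 steps above can be sketched as a small Python helper that only
builds the command lines and runs nothing. The volume group, LV names,
snapshot size and mount point are made-up placeholders, and the helper
name ext4_repair_plan is hypothetical; the commands themselves are the
standard lvcreate, e2fsck, mount, umount and lvconvert tools.

```python
import shlex


def ext4_repair_plan(vg, origin, snap, snap_size, mountpoint):
    """Return argv lists for steps 1-4; the caller decides whether to run them."""
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        # 1. snapshot the broken filesystem
        ["lvcreate", "--snapshot", "--name", snap, "--size", snap_size,
         f"{vg}/{origin}"],
        # 2. check the snapshot, never the origin
        ["e2fsck", "-f", snap_dev],
        # 3. mount the snapshot for manual repair (rsync from backups etc.)
        ["mount", snap_dev, mountpoint],
        ["umount", mountpoint],
        # 4. merge the repaired snapshot back into its origin
        ["lvconvert", "--merge", f"{vg}/{snap}"],
    ]


plan = ext4_repair_plan("vg0", "home", "home-fix", "10G", "/mnt/fix")
for argv in plan:
    print(shlex.join(argv))
```

Step 5 (remounting the origin, or rebooting) is left out on purpose,
since it depends on whether the origin is in use.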

None of the above works with btrfs at all.  Multi-device btrfs fails
at 2, and mounting the filesystem fails at 3.  The closest I've gotten
to this workflow is to set up a kvm instance that can see only the LVM
snapshots, and run the btrfsck or rsync there--and hope that the
system doesn't crash and reboot during that time, or the filesystem will
be more or less destroyed by the random combination of origin and
snapshot LVs.

I've also learned the hard way to always make an LVM snapshot before
running btrfsck, just in case you discover a new btrfsck bug with your
filesystem.  That at least works for single-device btrfs filesystems.

 So it is kind of silly in the first place to be using lvm snapshots
 under btrfs, but it is doubly silly to use lvm for snapshots, and
 btrfs for the mirroring rather than lvm.  Pick one layer and use it
 for both functions.  Even if that is lvm, then it should also be
 handling the mirroring.
 




Re: BTRFS messes up snapshot LV with origin

2014-11-19 Thread Phillip Susi

On 11/18/2014 9:54 PM, Chris Murphy wrote:
 Why is it silly? Btrfs on a thin volume has practical use case
 aside from just being thinly provisioned, its snapshots are block
 device based, not merely that of an fs tree.

Umm... because one of the big selling points of btrfs is that it is in
a much better position to make snapshots being aware of the fs tree
rather than doing it in the block layer.

So it is kind of silly in the first place to be using lvm snapshots
under btrfs, but it is doubly silly to use lvm for snapshots, and
btrfs for the mirroring rather than lvm.  Pick one layer and use it
for both functions.  Even if that is lvm, then it should also be
handling the mirroring.



Re: BTRFS messes up snapshot LV with origin

2014-11-19 Thread Chris Murphy
On Wed, Nov 19, 2014 at 8:20 AM, Phillip Susi ps...@ubuntu.com wrote:

 On 11/18/2014 9:54 PM, Chris Murphy wrote:
 Why is it silly? Btrfs on a thin volume has practical use case
 aside from just being thinly provisioned, its snapshots are block
 device based, not merely that of an fs tree.

 Umm... because one of the big selling points of btrfs is that it is in
 a much better position to make snapshots being aware of the fs tree
 rather than doing it in the block layer.

This is why we have fsfreeze before taking block level snapshots. And
I'd point out that consistent snapshots with Btrfs have posed challenges
too; there's a recent fstest that snapshots after a file write + truncate
for this reason.

A block layer snapshot will snapshot the entire file system, not just
one tree. We don't have a way in Btrfs to snapshot the entire volume.
Considering how things still aren't exactly stable yet, in particular
with many snapshots, it's not unreasonable to want to freeze then
snapshot the entire volume before doing some possibly risky testing or
usage where even a Btrfs snapshot doesn't protect your entire volume
should things go wrong.



 So it is kind of silly in the first place to be using lvm snapshots
 under btrfs, but it is doubly silly to use lvm for snapshots, and
 btrfs for the mirroring rather than lvm.  Pick one layer and use it
 for both functions.  Even if that is lvm, then it should also be
 handling the mirroring.


Thin volumes are more efficient. And the user creating them doesn't
have to mess around with locating physical devices or possibly
partitioning them. Plus in enterprise environments with lots of
storage and many different kinds of use cases, even knowledgeable users
aren't always granted full access to the physical storage anyway. They
get a VG to play with, or now they can have a thin pool and only
consume on storage what is actually used, and not what they've
reserved. You can mkfs a 4TB virtual size volume, while it only uses
1MB of physical extents on storage. And all of that is orthogonal to
using XFS or Btrfs which again comes down to use case. And whether I'd
have LVM mirror or Btrfs mirror is again a question of use case, maybe
I'm OK with LVM mirroring and I just get the rare corrupt file warning
and that's OK. In another use case, corruption isn't OK, I need higher
availability of known good data therefore I need Btrfs doing the
mirroring.

So I find your argument thus far uncompelling.


Chris Murphy


Re: BTRFS messes up snapshot LV with origin

2014-11-19 Thread Phillip Susi

On 11/19/2014 1:33 PM, Chris Murphy wrote:
 Thin volumes are more efficient. And the user creating them doesn't
 have to mess around with locating physical devices or possibly
 partitioning them. Plus in enterprise environments with lots of
 storage and many different kinds of use cases, even knowledgeable
 users aren't always granted full access to the physical storage
 anyway. They get a VG to play with, or now they can have a thin
 pool and only consume on storage what is actually used, and not
 what they've reserved. You can mkfs a 4TB virtual size volume, 
 while it only uses 1MB of physical extents on storage. And all of 
 that is orthogonal to using XFS or Btrfs which again comes down to 
 use case. And whether I'd have LVM mirror or Btrfs mirror is again 
 a question of use case, maybe I'm OK with LVM mirroring and I just 
 get the rare corrupt file warning and that's OK. In another use 
 case, corruption isn't OK, I need higher availability of known
 good data therefore I need Btrfs doing the mirroring.

Correct me if I'm wrong, but this kind of setup is basically where you
have a provider running an lvm thin pool volume on their hardware, and
exposing it to the customer's vm as a virtual disk.  In that case,
then the provider can do their snapshots and it won't cause this
problem since the snapshots aren't visible to the vm.  Also in these
cases the provider is normally already providing data protection by
having the vg on a raid6 or raid60 or something, so having the client
vm mirror the data in btrfs is a bit redundant.






Re: BTRFS messes up snapshot LV with origin

2014-11-18 Thread Duncan
Chris Murphy posted on Mon, 17 Nov 2014 23:21:57 -0700 as excerpted:

 I think we’re well past the expiration date on grub.cfg, a line should
 be drawn in the sand to deprecate routine use of os-prober +
 grub-mkconfig,
 and move to drop-in scripts by whatever the distro presumes will be
 responsible for managing what “tree” will be booted or will be offered
 as a boot option, all GRUB needs to learn is how to use that drop in
 script file format.
 
 Ergo just because I’ve snapshot my root does not mean grub-mkconfig
 should be creating boot entries for it. But whatever userspace tool I’m
 using to do those snapshots (ostree, snapper, whatever the GNOME folks
 might come up with) should be the thing that creates the boot entry
 script; or as simple as this 2-4 line script should be, even hand done
 by a user, unlike the current grub.cfg file format.

FWIW, I hand-edit my grub.cfg here, grub-probe was taking /forever/ on my 
system back when I upgraded to grub2, and the direct drive 
configuration of direct grub.cfg editing was /far/ more flexible, or at 
least /far/ easier to learn how to do what I wanted to do than to figure 
out how to do it thru the translation layer, in any case.

The configuration is advanced enough it has individual choices to set 
standard init and init=/bin/bash, current/fallback/stable kernels, 
current/backup/second-backup roots, etc, plus a choice to interactively 
type in additional kernel commandline options, loading those choices into 
grub variables as I change them, then another choice to boot using the 
loaded variables to select the kernel and setup the kernel commandline.  
The initial grub.cfg has the default boot option, plus others that load 
either a troubleshooting menu or the backups choices menu, from separate 
included config files, as necessary.  Just /thinking/ about trying to do 
that via the cumbersome translation layer gives me a headache, and since 
I had to learn the grub scripting layer language to set it up anyway, I 
might as well just write and troubleshoot it in that directly rather than 
trying to figure out how to get the translation layer to write it, and 
then have to troubleshoot BOTH the translation layer and the lower level 
script.

Then I deleted grub-probe and grub-mkconfig so they couldn't be run 
accidentally with unconfigured/default translation-level options to undo 
all my hard work, and set a mask on them so updating the package wouldn't 
reinstall them.

So deprecate/kill os-prober and grub-mkconfig if you want, but grub.cfg 
needs to stay working!

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: BTRFS messes up snapshot LV with origin

2014-11-18 Thread Phillip Susi

On 11/18/2014 1:16 AM, Chris Murphy wrote:
 If fstab specifies rootfs as UUID, and there are two volumes with
 the same UUID, it’s now ambiguous which one at boot time is the
 intended rootfs. It’s no different than the days of /dev/sdXY where
 X would change designations between boots = ambiguity and why we
 went to UUID.

He already said he has NOT rebooted, so there is no way that the
snapshot has actually been mounted, even if it were UUID confusion.

 So we kinda need a way to distinguish derivative volumes. Maybe
 XFS and ext4 could easily change the volume UUID, but my vague 
 recollection is this is difficult on Btrfs? So that led me to the 
 idea of a way to create an on-the-fly (but consistent) “virtual 
 volume UUID” maybe based on a hash of both the LVM LV and fs
 volume UUID.
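
The "virtual volume UUID" idea quoted above could look roughly like this
in Python. This is only a sketch of the hashing step, assuming a
name-based UUID (uuid5, which is SHA-1 based) is an acceptable hash; the
function name virtual_volume_uuid and the example identifiers are made
up, and nothing in btrfs or LVM does this today.

```python
import uuid


def virtual_volume_uuid(lv_uuid: str, fs_uuid: str) -> uuid.UUID:
    """Derive a stable UUID from the (LV UUID, filesystem UUID) pair.

    The filesystem UUID serves as the namespace and the LV UUID as the
    name, so the same pair always yields the same result.
    """
    return uuid.uuid5(uuid.UUID(fs_uuid), lv_uuid)


fs = "12345678-1234-5678-1234-567812345678"   # same on origin and snapshot
origin = virtual_volume_uuid("lv-origin-id", fs)
snapshot = virtual_volume_uuid("lv-snapshot-id", fs)
print(origin)
print(snapshot)
print("distinct:", origin != snapshot)
```

Because the snapshot is a different LV with its own LV UUID, the derived
value differs even though the filesystem UUID inside it is identical.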

When using LVM, you should be referring to the volume by the LVM name
rather than UUID.  LVM names are stable, and don't have the duplicate
uuid problem.



Re: BTRFS messes up snapshot LV with origin

2014-11-18 Thread Chris Murphy

On Nov 18, 2014, at 8:42 AM, Phillip Susi ps...@ubuntu.com wrote:

 
 On 11/18/2014 1:16 AM, Chris Murphy wrote:
 If fstab specifies rootfs as UUID, and there are two volumes with
 the same UUID, it’s now ambiguous which one at boot time is the
 intended rootfs. It’s no different than the days of /dev/sdXY where
 X would change designations between boots = ambiguity and why we
 went to UUID.
 
 He already said he has NOT rebooted, so there is no way that the
 snapshot has actually been mounted, even if it were UUID confusion.
 
 So we kinda need a way to distinguish derivative volumes. Maybe
 XFS and ext4 could easily change the volume UUID, but my vague 
 recollection is this is difficult on Btrfs? So that led me to the 
 idea of a way to create an on-the-fly (but consistent) “virtual 
 volume UUID” maybe based on a hash of both the LVM LV and fs
 volume UUID.
 
 When using LVM, you should be referring to the volume by the LVM name
 rather than UUID.  LVM names are stable, and don't have the duplicate
 uuid problem.

What if you have a Btrfs raid1 volume using two LV’s and then snapshot both 
LV’s?

Of course I’d specify one of the devices by VG-LV name. But Btrfs finds 
additional devices itself, it doesn’t support explicitly naming additional 
member devices. And in this example, there are two identical candidates, so 
it’s ambiguous to Btrfs which one to use. And further it’s unknown to the user 
which one Btrfs chose because neither mount, nor /proc/mounts right now shows 
anything other than the first device that’s mounted. So it’s using one of those 
two VG-LV’s automatically but not informing us which one.

I think there’s some metadata that can be set on each LV whether it’s 
automatically activated (at e.g. boot time) so I think the thing to do would be 
to make sure the snapshot LV’s are not activated, therefore their UUID’s 
shouldn’t be visible to Btrfs and it won’t automatically discover and use the 
wrong LV. But I haven’t tested this.


Chris Murphy


Re: BTRFS messes up snapshot LV with origin

2014-11-18 Thread Goffredo Baroncelli
On 2014-11-18 07:21, Chris Murphy wrote:
 Ergo just because I’ve snapshot my root does not mean grub-mkconfig
 should be creating boot entries for it.

I find this a useful feature: a snapshot of / is taken so that changes
can be rolled back, so why not let grub start (the kernel) from it?

Anyway, I find grub-mkconfig quite useful for a standard user.
For more advanced use cases, editing grub.cfg by hand is still possible.

BR
G.Baroncelli



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5



Re: BTRFS messes up snapshot LV with origin

2014-11-18 Thread MegaBrutal
2014-11-18 16:42 GMT+01:00 Phillip Susi ps...@ubuntu.com:


 On 11/18/2014 1:16 AM, Chris Murphy wrote:
  If fstab specifies rootfs as UUID, and there are two volumes with
  the same UUID, it’s now ambiguous which one at boot time is the
  intended rootfs. It’s no different than the days of /dev/sdXY where
  X would change designations between boots = ambiguity and why we
  went to UUID.

 He already said he has NOT rebooted, so there is no way that the
 snapshot has actually been mounted, even if it were UUID confusion.


That's right.

Anyway, I've built a system to reproduce the bug. You can download the
image and run it with KVM or other virtualization technology.
Instructions are straightforward – if you start the VM, you'll know
what to do, and you'll see what I was talking about.

http://undead.megabrutal.com/kvm-reproduce-1391429.img.xz

Download size: 113 MB; Unpacked image size: 2 GB.


  So we kinda need a way to distinguish derivative volumes. Maybe
  XFS and ext4 could easily change the volume UUID, but my vague
  recollection is this is difficult on Btrfs? So that led me to the
  idea of a way to create an on-the-fly (but consistent) “virtual
  volume UUID” maybe based on a hash of both the LVM LV and fs
  volume UUID.

 When using LVM, you should be referring to the volume by the LVM name
 rather than UUID.  LVM names are stable, and don't have the duplicate
 uuid problem.


I use LVM names to identify volumes. I initially suspected a UUID
confusion, because I thought grub-probe looks for the volume by
UUID. But now I think the problem has nothing to do with UUIDs.
I probably should have looked deeper into the problem before I
hypothesized.


Re: BTRFS messes up snapshot LV with origin

2014-11-18 Thread Robert White

On 11/18/2014 07:42 AM, Phillip Susi wrote:


On 11/18/2014 1:16 AM, Chris Murphy wrote:

(stuff about UUIDs and LVM snapshots).

 (suggestion to use LVM paths instead).

This is also an XFS+LVM+LVM_Snapshot problem going back to at least 
2009. It's inherent to the block-device-level snapshot phenomenon.


q.v. http://www.miljan.org/main/2009/11/16/lvm-snapshots-and-xfs/ et al

In XFS you attack the snapshot with a command to regenerate the UUID as 
soon as you take the snapshot. I don't think there is a "regenerate all 
my UUIDs" command for BTRFS.


There are other places this can bone you, like old-format mdadm mirrors, 
where the metadata was only at the end of the partition so you could 
accidentally see two copies of your RAID1 file system if you hadn't 
built/started the array.


There is no really good way to prevent this other than being really 
careful or not doing that at all.


Sorry. Cost of doing business. Cheers...
Rob.


Re: BTRFS messes up snapshot LV with origin

2014-11-18 Thread Chris Murphy

On Nov 18, 2014, at 1:17 PM, Phillip Susi ps...@ubuntu.com wrote:

 
 On 11/18/2014 2:17 PM, Chris Murphy wrote:
 What if you have a Btrfs raid1 volume using two LV’s and then 
 snapshot both LV’s?
 
 That's even more silly than a single lvm snapshot under btrfs.  Just
 don't do it.

Why is it silly? Btrfs on a thin volume has practical use case aside from just 
being thinly provisioned, its snapshots are block device based, not merely that 
of an fs tree.

Looks like lvm.conf does have a way to affect LV autoactivation, and there may 
be another way to achieve this also. Right after the snapshot(s) they’d need to 
have their autoactivation disabled to avoid UUID confusion.
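
One possible shape for that "other way", sketched in Python as a helper
that only builds command lines: set the per-LV activation-skip flag and
deactivate the snapshot right away. Like the idea itself, this is
untested here; the helper name and the VG/LV names are hypothetical, and
whether it fully hides the snapshot from btrfs device scanning is an
assumption.

```python
import shlex


def skip_autoactivation_cmds(vg, snapshot_lvs):
    """Build lvchange argv lists for each snapshot LV (nothing is executed)."""
    cmds = []
    for lv in snapshot_lvs:
        target = f"{vg}/{lv}"
        # Flag the LV so that normal activation skips it.
        cmds.append(["lvchange", "--setactivationskip", "y", target])
        # Deactivate it now so its device node is no longer visible.
        cmds.append(["lvchange", "--activate", "n", target])
    return cmds


cmds = skip_autoactivation_cmds("vg0", ["btr1-snap", "btr2-snap"])
for argv in cmds:
    print(shlex.join(argv))
```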


Chris Murphy


Re: BTRFS messes up snapshot LV with origin

2014-11-18 Thread Duncan
Robert White posted on Tue, 18 Nov 2014 17:29:12 -0800 as excerpted:

 On 11/18/2014 07:42 AM, Phillip Susi wrote:
 
 On 11/18/2014 1:16 AM, Chris Murphy wrote:
 (stuff about UUIDs and LVM snapshots).
   (suggestion to use LVM paths instead).
 
 This is also an XFS+LVM+LVM_Snapshot problem going back to at least
 2009. It's inherent to the block-device-level snapshot phenomenon.
 
 q.v. http://www.miljan.org/main/2009/11/16/lvm-snapshots-and-xfs/ et al
 
 In XFS you attack the snapshot with a command to regenerate the UUID as
 soon as you take the snapshot. I don't think there is a "regenerate all
 my UUIDs" command for BTRFS.

Which was part of my point in my reply.  Btrfs embeds the UUID in the 
metadata deeply enough that it's no simple task to simply change it to 
something else and be done.  It's quite a complicated operation for any 
(future, none current) tool that attempts it, with the most likely 
candidate being an option to btrfs balance or the like, but even then, 
we're looking at a timescale of hours for spinning rust.

So while it's possible in theory, in practice such a regenerate-all UUIDs 
command for btrfs isn't available yet, and given the time involved in 
rewriting all those metadata UUIDs to something else, during which the 
filesystem's in a critically unstable state, and the limited use-case 
with other alternatives, such a tool isn't all /that/ practical in any 
case.

Making an entirely new btrfs and doing a btrfs send/receive for the 
duplicate, or using btrfs snapshots, is a more practical way to go.  (Tho 
watch out for the implications of btrfs snapshots on nocow files!)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: BTRFS messes up snapshot LV with origin

2014-11-17 Thread MegaBrutal
2014-11-17 7:59 GMT+01:00 Brendan Hide bren...@swiftspirit.co.za:

 Grub is already a little smart here - it avoids snapshots. But in this case 
 it is relying on the UUID and only finding it in the snapshot. So possibly 
 this is a bug in grub affecting the bug reporter specifically - but perhaps 
 the bug is in btrfs where grub is relying on btrfs code.


Yesterday, when I reproduced the phenomenon on a VM, I found something
rather interesting: even /proc/mounts incorrectly reports that the
snapshot is mounted instead of the root FS. Note, there was no reboot.
Just create an LVM snapshot and then check /proc/mounts.

I couldn't reproduce the same with non-root file systems. It seems
this only appears when the device in question is mounted as root FS.
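
For anyone who wants to check this on their own system, here is a small
Python sketch that lists which device /proc/mounts reports for a given
mount point. The sample text is invented for illustration and is not
output from the affected machine; the function name is hypothetical.

```python
def devices_mounted_at(mounts_text, mountpoint="/"):
    """Return the device field of every /proc/mounts entry for mountpoint."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 0 is the device, field 1 the mount point.
        if len(fields) >= 2 and fields[1] == mountpoint:
            found.append(fields[0])
    return found


sample = """\
rootfs / rootfs rw 0 0
/dev/mapper/vg0-root--snap / btrfs rw,relatime,space_cache 0 0
/dev/mapper/vg0-home /home btrfs rw,relatime,space_cache 0 0
"""
root_devices = devices_mounted_at(sample)
print(root_devices)
```

On a live system the input would come from reading /proc/mounts, e.g.
devices_mounted_at(open("/proc/mounts").read()).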


 Yes, I'd rather use btrfs' snapshot mechanism - but this is often a choice 
 that is left to the user/admin/distro. I don't think saying LVM snapshots 
 are incompatible with btrfs is the right way to go either.


Before I did a release upgrade, just to be safe, I made both (LVM and
btrfs snapshot).



 That leaves two aspects of this issue which I view as two separate bugs:
 a) Btrfs cannot gracefully handle separate filesystems that have the same 
 UUID. At all.
 b) Grub appears to pick the wrong filesystem when presented with two 
 filesystems with the same UUID.

 I feel a) is a btrfs bug.
 I feel b) is a bug that is more about ecosystem design than grub being 
 silly.

 I imagine a couple of aspects that could help fix a):
 - Utilise a unique drive identifier in the btrfs metadata (surely this 
 exists already?). This way, any two filesystems will always have different 
 drive identifiers *except* in cases like a ddrescue'd copy or a block-level 
 snapshot. This will provide a sensible mechanism for defined behaviour, 
 preventing corruption - even if that defined behaviour is to simply give 
 out lots of PEBKAC errors and panic.
 - Utilise a drive list to ensure that two unrelated filesystems with the 
 same UUID cannot get mixed up. Yes, the user/admin would likely be the 
 culprit here (perhaps a VM rollout process that always gives out the same 
 UUID in all its filesystems). Again, does btrfs not already have something 
 like this built-in that we're simply not utilising fully?

 I'm not exactly sure of the correct way to fix b) except that I imagine it 
 would be trivial to fix once a) is fixed.


Note that everything that is written into the file system's metadata
gets duplicated with an LVM snapshot. So a unique drive identifier
wouldn't solve the problem, as it would also get replicated, and BTRFS
would still see two identical devices.

But devices on Linux have major and minor numbers that uniquely
identify devices while they are attached. The original and the
snapshot device have different major/minor numbers, and it would be
quite enough to differentiate the devices while they are being
opened/mounted.
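
To illustrate the point about device numbers, a short Python sketch
using only the standard library: it reads the major/minor pair of a
block device node, which differs between an origin LV and its snapshot
even when their on-disk contents are identical. The function names are
my own, and how btrfs would actually use these numbers is left open.

```python
import os
import stat


def block_device_numbers(path):
    """Return (major, minor) for a block device node, or None otherwise."""
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        return None
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def same_device(path_a, path_b):
    """True only if both paths are block devices with equal major/minor."""
    a = block_device_numbers(path_a)
    b = block_device_numbers(path_b)
    return a is not None and a == b


# A directory is not a block device, so this prints None.
print(block_device_numbers("."))
```

Called on two LV paths (for example an origin and its snapshot under
/dev/mapper), same_device would be expected to return False.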

By the way, I actually did an entire release upgrade with the
snapshot present and being reported incorrectly. Had BTRFS really been
writing to the snapshot device, this would have caused enough corruption
in the file system that I would surely have noticed it. But I didn't
perceive any data corruption, so BTRFS didn't actually write to the
snapshot device. It seems the device is only mixed up in /proc/mounts,
so the problem is probably not as severe as we think, and wouldn't
require fundamental changes to BTRFS to fix it.


Re: BTRFS messes up snapshot LV with origin

2014-11-17 Thread Brendan Hide

On 2014/11/17 09:35, Daniel Dressler top-posted:

If a UUID is not unique enough how will adding a second UUID or
unique drive identifier help?
A UUID is *supposed* to be unique by design. Isolated, the design is 
adequate.


But the bigger picture clearly shows the design is naive. And broken.

A second per-disk id (note I said unique - but I never said universal 
as in UUID) would allow for better-defined behaviour where, presently, 
we're simply saying current behaviour is undefined and you're likely to 
get corruption.


On the other hand, I asked already if we have IDs of some sort (how else 
do we know which disk a chunk is stored on?), thus I don't think we need 
to add anything to the format.


A simple scenario similar to the one the OP introduced:

Disk sda - says it is UUID Z with diskid 0
Disk sdb - says it is UUID Z with diskid 0

If we're ignoring the fact that there are two disks with the same UUID 
and diskid and it causes corruption, then the kernel is doing something 
stupid but fixable. We have some choices:
- give a clear warning and ignore one of the disks (could just pick the 
first one - or be a little smarter and pick one based on some heuristic 
- for example extent generation number)

- give a clear error and panic

Normal multi-disk scenario:
Disk sda - UUID Z with diskid 1
Disk sdb - UUID Z with diskid 2

These two disks are in the same filesystem and are supposed to work 
together - no issues.


My second suggestion covers another scenario as well:

Disk sda - UUID Z with diskid 1; root block indicates that only diskid 
1 is recorded as being part of the filesystem
Disk sdb - UUID Z with diskid 3; root block indicates that only diskid 
3 is recorded as being part of the filesystem


Again, based on the existing featureset, it seems reasonable that this 
information should already be recorded in the fs metadata. If the 
behaviour is undefined and causing corruption, again the kernel is 
currently doing something stupid but fixable. Again, we have similar 
choices:

- give a clear warning and ignore bad disk(s)
- give a clear error and panic
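
A sketch of the device-list check, again in Python and under a loud assumption: that each device's metadata records the list of diskids belonging to its filesystem. The known_devids field is hypothetical and stands in for whatever btrfs actually records.

```python
from typing import FrozenSet, NamedTuple


class Member(NamedTuple):
    path: str
    fs_uuid: str
    devid: int
    known_devids: FrozenSet[int]  # hypothetical: member list recorded on this device


def consistent(devices):
    """True only if the devices share one UUID, agree on the member list,
    and together supply exactly the members that list names."""
    if not devices:
        return False
    if len({d.fs_uuid for d in devices}) != 1:
        return False
    lists = {d.known_devids for d in devices}
    if len(lists) != 1:
        return False  # devices disagree about who belongs to the filesystem
    present = [d.devid for d in devices]
    return len(present) == len(set(present)) and set(present) == next(iter(lists))
```

In the scenario above, sda (diskid 1, list {1}) and sdb (diskid 3, list {3}) fail the check when offered together, while each passes on its own as a single-device filesystem.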



Re: BTRFS messes up snapshot LV with origin

2014-11-17 Thread Goffredo Baroncelli
On 2014-11-17 07:59, Brendan Hide wrote:
 
 That leaves two aspects of this issue which I view as two separate bugs:
 a) Btrfs cannot gracefully handle separate filesystems that have the same 
 UUID. At all.
 b) Grub appears to pick the wrong filesystem when presented with two 
 filesystems with the same UUID.
 
 I feel a) is a btrfs bug.
 I feel b) is a bug that is more about ecosystem design than grub being 
 silly.

Regarding a)
IIRC, btrfs collects the filesystem information by UUID; if two 
filesystems have the same UUID (as in the LVM-snapshot case), the
last filesystem discovered overwrites the first one.
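
A toy model of that behaviour in Python. It mirrors only the description in this message (a table keyed by UUID), not the actual btrfs code, and the device paths are made up.

```python
def scan_last_wins(devices):
    """Registry keyed by UUID only: a later device silently replaces an earlier one."""
    registry = {}
    for path, fs_uuid in devices:
        registry[fs_uuid] = path
    return registry


def scan_strict(devices):
    """Same scan, but the first device is kept and every clash is recorded."""
    registry = {}
    conflicts = {}
    for path, fs_uuid in devices:
        if fs_uuid in registry:
            conflicts.setdefault(fs_uuid, [registry[fs_uuid]]).append(path)
        else:
            registry[fs_uuid] = path
    return registry, conflicts
```

Scanning ("/dev/vg/root", "Z") and then ("/dev/vg/root-snap", "Z") leaves the snapshot registered under Z in the first variant, while the second keeps the origin and reports both paths as a conflict.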

Filesystem discovery is done in user space, so it should be simple
to skip a filesystem on an LVM snapshot.

Regarding b)
I am a bit confused: if I understood correctly, the root filesystem was
picked from an LVM snapshot, so grub-probe *correctly* reported that
the root device is the snapshot.
The problem was that, during boot, filesystem discovery first
scanned the *real* device, then the LVM snapshot; the latter
overwrote the former, so the system booted from the LVM snapshot.

My conclusion is that we should improve the btrfs scan so that:
- in the udev rules, a partition that is an LVM snapshot is, by default, 
not scanned by btrfs dev scan
- btrfs dev scan skips LVM snapshots during partition discovery.
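
A user-space sketch of the second point, in Python. It assumes the snapshot list comes from something like "lvs --noheadings --separator '|' -o lv_path,origin", where a non-empty origin column marks a snapshot; the sample output shape and the device paths are assumptions, and matching device names between LVM and the scanner is glossed over.

```python
def snapshot_paths(lvs_output):
    """Collect LV paths whose origin column is non-empty (i.e. snapshots)."""
    snaps = set()
    for line in lvs_output.splitlines():
        line = line.strip()
        if not line:
            continue
        lv_path, _, origin = line.partition("|")
        if origin.strip():
            snaps.add(lv_path.strip())
    return snaps


def devices_to_scan(candidates, lvs_output):
    """Drop LVM snapshots from the list of devices handed to the scan."""
    skip = snapshot_paths(lvs_output)
    return [dev for dev in candidates if dev not in skip]
```

The parsing is kept separate from running lvs so that the filter can be checked against captured output before it is wired into a udev rule or the scan itself.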

BR
G.Baroncelli



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Fwd: BTRFS messes up snapshot LV with origin

2014-11-17 Thread MegaBrutal
2014-11-17 20:04 GMT+01:00 Goffredo Baroncelli kreij...@inwind.it:

 Regarding b)
 I am bit confused: if I understood correctly, the root filesystem was
 picked from a LVM-snapshot, so grub-probe *correctly* reported that
 the root device is the snapshot.


This is not what happens. The system doesn't even get a reboot when
the mix-up happens.

You boot from the original device, create an LVM-snapshot*, and mount
starts to report the snapshot as the root device, while in fact it
isn't.

I know my initial descriptions of the bug were misleading, as I myself
didn't know what the heck was going on.

From this point, please take these comments as reference:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429/comments/2
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429/comments/4


* I know I shouldn't make an LVM-snapshot of a mounted file system,
but this is not the point.


P.S.: E-mail sent twice, as lists didn't accept it in HTML. Plus I'm
not on the GRUB list, and can't post there.


Re: Fwd: BTRFS messes up snapshot LV with origin

2014-11-17 Thread Goffredo Baroncelli
On 2014-11-17 20:45, MegaBrutal wrote:
 * I know I shouldn't make an LVM-snapshot of a mounted file system,
 but this is not the point.

This should be supported for filesystems that support freezing.

See 
http://stackoverflow.com/questions/1940093/lvm-snapshot-of-mounted-filesystem


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: BTRFS messes up snapshot LV with origin

2014-11-17 Thread Chris Murphy

On Nov 17, 2014, at 12:45 PM, MegaBrutal megabru...@gmail.com wrote:

 2014-11-17 20:04 GMT+01:00 Goffredo Baroncelli kreij...@inwind.it:
 
 Regarding b)
 I am bit confused: if I understood correctly, the root filesystem was
 picked from a LVM-snapshot, so grub-probe *correctly* reported that
 the root device is the snapshot.
 
 
 This is not what happens. The system doesn't even get a reboot when
 the mix-up happens.
 
 You boot from the original device, create an LVM-snapshot*, and mount
 starts to report the snapshot as the root device, while in fact it
 isn’t.

If fstab specifies rootfs as UUID, and there are two volumes with the same 
UUID, it’s now ambiguous which one is the intended rootfs at boot time. It’s no 
different than the days of /dev/sdXY, where X would change designations between 
boots: that was ambiguous too, which is why we went to UUIDs. 

So we kinda need a way to distinguish derivative volumes. Maybe XFS and ext4 
could easily change the volume UUID, but my vague recollection is this is 
difficult on Btrfs? So that led me to the idea of a way to create an on-the-fly 
(but consistent) “virtual volume UUID” maybe based on a hash of both the LVM LV 
and fs volume UUID.
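
One way such a hash could look, sketched in Python with the standard uuid module. Using a name-based (version 5) UUID with the fs UUID as namespace is just one possible choice, and the function name and example identifiers are hypothetical.

```python
import uuid


def virtual_volume_uuid(fs_uuid: str, lv_uuid: str) -> uuid.UUID:
    """Derive a stable per-LV identifier from the fs UUID and the LV's own UUID.

    The fs UUID acts as the namespace and the LV UUID string as the name, so
    the same pair always yields the same result without storing anything.
    """
    return uuid.uuid5(uuid.UUID(fs_uuid), lv_uuid)
```

An origin LV and its snapshot carry the same fs UUID but different LV UUIDs, so they get different virtual UUIDs, and each stays the same across reboots.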


Chris Murphy


Re: BTRFS messes up snapshot LV with origin

2014-11-17 Thread Chris Murphy

On Nov 16, 2014, at 11:59 PM, Brendan Hide bren...@swiftspirit.co.za wrote:

 cc'd bug-g...@gnu.org for FYI
 
 On 2014/11/17 03:42, Duncan wrote:
 MegaBrutal posted on Sun, 16 Nov 2014 22:35:26 +0100 as excerpted:
 
 Hello guys,
 
 I think you'll like this...
 https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429
 UUID is an initialism for Universally Unique IDentifier.[1]
 
 If the UUID isn't unique, by definition, then, it can't be a UUID, and
 that's a bug in whatever is making the non-unique would-be UUID that
 isn't unique and thus cannot be a universally unique ID.  In this case
 that would appear to be LVM.
 Perhaps the right question to ask is "Where should this bug be fixed?".
 
 TL;DR: This needs more thought and input from btrfs devs. To LVM, the bug is 
 likely seen as being out of scope. The correct fix probably lies in the 
 ecosystem design, which requires co-operation from btrfs.

I think the libblkid folks should be brought into this discussion to see what 
their take on this is.

LVM conventional snapshots causing this problem is rare / self-limiting as 
they’re short lived. LVM thinp snapshots mean there can be dozens, and they can 
sanely endure for the life of the thin pool.

Effectively we have derivative volumes. At snapshot time, should: a.) the fs 
volume UUID be changed; b.) each fs add an additional/secondary volume UUID; 
c.) each fs add a derivative/version indicator, i.e. 0 at mkfs time and maybe 
an epoch timestamp at snapshot time; or d.) we not use the fs UUID to identify 
volume uniqueness at all, and instead use a virtual volume UUID which is 
externally determined based on whether the fs is on an LV snapshot?



 Making a snapshot in LVM is a fundamental thing - and I feel LVM, in making 
 its snapshot, is doing its job exactly as expected.
 
 Additionally, there are other ways to get to a similar state without LVM: 
 ddrescue backup, SAN snapshot, old missing disk re-introduced, etc.

Sure, and those are likewise self-limiting problems. LVM thinp snapshots 
actually do make this confusion of multiple instances of the same volume UUID 
much, much more likely.

 
 That leaves two places where this can be fixed: grub and btrfs

I think the GRUB os-prober and grub-mkconfig paradigm needs to come to an end. 
The grub.cfg is not supposed to be externally modified; the design is that 
os-prober + grub-mkconfig obliterate it and generate a whole new one from 
scratch any time the system boot state changes, i.e. any time a new kernel is 
added.

GRUB isn’t good at OS discovery now, and I think OS discovery by GRUB should 
just be abandoned. GRUB can have its grub.cfg generated to do whatever complex 
things are needed, but the individual boot menu entries should exist as drop-in 
scripts managed by whatever is changing the OS boot state. This is the 
fundamental part of the two bootloaderspecs:
http://www.freedesktop.org/wiki/Specifications/BootLoaderSpec/
http://www.freedesktop.org/wiki/MatthewGarrett/BootLoaderSpec/

And it’s a fundamental part of OSTree which supports multiple bootable trees on 
any filesystem, and currently uses a variation on bootloaderspec drop-in 
scripts to inform GRUB how to boot such a system:
https://wiki.gnome.org/action/show/Projects/OSTree?action=show&redirect=OSTree




 
 Grub is already a little smart here - it avoids snapshots. But in this case 
 it is relying on the UUID and only finding it in the snapshot. So possibly 
 this is a bug in grub affecting the bug reporter specifically - but perhaps 
 the bug is in btrfs where grub is relying on btrfs code.
 
 Yes, I'd rather use btrfs' snapshot mechanism - but this is often a choice 
 that is left to the user/admin/distro. I don't think saying LVM snapshots 
 are incompatible with btrfs is the right way to go either.
 
 That leaves two aspects of this issue which I view as two separate bugs:
 a) Btrfs cannot gracefully handle separate filesystems that have the same 
 UUID. At all.
 b) Grub appears to pick the wrong filesystem when presented with two 
 filesystems with the same UUID.
 
 I feel a) is a btrfs bug.
 I feel b) is a bug that is more about ecosystem design than grub being 
 silly.

I think we’re well past the expiration date on grub.cfg. A line should be drawn 
in the sand to deprecate routine use of os-prober + grub-mkconfig and move to 
drop-in scripts written by whatever the distro presumes will be responsible for 
managing which “tree” will be booted or offered as a boot option. All GRUB 
needs to learn is how to use that drop-in script file format.

Ergo, just because I’ve snapshotted my root does not mean grub-mkconfig should 
be creating boot entries for it. Whatever userspace tool I’m using to do those 
snapshots (ostree, snapper, whatever the GNOME folks might come up with) should 
be the thing that creates the boot entry script. And since this script should 
be as simple as 2-4 lines, it could even be written by hand by a user, unlike 
the current grub.cfg file format.
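
To give a feel for how small such a drop-in could be, here is a Python sketch that renders one entry. The keys (title, version, linux, initrd, options) follow my reading of the freedesktop BootLoaderSpec linked above; every value in the usage note is hypothetical.

```python
def render_boot_entry(title, version, linux, initrd, options):
    """Render a single drop-in boot entry as 'key value' lines."""
    fields = [
        ("title", title),
        ("version", version),
        ("linux", linux),      # kernel image path
        ("initrd", initrd),    # initramfs path
        ("options", options),  # kernel command line
    ]
    return "".join(f"{key} {value}\n" for key, value in fields)
```

A snapshot tool could call this with, say, a title of "Example OS (snapshot)" and a command line selecting the snapshot, then write the result as its own file in the entries directory the spec describes, without touching any other entry.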

Further I’d like to get more traction from the syslinux/extlinux 

BTRFS messes up snapshot LV with origin

2014-11-16 Thread MegaBrutal
Hello guys,

I think you'll like this...
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429


MegaBrutal


Re: BTRFS messes up snapshot LV with origin

2014-11-16 Thread Duncan
MegaBrutal posted on Sun, 16 Nov 2014 22:35:26 +0100 as excerpted:

 Hello guys,
 
 I think you'll like this...
 https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429

UUID is an initialism for Universally Unique IDentifier.[1]

If the UUID isn't unique, by definition, then, it can't be a UUID, and 
that's a bug in whatever is making the non-unique would-be UUID that 
isn't unique and thus cannot be a universally unique ID.  In this case 
that would appear to be LVM.

Meanwhile, if two or more devices are btrfs and have the same UUID, btrfs 
considers them part of the same filesystem, since btrfs /can/ be a multi-
device filesystem.  That's not a bug; that's the way btrfs IDs multiple 
devices as part of the same filesystem, because a UUID, by definition, 
can be relied upon to be unique, or it's no longer a UUID.  Additionally, 
the UUID is actually written into the metadata of the filesystem in such 
a way that it's /not/ a simple task to change the UUID.  Put simply, it's 
ingrained into the filesystem so deeply it cannot be changed, at least 
not without rewriting pretty much all the metadata.  (FWIW, a btrfs 
balance does just that, rewrite the data, metadata, or both.  However, I 
don't believe a balance plugin to change the UUID is yet available.  
You're simply not supposed to change the UUID once the filesystem is 
created.)

So if LVM snapshots duplicate a UUID, as I believe they do, then there's 
your bug, because they're breaking the definition of Universally *UNIQUE* 
ID.  That being the case, using them with btrfs is essentially 
broken, because btrfs depends on UUIDs to be what they say on the label, 
actually unique, and UUIDs are deeply enough ingrained into the very 
fabric of btrfs that it's simply not possible to change that on the btrfs 
side.

Meanwhile, since btrfs *DOES* depend on UUIDs being unique, if there are 
multiple btrfs filesystems that accidentally have the same UUID, btrfs will 
not distinguish between them and will very possibly be writing into both of 
them.  If I found myself in that situation, I'd very carefully copy all 
the data I wanted to save off the filesystem and do a new mkfs as soon as 
possible, because I would not consider the filesystem as it was at all 
stable, and I'd count myself very lucky if I got everything off the 
filesystem without damage.  In actuality, since the second device was a 
snapshot of the first, if you catch it reasonably quickly you likely 
won't have too many issues.  However, a btrfs in that condition is in an 
undefined state, and the longer it exists in that state, the more likely 
things are to go wrong, possibly VERY VERY wrong.  So if you don't 
already have backups for anything you consider valuable on that thing, 
get it off there as soon as you possibly can, and consider yourself very 
lucky if nothing's damaged as a result.

---
[1] http://en.wiktionary.org/wiki/UUID

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: BTRFS messes up snapshot LV with origin

2014-11-16 Thread Brendan Hide

cc'd bug-g...@gnu.org for FYI

On 2014/11/17 03:42, Duncan wrote:

MegaBrutal posted on Sun, 16 Nov 2014 22:35:26 +0100 as excerpted:


Hello guys,

I think you'll like this...
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429

UUID is an initialism for Universally Unique IDentifier.[1]

If the UUID isn't unique, by definition, then, it can't be a UUID, and
that's a bug in whatever is making the non-unique would-be UUID that
isn't unique and thus cannot be a universally unique ID.  In this case
that would appear to be LVM.

Perhaps the right question to ask is "Where should this bug be fixed?".

TL;DR: This needs more thought and input from btrfs devs. To LVM, the 
bug is likely seen as being out of scope. The correct fix probably 
lies in the ecosystem design, which requires co-operation from btrfs.


Making a snapshot in LVM is a fundamental thing - and I feel LVM, in 
making its snapshot, is doing its job exactly as expected.


Additionally, there are other ways to get to a similar state without 
LVM: ddrescue backup, SAN snapshot, old missing disk re-introduced, etc.


That leaves two places where this can be fixed: grub and btrfs

Grub is already a little smart here - it avoids snapshots. But in this 
case it is relying on the UUID and only finding it in the snapshot. So 
possibly this is a bug in grub affecting the bug reporter specifically - 
but perhaps the bug is in btrfs where grub is relying on btrfs code.


Yes, I'd rather use btrfs' snapshot mechanism - but this is often a 
choice that is left to the user/admin/distro. I don't think saying LVM 
snapshots are incompatible with btrfs is the right way to go either.


That leaves two aspects of this issue which I view as two separate bugs:
a) Btrfs cannot gracefully handle separate filesystems that have the 
same UUID. At all.
b) Grub appears to pick the wrong filesystem when presented with two 
filesystems with the same UUID.


I feel a) is a btrfs bug.
I feel b) is a bug that is more about ecosystem design than grub being 
silly.


I imagine a couple of aspects that could help fix a):
- Utilise a unique drive identifier in the btrfs metadata (surely this 
exists already?). This way, any two filesystems will always have 
different drive identifiers *except* in cases like a ddrescue'd copy or 
a block-level snapshot. This will provide a sensible mechanism for 
defined behaviour, preventing corruption - even if that defined 
behaviour is to simply give out lots of PEBKAC errors and panic.
- Utilise a drive list to ensure that two unrelated filesystems with 
the same UUID cannot get mixed up. Yes, the user/admin would likely be 
the culprit here (perhaps a VM rollout process that always gives out the 
same UUID in all its filesystems). Again, does btrfs not already have 
something like this built-in that we're simply not utilising fully?


I'm not exactly sure of the correct way to fix b) except that I 
imagine it would be trivial to fix once a) is fixed.


--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



Re: BTRFS messes up snapshot LV with origin

2014-11-16 Thread Daniel Dressler
If a UUID is not unique enough how will adding a second UUID or
unique drive identifier help?

A UUID only serves any purpose when it is unique. Thus duplicate UUIDs
are themselves a failure state.

The solution should be to make it harder to get into this failure
state, not to make all programs resilient against running under this
failure state. It isn't a btrfs bug that it requires Universally Unique
IDs to be universally unique.

Daniel
