Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sun, 5 Jan 2014 01:25:19 PM Chris Murphy wrote:

> Does the Ubuntu 12.04 LTS installer let you create sysroot on a Btrfs raid1 volume?

I doubt it, given the alpha for 14.04 doesn't seem to have the concept yet. :-)

https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/1266200

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic. For more info see: http://en.wikipedia.org/wiki/OpenPGP
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 6, 2014, at 3:20 AM, Chris Samuel ch...@csamuel.org wrote:

> On Sun, 5 Jan 2014 01:25:19 PM Chris Murphy wrote:
>> Does the Ubuntu 12.04 LTS installer let you create sysroot on a Btrfs raid1 volume?
> I doubt it, given the alpha for 14.04 doesn't seem to have the concept yet. :-)
> https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/1266200

Color me surprised. Fedora 20 lets you create Btrfs raid1/raid0 for rootfs, but due to a long-standing grubby bug [1] /boot can't be on Btrfs, so it's ext4 only. That means only one of your disks gets grub.cfg, and if that disk dies, you won't boot without user intervention that also requires esoteric GRUB knowledge. /boot needs to be on Btrfs or it gets messy. The messy alternative, where each drive has its own ext4 boot partition, means kernel updates have to be written to each drive, and each drive's separate /boot/grub/grub.cfg needs to be updated. That's kinda ick x2. Yes, they could be made md raid1 to solve part of this.

It gets slightly more amusing on UEFI, where the installer needs to be smart enough to create (or reuse) the EFI System partition on each device [2] for the bootloader but NOT for the grub.cfg [3], otherwise we have separate grub.cfgs on each ESP to update whenever there are kernel updates. And if a disk fails and is replaced, while grub-install works on BIOS, it doesn't work on UEFI, because it will only install a bootloader if the ESP is mounted in the right location.

So until every duck is in a row, I think we can hardly point fingers when it comes to making a degraded system bootable without any human intervention.
[1] grubby fatal error updating grub.cfg when /boot is btrfs https://bugzilla.redhat.com/show_bug.cgi?id=864198
[2] RFE: always create required bootloader partitions in custom partitioning https://bugzilla.redhat.com/show_bug.cgi?id=1022316
[3] On EFI, grub.cfg should be in /boot/grub not /boot/efi/EFI/fedora https://bugzilla.redhat.com/show_bug.cgi?id=1048999

Chris Murphy
-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
FWIW, Ubuntu (and I presume Debian) will work just fine with a single / on btrfs, single or multi disk. I currently have two machines booting to a btrfs-raid10 / with no separate /boot, one booting to a btrfs single-disk / with no /boot, and one booting to a btrfs-raid10 / with an ext4-on-mdraid1 /boot.

On 01/06/2014 01:30 PM, Chris Murphy wrote:
> Color me surprised. Fedora 20 lets you create Btrfs raid1/raid0 for rootfs, but due to a long-standing grubby bug [1] /boot can't be on Btrfs, so it's ext4 only.
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 6, 2014, at 12:25 PM, Jim Salter j...@jrs-s.net wrote:

> FWIW, Ubuntu (and I presume Debian) will work just fine with a single / on btrfs, single or multi disk. I currently have two machines booting to a btrfs-raid10 / with no separate /boot, one booting to a btrfs single-disk / with no /boot, and one booting to a btrfs-raid10 / with an ext4-on-mdraid1 /boot.

Did you create the multiple device layouts outside of the installer first? What I'm seeing in the Ubuntu 12.04 installer is a choice of which disk to put the bootloader on. If that's reliable UI, then it won't put it on both disks, which means a single point of failure, in which case -o degraded not being automatic with Btrfs is essentially pointless if we don't have a bootloader. I also see no way in the UI to even create Btrfs raid of any sort.

Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
No, the installer is completely unaware. What I was getting at is that rebalancing (and installing the bootloader) is dead easy, so it doesn't bug me personally much. It'd be nice to eventually get something in the installer to make it obvious to the oblivious that it can be done and how, but in the meantime, it's frankly easier to set up btrfs-raid WITHOUT installer support than it is to set up mdraid WITH installer support.

Install process for a 4-drive btrfs-raid10 root on Ubuntu (desktop or server):

1. Do a single-disk install on the first disk, defaults all the way through except picking btrfs instead of ext4 for /.
2. sfdisk -d /dev/sda | sfdisk /dev/sdb ; sfdisk -d /dev/sda | sfdisk /dev/sdc ; sfdisk -d /dev/sda | sfdisk /dev/sdd
3. btrfs dev add /dev/sdb1 /dev/sdc1 /dev/sdd1 /
4. btrfs balance start -dconvert=raid10 -mconvert=raid10 /
5. grub-install /dev/sdb ; grub-install /dev/sdc ; grub-install /dev/sdd

Done. The rebalancing takes less than a minute, and the system's responsive while it happens. Once you've done the grub-install on the additional drives, you're good to go - Ubuntu already uses the UUID instead of a device ID for GRUB and fstab, so the btrfs mount will scan all drives and find any that are there. The only hitch is the need to mount degraded that I Chicken Littled about earlier so loudly. =)

On 01/06/2014 05:05 PM, Chris Murphy wrote:
> On Jan 6, 2014, at 12:25 PM, Jim Salter j...@jrs-s.net wrote:
>> FWIW, Ubuntu (and I presume Debian) will work just fine with a single / on btrfs, single or multi disk. I currently have two machines booting to a btrfs-raid10 / with no separate /boot, one booting to a btrfs single-disk / with no /boot, and one booting to a btrfs-raid10 / with an ext4-on-mdraid1 /boot.
> Did you create the multiple device layouts outside of the installer first? What I'm seeing in the Ubuntu 12.04 installer is a choice of which disk to put the bootloader on.
> If that's reliable UI, then it won't put it on both disks, which means a single point of failure, in which case -o degraded not being automatic with Btrfs is essentially pointless if we don't have a bootloader. I also see no way in the UI to even create Btrfs raid of any sort.
>
> Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On 07/01/14 06:25, Jim Salter wrote:

> FWIW, Ubuntu (and I presume Debian) will work just fine with a single / on btrfs, single or multi disk. I currently have two machines booting to a btrfs-raid10 / with no separate /boot, one booting to a btrfs single disk / with no /boot, and one booting to a btrfs-raid10 / with an ext4-on-mdraid1 /boot.

Actually I've run into a problem with GRUB there: a fresh install cannot boot from a btrfs /boot if your first partition is not 1 MiB aligned (starting at sector 2048), because there is then not enough space for GRUB to store its btrfs code. :-(

https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/1266195

I don't want to move my first partition as it's a Dell special (type 'de') and I'm not sure what the impact would be, so I just created an ext4 /boot and the install then worked.

Regarding RAID, yes, I realise it's easy to do after the fact; on the same test system I added an external USB2 drive to the root filesystem and rebalanced as RAID-1, which worked nicely. I'm planning on adding dual SSDs as OS disks to my desktop, and this experiment was to learn whether the Kubuntu installer handled it yet and, if not, to do a quick practice of setting it up by hand. :-)

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
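[Editor's note: the alignment problem above is easy to check before installing. The sketch below is illustrative only — the helper name is made up, and it reads the start sector from a file so it can be exercised on sample data; on a real system you'd read /sys/block/sda/sda1/start or look at `sfdisk -d` output. It assumes 512-byte sectors: GRUB embeds its core image in the gap before the first partition, and with the btrfs module included the old DOS default of sector 63 (about 31 KiB of gap) is too small, while a 1 MiB-aligned start at sector 2048 is fine.]

```shell
# Hedged sketch: does the gap before the first partition have room for
# GRUB's core image?  The start sector is read from a file so this can
# be tried on sample data; on a real disk use /sys/block/sda/sda1/start.
# Assumes 512-byte sectors; 2048 (1 MiB alignment) is the modern default
# that avoids the bug discussed above.
check_embed_gap() {
    start=$(cat "$1")
    # sectors 1 .. start-1 are available for embedding (sector 0 is the MBR)
    gap_kib=$(( (start - 1) * 512 / 1024 ))
    if [ "$start" -ge 2048 ]; then
        echo "ok: ${gap_kib} KiB before first partition"
    else
        echo "too small: only ${gap_kib} KiB before first partition"
    fi
}

# Sample: old DOS-style layout with the first partition at sector 63
echo 63 > /tmp/start.sample
check_embed_gap /tmp/start.sample
```

Run against the sector-63 sample, this reports only 31 KiB of embedding gap, matching the failure mode in the Launchpad bug above.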
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sat, 4 Jan 2014 02:56:39 PM Chris Mason wrote:

> Seconded ;) We're really focused on nailing down these problems instead of hiding behind the experimental flag. I know we won't be perfect overnight, but it's time to focus on production workloads.

Perhaps an option here is to remove the need to specify the degraded flag: if the filesystem notices that it is mounting a RAID array and would otherwise fail, it sets the degraded flag itself and carries on. That way the fact it was degraded would be visible in /proc/mounts and could be detected with health check scripts like NRPE for Icinga/Nagios.

Looking at the code, this would be in read_one_dev() in fs/btrfs/volumes.c ?

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
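[Editor's note: the /proc/mounts detection idea above can be sketched in a few lines of shell. This is a hypothetical NRPE-style probe, not an existing plugin — the function name is made up, and the mounts file is a parameter so it can be exercised on sample data; on a real system you'd pass /proc/mounts.]

```shell
# Hypothetical NRPE-style health check (names are mine, not from the
# thread): warn when any btrfs mount carries the "degraded" flag.
# The mounts file is a parameter so the function can be exercised on
# sample data; on a real system pass /proc/mounts.
check_degraded() {
    mounts_file="$1"
    awk '$3 == "btrfs" && $4 ~ /(^|,)degraded(,|$)/ { found = 1 }
         END { exit !found }' "$mounts_file"
}

# Sample standing in for /proc/mounts on a degraded system:
printf '/dev/sda1 / btrfs rw,degraded,subvol=/ 0 0\n' > /tmp/mounts.sample
if check_degraded /tmp/mounts.sample; then
    echo "WARNING: degraded btrfs mount detected"
else
    echo "OK: no degraded btrfs mounts"
fi
```

Against the sample above this prints the WARNING line; pointed at /proc/mounts it would make a reasonable cron or Icinga/Nagios probe, assuming the kernel did set degraded automatically as proposed.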
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Jim Salter posted on Sat, 04 Jan 2014 16:22:53 -0500 as excerpted:

> On 01/04/2014 01:10 AM, Duncan wrote:
>> The example given in the OP was of a 4-device raid10, already the minimum number to work undegraded, with one device dropped out, to below the minimum required number to mount undegraded, so of /course/ it wouldn't mount without that option.
> The issue was not realizing that a degraded fault-tolerant array would refuse to mount without being passed an -o degraded option. Yes, it's on the wiki - but it's on the wiki under *replacing* a device, not in the FAQ, not in the head of the multiple devices section, etc; and no coherent message is thrown either on the console or in the kernel log when you do attempt to mount a degraded array without the correct argument. IMO that's a bug. =)

I'd agree: a usability bug, one of many "it works, but it's not easy to work with" rough edges. FWIW, I'm seeing progress in that area now. The rush of functional bugs and fixes for them has finally slowed down to the point where there's beginning to be time to focus on the usability bugs and rough edges.

I believe I saw a post in October or November from Chris Mason, where he said yes, the maturing of btrfs has been predicted before, but it really does seem like the functional bugs are slowing down to the point where the usability bugs can finally be addressed, and 2014 really does look like the year that btrfs will finally start shaping up into a mature-looking and -acting filesystem, including in usability, etc.

And Chris mentioned the GSoC project that worked on one angle of this specific issue, too. Getting that code integrated and having btrfs finally be able to recognize a dropped and re-added device and automatically trigger a resync... that'd be a pretty sweet improvement to get.
=:^)

While they're working on that, they may well take a look at at least giving the admin more information on a degraded-needed mount failure too, tweaking the kernel log messages, etc, and possibly taking a second look at whether flatly refusing to mount is the best behavior then, or not.

Actually, I wonder... what about mounting in such a situation, but read-only, and refusing to go writable unless degraded is added too? That would preserve the "first, do no harm, don't make the problem worse" ideal, while mounting read-only unless degraded is added with the rw wouldn't be /quite/ as drastic as refusing to mount entirely.

I actually think that, plus some better logging saying "hey, we don't have enough devices to write with the requested raid level, so remount rw,degraded, and either add another device or reconfigure the raid mode to something suitable for the number of devices", would be a reasonable solution.

-- 
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Chris Samuel posted on Sun, 05 Jan 2014 20:20:26 +1100 as excerpted:

> On Sat, 4 Jan 2014 02:56:39 PM Chris Mason wrote:
>> Seconded ;) We're really focused on nailing down these problems instead of hiding behind the experimental flag. I know we won't be perfect overnight, but it's time to focus on production workloads.
> Perhaps an option here is to remove the need to specify the degraded flag: if the filesystem notices that it is mounting a RAID array and would otherwise fail, it sets the degraded flag itself and carries on. That way the fact it was degraded would be visible in /proc/mounts and could be detected with health check scripts like NRPE for Icinga/Nagios. Looking at the code, this would be in read_one_dev() in fs/btrfs/volumes.c ?

The idea I came up with elsewhere was to mount read-only, with a dmesg entry to the effect that the filesystem was configured for a raid level that the current number of devices can't support, so mount rw,degraded to accept that temporarily and to make changes, either by adding a new device to fill out the required number for the configured raid level, or by reducing the configured raid level to match reality.

The read-only mount would be better than not mounting at all, while preserving the "first, do no further harm" ideal, since mounted read-only, the existing situation should at least remain stable. It would also alert the admin to problems, with a reasonable log message saying how to fix them, while letting the admin at least access the filesystem in read-only mode, thereby giving him tools access to manage whatever maintenance tasks are necessary, should it be the rootfs. The admin could then take the action they deemed appropriate, whether that was getting the data backed up, or mounting degraded,rw in order to either add a device and bring it back to functional, or to rebalance to a lower data/metadata redundancy level due to lack of devices.

-- 
Duncan - List replies preferred. No HTML msgs.
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 4, 2014, at 2:16 PM, Jim Salter j...@jrs-s.net wrote:

> On 01/04/2014 02:18 PM, Chris Murphy wrote:
>> I'm not sure what else you're referring to? (re: work needed on the btrfs boot environment)
> Just the string of caveats regarding mounting at boot time - needing to monkey-patch 00_header to avoid the bogus sparse file error

I don't know what "bogus sparse file error" refers to. What version of GRUB? I'm seeing Ubuntu 12.04 precise-updates listing GRUB 1.99, which is rather old.

> (which, worse, tells you to press a key when pressing a key does nothing) followed by this, in my opinion completely unexpected, behavior when missing a disk in a fault-tolerant array, which also requires monkey-patching in fstab and now elsewhere in GRUB to avoid. and… I'm aware it's not intended for production yet.

On the one hand you say you're aware, yet on the other hand you say the missing-disk behavior is completely unexpected. Some parts of Btrfs, in certain contexts, are production ready. But the developmental state of Btrfs places a burden on the user to know more details about that state than he might otherwise be expected to know with more stable/mature file systems.

My opinion is that it's inappropriate for degraded mounts to be made automatic when there's no method of notifying user space of this state change. GNOME Shell, via udisks, will inform users of a degraded md array. Something equivalent to that is needed before Btrfs should enable a scenario where a user boots a computer in a degraded state without being informed, as if nothing were wrong at all. That's demonstrably far worse than a scary boot failure, during which one copy of the data is still likely safe, unlike permitting uninformed degraded rw operation.
> However, it's just on the cusp, with distributions not only including it in their installers but a couple teetering on the fence with declaring it their next default FS (Oracle Unbreakable, openSUSE, hell even Red Hat was flirting with the idea), so it seems to me some extra testing with an eye towards production isn't a bad thing.

Does the Ubuntu 12.04 LTS installer let you create sysroot on a Btrfs raid1 volume?

> That's why I'm here. Not to crap on anybody, but to get involved, hopefully helpfully.

I think you're better off using something more developmental; the feature necessarily needs to exist there first, before it can trickle down to an LTS release.

>> fs_passno is 1 which doesn't apply to Btrfs.
> Again, that's the distribution's default, so the argument should be with them, not me…

Yes, so you'd want to file a bug? That's how you get involved.

> With that said, I'd respectfully argue that fs_passno 1 is correct for any root file system; if the file system itself declines to run an fsck, that's up to the filesystem, but it's correct to specify fs_passno 1 if the filesystem is to be mounted as root in the first place. I'm open to hearing why that's a bad idea, if you have a specific reason?

It's a minor point, but it shows that fs_passno has become quaint, like grandma's iron cozy. It's not applicable for either XFS or Btrfs. It's arguably inapplicable for ext3/4, but its fsck program has an optimization to skip fully checking the file system if the journal replay succeeds. There is no unattended fsck for either XFS or Btrfs. On systemd systems, systemd reads fstab, and if fs_passno is non-zero it checks for the existence of /sbin/fsck.fs; if that doesn't exist, it doesn't run fsck for that entry. This topic was recently brought up and is in the archives.

Well actually LVM thinp does have fast snapshots without requiring preallocation, and uses COW.
> LVM's snapshots aren't very useful for me - there's a performance penalty while you have them in place, so they're best used as a transient use-then-immediately-delete feature, for instance for rsync'ing off a database binary. Until recently, there also wasn't a good way to roll back an LV to a snapshot, and even now, that can be pretty problematic.

This describes old LVM snapshots, not LVM thinp snapshots.

> Finally, there's no way to get a partial copy of an LV snapshot out of the snapshot and back into production, so if e.g. you have virtual machines of significant size, you could be looking at *hours* of file copy operations to restore an individual VM out of a snapshot (if you even have the drive space available for it), as compared to btrfs' cp --reflink=always operation, which allows you to do the same thing instantaneously.

LVM isn't a file system, so limitations compared to Btrfs are expected.

>> I'm not sure what you mean by self-correcting, but if the drive reports a read error md, lvm, and Btrfs raid1+ all will get missing data from mirror/parity reconstruction, and write corrected data back to the bad sector.
> You're assuming that the drive will actually *report* a read error, which is frequently not the case. This is discussed in
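[Editor's note: Chris Murphy's fs_passno point above is easiest to see in an fstab entry. The fragment below is a sketch, not a drop-in config from the thread; the UUID and subvolume name are placeholders.]

```
# <file system>                             <mount point>  <type>  <options>          <dump>  <pass>
# The last field (fs_passno) is 0: there is no unattended fsck for
# Btrfs, so requesting a boot-time fsck pass for a Btrfs root is a
# no-op at best. Distributions that default it to 1 are carrying
# over an ext-era convention.
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /              btrfs   defaults,subvol=@  0       0
```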
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sat, 4 Jan 2014 06:10:14 AM Duncan wrote:

> Btrfs remains under development and there are clear warnings about using it without backups one hasn't tested recovery from or are not otherwise prepared to actually use. It's stated in multiple locations on the wiki; it's stated on the kernel btrfs config option, and it's stated in mkfs.btrfs output when you create the filesystem.

Actually the scary warnings are gone from the Kconfig file for what will be the 3.13 kernel. Removed by this commit:

commit 4204617d142c0887e45fda2562cb5c58097b918e
Author: David Sterba dste...@suse.cz
Date:   Wed Nov 20 14:32:34 2013 +0100

    btrfs: update kconfig help text

    Reflect the current status. Portions of the text taken from the wiki pages.

    Signed-off-by: David Sterba dste...@suse.cz
    Signed-off-by: Chris Mason chris.ma...@fusionio.com

-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sat, 4 Jan 2014 12:57:02 AM Dave wrote:

> I find myself annoyed by the constant disclaimers I read on this list, about the experimental status of Btrfs, but it's apparent that this hasn't sunk in for everyone.

Btrfs will no longer be marked as experimental in the kernel as of 3.13. Unless someone submits a patch to fix it first. :-)

Can we also keep things polite here please.

thanks,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Chris Samuel posted on Sat, 04 Jan 2014 22:20:20 +1100 as excerpted:

> On Sat, 4 Jan 2014 06:10:14 AM Duncan wrote:
>> Btrfs remains under development and there are clear warnings about using it without backups one hasn't tested recovery from or are not otherwise prepared to actually use. It's stated in multiple locations on the wiki; it's stated on the kernel btrfs config option, and it's stated in mkfs.btrfs output when you create the filesystem.
> Actually the scary warnings are gone from the Kconfig file for what will be the 3.13 kernel. Removed by this commit: commit 4204617d142c0887e45fda2562cb5c58097b918e

FWIW, I'd characterize that as toned down somewhat, not /gone/. You don't see ext4 or other mature filesystems saying "The filesystem disk format is no longer unstable, and it's not expected to change unless ...", do you? "Not expected to change" etc. is definitely toned down from what it was, no argument there, but it still isn't exactly what one would expect in the description of a stable filesystem. If there's still some chance of the disk format changing, what does that say about the code /dealing/ with that disk format? That doesn't sound like something I'd be comfortable staking my reputation as a sysadmin on as judged fully reliable and ready for my mission-critical data, for sure!

Tho agreed, one certainly has to read between the lines a bit more for the kernel option now than they did. But the real kicker for me was when I redid several of my btrfs partitions to take advantage of newer features, 16 KiB nodes, etc, and saw the warning it's giving, yes, in btrfs-progs 3.12, after all the recent documentation changes, etc. Not everybody builds their own kernel, but it's kind of hard to get a btrfs filesystem without making one! (Yes, I know the installers make the filesystem for many people, and may well hide the output, but if so, and the distros don't provide a similar warning when people choose btrfs, that's entirely on the distros at that point.
Not much btrfs as upstream can do about that.)
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sat, 2014-01-04 at 06:10 +, Duncan wrote:

> Chris Murphy posted on Fri, 03 Jan 2014 16:22:44 -0700 as excerpted:
>> I would not make this option persistent by putting it permanently in the grub.cfg; although I don't know the consequence of always mounting with degraded even if not necessary, it could have some negative effects (?)
> Degraded only actually does anything if it's actually needed. On a normal array it'll be a no-op, so it should be entirely safe for /normal/ operation, but that doesn't mean I'd /recommend/ it for normal operation, since it bypasses checks that are there for a reason, thus silently bypassing information that an admin needs to know before he boots it anyway, in order to recover.
>
> However, I've some other comments to add:
>
> 1) As you, I'm uncomfortable with the whole idea of adding degraded permanently at this point.

I added mount -o degraded just because I wanted the admin to be notified of failures. Right now it's still the most reliable way to notify them, but I definitely agree we can do better.

Leaving it on all the time? I don't think this is a great long term solution, unless you are actively monitoring the system to make sure there are no failures. Also, as Neil Brown pointed out, it does put you at risk of transient device detection failures getting things out of sync.

> Test:
> a) Create a two-device btrfs raid1.
> b) Mount it and write some data to it.
> c) Unmount it, unplug one device, mount the remaining device degraded.
> d) Write some data to a test file on it, noting the path/filename and data.
> e) Unmount again, switch plugged devices so the formerly unplugged one is now the plugged one, and again mount degraded.
> f) Write some DIFFERENT data to the SAME path/file as in (d), so the two versions, each on its own device, have now incompatibly forked.
> g) Unmount, plug both devices in and mount, now undegraded.
> What I discovered back then, and to my knowledge the same behavior exists today, is that, entirely unexpectedly from and in contrast to my mdraid experience, THE FILESYSTEM MOUNTED WITHOUT PROTEST!!
>
> h) I checked the file and one variant as written was returned. STILL NO WARNING! While I didn't test it, I'm assuming, based on the PID-based round-robin read-assignment that I now know btrfs uses, that which copy I got would depend on whether the PID of the reading thread was even or odd, as that's what determines which device of the pair is read. (There has actually been some discussion of that, as it's not a particularly intelligent balancing scheme and it's on the list to change, but the current even/odd works well enough for an initial implementation while the filesystem remains under development.)
>
> i) Were I rerunning the test today, I'd try a scrub and see what it did with the difference. But I was early enough in my btrfs learning that I didn't know to run it at that point, so didn't do so. I'd still be interested in how it handled that, tho based on what I know of btrfs behavior in general, I can /predict/ that which copy it'd scrub out and which it would keep would again depend on the PID of the scrub thread, since both copies would appear valid (would verify against their checksum on the same device) when read, and it's only when matched against the other that a problem, presumably with the other copy, would be detected.

It'll pick the latest generation number and use that one as the one true source. For the others you'll get crc errors, which make it fall back to the latest one. If the two have exactly the same generation number, we'll have a hard time picking the best one.

Ilya has a series of changes from this year's GSOC that we need to clean up and integrate. It detects offline devices and brings them up to date automatically. He targeted the pull-one-drive use case explicitly.
-chris
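[Editor's note: the even/odd read-balancing Duncan describes can be sketched in a couple of lines of shell. This is purely illustrative of the scheme, not the actual btrfs kernel logic; the function name is made up.]

```shell
# Sketch of the PID-based raid1 read balancing described above: the
# reading process's PID parity selects which mirror services the read.
# Illustrative only; the real selection happens inside the btrfs
# kernel code, not in shell.
pick_mirror() {
    pid=$1
    num_copies=2            # raid1: two copies of each block
    echo $(( pid % num_copies ))
}

# Two processes with adjacent PIDs read from different mirrors, which
# is why, after the forked-write test above, which version of a file
# you see can differ from one process to the next.
pick_mirror 4242    # even PID -> mirror 0
pick_mirror 4243    # odd PID  -> mirror 1
```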
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sat, 2014-01-04 at 22:28 +1100, Chris Samuel wrote:

> On Sat, 4 Jan 2014 12:57:02 AM Dave wrote:
>> I find myself annoyed by the constant disclaimers I read on this list, about the experimental status of Btrfs, but it's apparent that this hasn't sunk in for everyone.
> Btrfs will no longer be marked as experimental in the kernel as of 3.13. Unless someone submits a patch to fix it first. :-)
>
> Can we also keep things polite here please.

Seconded ;) We're really focused on nailing down these problems instead of hiding behind the experimental flag. I know we won't be perfect overnight, but it's time to focus on production workloads.

-chris
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On 2014-01-04 15:51, Chris Mason wrote:

> I added mount -o degraded just because I wanted the admin to be notified of failures. Right now it's still the most reliable way to notify them, but I definitely agree we can do better.

I think that we should align ourselves with what the other raid subsystems (md and dm) do in these cases. Reading the man page of mdadm, it seems to me that an array is assembled even without some disks; the only requirement is that the disks present have to be valid (i.e. not out of sync).

> Leaving it on all the time? I don't think this is a great long term solution, unless you are actively monitoring the system to make sure there are no failures.

Anyway, mdadm has the monitor mode, which reports this kind of error. From the mdadm man page:

    Follow or Monitor
        Monitor one or more md devices and act on any state changes. This is
        only meaningful for RAID1, 4, 5, 6, 10 or multipath arrays, as only
        these have interesting state. RAID0 or Linear never have missing,
        spare, or failed drives, so there is nothing to monitor.

Best regards
GB
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it)
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 7:59 PM, Jim Salter j...@jrs-s.net wrote:

> On 01/03/2014 07:27 PM, Chris Murphy wrote:
>> This is the wrong way to solve this. /etc/grub.d/10_linux is subject to being replaced on updates. It is not recommended it be edited, same as for grub.cfg. The correct way is as I already stated, which is to edit the GRUB_CMDLINE_LINUX= line in /etc/default/grub.
> Fair enough - though since I already have to monkey-patch 00_header, I kind of already have an eye on grub.d, so it doesn't seem as onerous as it otherwise would. There is definitely a lot of work that needs to be done on the boot sequence for btrfs IMO.

Most of this work has been done for a while in current versions of GRUB 2.00. There are a few fixes due in 2.02. There are some logical challenges in making snapshots bootable in a coherent way. But a major advantage of Btrfs is that functionality is contained in one place, so once the kernel is booted things usually just work, so I'm not sure what else you're referring to?

>> I think it's bad advice to recommend always persistently mounting a good volume with this option. There's a reason why degraded is not the default mount option, and why there isn't yet automatic degraded mount functionality. That fstab contains other errors.
> What other errors does it contain? Aside from adding the degraded option, that's a bone-stock fstab entry from an Ubuntu Server installation.

fs_passno is 1, which doesn't apply to Btrfs.

You're simply dissatisfied with the state of Btrfs development and are suggesting bad hacks as a workaround. That's my argument. Again, if your use case requires automatic degraded mounts, use a technology that's mature and well tested for that use case. Don't expect a lot of sympathy if these bad hacks cause you problems later.

> You're suggesting the wrong alternatives here (mdraid, LVM, etc) - they don't provide the features that I need or am accustomed to (true snapshots, copy on write, self-correcting redundant arrays, and on down the line).
Well, actually, LVM thinp does have fast snapshots without requiring preallocation, and uses COW. I'm not sure what you mean by self-correcting, but if the drive reports a read error, md, lvm, and Btrfs raid1+ all will get missing data from mirror/parity reconstruction, and write corrected data back to the bad sector. All offer scrubbing (except Btrfs raid5/6). If you mean an independent means of verifying data via checksumming, true, you're looking at Btrfs, ZFS, or PI. If you're going to shoo me off, the correct way to do it is to wave me in the direction of ZFS. There's no shooing, I'm just making observations. Chris Murphy -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
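The LVM thin-provisioning point above can be outlined like this. All names (/dev/vdf, vg0, pool0, data) are illustrative, and these commands need root plus a real block device, so treat this as a command outline rather than a paste-ready script:

```shell
# Outline of LVM thin-provisioned (COW) snapshots, per the reply above.
# Device and VG/LV names are hypothetical examples.
pvcreate /dev/vdf
vgcreate vg0 /dev/vdf
lvcreate --type thin-pool -L 8G -n pool0 vg0   # thin pool backing store
lvcreate --thin -V 20G -n data vg0/pool0       # thin LV, no preallocation
mkfs.ext4 /dev/vg0/data
lvcreate --snapshot -n data_snap vg0/data      # instant thin snapshot
```

A snapshot of a thin LV is itself thin, so it takes no space up front and carries much less of a performance penalty than classic LVM snapshots.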
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Chris Mason posted on Sat, 04 Jan 2014 14:51:23 + as excerpted: It'll pick the latest generation number and use that one as the one true source. For the others you'll get crc errors which make it fall back to the latest one. If the two have exactly the same generation number, we'll have a hard time picking the best one. Ilya has a series of changes from this year's GSOC that we need to clean up and integrate. It detects offline devices and brings them up to date automatically. He targeted the pull-one-drive use case explicitly. Thanks for the explanation and bits to look forward to. I'll be looking forward to seeing that GSOC stuff then, as having dropouts and re-adds auto-handled would be a sweet feature to add to the raid featureset, improving things from a sysadmin's prepared-to-deal-with-recovery perspective quite a bit. =:^) -- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On 01/04/2014 02:18 PM, Chris Murphy wrote: I'm not sure what else you're referring to? (working on the boot environment of btrfs) Just the string of caveats regarding mounting at boot time - needing to monkey-patch 00_header to avoid the bogus sparse file error (which, worse, tells you to press a key when pressing a key does nothing) followed by this, in my opinion completely unexpected, behavior when missing a disk in a fault-tolerant array, which also requires monkey-patching in fstab and now elsewhere in GRUB to avoid. Please keep in mind - I think we got off on the wrong foot here, and I'm sorry for my part in that; it was unintentional. I *love* btrfs, and think the devs are doing incredible work. I'm excited about it. I'm aware it's not intended for production yet. However, it's just on the cusp - with distributions not only including it in their installers but a couple teetering on the fence about declaring it their next default FS (Oracle Unbreakable, OpenSUSE, hell, even Red Hat was flirting with the idea) - so it seems to me some extra testing with an eye towards production isn't a bad thing. That's why I'm here. Not to crap on anybody, but to get involved, hopefully helpfully. fs_passno is 1 which doesn't apply to Btrfs. Again, that's the distribution's default, so the argument should be with them, not me... with that said, I'd respectfully argue that fs_passno 1 is correct for any root filesystem; if the filesystem itself declines to run an fsck, that's up to the filesystem, but it's correct to specify fs_passno 1 if the filesystem is to be mounted as root in the first place. I'm open to hearing why that's a bad idea, if you have a specific reason? Well actually LVM thinp does have fast snapshots without requiring preallocation, and uses COW. 
LVM's snapshots aren't very useful for me - there's a performance penalty while you have them in place, so they're best used as a transient use-then-immediately-delete feature, for instance for rsync'ing off a database binary. Until recently, there also wasn't a good way to roll back an LV to a snapshot, and even now, that can be pretty problematic. Finally, there's no way to get a partial copy of an LV snapshot out of the snapshot and back into production, so if eg you have virtual machines of significant size, you could be looking at *hours* of file copy operations to restore an individual VM out of a snapshot (if you even have the drive space available for it), as compared to btrfs' cp --reflink=always operation, which allows you to do the same thing instantaneously. FWIW, I think the ability to do cp --reflink=always is one of the big killer features that makes btrfs more attractive than zfs (which, again FWIW, I have 5+ years of experience with, and is my current primary storage system). I'm not sure what you mean by self-correcting, but if the drive reports a read error md, lvm, and Btrfs raid1+ all will get missing data from mirror/parity reconstruction, and write corrected data back to the bad sector. You're assuming that the drive will actually *report* a read error, which is frequently not the case. I have a production ZFS array right now that I need to replace an Intel SSD on - the SSD has thrown 10K checksum errors in six months. Zero read or write errors. Neither hardware RAID nor mdraid nor LVM would have helped me there. Since running filesystems that do block-level checksumming, I have become aware that bitrot happens without hardware errors getting thrown FAR more frequently than I would have thought before having the tools to spot it. ZFS, and now btrfs, are the only tools at hand that can actually prevent it. 
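The cp --reflink behavior described above can be demonstrated with a small sketch. Note that --reflink=always fails outright on filesystems without reflink support, so the runnable example uses --reflink=auto, which clones extents on Btrfs and falls back to a plain copy elsewhere; paths and sizes are illustrative:

```shell
# Demonstrate reflink copies: on Btrfs, cp --reflink clones extents
# instantly instead of rewriting the data. --reflink=auto falls back to
# a normal copy on filesystems without reflink support, so this sketch
# runs anywhere; use --reflink=always on Btrfs to guarantee a clone.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/vm.img" bs=1M count=8 status=none  # stand-in VM image
cp --reflink=auto "$demo/vm.img" "$demo/vm-restored.img"     # instant on Btrfs
cmp -s "$demo/vm.img" "$demo/vm-restored.img" && echo "copies match"
```

On a Btrfs snapshot, the same command against a file inside the snapshot pulls an individual VM image back into production in constant time, which is the "hours versus instant" difference described above.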
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On 01/04/2014 01:10 AM, Duncan wrote: The example given in the OP was of a 4-device raid10, already the minimum number to work undegraded, with one device dropped out, to below the minimum required number to mount undegraded, so of /course/ it wouldn't mount without that option. The issue was not realizing that a degraded fault-tolerant array would refuse to mount without being passed an -o degraded option. Yes, it's on the wiki - but it's on the wiki under *replacing* a device, not in the FAQ, not in the head of the multiple devices section, etc; and no coherent message is thrown either on the console or in the kernel log when you do attempt to mount a degraded array without the correct argument. IMO that's a bug. =)
btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
I'm using Ubuntu 12.04.3 with an up-to-date 3.11 kernel, and the btrfs-progs from Debian Sid (since the ones from Ubuntu are ancient). I discovered to my horror during testing today that neither raid1 nor raid10 arrays are fault tolerant of losing an actual disk.

mkfs.btrfs -d raid10 -m raid10 /dev/vdb /dev/vdc /dev/vdd /dev/vde
mkdir /test
mount /dev/vdb /test
echo test > /test/test
btrfs filesystem sync /test
shutdown -hP now

After shutting down the VM, I can remove ANY of the drives from the btrfs raid10 array, and be unable to mount the array. In this case, I removed the drive that was at /dev/vde, then restarted the VM.

btrfs fi show
Label: none uuid: 94af1f5d-6ad2-4582-ab4a-5410c410c455
Total devices 4 FS bytes used 156.00KB
devid3 size 1.00GB used 212.75MB path /dev/vdd
devid3 size 1.00GB used 212.75MB path /dev/vdc
devid3 size 1.00GB used 232.75MB path /dev/vdb
*** Some devices missing

OK, we have three of four raid10 devices present. Should be fine. Let's mount it:

mount -t btrfs /dev/vdb /test
mount: wrong fs type, bad option, bad superblock on /dev/vdb, missing codepage or helper program, or other error
In some cases useful info is found in syslog - try dmesg | tail or so

What's the kernel log got to say about it?

dmesg | tail -n 4
[ 536.694363] device fsid 94af1f5d-6ad2-4582-ab4a-5410c410c455 devid 1 transid 7 /dev/vdb
[ 536.700515] btrfs: disk space caching is enabled
[ 536.703491] btrfs: failed to read the system array on vdd
[ 536.708337] btrfs: open_ctree failed

Same behavior persists whether I create a raid1 or raid10 array, and whether I create it as that raid level using mkfs.btrfs or convert it afterwards using btrfs balance start -dconvert=raidn -mconvert=raidn. Also persists even if I both scrub AND sync the array before shutting the machine down and removing one of the disks. What's up with this? This is a MASSIVE bug, and I haven't seen anybody else talking about it... has nobody tried actually failing out a disk yet, or what? 
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Am 03.01.2014 23:28, schrieb Jim Salter: [snip] What's up with this? This is a MASSIVE bug, and I haven't seen anybody else talking about it... 
has nobody tried actually failing out a disk yet, or what? Hey Jim, keep calm and read the wiki ;) https://btrfs.wiki.kernel.org/ You need to mount with -o degraded to tell btrfs a disk is missing. Joshua
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling. You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. I did find the -o degraded argument in the wiki now that you mentioned it - but it's not prominent enough if you ask me. =) On 01/03/2014 05:43 PM, Joshua Schüler wrote:
[snip] Hey Jim, keep calm and read the wiki ;) https://btrfs.wiki.kernel.org/ You need to mount with -o degraded to tell btrfs a disk is missing. Joshua
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Am 03.01.2014 23:56, schrieb Jim Salter: I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling. You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine. don't forget to btrfs device delete missing path See https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. Add -o degraded to the boot-options in GRUB. If your filesystem is more heavily corrupted then you either need the btrfs tools in your initrd or a rescue cd I did find the -o degraded argument in the wiki now that you mentioned it - but it's not prominent enough if you ask me. =) [snip] Joshua -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
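The degraded-mount and replace-the-missing-device steps Joshua points to on the wiki can be outlined as below. Device names and the mount point are illustrative, and these commands need root plus real block devices, so treat this as a command outline rather than a paste-ready script:

```shell
# Mount a raid1/raid10 volume that is missing a member, then repair it
# (per the wiki page cited above; /dev/vdb, /dev/vdf, and /test are
# hypothetical examples).
mount -t btrfs -o degraded /dev/vdb /test

# Add a replacement device, then drop the failed (absent) one.
btrfs device add /dev/vdf /test
btrfs device delete missing /test

# Rebalance so data is fully replicated across the current devices.
btrfs balance start /test
```

The `missing` keyword tells btrfs to remove whichever device the filesystem knows about but cannot find, which is why it works even though the dead disk is no longer present to name.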
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Fri, Jan 03, 2014 at 05:56:42PM -0500, Jim Salter wrote: I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling. You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... Use grub's command-line editing to add rootflags=degraded to it. Hugo. which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. I did find the -o degraded argument in the wiki now that you mentioned it - but it's not prominent enough if you ask me. =) On 01/03/2014 05:43 PM, Joshua Schüler wrote: [snip]
-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Eighth Army Push Bottles Up Germans -- WWII newspaper --- headline (possibly apocryphal) signature.asc Description: Digital signature
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly... HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. Add -o degraded to the boot-options in GRUB.
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Fri, Jan 03, 2014 at 06:13:25PM -0500, Jim Salter wrote: Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly... You don't need to edit grub.cfg -- when you boot, grub has an edit option, so you can do it at boot time without having to use a rescue disk. Regardless, the thing you need to edit is the line starting linux, and will look something like this: linux /vmlinuz-3.11.0-rc2-dirty root=UUID=1b6ec419-211a-445e-b762-ae7da27b6e8a ro single rootflags=subvol=fs-root If there's a rootflags= option already (as above), add ,degraded to the end. If there isn't, add rootflags=degraded. Hugo. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. Add -o degraded to the boot-options in GRUB. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Eighth Army Push Bottles Up Germans -- WWII newspaper --- headline (possibly apocryphal) signature.asc Description: Digital signature
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 3:56 PM, Jim Salter j...@jrs-s.net wrote: I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling. You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? I'd say that it's not ready for unattended/auto degraded mounting, and that this is intended to be a red-flag show stopper to get the attention of the user. Before automatic degraded mounts, which md and LVM raid do now, there probably needs to be notification support in desktops; e.g. GNOME will report degraded state for at least md arrays (maybe LVM too, not sure). There's also a list of other multiple-device stuff on the to-do list, some of which maybe should be done before auto degraded mount, for example the hot spare work. https://btrfs.wiki.kernel.org/index.php/Project_ideas#Multiple_Devices Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Yep - had just figured that out and successfully booted with it, and was in the process of typing up instructions for the list (and posterity). One thing that concerns me is that edits made directly to grub.cfg will get wiped out with every kernel upgrade when update-grub is run - any idea where I'd put this in /etc/grub.d to have a persistent change? I have to tell you, I'm not real thrilled with this behavior either way - it means I can't have the option to automatically mount degraded filesystems without the filesystems in question ALWAYS showing as being mounted degraded, whether the disks are all present and working fine or not. That's kind of blecchy. =\ On 01/03/2014 06:18 PM, Hugo Mills wrote: On Fri, Jan 03, 2014 at 06:13:25PM -0500, Jim Salter wrote: Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly... You don't need to edit grub.cfg -- when you boot, grub has an edit option, so you can do it at boot time without having to use a rescue disk. Regardless, the thing you need to edit is the line starting linux, and will look something like this: linux /vmlinuz-3.11.0-rc2-dirty root=UUID=1b6ec419-211a-445e-b762-ae7da27b6e8a ro single rootflags=subvol=fs-root If there's a rootflags= option already (as above), add ,degraded to the end. If there isn't, add rootflags=degraded. Hugo. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. 
Add -o degraded to the boot-options in GRUB.
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 4:13 PM, Jim Salter j...@jrs-s.net wrote: Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly… Don't edit the grub.cfg directly. At the grub menu, highlight the entry you want to boot, hit 'e', and then edit the existing linux/linuxefi line. If you already have rootfs on a subvolume, you'll have an existing parameter on that line, rootflags=subvol=rootname, and you can change this to rootflags=subvol=rootname,degraded. I would not make this option persistent by putting it permanently in the grub.cfg; although I don't know the consequence of always mounting with degraded even when not necessary, it could have some negative effects (?) Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 4:25 PM, Jim Salter j...@jrs-s.net wrote: One thing that concerns me is that edits made directly to grub.cfg will get wiped out with every kernel upgrade when update-grub is run - any idea where I'd put this in /etc/grub.d to have a persistent change? /etc/default/grub. I don't recommend making it persistent. At this stage of development, a disk failure should cause mount failure so you're alerted to the problem. I have to tell you, I'm not real thrilled with this behavior either way - it means I can't have the option to automatically mount degraded filesystems without the filesystems in question ALWAYS showing as being mounted degraded, whether the disks are all present and working fine or not. That's kind of blecchy. =\ If you need something that comes up degraded automatically by design as a supported use case, use md (or possibly LVM, which uses different user-space tools and monitoring but uses the md kernel driver code and supports raid 0, 1, 5, and 6 - quite nifty). I haven't tried this yet, but I think that's also supported with the thin provisioning work, which even if you don't use thin provisioning gets you the significantly more efficient snapshot behavior. Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
For anybody else interested, if you want your system to automatically boot a degraded btrfs array, here are my crib notes, verified working:

* boot degraded

1. edit /etc/grub.d/10_linux, add degraded to the rootflags
GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}

2. add degraded to options in /etc/fstab also
UUID=bf9ea9b9-54a7-4efc-8003-6ac0b344c6b5 / btrfs defaults,degraded,subvol=@ 0 1

3. Update and reinstall GRUB to all boot disks
update-grub
grub-install /dev/vda
grub-install /dev/vdb

Now you have a system which will automatically start a degraded array.

** Side note: sorry, but I absolutely don't buy the argument that the system won't boot without you driving down to its physical location, standing in front of it, and hammering panickily at a BusyBox prompt is the best way to find out your array is degraded. I'll set up a Nagios module to check for degraded arrays using btrfs fi list instead, thanks... On 01/03/2014 06:06 PM, Freddie Cash wrote: Why is manual intervention even needed? Why isn't the filesystem smart enough to mount in a degraded mode automatically? -- Freddie Cash fjwc...@gmail.com
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Minor correction: you need to close the double-quotes at the end of the GRUB_CMDLINE_LINUX line: GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}" On 01/03/2014 06:42 PM, Jim Salter wrote: [snip]
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 4:42 PM, Jim Salter j...@jrs-s.net wrote:
> For anybody else interested, if you want your system to automatically boot a degraded btrfs array, here are my crib notes, verified working:
>
> * boot degraded
>
> 1. edit /etc/grub.d/10_linux, add degraded to the rootflags
>    GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}

This is the wrong way to solve this. /etc/grub.d/10_linux is subject to being replaced on updates. It is not recommended that it be edited, same as for grub.cfg. The correct way, as I already stated, is to edit the GRUB_CMDLINE_LINUX= line in /etc/default/grub.

> 2. add degraded to options in /etc/fstab also
>    UUID=bf9ea9b9-54a7-4efc-8003-6ac0b344c6b5 / btrfs defaults,degraded,subvol=@ 0 1

I think it's bad advice to recommend always persistently mounting a good volume with this option. There's a reason why degraded is not the default mount option, and why there isn't yet automatic degraded mount functionality. That fstab contains other errors.

The correct way to automate this, before the Btrfs developers get around to it, is to create a systemd unit that checks for the mount failure, determines that there's a missing device, and generates a modified sysroot.mount job that includes degraded.

> Side note: sorry, but I absolutely don't buy the argument that the system won't boot without you driving down to its physical location, standing in front of it, and hammering panickily at a BusyBox prompt is the best way to find out your array is degraded.

You're simply dissatisfied with the state of Btrfs development and are suggesting bad hacks as a workaround. That's my argument. Again, if your use case requires automatic degraded mounts, use a technology that's mature and well tested for that use case. Don't expect a lot of sympathy if these bad hacks cause you problems later.

> I'll set up a Nagios module to check for degraded arrays using btrfs fi list instead, thanks…

That's a good idea, except that it's show rather than list.
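Chris's suggested systemd approach might look roughly like the drop-in plus fallback unit below. This is only a sketch: the unit names and the ROOT-UUID placeholder are hypothetical, a real implementation would have to live in the initramfs (where the root filesystem is mounted on /sysroot), and it omits the "verify a device is actually missing" check he describes.

```
# /etc/systemd/system/sysroot.mount.d/degraded-fallback.conf (hypothetical)
# If the normal root mount job fails, trigger the fallback service.
[Unit]
OnFailure=degraded-root.service

# /etc/systemd/system/degraded-root.service (hypothetical)
[Unit]
Description=Retry root mount with -o degraded (sketch only)
DefaultDependencies=no

[Service]
Type=oneshot
ExecStart=/bin/mount -o degraded /dev/disk/by-uuid/ROOT-UUID /sysroot
```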
Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On 01/03/2014 07:27 PM, Chris Murphy wrote:
> This is the wrong way to solve this. /etc/grub.d/10_linux is subject to being replaced on updates. It is not recommended it be edited, same as for grub.cfg. The correct way is as I already stated, which is to edit the GRUB_CMDLINE_LINUX= line in /etc/default/grub.

Fair enough - though since I already have to monkey-patch 00_header, I kind of already have an eye on grub.d, so it doesn't seem as onerous as it otherwise would. There is definitely a lot of work that needs to be done on the boot sequence for btrfs, IMO.

> I think it's bad advice to recommend always persistently mounting a good volume with this option. There's a reason why degraded is not the default mount option, and why there isn't yet automatic degraded mount functionality. That fstab contains other errors.

What other errors does it contain? Aside from adding the degraded option, that's a bone-stock fstab entry from an Ubuntu Server installation.

> The correct way to automate this before Btrfs developers get around to it is to create a systemd unit that checks for the mount failure, determines that there's a missing device, and generates a modified sysroot.mount job that includes degraded.

Systemd is not the boot system in use for my distribution, and using it would require me to build a custom kernel, among other things. We're going to have to agree to disagree that that's an appropriate workaround, I think.

> You're simply dissatisfied with the state of Btrfs development and are suggesting bad hacks as a work around. That's my argument. Again, if your use case requires automatic degraded mounts, use a technology that's mature and well tested for that use case. Don't expect a lot of sympathy if these bad hacks cause you problems later.

You're suggesting the wrong alternatives here (mdraid, LVM, etc.) - they don't provide the features that I need and am accustomed to (true snapshots, copy-on-write, self-correcting redundant arrays, and on down the line).

If you're going to shoo me off, the correct way to do it is to wave me in the direction of ZFS, in which case I can tell you I've been a happy user of ZFS for 5+ years now on hundreds of systems. ZFS and btrfs are literally the *only* options available that do what I want to do, and have been doing for years now. (At least aside from six-figure-and-up proprietary systems, which I have neither the budget nor the inclination for.)

I'm testing btrfs heavily in throwaway virtual environments and in a few small, heavily-monitored test production instances because ZFS on Linux has its own set of problems, both technical and licensing, and I think it's clear btrfs is going to take the lead in the very near future - in many ways, it does already.

>> I'll set up a Nagios module to check for degraded arrays using btrfs fi list instead, thanks…
>
> That's a good idea, except that it's show rather than list.

Yup, that's what I meant, all right. I frequently still get the syntax backwards between btrfs fi show and btrfs subv list.
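A minimal version of the Nagios-style check discussed here, using the corrected btrfs fi show command, might look like the sketch below. The check_btrfs helper is hypothetical: it assumes btrfs-progs' behavior of printing a "*** Some devices missing" line for filesystems with absent devices, and it reads the command output on stdin so it can be exercised without a real array.

```shell
#!/bin/sh
# Hypothetical Nagios-style check for degraded btrfs filesystems.
# Reads `btrfs filesystem show` output on stdin and follows Nagios
# conventions: status 0 = OK, status 2 = CRITICAL.
check_btrfs() {
    if grep -qi 'missing'; then
        echo "CRITICAL: btrfs device(s) missing"
        return 2
    fi
    echo "OK: all btrfs devices present"
    return 0
}

# On a live system one would run:  btrfs filesystem show | check_btrfs
```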
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Fri, Jan 3, 2014 at 9:59 PM, Jim Salter j...@jrs-s.net wrote:
> You're suggesting the wrong alternatives here (mdraid, LVM, etc.) - they don't provide the features that I need and am accustomed to (true snapshots, copy-on-write, self-correcting redundant arrays, and on down the line).
>
> If you're going to shoo me off, the correct way to do it is to wave me in the direction of ZFS, in which case I can tell you I've been a happy user of ZFS for 5+ years now on hundreds of systems. ZFS and btrfs are literally the *only* options available that do what I want to do, and have been doing for years now. (At least aside from six-figure-and-up proprietary systems, which I have neither the budget nor the inclination for.)

Jim, there's nothing stopping you from creating a Btrfs filesystem on top of an mdraid array. I'm currently running three WD Red 3TB drives in a raid5 configuration under a Btrfs filesystem. This configuration works pretty well and fills the feature gap you're describing.

I will say, though, that the whole tone of your email chain leaves a bad taste in my mouth; kind of like a poorly adjusted relative who shows up once a year for Thanksgiving and makes everyone feel uncomfortable. I find myself annoyed by the constant disclaimers I read on this list about the experimental status of Btrfs, but it's apparent that this hasn't sunk in for everyone. Your poor budget doesn't a production filesystem make.

I and many others on this list who have been using Btrfs will tell you with no hesitation that, due to the immaturity of the code, Btrfs should be making NO assumptions in the event of a failure, and everything should come to a screeching halt. I've seen it all: the infamous 120-second process hangs, csum errors, multiple separate catastrophic failures (search me on this list). Things are MOSTLY stable, but you simply have to glance at a few weeks of history on this list to see the experimental status is fully justified.

I use Btrfs because of its intoxicating feature set. As an IT director, though, I'd never subject my company to these rigors. If Btrfs on mdraid isn't an acceptable solution for you, then ZFS is the only responsible alternative.

--
-=[dave]=-
Entropy isn't what it used to be.
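For reference, the Btrfs-on-mdraid layout Dave describes can be set up along these lines. This is a sketch only: the device names are illustrative, and mdadm --create destroys any existing data on the listed disks.

```
# Sketch: single-device Btrfs on top of a 3-disk md raid5
# (illustrative device names; these commands are destructive).
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda /dev/sdb /dev/sdc
mkfs.btrfs /dev/md0      # md provides the redundancy; btrfs sees one device
mount /dev/md0 /mnt
```

One trade-off worth noting: with a single-device btrfs on top of md, checksums still detect corruption, but btrfs has no second copy of data with which to self-heal (metadata is duplicated by default on rotational storage, data is not).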
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Chris Murphy posted on Fri, 03 Jan 2014 16:22:44 -0700 as excerpted:
> I would not make this option persistent by putting it permanently in the grub.cfg; although I don't know the consequence of always mounting with degraded even if not necessary, it could have some negative effects (?)

Degraded only actually does anything if it's actually needed. On a normal array it'll be a no-op, so it should be entirely safe for /normal/ operation - but that doesn't mean I'd /recommend/ it for normal operation, since it bypasses checks that are there for a reason, silently hiding information that an admin needs to know before he boots the machine anyway, in order to recover.

However, I have some other comments to add:

1) Like you, I'm uncomfortable with the whole idea of adding degraded permanently at this point. Mention was made of having to drive down to the data center and actually stand in front of the box if something goes wrong, otherwise. At the moment, for btrfs' current state of development, fine. Btrfs remains under development, and there are clear warnings about using it without backups one hasn't tested recovery from or is not otherwise prepared to actually use. It's stated in multiple locations on the wiki; it's stated in the kernel btrfs config option; and it's stated in the mkfs.btrfs output when you create the filesystem. If after all that people are using it in a remote situation where they're not prepared to drive down to the data center and stab at the keys if they have to, they're using possibly the right filesystem, but at too early a point in its development for their needs at this moment.

2) As the wiki explains, certain configurations require at least a minimum number of devices in order to mount undegraded. The example given in the OP was a 4-device raid10 - already the minimum number to work undegraded - with one device dropped out, taking it below the minimum required to mount undegraded, so of /course/ it wouldn't mount without that option.

If five or six devices had been used, a device could have been dropped and the remaining number of devices would still be at or above the minimum needed to run an undegraded raid10, and the result would likely have been different, since there would still be enough devices to mount writable with proper redundancy - even if existing information doesn't have that redundancy until a rebalance is done to take care of the missing device.

Similarly with raid1 and its minimum of two devices. Configure with three, then drop one, and it should still work, as that's above the minimum of two for a raid1 configuration. Configure with two and drop one, and you'll have to mount degraded (and it'll drop to read-only if it happens in operation), since there's no second device to write the second copy to, as required by raid1.

3) Frankly, this whole thread smells of going off half-cocked - posting before doing the proper research. I know when I took a look at btrfs here, I read up on the wiki: the multiple-devices material, the FAQ, the problem FAQ, the gotchas, the use cases, the sysadmin guide, the getting-started page, and the mount options... loading the pages multiple times as I followed links back and forth between them. Because I care about my data and want to understand what I'm doing with it before I do it! And even now I often reread specific parts as I'm trying to help others with questions on this list.

Then I still had some questions about how it worked that I couldn't find answers for on the wiki, and, as is traditional with mailing lists (and newsgroups before them), I read several weeks' worth of posts (on an archive, for lists) before actually posting my questions, to see if they were FAQs already answered on the list. Then and only then did I post the questions to the list, and when I did, it was "Questions I haven't found answers for on the wiki or list," not "THE WORLD IS GOING TO END, OH NOS!!111!!11!111!!!"
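The device-count arithmetic in point 2 can be captured in a tiny helper. This is a sketch, not a btrfs tool: the profile minimums (two devices for raid1, four for raid10) come from the discussion above, and real-world mountability also depends on which chunks lived on the missing device.

```shell
#!/bin/sh
# needs_degraded <profile> <surviving-devices>
# Succeeds (status 0) when the surviving device count has fallen below the
# profile minimum, i.e. the volume can only be mounted with -o degraded.
# Hypothetical helper based on the minimums discussed in this thread.
needs_degraded() {
    case "$1" in
        raid1)  min=2 ;;
        raid10) min=4 ;;
        *)      min=1 ;;
    esac
    [ "$2" -lt "$min" ]
}

needs_degraded raid10 3 && echo "4-dev raid10 minus one: mount -o degraded"
needs_degraded raid1 2 || echo "3-dev raid1 minus one: mounts normally"
```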
Now, later on, I did post some behavior that had me rather upset, but that was AFTER I had already engaged the list in general, and I was pretty sure by that point that what I was seeing was NOT covered on the wiki and was reasonably new information for at least SOME list users.

4) As a matter of fact, AFAIK that behavior remains relevant today, and may well be of interest to the OP. FWIW my background was Linux kernel md/raid, so I approached btrfs raid expecting similar behavior. What I found in my testing (and NOT covered on the wiki or in the various documentation to this day, AFAIK, other than in a few threads on the list), however...

Test:
a) Create a two-device btrfs raid1.
b) Mount it and write some data to it.
c) Unmount it, unplug one device, and mount the remaining device degraded.
d) Write some data to a test file on it, noting the path/filename and data.
e) Unmount again, switch plugged devices so the formerly unplugged one is now the plugged