Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sun, 5 Jan 2014 01:25:19 PM Chris Murphy wrote:

> Does the Ubuntu 12.04 LTS installer let you create sysroot on a Btrfs raid1 volume?

I doubt it, given the alpha for 14.04 doesn't seem to have the concept yet. :-)

https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/1266200

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic. For more info see: http://en.wikipedia.org/wiki/OpenPGP
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 6, 2014, at 3:20 AM, Chris Samuel ch...@csamuel.org wrote:

> On Sun, 5 Jan 2014 01:25:19 PM Chris Murphy wrote:
>> Does the Ubuntu 12.04 LTS installer let you create sysroot on a Btrfs raid1 volume?
> I doubt it, given the alpha for 14.04 doesn't seem to have the concept yet. :-)
> https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/1266200

Color me surprised. Fedora 20 lets you create Btrfs raid1/raid0 for rootfs, but due to a long-standing grubby bug [1] /boot can't be on Btrfs, so it's ext4 only. That means only one of your disks gets grub.cfg, and if that disk dies, you won't boot without user intervention that also requires esoteric GRUB knowledge. /boot needs to be on Btrfs or it gets messy. The messy alternative, where each drive has its own ext4 boot partition, means kernel updates have to be written to each drive, and each drive's separate /boot/grub/grub.cfg needs to be updated. That's kinda ick x2. Yes, they could be made md raid1 to solve part of this.

It gets slightly more amusing on UEFI, where the installer needs to be smart enough to create (or reuse) the EFI System partition on each device [2] for the bootloader but NOT for the grub.cfg [3], otherwise we have separate grub.cfgs on each ESP to update whenever there are kernel updates. And if a disk fails and is replaced, while grub-install works on BIOS, it doesn't work on UEFI, because it will only install a bootloader if the ESP is mounted in the right location.

So until every duck is in a row, I think we can hardly point fingers when it comes to making a degraded system bootable without any human intervention.
[1] grubby fatal error updating grub.cfg when /boot is btrfs https://bugzilla.redhat.com/show_bug.cgi?id=864198
[2] RFE: always create required bootloader partitions in custom partitioning https://bugzilla.redhat.com/show_bug.cgi?id=1022316
[3] On EFI, grub.cfg should be in /boot/grub not /boot/efi/EFI/fedora https://bugzilla.redhat.com/show_bug.cgi?id=1048999

Chris Murphy
-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
FWIW, Ubuntu (and I presume Debian) will work just fine with a single / on btrfs, single or multi disk. I currently have two machines booting to a btrfs-raid10 / with no separate /boot, one booting to a btrfs single-disk / with no /boot, and one booting to a btrfs-raid10 / with an ext4-on-mdraid1 /boot.

On 01/06/2014 01:30 PM, Chris Murphy wrote:
> Color me surprised. Fedora 20 lets you create Btrfs raid1/raid0 for rootfs, but due to a long-standing grubby bug [1] /boot can't be on Btrfs, so it's ext4 only.
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 6, 2014, at 12:25 PM, Jim Salter j...@jrs-s.net wrote:

> FWIW, Ubuntu (and I presume Debian) will work just fine with a single / on btrfs, single or multi disk. I currently have two machines booting to a btrfs-raid10 / with no separate /boot, one booting to a btrfs single-disk / with no /boot, and one booting to a btrfs-raid10 / with an ext4-on-mdraid1 /boot.

Did you create the multiple device layouts outside of the installer first? What I'm seeing in the Ubuntu 12.04 installer is a choice of which disk to put the bootloader on. If that's reliable UI, then it won't put it on both disks, which means a single point of failure, in which case -o degraded not being automatic with Btrfs is essentially pointless if we don't have a bootloader. I also see no way in the UI to even create Btrfs raid of any sort.

Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
No, the installer is completely unaware. What I was getting at is that rebalancing (and installing the bootloader) is dead easy, so it doesn't bug me personally much. It'd be nice to eventually get something in the installer to make it obvious to the oblivious that it can be done and how, but in the meantime, it's frankly easier to set up btrfs-raid WITHOUT installer support than it is to set up mdraid WITH installer support.

Install process for a 4-drive btrfs-raid10 root on Ubuntu (desktop or server):

1. Do a single-disk install on the first disk, defaults all the way through except picking btrfs instead of ext4 for /.
2. sfdisk -d /dev/sda | sfdisk /dev/sdb ; sfdisk -d /dev/sda | sfdisk /dev/sdc ; sfdisk -d /dev/sda | sfdisk /dev/sdd
3. btrfs dev add /dev/sdb1 /dev/sdc1 /dev/sdd1 /
4. btrfs balance start -dconvert=raid10 -mconvert=raid10 /
5. grub-install /dev/sdb ; grub-install /dev/sdc ; grub-install /dev/sdd

Done. The rebalancing takes less than a minute, and the system's responsive while it happens. Once you've done the grub-install on the additional drives, you're good to go - Ubuntu already uses the UUID instead of a device ID for GRUB and fstab, so the btrfs mount will scan all drives and find any that are there. The only hitch is the need to mount degraded that I Chicken Littled about earlier so loudly. =)

On 01/06/2014 05:05 PM, Chris Murphy wrote:
> On Jan 6, 2014, at 12:25 PM, Jim Salter j...@jrs-s.net wrote:
>> FWIW, Ubuntu (and I presume Debian) will work just fine with a single / on btrfs, single or multi disk. I currently have two machines booting to a btrfs-raid10 / with no separate /boot, one booting to a btrfs single-disk / with no /boot, and one booting to a btrfs-raid10 / with an ext4-on-mdraid1 /boot.
> Did you create the multiple device layouts outside of the installer first? What I'm seeing in the Ubuntu 12.04 installer is a choice of which disk to put the bootloader on.
> If that's reliable UI, then it won't put it on both disks, which means a single point of failure, in which case -o degraded not being automatic with Btrfs is essentially pointless if we don't have a bootloader. I also see no way in the UI to even create Btrfs raid of any sort.
>
> Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On 07/01/14 06:25, Jim Salter wrote:

> FWIW, Ubuntu (and I presume Debian) will work just fine with a single / on btrfs, single or multi disk. I currently have two machines booting to a btrfs-raid10 / with no separate /boot, one booting to a btrfs single disk / with no /boot, and one booting to a btrfs-raid10 / with an ext4-on-mdraid1 /boot.

Actually I've run into a problem with GRUB there: a fresh install cannot boot from a btrfs /boot if your first partition is not 1 MiB aligned (starting at sector 2048), because there is then not enough space for GRUB to store its btrfs code. :-(

https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/1266195

I don't want to move my first partition as it's a Dell special (type 'de') and I'm not sure what the impact would be, so I just created an ext4 /boot and the install then worked.

Regarding RAID, yes, I realise it's easy to do after the fact; on the same test system I added an external USB2 drive to the root filesystem and rebalanced as RAID-1, which worked nicely. I'm planning on adding dual SSDs as OS disks to my desktop, and this experiment was to learn whether the Kubuntu installer handled it yet and, if not, to do a quick practice of setting it up by hand. :-)

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
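[Editor's note: the alignment problem above is easy to check before installing. The sketch below is illustrative only — the helper name is made up, and it reads the start sector from a file so it can be exercised on sample data; on a real system you'd read /sys/block/sda/sda1/start or look at `sfdisk -d` output. It assumes 512-byte sectors: GRUB embeds its core image in the gap before the first partition, and with the btrfs module included the old DOS default of sector 63 (about 31 KiB of gap) is too small, while a 1 MiB-aligned start at sector 2048 is fine.]

```shell
# Hedged sketch: does the gap before the first partition have room for
# GRUB's core image?  The start sector is read from a file so this can
# be tried on sample data; on a real disk use /sys/block/sda/sda1/start.
# Assumes 512-byte sectors; 2048 (1 MiB alignment) is the modern default
# that avoids the bug discussed above.
check_embed_gap() {
    start=$(cat "$1")
    # sectors 1 .. start-1 are available for embedding (sector 0 is the MBR)
    gap_kib=$(( (start - 1) * 512 / 1024 ))
    if [ "$start" -ge 2048 ]; then
        echo "ok: ${gap_kib} KiB before first partition"
    else
        echo "too small: only ${gap_kib} KiB before first partition"
    fi
}

# Sample: old DOS-style layout with the first partition at sector 63
echo 63 > /tmp/start.sample
check_embed_gap /tmp/start.sample
```

Run against the sector-63 sample, this reports only 31 KiB of embedding gap, matching the failure mode in the Launchpad bug above.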
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sat, 4 Jan 2014 02:56:39 PM Chris Mason wrote:

> Seconded ;) We're really focused on nailing down these problems instead of hiding behind the experimental flag. I know we won't be perfect overnight, but it's time to focus on production workloads.

Perhaps an option here is to remove the need to specify the degraded flag: if the filesystem notices that it is mounting a RAID array and would otherwise fail, it sets the degraded flag itself and carries on. That way the fact it was degraded would be visible in /proc/mounts and could be detected with health check scripts like NRPE for Icinga/Nagios.

Looking at the code, this would be in read_one_dev() in fs/btrfs/volumes.c ?

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
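[Editor's note: the /proc/mounts detection idea above can be sketched in a few lines of shell. This is a hypothetical NRPE-style probe, not an existing plugin — the function name is made up, and the mounts file is a parameter so it can be exercised on sample data; on a real system you'd pass /proc/mounts.]

```shell
# Hypothetical NRPE-style health check (names are mine, not from the
# thread): warn when any btrfs mount carries the "degraded" flag.
# The mounts file is a parameter so the function can be exercised on
# sample data; on a real system pass /proc/mounts.
check_degraded() {
    mounts_file="$1"
    awk '$3 == "btrfs" && $4 ~ /(^|,)degraded(,|$)/ { found = 1 }
         END { exit !found }' "$mounts_file"
}

# Sample standing in for /proc/mounts on a degraded system:
printf '/dev/sda1 / btrfs rw,degraded,subvol=/ 0 0\n' > /tmp/mounts.sample
if check_degraded /tmp/mounts.sample; then
    echo "WARNING: degraded btrfs mount detected"
else
    echo "OK: no degraded btrfs mounts"
fi
```

Against the sample above this prints the WARNING line; pointed at /proc/mounts it would make a reasonable cron or Icinga/Nagios probe, assuming the kernel did set degraded automatically as proposed.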
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Jim Salter posted on Sat, 04 Jan 2014 16:22:53 -0500 as excerpted:

> On 01/04/2014 01:10 AM, Duncan wrote:
>> The example given in the OP was of a 4-device raid10, already the minimum number to work undegraded, with one device dropped out, to below the minimum required number to mount undegraded, so of /course/ it wouldn't mount without that option.
> The issue was not realizing that a degraded fault-tolerant array would refuse to mount without being passed an -o degraded option. Yes, it's on the wiki - but it's on the wiki under *replacing* a device, not in the FAQ, not in the head of the multiple devices section, etc; and no coherent message is thrown either on the console or in the kernel log when you do attempt to mount a degraded array without the correct argument. IMO that's a bug. =)

I'd agree: a usability bug, one of many "it works, but it's not easy to work with" rough edges. FWIW, I'm seeing progress in that area now. The rush of functional bugs and fixes for them has finally slowed down to the point where there's beginning to be time to focus on the usability bugs and rough edges.

I believe I saw a post in October or November from Chris Mason, where he said yes, the maturing of btrfs has been predicted before, but it really does seem like the functional bugs are slowing down to the point where the usability bugs can finally be addressed, and 2014 really does look like the year that btrfs will finally start shaping up into a mature-looking and -acting filesystem, including in usability, etc.

And Chris mentioned the GSoC project that worked on one angle of this specific issue, too. Getting that code integrated and having btrfs finally be able to recognize a dropped and re-added device and automatically trigger a resync... that'd be a pretty sweet improvement to get.
=:^)

While they're working on that, they may well take a look at at least giving the admin more information on a degraded-needed mount failure too, tweaking the kernel log messages, etc, and possibly taking a second look at whether flatly refusing to mount is the best behavior then, or not.

Actually, I wonder... what about mounting in such a situation, but read-only, and refusing to go writable unless degraded is added too? That would preserve the "first, do no harm, don't make the problem worse" ideal, while mounting read-only unless degraded is added with the rw wouldn't be /quite/ as drastic as refusing to mount entirely.

I actually think that, plus some better logging saying "hey, we don't have enough devices to write with the requested raid level, so remount rw,degraded, and either add another device or reconfigure the raid mode to something suitable for the number of devices", would be a reasonable solution.

-- 
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Chris Samuel posted on Sun, 05 Jan 2014 20:20:26 +1100 as excerpted:

> On Sat, 4 Jan 2014 02:56:39 PM Chris Mason wrote:
>> Seconded ;) We're really focused on nailing down these problems instead of hiding behind the experimental flag. I know we won't be perfect overnight, but it's time to focus on production workloads.
> Perhaps an option here is to remove the need to specify the degraded flag: if the filesystem notices that it is mounting a RAID array and would otherwise fail, it sets the degraded flag itself and carries on. That way the fact it was degraded would be visible in /proc/mounts and could be detected with health check scripts like NRPE for Icinga/Nagios. Looking at the code, this would be in read_one_dev() in fs/btrfs/volumes.c ?

The idea I came up with elsewhere was to mount read-only, with a dmesg entry to the effect that the filesystem was configured for a raid level that the current number of devices can't support, so mount rw,degraded to accept that temporarily and to make changes, either by adding a new device to fill out the required number for the configured raid level, or by reducing the configured raid level to match reality.

The read-only mount would be better than not mounting at all, while preserving the "first, do no further harm" ideal, since mounted read-only, the existing situation should at least remain stable. It would also alert the admin to problems, with a reasonable log message saying how to fix them, while letting the admin at least access the filesystem in read-only mode, thereby giving him tools access to manage whatever maintenance tasks are necessary, should it be the rootfs. The admin could then take the action they deemed appropriate, whether that was getting the data backed up, or mounting degraded,rw in order to either add a device and bring it back to functional, or to rebalance to a lower data/metadata redundancy level due to lack of devices.

-- 
Duncan - List replies preferred. No HTML msgs.
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 4, 2014, at 2:16 PM, Jim Salter j...@jrs-s.net wrote:

> On 01/04/2014 02:18 PM, Chris Murphy wrote:
>> I'm not sure what else you're referring to? (re: work needed on the btrfs boot environment)
> Just the string of caveats regarding mounting at boot time - needing to monkey-patch 00_header to avoid the bogus sparse file error

I don't know what "bogus sparse file error" refers to. What version of GRUB? I'm seeing Ubuntu 12.04 precise-updates listing GRUB 1.99, which is rather old.

> (which, worse, tells you to press a key when pressing a key does nothing) followed by this, in my opinion completely unexpected, behavior when missing a disk in a fault-tolerant array, which also requires monkey-patching in fstab and now elsewhere in GRUB to avoid. and… I'm aware it's not intended for production yet.

On the one hand you say you're aware, yet on the other hand you say the missing-disk behavior is completely unexpected. Some parts of Btrfs, in certain contexts, are production ready. But the developmental state of Btrfs places a burden on the user to know more details about that state than he might otherwise be expected to know with more stable/mature file systems.

My opinion is that it's inappropriate for degraded mounts to be made automatic when there's no method of notifying user space of this state change. GNOME Shell, via udisks, will inform users of a degraded md array. Something equivalent to that is needed before Btrfs should enable a scenario where a user boots a computer in a degraded state without being informed, as if nothing were wrong at all. That's demonstrably far worse than a scary boot failure, during which one copy of the data is still likely safe, unlike permitting uninformed degraded rw operation.
> However, it's just on the cusp, with distributions not only including it in their installers but a couple teetering on the fence with declaring it their next default FS (Oracle Unbreakable, openSUSE, hell even Red Hat was flirting with the idea), so it seems to me some extra testing with an eye towards production isn't a bad thing.

Does the Ubuntu 12.04 LTS installer let you create sysroot on a Btrfs raid1 volume?

> That's why I'm here. Not to crap on anybody, but to get involved, hopefully helpfully.

I think you're better off using something more developmental; the feature necessarily needs to exist there first, before it can trickle down to an LTS release.

>> fs_passno is 1 which doesn't apply to Btrfs.
> Again, that's the distribution's default, so the argument should be with them, not me…

Yes, so you'd want to file a bug? That's how you get involved.

> With that said, I'd respectfully argue that fs_passno 1 is correct for any root file system; if the file system itself declines to run an fsck, that's up to the filesystem, but it's correct to specify fs_passno 1 if the filesystem is to be mounted as root in the first place. I'm open to hearing why that's a bad idea, if you have a specific reason?

It's a minor point, but it shows that fs_passno has become quaint, like grandma's iron cozy. It's not applicable for either XFS or Btrfs. It's arguably inapplicable for ext3/4, but its fsck program has an optimization to skip fully checking the file system if the journal replay succeeds. There is no unattended fsck for either XFS or Btrfs. On systemd systems, systemd reads fstab, and if fs_passno is non-zero it checks for the existence of /sbin/fsck.fs; if that doesn't exist, it doesn't run fsck for that entry. This topic was recently brought up and is in the archives.

Well actually LVM thinp does have fast snapshots without requiring preallocation, and uses COW.
> LVM's snapshots aren't very useful for me - there's a performance penalty while you have them in place, so they're best used as a transient use-then-immediately-delete feature, for instance for rsync'ing off a database binary. Until recently, there also wasn't a good way to roll back an LV to a snapshot, and even now, that can be pretty problematic.

This describes old LVM snapshots, not LVM thinp snapshots.

> Finally, there's no way to get a partial copy of an LV snapshot out of the snapshot and back into production, so if e.g. you have virtual machines of significant size, you could be looking at *hours* of file copy operations to restore an individual VM out of a snapshot (if you even have the drive space available for it), as compared to btrfs' cp --reflink=always operation, which allows you to do the same thing instantaneously.

LVM isn't a file system, so limitations compared to Btrfs are expected.

>> I'm not sure what you mean by self-correcting, but if the drive reports a read error md, lvm, and Btrfs raid1+ all will get missing data from mirror/parity reconstruction, and write corrected data back to the bad sector.
> You're assuming that the drive will actually *report* a read error, which is frequently not the case. This is discussed in
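[Editor's note: Chris Murphy's fs_passno point above is easiest to see in an fstab entry. The fragment below is a sketch, not a drop-in config from the thread; the UUID and subvolume name are placeholders.]

```
# <file system>                             <mount point>  <type>  <options>          <dump>  <pass>
# The last field (fs_passno) is 0: there is no unattended fsck for
# Btrfs, so requesting a boot-time fsck pass for a Btrfs root is a
# no-op at best. Distributions that default it to 1 are carrying
# over an ext-era convention.
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /              btrfs   defaults,subvol=@  0       0
```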
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sat, 4 Jan 2014 06:10:14 AM Duncan wrote:

> Btrfs remains under development and there are clear warnings about using it without backups one hasn't tested recovery from or are not otherwise prepared to actually use. It's stated in multiple locations on the wiki; it's stated on the kernel btrfs config option, and it's stated in mkfs.btrfs output when you create the filesystem.

Actually the scary warnings are gone from the Kconfig file for what will be the 3.13 kernel. Removed by this commit:

commit 4204617d142c0887e45fda2562cb5c58097b918e
Author: David Sterba dste...@suse.cz
Date:   Wed Nov 20 14:32:34 2013 +0100

    btrfs: update kconfig help text

    Reflect the current status. Portions of the text taken from the wiki pages.

    Signed-off-by: David Sterba dste...@suse.cz
    Signed-off-by: Chris Mason chris.ma...@fusionio.com

-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sat, 4 Jan 2014 12:57:02 AM Dave wrote:

> I find myself annoyed by the constant disclaimers I read on this list, about the experimental status of Btrfs, but it's apparent that this hasn't sunk in for everyone.

Btrfs will no longer be marked as experimental in the kernel as of 3.13. Unless someone submits a patch to fix it first. :-)

Can we also keep things polite here please.

thanks,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Chris Samuel posted on Sat, 04 Jan 2014 22:20:20 +1100 as excerpted:

> On Sat, 4 Jan 2014 06:10:14 AM Duncan wrote:
>> Btrfs remains under development and there are clear warnings about using it without backups one hasn't tested recovery from or are not otherwise prepared to actually use. It's stated in multiple locations on the wiki; it's stated on the kernel btrfs config option, and it's stated in mkfs.btrfs output when you create the filesystem.
> Actually the scary warnings are gone from the Kconfig file for what will be the 3.13 kernel. Removed by this commit: commit 4204617d142c0887e45fda2562cb5c58097b918e

FWIW, I'd characterize that as toned down somewhat, not /gone/. You don't see ext4 or other mature filesystems saying "The filesystem disk format is no longer unstable, and it's not expected to change unless ...", do you? "Not expected to change" etc. is definitely toned down from what it was, no argument there, but it still isn't exactly what one would expect in the description of a stable filesystem. If there's still some chance of the disk format changing, what does that say about the code /dealing/ with that disk format? That doesn't sound like something I'd be comfortable staking my reputation as a sysadmin on as judged fully reliable and ready for my mission-critical data, for sure!

Tho agreed, one certainly has to read between the lines a bit more for the kernel option now than they did. But the real kicker for me was when I redid several of my btrfs partitions to take advantage of newer features, 16 KiB nodes, etc, and saw the warning it's giving, yes, in btrfs-progs 3.12, after all the recent documentation changes, etc. Not everybody builds their own kernel, but it's kind of hard to get a btrfs filesystem without making one! (Yes, I know the installers make the filesystem for many people, and may well hide the output, but if so, and the distros don't provide a similar warning when people choose btrfs, that's entirely on the distros at that point.
Not much btrfs as upstream can do about that.)
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sat, 2014-01-04 at 06:10 +, Duncan wrote:

> Chris Murphy posted on Fri, 03 Jan 2014 16:22:44 -0700 as excerpted:
>> I would not make this option persistent by putting it permanently in the grub.cfg; although I don't know the consequence of always mounting with degraded even if not necessary, it could have some negative effects (?)
> Degraded only actually does anything if it's actually needed. On a normal array it'll be a no-op, so it should be entirely safe for /normal/ operation, but that doesn't mean I'd /recommend/ it for normal operation, since it bypasses checks that are there for a reason, thus silently bypassing information that an admin needs to know before he boots it anyway, in order to recover.
>
> However, I've some other comments to add:
>
> 1) As you, I'm uncomfortable with the whole idea of adding degraded permanently at this point.

I added mount -o degraded just because I wanted the admin to be notified of failures. Right now it's still the most reliable way to notify them, but I definitely agree we can do better.

Leaving it on all the time? I don't think this is a great long term solution, unless you are actively monitoring the system to make sure there are no failures. Also, as Neil Brown pointed out, it does put you at risk of transient device detection failures getting things out of sync.

> Test:
> a) Create a two-device btrfs raid1.
> b) Mount it and write some data to it.
> c) Unmount it, unplug one device, mount the remaining device degraded.
> d) Write some data to a test file on it, noting the path/filename and data.
> e) Unmount again, switch plugged devices so the formerly unplugged one is now the plugged one, and again mount degraded.
> f) Write some DIFFERENT data to the SAME path/file as in (d), so the two versions, each on its own device, have now incompatibly forked.
> g) Unmount, plug both devices in and mount, now undegraded.
> What I discovered back then, and to my knowledge the same behavior exists today, is that, entirely unexpectedly from and in contrast to my mdraid experience, THE FILESYSTEM MOUNTED WITHOUT PROTEST!!
>
> h) I checked the file and one variant as written was returned. STILL NO WARNING! While I didn't test it, I'm assuming, based on the PID-based round-robin read-assignment that I now know btrfs uses, that which copy I got would depend on whether the PID of the reading thread was even or odd, as that's what determines which device of the pair is read. (There has actually been some discussion of that, as it's not a particularly intelligent balancing scheme and it's on the list to change, but the current even/odd works well enough for an initial implementation while the filesystem remains under development.)
>
> i) Were I rerunning the test today, I'd try a scrub and see what it did with the difference. But I was early enough in my btrfs learning that I didn't know to run it at that point, so didn't do so. I'd still be interested in how it handled that, tho based on what I know of btrfs behavior in general, I can /predict/ that which copy it'd scrub out and which it would keep would again depend on the PID of the scrub thread, since both copies would appear valid (would verify against their checksum on the same device) when read, and it's only when matched against the other that a problem, presumably with the other copy, would be detected.

It'll pick the latest generation number and use that one as the one true source. For the others you'll get crc errors, which make it fall back to the latest one. If the two have exactly the same generation number, we'll have a hard time picking the best one.

Ilya has a series of changes from this year's GSOC that we need to clean up and integrate. It detects offline devices and brings them up to date automatically. He targeted the pull-one-drive use case explicitly.
-chris
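[Editor's note: the even/odd read-balancing Duncan describes can be sketched in a couple of lines of shell. This is purely illustrative of the scheme, not the actual btrfs kernel logic; the function name is made up.]

```shell
# Sketch of the PID-based raid1 read balancing described above: the
# reading process's PID parity selects which mirror services the read.
# Illustrative only; the real selection happens inside the btrfs
# kernel code, not in shell.
pick_mirror() {
    pid=$1
    num_copies=2            # raid1: two copies of each block
    echo $(( pid % num_copies ))
}

# Two processes with adjacent PIDs read from different mirrors, which
# is why, after the forked-write test above, which version of a file
# you see can differ from one process to the next.
pick_mirror 4242    # even PID -> mirror 0
pick_mirror 4243    # odd PID  -> mirror 1
```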
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sat, 2014-01-04 at 22:28 +1100, Chris Samuel wrote:

> On Sat, 4 Jan 2014 12:57:02 AM Dave wrote:
>> I find myself annoyed by the constant disclaimers I read on this list, about the experimental status of Btrfs, but it's apparent that this hasn't sunk in for everyone.
> Btrfs will no longer be marked as experimental in the kernel as of 3.13. Unless someone submits a patch to fix it first. :-)
>
> Can we also keep things polite here please.

Seconded ;) We're really focused on nailing down these problems instead of hiding behind the experimental flag. I know we won't be perfect overnight, but it's time to focus on production workloads.

-chris
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On 2014-01-04 15:51, Chris Mason wrote:

> I added mount -o degraded just because I wanted the admin to be notified of failures. Right now it's still the most reliable way to notify them, but I definitely agree we can do better.

I think that we should align ourselves with what the other raid subsystems (md and dm) do in these cases. Reading the man page of mdadm, it seems to me that an array is assembled even without some disks; the only requirement is that the disks present have to be valid (i.e. not out of sync).

> Leaving it on all the time? I don't think this is a great long term solution, unless you are actively monitoring the system to make sure there are no failures.

Anyway, mdadm has the monitor mode, which reports this kind of error. From the mdadm man page:

    Follow or Monitor
        Monitor one or more md devices and act on any state changes. This is
        only meaningful for RAID1, 4, 5, 6, 10 or multipath arrays, as only
        these have interesting state. RAID0 or Linear never have missing,
        spare, or failed drives, so there is nothing to monitor.

Best regards
GB
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it)
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 7:59 PM, Jim Salter j...@jrs-s.net wrote:

> On 01/03/2014 07:27 PM, Chris Murphy wrote:
>> This is the wrong way to solve this. /etc/grub.d/10_linux is subject to being replaced on updates. It is not recommended it be edited, same as for grub.cfg. The correct way is as I already stated, which is to edit the GRUB_CMDLINE_LINUX= line in /etc/default/grub.
> Fair enough - though since I already have to monkey-patch 00_header, I kind of already have an eye on grub.d, so it doesn't seem as onerous as it otherwise would. There is definitely a lot of work that needs to be done on the boot sequence for btrfs IMO.

Most of this work has been done for a while in current versions of GRUB 2.00. There are a few fixes due in 2.02. There are some logical challenges in making snapshots bootable in a coherent way. But a major advantage of Btrfs is that functionality is contained in one place, so once the kernel is booted things usually just work, so I'm not sure what else you're referring to?

>> I think it's bad advice to recommend always persistently mounting a good volume with this option. There's a reason why degraded is not the default mount option, and why there isn't yet automatic degraded mount functionality. That fstab contains other errors.
> What other errors does it contain? Aside from adding the degraded option, that's a bone-stock fstab entry from an Ubuntu Server installation.

fs_passno is 1, which doesn't apply to Btrfs.

You're simply dissatisfied with the state of Btrfs development and are suggesting bad hacks as a workaround. That's my argument. Again, if your use case requires automatic degraded mounts, use a technology that's mature and well tested for that use case. Don't expect a lot of sympathy if these bad hacks cause you problems later.

> You're suggesting the wrong alternatives here (mdraid, LVM, etc) - they don't provide the features that I need or am accustomed to (true snapshots, copy on write, self-correcting redundant arrays, and on down the line).
Well, actually, LVM thinp does have fast snapshots without requiring preallocation, and uses COW. I'm not sure what you mean by self-correcting, but if the drive reports a read error, md, lvm, and Btrfs raid1+ all will get missing data from mirror/parity reconstruction, and write corrected data back to the bad sector. All offer scrubbing (except Btrfs raid5/6). If you mean an independent means of verifying data via checksumming, true, you're looking at Btrfs, ZFS, or PI. If you're going to shoo me off, the correct way to do it is to wave me in the direction of ZFS. There's no shooing, I'm just making observations. Chris Murphy -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
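The LVM thin-provisioning point above can be outlined like this. All names (/dev/vdf, vg0, pool0, data) are illustrative, and these commands need root plus a real block device, so treat this as a command outline rather than a paste-ready script:

```shell
# Outline of LVM thin-provisioned (COW) snapshots, per the reply above.
# Device and VG/LV names are hypothetical examples.
pvcreate /dev/vdf
vgcreate vg0 /dev/vdf
lvcreate --type thin-pool -L 8G -n pool0 vg0   # thin pool backing store
lvcreate --thin -V 20G -n data vg0/pool0       # thin LV, no preallocation
mkfs.ext4 /dev/vg0/data
lvcreate --snapshot -n data_snap vg0/data      # instant thin snapshot
```

A snapshot of a thin LV is itself thin, so it takes no space up front and carries much less of a performance penalty than classic LVM snapshots.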
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Chris Mason posted on Sat, 04 Jan 2014 14:51:23 + as excerpted: It'll pick the latest generation number and use that one as the one true source. For the others you'll get crc errors which make it fall back to the latest one. If the two have exactly the same generation number, we'll have a hard time picking the best one. Ilya has a series of changes from this year's GSOC that we need to clean up and integrate. It detects offline devices and brings them up to date automatically. He targeted the pull-one-drive use case explicitly. Thanks for the explanation and bits to look forward to. I'll be looking forward to seeing that GSOC stuff then, as having dropouts and re-adds auto-handled would be a sweet feature to add to the raid featureset, improving things from a sysadmin's prepared-to-deal-with-recovery perspective quite a bit. =:^) -- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On 01/04/2014 02:18 PM, Chris Murphy wrote: I'm not sure what else you're referring to? (working on the boot environment of btrfs) Just the string of caveats regarding mounting at boot time - needing to monkey-patch 00_header to avoid the bogus sparse file error (which, worse, tells you to press a key when pressing a key does nothing) followed by this, in my opinion completely unexpected, behavior when missing a disk in a fault-tolerant array, which also requires monkey-patching in fstab and now elsewhere in GRUB to avoid. Please keep in mind - I think we got off on the wrong foot here, and I'm sorry for my part in that; it was unintentional. I *love* btrfs, and think the devs are doing incredible work. I'm excited about it. I'm aware it's not intended for production yet. However, it's just on the cusp - with distributions not only including it in their installers but a couple teetering on the fence about declaring it their next default FS (Oracle Unbreakable, OpenSUSE, hell, even Red Hat was flirting with the idea) - so it seems to me some extra testing with an eye towards production isn't a bad thing. That's why I'm here. Not to crap on anybody, but to get involved, hopefully helpfully. fs_passno is 1 which doesn't apply to Btrfs. Again, that's the distribution's default, so the argument should be with them, not me... with that said, I'd respectfully argue that fs_passno 1 is correct for any root filesystem; if the filesystem itself declines to run an fsck, that's up to the filesystem, but it's correct to specify fs_passno 1 if the filesystem is to be mounted as root in the first place. I'm open to hearing why that's a bad idea, if you have a specific reason? Well actually LVM thinp does have fast snapshots without requiring preallocation, and uses COW. 
LVM's snapshots aren't very useful for me - there's a performance penalty while you have them in place, so they're best used as a transient use-then-immediately-delete feature, for instance for rsync'ing off a database binary. Until recently, there also wasn't a good way to roll back an LV to a snapshot, and even now, that can be pretty problematic. Finally, there's no way to get a partial copy of an LV snapshot out of the snapshot and back into production, so if eg you have virtual machines of significant size, you could be looking at *hours* of file copy operations to restore an individual VM out of a snapshot (if you even have the drive space available for it), as compared to btrfs' cp --reflink=always operation, which allows you to do the same thing instantaneously. FWIW, I think the ability to do cp --reflink=always is one of the big killer features that makes btrfs more attractive than zfs (which, again FWIW, I have 5+ years of experience with, and is my current primary storage system). I'm not sure what you mean by self-correcting, but if the drive reports a read error md, lvm, and Btrfs raid1+ all will get missing data from mirror/parity reconstruction, and write corrected data back to the bad sector. You're assuming that the drive will actually *report* a read error, which is frequently not the case. I have a production ZFS array right now that I need to replace an Intel SSD on - the SSD has thrown 10K checksum errors in six months. Zero read or write errors. Neither hardware RAID nor mdraid nor LVM would have helped me there. Since running filesystems that do block-level checksumming, I have become aware that bitrot happens without hardware errors getting thrown FAR more frequently than I would have thought before having the tools to spot it. ZFS, and now btrfs, are the only tools at hand that can actually prevent it. 
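The cp --reflink behavior described above can be demonstrated with a small sketch. Note that --reflink=always fails outright on filesystems without reflink support, so the runnable example uses --reflink=auto, which clones extents on Btrfs and falls back to a plain copy elsewhere; paths and sizes are illustrative:

```shell
# Demonstrate reflink copies: on Btrfs, cp --reflink clones extents
# instantly instead of rewriting the data. --reflink=auto falls back to
# a normal copy on filesystems without reflink support, so this sketch
# runs anywhere; use --reflink=always on Btrfs to guarantee a clone.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/vm.img" bs=1M count=8 status=none  # stand-in VM image
cp --reflink=auto "$demo/vm.img" "$demo/vm-restored.img"     # instant on Btrfs
cmp -s "$demo/vm.img" "$demo/vm-restored.img" && echo "copies match"
```

On a Btrfs snapshot, the same command against a file inside the snapshot pulls an individual VM image back into production in constant time, which is the "hours versus instant" difference described above.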
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On 01/04/2014 01:10 AM, Duncan wrote: The example given in the OP was of a 4-device raid10, already the minimum number to work undegraded, with one device dropped out, to below the minimum required number to mount undegraded, so of /course/ it wouldn't mount without that option. The issue was not realizing that a degraded fault-tolerant array would refuse to mount without being passed an -o degraded option. Yes, it's on the wiki - but it's on the wiki under *replacing* a device, not in the FAQ, not in the head of the multiple devices section, etc; and no coherent message is thrown either on the console or in the kernel log when you do attempt to mount a degraded array without the correct argument. IMO that's a bug. =)
btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
I'm using Ubuntu 12.04.3 with an up-to-date 3.11 kernel, and the btrfs-progs from Debian Sid (since the ones from Ubuntu are ancient). I discovered to my horror during testing today that neither raid1 nor raid10 arrays are fault tolerant of losing an actual disk.

mkfs.btrfs -d raid10 -m raid10 /dev/vdb /dev/vdc /dev/vdd /dev/vde
mkdir /test
mount /dev/vdb /test
echo test > /test/test
btrfs filesystem sync /test
shutdown -hP now

After shutting down the VM, I can remove ANY of the drives from the btrfs raid10 array, and be unable to mount the array. In this case, I removed the drive that was at /dev/vde, then restarted the VM.

btrfs fi show
Label: none uuid: 94af1f5d-6ad2-4582-ab4a-5410c410c455
Total devices 4 FS bytes used 156.00KB
devid3 size 1.00GB used 212.75MB path /dev/vdd
devid3 size 1.00GB used 212.75MB path /dev/vdc
devid3 size 1.00GB used 232.75MB path /dev/vdb
*** Some devices missing

OK, we have three of four raid10 devices present. Should be fine. Let's mount it:

mount -t btrfs /dev/vdb /test
mount: wrong fs type, bad option, bad superblock on /dev/vdb, missing codepage or helper program, or other error
In some cases useful info is found in syslog - try dmesg | tail or so

What's the kernel log got to say about it?

dmesg | tail -n 4
[ 536.694363] device fsid 94af1f5d-6ad2-4582-ab4a-5410c410c455 devid 1 transid 7 /dev/vdb
[ 536.700515] btrfs: disk space caching is enabled
[ 536.703491] btrfs: failed to read the system array on vdd
[ 536.708337] btrfs: open_ctree failed

Same behavior persists whether I create a raid1 or raid10 array, and whether I create it as that raid level using mkfs.btrfs or convert it afterwards using btrfs balance start -dconvert=raidn -mconvert=raidn. Also persists even if I both scrub AND sync the array before shutting the machine down and removing one of the disks. What's up with this? This is a MASSIVE bug, and I haven't seen anybody else talking about it... has nobody tried actually failing out a disk yet, or what? 
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Am 03.01.2014 23:28, schrieb Jim Salter: [snip] What's up with this? This is a MASSIVE bug, and I haven't seen anybody else talking about it... 
has nobody tried actually failing out a disk yet, or what? Hey Jim, keep calm and read the wiki ;) https://btrfs.wiki.kernel.org/ You need to mount with -o degraded to tell btrfs a disk is missing. Joshua
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling. You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. I did find the -o degraded argument in the wiki now that you mentioned it - but it's not prominent enough if you ask me. =) On 01/03/2014 05:43 PM, Joshua Schüler wrote:
[snip] Hey Jim, keep calm and read the wiki ;) https://btrfs.wiki.kernel.org/ You need to mount with -o degraded to tell btrfs a disk is missing. Joshua
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Am 03.01.2014 23:56, schrieb Jim Salter: I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling. You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine. don't forget to btrfs device delete missing path See https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. Add -o degraded to the boot-options in GRUB. If your filesystem is more heavily corrupted then you either need the btrfs tools in your initrd or a rescue cd I did find the -o degraded argument in the wiki now that you mentioned it - but it's not prominent enough if you ask me. =) [snip] Joshua -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
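The degraded-mount and replace-the-missing-device steps Joshua points to on the wiki can be outlined as below. Device names and the mount point are illustrative, and these commands need root plus real block devices, so treat this as a command outline rather than a paste-ready script:

```shell
# Mount a raid1/raid10 volume that is missing a member, then repair it
# (per the wiki page cited above; /dev/vdb, /dev/vdf, and /test are
# hypothetical examples).
mount -t btrfs -o degraded /dev/vdb /test

# Add a replacement device, then drop the failed (absent) one.
btrfs device add /dev/vdf /test
btrfs device delete missing /test

# Rebalance so data is fully replicated across the current devices.
btrfs balance start /test
```

The `missing` keyword tells btrfs to remove whichever device the filesystem knows about but cannot find, which is why it works even though the dead disk is no longer present to name.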
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Fri, Jan 03, 2014 at 05:56:42PM -0500, Jim Salter wrote: I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling. You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... Use grub's command-line editing to add rootflags=degraded to it. Hugo. which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. I did find the -o degraded argument in the wiki now that you mentioned it - but it's not prominent enough if you ask me. =) On 01/03/2014 05:43 PM, Joshua Schüler wrote: [snip]
-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Eighth Army Push Bottles Up Germans -- WWII newspaper --- headline (possibly apocryphal) signature.asc Description: Digital signature
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly... HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. Add -o degraded to the boot-options in GRUB.
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Fri, Jan 03, 2014 at 06:13:25PM -0500, Jim Salter wrote: Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly... You don't need to edit grub.cfg -- when you boot, grub has an edit option, so you can do it at boot time without having to use a rescue disk. Regardless, the thing you need to edit is the line starting linux, and will look something like this: linux /vmlinuz-3.11.0-rc2-dirty root=UUID=1b6ec419-211a-445e-b762-ae7da27b6e8a ro single rootflags=subvol=fs-root If there's a rootflags= option already (as above), add ,degraded to the end. If there isn't, add rootflags=degraded. Hugo. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. Add -o degraded to the boot-options in GRUB. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Eighth Army Push Bottles Up Germans -- WWII newspaper --- headline (possibly apocryphal) signature.asc Description: Digital signature
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 3:56 PM, Jim Salter j...@jrs-s.net wrote: I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling. You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? I'd say that it's not ready for unattended/auto degraded mounting, and that this is intended to be a red-flag show stopper to get the attention of the user. Before automatic degraded mounts, which md and LVM raid do now, there probably needs to be notification support in desktops; e.g. GNOME will report degraded state for at least md arrays (maybe LVM too, not sure). There's also a list of other multiple-device stuff on the to-do list, some of which maybe should be done before auto degraded mount, for example the hot spare work. https://btrfs.wiki.kernel.org/index.php/Project_ideas#Multiple_Devices Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Yep - had just figured that out and successfully booted with it, and was in the process of typing up instructions for the list (and posterity). One thing that concerns me is that edits made directly to grub.cfg will get wiped out with every kernel upgrade when update-grub is run - any idea where I'd put this in /etc/grub.d to have a persistent change? I have to tell you, I'm not real thrilled with this behavior either way - it means I can't have the option to automatically mount degraded filesystems without the filesystems in question ALWAYS showing as being mounted degraded, whether the disks are all present and working fine or not. That's kind of blecchy. =\ On 01/03/2014 06:18 PM, Hugo Mills wrote: On Fri, Jan 03, 2014 at 06:13:25PM -0500, Jim Salter wrote: Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly... You don't need to edit grub.cfg -- when you boot, grub has an edit option, so you can do it at boot time without having to use a rescue disk. Regardless, the thing you need to edit is the line starting linux, and will look something like this: linux /vmlinuz-3.11.0-rc2-dirty root=UUID=1b6ec419-211a-445e-b762-ae7da27b6e8a ro single rootflags=subvol=fs-root If there's a rootflags= option already (as above), add ,degraded to the end. If there isn't, add rootflags=degraded. Hugo. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. 
Add -o degraded to the boot-options in GRUB.
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 4:13 PM, Jim Salter j...@jrs-s.net wrote: Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly… Don't edit the grub.cfg directly. At the grub menu, highlight the entry you want to boot, hit 'e', and then edit the existing linux/linuxefi line. If you already have rootfs on a subvolume, you'll have an existing parameter on that line, rootflags=subvol=rootname, and you can change this to rootflags=subvol=rootname,degraded. I would not make this option persistent by putting it permanently in the grub.cfg; although I don't know the consequence of always mounting with degraded even when not necessary, it could have some negative effects (?) Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 4:25 PM, Jim Salter j...@jrs-s.net wrote: One thing that concerns me is that edits made directly to grub.cfg will get wiped out with every kernel upgrade when update-grub is run - any idea where I'd put this in /etc/grub.d to have a persistent change? /etc/default/grub. I don't recommend making it persistent. At this stage of development, a disk failure should cause mount failure so you're alerted to the problem. I have to tell you, I'm not real thrilled with this behavior either way - it means I can't have the option to automatically mount degraded filesystems without the filesystems in question ALWAYS showing as being mounted degraded, whether the disks are all present and working fine or not. That's kind of blecchy. =\ If you need something that comes up degraded automatically by design as a supported use case, use md (or possibly LVM, which uses different user-space tools and monitoring but uses the md kernel driver code and supports raid 0, 1, 5, and 6 - quite nifty). I haven't tried this yet, but I think that's also supported with the thin provisioning work, which even if you don't use thin provisioning gets you the significantly more efficient snapshot behavior. Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
For anybody else interested, if you want your system to automatically boot a degraded btrfs array, here are my crib notes, verified working:

* boot degraded

1. edit /etc/grub.d/10_linux, add degraded to the rootflags
GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}

2. add degraded to options in /etc/fstab also
UUID=bf9ea9b9-54a7-4efc-8003-6ac0b344c6b5 / btrfs defaults,degraded,subvol=@ 0 1

3. Update and reinstall GRUB to all boot disks
update-grub
grub-install /dev/vda
grub-install /dev/vdb

Now you have a system which will automatically start a degraded array.

** Side note: sorry, but I absolutely don't buy the argument that the system won't boot without you driving down to its physical location, standing in front of it, and hammering panickily at a BusyBox prompt is the best way to find out your array is degraded. I'll set up a Nagios module to check for degraded arrays using btrfs fi list instead, thanks... On 01/03/2014 06:06 PM, Freddie Cash wrote: Why is manual intervention even needed? Why isn't the filesystem smart enough to mount in a degraded mode automatically? -- Freddie Cash fjwc...@gmail.com
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Minor correction: you need to close the double-quotes at the end of the GRUB_CMDLINE_LINUX line: GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}" On 01/03/2014 06:42 PM, Jim Salter wrote: [snip]
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 4:42 PM, Jim Salter j...@jrs-s.net wrote:
> For anybody else interested, if you want your system to automatically boot a degraded btrfs array, here are my crib notes, verified working:
>
> * boot degraded
>
> 1. edit /etc/grub.d/10_linux, add degraded to the rootflags
>    GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}

This is the wrong way to solve this. /etc/grub.d/10_linux is subject to being replaced on updates. It is not recommended that it be edited, same as for grub.cfg. The correct way, as I already stated, is to edit the GRUB_CMDLINE_LINUX= line in /etc/default/grub.

> 2. add degraded to options in /etc/fstab also
>    UUID=bf9ea9b9-54a7-4efc-8003-6ac0b344c6b5 / btrfs defaults,degraded,subvol=@ 0 1

I think it's bad advice to recommend always persistently mounting a good volume with this option. There's a reason why degraded is not the default mount option, and why there isn't yet automatic degraded mount functionality. That fstab contains other errors.

The correct way to automate this, before the Btrfs developers get around to it, is to create a systemd unit that checks for the mount failure, determines that there's a missing device, and generates a modified sysroot.mount job that includes degraded.

> Side note: sorry, but I absolutely don't buy the argument that the system won't boot without you driving down to its physical location, standing in front of it, and hammering panickily at a BusyBox prompt is the best way to find out your array is degraded.

You're simply dissatisfied with the state of Btrfs development and are suggesting bad hacks as a workaround. That's my argument. Again, if your use case requires automatic degraded mounts, use a technology that's mature and well tested for that use case. Don't expect a lot of sympathy if these bad hacks cause you problems later.

> I'll set up a Nagios module to check for degraded arrays using btrfs fi list instead, thanks…

That's a good idea, except that it's show rather than list.
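Chris's suggested systemd approach might look roughly like the drop-in plus fallback unit below. This is only a sketch: the unit names and the ROOT-UUID placeholder are hypothetical, a real implementation would have to live in the initramfs (where the root filesystem is mounted on /sysroot), and it omits the "verify a device is actually missing" check he describes.

```
# /etc/systemd/system/sysroot.mount.d/degraded-fallback.conf (hypothetical)
# If the normal root mount job fails, trigger the fallback service.
[Unit]
OnFailure=degraded-root.service

# /etc/systemd/system/degraded-root.service (hypothetical)
[Unit]
Description=Retry root mount with -o degraded (sketch only)
DefaultDependencies=no

[Service]
Type=oneshot
ExecStart=/bin/mount -o degraded /dev/disk/by-uuid/ROOT-UUID /sysroot
```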
Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On 01/03/2014 07:27 PM, Chris Murphy wrote:
> This is the wrong way to solve this. /etc/grub.d/10_linux is subject to being replaced on updates. It is not recommended it be edited, same as for grub.cfg. The correct way is as I already stated, which is to edit the GRUB_CMDLINE_LINUX= line in /etc/default/grub.

Fair enough - though since I already have to monkey-patch 00_header, I kind of already have an eye on grub.d, so it doesn't seem as onerous as it otherwise would. There is definitely a lot of work that needs to be done on the boot sequence for btrfs, IMO.

> I think it's bad advice to recommend always persistently mounting a good volume with this option. There's a reason why degraded is not the default mount option, and why there isn't yet automatic degraded mount functionality. That fstab contains other errors.

What other errors does it contain? Aside from adding the degraded option, that's a bone-stock fstab entry from an Ubuntu Server installation.

> The correct way to automate this before Btrfs developers get around to it is to create a systemd unit that checks for the mount failure, determines that there's a missing device, and generates a modified sysroot.mount job that includes degraded.

Systemd is not the boot system in use for my distribution, and using it would require me to build a custom kernel, among other things. We're going to have to agree to disagree that that's an appropriate workaround, I think.

> You're simply dissatisfied with the state of Btrfs development and are suggesting bad hacks as a work around. That's my argument. Again, if your use case requires automatic degraded mounts, use a technology that's mature and well tested for that use case. Don't expect a lot of sympathy if these bad hacks cause you problems later.

You're suggesting the wrong alternatives here (mdraid, LVM, etc.) - they don't provide the features that I need and am accustomed to (true snapshots, copy-on-write, self-correcting redundant arrays, and on down the line).

If you're going to shoo me off, the correct way to do it is to wave me in the direction of ZFS, in which case I can tell you I've been a happy user of ZFS for 5+ years now on hundreds of systems. ZFS and btrfs are literally the *only* options available that do what I want to do, and have been doing for years now. (At least aside from six-figure-and-up proprietary systems, which I have neither the budget nor the inclination for.)

I'm testing btrfs heavily in throwaway virtual environments and in a few small, heavily-monitored test production instances because ZFS on Linux has its own set of problems, both technical and licensing, and I think it's clear btrfs is going to take the lead in the very near future - in many ways, it does already.

>> I'll set up a Nagios module to check for degraded arrays using btrfs fi list instead, thanks…
>
> That's a good idea, except that it's show rather than list.

Yup, that's what I meant, all right. I frequently still get the syntax backwards between btrfs fi show and btrfs subv list.
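A minimal version of the Nagios-style check discussed here, using the corrected btrfs fi show command, might look like the sketch below. The check_btrfs helper is hypothetical: it assumes btrfs-progs' behavior of printing a "*** Some devices missing" line for filesystems with absent devices, and it reads the command output on stdin so it can be exercised without a real array.

```shell
#!/bin/sh
# Hypothetical Nagios-style check for degraded btrfs filesystems.
# Reads `btrfs filesystem show` output on stdin and follows Nagios
# conventions: status 0 = OK, status 2 = CRITICAL.
check_btrfs() {
    if grep -qi 'missing'; then
        echo "CRITICAL: btrfs device(s) missing"
        return 2
    fi
    echo "OK: all btrfs devices present"
    return 0
}

# On a live system one would run:  btrfs filesystem show | check_btrfs
```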
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Fri, Jan 3, 2014 at 9:59 PM, Jim Salter j...@jrs-s.net wrote:
> You're suggesting the wrong alternatives here (mdraid, LVM, etc.) - they don't provide the features that I need and am accustomed to (true snapshots, copy-on-write, self-correcting redundant arrays, and on down the line).
>
> If you're going to shoo me off, the correct way to do it is to wave me in the direction of ZFS, in which case I can tell you I've been a happy user of ZFS for 5+ years now on hundreds of systems. ZFS and btrfs are literally the *only* options available that do what I want to do, and have been doing for years now. (At least aside from six-figure-and-up proprietary systems, which I have neither the budget nor the inclination for.)

Jim, there's nothing stopping you from creating a Btrfs filesystem on top of an mdraid array. I'm currently running three WD Red 3TB drives in a raid5 configuration under a Btrfs filesystem. This configuration works pretty well and fills the feature gap you're describing.

I will say, though, that the whole tone of your email chain leaves a bad taste in my mouth; kind of like a poorly adjusted relative who shows up once a year for Thanksgiving and makes everyone feel uncomfortable. I find myself annoyed by the constant disclaimers I read on this list about the experimental status of Btrfs, but it's apparent that this hasn't sunk in for everyone. Your poor budget doesn't a production filesystem make.

I and many others on this list who have been using Btrfs will tell you with no hesitation that, due to the immaturity of the code, Btrfs should be making NO assumptions in the event of a failure, and everything should come to a screeching halt. I've seen it all: the infamous 120-second process hangs, csum errors, multiple separate catastrophic failures (search me on this list). Things are MOSTLY stable, but you simply have to glance at a few weeks of history on this list to see the experimental status is fully justified.

I use Btrfs because of its intoxicating feature set. As an IT director, though, I'd never subject my company to these rigors. If Btrfs on mdraid isn't an acceptable solution for you, then ZFS is the only responsible alternative.

--
-=[dave]=-
Entropy isn't what it used to be.
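For reference, the Btrfs-on-mdraid layout Dave describes can be set up along these lines. This is a sketch only: the device names are illustrative, and mdadm --create destroys any existing data on the listed disks.

```
# Sketch: single-device Btrfs on top of a 3-disk md raid5
# (illustrative device names; these commands are destructive).
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda /dev/sdb /dev/sdc
mkfs.btrfs /dev/md0      # md provides the redundancy; btrfs sees one device
mount /dev/md0 /mnt
```

One trade-off worth noting: with a single-device btrfs on top of md, checksums still detect corruption, but btrfs has no second copy of data with which to self-heal (metadata is duplicated by default on rotational storage, data is not).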
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Chris Murphy posted on Fri, 03 Jan 2014 16:22:44 -0700 as excerpted:
> I would not make this option persistent by putting it permanently in the grub.cfg; although I don't know the consequence of always mounting with degraded even if not necessary, it could have some negative effects (?)

Degraded only actually does anything if it's actually needed. On a normal array it'll be a no-op, so it should be entirely safe for /normal/ operation - but that doesn't mean I'd /recommend/ it for normal operation, since it bypasses checks that are there for a reason, silently hiding information that an admin needs to know before he boots the machine anyway, in order to recover.

However, I have some other comments to add:

1) Like you, I'm uncomfortable with the whole idea of adding degraded permanently at this point. Mention was made of having to drive down to the data center and actually stand in front of the box if something goes wrong, otherwise. At the moment, for btrfs' current state of development, fine. Btrfs remains under development, and there are clear warnings about using it without backups one hasn't tested recovery from or is not otherwise prepared to actually use. It's stated in multiple locations on the wiki; it's stated in the kernel btrfs config option; and it's stated in the mkfs.btrfs output when you create the filesystem. If after all that people are using it in a remote situation where they're not prepared to drive down to the data center and stab at the keys if they have to, they're using possibly the right filesystem, but at too early a point in its development for their needs at this moment.

2) As the wiki explains, certain configurations require at least a minimum number of devices in order to mount undegraded. The example given in the OP was a 4-device raid10 - already the minimum number to work undegraded - with one device dropped out, taking it below the minimum required to mount undegraded, so of /course/ it wouldn't mount without that option.

If five or six devices had been used, a device could have been dropped and the remaining number of devices would still be at or above the minimum needed to run an undegraded raid10, and the result would likely have been different, since there would still be enough devices to mount writable with proper redundancy - even if existing information doesn't have that redundancy until a rebalance is done to take care of the missing device.

Similarly with raid1 and its minimum of two devices. Configure with three, then drop one, and it should still work, as that's above the minimum of two for a raid1 configuration. Configure with two and drop one, and you'll have to mount degraded (and it'll drop to read-only if it happens in operation), since there's no second device to write the second copy to, as required by raid1.

3) Frankly, this whole thread smells of going off half-cocked - posting before doing the proper research. I know when I took a look at btrfs here, I read up on the wiki: the multiple-devices material, the FAQ, the problem FAQ, the gotchas, the use cases, the sysadmin guide, the getting-started page, and the mount options... loading the pages multiple times as I followed links back and forth between them. Because I care about my data and want to understand what I'm doing with it before I do it! And even now I often reread specific parts as I'm trying to help others with questions on this list.

Then I still had some questions about how it worked that I couldn't find answers for on the wiki, and, as is traditional with mailing lists (and newsgroups before them), I read several weeks' worth of posts (on an archive, for lists) before actually posting my questions, to see if they were FAQs already answered on the list. Then and only then did I post the questions to the list, and when I did, it was "Questions I haven't found answers for on the wiki or list," not "THE WORLD IS GOING TO END, OH NOS!!111!!11!111!!!"
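The device-count arithmetic in point 2 can be captured in a tiny helper. This is a sketch, not a btrfs tool: the profile minimums (two devices for raid1, four for raid10) come from the discussion above, and real-world mountability also depends on which chunks lived on the missing device.

```shell
#!/bin/sh
# needs_degraded <profile> <surviving-devices>
# Succeeds (status 0) when the surviving device count has fallen below the
# profile minimum, i.e. the volume can only be mounted with -o degraded.
# Hypothetical helper based on the minimums discussed in this thread.
needs_degraded() {
    case "$1" in
        raid1)  min=2 ;;
        raid10) min=4 ;;
        *)      min=1 ;;
    esac
    [ "$2" -lt "$min" ]
}

needs_degraded raid10 3 && echo "4-dev raid10 minus one: mount -o degraded"
needs_degraded raid1 2 || echo "3-dev raid1 minus one: mounts normally"
```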
Now, later on, I did post some behavior that had me rather upset, but that was AFTER I had already engaged the list in general, and I was pretty sure by that point that what I was seeing was NOT covered on the wiki and was reasonably new information for at least SOME list users.

4) As a matter of fact, AFAIK that behavior remains relevant today, and may well be of interest to the OP. FWIW my background was Linux kernel md/raid, so I approached btrfs raid expecting similar behavior. What I found in my testing (and NOT covered on the wiki or in the various documentation to this day, AFAIK, other than in a few threads on the list), however...

Test:
a) Create a two-device btrfs raid1.
b) Mount it and write some data to it.
c) Unmount it, unplug one device, and mount the remaining device degraded.
d) Write some data to a test file on it, noting the path/filename and data.
e) Unmount again, switch plugged devices so the formerly unplugged one is now the plugged