Re: Volume appears full but TB's of space available
On 2017-04-08 01:12, Duncan wrote:
> Austin S. Hemmelgarn posted on Fri, 07 Apr 2017 07:41:22 -0400 as excerpted:
>> 2. Results from 'btrfs scrub'. This is somewhat tricky because scrub is either asynchronous or blocks for a _long_ time. The simplest option I've found is to fire off an asynchronous scrub to run during down-time, and then schedule recurring checks with 'btrfs scrub status'. On the plus side, 'btrfs scrub status' already returns non-zero if the scrub found errors.
>
> This is (one place) where my "keep it small enough to be in-practice-manageable" comes in. I always run my scrubs with -B (don't background, always, because I've scripted it), and they normally come back within a minute. =:^)
>
> But that's because I'm running multiple btrfs pair-device raid1 on a pair of partitioned SSDs, with each independent btrfs built on a partition from each ssd, with all partitions under 50 GiB. So scrubs take less than a minute to run (on the under 1 GiB /var/log, it returns effectively immediately, as soon as I hit enter on the command), but that's not entirely surprising at the sizes of the ssd-based btrfs' I am running. When scrubs (and balances, and checks) come back in a minute or so, it makes maintenance /so/ much less of a hassle. =:^)
>
> And the generally single-purpose and relatively small size of each filesystem means I can, for instance, keep / (with all the system libs, bins, manpages, and the installed-package database, among other things) mounted read-only by default, and keep the updates partition (gentoo, so that's the gentoo and overlay trees, the sources and binpkg cache, ccache cache, etc) and (large non-ssd/non-btrfs) media partitions unmounted by default. Which in turn means when something /does/ go wrong, as long as it wasn't a physical device, there's much less data at risk, because most of it was probably either unmounted, or mounted read-only. Which in turn means I don't have to worry about scrub/check or other repair on those filesystems at all, only the ones that were actually mounted writable. And as mentioned, those scrub and check fast enough that I can literally wait at the terminal for command completion. =:^)
>
> Of course my setup's what most would call partitioned to the extreme, but it does have its advantages, and it works well for me, which after all is the important thing for /my/ setup.

Eh, maybe most people who never dealt with disks with capacities on the order of triple-digit _megabytes_. TBH, most of my systems look pretty similar, although I split at places that most people think are odd until I explain the reasoning (like /var/cache or the RRD storage for collectd). With the exception of the backing storage for the storage micro-cluster I have on my home network and the VM storage, all my filesystems are 32GB or less (and usually some multiple of 8G), although I'm not lucky enough to have a good enough system to run maintenance that fast (although part of that might be that I don't heavily over-provision space in most of the filesystems, but instead leave a reasonable amount of slack-space at the LVM level, so if a filesystem gets wedged, I just temporarily resize the LV it's on so I can fix it).

> But the more generic point remains: if you set up multi-TB filesystems that take days or weeks for a maintenance command to complete, running those maintenance commands isn't going to be something done as often as one arguably should, and rebuilding from a filesystem or device failure is going to take far longer than one would like, as well. We've seen the reports here. If that's what you're doing, strongly consider breaking your filesystems down to something rather more manageable, say a couple TiB each. Broken along natural usage lines, it can save a lot on the caffeine and headache pills when something does go wrong.
>
> Unless of course, like one poster here, you're handling double-digit-TB super-collider data files. Those tend to be a bit difficult to store on sub-double-digit-TB filesystems. =:^) But that's the other extreme from what I've done here, and he actually has a good /reason/ for /his/ double-digit- or even triple-digit-TB filesystems. There's not much to be done about his use-case, and indeed, AFAIK he decided btrfs simply isn't stable and mature enough for that use-case yet, tho I believe he's using it for some other, more minor and less gargantuan use-cases.

Even aside from that, there are cases where you essentially need large filesystems. One good example is NAS usage. In that case, it's a lot simpler to provision one filesystem and then share out subsets of it than it is to provision one for each share. Clustering is another good example (the micro-cluster I mentioned above being a case in point): by just using one filesystem for each back-end system, I end up saving a very large amount of resources without compromising performance (although the 200GB back-end filesystems are nowhere near the multi-TB sizes being discussed here).
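For anyone who hasn't used the slack-space trick Austin mentions, here is a minimal sketch of temporarily growing a wedged filesystem's LV and then shrinking it back afterwards; the VG/LV name, mountpoint, and sizes are placeholders, not anything from Austin's actual setup:

  # give the LV 10GiB of the VG's free space, then let btrfs grow into it
  lvextend -L +10G vg0/scratch
  btrfs filesystem resize max /mnt/scratch
  # ... run whatever maintenance or repair needed the extra room ...
  # then shrink the filesystem first, and only afterwards the LV
  btrfs filesystem resize -10G /mnt/scratch
  lvreduce -L -10G vg0/scratch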
Re: Volume appears full but TB's of space available
Austin S. Hemmelgarn posted on Fri, 07 Apr 2017 07:41:22 -0400 as excerpted:

> 2. Results from 'btrfs scrub'. This is somewhat tricky because scrub is
> either asynchronous or blocks for a _long_ time. The simplest option
> I've found is to fire off an asynchronous scrub to run during down-time,
> and then schedule recurring checks with 'btrfs scrub status'. On the
> plus side, 'btrfs scrub status' already returns non-zero if the scrub
> found errors.

This is (one place) where my "keep it small enough to be in-practice-manageable" comes in. I always run my scrubs with -B (don't background, always, because I've scripted it), and they normally come back within a minute. =:^)

But that's because I'm running multiple btrfs pair-device raid1 on a pair of partitioned SSDs, with each independent btrfs built on a partition from each ssd, with all partitions under 50 GiB. So scrubs take less than a minute to run (on the under 1 GiB /var/log, it returns effectively immediately, as soon as I hit enter on the command), but that's not entirely surprising at the sizes of the ssd-based btrfs' I am running. When scrubs (and balances, and checks) come back in a minute or so, it makes maintenance /so/ much less of a hassle. =:^)

And the generally single-purpose and relatively small size of each filesystem means I can, for instance, keep / (with all the system libs, bins, manpages, and the installed-package database, among other things) mounted read-only by default, and keep the updates partition (gentoo, so that's the gentoo and overlay trees, the sources and binpkg cache, ccache cache, etc) and (large non-ssd/non-btrfs) media partitions unmounted by default. Which in turn means when something /does/ go wrong, as long as it wasn't a physical device, there's much less data at risk, because most of it was probably either unmounted, or mounted read-only. Which in turn means I don't have to worry about scrub/check or other repair on those filesystems at all, only the ones that were actually mounted writable. And as mentioned, those scrub and check fast enough that I can literally wait at the terminal for command completion. =:^)

Of course my setup's what most would call partitioned to the extreme, but it does have its advantages, and it works well for me, which after all is the important thing for /my/ setup.

But the more generic point remains: if you set up multi-TB filesystems that take days or weeks for a maintenance command to complete, running those maintenance commands isn't going to be something done as often as one arguably should, and rebuilding from a filesystem or device failure is going to take far longer than one would like, as well. We've seen the reports here. If that's what you're doing, strongly consider breaking your filesystems down to something rather more manageable, say a couple TiB each. Broken along natural usage lines, it can save a lot on the caffeine and headache pills when something does go wrong.

Unless of course, like one poster here, you're handling double-digit-TB super-collider data files. Those tend to be a bit difficult to store on sub-double-digit-TB filesystems. =:^) But that's the other extreme from what I've done here, and he actually has a good /reason/ for /his/ double-digit- or even triple-digit-TB filesystems. There's not much to be done about his use-case, and indeed, AFAIK he decided btrfs simply isn't stable and mature enough for that use-case yet, tho I believe he's using it for some other, more minor and less gargantuan use-cases.

-- Duncan - List replies preferred.
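For reference, the foreground scrub Duncan describes is simply a matter of not letting the scrub background itself; a minimal sketch, with the mountpoint only as an example:

  # -B keeps the scrub in the foreground, -d prints per-device statistics;
  # on filesystems this small it returns almost immediately
  btrfs scrub start -Bd /var/log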
Re: Volume appears full but TB's of space available
On 2017-04-07 13:05, John Petrini wrote:
> The use case actually is not Ceph, I was just drawing a comparison between Ceph's object replication strategy vs BTRFS's chunk mirroring.

That's actually a really good comparison that I hadn't thought of before. From what I can tell from my limited understanding of how Ceph works, the general principles are pretty similar, except that BTRFS doesn't understand or implement failure domains (although having CRUSH implemented in BTRFS for chunk placement would be a killer feature IMO).

> I do find the conversation interesting however as I work with Ceph quite a lot but have always gone with the default XFS filesystem on the OSDs.

From a stability perspective, I would normally go with XFS still for the OSDs. Most of the data integrity features provided by BTRFS are also implemented in Ceph, so you don't gain much other than flexibility currently by using BTRFS instead of XFS. The one advantage BTRFS has in my experience over XFS for something like this is that it seems (with recent versions at least) to be more likely to survive a power failure without any serious data loss than XFS is, but that's not really a common concern in Ceph's primary use case.
Re: Volume appears full but TB's of space available
The use case actually is not Ceph, I was just drawing a comparison between Ceph's object replication strategy vs BTRFS's chunk mirroring.

I do find the conversation interesting however as I work with Ceph quite a lot but have always gone with the default XFS filesystem on the OSDs.
Re: Volume appears full but TB's of space available
On 2017-04-07 12:58, John Petrini wrote:
> When you say "running BTRFS raid1 on top of LVM RAID0 volumes" do you mean creating two LVM RAID-0 volumes and then putting BTRFS RAID1 on the two resulting logical volumes?

Yes, although it doesn't have to be LVM, it could just as easily be MD or even hardware RAID (I just prefer LVM for the flexibility it offers). A quick tip regarding this: it seems to get the best performance if the stripe size (the -I option for lvcreate) is chosen so that it either matches the BTRFS block size, or such that each BTRFS block gets striped across all the disks.
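A minimal sketch of that kind of stack, assuming four disks split into two 2-disk stripes; the VG name, device paths, LV names, and sizes are placeholders, and the 4 KiB stripe size is just the "match the BTRFS block size" option from the tip above:

  pvcreate /dev/sda /dev/sdb /dev/sdc /dev/sdd
  vgcreate vg0 /dev/sda /dev/sdb /dev/sdc /dev/sdd
  # one 2-disk RAID0 LV per "side" of the mirror (-i = stripes, -I = stripe size in KiB)
  lvcreate -i 2 -I 4 -L 1T -n stripe_a vg0 /dev/sda /dev/sdb
  lvcreate -i 2 -I 4 -L 1T -n stripe_b vg0 /dev/sdc /dev/sdd
  # BTRFS raid1 for both data and metadata across the two striped LVs
  mkfs.btrfs -d raid1 -m raid1 /dev/vg0/stripe_a /dev/vg0/stripe_b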
Re: Volume appears full but TB's of space available
When you say "running BTRFS raid1 on top of LVM RAID0 volumes" do you mean creating two LVM RAID-0 volumes and then putting BTRFS RAID1 on the two resulting logical volumes?

John Petrini

On Fri, Apr 7, 2017 at 12:51 PM, Austin S. Hemmelgarn wrote:
> On 2017-04-07 12:04, Chris Murphy wrote:
>> On Fri, Apr 7, 2017 at 5:41 AM, Austin S. Hemmelgarn wrote:
>>> I'm rather fond of running BTRFS raid1 on top of LVM RAID0 volumes, which while it provides no better data safety than BTRFS raid10 mode, gets noticeably better performance.
>>
>> This does in fact have better data safety than Btrfs raid10 because it is possible to lose more than one drive without data loss. You can only lose drives on one side of the mirroring, however. This is a conventional raid0+1, so it's not as scalable as raid10 when it comes to rebuild time.
>
> That's a good point that I don't often remember, and I'm pretty sure that such an array will rebuild slower from a single device loss than BTRFS raid10 would, but most of that should be that BTRFS is smart enough to only rewrite what it has to.
Re: Volume appears full but TB's of space available
On 2017-04-07 12:28, Chris Murphy wrote:
> On Fri, Apr 7, 2017 at 7:50 AM, Austin S. Hemmelgarn wrote:
>> If you care about both performance and data safety, I would suggest using BTRFS raid1 mode on top of LVM or MD RAID0 together with having good backups and good monitoring. Statistically speaking, catastrophic hardware failures are rare, and you'll usually have more than enough warning that a device is failing before it actually does, so provided you keep on top of monitoring and replace disks that are showing signs of impending failure as soon as possible, you will be no worse off in terms of data integrity than running ext4 or XFS on top of a LVM or MD RAID10 volume.
>
> Depending on the workload, and what replication is being used by Ceph above this storage stack, it might make more sense to do something like three lvm/md raid5 arrays, and then Btrfs single data, raid1 metadata, across those three raid5s. That's giving up only three drives to parity rather than 1/2 the drives, and rebuild time is shorter than losing one drive in a raid0 array.

Ah, I had forgotten it was a Ceph back-end system. In that case, I would actually suggest essentially the same setup that Chris did, although I would personally be a bit more conservative and use RAID6 instead of RAID5 for the LVM/MD arrays. As he said though, it really depends on what higher-level replication you're doing. In particular, if you're running erasure coding instead of replication at the Ceph level, I would probably still go with BTRFS raid1 on top of LVM/MD RAID0 just to balance out the performance hit from the erasure coding.
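A minimal sketch of the layout being suggested, purely as an illustration; the device names are placeholders, and --level=6 would be the more conservative RAID6 variant mentioned above:

  mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[b-e]
  mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sd[f-i]
  mdadm --create /dev/md2 --level=5 --raid-devices=4 /dev/sd[j-m]
  # single (unreplicated) data plus raid1 metadata across the three arrays
  mkfs.btrfs -d single -m raid1 /dev/md0 /dev/md1 /dev/md2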
Re: Volume appears full but TB's of space available
On 2017-04-07 12:04, Chris Murphy wrote:
> On Fri, Apr 7, 2017 at 5:41 AM, Austin S. Hemmelgarn wrote:
>> I'm rather fond of running BTRFS raid1 on top of LVM RAID0 volumes, which while it provides no better data safety than BTRFS raid10 mode, gets noticeably better performance.
>
> This does in fact have better data safety than Btrfs raid10 because it is possible to lose more than one drive without data loss. You can only lose drives on one side of the mirroring, however. This is a conventional raid0+1, so it's not as scalable as raid10 when it comes to rebuild time.

That's a good point that I don't often remember, and I'm pretty sure that such an array will rebuild slower from a single device loss than BTRFS raid10 would, but most of that should be that BTRFS is smart enough to only rewrite what it has to.
Re: Volume appears full but TB's of space available
On Fri, Apr 7, 2017 at 7:50 AM, Austin S. Hemmelgarn wrote:
> If you care about both performance and data safety, I would suggest using BTRFS raid1 mode on top of LVM or MD RAID0 together with having good backups and good monitoring. Statistically speaking, catastrophic hardware failures are rare, and you'll usually have more than enough warning that a device is failing before it actually does, so provided you keep on top of monitoring and replace disks that are showing signs of impending failure as soon as possible, you will be no worse off in terms of data integrity than running ext4 or XFS on top of a LVM or MD RAID10 volume.

Depending on the workload, and what replication is being used by Ceph above this storage stack, it might make more sense to do something like three lvm/md raid5 arrays, and then Btrfs single data, raid1 metadata, across those three raid5s. That's giving up only three drives to parity rather than 1/2 the drives, and rebuild time is shorter than losing one drive in a raid0 array.

If this is one ceph host, then it might make sense to split the drives up so there are two storage bricks using ceph replication between them for the equivalent of raid1. One brick can do Btrfs on LVM/md raid5, call it brick A. The other brick can do XFS on LVM/md linear, call it brick B. The advantage there is the different bricks are going to have faster commit to stable media times with a mixed workload. The Btrfs on raid5 brick will do better with sequential reads and writes. The XFS on linear will do better with metadata heavy reads and writes.

There's probably some Ceph tuning where you can point certain workloads to particular volumes, where those volumes are backed by different priorities to the underlying storage. So you'd set up ceph volume "mail" to be backed in order by brick B then A. Not very well known, but XFS will parallelize across drives in a linear/concat arrangement; it's quite useful for e.g. busy mail servers.

-- Chris Murphy
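For the XFS brick, a rough sketch of the linear/concat arrangement Chris mentions; the device names are placeholders and the agcount value is only an assumption, the idea being that XFS parallelizes across its allocation groups, so with several groups spread over the concatenated drives, independent workloads can land on different spindles:

  mdadm --create /dev/md10 --level=linear --raid-devices=4 /dev/sd[b-e]
  # several allocation groups per member drive
  mkfs.xfs -d agcount=16 /dev/md10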
Re: Volume appears full but TB's of space available
On Fri, Apr 7, 2017 at 5:41 AM, Austin S. Hemmelgarn wrote:
> I'm rather fond of running BTRFS raid1 on top of LVM RAID0 volumes, which while it provides no better data safety than BTRFS raid10 mode, gets noticeably better performance.

This does in fact have better data safety than Btrfs raid10 because it is possible to lose more than one drive without data loss. You can only lose drives on one side of the mirroring, however. This is a conventional raid0+1, so it's not as scalable as raid10 when it comes to rebuild time.

-- Chris Murphy
Re: Volume appears full but TB's of space available
On 2017-04-07 09:28, John Petrini wrote:
> Hi Austin,
>
> Thanks for taking the time to provide all of this great information!

Glad I could help.

> You've got me curious about RAID1. If I were to convert the array to RAID1 could it then sustain a multi drive failure? Or in other words do I actually end up with mirrored pairs or can a chunk still be mirrored to any disk in the array? Are there performance implications to using RAID1 vs RAID10?

For raid10, your data is stored as 2 replicas striped at or below the filesystem-block level across all the disks in the array. Because of how the data striping is done currently, you're functionally guaranteed to lose data if you lose more than one disk in raid10 mode. This theoretically could be improved so that partial losses could be recovered, but doing so with the current implementation would be extremely complicated, and as such is not a high priority (although patches would almost certainly be welcome).

For raid1, your data is stored as 2 replicas with each entirely on one disk, but individual chunks (the higher level allocation in BTRFS) are distributed in a round-robin fashion among the disks, so any given filesystem block is on exactly 2 disks. With the current implementation, for any reasonably utilized filesystem, you will lose data if you lose 2 or more disks in raid1 mode. That said, there are plans (still currently vaporware in favor of getting raid5/6 working) to add arbitrary replication levels to BTRFS, so once that hits, you could set things to have as many replicas as you want.

In effect, both can currently only sustain one disk failure, but losing 2 disks in raid10 will probably corrupt files (currently, it will functionally kill the FS, although with a bit of theoretically simple work this could be changed), while losing 2 disks in raid1 mode will usually just make files disappear unless they are larger than the data chunk size (usually between 1-5GB depending on the size of the FS), so if you're just storing small files, you'll have an easier time quantifying data loss with raid1 than raid10. Both modes have the possibility of completely losing the FS if the lost disks happen to take out the System chunk.

As for performance, raid10 mode in BTRFS gets better performance, but you can get even better performance than that by running BTRFS in raid1 mode on top of 2 LVM or MD raid0 volumes. Such a configuration provides the same effective data safety as BTRFS raid10, but can get anywhere from 5-30% better performance depending on the workload.

If you care about both performance and data safety, I would suggest using BTRFS raid1 mode on top of LVM or MD RAID0 together with having good backups and good monitoring. Statistically speaking, catastrophic hardware failures are rare, and you'll usually have more than enough warning that a device is failing before it actually does, so provided you keep on top of monitoring and replace disks that are showing signs of impending failure as soon as possible, you will be no worse off in terms of data integrity than running ext4 or XFS on top of a LVM or MD RAID10 volume.
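For anyone wanting to try the conversion being discussed, a minimal sketch using the mountpoint from earlier in the thread (a balance rewrites every chunk, so run it during a quiet period):

  # convert both data and metadata chunks to the raid1 profile
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/storage-array
  # verify the resulting profiles
  btrfs fi df /mnt/storage-array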
Re: Volume appears full but TB's of space available
Hi Austin,

Thanks for taking the time to provide all of this great information!

You've got me curious about RAID1. If I were to convert the array to RAID1 could it then sustain a multi drive failure? Or in other words do I actually end up with mirrored pairs or can a chunk still be mirrored to any disk in the array? Are there performance implications to using RAID1 vs RAID10?
Re: Volume appears full but TB's of space available
On 2017-04-06 23:25, John Petrini wrote:
> Interesting. That's the first time I'm hearing this. If that's the case I feel like it's a stretch to call it RAID10 at all. It sounds a lot more like basic replication similar to Ceph, only Ceph understands failure domains and therefore can be configured to handle device failure (albeit at a higher level).

Yeah, the stacking is a bit odd, and there are some rather annoying caveats that make most of the names other than raid5/raid6 misleading. In fact, when run on more than 2 disks, raid1 mode in BTRFS is more like what most people think of as RAID10 than BTRFS raid10 mode is, although it stripes at a much higher level.

> I do of course keep backups but I chose RAID10 for the mix of performance and reliability. It doesn't seem worth it losing 50% of my usable space for the performance gain alone. Thank you for letting me know about this. Knowing that I think I may have to reconsider my choice here. I've really been enjoying the flexibility of BTRFS which is why I switched to it in the first place but with experimental RAID5/6 and what you've just told me I'm beginning to doubt that it's the right choice.

There are some other options in how you configure it. Most of the more useful operational modes actually require stacking BTRFS on top of LVM or MD. I'm rather fond of running BTRFS raid1 on top of LVM RAID0 volumes, which while it provides no better data safety than BTRFS raid10 mode, gets noticeably better performance. You can also reverse that to get something more like traditional RAID10, but you lose the self-correcting aspect of BTRFS.

> What's more concerning is that I haven't found a good way to monitor BTRFS. I might be able to accept that the array can only handle a single drive failure if I was confident that I could detect it but so far I haven't found a good solution for this.

This I can actually give some advice on. There are a couple of options, but the easiest is to find a piece of generic monitoring software that can check the return code of external programs, and then write some simple scripts to perform the checks on BTRFS. The things you want to keep an eye on are:

1. Output of 'btrfs dev stats'. If you've got a new enough copy of btrfs-progs, you can pass '--check' and the return code will be non-zero if any of the error counters isn't zero. If you've got to use an older version, you'll instead have to write a script to parse the output (I will comment that this is much easier in a language like Perl or Python than it is in bash). You want to watch for steady increases in error counts or sudden large jumps. Single intermittent errors are worth tracking, but they tend to happen more frequently the larger the array is.

2. Results from 'btrfs scrub'. This is somewhat tricky because scrub is either asynchronous or blocks for a _long_ time. The simplest option I've found is to fire off an asynchronous scrub to run during down-time, and then schedule recurring checks with 'btrfs scrub status'. On the plus side, 'btrfs scrub status' already returns non-zero if the scrub found errors. (See the sketch at the end of this message for both this check and the previous one.)

3. Watch the filesystem flags. Some monitoring software can easily do this for you (Monit for example can watch for changes in the flags). The general idea here is that BTRFS will go read-only if it hits certain serious errors, so you can watch for that transition and send a notification when it happens. This is also worth watching since the filesystem flags should not change during normal operation of any filesystem.

4. Watch SMART status on the drives and run regular self-tests. Most of the time, issues will show up here before they show up in the FS, so by watching this, you may have an opportunity to replace devices before the filesystem ends up completely broken.

5. If you're feeling really ambitious, watch the kernel logs for errors from BTRFS and whatever storage drivers you use. This is the least reliable thing out of this list to automate, so I'd not suggest just doing this by itself.

The first two items are BTRFS specific. The rest, however, are standard things you should be monitoring regardless of what type of storage stack you have. Of these, item 3 will immediately trigger in the event of a catastrophic device failure, while 1, 2, and 5 will provide better coverage of slow failures, and 4 will cover both aspects.

As far as what to use to actually track these, that really depends on your use case. For tracking on an individual system basis, I'd suggest Monit: it's efficient, easy to configure, provides some degree of error resilience, and can actually cover a lot of monitoring tasks beyond stuff like this. If you want some kind of centralized monitoring, I'd probably go with Nagios, but that's more because that's the standard for that type of thing, not because I've used it myself (I much prefer per-system decentralized monitoring, with only the checks that systems are online handled centrally).
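For items 1 and 2 above, a minimal sketch of the kind of check script Austin describes, suitable for calling from Monit, Nagios, cron, or similar; the mountpoint is a placeholder, and it assumes a btrfs-progs new enough to support 'device stats --check':

  #!/bin/sh
  # exit non-zero if either check fails so the monitoring software can alert
  MNT=/mnt/storage-array
  status=0
  # device error counters: --check makes the return code non-zero
  # if any counter is non-zero
  btrfs device stats --check "$MNT" || status=1
  # result of the last (asynchronous) scrub: non-zero if errors were found
  btrfs scrub status "$MNT" > /dev/null || status=1
  exit $status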
Re: Volume appears full but TB's of space available
Interesting. That's the first time I'm hearing this. If that's the case I feel like it's a stretch to call it RAID10 at all. It sounds a lot more like basic replication similar to Ceph, only Ceph understands failure domains and therefore can be configured to handle device failure (albeit at a higher level).

I do of course keep backups but I chose RAID10 for the mix of performance and reliability. It doesn't seem worth it losing 50% of my usable space for the performance gain alone. Thank you for letting me know about this. Knowing that I think I may have to reconsider my choice here. I've really been enjoying the flexibility of BTRFS which is why I switched to it in the first place but with experimental RAID5/6 and what you've just told me I'm beginning to doubt that it's the right choice.

What's more concerning is that I haven't found a good way to monitor BTRFS. I might be able to accept that the array can only handle a single drive failure if I was confident that I could detect it but so far I haven't found a good solution for this.

John Petrini

On Thu, Apr 6, 2017 at 10:42 PM, Chris Murphy wrote:
> On Thu, Apr 6, 2017 at 7:31 PM, John Petrini wrote:
>> Hi Chris,
>>
>> I've followed your advice and converted the system chunk to raid10. I
>> hadn't noticed it was raid0 and it's scary to think that I've been
>> running this array for three months like that. Thank you for saving me
>> a lot of pain down the road!
>
> For what it's worth, it is imperative to keep frequent backups with
> Btrfs raid10; it is in some ways more like raid0+1. It can only
> tolerate the loss of a single device. It will continue to function
> with 2+ devices in a very deceptive degraded state, until it
> inevitably hits dual missing chunks of metadata or data, and then it
> will faceplant. And then you'll be looking at a scrape operation.
>
> It's not like raid10 where you can lose one of each mirrored pair.
> Btrfs raid10 mirrors chunks, not drives. So your metadata and data are
> all distributed across all of the drives, and that in effect means you
> can only lose 1 drive. If you lose a 2nd drive, some amount of
> metadata and data will have been lost.
>
> -- Chris Murphy
Re: Volume appears full but TB's of space available
On Thu, Apr 6, 2017 at 7:31 PM, John Petrini wrote:
> Hi Chris,
>
> I've followed your advice and converted the system chunk to raid10. I
> hadn't noticed it was raid0 and it's scary to think that I've been
> running this array for three months like that. Thank you for saving me
> a lot of pain down the road!

For what it's worth, it is imperative to keep frequent backups with Btrfs raid10; it is in some ways more like raid0+1. It can only tolerate the loss of a single device. It will continue to function with 2+ devices in a very deceptive degraded state, until it inevitably hits dual missing chunks of metadata or data, and then it will faceplant. And then you'll be looking at a scrape operation.

It's not like raid10 where you can lose one of each mirrored pair. Btrfs raid10 mirrors chunks, not drives. So your metadata and data are all distributed across all of the drives, and that in effect means you can only lose 1 drive. If you lose a 2nd drive, some amount of metadata and data will have been lost.

-- Chris Murphy
Re: Volume appears full but TB's of space available
Hi Chris,

I've followed your advice and converted the system chunk to raid10. I hadn't noticed it was raid0 and it's scary to think that I've been running this array for three months like that. Thank you for saving me a lot of pain down the road!

Also thank you for the clarification on the output - this is making a lot more sense.

Regards,

John Petrini
Re: Volume appears full but TB's of space available
On Thu, Apr 6, 2017 at 7:15 PM, John Petrini wrote:
> Okay so I came across this bug report:
> https://bugzilla.redhat.com/show_bug.cgi?id=1243986
>
> It looks like I'm just misinterpreting the output of btrfs fi df. What
> should I be looking at to determine the actual free space? Is Free
> (estimated): 13.83TiB (min: 13.83TiB) the proper metric?
>
> Simply running df does not seem to report the usage properly
>
> /dev/sdj 25T 11T 5.9T 65% /mnt/storage-array

Free should be correct. And df -h should be IEC units, so I'd expect it to be closer to the value of btrfs fi us than this. But the code has changed over time, I'm not sure when the last adjustment was made.

-- Chris Murphy
Re: Volume appears full but TB's of space available
On Thu, Apr 6, 2017 at 6:47 PM, John Petrini wrote:
> sudo btrfs fi df /mnt/storage-array/
> Data, RAID10: total=10.72TiB, used=10.72TiB
> System, RAID0: total=128.00MiB, used=944.00KiB
> Metadata, RAID10: total=14.00GiB, used=12.63GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

The third line is kinda scary. System chunk is raid0, so ostensibly a single device failure means the entire array is lost. The fastest way to fix it is:

btrfs balance start -mconvert=raid10,soft

That will make the system chunk raid10.

> sudo btrfs fi usage /mnt/storage-array/
> Overall:
> Device size: 49.12TiB
> Device allocated: 21.47TiB
> Device unallocated: 27.65TiB
> Device missing: 0.00B
> Used: 21.45TiB
> Free (estimated): 13.83TiB (min: 13.83TiB)
> Data ratio: 2.00
> Metadata ratio: 2.00
> Global reserve: 512.00MiB (used: 0.00B)
>
> Data,RAID10: Size:10.72TiB, Used:10.71TiB

This is saying you have 10.72T of data. But because it's raid10, it will take up 2x that much space. This is what's reflected by the Overall: Used: value of 21.45T, plus some extra for metadata which is also 2x.

-- Chris Murphy
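For completeness, a sketch of running and verifying that conversion against the mountpoint shown above (the 'soft' filter leaves chunks that already have the target profile untouched, so only the raid0 System chunk gets rewritten):

  btrfs balance start -mconvert=raid10,soft /mnt/storage-array
  # the System line should now report RAID10
  btrfs fi df /mnt/storage-array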
Re: Volume appears full but TB's of space available
Okay so I came across this bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1243986

It looks like I'm just misinterpreting the output of btrfs fi df. What should I be looking at to determine the actual free space? Is Free (estimated): 13.83TiB (min: 13.83TiB) the proper metric?

Simply running df does not seem to report the usage properly:

/dev/sdj 25T 11T 5.9T 65% /mnt/storage-array

Thank you,

John Petrini