Re: Switch raid mode without rebalance?
Hi,

On 2016-08-26 13:52, Austin S. Hemmelgarn wrote:
> Regular 'df' isn't to be trusted when dealing with BTRFS, the only
> reason we report anything there is because many things break horribly
> if we don't.

Yeah, I noticed. It seems to produce a reasonable guess, though.

> Additionally, while running with multiple profiles while not balancing
> should work, it's pretty much untested, and any number of things may
> break.

Oh. Good to know.

> Assuming your two disks have similar latency and transfer speed,
> you're almost certainly better off just converting completely to
> single mode (which works like raid0, just at the chunk level instead
> of the block level).

Okay, I see.

> On a slightly separate note, if you're doing backups more frequently
> than once a week, you're probably better off just leaving the disks
> connected and running. Regular load/unload cycles are generally harder
> on the mechanical components in modern disks than just leaving them
> running 24/7.

True. A bit of context: I first want to make a full backup locally, then use the disks off-site for nightly incremental backups via the internet.

Cheers
Gert
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
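The difference between single and raid0 can be illustrated with a toy allocator. This is a simplified model for illustration only, not btrfs code; the policy sketched (single-profile chunks go to the device with the most unallocated space) matches the documented btrfs behaviour, but the chunk size and bookkeeping here are made up.

```python
# Simplified model (not btrfs internals): in the "single" profile each
# chunk lives on exactly one device, and new chunks are allocated on
# the device with the most unallocated space.  With a 6TB + 2TB pair,
# this lets both disks fill completely -- unlike raid0, which stripes
# every chunk across all devices and stops once the smaller disk is full.

CHUNK = 1  # model chunks as 1GB units

def allocate_single(free):
    """Allocate chunks one at a time on the emptiest device."""
    allocated = 0
    while max(free) >= CHUNK:
        dev = free.index(max(free))   # device with most free space
        free[dev] -= CHUNK
        allocated += CHUNK
    return allocated

def allocate_raid0(free):
    """Each stripe needs a chunk's worth of space on every device."""
    allocated = 0
    while min(free) >= CHUNK:
        for d in range(len(free)):
            free[d] -= CHUNK
        allocated += CHUNK * len(free)
    return allocated

print(allocate_single([6000, 2000]))  # -> 8000: all space usable
print(allocate_raid0([6000, 2000]))   # -> 4000: limited by the 2TB disk
```

This is why converting to single recovers the ~4TB that raid0 leaves stranded on the larger disk.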
Re: Switch raid mode without rebalance?
Hi Chris,

first off, thank you for the detailed explanations!

On 2016-08-26 01:04, Chris Murphy wrote:
> No, it's not a file, directory or subvolume specific command. It
> applies to a whole volume.

You are right, but all I was after in the first place was a way to change the mode for new data on the whole volume. I understand now that there is no simple switch for that. (And to be honest it's not overly important; I was just being curious.)

> If I add another file, I'll get another data chunk allocated, and
> it'll be added to the chunk tree as item 5, and it'll have its own
> physical offset on each device.

And this chunk just uses the same profile as the last one (or the parent in the tree), I suppose. So the point is: in order to change the profile of a chunk, it has to be completely rewritten. That makes sense.

> To do what you want is planned, with no work picked up yet as far as I
> know. It'd probably involve some work to associate something like an
> xattr to let the allocator know which profile the user wants for the
> data, and then to allocate it to the proper existing chunk or create a
> new chunk with that profile as needed.

I see. I think it would be very nice to have something like that for the different compression modes as well. For example, use LZ4 for daily use but LZMA for the subvolume that stores backups, and no compression at all for /boot, so the bootloader doesn't have to know about all the different compression algorithms.

Speaking of which, I read here:
https://btrfs.wiki.kernel.org/index.php/Compression#Why_does_not_du_report_the_compressed_size.3F
that du will not tell me the compressed size of a file. This is very counter-intuitive, isn't it? The stated reason is that some tools apparently detect the sparseness of a file by comparing its apparent size with the stat.st_blocks value. I do not know if there is a better way to do that, so maybe my argument falls apart right here, BUT: this looks to me like working around one bug by introducing another.
Wouldn't it be better to have a mount option "make_du_lie_for_buggy_tools" for those that really need it? BTW, which tools would an honest du break, and how? (What harm is there in thinking that a compressed file is sparse?)

Thanks!
Gert
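For reference, the sparseness heuristic the wiki alludes to can be sketched like this. This is an illustration of the general technique, not any particular tool's code; the point is that a file whose allocated size (st_blocks, in 512-byte units) is smaller than its apparent size (st_size) looks like it has holes, and a compressed file would trip the same test.

```python
# Sketch of the heuristic some tools use to detect sparse files:
# compare the apparent size (st_size) with the allocated size
# (st_blocks * 512, since st_blocks is counted in 512-byte units).
# A btrfs file whose blocks compress well would look "sparse" by this
# test even though every byte of it was written.
import os
import tempfile

def looks_sparse(path):
    st = os.stat(path)
    return st.st_blocks * 512 < st.st_size

# A genuinely sparse file: truncate to 1 MiB without writing any data.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(1 << 20)
    sparse_path = f.name

print(looks_sparse(sparse_path))  # True on filesystems that support holes
os.unlink(sparse_path)
```

Whether misclassifying a compressed file as sparse actually harms any tool is exactly the question asked above; the heuristic itself is only an approximation either way.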
Re: Switch raid mode without rebalance?
Hi,

On 2016-08-25 21:50, Chris Murphy wrote:
> It's incidental right now. It's not something controllable or intended
> to have enduring mixed profile block groups.

I see. (Kind of.)

> Such a switch doesn't exist, there's no way to define what files,
> directories, or subvolumes, have what profiles.

Well, it kind of does - a running balance process seems to have just that effect, it's just not persistent (and has the side effect of, well, balancing the existing data).

>> How does btrfs find out which raid mode to use when writing new data?
>
> That's kind of an interesting question. If you were to do 'btrfs
> balance start -dconvert=single -mconvert=raid1' and very soon after
> that do 'btrfs balance cancel' you'll end up with one or a few new
> chunks with those profiles. When data is allocated to those chunks,
> they will have those profile characteristics. When data is allocated
> to old chunks that are still raid0, it will be raid0. The thing is,
> you can't really tell or control what data will be placed in what
> chunk. So it's plausible that some new data goes in an old raid0
> chunk, and some old data goes in new single/raid1 chunks.

I'm not quite familiar with the concept of a chunk here. Are chunks allocated for new data only, or is the unallocated space divided into chunks, too? In the former case, when creating a new chunk, does btrfs just look into a random existing chunk and copy the raid mode from there? In the latter case, could you (in theory) change the raid mode of all empty chunks only? I know this is not an intended usage scenario; just being curious here.

Thanks!
Gert
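A toy model may make the chunk behaviour Chris describes easier to picture. This is a deliberate simplification, not btrfs internals: it assumes new writes only land in chunks matching the current target profile, whereas Chris points out that in reality you cannot control which chunk data lands in.

```python
# Toy model (NOT btrfs internals) of per-chunk profiles: space is
# allocated in fixed-size chunks, each chunk carries its own profile,
# and new chunks are created with whatever profile is the current
# target -- which a convert-balance changes.  Real btrfs may still
# place new data in old chunks; this sketch ignores that.

class Volume:
    CHUNK_SIZE = 1024  # MB; btrfs data chunks are typically around 1GB

    def __init__(self, target_profile):
        self.target = target_profile
        self.chunks = []            # list of (profile, used_mb)

    def write(self, mb):
        # fill existing chunks that match the current target first
        for i, (profile, used) in enumerate(self.chunks):
            room = self.CHUNK_SIZE - used
            if profile == self.target and room > 0:
                take = min(room, mb)
                self.chunks[i] = (profile, used + take)
                mb -= take
        # then allocate new chunks with the target profile as needed
        while mb > 0:
            take = min(self.CHUNK_SIZE, mb)
            self.chunks.append((self.target, take))
            mb -= take

vol = Volume("raid0")
vol.write(3000)           # old data lands in raid0 chunks
vol.target = "single"     # a convert-balance changes the target profile
vol.write(1500)           # new data lands in newly created single chunks
print([p for p, _ in vol.chunks])
# -> ['raid0', 'raid0', 'raid0', 'single', 'single']
```

Answering the question in terms of this model: chunks exist only for allocated space (unallocated space is not pre-divided into chunks), which is why changing the profile of existing data requires rewriting the chunks that hold it.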
Re: Switch raid mode without rebalance?
Hi,

On 2016-08-25 20:26, Justin Kilpatrick wrote:
> I'm not sure why you want to avoid a balance,

I didn't check, but I imagined it would slow down my rsync significantly.

> Once you start this command all the new data should follow the new
> rules.

Ah, now that's interesting. When the balance is running, df shows 4TB of free space; when I cancel the balance, the free space goes back down to a few hundred GB. But if the balancing only happens when the disk would otherwise be idle, and the mere fact that a balance process is running causes new data to be written in single mode, I'm all set. I would not even have to wait for the balance to finish after the rsync is done; I could just cancel it and unmount the disks. A bit hacky, but in this case totally acceptable.

Thanks!
Gert
Switch raid mode without rebalance?
Hi,

I recently created a new btrfs on two disks - one 6TB, one 2TB - for temporary backup purposes. It apparently defaulted to raid0 for data, and I didn't realize at the time that this would become a problem. Now the 2TB disk is almost full, and df tells me I only have about 200GB of free space left. Which makes sense, because raid0 spreads the data evenly across all disks.

However, I know that btrfs can have different raid modes on the same filesystem at the same time. So I was hoping I could just tell it to "switch to single mode for all new data", but I don't have a clue how to do that. I *could* rebalance, of course, but is that really necessary?

How does btrfs find out which raid mode to use when writing new data?

Gert
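The ~200GB figure falls out of the raid0 striping constraint: every raid0 stripe needs space on both devices, so once the 2TB disk is nearly full, only about twice its remaining space can still be allocated. A back-of-envelope calculation (the per-disk free-space numbers are assumed for illustration, chosen to match the ~200GB reported):

```python
# Back-of-envelope for the situation described.  Free-space numbers
# per disk are hypothetical, picked to match the ~200GB df reports.
disk_free = {"6TB": 4100, "2TB": 100}   # remaining free space in GB

# raid0: every stripe consumes space on *both* disks equally, so
# allocatable space is limited by the disk with the least free space.
raid0_free = 2 * min(disk_free.values())
print(raid0_free)    # -> 200 (GB), matching what df shows

# single: chunks can live on either disk, so all free space is usable.
single_free = sum(disk_free.values())
print(single_free)   # -> 4200 (GB)
```

So switching the data profile to single (or converting via balance) would make the remaining ~4TB on the larger disk usable again.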
Re: BTRFS as image store for KVM?
Hi,

thank you all for your helpful comments. From what I've read, I have put together the following guidelines (for myself; YMMV):

- Use btrfs for generic data storage on spinning disks, and for everything on SSDs.
- Use ZFS for spinning disks that may see CoW-unfriendly workloads, like VM images (if they are too big and/or too fast-changing for a scheduled defrag to make sense).

For now I'm going with the following setup: a Debian system with root on btrfs/raid1 on two SSDs, and a raidz1 pool for storage and VM images. However, those few VMs that really should be fast would also fit on the SSDs, so I might move them there and switch from ZFS to btrfs on the storage pool at some point in the future.

Some of the ideas presented here sound really interesting - for example, I think that improving the Linux page cache to be more "ARC-like" would probably benefit not only btrfs. Having both the page cache and the ARC in parallel when using ZoL does not feel like an elegant solution, so maybe there's hope for that. (But I don't know if it is feasible for ZoL to abandon the ARC in favor of an improved Linux page cache; I imagine it might be a lot of work for little benefit.)

Thanks again
Gert
Re: BTRFS as image store for KVM?
On 2015-09-18 04:22, Duncan wrote:
> one way or another, you're going to have to write two things, one a
> checksum of the other, and if they are in-place-overwrites, while the
> race can be narrowed, there's always going to be a point at which
> either one or the other will have been written, while the other
> hasn't been, and if failure occurs at that point...

...then you can still recover the old data from the mirror or parity, and at least you don't have any inconsistent data. It's as if the failure had occurred just a tiny bit earlier.

> The only real way around that is /some/ form of copy-on-write, such
> that both the change and its checksum can be written to a different
> location than the old version, with a single, atomic write then
> updating a pointer to point to the new version of both the data and
> its checksum, instead of the old one.

Or an intent log, but I guess that introduces a lot of additional writes (and seeks) that would impact performance noticeably...

Gert
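The copy-on-write scheme Duncan describes can be sketched in userspace: write the new data together with its checksum to a fresh location, then flip a single atomic "pointer" to it. This is a minimal illustration of the principle, not filesystem code; here the pointer is a filename and the atomic flip is a rename.

```python
# Minimal sketch of the CoW scheme described above: new data + its
# checksum go to a *new* file, then one atomic rename flips the
# "pointer".  A crash before the rename leaves the old data and its
# matching checksum fully intact -- there is no window where data and
# checksum disagree.
import hashlib
import json
import os
import tempfile

def cow_update(pointer_path, data):
    record = {"data": data.hex(),
              "csum": hashlib.sha256(data).hexdigest()}
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(pointer_path))
    with os.fdopen(fd, "w") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())        # new version durable before the flip
    os.replace(tmp, pointer_path)   # the single atomic "pointer" update

def read_verified(pointer_path):
    with open(pointer_path) as f:
        record = json.load(f)
    data = bytes.fromhex(record["data"])
    assert hashlib.sha256(data).hexdigest() == record["csum"]
    return data

path = os.path.join(tempfile.mkdtemp(), "state.json")
cow_update(path, b"version 1")
cow_update(path, b"version 2")
print(read_verified(path))          # -> b'version 2'
```

An intent log would instead record "about to update data+csum at X" before an in-place overwrite, which is the extra write (and seek) traffic worried about above.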
Re: BTRFS as image store for KVM?
Hi,

thank you for your answers! So it seems there are several suboptimal alternatives here...

MD+LVM is very close to what I want, but md has no way to cope with silent data corruption. So if I wanted to use a guest filesystem that has no checksums either, I'd be out of luck. I'm honestly a bit confused here - isn't checksumming one of the most obvious things to want in a software RAID setup? Is it a feature that might appear in the future? Maybe I should talk to the md guys...

BTRFS looks really nice feature-wise, but is not (yet) optimized for my use case, I guess. Disabling COW would certainly help, but I don't want to lose the data checksums. Is nodatacowbutkeepdatachecksums a feature that might turn up in the future?

Maybe ZFS is the best choice for my scenario. At least, it seems to work fine for Joyent - their SmartOS virtualization OS is essentially Illumos (Solaris) with ZFS, and KVM ported from Linux. Since ZFS supports "volumes" (virtual block devices inside a zpool), I suspect these are optimized to be used for VM images (i.e., to do as little COW as possible). Of course, snapshots will always degrade performance to a degree. However, there are some drawbacks to ZFS:

- It's less flexible, especially when it comes to reconfiguration of disk arrays. Add or remove a disk to/from a RAIDZ and rebalance - that would be just awesome. It's possible in BTRFS, but not in ZFS. :-(
- The not-so-good integration with the fs cache, at least on Linux. I don't know if this is really an issue, though. Actually, I imagine it's more of an issue for guest systems, because it probably breaks memory ballooning. (?)

So it seems there are two options for me:
1. Go with ZFS for now, until BTRFS finds a better way to handle disk images, or until md gets data checksums.
2. Buy a bunch of SSDs for VM disk images and use spinning disks for data storage only. In that case, BTRFS should probably do fine.

Any comments on that? Am I missing something?

Thanks!
Gert
Re: BTRFS as image store for KVM?
On 17.09.2015 at 21:43, Hugo Mills wrote:
> On Thu, Sep 17, 2015 at 07:56:08PM +0200, Gert Menke wrote:
>> BTRFS looks really nice feature-wise, but is not (yet) optimized for
>> my use-case I guess. Disabling COW would certainly help, but I don't
>> want to lose the data checksums. Is nodatacowbutkeepdatachecksums a
>> feature that might turn up in the future?
> [snip]
> No. If you try doing that particular combination of features, you end
> up with a filesystem that can be inconsistent: there's a race
> condition between updating the data in a file and updating the csum
> record for it, and the race can't be fixed.

I'm no filesystem expert, but isn't that what an intent log is for? (Does btrfs have an intent log?)

And is this also true for mirrored or raid5 disks? I'm thinking something like "if the data does not match the checksum, just restore both from mirror/parity" should be possible, right?

Gert
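The restore-from-mirror idea in the question can be sketched as follows. This is a toy model, not md or btrfs code, and it sidesteps the race Hugo describes (it assumes each copy's checksum was written consistently with that copy): keep a checksum alongside each mirrored copy, and on read prefer whichever copy still matches, repairing the others from it.

```python
# Toy model of checksum-verified mirroring: two copies, each with its
# own checksum; reads pick a copy that verifies and repair the rest.
# Not md/btrfs code -- it assumes data and checksum were written
# together consistently, which is exactly what nodatacow can't promise.
import hashlib

def store_mirrored(data):
    csum = hashlib.sha256(data).hexdigest()
    return [{"data": data, "csum": csum} for _ in range(2)]

def read_with_repair(mirror):
    for copy in mirror:
        if hashlib.sha256(copy["data"]).hexdigest() == copy["csum"]:
            # repair any corrupt copies from the verified one
            for j in range(len(mirror)):
                mirror[j] = dict(copy)
            return copy["data"]
    raise IOError("all copies corrupt")

mirror = store_mirrored(b"important")
mirror[0]["data"] = b"importanX"    # simulate silent corruption on disk 0
print(read_with_repair(mirror))     # -> b'important', restored from disk 1
```

The catch with nodatacow is the assumption in the first comment: after a crash, a copy may hold new data with an old checksum, so a perfectly good in-place update can be misread as corruption.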
Re: BTRFS as image store for KVM?
On 17.09.2015 at 20:35, Chris Murphy wrote:
> You can use Btrfs in the guest to get at least notification of SDC.

Yes, but I'd rather not depend on all potential guest OSes having btrfs or something similar.

> Another way is to put a conventional fs image on e.g. GlusterFS with
> checksumming enabled (and at least distributed+replicated filtering).

This sounds interesting! I'll have a look at it.

> If you do this directly on Btrfs, maybe you can mitigate some of the
> fragmentation issues with bcache or dmcache;

Thanks, I did not know about these. bcache seems to be more or less what "zpool add foo cache /dev/ssd" does. Definitely worth a look.

> and for persistent snapshotting, use qcow2 to do it instead of Btrfs.
> You'd use Btrfs snapshots to create a subvolume for doing backups of
> the images, and then get rid of the Btrfs snapshot.

Good idea. Thanks!
BTRFS as image store for KVM?
Hi everybody,

first off, I'm not 100% sure if this is the right place to ask, so if it's not, I apologize and would appreciate a pointer in the right direction.

I want to build a virtualization server to replace my current home server. I'm thinking about a Debian system with libvirt/KVM. The system will have one or two SSDs and five hard disks with some kind of software RAID5 for storage. I'd like to have a filesystem with data checksums, so BTRFS seems like the right way to go. However, I read that BTRFS does not perform well as storage for KVM disk images. (See here: http://www.linux-kvm.org/page/Tuning_KVM )

Is this still true? I would appreciate any comments and/or tips you might have on this topic. Is anyone using BTRFS as an image store? Are there any special settings I should be aware of to make it work well?

Thanks,
Gert