Hey. I've worried before about the topics Mitch has raised. Some questions.
1) AFAIU, the fragmentation problem exists especially for files that see many random writes; especially, but not limited to, big files. That databases and VMs are affected by this is probably well known by now (at least by people on this list). But I'd guess there are countless other cases where such IO patterns can occur without one ever noticing, while the btrfs continues to degrade.
So is there any general approach towards this? And what are the actual possible consequences? Is it just that the fs gets slower (due to the fragmentation), or may I even run into other issues, up to the point where the space is eaten up or the fs becomes basically unusable?
This is especially important for me, because for some VMs and even DBs I wouldn't want to use nodatacow, since I want to have the checksumming (i.e. those cases where data integrity is much more important than security).

2) Why does nodatacow imply nodatasum, and can the two ever be decoupled? For me the checksumming is actually the most important part of btrfs (not that I wouldn't like its other features as well), so turning it off is something I'd really want to avoid. Plus it opens questions like: when there are no checksums, how can it (in the RAID cases) decide which block is the good one in case of corruption?

3) If I actually disable datacow for e.g. a subvolume that holds VMs or DBs, what are all the implications? Obviously no checksumming; but what happens if I snapshot such a subvolume, or send/receive it? I'd expect that some kind of CoW then needs to take place, or does that simply not work?

4) Duncan mentioned that defrag (and I guess that also holds for autodefrag) isn't ref-link aware. Isn't that somehow a complete showstopper? As soon as one uses snapshots, and would defrag or autodefrag any of them, space usage would just explode, perhaps to the extent of ENOSPC, rendering the fs effectively useless.
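To make concrete what kind of ref-link copies I mean, here is a small sketch (file names made up by me, and the defrag consequence is only what I understood from Duncan's mail, not something I've verified):

```shell
# A reflink copy shares its extents with the original, so on btrfs it
# costs almost no extra space:
echo "some data" > original.img
cp --reflink=auto original.img cheap-copy.img   # shared extents on btrfs
cat cheap-copy.img                              # prints: some data

# As I understand the thread, a subsequent
#   btrfs filesystem defragment cheap-copy.img
# would rewrite the copy's extents and thereby un-share them from
# original.img, so the data would then occupy the space twice.
```

(With --reflink=auto, cp silently falls back to a normal copy on filesystems without reflink support, so the space saving only applies on btrfs and the like.)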
That sounds to me like either I can't use ref-links, which are crucial not only to snapshots but to every file I copy with cp --reflink=auto, or I can't defrag, which however will sooner or later cause quite some fragmentation issues on btrfs?

5) Especially keeping (4) in mind, but also the other comments from Duncan and Austin: is autodefrag now recommended for general use? Are both autodefrag and defrag considered stable enough to be used? Or are there other implications, e.g. when I use compression?

6) Does defragmentation work with compression? Or is it just filefrag which can't cope with it?
Are there any other combinations of the typical btrfs technologies (cow/nocow, compression, snapshots, subvolumes, defrag, balance) that one can do but which lead to unexpected problems? (I, for example, wouldn't have expected that defragmentation isn't ref-link aware... still kinda shocked ;) )
For example, when I do a balance and change the compression, and I have multiple snapshots or files within one subvolume that share their blocks... would that also lead to copies being made and the space possibly growing dramatically?

7) How does free-space defragmentation happen (or is there even such a thing)? For example, when I have my big qemu images, *not* using nodatacow, and I copy the image e.g. with qemu-img convert old.img new.img and then delete the old one: I'd expect that new.img is more or less not fragmented, but will my free space (from the removed old.img) still be completely messed up, sooner or later driving me into problems?

8) Why does a balance not also defragment? Since everything is copied anyway, why not defragment it too? I'd somehow hoped that a balance would clean up all kinds of things, like free-space issues and also fragmentation.

Given all these issues (fragmentation, situations in which space may grow dramatically where the end-user/admin may not necessarily expect it, e.g. the defrag or the balance+compression cases), btrfs seems to require much more in-depth knowledge, and especially care (which even depends on the type of data), on the end-user/admin side than the traditional filesystems.
Are there, for example, any general recommendations on what to do regularly to keep the fs in a clean and proper shape? (And I don't count "start with a fresh one and copy the data over" as a valid way.)

Thanks,
Chris.