But I read somewhere that compression should be turned off on mounts that just store large VM-images. Is that wrong?
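For context, this is the kind of setup I mean, as a minimal sketch; /dev/sdb1 and /mnt/images are just stand-ins for the real device and mount point:

    # mount with transparent compression (zlib or lzo):
    mount -o compress=lzo /dev/sdb1 /mnt/images

    # or leave the mount uncompressed and opt in per directory instead;
    # the 'c' attribute tells btrfs to compress new writes to files
    # created under it:
    chattr +c /mnt/images/vm
    lsattr -d /mnt/images/vm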
Btw, I am not pre-allocating space for the images. I use sparse files with:

    dd if=/dev/zero of=drive.img bs=1 count=1 seek=300G

It creates the file in a few ms. Is it better to use "fallocate" with
btrfs? Sparse files have one benefit when I want to copy/move an image
file to another server: if the 300GB sparse file only holds 10GB of data,
I only need to copy 10GB when moving it. Would the same be true with
"fallocate"?

Anyway, would disabling CoW (by putting +C on the parent dir) prevent the
performance issues and the 2*filesize issue? (Rough sketches of what I
mean are at the end of this mail.)

2014-12-20 13:52 GMT+08:00 Zygo Blaxell <ce3g8...@umail.furryterror.org>:
> On Fri, Dec 19, 2014 at 04:17:08PM -0500, Josef Bacik wrote:
>> > And for your inode you now have this
>> >
>> > inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g),
>> > disklen 4k
>> > inode 256, file offset 4k, size 302g-4k, offset 4k, diskbytenr 123,
>> > disklen 302g
>> >
>> > and in your extent tree you have
>> >
>> > extent bytenr 123, len 302g, refs 1
>> > extent bytenr whatever, len 4k, refs 1
>> >
>> > See that? Your file is still the same size, it is still 302g. If you
>> > cp'ed it right now it would copy 302g of information. But what have
>> > you actually allocated on disk? Well, that's now 302g + 4k. Now let's
>> > say your virt thing decides to write to the middle, say at offset
>> > 12k; now you have this
>> >
>> > inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g),
>> > disklen 4k
>> > inode 256, file offset 4k, size 8k, offset 4k, diskbytenr 123,
>> > disklen 302g
>> > inode 256, file offset 12k, size 4k, offset 0, diskbytenr whatever,
>> > disklen 4k
>> > inode 256, file offset 16k, size 302g - 16k, offset 16k, diskbytenr
>> > 123, disklen 302g
>> >
>> > and in the extent tree you have this
>> >
>> > extent bytenr 123, len 302g, refs 2
>> > extent bytenr whatever, len 4k, refs 1
>> > extent bytenr notimportant, len 4k, refs 1
>> >
>> > See that refs 2 change? We split the original extent, so we have two
>> > file extents pointing to the same physical extent, so we bumped the
>> > ref count. This will happen over and over again until we have
>> > completely overwritten the original extent, at which point your
>> > space usage will go back down to ~302g.
>
> Wait, *what*?
>
> OK, I did a small experiment and found that btrfs actually does do
> something like this. Can't argue with fact, though it would be nice if
> btrfs could be smarter and drop unused portions of the original extent
> sooner. :-P
>
> The scenario quoted above is a little oversimplified. Chances are that
> 302G file is made of much smaller extents (128M..256M). If the VM is
> writing 4K randomly everywhere, then those 128M+ extents are not going
> away any time soon. Even the extents that are dropped stick around for
> a few btrfs transaction commits before they go away.
>
> I couldn't reproduce this behavior until I realized the extents I was
> overwriting in my tests were exactly the same size and position as the
> extents on disk. I changed the offset slightly and found that
> partially-overwritten extents do in fact stick around in their
> entirety.
>
> There seems to be an unexpected benefit for compression here:
> compression keeps the extents small, so many small updates are less
> likely to leave big, mostly-unused extents lying around the filesystem.
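To make the sparse-file question above concrete, this is roughly what I am
comparing; file names and sizes are just examples:

    # sparse file, nothing allocated up front (what I do now):
    dd if=/dev/zero of=drive.img bs=1 count=1 seek=300G
    # same effect:
    truncate -s 300G drive.img

    # preallocated file, the full 300G is reserved immediately:
    fallocate -l 300G drive-prealloc.img

    # apparent size vs. blocks actually allocated:
    ls -lh drive.img drive-prealloc.img
    du -h --apparent-size drive.img drive-prealloc.img
    du -h drive.img drive-prealloc.img

    # sparse-aware copy, so only the written blocks travel to the other server:
    rsync --sparse drive.img otherhost:/images/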
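And this is what I mean by putting +C on the parent dir. As far as I
understand, the NOCOW attribute only takes effect for files created after
it is set, so it goes on an empty directory first (paths are examples):

    mkdir /mnt/images/nocow
    chattr +C /mnt/images/nocow
    lsattr -d /mnt/images/nocow        # should list the 'C' attribute
    # images created in here inherit NOCOW:
    dd if=/dev/zero of=/mnt/images/nocow/drive.img bs=1 count=1 seek=300G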
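For what it's worth, here is a rough sketch of how I would try to
reproduce the partial-overwrite behaviour described above, scaled way down
from 302G; paths and sizes are just examples:

    # write one reasonably large file and let it commit:
    dd if=/dev/urandom of=test.img bs=1M count=256
    sync
    btrfs filesystem df /mnt/images    # note data used
    filefrag -v test.img               # note the extent layout

    # overwrite 4k in the middle of the first extent; conv=notrunc so dd
    # does not truncate the file:
    dd if=/dev/urandom of=test.img bs=4k count=1 seek=3 conv=notrunc
    sync
    btrfs filesystem df /mnt/images    # if the above is right, "used" grows
                                       # and the old extent is not freed yet
    filefrag -v test.img               # original extent now referenced in pieces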