But I read somewhere that compression should be turned off on mounts
that just store large VM images. Is that wrong?

Btw, I am not pre-allocating space for the images. I create sparse files with:

dd if=/dev/zero of=drive.img bs=1 count=1 seek=300G

It creates the file in a few ms.
Would it be better to use "fallocate" on btrfs?
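For comparison, here is a sketch of both approaches, scaled down to 64M
(instead of 300G) so it can be run anywhere; "truncate" is a simpler
equivalent of the dd seek trick:

```shell
# Scaled-down comparison (64M instead of 300G) of sparse vs. preallocated.
cd "$(mktemp -d)"

# Sparse: only metadata is written; blocks are allocated on first write.
dd if=/dev/zero of=sparse.img bs=1 count=1 seek=64M 2>/dev/null
truncate -s 64M sparse.img        # simpler equivalent of the dd trick

# Preallocated: fallocate reserves every block up front, with no holes.
fallocate -l 64M prealloc.img

# Apparent size vs. blocks actually allocated:
stat -c '%n: %s bytes apparent, %b blocks allocated' sparse.img prealloc.img
```

Both files report the same apparent size; only the fallocate'd one has
all of its blocks allocated.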

Sparse files also add a benefit when I want to copy or move the
image file to another server:
if the 300GB sparse file holds only 10GB of data, I only need to
copy 10GB when moving it to another server.
Would the same be true with "fallocate"?
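As a local sketch of how that copy behaves (the remote case is shown only
as a comment, since it needs a second host): the copying tool has to be
hole-aware for the savings to survive. A fallocate'd file has no holes, but
it is still all zeros, so a tool that punches holes for zero runs can make
the destination sparse anyway.

```shell
cd "$(mktemp -d)"
truncate -s 1G drive.img            # 1G apparent, ~0 blocks allocated
printf 'payload' | dd of=drive.img bs=1M seek=10 conv=notrunc 2>/dev/null

# GNU cp is hole-aware; --sparse=always also punches holes for zero runs,
# which helps even with a fallocate'd (hole-free but all-zero) source:
cp --sparse=always drive.img copy.img
stat -c '%s bytes apparent, %b blocks allocated' copy.img

# For the remote case, rsync needs -S/--sparse to recreate holes, e.g.:
#   rsync -avS drive.img otherserver:/var/lib/images/
```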

Anyway, would disabling CoW (by setting +C on the parent directory)
prevent the performance issues and the 2*filesize issue?
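For reference, a sketch of applying +C (guarded, since chattr +C only has
an effect on btrfs). The flag must be set on the directory before the image
files are created in it; it does not retroactively change non-empty files:

```shell
dir=$(mktemp -d)
# chattr +C is btrfs-specific; skip gracefully elsewhere.
if [ "$(stat -f -c %T "$dir")" = "btrfs" ]; then
    chattr +C "$dir"                   # new files in $dir inherit nodatacow
    lsattr -d "$dir"                   # 'C' shows in the flag field
    truncate -s 300G "$dir/drive.img"  # created after +C, so it is nodatacow
    lsattr "$dir/drive.img"
else
    echo "skipping: $dir is on $(stat -f -c %T "$dir"), not btrfs"
fi
```

Note that nodatacow also disables checksumming and compression for those
files, which ties back to the compression question above.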

2014-12-20 13:52 GMT+08:00 Zygo Blaxell <ce3g8...@umail.furryterror.org>:
> On Fri, Dec 19, 2014 at 04:17:08PM -0500, Josef Bacik wrote:
>> >And for your inode you now have this
>> >
>> >inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g),
>> >disklen 4k
>> >inode 256, file offset 4k, size 302g-4k, offset 4k, diskbytenr 123,
>> >disklen 302g
>> >
>> >and in your extent tree you have
>> >
>> >extent bytenr 123, len 302g, refs 1
>> >extent bytenr whatever, len 4k, refs 1
>> >
>> >See that?  Your file is still the same size, it is still 302g.  If you
>> >cp'ed it right now it would copy 302g of information.  But what you have
>> >actually allocated on disk?  Well that's now 302g + 4k.  Now let's say
>> >your virt thing decides to write to the middle, let's say at offset 12k,
>> >now you have this
>> >
>> >inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g),
>> >disklen 4k
>> >inode 256, file offset 4k, size 8k, offset 4k, diskbytenr 123, disklen 302g
>> >inode 256, file offset 12k, size 4k, offset 0, diskbytenr whatever,
>> >disklen 4k
>> >inode 256, file offset 16k, size 302g - 16k, offset 16k, diskbytenr 123,
>> >disklen 302g
>> >
>> >and in the extent tree you have this
>> >
>> >extent bytenr 123, len 302g, refs 2
>> >extent bytenr whatever, len 4k, refs 1
>> >extent bytenr notimportant, len 4k, refs 1
>> >
>> >See that refs 2 change?  We split the original extent, so we have 2 file
>> >extents pointing to the same physical extents, so we bumped the ref
>> >count.  This will happen over and over again until we have completely
>> >overwritten the original extent, at which point your space usage will go
>> >back down to ~302g.
>
> Wait, *what*?
>
> OK, I did a small experiment, and found that btrfs actually does do
> something like this.  Can't argue with fact, though it would be nice if
> btrfs could be smarter and drop unused portions of the original extent
> sooner.  :-P
>
> The above quoted scenario is a little oversimplified.  Chances are that
> 302G file is made of much smaller extents (128M..256M).  If the VM is
> writing 4K randomly everywhere then those 128M+ extents are not going
> away any time soon.  Even the extents that are dropped stick around for
> a few btrfs transaction commits before they go away.
>
> I couldn't reproduce this behavior until I realized the extents I was
> overwriting in my tests were exactly the same size and position of
> the extents on disk.  I changed the offset slightly and found that
> partially-overwritten extents do in fact stick around in their entirety.
>
> There seems to be an unexpected benefit for compression here:  compression
> keeps the extents small, so many small updates will be less likely to
> leave big mostly-unused extents lying around the filesystem.
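One way to observe the split described above (a sketch; filefrag runs on
any filesystem, but the interesting before/after difference needs a btrfs
mount backing the temp file):

```shell
f=$(mktemp)
dd if=/dev/urandom of="$f" bs=1M count=16 2>/dev/null
sync
filefrag -v "$f"     # before: one or a few large extents

# Overwrite 4k in the middle, misaligned with any extent boundary:
dd if=/dev/urandom of="$f" bs=4k count=1 seek=1000 conv=notrunc 2>/dev/null
sync
filefrag -v "$f"     # on btrfs: the original extent is split around the new
                     # 4k, but the old extent stays fully allocated until it
                     # is completely overwritten
```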
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html