On 2014-10-26 13:20, Larkin Lowrey wrote:
On 10/24/2014 10:28 PM, Duncan wrote:
Robert White posted on Fri, 24 Oct 2014 19:41:32 -0700 as excerpted:

On 10/24/2014 04:49 AM, Marc MERLIN wrote:
On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
I have a 240GB VirtualBox vdi image that is showing heavy
fragmentation (filefrag). The file was created in a dir that was
chattr +C'd, the file was created via fallocate and the contents of
the original image were copied into the file via dd. I verified that
the image was +C.
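For reference, that creation sequence would look something like the
following (the paths here are purely illustrative):

    mkdir /data/vm && chattr +C /data/vm        # new files inherit NOCOW
    fallocate -l 240G /data/vm/disk.vdi         # preallocate the image
    dd if=/old/disk.vdi of=/data/vm/disk.vdi bs=1M conv=notrunc
    lsattr /data/vm/disk.vdi                    # should show the 'C' attribute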
To be honest, I have the same problem, and it's vexing:
If I understand correctly, when you take a snapshot the file goes into
what I call "1COW" mode.
Yes, but the OP said he hadn't snapshotted since creating the file, and
MM's a regular that actually wrote much of the wiki documentation on
raid56 modes, so he better know about the snapshotting problem too.

So that can't be it.  There's apparently a bug in some recent code, and
it's not honoring the NOCOW even in normal operation, when it should be.

(FWIW I'm not running any VMs or large DBs here, so don't have nocow set
on anything and can and do use autodefrag on all my btrfs.  So I can't
say one way or the other, personally.)


Correct, there were no snapshots during VM usage when the fragmentation
occurred.

One unusual property of my setup is I have my fs on top of bcache. More
specifically, the stack is md raid6 -> bcache -> lvm -> btrfs. When the
fs mounts it automatically gets the 'ssd' mount option, because bcache sets
/sys/block/bcache0/queue/rotational to 0.
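For what it's worth, that auto-detection can be checked and overridden
with something like the following (the mountpoint is just a placeholder):

    cat /sys/block/bcache0/queue/rotational    # 0 => btrfs enables 'ssd'
    mount -o remount,nossd /mnt/data           # force it back off
    grep btrfs /proc/mounts                    # confirm the active options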

Is there any reason why either the 'ssd' mount option or being backed by
bcache could be responsible?


Two things:
First, regarding your question: the 'ssd' mount option "shouldn't" be responsible for this, because it is only supposed to spread out allocation at the chunk level, not the block level, though some recent commit may have changed that.

Are you using any kind of compression in btrfs? If so, filefrag won't report the number of fragments correctly (it currently reports the number of compressed extents in the file instead), and I would expect that number to go up as the VM image fills: long runs of zero bytes compress well, other data (especially the on-disk structures of the encapsulated filesystem) doesn't.

You might also consider putting the VM images directly on the LVM layer instead; in my experience that tends to perform much better than storing them on a filesystem.
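If you want to check the compression angle, or try the raw-LV route,
something along these lines should do it (the image path and VG name are
placeholders):

    grep compress /proc/mounts                 # any compress= mount option active?
    filefrag -v /data/vm/disk.vdi | tail       # per-extent detail, not just the count
    lvcreate -L 240G -n vmdisk vg0             # a raw LV as an alternative backing store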

Secondly, I'd recommend switching from bcache under LVM to dm-cache on top of LVM. It's much easier to recover from the various failure modes, and to deal with a corrupted cache, because dm-cache doesn't put any metadata on the backing device. It takes longer to shut down in write-back mode and isn't SSD-optimized, but it has also been much more reliable in my experience.
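If you do go that route, the LVM-integrated way of setting up dm-cache
(lvmcache, needs a reasonably recent lvm2) looks roughly like this; the
VG name, LV names, sizes and the SSD device below are only placeholders:

    # carve cache-data and cache-metadata LVs out of the SSD
    lvcreate -L 100G -n cache0     vg0 /dev/sdX
    lvcreate -L 1G   -n cache0meta vg0 /dev/sdX
    # combine them into a cache pool, then attach it to the LV holding the fs
    lvconvert --type cache-pool --poolmetadata vg0/cache0meta vg0/cache0
    lvconvert --type cache --cachepool vg0/cache0 vg0/data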
