On 5/24/19 10:00 PM, brendan.h...@gmail.com wrote:
Hi folks,
Summary/Questions:
1. Is the extremely large minimum-IO value of 256KB for the dom0 block devices
representing Qubes 4.0 VM volumes in the thin pool ... intentional?
2. And if so, to what purpose (e.g. performance, etc.)?
3. And if so, has the impact of this value on relying on discards to return
unused disk space to the pool been factored in?
---discussion and supporting cmd output follows---
As you can see below, MIN-IO (the minimum I/O size for reads/writes/discards)
and DISC-GRAN (the minimum size allowed for discard/trim commands) are both set
to 256KB on most of the thin-pool volumes. This is the case for the Debian 9 VM
and the dom0 root volume, and likewise for all the VM volumes that I cut out of
the output below for brevity/privacy.
Everything else in the stack (the drives, partitions, luks/crypt container, and
even some of the non-VM filesystem pool volumes and/or metadata) has much more
reasonable MIN-IO and DISC-GRAN values of 512 bytes or 4K...including dom0 swap!
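For anyone who wants to check their own stack, here is a small sketch. The
lsblk columns are real util-linux options; the device path and the 256K figure
are just what was observed above. (dm-thin reports the pool chunk size as its
discard granularity, which you can cross-check with 'lvs -o chunk_size'.)

```shell
# Commands to inspect minimum I/O and discard granularity on a live
# system (shown here as comments, since device names vary):
#   lsblk --output NAME,MIN-IO,DISC-GRAN
#   cat /sys/block/dm-0/queue/discard_granularity   # per-device sysfs view
# The 256K DISC-GRAN observed on the thin volumes, expressed in bytes:
chunk_kib=256
echo "discard granularity: $((chunk_kib * 1024)) bytes"
```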
The result is that automatic trimming of the filesystems within VMs only sends
discards down the stack when deletions create sufficiently large holes on the
virtual disk. To rephrase: in the default configuration, for space to be
returned from VM volumes back into the pool after deletions, the deleted files
must contain large contiguous sections. This also negatively impacts physical
disk trimming, if the user has configured it.
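To make that concrete, a minimal sketch (the freed sizes are made up; the 256
KiB granule is the DISC-GRAN value above): only whole granules within a freed
extent can be discarded, so small deletions return nothing to the pool. This
simplification also ignores alignment, which makes the real situation slightly
worse (an extent straddling a granule boundary may free even less).

```shell
# DISC-GRAN: discards smaller than one 256 KiB granule are dropped.
gran=$((256 * 1024))
for freed_kib in 100 200 300 1024; do
    freed=$((freed_kib * 1024))
    # only the whole granules contained in the freed extent count
    discardable=$(( freed / gran * gran ))
    echo "freed ${freed_kib}K -> discardable $((discardable / 1024))K"
done
```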
The 256K value may explain why folks have found that manually invoking
'sudo fstrim -av' is the only guaranteed way to trigger a full release of
storage from VMs back into the pool, leaving users who do not regularly trim
inside their VMs at risk of the pool running out of room.
Hi Brendan,
It would be interesting if the thin-lvm minimum transfer size were the reason
for this difference in behavior between fstrim and the filesystem's automatic
discards.
However, I think you're wrong to assume that any free block at any scale
should be discarded at the lvm level. This behavior is probably a
feature designed to prevent pool metadata use from exploding to the
point where the volume becomes slow or unmanageable. Controlling
metadata size is a serious issue with COW storage systems and at some
point compromises must be made between data efficiency and metadata
efficiency.
On thin-lvm volumes, maxing-out the allocated metadata space can have
serious consequences including loss of the entire pool. I experienced
this myself several weeks ago and I was just barely able to manage
recovery without reinstalling the whole system; it involved deleting
and re-creating the thin pool, then restoring all the volumes from backup.
Run the 'lvs' command and look at the Meta% column for pool00. If it's
much more than 50% there is reason for concern, because if you put the
system through a flurry of activity including cloning/snapshotting
and/or modifying many small files then that figure could balloon close
to 100% in a very short period.
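A rough sketch of that check (the 62.04 here is a made-up example value; on a
real system you would substitute the figure from
'sudo lvs -o lv_name,data_percent,metadata_percent'):

```shell
meta_pct=62.04     # example: substitute the Meta% value that lvs reports
threshold=50
# lvs prints a decimal like "62.04"; strip the fraction for an integer compare
if [ "${meta_pct%.*}" -gt "$threshold" ]; then
    echo "WARNING: pool metadata ${meta_pct}% used"
fi
```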
--
Chris Laprise, tas...@posteo.net
https://github.com/tasket
https://twitter.com/ttaskett
PGP: BEE2 20C5 356E 764A 73EB 4AB3 1DC4 D106 F07F 1886