On 5/24/19 10:00 PM, brendan.h...@gmail.com wrote:
Hi folks,

Summary/Questions:

1. Is the extremely large minimum-IO value of 256KB for the dom0 block devices
representing Qubes 4 VM volumes in the thin pool ... intentional?
2. And if so, to what purpose (e.g. performance)?
3. And if so, has the impact of this value on relying on discards to return
unused disk space to the pool been factored in?

---discussion and supporting cmd output follows---

As you can see below, the MIN-IO (minimum IO size for read/write/discard) and
DISC-GRAN (minimum size allowed for discard/trim commands) values on most of
the thin pool volumes are both set to 256KB. The output below shows this is
the case for the debian 9 VM and the dom0 root volume, and the same holds for
all the VM volumes that I cut out of the output for brevity/privacy.
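
If anyone wants to reproduce the check in their own dom0, something like
the following should show the relevant columns (device names will of
course vary per install):

  # List minimum I/O size and discard granularity for every device in the stack
  lsblk -o NAME,TYPE,MIN-IO,DISC-GRAN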

Everything else in the stack (the drives, partitions, the LUKS/crypt
container, and even some of the non-VM pool volumes and metadata) has much
more reasonable MIN-IO and DISC-GRAN values of 512 bytes or 4K... including
dom0 swap!
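
The same numbers can also be read straight from sysfs; for example (the
dm-0 name here is just a placeholder for whichever device you're checking):

  # Per-device queue limits exported by the kernel
  cat /sys/block/dm-0/queue/minimum_io_size
  cat /sys/block/dm-0/queue/discard_granularity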

The result is that even with automatic trimming turned on inside VMs,
deletions must create large holes on the virtual disk before any discards are
issued that can be passed down the stack. To rephrase: in the default
configuration, space is only returned from VM volumes to the pool when
deletions free large contiguous regions. This also negatively impacts
physical disk trimming, if the user has configured it.
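
A rough way to see this in practice (the volume group and VM names below
are just placeholders): note the VM volume's Data% in dom0, delete files
and trim inside the VM, then check Data% again. It only drops when the
freed regions are large and aligned enough to satisfy the 256KB granularity.

  # In dom0, before: record allocated space for the VM's volume
  sudo lvs qubes_dom0 -o lv_name,data_percent | grep myvm
  # Inside the VM: free some space, then trim the filesystem
  rm bigfile.iso && sudo fstrim -v /
  # In dom0, after: Data% drops only if whole 256KB chunks were freed
  sudo lvs qubes_dom0 -o lv_name,data_percent | grep myvm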

The 256K value may explain why folks have found that manually invoking
'sudo fstrim -av' is the only guaranteed way to trigger a full release of
storage from VMs back into the pool, leaving users who do not regularly trim
from inside their VMs at risk of the pool running out of room.
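
For what it's worth, a low-effort mitigation (assuming the template in
question ships util-linux's fstrim.timer unit) is to run a periodic trim
inside each VM rather than relying on per-deletion discards:

  # Inside the VM: enable the weekly fstrim timer
  sudo systemctl enable --now fstrim.timer
  # Or trim all mounted filesystems on demand and see how much was released
  sudo fstrim -av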

Hi Brendan,

It would be interesting if the thin-lvm minimum transfer size really were the reason for this difference in behavior between manual fstrim and the filesystem's own automatic discards.

However, I think you're wrong to assume that any free block at any scale should be discarded at the lvm level. This behavior is probably a feature designed to prevent pool metadata use from exploding to the point where the volume becomes slow or unmanageable: the 256KB figure is most likely just the pool's chunk size, and since dm-thin can only unmap whole chunks, larger chunks mean fewer mappings and less metadata. Controlling metadata size is a serious issue with COW storage systems, and at some point compromises must be made between data efficiency and metadata efficiency.
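
If you want to confirm that, lvs can report the pool's chunk size alongside the usual usage columns (the VG/pool names here assume the Qubes defaults):

  # Show the thin pool's chunk size next to the default Data%/Meta% columns
  sudo lvs -o +chunksize qubes_dom0/pool00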

On thin-lvm volumes, maxing out the allocated metadata space can have serious consequences, including loss of the entire pool. I experienced this myself several weeks ago and was just barely able to recover without reinstalling the whole system: it involved deleting and re-creating the thin pool, then restoring all the volumes from backup.

Run the 'lvs' command and look at the Meta% column for pool00. If it's much more than 50%, there is reason for concern, because a flurry of activity such as cloning/snapshotting or modifying many small files could balloon that figure close to 100% in a very short period.
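
A minimal version of that check, plus a way to grow the metadata LV if it is getting tight (VG and pool names assume the Qubes defaults, and the VG needs free extents for the extend to succeed):

  # Check data and metadata usage for the pool
  sudo lvs -o lv_name,data_percent,metadata_percent qubes_dom0/pool00
  # Grow the pool's metadata LV online if Meta% is running high
  sudo lvextend --poolmetadatasize +1G qubes_dom0/pool00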

--

Chris Laprise, tas...@posteo.net
https://github.com/tasket
https://twitter.com/ttaskett
PGP: BEE2 20C5 356E 764A 73EB  4AB3 1DC4 D106 F07F 1886
