Re: [linux-lvm] LVM performance vs direct dm-thin

2022-02-01 Thread Marian Csontos
On Sun, Jan 30, 2022 at 11:17 PM Demi Marie Obenour <d...@invisiblethingslab.com> wrote:

> On Sun, Jan 30, 2022 at 04:39:30PM -0500, Stuart D. Gathman wrote:
> > Your VM usage is different from ours - you seem to need to clone and
> > activate a VM quickly (like a vps provider might need to do).  We
> > generally have to buy more RAM to add a new VM :-), so performance of
> > creating a new LV is the least of our worries.
>
> To put it mildly, yes :).  Ideally we could get VM boot time down to
> 100ms or lower.
>

Out of curiosity, is snapshot creation the main obstacle to booting a VM in
under 100ms? Does Qubes OS use tweaked Linux distributions to achieve the
desired boot time?

Back to business. Perhaps I missed an answer to this question: are the
Qubes OS VMs throwaway? Throwaway in the sense that many containers are -
just a runtime which can be "easily" reconstructed. If so, you can ignore
the safety belts and try to squeeze out more performance by sacrificing
(meta)data integrity.

And the answer to that question seems to be both yes and no. The classic pets
vs cattle.

As I understand it, apart from the system VMs, there are at least two kinds
of user domains, and these have different requirements:

1. a few permanent pet VMs (Work, Personal, Banking, ...), called AppVMs in
Qubes OS,
2. and many transient cattle VMs (e.g. for opening an email attachment,
browsing the web, or batch processing of received files), called
Disposable VMs.

For AppVMs, there are only a "few" of them and they run most of the time, so
start time may be less important than data safety. Creation is certainly a
once-in-a-while operation, so I would say use LVM for these. And where
snapshots are not required, use plain linear LVs: one less thing which could
go wrong. However, AppVMs are created from Template VMs, so snapshots seem
to be part of the system. But the data may be on linear LVs anyway, as it is
not shared and it is the most important part of the system. And you can
still use old-style snapshots for backing up the data (and by backup I mean
snapshot, copy, delete snapshot; not a long-term snapshot, and definitely
not multiple snapshots).
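
A rough sketch of that backup cycle on a linear data LV (the VG name, LV
names and snapshot size below are made up for illustration):

    # short-lived old-style (CoW) snapshot of the AppVM's data LV
    lvcreate -s -L 4G -n work-data-snap qubes_dom0/work-data
    # copy the frozen contents somewhere safe
    dd if=/dev/qubes_dom0/work-data-snap of=/backup/work-data.img bs=4M conv=fsync
    # drop the snapshot as soon as the copy finishes
    lvremove -y qubes_dom0/work-data-snap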

Now I realize there is a third kind of user domain: Template VMs. Similarly
to AppVMs, there are only a few of them, and creating one requires
downloading an image, upgrading the system on an existing template, or even
installing the system from scratch, so any LVM overhead is insignificant for
these. Use thin volumes.
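
For illustration, a minimal sketch of the thin-pool side with stock lvm2
(the VG name, LV names and sizes are hypothetical):

    # one-time: a thin pool inside the VG
    lvcreate --type thin-pool -L 100G -n vm-pool qubes_dom0
    # per template: a thin volume holding the template root image
    lvcreate -V 20G -T qubes_dom0/vm-pool -n fedora-root
    # per AppVM: a writable thin snapshot of the template
    # (-kn clears the activation-skip flag lvm2 sets on thin snapshots by default)
    lvcreate -s -kn -n work-root qubes_dom0/fedora-root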

For the Disposable VMs it is the creation + startup time which matters, so
use whatever method is fastest. These are created from Template VMs too.
What LVM/DM has to offer here is the external origin: the templates
themselves could be managed by LVM, and Qubes OS could use them as external
origins for Disposable VMs using device mapper directly. The disposable
volumes could be held in a dedicated thin pool which can be reinitialized
from scratch on host reboot, after a crash, or on a problem with the pool.
As a bonus, this would also work around the absence of thin pool shrinking.
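
To make that concrete, a hedged sketch of what the device-mapper side could
look like (device names, device ids and sizes, given in 512-byte sectors,
are all made up; the table formats follow the kernel thin-provisioning
documentation):

    # disposable pool built on a data LV and a metadata LV that lvm2 manages
    dmsetup create disp-pool --table \
      "0 209715200 thin-pool /dev/qubes_dom0/disp-meta /dev/qubes_dom0/disp-data 128 32768"
    # allocate a fresh thin device (id 0) inside the pool
    dmsetup message /dev/mapper/disp-pool 0 "create_thin 0"
    # activate it with the read-only template volume as external origin
    dmsetup create disp1-root --table \
      "0 41943040 thin /dev/mapper/disp-pool 0 /dev/qubes_dom0/fedora-root"

Reads of blocks that were never written go to the external origin, writes
land in the pool, so the template itself is never modified and the whole
pool can be wiped and rebuilt at the next boot.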

I wonder if a pool of ready-to-use VMs could solve some of the startup time
issues: keep $POOL_SIZE VMs (all using LVM) ready, inject the data into one
of them when needed, and prepare a new one asynchronously. That way you
could have, to some extent, both the quick start and the data safety, as a
solution for a hypothetical further kind of domain requiring both - e.g. a
Disposable VM spawned to edit a file from a third party, where you want to
keep the state across a reboot or a system crash.

Re: [linux-lvm] LVM performance vs direct dm-thin

2022-02-01 Thread Demi Marie Obenour
On Sun, Jan 30, 2022 at 06:43:13PM +0100, Zdenek Kabelac wrote:
> On 30. 01. 22 at 17:45, Demi Marie Obenour wrote:
> > On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:
> > > On 30. 01. 22 at 1:32, Demi Marie Obenour wrote:
> > > > On Sat, Jan 29, 2022 at 10:32:52PM +0100, Zdenek Kabelac wrote:
> > > > > On 29. 01. 22 at 21:34, Demi Marie Obenour wrote:
> > > > > > How much slower are operations on an LVM2 thin pool compared to
> > > > > > manually managing a dm-thin target via ioctls?  I am mostly
> > > > > > concerned about volume snapshot, creation, and destruction.  Data
> > > > > > integrity is very important, so taking shortcuts that risk data
> > > > > > loss is out of the question.  However, the application may have
> > > > > > some additional information that LVM2 does not have.  For
> > > > > > instance, it may know that the volume that it is snapshotting is
> > > > > > not in use, or that a certain volume it is creating will never be
> > > > > > used after power-off.
> > > > > > 
> > > > 
> > > > > So brave developers may always write their own management tools for
> > > > > their constrained environment requirements that will be significantly
> > > > > faster in terms of how many thins you could create per minute (btw you
> > > > > will also need to consider dropping the usage of udev on such a system)
> > > > 
> > > > What kind of constraints are you referring to?  Is it possible and safe
> > > > to have udev running, but told to ignore the thins in question?
> > > 
> > > Lvm2 is oriented more towards managing a set of different disks,
> > > where the user is adding/removing/replacing them.  So it's more about
> > > recoverability, good support for manual repair (ASCII metadata),
> > > tracking the history of changes, backward compatibility, support
> > > for conversion to different volume types (i.e. caching of thins, pvmove...),
> > > support for no/udev & no/systemd, clusters, and nearly every Linux distro
> > > available... So there is a lot, and this all adds quite some complexity.
> > 
> > I am certain it does, and that makes a lot of sense.  Thanks for the
> > hard work!  Those features are all useful for Qubes OS, too — just not
> > in the VM startup/shutdown path.
> > 
> > > So once you scratch all this - and you say you only care about a single
> > > disk - then you are able to use more efficient metadata formats which
> > > you could even keep permanently in memory during the lifetime - this all
> > > adds great performance.
> > > 
> > > But it all depends on how much you can constrain your environment.
> > > 
> > > It's worth mentioning there is lvm2 support for 'external' 'thin volume'
> > > creators - lvm2 only maintains the 'thin-pool' data & metadata LVs, while
> > > thin volume creation, activation and deactivation of thins is left to an
> > > external tool. This has been used by Docker for a while - later on they
> > > switched to overlayfs, I believe.
> > 
> > That indeed sounds like a good choice for Qubes OS.  It would allow the
> > data and metadata LVs to be any volume type that lvm2 supports, and to be
> > managed using all of lvm2’s features.  So one could still put the
> > metadata on a RAID-10 volume while everything else is RAID-6, or set up
> > a dm-cache volume to store the data (please correct me if I am wrong).
> > Qubes OS has already moved to using a separate thin pool for virtual
> > machines, as it prevents dom0 (the privileged management VM) from being
> > run out of disk space (by accident or malice).  That means that the thin
> > pool used for guests is managed only by Qubes OS, and so the standard
> > lvm2 tools do not need to touch it.
> > 
> > Is this a setup that you would recommend, and would be comfortable using
> > in production?  As far as metadata is concerned, Qubes OS has its own
> > XML file containing metadata about all qubes, which should suffice for
> > this purpose.  To prevent races during updates and ensure automatic
> > crash recovery, is it sufficient to store metadata for both the new and
> > old transaction IDs, and pick the correct one based on the device-mapper
> > status line?  I have seen lvm2 get into an inconsistent state (transaction
> > ID off by one) that required manual repair before, which is quite
> > unnerving for a desktop OS.
> 
> My biased advice would be to stay with lvm2. There is a lot of work, many
> things are not well documented, and getting everything running correctly will
> take a lot of effort (Docker in fact did not manage to do it well and was
> incapable of providing any recoverability).

What did Docker do wrong?  Would it be possible for a future version of
lvm2 to automatically recover from off-by-one thin pool transaction IDs?
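
For what it is worth, the transaction ID the kernel currently holds is
visible in the thin-pool status line, so an external manager could at least
detect the mismatch on its own. A rough sketch (the pool device name is
hypothetical; per the kernel thin-provisioning documentation, the
transaction ID is the first field after the "thin-pool" target name in the
status output):

    dmsetup status qubes_dom0-vm--pool-tpool | awk '$3 == "thin-pool" { print $4 }'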

> > One feature that would be nice is to be able to import an
> > externally-provided mapping of thin pool device numbers to LV names, so
> > that lvm2 could provide a (read-only, and not guaranteed fresh) view of
> > system