Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Gionatan Danti

On 2022-01-30 22:17, Demi Marie Obenour wrote:

On Xen, the paravirtualised block backend driver (blkback) requires a
block device, so file-based virtual disks are implemented with a loop
device managed by the toolstack.  Suggestions for improving this
less-than-satisfactory situation are welcome.


Ah - I expected that with something like

disk = [ 'file:mydisk.img,hda,w' ]

Xen would directly use "mydisk.img" as the backing disk file. Does it 
instead automatically create a loopback device on top of it?


I mainly use KVM, and maybe I am spoiled by its capability to use 
basically any datastore as a backing disk.

Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8




Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Gionatan Danti

On 2022-01-30 22:39, Stuart D. Gathman wrote:

I use LVM as flexible partitions (i.e. only classic LVs, no thin pool).
Classic LVs perform like partitions, literally using the same driver
(device mapper) with a small number of extents, and are if anything
more recoverable than partition tables.  We used to put LVM on bare
drives (like AIX did) - who needs a partition table?  But on Wintel,
you need a partition table for EFI and so that alien operating systems
know there is something already on a disk.


Classical (fat) LVs are rock solid, but how do you cope with fast (maybe 
rolling) snapshotting? This is the main selling point of thin LVM.
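Just to make the comparison concrete, this is the kind of difference I have 
in mind (VG/LV names are made up):

  # thin snapshot: metadata-only, nothing to preallocate, effectively instant
  lvcreate -s -n vm1-snap1 vg/thin-vm1

  # classic (fat) LV snapshot: needs a preallocated COW area, and writes to
  # the origin are slowed down for as long as the snapshot exists
  lvcreate -s -L 10G -n vm1-snap1 vg/fat-vm1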



Since we use LVs like partitions - mixing with btrfs is not an issue.
Just use the LVs like partitions.  I haven't tried ZFS on linux - it
may have LVM like features that could fight with LVM.  ZFS would be my
first choice on a BSD box.


I use ZFS extensively - and yes, it is a wonderful tool. That said, it has 
its own gotchas. For example:
- snapshot rollback is a destructive operation (ie: after rollback, you 
permanently lose the current filesystem state);
- clones (writable snapshots) depend on the read-only base image (ie: on 
the original snapshot), which you cannot delete while any of its clones 
are around.
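To illustrate both points (pool/dataset names are made up):

  zfs snapshot tank/vm1@base
  zfs clone tank/vm1@base tank/vm1-clone

  # the base snapshot cannot be destroyed while the clone exists
  # ("snapshot has dependent clones") unless the clone is promoted first
  zfs destroy tank/vm1@base
  zfs promote tank/vm1-clone

  # rollback permanently discards everything written after @base
  # (-r also destroys any snapshots newer than @base)
  zfs rollback -r tank/vm1@base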


Moreover, snapshotting/cloning a ZFS dataset (or volume) does not appear 
to be significantly faster than LVM - sometimes it takes ~1s, depending 
on the load.



We do not use LVM raid - but either run mdraid underneath, or let btrfs
do its data duplication thing with LVs on different spindles.


I have always found btrfs to underperform badly on random-rewrite 
workloads such as VMs and DBs. Can I ask about your experience?

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8




Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Demi Marie Obenour
On Sun, Jan 30, 2022 at 04:39:30PM -0500, Stuart D. Gathman wrote:
> On Sun, 2022-01-30 at 11:45 -0500, Demi Marie Obenour wrote:
> > On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:
> > > 
> > 
> > > Since you mentioned ZFS - you might want focus on using 'ZFS-only'
> > > solution.
> > > Combining  ZFS or Btrfs with lvm2 is always going to be a painful
> > > way as
> > > those filesystems have their own volume management.
> > 
> > Absolutely!  That said, I do wonder what your thoughts on using loop
> > devices for VM storage are.  I know they are slower than thin
> > volumes,
> > but they are also much easier to manage, since they are just ordinary
> > disk files.  Any filesystem with reflink can provide the needed
> > copy-on-write support.
> 
> I use loop devices for test cases - especially with simulated IO
> errors.  Devs really appreciate having an easy reproducer for
> database/filesystem bugs (which often involve handling of IO errors). 
> But not for production VMs.
> 
> I use LVM as flexible partitions (i.e. only classic LVs, no thin pool).
> Classic LVs perform like partitions, literally using the same driver
> (device mapper) with a small number of extents, and are if anything
> more recoverable than partition tables.  We used to put LVM on bare
> drives (like AIX did) - who needs a partition table?  But on Wintel,
> you need a partition table for EFI and so that alien operating systems
> know there is something already on a disk.
> 
> Your VM usage is different from ours - you seem to need to clone and
> activate a VM quickly (like a vps provider might need to do).  We
> generally have to buy more RAM to add a new VM :-), so performance of
> creating a new LV is the least of our worries.

To put it mildly, yes :).  Ideally we could get VM boot time down to
100ms or lower.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Stuart D. Gathman
On Sun, 2022-01-30 at 11:45 -0500, Demi Marie Obenour wrote:
> On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:
> > 
> 
> > Since you mentioned ZFS - you might want focus on using 'ZFS-only'
> > solution.
> > Combining  ZFS or Btrfs with lvm2 is always going to be a painful
> > way as
> > those filesystems have their own volume management.
> 
> Absolutely!  That said, I do wonder what your thoughts on using loop
> devices for VM storage are.  I know they are slower than thin
> volumes,
> but they are also much easier to manage, since they are just ordinary
> disk files.  Any filesystem with reflink can provide the needed
> copy-on-write support.

I use loop devices for test cases - especially with simulated IO
errors.  Devs really appreciate having an easy reproducer for
database/filesystem bugs (which often involve handling of IO errors). 
But not for production VMs.

I use LVM as flexible partitions (i.e. only classic LVs, no thin pool).
Classic LVs perform like partitions, literally using the same driver
(device mapper) with a small number of extents, and are if anything
more recoverable than partition tables.  We used to put LVM on bare
drives (like AIX did) - who needs a partition table?  But on Wintel,
you need a partition table for EFI and so that alien operating systems
know there is something already on a disk.

Your VM usage is different from ours - you seem to need to clone and
activate a VM quickly (like a vps provider might need to do).  We
generally have to buy more RAM to add a new VM :-), so performance of
creating a new LV is the least of our worries.

Since we use LVs like partitions - mixing with btrfs is not an issue. 
Just use the LVs like partitions.  I haven't tried ZFS on linux - it
may have LVM like features that could fight with LVM.  ZFS would be my
first choice on a BSD box.

We do not use LVM raid - but either run mdraid underneath, or let btrfs
do its data duplication thing with LVs on different spindles.
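Roughly, the two shapes look like this (device names and sizes are just 
examples):

  # mdraid underneath LVM: LVM only sees one mirrored PV
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0

  # or: one VG over both disks, LVs pinned to different spindles,
  # and btrfs doing its own raid1 across them
  vgcreate vg1 /dev/sdc2 /dev/sdd2
  lvcreate -L 200G -n data1 vg1 /dev/sdc2
  lvcreate -L 200G -n data2 vg1 /dev/sdd2
  mkfs.btrfs -m raid1 -d raid1 /dev/vg1/data1 /dev/vg1/data2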






Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Demi Marie Obenour
On Sun, Jan 30, 2022 at 09:27:56PM +0100, Gionatan Danti wrote:
> On 2022-01-30 18:43, Zdenek Kabelac wrote:
> > Chain filesystem->block_layer->filesystem->block_layer is something
> > you most likely do not want to use for any well performing solution...
> > But it's ok for testing...
> 
> I second that.
> 
> Demi Marie - just a question: are you sure do you really needs a block
> device? I don't know QubeOS, but both KVM and Xen can use files as virtual
> disks. This would enable you to ignore loopback mounts.

On Xen, the paravirtualised block backend driver (blkback) requires a
block device, so file-based virtual disks are implemented with a loop
device managed by the toolstack.  Suggestions for improving this
less-than-satisfactory situation are welcome.
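Roughly speaking, for a file-backed disk the toolstack ends up doing the 
equivalent of the following (paths and domain name are illustrative, not the 
actual toolstack code):

  loopdev=$(losetup -f --show /var/lib/xen/images/mydisk.img)
  xl block-attach guest "phy:${loopdev},xvda,w"
  # ...and on detach:
  xl block-detach guest xvda
  losetup -d "$loopdev"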
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-30 Thread Gionatan Danti

On 2022-01-30 12:18, Zdenek Kabelac wrote:

Thin is more oriented towards extreme speed.

VDO is more about 'compression & deduplication' - so space efficiency.

Combining both together is kind of harming their advantages.


Unfortunately, it is currently the only solution for combining snapshots 
with data compression/deduplication.

Integrating snapshot capability into VDO would be awesome!
Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8




Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Gionatan Danti

On 2022-01-30 18:43, Zdenek Kabelac wrote:

Chain filesystem->block_layer->filesystem->block_layer is something
you most likely do not want to use for any well performing solution...
But it's ok for testing...


I second that.

Demi Marie - just a question: are you sure you really need a block 
device? I don't know Qubes OS, but both KVM and Xen can use files as 
virtual disks. That would let you avoid loopback mounts.


Regards.


--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8




Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-30 Thread Zdenek Kabelac

On 30. 01. 22 at 19:01, Demi Marie Obenour wrote:

On Sun, Jan 30, 2022 at 06:56:43PM +0100, Zdenek Kabelac wrote:

On 30. 01. 22 at 18:30, Demi Marie Obenour wrote:

On Sun, Jan 30, 2022 at 12:18:32PM +0100, Zdenek Kabelac wrote:

They always land in the upstream kernel once they are validated &
tested (recent kernels already have many speed enhancements).


Thanks!  Which mailing list should I be watching?


lkml


You could easily run individual blkdiscards in parallel for your thin LVs.
For most modern drives, though, it's somewhat a waste of time...

Those trimming tools should be used when they solve some real
problem; running them just for fun is just a waste of energy & performance.


My understanding (which could be wrong) is that periodic trim is
necessary for SSDs.


This was useful for archaic SSDs. Modern SSD/NVMe drives are much smarter...

Regards

Zdenek




Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-30 Thread Demi Marie Obenour
On Sun, Jan 30, 2022 at 06:56:43PM +0100, Zdenek Kabelac wrote:
> On 30. 01. 22 at 18:30, Demi Marie Obenour wrote:
> > On Sun, Jan 30, 2022 at 12:18:32PM +0100, Zdenek Kabelac wrote:
> > > Discard of thins itself is AFAIC pretty fast - unless you have massively
> > > sized thin devices with many GiB of metadata - obviously you cannot 
> > > process
> > > this amount of metadata in nanoseconds (and there are prepared kernel
> > > patches to make it even faster)
> > 
> > Would you be willing and able to share those patches?
> 
> Then are always landing in upstream kernel once they are all validated &
> tested (recent kernel already has many speed enhancements).

Thanks!  Which mailing list should I be watching?

> > > What is the problem is the speed of discard of physical devices.
> > > You could actually try to feel difference with:
> > > lvchange --discards passdown|nopassdown thinpool
> > 
> > In Qubes OS I believe we do need the discards to be passed down
> > eventually, but I doubt it needs to be synchronous.  Being able to run
> > the equivalent of `fstrim -av` periodically would be amazing.  I’m
> > CC’ing Marek Marczykowski-Górecki (Qubes OS project lead) in case he
> > has something to say.
> 
> You could easily run in parallel individual blkdiscards for your thin LVs
> For most modern drives thought it's somewhat waste of time...
> 
> Those trimming tools should be used when they are solving some real
> problems, running them just for fun is just energy & performance waste

My understanding (which could be wrong) is that periodic trim is
necessary for SSDs.

> > > Also it's very important to keep metadata on fast storage device 
> > > (SSD/NVMe)!
> > > Keeping metadata on same hdd spindle as data is always going to feel slow
> > > (in fact it's quite pointless to talk about performance and use hdd...)
> > 
> > That explains why I had such a horrible experience with my initial
> > (split between NVMe and HDD) install.  I would not be surprised if some
> > or all of the metadata volume wound up on the spinning disk.
> 
> With lvm2 user can always 'pvmove'  any LV to any desired PV.
> There is not yet any 'smart' logic to do it automatically.

Good point.  I was probably unware of that at the time.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-30 Thread Zdenek Kabelac

On 30. 01. 22 at 18:30, Demi Marie Obenour wrote:

On Sun, Jan 30, 2022 at 06:56:43PM +0100, Zdenek Kabelac wrote:

On 30. 01. 22 at 2:20, Demi Marie Obenour wrote:

On Sat, Jan 29, 2022 at 10:40:34PM +0100, Zdenek Kabelac wrote:

On 29. 01. 22 at 21:09, Demi Marie Obenour wrote:

On Sat, Jan 29, 2022 at 08:42:21PM +0100, Zdenek Kabelac wrote:

On 29. 01. 22 at 19:52, Demi Marie Obenour wrote:



Discard of thins itself is AFAIC pretty fast - unless you have massively
sized thin devices with many GiB of metadata - obviously you cannot process
this amount of metadata in nanoseconds (and there are prepared kernel
patches to make it even faster)


Would you be willing and able to share those patches?


They always land in the upstream kernel once they are validated & 
tested (recent kernels already have many speed enhancements).





The problem is the speed of discard on the physical devices.
You could actually try to feel the difference with:
lvchange --discards passdown|nopassdown thinpool


In Qubes OS I believe we do need the discards to be passed down
eventually, but I doubt it needs to be synchronous.  Being able to run
the equivalent of `fstrim -av` periodically would be amazing.  I’m
CC’ing Marek Marczykowski-Górecki (Qubes OS project lead) in case he
has something to say.


You could easily run individual blkdiscards in parallel for your thin LVs.
For most modern drives, though, it's somewhat a waste of time...

Those trimming tools should be used when they solve some real problem;
running them just for fun is just a waste of energy & performance.
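If someone really wants to, a minimal sketch (VG and pool names are made up, 
and the thin LVs must be active so their device nodes exist):

  # discard every thin LV of vg/pool, four at a time
  lvs --noheadings -o lv_path -S 'pool_lv=pool' vg | xargs -n1 -P4 blkdiscard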





Also it's very important to keep metadata on a fast storage device (SSD/NVMe)!
Keeping metadata on the same HDD spindle as the data is always going to feel slow
(in fact it's quite pointless to talk about performance and use an HDD...)


That explains why I had such a horrible experience with my initial
(split between NVMe and HDD) install.  I would not be surprised if some
or all of the metadata volume wound up on the spinning disk.


With lvm2, the user can always 'pvmove' any LV to any desired PV.
There is not yet any 'smart' logic to do it automatically.
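For the metadata case above, something along these lines should do it 
(names are illustrative; _tmeta is the pool's hidden metadata sub-LV):

  # relocate only the thin pool metadata from the HDD PV to the NVMe PV
  pvmove -n vg/pool_tmeta /dev/sdb1 /dev/nvme0n1p3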


add support for efficient snapshots of data stored on a VDO volume, and
to have multiple volumes on top of a single VDO volume.  Furthermore,


We hope we will add some direct 'snapshot' support to VDO so users will not
need to combine both technologies together.


Does that include support for splitting a VDO volume into multiple,
individually-snapshottable volumes, the way thin works?


Yes - that's the plan - to have multiple VDO LVs in a single VDOPool.

Regards

Zdenek


Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Zdenek Kabelac

On 30. 01. 22 at 17:45, Demi Marie Obenour wrote:

On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:

On 30. 01. 22 at 1:32, Demi Marie Obenour wrote:

On Sat, Jan 29, 2022 at 10:32:52PM +0100, Zdenek Kabelac wrote:

On 29. 01. 22 at 21:34, Demi Marie Obenour wrote:

How much slower are operations on an LVM2 thin pool compared to manually
managing a dm-thin target via ioctls?  I am mostly concerned about
volume snapshot, creation, and destruction.  Data integrity is very
important, so taking shortcuts that risk data loss is out of the
question.  However, the application may have some additional information
that LVM2 does not have.  For instance, it may know that the volume that
it is snapshotting is not in use, or that a certain volume it is
creating will never be used after power-off.




So brave developers may always write their own management tools for their
constrained environment requirements that will be significantly faster in
terms of how many thins you can create per minute (btw, you will also need
to consider dropping the usage of udev on such a system)


What kind of constraints are you referring to?  Is it possible and safe
to have udev running, but told to ignore the thins in question?


Lvm2 is oriented more towards managing sets of different disks,
where the user is adding/removing/replacing them.  So it's more about
recoverability, good support for manual repair (ascii metadata),
tracking history of changes, backward compatibility, support
for conversion to different volume types (i.e. caching of thins, pvmove...),
support for no/udev & no/systemd, clusters and nearly every linux distro
available... So there is a lot - and this all adds quite a bit of complexity.


I am certain it does, and that makes a lot of sense.  Thanks for the
hard work!  Those features are all useful for Qubes OS, too — just not
in the VM startup/shutdown path.


So once you scratch all this - and you say you only care about a single disk -
then you are able to use more efficient metadata formats, which you could
even keep permanently in memory during their lifetime - this all adds great
performance.

But it all depends on how far you can constrain your environment.

It's worth mentioning that there is lvm2 support for 'external' thin volume
creators - lvm2 only maintains the thin-pool data & metadata LVs, while thin
volume creation, activation and deactivation is left to an external tool.
This has been used by Docker for a while - later on they switched to
OverlayFS, I believe.


That indeed sounds like a good choice for Qubes OS.  It would allow the
data and metadata LVs to be any volume type that lvm2 supports, and
managed using all of lvm2’s features.  So one could still put the
metadata on a RAID-10 volume while everything else is RAID-6, or set up
a dm-cache volume to store the data (please correct me if I am wrong).
Qubes OS has already moved to using a separate thin pool for virtual
machines, as it prevents dom0 (privileged management VM) from being run
out of disk space (by accident or malice).  That means that the thin
pool use for guests is managed only by Qubes OS, and so the standard
lvm2 tools do not need to touch it.

Is this a setup that you would recommend, and would be comfortable using
in production?  As far as metadata is concerned, Qubes OS has its own
XML file containing metadata about all qubes, which should suffice for
this purpose.  To prevent races during updates and ensure automatic
crash recovery, is it sufficient to store metadata for both new and old
transaction IDs, and pick the correct one based on the device-mapper
status line?  I have seen lvm2 get in an inconsistent state (transaction
ID off by one) that required manual repair before, which is quite
unnerving for a desktop OS.


My biased advice would be to stay with lvm2. There is a lot of work, many things
are not well documented, and getting everything running correctly will take a
lot of effort (Docker in fact did not manage to do it well and was incapable
of providing any recoverability).



One feature that would be nice is to be able to import an
externally-provided mapping of thin pool device numbers to LV names, so
that lvm2 could provide a (read-only, and not guaranteed fresh) view of
system state for reporting purposes.


Once you have evidence that it's lvm2 causing a major issue, you could
consider whether it's worth stepping into a separate project.




It's worth mentioning - the more bullet-proof you want to make your
project, the closer you will get to the extra processing done by lvm2.


Why is this?  How does lvm2 compare to stratis, for example?


Stratis is yet another volume manager written in Rust combined with XFS for
easier user experience. That's all I'd probably say about it...


That’s fine.  I guess my question is why making lvm2 bullet-proof needs
so much overhead.


It's difficult - if you would be distributing lvm2 with exact kernel version & 
udev & systemd with a single linux distro - it 

Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-30 Thread Demi Marie Obenour
On Sun, Jan 30, 2022 at 12:18:32PM +0100, Zdenek Kabelac wrote:
> On 30. 01. 22 at 2:20, Demi Marie Obenour wrote:
> > On Sat, Jan 29, 2022 at 10:40:34PM +0100, Zdenek Kabelac wrote:
> > > On 29. 01. 22 at 21:09, Demi Marie Obenour wrote:
> > > > On Sat, Jan 29, 2022 at 08:42:21PM +0100, Zdenek Kabelac wrote:
> > > > > On 29. 01. 22 at 19:52, Demi Marie Obenour wrote:
> > > > > > Is it possible to configure LVM2 so that it runs thin_trim before it
> > > > > > activates a thin pool?  Qubes OS currently runs blkdiscard on every 
> > > > > > thin
> > > > > > volume before deleting it, which is slow and unreliable.  Would 
> > > > > > running
> > > > > > thin_trim during system startup provide a better alternative?
> > > > > 
> > > > > Hi
> > > > > 
> > > > > 
> > > > > Nope there is currently no support from lvm2 side for this.
> > > > > Feel free to open RFE.
> > > > 
> > > > Done: https://bugzilla.redhat.com/show_bug.cgi?id=2048160
> > > > 
> > > > 
> > > 
> > > Thanks
> > > 
> > > Although your use-case Thinpool on top of VDO is not really a good plan 
> > > and
> > > there is a good reason behind why lvm2 does not support this device stack
> > > directly (aka thin-pool data LV as VDO LV).
> > > I'd say you are stepping on very very thin ice...
> > 
> > Thin pool on VDO is not my actual use-case.  The actual reason for the
> > ticket is slow discards of thin devices that are about to be deleted;
> 
> Hi
> 
> Discard of thins itself is AFAIC pretty fast - unless you have massively
> sized thin devices with many GiB of metadata - obviously you cannot process
> this amount of metadata in nanoseconds (and there are prepared kernel
> patches to make it even faster)

Would you be willing and able to share those patches?

> What is the problem is the speed of discard of physical devices.
> You could actually try to feel difference with:
> lvchange --discards passdown|nopassdown thinpool

In Qubes OS I believe we do need the discards to be passed down
eventually, but I doubt it needs to be synchronous.  Being able to run
the equivalent of `fstrim -av` periodically would be amazing.  I’m
CC’ing Marek Marczykowski-Górecki (Qubes OS project lead) in case he
has something to say.
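For comparison, the filesystem-level analogue already exists as a stock 
util-linux timer (assuming a systemd-based guest or dom0):

  systemctl enable --now fstrim.timer   # weekly trim of mounted filesystems

What seems to be missing is the equivalent at the thin-pool level.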

> Also it's very important to keep metadata on fast storage device (SSD/NVMe)!
> Keeping metadata on same hdd spindle as data is always going to feel slow
> (in fact it's quite pointless to talk about performance and use hdd...)

That explains why I had such a horrible experience with my initial
(split between NVMe and HDD) install.  I would not be surprised if some
or all of the metadata volume wound up on the spinning disk.

> > you can find more details in the linked GitHub issue.  That said, now I
> > am curious why you state that dm-thin on top of dm-vdo (that is,
> > userspace/filesystem/VM/etc ⇒ dm-thin data (*not* metadata) ⇒ dm-vdo ⇒
> > hardware/dm-crypt/etc) is a bad idea.  It seems to be a decent way to
> 
> Out-of-space recoveries are ATM much harder then what we want.

Okay, thanks!  Will this be fixed in a future version?

> So as long as user can maintain free space of your VDO and thin-pool it's
> ok. Once user runs out of space - recovery is pretty hard task (and there is
> reason we have support...)

Out of space is already a tricky issue in Qubes OS.  I certainly would
not want to make it worse.

> > add support for efficient snapshots of data stored on a VDO volume, and
> > to have multiple volumes on top of a single VDO volume.  Furthermore,
> 
> We hope we will add some direct 'snapshot' support to VDO so users will not
> need to combine both technologies together.

Does that include support for splitting a VDO volume into multiple,
individually-snapshottable volumes, the way thin works?

> Thin is more oriented towards extreme speed.
> VDO is more about 'compression & deduplication' - so space efficiency.
> 
> Combining both together is kind of harming their advantages.

That makes sense.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Demi Marie Obenour
On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:
> On 30. 01. 22 at 1:32, Demi Marie Obenour wrote:
> > On Sat, Jan 29, 2022 at 10:32:52PM +0100, Zdenek Kabelac wrote:
> > > On 29. 01. 22 at 21:34, Demi Marie Obenour wrote:
> > > > How much slower are operations on an LVM2 thin pool compared to manually
> > > > managing a dm-thin target via ioctls?  I am mostly concerned about
> > > > volume snapshot, creation, and destruction.  Data integrity is very
> > > > important, so taking shortcuts that risk data loss is out of the
> > > > question.  However, the application may have some additional information
> > > > that LVM2 does not have.  For instance, it may know that the volume that
> > > > it is snapshotting is not in use, or that a certain volume it is
> > > > creating will never be used after power-off.
> > > > 
> > 
> > > So brave developers may always write their own management tools for their
> > > constrained environment requirements that will by significantly faster in
> > > terms of how many thins you could create per minute (btw you will need to
> > > also consider dropping usage of udev on such system)
> > 
> > What kind of constraints are you referring to?  Is it possible and safe
> > to have udev running, but told to ignore the thins in question?
> 
> Lvm2 is oriented more towards managing set of different disks,
> where user is adding/removing/replacing them.  So it's more about
> recoverability, good support for manual repair  (ascii metadata),
> tracking history of changes,  backward compatibility, support
> of conversion to different volume types (i.e. caching of thins, pvmove...)
> Support for no/udev & no/systemd, clusters and nearly every linux distro
> available... So there is a lot - and this all adds quite complexity.

I am certain it does, and that makes a lot of sense.  Thanks for the
hard work!  Those features are all useful for Qubes OS, too — just not
in the VM startup/shutdown path.

> So once you scratch all this - and you say you only care about single disc
> then you are able to use more efficient metadata formats which you could
> even keep permanently in memory during the lifetime - this all adds great
> performance.
> 
> But it all depends how you could constrain your environment.
> 
> It's worth to mention there is lvm2 support for 'external' 'thin volume'
> creators - so lvm2 only maintains 'thin-pool' data & metadata LV - but thin
> volume creation, activation, deactivation of thins is left to external tool.
> This has been used by docker for a while - later on they switched to
> overlayFs I believe..

That indeed sounds like a good choice for Qubes OS.  It would allow the
data and metadata LVs to be any volume type that lvm2 supports, and
managed using all of lvm2’s features.  So one could still put the
metadata on a RAID-10 volume while everything else is RAID-6, or set up
a dm-cache volume to store the data (please correct me if I am wrong).
Qubes OS has already moved to using a separate thin pool for virtual
machines, as it prevents dom0 (privileged management VM) from being run
out of disk space (by accident or malice).  That means that the thin
pool use for guests is managed only by Qubes OS, and so the standard
lvm2 tools do not need to touch it.
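Concretely, I imagine something along these lines (names and sizes are made 
up, and raid10/raid6 of course need enough PVs in the VG):

  # metadata on fast mirrored storage, data on big parity RAID
  lvcreate --type raid10 -L 2G   -n vmpool_meta vg
  lvcreate --type raid6  -L 500G -n vmpool_data vg
  # combine them into a thin pool (the pool keeps the data LV's name)
  lvconvert --type thin-pool --poolmetadata vg/vmpool_meta vg/vmpool_data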

Is this a setup that you would recommend, and would be comfortable using
in production?  As far as metadata is concerned, Qubes OS has its own
XML file containing metadata about all qubes, which should suffice for
this purpose.  To prevent races during updates and ensure automatic
crash recovery, is it sufficient to store metadata for both new and old
transaction IDs, and pick the correct one based on the device-mapper
status line?  I have seen lvm2 get in an inconsistent state (transaction
ID off by one) that required manual repair before, which is quite
unnerving for a desktop OS.
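For reference, the transaction ID I mean is the first field after the target 
name on the pool's status line, e.g. (device name and numbers are illustrative):

  dmsetup status vg-vmpool-tpool
  # 0 1048576000 thin-pool 42 1280/65536 8192/819200 - rw discard_passdown queue_if_no_space -
  #                        ^^ transaction_id, to compare against our own stored IDs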

One feature that would be nice is to be able to import an
externally-provided mapping of thin pool device numbers to LV names, so
that lvm2 could provide a (read-only, and not guaranteed fresh) view of
system state for reporting purposes.

> > > It's worth to mention - the more bullet-proof you will want to make your
> > > project - the more closer to the extra processing made by lvm2 you will 
> > > get.
> > 
> > Why is this?  How does lvm2 compare to stratis, for example?
> 
> Stratis is yet another volume manager written in Rust combined with XFS for
> easier user experience. That's all I'd probably say about it...

That’s fine.  I guess my question is why making lvm2 bullet-proof needs
so much overhead.

> > > However before you will step into these waters - you should probably
> > > evaluate whether thin-pool actually meet your needs if you have that high
> > > expectation for number of supported volumes - so you will not end up with
> > > hyper fast snapshot creation while the actual usage then is not meeting 
> > > your
> > > needs...
> > 
> > What needs are you thinking of specifically?  Qubes OS needs block
> 

Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-30 Thread Zdenek Kabelac

On 30. 01. 22 at 2:20, Demi Marie Obenour wrote:

On Sat, Jan 29, 2022 at 10:40:34PM +0100, Zdenek Kabelac wrote:

On 29. 01. 22 at 21:09, Demi Marie Obenour wrote:

On Sat, Jan 29, 2022 at 08:42:21PM +0100, Zdenek Kabelac wrote:

On 29. 01. 22 at 19:52, Demi Marie Obenour wrote:

Is it possible to configure LVM2 so that it runs thin_trim before it
activates a thin pool?  Qubes OS currently runs blkdiscard on every thin
volume before deleting it, which is slow and unreliable.  Would running
thin_trim during system startup provide a better alternative?
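Concretely, the idea would be to run something like the following once 
against an inactive pool, instead of many per-volume blkdiscards (flags as in 
thin_trim(8); the device paths are illustrative and assume the pool's 
component LVs can be activated separately):

  thin_trim --data-dev /dev/mapper/vg-pool_tdata --metadata-dev /dev/mapper/vg-pool_tmeta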


Hi


Nope there is currently no support from lvm2 side for this.
Feel free to open RFE.


Done: https://bugzilla.redhat.com/show_bug.cgi?id=2048160




Thanks

Although your use-case (a thin pool on top of VDO) is not really a good plan,
and there is a good reason why lvm2 does not support this device stack
directly (i.e. a thin-pool data LV placed on a VDO LV).
I'd say you are stepping on very, very thin ice...


Thin pool on VDO is not my actual use-case.  The actual reason for the
ticket is slow discards of thin devices that are about to be deleted;


Hi

Discard of thins itself is AFAIC pretty fast - unless you have massively sized 
thin devices with many GiB of metadata - obviously you cannot process this 
amount of metadata in nanoseconds (and there are prepared kernel patches to 
make it even faster)


The problem is the speed of discard on the physical devices.
You could actually try to feel the difference with:
lvchange --discards passdown|nopassdown thinpool
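You can check what a given pool currently uses via the 'discards' reporting 
field:

  lvs -o lv_name,discards,lv_size vg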

Also it's very important to keep metadata on a fast storage device (SSD/NVMe)!
Keeping metadata on the same HDD spindle as the data is always going to feel slow
(in fact it's quite pointless to talk about performance and use an HDD...)


you can find more details in the linked GitHub issue.  That said, now I
am curious why you state that dm-thin on top of dm-vdo (that is,
userspace/filesystem/VM/etc ⇒ dm-thin data (*not* metadata) ⇒ dm-vdo ⇒
hardware/dm-crypt/etc) is a bad idea.  It seems to be a decent way to


Out-of-space recoveries are ATM much harder than we would want.

So as long as the user can maintain free space on your VDO and thin-pool, it's ok.
Once the user runs out of space, recovery is a pretty hard task (and there is a
reason we have support...)



add support for efficient snapshots of data stored on a VDO volume, and
to have multiple volumes on top of a single VDO volume.  Furthermore,


We hope we will add some direct 'snapshot' support to VDO so users will not 
need to combine both technologies together.


Thin is more oriented towards extreme speed.
VDO is more about 'compression & deduplication' - so space efficiency.

Combining both together is kind of harming their advantages.


https://access.redhat.com/articles/2106521#vdo recommends exactly this
use-case.  Or am I misunderstanding you?


There are many paths to Rome...
So as mentioned above - you need to pick between performance and space efficiency.
And since you want to write your own thin volume managing software, I'm
guessing you care about performance a lot (so do we - but within the given
constraints that limit us to some level)...



Also I assume you have already checked performance of discard on VDO, but I
would not want to run this operation frequently on any larger volume...


I have never actually used VDO myself, although the documentation does
warn about this.


It's been purely related to the initial BZ description, which cares a lot about
thin discard performance, and the following comment adds VDO discard into the
same equation... :)


Regards

Zdenek


Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Zdenek Kabelac

On 30. 01. 22 at 1:32, Demi Marie Obenour wrote:

On Sat, Jan 29, 2022 at 10:32:52PM +0100, Zdenek Kabelac wrote:

On 29. 01. 22 at 21:34, Demi Marie Obenour wrote:

How much slower are operations on an LVM2 thin pool compared to manually
managing a dm-thin target via ioctls?  I am mostly concerned about
volume snapshot, creation, and destruction.  Data integrity is very
important, so taking shortcuts that risk data loss is out of the
question.  However, the application may have some additional information
that LVM2 does not have.  For instance, it may know that the volume that
it is snapshotting is not in use, or that a certain volume it is
creating will never be used after power-off.




So brave developers may always write their own management tools for their
constrained environment requirements that will be significantly faster in
terms of how many thins you can create per minute (btw, you will also need
to consider dropping the usage of udev on such a system)


What kind of constraints are you referring to?  Is it possible and safe
to have udev running, but told to ignore the thins in question?


Lvm2 is oriented more towards managing sets of different disks,
where the user is adding/removing/replacing them.  So it's more about
recoverability, good support for manual repair (ascii metadata),
tracking history of changes, backward compatibility, support
for conversion to different volume types (i.e. caching of thins, pvmove...),
support for no/udev & no/systemd, clusters and nearly every linux distro
available... So there is a lot - and this all adds quite a bit of complexity.


So once you scratch all this - and you say you only care about a single disk -
then you are able to use more efficient metadata formats, which you could even
keep permanently in memory during their lifetime - this all adds great performance.

But it all depends on how far you can constrain your environment.

It's worth mentioning that there is lvm2 support for 'external' thin volume
creators - lvm2 only maintains the thin-pool data & metadata LVs, while thin
volume creation, activation and deactivation is left to an external tool.
This has been used by Docker for a while - later on they switched to
OverlayFS, I believe.
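For reference, underneath it all this is the raw dm-thin interface such an 
external tool (or lvm2 itself) drives - a rough sketch following the kernel's 
thin-provisioning documentation, with made-up names, sizes and device IDs:

  # pool target on top of already-prepared metadata/data devices
  dmsetup create pool --table \
    "0 $(blockdev --getsz /dev/vg/pooldata) thin-pool /dev/vg/poolmeta /dev/vg/pooldata 128 32768"

  # create thin device #0 and map 1GiB of it
  dmsetup message /dev/mapper/pool 0 "create_thin 0"
  dmsetup create thin0 --table "0 2097152 thin /dev/mapper/pool 0"

  # snapshot it as device #1 (suspend/resume the origin around the message)
  dmsetup suspend /dev/mapper/thin0
  dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
  dmsetup resume /dev/mapper/thin0
  dmsetup create thin0-snap --table "0 2097152 thin /dev/mapper/pool 1"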





It's worth mentioning - the more bullet-proof you want to make your
project, the closer you will get to the extra processing done by lvm2.


Why is this?  How does lvm2 compare to stratis, for example?


Stratis is yet another volume manager written in Rust combined with XFS for 
easier user experience. That's all I'd probably say about it...



However, before you step into these waters, you should probably
evaluate whether thin-pool actually meets your needs if you have such high
expectations for the number of supported volumes - so you will not end up with
hyper-fast snapshot creation while the actual usage does not meet your
needs...


What needs are you thinking of specifically?  Qubes OS needs block
devices, so filesystem-backed storage would require the use of loop
devices unless I use ZFS zvols.  Do you have any specific
recommendations?


As long as you live in a world without crashes, buggy kernels, buggy apps and
failing hard drives, everything looks very simple.

And every development costs quite some time & money.

Since you mentioned ZFS - you might want to focus on a 'ZFS-only' solution.
Combining ZFS or Btrfs with lvm2 is always going to be painful, as those
filesystems have their own volume management.


Regards

Zdenek
