Re: [linux-lvm] Can I combine LUKS and LVM to achieve encryption and snapshots?

2023-09-26 Thread Demi Marie Obenour
On Wed, Sep 27, 2023 at 01:10:10AM +0200, Jean-Marc Saffroy wrote:
> Hi,
> 
> On Tue, Sep 26, 2023 at 10:00 PM Zdenek Kabelac
>  wrote:
> > Yep, typical usage is to encrypt the underlying PV - and then create LVs and their
> > snapshots on the encrypted device.
> 
> Sure, I'd do that in other circumstances.
> 
> But in my case it would just be a waste: I am replacing several disks
> on a desktop computer with a single 2TB NVME SSD for everything. Only
> /home needs to be encrypted, and it's tiny, like 100-200GB. Going
> through encryption for most application I/Os would use CPU time and
> increase latency with no benefit.

"No benefit" depends on one's threat model.  A surprising amount of
sensitive data gets put outside of /home.  For instance, SSH host keys
are in /etc, and system daemons store their data in /var.  That's why
the standard is to encrypt the entire drive, except for /boot and
/boot/efi.  It's the only way to ensure that sensitive data doesn't wind
up on the NVMe drive, from which it cannot be removed except by
destroying or (cryptographically) securely erasing the drive.
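
For reference, the encrypt-the-PV layout described above looks roughly like
this (a minimal sketch; the device, VG, and LV names are assumptions, not
taken from this thread):

    # Encrypt the whole partition once, then build LVM on top of it.
    cryptsetup luksFormat /dev/nvme0n1p3
    cryptsetup open /dev/nvme0n1p3 cryptpv
    pvcreate /dev/mapper/cryptpv           # the PV lives on the encrypted device
    vgcreate vg0 /dev/mapper/cryptpv
    lvcreate -n home -L 200G vg0           # every LV and snapshot is encrypted

Only /boot (and the EFI system partition) stays outside the encrypted PV.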
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [linux-lvm] lvconvert --uncache takes hours

2023-03-01 Thread Demi Marie Obenour
On Wed, Mar 01, 2023 at 11:44:00PM +0100, Roy Sigurd Karlsbakk wrote:
> Hi all
> 
> Working with a friend's machine, it has lvmcache turned on with writeback.
> This has worked well, but now it's uncaching and it takes *hours*. The amount
> of cache was chosen to be 100GB on an SSD not used for much else, and the dataset
> being cached is a RAID-6 set of 10x2TB with XFS on top. The system
> mainly does file serving, but also has some VMs that benefit from the
> caching quite a bit. But then - I wonder - how can it spend hours emptying
> the cache like this? Most write caching I know of lasts only seconds or
> perhaps, in really worst case scenarios, minutes. Since this is taking hours,
> it looks to me like something should have been flushed ages ago.
> 
> Have I (or we) done something very stupid here or is this really how it's 
> supposed to work?

It’s likely normal.  HDDs stink at small random writes, and RAID-6 makes
this even worse: uncaching has to flush whatever dirty blocks are in the
100GB write-back cache back to the array as small random I/O, so taking
hours is plausible.  That said, I *strongly* recommend using a three-disk
RAID-1 for the cache, to match the redundancy of the RAID-6.  With
write-back caching, a failed cache device will result in a corrupt and
unrecoverable filesystem.
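
A rough sketch of what that could look like (VG, LV, and device names are
assumptions, and the raid1-as-cache-pool syntax should be checked against
lvmcache(7) for your lvm2 version):

    # Cache data and metadata as 3-way RAID-1 on three SSDs, matching the
    # two-disk-failure tolerance of the RAID-6 origin LV.
    lvcreate --type raid1 -m 2 -L 100G -n cpool      vg /dev/sdx /dev/sdy /dev/sdz
    lvcreate --type raid1 -m 2 -L 1G   -n cpool_meta vg /dev/sdx /dev/sdy /dev/sdz
    lvconvert --type cache-pool --poolmetadata vg/cpool_meta vg/cpool
    lvconvert --type cache --cachepool vg/cpool --cachemode writeback vg/data

    # Detaching the cache later flushes all dirty blocks, which is the slow part:
    lvconvert --uncache vg/data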
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [linux-lvm] How to implement live migration of VMs in thinlv after using lvmlockd

2022-11-01 Thread Demi Marie Obenour
On Tue, Nov 01, 2022 at 12:57:56PM -0500, David Teigland wrote:
> On Wed, Nov 02, 2022 at 01:02:27AM +0800, Zhiyong Ye wrote:
> > Hi Dave,
> > 
> > Thank you for your reply!
> > 
> > Does this mean that there is no way to live migrate VMs when using lvmlockd?
> 
> You could by using linear LVs, ovirt does this using sanlock directly,
> since lvmlockd arrived later.

Another approach would be to use thin provisioning on the SAN instead of
at the LVM level.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [linux-lvm] LVM2 : performance drop even after deleting the snapshot

2022-10-17 Thread Demi Marie Obenour
On Fri, Oct 14, 2022 at 10:28:28PM +0200, Roberto Fastec wrote:
> TIP and HINT
> forget SSDs with LVM unless of enterprise level 
> especially if you are going to use/implement the thin provisioning

As a user and developer of Qubes OS, I find this advice worrying.  Qubes OS
uses LVM thin provisioning heavily in its default configuration, and it is
designed for end-user systems.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [linux-lvm] lvmpolld causes high cpu load issue

2022-08-17 Thread Demi Marie Obenour
On Wed, Aug 17, 2022 at 05:26:08PM +0200, Zdenek Kabelac wrote:
> Dne 17. 08. 22 v 15:41 Martin Wilck napsal(a):
> > On Wed, 2022-08-17 at 14:54 +0200, Zdenek Kabelac wrote:
> > > Dne 17. 08. 22 v 14:39 Martin Wilck napsal(a):
> > > 
> > > 
> > > Let's make clear we are very well aware of all the constraints
> > > associated with udev rule logic (and we tried quite hard to minimize
> > > the impact - however the udevd developers kind of 'misunderstood' how
> > > badly they would be impacting the system's performance with the
> > > existing watch rule logic - and the story kind of 'continues' with
> > > systemd's D-Bus services, unfortunately...
> > 
> > I dimly remember you dislike udev ;-)
> 
> Well it's not 'a dislike' from my side - but the architecture alone is just
> missing in many areas...
> 
> Dave is a complete disliker of udev & systemd all together :)

I find udev useful for physical devices, but for virtual devices it is a
terrible fit.  It is far too slow and full of race conditions.

Ideally, device-mapper ioctls would use diskseq instead of major+minor
number everywhere, and devices would be named after the diskseq.

> > I like the general idea of the udev watch. It is the magic that causes
> > newly created partitions to magically appear in the system, which is
> 
> Tragedy of design comes from the plain fact that there are only 'very
> occasional' consumers of all these 'collected' data - but gathering all the
> info and keeping all of it 'up-to-date' is getting very very expensive and
> can basically 'neutralize' a lot of your CPU if you have too many resources
> to watch and keep update.
> 
> 
> > very convenient for users and wouldn't work otherwise. I can see that
> > it might be inappropriate for LVM PVs. We can discuss changing the
> > rules such that the watch is disabled for LVM devices (both PV and LV).
> 
> It's really not fixable as is - because of the complete lack of 'error'
> handling of devices in the udev DB (i.e. duplicate devices, various frozen
> devices...)
> 
> There is an ongoing 'SID' project - that might push the logic somewhat
> further, but the existing 'device' support logic as it stands today is an
> unfortunate 'trace' of how the design should not have been made - and since
> all the 'original' programmers left the project a long time ago, it's
> non-trivial to push things forward.

What is the SID project, what are its goals, and how does it plan to
achieve them?

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor

2022-06-16 Thread Demi Marie Obenour
On Thu, Jun 16, 2022 at 03:22:09PM +0200, Gionatan Danti wrote:
> Il 2022-06-16 09:53 Demi Marie Obenour ha scritto:
> > That seems reasonable.  My conclusion is that dm-thin (which is what LVM
> > uses) is not a good fit for workloads with a lot of small random writes
> > and frequent snapshots, due to the 64k minimum chunk size.  This also
> > explains why dm-thin does not allow smaller blocks: not only would it
> > only support very small thin pools, it would also have massive metadata
> > write overhead.  Hopefully dm-thin v2 will improve the situation.
> 
> I think that, in this case, no free lunch really exists. I tried the
> following thin provisioning methods, each with its strong & weak points:
> 
> lvmthin: probably the more flexible of the mainline kernel options. You pay
> for r/m/w only when allocating a small block (say 4K) the first time after
> taking a snapshot. It is fast and well integrated with lvm command line.
> Con: bad behavior on out-of-space condition

Also, the LVM command line is slow, and there is very large write
amplification with lots of random writes immediately after taking a
snapshot.  Furthermore, because of the mismatch between the dm-thin
block size and the filesystem block size, fstrim might not reclaim as
much space in the pool as one would expect.
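
To see how big the mismatch is in practice, compare the pool chunk size with
how much space fstrim actually returns (a sketch; the VG/pool names are
assumptions, field names per `lvs -o help`):

    lvs -o lv_name,chunk_size vg/pool       # dm-thin chunk size, 64KiB minimum
    lvs -o lv_name,data_percent vg/pool     # pool usage before trimming
    fstrim -v /mnt/thin-fs                  # only chunks that are entirely free can be returned
    lvs -o lv_name,data_percent vg/pool     # usage after trimming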

> xfs + reflink: a great, simple to use tool when applicable. It has a very
> small granularity (4K) with no r/m/w. Cons: requires fine tuning for good
> performance when reflinking big files; IO freezes during metadata copy for
> reflink; a very small granularity means sequential IO is going to suffer
> heavily (see here for more details:
> https://marc.info/?l=linux-xfs=157891132109888=2)

Also heavy fragmentation can make journal replay very slow, to the point
of taking days on spinning hard drives.  Dave Chinner explains this here:
https://lore.kernel.org/linux-xfs/20220509230918.gp1098...@dread.disaster.area/.

> btrfs: very small granularity (4K) and many integrated features. Cons: bad
> performance overall, especially when using mechanical HDD

Also poor out-of-space handling and unbounded worst-case latency.

> vdo: it provides small granularity (4K) thin provisioning, compression and
> deduplication. Cons: (still) out-of-tree; requires a powerloss protected
> writeback cache to maintain good performance; no snapshot capability
> 
> zfs: designed from the ground up for pervasive CoW, with many features and
> ARC/L2ARC. Cons: out-of-tree; using small granularity (4K) means bad overall
> performance; using big granularity (128K by default) is a necessary
> compromise for most HDD pools.

Is this still a problem on NVMe storage?  HDDs will not really be fast
no matter what one does, at least unless there is a write-back cache
that can convert random I/O to sequential I/O.  Even that only helps
much if your working set fits in cache, or if your workload is
write-mostly.

> For what it is worth, I settled on ZFS when using out-of-tree modules is not
> an issue and lvmthin otherwise (but I plan to use xfs + reflink more in the
> future).
> 
> Do you have any information to share about dm-thin v2? I heard about it some
> years ago, but I found no recent info.

It does not exist yet.  Joe Thornber would be the person to ask
regarding any plans to create it.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor

2022-06-16 Thread Demi Marie Obenour
On Wed, Jun 15, 2022 at 03:42:17PM +0800, Zhiyong Ye wrote:
> 
> 
> 在 6/14/22 10:54 PM, Gionatan Danti 写道:
> > Il 2022-06-14 15:29 Zhiyong Ye ha scritto:
> > > The reason for this may be that when the volume creates a snapshot,
> > > each write to an existing block will cause a COW (Copy-on-write), and
> > > the COW is a copy of the entire data block in chunksize, for example,
> > > when the chunksize is 64k, even if only 4k of data is written, the
> > > entire 64k data block will be copied. I'm not sure if I understand
> > > this correctly.
> > 
> > Yes, in your case, the added copies are lowering total available IOPs.
> > But note how the decrease is sub-linear (from 64K to 1M you have a 16x
> > increase in chunk size but "only" a 10x hit in IOPs): this is due to the
> > lowered metadata overhead.
> 
> It seems that the consumption of COW copies when sending 4k requests is much
> greater than the loss from metadata.
> 
> > A last try: if you can, please regenerate your thin volume with 64K
> > chunks and set fio to execute 64K requests. Let's see if LVM is at least
> > smart enough to avoid copying to-be-completely-overwritten chunks.
> 
> I regenerated the thin volume with the chunksize of 64K and the random write
> performance data tested with fio 64k requests is as follows:
> case                  iops
> thin lv               9381
> snapshotted thin lv   8307

That seems reasonable.  My conclusion is that dm-thin (which is what LVM
uses) is not a good fit for workloads with a lot of small random writes
and frequent snapshots, due to the 64k minimum chunk size.  This also
explains why dm-thin does not allow smaller blocks: not only would it
only support very small thin pools, it would also have massive metadata
write overhead.  Hopefully dm-thin v2 will improve the situation.
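
For reference, a test along these lines could look like the following (a
sketch; the device path and fio parameters are assumptions, not the original
poster's exact invocation):

    # Snapshot the thin LV, then issue random 64k writes against the origin,
    # so each write breaks sharing on exactly one chunk.
    lvcreate -s -n thinlv-snap vg/thinlv
    fio --name=randwrite --filename=/dev/vg/thinlv --rw=randwrite \
        --bs=64k --ioengine=libaio --iodepth=32 --direct=1 \
        --runtime=60 --time_based --group_reporting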

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [linux-lvm] Why is the performance of my lvmthin snapshot so poor

2022-06-15 Thread Demi Marie Obenour
On Wed, Jun 15, 2022 at 02:40:29PM +0200, Gionatan Danti wrote:
> Il 2022-06-15 11:46 Zhiyong Ye ha scritto:
> > I also think it meets expectations. But is there any other way to
> > optimize snapshot performance at the code level? Does it help to
> > reduce the chunksize size in the code, I see in the help documentation
> > that the chunksize can only be 64k minimum.
> 
> I don't think forcing the code to use smaller recordsize is a good idea.
> Considering the hard limit on metadata size (16 GB max), 64K chunks are good
> for ~16 TB thin pool - already relatively small.
> 
> A, say, 16K recordsize would be good for a 4 TB pool only, and so on.
> Moreover, sequential performance will significantly suffer.
> 
> I think you have to accept the performance hit on first chunk allocation &
> rewrite.

I seriously hope this will be fixed in dm-thin v2.  It’s a significant
problem for Qubes OS.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [linux-lvm] lvm commands hanging when run from inside a kubernetes pod

2022-06-07 Thread Demi Marie Obenour
On Mon, Jun 06, 2022 at 05:01:15PM +0530, Abhishek Agarwal wrote:
> Hey Demi, can you explain how it will help to solve the problem? I'm
> actually not aware of that much low-level stuff but would like to learn
> about it.

By default, a systemd unit runs in the same namespaces as systemd
itself.  Therefore, it runs outside of any container, and has full
access to udev and the host filesystem.  This is what you want when
running lvm2 commands.

> Also, can you provide a few references for it on how I can use it?

The easiest method is the systemd-run command-line tool.  I believe
“systemd-run --working-directory=/ --pipe --quiet -- lvm "$@"” should
work, with "$@" replaced by the actual LVM command you want to run.  Be
sure to pass --reportformat=json to get machine-readable JSON output.
The default output depends on configuration in /etc/lvm/lvm.conf, so you
don’t want to rely on it.  Alternatively, you can pass no arguments to
lvm and get an interactive shell, but that is a bit more complex to use.
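
Concretely, a tiny wrapper along these lines is what I have in mind (a
sketch, not something I have tested in your environment; pass
--reportformat=json yourself for reporting commands such as lvs/vgs/pvs):

    #!/bin/sh
    # Run an lvm command on the host, outside the container's namespaces,
    # by asking the host's systemd (PID 1) to spawn it.
    exec systemd-run --working-directory=/ --pipe --quiet -- lvm "$@"

For example, calling this hypothetical wrapper as `host-lvm lvs
--reportformat=json vg` works once the host's D-Bus socket is available
inside the container, as described below.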

To use this method, you will need to bind-mount the host’s system-wide
D-Bus instance into your container.  You will likely need to disable all
forms of security confinement and user namespacing as well.  This means
your container will have full control over the system, but LVM requires
full control over the system in order to function, so that does not
impact security much.  Your container can expose an API that imposes
whatever restrictions it desires.

Instead of systemd-run, you can use the D-Bus API exposed by PID 1
directly, but that requires slightly more work than just calling a
command-line tool.  I have never used D-Bus from Go so I cannot comment
on how easy this is.

There are some other caveats with LVM.  I am not sure if these matter
for your use-case, but I thought you might want to be aware of them:

- LVM commands are slow (0.2 to 0.4 seconds or so) and serialized with a
  per-volume group lock.  Performance of individual commands is not a
  high priority of LVM upstream as per prior mailing list discussion.
  The actual time that I/O is suspended is much shorter.

- If LVM gets SIGKILLed or OOM-killed, your system may be left in an
  inconsistent state that requires a reboot to fix.  The latter can be
  prevented by setting OOMScoreAdjust to -1000.

- If you use thin provisioning (via thin pools and/or VDO), be sure to
  have monitoring so you can prevent out of space conditions.  Out of
  space conditions will likely result in all volumes going offline, and
  recovery may require growing the pool.

- Thin pools are backed by the dm-thin device-mapper target, which is
  optimized for overwriting already allocated blocks.  Writing to shared
  blocks, and possibly allocating new blocks, appears to trigger a slow
  path in dm-thin.  Discards are only supported at the block size
  granularity, which is typically greater than the block size of a
  filesystem.

- Deleting a thin volume does not pass down discards to the underlying
  block device, even if LVM is configured to discard deleted logical
  volumes.  You need to run blkdiscard on the volume before deleting it,
  but this can hang the entire pool unless you use the --step option to
  limit the amount of data discarded at once (see the sketch after this
  list).

- If you are going to be exposing individual thinly-provisioned block
  devices to untrusted code (such as virtual machine guests), you need
  to prevent udev from scanning the thin volumes and keep zeroing of
  newly provisioned blocks enabled.  The latter is synchronous and slow.

- Shrinking thin or VDO pools is not supported.

- Old-style (not thin) snapshots are slow, and only intended for
  short-lived snapshots for backup purposes.
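
The discard-before-delete dance mentioned in the list above looks roughly
like this (the volume name and the 1GiB step size are assumptions):

    # Discard in bounded steps so the pool is not tied up for the whole
    # operation, then remove the thin volume.
    blkdiscard --step 1G /dev/vg/some-thin-volume
    lvremove -y vg/some-thin-volume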
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [linux-lvm] lvm commands hanging when run from inside a kubernetes pod

2022-06-06 Thread Demi Marie Obenour
On Mon, Jun 06, 2022 at 11:19:47AM +0530, Abhishek Agarwal wrote:
> 1. Yes, use_lvmetad is 0, and its systemd units for it are stopped/disabled.
> 2. Yes, everything on the host machine i.e(/proc, /sys etc) are getting
> mounted on the pod.
> 
> *ubuntu@ip-172-31-89-47*:*~*$ kubectl exec -it openebs-lvm-node-v6jrb -c
> openebs-lvm-plugin  -n kube-system -- sh
> 
> # ls
> 
> bin  boot  dev etc  home  host  lib  lib32  lib64  libx32  media  mnt opt
> plugin  proc  root  run  sbin  srv  sys  tmp  usr var
> 
> # cd /host
> 
> # ls
> 
> bin  boot  dev etc  home  lib lib32  lib64  libx32  lost+found  media  mnt
> opt  proc  root  run  sbin  snap srv  sys  tmp  usr  var
> 
> #
> 3. The detail output of `strace -f -ttt` command:
> https://pastebin.com/raw/VFyXLNaC

I suggest bind-mounting the host’s D-Bus socket into the container and
using systemd’s D-Bus API to run the LVM commands on the host.  This
will avoid the problems you are having.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [linux-lvm] lvm commands hanging when run from inside a kubernetes pod

2022-06-01 Thread Demi Marie Obenour
On Wed, Jun 01, 2022 at 12:20:32AM +0530, Abhishek Agarwal wrote:
> Hi Roger. Thanks for your reply. I have rerun the command with `strace -f`
> as you suggested. Here is the pastebin link containing the detailed output
> of the command: https://pastebin.com/raw/VRuBbHBc

Even if you can get LVM “working”, it is still likely to cause data
corruption at some point, as there is no guarantee that different LVM
processes in different namespaces will see each other’s locks.

Why do you need to run LVM in a container?  What are you trying to
accomplish?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [linux-lvm] raid10 with missing redundancy, but health status claims it is ok.

2022-05-30 Thread Demi Marie Obenour
On Mon, May 30, 2022 at 10:16:27AM +0200, Olaf Seibert wrote:
> First, John, thanks for your reply.
> 
> On 28.05.22 18:15, John Stoffel wrote:
> >>>>>> "Olaf" == Olaf Seibert  writes:
> > 
> > I'm leaving for the rest of the weekend, but hopefully this will help you...
> > 
> > Olaf> Hi all, I'm new to this list. I hope somebody here can help me.
> > 
> > We will try!  But I would strongly urge that you take backups of all
> > your data NOW, before you do anything else.  Copy to another disk
> > which is seperate from this system just in case.
> 
> Unfortunately there are some complicating factors that I left out so far.
> The machine in question is a host for virtual machines run by customers.
> So we can't just even look at the data, never mind rsyncing it.
> (the name "nova" might have given that away; that is the name of the 
> OpenStack compute service)

Can you try to live-migrate the VMs off of this node?  If not, can you
announce a maintenance window and power off the VMs so you can take a
block-level backup?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




Re: [linux-lvm] Silence Pool Check Overprovisioning Warnings

2022-05-06 Thread Demi Marie Obenour
On Thu, May 05, 2022 at 12:11:29PM -0400, Robert Simmons wrote:
> Greetings,
> 
> I knowingly overprovision storage pools. I have careful monitoring of the
> overall free space in the storage pool, and the snapshots that I take for a
> particular volume are ephemeral. They're only related to a particular
> reverse engineering task. Once the task is complete, all the snapshots are
> deleted. There are a set of warnings that are generated
> during pool_check_overprovisioning here:
> 
> https://github.com/lvmteam/lvm2/blob/f70d97b91620bc7c2e5c5ccc39913c54379322c2/lib/metadata/thin_manip.c#L413-L428
> 
> The second, third, and fourth warnings are printed
> via log_print_unless_silent but the first is printed via log_warn. I would
> like an option that would silence all four of these warnings. If all four
> were logged via log_print_unless_silent, there would still be the problem
> of what other very useful warnings would I be silencing. This would be a
> suboptimal fix. I have read one proposed fix that appears to be optimal for
> my use case: adding an envvar "LVM_SUPPRESS_POOL_WARNINGS". This was
> proposed by zkabelac at redhat.com here:
> https://listman.redhat.com/archives/linux-lvm/2017-September/024332.html
> 
> What would be the next steps to getting this option implemented?
> 
> I see that there have been two threads about these warnings in the past:
> https://listman.redhat.com/archives/linux-lvm/2016-April/023529.html
> https://listman.redhat.com/archives/linux-lvm/2017-September/024323.html
> 
> One issue in the bug tracker that I can find:
> https://bugzilla.redhat.com/show_bug.cgi?id=1465974
> 
> Finally, we have a thread going about our use case over at Proxmox here:
> https://forum.proxmox.com/threads/solved-you-have-not-turned-on-protection-against-thin-pools-running-out-of-space.91055/

The Qubes OS developers (myself included) would also like a solution.
In Qubes OS, LVM is invoked by a python daemon which currently uses an
ugly regex hack to strip that line out of LVM’s stderr before asserting
that LVM’s stderr is empty.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




[linux-lvm] Running multiple LVM commands in a batch

2022-04-08 Thread Demi Marie Obenour
Would it be possible for LVM to support multi-operation transactions
within a volume group?  Qubes OS often needs to perform 3 LVM commands
at once and doesn’t care about the order of the operations.  It would
also be nice to get the result of the system afterwards, to avoid
needing a separate “lvm lvs” call.

The purpose of this is to improve performance, by allowing device
scanning, metadata access, and synchronizing with udev to happen once
per batch instead of once per operation.  I’m willing to promise that
there are no dependencies between operations and that the operations can
safely be performed concurrently.  I also don’t need the overall
operation to be atomic, so long as the system is always in a consistent
state and any problems result in a non-zero exit code.  Getting the
result of the individual operations would be nice but is not required.

I’m not sure what the best syntax for this would be.  For the
interactive shell, “begin” and “end” commands might be an option.
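
Until something like that exists, the closest approximation I know of is
feeding several commands to a single `lvm` shell process, which at least
avoids paying process start-up once per operation (a sketch with made-up
names; it does not batch device scanning or udev syncing the way a real
transaction could):

    lvm <<'EOF'
    lvcreate -V 2G --thinpool pool -n vm1-root vg
    lvcreate -s -n vm1-root-snap vg/vm1-root
    lvs --reportformat json vg
    EOF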
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




[linux-lvm] [PATCH 2/3] Disable lvm2 udev rules if `dmsetup splitname` fails

2022-04-04 Thread Demi Marie Obenour
If the output of `dmsetup splitname` cannot be trusted, the safest
option is to disable all lvm2 rules.
---
 udev/11-dm-lvm.rules.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/udev/11-dm-lvm.rules.in b/udev/11-dm-lvm.rules.in
index 7c58994..f7066b7 100644
--- a/udev/11-dm-lvm.rules.in
+++ b/udev/11-dm-lvm.rules.in
@@ -18,7 +18,7 @@ ENV{DM_UDEV_RULES_VSN}!="?*", GOTO="lvm_end"
 ENV{DM_UUID}!="LVM-?*", GOTO="lvm_end"
 
 # Use DM name and split it up into its VG/LV/layer constituents.
-IMPORT{program}="(DM_EXEC)/dmsetup splitname --nameprefixes --noheadings --rows $env{DM_NAME}"
+IMPORT{program}!="(DM_EXEC)/dmsetup splitname --nameprefixes --noheadings --rows $env{DM_NAME}", GOTO="lvm_disable"
 
 # DM_SUBSYSTEM_UDEV_FLAG0 is the 'NOSCAN' flag for LVM subsystem.
 # This flag is used to temporarily disable selected rules to prevent any
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab







[linux-lvm] [PATCH 1/3] dmsetup: return non-zero on stdio I/O error

2022-04-04 Thread Demi Marie Obenour
If there is an I/O error on stdout, return a non-zero status so that
udev can avoid trusting the values printed.  Deeper changes to the
log code are out of scope for this patch.
---
 libdm/dm-tools/dmsetup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libdm/dm-tools/dmsetup.c b/libdm/dm-tools/dmsetup.c
index d01b8f2..0449e40 100644
--- a/libdm/dm-tools/dmsetup.c
+++ b/libdm/dm-tools/dmsetup.c
@@ -7491,5 +7491,7 @@ out:
 	if (_initial_timestamp)
 		dm_timestamp_destroy(_initial_timestamp);
 
+	if (fflush(stdout) || ferror(stdout))
+		return 1;
 	return (_switches[HELP_ARG] || _switches[VERSION_ARG]) ? 0 : ret;
 }
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




[linux-lvm] [PATCH 3/3] Disable udev rules if udev flags can't be obtained

2022-04-04 Thread Demi Marie Obenour
In this case the safest option is to disable most udev rules.
---
 udev/10-dm.rules.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/udev/10-dm.rules.in b/udev/10-dm.rules.in
index b4fa52a..9fae8df 100644
--- a/udev/10-dm.rules.in
+++ b/udev/10-dm.rules.in
@@ -50,7 +50,7 @@ ACTION!="add|change", GOTO="dm_end"
 # These flags are encoded in DM_COOKIE variable that was introduced in
 # kernel version 2.6.31. Therefore, we can use this feature with
 # kernels >= 2.6.31 only. Cookie is not decoded for remove event.
-ENV{DM_COOKIE}=="?*", IMPORT{program}="(DM_EXEC)/dmsetup udevflags $env{DM_COOKIE}"
+ENV{DM_COOKIE}=="?*", IMPORT{program}!="(DM_EXEC)/dmsetup udevflags $env{DM_COOKIE}", GOTO="dm_disable"
 
 # Rule out easy-to-detect inappropriate events first.
 ENV{DISK_RO}=="1", GOTO="dm_disable"
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab





[linux-lvm] [PATCH 0/3] Make udev rules more robust

2022-04-04 Thread Demi Marie Obenour
This makes the udev rules more robust against various unsafe error
conditions.

Demi Marie Obenour (3):
  dmsetup: return non-zero on stdio I/O error
  Disable lvm2 udev rules if `dmsetup splitname` fails
  Disable udev rules if udev flags can't be obtained

 libdm/dm-tools/dmsetup.c | 2 ++
 udev/10-dm.rules.in  | 2 +-
 udev/11-dm-lvm.rules.in  | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




[linux-lvm] Safety of DM_DISABLE_UDEV=1 and --noudevsync

2022-04-01 Thread Demi Marie Obenour
Under what circumstances are DM_DISABLE_UDEV=1 and --noudevsync safe?
In Qubes OS, for example, I am considering using one or both of these,
but only for operations performed by qubesd.  systemd-udevd will still
be running, but it will be told to create no symlinks for the devices
these commands create or destroy.  systemd-udevd will still be in charge
of other devices, however, and other lvm2 commands may run that use
neither of these.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab




[linux-lvm] LVM fast mode

2022-03-07 Thread Demi Marie Obenour
Would it be possible to have a fast, but less-safe mode for lvm2?
Right now lvm2 always enumerates every single block device on
the system, which is hugely wasteful when one knows exactly which
block devices are the PVs that will be affected by the operation.
“Less-safe” means that this is intended for use by programs that
know what they are doing, and that know that e.g. clustering is not
in use.  Using this in conjunction with clustering is a bug in the
application.  Commands that enumerate devices will, in fast mode, only
return LVs on the provided list of PVs.  Attempts to modify an LV on a
different PV would fail as if the LV did not exist.
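
For what it's worth, newer lvm2 releases already expose part of this through
the --devices option, which restricts a single command to an explicit device
list (a sketch, assuming an lvm2 version that has this option; it narrows
device scanning but is not the full "fast mode" described above):

    # Only look at the named PV instead of enumerating every block device.
    lvs --devices /dev/nvme0n1p3
    lvcreate --devices /dev/nvme0n1p3 -V 2G --thinpool pool -n vm-root vg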
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



Re: [linux-lvm] Need inputs: Performance issues with LVM snapshots

2022-03-07 Thread Demi Marie Obenour
On 3/7/22 09:44, Gionatan Danti wrote:
> Il 2022-03-07 12:09 Gaikwad, Hemant ha scritto:
>> Hi,
>>
>> We have been looking at LVM as an option for long term backup using
>> the LVM snapshots. After looking at the various forums and also
>> looking at a few LVM defects, realized that LVM could be a very good
>> option for short term backup, but might result in performance issues
>> if the snapshots are retained for a long time. Also read we should
>> restrict the number of snapshots. We are thinking of keeping it to 3,
>> but do you think that could also be a performance bottleneck. A few
>> forum posts also suggest memory issues with using LVM snapshots. Can
>> you please help with some data on that too. Thanks in advance for
>> making our decision easier. Thanks
>>
>> Regards,
> 
> Classical, non-thin LVM snapshots are only meant to be short-lived (just 
> enough to take a backup), and the performance penalty you talk about 
> does apply.
> Thin LVM snapshots, on the other side, command a much lower performance 
> penalty and can be long-lived (ie: think about a rolling snapshot 
> system).
> So if you need multiple, long-lived snapshots, I strongly suggest you to 
> check lvmthin.
> Regards.

Also worth noting that there is a minimum time to perform an LVM
operation of any type, a bit over 0.2 seconds on my machine.  If you
need to create snapshots exceedingly quickly, then LVM itself will be a
bottleneck.  Virtually all applications will not run into this problem,
however.  The only ones I can think of that will are container or VM
managers that need to spin up a lot of containers or VMs, as Qubes
OS does.  The time I mentioned is the time needed to run the entire LVM
command; the time I/O is suspended for is far, *far* shorter.
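
A quick way to measure that floor on your own hardware (a sketch; the VG and
LV names are assumptions):

    # Thin snapshot creation: the cost is dominated by lvm2 command overhead,
    # not by copying any data.
    time lvcreate -s -n snap1 vg/thinvol
    time lvremove -y vg/snap1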

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



Re: [linux-lvm] Thin pool performance when allocating lots of blocks

2022-02-08 Thread Demi Marie Obenour
On 2/8/22 15:37, Zdenek Kabelac wrote:
> Dne 08. 02. 22 v 20:00 Demi Marie Obenour napsal(a):
>> Are thin volumes (which start as snapshots of a blank volume) efficient
>> for building virtual machine images?  Given the nature of this workload
>> (writing to lots of new, possibly-small files, then copying data from
>> them to a huge disk image), I expect that this will cause sharing to be
>> broken many, many times, and the kernel code that breaks sharing appears
>> to be rather heavyweight.  Furthermore, since zeroing is enabled, this
>> might cause substantial write amplification.  Turning zeroing off is not
>> an option for security reasons.
>>
>> Is there a way to determine if breaking sharing is the cause of
>> performance problems?  If it is, are there any better solutions?
> 
> Hi
> 
> Usually the smaller the thin chunks size is the smaller the problem gets.
> With current released version of thin-provisioning minimal chunk size is 
> 64KiB. So you can't use smaller value to further reduce this impact.
> 
> Note - even if you do a lot of tiny 4KiB writes  - only the 'first' such 
> write 
> into 64K area breaks sharing all following writes to same location no longer 
> have this penalty (also zeroing with 64K is less impactful...)
> 
> But it's clear thin-provisioning comes with some price - so if it's not good 
> enough from time constrains some other solutions might need to be explored.
> (i.e. caching, better hw, splitting  FS into multiple partitions with 
> 'read-only sections,)

Are the code paths that break sharing as heavyweight as I was worried
about?  Would a hypothetical dm-thin2 that used dm-bio-prison-v2 be
faster?
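
For context, the two pool parameters discussed above (chunk size and zeroing)
are fixed when the pool is created and can be inspected afterwards (a sketch;
names are assumptions, and field names should be checked with `lvs -o help`):

    lvcreate --type thin-pool -L 100G --chunksize 64k --zero y -n pool vg
    lvs -o lv_name,chunk_size,zero vg/pool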

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


[linux-lvm] Thin pool performance when allocating lots of blocks

2022-02-08 Thread Demi Marie Obenour
Are thin volumes (which start as snapshots of a blank volume) efficient
for building virtual machine images?  Given the nature of this workload
(writing to lots of new, possibly-small files, then copying data from
them to a huge disk image), I expect that this will cause sharing to be
broken many, many times, and the kernel code that breaks sharing appears
to be rather heavyweight.  Furthermore, since zeroing is enabled, this
might cause substantial write amplification.  Turning zeroing off is not
an option for security reasons.

Is there a way to determine if breaking sharing is the cause of
performance problems?  If it is, are there any better solutions?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



Re: [linux-lvm] LVM performance vs direct dm-thin

2022-02-03 Thread Demi Marie Obenour
On Thu, Feb 03, 2022 at 01:28:37PM +0100, Zdenek Kabelac wrote:
> Dne 03. 02. 22 v 5:48 Demi Marie Obenour napsal(a):
> > On Mon, Jan 31, 2022 at 10:29:04PM +0100, Marian Csontos wrote:
> > > On Sun, Jan 30, 2022 at 11:17 PM Demi Marie Obenour <
> > > d...@invisiblethingslab.com> wrote:
> > > 
> > > > On Sun, Jan 30, 2022 at 04:39:30PM -0500, Stuart D. Gathman wrote:
> > > > > Your VM usage is different from ours - you seem to need to clone and
> > > > > activate a VM quickly (like a vps provider might need to do).  We
> > > > > generally have to buy more RAM to add a new VM :-), so performance of
> > > > > creating a new LV is the least of our worries.
> > > > 
> > > > To put it mildly, yes :).  Ideally we could get VM boot time down to
> > > > 100ms or lower.
> > > > 
> > > 
> > > Out of curiosity, is snapshot creation the main culprit to boot a VM in
> > > under 100ms? Does Qubes OS use tweaked linux distributions, to achieve the
> > > desired boot time?
> > 
> > The goal is 100ms from user action until PID 1 starts in the guest.
> > After that, it’s the job of whatever distro the guest is running.
> > Storage management is one area that needs to be optimized to achieve
> > this, though it is not the only one.
> 
> I'm wondering from where those 100ms came from?
> 
> Users often mistakenly pick the wrong technology for their tasks.
> 
> If they need to use containerized software they should use containers like
> i.e. Docker - if they need full virtual secure machine - it certainly has
> it's price (mainly way higher memory consumption)
> I've some doubts there is some real good reason to have quickly created VMs
> as they surely are supposed to be a long time living entities
> (hours/days...)

Simply put, Qubes OS literally does not have a choice.  Qubes OS is
intended to protect against very high-level attackers who are likely to
have 0day exploits against the Linux kernel.  And it is trying to do the
best possible given that constraint.  A microkernel *could* provide
sufficiently strong isolation, but there are none that have sufficiently
broad hardware support and sufficiently capable userlands.

In the long term, I would like to use unikernels for at least some of
the VMs.  Unikernels can start up so quickly that the largest overhead
is the hypervisor’s toolstack.  But that is very much off-topic.

> So unless you want to create something for marketing purposes aka - my table
> is bigger than yours - I don't see the point.
> 
> For quick instancies of software apps I'd always recommend containers -
> which are vastly more efficient and scalable.
> 
> VMs and containers have its strength and weaknesses..
> Not sure why some many people try to pretend VMs can be as efficient as
> containers or containers as secure as VMs. Just always pick the right
> tool...

Qubes OS needs secure *and* fast.  To quote the seL4 microkernel’s
mantra, “Security is no excuse for poor performance!”.

> > > Back to business. Perhaps I missed an answer to this question: Are the
> > > Qubes OS VMs throw away?  Throw away in the sense like many containers are
> > > - it's just a runtime which can be "easily" reconstructed. If so, you can
> > > ignore the safety belts and try to squeeze more performance by sacrificing
> > > (meta)data integrity.
> > 
> > Why does a trade-off need to be made here?  More specifically, why is it
> > not possible to be reasonably fast (a few ms) AND safe?
> 
> Security, safety and determinism always takes away efficiency.
> 
> The higher amount of randomness you can live with, the faster processing you
> can achieve - you just need to cross you fingers :)
> (i.e. drop transaction synchornisation :))
> 
> Quite frankly - if you are orchestrating mostly same VMs, it would be more
> efficient, to just snapshot them with already running memory environment -
> so instead of booting VM always from 'scratch', you restore/resume those VMs
> at some already running point - from which it could start deviate.
> Why wasting CPU on processing over and over same boot
> There you should hunt your miliseconds...

Qubes OS used to do that, but it was a significant maintenance burden.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



Re: [linux-lvm] LVM performance vs direct dm-thin

2022-02-02 Thread Demi Marie Obenour
On Mon, Jan 31, 2022 at 10:29:04PM +0100, Marian Csontos wrote:
> On Sun, Jan 30, 2022 at 11:17 PM Demi Marie Obenour <
> d...@invisiblethingslab.com> wrote:
> 
>> On Sun, Jan 30, 2022 at 04:39:30PM -0500, Stuart D. Gathman wrote:
>>> Your VM usage is different from ours - you seem to need to clone and
>>> activate a VM quickly (like a vps provider might need to do).  We
>>> generally have to buy more RAM to add a new VM :-), so performance of
>>> creating a new LV is the least of our worries.
>>
>> To put it mildly, yes :).  Ideally we could get VM boot time down to
>> 100ms or lower.
>>
> 
> Out of curiosity, is snapshot creation the main culprit to boot a VM in
> under 100ms? Does Qubes OS use tweaked linux distributions, to achieve the
> desired boot time?

The goal is 100ms from user action until PID 1 starts in the guest.
After that, it’s the job of whatever distro the guest is running.
Storage management is one area that needs to be optimized to achieve
this, though it is not the only one.

> Back to business. Perhaps I missed an answer to this question: Are the
> Qubes OS VMs throw away?  Throw away in the sense like many containers are
> - it's just a runtime which can be "easily" reconstructed. If so, you can
> ignore the safety belts and try to squeeze more performance by sacrificing
> (meta)data integrity.

Why does a trade-off need to be made here?  More specifically, why is it
not possible to be reasonably fast (a few ms) AND safe?

> And the answer to that question seems to be both Yes and No. Classical pets
> vs cattle.
> 
> As I understand it, except of the system VMs, there are at least two kinds
> of user domains and these have different requirements:
> 
> 1. few permanent pet VMs (Work, Personal, Banking, ...), in Qubes OS called
> AppVMs,
> 2. and many transient cattle VMs (e.g. for opening an attachment from
> email, or browsing web, or batch processing of received files) called
> Disposable VMs.
> 
> For AppVMs, there are only "few" of those and these are running most of the
> time so start time may be less important than data safety. Certainly
> creation time is only once in a while operation so I would say use LVM for
> these. And where snapshots are not required, use plain linear LVs, one less
> thing which could go wrong. However, AppVMs are created from Template VMs,
> so snapshots seem to be part of the system.

Snapshots are used and required *everywhere*.  Qubes OS offers
copy-on-write cloning support, and users expect it to be cheap, not
least because renaming a qube is implemented using it.  By default,
AppVM private and TemplateVM root volumes always have at least one
snapshot, to support `qvm-volume revert`.  Start time really matters
too; a user may not wish to have every qube running at once.

In short, performance and safety *both* matter, and data AND metadata
operations are performance-critical.

> But data may be on linear LVs
> anyway as these are not shared and these are the most important part of the
> system. And you can still use old style snapshots for backing up the data
> (and by backup I mean snapshot, copy, delete snapshot. Not a long term
> snapshot. And definitely not multiple snapshots).

Creating a qube is intended to be a cheap operation, so thin
provisioning of storage is required.  Qubes OS also relies heavily
on over-provisioning of storage, so linear LVs and old style snapshots
won’t fly.  Qubes OS does have a storage driver that uses dm-snapshot on
top of loop devices, but that is deprecated, since it cannot provide the
features Qubes OS requires.  As just one example, the default private
volume size is 2GiB, but many qubes use nowhere near this amount of disk
space.

> Now I realized there is the third kind of user domains - Template VMs.
> Similarly to App VM, there are only few of those, and creating them
> requires downloading an image, upgrading system on an existing template, or
> even installation of the system, so any LVM overhead is insignificant for
> these. Use thin volumes.
> 
> For the Disposable VMs it is the creation + startup time which matters. Use
> whatever is the fastest method. These are created from template VMs too.
> What LVM/DM has to offer here is external origin. So the templates
> themselves could be managed by LVM, and Qubes OS could use them as external
> origin for Disposable VMs using device mapper directly. These could be held
> in a disposable thin pool which can be reinitialized from scratch on host
> reboot, after a crash, or on a problem with the pool. As a bonus this would
> also address the absence of thin pool shrinking.

That is an interesting idea I had not considered, but it would add
substantial complexity to the storage management system.  More
generally, the same ap

Re: [linux-lvm] LVM performance vs direct dm-thin

2022-02-02 Thread Demi Marie Obenour
On Wed, Feb 02, 2022 at 11:04:37AM +0100, Zdenek Kabelac wrote:
> Dne 02. 02. 22 v 3:09 Demi Marie Obenour napsal(a):
> > On Sun, Jan 30, 2022 at 06:43:13PM +0100, Zdenek Kabelac wrote:
> > > Dne 30. 01. 22 v 17:45 Demi Marie Obenour napsal(a):
> > > > On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:
> > > > > Dne 30. 01. 22 v 1:32 Demi Marie Obenour napsal(a):
> > > > > > On Sat, Jan 29, 2022 at 10:32:52PM +0100, Zdenek Kabelac wrote:
> > > > > > > Dne 29. 01. 22 v 21:34 Demi Marie Obenour napsal(a):
> > > My biased advice would be to stay with lvm2. There is lot of work, many
> > > things are not well documented and getting everything running correctly 
> > > will
> > > take a lot of effort  (Docker in fact did not managed to do it well and 
> > > was
> > > incapable to provide any recoverability)
> > 
> > What did Docker do wrong?  Would it be possible for a future version of
> > lvm2 to be able to automatically recover from off-by-one thin pool
> > transaction IDs?
> 
> Ensuring all steps in state-machine are always correct is not exactly simple.
> But since I've not heard about off-by-one problem for a long while -  I
> believe we've managed to close all the holes and bugs in double-commit
> system
> and metadata handling by thin-pool and lvm2 (for recent lvm2 & kernel)

How recent are you talking about?  Are there fixes that can be
cherry-picked?  I somewhat recently triggered this issue on a test
machine, so I would like to know.

> > > It's difficult - if you would be distributing lvm2 with exact kernel 
> > > version
> > > & udev & systemd with a single linux distro - it reduces huge set of
> > > troubles...
> > 
> > Qubes OS comes close to this in practice.  systemd and udev versions are
> > known and fixed, and Qubes OS ships its own kernels.
> 
> Systemd/udev evolves - so fixed today doesn't really mean same version will
> be there tomorrow.  And unfortunately systemd is known to introduce
> backward incompatible changes from time to time...

Thankfully, in Qubes OS’s dom0, the version of systemd is frozen and
will never change throughout an entire release.

> > > Chain filesystem->block_layer->filesystem->block_layer is something you 
> > > most
> > > likely do not want to use for any well performing solution...
> > > But it's ok for testing...
> > 
> > How much of this is due to the slow loop driver?  How much of it could
> > be mitigated if btrfs supported an equivalent of zvols?
> 
> Here you are missing the core of problem from kernel POV aka
> how the memory allocation is working and what are the approximation in
> kernel with buffer handling and so on.
> So whoever is using  'loop' devices in production systems in the way
> described above has never really tested any corner case logic

In Qubes OS the loop device is always passed through to a VM or used as
the base device for an old-style device-mapper snapshot.  It is never
mounted on the host.  Are there known problems with either of these
configurations?

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



Re: [linux-lvm] LVM performance vs direct dm-thin

2022-02-01 Thread Demi Marie Obenour
On Sun, Jan 30, 2022 at 06:43:13PM +0100, Zdenek Kabelac wrote:
> Dne 30. 01. 22 v 17:45 Demi Marie Obenour napsal(a):
> > On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:
> > > Dne 30. 01. 22 v 1:32 Demi Marie Obenour napsal(a):
> > > > On Sat, Jan 29, 2022 at 10:32:52PM +0100, Zdenek Kabelac wrote:
> > > > > Dne 29. 01. 22 v 21:34 Demi Marie Obenour napsal(a):
> > > > > > How much slower are operations on an LVM2 thin pool compared to 
> > > > > > manually
> > > > > > managing a dm-thin target via ioctls?  I am mostly concerned about
> > > > > > volume snapshot, creation, and destruction.  Data integrity is very
> > > > > > important, so taking shortcuts that risk data loss is out of the
> > > > > > question.  However, the application may have some additional 
> > > > > > information
> > > > > > that LVM2 does not have.  For instance, it may know that the volume 
> > > > > > that
> > > > > > it is snapshotting is not in use, or that a certain volume it is
> > > > > > creating will never be used after power-off.
> > > > > > 
> > > > 
> > > > > So brave developers may always write their own management tools for 
> > > > > their
> > > > > constrained environment requirements that will by significantly 
> > > > > faster in
> > > > > terms of how many thins you could create per minute (btw you will 
> > > > > need to
> > > > > also consider dropping usage of udev on such system)
> > > > 
> > > > What kind of constraints are you referring to?  Is it possible and safe
> > > > to have udev running, but told to ignore the thins in question?
> > > 
> > > Lvm2 is oriented more towards managing set of different disks,
> > > where user is adding/removing/replacing them.  So it's more about
> > > recoverability, good support for manual repair  (ascii metadata),
> > > tracking history of changes,  backward compatibility, support
> > > of conversion to different volume types (i.e. caching of thins, pvmove...)
> > > Support for no/udev & no/systemd, clusters and nearly every linux distro
> > > available... So there is a lot - and this all adds quite complexity.
> > 
> > I am certain it does, and that makes a lot of sense.  Thanks for the
> > hard work!  Those features are all useful for Qubes OS, too — just not
> > in the VM startup/shutdown path.
> > 
> > > So once you scratch all this - and you say you only care about single disc
> > > then you are able to use more efficient metadata formats which you could
> > > even keep permanently in memory during the lifetime - this all adds great
> > > performance.
> > > 
> > > But it all depends how you could constrain your environment.
> > > 
> > > It's worth to mention there is lvm2 support for 'external' 'thin volume'
> > > creators - so lvm2 only maintains 'thin-pool' data & metadata LV - but 
> > > thin
> > > volume creation, activation, deactivation of thins is left to external 
> > > tool.
> > > This has been used by docker for a while - later on they switched to
> > > overlayFs I believe..
> > 
> > That indeeds sounds like a good choice for Qubes OS.  It would allow the
> > data and metadata LVs to be any volume type that lvm2 supports, and
> > managed using all of lvm2’s features.  So one could still put the
> > metadata on a RAID-10 volume while everything else is RAID-6, or set up
> > a dm-cache volume to store the data (please correct me if I am wrong).
> > Qubes OS has already moved to using a separate thin pool for virtual
> > machines, as it prevents dom0 (privileged management VM) from being run
> > out of disk space (by accident or malice).  That means that the thin
> > pool used for guests is managed only by Qubes OS, and so the standard
> > lvm2 tools do not need to touch it.
> > 
> > Is this a setup that you would recommend, and would be comfortable using
> > in production?  As far as metadata is concerned, Qubes OS has its own
> > XML file containing metadata about all qubes, which should suffice for
> > this purpose.  To prevent races during updates and ensure automatic
> > crash recovery, is it sufficient to store metadata for both new and old
> > transaction IDs, and pick the correct one based on the device-mapper
> > status line?  I have seen lvm2 get in an inconsistent state (transaction
> > ID off by one) that required manual repair before, which is quite
> > unnerving for a desktop OS.

Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-31 Thread Demi Marie Obenour
On Mon, Jan 31, 2022 at 06:54:48PM +0100, Gionatan Danti wrote:
> Il 2022-01-31 16:28 Demi Marie Obenour ha scritto:
> > thin_trim is a userspace tool that works on an entire thin pool, and I
> > suspect it may be significantly faster than blkdiscard of an individual
> > thin volume.  That said, what I would *really* like is something
> > equivalent to fstrim for thin volumes: a tool that works asynchronously,
> > in the background, without disrupting concurrent I/O.
> 
> Are you sure that fstrim works asynchronously?
> I remember "fstrim -v /mybigdevice" taking some time (~30s).

It happens online and without preventing other I/O from happening,
whereas thin_trim can only be run offline.

> > FYI, you might want to specify a full fingerprint here; short key IDs
> > are highly vulnerable to collision and preimage attacks.
> 
> Yeah, it is a 15-year-old signature I must decide to update ;)
> Thanks.

Might want to generate a new key while you are at it :) (especially if
the old one is weak)

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-31 Thread Demi Marie Obenour
On Mon, Jan 31, 2022 at 12:02:23PM +0100, Gionatan Danti wrote:
> Il 2022-01-29 18:45 Demi Marie Obenour ha scritto:
> > Is it possible to configure LVM2 so that it runs thin_trim before it
> > activates a thin pool?  Qubes OS currently runs blkdiscard on every thin
> > volume before deleting it, which is slow and unreliable.  Would running
> > thin_trim during system startup provide a better alternative?
> 
> I think that, if anything, it would be worse: a long discard during boot can
> be problematic, even leading to timeout on starting other services.
> After all, blkdiscard should be faster than something done at a higher level.

thin_trim is a userspace tool that works on an entire thin pool, and I
suspect it may be significantly faster than blkdiscard of an individual
thin volume.  That said, what I would *really* like is something
equivalent to fstrim for thin volumes: a tool that works asynchronously,
in the background, without disrupting concurrent I/O.
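
For anyone unfamiliar with the tool: thin_trim (from thin-provisioning-tools)
walks the pool metadata and discards only the ranges of the data device that
no thin volume maps.  A rough sketch of an invocation, assuming the pool
itself is inactive and its data and metadata sub-LVs have somehow been made
accessible (exactly the part lvm2 would have to orchestrate); device paths
are hypothetical:

    thin_trim --metadata-dev /dev/mapper/vg-vmpool_tmeta \
              --data-dev     /dev/mapper/vg-vmpool_tdata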

> -- 
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.da...@assyoma.it - i...@assyoma.it
> GPG public key ID: FF5F32A8

FYI, you might want to specify a full fingerprint here; short key IDs
are highly vulnerable to collision and preimage attacks.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

[linux-lvm] Running thin_trim before activating a thin pool

2022-01-31 Thread Demi Marie Obenour
Is it possible to configure LVM2 so that it runs thin_trim before it
activates a thin pool?  Qubes OS currently runs blkdiscard on every thin
volume before deleting it, which is slow and unreliable.  Would running
thin_trim during system startup provide a better alternative?
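
For context, the current per-volume approach amounts to something like the
following, run once for every volume being removed (the volume name is
hypothetical):

    blkdiscard /dev/vg/vm-work-private-snap \
        && lvremove -y vg/vm-work-private-snap
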
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Demi Marie Obenour
On Sun, Jan 30, 2022 at 04:39:30PM -0500, Stuart D. Gathman wrote:
> On Sun, 2022-01-30 at 11:45 -0500, Demi Marie Obenour wrote:
> > On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:
> > > 
> > 
> > > Since you mentioned ZFS - you might want focus on using 'ZFS-only'
> > > solution.
> > > Combining  ZFS or Btrfs with lvm2 is always going to be a painful
> > > way as
> > > those filesystems have their own volume management.
> > 
> > Absolutely!  That said, I do wonder what your thoughts on using loop
> > devices for VM storage are.  I know they are slower than thin
> > volumes,
> > but they are also much easier to manage, since they are just ordinary
> > disk files.  Any filesystem with reflink can provide the needed
> > copy-on-write support.
> 
> I use loop devices for test cases - especially with simulated IO
> errors.  Devs really appreciate having an easy reproducer for
> database/filesystem bugs (which often involve handling of IO errors). 
> But not for production VMs.
> 
> I use LVM as flexible partitions (i.e. only classic LVs, no thin pool).
> Classic LVs perform like partitions, literally using the same driver
> (device mapper) with a small number of extents, and are if anything
> more recoverable than partition tables.  We used to put LVM on bare
> drives (like AIX did) - who needs a partition table?  But on Wintel,
> you need a partition table for EFI and so that alien operating systems
> know there is something already on a disk.
> 
> Your VM usage is different from ours - you seem to need to clone and
> activate a VM quickly (like a vps provider might need to do).  We
> generally have to buy more RAM to add a new VM :-), so performance of
> creating a new LV is the least of our worries.

To put it mildly, yes :).  Ideally we could get VM boot time down to
100ms or lower.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Demi Marie Obenour
On Sun, Jan 30, 2022 at 09:27:56PM +0100, Gionatan Danti wrote:
> Il 2022-01-30 18:43 Zdenek Kabelac ha scritto:
> > Chain filesystem->block_layer->filesystem->block_layer is something
> > you most likely do not want to use for any well performing solution...
> > But it's ok for testing...
> 
> I second that.
> 
> Demi Marie - just a question: are you sure you really need a block
> device? I don't know Qubes OS, but both KVM and Xen can use files as virtual
> disks. This would enable you to ignore loopback mounts.

On Xen, the paravirtualised block backend driver (blkback) requires a
block device, so file-based virtual disks are implemented with a loop
device managed by the toolstack.  Suggestions for improving this
less-than-satisfactory situation are welcome.
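
For readers following along, the file-backed path looks roughly like this on
a reflink-capable filesystem such as XFS or Btrfs (image names are made up):

    cp --reflink=always base-root.img vm1-root.img      # copy-on-write clone
    losetup --find --show --direct-io=on vm1-root.img   # block device for blkback
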
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-30 Thread Demi Marie Obenour
On Sun, Jan 30, 2022 at 06:56:43PM +0100, Zdenek Kabelac wrote:
> Dne 30. 01. 22 v 18:30 Demi Marie Obenour napsal(a):
> > On Sun, Jan 30, 2022 at 12:18:32PM +0100, Zdenek Kabelac wrote:
> > > Discard of thins itself is AFAIC pretty fast - unless you have massively
> > > sized thin devices with many GiB of metadata - obviously you cannot 
> > > process
> > > this amount of metadata in nanoseconds (and there are prepared kernel
> > > patches to make it even faster)
> > 
> > Would you be willing and able to share those patches?
> 
> Then are always landing in upstream kernel once they are all validated &
> tested (recent kernel already has many speed enhancements).

Thanks!  Which mailing list should I be watching?

> > > The real problem is the speed of discard on physical devices.
> > > You could actually try to feel difference with:
> > > lvchange --discards passdown|nopassdown thinpool
> > 
> > In Qubes OS I believe we do need the discards to be passed down
> > eventually, but I doubt it needs to be synchronous.  Being able to run
> > the equivalent of `fstrim -av` periodically would be amazing.  I’m
> > CC’ing Marek Marczykowski-Górecki (Qubes OS project lead) in case he
> > has something to say.
> 
> You could easily run in parallel individual blkdiscards for your thin LVs
> For most modern drives, though, it's somewhat a waste of time...
> 
> Those trimming tools should be used when they are solving some real
> problems, running them just for fun is just energy & performance waste

My understanding (which could be wrong) is that periodic trim is
necessary for SSDs.
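
To make the two suggestions above concrete (names are hypothetical, and note
that blkdiscard throws away a volume's contents, so it is only for volumes
that are about to be deleted anyway):

    # discard several doomed thin LVs, a few at a time
    printf '%s\n' /dev/vg/vm1-old /dev/vg/vm2-old /dev/vg/vm3-old \
        | xargs -P 4 -n 1 blkdiscard
    # for filesystems mounted on thin LVs, periodic trim is already packaged
    systemctl enable --now fstrim.timer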

> > > Also it's very important to keep metadata on fast storage device 
> > > (SSD/NVMe)!
> > > Keeping metadata on same hdd spindle as data is always going to feel slow
> > > (in fact it's quite pointless to talk about performance and use hdd...)
> > 
> > That explains why I had such a horrible experience with my initial
> > (split between NVMe and HDD) install.  I would not be surprised if some
> > or all of the metadata volume wound up on the spinning disk.
> 
> With lvm2 user can always 'pvmove'  any LV to any desired PV.
> There is not yet any 'smart' logic to do it automatically.

Good point.  I was probably unaware of that at the time.
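
For the archives: moving just the pool's metadata sub-LV onto the NVMe device
should be something like the following (LV and device names are hypothetical):

    pvmove -n vmpool_tmeta /dev/sdb1 /dev/nvme0n1p3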

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-30 Thread Demi Marie Obenour
On Sun, Jan 30, 2022 at 12:18:32PM +0100, Zdenek Kabelac wrote:
> Dne 30. 01. 22 v 2:20 Demi Marie Obenour napsal(a):
> > On Sat, Jan 29, 2022 at 10:40:34PM +0100, Zdenek Kabelac wrote:
> > > Dne 29. 01. 22 v 21:09 Demi Marie Obenour napsal(a):
> > > > On Sat, Jan 29, 2022 at 08:42:21PM +0100, Zdenek Kabelac wrote:
> > > > > Dne 29. 01. 22 v 19:52 Demi Marie Obenour napsal(a):
> > > > > > Is it possible to configure LVM2 so that it runs thin_trim before it
> > > > > > activates a thin pool?  Qubes OS currently runs blkdiscard on every 
> > > > > > thin
> > > > > > volume before deleting it, which is slow and unreliable.  Would 
> > > > > > running
> > > > > > thin_trim during system startup provide a better alternative?
> > > > > 
> > > > > Hi
> > > > > 
> > > > > 
> > > > > Nope there is currently no support from lvm2 side for this.
> > > > > Feel free to open RFE.
> > > > 
> > > > Done: https://bugzilla.redhat.com/show_bug.cgi?id=2048160
> > > > 
> > > > 
> > > 
> > > Thanks
> > > 
> > > Although your use-case Thinpool on top of VDO is not really a good plan 
> > > and
> > > there is a good reason behind why lvm2 does not support this device stack
> > > directly (aka thin-pool data LV as VDO LV).
> > > I'd say you are stepping on very very thin ice...
> > 
> > Thin pool on VDO is not my actual use-case.  The actual reason for the
> > ticket is slow discards of thin devices that are about to be deleted;
> 
> Hi
> 
> Discard of thins itself is AFAIC pretty fast - unless you have massively
> sized thin devices with many GiB of metadata - obviously you cannot process
> this amount of metadata in nanoseconds (and there are prepared kernel
> patches to make it even faster)

Would you be willing and able to share those patches?

> The real problem is the speed of discard on physical devices.
> You could actually try to feel difference with:
> lvchange --discards passdown|nopassdown thinpool

In Qubes OS I believe we do need the discards to be passed down
eventually, but I doubt it needs to be synchronous.  Being able to run
the equivalent of `fstrim -av` periodically would be amazing.  I’m
CC’ing Marek Marczykowski-Górecki (Qubes OS project lead) in case he
has something to say.
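
Checking and flipping that setting looks roughly like this (VG and pool names
are hypothetical):

    lvs -o lv_name,discards vg/vmpool          # show the current mode
    lvchange --discards nopassdown vg/vmpool   # keep pool-internal discards,
                                               # stop passing them to the device
    lvchange --discards passdown vg/vmpool     # restore pass-through later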

> Also it's very important to keep metadata on fast storage device (SSD/NVMe)!
> Keeping metadata on same hdd spindle as data is always going to feel slow
> (in fact it's quite pointless to talk about performance and use hdd...)

That explains why I had such a horrible experience with my initial
(split between NVMe and HDD) install.  I would not be surprised if some
or all of the metadata volume wound up on the spinning disk.

> > you can find more details in the linked GitHub issue.  That said, now I
> > am curious why you state that dm-thin on top of dm-vdo (that is,
> > userspace/filesystem/VM/etc ⇒ dm-thin data (*not* metadata) ⇒ dm-vdo ⇒
> > hardware/dm-crypt/etc) is a bad idea.  It seems to be a decent way to
> 
> Out-of-space recoveries are ATM much harder than what we want.

Okay, thanks!  Will this be fixed in a future version?

> So as long as user can maintain free space of your VDO and thin-pool it's
> ok. Once user runs out of space - recovery is pretty hard task (and there is
> reason we have support...)

Out of space is already a tricky issue in Qubes OS.  I certainly would
not want to make it worse.

> > add support for efficient snapshots of data stored on a VDO volume, and
> > to have multiple volumes on top of a single VDO volume.  Furthermore,
> 
> We hope we will add some direct 'snapshot' support to VDO so users will not
> need to combine both technologies together.

Does that include support for splitting a VDO volume into multiple,
individually-snapshottable volumes, the way thin works?

> Thin is more oriented towards extreme speed.
> VDO is more about 'compression & deduplication' - so space efficiency.
> 
> Combining both together is kind of harming their advantages.

That makes sense.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Demi Marie Obenour
On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:
> Dne 30. 01. 22 v 1:32 Demi Marie Obenour napsal(a):
> > On Sat, Jan 29, 2022 at 10:32:52PM +0100, Zdenek Kabelac wrote:
> > > Dne 29. 01. 22 v 21:34 Demi Marie Obenour napsal(a):
> > > > How much slower are operations on an LVM2 thin pool compared to manually
> > > > managing a dm-thin target via ioctls?  I am mostly concerned about
> > > > volume snapshot, creation, and destruction.  Data integrity is very
> > > > important, so taking shortcuts that risk data loss is out of the
> > > > question.  However, the application may have some additional information
> > > > that LVM2 does not have.  For instance, it may know that the volume that
> > > > it is snapshotting is not in use, or that a certain volume it is
> > > > creating will never be used after power-off.
> > > > 
> > 
> > > So brave developers may always write their own management tools for their
> > > constrained environment requirements that will be significantly faster in
> > > terms of how many thins you could create per minute (btw you will need to
> > > also consider dropping usage of udev on such system)
> > 
> > What kind of constraints are you referring to?  Is it possible and safe
> > to have udev running, but told to ignore the thins in question?
> 
> Lvm2 is oriented more towards managing set of different disks,
> where user is adding/removing/replacing them.  So it's more about
> recoverability, good support for manual repair  (ascii metadata),
> tracking history of changes,  backward compatibility, support
> of conversion to different volume types (i.e. caching of thins, pvmove...)
> Support for no/udev & no/systemd, clusters and nearly every linux distro
> available... So there is a lot - and this all adds quite complexity.

I am certain it does, and that makes a lot of sense.  Thanks for the
hard work!  Those features are all useful for Qubes OS, too — just not
in the VM startup/shutdown path.

> So once you scratch all this - and you say you only care about single disc
> then you are able to use more efficient metadata formats which you could
> even keep permanently in memory during the lifetime - this all adds great
> performance.
> 
> But it all depends how you could constrain your environment.
> 
> It's worth mentioning there is lvm2 support for 'external' 'thin volume'
> creators - so lvm2 only maintains 'thin-pool' data & metadata LV - but thin
> volume creation, activation, deactivation of thins is left to external tool.
> This has been used by docker for a while - later on they switched to
> overlayFs I believe..

That indeed sounds like a good choice for Qubes OS.  It would allow the
data and metadata LVs to be any volume type that lvm2 supports, and
managed using all of lvm2’s features.  So one could still put the
metadata on a RAID-10 volume while everything else is RAID-6, or set up
a dm-cache volume to store the data (please correct me if I am wrong).
Qubes OS has already moved to using a separate thin pool for virtual
machines, as it prevents dom0 (privileged management VM) from being run
out of disk space (by accident or malice).  That means that the thin
pool used for guests is managed only by Qubes OS, and so the standard
lvm2 tools do not need to touch it.
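
As a rough sketch of that arrangement (names, sizes and stripe counts are
arbitrary, and this is only one way to wire it up):

    lvcreate --type raid10 -i 2 -m 1 -L 2G -n vmpool_meta vg   # metadata, RAID-10
    lvcreate --type raid6  -i 4 -L 1T -n vmpool vg             # data, RAID-6
    lvconvert --type thin-pool --poolmetadata vg/vmpool_meta vg/vmpool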

Is this a setup that you would recommend, and would be comfortable using
in production?  As far as metadata is concerned, Qubes OS has its own
XML file containing metadata about all qubes, which should suffice for
this purpose.  To prevent races during updates and ensure automatic
crash recovery, is it sufficient to store metadata for both new and old
transaction IDs, and pick the correct one based on the device-mapper
status line?  I have seen lvm2 get in an inconsistent state (transaction
ID off by one) that required manual repair before, which is quite
unnerving for a desktop OS.
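
For reference, the committed transaction ID is the first field after the
target name in the pool's status line, so a crash-recovery check could be as
simple as the following (the mapper name is hypothetical):

    dmsetup status vg-vmpool-tpool
    #   0 209715200 thin-pool 42 1286/65536 10240/51200 - rw discard_passdown ...
    #                         ^^ committed transaction id
    dmsetup status vg-vmpool-tpool | awk '{ print $4 }'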

One feature that would be nice is to be able to import an
externally-provided mapping of thin pool device numbers to LV names, so
that lvm2 could provide a (read-only, and not guaranteed fresh) view of
system state for reporting purposes.

> > > It's worth mentioning - the more bullet-proof you will want to make your
> > > project - the closer to the extra processing made by lvm2 you will
> > > get.
> > 
> > Why is this?  How does lvm2 compare to stratis, for example?
> 
> Stratis is yet another volume manager written in Rust combined with XFS for
> easier user experience. That's all I'd probably say about it...

That’s fine.  I guess my question is why making lvm2 bullet-proof needs
so much overhead.

> > > However before you will step into these waters - you should probably
> > > evaluate wheth

Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-29 Thread Demi Marie Obenour
On Sat, Jan 29, 2022 at 10:40:34PM +0100, Zdenek Kabelac wrote:
> Dne 29. 01. 22 v 21:09 Demi Marie Obenour napsal(a):
> > On Sat, Jan 29, 2022 at 08:42:21PM +0100, Zdenek Kabelac wrote:
> > > Dne 29. 01. 22 v 19:52 Demi Marie Obenour napsal(a):
> > > > Is it possible to configure LVM2 so that it runs thin_trim before it
> > > > activates a thin pool?  Qubes OS currently runs blkdiscard on every thin
> > > > volume before deleting it, which is slow and unreliable.  Would running
> > > > thin_trim during system startup provide a better alternative?
> > > 
> > > Hi
> > > 
> > > 
> > > Nope there is currently no support from lvm2 side for this.
> > > Feel free to open RFE.
> > 
> > Done: https://bugzilla.redhat.com/show_bug.cgi?id=2048160
> > 
> > 
> 
> Thanks
> 
> Although your use-case Thinpool on top of VDO is not really a good plan and
> there is a good reason behind why lvm2 does not support this device stack
> directly (aka thin-pool data LV as VDO LV).
> I'd say you are stepping on very very thin ice...

Thin pool on VDO is not my actual use-case.  The actual reason for the
ticket is slow discards of thin devices that are about to be deleted;
you can find more details in the linked GitHub issue.  That said, now I
am curious why you state that dm-thin on top of dm-vdo (that is,
userspace/filesystem/VM/etc ⇒ dm-thin data (*not* metadata) ⇒ dm-vdo ⇒
hardware/dm-crypt/etc) is a bad idea.  It seems to be a decent way to
add support for efficient snapshots of data stored on a VDO volume, and
to have multiple volumes on top of a single VDO volume.  Furthermore,
https://access.redhat.com/articles/2106521#vdo recommends exactly this
use-case.  Or am I misunderstanding you?

> Also I assume you have already checked performance of discard on VDO, but I
> would not want to run this operation frequently on any larger volume...

I have never actually used VDO myself, although the documentation does
warn about this.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-29 Thread Demi Marie Obenour
On Sat, Jan 29, 2022 at 10:32:52PM +0100, Zdenek Kabelac wrote:
> Dne 29. 01. 22 v 21:34 Demi Marie Obenour napsal(a):
> > How much slower are operations on an LVM2 thin pool compared to manually
> > managing a dm-thin target via ioctls?  I am mostly concerned about
> > volume snapshot, creation, and destruction.  Data integrity is very
> > important, so taking shortcuts that risk data loss is out of the
> > question.  However, the application may have some additional information
> > that LVM2 does not have.  For instance, it may know that the volume that
> > it is snapshotting is not in use, or that a certain volume it is
> > creating will never be used after power-off.
> > 
> 
> Hi
> 
> Short answer: it depends ;)
> 
> Longer story:
> If you want to create a few thins per hour - then it doesn't really matter.
> If you want to create a few thins in a second - then the cost of lvm2
> management is very high - as lvm2 does far more work than just sending a
> simple ioctl (as it's called logical volume management for a reason)

Qubes OS definitely falls into the second category.  Starting a qube
(virtual machine) generally involves creating three thins (one fresh and
two snapshots).  Furthermore, Qubes OS frequently starts qubes in
response to user actions, so thin volume creation speed directly impacts
system responsiveness.
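
For a sense of scale, the raw dm-thin interface for that start-up sequence is
only a handful of device-mapper messages; a minimal sketch with made-up pool
path, device IDs and sizes (this is the kernel interface, not what Qubes OS
does today):

    # Create a fresh thin device (id 4) and snapshot device 2 as id 5;
    # the snapshot origin must be inactive or suspended at that moment.
    dmsetup message /dev/mapper/vg-vmpool-tpool 0 "create_thin 4"
    dmsetup message /dev/mapper/vg-vmpool-tpool 0 "create_snap 5 2"
    # Activate device 4 as a 10 GiB volume (table sizes are in 512-byte sectors).
    dmsetup create vm-work-root \
        --table "0 20971520 thin /dev/mapper/vg-vmpool-tpool 4"
    # Drop a device that is no longer needed.
    dmsetup message /dev/mapper/vg-vmpool-tpool 0 "delete 5"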

> So brave developers may always write their own management tools for their
> constrained environment requirements that will be significantly faster in
> terms of how many thins you could create per minute (btw you will need to
> also consider dropping usage of udev on such system)

What kind of constraints are you referring to?  Is it possible and safe
to have udev running, but told to ignore the thins in question?

> It's worth mentioning - the more bullet-proof you will want to make your
> project - the closer to the extra processing made by lvm2 you will get.

Why is this?  How does lvm2 compare to stratis, for example?

> However before you will step into these waters - you should probably
> evaluate whether thin-pool actually meets your needs if you have that high
> expectation for number of supported volumes - so you will not end up with
> hyper fast snapshot creation while the actual usage then is not meeting your
> needs...

What needs are you thinking of specifically?  Qubes OS needs block
devices, so filesystem-backed storage would require the use of loop
devices unless I use ZFS zvols.  Do you have any specific
recommendations?
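
For comparison, a zvol shows up as a block device directly, with no loop
device in between (pool and dataset names are made up):

    zfs create -s -V 10G rpool/vm/work-root   # -s makes it sparse ("thin")
    ls -l /dev/zvol/rpool/vm/work-root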

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

[linux-lvm] LVM performance vs direct dm-thin

2022-01-29 Thread Demi Marie Obenour
How much slower are operations on an LVM2 thin pool compared to manually
managing a dm-thin target via ioctls?  I am mostly concerned about
volume snapshot, creation, and destruction.  Data integrity is very
important, so taking shortcuts that risk data loss is out of the
question.  However, the application may have some additional information
that LVM2 does not have.  For instance, it may know that the volume that
it is snapshotting is not in use, or that a certain volume it is
creating will never be used after power-off.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Running thin_trim before activating a thin pool

2022-01-29 Thread Demi Marie Obenour
On Sat, Jan 29, 2022 at 08:42:21PM +0100, Zdenek Kabelac wrote:
> Dne 29. 01. 22 v 19:52 Demi Marie Obenour napsal(a):
> > Is it possible to configure LVM2 so that it runs thin_trim before it
> > activates a thin pool?  Qubes OS currently runs blkdiscard on every thin
> > volume before deleting it, which is slow and unreliable.  Would running
> > thin_trim during system startup provide a better alternative?
> 
> Hi
> 
> 
> Nope there is currently no support from lvm2 side for this.
> Feel free to open RFE.

Done: https://bugzilla.redhat.com/show_bug.cgi?id=2048160

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

[linux-lvm] Running thin_trim before activating a thin pool

2022-01-29 Thread Demi Marie Obenour
Is it possible to configure LVM2 so that it runs thin_trim before it
activates a thin pool?  Qubes OS currently runs blkdiscard on every thin
volume before deleting it, which is slow and unreliable.  Would running
thin_trim during system startup provide a better alternative?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/