Hi,

My wording might be a bit over the top, apologies in advance. I really do mean 
well, I am just a bit baffled at the wontfix reply and the lack of 
acknowledgement of the problem.

I don't really understand the "Half VG was never a real thing" approach and the 
general dismissal of this bug. I think it should still be reopened and this use 
case handled properly, or at least discussed further. I consider it a 
regression, and quite a nasty one at that, especially for servers without a BMC.



LVM officially supports activation of any LV in a VG as long as the specific 
PVs needed for that LV are present. Specifying --activationmode is something 
else entirely and should never be done (except for disaster recovery).

From the lvchange(8) man page about this option:

> Determines if LV activation is allowed when PVs are missing, e.g. because of
> a device failure. ...

Meaning that --activationmode decides whether an LV with missing PVs may be 
activated or not. That is not the situation in my setup, nor in many other 
server setups. To be clear: *only* the PVs for the *exact* LV in question need 
to be present; PVs for *other* LVs in the same VG may be missing.

What LVM does not support, unless you force it, is actually modifying a VG 
while PVs of that VG are missing, since to my knowledge the VG metadata is 
stored on each and every PV independently. LV activation does not count as a 
VG modification.
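
As a concrete sketch (all names hypothetical: VG "vg0" with LV "root" on a 
present SSD PV and LV "data" on still-absent HDD PVs), activating just the 
root LV is expected to work without touching --activationmode:

# Only the SSD PV is present; the PVs backing vg0/data are missing.
lvchange -ay vg0/root
#   WARNING: VG vg0 is missing PV ...   <- informational, root activates fine
lvchange -ay vg0/data
#   refused, because the PVs backing this particular LV really are missing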



The entire reason to have a single VG for many things is the flexibility LVM 
provides within a VG. Almost all of its features *require* the PVs to be 
within the same VG: snapshots, thin provisioning, caching, etc. Even a basic 
feature like resizing relies on a clean PV - VG - LV structure.

As a PV can only be in one VG, what the refusal of this bug suggests is that 
one has to manually partition (GPT) the drives for the root-related stuff, 
with its own PVs, VGs and LVs, so the system can boot, and then create other 
partitions with PVs on the same disk for the additional stuff. This defeats 
the entire purpose of LVM, easy volume management, and means you have to 
resize the physical disk partitions to extend anything. Good luck with that!
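
To illustrate (hypothetical names): growing an LV inside a single VG is one 
online command, whereas the split-VG layout means juggling GPT partitions first.

# Single VG: grow the LV and its filesystem online, done.
lvextend --resizefs -L +100G vg0/data
# Split VGs on one disk: move/resize GPT partitions, grow the PV with
# pvresize, and only then extend the LV -- far more involved and risky.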

A few examples: the root filesystem on some NVMe SSD and a bunch of HDDs 
attached for persistent storage. Now let's say I want to:
* When running backups, place the copy-on-write snapshot on the SSD PV so the 
HDD array PVs keep performing well, instead of doing COW on the HDDs.
* Implement LVM caching for the HDD array PVs on the fast SSD PV.
* A thin pool on the HDDs with its metadata on the SSD PV for performance.
* Reminder: simply having a /etc/luks.key on the rootfs that is used to open 
the other PVs (HDD array) will result in a boot failure if they're in the same VG!

All totally reasonable use cases (rough command sketches below). But if you 
require a split VG for the "root stuff needed during booting", none of this is 
supported, because the HDDs are only activated later. Maybe the HDD PVs are 
even on networked storage (NBD, NFS) and only become available later. LVM can 
do all of this safely, no problem.
(I have used many of the above in production environments with great results.)
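
Rough sketches of the commands involved, purely for illustration (all names 
hypothetical: VG "vg0", SSD PV /dev/nvme0n1p2, HDD PVs /dev/sdb1 and /dev/sdc1):

# Backup snapshot allocated on the SSD PV only:
lvcreate --snapshot --size 20G --name data_snap vg0/data /dev/nvme0n1p2
# Cache pool on the SSD, attached to the HDD-backed LV:
lvcreate --type cache-pool --size 50G --name cpool vg0 /dev/nvme0n1p2
lvconvert --type cache --cachepool vg0/cpool vg0/data
# Thin pool with data on the HDDs and metadata on the SSD:
lvcreate --size 1T --name tdata vg0 /dev/sdb1 /dev/sdc1
lvcreate --size 4G --name tmeta vg0 /dev/nvme0n1p2
lvconvert --type thin-pool --poolmetadata vg0/tmeta vg0/tdata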



Anyway, the main problem is that the udev rules are currently written to only 
trigger activation on LVM_VG_NAME_COMPLETE, i.e. when a VG has all the PVs 
used by all the LVs within that VG. That is not suitable for the early boot 
stage, where the LVs for root, swap and other early-required filesystems must 
be activated. It *is* suitable for auto-activation of LVs during the late boot 
stage or at runtime.
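
For the early stage it is enough to explicitly activate only the required LVs 
by name; a minimal sketch of what an initramfs hook could run (VG name 
hypothetical):

# Activate only what boot needs; the "VG is missing PV" warnings are harmless here.
lvm lvchange --sysinit -aay vg0/root vg0/swap
# Everything else in vg0 is left for normal auto-activation later, once the
# remaining PVs (HDDs, network block devices, ...) show up.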

On my workstation (the one from the opening mail of this issue), a typical 
boot goes as follows:

# The PV (SSD) containing root and swap is unlocked (LUKS, manual input pw).
# A manual initramfs hack runs "lvchange -aay -y root swap".
lvm: WARNING: VG verm-r4e is missing PV HDD_ARRAY_1...
lvm: WARNING: VG verm-r4e is missing PV HDD_ARRAY_2...
# Rootfs gets mounted; late-stage boot lvm, cryptsetup etc. run.
# lvm notices the verm-r4e VG is incomplete.
lvm[1535]: PV /dev/dm-2 online, VG verm-r4e incomplete (need 2).
# The /etc/luks.key is used to open HDD array PVs.
lvm[2387]: PV /dev/dm-17 online, VG verm-r4e incomplete (need 1).
lvm[2534]: PV /dev/dm-19 online, VG verm-r4e is complete.
# Now that all PVs are there, the remaining LVs get auto-activated.
lvm[2540]:   9 logical volume(s) in volume group "verm-r4e" now active
# As a result, my HDD pool LV is up and gets fsck'd + mounted cleanly.

This works very well, is by design, and is a very normal way of using LVM. 
Both startup and shutdown work perfectly, with clean mounts/unmounts etc.



Also, if the system configuration somehow triggers RAID rebuilds on every 
boot, the sysadmin will get mail from the monitoring tooling (if it is 
properly configured, at least). I'm pretty sure only the out-of-sync regions 
get resynced in such a case anyway (at least with a write-intent bitmap), 
which is very quick. In any case I'd rather have the system mail root about an 
issue than hard-fail in the initramfs for no good reason.
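
For what it's worth, checking whether a rebuild actually happened is a 
one-liner; a sketch using standard lvs reporting fields (hypothetical VG vg0):

# Show sync progress and mismatch counters for RAID LVs.
lvs -a -o name,segtype,sync_percent,raid_mismatch_count vg0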

This strongly feels like removing upstream-supported use cases to "protect" 
sysadmins from misconfiguring their systems, at the expense of people who do 
know what they are doing and rely on this feature for their more complex LVM 
setups. If LVM is too complex for someone, they can use plain partitioning.

I'd be glad to assist with any potential solution or provide further 
explanation, so definitely feel free to ask!

Cheers,

-- 
Melvin Vermeeren
Systems engineer
