Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-10-21 Thread Martin Wilck
On Mon, 2021-10-18 at 10:04 -0500, David Teigland wrote:


> I began trying to use RUN several months ago and I think I gave up
> trying to find a way to pass values from the RUN program back into the
> udev rule (possibly by writing values to a temp file and then doing
> IMPORT{file}).

That can't work, because RUN is executed *after* all rules have been
processed. That's documented in the man page. I don't quite understand
what you're trying to achieve.
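
For illustration, here is the difference in a nutshell (a hedged sketch
with a hypothetical helper, not the actual lvm rule):

  # IMPORT{program} runs the helper *while* the rules for this uevent are
  # being processed, so later rules can match on the variables it sets:
  ACTION=="add", SUBSYSTEM=="block", IMPORT{program}="/usr/local/bin/my_helper %N"
  ENV{MY_VG}=="?*", SYMLINK+="my/%E{MY_VG}"

  # RUN merely queues the program; it is executed only after *all* rules
  # for this uevent have been processed, so nothing it prints can be
  # imported back into the rule set:
  ACTION=="add", SUBSYSTEM=="block", RUN+="/usr/local/bin/my_helper %N"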


Martin


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-10-21 Thread Martin Wilck
On Wed, 2021-10-20 at 09:50 -0500, David Teigland wrote:
> I was just providing some background history after you and Peter both
> mentioned the idea of using RUN instead of IMPORT.  That is, I gave up
> trying to use RUN many months ago because it wouldn't work, while
> IMPORT actually does what we need, to produce the vgname variable
> inside the rule.

I see. The problem with IMPORT (like RUN, actually) is that it's
required to finish quickly, which might not be the case if we encounter
resource shortage / contention as observed by Heming. If that happens,
the programs started may be killed by udevd, with unpredictable
results.
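
For reference, the limit in question is udevd's per-event timeout (180
seconds by default, if I remember correctly); workers exceeding it are
killed together with whatever RUN or IMPORT{program} spawned. If it ever
needs to be raised, the knobs are (values purely illustrative):

  udev.event_timeout=300      # kernel command line
  event_timeout=300           # /etc/udev/udev.conf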

Regards
Martin








Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-10-18 Thread Martin Wilck
On Thu, 2021-09-30 at 10:55 -0500, David Teigland wrote:
> On Wed, Sep 29, 2021 at 11:39:52PM +0200, Peter Rajnoha wrote:
> 
> >   - For event-based activation, we need to be sure that we use "RUN"
> >     rule, not any of "IMPORT{program}" or "PROGRAM" rule. The
> >     difference is that the "RUN" rules are applied after all the
> >     other udev rules are already applied for current uevent,
> >     including creation of symlinks. And currently, we have
> >     IMPORT{program}="pvscan..." in our rule, unfortunately...
> 
> That's pretty subtle, I'm wary about propagating such specific and
> delicate behavior, seems fragile.

I'd like to second Peter here; "RUN" is in general less fragile than
"IMPORT{program}". You should use "IMPORT{program}" if and only if

 - the invoked program can work with incomplete udev state of a device
   (the program should not try to access the device via libudev; it
   should rather get properties either from sysfs or the uevent's
   environment variables), and
 - you need the result or the output of the program in order to proceed
   with rules processing (see the example below).
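
A hedged illustration of that second case (hypothetical helper and
service name, not the actual lvm rule):

  # The helper sees the uevent environment (DEVNAME, MAJOR, MINOR, ...)
  # and may read sysfs; it must not query the udev db, which has not
  # been written yet at this point.
  ACTION=="add", SUBSYSTEM=="block", IMPORT{program}="/usr/local/bin/probe_vg $devnode"
  # The imported variable is then available to the remaining rules:
  ENV{PROBED_VG}=="?*", TAG+="systemd", ENV{SYSTEMD_WANTS}+="my-activate@$env{PROBED_VG}.service"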

Regards,
Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-10-01 Thread Martin Wilck
On Thu, 2021-09-30 at 09:41 -0500, Benjamin Marzinski wrote:
> On Thu, Sep 30, 2021 at 07:51:08AM +0000, Martin Wilck wrote:
> 
> 
> For multipathd, we don't even need to care when all the block devices
> have been processed.  We only need to care about devices that are
> currently multipathed. If multipathd starts up and notices devices
> that are in S and not in U, but those devices aren't currently part
> of a multipath device, it can ignore them.

True. Sorry for being imprecise.

Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-10-01 Thread Martin Wilck
On Thu, 2021-09-30 at 23:32 +0800, heming.z...@suse.com wrote:
> > I just want to say that some of the issues might simply be
> > regressions/issues with systemd/udev that could be fixed. We as
> > providers of block device abstractions where we need to handle,
> > sometimes, thousands of devices, might be the first ones to hit these
> > issues.
> > 
> 
> The rhel8 callgrind picture
> (https://prajnoha.fedorapeople.org/bz1986158/rhel8_libudev_critical_cost.png)
> corresponds to my analysis:
> https://listman.redhat.com/archives/linux-lvm/2021-June/msg00022.html
> handle_db_line took too much time and became the hotspot.

I missed that post. You wrote

> the dev_cache_scan doesn't have direct disk IOs, but libudev will
> scan/read udev db which issue real disk IOs (location is
> /run/udev/data).
> ...
> 2. scans/reads udev db (/run/udev/data). may O(n)
>    udev will call device_read_db => handle_db_line to handle every
>    line of a db file.
> ...
> I didn't test the related udev code, and guess the <2> takes too much
> time.

... but note that /run/udev is on tmpfs, not on a real disk. So the
accesses should be very fast unless there's some locking happening.
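
This is easy to verify on a running system:

  findmnt -no FSTYPE /run      # expected: tmpfs
  ls /run/udev/data | wc -l    # number of udev db files to be parsed

which suggests that the cost Heming measured in handle_db_line is CPU
time spent parsing thousands of db files rather than actual disk I/O.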

Regards
Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-09-30 Thread Martin Wilck
On Thu, 2021-09-30 at 00:06 +0200, Peter Rajnoha wrote:
> On Tue 28 Sep 2021 12:42, Benjamin Marzinski wrote:
> > On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
> > > I have pondered this quite a bit, but I can't say I have a
> > > concrete
> > > plan.
> > > 
> > > To avoid depending on "udev settle", multipathd needs to
> > > partially
> > > revert to udev-independent device detection. At least during
> > > initial
> > > startup, we may encounter multipath maps with members that don't
> > > exist
> > > in the udev db, and we need to deal with this situation
> > > gracefully. We
> > > currently don't, and it's a tough problem to solve cleanly. Not
> > > relying
> > > on udev opens up a Pandora's box wrt WWID determination, for
> > > example.
> > > Any such change would without doubt carry a large risk of
> > > regressions
> > > in some scenarios, which we wouldn't want to happen in our large
> > > customer's data centers.
> > 
> > I'm not actually sure that it's as bad as all that. We just may
> > need a
> > way for multipathd to detect if the coldplug has happened.  I'm
> > sure if
> > we say we need it to remove the udev settle, we can get some method
> > to
> > check this. Perhaps there is one already, that I don't know about.
> > If
> 
> The coldplug events are synthesized and as such, they all now contain
> a SYNTH_UUID= key-value pair with kernel >= 4.13:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-uevent
> 
> I've already tried to propose a patch for systemd/udev that would mark
> all uevents coming from the trigger (including the one used at boot
> for coldplug) with an extra key-value pair that we could easily match
> in rules, but that was not accepted. So right now, we could detect
> that a synthesized uevent happened, though we can't be sure it was the
> actual udev trigger at boot. For that, we'd need the extra marks. I
> can give it another try though; maybe if there are more people asking
> for this functionality, we'll be in a better position for this to be
> accepted.

That would allow us to discern synthetic events, but I'm unsure how
this would help us. Here, what matters is to figure out when we don't
expect any more of them to arrive.

I guess it would be possible to compare the list of (interesting)
devices in sysfs with the list of devices in the udev db. For
multipathd, we could (see the sketch below):

 - scan set U of udev devices on startup
 - scan set S of sysfs devices on startup
 - listen for uevents for updating both S and U
 - after each uevent, check if the difference set of S and U is empty
 - if yes, coldplug has finished
 - otherwise, continue waiting, possibly until some timeout expires.
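
A minimal sketch of that check (shell, block devices only; a real
implementation would of course live inside multipathd's uevent loop):

  coldplug_done() {
      # S = block devices the kernel knows about (sysfs);
      # U = devices udev has finished processing, i.e. those that have
      #     a db entry under /run/udev/data/b<major>:<minor>.
      for dev in /sys/class/block/*; do
          [ -r "$dev/dev" ] || continue
          devno=$(cat "$dev/dev")
          [ -e "/run/udev/data/b$devno" ] || return 1   # S \ U not empty
      done
      return 0                                          # difference set empty
  }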

It's more difficult for LVM because you have no daemon maintaining
state.

Martin










Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-09-30 Thread Martin Wilck
On Thu, 2021-09-30 at 16:07 +0800, heming.z...@suse.com wrote:
> On 9/30/21 3:51 PM, Martin Wilck wrote:
> 
> 
> Another performance story:
> With legacy lvm2 (2.02.xx) and the lvmetad daemon, event-activation
> mode is very likely to time out with a large number of PVs.
> When customers met this issue, we suggested that they disable lvmetad.
> 

Right. IIRC, that used to be a common suggestion to make without having
detailed clues about the issue at hand... it would help more often than
not.

In theory, I believe that a well-written daemon maintaining a
consistent internal state (and possibly manipulating it) would scale
better than thousands of clients trying to access state in some shared
fashion (database, filesystem tree, whatever). I have no clue why that
didn't work with lvmetad. It had other issues I never clearly
understood, either. No need to discuss it further, as it has been
abandoned anyway.

Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-09-30 Thread Martin Wilck
On Wed, 2021-09-29 at 23:39 +0200, Peter Rajnoha wrote:
> On Mon 27 Sep 2021 10:38, David Teigland wrote:
> > On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > > > - We could use the new lvm-activate-* services to replace the
> > > > activation
> > > > generator when lvm.conf event_activation=0.  This would be done
> > > > by simply
> > > > not creating the event-activation-on file when
> > > > event_activation=0.
> > > 
> > > ...the issue I see here is around the systemd-udev-settle:
> > 
> > Thanks, I have a couple questions about the udev-settle to
> > understand that
> > better, although it seems we may not need it.
> > 
> > >   - the setup where lvm-activate-vgs*.service are always there
> > > (not
> > >     generated only on event_activation=0 as it was before with
> > > the
> > >     original lvm2-activation-*.service) practically means we
> > > always
> > >     make a dependency on systemd-udev-settle.service, which we
> > > shouldn't
> > >     do in case we have event_activation=1.
> > 
> > Why wouldn't the event_activation=1 case want a dependency on udev-
> > settle?
> > 
> 
> For event-based activation, I'd expect it to really behave in an
> event-based manner, that is, to respond to events as soon as they
> come and not wait for all the other devices unnecessarily.

I may be missing something here. Perhaps I misunderstood David's
concept. Of course event-based activation is best - in theory.
The reason we're having this discussion is that it may cause thousands
of event handlers to be executed in parallel, and that we have seen
cases where this was causing the system to stall during boot for
minutes, or even forever. The ideal solution for that would be to
figure out how to avoid the contention, but I thought you and David had
given up on that.

Heming has shown that the "static" activation didn't suffer from this
problem. So, to my understanding, David was seeking a way to
reconcile these two concepts, by starting out statically and switching
to event-based activation when we can without the risk of stalling. To
do that, we must figure out when to switch, and (like it or not) udev
settle is the best indicator we have.

Also, IMO David was striving for a solution that "just works"
efficiently on both small and big systems, without the admin having to
adjust configuration files.

> The use of udev-settle is always a pain - for example, if there's a
> mount point defined on top of an LV, with udev-settle as dependency,
> we practically wait for all devices to settle. With 'all', I mean
> even devices which are not block devices and which are not even
> related to any of that LVM layout and the stack underneath. So simply
> we could be waiting uselessly and we could increase the possibility
> of a timeout (...for the mount point etc.).

True, but is there anything better?
> 

> Non-event-based LVM activation needs to wait for settle for sure
> (because there's a full scan across all devices).
> 
> Event-based LVM activation just needs to be sure that:
> 
>   - the pvscan only scans the single device (the one for which
>     there's the uevent currently being processed),

If that really worked without taking any locks (e.g. on the data
structures about VGs), it would be the answer.


Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-09-30 Thread Martin Wilck
On Wed, 2021-09-29 at 23:53 +0200, Peter Rajnoha wrote:
> On Tue 28 Sep 2021 06:34, Martin Wilck wrote:
> > 
> > You said it should wait for multipathd, which in turn waits for
> > udev
> > settle. And indeed it makes some sense. After all: the idea was to
> > avoid locking issues or general resource starvation during uevent
> > storms, which typically occur in the coldplug phase, and for which
> > the
> > completion of "udev settle" is the best available indicator.
> > 
> 
> Udevd already limits the number of concurrent worker processes
> processing the udev rules for each uevent. So even if we trigger all
> the uevents, they are not processed all in parallel; there's some
> queueing.
> 
> 

This is true, but there are situations where reducing the number of
workers to anything reasonable hasn't helped avoid contention
(udev.children_max=1 is unrealistic :-) ). Heming can fill in the
details, I believe. When contention happens, it's very difficult to
debug what's going on, as it's usually during boot, the system is
unresponsive, and it only happens on very large installations that
developers rarely have access to. But Heming went quite a long way
analyzing this.
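
For the record, the knob Roger and Heming tuned can be set in several
places (values purely illustrative):

  udev.children_max=8               # kernel command line
  children_max=8                    # /etc/udev/udev.conf
  udevadm control --children-max=8  # change at runtime

but as noted above, on the affected systems no reasonable value made
the contention go away.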

> However, whether this is good or not depends on perspective - you
> could have massive parallelism and a risk of resource starvation or,
> from the other side, you could have timeouts because something wasn't
> processed in time for other parts of the system which are waiting for
> dependencies.
> 
> Also, the situation might differ based on whether during the uevent
> processing we're only looking at that concrete single device for
> which we've just received an event or whether we also need to look at
> other devices.

Yes, "it depends". We are looking for a solution that "works well" for
any setup without specific tuning. Meaning that the system doesn't
stall for substantial amounts of time during boot.

Regards
Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-09-29 Thread Martin Wilck
On Tue, 2021-09-28 at 12:42 -0500, Benjamin Marzinski wrote:
> On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
> > On Tue, 2021-09-28 at 09:42 -0500, David Teigland wrote:
> > 
> > 
> > I have pondered this quite a bit, but I can't say I have a concrete
> > plan.
> > 
> > To avoid depending on "udev settle", multipathd needs to partially
> > revert to udev-independent device detection. At least during
> > initial
> > startup, we may encounter multipath maps with members that don't
> > exist
> > in the udev db, and we need to deal with this situation gracefully.
> > We
> > currently don't, and it's a tough problem to solve cleanly. Not
> > relying
> > on udev opens up a Pandora's box wrt WWID determination, for
> > example.
> > Any such change would without doubt carry a large risk of
> > regressions
> > in some scenarios, which we wouldn't want to happen in our large
> > customer's data centers.
> 
> I'm not actually sure that it's as bad as all that. We just may need
> a way for multipathd to detect if the coldplug has happened.  I'm
> sure if we say we need it to remove the udev settle, we can get some
> method to check this. Perhaps there is one already, that I don't know
> about.

Our ideas are not so far apart, but this is the wrong thread on the
wrong mailing list :-) Adding dm-devel.

My thinking is: if during startup multipathd encounters existing maps
with member devices missing in udev, it can test the existence of the
devices in sysfs, and if the devices are present there, it shouldn't
flush the maps. This should probably be a general principle, not only
during startup or "boot" (wondering if it makes sense to try and add a
concept like "started during boot" to multipathd - I'd rather try to
keep it generic). Anyway, however you put it, that means that we'd
deviate at least to some extent from the current "always rely on udev"
principle. That's what I meant. Perhaps I exaggerated the difficulties.
Anyway, details need to be worked out, and I expect some rough edges.

> > I also looked into Lennart's "storage daemon" concept where
> > multipathd
> > would continue running over the initramfs/rootfs switch, but that
> > would
> > be yet another step with even higher risk.
> 
> This is the "set argv[0][0] = '@' to disable initramfs daemon
> killing" concept, right? We still have the problem where the udev
> database gets cleared, so if we ever need to look at that while
> processing the coldplug events, we'll have problems.

If multipathd had started during initrd processing, it would have seen
the uevents for the member devices. There are no "remove" events, so
multipathd might not even notice that the devices are gone. But libudev
queries on the devices could fail between pivot and coldplug, which is
perhaps even nastier... Also, a daemon running like this would live in
a separate, detached mount namespace. It couldn't just reread its
configuration file or the wwids file; it would have no access to the
ordinary root FS. 
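
For reference, the convention Ben mentions is literally a one-character
change in the daemon; a rough C sketch (see systemd's
ROOT_STORAGE_DAEMONS documentation):

  #include <unistd.h>

  int main(int argc, char **argv)
  {
          (void)argc;
          /* Processes whose argv[0] starts with '@' are spared when
           * systemd kills the remaining initramfs processes at
           * switch-root. Only do this when actually started from the
           * initramfs, which systemd marks with /etc/initrd-release. */
          if (access("/etc/initrd-release", F_OK) == 0)
                  argv[0][0] = '@';

          /* ... daemon initialization and main loop would follow ... */
          pause();
          return 0;
  }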

> > 
> > > Otherwise, when the devices file is not used,
> > > md: from reading the md headers from the disk
> > > mpath: from reading sysfs links and /etc/multipath/wwids
> > 
> > Ugh. Reading sysfs links means that you're indirectly depending on
> > udev, because udev creates those. It's *more* fragile than calling
> > into
> > libudev directly, IMO. Using /etc/multipath/wwids is plain wrong in
> > general. It works only on distros that use "find_multipaths
> > strict",
> > like RHEL. Not to mention that the path can be customized in
> > multipath.conf.
> 
> I admit that a wwid being in the wwids file doesn't mean that it is
> definitely a multipath path device (it could always still be
> blacklisted, for instance). Also, the ability to move the wwids file
> is unfortunate, and probably never used. But it is the case that
> every wwid in the wwids file has had a multipath device successfully
> created for it. This is true regardless of the find_multipaths
> setting, and seems to me to be a good hint. Conversely, if a device
> wwid isn't in the wwids file, then it very likely has never been
> multipathed before (assuming that the wwids file is on a writable
> filesystem).

Hm. I hear you, but I am able to run "multipath -a" and add a wwid to
the file without a multipath device ever being created. Actually, I'm
able to add bogus wwids to the file this way.

Regards,
Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-09-29 Thread Martin Wilck
On Tue, 2021-09-28 at 17:16 +0200, Martin Wilck wrote:
> On Tue, 2021-09-28 at 09:42 -0500, David Teigland wrote:
> 
> > 
> > Firstly, with the new devices file, only the actual md/mpath device
> > will
> > be in the devices file, the components will not be, so lvm will
> > never
> > attempt to look at an md or mpath component device.
> 
> I have to look more closely into the devices file and how it's
> created
> and used. 
> 
> > Otherwise, when the devices file is not used,
> > md: from reading the md headers from the disk
> > mpath: from reading sysfs links and /etc/multipath/wwids
> 
> Ugh. Reading sysfs links means that you're indirectly depending on
> udev, because udev creates those. It's *more* fragile than calling
> into
> libudev directly, IMO.

Bah. Mental short-circuit. You wrote "sysfs symlinks" and I read
"/dev/disk symlinks". Sorry! Then, I'm not quite sure what symlinks you
are talking about though.

Martin







Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-09-29 Thread Martin Wilck
Hello David and Peter,

On Mon, 2021-09-27 at 10:38 -0500, David Teigland wrote:
> On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > > - We could use the new lvm-activate-* services to replace the
> > > activation
> > > generator when lvm.conf event_activation=0.  This would be done by
> > > simply
> > > not creating the event-activation-on file when event_activation=0.
> > 
> > ...the issue I see here is around the systemd-udev-settle:
> 
> Thanks, I have a couple questions about the udev-settle to understand
> that
> better, although it seems we may not need it.
> 
> >   - the setup where lvm-activate-vgs*.service are always there (not
> >     generated only on event_activation=0 as it was before with the
> >     original lvm2-activation-*.service) practically means we always
> >     make a dependency on systemd-udev-settle.service, which we
> > shouldn't
> >     do in case we have event_activation=1.
> 
> Why wouldn't the event_activation=1 case want a dependency on udev-
> settle?

You said it should wait for multipathd, which in turn waits for udev
settle. And indeed it makes some sense. After all: the idea was to
avoid locking issues or general resource starvation during uevent
storms, which typically occur in the coldplug phase, and for which the
completion of "udev settle" is the best available indicator.

> 
> >   - If we want to make sure that we run our "non-event-based
> > activation"
> >     after systemd-udev-settle.service, we also need to use
> >     "After=systemd-udev-settle.service" (the "Wants" will only make
> > the
> >     udev settle service executed, but it doesn't order it with
> > respect
> >     to our activation services, so it can happen in parallel - we
> > want
> >     it to happen after the udev settle).
> 
> So we may not fully benefit from settling unless we use After
> (although
> the benefits are uncertain as mentioned below.)

Side note: You may be aware that the systemd people are deprecating
this service (e.g.
https://github.com/opensvc/multipath-tools/issues/3).
I'm arguing against it (perhaps you want to join in :-), but odds are
that it'll disappear sooner or later. For the time being, I don't see a
good alternative.

The dependency type you have to use depends on what you need. Do you
really only depend on udev settle because of multipathd? I don't think
so; even without multipath, thousands of PVs being probed
simultaneously can bring the performance of parallel pvscans down. That
was the original motivation for this discussion, after all. If this is
so, you should use both "Wants" and "After". Otherwise, using only
"After" might be sufficient.

> 
> > Now the question is whether we really need the systemd-udev-settle
> > at
> > all, even for that non-event-based lvm activation. The udev-settle
> > is
> > just to make sure that all the udev processing and udev db content
> > is
> > complete for all triggered devices. But if we're not reading udev
> > db and
> > we're OK that those devices might be open in parallel to lvm
> > activation
> > period (e.g. because there's blkid scan done on disks/PVs), we
> > should be
> > OK even without that settle. However, we're reading some info from
> > udev db,
> > right? (like the multipath component state etc.)
> 
> - Reading the udev db: with the default
>   external_device_info_source=none we no longer ask the udev db for
>   any info about devs.  (We now follow that setting strictly, and
>   only ask udev when source=udev.)

This is a different discussion, but if you don't ask udev, how do you
determine (reliably, and consistently with other services) whether a
given device will be part of a multipath device or a MD Raid member?

I know well there are arguments both for and against using udev in this
context, but whatever optimizations you implement, they should work
both ways.
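
For context, these are the two lvm.conf settings being discussed here
(devices section):

  devices {
      # 1 = enumerate block devices by asking libudev, 0 = scan /dev
      obtain_device_list_from_udev = 1
      # "udev" = consult the udev db for multipath/MD component
      # detection, "none" = rely on lvm's native detection only
      external_device_info_source = "none"
  }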

> - Concurrent blkid and activation: I can't find an issue with this
>   (couldn't force any interference with some quick tests.)

In the past, there were issues with either pvscan or blkid (or
multipath) failing to open a device while another process had opened it
exclusively. I've never understood all the subtleties. See systemd
commit 3ebdb81 ("udev: serialize/synchronize block device event
handling with file locks").
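
For reference, the convention from that commit is a BSD file lock on
the device node, which udevd respects while processing events for the
device; a hedged shell sketch of how a tool can serialize against it:

  exec 9< /dev/sdX         # hypothetical device
  flock --exclusive 9      # wait until no event for the device is in flight
  # ... probe or modify the device here ...
  flock --unlock 9
  exec 9<&-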

> - I wonder if After=udev-settle could have an incidental but
> meaningful
>   effect of more PVs being in place before the service runs?

After=udev-settle will make sure that you're past a coldplug uevent
storm during boot. IMO this is the most important part of the equation.
I'd be happy to find a solution for this that doesn't rely on udev
settle, but I don't see any.

Regards
Martin

> 
> I'll try dropping udev-settle in all cases to see how things look.
> 
> Dave
> 





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-09-29 Thread Martin Wilck
On Tue, 2021-09-28 at 09:42 -0500, David Teigland wrote:
> On Tue, Sep 28, 2021 at 06:34:06AM +0000, Martin Wilck wrote:
> > Hello David and Peter,
> > 
> > On Mon, 2021-09-27 at 10:38 -0500, David Teigland wrote:
> > > On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > > > > - We could use the new lvm-activate-* services to replace the
> > > > > activation
> > > > > generator when lvm.conf event_activation=0.  This would be
> > > > > done by
> > > > > simply
> > > > > not creating the event-activation-on file when
> > > > > event_activation=0.
> > > > 
> > > > ...the issue I see here is around the systemd-udev-settle:
> > > 
> > > Thanks, I have a couple questions about the udev-settle to
> > > understand
> > > that
> > > better, although it seems we may not need it.
> > > 
> > > >   - the setup where lvm-activate-vgs*.service are always there
> > > > (not
> > > >     generated only on event_activation=0 as it was before with
> > > > the
> > > >     original lvm2-activation-*.service) practically means we
> > > > always
> > > >     make a dependency on systemd-udev-settle.service, which we
> > > > shouldn't
> > > >     do in case we have event_activation=1.
> > > 
> > > Why wouldn't the event_activation=1 case want a dependency on
> > > udev-
> > > settle?
> > 
> > You said it should wait for multipathd, which in turn waits for
> > udev
> > settle. And indeed it makes some sense. After all: the idea was to
> > avoid locking issues or general resource starvation during uevent
> > storms, which typically occur in the coldplug phase, and for which
> > the
> > completion of "udev settle" is the best available indicator.
> 
> Hi Martin, thanks, you have some interesting details here.
> 
> Right, the idea is for lvm-activate-vgs-last to wait for other
> services like multipath (or anything else that a PV would typically
> sit on), so that it will be able to activate as many VGs as it can
> that are present at startup.  And we avoid responding to individual
> coldplug events for PVs, saving time/effort/etc.
> 
> > I'm arguing against it (perhaps you want to join in :-), but odds
> > are
> > that it'll disappear sooner or later. Fot the time being, I don't
> > see a
> > good alternative.
> 
> multipath has more complex udev dependencies, I'll be interested to
> see how you manage to reduce those, since I've been reducing/isolating
> our udev usage also.

I have pondered this quite a bit, but I can't say I have a concrete
plan.

To avoid depending on "udev settle", multipathd needs to partially
revert to udev-independent device detection. At least during initial
startup, we may encounter multipath maps with members that don't exist
in the udev db, and we need to deal with this situation gracefully. We
currently don't, and it's a tough problem to solve cleanly. Not relying
on udev opens up a Pandora's box wrt WWID determination, for example.
Any such change would without doubt carry a large risk of regressions
in some scenarios, which we wouldn't want to happen in our large
customer's data centers.

I also looked into Lennart's "storage daemon" concept where multipathd
would continue running over the initramfs/rootfs switch, but that would
be yet another step with even higher risk.

> 
> > The dependency type you have to use depends on what you need. Do
> > you
> > really only depend on udev settle because of multipathd? I don't
> > think
> > so; even without multipath, thousands of PVs being probed
> > simultaneously can bring the performance of parallel pvscans down.
> > That
> > was the original motivation for this discussion, after all. If this
> > is
> > so, you should use both "Wants" and "After". Otherwise, using only
> > "After" might be sufficient.
> 
> I don't think we really need the settle.  If device nodes for PVs are
> present, then vgchange -aay from lvm-activate-vgs* will see them and
> activate VGs from them, regardless of what udev has or hasn't done
> with them yet.

Hm. This would mean that the switch to event-based PV detection could
happen before "udev settle" ends. A coldplug storm of uevents could
create 1000s of PVs in a blink after event-based detection was enabled.
Wouldn't that resurrect the performance issues that you are trying to
fix with this patch set?

> 
> > > - Reading the udev db: with the defau

Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-09-13 Thread Martin Wilck
On Thu, 2021-09-09 at 14:44 -0500, David Teigland wrote:
> On Tue, Jun 08, 2021 at 01:23:33PM +0000, Martin Wilck wrote:
> > On Di, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:
> > > On Mon 07 Jun 2021 16:48, David Teigland wrote:
> > > > 
> > > > If there are say 1000 PVs already present on the system, there
> > > > could be
> > > > real savings in having one lvm command process all 1000, and
> > > > then
> > > > switch
> > > > over to processing uevents for any further devices afterward. 
> > > > The
> > > > switch
> > > > over would be delicate because of the obvious races involved
> > > > with
> > > > new devs
> > > > appearing, but probably feasible.
> > > 
> > > Maybe to avoid the race, we could possibly write the proposed
> > > "/run/lvm2/boot-finished" right before we initiate scanning in
> > > "vgchange
> > > -aay" that is a part of the lvm2-activation-net.service (the last
> > > service to do the direct activation).
> > > 
> > > A few event-based pvscans could fire during the window between
> > > "scan initiated phase" in lvm2-activation-net.service's
> > > "ExecStart=vgchange -aay..."
> > > and the originally proposed "ExecStartPost=/bin/touch
> > > /run/lvm2/boot-
> > > finished",
> > > but I think still better than missing important uevents
> > > completely in
> > > this window.
> > 
> > That sounds reasonable. I was thinking along similar lines. Note
> > that
> > in the case where we had problems lately, all actual activation
> > (and
> > slowness) happened in lvm2-activation-early.service.
> 
> I've implemented a solution like this and would like any thoughts,
> improvements, or testing to verify it can help:
> https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-activation-switch-1
> 
> I've taken some direction from the lvm activation generator, but there
> are details of that I'm not too familiar with, so I may be missing
> something (in particular it has three activation points but I'm
> showing two below.)  This new method would probably let us drop the
> activation-generator, since we could easily configure an equivalent
> using this new method.
> 
> Here's how it works:
> 
> uevents for PVs run pvscan with the new option --eventactivation
> check.  This makes pvscan check if the /run/lvm/event-activation-on
> file exists.  If not, pvscan does nothing.
> 
> lvm-activate-vgs-main.service
> . always runs (not generated)
> . does not wait for other virtual block device systems to start
> . runs vgchange -aay to activate any VGs already present
> 
> lvm-activate-vgs-last.service
> . always runs (not generated)
> . runs after other systems, like multipathd, have started (we want it
>   to find as many VGs to activate as possible)
> . runs vgchange -aay --eventactivation enable
> . the --eventactivation enable creates /run/lvm/event-activation-on,
>   which enables the traditional pvscan activations from uevents.
> . this vgchange also creates pv online files for existing PVs.
>   (Future pvscans will need the online files to know when VGs are
>   completed, i.e. for VGs that are partially complete at the point
>   of switching to event-based activation.)
> 
> uevents for PVs continue to run pvscan with the new option
> --eventactivation check, but the check now sees the
> event-activation-on temp file, so they will do activation as they
> have before.
> 
> Notes:
> 
> - To avoid missing VGs during the transition to event-based, the
>   vgchange in lvm-activate-vgs-last will create event-activation-on
>   before doing anything else.  This means for a period of time both
>   vgchange and pvscan may attempt to activate the same VG.  These
>   commits use the existing mechanism to resolve this (the --vgonline
>   option and /run/lvm/vgs_online).
> 
> - We could use the new lvm-activate-* services to replace the
>   activation generator when lvm.conf event_activation=0.  This would
>   be done by simply not creating the event-activation-on file when
>   event_activation=0.
> 
> - To do the reverse, and use only event based activation without any
>   lvm-activate-vgs services, a new lvm.conf setting could be used,
>   e.g. event_activation_switch=0 and disabling lvm-activate-vgs
>   services.

This last idea sounds awkward to me. But the rest is very nice. 
Heming, do you agree we should give it a try?

Thanks,
Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-07-07 Thread Martin Wilck
On Fr, 2021-07-02 at 16:09 -0500, David Teigland wrote:
> On Sun, Jun 06, 2021 at 02:15:23PM +0800, heming.z...@suse.com wrote:
> > dev_cache_scan //order: O(n^2)
> >  + _insert_dirs //O(n)
> >  | if obtain_device_list_from_udev() true
> >  |   _insert_udev_dir //O(n)
> >  |
> >  + dev_cache_index_devs //O(n)
> 
> I've been running some experiments and trying some patches to improve
> this.  By setting obtain_device_list_from_udev=0, and using the
> attached patch to disable dev_cache_index_devs, the pvscan is much
> better.
> 
> systemctl status lvm2-pvscan appears to show that the pvscan command
> itself runs for only 2-4 seconds, while the service as a whole takes
> around 15 seconds.  See the 16 sec gap below from the end of pvscan
> to the systemd Started message.  If that's accurate, the remaining
> delay would lie outside lvm.
> 
> Jul 02 15:27:57 localhost.localdomain systemd[1]: Starting LVM event
> activation on device 253:1710...
> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] PV
> /dev/mapper/mpathalz online, VG 1ed02c7d-0019-43c4-91b5-f220f3521ba9
> is complete.
> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   pvscan[65620] VG
> 1ed02c7d-0019-43c4-91b5-f220f3521ba9 run autoactivation.
> Jul 02 15:28:00 localhost.localdomain lvm[65620]:   1 logical
> volume(s) in volume group "1ed02c7d-0019-43c4-91b5-f220f3521ba9" now
> active

Printing this message is really the last thing that pvscan does?

> Jul 02 15:28:16 localhost.localdomain systemd[1]: Started LVM event
> activation on device 253:1710.

If systemd is very busy, it might take some time until it sees the
completion of the unit. We may need to involve systemd experts. Anyway,
what counts is the behavior if we have lots of parallel pvscan
processes.

Thanks,
Martin








Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-08 Thread Martin Wilck
On Di, 2021-06-08 at 15:56 +0200, Peter Rajnoha wrote:
> 
> The issue is that we're relying now on udev db records that contain
> info about mpath and MD components - without this, the detection (and
> hence filtering) could fail in certain cases. So if we go without
> checking udev db, that'll be a step back. As an alternative, we'd need
> to call out mpath and MD directly from LVM2 if we really wanted to
> avoid checking udev db (but then, we're checking the same thing that
> is already checked by udev means).

Recent multipath-tools ships the "libmpathvalid" library that
could be used for this purpose, to make the logic comply with what
multipathd itself uses. It could be used as an alternative to libudev
for this part of the equation.

Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-08 Thread Martin Wilck
On Di, 2021-06-08 at 18:02 +0200, Zdenek Kabelac wrote:
> > 
> > > A third related improvement that could follow is to add stronger
> > > native
> > > mpath detection, in which lvm uses /etc/multipath/wwids,
> > > directly or
> > > through a multipath library, to identify mpath components.  This
> > > would
> > > supplement the existing sysfs and udev sources, and address the
> > > difficult
> > > case where the mpath device is not yet set up.
> > > 
> > Please don't. Use libmpathvalid if you want to improve in this
> > area.
> > That's what it was made for.
> 
> Problem is the addition of another dependency here.
> 
> We may probably think about using 'dlopen' and, if the library is
> present, use it, but IMHO libmpathvalid should be integrated into
> libblkid in some way - linking another library into many other
> projects that need to detect MP devices really complicates this a
> lot.  libblkid should be able to decode this and make things much
> cleaner.

Fair enough. I just wanted to say "don't start hardcoding anything new
in lvm2". Currently, you won't find libmpathvalid on many
distributions.

Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-08 Thread Martin Wilck
On Di, 2021-06-08 at 10:39 -0500, David Teigland wrote:
> 
> . Use both native md/mpath detection *and* udev info when it's
>   readily available (don't wait for it), instead of limiting
>   ourselves to one source of info.  If either source indicates an
>   md/mpath component, then we consider it true.

Hm. You can boot with "multipath=off" which udev would take into
account. What would you do in that case? Native mpath detection would
probably not figure it out.

multipath-tools itself follows the "try udev and fall back to native if
it fails" approach, which isn't always perfect, either.

> A third related improvement that could follow is to add stronger
> native mpath detection, in which lvm uses /etc/multipath/wwids,
> directly or through a multipath library, to identify mpath
> components.  This would supplement the existing sysfs and udev
> sources, and address the difficult case where the mpath device is not
> yet set up.
> 

Please don't. Use libmpathvalid if you want to improve in this area.
That's what it was made for.

Regards,
Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-08 Thread Martin Wilck
On Di, 2021-06-08 at 14:29 +0200, Peter Rajnoha wrote:
> On Mon 07 Jun 2021 16:48, David Teigland wrote:
> > 
> > If there are say 1000 PVs already present on the system, there
> > could be
> > real savings in having one lvm command process all 1000, and then
> > switch
> > over to processing uevents for any further devices afterward.  The
> > switch
> > over would be delicate because of the obvious races involved with
> > new devs
> > appearing, but probably feasible.
> 
> Maybe to avoid the race, we could possibly write the proposed
> "/run/lvm2/boot-finished" right before we initiate scanning in
> "vgchange -aay" that is a part of the lvm2-activation-net.service
> (the last service to do the direct activation).
> 
> A few event-based pvscans could fire during the window between
> "scan initiated phase" in lvm2-activation-net.service's
> "ExecStart=vgchange -aay..." and the originally proposed
> "ExecStartPost=/bin/touch /run/lvm2/boot-finished", but I think still
> better than missing important uevents completely in this window.

That sounds reasonable. I was thinking along similar lines. Note that
in the case where we had problems lately, all actual activation (and
slowness) happened in lvm2-activation-early.service.

Regards,
Martin






Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-08 Thread Martin Wilck
On Mo, 2021-06-07 at 16:30 -0500, David Teigland wrote:
> On Mon, Jun 07, 2021 at 10:27:20AM +0000, Martin Wilck wrote:
> > Most importantly, this was about LVM2 scanning of physical volumes.
> > The
> > number of udev workers has very little influence on PV scanning,
> > because the udev rules only activate systemd service. The actual
> > scanning takes place in lvm2-pvscan@.service. And unlike udev,
> > there's
> > no limit for the number of instances of a given systemd service
> > template that can run at any given time.
> 
> Excessive device scanning has been the historical problem in this area,
> but Heming mentioned dev_cache_scan() specifically as a problem.  That
> was
> surprising to me since it doesn't scan/read devices, it just creates a
> list of device names on the system (either readdir in /dev or udev
> listing.)  If there are still problems with excessive
> scanning/reading, we'll need some more diagnosis of what's happening,
> there could be some cases we've missed.

Heming didn't include his measurement results in the initial post.
Here's a small summary. Heming will be able to provide more details.
You'll see that the effects are quite drastic, factors 3-4 between
every step below, factor >60 between best and worst. I'd say these
results are typical for what we observe also on real-world systems.

kvm-qemu, 6 vcpu, 20G memory, 1258 scsi disks, 1015 vg/lv
Shown is "systemd-analyze blame" output.

 1) lvm2 2.03.05 (SUSE SLE15-SP2),
    obtain_device_list_from_udev=1 & event_activation=1
      9min 51.782s lvm2-pvscan@253:2.service
      9min 51.626s lvm2-pvscan@65:96.service
      (many other lvm2-pvscan@ services follow)
 2) lvm2 latest master,
    obtain_device_list_from_udev=1 & event_activation=1
      2min 6.736s lvm2-pvscan@70:384.service
      2min 6.628s lvm2-pvscan@70:400.service
 3) lvm2 latest master,
    obtain_device_list_from_udev=0 & event_activation=1
      40.589s lvm2-pvscan@131:976.service
      40.589s lvm2-pvscan@131:928.service
 4) lvm2 latest master,
    obtain_device_list_from_udev=0 & event_activation=0
      21.034s dracut-initqueue.service
       8.674s lvm2-activation-early.service

IIUC, 2) is the effect of _pvscan_aa_quick(). 3) is surprising;
apparently libudev's device detection causes a factor 3 slowdown.
While 40s is not bad, you can see that event-based activation still
performs far worse than the "serial" device detection in
lvm2-activation-early.service.

Personally, I'm sort of wary about obtain_device_list_from_udev=0
because I'm uncertain whether it might break multipath/MD detection.
Perhaps you can clarify that.

Regards
Martin






Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-08 Thread Martin Wilck
On Mo, 2021-06-07 at 23:30 +0800, heming.z...@suse.com wrote:
> On 6/7/21 6:27 PM, Martin Wilck wrote:
> > On So, 2021-06-06 at 11:35 -0500, Roger Heflin wrote:
> > > This might be a simpler way to control the number of threads at
> > > the
> > > same time.
> > > 
> > > On large machines (cpu wise, memory wise and disk wise).   I have
> > > only seen lvm timeout when udev_children is set to default.   The
> > > default seems to be set wrong, and the default seemed to be tuned
> > > for
> > > a case where a large number of the disks on the machine were
> > > going to
> > > be timing out (or otherwise really really slow), so to support
> > > this
> > > case a huge number of threads was required..    I found that with
> > > it
> > > set to default on a close to 100 core machine that udev got about
> > > 87
> > > minutes of time during the boot up (about 2 minutes).  Changing
> > > the
> > > number of children to =4 resulted in udev getting around 2-3
> > > minutes
> > > in the same window, and actually resulted in a much faster boot
> > > up
> > > and a much more reliable boot up (no timeouts).
> > 
> > Wow, setting the number of children to 4 is pretty radical. We
> > decrease
> > this parameter often on large machines, but we never went all the
> > way
> > down to a single-digit number. If that's really necessary under
> > whatever circumstances, it's clear evidence of udev's deficiencies.
> > 
> > I am not sure if it's better than Heming's suggestion though. It
> > would
> > affect every device in the system. It wouldn't even be possible to
> > process more than 4 totally different events at the same time.
> > 
> 
> hello
> 
> I tested udev.children_max with values 1, 2 & 4. The results showed it
> didn't take effect, and the booting time was even longer than before.
> This solution may suit some special cases.

Thanks, good to know. There may be other scenarios where Roger's
suggestion might help. But it should be clear that no distribution will
ever use such low limits, because it'd slow down booting on other
systems unnecessarily.

Thanks,
Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-08 Thread Martin Wilck
On So, 2021-06-06 at 11:35 -0500, Roger Heflin wrote:
> This might be a simpler way to control the number of threads at the
> same time.
> 
> On large machines (cpu wise, memory wise and disk wise), I have only
> seen lvm timeout when udev_children is set to default.  The default
> seems to be set wrong, and the default seemed to be tuned for a case
> where a large number of the disks on the machine were going to be
> timing out (or otherwise really really slow), so to support this case
> a huge number of threads was required.  I found that with it set to
> default on a close to 100 core machine that udev got about 87 minutes
> of time during the boot up (about 2 minutes).  Changing the number of
> children to =4 resulted in udev getting around 2-3 minutes in the same
> window, and actually resulted in a much faster boot up and a much more
> reliable boot up (no timeouts).

Wow, setting the number of children to 4 is pretty radical. We decrease
this parameter often on large machines, but we never went all the way
down to a single-digit number. If that's really necessary under
whatever circumstances, it's clear evidence of udev's deficiencies.

I am not sure if it's better than Heming's suggestion though. It would
affect every device in the system. It wouldn't even be possible to
process more than 4 totally different events at the same time.

Most importantly, this was about LVM2 scanning of physical volumes. The
number of udev workers has very little influence on PV scanning,
because the udev rules only activate systemd service. The actual
scanning takes place in lvm2-pvscan@.service. And unlike udev, there's
no limit for the number of instances of a given systemd service
template that can run at any given time.

Note that there have been various changes in the way udev calculates
the default number of workers; what udev will use by default depends on
the systemd version and may even be patched by the distribution.

> Below is one case, but I know there are several other similar cases
> for other distributions.    Note the number of default workers = 8 +
> number_of_cpus * 64 which is going to be a disaster as it will result
> in one thread per disk/lun being started at the same time or the
> max_number_of_workers. 

What distribution are you using? This is not the default formula for
children-max any more, and hasn't been for a while.

Regards
Martin





Re: [linux-lvm] Discussion: performance issue on event activation mode

2021-06-08 Thread Martin Wilck
On So, 2021-06-06 at 14:15 +0800, heming.z...@suse.com wrote:
> 
> 1. During the boot phase, lvm2 automatically switches to direct
> activation mode ("event_activation = 0"). After boot, it switches
> back to event activation mode.
> 
> The boot phase is a special stage. *During boot*, we could "pretend"
> that direct activation (event_activation=0) is set, and rely on
> lvm2-activation-*.service for PV detection. Once
> lvm2-activation-net.service has finished, we could "switch on" event
> activation.

I like this idea. Alternatively, we could discuss disabling event
activation only in the "coldplug" phase after switching root (i.e.
between start of systemd-udev-trigger.service and lvm2-
activation.service), because that's the critical time span during which
1000s of events can happen simultaneously.

Regards
Martin






Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd

2021-02-22 Thread Martin Wilck
On Fri, 2021-02-19 at 23:47 +0100, Zdenek Kabelac wrote:
> 
> Right time is when switch is finished and we have rootfs with /usr
> available - should be ensured by  lvm2-monitor.service and it
> dependencies.

While we're at it - I'm wondering why dmeventd is started so early.
dm-event.service on recent installations has only
"Requires=dm-event.socket", so it'll be started almost immediately
after switching root. In particular, it doesn't wait for any sort of
device initialization or udev initialization.

I've gone through the various tasks that dmeventd is responsible for,
and I couldn't see anything that'd be strictly necessary during early
boot. I may be overlooking something of course. Couldn't the monitoring
be delayed to after local-fs.target, for example?
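
Something like the following drop-in would express that idea (just a
sketch of the suggestion, not a tested proposal):

  # dm-event.service.d/delay.conf
  [Unit]
  # Don't start dmeventd right after switch-root; wait until local
  # filesystems are available.
  After=local-fs.target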

(This is also related to our previous discussion about
external_device_info_source=udev; we found that dmeventd was one of the
primary sources of strange errors with that setting).

Regards
Martin






Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd

2021-02-22 Thread Martin Wilck
On Fri, 2021-02-19 at 10:37 -0600, David Teigland wrote:
> On Thu, Feb 18, 2021 at 04:19:01PM +0100, Martin Wilck wrote:
> > > Feb 10 17:24:26 archlinux lvm[643]:   pvscan[643] VG sys run
> > > autoactivation.
> > > Feb 10 17:24:26 archlinux lvm[643]:   /usr/bin/dmeventd: stat
> > > failed:
> > > No such file or directory
> > 
> > What's going on here? pvscan trying to start dmeventd ? Why ?
> > There's a
> > dedicated service for starting dmeventd (lvm2-monitor.service). I
> > can
> > see that running dmeventd makes sense as you have thin pools, but
> > I'm
> > at a loss why it has to be started at that early stage during boot
> > already.
> > 
> > This is a curious message, it looks as if pvscan was running from
> > an
> > environment (initramfs??) where dmeventd wasn't available. The
> > message
> > is repeated, and after that, pvscan appears to hang...
> 
> I've found that when pvscan activates a VG, there's a bit of code
> that
> attempts to monitor any LVs that are already active in the VG. 
> Monitoring
> means interacting with dmeventd.  I don't know why it's doing that,
> it
> seems strange, but the logic around monitoring in lvm seems ad hoc
> and in
> need of serious reworking.  In this case I'm guessing there's already
> an
> LV active in "sys", perhaps from direct activation in initrd, and
> when
> pvscan activates that VG it attempts to monitor the already active
> LV.
> 
> Another missing piece in lvm monitoring is that we don't have a way
> to
> start lvm2-monitor/dmeventd at the right time (I'm not sure anyone
> even
> knows when the right time is), so we get random behavior depending on
> if
> it's running or not at a given point.  In this case, it looks like it
> happens to not be running yet.  I sometimes suggest disabling lvm2-
> monitor
> and starting it manually once the system is up, to avoid having it
> interfere during startup.

That sounds familiar.

> 
> > > Feb 10 17:24:26 archlinux lvm[643]:   /usr/bin/dmeventd: stat
> > > failed:
> > > No such file or directory
> > > Feb 10 17:24:26 archlinux lvm[643]:   WARNING: Failed to monitor
> > > sys/pool.
> > > Feb 10 17:24:26 archlinux systemd[1]: Stopping LVM event
> > > activation
> > > on device 9:0...
> 
> The unwanted failed monitoring seems to have caused the pvscan
> command to
> exit with an error, which then leads to further mess and confusion
> where
> systemd then thinks it should stop or kill the pvscan service,
> whatever
> that means.

The way I read Oleksandr's logs, systemd is killing all processes
because it wants to switch root, not because of errors in the pvscan
service. The weird thing is that that fails for one of the pvscan tasks
(253:2), and that that service continues to "run" (rather, "hang") long
after the root switch has happened.

Thanks,
Martin






Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd

2021-02-22 Thread Martin Wilck
On Thu, 2021-02-18 at 16:30 +0100, Oleksandr Natalenko wrote:
> > 
> > So what's timing out here is the attempt to _stop_ pvscan. That's
> > curious. It looks like a problem in pvscan to me, not having
> > reacted to
> > a TERM signal for 30s.
> > 
> > It's also worth noting that the parallel pvscan process for device
> > 9:0
> > terminated correctly (didn't hang).
> 
> Yes, pvscan seems to not react to SIGTERM. I have
> DefaultTimeoutStopSec=30s, if I set this to 90s, pvscan hangs for 90s
> respectively.
> 

Good point. That allows us to conclude that pvscan may hang on exit
when udevd isn't available at the time (has been already stopped). That
positively looks like an lvm problem. The After= is a viable
workaround, nothing more and nothing less. We'd need to run pvscan with
increased debug/log level to figure out why it doesn't stop. Given that
you have a workaround, I'm not sure if it's worth the effort for you.
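
For anyone else hitting this, the workaround boils down to a drop-in
along these lines (assuming the lvm2-pvscan@.service template used
here):

  # lvm2-pvscan@.service.d/udevd.conf
  [Unit]
  # Start after (and thus stop before) udevd, so that pvscan is not
  # still running when udevd has already gone away.
  After=systemd-udevd.service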

What strikes me more in your logs is the fact that systemd proceeds
with switching root even though the pvscan@253:2 service hasn't
terminated yet. That looks a bit fishy, really. systemd should have
KILLed pvscan before proceeding.

Martin


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd

2021-02-19 Thread Martin Wilck
On Wed, 2021-02-17 at 14:38 +0100, Oleksandr Natalenko wrote:
> Hi.
> 

Thanks for the logs!

> I'm not sure this issue is reproducible with any kind of LVM layout.
> What I have is thin-LVM-on-LUKS-on-LVM:

I saw MD in your other logs...?

More comments below.

> With regard to the journal, here it is (from the same machine in the
> Arch bugreport; matches the second layout above):
> 
> 
> [~]> LC_TIME=C sudo journalctl -b -10 -u lvm2-pvscan@\*
> -- Journal begins at Fri 2020-12-18 16:33:22 CET, ends at Wed 2021-
> 02-17 14:28:05 CET. --
> Feb 10 17:24:17 archlinux systemd[1]: Starting LVM event activation
> on device 9:0...
> Feb 10 17:24:17 archlinux lvm[463]:   pvscan[463] PV /dev/md0 online,
> VG base is complete.
> Feb 10 17:24:17 archlinux lvm[463]:   pvscan[463] VG base run
> autoactivation.
> Feb 10 17:24:17 archlinux lvm[463]:   2 logical volume(s) in volume
> group "base" now active
> Feb 10 17:24:17 archlinux systemd[1]: Finished LVM event activation
> on device 9:0.
> Feb 10 17:24:26 archlinux systemd[1]: Starting LVM event activation
> on device 253:2...
> Feb 10 17:24:26 archlinux lvm[643]:   pvscan[643] PV /dev/mapper/sys
> online, VG sys is complete.

All good up to here, but then...

> Feb 10 17:24:26 archlinux lvm[643]:   pvscan[643] VG sys run
> autoactivation.
> Feb 10 17:24:26 archlinux lvm[643]:   /usr/bin/dmeventd: stat failed:
> No such file or directory

What's going on here? pvscan trying to start dmeventd? Why? There's a
dedicated service for starting dmeventd (lvm2-monitor.service). I can
see that running dmeventd makes sense as you have thin pools, but I'm
at a loss why it has to be started at that early stage during boot
already.

This is a curious message, it looks as if pvscan was running from an
environment (initramfs??) where dmeventd wasn't available. The message
is repeated, and after that, pvscan appears to hang...
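
One way to check that theory (assuming a dracut-built initrd, which is a guess
on my part):

```
# Was dmeventd packed into the initramfs at all?
lsinitrd | grep dmeventd
# And is the binary present in the root filesystem at the path pvscan stat()s?
stat /usr/bin/dmeventd
```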


> Feb 10 17:24:26 archlinux lvm[643]:   /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 10 17:24:26 archlinux lvm[643]:   WARNING: Failed to monitor
> sys/pool.
> Feb 10 17:24:26 archlinux systemd[1]: Stopping LVM event activation
> on device 9:0...

Here I suppose systemd is switching root, and trying to stop jobs,
including the pvscan job.


> Feb 10 17:24:26 archlinux lvm[720]:   pvscan[720] PV /dev/md0 online.
> Feb 10 17:24:26 archlinux lvm[643]:   /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 10 17:24:26 archlinux lvm[643]:   WARNING: Failed to monitor
> sys/pool.
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: State
> 'stop-sigterm' timed out. Killing.
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Killing
> process 643 (lvm) with signal SIGKILL.
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Main
> process exited, code=killed, status=9/KILL
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Failed
> with result 'timeout'.
> Feb 10 17:24:56 spock systemd[1]: Stopped LVM event activation on
> device 253:2.

So what's timing out here is the attempt to _stop_ pvscan. That's
curious. It looks like a problem in pvscan to me, not having reacted to
a TERM signal for 30s.

It's also worth noting that the parallel pvscan process for device 9:0
terminated correctly (didn't hang).

> 
> [~]> LC_TIME=C sudo journalctl -b -10 --grep pvscan
> -- Journal begins at Fri 2020-12-18 16:33:22 CET, ends at Wed 2021-
> 02-17 14:31:27 CET. --
> Feb 10 17:24:17 archlinux systemd[1]: Created slice system-
> lvm2\x2dpvscan.slice.
> Feb 10 17:24:17 archlinux lvm[463]:   pvscan[463] PV /dev/md0 online,
> VG base is complete.
> Feb 10 17:24:17 archlinux lvm[463]:   pvscan[463] VG base run
> autoactivation.
> Feb 10 17:24:17 archlinux audit[1]: SERVICE_START pid=1 uid=0
> auid=4294967295 ses=4294967295 msg='unit=lvm2-pvscan@9:0
> comm="systemd" exe="/init" hostname=? addr=? terminal=? res=success'
> Feb 10 17:24:17 archlinux kernel: audit: type=1130
> audit(1612974257.986:6): pid=1 uid=0 auid=4294967295 ses=4294967295 
> msg='unit=lvm2-pvscan@9:0 comm="systemd" exe="/init" hostname=?
> addr=? terminal=? res=success'
> Feb 10 17:24:26 archlinux lvm[643]:   pvscan[643] PV /dev/mapper/sys
> online, VG sys is complete.
> Feb 10 17:24:26 archlinux lvm[643]:   pvscan[643] VG sys run
> autoactivation.
> Feb 10 17:24:26 archlinux lvm[720]:   pvscan[720] PV /dev/md0 online.
> Feb 10 17:24:27 spock systemd[1]: lvm2-pvscan@9:0.service: Control
> process exited, code=killed, status=15/TERM
> Feb 10 17:24:27 spock systemd[1]: lvm2-pvscan@9:0.service: Failed
> with result 'signal'.
> Feb 10 17:24:26 spock audit[1]: SERVICE_STOP pid=1 uid=0
> auid=4294967295 ses=4294967295 msg='unit=lvm2-pvscan@9:0
> comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
> terminal=? res=failed'
> Feb 10 17:24:27 spock systemd[1]: Requested transaction contradicts
> existing jobs: Transaction for lvm2-pvscan@253:2.service/start is
> destructive (lvm2-pvscan@253:2.service has 'stop' job queued, but
> 

Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd

2021-02-17 Thread Martin Wilck
On Wed, 2021-02-17 at 13:03 +0100, Christian Hesse wrote:
> 
> Let's keep this in mind. Now let's have a look at udevd startup: It
> signals
> being ready by calling sd_notifyf(), but it loads rules and applies
> permissions before doing so [0].
> Even before that, we have some code about handling events and monitoring
> stuff.

It loads the rules, but events will only be processed after entering
sd_event_loop(), which happens after the sd_notify() call.

Anyway, booting the system with "udev.log-priority=debug" might provide
further insight. Oleksandr, could you try that (without the After=
directive)?

> So I guess pvscan is started in the initialization phase before udevd
> signals
> being ready. And obviously there is some kind of race condition.

Right. Some uevent might arrive between the creation of the monitor
socket in monitor_new() and entering the event loop. Such an event would
be handled immediately, and possibly before systemd receives the
sd_notify message, so a race condition looks possible.

> 
> With the "After=" ordering in `lvm2-pvscan@.service`, the service
> start is
> queued in the initialization phase, but the actual start and pvscan execution
> are
> delayed until udevd has signaled being ready.
> 
> > But in general, I think this needs deeper analysis. Looking at
> > https://bugs.archlinux.org/task/69611, the workaround appears to
> > have
> > been found simply by drawing an analogy to a previous similar case.
> > I'd like to understand what happened on the arch system when the
> > error
> > occurred, and why this simple ordering directive avoided it.
> 
> As said, I cannot reproduce it myself... Oleksandr, can you give more
> details?
> Possibly everything from the journal regarding systemd-udevd.service (and
> systemd-udevd.socket) and lvm2-pvscan@*.service could help.
> 
> > 1. How had the offending pvscan process been started? I'd expect
> > that
> > "pvscan" (unlike "lvm monitor" in our case) was started by an udev
> > rule. If udevd hadn't started yet, how would that udev rule have been
> > executed? OTOH, if pvscan had not been started by udev but by
> > another
> > systemd service, then *that* service would probably need to get the
> > After=systemd-udevd.service directive.
> 
> To my understanding it was started from udevd by a rule in
> `69-dm-lvm-metad.rules`.
> 
> (BTW, renaming that rule file may make sense now that lvm2-metad is
> gone...)
> 
> > 2. Even without the "After=" directive, I'd assume that pvscan
> > wasn't
> > started "before" systemd-udevd, but rather "simultaneously" (i.e.
> > in
> > the same systemd transaction). Thus systemd-udevd should have
> > started
> > up while pvscan was running, and pvscan should have noticed that
> > udevd
> > eventually became available. Why did pvscan time out? What was it
> > waiting for? We know that lvm checks for the existence of
> > "/run/udev/control", but that should have become avaiable after
> > some
> > fractions of a second of waiting.
> 
> I do not think there is anything starting pvscan before udevd.

I agree. The race described above looks at least possible.
I would go one step further and say that *every* systemd service that
might be started from an udev rule should have an "After=systemd-
udevd.service".

Martin


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd

2021-02-17 Thread Martin Wilck
On Wed, 2021-02-17 at 09:49 +0800, heming.z...@suse.com wrote:
> On 2/11/21 7:16 PM, Christian Hesse wrote:
> > From: Christian Hesse 
> > 
> > Running the scan before udevd finished startup may result in
> > failure.
> > This has been reported for Arch Linux [0] and proper ordering fixes
> > the issue.
> > 
> > [0] https://bugs.archlinux.org/task/69611
> > 
> > Signed-off-by: Christian Hesse 
> > ---
> >   scripts/lvm2-pvscan.service.in | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/scripts/lvm2-pvscan.service.in b/scripts/lvm2-
> > pvscan.service.in
> > index 09753e8c9..7b4ace551 100644
> > --- a/scripts/lvm2-pvscan.service.in
> > +++ b/scripts/lvm2-pvscan.service.in
> > @@ -4,6 +4,7 @@ Documentation=man:pvscan(8)
> >   DefaultDependencies=no
> >   StartLimitIntervalSec=0
> >   BindsTo=dev-block-%i.device
> > +After=systemd-udevd.service
> >   Before=shutdown.target
> >   Conflicts=shutdown.target
> >   
> > 
> 
> I observed a similar issue with lvm2-monitor.service.
> On a very old machine (i586), udevd took too long to finish, which
> triggered
> the lvm2-monitor timeout and reported:
> > WARNING: Device /dev/sda not initialized in udev database even
> > after waiting 1000 microseconds.
> 
> One workable solution is to add "systemd-udev-settle.service"
> (obsolete) or
> "local-fs.target" to the "After=" of lvm2-monitor.service.

We have to differentiate here. In our case we had to wait for "systemd-
udev-settle.service". In the arch case, it was only necessary to wait
for systemd-udevd.service itself. "After=systemd-udevd.service" just
means that the daemon is up; it says nothing about any device
initialization being completed.
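
To spell out the difference in unit-file terms (a sketch only; I'm not
recommending the deprecated settle service):

```
[Unit]
# Alternative 1: only wait until the udev daemon is up and has signaled
# readiness; device processing may still be in full swing.
After=systemd-udevd.service

# Alternative 2: additionally wait until the current uevent queue has been
# processed -- that's what the (deprecated) settle service provides, at the
# cost of slowing down boot.
Wants=systemd-udev-settle.service
After=systemd-udev-settle.service
```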

But in general, I think this needs deeper analysis. Looking at
https://bugs.archlinux.org/task/69611, the workaround appears to have
been found simply by drawing an analogy to a previous similar case.
I'd like to understand what happened on the arch system when the error
occurred, and why this simple ordering directive avoided it.

1. How had the offending pvscan process been started? I'd expect that
"pvscan" (unlike "lvm monitor" in our case) was started by an udev
rule. If udevd hadn't started yet, how would that udev rule have been
executed? OTOH, if pvscan had not been started by udev but by another
systemd service, then *that* service would probably need to get the
After=systemd-udevd.service directive.

2. Even without the "After=" directive, I'd assume that pvscan wasn't
started "before" systemd-udevd, but rather "simultaneously" (i.e. in
the same systemd transaction). Thus systemd-udevd should have started
up while pvscan was running, and pvscan should have noticed that udevd
eventually became available. Why did pvscan time out? What was it
waiting for? We know that lvm checks for the existence of
"/run/udev/control", but that should have become avaiable after some
fractions of a second of waiting.

Regards,
Martin



___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] system boot time regression when using lvm2-2.03.05

2019-09-12 Thread Martin Wilck
On Wed, 2019-09-11 at 11:13 +0200, Zdenek Kabelac wrote:
> On 11. 09. 19 at 9:17, Martin Wilck wrote:
> > 
> > My idea was not to skip synchronization entirely, but to consider
> > moving it to a separate process / service. I surely don't want to
> > re-
> > invent lvmetad, but Heming's findings show that it's more efficient
> > to
> > do activation in a "single swoop" (like lvm2-activation.service)
> > than
> > with many concurrent pvscan processes.
> > 
> > So instead of activating a VG immediately when it sees all
> > necessary
> > PVs are detected, pvscan could simply spawn a new service which
> > would
> > then take care of the activation, and sync with udev.
> > 
> > Just a thought, I lack in-depth knowledge of LVM2 internals to know
> > if
> > it's possible.
> 
> Well, for a relatively long time we have wanted to move 'pvscan' back to
> being processed
> within udev rules, with the activation service really being just a service
> doing 'vgchange -ay'.

That sounds promising (I believe pvscan could well still be called via
'ENV{SYSTEMD_WANTS}+=' rather than being directly called from udev
rules, but that's just a detail). 
But it doesn't sound as if such a solution was imminent, right?
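
For illustration, the rule side of that could look roughly like this (the
match and the service name are invented, this is not the actual
69-dm-lvm-metad.rules content):

```
# Queue a templated service instead of calling pvscan from the rule itself;
# the service then does its work outside udev's rule-processing context.
ACTION=="add|change", SUBSYSTEM=="block", ENV{ID_FS_TYPE}=="LVM2_member", ENV{SYSTEMD_WANTS}+="lvm-activate@$major:$minor.service"
```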

> Another floating idea is to move towards monitoring instead of using
> semaphores
> (since those SysV resources are kind of limited and a bit problematic
> when they are left in the system).

I'm not sure I understand - are you talking about udev monitoring?

Thanks
Martin


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] system boot time regression when using lvm2-2.03.05

2019-09-11 Thread Martin Wilck
On Tue, 2019-09-10 at 22:38 +0200, Zdenek Kabelac wrote:
> On 10. 09. 19 at 17:20, David Teigland wrote:
> > > > > _pvscan_aa
> > > > >vgchange_activate
> > > > > _activate_lvs_in_vg
> > > > >  sync_local_dev_names
> > > > >   fs_unlock
> > > > >dm_udev_wait <=== this point!
> > > > > ```
> > > Could you explain to us what's happening in this code? IIUC, an
> > > incoming uevent triggers pvscan, which then possibly triggers VG
> > > activation. That in turn would create more uevents. The pvscan
> > > process
> > > then waits for uevents for the tree "root" of the activated LVs
> > > to be
> > > processed.
> > > 
> > > Can't we move this waiting logic out of the uevent handling? It
> > > seems
> > > weird to me that a process that acts on a uevent waits for the
> > > completion of another, later uevent. This is almost guaranteed to
> > > cause
> > > delays during "uevent storms". Is it really necessary?
> > > 
> > > Maybe we could create a separate service that would be
> > > responsible for
> > > waiting for all these outstanding udev cookies?
> > 
> > Peter Rajnoha walked me through the details of this, and explained
> > that a
> > timeout as you describe looks quite possible given default
> > timeouts, and
> > that lvm doesn't really require that udev wait.
> > 
> > So, I pushed out this patch to allow pvscan with --noudevsync:
> > https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=3e5e7fd6c93517278b2451a08f47e16d052babbb
> > 
> > You'll want to add that option to lvm2-pvscan.service; we can
> > hopefully
> > update the service to use that if things look good from testing.
> 
> This is certainly a bug.
> 
> lvm2 surely does need to communicate with udev for any activation.
> 
> We can't run activation 'on-the-fly' without control on a
> system with
> udev (so that we do not issue 'remove' while there is still an 'add' in
> progress).
> 
> Also, any more complex target like a thin pool needs to wait until the
> metadata LV gets
> ready for thin-check.

My idea was not to skip synchronization entirely, but to consider
moving it to a separate process / service. I surely don't want to re-
invent lvmetad, but Heming's findings show that it's more efficient to
do activation in a "single swoop" (like lvm2-activation.service) than
with many concurrent pvscan processes.

So instead of activating a VG immediately when it sees all necessary
PVs are detected, pvscan could simply spawn a new service which would
then take care of the activation, and sync with udev.

Just a thought, I lack in-depth knowledge of LVM2 internals to know if
it's possible.
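
A rough sketch of what such a spawned per-VG service could look like
(everything here is invented for illustration, not an existing lvm2 unit):

```
# lvm-activate-vg@.service (hypothetical); pvscan would request an instance
# of this, e.g. lvm-activate-vg@myvg.service, once it sees the VG is complete.
[Unit]
Description=Activate LVM2 volume group %I
DefaultDependencies=no

[Service]
Type=oneshot
# Activation and udev sync happen here, outside the uevent-handling pvscan
# process, so pvscan itself never has to block in dm_udev_wait().
ExecStart=/usr/sbin/lvm vgchange -ay %I
```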

Thanks
Martin


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] system boot time regression when using lvm2-2.03.05

2019-09-10 Thread Martin Wilck
Hi David,

On Mon, 2019-09-09 at 09:09 -0500, David Teigland wrote:
> On Mon, Sep 09, 2019 at 11:42:17AM +, Heming Zhao wrote:
> > Hello David,
> > 
> > You are right.  Without calling _online_pvscan_one(), the pv/vg/lv
> > won't be activated.
> > The activation jobs will be done by systemd calling lvm2-
> > activation-*.services later.
> > 
> > With the current code, the boot process is mainly blocked by:
> > ```
> > _pvscan_aa
> >   vgchange_activate
> >_activate_lvs_in_vg
> > sync_local_dev_names
> >  fs_unlock
> >   dm_udev_wait <=== this point!
> > ```
> 
> Thanks for debugging that.  With so many devices, one possibility
> that
> comes to mind is this error you would probably have seen:
> "Limit for the maximum number of semaphores reached"

Could you explain to us what's happening in this code? IIUC, an
incoming uevent triggers pvscan, which then possibly triggers VG
activation. That in turn would create more uevents. The pvscan process
then waits for uevents for the tree "root" of the activated LVs to be
processed.

Can't we move this waiting logic out of the uevent handling? It seems
weird to me that a process that acts on a uevent waits for the
completion of another, later uevent. This is almost guaranteed to cause
delays during "uevent storms". Is it really necessary?

Maybe we could create a separate service that would be responsible for
waiting for all these outstanding udev cookies?
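
As a very rough sketch, such a service could simply poll the outstanding
cookies (assuming they are visible via "dmsetup udevcookies", which prints a
header line followed by one row per cookie):

```
#!/bin/sh
# Illustration only -- not an existing lvm2/device-mapper service.
# Wait until no dm udev cookies remain outstanding.
while [ "$(dmsetup udevcookies | wc -l)" -gt 1 ]; do
        sleep 0.1
done
```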

Martin


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] system boot time regression when using lvm2-2.03.05

2019-09-06 Thread Martin Wilck
On Fri, 2019-09-06 at 05:01 +, Heming Zhao wrote:
> I just tried to apply only the patch below (without partially backing out
> commit 25b58310e3).
> The attrs in the lvs output still have the 'a' bit.
> 
> ```patch
> +#if 0
>   if (!_online_pvscan_one(cmd, dev, NULL,
> complete_vgnames, saved_vgs, 0, _without_metadata))
>   add_errors++;
> +#endif


IIUC this would mean that you skip David's "pvs_online" file generation
entirely. How did the auto-activation happen, then?

> ```
> 
> the output of "systemd-analyze blame | head -n 10":
> ```
>   59.279s systemd-udev-settle.service
>   39.979s dracut-initqueue.service
>1.676s lvm2-activation-net.service

Could it be that lvm2-activation-net.service activated the VGs? I can
imagine that that would be efficient, because when this service runs
late in the boot process, I'd expect all PVs to be online, so
everything can be activated in a single big swoop. Unfortunately, this
wouldn't work in general, as it would be too late for booting from LVM
volumes.

However I thought all lvm2-activation... services were gone with LVM
2.03?

Regards
Martin


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] confused with lvm2 filter rules

2019-06-25 Thread Martin Wilck
Hello Zdenek,

On Tue, 2019-06-25 at 05:30 +, Heming Zhao wrote:
> Hello Zdenek,
> 
> I am raising this topic again. LVM seems to have a bug in the filter code.
> Let me show the example.
> 
> filter = [ "a|^/dev/sd.*|", "r|.*|" ]
> As the documentation describes, the above filter rules:
>   deny all devices except "/dev/sd.*"
> 
> issue:
>   pvcreate executes successfully with "/dev/disk/by-id/xxx", but 
> vgextend doesn't.
> 
> expected result:
>   pvcreate should deny the device "/dev/disk/by-id/xxx".
> 

I disagree with Heming in this point. IMO both pvcreate and vgextend
should accept the device because of the first "a" rule. In any case,
it's obviously wrong that the two tools behave differently.

Note also that this difference occurs only if lvmetad is used. Without
lvmetad, both commands accept the device. The reason is that in this
case, lvmcache_label_scan(), which also builds the alias tree, is
called before applying the filter. With lvmetad OTOH,
lvmcache_label_scan() is basically a noop.

IMO this should be fixed by adding a call to
lvmcache_seed_infos_from_lvmetad() before applying the device filter to
vgextend. vgcreate() calls it early on, right after
lvmcache_label_scan(); the same might work for vgextend as well.
Alternatively, it might be possible to add a call to 
lvmcache_seed_infos_from_lvmetad() to pvcreate_each_device(); in that
case it might even be possible to remove the early calls in vgcreate().
I don't understand the initialization sequence of LVM2 commands well
enough to create a patch myself. Besides vgextend, other LVM2 commands
that need to apply the filter may be affected, too, as 
lvmcache_seed_infos_from_lvmetad() seems to be used only in a few hand-
selected code paths.

I suspect that this problem came to be in David's "label_scan" patch
series in April 2018. But we haven't verified that yet. I've put David
on CC.

> 
> analysis:
> 
> vgextend log excerpt: the aliases DB is built up _after_ applying the
> filter, which falsely rejects the device.
> ```log
>   lvmcmdline.c:2888Processing command: vgextend - -dd -t 
> testvg /dev/disk/by-id/scsi-36001405bbbdf17a69ad46ffb287a3340-part3
>   device/dev-cache.c:723 Found dev 8:35 
> /dev/disk/by-id/scsi-36001405bbbdf17a69ad46ffb287a3340-part3 - new.
>   filters/filter-regex.c:172 
> /dev/disk/by-id/scsi-36001405bbbdf17a69ad46ffb287a3340-part3:
> Skipping 
> (regex)
>   filters/filter-persistent.c:346 filter caching bad 
> /dev/disk/by-id/scsi-36001405bbbdf17a69ad46ffb287a3340-part3
>   ...
>   device/dev-cache.c:1212Creating list of system devices.
>   device/dev-cache.c:763 Found dev 8:35 /dev/sdc3 - new alias
>   device/dev-cache.c:763 Found dev 8:35 
> /dev/disk/by-id/iscsi-iqn.2018-06.de.suse.mwilck:sles15u-sp1-
> 01_i_iface:default-1--iqn.2018-06.de.suse.zeus:01_t_0x1-lun-3-part3 
> - new alias.
> ```
> 
> vgcreate: the aliases DB is built up before applying the filter,
> which 
> works correctly now.
> ```log
>   lvmcmdline.c:2888Processing command: vgcreate - -dd -t 
> tvg1 /dev/disk/by-id/scsi-36001405bbbdf17a69ad46ffb287a3340-part3
>   device/dev-cache.c:1212Creating list of system devices.
>   device/dev-cache.c:723 Found dev 8:35 /dev/sdc3 - new.
>   device/dev-cache.c:763 Found dev 8:35 
> /dev/disk/by-id/iscsi-iqn.2018-06.de.suse.mwilck:sles15u-sp1-
> 01_i_iface:default-1--iqn.2018-06.de.suse.zeus:01_t_0x1-lun-3-part3 
> - new alias.
>   filters/filter-persistent.c:312 /dev/sdc3: filter cache using 
> (cached good)
> ```
> 
> pvcreate will convert "/dev/disk/by-id/xxx" into another name 
> "/dev/sdX", which can pass the filter rule.

A bit more precisely: when running pvcreate (or vgcreate), by the time
the filter is called, "/dev/sdX" has been added to the list of aliases
and thus the device is accepted, whereas in a vgextend run, the list of
aliases has not been built up yet, and contains only a single member
"/dev/disk/by-id/...", which is rejected.

Regards,
Martin

-- 
Dr. Martin Wilck, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/