On Tue, Mar 29, 2016 at 10:02:41AM -0400, John Stoffel wrote:
> >>>>> "Benjamin" == Benjamin Marzinski <bmarz...@redhat.com> writes:
>
> Benjamin> lvm needs PV devices to not be suspended while the udev
> Benjamin> rules are running, for them to be correctly identified as
> Benjamin> PVs. However, multipathd will often be in a situation where
> Benjamin> it will create a multipath device upon seeing a path, and
> Benjamin> then immediately reload the device upon seeing another path.
> Benjamin> If multipath is reloading a device while processing the udev
> Benjamin> event from its creation, lvm can fail to identify it as a
> Benjamin> PV. This can cause systems to fail to boot. Unfortunately,
> Benjamin> using udev synchronization cookies to solve this issue would
> Benjamin> cause a host of other issues that could only be avoided by a
> Benjamin> pretty substantial change in how multipathd does locking and
> Benjamin> event processing. The good news is that multipathd is
> Benjamin> already listening to udev events itself, and can make sure
> Benjamin> that it isn't reloading when it shouldn't be.
>
> Benjamin> This patch makes multipathd delay or refuse any reloads that
> Benjamin> would happen between the time when it creates a device, and
> Benjamin> when it receives the change uevent from the device
> Benjamin> creation. The only reloads that it refuses are from the
> Benjamin> multipathd interactive commands that make no sense on a not
> Benjamin> fully started device. Otherwise, it processes the event or
> Benjamin> command, and sets a flag to either mark that device for an
> Benjamin> update, or to signal that multipathd needs a
> Benjamin> reconfigure. When the udev event for the creation arrives,
> Benjamin> multipath will reload the device if necessary. If a
> Benjamin> reconfigure has been requested, and no devices are currently
> Benjamin> being created, multipathd will also do the reconfigure then.
>
> Benjamin> Also this patch adds a configurable timer
> Benjamin> "missing_uev_msg_delay" defaulting to 30 seconds. If the
> Benjamin> udev creation event has not arrived after this timeout has
> Benjamin> triggered, multipathd will start printing messages alerting
> Benjamin> the user of this every "missing_uev_msg_delay" seconds.
>
> Should this really keep printing this message every 30 seconds for
> eternity? I would think that having it give up after 30 * N seconds
> would be better instead. I'm worried that this might block or slow
> down system boots forever, instead of at least failing and falling
> through so that maybe something can be recovered here.

Fair enough. I should probably lower that timer, and after
(timer * some_max_retries) seconds have passed, just stop waiting and
let the reloads go ahead. However, as it stands, this isn't too likely
to interfere with bootup. It won't do anything to stop the multipath
devices from being created, only from being reloaded.

> Basically, what can the user do if they start getting these messages?
> We should prompt them with a possible cause/solution if at all
> possible.

All that the user could do would be to override the waiting, which is
just what multipathd could do automatically. If we've already waited a
while for the event and haven't received it, we're unlikely to have it
come through while we are reloading. Also, the risk is the same for a
person manually fixing it and for multipathd automatically doing it, so
it seems like multipathd should just automatically stop waiting in this
case.

-Ben

> John

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
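
For illustration only, the bounded wait discussed in this reply (warn
every "timer" seconds, then stop waiting once timer * some_max_retries
seconds have passed) could be sketched roughly as below. This is not
code from the patch: struct uev_wait, uev_wait_tick(), uev_wait_done(),
and the assumption of a once-per-second tick are all hypothetical names
and choices made up for the example.

```c
#include <stdbool.h>

/* Hypothetical per-device state; none of these names come from the patch. */
struct uev_wait {
	bool waiting;         /* set at device creation, cleared by the change uevent */
	unsigned int ticks;   /* seconds elapsed since creation or the last warning */
	unsigned int retries; /* number of delay periods that have expired so far */
};

enum uev_wait_result {
	UEV_WAIT_IDLE,    /* not waiting for a creation uevent */
	UEV_WAIT_PENDING, /* still waiting, nothing to report yet */
	UEV_WAIT_WARN,    /* a delay period expired: print the warning */
	UEV_WAIT_GIVE_UP  /* delay * max_retries seconds passed: allow reloads */
};

/* Assumed to be called once per second for each device. */
enum uev_wait_result
uev_wait_tick(struct uev_wait *w, unsigned int delay, unsigned int max_retries)
{
	if (!w->waiting)
		return UEV_WAIT_IDLE;
	if (++w->ticks < delay)
		return UEV_WAIT_PENDING;

	/* One full delay period has elapsed without the uevent arriving. */
	w->ticks = 0;
	if (++w->retries < max_retries)
		return UEV_WAIT_WARN;

	/* Stop waiting so that delayed reloads can go ahead. */
	w->waiting = false;
	return UEV_WAIT_GIVE_UP;
}

/* Assumed to be called when the change uevent from device creation arrives. */
void uev_wait_done(struct uev_wait *w)
{
	w->waiting = false;
}
```

With delay = 30 and max_retries = 4, for example, this sketch would
warn three times and then give up on the fourth expiry, 120 seconds
after the device was created.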