Hi Ben,

On Thu, 2025-12-18 at 19:38 -0500, Benjamin Marzinski wrote:
> On Wed, Dec 17, 2025 at 10:21:10PM +0100, Martin Wilck wrote:
> > The libudev documentation [1] states that libudev is not thread-safe
> > and shouldn't be used from multithreaded programs. While the man page
> > also states that using libudev in multithreaded programs is unsafe
> > even if locking is used, there's not much else we can do in
> > multipath-tools to mitigate the problem, as we are using libudev in
> > multiple threads all over the place.
>
> Ugh. I'm a little confused on what exactly could go wrong. Do you know
> what they are concerned with, that locking cannot fix?

No. I've asked our SUSE systemd experts, but they didn't know either.
The man page mentions the use of setenv()/getenv(), but I don't think
that matters for us.

> It sounds like the functions themselves are safe to call from multiple
> threads, as long as they are working on separate data. If they were
> manipulating global state, that wouldn't be true.

They aren't working on completely separate data; that's the problem. We
only have one "struct udev", which provides the context for the entire
operation of libudev in our code. The udev_device structs we use are
created from that object and link to it internally. Individual
udev_device structs are linked together, e.g. through parent
relationships. That's why devices obtained by udev_device_get_parent()
et al. don't need to be unref()'d; they're referenced by their children
and are supposed to be freed together with the last child.
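To illustrate (a minimal sketch; "udev" stands for our global context
object, and the syspath is made up):

    struct udev_device *dev =
            udev_device_new_from_syspath(udev, "/sys/block/sda");
    struct udev_device *parent = udev_device_get_parent(dev);
    /* "parent" is borrowed: it's owned by "dev", no ref was taken */
    udev_device_unref(dev);    /* frees dev, and parent along with it */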
As the udev operations are opaque, we can't make any assumptions about
which of the libudev functions allocate memory or otherwise modify
shared data structures. Some properties are fetched "lazily", when the
user actually needs them, and then cached in libudev. The two-layer
architecture, with udev_device actually being a shallow wrapper around
sd_device etc., doesn't make it easier to assess the code flow inside
the systemd libraries.

These issues should be fixed by serializing the libudev accesses with a
mutex (see the sketch in the P.S. below). I agree with you that I don't
understand what could still go wrong if that's done.

In theory we could have separate "struct udev" objects per thread, but
keeping them in sync with each other would be a nightmare. We might as
well give up on running multithreaded altogether.

> > Add wrapper code that wraps all calls to libudev in a critical
> > section holding a specific mutex.
> >
> > [1] https://man7.org/linux/man-pages/man3/libudev.3.html
>
> I assume that there must be times when we call udev functions outside
> of the vecs lock. Did you check how often that is? Not that it would
> be great to pile even more work on the vecs lock.

I wouldn't want to do that. Even if we could create a comprehensive
assessment now, it could become incorrect at any time just by adding an
innocent libudev call somewhere. Also, I wouldn't want to overload the
vecs lock further.

This code is ugly, but I am pretty sure that the mutex operations will
be lightweight. The mutex is always at the bottom of our call stack, so
no ABBA deadlock is possible. I started out by just wrapping some
libudev calls, but I soon realized that we can't do that without making
assumptions about libudev's inner workings, which we shouldn't (see
above).

That said, I agree that it's unclear whether this is really worth the
effort. I did it because I was hoping to fix the memory leak issue #129
with it, but that attempt was unsuccessful. So we have no evidence that
this serialization fixes any actual issue. multipathd has been using
libudev from a multi-threaded context for 20 years, and we have never
encountered a problem that was caused by that (or if we did, we didn't
realize it). At least in theory, the mutex could cause a deadlock or a
noticeable delay if some libudev operation blocks or crashes, so the
change is not without risk.

So, consider this part of the patch set an RFC. OTOH, the libudev man
page is pretty clear about libudev not being thread-safe.

> Also, git flagged some whitespace errors.

Oops, will fix.

Thanks,
Martin
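P.S. For reference, the wrapping is conceptually just this. A rough
sketch, not the literal patch; the names "udev_mutex", "udev_locked"
and "get_parent_locked" are made up here, and the macro relies on GCC
statement expressions:

    #include <libudev.h>
    #include <pthread.h>

    static pthread_mutex_t udev_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* Serialize an arbitrary non-void libudev call (void-returning
     * calls would need a separate variant). */
    #define udev_locked(call) ({                            \
            __typeof__(call) __ret;                         \
            pthread_mutex_lock(&udev_mutex);                \
            __ret = (call);                                 \
            pthread_mutex_unlock(&udev_mutex);              \
            __ret;                                          \
    })

    /* usage, e.g.: */
    static struct udev_device *get_parent_locked(struct udev_device *dev)
    {
            return udev_locked(udev_device_get_parent(dev));
    }

The mutex is process-global and held only for the duration of the
single call, which is why it always sits at the bottom of the call
stack.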
