On Fri, Dec 19, 2025 at 09:06:49AM +0100, Martin Wilck wrote:
> On Thu, 2025-12-18 at 19:38 -0500, Benjamin Marzinski wrote:
> > On Wed, Dec 17, 2025 at 10:21:10PM +0100, Martin Wilck wrote:
> > It sounds like
> > the functions themselves are safe to call from multiple threads, as
> > long
> > as they are working on separate data. If they were manipulating
> > global
> > state that wouldn't be true.
> 
> They aren't working on completely separate data, that's the problem. We
> only have one "struct udev", which provides the context for the entire
> operation of libudev in our code. The udev_device structs we use are
> created from that object and link to it internally. Individual
> udev_device structs are linked together e.g. through parent
> releationships. That's why devices obtained by udev_device_get_parent()
> et al. don't need to be unref()'d; they're referenced by their children
> and supposed to be freed together with the last child.
> 
> As the udev operations are opaque, we can't make any assumptions about
> which of the libudev funtions allocate memory, or modify shared data
> structures otherwise. Some properties are fetched "lazily", when the
> user actually needs them, and then cached in libudev. The 2-layer
> architecture with udev_device being actually a shallow wrapper around
> sd_device etc. doesn't make it easier to assess the code flow inside
> the systemd libraries.
> 
> These issues should be fixed by serializing the libudev accesses with a
> mutex. I agree with you that I don't understand what could still go
> wrong if that's done.
> 
> In theory we could have separate "struct udev" devices per thread, but
> keeping these in sync with each other would be a nightmare. We could
> rather give up running multithreaded altogether.

Yeah. I started looking into this, and quickly realized the issue with
the shared struct udev and gave up. I can easily believe that two
threads running at the same time could corrupt a sharted object, and
since everything is linked to one struct udev, there's no way that we
can claim that there can't be shared objects in any of our calls.

I assume that we've likely been pretty safe because most (maybe all) of
our udev device accesses (which is where I assume things could get
corrupted the easiest) have been protected by the vecs lock. But the
purge code is about to add a bunch of unprotected accesses, so clearly
we need some locking there. 
 
> > 
> > > Add wrapper code that wraps all calls to libudev in a critical
> > > section
> > > holding a specific mutex.
> > > 
> > > [1] https://man7.org/linux/man-pages/man3/libudev.3.html
> > 
> > I assume that there must be times when use call a udev outside of the
> > vecs lock. Did you check how often that is? Not that would be great
> > to
> > pile even more work on the vecs lock.
> 
> I wouldn't want to do that. Even if we could create a comprehensive
> assessment now, it could become incorrect any time just by adding an
> innocent libudev call somewhere. Also, I wouldn't want to overload the
> vecs lock further. This code is ugly, but I am pretty sure that the
> mutex operations will be light-weight. The mutex is always at the
> bottom of our call stack, so no ABBA deadlock is possibe.
> 
> I started out by just wrapping some libudev calls, but I soon realized
> that we can't do that without making assumptions about libudev's inner
> workings, which we shouldn't (see above).
> 
> That said, I agree that it's unclear if this is really worth the
> effort. I did it because I was hoping to fix the memory leak issue #129
> with it. But that attempt was unsuccessful. So we have no evidence that
> this serialization fixes any actual issue. multipathd has been using
> libudev from multi-theaded context for 20 years, and we have never
> encountered a problem that was caused by that (or if we did, we didn't
> realize). At least in theory, the mutex could cause a deadlock or
> noticeable delay if some libudev operation blocks or crashes, so the
> change is not without risk. 
> 
> So, consider this part of the patch set an RFC.
> 
> OTOH, the libudev man page is pretty clear about being not thread-safe.

Especially with the purge code adding otherwise unprotected udev device
accesses, I think this is probably a good idea. I don't recall seeing
libudev calls block. I'm betting the real risk is for systems where the
root filesystem is multipathed, and you lose all paths. But that's
always the scary case, and we do enough udev accesses with the vecs lock
held, that if we can get wedged in one, we're likely already gonna get
locked up, even without these mutexes.

-Ben
 
> > Also, git flagged some whitespace errors.
> 
> Ups, will fix.
> 
> Thanks,
> Martin


Reply via email to