Hi Ben, On Mon, 2026-04-27 at 20:20 -0400, Benjamin Marzinski wrote: > On Thu, Apr 23, 2026 at 10:44:43PM +0200, Martin Wilck wrote: > > This code adds a generic, abstract implementation of the kind of > > threads we > > need for checkers: detached threads that may time out. The design > > is such > > that these threads never access any memory of the calling program, > > simplifying the management of lifetimes of objects. > > > > See the documentation of the API in "runner.h". > > > > Signed-off-by: Martin Wilck <[email protected]> > > --- > > libmpathutil/Makefile | 2 +- > > libmpathutil/libmpathutil.version | 7 +- > > libmpathutil/runner.c | 221 > > ++++++++++++++++++++++++++++++ > > libmpathutil/runner.h | 102 ++++++++++++++ > > 4 files changed, 330 insertions(+), 2 deletions(-) > > create mode 100644 libmpathutil/runner.c > > create mode 100644 libmpathutil/runner.h > > > > [...] > > diff --git a/libmpathutil/runner.c b/libmpathutil/runner.c > > new file mode 100644 > > index 0000000..f5c5482 > > --- /dev/null > > +++ b/libmpathutil/runner.c [...] > > +static void *runner_thread(void *arg) > > +{ > > + int st, refcount; > > + /* > > + * The cleanup function makes sure memory is freed if the > > thread is > > + * cancelled (-fexceptions). > > + */ > > + struct runner_context *rctx > > __attribute__((cleanup(cleanup_context))) = arg; > > + > > + refcount = uatomic_add_return(&rctx->refcount, 1); > > + assert(refcount == 2); > > + > > + st = uatomic_cmpxchg(&rctx->status, RUNNER_IDLE, > > RUNNER_RUNNING); > > + /* > > + * The thread has already been cancelled before it was > > even started. > > + * In this case, cancel_runner() doesn't release the > > context. > > + * Do it here. See comments for RUNNER_IDLE in > > cancel_runner(). > > + */ > > I'm confused about this, and AFAICS, it doesn't appear to be true. If > you call check_runner() and it returns RUNNER_CANCELLED, you have no > idea if the runner was previously in RUNNER_IDLE, which means that > you > cannot touch the context anymore, or if the runner was in > RUNNER_RUNNING, in which case it's your job to release the context. > In the TUR checker, check_runner_state() just calls release_runner(), > which will cause a UAF error if the runner was in RUNNER_IDLE before > it got cancelled.
Right, good catch, thanks for finding it. This was a remnant of the previous code, where the caller didn't have to release the context after RUNNER_CANCELLED. Releasing the context here is just wrong, I'll drop the line. I had expected my tests to capture an UAF like this, but I made a dumb mistake: I used a timeout of 0 in my test code, which means "infinite" timeout for the runner, and thus my first test case, which was intended to capture thus this kind of issue, was ineffective. I will change my test program now to use 1us timeout if "-t 0" is specified. I will also add a compile-time option to artificially delay thread startup in order to test this. > It seems to me that the correct way to do this would be to set > rctx->refcount to 2 before creating the runner thread, and not even > have > a RUNNER_IDLE state. Correct, too. I was totally naïve to rely on the thread itself increasing the refcount if I allow the thread to be cancelled before it even starts :-/ However, I think I'll keep RUNNER_IDLE state around. It doesn't hurt and it provides possible insight into whether or not the runner function was actually called. > Perhaps I'm just being stupid, but I don't see the > necessity. If you set a thread to deferred cancellation, I don't > believe > it can get cancelled between when you create it, and when it hits > it's > first cancellation point, so you shouldn't have to worry about > getting > cancelled before the runner's cleanup_context handler isset up. I'm > pretty sure that if the thread gets created successfully, it should > always drop its reference count when it exits, regardless of if/when > it > gets cancelled. And if you fail to create the thread, you can just > drop > the refcount back to 1 before calling release_context() in > get_runner(). I believe that it's possible for pthread_create() to succeed and the program to terminate before the thread is started. In which case the thread's cleanup code won't be set up, and the context won't be freed. But we explicitly don't want to wait for these threads, so I think that's a cosmetic problem in a corner case that we don't need to bother with. Thanks, Martin
