Hi Ben,

On Mon, 2026-04-27 at 20:20 -0400, Benjamin Marzinski wrote:
> On Thu, Apr 23, 2026 at 10:44:43PM +0200, Martin Wilck wrote:
> > This code adds a generic, abstract implementation of the kind of
> > threads we
> > need for checkers: detached threads that may time out. The design
> > is such
> > that these threads never access any memory of the calling program,
> > simplifying the management of lifetimes of objects.
> > 
> > See the documentation of the API in "runner.h".
> > 
> > Signed-off-by: Martin Wilck <[email protected]>
> > ---
> >  libmpathutil/Makefile             |   2 +-
> >  libmpathutil/libmpathutil.version |   7 +-
> >  libmpathutil/runner.c             | 221
> > ++++++++++++++++++++++++++++++
> >  libmpathutil/runner.h             | 102 ++++++++++++++
> >  4 files changed, 330 insertions(+), 2 deletions(-)
> >  create mode 100644 libmpathutil/runner.c
> >  create mode 100644 libmpathutil/runner.h
> > 
> > 
[...]
> > diff --git a/libmpathutil/runner.c b/libmpathutil/runner.c
> > new file mode 100644
> > index 0000000..f5c5482
> > --- /dev/null
> > +++ b/libmpathutil/runner.c
[...]
> > +static void *runner_thread(void *arg)
> > +{
> > +   int st, refcount;
> > +   /*
> > +    * The cleanup function makes sure memory is freed if the
> > thread is
> > +    * cancelled (-fexceptions).
> > +    */
> > +   struct runner_context *rctx
> > __attribute__((cleanup(cleanup_context))) = arg;
> > +
> > +   refcount = uatomic_add_return(&rctx->refcount, 1);
> > +   assert(refcount == 2);
> > +
> > +   st = uatomic_cmpxchg(&rctx->status, RUNNER_IDLE,
> > RUNNER_RUNNING);
> > +   /*
> > +    * The thread has already been cancelled before it was
> > even started.
> > +    * In this case, cancel_runner() doesn't release the
> > context.
> > +    * Do it here. See comments for RUNNER_IDLE in
> > cancel_runner().
> > +    */
> 
> I'm confused about this, and AFAICS, it doesn't appear to be true. If
> you call check_runner() and it returns RUNNER_CANCELLED, you have no
> idea if the runner was previously in RUNNER_IDLE, which means that
> you
> cannot touch the context anymore, or if the runner was in
> RUNNER_RUNNING, in which case it's your job to release the context.
> In the TUR checker, check_runner_state() just calls release_runner(),
> which will cause a UAF error if the runner was in RUNNER_IDLE before
> it got cancelled.

Right, good catch, thanks for finding it. This was a remnant of the
previous code, where the caller didn't have to release the context
after RUNNER_CANCELLED. 

Releasing the context here is just wrong, I'll drop the line.

I had expected my tests to capture an UAF like this, but I made a dumb
mistake: I used a timeout of 0 in my test code, which means "infinite"
timeout for the runner, and thus my first test case, which was intended
to capture thus this kind of issue, was ineffective. I will change my
test program now to use 1us timeout if "-t 0" is specified. I will also
add a compile-time option to artificially delay thread startup in order
to test this.

> It seems to me that the correct way to do this would be to set
> rctx->refcount to 2 before creating the runner thread, and not even
> have
> a RUNNER_IDLE state. 

Correct, too. I was totally naïve to rely on the thread itself
increasing the refcount if I allow the thread to be cancelled before
it even starts :-/

However, I think I'll keep RUNNER_IDLE state around. It doesn't hurt
and it provides possible insight into whether or not the runner
function was actually called.

> Perhaps I'm just being stupid, but I don't see the
> necessity. If you set a thread to deferred cancellation, I don't
> believe
> it can get cancelled between when you create it, and when it hits
> it's
> first cancellation point, so you shouldn't have to worry about
> getting
> cancelled before the runner's cleanup_context handler isset up. I'm
> pretty sure that if the thread gets created successfully, it should
> always drop its reference count when it exits, regardless of if/when
> it
> gets cancelled. And if you fail to create the thread, you can just
> drop
> the refcount back to 1 before calling release_context() in
> get_runner().

I believe that it's possible for pthread_create() to succeed and the
program to terminate before the thread is started. In which case the
thread's cleanup code won't be set up, and the context won't be freed.
But we explicitly don't want to wait for these threads, so I think
that's a cosmetic problem in a corner case that we don't need to bother
with.

Thanks,
Martin

Reply via email to