Hi Peter,

Thank you for the review!

On Wed, 5 Nov 2025 at 14:53, Peter Maydell <[email protected]> wrote:
>
> On Tue, 4 Nov 2025 at 22:08, Balint Reczey via <[email protected]> wrote:
> >
> > Add a libc-backed path for safe_syscall() that makes syscalls via
> > libc's syscall(). This enables interposing syscalls via LD_PRELOAD when
> > running static guest binaries under a dynamically linked qemu-user.
> >
> > The assembly implementation (safe_syscall_base()) remains the default.
> > A runtime switch or a set environment variable changes the behavior:
> >
> > Command line: -libc-syscall
> > Environment: QEMU_LIBC_SYSCALL
>
> > +/*
> > + * libc-backed implementation: Make a system call via libc's syscall()
> > + * if no guest signal is pending.
> > + */
> > +long safe_syscall_libc(int *pending, long number, ...)
> > +{
> > +    va_list ap;
> > +    long arg1, arg2, arg3, arg4, arg5, arg6;
> > +    long ret;
> > +
> > +    /* Check if a guest signal is pending */
> > +    if (qatomic_read(pending)) {
> > +        errno = QEMU_ERESTARTSYS;
> > +        return -1;
> > +    }
>
> We check for a pending signal here...
>
> > +
> > +    va_start(ap, number);
> > +    /* Extract up to 6 syscall arguments */
> > +    arg1 = va_arg(ap, long);
> > +    arg2 = va_arg(ap, long);
> > +    arg3 = va_arg(ap, long);
> > +    arg4 = va_arg(ap, long);
> > +    arg5 = va_arg(ap, long);
> > +    arg6 = va_arg(ap, long);
> > +    va_end(ap);
>
> ...but if a signal arrives after we checked but somewhere in here
> before we actually make the host syscall, then we may incorrectly
> block in the syscall.
>
> > +
> > +    /* Make the actual system call using libc's syscall() */
> > +    ret = syscall(number, arg1, arg2, arg3, arg4, arg5, arg6);
>
> This is the race condition which is the reason why safe_syscall
> is implemented in assembly: we need to be able to control exactly
> which code we're in so that the signal handler can adjust the PC
> if it sees that we were attempting to do a syscall when the
> signal arrived. (There's a longer explanation of this in a comment
> in include/user/safe-syscall.h.)

Yes, indeed. I moved the pending check to right before the syscall()
call, but that only narrows the race window. I don't think it can be
closed from C the way the assembly implementation closes it.
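
To make the remaining window concrete, here is a minimal sketch of a
late check followed by libc's syscall(). It is not the code from the
patch: sketch_checked_syscall() and SKETCH_ERESTARTSYS are names made up
for this illustration (the constant stands in for QEMU_ERESTARTSYS), and
C11 atomic_load() is used where QEMU would use qatomic_read().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Placeholder for this sketch only; stands in for QEMU_ERESTARTSYS. */
#define SKETCH_ERESTARTSYS 512

/*
 * Hypothetical helper: check the pending flag as late as possible,
 * then make the system call through libc's syscall().
 */
long sketch_checked_syscall(atomic_int *pending, long number,
                            long a1, long a2, long a3,
                            long a4, long a5, long a6)
{
    assert(pending != NULL);

    /* A guest signal is already pending: ask the caller to restart. */
    if (atomic_load(pending)) {
        errno = SKETCH_ERESTARTSYS;
        return -1;
    }

    /*
     * Race window: a signal delivered between the load above and the
     * kernel entry inside syscall() sets *pending too late to be seen
     * here, so the call below may still block. The handler cannot tell
     * that we were about to enter the kernel, because that point lies
     * inside libc and not at a known instruction address.
     */
    return syscall(number, a1, a2, a3, a4, a5, a6);
}
```

With the flag set, the helper returns -1 and sets errno without
entering the kernel; with the flag clear, a call such as
sketch_checked_syscall(&pending, SYS_getpid, 0, 0, 0, 0, 0, 0) behaves
like a plain syscall(). Only the window in between differs from the
assembly version.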

> Not getting this right results in various hangs and
> other misbehaviour when a signal arrives to the guest
> program at the wrong moment. We don't want to regress
> that behaviour. Any proposal for having QEMU call syscall()
> needs to avoid reintroducing the races.

Yes, this is why the less safe behavior is opt-in via a run-time
switch, but I understand that this may still not be acceptable for
the project.
It works beautifully, though, for wrapping statically linked tools in
builds, so for anyone interested the patch will be maintained at:
https://github.com/firebuild/qemu/tree/safe-syscalls-via-libc

Cheers,
Balint

> thanks
> -- PMM
