Re: [RFC][PATCH 1/5] exit: kill struct waitid_info

Christian Brauner Fri, 26 Jul 2019 05:44:01 -0700

On Fri, Jul 26, 2019 at 06:59:39AM -0500, Eric W. Biederman wrote:
> Christian Brauner <[email protected]> writes:
> 
> > On Thu, Jul 25, 2019 at 07:46:50AM -0500, Eric W. Biederman wrote:
> >> Linus Torvalds <[email protected]> writes:
> >> 
> >> > On Wed, Jul 24, 2019 at 7:47 AM Christian Brauner <[email protected]> 
> >> > wrote:
> >> >>
> >> >> The code here uses a struct waitid_info to catch basic information about
> >> >> process exit including the pid, uid, status, and signal that caused the
> >> >> process to exit. This information is then stuffed into a struct siginfo
> >> >> for the waitid() syscall. That seems like an odd thing to do. We can
> >> >> just pass down a siginfo_t struct directly which let's us cleanup and
> >> >> simplify the whole code quite a bit.
> >> >
> >> > Ack. Except I'd like the commit message to explain where this comes
> >> > from instead of that "That seems like an odd thing to do".
> >> >
> >> > The _original_ reason for "struct waitid_info" was that "siginfo_t" is
> >> > huge because of all the insane padding that various architectures do.
> >> >
> >> > So it was introduced by commit 67d7ddded322 ("waitid(2): leave copyout
> >> > of siginfo to syscall itself") very much to avoid stack usage issues.
> >> > And I quote:
> >> >
> >> >     collect the information needed for siginfo into
> >> >     a small structure (waitid_info)
> >> >
> >> > simply because "sigset_t" was big.
> >> >
> >> > But that size came from the explicit "pad it out to 128 bytes for
> >> > future expansion that will never happen", and the kernel using the
> >> > same exact sigset_t that was exposed to user space.
> >> >
> >> > Then in commit 4ce5f9c9e754 ("signal: Use a smaller struct siginfo in
> >> > the kernel") we got rid of the insane padding for in-kernel use,
> >> > exactly because it causes issues like this.
> >> >
> >> > End result: that "struct waitid_info" no longer makes sense. It's not
> >> > appreciably smaller than kernel_siginfo_t is today, but it made sense
> >> > at the time.
> >> 
> >> Apologies.  I meant to reply yesterday but I was preempted by baby
> >> issues.
> >> 
> >> I strongly disagree that this direction makes sense.  The largest
> >> value that I see from struct waitid_info is that it makes it possible to
> >> reason about which values are returned where struct kernel_siginfo does
> >> not.
> >> 
> >> One of the details the existence of struct waitid_info makes clear is
> >> that unlike the related child death path the wait code does not
> >> fillin si_utime and si_stime.  Which is very important to know when you
> >> are dealing with y2038 issues and Arnd Bergmann is.
> >> 
> >> The most egregious example I know of using siginfo wrong is:
> >> 70f1b0d34bdf ("signal/usb: Replace kill_pid_info_as_cred with
> >> kill_pid_usb_asyncio").  But just by moving struct siginfo out of the
> >> program logic and into dedicated little functions that just deal with
> >> the craziness of struct siginfo I have found lots of little bugs.
> >> 
> >> We don't need that kind of invitation to bugs in the wait logic.
> >
> > I don't think it's a strong enough argument for rejecting this change.
> > Suspecting that something might go wrong if we simplify something is a
> > valid call to proceed with caution and be on the lookout for potential
> > regressions so we can react fast. I respect that. But it's not
> > necessarily a good argument to reject a change.
> 
> Except your change was not a simplification.   Your change was
> a substitution to do the work of filling in struct kernel_siginfo in 4
> locations instead of just 2.
> 
> The only simplification came from not using unsafe_put_user.  Which is
> valid but has nothing to do with struct waitid_info.
> 
> > I'm happy to switch from an initializer (which is not even clear is a
> > bug) to using clear_siginfo().
> 
> I just double checked the definitions in signal_types.h and
> uapi/asm-generic/siginfo.h and there is definitely padding on 64bit.
> So yes barring magic compiler plug ins it is a bug.
> 
> > And I'm also going to split this patch out of the P_PIDFD patch but I'm
> > going to send this out again. I haven't heard a sound argument why
> > this
> > patch is worse than what we have right now in there.
> 
> Then I am afraid I have not expressed myself well.
> 
> When I read through this patch I saw.
> - A bug when dealing with struct kernel_siginfo.
> - A substitution from of struct waitid_info to struct kernel_siginfo.
> - An actual simplification in replacing several unsafe_put_user calls
>   with copy_siginfo_to_user.
> - A gratuitous change in change the order of several of the statements.
> - No simplification in the logic of do_wait.


I'm not going to feel bad for trying to improve a piece of code and not
getting it right the first time.
Removal of a custom struct that is only used to copy bits of information
into another struct for which we have a dedicated in-kernel struct to
avoid exactly that seems like a valid use of
s/<custom-struct>/<dedicated-kernel-struct>/ to me; especially since I
needed to touch that file anyway.

> 
> Or in short I saw you did "s/struct waitid_info/struct kernel_siginfo/"
> and introduced bugs.  Further you increased the number of locations that
> we need to be very careful with struct kernel_siginfo from 2 to 4.

If you're concerned about siginfo_t being used in more places than it is
right now, then please put a comment above it that it should be avoided
and that because of architectural nuances clear_signfo() needs to be
used to initialize it correctly.

Christian

Re: [RFC][PATCH 1/5] exit: kill struct waitid_info

Reply via email to