> On Jan 4, 2021, at 2:36 PM, David Laight <david.lai...@aculab.com> wrote:
>
> From: Eric W. Biederman
>> Sent: 04 January 2021 20:41
>>
>> Al Viro <v...@zeniv.linux.org.uk> writes:
>>
>>> On Mon, Jan 04, 2021 at 12:16:56PM +0000, David Laight wrote:
>>>> On x86 in_compat_syscall() is defined as:
>>>> in_ia32_syscall() || in_x32_syscall()
>>>>
>>>> Now in_ia32_syscall() is a simple check of the TS_COMPAT flag.
>>>> However in_x32_syscall() is a horrid beast that has to indirect
>>>> through to the original %eax value (ie the syscall number) and
>>>> check for a bit there.
>>>>
>>>> So on a kernel with x32 support (probably most distro kernels)
>>>> the in_compat_syscall() check is rather more expensive than
>>>> one might expect.
>>
>> I suggest you check the distro kernels. I suspect they don't compile in
>> support for x32. As far as I can tell x32 is an undead beast of a
>> subarchitecture that just enough people use that it can't be removed,
>> but few enough people use it likely has a few lurking scary bugs.
>
> It is defined in the Ubuntu kernel configs I've got lurking:
> Both 3.8.0-19_generic (Ubuntu 13.04) and 5.4.0-56_generic (probably 20.04).
> Which is probably why it is in my test builds (I've just cut out
> a lot of modules).
>
>>>> It would be much better if both checks could be done together.
>>>> I think this would require the syscall entry code to set a
>>>> value in both the 64bit and x32 entry paths.
>>>> (Can a process make both 64bit and x32 system calls?)
>>>
>>> Yes, it bloody well can.
>>>
>>> And I see no benefit in pushing that logics into syscall entry,
>>> since anything that calls in_compat_syscall() more than once
>>> per syscall execution is doing the wrong thing. Moreover,
>>> in quite a few cases we don't call the sucker at all, and for
>>> all of those pushing that crap into syscall entry logics is
>>> pure loss.
>>
>> The x32 system calls have their own system call table and it would be
>> trivial to set a flag like TS_COMPAT when looking up a system call from
>> that table. I expect such a change would be purely in the noise.
>
> Certainly a write of 0/1/2 into a dirtied cache line of 'current'
> could easily cost absolutely nothing.
> Especially if current has already been read.
>
> I also wondered about resetting it to zero when an x32 system call
> exits (rather than entry to a 64bit one).
>
> For ia32 the flag is set (with |=) on every syscall entry.
> Even though I'm pretty sure it can only change during exec.

It can change for every syscall. I have tests that do this.

>
>>> What's the point, really?
>>
>> Before we came up with the current games with __copy_siginfo_to_user
>> and x32_copy_siginfo_to_user I was wondering if we should make such
>> a change. The delivery of compat signal frames and core dumps which
>> do not go through the system call entry path could almost benefit from
>> a flag that could be set/tested when on those paths.
>
> For signal delivery it should (probably) depend on the system call
> that set up the signal handler.

I think it has worked this way for some time now.

> Although I'm sure I remember one kernel where some of it was done
> in libc (with a single entrypoint for all handlers).
>
>> The fact that only SIGCHLD (which can not trigger a coredump) is
>> different saves the coredump code from needing such a test.
>>
>> The fact that the signal frame code is simple enough it can directly
>> call x32_copy_siginfo_to_user or __copy_siginfo_to_user saves us there.
>>
>> So I don't think we have any cases where we actually need a flag that
>> is independent of the system call but we have come very close.
>
> If a program can do both 64bit and x32 system calls you probably
> need to generate a 64bit core dump if it has ever made a 64bit
> system call??

I think core dump should (and does) depend on the execution mode at the time of the crash.

It’s worth noting that GCC’s understanding of mixed bitness is horrible.
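
For anyone following along, here is a rough userspace model of the two styles of check discussed in this thread. It is a sketch, not kernel code. The two constants are meant to mirror the x86 headers, but struct model_task, its mode field, and every model_* function are made up for illustration. Clearing TS_COMPAT on 64-bit entry is also a simplification of what the real entry/exit code does.

```c
#include <assert.h>
#include <stdbool.h>

/* These two constants mirror the x86 kernel headers; everything else is a toy. */
#define TS_COMPAT        0x0002u      /* 32-bit (ia32) syscall active */
#define X32_SYSCALL_BIT  0x40000000u  /* bit 30 of the syscall number marks x32 */

/* Hypothetical per-syscall mode value, the "0/1/2" mentioned in the thread. */
enum model_mode { MODE_64 = 0, MODE_IA32 = 1, MODE_X32 = 2 };

/* Toy stand-in for the pieces of task state the checks look at. */
struct model_task {
    unsigned int status;    /* stands in for the thread status flags */
    unsigned long orig_ax;  /* stands in for the saved syscall number */
    enum model_mode mode;   /* hypothetical field written at syscall entry */
};

/* Model of syscall entry: record what both styles of check need.
 * Simplification: the flag is cleared here on non-ia32 entry. */
void model_syscall_entry(struct model_task *t, unsigned long nr, bool via_ia32)
{
    t->orig_ax = nr;
    if (via_ia32) {
        t->status |= TS_COMPAT;
        t->mode = MODE_IA32;
    } else {
        t->status &= ~TS_COMPAT;
        t->mode = (nr & X32_SYSCALL_BIT) ? MODE_X32 : MODE_64;
    }
}

/* Current style: one flag test, plus a look at the saved syscall number. */
bool model_in_compat_syscall(const struct model_task *t)
{
    bool ia32 = (t->status & TS_COMPAT) != 0;
    bool x32 = (t->orig_ax & X32_SYSCALL_BIT) != 0;
    return ia32 || x32;
}

/* Proposed style: a single comparison against the recorded mode. */
bool model_in_compat_syscall_by_mode(const struct model_task *t)
{
    bool compat = t->mode != MODE_64;
    assert(compat == model_in_compat_syscall(t)); /* both styles must agree */
    return compat;
}
```

Calling model_syscall_entry() with a plain 64-bit number, then an x32 number, then a plain one again on the same task flips the result on each call. That is the "it can change for every syscall" case.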