Cleaning up numbering for new x86 syscalls?
Hi all-

We currently have some giant turds in the way that syscalls are
numbered. We have the x86_32 table, which is totally sane other than
some legacy multiplexers. Then we have the x86_64 table, which is,
um, demented:

- The numbers don't match x86_32. I have no idea why.

- We use bit 30, which triggers in_x32_syscall(). It should have
been bit 31, but I digress.

- We have this weird set of extra x32 syscalls that start at 512.
Who wants to bet whether we have no bugs if someone does syscall with,
say, nr == 512 (i.e. not 512 | BIT(30)) or nr == (16 | BIT(30))? The
latter would be non-compat ioctl with in_x32_syscall() set and hence
in_compat_syscall() set.

- Bloody restart_syscall() has a different number on x86_64 and
x86_32, which is a big mess.

I propose we consider some subset of the following:

1. Introduce restart_syscall_2(). Make its number be 1024. Maybe
someday we could start using it instead of restart_syscall(). The
only issue I can see is programs that allow restart_syscall() using
seccomp but don't allow the new variant.

2. Introduce an outright ban on new syscalls with nr < 1024.

3. Introduce an outright ban on the addition of new __x32_compat
syscalls. If new compat hacks are needed, they can use
in_compat_syscall(), thank you very much.

4. Modify the wrappers of the __x32_compat entries so that they will
return -ENOSYS if in_x32_syscall() returns false.

5. Adjust the scripts so that we only have to wire up new syscalls
once. They'll have a nr above 1024, and they'll have the same nr on
all x86 variants.

Thoughts?
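The two suspicious numbers mentioned in this message can be decoded with a small user-space sketch. It assumes only what the message states (bit 30 is the x32 marker, and the x32-specific entries start at 512). The helper names are hypothetical and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 30 marks an x32 syscall, as described in the message. */
#define X32_SYSCALL_BIT (UINT32_C(1) << 30)

/* Hypothetical helper: is the x32 marker bit set in this syscall number? */
static bool nr_has_x32_bit(uint32_t nr)
{
    return (nr & X32_SYSCALL_BIT) != 0;
}

/* Hypothetical helper: the table index left after masking off the marker. */
static uint32_t nr_table_index(uint32_t nr)
{
    return nr & ~X32_SYSCALL_BIT;
}

/* Hypothetical helper: does the index fall in the x32-specific block (512 and up)? */
static bool index_in_x32_block(uint32_t idx)
{
    return idx >= 512;
}
```

With these helpers, nr == 512 decodes to index 512 in the x32 block but without the x32 bit, and nr == (16 | BIT(30)) decodes to index 16 outside the x32 block but with the bit set. Those are exactly the two combinations the message asks about.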
Re: Cleaning up numbering for new x86 syscalls?
* Andy Lutomirski wrote:

> Hi all-
>
> We currently have some giant turds in the way that syscalls are
> numbered. We have the x86_32 table, which is totally sane other than
> some legacy multiplexers. Then we have the x86_64 table, which is,
> um, demented:
>
> - The numbers don't match x86_32. I have no idea why.
>
> - We use bit 30, which triggers in_x32_syscall(). It should have
> been bit 31, but I digress.
>
> - We have this weird set of extra x32 syscalls that start at 512.
> Who wants to bet whether we have no bugs if someone does syscall with,
> say, nr == 512 (i.e. not 512 | BIT(30)) or nr == (16 | BIT(30))? The
> latter would be non-compat ioctl with in_x32_syscall() set and hence
> in_compat_syscall() set.
>
> - Bloody restart_syscall() has a different number on x86_64 and
> x86_32, which is a big mess.
>
> I propose we consider some subset of the following:
>
> 1. Introduce restart_syscall_2(). Make its number be 1024. Maybe
> someday we could start using it instead of restart_syscall(). The
> only issue I can see is programs that allow restart_syscall() using
> seccomp but don't allow the new variant.
>
> 2. Introduce an outright ban on new syscalls with nr < 1024.

Also let's make sure it results in a build error or boot panic if someone tries.

> 3. Introduce an outright ban on the addition of new __x32_compat
> syscalls. If new compat hacks are needed, they can use
> in_compat_syscall(), thank you very much.

Here too build-time and runtime enforcement would be nice.

> 4. Modify the wrappers of the __x32_compat entries so that they will
> return -ENOSYS if in_x32_syscall() returns false.
>
> 5. Adjust the scripts so that we only have to wire up new syscalls
> once. They'll have a nr above 1024, and they'll have the same nr on
> all x86 variants.
>
> Thoughts?

Fully agreed:

6. Is x32 even used in practice? I still think it was a mistake to add it, and some significant distributions like Fedora are not enabling it.

Barring any sane way to phase out x32 support I'd suggest we implement all your suggestions.

Thanks,

Ingo
Re: Cleaning up numbering for new x86 syscalls?
* Andy Lutomirski:

> 5. Adjust the scripts so that we only have to wire up new syscalls
> once. They'll have a nr above 1024, and they'll have the same nr on
> all x86 variants.

Is there a sufficiently sized gap on all other architectures as well? The restriction to the x86 variants seems arbitrary to me.

Thanks,
Florian
Re: Cleaning up numbering for new x86 syscalls?
On Tue, Nov 20, 2018 at 1:03 AM Florian Weimer wrote:
>
> * Andy Lutomirski:
>
> > 5. Adjust the scripts so that we only have to wire up new syscalls
> > once. They'll have a nr above 1024, and they'll have the same nr on
> > all x86 variants.
>
> Is there a sufficiently sized gap on all other architectures as well?
> The restriction to the x86 variants seems arbitrary to me.
>

Fair point. We have this shiny "generic" syscall list. Maybe we can get x86 synced up with it for new syscalls.
Re: Cleaning up numbering for new x86 syscalls?
On Mon, Nov 19, 2018 at 04:22:49PM -0800, Andy Lutomirski wrote:
> Hi all-
>
> We currently have some giant turds in the way that syscalls are
> numbered. We have the x86_32 table, which is totally sane other than
> some legacy multiplexers. Then we have the x86_64 table, which is,
> um, demented:
>
> - The numbers don't match x86_32. I have no idea why.
>
> - We use bit 30, which triggers in_x32_syscall(). It should have
> been bit 31, but I digress.
>
> - We have this weird set of extra x32 syscalls that start at 512.
> Who wants to bet whether we have no bugs if someone does syscall with,
> say, nr == 512 (i.e. not 512 | BIT(30)) or nr == (16 | BIT(30))? The
> latter would be non-compat ioctl with in_x32_syscall() set and hence
> in_compat_syscall() set.
>
> - Bloody restart_syscall() has a different number on x86_64 and
> x86_32, which is a big mess.
>
> I propose we consider some subset of the following:
>
> 1. Introduce restart_syscall_2(). Make its number be 1024. Maybe
> someday we could start using it instead of restart_syscall(). The
> only issue I can see is programs that allow restart_syscall() using
> seccomp but don't allow the new variant.
>
> 2. Introduce an outright ban on new syscalls with nr < 1024.
>
> 3. Introduce an outright ban on the addition of new __x32_compat
> syscalls. If new compat hacks are needed, they can use
> in_compat_syscall(), thank you very much.
>
> 4. Modify the wrappers of the __x32_compat entries so that they will
> return -ENOSYS if in_x32_syscall() returns false.

This sounds like a great idea independent of all of this.

> 5. Adjust the scripts so that we only have to wire up new syscalls
> once. They'll have a nr above 1024, and they'll have the same nr on
> all x86 variants.
>
> Thoughts?

+1. Who wants to do it? :D

Tycho
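Point 4 of the proposal can be illustrated with a user-space sketch of the guard. This is not the kernel's wrapper code: the function pointer type, the handler, and the explicit boolean standing in for in_x32_syscall() are all hypothetical, chosen only to show the control flow being proposed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical handler signature, simplified to a single argument. */
typedef long (*syscall_fn)(long arg);

/* Hypothetical x32-only handler body, used only for demonstration. */
static long demo_x32_handler(long arg)
{
    return arg + 1;
}

/*
 * Sketch of the proposed guard: an x32-only entry refuses to run unless
 * the call really came in with the x32 bit set. The boolean parameter
 * stands in for the kernel's in_x32_syscall().
 */
static long x32_only_wrapper(bool in_x32_syscall, syscall_fn fn, long arg)
{
    if (!in_x32_syscall)
        return -ENOSYS;
    return fn(arg);
}
```

Under this sketch, a call that reaches an x32-only slot without the x32 bit gets -ENOSYS instead of running a handler that assumes the x32 ABI.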
Re: Cleaning up numbering for new x86 syscalls?
On Tue, Nov 20, 2018 at 07:23:09AM -0800, Andy Lutomirski wrote:
> On Tue, Nov 20, 2018 at 1:03 AM Florian Weimer wrote:
> >
> > * Andy Lutomirski:
> >
> > > 5. Adjust the scripts so that we only have to wire up new syscalls
> > > once. They'll have a nr above 1024, and they'll have the same nr on
> > > all x86 variants.
> >
> > Is there a sufficiently sized gap on all other architectures as well?
> > The restriction to the x86 variants seems arbitrary to me.
> >
>
> Fair point. We have this shiny "generic" syscall list. Maybe we can
> get x86 synced up with it for new syscalls.

I heard this discussed at Plumbers. There was a proposal to use the same syscall numbers across architectures. Also, when adding new generic syscalls, they want all arches to be wired up at the same time.

https://linuxplumbersconf.org/event/2/contributions/149/attachments/129/161/Ideas_to_improve_glibc_and_Kernel_interaction.pdf

Adding Adhemerval to CC.

--
Josh
Re: Cleaning up numbering for new x86 syscalls?
On 20/11/2018 08:33, Ingo Molnar wrote:
[...]
> 6. Is x32 even used in practice? I still think it was a mistake to add it
> and some significant distributions like Fedora are not enabling it.

x32 works as far as gcc/gas/ld is concerned (at least for compiling non-trivial programs). Finding a distribution that actually *delivers* x32 libraries is another thing (and said non-trivial software currently uses e.g. libxml2) - at least I can't find an "x32-Ubuntu". And no, I don't see a compelling reason to (try to) build an (n+1)th architecture for the major distributions.

And yes, lots of stuff will not compile out of the box (especially if one uses a somewhat sane set of gcc options - not only -Wall -Wextra -Werror), but if one gets software to compile for i386 and x86_64, getting it to compile for x32 is a Friday afternoon job (more or less).

And yes, there are enough hardware/systems out there that use 64bit CPUs (for whatever reason - if only that one can't get a 32bit CPU for that board) but will never ever need more than 2-3 GB RAM.

Kind regards,
Bernd
--
Bernd Petrovitsch
Email : be...@petrovitsch.priv.at
LUGA : http://www.luga.at
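For readers who want to check which of the three x86 ABIs a given build targets, a minimal sketch follows. It relies on the predefined macros __x86_64__, __ILP32__ and __i386__ as provided by GCC and Clang. The enum and function names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical labels for the ABIs discussed in this thread. */
enum x86_abi { ABI_OTHER, ABI_I386, ABI_X32, ABI_X86_64 };

/* x32 builds define __x86_64__ together with __ILP32__ (32-bit long and pointers). */
static enum x86_abi detect_abi(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return ABI_X32;
#elif defined(__x86_64__)
    return ABI_X86_64;
#elif defined(__i386__)
    return ABI_I386;
#else
    return ABI_OTHER;
#endif
}

/* Pointer width of the current build, in bits. */
static size_t pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}
```

Building the same source with gcc -m32, -mx32 and -m64 (where the toolchain and libraries for each are installed) should select the i386, x32 and x86_64 branches respectively.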
Re: Cleaning up numbering for new x86 syscalls?
On Tue, Nov 20, 2018 at 1:25 AM Andy Lutomirski wrote:
>
> Hi all-
>
> We currently have some giant turds in the way that syscalls are
> numbered. We have the x86_32 table, which is totally sane other than
> some legacy multiplexers. Then we have the x86_64 table, which is,
> um, demented:
>
> - The numbers don't match x86_32. I have no idea why.

I think it was an early attempt at cleaning up the table, and only adding those that were still used. Back in the day, each architecture had its own table, and of course they started out as separate top-level architectures.

> - We use bit 30, which triggers in_x32_syscall(). It should have
> been bit 31, but I digress.
>
> - We have this weird set of extra x32 syscalls that start at 512.
> Who wants to bet whether we have no bugs if someone does syscall with,
> say, nr == 512 (i.e. not 512 | BIT(30)) or nr == (16 | BIT(30))? The
> latter would be non-compat ioctl with in_x32_syscall() set and hence
> in_compat_syscall() set.

The comment in the table says it's purely for keeping the calls in separate cache lines. I don't know if the cache lines make a difference in the end, but it seems that once we start running into the x32 syscall numbers, we just treat them like any others; we just choose to never call them from a 64-bit glibc.

> I propose we consider some subset of the following:
>
> 1. Introduce restart_syscall_2(). Make its number be 1024. Maybe
> someday we could start using it instead of restart_syscall(). The
> only issue I can see is programs that allow restart_syscall() using
> seccomp but don't allow the new variant.
>
> 2. Introduce an outright ban on new syscalls with nr < 1024.

This would leave a hole of several hundred numbers if we do it for all architectures. Wasting multiple kilobytes for a cosmetic cleanup might be considered excessive.

> 3. Introduce an outright ban on the addition of new __x32_compat
> syscalls. If new compat hacks are needed, they can use
> in_compat_syscall(), thank you very much.

I would definitely want to keep anything regarding x32 out of the common syscall implementation. If you want to add on to that pile, please do it in arch/x86, not in kernel/ or fs/.

If we decide that x32 is a failed experiment and we don't keep it working in the future, let's just kill it off right away. I'm fairly sure nobody depends on it for anything real; the only users I could find are either showing off benchmark results or playing around with it for fun. Most of that fun part apparently ended many years ago, but there is still some work going into debian/x32. We probably need to coordinate with them and see if they know of actual users before removing it. Popcon lists 5 active users [1] and a sharp downward trend.

> 4. Modify the wrappers of the __x32_compat entries so that they will
> return -ENOSYS if in_x32_syscall() returns false.

No objection here, but what would that help?

> 5. Adjust the scripts so that we only have to wire up new syscalls
> once. They'll have a nr above 1024, and they'll have the same nr on
> all x86 variants.
>
> Thoughts?

I would definitely welcome assigning the same syscall numbers across all architectures. It is a needless burden for the libc developers to figure out for each syscall which kernel is known to support it. When a call gets added, they typically add logic to check for the system call at runtime, but for older syscalls, it helps to know when all architectures support it once the minimum kernel version for a libc has been raised beyond that.

Please see also the work that Firoz Khan has been posting for generalizing the tables on all architectures to use the format we have on x86, arm and s390. I hope we can merge it all for 4.21, and then build on top of that for generalization and cleanups.

Arnd

[1] https://popcon.debian.org/stat/sub-x32.png
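The "multiple kilobytes" estimate for the hole below 1024 can be reproduced with a back-of-the-envelope sketch. It assumes a flat table with one pointer-sized slot per syscall number. The first free number used in the usage note (400) is a made-up round figure, not taken from any architecture's table.

```c
#include <stddef.h>

/*
 * Bytes occupied by unused slots when new syscalls start at first_new_nr
 * while the table is otherwise full only up to first_free_nr - 1.
 * Assumes one slot of slot_size bytes per syscall number.
 */
static size_t hole_bytes(unsigned first_free_nr, unsigned first_new_nr,
                         size_t slot_size)
{
    if (first_new_nr <= first_free_nr)
        return 0;
    return (size_t)(first_new_nr - first_free_nr) * slot_size;
}
```

For example, hole_bytes(400, 1024, 8) gives 624 slots of 8 bytes, or 4992 bytes, per table on a 64-bit architecture. That is in line with the "several hundred numbers" and "multiple kilobytes" mentioned above.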
Re: Cleaning up numbering for new x86 syscalls?
On Tue, Nov 20, 2018 at 4:35 PM Andy Lutomirski wrote:
>
> On Tue, Nov 20, 2018 at 1:03 AM Florian Weimer wrote:
> >
> > * Andy Lutomirski:
> >
> > > 5. Adjust the scripts so that we only have to wire up new syscalls
> > > once. They'll have a nr above 1024, and they'll have the same nr on
> > > all x86 variants.
> >
> > Is there a sufficiently sized gap on all other architectures as well?
> > The restriction to the x86 variants seems arbitrary to me.
> >
>
> Fair point. We have this shiny "generic" syscall list. Maybe we can
> get x86 synced up with it for new syscalls.

The generic table is already a subset of the x86 tables, so there should be no need to sync up the contents. It's more critical on other architectures that currently lack a number of the syscalls that got added in asm-generic and x86 recently, so I'd like to synchronize these all and add the missing calls to ensure that each architecture has at least all the calls from the asm-generic table.

After that, I would hope to come up with a way to add future numbers to all tables together, either using the same numbers everywhere (plus an offset where necessary, e.g. mips), or even have an include file logic so we only need a single file for future additions.

Note: for y2038, we will have to add around 20 to 25 syscalls to each 32-bit architecture, plus another 10 for those that lack the separate sys_ipc calls.

Arnd
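The "same numbers everywhere, plus an offset where necessary" idea can be sketched as follows. The MIPS bases are the existing per-ABI syscall number bases (4000 for o32, 5000 for n64, 6000 for n32). Everything else, including the enum, the function names and the notion of a shared "common" number, is a hypothetical illustration of the hope expressed above, not an existing kernel mechanism.

```c
/* Hypothetical ABI labels for this illustration. */
enum abi { ABI_X86_64, ABI_MIPS_O32, ABI_MIPS_N64, ABI_MIPS_N32 };

/* Base added to every syscall number on the given ABI. */
static unsigned abi_base(enum abi a)
{
    switch (a) {
    case ABI_MIPS_O32: return 4000;
    case ABI_MIPS_N64: return 5000;
    case ABI_MIPS_N32: return 6000;
    default:           return 0;   /* no offset, e.g. x86_64 */
    }
}

/* Map a shared (hypothetical) common number to the per-ABI number. */
static unsigned arch_nr(enum abi a, unsigned common_nr)
{
    return abi_base(a) + common_nr;
}
```

With such a mapping, a new call assigned the common number 1024 would be 1024 on x86_64 and 5024 on MIPS o32, so a libc would only need to know the common number and the per-ABI base.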
Re: Cleaning up numbering for new x86 syscalls?
On Wed, 21 Nov 2018, Bernd Petrovitsch wrote:

> And yes, lots of stuff will not compile out of the box (especially if
> one uses a somewhat sane set of gcc options - not only -Wall -Wextra
> -Werror) but if one gets software to compile for i386 and x86_64,
> getting it to compile for x32 is a Friday afternoon job (more or less).
> And yes, there is enough hardware/systems out there that uses 64bit CPUs
> (for whatever reason - if only that one can't get a 32bit CPU for that
> board) but will never ever need more than 2-3 GB RAM .

The functionally equivalent 64-bit ILP32 MIPS n32 ABI has been around, supported by Linux and the GNU toolchain, for some 17 years now and people have been using it, so by now any sane piece of software that does not use handcoded assembly should work out of the box for the x86-64 x32 ABI as well.

NB the important advantage of an LP64 ABI over an ILP32 ABI is the ability to mmap(2) files that exceed 4GiB in size (and in reality even smaller ones, as some user VM space is surely needed for other stuff), regardless of how much physical RAM is actually supported or has been installed. And these days even a web browser can easily overrun a 4GiB VM space. :(

FWIW,

Maciej
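The mmap(2) point can be made concrete with a small sketch that checks whether a whole-file mapping could even be expressed in a given address width. The function names are hypothetical, and the check is only an upper bound: as noted above, part of the address space is always needed for other things, so the practical limit is lower.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Could a file of file_size bytes be mapped in one piece into an
 * address space of addr_bits bits? This ignores everything else that
 * competes for address space, so it is an optimistic upper bound.
 */
static bool fits_address_space(uint64_t file_size, unsigned addr_bits)
{
    if (addr_bits >= 64)
        return true;
    return file_size < (UINT64_C(1) << addr_bits);
}

/* The same question for the build this code is compiled as. */
static bool fits_this_build(uint64_t file_size)
{
    return file_size <= SIZE_MAX;
}
```

A 5 GiB file fails the check for a 32-bit address space (ILP32, including x32) and passes it for a 64-bit one (LP64), independent of how much RAM the machine has.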