Re: [git pull] uaccess-related bits of vfs.git
On Sun, May 14, 2017 at 08:13:56PM +0200, Ingo Molnar wrote: > I'd say that the CLAC/STAC addition pretty much killed any argument in favor > of > "optimized" __get_user() code, so I'd be very happy to see these interfaces > gone > altogether. You and everybody else - these interfaces suck. If anything, we want paired brackets around a series of accesses instead of a single check in front of it. > So as far as x86 usage goes: > > Acked-by: Ingo MolnarUmm... Could you elaborate the situation with xen/page.h stuff? I don't see any obvious reasons that would guaratee that addresses passed to __get_user() and __put_user() there would match the set_fs() state. It might very well be true, but it's not obvious from that code... BTW, does anybody have a suggestion regarding a test load that would hit wait4/waitid as hard as possible? I've turned sys_wait4/sys_waitid into long kernel_wait4(pid_t upid, int *stat_addr, int options, struct rusage *ru) and static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop, int options, struct rusage *ru) (with struct waitid_info { pid_t pid; uid_t uid; int status; int why; };), so that all copying to userland is done in sys_wait4() and friends. It seems to survive testing without any noticable slowdowns, but that's just LTP and xfstests - and a bug in my earlier version of that was _not_ caught by the LTP side; xfstests caught it... So any extra tests (both for correctness and timing) would be very much appreciated...
Re: [git pull] uaccess-related bits of vfs.git
On Sun, May 14, 2017 at 08:13:56PM +0200, Ingo Molnar wrote: > I'd say that the CLAC/STAC addition pretty much killed any argument in favor > of > "optimized" __get_user() code, so I'd be very happy to see these interfaces > gone > altogether. You and everybody else - these interfaces suck. If anything, we want paired brackets around a series of accesses instead of a single check in front of it. > So as far as x86 usage goes: > > Acked-by: Ingo Molnar Umm... Could you elaborate the situation with xen/page.h stuff? I don't see any obvious reasons that would guaratee that addresses passed to __get_user() and __put_user() there would match the set_fs() state. It might very well be true, but it's not obvious from that code... BTW, does anybody have a suggestion regarding a test load that would hit wait4/waitid as hard as possible? I've turned sys_wait4/sys_waitid into long kernel_wait4(pid_t upid, int *stat_addr, int options, struct rusage *ru) and static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop, int options, struct rusage *ru) (with struct waitid_info { pid_t pid; uid_t uid; int status; int why; };), so that all copying to userland is done in sys_wait4() and friends. It seems to survive testing without any noticable slowdowns, but that's just LTP and xfstests - and a bug in my earlier version of that was _not_ caught by the LTP side; xfstests caught it... So any extra tests (both for correctness and timing) would be very much appreciated...
Re: [git pull] uaccess-related bits of vfs.git
* Linus Torvaldswrote: > On Fri, May 12, 2017 at 11:57 PM, Al Viro wrote: > > > > First, some stats: there's a thousand-odd callers of __get_user(). Out of > > those, about 70% are in arch/, mostly in sigframe-related code. > > Sure. And they can be trivially converted, and none of them should care at > all. > > > IOW, we have > > * most of users in arch/* (heavily dominated by signal-related code, > > both loads and stores). Those need careful massage; maybe unsafe-based > > solution, maybe something else, but it's obviously per-architecture work > > and these paths are sensitive. > > Why are they sensitive? > > Why not just do this: > > git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86 > :^arch/x86/include/asm/uaccess.h > | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g' > > which converts all the x86 uses in one go. > > Anybody who *relies* on not checking the address_limit is so broken as > to be not even funny. And anything that is so performance-sensitive > that anybody can even measure the effect of the above we can convert > later. I'd say that the CLAC/STAC addition pretty much killed any argument in favor of "optimized" __get_user() code, so I'd be very happy to see these interfaces gone altogether. So as far as x86 usage goes: Acked-by: Ingo Molnar Thanks, Ingo
Re: [git pull] uaccess-related bits of vfs.git
* Linus Torvalds wrote: > On Fri, May 12, 2017 at 11:57 PM, Al Viro wrote: > > > > First, some stats: there's a thousand-odd callers of __get_user(). Out of > > those, about 70% are in arch/, mostly in sigframe-related code. > > Sure. And they can be trivially converted, and none of them should care at > all. > > > IOW, we have > > * most of users in arch/* (heavily dominated by signal-related code, > > both loads and stores). Those need careful massage; maybe unsafe-based > > solution, maybe something else, but it's obviously per-architecture work > > and these paths are sensitive. > > Why are they sensitive? > > Why not just do this: > > git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86 > :^arch/x86/include/asm/uaccess.h > | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g' > > which converts all the x86 uses in one go. > > Anybody who *relies* on not checking the address_limit is so broken as > to be not even funny. And anything that is so performance-sensitive > that anybody can even measure the effect of the above we can convert > later. I'd say that the CLAC/STAC addition pretty much killed any argument in favor of "optimized" __get_user() code, so I'd be very happy to see these interfaces gone altogether. So as far as x86 usage goes: Acked-by: Ingo Molnar Thanks, Ingo
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 01:52:29PM -0700, Linus Torvalds wrote: > I wouldn't change the existing "memdup_user()" interface itself, but > if there really are users that can validly pass in a maxbyte value, > why not add a new helper: > > void *memdup_user_limit(userptr, nmember, nsize, maxsize); > > and then have > > #define memdup_user(ptr,size) memdup_user_limit(ptr, size, 1, -1) > > or something. I definitely see a couple of memdup_user() people who do > that "num*size" multiplication by hand, and it's very easy to get > wrong and have an overflow. > > And for a kvmalloc/kvfree() interface, you *definitely* want that > maxsize thing, since it absolutely needs an upper limit. *nod* Speaking of insanities around open-coded memdup_user()... Enjoy: ias_opt = kmalloc(sizeof(struct irda_ias_set), GFP_ATOMIC); if (ias_opt == NULL) { err = -ENOMEM; goto out; } /* Copy query to the driver. */ if (copy_from_user(ias_opt, optval, optlen)) { Can't have it block, sir, has to be GFP_ATOMIC... Whaddya mean, "what if copy_from_user() blocks?" As far as I can see, that came in circa 2.4.6, when local sturct irda_ias_set got switched to dynamic allocation...
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 01:52:29PM -0700, Linus Torvalds wrote: > I wouldn't change the existing "memdup_user()" interface itself, but > if there really are users that can validly pass in a maxbyte value, > why not add a new helper: > > void *memdup_user_limit(userptr, nmember, nsize, maxsize); > > and then have > > #define memdup_user(ptr,size) memdup_user_limit(ptr, size, 1, -1) > > or something. I definitely see a couple of memdup_user() people who do > that "num*size" multiplication by hand, and it's very easy to get > wrong and have an overflow. > > And for a kvmalloc/kvfree() interface, you *definitely* want that > maxsize thing, since it absolutely needs an upper limit. *nod* Speaking of insanities around open-coded memdup_user()... Enjoy: ias_opt = kmalloc(sizeof(struct irda_ias_set), GFP_ATOMIC); if (ias_opt == NULL) { err = -ENOMEM; goto out; } /* Copy query to the driver. */ if (copy_from_user(ias_opt, optval, optlen)) { Can't have it block, sir, has to be GFP_ATOMIC... Whaddya mean, "what if copy_from_user() blocks?" As far as I can see, that came in circa 2.4.6, when local sturct irda_ias_set got switched to dynamic allocation...
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 1:37 PM, Al Virowrote: > > That's a valid point and it might apply to memdup_user() callers out there. > Potential variants: > * add an explicit upper bound on the size and turn that into > memdup_user() (and check that all memdup_user() callers are bounded). > * have memdup_user() itself pass __GFP_NOWARN. > * add kvmemdup_user() that would use kvmalloc() (with its callers > expected to use kvfree()); see who else might benefit from conversion. All of the above sound reasonable. I wouldn't change the existing "memdup_user()" interface itself, but if there really are users that can validly pass in a maxbyte value, why not add a new helper: void *memdup_user_limit(userptr, nmember, nsize, maxsize); and then have #define memdup_user(ptr,size) memdup_user_limit(ptr, size, 1, -1) or something. I definitely see a couple of memdup_user() people who do that "num*size" multiplication by hand, and it's very easy to get wrong and have an overflow. And for a kvmalloc/kvfree() interface, you *definitely* want that maxsize thing, since it absolutely needs an upper limit. Linus
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 1:37 PM, Al Viro wrote: > > That's a valid point and it might apply to memdup_user() callers out there. > Potential variants: > * add an explicit upper bound on the size and turn that into > memdup_user() (and check that all memdup_user() callers are bounded). > * have memdup_user() itself pass __GFP_NOWARN. > * add kvmemdup_user() that would use kvmalloc() (with its callers > expected to use kvfree()); see who else might benefit from conversion. All of the above sound reasonable. I wouldn't change the existing "memdup_user()" interface itself, but if there really are users that can validly pass in a maxbyte value, why not add a new helper: void *memdup_user_limit(userptr, nmember, nsize, maxsize); and then have #define memdup_user(ptr,size) memdup_user_limit(ptr, size, 1, -1) or something. I definitely see a couple of memdup_user() people who do that "num*size" multiplication by hand, and it's very easy to get wrong and have an overflow. And for a kvmalloc/kvfree() interface, you *definitely* want that maxsize thing, since it absolutely needs an upper limit. Linus
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 10:32:31PM +0200, Geert Uytterhoeven wrote: > You better run that one through linux-spi, to avoid conflicts, cfr. > https://patchwork.kernel.org/patch/9714993/ What I'm going to do is never-rebased #for-spi (well, never after -rc1) merged into #work.uaccess and proposed for pull to linux-spi. The local tree is a mess right now - bloody lot of branches, huge unsorted pile, etc. This was from #unsorted, actually - should've picked the one from #for-spi (a couple of brainos fixed in the version there)...
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 10:32:31PM +0200, Geert Uytterhoeven wrote: > You better run that one through linux-spi, to avoid conflicts, cfr. > https://patchwork.kernel.org/patch/9714993/ What I'm going to do is never-rebased #for-spi (well, never after -rc1) merged into #work.uaccess and proposed for pull to linux-spi. The local tree is a mess right now - bloody lot of branches, huge unsorted pile, etc. This was from #unsorted, actually - should've picked the one from #for-spi (a couple of brainos fixed in the version there)...
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote: > From: Linus Torvalds> Date: Tue, 24 Mar 2015 10:42:18 -0700 > > > > So I'd suggest we should just do a wholesale replacement of > > __copy_to/from_user() with the non-underlined cases. Then, we could > > switch insividual ones back - with reasoning of why they matter, and > > with pointers to how it does access_ok() two lines before. > > > > We should probably even consider looking at __get_user/__put_user(). > > Few of them are actually performance-critical. > > Look at that date. It's over two years ago. In the intervening two > years, how many of those conversions have happened? Speaking of killing that kind of crap off: there was a question left from the last cycle that hadn't been sorted out. SCTP does this in a couple of places: /* Check the user passed a healthy pointer. */ if (unlikely(!access_ok(VERIFY_READ, addrs, addrs_size))) return -EFAULT; /* Alloc space for the address array in kernel memory. */ kaddrs = kmalloc(addrs_size, GFP_USER | __GFP_NOWARN); if (unlikely(!kaddrs)) return -ENOMEM; if (__copy_from_user(kaddrs, addrs, addrs_size)) { kfree(kaddrs); return -EFAULT; } instead of memdup_user(). Part of the rationale is pretty weak (access_ok() as sanity check to prevent user-triggerable attempts to allocate too much - it still can trivially trigger 2G, so it's not worth much), part is more interesting. Namely, that whining into the syslog shouldn't be that easy to trigger. That's a valid point and it might apply to memdup_user() callers out there. Potential variants: * add an explicit upper bound on the size and turn that into memdup_user() (and check that all memdup_user() callers are bounded). * have memdup_user() itself pass __GFP_NOWARN. * add kvmemdup_user() that would use kvmalloc() (with its callers expected to use kvfree()); see who else might benefit from conversion. Preferences?
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote: > From: Linus Torvalds > Date: Tue, 24 Mar 2015 10:42:18 -0700 > > > > So I'd suggest we should just do a wholesale replacement of > > __copy_to/from_user() with the non-underlined cases. Then, we could > > switch insividual ones back - with reasoning of why they matter, and > > with pointers to how it does access_ok() two lines before. > > > > We should probably even consider looking at __get_user/__put_user(). > > Few of them are actually performance-critical. > > Look at that date. It's over two years ago. In the intervening two > years, how many of those conversions have happened? Speaking of killing that kind of crap off: there was a question left from the last cycle that hadn't been sorted out. SCTP does this in a couple of places: /* Check the user passed a healthy pointer. */ if (unlikely(!access_ok(VERIFY_READ, addrs, addrs_size))) return -EFAULT; /* Alloc space for the address array in kernel memory. */ kaddrs = kmalloc(addrs_size, GFP_USER | __GFP_NOWARN); if (unlikely(!kaddrs)) return -ENOMEM; if (__copy_from_user(kaddrs, addrs, addrs_size)) { kfree(kaddrs); return -EFAULT; } instead of memdup_user(). Part of the rationale is pretty weak (access_ok() as sanity check to prevent user-triggerable attempts to allocate too much - it still can trivially trigger 2G, so it's not worth much), part is more interesting. Namely, that whining into the syslog shouldn't be that easy to trigger. That's a valid point and it might apply to memdup_user() callers out there. Potential variants: * add an explicit upper bound on the size and turn that into memdup_user() (and check that all memdup_user() callers are bounded). * have memdup_user() itself pass __GFP_NOWARN. * add kvmemdup_user() that would use kvmalloc() (with its callers expected to use kvfree()); see who else might benefit from conversion. Preferences?
Re: [git pull] uaccess-related bits of vfs.git
Hi Al, On Sat, May 13, 2017 at 10:08 PM, Al Virowrote: > On Sat, May 13, 2017 at 08:56:59PM +0100, Al Viro wrote: > >> FWIW, just this cycle (this one I remembered off-hand, there might be >> more): > > And looking through my queue (will be pushed to -next as soon as -rc1 goes > out): > > commit 87fb4c8c103a4cdf17fead4aba58e96940a19a09 > Author: Al Viro > Date: Thu Apr 20 15:47:34 2017 -0400 > > spidev: quit messing with access_ok() > > Signed-off-by: Al Viro > > diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c > index 9e2e099baf8c..8dd22de5e3b5 100644 > --- a/drivers/spi/spidev.c > +++ b/drivers/spi/spidev.c You better run that one through linux-spi, to avoid conflicts, cfr. https://patchwork.kernel.org/patch/9714993/ Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [git pull] uaccess-related bits of vfs.git
Hi Al, On Sat, May 13, 2017 at 10:08 PM, Al Viro wrote: > On Sat, May 13, 2017 at 08:56:59PM +0100, Al Viro wrote: > >> FWIW, just this cycle (this one I remembered off-hand, there might be >> more): > > And looking through my queue (will be pushed to -next as soon as -rc1 goes > out): > > commit 87fb4c8c103a4cdf17fead4aba58e96940a19a09 > Author: Al Viro > Date: Thu Apr 20 15:47:34 2017 -0400 > > spidev: quit messing with access_ok() > > Signed-off-by: Al Viro > > diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c > index 9e2e099baf8c..8dd22de5e3b5 100644 > --- a/drivers/spi/spidev.c > +++ b/drivers/spi/spidev.c You better run that one through linux-spi, to avoid conflicts, cfr. https://patchwork.kernel.org/patch/9714993/ Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 08:56:59PM +0100, Al Viro wrote: > FWIW, just this cycle (this one I remembered off-hand, there might be > more): And looking through my queue (will be pushed to -next as soon as -rc1 goes out): commit 87fb4c8c103a4cdf17fead4aba58e96940a19a09 Author: Al ViroDate: Thu Apr 20 15:47:34 2017 -0400 spidev: quit messing with access_ok() Signed-off-by: Al Viro diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c index 9e2e099baf8c..8dd22de5e3b5 100644 --- a/drivers/spi/spidev.c +++ b/drivers/spi/spidev.c @@ -254,10 +254,6 @@ static int spidev_message(struct spidev_data *spidev, goto done; } k_tmp->rx_buf = rx_buf; - if (!access_ok(VERIFY_WRITE, (u8 __user *) - (uintptr_t) u_tmp->rx_buf, - u_tmp->len)) - goto done; rx_buf += k_tmp->len; } if (u_tmp->tx_buf) { @@ -305,7 +301,7 @@ static int spidev_message(struct spidev_data *spidev, rx_buf = spidev->rx_buffer; for (n = n_xfers, u_tmp = u_xfers; n; n--, u_tmp++) { if (u_tmp->rx_buf) { - if (__copy_to_user((u8 __user *) + if (copy_to_user((u8 __user *) (uintptr_t) u_tmp->rx_buf, rx_buf, u_tmp->len)) { status = -EFAULT; @@ -325,8 +321,7 @@ static struct spi_ioc_transfer * spidev_get_ioc_message(unsigned int cmd, struct spi_ioc_transfer __user *u_ioc, unsigned *n_ioc) { - struct spi_ioc_transfer *ioc; - u32 tmp; + u32 size; /* Check type, command number and direction */ if (_IOC_TYPE(cmd) != SPI_IOC_MAGIC @@ -334,22 +329,15 @@ spidev_get_ioc_message(unsigned int cmd, struct spi_ioc_transfer __user *u_ioc, || _IOC_DIR(cmd) != _IOC_WRITE) return ERR_PTR(-ENOTTY); - tmp = _IOC_SIZE(cmd); + size = _IOC_SIZE(cmd); if ((tmp % sizeof(struct spi_ioc_transfer)) != 0) return ERR_PTR(-EINVAL); - *n_ioc = tmp / sizeof(struct spi_ioc_transfer); + *n_ioc = size / sizeof(struct spi_ioc_transfer); if (*n_ioc == 0) return NULL; /* copy into scratch area */ - ioc = kmalloc(tmp, GFP_KERNEL); - if (!ioc) - return ERR_PTR(-ENOMEM); - if (__copy_from_user(ioc, u_ioc, tmp)) { - kfree(ioc); - return ERR_PTR(-EFAULT); - } - return ioc; + return memdup_user(u_ioc, size); } static long @@ -367,19 +355,6 @@ spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) if (_IOC_TYPE(cmd) != SPI_IOC_MAGIC) return -ENOTTY; - /* Check access direction once here; don't repeat below. -* IOC_DIR is from the user perspective, while access_ok is -* from the kernel perspective; so they look reversed. -*/ - if (_IOC_DIR(cmd) & _IOC_READ) - err = !access_ok(VERIFY_WRITE, - (void __user *)arg, _IOC_SIZE(cmd)); - if (err == 0 && _IOC_DIR(cmd) & _IOC_WRITE) - err = !access_ok(VERIFY_READ, - (void __user *)arg, _IOC_SIZE(cmd)); - if (err) - return -EFAULT; - /* guard against device removal before, or while, * we issue this ioctl. */ @@ -402,31 +377,31 @@ spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) switch (cmd) { /* read requests */ case SPI_IOC_RD_MODE: - retval = __put_user(spi->mode & SPI_MODE_MASK, + retval = put_user(spi->mode & SPI_MODE_MASK, (__u8 __user *)arg); break; case SPI_IOC_RD_MODE32: - retval = __put_user(spi->mode & SPI_MODE_MASK, + retval = put_user(spi->mode & SPI_MODE_MASK, (__u32 __user *)arg); break; case SPI_IOC_RD_LSB_FIRST: - retval = __put_user((spi->mode & SPI_LSB_FIRST) ? 1 : 0, + retval = put_user((spi->mode & SPI_LSB_FIRST) ? 1 : 0, (__u8 __user *)arg); break; case SPI_IOC_RD_BITS_PER_WORD: - retval = __put_user(spi->bits_per_word, (__u8 __user *)arg); + retval = put_user(spi->bits_per_word, (__u8 __user *)arg); break; case SPI_IOC_RD_MAX_SPEED_HZ: - retval = __put_user(spidev->speed_hz, (__u32 __user *)arg); + retval = put_user(spidev->speed_hz, (__u32
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 08:56:59PM +0100, Al Viro wrote: > FWIW, just this cycle (this one I remembered off-hand, there might be > more): And looking through my queue (will be pushed to -next as soon as -rc1 goes out): commit 87fb4c8c103a4cdf17fead4aba58e96940a19a09 Author: Al Viro Date: Thu Apr 20 15:47:34 2017 -0400 spidev: quit messing with access_ok() Signed-off-by: Al Viro diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c index 9e2e099baf8c..8dd22de5e3b5 100644 --- a/drivers/spi/spidev.c +++ b/drivers/spi/spidev.c @@ -254,10 +254,6 @@ static int spidev_message(struct spidev_data *spidev, goto done; } k_tmp->rx_buf = rx_buf; - if (!access_ok(VERIFY_WRITE, (u8 __user *) - (uintptr_t) u_tmp->rx_buf, - u_tmp->len)) - goto done; rx_buf += k_tmp->len; } if (u_tmp->tx_buf) { @@ -305,7 +301,7 @@ static int spidev_message(struct spidev_data *spidev, rx_buf = spidev->rx_buffer; for (n = n_xfers, u_tmp = u_xfers; n; n--, u_tmp++) { if (u_tmp->rx_buf) { - if (__copy_to_user((u8 __user *) + if (copy_to_user((u8 __user *) (uintptr_t) u_tmp->rx_buf, rx_buf, u_tmp->len)) { status = -EFAULT; @@ -325,8 +321,7 @@ static struct spi_ioc_transfer * spidev_get_ioc_message(unsigned int cmd, struct spi_ioc_transfer __user *u_ioc, unsigned *n_ioc) { - struct spi_ioc_transfer *ioc; - u32 tmp; + u32 size; /* Check type, command number and direction */ if (_IOC_TYPE(cmd) != SPI_IOC_MAGIC @@ -334,22 +329,15 @@ spidev_get_ioc_message(unsigned int cmd, struct spi_ioc_transfer __user *u_ioc, || _IOC_DIR(cmd) != _IOC_WRITE) return ERR_PTR(-ENOTTY); - tmp = _IOC_SIZE(cmd); + size = _IOC_SIZE(cmd); if ((tmp % sizeof(struct spi_ioc_transfer)) != 0) return ERR_PTR(-EINVAL); - *n_ioc = tmp / sizeof(struct spi_ioc_transfer); + *n_ioc = size / sizeof(struct spi_ioc_transfer); if (*n_ioc == 0) return NULL; /* copy into scratch area */ - ioc = kmalloc(tmp, GFP_KERNEL); - if (!ioc) - return ERR_PTR(-ENOMEM); - if (__copy_from_user(ioc, u_ioc, tmp)) { - kfree(ioc); - return ERR_PTR(-EFAULT); - } - return ioc; + return memdup_user(u_ioc, size); } static long @@ -367,19 +355,6 @@ spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) if (_IOC_TYPE(cmd) != SPI_IOC_MAGIC) return -ENOTTY; - /* Check access direction once here; don't repeat below. -* IOC_DIR is from the user perspective, while access_ok is -* from the kernel perspective; so they look reversed. -*/ - if (_IOC_DIR(cmd) & _IOC_READ) - err = !access_ok(VERIFY_WRITE, - (void __user *)arg, _IOC_SIZE(cmd)); - if (err == 0 && _IOC_DIR(cmd) & _IOC_WRITE) - err = !access_ok(VERIFY_READ, - (void __user *)arg, _IOC_SIZE(cmd)); - if (err) - return -EFAULT; - /* guard against device removal before, or while, * we issue this ioctl. */ @@ -402,31 +377,31 @@ spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) switch (cmd) { /* read requests */ case SPI_IOC_RD_MODE: - retval = __put_user(spi->mode & SPI_MODE_MASK, + retval = put_user(spi->mode & SPI_MODE_MASK, (__u8 __user *)arg); break; case SPI_IOC_RD_MODE32: - retval = __put_user(spi->mode & SPI_MODE_MASK, + retval = put_user(spi->mode & SPI_MODE_MASK, (__u32 __user *)arg); break; case SPI_IOC_RD_LSB_FIRST: - retval = __put_user((spi->mode & SPI_LSB_FIRST) ? 1 : 0, + retval = put_user((spi->mode & SPI_LSB_FIRST) ? 1 : 0, (__u8 __user *)arg); break; case SPI_IOC_RD_BITS_PER_WORD: - retval = __put_user(spi->bits_per_word, (__u8 __user *)arg); + retval = put_user(spi->bits_per_word, (__u8 __user *)arg); break; case SPI_IOC_RD_MAX_SPEED_HZ: - retval = __put_user(spidev->speed_hz, (__u32 __user *)arg); + retval = put_user(spidev->speed_hz, (__u32 __user *)arg); break; /*
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote: > > We should probably even consider looking at __get_user/__put_user(). > > Few of them are actually performance-critical. > > Look at that date. It's over two years ago. In the intervening two > years, how many of those conversions have happened? > > Here's a hint: it's a very very round number. FWIW, just this cycle (this one I remembered off-hand, there might be more): commit edb88cef0570914375d461107759cf0d6d677ed5 Author: Arnd BergmannDate: Sat Apr 22 00:02:31 2017 +0200 scsi: pmcraid: use normal copy_from_user As pointed out by Al Viro for my previous series, the driver has no need to call access_ok() and __copy_from_user()/__copy_to_user(). Changing it to regular copy_from_user()/copy_to_user() simplifies the code without any real downsides, making it less error-prone at best. This patch by itself also addresses the warning about the access_ok() macro on MIPS, but both fixes improve the code, so ideally we apply them both. Signed-off-by: Arnd Bergmann Reviewed-by: Christoph Hellwig Signed-off-by: Martin K. Petersen
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote: > > We should probably even consider looking at __get_user/__put_user(). > > Few of them are actually performance-critical. > > Look at that date. It's over two years ago. In the intervening two > years, how many of those conversions have happened? > > Here's a hint: it's a very very round number. FWIW, just this cycle (this one I remembered off-hand, there might be more): commit edb88cef0570914375d461107759cf0d6d677ed5 Author: Arnd Bergmann Date: Sat Apr 22 00:02:31 2017 +0200 scsi: pmcraid: use normal copy_from_user As pointed out by Al Viro for my previous series, the driver has no need to call access_ok() and __copy_from_user()/__copy_to_user(). Changing it to regular copy_from_user()/copy_to_user() simplifies the code without any real downsides, making it less error-prone at best. This patch by itself also addresses the warning about the access_ok() macro on MIPS, but both fixes improve the code, so ideally we apply them both. Signed-off-by: Arnd Bergmann Reviewed-by: Christoph Hellwig Signed-off-by: Martin K. Petersen
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 08:11:42PM +0100, Al Viro wrote: > It's not impossible to review; the problem is in figuring out which codepaths > are hot enough to worry about them. And I'm even more convinced that > bulk search-and-replace is the right approach here; we probably _can_ get ^- not apologies for sloppy editing... > rid of that shit this cycle (or, at least, reduce its use to very few places > in arch/*), but that'll require some cooperation from architecture > maintainers.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 08:11:42PM +0100, Al Viro wrote: > It's not impossible to review; the problem is in figuring out which codepaths > are hot enough to worry about them. And I'm even more convinced that > bulk search-and-replace is the right approach here; we probably _can_ get ^- not apologies for sloppy editing... > rid of that shit this cycle (or, at least, reduce its use to very few places > in arch/*), but that'll require some cooperation from architecture > maintainers.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote: > On Sat, May 13, 2017 at 11:04 AM, Al Virowrote: > > > > > > My point is, this stuff needs looking at. > > No. > > You don't have a point. I've tried to explain that there's no > performance difference, and there is no way in hell that the current > "__" versions are better. > > So what's the point in looking? All you are ever going to come up with > is theoretical "this might" cases. Umm... Just during this subthread - a couple of likely breakages in xen (which would need looking into, right?) and "oopsen on alpha would stop actually printing code", which is better to spot early. The first one might be theoretical, but it's worth giving heads-up to xen folks; I'm fairly sure that this spot will break. The second is not theoretical at all - it *will* break.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 12:00:10PM -0700, Linus Torvalds wrote: > On Sat, May 13, 2017 at 11:04 AM, Al Viro wrote: > > > > > > My point is, this stuff needs looking at. > > No. > > You don't have a point. I've tried to explain that there's no > performance difference, and there is no way in hell that the current > "__" versions are better. > > So what's the point in looking? All you are ever going to come up with > is theoretical "this might" cases. Umm... Just during this subthread - a couple of likely breakages in xen (which would need looking into, right?) and "oopsen on alpha would stop actually printing code", which is better to spot early. The first one might be theoretical, but it's worth giving heads-up to xen folks; I'm fairly sure that this spot will break. The second is not theoretical at all - it *will* break.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 07:26:56PM +0100, Al Viro wrote: > BTW, even in arch/* they tend to nest. E.g. arch/alpha has 133 callers > total. Distribution by files: > 35 arch/alpha/kernel/osf_sys.c > 92 arch/alpha/kernel/signal.c > 1 arch/alpha/kernel/traps.c > 4 arch/alpha/lib/csum_partial_copy.c > 1 arch/alpha/mm/fault.c Another large pile is on sparc (208 callers). Except that on sparc64 access_ok() is constant 1, which reduces it to 42 callers. 3 arch/sparc/kernel/ptrace_32.c (all in arch_ptrace()) 31 arch/sparc/kernel/signal_32.c (5 in do_sigreturn(), 6 in do_rt_sigreturn(), 8 in setup_frame(), 11 in setup_rt_frame(), 1 in do_sys_sigstack()) 7 arch/sparc/kernel/sigutil_32.c (2 in save_fpu_state(), 2 in restore_fpu_state(), 2 in save_rwin_state(), 1 in restore_rwin_state() 1 arch/sparc/mm/fault_32.c (in compute_si_addr()) The last one ignores -EFAULT, BTW - not sure what should it have done in that case, though. It's calculation of ->si_addr on SIGSEGV and SIGBUS in case of data access SIGSEGV or SIGBUS; it wants to peek into insn to figure out the effective address. arch_ptrace() one is zeroing several fields in userland struct fps for PTRACE_GETFPREGS. Could've been put_user() (or clear_user(), for that matter); we'd just done copy_regset_to_user() on adjacent addresses, so the lack of access_ok() is not a security hole there). Everything else is in sigframe handling... Other large piles are on mips, ppc and itanic. parisc is next, but there access_ok() is constant 1 as well. Same for s390 and m68k. nios2 and unicore32 are a bit under 80 callers each and the next are tile (~60), sh (~50) and then it drops off fast. It's not impossible to review; the problem is in figuring out which codepaths are hot enough to worry about them. And I'm even more convinced that bulk search-and-replace is the right approach here; we probably _can_ get rid of that shit this cycle (or, at least, reduce its use to very few places in arch/*), but that'll require some cooperation from architecture maintainers.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 07:26:56PM +0100, Al Viro wrote: > BTW, even in arch/* they tend to nest. E.g. arch/alpha has 133 callers > total. Distribution by files: > 35 arch/alpha/kernel/osf_sys.c > 92 arch/alpha/kernel/signal.c > 1 arch/alpha/kernel/traps.c > 4 arch/alpha/lib/csum_partial_copy.c > 1 arch/alpha/mm/fault.c Another large pile is on sparc (208 callers). Except that on sparc64 access_ok() is constant 1, which reduces it to 42 callers. 3 arch/sparc/kernel/ptrace_32.c (all in arch_ptrace()) 31 arch/sparc/kernel/signal_32.c (5 in do_sigreturn(), 6 in do_rt_sigreturn(), 8 in setup_frame(), 11 in setup_rt_frame(), 1 in do_sys_sigstack()) 7 arch/sparc/kernel/sigutil_32.c (2 in save_fpu_state(), 2 in restore_fpu_state(), 2 in save_rwin_state(), 1 in restore_rwin_state() 1 arch/sparc/mm/fault_32.c (in compute_si_addr()) The last one ignores -EFAULT, BTW - not sure what should it have done in that case, though. It's calculation of ->si_addr on SIGSEGV and SIGBUS in case of data access SIGSEGV or SIGBUS; it wants to peek into insn to figure out the effective address. arch_ptrace() one is zeroing several fields in userland struct fps for PTRACE_GETFPREGS. Could've been put_user() (or clear_user(), for that matter); we'd just done copy_regset_to_user() on adjacent addresses, so the lack of access_ok() is not a security hole there). Everything else is in sigframe handling... Other large piles are on mips, ppc and itanic. parisc is next, but there access_ok() is constant 1 as well. Same for s390 and m68k. nios2 and unicore32 are a bit under 80 callers each and the next are tile (~60), sh (~50) and then it drops off fast. It's not impossible to review; the problem is in figuring out which codepaths are hot enough to worry about them. And I'm even more convinced that bulk search-and-replace is the right approach here; we probably _can_ get rid of that shit this cycle (or, at least, reduce its use to very few places in arch/*), but that'll require some cooperation from architecture maintainers.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 11:04 AM, Al Virowrote: > > > My point is, this stuff needs looking at. No. You don't have a point. I've tried to explain that there's no performance difference, and there is no way in hell that the current "__" versions are better. So what's the point in looking? All you are ever going to come up with is theoretical "this might" cases. The only sane forward is to just get rid of them. At that point, you *may* end up actually noticing something, but if you do, it's not a "you might" issue. There's literally no upside to this "needs looking at". There are only downsides. And one major downside of your "needs looking at". is this one: From: Linus Torvalds Date: Tue, 24 Mar 2015 10:42:18 -0700 > > So I'd suggest we should just do a wholesale replacement of > __copy_to/from_user() with the non-underlined cases. Then, we could > switch insividual ones back - with reasoning of why they matter, and > with pointers to how it does access_ok() two lines before. > > We should probably even consider looking at __get_user/__put_user(). > Few of them are actually performance-critical. Look at that date. It's over two years ago. In the intervening two years, how many of those conversions have happened? Here's a hint: it's a very very round number. Linus
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 11:04 AM, Al Viro wrote: > > > My point is, this stuff needs looking at. No. You don't have a point. I've tried to explain that there's no performance difference, and there is no way in hell that the current "__" versions are better. So what's the point in looking? All you are ever going to come up with is theoretical "this might" cases. The only sane forward is to just get rid of them. At that point, you *may* end up actually noticing something, but if you do, it's not a "you might" issue. There's literally no upside to this "needs looking at". There are only downsides. And one major downside of your "needs looking at". is this one: From: Linus Torvalds Date: Tue, 24 Mar 2015 10:42:18 -0700 > > So I'd suggest we should just do a wholesale replacement of > __copy_to/from_user() with the non-underlined cases. Then, we could > switch insividual ones back - with reasoning of why they matter, and > with pointers to how it does access_ok() two lines before. > > We should probably even consider looking at __get_user/__put_user(). > Few of them are actually performance-critical. Look at that date. It's over two years ago. In the intervening two years, how many of those conversions have happened? Here's a hint: it's a very very round number. Linus
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 07:04:13PM +0100, Al Viro wrote: > My point is, this stuff needs looking at. Even this quick look in arch/x86 > has shown several fairly different classes of that stuff, probably needing > different approaches. And that - on an architecture that had tons of TLC > around signal delivery; I'm not saying that result is optimal (asm-goto sounds > potentially useful there), but it had a lot of attention given to it... BTW, even in arch/* they tend to nest. E.g. arch/alpha has 133 callers total. Distribution by files: 35 arch/alpha/kernel/osf_sys.c 92 arch/alpha/kernel/signal.c 1 arch/alpha/kernel/traps.c 4 arch/alpha/lib/csum_partial_copy.c 1 arch/alpha/mm/fault.c Distribution by functions: 1 osf_getdomainname() [1] 2 osf_sigstack() 2 get_tv32() 2 put_tv32() 4 get_it32() 4 put_it32() 2 osf_select() 18 osf_wait4() [2] 6 osf_sigaction() 34 restore_sigcontext() 1 do_sigreturn() 42 setup_sigcontext() 3 setup_frame() 6 setup_rt_frame() 1 dik_show_code() [3] 2 csum_partial_cfu_aligned() 2 csum_partial_cfu_src_aligned() 1 do_page_fault() [4] [1] insane, BTW - should be strnlen() + copy_to_user(); should report -EFAULT on failure, while we are at it. [2] with fairly disgusting use of set_fs() in the mix. [3] would break with get_user() - it's oopser fetching code to printk. [4] this: /* As of EV6, a load into $31/$f31 is a prefetch, and never faults (or is suppressed by the PALcode). Support that for older CPUs by ignoring such an instruction. */ if (cause == 0) { unsigned int insn; __get_user(insn, (unsigned int __user *)regs->pc); if ((insn >> 21 & 0x1f) == 0x1f && /* ldq ldl ldt lds ldg ldf ldwu ldbu */ (1ul << (insn >> 26) & 0x30f1400ul)) { regs->pc += 4; return; } }
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 07:04:13PM +0100, Al Viro wrote: > My point is, this stuff needs looking at. Even this quick look in arch/x86 > has shown several fairly different classes of that stuff, probably needing > different approaches. And that - on an architecture that had tons of TLC > around signal delivery; I'm not saying that result is optimal (asm-goto sounds > potentially useful there), but it had a lot of attention given to it... BTW, even in arch/* they tend to nest. E.g. arch/alpha has 133 callers total. Distribution by files: 35 arch/alpha/kernel/osf_sys.c 92 arch/alpha/kernel/signal.c 1 arch/alpha/kernel/traps.c 4 arch/alpha/lib/csum_partial_copy.c 1 arch/alpha/mm/fault.c Distribution by functions: 1 osf_getdomainname() [1] 2 osf_sigstack() 2 get_tv32() 2 put_tv32() 4 get_it32() 4 put_it32() 2 osf_select() 18 osf_wait4() [2] 6 osf_sigaction() 34 restore_sigcontext() 1 do_sigreturn() 42 setup_sigcontext() 3 setup_frame() 6 setup_rt_frame() 1 dik_show_code() [3] 2 csum_partial_cfu_aligned() 2 csum_partial_cfu_src_aligned() 1 do_page_fault() [4] [1] insane, BTW - should be strnlen() + copy_to_user(); should report -EFAULT on failure, while we are at it. [2] with fairly disgusting use of set_fs() in the mix. [3] would break with get_user() - it's oopser fetching code to printk. [4] this: /* As of EV6, a load into $31/$f31 is a prefetch, and never faults (or is suppressed by the PALcode). Support that for older CPUs by ignoring such an instruction. */ if (cause == 0) { unsigned int insn; __get_user(insn, (unsigned int __user *)regs->pc); if ((insn >> 21 & 0x1f) == 0x1f && /* ldq ldl ldt lds ldg ldf ldwu ldbu */ (1ul << (insn >> 26) & 0x30f1400ul)) { regs->pc += 4; return; } }
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 10:18:22AM -0700, Linus Torvalds wrote: > > x86 actually tends to use get_user_ex/put_user_ex instead. > > Yes. Because anybody who *really* cared about performance will have > converted already. The real cost has been stac/clac for a few years > now. On x86. Which has (not counting arch/x86/um, which definitely needs a careful look - there __..._user() is the same as ..._user() and costly as hell) all of 12 callers of __get_user() and 31 callers of __put_user(). More than a half of the latter - in cp_stat64() and I would at least try and see if "convert on stack, then copy_to_user" is better for that one. Other than that, all we have is: arch/x86/entry/common.c:372:__get_user(*(u32 *)>bp, arch/x86/ia32/ia32_signal.c:124:if (__get_user(set.sig[0], >sc.oldmask) arch/x86/ia32/ia32_signal.c:275:if (__put_user(sig, >sig)) arch/x86/include/asm/xen/page.h:86: return __put_user(val, (unsigned long __user *)addr); arch/x86/include/asm/xen/page.h:91: return __get_user(*val, (unsigned long __user *)addr); arch/x86/kernel/fpu/signal.c:100: err |= __get_user(xfeatures, (__u32 *)>header.xfeatures); arch/x86/kernel/fpu/signal.c:115: err |= __put_user(xfeatures, (__u32 *)>header.xfeatures); arch/x86/kernel/fpu/signal.c:46:if (__get_user(magic2, (__u32 __user *)(fpstate + fx_sw->xstate_size)) arch/x86/kernel/fpu/signal.c:66:__put_user(xsave->i387.swd, >status) || arch/x86/kernel/fpu/signal.c:67:__put_user(X86_FXSR_MAGIC, >magic)) arch/x86/kernel/fpu/signal.c:72:if (__get_user(swd, >swd) || __put_user(swd, >status)) arch/x86/kernel/fpu/signal.c:72:if (__get_user(swd, >swd) || __put_user(swd, >status)) arch/x86/kernel/fpu/signal.c:93:err |= __put_user(FP_XSTATE_MAGIC2, arch/x86/kernel/ptrace.c:1043: if (__put_user(word, u++)) arch/x86/kernel/ptrace.c:1070: ret = __get_user(word, u++); arch/x86/kernel/ptrace.c:472: if (__put_user(getreg(target, pos), u++)) arch/x86/kernel/ptrace.c:499: ret = __get_user(word, u++); arch/x86/kernel/signal.c:326: if (__put_user(sig, >sig)) arch/x86/kernel/signal.c:347: err |= __put_user(restorer, >pretcode); arch/x86/kernel/signal.c:356: err |= __put_user(*((u64 *)), (u64 *)frame->retcode); arch/x86/kernel/signal.c:613: if (__get_user(set.sig[0], >sc.oldmask) || (_NSIG_WORDS > 1 arch/x86/kernel/signal.c:647: if (__get_user(uc_flags, >uc.uc_flags)) arch/x86/kernel/signal.c:870: if (__get_user(uc_flags, >uc.uc_flags)) arch/x86/lib/csum-wrappers_64.c:103:*errp = __put_user(val16, (__u16 __user *)dst); arch/x86/lib/csum-wrappers_64.c:45: if (__get_user(val16, (const __u16 __user *)src)) A bunch of strays in signal.c (extending ..._ex use might be a good idea) xen_safe_{write,read}_ulong() (might very well break from adding access_ok() - or be security holes; I'm not familiar enough with that code to tell) and csum_partial_copy_{to,from}_user() - those would need to extend stac/clac pairs already there and switch to unsafe_... Plus several loops in ptrace - might or might not be sensitive, no idea. Plus do_fast_syscall_32(). Might be sensitive, Andy would be the guy to talk to, AFAICS. > And some of the existing cases are just disgusting. There's no excuse > for compat code or for ioctl to use the "__" versions. That seems to > be the bulk of those uses too. Sure. My point is, this stuff needs looking at. Even this quick look in arch/x86 has shown several fairly different classes of that stuff, probably needing different approaches. And that - on an architecture that had tons of TLC around signal delivery; I'm not saying that result is optimal (asm-goto sounds potentially useful there), but it had a lot of attention given to it...
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 10:18:22AM -0700, Linus Torvalds wrote: > > x86 actually tends to use get_user_ex/put_user_ex instead. > > Yes. Because anybody who *really* cared about performance will have > converted already. The real cost has been stac/clac for a few years > now. On x86. Which has (not counting arch/x86/um, which definitely needs a careful look - there __..._user() is the same as ..._user() and costly as hell) all of 12 callers of __get_user() and 31 callers of __put_user(). More than a half of the latter - in cp_stat64() and I would at least try and see if "convert on stack, then copy_to_user" is better for that one. Other than that, all we have is: arch/x86/entry/common.c:372:__get_user(*(u32 *)>bp, arch/x86/ia32/ia32_signal.c:124:if (__get_user(set.sig[0], >sc.oldmask) arch/x86/ia32/ia32_signal.c:275:if (__put_user(sig, >sig)) arch/x86/include/asm/xen/page.h:86: return __put_user(val, (unsigned long __user *)addr); arch/x86/include/asm/xen/page.h:91: return __get_user(*val, (unsigned long __user *)addr); arch/x86/kernel/fpu/signal.c:100: err |= __get_user(xfeatures, (__u32 *)>header.xfeatures); arch/x86/kernel/fpu/signal.c:115: err |= __put_user(xfeatures, (__u32 *)>header.xfeatures); arch/x86/kernel/fpu/signal.c:46:if (__get_user(magic2, (__u32 __user *)(fpstate + fx_sw->xstate_size)) arch/x86/kernel/fpu/signal.c:66:__put_user(xsave->i387.swd, >status) || arch/x86/kernel/fpu/signal.c:67:__put_user(X86_FXSR_MAGIC, >magic)) arch/x86/kernel/fpu/signal.c:72:if (__get_user(swd, >swd) || __put_user(swd, >status)) arch/x86/kernel/fpu/signal.c:72:if (__get_user(swd, >swd) || __put_user(swd, >status)) arch/x86/kernel/fpu/signal.c:93:err |= __put_user(FP_XSTATE_MAGIC2, arch/x86/kernel/ptrace.c:1043: if (__put_user(word, u++)) arch/x86/kernel/ptrace.c:1070: ret = __get_user(word, u++); arch/x86/kernel/ptrace.c:472: if (__put_user(getreg(target, pos), u++)) arch/x86/kernel/ptrace.c:499: ret = __get_user(word, u++); arch/x86/kernel/signal.c:326: if (__put_user(sig, >sig)) arch/x86/kernel/signal.c:347: err |= __put_user(restorer, >pretcode); arch/x86/kernel/signal.c:356: err |= __put_user(*((u64 *)), (u64 *)frame->retcode); arch/x86/kernel/signal.c:613: if (__get_user(set.sig[0], >sc.oldmask) || (_NSIG_WORDS > 1 arch/x86/kernel/signal.c:647: if (__get_user(uc_flags, >uc.uc_flags)) arch/x86/kernel/signal.c:870: if (__get_user(uc_flags, >uc.uc_flags)) arch/x86/lib/csum-wrappers_64.c:103:*errp = __put_user(val16, (__u16 __user *)dst); arch/x86/lib/csum-wrappers_64.c:45: if (__get_user(val16, (const __u16 __user *)src)) A bunch of strays in signal.c (extending ..._ex use might be a good idea) xen_safe_{write,read}_ulong() (might very well break from adding access_ok() - or be security holes; I'm not familiar enough with that code to tell) and csum_partial_copy_{to,from}_user() - those would need to extend stac/clac pairs already there and switch to unsafe_... Plus several loops in ptrace - might or might not be sensitive, no idea. Plus do_fast_syscall_32(). Might be sensitive, Andy would be the guy to talk to, AFAICS. > And some of the existing cases are just disgusting. There's no excuse > for compat code or for ioctl to use the "__" versions. That seems to > be the bulk of those uses too. Sure. My point is, this stuff needs looking at. Even this quick look in arch/x86 has shown several fairly different classes of that stuff, probably needing different approaches. And that - on an architecture that had tons of TLC around signal delivery; I'm not saying that result is optimal (asm-goto sounds potentially useful there), but it had a lot of attention given to it...
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 10:00 AM, Al Virowrote: > On Sat, May 13, 2017 at 09:15:07AM -0700, Linus Torvalds wrote: >> > IOW, we have >> > * most of users in arch/* (heavily dominated by signal-related >> > code, >> > both loads and stores). Those need careful massage; maybe unsafe-based >> > solution, maybe something else, but it's obviously per-architecture work >> > and these paths are sensitive. >> >> Why are they sensitive? > > Because there we do have tons of back-to-back __get_user() (on sigreturn()) > or __put_user() (on signal setup). It doesn't actually matter. Regular "put_user()" doesn't generate noticeably worse code. It's not a normal function call, it's still an inline asm - so it basically generates the exact same code as __xyz_user(), except it's a call to a fixed copy of the code. So the only difference tends to be (a) the extra call/ret instructions, possibly frame pointers (b) fixed registers (c) the added addr_limit checks but (a) is cheap, and (b) isn't a big deal since with "asm volatile" you can't re-order those things against each other anyway. So (b) ends up being approximately the cost of "one extra lea instruction" for the address generation that would otherwise be in the load/store instruction. And (c) is actually a reason *for* doing it. We've had bugs due to people not getting it right. So there really is basically no performance downside. Even with consecutive uses. The code that uses a function call might even be smaller (ie the 1.7kB savings isn't necessarily all in the out-of-line exception handling cases: the stac/clac instructions are six bytes for each use, so it more than makes up for the call instruction). > x86 actually tends to use get_user_ex/put_user_ex instead. Yes. Because anybody who *really* cared about performance will have converted already. The real cost has been stac/clac for a few years now. So we pretty much know that none of the remaining users are really all that critical. Certainly not "five cycles" kind of critical. >> Anybody who *relies* on not checking the address_limit is so broken as >> to be not even funny. > > Except for open-coded probe_kernel_read() somewhere in arch/*; I have not > read through those 700+ callers, so I don't know if there's any such. .. and those need to be fixed. But I'm not seeing the point in wasting valuable human time in reading through thousands of cases, when we can just automate it and see if anything breaks. Because it will break in a *safe* direction. You'll get an error from bad uses that shouldn't have worked to begin with. And some of the existing cases are just disgusting. There's no excuse for compat code or for ioctl to use the "__" versions. That seems to be the bulk of those uses too. Linus
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 10:00 AM, Al Viro wrote: > On Sat, May 13, 2017 at 09:15:07AM -0700, Linus Torvalds wrote: >> > IOW, we have >> > * most of users in arch/* (heavily dominated by signal-related >> > code, >> > both loads and stores). Those need careful massage; maybe unsafe-based >> > solution, maybe something else, but it's obviously per-architecture work >> > and these paths are sensitive. >> >> Why are they sensitive? > > Because there we do have tons of back-to-back __get_user() (on sigreturn()) > or __put_user() (on signal setup). It doesn't actually matter. Regular "put_user()" doesn't generate noticeably worse code. It's not a normal function call, it's still an inline asm - so it basically generates the exact same code as __xyz_user(), except it's a call to a fixed copy of the code. So the only difference tends to be (a) the extra call/ret instructions, possibly frame pointers (b) fixed registers (c) the added addr_limit checks but (a) is cheap, and (b) isn't a big deal since with "asm volatile" you can't re-order those things against each other anyway. So (b) ends up being approximately the cost of "one extra lea instruction" for the address generation that would otherwise be in the load/store instruction. And (c) is actually a reason *for* doing it. We've had bugs due to people not getting it right. So there really is basically no performance downside. Even with consecutive uses. The code that uses a function call might even be smaller (ie the 1.7kB savings isn't necessarily all in the out-of-line exception handling cases: the stac/clac instructions are six bytes for each use, so it more than makes up for the call instruction). > x86 actually tends to use get_user_ex/put_user_ex instead. Yes. Because anybody who *really* cared about performance will have converted already. The real cost has been stac/clac for a few years now. So we pretty much know that none of the remaining users are really all that critical. Certainly not "five cycles" kind of critical. >> Anybody who *relies* on not checking the address_limit is so broken as >> to be not even funny. > > Except for open-coded probe_kernel_read() somewhere in arch/*; I have not > read through those 700+ callers, so I don't know if there's any such. .. and those need to be fixed. But I'm not seeing the point in wasting valuable human time in reading through thousands of cases, when we can just automate it and see if anything breaks. Because it will break in a *safe* direction. You'll get an error from bad uses that shouldn't have worked to begin with. And some of the existing cases are just disgusting. There's no excuse for compat code or for ioctl to use the "__" versions. That seems to be the bulk of those uses too. Linus
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 06:00:56PM +0100, Al Viro wrote: > > But I don't see the excuse for not just doing it. If nobody notices, > > it's an obvious improvement. And if somebody *does* notice, we know > > how to do it properly with unsafe_xyz_user(), because "__xyz_user()" > > most definitely isn't it. > > I think we ought to actually look through those places - there are few > enough of them (outside of arch/, that is) and stac/clac overhead is > not the only problem they tend to have. PS: just to make it clear - I do _not_ propose to keep that shit around indefinitely; I want __get_user()/__put_user() gone by the end of that work.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 06:00:56PM +0100, Al Viro wrote: > > But I don't see the excuse for not just doing it. If nobody notices, > > it's an obvious improvement. And if somebody *does* notice, we know > > how to do it properly with unsafe_xyz_user(), because "__xyz_user()" > > most definitely isn't it. > > I think we ought to actually look through those places - there are few > enough of them (outside of arch/, that is) and stac/clac overhead is > not the only problem they tend to have. PS: just to make it clear - I do _not_ propose to keep that shit around indefinitely; I want __get_user()/__put_user() gone by the end of that work.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 09:15:07AM -0700, Linus Torvalds wrote: > > IOW, we have > > * most of users in arch/* (heavily dominated by signal-related code, > > both loads and stores). Those need careful massage; maybe unsafe-based > > solution, maybe something else, but it's obviously per-architecture work > > and these paths are sensitive. > > Why are they sensitive? Because there we do have tons of back-to-back __get_user() (on sigreturn()) or __put_user() (on signal setup). > Why not just do this: > > git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86 > :^arch/x86/include/asm/uaccess.h > | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g' > > which converts all the x86 uses in one go. x86 actually tends to use get_user_ex/put_user_ex instead. > Anybody who *relies* on not checking the address_limit is so broken as > to be not even funny. Except for open-coded probe_kernel_read() somewhere in arch/*; I have not read through those 700+ callers, so I don't know if there's any such. Sure, those won't be performance-sensitive. > And anything that is so performance-sensitive > that anybody can even measure the effect of the above we can convert > later. > Sure, do it in pieces (eg each architecture separately, then > "drivers", then "networking", then whatever, until all done), just so > that *if* something actually depends on semantics (and that really > shouldn't be the case), we have at least some information from a > bisect. > > But I don't see the excuse for not just doing it. If nobody notices, > it's an obvious improvement. And if somebody *does* notice, we know > how to do it properly with unsafe_xyz_user(), because "__xyz_user()" > most definitely isn't it. I think we ought to actually look through those places - there are few enough of them (outside of arch/, that is) and stac/clac overhead is not the only problem they tend to have.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 09:15:07AM -0700, Linus Torvalds wrote: > > IOW, we have > > * most of users in arch/* (heavily dominated by signal-related code, > > both loads and stores). Those need careful massage; maybe unsafe-based > > solution, maybe something else, but it's obviously per-architecture work > > and these paths are sensitive. > > Why are they sensitive? Because there we do have tons of back-to-back __get_user() (on sigreturn()) or __put_user() (on signal setup). > Why not just do this: > > git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86 > :^arch/x86/include/asm/uaccess.h > | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g' > > which converts all the x86 uses in one go. x86 actually tends to use get_user_ex/put_user_ex instead. > Anybody who *relies* on not checking the address_limit is so broken as > to be not even funny. Except for open-coded probe_kernel_read() somewhere in arch/*; I have not read through those 700+ callers, so I don't know if there's any such. Sure, those won't be performance-sensitive. > And anything that is so performance-sensitive > that anybody can even measure the effect of the above we can convert > later. > Sure, do it in pieces (eg each architecture separately, then > "drivers", then "networking", then whatever, until all done), just so > that *if* something actually depends on semantics (and that really > shouldn't be the case), we have at least some information from a > bisect. > > But I don't see the excuse for not just doing it. If nobody notices, > it's an obvious improvement. And if somebody *does* notice, we know > how to do it properly with unsafe_xyz_user(), because "__xyz_user()" > most definitely isn't it. I think we ought to actually look through those places - there are few enough of them (outside of arch/, that is) and stac/clac overhead is not the only problem they tend to have.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 02:05:27PM +0200, Adam Borowski wrote: > As someone from the peanuts gallery, I took a look for __put_user() in my > usual haunt, drivers/tty/vt/ > > * use 1: con_[gs]et_trans_*(): > Copies a linear array of 256 bytes/shorts, one by one. > The obvious patch has 9 insertions(+), 22 deletions(-). > > * use 2: con_[gs]et_unimap(): > Ditto, up to 65535*2 shorts, also in a nice linear array. > > * use 3: tioclinux(): > Does a __put into a place that was checked only for read. This got me > frightened as it initially looked like something that can allow an user to > write where they shouldn't. Fortunately, it turns out the first argument > to access_ok() is ignored on every single architecture -- why does it even > exist then? I imagined it's there for some odd arch that allows writes > when in privileged mode, but unlike what the docs say, VERIFY_WRITE is > exactly same as VERIFY_READ. It's a remnant of old kludge that never properly worked in the first place. access_ok() should have been called userland_range() - that's all it checks and that's all it *can* check. As it is, each of those __get_user() can bloody well yield -EFAULT. Despite having "passed" access_ok(). Again, the only thing access_ok() checks is that (on architecture with shared address space for userland and kernel) the addresses given are on the userland side. That's _it_ - they can be unmapped, mmapped to broken floppy, whatever; you'll find out when you try to actually copy bytes from from it. What the kludge used to attempt was "let's check that we are not trying to copy into read-only mmapped area - 80386 MMU is fucked in head and won't fault on such stores in ring 0". It had always been racy. Look: thread A: going to copy something to user-supplied address. Do access_ok(). thread A: looks like it's mapped writable, let's go ahead and copy thread A: do something blocking before actually doing __put_user() or __copy_to_user() or whatever it's going to be. thread B: munmap(), mmap() something read-only here. thread A: get to actual __put_user()/__copy_to_user(). 80386: hey, it's ring 0, no need to look at the write protect bit in page tables. What can go wrong, anyway? You can't move any non-static checks to access_ok(). On any architecture. Anything that could change between access_ok() and actual copying can't be checked in access_ok(). As it is, access_ok() has actively misleading calling conventions: * the name implies that having passed access_ok() you don't have to worry about EFAULT * 'direction' argument of that thing reinforces that impression *and* has to be produced by the caller. Most simply pass a constant, which immediately gets dropped (as an aside, take a look at 4b4554f6d - it's amusing), but in some cases it's calculated elsewhere and carefully passed through several levels of call chain. Only to be discarded by access_ok(), of course... > Ie, every use in this sample is wrong. I suspect the rest of the kernel > should be similar. Looking through vt... * con_set_trans_old(): copy_from_user() + loop for doing or with UNI_DIRECT_BASE. Almost certainly will be faster that way - on *any* architecture. * con_get_trans_old(): copy_to_user() would be an obvious optimization. * con_set_trans_new(): copy_from_user(). * con_get_trans_new(): copy_to_user(). * con_set_unimap(): memdup_user() instead of the entire kmalloc_array + loop copying the sucker member by member. With ushort ct you don't need overflow-related logics of kmalloc_array() anyway... * con_get_unimap(): copy_to_user() + put_user() (for uct) * set_selection(): just copy_from_user() into local struct tiocl_selection. * tioclinux(): use put_user(). Yes, you will repeat the same check twice. Once per ioctl(), that involves enough work to make that "recalculate access_ok() once for no reason" non-issue on any architecture. * vt_ioctl(): turn that ushort ll,cc,vlin,clin,vcol,ccol; into struct vt_consize size; or something like that and use copy_from_user() And AFAICS you can lose each and every access_ok() call in there.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 02:05:27PM +0200, Adam Borowski wrote: > As someone from the peanuts gallery, I took a look for __put_user() in my > usual haunt, drivers/tty/vt/ > > * use 1: con_[gs]et_trans_*(): > Copies a linear array of 256 bytes/shorts, one by one. > The obvious patch has 9 insertions(+), 22 deletions(-). > > * use 2: con_[gs]et_unimap(): > Ditto, up to 65535*2 shorts, also in a nice linear array. > > * use 3: tioclinux(): > Does a __put into a place that was checked only for read. This got me > frightened as it initially looked like something that can allow an user to > write where they shouldn't. Fortunately, it turns out the first argument > to access_ok() is ignored on every single architecture -- why does it even > exist then? I imagined it's there for some odd arch that allows writes > when in privileged mode, but unlike what the docs say, VERIFY_WRITE is > exactly same as VERIFY_READ. It's a remnant of old kludge that never properly worked in the first place. access_ok() should have been called userland_range() - that's all it checks and that's all it *can* check. As it is, each of those __get_user() can bloody well yield -EFAULT. Despite having "passed" access_ok(). Again, the only thing access_ok() checks is that (on architecture with shared address space for userland and kernel) the addresses given are on the userland side. That's _it_ - they can be unmapped, mmapped to broken floppy, whatever; you'll find out when you try to actually copy bytes from from it. What the kludge used to attempt was "let's check that we are not trying to copy into read-only mmapped area - 80386 MMU is fucked in head and won't fault on such stores in ring 0". It had always been racy. Look: thread A: going to copy something to user-supplied address. Do access_ok(). thread A: looks like it's mapped writable, let's go ahead and copy thread A: do something blocking before actually doing __put_user() or __copy_to_user() or whatever it's going to be. thread B: munmap(), mmap() something read-only here. thread A: get to actual __put_user()/__copy_to_user(). 80386: hey, it's ring 0, no need to look at the write protect bit in page tables. What can go wrong, anyway? You can't move any non-static checks to access_ok(). On any architecture. Anything that could change between access_ok() and actual copying can't be checked in access_ok(). As it is, access_ok() has actively misleading calling conventions: * the name implies that having passed access_ok() you don't have to worry about EFAULT * 'direction' argument of that thing reinforces that impression *and* has to be produced by the caller. Most simply pass a constant, which immediately gets dropped (as an aside, take a look at 4b4554f6d - it's amusing), but in some cases it's calculated elsewhere and carefully passed through several levels of call chain. Only to be discarded by access_ok(), of course... > Ie, every use in this sample is wrong. I suspect the rest of the kernel > should be similar. Looking through vt... * con_set_trans_old(): copy_from_user() + loop for doing or with UNI_DIRECT_BASE. Almost certainly will be faster that way - on *any* architecture. * con_get_trans_old(): copy_to_user() would be an obvious optimization. * con_set_trans_new(): copy_from_user(). * con_get_trans_new(): copy_to_user(). * con_set_unimap(): memdup_user() instead of the entire kmalloc_array + loop copying the sucker member by member. With ushort ct you don't need overflow-related logics of kmalloc_array() anyway... * con_get_unimap(): copy_to_user() + put_user() (for uct) * set_selection(): just copy_from_user() into local struct tiocl_selection. * tioclinux(): use put_user(). Yes, you will repeat the same check twice. Once per ioctl(), that involves enough work to make that "recalculate access_ok() once for no reason" non-issue on any architecture. * vt_ioctl(): turn that ushort ll,cc,vlin,clin,vcol,ccol; into struct vt_consize size; or something like that and use copy_from_user() And AFAICS you can lose each and every access_ok() call in there.
Re: [git pull] uaccess-related bits of vfs.git
Oops. *Really* adding the x86 guys now. Blush. Linus On Sat, May 13, 2017 at 9:15 AM, Linus Torvaldswrote: > On Fri, May 12, 2017 at 11:57 PM, Al Viro wrote: >> >> First, some stats: there's a thousand-odd callers of __get_user(). Out of >> those, about 70% are in arch/, mostly in sigframe-related code. > > Sure. And they can be trivially converted, and none of them should care at > all. > >> IOW, we have >> * most of users in arch/* (heavily dominated by signal-related code, >> both loads and stores). Those need careful massage; maybe unsafe-based >> solution, maybe something else, but it's obviously per-architecture work >> and these paths are sensitive. > > Why are they sensitive? > > Why not just do this: > > git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86 > :^arch/x86/include/asm/uaccess.h > | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g' > > which converts all the x86 uses in one go. > > Anybody who *relies* on not checking the address_limit is so broken as > to be not even funny. And anything that is so performance-sensitive > that anybody can even measure the effect of the above we can convert > later. > > Sure, do it in pieces (eg each architecture separately, then > "drivers", then "networking", then whatever, until all done), just so > that *if* something actually depends on semantics (and that really > shouldn't be the case), we have at least some information from a > bisect. > > But I don't see the excuse for not just doing it. If nobody notices, > it's an obvious improvement. And if somebody *does* notice, we know > how to do it properly with unsafe_xyz_user(), because "__xyz_user()" > most definitely isn't it. > > An example of something that *should* be converted is > "csum_partial_copy_from_user()", but it really does need to use > "user_access_begin()" and friends, because right now it's using > stac/clac for each 16-bit word. That's *expensive*, and it's expensive > whether you use __get_user() or get_user(). > > Adding x86 people just to see how they react to the above "patch". > > For me, in my fairly minimal personal config, the above cleanup patch > shrinks the text size of the resulting kernel by 1.7kB. Yes, most of > it is the out-of-line code, but still.. > > Interestingly, the signal handling code didn't change at all. It looks > like only the 32-bit code uses __put_user() for the frame setup. The > regular code uses put_user_try/put_user_catch, which is the > x86-specific early try at the unsafe versions, but it would actually > be improved by using "unsafe_put_user()" and my patch to make that use > "asm goto". > >Linus > > PS. That "patch" depends on modern git - with older versions of git > you need to do the path negation with ":!pattern", and then you need > to quote it too since '!' is special for shell.
Re: [git pull] uaccess-related bits of vfs.git
Oops. *Really* adding the x86 guys now. Blush. Linus On Sat, May 13, 2017 at 9:15 AM, Linus Torvalds wrote: > On Fri, May 12, 2017 at 11:57 PM, Al Viro wrote: >> >> First, some stats: there's a thousand-odd callers of __get_user(). Out of >> those, about 70% are in arch/, mostly in sigframe-related code. > > Sure. And they can be trivially converted, and none of them should care at > all. > >> IOW, we have >> * most of users in arch/* (heavily dominated by signal-related code, >> both loads and stores). Those need careful massage; maybe unsafe-based >> solution, maybe something else, but it's obviously per-architecture work >> and these paths are sensitive. > > Why are they sensitive? > > Why not just do this: > > git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86 > :^arch/x86/include/asm/uaccess.h > | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g' > > which converts all the x86 uses in one go. > > Anybody who *relies* on not checking the address_limit is so broken as > to be not even funny. And anything that is so performance-sensitive > that anybody can even measure the effect of the above we can convert > later. > > Sure, do it in pieces (eg each architecture separately, then > "drivers", then "networking", then whatever, until all done), just so > that *if* something actually depends on semantics (and that really > shouldn't be the case), we have at least some information from a > bisect. > > But I don't see the excuse for not just doing it. If nobody notices, > it's an obvious improvement. And if somebody *does* notice, we know > how to do it properly with unsafe_xyz_user(), because "__xyz_user()" > most definitely isn't it. > > An example of something that *should* be converted is > "csum_partial_copy_from_user()", but it really does need to use > "user_access_begin()" and friends, because right now it's using > stac/clac for each 16-bit word. That's *expensive*, and it's expensive > whether you use __get_user() or get_user(). > > Adding x86 people just to see how they react to the above "patch". > > For me, in my fairly minimal personal config, the above cleanup patch > shrinks the text size of the resulting kernel by 1.7kB. Yes, most of > it is the out-of-line code, but still.. > > Interestingly, the signal handling code didn't change at all. It looks > like only the 32-bit code uses __put_user() for the frame setup. The > regular code uses put_user_try/put_user_catch, which is the > x86-specific early try at the unsafe versions, but it would actually > be improved by using "unsafe_put_user()" and my patch to make that use > "asm goto". > >Linus > > PS. That "patch" depends on modern git - with older versions of git > you need to do the path negation with ":!pattern", and then you need > to quote it too since '!' is special for shell.
Re: [git pull] uaccess-related bits of vfs.git
On Fri, May 12, 2017 at 11:57 PM, Al Virowrote: > > First, some stats: there's a thousand-odd callers of __get_user(). Out of > those, about 70% are in arch/, mostly in sigframe-related code. Sure. And they can be trivially converted, and none of them should care at all. > IOW, we have > * most of users in arch/* (heavily dominated by signal-related code, > both loads and stores). Those need careful massage; maybe unsafe-based > solution, maybe something else, but it's obviously per-architecture work > and these paths are sensitive. Why are they sensitive? Why not just do this: git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86 :^arch/x86/include/asm/uaccess.h | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g' which converts all the x86 uses in one go. Anybody who *relies* on not checking the address_limit is so broken as to be not even funny. And anything that is so performance-sensitive that anybody can even measure the effect of the above we can convert later. Sure, do it in pieces (eg each architecture separately, then "drivers", then "networking", then whatever, until all done), just so that *if* something actually depends on semantics (and that really shouldn't be the case), we have at least some information from a bisect. But I don't see the excuse for not just doing it. If nobody notices, it's an obvious improvement. And if somebody *does* notice, we know how to do it properly with unsafe_xyz_user(), because "__xyz_user()" most definitely isn't it. An example of something that *should* be converted is "csum_partial_copy_from_user()", but it really does need to use "user_access_begin()" and friends, because right now it's using stac/clac for each 16-bit word. That's *expensive*, and it's expensive whether you use __get_user() or get_user(). Adding x86 people just to see how they react to the above "patch". For me, in my fairly minimal personal config, the above cleanup patch shrinks the text size of the resulting kernel by 1.7kB. Yes, most of it is the out-of-line code, but still.. Interestingly, the signal handling code didn't change at all. It looks like only the 32-bit code uses __put_user() for the frame setup. The regular code uses put_user_try/put_user_catch, which is the x86-specific early try at the unsafe versions, but it would actually be improved by using "unsafe_put_user()" and my patch to make that use "asm goto". Linus PS. That "patch" depends on modern git - with older versions of git you need to do the path negation with ":!pattern", and then you need to quote it too since '!' is special for shell.
Re: [git pull] uaccess-related bits of vfs.git
On Fri, May 12, 2017 at 11:57 PM, Al Viro wrote: > > First, some stats: there's a thousand-odd callers of __get_user(). Out of > those, about 70% are in arch/, mostly in sigframe-related code. Sure. And they can be trivially converted, and none of them should care at all. > IOW, we have > * most of users in arch/* (heavily dominated by signal-related code, > both loads and stores). Those need careful massage; maybe unsafe-based > solution, maybe something else, but it's obviously per-architecture work > and these paths are sensitive. Why are they sensitive? Why not just do this: git grep -l '\<__\(\(get\)\|\(put\)\)_user(' -- arch/x86 :^arch/x86/include/asm/uaccess.h | xargs sed -i 's/__\(\(\(get\)\|\(put\)\)_user(\)/\1/g' which converts all the x86 uses in one go. Anybody who *relies* on not checking the address_limit is so broken as to be not even funny. And anything that is so performance-sensitive that anybody can even measure the effect of the above we can convert later. Sure, do it in pieces (eg each architecture separately, then "drivers", then "networking", then whatever, until all done), just so that *if* something actually depends on semantics (and that really shouldn't be the case), we have at least some information from a bisect. But I don't see the excuse for not just doing it. If nobody notices, it's an obvious improvement. And if somebody *does* notice, we know how to do it properly with unsafe_xyz_user(), because "__xyz_user()" most definitely isn't it. An example of something that *should* be converted is "csum_partial_copy_from_user()", but it really does need to use "user_access_begin()" and friends, because right now it's using stac/clac for each 16-bit word. That's *expensive*, and it's expensive whether you use __get_user() or get_user(). Adding x86 people just to see how they react to the above "patch". For me, in my fairly minimal personal config, the above cleanup patch shrinks the text size of the resulting kernel by 1.7kB. Yes, most of it is the out-of-line code, but still.. Interestingly, the signal handling code didn't change at all. It looks like only the 32-bit code uses __put_user() for the frame setup. The regular code uses put_user_try/put_user_catch, which is the x86-specific early try at the unsafe versions, but it would actually be improved by using "unsafe_put_user()" and my patch to make that use "asm goto". Linus PS. That "patch" depends on modern git - with older versions of git you need to do the path negation with ":!pattern", and then you need to quote it too since '!' is special for shell.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 8:05 AM, Adam Borowskiwrote: > On Sat, May 13, 2017 at 07:57:45AM +0100, Al Viro wrote: >> On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote: >> > But the *first* thing I'd like to do would be to just get rid of >> > __get_user/__put_user as a thing. It really does generate nasty code, >> > and we might as well just make it do that function call. >> >> But IMO the first step is not to convert __get_user()/__put_user() to >> aliases of get_user()/put_user() - it's to get rid of their infestations >> outside of arch/*. They are concentrated in relatively few places, so >> we can deal with those individually and then just fucking ban their use >> outside of arch/*. That's easy enough to watch for - simple git grep will >> do, >> so anything creeping back will be immediately spotted. > > As someone from the peanuts gallery, I took a look for __put_user() in my > usual haunt, drivers/tty/vt/ > > * use 1: con_[gs]et_trans_*(): > Copies a linear array of 256 bytes/shorts, one by one. > The obvious patch has 9 insertions(+), 22 deletions(-). > > * use 2: con_[gs]et_unimap(): > Ditto, up to 65535*2 shorts, also in a nice linear array. > > * use 3: tioclinux(): > Does a __put into a place that was checked only for read. This got me > frightened as it initially looked like something that can allow an user to > write where they shouldn't. Fortunately, it turns out the first argument > to access_ok() is ignored on every single architecture -- why does it even > exist then? I imagined it's there for some odd arch that allows writes > when in privileged mode, but unlike what the docs say, VERIFY_WRITE is > exactly same as VERIFY_READ. The original i386 (no longer supported) ignored the write-protect bit when in supervisor mode. -- Brian Gerst
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 8:05 AM, Adam Borowski wrote: > On Sat, May 13, 2017 at 07:57:45AM +0100, Al Viro wrote: >> On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote: >> > But the *first* thing I'd like to do would be to just get rid of >> > __get_user/__put_user as a thing. It really does generate nasty code, >> > and we might as well just make it do that function call. >> >> But IMO the first step is not to convert __get_user()/__put_user() to >> aliases of get_user()/put_user() - it's to get rid of their infestations >> outside of arch/*. They are concentrated in relatively few places, so >> we can deal with those individually and then just fucking ban their use >> outside of arch/*. That's easy enough to watch for - simple git grep will >> do, >> so anything creeping back will be immediately spotted. > > As someone from the peanuts gallery, I took a look for __put_user() in my > usual haunt, drivers/tty/vt/ > > * use 1: con_[gs]et_trans_*(): > Copies a linear array of 256 bytes/shorts, one by one. > The obvious patch has 9 insertions(+), 22 deletions(-). > > * use 2: con_[gs]et_unimap(): > Ditto, up to 65535*2 shorts, also in a nice linear array. > > * use 3: tioclinux(): > Does a __put into a place that was checked only for read. This got me > frightened as it initially looked like something that can allow an user to > write where they shouldn't. Fortunately, it turns out the first argument > to access_ok() is ignored on every single architecture -- why does it even > exist then? I imagined it's there for some odd arch that allows writes > when in privileged mode, but unlike what the docs say, VERIFY_WRITE is > exactly same as VERIFY_READ. The original i386 (no longer supported) ignored the write-protect bit when in supervisor mode. -- Brian Gerst
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 07:57:45AM +0100, Al Viro wrote: > On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote: > > But the *first* thing I'd like to do would be to just get rid of > > __get_user/__put_user as a thing. It really does generate nasty code, > > and we might as well just make it do that function call. > > But IMO the first step is not to convert __get_user()/__put_user() to > aliases of get_user()/put_user() - it's to get rid of their infestations > outside of arch/*. They are concentrated in relatively few places, so > we can deal with those individually and then just fucking ban their use > outside of arch/*. That's easy enough to watch for - simple git grep will do, > so anything creeping back will be immediately spotted. As someone from the peanuts gallery, I took a look for __put_user() in my usual haunt, drivers/tty/vt/ * use 1: con_[gs]et_trans_*(): Copies a linear array of 256 bytes/shorts, one by one. The obvious patch has 9 insertions(+), 22 deletions(-). * use 2: con_[gs]et_unimap(): Ditto, up to 65535*2 shorts, also in a nice linear array. * use 3: tioclinux(): Does a __put into a place that was checked only for read. This got me frightened as it initially looked like something that can allow an user to write where they shouldn't. Fortunately, it turns out the first argument to access_ok() is ignored on every single architecture -- why does it even exist then? I imagined it's there for some odd arch that allows writes when in privileged mode, but unlike what the docs say, VERIFY_WRITE is exactly same as VERIFY_READ. Ie, every use in this sample is wrong. I suspect the rest of the kernel should be similar. Meow! -- Don't be racist. White, amber or black, all beers should be judged based solely on their merits. Heck, even if occasionally a cider applies for a beer's job, why not? On the other hand, corpo lager is not a race.
Re: [git pull] uaccess-related bits of vfs.git
On Sat, May 13, 2017 at 07:57:45AM +0100, Al Viro wrote: > On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote: > > But the *first* thing I'd like to do would be to just get rid of > > __get_user/__put_user as a thing. It really does generate nasty code, > > and we might as well just make it do that function call. > > But IMO the first step is not to convert __get_user()/__put_user() to > aliases of get_user()/put_user() - it's to get rid of their infestations > outside of arch/*. They are concentrated in relatively few places, so > we can deal with those individually and then just fucking ban their use > outside of arch/*. That's easy enough to watch for - simple git grep will do, > so anything creeping back will be immediately spotted. As someone from the peanuts gallery, I took a look for __put_user() in my usual haunt, drivers/tty/vt/ * use 1: con_[gs]et_trans_*(): Copies a linear array of 256 bytes/shorts, one by one. The obvious patch has 9 insertions(+), 22 deletions(-). * use 2: con_[gs]et_unimap(): Ditto, up to 65535*2 shorts, also in a nice linear array. * use 3: tioclinux(): Does a __put into a place that was checked only for read. This got me frightened as it initially looked like something that can allow an user to write where they shouldn't. Fortunately, it turns out the first argument to access_ok() is ignored on every single architecture -- why does it even exist then? I imagined it's there for some odd arch that allows writes when in privileged mode, but unlike what the docs say, VERIFY_WRITE is exactly same as VERIFY_READ. Ie, every use in this sample is wrong. I suspect the rest of the kernel should be similar. Meow! -- Don't be racist. White, amber or black, all beers should be judged based solely on their merits. Heck, even if occasionally a cider applies for a beer's job, why not? On the other hand, corpo lager is not a race.
Re: [git pull] uaccess-related bits of vfs.git
On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote: > So I should have asked earlier, but I was feeling rather busy during > the early merge window.. > So I'm actually more interested to hear if you have any pending work > on cleaning up the __get/put_user() mess we have. Is that on your > radar at all? Yes, it is. > In particular, right now, both __get_user() and __put_user() are nasty > and broken interfaces. > > It *used* to be that they were designed to just generate a single > instruction. That's not what they currently do at all, due to the > whole SMAP/PAN on x86 and arm. > > For example, on x86, right now a single __put_user() call ends up > generating something like [snip] > But even that small thing is rather debatable from a performance > angle: the actual cost of __put_user() these days is not the function > call, but the clac/stac (on modern x86) and the PAN bit games (on > arm). > > So I'm actually inclined to just say "We should make > __get_user/__put_user() just be aliases for the normal > get_user/put_user() model". > which is more boilerplate, but ends up generating much better code. > And for "unsafe_put_user()" in particular, we actually could teach gcc > to use "asm goto" to really improve code generation. I have patches > for that if you want to look. > > I'm attaching an example patch for "filldir()" that does that modern > thing. It almost had the right form as-is anyway, and under some loads > (eg "find") filldir actually shows up in profiles too.\ > But the *first* thing I'd like to do would be to just get rid of > __get_user/__put_user as a thing. It really does generate nasty code, > and we might as well just make it do that function call. > > Because what we do now isn't right. If we care about performance, the > "__xyz_user" versions are wrong (use unsafe_xyz_user() instead). And > if you don't care about performance, you should use the regular > xyz_user() functions that do an ok job by just calling the right-sized > helper function instead of inlining the crud. > > Hmm? First, some stats: there's a thousand-odd callers of __get_user(). Out of those, about 70% are in arch/, mostly in sigframe-related code. Take a look at the output of git grep -n -w __get_user|grep -v '^arch' | sed -e 's/:.*//'|uniq -c|sort -nr 55 drivers/gpu/drm/drm_ioc32.c 43 drivers/staging/comedi/comedi_compat32.c 35 kernel/compat.c 27 net/compat.c 27 block/compat_ioctl.c 15 drivers/usb/core/devio.c 13 drivers/char/ipmi/ipmi_devintf.c 11 kernel/signal.c 10 fs/fcntl.c 8 ipc/compat.c 8 drivers/video/fbdev/sbuslib.c 7 net/socket.c 7 drivers/gpu/drm/mga/mga_ioc32.c 6 fs/select.c 6 drivers/tty/vt/vt_ioctl.c 5 drivers/tty/vt/selection.c 5 drivers/spi/spidev.c 5 drivers/pci/proc.c 4 kernel/ptrace.c 4 ipc/compat_mq.c 4 drivers/tty/vt/consolemap.c 3 sound/oss/sys_timer.c 3 drivers/media/usb/uvc/uvc_v4l2.c 3 drivers/macintosh/ans-lcd.c and then we have a smallish bunch of files with one or two callers. For __put_user() we have ~1800 callers total, ~1100 of them in arch/* and the rest goes like this: 73 drivers/gpu/drm/drm_ioc32.c 66 ipc/compat.c 58 drivers/gpu/drm/radeon/radeon_ioc32.c 55 block/compat_ioctl.c 52 kernel/compat.c 49 kernel/signal.c 43 drivers/staging/comedi/comedi_compat32.c 31 drivers/gpu/drm/r128/r128_ioc32.c 30 drivers/gpu/drm/mga/mga_ioc32.c 27 fs/signalfd.c 25 fs/readdir.c 24 fs/statfs.c 19 kernel/sys.c 17 net/compat.c 11 drivers/scsi/sg.c 10 fs/fcntl.c 8 sound/core/pcm_native.c 8 fs/binfmt_flat.c 7 fs/binfmt_elf_fdpic.c 6 net/socket.c 6 drivers/char/ipmi/ipmi_devintf.c 5 sound/oss/sys_timer.c 5 fs/binfmt_elf.c 5 drivers/video/fbdev/sbuslib.c 5 drivers/tty/vt/consolemap.c 5 drivers/spi/spidev.c 5 drivers/pci/proc.c 4 kernel/ptrace.c 4 ipc/compat_mq.c 3 fs/select.c 3 drivers/tty/vt/vt.c ... IOW, we have * most of users in arch/* (heavily dominated by signal-related code, both loads and stores). Those need careful massage; maybe unsafe-based solution, maybe something else, but it's obviously per-architecture work and these paths are sensitive. * few places outside of arch/* where these are abused; absolute majority are in ioctl compat code and they are _bad_. Bad on x86, bad on s390, etc. I'm not sure if switch to unsafe is the right solution for those, actually - depends on how we end up dealing with compat ioctls of that sort. Might be better to do bulk copy to/from userland, combined with conversion from 32bit to native kernel-side and passing pointers to kernel objects to functions doing actual work. Really depends upon the details. * some places in fairly common codepaths where __get_user() and __put_user() are being played
Re: [git pull] uaccess-related bits of vfs.git
On Fri, May 12, 2017 at 06:00:44PM -0700, Linus Torvalds wrote: > So I should have asked earlier, but I was feeling rather busy during > the early merge window.. > So I'm actually more interested to hear if you have any pending work > on cleaning up the __get/put_user() mess we have. Is that on your > radar at all? Yes, it is. > In particular, right now, both __get_user() and __put_user() are nasty > and broken interfaces. > > It *used* to be that they were designed to just generate a single > instruction. That's not what they currently do at all, due to the > whole SMAP/PAN on x86 and arm. > > For example, on x86, right now a single __put_user() call ends up > generating something like [snip] > But even that small thing is rather debatable from a performance > angle: the actual cost of __put_user() these days is not the function > call, but the clac/stac (on modern x86) and the PAN bit games (on > arm). > > So I'm actually inclined to just say "We should make > __get_user/__put_user() just be aliases for the normal > get_user/put_user() model". > which is more boilerplate, but ends up generating much better code. > And for "unsafe_put_user()" in particular, we actually could teach gcc > to use "asm goto" to really improve code generation. I have patches > for that if you want to look. > > I'm attaching an example patch for "filldir()" that does that modern > thing. It almost had the right form as-is anyway, and under some loads > (eg "find") filldir actually shows up in profiles too.\ > But the *first* thing I'd like to do would be to just get rid of > __get_user/__put_user as a thing. It really does generate nasty code, > and we might as well just make it do that function call. > > Because what we do now isn't right. If we care about performance, the > "__xyz_user" versions are wrong (use unsafe_xyz_user() instead). And > if you don't care about performance, you should use the regular > xyz_user() functions that do an ok job by just calling the right-sized > helper function instead of inlining the crud. > > Hmm? First, some stats: there's a thousand-odd callers of __get_user(). Out of those, about 70% are in arch/, mostly in sigframe-related code. Take a look at the output of git grep -n -w __get_user|grep -v '^arch' | sed -e 's/:.*//'|uniq -c|sort -nr 55 drivers/gpu/drm/drm_ioc32.c 43 drivers/staging/comedi/comedi_compat32.c 35 kernel/compat.c 27 net/compat.c 27 block/compat_ioctl.c 15 drivers/usb/core/devio.c 13 drivers/char/ipmi/ipmi_devintf.c 11 kernel/signal.c 10 fs/fcntl.c 8 ipc/compat.c 8 drivers/video/fbdev/sbuslib.c 7 net/socket.c 7 drivers/gpu/drm/mga/mga_ioc32.c 6 fs/select.c 6 drivers/tty/vt/vt_ioctl.c 5 drivers/tty/vt/selection.c 5 drivers/spi/spidev.c 5 drivers/pci/proc.c 4 kernel/ptrace.c 4 ipc/compat_mq.c 4 drivers/tty/vt/consolemap.c 3 sound/oss/sys_timer.c 3 drivers/media/usb/uvc/uvc_v4l2.c 3 drivers/macintosh/ans-lcd.c and then we have a smallish bunch of files with one or two callers. For __put_user() we have ~1800 callers total, ~1100 of them in arch/* and the rest goes like this: 73 drivers/gpu/drm/drm_ioc32.c 66 ipc/compat.c 58 drivers/gpu/drm/radeon/radeon_ioc32.c 55 block/compat_ioctl.c 52 kernel/compat.c 49 kernel/signal.c 43 drivers/staging/comedi/comedi_compat32.c 31 drivers/gpu/drm/r128/r128_ioc32.c 30 drivers/gpu/drm/mga/mga_ioc32.c 27 fs/signalfd.c 25 fs/readdir.c 24 fs/statfs.c 19 kernel/sys.c 17 net/compat.c 11 drivers/scsi/sg.c 10 fs/fcntl.c 8 sound/core/pcm_native.c 8 fs/binfmt_flat.c 7 fs/binfmt_elf_fdpic.c 6 net/socket.c 6 drivers/char/ipmi/ipmi_devintf.c 5 sound/oss/sys_timer.c 5 fs/binfmt_elf.c 5 drivers/video/fbdev/sbuslib.c 5 drivers/tty/vt/consolemap.c 5 drivers/spi/spidev.c 5 drivers/pci/proc.c 4 kernel/ptrace.c 4 ipc/compat_mq.c 3 fs/select.c 3 drivers/tty/vt/vt.c ... IOW, we have * most of users in arch/* (heavily dominated by signal-related code, both loads and stores). Those need careful massage; maybe unsafe-based solution, maybe something else, but it's obviously per-architecture work and these paths are sensitive. * few places outside of arch/* where these are abused; absolute majority are in ioctl compat code and they are _bad_. Bad on x86, bad on s390, etc. I'm not sure if switch to unsafe is the right solution for those, actually - depends on how we end up dealing with compat ioctls of that sort. Might be better to do bulk copy to/from userland, combined with conversion from 32bit to native kernel-side and passing pointers to kernel objects to functions doing actual work. Really depends upon the details. * some places in fairly common codepaths where __get_user() and __put_user() are being played
Re: [git pull] uaccess-related bits of vfs.git
So I should have asked earlier, but I was feeling rather busy during the early merge window.. On Sun, Apr 30, 2017 at 8:45 PM, Al Virowrote: > uaccess unification pile. It's _not_ the end of uaccess work, but > the next batch of that will go into the next cycle. This one mostly takes > copy_from_user() and friends out of arch/* and gets the zero-padding behaviour > in sync for all architectures. > Dealing with the nocache/writethrough mess is for the next cycle; > fortunately, that's x86-only. So I'm actually more interested to hear if you have any pending work on cleaning up the __get/put_user() mess we have. Is that on your radar at all? In particular, right now, both __get_user() and __put_user() are nasty and broken interfaces. It *used* to be that they were designed to just generate a single instruction. That's not what they currently do at all, due to the whole SMAP/PAN on x86 and arm. For example, on x86, right now a single __put_user() call ends up generating something like #APP # 58 "./arch/x86/include/asm/smap.h" 1 661: 662: .skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90 663: .pushsection .altinstructions,"a" .long 661b - . .long 6641f - . .word ( 9*32+20) .byte 663b-661b .byte 6651f-6641f .byte 663b-662b .popsection .pushsection .altinstr_replacement, "ax" 6641: .byte 0x0f,0x01,0xcb 6651: .popsection # 0 "" 2 #NO_APP xorl%eax, %eax # __pu_err #APP # 269 "fs/readdir.c" 1 1: movq %rcx,8(%rdi) # offset, MEM[(struct __large_struct *)_16] 2: .section .fixup,"ax" 3: mov $-14,%eax #, __pu_err jmp 2b .previous .pushsection "__ex_table","a" .balign 4 .long (1b) - . .long (3b) - . .long (ex_handler_default) - . .popsection # 0 "" 2 # 52 "./arch/x86/include/asm/smap.h" 1 661: 662: .skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90 663: .pushsection .altinstructions,"a" .long 661b - . .long 6641f - . .word ( 9*32+20) .byte 663b-661b .byte 6651f-6641f .byte 663b-662b .popsection .pushsection .altinstr_replacement, "ax" 6641: .byte 0x0f,0x01,0xca 6651: .popsection # 0 "" 2 #NO_APP and while much of that is out-of-line and in different sections, it basically means that it's insane how we inline these things. Yes, yes, the inlined part of the above horror-story ends up being just clac xor%eax,%eax mov%rcx,0x8(%rdi) stac and everything else is just boiler-plate for the alt-instruction handling and the exception handling. But even that small thing is rather debatable from a performance angle: the actual cost of __put_user() these days is not the function call, but the clac/stac (on modern x86) and the PAN bit games (on arm). So I'm actually inclined to just say "We should make __get_user/__put_user() just be aliases for the normal get_user/put_user() model". The *correct* way to do fast user space accesses when you hoist error checking outside is to use if (!access_ok(..)) goto efault; user_access_begin(); .. ... loop over or otherwise do multiple "raw" accessess ... unsafe_get/put_user(c, ptr, got_fault); unsafe_get/put_user(c, ptr, got_fault); ... user_access_end(); return 0; got_fault: user_access_end(); efault: return -EFAULT; which is more boilerplate, but ends up generating much better code. And for "unsafe_put_user()" in particular, we actually could teach gcc to use "asm goto" to really improve code generation. I have patches for that if you want to look. I'm attaching an example patch for "filldir()" that does that modern thing. It almost had the right form as-is anyway, and under some loads (eg "find") filldir actually shows up in profiles too.\ And just from a code generation standpoint, look at what the attached patch does: [torvalds@i7 linux]$ diff -u fs/readdir.s ~/readdir.s | diffstat readdir.s | 548 ++ 1 file changed, 201 insertions(+), 347 deletions(-) just from getting rid of that crazy repeated SMAP overhead. Yeah, yeah, doing diffstat on generated assembly files is all kinds of crazy, but it's an example of what kind of odd garbage we currently generate. But the *first* thing I'd like to do would be to just get rid of __get_user/__put_user as a thing. It really does generate nasty code, and we might as well just make it do that function call. Because what we do now isn't right. If we care about performance, the "__xyz_user" versions are wrong (use unsafe_xyz_user() instead). And if you don't care about performance, you should use the
Re: [git pull] uaccess-related bits of vfs.git
So I should have asked earlier, but I was feeling rather busy during the early merge window.. On Sun, Apr 30, 2017 at 8:45 PM, Al Viro wrote: > uaccess unification pile. It's _not_ the end of uaccess work, but > the next batch of that will go into the next cycle. This one mostly takes > copy_from_user() and friends out of arch/* and gets the zero-padding behaviour > in sync for all architectures. > Dealing with the nocache/writethrough mess is for the next cycle; > fortunately, that's x86-only. So I'm actually more interested to hear if you have any pending work on cleaning up the __get/put_user() mess we have. Is that on your radar at all? In particular, right now, both __get_user() and __put_user() are nasty and broken interfaces. It *used* to be that they were designed to just generate a single instruction. That's not what they currently do at all, due to the whole SMAP/PAN on x86 and arm. For example, on x86, right now a single __put_user() call ends up generating something like #APP # 58 "./arch/x86/include/asm/smap.h" 1 661: 662: .skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90 663: .pushsection .altinstructions,"a" .long 661b - . .long 6641f - . .word ( 9*32+20) .byte 663b-661b .byte 6651f-6641f .byte 663b-662b .popsection .pushsection .altinstr_replacement, "ax" 6641: .byte 0x0f,0x01,0xcb 6651: .popsection # 0 "" 2 #NO_APP xorl%eax, %eax # __pu_err #APP # 269 "fs/readdir.c" 1 1: movq %rcx,8(%rdi) # offset, MEM[(struct __large_struct *)_16] 2: .section .fixup,"ax" 3: mov $-14,%eax #, __pu_err jmp 2b .previous .pushsection "__ex_table","a" .balign 4 .long (1b) - . .long (3b) - . .long (ex_handler_default) - . .popsection # 0 "" 2 # 52 "./arch/x86/include/asm/smap.h" 1 661: 662: .skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90 663: .pushsection .altinstructions,"a" .long 661b - . .long 6641f - . .word ( 9*32+20) .byte 663b-661b .byte 6651f-6641f .byte 663b-662b .popsection .pushsection .altinstr_replacement, "ax" 6641: .byte 0x0f,0x01,0xca 6651: .popsection # 0 "" 2 #NO_APP and while much of that is out-of-line and in different sections, it basically means that it's insane how we inline these things. Yes, yes, the inlined part of the above horror-story ends up being just clac xor%eax,%eax mov%rcx,0x8(%rdi) stac and everything else is just boiler-plate for the alt-instruction handling and the exception handling. But even that small thing is rather debatable from a performance angle: the actual cost of __put_user() these days is not the function call, but the clac/stac (on modern x86) and the PAN bit games (on arm). So I'm actually inclined to just say "We should make __get_user/__put_user() just be aliases for the normal get_user/put_user() model". The *correct* way to do fast user space accesses when you hoist error checking outside is to use if (!access_ok(..)) goto efault; user_access_begin(); .. ... loop over or otherwise do multiple "raw" accessess ... unsafe_get/put_user(c, ptr, got_fault); unsafe_get/put_user(c, ptr, got_fault); ... user_access_end(); return 0; got_fault: user_access_end(); efault: return -EFAULT; which is more boilerplate, but ends up generating much better code. And for "unsafe_put_user()" in particular, we actually could teach gcc to use "asm goto" to really improve code generation. I have patches for that if you want to look. I'm attaching an example patch for "filldir()" that does that modern thing. It almost had the right form as-is anyway, and under some loads (eg "find") filldir actually shows up in profiles too.\ And just from a code generation standpoint, look at what the attached patch does: [torvalds@i7 linux]$ diff -u fs/readdir.s ~/readdir.s | diffstat readdir.s | 548 ++ 1 file changed, 201 insertions(+), 347 deletions(-) just from getting rid of that crazy repeated SMAP overhead. Yeah, yeah, doing diffstat on generated assembly files is all kinds of crazy, but it's an example of what kind of odd garbage we currently generate. But the *first* thing I'd like to do would be to just get rid of __get_user/__put_user as a thing. It really does generate nasty code, and we might as well just make it do that function call. Because what we do now isn't right. If we care about performance, the "__xyz_user" versions are wrong (use unsafe_xyz_user() instead). And if you don't care about performance, you should use the regular xyz_user() functions