Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 13, 2019 at 12:22:38PM -0700, Linus Torvalds wrote: > On Sun, Oct 13, 2019 at 12:10 PM Al Viro wrote: > > > > No arguments re put_user_ex side of things... Below is a completely > > untested patch for get_user_ex elimination (it seems to build, but that's > > it); in any case, I would really like to see comments from x86 folks > > before it goes anywhere. > > Please don't do this: > > > + if (unlikely(__copy_from_user(, usc, sizeof(sc > > + goto Efault; > > Why would you use __copy_from_user()? Just don't. > > > + if (unlikely(__copy_from_user(, user_vm86, > > + offsetof(struct vm86_struct, int_revectored > > Same here. > > There's no excuse for __copy_from_user(). FWIW, callers of __copy_from_user() remaining in the generic code: 1) regset.h:user_regset_copyin(). Switch to copy_from_user(); the calling conventions of regset ->set() (as well as the method name) are atrocious, but there are too many instances to mix any work in that direction into this series. Yes, nominally it's an inline, but IRL it's too large and has many callers in the same file(s), so any optimizations of inlining __copy_from_user() will be lost and there's more than enough work done there to make access_ok() a noise. And in this case it doesn't pay to try and lift user_access_begin() into the callers - the work done between the calls is often too non-trivial to be done in such area. The same goes for other regset.h stuff; eventually we might want to try and come up with saner API, but that's a separate story. 2) default csum_partial_copy_from_user(). What we need to do is turn it into default csum_and_copy_from_user(). This #ifndef _HAVE_ARCH_COPY_AND_CSUM_FROM_USER static inline __wsum csum_and_copy_from_user (const void __user *src, void *dst, int len, __wsum sum, int *err_ptr) { if (access_ok(src, len)) return csum_partial_copy_from_user(src, dst, len, sum, err_ptr); if (len) *err_ptr = -EFAULT; return sum; } #endif in checksum.h is the only thing that calls that sucker and we can bloody well combine them and make the users of lib/checksum.h define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER. That puts us reasonably close to having _HAVE_ARCH_COPY_AND_CSUM_FROM_USER unconditional and in any case, __copy_from_user() in lib/checksum.h turns into copy_from_user(). 3) firewire ioctl_queue_iso(). Convert to copy_from_user(), lose the access_ok() before the loop. Definitely not an unsafe_... situation (we call fw_iso_context_queue() after each chunk; _not_ something we want under user_access_begin()/user_access_end()) and it's really not worth trying to save on access_ok() checks there. 4) pstore persistent_ram_update_user(). Obvious copy_from_user(); definitely lose access_ok() in the caller (persistent_ram_write_user()), along with the one in write_pmsg() (several calls back by the callchain). 5) test_kasan: lose the function, lose the tests... 6) drivers/scsi/sg.c nest: sg_read() ones are memdup_user() in disguise (i.e. fold with immediately preceding kmalloc()s). sg_new_write() - fold with access_ok() into copy_from_user() (for both call sites). sg_write() - lose access_ok(), use copy_from_user() (both call sites) and get_user() (instead of the solitary __get_user() there). 7) i915 ones are, frankly, terrifying. Consider e.g. this one: relocs = kvmalloc_array(size, 1, GFP_KERNEL); if (!relocs) { err = -ENOMEM; goto err; } /* copy_from_user is limited to < 4GiB */ copied = 0; do { unsigned int len = min_t(u64, BIT_ULL(31), size - copied); if (__copy_from_user((char *)relocs + copied, (char __user *)urelocs + copied, len)) goto end; copied += len; } while (copied < size); Is that for real? Are they *really* trying to allocate and copy >2Gb of userland data? That's eb_copy_relocations() and that crap is itself in a loop. Sizes come from user-supplied data. WTF? It's some weird kmemdup lookalike and I'd rather heard from maintainers of that thing before doing anything with it. 8) vhost_copy_from_user(). Need comments from mst - it's been a while since I crawled through that code and I'd need his ACK anyway. The logics with positioning of access_ok() in there is non-trivial and I'm not sure how much of that serves as early input validation and how much can be taken out and replaced by use of place copy_from_user() and friends. 9) KVM. There I'm not sure that access_ok() would be the right thing to do. kvm_is_error_hva() tends to serve as the range check in that and
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Tue, Oct 15, 2019 at 08:40:12PM +0100, Al Viro wrote: > or this > static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset) > { > #ifdef ARCH_HAS_NOCACHE_UACCESS > /* > * Using non-temporal mov to improve performance on non-cached > * writes, even though we aren't actually copying from user space. > */ > __copy_from_user_inatomic_nocache(offset, entry->buf, entry->len); > #else > memcpy_toio(offset, entry->buf, entry->len); > #endif > > /* Ensure that the data is fully copied out before setting the flags > */ > wmb(); > > ntb_tx_copy_callback(entry, NULL); > } > "user" part is bollocks in both cases; moreover, I really wonder about that > ifdef in ntb one - ARCH_HAS_NOCACHE_UACCESS is x86-only *at* *the* *moment* > and it just so happens that ..._toio() doesn't require anything special on > x86. Have e.g. arm grow nocache stuff and the things will suddenly break, > won't they? Incidentally, there are two callers of __copy_from_user_inatomic_nocache() in generic code: lib/iov_iter.c:792: __copy_from_user_inatomic_nocache((to += v.iov_len) - v.iov_len, lib/iov_iter.c:849: if (__copy_from_user_inatomic_nocache((to += v.iov_len) - v.iov_len, Neither is done under under pagefault_disable(), AFAICS. This one drivers/gpu/drm/qxl/qxl_ioctl.c:189:unwritten = __copy_from_user_inatomic_nocache probably is - it has something called qxl_bo_kmap_atomic_page() called just prior, which would seem to imply kmap_atomic() somewhere in it. The same goes for drivers/gpu/drm/i915/i915_gem.c:500:unwritten = __copy_from_user_inatomic_nocache((void __force *)vaddr + offset, So we have 5 callers anywhere. Two are not "inatomic" in any sense; source is in userspace and we want nocache behaviour. Two _are_ done into a page that had been fed through kmap_atomic(); the source is, again, in userland. And the last one is complete BS - it wants memcpy_toio_nocache() and abuses this thing. Incidentally, in case of fault i915 caller ends up unmapping the page, mapping it non-atomic (with kmap?) and doing plain copy_from_user(), nocache be damned. qxl, OTOH, whines and fails all the way to userland...
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Tue, Oct 15, 2019 at 12:00:34PM -0700, Linus Torvalds wrote: > On Tue, Oct 15, 2019 at 11:08 AM Al Viro wrote: > > > > Another question: right now we have > > if (!access_ok(uaddr, sizeof(u32))) > > return -EFAULT; > > > > ret = arch_futex_atomic_op_inuser(op, oparg, , uaddr); > > if (ret) > > return ret; > > in kernel/futex.c. Would there be any objections to moving access_ok() > > inside the instances and moving pagefault_disable()/pagefault_enable() > > outside? > > I think we should remove all the "atomic" versions, and just make the > rule be that if you want atomic, you surround it with > pagefault_disable()/pagefault_enable(). Umm... I thought about that, but ended up with "it documents the intent" - pagefault_disable() might be implicit (e.g. done by kmap_atomic()) or several levels up the call chain. Not sure. > That covers not just the futex ops (where "atomic" is actually > somewhat ambiguous - the ops themselves are atomic too, so the naming > might stay, although arguably the "futex" part makes that pointless > too), but also copy_to_user_inatomic() and the powerpc version of > __get_user_inatomic(). Eh? copy_to_user_inatomic() doesn't exist; __copy_to_user_inatomic() does, but... arch/mips/kernel/unaligned.c:1307: res = __copy_to_user_inatomic(addr, fpr, sizeof(*fpr)); drivers/gpu/drm/i915/i915_gem.c:313:unwritten = __copy_to_user_inatomic(user_data, lib/test_kasan.c:510: unused = __copy_to_user_inatomic(usermem, kmem, size + 1); mm/maccess.c:98:ret = __copy_to_user_inatomic((__force void __user *)dst, src, size); these are all callers it has left anywhere and I'm certainly going to kill it. Now, __copy_from_user_inatomic() has a lot more callers left... Frankly, the messier part of API is the nocache side of things. Consider e.g. this: /* platform specific: cacheless copy */ static void cacheless_memcpy(void *dst, void *src, size_t n) { /* * Use the only available X64 cacheless copy. Add a __user cast * to quiet sparse. The src agument is already in the kernel so * there are no security issues. The extra fault recovery machinery * is not invoked. */ __copy_user_nocache(dst, (void __user *)src, n, 0); } or this static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset) { #ifdef ARCH_HAS_NOCACHE_UACCESS /* * Using non-temporal mov to improve performance on non-cached * writes, even though we aren't actually copying from user space. */ __copy_from_user_inatomic_nocache(offset, entry->buf, entry->len); #else memcpy_toio(offset, entry->buf, entry->len); #endif /* Ensure that the data is fully copied out before setting the flags */ wmb(); ntb_tx_copy_callback(entry, NULL); } "user" part is bollocks in both cases; moreover, I really wonder about that ifdef in ntb one - ARCH_HAS_NOCACHE_UACCESS is x86-only *at* *the* *moment* and it just so happens that ..._toio() doesn't require anything special on x86. Have e.g. arm grow nocache stuff and the things will suddenly break, won't they?
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Tue, Oct 15, 2019 at 11:08 AM Al Viro wrote: > > Another question: right now we have > if (!access_ok(uaddr, sizeof(u32))) > return -EFAULT; > > ret = arch_futex_atomic_op_inuser(op, oparg, , uaddr); > if (ret) > return ret; > in kernel/futex.c. Would there be any objections to moving access_ok() > inside the instances and moving pagefault_disable()/pagefault_enable() > outside? I think we should remove all the "atomic" versions, and just make the rule be that if you want atomic, you surround it with pagefault_disable()/pagefault_enable(). That covers not just the futex ops (where "atomic" is actually somewhat ambiguous - the ops themselves are atomic too, so the naming might stay, although arguably the "futex" part makes that pointless too), but also copy_to_user_inatomic() and the powerpc version of __get_user_inatomic(). So we'd aim to get rid of all the "inatomic" ones entirely. Same ultimately probably goes for the NMI versions. We should just make it be a rule that we can use all of the user access functions with pagefault_{dis,en}able() around them, and they'll be "safe" to use in atomic context. One issue with the NMI versions is that they actually want to avoid the current value of set_fs(). So copy_from_user_nmi() (at least on x86) is special in that it does if (__range_not_ok(from, n, TASK_SIZE)) return n; instead of access_ok() because of that issue. NMI also has some other issues (nmi_uaccess_okay() on x86, at least), but those *probably* could be handled at page fault time instead. Anyway, NMI is so special that I'd suggest leaving it for later, but the non-NMI atomic accesses I would suggest you clean up at the same time. I think the *only* reason we have the "inatomic()" versions is that the regular ones do that "might_fault()" testing unconditionally, and might_fault() _used_ to be just a might_sleep() - so it's not about functionality per se, it's about "we have this sanity check that we need to undo". We've already made "might_fault()" look at pagefault_disabled(), so I think a lot of the reasons for inatomic are entirely historical. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
[futex folks and linux-arch Cc'd] On Sun, Oct 13, 2019 at 08:59:49PM +0100, Al Viro wrote: > Re plotting: how strongly would you object against passing the range to > user_access_end()? Powerpc folks have a very close analogue of stac/clac, > currently buried inside their __get_user()/__put_user()/etc. - the same > places where x86 does, including futex.h and friends. > > And there it's even costlier than on x86. It would obviously be nice > to lift it at least out of unsafe_get_user()/unsafe_put_user() and > move into user_access_begin()/user_access_end(); unfortunately, in > one subarchitecture they really want it the range on the user_access_end() > side as well. That's obviously not fatal (they can bloody well save those > into thread_info at user_access_begin()), but right now we have relatively > few user_access_end() callers, so the interface changes are still possible. > > Other architectures with similar stuff are riscv (no arguments, same > as for stac/clac), arm (uaccess_save_and_enable() on the way in, > return value passed to uaccess_restore() on the way out) and s390 > (similar to arm, but there it's needed only to deal with nesting, > and I'm not sure it actually can happen). > > It would be nice to settle the API while there are not too many users > outside of arch/x86; changing it later will be a PITA and we definitely > have architectures that do potentially costly things around the userland > memory access; user_access_begin()/user_access_end() is in the right > place to try and see if they fit there... Another question: right now we have if (!access_ok(uaddr, sizeof(u32))) return -EFAULT; ret = arch_futex_atomic_op_inuser(op, oparg, , uaddr); if (ret) return ret; in kernel/futex.c. Would there be any objections to moving access_ok() inside the instances and moving pagefault_disable()/pagefault_enable() outside? Reasons: * on x86 that would allow folding access_ok() with STAC into user_access_begin(). The same would be doable on other usual suspects (arm, arm64, ppc, riscv, s390), bringing access_ok() next to their STAC counterparts. * pagefault_disable()/pagefault_enable() pair is universal on all architectures, really meant to by the nature of the beast and lifting it into kernel/futex.c would get the same situation as with futex_atomic_cmpxchg_inatomic(). Which also does access_ok() inside the primitive (also foldable into user_access_begin(), at that). * access_ok() would be closer to actual memory access (and out of the generic code). Comments?
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
Linus Torvalds writes: > On Sun, Oct 13, 2019 at 12:59 PM Al Viro wrote: >> Re plotting: how strongly would you object against passing the range to >> user_access_end()? Powerpc folks have a very close analogue of stac/clac, >> currently buried inside their __get_user()/__put_user()/etc. - the same >> places where x86 does, including futex.h and friends. >> >> And there it's even costlier than on x86. It would obviously be nice >> to lift it at least out of unsafe_get_user()/unsafe_put_user() and >> move into user_access_begin()/user_access_end(); unfortunately, in >> one subarchitecture they really want it the range on the user_access_end() >> side as well. > > Hmm. I'm ok with that. > > Do they want the actual range, or would it prefer some kind of opaque > cookie that user_access_begin() returns (where 0 would mean "failure" > of course)? The code does want the actual range, or at least the range rounded to a segment boundary (256MB). But it can get that already from a value it stashes in current->thread, it was just more natural to pass the addr/size with the way the code is currently structured. It seems to generate slightly better code to pass addr/size vs loading it from current->thread, but it's probably in the noise vs everything else that's going on. So a cookie would work fine, we could return the encoded addr/size in the cookie and that might generate better code than loading it back from current->thread. Equally we could just use the value in current->thread and not have any cookie at all. cheers
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 13, 2019 at 12:59 PM Al Viro wrote: > > Re plotting: how strongly would you object against passing the range to > user_access_end()? Powerpc folks have a very close analogue of stac/clac, > currently buried inside their __get_user()/__put_user()/etc. - the same > places where x86 does, including futex.h and friends. > > And there it's even costlier than on x86. It would obviously be nice > to lift it at least out of unsafe_get_user()/unsafe_put_user() and > move into user_access_begin()/user_access_end(); unfortunately, in > one subarchitecture they really want it the range on the user_access_end() > side as well. Hmm. I'm ok with that. Do they want the actual range, or would it prefer some kind of opaque cookie that user_access_begin() returns (where 0 would mean "failure" of course)? I'm thinking like a local_irq_save/restore thing, which might be the case on yet other architectures. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 13, 2019 at 12:22:38PM -0700, Linus Torvalds wrote: > On Sun, Oct 13, 2019 at 12:10 PM Al Viro wrote: > > > > No arguments re put_user_ex side of things... Below is a completely > > untested patch for get_user_ex elimination (it seems to build, but that's > > it); in any case, I would really like to see comments from x86 folks > > before it goes anywhere. > > Please don't do this: > > > + if (unlikely(__copy_from_user(, usc, sizeof(sc > > + goto Efault; > > Why would you use __copy_from_user()? Just don't. > > > + if (unlikely(__copy_from_user(, user_vm86, > > + offsetof(struct vm86_struct, int_revectored > > Same here. > > There's no excuse for __copy_from_user(). Probably... Said that, vm86 one is preceded by if (!access_ok(user_vm86, plus ? sizeof(struct vm86_struct) : sizeof(struct vm86plus_struct))) return -EFAULT; so I didn't want to bother. We'll need to eliminate most of access_ok() anyway, and I figured that conversion to plain copy_from_user() would go there as well. Again, this is not a patch submission - just an illustration of what I meant re getting rid of get_user_ex(). IOW, the whole thing is still in the plotting stage. Re plotting: how strongly would you object against passing the range to user_access_end()? Powerpc folks have a very close analogue of stac/clac, currently buried inside their __get_user()/__put_user()/etc. - the same places where x86 does, including futex.h and friends. And there it's even costlier than on x86. It would obviously be nice to lift it at least out of unsafe_get_user()/unsafe_put_user() and move into user_access_begin()/user_access_end(); unfortunately, in one subarchitecture they really want it the range on the user_access_end() side as well. That's obviously not fatal (they can bloody well save those into thread_info at user_access_begin()), but right now we have relatively few user_access_end() callers, so the interface changes are still possible. Other architectures with similar stuff are riscv (no arguments, same as for stac/clac), arm (uaccess_save_and_enable() on the way in, return value passed to uaccess_restore() on the way out) and s390 (similar to arm, but there it's needed only to deal with nesting, and I'm not sure it actually can happen). It would be nice to settle the API while there are not too many users outside of arch/x86; changing it later will be a PITA and we definitely have architectures that do potentially costly things around the userland memory access; user_access_begin()/user_access_end() is in the right place to try and see if they fit there...
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 13, 2019 at 12:10 PM Al Viro wrote: > > No arguments re put_user_ex side of things... Below is a completely > untested patch for get_user_ex elimination (it seems to build, but that's > it); in any case, I would really like to see comments from x86 folks > before it goes anywhere. Please don't do this: > + if (unlikely(__copy_from_user(, usc, sizeof(sc > + goto Efault; Why would you use __copy_from_user()? Just don't. > + if (unlikely(__copy_from_user(, user_vm86, > + offsetof(struct vm86_struct, int_revectored Same here. There's no excuse for __copy_from_user(). Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 13, 2019 at 11:43:57AM -0700, Linus Torvalds wrote: > On Sun, Oct 13, 2019 at 11:13 AM Al Viro wrote: > > > > Umm... TBH, I wonder if we would be better off if restore_sigcontext() > > (i.e. sigreturn()/rt_sigreturn()) would flat-out copy_from_user() the > > entire[*] struct sigcontext into a local variable and then copied fields > > to pt_regs... > > Probably ok., We've generally tried to avoid state that big on the > stack, but you're right that it's shallow. > > > Same for do_sys_vm86(), perhaps. > > > > And these (32bit and 64bit restore_sigcontext() and do_sys_vm86()) > > are the only get_user_ex() users anywhere... > > Yeah, that sounds like a solid strategy for getting rid of them. > > Particularly since we can't really make get_user_ex() generate > particularly good code (at least for now). > > Now, put_user_ex() is a different thing - converting it to > unsafe_put_user() actually does make it generate very good code - much > better than copying data twice. No arguments re put_user_ex side of things... Below is a completely untested patch for get_user_ex elimination (it seems to build, but that's it); in any case, I would really like to see comments from x86 folks before it goes anywhere. diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c index 1cee10091b9f..28a32ccc32de 100644 --- a/arch/x86/ia32/ia32_signal.c +++ b/arch/x86/ia32/ia32_signal.c @@ -46,59 +46,38 @@ #define get_user_seg(seg) ({ unsigned int v; savesegment(seg, v); v; }) #define set_user_seg(seg, v) loadsegment_##seg(v) -#define COPY(x){ \ - get_user_ex(regs->x, >x); \ -} - -#define GET_SEG(seg) ({ \ - unsigned short tmp; \ - get_user_ex(tmp, >seg); \ - tmp;\ -}) +#define COPY(x) regs->x = sc.x -#define COPY_SEG_CPL3(seg) do {\ - regs->seg = GET_SEG(seg) | 3; \ -} while (0) +#define COPY_SEG_CPL3(seg) regs->seg = sc.seg | 3 #define RELOAD_SEG(seg){ \ - unsigned int pre = (seg) | 3; \ + unsigned int pre = sc.seg | 3; \ unsigned int cur = get_user_seg(seg); \ if (pre != cur) \ set_user_seg(seg, pre); \ } static int ia32_restore_sigcontext(struct pt_regs *regs, - struct sigcontext_32 __user *sc) + struct sigcontext_32 __user *usc) { - unsigned int tmpflags, err = 0; - u16 gs, fs, es, ds; - void __user *buf; - u32 tmp; + struct sigcontext_32 sc; /* Always make any pending restarted system calls return -EINTR */ current->restart_block.fn = do_no_restart_syscall; - get_user_try { - gs = GET_SEG(gs); - fs = GET_SEG(fs); - ds = GET_SEG(ds); - es = GET_SEG(es); - - COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx); - COPY(dx); COPY(cx); COPY(ip); COPY(ax); - /* Don't touch extended registers */ + if (unlikely(__copy_from_user(, usc, sizeof(sc + goto Efault; - COPY_SEG_CPL3(cs); - COPY_SEG_CPL3(ss); + COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx); + COPY(dx); COPY(cx); COPY(ip); COPY(ax); + /* Don't touch extended registers */ - get_user_ex(tmpflags, >flags); - regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS); - /* disable syscall checks */ - regs->orig_ax = -1; + COPY_SEG_CPL3(cs); + COPY_SEG_CPL3(ss); - get_user_ex(tmp, >fpstate); - buf = compat_ptr(tmp); - } get_user_catch(err); + regs->flags = (regs->flags & ~FIX_EFLAGS) | (sc.flags & FIX_EFLAGS); + /* disable syscall checks */ + regs->orig_ax = -1; /* * Reload fs and gs if they have changed in the signal @@ -111,11 +90,15 @@ static int ia32_restore_sigcontext(struct pt_regs *regs, RELOAD_SEG(ds); RELOAD_SEG(es); - err |= fpu__restore_sig(buf, 1); + if (unlikely(fpu__restore_sig(compat_ptr(sc.fpstate), 1))) + goto Efault; force_iret(); + return 0; - return err; +Efault: + force_iret(); + return -EFAULT; } asmlinkage long sys32_sigreturn(void) diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 61d93f062a36..ac81f06f8358 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -335,12 +335,9 @@ do { \ "i" (errret), "0" (retval)); \ }) -#define
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 13, 2019 at 11:13 AM Al Viro wrote: > > Umm... TBH, I wonder if we would be better off if restore_sigcontext() > (i.e. sigreturn()/rt_sigreturn()) would flat-out copy_from_user() the > entire[*] struct sigcontext into a local variable and then copied fields > to pt_regs... Probably ok., We've generally tried to avoid state that big on the stack, but you're right that it's shallow. > Same for do_sys_vm86(), perhaps. > > And these (32bit and 64bit restore_sigcontext() and do_sys_vm86()) > are the only get_user_ex() users anywhere... Yeah, that sounds like a solid strategy for getting rid of them. Particularly since we can't really make get_user_ex() generate particularly good code (at least for now). Now, put_user_ex() is a different thing - converting it to unsafe_put_user() actually does make it generate very good code - much better than copying data twice. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Thu, Oct 10, 2019 at 05:31:13PM -0700, Linus Torvalds wrote: > So the code actually needs to properly return the error early, or > initialize the segments that didn't get loaded to 0, or something. > > And when I posted that, Luto said "just get rid of the get_user_ex() > entirely, instead of changing semantics of the existing ones to be > sane. > > Which is probably right. There aren't that many. > > I *thought* there were also cases of us doing some questionably things > inside the get_user_try sections, but those seem to have gotten fixed > already independently, so it's really just the "make try/catch really > try/catch" change that needs some editing of our current broken stuff > that depends on it not actually *catching* exceptions, but on just > continuing on to the next one. Umm... TBH, I wonder if we would be better off if restore_sigcontext() (i.e. sigreturn()/rt_sigreturn()) would flat-out copy_from_user() the entire[*] struct sigcontext into a local variable and then copied fields to pt_regs... The thing is small enough for not blowing the stack (256 bytes max. and it's on a shallow stack) and big enough to make "fancy memcpy + let the compiler think how to combine in-kernel copies" potentially better than hardwired sequence of 64bit loads/stores... [*] OK, sans ->reserved part in the very end on 64bit. 192 bytes to copy. Same for do_sys_vm86(), perhaps - we want regs/flags/cpu_type and screen_bitmap there, i.e. the beginning of struct vm86plus_struct and of struct vm86_struct... 24*32bit. IOW, 96-byte memcpy + gcc-visible field-by-field copying vs. hardwired sequence of 32bit loads (with some 16bit ones thrown in, for extra fun) and compiler told not to reorder anything. And these (32bit and 64bit restore_sigcontext() and do_sys_vm86()) are the only get_user_ex() users anywhere...
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Thu, Oct 10, 2019 at 5:11 PM Al Viro wrote: > > On Thu, Oct 10, 2019 at 03:12:49PM -0700, Linus Torvalds wrote: > > > But I've not gotten around to rewriting those disgusting sequences to > > the unsafe_get/put_user() model. I did look at it, and it requires > > some changes exactly *because* the _ex() functions are broken and > > continue, but also because the current code ends up also doing other > > things inside the try/catch region that you're not supposed to do in a > > user_access_begin/end() region . > > Hmm... Which one was that? AFAICS, we have > do_sys_vm86: only get_user_ex() > restore_sigcontext(): get_user_ex(), set_user_gs() > ia32_restore_sigcontext(): get_user_ex() Try this patch. It works fine (well, it worked fine the lastr time I tried this, I might have screwed something up just now: I re-created the patch since I hadn't saved it). It's nice and clean, and does 1 file changed, 9 insertions(+), 91 deletions(-) by just deleting all the nasty *_ex() macros entirely, replacing them with unsafe_get/put_user() calls. And now those try/catch regions actually work like try/catch regions, and a fault branches to the catch. BUT. It does change semantics, and you get warnings like arch/x86/ia32/ia32_signal.c: In function ‘ia32_restore_sigcontext’: arch/x86/ia32/ia32_signal.c:114:9: warning: ‘buf’ may be used uninitialized in this function [-Wmaybe-uninitialized] 114 | err |= fpu__restore_sig(buf, 1); | ^~~~ arch/x86/ia32/ia32_signal.c:64:27: warning: ‘ds’ may be used uninitialized in this function [-Wmaybe-uninitialized] 64 | unsigned int pre = (seg) | 3; \ | ^ arch/x86/ia32/ia32_signal.c:74:18: note: ‘ds’ was declared here ... arch/x86/kernel/signal.c: In function ‘restore_sigcontext’: arch/x86/kernel/signal.c:152:9: warning: ‘buf’ may be used uninitialized in this function [-Wmaybe-uninitialized] 152 | err |= fpu__restore_sig(buf, IS_ENABLED(CONFIG_X86_32)); | ^~~~ because it's true: those things reall may not be initialized, because the catch thing could have jumped out. So the code actually needs to properly return the error early, or initialize the segments that didn't get loaded to 0, or something. And when I posted that, Luto said "just get rid of the get_user_ex() entirely, instead of changing semantics of the existing ones to be sane. Which is probably right. There aren't that many. I *thought* there were also cases of us doing some questionably things inside the get_user_try sections, but those seem to have gotten fixed already independently, so it's really just the "make try/catch really try/catch" change that needs some editing of our current broken stuff that depends on it not actually *catching* exceptions, but on just continuing on to the next one. Linus arch/x86/include/asm/uaccess.h | 100 - 1 file changed, 9 insertions(+), 91 deletions(-) diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 61d93f062a36..e87d8911dc53 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -193,23 +193,12 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL)) : : "A" (x), "r" (addr) \ : : label) -#define __put_user_asm_ex_u64(x, addr) \ - asm volatile("\n" \ - "1: movl %%eax,0(%1)\n" \ - "2: movl %%edx,4(%1)\n" \ - "3:" \ - _ASM_EXTABLE_EX(1b, 2b)\ - _ASM_EXTABLE_EX(2b, 3b)\ - : : "A" (x), "r" (addr)) - #define __put_user_x8(x, ptr, __ret_pu)\ asm volatile("call __put_user_8" : "=a" (__ret_pu) \ : "A" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx") #else #define __put_user_goto_u64(x, ptr, label) \ __put_user_goto(x, ptr, "q", "", "er", label) -#define __put_user_asm_ex_u64(x, addr) \ - __put_user_asm_ex(x, addr, "q", "", "er") #define __put_user_x8(x, ptr, __ret_pu) __put_user_x(8, x, ptr, __ret_pu) #endif @@ -289,31 +278,6 @@ do { \ }\ } while (0) -/* - * This doesn't do __uaccess_begin/end - the exception handling - * around it must do that. - */ -#define __put_user_size_ex(x, ptr, size)\ -do { \ - __chk_user_ptr(ptr); \ - switch (size) { \ - case 1:\ - __put_user_asm_ex(x, ptr, "b", "b", "iq"); \ - break; \ - case 2:\ - __put_user_asm_ex(x, ptr, "w", "w", "ir"); \ - break; \ - case 4:\ - __put_user_asm_ex(x, ptr, "l", "k", "ir"); \ - break; \ - case 8:\ - __put_user_asm_ex_u64((__typeof__(*ptr))(x), ptr); \ - break; \ - default: \ - __put_user_bad(); \ - }\ -} while (0) - #ifdef CONFIG_X86_32 #define __get_user_asm_u64(x, ptr, retval, errret) \ ({ \ @@ -334,13 +298,9 @@ do { \ : "m" (__m(__ptr)), "m"
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Thu, Oct 10, 2019 at 03:12:49PM -0700, Linus Torvalds wrote: > On Thu, Oct 10, 2019 at 12:55 PM Al Viro wrote: > > > > Anyway, another question you way: what do you think of try/catch approaches > > to __get_user() blocks, like e.g. restore_sigcontext() is doing? > > I'd rather have them converted to our unsafe_get/put_user() instead. > > We don't generate great code for the "get" case (because of how gcc > doesn't allow us to mix "asm goto" and outputs), but I really despise > the x86-specific "{get,put}_user_ex()" machinery. It's not actually > doing a real try/catch at all, and will just keep taking faults if one > happens. > > But I've not gotten around to rewriting those disgusting sequences to > the unsafe_get/put_user() model. I did look at it, and it requires > some changes exactly *because* the _ex() functions are broken and > continue, but also because the current code ends up also doing other > things inside the try/catch region that you're not supposed to do in a > user_access_begin/end() region . Hmm... Which one was that? AFAICS, we have do_sys_vm86: only get_user_ex() restore_sigcontext(): get_user_ex(), set_user_gs() ia32_restore_sigcontext(): get_user_ex() So at least get_user_try/get_user_ex/get_user_catch should be killable. The other side... save_v86_state(): put_user_ex() setup_sigcontext(): put_user_ex() __setup_rt_frame(): put_user_ex(), static_cpu_has() another one in __setup_rt_frame(): put_user_ex() x32_setup_rt_frame(): put_user_ex() ia32_setup_sigcontext(): put_user_ex() ia32_setup_frame(): put_user_ex() another one in ia32_setup_frame(): put_user_ex(), static_cpu_has() IDGI... Is static_cpu_has() not allowed in there? Looks like it's all inlines and doesn't do any potentially risky memory accesses... What am I missing? As for the try/catch model... How about if (!user_access_begin()) sod off ... unsafe_get_user(..., l); ... unsafe_get_user_nojump(); ... unsafe_get_user_nojump(); ... if (user_access_did_fail()) goto l; user_access_end() ... return 0; l: ... user_access_end() return -EFAULT; making it clear that we are delaying the check for failures until it's more convenient. And *not* trying to trick C parser into enforcing anything - let objtool do it and to hell with do { and } while (0) in magic macros. Could be mixed with the normal unsafe_..._user() without any problems, AFAICS...
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Thu, Oct 10, 2019 at 12:55 PM Al Viro wrote: > > Anyway, another question you way: what do you think of try/catch approaches > to __get_user() blocks, like e.g. restore_sigcontext() is doing? I'd rather have them converted to our unsafe_get/put_user() instead. We don't generate great code for the "get" case (because of how gcc doesn't allow us to mix "asm goto" and outputs), but I really despise the x86-specific "{get,put}_user_ex()" machinery. It's not actually doing a real try/catch at all, and will just keep taking faults if one happens. But I've not gotten around to rewriting those disgusting sequences to the unsafe_get/put_user() model. I did look at it, and it requires some changes exactly *because* the _ex() functions are broken and continue, but also because the current code ends up also doing other things inside the try/catch region that you're not supposed to do in a user_access_begin/end() region . > Should that be available outside of arch/*? For that matter, would > it be a good idea to convert get_user_ex() users in arch/x86 to > unsafe_get_user()? See above: yes, it would be a good idea to convert to unsafe_get/put_user(), and no, we don't want to expose the horrid *_ex() model to other architectures. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 07, 2019 at 09:24:17PM -0700, Linus Torvalds wrote: > On Mon, Oct 7, 2019 at 9:09 PM Linus Torvalds > wrote: > > > > Try the attached patch, and then count the number of "rorx" > > instructions in the kernel. Hint: not many. On my personal config, > > this triggers 15 times in the whole kernel build (not counting > > modules). > > .. and four of them are in perf_callchain_user(), and are due to those > "__copy_from_user_nmi()" with either 4-byte or 8-byte copies. > > It might as well just use __get_user() instead. > > The point being that the silly code in the header files is just > pointless. We shouldn't do it. FWIW, the one that looks the most potentiall sensitive in that bunch is arch/x86/kvm/paging_tmpl.h:388: if (unlikely(__copy_from_user(, ptep_user, sizeof(pte in the bowels of KVM page fault handling. I would be very surprised if the rest would be detectable... Anyway, another question you way: what do you think of try/catch approaches to __get_user() blocks, like e.g. restore_sigcontext() is doing? Should that be available outside of arch/*? For that matter, would it be a good idea to convert get_user_ex() users in arch/x86 to unsafe_get_user()?
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Tue, Oct 08, 2019 at 08:58:58PM +0100, Al Viro wrote: > The difference is, they have separate "for read" and "for write" primitives > and they want the range in their user_access_end() analogue. Separating > the read and write isn't a problem for callers (we want them close to > the actual memory accesses). Passing the range to user_access_end() just > might be tolerable, unless it makes you throw up... NOTE: I'm *NOT* suggesting to bring back the VERIFY_READ/VERIFY_WRITE argument to access_ok(). We'd gotten rid of it, and for a very good reason (and decades overdue). The main difference between access_ok() and user_access_begin() is that the latter is right next to actual memory access, with user_access_end() on the other side, also very close. And most of those guys would be concentrated in a few functions, where we bloody well know which direction we are copying. Even if we try and map ppc allow_..._to_user() on user_access_begin(), access_ok() remains as it is (and I hope we'll get rid of the majority of its caller in process).
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Tue, Oct 08, 2019 at 08:58:58PM +0100, Al Viro wrote: > That's powerpc. And while the constant-sized bits are probably pretty > useless there as well, note the > allow_read_from_user()/prevent_read_from_user() > part. Looks suspiciously similar to user_access_begin()/user_access_end()... > > The difference is, they have separate "for read" and "for write" primitives > and they want the range in their user_access_end() analogue. Separating > the read and write isn't a problem for callers (we want them close to > the actual memory accesses). Passing the range to user_access_end() just > might be tolerable, unless it makes you throw up... BTW, another related cleanup is futex_atomic_op_inuser() and arch_futex_atomic_op_inuser(). In the former we have if (!access_ok(uaddr, sizeof(u32))) return -EFAULT; ret = arch_futex_atomic_op_inuser(op, oparg, , uaddr); if (ret) return ret; and in the latter we've got STAC/CLAC pairs stuck into inlined bits on x86. As well as allow_write_to_user(uaddr, sizeof(*uaddr)) on ppc... I don't see anything in x86 one objtool would've barfed if we pulled STAC/CLAC out and turned access_ok() into user_access_begin(), with matching user_access_end() right after the call of arch_futex_atomic_op_inuser(). Everything is inlined there and no scary memory accesses would get into the scope (well, we do have if (!ret) *oval = oldval; in the very end of arch_futex_atomic_op_inuser() there, but oval is the address of a local variable in the sole caller; if we run with kernel stack on ring 3 page, we are deeply fucked *and* wouldn't have survived that far into futex_atomic_op_inuser() anyway ;-)
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 07, 2019 at 11:26:35AM -0700, Linus Torvalds wrote: > The good news is that right now x86 is the only architecture that does > that user_access_begin(), so we don't need to worry about anything > else. Apparently the ARM people haven't had enough performance > problems with the PAN bit for them to care. Take a look at this: static inline unsigned long raw_copy_from_user(void *to, const void __user *from, unsigned long n) { unsigned long ret; if (__builtin_constant_p(n) && (n <= 8)) { ret = 1; switch (n) { case 1: barrier_nospec(); __get_user_size(*(u8 *)to, from, 1, ret); break; case 2: barrier_nospec(); __get_user_size(*(u16 *)to, from, 2, ret); break; case 4: barrier_nospec(); __get_user_size(*(u32 *)to, from, 4, ret); break; case 8: barrier_nospec(); __get_user_size(*(u64 *)to, from, 8, ret); break; } if (ret == 0) return 0; } barrier_nospec(); allow_read_from_user(from, n); ret = __copy_tofrom_user((__force void __user *)to, from, n); prevent_read_from_user(from, n); return ret; } That's powerpc. And while the constant-sized bits are probably pretty useless there as well, note the allow_read_from_user()/prevent_read_from_user() part. Looks suspiciously similar to user_access_begin()/user_access_end()... The difference is, they have separate "for read" and "for write" primitives and they want the range in their user_access_end() analogue. Separating the read and write isn't a problem for callers (we want them close to the actual memory accesses). Passing the range to user_access_end() just might be tolerable, unless it makes you throw up...
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Tue, Oct 08, 2019 at 05:38:31PM +0200, Greg KH wrote: > On Tue, Oct 08, 2019 at 04:29:00PM +0100, Al Viro wrote: > > On Tue, Oct 08, 2019 at 03:14:16PM +0200, Greg KH wrote: > > > On Tue, Oct 08, 2019 at 05:57:12AM +0100, Al Viro wrote: > > > > > > > > OK... BTW, do you agree that the use of access_ok() in > > > > drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong? It's used as an early > > > > cutoff, so we don't bother waiting if user has passed an obviously bogus > > > > address. copy_to_user() is used for actual copying there... > > > > > > Yes, it's wrong, and not needed. I'll go rip it out unless you want to? > > > > I'll throw it into misc queue for now; it has no prereqs and nothing is > > going > > to depend upon it. > > Great, thanks. > > > While looking for more of the same pattern: usb_device_read(). Frankly, > > usb_device_dump() calling conventions look ugly - it smells like it > > would be much happier as seq_file. Iterator would take some massage, > > but that seems to be doable. Anyway, that's a separate story... > > That's just a debugfs file, and yes, it should be moved to seq_file. I > think I tried it a long time ago, but given it's just a debugging thing, > I gave up as it wasn't worth it. > > But yes, the access_ok() there also seems odd, and should be dropped. I'm almost tempted to keep it there as a reminder/grep fodder ;-) Seriously, though, it might be useful to have a way of marking the places in need of gentle repair of retrocranial inversions _without_ attracting the "checkpatch warning of the week" crowd...
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Tue, Oct 08, 2019 at 04:29:00PM +0100, Al Viro wrote: > On Tue, Oct 08, 2019 at 03:14:16PM +0200, Greg KH wrote: > > On Tue, Oct 08, 2019 at 05:57:12AM +0100, Al Viro wrote: > > > > > > OK... BTW, do you agree that the use of access_ok() in > > > drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong? It's used as an early > > > cutoff, so we don't bother waiting if user has passed an obviously bogus > > > address. copy_to_user() is used for actual copying there... > > > > Yes, it's wrong, and not needed. I'll go rip it out unless you want to? > > I'll throw it into misc queue for now; it has no prereqs and nothing is going > to depend upon it. Great, thanks. > While looking for more of the same pattern: usb_device_read(). Frankly, > usb_device_dump() calling conventions look ugly - it smells like it > would be much happier as seq_file. Iterator would take some massage, > but that seems to be doable. Anyway, that's a separate story... That's just a debugfs file, and yes, it should be moved to seq_file. I think I tried it a long time ago, but given it's just a debugging thing, I gave up as it wasn't worth it. But yes, the access_ok() there also seems odd, and should be dropped. thanks, greg k-h
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Tue, Oct 08, 2019 at 03:14:16PM +0200, Greg KH wrote: > On Tue, Oct 08, 2019 at 05:57:12AM +0100, Al Viro wrote: > > > > OK... BTW, do you agree that the use of access_ok() in > > drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong? It's used as an early > > cutoff, so we don't bother waiting if user has passed an obviously bogus > > address. copy_to_user() is used for actual copying there... > > Yes, it's wrong, and not needed. I'll go rip it out unless you want to? I'll throw it into misc queue for now; it has no prereqs and nothing is going to depend upon it. While looking for more of the same pattern: usb_device_read(). Frankly, usb_device_dump() calling conventions look ugly - it smells like it would be much happier as seq_file. Iterator would take some massage, but that seems to be doable. Anyway, that's a separate story...
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Tue, Oct 08, 2019 at 05:57:12AM +0100, Al Viro wrote: > > OK... BTW, do you agree that the use of access_ok() in > drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong? It's used as an early > cutoff, so we don't bother waiting if user has passed an obviously bogus > address. copy_to_user() is used for actual copying there... Yes, it's wrong, and not needed. I'll go rip it out unless you want to? thanks, greg k-h
RE: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
From: Linus Torvalds > Sent: 07 October 2019 19:11 ... > I've been very close to just removing __get_user/__put_user several > times, exactly because people do completely the wrong thing with them > - not speeding code up, but making it unsafe and buggy. They could do the very simple check that 'user_ptr+size < kernel_base' rather than the full window check under the assumption that access_ok() has been called and that the likely errors are just overruns. > The new "user_access_begin/end()" model is much better, but it also > has actual STATIC checking that there are no function calls etc inside > the region, so it forces you to do the loop properly and tightly, and > not the incorrect "I checked the range somewhere else, now I'm doing > an unsafe copy". > > And it actually speeds things up, unlike the access_ok() games. I've code that does: if (!access_ok(...)) return -EFAULT; ... for (...) { if (__get_user(tmp_u64, user_ptr++)) return -EFAULT; writeq(tmp_u64, io_ptr++); } (Although the code is more complex because not all transfers are multiples of 8 bytes.) With user_access_begin/end() I'd probably want to put the copy loop inside a function (which will probably get inlined) to avoid convoluted error processing. So you end up with: if (!user_access_ok()) return _EFAULT; user_access_begin(); rval = do_copy_code(...); user_access_end(); return rval; Which, at the source level (at least) breaks your 'no function calls' rule. The writeq() might also break it as well. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Tue, Oct 8, 2019 at 1:30 AM Guenter Roeck wrote: > m68k: > > c2p_iplan2.c:(.text+0x98): undefined reference to `c2p_unsupported' > > I don't know the status. Fall-out from the (non)inline optimization. Patch available: https://lore.kernel.org/lkml/20190927094708.11563-1-ge...@linux-m68k.org/ Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 07, 2019 at 09:14:51PM -0700, Linus Torvalds wrote: > On Mon, Oct 7, 2019 at 9:09 PM Linus Torvalds > wrote: > > > > Try the attached patch, and then count the number of "rorx" > > instructions in the kernel. Hint: not many. On my personal config, > > this triggers 15 times in the whole kernel build (not counting > > modules). > > So here's a serious patch that doesn't just mark things for counting - > it just removes the cases entirely. > > Doesn't this look nice: > > 2 files changed, 2 insertions(+), 133 deletions(-) > > and it is one less thing to worry about when doing further cleanup. > > Seriously, if any of those __copy_{to,from}_user() constant cases were > a big deal, we can turn them into get_user/put_user calls. But only > after they show up as an actual performance issue. Makes sense. I'm not arguing against doing that. Moreover, I suspect that other architectures will be similar, at least once the sigframe-related code for given architecture is dealt with. But that's more of a "let's look at that later" thing (hopefully with maintainers of architectures getting involved).
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 07, 2019 at 09:09:14PM -0700, Linus Torvalds wrote: > > 1) cross-architecture user_access_begin_dont_use(): on everything > > except x86 it's empty, on x86 - __uaccess_begin_nospec(). > > No, just do a proper range check, and use user_access_begin() > > Stop trying to optimize that range check away. It's a couple of fast > instructions. > > The only ones who don't want the range check are the actual kernel > copy ones, but they don't want the user_access_begin() either. Not at the first step. Sure, in the end we want exactly that, and we want it ASAP. However, the main reason it grows into a tangled mess that would be over the top for this cycle is the impacts in arseload of places all over arch/*. That way we can untangle those. The initial segment that would allow to use raw_copy_to_user() cleanly in readdir.c et.al. could be done with provably zero impact on anything in arch/* outside of arch/x86 usercopy-related code. Moreover, it will be fairly small. And after that the rest can be done in any order, independent from each other. I want to kill __copy_... completely, and I believe we'll be able to do just that in a cycle or two. Once that is done, the helper disappears along with __copy_...(). And it will be documented as a temporary kludge, don't use anywhere else, no matter what from the very beginning. For all the couple of cycles it'll take. I'm serious about getting rid of __copy_...() in that timeframe. There's not that much left. The reason I don't want to do a blanket search-and-replace turning them all into copy_...() is simply that their use is a good indicator of code in need of serious beating^Wamount of careful review. And hell, we might end up doing just that on case-by-case basis. Often enough we will, by what I'd seen there... Again, this kluge is only a splitup aid - by the end of the series it's gone. All it allows is to keep it easier to review. Note, BTW, that bits and pieces converting a given pointless use of __copy_...() to copy_...() can be reordered freely at any point of the sequence - I've already got several. _Some_ of (5) will be conversions a-la readdir.c one and that has to follow (4), but most of it won't be like that. > > void *copy_mount_options(const void __user * data) > > { > > unsigned offs, size; > > char *copy; > > > > if (!data) > > return NULL; > > > > copy = kmalloc(PAGE_SIZE, GFP_KERNEL); > > if (!copy) > > return ERR_PTR(-ENOMEM); > > > > offs = (unsigned long)untagged_addr(data) & (PAGE_SIZE - 1); > > > > if (copy_from_user(copy, data, PAGE_SIZE - offs)) { > > kfree(copy); > > return ERR_PTR(-EFAULT); > > } > > if (offs) { > > if (copy_from_user(copy, data + PAGE_SIZE - offs, offs)) > > memset(copy + PAGE_SIZE - offs, 0, offs); > > } > > return copy; > > } > > > > on the theory that any fault halfway through a page means a race with > > munmap/mprotect/etc. and we can just pretend we'd lost the race entirely. > > And to hell with exact_copy_from_user(), byte-by-byte copying, etc. > > Looks reasonable. OK... BTW, do you agree that the use of access_ok() in drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong? It's used as an early cutoff, so we don't bother waiting if user has passed an obviously bogus address. copy_to_user() is used for actual copying there... There are other places following that pattern and IMO they are all wrong. Another variety is a half-arsed filter trying to prevent warnings from too large (and user-controllable) kmalloc() of buffer we'll be copying to. Which is worth very little, since kmalloc() will scream and fail well before access_ok() limits will trip. Those need explicit capping of the size, IMO...
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 7, 2019 at 9:09 PM Linus Torvalds wrote: > > Try the attached patch, and then count the number of "rorx" > instructions in the kernel. Hint: not many. On my personal config, > this triggers 15 times in the whole kernel build (not counting > modules). .. and four of them are in perf_callchain_user(), and are due to those "__copy_from_user_nmi()" with either 4-byte or 8-byte copies. It might as well just use __get_user() instead. The point being that the silly code in the header files is just pointless. We shouldn't do it. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 7, 2019 at 9:09 PM Linus Torvalds wrote: > > Try the attached patch, and then count the number of "rorx" > instructions in the kernel. Hint: not many. On my personal config, > this triggers 15 times in the whole kernel build (not counting > modules). So here's a serious patch that doesn't just mark things for counting - it just removes the cases entirely. Doesn't this look nice: 2 files changed, 2 insertions(+), 133 deletions(-) and it is one less thing to worry about when doing further cleanup. Seriously, if any of those __copy_{to,from}_user() constant cases were a big deal, we can turn them into get_user/put_user calls. But only after they show up as an actual performance issue. Linus arch/x86/include/asm/uaccess_32.h | 27 -- arch/x86/include/asm/uaccess_64.h | 108 +- 2 files changed, 2 insertions(+), 133 deletions(-) diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h index ba2dc1930630..388a40660c7b 100644 --- a/arch/x86/include/asm/uaccess_32.h +++ b/arch/x86/include/asm/uaccess_32.h @@ -23,33 +23,6 @@ raw_copy_to_user(void __user *to, const void *from, unsigned long n) static __always_inline unsigned long raw_copy_from_user(void *to, const void __user *from, unsigned long n) { - if (__builtin_constant_p(n)) { - unsigned long ret; - - switch (n) { - case 1: - ret = 0; - __uaccess_begin_nospec(); - __get_user_asm_nozero(*(u8 *)to, from, ret, - "b", "b", "=q", 1); - __uaccess_end(); - return ret; - case 2: - ret = 0; - __uaccess_begin_nospec(); - __get_user_asm_nozero(*(u16 *)to, from, ret, - "w", "w", "=r", 2); - __uaccess_end(); - return ret; - case 4: - ret = 0; - __uaccess_begin_nospec(); - __get_user_asm_nozero(*(u32 *)to, from, ret, - "l", "k", "=r", 4); - __uaccess_end(); - return ret; - } - } return __copy_user_ll(to, (__force const void *)from, n); } diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h index 5cd1caa8bc65..bc10e3dc64fe 100644 --- a/arch/x86/include/asm/uaccess_64.h +++ b/arch/x86/include/asm/uaccess_64.h @@ -65,117 +65,13 @@ copy_to_user_mcsafe(void *to, const void *from, unsigned len) static __always_inline __must_check unsigned long raw_copy_from_user(void *dst, const void __user *src, unsigned long size) { - int ret = 0; - - if (!__builtin_constant_p(size)) - return copy_user_generic(dst, (__force void *)src, size); - switch (size) { - case 1: - __uaccess_begin_nospec(); - __get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src, - ret, "b", "b", "=q", 1); - __uaccess_end(); - return ret; - case 2: - __uaccess_begin_nospec(); - __get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src, - ret, "w", "w", "=r", 2); - __uaccess_end(); - return ret; - case 4: - __uaccess_begin_nospec(); - __get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src, - ret, "l", "k", "=r", 4); - __uaccess_end(); - return ret; - case 8: - __uaccess_begin_nospec(); - __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src, - ret, "q", "", "=r", 8); - __uaccess_end(); - return ret; - case 10: - __uaccess_begin_nospec(); - __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src, - ret, "q", "", "=r", 10); - if (likely(!ret)) - __get_user_asm_nozero(*(u16 *)(8 + (char *)dst), - (u16 __user *)(8 + (char __user *)src), - ret, "w", "w", "=r", 2); - __uaccess_end(); - return ret; - case 16: - __uaccess_begin_nospec(); - __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src, - ret, "q", "", "=r", 16); - if (likely(!ret)) - __get_user_asm_nozero(*(u64 *)(8 + (char *)dst), - (u64 __user *)(8 + (char __user *)src), - ret, "q", "", "=r", 8); - __uaccess_end(); - return ret; - default: - return copy_user_generic(dst, (__force void *)src, size); - } + return copy_user_generic(dst, (__force void *)src, size); } static __always_inline __must_check unsigned long raw_copy_to_user(void __user *dst, const void *src, unsigned long size) { - int ret = 0; - - if (!__builtin_constant_p(size)) - return copy_user_generic((__force void *)dst, src, size); - switch (size) { - case 1: - __uaccess_begin(); - __put_user_asm(*(u8 *)src, (u8 __user *)dst, - ret, "b", "b", "iq", 1); - __uaccess_end(); - return ret; - case 2: - __uaccess_begin(); - __put_user_asm(*(u16 *)src, (u16 __user *)dst, - ret, "w", "w", "ir", 2); - __uaccess_end(); - return ret; - case 4: - __uaccess_begin(); - __put_user_asm(*(u32 *)src, (u32 __user *)dst, - ret, "l", "k", "ir", 4); - __uaccess_end(); - return ret; - case 8: - __uaccess_begin(); - __put_user_asm(*(u64 *)src, (u64 __user *)dst, - ret, "q", "", "er", 8); - __uaccess_end(); - return ret; - case 10: - __uaccess_begin(); - __put_user_asm(*(u64 *)src, (u64 __user *)dst, - ret, "q", "", "er", 10); - if (likely(!ret)) { -
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 7, 2019 at 8:29 PM Al Viro wrote: > > For x86? Sure, why not... Note, BTW, that for short constant-sized > copies we *do* STAC/CLAC at the call site - see those > __uaccess_begin_nospec(); > in raw_copy_{from,to}_user() in the switches... Yeah, an that code almost never actually triggers in practice. The code is pointless and dead. The thing is, it's only ever used for the double undescore versions, and the ones that do have have it are almost never constant sizes in the first place. And yes, there's like a couple of cases in the whole kernel. Just remove those constant size cases. They are pointless and just complicate our headers and slow down the compile for no good reason. Try the attached patch, and then count the number of "rorx" instructions in the kernel. Hint: not many. On my personal config, this triggers 15 times in the whole kernel build (not counting modules). It's not worth it. The "speedup" from using __copy_{to,from}_user() with the fancy inlining is negligible. All the cost is in the STAC/CLAC anyway, the code might as well be deleted. > 1) cross-architecture user_access_begin_dont_use(): on everything > except x86 it's empty, on x86 - __uaccess_begin_nospec(). No, just do a proper range check, and use user_access_begin() Stop trying to optimize that range check away. It's a couple of fast instructions. The only ones who don't want the range check are the actual kernel copy ones, but they don't want the user_access_begin() either. > void *copy_mount_options(const void __user * data) > { > unsigned offs, size; > char *copy; > > if (!data) > return NULL; > > copy = kmalloc(PAGE_SIZE, GFP_KERNEL); > if (!copy) > return ERR_PTR(-ENOMEM); > > offs = (unsigned long)untagged_addr(data) & (PAGE_SIZE - 1); > > if (copy_from_user(copy, data, PAGE_SIZE - offs)) { > kfree(copy); > return ERR_PTR(-EFAULT); > } > if (offs) { > if (copy_from_user(copy, data + PAGE_SIZE - offs, offs)) > memset(copy + PAGE_SIZE - offs, 0, offs); > } > return copy; > } > > on the theory that any fault halfway through a page means a race with > munmap/mprotect/etc. and we can just pretend we'd lost the race entirely. > And to hell with exact_copy_from_user(), byte-by-byte copying, etc. Looks reasonable. Linus diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h index 5cd1caa8bc65..db58c4436ce3 100644 --- a/arch/x86/include/asm/uaccess_64.h +++ b/arch/x86/include/asm/uaccess_64.h @@ -62,6 +62,8 @@ copy_to_user_mcsafe(void *to, const void *from, unsigned len) return ret; } +#define marker(x) asm volatile("rorx $" #x ",%rax,%rdx") + static __always_inline __must_check unsigned long raw_copy_from_user(void *dst, const void __user *src, unsigned long size) { @@ -72,30 +74,35 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size) switch (size) { case 1: __uaccess_begin_nospec(); + marker(1); __get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src, ret, "b", "b", "=q", 1); __uaccess_end(); return ret; case 2: __uaccess_begin_nospec(); + marker(2); __get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src, ret, "w", "w", "=r", 2); __uaccess_end(); return ret; case 4: __uaccess_begin_nospec(); + marker(4); __get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src, ret, "l", "k", "=r", 4); __uaccess_end(); return ret; case 8: __uaccess_begin_nospec(); + marker(8); __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src, ret, "q", "", "=r", 8); __uaccess_end(); return ret; case 10: __uaccess_begin_nospec(); + marker(10); __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src, ret, "q", "", "=r", 10); if (likely(!ret)) @@ -106,6 +113,7 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size) return ret; case 16: __uaccess_begin_nospec(); + marker(16); __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src, ret, "q", "", "=r", 16); if (likely(!ret)) @@ -129,30 +137,35 @@ raw_copy_to_user(void __user *dst, const void *src, unsigned long size) switch (size) { case 1: __uaccess_begin(); + marker(51); __put_user_asm(*(u8 *)src, (u8 __user *)dst, ret, "b", "b", "iq", 1); __uaccess_end(); return ret; case 2: __uaccess_begin(); + marker(52); __put_user_asm(*(u16 *)src, (u16 __user *)dst, ret, "w", "w", "ir", 2); __uaccess_end(); return ret; case 4: __uaccess_begin(); + marker(54); __put_user_asm(*(u32 *)src, (u32 __user *)dst, ret, "l", "k", "ir", 4); __uaccess_end(); return ret; case 8: __uaccess_begin(); + marker(58); __put_user_asm(*(u64 *)src, (u64 __user *)dst, ret, "q", "", "er", 8);
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 07, 2019 at 11:26:35AM -0700, Linus Torvalds wrote: > But on x86, if we move the STAC/CLAC out of the low-level copy > routines and into the callers, we'll have a _lot_ of churn. I thought > it would be mostly a "teach objtool" thing, but we have lots of > different versions of it. Not just the 32-bit vs 64-bit, it's embedded > in all the low-level asm implementations. > > And we don't want the regular "copy_to/from_user()" to then have to > add the STAC/CLAC at the call-site. So then we'd want to un-inline > copy_to_user() entirely. For x86? Sure, why not... Note, BTW, that for short constant-sized copies we *do* STAC/CLAC at the call site - see those __uaccess_begin_nospec(); in raw_copy_{from,to}_user() in the switches... > Which all sounds like a really good idea, don't get me wrong. I think > we inline it way too aggressively now. But it'sa _big_ job. > > So we probably _should_ > > - remove INLINE_COPY_TO/FROM_USER > > - remove all the "small constant size special cases". > > - make "raw_copy_to/from_user()" have the "unsafe" semantics and make > the out-of-line copy in lib/usercopy.c be the only real interface > > - get rid of a _lot_ of oddities Not that many, really. All we need is a temporary cross-architecture __uaccess_begin_nospec(), so that __copy_{to,from}_user() could have that used, instead of having it done in (x86) raw_copy_..._...(). Other callers of raw_copy_...() would simply wrap it into user_access_begin()/ user_access_end() pairs; this kludge is needed only in __copy_from_user() and __copy_to_user(), and only until we kill their callers outside of arch/*. Which we can do, in a cycle or two. _ANY_ use of that temporary kludge outside of those two functions will be grepped for and LARTed into the ground. > I hope you prove me wrong. But I'll look at a smaller change to just > make x86 use the current special copy loop (as > "unsafe_copy_to_user()") and have everybody else do the trivial > wrapper. > > Because we definitely should do that cleanup (it also fixes the whole > "atomic copy in kernel space" issue that you pointed to that doesn't > actually want STAC/CLAC at all), but it just looks fairly massive to > me. AFAICS, it splits nicely. 1) cross-architecture user_access_begin_dont_use(): on everything except x86 it's empty, on x86 - __uaccess_begin_nospec(). 2) stac/clac lifted into x86 raw_copy_..._user() out of copy_user_generic_unrolled(), copy_user_generic_string() and copy_user_enhanced_fast_string(). Similar lift out of __copy_user_nocache(). 3) lifting that thing as user_access_begin_dont_use() into __copy_..._user...() and as user_access_begin() into other generic callers, consuming access_ok() in the latter. __copy_to_user_inatomic() can die at the same stage. 4) possibly uninlining on x86 (and yes, killing the special size handling off). We don't need to touch the inlining decisions for any other architectures. At that point raw_copy_to_user() is available for e.g. readdir.c to play with. And up to that point only x86 sees any kind of code changes, so we don't have to worry about other architectures. 5) kill the __copy_...() users outside of arch/*, alone with quite a few other weird shits in there. A cycle or two, with the final goal being to kill the damn things off. 6) arch/* users get arch-by-arch treatment - mostly it's sigframe handling. Won't affect the generic code and would be independent for different architectures. Can happen in parallel with (5), actually. 7) ... at that point user_access_begin_dont_user() gets removed and thrown into the pile of mangled fingers of those who'd ignored all warnings and used it somewhere else. I don't believe that (5) would be doable entirely in this cycle, but quite a few bits might be. On a somewhat related note, do you see any problems with void *copy_mount_options(const void __user * data) { unsigned offs, size; char *copy; if (!data) return NULL; copy = kmalloc(PAGE_SIZE, GFP_KERNEL); if (!copy) return ERR_PTR(-ENOMEM); offs = (unsigned long)untagged_addr(data) & (PAGE_SIZE - 1); if (copy_from_user(copy, data, PAGE_SIZE - offs)) { kfree(copy); return ERR_PTR(-EFAULT); } if (offs) { if (copy_from_user(copy, data + PAGE_SIZE - offs, offs)) memset(copy + PAGE_SIZE - offs, 0, offs); } return copy; } on the theory that any fault halfway through a page means a race with munmap/mprotect/etc. and we can just pretend we'd lost the race entirely. And to hell with exact_copy_from_user(), byte-by-byte copying, etc.
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On 10/7/19 12:21 PM, Linus Torvalds wrote: On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck wrote: this patch causes all my sparc64 emulations to stall during boot. It causes all alpha emulations to crash with [1a] and [1b] when booting from a virtual disk, and one of the xtensa emulations to crash with [2]. So I think your alpha emulation environment may be broken, because Michael Cree reports that it works for him on real hardware, but he does see the kernel unaligned count being high. But regardless, this is my current fairly minimal patch that I think should fix the unaligned issue, while still giving the behavior we want on x86. I hope Al can do something nicer, but I think this is "acceptable". I'm running this now on x86, and I verified that x86-32 code generation looks sane too, but it woudl be good to verify that this makes the alignment issue go away on other architectures. Linus Test results look good. Feel free to add Tested-by: Guenter Roeck to your patch. Build results: total: 158 pass: 154 fail: 4 Failed builds: arm:allmodconfig m68k:defconfig mips:allmodconfig sparc64:allmodconfig Qemu test results: total: 391 pass: 390 fail: 1 Failed tests: ppc64:mpc8544ds:ppc64_e5500_defconfig:nosmp:initrd This is with "regulator: fixed: Prevent NULL pointer dereference when !CONFIG_OF" applied as well. The other failures are unrelated. arm: arch/arm/crypto/aes-ce-core.S:299: Error: selected processor does not support `movw ip,:lower16:.Lcts_permute_table' in ARM mode Fix is pending in crypto tree. m68k: c2p_iplan2.c:(.text+0x98): undefined reference to `c2p_unsupported' I don't know the status. mips: drivers/staging/octeon/ethernet-defines.h:30:38: error: 'CONFIG_CAVIUM_OCTEON_CVMSEG_SIZE' undeclared and other similar errors. I don't know the status. ppc64: powerpc64-linux-ld: mm/page_alloc.o:(.toc+0x18): undefined reference to `node_reclaim_distance' Reported against offending patch earlier today. sparc64: drivers/watchdog/cpwd.c:500:19: error: 'compat_ptr_ioctl' undeclared here Oops. I'll need to look into that. Looks like the patch to use a new infrastructure made it into the kernel but the infrastructure itself didn't make it after all. Guenter
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 07, 2019 at 12:21:25PM -0700, Linus Torvalds wrote: > On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck wrote: > > > > this patch causes all my sparc64 emulations to stall during boot. It causes > > all alpha emulations to crash with [1a] and [1b] when booting from a virtual > > disk, and one of the xtensa emulations to crash with [2]. > > So I think your alpha emulation environment may be broken, because > Michael Cree reports that it works for him on real hardware, but he > does see the kernel unaligned count being high. > Yes, that possibility always exists, unfortunately. > But regardless, this is my current fairly minimal patch that I think > should fix the unaligned issue, while still giving the behavior we > want on x86. I hope Al can do something nicer, but I think this is > "acceptable". > > I'm running this now on x86, and I verified that x86-32 code > generation looks sane too, but it woudl be good to verify that this > makes the alignment issue go away on other architectures. > > Linus I started a complete test run with the patch applied. I'll let you know how it went after it is complete - it should be done in a couple of hours. Guenter
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 7, 2019 at 12:49 PM Tony Luck wrote: > > If PSR.ac is set, we trap. If it isn't set, then model specific > (though all implementations will > trap for an unaligned access that crosses a 4K boundary). Ok. At that point, setting AC unconditionally is the better model just to get test coverage for "it will trap occasionally anyway". Odd "almost-but-not-quite x86" both in naming and in behavior (AC was a no-op in kernel-mode until SMAP). > Your patch does make all the messages go away. > > Tested-by: Tony Luck Ok, I'll commit it, and we'll see what Al can come up with that might be a bigger cleanup. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 7, 2019 at 12:09 PM Linus Torvalds wrote: > Hmm? I thought ia64 did unaligneds ok. If PSR.ac is set, we trap. If it isn't set, then model specific (though all implementations will trap for an unaligned access that crosses a 4K boundary). Linux sets PSR.ac. Applications can use prctl(PR_SET_UNALIGN) to choose whether they want the kernel to silently fix things or to send SIGBUS. Kernel always noisily (rate limited) fixes up unaligned access. Your patch does make all the messages go away. Tested-by: Tony Luck
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck wrote: > > this patch causes all my sparc64 emulations to stall during boot. It causes > all alpha emulations to crash with [1a] and [1b] when booting from a virtual > disk, and one of the xtensa emulations to crash with [2]. So I think your alpha emulation environment may be broken, because Michael Cree reports that it works for him on real hardware, but he does see the kernel unaligned count being high. But regardless, this is my current fairly minimal patch that I think should fix the unaligned issue, while still giving the behavior we want on x86. I hope Al can do something nicer, but I think this is "acceptable". I'm running this now on x86, and I verified that x86-32 code generation looks sane too, but it woudl be good to verify that this makes the alignment issue go away on other architectures. Linus arch/x86/include/asm/uaccess.h | 23 ++ fs/readdir.c | 44 ++ include/linux/uaccess.h| 6 -- 3 files changed, 29 insertions(+), 44 deletions(-) diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 35c225ede0e4..61d93f062a36 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -734,5 +734,28 @@ do { \ if (unlikely(__gu_err)) goto err_label; \ } while (0) +/* + * We want the unsafe accessors to always be inlined and use + * the error labels - thus the macro games. + */ +#define unsafe_copy_loop(dst, src, len, type, label) \ + while (len >= sizeof(type)) { \ + unsafe_put_user(*(type *)src,(type __user *)dst,label); \ + dst += sizeof(type); \ + src += sizeof(type); \ + len -= sizeof(type); \ + } + +#define unsafe_copy_to_user(_dst,_src,_len,label) \ +do { \ + char __user *__ucu_dst = (_dst);\ + const char *__ucu_src = (_src); \ + size_t __ucu_len = (_len); \ + unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label); \ + unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label); \ + unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label); \ + unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label); \ +} while (0) + #endif /* _ASM_X86_UACCESS_H */ diff --git a/fs/readdir.c b/fs/readdir.c index 19bea591c3f1..6e2623e57b2e 100644 --- a/fs/readdir.c +++ b/fs/readdir.c @@ -27,53 +27,13 @@ /* * Note the "unsafe_put_user() semantics: we goto a * label for errors. - * - * Also note how we use a "while()" loop here, even though - * only the biggest size needs to loop. The compiler (well, - * at least gcc) is smart enough to turn the smaller sizes - * into just if-statements, and this way we don't need to - * care whether 'u64' or 'u32' is the biggest size. - */ -#define unsafe_copy_loop(dst, src, len, type, label) \ - while (len >= sizeof(type)) {\ - unsafe_put_user(get_unaligned((type *)src), \ - (type __user *)dst, label); \ - dst += sizeof(type);\ - src += sizeof(type);\ - len -= sizeof(type);\ - } - -/* - * We avoid doing 64-bit copies on 32-bit architectures. They - * might be better, but the component names are mostly small, - * and the 64-bit cases can end up being much more complex and - * put much more register pressure on the code, so it's likely - * not worth the pain of unaligned accesses etc. - * - * So limit the copies to "unsigned long" size. I did verify - * that at least the x86-32 case is ok without this limiting, - * but I worry about random other legacy 32-bit cases that - * might not do as well. - */ -#define unsafe_copy_type(dst, src, len, type, label) do { \ - if (sizeof(type) <= sizeof(unsigned long)) \ - unsafe_copy_loop(dst, src, len, type, label); \ -} while (0) - -/* - * Copy the dirent name to user space, and NUL-terminate - * it. This should not be a function call, since we're doing - * the copy inside a "user_access_begin/end()" section. */ #define unsafe_copy_dirent_name(_dst, _src, _len, label) do { \ char __user *dst = (_dst);\ const char *src = (_src);\ size_t len = (_len); \ - unsafe_copy_type(dst, src, len, u64, label); \ - unsafe_copy_type(dst, src, len, u32, label); \ - unsafe_copy_type(dst, src, len, u16, label); \ - unsafe_copy_type(dst, src, len, u8, label); \ - unsafe_put_user(0, dst, label);\ + unsafe_put_user(0, dst+len, label); \ + unsafe_copy_to_user(dst, src, len, label); \ } while (0) diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index e47d0522a1f4..d4ee6e942562 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -355,8 +355,10 @@ extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count); #ifndef user_access_begin #define user_access_begin(ptr,len) access_ok(ptr, len) #define user_access_end() do { } while (0) -#define unsafe_get_user(x, ptr, err) do { if (unlikely(__get_user(x, ptr))) goto err; } while (0) -#define unsafe_put_user(x, ptr, err)
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 7, 2019 at 11:36 AM Tony Luck wrote: > > Late to this party ,,, but my ia64 console today is full of: Hmm? I thought ia64 did unaligneds ok. But regardless, this is my current (as yet untested) patch. This is not the big user access cleanup that I hope Al will do, this is just a "ok, x86 is the only one who wants a special unsafe_copy_to_user() right now" patch. Linus arch/x86/include/asm/uaccess.h | 23 ++ fs/readdir.c | 44 ++ include/linux/uaccess.h| 6 -- 3 files changed, 29 insertions(+), 44 deletions(-) diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 35c225ede0e4..61d93f062a36 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -734,5 +734,28 @@ do { \ if (unlikely(__gu_err)) goto err_label; \ } while (0) +/* + * We want the unsafe accessors to always be inlined and use + * the error labels - thus the macro games. + */ +#define unsafe_copy_loop(dst, src, len, type, label) \ + while (len >= sizeof(type)) { \ + unsafe_put_user(*(type *)src,(type __user *)dst,label); \ + dst += sizeof(type); \ + src += sizeof(type); \ + len -= sizeof(type); \ + } + +#define unsafe_copy_to_user(_dst,_src,_len,label) \ +do { \ + char __user *__ucu_dst = (_dst);\ + const char *__ucu_src = (_src); \ + size_t __ucu_len = (_len); \ + unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label); \ + unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label); \ + unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label); \ + unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label); \ +} while (0) + #endif /* _ASM_X86_UACCESS_H */ diff --git a/fs/readdir.c b/fs/readdir.c index 19bea591c3f1..6e2623e57b2e 100644 --- a/fs/readdir.c +++ b/fs/readdir.c @@ -27,53 +27,13 @@ /* * Note the "unsafe_put_user() semantics: we goto a * label for errors. - * - * Also note how we use a "while()" loop here, even though - * only the biggest size needs to loop. The compiler (well, - * at least gcc) is smart enough to turn the smaller sizes - * into just if-statements, and this way we don't need to - * care whether 'u64' or 'u32' is the biggest size. - */ -#define unsafe_copy_loop(dst, src, len, type, label) \ - while (len >= sizeof(type)) {\ - unsafe_put_user(get_unaligned((type *)src), \ - (type __user *)dst, label); \ - dst += sizeof(type);\ - src += sizeof(type);\ - len -= sizeof(type);\ - } - -/* - * We avoid doing 64-bit copies on 32-bit architectures. They - * might be better, but the component names are mostly small, - * and the 64-bit cases can end up being much more complex and - * put much more register pressure on the code, so it's likely - * not worth the pain of unaligned accesses etc. - * - * So limit the copies to "unsigned long" size. I did verify - * that at least the x86-32 case is ok without this limiting, - * but I worry about random other legacy 32-bit cases that - * might not do as well. - */ -#define unsafe_copy_type(dst, src, len, type, label) do { \ - if (sizeof(type) <= sizeof(unsigned long)) \ - unsafe_copy_loop(dst, src, len, type, label); \ -} while (0) - -/* - * Copy the dirent name to user space, and NUL-terminate - * it. This should not be a function call, since we're doing - * the copy inside a "user_access_begin/end()" section. */ #define unsafe_copy_dirent_name(_dst, _src, _len, label) do { \ char __user *dst = (_dst);\ const char *src = (_src);\ size_t len = (_len); \ - unsafe_copy_type(dst, src, len, u64, label); \ - unsafe_copy_type(dst, src, len, u32, label); \ - unsafe_copy_type(dst, src, len, u16, label); \ - unsafe_copy_type(dst, src, len, u8, label); \ - unsafe_put_user(0, dst, label);\ + unsafe_put_user(0, dst+len, label); \ + unsafe_copy_to_user(dst, src, len, label); \ } while (0) diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index e47d0522a1f4..d4ee6e942562 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -355,8 +355,10 @@ extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count); #ifndef user_access_begin #define user_access_begin(ptr,len) access_ok(ptr, len) #define user_access_end() do { } while (0) -#define unsafe_get_user(x, ptr, err) do { if (unlikely(__get_user(x, ptr))) goto err; } while (0) -#define unsafe_put_user(x, ptr, err) do { if (unlikely(__put_user(x, ptr))) goto err; } while (0) +#define unsafe_op_wrap(op, err) do { if (unlikely(op)) goto err; } while (0) +#define unsafe_get_user(x,p,e) unsafe_op_wrap(__get_user(x,p),e) +#define unsafe_put_user(x,p,e) unsafe_op_wrap(__put_user(x,p),e) +#define unsafe_copy_to_user(d,s,l,e) unsafe_op_wrap(__copy_to_user(d,s,l),e) static inline unsigned long user_access_save(void) { return 0UL; } static inline void user_access_restore(unsigned long flags) {
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 7, 2019 at 11:28 AM Linus Torvalds wrote: > > On Sun, Oct 6, 2019 at 8:11 PM Linus Torvalds > wrote: > > > > > > > > The last two should just do user_access_begin()/user_access_end() > > > instead of access_ok(). __copy_to_user_inatomic() has very few callers > > > as well: > > > > Yeah, good points. > > Looking at it some more this morning, I think it's actually pretty painful. Late to this party ,,, but my ia64 console today is full of: irqbalance(5244): unaligned access to 0x200800042f9b, ip=0xa001002fef90 irqbalance(5244): unaligned access to 0x200800042fbb, ip=0xa001002fef90 irqbalance(5244): unaligned access to 0x200800042fdb, ip=0xa001002fef90 irqbalance(5244): unaligned access to 0x200800042ffb, ip=0xa001002fef90 irqbalance(5244): unaligned access to 0x20080004301b, ip=0xa001002fef90 ia64_handle_unaligned: 95 callbacks suppressed irqbalance(5244): unaligned access to 0x200800042f9b, ip=0xa001002fef90 irqbalance(5244): unaligned access to 0x200800042fbb, ip=0xa001002fef90 irqbalance(5244): unaligned access to 0x200800042fdb, ip=0xa001002fef90 irqbalance(5244): unaligned access to 0x200800042ffb, ip=0xa001002fef90 irqbalance(5244): unaligned access to 0x20080004301b, ip=0xa001002fef90 ia64_handle_unaligned: 95 callbacks suppressed Those ip's point into filldir64() -Tony
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 6, 2019 at 8:11 PM Linus Torvalds wrote: > > > > > The last two should just do user_access_begin()/user_access_end() > > instead of access_ok(). __copy_to_user_inatomic() has very few callers as > > well: > > Yeah, good points. Looking at it some more this morning, I think it's actually pretty painful. The good news is that right now x86 is the only architecture that does that user_access_begin(), so we don't need to worry about anything else. Apparently the ARM people haven't had enough performance problems with the PAN bit for them to care. We can have a fallback wrapper for unsafe_copy_to_user() for other architectures that just does the __copy_to_user(). But on x86, if we move the STAC/CLAC out of the low-level copy routines and into the callers, we'll have a _lot_ of churn. I thought it would be mostly a "teach objtool" thing, but we have lots of different versions of it. Not just the 32-bit vs 64-bit, it's embedded in all the low-level asm implementations. And we don't want the regular "copy_to/from_user()" to then have to add the STAC/CLAC at the call-site. So then we'd want to un-inline copy_to_user() entirely. Which all sounds like a really good idea, don't get me wrong. I think we inline it way too aggressively now. But it'sa _big_ job. So we probably _should_ - remove INLINE_COPY_TO/FROM_USER - remove all the "small constant size special cases". - make "raw_copy_to/from_user()" have the "unsafe" semantics and make the out-of-line copy in lib/usercopy.c be the only real interface - get rid of a _lot_ of oddities but looking at just how much churn this is, I suspect that for 5.4 it's a bit late to do quite that much cleanup. I hope you prove me wrong. But I'll look at a smaller change to just make x86 use the current special copy loop (as "unsafe_copy_to_user()") and have everybody else do the trivial wrapper. Because we definitely should do that cleanup (it also fixes the whole "atomic copy in kernel space" issue that you pointed to that doesn't actually want STAC/CLAC at all), but it just looks fairly massive to me. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 07, 2019 at 11:13:27AM -0700, Linus Torvalds wrote: > On Mon, Oct 7, 2019 at 10:34 AM Al Viro wrote: > > > > Tangentially related: copy_regster_to_user() and copy_regset_from_user(). > > Not a worry. It's not performance-critical code, and if it ever is, it > needs to be rewritten anyway. > > > The former variant tends to lead to few calls > > of __copy_{to,from}_user(); the latter... On x86 it ends up doing > > this: > > Just replace the __put_user() with a put_user() and be done with it. > That code isn't acceptable, and if somebody ever complains about > performance it's not the lack of __put_user that is the problem. I wonder if it would be better off switching to several "copy in bulk" like e.g. ppc does. That boilerplate with parallel "to/from kernel" and "to/from userland" loops is asking for bugs - the calling conventions like "pass kbuf and ubuf; exactly one must be NULL" tend to be trouble, IME; I'm not saying we should just pass struct iov_iter * instead of count+pos+kbuf+ubuf to ->get() and ->set(), but it might clean the things up nicely. Let me look into that zoo a bit more...
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 7, 2019 at 10:34 AM Al Viro wrote: > > Tangentially related: copy_regster_to_user() and copy_regset_from_user(). Not a worry. It's not performance-critical code, and if it ever is, it needs to be rewritten anyway. > The former variant tends to lead to few calls > of __copy_{to,from}_user(); the latter... On x86 it ends up doing > this: Just replace the __put_user() with a put_user() and be done with it. That code isn't acceptable, and if somebody ever complains about performance it's not the lack of __put_user that is the problem. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Mon, Oct 7, 2019 at 8:40 AM David Laight wrote: > > You don't really want an extra access_ok() for every 'word' of a copy. Yes you do. > Some copies have to be done a word at a time. Completely immaterial. If you can't see the access_ok() close to the __get/put_user(), you have a bug. Plus the access_ok() is cheap. The real cost is the STAC/CLAC. So stop with the access_ok() "optimizations". They are broken garbage. Really. I've been very close to just removing __get_user/__put_user several times, exactly because people do completely the wrong thing with them - not speeding code up, but making it unsafe and buggy. The new "user_access_begin/end()" model is much better, but it also has actual STATIC checking that there are no function calls etc inside th4e region, so it forces you to do the loop properly and tightly, and not the incorrect "I checked the range somewhere else, now I'm doing an unsafe copy". And it actually speeds things up, unlike the access_ok() games. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 06, 2019 at 08:11:42PM -0700, Linus Torvalds wrote: > > So do we want to bother with separation between raw_copy_to_user() and > > unsafe_copy_to_user()? After all, __copy_to_user() also has only few > > callers, most of them in arch/* > > No, you're right. Just switch over. > > > I'll take a look into that tomorrow - half-asleep right now... > > Thanks. No huge hurry. Tangentially related: copy_regster_to_user() and copy_regset_from_user(). That's where we do access_ok(), followed by calls of ->get() and ->set() resp. Those tend to either use user_regset_copy{out,in}(), or open-code those. The former variant tends to lead to few calls of __copy_{to,from}_user(); the latter... On x86 it ends up doing this: static int genregs_get(struct task_struct *target, const struct user_regset *regset, unsigned int pos, unsigned int count, void *kbuf, void __user *ubuf) { if (kbuf) { unsigned long *k = kbuf; while (count >= sizeof(*k)) { *k++ = getreg(target, pos); count -= sizeof(*k); pos += sizeof(*k); } } else { unsigned long __user *u = ubuf; while (count >= sizeof(*u)) { if (__put_user(getreg(target, pos), u++)) return -EFAULT; count -= sizeof(*u); pos += sizeof(*u); } } return 0; } Potentially doing arseloads of stac/clac as it goes. OTOH, getreg() (and setreg()) in there are not entirely trivial, so blanket user_access_begin()/user_access_end() over the entire loop might be a bad idea... How hot is that codepath? I know that arch/um used to rely on it (== PTRACE_[GS]ETREGS) quite a bit...
RE: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
> From: Linus Torvalds > Sent: 07 October 2019 04:12 ... > In this case, I think it's done a few callers up in i915_gem_pread_ioctl(): > > if (!access_ok(u64_to_user_ptr(args->data_ptr), >args->size)) > return -EFAULT; > > but honestly, trying to optimize away another "access_ok()" is just > not worth it. I'd rather have an extra one than miss one. You don't really want an extra access_ok() for every 'word' of a copy. Some copies have to be done a word at a time. And the checks someone added to copy_to/from_user() to detect kernel buffer overruns must kill performance when the buffers are way down the stack or in kmalloc()ed space. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
Hi Max, On 10/6/19 9:04 PM, Max Filippov wrote: On Sun, Oct 6, 2019 at 3:25 PM Guenter Roeck wrote: this patch causes all my sparc64 emulations to stall during boot. It causes all alpha emulations to crash with [1a] and [1b] when booting from a virtual disk, and one of the xtensa emulations to crash with [2]. [...] [2] Unable to handle kernel paging request at virtual address 0004 reboot(50): Oops -1 pc = [<0004>] ra = [] ps = Tainted: G D pc is at 0x4 ra is at filldir64+0x64/0x320 v0 = t0 = 67736d6b t1 = 00012011445b t2 = t3 = t4 = 7ef8 t5 = 000120114448 t6 = t7 = fc0007eec000 s0 = fc000792b5c3 s1 = 0004 s2 = 0018 s3 = fc0007eefec8 s4 = 0008 s5 = f0a3 s6 = 000b a0 = fc000792b5c3 a1 = 2f2f2f2f2f2f2f2f a2 = 0004 a3 = 000b a4 = f0a3 a5 = 0008 t8 = 0018 t9 = t10= 22e1d02a t11= 00011fd6f3b8 pv = fcb9a810 at = 22e1ccf8 gp = fcf03930 sp = (ptrval) Trace: [] proc_readdir_de+0x170/0x300 [] filldir64+0x0/0x320 [] proc_root_readdir+0x3c/0x80 [] iterate_dir+0x198/0x240 [] ksys_getdents64+0xa8/0x160 [] sys_getdents64+0x20/0x40 [] filldir64+0x0/0x320 [] entSys+0xa4/0xc0 This doesn't look like a dump from xtensa core. v5.4-rc2 kernel doesn't crash for me on xtensa, but the userspace doesn't work well, because all directories appear to be empty. __put_user/__get_user don't do unaligned access on xtensa, they check address alignment and return -EFAULT if it's bad. You are right, sorry; I must have mixed that up. xtensa doesn't crash. The boot stalls, similar to sparc64. This is only seen with my nommu test (de212:kc705-nommu:nommu_kc705_defconfig). xtensa mmu tests are fine, at least for me, but then I only run tests with initrd (which for some reason doesn't crash on alpha either). Guenter
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 6, 2019 at 3:25 PM Guenter Roeck wrote: > this patch causes all my sparc64 emulations to stall during boot. It causes > all alpha emulations to crash with [1a] and [1b] when booting from a virtual > disk, and one of the xtensa emulations to crash with [2]. [...] > [2] > > Unable to handle kernel paging request at virtual address 0004 > reboot(50): Oops -1 > pc = [<0004>] ra = [] ps = Tainted: G > D > pc is at 0x4 > ra is at filldir64+0x64/0x320 > v0 = t0 = 67736d6b t1 = 00012011445b > t2 = t3 = t4 = 7ef8 > t5 = 000120114448 t6 = t7 = fc0007eec000 > s0 = fc000792b5c3 s1 = 0004 s2 = 0018 > s3 = fc0007eefec8 s4 = 0008 s5 = f0a3 > s6 = 000b > a0 = fc000792b5c3 a1 = 2f2f2f2f2f2f2f2f a2 = 0004 > a3 = 000b a4 = f0a3 a5 = 0008 > t8 = 0018 t9 = t10= 22e1d02a > t11= 00011fd6f3b8 pv = fcb9a810 at = 22e1ccf8 > gp = fcf03930 sp = (ptrval) > Trace: > [] proc_readdir_de+0x170/0x300 > [] filldir64+0x0/0x320 > [] proc_root_readdir+0x3c/0x80 > [] iterate_dir+0x198/0x240 > [] ksys_getdents64+0xa8/0x160 > [] sys_getdents64+0x20/0x40 > [] filldir64+0x0/0x320 > [] entSys+0xa4/0xc0 This doesn't look like a dump from xtensa core. v5.4-rc2 kernel doesn't crash for me on xtensa, but the userspace doesn't work well, because all directories appear to be empty. __put_user/__get_user don't do unaligned access on xtensa, they check address alignment and return -EFAULT if it's bad. -- Thanks. -- Max
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 6, 2019 at 7:30 PM Guenter Roeck wrote: > > > Mind humoring me and trying that on your alpha machine (or emulator, > > or whatever)? > > Here you are. This is with v5.4-rc2 and your previous patch applied > on top. > > / # ./mmtest > Unable to handle kernel paging request at virtual address 0004 Oookay. Well, that's what I expected, but it's good to just have it confirmed. Well, not "good" in this case. Bad bad bad. The fs/readdir.c changes clearly exposed a pre-existing bug on alpha. Not making excuses for it, but at least it explains why code that _looks_ correct ends up causing that kind of issue. I guess the other 'strict alignment' architectures should be checking that test program too. I'll post my test program to the arch maintainers list. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 6, 2019 at 7:50 PM Al Viro wrote: > > Out of those, only __copy_to_user_inatomic(), __copy_to_user(), > _copy_to_user() and iov_iter.c:copyout() can be called on > any architecture. > > The last two should just do user_access_begin()/user_access_end() > instead of access_ok(). __copy_to_user_inatomic() has very few callers as > well: Yeah, good points. It looks like it would be better to just change over semantics entirely to the unsafe_copy_user() model. > So few, in fact, that I wonder if we want to keep it at all; the only > thing stopping me from "let's remove it" is that I don't understand > the i915 side of things. Where does it do an equivalent of access_ok()? Honestly, if you have to ask, I think the answer is: just add one. Every single time we've had people who optimized things to try to avoid the access_ok(), they just caused bugs and problems. In this case, I think it's done a few callers up in i915_gem_pread_ioctl(): if (!access_ok(u64_to_user_ptr(args->data_ptr), args->size)) return -EFAULT; but honestly, trying to optimize away another "access_ok()" is just not worth it. I'd rather have an extra one than miss one. > And mm/maccess.c one is __probe_kernel_write(), so presumably we don't > want stac/clac there at all... Yup. > So do we want to bother with separation between raw_copy_to_user() and > unsafe_copy_to_user()? After all, __copy_to_user() also has only few > callers, most of them in arch/* No, you're right. Just switch over. > I'll take a look into that tomorrow - half-asleep right now... Thanks. No huge hurry. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 06, 2019 at 07:06:19PM -0700, Linus Torvalds wrote: > On Sun, Oct 6, 2019 at 6:24 PM Al Viro wrote: > > > > Ugh... I wonder if it would be better to lift STAC/CLAC out of > > raw_copy_to_user(), rather than trying to reinvent its guts > > in readdir.c... > > Yeah, I suspect that's the best option. > > Do something like > > - lift STAC/CLAC out of raw_copy_to_user > > - rename it to unsafe_copy_to_user > > - create a new raw_copy_to_user that is just unsafe_copy_to_user() > with the STAC/CLAC around it. > > and the end result would actually be cleanert than what we have now > (which duplicates that STAC/CLAC for each size case etc). > > And then for the "architecture doesn't have user_access_begin/end()" > fallback case, we just do > >#define unsafe_copy_to_user raw_copy_to_user Callers of raw_copy_to_user(): arch/hexagon/mm/uaccess.c:27: uncleared = raw_copy_to_user(dest, _zero_page, PAGE_SIZE); arch/hexagon/mm/uaccess.c:34: count = raw_copy_to_user(dest, _zero_page, count); arch/powerpc/kvm/book3s_64_mmu_radix.c:68: ret = raw_copy_to_user(to, from, n); arch/s390/include/asm/uaccess.h:150:size = raw_copy_to_user(ptr, x, size); include/asm-generic/uaccess.h:145: return unlikely(raw_copy_to_user(ptr, x, size)) ? -EFAULT : 0; include/linux/uaccess.h:93: return raw_copy_to_user(to, from, n); include/linux/uaccess.h:102:return raw_copy_to_user(to, from, n); include/linux/uaccess.h:131:n = raw_copy_to_user(to, from, n); lib/iov_iter.c:142: n = raw_copy_to_user(to, from, n); lib/usercopy.c:28: n = raw_copy_to_user(to, from, n); Out of those, only __copy_to_user_inatomic(), __copy_to_user(), _copy_to_user() and iov_iter.c:copyout() can be called on any architecture. The last two should just do user_access_begin()/user_access_end() instead of access_ok(). __copy_to_user_inatomic() has very few callers as well: arch/mips/kernel/unaligned.c:1307: res = __copy_to_user_inatomic(addr, fpr, sizeof(*fpr)); drivers/gpu/drm/i915/i915_gem.c:345:unwritten = __copy_to_user_inatomic(user_data, lib/test_kasan.c:471: unused = __copy_to_user_inatomic(usermem, kmem, size + 1); mm/maccess.c:98:ret = __copy_to_user_inatomic((__force void __user *)dst, src, size); So few, in fact, that I wonder if we want to keep it at all; the only thing stopping me from "let's remove it" is that I don't understand the i915 side of things. Where does it do an equivalent of access_ok()? And mm/maccess.c one is __probe_kernel_write(), so presumably we don't want stac/clac there at all... So do we want to bother with separation between raw_copy_to_user() and unsafe_copy_to_user()? After all, __copy_to_user() also has only few callers, most of them in arch/* I'll take a look into that tomorrow - half-asleep right now...
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On 10/6/19 6:17 PM, Linus Torvalds wrote: On Sun, Oct 6, 2019 at 5:04 PM Guenter Roeck wrote: [ ... ] And yes, I'll fix that name copy loop in filldir to align the destination first, *but* if I'm right, it means that something like this should also likely cause issues: #define _GNU_SOURCE #include #include int main(int argc, char **argv) { void *mymap; uid_t *bad_ptr = (void *) 0x01; /* Create unpopulated memory area */ mymap = mmap(NULL, 16384, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); /* Unaligned uidpointer in that memory area */ bad_ptr = mymap+1; /* Make the kernel do put_user() on it */ return getresuid(bad_ptr, bad_ptr+1, bad_ptr+2); } because that simple user mode program should cause that same "page fault on unaligned put_user()" behavior as far as I can tell. Mind humoring me and trying that on your alpha machine (or emulator, or whatever)? Here you are. This is with v5.4-rc2 and your previous patch applied on top. / # ./mmtest Unable to handle kernel paging request at virtual address 0004 mmtest(75): Oops -1 pc = [<0004>] ra = [] ps = Not tainted pc is at 0x4 ra is at entSys+0xa4/0xc0 v0 = fff2 t0 = t1 = t2 = t3 = t4 = t5 = fffe t6 = t7 = fc0007edc000 s0 = s1 = 000126f0 s2 = 0001200df19f s3 = 0001200ea0b9 s4 = 000120114630 s5 = 0001201145d8 s6 = 00011f955c50 a0 = 0202c001 a1 = 0202c005 a2 = 0202c009 a3 = a4 = a5 = t8 = t9 = fc00 t10= t11= 00011f955788 pv = fc349450 at = f8db54d3 gp = fcf2a160 sp = ab237c72 Disabling lock debugging due to kernel taint Trace: Code: 00063301 07b6 3f8d Segmentation fault Guenter
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 6, 2019 at 6:24 PM Al Viro wrote: > > Ugh... I wonder if it would be better to lift STAC/CLAC out of > raw_copy_to_user(), rather than trying to reinvent its guts > in readdir.c... Yeah, I suspect that's the best option. Do something like - lift STAC/CLAC out of raw_copy_to_user - rename it to unsafe_copy_to_user - create a new raw_copy_to_user that is just unsafe_copy_to_user() with the STAC/CLAC around it. and the end result would actually be cleanert than what we have now (which duplicates that STAC/CLAC for each size case etc). And then for the "architecture doesn't have user_access_begin/end()" fallback case, we just do #define unsafe_copy_to_user raw_copy_to_user and the only slight painpoint is that we need to deal with that copy_user_generic() case too. We'd have to mark it uaccess_safe in objtool (but we already have that __memcpy_mcsafe and csum_partial_copy_generic, os it all makes sense), and we'd have to make all the other copy_user_generic() cases then do the CLAC/STAC dance too or something. ANYWAY. As mentioned, I'm not actually all that worried about this all. I could easily also just see the filldir() copy do an extra #ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS if (len && (1 & (uint_ptr_t)dst)) .. copy byte .. if (len > 1 && (2 & (uint_ptr_t)dst)) .. copy word .. if (len > 3 && (4 & (uint_ptr_t)dst) && sizeof(unsigned long) > 4) .. copy dword .. #endif at the start to align the destination. The filldir code is actually somewhat unusual in that it deals with pretty small strings on average, so just doing this might be more efficient anyway. So that doesn't worry me. Multiple ways to solve that part. The "uhhuh, unaligned accesses cause more than performance problems" - that's what worries me. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 06, 2019 at 06:17:02PM -0700, Linus Torvalds wrote: > On Sun, Oct 6, 2019 at 5:04 PM Guenter Roeck wrote: > > > > All my alpha, sparc64, and xtensa tests pass with the attached patch > > applied on top of v5.4-rc2. I didn't test any others. > > Okay... I really wish my guess had been wrong. > > Because fixing filldir64 isn't the problem. I can come up with > multiple ways to avoid the unaligned issues if that was the problem. > > But it does look to me like the fundamental problem is that unaligned > __put_user() calls might just be broken on alpha (and likely sparc > too). Because that looks to be the only difference between the > __copy_to_user() approach and using unsafe_put_user() in a loop. > > Now, I should have handled unaligned things differently in the first > place, and in that sense I think commit 9f79b78ef744 ("Convert > filldir[64]() from __put_user() to unsafe_put_user()") really is > non-optimal on architectures with alignment issues. > > And I'll fix it. Ugh... I wonder if it would be better to lift STAC/CLAC out of raw_copy_to_user(), rather than trying to reinvent its guts in readdir.c...
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 6, 2019 at 5:04 PM Guenter Roeck wrote: > > All my alpha, sparc64, and xtensa tests pass with the attached patch > applied on top of v5.4-rc2. I didn't test any others. Okay... I really wish my guess had been wrong. Because fixing filldir64 isn't the problem. I can come up with multiple ways to avoid the unaligned issues if that was the problem. But it does look to me like the fundamental problem is that unaligned __put_user() calls might just be broken on alpha (and likely sparc too). Because that looks to be the only difference between the __copy_to_user() approach and using unsafe_put_user() in a loop. Now, I should have handled unaligned things differently in the first place, and in that sense I think commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()") really is non-optimal on architectures with alignment issues. And I'll fix it. But at the same time, the fact that "non-optimal" turns into "doesn't work" is a fairly nasty issue. > I'll (try to) send you some disassembly next. Thanks, verified. The "ra is at filldir64+0x64/0x320" is indeed right at the return point of the "jsr verify_dirent_name". But the problem isn't there - that's just left-over state. I'm pretty sure that function worked fine, and returned. The problem is that "pc is at 0x4" and the page fault that then happens at that address as a result. And that seems to be due to this: 8c0: 00 00 29 2c ldq_u t0,0(s0) 8c4: 07 00 89 2c ldq_u t3,7(s0) 8c8: 03 04 e7 47 mov t6,t2 8cc: c1 06 29 48 extql t0,s0,t0 8d0: 44 0f 89 48 extqh t3,s0,t3 8d4: 01 04 24 44 or t0,t3,t0 8d8: 00 00 22 b4 stq t0,0(t1) that's the "get_unaligned((type *)src)" (the first six instructions) followed by the "unsafe_put_user()" done with a single "stq". That's the guts of the unsafe_copy_loop() as part of unsafe_copy_dirent_name() And what I think happens is that it is writing to user memory that is (a) unaligned (b) not currently mapped in user space so then the do_entUna() function tries to handle the unaligned trap, but then it takes an exception while doing that (due to the unmapped page), and then something in that nested exception mess causes it to mess up badly and corrupt the register contents on stack, and it returns with garbage in 'pc', and then you finally die with that Unable to handle kernel paging request at virtual address 0004 pc is at 0x4 thing. And yes, I'll fix that name copy loop in filldir to align the destination first, *but* if I'm right, it means that something like this should also likely cause issues: #define _GNU_SOURCE #include #include int main(int argc, char **argv) { void *mymap; uid_t *bad_ptr = (void *) 0x01; /* Create unpopulated memory area */ mymap = mmap(NULL, 16384, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); /* Unaligned uidpointer in that memory area */ bad_ptr = mymap+1; /* Make the kernel do put_user() on it */ return getresuid(bad_ptr, bad_ptr+1, bad_ptr+2); } because that simple user mode program should cause that same "page fault on unaligned put_user()" behavior as far as I can tell. Mind humoring me and trying that on your alpha machine (or emulator, or whatever)? Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 06, 2019 at 04:06:16PM -0700, Linus Torvalds wrote: > On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck wrote: > > > > this patch causes all my sparc64 emulations to stall during boot. It causes > > all alpha emulations to crash with [1a] and [1b] when booting from a virtual > > disk, and one of the xtensa emulations to crash with [2]. > > Ho humm. I've run variations of that patch over a few years on x86, > but obviously not on alpha/sparc. > > At least I should still be able to read alpha assembly, even after all > these years. Would you mind sending me the result of > > make fs/readdir.s > > on alpha with the broken config? I'd hope that the sparc issue is the same. > > Actually, could you also do > > make fs/readdir.o > > and then send me the "objdump --disassemble" of that? That way I get > the instruction offsets without having to count by hand. > Both attached for alpha. > > Unable to handle kernel paging request at virtual address 0004 > > rcS(47): Oops -1 > > pc = [<0004>] ra = [] ps = Not > > tainted > > pc is at 0x4 > > That is _funky_. I'm not seeing how it could possibly jump to 0x4, but > it clearly does. > > That said, are you sure it's _that_ commit? Because this pattern: > Bisect on sparc pointed to this commit, and re-running the tests with the commit reverted passed for all architectures. I didn't check any further. Please let me know if you need anything else at this point. Thanks, Guenter .set noreorder .set volatile .set noat .set nomacro .arch ev5 # GNU C89 (GCC) version 9.2.0 (alpha-linux) # compiled by GNU C version 6.5.0 20181026, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version none # warning: GMP header version 6.1.0 differs from library version 6.1.2. # warning: MPC header version 1.0.3 differs from library version 1.1.0. # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 # options passed: -nostdinc -I ./arch/alpha/include # -I ./arch/alpha/include/generated -I ./include # -I ./arch/alpha/include/uapi -I ./arch/alpha/include/generated/uapi # -I ./include/uapi -I ./include/generated/uapi # -iprefix /opt/kernel/gcc-9.2.0-nolibc/alpha-linux/bin/../lib/gcc/alpha-linux/9.2.0/ # -D __KERNEL__ -D KBUILD_BASENAME="readdir" -D KBUILD_MODNAME="readdir" # -isystem /opt/kernel/gcc-9.2.0-nolibc/alpha-linux/bin/../lib/gcc/alpha-linux/9.2.0/include # -include ./include/linux/kconfig.h # -include ./include/linux/compiler_types.h -MD fs/.readdir.s.d # fs/readdir.c -mno-fp-regs -mcpu=ev5 -auxbase-strip fs/readdir.s -O2 # -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs # -Werror=implicit-function-declaration -Werror=implicit-int # -Wno-format-security -Wno-frame-address -Wformat-truncation=0 # -Wformat-overflow=0 -Wno-address-of-packed-member # -Wframe-larger-than=2048 -Wno-unused-but-set-variable # -Wimplicit-fallthrough=3 -Wunused-const-variable=0 # -Wdeclaration-after-statement -Wvla -Wno-pointer-sign # -Wno-stringop-truncation -Werror=date-time # -Werror=incompatible-pointer-types -Werror=designated-init # -Wno-packed-not-aligned -std=gnu90 -fno-strict-aliasing -fno-common # -fshort-wchar -fno-PIE -ffixed-8 -fno-jump-tables # -fno-delete-null-pointer-checks -fno-stack-protector # -fomit-frame-pointer -fno-strict-overflow -fno-merge-all-constants # -fmerge-constants -fstack-check=no -fconserve-stack # -fmacro-prefix-map=./= -fverbose-asm --param allow-store-data-races=0 # options enabled: -faggressive-loop-optimizations -falign-functions # -falign-jumps -falign-labels -falign-loops -fassume-phsa -fauto-inc-dec # -fbranch-count-reg -fcaller-saves -fcode-hoisting # -fcombine-stack-adjustments -fcompare-elim -fcprop-registers # -fcrossjumping -fcse-follow-jumps -fdefer-pop -fdevirtualize # -fdevirtualize-speculatively -fdwarf2-cfi-asm -fearly-inlining # -feliminate-unused-debug-types -fexpensive-optimizations # -fforward-propagate -ffp-int-builtin-inexact -ffunction-cse -fgcse # -fgcse-lm -fgnu-runtime -fgnu-unique -fguess-branch-probability # -fhoist-adjacent-loads -fident -fif-conversion -fif-conversion2 # -findirect-inlining -finline -finline-atomics # -finline-functions-called-once -finline-small-functions -fipa-bit-cp # -fipa-cp -fipa-icf -fipa-icf-functions -fipa-icf-variables -fipa-profile # -fipa-pure-const -fipa-ra -fipa-reference -fipa-reference-addressable # -fipa-sra -fipa-stack-alignment -fipa-vrp -fira-hoist-pressure # -fira-share-save-slots -fira-share-spill-slots # -fisolate-erroneous-paths-dereference -fivopts -fkeep-static-consts # -fleading-underscore -flifetime-dse -flra-remat -flto-odr-type-merging # -fmath-errno -fmerge-constants -fmerge-debug-strings # -fmove-loop-invariants -fomit-frame-pointer -foptimize-sibling-calls # -foptimize-strlen -fpartial-inlining -fpcc-struct-return -fpeephole # -fpeephole2 -fplt -fprefetch-loop-arrays -free -freorder-blocks
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On 10/6/19 4:35 PM, Linus Torvalds wrote: [ ... ] Anyway, let me think about this, but just for testing, does the attached patch make any difference? It's not the right thing in general (and most definitely not on x86), but for testing whether this is about unaligned accesses it might work. All my alpha, sparc64, and xtensa tests pass with the attached patch applied on top of v5.4-rc2. I didn't test any others. I'll (try to) send you some disassembly next. Guenter
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 6, 2019 at 4:06 PM Linus Torvalds wrote: > > Ho humm. I've run variations of that patch over a few years on x86, > but obviously not on alpha/sparc. Oooh. I wonder... This may be the name string copy loop. And it's special in that the result may not be aligned. Now, a "__put_user()" with an unaligned address _should_ work - it's very easy to trigger that from user space by just giving an unaligned address to any system call that then writes a single word. But alpha does #define __put_user_32(x, addr) \ __asm__ __volatile__("1: stl %r2,%1\n" \ "2:\n" \ EXC(1b,2b,$31,%0) \ : "=r"(__pu_err)\ : "m"(__m(addr)), "rJ"(x), "0"(__pu_err)) iow it implements that 32-bit __put_user() as a 'stl'. Which will trap if it's not aligned. And I wonder how much testing that has ever gotten. Nobody really does unaigned accesses on alpha. We need to do that memcpy unrolling on x86, because x86 actually uses "user_access_begin()" and we have magic rules about what is inside that region. But on alpha (and sparc) it might be better to just do "__copy_to_user()". Anyway, this does look like a possible latent bug where the alpha unaligned trap doesn't then handle the case of exceptions. I know it _tries_, but I doubt it's gotten a whole lot of testing. Anyway, let me think about this, but just for testing, does the attached patch make any difference? It's not the right thing in general (and most definitely not on x86), but for testing whether this is about unaligned accesses it might work. It's entirely untested, and in fact on x86 it should cause objtool to complain about a function call with AC set. But I think that on alpha and sparc, using __copy_to_user() for the name copy should work, and would work around the unaligned issue. That said, if it *is* the unaligned issue, then that just means that we have a serious bug elsewhere in the alpha port. Maybe nobody cares. Linus fs/readdir.c | 9 + 1 file changed, 9 insertions(+) diff --git a/fs/readdir.c b/fs/readdir.c index 19bea591c3f1..d49c4e2c66a8 100644 --- a/fs/readdir.c +++ b/fs/readdir.c @@ -76,6 +76,15 @@ unsafe_put_user(0, dst, label);\ } while (0) +/* Alpha (and sparc?) test patch! */ +#undef unsafe_copy_dirent_name +#define unsafe_copy_dirent_name(_dst, _src, _len, label) do { \ + char __user *dst = (_dst);\ + const char *src = (_src);\ + size_t len = (_len); \ + if (__copy_to_user(dst, src, len)) goto label; \ + unsafe_put_user(0, dst+len, label); \ +} while (0) int iterate_dir(struct file *file, struct dir_context *ctx) {
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck wrote: > > this patch causes all my sparc64 emulations to stall during boot. It causes > all alpha emulations to crash with [1a] and [1b] when booting from a virtual > disk, and one of the xtensa emulations to crash with [2]. Ho humm. I've run variations of that patch over a few years on x86, but obviously not on alpha/sparc. At least I should still be able to read alpha assembly, even after all these years. Would you mind sending me the result of make fs/readdir.s on alpha with the broken config? I'd hope that the sparc issue is the same. Actually, could you also do make fs/readdir.o and then send me the "objdump --disassemble" of that? That way I get the instruction offsets without having to count by hand. > Unable to handle kernel paging request at virtual address 0004 > rcS(47): Oops -1 > pc = [<0004>] ra = [] ps = Not tainted > pc is at 0x4 That is _funky_. I'm not seeing how it could possibly jump to 0x4, but it clearly does. That said, are you sure it's _that_ commit? Because this pattern: > a0 = fc0007dbca56 a1 = 2f2f2f2f2f2f2f2f a2 = 000a implicates the memchr('/') call in the next one. That's a word full of '/' characters. Of course, it could just be left-over register contents from that memchr(), but it makes me wonder. Particularly since it seems to happen early in filldir64(): > ra is at filldir64+0x64/0x320 which is just a fairly small handful of instructions in, and I wouldn't be shocked if that's the return address for the call to memchr. Linus
Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()
On Sat, May 21, 2016 at 09:59:07PM -0700, Linus Torvalds wrote: > We really should avoid the "__{get,put}_user()" functions entirely, > because they can easily be mis-used and the original intent of being > used for simple direct user accesses no longer holds in a post-SMAP/PAN > world. > > Manually optimizing away the user access range check makes no sense any > more, when the range check is generally much cheaper than the "enable > user accesses" code that the __{get,put}_user() functions still need. > > So instead of __put_user(), use the unsafe_put_user() interface with > user_access_{begin,end}() that really does generate better code these > days, and which is generally a nicer interface. Under some loads, the > multiple user writes that filldir() does are actually quite noticeable. > > This also makes the dirent name copy use unsafe_put_user() with a couple > of macros. We do not want to make function calls with SMAP/PAN > disabled, and the code this generates is quite good when the > architecture uses "asm goto" for unsafe_put_user() like x86 does. > > Note that this doesn't bother with the legacy cases. Nobody should use > them anyway, so performance doesn't really matter there. > > Signed-off-by: Linus Torvalds Linus, this patch causes all my sparc64 emulations to stall during boot. It causes all alpha emulations to crash with [1a] and [1b] when booting from a virtual disk, and one of the xtensa emulations to crash with [2]. Reverting this patch fixes the problem. Guenter --- [1a] Unable to handle kernel paging request at virtual address 0004 rcS(47): Oops -1 pc = [<0004>] ra = [] ps = Not tainted pc is at 0x4 ra is at filldir64+0x64/0x320 v0 = t0 = t1 = 000120117e8b t2 = 646e617275303253 t3 = 646e61727530 t4 = 7fe8 t5 = 000120117e78 t6 = t7 = fc0007ec8000 s0 = fc0007dbca56 s1 = 000a s2 = 0020 s3 = fc0007ecbec8 s4 = 0008 s5 = 0021 s6 = 1cd2631fe897bf5a a0 = fc0007dbca56 a1 = 2f2f2f2f2f2f2f2f a2 = 000a a3 = 1cd2631fe897bf5a a4 = 0021 a5 = 0008 t8 = 0020 t9 = t10= fc0007dbca60 t11= 0001 pv = fcb9a810 at = 0001 gp = fcf03930 sp = (ptrval) Disabling lock debugging due to kernel taint Trace: [] call_filldir+0xe8/0x1b0 [] ext4_readdir+0x924/0xa70 [] _raw_spin_unlock+0x18/0x30 [] __handle_mm_fault+0x9fc/0xc30 [] iterate_dir+0x198/0x240 [] iterate_dir+0x5c/0x240 [] ksys_getdents64+0xa8/0x160 [] sys_getdents64+0x20/0x40 [] filldir64+0x0/0x320 [] entSys+0xa4/0xc0 --- [1b] Unable to handle kernel paging request at virtual address 0004 reboot(50): Oops -1 pc = [<0004>] ra = [] ps = Tainted: G D pc is at 0x4 ra is at filldir64+0x64/0x320 v0 = t0 = 67736d6b t1 = 00012011445b t2 = t3 = t4 = 7ef8 t5 = 000120114448 t6 = t7 = fc0007eec000 s0 = fc000792b5c3 s1 = 0004 s2 = 0018 s3 = fc0007eefec8 s4 = 0008 s5 = f0a3 s6 = 000b a0 = fc000792b5c3 a1 = 2f2f2f2f2f2f2f2f a2 = 0004 a3 = 000b a4 = f0a3 a5 = 0008 t8 = 0018 t9 = t10= 22e1d02a t11= 00011f8fd3b8 pv = fcb9a810 at = 22e1ccf8 gp = fcf03930 sp = (ptrval) Trace: [] proc_readdir_de+0x170/0x300 [] filldir64+0x0/0x320 [] proc_root_readdir+0x3c/0x80 [] iterate_dir+0x198/0x240 [] ksys_getdents64+0xa8/0x160 [] sys_getdents64+0x20/0x40 [] filldir64+0x0/0x320 [] entSys+0xa4/0xc0 --- [2] Unable to handle kernel paging request at virtual address 0004 reboot(50): Oops -1 pc = [<0004>] ra = [] ps = Tainted: G D pc is at 0x4 ra is at filldir64+0x64/0x320 v0 = t0 = 67736d6b t1 = 00012011445b t2 = t3 = t4 = 7ef8 t5 = 000120114448 t6 = t7 = fc0007eec000 s0 = fc000792b5c3 s1 = 0004 s2 = 0018 s3 = fc0007eefec8 s4 = 0008 s5 = f0a3 s6 = 000b a0 = fc000792b5c3 a1 = 2f2f2f2f2f2f2f2f a2 = 0004 a3 = 000b a4 = f0a3 a5 = 0008 t8 = 0018 t9 = t10= 22e1d02a t11= 00011fd6f3b8 pv = fcb9a810 at = 22e1ccf8 gp = fcf03930 sp = (ptrval) Trace: [] proc_readdir_de+0x170/0x300 [] filldir64+0x0/0x320 [] proc_root_readdir+0x3c/0x80 [] iterate_dir+0x198/0x240 [] ksys_getdents64+0xa8/0x160 [] sys_getdents64+0x20/0x40 [] filldir64+0x0/0x320 [] entSys+0xa4/0xc0 Code: 00063301 07a3