subject:"Re\: \[PATCH\] Convert filldir\[64\]\(\) from __put_user\(\) to unsafe_put

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-16 Thread Al Viro

On Sun, Oct 13, 2019 at 12:22:38PM -0700, Linus Torvalds wrote:
> On Sun, Oct 13, 2019 at 12:10 PM Al Viro  wrote:
> >
> > No arguments re put_user_ex side of things...  Below is a completely
> > untested patch for get_user_ex elimination (it seems to build, but that's
> > it); in any case, I would really like to see comments from x86 folks
> > before it goes anywhere.
> 
> Please don't do this:
> 
> > +   if (unlikely(__copy_from_user(, usc, sizeof(sc
> > +   goto Efault;
> 
> Why would you use __copy_from_user()? Just don't.
> 
> > +   if (unlikely(__copy_from_user(, user_vm86,
> > +   offsetof(struct vm86_struct, int_revectored
> 
> Same here.
> 
> There's no excuse for __copy_from_user().

FWIW, callers of __copy_from_user() remaining in the generic code:

1) regset.h:user_regset_copyin().  Switch to copy_from_user(); the calling
conventions of regset ->set() (as well as the method name) are atrocious,
but there are too many instances to mix any work in that direction into
this series.  Yes, nominally it's an inline, but IRL it's too large and
has many callers in the same file(s), so any optimizations of inlining
__copy_from_user() will be lost and there's more than enough work done
there to make access_ok() a noise.  And in this case it doesn't pay
to try and lift user_access_begin() into the callers - the work done
between the calls is often too non-trivial to be done in such area.
The same goes for other regset.h stuff; eventually we might want to
try and come up with saner API, but that's a separate story.

2) default csum_partial_copy_from_user().  What we need to do is
turn it into default csum_and_copy_from_user().  This
#ifndef _HAVE_ARCH_COPY_AND_CSUM_FROM_USER
static inline
__wsum csum_and_copy_from_user (const void __user *src, void *dst,
  int len, __wsum sum, int *err_ptr)
{
if (access_ok(src, len))
return csum_partial_copy_from_user(src, dst, len, sum, err_ptr);

if (len)
*err_ptr = -EFAULT;

return sum;
}
#endif
in checksum.h is the only thing that calls that sucker and we can bloody
well combine them and make the users of lib/checksum.h define
_HAVE_ARCH_COPY_AND_CSUM_FROM_USER.  That puts us reasonably close
to having _HAVE_ARCH_COPY_AND_CSUM_FROM_USER unconditional and in any
case, __copy_from_user() in lib/checksum.h turns into copy_from_user().

3) firewire ioctl_queue_iso().  Convert to copy_from_user(), lose the
access_ok() before the loop.  Definitely not an unsafe_... situation
(we call fw_iso_context_queue() after each chunk; _not_ something
we want under user_access_begin()/user_access_end()) and it's really
not worth trying to save on access_ok() checks there.

4) pstore persistent_ram_update_user().  Obvious copy_from_user(); definitely
lose access_ok() in the caller (persistent_ram_write_user()), along with
the one in write_pmsg() (several calls back by the callchain).

5) test_kasan: lose the function, lose the tests...

6) drivers/scsi/sg.c nest: sg_read() ones are memdup_user() in disguise
(i.e. fold with immediately preceding kmalloc()s).  sg_new_write() -
fold with access_ok() into copy_from_user() (for both call sites).
sg_write() - lose access_ok(), use copy_from_user() (both call sites)
and get_user() (instead of the solitary __get_user() there).

7) i915 ones are, frankly, terrifying.  Consider e.g. this one:
relocs = kvmalloc_array(size, 1, GFP_KERNEL);
if (!relocs) {
err = -ENOMEM;
goto err;
}

/* copy_from_user is limited to < 4GiB */
copied = 0;
do {
unsigned int len =
min_t(u64, BIT_ULL(31), size - copied);

if (__copy_from_user((char *)relocs + copied,
 (char __user *)urelocs + copied,
 len))
goto end;

copied += len;
} while (copied < size);
Is that for real?  Are they *really* trying to allocate and copy >2Gb of
userland data?  That's eb_copy_relocations() and that crap is itself in
a loop.  Sizes come from user-supplied data.  WTF?  It's some weird
kmemdup lookalike and I'd rather heard from maintainers of that thing
before doing anything with it.

8) vhost_copy_from_user().  Need comments from mst - it's been a while since I 
crawled
through that code and I'd need his ACK anyway.  The logics with positioning of
access_ok() in there is non-trivial and I'm not sure how much of that serves
as early input validation and how much can be taken out and replaced by use of
place copy_from_user() and friends.

9) KVM.  There I'm not sure that access_ok() would be the right thing to
do.  kvm_is_error_hva() tends to serve as the range check in that and

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-15 Thread Al Viro

On Tue, Oct 15, 2019 at 08:40:12PM +0100, Al Viro wrote:
> or this
> static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset)
> {
> #ifdef ARCH_HAS_NOCACHE_UACCESS
> /*
>  * Using non-temporal mov to improve performance on non-cached
>  * writes, even though we aren't actually copying from user space.
>  */
> __copy_from_user_inatomic_nocache(offset, entry->buf, entry->len);
> #else   
> memcpy_toio(offset, entry->buf, entry->len);
> #endif
> 
> /* Ensure that the data is fully copied out before setting the flags 
> */
> wmb();
> 
> ntb_tx_copy_callback(entry, NULL);
> }
> "user" part is bollocks in both cases; moreover, I really wonder about that
> ifdef in ntb one - ARCH_HAS_NOCACHE_UACCESS is x86-only *at* *the* *moment*
> and it just so happens that ..._toio() doesn't require anything special on
> x86.  Have e.g. arm grow nocache stuff and the things will suddenly break,
> won't they?

Incidentally, there are two callers of __copy_from_user_inatomic_nocache() in
generic code:
lib/iov_iter.c:792: __copy_from_user_inatomic_nocache((to += 
v.iov_len) - v.iov_len,
lib/iov_iter.c:849: if (__copy_from_user_inatomic_nocache((to += 
v.iov_len) - v.iov_len,

Neither is done under under pagefault_disable(), AFAICS.  This one
drivers/gpu/drm/qxl/qxl_ioctl.c:189:unwritten = 
__copy_from_user_inatomic_nocache
probably is - it has something called qxl_bo_kmap_atomic_page() called just 
prior,
which would seem to imply kmap_atomic() somewhere in it.
The same goes for
drivers/gpu/drm/i915/i915_gem.c:500:unwritten = 
__copy_from_user_inatomic_nocache((void __force *)vaddr + offset,

So we have 5 callers anywhere.  Two are not "inatomic" in any sense; source is
in userspace and we want nocache behaviour.  Two _are_ done into a page that
had been fed through kmap_atomic(); the source is, again, in userland.  And
the last one is complete BS - it wants memcpy_toio_nocache() and abuses this
thing.

Incidentally, in case of fault i915 caller ends up unmapping the page,
mapping it non-atomic (with kmap?) and doing plain copy_from_user(),
nocache be damned.  qxl, OTOH, whines and fails all the way to userland...

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-15 Thread Al Viro

On Tue, Oct 15, 2019 at 12:00:34PM -0700, Linus Torvalds wrote:
> On Tue, Oct 15, 2019 at 11:08 AM Al Viro  wrote:
> >
> > Another question: right now we have
> > if (!access_ok(uaddr, sizeof(u32)))
> > return -EFAULT;
> >
> > ret = arch_futex_atomic_op_inuser(op, oparg, , uaddr);
> > if (ret)
> > return ret;
> > in kernel/futex.c.  Would there be any objections to moving access_ok()
> > inside the instances and moving pagefault_disable()/pagefault_enable() 
> > outside?
> 
> I think we should remove all the "atomic" versions, and just make the
> rule be that if you want atomic, you surround it with
> pagefault_disable()/pagefault_enable().

Umm...  I thought about that, but ended up with "it documents the intent" -
pagefault_disable() might be implicit (e.g. done by kmap_atomic()) or
several levels up the call chain.  Not sure.

> That covers not just the futex ops (where "atomic" is actually
> somewhat ambiguous - the ops themselves are atomic too, so the naming
> might stay, although arguably the "futex" part makes that pointless
> too), but also copy_to_user_inatomic() and the powerpc version of
> __get_user_inatomic().

Eh?  copy_to_user_inatomic() doesn't exist; __copy_to_user_inatomic()
does, but...

arch/mips/kernel/unaligned.c:1307:  res = 
__copy_to_user_inatomic(addr, fpr, sizeof(*fpr));
drivers/gpu/drm/i915/i915_gem.c:313:unwritten = 
__copy_to_user_inatomic(user_data,
lib/test_kasan.c:510:   unused = __copy_to_user_inatomic(usermem, kmem, size + 
1);
mm/maccess.c:98:ret = __copy_to_user_inatomic((__force void __user 
*)dst, src, size);

these are all callers it has left anywhere and I'm certainly going to kill it.
Now, __copy_from_user_inatomic() has a lot more callers left...  Frankly,
the messier part of API is the nocache side of things.  Consider e.g. this:
/* platform specific: cacheless copy */
static void cacheless_memcpy(void *dst, void *src, size_t n)
{
/* 
 * Use the only available X64 cacheless copy.  Add a __user cast
 * to quiet sparse.  The src agument is already in the kernel so
 * there are no security issues.  The extra fault recovery machinery
 * is not invoked.
 */
__copy_user_nocache(dst, (void __user *)src, n, 0);
}
or this
static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset)
{
#ifdef ARCH_HAS_NOCACHE_UACCESS
/*
 * Using non-temporal mov to improve performance on non-cached
 * writes, even though we aren't actually copying from user space.
 */
__copy_from_user_inatomic_nocache(offset, entry->buf, entry->len);
#else   
memcpy_toio(offset, entry->buf, entry->len);
#endif

/* Ensure that the data is fully copied out before setting the flags */
wmb();

ntb_tx_copy_callback(entry, NULL);
}
"user" part is bollocks in both cases; moreover, I really wonder about that
ifdef in ntb one - ARCH_HAS_NOCACHE_UACCESS is x86-only *at* *the* *moment*
and it just so happens that ..._toio() doesn't require anything special on
x86.  Have e.g. arm grow nocache stuff and the things will suddenly break,
won't they?

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-15 Thread Linus Torvalds

On Tue, Oct 15, 2019 at 11:08 AM Al Viro  wrote:
>
> Another question: right now we have
> if (!access_ok(uaddr, sizeof(u32)))
> return -EFAULT;
>
> ret = arch_futex_atomic_op_inuser(op, oparg, , uaddr);
> if (ret)
> return ret;
> in kernel/futex.c.  Would there be any objections to moving access_ok()
> inside the instances and moving pagefault_disable()/pagefault_enable() 
> outside?

I think we should remove all the "atomic" versions, and just make the
rule be that if you want atomic, you surround it with
pagefault_disable()/pagefault_enable().

That covers not just the futex ops (where "atomic" is actually
somewhat ambiguous - the ops themselves are atomic too, so the naming
might stay, although arguably the "futex" part makes that pointless
too), but also copy_to_user_inatomic() and the powerpc version of
__get_user_inatomic().

So we'd aim to get rid of all the "inatomic" ones entirely.

Same ultimately probably goes for the NMI versions. We should just
make it be a rule that we can use all of the user access functions
with pagefault_{dis,en}able() around them, and they'll be "safe" to
use in atomic context.

One issue with the NMI versions is that they actually want to avoid
the current value of set_fs().  So copy_from_user_nmi() (at least on
x86) is special in that it does

if (__range_not_ok(from, n, TASK_SIZE))
return n;

instead of access_ok() because of that issue.

NMI also has some other issues (nmi_uaccess_okay() on x86, at least),
but those *probably* could be handled at page fault time instead.

Anyway, NMI is so special that I'd suggest leaving it for later, but
the non-NMI atomic accesses I would suggest you clean up at the same
time.

I think the *only* reason we have the "inatomic()" versions is that
the regular ones do that "might_fault()" testing unconditionally, and
might_fault() _used_ to be just a might_sleep() - so it's not about
functionality per se, it's about "we have this sanity check that we
need to undo".

We've already made "might_fault()" look at pagefault_disabled(), so I
think a lot of the reasons for inatomic are entirely historical.

Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-15 Thread Al Viro

[futex folks and linux-arch Cc'd]

On Sun, Oct 13, 2019 at 08:59:49PM +0100, Al Viro wrote:

> Re plotting: how strongly would you object against passing the range to
> user_access_end()?  Powerpc folks have a very close analogue of stac/clac,
> currently buried inside their __get_user()/__put_user()/etc. - the same
> places where x86 does, including futex.h and friends.
> 
> And there it's even costlier than on x86.  It would obviously be nice
> to lift it at least out of unsafe_get_user()/unsafe_put_user() and
> move into user_access_begin()/user_access_end(); unfortunately, in
> one subarchitecture they really want it the range on the user_access_end()
> side as well.  That's obviously not fatal (they can bloody well save those
> into thread_info at user_access_begin()), but right now we have relatively
> few user_access_end() callers, so the interface changes are still possible.
> 
> Other architectures with similar stuff are riscv (no arguments, same
> as for stac/clac), arm (uaccess_save_and_enable() on the way in,
> return value passed to uaccess_restore() on the way out) and s390
> (similar to arm, but there it's needed only to deal with nesting,
> and I'm not sure it actually can happen).
> 
> It would be nice to settle the API while there are not too many users
> outside of arch/x86; changing it later will be a PITA and we definitely
> have architectures that do potentially costly things around the userland
> memory access; user_access_begin()/user_access_end() is in the right
> place to try and see if they fit there...

Another question: right now we have
if (!access_ok(uaddr, sizeof(u32)))
return -EFAULT;

ret = arch_futex_atomic_op_inuser(op, oparg, , uaddr);
if (ret)
return ret;
in kernel/futex.c.  Would there be any objections to moving access_ok()
inside the instances and moving pagefault_disable()/pagefault_enable() outside?

Reasons:
* on x86 that would allow folding access_ok() with STAC into
user_access_begin().  The same would be doable on other usual suspects
(arm, arm64, ppc, riscv, s390), bringing access_ok() next to their
STAC counterparts.
* pagefault_disable()/pagefault_enable() pair is universal on
all architectures, really meant to by the nature of the beast and
lifting it into kernel/futex.c would get the same situation as with
futex_atomic_cmpxchg_inatomic().  Which also does access_ok() inside
the primitive (also foldable into user_access_begin(), at that).
* access_ok() would be closer to actual memory access (and
out of the generic code).

Comments?

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-14 Thread Michael Ellerman

Linus Torvalds  writes:
> On Sun, Oct 13, 2019 at 12:59 PM Al Viro  wrote:
>> Re plotting: how strongly would you object against passing the range to
>> user_access_end()?  Powerpc folks have a very close analogue of stac/clac,
>> currently buried inside their __get_user()/__put_user()/etc. - the same
>> places where x86 does, including futex.h and friends.
>>
>> And there it's even costlier than on x86.  It would obviously be nice
>> to lift it at least out of unsafe_get_user()/unsafe_put_user() and
>> move into user_access_begin()/user_access_end(); unfortunately, in
>> one subarchitecture they really want it the range on the user_access_end()
>> side as well.
>
> Hmm. I'm ok with that.
>
> Do they want the actual range, or would it prefer some kind of opaque
> cookie that user_access_begin() returns (where 0 would mean "failure"
> of course)?

The code does want the actual range, or at least the range rounded to a
segment boundary (256MB).

But it can get that already from a value it stashes in current->thread,
it was just more natural to pass the addr/size with the way the code is
currently structured.

It seems to generate slightly better code to pass addr/size vs loading
it from current->thread, but it's probably in the noise vs everything
else that's going on.

So a cookie would work fine, we could return the encoded addr/size in
the cookie and that might generate better code than loading it back from
current->thread. Equally we could just use the value in current->thread
and not have any cookie at all.

cheers

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-13 Thread Linus Torvalds

On Sun, Oct 13, 2019 at 12:59 PM Al Viro  wrote:
>
> Re plotting: how strongly would you object against passing the range to
> user_access_end()?  Powerpc folks have a very close analogue of stac/clac,
> currently buried inside their __get_user()/__put_user()/etc. - the same
> places where x86 does, including futex.h and friends.
>
> And there it's even costlier than on x86.  It would obviously be nice
> to lift it at least out of unsafe_get_user()/unsafe_put_user() and
> move into user_access_begin()/user_access_end(); unfortunately, in
> one subarchitecture they really want it the range on the user_access_end()
> side as well.

Hmm. I'm ok with that.

Do they want the actual range, or would it prefer some kind of opaque
cookie that user_access_begin() returns (where 0 would mean "failure"
of course)?

I'm thinking like a local_irq_save/restore thing, which might be the
case on yet other architectures.

 Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-13 Thread Al Viro

On Sun, Oct 13, 2019 at 12:22:38PM -0700, Linus Torvalds wrote:
> On Sun, Oct 13, 2019 at 12:10 PM Al Viro  wrote:
> >
> > No arguments re put_user_ex side of things...  Below is a completely
> > untested patch for get_user_ex elimination (it seems to build, but that's
> > it); in any case, I would really like to see comments from x86 folks
> > before it goes anywhere.
> 
> Please don't do this:
> 
> > +   if (unlikely(__copy_from_user(, usc, sizeof(sc
> > +   goto Efault;
> 
> Why would you use __copy_from_user()? Just don't.
> 
> > +   if (unlikely(__copy_from_user(, user_vm86,
> > +   offsetof(struct vm86_struct, int_revectored
> 
> Same here.
> 
> There's no excuse for __copy_from_user().

Probably...  Said that, vm86 one is preceded by
if (!access_ok(user_vm86, plus ?
   sizeof(struct vm86_struct) :
   sizeof(struct vm86plus_struct)))
return -EFAULT;
so I didn't want to bother.  We'll need to eliminate most of
access_ok() anyway, and I figured that conversion to plain copy_from_user()
would go there as well.

Again, this is not a patch submission - just an illustration of what I meant
re getting rid of get_user_ex().  IOW, the whole thing is still in the
plotting stage.

Re plotting: how strongly would you object against passing the range to
user_access_end()?  Powerpc folks have a very close analogue of stac/clac,
currently buried inside their __get_user()/__put_user()/etc. - the same
places where x86 does, including futex.h and friends.

And there it's even costlier than on x86.  It would obviously be nice
to lift it at least out of unsafe_get_user()/unsafe_put_user() and
move into user_access_begin()/user_access_end(); unfortunately, in
one subarchitecture they really want it the range on the user_access_end()
side as well.  That's obviously not fatal (they can bloody well save those
into thread_info at user_access_begin()), but right now we have relatively
few user_access_end() callers, so the interface changes are still possible.

Other architectures with similar stuff are riscv (no arguments, same
as for stac/clac), arm (uaccess_save_and_enable() on the way in,
return value passed to uaccess_restore() on the way out) and s390
(similar to arm, but there it's needed only to deal with nesting,
and I'm not sure it actually can happen).

It would be nice to settle the API while there are not too many users
outside of arch/x86; changing it later will be a PITA and we definitely
have architectures that do potentially costly things around the userland
memory access; user_access_begin()/user_access_end() is in the right
place to try and see if they fit there...

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-13 Thread Linus Torvalds

On Sun, Oct 13, 2019 at 12:10 PM Al Viro  wrote:
>
> No arguments re put_user_ex side of things...  Below is a completely
> untested patch for get_user_ex elimination (it seems to build, but that's
> it); in any case, I would really like to see comments from x86 folks
> before it goes anywhere.

Please don't do this:

> +   if (unlikely(__copy_from_user(, usc, sizeof(sc
> +   goto Efault;

Why would you use __copy_from_user()? Just don't.

> +   if (unlikely(__copy_from_user(, user_vm86,
> +   offsetof(struct vm86_struct, int_revectored

Same here.

There's no excuse for __copy_from_user().

   Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-13 Thread Al Viro

On Sun, Oct 13, 2019 at 11:43:57AM -0700, Linus Torvalds wrote:
> On Sun, Oct 13, 2019 at 11:13 AM Al Viro  wrote:
> >
> > Umm...  TBH, I wonder if we would be better off if restore_sigcontext()
> > (i.e. sigreturn()/rt_sigreturn()) would flat-out copy_from_user() the
> > entire[*] struct sigcontext into a local variable and then copied fields
> > to pt_regs...
> 
> Probably ok., We've generally tried to avoid state that big on the
> stack, but you're right that it's shallow.
>
> > Same for do_sys_vm86(), perhaps.
> >
> > And these (32bit and 64bit restore_sigcontext() and do_sys_vm86())
> > are the only get_user_ex() users anywhere...
> 
> Yeah, that sounds like a solid strategy for getting rid of them.
> 
> Particularly since we can't really make get_user_ex() generate
> particularly good code (at least for now).
> 
> Now, put_user_ex() is a different thing - converting it to
> unsafe_put_user() actually does make it generate very good code - much
> better than copying data twice.

No arguments re put_user_ex side of things...  Below is a completely
untested patch for get_user_ex elimination (it seems to build, but that's
it); in any case, I would really like to see comments from x86 folks
before it goes anywhere.

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 1cee10091b9f..28a32ccc32de 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -46,59 +46,38 @@
 #define get_user_seg(seg)  ({ unsigned int v; savesegment(seg, v); v; })
 #define set_user_seg(seg, v)   loadsegment_##seg(v)
 
-#define COPY(x){   \
-   get_user_ex(regs->x, >x);   \
-}
-
-#define GET_SEG(seg)   ({  \
-   unsigned short tmp; \
-   get_user_ex(tmp, >seg); \
-   tmp;\
-})
+#define COPY(x) regs->x = sc.x
 
-#define COPY_SEG_CPL3(seg) do {\
-   regs->seg = GET_SEG(seg) | 3;   \
-} while (0)
+#define COPY_SEG_CPL3(seg) regs->seg = sc.seg | 3
 
 #define RELOAD_SEG(seg){   \
-   unsigned int pre = (seg) | 3;   \
+   unsigned int pre = sc.seg | 3;  \
unsigned int cur = get_user_seg(seg);   \
if (pre != cur) \
set_user_seg(seg, pre); \
 }
 
 static int ia32_restore_sigcontext(struct pt_regs *regs,
-  struct sigcontext_32 __user *sc)
+  struct sigcontext_32 __user *usc)
 {
-   unsigned int tmpflags, err = 0;
-   u16 gs, fs, es, ds;
-   void __user *buf;
-   u32 tmp;
+   struct sigcontext_32 sc;
 
/* Always make any pending restarted system calls return -EINTR */
current->restart_block.fn = do_no_restart_syscall;
 
-   get_user_try {
-   gs = GET_SEG(gs);
-   fs = GET_SEG(fs);
-   ds = GET_SEG(ds);
-   es = GET_SEG(es);
-
-   COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
-   COPY(dx); COPY(cx); COPY(ip); COPY(ax);
-   /* Don't touch extended registers */
+   if (unlikely(__copy_from_user(, usc, sizeof(sc
+   goto Efault;
 
-   COPY_SEG_CPL3(cs);
-   COPY_SEG_CPL3(ss);
+   COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
+   COPY(dx); COPY(cx); COPY(ip); COPY(ax);
+   /* Don't touch extended registers */
 
-   get_user_ex(tmpflags, >flags);
-   regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & 
FIX_EFLAGS);
-   /* disable syscall checks */
-   regs->orig_ax = -1;
+   COPY_SEG_CPL3(cs);
+   COPY_SEG_CPL3(ss);
 
-   get_user_ex(tmp, >fpstate);
-   buf = compat_ptr(tmp);
-   } get_user_catch(err);
+   regs->flags = (regs->flags & ~FIX_EFLAGS) | (sc.flags & FIX_EFLAGS);
+   /* disable syscall checks */
+   regs->orig_ax = -1;
 
/*
 * Reload fs and gs if they have changed in the signal
@@ -111,11 +90,15 @@ static int ia32_restore_sigcontext(struct pt_regs *regs,
RELOAD_SEG(ds);
RELOAD_SEG(es);
 
-   err |= fpu__restore_sig(buf, 1);
+   if (unlikely(fpu__restore_sig(compat_ptr(sc.fpstate), 1)))
+   goto Efault;
 
force_iret();
+   return 0;
 
-   return err;
+Efault:
+   force_iret();
+   return -EFAULT;
 }
 
 asmlinkage long sys32_sigreturn(void)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 61d93f062a36..ac81f06f8358 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -335,12 +335,9 @@ do {   
\
   "i" (errret), "0" (retval)); \
 })
 
-#define

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-13 Thread Linus Torvalds

On Sun, Oct 13, 2019 at 11:13 AM Al Viro  wrote:
>
> Umm...  TBH, I wonder if we would be better off if restore_sigcontext()
> (i.e. sigreturn()/rt_sigreturn()) would flat-out copy_from_user() the
> entire[*] struct sigcontext into a local variable and then copied fields
> to pt_regs...

Probably ok., We've generally tried to avoid state that big on the
stack, but you're right that it's shallow.

> Same for do_sys_vm86(), perhaps.
>
> And these (32bit and 64bit restore_sigcontext() and do_sys_vm86())
> are the only get_user_ex() users anywhere...

Yeah, that sounds like a solid strategy for getting rid of them.

Particularly since we can't really make get_user_ex() generate
particularly good code (at least for now).

Now, put_user_ex() is a different thing - converting it to
unsafe_put_user() actually does make it generate very good code - much
better than copying data twice.

   Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-13 Thread Al Viro

On Thu, Oct 10, 2019 at 05:31:13PM -0700, Linus Torvalds wrote:

> So the code actually needs to properly return the error early, or
> initialize the segments that didn't get loaded to 0, or something.
> 
> And when I posted that, Luto said "just get rid of the get_user_ex()
> entirely, instead of changing semantics of the existing ones to be
> sane.
> 
> Which is probably right. There aren't that many.
> 
> I *thought* there were also cases of us doing some questionably things
> inside the get_user_try sections, but those seem to have gotten fixed
> already independently, so it's really just the "make try/catch really
> try/catch" change that needs some editing of our current broken stuff
> that depends on it not actually *catching* exceptions, but on just
> continuing on to the next one.

Umm...  TBH, I wonder if we would be better off if restore_sigcontext()
(i.e. sigreturn()/rt_sigreturn()) would flat-out copy_from_user() the
entire[*] struct sigcontext into a local variable and then copied fields
to pt_regs...  The thing is small enough for not blowing the stack (256
bytes max. and it's on a shallow stack) and big enough to make "fancy
memcpy + let the compiler think how to combine in-kernel copies"
potentially better than hardwired sequence of 64bit loads/stores...

[*] OK, sans ->reserved part in the very end on 64bit.  192 bytes to
copy.

Same for do_sys_vm86(), perhaps - we want regs/flags/cpu_type and
screen_bitmap there, i.e. the beginning of struct vm86plus_struct
and of struct vm86_struct...  24*32bit.  IOW, 96-byte memcpy +
gcc-visible field-by-field copying vs. hardwired sequence of
32bit loads (with some 16bit ones thrown in, for extra fun) and
compiler told not to reorder anything.

And these (32bit and 64bit restore_sigcontext() and do_sys_vm86())
are the only get_user_ex() users anywhere...

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-10 Thread Linus Torvalds

On Thu, Oct 10, 2019 at 5:11 PM Al Viro  wrote:
>
> On Thu, Oct 10, 2019 at 03:12:49PM -0700, Linus Torvalds wrote:
>
> > But I've not gotten around to rewriting those disgusting sequences to
> > the unsafe_get/put_user() model. I did look at it, and it requires
> > some changes exactly *because* the _ex() functions are broken and
> > continue, but also because the current code ends up also doing other
> > things inside the try/catch region that you're not supposed to do in a
> > user_access_begin/end() region .
>
> Hmm...  Which one was that?  AFAICS, we have
> do_sys_vm86: only get_user_ex()
> restore_sigcontext(): get_user_ex(), set_user_gs()
> ia32_restore_sigcontext(): get_user_ex()

Try this patch.

It works fine (well, it worked fine the lastr time I tried this, I
might have screwed something up just now: I re-created the patch since
I hadn't saved it).

It's nice and clean, and does

 1 file changed, 9 insertions(+), 91 deletions(-)

by just deleting all the nasty *_ex() macros entirely, replacing them
with unsafe_get/put_user() calls.

And now those try/catch regions actually work like try/catch regions,
and a fault branches to the catch.

BUT.

It does change semantics, and you get warnings like

  arch/x86/ia32/ia32_signal.c: In function ‘ia32_restore_sigcontext’:
  arch/x86/ia32/ia32_signal.c:114:9: warning: ‘buf’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
114 |  err |= fpu__restore_sig(buf, 1);
| ^~~~
  arch/x86/ia32/ia32_signal.c:64:27: warning: ‘ds’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
 64 |  unsigned int pre = (seg) | 3;  \
|   ^
  arch/x86/ia32/ia32_signal.c:74:18: note: ‘ds’ was declared here
...
  arch/x86/kernel/signal.c: In function ‘restore_sigcontext’:
  arch/x86/kernel/signal.c:152:9: warning: ‘buf’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
152 |  err |= fpu__restore_sig(buf, IS_ENABLED(CONFIG_X86_32));
| ^~~~

because it's true: those things reall may not be initialized, because
the catch thing could have jumped out.

So the code actually needs to properly return the error early, or
initialize the segments that didn't get loaded to 0, or something.

And when I posted that, Luto said "just get rid of the get_user_ex()
entirely, instead of changing semantics of the existing ones to be
sane.

Which is probably right. There aren't that many.

I *thought* there were also cases of us doing some questionably things
inside the get_user_try sections, but those seem to have gotten fixed
already independently, so it's really just the "make try/catch really
try/catch" change that needs some editing of our current broken stuff
that depends on it not actually *catching* exceptions, but on just
continuing on to the next one.

Linus
 arch/x86/include/asm/uaccess.h | 100 -
 1 file changed, 9 insertions(+), 91 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 61d93f062a36..e87d8911dc53 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -193,23 +193,12 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
 		 : : "A" (x), "r" (addr)			\
 		 : : label)
 
-#define __put_user_asm_ex_u64(x, addr)	\
-	asm volatile("\n"		\
-		 "1:	movl %%eax,0(%1)\n"			\
-		 "2:	movl %%edx,4(%1)\n"			\
-		 "3:"		\
-		 _ASM_EXTABLE_EX(1b, 2b)\
-		 _ASM_EXTABLE_EX(2b, 3b)\
-		 : : "A" (x), "r" (addr))
-
 #define __put_user_x8(x, ptr, __ret_pu)\
 	asm volatile("call __put_user_8" : "=a" (__ret_pu)	\
 		 : "A" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx")
 #else
 #define __put_user_goto_u64(x, ptr, label) \
 	__put_user_goto(x, ptr, "q", "", "er", label)
-#define __put_user_asm_ex_u64(x, addr)	\
-	__put_user_asm_ex(x, addr, "q", "", "er")
 #define __put_user_x8(x, ptr, __ret_pu) __put_user_x(8, x, ptr, __ret_pu)
 #endif
 
@@ -289,31 +278,6 @@ do {	\
 	}\
 } while (0)
 
-/*
- * This doesn't do __uaccess_begin/end - the exception handling
- * around it must do that.
- */
-#define __put_user_size_ex(x, ptr, size)\
-do {	\
-	__chk_user_ptr(ptr);		\
-	switch (size) {			\
-	case 1:\
-		__put_user_asm_ex(x, ptr, "b", "b", "iq");		\
-		break;			\
-	case 2:\
-		__put_user_asm_ex(x, ptr, "w", "w", "ir");		\
-		break;			\
-	case 4:\
-		__put_user_asm_ex(x, ptr, "l", "k", "ir");		\
-		break;			\
-	case 8:\
-		__put_user_asm_ex_u64((__typeof__(*ptr))(x), ptr);	\
-		break;			\
-	default:			\
-		__put_user_bad();	\
-	}\
-} while (0)
-
 #ifdef CONFIG_X86_32
 #define __get_user_asm_u64(x, ptr, retval, errret)			\
 ({	\
@@ -334,13 +298,9 @@ do {	\
 		 : "m" (__m(__ptr)), "m"

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-10 Thread Al Viro

On Thu, Oct 10, 2019 at 03:12:49PM -0700, Linus Torvalds wrote:
> On Thu, Oct 10, 2019 at 12:55 PM Al Viro  wrote:
> >
> > Anyway, another question you way: what do you think of try/catch approaches
> > to __get_user() blocks, like e.g. restore_sigcontext() is doing?
> 
> I'd rather have them converted to our unsafe_get/put_user() instead.
> 
> We don't generate great code for the "get" case (because of how gcc
> doesn't allow us to mix "asm goto" and outputs), but I really despise
> the x86-specific "{get,put}_user_ex()" machinery. It's not actually
> doing a real try/catch at all, and will just keep taking faults if one
> happens.
> 
> But I've not gotten around to rewriting those disgusting sequences to
> the unsafe_get/put_user() model. I did look at it, and it requires
> some changes exactly *because* the _ex() functions are broken and
> continue, but also because the current code ends up also doing other
> things inside the try/catch region that you're not supposed to do in a
> user_access_begin/end() region .

Hmm...  Which one was that?  AFAICS, we have
do_sys_vm86: only get_user_ex()
restore_sigcontext(): get_user_ex(), set_user_gs()
ia32_restore_sigcontext(): get_user_ex()

So at least get_user_try/get_user_ex/get_user_catch should be killable.
The other side...
save_v86_state(): put_user_ex()
setup_sigcontext(): put_user_ex()
__setup_rt_frame(): put_user_ex(), static_cpu_has()
another one in __setup_rt_frame(): put_user_ex()
x32_setup_rt_frame(): put_user_ex()
ia32_setup_sigcontext(): put_user_ex()
ia32_setup_frame(): put_user_ex()
another one in ia32_setup_frame(): put_user_ex(), static_cpu_has()

IDGI...  Is static_cpu_has() not allowed in there?  Looks like it's all inlines
and doesn't do any potentially risky memory accesses...  What am I missing?

As for the try/catch model...  How about
if (!user_access_begin())
sod off
...
unsafe_get_user(..., l);
...
unsafe_get_user_nojump();
...
unsafe_get_user_nojump();
...
if (user_access_did_fail())
goto l;

user_access_end()
...
return 0;
l:
...
user_access_end()
return -EFAULT;

making it clear that we are delaying the check for failures until it's
more convenient.  And *not* trying to trick C parser into enforcing
anything - let objtool do it and to hell with do { and } while (0) in
magic macros.  Could be mixed with the normal unsafe_..._user() without
any problems, AFAICS...

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-10 Thread Linus Torvalds

On Thu, Oct 10, 2019 at 12:55 PM Al Viro  wrote:
>
> Anyway, another question you way: what do you think of try/catch approaches
> to __get_user() blocks, like e.g. restore_sigcontext() is doing?

I'd rather have them converted to our unsafe_get/put_user() instead.

We don't generate great code for the "get" case (because of how gcc
doesn't allow us to mix "asm goto" and outputs), but I really despise
the x86-specific "{get,put}_user_ex()" machinery. It's not actually
doing a real try/catch at all, and will just keep taking faults if one
happens.

But I've not gotten around to rewriting those disgusting sequences to
the unsafe_get/put_user() model. I did look at it, and it requires
some changes exactly *because* the _ex() functions are broken and
continue, but also because the current code ends up also doing other
things inside the try/catch region that you're not supposed to do in a
user_access_begin/end() region .

> Should that be available outside of arch/*?  For that matter, would
> it be a good idea to convert get_user_ex() users in arch/x86 to
> unsafe_get_user()?

See above: yes, it would be a good idea to convert to
unsafe_get/put_user(), and no, we don't want to expose the horrid
*_ex() model to other architectures.

  Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-10 Thread Al Viro

On Mon, Oct 07, 2019 at 09:24:17PM -0700, Linus Torvalds wrote:
> On Mon, Oct 7, 2019 at 9:09 PM Linus Torvalds
>  wrote:
> >
> > Try the attached patch, and then count the number of "rorx"
> > instructions in the kernel. Hint: not many. On my personal config,
> > this triggers 15 times in the whole kernel build (not counting
> > modules).
> 
> .. and four of them are in perf_callchain_user(), and are due to those
> "__copy_from_user_nmi()" with either 4-byte or 8-byte copies.
> 
> It might as well just use __get_user() instead.
> 
> The point being that the silly code in the header files is just
> pointless. We shouldn't do it.

FWIW, the one that looks the most potentiall sensitive in that bunch is
arch/x86/kvm/paging_tmpl.h:388: if (unlikely(__copy_from_user(, 
ptep_user, sizeof(pte
in the bowels of KVM page fault handling.  I would be very surprised if
the rest would be detectable...

Anyway, another question you way: what do you think of try/catch approaches
to __get_user() blocks, like e.g. restore_sigcontext() is doing?

Should that be available outside of arch/*?  For that matter, would
it be a good idea to convert get_user_ex() users in arch/x86 to
unsafe_get_user()?

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-08 Thread Al Viro

On Tue, Oct 08, 2019 at 08:58:58PM +0100, Al Viro wrote:

> The difference is, they have separate "for read" and "for write" primitives
> and they want the range in their user_access_end() analogue.  Separating
> the read and write isn't a problem for callers (we want them close to
> the actual memory accesses).  Passing the range to user_access_end() just
> might be tolerable, unless it makes you throw up...

NOTE: I'm *NOT* suggesting to bring back the VERIFY_READ/VERIFY_WRITE
argument to access_ok().  We'd gotten rid of it, and for a very good
reason (and decades overdue).

The main difference between access_ok() and user_access_begin() is that
the latter is right next to actual memory access, with user_access_end()
on the other side, also very close.  And most of those guys would be
concentrated in a few functions, where we bloody well know which
direction we are copying.

Even if we try and map ppc allow_..._to_user() on user_access_begin(),
access_ok() remains as it is (and I hope we'll get rid of the majority
of its caller in process).

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-08 Thread Al Viro

On Tue, Oct 08, 2019 at 08:58:58PM +0100, Al Viro wrote:

> That's powerpc.  And while the constant-sized bits are probably pretty
> useless there as well, note the 
> allow_read_from_user()/prevent_read_from_user()
> part.  Looks suspiciously similar to user_access_begin()/user_access_end()...
> 
> The difference is, they have separate "for read" and "for write" primitives
> and they want the range in their user_access_end() analogue.  Separating
> the read and write isn't a problem for callers (we want them close to
> the actual memory accesses).  Passing the range to user_access_end() just
> might be tolerable, unless it makes you throw up...

BTW, another related cleanup is futex_atomic_op_inuser() and
arch_futex_atomic_op_inuser().  In the former we have
if (!access_ok(uaddr, sizeof(u32)))
return -EFAULT;

ret = arch_futex_atomic_op_inuser(op, oparg, , uaddr);
if (ret)
return ret;
and in the latter we've got STAC/CLAC pairs stuck into inlined bits
on x86.  As well as allow_write_to_user(uaddr, sizeof(*uaddr)) on
ppc...

I don't see anything in x86 one objtool would've barfed if we pulled
STAC/CLAC out and turned access_ok() into user_access_begin(),
with matching user_access_end() right after the call of 
arch_futex_atomic_op_inuser().  Everything is inlined there and
no scary memory accesses would get into the scope (well, we do
have
if (!ret)
*oval = oldval;
in the very end of arch_futex_atomic_op_inuser() there, but oval
is the address of a local variable in the sole caller; if we run
with kernel stack on ring 3 page, we are deeply fucked *and*
wouldn't have survived that far into futex_atomic_op_inuser() anyway ;-)

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-08 Thread Al Viro

On Mon, Oct 07, 2019 at 11:26:35AM -0700, Linus Torvalds wrote:

> The good news is that right now x86 is the only architecture that does
> that user_access_begin(), so we don't need to worry about anything
> else. Apparently the ARM people haven't had enough performance
> problems with the PAN bit for them to care.

Take a look at this:
static inline unsigned long raw_copy_from_user(void *to,
const void __user *from, unsigned long n)
{
unsigned long ret;
if (__builtin_constant_p(n) && (n <= 8)) {
ret = 1;

switch (n) {
case 1:
barrier_nospec();
__get_user_size(*(u8 *)to, from, 1, ret);
break;
case 2:
barrier_nospec();
__get_user_size(*(u16 *)to, from, 2, ret);
break;
case 4:
barrier_nospec();
__get_user_size(*(u32 *)to, from, 4, ret);
break;
case 8:
barrier_nospec();
__get_user_size(*(u64 *)to, from, 8, ret);
break;
}
if (ret == 0)
return 0;
}

barrier_nospec();
allow_read_from_user(from, n);
ret = __copy_tofrom_user((__force void __user *)to, from, n);
prevent_read_from_user(from, n);
return ret;
}

That's powerpc.  And while the constant-sized bits are probably pretty
useless there as well, note the allow_read_from_user()/prevent_read_from_user()
part.  Looks suspiciously similar to user_access_begin()/user_access_end()...

The difference is, they have separate "for read" and "for write" primitives
and they want the range in their user_access_end() analogue.  Separating
the read and write isn't a problem for callers (we want them close to
the actual memory accesses).  Passing the range to user_access_end() just
might be tolerable, unless it makes you throw up...

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-08 Thread Al Viro

On Tue, Oct 08, 2019 at 05:38:31PM +0200, Greg KH wrote:
> On Tue, Oct 08, 2019 at 04:29:00PM +0100, Al Viro wrote:
> > On Tue, Oct 08, 2019 at 03:14:16PM +0200, Greg KH wrote:
> > > On Tue, Oct 08, 2019 at 05:57:12AM +0100, Al Viro wrote:
> > > > 
> > > > OK...  BTW, do you agree that the use of access_ok() in
> > > > drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong?  It's used as an early
> > > > cutoff, so we don't bother waiting if user has passed an obviously bogus
> > > > address.  copy_to_user() is used for actual copying there...
> > > 
> > > Yes, it's wrong, and not needed.  I'll go rip it out unless you want to?
> > 
> > I'll throw it into misc queue for now; it has no prereqs and nothing is 
> > going
> > to depend upon it.
> 
> Great, thanks.
> 
> > While looking for more of the same pattern: usb_device_read().  Frankly,
> > usb_device_dump() calling conventions look ugly - it smells like it
> > would be much happier as seq_file.  Iterator would take some massage,
> > but that seems to be doable.  Anyway, that's a separate story...
> 
> That's just a debugfs file, and yes, it should be moved to seq_file.  I
> think I tried it a long time ago, but given it's just a debugging thing,
> I gave up as it wasn't worth it.
> 
> But yes, the access_ok() there also seems odd, and should be dropped.

I'm almost tempted to keep it there as a reminder/grep fodder ;-)

Seriously, though, it might be useful to have a way of marking the places
in need of gentle repair of retrocranial inversions _without_ attracting
the "checkpatch warning of the week" crowd...

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-08 Thread Greg KH

On Tue, Oct 08, 2019 at 04:29:00PM +0100, Al Viro wrote:
> On Tue, Oct 08, 2019 at 03:14:16PM +0200, Greg KH wrote:
> > On Tue, Oct 08, 2019 at 05:57:12AM +0100, Al Viro wrote:
> > > 
> > >   OK...  BTW, do you agree that the use of access_ok() in
> > > drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong?  It's used as an early
> > > cutoff, so we don't bother waiting if user has passed an obviously bogus
> > > address.  copy_to_user() is used for actual copying there...
> > 
> > Yes, it's wrong, and not needed.  I'll go rip it out unless you want to?
> 
> I'll throw it into misc queue for now; it has no prereqs and nothing is going
> to depend upon it.

Great, thanks.

> While looking for more of the same pattern: usb_device_read().  Frankly,
> usb_device_dump() calling conventions look ugly - it smells like it
> would be much happier as seq_file.  Iterator would take some massage,
> but that seems to be doable.  Anyway, that's a separate story...

That's just a debugfs file, and yes, it should be moved to seq_file.  I
think I tried it a long time ago, but given it's just a debugging thing,
I gave up as it wasn't worth it.

But yes, the access_ok() there also seems odd, and should be dropped.

thanks,

greg k-h

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-08 Thread Al Viro

On Tue, Oct 08, 2019 at 03:14:16PM +0200, Greg KH wrote:
> On Tue, Oct 08, 2019 at 05:57:12AM +0100, Al Viro wrote:
> > 
> > OK...  BTW, do you agree that the use of access_ok() in
> > drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong?  It's used as an early
> > cutoff, so we don't bother waiting if user has passed an obviously bogus
> > address.  copy_to_user() is used for actual copying there...
> 
> Yes, it's wrong, and not needed.  I'll go rip it out unless you want to?

I'll throw it into misc queue for now; it has no prereqs and nothing is going
to depend upon it.

While looking for more of the same pattern: usb_device_read().  Frankly,
usb_device_dump() calling conventions look ugly - it smells like it
would be much happier as seq_file.  Iterator would take some massage,
but that seems to be doable.  Anyway, that's a separate story...

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-08 Thread Greg KH

On Tue, Oct 08, 2019 at 05:57:12AM +0100, Al Viro wrote:
> 
>   OK...  BTW, do you agree that the use of access_ok() in
> drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong?  It's used as an early
> cutoff, so we don't bother waiting if user has passed an obviously bogus
> address.  copy_to_user() is used for actual copying there...

Yes, it's wrong, and not needed.  I'll go rip it out unless you want to?

thanks,

greg k-h

RE: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-08 Thread David Laight

From: Linus Torvalds 
> Sent: 07 October 2019 19:11
...
> I've been very close to just removing __get_user/__put_user several
> times, exactly because people do completely the wrong thing with them
> - not speeding code up, but making it unsafe and buggy.

They could do the very simple check that 'user_ptr+size < kernel_base'
rather than the full window check under the assumption that access_ok()
has been called and that the likely errors are just overruns.

> The new "user_access_begin/end()" model is much better, but it also
> has actual STATIC checking that there are no function calls etc inside
> the region, so it forces you to do the loop properly and tightly, and
> not the incorrect "I checked the range somewhere else, now I'm doing
> an unsafe copy".
> 
> And it actually speeds things up, unlike the access_ok() games.

I've code that does:
if (!access_ok(...))
return -EFAULT;
...
for (...) {
if (__get_user(tmp_u64, user_ptr++))
return -EFAULT;
writeq(tmp_u64, io_ptr++);
}
(Although the code is more complex because not all transfers are multiples of 8 
bytes.)

With user_access_begin/end() I'd probably want to put the copy loop
inside a function (which will probably get inlined) to avoid convoluted
error processing.
So you end up with:
if (!user_access_ok())
return _EFAULT;
user_access_begin();
rval = do_copy_code(...);
user_access_end();
return rval;
Which, at the source level (at least) breaks your 'no function calls' rule.
The writeq() might also break it as well.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-08 Thread Geert Uytterhoeven

On Tue, Oct 8, 2019 at 1:30 AM Guenter Roeck  wrote:
> m68k:
>
> c2p_iplan2.c:(.text+0x98): undefined reference to `c2p_unsupported'
>
> I don't know the status.

Fall-out from the (non)inline optimization.  Patch available:
https://lore.kernel.org/lkml/20190927094708.11563-1-ge...@linux-m68k.org/

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Al Viro

On Mon, Oct 07, 2019 at 09:14:51PM -0700, Linus Torvalds wrote:
> On Mon, Oct 7, 2019 at 9:09 PM Linus Torvalds
>  wrote:
> >
> > Try the attached patch, and then count the number of "rorx"
> > instructions in the kernel. Hint: not many. On my personal config,
> > this triggers 15 times in the whole kernel build (not counting
> > modules).
> 
> So here's a serious patch that doesn't just mark things for counting -
> it just removes the cases entirely.
> 
> Doesn't this look nice:
> 
>   2 files changed, 2 insertions(+), 133 deletions(-)
> 
> and it is one less thing to worry about when doing further cleanup.
> 
> Seriously, if any of those __copy_{to,from}_user() constant cases were
> a big deal, we can turn them into get_user/put_user calls. But only
> after they show up as an actual performance issue.

Makes sense.  I'm not arguing against doing that.  Moreover, I suspect
that other architectures will be similar, at least once the
sigframe-related code for given architecture is dealt with.  But that's
more of a "let's look at that later" thing (hopefully with maintainers
of architectures getting involved).

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Al Viro

On Mon, Oct 07, 2019 at 09:09:14PM -0700, Linus Torvalds wrote:

> > 1) cross-architecture user_access_begin_dont_use(): on everything
> > except x86 it's empty, on x86 - __uaccess_begin_nospec().
> 
> No, just do a proper range check, and use user_access_begin()
> 
> Stop trying to optimize that range check away. It's a couple of fast
> instructions.
> 
> The only ones who don't want the range check are the actual kernel
> copy ones, but they don't want the user_access_begin() either.

Not at the first step.  Sure, in the end we want exactly that, and
we want it ASAP.  However, the main reason it grows into a tangled
mess that would be over the top for this cycle is the impacts in
arseload of places all over arch/*.

That way we can untangle those.  The initial segment that would
allow to use raw_copy_to_user() cleanly in readdir.c et.al.
could be done with provably zero impact on anything in arch/*
outside of arch/x86 usercopy-related code.

Moreover, it will be fairly small.  And after that the rest can
be done in any order, independent from each other.  I want to
kill __copy_... completely, and I believe we'll be able to do
just that in a cycle or two.

Once that is done, the helper disappears along with __copy_...().
And it will be documented as a temporary kludge, don't use
anywhere else, no matter what from the very beginning.  For
all the couple of cycles it'll take.

I'm serious about getting rid of __copy_...() in that timeframe.
There's not that much left.

The reason I don't want to do a blanket search-and-replace turning
them all into copy_...() is simply that their use is a good indicator
of code in need of serious beating^Wamount of careful review.

And hell, we might end up doing just that on case-by-case basis.
Often enough we will, by what I'd seen there...

Again, this kluge is only a splitup aid - by the end of the series
it's gone.  All it allows is to keep it easier to review.

Note, BTW, that bits and pieces converting a given pointless use
of __copy_...() to copy_...() can be reordered freely at any point
of the sequence - I've already got several.  _Some_ of (5) will
be conversions a-la readdir.c one and that has to follow (4), but
most of it won't be like that.

> > void *copy_mount_options(const void __user * data)
> > {
> > unsigned offs, size;
> > char *copy;
> >
> > if (!data)
> > return NULL;
> >
> > copy = kmalloc(PAGE_SIZE, GFP_KERNEL);
> > if (!copy)
> > return ERR_PTR(-ENOMEM);
> >
> > offs = (unsigned long)untagged_addr(data) & (PAGE_SIZE - 1);
> >
> > if (copy_from_user(copy, data, PAGE_SIZE - offs)) {
> > kfree(copy);
> > return ERR_PTR(-EFAULT);
> > }
> > if (offs) {
> > if (copy_from_user(copy, data + PAGE_SIZE - offs, offs))
> > memset(copy + PAGE_SIZE - offs, 0, offs);
> > }
> > return copy;
> > }
> >
> > on the theory that any fault halfway through a page means a race with
> > munmap/mprotect/etc. and we can just pretend we'd lost the race entirely.
> > And to hell with exact_copy_from_user(), byte-by-byte copying, etc.
> 
> Looks reasonable.

OK...  BTW, do you agree that the use of access_ok() in
drivers/tty/n_hdlc.c:n_hdlc_tty_read() is wrong?  It's used as an early
cutoff, so we don't bother waiting if user has passed an obviously bogus
address.  copy_to_user() is used for actual copying there...

There are other places following that pattern and IMO they are all
wrong.  Another variety is a half-arsed filter trying to prevent warnings
from too large (and user-controllable) kmalloc() of buffer we'll be
copying to.  Which is worth very little, since kmalloc() will scream and
fail well before access_ok() limits will trip.  Those need explicit capping
of the size, IMO...

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Linus Torvalds

On Mon, Oct 7, 2019 at 9:09 PM Linus Torvalds
 wrote:
>
> Try the attached patch, and then count the number of "rorx"
> instructions in the kernel. Hint: not many. On my personal config,
> this triggers 15 times in the whole kernel build (not counting
> modules).

.. and four of them are in perf_callchain_user(), and are due to those
"__copy_from_user_nmi()" with either 4-byte or 8-byte copies.

It might as well just use __get_user() instead.

The point being that the silly code in the header files is just
pointless. We shouldn't do it.

Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Linus Torvalds

On Mon, Oct 7, 2019 at 9:09 PM Linus Torvalds
 wrote:
>
> Try the attached patch, and then count the number of "rorx"
> instructions in the kernel. Hint: not many. On my personal config,
> this triggers 15 times in the whole kernel build (not counting
> modules).

So here's a serious patch that doesn't just mark things for counting -
it just removes the cases entirely.

Doesn't this look nice:

  2 files changed, 2 insertions(+), 133 deletions(-)

and it is one less thing to worry about when doing further cleanup.

Seriously, if any of those __copy_{to,from}_user() constant cases were
a big deal, we can turn them into get_user/put_user calls. But only
after they show up as an actual performance issue.

Linus
 arch/x86/include/asm/uaccess_32.h |  27 --
 arch/x86/include/asm/uaccess_64.h | 108 +-
 2 files changed, 2 insertions(+), 133 deletions(-)

diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index ba2dc1930630..388a40660c7b 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -23,33 +23,6 @@ raw_copy_to_user(void __user *to, const void *from, unsigned long n)
 static __always_inline unsigned long
 raw_copy_from_user(void *to, const void __user *from, unsigned long n)
 {
-	if (__builtin_constant_p(n)) {
-		unsigned long ret;
-
-		switch (n) {
-		case 1:
-			ret = 0;
-			__uaccess_begin_nospec();
-			__get_user_asm_nozero(*(u8 *)to, from, ret,
-	  "b", "b", "=q", 1);
-			__uaccess_end();
-			return ret;
-		case 2:
-			ret = 0;
-			__uaccess_begin_nospec();
-			__get_user_asm_nozero(*(u16 *)to, from, ret,
-	  "w", "w", "=r", 2);
-			__uaccess_end();
-			return ret;
-		case 4:
-			ret = 0;
-			__uaccess_begin_nospec();
-			__get_user_asm_nozero(*(u32 *)to, from, ret,
-	  "l", "k", "=r", 4);
-			__uaccess_end();
-			return ret;
-		}
-	}
 	return __copy_user_ll(to, (__force const void *)from, n);
 }
 
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 5cd1caa8bc65..bc10e3dc64fe 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -65,117 +65,13 @@ copy_to_user_mcsafe(void *to, const void *from, unsigned len)
 static __always_inline __must_check unsigned long
 raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 {
-	int ret = 0;
-
-	if (!__builtin_constant_p(size))
-		return copy_user_generic(dst, (__force void *)src, size);
-	switch (size) {
-	case 1:
-		__uaccess_begin_nospec();
-		__get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src,
-			  ret, "b", "b", "=q", 1);
-		__uaccess_end();
-		return ret;
-	case 2:
-		__uaccess_begin_nospec();
-		__get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src,
-			  ret, "w", "w", "=r", 2);
-		__uaccess_end();
-		return ret;
-	case 4:
-		__uaccess_begin_nospec();
-		__get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src,
-			  ret, "l", "k", "=r", 4);
-		__uaccess_end();
-		return ret;
-	case 8:
-		__uaccess_begin_nospec();
-		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
-			  ret, "q", "", "=r", 8);
-		__uaccess_end();
-		return ret;
-	case 10:
-		__uaccess_begin_nospec();
-		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
-			   ret, "q", "", "=r", 10);
-		if (likely(!ret))
-			__get_user_asm_nozero(*(u16 *)(8 + (char *)dst),
-   (u16 __user *)(8 + (char __user *)src),
-   ret, "w", "w", "=r", 2);
-		__uaccess_end();
-		return ret;
-	case 16:
-		__uaccess_begin_nospec();
-		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
-			   ret, "q", "", "=r", 16);
-		if (likely(!ret))
-			__get_user_asm_nozero(*(u64 *)(8 + (char *)dst),
-   (u64 __user *)(8 + (char __user *)src),
-   ret, "q", "", "=r", 8);
-		__uaccess_end();
-		return ret;
-	default:
-		return copy_user_generic(dst, (__force void *)src, size);
-	}
+	return copy_user_generic(dst, (__force void *)src, size);
 }
 
 static __always_inline __must_check unsigned long
 raw_copy_to_user(void __user *dst, const void *src, unsigned long size)
 {
-	int ret = 0;
-
-	if (!__builtin_constant_p(size))
-		return copy_user_generic((__force void *)dst, src, size);
-	switch (size) {
-	case 1:
-		__uaccess_begin();
-		__put_user_asm(*(u8 *)src, (u8 __user *)dst,
-			  ret, "b", "b", "iq", 1);
-		__uaccess_end();
-		return ret;
-	case 2:
-		__uaccess_begin();
-		__put_user_asm(*(u16 *)src, (u16 __user *)dst,
-			  ret, "w", "w", "ir", 2);
-		__uaccess_end();
-		return ret;
-	case 4:
-		__uaccess_begin();
-		__put_user_asm(*(u32 *)src, (u32 __user *)dst,
-			  ret, "l", "k", "ir", 4);
-		__uaccess_end();
-		return ret;
-	case 8:
-		__uaccess_begin();
-		__put_user_asm(*(u64 *)src, (u64 __user *)dst,
-			  ret, "q", "", "er", 8);
-		__uaccess_end();
-		return ret;
-	case 10:
-		__uaccess_begin();
-		__put_user_asm(*(u64 *)src, (u64 __user *)dst,
-			   ret, "q", "", "er", 10);
-		if (likely(!ret)) {
-

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Linus Torvalds

On Mon, Oct 7, 2019 at 8:29 PM Al Viro  wrote:
>
> For x86?  Sure, why not...  Note, BTW, that for short constant-sized
> copies we *do* STAC/CLAC at the call site - see those
> __uaccess_begin_nospec();
> in raw_copy_{from,to}_user() in the switches...

Yeah, an that code almost never actually triggers in practice. The
code is pointless and dead.

The thing is, it's only ever used for the double undescore versions,
and the ones that do have have it are almost never constant sizes in
the first place.

And yes, there's like a couple of cases in the whole kernel.

Just remove those constant size cases. They are pointless and just
complicate our headers and slow down the compile for no good reason.

Try the attached patch, and then count the number of "rorx"
instructions in the kernel. Hint: not many. On my personal config,
this triggers 15 times in the whole kernel build (not counting
modules).

It's not worth it. The "speedup" from using __copy_{to,from}_user()
with the fancy inlining is negligible. All the cost is in the
STAC/CLAC anyway, the code might as well be deleted.

> 1) cross-architecture user_access_begin_dont_use(): on everything
> except x86 it's empty, on x86 - __uaccess_begin_nospec().

No, just do a proper range check, and use user_access_begin()

Stop trying to optimize that range check away. It's a couple of fast
instructions.

The only ones who don't want the range check are the actual kernel
copy ones, but they don't want the user_access_begin() either.

> void *copy_mount_options(const void __user * data)
> {
> unsigned offs, size;
> char *copy;
>
> if (!data)
> return NULL;
>
> copy = kmalloc(PAGE_SIZE, GFP_KERNEL);
> if (!copy)
> return ERR_PTR(-ENOMEM);
>
> offs = (unsigned long)untagged_addr(data) & (PAGE_SIZE - 1);
>
> if (copy_from_user(copy, data, PAGE_SIZE - offs)) {
> kfree(copy);
> return ERR_PTR(-EFAULT);
> }
> if (offs) {
> if (copy_from_user(copy, data + PAGE_SIZE - offs, offs))
> memset(copy + PAGE_SIZE - offs, 0, offs);
> }
> return copy;
> }
>
> on the theory that any fault halfway through a page means a race with
> munmap/mprotect/etc. and we can just pretend we'd lost the race entirely.
> And to hell with exact_copy_from_user(), byte-by-byte copying, etc.

Looks reasonable.

  Linus
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 5cd1caa8bc65..db58c4436ce3 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -62,6 +62,8 @@ copy_to_user_mcsafe(void *to, const void *from, unsigned len)
 	return ret;
 }
 
+#define marker(x) asm volatile("rorx $" #x ",%rax,%rdx")
+
 static __always_inline __must_check unsigned long
 raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 {
@@ -72,30 +74,35 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 	switch (size) {
 	case 1:
 		__uaccess_begin_nospec();
+		marker(1);
 		__get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src,
 			  ret, "b", "b", "=q", 1);
 		__uaccess_end();
 		return ret;
 	case 2:
 		__uaccess_begin_nospec();
+		marker(2);
 		__get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src,
 			  ret, "w", "w", "=r", 2);
 		__uaccess_end();
 		return ret;
 	case 4:
 		__uaccess_begin_nospec();
+		marker(4);
 		__get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src,
 			  ret, "l", "k", "=r", 4);
 		__uaccess_end();
 		return ret;
 	case 8:
 		__uaccess_begin_nospec();
+		marker(8);
 		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
 			  ret, "q", "", "=r", 8);
 		__uaccess_end();
 		return ret;
 	case 10:
 		__uaccess_begin_nospec();
+		marker(10);
 		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
 			   ret, "q", "", "=r", 10);
 		if (likely(!ret))
@@ -106,6 +113,7 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 		return ret;
 	case 16:
 		__uaccess_begin_nospec();
+		marker(16);
 		__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
 			   ret, "q", "", "=r", 16);
 		if (likely(!ret))
@@ -129,30 +137,35 @@ raw_copy_to_user(void __user *dst, const void *src, unsigned long size)
 	switch (size) {
 	case 1:
 		__uaccess_begin();
+		marker(51);
 		__put_user_asm(*(u8 *)src, (u8 __user *)dst,
 			  ret, "b", "b", "iq", 1);
 		__uaccess_end();
 		return ret;
 	case 2:
 		__uaccess_begin();
+		marker(52);
 		__put_user_asm(*(u16 *)src, (u16 __user *)dst,
 			  ret, "w", "w", "ir", 2);
 		__uaccess_end();
 		return ret;
 	case 4:
 		__uaccess_begin();
+		marker(54);
 		__put_user_asm(*(u32 *)src, (u32 __user *)dst,
 			  ret, "l", "k", "ir", 4);
 		__uaccess_end();
 		return ret;
 	case 8:
 		__uaccess_begin();
+		marker(58);
 		__put_user_asm(*(u64 *)src, (u64 __user *)dst,
 			  ret, "q", "", "er", 8);

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Al Viro

On Mon, Oct 07, 2019 at 11:26:35AM -0700, Linus Torvalds wrote:

> But on x86, if we move the STAC/CLAC out of the low-level copy
> routines and into the callers, we'll have a _lot_ of churn. I thought
> it would be mostly a "teach objtool" thing, but we have lots of
> different versions of it. Not just the 32-bit vs 64-bit, it's embedded
> in all the low-level asm implementations.
> 
> And we don't want the regular "copy_to/from_user()" to then have to
> add the STAC/CLAC at the call-site. So then we'd want to un-inline
> copy_to_user() entirely.

For x86?  Sure, why not...  Note, BTW, that for short constant-sized
copies we *do* STAC/CLAC at the call site - see those
__uaccess_begin_nospec();
in raw_copy_{from,to}_user() in the switches...

> Which all sounds like a really good idea, don't get me wrong. I think
> we inline it way too aggressively now. But it'sa  _big_ job.
> 
> So we probably _should_
> 
>  - remove INLINE_COPY_TO/FROM_USER
> 
>  - remove all the "small constant size special cases".
> 
>  - make "raw_copy_to/from_user()" have the "unsafe" semantics and make
> the out-of-line copy in lib/usercopy.c be the only real interface
> 
>  - get rid of a _lot_ of oddities

Not that many, really.  All we need is a temporary cross-architecture
__uaccess_begin_nospec(), so that __copy_{to,from}_user() could have
that used, instead of having it done in (x86) raw_copy_..._...().

Other callers of raw_copy_...() would simply wrap it into user_access_begin()/
user_access_end() pairs; this kludge is needed only in __copy_from_user()
and __copy_to_user(), and only until we kill their callers outside of
arch/*.  Which we can do, in a cycle or two.  _ANY_ use of
that temporary kludge outside of those two functions will be grepped
for and LARTed into the ground.

> I hope you prove me wrong. But I'll look at a smaller change to just
> make x86 use the current special copy loop (as
> "unsafe_copy_to_user()") and have everybody else do the trivial
> wrapper.
> 
> Because we definitely should do that cleanup (it also fixes the whole
> "atomic copy in kernel space" issue that you pointed to that doesn't
> actually want STAC/CLAC at all), but it just looks fairly massive to
> me.

AFAICS, it splits nicely.

1) cross-architecture user_access_begin_dont_use(): on everything
except x86 it's empty, on x86 - __uaccess_begin_nospec().

2) stac/clac lifted into x86 raw_copy_..._user() out of
copy_user_generic_unrolled(), copy_user_generic_string() and
copy_user_enhanced_fast_string().  Similar lift out of
__copy_user_nocache().

3) lifting that thing as user_access_begin_dont_use() into
__copy_..._user...() and as user_access_begin() into other
generic callers, consuming access_ok() in the latter.
__copy_to_user_inatomic() can die at the same stage.

4) possibly uninlining on x86 (and yes, killing the special
size handling off).  We don't need to touch the inlining
decisions for any other architectures.

At that point raw_copy_to_user() is available for e.g.
readdir.c to play with.

And up to that point only x86 sees any kind of code changes,
so we don't have to worry about other architectures.

5) kill the __copy_...() users outside of arch/*, alone with
quite a few other weird shits in there.  A cycle or two,
with the final goal being to kill the damn things off.

6) arch/* users get arch-by-arch treatment - mostly
it's sigframe handling.  Won't affect the generic code
and would be independent for different architectures.
Can happen in parallel with (5), actually.

7) ... at that point user_access_begin_dont_user() gets
removed and thrown into the pile of mangled fingers of
those who'd ignored all warnings and used it somewhere
else.

I don't believe that (5) would be doable entirely in
this cycle, but quite a few bits might be.

On a somewhat related note, do you see any problems with

void *copy_mount_options(const void __user * data)
{
unsigned offs, size;
char *copy;

if (!data)
return NULL;

copy = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!copy)
return ERR_PTR(-ENOMEM);

offs = (unsigned long)untagged_addr(data) & (PAGE_SIZE - 1);

if (copy_from_user(copy, data, PAGE_SIZE - offs)) {
kfree(copy);
return ERR_PTR(-EFAULT);
}
if (offs) {
if (copy_from_user(copy, data + PAGE_SIZE - offs, offs))
memset(copy + PAGE_SIZE - offs, 0, offs);
}
return copy;
}

on the theory that any fault halfway through a page means a race with
munmap/mprotect/etc. and we can just pretend we'd lost the race entirely.
And to hell with exact_copy_from_user(), byte-by-byte copying, etc.

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Guenter Roeck


On 10/7/19 12:21 PM, Linus Torvalds wrote:

On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck  wrote:


this patch causes all my sparc64 emulations to stall during boot. It causes
all alpha emulations to crash with [1a] and [1b] when booting from a virtual
disk, and one of the xtensa emulations to crash with [2].


So I think your alpha emulation environment may be broken, because
Michael Cree reports that it works for him on real hardware, but he
does see the kernel unaligned count being high.

But regardless, this is my current fairly minimal patch that I think
should fix the unaligned issue, while still giving the behavior we
want on x86. I hope Al can do something nicer, but I think this is
"acceptable".

I'm running this now on x86, and I verified that x86-32 code
generation looks sane too, but it woudl be good to verify that this
makes the alignment issue go away on other architectures.

 Linus



Test results look good. Feel free to add
Tested-by: Guenter Roeck 
to your patch.

Build results:
total: 158 pass: 154 fail: 4
Failed builds:
arm:allmodconfig
m68k:defconfig
mips:allmodconfig
sparc64:allmodconfig
Qemu test results:
total: 391 pass: 390 fail: 1
Failed tests:
ppc64:mpc8544ds:ppc64_e5500_defconfig:nosmp:initrd

This is with "regulator: fixed: Prevent NULL pointer dereference when 
!CONFIG_OF"
applied as well. The other failures are unrelated.

arm:

arch/arm/crypto/aes-ce-core.S:299: Error:
selected processor does not support `movw 
ip,:lower16:.Lcts_permute_table' in ARM mode

Fix is pending in crypto tree.

m68k:

c2p_iplan2.c:(.text+0x98): undefined reference to `c2p_unsupported'

I don't know the status.

mips:

drivers/staging/octeon/ethernet-defines.h:30:38: error:
'CONFIG_CAVIUM_OCTEON_CVMSEG_SIZE' undeclared
and other similar errors. I don't know the status.

ppc64:

powerpc64-linux-ld: mm/page_alloc.o:(.toc+0x18): undefined reference to 
`node_reclaim_distance'

Reported against offending patch earlier today.

sparc64:

drivers/watchdog/cpwd.c:500:19: error: 'compat_ptr_ioctl' undeclared here

Oops. I'll need to look into that. Looks like the patch to use a new
infrastructure made it into the kernel but the infrastructure itself
didn't make it after all.

Guenter

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Guenter Roeck

On Mon, Oct 07, 2019 at 12:21:25PM -0700, Linus Torvalds wrote:
> On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck  wrote:
> >
> > this patch causes all my sparc64 emulations to stall during boot. It causes
> > all alpha emulations to crash with [1a] and [1b] when booting from a virtual
> > disk, and one of the xtensa emulations to crash with [2].
> 
> So I think your alpha emulation environment may be broken, because
> Michael Cree reports that it works for him on real hardware, but he
> does see the kernel unaligned count being high.
> 
Yes, that possibility always exists, unfortunately.

> But regardless, this is my current fairly minimal patch that I think
> should fix the unaligned issue, while still giving the behavior we
> want on x86. I hope Al can do something nicer, but I think this is
> "acceptable".
> 
> I'm running this now on x86, and I verified that x86-32 code
> generation looks sane too, but it woudl be good to verify that this
> makes the alignment issue go away on other architectures.
> 
> Linus

I started a complete test run with the patch applied. I'll let you know
how it went after it is complete - it should be done in a couple of hours.

Guenter

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Linus Torvalds

On Mon, Oct 7, 2019 at 12:49 PM Tony Luck  wrote:
>
> If PSR.ac is set, we trap. If it isn't set, then model specific
> (though all implementations will
> trap for an unaligned access that crosses a 4K boundary).

Ok. At that point, setting AC unconditionally is the better model just
to get test coverage for "it will trap occasionally anyway".

Odd "almost-but-not-quite x86" both in naming and in behavior (AC was
a no-op in kernel-mode until SMAP).

> Your patch does make all the messages go away.
>
> Tested-by: Tony Luck 

Ok, I'll commit it, and we'll see what Al can come up with that might
be a bigger cleanup.

 Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Tony Luck

On Mon, Oct 7, 2019 at 12:09 PM Linus Torvalds
 wrote:
> Hmm? I thought ia64 did unaligneds ok.

If PSR.ac is set, we trap. If it isn't set, then model specific
(though all implementations will
trap for an unaligned access that crosses a 4K boundary).

Linux sets PSR.ac. Applications can use prctl(PR_SET_UNALIGN) to choose whether
they want the kernel to silently fix things or to send SIGBUS.

Kernel always noisily (rate limited) fixes up unaligned access.

Your patch does make all the messages go away.

Tested-by: Tony Luck

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Linus Torvalds

On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck  wrote:
>
> this patch causes all my sparc64 emulations to stall during boot. It causes
> all alpha emulations to crash with [1a] and [1b] when booting from a virtual
> disk, and one of the xtensa emulations to crash with [2].

So I think your alpha emulation environment may be broken, because
Michael Cree reports that it works for him on real hardware, but he
does see the kernel unaligned count being high.

But regardless, this is my current fairly minimal patch that I think
should fix the unaligned issue, while still giving the behavior we
want on x86. I hope Al can do something nicer, but I think this is
"acceptable".

I'm running this now on x86, and I verified that x86-32 code
generation looks sane too, but it woudl be good to verify that this
makes the alignment issue go away on other architectures.

Linus
 arch/x86/include/asm/uaccess.h | 23 ++
 fs/readdir.c   | 44 ++
 include/linux/uaccess.h|  6 --
 3 files changed, 29 insertions(+), 44 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 35c225ede0e4..61d93f062a36 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -734,5 +734,28 @@ do {		\
 	if (unlikely(__gu_err)) goto err_label;	\
 } while (0)
 
+/*
+ * We want the unsafe accessors to always be inlined and use
+ * the error labels - thus the macro games.
+ */
+#define unsafe_copy_loop(dst, src, len, type, label)			\
+	while (len >= sizeof(type)) {	\
+		unsafe_put_user(*(type *)src,(type __user *)dst,label);	\
+		dst += sizeof(type);	\
+		src += sizeof(type);	\
+		len -= sizeof(type);	\
+	}
+
+#define unsafe_copy_to_user(_dst,_src,_len,label)			\
+do {	\
+	char __user *__ucu_dst = (_dst);\
+	const char *__ucu_src = (_src);	\
+	size_t __ucu_len = (_len);	\
+	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label);	\
+	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label);	\
+	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label);	\
+	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label);	\
+} while (0)
+
 #endif /* _ASM_X86_UACCESS_H */
 
diff --git a/fs/readdir.c b/fs/readdir.c
index 19bea591c3f1..6e2623e57b2e 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -27,53 +27,13 @@
 /*
  * Note the "unsafe_put_user() semantics: we goto a
  * label for errors.
- *
- * Also note how we use a "while()" loop here, even though
- * only the biggest size needs to loop. The compiler (well,
- * at least gcc) is smart enough to turn the smaller sizes
- * into just if-statements, and this way we don't need to
- * care whether 'u64' or 'u32' is the biggest size.
- */
-#define unsafe_copy_loop(dst, src, len, type, label) 		\
-	while (len >= sizeof(type)) {\
-		unsafe_put_user(get_unaligned((type *)src),	\
-			(type __user *)dst, label);		\
-		dst += sizeof(type);\
-		src += sizeof(type);\
-		len -= sizeof(type);\
-	}
-
-/*
- * We avoid doing 64-bit copies on 32-bit architectures. They
- * might be better, but the component names are mostly small,
- * and the 64-bit cases can end up being much more complex and
- * put much more register pressure on the code, so it's likely
- * not worth the pain of unaligned accesses etc.
- *
- * So limit the copies to "unsigned long" size. I did verify
- * that at least the x86-32 case is ok without this limiting,
- * but I worry about random other legacy 32-bit cases that
- * might not do as well.
- */
-#define unsafe_copy_type(dst, src, len, type, label) do {	\
-	if (sizeof(type) <= sizeof(unsigned long))		\
-		unsafe_copy_loop(dst, src, len, type, label);	\
-} while (0)
-
-/*
- * Copy the dirent name to user space, and NUL-terminate
- * it. This should not be a function call, since we're doing
- * the copy inside a "user_access_begin/end()" section.
  */
 #define unsafe_copy_dirent_name(_dst, _src, _len, label) do {	\
 	char __user *dst = (_dst);\
 	const char *src = (_src);\
 	size_t len = (_len);	\
-	unsafe_copy_type(dst, src, len, u64, label);	 	\
-	unsafe_copy_type(dst, src, len, u32, label);		\
-	unsafe_copy_type(dst, src, len, u16, label);		\
-	unsafe_copy_type(dst, src, len, u8,  label);		\
-	unsafe_put_user(0, dst, label);\
+	unsafe_put_user(0, dst+len, label);			\
+	unsafe_copy_to_user(dst, src, len, label);		\
 } while (0)
 
 
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index e47d0522a1f4..d4ee6e942562 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -355,8 +355,10 @@ extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
 #ifndef user_access_begin
 #define user_access_begin(ptr,len) access_ok(ptr, len)
 #define user_access_end() do { } while (0)
-#define unsafe_get_user(x, ptr, err) do { if (unlikely(__get_user(x, ptr))) goto err; } while (0)
-#define unsafe_put_user(x, ptr, err)

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Linus Torvalds

On Mon, Oct 7, 2019 at 11:36 AM Tony Luck  wrote:
>
> Late to this party ,,, but my ia64 console today is full of:

Hmm? I thought ia64 did unaligneds ok.

But regardless, this is my current (as yet untested) patch.  This is
not the big user access cleanup that I hope Al will do, this is just a
"ok, x86 is the only one who wants a special unsafe_copy_to_user()
right now" patch.

Linus
 arch/x86/include/asm/uaccess.h | 23 ++
 fs/readdir.c   | 44 ++
 include/linux/uaccess.h|  6 --
 3 files changed, 29 insertions(+), 44 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 35c225ede0e4..61d93f062a36 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -734,5 +734,28 @@ do {		\
 	if (unlikely(__gu_err)) goto err_label;	\
 } while (0)
 
+/*
+ * We want the unsafe accessors to always be inlined and use
+ * the error labels - thus the macro games.
+ */
+#define unsafe_copy_loop(dst, src, len, type, label)			\
+	while (len >= sizeof(type)) {	\
+		unsafe_put_user(*(type *)src,(type __user *)dst,label);	\
+		dst += sizeof(type);	\
+		src += sizeof(type);	\
+		len -= sizeof(type);	\
+	}
+
+#define unsafe_copy_to_user(_dst,_src,_len,label)			\
+do {	\
+	char __user *__ucu_dst = (_dst);\
+	const char *__ucu_src = (_src);	\
+	size_t __ucu_len = (_len);	\
+	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label);	\
+	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label);	\
+	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label);	\
+	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label);	\
+} while (0)
+
 #endif /* _ASM_X86_UACCESS_H */
 
diff --git a/fs/readdir.c b/fs/readdir.c
index 19bea591c3f1..6e2623e57b2e 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -27,53 +27,13 @@
 /*
  * Note the "unsafe_put_user() semantics: we goto a
  * label for errors.
- *
- * Also note how we use a "while()" loop here, even though
- * only the biggest size needs to loop. The compiler (well,
- * at least gcc) is smart enough to turn the smaller sizes
- * into just if-statements, and this way we don't need to
- * care whether 'u64' or 'u32' is the biggest size.
- */
-#define unsafe_copy_loop(dst, src, len, type, label) 		\
-	while (len >= sizeof(type)) {\
-		unsafe_put_user(get_unaligned((type *)src),	\
-			(type __user *)dst, label);		\
-		dst += sizeof(type);\
-		src += sizeof(type);\
-		len -= sizeof(type);\
-	}
-
-/*
- * We avoid doing 64-bit copies on 32-bit architectures. They
- * might be better, but the component names are mostly small,
- * and the 64-bit cases can end up being much more complex and
- * put much more register pressure on the code, so it's likely
- * not worth the pain of unaligned accesses etc.
- *
- * So limit the copies to "unsigned long" size. I did verify
- * that at least the x86-32 case is ok without this limiting,
- * but I worry about random other legacy 32-bit cases that
- * might not do as well.
- */
-#define unsafe_copy_type(dst, src, len, type, label) do {	\
-	if (sizeof(type) <= sizeof(unsigned long))		\
-		unsafe_copy_loop(dst, src, len, type, label);	\
-} while (0)
-
-/*
- * Copy the dirent name to user space, and NUL-terminate
- * it. This should not be a function call, since we're doing
- * the copy inside a "user_access_begin/end()" section.
  */
 #define unsafe_copy_dirent_name(_dst, _src, _len, label) do {	\
 	char __user *dst = (_dst);\
 	const char *src = (_src);\
 	size_t len = (_len);	\
-	unsafe_copy_type(dst, src, len, u64, label);	 	\
-	unsafe_copy_type(dst, src, len, u32, label);		\
-	unsafe_copy_type(dst, src, len, u16, label);		\
-	unsafe_copy_type(dst, src, len, u8,  label);		\
-	unsafe_put_user(0, dst, label);\
+	unsafe_put_user(0, dst+len, label);			\
+	unsafe_copy_to_user(dst, src, len, label);		\
 } while (0)
 
 
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index e47d0522a1f4..d4ee6e942562 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -355,8 +355,10 @@ extern long strnlen_unsafe_user(const void __user *unsafe_addr, long count);
 #ifndef user_access_begin
 #define user_access_begin(ptr,len) access_ok(ptr, len)
 #define user_access_end() do { } while (0)
-#define unsafe_get_user(x, ptr, err) do { if (unlikely(__get_user(x, ptr))) goto err; } while (0)
-#define unsafe_put_user(x, ptr, err) do { if (unlikely(__put_user(x, ptr))) goto err; } while (0)
+#define unsafe_op_wrap(op, err) do { if (unlikely(op)) goto err; } while (0)
+#define unsafe_get_user(x,p,e) unsafe_op_wrap(__get_user(x,p),e)
+#define unsafe_put_user(x,p,e) unsafe_op_wrap(__put_user(x,p),e)
+#define unsafe_copy_to_user(d,s,l,e) unsafe_op_wrap(__copy_to_user(d,s,l),e)
 static inline unsigned long user_access_save(void) { return 0UL; }
 static inline void user_access_restore(unsigned long flags) {

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Tony Luck

On Mon, Oct 7, 2019 at 11:28 AM Linus Torvalds
 wrote:
>
> On Sun, Oct 6, 2019 at 8:11 PM Linus Torvalds
>  wrote:
> >
> > >
> > > The last two should just do user_access_begin()/user_access_end()
> > > instead of access_ok().  __copy_to_user_inatomic() has very few callers 
> > > as well:
> >
> > Yeah, good points.
>
> Looking at it some more this morning, I think it's actually pretty painful.

Late to this party ,,, but my ia64 console today is full of:

irqbalance(5244): unaligned access to 0x200800042f9b, ip=0xa001002fef90
irqbalance(5244): unaligned access to 0x200800042fbb, ip=0xa001002fef90
irqbalance(5244): unaligned access to 0x200800042fdb, ip=0xa001002fef90
irqbalance(5244): unaligned access to 0x200800042ffb, ip=0xa001002fef90
irqbalance(5244): unaligned access to 0x20080004301b, ip=0xa001002fef90
ia64_handle_unaligned: 95 callbacks suppressed
irqbalance(5244): unaligned access to 0x200800042f9b, ip=0xa001002fef90
irqbalance(5244): unaligned access to 0x200800042fbb, ip=0xa001002fef90
irqbalance(5244): unaligned access to 0x200800042fdb, ip=0xa001002fef90
irqbalance(5244): unaligned access to 0x200800042ffb, ip=0xa001002fef90
irqbalance(5244): unaligned access to 0x20080004301b, ip=0xa001002fef90
ia64_handle_unaligned: 95 callbacks suppressed

Those ip's point into filldir64()

-Tony

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Linus Torvalds

On Sun, Oct 6, 2019 at 8:11 PM Linus Torvalds
 wrote:
>
> >
> > The last two should just do user_access_begin()/user_access_end()
> > instead of access_ok().  __copy_to_user_inatomic() has very few callers as 
> > well:
>
> Yeah, good points.

Looking at it some more this morning, I think it's actually pretty painful.

The good news is that right now x86 is the only architecture that does
that user_access_begin(), so we don't need to worry about anything
else. Apparently the ARM people haven't had enough performance
problems with the PAN bit for them to care.

We can have a fallback wrapper for unsafe_copy_to_user() for other
architectures that just does the __copy_to_user().

But on x86, if we move the STAC/CLAC out of the low-level copy
routines and into the callers, we'll have a _lot_ of churn. I thought
it would be mostly a "teach objtool" thing, but we have lots of
different versions of it. Not just the 32-bit vs 64-bit, it's embedded
in all the low-level asm implementations.

And we don't want the regular "copy_to/from_user()" to then have to
add the STAC/CLAC at the call-site. So then we'd want to un-inline
copy_to_user() entirely.

Which all sounds like a really good idea, don't get me wrong. I think
we inline it way too aggressively now. But it'sa  _big_ job.

So we probably _should_

 - remove INLINE_COPY_TO/FROM_USER

 - remove all the "small constant size special cases".

 - make "raw_copy_to/from_user()" have the "unsafe" semantics and make
the out-of-line copy in lib/usercopy.c be the only real interface

 - get rid of a _lot_ of oddities

but looking at just how much churn this is, I suspect that for 5.4
it's a bit late to do quite that much cleanup.

I hope you prove me wrong. But I'll look at a smaller change to just
make x86 use the current special copy loop (as
"unsafe_copy_to_user()") and have everybody else do the trivial
wrapper.

Because we definitely should do that cleanup (it also fixes the whole
"atomic copy in kernel space" issue that you pointed to that doesn't
actually want STAC/CLAC at all), but it just looks fairly massive to
me.

Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Al Viro

On Mon, Oct 07, 2019 at 11:13:27AM -0700, Linus Torvalds wrote:
> On Mon, Oct 7, 2019 at 10:34 AM Al Viro  wrote:
> >
> > Tangentially related: copy_regster_to_user() and copy_regset_from_user().
> 
> Not a worry. It's not performance-critical code, and if it ever is, it
> needs to be rewritten anyway.
> 
> > The former variant tends to lead to few calls
> > of __copy_{to,from}_user(); the latter...  On x86 it ends up doing
> > this:
> 
> Just replace the __put_user() with a put_user() and be done with it.
> That code isn't acceptable, and if somebody ever complains about
> performance it's not the lack of __put_user that is the problem.

I wonder if it would be better off switching to several "copy in bulk"
like e.g. ppc does.  That boilerplate with parallel "to/from kernel"
and "to/from userland" loops is asking for bugs - the calling
conventions like "pass kbuf and ubuf; exactly one must be NULL"
tend to be trouble, IME; I'm not saying we should just pass
struct iov_iter * instead of count+pos+kbuf+ubuf to ->get() and
->set(), but it might clean the things up nicely.

Let me look into that zoo a bit more...

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Linus Torvalds

On Mon, Oct 7, 2019 at 10:34 AM Al Viro  wrote:
>
> Tangentially related: copy_regster_to_user() and copy_regset_from_user().

Not a worry. It's not performance-critical code, and if it ever is, it
needs to be rewritten anyway.

> The former variant tends to lead to few calls
> of __copy_{to,from}_user(); the latter...  On x86 it ends up doing
> this:

Just replace the __put_user() with a put_user() and be done with it.
That code isn't acceptable, and if somebody ever complains about
performance it's not the lack of __put_user that is the problem.

   Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Linus Torvalds

On Mon, Oct 7, 2019 at 8:40 AM David Laight  wrote:
>
> You don't really want an extra access_ok() for every 'word' of a copy.

Yes you do.

> Some copies have to be done a word at a time.

Completely immaterial. If you can't see the access_ok() close to the
__get/put_user(), you have a bug.

Plus the access_ok() is cheap. The real cost is the STAC/CLAC.

So stop with the access_ok() "optimizations". They are broken garbage.

Really.

I've been very close to just removing __get_user/__put_user several
times, exactly because people do completely the wrong thing with them
- not speeding code up, but making it unsafe and buggy.

The new "user_access_begin/end()" model is much better, but it also
has actual STATIC checking that there are no function calls etc inside
th4e region, so it forces you to do the loop properly and tightly, and
not the incorrect "I checked the range somewhere else, now I'm doing
an unsafe copy".

And it actually speeds things up, unlike the access_ok() games.

   Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Al Viro

On Sun, Oct 06, 2019 at 08:11:42PM -0700, Linus Torvalds wrote:

> > So do we want to bother with separation between raw_copy_to_user() and
> > unsafe_copy_to_user()?  After all, __copy_to_user() also has only few
> > callers, most of them in arch/*
> 
> No, you're right. Just switch over.
> 
> > I'll take a look into that tomorrow - half-asleep right now...
> 
> Thanks. No huge hurry.

Tangentially related: copy_regster_to_user() and copy_regset_from_user().
That's where we do access_ok(), followed by calls of ->get() and
->set() resp.  Those tend to either use user_regset_copy{out,in}(),
or open-code those.  The former variant tends to lead to few calls
of __copy_{to,from}_user(); the latter...  On x86 it ends up doing
this:
static int genregs_get(struct task_struct *target,
   const struct user_regset *regset,
   unsigned int pos, unsigned int count,
   void *kbuf, void __user *ubuf)
{
if (kbuf) {
unsigned long *k = kbuf;
while (count >= sizeof(*k)) {
*k++ = getreg(target, pos);
count -= sizeof(*k);
pos += sizeof(*k);
}
} else {
unsigned long __user *u = ubuf;
while (count >= sizeof(*u)) {
if (__put_user(getreg(target, pos), u++))
return -EFAULT;
count -= sizeof(*u);
pos += sizeof(*u);
}
}

return 0;
}

Potentially doing arseloads of stac/clac as it goes.  OTOH, getreg()
(and setreg()) in there are not entirely trivial, so blanket
user_access_begin()/user_access_end() over the entire loop might be
a bad idea...

How hot is that codepath?  I know that arch/um used to rely on it
(== PTRACE_[GS]ETREGS) quite a bit...

RE: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread David Laight

> From: Linus Torvalds
> Sent: 07 October 2019 04:12
...
> In this case, I think it's done a few callers up in i915_gem_pread_ioctl():
> 
> if (!access_ok(u64_to_user_ptr(args->data_ptr),
>args->size))
> return -EFAULT;
> 
> but honestly, trying to optimize away another "access_ok()" is just
> not worth it. I'd rather have an extra one than miss one.

You don't really want an extra access_ok() for every 'word' of a copy.
Some copies have to be done a word at a time.

And the checks someone added to copy_to/from_user() to detect kernel
buffer overruns must kill performance when the buffers are way down the stack
or in kmalloc()ed space.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-07 Thread Guenter Roeck


Hi Max,

On 10/6/19 9:04 PM, Max Filippov wrote:

On Sun, Oct 6, 2019 at 3:25 PM Guenter Roeck  wrote:

this patch causes all my sparc64 emulations to stall during boot. It causes
all alpha emulations to crash with [1a] and [1b] when booting from a virtual
disk, and one of the xtensa emulations to crash with [2].


[...]


[2]

Unable to handle kernel paging request at virtual address 0004
reboot(50): Oops -1
pc = [<0004>]  ra = []  ps = Tainted: G   
   D
pc is at 0x4
ra is at filldir64+0x64/0x320
v0 =   t0 = 67736d6b  t1 = 00012011445b
t2 =   t3 =   t4 = 7ef8
t5 = 000120114448  t6 =   t7 = fc0007eec000
s0 = fc000792b5c3  s1 = 0004  s2 = 0018
s3 = fc0007eefec8  s4 = 0008  s5 = f0a3
s6 = 000b
a0 = fc000792b5c3  a1 = 2f2f2f2f2f2f2f2f  a2 = 0004
a3 = 000b  a4 = f0a3  a5 = 0008
t8 = 0018  t9 =   t10= 22e1d02a
t11= 00011fd6f3b8  pv = fcb9a810  at = 22e1ccf8
gp = fcf03930  sp = (ptrval)
Trace:
[] proc_readdir_de+0x170/0x300
[] filldir64+0x0/0x320
[] proc_root_readdir+0x3c/0x80
[] iterate_dir+0x198/0x240
[] ksys_getdents64+0xa8/0x160
[] sys_getdents64+0x20/0x40
[] filldir64+0x0/0x320
[] entSys+0xa4/0xc0


This doesn't look like a dump from xtensa core.
v5.4-rc2 kernel doesn't crash for me on xtensa, but the userspace
doesn't work well, because all directories appear to be empty.

__put_user/__get_user don't do unaligned access on xtensa,
they check address alignment and return -EFAULT if it's bad.


You are right, sorry; I must have mixed that up. xtensa doesn't crash.
The boot stalls, similar to sparc64. This is only seen with my nommu
test (de212:kc705-nommu:nommu_kc705_defconfig). xtensa mmu tests are fine,
at least for me, but then I only run tests with initrd (which for some
reason doesn't crash on alpha either).

Guenter

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Max Filippov

On Sun, Oct 6, 2019 at 3:25 PM Guenter Roeck  wrote:
> this patch causes all my sparc64 emulations to stall during boot. It causes
> all alpha emulations to crash with [1a] and [1b] when booting from a virtual
> disk, and one of the xtensa emulations to crash with [2].

[...]

> [2]
>
> Unable to handle kernel paging request at virtual address 0004
> reboot(50): Oops -1
> pc = [<0004>]  ra = []  ps = Tainted: G 
>  D
> pc is at 0x4
> ra is at filldir64+0x64/0x320
> v0 =   t0 = 67736d6b  t1 = 00012011445b
> t2 =   t3 =   t4 = 7ef8
> t5 = 000120114448  t6 =   t7 = fc0007eec000
> s0 = fc000792b5c3  s1 = 0004  s2 = 0018
> s3 = fc0007eefec8  s4 = 0008  s5 = f0a3
> s6 = 000b
> a0 = fc000792b5c3  a1 = 2f2f2f2f2f2f2f2f  a2 = 0004
> a3 = 000b  a4 = f0a3  a5 = 0008
> t8 = 0018  t9 =   t10= 22e1d02a
> t11= 00011fd6f3b8  pv = fcb9a810  at = 22e1ccf8
> gp = fcf03930  sp = (ptrval)
> Trace:
> [] proc_readdir_de+0x170/0x300
> [] filldir64+0x0/0x320
> [] proc_root_readdir+0x3c/0x80
> [] iterate_dir+0x198/0x240
> [] ksys_getdents64+0xa8/0x160
> [] sys_getdents64+0x20/0x40
> [] filldir64+0x0/0x320
> [] entSys+0xa4/0xc0

This doesn't look like a dump from xtensa core.
v5.4-rc2 kernel doesn't crash for me on xtensa, but the userspace
doesn't work well, because all directories appear to be empty.

__put_user/__get_user don't do unaligned access on xtensa,
they check address alignment and return -EFAULT if it's bad.

-- 
Thanks.
-- Max

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Linus Torvalds

On Sun, Oct 6, 2019 at 7:30 PM Guenter Roeck  wrote:
>
> > Mind humoring me and trying that on your alpha machine (or emulator,
> > or whatever)?
>
> Here you are. This is with v5.4-rc2 and your previous patch applied
> on top.
>
> / # ./mmtest
> Unable to handle kernel paging request at virtual address 0004

Oookay.

Well, that's what I expected, but it's good to just have it confirmed.

Well, not "good" in this case. Bad bad bad.

The fs/readdir.c changes clearly exposed a pre-existing bug on alpha.
Not making excuses for it, but at least it explains why code that
_looks_ correct ends up causing that kind of issue.

I guess the other 'strict alignment' architectures should be checking
that test program too. I'll post my test program to the arch
maintainers list.

 Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Linus Torvalds

On Sun, Oct 6, 2019 at 7:50 PM Al Viro  wrote:
>
> Out of those, only __copy_to_user_inatomic(), __copy_to_user(),
> _copy_to_user() and iov_iter.c:copyout() can be called on
> any architecture.
>
> The last two should just do user_access_begin()/user_access_end()
> instead of access_ok().  __copy_to_user_inatomic() has very few callers as 
> well:

Yeah, good points.

It looks like it would be better to just change over semantics
entirely to the unsafe_copy_user() model.

> So few, in fact, that I wonder if we want to keep it at all; the only
> thing stopping me from "let's remove it" is that I don't understand
> the i915 side of things.  Where does it do an equivalent of access_ok()?

Honestly, if you have to ask, I think the answer is: just add one.

Every single time we've had people who optimized things to try to
avoid the access_ok(), they just caused bugs and problems.

In this case, I think it's done a few callers up in i915_gem_pread_ioctl():

if (!access_ok(u64_to_user_ptr(args->data_ptr),
   args->size))
return -EFAULT;

but honestly, trying to optimize away another "access_ok()" is just
not worth it. I'd rather have an extra one than miss one.

> And mm/maccess.c one is __probe_kernel_write(), so presumably we don't
> want stac/clac there at all...

Yup.

> So do we want to bother with separation between raw_copy_to_user() and
> unsafe_copy_to_user()?  After all, __copy_to_user() also has only few
> callers, most of them in arch/*

No, you're right. Just switch over.

> I'll take a look into that tomorrow - half-asleep right now...

Thanks. No huge hurry.

 Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Al Viro

On Sun, Oct 06, 2019 at 07:06:19PM -0700, Linus Torvalds wrote:
> On Sun, Oct 6, 2019 at 6:24 PM Al Viro  wrote:
> >
> > Ugh...  I wonder if it would be better to lift STAC/CLAC out of
> > raw_copy_to_user(), rather than trying to reinvent its guts
> > in readdir.c...
> 
> Yeah, I suspect that's the best option.
> 
> Do something like
> 
>  - lift STAC/CLAC out of raw_copy_to_user
> 
>  - rename it to unsafe_copy_to_user
> 
>  - create a new raw_copy_to_user that is just unsafe_copy_to_user()
> with the STAC/CLAC around it.
> 
> and the end result would actually be cleanert than what we have now
> (which duplicates that STAC/CLAC for each size case etc).
> 
> And then for the "architecture doesn't have user_access_begin/end()"
> fallback case, we just do
> 
>#define unsafe_copy_to_user raw_copy_to_user

Callers of raw_copy_to_user():
arch/hexagon/mm/uaccess.c:27:   uncleared = raw_copy_to_user(dest, 
_zero_page, PAGE_SIZE);
arch/hexagon/mm/uaccess.c:34:   count = raw_copy_to_user(dest, 
_zero_page, count);
arch/powerpc/kvm/book3s_64_mmu_radix.c:68:  ret = 
raw_copy_to_user(to, from, n);
arch/s390/include/asm/uaccess.h:150:size = raw_copy_to_user(ptr, x, size);
include/asm-generic/uaccess.h:145:  return unlikely(raw_copy_to_user(ptr, 
x, size)) ? -EFAULT : 0;
include/linux/uaccess.h:93: return raw_copy_to_user(to, from, n);
include/linux/uaccess.h:102:return raw_copy_to_user(to, from, n);
include/linux/uaccess.h:131:n = raw_copy_to_user(to, from, n);
lib/iov_iter.c:142: n = raw_copy_to_user(to, from, n);
lib/usercopy.c:28:  n = raw_copy_to_user(to, from, n);

Out of those, only __copy_to_user_inatomic(), __copy_to_user(),
_copy_to_user() and iov_iter.c:copyout() can be called on
any architecture.

The last two should just do user_access_begin()/user_access_end()
instead of access_ok().  __copy_to_user_inatomic() has very few callers as well:

arch/mips/kernel/unaligned.c:1307:  res = 
__copy_to_user_inatomic(addr, fpr, sizeof(*fpr));
drivers/gpu/drm/i915/i915_gem.c:345:unwritten = 
__copy_to_user_inatomic(user_data,
lib/test_kasan.c:471:   unused = __copy_to_user_inatomic(usermem, kmem, size + 
1);
mm/maccess.c:98:ret = __copy_to_user_inatomic((__force void __user 
*)dst, src, size);

So few, in fact, that I wonder if we want to keep it at all; the only
thing stopping me from "let's remove it" is that I don't understand
the i915 side of things.  Where does it do an equivalent of access_ok()?

And mm/maccess.c one is __probe_kernel_write(), so presumably we don't
want stac/clac there at all...

So do we want to bother with separation between raw_copy_to_user() and
unsafe_copy_to_user()?  After all, __copy_to_user() also has only few
callers, most of them in arch/*

I'll take a look into that tomorrow - half-asleep right now...

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Guenter Roeck


On 10/6/19 6:17 PM, Linus Torvalds wrote:

On Sun, Oct 6, 2019 at 5:04 PM Guenter Roeck  wrote:

[ ... ]

And yes, I'll fix that name copy loop in filldir to align the
destination first, *but* if I'm right, it means that something like
this should also likely cause issues:

   #define _GNU_SOURCE
   #include 
   #include 

   int main(int argc, char **argv)
   {
 void *mymap;
 uid_t *bad_ptr = (void *) 0x01;

 /* Create unpopulated memory area */
 mymap = mmap(NULL, 16384, PROT_READ | PROT_WRITE, MAP_PRIVATE
| MAP_ANONYMOUS, -1, 0);

 /* Unaligned uidpointer in that memory area */
 bad_ptr = mymap+1;

 /* Make the kernel do put_user() on it */
 return getresuid(bad_ptr, bad_ptr+1, bad_ptr+2);
   }

because that simple user mode program should cause that same "page
fault on unaligned put_user()" behavior as far as I can tell.

Mind humoring me and trying that on your alpha machine (or emulator,
or whatever)?



Here you are. This is with v5.4-rc2 and your previous patch applied
on top.

/ # ./mmtest
Unable to handle kernel paging request at virtual address 0004
mmtest(75): Oops -1
pc = [<0004>]  ra = []  ps = Not tainted
pc is at 0x4
ra is at entSys+0xa4/0xc0
v0 = fff2  t0 =   t1 = 
t2 =   t3 =   t4 = 
t5 = fffe  t6 =   t7 = fc0007edc000
s0 =   s1 = 000126f0  s2 = 0001200df19f
s3 = 0001200ea0b9  s4 = 000120114630  s5 = 0001201145d8
s6 = 00011f955c50
a0 = 0202c001  a1 = 0202c005  a2 = 0202c009
a3 =   a4 =   a5 = 
t8 =   t9 = fc00  t10= 
t11= 00011f955788  pv = fc349450  at = f8db54d3
gp = fcf2a160  sp = ab237c72
Disabling lock debugging due to kernel taint
Trace:

Code:
 
 00063301
 07b6
 
 3f8d

Segmentation fault

Guenter

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Linus Torvalds

On Sun, Oct 6, 2019 at 6:24 PM Al Viro  wrote:
>
> Ugh...  I wonder if it would be better to lift STAC/CLAC out of
> raw_copy_to_user(), rather than trying to reinvent its guts
> in readdir.c...

Yeah, I suspect that's the best option.

Do something like

 - lift STAC/CLAC out of raw_copy_to_user

 - rename it to unsafe_copy_to_user

 - create a new raw_copy_to_user that is just unsafe_copy_to_user()
with the STAC/CLAC around it.

and the end result would actually be cleanert than what we have now
(which duplicates that STAC/CLAC for each size case etc).

And then for the "architecture doesn't have user_access_begin/end()"
fallback case, we just do

   #define unsafe_copy_to_user raw_copy_to_user

and the only slight painpoint is that we need to deal with that
copy_user_generic() case too.

We'd have to mark it uaccess_safe in objtool (but we already have that
__memcpy_mcsafe and csum_partial_copy_generic, os it all makes sense),
and we'd have to make all the other copy_user_generic() cases then do
the CLAC/STAC dance too or something.

ANYWAY.  As mentioned, I'm not actually all that worried about this all.

I could easily also just see the filldir() copy do an extra

#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
if (len && (1 & (uint_ptr_t)dst)) .. copy byte ..
if (len > 1 && (2 & (uint_ptr_t)dst)) .. copy word ..
if (len > 3 && (4 & (uint_ptr_t)dst) && sizeof(unsigned long) > 4)
.. copy dword ..
#endif

at the start to align the destination.

The filldir code is actually somewhat unusual in that it deals with
pretty small strings on average, so just doing this might be more
efficient anyway.

So that doesn't worry me. Multiple ways to solve that part.

The "uhhuh, unaligned accesses cause more than performance problems" -
that's what worries me.

Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Al Viro

On Sun, Oct 06, 2019 at 06:17:02PM -0700, Linus Torvalds wrote:
> On Sun, Oct 6, 2019 at 5:04 PM Guenter Roeck  wrote:
> >
> > All my alpha, sparc64, and xtensa tests pass with the attached patch
> > applied on top of v5.4-rc2. I didn't test any others.
> 
> Okay... I really wish my guess had been wrong.
> 
> Because fixing filldir64 isn't the problem. I can come up with
> multiple ways to avoid the unaligned issues if that was the problem.
> 
> But it does look to me like the fundamental problem is that unaligned
> __put_user() calls might just be broken on alpha (and likely sparc
> too). Because that looks to be the only difference between the
> __copy_to_user() approach and using unsafe_put_user() in a loop.
> 
> Now, I should have handled unaligned things differently in the first
> place, and in that sense I think commit 9f79b78ef744 ("Convert
> filldir[64]() from __put_user() to unsafe_put_user()") really is
> non-optimal on architectures with alignment issues.
> 
> And I'll fix it.

Ugh...  I wonder if it would be better to lift STAC/CLAC out of
raw_copy_to_user(), rather than trying to reinvent its guts
in readdir.c...

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Linus Torvalds

On Sun, Oct 6, 2019 at 5:04 PM Guenter Roeck  wrote:
>
> All my alpha, sparc64, and xtensa tests pass with the attached patch
> applied on top of v5.4-rc2. I didn't test any others.

Okay... I really wish my guess had been wrong.

Because fixing filldir64 isn't the problem. I can come up with
multiple ways to avoid the unaligned issues if that was the problem.

But it does look to me like the fundamental problem is that unaligned
__put_user() calls might just be broken on alpha (and likely sparc
too). Because that looks to be the only difference between the
__copy_to_user() approach and using unsafe_put_user() in a loop.

Now, I should have handled unaligned things differently in the first
place, and in that sense I think commit 9f79b78ef744 ("Convert
filldir[64]() from __put_user() to unsafe_put_user()") really is
non-optimal on architectures with alignment issues.

And I'll fix it.

But at the same time, the fact that "non-optimal" turns into "doesn't
work" is a fairly nasty issue.

> I'll (try to) send you some disassembly next.

Thanks, verified.

The "ra is at filldir64+0x64/0x320" is indeed right at the return
point of the "jsr verify_dirent_name".

But the problem isn't there - that's just left-over state. I'm pretty
sure that function worked fine, and returned.

The problem is that "pc is at 0x4" and the page fault that then
happens at that address as a result.

And that seems to be due to this:

 8c0:   00 00 29 2c ldq_u   t0,0(s0)
 8c4:   07 00 89 2c ldq_u   t3,7(s0)
 8c8:   03 04 e7 47 mov t6,t2
 8cc:   c1 06 29 48 extql   t0,s0,t0
 8d0:   44 0f 89 48 extqh   t3,s0,t3
 8d4:   01 04 24 44 or  t0,t3,t0
 8d8:   00 00 22 b4 stq t0,0(t1)

that's the "get_unaligned((type *)src)" (the first six instructions)
followed by the "unsafe_put_user()" done with a single "stq". That's
the guts of the unsafe_copy_loop() as part of
unsafe_copy_dirent_name()

And what I think happens is that it is writing to user memory that is

 (a) unaligned

 (b) not currently mapped in user space

so then the do_entUna() function tries to handle the unaligned trap,
but then it takes an exception while doing that (due to the unmapped
page), and then something in that nested exception mess causes it to
mess up badly and corrupt the register contents on stack, and it
returns with garbage in 'pc', and then you finally die with that

   Unable to handle kernel paging request at virtual address 0004
   pc is at 0x4

thing.

And yes, I'll fix that name copy loop in filldir to align the
destination first, *but* if I'm right, it means that something like
this should also likely cause issues:

  #define _GNU_SOURCE
  #include 
  #include 

  int main(int argc, char **argv)
  {
void *mymap;
uid_t *bad_ptr = (void *) 0x01;

/* Create unpopulated memory area */
mymap = mmap(NULL, 16384, PROT_READ | PROT_WRITE, MAP_PRIVATE
| MAP_ANONYMOUS, -1, 0);

/* Unaligned uidpointer in that memory area */
bad_ptr = mymap+1;

/* Make the kernel do put_user() on it */
return getresuid(bad_ptr, bad_ptr+1, bad_ptr+2);
  }

because that simple user mode program should cause that same "page
fault on unaligned put_user()" behavior as far as I can tell.

Mind humoring me and trying that on your alpha machine (or emulator,
or whatever)?

   Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Guenter Roeck

On Sun, Oct 06, 2019 at 04:06:16PM -0700, Linus Torvalds wrote:
> On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck  wrote:
> >
> > this patch causes all my sparc64 emulations to stall during boot. It causes
> > all alpha emulations to crash with [1a] and [1b] when booting from a virtual
> > disk, and one of the xtensa emulations to crash with [2].
> 
> Ho humm. I've run variations of that patch over a few years on x86,
> but obviously not on alpha/sparc.
> 
> At least I should still be able to read alpha assembly, even after all
> these years. Would you mind sending me the result of
> 
> make fs/readdir.s
> 
> on alpha with the broken config? I'd hope that the sparc issue is the same.
> 
> Actually, could you also do
> 
> make fs/readdir.o
> 
> and then send me the "objdump --disassemble" of that? That way I get
> the instruction offsets without having to count by hand.
> 

Both attached for alpha.

> > Unable to handle kernel paging request at virtual address 0004
> > rcS(47): Oops -1
> > pc = [<0004>]  ra = []  ps = Not 
> > tainted
> > pc is at 0x4
> 
> That is _funky_. I'm not seeing how it could possibly jump to 0x4, but
> it clearly does.
> 
> That said, are you sure it's _that_ commit? Because this pattern:
> 
Bisect on sparc pointed to this commit, and re-running the tests with
the commit reverted passed for all architectures. I didn't check any
further.

Please let me know if you need anything else at this point.

Thanks,
Guenter
.set noreorder
.set volatile
.set noat
.set nomacro
.arch ev5
 # GNU C89 (GCC) version 9.2.0 (alpha-linux)
 #  compiled by GNU C version 6.5.0 20181026, GMP version 6.1.0, MPFR 
version 3.1.4, MPC version 1.0.3, isl version none
 # warning: GMP header version 6.1.0 differs from library version 6.1.2.
 # warning: MPC header version 1.0.3 differs from library version 1.1.0.
 # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
 # options passed:  -nostdinc -I ./arch/alpha/include
 # -I ./arch/alpha/include/generated -I ./include
 # -I ./arch/alpha/include/uapi -I ./arch/alpha/include/generated/uapi
 # -I ./include/uapi -I ./include/generated/uapi
 # -iprefix 
/opt/kernel/gcc-9.2.0-nolibc/alpha-linux/bin/../lib/gcc/alpha-linux/9.2.0/
 # -D __KERNEL__ -D KBUILD_BASENAME="readdir" -D KBUILD_MODNAME="readdir"
 # -isystem 
/opt/kernel/gcc-9.2.0-nolibc/alpha-linux/bin/../lib/gcc/alpha-linux/9.2.0/include
 # -include ./include/linux/kconfig.h
 # -include ./include/linux/compiler_types.h -MD fs/.readdir.s.d
 # fs/readdir.c -mno-fp-regs -mcpu=ev5 -auxbase-strip fs/readdir.s -O2
 # -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs
 # -Werror=implicit-function-declaration -Werror=implicit-int
 # -Wno-format-security -Wno-frame-address -Wformat-truncation=0
 # -Wformat-overflow=0 -Wno-address-of-packed-member
 # -Wframe-larger-than=2048 -Wno-unused-but-set-variable
 # -Wimplicit-fallthrough=3 -Wunused-const-variable=0
 # -Wdeclaration-after-statement -Wvla -Wno-pointer-sign
 # -Wno-stringop-truncation -Werror=date-time
 # -Werror=incompatible-pointer-types -Werror=designated-init
 # -Wno-packed-not-aligned -std=gnu90 -fno-strict-aliasing -fno-common
 # -fshort-wchar -fno-PIE -ffixed-8 -fno-jump-tables
 # -fno-delete-null-pointer-checks -fno-stack-protector
 # -fomit-frame-pointer -fno-strict-overflow -fno-merge-all-constants
 # -fmerge-constants -fstack-check=no -fconserve-stack
 # -fmacro-prefix-map=./= -fverbose-asm --param allow-store-data-races=0
 # options enabled:  -faggressive-loop-optimizations -falign-functions
 # -falign-jumps -falign-labels -falign-loops -fassume-phsa -fauto-inc-dec
 # -fbranch-count-reg -fcaller-saves -fcode-hoisting
 # -fcombine-stack-adjustments -fcompare-elim -fcprop-registers
 # -fcrossjumping -fcse-follow-jumps -fdefer-pop -fdevirtualize
 # -fdevirtualize-speculatively -fdwarf2-cfi-asm -fearly-inlining
 # -feliminate-unused-debug-types -fexpensive-optimizations
 # -fforward-propagate -ffp-int-builtin-inexact -ffunction-cse -fgcse
 # -fgcse-lm -fgnu-runtime -fgnu-unique -fguess-branch-probability
 # -fhoist-adjacent-loads -fident -fif-conversion -fif-conversion2
 # -findirect-inlining -finline -finline-atomics
 # -finline-functions-called-once -finline-small-functions -fipa-bit-cp
 # -fipa-cp -fipa-icf -fipa-icf-functions -fipa-icf-variables -fipa-profile
 # -fipa-pure-const -fipa-ra -fipa-reference -fipa-reference-addressable
 # -fipa-sra -fipa-stack-alignment -fipa-vrp -fira-hoist-pressure
 # -fira-share-save-slots -fira-share-spill-slots
 # -fisolate-erroneous-paths-dereference -fivopts -fkeep-static-consts
 # -fleading-underscore -flifetime-dse -flra-remat -flto-odr-type-merging
 # -fmath-errno -fmerge-constants -fmerge-debug-strings
 # -fmove-loop-invariants -fomit-frame-pointer -foptimize-sibling-calls
 # -foptimize-strlen -fpartial-inlining -fpcc-struct-return -fpeephole
 # -fpeephole2 -fplt -fprefetch-loop-arrays -free -freorder-blocks

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Guenter Roeck


On 10/6/19 4:35 PM, Linus Torvalds wrote:
[ ... ]


Anyway, let me think about this, but just for testing, does the
attached patch make any difference? It's not the right thing in
general (and most definitely not on x86), but for testing whether this
is about unaligned accesses it might work.



All my alpha, sparc64, and xtensa tests pass with the attached patch
applied on top of v5.4-rc2. I didn't test any others.

I'll (try to) send you some disassembly next.

Guenter

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Linus Torvalds

On Sun, Oct 6, 2019 at 4:06 PM Linus Torvalds
 wrote:
>
> Ho humm. I've run variations of that patch over a few years on x86,
> but obviously not on alpha/sparc.

Oooh.

I wonder... This may be the name string copy loop. And it's special in
that the result may not be aligned.

Now, a "__put_user()" with an unaligned address _should_ work - it's
very easy to trigger that from user space by just giving an unaligned
address to any system call that then writes a single word.

But alpha does

  #define __put_user_32(x, addr)  \
  __asm__ __volatile__("1: stl %r2,%1\n"  \
  "2:\n"  \
  EXC(1b,2b,$31,%0)   \
  : "=r"(__pu_err)\
  : "m"(__m(addr)), "rJ"(x), "0"(__pu_err))

iow it implements that 32-bit __put_user() as a 'stl'.

Which will trap if it's not aligned.

And I wonder how much testing that has ever gotten. Nobody really does
unaigned accesses on alpha.

We need to do that memcpy unrolling on x86, because x86 actually uses
"user_access_begin()" and we have magic rules about what is inside
that region.

But on alpha (and sparc) it might be better to just do "__copy_to_user()".

Anyway, this does look like a possible latent bug where the alpha
unaligned trap doesn't then handle the case of exceptions. I know it
_tries_, but I doubt it's gotten a whole lot of testing.

Anyway, let me think about this, but just for testing, does the
attached patch make any difference? It's not the right thing in
general (and most definitely not on x86), but for testing whether this
is about unaligned accesses it might work.

It's entirely untested, and in fact on x86 it should cause objtool to
complain about a function call with AC set. But I think that on alpha
and sparc, using __copy_to_user() for the name copy should work, and
would work around the unaligned issue.

That said, if it *is* the unaligned issue, then that just means that
we have a serious bug elsewhere in the alpha port. Maybe nobody cares.

  Linus
 fs/readdir.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/fs/readdir.c b/fs/readdir.c
index 19bea591c3f1..d49c4e2c66a8 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -76,6 +76,15 @@
 	unsafe_put_user(0, dst, label);\
 } while (0)

+/* Alpha (and sparc?) test patch! */
+#undef unsafe_copy_dirent_name
+#define unsafe_copy_dirent_name(_dst, _src, _len, label) do {	\
+	char __user *dst = (_dst);\
+	const char *src = (_src);\
+	size_t len = (_len);	\
+	if (__copy_to_user(dst, src, len)) goto label;		\
+	unsafe_put_user(0, dst+len, label);			\
+} while (0)

 int iterate_dir(struct file *file, struct dir_context *ctx)
 {

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Linus Torvalds

On Sun, Oct 6, 2019 at 3:20 PM Guenter Roeck  wrote:
>
> this patch causes all my sparc64 emulations to stall during boot. It causes
> all alpha emulations to crash with [1a] and [1b] when booting from a virtual
> disk, and one of the xtensa emulations to crash with [2].

Ho humm. I've run variations of that patch over a few years on x86,
but obviously not on alpha/sparc.

At least I should still be able to read alpha assembly, even after all
these years. Would you mind sending me the result of

make fs/readdir.s

on alpha with the broken config? I'd hope that the sparc issue is the same.

Actually, could you also do

make fs/readdir.o

and then send me the "objdump --disassemble" of that? That way I get
the instruction offsets without having to count by hand.

> Unable to handle kernel paging request at virtual address 0004
> rcS(47): Oops -1
> pc = [<0004>]  ra = []  ps = Not tainted
> pc is at 0x4

That is _funky_. I'm not seeing how it could possibly jump to 0x4, but
it clearly does.

That said, are you sure it's _that_ commit? Because this pattern:

> a0 = fc0007dbca56  a1 = 2f2f2f2f2f2f2f2f  a2 = 000a

implicates the memchr('/') call in the next one. That's a word full of
'/' characters.

Of course, it could just be left-over register contents from that
memchr(), but it makes me wonder. Particularly since it seems to
happen early in filldir64():

> ra is at filldir64+0x64/0x320

which is just a fairly small handful of instructions in, and I
wouldn't be shocked if that's the return address for the call to
memchr.

  Linus

Re: [PATCH] Convert filldir[64]() from __put_user() to unsafe_put_user()

2019-10-06 Thread Guenter Roeck

On Sat, May 21, 2016 at 09:59:07PM -0700, Linus Torvalds wrote:
> We really should avoid the "__{get,put}_user()" functions entirely,
> because they can easily be mis-used and the original intent of being
> used for simple direct user accesses no longer holds in a post-SMAP/PAN
> world.
> 
> Manually optimizing away the user access range check makes no sense any
> more, when the range check is generally much cheaper than the "enable
> user accesses" code that the __{get,put}_user() functions still need.
> 
> So instead of __put_user(), use the unsafe_put_user() interface with
> user_access_{begin,end}() that really does generate better code these
> days, and which is generally a nicer interface.  Under some loads, the
> multiple user writes that filldir() does are actually quite noticeable.
> 
> This also makes the dirent name copy use unsafe_put_user() with a couple
> of macros.  We do not want to make function calls with SMAP/PAN
> disabled, and the code this generates is quite good when the
> architecture uses "asm goto" for unsafe_put_user() like x86 does.
> 
> Note that this doesn't bother with the legacy cases.  Nobody should use
> them anyway, so performance doesn't really matter there.
> 
> Signed-off-by: Linus Torvalds 

Linus,

this patch causes all my sparc64 emulations to stall during boot. It causes
all alpha emulations to crash with [1a] and [1b] when booting from a virtual
disk, and one of the xtensa emulations to crash with [2].

Reverting this patch fixes the problem.

Guenter

---
[1a]

Unable to handle kernel paging request at virtual address 0004
rcS(47): Oops -1
pc = [<0004>]  ra = []  ps = Not tainted
pc is at 0x4
ra is at filldir64+0x64/0x320
v0 =   t0 =   t1 = 000120117e8b
t2 = 646e617275303253  t3 = 646e61727530  t4 = 7fe8
t5 = 000120117e78  t6 =   t7 = fc0007ec8000
s0 = fc0007dbca56  s1 = 000a  s2 = 0020
s3 = fc0007ecbec8  s4 = 0008  s5 = 0021
s6 = 1cd2631fe897bf5a
a0 = fc0007dbca56  a1 = 2f2f2f2f2f2f2f2f  a2 = 000a
a3 = 1cd2631fe897bf5a  a4 = 0021  a5 = 0008
t8 = 0020  t9 =   t10= fc0007dbca60
t11= 0001  pv = fcb9a810  at = 0001
gp = fcf03930  sp = (ptrval)
Disabling lock debugging due to kernel taint
Trace:
[] call_filldir+0xe8/0x1b0
[] ext4_readdir+0x924/0xa70
[] _raw_spin_unlock+0x18/0x30
[] __handle_mm_fault+0x9fc/0xc30
[] iterate_dir+0x198/0x240
[] iterate_dir+0x5c/0x240
[] ksys_getdents64+0xa8/0x160
[] sys_getdents64+0x20/0x40
[] filldir64+0x0/0x320
[] entSys+0xa4/0xc0

---
[1b]

Unable to handle kernel paging request at virtual address 0004
reboot(50): Oops -1
pc = [<0004>]  ra = []  ps = Tainted: G   
   D
pc is at 0x4
ra is at filldir64+0x64/0x320
v0 =   t0 = 67736d6b  t1 = 00012011445b
t2 =   t3 =   t4 = 7ef8
t5 = 000120114448  t6 =   t7 = fc0007eec000
s0 = fc000792b5c3  s1 = 0004  s2 = 0018
s3 = fc0007eefec8  s4 = 0008  s5 = f0a3
s6 = 000b
a0 = fc000792b5c3  a1 = 2f2f2f2f2f2f2f2f  a2 = 0004
a3 = 000b  a4 = f0a3  a5 = 0008
t8 = 0018  t9 =   t10= 22e1d02a
t11= 00011f8fd3b8  pv = fcb9a810  at = 22e1ccf8
gp = fcf03930  sp = (ptrval)
Trace:
[] proc_readdir_de+0x170/0x300
[] filldir64+0x0/0x320
[] proc_root_readdir+0x3c/0x80
[] iterate_dir+0x198/0x240
[] ksys_getdents64+0xa8/0x160
[] sys_getdents64+0x20/0x40
[] filldir64+0x0/0x320
[] entSys+0xa4/0xc0

---
[2]

Unable to handle kernel paging request at virtual address 0004
reboot(50): Oops -1
pc = [<0004>]  ra = []  ps = Tainted: G   
   D
pc is at 0x4
ra is at filldir64+0x64/0x320
v0 =   t0 = 67736d6b  t1 = 00012011445b
t2 =   t3 =   t4 = 7ef8
t5 = 000120114448  t6 =   t7 = fc0007eec000
s0 = fc000792b5c3  s1 = 0004  s2 = 0018
s3 = fc0007eefec8  s4 = 0008  s5 = f0a3
s6 = 000b
a0 = fc000792b5c3  a1 = 2f2f2f2f2f2f2f2f  a2 = 0004
a3 = 000b  a4 = f0a3  a5 = 0008
t8 = 0018  t9 =   t10= 22e1d02a
t11= 00011fd6f3b8  pv = fcb9a810  at = 22e1ccf8
gp = fcf03930  sp = (ptrval)
Trace:
[] proc_readdir_de+0x170/0x300
[] filldir64+0x0/0x320
[] proc_root_readdir+0x3c/0x80
[] iterate_dir+0x198/0x240
[] ksys_getdents64+0xa8/0x160
[] sys_getdents64+0x20/0x40
[] filldir64+0x0/0x320
[] entSys+0xa4/0xc0

Code:
 
 00063301
 07a3

58 matches

Mail list logo