Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-17 Thread Josh Poimboeuf
On Sat, Mar 17, 2018 at 01:07:32PM -0700, Kees Cook wrote:
> On Sat, Mar 17, 2018 at 11:52 AM, Linus Torvalds
> <torva...@linux-foundation.org> wrote:
> > So the above is completely insane, but there is actually a chance that
> > using that completely crazy "x -> sizeof(char[x])" conversion actually
> > helps, because it really does have a (very odd) evaluation-time
> > change.  sizeof() has to be evaluated as part of the constant
> > expression evaluation, in ways that "__builtin_constant_p()" isn't
> > specified to be done.
> >
> > But it is also definitely me grasping at straws. If that doesn't work
> > for 4.4, there's nothing else I can possibly see.
> 
> No luck! :( gcc 4.4 refuses to play along. And, hilariously, not only
> does it not change the complaint about __builtin_choose_expr(), it
> also thinks that's a VLA now.
> 
> ./include/linux/mm.h: In function ‘get_mm_hiwater_rss’:
> ./include/linux/mm.h:1567: warning: variable length array is used
> ./include/linux/mm.h:1567: error: first argument to
> ‘__builtin_choose_expr’ not a constant
> 
> 6.8 is happy with it (of course).
> 
> I do think the earlier version (without the
> sizeof-hiding-builtin_constant_p) provides a template for a
> const_max() that both you and Rasmus would be happy with, though!

I thought we were dropping support for 4.4 (for other reasons).  Isn't
it 4.6 we should be looking at?

-- 
Josh
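
For reference, the const_max() idea mentioned above boils down to something
like the sketch below (illustrative only: it assumes both arguments are
integer constant expressions, does no type checking, and is not the macro
that was eventually proposed):

  #define const_max(x, y)  __builtin_choose_expr((x) > (y), (x), (y))

  /* Stays a constant expression, so it is usable as an array bound
   * without tripping the VLA warning: */
  char buf[const_max(16, 64)];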


Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Josh Poimboeuf
On Thu, Mar 08, 2018 at 10:02:01AM -0800, Kees Cook wrote:
> On Thu, Mar 8, 2018 at 7:02 AM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
> > On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote:
> >> This series adds SIMPLE_MAX() to be used in places where a stack array
> >> is actually fixed, but the compiler still warns about VLA usage due to
> >> confusion caused by the safety checks in the max() macro.
> >>
> >> I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
> >> and they should all have no operational differences.
> >
> > What if we instead simplify the max() macro's type checking so that GCC
> > can more easily fold the array size constants?  The below patch seems to
> > work:
> 
> Oh magic! Very nice. I couldn't figure out how to do this when I
> stared at it. Yes, let me respin. (I assume I can add your S-o-b?)

I'm going to be traveling for a few days, so I bequeath the patch to
you.  You can add my SOB.  I agree with Steve's suggestion to run it
through 0-day.

> 
> -Kees
> 
> >
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 3fd291503576..ec863726da29 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -782,42 +782,32 @@ ftrace_vprintk(const char *fmt, va_list ap)
> >  static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
> >  #endif /* CONFIG_TRACING */
> >
> > -/*
> > - * min()/max()/clamp() macros that also do
> > - * strict type-checking.. See the
> > - * "unnecessary" pointer comparison.
> > - */
> > -#define __min(t1, t2, min1, min2, x, y) ({ \
> > -   t1 min1 = (x);  \
> > -   t2 min2 = (y);  \
> > -   (void) (&min1 == &min2);\
> > -   min1 < min2 ? min1 : min2; })
> > +extern long __error_incompatible_types_in_min_macro;
> > +extern long __error_incompatible_types_in_max_macro;
> > +
> > +#define __min(t1, t2, x, y)\
> > +   __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
> > + (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
> > + (t1)__error_incompatible_types_in_min_macro)
> >
> >  /**
> >   * min - return minimum of two values of the same or compatible types
> >   * @x: first value
> >   * @y: second value
> >   */
> > -#define min(x, y)  \
> > -   __min(typeof(x), typeof(y), \
> > - __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> > - x, y)
> > +#define min(x, y) __min(typeof(x), typeof(y), x, y)\
> >
> > -#define __max(t1, t2, max1, max2, x, y) ({ \
> > -   t1 max1 = (x);  \
> > -   t2 max2 = (y);  \
> > -   (void) (&max1 == &max2);\
> > -   max1 > max2 ? max1 : max2; })
> > +#define __max(t1, t2, x, y)\
> > +   __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
> > + (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
> > + (t1)__error_incompatible_types_in_max_macro)
> >
> >  /**
> >   * max - return maximum of two values of the same or compatible types
> >   * @x: first value
> >   * @y: second value
> >   */
> > -#define max(x, y)  \
> > -   __max(typeof(x), typeof(y), \
> > - __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
> > - x, y)
> > +#define max(x, y) __max(typeof(x), typeof(y), x, y)
> >
> >  /**
> >   * min3 - return minimum of three values
> > @@ -869,10 +859,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
> > oops_dump_mode) { }
> >   * @x: first value
> >   * @y: second value
> >   */
> > -#define min_t(type, x, y)  \
> > -   __min(type, type,   \
> > - __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> > - x, y)
> > +#define min_t(type, x, y) __min(type, type, x, y)
> >
> >  /**
> >   * max_t - return maximum of two values, using the specified type
> > @@ -880,10 +867,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
> > oops_dump_mode) { }
> >   * @x: first value
> >   * @y: second value
> >   */
> > -#define max_t(type, x, y)  \
> > -   __max(type, type,   \
> > - __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> > - x, y)
> > +#define max_t(type, x, y) __max(type, type, x, y)  
> > \
> >
> >  /**
> >   * clamp_t - return a value clamped to a given range using a given type
> 
> 
> 
> -- 
> Kees Cook
> Pixel Security

-- 
Josh
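
To make the false positive concrete: with the old ({ ... })-based max(), a
declaration like the one below is flagged as a VLA even though both
arguments are compile-time constants, because a statement expression is
never a constant expression.  With the __builtin_choose_expr() version in
the patch above, the size folds to 12 and the warning goes away
(illustration only, not from the original mails):

  char stack_buf[max(8, 12)];   /* inside a function: -Wvla fires with the old max() */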


Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Josh Poimboeuf
On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote:
> This series adds SIMPLE_MAX() to be used in places where a stack array
> is actually fixed, but the compiler still warns about VLA usage due to
> confusion caused by the safety checks in the max() macro.
> 
> I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
> and they should all have no operational differences.

What if we instead simplify the max() macro's type checking so that GCC
can more easily fold the array size constants?  The below patch seems to
work:

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..ec863726da29 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -782,42 +782,32 @@ ftrace_vprintk(const char *fmt, va_list ap)
 static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
 #endif /* CONFIG_TRACING */
 
-/*
- * min()/max()/clamp() macros that also do
- * strict type-checking.. See the
- * "unnecessary" pointer comparison.
- */
-#define __min(t1, t2, min1, min2, x, y) ({ \
-   t1 min1 = (x);  \
-   t2 min2 = (y);  \
-   (void) (&min1 == &min2);\
-   min1 < min2 ? min1 : min2; })
+extern long __error_incompatible_types_in_min_macro;
+extern long __error_incompatible_types_in_max_macro;
+
+#define __min(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
+ (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
+ (t1)__error_incompatible_types_in_min_macro)
 
 /**
  * min - return minimum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define min(x, y)  \
-   __min(typeof(x), typeof(y), \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
- x, y)
+#define min(x, y) __min(typeof(x), typeof(y), x, y)\
 
-#define __max(t1, t2, max1, max2, x, y) ({ \
-   t1 max1 = (x);  \
-   t2 max2 = (y);  \
-   (void) (&max1 == &max2);\
-   max1 > max2 ? max1 : max2; })
+#define __max(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
+ (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
+ (t1)__error_incompatible_types_in_max_macro)
 
 /**
  * max - return maximum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define max(x, y)  \
-   __max(typeof(x), typeof(y), \
- __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
- x, y)
+#define max(x, y) __max(typeof(x), typeof(y), x, y)
 
 /**
  * min3 - return minimum of three values
@@ -869,10 +859,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  * @x: first value
  * @y: second value
  */
-#define min_t(type, x, y)  \
-   __min(type, type,   \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
- x, y)
+#define min_t(type, x, y) __min(type, type, x, y)
 
 /**
  * max_t - return maximum of two values, using the specified type
@@ -880,10 +867,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  * @x: first value
  * @y: second value
  */
-#define max_t(type, x, y)  \
-   __max(type, type,   \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
- x, y)
+#define max_t(type, x, y) __max(type, type, x, y)  
\
 
 /**
  * clamp_t - return a value clamped to a given range using a given type


Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok

2018-01-09 Thread Josh Poimboeuf
On Tue, Jan 09, 2018 at 01:59:04PM -0800, Dan Williams wrote:
> > Right, but what's the purpose of preventing speculation past
> > access_ok()?
> 
> Caution. It's the same rationale for the nospec_array_ptr() patches.
> If we, kernel community, can identify any possible speculation past a
> bounds check we should inject a speculation mitigation. Unless there's
> a way to be 100% certain that the first unwanted speculation can be
> turned into a gadget later on in the instruction stream, err on the
> side of shutting it down early.

I'm all for being cautious.  The nospec_array_ptr() patches are fine,
and they make sense in light of the variant 1 CVE.

But that still doesn't answer my question.  I haven't seen *any*
rationale for this patch.  It would be helpful to at least describe
what's being protected against, even if it's hypothetical.  How can we
review it if the commit log doesn't describe its purpose?

-- 
Josh


Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok

2018-01-09 Thread Josh Poimboeuf
On Tue, Jan 09, 2018 at 01:47:09PM -0800, Dan Williams wrote:
> On Tue, Jan 9, 2018 at 1:41 PM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
> > On Fri, Jan 05, 2018 at 06:52:07PM -0800, Linus Torvalds wrote:
> >> On Fri, Jan 5, 2018 at 5:10 PM, Dan Williams <dan.j.willi...@intel.com> 
> >> wrote:
> >> > From: Andi Kleen <a...@linux.intel.com>
> >> >
> >> > When access_ok fails we should always stop speculating.
> >> > Add the required barriers to the x86 access_ok macro.
> >>
> >> Honestly, this seems completely bogus.
> >>
> >> The description is pure garbage afaik.
> >>
> >> The fact is, we have to stop speculating when access_ok() does *not*
> >> fail - because that's when we'll actually do the access. And it's that
> >> access that needs to be non-speculative.
> >>
> >> That actually seems to be what the code does (it stops speculation
> >> when __range_not_ok() returns false, but access_ok() is
> >> !__range_not_ok()). But the explanation is crap, and dangerous.
> >
> > The description also seems to be missing the "why", as it's not
> > self-evident (to me, at least).
> >
> > Isn't this (access_ok/uaccess_begin/ASM_STAC) too early for the lfence?
> >
> > i.e., wouldn't the pattern be:
> >
> > get_user(uval, uptr);
> > if (uval < array_size) {
> > lfence();
> > foo = a2[a1[uval] * 256];
> > }
> >
> > Shouldn't the lfence come much later, *after* reading the variable and
> > comparing it and branching accordingly?
> 
> The goal of putting the lfence in uaccess_begin() is to prevent
> speculation past access_ok().

Right, but what's the purpose of preventing speculation past
access_ok()?

-- 
Josh


Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok

2018-01-09 Thread Josh Poimboeuf
On Fri, Jan 05, 2018 at 06:52:07PM -0800, Linus Torvalds wrote:
> On Fri, Jan 5, 2018 at 5:10 PM, Dan Williams <dan.j.willi...@intel.com> wrote:
> > From: Andi Kleen <a...@linux.intel.com>
> >
> > When access_ok fails we should always stop speculating.
> > Add the required barriers to the x86 access_ok macro.
> 
> Honestly, this seems completely bogus.
> 
> The description is pure garbage afaik.
> 
> The fact is, we have to stop speculating when access_ok() does *not*
> fail - because that's when we'll actually do the access. And it's that
> access that needs to be non-speculative.
> 
> That actually seems to be what the code does (it stops speculation
> when __range_not_ok() returns false, but access_ok() is
> !__range_not_ok()). But the explanation is crap, and dangerous.

The description also seems to be missing the "why", as it's not
self-evident (to me, at least).

Isn't this (access_ok/uaccess_begin/ASM_STAC) too early for the lfence?

i.e., wouldn't the pattern be:

get_user(uval, uptr);
if (uval < array_size) {
lfence();
foo = a2[a1[uval] * 256];
}

Shouldn't the lfence come much later, *after* reading the variable and
comparing it and branching accordingly?

-- 
Josh
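
A compilable form of the pattern sketched above, for reference (a userspace
toy, not kernel code; the array names and sizes are made up):

  static unsigned char array1[16];
  static unsigned char array2[256 * 256];

  unsigned char read_nospec(unsigned long idx, unsigned long array1_size)
  {
          unsigned char val = 0;

          if (idx < array1_size) {
                  /* The fence sits between the bounds check and the
                   * dependent load; a fence earlier in the sequence
                   * (e.g. in access_ok()) leaves this window open. */
                  asm volatile("lfence" ::: "memory");
                  val = array2[array1[idx] * 256];
          }
          return val;
  }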


Re: [PATCH 00/18] prevent bounds-check bypass via speculative execution

2018-01-09 Thread Josh Poimboeuf
On Tue, Jan 09, 2018 at 11:44:05AM -0800, Dan Williams wrote:
> On Tue, Jan 9, 2018 at 11:34 AM, Jiri Kosina  wrote:
> > On Fri, 5 Jan 2018, Dan Williams wrote:
> >
> > [ ... snip ... ]
> >> Andi Kleen (1):
> >>   x86, barrier: stop speculation for failed access_ok
> >>
> >> Dan Williams (13):
> >>   x86: implement nospec_barrier()
> >>   [media] uvcvideo: prevent bounds-check bypass via speculative 
> >> execution
> >>   carl9170: prevent bounds-check bypass via speculative execution
> >>   p54: prevent bounds-check bypass via speculative execution
> >>   qla2xxx: prevent bounds-check bypass via speculative execution
> >>   cw1200: prevent bounds-check bypass via speculative execution
> >>   Thermal/int340x: prevent bounds-check bypass via speculative 
> >> execution
> >>   ipv6: prevent bounds-check bypass via speculative execution
> >>   ipv4: prevent bounds-check bypass via speculative execution
> >>   vfs, fdtable: prevent bounds-check bypass via speculative execution
> >>   net: mpls: prevent bounds-check bypass via speculative execution
> >>   udf: prevent bounds-check bypass via speculative execution
> >>   userns: prevent bounds-check bypass via speculative execution
> >>
> >> Mark Rutland (4):
> >>   asm-generic/barrier: add generic nospec helpers
> >>   Documentation: document nospec helpers
> >>   arm64: implement nospec_ptr()
> >>   arm: implement nospec_ptr()
> >
> > So considering the recent publication of [1], how come we all of a sudden
> > don't need the barriers in ___bpf_prog_run(), namely for LD_IMM_DW and
> > LDX_MEM_##SIZEOP, and something comparable for eBPF JIT?
> >
> > Is this going to be handled in eBPF in some other way?
> >
> > Without that in place, and considering Jann Horn's paper, it would seem
> > like PTI doesn't really lock it down fully, right?
> 
> Here is the latest (v3) bpf fix:
> 
> https://patchwork.ozlabs.org/patch/856645/
> 
> I currently have v2 on my 'nospec' branch and will move that to v3 for
> the next update, unless it goes upstream before then.

That patch seems specific to CONFIG_BPF_SYSCALL.  Is the bpf() syscall
the only attack vector?  Or are there other ways to run bpf programs
that we should be worried about?

-- 
Josh


Re: WARNING: kernel stack frame pointer at ffff880156a5fea0 in bash:2103 has bad value 00007ffec7d87e50

2017-09-27 Thread Josh Poimboeuf
On Wed, Sep 27, 2017 at 08:51:22AM +0200, Richard Weinberger wrote:
> Am Mittwoch, 27. September 2017, 00:42:46 CEST schrieb Josh Poimboeuf:
> > > Here is another variant of the warning, it matches the attached .config:
> > I can take a look at it.  Unfortunately, for these types of issues I
> > often need the vmlinux file to be able to make sense of the unwinder
> > dump.  So if you happen to have somewhere to copy the vmlinux to, that
> > would be helpful.  Or if you give me your GCC version I can try to
> > rebuild it locally.
> 
> There you go:
> http://git.infradead.org/~rw/bpf_splat/vmlinux.xz

Thanks.  Can you test this fix?


diff --git a/arch/x86/kernel/kprobes/common.h b/arch/x86/kernel/kprobes/common.h
index db2182d63ed0..3fc0f9a794cb 100644
--- a/arch/x86/kernel/kprobes/common.h
+++ b/arch/x86/kernel/kprobes/common.h
@@ -3,6 +3,15 @@
 
 /* Kprobes and Optprobes common header */
 
+#include <asm/asm.h>
+
+#ifdef CONFIG_FRAME_POINTER
+# define SAVE_RBP_STRING " push %" _ASM_BP "\n" \
+"  mov  %" _ASM_SP ", %" _ASM_BP "\n"
+#else
+# define SAVE_RBP_STRING " push %" _ASM_BP "\n"
+#endif
+
 #ifdef CONFIG_X86_64
 #define SAVE_REGS_STRING   \
/* Skip cs, ip, orig_ax. */ \
@@ -17,7 +26,7 @@
"   pushq %r10\n"   \
"   pushq %r11\n"   \
"   pushq %rbx\n"   \
-   "   pushq %rbp\n"   \
+   SAVE_RBP_STRING \
"   pushq %r12\n"   \
"   pushq %r13\n"   \
"   pushq %r14\n"   \
@@ -48,7 +57,7 @@
"   pushl %es\n"\
"   pushl %ds\n"\
"   pushl %eax\n"   \
-   "   pushl %ebp\n"   \
+   SAVE_RBP_STRING \
"   pushl %edi\n"   \
"   pushl %esi\n"   \
"   pushl %edx\n"   \


Re: WARNING: kernel stack frame pointer at ffff880156a5fea0 in bash:2103 has bad value 00007ffec7d87e50

2017-09-26 Thread Josh Poimboeuf
On Tue, Sep 26, 2017 at 11:51:31PM +0200, Richard Weinberger wrote:
> Alexei,
> 
> CC'ing Josh and Ingo.
> 
> Am Dienstag, 26. September 2017, 06:09:02 CEST schrieb Alexei Starovoitov:
> > On Mon, Sep 25, 2017 at 11:23:31PM +0200, Richard Weinberger wrote:
> > > Hi!
> > > 
> > > While playing with bcc's opensnoop tool on Linux 4.14-rc2 I managed to
> > > trigger this splat:
> > > 
> > > [  297.629773] WARNING: kernel stack frame pointer at 880156a5fea0 in
> > > bash:2103 has bad value 7ffec7d87e50
> > > [  297.629777] unwind stack type:0 next_sp:  (null) mask:0x6
> > > graph_idx:0
> > > [  297.629783] 88015b207ae0: 88015b207b68 (0x88015b207b68)
> > > [  297.629790] 88015b207ae8: b163c00e
> > > (__save_stack_trace+0x6e/
> > > 0xd0)
> > > [  297.629792] 88015b207af0:  ...
> > > [  297.629795] 88015b207af8: 880156a58000 (0x880156a58000)
> > > [  297.629799] 88015b207b00: 880156a6 (0x880156a6)
> > > [  297.629800] 88015b207b08:  ...
> > > [  297.629803] 88015b207b10: 0006 (0x6)
> > > [  297.629806] 88015b207b18: 880151b02700 (0x880151b02700)
> > > [  297.629809] 88015b207b20: 0101 (0x101)
> > > [  297.629812] 88015b207b28: 880156a5fea0 (0x880156a5fea0)
> > > [  297.629815] 88015b207b30: 88015b207ae0 (0x88015b207ae0)
> > > [  297.629818] 88015b207b38: c0050282 (0xc0050282)
> > > [  297.629819] 88015b207b40:  ...
> > > [  297.629822] 88015b207b48: 0100 (0x100)
> > > [  297.629825] 88015b207b50: 880157b98280 (0x880157b98280)
> > > [  297.629828] 88015b207b58: 880157b98380 (0x880157b98380)
> > > [  297.629831] 88015b207b60: 88015ad2b500 (0x88015ad2b500)
> > > [  297.629834] 88015b207b68: 88015b207b78 (0x88015b207b78)
> > > [  297.629838] 88015b207b70: b163c086
> > > (save_stack_trace+0x16/0x20) [  297.629841] 88015b207b78:
> > > 88015b207da8 (0x88015b207da8) [  297.629847] 88015b207b80:
> > > b18a8ed6 (save_stack+0x46/0xd0) [  297.629850] 88015b207b88:
> > > 004c (0x4c)
> > > [  297.629852] 88015b207b90: 88015b207ba0 (0x88015b207ba0)
> > > [  297.629855] 88015b207b98: 8801 (0x8801)
> > > [  297.629859] 88015b207ba0: b163c086
> > > (save_stack_trace+0x16/0x20) [  297.629864] 88015b207ba8:
> > > b18a8ed6 (save_stack+0x46/0xd0) [  297.629868] 88015b207bb0:
> > > b18a9752 (kasan_slab_free+0x72/0xc0)
> > Thanks for the report!
> > I'm not sure I understand what's going on here.
> > It seems you have kasan enabled and it's trying to do save_stack()
> > and something crashing?
> > I don't see any bpf related helpers in the stack trace.
> > Which architecture is this? and .config ?
> > Is bpf jit enabled? If so, make sure that net.core.bpf_jit_kallsyms=1
> 
> I found some time to dig a little further.
> It seems to happen only when CONFIG_DEBUG_SPINLOCK is enabled, please see the 
> attached .config. The JIT is off.
> KAsan is also not involved at all, the regular stack saving machinery from 
> the 
> trace framework initiates the stack unwinder.
> 
> The issue arises as soon as in pre_handler_kretprobe() 
> raw_spin_lock_irqsave() 
> is being called.
> It happens on all releases that have commit c32c47c68a0a ("x86/unwind: Warn 
> on 
> bad frame pointer").
> Interestingly it does not happen when I run 
> samples/kprobes/kretprobe_example.ko. So, BPF must be involved somehow.
> 
> Here is another variant of the warning, it matches the attached .config:

I can take a look at it.  Unfortunately, for these types of issues I
often need the vmlinux file to be able to make sense of the unwinder
dump.  So if you happen to have somewhere to copy the vmlinux to, that
would be helpful.  Or if you give me your GCC version I can try to
rebuild it locally.

-- 
Josh


[tip:x86/urgent] x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic()

2017-05-05 Thread tip-bot for Josh Poimboeuf
Commit-ID:  42fc6c6cb1662ba2fa727dd01c9473c63be4e3b6
Gitweb: http://git.kernel.org/tip/42fc6c6cb1662ba2fa727dd01c9473c63be4e3b6
Author: Josh Poimboeuf <jpoim...@redhat.com>
AuthorDate: Thu, 4 May 2017 09:51:40 -0500
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 5 May 2017 07:59:24 +0200

x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic()

Andrey Konovalov reported the following warning while fuzzing the kernel
with syzkaller:

  WARNING: kernel stack regs at 8800686869f8 in a.out:4933 has bad 'bp' 
value c3fc855a10167ec0

The unwinder dump revealed that RBP had a bad value when an interrupt
occurred in csum_partial_copy_generic().

That function saves RBP on the stack and then overwrites it, using it as
a scratch register.  That's problematic because it breaks stack traces
if an interrupt occurs in the middle of the function.

Replace the usage of RBP with another callee-saved register (R15) so
stack traces are no longer affected.

Reported-by: Andrey Konovalov <andreyk...@google.com>
Tested-by: Andrey Konovalov <andreyk...@google.com>
Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com>
Cc: Cong Wang <xiyou.wangc...@gmail.com>
Cc: David S . Miller <da...@davemloft.net>
Cc: Dmitry Vyukov <dvyu...@google.com>
Cc: Eric Dumazet <eduma...@google.com>
Cc: Kostya Serebryany <k...@google.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
Cc: Neil Horman <nhor...@tuxdriver.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Vlad Yasevich <vyasev...@gmail.com>
Cc: linux-s...@vger.kernel.org
Cc: netdev <netdev@vger.kernel.org>
Cc: syzkaller <syzkal...@googlegroups.com>
Link: 
http://lkml.kernel.org/r/4b03a961efda5ec9bfe46b7b9c9ad72d1efad343.1493909486.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/lib/csum-copy_64.S | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/csum-copy_64.S b/arch/x86/lib/csum-copy_64.S
index 7e48807..45a53df 100644
--- a/arch/x86/lib/csum-copy_64.S
+++ b/arch/x86/lib/csum-copy_64.S
@@ -55,7 +55,7 @@ ENTRY(csum_partial_copy_generic)
movq  %r12, 3*8(%rsp)
movq  %r14, 4*8(%rsp)
movq  %r13, 5*8(%rsp)
-   movq  %rbp, 6*8(%rsp)
+   movq  %r15, 6*8(%rsp)
 
movq  %r8, (%rsp)
movq  %r9, 1*8(%rsp)
@@ -74,7 +74,7 @@ ENTRY(csum_partial_copy_generic)
/* main loop. clear in 64 byte blocks */
/* r9: zero, r8: temp2, rbx: temp1, rax: sum, rcx: saved length */
/* r11: temp3, rdx: temp4, r12 loopcnt */
-   /* r10: temp5, rbp: temp6, r14 temp7, r13 temp8 */
+   /* r10: temp5, r15: temp6, r14 temp7, r13 temp8 */
.p2align 4
 .Lloop:
source
@@ -89,7 +89,7 @@ ENTRY(csum_partial_copy_generic)
source
movq  32(%rdi), %r10
source
-   movq  40(%rdi), %rbp
+   movq  40(%rdi), %r15
source
movq  48(%rdi), %r14
source
@@ -103,7 +103,7 @@ ENTRY(csum_partial_copy_generic)
adcq  %r11, %rax
adcq  %rdx, %rax
adcq  %r10, %rax
-   adcq  %rbp, %rax
+   adcq  %r15, %rax
adcq  %r14, %rax
adcq  %r13, %rax
 
@@ -121,7 +121,7 @@ ENTRY(csum_partial_copy_generic)
dest
movq %r10, 32(%rsi)
dest
-   movq %rbp, 40(%rsi)
+   movq %r15, 40(%rsi)
dest
movq %r14, 48(%rsi)
dest
@@ -203,7 +203,7 @@ ENTRY(csum_partial_copy_generic)
movq 3*8(%rsp), %r12
movq 4*8(%rsp), %r14
movq 5*8(%rsp), %r13
-   movq 6*8(%rsp), %rbp
+   movq 6*8(%rsp), %r15
addq $7*8, %rsp
ret
 

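To illustrate why a clobbered RBP produces the warning above: a
frame-pointer unwinder walks the chain of saved rbp values, roughly like
this userspace sketch (the kernel's unwinder is more involved, but it
follows the same saved-rbp chain):

  #include <stdio.h>

  struct frame {
          struct frame *next;         /* saved rbp of the caller */
          unsigned long ret_addr;     /* return address pushed by call */
  };

  static void walk(const struct frame *fp)
  {
          while (fp) {
                  printf("  ret=%#lx\n", fp->ret_addr);
                  /* If the interrupted function kept checksum data in
                   * rbp, this is a wild pointer and the "bad 'bp' value"
                   * check fires. */
                  fp = fp->next;
          }
  }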

Re: [PATCH] x86/asm: Don't use rbp as temp register in csum_partial_copy_generic()

2017-05-04 Thread Josh Poimboeuf
On Thu, May 04, 2017 at 03:56:49PM +, David Laight wrote:
> From: Josh Poimboeuf
> > Sent: 04 May 2017 15:52
> > Andrey Konovalov reported the following warning while fuzzing the kernel
> > with syzkaller:
> > 
> >   WARNING: kernel stack regs at 8800686869f8 in a.out:4933 has bad 'bp' 
> > value c3fc855a10167ec0
> > 
> > The unwinder dump revealed that rbp had a bad value when an interrupt
> > occurred in csum_partial_copy_generic().
> > 
> > That function saves rbp on the stack and then overwrites it, using it as
> > a scratch register.  That's problematic because it breaks stack traces
> > if an interrupt occurs in the middle of the function.
> 
> Does gcc guarantee not to use bp as a scratch register in leaf functions?

At least in practice, gcc doesn't touch rbp in leaf functions.  (I don't
know about guarantees.)

-- 
Josh


[PATCH] x86/asm: Don't use rbp as temp register in csum_partial_copy_generic()

2017-05-04 Thread Josh Poimboeuf
Andrey Konovalov reported the following warning while fuzzing the kernel
with syzkaller:

  WARNING: kernel stack regs at 8800686869f8 in a.out:4933 has bad 'bp' 
value c3fc855a10167ec0

The unwinder dump revealed that rbp had a bad value when an interrupt
occurred in csum_partial_copy_generic().

That function saves rbp on the stack and then overwrites it, using it as
a scratch register.  That's problematic because it breaks stack traces
if an interrupt occurs in the middle of the function.

Replace the usage of rbp with another callee-saved register (r15) so
stack traces are no longer affected.

Reported-by: Andrey Konovalov <andreyk...@google.com>
Tested-by: Andrey Konovalov <andreyk...@google.com>
Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com>
---
 arch/x86/lib/csum-copy_64.S | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/csum-copy_64.S b/arch/x86/lib/csum-copy_64.S
index 7e48807..45a53df 100644
--- a/arch/x86/lib/csum-copy_64.S
+++ b/arch/x86/lib/csum-copy_64.S
@@ -55,7 +55,7 @@ ENTRY(csum_partial_copy_generic)
movq  %r12, 3*8(%rsp)
movq  %r14, 4*8(%rsp)
movq  %r13, 5*8(%rsp)
-   movq  %rbp, 6*8(%rsp)
+   movq  %r15, 6*8(%rsp)
 
movq  %r8, (%rsp)
movq  %r9, 1*8(%rsp)
@@ -74,7 +74,7 @@ ENTRY(csum_partial_copy_generic)
/* main loop. clear in 64 byte blocks */
/* r9: zero, r8: temp2, rbx: temp1, rax: sum, rcx: saved length */
/* r11: temp3, rdx: temp4, r12 loopcnt */
-   /* r10: temp5, rbp: temp6, r14 temp7, r13 temp8 */
+   /* r10: temp5, r15: temp6, r14 temp7, r13 temp8 */
.p2align 4
 .Lloop:
source
@@ -89,7 +89,7 @@ ENTRY(csum_partial_copy_generic)
source
movq  32(%rdi), %r10
source
-   movq  40(%rdi), %rbp
+   movq  40(%rdi), %r15
source
movq  48(%rdi), %r14
source
@@ -103,7 +103,7 @@ ENTRY(csum_partial_copy_generic)
adcq  %r11, %rax
adcq  %rdx, %rax
adcq  %r10, %rax
-   adcq  %rbp, %rax
+   adcq  %r15, %rax
adcq  %r14, %rax
adcq  %r13, %rax
 
@@ -121,7 +121,7 @@ ENTRY(csum_partial_copy_generic)
dest
movq %r10, 32(%rsi)
dest
-   movq %rbp, 40(%rsi)
+   movq %r15, 40(%rsi)
dest
movq %r14, 48(%rsi)
dest
@@ -203,7 +203,7 @@ ENTRY(csum_partial_copy_generic)
movq 3*8(%rsp), %r12
movq 4*8(%rsp), %r14
movq 5*8(%rsp), %r13
-   movq 6*8(%rsp), %rbp
+   movq 6*8(%rsp), %r15
addq $7*8, %rsp
ret
 
-- 
2.7.4



Re: x86: warning: kernel stack regs has bad 'bp' value

2017-05-03 Thread Josh Poimboeuf
On Wed, May 03, 2017 at 02:48:28PM +0200, Andrey Konovalov wrote:
> Hi,
> 
> I've got the following error report while fuzzing the kernel with syzkaller.
> 
> On commit 89c9fea3c8034cdb2fd745f551cde0b507fd6893 (4.11.0+).
> 
> A reproducer and .config are attached.
> 
> The reproducer open SCTP sockets and sends data to it in a loop.
> I'm not sure whether this is an issue with SCTP or with something else.
> 
> WARNING: kernel stack regs at 8800686869f8 in a.out:4933 has bad
> 'bp' value c3fc855a10167ec0

Hi Andrey,

Can you test this patch?


diff --git a/arch/x86/lib/csum-copy_64.S b/arch/x86/lib/csum-copy_64.S
index 7e48807..45a53df 100644
--- a/arch/x86/lib/csum-copy_64.S
+++ b/arch/x86/lib/csum-copy_64.S
@@ -55,7 +55,7 @@ ENTRY(csum_partial_copy_generic)
movq  %r12, 3*8(%rsp)
movq  %r14, 4*8(%rsp)
movq  %r13, 5*8(%rsp)
-   movq  %rbp, 6*8(%rsp)
+   movq  %r15, 6*8(%rsp)
 
movq  %r8, (%rsp)
movq  %r9, 1*8(%rsp)
@@ -74,7 +74,7 @@ ENTRY(csum_partial_copy_generic)
/* main loop. clear in 64 byte blocks */
/* r9: zero, r8: temp2, rbx: temp1, rax: sum, rcx: saved length */
/* r11: temp3, rdx: temp4, r12 loopcnt */
-   /* r10: temp5, rbp: temp6, r14 temp7, r13 temp8 */
+   /* r10: temp5, r15: temp6, r14 temp7, r13 temp8 */
.p2align 4
 .Lloop:
source
@@ -89,7 +89,7 @@ ENTRY(csum_partial_copy_generic)
source
movq  32(%rdi), %r10
source
-   movq  40(%rdi), %rbp
+   movq  40(%rdi), %r15
source
movq  48(%rdi), %r14
source
@@ -103,7 +103,7 @@ ENTRY(csum_partial_copy_generic)
adcq  %r11, %rax
adcq  %rdx, %rax
adcq  %r10, %rax
-   adcq  %rbp, %rax
+   adcq  %r15, %rax
adcq  %r14, %rax
adcq  %r13, %rax
 
@@ -121,7 +121,7 @@ ENTRY(csum_partial_copy_generic)
dest
movq %r10, 32(%rsi)
dest
-   movq %rbp, 40(%rsi)
+   movq %r15, 40(%rsi)
dest
movq %r14, 48(%rsi)
dest
@@ -203,7 +203,7 @@ ENTRY(csum_partial_copy_generic)
movq 3*8(%rsp), %r12
movq 4*8(%rsp), %r14
movq 5*8(%rsp), %r13
-   movq 6*8(%rsp), %rbp
+   movq 6*8(%rsp), %r15
addq $7*8, %rsp
ret
 


[PATCH v19 04/10] bpf: Mark __bpf_prog_run() stack frame as non-standard

2016-02-28 Thread Josh Poimboeuf
objtool reports the following false positive warnings:

  kernel/bpf/core.o: warning: objtool: __bpf_prog_run()+0x5c: sibling call from 
callable instruction with changed frame pointer
  kernel/bpf/core.o: warning: objtool: __bpf_prog_run()+0x60: function has 
unreachable instruction
  kernel/bpf/core.o: warning: objtool: __bpf_prog_run()+0x64: function has 
unreachable instruction
  [...]

It's confused by the following dynamic jump instruction in
__bpf_prog_run()::

  jmp *(%r12,%rax,8)

which corresponds to the following line in the C code:

  goto *jumptable[insn->code];

There's no way for objtool to deterministically find all possible
branch targets for a dynamic jump, so it can't verify this code.

In this case the jumps all stay within the function, and there's nothing
unusual going on related to the stack, so we can whitelist the function.

Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com>
Acked-by: Daniel Borkmann <dan...@iogearbox.net>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Cc: netdev@vger.kernel.org
---
 kernel/bpf/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 972d9a8..be0abf6 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -649,6 +650,7 @@ load_byte:
WARN_RATELIMIT(1, "unknown opcode %02x\n", insn->code);
return 0;
 }
+STACK_FRAME_NON_STANDARD(__bpf_prog_run); /* jump table */
 
 bool bpf_prog_array_compatible(struct bpf_array *array,
   const struct bpf_prog *fp)
-- 
2.4.3

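For readers unfamiliar with the construct, the goto *jumptable[insn->code]
dispatch in the patch above looks roughly like this stripped-down sketch
(GNU C labels-as-values; the opcode names are made up, not the real BPF
ones).  The compiler turns the computed goto into an indirect jmp through
the table, which is exactly the instruction objtool cannot statically
resolve:

  static int run(const unsigned char *ops)
  {
          static const void *jumptable[] = { &&op_inc, &&op_exit };
          int acc = 0;

  select_insn:
          goto *jumptable[*ops];

  op_inc:
          acc++;
          ops++;
          goto select_insn;

  op_exit:
          return acc;
  }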


[PATCH v18 4/9] bpf: Mark __bpf_prog_run() stack frame as non-standard

2016-02-25 Thread Josh Poimboeuf
objtool reports the following false positive warnings:

  objtool: kernel/bpf/core.o: __bpf_prog_run()+0x5c: sibling call from callable 
instruction with changed frame pointer
  objtool: kernel/bpf/core.o: __bpf_prog_run()+0x60: function has unreachable 
instruction
  objtool: kernel/bpf/core.o: __bpf_prog_run()+0x64: function has unreachable 
instruction
  [...]

It's confused by the following dynamic jump instruction in
__bpf_prog_run()::

  jmp *(%r12,%rax,8)

which corresponds to the following line in the C code:

  goto *jumptable[insn->code];

There's no way for objtool to deterministically find all possible
branch targets for a dynamic jump, so it can't verify this code.

In this case the jumps all stay within the function, and there's nothing
unusual going on related to the stack, so we can whitelist the function.

Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com>
Acked-by: Daniel Borkmann <dan...@iogearbox.net>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Cc: netdev@vger.kernel.org
---
 kernel/bpf/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 972d9a8..be0abf6 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -649,6 +650,7 @@ load_byte:
WARN_RATELIMIT(1, "unknown opcode %02x\n", insn->code);
return 0;
 }
+STACK_FRAME_NON_STANDARD(__bpf_prog_run); /* jump table */
 
 bool bpf_prog_array_compatible(struct bpf_array *array,
   const struct bpf_prog *fp)
-- 
2.4.3



Re: [PATCH v17 0/9] Compile-time stack metadata validation

2016-02-25 Thread Josh Poimboeuf
On Thu, Feb 25, 2016 at 09:02:04AM +0100, Ingo Molnar wrote:
> 
> * Josh Poimboeuf <jpoim...@redhat.com> wrote:
> 
> > This is v17 of the compile-time stack metadata validation patch set.
> > 
> > It's based on tip:x86/debug.  However, note that when run against that
> > branch it will give a lot of warnings:
> > 
> >   objtool: arch/x86/ia32/sys_ia32.o: __ex_table size not a multiple of 12
> >   objtool: arch/x86/ia32/ia32_signal.o: __ex_table size not a multiple of 12
> >   objtool: arch/x86/entry/common.o: __ex_table size not a multiple of 12
> >   ...
> > 
> > These warnings means that objtool is expecting the new exception table
> > format which was introduced with:
> > 
> >   548acf19234d ("x86/mm: Expand the exception table logic to allow new 
> > handling options")
> > 
> > So that commit is needed for those warnings to go away.
> 
> Ok. I have created a new topic branch for the objtool commits: 
> tip:core/objtool, 
> and merged tip:ras/core into it (which hosts 548acf19234d).
> 
> So please use tip:core/objtool as a base from now on.
> 
> Two more minor observations:
> 
> Please re-order the patches slightly so that the annotations are added first, 
> and 
> objtool is added and Kconfig-enabled in the final patches.
> 
> >   x86/objtool: Compile-time stack metadata validation
> >   x86/objtool: Add CONFIG_STACK_VALIDATION option
> >   x86/objtool: Enable objtool on x86_64
> >   x86/objtool: Add STACK_FRAME_NON_STANDARD macro
> >   x86/objtool: Add directory and file whitelists
> 
> btw., please use the more generic 'objtool: ' prefix, the concept is not 
> fundamentally x86 specific - especially once dwarf debuginfo is interpreted 
> it 
> will be useful beyond x86 as well I suspect.
> 
> >   x86/xen: Add xen_cpuid() to objtool whitelist
> >   bpf: Add __bpf_prog_run() to objtool whitelist
> >   sched: Add __schedule() to objtool whitelist
> >   x86/kprobes: Add kretprobe_trampoline() to objtool whitelist

Ok.  I'll make all your suggested changes and post v18 today.

-- 
Josh


[PATCH v17 0/9] Compile-time stack metadata validation

2016-02-24 Thread Josh Poimboeuf
tion
- detection of "noreturn" dead end functions
- added a Kbuild mechanism for skipping files and dirs
- moved frame pointer macros to arch/x86/include/asm/frame.h
- moved ignore macros to include/linux/stackvalidate.h

v5:
- stackvalidate -> asmvalidate
- frame pointers only required for non-leaf functions
- check for the use of the FP_SAVE/RESTORE macros instead of manually
  analyzing code to detect frame pointer usage
- additional checks to ensure each function doesn't leave its boundaries
- make the macros simpler and more flexible
- support for analyzing ALTERNATIVE macros
- simplified the arch interfaces in scripts/asmvalidate/arch.h
- fixed some asmvalidate warnings
- rebased onto latest tip asm cleanups
- many more small changes

v4:
- Changed the default to CONFIG_STACK_VALIDATION=n, until all the asm
  code can get cleaned up.
- Fixed a stackvalidate error path exit code issue found by Michal
  Marek.

v3:
- Added a patch to make the push/pop CFI macros arch-independent, as
  suggested by H. Peter Anvin

v2:
- Fixed memory leaks reported by Petr Mladek

Cc: linux-ker...@vger.kernel.org
Cc: live-patch...@vger.kernel.org
Cc: Michal Marek <mma...@suse.cz>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Andi Kleen <a...@firstfloor.org>
Cc: Pedro Alves <pal...@redhat.com>
Cc: Namhyung Kim <namhy...@gmail.com>
Cc: Bernd Petrovitsch <be...@petrovitsch.priv.at>
Cc: Chris J Arges <chris.j.ar...@canonical.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Jiri Slaby <jsl...@suse.cz>
Cc: Arnaldo Carvalho de Melo <a...@kernel.org>

Josh Poimboeuf (9):
  x86/objtool: Compile-time stack metadata validation
  x86/objtool: Add CONFIG_STACK_VALIDATION option
  x86/objtool: Enable objtool on x86_64
  x86/objtool: Add STACK_FRAME_NON_STANDARD macro
  x86/objtool: Add directory and file whitelists
  x86/xen: Add xen_cpuid() to objtool whitelist
  bpf: Add __bpf_prog_run() to objtool whitelist
  sched: Add __schedule() to objtool whitelist
  x86/kprobes: Add kretprobe_trampoline() to objtool whitelist

 MAINTAINERS   |   6 +
 Makefile  |   5 +-
 arch/Kconfig  |   6 +
 arch/x86/Kconfig  |   1 +
 arch/x86/boot/Makefile|   3 +-
 arch/x86/boot/compressed/Makefile |   3 +-
 arch/x86/entry/Makefile   |   4 +
 arch/x86/entry/vdso/Makefile  |   6 +-
 arch/x86/kernel/Makefile  |  11 +-
 arch/x86/kernel/kprobes/core.c|   2 +
 arch/x86/kernel/vmlinux.lds.S |   5 +-
 arch/x86/platform/efi/Makefile|   2 +
 arch/x86/purgatory/Makefile   |   2 +
 arch/x86/realmode/Makefile|   4 +-
 arch/x86/realmode/rm/Makefile |   3 +-
 arch/x86/xen/enlighten.c  |   3 +-
 drivers/firmware/efi/libstub/Makefile |   1 +
 include/linux/objtool.h   |  22 +
 kernel/bpf/core.c |   2 +
 kernel/sched/core.c   |   2 +
 lib/Kconfig.debug |  12 +
 scripts/Makefile.build|  39 +-
 scripts/mod/Makefile  |   2 +
 tools/Makefile|  14 +-
 tools/objtool/.gitignore  |   2 +
 tools/objtool/Build   |  13 +
 tools/objtool/Documentation/stack-validation.txt  | 340 
 tools/objtool/Makefile|  60 ++
 tools/objtool/arch.h  |  44 +
 tools/objtool/arch/x86/Build  |  12 +
 tools/objtool/arch/x86/decode.c   | 172 
 tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk | 387 +
 tools/objtool/arch/x86/insn/inat.c|  97 +++
 tools/objtool/arch/x86/insn/inat.h| 221 +
 tools/objtool/arch/x86/insn/inat_types.h  |  29 +
 tools/objtool/arch/x86/insn/insn.c| 594 +
 tools/objtool/arch/x86/insn/insn.h| 201 +
 tools/objtool/arch/x86/insn/x86-opcode-map.txt| 984 +
 tools/objtool/builtin-check.c | 993 ++
 tools/objtool/builtin.h   |  22 +
 tools/objtool/elf.c   | 403 +
 tools/objtool/elf.h   |  79 ++
 tools/objtool/objtool.c   | 134 +++
 tools/objtool/special.c   | 193 +
 tools/objtool/special.h  

[PATCH v17 7/9] bpf: Add __bpf_prog_run() to objtool whitelist

2016-02-24 Thread Josh Poimboeuf
objtool reports the following false positive warnings:

  objtool: kernel/bpf/core.o: __bpf_prog_run()+0x5c: sibling call from callable 
instruction with changed frame pointer
  objtool: kernel/bpf/core.o: __bpf_prog_run()+0x60: function has unreachable 
instruction
  objtool: kernel/bpf/core.o: __bpf_prog_run()+0x64: function has unreachable 
instruction
  [...]

It's confused by the following dynamic jump instruction in
__bpf_prog_run()::

  jmp *(%r12,%rax,8)

which corresponds to the following line in the C code:

  goto *jumptable[insn->code];

There's no way for objtool to deterministically find all possible
branch targets for a dynamic jump, so it can't verify this code.

In this case the jumps all stay within the function, and there's nothing
unusual going on related to the stack, so we can whitelist the function.

Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com>
Acked-by: Daniel Borkmann <dan...@iogearbox.net>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Cc: netdev@vger.kernel.org
---
 kernel/bpf/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 972d9a8..b26b1f8 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -649,6 +650,7 @@ load_byte:
WARN_RATELIMIT(1, "unknown opcode %02x\n", insn->code);
return 0;
 }
+STACK_FRAME_NON_STANDARD(__bpf_prog_run); /* jump table */
 
 bool bpf_prog_array_compatible(struct bpf_array *array,
   const struct bpf_prog *fp)
-- 
2.4.3



Re: [PATCH 00/33] Compile-time stack metadata validation

2016-02-24 Thread Josh Poimboeuf
On Wed, Feb 24, 2016 at 08:40:54AM +0100, Ingo Molnar wrote:
> 
> * Josh Poimboeuf <jpoim...@redhat.com> wrote:
> 
> > Hi Ingo,
> > 
> > On Tue, Feb 23, 2016 at 09:14:06AM +0100, Ingo Molnar wrote:
> > > So I tried out this latest stacktool series and it looks mostly good for 
> > > an 
> > > upstream merge.
> > > 
> > > To help this effort move forward I've applied the preparatory/fix patches 
> > > that are 
> > > part of this series to tip:x86/debug - that's 26 out of 31 patches. (I've 
> > > propagated all the acks that the latest submission got into the 
> > > changelogs.)
> > 
> > Thanks very much for your review and for applying the fixes!
> > 
> > A few issues relating to the merge:
> > 
> > - The tip:x86/debug branch fails to build because it depends on
> >   ec5186557abb ("x86/asm: Add C versions of frame pointer macros") which
> >   is in tip:x86/asm.
> 
> Indeed...

FYI, v17 will be based on tip:x86/debug.  But note that, when run
against that branch, it'll spit out a lot of these warnings:

  objtool: arch/x86/ia32/sys_ia32.o: __ex_table size not a multiple of 12
  objtool: arch/x86/ia32/ia32_signal.o: __ex_table size not a multiple of 12
  objtool: arch/x86/entry/common.o: __ex_table size not a multiple of 12

Those warnings mean it's expecting the new exception table format which
was added in:

  548acf19234d ("x86/mm: Expand the exception table logic to allow new handling 
options")

So that commit is needed to avoid the warnings.

Thanks!

-- 
Josh
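
For context, the "multiple of 12" check corresponds to the relative
exception table format introduced by 548acf19234d, where each x86 entry
became three 32-bit relative offsets (sketch of the layout, for
illustration):

  struct exception_table_entry {
          int insn;       /* relative offset of the faulting instruction */
          int fixup;      /* relative offset of the fixup code */
          int handler;    /* relative offset of the handler (the new field) */
  };      /* 12 bytes per entry, hence the size check above */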


Re: [PATCH 00/33] Compile-time stack metadata validation

2016-02-23 Thread Josh Poimboeuf
On Tue, Feb 23, 2016 at 11:27:17AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Feb 23, 2016 at 09:14:06AM +0100, Ingo Molnar escreveu:
> > The fact that 'stacktool' already checks about assembly details like 
> > __ex_table[] 
> > shows that my review feedback early iterations of this series, that the 
> > 'stacktool' name is too specific, was correct.
> > 
> > We really need to rename it before it gets upstream and the situation gets 
> > worse. 
> > __ex_table[] has obviously nothing to do with the 'stack layout' of 
> > binaries.
> > 
> > Another suitable name would be 'asmtool' or 'objtool'. For example the 
> > following 
> > would naturally express what it does:
> > 
> >   objtool check kernel/built-in.o
> > 
> > the name expresses that the tool checks object files, independently of the 
> > existing toolchain. Its primary purpose right now is the checking of stack 
> > layout 
> > details, but the tool is not limited to that at all.
> 
> Removing 'tool' from the tool name would be nice too :-) Making it
> easily googlable would be good too, lotsa people complain about 'perf'
> being too vague, see for a quick laugher:
> 
> http://www.brendangregg.com/perf.html
> 
> ``Searching for just "perf" finds sites on the police, petroleum, weed
> control, and a T-shirt. This is not an official perf page, for either
> perf_events or the T-shirt.''
> 
> The T-shirt: http://www.brendangregg.com/perf_events/omg-so-perf.jpg

Yeah, 'tool' in the name is kind of silly, but the above type of
situation is why I prefer 'objtool' over 'obj'.

Though I have to admit I like the idea of making a t-shirt for it ;-)

> Maybe we should ask Linus to come with some other nice, short,
> searchable, funny name like 'git'?
> 
> 'chob' as in 'check object'?

I think 'objtool' is searchable enough.  And I also like the fact that
its name at least gives you an idea of what it does (and eventually it
will do more than just "checking").

-- 
Josh


Re: [PATCH 00/33] Compile-time stack metadata validation

2016-02-23 Thread Josh Poimboeuf
Hi Ingo,

On Tue, Feb 23, 2016 at 09:14:06AM +0100, Ingo Molnar wrote:
> So I tried out this latest stacktool series and it looks mostly good for an 
> upstream merge.
> 
> To help this effort move forward I've applied the preparatory/fix patches 
> that are 
> part of this series to tip:x86/debug - that's 26 out of 31 patches. (I've 
> propagated all the acks that the latest submission got into the changelogs.)

Thanks very much for your review and for applying the fixes!

A few issues relating to the merge:

- The tip:x86/debug branch fails to build because it depends on
  ec5186557abb ("x86/asm: Add C versions of frame pointer macros") which
  is in tip:x86/asm.

- As Pavel mentioned, the tip-bot seems to be spitting out garbage
  emails from:
  
=?UTF-8?B?dGlwLWJvdCBmb3IgSm9zaCBQb2ltYm9ldWYgPHRpcGJvdEB6eXRvci5jb20+?=@zytor.com.

> I have 5 (easy to address) observations that need to be addressed before we 
> can 
> move on with the rest of the merge:
> 
> 1)
> 
> Due to recent changes to x86 exception handling, I get a lot of bogus 
> warnings 
> about exception table sizes:
> 
>   stacktool: arch/x86/kernel/cpu/mcheck/mce.o: __ex_table size not a multiple 
> of 24
>   stacktool: arch/x86/kernel/cpu/mtrr/generic.o: __ex_table size not a 
> multiple of 24
>   stacktool: arch/x86/kernel/cpu/mtrr/cleanup.o: __ex_table size not a 
> multiple of 24
> 
> this is due to:
> 
>   548acf19234d x86/mm: Expand the exception table logic to allow new handling 
> options

Ok, I'll fix it up.

> 2)
> 
> The fact that 'stacktool' already checks about assembly details like 
> __ex_table[] 
> shows that my review feedback early iterations of this series, that the 
> 'stacktool' name is too specific, was correct.
> 
> We really need to rename it before it gets upstream and the situation gets 
> worse. 
> __ex_table[] has obviously nothing to do with the 'stack layout' of binaries.
> 
> Another suitable name would be 'asmtool' or 'objtool'. For example the 
> following 
> would naturally express what it does:
> 
>   objtool check kernel/built-in.o
> 
> the name expresses that the tool checks object files, independently of the 
> existing toolchain. Its primary purpose right now is the checking of stack 
> layout 
> details, but the tool is not limited to that at all.

Fair enough.  I'll rename it to objtool if there are no objections.

> 3)
> 
> There's quite a bit of overhead when running the tool on larger object files, 
> most 
> prominently in cmd_check():
> 
> 62.06%  stacktoolstacktool  [.] 
> cmd_check
>  6.72%  stacktoolstacktool  [.] 
> find_rela_by_dest_range
> 
> I added -g to the CFLAGS, which made it visible to annotated output of perf 
> top:
> 
> 0.00 :40430d:   lea0x4(%rdx,%rax,1),%r13
>  :  find_instruction():
>  :  struct section 
> *sec,
>  :  unsigned long 
> offset)
>  :  {
>  :  struct instruction *insn;
>  :
 :  list_for_each_entry(insn, &file->insns, list)
> 0.03 :404312:   mov0x38(%rsp),%rax
> 0.00 :404317:   cmp%rbp,%rax
> 0.00 :40431a:   jne404334 
> 0.00 :40431c:   jmpq   4045ba 
> 0.00 :404321:   nopl   0x0(%rax)
> 6.14 :404328:   mov(%rax),%rax
> 0.00 :40432b:   cmp%rbp,%rax
> 2.02 :40432e:   je 4045ba 
>  :  if (insn->sec == sec && insn->offset == 
> offset)
> 0.55 :404334:   cmp%r12,0x10(%rax)
>87.91 :404338:   jne404328 
> 0.00 :40433a:   cmp%r13,0x18(%rax)
> 3.36 :40433e:   jne404328 
>  :  get_jump_destinations():
>  :   * later in validate_functions().
>  :   */
>  :  continue;
>  :  }
>  :
>  :  insn->jump_dest = find_instruction(file, 
> dest_sec, dest_off);
> 0.00 :404340:   mov%rax,0x48(%rbx)
> 0.00 :404344:   jmpq   4042b0 
> 0.00 :404349:   nopl   0x0(%rax)
>  :  fprintf():
>  :
>  :  # ifdef __va_arg_pack
>  :  __fortify_function int
>  :  fprintf (FILE *__restrict __stream, const char *__restrict 
> __fmt, ...)
>  :  {
>  :return __fprintf_chk (__stream, __USE_FORTIFY_LEVEL - 1, 
> __fmt,
> 
> that looks like a linear list search? That would suck with thousands of 
> entries.
> 
> (If this is non-trivial to improve then we can delay 

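The linear scan Ingo points at could be replaced by a lookup keyed on
(section, offset); a minimal sketch of that idea follows (hypothetical,
not the structure objtool actually uses):

  #include <stddef.h>

  #define INSN_HASH_BITS  16
  #define INSN_HASH_SIZE  (1u << INSN_HASH_BITS)

  struct insn_node {
          const void *sec;
          unsigned long offset;
          struct insn_node *next;
          /* ... decoded instruction fields ... */
  };

  static struct insn_node *insn_hash[INSN_HASH_SIZE];

  static unsigned int insn_hash_idx(const void *sec, unsigned long offset)
  {
          return ((unsigned long)sec ^ (offset * 2654435761ul)) &
                 (INSN_HASH_SIZE - 1);
  }

  static void insn_hash_add(struct insn_node *node)
  {
          unsigned int i = insn_hash_idx(node->sec, node->offset);

          node->next = insn_hash[i];
          insn_hash[i] = node;
  }

  static struct insn_node *find_instruction(const void *sec,
                                            unsigned long offset)
  {
          struct insn_node *node;

          node = insn_hash[insn_hash_idx(sec, offset)];
          for (; node; node = node->next)
                  if (node->sec == sec && node->offset == offset)
                          return node;
          return NULL;
  }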
Re: [PATCH 00/33] Compile-time stack metadata validation

2016-02-15 Thread Josh Poimboeuf
On Mon, Feb 15, 2016 at 08:56:21AM -0800, Linus Torvalds wrote:
> On Feb 15, 2016 8:31 AM, "Josh Poimboeuf" <jpoim...@redhat.com> wrote:
> >
> > So is the goal to optimize for size?  If I replace the calls to
> > __preempt_schedule[_notrace]() with real C calls and remove the thunks,
> > it only adds about 2k to vmlinux.
> 
> It adds nasty register pressure in some of the most critical parts of the
> kernel, and makes the asm code harder to read.
> 
> And yes, I still read the asm. For performance reasons, and when decoding
> oopses.
> 
> I realize that few other people care about generated code. That's sad.
> 
> > There are two ways to fix the warnings:
> >
> > 1. get rid of the thunks and call the C functions directly; or
> 
> No. Not until gcc learns about per-function callibg conventions (so that it
> can be marked as not clobbering registers).
> 
> > 2. add the stack pointer to the asm() statement output operand list to
> > ensure a stack frame gets created in the caller function before the
> > call.
> 
> That probably doesn't make things much worse. It probably makes leaf
> functions have a stack frame if they do preempt disable, but it might still
> be acceptable. Hard to say before I end up hurting this case again.

Oddly, this change (see patch below) actually seems to make things
faster in a lot of cases.  For many smaller functions it causes the
stack frame creation to get moved out of the common path and into the
unlikely path.

For example, here's the original cyc2ns_read_end():

  8101f8c0 :
  8101f8c0: 55  push   %rbp
  8101f8c1: 48 89 e5mov%rsp,%rbp
  8101f8c4: 83 6f 10 01 subl   $0x1,0x10(%rdi)
  8101f8c8: 75 08   jne8101f8d2 
<cyc2ns_read_end+0x12>
  8101f8ca: 65 48 89 3d e6 5a ffmov
%rdi,%gs:0x7eff5ae6(%rip)# 153b8 <cyc2ns+0x38>
  8101f8d1: 7e 
  8101f8d2: 65 ff 0d 77 c4 fe 7edecl   %gs:0x7efec477(%rip) 
   # bd50 <__preempt_count>
  8101f8d9: 74 02   je 8101f8dd 
<cyc2ns_read_end+0x1d>
  8101f8db: 5d  pop%rbp
  8101f8dc: c3  retq   
  8101f8dd: e8 1e 37 fe ff  callq  81003000 
<___preempt_schedule>
  8101f8e2: 5d  pop%rbp
  8101f8e3: c3  retq   
  8101f8e4: 66 66 66 2e 0f 1f 84data16 data16 nopw 
%cs:0x0(%rax,%rax,1)
  8101f8eb: 00 00 00 00 00 

And here's the same function with the patch:

  8101f8c0 :
  8101f8c0: 83 6f 10 01 subl   $0x1,0x10(%rdi)
  8101f8c4: 75 08   jne8101f8ce 
<cyc2ns_read_end+0xe>
  8101f8c6: 65 48 89 3d ea 5a ffmov
%rdi,%gs:0x7eff5aea(%rip)# 153b8 <cyc2ns+0x38>
  8101f8cd: 7e 
  8101f8ce: 65 ff 0d 7b c4 fe 7edecl   %gs:0x7efec47b(%rip) 
   # bd50 <__preempt_count>
  8101f8d5: 74 01   je 8101f8d8 
<cyc2ns_read_end+0x18>
  8101f8d7: c3  retq   
  8101f8d8: 55  push   %rbp
  8101f8d9: 48 89 e5mov%rsp,%rbp
  8101f8dc: e8 1f 37 fe ff  callq  81003000 
<___preempt_schedule>
  8101f8e1: 5d  pop%rbp
  8101f8e2: c3  retq   
  8101f8e3: 66 66 66 66 2e 0f 1fdata16 data16 data16 nopw 
%cs:0x0(%rax,%rax,1)
  8101f8ea: 84 00 00 00 00 00 

Notice that it moved the frame pointer setup code to the unlikely
___preempt_schedule() call path.  Going through a sampling of the
differences in the asm, that's the most common change I see.

Otherwise it has no real effect on callers which already have stack
frames (though it does change the ordering of some 'mov's).

And it has the intended effect of fixing the following warnings by
ensuring these call sites have stack frames:

  stacktool: drivers/scsi/hpsa.o: hpsa_scsi_do_simple_cmd.constprop.106()+0x79: 
call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x70: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x92: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xff: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xf5: call without frame 
pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0x11a: call without frame 
pointer save/setup
  stacktool: fs/mbcac

Re: [PATCH 00/33] Compile-time stack metadata validation

2016-02-15 Thread Josh Poimboeuf
On Fri, Feb 12, 2016 at 09:10:11PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 12, 2016 at 12:32:06PM -0600, Josh Poimboeuf wrote:
> > What I actually see in the listing is:
> > 
> > decl__percpu_prefix:__preempt_count
> > je  1f:
> > 
> >  1:
> > call___preempt_schedule
> > 
> > So it puts the "call ___preempt_schedule" in the slow path.
> 
> Ah yes indeed. Same difference though.
> 
> > I also don't see how that would be related to the use of the asm
> > statement in the __preempt_schedule() macro.  Doesn't the use of
> > unlikely() in preempt_enable() put the call in the slow path?
> 
> Sadly no, unlikely() and asm_goto don't work well together. But the slow
> path or not isn't the reason we do the asm call thing.
> 
> >   #define preempt_enable() \
> >   do { \
> >   barrier(); \
> >   if (unlikely(preempt_count_dec_and_test())) \
> >   preempt_schedule(); \
> >   } while (0)
> > 
> > Also, why is the thunk needed?  Any reason why preempt_enable() can't be
> > called directly from C?
> 
> That would make the call-site save registers and increase the size of
> every preempt_enable(). By using the thunk we can do callee saved
> registers and avoid blowing up the call site.

So is the goal to optimize for size?  If I replace the calls to
__preempt_schedule[_notrace]() with real C calls and remove the thunks,
it only adds about 2k to vmlinux.

There are two ways to fix the warnings:

1. get rid of the thunks and call the C functions directly; or

2. add the stack pointer to the asm() statement output operand list to
ensure a stack frame gets created in the caller function before the
call.  (Note this still allows the thunks to do callee saved registers.)

I like #1 better, but maybe I'm still missing the point of the thunks.

-- 
Josh
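
Option #2 above amounts to something like the following (a sketch; the
exact form and constraint that eventually lands in the tree may differ):

  /* Listing the stack pointer as an output operand forces gcc to set up
   * a frame in the caller before emitting the call. */
  #define __preempt_schedule()                                    \
  ({                                                              \
          register void *__sp asm("rsp");                         \
          asm volatile("call ___preempt_schedule" : "+r" (__sp)); \
  })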


Re: [PATCH 00/33] Compile-time stack metadata validation

2016-02-12 Thread Josh Poimboeuf
On Fri, Feb 12, 2016 at 11:36:24AM +0100, Jiri Slaby wrote:
> On 01/21/2016, 11:49 PM, Josh Poimboeuf wrote:
> > This is v16 of the compile-time stack metadata validation patch set,
> > along with proposed fixes for most of the warnings it found.  It's based
> > on the tip/master branch.
> 
> Hi,
> 
> with this config:
> https://github.com/openSUSE/kernel-source/blob/master/config/x86_64/vanilla
> 
> I am seeing a lot of functions in C which do not have frame pointer 
> setup/cleanup:

Hi Jiri,

Thanks for testing.

> stacktool: drivers/scsi/hpsa.o: hpsa_scsi_do_simple_cmd.constprop.106()+0x79: 
> call without frame pointer save/setup

This seems like a real frame pointer bug caused by the following line in
arch/x86/include/asm/preempt.h:

  # define __preempt_schedule() asm ("call ___preempt_schedule")

The asm statement doesn't have the stack pointer as an output operand,
so gcc doesn't do the frame pointer setup before the call.

However, I suspect the "bug" is intentional for optimization purposes.

> stacktool: drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.o: 
> cfs_cdebug_show.part.5.constprop.35()+0x0: frame pointer state mismatch
> stacktool: drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.o: 
> cfs_cdebug_show.part.5.constprop.35()+0x8: duplicate frame pointer save
> stacktool: drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.o: 
> cfs_cdebug_show.part.5.constprop.35()+0x9: duplicate frame pointer setup
> stacktool: drivers/staging/lustre/lnet/klnds/socklnd/socklnd.o: 
> ksocknal_connsock_decref()+0x0: duplicate frame pointer save
> stacktool: drivers/staging/lustre/lnet/klnds/socklnd/socklnd.o: 
> ksocknal_connsock_decref()+0x0: frame pointer state mismatch
> stacktool: drivers/staging/lustre/lnet/klnds/socklnd/socklnd.o: 
> ksocknal_connsock_decref()+0x1: duplicate frame pointer setup
> stacktool: drivers/staging/lustre/lnet/klnds/socklnd/socklnd.o: .text: 
> unexpected end of section
> stacktool: drivers/staging/lustre/lnet/lnet/lib-move.o: 
> cfs_cdebug_show.part.1.constprop.16()+0x0: frame pointer state mismatch
> stacktool: drivers/staging/lustre/lnet/lnet/lib-move.o: 
> cfs_cdebug_show.part.1.constprop.16()+0x8: duplicate frame pointer save
> stacktool: drivers/staging/lustre/lnet/lnet/lib-move.o: 
> cfs_cdebug_show.part.1.constprop.16()+0x9: duplicate frame pointer setup
> stacktool: drivers/staging/lustre/lnet/lnet/lib-move.o: .text: unexpected end 
> of section
> stacktool: drivers/staging/lustre/lnet/lnet/lo.o: .text: unexpected end of 
> section
> stacktool: drivers/staging/lustre/lnet/lnet/nidstrings.o: 
> cfs_print_nidlist()+0x220: frame pointer state mismatch
> stacktool: drivers/staging/lustre/lnet/lnet/peer.o: .text: unexpected end of 
> section
> stacktool: drivers/staging/lustre/lnet/lnet/router.o: 
> cfs_cdebug_show.part.0.constprop.16()+0x0: frame pointer state mismatch
> stacktool: drivers/staging/lustre/lnet/lnet/router.o: 
> cfs_cdebug_show.part.0.constprop.16()+0x8: duplicate frame pointer save
> stacktool: drivers/staging/lustre/lnet/lnet/router.o: 
> cfs_cdebug_show.part.0.constprop.16()+0x9: duplicate frame pointer setup
> stacktool: drivers/staging/lustre/lnet/lnet/router.o: 
> lnet_find_net_locked()+0x8a: frame pointer state mismatch
> stacktool: drivers/staging/lustre/lnet/lnet/router.o: 
> lnet_find_net_locked()+0x8a: return without frame pointer restore
> stacktool: drivers/staging/lustre/lustre/fid/fid_request.o: .text: unexpected 
> end of section
> stacktool: drivers/staging/lustre/lustre/fld/lproc_fld.o: .text: unexpected 
> end of section
> stacktool: drivers/staging/lustre/lustre/libcfs/libcfs_lock.o: .text: 
> unexpected end of section
> stacktool: drivers/staging/lustre/lustre/libcfs/libcfs_mem.o: .text: 
> unexpected end of section
> stacktool: drivers/staging/lustre/lustre/llite/dir.o: obd_unpackmd()+0x0: 
> duplicate frame pointer save
> stacktool: drivers/staging/lustre/lustre/llite/dir.o: obd_unpackmd()+0x0: 
> frame pointer state mismatch
> stacktool: drivers/staging/lustre/lustre/llite/dir.o: obd_unpackmd()+0x4: 
> duplicate frame pointer setup
> stacktool: drivers/staging/lustre/lustre/llite/file.o: 
> md_intent_lock.part.28()+0x0: duplicate frame pointer save
> stacktool: drivers/staging/lustre/lustre/llite/file.o: 
> md_intent_lock.part.28()+0x0: frame pointer state mismatch
> stacktool: drivers/staging/lustre/lustre/llite/file.o: 
> md_intent_lock.part.28()+0x24: duplicate frame pointer setup
> stacktool: drivers/staging/lustre/lustre/llite/../lclient/glimpse.o: 
> cl_io_get()+0x0: frame pointer state mismatch
> stacktool: drivers/staging/lustre/lustre/llite/../lclient/glimpse.o: 
> cl_io_get()+0x1a: duplicate frame pointer save
> stacktool: drivers/staging/lustre/lus

Re: [PATCH 00/33] Compile-time stack metadata validation

2016-02-12 Thread Josh Poimboeuf
On Fri, Feb 12, 2016 at 06:10:37PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 12, 2016 at 08:45:43AM -0600, Josh Poimboeuf wrote:
> > On Fri, Feb 12, 2016 at 11:36:24AM +0100, Jiri Slaby wrote:
> > 
> > This seems like a real frame pointer bug caused by the following line in
> > arch/x86/include/asm/preempt.h:
> > 
> >   # define __preempt_schedule() asm ("call ___preempt_schedule")
> 
> The purpose there is that:
> 
>   preempt_enable();
> 
> turns into:
> 
>   decl    __percpu_prefix:__preempt_count
>   jnz     1f:
>   call    ___preempt_schedule
> 1:
> 
> See arch/x86/include/asm/preempt.h:__preempt_count_dec_and_test()

Sorry, I'm kind of confused.  Do you mean that's what preempt_enable()
would turn into *without* the above define?

What I actually see in the listing is:

decl    __percpu_prefix:__preempt_count
je      1f:

 1:
call    ___preempt_schedule

So it puts the "call ___preempt_schedule" in the slow path.

I also don't see how that would be related to the use of the asm
statement in the __preempt_schedule() macro.  Doesn't the use of
unlikely() in preempt_enable() put the call in the slow path?

  #define preempt_enable() \
  do { \
          barrier(); \
          if (unlikely(preempt_count_dec_and_test())) \
                  preempt_schedule(); \
  } while (0)
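(For reference, the decl and the branch come from
__preempt_count_dec_and_test() itself, which if I remember correctly is
just:

  static __always_inline bool __preempt_count_dec_and_test(void)
  {
          GEN_UNARY_RMWcc("decl", __preempt_count, __percpu_arg(0), "e");
  }

so the fast-path test is generated by the rmwcc asm-goto macro,
independent of the __preempt_schedule() asm statement.)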

Also, why is the thunk needed?  Any reason why preempt_enable() can't be
called directly from C?

-- 
Josh


Re: [PATCH 00/33] Compile-time stack metadata validation

2016-02-12 Thread Josh Poimboeuf
On Fri, Feb 12, 2016 at 12:32:06PM -0600, Josh Poimboeuf wrote:
> On Fri, Feb 12, 2016 at 06:10:37PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 12, 2016 at 08:45:43AM -0600, Josh Poimboeuf wrote:
> > > On Fri, Feb 12, 2016 at 11:36:24AM +0100, Jiri Slaby wrote:
> > > 
> > > This seems like a real frame pointer bug caused by the following line in
> > > arch/x86/include/asm/preempt.h:
> > > 
> > >   # define __preempt_schedule() asm ("call ___preempt_schedule")
> > 
> > The purpose there is that:
> > 
> > preempt_enable();
> > 
> > turns into:
> > 
> > decl    __percpu_prefix:__preempt_count
> > jnz     1f:
> > call    ___preempt_schedule
> > 1:
> > 
> > See arch/x86/include/asm/preempt.h:__preempt_count_dec_and_test()
> 
> Sorry, I'm kind of confused.  Do you mean that's what preempt_enable()
> would turn into *without* the above define?
> 
> What I actually see in the listing is:
> 
>   decl    __percpu_prefix:__preempt_count
>   je      1f:
>
>  1:
>   call    ___preempt_schedule
> 
> So it puts the "call ___preempt_schedule" in the slow path.
> 
> I also don't see how that would be related to the use of the asm
> statement in the __preempt_schedule() macro.  Doesn't the use of
> unlikely() in preempt_enable() put the call in the slow path?
> 
>   #define preempt_enable() \
>   do { \
>           barrier(); \
>           if (unlikely(preempt_count_dec_and_test())) \
>                   preempt_schedule(); \
>   } while (0)
> 
> Also, why is the thunk needed?  Any reason why preempt_enable() can't be
> called directly from C?

Sorry, s/preempt_enable/preempt_schedule/ on that last sentence.

-- 
Josh


Re: [PATCH 00/33] Compile-time stack metadata validation

2016-01-22 Thread Josh Poimboeuf
On Fri, Jan 22, 2016 at 02:40:35PM -0600, Chris J Arges wrote:
> On Fri, Jan 22, 2016 at 01:14:47PM -0600, Josh Poimboeuf wrote:
> > On Fri, Jan 22, 2016 at 11:43:48AM -0600, Chris J Arges wrote:
> > > On Thu, Jan 21, 2016 at 04:49:04PM -0600, Josh Poimboeuf wrote:
> > > > This is v16 of the compile-time stack metadata validation patch set,
> > > > along with proposed fixes for most of the warnings it found.  It's based
> > > > on the tip/master branch.
> > > >
> > > Josh,
> > > 
> > > Looks good, with my config [1] I do still get a few warnings building
> > > linux/linux-next.
> > > 
> > > Here are the warnings:
> > > $ grep ^stacktool build.log | grep -v staging
> > 
> > Thanks for reporting these!
> > 
> > > stacktool: arch/x86/kvm/vmx.o: vmx_handle_external_intr()+0x67: call 
> > > without frame pointer save/setup
> > 
> > This can be fixed by setting the stack pointer as an output operand for
> > the inline asm call in vmx_handle_external_intr().
> > 
> > Feel free to submit a patch, or I'll get around to it eventually.
> > 
> > > stacktool: fs/reiserfs/namei.o: set_de_name_and_namelen()+0x9e: return 
> > > without frame pointer restore
> > > stacktool: fs/reiserfs/namei.o: set_de_name_and_namelen()+0x89: duplicate 
> > > frame pointer save
> > > stacktool: fs/reiserfs/namei.o: set_de_name_and_namelen()+0x8a: duplicate 
> > > frame pointer setup
> > > stacktool: fs/reiserfs/namei.o: set_de_name_and_namelen()+0x9e: frame 
> > > pointer state mismatch
> > > stacktool: fs/reiserfs/namei.o: set_de_name_and_namelen()+0x0: frame 
> > > pointer state mismatch
> > 
> > These are false positives.  Stacktool is confused by the use of a
> > "noreturn" function which it doesn't know about (__reiserfs_panic).
> > 
> > Unfortunately the only solution I currently have for dealing with global
> > noreturn functions is to just hard-code a list of them.  So the short
> > term fix would be to add "__reiserfs_panic" to the global_noreturns list
> > in tools/stacktool/builtin-check.c.
> > 
> > I'm still trying to figure out a better way to deal with this type of
> > issue, as it's a pain to have to keep a hard-coded list of noreturn
> > functions.  Unfortunately that info isn't available in the ELF.
> > 
> 
> Josh,
> Ok I'll hack on the patches above.
> 
> > > stacktool: fs/reiserfs/ibalance.o: .text: unexpected end of section
> > > stacktool: fs/reiserfs/tail_conversion.o: .text: unexpected end of section
> > 
> > For some reason I'm not able to recreate these warnings...  Can you
> > share one of the .o files?
> 
> Binaries are here:
> http://people.canonical.com/~arges/stacktool/

Thanks, looks like the same __reiserfs_panic() noreturn fix for those.

-- 
Josh


Re: [PATCH 00/33] Compile-time stack metadata validation

2016-01-22 Thread Josh Poimboeuf
On Fri, Jan 22, 2016 at 11:43:48AM -0600, Chris J Arges wrote:
> On Thu, Jan 21, 2016 at 04:49:04PM -0600, Josh Poimboeuf wrote:
> > This is v16 of the compile-time stack metadata validation patch set,
> > along with proposed fixes for most of the warnings it found.  It's based
> > on the tip/master branch.
> >
> Josh,
> 
> Looks good, with my config [1] I do still get a few warnings building
> linux/linux-next.
> 
> Here are the warnings:
> $ grep ^stacktool build.log | grep -v staging

Thanks for reporting these!

> stacktool: arch/x86/kvm/vmx.o: vmx_handle_external_intr()+0x67: call without 
> frame pointer save/setup

This can be fixed by setting the stack pointer as an output operand for
the inline asm call in vmx_handle_external_intr().

Feel free to submit a patch, or I'll get around to it eventually.

> stacktool: fs/reiserfs/namei.o: set_de_name_and_namelen()+0x9e: return 
> without frame pointer restore
> stacktool: fs/reiserfs/namei.o: set_de_name_and_namelen()+0x89: duplicate 
> frame pointer save
> stacktool: fs/reiserfs/namei.o: set_de_name_and_namelen()+0x8a: duplicate 
> frame pointer setup
> stacktool: fs/reiserfs/namei.o: set_de_name_and_namelen()+0x9e: frame pointer 
> state mismatch
> stacktool: fs/reiserfs/namei.o: set_de_name_and_namelen()+0x0: frame pointer 
> state mismatch

These are false positives.  Stacktool is confused by the use of a
"noreturn" function which it doesn't know about (__reiserfs_panic).

Unfortunately the only solution I currently have for dealing with global
noreturn functions is to just hard-code a list of them.  So the short
term fix would be to add "__reiserfs_panic" to the global_noreturns list
in tools/stacktool/builtin-check.c.
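In other words it's just one more string in that table, something like
(sketch -- the neighboring entries here are only examples):

  /* tools/stacktool/builtin-check.c */
  static const char * const global_noreturns[] = {
          "panic",
          "do_exit",
          "__stack_chk_fail",
          "__reiserfs_panic",     /* proposed addition */
  };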

I'm still trying to figure out a better way to deal with this type of
issue, as it's a pain to have to keep a hard-coded list of noreturn
functions.  Unfortunately that info isn't available in the ELF.

> stacktool: fs/reiserfs/ibalance.o: .text: unexpected end of section
> stacktool: fs/reiserfs/tail_conversion.o: .text: unexpected end of section

For some reason I'm not able to recreate these warnings...  Can you
share one of the .o files?

-- 
Josh


Re: [PATCH 23/33] x86/asm/bpf: Create stack frames in bpf_jit.S

2016-01-22 Thread Josh Poimboeuf
On Thu, Jan 21, 2016 at 08:18:46PM -0800, Alexei Starovoitov wrote:
> On Thu, Jan 21, 2016 at 09:55:31PM -0600, Josh Poimboeuf wrote:
> > On Thu, Jan 21, 2016 at 06:44:28PM -0800, Alexei Starovoitov wrote:
> > > On Thu, Jan 21, 2016 at 04:49:27PM -0600, Josh Poimboeuf wrote:
> > > > bpf_jit.S has several callable non-leaf functions which don't honor
> > > > CONFIG_FRAME_POINTER, which can result in bad stack traces.
> > > > 
> > > > Create a stack frame before the call instructions when
> > > > CONFIG_FRAME_POINTER is enabled.
> > > > 
> > > > Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com>
> > > > Cc: Alexei Starovoitov <a...@kernel.org>
> > > > Cc: netdev@vger.kernel.org
> > > > ---
> > > >  arch/x86/net/bpf_jit.S | 9 +++--
> > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
> > > > index eb4a3bd..f2a7faf 100644
> > > > --- a/arch/x86/net/bpf_jit.S
> > > > +++ b/arch/x86/net/bpf_jit.S
> > > > @@ -8,6 +8,7 @@
> > > >   * of the License.
> > > >   */
> > > >  #include 
> > > > +#include 
> > > >  
> > > >  /*
> > > >   * Calling convention :
> > > > @@ -65,16 +66,18 @@ FUNC(sk_load_byte_positive_offset)
> > > >  
> > > >  /* rsi contains offset and can be scratched */
> > > >  #define bpf_slow_path_common(LEN)  \
> > > > +   lea -MAX_BPF_STACK + 32(%rbp), %rdx;\
> > > > +   FRAME_BEGIN;\
> > > > mov %rbx, %rdi; /* arg1 == skb */   \
> > > > push%r9;\
> > > > pushSKBDATA;\
> > > >  /* rsi already has offset */   \
> > > > mov $LEN,%ecx;  /* len */   \
> > > > -   lea - MAX_BPF_STACK + 32(%rbp),%rdx;\
> > > > callskb_copy_bits;  \
> > > > test%eax,%eax;  \
> > > > pop SKBDATA;\
> > > > -   pop %r9;
> > > > +   pop %r9;\
> > > > +   FRAME_END
> > > 
> > > I'm not sure what above is doing.
> > > There is already 'push rbp; mov rbp,rsp' at the beginning of generated
> > > code and with above the stack trace will show two function at the same ip?
> > > since there were no calls between them?
> > > I think the stack walker will get even more confused?
> > > Also the JIT of bpf_call insn will emit variable number of push/pop
> > > around the call and I definitely don't want to add extra push rbp
> > > there, since it's the critical path and callee will do its own
> > > push rbp.
> > > Also there are push/pops emitted around div/mod
> > > and there is indirect goto emitted as well for bpf_tail_call
> > > that jumps into different function body without touching
> > > current stack.
> > 
> > Hm, I'm not sure I follow.  Let me try to explain my understanding.
> > 
> > As you mentioned, the generated code sets up the frame pointer.  From
> > emit_prologue():
> > 
> > EMIT1(0x55); /* push rbp */
> > EMIT3(0x48, 0x89, 0xE5); /* mov rbp,rsp */
> > 
> > And then later, do_jit() can generate a call into the functions in
> > bpf_jit.S.  For example:
> > 
> > func = CHOOSE_LOAD_FUNC(imm32, sk_load_word);
> > ...
> > EMIT1_off32(0xE8, jmp_offset); /* call */
> > 
> > So the functions in bpf_jit.S are being called by the generated code.
> > They're not part of the generated code itself.  So they're callees and
> > need to create their own stack frame before they call out to somewhere
> > else.
> > 
> > Or did I miss something?
> 
> yes. all correct.
> This particular patch is ok, since it adds to
> bpf_slow_path_common and as the name says it's slow and rare,
> but wanted to make sure the rest of it is understood.
> 
> > > Also none of the JITed function are dwarf annotated.
> > 
> > But what does that have to do with frame pointers?
> 
> nothing, but then why do you need
> .type name, @function
> annotations in another patch?

Those are ELF function annotations which are needed so that stacktool
can find the functions and analyze them.

Re: [PATCH 23/33] x86/asm/bpf: Create stack frames in bpf_jit.S

2016-01-22 Thread Josh Poimboeuf
On Fri, Jan 22, 2016 at 09:18:23AM -0800, Alexei Starovoitov wrote:
> On Fri, Jan 22, 2016 at 09:58:04AM -0600, Josh Poimboeuf wrote:
> > On Thu, Jan 21, 2016 at 08:18:46PM -0800, Alexei Starovoitov wrote:
> > > On Thu, Jan 21, 2016 at 09:55:31PM -0600, Josh Poimboeuf wrote:
> > > > On Thu, Jan 21, 2016 at 06:44:28PM -0800, Alexei Starovoitov wrote:
> > > > > On Thu, Jan 21, 2016 at 04:49:27PM -0600, Josh Poimboeuf wrote:
> > > > > > bpf_jit.S has several callable non-leaf functions which don't honor
> > > > > > CONFIG_FRAME_POINTER, which can result in bad stack traces.
> > > > > > 
> > > > > > Create a stack frame before the call instructions when
> > > > > > CONFIG_FRAME_POINTER is enabled.
> > > > > > 
> > > > > > Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com>
> > > > > > Cc: Alexei Starovoitov <a...@kernel.org>
> > > > > > Cc: netdev@vger.kernel.org
> > > > > > ---
> > > > > >  arch/x86/net/bpf_jit.S | 9 +++--
> ...
> > > > > >  /* rsi contains offset and can be scratched */
> > > > > >  #define bpf_slow_path_common(LEN)  \
> > > > > > +   lea -MAX_BPF_STACK + 32(%rbp), %rdx;\
> > > > > > +   FRAME_BEGIN;\
> > > > > > mov %rbx, %rdi; /* arg1 == skb */   \
> > > > > > push%r9;\
> > > > > > pushSKBDATA;\
> > > > > >  /* rsi already has offset */   \
> > > > > > mov $LEN,%ecx;  /* len */   \
> > > > > > -   lea - MAX_BPF_STACK + 32(%rbp),%rdx;\
> > > > > > callskb_copy_bits;  \
> > > > > > test%eax,%eax;  \
> > > > > > pop SKBDATA;\
> > > > > > -   pop %r9;
> > > > > > +   pop %r9;\
> > > > > > +   FRAME_END
> ...
> > > > Well, but the point of these patches isn't to make the tool happy.  It's
> > > > really to make sure that runtime stack traces can be made reliable.
> > > > Maybe I'm missing something but I don't see why JIT code can't honor
> > > > CONFIG_FRAME_POINTER just like any other code.
> > > 
> > > It can if there is no performance cost added.
> > 
> > CONFIG_FRAME_POINTER always adds a small performance cost but as you
> > mentioned it only affects the slow path here.  And hopefully we'll soon
> > have an in-kernel DWARF unwinder on x86 so we can get rid of the need
> > for frame pointers.
> 
> ok. fair enough.
> Acked-by: Alexei Starovoitov <a...@kernel.org>

Thanks!

Can I assume your ack also applies to the previous patch which adds the
ELF annotations ("x86/asm/bpf: Annotate callable functions")?

-- 
Josh


[PATCH 31/33] bpf: Add __bpf_prog_run() to stacktool whitelist

2016-01-21 Thread Josh Poimboeuf
stacktool reports the following false positive warnings:

  stacktool: kernel/bpf/core.o: __bpf_prog_run()+0x5c: sibling call from 
callable instruction with changed frame pointer
  stacktool: kernel/bpf/core.o: __bpf_prog_run()+0x60: function has unreachable 
instruction
  stacktool: kernel/bpf/core.o: __bpf_prog_run()+0x64: function has unreachable 
instruction
  [...]

It's confused by the following dynamic jump instruction in
__bpf_prog_run():

  jmp *(%r12,%rax,8)

which corresponds to the following line in the C code:

  goto *jumptable[insn->code];

There's no way for stacktool to deterministically find all possible
branch targets for a dynamic jump, so it can't verify this code.

In this case the jumps all stay within the function, and there's nothing
unusual going on related to the stack, so we can whitelist the function.

Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: netdev@vger.kernel.org
---
 kernel/bpf/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 972d9a8..7108a96 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -649,6 +650,7 @@ load_byte:
WARN_RATELIMIT(1, "unknown opcode %02x\n", insn->code);
return 0;
 }
+STACKTOOL_IGNORE_FUNC(__bpf_prog_run);
 
 bool bpf_prog_array_compatible(struct bpf_array *array,
   const struct bpf_prog *fp)
-- 
2.4.3



[PATCH 22/33] x86/asm/bpf: Annotate callable functions

2016-01-21 Thread Josh Poimboeuf
bpf_jit.S has several functions which can be called from C code.  Give
them proper ELF annotations.

Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: netdev@vger.kernel.org
---
 arch/x86/net/bpf_jit.S | 39 ---
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
index 4093216..eb4a3bd 100644
--- a/arch/x86/net/bpf_jit.S
+++ b/arch/x86/net/bpf_jit.S
@@ -22,15 +22,16 @@
32 /* space for rbx,r13,r14,r15 */ + \
8 /* space for skb_copy_bits */)
 
-sk_load_word:
-   .globl  sk_load_word
+#define FUNC(name) \
+   .globl name; \
+   .type name, @function; \
+   name:
 
+FUNC(sk_load_word)
test%esi,%esi
js  bpf_slow_path_word_neg
 
-sk_load_word_positive_offset:
-   .globl  sk_load_word_positive_offset
-
+FUNC(sk_load_word_positive_offset)
mov %r9d,%eax   # hlen
sub %esi,%eax   # hlen - offset
cmp $3,%eax
@@ -39,15 +40,11 @@ sk_load_word_positive_offset:
bswap   %eax/* ntohl() */
ret
 
-sk_load_half:
-   .globl  sk_load_half
-
+FUNC(sk_load_half)
test%esi,%esi
js  bpf_slow_path_half_neg
 
-sk_load_half_positive_offset:
-   .globl  sk_load_half_positive_offset
-
+FUNC(sk_load_half_positive_offset)
mov %r9d,%eax
sub %esi,%eax   #   hlen - offset
cmp $1,%eax
@@ -56,15 +53,11 @@ sk_load_half_positive_offset:
rol $8,%ax  # ntohs()
ret
 
-sk_load_byte:
-   .globl  sk_load_byte
-
+FUNC(sk_load_byte)
test%esi,%esi
js  bpf_slow_path_byte_neg
 
-sk_load_byte_positive_offset:
-   .globl  sk_load_byte_positive_offset
-
+FUNC(sk_load_byte_positive_offset)
cmp %esi,%r9d   /* if (offset >= hlen) goto bpf_slow_path_byte */
jle bpf_slow_path_byte
movzbl  (SKBDATA,%rsi),%eax
@@ -120,8 +113,8 @@ bpf_slow_path_byte:
 bpf_slow_path_word_neg:
cmp SKF_MAX_NEG_OFF, %esi   /* test range */
jl  bpf_error   /* offset lower -> error  */
-sk_load_word_negative_offset:
-   .globl  sk_load_word_negative_offset
+
+FUNC(sk_load_word_negative_offset)
sk_negative_common(4)
mov (%rax), %eax
bswap   %eax
@@ -130,8 +123,8 @@ sk_load_word_negative_offset:
 bpf_slow_path_half_neg:
cmp SKF_MAX_NEG_OFF, %esi
jl  bpf_error
-sk_load_half_negative_offset:
-   .globl  sk_load_half_negative_offset
+
+FUNC(sk_load_half_negative_offset)
sk_negative_common(2)
mov (%rax),%ax
rol $8,%ax
@@ -141,8 +134,8 @@ sk_load_half_negative_offset:
 bpf_slow_path_byte_neg:
cmp SKF_MAX_NEG_OFF, %esi
jl  bpf_error
-sk_load_byte_negative_offset:
-   .globl  sk_load_byte_negative_offset
+
+FUNC(sk_load_byte_negative_offset)
sk_negative_common(1)
movzbl  (%rax), %eax
ret
-- 
2.4.3



[PATCH 23/33] x86/asm/bpf: Create stack frames in bpf_jit.S

2016-01-21 Thread Josh Poimboeuf
bpf_jit.S has several callable non-leaf functions which don't honor
CONFIG_FRAME_POINTER, which can result in bad stack traces.

Create a stack frame before the call instructions when
CONFIG_FRAME_POINTER is enabled.

Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: netdev@vger.kernel.org
---
 arch/x86/net/bpf_jit.S | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
index eb4a3bd..f2a7faf 100644
--- a/arch/x86/net/bpf_jit.S
+++ b/arch/x86/net/bpf_jit.S
@@ -8,6 +8,7 @@
  * of the License.
  */
 #include 
+#include 
 
 /*
  * Calling convention :
@@ -65,16 +66,18 @@ FUNC(sk_load_byte_positive_offset)
 
 /* rsi contains offset and can be scratched */
 #define bpf_slow_path_common(LEN)  \
+   lea -MAX_BPF_STACK + 32(%rbp), %rdx;\
+   FRAME_BEGIN;\
mov %rbx, %rdi; /* arg1 == skb */   \
push%r9;\
pushSKBDATA;\
 /* rsi already has offset */   \
mov $LEN,%ecx;  /* len */   \
-   lea - MAX_BPF_STACK + 32(%rbp),%rdx;\
callskb_copy_bits;  \
test%eax,%eax;  \
pop SKBDATA;\
-   pop %r9;
+   pop %r9;\
+   FRAME_END
 
 
 bpf_slow_path_word:
@@ -99,6 +102,7 @@ bpf_slow_path_byte:
ret
 
 #define sk_negative_common(SIZE)   \
+   FRAME_BEGIN;\
mov %rbx, %rdi; /* arg1 == skb */   \
push%r9;\
pushSKBDATA;\
@@ -108,6 +112,7 @@ bpf_slow_path_byte:
test%rax,%rax;  \
pop SKBDATA;\
pop %r9;\
+   FRAME_END;  \
jz  bpf_error
 
 bpf_slow_path_word_neg:
-- 
2.4.3



[PATCH 00/33] Compile-time stack metadata validation

2016-01-21 Thread Josh Poimboeuf
daries
- make the macros simpler and more flexible
- support for analyzing ALTERNATIVE macros
- simplified the arch interfaces in scripts/asmvalidate/arch.h
- fixed some asmvalidate warnings
- rebased onto latest tip asm cleanups
- many more small changes

v4:
- Changed the default to CONFIG_STACK_VALIDATION=n, until all the asm
  code can get cleaned up.
- Fixed a stackvalidate error path exit code issue found by Michal
  Marek.

v3:
- Added a patch to make the push/pop CFI macros arch-independent, as
  suggested by H. Peter Anvin

v2:
- Fixed memory leaks reported by Petr Mladek

Cc: linux-ker...@vger.kernel.org
Cc: live-patch...@vger.kernel.org
Cc: Michal Marek <mma...@suse.cz>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Andi Kleen <a...@firstfloor.org>
Cc: Pedro Alves <pal...@redhat.com>
Cc: Namhyung Kim <namhy...@gmail.com>
Cc: Bernd Petrovitsch <be...@petrovitsch.priv.at>
Cc: Chris J Arges <chris.j.ar...@canonical.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Jiri Slaby <jsl...@suse.cz>
Cc: Arnaldo Carvalho de Melo <a...@kernel.org>

Chris J Arges (1):
  x86/uaccess: Add stack frame output operand in get_user inline asm

Josh Poimboeuf (32):
  x86/stacktool: Compile-time stack metadata validation
  kbuild/stacktool: Add CONFIG_STACK_VALIDATION option
  x86/stacktool: Enable stacktool on x86_64
  x86/stacktool: Add STACKTOOL_IGNORE_FUNC macro
  x86/xen: Add stack frame dependency to hypercall inline asm calls
  x86/asm/xen: Set ELF function type for xen_adjust_exception_frame()
  x86/asm/xen: Create stack frames in xen-asm.S
  x86/paravirt: Add stack frame dependency to PVOP inline asm calls
  x86/paravirt: Create a stack frame in PV_CALLEE_SAVE_REGS_THUNK
  x86/amd: Set ELF function type for vide()
  x86/asm/crypto: Move .Lbswap_mask data to .rodata section
  x86/asm/crypto: Move jump_table to .rodata section
  x86/asm/crypto: Simplify stack usage in sha-mb functions
  x86/asm/crypto: Don't use rbp as a scratch register
  x86/asm/crypto: Create stack frames in crypto functions
  x86/asm/entry: Create stack frames in thunk functions
  x86/asm/acpi: Create a stack frame in do_suspend_lowlevel()
  x86/asm: Create stack frames in rwsem functions
  x86/asm/efi: Create a stack frame in efi_call()
  x86/asm/power: Create stack frames in hibernate_asm_64.S
  x86/asm/bpf: Annotate callable functions
  x86/asm/bpf: Create stack frames in bpf_jit.S
  x86/kprobes: Get rid of kretprobe_trampoline_holder()
  x86/kvm: Set ELF function type for fastop functions
  x86/kvm: Add stack frame dependency to test_cc() inline asm
  watchdog/hpwdt: Create stack frame in asminline_call()
  x86/locking: Create stack frame in PV unlock
  x86/stacktool: Add directory and file whitelists
  x86/xen: Add xen_cpuid() to stacktool whitelist
  bpf: Add __bpf_prog_run() to stacktool whitelist
  sched: Add __schedule() to stacktool whitelist
  x86/kprobes: Add kretprobe_trampoline() to stacktool whitelist

 MAINTAINERS|   6 +
 Makefile   |   5 +-
 arch/Kconfig   |   6 +
 arch/x86/Kconfig   |   1 +
 arch/x86/boot/Makefile |   1 +
 arch/x86/boot/compressed/Makefile  |   3 +-
 arch/x86/crypto/aesni-intel_asm.S  |  75 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S|  15 +
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S   |  15 +
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S  |   9 +
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S  |  13 +
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S  |   8 +-
 arch/x86/crypto/ghash-clmulni-intel_asm.S  |   5 +
 arch/x86/crypto/serpent-avx-x86_64-asm_64.S|  13 +
 arch/x86/crypto/serpent-avx2-asm_64.S  |  13 +
 arch/x86/crypto/sha-mb/sha1_mb_mgr_flush_avx2.S|  35 +-
 arch/x86/crypto/sha-mb/sha1_mb_mgr_submit_avx2.S   |  36 +-
 arch/x86/crypto/twofish-avx-x86_64-asm_64.S|  13 +
 arch/x86/entry/Makefile|   4 +
 arch/x86/entry/thunk_64.S  |   4 +
 arch/x86/entry/vdso/Makefile   |   5 +-
 arch/x86/include/asm/paravirt.h|   9 +-
 arch/x86/include/asm/paravirt_types.h  |  18 +-
 arch/x86/include/asm/qspinlock_paravirt.h  |   4 +
 arch/x86/include/asm/uaccess.h |   5 +-
 arch/x86/include/asm/xen/hypercall.h   |   5 +-
 arch/x86/kernel/Makefile   |   5 +
 arch/x86/kernel/acpi/wakeup_64.S   |   3 +
 arch/x86/kernel/cpu/amd.c  |   5 +-
 arch/x86/kernel/kprobes/core.c |  59 +-
 arch/x86/kernel/vmlinux.lds.S

Re: [PATCH 31/33] bpf: Add __bpf_prog_run() to stacktool whitelist

2016-01-21 Thread Josh Poimboeuf
On Thu, Jan 21, 2016 at 06:55:41PM -0800, Alexei Starovoitov wrote:
> On Thu, Jan 21, 2016 at 04:49:35PM -0600, Josh Poimboeuf wrote:
> > stacktool reports the following false positive warnings:
> > 
> >   stacktool: kernel/bpf/core.o: __bpf_prog_run()+0x5c: sibling call from 
> > callable instruction with changed frame pointer
> >   stacktool: kernel/bpf/core.o: __bpf_prog_run()+0x60: function has 
> > unreachable instruction
> >   stacktool: kernel/bpf/core.o: __bpf_prog_run()+0x64: function has 
> > unreachable instruction
> >   [...]
> > 
> > It's confused by the following dynamic jump instruction in
> >   __bpf_prog_run():
> > 
> >   jmp *(%r12,%rax,8)
> > 
> > which corresponds to the following line in the C code:
> > 
> >   goto *jumptable[insn->code];
> > 
> > There's no way for stacktool to deterministically find all possible
> > branch targets for a dynamic jump, so it can't verify this code.
> > 
> > In this case the jumps all stay within the function, and there's nothing
> > unusual going on related to the stack, so we can whitelist the function.
> 
> well, few things are very unusual in this function.
> did you see what JMP_CALL does? it's a call into a different function,
> but not like typical indirect call. Will it be ok as well?
> 
> In general it's not possible for any tool to identify all possible
> branch targets. bpf programs can be loaded on the fly and
> jumping sequence will change.
> So if this marking says 'don't bother analyzing this function because
> it does sane stuff' that's probably not the case.
> If this marking says 'don't bother analyzing, the stack may be crazy
> from here on' then it's ok.

So the tool doesn't need to follow all possible call targets.  Instead
it just verifies that all functions follow the frame pointer convention.
That way it doesn't matter *which* function is being called because they
all do the right thing.

But it *does* need to follow all jump targets, so that it can analyze
all possible code paths within the function itself.  With a dynamic
jump, it can't do that.

So the JMP_CALL is fine, but the goto *jumptable[insn->code] isn't.
(And BTW that's the only occurrence of such a dynamic jump table in the
entire kernel.)
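To make the distinction concrete, here's a standalone sketch of the same
construct (computed goto, a gcc extension), which is exactly the thing
stacktool can't follow:

  static int run(const unsigned char *code)
  {
          static const void *jumptable[] = { &&op_inc, &&op_done };
          int acc = 0;

          goto *jumptable[*code];         /* compiles to an indirect jmp */
  op_inc:
          acc++;
          code++;
          goto *jumptable[*code];
  op_done:
          return acc;
  }

The possible jump targets only exist as data in the jumptable, so a
static checker can't enumerate the code paths inside the function.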

-- 
Josh


Re: [PATCH 23/33] x86/asm/bpf: Create stack frames in bpf_jit.S

2016-01-21 Thread Josh Poimboeuf
On Thu, Jan 21, 2016 at 06:44:28PM -0800, Alexei Starovoitov wrote:
> On Thu, Jan 21, 2016 at 04:49:27PM -0600, Josh Poimboeuf wrote:
> > bpf_jit.S has several callable non-leaf functions which don't honor
> > CONFIG_FRAME_POINTER, which can result in bad stack traces.
> > 
> > Create a stack frame before the call instructions when
> > CONFIG_FRAME_POINTER is enabled.
> > 
> > Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com>
> > Cc: Alexei Starovoitov <a...@kernel.org>
> > Cc: netdev@vger.kernel.org
> > ---
> >  arch/x86/net/bpf_jit.S | 9 +++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
> > index eb4a3bd..f2a7faf 100644
> > --- a/arch/x86/net/bpf_jit.S
> > +++ b/arch/x86/net/bpf_jit.S
> > @@ -8,6 +8,7 @@
> >   * of the License.
> >   */
> >  #include 
> > +#include 
> >  
> >  /*
> >   * Calling convention :
> > @@ -65,16 +66,18 @@ FUNC(sk_load_byte_positive_offset)
> >  
> >  /* rsi contains offset and can be scratched */
> >  #define bpf_slow_path_common(LEN)  \
> > +   lea -MAX_BPF_STACK + 32(%rbp), %rdx;\
> > +   FRAME_BEGIN;\
> > mov %rbx, %rdi; /* arg1 == skb */   \
> > push%r9;\
> > pushSKBDATA;\
> >  /* rsi already has offset */   \
> > mov $LEN,%ecx;  /* len */   \
> > -   lea - MAX_BPF_STACK + 32(%rbp),%rdx;\
> > callskb_copy_bits;  \
> > test%eax,%eax;  \
> > pop SKBDATA;\
> > -   pop %r9;
> > +   pop %r9;\
> > +   FRAME_END
> 
> I'm not sure what above is doing.
> There is already 'push rbp; mov rbp,rsp' at the beginning of generated
> code and with above the stack trace will show two function at the same ip?
> since there were no calls between them?
> I think the stack walker will get even more confused?
> Also the JIT of bpf_call insn will emit variable number of push/pop
> around the call and I definitely don't want to add extra push rbp
> there, since it's the critical path and callee will do its own
> push rbp.
> Also there are push/pops emitted around div/mod
> and there is indirect goto emitted as well for bpf_tail_call
> that jumps into different function body without touching
> current stack.

Hm, I'm not sure I follow.  Let me try to explain my understanding.

As you mentioned, the generated code sets up the frame pointer.  From
emit_prologue():

EMIT1(0x55); /* push rbp */
EMIT3(0x48, 0x89, 0xE5); /* mov rbp,rsp */

And then later, do_jit() can generate a call into the functions in
bpf_jit.S.  For example:

func = CHOOSE_LOAD_FUNC(imm32, sk_load_word);
...
EMIT1_off32(0xE8, jmp_offset); /* call */

So the functions in bpf_jit.S are being called by the generated code.
They're not part of the generated code itself.  So they're callees and
need to create their own stack frame before they call out to somewhere
else.

Or did I miss something?

> Also none of the JITed function are dwarf annotated.

But what does that have to do with frame pointers?

> I could be missing something. I think either this patch
> is not need or you need to teach the tool to ignore
> all JITed stuff. I don't think it's practical to annotate
> everything. Different JITs do their own magic.
> s390 JIT is even more fancy.

Well, but the point of these patches isn't to make the tool happy.  It's
really to make sure that runtime stack traces can be made reliable.
Maybe I'm missing something but I don't see why JIT code can't honor
CONFIG_FRAME_POINTER just like any other code.
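And when CONFIG_FRAME_POINTER is disabled, the FRAME_BEGIN/FRAME_END
macros compile away to nothing, so there's zero cost in that
configuration.  Roughly (sketch of the new asm/frame.h -- the real header
uses the _ASM_* register macros so it also works on 32-bit):

  #ifdef CONFIG_FRAME_POINTER
  # define FRAME_BEGIN    push %rbp; mov %rsp, %rbp
  # define FRAME_END      pop %rbp
  #else
  # define FRAME_BEGIN
  # define FRAME_END
  #endif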

-- 
Josh


Re: [PATCH 1/4] list: introduce list_is_first()

2015-12-10 Thread Josh Poimboeuf
On Thu, Dec 10, 2015 at 08:10:34AM -0700, Jens Axboe wrote:
> On 12/10/2015 07:17 AM, Geliang Tang wrote:
> >We already have list_is_last(), it makes sense to also add
> >list_is_first() for consistency. This list utility function
> >to check for first element in a list.
> 
> Honestly, I think we already have way too many of these kind of helpers.
> IMHO they don't really help, they hurt readability. You should know how the
> list works anyway, and if you do, then it's a no-brainer what's first and
> last. If you don't, then you are bound to screw up in other ways.
> 
> Just my 2 cents.

Personally I would disagree.  Something like:

  if (list_is_first(>queuelist, >queue))

is much more readable to me than:

  if (rq->queuelist.prev == >queue)

The first one takes no effort for me -- it's almost English.  The
second one takes me a few seconds (and some precious brain cycles) to
decipher.

Maybe whether it's readable depends on how many years you've been
looking at the pattern.  But IMHO we shouldn't make "having x # of years
staring at kernel code" a prerequisite for being able to read kernel
code.
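For completeness, the helper itself would presumably just mirror
list_is_last() (sketch):

  /* include/linux/list.h */
  static inline int list_is_first(const struct list_head *list,
                                  const struct list_head *head)
  {
          return list->prev == head;
  }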

-- 
Josh
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html