subject:"Re\: \[PATCH 1\/3\] x86\/entry\/64\: Refactor IRQ stacks and make then NMI\-safe"

On Wed, Aug 5, 2015 at 11:27 AM, Steven Rostedt  wrote:
> On Wed, 5 Aug 2015 11:24:54 -0700
> Andy Lutomirski  wrote:
>
>> On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar  wrote:
>> >
>> > * Andy Lutomirski  wrote:
>> >
>> >> --- a/arch/x86/kernel/process_64.c
>> >> +++ b/arch/x86/kernel/process_64.c
>> >> @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct 
>> >> task_struct *next_p)
>> >>   unsigned fsindex, gsindex;
>> >>   fpu_switch_t fpu_switch;
>> >>
>> >> +#ifdef CONFIG_DEBUG_ENTRY
>> >> + WARN_ON(this_cpu_read(irq_count));
>> >> +#endif
>> >
>> > Please introduce a less noisy (to the eyes) version of this, something 
>> > like:
>> >
>> > WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count));
>> >
>> > or so, similar to WARN_ON_FPU().
>>
>> I can do that (or "DEBUG_ENTRY_WARN_ON"?  we seem to be inconsistent
>> about ordering).
>>
>> Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better?
>>
>
> Does WARN_ON(IS_ENABLED(CONFIG_DEBUG_ENTRY) && this_cpu_read(irq_count))
> work?

I'd be okay with it.  Ingo?

(Except that that line of code is from v1, and v2 looks slightly
different here, but that's beside the point.)

--Andy

>
> -- Steve



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

2015-08-05 Thread Steven Rostedt

On Wed, 5 Aug 2015 11:24:54 -0700
Andy Lutomirski  wrote:

> On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar  wrote:
> >
> > * Andy Lutomirski  wrote:
> >
> >> --- a/arch/x86/kernel/process_64.c
> >> +++ b/arch/x86/kernel/process_64.c
> >> @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct 
> >> task_struct *next_p)
> >>   unsigned fsindex, gsindex;
> >>   fpu_switch_t fpu_switch;
> >>
> >> +#ifdef CONFIG_DEBUG_ENTRY
> >> + WARN_ON(this_cpu_read(irq_count));
> >> +#endif
> >
> > Please introduce a less noisy (to the eyes) version of this, something like:
> >
> > WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count));
> >
> > or so, similar to WARN_ON_FPU().
> 
> I can do that (or "DEBUG_ENTRY_WARN_ON"?  we seem to be inconsistent
> about ordering).
> 
> Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better?
> 

Does WARN_ON(IS_ENABLED(CONFIG_DEBUG_ENTRY) && this_cpu_read(irq_count))
work?

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar  wrote:
>
> * Andy Lutomirski  wrote:
>
>> --- a/arch/x86/kernel/process_64.c
>> +++ b/arch/x86/kernel/process_64.c
>> @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct 
>> task_struct *next_p)
>>   unsigned fsindex, gsindex;
>>   fpu_switch_t fpu_switch;
>>
>> +#ifdef CONFIG_DEBUG_ENTRY
>> + WARN_ON(this_cpu_read(irq_count));
>> +#endif
>
> Please introduce a less noisy (to the eyes) version of this, something like:
>
> WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count));
>
> or so, similar to WARN_ON_FPU().

I can do that (or "DEBUG_ENTRY_WARN_ON"?  we seem to be inconsistent
about ordering).

Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better?

--Andy

>
> Thanks,
>
> Ingo



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

2015-08-05 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct 
> task_struct *next_p)
>   unsigned fsindex, gsindex;
>   fpu_switch_t fpu_switch;
>  
> +#ifdef CONFIG_DEBUG_ENTRY
> + WARN_ON(this_cpu_read(irq_count));
> +#endif

Please introduce a less noisy (to the eyes) version of this, something like:

WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count));

or so, similar to WARN_ON_FPU().

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

2015-08-05 Thread Ingo Molnar


* Andy Lutomirski l...@kernel.org wrote:

 --- a/arch/x86/kernel/process_64.c
 +++ b/arch/x86/kernel/process_64.c
 @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct 
 task_struct *next_p)
   unsigned fsindex, gsindex;
   fpu_switch_t fpu_switch;
  
 +#ifdef CONFIG_DEBUG_ENTRY
 + WARN_ON(this_cpu_read(irq_count));
 +#endif

Please introduce a less noisy (to the eyes) version of this, something like:

WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count));

or so, similar to WARN_ON_FPU().

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar mi...@kernel.org wrote:

 * Andy Lutomirski l...@kernel.org wrote:

 --- a/arch/x86/kernel/process_64.c
 +++ b/arch/x86/kernel/process_64.c
 @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct 
 task_struct *next_p)
   unsigned fsindex, gsindex;
   fpu_switch_t fpu_switch;

 +#ifdef CONFIG_DEBUG_ENTRY
 + WARN_ON(this_cpu_read(irq_count));
 +#endif

 Please introduce a less noisy (to the eyes) version of this, something like:

 WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count));

 or so, similar to WARN_ON_FPU().

I can do that (or DEBUG_ENTRY_WARN_ON?  we seem to be inconsistent
about ordering).

Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better?

--Andy


 Thanks,

 Ingo



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

2015-08-05 Thread Steven Rostedt

On Wed, 5 Aug 2015 11:24:54 -0700
Andy Lutomirski l...@amacapital.net wrote:

 On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar mi...@kernel.org wrote:
 
  * Andy Lutomirski l...@kernel.org wrote:
 
  --- a/arch/x86/kernel/process_64.c
  +++ b/arch/x86/kernel/process_64.c
  @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct 
  task_struct *next_p)
unsigned fsindex, gsindex;
fpu_switch_t fpu_switch;
 
  +#ifdef CONFIG_DEBUG_ENTRY
  + WARN_ON(this_cpu_read(irq_count));
  +#endif
 
  Please introduce a less noisy (to the eyes) version of this, something like:
 
  WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count));
 
  or so, similar to WARN_ON_FPU().
 
 I can do that (or DEBUG_ENTRY_WARN_ON?  we seem to be inconsistent
 about ordering).
 
 Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better?
 

Does WARN_ON(IS_ENABLED(CONFIG_DEBUG_ENTRY)  this_cpu_read(irq_count))
work?

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Wed, Aug 5, 2015 at 11:27 AM, Steven Rostedt rost...@goodmis.org wrote:
 On Wed, 5 Aug 2015 11:24:54 -0700
 Andy Lutomirski l...@amacapital.net wrote:

 On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar mi...@kernel.org wrote:
 
  * Andy Lutomirski l...@kernel.org wrote:
 
  --- a/arch/x86/kernel/process_64.c
  +++ b/arch/x86/kernel/process_64.c
  @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct 
  task_struct *next_p)
unsigned fsindex, gsindex;
fpu_switch_t fpu_switch;
 
  +#ifdef CONFIG_DEBUG_ENTRY
  + WARN_ON(this_cpu_read(irq_count));
  +#endif
 
  Please introduce a less noisy (to the eyes) version of this, something 
  like:
 
  WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count));
 
  or so, similar to WARN_ON_FPU().

 I can do that (or DEBUG_ENTRY_WARN_ON?  we seem to be inconsistent
 about ordering).

 Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better?


 Does WARN_ON(IS_ENABLED(CONFIG_DEBUG_ENTRY)  this_cpu_read(irq_count))
 work?

I'd be okay with it.  Ingo?

(Except that that line of code is from v1, and v2 looks slightly
different here, but that's beside the point.)

--Andy


 -- Steve



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Sat, Jul 25, 2015 at 10:59 AM, Andy Lutomirski  wrote:
>
> What if we added something like:
>
> if (regs->ip == ret_after_sti && !user_mode(regs) && (regs->flags &
> X86_EFLAGS_IF)) {
>   regs->ip--;
>   regs->flags &= ~X86_EFLAGS_IF;
> }
>
> to do_nmi, do_machine_check, and do_debug (the latter because kernel
> breakpoints, damnit)?

Hmm. And how would you test it?

Putting an instruction breakpoint on the final 'ret' might do it, I
guess. "mov ss" disables even that (and is documented to disable nmi
too), but maybe it works with 'sti; ret'.

But yes, as long as we'd have some test coverage, that sounds doable.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

2015-07-25 Thread Andy Lutomirski

On Sat, Jul 25, 2015 at 10:56 AM, Linus Torvalds
 wrote:
> On Fri, Jul 24, 2015 at 9:59 PM, Andy Lutomirski  wrote:
>>
>> And people will give me five new heads if I ignore Linus and do RET
>> even with IF=1, saving 300 cycles?
>
> So I'm still nervous about that "sti; ret" when we're back on the
> original kernel stack that took the original fault or interrupt.  But
> it's probably ok.
>
> Yes, it's irq-safe. But it's not NMI-safe, so if an NMI happens there,
> when the NMI returns, an interrupt might occur there too. But since
> we're back on the original stack where the original fault happened,
> and since interrupts were enabled, I don't see why that would be
> horrible. In theory, we might have a growing stack if this keeps
> happening, but since the only way to get that is to get the NMI in
> that one-instruction window (and apparently on at least _some_
> microarchitectures the sti shadow stops even NMI's), I don't see how
> any kind of unbounded growth would happen.
>
> So.
>
> I think it would work, and it might even be good for "coverage" (ie
> the whole "iret-to-ret-conversion" will not have a lot of testing if
> it only happens for faults with interrupts disabled).
>
> But it still worries me a bit.
>

What if we added something like:

if (regs->ip == ret_after_sti && !user_mode(regs) && (regs->flags &
X86_EFLAGS_IF)) {
  regs->ip--;
  regs->flags &= ~X86_EFLAGS_IF;
}

to do_nmi, do_machine_check, and do_debug (the latter because kernel
breakpoints, damnit)?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Fri, Jul 24, 2015 at 9:59 PM, Andy Lutomirski  wrote:
>
> And people will give me five new heads if I ignore Linus and do RET
> even with IF=1, saving 300 cycles?

So I'm still nervous about that "sti; ret" when we're back on the
original kernel stack that took the original fault or interrupt.  But
it's probably ok.

Yes, it's irq-safe. But it's not NMI-safe, so if an NMI happens there,
when the NMI returns, an interrupt might occur there too. But since
we're back on the original stack where the original fault happened,
and since interrupts were enabled, I don't see why that would be
horrible. In theory, we might have a growing stack if this keeps
happening, but since the only way to get that is to get the NMI in
that one-instruction window (and apparently on at least _some_
microarchitectures the sti shadow stops even NMI's), I don't see how
any kind of unbounded growth would happen.

So.

I think it would work, and it might even be good for "coverage" (ie
the whole "iret-to-ret-conversion" will not have a lot of testing if
it only happens for faults with interrupts disabled).

But it still worries me a bit.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

2015-07-25 Thread Borislav Petkov

On Fri, Jul 24, 2015 at 09:59:16PM -0700, Andy Lutomirski wrote:
> And people will give me five new heads if I ignore Linus and do RET
> even with IF=1, saving 300 cycles?

As long as you don't make it too complex and corner-casy, I'll give you
hats for those heads.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

2015-07-25 Thread Borislav Petkov

On Fri, Jul 24, 2015 at 09:59:16PM -0700, Andy Lutomirski wrote:
 And people will give me five new heads if I ignore Linus and do RET
 even with IF=1, saving 300 cycles?

As long as you don't make it too complex and corner-casy, I'll give you
hats for those heads.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

2015-07-25 Thread Andy Lutomirski

On Sat, Jul 25, 2015 at 10:56 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
 On Fri, Jul 24, 2015 at 9:59 PM, Andy Lutomirski l...@amacapital.net wrote:

 And people will give me five new heads if I ignore Linus and do RET
 even with IF=1, saving 300 cycles?

 So I'm still nervous about that sti; ret when we're back on the
 original kernel stack that took the original fault or interrupt.  But
 it's probably ok.

 Yes, it's irq-safe. But it's not NMI-safe, so if an NMI happens there,
 when the NMI returns, an interrupt might occur there too. But since
 we're back on the original stack where the original fault happened,
 and since interrupts were enabled, I don't see why that would be
 horrible. In theory, we might have a growing stack if this keeps
 happening, but since the only way to get that is to get the NMI in
 that one-instruction window (and apparently on at least _some_
 microarchitectures the sti shadow stops even NMI's), I don't see how
 any kind of unbounded growth would happen.

 So.

 I think it would work, and it might even be good for coverage (ie
 the whole iret-to-ret-conversion will not have a lot of testing if
 it only happens for faults with interrupts disabled).

 But it still worries me a bit.


What if we added something like:

if (regs-ip == ret_after_sti  !user_mode(regs)  (regs-flags 
X86_EFLAGS_IF)) {
  regs-ip--;
  regs-flags = ~X86_EFLAGS_IF;
}

to do_nmi, do_machine_check, and do_debug (the latter because kernel
breakpoints, damnit)?

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Fri, Jul 24, 2015 at 9:59 PM, Andy Lutomirski l...@amacapital.net wrote:

 And people will give me five new heads if I ignore Linus and do RET
 even with IF=1, saving 300 cycles?

So I'm still nervous about that sti; ret when we're back on the
original kernel stack that took the original fault or interrupt.  But
it's probably ok.

Yes, it's irq-safe. But it's not NMI-safe, so if an NMI happens there,
when the NMI returns, an interrupt might occur there too. But since
we're back on the original stack where the original fault happened,
and since interrupts were enabled, I don't see why that would be
horrible. In theory, we might have a growing stack if this keeps
happening, but since the only way to get that is to get the NMI in
that one-instruction window (and apparently on at least _some_
microarchitectures the sti shadow stops even NMI's), I don't see how
any kind of unbounded growth would happen.

So.

I think it would work, and it might even be good for coverage (ie
the whole iret-to-ret-conversion will not have a lot of testing if
it only happens for faults with interrupts disabled).

But it still worries me a bit.

  Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Sat, Jul 25, 2015 at 10:59 AM, Andy Lutomirski l...@amacapital.net wrote:

 What if we added something like:

 if (regs-ip == ret_after_sti  !user_mode(regs)  (regs-flags 
 X86_EFLAGS_IF)) {
   regs-ip--;
   regs-flags = ~X86_EFLAGS_IF;
 }

 to do_nmi, do_machine_check, and do_debug (the latter because kernel
 breakpoints, damnit)?

Hmm. And how would you test it?

Putting an instruction breakpoint on the final 'ret' might do it, I
guess. mov ss disables even that (and is documented to disable nmi
too), but maybe it works with 'sti; ret'.

But yes, as long as we'd have some test coverage, that sounds doable.

Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Fri, Jul 24, 2015 at 9:32 PM, Borislav Petkov  wrote:
> On Fri, Jul 24, 2015 at 09:28:47PM -0700, Andy Lutomirski wrote:
>> Yeah, I'm going to submit v2 with the simple approach.  I admit I'm
>> rather fond of xadd as a way to switch rsp and set a flag at the same
>> time, though :)
>
> I know you are.
>
> But people will rip your head out if you added 60 cycles to the IRQ
> path.

And people will give me five new heads if I ignore Linus and do RET
even with IF=1, saving 300 cycles?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Fri, Jul 24, 2015 at 09:28:47PM -0700, Andy Lutomirski wrote:
> Yeah, I'm going to submit v2 with the simple approach.  I admit I'm
> rather fond of xadd as a way to switch rsp and set a flag at the same
> time, though :)

I know you are.

But people will rip your head out if you added 60 cycles to the IRQ
path.

:-)

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Fri, Jul 24, 2015 at 9:16 PM, Borislav Petkov  wrote:
> On Fri, Jul 24, 2015 at 11:02:51AM -0700, Andy Lutomirski wrote:
>> So really the only difference between this simple approach (which is
>> more or less what we do now) and my fancy approach is that a kernel
>> instruction breakpoint will cause do_debug to run on the initial stack
>> instead of the IRQ stack.
>
> Sounds ok to me. What would be the worst thing if we limited the #DB
> stack? Some breakpoints will get ignored? In an endless stream of
> breakpoints hammering? Doesn't sound like a valid use case to me, does
> it?
>
>> I'm still tempted to say we should use my overly paranoid atomic
>> approach for now and optimize later,...
>
> But why change it if the simple approach of incrementing irq_count first
> is still fine? I think we want to KISS here exactly because apparently
> complexity in that area is a serious PITA...

Yeah, I'm going to submit v2 with the simple approach.  I admit I'm
rather fond of xadd as a way to switch rsp and set a flag at the same
time, though :)

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Fri, Jul 24, 2015 at 11:02:51AM -0700, Andy Lutomirski wrote:
> So really the only difference between this simple approach (which is
> more or less what we do now) and my fancy approach is that a kernel
> instruction breakpoint will cause do_debug to run on the initial stack
> instead of the IRQ stack.

Sounds ok to me. What would be the worst thing if we limited the #DB
stack? Some breakpoints will get ignored? In an endless stream of
breakpoints hammering? Doesn't sound like a valid use case to me, does
it?

> I'm still tempted to say we should use my overly paranoid atomic
> approach for now and optimize later,...

But why change it if the simple approach of incrementing irq_count first
is still fine? I think we want to KISS here exactly because apparently
complexity in that area is a serious PITA...

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Fri, Jul 24, 2015 at 3:25 AM, Borislav Petkov  wrote:
> On Thu, Jul 23, 2015 at 11:08:39PM -0700, Andy Lutomirski wrote:
>> To be obviously safe against any local exception, we want a single
>> instruction that will change %rsp and some in-memory flag at the same
>> time.  There aren't a whole lot of candidates.  Cmpxchg isn't useful
>> (cmpxchg with a memory operand doesn't modify its register operand).
>
> Why would you even need that?
>
> You do LOCK; CMPXCHG on a per_cpu variable and then test ZF? I.e., use
> it as a mutex in asm. With ZF=1, you switch stacks, with ZF=0, you
> busy-wait ...
>
> Or am I missing something?

I'm not worried about other CPUs at all, so the LOCK isn't needed.
I'm worried about an interrupt coming while the in-memory state says
we're on the IRQ stack but RSP isn't pointing at the IRQ stack or vice
versa.

This isn't possible in current kernels because nothing ever switches
to the IRQ stack except IRQs, and those don't happen with IF = 0
(unless Xen does more awful things than I realize), but I want to use
IRQ stacks for int3, and that can happen inside NMI.

An alternative solution would be to never switch to the IRQ stack if
RSP points to the IRQ stack already or if we're already on an IST
stack, but that seems full of corner cases.

But wait.  Maybe a really simple approach is fine: first increment
irq_count and then switch RSP.  That leaves a window with irq_count
marking the IRQ stack as being in use but RSP not pointing to the IRQ
stack.

What can go wrong?  We can get an NMI, an MCE, a breakpoint, or a
vmalloc fault.  An NMI is fine.  It will switch to the NMI stack.  The
IRQ stack is marked as in use, but that doesn't matter -- the NMI
stack has plenty of space.  An MCE is fine.  It will switch to the IST
stack.  It might return using RET, which will push one single extra
word to the kernel stack, but that's not a problem for stack overruns
(unless there's a never-ending stream of MCEs, but we're already
terminally screwed if that happens).  A breakpoint will *not* switch
to an IST stack because we're going to get rid of the debug stack, and
it will fail to switch to the IRQ stack, so we need to limit the stack
depth of do_debug.  Maybe that's okay.  A breakpoint with NMI or MCE
inside is still fine.

So really the only difference between this simple approach (which is
more or less what we do now) and my fancy approach is that a kernel
instruction breakpoint will cause do_debug to run on the initial stack
instead of the IRQ stack.

I'm still tempted to say we should use my overly paranoid atomic
approach for now and optimize later, but I'm fine with spinning a v3,
too.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Thu, Jul 23, 2015 at 11:08:39PM -0700, Andy Lutomirski wrote:
> To be obviously safe against any local exception, we want a single
> instruction that will change %rsp and some in-memory flag at the same
> time.  There aren't a whole lot of candidates.  Cmpxchg isn't useful
> (cmpxchg with a memory operand doesn't modify its register operand).

Why would you even need that?

You do LOCK; CMPXCHG on a per_cpu variable and then test ZF? I.e., use
it as a mutex in asm. With ZF=1, you switch stacks, with ZF=0, you
busy-wait ...

Or am I missing something?

This way you serialize all irq stack switchers...

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Thu, Jul 23, 2015 at 3:37 PM, Andy Lutomirski  wrote:
> This will allow IRQ stacks to nest inside NMIs or similar entries
> that can happen during IRQ stack setup or teardown.
>
> The Xen code here has a confusing comment.
>
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/entry/entry_64.S| 72 
> ++--
>  arch/x86/kernel/cpu/common.c |  2 +-
>  arch/x86/kernel/process_64.c |  4 +++
>  3 files changed, 47 insertions(+), 31 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index d3033183ed70..5f7df8949fa7 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -491,6 +491,39 @@ ENTRY(irq_entries_start)
>  END(irq_entries_start)

This code is much more subtle than I thought when I wrote it:

>
>  /*
> + * Enters the IRQ stack if we're not already using it.  NMI-safe.  Clobbers
> + * flags and puts old RSP into old_rsp, and leaves all other GPRs alone.
> + * Requires kernel GSBASE.
> + *
> + * The invariant is that, if irq_count != 0, then we're either on the
> + * IRQ stack or an IST stack, even if an NMI interrupts IRQ stack entry
> + * or exit.
> + */
> +.macro ENTER_IRQ_STACK old_rsp
> +   movq%rsp, \old_rsp
> +   cmpl$0, PER_CPU_VAR(irq_count)
> +   jne 694f
> +   movqPER_CPU_VAR(irq_stack_ptr), %rsp
> +   /*
> +* Right now, we're on the irq stack with irq_count == 0.  A nested
> +* IRQ stack switch could clobber the stack.  That's fine: the stack
> +* is empty.
> +*/

A nested ENTER_IRQ_STACK/LEAVE_IRQ_STACK pair is fine here.  Anything
else that does PUSH (or non-IST interrupt delivery) right here is not
safe because something could interrupt *that* and do ENTER_IRQ_STACK,
thus clobbering whatever got pushed here.

In a world populated by sane people, the only things that can
interrupt here are a vmalloc fault (let's just kill that), NMI, or
MCE.  But we're insane and we're talking about removing breakpoints
from the IST stack and even returning from IST entries using RET,
either of which will write something to (%rsp) and expect it not to
get clobbered.

We can't interchange the incl and the movq, because then we aren't
safe against nested ENTER_IRQ_STACK.

To be obviously safe against any local exception, we want a single
instruction that will change %rsp and some in-memory flag at the same
time.  There aren't a whole lot of candidates.  Cmpxchg isn't useful
(cmpxchg with a memory operand doesn't modify its register operand).
xchg could plausibly be abused to work, but it's slow because it's
always atomic.  Enter isn't going to work without a window in which
rsp contains something bogus.

Xadd, on the other hand, just might work.

We need two percpu variables: irq_stack_ptr and irq_stack_flag.
irq_stack_ptr points to the IRQ stack and isn't modified.
irq_stack_flag == irq_stack_ptr if we're on the IRQ stack and
irq_stack_flag has any other value if we're not on the IRQ stack.
Then algebra happens.  Unfortunately, the best I came up with so far
uses xadd to enter the IRQ stack and xchg to leave.

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_ist=36825112b6082f2711605647366cd682a6be678a

I don't love it because it probably adds 60 cycles or so to IRQs.  On
the other hand, it was fun to write.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Thu, Jul 23, 2015 at 3:37 PM, Andy Lutomirski l...@kernel.org wrote:
This will allow IRQ stacks to nest inside NMIs or similar entries
that can happen during IRQ stack setup or teardown.

The Xen code here has a confusing comment.

Signed-off-by: Andy Lutomirski l...@kernel.org
---
arch/x86/entry/entry_64.S| 72
++--
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/process_64.c | 4 +++
3 files changed, 47 insertions(+), 31 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d3033183ed70..5f7df8949fa7 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -491,6 +491,39 @@ ENTRY(irq_entries_start)
END(irq_entries_start)

This code is much more subtle than I thought when I wrote it:

/*
+ * Enters the IRQ stack if we're not already using it. NMI-safe. Clobbers
+ * flags and puts old RSP into old_rsp, and leaves all other GPRs alone.
+ * Requires kernel GSBASE.
+ *
+ * The invariant is that, if irq_count != 0, then we're either on the
+ * IRQ stack or an IST stack, even if an NMI interrupts IRQ stack entry
+ * or exit.
+ */
+.macro ENTER_IRQ_STACK old_rsp
+ movq%rsp, \old_rsp
+ cmpl$0, PER_CPU_VAR(irq_count)
+ jne 694f
+ movqPER_CPU_VAR(irq_stack_ptr), %rsp
+ /*
+* Right now, we're on the irq stack with irq_count == 0. A nested
+* IRQ stack switch could clobber the stack. That's fine: the stack
+* is empty.
+*/

A nested ENTER_IRQ_STACK/LEAVE_IRQ_STACK pair is fine here. Anything
else that does PUSH (or non-IST interrupt delivery) right here is not
safe because something could interrupt *that* and do ENTER_IRQ_STACK,
thus clobbering whatever got pushed here.

In a world populated by sane people, the only things that can
interrupt here are a vmalloc fault (let's just kill that), NMI, or
MCE. But we're insane and we're talking about removing breakpoints
from the IST stack and even returning from IST entries using RET,
either of which will write something to (%rsp) and expect it not to
get clobbered.

We can't interchange the incl and the movq, because then we aren't
safe against nested ENTER_IRQ_STACK.

To be obviously safe against any local exception, we want a single
instruction that will change %rsp and some in-memory flag at the same
time. There aren't a whole lot of candidates. Cmpxchg isn't useful
(cmpxchg with a memory operand doesn't modify its register operand).
xchg could plausibly be abused to work, but it's slow because it's
always atomic. Enter isn't going to work without a window in which
rsp contains something bogus.

Xadd, on the other hand, just might work.

We need two percpu variables: irq_stack_ptr and irq_stack_flag.
irq_stack_ptr points to the IRQ stack and isn't modified.
irq_stack_flag == irq_stack_ptr if we're on the IRQ stack and
irq_stack_flag has any other value if we're not on the IRQ stack.
Then algebra happens. Unfortunately, the best I came up with so far
uses xadd to enter the IRQ stack and xchg to leave.

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_istid=36825112b6082f2711605647366cd682a6be678a

I don't love it because it probably adds 60 cycles or so to IRQs. On
the other hand, it was fun to write.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Thu, Jul 23, 2015 at 11:08:39PM -0700, Andy Lutomirski wrote:
 To be obviously safe against any local exception, we want a single
 instruction that will change %rsp and some in-memory flag at the same
 time.  There aren't a whole lot of candidates.  Cmpxchg isn't useful
 (cmpxchg with a memory operand doesn't modify its register operand).

Why would you even need that?

You do LOCK; CMPXCHG on a per_cpu variable and then test ZF? I.e., use
it as a mutex in asm. With ZF=1, you switch stacks, with ZF=0, you
busy-wait ...

Or am I missing something?

This way you serialize all irq stack switchers...

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Fri, Jul 24, 2015 at 3:25 AM, Borislav Petkov b...@alien8.de wrote:
 On Thu, Jul 23, 2015 at 11:08:39PM -0700, Andy Lutomirski wrote:
 To be obviously safe against any local exception, we want a single
 instruction that will change %rsp and some in-memory flag at the same
 time.  There aren't a whole lot of candidates.  Cmpxchg isn't useful
 (cmpxchg with a memory operand doesn't modify its register operand).

 Why would you even need that?

 You do LOCK; CMPXCHG on a per_cpu variable and then test ZF? I.e., use
 it as a mutex in asm. With ZF=1, you switch stacks, with ZF=0, you
 busy-wait ...

 Or am I missing something?

I'm not worried about other CPUs at all, so the LOCK isn't needed.
I'm worried about an interrupt coming while the in-memory state says
we're on the IRQ stack but RSP isn't pointing at the IRQ stack or vice
versa.

This isn't possible in current kernels because nothing ever switches
to the IRQ stack except IRQs, and those don't happen with IF = 0
(unless Xen does more awful things than I realize), but I want to use
IRQ stacks for int3, and that can happen inside NMI.

An alternative solution would be to never switch to the IRQ stack if
RSP points to the IRQ stack already or if we're already on an IST
stack, but that seems full of corner cases.

But wait.  Maybe a really simple approach is fine: first increment
irq_count and then switch RSP.  That leaves a window with irq_count
marking the IRQ stack as being in use but RSP not pointing to the IRQ
stack.

What can go wrong?  We can get an NMI, an MCE, a breakpoint, or a
vmalloc fault.  An NMI is fine.  It will switch to the NMI stack.  The
IRQ stack is marked as in use, but that doesn't matter -- the NMI
stack has plenty of space.  An MCE is fine.  It will switch to the IST
stack.  It might return using RET, which will push one single extra
word to the kernel stack, but that's not a problem for stack overruns
(unless there's a never-ending stream of MCEs, but we're already
terminally screwed if that happens).  A breakpoint will *not* switch
to an IST stack because we're going to get rid of the debug stack, and
it will fail to switch to the IRQ stack, so we need to limit the stack
depth of do_debug.  Maybe that's okay.  A breakpoint with NMI or MCE
inside is still fine.

So really the only difference between this simple approach (which is
more or less what we do now) and my fancy approach is that a kernel
instruction breakpoint will cause do_debug to run on the initial stack
instead of the IRQ stack.

I'm still tempted to say we should use my overly paranoid atomic
approach for now and optimize later, but I'm fine with spinning a v3,
too.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Fri, Jul 24, 2015 at 11:02:51AM -0700, Andy Lutomirski wrote:
 So really the only difference between this simple approach (which is
 more or less what we do now) and my fancy approach is that a kernel
 instruction breakpoint will cause do_debug to run on the initial stack
 instead of the IRQ stack.

Sounds ok to me. What would be the worst thing if we limited the #DB
stack? Some breakpoints will get ignored? In an endless stream of
breakpoints hammering? Doesn't sound like a valid use case to me, does
it?

 I'm still tempted to say we should use my overly paranoid atomic
 approach for now and optimize later,...

But why change it if the simple approach of incrementing irq_count first
is still fine? I think we want to KISS here exactly because apparently
complexity in that area is a serious PITA...

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Fri, Jul 24, 2015 at 9:16 PM, Borislav Petkov b...@alien8.de wrote:
 On Fri, Jul 24, 2015 at 11:02:51AM -0700, Andy Lutomirski wrote:
 So really the only difference between this simple approach (which is
 more or less what we do now) and my fancy approach is that a kernel
 instruction breakpoint will cause do_debug to run on the initial stack
 instead of the IRQ stack.

 Sounds ok to me. What would be the worst thing if we limited the #DB
 stack? Some breakpoints will get ignored? In an endless stream of
 breakpoints hammering? Doesn't sound like a valid use case to me, does
 it?

 I'm still tempted to say we should use my overly paranoid atomic
 approach for now and optimize later,...

 But why change it if the simple approach of incrementing irq_count first
 is still fine? I think we want to KISS here exactly because apparently
 complexity in that area is a serious PITA...

Yeah, I'm going to submit v2 with the simple approach.  I admit I'm
rather fond of xadd as a way to switch rsp and set a flag at the same
time, though :)

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe

On Fri, Jul 24, 2015 at 09:28:47PM -0700, Andy Lutomirski wrote:
 Yeah, I'm going to submit v2 with the simple approach.  I admit I'm
 rather fond of xadd as a way to switch rsp and set a flag at the same
 time, though :)

I know you are.

But people will rip your head out if you added 60 cycles to the IRQ
path.

:-)

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe