Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
* Andy Lutomirski wrote: > On Wed, Aug 5, 2015 at 11:27 AM, Steven Rostedt wrote: > > On Wed, 5 Aug 2015 11:24:54 -0700 > > Andy Lutomirski wrote: > > > >> On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar wrote: > >> > > >> > * Andy Lutomirski wrote: > >> > > >> >> --- a/arch/x86/kernel/process_64.c > >> >> +++ b/arch/x86/kernel/process_64.c > >> >> @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct > >> >> task_struct *next_p) > >> >> unsigned fsindex, gsindex; > >> >> fpu_switch_t fpu_switch; > >> >> > >> >> +#ifdef CONFIG_DEBUG_ENTRY > >> >> + WARN_ON(this_cpu_read(irq_count)); > >> >> +#endif > >> > > >> > Please introduce a less noisy (to the eyes) version of this, something > >> > like: > >> > > >> > WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count)); > >> > > >> > or so, similar to WARN_ON_FPU(). > >> > >> I can do that (or "DEBUG_ENTRY_WARN_ON"? we seem to be inconsistent > >> about ordering). > >> > >> Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better? > >> > > > > Does WARN_ON(IS_ENABLED(CONFIG_DEBUG_ENTRY) && this_cpu_read(irq_count)) > > work? > > I'd be okay with it. Ingo? Yeah, that one is more compact than the #ifdef variant. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
* Andy Lutomirski l...@amacapital.net wrote: On Wed, Aug 5, 2015 at 11:27 AM, Steven Rostedt rost...@goodmis.org wrote: On Wed, 5 Aug 2015 11:24:54 -0700 Andy Lutomirski l...@amacapital.net wrote: On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar mi...@kernel.org wrote: * Andy Lutomirski l...@kernel.org wrote: --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) unsigned fsindex, gsindex; fpu_switch_t fpu_switch; +#ifdef CONFIG_DEBUG_ENTRY + WARN_ON(this_cpu_read(irq_count)); +#endif Please introduce a less noisy (to the eyes) version of this, something like: WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count)); or so, similar to WARN_ON_FPU(). I can do that (or DEBUG_ENTRY_WARN_ON? we seem to be inconsistent about ordering). Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better? Does WARN_ON(IS_ENABLED(CONFIG_DEBUG_ENTRY) this_cpu_read(irq_count)) work? I'd be okay with it. Ingo? Yeah, that one is more compact than the #ifdef variant. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Wed, Aug 5, 2015 at 11:27 AM, Steven Rostedt wrote: > On Wed, 5 Aug 2015 11:24:54 -0700 > Andy Lutomirski wrote: > >> On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar wrote: >> > >> > * Andy Lutomirski wrote: >> > >> >> --- a/arch/x86/kernel/process_64.c >> >> +++ b/arch/x86/kernel/process_64.c >> >> @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct >> >> task_struct *next_p) >> >> unsigned fsindex, gsindex; >> >> fpu_switch_t fpu_switch; >> >> >> >> +#ifdef CONFIG_DEBUG_ENTRY >> >> + WARN_ON(this_cpu_read(irq_count)); >> >> +#endif >> > >> > Please introduce a less noisy (to the eyes) version of this, something >> > like: >> > >> > WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count)); >> > >> > or so, similar to WARN_ON_FPU(). >> >> I can do that (or "DEBUG_ENTRY_WARN_ON"? we seem to be inconsistent >> about ordering). >> >> Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better? >> > > Does WARN_ON(IS_ENABLED(CONFIG_DEBUG_ENTRY) && this_cpu_read(irq_count)) > work? I'd be okay with it. Ingo? (Except that that line of code is from v1, and v2 looks slightly different here, but that's beside the point.) --Andy > > -- Steve -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Wed, 5 Aug 2015 11:24:54 -0700 Andy Lutomirski wrote: > On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar wrote: > > > > * Andy Lutomirski wrote: > > > >> --- a/arch/x86/kernel/process_64.c > >> +++ b/arch/x86/kernel/process_64.c > >> @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct > >> task_struct *next_p) > >> unsigned fsindex, gsindex; > >> fpu_switch_t fpu_switch; > >> > >> +#ifdef CONFIG_DEBUG_ENTRY > >> + WARN_ON(this_cpu_read(irq_count)); > >> +#endif > > > > Please introduce a less noisy (to the eyes) version of this, something like: > > > > WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count)); > > > > or so, similar to WARN_ON_FPU(). > > I can do that (or "DEBUG_ENTRY_WARN_ON"? we seem to be inconsistent > about ordering). > > Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better? > Does WARN_ON(IS_ENABLED(CONFIG_DEBUG_ENTRY) && this_cpu_read(irq_count)) work? -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar wrote: > > * Andy Lutomirski wrote: > >> --- a/arch/x86/kernel/process_64.c >> +++ b/arch/x86/kernel/process_64.c >> @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct >> task_struct *next_p) >> unsigned fsindex, gsindex; >> fpu_switch_t fpu_switch; >> >> +#ifdef CONFIG_DEBUG_ENTRY >> + WARN_ON(this_cpu_read(irq_count)); >> +#endif > > Please introduce a less noisy (to the eyes) version of this, something like: > > WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count)); > > or so, similar to WARN_ON_FPU(). I can do that (or "DEBUG_ENTRY_WARN_ON"? we seem to be inconsistent about ordering). Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better? --Andy > > Thanks, > > Ingo -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
* Andy Lutomirski wrote: > --- a/arch/x86/kernel/process_64.c > +++ b/arch/x86/kernel/process_64.c > @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct > task_struct *next_p) > unsigned fsindex, gsindex; > fpu_switch_t fpu_switch; > > +#ifdef CONFIG_DEBUG_ENTRY > + WARN_ON(this_cpu_read(irq_count)); > +#endif Please introduce a less noisy (to the eyes) version of this, something like: WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count)); or so, similar to WARN_ON_FPU(). Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
* Andy Lutomirski l...@kernel.org wrote: --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) unsigned fsindex, gsindex; fpu_switch_t fpu_switch; +#ifdef CONFIG_DEBUG_ENTRY + WARN_ON(this_cpu_read(irq_count)); +#endif Please introduce a less noisy (to the eyes) version of this, something like: WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count)); or so, similar to WARN_ON_FPU(). Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar mi...@kernel.org wrote: * Andy Lutomirski l...@kernel.org wrote: --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) unsigned fsindex, gsindex; fpu_switch_t fpu_switch; +#ifdef CONFIG_DEBUG_ENTRY + WARN_ON(this_cpu_read(irq_count)); +#endif Please introduce a less noisy (to the eyes) version of this, something like: WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count)); or so, similar to WARN_ON_FPU(). I can do that (or DEBUG_ENTRY_WARN_ON? we seem to be inconsistent about ordering). Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better? --Andy Thanks, Ingo -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Wed, 5 Aug 2015 11:24:54 -0700 Andy Lutomirski l...@amacapital.net wrote: On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar mi...@kernel.org wrote: * Andy Lutomirski l...@kernel.org wrote: --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) unsigned fsindex, gsindex; fpu_switch_t fpu_switch; +#ifdef CONFIG_DEBUG_ENTRY + WARN_ON(this_cpu_read(irq_count)); +#endif Please introduce a less noisy (to the eyes) version of this, something like: WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count)); or so, similar to WARN_ON_FPU(). I can do that (or DEBUG_ENTRY_WARN_ON? we seem to be inconsistent about ordering). Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better? Does WARN_ON(IS_ENABLED(CONFIG_DEBUG_ENTRY) this_cpu_read(irq_count)) work? -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Wed, Aug 5, 2015 at 11:27 AM, Steven Rostedt rost...@goodmis.org wrote: On Wed, 5 Aug 2015 11:24:54 -0700 Andy Lutomirski l...@amacapital.net wrote: On Wed, Aug 5, 2015 at 1:59 AM, Ingo Molnar mi...@kernel.org wrote: * Andy Lutomirski l...@kernel.org wrote: --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -280,6 +280,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) unsigned fsindex, gsindex; fpu_switch_t fpu_switch; +#ifdef CONFIG_DEBUG_ENTRY + WARN_ON(this_cpu_read(irq_count)); +#endif Please introduce a less noisy (to the eyes) version of this, something like: WARN_ON_DEBUG_ENTRY(this_cpu_read(irq_count)); or so, similar to WARN_ON_FPU(). I can do that (or DEBUG_ENTRY_WARN_ON? we seem to be inconsistent about ordering). Or would if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) WARN_ON(...) be better? Does WARN_ON(IS_ENABLED(CONFIG_DEBUG_ENTRY) this_cpu_read(irq_count)) work? I'd be okay with it. Ingo? (Except that that line of code is from v1, and v2 looks slightly different here, but that's beside the point.) --Andy -- Steve -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Sat, Jul 25, 2015 at 10:59 AM, Andy Lutomirski wrote: > > What if we added something like: > > if (regs->ip == ret_after_sti && !user_mode(regs) && (regs->flags & > X86_EFLAGS_IF)) { > regs->ip--; > regs->flags &= ~X86_EFLAGS_IF; > } > > to do_nmi, do_machine_check, and do_debug (the latter because kernel > breakpoints, damnit)? Hmm. And how would you test it? Putting an instruction breakpoint on the final 'ret' might do it, I guess. "mov ss" disables even that (and is documented to disable nmi too), but maybe it works with 'sti; ret'. But yes, as long as we'd have some test coverage, that sounds doable. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Sat, Jul 25, 2015 at 10:56 AM, Linus Torvalds wrote: > On Fri, Jul 24, 2015 at 9:59 PM, Andy Lutomirski wrote: >> >> And people will give me five new heads if I ignore Linus and do RET >> even with IF=1, saving 300 cycles? > > So I'm still nervous about that "sti; ret" when we're back on the > original kernel stack that took the original fault or interrupt. But > it's probably ok. > > Yes, it's irq-safe. But it's not NMI-safe, so if an NMI happens there, > when the NMI returns, an interrupt might occur there too. But since > we're back on the original stack where the original fault happened, > and since interrupts were enabled, I don't see why that would be > horrible. In theory, we might have a growing stack if this keeps > happening, but since the only way to get that is to get the NMI in > that one-instruction window (and apparently on at least _some_ > microarchitectures the sti shadow stops even NMI's), I don't see how > any kind of unbounded growth would happen. > > So. > > I think it would work, and it might even be good for "coverage" (ie > the whole "iret-to-ret-conversion" will not have a lot of testing if > it only happens for faults with interrupts disabled). > > But it still worries me a bit. > What if we added something like: if (regs->ip == ret_after_sti && !user_mode(regs) && (regs->flags & X86_EFLAGS_IF)) { regs->ip--; regs->flags &= ~X86_EFLAGS_IF; } to do_nmi, do_machine_check, and do_debug (the latter because kernel breakpoints, damnit)? --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 9:59 PM, Andy Lutomirski wrote: > > And people will give me five new heads if I ignore Linus and do RET > even with IF=1, saving 300 cycles? So I'm still nervous about that "sti; ret" when we're back on the original kernel stack that took the original fault or interrupt. But it's probably ok. Yes, it's irq-safe. But it's not NMI-safe, so if an NMI happens there, when the NMI returns, an interrupt might occur there too. But since we're back on the original stack where the original fault happened, and since interrupts were enabled, I don't see why that would be horrible. In theory, we might have a growing stack if this keeps happening, but since the only way to get that is to get the NMI in that one-instruction window (and apparently on at least _some_ microarchitectures the sti shadow stops even NMI's), I don't see how any kind of unbounded growth would happen. So. I think it would work, and it might even be good for "coverage" (ie the whole "iret-to-ret-conversion" will not have a lot of testing if it only happens for faults with interrupts disabled). But it still worries me a bit. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 09:59:16PM -0700, Andy Lutomirski wrote: > And people will give me five new heads if I ignore Linus and do RET > even with IF=1, saving 300 cycles? As long as you don't make it too complex and corner-casy, I'll give you hats for those heads. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 09:59:16PM -0700, Andy Lutomirski wrote: And people will give me five new heads if I ignore Linus and do RET even with IF=1, saving 300 cycles? As long as you don't make it too complex and corner-casy, I'll give you hats for those heads. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Sat, Jul 25, 2015 at 10:56 AM, Linus Torvalds torva...@linux-foundation.org wrote: On Fri, Jul 24, 2015 at 9:59 PM, Andy Lutomirski l...@amacapital.net wrote: And people will give me five new heads if I ignore Linus and do RET even with IF=1, saving 300 cycles? So I'm still nervous about that sti; ret when we're back on the original kernel stack that took the original fault or interrupt. But it's probably ok. Yes, it's irq-safe. But it's not NMI-safe, so if an NMI happens there, when the NMI returns, an interrupt might occur there too. But since we're back on the original stack where the original fault happened, and since interrupts were enabled, I don't see why that would be horrible. In theory, we might have a growing stack if this keeps happening, but since the only way to get that is to get the NMI in that one-instruction window (and apparently on at least _some_ microarchitectures the sti shadow stops even NMI's), I don't see how any kind of unbounded growth would happen. So. I think it would work, and it might even be good for coverage (ie the whole iret-to-ret-conversion will not have a lot of testing if it only happens for faults with interrupts disabled). But it still worries me a bit. What if we added something like: if (regs-ip == ret_after_sti !user_mode(regs) (regs-flags X86_EFLAGS_IF)) { regs-ip--; regs-flags = ~X86_EFLAGS_IF; } to do_nmi, do_machine_check, and do_debug (the latter because kernel breakpoints, damnit)? --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 9:59 PM, Andy Lutomirski l...@amacapital.net wrote: And people will give me five new heads if I ignore Linus and do RET even with IF=1, saving 300 cycles? So I'm still nervous about that sti; ret when we're back on the original kernel stack that took the original fault or interrupt. But it's probably ok. Yes, it's irq-safe. But it's not NMI-safe, so if an NMI happens there, when the NMI returns, an interrupt might occur there too. But since we're back on the original stack where the original fault happened, and since interrupts were enabled, I don't see why that would be horrible. In theory, we might have a growing stack if this keeps happening, but since the only way to get that is to get the NMI in that one-instruction window (and apparently on at least _some_ microarchitectures the sti shadow stops even NMI's), I don't see how any kind of unbounded growth would happen. So. I think it would work, and it might even be good for coverage (ie the whole iret-to-ret-conversion will not have a lot of testing if it only happens for faults with interrupts disabled). But it still worries me a bit. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Sat, Jul 25, 2015 at 10:59 AM, Andy Lutomirski l...@amacapital.net wrote: What if we added something like: if (regs-ip == ret_after_sti !user_mode(regs) (regs-flags X86_EFLAGS_IF)) { regs-ip--; regs-flags = ~X86_EFLAGS_IF; } to do_nmi, do_machine_check, and do_debug (the latter because kernel breakpoints, damnit)? Hmm. And how would you test it? Putting an instruction breakpoint on the final 'ret' might do it, I guess. mov ss disables even that (and is documented to disable nmi too), but maybe it works with 'sti; ret'. But yes, as long as we'd have some test coverage, that sounds doable. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 9:32 PM, Borislav Petkov wrote: > On Fri, Jul 24, 2015 at 09:28:47PM -0700, Andy Lutomirski wrote: >> Yeah, I'm going to submit v2 with the simple approach. I admit I'm >> rather fond of xadd as a way to switch rsp and set a flag at the same >> time, though :) > > I know you are. > > But people will rip your head out if you added 60 cycles to the IRQ > path. And people will give me five new heads if I ignore Linus and do RET even with IF=1, saving 300 cycles? --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 09:28:47PM -0700, Andy Lutomirski wrote: > Yeah, I'm going to submit v2 with the simple approach. I admit I'm > rather fond of xadd as a way to switch rsp and set a flag at the same > time, though :) I know you are. But people will rip your head out if you added 60 cycles to the IRQ path. :-) -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 9:16 PM, Borislav Petkov wrote: > On Fri, Jul 24, 2015 at 11:02:51AM -0700, Andy Lutomirski wrote: >> So really the only difference between this simple approach (which is >> more or less what we do now) and my fancy approach is that a kernel >> instruction breakpoint will cause do_debug to run on the initial stack >> instead of the IRQ stack. > > Sounds ok to me. What would be the worst thing if we limited the #DB > stack? Some breakpoints will get ignored? In an endless stream of > breakpoints hammering? Doesn't sound like a valid use case to me, does > it? > >> I'm still tempted to say we should use my overly paranoid atomic >> approach for now and optimize later,... > > But why change it if the simple approach of incrementing irq_count first > is still fine? I think we want to KISS here exactly because apparently > complexity in that area is a serious PITA... Yeah, I'm going to submit v2 with the simple approach. I admit I'm rather fond of xadd as a way to switch rsp and set a flag at the same time, though :) --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 11:02:51AM -0700, Andy Lutomirski wrote: > So really the only difference between this simple approach (which is > more or less what we do now) and my fancy approach is that a kernel > instruction breakpoint will cause do_debug to run on the initial stack > instead of the IRQ stack. Sounds ok to me. What would be the worst thing if we limited the #DB stack? Some breakpoints will get ignored? In an endless stream of breakpoints hammering? Doesn't sound like a valid use case to me, does it? > I'm still tempted to say we should use my overly paranoid atomic > approach for now and optimize later,... But why change it if the simple approach of incrementing irq_count first is still fine? I think we want to KISS here exactly because apparently complexity in that area is a serious PITA... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 3:25 AM, Borislav Petkov wrote: > On Thu, Jul 23, 2015 at 11:08:39PM -0700, Andy Lutomirski wrote: >> To be obviously safe against any local exception, we want a single >> instruction that will change %rsp and some in-memory flag at the same >> time. There aren't a whole lot of candidates. Cmpxchg isn't useful >> (cmpxchg with a memory operand doesn't modify its register operand). > > Why would you even need that? > > You do LOCK; CMPXCHG on a per_cpu variable and then test ZF? I.e., use > it as a mutex in asm. With ZF=1, you switch stacks, with ZF=0, you > busy-wait ... > > Or am I missing something? I'm not worried about other CPUs at all, so the LOCK isn't needed. I'm worried about an interrupt coming while the in-memory state says we're on the IRQ stack but RSP isn't pointing at the IRQ stack or vice versa. This isn't possible in current kernels because nothing ever switches to the IRQ stack except IRQs, and those don't happen with IF = 0 (unless Xen does more awful things than I realize), but I want to use IRQ stacks for int3, and that can happen inside NMI. An alternative solution would be to never switch to the IRQ stack if RSP points to the IRQ stack already or if we're already on an IST stack, but that seems full of corner cases. But wait. Maybe a really simple approach is fine: first increment irq_count and then switch RSP. That leaves a window with irq_count marking the IRQ stack as being in use but RSP not pointing to the IRQ stack. What can go wrong? We can get an NMI, an MCE, a breakpoint, or a vmalloc fault. An NMI is fine. It will switch to the NMI stack. The IRQ stack is marked as in use, but that doesn't matter -- the NMI stack has plenty of space. An MCE is fine. It will switch to the IST stack. It might return using RET, which will push one single extra word to the kernel stack, but that's not a problem for stack overruns (unless there's a never-ending stream of MCEs, but we're already terminally screwed if that happens). A breakpoint will *not* switch to an IST stack because we're going to get rid of the debug stack, and it will fail to switch to the IRQ stack, so we need to limit the stack depth of do_debug. Maybe that's okay. A breakpoint with NMI or MCE inside is still fine. So really the only difference between this simple approach (which is more or less what we do now) and my fancy approach is that a kernel instruction breakpoint will cause do_debug to run on the initial stack instead of the IRQ stack. I'm still tempted to say we should use my overly paranoid atomic approach for now and optimize later, but I'm fine with spinning a v3, too. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Thu, Jul 23, 2015 at 11:08:39PM -0700, Andy Lutomirski wrote: > To be obviously safe against any local exception, we want a single > instruction that will change %rsp and some in-memory flag at the same > time. There aren't a whole lot of candidates. Cmpxchg isn't useful > (cmpxchg with a memory operand doesn't modify its register operand). Why would you even need that? You do LOCK; CMPXCHG on a per_cpu variable and then test ZF? I.e., use it as a mutex in asm. With ZF=1, you switch stacks, with ZF=0, you busy-wait ... Or am I missing something? This way you serialize all irq stack switchers... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Thu, Jul 23, 2015 at 3:37 PM, Andy Lutomirski wrote: > This will allow IRQ stacks to nest inside NMIs or similar entries > that can happen during IRQ stack setup or teardown. > > The Xen code here has a confusing comment. > > Signed-off-by: Andy Lutomirski > --- > arch/x86/entry/entry_64.S| 72 > ++-- > arch/x86/kernel/cpu/common.c | 2 +- > arch/x86/kernel/process_64.c | 4 +++ > 3 files changed, 47 insertions(+), 31 deletions(-) > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S > index d3033183ed70..5f7df8949fa7 100644 > --- a/arch/x86/entry/entry_64.S > +++ b/arch/x86/entry/entry_64.S > @@ -491,6 +491,39 @@ ENTRY(irq_entries_start) > END(irq_entries_start) This code is much more subtle than I thought when I wrote it: > > /* > + * Enters the IRQ stack if we're not already using it. NMI-safe. Clobbers > + * flags and puts old RSP into old_rsp, and leaves all other GPRs alone. > + * Requires kernel GSBASE. > + * > + * The invariant is that, if irq_count != 0, then we're either on the > + * IRQ stack or an IST stack, even if an NMI interrupts IRQ stack entry > + * or exit. > + */ > +.macro ENTER_IRQ_STACK old_rsp > + movq%rsp, \old_rsp > + cmpl$0, PER_CPU_VAR(irq_count) > + jne 694f > + movqPER_CPU_VAR(irq_stack_ptr), %rsp > + /* > +* Right now, we're on the irq stack with irq_count == 0. A nested > +* IRQ stack switch could clobber the stack. That's fine: the stack > +* is empty. > +*/ A nested ENTER_IRQ_STACK/LEAVE_IRQ_STACK pair is fine here. Anything else that does PUSH (or non-IST interrupt delivery) right here is not safe because something could interrupt *that* and do ENTER_IRQ_STACK, thus clobbering whatever got pushed here. In a world populated by sane people, the only things that can interrupt here are a vmalloc fault (let's just kill that), NMI, or MCE. But we're insane and we're talking about removing breakpoints from the IST stack and even returning from IST entries using RET, either of which will write something to (%rsp) and expect it not to get clobbered. We can't interchange the incl and the movq, because then we aren't safe against nested ENTER_IRQ_STACK. To be obviously safe against any local exception, we want a single instruction that will change %rsp and some in-memory flag at the same time. There aren't a whole lot of candidates. Cmpxchg isn't useful (cmpxchg with a memory operand doesn't modify its register operand). xchg could plausibly be abused to work, but it's slow because it's always atomic. Enter isn't going to work without a window in which rsp contains something bogus. Xadd, on the other hand, just might work. We need two percpu variables: irq_stack_ptr and irq_stack_flag. irq_stack_ptr points to the IRQ stack and isn't modified. irq_stack_flag == irq_stack_ptr if we're on the IRQ stack and irq_stack_flag has any other value if we're not on the IRQ stack. Then algebra happens. Unfortunately, the best I came up with so far uses xadd to enter the IRQ stack and xchg to leave. https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_ist=36825112b6082f2711605647366cd682a6be678a I don't love it because it probably adds 60 cycles or so to IRQs. On the other hand, it was fun to write. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Thu, Jul 23, 2015 at 3:37 PM, Andy Lutomirski l...@kernel.org wrote: This will allow IRQ stacks to nest inside NMIs or similar entries that can happen during IRQ stack setup or teardown. The Xen code here has a confusing comment. Signed-off-by: Andy Lutomirski l...@kernel.org --- arch/x86/entry/entry_64.S| 72 ++-- arch/x86/kernel/cpu/common.c | 2 +- arch/x86/kernel/process_64.c | 4 +++ 3 files changed, 47 insertions(+), 31 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index d3033183ed70..5f7df8949fa7 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -491,6 +491,39 @@ ENTRY(irq_entries_start) END(irq_entries_start) This code is much more subtle than I thought when I wrote it: /* + * Enters the IRQ stack if we're not already using it. NMI-safe. Clobbers + * flags and puts old RSP into old_rsp, and leaves all other GPRs alone. + * Requires kernel GSBASE. + * + * The invariant is that, if irq_count != 0, then we're either on the + * IRQ stack or an IST stack, even if an NMI interrupts IRQ stack entry + * or exit. + */ +.macro ENTER_IRQ_STACK old_rsp + movq%rsp, \old_rsp + cmpl$0, PER_CPU_VAR(irq_count) + jne 694f + movqPER_CPU_VAR(irq_stack_ptr), %rsp + /* +* Right now, we're on the irq stack with irq_count == 0. A nested +* IRQ stack switch could clobber the stack. That's fine: the stack +* is empty. +*/ A nested ENTER_IRQ_STACK/LEAVE_IRQ_STACK pair is fine here. Anything else that does PUSH (or non-IST interrupt delivery) right here is not safe because something could interrupt *that* and do ENTER_IRQ_STACK, thus clobbering whatever got pushed here. In a world populated by sane people, the only things that can interrupt here are a vmalloc fault (let's just kill that), NMI, or MCE. But we're insane and we're talking about removing breakpoints from the IST stack and even returning from IST entries using RET, either of which will write something to (%rsp) and expect it not to get clobbered. We can't interchange the incl and the movq, because then we aren't safe against nested ENTER_IRQ_STACK. To be obviously safe against any local exception, we want a single instruction that will change %rsp and some in-memory flag at the same time. There aren't a whole lot of candidates. Cmpxchg isn't useful (cmpxchg with a memory operand doesn't modify its register operand). xchg could plausibly be abused to work, but it's slow because it's always atomic. Enter isn't going to work without a window in which rsp contains something bogus. Xadd, on the other hand, just might work. We need two percpu variables: irq_stack_ptr and irq_stack_flag. irq_stack_ptr points to the IRQ stack and isn't modified. irq_stack_flag == irq_stack_ptr if we're on the IRQ stack and irq_stack_flag has any other value if we're not on the IRQ stack. Then algebra happens. Unfortunately, the best I came up with so far uses xadd to enter the IRQ stack and xchg to leave. https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_istid=36825112b6082f2711605647366cd682a6be678a I don't love it because it probably adds 60 cycles or so to IRQs. On the other hand, it was fun to write. --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Thu, Jul 23, 2015 at 11:08:39PM -0700, Andy Lutomirski wrote: To be obviously safe against any local exception, we want a single instruction that will change %rsp and some in-memory flag at the same time. There aren't a whole lot of candidates. Cmpxchg isn't useful (cmpxchg with a memory operand doesn't modify its register operand). Why would you even need that? You do LOCK; CMPXCHG on a per_cpu variable and then test ZF? I.e., use it as a mutex in asm. With ZF=1, you switch stacks, with ZF=0, you busy-wait ... Or am I missing something? This way you serialize all irq stack switchers... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 3:25 AM, Borislav Petkov b...@alien8.de wrote: On Thu, Jul 23, 2015 at 11:08:39PM -0700, Andy Lutomirski wrote: To be obviously safe against any local exception, we want a single instruction that will change %rsp and some in-memory flag at the same time. There aren't a whole lot of candidates. Cmpxchg isn't useful (cmpxchg with a memory operand doesn't modify its register operand). Why would you even need that? You do LOCK; CMPXCHG on a per_cpu variable and then test ZF? I.e., use it as a mutex in asm. With ZF=1, you switch stacks, with ZF=0, you busy-wait ... Or am I missing something? I'm not worried about other CPUs at all, so the LOCK isn't needed. I'm worried about an interrupt coming while the in-memory state says we're on the IRQ stack but RSP isn't pointing at the IRQ stack or vice versa. This isn't possible in current kernels because nothing ever switches to the IRQ stack except IRQs, and those don't happen with IF = 0 (unless Xen does more awful things than I realize), but I want to use IRQ stacks for int3, and that can happen inside NMI. An alternative solution would be to never switch to the IRQ stack if RSP points to the IRQ stack already or if we're already on an IST stack, but that seems full of corner cases. But wait. Maybe a really simple approach is fine: first increment irq_count and then switch RSP. That leaves a window with irq_count marking the IRQ stack as being in use but RSP not pointing to the IRQ stack. What can go wrong? We can get an NMI, an MCE, a breakpoint, or a vmalloc fault. An NMI is fine. It will switch to the NMI stack. The IRQ stack is marked as in use, but that doesn't matter -- the NMI stack has plenty of space. An MCE is fine. It will switch to the IST stack. It might return using RET, which will push one single extra word to the kernel stack, but that's not a problem for stack overruns (unless there's a never-ending stream of MCEs, but we're already terminally screwed if that happens). A breakpoint will *not* switch to an IST stack because we're going to get rid of the debug stack, and it will fail to switch to the IRQ stack, so we need to limit the stack depth of do_debug. Maybe that's okay. A breakpoint with NMI or MCE inside is still fine. So really the only difference between this simple approach (which is more or less what we do now) and my fancy approach is that a kernel instruction breakpoint will cause do_debug to run on the initial stack instead of the IRQ stack. I'm still tempted to say we should use my overly paranoid atomic approach for now and optimize later, but I'm fine with spinning a v3, too. --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 11:02:51AM -0700, Andy Lutomirski wrote: So really the only difference between this simple approach (which is more or less what we do now) and my fancy approach is that a kernel instruction breakpoint will cause do_debug to run on the initial stack instead of the IRQ stack. Sounds ok to me. What would be the worst thing if we limited the #DB stack? Some breakpoints will get ignored? In an endless stream of breakpoints hammering? Doesn't sound like a valid use case to me, does it? I'm still tempted to say we should use my overly paranoid atomic approach for now and optimize later,... But why change it if the simple approach of incrementing irq_count first is still fine? I think we want to KISS here exactly because apparently complexity in that area is a serious PITA... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 9:16 PM, Borislav Petkov b...@alien8.de wrote: On Fri, Jul 24, 2015 at 11:02:51AM -0700, Andy Lutomirski wrote: So really the only difference between this simple approach (which is more or less what we do now) and my fancy approach is that a kernel instruction breakpoint will cause do_debug to run on the initial stack instead of the IRQ stack. Sounds ok to me. What would be the worst thing if we limited the #DB stack? Some breakpoints will get ignored? In an endless stream of breakpoints hammering? Doesn't sound like a valid use case to me, does it? I'm still tempted to say we should use my overly paranoid atomic approach for now and optimize later,... But why change it if the simple approach of incrementing irq_count first is still fine? I think we want to KISS here exactly because apparently complexity in that area is a serious PITA... Yeah, I'm going to submit v2 with the simple approach. I admit I'm rather fond of xadd as a way to switch rsp and set a flag at the same time, though :) --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 09:28:47PM -0700, Andy Lutomirski wrote: Yeah, I'm going to submit v2 with the simple approach. I admit I'm rather fond of xadd as a way to switch rsp and set a flag at the same time, though :) I know you are. But people will rip your head out if you added 60 cycles to the IRQ path. :-) -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make then NMI-safe
On Fri, Jul 24, 2015 at 9:32 PM, Borislav Petkov b...@alien8.de wrote: On Fri, Jul 24, 2015 at 09:28:47PM -0700, Andy Lutomirski wrote: Yeah, I'm going to submit v2 with the simple approach. I admit I'm rather fond of xadd as a way to switch rsp and set a flag at the same time, though :) I know you are. But people will rip your head out if you added 60 cycles to the IRQ path. And people will give me five new heads if I ignore Linus and do RET even with IF=1, saving 300 cycles? --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/