subject:"64bit x86\: NMI nesting still buggy\?"

On Tue, 29 Apr 2014 20:48:34 +0200 (CEST)
Jiri Kosina  wrote:

> On Tue, 29 Apr 2014, Steven Rostedt wrote:
> 
> > > Just to be clear here -- I don't have a box that can reproduce this; I 
> > > whole-heartedly believe that even if there are boxes with this behavior 
> > > (and I assume there are, otherwise Intel wouldn't be mentioning it in the 
> > > docs), it'd be hard to trigger on those.
> > 
> > I see your point. But it is documented for those that control both NMIs
> > and SMMs. As it says in the document: "If the SMI handler requires the
> > use of NMI interrupts". That to me sounds like a system that has
> > control over both SMIs *and* NMIs. The BIOS should not have any control
> > over NMIs, as the OS requires that. And the OS has no control over
> > SMIs.
> > 
> > That paragraph sounds irrelevant to normal BIOS and OS systems as
> > neither "owns" both SMIs and NMIs.
> 
> Which doesn't really help me being less nervous about this whole thing.
> 
> I don't believe Intel would put a completely arbitrary and nonsencial 
> paragraph into the manual all of a sudden. It'd be great to know the 
> rationale why this has been added in the first place.

Honestly, it doesn't seem to be stating policy, it seems to be stating
"what happens if I do this". Again, BIOS writers need to be more
concern about what the OS might need. They should not be changing the
way NMIs work from under the covers. The OS has no protection from this
at all. Just like the bug I had reported where the BIOS writers caused
the second PIT to get corrupted. The bug was on their end.

> 
> > > We were hunting something completely different, and came through this 
> > > paragraph in the Intel manual, and found it rather scary.
> > 
> > But this is all irrelevant anyway as this is all hypothetical and
> > there's been no real world bug with this.
> 
> One would hope. Again -- I believe if this would trigger here and here a 
> few times a year, everyone would probably atribute it to a "random hang", 
> reboot, and never see the bug again.
> 

I highly doubt it. It would cause issues on all the systems that run an
NMI watch dog. There's enough out there that a random hang will raise
an eyebrow.

And it would trigger much more often on systems that don't do the
tricks we do with my changes. There's a lot of them out there too.

I wouldn't be losing any sleep over this.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014, Steven Rostedt wrote:

> > Just to be clear here -- I don't have a box that can reproduce this; I 
> > whole-heartedly believe that even if there are boxes with this behavior 
> > (and I assume there are, otherwise Intel wouldn't be mentioning it in the 
> > docs), it'd be hard to trigger on those.
> 
> I see your point. But it is documented for those that control both NMIs
> and SMMs. As it says in the document: "If the SMI handler requires the
> use of NMI interrupts". That to me sounds like a system that has
> control over both SMIs *and* NMIs. The BIOS should not have any control
> over NMIs, as the OS requires that. And the OS has no control over
> SMIs.
> 
> That paragraph sounds irrelevant to normal BIOS and OS systems as
> neither "owns" both SMIs and NMIs.

Which doesn't really help me being less nervous about this whole thing.

I don't believe Intel would put a completely arbitrary and nonsencial 
paragraph into the manual all of a sudden. It'd be great to know the 
rationale why this has been added in the first place.

> > We were hunting something completely different, and came through this 
> > paragraph in the Intel manual, and found it rather scary.
> 
> But this is all irrelevant anyway as this is all hypothetical and
> there's been no real world bug with this.

One would hope. Again -- I believe if this would trigger here and here a 
few times a year, everyone would probably atribute it to a "random hang", 
reboot, and never see the bug again.

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014 18:51:13 +0200 (CEST)
Jiri Kosina  wrote:

> Just to be clear here -- I don't have a box that can reproduce this; I 
> whole-heartedly believe that even if there are boxes with this behavior 
> (and I assume there are, otherwise Intel wouldn't be mentioning it in the 
> docs), it'd be hard to trigger on those.

I see your point. But it is documented for those that control both NMIs
and SMMs. As it says in the document: "If the SMI handler requires the
use of NMI interrupts". That to me sounds like a system that has
control over both SMIs *and* NMIs. The BIOS should not have any control
over NMIs, as the OS requires that. And the OS has no control over
SMIs.

That paragraph sounds irrelevant to normal BIOS and OS systems as
neither "owns" both SMIs and NMIs.

I've fought BIOS engineers before, where they would say something like
"Oh! You want to use the second PIT? I'll fix my code. Sorry".

> We were hunting something completely different, and came through this 
> paragraph in the Intel manual, and found it rather scary.

But this is all irrelevant anyway as this is all hypothetical and
there's been no real world bug with this.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014, Steven Rostedt wrote:

> You keep saying 38.4, but I don't see any 38.4. Perhaps you meant 34.8?

Yeah, sorry for the typo.

> Which BTW is this:
> 
> 
> 34.8 NMI HANDLING WHILE IN SMM
[ ... snip ... ]
> 
> 
> Read the first paragraph. That sounds like normal operation. The SMM
> should use the RSM to return and that does not re-enable NMIs if the
> SMM triggered during an NMI.

Yup, so far so good.

> The above is just stating that the SMM can enable NMIs if it wants to
> by executing an IRET. Which to me sounds rather buggy to do.

That's exactly the point actually. Basically, that paragraph allows the 
SMM code writers to issue iret. If they do it, the very problem I am 
trying to describe here might happen.

> Now the third paragraph is rather ambiguous. It sounds like it's still
> talking about doing an IRET in the SMI handler. As the IRET will enable
> NMIs, and if the SMI happened while an NMI was happening, the new NMI
> will happen. In this case, the NMI handler needs to address this. But
> this really sounds like if you have control of both SMM handlers and
> NMI handlers, which the Linux kernel certainly does not. Again, I label
> this as a bug in the BIOS.
> 
> And again, if the SMM were to trigger a fault, it too would enable
> NMIs. That is something that the SMM handler should not do.

That's what the last paragraph is talking about BTW, related to Pentium 
CPUs. That part is scary by itself.

> Can you reproduce your problem on different platforms, or is this just
> one box that exhibits this behavior? If it's only one box, I'm betting
> it has a BIOS doing nasty things.

Just to be clear here -- I don't have a box that can reproduce this; I 
whole-heartedly believe that even if there are boxes with this behavior 
(and I assume there are, otherwise Intel wouldn't be mentioning it in the 
docs), it'd be hard to trigger on those.

We were hunting something completely different, and came through this 
paragraph in the Intel manual, and found it rather scary.

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014 12:09:08 -0400
Steven Rostedt  wrote:

> Can you reproduce your problem on different platforms, or is this just
> one box that exhibits this behavior? If it's only one box, I'm betting
> it has a BIOS doing nasty things.

This box probably crashes on all kernels too. My NMI nesting changes
did not fix a bug (well, it did as a side effect, see below). It was
done to allow NMIs to use IRET so that we could remove stopmachine from
ftrace, and instead have it use breakpoints (which return with IRET).

The bug that was fixed by this was the ability to do stack traces
(sysrq-t) from NMI context. Stack traces can page fault, and when I was
debugging hard lock ups and having the NMI do a stack dump of all
tasks, another NMI would trigger and corrupt the stack of the NMI doing
the dumps. But that was something that would only be seen while
debugging, and not something seen in normal operation.

I don't see a bug to fix in the kernel. I see a bug to fix in the
vendor's BIOS.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014 17:24:32 +0200 (CEST)
Jiri Kosina  wrote:

> On Tue, 29 Apr 2014, Steven Rostedt wrote:
> 
> > > According to 38.4 of [1], when SMM mode is entered while the CPU is 
> > > handling NMI, the end result might be that upon exit from SMM, NMIs will 
> > > be re-enabled and latched NMI delivered as nested [2].
> > 
> > Note, if this were true, then the x86_64 hardware would be extremely
> > buggy. That's because NMIs are not made to be nested. If SMM's come in
> > during an NMI and re-enables the NMI, then *all* software would break.
> > That would basically make NMIs useless.
> > 
> > The only time I've ever witness problems (and I stress NMIs all the
> > time), is when the NMI itself does a fault. Which my patch set handles
> > properly. 
> 
> Yes, it indeed does. 
> 
> In the scenario I have outlined, the race window is extremely small, plus 
> NMIs don't happen that often, plus SMIs don't happen that often, plus 
> (hopefully) many BIOSes don't enable NMIs upon SMM exit.
> 
> The problem is, that Intel documentation is clear in this respect, and 
> explicitly states it can happen. And we are violating that, which makes me 
> rather nervous -- it'd be very nice to know what is the background of 38.4 
> section text in the Intel docs.
> 

You keep saying 38.4, but I don't see any 38.4. Perhaps you meant 34.8?

Which BTW is this:

34.8 NMI HANDLING WHILE IN SMM

NMI interrupts are blocked upon entry to the SMI handler. If an NMI
request occurs during the SMI handler, it is latched and serviced after
the processor exits SMM. Only one NMI request will be latched during
the SMI handler. If an NMI request is pending when the processor
executes the RSM instruction, the NMI is serviced before the next
instruction of the interrupted code sequence. This assumes that NMIs
were not blocked before the SMI occurred. If NMIs were blocked before
the SMI occurred, they are blocked after execution of RSM.

Although NMI requests are blocked when the processor enters SMM, they
may be enabled through software by executing an IRET instruction. If
the SMI handler requires the use of NMI interrupts, it should invoke a
dummy interrupt service routine for the purpose of executing an IRET
instruction. Once an IRET instruction is executed, NMI interrupt
requests are serviced in the same “real mode” manner in which they are
handled outside of SMM.

A special case can occur if an SMI handler nests inside an NMI handler
and then another NMI occurs. During NMI interrupt handling, NMI
interrupts are disabled, so normally NMI interrupts are serviced and
completed with an IRET instruction one at a time. When the processor
enters SMM while executing an NMI handler, the processor saves the
SMRAM state save map but does not save the attribute to keep NMI
interrupts disabled. Potentially, an NMI could be latched (while in SMM
or upon exit) and serviced upon exit of SMM even though the previous
NMI handler has still not completed. One or more NMIs could thus be
nested inside the first NMI handler. The NMI interrupt handler should
take this possibility into consideration.

Also, for the Pentium processor, exceptions that invoke a trap or fault
handler will enable NMI interrupts from inside of SMM. This behavior is
implementation specific for the Pentium processor and is not part of
the IA-32 architecture.

Read the first paragraph. That sounds like normal operation. The SMM
should use the RSM to return and that does not re-enable NMIs if the
SMM triggered during an NMI.

The above is just stating that the SMM can enable NMIs if it wants to
by executing an IRET. Which to me sounds rather buggy to do.

Now the third paragraph is rather ambiguous. It sounds like it's still
talking about doing an IRET in the SMI handler. As the IRET will enable
NMIs, and if the SMI happened while an NMI was happening, the new NMI
will happen. In this case, the NMI handler needs to address this. But
this really sounds like if you have control of both SMM handlers and
NMI handlers, which the Linux kernel certainly does not. Again, I label
this as a bug in the BIOS.

And again, if the SMM were to trigger a fault, it too would enable
NMIs. That is something that the SMM handler should not do.

Can you reproduce your problem on different platforms, or is this just
one box that exhibits this behavior? If it's only one box, I'm betting
it has a BIOS doing nasty things.

No where in the Intel text do I see that the operating system is to
handle nested NMIs. It needs to handle it if you control the SMMs,
which the operating system does not. Sounds like they are talking to
the firmware folks.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014, Steven Rostedt wrote:

> > According to 38.4 of [1], when SMM mode is entered while the CPU is 
> > handling NMI, the end result might be that upon exit from SMM, NMIs will 
> > be re-enabled and latched NMI delivered as nested [2].
> 
> Note, if this were true, then the x86_64 hardware would be extremely
> buggy. That's because NMIs are not made to be nested. If SMM's come in
> during an NMI and re-enables the NMI, then *all* software would break.
> That would basically make NMIs useless.
> 
> The only time I've ever witness problems (and I stress NMIs all the
> time), is when the NMI itself does a fault. Which my patch set handles
> properly. 

Yes, it indeed does. 

In the scenario I have outlined, the race window is extremely small, plus 
NMIs don't happen that often, plus SMIs don't happen that often, plus 
(hopefully) many BIOSes don't enable NMIs upon SMM exit.

The problem is, that Intel documentation is clear in this respect, and 
explicitly states it can happen. And we are violating that, which makes me 
rather nervous -- it'd be very nice to know what is the background of 38.4 
section text in the Intel docs.

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

2014-04-29 Thread Petr Tesarik

On Tue, 29 Apr 2014 06:29:04 -0700
"H. Peter Anvin"  wrote:

> On 04/29/2014 06:05 AM, Jiri Kosina wrote:
> > 
> > We were not able to come up with any other fix than avoiding using IST 
> > completely on x86_64, and instead going back to stack switching in 
> > software -- the same way 32bit x86 does.
> > 
> 
> This is not possible, though, because there are several windows during
> which if we were to take an exception which doesn't do IST, e.g. NMI, we
> are worse than dead -- we are in fact rootable.  Right after SYSCALL in
> particular.

Ah, right. SYSCALL does not update RSP. :-(
Hm, so anything that can fire up right after a SYSCALL must use IST.
It's possible to use an alternative IDT that gets loaded as the first
thing in an NMI handler, but this gets incredibly ugly...

> > So basically, I have two questions:
> > 
> > (1) is the above analysis correct? (if not, why?)
> > (2) if it is correct, is there any other option for fix than avoiding 
> > using IST for exception stack switching, and having kernel do the 
> > legacy task switching (the same way x86_32 is doing)?
> 
> It is not an option, see above.
> 
> > [1] 
> > http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
> > 
> > [2] "A special case can occur if an SMI handler nests inside an NMI 
> >  handler and then another NMI occurs. During NMI interrupt 
> >  handling, NMI interrupts are disabled, so normally NMI interrupts 
> >  are serviced and completed with an IRET instruction one at a 
> >  time. When the processor enters SMM while executing an NMI 
> >  handler, the processor saves the SMRAM state save map but does 
> >  not save the attribute to keep NMI interrupts disabled. 
> >  Potentially, an NMI could be latched (while in SMM or upon exit) 
> >  and serviced upon exit of SMM even though the previous NMI  
> >  handler has still not completed."
> 
> I believe [2] only applies if there is an IRET executing inside the SMM
> handler, which should not normally be the case.  It might also have been
> addressed since that was written, but I don't know.

The trouble here is that the official Intel documentation describes how
to do this and specifically requests the OS to cope with nested NMIs.

Petr T
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On 04/29/2014 07:06 AM, Steven Rostedt wrote:
> On Tue, 29 Apr 2014 06:29:04 -0700
> "H. Peter Anvin"  wrote:
> 
>  
>>> [2] "A special case can occur if an SMI handler nests inside an NMI 
>>>  handler and then another NMI occurs. During NMI interrupt 
>>>  handling, NMI interrupts are disabled, so normally NMI interrupts 
>>>  are serviced and completed with an IRET instruction one at a 
>>>  time. When the processor enters SMM while executing an NMI 
>>>  handler, the processor saves the SMRAM state save map but does 
>>>  not save the attribute to keep NMI interrupts disabled. 
>>>  Potentially, an NMI could be latched (while in SMM or upon exit) 
>>>  and serviced upon exit of SMM even though the previous NMI  
>>>  handler has still not completed."
>>
>> I believe [2] only applies if there is an IRET executing inside the SMM
>> handler, which should not normally be the case.  It might also have been
>> addressed since that was written, but I don't know.
> 
> Bad behaving BIOS? But I'm sure there's no such thing ;-)
> 

Never...

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014 06:29:04 -0700
"H. Peter Anvin"  wrote:

 
> > [2] "A special case can occur if an SMI handler nests inside an NMI 
> >  handler and then another NMI occurs. During NMI interrupt 
> >  handling, NMI interrupts are disabled, so normally NMI interrupts 
> >  are serviced and completed with an IRET instruction one at a 
> >  time. When the processor enters SMM while executing an NMI 
> >  handler, the processor saves the SMRAM state save map but does 
> >  not save the attribute to keep NMI interrupts disabled. 
> >  Potentially, an NMI could be latched (while in SMM or upon exit) 
> >  and serviced upon exit of SMM even though the previous NMI  
> >  handler has still not completed."
> 
> I believe [2] only applies if there is an IRET executing inside the SMM
> handler, which should not normally be the case.  It might also have been
> addressed since that was written, but I don't know.

Bad behaving BIOS? But I'm sure there's no such thing ;-)

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014 15:05:55 +0200 (CEST)
Jiri Kosina  wrote:

> According to 38.4 of [1], when SMM mode is entered while the CPU is 
> handling NMI, the end result might be that upon exit from SMM, NMIs will 
> be re-enabled and latched NMI delivered as nested [2].

Note, if this were true, then the x86_64 hardware would be extremely
buggy. That's because NMIs are not made to be nested. If SMM's come in
during an NMI and re-enables the NMI, then *all* software would break.
That would basically make NMIs useless.

The only time I've ever witness problems (and I stress NMIs all the
time), is when the NMI itself does a fault. Which my patch set handles
properly. I've also stressed this on boxes that do have SMIs and SMMs.

Now, you can have a bad BIOS that does re-enable NMIs from SMMs or
SMIs, but then you need to take that up with your vendor.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On 04/29/2014 06:05 AM, Jiri Kosina wrote:
> 
> We were not able to come up with any other fix than avoiding using IST 
> completely on x86_64, and instead going back to stack switching in 
> software -- the same way 32bit x86 does.
> 

This is not possible, though, because there are several windows during
which if we were to take an exception which doesn't do IST, e.g. NMI, we
are worse than dead -- we are in fact rootable.  Right after SYSCALL in
particular.

> So basically, I have two questions:
> 
> (1) is the above analysis correct? (if not, why?)
> (2) if it is correct, is there any other option for fix than avoiding 
> using IST for exception stack switching, and having kernel do the 
> legacy task switching (the same way x86_32 is doing)?

It is not an option, see above.

> [1] 
> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
> 
> [2]   "A special case can occur if an SMI handler nests inside an NMI 
>handler and then another NMI occurs. During NMI interrupt 
>handling, NMI interrupts are disabled, so normally NMI interrupts 
>are serviced and completed with an IRET instruction one at a 
>time. When the processor enters SMM while executing an NMI 
>handler, the processor saves the SMRAM state save map but does 
>not save the attribute to keep NMI interrupts disabled. 
>Potentially, an NMI could be latched (while in SMM or upon exit) 
>and serviced upon exit of SMM even though the previous NMI  
>handler has still not completed."

I believe [2] only applies if there is an IRET executing inside the SMM
handler, which should not normally be the case.  It might also have been
addressed since that was written, but I don't know.

-hpa



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

64bit x86: NMI nesting still buggy?

Hi,

so while debugging some hard-to-explain hangs in the past, we have been 
going around in circles around the NMI nesting disaster, and I tend to 
believe that Steven's fixup (for most part introduced in 3f3c8b8c ("x86: 
Add workaround to NMI iret woes")) makes the race *much* smaller, but it 
doesn't fix it completely (it basically reduces the race to a few 
instructions in first_nmi which are doing the stack preparatory work).

According to 38.4 of [1], when SMM mode is entered while the CPU is 
handling NMI, the end result might be that upon exit from SMM, NMIs will 
be re-enabled and latched NMI delivered as nested [2].

This is handled well by playing the frame-saving and flag-setting games in 
`first_nmi' / `nested_nmi' / `repeat_nmi' (and that also works flawlessly 
in cases exception or breakpoint triggers some time later during NMI 
handling when all the 'nested' setup has been done).

There is unfortunately small race window, which, I believe, is not covered 
by this.

- 1st NMI triggers
- SMM is entered very shortly afterwards, even before `first_nmi' 
  was able to do its job
- 2nd NMI is latched
- SMM exits with NMIs re-enabled (see [2]) and 2nd NMI triggers
- 2nd NMI gets handled properly, exits with iret
- iret returns to the place where 1st NMI was interrupted, but 
  the return address on the stack where iret from 1st NMI should 
  eventually return to is gone, and the 'saved/copy' locations of 
  the stack don't contain the correct frame either

The race is very small and it's hard to trigger SMM in a deterministic 
way, so it's probably very difficult to trigger. But I wouldn't be 
surprised if it'd trigger ocassionally in the wild, and the resulting 
problems were never root-caused (as the problem is very rare, not 
reproducible, probably doesn't happen on the same system more than once in 
a lifetime).

We were not able to come up with any other fix than avoiding using IST 
completely on x86_64, and instead going back to stack switching in 
software -- the same way 32bit x86 does.

So basically, I have two questions:

(1) is the above analysis correct? (if not, why?)
(2) if it is correct, is there any other option for fix than avoiding 
using IST for exception stack switching, and having kernel do the 
legacy task switching (the same way x86_32 is doing)?

[1] 
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

[2] "A special case can occur if an SMI handler nests inside an NMI 
 handler and then another NMI occurs. During NMI interrupt 
 handling, NMI interrupts are disabled, so normally NMI interrupts 
 are serviced and completed with an IRET instruction one at a 
 time. When the processor enters SMM while executing an NMI 
 handler, the processor saves the SMRAM state save map but does 
 not save the attribute to keep NMI interrupts disabled. 
 Potentially, an NMI could be latched (while in SMM or upon exit) 
 and serviced upon exit of SMM even though the previous NMI  
 handler has still not completed."

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

64bit x86: NMI nesting still buggy?

Hi,

so while debugging some hard-to-explain hangs in the past, we have been 
going around in circles around the NMI nesting disaster, and I tend to 
believe that Steven's fixup (for most part introduced in 3f3c8b8c (x86: 
Add workaround to NMI iret woes)) makes the race *much* smaller, but it 
doesn't fix it completely (it basically reduces the race to a few 
instructions in first_nmi which are doing the stack preparatory work).

According to 38.4 of [1], when SMM mode is entered while the CPU is 
handling NMI, the end result might be that upon exit from SMM, NMIs will 
be re-enabled and latched NMI delivered as nested [2].

This is handled well by playing the frame-saving and flag-setting games in 
`first_nmi' / `nested_nmi' / `repeat_nmi' (and that also works flawlessly 
in cases exception or breakpoint triggers some time later during NMI 
handling when all the 'nested' setup has been done).

There is unfortunately small race window, which, I believe, is not covered 
by this.

- 1st NMI triggers
- SMM is entered very shortly afterwards, even before `first_nmi' 
  was able to do its job
- 2nd NMI is latched
- SMM exits with NMIs re-enabled (see [2]) and 2nd NMI triggers
- 2nd NMI gets handled properly, exits with iret
- iret returns to the place where 1st NMI was interrupted, but 
  the return address on the stack where iret from 1st NMI should 
  eventually return to is gone, and the 'saved/copy' locations of 
  the stack don't contain the correct frame either

The race is very small and it's hard to trigger SMM in a deterministic 
way, so it's probably very difficult to trigger. But I wouldn't be 
surprised if it'd trigger ocassionally in the wild, and the resulting 
problems were never root-caused (as the problem is very rare, not 
reproducible, probably doesn't happen on the same system more than once in 
a lifetime).

We were not able to come up with any other fix than avoiding using IST 
completely on x86_64, and instead going back to stack switching in 
software -- the same way 32bit x86 does.

So basically, I have two questions:

(1) is the above analysis correct? (if not, why?)
(2) if it is correct, is there any other option for fix than avoiding 
using IST for exception stack switching, and having kernel do the 
legacy task switching (the same way x86_32 is doing)?

[1] 
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

[2] A special case can occur if an SMI handler nests inside an NMI 
 handler and then another NMI occurs. During NMI interrupt 
 handling, NMI interrupts are disabled, so normally NMI interrupts 
 are serviced and completed with an IRET instruction one at a 
 time. When the processor enters SMM while executing an NMI 
 handler, the processor saves the SMRAM state save map but does 
 not save the attribute to keep NMI interrupts disabled. 
 Potentially, an NMI could be latched (while in SMM or upon exit) 
 and serviced upon exit of SMM even though the previous NMI  
 handler has still not completed.

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On 04/29/2014 06:05 AM, Jiri Kosina wrote:

We were not able to come up with any other fix than avoiding using IST
completely on x86_64, and instead going back to stack switching in
software -- the same way 32bit x86 does.

This is not possible, though, because there are several windows during
which if we were to take an exception which doesn't do IST, e.g. NMI, we
are worse than dead -- we are in fact rootable. Right after SYSCALL in
particular.

So basically, I have two questions:

(1) is the above analysis correct? (if not, why?)
(2) if it is correct, is there any other option for fix than avoiding
using IST for exception stack switching, and having kernel do the
legacy task switching (the same way x86_32 is doing)?

It is not an option, see above.

[1]
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

[2] A special case can occur if an SMI handler nests inside an NMI
handler and then another NMI occurs. During NMI interrupt
handling, NMI interrupts are disabled, so normally NMI interrupts
are serviced and completed with an IRET instruction one at a
time. When the processor enters SMM while executing an NMI
handler, the processor saves the SMRAM state save map but does
not save the attribute to keep NMI interrupts disabled.
Potentially, an NMI could be latched (while in SMM or upon exit)
and serviced upon exit of SMM even though the previous NMI
handler has still not completed.

I believe [2] only applies if there is an IRET executing inside the SMM
handler, which should not normally be the case. It might also have been
addressed since that was written, but I don't know.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014 15:05:55 +0200 (CEST)
Jiri Kosina jkos...@suse.cz wrote:


 According to 38.4 of [1], when SMM mode is entered while the CPU is 
 handling NMI, the end result might be that upon exit from SMM, NMIs will 
 be re-enabled and latched NMI delivered as nested [2].

Note, if this were true, then the x86_64 hardware would be extremely
buggy. That's because NMIs are not made to be nested. If SMM's come in
during an NMI and re-enables the NMI, then *all* software would break.
That would basically make NMIs useless.

The only time I've ever witness problems (and I stress NMIs all the
time), is when the NMI itself does a fault. Which my patch set handles
properly. I've also stressed this on boxes that do have SMIs and SMMs.

Now, you can have a bad BIOS that does re-enable NMIs from SMMs or
SMIs, but then you need to take that up with your vendor.

-- Steve


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014 06:29:04 -0700
H. Peter Anvin h...@linux.intel.com wrote:

 
  [2] A special case can occur if an SMI handler nests inside an NMI 
   handler and then another NMI occurs. During NMI interrupt 
   handling, NMI interrupts are disabled, so normally NMI interrupts 
   are serviced and completed with an IRET instruction one at a 
   time. When the processor enters SMM while executing an NMI 
   handler, the processor saves the SMRAM state save map but does 
   not save the attribute to keep NMI interrupts disabled. 
   Potentially, an NMI could be latched (while in SMM or upon exit) 
   and serviced upon exit of SMM even though the previous NMI  
   handler has still not completed.
 
 I believe [2] only applies if there is an IRET executing inside the SMM
 handler, which should not normally be the case.  It might also have been
 addressed since that was written, but I don't know.

Bad behaving BIOS? But I'm sure there's no such thing ;-)

-- Steve

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On 04/29/2014 07:06 AM, Steven Rostedt wrote:
 On Tue, 29 Apr 2014 06:29:04 -0700
 H. Peter Anvin h...@linux.intel.com wrote:
 
  
 [2] A special case can occur if an SMI handler nests inside an NMI 
  handler and then another NMI occurs. During NMI interrupt 
  handling, NMI interrupts are disabled, so normally NMI interrupts 
  are serviced and completed with an IRET instruction one at a 
  time. When the processor enters SMM while executing an NMI 
  handler, the processor saves the SMRAM state save map but does 
  not save the attribute to keep NMI interrupts disabled. 
  Potentially, an NMI could be latched (while in SMM or upon exit) 
  and serviced upon exit of SMM even though the previous NMI  
  handler has still not completed.

 I believe [2] only applies if there is an IRET executing inside the SMM
 handler, which should not normally be the case.  It might also have been
 addressed since that was written, but I don't know.
 
 Bad behaving BIOS? But I'm sure there's no such thing ;-)
 

Never...

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

2014-04-29 Thread Petr Tesarik

On Tue, 29 Apr 2014 06:29:04 -0700
H. Peter Anvin h...@linux.intel.com wrote:

On 04/29/2014 06:05 AM, Jiri Kosina wrote:

We were not able to come up with any other fix than avoiding using IST
completely on x86_64, and instead going back to stack switching in
software -- the same way 32bit x86 does.

Ah, right. SYSCALL does not update RSP. :-(
Hm, so anything that can fire up right after a SYSCALL must use IST.
It's possible to use an alternative IDT that gets loaded as the first
thing in an NMI handler, but this gets incredibly ugly...

So basically, I have two questions:

It is not an option, see above.

[1]
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

I believe [2] only applies if there is an IRET executing inside the SMM
handler, which should not normally be the case. It might also have been
addressed since that was written, but I don't know.

The trouble here is that the official Intel documentation describes how
to do this and specifically requests the OS to cope with nested NMIs.

Petr T
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014, Steven Rostedt wrote:

  According to 38.4 of [1], when SMM mode is entered while the CPU is 
  handling NMI, the end result might be that upon exit from SMM, NMIs will 
  be re-enabled and latched NMI delivered as nested [2].
 
 Note, if this were true, then the x86_64 hardware would be extremely
 buggy. That's because NMIs are not made to be nested. If SMM's come in
 during an NMI and re-enables the NMI, then *all* software would break.
 That would basically make NMIs useless.
 
 The only time I've ever witness problems (and I stress NMIs all the
 time), is when the NMI itself does a fault. Which my patch set handles
 properly. 

Yes, it indeed does. 

In the scenario I have outlined, the race window is extremely small, plus 
NMIs don't happen that often, plus SMIs don't happen that often, plus 
(hopefully) many BIOSes don't enable NMIs upon SMM exit.

The problem is, that Intel documentation is clear in this respect, and 
explicitly states it can happen. And we are violating that, which makes me 
rather nervous -- it'd be very nice to know what is the background of 38.4 
section text in the Intel docs.

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014 17:24:32 +0200 (CEST)
Jiri Kosina jkos...@suse.cz wrote:

 On Tue, 29 Apr 2014, Steven Rostedt wrote:
 
   According to 38.4 of [1], when SMM mode is entered while the CPU is 
   handling NMI, the end result might be that upon exit from SMM, NMIs will 
   be re-enabled and latched NMI delivered as nested [2].
  
  Note, if this were true, then the x86_64 hardware would be extremely
  buggy. That's because NMIs are not made to be nested. If SMM's come in
  during an NMI and re-enables the NMI, then *all* software would break.
  That would basically make NMIs useless.
  
  The only time I've ever witness problems (and I stress NMIs all the
  time), is when the NMI itself does a fault. Which my patch set handles
  properly. 
 
 Yes, it indeed does. 
 
 In the scenario I have outlined, the race window is extremely small, plus 
 NMIs don't happen that often, plus SMIs don't happen that often, plus 
 (hopefully) many BIOSes don't enable NMIs upon SMM exit.
 
 The problem is, that Intel documentation is clear in this respect, and 
 explicitly states it can happen. And we are violating that, which makes me 
 rather nervous -- it'd be very nice to know what is the background of 38.4 
 section text in the Intel docs.
 

You keep saying 38.4, but I don't see any 38.4. Perhaps you meant 34.8?

Which BTW is this:


34.8 NMI HANDLING WHILE IN SMM

NMI interrupts are blocked upon entry to the SMI handler. If an NMI
request occurs during the SMI handler, it is latched and serviced after
the processor exits SMM. Only one NMI request will be latched during
the SMI handler. If an NMI request is pending when the processor
executes the RSM instruction, the NMI is serviced before the next
instruction of the interrupted code sequence. This assumes that NMIs
were not blocked before the SMI occurred. If NMIs were blocked before
the SMI occurred, they are blocked after execution of RSM.

Although NMI requests are blocked when the processor enters SMM, they
may be enabled through software by executing an IRET instruction. If
the SMI handler requires the use of NMI interrupts, it should invoke a
dummy interrupt service routine for the purpose of executing an IRET
instruction. Once an IRET instruction is executed, NMI interrupt
requests are serviced in the same “real mode” manner in which they are
handled outside of SMM.

A special case can occur if an SMI handler nests inside an NMI handler
and then another NMI occurs. During NMI interrupt handling, NMI
interrupts are disabled, so normally NMI interrupts are serviced and
completed with an IRET instruction one at a time. When the processor
enters SMM while executing an NMI handler, the processor saves the
SMRAM state save map but does not save the attribute to keep NMI
interrupts disabled. Potentially, an NMI could be latched (while in SMM
or upon exit) and serviced upon exit of SMM even though the previous
NMI handler has still not completed. One or more NMIs could thus be
nested inside the first NMI handler. The NMI interrupt handler should
take this possibility into consideration.

Also, for the Pentium processor, exceptions that invoke a trap or fault
handler will enable NMI interrupts from inside of SMM. This behavior is
implementation specific for the Pentium processor and is not part of
the IA-32 architecture.


Read the first paragraph. That sounds like normal operation. The SMM
should use the RSM to return and that does not re-enable NMIs if the
SMM triggered during an NMI.

The above is just stating that the SMM can enable NMIs if it wants to
by executing an IRET. Which to me sounds rather buggy to do.

Now the third paragraph is rather ambiguous. It sounds like it's still
talking about doing an IRET in the SMI handler. As the IRET will enable
NMIs, and if the SMI happened while an NMI was happening, the new NMI
will happen. In this case, the NMI handler needs to address this. But
this really sounds like if you have control of both SMM handlers and
NMI handlers, which the Linux kernel certainly does not. Again, I label
this as a bug in the BIOS.

And again, if the SMM were to trigger a fault, it too would enable
NMIs. That is something that the SMM handler should not do.


Can you reproduce your problem on different platforms, or is this just
one box that exhibits this behavior? If it's only one box, I'm betting
it has a BIOS doing nasty things.

No where in the Intel text do I see that the operating system is to
handle nested NMIs. It needs to handle it if you control the SMMs,
which the operating system does not. Sounds like they are talking to
the firmware folks.

-- Steve


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014 12:09:08 -0400
Steven Rostedt rost...@goodmis.org wrote:

 Can you reproduce your problem on different platforms, or is this just
 one box that exhibits this behavior? If it's only one box, I'm betting
 it has a BIOS doing nasty things.

This box probably crashes on all kernels too. My NMI nesting changes
did not fix a bug (well, it did as a side effect, see below). It was
done to allow NMIs to use IRET so that we could remove stopmachine from
ftrace, and instead have it use breakpoints (which return with IRET).

The bug that was fixed by this was the ability to do stack traces
(sysrq-t) from NMI context. Stack traces can page fault, and when I was
debugging hard lock ups and having the NMI do a stack dump of all
tasks, another NMI would trigger and corrupt the stack of the NMI doing
the dumps. But that was something that would only be seen while
debugging, and not something seen in normal operation.

I don't see a bug to fix in the kernel. I see a bug to fix in the
vendor's BIOS.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014, Steven Rostedt wrote:

 You keep saying 38.4, but I don't see any 38.4. Perhaps you meant 34.8?

Yeah, sorry for the typo.

 Which BTW is this:
 
 
 34.8 NMI HANDLING WHILE IN SMM
[ ... snip ... ]
 
 
 Read the first paragraph. That sounds like normal operation. The SMM
 should use the RSM to return and that does not re-enable NMIs if the
 SMM triggered during an NMI.

Yup, so far so good.

 The above is just stating that the SMM can enable NMIs if it wants to
 by executing an IRET. Which to me sounds rather buggy to do.

That's exactly the point actually. Basically, that paragraph allows the 
SMM code writers to issue iret. If they do it, the very problem I am 
trying to describe here might happen.

 Now the third paragraph is rather ambiguous. It sounds like it's still
 talking about doing an IRET in the SMI handler. As the IRET will enable
 NMIs, and if the SMI happened while an NMI was happening, the new NMI
 will happen. In this case, the NMI handler needs to address this. But
 this really sounds like if you have control of both SMM handlers and
 NMI handlers, which the Linux kernel certainly does not. Again, I label
 this as a bug in the BIOS.
 
 And again, if the SMM were to trigger a fault, it too would enable
 NMIs. That is something that the SMM handler should not do.

That's what the last paragraph is talking about BTW, related to Pentium 
CPUs. That part is scary by itself.

 Can you reproduce your problem on different platforms, or is this just
 one box that exhibits this behavior? If it's only one box, I'm betting
 it has a BIOS doing nasty things.

Just to be clear here -- I don't have a box that can reproduce this; I 
whole-heartedly believe that even if there are boxes with this behavior 
(and I assume there are, otherwise Intel wouldn't be mentioning it in the 
docs), it'd be hard to trigger on those.

We were hunting something completely different, and came through this 
paragraph in the Intel manual, and found it rather scary.

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014 18:51:13 +0200 (CEST)
Jiri Kosina jkos...@suse.cz wrote:


 Just to be clear here -- I don't have a box that can reproduce this; I 
 whole-heartedly believe that even if there are boxes with this behavior 
 (and I assume there are, otherwise Intel wouldn't be mentioning it in the 
 docs), it'd be hard to trigger on those.

I see your point. But it is documented for those that control both NMIs
and SMMs. As it says in the document: If the SMI handler requires the
use of NMI interrupts. That to me sounds like a system that has
control over both SMIs *and* NMIs. The BIOS should not have any control
over NMIs, as the OS requires that. And the OS has no control over
SMIs.

That paragraph sounds irrelevant to normal BIOS and OS systems as
neither owns both SMIs and NMIs.

I've fought BIOS engineers before, where they would say something like
Oh! You want to use the second PIT? I'll fix my code. Sorry.

 We were hunting something completely different, and came through this 
 paragraph in the Intel manual, and found it rather scary.

But this is all irrelevant anyway as this is all hypothetical and
there's been no real world bug with this.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?

On Tue, 29 Apr 2014, Steven Rostedt wrote:

  Just to be clear here -- I don't have a box that can reproduce this; I 
  whole-heartedly believe that even if there are boxes with this behavior 
  (and I assume there are, otherwise Intel wouldn't be mentioning it in the 
  docs), it'd be hard to trigger on those.
 
 I see your point. But it is documented for those that control both NMIs
 and SMMs. As it says in the document: If the SMI handler requires the
 use of NMI interrupts. That to me sounds like a system that has
 control over both SMIs *and* NMIs. The BIOS should not have any control
 over NMIs, as the OS requires that. And the OS has no control over
 SMIs.
 
 That paragraph sounds irrelevant to normal BIOS and OS systems as
 neither owns both SMIs and NMIs.

Which doesn't really help me being less nervous about this whole thing.

I don't believe Intel would put a completely arbitrary and nonsencial 
paragraph into the manual all of a sudden. It'd be great to know the 
rationale why this has been added in the first place.

  We were hunting something completely different, and came through this 
  paragraph in the Intel manual, and found it rather scary.
 
 But this is all irrelevant anyway as this is all hypothetical and
 there's been no real world bug with this.

One would hope. Again -- I believe if this would trigger here and here a 
few times a year, everyone would probably atribute it to a random hang, 
reboot, and never see the bug again.

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 64bit x86: NMI nesting still buggy?