On Mon, Jun 25, 2018 at 3:15 PM Steven Rostedt <rost...@goodmis.org> wrote:
>
> On Mon, 25 Jun 2018 13:47:08 -0700
> "Paul E. McKenney" <paul...@linux.vnet.ibm.com> wrote:
>
> > On Mon, Jun 25, 2018 at 04:25:57PM -0400, Steven Rostedt wrote:
> > > On Mon, 25 Jun 2018 09:39:51 -0700
> > > Joel Fernandes <j...@joelfernandes.org> wrote:
> > >
> > > > For whatever its worth, I made some notes of what I understood from
> > > > reading the code and old posts because I was sure I would otherwise
> > > > forget everything:
> > > > http://www.joelfernandes.org/linuxinternals/2018/06/15/rcu-dynticks.html
> > >
> > > Nice write up. I may point some people to this ;-)
> > >
> > > Anyway "complications due to nested NMIs (yes NMIs can nest!)"
> > >
> > > What arch allows for NMIs to nest. Because we don't let that happen on
> > > x86, and there's code that I know of that is called by NMIs that is not
> > > re-entrant, and can crash if we allow for NMIs to nest. For example
> > > "in_nmi()" will not show that we are in_nmi() if we allow for nesting
> > > of NMIs. It has a single bit that gets incremented when we enter NMI
> > > code, and cleared when we leave it.
> >
> > Last I checked with Andy Lutomirski, there are a number of things that,
> > though not NMIs, act like NMIs and that can interrupt each others'
> > handlers. This is on x86.
> >
>
> Perhaps things like MCEs, but they don't call nmi_enter(). And usually
> when something does, it probably puts the machine into an unstable
> state. Getting RCU right, may be the least of the worries.
>
> You may want to ask Andy if there's legitimate interruptions of NMIs
> that doesn't mean "please reboot as soon as possible"?
>

Yes, sadly. CPU A gets an NMI. While processing it, CPU B has user code access a faulty NVDIMM address, causing CPU B to generate a machine check. CPU A also gets a machine check because someone at Intel thought it was a good idea. CPU A will mostly ignore the machine check, but it still happens. I think it's reasonable to say that nmi_enter() won't nest, but I don't see how we can avoid rcu_nmi_enter() nesting.
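
To make the distinction concrete, here is a toy userspace model (not kernel code, and not how nmi_enter() or rcu_nmi_enter() are actually implemented). It contrasts the single-flag tracking described above for in_nmi() with a nesting counter, which is roughly what a nestable entry point would need. All names (cpu_model, model_enter, and so on) are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for the toy model. */
struct cpu_model {
	bool flag_in_nmi;	/* single flag: set on entry, cleared on exit */
	int nmi_nesting;	/* counter: +1 on entry, -1 on exit */
};

/* Model entering an NMI-like handler (NMI, machine check, ...). */
static void model_enter(struct cpu_model *c)
{
	c->flag_in_nmi = true;
	c->nmi_nesting++;
}

/* Model leaving an NMI-like handler. */
static void model_exit(struct cpu_model *c)
{
	c->flag_in_nmi = false;	/* forgets any outer handler still running */
	c->nmi_nesting--;
}

/* What flag-based tracking reports. */
static bool flag_says_in_nmi(const struct cpu_model *c)
{
	return c->flag_in_nmi;
}

/* What counter-based tracking reports. */
static bool counter_says_in_nmi(const struct cpu_model *c)
{
	return c->nmi_nesting > 0;
}
```

In this model, if CPU A enters its NMI handler, takes the machine check on top of it, and then returns from the machine check, flag_says_in_nmi() is already false while the outer NMI handler is still running, whereas counter_says_in_nmi() stays true until the outer handler exits too.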