> On Thu, Oct 01, 2015 at 02:33:18AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> > > > This patch introduces new boot option "noextnmi" which disables
> > > > external NMI. This option is useful for the dump capture kernel
> > > > so that an HA application or administrator wouldn't mistakenly
> > > > shoot down the kernel by NMI.
> > >
> > > So that they can get really stuck when the crash kernel crashes, right?
> > > ;-)
> >
> > No, it is different from my intention.
> >
> > `mistakenly' in the above means; they issue NMI due to a misconception
> > that the monitored host is stuck in the 1st kernel while it is actually
> > taking a crash dump in the 2nd kernel. To avoid this kind of accident,
> > there is a tool such as fence_kdump which notifies "I'm taking a crash
> > dump, so don't send NMI" to the HA clustering software. However, there
> > is a time window between kernel panic and the notification.
> >
> > "noextnmi" allows users to avoid this kind of accident all the time of
> > 2nd kernel.
>
> Yes yes, I understand. But if the crash kernel also gets stuck they have
> no means of recovery, right? (other than power cycling the hardware)

Yes, but I don't think it's a big problem. I suppose that a server
which uses this feature will be equipped with a BMC, and a BMC is
required to support a hard reset command for the server. If the HA
clustering software detects no response from the server after a
relatively long timeout, it may want to issue a hard reset to the
server via IPMI over LAN.

> Just playing devils advocate here, I don't actually object to the patch.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group