On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon <a...@freebsd.org> wrote:
> on 16/11/2012 00:58 Ryan Stone said the following:
>> At work we have some custom watchdog hardware that sends an NMI upon
>> expiry. We've modified the kernel to panic when it receives the watchdog
>> NMI. I've been trying the "stop_scheduler_on_panic" mode, and I've
>> discovered that when my watchdog expires, the system gets completely
>> wedged. After some digging, I've discovered is that I have multiple CPUs
>> getting the watchdog NMI and trying to panic concurrently. One of the CPUs
>> wins, and the rest spin forever in this code:
>>
>> /*
>>  * We don't want multiple CPU's to panic at the same time, so we
>>  * use panic_cpu as a simple spinlock. We have to keep checking
>>  * panic_cpu if we are spinning in case the panic on the first
>>  * CPU is canceled.
>>  */
>> if (panic_cpu != PCPU_GET(cpuid))
>>         while (atomic_cmpset_int(&panic_cpu, NOCPU,
>>             PCPU_GET(cpuid)) == 0)
>>                 while (panic_cpu != NOCPU)
>>                         ; /* nothing */
>>
>> The system wedges when stop_cpus_hard() is called, which sends NMIs to all
>> of the other CPUs and waits for them to acknowledge that they are stopped
>> before returning. However the CPU will not deliver an NMI to a CPU that is
>> already handling an NMI, so the other CPUs that got a watchdog NMI and are
>> spinning will never go into the NMI handler and acknowledge that they are
>> stopped.
>
> I thought about this issue and fixed (in my tree) in a different way:
> http://people.freebsd.org/~avg/cpu_stop-race.diff
>
> The need for spinlock_enter in the patch in not entirely clear.
> The main idea is that a CPU which calls cpu_stop and loses a race should
> voluntary enter cpustop_handler.
> I am also not sure about MI-cleanness of this patch.

It is similar to what I propose, but with some differences:
- It is not clean from an MI perspective.
- I don't think we need to treat it specially; I would just
  unconditionally stop all the CPUs entering the "spinlock zone",
  making the patch simpler.

So I guess you are really fine with the proposal I made?

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
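
For illustration only, here is a rough userspace model of the idea discussed in this thread: a CPU that loses the race for panic_cpu parks itself and acknowledges the stop, instead of spinning where the winner's stop request can never reach it. This is a sketch under assumptions, not kernel code and not the contents of either patch. Threads stand in for CPUs, NMI delivery is not modelled at all, and model_cpustop_handler(), model_panic(), simulate_concurrent_panic(), stopped_count and release_cpus are hypothetical names invented for the example. Only panic_cpu, NOCPU and the role of cpustop_handler come from the messages above.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NOCPU    (-1)   /* sentinel: no CPU owns the panic yet */
#define MAX_CPUS 64     /* arbitrary limit for this model */

static atomic_int panic_cpu;      /* models the kernel's panic_cpu */
static atomic_int stopped_count;  /* CPUs that acknowledged the stop */
static atomic_int release_cpus;   /* model-only: lets parked threads exit */

struct cpu_arg {
    int id;     /* simulated CPU id */
    int ncpus;  /* total simulated CPUs */
};

/* Hypothetical stand-in for cpustop_handler(): acknowledge, then park. */
static void model_cpustop_handler(void)
{
    atomic_fetch_add(&stopped_count, 1);
    while (!atomic_load(&release_cpus))
        sched_yield();  /* parked until the model run ends */
}

/* Every simulated CPU "panics" at once; exactly one claims panic_cpu. */
static void *model_panic(void *p)
{
    struct cpu_arg *a = p;
    int expected = NOCPU;

    if (atomic_compare_exchange_strong(&panic_cpu, &expected, a->id)) {
        /* Winner: models waiting for every other CPU to acknowledge. */
        while (atomic_load(&stopped_count) != a->ncpus - 1)
            sched_yield();
        atomic_store(&release_cpus, 1);
    } else {
        /* Loser: stop voluntarily rather than spin on panic_cpu. */
        model_cpustop_handler();
    }
    return NULL;
}

/* Runs one simulated concurrent panic; returns how many CPUs stopped. */
int simulate_concurrent_panic(int ncpus)
{
    pthread_t threads[MAX_CPUS];
    struct cpu_arg args[MAX_CPUS];
    int i, rc;

    assert(ncpus >= 1 && ncpus <= MAX_CPUS);
    atomic_store(&panic_cpu, NOCPU);
    atomic_store(&stopped_count, 0);
    atomic_store(&release_cpus, 0);

    for (i = 0; i < ncpus; i++) {
        args[i].id = i;
        args[i].ncpus = ncpus;
        rc = pthread_create(&threads[i], NULL, model_panic, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < ncpus; i++)
        pthread_join(threads[i], NULL);

    return atomic_load(&stopped_count);
}
```

Calling simulate_concurrent_panic(4) from a small main() (built with -pthread) should return once the three losing threads have acknowledged. If the losers instead spun waiting for panic_cpu to become NOCPU, the winner's wait loop would never finish, which is the wedge described in the original report.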