On 04/02/2015 05:01 PM, Ingo Molnar wrote:
>
> * Preeti U Murthy <pre...@linux.vnet.ibm.com> wrote:
>
>> On 04/02/2015 04:12 PM, Ingo Molnar wrote:
>>>
>>> * Preeti U Murthy <pre...@linux.vnet.ibm.com> wrote:
>>>
>>>> It was found when doing a hotplug stress test on POWER, that the machine
>>>> either hit softlockups or rcu_sched stall warnings. The issue was
>>>> traced to commit 7cba160ad789a powernv/cpuidle: Redesign idle states
>>>> management, which exposed the cpu down race with hrtimer based broadcast
>>>> mode(Commit 5d1638acb9f6(tick: Introduce hrtimer based broadcast). This
>>>> is explained below.
>>>>
>>>> Assume CPU1 is the CPU which holds the hrtimer broadcasting duty before
>>>> it is taken down.
>>>>
>>>> CPU0                                    CPU1
>>>>
>>>> cpu_down()                              take_cpu_down()
>>>>                                         disable_interrupts()
>>>>
>>>> cpu_die()
>>>>
>>>> while(CPU1 != CPU_DEAD) {
>>>>   msleep(100);
>>>>    switch_to_idle();
>>>>     stop_cpu_timer();
>>>>      schedule_broadcast();
>>>> }
>>>>
>>>> tick_cleanup_cpu_dead()
>>>>         take_over_broadcast()
>>>>
>>>> So after CPU1 disabled interrupts it cannot handle the broadcast hrtimer
>>>> anymore, so CPU0 will be stuck forever.
>>>>
>>>> Fix this by explicitly taking over broadcast duty before cpu_die().
>>>> This is a temporary workaround. What we really want is a callback in the
>>>> clockevent device which allows us to do that from the dying CPU by
>>>> pushing the hrtimer onto a different cpu. That might involve an IPI and
>>>> is definitely more complex than this immediate fix.
>>>
>>> So why not use a suitable CPU_DOWN* notifier for this, instead of open
>>> coding it all into a random place in the hotplug machinery?
>>
>> This is because each of them is unsuitable for a reason:
>>
>> 1. CPU_DOWN_PREPARE stage allows for a fail. The cpu in question may not
>> successfully go down. So we may pull the hrtimer unnecessarily.
>
> Failure is really rare - and as long as things will continue to work
> afterwards it's not a problem to pull the hrtimer to this CPU.

Right?
We will need to move this function to the clockevents_notify() call under
CPU_DOWN_PREPARE. But I see that Tglx wanted to get rid of the
clockevents_notify() function altogether, because it is more of a multiplex
call and less of a notification mechanism.

Regards
Preeti U Murthy
>
> Thanks,
>
>       Ingo
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev