On Mon, Mar 28, 2016 at 03:07:36PM +0000, Mathieu Desnoyers wrote:
> ----- On Mar 28, 2016, at 9:29 AM, Paul E. McKenney
> paul...@linux.vnet.ibm.com wrote:
>
> > On Mon, Mar 28, 2016 at 08:28:51AM +0200, Peter Zijlstra wrote:
> >> On Sun, Mar 27, 2016 at 02:09:14PM -0700, Paul E. McKenney wrote:
> >>
> >> > > Does that system have MONITOR/MWAIT errata?
> >> >
> >> > On the off-chance that this question was also directed at me,
> >>
> >> Hehe, it wasn't, however, since we're here..
> >>
> >> > here is
> >> > what I am running on.  I am running in a qemu/KVM virtual machine, in
> >> > case that matters.
> >>
> >> Have you actually tried on real proper hardware? Does it still reproduce
> >> there?
> >
> > Ross has, but I have not, given that I have a shared system on the one
> > hand and a single-socket (four core, eight hardware thread) laptop on
> > the other that has even longer reproduction times.  The repeat-by is
> > as follows:
> >
> > o  Build a kernel with the following Kconfigs:
> >
> >    CONFIG_SMP=y
> >    CONFIG_NR_CPUS=16
> >    CONFIG_PREEMPT_NONE=n
> >    CONFIG_PREEMPT_VOLUNTARY=n
> >    CONFIG_PREEMPT=y
> >    # This should result in CONFIG_PREEMPT_RCU=y
> >    CONFIG_HZ_PERIODIC=y
> >    CONFIG_NO_HZ_IDLE=n
> >    CONFIG_NO_HZ_FULL=n
> >    CONFIG_RCU_TRACE=y
> >    CONFIG_HOTPLUG_CPU=y
> >    CONFIG_RCU_FANOUT=2
> >    CONFIG_RCU_FANOUT_LEAF=2
> >    CONFIG_RCU_NOCB_CPU=n
> >    CONFIG_DEBUG_LOCK_ALLOC=n
> >    CONFIG_RCU_BOOST=y
> >    CONFIG_RCU_KTHREAD_PRIO=2
> >    CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
> >    CONFIG_RCU_EXPERT=y
> >    CONFIG_RCU_TORTURE_TEST=y
> >    CONFIG_PRINTK_TIME=y
> >    CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP=y
> >    CONFIG_RCU_TORTURE_TEST_SLOW_INIT=y
> >    CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT=y
> >
> >    If desired, you can instead build with CONFIG_RCU_TORTURE_TEST=m
> >    and modprobe/insmod the module manually.
> >
> > o  Find a two-socket x86 system or larger, with at least 16 CPUs.
> >
> > o  Boot the kernel with the following kernel boot parameters:
> >
> >    rcutorture.onoff_interval=1 rcutorture.onoff_holdoff=30
> >
> >    The onoff_holdoff is only needed for CONFIG_RCU_TORTURE_TEST=y.
> >    When manually setting up the module, you get the holdoff for
> >    free, courtesy of human timescales.
> >
> > In the absence of instrumentation, I get failures usually within a
> > couple of hours, though sometimes much longer.  With instrumentation,
> > the sky appears to be the limit.  :-/
> >
> > Ross is running on bare metal with no CPU hotplug, so perhaps his setup
> > is of more immediate interest.  He is seeing the same symptoms that I am,
> > namely a task being repeatedly awakened without actually coming out of
> > TASK_INTERRUPTIBLE state, let alone running.  As you pointed out earlier,
> > he cannot be seeing the same bug that my crude patch suppresses, but
> > given that I still see a few failures with that crude patch, it is quite
> > possible that there is still a common bug.
>
> With respect to bare metal vs KVM guest, I've reported an issue with
> inaccurate detection of TSC as being an unreliable time source on a
> KVM guest. The basic setup is to overcommit the CPU use across the
> entire host, thus leading to preemption of the guest. The guest TSC
> watchdog then falsely assumes that TSC is unreliable, because it gets
> preempted for a long time (e.g. 0.5 second) between reading the HPET
> and the TSC.
>
> Ref. http://lkml.iu.edu/hypermail/linux/kernel/1509.1/00379.html
>
> I'm wondering if what Paul is observing in the KVM setup might be
> caused by long preemption by the host. One way to stress test this
> is to run parallel kernel builds on the host (or in another guest)
> while the guest is running, thus over-committing the CPU use.
>
> Thoughts?
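For illustration only, the false positive described above can be modelled with a toy simulation. This is a sketch under stated assumptions, not the kernel's watchdog code: the names (watchdog_flags_unstable, simulate), the 0.5 s check interval, and the 62.5 ms skew threshold are all hypothetical values picked for the example. Both simulated clocks are perfect, so the only source of apparent skew is the time the vCPU spends preempted between the HPET read and the TSC read.

```python
# Toy model of a clocksource watchdog check. All names and constants are
# assumptions made for this illustration; they are not taken from the kernel.
INTERVAL_NS = 500_000_000   # assumed time between watchdog checks (0.5 s)
THRESHOLD_NS = 62_500_000   # assumed maximum tolerated skew (62.5 ms)


def watchdog_flags_unstable(prev_hpet, prev_tsc, hpet_now, tsc_now,
                            threshold_ns=THRESHOLD_NS):
    """Return True if the two clocks appear to have drifted apart."""
    hpet_delta = hpet_now - prev_hpet
    tsc_delta = tsc_now - prev_tsc
    return abs(tsc_delta - hpet_delta) > threshold_ns


def simulate(preempt_ns):
    """Run one check in which the vCPU is preempted between the two reads.

    Both clocks track true time exactly, so any skew the check reports is
    an artifact of the preemption, not a fault in either clock.
    """
    true_time = 0
    prev_hpet = true_time        # previous check: both reads back to back
    prev_tsc = true_time

    true_time += INTERVAL_NS     # next check fires one interval later
    hpet_now = true_time         # read the HPET first
    true_time += preempt_ns      # host preempts the vCPU here
    tsc_now = true_time          # TSC is only read after the vCPU resumes

    return watchdog_flags_unstable(prev_hpet, prev_tsc, hpet_now, tsc_now)


if __name__ == "__main__":
    for ms in (0, 10, 500):
        print(f"{ms} ms preemption: flagged={simulate(ms * 1_000_000)}")
```

In this model a 10 ms preemption stays under the assumed threshold, while a 0.5 s preemption makes a perfectly good TSC look unstable.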

If I run NO_HZ_FULL, I do get warnings about unstable timesources.

And certainly guest VCPUs can be preempted.  However, if they were
preempted for the lengths of time I am seeing, I should also see
softlockup warnings on the host, which I do not see.

That said, perhaps I should cobble together something to force short
repeated preemptions at the host level.  Maybe that would get the
reproduction rate sufficiently high to enable less-dainty debugging.

							Thanx, Paul
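
One possible shape for the "short repeated preemptions at the host level" idea is sketched below. It is not something posted in this thread. It assumes a Linux host with Python 3, and the function names (burst_loop, hog, run_hogs) and default timings are hypothetical. Each worker pins itself to one CPU and alternates a short busy-spin with a short sleep, so that it competes with whatever vCPU threads are scheduled on that CPU. Whether this actually preempts the guest depends on the host scheduler, task priorities, and vCPU pinning.

```python
import multiprocessing as mp
import os
import time


def burst_loop(burst_s, idle_s, duration_s):
    """Alternate busy-spinning and sleeping; return the number of bursts."""
    deadline = time.monotonic() + duration_s
    bursts = 0
    while time.monotonic() < deadline:
        end_burst = time.monotonic() + burst_s
        while time.monotonic() < end_burst:
            pass                      # burn CPU for one short burst
        bursts += 1
        time.sleep(idle_s)            # then yield the CPU for a while
    return bursts


def hog(cpu, burst_s, idle_s, duration_s):
    """Pin this process to one CPU (where supported) and run the burst loop."""
    if hasattr(os, "sched_setaffinity"):   # Linux-only call
        os.sched_setaffinity(0, {cpu})
    burst_loop(burst_s, idle_s, duration_s)


def run_hogs(cpus, burst_s=0.05, idle_s=0.05, duration_s=60.0):
    """Start one hog per listed CPU and wait for all of them to finish."""
    procs = [mp.Process(target=hog, args=(cpu, burst_s, idle_s, duration_s))
             for cpu in cpus]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
```

On a Linux host, something like run_hogs(sorted(os.sched_getaffinity(0))) would start one hog on every CPU the calling process is allowed to use. The burst and idle times are the knobs to adjust if the preemptions turn out too short or too long to matter.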