On Wed, Aug 13, 2014 at 11:14:39AM +0530, Amit Shah wrote:
> On (Tue) 12 Aug 2014 [14:41:51], Paul E. McKenney wrote:
> > On Tue, Aug 12, 2014 at 02:39:36PM -0700, Paul E. McKenney wrote:
> > > On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> > >
> > > [ . . . ]
> > >
> > > > > I know of only virtio-console doing this (via userspace only,
> > > > > though).
> > > >
> > > > As in userspace within the guest?  That would not work.  The userspace
> > > > that the qemu is running in might.  There is a way to extract ftrace info
> > > > from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> > > > pull the buffer from the resulting dump.  For all I know, there might also
> > > > be some script that uses the qemu "x" command to get at the ftrace buffer.
> > > >
> > > > Again, I cannot reproduce this, and I have been through the code several
> > > > times over the past few days, and am not seeing it.  I could start
> > > > sending you random diagnostic patches, but it would be much better if
> > > > we could get the trace data from the failure.
>
> I think the only recourse I now have is to dump the guest state from
> qemu, and attempt to find the ftrace buffers by poking pages and
> finding some ftrace-like struct... and then dumping the buffers.

The data exists in the qemu guest state, so it would be good to have
it one way or another.  My current (perhaps self-serving) guess is that
you have come up with a way to trick qemu into dropping IPIs.

> > > Hearing no objections, random patch #1.  The compiler could in theory
> > > cause trouble without this patch, so there is some possibility that
> > > it is a fix.
> >
> > #2...  This would have been a problem without the earlier patch, but
> > who knows?  (#1 moved from theoretically possible but not on x86 to
> > maybe on x86 given a sufficiently malevolent compiler with the
> > patch that you located with bisection.)
>
> I tried all 3 patches individually, and all 3 together, no success.

I am not at all surprised.  You would have to have an extremely malevolent
compiler for two of them to have any effect, and you would have to have
someone invoking call_rcu() with irqs disabled from idle for the other
to have any effect.  Which is why I missed seeing them the first three
times I reviewed this code over the past few days.

> My gcc is gcc-4.8.3-1.fc20.x86_64.  I'm using a fairly uptodate Fedora
> 20 system on my laptop for these tests.
>
> Curiously, patches 1 and 3 applied fine, but this one had a conflict.
>
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 1dc72f523c4a..1da605740e8d 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2137,6 +2137,17 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp,
>
> I have this hunk at line 2161, and...
>
> >  			trace_rcu_callback(rdp->rsp->name, rhp,
> >  					   -atomic_long_read(&rdp->nocb_q_count_lazy),
> >  					   -atomic_long_read(&rdp->nocb_q_count));
> > +
> > +	/*
> > +	 * If called from an extended quiescent state with interrupts
> > +	 * disabled, invoke the RCU core in order to allow the idle-entry
> > +	 * deferred-wakeup check to function.
> > +	 */
> > +	if (irqs_disabled_flags(flags) &&
> > +	    !rcu_is_watching() &&
> > +	    cpu_online(smp_processor_id()))
> > +		invoke_rcu_core();
> > +
> >  	return true;
>
> I have return 1; here.
>
> I'm on linux.git, c8d6637d0497d62093dbba0694c7b3a80b79bfe1.

I am working on top of my -rcu tree, which contains the fix from "1" to
"true" compared to current mainline.  So this will resolve itself, and you
should be OK fixing up conflict in either direction.

							Thanx, Paul
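
For anyone reading along, the check added by the quoted hunk is a
three-way conjunction.  Below is a minimal userspace sketch of that
decision, not kernel code: struct cpu_state and needs_rcu_core_kick()
are hypothetical names, and the three booleans merely stand in for
irqs_disabled_flags(flags), rcu_is_watching(), and
cpu_online(smp_processor_id()).

```c
#include <stdbool.h>

/*
 * Hypothetical snapshot of the three conditions tested by the hunk.
 * In the kernel these come from irqs_disabled_flags(flags),
 * rcu_is_watching(), and cpu_online(smp_processor_id()).
 */
struct cpu_state {
	bool irqs_disabled;	/* caller had interrupts disabled */
	bool rcu_watching;	/* false in an extended quiescent state */
	bool cpu_online;	/* this CPU is online */
};

/*
 * Returns true in exactly the case where the quoted hunk would call
 * invoke_rcu_core(): irqs disabled, RCU not watching, CPU online.
 */
static bool needs_rcu_core_kick(const struct cpu_state *s)
{
	return s->irqs_disabled && !s->rcu_watching && s->cpu_online;
}
```

Only one of the eight input combinations returns true, which fits the
remark above that someone would have to invoke call_rcu() with irqs
disabled from idle for that patch to have any effect.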