On Thu, Aug 03, 2017 at 01:41:11PM +0200, Daniel Lezcano wrote: > On 02/08/2017 15:07, Steven Rostedt wrote: > > On Wed, 2 Aug 2017 14:42:39 +0200 > > Daniel Lezcano <daniel.lezc...@linaro.org> wrote: > > > >> On Tue, Aug 01, 2017 at 08:12:14PM -0400, Steven Rostedt wrote: > >>> On Wed, 2 Aug 2017 00:15:44 +0200 > >>> Daniel Lezcano <daniel.lezc...@linaro.org> wrote: > >>> > >>>> On 02/08/2017 00:04, Paul E. McKenney wrote: > >>>>>> Hi Paul, > >>>>>> > >>>>>> I have been trying to set the function_graph tracer for ftrace and > >>>>>> each time I > >>>>>> get a CPU stall. > >>>>>> > >>>>>> How to reproduce: > >>>>>> ----------------- > >>>>>> > >>>>>> echo function_graph > > >>>>>> /sys/kernel/debug/tracing/current_tracer > >>>>>> > >>>>>> This error appears with v4.13-rc3 and v4.12-rc6. > >>> > >>> Can you bisect this? It may be due to this commit: > >>> > >>> 0598e4f08 ("ftrace: Add use of synchronize_rcu_tasks() with dynamic > >>> trampolines") > >> > >> Hi Steve, > >> > >> I git bisected but each time the issue occured. I went through the > >> different > >> version down to v4.4 where the board was not fully supported and it ended > >> up to > >> have the same issue. > >> > >> Finally, I had the intuition it could be related to the wall time (there > >> is no > >> RTC clock with battery on the board and the wall time is Jan 1st, 1970). > >> > >> Setting up the with ntpdate solved the problem. > > Actually, it did not solve the problem. The function_graph trace is set, > I can use the system but after awhile (no tracing enabled at anytime), > the stall appears. > > >> Even if it is rarely the case to have the time not set, is it normal to > >> have a > >> RCU cpu stall ? > >> > >> > > > > BTW, function_graph tracer is the most invasive of the tracers. It's 4x > > slower than function tracer. I'm wondering if the tracer isn't the > > cause, but just slows things down enough to cause a some other race > > condition that triggers the bug. > > Yes, that could be true. > > I tried the following scenario: > > - cpufreq governor => userspace + max_freq (1.2GHz) > - function_graph set ==> OK > > - cpufreq governor => userspace + min_freq (200MHz) > - function_graph set ==> RCU stall > > Beside that, I realize the board is constantly processing SOF interrupts > every 124us, so that adds more overhead. > > Removing the USB support, thus the associated processing for the SOF > interrupts, I don't see anymore the RCU stall.
Looks like Steve called this one! ;-) > Is it the expected behavior to have the system hang after a RCU stall > raises ? No, but if NMI stack traces are enabled and there are any NMI problems, bad things can happen. In addition, the bulk of output can cause problems if you have a slow console connection. Thanx, Paul