On Thu, Aug 03, 2017 at 01:41:11PM +0200, Daniel Lezcano wrote:
> On 02/08/2017 15:07, Steven Rostedt wrote:
> > On Wed, 2 Aug 2017 14:42:39 +0200
> > Daniel Lezcano <daniel.lezc...@linaro.org> wrote:
> > 
> >> On Tue, Aug 01, 2017 at 08:12:14PM -0400, Steven Rostedt wrote:
> >>> On Wed, 2 Aug 2017 00:15:44 +0200
> >>> Daniel Lezcano <daniel.lezc...@linaro.org> wrote:
> >>>   
> >>>> On 02/08/2017 00:04, Paul E. McKenney wrote:  
> >>>>>> Hi Paul,
> >>>>>>
> >>>>>> I have been trying to set the function_graph tracer for ftrace and each
> >>>>>> time I get a CPU stall.
> >>>>>>
> >>>>>> How to reproduce:
> >>>>>> -----------------
> >>>>>>
> >>>>>>     echo function_graph > /sys/kernel/debug/tracing/current_tracer
> >>>>>>
> >>>>>> This error appears with v4.13-rc3 and v4.12-rc6.  
> >>>
> >>> Can you bisect this? It may be due to this commit:
> >>>
> >>> 0598e4f08 ("ftrace: Add use of synchronize_rcu_tasks() with dynamic trampolines")
> >>
> >> Hi Steve,
> >>
> >> I ran git bisect, but the issue occurred each time. I went through the
> >> different versions down to v4.4, where the board was not fully supported,
> >> and it ended up having the same issue.
> >>
> >> Finally, I had the intuition it could be related to the wall time (there is
> >> no battery-backed RTC on the board and the wall time is Jan 1st, 1970).
> >>
> >> Setting the time with ntpdate solved the problem.
> 
> Actually, it did not solve the problem. The function_graph tracer is set,
> I can use the system, but after a while (no tracing enabled at any time),
> the stall appears.
> 
> >> Even if it is rarely the case that the time is not set, is it normal to
> >> have an RCU CPU stall?
> >>
> >>
> > 
> > BTW, function_graph tracer is the most invasive of the tracers. It's 4x
> > slower than function tracer. I'm wondering if the tracer isn't the
> > cause, but just slows things down enough to cause some other race
> > condition that triggers the bug.
> 
> Yes, that could be true.
> 
> I tried the following scenario:
> 
>  - cpufreq governor => userspace + max_freq (1.2GHz)
>    - function_graph set ==> OK
> 
>  - cpufreq governor => userspace + min_freq (200MHz)
>    - function_graph set ==> RCU stall
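
For anyone wanting to repeat the two cases above, here is a minimal sketch
of pinning the frequency through the cpufreq sysfs interface. The helper
name pin_cpufreq is made up for illustration, and the sketch assumes the
userspace governor is built into the kernel and that the policy directory
is the usual /sys/devices/system/cpu/cpu0/cpufreq, which may differ on
this board.

```shell
# Pin one cpufreq policy to its hardware minimum or maximum frequency.
# pin_cpufreq is a hypothetical helper, not an existing tool.
#   $1: cpufreq policy directory (assumed layout, see the example below)
#   $2: "min" or "max"
pin_cpufreq() {
    dir=$1
    which=$2
    # The userspace governor lets the frequency be chosen by hand.
    echo userspace > "$dir/scaling_governor" || return 1
    # cpuinfo_min_freq and cpuinfo_max_freq hold the hardware limits in kHz.
    cat "$dir/cpuinfo_${which}_freq" > "$dir/scaling_setspeed"
}

# Example, as root:
#   pin_cpufreq /sys/devices/system/cpu/cpu0/cpufreq min
#   echo function_graph > /sys/kernel/debug/tracing/current_tracer
```
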
> 
> Besides that, I realized the board is constantly processing SOF interrupts
> every 124us, so that adds more overhead.
> 
> After removing the USB support, and thus the associated processing for the
> SOF interrupts, I no longer see the RCU stall.

Looks like Steve called this one!  ;-)
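
As a rough back-of-the-envelope check, using only the numbers quoted above
(200MHz, 1.2GHz, one SOF interrupt every 124us), the cycle budget between
two interrupts can be sketched as below. The function name is made up for
illustration, and the result says nothing about how many cycles the
interrupt handler or the tracer actually consume, which nobody measured
in this thread.

```shell
# CPU cycles available between two consecutive interrupts.
# cycles_per_period is a hypothetical helper for illustration only.
#   $1: CPU frequency in MHz
#   $2: interrupt period in microseconds
# 1 MHz is one cycle per microsecond, so MHz * us gives cycles.
cycles_per_period() {
    echo $(( $1 * $2 ))
}

cycles_per_period 200 124     # min_freq case: prints 24800
cycles_per_period 1200 124    # max_freq case: prints 148800
```

So at min_freq there are six times fewer cycles per SOF period to absorb
the function_graph overhead, which is at least consistent with the stall
showing up only in that case.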

> Is it the expected behavior for the system to hang after an RCU stall
> is raised?

No, but if NMI stack traces are enabled and there are any NMI problems,
bad things can happen.  In addition, the bulk of output can cause problems
if you have a slow console connection.

                                                        Thanx, Paul
