Adding Rafael

On Fri, 25 Jan 2019 at 15:48, Vincent Guittot
<vincent.guit...@linaro.org> wrote:
>
> On Fri, 25 Jan 2019 at 15:45, Biju Das <biju....@bp.renesas.com> wrote:
> >
> > Hi Vincent,
> >
> > The below patch fixed deadlock.
> >
> > --- a/include/linux/pm_runtime.h
> > +++ b/include/linux/pm_runtime.h
> > @@ -105,7 +105,7 @@ static inline bool pm_runtime_callbacks_present(struct 
> > device *dev)
> >
> >  static inline void pm_runtime_mark_last_busy(struct device *dev)
> >  {
> > -       WRITE_ONCE(dev->power.last_busy, ktime_to_ns(ktime_get()));
> > +       WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
>
> Ah yes. I have searched in runtime.c but  forgot to looked in pm_runtime.h

Rafael,

When pm-runtime is used for a clocksource, we get a deadlock because
we call pm_runtime_mark_last_busy() when ebaling the clocksource
whereas we are in the write_seqcount sequence.
Using ktime_get_mono_fast_ns fixes the problem because it doesn't use
tk_core.seq

Regards,
Vincent

>
> >  }
> >
> > Regards,
> > Biju
> > > -----Original Message-----
> > > From: Biju Das
> > > Sent: 25 January 2019 14:26
> > > To: 'Vincent Guittot' <vincent.guit...@linaro.org>
> > > Cc: Geert Uytterhoeven <ge...@linux-m68k.org>; Sergei Shtylyov
> > > <sergei.shtyl...@cogentembedded.com>; Rob Herring
> > > <robh...@kernel.org>; Mark Rutland <mark.rutl...@arm.com>; Simon
> > > Horman <ho...@verge.net.au>; Magnus Damm
> > > <magnus.d...@gmail.com>; Linux-Renesas <linux-renesas-
> > > s...@vger.kernel.org>; open list:OPEN FIRMWARE AND FLATTENED DEVICE
> > > TREE BINDINGS <devicet...@vger.kernel.org>; Geert Uytterhoeven
> > > <geert+rene...@glider.be>; Chris Paterson
> > > <chris.paters...@renesas.com>; Daniel Lezcano
> > > <daniel.lezc...@linaro.org>; Thomas Gleixner <t...@linutronix.de>; John
> > > Stultz <john.stu...@linaro.org>; Fabrizio Castro
> > > <fabrizio.cas...@bp.renesas.com>; Samuel Holland <sam...@sholland.org>
> > > Subject: RE: [PATCH] arm64: dts: renesas: r8a7796: Add CMT device nodes
> > >
> > > Hi Vincent,
> > >
> > > Thanks for the update. I am still seeing the same deadlock.
> > >
> > > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > > index fb5e2b6..a96641a 100644
> > > --- a/drivers/base/power/runtime.c
> > > +++ b/drivers/base/power/runtime.c
> > > @@ -155,7 +155,7 @@ u64 pm_runtime_autosuspend_expiration(struct
> > > device *dev)
> > >
> > >         expires  = READ_ONCE(dev->power.last_busy);
> > >         expires += (u64)autosuspend_delay * NSEC_PER_MSEC;
> > > -       if (expires > ktime_to_ns(ktime_get()))
> > > +       if (expires > ktime_get_mono_fast_ns())
> > >                 return expires; /* Expires in the future */
> > >
> > >         return 0;
> > > @@ -921,7 +921,7 @@ static enum hrtimer_restart
> > > pm_suspend_timer_fn(struct hrtimer *timer)
> > >          * If 'expires' is after the current time, we've been called
> > >          * too early.
> > >          */
> > > -       if (expires > 0 && expires < ktime_to_ns(ktime_get())) {
> > > +       if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
> > >                 dev->power.timer_expires = 0;
> > >                 rpm_suspend(dev, dev->power.timer_autosuspends ?
> > >                     (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC); @@ -940,7 +940,7
> > > @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer
> > > *timer)  int pm_schedule_suspend(struct device *dev, unsigned int delay)  
> > > {
> > >         unsigned long flags;
> > > -       ktime_t expires;
> > > +       u64 expires;
> > >         int retval;
> > >
> > >         spin_lock_irqsave(&dev->power.lock, flags); @@ -957,8 +957,8 @@ 
> > > int
> > > pm_schedule_suspend(struct device *dev, unsigned int delay)
> > >         /* Other scheduled or pending requests need to be canceled. */
> > >         pm_runtime_cancel_pending(dev);
> > >
> > > -       expires = ktime_add(ktime_get(), ms_to_ktime(delay));
> > > -       dev->power.timer_expires = ktime_to_ns(expires);
> > > +       expires = ktime_get_mono_fast_ns() + ((u64)delay *
> > > NSEC_PER_MSEC);
> > > +       dev->power.timer_expires = expires;
> > >         dev->power.timer_autosuspends = 0;
> > >         hrtimer_start(&dev->power.suspend_timer, expires,
> > > HRTIMER_MODE_ABS);
> > >
> > >
> > > please find the logs.
> > >
> > > root@salvator-x:~# echo  e60f0000.timer >
> > > /sys/devices/system/clocksource/clocksource0/current_clocksource
> > > [   42.769260]
> > > [   42.770760] ============================================
> > > [   42.776068] WARNING: possible recursive locking detected
> > > [   42.781378] 5.0.0-rc3-next-20190125-00001-gfa6709c-dirty #33 Not 
> > > tainted
> > > [   42.788076] --------------------------------------------
> > > [   42.793385] migration/0/11 is trying to acquire lock:
> > > [   42.798435] 00000000b43b7183 (tk_core.seq){----}, at:
> > > rpm_resume+0x5f0/0x7a8
> > > [   42.805497]
> > > [   42.805497] but task is already holding lock:
> > > [   42.811326] 00000000b43b7183 (tk_core.seq){----}, at:
> > > multi_cpu_stop+0x8c/0x140
> > > [   42.818641]
> > > [   42.818641] other info that might help us debug this:
> > > [   42.825165]  Possible unsafe locking scenario:
> > > [   42.825165]
> > > [   42.831080]        CPU0
> > > [   42.833521]        ----
> > > [   42.835962]   lock(tk_core.seq);
> > > [   42.839185]   lock(tk_core.seq);
> > > [   42.842409]
> > > [   42.842409]  *** DEADLOCK ***
> > > [   42.842409]
> > > [   42.848328]  May be due to missing lock nesting notation
> > > [   42.848328]
> > > [   42.855115] 4 locks held by migration/0/11:
> > > [   42.859293]  #0: 00000000a7511731 (timekeeper_lock){-.-.}, at:
> > > change_clocksource+0x2c/0x118
> > > [   42.867738]  #1: 00000000b43b7183 (tk_core.seq){----}, at:
> > > multi_cpu_stop+0x8c/0x140
> > > [   42.875484]  #2: 00000000b54c38a2 (&ch->lock){....}, at:
> > > sh_cmt_start+0x28/0x200
> > > [   42.882887]  #3: 0000000010d14eec (&(&dev->power.lock)->rlock){-...}, 
> > > at:
> > > __rpm_callback+0xbc/0x1e8
> > > [   42.891937]
> > > [   42.891937] stack backtrace:
> > > [   42.896294] CPU: 0 PID: 11 Comm: migration/0 Not tainted 
> > > 5.0.0-rc3-next-
> > > 20190125-00001-gfa6709c-dirty #33
> > > [   42.905860] Hardware name: Renesas Salvator-X 2nd version board based
> > > on r8a7796 (DT)
> > > [   42.913689] Call trace:
> > > [   42.916137]  dump_backtrace+0x0/0x178
> > > [   42.919796]  show_stack+0x14/0x20
> > > [   42.923111]  dump_stack+0xb0/0xec
> > > [   42.926425]  __lock_acquire+0xfb4/0x1c08
> > > [   42.930345]  lock_acquire+0xd0/0x268
> > > [   42.933917]  ktime_get+0x5c/0x108
> > > [   42.937229]  rpm_resume+0x5f0/0x7a8
> > > [   42.940714]  __pm_runtime_resume+0x50/0x98
> > > [   42.944807]  sh_cmt_start+0x84/0x200
> > > [   42.948380]  sh_cmt_clocksource_enable+0x28/0x48
> > > [   42.952995]  change_clocksource+0x84/0x118
> > > [   42.957087]  multi_cpu_stop+0x8c/0x140
> > > [   42.960834]  cpu_stopper_thread+0xac/0x120
> > > [   42.964931]  smpboot_thread_fn+0x1ac/0x2c8
> > > [   42.969024]  kthread+0x12c/0x130
> > > [   42.972249]  ret_from_fork+0x10/0x18
> > >
> > > Regards,
> > > Biju
> > >
> > > > -----Original Message-----
> > > > From: Vincent Guittot <vincent.guit...@linaro.org>
> > > > Sent: 25 January 2019 14:06
> > > > Subject: Re: [PATCH] arm64: dts: renesas: r8a7796: Add CMT device
> > > > nodes
> > > >
> > > > Hi Biju,
> > > >
> > > > It seems that the call of mark_last_busy in rpm_resume raises this
> > > problem.
> > > > We have the sequence:
> > > > change_clocksource
> > > >
> > > >     write_seqcount_begin
> > > >     ...
> > > >     timekeeping_update
> > > >         ...
> > > >         sh_cmt_clocksource_enable
> > > >             ...
> > > >             rpm_resume
> > > >                 pm_runtime_mark_last_busy
> > > >                     ktime_get
> > > >                         do
> > > >                             read_seqcount_begin
> > > >                         while read_seqcount_retry
> > > >     ....
> > > >     write_seqcount_end
> > > >
> > > > In fact, we should be safe because we haven't yet changed the
> > > > clocksource as we are enabling the clocksource that will be used for the
> > > switch.
> > > >
> > > > Can you try the patch below ?
> > > > It uses ktime_get_mono_fast_ns instead which is lock safe for such
> > > > case
> > > >
> > > > ---
> > > >  drivers/base/power/runtime.c | 10 +++++-----
> > > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/base/power/runtime.c
> > > > b/drivers/base/power/runtime.c index 457be03..708a13f 100644
> > > > --- a/drivers/base/power/runtime.c
> > > > +++ b/drivers/base/power/runtime.c
> > > > @@ -130,7 +130,7 @@ u64 pm_runtime_autosuspend_expiration(struct
> > > > device *dev)  {
> > > >  int autosuspend_delay;
> > > >  u64 last_busy, expires = 0;
> > > > -u64 now = ktime_to_ns(ktime_get());
> > > > +u64 now = ktime_get_mono_fast_ns();
> > > >
> > > >  if (!dev->power.use_autosuspend)
> > > >  goto out;
> > > > @@ -909,7 +909,7 @@ static enum hrtimer_restart
> > > > pm_suspend_timer_fn(struct hrtimer *timer)
> > > >   * If 'expires' is after the current time, we've been called
> > > >   * too early.
> > > >   */
> > > > -if (expires > 0 && expires < ktime_to_ns(ktime_get())) {
> > > > +if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
> > > >  dev->power.timer_expires = 0;
> > > >  rpm_suspend(dev, dev->power.timer_autosuspends ?
> > > >      (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC); @@ -928,7
> > > > +928,7 @@ static enum hrtimer_restart  pm_suspend_timer_fn(struct
> > > > hrtimer *timer)  int pm_schedule_suspend(struct device *dev, unsigned
> > > > int
> > > > delay)  {
> > > >  unsigned long flags;
> > > > -ktime_t expires;
> > > > +u64 expires;
> > > >  int retval;
> > > >
> > > >  spin_lock_irqsave(&dev->power.lock, flags); @@ -945,8 +945,8 @@
> > > int
> > > > pm_schedule_suspend(struct device *dev, unsigned int delay)
> > > >  /* Other scheduled or pending requests need to be canceled. */
> > > >  pm_runtime_cancel_pending(dev);
> > > >
> > > > -expires = ktime_add(ktime_get(), ms_to_ktime(delay));
> > > > -dev->power.timer_expires = ktime_to_ns(expires);
> > > > +expires = ktime_get_mono_fast_ns() + (u64)delay *
> > > > NSEC_PER_MSEC);
> > > > +dev->power.timer_expires = expires;
> > > >  dev->power.timer_autosuspends = 0;
> > > >  hrtimer_start(&dev->power.suspend_timer, expires,
> > > HRTIMER_MODE_ABS);
> > > >
> > > > --
> > > > 2.7.4
> > > >
> > > > Le Friday 25 Jan 2019 à 12:26:54 (+0000), Biju Das a écrit :
> > > > > Hi All,
> > > > >
> > > > > I have enabled lock debugging  and it is showing "possible recursive
> > > > > locking
> > > > detected "with the patch ("8234f6734c5d74ac794e5517437f51c57d65f865"
> > > > PM-runtime: Switch autosuspend over to using hrtimers)
> > > > >
> > > > > root@salvator-x:~# echo  e60f0000.timer >
> > > > /sys/devices/system/clocksource/clocksource0/current_clocksource
> > > > > [   67.995854]
> > > > > [   67.997354] ============================================
> > > > > [   68.002663] WARNING: possible recursive locking detected
> > > > > [   68.007974] 5.0.0-rc3-next-20190125-00005-g382822b #31 Not tainted
> > > > > [   68.014150] --------------------------------------------
> > > > > [   68.019459] migration/0/11 is trying to acquire lock:
> > > > > [   68.024509] 0000000065de5fdf (tk_core.seq){----}, at:
> > > > rpm_resume+0x5f0/0x7a8
> > > > > [   68.031570]
> > > > > [   68.031570] but task is already holding lock:
> > > > > [   68.037399] 0000000065de5fdf (tk_core.seq){----}, at:
> > > > multi_cpu_stop+0x8c/0x140
> > > > > [   68.044714]
> > > > > [   68.044714] other info that might help us debug this:
> > > > > [   68.051237]  Possible unsafe locking scenario:
> > > > > [   68.051237]
> > > > > [   68.057153]        CPU0
> > > > > [   68.059594]        ----
> > > > > [   68.062034]   lock(tk_core.seq);
> > > > > [   68.065258]   lock(tk_core.seq);
> > > > > [   68.068482]
> > > > > [   68.068482]  *** DEADLOCK ***
> > > > > [   68.068482]
> > > > > [   68.074402]  May be due to missing lock nesting notation
> > > > > [   68.074402]
> > > > > [   68.081188] 4 locks held by migration/0/11:
> > > > > [   68.085366]  #0: 00000000463b6446 (timekeeper_lock){-.-.}, at:
> > > > change_clocksource+0x2c/0x118
> > > > > [   68.093812]  #1: 0000000065de5fdf (tk_core.seq){----}, at:
> > > > multi_cpu_stop+0x8c/0x140
> > > > > [   68.101558]  #2: 0000000098b45550 (&ch->lock){....}, at:
> > > > sh_cmt_start+0x28/0x200
> > > > > [   68.108961]  #3: 00000000da0f4e80 
> > > > > (&(&dev->power.lock)->rlock){-...},
> > > at:
> > > > __rpm_callback+0xbc/0x1e8
> > > > > [   68.118011]
> > > > > [   68.118011] stack backtrace:
> > > > > [   68.122368] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.0.0-rc3-
> > > next-
> > > > 20190125-00005-g382822b #31
> > > > > [   68.131412] Hardware name: Renesas Salvator-X 2nd version board
> > > based
> > > > on r8a7796 (DT)
> > > > > [   68.139240] Call trace:
> > > > > [   68.141687]  dump_backtrace+0x0/0x178
> > > > > [   68.145346]  show_stack+0x14/0x20
> > > > > [   68.148661]  dump_stack+0xb0/0xec
> > > > > [   68.151975]  __lock_acquire+0xfb4/0x1c08
> > > > > [   68.155894]  lock_acquire+0xd0/0x268
> > > > > [   68.159467]  ktime_get+0x5c/0x108
> > > > > [   68.162779]  rpm_resume+0x5f0/0x7a8
> > > > > [   68.166265]  __pm_runtime_resume+0x50/0x98
> > > > > [   68.170358]  sh_cmt_start+0x84/0x200
> > > > > [   68.173931]  sh_cmt_clocksource_enable+0x28/0x48
> > > > > [   68.178546]  change_clocksource+0x84/0x118
> > > > > [   68.182639]  multi_cpu_stop+0x8c/0x140
> > > > > [   68.186385]  cpu_stopper_thread+0xac/0x120
> > > > > [   68.190481]  smpboot_thread_fn+0x1ac/0x2c8
> > > > > [   68.194574]  kthread+0x12c/0x130
> > > > > [   68.197800]  ret_from_fork+0x10/0x18
> > > > >
> > > > > Regards,
> > > > > Biju
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Biju Das
> > > > > > Sent: 25 January 2019 11:27
> > > > > > To: 'Geert Uytterhoeven' <ge...@linux-m68k.org>; Vincent Guittot
> > > > > > <vincent.guit...@linaro.org>; Sergei Shtylyov
> > > > > > <sergei.shtyl...@cogentembedded.com>
> > > > > > Cc: Rob Herring <robh...@kernel.org>; Mark Rutland
> > > > > > <mark.rutl...@arm.com>; Simon Horman <ho...@verge.net.au>;
> > > > Magnus
> > > > > > Damm <magnus.d...@gmail.com>; Linux-Renesas <linux-renesas-
> > > > > > s...@vger.kernel.org>; open list:OPEN FIRMWARE AND FLATTENED
> > > > DEVICE
> > > > > > TREE BINDINGS <devicet...@vger.kernel.org>; Geert Uytterhoeven
> > > > > > <geert+rene...@glider.be>; Chris Paterson
> > > > > > <chris.paters...@renesas.com>; Daniel Lezcano
> > > > > > <daniel.lezc...@linaro.org>; Thomas Gleixner <t...@linutronix.de>;
> > > > > > John Stultz <john.stu...@linaro.org>; Fabrizio Castro
> > > > > > <fabrizio.cas...@bp.renesas.com>; Samuel Holland
> > > > > > <sam...@sholland.org>
> > > > > > Subject: RE: [PATCH] arm64: dts: renesas: r8a7796: Add CMT device
> > > > > > nodes
> > > > > >
> > > > > > Hi Geert,
> > > > > >
> > > > > > Thanks for the feedback. I started testing CMT on latest
> > > > > > kernel(Linux-next-
> > > > > > 20190125 and also renesas-dev) and found that it is broken on
> > > > > > R-Car M3-W device.
> > > > > >
> > > > > > On further investigation the patch ("
> > > > > > 8234f6734c5d74ac794e5517437f51c57d65f865"   PM-runtime: Switch
> > > > > > autosuspend over to using hrtimers)  is causing the issue.
> > > > > >
> > > > > > During clock source switching, It calls the function "sh_cmt_enable"
> > > > > > which calls " pm_runtime_get_sync(&ch->cmt->pdev->dev);"  and
> > > > > > after that console freezes.
> > > > > >
> > > > > > Sergei: Have you noticed this issue  on R-Car V3M and V3H boards
> > > > > > with latest kernel?
> > > > > >
> > > > > > Regards,
> > > > > > Biju
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: linux-renesas-soc-ow...@vger.kernel.org
> > > > > > > <linux-renesas-soc- ow...@vger.kernel.org> On Behalf Of Geert
> > > > > > > Uytterhoeven
> > > > > > > Sent: 24 January 2019 10:16
> > > > > > > To: Biju Das <biju....@bp.renesas.com>
> > > > > > > Cc: Rob Herring <robh...@kernel.org>; Mark Rutland
> > > > > > > <mark.rutl...@arm.com>; Simon Horman <ho...@verge.net.au>;
> > > > > > Magnus Damm
> > > > > > > <magnus.d...@gmail.com>; Linux-Renesas <linux-renesas-
> > > > > > > s...@vger.kernel.org>; open list:OPEN FIRMWARE AND FLATTENED
> > > > DEVICE
> > > > > > > TREE BINDINGS <devicet...@vger.kernel.org>; Geert Uytterhoeven
> > > > > > > <geert+rene...@glider.be>; Chris Paterson
> > > > > > > <chris.paters...@renesas.com>; Daniel Lezcano
> > > > > > > <daniel.lezc...@linaro.org>; Thomas Gleixner
> > > > > > > <t...@linutronix.de>; John Stultz <john.stu...@linaro.org>;
> > > > > > > Fabrizio Castro <fabrizio.cas...@bp.renesas.com>; Samuel Holland
> > > > > > <sam...@sholland.org>
> > > > > > > Subject: Re: [PATCH] arm64: dts: renesas: r8a7796: Add CMT
> > > > > > > device nodes
> > > > > > >
> > > > > > > Hi Biju,
> > > > > > >
> > > > > > > On Fri, Oct 26, 2018 at 10:32 AM Biju Das
> > > > > > > <biju....@bp.renesas.com>
> > > > > > wrote:
> > > > > > > > This patch adds CMT{0|1|2|3} device nodes for r8a7796 SoC.
> > > > > > > >
> > > > > > > > Signed-off-by: Biju Das <biju....@bp.renesas.com>
> > > > > > > > ---
> > > > > > > > This patch is tested against renesas-dev
> > > > > > > >
> > > > > > > > I have executed on inconsistency-check, nanosleep and
> > > > > > > > clocksource_switch selftests on this arm64 SoC. The
> > > > > > > > inconsistency-check and nanosleep tests are working fine.The
> > > > > > > > clocksource_switch asynchronous test is failing due to
> > > > > > > > inconsistency-check
> > > > > > > failure on "arch_sys_counter".
> > > > > > > >
> > > > > > > > But if i skip the clocksource_switching of "arch_sys_counter",
> > > > > > > > the asynchronous test is passing for CMT0/1/2/3 timer.
> > > > > > > >
> > > > > > > > Has any one noticed this issue?
> > > > > > >
> > > > > > > clockevents/next now has commit 7cd6dca3600d8d71
> > > > > > > ("clocksource/drivers/arch_timer: Workaround for Allwinner A64
> > > > > > > timer instability").  Perhaps this is related, and the same test
> > > > > > > program may indicate similar issues?
> > > > > > >
> > > > > > > See also https://lore.kernel.org/lkml/20190113021719.46457-2-
> > > > > > > sam...@sholland.org/
> > > > > > >
> > > > > > > Gr{oetje,eeting}s,
> > > > > > >
> > > > > > >                         Geert
> > > > > > >
> > > > > > > --
> > > > > > > Geert Uytterhoeven -- There's lots of Linux beyond ia32 --
> > > > > > > geert@linux- m68k.org
> > > > > > >
> > > > > > > In personal conversations with technical people, I call myself a
> > > > > > > hacker. But when I'm talking to journalists I just say
> > > > > > > "programmer" or
> > > > > > something like that.
> > > > > > >                                 -- Linus Torvalds
> > > > >
> > > > >
> > > > >
> > > > > Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne
> > > > End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under
> > > > Registered No. 04586709.
> >
> >
> >
> > Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne End, 
> > Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under 
> > Registered No. 04586709.

Reply via email to