On Fri, Oct 16, 2020 at 10:34:21AM +0100, Sudeep Holla wrote:
> On Fri, Oct 16, 2020 at 11:02:02AM +0200, Jerome Brunet wrote:
> >
> > On Fri 16 Oct 2020 at 10:44, Sudeep Holla <sudeep.ho...@arm.com> wrote:
> >
> > > On Thu, Oct 15, 2020 at 03:29:35PM +0100, Ionela Voinescu wrote:
> > >> Hi Jerome,
> > >>
> > >> On Thursday 15 Oct 2020 at 15:58:30 (+0200), Jerome Brunet wrote:
> > >> >
> > >> > On Thu 15 Oct 2020 at 15:46, Ionela Voinescu <ionela.voine...@arm.com> wrote:
> > >> >
> > >> > > Hi guys,
> > >> > >
> > >> > > On Wednesday 23 Sep 2020 at 14:39:16 (+0200), Jerome Brunet wrote:
> > >> > >> If the txdone is done by polling, it is possible for msg_submit() to start
> > >> > >> the timer while txdone_hrtimer() callback is running. If the timer needs
> > >> > >> recheduling, it could already be enqueued by the time hrtimer_forward_now()
> > >> > >> is called, leading hrtimer to loudly complain.
> > >> > >>
> > >> > >> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
> > >> > >> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> > >> > >> Hardware name: Libre Computer AML-S805X-AC (DT)
> > >> > >> Workqueue: events_freezable_power_ thermal_zone_device_check
> > >> > >> pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
> > >> > >> pc : hrtimer_forward+0xc4/0x110
> > >> > >> lr : txdone_hrtimer+0xf8/0x118
> > >> > >> [...]
> > >> > >>
> > >> > >> Canceling the timer before starting it ensure that the timer callback is
> > >> > >> not running when the timer is started, solving this race condition.
> > >> > >>
> > >> > >> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete polling")
> > >> > >> Reported-by: Da Xue <da@libre.computer>
> > >> > >> Signed-off-by: Jerome Brunet <jbru...@baylibre.com>
> > >> > >> ---
> > >> > >>  drivers/mailbox/mailbox.c | 8 ++++++--
> > >> > >>  1 file changed, 6 insertions(+), 2 deletions(-)
> > >> > >>
> > >> > >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> > >> > >> index 0b821a5b2db8..34f9ab01caef 100644
> > >> > >> --- a/drivers/mailbox/mailbox.c
> > >> > >> +++ b/drivers/mailbox/mailbox.c
> > >> > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
> > >> > >>  exit:
> > >> > >>          spin_unlock_irqrestore(&chan->lock, flags);
> > >> > >>
> > >> > >> -        if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> > >> > >> -                /* kick start the timer immediately to avoid delays */
> > >> > >> +        if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> > >> > >> +                /* Disable the timer if already active ... */
> > >> > >> +                hrtimer_cancel(&chan->mbox->poll_hrt);
> > >> > >> +
> > >> > >> +                /* ... and kick start it immediately to avoid delays */
> > >> > >>                  hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> > >> > >> +        }
> > >> > >>  }
> > >> > >>
> > >> > >>  static void tx_tick(struct mbox_chan *chan, int r)
> > >> > >
> > >> > > I've tracked a regression back to this commit. Details to reproduce:
> > >> >
> > >> > Hi Ionela,
> > >> >
> > >> > I don't have access to your platform and I don't get what is going on
> > >> > from the log below.
> > >> >
> > >> > Could you please give us a bit more details about what is going on ?
> > >> >
> > >>
> > >> I'm not familiar with the mailbox subsystem, so the best I can do right
> > >> now is to add Sudeep to Cc, in case this conflicts in some way with the
> > >> ARM MHU patches [1].
> > >>
> > >
> > > Not it can't be doorbell driver as we use SCPI(old firmware) with upstream
> > > MHU driver as is limiting the number of channels to be used.
> > >
> > >> In the meantime I'll get some traces and get more familiar with the
> > >> code.
> > >>
> > >
> > > I will try that too.
> >
> > BTW, this issue was originally reported on amlogic platforms which also
> > use arm,mhu mailbox driver.
> >
>
> OK. Anyway just noticed that hrtimer_cancel uses hrtimer_try_to_cancel
> and hrtimer_cancel_wait_running. The latter is just cpu_relax() if
> PREEMPT_RT=n, so you may still have issue if the hrtimer is still active
> or restarts in the meantime.
>

Scratch that, I failed to see the loop in hrtimer_cancel earlier.

--
Regards,
Sudeep