Re: [PATCH RFC] mailbox: move controller timer to per-channel timers

Alexey Klimov Tue, 30 May 2017 09:30:39 -0700

On Thu, May 25, 2017 at 6:43 PM, Alexey Klimov <[email protected]> wrote:
> Hi Jassi,
> 
> sorry for delay again.
> 
> On Tue, Apr 11, 2017 at 06:30:08PM +0530, Jassi Brar wrote:
> > On 11 April 2017 at 18:04, Alexey Klimov <[email protected]> wrote:
> > > On Fri, Apr 07, 2017 at 08:39:35PM +0530, Jassi Brar wrote:
> > >> On Thu, Apr 6, 2017 at 11:01 PM, Alexey Klimov <[email protected]> 
> > >> wrote:
> > >> > When mailbox controller provides two or more channels and
> > >> > they are actively used by mailbox client(s) it's very easy
> > >> > to trigger the warning in hrtimer_forward():
> > >> >
> > >> > [  247.853060] WARNING: CPU: 6 PID: 0 at kernel/time/hrtimer.c:805 
> > >> > hrtimer_forward+0x88/0xd8
> > >> > [  247.853549] Modules linked in:
> > >> > [  247.853907] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W       
> > >> > 4.11.0-rc2-00362-g93afaa4513bb-dirty #13
> > >> > [  247.854472] Hardware name: linux,dummy-virt (DT)
> > >> > [  247.854699] task: ffff80001d89d780 task.stack: ffff80001d8c4000
> > >> > [  247.854999] PC is at hrtimer_forward+0x88/0xd8
> > >> > [  247.855280] LR is at txdone_hrtimer+0xd4/0xf8
> > >> > [  247.855551] pc : [<ffff0000081039f0>] lr : [<ffff00000881b874>] 
> > >> > pstate: 200001c5
> > >> > [  247.855857] sp : ffff80001efbdeb0
> > >> > [  247.856072] x29: ffff80001efbdeb0 x28: ffff80001efc3140
> > >> > [  247.856358] x27: ffff00000881b7a0 x26: 00000039ac93e8b6
> > >> > [  247.856604] x25: ffff000008e756be x24: ffff80001c4a1348
> > >> > [  247.856882] x23: 0000000000000001 x22: 00000000000000f8
> > >> > [  247.857189] x21: ffff80001c4a1318 x20: ffff80001d327110
> > >> > [  247.857509] x19: 00000000000f4240 x18: 0000000000000030
> > >> > [  247.857808] x17: 0000ffffaecdf370 x16: ffff0000081ccc80
> > >> > [  247.858000] x15: 0000000000000010 x14: 00000000fffffff0
> > >> > [  247.858186] x13: ffff000008f488e0 x12: 000000000002e3eb
> > >> > [  247.858381] x11: ffff000008979690 x10: 0000000000000000
> > >> > [  247.858573] x9 : 0000000000000001 x8 : ffff80001efc66e0
> > >> > [  247.858758] x7 : ffff80001efc6708 x6 : 00000005be7732f2
> > >> > [  247.858943] x5 : 0000000000000001 x4 : ffff80001c4a1348
> > >> > [  247.859130] x3 : 00000039ac94952a x2 : 00000000000f4240
> > >> > [  247.859315] x1 : 00000039ac98243c x0 : 0000000000038f12
> > >> > [  247.859582] ---[ end trace d61812426ec3c30b ]---
> > >> >
> > >> > To fix this current patch migrates hr timers to be per-channel
> > >> > instead of using only one timer per-controller.
> > >> >
> > >> I think we can do by just checking if hrtimer_active() returns false
> > >> before we do hrtimer_start() in msg_submit() ?
> > >
> > > It looks like it can be easily broken:
> > >
> > > 1) let's say first thread executes timer callback and already checked 
> > > last_tx_done
> > > on channel 0;
> > > 2) second thread submits a message to the controller, say, on channel 0 
> > > and with
> > > help of hrtimer_active() observes that the timer is active (because timer 
> > > callback
> > > is running) and decides not to (re-)start timer;
> > >
> > > After this first thread decides not to restart the timer and finishes 
> > > callback.
> > > The thing that first thread executes tx_tick isn't helpful: for example 
> > > first
> > > thread may have no messages to submit on any channel and therefore is not 
> > > going
> > > to deal with timer.
> > >
> > > Finally, mailbox state machine is stalled. Second thread thinks that 
> > > timer is
> > > active while it's not.
> > >
> > ... you mean race :)  and we have locks for that. You want me to send
> > in a patch?
> 
> We don't have separate lock for timer.
> 
> > > One of the main questions is that there is only one timer per few channels
> > > in current code.
> > >
> > I see that as a good thing because
> > a) Polling anyway doesn't provide 'hard' guarantee even if we have one
> > timer per channel
> > b) The poll period remains same for every channel, so functionality
> > wise you only increase timer callbacks.
> 
> Do you mean something like this below?
> 
> The patch isn't really tested on multi-channel environment yet but
> I will test it. I just want to know if I am on the right way here.
> 
> I know there are some adjustments that can be done in the loop in hr-timer
> callback. The thing that I don't like here is a lot of spin_lock/unlocks
> in the timer callback.
 
Hi Jassi,


I was able to come up with new version with only HR-timer spinlock in timer
callback. That one seems a little bit better.

The patch is below:


>From df8264ec3b1db61fdfe0f381fabb6abdecfb3ec1 Mon Sep 17 00:00:00 2001
From: Alexey Klimov <[email protected]>
Date: Thu, 29 May 2017 17:40:11 +0100
Subject: [PATCH RFC] mailbox: fix hrtimer_forward() warning

When mailbox controller provides two or more channels,
timer-based polling is used by mailbox controller
and channels are actively used by mailbox client(s) it's very
easy to trigger the warning in hrtimer_forward():

[  247.853060] WARNING: CPU: 6 PID: 0 at kernel/time/hrtimer.c:805 
hrtimer_forward+0x88/0xd8
[  247.853549] Modules linked in:
[  247.853907] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W       
4.11.0-rc2-00362-g93afaa4513bb-dirty #13
[  247.854472] Hardware name: linux,dummy-virt (DT)
[  247.854699] task: ffff80001d89d780 task.stack: ffff80001d8c4000
[  247.854999] PC is at hrtimer_forward+0x88/0xd8
[  247.855280] LR is at txdone_hrtimer+0xd4/0xf8
[  247.855551] pc : [<ffff0000081039f0>] lr : [<ffff00000881b874>] pstate: 
200001c5
[  247.855857] sp : ffff80001efbdeb0
[  247.856072] x29: ffff80001efbdeb0 x28: ffff80001efc3140
[  247.856358] x27: ffff00000881b7a0 x26: 00000039ac93e8b6
[  247.856604] x25: ffff000008e756be x24: ffff80001c4a1348
[  247.856882] x23: 0000000000000001 x22: 00000000000000f8
[  247.857189] x21: ffff80001c4a1318 x20: ffff80001d327110
[  247.857509] x19: 00000000000f4240 x18: 0000000000000030
[  247.857808] x17: 0000ffffaecdf370 x16: ffff0000081ccc80
[  247.858000] x15: 0000000000000010 x14: 00000000fffffff0
[  247.858186] x13: ffff000008f488e0 x12: 000000000002e3eb
[  247.858381] x11: ffff000008979690 x10: 0000000000000000
[  247.858573] x9 : 0000000000000001 x8 : ffff80001efc66e0
[  247.858758] x7 : ffff80001efc6708 x6 : 00000005be7732f2
[  247.858943] x5 : 0000000000000001 x4 : ffff80001c4a1348
[  247.859130] x3 : 00000039ac94952a x2 : 00000000000f4240
[  247.859315] x1 : 00000039ac98243c x0 : 0000000000038f12
[  247.859582] ---[ end trace d61812426ec3c30b ]---

To fix it this patch introduces one more mbox spinlock for hr-timer and
hrt_active variable in controller structure.
The hr-timer callback restarts timer unless there no active requests
on any channel.

Signed-off-by: Alexey Klimov <[email protected]>
---
 drivers/mailbox/mailbox.c          | 51 ++++++++++++++++++++++++++++++--------
 include/linux/mailbox_controller.h |  2 ++
 2 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 9dfbf7e..57ae422 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -85,9 +85,17 @@ static void msg_submit(struct mbox_chan *chan)
 exit:
        spin_unlock_irqrestore(&chan->lock, flags);
 
-       if (!err && (chan->txdone_method & TXDONE_BY_POLL))
-               /* kick start the timer immediately to avoid delays */
-               hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
+       if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
+
+               spin_lock_irqsave(&chan->mbox->lock_hrt, flags);
+               /* try to start the timer immediately to avoid delays */
+               if (!chan->mbox->hrt_active) {
+                       chan->mbox->hrt_active = 1;
+                       hrtimer_start(&chan->mbox->poll_hrt, 0,
+                                                       HRTIMER_MODE_REL);
+               }
+               spin_unlock_irqrestore(&chan->mbox->lock_hrt, flags);
+       }
 }
 
 static void tx_tick(struct mbox_chan *chan, int r)
@@ -121,23 +129,43 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer 
*hrtimer)
        bool txdone, resched = false;
        int i;
 
+       spin_lock(&mbox->lock_hrt);
+
        for (i = 0; i < mbox->num_chans; i++) {
                struct mbox_chan *chan = &mbox->chans[i];
 
+               /*
+                * Reading of active_req is racy and we may miss an active
+                * request. However, the enqueuing thread will spin on the
+                * HR-timer lock, which we acquired here, and will start the
+                * timer if required.
+                */
                if (chan->active_req && chan->cl) {
-                       txdone = chan->mbox->ops->last_tx_done(chan);
-                       if (txdone)
+                       /*
+                        * If we observe an active request on any channel, then
+                        * we have to re-start the timer; if we see that
+                        * a request has finished, we don't know if tx_tick()
+                        * will schedule the next request or not. Therefore we
+                        * have to re-sched the HR-timer here in all cases.
+                        */
+                       resched = true;
+
+                       txdone = mbox->ops->last_tx_done(chan);
+                       if (txdone) {
+                               spin_unlock(&mbox->lock_hrt);
                                tx_tick(chan, 0);
-                       else
-                               resched = true;
+                               spin_lock(&mbox->lock_hrt);
+                       }
                }
        }
 
-       if (resched) {
+       if (resched)
                hrtimer_forward_now(hrtimer, ms_to_ktime(mbox->txpoll_period));
-               return HRTIMER_RESTART;
-       }
-       return HRTIMER_NORESTART;
+       else
+               mbox->hrt_active = 0;
+       spin_unlock(&mbox->lock_hrt);
+
+       return resched ? HRTIMER_RESTART : HRTIMER_NORESTART;
 }
 
 /**
@@ -462,6 +490,7 @@ int mbox_controller_register(struct mbox_controller *mbox)
                        return -EINVAL;
                }
 
+               spin_lock_init(&mbox->lock_hrt);
                hrtimer_init(&mbox->poll_hrt, CLOCK_MONOTONIC,
                             HRTIMER_MODE_REL);
                mbox->poll_hrt.function = txdone_hrtimer;
diff --git a/include/linux/mailbox_controller.h 
b/include/linux/mailbox_controller.h
index 74deadb..1dd4c47 100644
--- a/include/linux/mailbox_controller.h
+++ b/include/linux/mailbox_controller.h
@@ -83,6 +83,8 @@ struct mbox_controller {
                                      const struct of_phandle_args *sp);
        /* Internal to API */
        struct hrtimer poll_hrt;
+       spinlock_t lock_hrt;
+       unsigned int hrt_active;
        struct list_head node;
 };
 
-- 
1.9.1

Re: [PATCH RFC] mailbox: move controller timer to per-channel timers

Reply via email to