On Thu, May 25, 2017 at 6:43 PM, Alexey Klimov <[email protected]> wrote: > Hi Jassi, > > sorry for delay again. > > On Tue, Apr 11, 2017 at 06:30:08PM +0530, Jassi Brar wrote: > > On 11 April 2017 at 18:04, Alexey Klimov <[email protected]> wrote: > > > On Fri, Apr 07, 2017 at 08:39:35PM +0530, Jassi Brar wrote: > > >> On Thu, Apr 6, 2017 at 11:01 PM, Alexey Klimov <[email protected]> > > >> wrote: > > >> > When mailbox controller provides two or more channels and > > >> > they are actively used by mailbox client(s) it's very easy > > >> > to trigger the warning in hrtimer_forward(): > > >> > > > >> > [ 247.853060] WARNING: CPU: 6 PID: 0 at kernel/time/hrtimer.c:805 > > >> > hrtimer_forward+0x88/0xd8 > > >> > [ 247.853549] Modules linked in: > > >> > [ 247.853907] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G W > > >> > 4.11.0-rc2-00362-g93afaa4513bb-dirty #13 > > >> > [ 247.854472] Hardware name: linux,dummy-virt (DT) > > >> > [ 247.854699] task: ffff80001d89d780 task.stack: ffff80001d8c4000 > > >> > [ 247.854999] PC is at hrtimer_forward+0x88/0xd8 > > >> > [ 247.855280] LR is at txdone_hrtimer+0xd4/0xf8 > > >> > [ 247.855551] pc : [<ffff0000081039f0>] lr : [<ffff00000881b874>] > > >> > pstate: 200001c5 > > >> > [ 247.855857] sp : ffff80001efbdeb0 > > >> > [ 247.856072] x29: ffff80001efbdeb0 x28: ffff80001efc3140 > > >> > [ 247.856358] x27: ffff00000881b7a0 x26: 00000039ac93e8b6 > > >> > [ 247.856604] x25: ffff000008e756be x24: ffff80001c4a1348 > > >> > [ 247.856882] x23: 0000000000000001 x22: 00000000000000f8 > > >> > [ 247.857189] x21: ffff80001c4a1318 x20: ffff80001d327110 > > >> > [ 247.857509] x19: 00000000000f4240 x18: 0000000000000030 > > >> > [ 247.857808] x17: 0000ffffaecdf370 x16: ffff0000081ccc80 > > >> > [ 247.858000] x15: 0000000000000010 x14: 00000000fffffff0 > > >> > [ 247.858186] x13: ffff000008f488e0 x12: 000000000002e3eb > > >> > [ 247.858381] x11: ffff000008979690 x10: 0000000000000000 > > >> > [ 247.858573] x9 : 0000000000000001 x8 : ffff80001efc66e0 > > >> > [ 247.858758] x7 : ffff80001efc6708 x6 : 00000005be7732f2 > > >> > [ 247.858943] x5 : 0000000000000001 x4 : ffff80001c4a1348 > > >> > [ 247.859130] x3 : 00000039ac94952a x2 : 00000000000f4240 > > >> > [ 247.859315] x1 : 00000039ac98243c x0 : 0000000000038f12 > > >> > [ 247.859582] ---[ end trace d61812426ec3c30b ]--- > > >> > > > >> > To fix this current patch migrates hr timers to be per-channel > > >> > instead of using only one timer per-controller. > > >> > > > >> I think we can do by just checking if hrtimer_active() returns false > > >> before we do hrtimer_start() in msg_submit() ? > > > > > > It looks like it can be easily broken: > > > > > > 1) let's say first thread executes timer callback and already checked > > > last_tx_done > > > on channel 0; > > > 2) second thread submits a message to the controller, say, on channel 0 > > > and with > > > help of hrtimer_active() observes that the timer is active (because timer > > > callback > > > is running) and decides not to (re-)start timer; > > > > > > After this first thread decides not to restart the timer and finishes > > > callback. > > > The thing that first thread executes tx_tick isn't helpful: for example > > > first > > > thread may have no messages to submit on any channel and therefore is not > > > going > > > to deal with timer. > > > > > > Finally, mailbox state machine is stalled. Second thread thinks that > > > timer is > > > active while it's not. > > > > > ... you mean race :) and we have locks for that. You want me to send > > in a patch? > > We don't have separate lock for timer. > > > > One of the main questions is that there is only one timer per few channels > > > in current code. > > > > > I see that as a good thing because > > a) Polling anyway doesn't provide 'hard' guarantee even if we have one > > timer per channel > > b) The poll period remains same for every channel, so functionality > > wise you only increase timer callbacks. > > Do you mean something like this below? > > The patch isn't really tested on multi-channel environment yet but > I will test it. I just want to know if I am on the right way here. > > I know there are some adjustments that can be done in the loop in hr-timer > callback. The thing that I don't like here is a lot of spin_lock/unlocks > in the timer callback. Hi Jassi,
I was able to come up with new version with only HR-timer spinlock in timer callback. That one seems a little bit better. The patch is below: >From df8264ec3b1db61fdfe0f381fabb6abdecfb3ec1 Mon Sep 17 00:00:00 2001 From: Alexey Klimov <[email protected]> Date: Thu, 29 May 2017 17:40:11 +0100 Subject: [PATCH RFC] mailbox: fix hrtimer_forward() warning When mailbox controller provides two or more channels, timer-based polling is used by mailbox controller and channels are actively used by mailbox client(s) it's very easy to trigger the warning in hrtimer_forward(): [ 247.853060] WARNING: CPU: 6 PID: 0 at kernel/time/hrtimer.c:805 hrtimer_forward+0x88/0xd8 [ 247.853549] Modules linked in: [ 247.853907] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G W 4.11.0-rc2-00362-g93afaa4513bb-dirty #13 [ 247.854472] Hardware name: linux,dummy-virt (DT) [ 247.854699] task: ffff80001d89d780 task.stack: ffff80001d8c4000 [ 247.854999] PC is at hrtimer_forward+0x88/0xd8 [ 247.855280] LR is at txdone_hrtimer+0xd4/0xf8 [ 247.855551] pc : [<ffff0000081039f0>] lr : [<ffff00000881b874>] pstate: 200001c5 [ 247.855857] sp : ffff80001efbdeb0 [ 247.856072] x29: ffff80001efbdeb0 x28: ffff80001efc3140 [ 247.856358] x27: ffff00000881b7a0 x26: 00000039ac93e8b6 [ 247.856604] x25: ffff000008e756be x24: ffff80001c4a1348 [ 247.856882] x23: 0000000000000001 x22: 00000000000000f8 [ 247.857189] x21: ffff80001c4a1318 x20: ffff80001d327110 [ 247.857509] x19: 00000000000f4240 x18: 0000000000000030 [ 247.857808] x17: 0000ffffaecdf370 x16: ffff0000081ccc80 [ 247.858000] x15: 0000000000000010 x14: 00000000fffffff0 [ 247.858186] x13: ffff000008f488e0 x12: 000000000002e3eb [ 247.858381] x11: ffff000008979690 x10: 0000000000000000 [ 247.858573] x9 : 0000000000000001 x8 : ffff80001efc66e0 [ 247.858758] x7 : ffff80001efc6708 x6 : 00000005be7732f2 [ 247.858943] x5 : 0000000000000001 x4 : ffff80001c4a1348 [ 247.859130] x3 : 00000039ac94952a x2 : 00000000000f4240 [ 247.859315] x1 : 00000039ac98243c x0 : 0000000000038f12 [ 247.859582] ---[ end trace d61812426ec3c30b ]--- To fix it this patch introduces one more mbox spinlock for hr-timer and hrt_active variable in controller structure. The hr-timer callback restarts timer unless there no active requests on any channel. Signed-off-by: Alexey Klimov <[email protected]> --- drivers/mailbox/mailbox.c | 51 ++++++++++++++++++++++++++++++-------- include/linux/mailbox_controller.h | 2 ++ 2 files changed, 42 insertions(+), 11 deletions(-) diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c index 9dfbf7e..57ae422 100644 --- a/drivers/mailbox/mailbox.c +++ b/drivers/mailbox/mailbox.c @@ -85,9 +85,17 @@ static void msg_submit(struct mbox_chan *chan) exit: spin_unlock_irqrestore(&chan->lock, flags); - if (!err && (chan->txdone_method & TXDONE_BY_POLL)) - /* kick start the timer immediately to avoid delays */ - hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL); + if (!err && (chan->txdone_method & TXDONE_BY_POLL)) { + + spin_lock_irqsave(&chan->mbox->lock_hrt, flags); + /* try to start the timer immediately to avoid delays */ + if (!chan->mbox->hrt_active) { + chan->mbox->hrt_active = 1; + hrtimer_start(&chan->mbox->poll_hrt, 0, + HRTIMER_MODE_REL); + } + spin_unlock_irqrestore(&chan->mbox->lock_hrt, flags); + } } static void tx_tick(struct mbox_chan *chan, int r) @@ -121,23 +129,43 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer) bool txdone, resched = false; int i; + spin_lock(&mbox->lock_hrt); + for (i = 0; i < mbox->num_chans; i++) { struct mbox_chan *chan = &mbox->chans[i]; + /* + * Reading of active_req is racy and we may miss an active + * request. However, the enqueuing thread will spin on the + * HR-timer lock, which we acquired here, and will start the + * timer if required. + */ if (chan->active_req && chan->cl) { - txdone = chan->mbox->ops->last_tx_done(chan); - if (txdone) + /* + * If we observe an active request on any channel, then + * we have to re-start the timer; if we see that + * a request has finished, we don't know if tx_tick() + * will schedule the next request or not. Therefore we + * have to re-sched the HR-timer here in all cases. + */ + resched = true; + + txdone = mbox->ops->last_tx_done(chan); + if (txdone) { + spin_unlock(&mbox->lock_hrt); tx_tick(chan, 0); - else - resched = true; + spin_lock(&mbox->lock_hrt); + } } } - if (resched) { + if (resched) hrtimer_forward_now(hrtimer, ms_to_ktime(mbox->txpoll_period)); - return HRTIMER_RESTART; - } - return HRTIMER_NORESTART; + else + mbox->hrt_active = 0; + spin_unlock(&mbox->lock_hrt); + + return resched ? HRTIMER_RESTART : HRTIMER_NORESTART; } /** @@ -462,6 +490,7 @@ int mbox_controller_register(struct mbox_controller *mbox) return -EINVAL; } + spin_lock_init(&mbox->lock_hrt); hrtimer_init(&mbox->poll_hrt, CLOCK_MONOTONIC, HRTIMER_MODE_REL); mbox->poll_hrt.function = txdone_hrtimer; diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h index 74deadb..1dd4c47 100644 --- a/include/linux/mailbox_controller.h +++ b/include/linux/mailbox_controller.h @@ -83,6 +83,8 @@ struct mbox_controller { const struct of_phandle_args *sp); /* Internal to API */ struct hrtimer poll_hrt; + spinlock_t lock_hrt; + unsigned int hrt_active; struct list_head node; }; -- 1.9.1

