On Sun, 2009-03-29 at 01:24 -0700, Trent Piepho wrote:
> On Sat, 28 Mar 2009, Andy Walls wrote:
> > On Mon, 2009-03-23 at 06:52 -0700, Corey Taylor wrote:
> > I found a race condition between the cx driver and the CX23418 firmware.
> > I have a patch that mitigates the problem here:
> >
> > http://linuxtv.org/hg/~awalls/cx18/rev/9f5f44e0ce6c
>
> > [ We have to do this polling wait because there is a race with the
> > firmware. Once we give it the SW1 interrupt above, it can wake up our
> > waitq with an ack interrupt via the irq handler after we're ready to
> > wait, but before we actually get put to sleep by schedule(). Loosing
> > that race causes us to wait the entire timeout, waitng for a wakeup
> > that's never going to come. ]
Trent,
First, thanks for the fresh perspective.
> A race like this should be avoidable. The way it works is you do something
> like this:
>
> /* 1 */ set_current_state(TASK_INTERRUPTIBLE);
> /* 2 */ cx18_write_reg_expect(cx, irq, SW1_INT_SET, irq, irq);
> /* 3 */ schedule();
> /* 4 */ ack_has_now_been_received();
I tried something like this in my second iteration, see below. (The
patch I put in my repo was actually my third iteration.)
> The race you are talking about is when the ack arrives between line 2 and
> 3. If this happens here, the process' current state is changed to
> TASK_RUNNING when the irq hander that receives the ack tries to wake our
> process. If schedule() is called with the state set to TASK_RUNNING then
> the process doesn't sleep. And thus there is no race. The key is that
> preparing to sleep at line 1 happens before we start the event we want to
> wait for at line 2.
>
> wait_event() should take care of this. wait_event(q, test) basically does:
>
> for(;;) {
> // point A
> add_me_to_waitqueue(q);
> set_current_state(TASK_UNINTERRUPTIBLE);
> if (test)
> break;
> // point B
> schedule();
> }
> clean_up_wait_stuff();
As you know, the condition is checked even before this loop is entered,
to avoid even being even added to a waitqueue. (Thank God for ctags...)
As you may have noticed, the original code was using
wait_event_timeout() before like this:
CX18_DEBUG_HI_IRQ("sending interrupt SW1: %x to send %s\n",
irq, info->name);
cx18_write_reg_expect(cx, irq, SW1_INT_SET, irq, irq);
ret = wait_event_timeout(
*waitq,
cx18_readl(cx, &mb->ack) == cx18_readl(cx, &mb->request),
timeout);
Because waiting for the ack back is the right thing to do, but certainly
waiting too long is not warranted.
This gave me the occasional log message like this:
1: cx18-0: irq: sending interrupt SW1: 8 to send CX18_CPU_DE_SET_MDL
2: cx18-0: irq: received interrupts SW1: 0 SW2: 8 HW2: 0
3: cx18-0: irq: received interrupts SW1: 10000 SW2: 0 HW2: 0
4: cx18-0: warning: sending CX18_CPU_DE_SET_MDL timed out waiting 10 msecs for
RPU acknowledgement
Where line 1 is the driver notifiying the firmware with a SW1 interrupt.
Line 2 is the firmware responding back to the cx18_irq_handler() with
the Ack interrupt in SW2 (the flags match, 8 & 8, by design).
Line 3 is an unrelated incoming video buffer notification for the cx18
driver.
Line 4 is the wait_event_timeout() timing out.
Since, I'm sending buffers back to the firmware on the read()-ing
applications timeline, these delays caused playback problems.
> If your event occurs and wake_up() is called at point A, then the test
> should be true when it's checked and schedule() is never called. If the
> event happens at point B, then the process' state will have been changed to
> TASK_RUNNING by wake_up(), remember it's already on the waitqueue at this
> point, and schedule() won't sleep.
OK, for some reason, I thought schedule() and schedule_timeout() would
go to sleep anyway.
> I think what's probably happening is the test, cx18_readl(cx, &mb->ack) ==
> cx18_readl(cx, &mb->request), is somehow not true even though the ack has
> been received.
A PCI bus read error could be the culprit here. That's the only thing I
can think of. We only get one notification via IRQ from the firmware.
> Maybe a new request was added?
No, I lock the respective epu2apu or epu2cpu mailboxes respectively with
a mutex.
> I think calling wait_event()'s with something that tests a hardware
> register is a little iffy. It's better if the irq handler sets some driver
> state flag (atomically!) that indicates the event you were waiting for has
> happened and then you check that flag.
I was toying with setting an atomic while in the IRQ handler. But then
I realized when we get the ack interrupt, the firmware should actually
be done. So really the wakeup() is the only indicator I really need.
Checking for ack == req is just a formality I guess.
There wasn't a wait_timeout(), so I had tried something like this in my
first iteration:
#define wait_event_oneshot_timeout(wq, condition, timeout) \
({ \
long __ret = timeout; \
if (!(condition)) { \
DEFINE_WAIT(__wait); \
prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
if (!(condition)) { \
__ret = schedule_timeout(__ret); \
} \
finish_wait(&wq, &__wait); \
} \
__ret; \
})
...
cx18_write_reg_expect(cx, irq, SW1_INT_SET, irq, irq);
ret = wait_event_oneshot_timeout(*waitq,
cx18_readl(cx, &mb->request) ==
cx18_readl(cx, &mb->ack),
timeout);
...
It didn't work. Sometimes it would wait the whole timeout, but the
cx18_irq_handler() had gotten an ack interrupt.
Then I tried:
// FIXME break into several small timeouts/poll
// or use an atomic to communicate completion
CX18_DEBUG_HI_IRQ("sending interrupt SW1: %x to send %s\n",
irq, info->name);
ret = timeout;
prepare_to_wait(waitq, &w, TASK_UNINTERRUPTIBLE);
cx18_write_reg_expect(cx, irq, SW1_INT_SET, irq, irq);
/*
* Will we schedule in time, before the IRQ handler wakes up our waitq?
* Who knows?! How exciting! Let the race begin!
*/
if (req != cx18_readl(cx, &mb->ack))
ret = schedule_timeout(timeout);
finish_wait(waitq, &w);
It didn't work, sometimes it would wait the whole timeout even though
the ack interrupt had arrived. Again at the time, I was under the
impression that schedule_timeout() would go to sleep anyway even if we
had been awakened (thus my sarcastic comments).
Did I miss anything with either of those two previous tries?
I guess I need to dig into the guts of schedule_timeout() to convince
myself that the process won't be put to sleep.
I'm using Fedora 10 BTW:
Linux palomino.walls.org 2.6.27.9-159.fc10.x86_64 #1 SMP Tue Dec 16
14:47:52 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
Thanks.
Regards,
Andy
_______________________________________________
ivtv-devel mailing list
[email protected]
http://ivtvdriver.org/mailman/listinfo/ivtv-devel