On Mon, 2009-03-30 at 13:54 -0700, Trent Piepho wrote:
> On Sun, 29 Mar 2009, Andy Walls wrote:
> > On Sun, 2009-03-29 at 01:24 -0700, Trent Piepho wrote:
> > > wait_event() should take care of this. wait_event(q, test) basically
> > > does:
> > >
> > > for(;;) {
> > > // point A
> > > add_me_to_waitqueue(q);
> > > set_current_state(TASK_UNINTERRUPTIBLE);
> > > if (test)
> > > break;
> > > // point B
> > > schedule();
> > > }
> > > clean_up_wait_stuff();
> >
> > As you know, the condition is checked even before this loop is entered,
> > to avoid even being even added to a waitqueue. (Thank God for ctags...)
>
> I think the initial check of the condition is just an optimization and
> everything will still work without it. Seeing as all this is inlined, I
> wonder if it's a good optimization...
I guess it depends on how expensive prepare_to_wait() is since it
acquires a spinlock.
But now that you mention inlining. I now have to check if this sequence
that I thought didn't work:
prepare_to_wait(waitq, &w, TASK_UNINTERRUPTIBLE);
cx18_write_reg_expect(cx, irq, SW1_INT_SET, irq, irq);
if (req != cx18_readl(cx, &mb->ack))
ret = schedule_timeout(timeout);
finish_wait(waitq, &w);
is actually getting reordered. I bet constituent parts of the first two
lines may be. That would certainly explain things.
> > As you may have noticed, the original code was using
> > wait_event_timeout() before like this:
> >
> > CX18_DEBUG_HI_IRQ("sending interrupt SW1: %x to send %s\n",
> > irq, info->name);
> > cx18_write_reg_expect(cx, irq, SW1_INT_SET, irq, irq);
> >
> > ret = wait_event_timeout(
> > *waitq,
> > cx18_readl(cx, &mb->ack) == cx18_readl(cx,
> > &mb->request),
> > timeout);
> >
> > Because waiting for the ack back is the right thing to do, but certainly
> > waiting too long is not warranted.
> >
> > This gave me the occasional log message like this:
> >
> > 1: cx18-0: irq: sending interrupt SW1: 8 to send CX18_CPU_DE_SET_MDL
> > 2: cx18-0: irq: received interrupts SW1: 0 SW2: 8 HW2: 0
> > 3: cx18-0: irq: received interrupts SW1: 10000 SW2: 0 HW2: 0
> > 4: cx18-0: warning: sending CX18_CPU_DE_SET_MDL timed out waiting 10 msecs
> > for RPU acknowledgement
> >
> > Where line 1 is the driver notifiying the firmware with a SW1 interrupt.
> > Line 2 is the firmware responding back to the cx18_irq_handler() with
> > the Ack interrupt in SW2 (the flags match, 8 & 8, by design).
> > Line 3 is an unrelated incoming video buffer notification for the cx18
> > driver.
> > Line 4 is the wait_event_timeout() timing out.
>
> Could it be that the wait_event doesn't actually run and check its
> condition until _after_ line 3?
SW2: 8 is a firmware response to us setting the outgoing SW1 to 8.
SW2 getting cleared is done by cx18_irq_handler and its helper:
static void xpu_ack(struct cx18 *cx, u32 sw2)
{
// Wake up the process waiting on the EPU -> CPU mailbox ack
// This one has a flag value of 8
if (sw2 & IRQ_CPU_TO_EPU_ACK)
wake_up(&cx->mb_cpu_waitq);
// Wake up the process waiting on the EPU -> APU mailbox ack
if (sw2 & IRQ_APU_TO_EPU_ACK)
wake_up(&cx->mb_apu_waitq);
}
irqreturn_t cx18_irq_handler(int irq, void *dev_id)
{
struct cx18 *cx = (struct cx18 *)dev_id;
u32 sw1, sw2, hw2;
// Get the status of SW2
sw2 = cx18_read_reg(cx, SW2_INT_STATUS) & cx->sw2_irq_mask;
// Clear any status of SW2 that were found set
if (sw2)
cx18_write_reg_expect(cx, sw2, SW2_INT_STATUS, ~sw2, sw2);
// Act on any SW2 interrupts found set
if (sw2)
xpu_ack(cx, sw2);
return (sw1 || sw2 || hw2) ? IRQ_HANDLED : IRQ_NONE;
}
> In that case SW2 != 8 and so it goes back
> to sleep?
If SW2 got set back to 0 (or != 8), that means we cleared it ourselves.
We only do that slightly prior to waking up the proper waitq.
> Calling wake_up() just makes the processes on the waitq
> runnable, they don't actually run until later, possibly much later.
Hmm. Maybe then we're yielding the processor and then simply running
much later.
If that's the case, I may be stuck with shorter timeouts with polls. :(
> > > If your event occurs and wake_up() is called at point A, then the test
> > > should be true when it's checked and schedule() is never called. If the
> > > event happens at point B, then the process' state will have been changed
> > > to
> > > TASK_RUNNING by wake_up(), remember it's already on the waitqueue at this
> > > point, and schedule() won't sleep.
> >
> > OK, for some reason, I thought schedule() and schedule_timeout() would
> > go to sleep anyway.
>
> AFAIK, they'll still cause the kernel schedule a process. Maybe a
> different process. But the original process is still in TASK_RUNNING state
> and so still in the run queue and will get run again. If it was in
> TASK_(UN)INTERRUPTIBLE state then it wouldn't be in the run queue and
> wouldn't run again until something woke it up.
I'm going to have to figure out how to profile things.
> > > I think what's probably happening is the test, cx18_readl(cx, &mb->ack) ==
> > > cx18_readl(cx, &mb->request), is somehow not true even though the ack has
> > > been received.
> >
> > A PCI bus read error could be the culprit here. That's the only thing I
> > can think of. We only get one notification via IRQ from the firmware.
> >
> >
> > > Maybe a new request was added?
> >
> > No, I lock the respective epu2apu or epu2cpu mailboxes respectively with
> > a mutex.
>
> But in your log:
> > 1: cx18-0: irq: sending interrupt SW1: 8 to send CX18_CPU_DE_SET_MDL
> > 2: cx18-0: irq: received interrupts SW1: 0 SW2: 8 HW2: 0
> > 3: cx18-0: irq: received interrupts SW1: 10000 SW2: 0 HW2: 0
> > 4: cx18-0: warning: sending CX18_CPU_DE_SET_MDL timed out waiting 10 msecs
> > for RPU acknowledgement
>
> Isn't the wait_event_timeout() waiting until line 4? And doesn't line 3
> mean something has changed the registers? Changed them before the
> wait_event finished?
First a note:
There are four mailboxes the driver cares about EPU -> CPU, CPU -> EPU,
EPU -> APU, and APU -> EPU. We mostly care about the EPU/CPU
mailboxes.
Line 1 indicates an outgoing command in the EPU -> CPU mailbox
Line 2 indicates an ack was placed in the EPU -> CPU mailbox
Line 3 indicates an incoming command in the CPU -> EPU mailbox
Line 4 indicates the async notification in Line 2 being missed by
us somehow
So
yes, wait_event_timeout() is waiting until line 4.
yes, line 3 means we cleared the SW2 register when we got line 2,
yes, line 3 also means the firmware is sending us a full capture buffer
with information in the CPU -> EPU mailbox (not the mbox at issue)
But, line 3 does not mean the EPU -> CPU mailbox ack changed, Line 2
tells us that should be the case (different mailboxes)
The process of the IRQ handler clearing SW2 should have resulted in the
IRQ handler also sending a wake_up(). It doesn't appear to happen a
small percentage of the time.
> > > I think calling wait_event()'s with something that tests a hardware
> > > register is a little iffy. It's better if the irq handler sets some
> > > driver
> > > state flag (atomically!) that indicates the event you were waiting for has
> > > happened and then you check that flag.
> >
> > I was toying with setting an atomic while in the IRQ handler. But then
> > I realized when we get the ack interrupt, the firmware should actually
> > be done. So really the wakeup() is the only indicator I really need.
> > Checking for ack == req is just a formality I guess.
>
> If you use an interruptible timeout, then you could get interrupted with a
> signal before the irq handler has woken you.
I used UNITERRUPTIBLE here because it doesn't quite make sense to
inidcate -ERESTARTSYS back to the caller in all circumstances. This
function can be called by read() to send a series of commands to start a
capture or to just send back a drained buffer; poll() to send a series
of commands to start a capture; or by the work handler to return a
drained buffer.
> > There wasn't a wait_timeout(), so I had tried something like this in my
> > first iteration:
>
> It's called sleep_on_timeout(q, timeout).
I was deterred by the big scary comment:
/*
* These are the old interfaces to sleep waiting for an event.
* They are racy. DO NOT use them, use the wait_event* interfaces above.
* We plan to remove these interfaces.
*/
extern void sleep_on(wait_queue_head_t *q);
extern long sleep_on_timeout(wait_queue_head_t *q,
signed long timeout);
Again, thanks for helping me think this through. It's obvious I have
more testing and troubleshooting to do.
My first step will be to check to see if compiler reordering screwed up
one of my prior patch attempts.
Regards,
Andy
_______________________________________________
ivtv-devel mailing list
[email protected]
http://ivtvdriver.org/mailman/listinfo/ivtv-devel