Oddly enough, my problem has already been fixed. That's the good
news. Thanks for the work you guys have done! The catch is that, for
our quality system, I need some documentation on what the fix was.

I've been developing a driver using the 2.4.20 kernel. Since our
product that uses this driver is controlled according to FDA
guidelines, I can't just make arbitrary changes without
justification. So, I'm hunting for some more details to justify the
switch from kernel 2.4.20 to 2.4.22.

Here's the setup. The userland application initiates an ioctl call
to my driver. This could either be to read data from my device or to
write data out to it. I've only been able to reproduce the hang when
writing data out. Each ioctl call from userland initiates four urb
transfers: three bulk outs and one bulk in. That last bulk in is
reading status information from my device to determine how well the
command worked. The ioctl handler in the driver initiates the first
bulk out urb asynchronously. The ioctl handler then uses
wait_event_interruptible to determine when all four urbs have been
transferred. The completion handler for each urb in turn submits the
next urb. So, three times out of four usb_submit_urb is called from
interrupt context.
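
To make the chaining concrete, here is a minimal userspace sketch of that flow. It is only a model under assumptions, not code from the driver: struct sim_dev and the sim_* functions are made-up names for illustration, and the stand-in for usb_submit_urb completes every transfer immediately, which a real host controller does not do. It only shows which context each submission comes from.

```c
/* Hypothetical userspace model of the urb chain; none of these names
 * exist in the real driver or in the kernel. */
enum { NUM_URBS = 4 };  /* three bulk outs followed by one bulk in */

struct sim_dev {
    int next_urb;                /* index of the next urb to submit */
    int done;                    /* set once the last (status) urb completes */
    int submits_from_ioctl;      /* submissions made by the ioctl handler */
    int submits_from_completion; /* submissions made by a completion handler */
};

static void sim_complete(struct sim_dev *dev);

/* Stand-in for usb_submit_urb(): records the calling context and, as a
 * simplifying assumption, completes the transfer right away. */
static int sim_submit(struct sim_dev *dev, int from_completion)
{
    if (from_completion)
        dev->submits_from_completion++;
    else
        dev->submits_from_ioctl++;
    dev->next_urb++;
    sim_complete(dev);
    return 0;
}

/* Stand-in for the completion handler: submits the next urb in the
 * chain, or flags the whole command as finished after the last one. */
static void sim_complete(struct sim_dev *dev)
{
    if (dev->next_urb < NUM_URBS)
        sim_submit(dev, 1);
    else
        dev->done = 1;
}

/* Stand-in for the ioctl handler: submits only the first urb, then
 * checks the flag the real handler would sleep on. */
static int sim_ioctl(struct sim_dev *dev)
{
    dev->next_urb = 0;
    dev->done = 0;
    sim_submit(dev, 0);
    return dev->done ? 0 : -1;
}
```

Calling sim_ioctl once on a zeroed struct sim_dev leaves submits_from_ioctl at 1 and submits_from_completion at 3, matching the three-out-of-four count above. In this model the hang would correspond to sim_complete never running for the fourth urb, so done is never set.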
To produce the hang, I repeat this ioctl loop several thousand times:
write, write, write, read. I'm not exactly sure how many times I need
to repeat before the hang. It seems somewhat consistent although
there is some variance. The actual hang is between the third write
and the final read. I have pretty good diagnostics on my device. I
can tell quite clearly that the device received the third urb from the
host and has prepared the status packet to be read. It is sitting
and waiting for the host to read that status packet. In my driver the
code looks pretty much like this:

    printk("reading status packet");
    usb_fill_bulk_urb (.....);
    rc = usb_submit_urb(...);
    if (rc) {
        printk("error reading status packet");
    }

When the hang occurs I see the first printk saying it's getting ready
to read the status packet. I don't get the error printk. However, I
don't know what happens after that. My completion handler is never
called. I don't have a bus analyzer, so I can't confirm whether or
not that last transaction is initiated by the kernel.

Here's the tricky part. If I add an "else printk (....)" to the
pseudo-code above, I'm unable to trigger the hang. In other words,
this seems to me to be an ugly timing issue. Take the last debug
statement out, it hangs. Put it back in, it won't hang. When I
upgraded to 2.4.22 (using make oldconfig on the .config from 2.4.20) I
couldn't get it to hang. What I don't know is whether there was a bug
in the 2.4.20 usb core that was fixed in 2.4.22 or whether the timing
in 2.4.22 changed enough to mask my problem. So, any guesses what this
might be? Any recommendations?

BTW, you can find the full source to my driver at
http://lathi.net/view/Main/DataPlay.
--
(__) Doug Alcorn - Unix/Linux/Web Developing
oo / PGP 02B3 1E26 BCF2 9AAF 93F1 61D7 450C B264 3E63 D543
|_/ mailto:[EMAIL PROTECTED] http://www.lathi.net
mailto:[EMAIL PROTECTED] is a spam trap