Hi Matthew, On Saturday, 25. November 2006 17:32, Matthew Wilcox wrote: > In the qla case, the mutex can be acquired by a thread which then waits > for the hardware to do something. If the hardware locks up, it is > preferable that the system not hang.
Ok, I looked at it (drivers/scsi/qla2xxx/qla_mbx.c) and the solution seems simple: - Introduce an busy flag, check that BEFORE this mutex_lock() and don't protect it by that mutex. - return -EBUSY to the upper layers, if mailbox still busy - upper layers can either queue the command or use a retry mechanism There are many examples for this in the kernel. NICs have the same problems (transmitter busy or stuck) and have no problem handling that gracefully since ages. > I assumed that he'd spent enough time thinking about it that fixing it > really wasn't feasible. That doesn't depend on time, just whether you get the right idea or not. Anyway I CCed the current maintainers. So my point still stands: Timeout based locking is evil and hides bugs. In this case the bugs are: 1. That mutex protects a code path (mailbox command submission and retrieve) instead of data. 2. "Mailbox is free" is an event, so you should use wait_event_timout() for that Regards Ingo Oeser - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/