Hi Matthew,

On Saturday, 25. November 2006 17:32, Matthew Wilcox wrote:
> In the qla case, the mutex can be acquired by a thread which then waits
> for the hardware to do something.  If the hardware locks up, it is
> preferable that the system not hang.

OK, I looked at it (drivers/scsi/qla2xxx/qla_mbx.c)
and the solution seems simple:
- Introduce a busy flag, check it BEFORE this mutex_lock(),
  and don't protect it with that mutex.
- Return -EBUSY to the upper layers if the mailbox is still busy.
- The upper layers can either queue the command or use a retry mechanism.
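
A rough userspace sketch of the busy-flag idea, using C11 atomics. This is
not the qla2xxx code: struct mailbox, mailbox_try_submit() and
mailbox_complete() are made-up names for illustration, and a real driver
would use the kernel's own atomic bit operations instead of <stdatomic.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mailbox state; none of these names come from qla_mbx.c. */
struct mailbox {
	atomic_bool busy;	/* true while a command is outstanding */
	int last_cmd;		/* stands in for the mailbox registers */
};

/*
 * Try to claim the mailbox without ever sleeping.
 * Returns 0 on success, -EBUSY if a command is still outstanding.
 */
static int mailbox_try_submit(struct mailbox *mb, int cmd)
{
	bool expected = false;

	/* Atomic test-and-set: only one caller can flip busy false -> true. */
	if (!atomic_compare_exchange_strong(&mb->busy, &expected, true))
		return -EBUSY;

	mb->last_cmd = cmd;	/* a real driver would write the registers here */
	return 0;
}

/* Called from the completion path (the interrupt handler in a real driver). */
static void mailbox_complete(struct mailbox *mb)
{
	assert(atomic_load(&mb->busy));	/* completing an idle mailbox is a bug */
	atomic_store(&mb->busy, false);
}
```

A caller that gets -EBUSY never blocks, so stuck hardware shows up as a
visible error in the upper layer instead of a thread sleeping on a lock.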

There are many examples of this in the kernel. NICs have the same problem
(transmitter busy or stuck) and have handled it gracefully for ages.

> I assumed that he'd spent enough time thinking about it that fixing it
> really wasn't feasible.

That doesn't depend on time, just on whether you get the right idea or not.

Anyway I CCed the current maintainers.

So my point still stands: timeout-based locking is evil and hides bugs.

In this case the bugs are:
1. That mutex protects a code path (mailbox command submission
    and retrieval) instead of data.
2. "Mailbox is free" is an event, so you should use wait_event_timeout()
    for that.
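
To illustrate point 2, here is a userspace analogue of waiting for a
"mailbox is free" event with a timeout. It is only a sketch under stated
assumptions: in the kernel this would be a wait queue with
wait_event_timeout() on the waiting side and wake_up() on the completion
side, while the code below substitutes a pthread condition variable. All
struct and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical event object; the lock protects the 'free' flag (data). */
struct mailbox_event {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool free;
};

static void mailbox_event_init(struct mailbox_event *ev, bool initially_free)
{
	pthread_mutex_init(&ev->lock, NULL);
	pthread_cond_init(&ev->cond, NULL);
	ev->free = initially_free;
}

/*
 * Wait until the mailbox is free or timeout_ms has elapsed.
 * On success the mailbox is claimed for the caller.
 * Returns 0 on success, -ETIMEDOUT if the event never arrived.
 */
static int mailbox_wait_free(struct mailbox_event *ev, long timeout_ms)
{
	struct timespec deadline;
	int ret = 0;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&ev->lock);
	/* Re-check the condition after every wakeup; stop on timeout. */
	while (!ev->free && ret == 0)
		ret = pthread_cond_timedwait(&ev->cond, &ev->lock, &deadline);

	if (ev->free) {
		ev->free = false;	/* claim the mailbox */
		ret = 0;
	} else {
		ret = -ETIMEDOUT;
	}
	pthread_mutex_unlock(&ev->lock);
	return ret;
}

/* Completion side: mark the mailbox free and wake one waiter. */
static void mailbox_signal_free(struct mailbox_event *ev)
{
	pthread_mutex_lock(&ev->lock);
	ev->free = true;
	pthread_cond_signal(&ev->cond);
	pthread_mutex_unlock(&ev->lock);
}
```

The timeout here bounds the wait for an event, and the lock is only held
briefly around the flag, so nothing sleeps while holding a lock across a
hardware operation.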


Regards

Ingo Oeser
