Hi All,

Doug Ledford wrote:
> 
> "Boerner, Brian" wrote:
> >
> > I would like to know people's thoughts on making a change to the sd driver,
> > specifically changing the SD_TIMEOUT value from a #define to a variable.
> >
> > Reasons:
> > #define SD_TIMEOUT (30 * HZ)
> >
> > This hard-coded value of 30 seconds is too short for some hardware RAID
> > devices. In most cases, active failover takes place long before the
> > 30-second threshold; however, if the adapter is particularly busy, it can
> > sometimes take a bit longer. The end result is a kernel panic.
> >
> > I propose changing this to a variable:
> > int sd_timeout = 30 * HZ;
> >
> > This allows other drivers (megaraid, aacraid, etc.) to extend this, thus
> > allowing fail over to take place
> > without interrupting the system.
> >
> > if (sd_timeout < (60 * HZ)) {
> >         sd_timeout = (60 * HZ);
> > }
> >
> > It looks like the other scsi drivers use their own timeout values. However,
> > most HW Raid adapters
> > depend on the scsi disk driver for such values.
> >
> > Do people have any strong objections to this? I'll certainly make the
> > changes, I just want to hear
> > the good, the bad, and the ugly.
> 
> If you're going to do the work, then make the timeout on a per controller
> basis (I suppose it should actually be on a per device basis when I think
> about it).  Maybe add it to the sd disk structs.  Then the low level driver
> can say "Hey, this is a hardware raid logical volume, I'll make it have a
> timeout of 60 seconds" or they can say "this is a passthrough device, 10
> seconds is a plenty long timeout".  I for one would take the aic7xxx driver
> and modify the timeout values back down to about 8 seconds on its devices.
> That's more than enough time for any command it sends to complete.  Then the
> hardware raid controller drivers are free to do what they want and they can
> also time out logical volumes differently from pass through disk devices
> (assuming that they are supported and are treated differently on a particular
> brand of hardware raid controller).

        Yes, timeouts should be on a per device basis.
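
        A rough sketch of the per-device idea, written as plain user-space C
rather than real kernel code.  The struct and function names here are
hypothetical (they are not the actual sd structures), and HZ is defined
locally only so the example compiles on its own.  A device whose low-level
driver never sets a timeout simply falls back to the existing 30 second value.

```c
#include <assert.h>
#include <stddef.h>

#define HZ 100                        /* assumed tick rate, illustration only */
#define SD_DEFAULT_TIMEOUT (30 * HZ)  /* the current hard-coded sd value */

/* Hypothetical per-device record; NOT the real sd disk struct. */
struct disk_dev {
    const char *name;
    int timeout;    /* in ticks; 0 means "not set by the low-level driver" */
};

/* Called by a low-level driver that knows what kind of device this is. */
static void disk_set_timeout(struct disk_dev *dev, int seconds)
{
    assert(dev != NULL && seconds > 0);
    dev->timeout = seconds * HZ;
}

/* Used by the disk driver when it issues a command to this device. */
static int disk_get_timeout(const struct disk_dev *dev)
{
    assert(dev != NULL);
    return dev->timeout > 0 ? dev->timeout : SD_DEFAULT_TIMEOUT;
}
```

        With something like this, a RAID driver could call
disk_set_timeout(dev, 60) for a logical volume without touching the timeout
of any other disk in the system.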

        Regarding "timeout values back down to about 8 seconds"...

        I think you'll find that once some devices have exhausted their retries
with ECC, and perhaps performed Automatic Read/Write Reallocation for bad
blocks, your 8 second timeout will be exceeded, leading to false errors.  Of
course, tapes and medium changers require much longer command timeouts.
Timeouts should be controlled by the device driver generating the request
(IMHO).

        Also, I think you need to honor any timeout already set up for the
device, since your driver does not know whether an external RAID box is
attached.

        You may also find there are other commands, such as Test Unit Ready,
which require longer timeouts for RAID boxes which do transparent controller
failover.  Replacing all hardcoded timeouts with variables would be great!
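
        To illustrate what I mean by variables instead of hard-coded values,
here is a minimal user-space sketch that looks up the timeout by command.
The struct, the function, and the way the commands are grouped are all
hypothetical and only for illustration; the opcodes are the standard SCSI
ones, and HZ is defined locally so the example stands alone.

```c
#include <assert.h>
#include <stddef.h>

#define HZ 100                   /* assumed tick rate, illustration only */

/* Standard SCSI opcodes. */
#define OP_TEST_UNIT_READY 0x00
#define OP_READ_10         0x28
#define OP_WRITE_10        0x2a

/* Hypothetical per-device timeout table, all values in ticks. */
struct dev_timeouts {
    int io;     /* READ(10) / WRITE(10) */
    int tur;    /* TEST UNIT READY */
    int other;  /* every other command */
};

/* Pick the timeout for one command instead of using a single constant. */
static int timeout_for(const struct dev_timeouts *t, unsigned char opcode)
{
    assert(t != NULL);
    switch (opcode) {
    case OP_TEST_UNIT_READY:
        return t->tur;
    case OP_READ_10:
    case OP_WRITE_10:
        return t->io;
    default:
        return t->other;
    }
}
```

        A RAID box doing transparent failover could then get, say,
{ 60 * HZ, 60 * HZ, 30 * HZ }, so that Test Unit Ready is given the same
headroom as reads and writes.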

Kind Regards,
Robin

> 
> --
> 
>  Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
>       Please check my web site for aic7xxx updates/answers before
>                       e-mailing me about problems
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [EMAIL PROTECTED]

