Just thought I would drop a note
and tell people where things stand.
For the most part, things have
been kind of quiet lately. I am trying to polish off a few remaining
issues. I think I have gdth licked, and I think I have the host blocking
feature added back again; at this point I am waiting for feedback from the
people testing it. In the background I am starting to clean up a few other
things - I have split part of scsi.c into a new file (and in the long run, there
will probably be several others as well). Unfortunately scsi.c has served
as sort of a repository for a bunch of loosely related things, and both
readability and maintainability have suffered. I am appending the current
patchset, in the event that anyone else wants to look at it.
I was talking with someone about how to implement automatic spinup of
spun-down drives, and I realized that there is a design flaw in the new
queueing code. As things currently stand, we pre-allocate Scsi_Cmnd
structures based upon the queue depth of the device. For a disk with a
queue depth of 1, there will be only one Scsi_Cmnd structure dedicated to
that device. When a user does something that causes an ioctl to be
generated, the Scsi_Cmnd structure is allocated in the ioctl code and then
passed into the mid-layer via scsi_do_cmd() or scsi_wait_cmd(). As it
turns out, the command is inserted at the end of the queue.
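To make the flow concrete, the ioctl path looks roughly like this (a
simplified sketch, not the literal code - the allocation helper name is
made up, and the scsi_do_cmd() argument list is approximate):

    Scsi_Cmnd *SCpnt;

    /* Takes the device's only Scsi_Cmnd when the queue depth is 1
       (hypothetical helper standing in for the real allocation). */
    SCpnt = allocate_scsi_cmnd(SDpnt);

    /* Hand the command to the mid-layer; the request ends up at
       the TAIL of the device's queue. */
    scsi_do_cmd(SCpnt, cmnd, buffer, bufflen, done, timeout, retries);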
The queue handler function currently only looks at the head of the queue.
It keeps queueing things as long as resources are available to queue more
commands. One of the resources that gets used up, of course, is the supply
of Scsi_Cmnd structures. It is assumed that all of the structures that
have been allocated are queued to the device, so if we run out, the queue
handler can simply return without doing anything. The idea is that when
one of the currently running commands completes, we make another attempt
at queueing the request at the head of the queue.
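In outline, the queue handler behaves something like this (illustrative
pseudocode only; the helper names are invented for the sketch):

    /* Keep issuing requests from the head of the queue while
       resources allow. */
    while ((req = queue_head(q)) != NULL) {
            SCpnt = find_free_scsi_cmnd(SDpnt);
            if (SCpnt == NULL)
                    return;  /* Out of Scsi_Cmnd structures.  Assume
                                they are all queued to the device and
                                that a completion will restart us. */
            dequeue(req);
            issue_to_host(SCpnt, req);
    }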
Consider a fairly active disk with a queue depth of 1. Imagine that a
user issues an ioctl - this allocates the only Scsi_Cmnd structure and
drops it at the tail of the queue. The scsi_request_fn() function will
look at the head of the queue, be unable to queue that request because no
Scsi_Cmnd structures are available, and return. Since the only Scsi_Cmnd
is held by the request at the tail, no command is outstanding, and nothing
will ever complete to restart the queue - hence a deadlock arises.
My guess at the moment is that this situation isn't all that common -
nobody has reported it yet, but it is only a matter of time before somebody
stumbles across it. The reason it hasn't come up is that it only appears
for a disk or CD-ROM that is fairly busy, has a small queue depth, and has
an ioctl issued against it. The trick, of course, is to figure out how to
fix it.
In the short term, I could
easily add a simple hack to detect this situation and recover from it. No
big deal here, and until a more architecturally satisfying solution arises, this
will do.
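For instance, one form the hack could take (a hypothetical sketch - the
helper names are invented) is to notice, when the head of the queue cannot
get a Scsi_Cmnd, that a request further down the queue already owns one
(the ioctl), and issue that request out of order:

    if (find_free_scsi_cmnd(SDpnt) == NULL) {
            /* The ioctl request already carries its own Scsi_Cmnd;
               issuing it out of order lets it complete and free the
               structure, breaking the deadlock. */
            req = find_request_with_cmnd(q);
            if (req != NULL) {
                    dequeue(req);
                    issue_to_host(req->cmnd, req);
            }
            return;
    }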
In the longer term, there
are several things that come to mind. First of all, upper level
drivers probably don't actually need a Scsi_Cmnd structure for anything except
for completion processing. As things currently stand, an upper level
driver simply allocates the thing, and then calls scsi_do_cmd/scsi_wait_cmd
without having done much to the Scsi_Cmnd structure at all. Once the
command is complete, then the upper level driver usually releases it after
looking at the status code. There are a handful of things that the upper
level driver actually needs - usually just the completion code and in some cases
the sense data in the event that something went wrong. I could come up
with something analogous to scsi_do_cmd or
scsi_wait_cmd that doesn't even take a Scsi_Cmnd structure, but at this point I
don't know to what degree it is possible to replace
scsi_do_cmd/scsi_wait_cmd. If a 100% replacement is feasible, then this
would be a complete solution.
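To illustrate, the replacement interface might look something like this (a
sketch of one possible signature, not an implemented interface):

    /* Hypothetical replacement for scsi_wait_cmd(): the mid-layer
       allocates and frees the Scsi_Cmnd internally, and only the
       pieces the upper level driver consumes come back out. */
    int scsi_wait_command(Scsi_Device *SDpnt,
                          const unsigned char *cmnd,
                          void *buffer, unsigned bufflen,
                          int timeout, int retries,
                          unsigned char *sense);  /* filled in on error */

    /* The return value is the completion code. */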
The second thought that comes to mind is that we really need to get away
from pre-allocating Scsi_Cmnd structures for devices. There should instead
be a general-purpose pool allocated for the host (to prevent starvation,
there might be one Scsi_Cmnd pre-allocated per device, with the rest coming
from the general-purpose pool, but that's just an idea). The general idea
is that the queue depth would be managed at a different level, so that it
is less likely that we get into the trouble I described above. While we do
have a known problem of over-allocating Scsi_Cmnd structures, I don't like
the idea of allocating consumables before they are actually needed, and
this would be more of a band-aid.
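A sketch of what allocation might look like under that scheme (again
hypothetical - the per-device reserve is just the idea mentioned above,
and the helper names are invented):

    Scsi_Cmnd *scsi_get_cmnd(struct Scsi_Host *host, Scsi_Device *SDpnt)
    {
            /* One Scsi_Cmnd reserved per device prevents starvation. */
            if (device_reserve_free(SDpnt))
                    return take_device_reserve(SDpnt);

            /* Everything else comes from the shared host pool; this
               may return NULL if the pool is exhausted. */
            return take_from_host_pool(host);
    }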
The last possibility is that for things like ioctl it is *possible* that
we could allocate an additional Scsi_Cmnd from the general kernel memory
pool and hook it in. In general you don't want to call kmalloc() anywhere
in the I/O code path that might be exercised during swapping, but ioctl
generally won't fall into this category. I am not that wild about this
one.
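Sketched out, it would amount to something like this in the ioctl path
(hypothetical; the init helper is invented):

    /* Safe here because ioctls run in process context outside the
       swapping path, where GFP_KERNEL allocation may block. */
    SCpnt = kmalloc(sizeof(Scsi_Cmnd), GFP_KERNEL);
    if (SCpnt == NULL)
            return -ENOMEM;
    init_extra_scsi_cmnd(SCpnt, SDpnt);  /* hypothetical init helper */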
As things stand, I am still thinking about it. A quick hack of a
workaround can certainly be done to prevent any user-visible symptoms, but
at this point I am trying to decide what would make the most sense from an
architectural point of view. I won't bother to code up the workaround
unless people start to experience this problem in the field. Unless I am
hit with a brilliant flash of insight, I will probably play with the first
possibility and see how much trouble it gets me into.
-Eric
[Attachment: linux39a.diff]