Hello

I have a setup with 12 daisy-chained EXP3000 enclosures connected to
a server such that each disk is accessible via two paths through
multiple SAS expanders. The server has 2 dual-ported HBAs. I'm
running a 2.6.32 kernel variant based on RHEL 6.0. I have seen this on
2.6.31 as well.

I am seeing frequent kernel panics that follow an interesting
pattern:

The kernel panicked while trying to dereference fields of the scsi_cmnd passed to scsi_dispatch_cmd. Looking at its caller:

scsi_request_fn(request_queue *q)
    device = q->queuedata;
    req = get_request_from_queue(q);

    cmd = req->special;
    if (!cmd)
        return;

    scsi_dispatch_cmd(cmd);

The interesting thing is that cmd->device != device, although they should be the same. In fact, it looks like req->special itself is garbage, which is why we end up looking at the wrong data in scsi_dispatch_cmd. It also explains why we can find the device in the upper frames and prove that it (and all its attributes) is healthy all the way from MD to LSI.
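
The consistency check described above can be written down as a minimal userspace sketch. The mock_* structures and cmd_matches_queue() are hypothetical stand-ins that carry only the fields mentioned in this mail; they are not the kernel definitions. Note that if req->special is already stale, reading cmd->device is itself unsafe, so a check like this can only catch some corruptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumption:
 * only the fields discussed in this mail are modelled). */
struct mock_device  { int id; };
struct mock_cmnd    { struct mock_device *device; };
struct mock_request { void *special; };
struct mock_queue   { void *queuedata; };

/* Returns true only when req->special is non-NULL and the command it
 * points to belongs to the device that owns the queue. */
static bool cmd_matches_queue(const struct mock_queue *q,
                              const struct mock_request *req)
{
    const struct mock_cmnd *cmd;

    assert(q != NULL && req != NULL);

    cmd = req->special;
    if (cmd == NULL)
        return false;

    return cmd->device == q->queuedata;
}
```

In a crash dump the same comparison can be done by hand: take q->queuedata from the caller's frame and compare it with the device pointer inside whatever req->special points to.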

The only place in the SCSI layer that frees req->special is

scsi_prep_return()
....
    case BLKPREP_KILL:
        req->errors = DID_NO_CONNECT << 16;
        /* release the command and kill it */
        if (req->special) {
            struct scsi_cmnd *cmd = req->special;   <--
            scsi_release_buffers(cmd);                |__ not atomic
            scsi_put_command(cmd);                    |
            req->special = NULL;                    <--
        }

I know this is a basic question, but how does one prevent other threads from using req->special while it is being deallocated here? I see checks for special == NULL in a lot of places, but the release and the reset of the pointer (above) are not atomic.
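
To make the concern concrete, here is a minimal userspace sketch of one common way to close such a window: the release path and the dispatch path both take the same lock, so a reader sees either a valid pointer or NULL. The sketch_* names are hypothetical, and a pthread mutex stands in for whatever lock protects the request. Whether the SCSI midlayer actually relies on a lock in this way is an assumption to verify, not something established here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel structures. */
struct sketch_cmnd { int tag; };

struct sketch_request {
    pthread_mutex_t lock;          /* assumption: models the protecting lock */
    struct sketch_cmnd *special;
};

/* Release path: free the command and clear the pointer while holding
 * the lock, so the two steps appear as one to any locked reader. */
static void sketch_kill_request(struct sketch_request *req)
{
    assert(req != NULL);

    pthread_mutex_lock(&req->lock);
    if (req->special) {
        free(req->special);
        req->special = NULL;
    }
    pthread_mutex_unlock(&req->lock);
}

/* Dispatch path: read and use the pointer under the same lock.
 * Returns the command's tag, or -1 if the command is already gone. */
static int sketch_dispatch(struct sketch_request *req)
{
    int tag = -1;

    assert(req != NULL);

    pthread_mutex_lock(&req->lock);
    if (req->special)
        tag = req->special->tag;
    pthread_mutex_unlock(&req->lock);

    return tag;
}
```

The NULL check only helps if it sits inside the same critical section as the use of the pointer. A check made before taking the lock, or with no lock at all, leaves the window described above open.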

Thanks in advance.
--
Aniket Kulkarni

