RE: panics with 16port Promise Supertrack EX Controller
> -Original Message- > From: Jens Axboe [mailto:[EMAIL PROTECTED] > Sent: Monday, July 16, 2007 4:18 AM > To: Flavio Curti > Cc: Nick Piggin; Michal Piotrowski; Flavio Curti; > linux-kernel@vger.kernel.org; Promise_Linux > Subject: Re: panics with 16port Promise Supertrack EX Controller > > > On Sun, Jul 15 2007, Flavio Curti wrote: > > Hello > > > > On Mon, Jul 09, 2007 at 11:59:36AM +1000, Nick Piggin wrote: > > > >Jul 8 00:19:13 dorade.cyberlink.ch EFLAGS: 00210046 > > > >(2.6.22-rc7-dorade #1) > > > >>The machine panics > > > >>after some days of running fine, the machine inst heavy loaded. > > > >>The Controller detects as stex device: > > > >kernel BUG at block/as-iosched.c:1084! > > > > > > > >BUG_ON(RB_EMPTY_ROOT(>sort_list[REQ_ASYNC])); > > > Could be a bug in the driver that just happens to be > caught by AS checks. > > > If you could test another scheduler (boot with > elevator=deadline or > > > elevator=cfq), > > > it might help give us an idea. > > > > Ok, I now switched to cfg, and the machine panicd again. > Panic attached. > > Any help is appreciated. > > It really looks like a stex bug - perhaps it's doing double > completions > of a request? > I will come up with something related to tag lock, and see what happens after that. Ed Lin - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: panics with 16port Promise Supertrack EX Controller
-Original Message- From: Jens Axboe [mailto:[EMAIL PROTECTED] Sent: Monday, July 16, 2007 4:18 AM To: Flavio Curti Cc: Nick Piggin; Michal Piotrowski; Flavio Curti; linux-kernel@vger.kernel.org; Promise_Linux Subject: Re: panics with 16port Promise Supertrack EX Controller On Sun, Jul 15 2007, Flavio Curti wrote: Hello On Mon, Jul 09, 2007 at 11:59:36AM +1000, Nick Piggin wrote: Jul 8 00:19:13 dorade.cyberlink.ch EFLAGS: 00210046 (2.6.22-rc7-dorade #1) The machine panics after some days of running fine, the machine inst heavy loaded. The Controller detects as stex device: kernel BUG at block/as-iosched.c:1084! BUG_ON(RB_EMPTY_ROOT(ad-sort_list[REQ_ASYNC])); Could be a bug in the driver that just happens to be caught by AS checks. If you could test another scheduler (boot with elevator=deadline or elevator=cfq), it might help give us an idea. Ok, I now switched to cfg, and the machine panicd again. Panic attached. Any help is appreciated. It really looks like a stex bug - perhaps it's doing double completions of a request? I will come up with something related to tag lock, and see what happens after that. Ed Lin - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> -Original Message- > From: James Bottomley [mailto:[EMAIL PROTECTED] > Sent: Wednesday, April 04, 2007 10:36 AM > To: Ed Lin > Cc: linux-scsi; linux-kernel; jeff; Promise_Linux > Subject: RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue > > > On Wed, 2007-04-04 at 10:31 -0700, Ed Lin wrote: > > > Sorry. It seems the mail server has problem. The patch is > here in plain > > text. I hope this time it does not mess up. I have problem with > > linux-scsi > > mail list, if you have comment please cc me. Thanks. > > --Ed Lin > > The lines are still broken, I'm afraid ... you can just resend as an > attachement git-applypatch copes fine with that ... inline is > just good > for quoting and replying. > > + if (hba->cardtype == st_shasta) { > > + host->max_channel = 7; > > + host->max_id = 16 + 1; > > + } else if (hba->cardtype == st_yosemite) { > > + host->max_channel = 127; > > + host->max_id = 1 + 1; > > + } else { > > + /* st_vsc and st_vsc1 */ > > + host->max_channel = 0; > > + host->max_id = 128 + 1; > > This is OK. The use of ->channel is still a bit strange ... could we > not simply use lun instead of channel (i.e. map the adapter id/lun to > the mid-layer id/lun instead of using id/channel)? > This is because there is a CONFIG_SCSI_MULTI_LUN option. If this option is not selected, max_scsi_luns will be set to 1 and the RAID arrays with lun>0 will disappear(because they are not scanned). That is unacceptable from a user's view point. I have also explained this in the code comment: /* * firmware uses id/lun pair for a logical drive, but lun would be * always 0 if CONFIG_SCSI_MULTI_LUN not configured, so we use * channel to map lun here */ If it is not allowed to map channel to lun, then maybe I have to report 128 targets and do the mapping in queuecommand... After all there must be a mapping somewhere so I don't see much difference here... I paste the patch again, is the format ok? The internal id/lun mapping of st_vsc and st_vsc1 controllers is different from st_shasta. The original driver code can only map first 16 'entities' for st_vsc and st_vsc1 while there are actually 128 available. The correct mapping should be: channel:0~7, id:0~15 (st_shasta, console is channel:0, id:16, lun:0) channel:0~127, id:0 (st_yosemite, console is channel:0, id:1, lun:0) channel:0, id:0~127 (st_vsc and st_vsc1, console is channel:0, id:128, lun:0) Signed-off-by: Ed Lin <[EMAIL PROTECTED]> --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 69be132..29a7b61 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -113,10 +113,6 @@ enum { SG_CF_64B = 0x40, /* 64 bit item */ SG_CF_HOST = 0x20, /* sg in host memory */ - ST_MAX_ARRAY_SUPPORTED = 16, - ST_MAX_TARGET_NUM = (ST_MAX_ARRAY_SUPPORTED+1), - ST_MAX_LUN_PER_TARGET = 16, - st_shasta = 0, st_vsc = 1, st_vsc1 = 2, @@ -606,7 +602,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, return 0; } case INQUIRY: - if (id != ST_MAX_ARRAY_SUPPORTED) + if (id != host->max_id - 1) break; if (lun == 0 && (cmd->cmnd[1] & INQUIRY_EVPD) == 0) { stex_direct_copy(cmd, console_inq_page, @@ -624,7 +620,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, ver.oem = ST_OEM; ver.build = ST_BUILD_VER; ver.signature[0] = PASSTHRU_SIGNATURE; - ver.console_id = ST_MAX_ARRAY_SUPPORTED; + ver.console_id = host->max_id - 1; ver.host_no = hba->host->host_no; cmd->result = stex_direct_copy(cmd, , sizeof(ver)) ? DID_OK << 16 | COMMAND_COMPLETE << 8 : @@ -645,13 +641,8 @@ stex_queuecommand(struct scsi_cmnd *cmd, req = stex_alloc_req(hba); - if (hba->cardtype == st_yosemite) { - req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id; - req->target = 0; - } else { - req->lun = lun; - req->target = id; - } + req->lun = lun; + req->target = id; /* cdb */ memcpy(req->cdb, cmd->cmnd, STEX_CDB_LENGTH); @@ -1229,11 +1220,22 @@ stex_probe(struct pci_dev *pdev, const s hba->copy_buffer = hba->dma_
RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device
> -Original Message- > From: Ed Lin > Sent: Monday, April 02, 2007 4:02 PM > To: James Bottomley > Cc: linux-scsi; linux-kernel; jeff; Promise_Linux > Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for > console device > > > > > > -Original Message- > > From: James Bottomley [mailto:[EMAIL PROTECTED] > > Sent: Monday, April 02, 2007 11:28 AM > > To: Ed Lin > > Cc: linux-scsi; linux-kernel; jeff; Promise_Linux > > Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for > > console device > > > > > > On Mon, 2007-04-02 at 11:14 -0700, Ed Lin wrote: > > > I just saw the routine name scsi_eh_try_stu, and didn't notice the > > > allow_restart (partly because I thought it was not harmful...). > > > But the TEST_UNIT_READY must stay. > > > > Sure ... I was just checking since your change log implied > you'd seen > > the problem from the error handler ... however, we can add it ... > > there's a possibility of getting spin up on init from sd anyway. > > > > You make the decision. But after reconsideration, I think it's better > to remove unused code. It also needs change since the patch about > id mapping is modified in another mail. > > How about the attachment here? > Sorry. I think the mail server has problem with attachment. The patch is here in plain text. Sorry for the inconvenience. I have problem with linux-scsi mail list, if you have comment please cc me. Thanks. --Ed Lin After reset completed, the scsi error handler sends out TEST_UNIT_READY to the device. For 'normal' devices this command will be handled by firmware. However, because the RAID console only interfaces to scsi mid layer, the firmware will not process this command for it. This will make the console to be offlined right after reset. Add the handling in driver to fix this problem. Signed-off-by: Ed Lin <[EMAIL PROTECTED]> --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 4c6ce6a..85c779b 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -601,6 +601,13 @@ stex_queuecommand(struct scsi_cmnd *cmd, stex_invalid_field(cmd, done); return 0; } + case TEST_UNIT_READY: + if (id == host->max_id - 1) { + cmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8; + done(cmd); + return 0; + } + break; case INQUIRY: if (id != host->max_id - 1) break; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> -Original Message- > From: Ed Lin > Sent: Monday, April 02, 2007 4:01 PM > To: 'James Bottomley' > Cc: linux-scsi; linux-kernel; jeff; Promise_Linux > Subject: RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue > > > > > > -Original Message- > > From: James Bottomley [mailto:[EMAIL PROTECTED] > > Sent: Saturday, March 31, 2007 7:22 AM > > To: Ed Lin > > Cc: linux-scsi; linux-kernel; jeff; Promise_Linux > > Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue > > > > > > On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote: > > > The internal id/lun mapping of st_vsc and st_vsc1 > > controllers is different > > > from st_shasta. The original driver code can only map > > first 16 'entities' > > > for st_vsc and st_vsc1 while there are actually 128 available. > > > > > > Also the ST_MAX_LUN_PER_TARGET should be 8, although this can do > > > no harm because inquiries beyond boundary are discarded > by firmware. > > > > > > The correct internal mapping should be: > > > id:0~15, lun:0~7 (st_shasta) > > > id:0, lun:0~127 (st_yosemite) > > > id:0~127, lun:0 (st_vsc and st_vsc1) > > > To scsi mid layer they are all channel:0~7, id:0~15, lun:0, > > with a maximun > > > 'entity' number of 128. The RAID console only interfaces to > > scsi mid layer > > > and is always mapped at channel:0, id:16, lun:0. > > > > I'm with Christoph here ... if we're going to break the backwards > > compatibility of the mappings (which your code does) then we > > could just > > dump channel and use the SCSI id and lun directly. > > > > Understanding this code is predicated on this quirky definition in > > stex_queuecommand: > > > > id = cmd->device->id; > > lun = cmd->device->channel; /* firmware lun issue work around */ > >^^^ > > > > > @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd, > > > > > > req = stex_alloc_req(hba); > > > > > > - if (hba->cardtype == st_yosemite) { > > > - req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id; > > > > This looks to be correct, it goes up id 0 to > ST_MAX_TARGET_NUM -1 then > > takes the next channel. > > > > > - req->target = 0; > > > - } else { > > > + if (hba->cardtype == st_shasta) { > > > req->lun = lun; > > > req->target = id; > > > + } else if (hba->cardtype == st_yosemite){ > > > + req->lun = id * ST_MAX_LUN_PER_TARGET + lun; > > > + req->target = 0; > > > + } else { > > > + /* st_vsc and st_vsc1 */ > > > + req->lun = 0; > > > + req->target = id * ST_MAX_LUN_PER_TARGET + lun; > > > > These both look to be wrong. You're taking the channel as > the lowest > > common denominator, so your first target is on channel 1 id > > 0, your next > > on channel 2, id 0 and so on. That's really going to mess with the > > ordering (which will be user visible) is that really what you want? > > > > How about the attached one? > Sorry. It seems the mail server has problem. The patch is here in plain text. I hope this time it does not mess up. I have problem with linux-scsi mail list, if you have comment please cc me. Thanks. --Ed Lin The internal id/lun mapping of st_vsc and st_vsc1 controllers is different from st_shasta. The original driver code can only map first 16 'entities' for st_vsc and st_vsc1 while there are actually 128 available. The correct mapping should be: channel:0~7, id:0~15 (st_shasta, console is channel:0, id:16, lun:0) channel:0~127, id:0 (st_yosemite, console is channel:0, id:1, lun:0) channel:0, id:0~127 (st_vsc and st_vsc1, console is channel:0, id:128, lun:0) Signed-off-by: Ed Lin <[EMAIL PROTECTED]> --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 69be132..29a7b61 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -113,10 +113,6 @@ enum { SG_CF_64B = 0x40, /* 64 bit item */ SG_CF_HOST = 0x20, /* sg in host memory */ - ST_MAX_ARRAY_SUPPORTED = 16, - ST_MAX_TARGET_NUM = (ST_MAX_ARRAY_SUPPORTED+1), - ST_MAX_LUN_PER_TARGET = 16, - st_shasta = 0, st_vsc = 1, st_vsc1 = 2, @@ -606,7 +602,7 @@ ste
RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
-Original Message- From: Ed Lin Sent: Monday, April 02, 2007 4:01 PM To: 'James Bottomley' Cc: linux-scsi; linux-kernel; jeff; Promise_Linux Subject: RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue -Original Message- From: James Bottomley [mailto:[EMAIL PROTECTED] Sent: Saturday, March 31, 2007 7:22 AM To: Ed Lin Cc: linux-scsi; linux-kernel; jeff; Promise_Linux Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote: The internal id/lun mapping of st_vsc and st_vsc1 controllers is different from st_shasta. The original driver code can only map first 16 'entities' for st_vsc and st_vsc1 while there are actually 128 available. Also the ST_MAX_LUN_PER_TARGET should be 8, although this can do no harm because inquiries beyond boundary are discarded by firmware. The correct internal mapping should be: id:0~15, lun:0~7 (st_shasta) id:0, lun:0~127 (st_yosemite) id:0~127, lun:0 (st_vsc and st_vsc1) To scsi mid layer they are all channel:0~7, id:0~15, lun:0, with a maximun 'entity' number of 128. The RAID console only interfaces to scsi mid layer and is always mapped at channel:0, id:16, lun:0. I'm with Christoph here ... if we're going to break the backwards compatibility of the mappings (which your code does) then we could just dump channel and use the SCSI id and lun directly. Understanding this code is predicated on this quirky definition in stex_queuecommand: id = cmd-device-id; lun = cmd-device-channel; /* firmware lun issue work around */ ^^^ @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd, req = stex_alloc_req(hba); - if (hba-cardtype == st_yosemite) { - req-lun = lun * (ST_MAX_TARGET_NUM - 1) + id; This looks to be correct, it goes up id 0 to ST_MAX_TARGET_NUM -1 then takes the next channel. - req-target = 0; - } else { + if (hba-cardtype == st_shasta) { req-lun = lun; req-target = id; + } else if (hba-cardtype == st_yosemite){ + req-lun = id * ST_MAX_LUN_PER_TARGET + lun; + req-target = 0; + } else { + /* st_vsc and st_vsc1 */ + req-lun = 0; + req-target = id * ST_MAX_LUN_PER_TARGET + lun; These both look to be wrong. You're taking the channel as the lowest common denominator, so your first target is on channel 1 id 0, your next on channel 2, id 0 and so on. That's really going to mess with the ordering (which will be user visible) is that really what you want? How about the attached one? Sorry. It seems the mail server has problem. The patch is here in plain text. I hope this time it does not mess up. I have problem with linux-scsi mail list, if you have comment please cc me. Thanks. --Ed Lin The internal id/lun mapping of st_vsc and st_vsc1 controllers is different from st_shasta. The original driver code can only map first 16 'entities' for st_vsc and st_vsc1 while there are actually 128 available. The correct mapping should be: channel:0~7, id:0~15 (st_shasta, console is channel:0, id:16, lun:0) channel:0~127, id:0 (st_yosemite, console is channel:0, id:1, lun:0) channel:0, id:0~127 (st_vsc and st_vsc1, console is channel:0, id:128, lun:0) Signed-off-by: Ed Lin [EMAIL PROTECTED] --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 69be132..29a7b61 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -113,10 +113,6 @@ enum { SG_CF_64B = 0x40, /* 64 bit item */ SG_CF_HOST = 0x20, /* sg in host memory */ - ST_MAX_ARRAY_SUPPORTED = 16, - ST_MAX_TARGET_NUM = (ST_MAX_ARRAY_SUPPORTED+1), - ST_MAX_LUN_PER_TARGET = 16, - st_shasta = 0, st_vsc = 1, st_vsc1 = 2, @@ -606,7 +602,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, return 0; } case INQUIRY: - if (id != ST_MAX_ARRAY_SUPPORTED) + if (id != host-max_id - 1) break; if (lun == 0 (cmd-cmnd[1] INQUIRY_EVPD) == 0) { stex_direct_copy(cmd, console_inq_page, @@ -624,7 +620,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, ver.oem = ST_OEM; ver.build = ST_BUILD_VER; ver.signature[0] = PASSTHRU_SIGNATURE; - ver.console_id = ST_MAX_ARRAY_SUPPORTED; + ver.console_id = host-max_id - 1; ver.host_no = hba-host-host_no; cmd-result = stex_direct_copy(cmd, ver, sizeof(ver
RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device
-Original Message- From: Ed Lin Sent: Monday, April 02, 2007 4:02 PM To: James Bottomley Cc: linux-scsi; linux-kernel; jeff; Promise_Linux Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device -Original Message- From: James Bottomley [mailto:[EMAIL PROTECTED] Sent: Monday, April 02, 2007 11:28 AM To: Ed Lin Cc: linux-scsi; linux-kernel; jeff; Promise_Linux Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device On Mon, 2007-04-02 at 11:14 -0700, Ed Lin wrote: I just saw the routine name scsi_eh_try_stu, and didn't notice the allow_restart (partly because I thought it was not harmful...). But the TEST_UNIT_READY must stay. Sure ... I was just checking since your change log implied you'd seen the problem from the error handler ... however, we can add it ... there's a possibility of getting spin up on init from sd anyway. You make the decision. But after reconsideration, I think it's better to remove unused code. It also needs change since the patch about id mapping is modified in another mail. How about the attachment here? Sorry. I think the mail server has problem with attachment. The patch is here in plain text. Sorry for the inconvenience. I have problem with linux-scsi mail list, if you have comment please cc me. Thanks. --Ed Lin After reset completed, the scsi error handler sends out TEST_UNIT_READY to the device. For 'normal' devices this command will be handled by firmware. However, because the RAID console only interfaces to scsi mid layer, the firmware will not process this command for it. This will make the console to be offlined right after reset. Add the handling in driver to fix this problem. Signed-off-by: Ed Lin [EMAIL PROTECTED] --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 4c6ce6a..85c779b 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -601,6 +601,13 @@ stex_queuecommand(struct scsi_cmnd *cmd, stex_invalid_field(cmd, done); return 0; } + case TEST_UNIT_READY: + if (id == host-max_id - 1) { + cmd-result = DID_OK 16 | COMMAND_COMPLETE 8; + done(cmd); + return 0; + } + break; case INQUIRY: if (id != host-max_id - 1) break; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
-Original Message- From: James Bottomley [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 04, 2007 10:36 AM To: Ed Lin Cc: linux-scsi; linux-kernel; jeff; Promise_Linux Subject: RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue On Wed, 2007-04-04 at 10:31 -0700, Ed Lin wrote: Sorry. It seems the mail server has problem. The patch is here in plain text. I hope this time it does not mess up. I have problem with linux-scsi mail list, if you have comment please cc me. Thanks. --Ed Lin The lines are still broken, I'm afraid ... you can just resend as an attachement git-applypatch copes fine with that ... inline is just good for quoting and replying. + if (hba-cardtype == st_shasta) { + host-max_channel = 7; + host-max_id = 16 + 1; + } else if (hba-cardtype == st_yosemite) { + host-max_channel = 127; + host-max_id = 1 + 1; + } else { + /* st_vsc and st_vsc1 */ + host-max_channel = 0; + host-max_id = 128 + 1; This is OK. The use of -channel is still a bit strange ... could we not simply use lun instead of channel (i.e. map the adapter id/lun to the mid-layer id/lun instead of using id/channel)? This is because there is a CONFIG_SCSI_MULTI_LUN option. If this option is not selected, max_scsi_luns will be set to 1 and the RAID arrays with lun0 will disappear(because they are not scanned). That is unacceptable from a user's view point. I have also explained this in the code comment: /* * firmware uses id/lun pair for a logical drive, but lun would be * always 0 if CONFIG_SCSI_MULTI_LUN not configured, so we use * channel to map lun here */ If it is not allowed to map channel to lun, then maybe I have to report 128 targets and do the mapping in queuecommand... After all there must be a mapping somewhere so I don't see much difference here... I paste the patch again, is the format ok? The internal id/lun mapping of st_vsc and st_vsc1 controllers is different from st_shasta. The original driver code can only map first 16 'entities' for st_vsc and st_vsc1 while there are actually 128 available. The correct mapping should be: channel:0~7, id:0~15 (st_shasta, console is channel:0, id:16, lun:0) channel:0~127, id:0 (st_yosemite, console is channel:0, id:1, lun:0) channel:0, id:0~127 (st_vsc and st_vsc1, console is channel:0, id:128, lun:0) Signed-off-by: Ed Lin [EMAIL PROTECTED] --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 69be132..29a7b61 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -113,10 +113,6 @@ enum { SG_CF_64B = 0x40, /* 64 bit item */ SG_CF_HOST = 0x20, /* sg in host memory */ - ST_MAX_ARRAY_SUPPORTED = 16, - ST_MAX_TARGET_NUM = (ST_MAX_ARRAY_SUPPORTED+1), - ST_MAX_LUN_PER_TARGET = 16, - st_shasta = 0, st_vsc = 1, st_vsc1 = 2, @@ -606,7 +602,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, return 0; } case INQUIRY: - if (id != ST_MAX_ARRAY_SUPPORTED) + if (id != host-max_id - 1) break; if (lun == 0 (cmd-cmnd[1] INQUIRY_EVPD) == 0) { stex_direct_copy(cmd, console_inq_page, @@ -624,7 +620,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, ver.oem = ST_OEM; ver.build = ST_BUILD_VER; ver.signature[0] = PASSTHRU_SIGNATURE; - ver.console_id = ST_MAX_ARRAY_SUPPORTED; + ver.console_id = host-max_id - 1; ver.host_no = hba-host-host_no; cmd-result = stex_direct_copy(cmd, ver, sizeof(ver)) ? DID_OK 16 | COMMAND_COMPLETE 8 : @@ -645,13 +641,8 @@ stex_queuecommand(struct scsi_cmnd *cmd, req = stex_alloc_req(hba); - if (hba-cardtype == st_yosemite) { - req-lun = lun * (ST_MAX_TARGET_NUM - 1) + id; - req-target = 0; - } else { - req-lun = lun; - req-target = id; - } + req-lun = lun; + req-target = id; /* cdb */ memcpy(req-cdb, cmd-cmnd, STEX_CDB_LENGTH); @@ -1229,11 +1220,22 @@ stex_probe(struct pci_dev *pdev, const s hba-copy_buffer = hba-dma_mem + MU_BUFFER_SIZE; hba-mu_status = MU_STATE_STARTING; - /* firmware uses id/lun pair for a logical drive, but lun would be - always 0 if CONFIG_SCSI_MULTI_LUN not configured, so we use - channel to map lun here */ - host-max_channel = ST_MAX_LUN_PER_TARGET - 1; - host-max_id = ST_MAX_TARGET_NUM
RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device
> -Original Message- > From: James Bottomley [mailto:[EMAIL PROTECTED] > Sent: Monday, April 02, 2007 11:28 AM > To: Ed Lin > Cc: linux-scsi; linux-kernel; jeff; Promise_Linux > Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for > console device > > > On Mon, 2007-04-02 at 11:14 -0700, Ed Lin wrote: > > I just saw the routine name scsi_eh_try_stu, and didn't notice the > > allow_restart (partly because I thought it was not harmful...). > > But the TEST_UNIT_READY must stay. > > Sure ... I was just checking since your change log implied you'd seen > the problem from the error handler ... however, we can add it ... > there's a possibility of getting spin up on init from sd anyway. > You make the decision. But after reconsideration, I think it's better to remove unused code. It also needs change since the patch about id mapping is modified in another mail. How about the attachment here? s3 Description: s3
RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> -Original Message- > From: James Bottomley [mailto:[EMAIL PROTECTED] > Sent: Saturday, March 31, 2007 7:22 AM > To: Ed Lin > Cc: linux-scsi; linux-kernel; jeff; Promise_Linux > Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue > > > On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote: > > The internal id/lun mapping of st_vsc and st_vsc1 > controllers is different > > from st_shasta. The original driver code can only map > first 16 'entities' > > for st_vsc and st_vsc1 while there are actually 128 available. > > > > Also the ST_MAX_LUN_PER_TARGET should be 8, although this can do > > no harm because inquiries beyond boundary are discarded by firmware. > > > > The correct internal mapping should be: > > id:0~15, lun:0~7 (st_shasta) > > id:0, lun:0~127 (st_yosemite) > > id:0~127, lun:0 (st_vsc and st_vsc1) > > To scsi mid layer they are all channel:0~7, id:0~15, lun:0, > with a maximun > > 'entity' number of 128. The RAID console only interfaces to > scsi mid layer > > and is always mapped at channel:0, id:16, lun:0. > > I'm with Christoph here ... if we're going to break the backwards > compatibility of the mappings (which your code does) then we > could just > dump channel and use the SCSI id and lun directly. > > Understanding this code is predicated on this quirky definition in > stex_queuecommand: > > id = cmd->device->id; > lun = cmd->device->channel; /* firmware lun issue work around */ >^^^ > > > @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd, > > > > req = stex_alloc_req(hba); > > > > - if (hba->cardtype == st_yosemite) { > > - req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id; > > This looks to be correct, it goes up id 0 to ST_MAX_TARGET_NUM -1 then > takes the next channel. > > > - req->target = 0; > > - } else { > > + if (hba->cardtype == st_shasta) { > > req->lun = lun; > > req->target = id; > > + } else if (hba->cardtype == st_yosemite){ > > + req->lun = id * ST_MAX_LUN_PER_TARGET + lun; > > + req->target = 0; > > + } else { > > + /* st_vsc and st_vsc1 */ > > + req->lun = 0; > > + req->target = id * ST_MAX_LUN_PER_TARGET + lun; > > These both look to be wrong. You're taking the channel as the lowest > common denominator, so your first target is on channel 1 id > 0, your next > on channel 2, id 0 and so on. That's really going to mess with the > ordering (which will be user visible) is that really what you want? > How about the attached one? s1 Description: s1
RE: [PATCH 4/4] [SCSI]stex: minor cleanup and version update
> -Original Message- > From: Brian King [mailto:[EMAIL PROTECTED] > Sent: Monday, April 02, 2007 9:05 AM > To: Ed Lin > Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux > Subject: Re: [PATCH 4/4] [SCSI]stex: minor cleanup and version update > > > Ed Lin wrote: > > @@ -1007,6 +1008,11 @@ static int stex_abort(struct scsi_cmnd * > > u32 data; > > int result = SUCCESS; > > unsigned long flags; > > + > > + printk(KERN_INFO DRV_NAME > > + "(%s): aborting command\n", pci_name(hba->pdev)); > > + scsi_print_command(cmd); > > + > > scmd_printk is probably what you want here rather than just a printk. > > scmd_printk(KERN_ERR, cmd, "aborting command\n"); > > > > @@ -1092,6 +1098,10 @@ static int stex_reset(struct scsi_cmnd * > > unsigned long before; > > hba = (struct st_hba *) >device->host->hostdata[0]; > > > > + printk(KERN_INFO DRV_NAME > > + "(%s): resetting host\n", pci_name(hba->pdev)); > > + scsi_print_command(cmd); > > + > > Same here. Well it's just because printk with pci_name stuff is used across the driver, and I didn't want to break the rule... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device
> -Original Message- > From: James Bottomley [mailto:[EMAIL PROTECTED] > Sent: Saturday, March 31, 2007 11:46 AM > To: Ed Lin > Cc: linux-scsi; linux-kernel; jeff; Promise_Linux > Subject: Re: [PATCH 3/4] [SCSI]stex: fix reset recovery for > console device > > > On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote: > > After reset completed, the scsi error handler sends out START_STOP > > and TEST_UNIT_READY to the device. For 'normal' devices these > > commands will be handled by firmware. However, because the RAID > > console only interfaces to scsi mid layer, the firmware > will not process > > these commands for it. This will make the console to be > offlined right > > after reset. Add the handling in driver to fix this problem. > > I don't see how this explanation can be correct. The error > handler only > sends a START_STOP command if sdev->allow_restart is one, > which you have > to set in the slave_configure routines (which stex doesn't). > If you're > seeing a START_STOP in the eh path, there's something else wrong. > TEST_UNIT_READY, certainly ... it's part of a restart check. > I just saw the routine name scsi_eh_try_stu, and didn't notice the allow_restart (partly because I thought it was not harmful...). But the TEST_UNIT_READY must stay. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> -Original Message- > From: James Bottomley [mailto:[EMAIL PROTECTED] > Sent: Saturday, March 31, 2007 7:22 AM > To: Ed Lin > Cc: linux-scsi; linux-kernel; jeff; Promise_Linux > Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue > > > On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote: > > The internal id/lun mapping of st_vsc and st_vsc1 > controllers is different > > from st_shasta. The original driver code can only map > first 16 'entities' > > for st_vsc and st_vsc1 while there are actually 128 available. > > > > Also the ST_MAX_LUN_PER_TARGET should be 8, although this can do > > no harm because inquiries beyond boundary are discarded by firmware. > > > > The correct internal mapping should be: > > id:0~15, lun:0~7 (st_shasta) > > id:0, lun:0~127 (st_yosemite) > > id:0~127, lun:0 (st_vsc and st_vsc1) > > To scsi mid layer they are all channel:0~7, id:0~15, lun:0, > with a maximun > > 'entity' number of 128. The RAID console only interfaces to > scsi mid layer > > and is always mapped at channel:0, id:16, lun:0. > > I'm with Christoph here ... if we're going to break the backwards > compatibility of the mappings (which your code does) then we > could just > dump channel and use the SCSI id and lun directly. > > Understanding this code is predicated on this quirky definition in > stex_queuecommand: > > id = cmd->device->id; > lun = cmd->device->channel; /* firmware lun issue work around */ >^^^ > > > @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd, > > > > req = stex_alloc_req(hba); > > > > - if (hba->cardtype == st_yosemite) { > > - req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id; > > This looks to be correct, it goes up id 0 to ST_MAX_TARGET_NUM -1 then > takes the next channel. > > > - req->target = 0; > > - } else { > > + if (hba->cardtype == st_shasta) { > > req->lun = lun; > > req->target = id; > > + } else if (hba->cardtype == st_yosemite){ > > + req->lun = id * ST_MAX_LUN_PER_TARGET + lun; > > + req->target = 0; > > + } else { > > + /* st_vsc and st_vsc1 */ > > + req->lun = 0; > > + req->target = id * ST_MAX_LUN_PER_TARGET + lun; > > These both look to be wrong. You're taking the channel as the lowest > common denominator, so your first target is on channel 1 id > 0, your next > on channel 2, id 0 and so on. That's really going to mess with the > ordering (which will be user visible) is that really what you want? > Well the firmware doesn't care about the scanning order. You can jump first on 0,8,16... and then go back to 1,9,17... These are consistent to the user if the code is that way(may break previous impression though). Anyway this is a simulation and I can make whatever necessary adjustment. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> -Original Message- > From: Christoph Hellwig [mailto:[EMAIL PROTECTED] > Sent: Saturday, March 31, 2007 2:27 AM > To: Ed Lin > Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux > Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue > > > On Fri, Mar 30, 2007 at 03:21:33PM -0700, Ed Lin wrote: > > + if (hba->cardtype == st_shasta) { > > req->lun = lun; > > req->target = id; > > + } else if (hba->cardtype == st_yosemite){ > > + req->lun = id * ST_MAX_LUN_PER_TARGET + lun; > > + req->target = 0; > > + } else { > > + /* st_vsc and st_vsc1 */ > > + req->lun = 0; > > + req->target = id * ST_MAX_LUN_PER_TARGET + lun; > > I don't get why you can't export id as targer and lun as lun for > the !st_shasta types. Could you explain in detail what the problem > with that approach would be? > > Of course I can do that. That will result in 1 target and 128 lun for st_yosemite and 128 target and 1 lun for st_vsc. That seems a little weird and I am afraid it will be turned down. Also I can keep a same mapping for the console in the original code. If you think it's ok, that's really better, because it makes the hot path a bit faster. Also because of the CONFIG_SCSI_MULTI_LUN option, I have to map lun to channel otherwise many entities will disappear when that option is not selected. Plus I have to reserve a slot for the RAID console, so the final mapping may be: channel:0~7, id:0~16(st_shasta, channel 0,id 16 is reserved for console) channel:0~127, id:0~1(st_yosemite, channel 0,id 1 is reserved for console) channel:0, id:0~128(st_vsc, channel 0,id 128 is reserved for console) I don't know whether this is acceptable. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
-Original Message- From: Christoph Hellwig [mailto:[EMAIL PROTECTED] Sent: Saturday, March 31, 2007 2:27 AM To: Ed Lin Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue On Fri, Mar 30, 2007 at 03:21:33PM -0700, Ed Lin wrote: + if (hba-cardtype == st_shasta) { req-lun = lun; req-target = id; + } else if (hba-cardtype == st_yosemite){ + req-lun = id * ST_MAX_LUN_PER_TARGET + lun; + req-target = 0; + } else { + /* st_vsc and st_vsc1 */ + req-lun = 0; + req-target = id * ST_MAX_LUN_PER_TARGET + lun; I don't get why you can't export id as targer and lun as lun for the !st_shasta types. Could you explain in detail what the problem with that approach would be? Of course I can do that. That will result in 1 target and 128 lun for st_yosemite and 128 target and 1 lun for st_vsc. That seems a little weird and I am afraid it will be turned down. Also I can keep a same mapping for the console in the original code. If you think it's ok, that's really better, because it makes the hot path a bit faster. Also because of the CONFIG_SCSI_MULTI_LUN option, I have to map lun to channel otherwise many entities will disappear when that option is not selected. Plus I have to reserve a slot for the RAID console, so the final mapping may be: channel:0~7, id:0~16(st_shasta, channel 0,id 16 is reserved for console) channel:0~127, id:0~1(st_yosemite, channel 0,id 1 is reserved for console) channel:0, id:0~128(st_vsc, channel 0,id 128 is reserved for console) I don't know whether this is acceptable. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
-Original Message- From: James Bottomley [mailto:[EMAIL PROTECTED] Sent: Saturday, March 31, 2007 7:22 AM To: Ed Lin Cc: linux-scsi; linux-kernel; jeff; Promise_Linux Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote: The internal id/lun mapping of st_vsc and st_vsc1 controllers is different from st_shasta. The original driver code can only map first 16 'entities' for st_vsc and st_vsc1 while there are actually 128 available. Also the ST_MAX_LUN_PER_TARGET should be 8, although this can do no harm because inquiries beyond boundary are discarded by firmware. The correct internal mapping should be: id:0~15, lun:0~7 (st_shasta) id:0, lun:0~127 (st_yosemite) id:0~127, lun:0 (st_vsc and st_vsc1) To scsi mid layer they are all channel:0~7, id:0~15, lun:0, with a maximun 'entity' number of 128. The RAID console only interfaces to scsi mid layer and is always mapped at channel:0, id:16, lun:0. I'm with Christoph here ... if we're going to break the backwards compatibility of the mappings (which your code does) then we could just dump channel and use the SCSI id and lun directly. Understanding this code is predicated on this quirky definition in stex_queuecommand: id = cmd-device-id; lun = cmd-device-channel; /* firmware lun issue work around */ ^^^ @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd, req = stex_alloc_req(hba); - if (hba-cardtype == st_yosemite) { - req-lun = lun * (ST_MAX_TARGET_NUM - 1) + id; This looks to be correct, it goes up id 0 to ST_MAX_TARGET_NUM -1 then takes the next channel. - req-target = 0; - } else { + if (hba-cardtype == st_shasta) { req-lun = lun; req-target = id; + } else if (hba-cardtype == st_yosemite){ + req-lun = id * ST_MAX_LUN_PER_TARGET + lun; + req-target = 0; + } else { + /* st_vsc and st_vsc1 */ + req-lun = 0; + req-target = id * ST_MAX_LUN_PER_TARGET + lun; These both look to be wrong. You're taking the channel as the lowest common denominator, so your first target is on channel 1 id 0, your next on channel 2, id 0 and so on. That's really going to mess with the ordering (which will be user visible) is that really what you want? Well the firmware doesn't care about the scanning order. You can jump first on 0,8,16... and then go back to 1,9,17... These are consistent to the user if the code is that way(may break previous impression though). Anyway this is a simulation and I can make whatever necessary adjustment. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device
-Original Message- From: James Bottomley [mailto:[EMAIL PROTECTED] Sent: Saturday, March 31, 2007 11:46 AM To: Ed Lin Cc: linux-scsi; linux-kernel; jeff; Promise_Linux Subject: Re: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote: After reset completed, the scsi error handler sends out START_STOP and TEST_UNIT_READY to the device. For 'normal' devices these commands will be handled by firmware. However, because the RAID console only interfaces to scsi mid layer, the firmware will not process these commands for it. This will make the console to be offlined right after reset. Add the handling in driver to fix this problem. I don't see how this explanation can be correct. The error handler only sends a START_STOP command if sdev-allow_restart is one, which you have to set in the slave_configure routines (which stex doesn't). If you're seeing a START_STOP in the eh path, there's something else wrong. TEST_UNIT_READY, certainly ... it's part of a restart check. I just saw the routine name scsi_eh_try_stu, and didn't notice the allow_restart (partly because I thought it was not harmful...). But the TEST_UNIT_READY must stay. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 4/4] [SCSI]stex: minor cleanup and version update
-Original Message- From: Brian King [mailto:[EMAIL PROTECTED] Sent: Monday, April 02, 2007 9:05 AM To: Ed Lin Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux Subject: Re: [PATCH 4/4] [SCSI]stex: minor cleanup and version update Ed Lin wrote: @@ -1007,6 +1008,11 @@ static int stex_abort(struct scsi_cmnd * u32 data; int result = SUCCESS; unsigned long flags; + + printk(KERN_INFO DRV_NAME + (%s): aborting command\n, pci_name(hba-pdev)); + scsi_print_command(cmd); + scmd_printk is probably what you want here rather than just a printk. scmd_printk(KERN_ERR, cmd, aborting command\n); @@ -1092,6 +1098,10 @@ static int stex_reset(struct scsi_cmnd * unsigned long before; hba = (struct st_hba *) cmd-device-host-hostdata[0]; + printk(KERN_INFO DRV_NAME + (%s): resetting host\n, pci_name(hba-pdev)); + scsi_print_command(cmd); + Same here. Well it's just because printk with pci_name stuff is used across the driver, and I didn't want to break the rule... - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
-Original Message- From: James Bottomley [mailto:[EMAIL PROTECTED] Sent: Saturday, March 31, 2007 7:22 AM To: Ed Lin Cc: linux-scsi; linux-kernel; jeff; Promise_Linux Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote: The internal id/lun mapping of st_vsc and st_vsc1 controllers is different from st_shasta. The original driver code can only map first 16 'entities' for st_vsc and st_vsc1 while there are actually 128 available. Also the ST_MAX_LUN_PER_TARGET should be 8, although this can do no harm because inquiries beyond boundary are discarded by firmware. The correct internal mapping should be: id:0~15, lun:0~7 (st_shasta) id:0, lun:0~127 (st_yosemite) id:0~127, lun:0 (st_vsc and st_vsc1) To scsi mid layer they are all channel:0~7, id:0~15, lun:0, with a maximun 'entity' number of 128. The RAID console only interfaces to scsi mid layer and is always mapped at channel:0, id:16, lun:0. I'm with Christoph here ... if we're going to break the backwards compatibility of the mappings (which your code does) then we could just dump channel and use the SCSI id and lun directly. Understanding this code is predicated on this quirky definition in stex_queuecommand: id = cmd-device-id; lun = cmd-device-channel; /* firmware lun issue work around */ ^^^ @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd, req = stex_alloc_req(hba); - if (hba-cardtype == st_yosemite) { - req-lun = lun * (ST_MAX_TARGET_NUM - 1) + id; This looks to be correct, it goes up id 0 to ST_MAX_TARGET_NUM -1 then takes the next channel. - req-target = 0; - } else { + if (hba-cardtype == st_shasta) { req-lun = lun; req-target = id; + } else if (hba-cardtype == st_yosemite){ + req-lun = id * ST_MAX_LUN_PER_TARGET + lun; + req-target = 0; + } else { + /* st_vsc and st_vsc1 */ + req-lun = 0; + req-target = id * ST_MAX_LUN_PER_TARGET + lun; These both look to be wrong. You're taking the channel as the lowest common denominator, so your first target is on channel 1 id 0, your next on channel 2, id 0 and so on. That's really going to mess with the ordering (which will be user visible) is that really what you want? How about the attached one? s1 Description: s1
RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device
-Original Message- From: James Bottomley [mailto:[EMAIL PROTECTED] Sent: Monday, April 02, 2007 11:28 AM To: Ed Lin Cc: linux-scsi; linux-kernel; jeff; Promise_Linux Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device On Mon, 2007-04-02 at 11:14 -0700, Ed Lin wrote: I just saw the routine name scsi_eh_try_stu, and didn't notice the allow_restart (partly because I thought it was not harmful...). But the TEST_UNIT_READY must stay. Sure ... I was just checking since your change log implied you'd seen the problem from the error handler ... however, we can add it ... there's a possibility of getting spin up on init from sd anyway. You make the decision. But after reconsideration, I think it's better to remove unused code. It also needs change since the patch about id mapping is modified in another mail. How about the attachment here? s3 Description: s3
[PATCH 4/4] [SCSI]stex: minor cleanup and version update
Add debug information into abort and host_reset routine. Change ioremap to ioremap_nocache. Version updated to 3.6..1. Signed-off-by: Ed Lin <[EMAIL PROTECTED]> --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 9465f35..5a10cfa 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -32,11 +32,12 @@ #include #include #include #include +#include #define DRV_NAME "stex" -#define ST_DRIVER_VERSION "3.1.0.1" +#define ST_DRIVER_VERSION "3.6..1" #define ST_VER_MAJOR 3 -#define ST_VER_MINOR 1 +#define ST_VER_MINOR 6 #define ST_OEM 0 #define ST_BUILD_VER 1 @@ -1007,6 +1008,11 @@ static int stex_abort(struct scsi_cmnd * u32 data; int result = SUCCESS; unsigned long flags; + + printk(KERN_INFO DRV_NAME + "(%s): aborting command\n", pci_name(hba->pdev)); + scsi_print_command(cmd); + base = hba->mmio_base; spin_lock_irqsave(host->host_lock, flags); if (tag < host->can_queue && hba->ccb[tag].cmd == cmd) @@ -1092,6 +1098,10 @@ static int stex_reset(struct scsi_cmnd * unsigned long before; hba = (struct st_hba *) >device->host->hostdata[0]; + printk(KERN_INFO DRV_NAME + "(%s): resetting host\n", pci_name(hba->pdev)); + scsi_print_command(cmd); + hba->mu_status = MU_STATE_RESETTING; if (hba->cardtype == st_shasta) @@ -1211,7 +1221,7 @@ stex_probe(struct pci_dev *pdev, const s goto out_scsi_host_put; } - hba->mmio_base = ioremap(pci_resource_start(pdev, 0), + hba->mmio_base = ioremap_nocache(pci_resource_start(pdev, 0), pci_resource_len(pdev, 0)); if ( !hba->mmio_base) { printk(KERN_ERR DRV_NAME "(%s): memory map failed\n", - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/4] [SCSI]stex: extend hard reset wait time
During hard bus reset of st_shasta controllers, 1 ms is not enough for 16-port controllers, although it's good for 8-port controllers. Extend the wait time to 100 ms to allow bus resets finish successfully. Signed-off-by: Ed Lin <[EMAIL PROTECTED]> --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 4d68533..1e8d7ac 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -1055,7 +1055,12 @@ static void stex_hard_reset(struct st_hb pci_read_config_byte(bus->self, PCI_BRIDGE_CONTROL, _bctl); pci_bctl |= PCI_BRIDGE_CTL_BUS_RESET; pci_write_config_byte(bus->self, PCI_BRIDGE_CONTROL, pci_bctl); - msleep(1); + + /* +* 1 ms may be enough for 8-port controllers. But 16-port controllers +* require more time to finish bus reset. Use 100 ms here for safety +*/ + msleep(100); pci_bctl &= ~PCI_BRIDGE_CTL_BUS_RESET; pci_write_config_byte(bus->self, PCI_BRIDGE_CONTROL, pci_bctl); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/4] [SCSI]stex: fix reset recovery for console device
After reset completed, the scsi error handler sends out START_STOP and TEST_UNIT_READY to the device. For 'normal' devices these commands will be handled by firmware. However, because the RAID console only interfaces to scsi mid layer, the firmware will not process these commands for it. This will make the console to be offlined right after reset. Add the handling in driver to fix this problem. Signed-off-by: Ed Lin <[EMAIL PROTECTED]> --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 1e8d7ac..9465f35 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -605,6 +605,14 @@ stex_queuecommand(struct scsi_cmnd *cmd, stex_invalid_field(cmd, done); return 0; } + case TEST_UNIT_READY: + case START_STOP: + if (id == ST_MAX_ARRAY_SUPPORTED) { + cmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8; + done(cmd); + return 0; + } + break; case INQUIRY: if (id != ST_MAX_ARRAY_SUPPORTED) break; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/4] [SCSI]stex: fix id mapping issue
The internal id/lun mapping of st_vsc and st_vsc1 controllers is different from st_shasta. The original driver code can only map first 16 'entities' for st_vsc and st_vsc1 while there are actually 128 available. Also the ST_MAX_LUN_PER_TARGET should be 8, although this can do no harm because inquiries beyond boundary are discarded by firmware. The correct internal mapping should be: id:0~15, lun:0~7 (st_shasta) id:0, lun:0~127 (st_yosemite) id:0~127, lun:0 (st_vsc and st_vsc1) To scsi mid layer they are all channel:0~7, id:0~15, lun:0, with a maximun 'entity' number of 128. The RAID console only interfaces to scsi mid layer and is always mapped at channel:0, id:16, lun:0. Signed-off-by: Ed Lin <[EMAIL PROTECTED]> --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 69be132..4d68533 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -115,7 +115,7 @@ enum { ST_MAX_ARRAY_SUPPORTED = 16, ST_MAX_TARGET_NUM = (ST_MAX_ARRAY_SUPPORTED+1), - ST_MAX_LUN_PER_TARGET = 16, + ST_MAX_LUN_PER_TARGET = 8, st_shasta = 0, st_vsc = 1, @@ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd, req = stex_alloc_req(hba); - if (hba->cardtype == st_yosemite) { - req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id; - req->target = 0; - } else { + if (hba->cardtype == st_shasta) { req->lun = lun; req->target = id; + } else if (hba->cardtype == st_yosemite){ + req->lun = id * ST_MAX_LUN_PER_TARGET + lun; + req->target = 0; + } else { + /* st_vsc and st_vsc1 */ + req->lun = 0; + req->target = id * ST_MAX_LUN_PER_TARGET + lun; } /* cdb */ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/4] [SCSI]stex: fix id mapping issue
The internal id/lun mapping of st_vsc and st_vsc1 controllers is different from st_shasta. The original driver code can only map first 16 'entities' for st_vsc and st_vsc1 while there are actually 128 available. Also the ST_MAX_LUN_PER_TARGET should be 8, although this can do no harm because inquiries beyond boundary are discarded by firmware. The correct internal mapping should be: id:0~15, lun:0~7 (st_shasta) id:0, lun:0~127 (st_yosemite) id:0~127, lun:0 (st_vsc and st_vsc1) To scsi mid layer they are all channel:0~7, id:0~15, lun:0, with a maximun 'entity' number of 128. The RAID console only interfaces to scsi mid layer and is always mapped at channel:0, id:16, lun:0. Signed-off-by: Ed Lin [EMAIL PROTECTED] --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 69be132..4d68533 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -115,7 +115,7 @@ enum { ST_MAX_ARRAY_SUPPORTED = 16, ST_MAX_TARGET_NUM = (ST_MAX_ARRAY_SUPPORTED+1), - ST_MAX_LUN_PER_TARGET = 16, + ST_MAX_LUN_PER_TARGET = 8, st_shasta = 0, st_vsc = 1, @@ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd, req = stex_alloc_req(hba); - if (hba-cardtype == st_yosemite) { - req-lun = lun * (ST_MAX_TARGET_NUM - 1) + id; - req-target = 0; - } else { + if (hba-cardtype == st_shasta) { req-lun = lun; req-target = id; + } else if (hba-cardtype == st_yosemite){ + req-lun = id * ST_MAX_LUN_PER_TARGET + lun; + req-target = 0; + } else { + /* st_vsc and st_vsc1 */ + req-lun = 0; + req-target = id * ST_MAX_LUN_PER_TARGET + lun; } /* cdb */ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/4] [SCSI]stex: extend hard reset wait time
During hard bus reset of st_shasta controllers, 1 ms is not enough for 16-port controllers, although it's good for 8-port controllers. Extend the wait time to 100 ms to allow bus resets finish successfully. Signed-off-by: Ed Lin [EMAIL PROTECTED] --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 4d68533..1e8d7ac 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -1055,7 +1055,12 @@ static void stex_hard_reset(struct st_hb pci_read_config_byte(bus-self, PCI_BRIDGE_CONTROL, pci_bctl); pci_bctl |= PCI_BRIDGE_CTL_BUS_RESET; pci_write_config_byte(bus-self, PCI_BRIDGE_CONTROL, pci_bctl); - msleep(1); + + /* +* 1 ms may be enough for 8-port controllers. But 16-port controllers +* require more time to finish bus reset. Use 100 ms here for safety +*/ + msleep(100); pci_bctl = ~PCI_BRIDGE_CTL_BUS_RESET; pci_write_config_byte(bus-self, PCI_BRIDGE_CONTROL, pci_bctl); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/4] [SCSI]stex: fix reset recovery for console device
After reset completed, the scsi error handler sends out START_STOP and TEST_UNIT_READY to the device. For 'normal' devices these commands will be handled by firmware. However, because the RAID console only interfaces to scsi mid layer, the firmware will not process these commands for it. This will make the console to be offlined right after reset. Add the handling in driver to fix this problem. Signed-off-by: Ed Lin [EMAIL PROTECTED] --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 1e8d7ac..9465f35 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -605,6 +605,14 @@ stex_queuecommand(struct scsi_cmnd *cmd, stex_invalid_field(cmd, done); return 0; } + case TEST_UNIT_READY: + case START_STOP: + if (id == ST_MAX_ARRAY_SUPPORTED) { + cmd-result = DID_OK 16 | COMMAND_COMPLETE 8; + done(cmd); + return 0; + } + break; case INQUIRY: if (id != ST_MAX_ARRAY_SUPPORTED) break; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/4] [SCSI]stex: minor cleanup and version update
Add debug information into abort and host_reset routine. Change ioremap to ioremap_nocache. Version updated to 3.6..1. Signed-off-by: Ed Lin [EMAIL PROTECTED] --- diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 9465f35..5a10cfa 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -32,11 +32,12 @@ #include scsi/scsi_device.h #include scsi/scsi_cmnd.h #include scsi/scsi_host.h #include scsi/scsi_tcq.h +#include scsi/scsi_dbg.h #define DRV_NAME stex -#define ST_DRIVER_VERSION 3.1.0.1 +#define ST_DRIVER_VERSION 3.6..1 #define ST_VER_MAJOR 3 -#define ST_VER_MINOR 1 +#define ST_VER_MINOR 6 #define ST_OEM 0 #define ST_BUILD_VER 1 @@ -1007,6 +1008,11 @@ static int stex_abort(struct scsi_cmnd * u32 data; int result = SUCCESS; unsigned long flags; + + printk(KERN_INFO DRV_NAME + (%s): aborting command\n, pci_name(hba-pdev)); + scsi_print_command(cmd); + base = hba-mmio_base; spin_lock_irqsave(host-host_lock, flags); if (tag host-can_queue hba-ccb[tag].cmd == cmd) @@ -1092,6 +1098,10 @@ static int stex_reset(struct scsi_cmnd * unsigned long before; hba = (struct st_hba *) cmd-device-host-hostdata[0]; + printk(KERN_INFO DRV_NAME + (%s): resetting host\n, pci_name(hba-pdev)); + scsi_print_command(cmd); + hba-mu_status = MU_STATE_RESETTING; if (hba-cardtype == st_shasta) @@ -1211,7 +1221,7 @@ stex_probe(struct pci_dev *pdev, const s goto out_scsi_host_put; } - hba-mmio_base = ioremap(pci_resource_start(pdev, 0), + hba-mmio_base = ioremap_nocache(pci_resource_start(pdev, 0), pci_resource_len(pdev, 0)); if ( !hba-mmio_base) { printk(KERN_ERR DRV_NAME (%s): memory map failed\n, - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [patch] scsi: use lock per host instead of per device for shared queue tag host
> -Original Message- > From: Jens Axboe [mailto:[EMAIL PROTECTED] > Sent: Thursday, January 25, 2007 7:48 AM > To: Ed Lin > Cc: David Somayajulu; Michael Reed; linux-scsi; linux-kernel; > james.Bottomley; jeff; Promise_Linux > Subject: Re: [patch] scsi: use lock per host instead of per > device for shared queue tag host > > > On Thu, Jan 25 2007, Jens Axboe wrote: > > On Wed, Jan 24 2007, Ed Lin wrote: > > > > > > > > > > -Original Message- > > > > From: David Somayajulu [mailto:[EMAIL PROTECTED] > > > > Sent: Wednesday, January 24, 2007 5:03 PM > > > > To: Ed Lin; Michael Reed > > > > Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; > > > > Promise_Linux; Jens Axboe > > > > Subject: RE: [patch] scsi: use lock per host instead of per > > > > device for shared queue tag host > > > > > > > > > > > > > It seems another driver(qla4xxx) is also using shared > queue tag. > > > > > It is natural to imagine there might be same symptom in that > > > > > driver. But I don't know the driver and have no hardware so I > > > > > can not say anything certain about it. > > > > > > > > qla4xxx implements slightly differently, in the sense we > > > > don't have the > > > > equivalent of > > > > struct st_ccb ccb[MU_MAX_REQUEST]; > > > > which is in struct st_hba. In other words we don't have > a local array > > > > which like stex to keep track of the outstanding > commands to the hba. > > > > > > > > We had a discussion on this one while implementing > block-layer tagging > > > > in qla4xxx and Jens Axboe added the test_and_set_bit() in the > > > > following > > > > code in blk_queue_start_tag() to take care of it. > > > > do { > > > > tag = find_first_zero_bit(bqt->tag_map, > bqt->max_depth); > > > > if (tag >= bqt->max_depth) > > > > return 1; > > > > } while (test_and_set_bit(tag, bqt->tag_map)); > > > > Please see the following link for the discussion > > > > http://marc.theaimsgroup.com/?l=linux-scsi=115886351206726=2 > > > > > > > > Cheers > > > > David Somayajulu > > > > QLogic Corporation > > > > > > > > > > Yes, this piece of code of allocating tag, in itself, is safe. > > > But the following > > > > > > if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) { > > > printk(KERN_ERR "%s: attempt to clear non-busy tag > > > (%d)\n", > > > __FUNCTION__, tag); > > > return; > > > } > > > > > > code of freeing tag (in blk_queue_end_tag())seems to be using > > > unsafe __test_and_clear_bit instead of test_and_clear_bit. > > > I once changed it to test_and_clear_bit and thought it was fixed. > > > But the panic happened thereafter nonetheless(using gcc 3.4.6. > > > gcc 4.1.0 is better but still with kernel errors). bqt also needs > > > to be protected in this case. Replacing queue lock per device with > > > a host lock is a simple but logical fix for it. To introduce a > > > more refined lock is possible, but seems too tedious and elaborate > > > for this issue, since a queue lock is already out there, and a > > > hostwide lock is needed anyway. > > > > Does this fix it? There really should be no need to add > extra locking > > for this, it would be a shame. > > > > diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c > > index fb67897..e752e5d 100644 > > --- a/block/ll_rw_blk.c > > +++ b/block/ll_rw_blk.c > > @@ -1072,12 +1072,16 @@ void > blk_queue_end_tag(request_queue_t *q, struct request *rq) > > */ > > return; > > > > - if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) { > > + smp_mb__before_clear_bit(); > > + > > + if (unlikely(!test_and_clear_bit(tag, bqt->tag_map))) { > > printk(KERN_ERR "%s: attempt to clear non-busy > tag (%d)\n", > >__FUNCTION__, tag); > > return; > > } > > > > + smp_mb__after_clear_bit(); > > + > > list_del_init(>queuelist); > > rq->cmd_flags &= ~REQ_QUEUED; > > rq->tag = -1; > &g
RE: [patch] scsi: use lock per host instead of per device for shared queue tag host
-Original Message- From: Jens Axboe [mailto:[EMAIL PROTECTED] Sent: Thursday, January 25, 2007 7:48 AM To: Ed Lin Cc: David Somayajulu; Michael Reed; linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux Subject: Re: [patch] scsi: use lock per host instead of per device for shared queue tag host On Thu, Jan 25 2007, Jens Axboe wrote: On Wed, Jan 24 2007, Ed Lin wrote: -Original Message- From: David Somayajulu [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 24, 2007 5:03 PM To: Ed Lin; Michael Reed Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux; Jens Axboe Subject: RE: [patch] scsi: use lock per host instead of per device for shared queue tag host It seems another driver(qla4xxx) is also using shared queue tag. It is natural to imagine there might be same symptom in that driver. But I don't know the driver and have no hardware so I can not say anything certain about it. qla4xxx implements slightly differently, in the sense we don't have the equivalent of struct st_ccb ccb[MU_MAX_REQUEST]; which is in struct st_hba. In other words we don't have a local array which like stex to keep track of the outstanding commands to the hba. We had a discussion on this one while implementing block-layer tagging in qla4xxx and Jens Axboe added the test_and_set_bit() in the following code in blk_queue_start_tag() to take care of it. do { tag = find_first_zero_bit(bqt-tag_map, bqt-max_depth); if (tag = bqt-max_depth) return 1; } while (test_and_set_bit(tag, bqt-tag_map)); Please see the following link for the discussion http://marc.theaimsgroup.com/?l=linux-scsim=115886351206726w=2 Cheers David Somayajulu QLogic Corporation Yes, this piece of code of allocating tag, in itself, is safe. But the following if (unlikely(!__test_and_clear_bit(tag, bqt-tag_map))) { printk(KERN_ERR %s: attempt to clear non-busy tag (%d)\n, __FUNCTION__, tag); return; } code of freeing tag (in blk_queue_end_tag())seems to be using unsafe __test_and_clear_bit instead of test_and_clear_bit. I once changed it to test_and_clear_bit and thought it was fixed. But the panic happened thereafter nonetheless(using gcc 3.4.6. gcc 4.1.0 is better but still with kernel errors). bqt also needs to be protected in this case. Replacing queue lock per device with a host lock is a simple but logical fix for it. To introduce a more refined lock is possible, but seems too tedious and elaborate for this issue, since a queue lock is already out there, and a hostwide lock is needed anyway. Does this fix it? There really should be no need to add extra locking for this, it would be a shame. diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c index fb67897..e752e5d 100644 --- a/block/ll_rw_blk.c +++ b/block/ll_rw_blk.c @@ -1072,12 +1072,16 @@ void blk_queue_end_tag(request_queue_t *q, struct request *rq) */ return; - if (unlikely(!__test_and_clear_bit(tag, bqt-tag_map))) { + smp_mb__before_clear_bit(); + + if (unlikely(!test_and_clear_bit(tag, bqt-tag_map))) { printk(KERN_ERR %s: attempt to clear non-busy tag (%d)\n, __FUNCTION__, tag); return; } + smp_mb__after_clear_bit(); + list_del_init(rq-queuelist); rq-cmd_flags = ~REQ_QUEUED; rq-tag = -1; Double checking the actual implementation, the smp_mb__* should not be needed with the test_and_*_bit operations. The __test_and_clear_bit() change is needed, though. What kind of crash did you see when you did that? It should not crash, but you could see the attempt to clear non-busy tag error though. Besides the test_and_clear_bit, I think the bqt code(refer to last mail) also needs protection, like: list_del_init(rq-queuelist); ... if (unlikely(bqt-tag_index[tag] == NULL)) printk(KERN_ERR %s: tag %d is missing\n, __FUNCTION__, tag); bqt-tag_index[tag] = NULL; bqt-busy--; and bqt-tag_index[tag] = rq; ... list_add(rq-queuelist, bqt-busy_list); bqt-busy++; because bqt is also globally shared within all devices in the host in this case. (q-queue_tags was assigned as host-bqt in scsi_activate_tcq ) With a gcc 4.1.0 compiled kernel, I did not get kernel panic, but still got kernel errors: tag is missing. I guess a possible race scenario could be: cpu a:__test_and_clear_bit cpu b:test_and_set_bit, allocate a tag just freed by cpu a cpu b:bqt-tag_index[tag] = rq; cpu a:bqt-tag_index[tag] = NULL; Next time, when the request
RE: [patch] scsi: use lock per host instead of per device forshared queue tag host
> -Original Message- > From: James Bottomley [mailto:[EMAIL PROTECTED] > Sent: Wednesday, January 24, 2007 9:00 AM > To: Ed Lin > Cc: linux-scsi; linux-kernel; jeff; Promise_Linux > Subject: Re: [patch] scsi: use lock per host instead of per > device forshared queue tag host > ... > > This patch looks OK in principle. > > However, are you sure you're not using a sledgehammer to crack a nut? > If the only reason you're doing this is because of the shared tag map, > then probably that should be the area you protect with a per-tag-map > lock. The net effect of what you've done will be to serialise all > accesses to your storage devices. For a small number of devices, this > probably won't matter than much, but for large numbers of devices, > you're probably going to introduce artificial performance > degredation in > the I/O scheduler. > > James > Thanks. Maybe this issue needs more discussion. I'll follow up. Ed Lin - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [patch] scsi: use lock per host instead of per device for shared queue tag host
> -Original Message- > From: David Somayajulu [mailto:[EMAIL PROTECTED] > Sent: Wednesday, January 24, 2007 5:03 PM > To: Ed Lin; Michael Reed > Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; > Promise_Linux; Jens Axboe > Subject: RE: [patch] scsi: use lock per host instead of per > device for shared queue tag host > > > > It seems another driver(qla4xxx) is also using shared queue tag. > > It is natural to imagine there might be same symptom in that > > driver. But I don't know the driver and have no hardware so I > > can not say anything certain about it. > > qla4xxx implements slightly differently, in the sense we > don't have the > equivalent of > struct st_ccb ccb[MU_MAX_REQUEST]; > which is in struct st_hba. In other words we don't have a local array > which like stex to keep track of the outstanding commands to the hba. > > We had a discussion on this one while implementing block-layer tagging > in qla4xxx and Jens Axboe added the test_and_set_bit() in the > following > code in blk_queue_start_tag() to take care of it. > do { > tag = find_first_zero_bit(bqt->tag_map, bqt->max_depth); > if (tag >= bqt->max_depth) > return 1; > } while (test_and_set_bit(tag, bqt->tag_map)); > Please see the following link for the discussion > http://marc.theaimsgroup.com/?l=linux-scsi=115886351206726=2 > > Cheers > David Somayajulu > QLogic Corporation > Yes, this piece of code of allocating tag, in itself, is safe. But the following if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) { printk(KERN_ERR "%s: attempt to clear non-busy tag (%d)\n", __FUNCTION__, tag); return; } code of freeing tag (in blk_queue_end_tag())seems to be using unsafe __test_and_clear_bit instead of test_and_clear_bit. I once changed it to test_and_clear_bit and thought it was fixed. But the panic happened thereafter nonetheless(using gcc 3.4.6. gcc 4.1.0 is better but still with kernel errors). bqt also needs to be protected in this case. Replacing queue lock per device with a host lock is a simple but logical fix for it. To introduce a more refined lock is possible, but seems too tedious and elaborate for this issue, since a queue lock is already out there, and a hostwide lock is needed anyway. Thanks, Ed Lin - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [patch] scsi: use lock per host instead of per device for shared queue tag host
> -Original Message- > From: Michael Reed [mailto:[EMAIL PROTECTED] > Sent: Wednesday, January 24, 2007 7:59 AM > To: Ed Lin > Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux > Subject: Re: [patch] scsi: use lock per host instead of per > device for shared queue tag host > > > How 'bout a comment in scsh_host.h indicating that the > pointer will be NULL unless > initialized by the driver? > > "Protect shared block queue tag" is unique to stex. Perhaps > have no comment on > the variable declaration in scsi_host.h and explain why you > use it in stex. > > Mike > > Thanks for commenting. I agree more detailed explaination should be better. It seems another driver(qla4xxx) is also using shared queue tag. It is natural to imagine there might be same symptom in that driver. But I don't know the driver and have no hardware so I can not say anything certain about it. Ed Lin - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [patch] scsi: use lock per host instead of per device for shared queue tag host
-Original Message- From: Michael Reed [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 24, 2007 7:59 AM To: Ed Lin Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux Subject: Re: [patch] scsi: use lock per host instead of per device for shared queue tag host How 'bout a comment in scsh_host.h indicating that the pointer will be NULL unless initialized by the driver? Protect shared block queue tag is unique to stex. Perhaps have no comment on the variable declaration in scsi_host.h and explain why you use it in stex. Mike Thanks for commenting. I agree more detailed explaination should be better. It seems another driver(qla4xxx) is also using shared queue tag. It is natural to imagine there might be same symptom in that driver. But I don't know the driver and have no hardware so I can not say anything certain about it. Ed Lin - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [patch] scsi: use lock per host instead of per device for shared queue tag host
-Original Message- From: David Somayajulu [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 24, 2007 5:03 PM To: Ed Lin; Michael Reed Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux; Jens Axboe Subject: RE: [patch] scsi: use lock per host instead of per device for shared queue tag host It seems another driver(qla4xxx) is also using shared queue tag. It is natural to imagine there might be same symptom in that driver. But I don't know the driver and have no hardware so I can not say anything certain about it. qla4xxx implements slightly differently, in the sense we don't have the equivalent of struct st_ccb ccb[MU_MAX_REQUEST]; which is in struct st_hba. In other words we don't have a local array which like stex to keep track of the outstanding commands to the hba. We had a discussion on this one while implementing block-layer tagging in qla4xxx and Jens Axboe added the test_and_set_bit() in the following code in blk_queue_start_tag() to take care of it. do { tag = find_first_zero_bit(bqt-tag_map, bqt-max_depth); if (tag = bqt-max_depth) return 1; } while (test_and_set_bit(tag, bqt-tag_map)); Please see the following link for the discussion http://marc.theaimsgroup.com/?l=linux-scsim=115886351206726w=2 Cheers David Somayajulu QLogic Corporation Yes, this piece of code of allocating tag, in itself, is safe. But the following if (unlikely(!__test_and_clear_bit(tag, bqt-tag_map))) { printk(KERN_ERR %s: attempt to clear non-busy tag (%d)\n, __FUNCTION__, tag); return; } code of freeing tag (in blk_queue_end_tag())seems to be using unsafe __test_and_clear_bit instead of test_and_clear_bit. I once changed it to test_and_clear_bit and thought it was fixed. But the panic happened thereafter nonetheless(using gcc 3.4.6. gcc 4.1.0 is better but still with kernel errors). bqt also needs to be protected in this case. Replacing queue lock per device with a host lock is a simple but logical fix for it. To introduce a more refined lock is possible, but seems too tedious and elaborate for this issue, since a queue lock is already out there, and a hostwide lock is needed anyway. Thanks, Ed Lin - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [patch] scsi: use lock per host instead of per device forshared queue tag host
-Original Message- From: James Bottomley [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 24, 2007 9:00 AM To: Ed Lin Cc: linux-scsi; linux-kernel; jeff; Promise_Linux Subject: Re: [patch] scsi: use lock per host instead of per device forshared queue tag host ... This patch looks OK in principle. However, are you sure you're not using a sledgehammer to crack a nut? If the only reason you're doing this is because of the shared tag map, then probably that should be the area you protect with a per-tag-map lock. The net effect of what you've done will be to serialise all accesses to your storage devices. For a small number of devices, this probably won't matter than much, but for large numbers of devices, you're probably going to introduce artificial performance degredation in the I/O scheduler. James Thanks. Maybe this issue needs more discussion. I'll follow up. Ed Lin - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] scsi: use lock per host instead of per device for shared queue tag host
The block layer uses lock to protect request queue. Every scsi device has a unique request queue, and queue lock is the default lock in struct request_queue. This is good for normal cases. But for a host with shared queue tag (e.g. stex controllers), a queue lock per device means the shared queue tag is not protected when multiple devices are accessed at a same time. This patch is a simple fix for this situation by introducing a host queue lock to protect shared queue tag. Without this patch we will see various kernel panics (including the BUG() and kernel errors in blk_queue_start_tag and blk_queue_end_tag of ll_rw_blk.c) when accessing different devices simultaneously (e.g. copying big file from one device to another in smp kernels). This is against kernel 2.6.20-rc5. Signed-off-by: Ed Lin <[EMAIL PROTECTED]> --- drivers/scsi/scsi_lib.c |2 +- drivers/scsi/stex.c |2 ++ include/scsi/scsi_host.h |3 +++ 3 files changed, 6 insertions(+), 1 deletion(-) diff -purN a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c --- a/drivers/scsi/scsi_lib.c 2007-01-23 14:40:28.0 -0800 +++ b/drivers/scsi/scsi_lib.c 2007-01-23 14:46:43.0 -0800 @@ -1574,7 +1574,7 @@ struct request_queue *__scsi_alloc_queue { struct request_queue *q; - q = blk_init_queue(request_fn, NULL); + q = blk_init_queue(request_fn, shost->req_q_lock); if (!q) return NULL; diff -purN a/drivers/scsi/stex.c b/drivers/scsi/stex.c --- a/drivers/scsi/stex.c 2007-01-23 14:40:28.0 -0800 +++ b/drivers/scsi/stex.c 2007-01-23 14:48:59.0 -0800 @@ -1254,6 +1254,8 @@ stex_probe(struct pci_dev *pdev, const s if (err) goto out_free_irq; + spin_lock_init(>__req_q_lock); + host->req_q_lock = >__req_q_lock; err = scsi_init_shared_tag_map(host, host->can_queue); if (err) { printk(KERN_ERR DRV_NAME "(%s): init shared queue failed\n", diff -purN a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h --- a/include/scsi/scsi_host.h 2007-01-23 14:40:29.0 -0800 +++ b/include/scsi/scsi_host.h 2007-01-23 14:57:04.0 -0800 @@ -508,6 +508,9 @@ struct Scsi_Host { spinlock_t default_lock; spinlock_t *host_lock; + spinlock_t __req_q_lock; + spinlock_t *req_q_lock;/* protect shared block queue tag */ + struct mutexscan_mutex;/* serialize scanning activity */ struct list_headeh_cmd_q; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] scsi: use lock per host instead of per device for shared queue tag host
The block layer uses lock to protect request queue. Every scsi device has a unique request queue, and queue lock is the default lock in struct request_queue. This is good for normal cases. But for a host with shared queue tag (e.g. stex controllers), a queue lock per device means the shared queue tag is not protected when multiple devices are accessed at a same time. This patch is a simple fix for this situation by introducing a host queue lock to protect shared queue tag. Without this patch we will see various kernel panics (including the BUG() and kernel errors in blk_queue_start_tag and blk_queue_end_tag of ll_rw_blk.c) when accessing different devices simultaneously (e.g. copying big file from one device to another in smp kernels). This is against kernel 2.6.20-rc5. Signed-off-by: Ed Lin [EMAIL PROTECTED] --- drivers/scsi/scsi_lib.c |2 +- drivers/scsi/stex.c |2 ++ include/scsi/scsi_host.h |3 +++ 3 files changed, 6 insertions(+), 1 deletion(-) diff -purN a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c --- a/drivers/scsi/scsi_lib.c 2007-01-23 14:40:28.0 -0800 +++ b/drivers/scsi/scsi_lib.c 2007-01-23 14:46:43.0 -0800 @@ -1574,7 +1574,7 @@ struct request_queue *__scsi_alloc_queue { struct request_queue *q; - q = blk_init_queue(request_fn, NULL); + q = blk_init_queue(request_fn, shost-req_q_lock); if (!q) return NULL; diff -purN a/drivers/scsi/stex.c b/drivers/scsi/stex.c --- a/drivers/scsi/stex.c 2007-01-23 14:40:28.0 -0800 +++ b/drivers/scsi/stex.c 2007-01-23 14:48:59.0 -0800 @@ -1254,6 +1254,8 @@ stex_probe(struct pci_dev *pdev, const s if (err) goto out_free_irq; + spin_lock_init(host-__req_q_lock); + host-req_q_lock = host-__req_q_lock; err = scsi_init_shared_tag_map(host, host-can_queue); if (err) { printk(KERN_ERR DRV_NAME (%s): init shared queue failed\n, diff -purN a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h --- a/include/scsi/scsi_host.h 2007-01-23 14:40:29.0 -0800 +++ b/include/scsi/scsi_host.h 2007-01-23 14:57:04.0 -0800 @@ -508,6 +508,9 @@ struct Scsi_Host { spinlock_t default_lock; spinlock_t *host_lock; + spinlock_t __req_q_lock; + spinlock_t *req_q_lock;/* protect shared block queue tag */ + struct mutexscan_mutex;/* serialize scanning activity */ struct list_headeh_cmd_q; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/