from:"Ed Lin"

RE: panics with 16port Promise Supertrack EX Controller

2007-07-16 Thread Ed Lin



> -Original Message-
> From: Jens Axboe [mailto:[EMAIL PROTECTED] 
> Sent: Monday, July 16, 2007 4:18 AM
> To: Flavio Curti
> Cc: Nick Piggin; Michal Piotrowski; Flavio Curti; 
> linux-kernel@vger.kernel.org; Promise_Linux
> Subject: Re: panics with 16port Promise Supertrack EX Controller
> 
> 
> On Sun, Jul 15 2007, Flavio Curti wrote:
> > Hello
> > 
> > On Mon, Jul 09, 2007 at 11:59:36AM +1000, Nick Piggin wrote:
> > > >Jul  8 00:19:13 dorade.cyberlink.ch EFLAGS: 00210046   
> > > >(2.6.22-rc7-dorade #1)
> > > >>The machine panics
> > > >>after some days of running fine, the machine inst heavy loaded.
> > > >>The Controller detects as stex device:
> > > >kernel BUG at block/as-iosched.c:1084!
> > > >
> > > >BUG_ON(RB_EMPTY_ROOT(>sort_list[REQ_ASYNC]));
> > > Could be a bug in the driver that just happens to be 
> caught by AS checks.
> > > If you could test another scheduler (boot with 
> elevator=deadline or 
> > > elevator=cfq),
> > > it might help give us an idea.
> > 
> > Ok, I now switched to cfg, and the machine panicd again. 
> Panic attached.
> > Any help is appreciated.
> 
> It really looks like a stex bug - perhaps it's doing double 
> completions
> of a request?
> 


I will come up with something related to tag lock, and see
what happens after that.

Ed Lin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: panics with 16port Promise Supertrack EX Controller

2007-07-16 Thread Ed Lin

 -Original Message-
 From: Jens Axboe [mailto:[EMAIL PROTECTED] 
 Sent: Monday, July 16, 2007 4:18 AM
 To: Flavio Curti
 Cc: Nick Piggin; Michal Piotrowski; Flavio Curti; 
 linux-kernel@vger.kernel.org; Promise_Linux
 Subject: Re: panics with 16port Promise Supertrack EX Controller

 On Sun, Jul 15 2007, Flavio Curti wrote:
  Hello

  On Mon, Jul 09, 2007 at 11:59:36AM +1000, Nick Piggin wrote:
   Jul  8 00:19:13 dorade.cyberlink.ch EFLAGS: 00210046   
   (2.6.22-rc7-dorade #1)
   The machine panics
   after some days of running fine, the machine inst heavy loaded.
   The Controller detects as stex device:
   kernel BUG at block/as-iosched.c:1084!

   BUG_ON(RB_EMPTY_ROOT(ad-sort_list[REQ_ASYNC]));
   Could be a bug in the driver that just happens to be 
 caught by AS checks.
   If you could test another scheduler (boot with 
 elevator=deadline or 
   elevator=cfq),
   it might help give us an idea.

  Ok, I now switched to cfg, and the machine panicd again. 
 Panic attached.
  Any help is appreciated.

 It really looks like a stex bug - perhaps it's doing double 
 completions
 of a request?

I will come up with something related to tag lock, and see
what happens after that.

Ed Lin
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-04 Thread Ed Lin



> -Original Message-
> From: James Bottomley [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, April 04, 2007 10:36 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> Subject: RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> 
> 
> On Wed, 2007-04-04 at 10:31 -0700, Ed Lin wrote:
> 
> > Sorry. It seems the mail server has problem. The patch is 
> here in plain
> > text. I hope this time it does not mess up. I have problem with
> > linux-scsi
> > mail list, if you have comment please cc me. Thanks.
> > --Ed Lin
> 
> The lines are still broken, I'm afraid ... you can just resend as an
> attachement git-applypatch copes fine with that ... inline is 
> just good
> for quoting and replying.
> > +   if (hba->cardtype == st_shasta) {
> > +   host->max_channel = 7;
> > +   host->max_id = 16 + 1;
> > +   } else if (hba->cardtype == st_yosemite) {
> > +   host->max_channel = 127;
> > +   host->max_id = 1 + 1;
> > +   } else {
> > +   /* st_vsc and st_vsc1 */
> > +   host->max_channel = 0;
> > +   host->max_id = 128 + 1;
> 
> This is OK.  The use of ->channel is still a bit strange ... could we
> not simply use lun instead of channel (i.e. map the adapter id/lun to
> the mid-layer id/lun instead of using id/channel)?
> 

This is because there is a CONFIG_SCSI_MULTI_LUN option.
If this option is not selected, max_scsi_luns will be set to 1 and
the RAID arrays with lun>0 will disappear(because they are not
scanned). That is unacceptable from a user's view point.  I have
also explained this in the code comment:

/*
 * firmware uses id/lun pair for a logical drive, but lun would be
 * always 0 if CONFIG_SCSI_MULTI_LUN not configured, so we use
 * channel to map lun here
 */

If it is not allowed to map channel to lun, then maybe I have to
report 128 targets and do the mapping in queuecommand...
After all there must be a mapping somewhere so I don't see
much difference here...

I paste the patch again, is the format ok?



The internal id/lun mapping of st_vsc and st_vsc1 controllers is different
from st_shasta. The original driver code can only  map first 16 'entities'
for st_vsc and st_vsc1 while there are actually 128 available.

The correct mapping should be:
channel:0~7, id:0~15 (st_shasta, console is channel:0, id:16, lun:0)
channel:0~127, id:0 (st_yosemite, console is channel:0, id:1, lun:0)
channel:0, id:0~127 (st_vsc and st_vsc1, console is channel:0, id:128, lun:0)

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 69be132..29a7b61 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -113,10 +113,6 @@ enum {
SG_CF_64B   = 0x40, /* 64 bit item */
SG_CF_HOST  = 0x20, /* sg in host memory */
 
-   ST_MAX_ARRAY_SUPPORTED  = 16,
-   ST_MAX_TARGET_NUM   = (ST_MAX_ARRAY_SUPPORTED+1),
-   ST_MAX_LUN_PER_TARGET   = 16,
-
st_shasta   = 0,
st_vsc  = 1,
st_vsc1 = 2,
@@ -606,7 +602,7 @@ stex_queuecommand(struct scsi_cmnd *cmd,
return 0;
}
case INQUIRY:
-   if (id != ST_MAX_ARRAY_SUPPORTED)
+   if (id != host->max_id - 1)
break;
if (lun == 0 && (cmd->cmnd[1] & INQUIRY_EVPD) == 0) {
stex_direct_copy(cmd, console_inq_page,
@@ -624,7 +620,7 @@ stex_queuecommand(struct scsi_cmnd *cmd,
ver.oem = ST_OEM;
ver.build = ST_BUILD_VER;
ver.signature[0] = PASSTHRU_SIGNATURE;
-   ver.console_id = ST_MAX_ARRAY_SUPPORTED;
+   ver.console_id = host->max_id - 1;
ver.host_no = hba->host->host_no;
cmd->result = stex_direct_copy(cmd, , sizeof(ver)) ?
DID_OK << 16 | COMMAND_COMPLETE << 8 :
@@ -645,13 +641,8 @@ stex_queuecommand(struct scsi_cmnd *cmd,
 
req = stex_alloc_req(hba);
 
-   if (hba->cardtype == st_yosemite) {
-   req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id;
-   req->target = 0;
-   } else {
-   req->lun = lun;
-   req->target = id;
-   }
+   req->lun = lun;
+   req->target = id;
 
/* cdb */
memcpy(req->cdb, cmd->cmnd, STEX_CDB_LENGTH);
@@ -1229,11 +1220,22 @@ stex_probe(struct pci_dev *pdev, const s
hba->copy_buffer = hba->dma_

RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device

2007-04-04 Thread Ed Lin



> -Original Message-
> From: Ed Lin 
> Sent: Monday, April 02, 2007 4:02 PM
> To: James Bottomley
> Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for 
> console device
> 
> 
> 
> 
> > -Original Message-
> > From: James Bottomley [mailto:[EMAIL PROTECTED] 
> > Sent: Monday, April 02, 2007 11:28 AM
> > To: Ed Lin
> > Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> > Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for 
> > console device
> > 
> > 
> > On Mon, 2007-04-02 at 11:14 -0700, Ed Lin wrote:
> > > I just saw the routine name scsi_eh_try_stu, and didn't notice the
> > > allow_restart (partly because I thought it was not harmful...).
> > > But the TEST_UNIT_READY must stay.
> > 
> > Sure ... I was just checking since your change log implied 
> you'd seen
> > the problem from the error handler ... however, we can add it ...
> > there's a possibility of getting spin up on init from sd anyway.
> > 
> 
> You make the decision. But after reconsideration, I think it's better
> to remove unused code. It also needs change since the patch about
> id mapping is modified in another mail.
> 
> How about the attachment here?
>

Sorry. I think the mail server has problem with attachment. The patch
is here in plain text. Sorry for the inconvenience. I have problem with
linux-scsi mail list, if you have comment please cc me. Thanks.
--Ed Lin




After reset completed, the scsi error handler sends out TEST_UNIT_READY
to
the device. For 'normal' devices this command will be handled by
firmware.
However, because the RAID console only interfaces to scsi mid layer, the
firmware will not process this command for it. This will make the
console
to be offlined right after reset. Add the handling in driver to fix this
problem.

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 4c6ce6a..85c779b 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -601,6 +601,13 @@ stex_queuecommand(struct scsi_cmnd *cmd,
stex_invalid_field(cmd, done);
return 0;
}
+   case TEST_UNIT_READY:
+   if (id == host->max_id - 1) {
+   cmd->result = DID_OK << 16 | COMMAND_COMPLETE <<
8;
+   done(cmd);
+   return 0;
+   }
+   break;
case INQUIRY:
if (id != host->max_id - 1)
break; 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-04 Thread Ed Lin



> -Original Message-
> From: Ed Lin 
> Sent: Monday, April 02, 2007 4:01 PM
> To: 'James Bottomley'
> Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> Subject: RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> 
> 
> 
> 
> > -Original Message-
> > From: James Bottomley [mailto:[EMAIL PROTECTED] 
> > Sent: Saturday, March 31, 2007 7:22 AM
> > To: Ed Lin
> > Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> > Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> > 
> > 
> > On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote:
> > > The internal id/lun mapping of st_vsc and st_vsc1 
> > controllers is different
> > > from st_shasta. The original driver code can only  map 
> > first 16 'entities'
> > > for st_vsc and st_vsc1 while there are actually 128 available.
> > > 
> > > Also the  ST_MAX_LUN_PER_TARGET should be 8, although this can do
> > > no harm because inquiries beyond boundary are discarded 
> by firmware.
> > > 
> > > The correct internal mapping should be:
> > > id:0~15, lun:0~7 (st_shasta)
> > > id:0, lun:0~127 (st_yosemite)
> > > id:0~127, lun:0 (st_vsc and st_vsc1)
> > > To scsi mid layer they are all channel:0~7, id:0~15, lun:0, 
> > with a maximun
> > > 'entity' number of 128. The RAID console only interfaces to 
> > scsi mid layer
> > > and is always mapped at channel:0, id:16, lun:0.
> > 
> > I'm with Christoph here ... if we're going to break the backwards
> > compatibility of the mappings (which your code does) then we 
> > could just
> > dump channel and use the SCSI id and lun directly.
> > 
> > Understanding this code is predicated on this quirky definition in
> > stex_queuecommand:
> > 
> > id = cmd->device->id;
> > lun = cmd->device->channel; /* firmware lun issue work around */
> >^^^
> > 
> > > @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd,
> > >  
> > >   req = stex_alloc_req(hba);
> > >  
> > > - if (hba->cardtype == st_yosemite) {
> > > - req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id;
> > 
> > This looks to be correct, it goes up id 0 to 
> ST_MAX_TARGET_NUM -1 then
> > takes the next channel.
> > 
> > > - req->target = 0;
> > > - } else {
> > > + if (hba->cardtype == st_shasta) {
> > >   req->lun = lun;
> > >   req->target = id;
> > > + } else if (hba->cardtype == st_yosemite){
> > > + req->lun = id * ST_MAX_LUN_PER_TARGET + lun;
> > > + req->target = 0;
> > > + } else {
> > > + /* st_vsc and st_vsc1 */
> > > + req->lun = 0;
> > > + req->target = id * ST_MAX_LUN_PER_TARGET + lun;
> > 
> > These both look to be wrong.  You're taking the channel as 
> the lowest
> > common denominator, so your first target is on channel 1 id 
> > 0, your next
> > on channel 2, id 0 and so on.  That's really going to mess with the
> > ordering (which will be user visible) is that really what you want?
> > 
> 
> How about the attached one? 
> 

Sorry. It seems the mail server has problem. The patch is here in plain
text. I hope this time it does not mess up. I have problem with
linux-scsi
mail list, if you have comment please cc me. Thanks.
--Ed Lin




The internal id/lun mapping of st_vsc and st_vsc1 controllers is
different
from st_shasta. The original driver code can only  map first 16
'entities'
for st_vsc and st_vsc1 while there are actually 128 available.

The correct mapping should be:
channel:0~7, id:0~15 (st_shasta, console is channel:0, id:16, lun:0)
channel:0~127, id:0 (st_yosemite, console is channel:0, id:1, lun:0)
channel:0, id:0~127 (st_vsc and st_vsc1, console is channel:0, id:128,
lun:0)

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 69be132..29a7b61 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -113,10 +113,6 @@ enum {
SG_CF_64B   = 0x40, /* 64 bit item
*/
SG_CF_HOST  = 0x20, /* sg in host
memory */
 
-   ST_MAX_ARRAY_SUPPORTED  = 16,
-   ST_MAX_TARGET_NUM   =
(ST_MAX_ARRAY_SUPPORTED+1),
-   ST_MAX_LUN_PER_TARGET   = 16,
-
st_shasta   = 0,
st_vsc  = 1,
st_vsc1 = 2,
@@ -606,7 +602,7 @@ ste

RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-04 Thread Ed Lin

 -Original Message-
 From: Ed Lin 
 Sent: Monday, April 02, 2007 4:01 PM
 To: 'James Bottomley'
 Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
 Subject: RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

  -Original Message-
  From: James Bottomley [mailto:[EMAIL PROTECTED] 
  Sent: Saturday, March 31, 2007 7:22 AM
  To: Ed Lin
  Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
  Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue

  On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote:
   The internal id/lun mapping of st_vsc and st_vsc1 
  controllers is different
   from st_shasta. The original driver code can only  map 
  first 16 'entities'
   for st_vsc and st_vsc1 while there are actually 128 available.

   Also the  ST_MAX_LUN_PER_TARGET should be 8, although this can do
   no harm because inquiries beyond boundary are discarded 
 by firmware.

   The correct internal mapping should be:
   id:0~15, lun:0~7 (st_shasta)
   id:0, lun:0~127 (st_yosemite)
   id:0~127, lun:0 (st_vsc and st_vsc1)
   To scsi mid layer they are all channel:0~7, id:0~15, lun:0, 
  with a maximun
   'entity' number of 128. The RAID console only interfaces to 
  scsi mid layer
   and is always mapped at channel:0, id:16, lun:0.

  I'm with Christoph here ... if we're going to break the backwards
  compatibility of the mappings (which your code does) then we 
  could just
  dump channel and use the SCSI id and lun directly.

  Understanding this code is predicated on this quirky definition in
  stex_queuecommand:

  id = cmd-device-id;
  lun = cmd-device-channel; /* firmware lun issue work around */
 ^^^

   @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd,

 req = stex_alloc_req(hba);

   - if (hba-cardtype == st_yosemite) {
   - req-lun = lun * (ST_MAX_TARGET_NUM - 1) + id;

  This looks to be correct, it goes up id 0 to 
 ST_MAX_TARGET_NUM -1 then
  takes the next channel.

   - req-target = 0;
   - } else {
   + if (hba-cardtype == st_shasta) {
 req-lun = lun;
 req-target = id;
   + } else if (hba-cardtype == st_yosemite){
   + req-lun = id * ST_MAX_LUN_PER_TARGET + lun;
   + req-target = 0;
   + } else {
   + /* st_vsc and st_vsc1 */
   + req-lun = 0;
   + req-target = id * ST_MAX_LUN_PER_TARGET + lun;

  These both look to be wrong.  You're taking the channel as 
 the lowest
  common denominator, so your first target is on channel 1 id 
  0, your next
  on channel 2, id 0 and so on.  That's really going to mess with the
  ordering (which will be user visible) is that really what you want?

 How about the attached one? 

Sorry. It seems the mail server has problem. The patch is here in plain
text. I hope this time it does not mess up. I have problem with
linux-scsi
mail list, if you have comment please cc me. Thanks.
--Ed Lin

The internal id/lun mapping of st_vsc and st_vsc1 controllers is
different
from st_shasta. The original driver code can only  map first 16
'entities'
for st_vsc and st_vsc1 while there are actually 128 available.

The correct mapping should be:
channel:0~7, id:0~15 (st_shasta, console is channel:0, id:16, lun:0)
channel:0~127, id:0 (st_yosemite, console is channel:0, id:1, lun:0)
channel:0, id:0~127 (st_vsc and st_vsc1, console is channel:0, id:128,
lun:0)

Signed-off-by: Ed Lin [EMAIL PROTECTED]
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 69be132..29a7b61 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -113,10 +113,6 @@ enum {
SG_CF_64B   = 0x40, /* 64 bit item
*/
SG_CF_HOST  = 0x20, /* sg in host
memory */

-   ST_MAX_ARRAY_SUPPORTED  = 16,
-   ST_MAX_TARGET_NUM   =
(ST_MAX_ARRAY_SUPPORTED+1),
-   ST_MAX_LUN_PER_TARGET   = 16,
-
st_shasta   = 0,
st_vsc  = 1,
st_vsc1 = 2,
@@ -606,7 +602,7 @@ stex_queuecommand(struct scsi_cmnd *cmd,
return 0;
}
case INQUIRY:
-   if (id != ST_MAX_ARRAY_SUPPORTED)
+   if (id != host-max_id - 1)
break;
if (lun == 0  (cmd-cmnd[1]  INQUIRY_EVPD) == 0) {
stex_direct_copy(cmd, console_inq_page,
@@ -624,7 +620,7 @@ stex_queuecommand(struct scsi_cmnd *cmd,
ver.oem = ST_OEM;
ver.build = ST_BUILD_VER;
ver.signature[0] = PASSTHRU_SIGNATURE;
-   ver.console_id = ST_MAX_ARRAY_SUPPORTED;
+   ver.console_id = host-max_id - 1;
ver.host_no = hba-host-host_no;
cmd-result = stex_direct_copy(cmd, ver,
sizeof(ver

RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device

2007-04-04 Thread Ed Lin

 -Original Message-
 From: Ed Lin 
 Sent: Monday, April 02, 2007 4:02 PM
 To: James Bottomley
 Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
 Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for 
 console device

  -Original Message-
  From: James Bottomley [mailto:[EMAIL PROTECTED] 
  Sent: Monday, April 02, 2007 11:28 AM
  To: Ed Lin
  Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
  Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for 
  console device

  On Mon, 2007-04-02 at 11:14 -0700, Ed Lin wrote:
   I just saw the routine name scsi_eh_try_stu, and didn't notice the
   allow_restart (partly because I thought it was not harmful...).
   But the TEST_UNIT_READY must stay.

  Sure ... I was just checking since your change log implied 
 you'd seen
  the problem from the error handler ... however, we can add it ...
  there's a possibility of getting spin up on init from sd anyway.

 You make the decision. But after reconsideration, I think it's better
 to remove unused code. It also needs change since the patch about
 id mapping is modified in another mail.

 How about the attachment here?

Sorry. I think the mail server has problem with attachment. The patch
is here in plain text. Sorry for the inconvenience. I have problem with
linux-scsi mail list, if you have comment please cc me. Thanks.
--Ed Lin

After reset completed, the scsi error handler sends out TEST_UNIT_READY
to
the device. For 'normal' devices this command will be handled by
firmware.
However, because the RAID console only interfaces to scsi mid layer, the
firmware will not process this command for it. This will make the
console
to be offlined right after reset. Add the handling in driver to fix this
problem.

Signed-off-by: Ed Lin [EMAIL PROTECTED]
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 4c6ce6a..85c779b 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -601,6 +601,13 @@ stex_queuecommand(struct scsi_cmnd *cmd,
stex_invalid_field(cmd, done);
return 0;
}
+   case TEST_UNIT_READY:
+   if (id == host-max_id - 1) {
+   cmd-result = DID_OK  16 | COMMAND_COMPLETE 
8;
+   done(cmd);
+   return 0;
+   }
+   break;
case INQUIRY:
if (id != host-max_id - 1)
break; 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-04 Thread Ed Lin



 -Original Message-
 From: James Bottomley [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, April 04, 2007 10:36 AM
 To: Ed Lin
 Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
 Subject: RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue
 
 
 On Wed, 2007-04-04 at 10:31 -0700, Ed Lin wrote:
 
  Sorry. It seems the mail server has problem. The patch is 
 here in plain
  text. I hope this time it does not mess up. I have problem with
  linux-scsi
  mail list, if you have comment please cc me. Thanks.
  --Ed Lin
 
 The lines are still broken, I'm afraid ... you can just resend as an
 attachement git-applypatch copes fine with that ... inline is 
 just good
 for quoting and replying.
  +   if (hba-cardtype == st_shasta) {
  +   host-max_channel = 7;
  +   host-max_id = 16 + 1;
  +   } else if (hba-cardtype == st_yosemite) {
  +   host-max_channel = 127;
  +   host-max_id = 1 + 1;
  +   } else {
  +   /* st_vsc and st_vsc1 */
  +   host-max_channel = 0;
  +   host-max_id = 128 + 1;
 
 This is OK.  The use of -channel is still a bit strange ... could we
 not simply use lun instead of channel (i.e. map the adapter id/lun to
 the mid-layer id/lun instead of using id/channel)?
 

This is because there is a CONFIG_SCSI_MULTI_LUN option.
If this option is not selected, max_scsi_luns will be set to 1 and
the RAID arrays with lun0 will disappear(because they are not
scanned). That is unacceptable from a user's view point.  I have
also explained this in the code comment:

/*
 * firmware uses id/lun pair for a logical drive, but lun would be
 * always 0 if CONFIG_SCSI_MULTI_LUN not configured, so we use
 * channel to map lun here
 */

If it is not allowed to map channel to lun, then maybe I have to
report 128 targets and do the mapping in queuecommand...
After all there must be a mapping somewhere so I don't see
much difference here...

I paste the patch again, is the format ok?



The internal id/lun mapping of st_vsc and st_vsc1 controllers is different
from st_shasta. The original driver code can only  map first 16 'entities'
for st_vsc and st_vsc1 while there are actually 128 available.

The correct mapping should be:
channel:0~7, id:0~15 (st_shasta, console is channel:0, id:16, lun:0)
channel:0~127, id:0 (st_yosemite, console is channel:0, id:1, lun:0)
channel:0, id:0~127 (st_vsc and st_vsc1, console is channel:0, id:128, lun:0)

Signed-off-by: Ed Lin [EMAIL PROTECTED]
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 69be132..29a7b61 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -113,10 +113,6 @@ enum {
SG_CF_64B   = 0x40, /* 64 bit item */
SG_CF_HOST  = 0x20, /* sg in host memory */
 
-   ST_MAX_ARRAY_SUPPORTED  = 16,
-   ST_MAX_TARGET_NUM   = (ST_MAX_ARRAY_SUPPORTED+1),
-   ST_MAX_LUN_PER_TARGET   = 16,
-
st_shasta   = 0,
st_vsc  = 1,
st_vsc1 = 2,
@@ -606,7 +602,7 @@ stex_queuecommand(struct scsi_cmnd *cmd,
return 0;
}
case INQUIRY:
-   if (id != ST_MAX_ARRAY_SUPPORTED)
+   if (id != host-max_id - 1)
break;
if (lun == 0  (cmd-cmnd[1]  INQUIRY_EVPD) == 0) {
stex_direct_copy(cmd, console_inq_page,
@@ -624,7 +620,7 @@ stex_queuecommand(struct scsi_cmnd *cmd,
ver.oem = ST_OEM;
ver.build = ST_BUILD_VER;
ver.signature[0] = PASSTHRU_SIGNATURE;
-   ver.console_id = ST_MAX_ARRAY_SUPPORTED;
+   ver.console_id = host-max_id - 1;
ver.host_no = hba-host-host_no;
cmd-result = stex_direct_copy(cmd, ver, sizeof(ver)) ?
DID_OK  16 | COMMAND_COMPLETE  8 :
@@ -645,13 +641,8 @@ stex_queuecommand(struct scsi_cmnd *cmd,
 
req = stex_alloc_req(hba);
 
-   if (hba-cardtype == st_yosemite) {
-   req-lun = lun * (ST_MAX_TARGET_NUM - 1) + id;
-   req-target = 0;
-   } else {
-   req-lun = lun;
-   req-target = id;
-   }
+   req-lun = lun;
+   req-target = id;
 
/* cdb */
memcpy(req-cdb, cmd-cmnd, STEX_CDB_LENGTH);
@@ -1229,11 +1220,22 @@ stex_probe(struct pci_dev *pdev, const s
hba-copy_buffer = hba-dma_mem + MU_BUFFER_SIZE;
hba-mu_status = MU_STATE_STARTING;
 
-   /* firmware uses id/lun pair for a logical drive, but lun would be
-  always 0 if CONFIG_SCSI_MULTI_LUN not configured, so we use
-  channel to map lun here */
-   host-max_channel = ST_MAX_LUN_PER_TARGET - 1;
-   host-max_id = ST_MAX_TARGET_NUM

RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device

2007-04-02 Thread Ed Lin



> -Original Message-
> From: James Bottomley [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 02, 2007 11:28 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for 
> console device
> 
> 
> On Mon, 2007-04-02 at 11:14 -0700, Ed Lin wrote:
> > I just saw the routine name scsi_eh_try_stu, and didn't notice the
> > allow_restart (partly because I thought it was not harmful...).
> > But the TEST_UNIT_READY must stay.
> 
> Sure ... I was just checking since your change log implied you'd seen
> the problem from the error handler ... however, we can add it ...
> there's a possibility of getting spin up on init from sd anyway.
> 

You make the decision. But after reconsideration, I think it's better
to remove unused code. It also needs change since the patch about
id mapping is modified in another mail.

How about the attachment here?


s3
Description: s3

RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-02 Thread Ed Lin



> -Original Message-
> From: James Bottomley [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, March 31, 2007 7:22 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> 
> 
> On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote:
> > The internal id/lun mapping of st_vsc and st_vsc1 
> controllers is different
> > from st_shasta. The original driver code can only  map 
> first 16 'entities'
> > for st_vsc and st_vsc1 while there are actually 128 available.
> > 
> > Also the  ST_MAX_LUN_PER_TARGET should be 8, although this can do
> > no harm because inquiries beyond boundary are discarded by firmware.
> > 
> > The correct internal mapping should be:
> > id:0~15, lun:0~7 (st_shasta)
> > id:0, lun:0~127 (st_yosemite)
> > id:0~127, lun:0 (st_vsc and st_vsc1)
> > To scsi mid layer they are all channel:0~7, id:0~15, lun:0, 
> with a maximun
> > 'entity' number of 128. The RAID console only interfaces to 
> scsi mid layer
> > and is always mapped at channel:0, id:16, lun:0.
> 
> I'm with Christoph here ... if we're going to break the backwards
> compatibility of the mappings (which your code does) then we 
> could just
> dump channel and use the SCSI id and lun directly.
> 
> Understanding this code is predicated on this quirky definition in
> stex_queuecommand:
> 
>   id = cmd->device->id;
>   lun = cmd->device->channel; /* firmware lun issue work around */
>^^^
> 
> > @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd,
> >  
> > req = stex_alloc_req(hba);
> >  
> > -   if (hba->cardtype == st_yosemite) {
> > -   req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id;
> 
> This looks to be correct, it goes up id 0 to ST_MAX_TARGET_NUM -1 then
> takes the next channel.
> 
> > -   req->target = 0;
> > -   } else {
> > +   if (hba->cardtype == st_shasta) {
> > req->lun = lun;
> > req->target = id;
> > +   } else if (hba->cardtype == st_yosemite){
> > +   req->lun = id * ST_MAX_LUN_PER_TARGET + lun;
> > +   req->target = 0;
> > +   } else {
> > +   /* st_vsc and st_vsc1 */
> > +   req->lun = 0;
> > +   req->target = id * ST_MAX_LUN_PER_TARGET + lun;
> 
> These both look to be wrong.  You're taking the channel as the lowest
> common denominator, so your first target is on channel 1 id 
> 0, your next
> on channel 2, id 0 and so on.  That's really going to mess with the
> ordering (which will be user visible) is that really what you want?
> 

How about the attached one? 


s1
Description: s1

RE: [PATCH 4/4] [SCSI]stex: minor cleanup and version update

2007-04-02 Thread Ed Lin



> -Original Message-
> From: Brian King [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 02, 2007 9:05 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux
> Subject: Re: [PATCH 4/4] [SCSI]stex: minor cleanup and version update
> 
> 
> Ed Lin wrote:
> > @@ -1007,6 +1008,11 @@ static int stex_abort(struct scsi_cmnd *
> > u32 data;
> > int result = SUCCESS;
> > unsigned long flags;
> > +
> > +   printk(KERN_INFO DRV_NAME
> > +   "(%s): aborting command\n", pci_name(hba->pdev));
> > +   scsi_print_command(cmd);
> > +
> 
> scmd_printk is probably what you want here rather than just a printk.
> 
> scmd_printk(KERN_ERR, cmd, "aborting command\n");
> 
> 
> > @@ -1092,6 +1098,10 @@ static int stex_reset(struct scsi_cmnd *
> > unsigned long before;
> > hba = (struct st_hba *) >device->host->hostdata[0];
> > 
> > +   printk(KERN_INFO DRV_NAME
> > +   "(%s): resetting host\n", pci_name(hba->pdev));
> > +   scsi_print_command(cmd);
> > +
> 
> Same here.

Well it's just because printk with pci_name stuff is used across
the driver, and I didn't want to break the rule... 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device

2007-04-02 Thread Ed Lin



> -Original Message-
> From: James Bottomley [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, March 31, 2007 11:46 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> Subject: Re: [PATCH 3/4] [SCSI]stex: fix reset recovery for 
> console device
> 
> 
> On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote:
> > After reset completed, the scsi error handler sends out START_STOP
> > and TEST_UNIT_READY to the device. For 'normal' devices these
> > commands will be handled by firmware. However, because the RAID
> > console only interfaces to scsi mid layer, the firmware 
> will not process
> > these commands for it. This will make the console to be 
> offlined right
> > after reset. Add the handling in driver to fix this problem.
> 
> I don't see how this explanation can be correct.  The error 
> handler only
> sends a START_STOP command if sdev->allow_restart is one, 
> which you have
> to set in the slave_configure routines (which stex doesn't).  
> If you're
> seeing a START_STOP in the eh path, there's something else wrong.
> TEST_UNIT_READY, certainly ... it's part of a restart check.
> 

I just saw the routine name scsi_eh_try_stu, and didn't notice the
allow_restart (partly because I thought it was not harmful...).
But the TEST_UNIT_READY must stay.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-02 Thread Ed Lin



> -Original Message-
> From: James Bottomley [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, March 31, 2007 7:22 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> 
> 
> On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote:
> > The internal id/lun mapping of st_vsc and st_vsc1 
> controllers is different
> > from st_shasta. The original driver code can only  map 
> first 16 'entities'
> > for st_vsc and st_vsc1 while there are actually 128 available.
> > 
> > Also the  ST_MAX_LUN_PER_TARGET should be 8, although this can do
> > no harm because inquiries beyond boundary are discarded by firmware.
> > 
> > The correct internal mapping should be:
> > id:0~15, lun:0~7 (st_shasta)
> > id:0, lun:0~127 (st_yosemite)
> > id:0~127, lun:0 (st_vsc and st_vsc1)
> > To scsi mid layer they are all channel:0~7, id:0~15, lun:0, 
> with a maximun
> > 'entity' number of 128. The RAID console only interfaces to 
> scsi mid layer
> > and is always mapped at channel:0, id:16, lun:0.
> 
> I'm with Christoph here ... if we're going to break the backwards
> compatibility of the mappings (which your code does) then we 
> could just
> dump channel and use the SCSI id and lun directly.
> 
> Understanding this code is predicated on this quirky definition in
> stex_queuecommand:
> 
>   id = cmd->device->id;
>   lun = cmd->device->channel; /* firmware lun issue work around */
>^^^
> 
> > @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd,
> >  
> > req = stex_alloc_req(hba);
> >  
> > -   if (hba->cardtype == st_yosemite) {
> > -   req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id;
> 
> This looks to be correct, it goes up id 0 to ST_MAX_TARGET_NUM -1 then
> takes the next channel.
> 
> > -   req->target = 0;
> > -   } else {
> > +   if (hba->cardtype == st_shasta) {
> > req->lun = lun;
> > req->target = id;
> > +   } else if (hba->cardtype == st_yosemite){
> > +   req->lun = id * ST_MAX_LUN_PER_TARGET + lun;
> > +   req->target = 0;
> > +   } else {
> > +   /* st_vsc and st_vsc1 */
> > +   req->lun = 0;
> > +   req->target = id * ST_MAX_LUN_PER_TARGET + lun;
> 
> These both look to be wrong.  You're taking the channel as the lowest
> common denominator, so your first target is on channel 1 id 
> 0, your next
> on channel 2, id 0 and so on.  That's really going to mess with the
> ordering (which will be user visible) is that really what you want?
> 

Well the firmware doesn't care about the scanning order. You can jump
first on 0,8,16... and then go back to 1,9,17... These are consistent
to the user if the code is that way(may break previous impression
though).
Anyway this is a simulation and I can make whatever necessary
adjustment.


 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-02 Thread Ed Lin

> -Original Message-
> From: Christoph Hellwig [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, March 31, 2007 2:27 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux
> Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue
> 
> 
> On Fri, Mar 30, 2007 at 03:21:33PM -0700, Ed Lin wrote:
> > +   if (hba->cardtype == st_shasta) {
> > req->lun = lun;
> > req->target = id;
> > +   } else if (hba->cardtype == st_yosemite){
> > +   req->lun = id * ST_MAX_LUN_PER_TARGET + lun;
> > +   req->target = 0;
> > +   } else {
> > +   /* st_vsc and st_vsc1 */
> > +   req->lun = 0;
> > +   req->target = id * ST_MAX_LUN_PER_TARGET + lun;
> 
> I don't get why you can't export id as targer and lun as lun for
> the !st_shasta types.  Could you explain in detail what the problem
> with that approach would be?
> 
>

Of course I can do that. That will result in 1 target and 128 lun
for st_yosemite and 128 target and 1 lun for st_vsc. That seems
a little weird and I am afraid it will be turned down. Also
I can keep a same mapping for the console in the original code.

If you think it's ok, that's really better, because it makes the
hot path a bit faster. Also because of the CONFIG_SCSI_MULTI_LUN
option, I have to map lun to channel otherwise many entities
will disappear when that option is not selected. Plus I have to
reserve a slot for the RAID console, so the final mapping may be:

channel:0~7, id:0~16(st_shasta, channel 0,id 16 is reserved for console)
channel:0~127, id:0~1(st_yosemite, channel 0,id 1 is reserved for
console)
channel:0, id:0~128(st_vsc, channel 0,id 128 is reserved for console)

I don't know whether this is acceptable. 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-02 Thread Ed Lin

 -Original Message-
 From: Christoph Hellwig [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, March 31, 2007 2:27 AM
 To: Ed Lin
 Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux
 Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue

 On Fri, Mar 30, 2007 at 03:21:33PM -0700, Ed Lin wrote:
  +   if (hba-cardtype == st_shasta) {
  req-lun = lun;
  req-target = id;
  +   } else if (hba-cardtype == st_yosemite){
  +   req-lun = id * ST_MAX_LUN_PER_TARGET + lun;
  +   req-target = 0;
  +   } else {
  +   /* st_vsc and st_vsc1 */
  +   req-lun = 0;
  +   req-target = id * ST_MAX_LUN_PER_TARGET + lun;

 I don't get why you can't export id as targer and lun as lun for
 the !st_shasta types.  Could you explain in detail what the problem
 with that approach would be?

Of course I can do that. That will result in 1 target and 128 lun
for st_yosemite and 128 target and 1 lun for st_vsc. That seems
a little weird and I am afraid it will be turned down. Also
I can keep a same mapping for the console in the original code.

If you think it's ok, that's really better, because it makes the
hot path a bit faster. Also because of the CONFIG_SCSI_MULTI_LUN
option, I have to map lun to channel otherwise many entities
will disappear when that option is not selected. Plus I have to
reserve a slot for the RAID console, so the final mapping may be:

channel:0~7, id:0~16(st_shasta, channel 0,id 16 is reserved for console)
channel:0~127, id:0~1(st_yosemite, channel 0,id 1 is reserved for
console)
channel:0, id:0~128(st_vsc, channel 0,id 128 is reserved for console)

I don't know whether this is acceptable. 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-02 Thread Ed Lin

 -Original Message-
 From: James Bottomley [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, March 31, 2007 7:22 AM
 To: Ed Lin
 Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
 Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue

 On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote:
  The internal id/lun mapping of st_vsc and st_vsc1 
 controllers is different
  from st_shasta. The original driver code can only  map 
 first 16 'entities'
  for st_vsc and st_vsc1 while there are actually 128 available.

  Also the  ST_MAX_LUN_PER_TARGET should be 8, although this can do
  no harm because inquiries beyond boundary are discarded by firmware.

  The correct internal mapping should be:
  id:0~15, lun:0~7 (st_shasta)
  id:0, lun:0~127 (st_yosemite)
  id:0~127, lun:0 (st_vsc and st_vsc1)
  To scsi mid layer they are all channel:0~7, id:0~15, lun:0, 
 with a maximun
  'entity' number of 128. The RAID console only interfaces to 
 scsi mid layer
  and is always mapped at channel:0, id:16, lun:0.

 I'm with Christoph here ... if we're going to break the backwards
 compatibility of the mappings (which your code does) then we 
 could just
 dump channel and use the SCSI id and lun directly.

 Understanding this code is predicated on this quirky definition in
 stex_queuecommand:

   id = cmd-device-id;
   lun = cmd-device-channel; /* firmware lun issue work around */
^^^

  @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd,

  req = stex_alloc_req(hba);

  -   if (hba-cardtype == st_yosemite) {
  -   req-lun = lun * (ST_MAX_TARGET_NUM - 1) + id;

 This looks to be correct, it goes up id 0 to ST_MAX_TARGET_NUM -1 then
 takes the next channel.

  -   req-target = 0;
  -   } else {
  +   if (hba-cardtype == st_shasta) {
  req-lun = lun;
  req-target = id;
  +   } else if (hba-cardtype == st_yosemite){
  +   req-lun = id * ST_MAX_LUN_PER_TARGET + lun;
  +   req-target = 0;
  +   } else {
  +   /* st_vsc and st_vsc1 */
  +   req-lun = 0;
  +   req-target = id * ST_MAX_LUN_PER_TARGET + lun;

 These both look to be wrong.  You're taking the channel as the lowest
 common denominator, so your first target is on channel 1 id 
 0, your next
 on channel 2, id 0 and so on.  That's really going to mess with the
 ordering (which will be user visible) is that really what you want?

Well the firmware doesn't care about the scanning order. You can jump
first on 0,8,16... and then go back to 1,9,17... These are consistent
to the user if the code is that way(may break previous impression
though).
Anyway this is a simulation and I can make whatever necessary
adjustment.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device

2007-04-02 Thread Ed Lin

 -Original Message-
 From: James Bottomley [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, March 31, 2007 11:46 AM
 To: Ed Lin
 Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
 Subject: Re: [PATCH 3/4] [SCSI]stex: fix reset recovery for 
 console device

 On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote:
  After reset completed, the scsi error handler sends out START_STOP
  and TEST_UNIT_READY to the device. For 'normal' devices these
  commands will be handled by firmware. However, because the RAID
  console only interfaces to scsi mid layer, the firmware 
 will not process
  these commands for it. This will make the console to be 
 offlined right
  after reset. Add the handling in driver to fix this problem.

 I don't see how this explanation can be correct.  The error 
 handler only
 sends a START_STOP command if sdev-allow_restart is one, 
 which you have
 to set in the slave_configure routines (which stex doesn't).  
 If you're
 seeing a START_STOP in the eh path, there's something else wrong.
 TEST_UNIT_READY, certainly ... it's part of a restart check.

I just saw the routine name scsi_eh_try_stu, and didn't notice the
allow_restart (partly because I thought it was not harmful...).
But the TEST_UNIT_READY must stay.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 4/4] [SCSI]stex: minor cleanup and version update

2007-04-02 Thread Ed Lin

 -Original Message-
 From: Brian King [mailto:[EMAIL PROTECTED] 
 Sent: Monday, April 02, 2007 9:05 AM
 To: Ed Lin
 Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux
 Subject: Re: [PATCH 4/4] [SCSI]stex: minor cleanup and version update

 Ed Lin wrote:
  @@ -1007,6 +1008,11 @@ static int stex_abort(struct scsi_cmnd *
  u32 data;
  int result = SUCCESS;
  unsigned long flags;
  +
  +   printk(KERN_INFO DRV_NAME
  +   (%s): aborting command\n, pci_name(hba-pdev));
  +   scsi_print_command(cmd);
  +

 scmd_printk is probably what you want here rather than just a printk.

 scmd_printk(KERN_ERR, cmd, aborting command\n);

  @@ -1092,6 +1098,10 @@ static int stex_reset(struct scsi_cmnd *
  unsigned long before;
  hba = (struct st_hba *) cmd-device-host-hostdata[0];

  +   printk(KERN_INFO DRV_NAME
  +   (%s): resetting host\n, pci_name(hba-pdev));
  +   scsi_print_command(cmd);
  +

 Same here.

Well it's just because printk with pci_name stuff is used across
the driver, and I didn't want to break the rule... 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-04-02 Thread Ed Lin

 -Original Message-
 From: James Bottomley [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, March 31, 2007 7:22 AM
 To: Ed Lin
 Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
 Subject: Re: [PATCH 1/4] [SCSI]stex: fix id mapping issue

 On Fri, 2007-03-30 at 15:21 -0700, Ed Lin wrote:
  The internal id/lun mapping of st_vsc and st_vsc1 
 controllers is different
  from st_shasta. The original driver code can only  map 
 first 16 'entities'
  for st_vsc and st_vsc1 while there are actually 128 available.

  Also the  ST_MAX_LUN_PER_TARGET should be 8, although this can do
  no harm because inquiries beyond boundary are discarded by firmware.

  The correct internal mapping should be:
  id:0~15, lun:0~7 (st_shasta)
  id:0, lun:0~127 (st_yosemite)
  id:0~127, lun:0 (st_vsc and st_vsc1)
  To scsi mid layer they are all channel:0~7, id:0~15, lun:0, 
 with a maximun
  'entity' number of 128. The RAID console only interfaces to 
 scsi mid layer
  and is always mapped at channel:0, id:16, lun:0.

 I'm with Christoph here ... if we're going to break the backwards
 compatibility of the mappings (which your code does) then we 
 could just
 dump channel and use the SCSI id and lun directly.

 Understanding this code is predicated on this quirky definition in
 stex_queuecommand:

   id = cmd-device-id;
   lun = cmd-device-channel; /* firmware lun issue work around */
^^^

  @ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd,

  req = stex_alloc_req(hba);

  -   if (hba-cardtype == st_yosemite) {
  -   req-lun = lun * (ST_MAX_TARGET_NUM - 1) + id;

 This looks to be correct, it goes up id 0 to ST_MAX_TARGET_NUM -1 then
 takes the next channel.

  -   req-target = 0;
  -   } else {
  +   if (hba-cardtype == st_shasta) {
  req-lun = lun;
  req-target = id;
  +   } else if (hba-cardtype == st_yosemite){
  +   req-lun = id * ST_MAX_LUN_PER_TARGET + lun;
  +   req-target = 0;
  +   } else {
  +   /* st_vsc and st_vsc1 */
  +   req-lun = 0;
  +   req-target = id * ST_MAX_LUN_PER_TARGET + lun;

 These both look to be wrong.  You're taking the channel as the lowest
 common denominator, so your first target is on channel 1 id 
 0, your next
 on channel 2, id 0 and so on.  That's really going to mess with the
 ordering (which will be user visible) is that really what you want?

How about the attached one? 

s1
Description: s1

RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for console device

2007-04-02 Thread Ed Lin

 -Original Message-
 From: James Bottomley [mailto:[EMAIL PROTECTED] 
 Sent: Monday, April 02, 2007 11:28 AM
 To: Ed Lin
 Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
 Subject: RE: [PATCH 3/4] [SCSI]stex: fix reset recovery for 
 console device

 On Mon, 2007-04-02 at 11:14 -0700, Ed Lin wrote:
  I just saw the routine name scsi_eh_try_stu, and didn't notice the
  allow_restart (partly because I thought it was not harmful...).
  But the TEST_UNIT_READY must stay.

 Sure ... I was just checking since your change log implied you'd seen
 the problem from the error handler ... however, we can add it ...
 there's a possibility of getting spin up on init from sd anyway.

You make the decision. But after reconsideration, I think it's better
to remove unused code. It also needs change since the patch about
id mapping is modified in another mail.

How about the attachment here?

s3
Description: s3

[PATCH 4/4] [SCSI]stex: minor cleanup and version update

2007-03-30 Thread Ed Lin

Add debug information into abort and host_reset routine.
Change ioremap to ioremap_nocache.
Version updated to 3.6..1.

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 9465f35..5a10cfa 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -32,11 +32,12 @@ #include 
 #include 
 #include 
 #include 
+#include 
 
 #define DRV_NAME "stex"
-#define ST_DRIVER_VERSION "3.1.0.1"
+#define ST_DRIVER_VERSION "3.6..1"
 #define ST_VER_MAJOR   3
-#define ST_VER_MINOR   1
+#define ST_VER_MINOR   6
 #define ST_OEM 0
 #define ST_BUILD_VER   1
 
@@ -1007,6 +1008,11 @@ static int stex_abort(struct scsi_cmnd *
u32 data;
int result = SUCCESS;
unsigned long flags;
+
+   printk(KERN_INFO DRV_NAME
+   "(%s): aborting command\n", pci_name(hba->pdev));
+   scsi_print_command(cmd);
+
base = hba->mmio_base;
spin_lock_irqsave(host->host_lock, flags);
if (tag < host->can_queue && hba->ccb[tag].cmd == cmd)
@@ -1092,6 +1098,10 @@ static int stex_reset(struct scsi_cmnd *
unsigned long before;
hba = (struct st_hba *) >device->host->hostdata[0];
 
+   printk(KERN_INFO DRV_NAME
+   "(%s): resetting host\n", pci_name(hba->pdev));
+   scsi_print_command(cmd);
+
hba->mu_status = MU_STATE_RESETTING;
 
if (hba->cardtype == st_shasta)
@@ -1211,7 +1221,7 @@ stex_probe(struct pci_dev *pdev, const s
goto out_scsi_host_put;
}
 
-   hba->mmio_base = ioremap(pci_resource_start(pdev, 0),
+   hba->mmio_base = ioremap_nocache(pci_resource_start(pdev, 0),
pci_resource_len(pdev, 0));
if ( !hba->mmio_base) {
printk(KERN_ERR DRV_NAME "(%s): memory map failed\n",


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/4] [SCSI]stex: extend hard reset wait time

2007-03-30 Thread Ed Lin

During hard bus reset of st_shasta controllers, 1 ms is not enough for
16-port controllers, although it's good for 8-port controllers.  Extend the
wait time to 100  ms to allow bus resets finish successfully.

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 4d68533..1e8d7ac 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -1055,7 +1055,12 @@ static void stex_hard_reset(struct st_hb
pci_read_config_byte(bus->self, PCI_BRIDGE_CONTROL, _bctl);
pci_bctl |= PCI_BRIDGE_CTL_BUS_RESET;
pci_write_config_byte(bus->self, PCI_BRIDGE_CONTROL, pci_bctl);
-   msleep(1);
+
+   /*
+* 1 ms may be enough for 8-port controllers. But 16-port controllers
+* require more time to finish bus reset. Use 100 ms here for safety
+*/
+   msleep(100);
pci_bctl &= ~PCI_BRIDGE_CTL_BUS_RESET;
pci_write_config_byte(bus->self, PCI_BRIDGE_CONTROL, pci_bctl);
 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/4] [SCSI]stex: fix reset recovery for console device

2007-03-30 Thread Ed Lin

After reset completed, the scsi error handler sends out START_STOP
and TEST_UNIT_READY to the device. For 'normal' devices these
commands will be handled by firmware. However, because the RAID
console only interfaces to scsi mid layer, the firmware will not process
these commands for it. This will make the console to be offlined right
after reset. Add the handling in driver to fix this problem.

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 1e8d7ac..9465f35 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -605,6 +605,14 @@ stex_queuecommand(struct scsi_cmnd *cmd,
stex_invalid_field(cmd, done);
return 0;
}
+   case TEST_UNIT_READY:
+   case START_STOP:
+   if (id == ST_MAX_ARRAY_SUPPORTED) {
+   cmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8;
+   done(cmd);
+   return 0;
+   }
+   break;
case INQUIRY:
if (id != ST_MAX_ARRAY_SUPPORTED)
break;


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-03-30 Thread Ed Lin

The internal id/lun mapping of st_vsc and st_vsc1 controllers is different
from st_shasta. The original driver code can only  map first 16 'entities'
for st_vsc and st_vsc1 while there are actually 128 available.

Also the  ST_MAX_LUN_PER_TARGET should be 8, although this can do
no harm because inquiries beyond boundary are discarded by firmware.

The correct internal mapping should be:
id:0~15, lun:0~7 (st_shasta)
id:0, lun:0~127 (st_yosemite)
id:0~127, lun:0 (st_vsc and st_vsc1)
To scsi mid layer they are all channel:0~7, id:0~15, lun:0, with a maximun
'entity' number of 128. The RAID console only interfaces to scsi mid layer
and is always mapped at channel:0, id:16, lun:0.

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 69be132..4d68533 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -115,7 +115,7 @@ enum {
 
ST_MAX_ARRAY_SUPPORTED  = 16,
ST_MAX_TARGET_NUM   = (ST_MAX_ARRAY_SUPPORTED+1),
-   ST_MAX_LUN_PER_TARGET   = 16,
+   ST_MAX_LUN_PER_TARGET   = 8,
 
st_shasta   = 0,
st_vsc  = 1,
@@ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd,
 
req = stex_alloc_req(hba);
 
-   if (hba->cardtype == st_yosemite) {
-   req->lun = lun * (ST_MAX_TARGET_NUM - 1) + id;
-   req->target = 0;
-   } else {
+   if (hba->cardtype == st_shasta) {
req->lun = lun;
req->target = id;
+   } else if (hba->cardtype == st_yosemite){
+   req->lun = id * ST_MAX_LUN_PER_TARGET + lun;
+   req->target = 0;
+   } else {
+   /* st_vsc and st_vsc1 */
+   req->lun = 0;
+   req->target = id * ST_MAX_LUN_PER_TARGET + lun;
}
 
/* cdb */


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/4] [SCSI]stex: fix id mapping issue

2007-03-30 Thread Ed Lin

The internal id/lun mapping of st_vsc and st_vsc1 controllers is different
from st_shasta. The original driver code can only  map first 16 'entities'
for st_vsc and st_vsc1 while there are actually 128 available.

Also the  ST_MAX_LUN_PER_TARGET should be 8, although this can do
no harm because inquiries beyond boundary are discarded by firmware.

The correct internal mapping should be:
id:0~15, lun:0~7 (st_shasta)
id:0, lun:0~127 (st_yosemite)
id:0~127, lun:0 (st_vsc and st_vsc1)
To scsi mid layer they are all channel:0~7, id:0~15, lun:0, with a maximun
'entity' number of 128. The RAID console only interfaces to scsi mid layer
and is always mapped at channel:0, id:16, lun:0.

Signed-off-by: Ed Lin [EMAIL PROTECTED]
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 69be132..4d68533 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -115,7 +115,7 @@ enum {
 
ST_MAX_ARRAY_SUPPORTED  = 16,
ST_MAX_TARGET_NUM   = (ST_MAX_ARRAY_SUPPORTED+1),
-   ST_MAX_LUN_PER_TARGET   = 16,
+   ST_MAX_LUN_PER_TARGET   = 8,
 
st_shasta   = 0,
st_vsc  = 1,
@@ -645,12 +645,16 @@ stex_queuecommand(struct scsi_cmnd *cmd,
 
req = stex_alloc_req(hba);
 
-   if (hba-cardtype == st_yosemite) {
-   req-lun = lun * (ST_MAX_TARGET_NUM - 1) + id;
-   req-target = 0;
-   } else {
+   if (hba-cardtype == st_shasta) {
req-lun = lun;
req-target = id;
+   } else if (hba-cardtype == st_yosemite){
+   req-lun = id * ST_MAX_LUN_PER_TARGET + lun;
+   req-target = 0;
+   } else {
+   /* st_vsc and st_vsc1 */
+   req-lun = 0;
+   req-target = id * ST_MAX_LUN_PER_TARGET + lun;
}
 
/* cdb */


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/4] [SCSI]stex: extend hard reset wait time

2007-03-30 Thread Ed Lin

During hard bus reset of st_shasta controllers, 1 ms is not enough for
16-port controllers, although it's good for 8-port controllers.  Extend the
wait time to 100  ms to allow bus resets finish successfully.

Signed-off-by: Ed Lin [EMAIL PROTECTED]
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 4d68533..1e8d7ac 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -1055,7 +1055,12 @@ static void stex_hard_reset(struct st_hb
pci_read_config_byte(bus-self, PCI_BRIDGE_CONTROL, pci_bctl);
pci_bctl |= PCI_BRIDGE_CTL_BUS_RESET;
pci_write_config_byte(bus-self, PCI_BRIDGE_CONTROL, pci_bctl);
-   msleep(1);
+
+   /*
+* 1 ms may be enough for 8-port controllers. But 16-port controllers
+* require more time to finish bus reset. Use 100 ms here for safety
+*/
+   msleep(100);
pci_bctl = ~PCI_BRIDGE_CTL_BUS_RESET;
pci_write_config_byte(bus-self, PCI_BRIDGE_CONTROL, pci_bctl);
 


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/4] [SCSI]stex: fix reset recovery for console device

2007-03-30 Thread Ed Lin

After reset completed, the scsi error handler sends out START_STOP
and TEST_UNIT_READY to the device. For 'normal' devices these
commands will be handled by firmware. However, because the RAID
console only interfaces to scsi mid layer, the firmware will not process
these commands for it. This will make the console to be offlined right
after reset. Add the handling in driver to fix this problem.

Signed-off-by: Ed Lin [EMAIL PROTECTED]
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 1e8d7ac..9465f35 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -605,6 +605,14 @@ stex_queuecommand(struct scsi_cmnd *cmd,
stex_invalid_field(cmd, done);
return 0;
}
+   case TEST_UNIT_READY:
+   case START_STOP:
+   if (id == ST_MAX_ARRAY_SUPPORTED) {
+   cmd-result = DID_OK  16 | COMMAND_COMPLETE  8;
+   done(cmd);
+   return 0;
+   }
+   break;
case INQUIRY:
if (id != ST_MAX_ARRAY_SUPPORTED)
break;


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 4/4] [SCSI]stex: minor cleanup and version update

2007-03-30 Thread Ed Lin

Add debug information into abort and host_reset routine.
Change ioremap to ioremap_nocache.
Version updated to 3.6..1.

Signed-off-by: Ed Lin [EMAIL PROTECTED]
---
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 9465f35..5a10cfa 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -32,11 +32,12 @@ #include scsi/scsi_device.h
 #include scsi/scsi_cmnd.h
 #include scsi/scsi_host.h
 #include scsi/scsi_tcq.h
+#include scsi/scsi_dbg.h
 
 #define DRV_NAME stex
-#define ST_DRIVER_VERSION 3.1.0.1
+#define ST_DRIVER_VERSION 3.6..1
 #define ST_VER_MAJOR   3
-#define ST_VER_MINOR   1
+#define ST_VER_MINOR   6
 #define ST_OEM 0
 #define ST_BUILD_VER   1
 
@@ -1007,6 +1008,11 @@ static int stex_abort(struct scsi_cmnd *
u32 data;
int result = SUCCESS;
unsigned long flags;
+
+   printk(KERN_INFO DRV_NAME
+   (%s): aborting command\n, pci_name(hba-pdev));
+   scsi_print_command(cmd);
+
base = hba-mmio_base;
spin_lock_irqsave(host-host_lock, flags);
if (tag  host-can_queue  hba-ccb[tag].cmd == cmd)
@@ -1092,6 +1098,10 @@ static int stex_reset(struct scsi_cmnd *
unsigned long before;
hba = (struct st_hba *) cmd-device-host-hostdata[0];
 
+   printk(KERN_INFO DRV_NAME
+   (%s): resetting host\n, pci_name(hba-pdev));
+   scsi_print_command(cmd);
+
hba-mu_status = MU_STATE_RESETTING;
 
if (hba-cardtype == st_shasta)
@@ -1211,7 +1221,7 @@ stex_probe(struct pci_dev *pdev, const s
goto out_scsi_host_put;
}
 
-   hba-mmio_base = ioremap(pci_resource_start(pdev, 0),
+   hba-mmio_base = ioremap_nocache(pci_resource_start(pdev, 0),
pci_resource_len(pdev, 0));
if ( !hba-mmio_base) {
printk(KERN_ERR DRV_NAME (%s): memory map failed\n,


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [patch] scsi: use lock per host instead of per device for shared queue tag host

2007-01-25 Thread Ed Lin



> -Original Message-
> From: Jens Axboe [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, January 25, 2007 7:48 AM
> To: Ed Lin
> Cc: David Somayajulu; Michael Reed; linux-scsi; linux-kernel; 
> james.Bottomley; jeff; Promise_Linux
> Subject: Re: [patch] scsi: use lock per host instead of per 
> device for shared queue tag host
> 
> 
> On Thu, Jan 25 2007, Jens Axboe wrote:
> > On Wed, Jan 24 2007, Ed Lin wrote:
> > > 
> > > 
> > > > -Original Message-
> > > > From: David Somayajulu [mailto:[EMAIL PROTECTED] 
> > > > Sent: Wednesday, January 24, 2007 5:03 PM
> > > > To: Ed Lin; Michael Reed
> > > > Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; 
> > > > Promise_Linux; Jens Axboe
> > > > Subject: RE: [patch] scsi: use lock per host instead of per 
> > > > device for shared queue tag host
> > > > 
> > > > 
> > > > > It seems another driver(qla4xxx) is also using shared 
> queue tag.
> > > > > It is natural to imagine there might be same symptom in that
> > > > > driver. But I don't know the driver and have no hardware so I
> > > > > can not say anything certain about it.
> > > > 
> > > > qla4xxx implements slightly differently, in the sense we 
> > > > don't have the
> > > > equivalent of 
> > > > struct st_ccb ccb[MU_MAX_REQUEST]; 
> > > > which is in struct st_hba. In other words we don't have 
> a local array
> > > > which like stex to keep track of the outstanding 
> commands to the hba.
> > > > 
> > > > We had a discussion on this one while implementing 
> block-layer tagging
> > > > in qla4xxx and Jens Axboe added the test_and_set_bit() in the 
> > > > following
> > > > code in blk_queue_start_tag() to take care of it.
> > > > do {
> > > > tag = find_first_zero_bit(bqt->tag_map, 
> bqt->max_depth);
> > > > if (tag >= bqt->max_depth)
> > > > return 1;
> > > > } while (test_and_set_bit(tag, bqt->tag_map));
> > > > Please see the following link for the discussion
> > > > http://marc.theaimsgroup.com/?l=linux-scsi=115886351206726=2
> > > > 
> > > > Cheers
> > > > David Somayajulu
> > > > QLogic Corporation
> > > >
> > > 
> > > Yes, this piece of code of allocating tag, in itself, is safe.
> > > But the following
> > > 
> > >   if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) {
> > >   printk(KERN_ERR "%s: attempt to clear non-busy tag
> > > (%d)\n",
> > >  __FUNCTION__, tag);
> > >   return;
> > >   }
> > > 
> > > code of freeing tag (in blk_queue_end_tag())seems to be using
> > > unsafe __test_and_clear_bit instead of test_and_clear_bit.
> > > I once changed it to test_and_clear_bit and thought it was fixed.
> > > But the panic happened thereafter nonetheless(using gcc 3.4.6.
> > > gcc 4.1.0 is better but still with kernel errors). bqt also needs
> > > to be protected in this case. Replacing queue lock per device with
> > > a host lock is a simple but logical fix for it. To introduce a
> > > more refined lock is possible, but seems too tedious and elaborate
> > > for this issue, since a queue lock is already out there, and a
> > > hostwide lock is needed anyway.
> > 
> > Does this fix it? There really should be no need to add 
> extra locking
> > for this, it would be a shame.
> > 
> > diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
> > index fb67897..e752e5d 100644
> > --- a/block/ll_rw_blk.c
> > +++ b/block/ll_rw_blk.c
> > @@ -1072,12 +1072,16 @@ void 
> blk_queue_end_tag(request_queue_t *q, struct request *rq)
> >  */
> > return;
> >  
> > -   if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) {
> > +   smp_mb__before_clear_bit();
> > +
> > +   if (unlikely(!test_and_clear_bit(tag, bqt->tag_map))) {
> > printk(KERN_ERR "%s: attempt to clear non-busy 
> tag (%d)\n",
> >__FUNCTION__, tag);
> > return;
> > }
> >  
> > +   smp_mb__after_clear_bit();
> > +
> > list_del_init(>queuelist);
> > rq->cmd_flags &= ~REQ_QUEUED;
> > rq->tag = -1;
> &g

RE: [patch] scsi: use lock per host instead of per device for shared queue tag host

2007-01-25 Thread Ed Lin

 -Original Message-
 From: Jens Axboe [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, January 25, 2007 7:48 AM
 To: Ed Lin
 Cc: David Somayajulu; Michael Reed; linux-scsi; linux-kernel; 
 james.Bottomley; jeff; Promise_Linux
 Subject: Re: [patch] scsi: use lock per host instead of per 
 device for shared queue tag host

 On Thu, Jan 25 2007, Jens Axboe wrote:
  On Wed, Jan 24 2007, Ed Lin wrote:

-Original Message-
From: David Somayajulu [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 24, 2007 5:03 PM
To: Ed Lin; Michael Reed
Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; 
Promise_Linux; Jens Axboe
Subject: RE: [patch] scsi: use lock per host instead of per 
device for shared queue tag host

 It seems another driver(qla4xxx) is also using shared 
 queue tag.
 It is natural to imagine there might be same symptom in that
 driver. But I don't know the driver and have no hardware so I
 can not say anything certain about it.

qla4xxx implements slightly differently, in the sense we 
don't have the
equivalent of 
struct st_ccb ccb[MU_MAX_REQUEST]; 
which is in struct st_hba. In other words we don't have 
 a local array
which like stex to keep track of the outstanding 
 commands to the hba.

We had a discussion on this one while implementing 
 block-layer tagging
in qla4xxx and Jens Axboe added the test_and_set_bit() in the 
following
code in blk_queue_start_tag() to take care of it.
do {
tag = find_first_zero_bit(bqt-tag_map, 
 bqt-max_depth);
if (tag = bqt-max_depth)
return 1;
} while (test_and_set_bit(tag, bqt-tag_map));
Please see the following link for the discussion
http://marc.theaimsgroup.com/?l=linux-scsim=115886351206726w=2

Cheers
David Somayajulu
QLogic Corporation

   Yes, this piece of code of allocating tag, in itself, is safe.
   But the following

 if (unlikely(!__test_and_clear_bit(tag, bqt-tag_map))) {
 printk(KERN_ERR %s: attempt to clear non-busy tag
   (%d)\n,
__FUNCTION__, tag);
 return;
 }

   code of freeing tag (in blk_queue_end_tag())seems to be using
   unsafe __test_and_clear_bit instead of test_and_clear_bit.
   I once changed it to test_and_clear_bit and thought it was fixed.
   But the panic happened thereafter nonetheless(using gcc 3.4.6.
   gcc 4.1.0 is better but still with kernel errors). bqt also needs
   to be protected in this case. Replacing queue lock per device with
   a host lock is a simple but logical fix for it. To introduce a
   more refined lock is possible, but seems too tedious and elaborate
   for this issue, since a queue lock is already out there, and a
   hostwide lock is needed anyway.

  Does this fix it? There really should be no need to add 
 extra locking
  for this, it would be a shame.

  diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
  index fb67897..e752e5d 100644
  --- a/block/ll_rw_blk.c
  +++ b/block/ll_rw_blk.c
  @@ -1072,12 +1072,16 @@ void 
 blk_queue_end_tag(request_queue_t *q, struct request *rq)
   */
  return;

  -   if (unlikely(!__test_and_clear_bit(tag, bqt-tag_map))) {
  +   smp_mb__before_clear_bit();
  +
  +   if (unlikely(!test_and_clear_bit(tag, bqt-tag_map))) {
  printk(KERN_ERR %s: attempt to clear non-busy 
 tag (%d)\n,
 __FUNCTION__, tag);
  return;
  }

  +   smp_mb__after_clear_bit();
  +
  list_del_init(rq-queuelist);
  rq-cmd_flags = ~REQ_QUEUED;
  rq-tag = -1;

 Double checking the actual implementation, the smp_mb__* should not be
 needed with the test_and_*_bit operations. The __test_and_clear_bit()
 change is needed, though. What kind of crash did you see when you did
 that? It should not crash, but you could see the attempt to clear
 non-busy tag error though.

Besides the test_and_clear_bit, I think the bqt code(refer to last mail)
also needs protection, like:

list_del_init(rq-queuelist);
...
if (unlikely(bqt-tag_index[tag] == NULL))
printk(KERN_ERR %s: tag %d is missing\n,
   __FUNCTION__, tag);

bqt-tag_index[tag] = NULL;
bqt-busy--;

and

bqt-tag_index[tag] = rq;
...
list_add(rq-queuelist, bqt-busy_list);
bqt-busy++;

because bqt is also globally shared within all devices in the host in
this case. (q-queue_tags was assigned as host-bqt in scsi_activate_tcq
)

With a gcc 4.1.0 compiled kernel, I did not get kernel panic, but
still got kernel errors: tag  is missing. I guess a possible race
scenario could be:

cpu a:__test_and_clear_bit
cpu b:test_and_set_bit, allocate a tag just freed by cpu a
cpu b:bqt-tag_index[tag] = rq; 
cpu a:bqt-tag_index[tag] = NULL;

Next time, when the request

RE: [patch] scsi: use lock per host instead of per device forshared queue tag host

2007-01-24 Thread Ed Lin



> -Original Message-
> From: James Bottomley [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, January 24, 2007 9:00 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
> Subject: Re: [patch] scsi: use lock per host instead of per 
> device forshared queue tag host
> 
...
> 
> This patch looks OK in principle.
> 
> However, are you sure you're not using a sledgehammer to crack a nut?
> If the only reason you're doing this is because of the shared tag map,
> then probably that should be the area you protect with a per-tag-map
> lock.   The net effect of what you've done will be to serialise all
> accesses to your storage devices.  For a small number of devices, this
> probably won't matter than much, but for large numbers of devices,
> you're probably going to introduce artificial performance 
> degredation in
> the I/O scheduler.
> 
> James
> 

Thanks.
Maybe this issue needs more discussion. I'll follow up.

Ed Lin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [patch] scsi: use lock per host instead of per device for shared queue tag host

2007-01-24 Thread Ed Lin



> -Original Message-
> From: David Somayajulu [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, January 24, 2007 5:03 PM
> To: Ed Lin; Michael Reed
> Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; 
> Promise_Linux; Jens Axboe
> Subject: RE: [patch] scsi: use lock per host instead of per 
> device for shared queue tag host
> 
> 
> > It seems another driver(qla4xxx) is also using shared queue tag.
> > It is natural to imagine there might be same symptom in that
> > driver. But I don't know the driver and have no hardware so I
> > can not say anything certain about it.
> 
> qla4xxx implements slightly differently, in the sense we 
> don't have the
> equivalent of 
> struct st_ccb ccb[MU_MAX_REQUEST]; 
> which is in struct st_hba. In other words we don't have a local array
> which like stex to keep track of the outstanding commands to the hba.
> 
> We had a discussion on this one while implementing block-layer tagging
> in qla4xxx and Jens Axboe added the test_and_set_bit() in the 
> following
> code in blk_queue_start_tag() to take care of it.
>   do {
>   tag = find_first_zero_bit(bqt->tag_map, bqt->max_depth);
>   if (tag >= bqt->max_depth)
>   return 1;
>   } while (test_and_set_bit(tag, bqt->tag_map));
> Please see the following link for the discussion
> http://marc.theaimsgroup.com/?l=linux-scsi=115886351206726=2
> 
> Cheers
> David Somayajulu
> QLogic Corporation
>

Yes, this piece of code of allocating tag, in itself, is safe.
But the following

if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) {
printk(KERN_ERR "%s: attempt to clear non-busy tag
(%d)\n",
   __FUNCTION__, tag);
return;
}

code of freeing tag (in blk_queue_end_tag())seems to be using
unsafe __test_and_clear_bit instead of test_and_clear_bit.
I once changed it to test_and_clear_bit and thought it was fixed.
But the panic happened thereafter nonetheless(using gcc 3.4.6.
gcc 4.1.0 is better but still with kernel errors). bqt also needs
to be protected in this case. Replacing queue lock per device with
a host lock is a simple but logical fix for it. To introduce a
more refined lock is possible, but seems too tedious and elaborate
for this issue, since a queue lock is already out there, and a
hostwide lock is needed anyway.

Thanks,

Ed Lin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [patch] scsi: use lock per host instead of per device for shared queue tag host

2007-01-24 Thread Ed Lin



> -Original Message-
> From: Michael Reed [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, January 24, 2007 7:59 AM
> To: Ed Lin
> Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux
> Subject: Re: [patch] scsi: use lock per host instead of per 
> device for shared queue tag host
> 
> 
> How 'bout a comment in scsh_host.h indicating that the 
> pointer will be NULL unless
> initialized by the driver?
> 
> "Protect shared block queue tag" is unique to stex.  Perhaps 
> have no comment on
> the variable declaration in scsi_host.h and explain why you 
> use it in stex.
> 
> Mike
> 
> 

Thanks for commenting. I agree more detailed explaination should
be better.

It seems another driver(qla4xxx) is also using shared queue tag.
It is natural to imagine there might be same symptom in that
driver. But I don't know the driver and have no hardware so I
can not say anything certain about it.

Ed Lin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [patch] scsi: use lock per host instead of per device for shared queue tag host

2007-01-24 Thread Ed Lin

 -Original Message-
 From: Michael Reed [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, January 24, 2007 7:59 AM
 To: Ed Lin
 Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; Promise_Linux
 Subject: Re: [patch] scsi: use lock per host instead of per 
 device for shared queue tag host

 How 'bout a comment in scsh_host.h indicating that the 
 pointer will be NULL unless
 initialized by the driver?

 Protect shared block queue tag is unique to stex.  Perhaps 
 have no comment on
 the variable declaration in scsi_host.h and explain why you 
 use it in stex.

 Mike

Thanks for commenting. I agree more detailed explaination should
be better.

It seems another driver(qla4xxx) is also using shared queue tag.
It is natural to imagine there might be same symptom in that
driver. But I don't know the driver and have no hardware so I
can not say anything certain about it.

Ed Lin
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [patch] scsi: use lock per host instead of per device for shared queue tag host

2007-01-24 Thread Ed Lin

 -Original Message-
 From: David Somayajulu [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, January 24, 2007 5:03 PM
 To: Ed Lin; Michael Reed
 Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; 
 Promise_Linux; Jens Axboe
 Subject: RE: [patch] scsi: use lock per host instead of per 
 device for shared queue tag host

  It seems another driver(qla4xxx) is also using shared queue tag.
  It is natural to imagine there might be same symptom in that
  driver. But I don't know the driver and have no hardware so I
  can not say anything certain about it.

 qla4xxx implements slightly differently, in the sense we 
 don't have the
 equivalent of 
 struct st_ccb ccb[MU_MAX_REQUEST]; 
 which is in struct st_hba. In other words we don't have a local array
 which like stex to keep track of the outstanding commands to the hba.

 We had a discussion on this one while implementing block-layer tagging
 in qla4xxx and Jens Axboe added the test_and_set_bit() in the 
 following
 code in blk_queue_start_tag() to take care of it.
   do {
   tag = find_first_zero_bit(bqt-tag_map, bqt-max_depth);
   if (tag = bqt-max_depth)
   return 1;
   } while (test_and_set_bit(tag, bqt-tag_map));
 Please see the following link for the discussion
 http://marc.theaimsgroup.com/?l=linux-scsim=115886351206726w=2

 Cheers
 David Somayajulu
 QLogic Corporation

Yes, this piece of code of allocating tag, in itself, is safe.
But the following

if (unlikely(!__test_and_clear_bit(tag, bqt-tag_map))) {
printk(KERN_ERR %s: attempt to clear non-busy tag
(%d)\n,
   __FUNCTION__, tag);
return;
}

code of freeing tag (in blk_queue_end_tag())seems to be using
unsafe __test_and_clear_bit instead of test_and_clear_bit.
I once changed it to test_and_clear_bit and thought it was fixed.
But the panic happened thereafter nonetheless(using gcc 3.4.6.
gcc 4.1.0 is better but still with kernel errors). bqt also needs
to be protected in this case. Replacing queue lock per device with
a host lock is a simple but logical fix for it. To introduce a
more refined lock is possible, but seems too tedious and elaborate
for this issue, since a queue lock is already out there, and a
hostwide lock is needed anyway.

Thanks,

Ed Lin
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [patch] scsi: use lock per host instead of per device forshared queue tag host

2007-01-24 Thread Ed Lin

 -Original Message-
 From: James Bottomley [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, January 24, 2007 9:00 AM
 To: Ed Lin
 Cc: linux-scsi; linux-kernel; jeff; Promise_Linux
 Subject: Re: [patch] scsi: use lock per host instead of per 
 device forshared queue tag host

...

 This patch looks OK in principle.

 However, are you sure you're not using a sledgehammer to crack a nut?
 If the only reason you're doing this is because of the shared tag map,
 then probably that should be the area you protect with a per-tag-map
 lock.   The net effect of what you've done will be to serialise all
 accesses to your storage devices.  For a small number of devices, this
 probably won't matter than much, but for large numbers of devices,
 you're probably going to introduce artificial performance 
 degredation in
 the I/O scheduler.

 James

Thanks.
Maybe this issue needs more discussion. I'll follow up.

Ed Lin
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] scsi: use lock per host instead of per device for shared queue tag host

2007-01-23 Thread Ed Lin

The block layer uses lock to protect request queue. Every scsi device
has a unique request queue, and queue lock is the default lock in struct
request_queue. This is good for normal cases. But for a  host with
shared queue tag (e.g. stex controllers), a queue lock per device means
the shared queue tag is not protected when multiple devices are accessed
at a same time.  This patch is a simple fix for this situation by introducing
a host queue lock to protect shared queue tag. Without this patch we will
see various kernel panics (including the BUG() and kernel errors in
blk_queue_start_tag and blk_queue_end_tag of ll_rw_blk.c) when accessing
different devices simultaneously (e.g. copying big file from one device to
another in smp kernels).

This is against kernel 2.6.20-rc5.

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
---
 drivers/scsi/scsi_lib.c  |2 +-
 drivers/scsi/stex.c  |2 ++
 include/scsi/scsi_host.h |3 +++
 3 files changed, 6 insertions(+), 1 deletion(-)

diff -purN a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c   2007-01-23 14:40:28.0 -0800
+++ b/drivers/scsi/scsi_lib.c   2007-01-23 14:46:43.0 -0800
@@ -1574,7 +1574,7 @@ struct request_queue *__scsi_alloc_queue
 {
struct request_queue *q;
 
-   q = blk_init_queue(request_fn, NULL);
+   q = blk_init_queue(request_fn, shost->req_q_lock);
if (!q)
return NULL;
 
diff -purN a/drivers/scsi/stex.c b/drivers/scsi/stex.c
--- a/drivers/scsi/stex.c   2007-01-23 14:40:28.0 -0800
+++ b/drivers/scsi/stex.c   2007-01-23 14:48:59.0 -0800
@@ -1254,6 +1254,8 @@ stex_probe(struct pci_dev *pdev, const s
if (err)
goto out_free_irq;
 
+   spin_lock_init(>__req_q_lock);
+   host->req_q_lock = >__req_q_lock;
err = scsi_init_shared_tag_map(host, host->can_queue);
if (err) {
printk(KERN_ERR DRV_NAME "(%s): init shared queue failed\n",
diff -purN a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
--- a/include/scsi/scsi_host.h  2007-01-23 14:40:29.0 -0800
+++ b/include/scsi/scsi_host.h  2007-01-23 14:57:04.0 -0800
@@ -508,6 +508,9 @@ struct Scsi_Host {
spinlock_t  default_lock;
spinlock_t  *host_lock;
 
+   spinlock_t  __req_q_lock;
+   spinlock_t  *req_q_lock;/* protect shared block queue tag */
+
struct mutexscan_mutex;/* serialize scanning activity */
 
struct list_headeh_cmd_q;



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] scsi: use lock per host instead of per device for shared queue tag host

2007-01-23 Thread Ed Lin

The block layer uses lock to protect request queue. Every scsi device
has a unique request queue, and queue lock is the default lock in struct
request_queue. This is good for normal cases. But for a  host with
shared queue tag (e.g. stex controllers), a queue lock per device means
the shared queue tag is not protected when multiple devices are accessed
at a same time.  This patch is a simple fix for this situation by introducing
a host queue lock to protect shared queue tag. Without this patch we will
see various kernel panics (including the BUG() and kernel errors in
blk_queue_start_tag and blk_queue_end_tag of ll_rw_blk.c) when accessing
different devices simultaneously (e.g. copying big file from one device to
another in smp kernels).

This is against kernel 2.6.20-rc5.

Signed-off-by: Ed Lin [EMAIL PROTECTED]
---
 drivers/scsi/scsi_lib.c  |2 +-
 drivers/scsi/stex.c  |2 ++
 include/scsi/scsi_host.h |3 +++
 3 files changed, 6 insertions(+), 1 deletion(-)

diff -purN a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c   2007-01-23 14:40:28.0 -0800
+++ b/drivers/scsi/scsi_lib.c   2007-01-23 14:46:43.0 -0800
@@ -1574,7 +1574,7 @@ struct request_queue *__scsi_alloc_queue
 {
struct request_queue *q;
 
-   q = blk_init_queue(request_fn, NULL);
+   q = blk_init_queue(request_fn, shost-req_q_lock);
if (!q)
return NULL;
 
diff -purN a/drivers/scsi/stex.c b/drivers/scsi/stex.c
--- a/drivers/scsi/stex.c   2007-01-23 14:40:28.0 -0800
+++ b/drivers/scsi/stex.c   2007-01-23 14:48:59.0 -0800
@@ -1254,6 +1254,8 @@ stex_probe(struct pci_dev *pdev, const s
if (err)
goto out_free_irq;
 
+   spin_lock_init(host-__req_q_lock);
+   host-req_q_lock = host-__req_q_lock;
err = scsi_init_shared_tag_map(host, host-can_queue);
if (err) {
printk(KERN_ERR DRV_NAME (%s): init shared queue failed\n,
diff -purN a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
--- a/include/scsi/scsi_host.h  2007-01-23 14:40:29.0 -0800
+++ b/include/scsi/scsi_host.h  2007-01-23 14:57:04.0 -0800
@@ -508,6 +508,9 @@ struct Scsi_Host {
spinlock_t  default_lock;
spinlock_t  *host_lock;
 
+   spinlock_t  __req_q_lock;
+   spinlock_t  *req_q_lock;/* protect shared block queue tag */
+
struct mutexscan_mutex;/* serialize scanning activity */
 
struct list_headeh_cmd_q;



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

38 matches

Mail list logo