Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan

2007-12-11 Thread Lee Schermerhorn
On Thu, 2007-12-06 at 14:01 -0500, Lee Schermerhorn wrote:
 On Thu, 2007-12-06 at 10:35 -0800, Andrew Morton wrote:
  On Thu, 06 Dec 2007 13:14:22 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
  
   On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
On Wed, 05 Dec 2007 11:36:39 -0500
Lee Schermerhorn [EMAIL PROTECTED] wrote:

 As reported here:
 
   http://marc.info/?l=linux-scsi&m=119645761124683&w=4
 
 against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA
 platform under 24-rc4-mm1 with async scsi scan enabled.  I'm still
 seeing the message  mptspi: ioc#: mpt_config failed when it hangs. 
 
 I can boot by disabling async scan.  However, I've also noticed some
 disks attached via one of the mpt adapters [scsi8 in console log in
 message linked above] going off-line during stress tests.  This was
 under 24-rc3-mm2.  Haven't got that far yet with 24-rc4-mm1.
 

Is there any way of tricking you into
http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?

Obvious culprits to start with would be git-scsi-misc and maybe
scsi-early-detection-of-medium-not-present-updated.patch.  But there are
only 20-odd scsi patches in there.
   
   The reported hang occurs after pushing the git-scsi-misc patch.
  

After trying a few suspect hunks of the git-scsi-misc.patch, I have
verified that the commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0--as
discussed in http://marc.info/?t=11968982411&r=1&w=4 for a
different symptom--seems to be the culprit.  Reverting this patch allows
me to boot with async scsi scan enabled.  I'm starting a stress test to
verify that this fixes the disk going off-line issue that I saw
earlier.

Lee

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan

2007-12-07 Thread Lee Schermerhorn
On Thu, 2007-12-06 at 10:35 -0800, Andrew Morton wrote:
 On Thu, 06 Dec 2007 13:14:22 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
 
  On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
   On Wed, 05 Dec 2007 11:36:39 -0500
   Lee Schermerhorn [EMAIL PROTECTED] wrote:
   
As reported here:

http://marc.info/?l=linux-scsi&m=119645761124683&w=4

against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA
platform under 24-rc4-mm1 with async scsi scan enabled.  I'm still
seeing the message  mptspi: ioc#: mpt_config failed when it hangs. 

I can boot by disabling async scan.  However, I've also noticed some
disks attached via one of the mpt adapters [scsi8 in console log in
message linked above] going off-line during stress tests.  This was
under 24-rc3-mm2.  Haven't got that far yet with 24-rc4-mm1.

   
   Is there any way of tricking you into
   http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?
   
   Obvious culprits to start with would be git-scsi-misc and maybe
   scsi-early-detection-of-medium-not-present-updated.patch.  But there are
   only 20-odd scsi patches in there.
  
  The reported hang occurs after pushing the git-scsi-misc patch.
 
 OK, thanks.

More info on the hang.

I thought I'd enable the debug prints in just mpt_config() to see if I
could see why it failed when the hang occurs.  [Hacky patch below.]
However, when I enable the printk's the system boots fine with
git-scsi-misc applied, even with async scan enabled.  An extract of the
console log of the messages is included below, in case that provides a
clue to anyone who might care.

One other thing:  If I leave the system in the hung state long enough, I
start seeing stack dumps and messages about tasks blocked for more than
120 seconds [swapper and scsi_scan_[89]].  Section of the log included
below.

I'll keep investigating in the background...

Lee

---

temp mpt debug patch

 drivers/message/fusion/mptbase.c |5 +
 1 file changed, 5 insertions(+)

Index: Linux/drivers/message/fusion/mptbase.c
===
--- Linux.orig/drivers/message/fusion/mptbase.c 2007-12-07 12:12:00.0 -0500
+++ Linux/drivers/message/fusion/mptbase.c  2007-12-07 12:16:29.0 -0500
@@ -5633,6 +5633,9 @@ SendEventAck(MPT_ADAPTER *ioc, EventNoti
  * -EAGAIN if no msg frames currently available
  * -EFAULT for non-successful reply or no reply (timeout)
  */
+// brute force enable dcprintk for just this function
+#undef dcprintk
+#define dcprintk(IOC, CMD) CMD
 int
 mpt_config(MPT_ADAPTER *ioc, CONFIGPARMS *pCfg)
 {
@@ -5746,6 +5749,8 @@ mpt_config(MPT_ADAPTER *ioc, CONFIGPARMS
 
 	return rc;
 }
+#undef dcprintk
+#define dcprintk(IOC, CMD)
 
 /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
 /**
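
For readers unfamiliar with the trick used in the patch above, here is a minimal user-space sketch of overriding a debug macro for a single function and then restoring the no-op definition.  It is not the driver code: debug_hits, quiet_config(), verbose_config() and later_function() are hypothetical names invented for illustration, and the real dcprintk definition in the fusion driver may differ from the empty default assumed here.

```c
#include <stdio.h>

/* Hypothetical counter: records how many debug statements really ran. */
static int debug_hits;

/* Assumed default for this sketch: debug statements compile to nothing. */
#define dcprintk(IOC, CMD)

static int quiet_config(int ioc)
{
    dcprintk(ioc, debug_hits++);    /* expands to an empty statement */
    return ioc;
}

/* Brute-force enable: from here on the macro emits its CMD argument. */
#undef dcprintk
#define dcprintk(IOC, CMD) CMD

static int verbose_config(int ioc)
{
    dcprintk(ioc, debug_hits++);    /* expands to: debug_hits++; */
    dcprintk(ioc, printf("ioc%d: sending config request\n", ioc));
    return ioc;
}

/* Restore the no-op definition so later functions stay quiet. */
#undef dcprintk
#define dcprintk(IOC, CMD)

static int later_function(int ioc)
{
    dcprintk(ioc, debug_hits++);    /* empty statement again */
    return ioc;
}
```

Calling quiet_config(), verbose_config() and later_function() in turn should leave debug_hits at 1, since only the middle function was compiled while the enabling definition was in effect.  The preprocessor expands the macro where each function is defined, so the override is scoped purely by position in the file.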

---
console log extract of messages from patch above:

Fusion MPT base driver 3.04.06
Copyright (c) 1999-2007 LSI Corporation
Fusion MPT SPI Host driver 3.04.06
GSI 43 (level, low) - CPU 2 (0x0800) vector 74
ACPI: PCI Interrupt :38:01.0[A] - GSI 43 (level, low) - IRQ 74
mptbase: ioc0: Initiating bringup
ioc0: LSI53C1030 C0: Capabilities={Initiator,Target}
mptbase: ioc0: Sending Config request type 3, page 0 and action 0
mptbase: ioc0: Sending Config request type 3, page 0 and action 1
mptbase: ioc0: Sending Config request type 3, page 2 and action 0
mptbase: ioc0: Sending Config request type 3, page 2 and action 6
mptbase: ioc0: Sending Config request type 4, page 1 and action 0
mptbase: ioc0: Sending Config request type 4, page 0 and action 0
mptbase: ioc0: Sending Config request type 1, page 1 and action 0
mptbase: ioc0: Sending Config request type 1, page 1 and action 1
mptbase: ioc0: Sending Config request type 1, page 4 and action 0
mptbase: ioc0: Sending Config request type 1, page 4 and action 1
mptbase: ioc0: Sending Config request type 0, page 2 and action 0
mptbase: ioc0: Sending Config request type 0, page 2 and action 1
mptbase: ioc0: Sending Config request type 9, page 0 and action 0
mptbase: ioc0: Sending Config request type 9, page 0 and action 1
scsi8 : ioc0: LSI53C1030 C0, FwRev=01032341h, Ports=1, MaxQ=255, IRQ=74
mptbase: ioc0: Sending Config request type 4, page 1 and action 2
GSI 44 (level, low) - CPU 3 (0x0c00) vector 75
ACPI: PCI Interrupt :38:01.1[B] - GSI 44 (level, low) - IRQ 75
mptbase: ioc1: Initiating bringup
ioc1: LSI53C1030 C0: Capabilities={Initiator,Target}
mptbase: ioc1: Sending Config request type 3, page 0 and action 0
scsi 8:0:0:0: Direct-Access COMPAQ   BF036863B9   HPB4 PQ: 0 ANSI: 3
 target8:0:0: Beginning Domain Validation
mptbase: ioc0: Sending Config request type 4, page 1 and action 2
mptbase: ioc0: Sending Config request type 4, page 1 and action 2
mptbase: ioc1: Sending Config request type 3, page 0 and action 1
mptbase: 

Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan

2007-12-06 Thread Lee Schermerhorn
On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
 On Wed, 05 Dec 2007 11:36:39 -0500
 Lee Schermerhorn [EMAIL PROTECTED] wrote:
 
  As reported here:
  
  http://marc.info/?l=linux-scsi&m=119645761124683&w=4
  
  against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA
  platform under 24-rc4-mm1 with async scsi scan enabled.  I'm still
  seeing the message  mptspi: ioc#: mpt_config failed when it hangs. 
  
  I can boot by disabling async scan.  However, I've also noticed some
  disks attached via one of the mpt adapters [scsi8 in console log in
  message linked above] going off-line during stress tests.  This was
  under 24-rc3-mm2.  Haven't got that far yet with 24-rc4-mm1.
  
 
 Is there any way of tricking you into
 http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?
 
 Obvious culprits to start with would be git-scsi-misc and maybe
 scsi-early-detection-of-medium-not-present-updated.patch.  But there are
 only 20-odd scsi patches in there.

The reported hang occurs after pushing the git-scsi-misc patch.  I'm
looking into it now, but it's rather large and I'm a neophyte in this
area.  If James can point me at a broken-out quilt series for this
patch, I'd be willing to try to bisect that--assuming that it IS
bisectable.

Lee



Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan

2007-12-06 Thread Andrew Morton
On Thu, 06 Dec 2007 13:14:22 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:

 On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
  On Wed, 05 Dec 2007 11:36:39 -0500
  Lee Schermerhorn [EMAIL PROTECTED] wrote:
  
   As reported here:
   
 http://marc.info/?l=linux-scsi&m=119645761124683&w=4
   
   against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA
   platform under 24-rc4-mm1 with async scsi scan enabled.  I'm still
   seeing the message  mptspi: ioc#: mpt_config failed when it hangs. 
   
   I can boot by disabling async scan.  However, I've also noticed some
   disks attached via one of the mpt adapters [scsi8 in console log in
   message linked above] going off-line during stress tests.  This was
   under 24-rc3-mm2.  Haven't got that far yet with 24-rc4-mm1.
   
  
  Is there any way of tricking you into
  http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?
  
  Obvious culprits to start with would be git-scsi-misc and maybe
  scsi-early-detection-of-medium-not-present-updated.patch.  But there are
  only 20-odd scsi patches in there.
 
 The reported hang occurs after pushing the git-scsi-misc patch.

OK, thanks.

  I'm
 looking into it now, but it's rather large and I'm a neophyte in this
 area.  If James can point me at a broken-out quilt series for this
 patch, I'd be willing to try to bisect that--

I doubt if such a thing exists.

 assuming that it IS
 bisectable.

Often git trees are not bisectable.  But they should be.

Your best bet is to do a git-bisect on
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git

http://www.kernel.org/doc/local/git-quick.html
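
As a rough illustration of what bisecting does, whether over a git history or an ordered patch series, here is a minimal sketch of the underlying binary search.  It assumes the tree is good with no patches applied, bad with all of them applied, and stays bad once the culprit is in.  The names first_bad_patch(), is_bad_fn and demo_is_bad() are hypothetical; in practice the "is it bad?" step is a build and boot test, not a function call.

```c
#include <stddef.h>

/* Hypothetical predicate: nonzero if the tree with the first
 * n_applied patches applied shows the bug. */
typedef int (*is_bad_fn)(size_t n_applied, void *ctx);

/* Return the zero-based index of the patch that introduced the bug.
 * Assumes: 0 patches applied is good, count patches applied is bad,
 * and the result never flips back to good after the culprit. */
static size_t first_bad_patch(size_t count, is_bad_fn is_bad, void *ctx)
{
    size_t good = 0;        /* known good with this many patches applied */
    size_t bad = count;     /* known bad with this many patches applied */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;      /* culprit is among the first mid patches */
        else
            good = mid;     /* culprit comes after the first mid patches */
    }
    return bad - 1;
}

/* Demo predicate: ctx points at the index of the culprit patch. */
static int demo_is_bad(size_t n_applied, void *ctx)
{
    size_t culprit = *(size_t *)ctx;

    return n_applied > culprit;
}
```

With a series of about 20 patches this needs roughly five build-and-boot cycles.  The search only works if every intermediate state builds and boots, which is the sense in which a tree has to be "bisectable".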



Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan

2007-12-06 Thread Lee Schermerhorn
On Thu, 2007-12-06 at 10:35 -0800, Andrew Morton wrote:
 On Thu, 06 Dec 2007 13:14:22 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
 
  On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
   On Wed, 05 Dec 2007 11:36:39 -0500
   Lee Schermerhorn [EMAIL PROTECTED] wrote:
   
As reported here:

http://marc.info/?l=linux-scsi&m=119645761124683&w=4

against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA
platform under 24-rc4-mm1 with async scsi scan enabled.  I'm still
seeing the message  mptspi: ioc#: mpt_config failed when it hangs. 

I can boot by disabling async scan.  However, I've also noticed some
disks attached via one of the mpt adapters [scsi8 in console log in
message linked above] going off-line during stress tests.  This was
under 24-rc3-mm2.  Haven't got that far yet with 24-rc4-mm1.

   
   Is there any way of tricking you into
   http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?
   
   Obvious culprits to start with would be git-scsi-misc and maybe
   scsi-early-detection-of-medium-not-present-updated.patch.  But there are
   only 20-odd scsi patches in there.
  
  The reported hang occurs after pushing the git-scsi-misc patch.
 
 OK, thanks.
 
   I'm
  looking into it now, but it's rather large and I'm a neophyte in this
  area.  If James can point me at a broken-out quilt series for this
  patch, I'd be willing to try to bisect that--
 
 I doubt if such a thing exists.
 
  assuming that it IS
  bisectable.
 
 Often git trees are not bisectable.  But they should be.
 
 Your best bet is to do a git-bisect on
 git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
 
 http://www.kernel.org/doc/local/git-quick.html
 

Ah, well... Can't promise that will happen any time soon...

Lee



Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan

2007-12-05 Thread Andrew Morton
On Wed, 05 Dec 2007 11:36:39 -0500
Lee Schermerhorn [EMAIL PROTECTED] wrote:

 As reported here:
 
   http://marc.info/?l=linux-scsi&m=119645761124683&w=4
 
 against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA
 platform under 24-rc4-mm1 with async scsi scan enabled.  I'm still
 seeing the message  mptspi: ioc#: mpt_config failed when it hangs. 
 
 I can boot by disabling async scan.  However, I've also noticed some
 disks attached via one of the mpt adapters [scsi8 in console log in
 message linked above] going off-line during stress tests.  This was
 under 24-rc3-mm2.  Haven't got that far yet with 24-rc4-mm1.
 

Is there any way of tricking you into
http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?

Obvious culprits to start with would be git-scsi-misc and maybe
scsi-early-detection-of-medium-not-present-updated.patch.  But there are
only 20-odd scsi patches in there.



BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan

2007-12-05 Thread Lee Schermerhorn
As reported here:

http://marc.info/?l=linux-scsi&m=119645761124683&w=4

against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA
platform under 24-rc4-mm1 with async scsi scan enabled.  I'm still
seeing the message  mptspi: ioc#: mpt_config failed when it hangs. 

I can boot by disabling async scan.  However, I've also noticed some
disks attached via one of the mpt adapters [scsi8 in console log in
message linked above] going off-line during stress tests.  This was
under 24-rc3-mm2.  Haven't got that far yet with 24-rc4-mm1.

Regards,
Lee




Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan

2007-12-05 Thread Lee Schermerhorn
On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
 On Wed, 05 Dec 2007 11:36:39 -0500
 Lee Schermerhorn [EMAIL PROTECTED] wrote:
 
  As reported here:
  
  http://marc.info/?l=linux-scsi&m=119645761124683&w=4
  
  against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA
  platform under 24-rc4-mm1 with async scsi scan enabled.  I'm still
  seeing the message  mptspi: ioc#: mpt_config failed when it hangs. 
  
  I can boot by disabling async scan.  However, I've also noticed some
  disks attached via one of the mpt adapters [scsi8 in console log in
  message linked above] going off-line during stress tests.  This was
  under 24-rc3-mm2.  Haven't got that far yet with 24-rc4-mm1.
  
 
 Is there any way of tricking you into
 http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?
 
 Obvious culprits to start with would be git-scsi-misc and maybe
 scsi-early-detection-of-medium-not-present-updated.patch.  But there are
 only 20-odd scsi patches in there.

I'll try to get to it tomorrow am.  

Lee
 
