Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan
On Thu, 2007-12-06 at 14:01 -0500, Lee Schermerhorn wrote:
> On Thu, 2007-12-06 at 10:35 -0800, Andrew Morton wrote:
> > On Thu, 06 Dec 2007 13:14:22 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
> > > On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
> > > > On Wed, 05 Dec 2007 11:36:39 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
> > > > > As reported here: http://marc.info/?l=linux-scsi&m=119645761124683&w=4 against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA platform under 24-rc4-mm1 with async scsi scan enabled. I'm still seeing the message "mptspi: ioc#: mpt_config failed" when it hangs. I can boot by disabling async scan.
> > > > >
> > > > > However, I've also noticed some disks attached via one of the mpt adapters [scsi8 in console log in message linked above] going off-line during stress tests. This was under 24-rc3-mm2. Haven't got that far yet with 24-rc4-mm1.
> > > >
> > > > Is there any way of tricking you into http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?
> > > >
> > > > Obvious culprits to start with would be git-scsi-misc and maybe scsi-early-detection-of-medium-not-present-updated.patch. But there are only 20-odd scsi patches in there.
> > >
> > > The reported hang occurs after pushing the git-scsi-misc patch.

After trying a few suspect hunks of the git-scsi-misc.patch, I have verified that the commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0--as discussed in http://marc.info/?t=11968982411&r=1&w=4 for a different symptom--seems to be the culprit. Reverting this patch allows me to boot with async scsi scan enabled.

I'm starting a stress test to verify that this also fixes the disks-going-off-line issue that I saw earlier.

Lee

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan
On Thu, 2007-12-06 at 10:35 -0800, Andrew Morton wrote:
> On Thu, 06 Dec 2007 13:14:22 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
> > On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
> > > On Wed, 05 Dec 2007 11:36:39 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
> > > > As reported here: http://marc.info/?l=linux-scsi&m=119645761124683&w=4 against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA platform under 24-rc4-mm1 with async scsi scan enabled. I'm still seeing the message "mptspi: ioc#: mpt_config failed" when it hangs. I can boot by disabling async scan.
> > > >
> > > > However, I've also noticed some disks attached via one of the mpt adapters [scsi8 in console log in message linked above] going off-line during stress tests. This was under 24-rc3-mm2. Haven't got that far yet with 24-rc4-mm1.
> > >
> > > Is there any way of tricking you into http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?
> > >
> > > Obvious culprits to start with would be git-scsi-misc and maybe scsi-early-detection-of-medium-not-present-updated.patch. But there are only 20-odd scsi patches in there.
> >
> > The reported hang occurs after pushing the git-scsi-misc patch.
>
> OK, thanks.

More info on the hang. I thought I'd enable the debug prints in just mpt_config() to see why it fails when the hang occurs. [Hacky patch below.] However, when I enable the printk's, the system boots fine with git-scsi-misc applied, even with async scan enabled. An extract of the console log showing the messages is included below, in case that provides a clue to anyone who might care.

One other thing: if I leave the system in the hung state long enough, I start seeing stack dumps and messages about tasks blocked for more than 120 seconds [swapper and scsi_scan_[89]]. Section of the log included below.

I'll keep investigating in the background...
Lee

--- temp mpt debug patch

 drivers/message/fusion/mptbase.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: Linux/drivers/message/fusion/mptbase.c
===================================================================
--- Linux.orig/drivers/message/fusion/mptbase.c 2007-12-07 12:12:00.0 -0500
+++ Linux/drivers/message/fusion/mptbase.c 2007-12-07 12:16:29.0 -0500
@@ -5633,6 +5633,9 @@ SendEventAck(MPT_ADAPTER *ioc, EventNoti
  * -EAGAIN if no msg frames currently available
  * -EFAULT for non-successful reply or no reply (timeout)
  */
+// brute force enable dcprintk for just this function
+#undef dcprintk
+#define dcprintk(IOC, CMD) CMD
 int
 mpt_config(MPT_ADAPTER *ioc, CONFIGPARMS *pCfg)
 {
@@ -5746,6 +5749,8 @@ mpt_config(MPT_ADAPTER *ioc, CONFIGPARMS
 
 	return rc;
 }
+#undef dcprintk
+#define dcprintk(IOC, CMD)
 
 /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
 /**

--- console log extract of messages from patch above:

Fusion MPT base driver 3.04.06
Copyright (c) 1999-2007 LSI Corporation
Fusion MPT SPI Host driver 3.04.06
GSI 43 (level, low) -> CPU 2 (0x0800) vector 74
ACPI: PCI Interrupt :38:01.0[A] -> GSI 43 (level, low) -> IRQ 74
mptbase: ioc0: Initiating bringup
ioc0: LSI53C1030 C0: Capabilities={Initiator,Target}
mptbase: ioc0: Sending Config request type 3, page 0 and action 0
mptbase: ioc0: Sending Config request type 3, page 0 and action 1
mptbase: ioc0: Sending Config request type 3, page 2 and action 0
mptbase: ioc0: Sending Config request type 3, page 2 and action 6
mptbase: ioc0: Sending Config request type 4, page 1 and action 0
mptbase: ioc0: Sending Config request type 4, page 0 and action 0
mptbase: ioc0: Sending Config request type 1, page 1 and action 0
mptbase: ioc0: Sending Config request type 1, page 1 and action 1
mptbase: ioc0: Sending Config request type 1, page 4 and action 0
mptbase: ioc0: Sending Config request type 1, page 4 and action 1
mptbase: ioc0: Sending Config request type 0, page 2 and action 0
mptbase: ioc0: Sending Config request type 0, page 2 and action 1
mptbase: ioc0: Sending Config request type 9, page 0 and action 0
mptbase: ioc0: Sending Config request type 9, page 0 and action 1
scsi8 : ioc0: LSI53C1030 C0, FwRev=01032341h, Ports=1, MaxQ=255, IRQ=74
mptbase: ioc0: Sending Config request type 4, page 1 and action 2
GSI 44 (level, low) -> CPU 3 (0x0c00) vector 75
ACPI: PCI Interrupt :38:01.1[B] -> GSI 44 (level, low) -> IRQ 75
mptbase: ioc1: Initiating bringup
ioc1: LSI53C1030 C0: Capabilities={Initiator,Target}
mptbase: ioc1: Sending Config request type 3, page 0 and action 0
scsi 8:0:0:0: Direct-Access COMPAQ BF036863B9 HPB4 PQ: 0 ANSI: 3
target8:0:0: Beginning Domain Validation
mptbase: ioc0: Sending Config request type 4, page 1 and action 2
mptbase: ioc0: Sending Config request type 4, page 1 and action 2
mptbase: ioc1: Sending Config request type 3, page 0 and action 1
mptbase:
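
The debug patch in the message above turns on dcprintk for a single function by redefining the macro just before mpt_config() and restoring a no-op definition right after it. A minimal standalone sketch of that preprocessor technique follows. It is not driver code: quiet_path() and verbose_path() are hypothetical names, and the initial empty definition only stands in for whatever the driver's header provides when config debugging is off.

```c
#include <stdio.h>

/* Hypothetical stand-in for the driver's macro: debug output compiled out. */
#define dcprintk(IOC, CMD)

/* Compiled with the empty definition, so the debug statement disappears. */
int quiet_path(void)
{
    int ran = 0;
    dcprintk(NULL, ran = (printf("quiet_path: debug\n") > 0));
    return ran;
}

/* Brute-force enable the macro for just the next function. */
#undef dcprintk
#define dcprintk(IOC, CMD) CMD

int verbose_path(void)
{
    int ran = 0;
    dcprintk(NULL, ran = (printf("verbose_path: debug\n") > 0));
    return ran;
}

/* Restore the no-op definition for the rest of the file. */
#undef dcprintk
#define dcprintk(IOC, CMD)
```

In this sketch quiet_path() returns 0 and prints nothing, while verbose_path() prints its message and returns 1. Because the macro is expanded at each use site, only code between the two redefinitions is affected, which is why the patch needs no changes inside mpt_config() itself.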
Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan
On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
> On Wed, 05 Dec 2007 11:36:39 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
> > As reported here: http://marc.info/?l=linux-scsi&m=119645761124683&w=4 against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA platform under 24-rc4-mm1 with async scsi scan enabled. I'm still seeing the message "mptspi: ioc#: mpt_config failed" when it hangs. I can boot by disabling async scan.
> >
> > However, I've also noticed some disks attached via one of the mpt adapters [scsi8 in console log in message linked above] going off-line during stress tests. This was under 24-rc3-mm2. Haven't got that far yet with 24-rc4-mm1.
>
> Is there any way of tricking you into http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?
>
> Obvious culprits to start with would be git-scsi-misc and maybe scsi-early-detection-of-medium-not-present-updated.patch. But there are only 20-odd scsi patches in there.

The reported hang occurs after pushing the git-scsi-misc patch.

I'm looking into it now, but it's rather large and I'm a neophyte in this area. If James can point me at a broken-out quilt series for this patch, I'd be willing to try to bisect that--assuming that it IS bisectable.

Lee
Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan
On Thu, 06 Dec 2007 13:14:22 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
> On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
> > On Wed, 05 Dec 2007 11:36:39 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
> > > As reported here: http://marc.info/?l=linux-scsi&m=119645761124683&w=4 against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA platform under 24-rc4-mm1 with async scsi scan enabled. I'm still seeing the message "mptspi: ioc#: mpt_config failed" when it hangs. I can boot by disabling async scan.
> > >
> > > However, I've also noticed some disks attached via one of the mpt adapters [scsi8 in console log in message linked above] going off-line during stress tests. This was under 24-rc3-mm2. Haven't got that far yet with 24-rc4-mm1.
> >
> > Is there any way of tricking you into http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?
> >
> > Obvious culprits to start with would be git-scsi-misc and maybe scsi-early-detection-of-medium-not-present-updated.patch. But there are only 20-odd scsi patches in there.
>
> The reported hang occurs after pushing the git-scsi-misc patch.

OK, thanks.

> I'm looking into it now, but it's rather large and I'm a neophyte in this area. If James can point me at a broken-out quilt series for this patch, I'd be willing to try to bisect that--

I doubt if such a thing exists.

> assuming that it IS bisectable.

Often git trees are not bisectable. But they should be.

Your best bet is to do a git-bisect on git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git

http://www.kernel.org/doc/local/git-quick.html
Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan
On Thu, 2007-12-06 at 10:35 -0800, Andrew Morton wrote:
> On Thu, 06 Dec 2007 13:14:22 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
> > On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
> > > On Wed, 05 Dec 2007 11:36:39 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
> > > > As reported here: http://marc.info/?l=linux-scsi&m=119645761124683&w=4 against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA platform under 24-rc4-mm1 with async scsi scan enabled. I'm still seeing the message "mptspi: ioc#: mpt_config failed" when it hangs. I can boot by disabling async scan.
> > > >
> > > > However, I've also noticed some disks attached via one of the mpt adapters [scsi8 in console log in message linked above] going off-line during stress tests. This was under 24-rc3-mm2. Haven't got that far yet with 24-rc4-mm1.
> > >
> > > Is there any way of tricking you into http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?
> > >
> > > Obvious culprits to start with would be git-scsi-misc and maybe scsi-early-detection-of-medium-not-present-updated.patch. But there are only 20-odd scsi patches in there.
> >
> > The reported hang occurs after pushing the git-scsi-misc patch.
>
> OK, thanks.
>
> > I'm looking into it now, but it's rather large and I'm a neophyte in this area. If James can point me at a broken-out quilt series for this patch, I'd be willing to try to bisect that--
>
> I doubt if such a thing exists.
>
> > assuming that it IS bisectable.
>
> Often git trees are not bisectable. But they should be.
>
> Your best bet is to do a git-bisect on git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
>
> http://www.kernel.org/doc/local/git-quick.html

Ah, well... Can't promise that will happen any time soon...

Lee
Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan
On Wed, 05 Dec 2007 11:36:39 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
> As reported here: http://marc.info/?l=linux-scsi&m=119645761124683&w=4 against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA platform under 24-rc4-mm1 with async scsi scan enabled. I'm still seeing the message "mptspi: ioc#: mpt_config failed" when it hangs. I can boot by disabling async scan.
>
> However, I've also noticed some disks attached via one of the mpt adapters [scsi8 in console log in message linked above] going off-line during stress tests. This was under 24-rc3-mm2. Haven't got that far yet with 24-rc4-mm1.

Is there any way of tricking you into http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?

Obvious culprits to start with would be git-scsi-misc and maybe scsi-early-detection-of-medium-not-present-updated.patch. But there are only 20-odd scsi patches in there.
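
For readers who have not bisected a patch series before, the idea behind the suggestion above can be sketched as a binary search over the number of patches applied. This is only an illustration under stated assumptions, not the procedure from bisecting-mm-trees.txt: bisect_series(), is_bad_fn and bad_from() are hypothetical names, and the sketch assumes the tree is good with no patches applied, bad with all of them applied, and that the bug, once introduced, reproduces on every test boot.

```c
#include <stddef.h>

/*
 * Hypothetical test hook: returns nonzero if a tree with the first
 * `applied` patches of the series applied shows the bug.
 */
typedef int (*is_bad_fn)(size_t applied, void *ctx);

/*
 * Return the 1-based index of the first patch that makes the tree bad.
 * Assumes 0 patches applied is good and all n applied is bad.
 * *tests receives the number of build-and-boot cycles that were needed.
 */
size_t bisect_series(size_t n, is_bad_fn is_bad, void *ctx, size_t *tests)
{
    size_t good = 0; /* known good with this many patches applied */
    size_t bad = n;  /* known bad with this many patches applied */

    *tests = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        (*tests)++;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Example hook for illustration: the bug appears at patch number *ctx. */
int bad_from(size_t applied, void *ctx)
{
    return applied >= *(size_t *)ctx;
}
```

With a series of 20 patches this needs at most five test boots, which is why a series of "20-odd" patches is cheap to bisect by hand. The approach only works when the failure is reproducible at each step; a hang that comes and goes with timing would make individual good/bad verdicts unreliable.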
BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan
As reported here: http://marc.info/?l=linux-scsi&m=119645761124683&w=4 against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA platform under 24-rc4-mm1 with async scsi scan enabled. I'm still seeing the message "mptspi: ioc#: mpt_config failed" when it hangs. I can boot by disabling async scan.

However, I've also noticed some disks attached via one of the mpt adapters [scsi8 in console log in message linked above] going off-line during stress tests. This was under 24-rc3-mm2. Haven't got that far yet with 24-rc4-mm1.

Regards,
Lee
Re: BUG 2.6.24-rc4-mm1 -- Boot still hangs w/ async scsi scan
On Wed, 2007-12-05 at 13:20 -0800, Andrew Morton wrote:
> On Wed, 05 Dec 2007 11:36:39 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote:
> > As reported here: http://marc.info/?l=linux-scsi&m=119645761124683&w=4 against 24-rc3-mm2, I'm still seeing the hang on my HP ia64 NUMA platform under 24-rc4-mm1 with async scsi scan enabled. I'm still seeing the message "mptspi: ioc#: mpt_config failed" when it hangs. I can boot by disabling async scan.
> >
> > However, I've also noticed some disks attached via one of the mpt adapters [scsi8 in console log in message linked above] going off-line during stress tests. This was under 24-rc3-mm2. Haven't got that far yet with 24-rc4-mm1.
>
> Is there any way of tricking you into http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt?
>
> Obvious culprits to start with would be git-scsi-misc and maybe scsi-early-detection-of-medium-not-present-updated.patch. But there are only 20-odd scsi patches in there.

I'll try to get to it tomorrow am.

Lee