[Bug 9775] HOST_MSG_LOOP invalid SCB ff

2008-02-16 Thread bugme-daemon
http://bugzilla.kernel.org/show_bug.cgi?id=9775


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |CLOSED
         Resolution|                            |CODE_FIX




--- Comment #11 from [EMAIL PROTECTED]  2008-02-16 18:40 ---
Thanks James,
I've spent an afternoon rebooting and finally discovered that I had a faulty
external SCSI cable.

Initial tests suggest it's OK now.

However I remain perplexed. The problem initially manifested when I upgraded my
kernel, not when I diddled with my hardware.

This also seems to have fixed udev bug
http://bugs.gentoo.org/show_bug.cgi?id=200437

How bizarre!

Thanks for your help everyone.

Regards
John


-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 9775] HOST_MSG_LOOP invalid SCB ff

2008-02-12 Thread James Bottomley
On Fri, 2008-02-08 at 18:52 -0800, [EMAIL PROTECTED] wrote:
> Ok, I've spent some time trying different combinations of devices.
>
> Against kernel 2.6.24
> T0 is Quantum DLT8000 ID0
> T1 is Quantum DLT8000 ID1
> MTX is STK L80  ID 15
> Terminators A, B
>
> Channel A      Channel B      Result
> T0,T1,MTX,B    Nil            Crash
> Nil            T0,T1,MTX,B    Parity Error in Data-in Phase
> Nil            T0,MTX,B       Ok, Tar test ok, MTX ok
> Nil            T1,MTX,B       Ok, Tar test ok, MTX ok
> -- Both drives work ok
> T1,MTX,B       Nil            Ok, Skipped Tests
> T1,MTX,A       Nil            Ok, Skipped Tests
> T0,MTX,B       Nil            Crash
> T0,MTX,A       Nil            Crash
> -- Not the terminator
>
> -- Test on two channels
> T0,MTX,A       T1,B           Crash
> T1,B           T0,MTX,A       Parity Error in Data-in Phase
>
> It really doesn't like three devices, on two busses or one.

Well, I still think you have some type of bus instability, but that said
we need to get rid of the panic.

I'm afraid this is going to be a long process.  For the first attempt,
let's see if this is an unsolicited msgin ... it looks like the driver
handling for those is wrong.  Can you try this patch?

Thanks,

James

---

diff --git a/drivers/scsi/aic7xxx/aic7xxx_core.c b/drivers/scsi/aic7xxx/aic7xxx_core.c
index 6d2ae64..64e62ce 100644
--- a/drivers/scsi/aic7xxx/aic7xxx_core.c
+++ b/drivers/scsi/aic7xxx/aic7xxx_core.c
@@ -695,15 +695,16 @@ ahc_handle_seqint(struct ahc_softc *ahc, u_int intstat)
 			scb_index = ahc_inb(ahc, SCB_TAG);
 			scb = ahc_lookup_scb(ahc, scb_index);
 			if (devinfo.role == ROLE_INITIATOR) {
-				if (scb == NULL)
-					panic("HOST_MSG_LOOP with "
-					      "invalid SCB %x\n", scb_index);
+				if (bus_phase == P_MESGOUT) {
+					if (scb == NULL)
+						panic("HOST_MSG_LOOP with "
+						      "invalid SCB %x\n",
+						      scb_index);
 
-				if (bus_phase == P_MESGOUT)
 					ahc_setup_initiator_msgout(ahc,
 								   &devinfo,
 								   scb);
-				else {
+				} else {
 					ahc->msg_type =
 					    MSG_TYPE_INITIATOR_MSGIN;
 					ahc->msgin_index = 0;
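
For readers following the hunk above, here is a minimal sketch of the decision it appears to implement in the initiator role: a missing SCB stays fatal only in the message-out phase, while a message-in is accepted even when no SCB was found. This is an illustration only, not driver code; every name prefixed with sketch_ or SKETCH_ is hypothetical, and the real handler does considerably more work around this point.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two bus phases the hunk tests. */
enum sketch_phase { SKETCH_P_MESGOUT, SKETCH_P_MESGIN };

/* Hypothetical labels for what the patched branch would do. */
enum sketch_action {
	SKETCH_PANIC,        /* message-out with no SCB: still fatal */
	SKETCH_SETUP_MSGOUT, /* message-out with a valid SCB */
	SKETCH_START_MSGIN   /* message-in: accepted with or without an SCB */
};

/*
 * Model of the patched branch for the initiator role. The NULL check
 * now lives inside the message-out arm, so an unsolicited message-in
 * no longer reaches the panic.
 */
static enum sketch_action
sketch_host_msg_loop(enum sketch_phase phase, const void *scb)
{
	if (phase == SKETCH_P_MESGOUT) {
		if (scb == NULL)
			return SKETCH_PANIC;
		return SKETCH_SETUP_MSGOUT;
	}
	return SKETCH_START_MSGIN;
}
```

Before the patch, the equivalent of this function would have returned SKETCH_PANIC for any NULL scb regardless of phase, which is the case the bug title reports with SCB ff.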





[Bug 9775] HOST_MSG_LOOP invalid SCB ff

2008-02-08 Thread bugme-daemon
http://bugzilla.kernel.org/show_bug.cgi?id=9775





--- Comment #8 from [EMAIL PROTECTED]  2008-02-08 18:52 ---
Ok, I've spent some time trying different combinations of devices.

Against kernel 2.6.24
T0 is Quantum DLT8000 ID0
T1 is Quantum DLT8000 ID1
MTX is STK L80  ID 15
Terminators A, B

Channel A      Channel B      Result
T0,T1,MTX,B    Nil            Crash
Nil            T0,T1,MTX,B    Parity Error in Data-in Phase
Nil            T0,MTX,B       Ok, Tar test ok, MTX ok
Nil            T1,MTX,B       Ok, Tar test ok, MTX ok
-- Both drives work ok
T1,MTX,B       Nil            Ok, Skipped Tests
T1,MTX,A       Nil            Ok, Skipped Tests
T0,MTX,B       Nil            Crash
T0,MTX,A       Nil            Crash
-- Not the terminator

-- Test on two channels
T0,MTX,A       T1,B           Crash
T1,B           T0,MTX,A       Parity Error in Data-in Phase

It really doesn't like three devices, on two busses or one.




[Bug 9775] HOST_MSG_LOOP invalid SCB ff

2008-01-18 Thread bugme-daemon
http://bugzilla.kernel.org/show_bug.cgi?id=9775





--- Comment #7 from [EMAIL PROTECTED]  2008-01-18 14:36 ---
Duh! I mean boot with it off, power it up and rescan.




[Bug 9775] HOST_MSG_LOOP invalid SCB ff

2008-01-18 Thread bugme-daemon
http://bugzilla.kernel.org/show_bug.cgi?id=9775





--- Comment #6 from [EMAIL PROTECTED]  2008-01-18 14:35 ---
Thanks, I've just done some more testing.
There are no tapes in the drives.
Normally, there is the L80 and a DLT8000 on channel B
and a DLT8000 on channel A

Both busses have external terminators.

If Ch B is used alone the system is fine!
If Ch A is used alone it will fail.

If you are thinking of some hardware problem, it's possible to boot with the
L80 off, cause the SCSI bus to rescan, and have everything work fine.
Regards,
john




[Bug 9775] HOST_MSG_LOOP invalid SCB ff

2008-01-18 Thread James Bottomley

> Latest working kernel version:
> Earliest failing kernel version:
> Distribution: Gentoo
> Hardware Environment: ML150G3, (2Core cpu, 64Bit)  AHA3944AUWD card,
> Storagetek L80 +2x DLT8000
> Software Environment: gentoo
> Problem Description: kernel panic
>
> Steps to reproduce:
> Panic if the L80 is powered up when the kernel boots. 100% on any failing
> kernel.
> Not all kernels fail but most do.
> Git Bisect across linus's tree did not produce a convincing patch.
> Originally filed here: http://bugs.gentoo.org/show_bug.cgi?id=200708
> I have joined the linux-scsi list and will
>
> The event that brought the problem to light was the installation of a
> secondhand Storagetek L80 tape library. This has two DLT8000 drives on a
> HV-Differential bus.
> This needed a special card, an adaptec 3944AUWD.
> The kernel I was running at that time was 2.6.22-gentoo-r8.
> It worked fine. Then when -r9 came out and this error manifested, the
> assumption was that -r9 was broken.
>
> I no longer think this to be the case.
>
> I think they are _ALL_ broken, possibly going way back toward the start of
> the 2.6 series.
> I think that the bug may or may not manifest depending on the internal
> layout of data in the kernel
> --A true heisenbug--
>
> All that the git bisect did was to change the internal layout, not
> add/remove a bad patch.
>
> This explains why I could take the 2.6.23.8 kernel and compile for SMP and
> have it fail.
> Compile it for UP and have it work. Initially I thought that meant a
> locking or race issue.
> Now I think it was just another case of altering the internal kernel layout.

Actually, I'd investigate either your tapes or the SCSI bus.

The message is produced deep in the heart of the aic7xxx driver.  It
happens when the driver gets reselected with a tag that doesn't exist.
However, in this case, I think your device is untagged, in which case
this is some handling issue with SCB_LIST_NULL (the value 0xff).

James
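
As a rough illustration of the point above about SCB_LIST_NULL, the sketch below models a tag-indexed SCB table in which 0xff is reserved as the "no SCB" marker, so looking it up always yields NULL. The SCB_LIST_NULL name and its value come from the message; the table layout, its size, and every sketch_ name are assumptions made for the example and are not taken from the aic7xxx source.

```c
#include <stddef.h>

#define SCB_LIST_NULL   0xff /* "no SCB" marker, value quoted in the message */
#define SKETCH_NUM_SCBS 256  /* hypothetical: one slot per 8-bit tag value */

/* Hypothetical, heavily reduced SCB: only the tag matters here. */
struct sketch_scb {
	unsigned int tag;
};

/* Hypothetical tag-to-SCB index. */
struct sketch_table {
	struct sketch_scb *by_tag[SKETCH_NUM_SCBS];
};

/* Return the SCB registered for a tag, or NULL when none is outstanding. */
static struct sketch_scb *
sketch_lookup_scb(const struct sketch_table *t, unsigned int tag)
{
	if (tag >= SKETCH_NUM_SCBS)
		return NULL;
	return t->by_tag[tag];
}

/* Register an SCB; the reserved SCB_LIST_NULL tag is never stored. */
static int
sketch_register_scb(struct sketch_table *t, struct sketch_scb *scb)
{
	if (scb->tag >= SKETCH_NUM_SCBS || scb->tag == SCB_LIST_NULL)
		return -1;
	t->by_tag[scb->tag] = scb;
	return 0;
}
```

Under these assumptions, a reselection by an untagged device that leaves the tag register at 0xff would produce a NULL lookup result, which matches the "invalid SCB ff" text in the bug title.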



