Re: 2.6.24-rc5: tape drive not responding

2007-12-18 Thread Kai Makisara
On Mon, 17 Dec 2007, James Bottomley wrote:

> 
> On Mon, 2007-12-17 at 13:43 -0800, Andrew Morton wrote:
> > On Mon, 17 Dec 2007 16:02:02 -0500
> > "John Stoffel" <[EMAIL PROTECTED]> wrote:
> > 
> > > 
> > > Just to confirm, the proposed patch to st.c fixes the issue with
> > > 2.6.24-rc5 as well as 2.6.24-rc5-mm1 with access to my DLT tape
> > > drives.
> > 
> > err, what patch to st.c?
> 
> That's this one:
> 
> http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=acdd0b1c371b2fbb4b6110a51ba69cb0af9e6f45
> 
I have done some tests. Firstly, I did not see the BUG with 2.6.24-rc5. 
Looking at include/linux/scatterlist.h suggested that CONFIG_DEBUG_SG has 
something to do with this. When I enabled SG debugging, I also saw the BUG.
Adding this patch solved the problem.
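
To illustrate why the BUG would only show up with CONFIG_DEBUG_SG enabled,
here is a small userspace model of the idea. It is not the kernel code: the
struct, the model_* function names, and the magic value are assumptions made
up for this sketch. The point is only that a debug-only marker is written by
the init call and checked on later use, so a table that was never initialised
fails the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical debug marker; the value is illustrative only. */
#define MODEL_SG_MAGIC 0x87654321UL

/* Simplified stand-in for one scatterlist entry (not the kernel struct). */
struct model_sg {
    unsigned long magic;   /* models a field that exists only in debug builds */
    void *page;
    unsigned int length;
};

/* Models the init step: zero the table, then stamp every entry. */
static void model_sg_init_table(struct model_sg *sgl, size_t nents)
{
    size_t i;

    assert(sgl != NULL && nents > 0);
    memset(sgl, 0, nents * sizeof(*sgl));
    for (i = 0; i < nents; i++)
        sgl[i].magic = MODEL_SG_MAGIC;
}

/*
 * Models the debug check made when an entry is filled in.
 * Returns false where a debug kernel would hit its BUG check.
 */
static bool model_sg_set_page(struct model_sg *sg, void *page, unsigned int len)
{
    if (sg->magic != MODEL_SG_MAGIC)
        return false;
    sg->page = page;
    sg->length = len;
    return true;
}
```

In this model, a zero-filled table that skips model_sg_init_table() is
rejected by model_sg_set_page(), while the same table is accepted once the
init call has run. With the marker compiled out, the missing init call would
go unnoticed, which would be consistent with not seeing the BUG without SG
debugging.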

You can add

Acked-by: Kai Makisara <[EMAIL PROTECTED]>

if you want. This fix should be included in 2.6.24.

-- 
Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.24-rc5: tape drive not responding

2007-12-17 Thread John Stoffel
> "James" == James Bottomley <[EMAIL PROTECTED]> writes:

James> On Mon, 2007-12-17 at 13:43 -0800, Andrew Morton wrote:
>> On Mon, 17 Dec 2007 16:02:02 -0500
>> "John Stoffel" <[EMAIL PROTECTED]> wrote:
>> 
>> > 
>> > Just to confirm, the proposed patch to st.c fixes the issue with
>> > 2.6.24-rc5 as well as 2.6.24-rc5-mm1 with access to my DLT tape
>> > drives.
>> 
>> err, what patch to st.c?

James> That's this one:

James> http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=acdd0b1c371b2fbb4b6110a51ba69cb0af9e6f45

>> So it seems that 2.6.24 (and presumably 2.6.23?) need

James> Not 2.6.23 .. the scatterlist changes causing the st problems
James> are local to 2.6.24.

Correct, I ran 2.6.23 for 47+ days of uptime without any problems.  I
jumped to 2.6.24-rc5-mm1 to do my best to help out with finding
problems.  Happy to have found one.  :]

>> 1: Alan's "initio: fix conflict when loading driver" (currently stuck
>> in git-scsi-misc)

James> Yes, I'm moving this into scsi-rc-fixes

I have nothing to do with this issue.

>> 2: Boaz's "initio: initio_build_scb() fix" (my name for it)

James> And applying this ... although I'd still appreciate confirmation from
James> someone that the initio driver works after this.

Sorry, I don't have any of this hardware.

>> 3: The mystery st.c fix.
>> 
>> yes?

James> James

Here's the simple one-line patch for the st.c problem:

--- orig/drivers/scsi/st.c	2007-12-16 20:08:45.0 -0500
+++ patched/drivers/scsi/st.c	2007-12-17 13:55:30.0 -0500
@@ -3611,6 +3611,7 @@
 
 	tb->dma = need_dma;
 	tb->buffer_size = got;
+	sg_init_table(tb->sg, max_sg);
 
 	return tb;
 }


Hopefully it's not whitespace damaged.

John


Re: 2.6.24-rc5: tape drive not responding

2007-12-17 Thread Jean-Louis Dupond

http://marc.info/?l=linux-scsi&m=119770154127770&w=2

That is the patch for st.c.

Andrew Morton wrote:

On Mon, 17 Dec 2007 16:02:02 -0500
"John Stoffel" <[EMAIL PROTECTED]> wrote:


Just to confirm, the proposed patch to st.c fixes the issue with
2.6.24-rc5 as well as 2.6.24-rc5-mm1 with access to my DLT tape
drives.


err, what patch to st.c?

So it seems that 2.6.24 (and presumably 2.6.23?) need

1: Alan's "initio: fix conflict when loading driver" (currently stuck
   in git-scsi-misc)

2: Boaz's "initio: initio_build_scb() fix" (my name for it)

3: The mystery st.c fix.

yes?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




Re: 2.6.24-rc5: tape drive not responding

2007-12-17 Thread James Bottomley

On Mon, 2007-12-17 at 13:43 -0800, Andrew Morton wrote:
> On Mon, 17 Dec 2007 16:02:02 -0500
> "John Stoffel" <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Just to confirm, the proposed patch to st.c fixes the issue with
> > 2.6.24-rc5 as well as 2.6.24-rc5-mm1 with access to my DLT tape
> > drives.
> 
> err, what patch to st.c?

That's this one:

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=acdd0b1c371b2fbb4b6110a51ba69cb0af9e6f45

> So it seems that 2.6.24 (and presumably 2.6.23?) need

Not 2.6.23 .. the scatterlist changes causing the st problems are local
to 2.6.24.

> 1: Alan's "initio: fix conflict when loading driver" (currently stuck
>    in git-scsi-misc)

Yes, I'm moving this into scsi-rc-fixes

> 2: Boaz's "initio: initio_build_scb() fix" (my name for it)

And applying this ... although I'd still appreciate confirmation from
someone that the initio driver works after this.

> 3: The mystery st.c fix.
> 
> yes?

James




Re: 2.6.24-rc5: tape drive not responding

2007-12-17 Thread Andrew Morton
On Mon, 17 Dec 2007 16:02:02 -0500
"John Stoffel" <[EMAIL PROTECTED]> wrote:

> 
> Just to confirm, the proposed patch to st.c fixes the issue with
> 2.6.24-rc5 as well as 2.6.24-rc5-mm1 with access to my DLT tape
> drives.

err, what patch to st.c?

So it seems that 2.6.24 (and presumably 2.6.23?) need

1: Alan's "initio: fix conflict when loading driver" (currently stuck
   in git-scsi-misc)

2: Boaz's "initio: initio_build_scb() fix" (my name for it)

3: The mystery st.c fix.

yes?


Re: 2.6.24-rc5: tape drive not responding

2007-12-17 Thread John Stoffel

Just to confirm, the proposed patch to st.c fixes the issue with
2.6.24-rc5 as well as 2.6.24-rc5-mm1 with access to my DLT tape
drives.

Thanks!
John


Re: 2.6.24-rc5: tape drive not responding

2007-12-17 Thread John Stoffel
> "Andrew" == Andrew Morton <[EMAIL PROTECTED]> writes:

Andrew> On Mon, 17 Dec 2007 11:25:51 +0900 FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
>> On Sun, 16 Dec 2007 20:05:51 -0500
>> "John Stoffel" <[EMAIL PROTECTED]> wrote:
>> > 
>> > [  273.382057] sd 12:0:0:3: Attached scsi generic sg13 type 0
>> > [  276.244872] [ cut here ]
>> > [  276.300215] kernel BUG at include/linux/scatterlist.h:59!
>> > [  276.364873] invalid opcode:  [#1] SMP 
>> > [  276.414346] Modules linked in:
>> > [  276.451148] 
>> > [  276.469036] Pid: 1824, comm: stinit Not tainted (2.6.24-rc5 #2)
>> > [  276.539940] EIP: 0060:[] EFLAGS: 00010213 CPU: 0
>> > [  276.605651] EIP is at st_do_scsi+0x2e0/0x340
>> > [  276.656788] EAX:  EBX:  ECX: c16ef780 EDX: f7c4f050
>> > [  276.731847] ESI: f7c4f7d0 EDI: 1000 EBP: f7c4f000 ESP: f712bdf8
>> > [  276.806904]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>> > [  276.871568] Process stinit (pid: 1824, ti=f712b000 task=f750a030 task.ti=f712b000)
>> > [  276.960139] Stack: 0003 f7c4f050   00d59f80 
>> >  f776fe20 c03468a0 
>> > [  277.062012]00d0 f712be9c f7d2a000 f776fe20 f7d2a018 
>> >  0006 f712be9c 
>> > [  277.163890]f7d2a000 f712beac f7c4f000 c0345790 0006 
>> > 0002 000dbba0  
>> > [  277.265771] Call Trace:
>> > [  277.297383]  [] st_sleep_done+0x0/0x70
>> > [  277.352894]  [] check_tape+0x510/0x640
>> > [  277.408414]  [] st_open+0x18b/0x220
>> > [  277.460803]  [] exact_match+0x0/0x10
>> > [  277.514237]  [] st_open+0x0/0x220
>> > [  277.564553]  [] chrdev_open+0x9f/0x190
>> > [  277.620069]  [] chrdev_open+0x0/0x190
>> > [  277.674543]  [] __dentry_open+0xaf/0x1b0
>> > [  277.732136]  [] nameidata_to_filp+0x35/0x40
>> > [  277.792847]  [] do_filp_open+0x4b/0x60
>> > [  277.848364]  [] get_unused_fd_flags+0x52/0xd0
>> > [  277.911153]  [] do_sys_open+0x4c/0xe0
>> > [  277.965629]  [] sys_open+0x1c/0x20
>> 
>> I think that you need the following patch for the scatterlist problem:
>> 
>> http://marc.info/?l=linux-scsi&m=119770154127770&w=2

Andrew> err, you sent that patch to John a day earlier too.

Andrew> John, can you please apply, test and report?

Happily, this seems to fix the above crash on 2.6.24-rc5-mm1.  I've
also left the fix in 2.6.24-rc5 and will test that on my next reboot.
It's looking good!

So, with the patch applied, this regression is fixed on 2.6.24-rc5-mm1.

Next, it would be nice to rate limit the "SCSI parity error detected"
messages from the Symbios driver.  I'll see if I can hack something up
in the next day or so.
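
As a rough idea of what rate limiting these messages could look like, here is
a userspace sketch of an interval/burst limiter. It is not the Symbios driver
code and not the kernel's own helper (the kernel does provide
printk_ratelimit()); the struct, the model_* function names, and the tick
units are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limiter state: allow `burst` messages per `interval` ticks. */
struct model_ratelimit {
    long interval;   /* window length, in caller-defined ticks */
    int burst;       /* messages allowed per window */
    long begin;      /* tick at which the current window started */
    int printed;     /* messages let through in this window */
    int missed;      /* messages suppressed in this window */
};

static void model_ratelimit_init(struct model_ratelimit *rl,
                                 long interval, int burst)
{
    assert(rl != 0 && interval > 0 && burst > 0);
    rl->interval = interval;
    rl->burst = burst;
    rl->begin = 0;
    rl->printed = 0;
    rl->missed = 0;
}

/* Returns true if a message arriving at tick `now` may be printed. */
static bool model_ratelimit_ok(struct model_ratelimit *rl, long now)
{
    /* Start a fresh window once the previous one has expired. */
    if (now - rl->begin >= rl->interval) {
        rl->begin = now;
        rl->printed = 0;
        rl->missed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;
    return false;
}
```

A caller would wrap the message in "if (model_ratelimit_ok(&rl, now))" so
that a flood of identical parity errors prints only the first few per window;
the missed counter could be reported when a new window opens.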

Thanks,
John


Re: 2.6.24-rc5: tape drive not responding

2007-12-17 Thread John Stoffel
> "Andrew" == Andrew Morton <[EMAIL PROTECTED]> writes:

Andrew> On Mon, 17 Dec 2007 11:25:51 +0900 FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
>> On Sun, 16 Dec 2007 20:05:51 -0500
>> "John Stoffel" <[EMAIL PROTECTED]> wrote:
>> 
>> > [  215.007701] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  215.008145] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  215.008678] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  215.009122] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  215.009598] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  215.010042] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  215.010516] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  215.010959] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  215.011403] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  215.011850] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > .
>> > .
>> > .
>> > [  232.954629] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  233.035902] scsi 3:0:3:0: DEVICE RESET operation started
>> > [  233.099514] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > .
>> > .
>> > .
>> > 
>> > These repeat for about 15 seconds or so.  They're really annoying and
>> > I'd love to see some sort of rate limiting put in here.  The messages
>> > end with:
>> > .
>> > .
>> > .
>> > [  238.084175] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  238.165887] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  238.247157] scsi 3:0:3:0: DEVICE RESET operation timed-out.
>> > [  238.313892] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  238.395192] scsi 3:0:3:0: BUS RESET operation started
>> > [  238.455690] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
>> > [  238.539216] sym1: SCSI BUS reset detected.
>> > [  238.592552] sym1: SCSI BUS has been reset.
>> > [  238.641576] scsi 3:0:3:0: BUS RESET operation complete.
>> > [  248.700373]  target3:0:3: wide asynchronous
>> > [  248.752026]  target3:0:3: Wide Transfers Fail
>> > [  248.805220]  target3:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
>> > [  248.886729]  target3:0:3: Domain Validation skipping write tests
>> > [  248.958666]  target3:0:3: Ending Domain Validation
>> > [  252.264086] scsi 3:0:0:0: Attached scsi generic sg2 type 8
>> > [  252.331257] st 3:0:2:0: Attached scsi tape st0
>> > [  252.384549] st 3:0:2:0: st0: try direct i/o: yes (alignment 512 B)
>> > [  252.458875] st 3:0:2:0: Attached scsi generic sg3 type 1
>> > [  252.523963] st 3:0:3:0: Attached scsi tape st1
>> > [  252.577184] st 3:0:3:0: st1: try direct i/o: yes (alignment 512 B)
>> > [  252.651484] st 3:0:3:0: Attached scsi generic sg4 type 1
>> > 
>> > 
>> > I've also got an ATL P1000 SCSI tape library hooked up to this same
>> > controller and port, and I can manipulate it properly using the 'mtx'
>> > program pointed to the /dev/changer alias, which points to the correct
>> > /dev/sg# device.
>> > 
>> > Here's my /proc/scsi/scsi output, as you can see, I've got a bunch of
>> > devices on this system:
>> > 
>> > # cat /proc/scsi/scsi 
>> > Attached devices:
>> > Host: scsi0 Channel: 00 Id: 00 Lun: 00
>> >   Vendor: COMPAQ   Model: HC01841729   Rev: 3208
>> >   Type:   Direct-Access   ANSI  SCSI revision: 02
>> > Host: scsi0 Channel: 00 Id: 01 Lun: 00
>> >   Vendor: COMPAQ   Model: BD018222CA   Rev: B016
>> >   Type:   Direct-Access   ANSI  SCSI revision: 02
>> > Host: scsi3 Channel: 00 Id: 00 Lun: 00
>> >   Vendor: ATL  Model: P10006220051 Rev: 1.20
>> >   Type:   Medium Changer   ANSI  SCSI revision: 02
>> > Host: scsi3 Channel: 00 Id: 02 Lun: 00
>> >   Vendor: QUANTUM  Model: DLT7000  Rev: 2565
>> >   Type:   Sequential-Access   ANSI  SCSI revision: 02
>> > Host: scsi3 Channel: 00 Id: 03 Lun: 00
>> >   Vendor: QUANTUM  Model: DLT7000  Rev: 2565
>> >   Type:   Sequential-Access   ANSI  SCSI revision: 02
>> > Host: scsi4 Channel: 00 Id: 00 Lun: 00
>> >   Vendor: SAMSUNG  Model: CDRW/DVD SM-352B Rev: T806
>> >   Type:   CD-ROM   ANSI  SCSI revision: 05
>> > Host: scsi6 Channel: 00 Id: 00 Lun: 00
>> >   Vendor: ATA  Model: ST3320620AS  Rev: 3.AA
>> >   Type:   Direct-Access   ANSI  SCSI revision: 05
>> > Host: scsi7 Channel: 00 Id: 00 Lun: 00
>> >   Vendor: ATA  Model: WDC WD3200AAKS-0 Rev: 12.0
>> >   Type:   Direct-Access   ANSI  SCSI revision: 05
>> > Host: scsi10 Channel: 00 Id: 00 Lun: 00
>> >   Vendor: ATA  Model: WDC WD1200JB-00C Rev: 17.0
>> >   Type:   Direct-Access   ANSI  SCSI rev

Re: 2.6.24-rc5: tape drive not responding

2007-12-17 Thread Andrew Morton
On Mon, 17 Dec 2007 11:25:51 +0900 FUJITA Tomonori <[EMAIL PROTECTED]> wrote:

> On Sun, 16 Dec 2007 20:05:51 -0500
> "John Stoffel" <[EMAIL PROTECTED]> wrote:
> 
> > [  215.007701] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  215.008145] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  215.008678] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  215.009122] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  215.009598] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  215.010042] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  215.010516] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  215.010959] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  215.011403] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  215.011850] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > .
> > .
> > .
> > [  232.954629] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  233.035902] scsi 3:0:3:0: DEVICE RESET operation started
> > [  233.099514] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > .
> > .
> > .
> > 
> > These repeat for about 15 seconds or so.  They're really annoying and
> > > I'd love to see some sort of rate limiting put in here.  The messages
> > > end with:
> > .
> > .
> > .
> > [  238.084175] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  238.165887] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  238.247157] scsi 3:0:3:0: DEVICE RESET operation timed-out.
> > [  238.313892] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  238.395192] scsi 3:0:3:0: BUS RESET operation started
> > [  238.455690] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> > [  238.539216] sym1: SCSI BUS reset detected.
> > [  238.592552] sym1: SCSI BUS has been reset.
> > [  238.641576] scsi 3:0:3:0: BUS RESET operation complete.
> > [  248.700373]  target3:0:3: wide asynchronous
> > [  248.752026]  target3:0:3: Wide Transfers Fail
> > [  248.805220]  target3:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
> > [  248.886729]  target3:0:3: Domain Validation skipping write tests
> > [  248.958666]  target3:0:3: Ending Domain Validation
> > [  252.264086] scsi 3:0:0:0: Attached scsi generic sg2 type 8
> > [  252.331257] st 3:0:2:0: Attached scsi tape st0
> > [  252.384549] st 3:0:2:0: st0: try direct i/o: yes (alignment 512 B)
> > [  252.458875] st 3:0:2:0: Attached scsi generic sg3 type 1
> > [  252.523963] st 3:0:3:0: Attached scsi tape st1
> > [  252.577184] st 3:0:3:0: st1: try direct i/o: yes (alignment 512 B)
> > [  252.651484] st 3:0:3:0: Attached scsi generic sg4 type 1
> > 
> > 
> > I've also got an ATL P1000 SCSI tape library hooked up to this same
> > controller and port, and I can manipulate it properly using the 'mtx'
> > program pointed to the /dev/changer alias, which points to the correct
> > /dev/sg# device.
> > 
> > Here's my /proc/scsi/scsi output, as you can see, I've got a bunch of
> > devices on this system:
> > 
> > # cat /proc/scsi/scsi 
> > Attached devices:
> > Host: scsi0 Channel: 00 Id: 00 Lun: 00
> >   Vendor: COMPAQ   Model: HC01841729   Rev: 3208
> > >   Type:   Direct-Access   ANSI  SCSI revision: 02
> > > Host: scsi0 Channel: 00 Id: 01 Lun: 00
> > >   Vendor: COMPAQ   Model: BD018222CA   Rev: B016
> > >   Type:   Direct-Access   ANSI  SCSI revision: 02
> > > Host: scsi3 Channel: 00 Id: 00 Lun: 00
> > >   Vendor: ATL  Model: P10006220051 Rev: 1.20
> > >   Type:   Medium Changer   ANSI  SCSI revision: 02
> > > Host: scsi3 Channel: 00 Id: 02 Lun: 00
> > >   Vendor: QUANTUM  Model: DLT7000  Rev: 2565
> > >   Type:   Sequential-Access   ANSI  SCSI revision: 02
> > > Host: scsi3 Channel: 00 Id: 03 Lun: 00
> > >   Vendor: QUANTUM  Model: DLT7000  Rev: 2565
> > >   Type:   Sequential-Access   ANSI  SCSI revision: 02
> > > Host: scsi4 Channel: 00 Id: 00 Lun: 00
> > >   Vendor: SAMSUNG  Model: CDRW/DVD SM-352B Rev: T806
> > >   Type:   CD-ROM   ANSI  SCSI revision: 05
> > > Host: scsi6 Channel: 00 Id: 00 Lun: 00
> > >   Vendor: ATA  Model: ST3320620AS  Rev: 3.AA
> > >   Type:   Direct-Access   ANSI  SCSI revision: 05
> > > Host: scsi7 Channel: 00 Id: 00 Lun: 00
> > >   Vendor: ATA  Model: WDC WD3200AAKS-0 Rev: 12.0
> > >   Type:   Direct-Access   ANSI  SCSI revision: 05
> > > Host: scsi10 Channel: 00 Id: 00 Lun: 00
> > >   Vendor: ATA  Model: WDC WD1200JB-00C Rev: 17.0
> > >   Type:   Direct-Access   ANSI  SCSI revision: 05
> > > Host: scsi11 Channel: 00 Id: 00 Lun: 00
> > >   Vendor: ATA  Model: WDC WD1200JB-00E Rev: 15.0
> > >   Type:   Direct-Access   ANSI  SCSI revision: 05
> > Host: scsi12 Channel: 00 Id: 00 Lun: 00
> >   Vendor: Generic  Model

Re: 2.6.24-rc5: tape drive not responding

2007-12-16 Thread FUJITA Tomonori
On Sun, 16 Dec 2007 20:05:51 -0500
"John Stoffel" <[EMAIL PROTECTED]> wrote:

> [  215.007701] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  215.008145] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  215.008678] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  215.009122] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  215.009598] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  215.010042] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  215.010516] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  215.010959] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  215.011403] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  215.011850] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> .
> .
> .
> [  232.954629] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  233.035902] scsi 3:0:3:0: DEVICE RESET operation started
> [  233.099514] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> .
> .
> .
> 
> These repeat for about 15 seconds or so.  They're really annoying and
> > I'd love to see some sort of rate limiting put in here.  The messages
> > end with:
> .
> .
> .
> [  238.084175] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  238.165887] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  238.247157] scsi 3:0:3:0: DEVICE RESET operation timed-out.
> [  238.313892] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  238.395192] scsi 3:0:3:0: BUS RESET operation started
> [  238.455690] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
> [  238.539216] sym1: SCSI BUS reset detected.
> [  238.592552] sym1: SCSI BUS has been reset.
> [  238.641576] scsi 3:0:3:0: BUS RESET operation complete.
> [  248.700373]  target3:0:3: wide asynchronous
> [  248.752026]  target3:0:3: Wide Transfers Fail
> [  248.805220]  target3:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
> [  248.886729]  target3:0:3: Domain Validation skipping write tests
> [  248.958666]  target3:0:3: Ending Domain Validation
> [  252.264086] scsi 3:0:0:0: Attached scsi generic sg2 type 8
> [  252.331257] st 3:0:2:0: Attached scsi tape st0
> [  252.384549] st 3:0:2:0: st0: try direct i/o: yes (alignment 512 B)
> [  252.458875] st 3:0:2:0: Attached scsi generic sg3 type 1
> [  252.523963] st 3:0:3:0: Attached scsi tape st1
> [  252.577184] st 3:0:3:0: st1: try direct i/o: yes (alignment 512 B)
> [  252.651484] st 3:0:3:0: Attached scsi generic sg4 type 1
> 
> 
> I've also got an ATL P1000 SCSI tape library hooked up to this same
> controller and port, and I can manipulate it properly using the 'mtx'
> program pointed to the /dev/changer alias, which points to the correct
> /dev/sg# device.
> 
> Here's my /proc/scsi/scsi output, as you can see, I've got a bunch of
> devices on this system:
> 
> # cat /proc/scsi/scsi 
> Attached devices:
> Host: scsi0 Channel: 00 Id: 00 Lun: 00
>   Vendor: COMPAQ   Model: HC01841729   Rev: 3208
>   Type:   Direct-Access   ANSI  SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 01 Lun: 00
>   Vendor: COMPAQ   Model: BD018222CA   Rev: B016
>   Type:   Direct-Access   ANSI  SCSI revision: 02
> Host: scsi3 Channel: 00 Id: 00 Lun: 00
>   Vendor: ATL  Model: P10006220051 Rev: 1.20
>   Type:   Medium Changer   ANSI  SCSI revision: 02
> Host: scsi3 Channel: 00 Id: 02 Lun: 00
>   Vendor: QUANTUM  Model: DLT7000  Rev: 2565
>   Type:   Sequential-Access   ANSI  SCSI revision: 02
> Host: scsi3 Channel: 00 Id: 03 Lun: 00
>   Vendor: QUANTUM  Model: DLT7000  Rev: 2565
>   Type:   Sequential-Access   ANSI  SCSI revision: 02
> Host: scsi4 Channel: 00 Id: 00 Lun: 00
>   Vendor: SAMSUNG  Model: CDRW/DVD SM-352B Rev: T806
>   Type:   CD-ROM   ANSI  SCSI revision: 05
> Host: scsi6 Channel: 00 Id: 00 Lun: 00
>   Vendor: ATA  Model: ST3320620AS  Rev: 3.AA
>   Type:   Direct-Access   ANSI  SCSI revision: 05
> Host: scsi7 Channel: 00 Id: 00 Lun: 00
>   Vendor: ATA  Model: WDC WD3200AAKS-0 Rev: 12.0
>   Type:   Direct-Access   ANSI  SCSI revision: 05
> Host: scsi10 Channel: 00 Id: 00 Lun: 00
>   Vendor: ATA  Model: WDC WD1200JB-00C Rev: 17.0
>   Type:   Direct-Access   ANSI  SCSI revision: 05
> Host: scsi11 Channel: 00 Id: 00 Lun: 00
>   Vendor: ATA  Model: WDC WD1200JB-00E Rev: 15.0
>   Type:   Direct-Access   ANSI  SCSI revision: 05
> Host: scsi12 Channel: 00 Id: 00 Lun: 00
>   Vendor: Generic  Model: STORAGE DEVICE   Rev: 0001
>   Type:   Direct-Access   ANSI  SCSI revision: 00
> Host: scsi12 Channel: 00 Id: 00 Lun: 01
>   Vendor: Generic  Model: STORAGE DEVICE   Rev: 0001
>   Type:   Direct-Access   ANSI  SCSI revision: 00
> H

Re: 2.6.24-rc5: tape drive not responding

2007-12-16 Thread John Stoffel

Hi,

This looks to be a regression between 2.6.23 and 2.6.24-rc5.  I'll try
to bisect it and report back.  Basically, when I boot up, I get a ton
of errors in the dmesg log along the lines of:

[  215.007701] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  215.008145] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  215.008678] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  215.009122] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  215.009598] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  215.010042] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  215.010516] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  215.010959] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  215.011403] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  215.011850] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
.
.
.
[  232.954629] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  233.035902] scsi 3:0:3:0: DEVICE RESET operation started
[  233.099514] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
.
.
.

These repeat for about 15 seconds or so.  They're really annoying and
I'd love to see some sort of rate limiting put in here.  The messages
end with:
.
.
.
[  238.084175] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  238.165887] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  238.247157] scsi 3:0:3:0: DEVICE RESET operation timed-out.
[  238.313892] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  238.395192] scsi 3:0:3:0: BUS RESET operation started
[  238.455690] sym1: SCSI parity error detected: SCR1=1 DBC=1128 SBCL=ae
[  238.539216] sym1: SCSI BUS reset detected.
[  238.592552] sym1: SCSI BUS has been reset.
[  238.641576] scsi 3:0:3:0: BUS RESET operation complete.
[  248.700373]  target3:0:3: wide asynchronous
[  248.752026]  target3:0:3: Wide Transfers Fail
[  248.805220]  target3:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
[  248.886729]  target3:0:3: Domain Validation skipping write tests
[  248.958666]  target3:0:3: Ending Domain Validation
[  252.264086] scsi 3:0:0:0: Attached scsi generic sg2 type 8
[  252.331257] st 3:0:2:0: Attached scsi tape st0
[  252.384549] st 3:0:2:0: st0: try direct i/o: yes (alignment 512 B)
[  252.458875] st 3:0:2:0: Attached scsi generic sg3 type 1
[  252.523963] st 3:0:3:0: Attached scsi tape st1
[  252.577184] st 3:0:3:0: st1: try direct i/o: yes (alignment 512 B)
[  252.651484] st 3:0:3:0: Attached scsi generic sg4 type 1


I've also got an ATL P1000 SCSI tape library hooked up to this same
controller and port, and I can manipulate it properly using the 'mtx'
program pointed to the /dev/changer alias, which points to the correct
/dev/sg# device.

Here's my /proc/scsi/scsi output, as you can see, I've got a bunch of
devices on this system:

# cat /proc/scsi/scsi 
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: COMPAQ   Model: HC01841729   Rev: 3208
  Type:   Direct-Access   ANSI  SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: COMPAQ   Model: BD018222CA   Rev: B016
  Type:   Direct-Access   ANSI  SCSI revision: 02
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: ATL  Model: P10006220051 Rev: 1.20
  Type:   Medium Changer   ANSI  SCSI revision: 02
Host: scsi3 Channel: 00 Id: 02 Lun: 00
  Vendor: QUANTUM  Model: DLT7000  Rev: 2565
  Type:   Sequential-Access   ANSI  SCSI revision: 02
Host: scsi3 Channel: 00 Id: 03 Lun: 00
  Vendor: QUANTUM  Model: DLT7000  Rev: 2565
  Type:   Sequential-Access   ANSI  SCSI revision: 02
Host: scsi4 Channel: 00 Id: 00 Lun: 00
  Vendor: SAMSUNG  Model: CDRW/DVD SM-352B Rev: T806
  Type:   CD-ROM   ANSI  SCSI revision: 05
Host: scsi6 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA  Model: ST3320620AS  Rev: 3.AA
  Type:   Direct-Access   ANSI  SCSI revision: 05
Host: scsi7 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA  Model: WDC WD3200AAKS-0 Rev: 12.0
  Type:   Direct-Access   ANSI  SCSI revision: 05
Host: scsi10 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA  Model: WDC WD1200JB-00C Rev: 17.0
  Type:   Direct-Access   ANSI  SCSI revision: 05
Host: scsi11 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA  Model: WDC WD1200JB-00E Rev: 15.0
  Type:   Direct-Access   ANSI  SCSI revision: 05
Host: scsi12 Channel: 00 Id: 00 Lun: 00
  Vendor: Generic  Model: STORAGE DEVICE   Rev: 0001
  Type:   Direct-Access   ANSI  SCSI revision: 00
Host: scsi12 Channel: 00 Id: 00 Lun: 01
  Vendor: Generic  Model: STORAGE DEVICE   Rev: 0001
  Type:   Direct-Access   ANSI  SCSI revision: 00
Host: scsi12 Channel: 00 Id: 00 Lun: 02
  Vendor: Generic  Model: STO