[PATCH] SCSI st : convert to unlocked_ioctl

2008-01-17 Thread Kai Makisara
Convert st to unlocked_ioctl. The necessary locking was already in place.

Signed-off-by: Kai Makisara <[EMAIL PROTECTED]>
---
The patch is against 2.6.24-rc8.

 drivers/scsi/st.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- linux-2.6/drivers/scsi/st.c 2007-12-20 18:26:03.0 +0200
+++ linux-2.6-rc8-test/drivers/scsi/st.c 2008-01-17 21:49:14.0 +0200
@@ -9,7 +9,7 @@
    Steve Hirsch, Andreas Koppenh"ofer, Michael Leodolter, Eyal Lebedinsky,
    Michael Schaefer, J"org Weule, and Eric Youngdale.
 
-   Copyright 1992 - 2007 Kai Makisara
+   Copyright 1992 - 2008 Kai Makisara
    email [EMAIL PROTECTED]
 
    Some small formal changes - aeb, 950809
@@ -17,7 +17,7 @@
    Last modified: 18-JAN-1998 Richard Gooch <[EMAIL PROTECTED]> Devfs support
  */
 
-static const char *verstr = "20070203";
+static const char *verstr = "20080117";
 
 #include <linux/module.h>
 
@@ -3214,8 +3214,7 @@ static int partition_tape(struct scsi_ta
 
 
 /* The ioctl command */
-static int st_ioctl(struct inode *inode, struct file *file,
-   unsigned int cmd_in, unsigned long arg)
+static long st_ioctl(struct file *file, unsigned int cmd_in, unsigned long arg)
 {
    int i, cmd_nr, cmd_type, bt;
    int retval = 0;
@@ -3870,7 +3869,7 @@ static const struct file_operations st_f
    .owner = THIS_MODULE,
    .read = st_read,
    .write = st_write,
-   .ioctl = st_ioctl,
+   .unlocked_ioctl = st_ioctl,
 #ifdef CONFIG_COMPAT
    .compat_ioctl = st_compat_ioctl,
 #endif
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
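
Background for the conversion above: the old .ioctl method was called with the Big Kernel Lock held and received the inode, while .unlocked_ioctl is called without that lock, takes only the file, and returns long. The handler therefore has to rely on the driver's own locking. Below is a minimal userspace sketch of the new signature. All names in it (mock_file, mock_tape, mock_file_operations, mock_st_ioctl) are hypothetical stand-ins and not kernel types, and the int flag merely stands in for the driver's per-device mutex.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's types. */
struct mock_file {
    void *private_data;         /* points at the per-device state */
};

struct mock_tape {
    int lock_held;              /* stands in for the driver's own mutex */
    int ioctl_calls;            /* counts how often the handler ran */
};

struct mock_file_operations {
    /* New-style signature: no inode argument, returns long. */
    long (*unlocked_ioctl)(struct mock_file *file,
                           unsigned int cmd_in, unsigned long arg);
};

static long mock_st_ioctl(struct mock_file *file,
                          unsigned int cmd_in, unsigned long arg)
{
    struct mock_tape *tape = file->private_data;
    long retval;

    /* No global lock protects us here, so take the device lock first.
     * A real driver would sleep on a mutex instead of bailing out. */
    if (tape->lock_held)
        return -1;
    tape->lock_held = 1;

    tape->ioctl_calls++;
    retval = (long)(cmd_in + arg);  /* placeholder for the real work */

    tape->lock_held = 0;
    return retval;
}

static const struct mock_file_operations mock_st_fops = {
    .unlocked_ioctl = mock_st_ioctl,
};
```

The point of the sketch is only that the handler serializes itself; whether a given driver can drop the global lock depends on its locking already covering every path through the handler.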


Re: 2.6.24-rc5: tape drive not responding

2007-12-18 Thread Kai Makisara
On Mon, 17 Dec 2007, James Bottomley wrote:

> 
> On Mon, 2007-12-17 at 13:43 -0800, Andrew Morton wrote:
> > On Mon, 17 Dec 2007 16:02:02 -0500
> > "John Stoffel" <[EMAIL PROTECTED]> wrote:
> > 
> > > 
> > > Just to confirm, the proposed patch to st.c fixes the issue with
> > > 2.6.24-rc5 as well as 2.6.24-rc5-mm1 with access to my DLT tape
> > > drives.
> > 
> > err, what patch to st.c?
> 
> That's this one:
> 
> http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=acdd0b1c371b2fbb4b6110a51ba69cb0af9e6f45
> 
I have done some tests. Firstly, I did not see the BUG with 2.6.24-rc5. 
Looking at include/linux/scatterlist.h suggested that CONFIG_DEBUG_SG has 
something to do with this. When I enabled SG debugging, I also saw the BUG. 
Adding this patch solved the problem.

You can add

Acked-by: Kai Makisara <[EMAIL PROTECTED]>

if you want. This fix should be included in 2.6.24.

-- 
Kai


Re: [PATCH 3/5] Use mutex instead of semaphore in the SCSI Tape driver

2007-07-30 Thread Kai Makisara
On Sun, 29 Jul 2007, Matthias Kaehlcke wrote:

> The SCSI Tape driver uses a semaphore as mutex. Use the mutex API
> instead of the (binary) semaphore.
> 
> Signed-off-by: Matthias Kaehlcke <[EMAIL PROTECTED]>
> 
Signed-off-by: Kai Makisara <[EMAIL PROTECTED]>

Thanks.

-- 
Kai


Re: [PATCH 00/33] SG table chaining support

2007-07-16 Thread Kai Makisara
On Mon, 16 Jul 2007, Martin K. Petersen wrote:

> > "John" == John Stoffel <[EMAIL PROTECTED]> writes:
> 
> John> Will this help out tape drive performance at all?  I looked
> John> through the patches quickly, esp the AIC7xxx stuff since that's
> John> what I use, but nothing jumped out at me...
> 
> Yes.  Most modern tape drives want a block size of 1MB or higher.
> With the old stack we'd be stuck at 512KB because the sg limitations
> caused us to come just short of 1MB...
> 
Tape block sizes up to 16 MB have been possible for a very long time but 
this has required tuning of the block/scsi parameters. Very few people 
seem to have done this and the common (mis)belief seems to be that the 
tape block size limit has been 512 kB. It would be good if this tuning 
were no longer needed in the future.

-- 
Kai


Re: SMART problems in 2.6.22

2007-07-16 Thread Kai Makisara
On Mon, 16 Jul 2007, Tejun Heo wrote:

> Please try the patch in the following message.
> 
> http://article.gmane.org/gmane.linux.ide/20799/raw
> 
This solves the 'smartctl -H' problem on both of my systems (one with Nvidia 
CK804 and one with MCP51).

Tested-by: Kai Makisara <[EMAIL PROTECTED]>

Thanks for pointing out the patch.

-- 
Kai


Re: SMART problems in 2.6.22

2007-07-16 Thread Kai Makisara
On Tue, 10 Jul 2007, Kai Makisara wrote:

> On Sun, 8 Jul 2007, Bruce Allen wrote:
> 
> > Mark, David, Doug, Tejun, Alan, Jeff, LKML,
> > 
> > I'm afraid that there may be some problem with SMART + libata in the 2.6.22
> > kernel.  An hour ago I discovered that I missed a month of correspondence
> > (some LKML, some private) about this problem which Alan, Tejun, Jeff, Mark and
> > others copied to me -- it was automatically shoved into one of my mailboxes by
> > my mail client.  Sorry about that.  So I am trying to catch up to see if there
> > is some real problem or not.
> > 
...
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0706.1/0849.html
> 
> I have done some more debugging on this one. An easy way to reproduce the 
> problem is to use 'smartctl -H /dev/sdb'. If I enable debugging with '-r 
> ioctl,2', I find the following difference between outputs using 2.6.21.1 
> (works OK) and 2.6.22 (fails):
> 
...
> The log shows that the sense data returned by the commands differ: with 
> 2.6.22 the bytes 4f and c2 (tf.lbam and tf.lbah) are not returned. Both of 
> the status commands fail to return these bytes but the tests in smartctl 
> are more strict for the second case. This is why the second status command 
> seems to be failing.
> 
> Next I added printks to the function ata_qc_complete() in libata-core.c. 
> The changed code from 2.6.22 at line 5222 looked like this:
> 
...
> The output from 2.6.21.6 looks like this:
> 
> Jul  9 18:37:44 kai kernel: [  193.443874] ata_qc_complete before: 00 00 00 40
> Jul  9 18:37:44 kai kernel: [  193.443880] ata_qc_complete 16: 00 4f c2 50
> Jul  9 18:37:44 kai kernel: [  193.462802] ata_qc_complete before: 00 4f c2 40
> Jul  9 18:37:44 kai kernel: [  193.462807] ata_qc_complete 16: 00 4f c2 50
> 
> i.e., the bytes are returned.
> 
> The output from 2.6.22 is different:
> 
> Jul  9 18:44:35 kai kernel: [  147.765965] ata_qc_complete before: 00 00 00 40
> Jul  9 18:44:35 kai kernel: [  147.765970] ata_qc_complete 16: 00 00 00 50
> Jul  9 18:44:35 kai kernel: [  147.784890] ata_qc_complete before: 00 00 00 40
> Jul  9 18:44:35 kai kernel: [  147.784894] ata_qc_complete 16: 00 00 00 50
> 
> The lbam and lbah bytes are not returned but the command byte is.
> 
The other system with the Maxtor disk fails in a slightly different way 
(it correctly returns the c2 byte but not in the correct location):

[  162.896173] ata_qc_complete before: 00 00 00 40
[  162.896179] ata_qc_complete 16: 00 c2 00 50
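
Why the byte positions matter: SMART RETURN STATUS reports health through the LBA mid and LBA high registers. The pair 4Fh/C2h means no threshold was exceeded and F4h/2Ch means a threshold was exceeded; any other combination cannot be interpreted. A minimal sketch of such a check follows. The enum and function names are made up for illustration and are not taken from smartctl or libata.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum smart_health {
    SMART_HEALTH_OK,        /* LBA mid/high = 4Fh/C2h */
    SMART_HEALTH_FAILING,   /* LBA mid/high = F4h/2Ch */
    SMART_HEALTH_UNKNOWN    /* anything else: result cannot be trusted */
};

/* Classify the LBA mid/high pair returned by SMART RETURN STATUS. */
static enum smart_health smart_classify(uint8_t lbam, uint8_t lbah)
{
    if (lbam == 0x4f && lbah == 0xc2)
        return SMART_HEALTH_OK;
    if (lbam == 0xf4 && lbah == 0x2c)
        return SMART_HEALTH_FAILING;
    return SMART_HEALTH_UNKNOWN;
}
```

With the values printed above (feature, lbam, lbah, status), "00 4f c2 50" classifies as OK, while both "00 00 00 50" and the Maxtor case "00 c2 00 50" fall into the unknown bucket.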



My earlier 'git bisect' suggested that this problem surfaced after the patch

1e999736cafdffc374f22eed37b291129ef82e4e is first bad commit
commit 1e999736cafdffc374f22eed37b291129ef82e4e
Author: Alan Cox <[EMAIL PROTECTED]>
Date:   Wed Apr 11 00:23:13 2007 +0100

libata: HPA support

I have now done some further tests to see what is happening. 
It turned out that after commenting out the call (at line 1956 in 
drivers/ata/libata-core.c in 2.6.22)

if (ata_id_hpa_enabled(dev->id))
   dev->n_sectors = ata_hpa_resize(dev);

'smartctl -H' worked again without problems. This applied to both of the 
systems where I see the problem. The disks in both systems support HPA but 
nothing is hidden. Next I commented out only the call to 
ata_read_native_max_address_ext() in ata_hpa_resize(). This was enough 
to remove the problem (as was expected).

So, the question is: why does calling ata_read_native_max_address_ext() 
when booting the system cause SMART RETURN STATUS to fail much later?

-- 
Kai


Re: SMART problems in 2.6.22

2007-07-09 Thread Kai Makisara
On Sun, 8 Jul 2007, Bruce Allen wrote:

> Mark, David, Doug, Tejun, Alan, Jeff, LKML,
> 
> I'm afraid that there may be some problem with SMART + libata in the 2.6.22
> kernel.  An hour ago I discovered that I missed a month of correspondence
> (some LKML, some private) about this problem which Alan, Tejun, Jeff, Mark and
> others copied to me -- it was automatically shoved into one of my mailboxes by
> my mail client.  Sorry about that.  So I am trying to catch up to see if there
> is some real problem or not.
> 
> Here is a typical bug report that worries me:
> http://article.gmane.org/gmane.linux.utilities.smartmontools/4712
> 
> Here is another similar report:
> http://thread.gmane.org/gmane.linux.utilities.smartmontools/4713
> 
> And another report:
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg358354.html
> 
> From some of the earlier threads that I missed (below) I have the impression
> that the problem may be a very simple one, namely that starting with 2.6.22
> one needs to run a command to enable SMART when a box is first booted -- the
> kernel no longer does this as part of the init/setup of the disks. But that is
> NOT consistent with the first two reports above, which show 'SMART ENABLED'.
> 
> Here are some of the earlier threads that I completely missed:
> 
> http://www.ussg.iu.edu/hypermail/linux/kernel/0706.1/0849.html

I have done some more debugging on this one. An easy way to reproduce the 
problem is to use 'smartctl -H /dev/sdb'. If I enable debugging with '-r 
ioctl,2', I find the following difference between outputs using 2.6.21.1 
(works OK) and 2.6.22 (fails):

--- sm-2.6.21.1b.log 2007-07-09 23:47:28.0 +0300
+++ sm-2.6.22.log   2007-07-09 23:39:56.0 +0300
@@ -11,7 +11,7 @@
   status=0x0
  [ata pass-through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00 ]
   scsi_status=0x0, host_status=0x0, driver_status=0x0
-  info=0x0  duration=0 milliseconds  resid=0
+  info=0x0  duration=4 milliseconds  resid=0
   Incoming data, len=512 [only first 256 bytes shown]:
  00 5a 0c ff 3f 37 c8 10 00  00 00 00 00 3f 00 00 00
  10 00 00 00 00 20 20 20 20  20 20 20 20 20 20 20 20
@@ -97,11 +97,11 @@
   scsi_status=0x2, host_status=0x0, driver_status=0x8
   info=0x1  duration=48 milliseconds  resid=0
   >>> Sense buffer, len=22:
- 00 72 00 00 00 00 00 00 0e  09 0c 00 00 00 00 00 00
- 10 00 4f 00 c2 00 50
+ 00 72 00 00 00 00 00 00 0e  09 0c 00 00 00 01 00 00
+ 10 00 00 00 00 00 50
   status=2: [desc] sense_key=0 asc=0 ascq=0
 Values from ATA status return descriptor are:
- 00 09 0c 00 00 00 00 00 00  00 4f 00 c2 00 50
+ 00 09 0c 00 00 00 01 00 00  00 00 00 00 00 50
 REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS returned 0
 
 REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK
@@ -110,9 +110,13 @@
   info=0x1  duration=40 milliseconds  resid=0
   >>> Sense buffer, len=22:
  00 72 00 00 00 00 00 00 0e  09 0c 00 00 00 00 00 00
- 10 00 4f 00 c2 00 50
+ 10 00 00 00 00 00 50
   status=2: [desc] sense_key=0 asc=0 ascq=0
 Values from ATA status return descriptor are:
- 00 09 0c 00 00 00 00 00 00  00 4f 00 c2 00 50
-REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned 0
-
+ 00 09 0c 00 00 00 00 00 00  00 00 00 00 00 50
+Error SMART Status command failed
+Please get assistance from http://smartmontools.sourceforge.net/
+Values from ATA status return descriptor are:
+ 00 09 0c 00 00 00 00 00 00  00 00 00 00 00 50
+REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned -1
+A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.


The log shows that the sense data returned by the commands differ: with 
2.6.22 the bytes 4f and c2 (tf.lbam and tf.lbah) are not returned. Both of 
the status commands fail to return these bytes but the tests in smartctl 
are more strict for the second case. This is why the second status command 
seems to be failing.
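
To make the byte positions concrete, here is a rough sketch of how the register values can be pulled out of the 22-byte sense buffer shown in the log. It assumes descriptor-format sense data (response code 72h) carrying an ATA status return descriptor (code 09h, additional length 0Ch), with the field offsets as I read them from the SAT layout. The struct and function names are hypothetical and this is not smartctl's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the registers we care about. */
struct ata_result {
    uint8_t error, count, lbal, lbam, lbah, device, status;
};

/*
 * Walk the descriptors in descriptor-format sense data and copy the
 * registers out of the first ATA status return descriptor found.
 * Returns 0 on success, -1 if no usable descriptor is present.
 */
static int parse_ata_return_descriptor(const uint8_t *sense, size_t len,
                                       struct ata_result *out)
{
    size_t pos = 8, end;

    if (len < 8 || (sense[0] & 0x7f) != 0x72)
        return -1;                      /* not descriptor format */

    end = 8 + (size_t)sense[7];         /* byte 7: additional length */
    if (end > len)
        end = len;

    while (pos + 2 <= end) {
        const uint8_t *d = sense + pos;
        size_t dlen = (size_t)d[1] + 2; /* code + length + payload */

        if (pos + dlen > end)
            return -1;                  /* truncated descriptor */
        if (d[0] == 0x09 && dlen >= 14) {
            out->error  = d[3];
            out->count  = d[5];
            out->lbal   = d[7];
            out->lbam   = d[9];
            out->lbah   = d[11];
            out->device = d[12];
            out->status = d[13];
            return 0;
        }
        pos += dlen;
    }
    return -1;
}
```

Fed the 2.6.21.1 buffer from the log, this yields lbam=4f, lbah=c2, status=50; fed the 2.6.22 buffer, lbam and lbah come back as 00, which is the difference the diff shows.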

Next I added printks to the function ata_qc_complete() in libata-core.c. 
The changed code from 2.6.22 at line 5222 looked like this:

	/* read result TF if requested */
	if (qc->flags & ATA_QCFLAG_RESULT_TF) {
		if (qc->tf.feature == 0xda)
			printk("ata_qc_complete before: %02x %02x %02x %02x\n",
			       qc->result_tf.feature,
			       qc->result_tf.lbam, qc->result_tf.lbah,
  

2.6.22-rc regression: smartctl does not work with SATA disk

2007-06-10 Thread Kai Makisara
The command 'smartctl -a /dev/sdb' fails with the 2.6.22-rc4 kernel. The 
disk /dev/sdb is a SATA disk. The command still works with a real SCSI 
disk.

The computer has an Athlon64 X2 and is running an x86_64 SMP kernel. The 
chipset is Nvidia CK804 and the sata_nv driver is used.

The following output from 'smartctl -a -r ioctl,1 /dev/sdb' shows the disk 
details and where the regression is:

-
smartctl version 5.38 [x86_64-suse-linux-gnu] Copyright (C) 2002-7 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

 [inquiry: 12 00 00 00 24 00 ]
  scsi_status=0x0, host_status=0x0, driver_status=0x0
  info=0x0  duration=0 milliseconds  resid=0
  status=0x0
 [ata pass-through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00 ]
  scsi_status=0x0, host_status=0x0, driver_status=0x0
  info=0x0  duration=4 milliseconds  resid=0
  status=0x0
Detected SAT interface, switch to device type 'sat'

REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE
 [ata pass-through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00 ]
  scsi_status=0x0, host_status=0x0, driver_status=0x0
  info=0x0  duration=0 milliseconds  resid=0
  status=0x0
REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE returned 0
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3320620AS
Serial Number:    9QF22KAP
Firmware Version: 3.AAJ
User Capacity:    320,072,933,376 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Jun 10 10:47:30 2007 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS
 [ata pass-through(16): 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00 ]
  scsi_status=0x2, host_status=0x0, driver_status=0x8
  info=0x1  duration=44 milliseconds  resid=0
  status=2: [desc] sense_key=0 asc=0 ascq=0
REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS returned 0

REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK
 [ata pass-through(16): 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00 ]
  scsi_status=0x2, host_status=0x0, driver_status=0x8
  info=0x1  duration=44 milliseconds  resid=0
  status=2: [desc] sense_key=0 asc=0 ascq=0
Error SMART Status command failed
Please get assistance from http://smartmontools.sourceforge.net/
Values from ATA status return descriptor are:
 00 09 0c 00 00 00 00 00 00  00 00 00 00 00 50  
REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned -1
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
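
For reference, the 16-byte command block logged for SMART STATUS above can be rebuilt field by field. This is only a sketch of the ATA PASS-THROUGH (16) layout as I understand it from SAT; the function name is made up, and the bit names in the comments are my reading of the spec rather than anything taken from smartctl.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fill in an ATA PASS-THROUGH (16) CDB carrying SMART RETURN STATUS.
 * Hypothetical helper; the result should match the bytes in the log:
 * 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00
 */
static void build_smart_return_status_cdb(uint8_t cdb[16])
{
    memset(cdb, 0, 16);
    cdb[0]  = 0x85;                 /* ATA PASS-THROUGH (16) opcode */
    cdb[1]  = 3 << 1;               /* protocol 3: non-data */
    cdb[2]  = 0x20 | 0x08 | 0x04;   /* CK_COND | T_DIR | BYT_BLOK */
    cdb[4]  = 0xda;                 /* features: SMART RETURN STATUS */
    cdb[10] = 0x4f;                 /* LBA mid: SMART signature */
    cdb[12] = 0xc2;                 /* LBA high: SMART signature */
    cdb[14] = 0xb0;                 /* command: SMART */
}
```

The CK_COND bit is what asks the translation layer to hand back the result registers in sense data even when the command succeeds, which is why a successful SMART STATUS shows scsi_status=0x2 in the log.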


This is smartctl from cvs a few days ago but the smartctl shipping with 
SuSE 10.2 fails in the same way.


I ran 'git bisect' and it suggests that the problem was introduced by

1e999736cafdffc374f22eed37b291129ef82e4e is first bad commit
commit 1e999736cafdffc374f22eed37b291129ef82e4e
Author: Alan Cox <[EMAIL PROTECTED]>
Date:   Wed Apr 11 00:23:13 2007 +0100

libata: HPA support

i.e., before 2.6.22-rc1. At this point I find it best to leave the problem to 
the experts.

-- 
Kai


2.6.22-rc regression: smartctl does not work with SATA disk

2007-06-10 Thread Kai Makisara
The command 'smartctl -a /dev/sdb' fails with 2.6.22-rc4 kernel. The 
disk /dev/sdb is a SATA disk. The command does work still with a real SCSI 
disk.

The computer has Athlon64 X2 and it is running x86_64 SMP kernel. The 
chipset is Nvidia CK804 and the sata_nv driver is used.

The following output from 'smartctl -a -r ioctl,1 /dev/sdb' tells the disk 
details and shows where the regression is:

-
smartctl version 5.38 [x86_64-suse-linux-gnu] Copyright (C) 2002-7 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

 [inquiry: 12 00 00 00 24 00 ]
  scsi_status=0x0, host_status=0x0, driver_status=0x0
  info=0x0  duration=0 milliseconds  resid=0
  status=0x0
 [ata pass-through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00 ]
  scsi_status=0x0, host_status=0x0, driver_status=0x0
  info=0x0  duration=4 milliseconds  resid=0
  status=0x0
Detected SAT interface, switch to device type 'sat'

REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE
 [ata pass-through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00 ]
  scsi_status=0x0, host_status=0x0, driver_status=0x0
  info=0x0  duration=0 milliseconds  resid=0
  status=0x0
REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE returned 0
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.10 family
Device Model: ST3320620AS
Serial Number:9QF22KAP
Firmware Version: 3.AAJ
User Capacity:320,072,933,376 bytes
Device is:In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:Sun Jun 10 10:47:30 2007 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS
 [ata pass-through(16): 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00 ]
  scsi_status=0x2, host_status=0x0, driver_status=0x8
  info=0x1  duration=44 milliseconds  resid=0
  status=2: [desc] sense_key=0 asc=0 ascq=0
REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS returned 0

REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK
 [ata pass-through(16): 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00 ]
  scsi_status=0x2, host_status=0x0, driver_status=0x8
  info=0x1  duration=44 milliseconds  resid=0
  status=2: [desc] sense_key=0 asc=0 ascq=0
Error SMART Status command failed
Please get assistance from http://smartmontools.sourceforge.net/
Values from ATA status return descriptor are:
 00 09 0c 00 00 00 00 00 00  00 00 00 00 00 50  
REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned -1
A mandatory SMART command failed: exiting. To continue, add one or more '-T 
permissive' options.


This is smartctl from cvs a few days ago but the smartctl shipping with 
SuSE 10.2 fails in the same way.
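
For readers who want to follow the pass-through lines in the log above, here is a minimal sketch that picks apart the 16-byte CDB. It assumes the SAT ATA PASS-THROUGH(16) layout (opcode 0x85); the struct and function names are made up for this illustration and do not come from smartctl or libata. SMART RETURN STATUS reports its result in the LBA mid/high registers, which is why that command sets CK_COND to get the registers back while IDENTIFY DEVICE does not.

```c
#include <stdint.h>

/* Fields of a SAT ATA PASS-THROUGH(16) CDB that matter for the log above. */
struct ata_pt16 {
    uint8_t protocol;   /* byte 1, bits 4:1 (3 = non-data, 4 = PIO data-in) */
    uint8_t ck_cond;    /* byte 2, bit 5: ask for the ATA registers back */
    uint8_t features;   /* byte 4 */
    uint8_t count;      /* byte 6 */
    uint8_t lba_mid;    /* byte 10 */
    uint8_t lba_high;   /* byte 12 */
    uint8_t command;    /* byte 14 */
};

/* Returns 0 on success, -1 if the opcode is not 0x85. */
static int decode_ata_pt16(const uint8_t cdb[16], struct ata_pt16 *out)
{
    if (cdb[0] != 0x85)
        return -1;
    out->protocol = (cdb[1] >> 1) & 0x0f;
    out->ck_cond  = (cdb[2] >> 5) & 0x01;
    out->features = cdb[4];
    out->count    = cdb[6];
    out->lba_mid  = cdb[10];
    out->lba_high = cdb[12];
    out->command  = cdb[14];
    return 0;
}

/* The SMART STATUS CDB copied from the log. */
static const uint8_t smart_status_cdb[16] = {
    0x85, 0x06, 0x2c, 0x00, 0xda, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x4f, 0x00, 0xc2, 0x00, 0xb0, 0x00
};
```

Decoding smart_status_cdb gives protocol 3 (non-data), CK_COND set, features 0xda, LBA mid/high 0x4f/0xc2 and command 0xb0 (SMART).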


I ran 'git bisect' and it suggests that the problem was introduced by

1e999736cafdffc374f22eed37b291129ef82e4e is first bad commit
commit 1e999736cafdffc374f22eed37b291129ef82e4e
Author: Alan Cox [EMAIL PROTECTED]
Date:   Wed Apr 11 00:23:13 2007 +0100

libata: HPA support

i.e., before 2.6.22-rc1. At this point I think it is best to leave the 
problem to the experts.

-- 
Kai


Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence

2005-03-25 Thread Kai Makisara
On Fri, 25 Mar 2005, James Bottomley wrote:

> On Fri, 2005-03-25 at 14:38 +0900, Tejun Heo wrote:
> >  We have users of scsi_do_req() other than scsi_wait_req() and they
> > use different done() functions to do different things.  I've checked
> > other done functions and none uses contents inside the passed
> > scsi_cmnd, so using a dummy command should be okay with them.  Am I
> > missing something here?
> 
> Well ... the other users are supposed to be going away.  They're
> actually all coded wrongly in some way or other ... perhaps I should
> speed up the process.
> 
I have seen you mention this several times now and I am getting more and 
more worried. The reason is that scsi_wait_req() is a synchronous 
interface and it does not allow a driver to do this:

- send a request
- do other useful things/let the user do useful work
- wait for completion before starting another request

I fully agree that doing done() correctly _is_ a problem, especially when 
the SCSI subsystem evolves and the high-level driver writers do not follow 
the development closely enough.

One solution to these problems would be to let the drivers still use 
scsi_do_req() and their own done() function, but create two 
(three) helpers:
- one to be called at the beginning of done(); it would do what needs to 
  be done there but let the driver do some special things of its own if
  necessary
- one to be called to wait for the request to finish
(- one to do scsi_do_req() and the things necessary before these)

Having these helpers would isolate the user of the SCSI subsystem from the 
internals. scsi_wait_req() should call these functions and no additional 
maintenance would be needed for this additional asynchronous interface.
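
The split into helpers described above can be sketched in user space as follows. This is only a toy model under stated assumptions: every name here (toy_req, toy_submit, toy_done_begin, toy_wait, st_like_done) is hypothetical, the "device" is a counter that completes the request when it reaches a multiple of 3, and nothing is taken from the actual SCSI midlayer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI request; not the kernel's structure. */
struct toy_req {
    bool finished;                    /* set by toy_done_begin() */
    long stamp;                       /* driver-private completion time */
    void (*done)(struct toy_req *);   /* the driver's own done() */
};

static long toy_clock;                /* fake time source */
static struct toy_req *toy_inflight;  /* at most one request in flight */

/* Helper 3: prepare and send the request (wraps the "do_req" step). */
static void toy_submit(struct toy_req *rq, void (*done)(struct toy_req *))
{
    rq->finished = false;
    rq->stamp = 0;
    rq->done = done;
    toy_inflight = rq;
}

/* Helper 1: the driver calls this first thing in its done(). */
static void toy_done_begin(struct toy_req *rq)
{
    rq->finished = true;
}

/* Simulated device: completes the in-flight request whenever the fake
   clock reaches a multiple of 3. */
static void toy_device_tick(void)
{
    toy_clock++;
    if (toy_inflight != NULL && toy_clock % 3 == 0) {
        struct toy_req *rq = toy_inflight;
        toy_inflight = NULL;
        rq->done(rq);
    }
}

/* Helper 2: wait until the request has finished. */
static void toy_wait(struct toy_req *rq)
{
    while (!rq->finished)
        toy_device_tick();
}

/* A driver's done(): common part first, then its own special work. */
static void st_like_done(struct toy_req *rq)
{
    toy_done_begin(rq);
    rq->stamp = toy_clock;            /* e.g. a statistics time stamp */
}
```

A caller would do toy_submit(&rq, st_like_done), carry on with other work, and call toy_wait(&rq) only before starting the next request. A synchronous wrapper is then just submit followed immediately by wait.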

The current drivers may not do any work in done() that could not be done 
later but there is one patch pending where this happens: the st 
performance statistics patch needs to get the time stamp when the SCSI 
command is processed.

-- 
Kai


Re: [PATCH] make st seekable again

2005-03-09 Thread Kai Makisara
On Wed, 9 Mar 2005, Alan Cox wrote:

> On Maw, 2005-03-08 at 17:25, Linux Kernel Mailing List wrote:
> > ChangeSet 1.2030, 2005/03/08 09:25:05-08:00, [EMAIL PROTECTED]
> > 
> > [PATCH] make st seekable again
> > 
> > Apparently `tar' errors out if it cannot perform lseek() against a 
> > tape.  Work
> > around that in-kernel.
> 
> Unfortunately this isn't a good idea. Allowing tar to read the tape
> position makes sense, allowing it to zero the position might but you
> have to do major surgery on the driver first because
> 
> 1.It doesn't use ppos
> 2.It doesn't do locking on the ppos at all
> 
> Also allowing apps to randomly seek and report "ok" when they are
> backing up to tape and might really need to see the error is not what
> I'd call stable, professional or quality code.
> 
The proper fix is to fix tar. I sent an analysis of the problem and a 
suggestion on how to fix it to the bug-tar list on March 5, but it is still 
waiting for moderator approval.

While waiting for the application to be fixed, it was decided to restore 
the old behaviour of the tape drivers.

lseek on a tape is not a good fit (tapes are addressed by block, blocks on 
tape can have any size, etc.). I don't know of any Unix that really 
implements lseek on tapes, but they usually don't return an error. This is 
probably why the tar bug has not been found earlier.

There has been one useful way of using lseek() with tapes in some systems. 
Those refuse reads and writes if the file pointer reaches 2 GB. Resetting 
it with lseek(fd,0,0) now and then has allowed writing/reading more than 2 
GB.
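
The reset trick described above might look roughly like this in an application. It is a sketch only: the function names, the chunking and the reset interval are assumptions made for illustration, not code from any particular backup program.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a device for writing; the path is whatever the caller supplies. */
static int open_for_write(const char *path)
{
    return open(path, O_WRONLY);
}

/*
 * Write len bytes from buf to fd in chunks of at most chunk bytes and
 * reset the file pointer after every reset_every bytes written.
 * Returns the number of bytes written, or -1 on error (including a
 * write that makes no progress).
 */
static long long write_with_reset(int fd, const char *buf, size_t len,
                                  size_t chunk, size_t reset_every)
{
    long long total = 0;
    size_t since_reset = 0;

    while (len > 0) {
        size_t n = len < chunk ? len : chunk;
        ssize_t w = write(fd, buf, n);

        if (w <= 0)
            return -1;
        buf += w;
        len -= (size_t)w;
        total += w;
        since_reset += (size_t)w;
        if (since_reset >= reset_every) {
            /* Same call as lseek(fd,0,0) in the text: SEEK_SET is 0. */
            if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
                return -1;
            since_reset = 0;
        }
    }
    return total;
}
```

On a device where lseek does nothing but return success, the reset keeps the file pointer small without affecting where the data goes. On a regular file the same call would of course rewind and overwrite.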

I don't think implementing proper read-only lseek for tapes is worth the 
trouble (reliable tracking of the current location is tricky). Purist 
kernels can refuse lseeks. Pragmatic kernels can allow lseeks until 
refusing them no longer breaks common applications.

-- 
Kai


Re: Problems with SCSI tape rewind / verify on 2.4.29

2005-03-02 Thread Kai Makisara
On Wed, 2 Mar 2005, Andrew Morton wrote:

> Kai Makisara <[EMAIL PROTECTED]> wrote:
> >
> > > 
> >  > v2.6 also contains the same problem BTW.
> >  > 
> >  > Try this:
> >  > 
> >  > --- a/drivers/scsi/st.c.orig 2005-03-02 09:02:13.637158144 -0300
> >  > +++ b/drivers/scsi/st.c  2005-03-02 09:02:20.208159200 -0300
> >  > @@ -3778,7 +3778,6 @@
> >  >  read:   st_read,
> >  >  write:  st_write,
> >  >  ioctl:  st_ioctl,
> >  > -llseek: no_llseek,
> >  >  open:   st_open,
> >  >  flush:  st_flush,
> >  >  release:st_release,
> > 
> >  This change covers up the problem. The real bug is in tar.
> 
> In that case we're kinda screwed, and should change the kernel to make tar
> work again.  We can send a bug report to the tar folks (good luck) and wait
> a few years.
> 
> >  The first BSF did position the tape correctly although it did fail.
> 
> (what's a BSF?)
> 
> If it positioned the tape successfully, why did it claim that it failed? 

BSF moves the tape backwards over filemarks. tar tries to move over one 
filemark. It does not find it because it ends up at the beginning of the 
tape. This is why the operation fails. However, the tape is at the 
beginning and this is the correct place with regard to what is done next.

> If we were to fix that up, would tar then be happy?

It is not fixable in the kernel. The beginning of the tape is a special 
case because there is no filemark. Any application should take this into 
account. We could fake a filemark there but this would lead to problems 
because then we could "skip" backwards indefinitely even when the tape 
moves nowhere. This could confuse other applications.
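
One way an application could take the beginning-of-tape case into account is sketched below. This is an illustration under assumptions, not the fix proposed to the tar maintainers: it assumes the Linux <sys/mtio.h> interface and that the driver sets the BOT bit in mt_gstat; the function name is made up.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>

/*
 * Space backwards over one filemark. If the drive reports an I/O error
 * but the tape is at the beginning (BOT), treat that as the expected
 * special case. Returns 0 if positioned (filemark crossed or at BOT),
 * -1 otherwise.
 */
static int backspace_one_file(int fd)
{
    struct mtop op;
    struct mtget st;

    op.mt_op = MTBSF;
    op.mt_count = 1;
    if (ioctl(fd, MTIOCTOP, &op) == 0)
        return 0;
    if (errno != EIO)
        return -1;
    /* BSF failed: ask the driver where the tape is now. */
    if (ioctl(fd, MTIOCGET, &st) < 0)
        return -1;
    return GMT_BOT(st.mt_gstat) ? 0 : -1;
}
```

The point of the design is that the application, not the kernel, decides that "no filemark found, but at BOT" is acceptable for what it wants to do next.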

If seek with tape is changed back to returning success, this would enable 
correct tar --verify at the beginning of the tape. However, I am not sure 
what happens if we are not at the beginning. I will investigate this and 
suggest a long term fix to the tar people (a fix that should be compatible 
with all Unix tape semantics I know) and also suggest possible fixes to st 
(this may include automatic writing of a filemark when BSF is used after 
writes).

If you want to make st return success for seeks even if nothing 
happens (as it did earlier), I don't have anything against that. It would 
solve the practical problem several people have reported recently. (My 
recommendation for the people seeing this problem is to do verification 
separately with 'tar -d'.)

-- 
Kai


Re: Problems with SCSI tape rewind / verify on 2.4.29

2005-03-02 Thread Kai Makisara
On Wed, 2 Mar 2005, Marcelo Tosatti wrote:

> On Wed, Mar 02, 2005 at 11:17:19PM +0200, Kai Makisara wrote:
...
> > BTW, this "fix" by Solar Designer introduces a bug to 2.4.29: a tape 
> > driver is supposed to return ENOMEM in the case that was changed to return 
> > EIO ;-(
> 
> Reverted.
> 
Thanks.

...
> Thanks for the cluebat Kai, is this problem fixed in newer versions of tar? 
> 
The current CVS version seems to have the same code I quoted.

> I suspect v2.4 should work with older versions of tar, so we should keep 
> "lseek" working to make it happy. What is your opinion?
> 
I commented on this in the other reply I just sent and I don't have a clear 
preference. I just hope that 2.4 and 2.6 are fixed in a compatible way.

-- 
Kai


Re: Problems with SCSI tape rewind / verify on 2.4.29

2005-03-02 Thread Kai Makisara
On Wed, 2 Mar 2005, Marcelo Tosatti wrote:

> On Wed, Mar 02, 2005 at 11:15:42AM -, Mark Yeatman wrote:
> > Hi
> > 
> > Never had to log a bug before, hope this is correctly done.
> > 
> > Thanks
> > 
> > Mark
> > 
> > Detail
> > 
> > [1.] One line summary of the problem:
> > SCSI tape drive is refusing to rewind after backup to allow verify and
> > causing illegal seek error
> > 
> > [2.] Full description of the problem/report:
> > On backup the tape drive is reporting the following error and failing
> > it's backups.
> > 
> > tar: /dev/st0: Warning: Cannot seek: Illegal seek
> > 
> > I have traced this back to failing at an upgrade of the kernel to 2.4.29
> > on Feb 8th. The backups have not worked since. Replacement Drives have
> > been tried and cables to no avail. I noticed in the the changelog that a
> > patch by Solar Designer to the Scsi tape return code had been made. 

BTW, this "fix" by Solar Designer introduces a bug to 2.4.29: a tape 
driver is supposed to return ENOMEM in the case that was changed to return 
EIO ;-(

> 
> v2.6 also contains the same problem BTW.
> 
> Try this:
> 
> --- a/drivers/scsi/st.c.orig  2005-03-02 09:02:13.637158144 -0300
> +++ b/drivers/scsi/st.c   2005-03-02 09:02:20.208159200 -0300
> @@ -3778,7 +3778,6 @@
>   read:   st_read,
>   write:  st_write,
>   ioctl:  st_ioctl,
> - llseek: no_llseek,
>   open:   st_open,
>   flush:  st_flush,
>   release:st_release,

This change covers up the problem. The real bug is in tar. The following 
code from tar is supposed to reposition the tape to the beginning of 
the file just written:

#ifdef MTIOCTOP
  {
    struct mtop operation;
    int status;

    operation.mt_op = MTBSF;
    operation.mt_count = 1;
    if (status = rmtioctl (archive, MTIOCTOP, (char *) &operation), status < 0)
      {
        if (errno != EIO
            || (status = rmtioctl (archive, MTIOCTOP, (char *) &operation),
                status < 0))
          {
#endif
            if (rmtlseek (archive, (off_t) 0, SEEK_SET) != 0)
              {
                /* Lseek failed.  Try a different method.  */
                seek_warn (archive_name_array[0]);
                return;
              }
#ifdef MTIOCTOP
          }
      }
  }
#endif


Here is output from strace showing what happens with 'tar -c -W' applied 
at the beginning of the tape (this is using kernel 2.6.11-rc4 but the same 
probably happens with 2.4.29):
...
ioctl(3, MGSL_IOCGPARAMS or MTIOCTOP or SNDCTL_MIDI_MPUMODE, 0x7fffecd0) = -1 EIO (Input/output error)
ioctl(3, MGSL_IOCGPARAMS or MTIOCTOP or SNDCTL_MIDI_MPUMODE, 0x7fffecd0) = -1 EIO (Input/output error)
lseek(3, 0, SEEK_SET)   = -1 ESPIPE (Illegal seek)

So, both tape positioning commands fail and the code falls back to lseek. 
Earlier, lseek returned success even though it did not do anything (this 
was on purpose because it is the way some other Unices behave, and with 
reason). In that case this tar succeeded but it was pure luck: the first 
BSF did position the tape correctly although it did fail.

The 2.6 st driver does contain this near the beginning of st_open():

nonseekable_open(inode, filp);

This probably makes lseek fail. This code has been in st.c since 2.6.8.




-- 
Kai



Re: raw tape device support???

2001-05-03 Thread Kai Makisara

On Thu, 3 May 2001, Mark Hounschell wrote:

>  Sorry if this isn't the correct place for this question. Is there or
> will there
> ever be raw tape device access. I'm trying to port an app from Dec unix
> and at
> least there the app requires /dev/rmt** (raw device). I've read in the
> archives
> about how to bind a block device to a raw device using the raw command
> but the
> tape dev (/dev/st*) is a char device and the command doesn't work on
> char devices.
> So I'm trying to figure out to get the same effect as /dev/rmt* does on
> the dec
> box in a linux environment.

You can just use the device /dev/st* (or /dev/nst*). They are raw
(character) devices. If your app needs to find the devices with names
/dev/rmt*, you can make new device nodes or use links.

Kai





Re: Need help with allocating a 2M buffer size

2001-03-15 Thread Kai Makisara

On Thu, 15 Mar 2001, Byron Stanoszek wrote:

> I have a real picky tape drive (DLT series) that likes to be fed large chunks
> of data at once, otherwise after every 2-4KB of data it halts and rewinds
> itself because its cache for writing to the tape is empty.
>
> My best solution to this problem was to use 'tar -b 4096', which sends 4096 x
> 512-byte blocks at once for a total of a 2MB buffer size. This worked fine for
> several weeks, until 2 days ago I got this message (and the backup fails):
>
> st: failed to enlarge buffer to 2097152 bytes.
>
The default maximum number of scatter/gather segments in the tape driver
is 16. This means that big chunks of memory are needed to allocate a 2 MB
buffer. You can increase the number of segments up to, e.g., 128. This
means that only 16 kB chunks are needed to make up a 2 MB buffer. The
number of scatter/gather segments is also limited by your SCSI adapter
driver. Note that even with 16 kB segments you may find problems at
some time because multi-page allocations are needed.
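
The arithmetic behind these numbers can be checked with a small sketch. It assumes 4 kB pages and that a segment is allocated as a power-of-two number of contiguous pages; both function names are made up for this illustration and are not taken from the st driver.

```c
#include <stddef.h>

/* Bytes each segment must hold if total bytes are split over nseg
   segments (rounded up). */
static size_t segment_bytes(size_t total, unsigned nseg)
{
    return (total + nseg - 1) / nseg;
}

/* Smallest order such that (page_size << order) >= bytes. */
static unsigned alloc_order(size_t bytes, size_t page_size)
{
    unsigned order = 0;

    while ((page_size << order) < bytes)
        order++;
    return order;
}
```

With 'tar -b 4096' the buffer is 4096 x 512 = 2097152 bytes. Split over 16 segments, each segment is 131072 bytes (128 kB), i.e. 32 contiguous pages (order 5). Split over 128 segments, each is 16384 bytes (16 kB), i.e. 4 contiguous pages (order 2), which is much easier to find in fragmented memory but is still a multi-page allocation.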

You can increase the number of scatter/gather segments at system
startup/module loading or when compiling the driver. See the file
linux/drivers/scsi/README.st for the syntax and st_options.h for the
compile-time definition.

Kai





Re: 2.4.0-test4&6 scsi tape problem [not fixed :-/]

2000-08-31 Thread Kai Makisara


I suggest we move this discussion from linux-kernel to linux-scsi.

On Thu, 31 Aug 2000, G. Saraber wrote:

> "Richard B. Johnson" wrote:
> > 
> > On Wed, 30 Aug 2000, G. Saraber wrote:
> > 
> > > Thanks for the excellent guide on how to pinpoint the problem ...
> > >
> > > guess what :-) I decided before I send in another bugreport i'll upgrade
> > > to test7 so the developers dont have to dig through 'old' kernel
> > > versions ..
> > > anyway, the problem went away, they must have fixed it :-)
> 
> ok,
> i wasn't able to reproduce the problem before, but now it's occurred
> again, this the second nightly backup since booting with 2.4.0-test7,
> the first one went fine which i couldnt get to happen with test4 and
> test6 so test7 is slightly better. The error occured after a little over
> 3GB is backed up on a 12GB (uncompressed space) tape.
> However this time I have logs :-) and once again, "mt offline" or the
> button on the tapedrive itself won't release the tape, 
> any mt command after the first "mt offline" gives: 
> [root@ahr log]# mt offline
> /dev/tape: Input/output error
> 
> right away, it doesnt even try to access the drive the second time
> around..
> 
> (logs attached below) i'll do more testing later today.
> i had to heavily snip the log to keep the size under control to save
> bandwith, i'll gladly send you the full list in the format of your
> choice, just drop me a line.
>  
> Regards,
> Gerard Saraber
> [EMAIL PROTECTED]
> http://www.rarcoa.com
>  scsi log --
> Aug 31 00:42:50 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 3, lun 0 0x2a 00 00 09 b4 d5 00 00 02 00  
> Aug 31 00:43:15 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 3, lun 0 0x28 00 00 02 17 3f 00 00 80 00  
> Aug 31 00:43:20 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 3, lun 0 0x2a 00 00 09 b4 d5 00 00 02 00  
> Aug 31 00:43:20 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 4, lun 0 0x2a 00 00 26 74 f3 00 00 02 00  
> Aug 31 00:43:20 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 4, lun 0 0x2a 00 00 26 77 1b 00 00 22 00  

You get timeouts from devices 3 and 4 on the bus (I assume these are
disks). These are read (0x28) and write (0x2a) commands (10 byte
versions). Note that these timeouts start at 00:42:50.
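To make the reading of these log lines concrete, here is a minimal sketch of how a 10-byte read/write command block can be decoded. The function name decode_cdb10 and the opcode table are only illustrative (they are not part of any kernel interface); the byte layout (address in bytes 2-5, transfer length in bytes 7-8) is the standard one for these commands.

```python
# Sketch: decode a 10-byte SCSI command block as printed in the log above.
# Only the two opcodes mentioned in this thread are named here.
OPCODES_10 = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(hex_bytes):
    """Return (name, lba, blocks) for a 10-byte read/write command."""
    cdb = bytes(int(b, 16) for b in hex_bytes.split())
    if len(cdb) != 10:
        raise ValueError("expected 10 bytes, got %d" % len(cdb))
    name = OPCODES_10.get(cdb[0], "opcode 0x%02x" % cdb[0])
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return name, lba, blocks


# The first timed-out command in the log (id 3).
first_timeout = decode_cdb10("0x2a 00 00 09 b4 d5 00 00 02 00")
# The second one, also for id 3.
second_timeout = decode_cdb10("0x28 00 00 02 17 3f 00 00 80 00")
print(first_timeout)
print(second_timeout)
```

So the first entry is a 2-block write and the second a 128-block read, both ordinary disk traffic.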

The disks probably time out because the tape drive has not released the
SCSI bus.

[timeout messages cut]

> Aug 31 00:57:20 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 3, lun 0 0x28 00 00 01 74 57 00 00 68 00  
> Aug 31 00:57:20 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 5, lun 0 0x0a 01 00 00 40 00  
   ^^^
This is presumably the last command sent to the tape drive. It is a write
command that writes 64 tape blocks in fixed block mode. This looks legal.
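For reference, a small sketch of how the six bytes of this tape command can be read. The helper name decode_write6 is made up for illustration; the layout (fixed bit in byte 1, transfer length in bytes 2-4) is the standard one for the 6-byte sequential-access write.

```python
# Sketch: decode the 6-byte WRITE command sent to the tape drive (id 5).
def decode_write6(hex_bytes):
    """Return (fixed_block_mode, transfer_length) for a 6-byte tape WRITE."""
    cdb = bytes(int(b, 16) for b in hex_bytes.split())
    if len(cdb) != 6 or cdb[0] != 0x0A:
        raise ValueError("not a 6-byte WRITE command")
    fixed = bool(cdb[1] & 0x01)               # bit 0 of byte 1: fixed block mode
    length = int.from_bytes(cdb[2:5], "big")  # bytes 2-4: transfer length
    return fixed, length


fixed, length = decode_write6("0x0a 01 00 00 40 00")
# In fixed block mode the length counts blocks; otherwise it counts bytes.
unit = "blocks" if fixed else "bytes"
print("fixed=%s, length=%d %s" % (fixed, length, unit))
```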

The time here is 00:57:20, which means that the tape command times out about
15 minutes (14 min 30 s) after the first timeout from the disks. The tape
driver timeout is 15 minutes, so the tape command times out properly.
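The timing can be checked with a little arithmetic. The sketch below uses the two timestamps from the log and the 15 minute tape timeout mentioned above. The 30 second disk timeout is an assumption of mine, not something stated in this thread; if it holds, the disk and tape commands were all started at about the same moment.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"
first_disk_timeout = datetime.strptime("00:42:50", FMT)  # from the log
tape_timeout = datetime.strptime("00:57:20", FMT)        # from the log

# Gap between the first disk timeout and the tape timeout.
gap = tape_timeout - first_disk_timeout

TAPE_TIMEOUT = timedelta(minutes=15)          # tape driver timeout (see above)
ASSUMED_DISK_TIMEOUT = timedelta(seconds=30)  # ASSUMPTION, not from the thread

# Work backwards to the time each command was started.
tape_started = tape_timeout - TAPE_TIMEOUT
disk_started = first_disk_timeout - ASSUMED_DISK_TIMEOUT

print("gap between timeouts:", gap)
print("tape command started:", tape_started.strftime(FMT))
print("disk command started:", disk_started.strftime(FMT))
```

Under that assumption both start times come out the same, which fits the idea that the disks stalled as soon as the tape write went out and the bus was not released.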

After this the SCSI subsystem decides that it should try to reset the SCSI
bus to resolve the problem.

> Aug 31 00:57:20 ahr kernel: SCSI host 0 abort (pid 0) timed out -
> resetting 
> Aug 31 00:57:20 ahr kernel: SCSI bus is being reset for host 0 channel
> 0. 
> Aug 31 00:57:20 ahr kernel: (scsi0:0:5:0) Synchronous at 10.0 Mbyte/sec,
> offset 15. 
> Aug 31 00:57:20 ahr kernel: (scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec,
> offset 31. 
> Aug 31 00:57:20 ahr kernel: (scsi0:0:4:0) Synchronous at 80.0 Mbyte/sec,
> offset 31. 
> Aug 31 00:57:25 ahr kernel: st0: Error with sense data: Info fld=0x40,
> Current st09:00: sns = f0  6 
                             ^
translation: UNIT ATTENTION
(I am not trying to shout: the capital letters are cut from the SCSI
standard draft :-)
(If you enable verbose SCSI messages in the kernel configuration, the
kernel does this translation for you.)

> Aug 31 00:57:25 ahr kernel: ASC=29 ASCQ= 0 
                                  ^^
translation: POWER ON, RESET, OR BUS DEVICE RESET OCCURRED

> Aug 31 00:57:25 ahr kernel: Raw sense data:0xf0 0x00 0x06 0x00 0x00 0x00
> 0x40 0x12 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x00 0x00 0x00 0x00 0x00 0x00  
> Aug 31 00:57:25 ahr kernel: st0: Error with sense data: Info fld=0x1,
> Current st09:00: sns = f0  2 
                             ^
translation: NOT READY

> Aug 31 00:57:25 ahr kernel: ASC= 4 ASCQ= 1 
                                  ^^
translation: LOGICAL UNIT IS IN PROCESS OF BECOMING READY
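The same translation can be done by hand from the raw sense bytes. The following is only a sketch: the function name decode_fixed_sense and the two small lookup tables are illustrative and cover just the codes that appear in this log. The field positions (sense key in byte 2, information in bytes 3-6, ASC and ASCQ in bytes 12 and 13) are those of the fixed sense data format.

```python
# Sketch: decode fixed-format SCSI sense data as dumped by the st driver.
# The tables hold only the codes seen in this log, not the full standard.
SENSE_KEYS = {0x2: "NOT READY", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
    (0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
}


def decode_fixed_sense(raw):
    """Return the main fields of a fixed-format sense buffer."""
    data = bytes(int(b, 16) for b in raw.split())
    if len(data) < 14 or (data[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = data[2] & 0x0F
    asc, ascq = data[12], data[13]
    return {
        "valid": bool(data[0] & 0x80),            # information field is valid
        "sense_key": SENSE_KEYS.get(key, "key 0x%x" % key),
        "info": int.from_bytes(data[3:7], "big"),  # "Info fld" in the log
        "additional": ASC_ASCQ.get((asc, ascq),
                                   "ASC=0x%02x ASCQ=0x%02x" % (asc, ascq)),
    }


raw = ("0xf0 0x00 0x06 0x00 0x00 0x00 0x40 0x12 0x00 0x00 0x00 0x00 0x29 "
       "0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00")
sense = decode_fixed_sense(raw)
print(sense)
```

The information field comes out as 64 (0x40), the same value as the block count of the tape write that timed out.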

When the tape driver sees a unit attention anywhere other than at open(),
it prevents further access to the tape until some command is issued that
puts the tape into a known position. Rewind is one example. So, the fact
that you don't seem to be able to do anything with the tape after the bus