[PATCH] SCSI st : convert to unlocked_ioctl
Convert st to unlocked_ioctl. The necessary locking was already in place.

Signed-off-by: Kai Makisara <[EMAIL PROTECTED]>
---
The patch is against 2.6.24-rc8.

 drivers/scsi/st.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- linux-2.6/drivers/scsi/st.c 2007-12-20 18:26:03.0 +0200
+++ linux-2.6-rc8-test/drivers/scsi/st.c 2008-01-17 21:49:14.0 +0200
@@ -9,7 +9,7 @@
    Steve Hirsch, Andreas Koppenh"ofer, Michael Leodolter, Eyal Lebedinsky,
    Michael Schaefer, J"org Weule, and Eric Youngdale.
 
-   Copyright 1992 - 2007 Kai Makisara
+   Copyright 1992 - 2008 Kai Makisara
    email [EMAIL PROTECTED]
 
    Some small formal changes - aeb, 950809
@@ -17,7 +17,7 @@
    Last modified: 18-JAN-1998 Richard Gooch <[EMAIL PROTECTED]> Devfs support
  */
 
-static const char *verstr = "20070203";
+static const char *verstr = "20080117";
 
 #include <linux/module.h>
 
@@ -3214,8 +3214,7 @@ static int partition_tape(struct scsi_ta
 
 /* The ioctl command */
-static int st_ioctl(struct inode *inode, struct file *file,
-                    unsigned int cmd_in, unsigned long arg)
+static long st_ioctl(struct file *file, unsigned int cmd_in, unsigned long arg)
 {
         int i, cmd_nr, cmd_type, bt;
         int retval = 0;
@@ -3870,7 +3869,7 @@ static const struct file_operations st_f
         .owner =        THIS_MODULE,
         .read =         st_read,
         .write =        st_write,
-        .ioctl =        st_ioctl,
+        .unlocked_ioctl = st_ioctl,
 #ifdef CONFIG_COMPAT
         .compat_ioctl = st_compat_ioctl,
 #endif
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
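
What the switch from .ioctl to .unlocked_ioctl changes for a driver can be sketched in user space. This is a toy model and not kernel code: every toy_* name is invented for illustration, and the dispatch only approximates the 2.6-era behaviour (use unlocked_ioctl when it is set, otherwise call the legacy handler with the Big Kernel Lock held). Each handler returns the lock depth it observed, so the difference is visible to the caller.

```c
#include <errno.h>
#include <stddef.h>

struct toy_file;

/* Toy stand-in for struct file_operations (hypothetical, illustration only). */
struct toy_file_operations {
    /* Legacy entry point: takes an inode and is called under the big lock. */
    int (*ioctl)(void *inode, struct toy_file *file,
                 unsigned int cmd, unsigned long arg);
    /* New entry point: no inode, returns long, called without the big lock. */
    long (*unlocked_ioctl)(struct toy_file *file,
                           unsigned int cmd, unsigned long arg);
};

struct toy_file {
    const struct toy_file_operations *f_op;
    void *inode;
};

/* Toy Big Kernel Lock: a depth counter plus a count of acquisitions. */
static int toy_bkl_depth;
static int toy_bkl_acquisitions;

static void toy_lock_kernel(void)   { toy_bkl_depth++; toy_bkl_acquisitions++; }
static void toy_unlock_kernel(void) { toy_bkl_depth--; }

/* Dispatch: prefer unlocked_ioctl, fall back to the legacy handler under the lock. */
static long toy_do_ioctl(struct toy_file *file, unsigned int cmd, unsigned long arg)
{
    long ret = -ENOTTY;

    if (file->f_op->unlocked_ioctl) {
        ret = file->f_op->unlocked_ioctl(file, cmd, arg);
    } else if (file->f_op->ioctl) {
        toy_lock_kernel();
        ret = file->f_op->ioctl(file->inode, file, cmd, arg);
        toy_unlock_kernel();
    }
    return ret;
}

/* Legacy-style handler: reports the lock depth it ran under. */
static int toy_legacy_ioctl(void *inode, struct toy_file *file,
                            unsigned int cmd, unsigned long arg)
{
    (void)inode; (void)file; (void)cmd; (void)arg;
    return toy_bkl_depth;
}

/* Converted handler: same body, new signature; it must do its own locking. */
static long toy_converted_ioctl(struct toy_file *file,
                                unsigned int cmd, unsigned long arg)
{
    (void)file; (void)cmd; (void)arg;
    return toy_bkl_depth;
}

static const struct toy_file_operations toy_legacy_fops = {
    .ioctl = toy_legacy_ioctl,
};

static const struct toy_file_operations toy_converted_fops = {
    .unlocked_ioctl = toy_converted_ioctl,
};
```

In this model a file using toy_legacy_fops sees a lock depth of 1 inside its handler, while one using toy_converted_fops sees 0. That is why a conversion like the one above is only safe when, as the patch description says, the driver's own locking is already in place.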
Re: 2.6.24-rc5: tape drive not responding
On Mon, 17 Dec 2007, James Bottomley wrote:

> On Mon, 2007-12-17 at 13:43 -0800, Andrew Morton wrote:
> > On Mon, 17 Dec 2007 16:02:02 -0500
> > "John Stoffel" <[EMAIL PROTECTED]> wrote:
> >
> > > Just to confirm, the proposed patch to st.c fixes the issue with
> > > 2.6.24-rc5 as well as 2.6.24-rc5-mm1 with access to my DLT tape
> > > drives.
> >
> > err, what patch to st.c?
>
> That's this one:
>
> http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=acdd0b1c371b2fbb4b6110a51ba69cb0af9e6f45
>

I have done some tests. Firstly, I did not see the BUG with 2.6.24-rc5.
Looking at include/linux/scatterlist.h suggested that CONFIG_DEBUG_SG has
something to do with this. When I enabled SG debugging, I also saw the BUG.
Adding this patch solved the problem.

You can add

Acked-by: Kai Makisara <[EMAIL PROTECTED]>

if you want.

This fix should be included in 2.6.24.

--
Kai
Re: [PATCH 3/5] Use mutex instead of semaphore in the SCSI Tape driver
On Sun, 29 Jul 2007, Matthias Kaehlcke wrote:

> The SCSI Tape driver uses a semaphore as mutex. Use the mutex API
> instead of the (binary) semaphore.
>
> Signed-off-by: Matthias Kaehlcke <[EMAIL PROTECTED]>
>

Signed-off-by: Kai Makisara <[EMAIL PROTECTED]>

Thanks.

--
Kai
Re: [PATCH 00/33] SG table chaining support
On Mon, 16 Jul 2007, Martin K. Petersen wrote:

> > "John" == John Stoffel <[EMAIL PROTECTED]> writes:
>
> John> Will this help out tape drive performance at all? I looked
> John> through the patches quickly, esp the AIC7xxx stuff since that's
> John> what I use, but nothing jumped out at me...
>
> Yes. Most modern tape drives want a block size of 1MB or higher.
> With the old stack we'd be stuck at 512KB because the sg limitations
> caused us to come just short of 1MB...
>

Tape block sizes up to 16 MB have been possible for a very long time, but
this has required tuning of the block/scsi parameters. Very few people
seem to have done this, and the common (mis)belief seems to be that the
tape block size limit has been 512 kB.

It is good if this tuning is not needed in the future.

--
Kai
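
The arithmetic behind the 512 kB figure can be sketched as follows. This is a minimal illustration under stated assumptions that do not come from the thread itself: a 4 KiB page size, a limit of 128 scatter/gather segments per request, and the worst case in which every segment covers exactly one page. The function names are made up for the example.

```c
#include <stddef.h>

/*
 * Worst case: physically scattered memory, so each scatter/gather
 * segment maps exactly one page.  The largest transfer is then simply
 * segments * page size.
 */
static size_t max_block_bytes(size_t max_segments, size_t page_bytes)
{
    return max_segments * page_bytes;
}

/* Number of one-page segments needed for a given tape block (rounded up). */
static size_t sg_segments_needed(size_t block_bytes, size_t page_bytes)
{
    return (block_bytes + page_bytes - 1) / page_bytes;
}
```

Under these assumptions, 128 segments of 4096 bytes give exactly 512 KiB, a 1 MiB block needs 256 segments and a 16 MiB block needs 4096. Raising the segment limit by hand is one form of the tuning mentioned above; chaining scatter/gather tables is meant to make that unnecessary.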
Re: SMART problems in 2.6.22
On Mon, 16 Jul 2007, Tejun Heo wrote:

> Please try the patch in the following message.
>
> http://article.gmane.org/gmane.linux.ide/20799/raw
>

This solves the 'smartctl -H' problem on both of my systems (one with
Nvidia CK804 and one with MCP51).

Tested-by: Kai Makisara <[EMAIL PROTECTED]>

Thanks for pointing out the patch.

--
Kai
Re: SMART problems in 2.6.22
On Tue, 10 Jul 2007, Kai Makisara wrote:

> On Sun, 8 Jul 2007, Bruce Allen wrote:
>
> > Mark, David, Doug, Tejun, Alan, Jeff, LKML,
> >
> > I'm afraid that there may be some problem with SMART + libata in the 2.6.22
> > kernel. An hour ago I discovered that I missed a month of correspondence
> > (some LKML, some private) about this problem which Alan, Tejun, Jeff, Mark
> > and others copied to me -- it was automatically shoved into one of my
> > mailboxes by my mail client. Sorry about that. So I am trying to catch up
> > to see if there is some real problem or not.
> >
> ...
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0706.1/0849.html
>
> I have done some more debugging on this one. An easy way to reproduce the
> problem is to use 'smartctl -H /dev/sdb'. If I enable debugging with '-r
> ioctl,2', I find the following difference between outputs using 2.6.21.1
> (works OK) and 2.6.22 (fails):
>
...
> The log shows that the sense data returned by the commands differ: with
> 2.6.22 the bytes 4f and c2 (tf.lbam and tf.lbah) are not returned. Both of
> the status commands fail to return these bytes but the tests in smartctl
> are more strict for the second case. This is why the second status command
> seems to be failing.
>
> Next I added printks to the function ata_qc_complete() in libata-core.c.
> The changed code from 2.6.22 at line 5222 looked like this:
>
...
> The output from 2.6.21.6 looks like this:
>
> Jul 9 18:37:44 kai kernel: [ 193.443874] ata_qc_complete before: 00 00 00 40
> Jul 9 18:37:44 kai kernel: [ 193.443880] ata_qc_complete 16: 00 4f c2 50
> Jul 9 18:37:44 kai kernel: [ 193.462802] ata_qc_complete before: 00 4f c2 40
> Jul 9 18:37:44 kai kernel: [ 193.462807] ata_qc_complete 16: 00 4f c2 50
>
> i.e., the bytes are returned.
>
> The output from 2.6.22 is different:
>
> Jul 9 18:44:35 kai kernel: [ 147.765965] ata_qc_complete before: 00 00 00 40
> Jul 9 18:44:35 kai kernel: [ 147.765970] ata_qc_complete 16: 00 00 00 50
> Jul 9 18:44:35 kai kernel: [ 147.784890] ata_qc_complete before: 00 00 00 40
> Jul 9 18:44:35 kai kernel: [ 147.784894] ata_qc_complete 16: 00 00 00 50
>
> The lbam and lbah bytes are not returned but the command byte is.
>

The other system with the Maxtor disk fails in a slightly different way
(it correctly returns the c2 byte but not in the correct location):

[ 162.896173] ata_qc_complete before: 00 00 00 40
[ 162.896179] ata_qc_complete 16: 00 c2 00 50

My earlier 'git bisect' suggested that this problem surfaced after the
patch

1e999736cafdffc374f22eed37b291129ef82e4e is first bad commit
commit 1e999736cafdffc374f22eed37b291129ef82e4e
Author: Alan Cox <[EMAIL PROTECTED]>
Date:   Wed Apr 11 00:23:13 2007 +0100

    libata: HPA support

I have now done some further tests to see what is happening. It turned out
that after commenting out the call (at line 1956 in
drivers/ata/libata-core.c in 2.6.22)

        if (ata_id_hpa_enabled(dev->id))
                dev->n_sectors = ata_hpa_resize(dev);

'smartctl -H' worked again without problems. This applied to both of the
systems where I see the problem. The disks in both systems support HPA but
nothing is hidden.

Next I commented out only the call to ata_read_native_max_address_ext() in
ata_hpa_resize(). This was enough to remove the problem (as was expected).

So, the question is: why does calling ata_read_native_max_address_ext()
when booting the system cause the SMART RETURN STATUS to fail much later?

--
Kai
Re: SMART problems in 2.6.22
On Sun, 8 Jul 2007, Bruce Allen wrote:

> Mark, David, Doug, Tejun, Alan, Jeff, LKML,
>
> I'm afraid that there may be some problem with SMART + libata in the 2.6.22
> kernel. An hour ago I discovered that I missed a month of correspondence
> (some LKML, some private) about this problem which Alan, Tejun, Jeff, Mark and
> others copied to me -- it was automatically shoved into one of my mailboxes by
> my mail client. Sorry about that. So I am trying to catch up to see if there
> is some real problem or not.
>
> Here is a typical bug report that worries me:
> http://article.gmane.org/gmane.linux.utilities.smartmontools/4712
>
> Here is another similar report:
> http://thread.gmane.org/gmane.linux.utilities.smartmontools/4713
>
> And another report:
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg358354.html
>
> From some of the earlier threads that I missed (below) I have the impression
> that the problem may be a very simple one, namely that starting with 2.6.22
> one needs to run a command to enable SMART when a box is first booted -- the
> kernel no longer does this as part of the init/setup of the disks. But that is
> NOT consistent with the first two reports above, which show 'SMART ENABLED'.
>
> Here are some of the earlier threads that I completely missed:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0706.1/0849.html

I have done some more debugging on this one. An easy way to reproduce the
problem is to use 'smartctl -H /dev/sdb'. If I enable debugging with '-r
ioctl,2', I find the following difference between outputs using 2.6.21.1
(works OK) and 2.6.22 (fails):

--- sm-2.6.21.1b.log 2007-07-09 23:47:28.0 +0300
+++ sm-2.6.22.log 2007-07-09 23:39:56.0 +0300
@@ -11,7 +11,7 @@
 status=0x0
  [ata pass-through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00 ]
  scsi_status=0x0, host_status=0x0, driver_status=0x0
- info=0x0 duration=0 milliseconds resid=0
+ info=0x0 duration=4 milliseconds resid=0
  Incoming data, len=512 [only first 256 bytes shown]:
  00 5a 0c ff 3f 37 c8 10 00 00 00 00 00 3f 00 00 00
  10 00 00 00 00 20 20 20 20 20 20 20 20 20 20 20 20
@@ -97,11 +97,11 @@
  scsi_status=0x2, host_status=0x0, driver_status=0x8
  info=0x1 duration=48 milliseconds resid=0
  >>> Sense buffer, len=22:
- 00 72 00 00 00 00 00 00 0e 09 0c 00 00 00 00 00 00
- 10 00 4f 00 c2 00 50
+ 00 72 00 00 00 00 00 00 0e 09 0c 00 00 00 01 00 00
+ 10 00 00 00 00 00 50
  status=2: [desc] sense_key=0 asc=0 ascq=0
 Values from ATA status return descriptor are:
- 00 09 0c 00 00 00 00 00 00 00 4f 00 c2 00 50
+ 00 09 0c 00 00 00 01 00 00 00 00 00 00 00 50
 REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS returned 0

 REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK
@@ -110,9 +110,13 @@
  info=0x1 duration=40 milliseconds resid=0
  >>> Sense buffer, len=22:
  00 72 00 00 00 00 00 00 0e 09 0c 00 00 00 00 00 00
- 10 00 4f 00 c2 00 50
+ 10 00 00 00 00 00 50
  status=2: [desc] sense_key=0 asc=0 ascq=0
 Values from ATA status return descriptor are:
- 00 09 0c 00 00 00 00 00 00 00 4f 00 c2 00 50
-REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned 0
-
+ 00 09 0c 00 00 00 00 00 00 00 00 00 00 00 50
+Error SMART Status command failed
+Please get assistance from http://smartmontools.sourceforge.net/
+Values from ATA status return descriptor are:
+ 00 09 0c 00 00 00 00 00 00 00 00 00 00 00 50
+REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned -1
+A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

The log shows that the sense data returned by the commands differ: with
2.6.22 the bytes 4f and c2 (tf.lbam and tf.lbah) are not returned. Both of
the status commands fail to return these bytes but the tests in smartctl
are more strict for the second case. This is why the second status command
seems to be failing.

Next I added printks to the function ata_qc_complete() in libata-core.c.
The changed code from 2.6.22 at line 5222 looked like this:

        /* read result TF if requested */
        if (qc->flags & ATA_QCFLAG_RESULT_TF) {
                if (qc->tf.feature == 0xda)
                        printk("ata_qc_complete before: %02x %02x %02x %02x\n",
                               qc->result_tf.feature,
                               qc->result_tf.lbam,
                               qc->result_tf.lbah,
2.6.22-rc regression: smartctl does not work with SATA disk
The command 'smartctl -a /dev/sdb' fails with the 2.6.22-rc4 kernel. The
disk /dev/sdb is a SATA disk. The command still works with a real SCSI
disk. The computer has an Athlon64 X2 and it is running an x86_64 SMP
kernel. The chipset is Nvidia CK804 and the sata_nv driver is used.

The following output from 'smartctl -a -r ioctl,1 /dev/sdb' gives the disk
details and shows where the regression is:

-
smartctl version 5.38 [x86_64-suse-linux-gnu] Copyright (C) 2002-7 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

 [inquiry: 12 00 00 00 24 00 ]
 scsi_status=0x0, host_status=0x0, driver_status=0x0
 info=0x0 duration=0 milliseconds resid=0
status=0x0
 [ata pass-through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00 ]
 scsi_status=0x0, host_status=0x0, driver_status=0x0
 info=0x0 duration=4 milliseconds resid=0
status=0x0
Detected SAT interface, switch to device type 'sat'
REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE
 [ata pass-through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00 ]
 scsi_status=0x0, host_status=0x0, driver_status=0x0
 info=0x0 duration=0 milliseconds resid=0
status=0x0
REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE returned 0

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3320620AS
Serial Number:    9QF22KAP
Firmware Version: 3.AAJ
User Capacity:    320,072,933,376 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Jun 10 10:47:30 2007 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS
 [ata pass-through(16): 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00 ]
 scsi_status=0x2, host_status=0x0, driver_status=0x8
 info=0x1 duration=44 milliseconds resid=0
status=2: [desc] sense_key=0 asc=0 ascq=0
REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS returned 0

REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK
 [ata pass-through(16): 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00 ]
 scsi_status=0x2, host_status=0x0, driver_status=0x8
 info=0x1 duration=44 milliseconds resid=0
status=2: [desc] sense_key=0 asc=0 ascq=0
Error SMART Status command failed
Please get assistance from http://smartmontools.sourceforge.net/
Values from ATA status return descriptor are:
 00 09 0c 00 00 00 00 00 00 00 00 00 00 00 50
REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned -1
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

This is smartctl from cvs a few days ago, but the smartctl shipping with
SuSE 10.2 fails in the same way.

I ran 'git bisect' and it suggests that the problem was introduced by

1e999736cafdffc374f22eed37b291129ef82e4e is first bad commit
commit 1e999736cafdffc374f22eed37b291129ef82e4e
Author: Alan Cox <[EMAIL PROTECTED]>
Date:   Wed Apr 11 00:23:13 2007 +0100

    libata: HPA support

i.e., before 2.6.22-rc1. At this point I find it best to leave the problem
to the experts.

--
Kai
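
The check that fails in the log above can be sketched as a small decoder for the ATA status return descriptor. This is an illustrative reading of the log, not smartctl's actual code: it assumes the leading "00" on the printed line is a dump offset and not data, so that the descriptor proper is the 14 bytes starting with 09 0c, with LBA mid at byte 9 and LBA high at byte 11. The 4f/c2 signature for a healthy drive appears in the thread; the f4/2c signature for an exceeded threshold is the value defined for the ATA SMART RETURN STATUS command. The enum and function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

enum smart_health {
    SMART_HEALTH_OK,       /* LBA mid/high = 4f/c2 */
    SMART_HEALTH_FAILING,  /* LBA mid/high = f4/2c: threshold exceeded */
    SMART_HEALTH_UNKNOWN   /* anything else, e.g. the all-zero 2.6.22 case */
};

/*
 * Classify a 14-byte ATA status return descriptor (code 0x09, additional
 * length 0x0c).  Byte positions are read off the logs in this thread.
 */
static enum smart_health smart_health_from_descriptor(const uint8_t *d, size_t len)
{
    if (d == NULL || len < 14 || d[0] != 0x09 || d[1] != 0x0c)
        return SMART_HEALTH_UNKNOWN;

    uint8_t lbam = d[9];   /* LBA mid, low byte */
    uint8_t lbah = d[11];  /* LBA high, low byte */

    if (lbam == 0x4f && lbah == 0xc2)
        return SMART_HEALTH_OK;
    if (lbam == 0xf4 && lbah == 0x2c)
        return SMART_HEALTH_FAILING;
    return SMART_HEALTH_UNKNOWN;
}
```

Fed the descriptor from the failing run (09 0c followed by zeros and a final status byte of 50), the decoder cannot tell a healthy drive from a failing one, which matches the "SMART Status command failed" error reported above.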
Re: [PATCH scsi-misc-2.6 08/08] scsi: fix hot unplug sequence
On Fri, 25 Mar 2005, James Bottomley wrote:

> On Fri, 2005-03-25 at 14:38 +0900, Tejun Heo wrote:
> >  We have users of scsi_do_req() other than scsi_wait_req() and they
> > use different done() functions to do different things.  I've checked
> > other done functions and none uses contents inside the passed
> > scsi_cmnd, so using a dummy command should be okay with them.  Am I
> > missing something here?
>
> Well ... the other users are supposed to be going away.  They're
> actually all coded wrongly in some way or other ... perhaps I should
> speed up the process.
>
I have seen you mention this several times now and I am getting more and
more worried. The reason is that scsi_wait_req() is a synchronous
interface and it does not allow a driver to do this:
- send a request
- do other useful things/let the user do useful work
- wait for completion before starting another request

I fully agree that doing done() correctly _is_ a problem, especially when
the SCSI subsystem evolves and the high-level driver writers do not follow
the development closely enough.

One solution to these problems would be to let the drivers still use
scsi_do_req() and their own done() function, but create two (three)
helpers:
- one to be called at the beginning of done(); it would do what needs to
  be done here but let the driver do some special things of its own if
  necessary
- one to be called to wait for the request to finish
(- one to do scsi_do_req() and the things necessary before these)

Having these helpers would isolate the user of the SCSI subsystem from the
internals. scsi_wait_req() should call these functions and no additional
maintenance would be needed for this additional asynchronous interface.

The current drivers may not do any work in done() that could not be done
later but there is one patch pending where this happens: the st
performance statistics patch needs to get the time stamp when the SCSI
command is processed.

-- 
Kai
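The send / do other work / wait pattern and the proposed helpers can be modelled in a few lines of plain C. This is only a single-threaded userspace sketch: every name in it (model_req, model_submit, model_done_begin, model_wait, tape_done) is hypothetical and none of it is a kernel interface. The "device" is simulated by a function call, and the driver-specific step in done() is a time stamp, as in the st statistics case mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these names do not exist in the kernel. */
struct model_req {
    bool finished;      /* set by the common completion helper */
    int  result;        /* status reported by the simulated device */
    long done_stamp;    /* driver-private data recorded inside done() */
    void (*done)(struct model_req *rq, int result);
};

static long model_clock;    /* stand-in for a real time source */

/* Helper 1: called first in every done(); owns the common bookkeeping. */
static void model_done_begin(struct model_req *rq, int result)
{
    rq->result = result;
    rq->finished = true;
}

/* Helper 3: start a request and return at once (the asynchronous part). */
static void model_submit(struct model_req *rq,
                         void (*done)(struct model_req *, int))
{
    assert(rq != NULL && done != NULL);
    rq->finished = false;
    rq->result = 0;
    rq->done_stamp = 0;
    rq->done = done;
}

/* Simulated hardware: the command completes and done() is called. */
static void model_device_complete(struct model_req *rq, int result)
{
    model_clock++;
    rq->done(rq, result);
}

/* Helper 2: wait for the request to finish. With no second thread in
 * this model, waiting simply drives the simulated device. */
static int model_wait(struct model_req *rq)
{
    while (!rq->finished)
        model_device_complete(rq, 0);
    return rq->result;
}

/* A driver's own done(): common helper first, then its special work. */
static void tape_done(struct model_req *rq, int result)
{
    model_done_begin(rq, result);
    rq->done_stamp = model_clock;   /* time stamp taken at completion */
}
```

A caller would run model_submit(&rq, tape_done), carry on with other work, and call model_wait(&rq) before starting the next request. The point of the split is that the driver never touches the finished/result bookkeeping directly.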
Re: [PATCH] make st seekable again
On Wed, 9 Mar 2005, Alan Cox wrote:

> On Maw, 2005-03-08 at 17:25, Linux Kernel Mailing List wrote:
> > ChangeSet 1.2030, 2005/03/08 09:25:05-08:00, [EMAIL PROTECTED]
> >
> > 	[PATCH] make st seekable again
> >
> > 	Apparently `tar' errors out if it cannot perform lseek() against a
> > 	tape.  Work around that in-kernel.
>
> Unfortunately this isn't a good idea. Allowing tar to read the tape
> position makes sense, allowing it to zero the position might but you
> have to do major surgery on the driver first because
>
> 1.	It doesn't use ppos
> 2.	It doesn't do locking on the ppos at all
>
> Also allowing apps to randomly seek and report "ok" when they are
> backing up to tape and might really need to see the error is not what
> I'd call stable, professional or quality code.
>
The proper fix is to fix tar. I have sent an analysis of the problem and a
suggestion how to fix this to the bug-tar list on March 5 but it is still
waiting for moderator approval. While waiting for the application to be
fixed, it was decided to restore the old behaviour of the tape drivers.

lseek on a tape is not a good fit (addressed by block, blocks on tape can
have any size, etc.). I don't know any Unix that would really implement
lseek on tapes but they usually don't return an error. This is probably why
the tar bug has not been found earlier.

There has been one useful way of using lseek() with tapes in some systems.
Those refuse reads and writes if the file pointer reaches 2 GB. Resetting
it with lseek(fd,0,0) now and then has allowed writing/reading more than
2 GB.

I don't think implementing proper read-only lseek for tapes is worth the
trouble (reliable tracking of the current location is tricky). Purist
kernels can refuse lseeks. Pragmatic kernels can allow lseeks until
refusing those won't break common applications.

-- 
Kai
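On the application side, one way to cope with both kinds of kernel is to treat ESPIPE from lseek() as "this descriptor has no file position" rather than as a fatal error. The following is a minimal sketch of that idea, not the fix proposed to bug-tar; the function name reset_offset_if_seekable is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Try the lseek(fd, 0, SEEK_SET) reset described above.
 * Returns  0 if the file position was reset,
 *          1 if the descriptor is not seekable (ESPIPE), which the
 *            caller may ignore for a tape,
 *         -1 on any other error (errno is left set by lseek). */
static int reset_offset_if_seekable(int fd)
{
    assert(fd >= 0);
    if (lseek(fd, (off_t) 0, SEEK_SET) != (off_t) -1)
        return 0;
    if (errno == ESPIPE)
        return 1;
    return -1;
}
```

A pipe is a convenient stand-in for a non-seekable device when trying this out, since lseek() on a pipe also fails with ESPIPE.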
Re: Problems with SCSI tape rewind / verify on 2.4.29
On Wed, 2 Mar 2005, Andrew Morton wrote:

> Kai Makisara <[EMAIL PROTECTED]> wrote:
> >
> > >
> > > v2.6 also contains the same problem BTW.
> > >
> > > Try this:
> > >
> > > --- a/drivers/scsi/st.c.orig	2005-03-02 09:02:13.637158144 -0300
> > > +++ b/drivers/scsi/st.c	2005-03-02 09:02:20.208159200 -0300
> > > @@ -3778,7 +3778,6 @@
> > >  	read:		st_read,
> > >  	write:		st_write,
> > >  	ioctl:		st_ioctl,
> > > -	llseek:		no_llseek,
> > >  	open:		st_open,
> > >  	flush:		st_flush,
> > >  	release:	st_release,
> >
> > This change covers up the problem. The real bug is in tar.
>
> In that case we're kinda screwed, and should change the kernel to make tar
> work again.  We can send a bug report to the tar folks (good luck) and wait
> a few years.
>
> > The first BSF did position the tape correctly although it did fail.
>
> (what's a BSF?)
>
> If it positioned the tape successfully, why did it claim that it failed?

BSF moves the tape backwards over filemarks. tar tries to move over one
filemark. It does not find it because it ends up at the beginning of the
tape. This is why the operation fails. However, the tape is at the
beginning and this is the correct place with regard to what is done next.

> If we were to fix that up, would tar then be happy?

It is not fixable in the kernel. The beginning of the tape is a special
case because there is no filemark. Any application should take this into
account. We could fake a filemark there but this would lead to problems
because then we could "skip" backwards indefinitely even when the tape
moves nowhere. This could confuse other applications.

If seek with tape is changed back to returning success, this would enable
correct tar --verify at the beginning of the tape. However, I am not sure
what happens if we are not at the beginning. I will investigate this and
suggest a long term fix to the tar people (a fix that should be compatible
with all Unix tape semantics I know) and also suggest possible fixes to st
(this may include automatic writing of a filemark when BSF is used after
writes).

If you want to make st return success for seeks even if nothing happens
(as it did earlier), I don't have anything against that. It would solve
the practical problem several people have reported recently. (My
recommendation for the people seeing this problem is to do verification
separately with 'tar -d'.)

-- 
Kai
Re: Problems with SCSI tape rewind / verify on 2.4.29
On Wed, 2 Mar 2005, Marcelo Tosatti wrote:

> On Wed, Mar 02, 2005 at 11:17:19PM +0200, Kai Makisara wrote:
...
> > BTW, this "fix" by Solar Designer introduces a bug to 2.4.29: a tape
> > driver is supposed to return ENOMEM in the case that was changed to return
> > EIO ;-(
>
> Reverted.
>
Thanks.

...
> Thanks for the cluebat Kai, is this problem fixed in newer versions of tar?
>
The current CVS version seems to have the same code I quoted.

> I suspect v2.4 should work with older versions of tar, so we should keep
> "lseek" working to make it happy. What is your opinion?
>
I commented on this in the other reply I just sent and I don't have a clear
preference. I just hope that 2.4 and 2.6 are fixed in a compatible way.

-- 
Kai
Re: Problems with SCSI tape rewind / verify on 2.4.29
On Wed, 2 Mar 2005, Marcelo Tosatti wrote:

> On Wed, Mar 02, 2005 at 11:15:42AM -, Mark Yeatman wrote:
> > Hi
> >
> > Never had to log a bug before, hope this is correctly done.
> >
> > Thanks
> >
> > Mark
> >
> > Detail
> >
> > [1.] One line summary of the problem:
> > SCSI tape drive is refusing to rewind after backup to allow verify and
> > causing illegal seek error
> >
> > [2.] Full description of the problem/report:
> > On backup the tape drive is reporting the following error and failing
> > it's backups.
> >
> > tar: /dev/st0: Warning: Cannot seek: Illegal seek
> >
> > I have traced this back to failing at an upgrade of the kernel to 2.4.29
> > on Feb 8th. The backups have not worked since. Replacement Drives have
> > been tried and cables to no avail. I noticed in the the changelog that a
> > patch by Solar Designer to the Scsi tape return code had been made.

BTW, this "fix" by Solar Designer introduces a bug to 2.4.29: a tape
driver is supposed to return ENOMEM in the case that was changed to return
EIO ;-(

>
> v2.6 also contains the same problem BTW.
>
> Try this:
>
> --- a/drivers/scsi/st.c.orig	2005-03-02 09:02:13.637158144 -0300
> +++ b/drivers/scsi/st.c	2005-03-02 09:02:20.208159200 -0300
> @@ -3778,7 +3778,6 @@
>  	read:		st_read,
>  	write:		st_write,
>  	ioctl:		st_ioctl,
> -	llseek:		no_llseek,
>  	open:		st_open,
>  	flush:		st_flush,
>  	release:	st_release,

This change covers up the problem. The real bug is in tar. The following
code from tar is supposed to reposition the tape to the beginning of the
file just written:

#ifdef MTIOCTOP
  {
    struct mtop operation;
    int status;

    operation.mt_op = MTBSF;
    operation.mt_count = 1;
    if (status = rmtioctl (archive, MTIOCTOP, (char *) &operation), status < 0)
      {
	if (errno != EIO
	    || (status = rmtioctl (archive, MTIOCTOP, (char *) &operation),
		status < 0))
	  {
#endif
	    if (rmtlseek (archive, (off_t) 0, SEEK_SET) != 0)
	      {
		/* Lseek failed.  Try a different method.  */
		seek_warn (archive_name_array[0]);
		return;
	      }
#ifdef MTIOCTOP
	  }
      }
  }
#endif

Here is output from strace showing what happens with 'tar -c -W' applied
at the beginning of the tape (this is using kernel 2.6.11-rc4 but the same
probably happens with 2.4.29):

...
ioctl(3, MGSL_IOCGPARAMS or MTIOCTOP or SNDCTL_MIDI_MPUMODE, 0x7fffecd0) = -1 EIO (Input/output error)
ioctl(3, MGSL_IOCGPARAMS or MTIOCTOP or SNDCTL_MIDI_MPUMODE, 0x7fffecd0) = -1 EIO (Input/output error)
lseek(3, 0, SEEK_SET)                   = -1 ESPIPE (Illegal seek)

So, both tape positioning commands fail and the code falls back to lseek.
Earlier, lseek returned success even though it did not do anything (this was
on purpose because it is the way some other Unices behave, and with
reason). In that case this tar succeeded but it was pure luck. The first
BSF did position the tape correctly although it did fail.

The 2.6 st driver does contain this near the beginning of st_open():

	nonseekable_open(inode, filp);

This probably makes lseek fail. This code has been in st.c since 2.6.8.

-- 
Kai
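One way an application could take the beginning-of-tape case into account is to ask the drive for its status after a failed MTBSF and accept the failure if the tape is at BOT. The sketch below uses the standard MTIOCTOP and MTIOCGET ioctls and the GMT_BOT macro from <sys/mtio.h>; the function names are hypothetical, and it assumes (unverified here) that the driver reports BOT in mt_gstat after such a failed backspace. It is not the fix that was proposed to the tar maintainers.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>

/* Pure decision step: did the backspace leave the tape where we want it?
 * ioctl_rc and saved_errno describe the MTBSF call; at_bot is nonzero
 * if the drive reports beginning of tape afterwards. */
static int bsf_effectively_ok(int ioctl_rc, int saved_errno, int at_bot)
{
    if (ioctl_rc == 0)
        return 1;
    return saved_errno == EIO && at_bot;
}

/* Back up over one filemark; treat "failed with EIO but now at BOT" as
 * success. Returns 0 on success, -1 on failure with errno set. */
static int backspace_one_file(int fd)
{
    struct mtop op;
    struct mtget st;
    int rc, err, at_bot = 0;

    assert(fd >= 0);
    op.mt_op = MTBSF;
    op.mt_count = 1;
    rc = ioctl(fd, MTIOCTOP, &op);
    err = (rc == 0) ? 0 : errno;

    /* Only query the position when the backspace reported an I/O error. */
    if (rc != 0 && err == EIO && ioctl(fd, MTIOCGET, &st) == 0)
        at_bot = GMT_BOT(st.mt_gstat) != 0;

    if (bsf_effectively_ok(rc, err, at_bot))
        return 0;
    errno = err;
    return -1;
}
```

Keeping the decision in a separate pure function makes the special case easy to check without a tape drive attached.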
Re: raw tape device support???
On Thu, 3 May 2001, Mark Hounschell wrote:

> Sorry if this isn't the correct place for this question. Is there or
> will there
> ever be raw tape device access. I'm trying to port an app from Dec unix
> and at
> least there the app requires /dev/rmt** (raw device). I've read in the
> archives
> about how to bind a block device to a raw device using the raw command
> but the
> tape dev (/dev/st*) is a char device and the command doesn't work on
> char devices.
> So I'm trying to figure out to get the same effect as /dev/rmt* does on
> the dec
> box in a linux environment.

You can just use the device /dev/st* (or /dev/nst*). They are raw
(character) devices. If your app needs to find the devices with names
/dev/rmt*, you can make new device nodes or use links.

	Kai
Re: Need help with allocating a 2M buffer size
On Thu, 15 Mar 2001, Byron Stanoszek wrote:

> I have a real picky tape drive (DLT series) that likes to be fed large chunks
> of data at once, otherwise after every 2-4KB of data it halts and rewinds
> itself because its cache for writing to the tape is empty.
>
> My best solution to this problem was to use 'tar -b 4096', which sends 4096 x
> 512-byte blocks at once for a total of a 2MB buffer size. This worked fine for
> several weeks, until 2 days ago I got this message (and the backup fails):
>
> st: failed to enlarge buffer to 2097152 bytes.
>
The default maximum number of scatter/gather segments in the tape driver
is 16. This means that big chunks of memory are needed to allocate a 2 MB
buffer. You can increase the number of segments up to, e.g., 128. This
means that only 16 kB chunks are needed to make up a 2 MB buffer. The
number of scatter/gather segments is also limited by your SCSI adapter
driver. Note that even with 16 kB segments you may find problems at some
time because multi-page allocations are needed.

You can increase the number of scatter/gather segments at system
startup/module loading or when compiling the driver. See the file
linux/drivers/scsi/README.st for the syntax and st_options.h for the
compile-time definition.

	Kai
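The arithmetic behind the 16-versus-128 segment advice can be written out as a small helper. This is an illustrative calculation only: the function name is made up, the 4096-byte page size is an assumption (typical for x86), and the real allocator may round each chunk further, for example to a power-of-two number of pages.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest per-segment allocation, in bytes, that lets max_segs
 * scatter/gather segments cover a buffer of buf_bytes, rounded up
 * to a whole number of pages. */
static size_t min_segment_bytes(size_t buf_bytes, size_t max_segs,
                                size_t page_bytes)
{
    size_t per_seg, pages;

    assert(max_segs > 0 && page_bytes > 0);
    per_seg = (buf_bytes + max_segs - 1) / max_segs;    /* ceiling */
    pages = (per_seg + page_bytes - 1) / page_bytes;    /* ceiling */
    return pages * page_bytes;
}
```

For the 2097152-byte buffer in the report, 16 segments need 131072-byte (128 kB) chunks, while 128 segments need only 16384-byte (16 kB) chunks, which is still a four-page allocation with 4 kB pages.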
Re: 2.4.0-test4&6 scsi tape problem [not fixed :-/]
I suggest we move this discussion from linux-kernel to linux-scsi.

On Thu, 31 Aug 2000, G. Saraber wrote:

> "Richard B. Johnson" wrote:
> >
> > On Wed, 30 Aug 2000, G. Saraber wrote:
> >
> > > Thanks for the excellent guide on how to pinpoint the problem ...
> > >
> > > guess what :-) I decided before I send in another bugreport i'll upgrade
> > > to test7 so the developers dont have to dig through 'old' kernel
> > > versions ..
> > > anyway, the problem went away, they must have fixed it :-)
>
> ok,
> i wasn't able to reproduce the problem before, but now it's occurred
> again, this the second nightly backup since booting with 2.4.0-test7,
> the first one went fine which i couldnt get to happen with test4 and
> test6 so test7 is slightly better. The error occured after a little over
> 3GB is backed up on a 12GB (uncompressed space) tape.
> However this time I have logs :-) and once again, "mt offline" or the
> button on the tapedrive itself won't release the tape,
> any mt command after the first "mt offline" gives:
> [root@ahr log]# mt offline
> /dev/tape: Input/output error
>
> right away, it doesnt even try to access the drive the second time
> around..
>
> (logs attached below) i'll do more testing later today.
> i had to heavily snip the log to keep the size under control to save
> bandwith, i'll gladly send you the full list in the format of your
> choice, just drop me a line.
>
> Regards,
> Gerard Saraber
> [EMAIL PROTECTED]
> http://www.rarcoa.com
> scsi log --
> Aug 31 00:42:50 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 3, lun 0 0x2a 00 00 09 b4 d5 00 00 02 00
> Aug 31 00:43:15 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 3, lun 0 0x28 00 00 02 17 3f 00 00 80 00
> Aug 31 00:43:20 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 3, lun 0 0x2a 00 00 09 b4 d5 00 00 02 00
> Aug 31 00:43:20 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 4, lun 0 0x2a 00 00 26 74 f3 00 00 02 00
> Aug 31 00:43:20 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 4, lun 0 0x2a 00 00 26 77 1b 00 00 22 00

You get timeouts from devices 3 and 4 on the bus (I assume these are
disks). These are read (0x28) and write (0x2a) commands (10 byte
versions). Note that these timeouts start at 00:42:50. The disks probably
time out because the tape drive has not released the SCSI bus.

[timeout messages cut]

> Aug 31 00:57:20 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 3, lun 0 0x28 00 00 01 74 57 00 00 68 00
> Aug 31 00:57:20 ahr kernel: scsi : aborting command due to timeout : pid
> 0, scsi0, channel 0, id 5, lun 0 0x0a 01 00 00 40 00
                                   ^^^
This is presumably the last command sent to the tape drive. It is a write
command that writes 64 tape blocks in fixed block mode. This looks legal.

The time here is 00:57:20 which means that the tape command times out 15
minutes after the first timeout from the disks. The tape driver timeout is
15 minutes and so the tape command times out properly. After this the SCSI
subsystem decides that it should try to reset the SCSI bus to resolve the
problem.

> Aug 31 00:57:20 ahr kernel: SCSI host 0 abort (pid 0) timed out -
> resetting
> Aug 31 00:57:20 ahr kernel: SCSI bus is being reset for host 0 channel
> 0.
> Aug 31 00:57:20 ahr kernel: (scsi0:0:5:0) Synchronous at 10.0 Mbyte/sec,
> offset 15.
> Aug 31 00:57:20 ahr kernel: (scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec,
> offset 31.
> Aug 31 00:57:20 ahr kernel: (scsi0:0:4:0) Synchronous at 80.0 Mbyte/sec,
> offset 31.
> Aug 31 00:57:25 ahr kernel: st0: Error with sense data: Info fld=0x40,
> Current st09:00: sns = f0  6
                             ^ translation: UNIT ATTENTION

(I am not trying to shout: the capital letters are cut from the SCSI
standard draft :-) (If you enable verbose SCSI messages in the kernel
configuration, the kernel does this translation for you.)

> Aug 31 00:57:25 ahr kernel: ASC=29 ASCQ= 0
                              ^^ translation: POWER ON, RESET, OR BUS
                                 DEVICE RESET OCCURRED

> Aug 31 00:57:25 ahr kernel: Raw sense data:0xf0 0x00 0x06 0x00 0x00 0x00
> 0x40 0x12 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x00 0x00 0x00 0x00 0x00 0x00
> Aug 31 00:57:25 ahr kernel: st0: Error with sense data: Info fld=0x1,
> Current st09:00: sns = f0  2
                             ^ translation: NOT READY

> Aug 31 00:57:25 ahr kernel: ASC= 4 ASCQ= 1
                              ^^ translation: LOGICAL UNIT IS IN PROCESS
                                 OF BECOMING READY

When the tape driver sees a unit attention anywhere else than at open(),
it prevents further access to the tape until some command is issued that
puts the tape into a known position. Rewind is one example. So,
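The hand translation of the log above can be reproduced with a short decoder. This is a sketch for illustration, with made-up type and function names: it reads the sense key, ASC, ASCQ and information field from fixed-format sense data (response code 0x70 or 0x71), and the FIXED bit and transfer length from a WRITE(6) command block in its sequential-access (tape) form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sense_summary {
    uint8_t  key;    /* sense key: low nibble of byte 2 */
    uint8_t  asc;    /* additional sense code: byte 12 */
    uint8_t  ascq;   /* additional sense code qualifier: byte 13 */
    uint32_t info;   /* information field: bytes 3..6, big-endian */
};

/* Decode fixed-format sense data. Returns 0 on success, -1 if the
 * buffer is too short or is not in fixed format. */
static int decode_fixed_sense(const uint8_t *s, size_t len,
                              struct sense_summary *out)
{
    uint8_t code;

    assert(s != NULL && out != NULL);
    if (len < 14)
        return -1;
    code = s[0] & 0x7f;               /* drop the VALID bit */
    if (code != 0x70 && code != 0x71)
        return -1;
    out->key  = s[2] & 0x0f;
    out->info = ((uint32_t) s[3] << 24) | ((uint32_t) s[4] << 16) |
                ((uint32_t) s[5] << 8)  |  (uint32_t) s[6];
    out->asc  = s[12];
    out->ascq = s[13];
    return 0;
}

/* Tape WRITE(6): byte 1 bit 0 is FIXED, bytes 2..4 are the transfer
 * length (blocks when FIXED is set, bytes otherwise). */
static uint32_t write6_transfer_length(const uint8_t cdb[6], int *fixed)
{
    assert(cdb != NULL && fixed != NULL && cdb[0] == 0x0a);
    *fixed = cdb[1] & 0x01;
    return ((uint32_t) cdb[2] << 16) | ((uint32_t) cdb[3] << 8) |
            (uint32_t) cdb[4];
}
```

Fed the raw sense bytes from the log, the decoder yields sense key 6 (UNIT ATTENTION), ASC 0x29, ASCQ 0 and information field 0x40. The command block 0a 01 00 00 40 00 decodes as a fixed-block write of 64 blocks, matching the reading given above.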
Re: 2.4.0-test46 scsi tape problem [not fixed :-/]
I suggest we move this discussion from linux-kernel to linux-scsi.

On Thu, 31 Aug 2000, G. Saraber wrote:

"Richard B. Johnson" wrote:

On Wed, 30 Aug 2000, G. Saraber wrote:

Thanks for the excellent guide on how to pinpoint the problem ... guess
what :-) I decided before I send in another bug report I'll upgrade to
test7 so the developers don't have to dig through 'old' kernel versions
.. anyway, the problem went away, they must have fixed it :-)

ok, I wasn't able to reproduce the problem before, but now it's occurred
again. This is the second nightly backup since booting with 2.4.0-test7;
the first one went fine, which I couldn't get to happen with test4 and
test6, so test7 is slightly better. The error occurred after a little
over 3GB was backed up on a 12GB (uncompressed space) tape. However this
time I have logs :-) and once again, "mt offline" or the button on the
tape drive itself won't release the tape. Any mt command after the first
"mt offline" gives:

[root@ahr log]# mt offline
/dev/tape: Input/output error

right away, it doesn't even try to access the drive the second time
around.. (logs attached below) I'll do more testing later today.

I had to heavily snip the log to keep the size under control to save
bandwidth. I'll gladly send you the full list in the format of your
choice, just drop me a line.

Regards,
Gerard Saraber
[EMAIL PROTECTED]
http://www.rarcoa.com

scsi log --
Aug 31 00:42:50 ahr kernel: scsi : aborting command due to timeout : pid
0, scsi0, channel 0, id 3, lun 0 0x2a 00 00 09 b4 d5 00 00 02 00
Aug 31 00:43:15 ahr kernel: scsi : aborting command due to timeout : pid
0, scsi0, channel 0, id 3, lun 0 0x28 00 00 02 17 3f 00 00 80 00
Aug 31 00:43:20 ahr kernel: scsi : aborting command due to timeout : pid
0, scsi0, channel 0, id 3, lun 0 0x2a 00 00 09 b4 d5 00 00 02 00
Aug 31 00:43:20 ahr kernel: scsi : aborting command due to timeout : pid
0, scsi0, channel 0, id 4, lun 0 0x2a 00 00 26 74 f3 00 00 02 00
Aug 31 00:43:20 ahr kernel: scsi : aborting command due to timeout : pid
0, scsi0, channel 0, id 4, lun 0 0x2a 00 00 26 77 1b 00 00 22 00

You get timeouts from devices 3 and 4 on the bus (I assume these are
disks). These are read (0x28) and write (0x2a) commands (10 byte
versions). Note that these timeouts start at 00:42:50. The disks probably
time out because the tape drive has not released the SCSI bus.

[timeout messages cut]

Aug 31 00:57:20 ahr kernel: scsi : aborting command due to timeout : pid
0, scsi0, channel 0, id 3, lun 0 0x28 00 00 01 74 57 00 00 68 00
Aug 31 00:57:20 ahr kernel: scsi : aborting command due to timeout : pid
0, scsi0, channel 0, id 5, lun 0 0x0a 01 00 00 40 00
^^^
This is presumably the last command sent to the tape drive. It is a write
command that writes 64 tape blocks in fixed block mode. This looks legal.
The time here is 00:57:20 which means that the tape command times out 15
minutes after the first timeout from the disks. The tape driver timeout
is 15 minutes and so the tape command times out properly. After this the
SCSI subsystem decides that it should try to reset the SCSI bus to
resolve the problem.

Aug 31 00:57:20 ahr kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Aug 31 00:57:20 ahr kernel: SCSI bus is being reset for host 0 channel
0.
Aug 31 00:57:20 ahr kernel: (scsi0:0:5:0) Synchronous at 10.0 Mbyte/sec,
offset 15.
Aug 31 00:57:20 ahr kernel: (scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec,
offset 31.
Aug 31 00:57:20 ahr kernel: (scsi0:0:4:0) Synchronous at 80.0 Mbyte/sec,
offset 31.
Aug 31 00:57:25 ahr kernel: st0: Error with sense data: Info fld=0x40,
Current st09:00: sns = f0 6
                          ^ translation: UNIT ATTENTION
(I am not trying to shout: the capital letters are cut from the SCSI
standard draft :-)
(If you enable verbose SCSI messages in the kernel configuration, the
kernel does this translation for you.)

Aug 31 00:57:25 ahr kernel: ASC=29 ASCQ= 0
                                ^^ translation: POWER ON, RESET, OR BUS DEVICE RESET OCCURRED

Aug 31 00:57:25 ahr kernel: Raw sense data:0xf0 0x00 0x06 0x00 0x00 0x00
0x40 0x12 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00
Aug 31 00:57:25 ahr kernel: st0: Error with sense data: Info fld=0x1,
Current st09:00: sns = f0 2
                          ^ translation: NOT READY

Aug 31 00:57:25 ahr kernel: ASC= 4 ASCQ= 1
                                ^^ translation: LOGICAL UNIT IS IN PROCESS OF BECOMING READY

When the tape driver sees a unit attention anywhere other than at open(),
it prevents further access to the tape until some command is issued that
puts the tape into a known position. Rewind is one example. So, the fact
that you don't seem to be able to do anything with the tape after the bus