Re: SCSI ERRORS
> I've seen that error on my SS10 with two disks as well. Do you
> have an external device (e.g. CD-ROM, tape streamer) connected?
> That error seemed to happen less frequently after I disconnected my
> external CD-ROM drive.

That's the point where SCSI gets a bit esoteric: slightly too long cables, a terminator that is not completely OK, some dirty plugs, and you get exactly this kind of problem. Try reordering the devices and using another cable and/or terminator, etc.; that may help. One time I had such a problem, it was caused by the temperature of the devices and/or cables. Adjusting some fans resolved it.

Greetings from emi
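Two of the suspects emi lists, cable length and termination, can be checked on paper before swapping parts. The helper below is a hypothetical sketch, not a tool from this thread. The length limits in it are the commonly quoted single-ended figures (6 m for SCSI-1, 3 m for Fast SCSI, 3 m or 1.5 m for Ultra depending on device count) and are an assumption to verify against your own host adapter's documentation; the code does not know which mode a given SPARCstation bus actually runs in.

```python
from dataclasses import dataclass

# Assumed single-ended bus length limits in metres (commonly quoted
# values; verify against your own host adapter documentation).
SCSI1_MAX_M = 6.0
FAST_MAX_M = 3.0
ULTRA_MAX_M_FEW = 3.0   # up to 4 devices
ULTRA_MAX_M_MANY = 1.5  # 5 or more devices


@dataclass
class Chain:
    mode: str             # hypothetical labels: "scsi1", "fast" or "ultra"
    length_m: float       # total cable length, internal and external runs
    devices: int          # devices on the bus, assumed to include the host adapter
    terminated_ends: int  # bus ends with a terminator known to be good


def max_length_m(mode, devices):
    """Return the assumed maximum single-ended bus length for a mode."""
    if mode == "scsi1":
        return SCSI1_MAX_M
    if mode == "fast":
        return FAST_MAX_M
    if mode == "ultra":
        return ULTRA_MAX_M_FEW if devices <= 4 else ULTRA_MAX_M_MANY
    raise ValueError(f"unknown mode: {mode!r}")


def check_chain(chain):
    """List warnings for the two suspects that can be checked on paper."""
    warnings = []
    limit = max_length_m(chain.mode, chain.devices)
    if chain.length_m > limit:
        warnings.append(f"cable length {chain.length_m} m exceeds {limit} m")
    if chain.terminated_ends != 2:
        warnings.append(
            f"{chain.terminated_ends} terminated end(s); expected exactly 2"
        )
    return warnings
```

For example, `check_chain(Chain("fast", 3.5, 4, 1))` returns two warnings, one for the length and one for the missing terminator. Dirty plugs and heat, the other causes emi mentions, still have to be checked by hand.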
Re: SCSI ERRORS
On Monday, 2 June 2003 23:27, Sébastien Canchon wrote:
> Hello,
>
> I have an SS20 running Debian GNU/Linux 3.0rc1 with two SCSI disks. Everything
> worked fine, but 10 minutes ago, when I rebooted the machine, I got:
>
> scsi : aborting command due to timeout : pid 11, scsi0, channel 0, id 3,
> lun 0 Test Unit Ready 00 00 00 00 00
> [...]
> SCSI host 0 reset (pid 11) timed out again -
> probably an unrecoverable SCSI bus or device hang.

I've seen that error on my SS10 with two disks as well. Do you have an external device (e.g. CD-ROM, tape streamer) connected? That error seemed to happen less frequently after I disconnected my external CD-ROM drive.

Stefan
-- 
Stefan Naewe ([EMAIL PROTECTED])
GNU/Linux User #165035
PGP-Key: FF26 564E FB8D 70E8 A607 9C97 1950 C3AE CFBD 78B0
It's most certainly GNU/Linux, not Linux. Read more at http://www.gnu.org/gnu/why-gnu-linux.html.
SCSI ERRORS
Hello,

I have an SS20 running Debian GNU/Linux 3.0rc1 with two SCSI disks. Everything worked fine, but 10 minutes ago, when I rebooted the machine, I got:

scsi : aborting command due to timeout : pid 11, scsi0, channel 0, id 3, lun 0 Test Unit Ready 00 00 00 00 00
esp0: Aborting command
esp0: dumping state
esp0: dma -- cond_reg addr
esp0: SW [sreg<00> sstep<00> ireg<20>]
esp0: HW reread [sreg<12> sstep ireg<00>]
esp0: current command [tgt<03> lun<00> pphase cphase]
esp0: disconnected
SCSI host 0 abort (pid 11) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
esp0: Resetting scsi bus
esp0: Gross error sreg=40
esp0: SCSI bus reset interrupt
SCSI host 0 channel 0 reset (pid 11) timed out - trying harder
SCSI bus is being reset for host 0 channel 0.
esp0: Resetting scsi bus
esp0: SCSI bus reset interrupt
esp0: SCSI bus reset interrupt
SCSI host 0 reset (pid 11) timed out again -
probably an unrecoverable SCSI bus or device hang.

Need help!

S.C.
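The console log above follows the kernel's error handling for a single command through its escalation: the timeout, an abort attempt, a bus reset, a harder reset, and finally giving up. A minimal Python sketch that pulls the failing target and the stages reached out of such a log. The regular expressions are written against the messages quoted above; treating them as valid for any other kernel version or host driver is an assumption.

```python
import re

# Patterns written against the esp0 console log quoted above; other
# kernel versions or host drivers may word these messages differently.
TIMEOUT_RE = re.compile(
    r"aborting command due to timeout : pid (\d+), scsi(\d+), "
    r"channel (\d+), id (\d+), lun (\d+) ([A-Za-z ]+?)\s+[0-9a-f]{2}\b"
)

# Escalation steps, in the order they appear in the log.
STAGES = [
    ("command timed out", r"aborting command due to timeout"),
    ("abort attempted", r"Aborting command"),
    ("abort failed, bus reset", r"abort \(pid \d+\) timed out - resetting"),
    ("bus reset failed, trying harder", r"timed out - trying harder"),
    ("error handler gave up", r"unrecoverable SCSI bus or device hang"),
]


def summarize(log):
    """Return (failing command info or None, list of stages reached)."""
    failing = None
    m = TIMEOUT_RE.search(log)
    if m:
        pid, host, channel, target, lun, command = m.groups()
        failing = {
            "pid": int(pid),
            "host": int(host),
            "channel": int(channel),
            "id": int(target),
            "lun": int(lun),
            "command": command,  # the CDB bytes that follow are dropped
        }
    # re.search scans the whole text, so it also works on logs whose
    # line breaks were lost.
    reached = [name for name, pattern in STAGES if re.search(pattern, log)]
    return failing, reached
```

Run against the log above, it reports id 3 on scsi0 with a `Test Unit Ready` command and all five stages reached, which makes it easy to compare several logs side by side, for instance to see whether the same target ID fails each time.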
Re: Strange SCSI errors.
Daniel van Eeden wrote:
> So we're sure it's not the hardware. Now something strange: almost the same
> thing happened some days ago on my workstation. I had two SCSI disks attached
> to it. Then something strange happened (the disk seemed to be very busy). I
> couldn't get any information from it. After a reboot I found out that the
> partition table was damaged. My workstation is an x86-based system with an
> Adaptec 2940 SCSI controller. The failing disk was a 20G Seagate with SCSI ID
> 12 (the other disk has SCSI ID 10). I haven't checked the drive yet, but I
> suspect it is OK (smartctl also told me so). Could it be the same problem? Are
> all disks attached to the same SCSI channel? Which kernels are those systems
> using? Which filesystems? (I'm using 2.4.19 with ext3.)

Hm. I was using 2.4.19-sun4u-smp with ext3 too when the partition table got messed up. I was writing a partition table to another disk when the SCSI subsystem stopped responding. My drive became unbootable, but I booted from CD, mounted it and took a look at the tables. The partition table spanned from block 0 to the last block, with just one partition. Mounting the drive, however, revealed that it was a 50MB partition (my /boot).

On my second attempt, my root is ext2 for now, while I try to mirror it over to an ext3 filesystem. That one also hung; I'll check whether it is "Live target 0 not responding" this time too, or whether it has moved to the ext3 disks. And yes, all disks are attached to the same chain; my disk was a 4.2GB Sun Seagate drive. The symptoms I've encountered so far have been that the system stops responding and the load keeps slowly increasing.

> I've only got an SS4-110. Can I simulate the problem with that one?

Checking my "Black Bible" here, you can have two drives in it. That should be enough to do some testing. I'm not sure which SCSI chipset that one uses, though.

> There are more things in common between the U2 and the E3000... they're both
> 64-bit, aren't they? Maybe the kernel, libc or other software with similar
> versions?

Very true. The E3000 I have has UltraSPARC I and the U2 at work has UltraSPARC II, both fully 64-bit. But since a similar thing happened to your x86 machine, it doesn't seem likely that it has anything to do with 32/64-bit.

> Daniel van Eeden

//Andreas
Re: Strange SCSI errors.
So we're sure it's not the hardware. Now something strange: almost the same thing happened some days ago on my workstation. I had two SCSI disks attached to it. Then something strange happened (the disk seemed to be very busy). I couldn't get any information from it. After a reboot I found out that the partition table was damaged. My workstation is an x86-based system with an Adaptec 2940 SCSI controller. The failing disk was a 20G Seagate with SCSI ID 12 (the other disk has SCSI ID 10). I haven't checked the drive yet, but I suspect it is OK (smartctl also told me so). Could it be the same problem? Are all disks attached to the same SCSI channel? Which kernels are those systems using? Which filesystems? (I'm using 2.4.19 with ext3.)

I've only got an SS4-110. Can I simulate the problem with that one?

There are more things in common between the U2 and the E3000... they're both 64-bit, aren't they? Maybe the kernel, libc or other software with similar versions?

Daniel van Eeden <[EMAIL PROTECTED]>

Andreas Loong wrote:
> [note, this is for the ultra2, which displays the same problem]
> [...]
> Hope this clears up any misunderstandings.
>
> Wbr
> Andreas Loong

-- 
Daniel van Eeden <[EMAIL PROTECTED]>
icq: 36952189
aim: Compukid128
jabber: [EMAIL PROTECTED]
msn: [EMAIL PROTECTED]
phone: +31 343 522622
http://compukid.no-ip.org/about_me.html
Re: Strange SCSI errors.
[note, this is for the ultra2, which displays the same problem]

> try these:
> badblocks (read the manpage)

Read the manpage, tried the program... it didn't find anything. This doesn't seem to be the problem, from what I've experienced.

> cat /proc/scsi//0

Sparc ESP Host Adapter:
        PROM node               f006347c
        PROM name               SUNW,fas
        ESP Model               Happy Meal FAS
        DMA Revision            Rev HME/FAS

> cat /proc/scsi/scsi

Nothing unusual here.

> scsi-config (X frontend for scsiinfo)

Well, it finds the disks etc., and I can't find any strange values.

> and if everything fails this could be a (dirty) solution:
> scsiadd -r
> scsiadd -a

This is not really what I want to do. I'll try to explain the problem better:

Sometimes, after the system has been up and running for a while with a couple of disks attached, I get "Live target 0 not responding" plastered over the console. The disks become totally non-responsive and the LED is lit constantly. Nothing gets written to the logs at all.

This happened with one disk, and it managed to corrupt the partition table on that disk. I thought the disk was faulty, so I reinstalled on another disk, since I didn't want to run into this kind of problem again. Now I have a different disk with woody installed on it. I got the latest SMP kernel from the stable tree and started to build a mirror across two different disks. Then I got hit with the same error message again. A bit odd; I had to reboot. I got the mirrors up and running, and today I was just about to copy the contents of the root filesystem over to the mirror, so that I could quietly work on the files that needed updating to reflect the changes. While copying, it hung again.

I do not think this is a hardware issue, as it always hits target 0, no matter what drive is there. The feeling I get after encountering this problem on two different machines is that it is either kernel-related or Debian-related.
The Ultra2 and the Enterprise 3000 have a few things in common, although one is high-end and the other is rather low-end:

1) Both are SBus-based.
2) Same SCSI chip? I'll check this.
3) Anything else?

Hope this clears up any misunderstandings.

Wbr
Andreas Loong
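Andreas's argument that it "always hits target 0, no matter what drive is there" is a correlation test: if the failures follow the SCSI ID rather than the physical drive, the drive itself is an unlikely culprit. A minimal sketch of that reasoning, assuming you record one (drive, target ID) pair per hang; the drive labels are hypothetical.

```python
def failures_follow(incidents):
    """Classify hangs recorded as (drive_label, target_id) pairs.

    "target": one SCSI ID failed, with more than one physical drive there.
    "drive":  one physical drive failed, at more than one SCSI ID.
    Anything else (including a single incident) is "inconclusive".
    """
    drives = {drive for drive, _ in incidents}
    targets = {target for _, target in incidents}
    if len(targets) == 1 and len(drives) > 1:
        return "target"
    if len(drives) == 1 and len(targets) > 1:
        return "drive"
    return "inconclusive"


# Hypothetical record: two different drives, both hanging at target 0.
history = [("first disk", 0), ("second disk", 0)]
print(failures_follow(history))  # target
```

A "target" result points away from the individual drives and toward something they share, such as the host adapter, its driver, or the bus cabling and termination. A single incident is always inconclusive, which is why it took two different drives to make the pattern visible.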
Re: Strange SCSI errors.
try these:
badblocks (read the manpage)
cat /proc/scsi//0
cat /proc/scsi/scsi
scsi-config (X frontend for scsiinfo)

and if everything fails this could be a (dirty) solution:
scsiadd -r
scsiadd -a

Andreas Loong wrote:
> Hi there, I have debian woody 3.0r1 stable installed on a Enterprise 3000.
> [...]
> Any suggestions on what to check? CC me as I'm not on the list.
>
> Wbr
> Andreas Loong
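The `cat /proc/scsi/scsi` suggestion is easier to act on when the output is turned into a list of host/channel/ID/LUN entries, for example to confirm which device really sits at target 0. A minimal Python sketch, assuming the 2.4-era layout shown in the comment; the vendor and model strings in that comment are made up.

```python
import re

# Assumed layout of /proc/scsi/scsi on 2.4-era kernels (sample is made up):
#   Host: scsi0 Channel: 00 Id: 00 Lun: 00
#     Vendor: SEAGATE  Model: EXAMPLE-DISK     Rev: 0001
#     Type:   Direct-Access                    ANSI SCSI revision: 02
HOST_RE = re.compile(r"Host: scsi(\d+) Channel: (\d+) Id: (\d+) Lun: (\d+)")
VENDOR_RE = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(\S*)")


def parse_proc_scsi(text):
    """Return one dict per attached device, in the order listed."""
    devices = []
    for line in text.splitlines():
        host = HOST_RE.search(line)
        if host:
            h, channel, target, lun = (int(g) for g in host.groups())
            devices.append({"host": h, "channel": channel,
                            "id": target, "lun": lun})
            continue
        vendor = VENDOR_RE.search(line)
        if vendor and devices:
            # Vendor/Model/Rev belong to the most recent Host: line.
            v, model, rev = vendor.groups()
            devices[-1].update(vendor=v, model=model, rev=rev)
    return devices
```

On a live system you would pass it `open("/proc/scsi/scsi").read()`; lines it does not recognise (such as `Type:`) are skipped.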
Strange SCSI errors.
Hi there,

I have Debian woody 3.0r1 stable installed on an Enterprise 3000 (1280 MB RAM, 6x 170 MHz UltraSPARC I). The whole machine behaves extremely well when there's only one disk in the system. However, with more than one disk, the least bit of disk activity makes disk 0 stop responding. Looking at the disk, the LED is constantly on, and the machine has stopped responding. Trying to do anything that isn't currently in RAM results in a hang. Removing disk 0 while the machine is running and inserting it again achieves nothing.

The kernel is taken directly from apt (kernel-image-2.4.19-sun4u-smp), so this strikes me as odd. This is not the first machine where I've seen this behaviour; it also happens on the Ultra2.

Any suggestions on what to check? CC me as I'm not on the list.

Wbr
Andreas Loong
-- 
Andreas Loong           Phone: +46 31 750 20 66
Dimension AB            Fax:
Kruthusg 17             http://www.dimension.se
S-405 23 Goteborg
Re: SCSI errors: esp0: SCSI bus reset interrupt
Hello,

> I've got an SS5, and it's been running Debian 2.1 (2.0.38) for
> some time happily. This weekend I upgraded to 2.2
> frozen (2.2.13). Since upgrading, I've gotten a
> constant stream of errors:

I get the same errors with kernels newer than 2.2.9. With 2.2.9 the machine runs correctly for months, but not with 2.2.10 and up.
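The reply narrows the regression to the step from 2.2.9 to 2.2.10. Anyone who only knows one good and one bad release can narrow it the same way by bisecting over released kernels, building and booting the midpoint each time. A minimal sketch, assuming plain numeric dotted version strings (no -pre or -ac suffixes), a good release that is older than the bad one, and a hypothetical list of releases that can all be built for the machine.

```python
def version_key(version):
    """Turn "2.2.10" into (2, 2, 10) so releases sort numerically."""
    return tuple(int(part) for part in version.split("."))


def next_to_test(releases, good, bad):
    """Return the release halfway between a known-good and a known-bad one.

    Assumes `good` is older than `bad`. Returns None once the two are
    adjacent, i.e. the regression is narrowed to a single release step.
    """
    ordered = sorted(releases, key=version_key)
    lo, hi = ordered.index(good), ordered.index(bad)
    if hi - lo <= 1:
        return None
    return ordered[(lo + hi) // 2]
```

With 2.2.9 good and 2.2.13 bad, the first release to try is 2.2.11; sorting with `version_key` matters because a plain string sort would put "2.2.10" before "2.2.9".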
SCSI errors: esp0: SCSI bus reset interrupt
Sparc experts:

I've got an SS5, and it had been running Debian 2.1 (kernel 2.0.38) happily for some time. This weekend I upgraded to 2.2 frozen (kernel 2.2.13). Since upgrading, I've been getting a constant stream of errors:

Feb 1 11:54:17 elvis kernel: esp0: Resetting scsi bus
Feb 1 11:54:17 elvis kernel: esp0: SCSI bus reset interrupt
Feb 1 11:54:17 elvis kernel: esp0: SCSI bus reset interrupt
Feb 1 11:55:02 elvis kernel: esp0: Resetting scsi bus
Feb 1 11:55:02 elvis kernel: esp0: SCSI bus reset interrupt
Feb 1 11:55:02 elvis kernel: esp0: SCSI bus reset interrupt

They appear whenever there is some disk activity. They mostly seem related to the external disk, but it is hard to tell. Thinking the disk was failing, I swapped the external disk for another one, and swapped the cable as well; no change. These errors appear to have started with the upgrade to kernel 2.2.13. I also have an SS2 with a Cycle 5 upgrade that I moved to 2.2 at the same time, and it shows none of these messages.

Any suggestions?

--Erik
---
"Any chance collision, and I light up in the dark."
Erik Blaufuss [EMAIL PROTECTED]
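To see whether the reset bursts line up with particular disk activity (a cron job, a backup, the external disk being accessed), it helps to extract the times of the `Resetting scsi bus` lines and the gaps between them. A minimal sketch, assuming the classic syslog prefix shown in the log above; syslog omits the year, so the code supplies a placeholder one.

```python
from datetime import datetime

# Syslog lines carry no year; this placeholder is an assumption (a leap
# year, so a "Feb 29" stamp would still parse).
PLACEHOLDER_YEAR = 2000


def reset_times(lines, year=PLACEHOLDER_YEAR):
    """Return the timestamps of 'Resetting scsi bus' lines, in log order."""
    times = []
    for line in lines:
        if "Resetting scsi bus" not in line:
            continue
        # Classic syslog prefix: month, day, time, e.g. "Feb 1 11:54:17".
        month, day, clock = line.split()[:3]
        stamp = f"{year} {month} {day} {clock}"
        times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_seconds(times):
    """Seconds between consecutive resets."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

For the six log lines quoted above it finds two resets, 45 seconds apart; over a longer log, a regular gap would hint at a periodic job rather than random bus noise.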