Re: SCSI ERRORS

2003-06-05 Thread Emanuel Schmid
> I've seen that error on my SS10 with two disks as well. Do you
> have an external device (e.g. CD-ROM, Streamer) connected ?
> That error seemed to happen less frequently after I disconnected my
> external CD-ROM drive.

This is the point where SCSI gets a bit esoteric: cables that are slightly
too long, a terminator that is not quite right, or some dirty plugs, and you
have exactly this kind of problem. Try reordering the devices, or use another
cable and/or terminator, etc. That may help.
Once when I had such a problem, the cause was the temperature of the devices
and/or cables. Turning on some fans resolved it.

Greetings from
emi



Re: SCSI ERRORS

2003-06-03 Thread Stefan Naewe
On Monday, 2 June 2003 23:27, Sébastien Canchon wrote:
> Hello,
>
> I have an SS20 running Debian GNU/Linux 3.0rc1 with two SCSI disks. Everything
> worked fine, but 10 minutes ago, when I rebooted the machine, I got:
>
> scsi : aborting command due to timeout : pid 11, scsi0, channel 0, id 3,
> lun 0 Test Unit Ready 00 00 00 00 00 esp0: Aborting command
> esp0: dumping state
> esp0: dma -- cond_reg addr
> esp0: SW [sreg<00> sstep<00> ireg<20>]
> esp0: HW reread [sreg<12> sstep ireg<00>]
> esp0: current command [tgt<03> lun<00> pphase cphase]
> esp0: disconnected
> SCSI host 0 abort (pid 11) timed out - resetting
> SCSI bus is being reset for host 0 channel 0.
> esp0: Resetting scsi bus
> esp0: Gross error sreg=40
> esp0: SCSI bus reset interrupt
> SCSI host 0 channel 0 reset (pid 11) timed out - trying harder
> SCSI bus is being reset for host 0 channel 0.
> esp0: Resetting scsi bus
> esp0: SCSI bus reset interrupt
> esp0: SCSI bus reset interrupt
> SCSI host 0 reset (pid 11) timed out again -
> probably an unrecoverable SCSI bus or device hang.

I've seen that error on my SS10 with two disks as well. Do you
have an external device (e.g. CD-ROM, Streamer) connected ?
That error seemed to happen less frequently after I disconnected my external
CD-ROM drive.


Stefan
-- 
Stefan Naewe ([EMAIL PROTECTED]) GNU/Linux User #165035
PGP-Key: FF26 564E FB8D 70E8 A607  9C97 1950 C3AE CFBD 78B0

It's most certainly GNU/Linux, not Linux. Read more at
http://www.gnu.org/gnu/why-gnu-linux.html.



SCSI ERRORS

2003-06-02 Thread Sébastien Canchon



Hello,
 
I have an SS20 running Debian GNU/Linux 3.0rc1 with two SCSI disks.
Everything worked fine, but 10 minutes ago, when I rebooted the machine,
I got:
 
scsi : aborting command due to timeout : pid 11, scsi0, channel 0, id 3, lun 0 Test Unit Ready 00 00 00 00 00
esp0: Aborting command
esp0: dumping state
esp0: dma -- cond_reg addr
esp0: SW [sreg<00> sstep<00> ireg<20>]
esp0: HW reread [sreg<12> sstep ireg<00>]
esp0: current command [tgt<03> lun<00> pphase cphase]
esp0: disconnected
SCSI host 0 abort (pid 11) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
esp0: Resetting scsi bus
esp0: Gross error sreg=40
esp0: SCSI bus reset interrupt
SCSI host 0 channel 0 reset (pid 11) timed out - trying harder
SCSI bus is being reset for host 0 channel 0.
esp0: Resetting scsi bus
esp0: SCSI bus reset interrupt
esp0: SCSI bus reset interrupt
SCSI host 0 reset (pid 11) timed out again -
probably an unrecoverable SCSI bus or device hang.
 
Need help !!!
 
S.C
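
The first line of the log above names the device that stopped answering
(host 0, channel 0, id 3, lun 0). As a rough aid for anyone collecting such
logs, here is a minimal Python sketch that pulls those numbers out of the
"aborting command due to timeout" line. The function name and the regular
expression are my own illustration, written only against the wording quoted
in this message; other kernel versions may word the line differently.

```python
import re

# Pattern written against the timeout line quoted in this thread only;
# the exact wording on other kernels is an assumption.
TIMEOUT_RE = re.compile(
    r"aborting command due to timeout\s*:\s*pid (?P<pid>\d+),\s*"
    r"scsi(?P<host>\d+),\s*channel (?P<channel>\d+),\s*"
    r"id (?P<id>\d+),\s*lun (?P<lun>\d+)"
)


def find_timed_out_targets(log_text):
    """Return one (host, channel, id, lun) tuple per timeout line found."""
    # Collapse whitespace so a record wrapped over two lines still matches.
    flat = " ".join(log_text.split())
    return [
        (int(m["host"]), int(m["channel"]), int(m["id"]), int(m["lun"]))
        for m in TIMEOUT_RE.finditer(flat)
    ]


# The start of the log from this message, wrapped as it was in the mail.
SAMPLE = """scsi : aborting command due to timeout : pid 11, scsi0, channel 0, id 3,
lun 0 Test Unit Ready 00 00 00 00 00
esp0: Aborting command"""

if __name__ == "__main__":
    for host, channel, target, lun in find_timed_out_targets(SAMPLE):
        print(f"host {host} channel {channel} id {target} lun {lun} timed out")
```

If the same id shows up every time, that device (or its position on the
chain) is the first thing to swap or re-cable.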


Re: Strange SCSI errors.

2003-02-24 Thread Andreas Loong

Daniel van Eeden wrote:

> So we're sure it's not the hardware.
> Now something strange: almost the same thing happened some days ago on my
> workstation.
> I had two SCSI disks attached to it. Then something strange happened
> (the disk seemed to be very busy). I couldn't get any information from it.
> After a reboot I found out that the partition table was damaged.
> My workstation is an x86-based system with an Adaptec 2940 SCSI
> controller. The failing disk was a 20G Seagate with SCSI ID 12 (the
> other disk has SCSI ID 10).
> I haven't checked the drive yet, but I suspect it to be OK (smartctl also
> told me so).
>
> Could it be the same problem?
> Are all disks attached to the same SCSI channel?
> Which kernels are those systems using? Which filesystems?
> (I'm using 2.4.19 with ext3)


Hm.. I was using 2.4.19-sun4u-smp with ext3 too when the partition table 
got messed up. I was writing a partition table to another disk, when the 
scsi subsystem stopped responding. My drive became unbootable, but by 
booting from CD I mounted it and took a look at the tables.
The partition table spanned from block 0 to the last block, with just 
one partition. Mounting the drive, however, revealed that it was a 50MB 
partition (my /boot).


But on my second attempt my root is ext2 for now, while I try to
mirror it over to an ext3 filesystem. That one also hung. I'll check whether
it is the "live target 0 not responding" this time too, or whether it has
moved to the ext3 disks.


And yes, all disks are attached to the same chain. My disk was a 4.2GB
Sun Seagate drive. The symptoms I've encountered so far have been that
the system stops responding, and the load slowly increases all the time.



> I've only got an SS4-110. Can I simulate the problem with that one?


Checking my "Black Bible" here, you can have two drives in it. That 
should be enough to do some testing. Not sure which scsi chipset that 
one uses, though.



> There are more things in common between the U2 and the E3000... they're
> both 64-bit, aren't they?
>
> Maybe the kernel, libc or other software with similar versions?


Very true. The E3000 I have has got UltraSPARC I and the U2 at work has
UltraSPARC II, both fully 64-bit. But since a similar thing happened to
your x86 machine, it doesn't seem likely that it has anything to do with
32/64-bit.



> Daniel van Eeden


//Andreas



Re: Strange SCSI errors.

2003-02-24 Thread Daniel van Eeden

So we're sure it's not the hardware.
Now something strange: almost the same thing happened some days ago on my
workstation.
I had two SCSI disks attached to it. Then something strange happened
(the disk seemed to be very busy). I couldn't get any information from it.
After a reboot I found out that the partition table was damaged.
My workstation is an x86-based system with an Adaptec 2940 SCSI
controller. The failing disk was a 20G Seagate with SCSI ID 12 (the
other disk has SCSI ID 10).
I haven't checked the drive yet, but I suspect it to be OK (smartctl also
told me so).

Could it be the same problem?
Are all disks attached to the same SCSI channel?
Which kernels are those systems using? Which filesystems?
(I'm using 2.4.19 with ext3)
I've only got an SS4-110. Can I simulate the problem with that one?

There are more things in common between the U2 and the E3000... they're
both 64-bit, aren't they?

Maybe the kernel, libc or other software with similar versions?

Daniel van Eeden <[EMAIL PROTECTED]>


--
+-----------------------------------------+
| Daniel van Eeden <[EMAIL PROTECTED]>    |
| icq: 36952189                           |
| aim: Compukid128                        |
| jabber: [EMAIL PROTECTED]               |
| msn: [EMAIL PROTECTED]                  |
| phone: +31 343 522622                   |
| http://compukid.no-ip.org/about_me.html |
+-----------------------------------------+



Re: Strange SCSI errors.

2003-02-24 Thread Andreas Loong

[note, this is for the ultra2, which displays the same problem]


> try these:
> badblocks (read the manpage)

I read the manpage and tried the program; it didn't find anything.
This doesn't seem to be the problem, from what I've experienced.

> cat /proc/scsi//0

Sparc ESP Host Adapter:
        PROM node       f006347c
        PROM name       SUNW,fas
        ESP Model       Happy Meal FAS
        DMA Revision    Rev HME/FAS

> cat /proc/scsi/scsi

Nothing unusual here.

> scsi-config  (X frontend for scsiinfo)

Well, it finds the disks etc., and I can't find any strange values.

> and if everything fails this could be a (dirty) solution:
> scsiadd -r 
> scsiadd -a 

This is not really what I want to do. I'll try to explain the problem
better:


Sometimes, after the system has been up and running for a while with a
couple of disks attached, I get "Live target 0 not responding" plastered
over the console. The disks become totally unresponsive and the LED
is lit constantly. Nothing gets written to the logs at all. This
happened with one disk, and it managed to corrupt my partition table on
that disk. I thought the disk was faulty, so I reinstalled on another
disk, hoping not to encounter this kind of problem again. Now, with a
different disk and woody installed on it, I got the latest SMP kernel
from the stable tree and started to construct a mirror of two different
disks. Then I got hit with the same error message again. A bit odd; I
had to reboot. I got the mirrors up and running, and today I was just
about to copy the contents of the root over to the mirror so that I
could quietly sit and work on the files that needed some work in order
to reflect the changes. While copying, it hung again.


I do not think this is a hardware issue, as it always messes with target
0, no matter what drive is there. The feeling I get after encountering
this problem on two different machines is that it is either kernel-based
or Debian-based. The Ultra2 and the Enterprise 3000 have a few things in
common, although one is high-end and the other is rather low-end.

1) Both are SBUS based.
2) Same SCSI chip? I'll check this.
3) Anything else?

Hope this clears up any misunderstandings.

Wbr
Andreas Loong



Re: Strange SCSI errors.

2003-02-24 Thread Daniel van Eeden

try these:
badblocks (read the manpage)
cat /proc/scsi//0
cat /proc/scsi/scsi
scsi-config  (X frontend for scsiinfo)

and if everything fails this could be a (dirty) solution:
scsiadd -r 
scsiadd -a 
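
To compare the output of "cat /proc/scsi/scsi" before and after a hang, it
can help to turn it into a list of devices. The following is a minimal
Python sketch, assuming the usual 2.4-era layout (a "Host: ... Id: ..." line
followed by a "Vendor: ... Model: ... Rev: ..." line per device). The
function name and the sample text, including vendor and model strings, are
invented for illustration and are not taken from any machine in this thread.

```python
import re

HOST_RE = re.compile(
    r"Host:\s*scsi(\d+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)"
)
VENDOR_RE = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(\S*)")


def parse_proc_scsi(text):
    """Return one dict per attached device listed in /proc/scsi/scsi text."""
    devices = []
    for line in text.splitlines():
        host = HOST_RE.search(line)
        if host:
            h, c, i, l = (int(x) for x in host.groups())
            devices.append({"host": h, "channel": c, "id": i, "lun": l})
            continue
        vendor = VENDOR_RE.search(line)
        if vendor and devices:
            # The vendor line belongs to the most recent "Host:" line.
            dev = devices[-1]
            dev["vendor"], dev["model"], dev["rev"] = vendor.groups()
    return devices


# Invented sample in the assumed layout; not real output from this thread.
SAMPLE = """Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: EXAMPLE  Model: DISK-A           Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: EXAMPLE  Model: DISK-B           Rev: 0002
  Type:   Direct-Access                    ANSI SCSI revision: 02
"""

if __name__ == "__main__":
    for dev in parse_proc_scsi(SAMPLE):
        print(dev["id"], dev.get("vendor"), dev.get("model"))
```

On a real system the text would come from open("/proc/scsi/scsi").read();
a target that is present in one snapshot and missing in the next is worth
a closer look.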





Strange SCSI errors.

2003-02-24 Thread Andreas Loong

Hi there,

I have Debian woody 3.0r1 stable installed on an Enterprise 3000
(1280 MB RAM, 6x170 MHz UltraSPARC I).
The whole machine behaves extremely nicely when there's one disk in the
system. However, with more than one disk, at the least bit of disk
activity disk 0 stops responding. Looking at the disk, the diode
is constantly on and the machine has stopped responding. Trying to do
anything that isn't currently in RAM results in a hang.


Removing disk 0 in mid-flight and inserting it again yields nothing.
The kernel is taken directly from apt (kernel-image-2.4.19-sun4u-smp),
so this does strike me as odd. This is not the first machine on which I've
seen this behaviour; it also happens on the Ultra2.


Any suggestions on what to check?
CC me as I'm not on the list.

Wbr
Andreas Loong
--

___
Andreas Loong   Phone: +46 31 750 20 66
Dimension AB    Fax:
Kruthusg 17     http://www.dimension.se
S-405 23 Goteborg
---



Re: SCSI errors: esp0: SCSI bus reset interrupt

2000-02-01 Thread Attila Nagy
Hello,

> I've got an SS5, and it's been running Debian 2.1 (2.0.38) happily
> for some time.  This weekend I upgraded to 2.2
> frozen (2.2.13).  Since upgrading, I've gotten a
> constant stream of errors:
I have the same errors with kernels newer than 2.2.9.
With 2.2.9 the machine runs correctly for months, but not with
2.2.10 and up...
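
When sorting machines into "2.2.9 or older" and "newer", note that kernel
versions have to be compared number by number: as plain strings, "2.2.10"
sorts before "2.2.9". A minimal Python sketch of such a comparison follows.
The function names are hypothetical, and the 2.2.9 default is only the
"last good" version from my report above, not a confirmed boundary.

```python
def kernel_version_key(version):
    """Turn '2.2.13' or '2.4.19-sun4u-smp' into a tuple like (2, 2, 13)."""
    # Drop any flavour suffix after the first '-', then compare numerically.
    numeric = version.split("-")[0]
    return tuple(int(part) for part in numeric.split("."))


def is_after(version, last_good="2.2.9"):
    """True if `version` is newer than `last_good` (assumed default 2.2.9)."""
    return kernel_version_key(version) > kernel_version_key(last_good)


if __name__ == "__main__":
    for v in ("2.0.38", "2.2.9", "2.2.10", "2.2.13"):
        print(v, "newer than 2.2.9:", is_after(v))
```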



SCSI errors: esp0: SCSI bus reset interrupt

2000-02-01 Thread Erik Blaufuss
Sparc experts:

I've got an SS5, and it's been running Debian 2.1 (2.0.38) happily
for some time.  This weekend I upgraded to 2.2
frozen (2.2.13).  Since upgrading, I've gotten a
constant stream of errors:

Feb  1 11:54:17 elvis kernel: esp0: Resetting scsi bus
Feb  1 11:54:17 elvis kernel: esp0: SCSI bus reset interrupt
Feb  1 11:54:17 elvis kernel: esp0: SCSI bus reset interrupt
Feb  1 11:55:02 elvis kernel: esp0: Resetting scsi bus
Feb  1 11:55:02 elvis kernel: esp0: SCSI bus reset interrupt
Feb  1 11:55:02 elvis kernel: esp0: SCSI bus reset interrupt

whenever there is some disk activity. It mostly seems
related to the external disk, but it is hard to tell.
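
To see whether the resets really track disk activity, one option is to
count them per minute and compare that against what the machine was doing.
Here is a minimal Python sketch, assuming the classic syslog prefix seen
above (month, day, HH:MM:SS, host). The function name is hypothetical, and
the sample lines are the ones quoted in this message.

```python
from collections import Counter


def resets_per_minute(lines, marker="Resetting scsi bus"):
    """Count lines containing `marker`, keyed by (month, day, 'HH:MM')."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a syslog-style line, skip it
        month, day, clock = fields[0], fields[1], fields[2]
        counts[(month, day, clock[:5])] += 1
    return counts


SAMPLE = """Feb  1 11:54:17 elvis kernel: esp0: Resetting scsi bus
Feb  1 11:54:17 elvis kernel: esp0: SCSI bus reset interrupt
Feb  1 11:54:17 elvis kernel: esp0: SCSI bus reset interrupt
Feb  1 11:55:02 elvis kernel: esp0: Resetting scsi bus
Feb  1 11:55:02 elvis kernel: esp0: SCSI bus reset interrupt
Feb  1 11:55:02 elvis kernel: esp0: SCSI bus reset interrupt
""".splitlines()

if __name__ == "__main__":
    for (month, day, minute), n in sorted(resets_per_minute(SAMPLE).items()):
        print(month, day, minute, n)
```

Only the "Resetting scsi bus" lines are counted, so each reset is counted
once even though it is followed by two interrupt messages.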

Thinking the disk was failing, I tried swapping the external disk for
another, and the cable as well: no change.

These errors appear to have started with the upgrade
to kernel 2.2.13

I also have another machine, an SS2 with a Cycle 5 upgrade, that
I moved to 2.2 at the same time, without any of these messages.

Any suggestions?
--Erik

---
"Any chance collision, and I light up in the dark."
 Erik Blaufuss   [EMAIL PROTECTED]