Re: SATA timeouts on two disks

2008-01-24 Thread Jim MacBaine
Hi,

On Jan 21, 2008 8:47 AM, Tejun Heo [EMAIL PROTECTED] wrote:

 If you still have the old PSU lying around, please try to power one of
 the failing drives with the old PSU.  Just leave everything else as-is,
 power up the old PSU by itself as described in the following web page and
 connect only one of the failing drives to the old PSU.

   http://modtown.co.uk/mt/article2.php?id=psumod

 And see whether the problem continues and if so on which drives.
 Connecting SATA drives to separate power is completely safe even if they
 don't have a common ground because SATA connections never directly connect
 to each other.

Yes, I still have the old PSU lying around.

A co-worker, to whom I explained my problem, asked me whether I had
properly grounded my drives. In fact I had not: The drives resided in
a vibration-absorbing frame through which their exterior had no
electrical contact with the grounded case. Since I grounded the drives
two days ago, I got no new errors.  So maybe my problem is solved.

If not, I will happily try out your suggestion. Would you be so kind
as to explain in a few words what connecting one drive to a second
(supposedly good) PSU will show?

(Is this still on-topic on this list?)

Thanks a lot,
Jim
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SATA timeouts on two disks

2008-01-24 Thread rgheck

Tejun Heo wrote:

Hello,

Jim MacBaine wrote:
  

A co-worker, to whom I explained my problem, asked me whether I had
properly grounded my drives. In fact I had not: The drives resided in
a vibration-absorbing frame through which their exterior had no
electrical contact with the grounded case. Since I grounded the drives
two days ago, I got no new errors.  So maybe my problem is solved.



Hmmm... Grounding. Interesting.

  
Can you say more about this, Jim? This may also be my problem, or
part of it, as my drives too are mounted in such a way as not to be in
physical contact with the case. How did you go about grounding them? I
suppose one test would be just to remove the washers.


That said, in my case, 2.6.24 seems to make a big difference, too. I 
accidentally booted into 2.6.23 today and, boom.


Richard



Re: SATA timeouts on two disks

2008-01-19 Thread rgheck

Jim MacBaine wrote:

On Jan 13, 2008 1:07 PM, Mikael Pettersson [EMAIL PROTECTED] wrote:

The fact that the problems occur on different disks on
different controllers driven by different drivers indicates
that it's not a disk, controller, or driver problem.

I strongly suspect an underdimensioned or failing PSU.


Thanks a lot for your clues.

I bought a new PSU on Monday and didn't get any new disk failures for
days.  But last night the same time-outs occurred again on two disks.
I guess I will try to replace the motherboard including the two SATA
controllers next.
  
I don't know if your problems are similar to mine or not. But I have 
been having extensive problems for quite some time now. Do you get these 
timeouts when using optical drives? That's what seems to trigger it in 
my case: If I'm using the optical drives, I'll often see the errors with 
them first, and then the whole ATA subsystem seems to go down. Then I 
get journal commit errors, general read errors, etc, until the system 
basically locks up. Worst case, it all happens very suddenly, and 
there's not even anything in the logs. Just a couple messages to the 
terminal, usually a journal commit error.


In my case, the optical drives are a brand new Pioneer DVD-RW on SATA
and an old Plextor on PATA. I mostly see the errors with the latter but 
have also seen them with the former. I'd thought I'd fixed it by adding 
pnpacpi=off and pci=nomsi,nommconf to the kernel boot options, as well 
as libata noacpi=1 to modules.conf, but now I've just had the problem 
again. I'm now thinking I should try eliminating the Plextor drive. It 
may be that it's the PATA drive that is causing all the trouble. I'll 
report if so.
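
Before concluding that the boot options made no difference, it may be worth confirming that they actually reached the running kernel. A minimal sketch, assuming the options are passed as the exact tokens quoted above; the names WANTED, missing_boot_options and read_cmdline are made up for illustration, and the libata noacpi=1 module option is not checked here:

```python
# Sketch: report which of the boot options mentioned above are absent
# from a kernel command line. Matching is by exact token, so a variant
# spelling such as "pci=nommconf,nomsi" would be reported as missing.

WANTED = ["pnpacpi=off", "pci=nomsi,nommconf"]  # options quoted in this mail


def missing_boot_options(cmdline, wanted=WANTED):
    """Return the options from `wanted` that are not tokens of `cmdline`."""
    present = set(cmdline.split())
    return [opt for opt in wanted if opt not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()
```

On the affected machine, missing_boot_options(read_cmdline()) would list any option the boot loader did not pass through; an empty list means both were present.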


FYI, here are the relevant modules being loaded:
[EMAIL PROTECTED] rgheck]# lsmod | grep ata
pata_amd               20293  0
pata_pdc2027x          17477  0
sata_nv                25157  8
ata_generic            14405  0
libata                114673  4 pata_amd,pata_pdc2027x,sata_nv,ata_generic
scsi_mod              145657  5 sr_mod,sg,usb_storage,libata,sd_mod
The IDE interface is an nVidia MCP55, apparently, on an ASUS P5N32-E mb.

I doubt very much it's a PS issue in my case. There's not that much in 
the box.


Richard



Re: SATA timeouts on two disks

2008-01-13 Thread Mikael Pettersson
Jim MacBaine writes:
  Hi,
  
  Recently I'm experiencing strange sata errors on my desktop system.
  The system was recently equipped with three 250 GB SATA drives from

Clue #1: added drives

  three different manufacturers and I'm having an identical problem on
  two of them.  The drives are connected to two on-board controllers on
  an Asus A8V board, which were both running with Linux for more than
  two years with older SATA disks without problems. A hardware failure
  seems unlikely to me as the same error occurs on two brand new disks
  from two different manufacturers.  I'm running a vanilla 2.6.23.12
  kernel.
  
  Error on sdc happened about 10 times tonight, each time I could hear
  the disk spin down and up again, while the system was frozen for
  several seconds:
  
  ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x18 action 0x2 frozen
  ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
   res 40/00:00:00:00:40/00:00:00:00:00/00 Emask 0x4 (timeout)
  ata2: soft resetting port
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  ata2.00: configured for UDMA/133
  ata2: EH complete
  sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
  sd 1:0:0:0: [sdb] Write Protect is off
  sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
  sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
  support DPO or FUA
  
  In the log I also found several identical errors on one other drive:
  
  ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
  ata5.00: cmd 25/00:08:b7:f2:11/00:00:13:00:00/e0 tag 0 cdb 0x0 data 4096 in
   res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
  ata5: soft resetting port
  ata5.00: configured for UDMA/33
  ata5: EH complete
  sd 4:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
  sd 4:0:0:0: [sdc] Write Protect is off
  sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
  sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't
  support DPO or FUA

Clue #2: both ata2 and ata5 are having problems
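
To check this clue against a longer log, the timed-out commands can be counted per port. A minimal sketch, assuming libata messages shaped like the two excerpts above; tally_timeouts, SAMPLE and the two-entry opcode table are illustrative names, not part of any kernel tool. The opcode names (0xea FLUSH CACHE EXT, 0x25 READ DMA EXT) come from the ATA command set.

```python
import re
from collections import Counter

# "ataN.MM: cmd XX/..." carries the port and the ATA opcode of the failed command.
CMD_RE = re.compile(r"(ata\d+)(?:\.\d+)?: cmd ([0-9a-f]{2})/")
# The following "res ..." line carries the error mask; 0x4 is a timeout.
TIMEOUT_RE = re.compile(r"Emask 0x4 \(timeout\)")

# Only the two opcodes seen in this thread; anything else is shown as hex.
ATA_OPCODES = {0x25: "READ DMA EXT", 0xEA: "FLUSH CACHE EXT"}

SAMPLE = """\
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x18 action 0x2 frozen
ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
 res 40/00:00:00:00:40/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2: soft resetting port
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: cmd 25/00:08:b7:f2:11/00:00:13:00:00/e0 tag 0 cdb 0x0 data 4096 in
 res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5: soft resetting port
"""


def tally_timeouts(log_text):
    """Count timed-out commands, keyed by (port, command name)."""
    counts = Counter()
    pending = None  # (port, opcode) taken from the most recent "cmd" line
    for line in log_text.splitlines():
        match = CMD_RE.search(line)
        if match:
            pending = (match.group(1), int(match.group(2), 16))
            continue
        if pending and TIMEOUT_RE.search(line):
            port, opcode = pending
            counts[(port, ATA_OPCODES.get(opcode, hex(opcode)))] += 1
            pending = None
    return counts


for (port, command), count in sorted(tally_timeouts(SAMPLE).items()):
    print(port, command, count)
```

Fed the full log instead of SAMPLE, the same function would show whether the timeouts cluster on one port or keep appearing on both.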

  
  Can this be the result of a hardware failure?  I've seen several
  drives being added to an NCQ blacklist during the last weeks.  Is it
  possible that my drives need to be added here, too?  Or have I just
  two failing drives?
  
  Thanks a lot for any clues,
  Jim
  
  
  System boot log extract:
  
  sata_promise :00:08.0: version 2.10
  ACPI: PCI Interrupt :00:08.0[A] - GSI 18 (level, low) - IRQ 18
  scsi0 : sata_promise
  scsi1 : sata_promise
  scsi2 : sata_promise
  ata1: SATA max UDMA/133 cmd 0xf882e200 ctl 0xf882e238 bmdma 0x irq 18
  ata2: SATA max UDMA/133 cmd 0xf882e280 ctl 0xf882e2b8 bmdma 0x irq 18
  ata3: PATA max UDMA/133 cmd 0xf882e300 ctl 0xf882e338 bmdma 0x irq 18
  ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  ata1.00: ATA-8: SAMSUNG HD252KJ, CM100-12, max UDMA7
  ata1.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 0/32)
  ata1.00: configured for UDMA/133
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  ata2.00: ATA-7: WDC WD2500JS-55NCB1, 10.02E01, max UDMA/133
  ata2.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 0/32)
  ata2.00: configured for UDMA/133

Clue #3: ata2 is driven by sata_promise (lspci says it's a 20378, they're good)
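
The port-to-driver mapping behind this clue can be read off the boot log mechanically. A rough sketch, under the assumption that port ataN belongs to SCSI host N-1; that matches this log (sd 1:0:0:0 sits on ata2, sd 4:0:0:0 on ata5) but is assumed here rather than guaranteed. ports_by_driver and BOOT_LOG_SAMPLE are illustrative names, and the sample keeps only the relevant lines of the extract.

```python
import re

# "scsi0 : sata_promise" announces which driver owns a SCSI host.
HOST_RE = re.compile(r"scsi(\d+) : (\w+)")
# "ata1: SATA max UDMA/133 ..." announces a libata port.
PORT_RE = re.compile(r"\b(ata(\d+)): [SP]ATA max ")

BOOT_LOG_SAMPLE = """\
scsi0 : sata_promise
scsi1 : sata_promise
scsi2 : sata_promise
ata1: SATA max UDMA/133 cmd 0xf882e200 ctl 0xf882e238 bmdma 0x irq 18
ata2: SATA max UDMA/133 cmd 0xf882e280 ctl 0xf882e2b8 bmdma 0x irq 18
ata3: PATA max UDMA/133 cmd 0xf882e300 ctl 0xf882e338 bmdma 0x irq 18
scsi3 : sata_via
scsi4 : sata_via
ata4: SATA max UDMA/133 cmd 0x0001d000 ctl 0x0001c802 bmdma 0x0001b800 irq 17
ata5: SATA max UDMA/133 cmd 0x0001c400 ctl 0x0001c002 bmdma 0x0001b808 irq 17
"""


def ports_by_driver(boot_log):
    """Map each ataN port to the driver of SCSI host N-1 (an assumption)."""
    hosts = {int(num): driver for num, driver in HOST_RE.findall(boot_log)}
    mapping = {}
    for port, num in PORT_RE.findall(boot_log):
        driver = hosts.get(int(num) - 1)
        if driver is not None:
            mapping[port] = driver
    return mapping


mapping = ports_by_driver(BOOT_LOG_SAMPLE)
for port in ("ata2", "ata5"):  # the two ports that reported timeouts
    print(port, mapping[port])
```

For the two failing ports this yields two different drivers, which is the observation the diagnosis rests on.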

  scsi 0:0:0:0: Direct-Access ATA  SAMSUNG HD252KJ  CM10 PQ: 0 ANSI: 5
  sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
  sd 0:0:0:0: [sda] Write Protect is off
  sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
  sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
  support DPO or FUA
  sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
  sd 0:0:0:0: [sda] Write Protect is off
  sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
  sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
  support DPO or FUA
   sda: sda2 sda3
  sd 0:0:0:0: [sda] Attached SCSI disk
  scsi 1:0:0:0: Direct-Access ATA  WDC WD2500JS-55N 10.0 PQ: 0 ANSI: 5
  sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
  sd 1:0:0:0: [sdb] Write Protect is off
  sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
  sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
  support DPO or FUA
  sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors (250059 MB)
  sd 1:0:0:0: [sdb] Write Protect is off
  sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
  sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
  support DPO or FUA
   sdb: sdb2 sdb3
  sd 1:0:0:0: [sdb] Attached SCSI disk
  sata_via :00:0f.0: version 2.3
  ACPI: PCI Interrupt :00:0f.0[B] - GSI 20 (level, low) - IRQ 17
  sata_via :00:0f.0: routed to hard irq line 10
  scsi3 : sata_via
  scsi4 : sata_via
  ata4: SATA max UDMA/133 cmd 0x0001d000 ctl 0x0001c802 bmdma 0x0001b800 irq 17
  ata5: SATA max UDMA/133 cmd 0x0001c400 ctl 0x0001c002 bmdma 0x0001b808 irq 17
  ata4: SATA link down 1.5 Gbps (SStatus 0 SControl 300)
  ata5: SATA link up 1.5 Gbps