Re: AHCI Timeout errors on Intel Patsburg

2012-07-29 Thread Steven Hartland


----- Original Message -----
From: "Alexander Motin" 

> is  cs  ss 0001 rs 0001 tfd 40 serr 0088

This line (the ss and rs fields) tells me that the device hasn't confirmed
completion of one NCQ command. The bits set in the serr field mean "10b to 8b
Decode Error" and "Link Sequence Error". I would suggest that something is
wrong with the link quality. That may explain why reducing the speed helps.


Thanks Alexander, that's most helpful, will continue with testing to try and
narrow down the issue based on that info :)

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: AHCI Timeout errors on Intel Patsburg

2012-07-29 Thread Alexander Motin

Hi.

> is  cs  ss 0001 rs 0001 tfd 40 serr 0088

This line (the ss and rs fields) tells me that the device hasn't confirmed
completion of one NCQ command. The bits set in the serr field mean "10b to 8b
Decode Error" and "Link Sequence Error". I would suggest that something is
wrong with the link quality. That may explain why reducing the speed helps.
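
For anyone who wants to repeat that decoding, here is a minimal Python sketch. It assumes the serr value as archived ("0088") has lost its trailing zeros and that the full 32-bit register value was 0x00880000; that assumption matches the two bits named above but is not confirmed by the log as shown. The bit names follow the DIAG half of the PxSERR register in the AHCI specification. The names SERR_DIAG_BITS, decode_serr and ASSUMED_SERR are made up for this example.

```python
# DIAG bits (upper 16 bits) of the AHCI PxSERR register.
SERR_DIAG_BITS = {
    16: "PhyRdy Change",
    17: "Phy Internal Error",
    18: "Comm Wake",
    19: "10B to 8B Decode Error",
    20: "Disparity Error",
    21: "CRC Error",
    22: "Handshake Error",
    23: "Link Sequence Error",
    24: "Transport State Transition Error",
    25: "Unknown FIS Type",
    26: "Exchanged",
}


def decode_serr(value):
    """Return the names of the DIAG bits set in a 32-bit PxSERR value."""
    return [name for bit, name in sorted(SERR_DIAG_BITS.items())
            if value & (1 << bit)]


# ASSUMPTION: the archived "serr 0088" stands for 0x00880000.
ASSUMED_SERR = 0x00880000

print(decode_serr(ASSUMED_SERR))
```

Under that assumption the set bits are 19 and 23, both of which are link-level errors rather than command or protocol errors.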


--
Alexander Motin



AHCI Timeout errors on Intel Patsburg

2012-07-27 Thread Steven Hartland

We're seeing some strange timeout errors on some new Supermicro
X9DRT-HF motherboards we have here when combined with Kingston HyperX 3K SSDs.

It seems that when a disk is connected to the second channel, reads often
time out, stalling all I/O under 8.3-RELEASE-p3.

When this happens we see:-
Jul 27 14:35:59 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:35:59 lon059 kernel: ahcich1: is  cs  ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 14:37:41 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:37:41 lon059 kernel: ahcich1: is  cs  ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 14:38:35 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:38:35 lon059 kernel: ahcich1: is  cs  ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 14:39:05 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:39:05 lon059 kernel: ahcich1: is  cs  ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 14:39:39 lon059 kernel: ahcich1: Timeout on slot 0 port 0
Jul 27 14:39:39 lon059 kernel: ahcich1: is  cs  ss 0001 rs 0001 tfd 40 serr 0088 cmd 0004c017
Jul 27 13:58:06 lon059 kernel: ahcich1: Timeout on slot 14 port 0
Jul 27 13:58:06 lon059 kernel: ahcich1: is  cs  ss 4000 rs 4000 tfd 40 serr 0088 cmd 0004ce17
Jul 27 14:21:17 lon059 kernel: ahcich1: Timeout on slot 14 port 0
Jul 27 14:21:17 lon059 kernel: ahcich1: is  cs  ss 4000 rs 4000 tfd 40 serr 0088 cmd 0004ce17
Jul 27 14:29:16 lon059 kernel: ahcich1: Timeout on slot 7 port 0
Jul 27 14:29:16 lon059 kernel: ahcich1: is  cs  ss 0080 rs 0080 tfd 40 serr 0088 cmd 0004c717
Jul 27 14:31:43 lon059 kernel: ahcich1: Timeout on slot 12 port 0
Jul 27 14:31:43 lon059 kernel: ahcich1: is  cs  ss 1000 rs 1000 tfd 40 serr 0088 cmd 0004cc17
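
The slot number in each "Timeout on slot N" line can be cross-checked against the register dump that follows it. The Python sketch below is only an illustration and rests on two assumptions: that the ss and rs values are slot bitmaps (one bit per command slot) whose leading zeros were lost in the log as shown here, and that bits 12:8 of the cmd value hold the "current command slot" field, as the AHCI specification defines for PxCMD. The helper names slots_in_mask and current_command_slot are hypothetical.

```python
def slots_in_mask(mask):
    """Return the slot numbers whose bits are set in a 32-bit slot bitmap."""
    return [slot for slot in range(32) if mask & (1 << slot)]


def current_command_slot(cmd):
    """Extract the CCS field (bits 12:8) from a PxCMD register value."""
    return (cmd >> 8) & 0x1F


# (slot from the "Timeout on slot N" line, ss/rs value, cmd value),
# one entry per distinct pattern in the log above.
samples = [
    (0, 0x0001, 0x0004C017),
    (14, 0x4000, 0x0004CE17),
    (7, 0x0080, 0x0004C717),
    (12, 0x1000, 0x0004CC17),
]

for slot, mask, cmd in samples:
    # Each line prints the reported slot, the slots set in ss/rs,
    # and the slot number encoded in cmd.
    print(slot, slots_in_mask(mask), current_command_slot(cmd))
```

In every sample the single bit set in ss/rs and the slot encoded in cmd agree with the reported slot, so each timeout involves exactly one outstanding command.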

The disk on ahcich0 is identical but doesn't seem to exhibit the
same problem. I thought it might be a disk issue, even though the
disks are brand new, but 2 of the 3 machines tested have the same
problem.

In addition I've not managed to reproduce the issue if I force
sata to rev 2 with: hint.ahcich.1.sata_rev=2

Machine is running with the latest SSD and machine firmware / bios.

Could this be an AHCI bug?

dmesg and camcontrol output:-

ahci0:  port 0x9050-0x9057,0x9040-0x9043,0x9030-0x9037,0x9020-0x9023,0x9000-0x901f mem 0xdfa22000-0xdfa227ff irq 18 at device 31.2 on pci0

ahci0: [ITHREAD]
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0:  at channel 0 on ahci0
ahcich0: [ITHREAD]
ahcich1:  at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich2:  at channel 2 on ahci0
ahcich2: [ITHREAD]
ahcich3:  at channel 3 on ahci0
ahcich3: [ITHREAD]
ahcich4:  at channel 4 on ahci0
ahcich4: [ITHREAD]
ahcich5:  at channel 5 on ahci0
ahcich5: [ITHREAD]

camcontrol identify ada1
pass1:  ATA-8 SATA 3.x device
pass1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)

protocol              ATA/ATAPI-8 SATA 3.x
device model          KINGSTON SH103S3120G
firmware revision     501ABBF0
serial number         50026B7223027059
WWN                   50026b7223027059
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         234441648 sectors
LBA48 supported       234441648 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating

Feature                        Support  Enabled  Value      Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes               32 tags
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      yes      254/0xFE
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            yes      no
write-read-verify              yes      no       0/0x0
unload                         yes      yes
free-fall                      no       no
data set management (DSM/TRIM) yes
DSM - max 512byte blocks       yes               8
DSM - deterministic read       yes               any value

   Regards
   Steve 



