Re: [2.6.21.1] SATA freeze

2007-06-04 Thread Tomasz Chmielewski

Fred Moyer wrote:

Sounds like SMART is likely disabled on that drive. You can try doing 
"smartctl -s on /dev/sda" and see if that will turn it on.




Sorry - that last post of mine was brain dead.  Here's the one with 
(hopefully) useful data.


app2 ~ # smartctl  -d ata -a /dev/sda
smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen



(...)


   -- -- -- -- -- -- --
   84 51 00 b5 c9 73 e0  Error: ICRC, ABRT at LBA = 0x0073c9b5 = 7588277

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --    
   25 00 20 96 c9 73 e0 00  01:25:42.886  READ DMA EXT
   b0 d0 01 00 4f c2 00 02  01:25:42.868  SMART READ DATA
   35 00 08 ae b6 42 e0 00  01:25:42.456  WRITE DMA EXT
   b0 da 00 00 4f c2 00 00  01:25:42.430  SMART RETURN STATUS
   35 00 08 60 81 04 e0 00  01:25:42.376  WRITE DMA EXT
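
As a worked check of the error line in the log above, the LBA that smartctl prints can be rebuilt from the register bytes. This is only a sketch: the column order (ER ST SC SN CL CH DH) is assumed from smartctl's usual error-log header, which is elided in the quoted output, and the function names are made up for illustration.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine a 28-bit LBA from sector number, cylinder low/high and
    the low nibble of the device/head register."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def decode_error_bits(er):
    """Name the ATA error-register bits that are relevant to this log."""
    names = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
    return [name for bit, name in names.items() if er & bit]


# Register bytes copied from the error line in the quoted smartctl output.
row = "84 51 00 b5 c9 73 e0"
er, st, sc, sn, cl, ch, dh = (int(x, 16) for x in row.split())

lba = lba28_from_registers(sn, cl, ch, dh)
print(decode_error_bits(er), hex(lba), lba)
```

With the bytes from the log this gives ICRC and ABRT at LBA 0x73c9b5 (7588277), matching what smartctl printed.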


I was getting very similar SMART results and kernel errors when I used a 
PATA drive with SATA_VIA (no freezes or hangs, though):



SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata3.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0
 res 51/04:00:0b:ff:bf/00:00:00:00:00/00 Emask 0x1 (device error)
ata3.00: configured for UDMA/100
ata3: EH complete
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata3.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
 res 51/04:01:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error)
ata3.00: configured for UDMA/100
ata3: EH complete
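
To see which commands the kernel is complaining about, the "cmd" lines can be picked apart. The sketch below assumes libata's usual layout for that line (command/feature:count:LBA low:LBA mid:LBA high/...) and uses hypothetical helper names. The names for features 0xd0 and 0xda match the smartctl log quoted earlier in this message; the name for 0xd2 is taken from the ATA command set and is not confirmed anywhere in this thread.

```python
import re

# Feature-register values for ATA command 0xb0 (SMART).
SMART_SUBCOMMANDS = {
    0xD0: "SMART READ DATA",
    0xD2: "SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE",  # assumption, see lead-in
    0xDA: "SMART RETURN STATUS",
}

CMD_RE = re.compile(
    r"cmd (?P<command>[0-9a-f]{2})/(?P<feature>[0-9a-f]{2}):"
    r"(?P<nsect>[0-9a-f]{2}):(?P<lbal>[0-9a-f]{2}):"
    r"(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/",
    re.IGNORECASE,
)


def describe_cmd(line):
    """Return (description, smart_signature_ok) for a libata 'cmd' line,
    or None if the line does not contain one."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    regs = {k: int(v, 16) for k, v in m.groupdict().items()}
    if regs["command"] != 0xB0:
        return ("non-SMART command 0x%02x" % regs["command"], None)
    name = SMART_SUBCOMMANDS.get(regs["feature"], "SMART (other subcommand)")
    # SMART commands carry the signature 0x4f/0xc2 in LBA mid/high.
    signature_ok = (regs["lbam"], regs["lbah"]) == (0x4F, 0xC2)
    return (name, signature_ok)


for line in (
    "ata3.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0",
    "ata3.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in",
):
    print(describe_cmd(line))
```

Both failing commands in the log above come out as SMART requests, which fits with smartd being the source of the errors.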



The problem was that I had started smartd with the wrong parameters:

DEVICESCAN -a -o on -S on -s (S/../.././10|L/../../6/11)


It was solved when I added "-d sat" to the smartd parameters:

DEVICESCAN -d sat -a -o on -S on -s (S/../.././10|L/../../6/11)


Since then, smartctl -a /dev/sda gives "normal" output, and there are no 
more strange kernel errors.
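
For anyone making the same change across several machines, the edit to the smartd.conf line can be scripted. This is a minimal sketch with a hypothetical helper name; it only handles the simple case shown above, where the directive line has no "-d" option yet.

```python
def add_device_type(directive_line, dev_type="sat"):
    """Insert '-d <dev_type>' right after the device field of a smartd.conf
    directive line, unless the line already carries a -d option."""
    tokens = directive_line.split()
    if not tokens or "-d" in tokens:
        return directive_line
    return " ".join([tokens[0], "-d", dev_type] + tokens[1:])


old = "DEVICESCAN -a -o on -S on -s (S/../.././10|L/../../6/11)"
print(add_device_type(old))
```

Applied to the original line, this produces the corrected DEVICESCAN line quoted above.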


Hopefully, it'll get fixed in smartmontools soon (or is fixed already, 
but not yet mainline).



--
Tomasz Chmielewski
http://wpkg.org


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6.21.1] SATA freeze

2007-05-13 Thread Jim Paris
> >  This appears to be a different problem. Something is issuing SMART-related 
> >  commands (smartd or smartctl perhaps) which the drive seems to be reacting 
> >  strangely to.
..
> Specifically, I could trigger it by running 'smartctl -d ata -S on
> /dev/sda' OR (s-S/o/).

This sounds like a known bug in smartmontools:

  http://marc.info/?l=smartmontools-support&m=117203137719518
  http://www.mail-archive.com/[EMAIL PROTECTED]/msg03160.html

-jim


Re: [2.6.21.1] SATA freeze

2007-05-12 Thread Fred Moyer

Robert Hancock wrote:

Fred Moyer wrote:
This appears to be a different problem. Something is issuing 
SMART-related commands (smartd or smartctl perhaps) which the drive 
seems to be reacting strangely to. It apparently completed the 
command but never raised DRQ to request any data being transferred 
even though we expected it to. Maybe SMART is disabled on the drive 
and that's causing it to just toss these commands? CCing linux-ide in 
case anyone knows what would cause this.


Here's smartctl -a for this drive - same output for both sda and sdb. 
Smartd is currently running.  Any advice appreciated.


Sounds like SMART is likely disabled on that drive. You can try doing 
"smartctl -s on /dev/sda" and see if that will turn it on.




Sorry - that last post of mine was brain dead.  Here's the one with 
(hopefully) useful data.


app2 ~ # smartctl  -d ata -a /dev/sda
smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3808110AS
Serial Number:    5LR8895K
Firmware Version: 3.AJJ
User Capacity:    80,026,361,856 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat May 12 18:49:06 2007 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 431) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  27) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   253   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0002   097   094   000    Old_age   Always       -       0
  4 Start_Stop_Count        0x0033   100   100   020    Pre-fail  Always       -       41
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       80
  7 Seek_Error_Rate         0x000f   073   060   030    Pre-fail  Always       -       23194052
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3899
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0033   100   100   020    Pre-fail  Always       -       108
187 Unknown_Attribute       0x0032   001   001   000    Old_age   Always       -       17863
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Unknown_Attribute       0x0022   070   057   045    Old_age   Always       -       2689188364318
194 Temperature_Celsius     0x0022   030   043   000    Old_age   Always       -       30 (Lifetime Min/Max 0/22)
195 Hardware_ECC_Recovered  0x001a   048   045   000    Old_age   Always       -       2474070
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       7
200 Multi_Zone_Error_Rate   0x       100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   

Re: [2.6.21.1] SATA freeze

2007-05-12 Thread Robin H. Johnson
On Sat, May 12, 2007 at 12:48:59PM -0600, Robert Hancock wrote:
>  Fred Moyer wrote:
> > I just joined the list today so apologies if this email breaks any email 
> > client post threading.
> > I have been seeing similar errors on two different systems.  I applied 
> > Robert's sata_nv patch posted to the list on May 5th, and approved today by 
> > Jeff Garzik.  I've taken several steps to ensure that this isn't a faulty 
> > cable or drive issue.  This is running on a hp dl145g2.  Here is my lspci, 
> > dmesg, and relevant kernel config sections:
> 
>  (snip)
> 
> > ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> > ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in
> >  res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation)
> > ata1: soft resetting port
> > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > ata1.00: configured for UDMA/100
> > ata1: EH complete
> 
>  This appears to be a different problem. Something is issuing SMART-related 
>  commands (smartd or smartctl perhaps) which the drive seems to be reacting 
>  strangely to. It apparently completed the command but never raised DRQ to 
>  request any data being transferred even though we expected it to. Maybe 
>  SMART is disabled on the drive and that's causing it to just toss these 
>  commands? CCing linux-ide in case anyone knows what would cause this.
I previously posted a near identical error to linux-ide.
http://article.gmane.org/gmane.linux.ide/18375

Specifically, I could trigger it by running 'smartctl -d ata -S on
/dev/sda' OR (s-S/o/).

Same sata_nv controller, two different drives, many different cables.
Reproducible over 7 systems [two different models of Tyan mobo] that I
have.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail : [EMAIL PROTECTED]
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85




Re: [2.6.21.1] SATA freeze

2007-05-12 Thread Robert Hancock

Fred Moyer wrote:
This appears to be a different problem. Something is issuing 
SMART-related commands (smartd or smartctl perhaps) which the drive 
seems to be reacting strangely to. It apparently completed the command 
but never raised DRQ to request any data being transferred even though 
we expected it to. Maybe SMART is disabled on the drive and that's 
causing it to just toss these commands? CCing linux-ide in case anyone 
knows what would cause this.


Here's smartctl -a for this drive - same output for both sda and sdb. 
Smartd is currently running.  Any advice appreciated.


Previously on 2.6.15 I was seeing sdb remount as read-only under heavy 
i/o.  I have not seen that issue yet with 2.6.21 (with Robert's patch 
from May 5th for sata_nv), but those read-only remounts were 
infrequent, so that issue may be solved.


app2 ~ # smartctl -a /dev/sda
smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA  ST3808110AS  Version: n/a
Serial number: 5LR8895K
Device type: disk
Local Time is: Sat May 12 12:05:58 2007 PDT
Device does not support SMART

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging



Sounds like SMART is likely disabled on that drive. You can try doing 
"smartctl -s on /dev/sda" and see if that will turn it on.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [2.6.21.1] SATA freeze

2007-05-12 Thread Fred Moyer

Robert Hancock wrote:

Fred Moyer wrote:
I just joined the list today so apologies if this email breaks any 
email client post threading.


I have been seeing similar errors on two different systems.  I applied 
Robert's sata_nv patch posted to the list on May 5th, and approved 
today by Jeff Garzik.  I've taken several steps to ensure that this 
isn't a faulty cable or drive issue.  This is running on a hp 
dl145g2.  Here is my lspci, dmesg, and relevant kernel config sections:


(snip)


ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in
 res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation)

ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/100
ata1: EH complete


This appears to be a different problem. Something is issuing 
SMART-related commands (smartd or smartctl perhaps) which the drive 
seems to be reacting strangely to. It apparently completed the command 
but never raised DRQ to request any data being transferred even though 
we expected it to. Maybe SMART is disabled on the drive and that's 
causing it to just toss these commands? CCing linux-ide in case anyone 
knows what would cause this.


Here's smartctl -a for this drive - same output for both sda and sdb. 
Smartd is currently running.  Any advice appreciated.


Previously on 2.6.15 I was seeing sdb remount as read-only under heavy 
i/o.  I have not seen that issue yet with 2.6.21 (with Robert's patch 
from May 5th for sata_nv), but those read-only remounts were 
infrequent, so that issue may be solved.


app2 ~ # smartctl -a /dev/sda
smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA  ST3808110AS  Version: n/a
Serial number: 5LR8895K
Device type: disk
Local Time is: Sat May 12 12:05:58 2007 PDT
Device does not support SMART

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging

app2 ~ # ps aux | grep smart
root      5227  0.0  0.0   2892   672 ?        S    May11   0:00 /usr/sbin/smartd -p /var/run/smartd.pid
root     19510  0.0  0.0   2648   648 pts/0    S+   12:07   0:00 grep --colour=auto smart



Re: [2.6.21.1] SATA freeze

2007-05-12 Thread Robert Hancock

Fred Moyer wrote:
I just joined the list today so apologies if this email breaks any email 
client post threading.


I have been seeing similar errors on two different systems.  I applied 
Robert's sata_nv patch posted to the list on May 5th, and approved today 
by Jeff Garzik.  I've taken several steps to ensure that this isn't a 
faulty cable or drive issue.  This is running on a hp dl145g2.  Here is 
my lspci, dmesg, and relevant kernel config sections:


(snip)


ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in
 res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation)

ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/100
ata1: EH complete


This appears to be a different problem. Something is issuing 
SMART-related commands (smartd or smartctl perhaps) which the drive 
seems to be reacting strangely to. It apparently completed the command 
but never raised DRQ to request any data being transferred even though 
we expected it to. Maybe SMART is disabled on the drive and that's 
causing it to just toss these commands? CCing linux-ide in case anyone 
knows what would cause this.
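
The point about DRQ can be read off the status byte in the "res" line. A minimal sketch, assuming the classic ATA status-register bit names (the table and function name are illustrative, not taken from the kernel source):

```python
# Classic ATA status-register bits, most significant first.
ATA_STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]


def decode_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for bit, name in ATA_STATUS_BITS if status & bit]


# "res 50/00:..." from the HSM violation report above.
print(decode_status(0x50))
# "res 51/04:..." as seen in other logs in this thread (device error).
print(decode_status(0x51))
```

Status 0x50 has neither DRQ nor ERR set: the drive reports itself ready but never asks to transfer the data that a data-in command implies, which is consistent with the HSM violation. Status 0x51 adds ERR, so there the drive rejected the command outright.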


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [2.6.21.1] SATA freeze

2007-05-12 Thread Robert Hancock

Gerhard Mack wrote:

On Wed, 9 May 2007, Robert Hancock wrote:

Gerhard Mack wrote:

On Wed, 9 May 2007, Jeff Garzik wrote:

Gerhard Mack wrote:

May  9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen
May  9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
May  9 14:51:35 mgerhard kernel:  res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout)
May  9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0)

Anything I can do to figure out what's causing this?

You're showing various flags set in the SError register, which suggests you're
having SATA communication problems with the drive. A bad SATA cable or power
problems would be a strong possibility.

It really would be nice if we decoded these things more usefully for the user
(same with the regular ATA errors, like drivers/ide does), but in general
SError showing up as non-zero is a bad thing:

0x40 = "Handshake error: When set to one, this bit indicates that one or
more R_ERR handshake response was received in response to frame transmission.
Such errors may be the result of a CRC error detected by the recipient, a
disparity or 10b/8b decoding error, or other error condition leading to a
negative handshake on a transmitted frame."

0x180 = "Link Sequence Error: When set to one, this bit indicates that one
or more Link state machine error conditions was encountered since the last
time this bit was cleared. The Link Layer state machine defines the conditions
under which the link layer detects an erroneous transition."

and

"Transport state transition error: When set to one, this bit indicates that an
error has occurred in the transition from one state to another within the
Transport layer since the last time this bit was cleared."



Just out of curiosity how often is that bit cleared?


I believe that is cleared only on error handling or controller reset, so 
it just means that it happened sometime since boot or the last libata 
error recovery.
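
The decoding quoted above can be written as a small lookup. This sketch follows the values exactly as they are given in the quoted text (0x40, and 0x180 as two flags). Which of 0x80 and 0x100 belongs to which name is an assumption based on the order in which the text lists them, and the sketch does not check how a given kernel version lays out the SErr value it prints.

```python
# Flag values as quoted in this message; the 0x80/0x100 split is an assumption.
SERR_FLAGS = {
    0x040: "Handshake error",
    0x080: "Link sequence error",
    0x100: "Transport state transition error",
}


def decode_serr_flags(value):
    """Return the names of the quoted SError flags set in value."""
    return [name for bit, name in sorted(SERR_FLAGS.items()) if value & bit]


print(decode_serr_flags(0x180))
print(decode_serr_flags(0x0))
```

A value of 0x0 decodes to an empty list, which is the healthy case described above.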


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [2.6.21.1] SATA freeze

2007-05-12 Thread Fred Moyer

Robert Hancock wrote:

Fred Moyer wrote:
This appears to be a different problem. Something is issuing 
SMART-related commands (smartd or smartctl perhaps) which the drive 
seems to be reacting strangely to. It apparently completed the 
command but never raised DRQ to request any data being transferred 
even though we expected it to. Maybe SMART is disabled on the drive 
and that's causing it to just toss these commands? CCing linux-ide in 
case anyone knows what would cause this.


Here's smartctl -a for this drive - same output for both sda and sdb. 
Smartd is currently running.  Any advice appreciated.


Sounds like SMART is likely disabled on that drive. You can try doing 
smartctl -s on /dev/sda and see if that will turn it on.




Sorry - that last post of mine was brain dead.  Here's the one with 
(hopefully) useful data.


app2 ~ # smartctl  -d ata -a /dev/sda
smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: ST3808110AS
Serial Number:5LR8895K
Firmware Version: 3.AJJ
User Capacity:80,026,361,856 bytes
Device is:Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:Sat May 12 18:49:06 2007 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 431) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  27) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   253   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0002   097   094   000    Old_age   Always       -       0
  4 Start_Stop_Count        0x0033   100   100   020    Pre-fail  Always       -       41
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       80
  7 Seek_Error_Rate         0x000f   073   060   030    Pre-fail  Always       -       23194052
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3899
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0033   100   100   020    Pre-fail  Always       -       108
187 Unknown_Attribute       0x0032   001   001   000    Old_age   Always       -       17863
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Unknown_Attribute       0x0022   070   057   045    Old_age   Always       -       2689188364318
194 Temperature_Celsius     0x0022   030   043   000    Old_age   Always       -       30 (Lifetime Min/Max 0/22)
195 Hardware_ECC_Recovered  0x001a   048   045   000    Old_age   Always       -       2474070
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       7
200 Multi_Zone_Error_Rate   0x       100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   
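
A note on reading the attribute table above. The following is a minimal Python sketch, assuming the ten-column layout that smartctl 5.36 prints here (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw). The names `SmartAttr`, `parse_attr` and `is_failing` are made up for this illustration, and the rule "normalized value at or below a non-zero threshold means failing" is the usual smartmontools convention rather than anything stated in this thread.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int    # normalized value (higher is healthier)
    worst: int
    thresh: int
    kind: str     # "Pre-fail" or "Old_age"
    raw: str      # vendor-specific raw value, kept as text


def parse_attr(line: str) -> SmartAttr:
    # Assumed columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
    f = line.split()
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]),
                     f[6], " ".join(f[9:]))


def is_failing(attr: SmartAttr) -> bool:
    # With a threshold of 0 this comparison cannot be true for a valid
    # normalized value, so such attributes are effectively informational.
    return attr.thresh > 0 and attr.value <= attr.thresh


# Three rows copied from the output above, re-joined onto single lines.
SAMPLE = [
    "  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always   -   80",
    " 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -   0",
    "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -   7",
]

if __name__ == "__main__":
    for attr in map(parse_attr, SAMPLE):
        state = "FAILING" if is_failing(attr) else "ok"
        print(f"{attr.attr_id:3d} {attr.name:<24} "
              f"margin={attr.value - attr.thresh:3d} raw={attr.raw} {state}")
```

By this rule none of the three sample rows is failing. The raw counts (80 reallocated sectors, 7 UDMA CRC errors) are still worth a look, with the caveat that raw values are vendor-specific and smartctl reports this drive as not being in its database.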

Re: Re: [2.6.21.1] SATA freeze

2007-05-10 Thread Fred Moyer

Robert Hancock wrote:
> Gerhard Mack wrote:
>> On Wed, 9 May 2007, Jeff Garzik wrote:
>>> Gerhard Mack wrote:
>>>> May  9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr
>>>> 0x180 action 0x2 frozen
>>>> May  9 14:51:35 mgerhard kernel: ata1.00: cmd
>>>> 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
>>>> May  9 14:51:35 mgerhard kernel:  res
>>>> 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout)
>>>> May  9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be
>>>> patient (Status 0xd0)
>>>>
>>>> Anything I can do to figure out what's causing this?
>
> You're showing various flags set in the SError register, which
> suggests you're having SATA communication problems with the drive. A
> bad SATA cable or power problems would be a strong possibility.

I just joined the list today so apologies if this email breaks any email 
client post threading.


I have been seeing similar errors on two different systems.  I applied 
Robert's sata_nv patch posted to the list on May 5th, and approved today 
by Jeff Garzik.  I've taken several steps to ensure that this isn't a 
faulty cable or drive issue.  This is running on an HP DL145 G2.  Here are 
my lspci, dmesg, and relevant kernel config sections:



Linux version 2.6.21-gentoo ([EMAIL PROTECTED]) (gcc version 
4.1.1 (Gentoo 4.1.1)) #6 SMP Sun May 6 16:44:40 PDT 2007

Command line: root=/dev/sda2
BIOS-provided physical RAM map:
 BIOS-e820:  - 00098800 (usable)
 BIOS-e820: 00098800 - 000a (reserved)
 BIOS-e820: 000c2000 - 0010 (reserved)
 BIOS-e820: 0010 - bff2 (usable)
 BIOS-e820: bff2 - bff29000 (ACPI data)
 BIOS-e820: bff29000 - bff8 (ACPI NVS)
 BIOS-e820: bff8 - c000 (reserved)
 BIOS-e820: d800 - d8000400 (reserved)
 BIOS-e820: d8001000 - d8001400 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec00400 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: fff8 - 0001 (reserved)
 BIOS-e820: 0001 - 00014000 (usable)
Entering add_active_range(0, 0, 152) 0 entries of 256 used
Entering add_active_range(0, 256, 786208) 1 entries of 256 used
Entering add_active_range(0, 1048576, 1310720) 2 entries of 256 used
end_pfn_map = 1310720
DMI present.
Entering add_active_range(0, 0, 152) 0 entries of 256 used
Entering add_active_range(0, 256, 786208) 1 entries of 256 used
Entering add_active_range(0, 1048576, 1310720) 2 entries of 256 used
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1310720
early_node_map[3] active PFN ranges
    0:        0 ->      152
    0:      256 ->   786208
    0:  1048576 ->  1310720
On node 0 totalpages: 1048248
  DMA zone: 56 pages used for memmap
  DMA zone: 1138 pages reserved
  DMA zone: 2798 pages, LIFO batch:0
  DMA32 zone: 14280 pages used for memmap
  DMA32 zone: 767832 pages, LIFO batch:31
  Normal zone: 3584 pages used for memmap
  Normal zone: 258560 pages, LIFO batch:31
Intel MultiProcessor Specification v1.4
MPTABLE: OEM ID: AMD  MPTABLE: Product ID: HAMMER   MPTABLE: APIC at: 0xFEE0

Processor #0 (Bootup-CPU)
Processor #1
I/O APIC #2 at 0xFEC0.
I/O APIC #3 at 0xD800.
I/O APIC #4 at 0xD8001000.
Setting APIC routing to flat
Processors: 2
Nosave address range: 00098000 - 00099000
Nosave address range: 00099000 - 000a
Nosave address range: 000a - 000c2000
Nosave address range: 000c2000 - 0010
Nosave address range: bff2 - bff29000
Nosave address range: bff29000 - bff8
Nosave address range: bff8 - c000
Nosave address range: c000 - d800
Nosave address range: d800 - d8001000
Nosave address range: d8001000 - e000
Nosave address range: e000 - f000
Nosave address range: f000 - fec0
Nosave address range: fec0 - fee0
Nosave address range: fee0 - fee01000
Nosave address range: fee01000 - fff8
Nosave address range: fff8 - 0001
Allocating PCI resources starting at c200 (gap: c000:1800)
PERCPU: Allocating 36608 bytes of per cpu data
Built 1 zonelists.  Total pages: 1029190
Kernel command line: root=/dev/sda2
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
time.c: Detected 2009.287 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Checking aperture...
CPU 0: aperture @ 233e00 size 32 MB

Re: [2.6.21.1] SATA freeze

2007-05-10 Thread Gerhard Mack
On Wed, 9 May 2007, Robert Hancock wrote:
> Gerhard Mack wrote:
> > On Wed, 9 May 2007, Jeff Garzik wrote:
> > > Gerhard Mack wrote:
> > > > May  9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0
> > > > SErr
> > > > 0x180 action 0x2 frozen
> > > > May  9 14:51:35 mgerhard kernel: ata1.00: cmd
> > > > 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
> > > > May  9 14:51:35 mgerhard kernel:  res
> > > > 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout)
> > > > May  9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please
> > > > be
> > > > patient (Status 0xd0)
> > > > 
> > > > Anything I can do to figure out what's causing this?
> 
> You're showing various flags set in the SError register, which suggests you're
> having SATA communication problems with the drive. A bad SATA cable or power
> problems would be a strong possibility.
> 
> It really would be nice if we decoded these things more usefully for the user
> (same with the regular ATA errors, like drivers/ide does), but in general
> SError showing up as non-zero is a bad thing:
> 
> 0x40 = "Handshake error: When set to one, this bit indicates that one or
> more R_ERR handshake response was received in response to frame transmission.
> Such errors may be the result of a CRC error detected by the recipient, a
> disparity or 10b/8b decoding error, or other error condition leading to a
> negative handshake on a transmitted frame."
> 
> 0x180 = "Link Sequence Error: When set to one, this bit indicates that one
> or more Link state machine error conditions was encountered since the last
> time this bit was cleared. The Link Layer state machine defines the conditions
> under which the link layer detects an erroneous transition."
> 
> and
> 
> "Transport state transition error: When set to one, this bit indicates that an
> error has occurred in the transition from one state to another within the
> Transport layer since the last time this bit was cleared."


Just out of curiosity how often is that bit cleared?

Gerhard

--
Gerhard Mack

[EMAIL PROTECTED]

<>< As a computer I find your faith in technology amusing.


Re: [2.6.21.1] SATA freeze

2007-05-10 Thread Gerhard Mack
On Thu, 10 May 2007, Mikael Pettersson wrote:

> Date: Thu, 10 May 2007 10:51:57 +0200
> From: Mikael Pettersson <[EMAIL PROTECTED]>
> To: Gerhard Mack <[EMAIL PROTECTED]>
> Cc: Jeff Garzik <[EMAIL PROTECTED]>, linux-kernel@vger.kernel.org
> Subject: Re: [2.6.21.1] SATA freeze
> 
> Gerhard Mack writes:
>  > On Wed, 9 May 2007, Jeff Garzik wrote:
>  > > Gerhard Mack wrote:
>  > > > May  9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 
> SErr
>  > > > 0x180 action 0x2 frozen
>  > > > May  9 14:51:35 mgerhard kernel: ata1.00: cmd
>  > > > 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
>  > > > May  9 14:51:35 mgerhard kernel:  res
>  > > > 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout)
>  > > > May  9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please 
> be
>  > > > patient (Status 0xd0)
>  > > > 
>  > > > Anything I can do to figure out what's causing this?
>  > > 
>  > > Provide full lspci, dmesg, kernel config?
>  > > 
>  > Done.
> 
> Your second boot (warm or cold?)

Warm boot.

> 
>  > May  9 14:43:07 mgerhard kernel: klogd 1.4.1#20, log source = /proc/kmsg 
> started.
>  > May  9 14:43:07 mgerhard kernel: Linux version 2.6.21.1 ([EMAIL 
> PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 
> SMP PREEMPT Wed May 2 20:08:35 EDT 2007
>  > May  9 14:43:07 mgerhard kernel: Command line: root=/dev/sda3 ro 
> 
> worked fine until ReiserFS's journal replay caused a single SATA exception:
> 
>  > May  9 14:43:07 mgerhard kernel: ReiserFS: sda3: There were 7 uncompleted 
> unlinks/truncates. Completed
>  > May  9 14:43:07 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 
> SErr 0x40 action 0x2
>  > May  9 14:43:07 mgerhard kernel: ata1.00: (BMDMA stat 0x25)
>  > May  9 14:43:07 mgerhard kernel: ata1.00: cmd 
> 35/00:58:20:4d:23/00:01:00:00:00/e0 tag 0 cdb 0x0 data 176128 out
>  > May  9 14:43:07 mgerhard kernel:  res 
> 51/84:28:50:4d:23/84:01:00:00:00/e0 Emask 0x10 (ATA bus error)
>  > May  9 14:43:07 mgerhard kernel: ata1: soft resetting port
>  > May  9 14:43:07 mgerhard kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 
> SControl 300)
>  > May  9 14:43:07 mgerhard kernel: ata1.00: configured for UDMA/100
>  > May  9 14:43:07 mgerhard kernel: ata1: EH complete
>  > May  9 14:43:07 mgerhard kernel: SCSI device sda: 488397168 512-byte hdwr 
> sectors (250059 MB)
> 
> Shortly thereafter you loaded a proprietary module

Oops thought I killed that.

> 
>  > May  9 14:43:17 mgerhard kernel: nvidia: module license 'NVIDIA' taints 
> kernel.
>  > May  9 14:43:17 mgerhard kernel: ACPI: PCI Interrupt Link [APC7] enabled 
> at IRQ 16
>  > May  9 14:43:17 mgerhard kernel: ACPI: PCI Interrupt :00:05.0[A] -> 
> Link [APC7] -> GSI 16 (level, low) -> IRQ 16
>  > May  9 14:43:17 mgerhard kernel: PCI: Setting latency timer of device 
> :00:05.0 to 64
>  > May  9 14:43:17 mgerhard kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel 
> Module  1.0-9746  Fri Dec 15 10:19:35 PST 2006
> 
> and immediately there's a large number of SATA exceptions:
> 
>  > May  9 14:44:37 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 
> SErr 0x40 action 0x2
>  > May  9 14:44:37 mgerhard kernel: ata1.00: (BMDMA stat 0x25)
>  > May  9 14:44:37 mgerhard kernel: ata1.00: cmd 
> 35/00:00:b0:53:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
>  > May  9 14:44:37 mgerhard kernel:  res 
> 51/84:60:50:56:c8/84:01:09:00:00/e0 Emask 0x10 (ATA bus error)
>  > May  9 14:44:37 mgerhard kernel: ata1: soft resetting port
>  > May  9 14:44:37 mgerhard kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 
> SControl 300)
>  > May  9 14:44:37 mgerhard kernel: ata1.00: configured for UDMA/100
> (repeated)
> 
> Please try a cold boot (so the HW is in a pristine state) without
> ever loading the nvidia module.

A cold boot cleared the drive problems.  Whether the nvidia module is 
loaded or not has no effect on it at this point.


Thanks for the help.

Gerhard

--
Gerhard Mack

[EMAIL PROTECTED]

<>< As a computer I find your faith in technology amusing.
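
A note on the `res` lines quoted in this message. As far as the libata log format goes (an assumption, it is not spelled out anywhere in this thread), the first two bytes of a `res` field are the drive's Status and Error registers, and the "Status 0xd0" in the slow-to-respond message is the same Status register. The sketch below decodes them in Python using the bit names from the ATA register definitions; the helper name `decode` is invented for the example, and the names of the obsolete low bits vary between ATA revisions.

```python
# Bit names follow the ATA Status and Error register definitions.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF"}


def decode(value: int, names: dict) -> list:
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in names.items() if value & mask]


if __name__ == "__main__":
    # "res 51/84:..." from the ATA bus error entries
    print(decode(0x51, STATUS_BITS), decode(0x84, ERROR_BITS))
    # "Status 0xd0" from the "port is slow to respond" entry
    print(decode(0xD0, STATUS_BITS))
```

Under these assumptions `51/84` reads as Status DRDY, DSC, ERR with Error ICRC, ABRT (an interface CRC error followed by an aborted command), and `0xd0` reads as BSY, DRDY, DSC, meaning the drive was still reporting busy.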


Re: [2.6.21.1] SATA freeze

2007-05-10 Thread Mikael Pettersson
Gerhard Mack writes:
 > On Wed, 9 May 2007, Jeff Garzik wrote:
 > > Gerhard Mack wrote:
 > > > May  9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 
 > > > SErr
 > > > 0x180 action 0x2 frozen
 > > > May  9 14:51:35 mgerhard kernel: ata1.00: cmd
 > > > 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
 > > > May  9 14:51:35 mgerhard kernel:  res
 > > > 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout)
 > > > May  9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be
 > > > patient (Status 0xd0)
 > > > 
 > > > Anything I can do to figure out what's causing this?
 > > 
 > > Provide full lspci, dmesg, kernel config?
 > > 
 > Done.

Your second boot (warm or cold?)

 > May  9 14:43:07 mgerhard kernel: klogd 1.4.1#20, log source = /proc/kmsg 
 > started.
 > May  9 14:43:07 mgerhard kernel: Linux version 2.6.21.1 ([EMAIL PROTECTED]) 
 > (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP PREEMPT 
 > Wed May 2 20:08:35 EDT 2007
 > May  9 14:43:07 mgerhard kernel: Command line: root=/dev/sda3 ro 

worked fine until ReiserFS's journal replay caused a single SATA exception:

 > May  9 14:43:07 mgerhard kernel: ReiserFS: sda3: There were 7 uncompleted 
 > unlinks/truncates. Completed
 > May  9 14:43:07 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 
 > 0x40 action 0x2
 > May  9 14:43:07 mgerhard kernel: ata1.00: (BMDMA stat 0x25)
 > May  9 14:43:07 mgerhard kernel: ata1.00: cmd 
 > 35/00:58:20:4d:23/00:01:00:00:00/e0 tag 0 cdb 0x0 data 176128 out
 > May  9 14:43:07 mgerhard kernel:  res 
 > 51/84:28:50:4d:23/84:01:00:00:00/e0 Emask 0x10 (ATA bus error)
 > May  9 14:43:07 mgerhard kernel: ata1: soft resetting port
 > May  9 14:43:07 mgerhard kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 
 > SControl 300)
 > May  9 14:43:07 mgerhard kernel: ata1.00: configured for UDMA/100
 > May  9 14:43:07 mgerhard kernel: ata1: EH complete
 > May  9 14:43:07 mgerhard kernel: SCSI device sda: 488397168 512-byte hdwr 
 > sectors (250059 MB)

Shortly thereafter you loaded a proprietary module

 > May  9 14:43:17 mgerhard kernel: nvidia: module license 'NVIDIA' taints 
 > kernel.
 > May  9 14:43:17 mgerhard kernel: ACPI: PCI Interrupt Link [APC7] enabled at 
 > IRQ 16
 > May  9 14:43:17 mgerhard kernel: ACPI: PCI Interrupt :00:05.0[A] -> Link 
 > [APC7] -> GSI 16 (level, low) -> IRQ 16
 > May  9 14:43:17 mgerhard kernel: PCI: Setting latency timer of device 
 > :00:05.0 to 64
 > May  9 14:43:17 mgerhard kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel 
 > Module  1.0-9746  Fri Dec 15 10:19:35 PST 2006

and immediately there's a large number of SATA exceptions:

 > May  9 14:44:37 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 
 > 0x40 action 0x2
 > May  9 14:44:37 mgerhard kernel: ata1.00: (BMDMA stat 0x25)
 > May  9 14:44:37 mgerhard kernel: ata1.00: cmd 
 > 35/00:00:b0:53:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
 > May  9 14:44:37 mgerhard kernel:  res 
 > 51/84:60:50:56:c8/84:01:09:00:00/e0 Emask 0x10 (ATA bus error)
 > May  9 14:44:37 mgerhard kernel: ata1: soft resetting port
 > May  9 14:44:37 mgerhard kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 
 > SControl 300)
 > May  9 14:44:37 mgerhard kernel: ata1.00: configured for UDMA/100
(repeated)

Please try a cold boot (so the HW is in a pristine state) without
ever loading the nvidia module.
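
A note on the `cmd` and `res` fields in the log lines quoted above. The Python sketch below assumes the twelve bytes are printed in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, which is how the libata error handler appears to format them. That layout is an assumption, not something stated in this thread, and the names `Taskfile` and `parse_taskfile` are invented for the example.

```python
import re
from dataclasses import dataclass

# Twelve hex bytes separated by "/" or ":", as in the "cmd" and "res" lines.
TF_RE = re.compile(r"(?:cmd|res)\s+((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


@dataclass
class Taskfile:
    command: int      # Status register when parsed from a "res" line
    feature: int      # Error register when parsed from a "res" line
    nsect: int
    lbal: int
    lbam: int
    lbah: int
    hob_feature: int
    hob_nsect: int
    hob_lbal: int
    hob_lbam: int
    hob_lbah: int
    device: int

    def lba48(self) -> int:
        # The high-order ("hob") bytes form bits 24..47 of the 48-bit LBA.
        return (self.hob_lbah << 40 | self.hob_lbam << 32 | self.hob_lbal << 24
                | self.lbah << 16 | self.lbam << 8 | self.lbal)

    def sector_count(self) -> int:
        # In 48-bit commands a 16-bit count of 0 stands for 65536 sectors.
        return (self.hob_nsect << 8 | self.nsect) or 65536


def parse_taskfile(text: str) -> Taskfile:
    m = TF_RE.search(text)
    if m is None:
        raise ValueError("no cmd/res taskfile found")
    return Taskfile(*(int(b, 16) for b in re.split(r"[/:]", m.group(1))))


if __name__ == "__main__":
    tf = parse_taskfile(
        "ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out")
    print(hex(tf.command), hex(tf.lba48()), tf.sector_count() * 512)
```

The layout can be checked against the logs themselves: the sector count times 512 bytes matches the `data 524288 out` and `data 176128 out` sizes printed on the same lines, and command 0x35 is WRITE DMA EXT.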


Re: [2.6.21.1] SATA freeze

2007-05-09 Thread Robert Hancock

Gerhard Mack wrote:
> On Wed, 9 May 2007, Jeff Garzik wrote:
>> Gerhard Mack wrote:
>>> May  9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr
>>> 0x180 action 0x2 frozen
>>> May  9 14:51:35 mgerhard kernel: ata1.00: cmd
>>> 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
>>> May  9 14:51:35 mgerhard kernel:  res
>>> 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout)
>>> May  9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be
>>> patient (Status 0xd0)
>>>
>>> Anything I can do to figure out what's causing this?


You're showing various flags set in the SError register, which suggests 
you're having SATA communication problems with the drive. A bad SATA 
cable or power problems would be a strong possibility.


It really would be nice if we decoded these things more usefully for the 
user (same with the regular ATA errors, like drivers/ide does), but in 
general SError showing up as non-zero is a bad thing:


0x400000 = "Handshake error: When set to one, this bit indicates that 
one or more R_ERR handshake response was received in response to frame 
transmission. Such errors may be the result of a CRC error detected by 
the recipient, a disparity or 10b/8b decoding error, or other error 
condition leading to a negative handshake on a transmitted frame."


0x1800000 = "Link Sequence Error: When set to one, this bit indicates 
that one or more Link state machine error conditions was encountered 
since the last time this bit was cleared. The Link Layer state machine 
defines the conditions under which the link layer detects an erroneous 
transition."


and

"Transport state transition error: When set to one, this bit indicates 
that an error has occurred in the transition from one state to another 
within the Transport layer since the last time this bit was cleared."
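
Until the kernel prints these names itself, the decoding can be done by hand. What follows is only a rough sketch, assuming the SError bit layout from the Serial ATA specification (handshake = bit 22, link sequence = bit 23, transport state transition = bit 24, and so on). The table and the function name decode_serror are made up for illustration; they are not anything libata provides.

```python
# Sketch only: bit positions are taken from the SATA SError register layout
# (ERR field in bits 0-15, DIAG field in bits 16-31). The names below are
# illustrative labels, not kernel output.
SERROR_BITS = {
    0: "Recovered data integrity error (I)",
    1: "Recovered communications error (M)",
    8: "Transient data integrity error (T)",
    9: "Persistent communication or data integrity error (C)",
    10: "Protocol error (P)",
    11: "Internal error (E)",
    16: "PhyRdy change (N)",
    17: "Phy internal error (I)",
    18: "Comm wake (W)",
    19: "10b to 8b decode error (B)",
    20: "Disparity error (D)",
    21: "CRC error (C)",
    22: "Handshake error (H)",
    23: "Link sequence error (S)",
    24: "Transport state transition error (T)",
    25: "Unknown FIS type (F)",
    26: "Exchanged (X)",
}


def decode_serror(value):
    """Return the names of all known bits set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # A value with bits 23 and 24 set, written out as a full 32-bit mask.
    for name in decode_serror(0x01800000):
        print(name)
```

Anything in the DIAG half (bits 16 and up) points at the link between controller and drive rather than at the media, which is why cable and power are the first things to check.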


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6.21.1] SATA freeze

2007-05-09 Thread Chuck Ebbert
Gerhard Mack wrote:
> On Wed, 9 May 2007, Jeff Garzik wrote:
>> Gerhard Mack wrote:
>>> May  9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr
>>> 0x1800000 action 0x2 frozen
>>> May  9 14:51:35 mgerhard kernel: ata1.00: cmd
>>> 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
>>> May  9 14:51:35 mgerhard kernel:  res
>>> 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout)
>>> May  9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be
>>> patient (Status 0xd0)
>>>
>>> Anything I can do to figure out what's causing this?
>> Provide full lspci, dmesg, kernel config?
>>
> Done.
> 

You could try:

   pci=nomsi (kernel option)

and/or

   adma=0 (module option for sata_nv)



Re: [2.6.21.1] SATA freeze

2007-05-09 Thread Jeff Garzik

Gerhard Mack wrote:
> May  9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0
> SErr 0x1800000 action 0x2 frozen
> May  9 14:51:35 mgerhard kernel: ata1.00: cmd
> 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
> May  9 14:51:35 mgerhard kernel:  res
> 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout)
> May  9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be
> patient (Status 0xd0)
>
> Anything I can do to figure out what's causing this?

Provide full lspci, dmesg, kernel config?

Jeff





[2.6.21.1] SATA freeze

2007-05-09 Thread Gerhard Mack
May  9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 
SErr 0x1800000 action 0x2 frozen
May  9 14:51:35 mgerhard kernel: ata1.00: cmd 
35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
May  9 14:51:35 mgerhard kernel:  res 
40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout)
May  9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be 
patient (Status 0xd0)

Anything I can do to figure out what's causing this?
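
One starting point is to pull the fields out of the "cmd" line and the status byte. The snippet below is a minimal sketch, assuming the field order libata's error handler prints (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the standard ATA status bit names. The helper names parse_cmd and decode_status are hypothetical, and the wrapped syslog line has to be joined back into one line first.

```python
import re

# Sketch only: matches the taskfile part of a libata "cmd" line, e.g.
#   cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 ...
CMD_RE = re.compile(
    r"cmd\s+([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2})",
    re.IGNORECASE,
)

# ATA status register bits, highest bit first.
STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]


def parse_cmd(line):
    """Return command, 48-bit LBA and sector count from a cmd line, or None."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    command = int(m.group(1), 16)
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m.group(3).split(":")
    )
    device = int(m.group(4), 16)
    # Assumes an LBA48 command (such as 0x35, WRITE DMA EXT): the "hob"
    # bytes carry the upper half of the address and of the sector count.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    sectors = (hob_nsect << 8) | nsect
    return {"command": command, "lba": lba, "sectors": sectors, "device": device}


def decode_status(status):
    """Return the names of the bits set in an ATA status byte.

    While BSY is set the remaining bits are not guaranteed to be meaningful.
    """
    return [name for mask, name in STATUS_BITS if status & mask]


if __name__ == "__main__":
    line = "ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out"
    print(parse_cmd(line))
    print(decode_status(0xD0))
```

For the line above, the sector count times 512 bytes comes out to the same 524288 bytes that the log reports as "data 524288 out", which is a quick check that the fields were split correctly.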

Gerhard
 

--
Gerhard Mack

[EMAIL PROTECTED]

<>< As a computer I find your faith in technology amusing.

