Re: [2.6.21.1] SATA freeze
Fred Moyer wrote: Sounds like SMART is likely disabled on that drive. You can try doing "smartctl -s on /dev/sda" and see if that will turn it on. Sorry - that last post of mine was brain dead. Here's the one with (hopefully) useful data. app2 ~ # smartctl -d ata -a /dev/sda smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen (...) -- -- -- -- -- -- -- 84 51 00 b5 c9 73 e0 Error: ICRC, ABRT at LBA = 0x0073c9b5 = 7588277 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- 25 00 20 96 c9 73 e0 00 01:25:42.886 READ DMA EXT b0 d0 01 00 4f c2 00 02 01:25:42.868 SMART READ DATA 35 00 08 ae b6 42 e0 00 01:25:42.456 WRITE DMA EXT b0 da 00 00 4f c2 00 00 01:25:42.430 SMART RETURN STATUS 35 00 08 60 81 04 e0 00 01:25:42.376 WRITE DMA EXT I was getting very similar SMART results, and kernel errors, when used PATA drive and SATA_VIA (no freezes or hangs though): SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata3.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0 res 51/04:00:0b:ff:bf/00:00:00:00:00/00 Emask 0x1 (device error) ata3.00: configured for UDMA/100 ata3: EH complete SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata3.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in res 51/04:01:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error) ata3.00: configured for UDMA/100 ata3: EH complete The problem was that I started smartd with wrong parameters: DEVICESCAN -a -o on -S on -s (S/../.././10|L/../../6/11) It was solved when I added "-d sat" to smartd parameters: DEVICESCAN -d sat -a -o on -S on -s (S/../.././10|L/../../6/11) From that time on, smartctl -a /dev/sda gives "normal" output, and no more strange kernel errors. Hopefully, it'll get fixed in smartmontools soon (or is fixed already, but not yet mainline). -- Tomasz Chmielewski http://wpkg.org - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Fred Moyer wrote: Sounds like SMART is likely disabled on that drive. You can try doing smartctl -s on /dev/sda and see if that will turn it on. Sorry - that last post of mine was brain dead. Here's the one with (hopefully) useful data. app2 ~ # smartctl -d ata -a /dev/sda smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen (...) -- -- -- -- -- -- -- 84 51 00 b5 c9 73 e0 Error: ICRC, ABRT at LBA = 0x0073c9b5 = 7588277 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- 25 00 20 96 c9 73 e0 00 01:25:42.886 READ DMA EXT b0 d0 01 00 4f c2 00 02 01:25:42.868 SMART READ DATA 35 00 08 ae b6 42 e0 00 01:25:42.456 WRITE DMA EXT b0 da 00 00 4f c2 00 00 01:25:42.430 SMART RETURN STATUS 35 00 08 60 81 04 e0 00 01:25:42.376 WRITE DMA EXT I was getting very similar SMART results, and kernel errors, when used PATA drive and SATA_VIA (no freezes or hangs though): SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata3.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0 res 51/04:00:0b:ff:bf/00:00:00:00:00/00 Emask 0x1 (device error) ata3.00: configured for UDMA/100 ata3: EH complete SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata3.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in res 51/04:01:00:4f:c2/00:00:00:00:00/00 Emask 0x1 (device error) ata3.00: configured for UDMA/100 ata3: EH complete The problem was that I started smartd with wrong parameters: DEVICESCAN -a -o on -S on -s (S/../.././10|L/../../6/11) It was solved when I added -d sat to smartd parameters: DEVICESCAN -d sat -a -o on -S on -s (S/../.././10|L/../../6/11) From that time on, smartctl -a /dev/sda gives normal output, and no more strange kernel errors. Hopefully, it'll get fixed in smartmontools soon (or is fixed already, but not yet mainline). -- Tomasz Chmielewski http://wpkg.org - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
> > This appears to be a different problem. Something is issuing SMART-related > > commands (smartd or smartctl perhaps) which the drive seems to be reacting > > strangely to. .. > Specifically, I could trigger it by running 'smartctl -d ata -S on > /dev/sda' OR (s-S/o/). This sounds like a known bug in smartmontools: http://marc.info/?l=smartmontools-support=117203137719518 http://www.mail-archive.com/[EMAIL PROTECTED]/msg03160.html -jim - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. .. Specifically, I could trigger it by running 'smartctl -d ata -S on /dev/sda' OR (s-S/o/). This sounds like a known bug in smartmontools: http://marc.info/?l=smartmontools-supportm=117203137719518 http://www.mail-archive.com/[EMAIL PROTECTED]/msg03160.html -jim - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Robert Hancock wrote: Fred Moyer wrote: This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. It apparently completed the command but never raised DRQ to request any data being transferred even though we expected it to. Maybe SMART is disabled on the drive and that's causing it to just toss these commands? CCing linux-ide in case anyone knows what would cause this. Here's smartctl -a for this drive - same output for both sda and sdb. Smartd is currently running. Any advice appreciated. Sounds like SMART is likely disabled on that drive. You can try doing "smartctl -s on /dev/sda" and see if that will turn it on. Sorry - that last post of mine was brain dead. Here's the one with (hopefully) useful data. app2 ~ # smartctl -d ata -a /dev/sda smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: ST3808110AS Serial Number:5LR8895K Firmware Version: 3.AJJ User Capacity:80,026,361,856 bytes Device is:Not in smartctl database [for details use: -P showall] ATA Version is: 7 ATA Standard is: Exact ATA specification draft version not indicated Local Time is:Sat May 12 18:49:06 2007 PDT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 431) seconds. Offline data collection capabilities:(0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities:(0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability:(0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time:( 2) minutes. Extended self-test routine recommended polling time:( 27) minutes. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 100 253 006Pre-fail Always - 0 3 Spin_Up_Time0x0002 097 094 000Old_age Always - 0 4 Start_Stop_Count0x0033 100 100 020Pre-fail Always - 41 5 Reallocated_Sector_Ct 0x0033 098 098 036Pre-fail Always - 80 7 Seek_Error_Rate 0x000f 073 060 030Pre-fail Always - 23194052 9 Power_On_Hours 0x0032 096 096 000Old_age Always - 3899 10 Spin_Retry_Count0x0013 100 100 097Pre-fail Always - 0 12 Power_Cycle_Count 0x0033 100 100 020Pre-fail Always - 108 187 Unknown_Attribute 0x0032 001 001 000Old_age Always - 17863 189 Unknown_Attribute 0x003a 100 100 000Old_age Always - 0 190 Unknown_Attribute 0x0022 070 057 045Old_age Always - 2689188364318 194 Temperature_Celsius 0x0022 030 043 000Old_age Always - 30 (Lifetime Min/Max 0/22) 195 Hardware_ECC_Recovered 0x001a 048 045 000Old_age Always - 2474070 197 Current_Pending_Sector 0x0012 100 100 000Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000Old_age Offline - 0 199 UDMA_CRC_Error_Count0x003e 200 200 000Old_age Always - 7 200 Multi_Zone_Error_Rate 0x 100 253 000Old_age Offline - 0 202 TA_Increase_Count 0x0032 100
Re: [2.6.21.1] SATA freeze
On Sat, May 12, 2007 at 12:48:59PM -0600, Robert Hancock wrote: > Fred Moyer wrote: > > I just joined the list today so apologies if this email breaks any email > > client post threading. > > I have been seeing similar errors on two different systems. I applied > > Robert's sata_nv patch posted to the list on May 5th, and approved today by > > Jeff Garzik. I've taken several steps to insure that this isn't a faulty > > cable or drive issue. This is running on a hp dl145g2. Here is my lspci, > > dmesg, and relevant kernel config sections: > > (snip) > > > ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen > > ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 > > in > > res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM > > violation) > > ata1: soft resetting port > > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > > ata1.00: configured for UDMA/100 > > ata1: EH complete > > This appears to be a different problem. Something is issuing SMART-related > commands (smartd or smartctl perhaps) which the drive seems to be reacting > strangely to. It apparently completed the command but never raised DRQ to > request any data being transferred even though we expected it to. Maybe > SMART is disabled on the drive and that's causing it to just toss these > commands? CCing linux-ide in case anyone knows what would cause this. I previously posted a near identical error to linux-ide. http://article.gmane.org/gmane.linux.ide/18375 Specifically, I could trigger it by running 'smartctl -d ata -S on /dev/sda' OR (s-S/o/). Same sata_nv controller, two different drives, many different cables. Reproducible over 7 systems [two different models of Tyan mobo] that I have. -- Robin Hugh Johnson Gentoo Linux Developer & Council Member E-Mail : [EMAIL PROTECTED] GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85 pgp1lg2M9qRYv.pgp Description: PGP signature
Re: [2.6.21.1] SATA freeze
Fred Moyer wrote: This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. It apparently completed the command but never raised DRQ to request any data being transferred even though we expected it to. Maybe SMART is disabled on the drive and that's causing it to just toss these commands? CCing linux-ide in case anyone knows what would cause this. Here's smartctl -a for this drive - same output for both sda and sdb. Smartd is currently running. Any advice appreciated. Previously on 2.6.15 I was seeing sdb remount as readonly under heavy i/o. I have not seen that issue yet with 2.6.21 (with Robert's patch from May 5th for sata_nv), but that occurrence of remounts read-only was infrequently, so that issue may be solved. app2 ~ # smartctl -a /dev/sda smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen Home page is http://smartmontools.sourceforge.net/ Device: ATA ST3808110AS Version: n/a Serial number: 5LR8895K Device type: disk Local Time is: Sat May 12 12:05:58 2007 PDT Device does not support SMART Error Counter logging not supported [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on'] Device does not support Self Test logging Sounds like SMART is likely disabled on that drive. You can try doing "smartctl -s on /dev/sda" and see if that will turn it on. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Robert Hancock wrote: Fred Moyer wrote: I just joined the list today so apologies if this email breaks any email client post threading. I have been seeing similar errors on two different systems. I applied Robert's sata_nv patch posted to the list on May 5th, and approved today by Jeff Garzik. I've taken several steps to insure that this isn't a faulty cable or drive issue. This is running on a hp dl145g2. Here is my lspci, dmesg, and relevant kernel config sections: (snip) ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation) ata1: soft resetting port ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: configured for UDMA/100 ata1: EH complete This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. It apparently completed the command but never raised DRQ to request any data being transferred even though we expected it to. Maybe SMART is disabled on the drive and that's causing it to just toss these commands? CCing linux-ide in case anyone knows what would cause this. Here's smartctl -a for this drive - same output for both sda and sdb. Smartd is currently running. Any advice appreciated. Previously on 2.6.15 I was seeing sdb remount as readonly under heavy i/o. I have not seen that issue yet with 2.6.21 (with Robert's patch from May 5th for sata_nv), but that occurrence of remounts read-only was infrequently, so that issue may be solved. app2 ~ # smartctl -a /dev/sda smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen Home page is http://smartmontools.sourceforge.net/ Device: ATA ST3808110AS Version: n/a Serial number: 5LR8895K Device type: disk Local Time is: Sat May 12 12:05:58 2007 PDT Device does not support SMART Error Counter logging not supported [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on'] Device does not support Self Test logging app2 ~ # ps aux | grep smart root 5227 0.0 0.0 2892 672 ?SMay11 0:00 /usr/sbin/smartd -p /var/run/smartd.pid root 19510 0.0 0.0 2648 648 pts/0S+ 12:07 0:00 grep --colour=auto smart - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Fred Moyer wrote: I just joined the list today so apologies if this email breaks any email client post threading. I have been seeing similar errors on two different systems. I applied Robert's sata_nv patch posted to the list on May 5th, and approved today by Jeff Garzik. I've taken several steps to insure that this isn't a faulty cable or drive issue. This is running on a hp dl145g2. Here is my lspci, dmesg, and relevant kernel config sections: (snip) ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation) ata1: soft resetting port ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: configured for UDMA/100 ata1: EH complete This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. It apparently completed the command but never raised DRQ to request any data being transferred even though we expected it to. Maybe SMART is disabled on the drive and that's causing it to just toss these commands? CCing linux-ide in case anyone knows what would cause this. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Gerhard Mack wrote: On Wed, 9 May 2007, Robert Hancock wrote: Gerhard Mack wrote: On Wed, 9 May 2007, Jeff Garzik wrote: Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? You're showing various flags set in the SError register, which suggests you're having SATA communication problems with the drive. A bad SATA cable or power problems would be a strong possibility. It really would be nice if we decoded these things more usefully for the user (same with the regular ATA errors, like drivers/ide does), but in general SError showing up as non-zero is a bad thing: 0x40 = "Handshake error: When set to one, this bit indicates that one or more R_ERR handshake response was received in response to frame transmission. Such errors may be the result of a CRC error detected by the recipient, a disparity or 10b/8b decoding error, or other error condition leading to a negative handshake on a transmitted frame." 0x180 = "Link Sequence Error: When set to one, this bit indicates that one or more Link state machine error conditions was encountered since the last time this bit was cleared. The Link Layer state machine defines the conditions under which the link layer detects an erroneous transition." and "Transport state transition error: When set to one, this bit indicates that an error has occurred in the transition from one state to another within the Transport layer since the last time this bit was cleared." Just out of curiosity how often is that bit cleared? I believe that is cleared only on error handling or controller reset, so it just means that it happened sometime since boot or the last libata error recovery. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Gerhard Mack wrote: On Wed, 9 May 2007, Robert Hancock wrote: Gerhard Mack wrote: On Wed, 9 May 2007, Jeff Garzik wrote: Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? You're showing various flags set in the SError register, which suggests you're having SATA communication problems with the drive. A bad SATA cable or power problems would be a strong possibility. It really would be nice if we decoded these things more usefully for the user (same with the regular ATA errors, like drivers/ide does), but in general SError showing up as non-zero is a bad thing: 0x40 = Handshake error: When set to one, this bit indicates that one or more R_ERR handshake response was received in response to frame transmission. Such errors may be the result of a CRC error detected by the recipient, a disparity or 10b/8b decoding error, or other error condition leading to a negative handshake on a transmitted frame. 0x180 = Link Sequence Error: When set to one, this bit indicates that one or more Link state machine error conditions was encountered since the last time this bit was cleared. The Link Layer state machine defines the conditions under which the link layer detects an erroneous transition. and Transport state transition error: When set to one, this bit indicates that an error has occurred in the transition from one state to another within the Transport layer since the last time this bit was cleared. Just out of curiosity how often is that bit cleared? I believe that is cleared only on error handling or controller reset, so it just means that it happened sometime since boot or the last libata error recovery. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Fred Moyer wrote: I just joined the list today so apologies if this email breaks any email client post threading. I have been seeing similar errors on two different systems. I applied Robert's sata_nv patch posted to the list on May 5th, and approved today by Jeff Garzik. I've taken several steps to insure that this isn't a faulty cable or drive issue. This is running on a hp dl145g2. Here is my lspci, dmesg, and relevant kernel config sections: (snip) ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation) ata1: soft resetting port ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: configured for UDMA/100 ata1: EH complete This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. It apparently completed the command but never raised DRQ to request any data being transferred even though we expected it to. Maybe SMART is disabled on the drive and that's causing it to just toss these commands? CCing linux-ide in case anyone knows what would cause this. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Robert Hancock wrote: Fred Moyer wrote: I just joined the list today so apologies if this email breaks any email client post threading. I have been seeing similar errors on two different systems. I applied Robert's sata_nv patch posted to the list on May 5th, and approved today by Jeff Garzik. I've taken several steps to insure that this isn't a faulty cable or drive issue. This is running on a hp dl145g2. Here is my lspci, dmesg, and relevant kernel config sections: (snip) ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation) ata1: soft resetting port ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: configured for UDMA/100 ata1: EH complete This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. It apparently completed the command but never raised DRQ to request any data being transferred even though we expected it to. Maybe SMART is disabled on the drive and that's causing it to just toss these commands? CCing linux-ide in case anyone knows what would cause this. Here's smartctl -a for this drive - same output for both sda and sdb. Smartd is currently running. Any advice appreciated. Previously on 2.6.15 I was seeing sdb remount as readonly under heavy i/o. I have not seen that issue yet with 2.6.21 (with Robert's patch from May 5th for sata_nv), but that occurrence of remounts read-only was infrequently, so that issue may be solved. app2 ~ # smartctl -a /dev/sda smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen Home page is http://smartmontools.sourceforge.net/ Device: ATA ST3808110AS Version: n/a Serial number: 5LR8895K Device type: disk Local Time is: Sat May 12 12:05:58 2007 PDT Device does not support SMART Error Counter logging not supported [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on'] Device does not support Self Test logging app2 ~ # ps aux | grep smart root 5227 0.0 0.0 2892 672 ?SMay11 0:00 /usr/sbin/smartd -p /var/run/smartd.pid root 19510 0.0 0.0 2648 648 pts/0S+ 12:07 0:00 grep --colour=auto smart - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Fred Moyer wrote: This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. It apparently completed the command but never raised DRQ to request any data being transferred even though we expected it to. Maybe SMART is disabled on the drive and that's causing it to just toss these commands? CCing linux-ide in case anyone knows what would cause this. Here's smartctl -a for this drive - same output for both sda and sdb. Smartd is currently running. Any advice appreciated. Previously on 2.6.15 I was seeing sdb remount as readonly under heavy i/o. I have not seen that issue yet with 2.6.21 (with Robert's patch from May 5th for sata_nv), but that occurrence of remounts read-only was infrequently, so that issue may be solved. app2 ~ # smartctl -a /dev/sda smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen Home page is http://smartmontools.sourceforge.net/ Device: ATA ST3808110AS Version: n/a Serial number: 5LR8895K Device type: disk Local Time is: Sat May 12 12:05:58 2007 PDT Device does not support SMART Error Counter logging not supported [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on'] Device does not support Self Test logging Sounds like SMART is likely disabled on that drive. You can try doing smartctl -s on /dev/sda and see if that will turn it on. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Robert Hancock wrote: Fred Moyer wrote: This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. It apparently completed the command but never raised DRQ to request any data being transferred even though we expected it to. Maybe SMART is disabled on the drive and that's causing it to just toss these commands? CCing linux-ide in case anyone knows what would cause this. Here's smartctl -a for this drive - same output for both sda and sdb. Smartd is currently running. Any advice appreciated. Sounds like SMART is likely disabled on that drive. You can try doing smartctl -s on /dev/sda and see if that will turn it on. Sorry - that last post of mine was brain dead. Here's the one with (hopefully) useful data. app2 ~ # smartctl -d ata -a /dev/sda smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: ST3808110AS Serial Number:5LR8895K Firmware Version: 3.AJJ User Capacity:80,026,361,856 bytes Device is:Not in smartctl database [for details use: -P showall] ATA Version is: 7 ATA Standard is: Exact ATA specification draft version not indicated Local Time is:Sat May 12 18:49:06 2007 PDT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 431) seconds. Offline data collection capabilities:(0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities:(0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability:(0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time:( 2) minutes. Extended self-test routine recommended polling time:( 27) minutes. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 100 253 006Pre-fail Always - 0 3 Spin_Up_Time0x0002 097 094 000Old_age Always - 0 4 Start_Stop_Count0x0033 100 100 020Pre-fail Always - 41 5 Reallocated_Sector_Ct 0x0033 098 098 036Pre-fail Always - 80 7 Seek_Error_Rate 0x000f 073 060 030Pre-fail Always - 23194052 9 Power_On_Hours 0x0032 096 096 000Old_age Always - 3899 10 Spin_Retry_Count0x0013 100 100 097Pre-fail Always - 0 12 Power_Cycle_Count 0x0033 100 100 020Pre-fail Always - 108 187 Unknown_Attribute 0x0032 001 001 000Old_age Always - 17863 189 Unknown_Attribute 0x003a 100 100 000Old_age Always - 0 190 Unknown_Attribute 0x0022 070 057 045Old_age Always - 2689188364318 194 Temperature_Celsius 0x0022 030 043 000Old_age Always - 30 (Lifetime Min/Max 0/22) 195 Hardware_ECC_Recovered 0x001a 048 045 000Old_age Always - 2474070 197 Current_Pending_Sector 0x0012 100 100 000Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000Old_age Offline - 0 199 UDMA_CRC_Error_Count0x003e 200 200 000Old_age Always - 7 200 Multi_Zone_Error_Rate 0x 100 253 000Old_age Offline - 0 202 TA_Increase_Count 0x0032 100
Re: Re: [2.6.21.1] SATA freeze
Robert Hancock wrote: >Gerhard Mack wrote: >> On Wed, 9 May 2007, Jeff Garzik wrote: >>> Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? > You're showing various flags set in the SError register, which > suggests you're having SATA communication problems with the drive. A > bad SATA cable or power problems would be a strong possibility. I just joined the list today so apologies if this email breaks any email client post threading. I have been seeing similar errors on two different systems. I applied Robert's sata_nv patch posted to the list on May 5th, and approved today by Jeff Garzik. I've taken several steps to insure that this isn't a faulty cable or drive issue. This is running on a hp dl145g2. Here is my lspci, dmesg, and relevant kernel config sections: Linux version 2.6.21-gentoo ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1)) #6 SMP Sun May 6 16:44:40 PDT 2007 Command line: root=/dev/sda2 BIOS-provided physical RAM map: BIOS-e820: - 00098800 (usable) BIOS-e820: 00098800 - 000a (reserved) BIOS-e820: 000c2000 - 0010 (reserved) BIOS-e820: 0010 - bff2 (usable) BIOS-e820: bff2 - bff29000 (ACPI data) BIOS-e820: bff29000 - bff8 (ACPI NVS) BIOS-e820: bff8 - c000 (reserved) BIOS-e820: d800 - d8000400 (reserved) BIOS-e820: d8001000 - d8001400 (reserved) BIOS-e820: e000 - f000 (reserved) BIOS-e820: fec0 - fec00400 (reserved) BIOS-e820: fee0 - fee01000 (reserved) BIOS-e820: fff8 - 0001 (reserved) BIOS-e820: 0001 - 00014000 (usable) Entering add_active_range(0, 0, 152) 0 entries of 256 used Entering add_active_range(0, 256, 786208) 1 entries of 256 used Entering add_active_range(0, 1048576, 1310720) 2 entries of 256 used end_pfn_map = 1310720 DMI present. Entering add_active_range(0, 0, 152) 0 entries of 256 used Entering add_active_range(0, 256, 786208) 1 entries of 256 used Entering add_active_range(0, 1048576, 1310720) 2 entries of 256 used Zone PFN ranges: DMA 0 -> 4096 DMA324096 -> 1048576 Normal1048576 -> 1310720 early_node_map[3] active PFN ranges 0:0 -> 152 0: 256 -> 786208 0: 1048576 -> 1310720 On node 0 totalpages: 1048248 DMA zone: 56 pages used for memmap DMA zone: 1138 pages reserved DMA zone: 2798 pages, LIFO batch:0 DMA32 zone: 14280 pages used for memmap DMA32 zone: 767832 pages, LIFO batch:31 Normal zone: 3584 pages used for memmap Normal zone: 258560 pages, LIFO batch:31 Intel MultiProcessor Specification v1.4 MPTABLE: OEM ID: AMD MPTABLE: Product ID: HAMMER MPTABLE: APIC at: 0xFEE0 Processor #0 (Bootup-CPU) Processor #1 I/O APIC #2 at 0xFEC0. I/O APIC #3 at 0xD800. I/O APIC #4 at 0xD8001000. Setting APIC routing to flat Processors: 2 Nosave address range: 00098000 - 00099000 Nosave address range: 00099000 - 000a Nosave address range: 000a - 000c2000 Nosave address range: 000c2000 - 0010 Nosave address range: bff2 - bff29000 Nosave address range: bff29000 - bff8 Nosave address range: bff8 - c000 Nosave address range: c000 - d800 Nosave address range: d800 - d8001000 Nosave address range: d8001000 - e000 Nosave address range: e000 - f000 Nosave address range: f000 - fec0 Nosave address range: fec0 - fee0 Nosave address range: fee0 - fee01000 Nosave address range: fee01000 - fff8 Nosave address range: fff8 - 0001 Allocating PCI resources starting at c200 (gap: c000:1800) PERCPU: Allocating 36608 bytes of per cpu data Built 1 zonelists. Total pages: 1029190 Kernel command line: root=/dev/sda2 Initializing CPU#0 PID hash table entries: 4096 (order: 12, 32768 bytes) time.c: Detected 2009.287 MHz processor. Console: colour VGA+ 80x25 Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) Checking aperture... CPU 0: aperture @ 233e00 size 32 MB
Re: [2.6.21.1] SATA freeze
On Wed, 9 May 2007, Robert Hancock wrote: > Gerhard Mack wrote: > > On Wed, 9 May 2007, Jeff Garzik wrote: > > > Gerhard Mack wrote: > > > > May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 > > > > SErr > > > > 0x180 action 0x2 frozen > > > > May 9 14:51:35 mgerhard kernel: ata1.00: cmd > > > > 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out > > > > May 9 14:51:35 mgerhard kernel: res > > > > 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) > > > > May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please > > > > be > > > > patient (Status 0xd0) > > > > > > > > Anything I can do to figgure out what's causing this? > > You're showing various flags set in the SError register, which suggests you're > having SATA communication problems with the drive. A bad SATA cable or power > problems would be a strong possibility. > > It really would be nice if we decoded these things more usefully for the user > (same with the regular ATA errors, like drivers/ide does), but in general > SError showing up as non-zero is a bad thing: > > 0x40 = "Handshake error: When set to one, this bit indicates that one or > more R_ERR handshake response was received in response to frame transmission. > Such errors may be the result of a CRC error detected by the recipient, a > disparity or 10b/8b decoding error, or other error condition leading to a > negative handshake on a transmitted frame." > > 0x180 = "Link Sequence Error: When set to one, this bit indicates that one > or more Link state machine error conditions was encountered since the last > time this bit was cleared. The Link Layer state machine defines the conditions > under which the link layer detects an erroneous transition." > > and > > "Transport state transition error: When set to one, this bit indicates that an > error has occurred in the transition from one state to another within the > Transport layer since the last time this bit was cleared." Just out of curiosity how often is that bit cleared? Gerhard -- Gerhard Mack [EMAIL PROTECTED] <>< As a computer I find your faith in technology amusing. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
On Thu, 10 May 2007, Mikael Pettersson wrote: > Date: Thu, 10 May 2007 10:51:57 +0200 > From: Mikael Pettersson <[EMAIL PROTECTED]> > To: Gerhard Mack <[EMAIL PROTECTED]> > Cc: Jeff Garzik <[EMAIL PROTECTED]>, linux-kernel@vger.kernel.org > Subject: Re: [2.6.21.1] SATA freeze > > Gerhard Mack writes: > > On Wed, 9 May 2007, Jeff Garzik wrote: > > > Gerhard Mack wrote: > > > > May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 > SErr > > > > 0x180 action 0x2 frozen > > > > May 9 14:51:35 mgerhard kernel: ata1.00: cmd > > > > 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out > > > > May 9 14:51:35 mgerhard kernel: res > > > > 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) > > > > May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please > be > > > > patient (Status 0xd0) > > > > > > > > Anything I can do to figgure out what's causing this? > > > > > > Provide full lspci, dmesg, kernel config? > > > > > Done. > > Your second boot (warm or cold?) Warm boot. > > > May 9 14:43:07 mgerhard kernel: klogd 1.4.1#20, log source = /proc/kmsg > started. > > May 9 14:43:07 mgerhard kernel: Linux version 2.6.21.1 ([EMAIL > PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 > SMP PREEMPT Wed May 2 20:08:35 EDT 2007 > > May 9 14:43:07 mgerhard kernel: Command line: root=/dev/sda3 ro > > worked fine until ReiserFS's journal replay caused a single SATA exception: > > > May 9 14:43:07 mgerhard kernel: ReiserFS: sda3: There were 7 uncompleted > unlinks/truncates. Completed > > May 9 14:43:07 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 > SErr 0x40 action 0x2 > > May 9 14:43:07 mgerhard kernel: ata1.00: (BMDMA stat 0x25) > > May 9 14:43:07 mgerhard kernel: ata1.00: cmd > 35/00:58:20:4d:23/00:01:00:00:00/e0 tag 0 cdb 0x0 data 176128 out > > May 9 14:43:07 mgerhard kernel: res > 51/84:28:50:4d:23/84:01:00:00:00/e0 Emask 0x10 (ATA bus error) > > May 9 14:43:07 mgerhard kernel: ata1: soft resetting port > > May 9 14:43:07 mgerhard kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 > SControl 300) > > May 9 14:43:07 mgerhard kernel: ata1.00: configured for UDMA/100 > > May 9 14:43:07 mgerhard kernel: ata1: EH complete > > May 9 14:43:07 mgerhard kernel: SCSI device sda: 488397168 512-byte hdwr > sectors (250059 MB) > > Shortly thereafter you loaded a proprietary module Oops thought I killed that. > > > May 9 14:43:17 mgerhard kernel: nvidia: module license 'NVIDIA' taints > kernel. > > May 9 14:43:17 mgerhard kernel: ACPI: PCI Interrupt Link [APC7] enabled > at IRQ 16 > > May 9 14:43:17 mgerhard kernel: ACPI: PCI Interrupt :00:05.0[A] -> > Link [APC7] -> GSI 16 (level, low) -> IRQ 16 > > May 9 14:43:17 mgerhard kernel: PCI: Setting latency timer of device > :00:05.0 to 64 > > May 9 14:43:17 mgerhard kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel > Module 1.0-9746 Fri Dec 15 10:19:35 PST 2006 > > and immediately there's a large number of SATA exceptions: > > > May 9 14:44:37 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 > SErr 0x40 action 0x2 > > May 9 14:44:37 mgerhard kernel: ata1.00: (BMDMA stat 0x25) > > May 9 14:44:37 mgerhard kernel: ata1.00: cmd > 35/00:00:b0:53:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out > > May 9 14:44:37 mgerhard kernel: res > 51/84:60:50:56:c8/84:01:09:00:00/e0 Emask 0x10 (ATA bus error) > > May 9 14:44:37 mgerhard kernel: ata1: soft resetting port > > May 9 14:44:37 mgerhard kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 > SControl 300) > > May 9 14:44:37 mgerhard kernel: ata1.00: configured for UDMA/100 > (repeated) > > Please try a cold boot (so the HW is in a pristine state) without > ever loading the nvidia module. Cold boot cleared the drive problems. Nvidia loaded or not has no affect on it at this point. Thanks for the help. Gerhard -- Gerhard Mack [EMAIL PROTECTED] <>< As a computer I find your faith in technology amusing. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Gerhard Mack writes: > On Wed, 9 May 2007, Jeff Garzik wrote: > > Gerhard Mack wrote: > > > May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 > > > SErr > > > 0x180 action 0x2 frozen > > > May 9 14:51:35 mgerhard kernel: ata1.00: cmd > > > 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out > > > May 9 14:51:35 mgerhard kernel: res > > > 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) > > > May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be > > > patient (Status 0xd0) > > > > > > Anything I can do to figgure out what's causing this? > > > > Provide full lspci, dmesg, kernel config? > > > Done. Your second boot (warm or cold?) > May 9 14:43:07 mgerhard kernel: klogd 1.4.1#20, log source = /proc/kmsg > started. > May 9 14:43:07 mgerhard kernel: Linux version 2.6.21.1 ([EMAIL PROTECTED]) > (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP PREEMPT > Wed May 2 20:08:35 EDT 2007 > May 9 14:43:07 mgerhard kernel: Command line: root=/dev/sda3 ro worked fine until ReiserFS's journal replay caused a single SATA exception: > May 9 14:43:07 mgerhard kernel: ReiserFS: sda3: There were 7 uncompleted > unlinks/truncates. Completed > May 9 14:43:07 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr > 0x40 action 0x2 > May 9 14:43:07 mgerhard kernel: ata1.00: (BMDMA stat 0x25) > May 9 14:43:07 mgerhard kernel: ata1.00: cmd > 35/00:58:20:4d:23/00:01:00:00:00/e0 tag 0 cdb 0x0 data 176128 out > May 9 14:43:07 mgerhard kernel: res > 51/84:28:50:4d:23/84:01:00:00:00/e0 Emask 0x10 (ATA bus error) > May 9 14:43:07 mgerhard kernel: ata1: soft resetting port > May 9 14:43:07 mgerhard kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 > SControl 300) > May 9 14:43:07 mgerhard kernel: ata1.00: configured for UDMA/100 > May 9 14:43:07 mgerhard kernel: ata1: EH complete > May 9 14:43:07 mgerhard kernel: SCSI device sda: 488397168 512-byte hdwr > sectors (250059 MB) Shortly thereafter you loaded a proprietary module > May 9 14:43:17 mgerhard kernel: nvidia: module license 'NVIDIA' taints > kernel. > May 9 14:43:17 mgerhard kernel: ACPI: PCI Interrupt Link [APC7] enabled at > IRQ 16 > May 9 14:43:17 mgerhard kernel: ACPI: PCI Interrupt :00:05.0[A] -> Link > [APC7] -> GSI 16 (level, low) -> IRQ 16 > May 9 14:43:17 mgerhard kernel: PCI: Setting latency timer of device > :00:05.0 to 64 > May 9 14:43:17 mgerhard kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel > Module 1.0-9746 Fri Dec 15 10:19:35 PST 2006 and immediately there's a large number of SATA exceptions: > May 9 14:44:37 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr > 0x40 action 0x2 > May 9 14:44:37 mgerhard kernel: ata1.00: (BMDMA stat 0x25) > May 9 14:44:37 mgerhard kernel: ata1.00: cmd > 35/00:00:b0:53:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out > May 9 14:44:37 mgerhard kernel: res > 51/84:60:50:56:c8/84:01:09:00:00/e0 Emask 0x10 (ATA bus error) > May 9 14:44:37 mgerhard kernel: ata1: soft resetting port > May 9 14:44:37 mgerhard kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 > SControl 300) > May 9 14:44:37 mgerhard kernel: ata1.00: configured for UDMA/100 (repeated) Please try a cold boot (so the HW is in a pristine state) without ever loading the nvidia module. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Gerhard Mack writes: On Wed, 9 May 2007, Jeff Garzik wrote: Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? Provide full lspci, dmesg, kernel config? Done. Your second boot (warm or cold?) May 9 14:43:07 mgerhard kernel: klogd 1.4.1#20, log source = /proc/kmsg started. May 9 14:43:07 mgerhard kernel: Linux version 2.6.21.1 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP PREEMPT Wed May 2 20:08:35 EDT 2007 May 9 14:43:07 mgerhard kernel: Command line: root=/dev/sda3 ro worked fine until ReiserFS's journal replay caused a single SATA exception: May 9 14:43:07 mgerhard kernel: ReiserFS: sda3: There were 7 uncompleted unlinks/truncates. Completed May 9 14:43:07 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x40 action 0x2 May 9 14:43:07 mgerhard kernel: ata1.00: (BMDMA stat 0x25) May 9 14:43:07 mgerhard kernel: ata1.00: cmd 35/00:58:20:4d:23/00:01:00:00:00/e0 tag 0 cdb 0x0 data 176128 out May 9 14:43:07 mgerhard kernel: res 51/84:28:50:4d:23/84:01:00:00:00/e0 Emask 0x10 (ATA bus error) May 9 14:43:07 mgerhard kernel: ata1: soft resetting port May 9 14:43:07 mgerhard kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) May 9 14:43:07 mgerhard kernel: ata1.00: configured for UDMA/100 May 9 14:43:07 mgerhard kernel: ata1: EH complete May 9 14:43:07 mgerhard kernel: SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) Shortly thereafter you loaded a proprietary module May 9 14:43:17 mgerhard kernel: nvidia: module license 'NVIDIA' taints kernel. May 9 14:43:17 mgerhard kernel: ACPI: PCI Interrupt Link [APC7] enabled at IRQ 16 May 9 14:43:17 mgerhard kernel: ACPI: PCI Interrupt :00:05.0[A] - Link [APC7] - GSI 16 (level, low) - IRQ 16 May 9 14:43:17 mgerhard kernel: PCI: Setting latency timer of device :00:05.0 to 64 May 9 14:43:17 mgerhard kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 1.0-9746 Fri Dec 15 10:19:35 PST 2006 and immediately there's a large number of SATA exceptions: May 9 14:44:37 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x40 action 0x2 May 9 14:44:37 mgerhard kernel: ata1.00: (BMDMA stat 0x25) May 9 14:44:37 mgerhard kernel: ata1.00: cmd 35/00:00:b0:53:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:44:37 mgerhard kernel: res 51/84:60:50:56:c8/84:01:09:00:00/e0 Emask 0x10 (ATA bus error) May 9 14:44:37 mgerhard kernel: ata1: soft resetting port May 9 14:44:37 mgerhard kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) May 9 14:44:37 mgerhard kernel: ata1.00: configured for UDMA/100 (repeated) Please try a cold boot (so the HW is in a pristine state) without ever loading the nvidia module. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
On Thu, 10 May 2007, Mikael Pettersson wrote: Date: Thu, 10 May 2007 10:51:57 +0200 From: Mikael Pettersson [EMAIL PROTECTED] To: Gerhard Mack [EMAIL PROTECTED] Cc: Jeff Garzik [EMAIL PROTECTED], linux-kernel@vger.kernel.org Subject: Re: [2.6.21.1] SATA freeze Gerhard Mack writes: On Wed, 9 May 2007, Jeff Garzik wrote: Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? Provide full lspci, dmesg, kernel config? Done. Your second boot (warm or cold?) Warm boot. May 9 14:43:07 mgerhard kernel: klogd 1.4.1#20, log source = /proc/kmsg started. May 9 14:43:07 mgerhard kernel: Linux version 2.6.21.1 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP PREEMPT Wed May 2 20:08:35 EDT 2007 May 9 14:43:07 mgerhard kernel: Command line: root=/dev/sda3 ro worked fine until ReiserFS's journal replay caused a single SATA exception: May 9 14:43:07 mgerhard kernel: ReiserFS: sda3: There were 7 uncompleted unlinks/truncates. Completed May 9 14:43:07 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x40 action 0x2 May 9 14:43:07 mgerhard kernel: ata1.00: (BMDMA stat 0x25) May 9 14:43:07 mgerhard kernel: ata1.00: cmd 35/00:58:20:4d:23/00:01:00:00:00/e0 tag 0 cdb 0x0 data 176128 out May 9 14:43:07 mgerhard kernel: res 51/84:28:50:4d:23/84:01:00:00:00/e0 Emask 0x10 (ATA bus error) May 9 14:43:07 mgerhard kernel: ata1: soft resetting port May 9 14:43:07 mgerhard kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) May 9 14:43:07 mgerhard kernel: ata1.00: configured for UDMA/100 May 9 14:43:07 mgerhard kernel: ata1: EH complete May 9 14:43:07 mgerhard kernel: SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB) Shortly thereafter you loaded a proprietary module Oops thought I killed that. May 9 14:43:17 mgerhard kernel: nvidia: module license 'NVIDIA' taints kernel. May 9 14:43:17 mgerhard kernel: ACPI: PCI Interrupt Link [APC7] enabled at IRQ 16 May 9 14:43:17 mgerhard kernel: ACPI: PCI Interrupt :00:05.0[A] - Link [APC7] - GSI 16 (level, low) - IRQ 16 May 9 14:43:17 mgerhard kernel: PCI: Setting latency timer of device :00:05.0 to 64 May 9 14:43:17 mgerhard kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 1.0-9746 Fri Dec 15 10:19:35 PST 2006 and immediately there's a large number of SATA exceptions: May 9 14:44:37 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x40 action 0x2 May 9 14:44:37 mgerhard kernel: ata1.00: (BMDMA stat 0x25) May 9 14:44:37 mgerhard kernel: ata1.00: cmd 35/00:00:b0:53:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:44:37 mgerhard kernel: res 51/84:60:50:56:c8/84:01:09:00:00/e0 Emask 0x10 (ATA bus error) May 9 14:44:37 mgerhard kernel: ata1: soft resetting port May 9 14:44:37 mgerhard kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) May 9 14:44:37 mgerhard kernel: ata1.00: configured for UDMA/100 (repeated) Please try a cold boot (so the HW is in a pristine state) without ever loading the nvidia module. Cold boot cleared the drive problems. Nvidia loaded or not has no affect on it at this point. Thanks for the help. Gerhard -- Gerhard Mack [EMAIL PROTECTED] As a computer I find your faith in technology amusing. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
On Wed, 9 May 2007, Robert Hancock wrote: Gerhard Mack wrote: On Wed, 9 May 2007, Jeff Garzik wrote: Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? You're showing various flags set in the SError register, which suggests you're having SATA communication problems with the drive. A bad SATA cable or power problems would be a strong possibility. It really would be nice if we decoded these things more usefully for the user (same with the regular ATA errors, like drivers/ide does), but in general SError showing up as non-zero is a bad thing: 0x40 = Handshake error: When set to one, this bit indicates that one or more R_ERR handshake response was received in response to frame transmission. Such errors may be the result of a CRC error detected by the recipient, a disparity or 10b/8b decoding error, or other error condition leading to a negative handshake on a transmitted frame. 0x180 = Link Sequence Error: When set to one, this bit indicates that one or more Link state machine error conditions was encountered since the last time this bit was cleared. The Link Layer state machine defines the conditions under which the link layer detects an erroneous transition. and Transport state transition error: When set to one, this bit indicates that an error has occurred in the transition from one state to another within the Transport layer since the last time this bit was cleared. Just out of curiosity how often is that bit cleared? Gerhard -- Gerhard Mack [EMAIL PROTECTED] As a computer I find your faith in technology amusing. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [2.6.21.1] SATA freeze
Robert Hancock wrote: Gerhard Mack wrote: On Wed, 9 May 2007, Jeff Garzik wrote: Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? You're showing various flags set in the SError register, which suggests you're having SATA communication problems with the drive. A bad SATA cable or power problems would be a strong possibility. I just joined the list today so apologies if this email breaks any email client post threading. I have been seeing similar errors on two different systems. I applied Robert's sata_nv patch posted to the list on May 5th, and approved today by Jeff Garzik. I've taken several steps to insure that this isn't a faulty cable or drive issue. This is running on a hp dl145g2. Here is my lspci, dmesg, and relevant kernel config sections: Linux version 2.6.21-gentoo ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1)) #6 SMP Sun May 6 16:44:40 PDT 2007 Command line: root=/dev/sda2 BIOS-provided physical RAM map: BIOS-e820: - 00098800 (usable) BIOS-e820: 00098800 - 000a (reserved) BIOS-e820: 000c2000 - 0010 (reserved) BIOS-e820: 0010 - bff2 (usable) BIOS-e820: bff2 - bff29000 (ACPI data) BIOS-e820: bff29000 - bff8 (ACPI NVS) BIOS-e820: bff8 - c000 (reserved) BIOS-e820: d800 - d8000400 (reserved) BIOS-e820: d8001000 - d8001400 (reserved) BIOS-e820: e000 - f000 (reserved) BIOS-e820: fec0 - fec00400 (reserved) BIOS-e820: fee0 - fee01000 (reserved) BIOS-e820: fff8 - 0001 (reserved) BIOS-e820: 0001 - 00014000 (usable) Entering add_active_range(0, 0, 152) 0 entries of 256 used Entering add_active_range(0, 256, 786208) 1 entries of 256 used Entering add_active_range(0, 1048576, 1310720) 2 entries of 256 used end_pfn_map = 1310720 DMI present. Entering add_active_range(0, 0, 152) 0 entries of 256 used Entering add_active_range(0, 256, 786208) 1 entries of 256 used Entering add_active_range(0, 1048576, 1310720) 2 entries of 256 used Zone PFN ranges: DMA 0 - 4096 DMA324096 - 1048576 Normal1048576 - 1310720 early_node_map[3] active PFN ranges 0:0 - 152 0: 256 - 786208 0: 1048576 - 1310720 On node 0 totalpages: 1048248 DMA zone: 56 pages used for memmap DMA zone: 1138 pages reserved DMA zone: 2798 pages, LIFO batch:0 DMA32 zone: 14280 pages used for memmap DMA32 zone: 767832 pages, LIFO batch:31 Normal zone: 3584 pages used for memmap Normal zone: 258560 pages, LIFO batch:31 Intel MultiProcessor Specification v1.4 MPTABLE: OEM ID: AMD MPTABLE: Product ID: HAMMER MPTABLE: APIC at: 0xFEE0 Processor #0 (Bootup-CPU) Processor #1 I/O APIC #2 at 0xFEC0. I/O APIC #3 at 0xD800. I/O APIC #4 at 0xD8001000. Setting APIC routing to flat Processors: 2 Nosave address range: 00098000 - 00099000 Nosave address range: 00099000 - 000a Nosave address range: 000a - 000c2000 Nosave address range: 000c2000 - 0010 Nosave address range: bff2 - bff29000 Nosave address range: bff29000 - bff8 Nosave address range: bff8 - c000 Nosave address range: c000 - d800 Nosave address range: d800 - d8001000 Nosave address range: d8001000 - e000 Nosave address range: e000 - f000 Nosave address range: f000 - fec0 Nosave address range: fec0 - fee0 Nosave address range: fee0 - fee01000 Nosave address range: fee01000 - fff8 Nosave address range: fff8 - 0001 Allocating PCI resources starting at c200 (gap: c000:1800) PERCPU: Allocating 36608 bytes of per cpu data Built 1 zonelists. Total pages: 1029190 Kernel command line: root=/dev/sda2 Initializing CPU#0 PID hash table entries: 4096 (order: 12, 32768 bytes) time.c: Detected 2009.287 MHz processor. Console: colour VGA+ 80x25 Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) Checking aperture... CPU 0: aperture @ 233e00 size 32 MB Aperture too small (32 MB) No AGP bridge found Your BIOS
Re: [2.6.21.1] SATA freeze
Gerhard Mack wrote: On Wed, 9 May 2007, Jeff Garzik wrote: Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? You're showing various flags set in the SError register, which suggests you're having SATA communication problems with the drive. A bad SATA cable or power problems would be a strong possibility. It really would be nice if we decoded these things more usefully for the user (same with the regular ATA errors, like drivers/ide does), but in general SError showing up as non-zero is a bad thing: 0x40 = "Handshake error: When set to one, this bit indicates that one or more R_ERR handshake response was received in response to frame transmission. Such errors may be the result of a CRC error detected by the recipient, a disparity or 10b/8b decoding error, or other error condition leading to a negative handshake on a transmitted frame." 0x180 = "Link Sequence Error: When set to one, this bit indicates that one or more Link state machine error conditions was encountered since the last time this bit was cleared. The Link Layer state machine defines the conditions under which the link layer detects an erroneous transition." and "Transport state transition error: When set to one, this bit indicates that an error has occurred in the transition from one state to another within the Transport layer since the last time this bit was cleared." -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Gerhard Mack wrote: > On Wed, 9 May 2007, Jeff Garzik wrote: >> Gerhard Mack wrote: >>> May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr >>> 0x180 action 0x2 frozen >>> May 9 14:51:35 mgerhard kernel: ata1.00: cmd >>> 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out >>> May 9 14:51:35 mgerhard kernel: res >>> 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) >>> May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be >>> patient (Status 0xd0) >>> >>> Anything I can do to figgure out what's causing this? >> Provide full lspci, dmesg, kernel config? >> > Done. > You could try: pci=nomsi (kernel option) and/or adma=0 (module option for sata_nv) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? Provide full lspci, dmesg, kernel config? Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[2.6.21.1] SATA freeze
May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? Gerhard -- Gerhard Mack [EMAIL PROTECTED] <>< As a computer I find your faith in technology amusing. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[2.6.21.1] SATA freeze
May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? Gerhard -- Gerhard Mack [EMAIL PROTECTED] As a computer I find your faith in technology amusing. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? Provide full lspci, dmesg, kernel config? Jeff - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Gerhard Mack wrote: On Wed, 9 May 2007, Jeff Garzik wrote: Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? Provide full lspci, dmesg, kernel config? Done. You could try: pci=nomsi (kernel option) and/or adma=0 (module option for sata_nv) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Gerhard Mack wrote: On Wed, 9 May 2007, Jeff Garzik wrote: Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? You're showing various flags set in the SError register, which suggests you're having SATA communication problems with the drive. A bad SATA cable or power problems would be a strong possibility. It really would be nice if we decoded these things more usefully for the user (same with the regular ATA errors, like drivers/ide does), but in general SError showing up as non-zero is a bad thing: 0x40 = Handshake error: When set to one, this bit indicates that one or more R_ERR handshake response was received in response to frame transmission. Such errors may be the result of a CRC error detected by the recipient, a disparity or 10b/8b decoding error, or other error condition leading to a negative handshake on a transmitted frame. 0x180 = Link Sequence Error: When set to one, this bit indicates that one or more Link state machine error conditions was encountered since the last time this bit was cleared. The Link Layer state machine defines the conditions under which the link layer detects an erroneous transition. and Transport state transition error: When set to one, this bit indicates that an error has occurred in the transition from one state to another within the Transport layer since the last time this bit was cleared. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/