sata_nv + ADMA + Samsung disk problem

2007-08-08 Thread Gabor Gombas
Hi,

Since upgrading from 2.6.20 to 2.6.22.1, I have had problems with
Samsung disks. Sometimes the disks stall for about half a minute, and
then these messages appear in the logs:

Aug  6 20:10:11 twister kernel: ata7: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
Aug  6 20:10:12 twister kernel: ata7: CPB 0: ctl_flags 0x9, resp_flags 0x0
Aug  6 20:10:12 twister kernel: ata7: timeout waiting for ADMA IDLE, stat=0x400
Aug  6 20:10:12 twister kernel: ata7: timeout waiting for ADMA LEGACY, stat=0x400
Aug  6 20:10:12 twister kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug  6 20:10:12 twister kernel: ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Aug  6 20:10:12 twister kernel:  res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug  6 20:10:12 twister kernel: ata7: soft resetting port
Aug  6 20:10:12 twister kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug  6 20:10:12 twister kernel: ata7.00: configured for UDMA/133
Aug  6 20:10:12 twister kernel: ata7: EH complete
Aug  6 20:10:12 twister kernel: sd 6:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
Aug  6 20:10:12 twister kernel: sd 6:0:0:0: [sdc] Write Protect is off
Aug  6 20:10:12 twister kernel: sd 6:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Aug  6 20:10:12 twister kernel: sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug  6 20:20:25 twister kernel: ata8: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
Aug  6 20:20:25 twister kernel: ata8: CPB 0: ctl_flags 0x9, resp_flags 0x0
Aug  6 20:20:25 twister kernel: ata8: timeout waiting for ADMA IDLE, stat=0x400
Aug  6 20:20:25 twister kernel: ata8: timeout waiting for ADMA LEGACY, stat=0x400
Aug  6 20:20:25 twister kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug  6 20:20:25 twister kernel: ata8.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Aug  6 20:20:25 twister kernel:  res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug  6 20:20:25 twister kernel: ata8: soft resetting port
Aug  6 20:20:25 twister kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug  6 20:20:25 twister kernel: ata8.00: configured for UDMA/133
Aug  6 20:20:25 twister kernel: ata8: EH complete
Aug  6 20:20:25 twister kernel: sd 7:0:0:0: [sdd] 488397168 512-byte hardware sectors (250059 MB)
Aug  6 20:20:25 twister kernel: sd 7:0:0:0: [sdd] Write Protect is off
Aug  6 20:20:25 twister kernel: sd 7:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Aug  6 20:20:25 twister kernel: sd 7:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
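
For reference, the "SStatus 123" value in the "SATA link up" lines can be read field by field. The sketch below assumes the value is printed in hexadecimal and uses the SStatus layout from the Serial ATA specification (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). The function name and the lookup tables are illustrative only and cover just the common values.

```python
# Decode a SATA SStatus value as it appears in the log, e.g. "SStatus 123".
# Assumption: the log prints the register in hex. The tables are incomplete
# on purpose; anything not listed is reported as "reserved".

DET = {
    0x0: "no device detected",
    0x1: "device detected, no PHY communication",
    0x3: "device detected, PHY communication established",
    0x4: "PHY offline",
}
SPD = {
    0x0: "no negotiated speed",
    0x1: "Gen1 (1.5 Gbps)",
    0x2: "Gen2 (3.0 Gbps)",
}
IPM = {
    0x0: "device not present",
    0x1: "active",
    0x2: "partial",
    0x6: "slumber",
}


def decode_sstatus(hex_text):
    """Split an SStatus hex string into its DET, SPD and IPM fields."""
    value = int(hex_text, 16)
    det = value & 0xF          # bits 3:0, device detection
    spd = (value >> 4) & 0xF   # bits 7:4, negotiated speed
    ipm = (value >> 8) & 0xF   # bits 11:8, interface power state
    return {
        "det": DET.get(det, "reserved"),
        "spd": SPD.get(spd, "reserved"),
        "ipm": IPM.get(ipm, "reserved"),
    }


print(decode_sstatus("123"))
```

For 0x123 this gives an established link at Gen2 speed in the active power state, which agrees with the "link up 3.0 Gbps" text the kernel prints after the soft reset.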

I also have two Maxtor disks on the same controller, but they work
correctly in ADMA mode. I have now disabled ADMA mode, and that seems to
help.

hdparm -I output for the misbehaving disks (they're identical):

/dev/sdc:

ATA device, with non-removable media
Model Number:   SAMSUNG SP2504C 
Serial Number:  XX  
Firmware Revision:  VT100-33
Standards:
Used: ATA/ATAPI-7 T13 1532D revision 4a 
Supported: 7 6 5 4 
Configuration:
Logical         max     current
cylinders       16383   16383
heads           16      16
sectors/track   63      63
--
CHS current addressable sectors:   16514064
LBA    user addressable sectors:  268435455
LBA48  user addressable sectors:  488397168
device size with M = 1024*1024:  238475 MBytes
device size with M = 1000*1000:  250059 MBytes (250 GB)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16  Current = 16
Recommended acoustic management value: 254, current value: 128
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 udma7 
 Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4 
 Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
Enabled Supported:
   *SMART feature set
Security Mode feature set
   *Power Management feature set
   *Write cache
   *Look-ahead
   *Host Protected Area feature set
   *WRITE_BUFFER command
   *READ_BUFFER command
   *NOP cmd
   *DOWNLOAD_MICROCODE
SET_MAX security extension
   *Automatic Acoustic Management feature set
   *48-bit Address feature set
   *Device Configuration Overlay feature set
   *Mandatory FLUSH_CACHE
   *FLUSH_CACHE_EXT
  

Re: sata_nv + ADMA + Samsung disk problem

2007-08-14 Thread Gabor Gombas
On Tue, Aug 14, 2007 at 06:30:28PM +0900, Tejun Heo wrote:

> Hmmm... That's timeout on cache flush, indicative of failing disk.
> Please post the result of 'smartctl -a /dev/sdc'.

Will do when I get home. Note, however, that this only occurs in ADMA
mode. It never occurred with 2.6.20, and it has never occurred with
2.6.22 since I disabled ADMA.
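
For readers following along, the "cache flush" diagnosis comes from the first byte of the "cmd" line in the log. The sketch below decodes such lines, assuming the field order libata's error handler uses (command or status, then feature:nsect:lbal:lbam:lbah, then the five high-order bytes, then device). The opcode and status tables are small hand-picked subsets of the ATA command set, and the function name is made up for illustration.

```python
import re

# Subset of ATA opcodes; values are from the ATA command set.
ATA_COMMANDS = {
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
    0xB0: "SMART",
    0xEC: "IDENTIFY DEVICE",
}
# Subset of ATA status register bits.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x08, "DRQ"), (0x01, "ERR")]

# Matches "cmd xx/xx:xx:xx:xx:xx/xx:xx:xx:xx:xx/xx" and the "res" variant.
TASKFILE = re.compile(
    r"\b(cmd|res) ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)


def decode_taskfile(line):
    """Return a dict describing a libata 'cmd' or 'res' log line, or None."""
    match = TASKFILE.search(line)
    if match is None:
        return None
    kind, first, low, _high, _device = match.groups()
    first = int(first, 16)
    _feature, _nsect, _lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    if kind == "cmd":
        name = ATA_COMMANDS.get(first, "unknown 0x%02x" % first)
        return {"kind": "cmd", "command": name}
    flags = [name for bit, name in STATUS_BITS if first & bit]
    return {"kind": "res", "status": flags, "lbam": lbam, "lbah": lbah}


print(decode_taskfile("ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0"))
print(decode_taskfile(" res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)"))
```

Opcode 0xea is FLUSH CACHE EXT, and a status byte of 0x40 means only DRDY is set, so the drive reports ready with no error bit while the command itself timed out.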

Gabor

-- 
 -
 MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
 -
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sata_nv + ADMA + Samsung disk problem

2007-08-16 Thread Gabor Gombas
Hi,

On Tue, Aug 14, 2007 at 06:30:28PM +0900, Tejun Heo wrote:

> Hmmm... That's timeout on cache flush, indicative of failing disk.
> Please post the result of 'smartctl -a /dev/sdc'.

Ok, so something is fishy in 2.6.22 wrt. SMART.

First, booting back to 2.6.20.5 I confirmed that SMART works without any
problems for all 4 disks, so all the following is a regression in
2.6.22.

I have four disks: two Maxtors, sda/sdb (hdparm -I output below), and two
Samsungs, sdc/sdd (hdparm -I output is in my previous mail).

< cut >
/dev/sda:

ATA device, with non-removable media
Model Number:   Maxtor 6B250S0  
Serial Number:  
Firmware Revision:  BANC1G10
Standards:
Used: ATA/ATAPI-7 T13 1532D revision 0 
Supported: 7 6 5 4 
Configuration:
Logical         max     current
cylinders       16383   16383
heads           16      16
sectors/track   63      63
--
CHS current addressable sectors:   16514064
LBA    user addressable sectors:  268435455
LBA48  user addressable sectors:  490234752
device size with M = 1024*1024:  239372 MBytes
device size with M = 1000*1000:  251000 MBytes (251 GB)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16  Current = 16
Advanced power management level: unknown setting (0x)
Recommended acoustic management value: 192, current value: 128
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4 
 Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
Enabled Supported:
   *SMART feature set
Security Mode feature set
   *Power Management feature set
   *Write cache
   *Look-ahead
   *Host Protected Area feature set
   *WRITE_VERIFY command
   *WRITE_BUFFER command
   *READ_BUFFER command
   *NOP cmd
   *DOWNLOAD_MICROCODE
Advanced Power Management feature set
SET_MAX security extension
   *Automatic Acoustic Management feature set
   *48-bit Address feature set
   *Device Configuration Overlay feature set
   *Mandatory FLUSH_CACHE
   *FLUSH_CACHE_EXT
   *SMART error logging
   *SMART self-test
Media Card Pass-Through
   *General Purpose Logging feature set
   *WRITE_{DMA|MULTIPLE}_FUA_EXT
   *URG for READ_STREAM[_DMA]_EXT
   *URG for WRITE_STREAM[_DMA]_EXT
   *SATA-I signaling speed (1.5Gb/s)
   *Native Command Queueing (NCQ)
Software settings preservation
   *SMART Command Transport (SCT) feature set
   *SCT Data Tables (AC5)
Security: 
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
not supported: enhanced erase
Checksum: correct
< cut >

Under 2.6.22.1, when I try to do "smartctl -d ata -s on /dev/sd[ab]" or
"smartctl -d ata -a /dev/sd[ab]", I get the following error:

< cut >
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor DiamondMax 10 family (ATA/133 and SATA/150)
Device Model:     Maxtor 6B250S0
Serial Number:
Firmware Version: BANC1G10
User Capacity:    251,000,193,024 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Wed Aug 15 12:01:38 2007 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Error SMART Status command failed
Please get assistance from http://smartmontools.sourceforge.net/
Register values returned from SMART Status command are:
CMD=0x50
FR =0x00
NS =0x00
SC =0x00
CL =0xc2
CH =0x00
SEL=0x00
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
< cut >
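
For context on why smartctl rejects these register values: SMART RETURN STATUS reports its result through a fixed signature in the cylinder low/high registers (CL/CH in smartctl's output). The sketch below encodes the two signatures defined by the ATA specification. The function name is hypothetical, and the sketch only classifies the values; it says nothing about why the kernel returned them.

```python
# Signatures returned by SMART RETURN STATUS in the cylinder low / cylinder
# high registers (LBA mid / LBA high), per the ATA specification.
SMART_OK = (0x4F, 0xC2)         # no threshold exceeded
SMART_FAILING = (0xF4, 0x2C)    # a threshold has been exceeded


def classify_smart_status(cl, ch):
    """Classify the CL/CH pair printed by smartctl after a status check."""
    if (cl, ch) == SMART_OK:
        return "passed"
    if (cl, ch) == SMART_FAILING:
        return "threshold exceeded"
    return "invalid"


# Values from the failing run above: CL=0xc2, CH=0x00.
print(classify_smart_status(0xC2, 0x00))
```

The pair from the failing run matches neither signature, which is consistent with smartctl reporting that the status command failed rather than reporting a failing disk. Note that 0xc2 shows up in CL, where a healthy reply would carry 0x4f.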

To repeat, this does not happen under 2.6.20.5. Using "-T permissive" works:

< cut >
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:

Re: sata_nv + ADMA + Samsung disk problem

2008-01-01 Thread Gabor Gombas
Hi,

Just FYI I've tried to enable ADMA again (now running 2.6.24-rc6) but
the bug is still present:

Jan  1 16:11:21 host kernel: ata7: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
Jan  1 16:11:21 host kernel: ata7: CPB 0: ctl_flags 0x9, resp_flags 0x0
Jan  1 16:11:21 host kernel: ata7: timeout waiting for ADMA IDLE, stat=0x400
Jan  1 16:11:21 host kernel: ata7: timeout waiting for ADMA LEGACY, stat=0x400
Jan  1 16:11:21 host kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan  1 16:11:21 host kernel: ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jan  1 16:11:21 host kernel:  res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan  1 16:11:21 host kernel: ata7.00: status: { DRDY }
Jan  1 16:11:21 host kernel: ata7: soft resetting link
Jan  1 16:11:22 host kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan  1 16:11:22 host kernel: ata7.00: configured for UDMA/133
Jan  1 16:11:22 host kernel: ata7: EH complete
Jan  1 16:11:22 host kernel: sd 6:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
Jan  1 16:11:22 host kernel: sd 6:0:0:0: [sdc] Write Protect is off
Jan  1 16:11:22 host kernel: sd 6:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jan  1 16:11:22 host kernel: sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

This time, however, the error occurred more than 3 hours after boot,
which is much better than with 2.6.22. For the past ~4 months ADMA was
disabled, and I never saw any libata-related error messages.

SMART does not show anything interesting:

smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint P120 series
Device Model:     SAMSUNG SP2504C
Serial Number:    XX
Firmware Version: VT100-33
User Capacity:    250,059,350,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Tue Jan  1 17:38:21 2008 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (4867) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  81) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   100   100   025    Pre-fail  Always       -       6144
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1218
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       11363
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3325
 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   253   002   000    Old_age

Re: sata_nv + ADMA + Samsung disk problem

2008-01-11 Thread Gabor Gombas
On Mon, Jan 07, 2008 at 06:10:29PM -0600, Robert Hancock wrote:

> Gabor, I just noticed you said that it worked OK in 2.6.20, yet 2.6.22  
> fails. 2.6.20 had ADMA support as well, so I wonder what change started  
> causing the problem. Would it be possible for you to do a git bisect (or  
> at least try 2.6.21 to try and narrow it down)?

I've now booted 2.6.21.7, we'll see. The problem with bisecting is
that I can't explicitly trigger the bug, so I can't say for sure whether
a kernel is good or just needs more time to trigger it. The average
uptime of this machine is just a couple of hours a day.

For example, with 2.6.24-rc6 it took over 3 hours for the first disk to
trigger the bug and the second disk needed more than 7 hours. This
machine is seldom turned on for that long.
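
A rough way to put numbers on this problem, purely as an illustration: assume the stall is a memoryless event, so the time to the first failure on a bad kernel is exponentially distributed. The 5-hour mean used below is only the average of the two 2.6.24-rc6 observations above (3 and 7 hours); both are lower bounds and far too few samples to trust. The function names are made up for this sketch.

```python
import math


def chance_bad_kernel_survives(mean_hours, uptime_hours):
    """Probability that a bad kernel shows no error within uptime_hours,
    assuming an exponential time to first failure with the given mean."""
    return math.exp(-uptime_hours / mean_hours)


def hours_needed(mean_hours, confidence):
    """Error-free uptime after which a bad kernel would already have
    failed with probability `confidence` (same exponential assumption)."""
    return -mean_hours * math.log(1.0 - confidence)


# Assumed mean of 5 hours (average of the 3 h and 7 h observations).
print(round(chance_bad_kernel_survives(5.0, 2.0), 2))
print(round(hours_needed(5.0, 0.95), 1))
```

Under these assumptions a bad kernel would look clean through a typical two-hour session about two times out of three, and roughly 15 error-free hours would be needed before marking a bisect step as good with 95% confidence. If uptime across reboots counts toward the same clock, that is about a week of normal use per step.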

Gabor
