Jeremy Chadwick said:
ad6: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=134802751

Are you sure you don't have a bad hard disk?  This looks like a
classic block/sector failure.

I hadn't realized that a bad block would manifest itself with a message about DMA. Seems like such semantics would be a little obscure to most users, apparently including me.

So you're saying that the *exact* same READ_DMA error, at the *exact*
same LBA, is reported on ad4?  If so, that's very bizarre.

No, perhaps I wasn't clear enough. Both instances were on ad6, so far.

Can you please provide the output from the following commands?

See end of message. Let me know if you then want more (in- or out-of-band).

Having now installed smartmontools, you can see below that I ran it for both ad4 and ad6. Sure enough, ad6 has logged 2 READ DMA errors - does that make this a definitive bad disk then?

Should I not be worried about ad4 too? Those Raw_Read_Error_Rate and Seek_Error_Rate numbers should be zero or very close to it, shouldn't they? I don't know how to interpret what I'm seeing in that output, so I'd appreciate any insight. Should I be returning both disks for warranty claims (they're both very recently purchased)?
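
A rough way to read those two attributes, as a sketch only. SMART marks an attribute as failing when its normalized VALUE falls to THRESH or below, and the raw numbers are vendor-specific. The split used in `split_raw` below (upper 16 bits as an error count, lower 32 bits as an operation count) is a commonly reported interpretation for Seagate drives, not something the smartctl output confirms, so treat it as an assumption. The figures are copied from the smartctl output further down.

```python
# (name, normalized VALUE, THRESH, RAW_VALUE) copied from the smartctl output
ATTRS = {
    "ad4": [("Raw_Read_Error_Rate", 117, 6, 158643744),
            ("Seek_Error_Rate", 64, 30, 2921473)],
    "ad6": [("Raw_Read_Error_Rate", 116, 6, 106947042),
            ("Seek_Error_Rate", 61, 30, 1376532)],
}

def split_raw(raw):
    """ASSUMPTION: upper 16 bits = error count, lower 32 bits = operation count."""
    return raw >> 32, raw & 0xFFFFFFFF

def above_threshold(value, thresh):
    """SMART treats an attribute as failed once VALUE drops to THRESH or below."""
    return value > thresh

for disk, attrs in ATTRS.items():
    for name, value, thresh, raw in attrs:
        errors, ops = split_raw(raw)
        state = "ok" if above_threshold(value, thresh) else "FAILING"
        print(disk, name, state, "errors:", errors, "operations:", ops)
```

Under that assumption all four raw values fit in the lower 32 bits, so the error part is zero for both disks, and every normalized value is above its threshold.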

Wojciech Puchar said:
Boot from some kind of live CD, create another mirror (a single disk for now) on the other drive, then run:

dd if=/dev/ad6s1 of=/dev/mirror/newmirror bs=2k conv=noerror,sync

I intentionally used bs=2k instead of something larger, to minimize the amount of lost data.

Then change your system to boot from newmirror, take out /dev/ad6 and have it replaced under warranty (or buy a new one), install the new ad6, and insert it into the mirror.

I think you're describing a method to help me save as much data from ad6 as possible. Fortunately, this is all about constructing a new system, so there's no data yet to lose.
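
For reference, the reasoning behind the small block size can be sketched as follows. This is a simplified model, not a description of dd's internals: it assumes that with conv=noerror,sync every input block containing an unreadable sector is written out entirely as zeros, and that bs is a multiple of the 512-byte sector size. The function name and the choice of LBA are illustrative only; the LBA is the one from the SMART error log and is counted from the start of the disk, whereas dd on ad6s1 would count from the start of the slice.

```python
SECTOR = 512  # bytes per sector (1953525168 sectors = 1,000,204,886,016 bytes)

def good_bytes_lost(bad_lbas, bs, sector=SECTOR):
    """Readable bytes that get zero-filled along with the bad sectors."""
    bad_offsets = {lba * sector for lba in bad_lbas}
    # Every dd block that contains at least one bad sector is replaced whole.
    failed_blocks = {offset // bs for offset in bad_offsets}
    return len(failed_blocks) * bs - len(bad_offsets) * sector

for bs in (512, 2048, 65536, 1048576):
    print(f"bs={bs:>7}: {good_bytes_lost([134802845], bs)} good bytes lost")
```

In this model a single bad sector costs 1536 readable bytes at bs=2k, compared with roughly 1 MB at bs=1m.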

Is there anything I should know about this model of hard disk with regard to known problems? Also, is there a good test I can perform to flush out any problems before I put this thing into service?
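
One possible pre-service check, going by the smartctl output below, is the drive's extended self-test, which that output estimates at 227 to 230 minutes per disk. The sketch below only assembles the smartctl command lines; `self_test_commands` and `start_test` are hypothetical helper names, and `start_test` is defined but not called because it needs root and the real devices. A self-test that passes does not prove the disk is healthy.

```python
import subprocess

def self_test_commands(device, kind="long"):
    """Return the smartctl invocations to start a self-test and read its log."""
    return [
        ["smartctl", "-t", kind, device],        # start the self-test in the drive
        ["smartctl", "-l", "selftest", device],  # show the self-test log afterwards
    ]

def start_test(device, kind="long"):
    """Start the test for real (needs root and smartmontools); not called here."""
    start, _ = self_test_commands(device, kind)
    return subprocess.run(start, check=True)

for dev in ("/dev/ad4", "/dev/ad6"):
    for cmd in self_test_commands(dev):
        print(" ".join(cmd))
```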

Carl                                             / K0802647

######## Additional Information ########

# vmstat -i
interrupt                          total       rate
irq1: atkbd0                           4          0
irq4: sio0                        125724         16
irq19: uhci3                           5          0
irq21: uhci1+                     478364         63
irq23: uhci2 ehci1                     1          0
cpu0: timer                     14517071       1923
irq256: em0                       109568         14
cpu1: timer                     14514956       1922
Total                           29745693       3940

# atacontrol list | grep -v "no device present"
ATA channel 0:
ATA channel 1:
ATA channel 2:
    Master:  ad4 <ST31000340AS/SD15> Serial ATA II
ATA channel 3:
    Master:  ad6 <ST31000340AS/SD15> Serial ATA II
ATA channel 4:
    Master: acd0 <HL-DT-ST DVDRAM GH20NS10/EL00> Serial ATA v1.0
ATA channel 5:
ATA channel 6:
ATA channel 7:

# atacontrol cap ad4

Protocol              Serial ATA II
device model          ST31000340AS
serial number         xxxxxxxH
firmware revision     SD15
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported       1953525168 sectors
dma supported
overlap not supported

Feature                      Support  Enable    Value           Vendor
write cache                    yes      yes
read ahead                     yes      yes
Native Command Queuing (NCQ)   yes       -      31/0x1F
Tagged Command Queuing (TCQ)   no       no      31/0x1F
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no      65278/0xFEFE
automatic acoustic management  no       no      0/0x00  254/0xFE

# atacontrol cap ad6

Protocol              Serial ATA II
device model          ST31000340AS
serial number         xxxxxxxA
firmware revision     SD15
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported       1953525168 sectors
dma supported
overlap not supported

Feature                      Support  Enable    Value           Vendor
write cache                    yes      yes
read ahead                     yes      yes
Native Command Queuing (NCQ)   yes       -      31/0x1F
Tagged Command Queuing (TCQ)   no       no      31/0x1F
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no      65278/0xFEFE
automatic acoustic management  no       no      0/0x00  254/0xFE

# smartctl -a /dev/ad4
smartctl version 5.38 [i386-portbld-freebsd7.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11
Device Model:     ST31000340AS
Serial Number:    xxxxxxxH
Firmware Version: SD15
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Oct 28 18:07:25 2008 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 650) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 230) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103b) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       158643744
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       108
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   064   060   030    Pre-fail  Always       -       2921473
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       499
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       108
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   099   000    Old_age   Always       -       65540
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   069   045    Old_age   Always       -       29 (Lifetime Min/Max 23/31)
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 20 0 0)
195 Hardware_ECC_Recovered  0x001a   039   019   000    Old_age   Always       -       158643744
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

# smartctl -a /dev/ad6
smartctl version 5.38 [i386-portbld-freebsd7.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11
Device Model:     ST31000340AS
Serial Number:    xxxxxxxA
Firmware Version: SD15
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Oct 28 18:08:22 2008 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 642) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 227) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103b) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   116   100   006    Pre-fail  Always       -       106947042
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       108
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000f   061   060   030    Pre-fail  Always       -       1376532
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       499
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       1
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       108
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   098   098   000    Old_age   Always       -       2
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   069   045    Old_age   Always       -       29 (Lifetime Min/Max 23/31)
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 19 0 0)
195 Hardware_ECC_Recovered  0x001a   038   018   000    Old_age   Always       -       106947042
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 475 hours (19 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 9d ed 08 08  Error: UNC at LBA = 0x0808ed9d = 134802845

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 3f ed 08 48 00  13d+00:32:54.564  READ DMA
  c8 00 00 3f ec 08 48 00  13d+00:32:54.563  READ DMA
  c8 00 00 3f eb 08 48 00  13d+00:32:54.562  READ DMA
  c8 00 00 3f ea 08 48 00  13d+00:32:54.561  READ DMA
  c8 00 00 3f e9 08 48 00  13d+00:32:54.560  READ DMA

Error 1 occurred at disk power-on lifetime: 474 hours (19 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 9d ed 08 08  Error: UNC at LBA = 0x0808ed9d = 134802845

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 3f e9 08 48 00  12d+23:04:28.359  READ DMA
  c8 00 00 3f 53 06 48 00  12d+23:04:27.202  READ DMA
  c8 00 00 3f 52 06 48 00  12d+23:04:27.193  READ DMA
  c8 00 00 3f 51 06 48 00  12d+23:04:27.191  READ DMA
  c8 00 00 3f 50 06 48 00  12d+23:04:27.191  READ DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

######## END ########
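
A note on the two different LBAs: the kernel message reports LBA=134802751, while the SMART error log for ad6 reports 134802845. A small sketch of how the register dump maps to an LBA suggests these describe the same event. It assumes 28-bit LBA addressing, where the low nibble of DH holds bits 27:24, CH bits 23:16, CL bits 15:8 and SN bits 7:0; the function name `lba28` is just illustrative.

```python
def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the ATA registers shown in the error log."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

failing = lba28(0x9d, 0xed, 0x08, 0x08)  # registers after the failed command
request = lba28(0x3f, 0xed, 0x08, 0x48)  # last READ DMA listed before Error 2

print(failing, request, failing - request)  # 134802845 134802751 94
```

So the kernel appears to report the starting LBA of the READ DMA request, and the drive reports the sector inside that request that could not be read, 94 sectors in. A sector count register of 00 means 256 sectors for this command, so the failing sector falls within the request.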