On 29/05/2015 5:08 PM, "Petter Adsen" <pet...@synth.no> wrote:
>
> When I woke up this morning, one of my boxen had spewed out a ton of
> errors from one of my SSDs (the root drive), remounted read-only, and
> went into a kernel panic.
>
> After rebooting, everything seems fine, though. I've run a SMART long
> test, but as I found out, the SMART error log is not supported on this
> drive. Neither do I have the log of what happened, since / was
> remounted ro.
>
> I've included the output of "smartctl --all /dev/sdc", but I can't see
> anything that stands out.
>
> Yesterday, I had another kernel panic (that seemed related to systemd),
> so I suspect the (manually built) kernel is at fault here. The RAM
> in this machine is all brand new, and I ran memtest less than two weeks
> ago, so that should be fine.
>
> Can anyone look at this log and tell me if there is anything to worry
> about? Which of the attributes should I look at, so that I know in the
> future?
>
> (And I did a full backup as recently as yesterday that was tested OK
> at the time, so data loss is not a concern. Everything important is on
> other drives anyway.)
>
> ---<snip>---
> smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.19.0-18-generic] (local build)
> Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family:     SandForce Driven SSDs
> Device Model:     KINGSTON SV300S37A120G
> Serial Number:    <snip>
> LU WWN Device Id: 5 0026b7 74703dbf1
> Firmware Version: 525ABBF0
> User Capacity:    120 034 123 776 bytes [120 GB]
> Sector Size:      512 bytes logical/physical
> Rotation Rate:    Solid State Device
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Fri May 29 08:50:31 2015 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x02) Offline data collection activity
>                                         was completed without error.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                 (   0) seconds.
> Offline data collection
> capabilities:                    (0x79) SMART execute Offline immediate.
>                                         No Auto Offline data collection support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   1) minutes.
> Extended self-test routine
> recommended polling time:        (  36) minutes.
> Conveyance self-test routine
> recommended polling time:        (   2) minutes.
> SCT capabilities:              (0x0025) SCT Status supported.
>                                         SCT Data Table supported.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x0033   095   095   050    Pre-fail  Always       -       0/6132927
>   5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
>   9 Power_On_Hours_and_Msec 0x0032   096   096   000    Old_age   Always       -       4237h+54m+09.420s
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       74
> 171 Program_Fail_Count      0x000a   000   000   000    Old_age   Always       -       0
> 172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
> 174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       65
> 177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       0
> 181 Program_Fail_Count      0x000a   000   000   000    Old_age   Always       -       0
> 182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
> 187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
> 189 Airflow_Temperature_Cel 0x0000   024   036   000    Old_age   Offline      -       24 (Min/Max 15/36)
> 194 Temperature_Celsius     0x0022   024   036   000    Old_age   Always       -       24 (Min/Max 15/36)
> 195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/6132927
> 196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
Reallocated_Event_Count is 0, meaning the drive has never found (and remapped) a bad sector. I have a failing drive at the moment, and on that one this number slowly piles up.

> 201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/6132927
> 204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/6132927
> 230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
> 231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
> 233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       2063
> 234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       2767
> 241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       2767
> 242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       2177
>
> SMART Error Log not supported
>
> SMART Self-test Log not supported
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
> ---<snip>---
>
> Petter
>
> --
> "I'm ionized"
> "Are you sure?"
> "I'm positive."
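The advice above boils down to one check: does the raw value of Reallocated_Event_Count (ID 196) grow between readings? Here is a minimal Python sketch of that check. It is not part of smartmontools. It assumes the ten-column attribute table layout quoted above, with RAW_VALUE last, and keeps only the leading integer of each raw value, so `0/6132927` reads as 0 and `4237h+54m+09.420s` as 4237. The function names, the script name, and watching only ID 196 by default are hypothetical choices. Attributes are keyed by ID rather than by name because this drive reports some names twice (Program_Fail_Count at 171 and 181).

```python
import re
import sys
from typing import Dict, Iterable, Tuple

# Matches one row of the smartctl attribute table, e.g.
# "196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always - 0".
# Assumed layout: ID, name, hex flag, then six single-token columns
# (VALUE WORST THRESH TYPE UPDATED WHEN_FAILED), then RAW_VALUE.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]{4}"
    r"(?:\s+\S+){6}\s+(?P<raw>\S+)"
)


def parse_raw_values(text: str) -> Dict[int, Tuple[str, int]]:
    """Map attribute ID -> (name, leading integer of RAW_VALUE)."""
    values: Dict[int, Tuple[str, int]] = {}
    for line in text.splitlines():
        # Tolerate output that has been quoted in an email ("> ...").
        m = ROW.match(line.lstrip("> "))
        if not m:
            continue
        leading = re.match(r"\d+", m.group("raw"))
        if leading:
            values[int(m.group("id"))] = (m.group("name"), int(leading.group()))
    return values


def grown(
    before: Dict[int, Tuple[str, int]],
    after: Dict[int, Tuple[str, int]],
    watch: Iterable[int] = (196,),
) -> Dict[int, Tuple[str, int, int]]:
    """Return {id: (name, old, new)} for watched attributes whose raw count rose."""
    changes: Dict[int, Tuple[str, int, int]] = {}
    for attr_id in watch:
        if attr_id in before and attr_id in after:
            name, old = before[attr_id]
            new = after[attr_id][1]
            if new > old:
                changes[attr_id] = (name, old, new)
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 smart_watch.py old.txt new.txt, where each
    # file holds saved output of `smartctl -A /dev/sdc`.
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        old_vals = parse_raw_values(f_old.read())
        new_vals = parse_raw_values(f_new.read())
    for attr_id, (name, was, now) in grown(old_vals, new_vals).items():
        print(f"{attr_id} {name}: {was} -> {now}")
```

Saving the output of `smartctl -A /dev/sdc` periodically and comparing two snapshots would catch the slow climb described above. On the drive in question, where the count is 0 in both readings, nothing is printed.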