On 29/05/2015 5:08 PM, "Petter Adsen" <pet...@synth.no> wrote:
> When I woke up this morning, one of my boxen had spewed out a ton of
> errors from one of my SSDs (the root drive), remounted read-only, and
> went into a kernel panic.
> After rebooting everything seems fine, though. I've run a SMART long
> test, but as I found out the SMART error log is not supported on this
> drive. Neither do I have the log of what happened, since / was
> remounted ro.
> I've included the output of "smartctl --all /dev/sdc", but I can't see
> anything that stands out.
> Yesterday, I had another kernel panic (that seemed related to systemd),
> so I suspect the (manually built) kernel to be at fault here. The RAM
> in this machine is all brand new, and I ran memtest less than two weeks
> ago, so that should be fine.
> Can anyone look at this log and tell me if there is anything to worry
> about? Which of the attributes should I look at, so that I know in the
> future?
> (And I did a full backup as recently as yesterday that was tested OK
> at the time, so data loss is not a concern. Everything important is on
> other drives anyway.)
> ---<snip>---
> smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.19.0-18-generic] (local
> Copyright (C) 2002-14, Bruce Allen, Christian Franke,
> Model Family:     SandForce Driven SSDs
> Device Model:     KINGSTON SV300S37A120G
> Serial Number:    <snip>
> LU WWN Device Id: 5 0026b7 74703dbf1
> Firmware Version: 525ABBF0
> User Capacity:    120 034 123 776 bytes [120 GB]
> Sector Size:      512 bytes logical/physical
> Rotation Rate:    Solid State Device
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Fri May 29 08:50:31 2015 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> SMART overall-health self-assessment test result: PASSED
> General SMART Values:
> Offline data collection status:  (0x02) Offline data collection activity
>                                         was completed without error.
>                                         Auto Offline Data Collection:
> Self-test execution status:      (   0) The previous self-test routine
>                                         without error or no self-test has
>                                         been run.
> Total time to complete Offline
> data collection:                (    0) seconds.
> Offline data collection
> capabilities:                    (0x79) SMART execute Offline immediate.
>                                         No Auto Offline data collection
>                                         Suspend Offline collection upon
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   1) minutes.
> Extended self-test routine
> recommended polling time:        (  36) minutes.
> Conveyance self-test routine
> recommended polling time:        (   2) minutes.
> SCT capabilities:              (0x0025) SCT Status supported.
>                                         SCT Data Table supported.
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
>   1 Raw_Read_Error_Rate     0x0033   095   095   050    Pre-fail  Always       -       0/6132927
>   5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
>   9 Power_On_Hours_and_Msec 0x0032   096   096   000    Old_age   Always       -       4237h+54m+09.420s
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       74
> 171 Program_Fail_Count      0x000a   000   000   000    Old_age   Always       -       0
> 172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
> 174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       65
> 177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       0
> 181 Program_Fail_Count      0x000a   000   000   000    Old_age   Always       -       0
> 182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
> 187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
> 189 Airflow_Temperature_Cel 0x0000   024   036   000    Old_age   Offline      -       24 (Min/Max 15/36)
> 194 Temperature_Celsius     0x0022   024   036   000    Old_age   Always       -       24 (Min/Max 15/36)
> 195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/6132927
> 196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0

Reallocated_Event_Count is 0, meaning the drive has never had to remap a
bad block. I have a failing drive at the moment, and on it this number
keeps slowly climbing.
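
If you want to keep an eye on this without reading the table by hand, here
is a rough Python sketch that parses attribute rows as printed by
"smartctl -A" and reports watched attributes whose raw value is not 0. The
choice of IDs to watch (5, 187, 196) and the names parse_attributes,
nonzero_watched and SAMPLE are my own assumptions for illustration, not
anything smartctl defines, and it expects each row on a single line.

```python
import re

# One row of the attribute table: ID, name, flag, value, worst, threshold,
# type, updated, when_failed, then the raw value (which may contain spaces).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(.*)$"
)

# Assumption: the IDs I would watch on this drive (retired blocks,
# reported uncorrectable errors, reallocation events).
WATCH = (5, 187, 196)

# Three rows copied from the output quoted in this thread.
SAMPLE = """\
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
"""


def parse_attributes(text):
    """Return {id: {"name", "value", "worst", "thresh", "raw"}}."""
    rows = {}
    for line in text.splitlines():
        # Tolerate mail quoting ("> ") in front of a row.
        m = ROW.match(line.lstrip("> "))
        if not m:
            continue  # headers, blank lines, anything that is not a row
        attr_id, name, value, worst, thresh, raw = m.groups()
        rows[int(attr_id)] = {
            "name": name,
            "value": int(value),
            "worst": int(worst),
            "thresh": int(thresh),
            "raw": raw.strip(),
        }
    return rows


def nonzero_watched(rows):
    """Return the watched IDs whose raw value does not start with 0."""
    return [i for i in WATCH if i in rows and rows[i]["raw"].split()[0] != "0"]


if __name__ == "__main__":
    for attr_id in nonzero_watched(parse_attributes(SAMPLE)):
        print("check attribute", attr_id)
```

On the three rows above it reports nothing, since all of them have a raw
value of 0. Keying on the ID rather than the name matters here because this
drive lists Program_Fail_Count and Erase_Fail_Count twice.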

> 201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/6132927
> 204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/6132927
> 230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
> 231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
> 233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       2063
> 234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       2767
> 241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       2767
> 242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       2177
> SMART Error Log not supported
> SMART Self-test Log not supported
> SMART Selective self-test log data structure revision number 1
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute
> ---<snip>---
> Petter
> --
> "I'm ionized"
> "Are you sure?"
> "I'm positive."

