On Tue, Oct 16, 2018 at 6:09 PM Paul Robert Marino <prmari...@gmail.com> wrote:
>
> to be clear, I wasn't saying SMART is useless, just that smartctl doesn't
> always tell you everything, so you shouldn't rely on it as a definitive
> answer for every issue on every disk.
>
> As for RAID controllers, well, that's a very long conversation. There are
> good reasons the enterprise ones do not expose SMART data, at least not
> directly in a way you can extract using the smartctl command. Instead they
> have more advanced checks available through the drivers and additional
> monitoring tools provided by the manufacturer of the RAID controller.
>
> as for the predictive nature of SMART, that's actually in its
> specification: it predicts errors based on indicators.
>
> On Tue, Oct 16, 2018 at 7:55 PM Konstantin Olchanski <olcha...@triumf.ca>
> wrote:
>>
>> On Tue, Oct 16, 2018 at 04:20:03PM -0400, Paul Robert Marino wrote:
>> >
>> > SMART is predictive and doesn't catch all errors; it's also not compatible
>> > with all disks and controllers, especially RAID-capable controllers.
>> >
>>
>>
>> Do not reject SMART as useless, it correctly reports many actual disk
>> failures:
>>
>> a) overheating (actual disk temperature is reported in degrees Centigrade)
>> b) unreadable sectors (data on these sectors is already lost) - disk model
>> dependent
>> c) "hard to read" sectors (WD specific - "raw read error rate")
>> d) SATA link communication errors ("CRC error count")
>>
>> even more useful actual (*not* predictive) stuff is reported for SSDs
>> (again, model dependent)
>>
>> it is true that much of this information is disk model dependent and
>> one has to have some experience with the SMART data to be able
>> to read it in a meaningful way.
>>
>> as for RAID controllers that prevent access to disk SMART data,
>> they are as safe to use as a car with a blank dashboard (no fuel level,
>> no engine temperature, no speedometer, etc).
>>
Posting "smartctl -a" output below. I also wanted to mention that I have only a single disk in my machine, and that disk has not failed: I was able to restart the machine many times and the OS came up fine.

# smartctl -a /dev/sda
smartctl 6.2 2017-02-27 r4394 [x86_64-linux-3.10.0-862.14.4.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 3.5" MG03ACAxxx(Y) Enterprise HDD
Device Model:     TOSHIBA MG03ACA100
Serial Number:    46SIKCQFF
LU WWN Device Id: 5 000039 6fbf81f8b
Add. Product Id:  DELL(tm)
Firmware Version: FL2H
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Oct 16 20:17:41 2018 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (  90) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 164) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   100   100   000    Old_age   Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       4211
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   100   100   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   051   051   000    Old_age   Always       -       19725
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       26
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       25
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       26
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       32 (Min/Max 20/37)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       2347506755
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       125819370

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     19240         -
# 2  Extended offline    Completed without error       00%     17252         -
# 3  Short offline       Completed without error       00%     17248         -
# 4  Short offline       Completed without error       00%         2         -
# 5  Vendor (0xdf)       Completed without error       00%         2         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

#

>>
>> --
>> Konstantin Olchanski
>> Data Acquisition Systems: The Bytes Must Flow!
>> Email: olchansk-at-triumf-dot-ca
>> Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
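As a rough illustration of reading the attribute table the way Konstantin describes (temperature, bad or unreadable sectors, SATA CRC errors), here is a minimal Python sketch. It is hypothetical and not part of smartmontools: the names parse_attributes and check, the ID-to-failure mapping, and the 50 C limit are all assumptions, and attribute meanings vary by vendor and model. It only checks two things: the drive's own verdict (normalized VALUE at or below a non-zero THRESH, or a WHEN_FAILED entry) and non-zero raw counts for a few "actual, not predictive" attributes.

```python
import re

# Hypothetical helper, not part of smartmontools. Assumed mapping of
# attribute IDs to the failures listed in the thread (model dependent):
#   194 Temperature_Celsius                       -> overheating
#   5 Reallocated_Sector_Ct, 197 Current_Pending_Sector,
#   198 Offline_Uncorrectable                     -> bad/unreadable sectors
#   199 UDMA_CRC_Error_Count                      -> SATA link errors
COUNT_ATTRS = {5, 197, 198, 199}
TEMP_ATTR = 194

# ID, name, FLAG (0x....), VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Map attribute ID -> (name, value, worst, thresh, when_failed, raw)."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (
                m.group(2), int(m.group(3)), int(m.group(4)),
                int(m.group(5)), m.group(6), int(m.group(7)),
            )
    return attrs


def check(attrs, max_temp_c=50):
    """Return warning strings; max_temp_c is an illustrative limit, not a vendor spec."""
    warnings = []
    for attr_id, (name, value, _worst, thresh, when_failed, raw) in sorted(attrs.items()):
        # The drive's own verdict: normalized VALUE at or below a non-zero THRESH.
        if thresh > 0 and value <= thresh:
            warnings.append(f"{attr_id} {name}: VALUE {value} <= THRESH {thresh}")
        if when_failed != "-":
            warnings.append(f"{attr_id} {name}: WHEN_FAILED {when_failed}")
        # Raw-value checks for the "actual, not predictive" indicators.
        if attr_id == TEMP_ATTR and raw > max_temp_c:
            warnings.append(f"{attr_id} {name}: {raw} C > {max_temp_c} C")
        elif attr_id in COUNT_ATTRS and raw > 0:
            warnings.append(f"{attr_id} {name}: raw count {raw}")
    return warnings


# A few rows copied from the posted "smartctl -a" output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       32 (Min/Max 20/37)
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(check(parse_attributes(SAMPLE)) or "no warnings")  # prints "no warnings"
```

On the posted rows this reports nothing, which matches the PASSED verdict. To use it on a live disk, feed it the text printed by "smartctl -A /dev/sda"; note that smartctl's exit status is a bitmask, so a non-zero status does not by itself mean the command failed to run.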