In my experience, you will likely be able to pull a few more weeks or months of life out of the drive, but it will die. Mirko's suggestion of migrating to an n=3 RAID1 setup is also what I would recommend.

You will notice in your smartctl output that Reallocated_Sector_Ct is 41. That means 41 sectors have already been remapped to the spare sectors of your drive. The 8 Offline_Uncorrectable / Current_Pending_Sector entries are probably currently unused sectors that have not been rewritten yet but triggered an I/O error the last time there was data there. For me this was often on a swap partition, since there are a lot of transient writes. The next time the system tries to write to those sectors, the write will either fail, and the drive will mark the sector as permanently unusable, or succeed and clear the pending count.

Good luck

Patrick

On Mon, Mar 6, 2017 at 8:27 PM, Gregory Seidman <gsslist+deb...@anthropohedron.net> wrote:

> On Mon, Mar 06, 2017 at 12:17:03PM +0100, Mirko Parthey wrote:
> > On Sun, Mar 05, 2017 at 08:38:27PM -0800, David Christensen wrote:
> > > On 03/05/2017 01:02 PM, Gregory Seidman wrote:
> > > >I have a disk that is reporting SMART errors. It is an active disk in
> > > >a (kernel, not hardware) RAID1 configuration. I also have a hot spare
> > > >in the RAID1, and md hasn't decided it should fail the disk and switch
> > > >to the hot spare. Should I proactively tell md to fail the disk (and
> > > >let the hot spare take over), or should I just wait until md notices a
> > > >problem?
> > >
> > > I'm confused by "I also have a hot spare in the RAID1". Do you have a
> > > two-member RAID1 with a hot spare, or a three-member RAID1? I would
> > > prefer the latter:
> > >
> > > https://manpages.debian.org/jessie/mdadm/md.4.en.html
> >
> > Refining this advice a bit, I would convert the spare to a full RAID
> > member now, without explicitly failing the disk that reports SMART
> > errors first.
> > Assuming you have a two-member RAID1 with a hot spare, the command
> > should be similar to this (untested):
> >   mdadm -G /dev/mdX -n 3
> > This ensures you keep redundancy during further maintenance actions.
>
> I was unaware that this was possible.
> I've run it and mdadm -D reports that
> it is now in the "clean, degraded, rebuilding" state. Thank you! I wish I
> had room in my system to add the fourth (which I've ordered) without
> removing the failing disk, but I do not.
>
> > Which SMART errors do you get, and who reports them?
>
> I get emails sent to root:
>
> This message was generated by the smartd daemon running on:
>
>    host name: XXXXXX
>    DNS domain: YYYYYY
>
> The following warning/error was logged by the smartd daemon:
>
> Device: /dev/sdc [SAT], 8 Currently unreadable (pending) sectors
>
> Device info:
> ST31500341AS, S/N:9VS43CV9, WWN:5-000c50-0208aa9a3, FW:CC1H, 1.50 TB
>
> For details see host's SYSLOG.
>
> You can also use the smartctl utility for further investigation.
> The original message about this issue was sent at Wed Dec 14 00:51:36 2016 EST
> Another message will be sent in 24 hours if the problem persists.
>
> ...and...
>
> This message was generated by the smartd daemon running on:
>
>    host name: XXXXXX
>    DNS domain: YYYYYY
>
> The following warning/error was logged by the smartd daemon:
>
> Device: /dev/sdc [SAT], 8 Offline uncorrectable sectors
>
> Device info:
> ST31500341AS, S/N:9VS43CV9, WWN:5-000c50-0208aa9a3, FW:CC1H, 1.50 TB
>
> For details see host's SYSLOG.
>
> You can also use the smartctl utility for further investigation.
> The original message about this issue was sent at Wed Dec 14 00:51:37 2016 EST
> Another message will be sent in 24 hours if the problem persists.
>
> (Yes, I know, I've been letting it do this since mid-December, which is not
> great.)
>
> > What is the output of the following command for the failing drive?
> >   smartctl -A /dev/sdY
>
> # smartctl -A /dev/sdc
> smartctl 6.4 2014-10-07 r4002 [i686-linux-3.16.0-4-686-pae] (local build)
> Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       205161943
>   3 Spin_Up_Time            0x0003   100   091   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1055
>   5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       41
>   7 Seek_Error_Rate         0x000f   092   060   030    Pre-fail  Always       -       1743842168
>   9 Power_On_Hours          0x0032   039   039   000    Old_age   Always       -       53898
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       85
> 184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
> 187 Reported_Uncorrect      0x0032   097   097   000    Old_age   Always       -       3
> 188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       133146017827
> 189 High_Fly_Writes         0x003a   007   007   000    Old_age   Always       -       93
> 190 Airflow_Temperature_Cel 0x0022   060   040   045    Old_age   Always   In_the_past 40 (Min/Max 26/45 #502)
> 194 Temperature_Celsius     0x0022   040   060   000    Old_age   Always       -       40 (0 18 0 0 0)
> 195 Hardware_ECC_Recovered  0x001a   038   023   000    Old_age   Always       -       205161943
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
> 240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       53897 (15 186 0)
> 241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       917595486
> 242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1262569510
>
> > Regards,
> > Mirko
>
> Thanks for the help so far,
> --Greg
>
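
For anyone who wants to watch the three counters discussed at the top of this reply (Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable) without reading the whole table, here is a minimal sketch in Python. It assumes the ten-column "smartctl -A" layout seen in the quoted output. The names parse_attributes, summarize, WATCHED and SAMPLE are made up for this example, and the sample rows are copied from Greg's output. Other drives may report raw values in other formats, so treat it as a starting point only.

```python
# Sketch only: assumes the ten-column attribute table printed by "smartctl -A".
# All names below are hypothetical; the sample rows come from the quoted output.

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       41
190 Airflow_Temperature_Cel 0x0022   060   040   045    Old_age   Always   In_the_past 40 (Min/Max 26/45 #502)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

# The counters this thread is concerned with.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_attributes(text):
    """Map attribute name -> leading integer of its RAW_VALUE column."""
    attrs = {}
    for line in text.splitlines():
        # Split into at most ten fields so a raw value such as
        # "40 (Min/Max 26/45 #502)" stays together in the last field.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, banner, or blank line
        name, raw = fields[1], fields[9]
        try:
            attrs[name] = int(raw.split()[0])
        except ValueError:
            continue  # raw value is not a plain integer; skip it
    return attrs


def summarize(attrs):
    """Return only the watched counters that are present."""
    return {name: attrs[name] for name in WATCHED if name in attrs}


if __name__ == "__main__":
    for name, raw in summarize(parse_attributes(SAMPLE)).items():
        print(f"{name}: {raw}")
```

On a live system the text could come from running smartctl -A against the device (as root) instead of SAMPLE. A growing Reallocated_Sector_Ct, or a pending count that does not clear after the sectors are rewritten, would be the signs to watch for.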