On Wed, 14 Jul 2021 at 12:07, Matthias Petermann <m...@petermann-it.de> wrote:
>
> Hello all,
>
>
> ```
> [ 87240.313853] wd2: (uncorrectable data error)
> [ 87240.313853] wd2d: error reading fsbn 5707914328 of
> 5707914328-5707914455 (wd2 bn 5707914328; cn 5662613 tn 6 sn 46)
> [ 87465.637977] wd2d: error reading fsbn 5710464152 of
> 5710464152-5710464215 (wd2 bn 5710464152; cn 5665143 tn 0 sn 8), xfer
> 338, retry 0
> [ 87465.637977] wd2: (uncorrectable data error)
> [ 87475.561683] wd2: soft error (corrected) xfer 338
> [ 87506.393194] wd2d: error reading fsbn 5710555128 of
> 5710555128-5710555255 (wd2 bn 5710555128; cn 5665233 tn 4 sn 12), xfer
> 40, retry 0
> [ 87506.393194] wd2: (uncorrectable data error)
> [ 87515.156465] wd2d: error reading fsbn 5710555128 of
> 5710555128-5710555255 (wd2 bn 5710555128; cn 5665233 tn 4 sn 12), xfer
> 40, retry 1
> ```
>
> The whole syslog is full of these messages. What surprises me is that
> there are "uncorrectable" data errors in the syslog. Nevertheless, the
> data can still be read - albeit very slowly. My assumption was that the
> redundancies of RAID2 are being used to compensate for the defects. To
> my surprise, ZFS does not seem to have noticed any of these defects:
>

The wd driver is retrying (IIRC it retries 3 times) and succeeding on the
second or third attempt. See xfer 338, retry 0, followed by a "soft error
(corrected)" with the same xfer number 10 seconds later: that is the retry
succeeding.
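
If you want to check this across the whole syslog rather than by eye, a rough
sketch like the one below could pair each failed read with a later "soft
error (corrected)" that carries the same xfer number. The regular expressions
are guesses based only on the lines quoted above, `pair_retries` is a made-up
name, and it assumes each kernel message is on a single line (not wrapped as
in the mail). Since xfer numbers may well be reused over time, treat the
result as a hint rather than proof.

```python
import re

# Assumed message formats, derived only from the log excerpt quoted above.
ERROR_RE = re.compile(
    r"\[\s*([\d.]+)\]\s+(wd\d+)[a-z]?: error reading fsbn (\d+) of "
    r".*?, xfer (\w+), retry (\d+)"
)
SOFT_RE = re.compile(
    r"\[\s*([\d.]+)\]\s+(wd\d+): soft error \(corrected\),? xfer (\w+)"
)


def pair_retries(lines):
    """Return (recovered, unresolved) lists of read-error records."""
    pending = {}    # (drive, xfer) -> most recent failed attempt
    recovered = []  # failed attempts later followed by a soft error
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            ts, drive, fsbn, xfer, retry = m.groups()
            # A later retry of the same transfer overwrites the earlier one.
            pending[(drive, xfer)] = {
                "time": float(ts), "drive": drive, "fsbn": int(fsbn),
                "xfer": xfer, "retry": int(retry),
            }
            continue
        m = SOFT_RE.search(line)
        if m:
            ts, drive, xfer = m.groups()
            err = pending.pop((drive, xfer), None)
            if err is not None:
                err["delay"] = float(ts) - err["time"]
                recovered.append(err)
    # Whatever is still pending had no matching "corrected" line in the input.
    return recovered, list(pending.values())


# The quoted messages, unwrapped to one line each.
SAMPLE = [
    "[ 87465.637977] wd2d: error reading fsbn 5710464152 of 5710464152-5710464215 (wd2 bn 5710464152; cn 5665143 tn 0 sn 8), xfer 338, retry 0",
    "[ 87465.637977] wd2: (uncorrectable data error)",
    "[ 87475.561683] wd2: soft error (corrected) xfer 338",
    "[ 87506.393194] wd2d: error reading fsbn 5710555128 of 5710555128-5710555255 (wd2 bn 5710555128; cn 5665233 tn 4 sn 12), xfer 40, retry 0",
    "[ 87506.393194] wd2: (uncorrectable data error)",
    "[ 87515.156465] wd2d: error reading fsbn 5710555128 of 5710555128-5710555255 (wd2 bn 5710555128; cn 5665233 tn 4 sn 12), xfer 40, retry 1",
]

recovered, unresolved = pair_retries(SAMPLE)
for rec in recovered:
    print(f"{rec['drive']} fsbn {rec['fsbn']}: recovered after {rec['delay']:.1f}s")
for rec in unresolved:
    print(f"{rec['drive']} fsbn {rec['fsbn']}: no correction seen (last retry {rec['retry']})")
```

Blocks that keep turning up in the unresolved list are the ones I would expect
to surface as ZFS read errors eventually.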
The retry happens in the wd driver, below ZFS, so ZFS never sees the error.
If the read failed all 3 times you'd probably get a data error in ZFS.

>
> For the sake of completeness, here is the issue of S.M.A.R.T. - even if
> I find it difficult to interpret:
>
> ```
> saturn$ doas atactl wd2 smart status
> SMART supported, SMART enabled
> id value thresh crit collect reliability description                 raw
>   1 197    51    yes  online  positive    Raw read error rate         38669
>   3 176    21    yes  online  positive    Spin-up time                6158
>   4 100     0    no   online  positive    Start/stop count            510
>   5 200   140    yes  online  positive    Reallocated sector count    0

I was expecting to see this value greater than 0 if the drive was failing.
Is the drive bad, or the cabling?

Cheers,

Ian