Launchpad has imported 11 comments from the remote bug at http://bugs.freedesktop.org/show_bug.cgi?id=25772.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2009-12-23T01:11:19+00:00 Jelot-freedesktop wrote:

A comment in the code says:

/* We use log2(n_sectors) as a threshold here. We had to pick
 * something, and this makes a bit of sense, or doesn't it? */

This means:

128 GB = 2^37 bytes = 2^28 sectors -> log2(2^28) = 28 sectors
1 TB   = 2^40 bytes = 2^31 sectors -> log2(2^31) = 31 sectors
8 TB   = 2^43 bytes = 2^34 sectors -> log2(2^34) = 34 sectors

I think this is an unlucky heuristic. The meaning of the raw value is vendor specific. It could make sense if BAD_SECTOR_MANY were calculated like:

(worst value - threshold value) <= 5 ?

Obviously this is only an example.

Reply at: https://bugs.launchpad.net/libatasmart/+bug/438136/comments/98

------------------------------------------------------------------------
On 2009-12-23T05:15:11+00:00 Lennart-poettering wrote:

The entire SMART attribute business is highly vendor dependent, since there is no officially accepted spec about SMART attribute decoding. (It never became an official standard; all it ever was was a draft that was later withdrawn.) Fortunately, on almost all drives the raw data of quite a few fields can be decoded the same way. In libatasmart we try to include the decoding of fields where it makes sense and is commonly accepted. OTOH the non-raw fields (i.e. "current" and "worst") encode the information about the raw number of sectors (for sector-related attributes) in a way that does not let us determine the actual number of sectors anymore.

The reason for this extra threshold we apply here is that we wanted vendor-independent health checking, i.e. as long as we can trust the number of raw bad sectors the drive reports, we can compare it against a threshold that is not fiddled with by the vendor to make his drives look better.

The reason I picked log2() here is simply that we do want to allow more bad sectors on bigger drives than on small ones. But a linearly related threshold seemed to increase too quickly, so the next choice was logarithmic.

Do you have any empiric example where the current thresholds do not work as they should?

Reply at: https://bugs.launchpad.net/libatasmart/+bug/438136/comments/99
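Illustrative sketch (not part of the imported thread): assuming the cutoff really is floor(log2(number of 512-byte sectors)), as the quoted code comment suggests, a few lines of C reproduce the figures discussed in these comments: 28 sectors for 128 GiB, 31 for 1 TiB, 34 for 8 TiB, 41 for 1 PiB.

/* Illustrative only: assumes the cutoff is floor(log2(number of
 * 512-byte sectors)); this is not the libatasmart source. */
#include <stdio.h>
#include <stdint.h>

static unsigned bad_sector_cutoff(uint64_t capacity_bytes)
{
        uint64_t n_sectors = capacity_bytes / 512;
        unsigned cutoff = 0;

        while (n_sectors >>= 1)     /* floor(log2(n_sectors)) */
                cutoff++;
        return cutoff;
}

int main(void)
{
        printf("128 GiB: %u sectors\n", bad_sector_cutoff(128ULL << 30)); /* 28 */
        printf("  1 TiB: %u sectors\n", bad_sector_cutoff(1ULL   << 40)); /* 31 */
        printf("  8 TiB: %u sectors\n", bad_sector_cutoff(8ULL   << 40)); /* 34 */
        printf("  1 PiB: %u sectors\n", bad_sector_cutoff(1ULL   << 50)); /* 41 */
        return 0;
}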
------------------------------------------------------------------------
On 2009-12-28T08:38:04+00:00 Stephen-boddy wrote:

Please check the associated skdump save file. This is an old 20GB laptop drive. The latest Ubuntu 9.10 ships with libatasmart 0.16. I think this drive is incorrectly flagged as failing, because the lib relies on the raw value being a single raw48 value. This then looks like very many (262166) bad blocks.

Using "smartctl -a /dev/sda" I get the following extracts:

SMART overall-health self-assessment test result: PASSED
  5 Reallocated_Sector_Ct   0x0033   100   100   005   Pre-fail  Always   -   262166
196 Reallocated_Event_Count 0x0032   100   100   000   Old_age   Always   -   4

If I use the -v 5,raw8 option:

  5 Reallocated_Sector_Ct   0x0033   100   100   005   Pre-fail  Always   -   0 0 0 4 0 22

If I use the -v 5,raw16 option:

  5 Reallocated_Sector_Ct   0x0033   100   100   005   Pre-fail  Always   -   0 4 22

The attribute is being read as raw48, which in this case looks to be completely wrong. Using a different raw# format seems to tie in with attribute 196.

It could be argued that if you cannot rely on the format of the raw value, you should not base warnings on it, and should only use the normalized, worst and threshold values. I'm technical, and I damn near junked a relative's old but still serviceable laptop because of this.

Reply at: https://bugs.launchpad.net/libatasmart/+bug/438136/comments/104

------------------------------------------------------------------------
On 2009-12-28T08:39:08+00:00 Stephen-boddy wrote:

Created an attachment (id=32330)
skdump of hard drive

Reply at: https://bugs.launchpad.net/libatasmart/+bug/438136/comments/105

------------------------------------------------------------------------
On 2009-12-29T04:17:41+00:00 Jelot-freedesktop wrote:

(In reply to comment #1)
> The reason I picked log2() here is simply that we do want to allow more bad
> sectors on bigger drives than on small ones. But a linearly related threshold
> seemed to increase too quickly, so the next choice was logarithmic.
>
> Do you have any empiric example where the current thresholds do not work as
> they should?

For convenience I use kibibytes, mebibytes, gibibytes, ...

128 GiB = 2^37 bytes -> log2(2^37/512) = log2(2^37/2^9) = 28 sectors

For an HDD of 128 GiB (2^37 bytes) the calculated threshold is 28 sectors (14336 bytes = 14 KiB); isn't that too low?
For an HDD of 1 TiB (2^40 bytes) the calculated threshold is 31 sectors (15872 bytes = 15.5 KiB) ...
For a hypothetical HDD of 1 PiB (2^50 bytes, 1024 tebibytes) the calculated threshold is only 41 sectors (20992 bytes = 20.5 KiB) ...

If we do want to allow more bad sectors on bigger drives than on small ones, IMHO this isn't a good heuristic: the difference between a 128 GiB HDD and an 8 TiB HDD is only 6 sectors (3 KiB).

Reply at: https://bugs.launchpad.net/libatasmart/+bug/438136/comments/107

------------------------------------------------------------------------
On 2009-12-30T06:12:33+00:00 Jelot-freedesktop wrote:

I forgot to say that this bug report and the enhancement requested in Bug #25773 are due to Launchpad Bug 438136 <https://bugs.launchpad.net/ubuntu/+source/libatasmart/+bug/438136?comments=all>.

On Launchpad there are also some screenshots of palimpsest that show the failing hard disk with relatively few bad sectors, or with a raw value in what is probably a different format (there are 65537, 65539, 65551, 65643 and similar numbers of bad sectors). Some examples:

117 bad sectors (58.5 KiB) on 1000GB HDD <http://launchpadlibrarian.net/32604239/palimpsest-screenshot.png>
66 bad sectors (33 KiB) on 200GB HDD <http://launchpadlibrarian.net/34794631/Screenshot-SMART%20Data.png>
466 bad sectors (233 KiB) on 1500GB HDD <http://launchpadlibrarian.net/34991157/Screenshot.png>
65 bad sectors (32.5 KiB) on 120GB HDD (all current pending sectors) <http://launchpadlibrarian.net/35201129/Pantallazo-Datos%20SMART.png>
54 bad sectors (27 KiB) on 169GB HDD <http://launchpadlibrarian.net/36115988/Screenshot.png>

Reply at: https://bugs.launchpad.net/libatasmart/+bug/438136/comments/108
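Illustrative sketch (not part of the imported thread): counts like the 262166 reported for the 20 GB drive above, and the 65537/65539-style values mentioned for the screenshots, are what a multi-field raw value looks like when the whole 48-bit field is read as a single integer. Decoding that same 262166 three ways:

/* Illustrative only: one 48-bit raw value read three different ways.
 * 262166 == 0x000000040016, the value reported for attribute 5 above. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint64_t raw48 = 262166;

        /* As a single raw48 number (how it was interpreted): ~262k sectors */
        printf("raw48: %llu\n", (unsigned long long) raw48);

        /* As three 16-bit words (smartctl -v 5,raw16): 0 4 22 */
        printf("raw16: %u %u %u\n",
               (unsigned) ((raw48 >> 32) & 0xffff),
               (unsigned) ((raw48 >> 16) & 0xffff),
               (unsigned) (raw48 & 0xffff));

        /* As six bytes (smartctl -v 5,raw8): 0 0 0 4 0 22 */
        printf("raw8 :");
        for (int shift = 40; shift >= 0; shift -= 8)
                printf(" %u", (unsigned) ((raw48 >> shift) & 0xff));
        printf("\n");
        return 0;
}

The 65537, 65539, 65551 and similar counts fit the same pattern: 65537 is 0x10001, i.e. two small 16-bit counters packed into one field.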
------------------------------------------------------------------------
On 2010-03-19T04:00:15+00:00 Martin Pitt wrote:

The bigger problem of this is (as you already mentioned) that the raw value is misparsed way too often. Random examples from bug reports:

http://launchpadlibrarian.net/34574037/smartctl.txt
  5 Reallocated_Sector_Ct   0x0033   100   100   005   Pre-fail  Always   -   327697

http://launchpadlibrarian.net/35971054/smartctl_tests.log
  5 Reallocated_Sector_Ct   0x0033   100   100   005   Pre-fail  Always   -   65542

http://launchpadlibrarian.net/36599746/smartctl_tests-deer.log
  5 Reallocated_Sector_Ct   0x0033   100   100   005   Pre-fail  Always   -   65552

https://bugzilla.redhat.com/attachment.cgi?id=382378
  5 Reallocated_Sector_Ct   0x0033   100   100   005   Pre-fail  Always   -   655424

https://bugzilla.redhat.com/show_bug.cgi?id=506254
  reallocated-sector-count 100/100/ 5 FAIL 1900724 sectors Prefail Online

It seems that the "no officially accepted spec about SMART attribute decoding" also hits here, in the sense that way too many drives get the raw counts wrong. In all the 30 or so logs that I looked at in the various Launchpad/Red Hat/fd.o bug reports related to this, I didn't see an implausible normalized value, though.

I appreciate the effort of doing vendor-independent bad block checking, but a lot of people get tons of false alarms due to it, and thus won't believe it any more when a disk really is failing some day. My feeling is that a more cautious approach would be to use the normalized value vs. threshold for the time being, and use the raw values if/when that can be made more reliable (then we should use something in between logarithmic and linear, though, since due to sheer probabilities, large disks will have more bad sectors and also more reserve sectors than small ones).

Reply at: https://bugs.launchpad.net/libatasmart/+bug/438136/comments/144

------------------------------------------------------------------------
On 2010-03-19T04:27:33+00:00 Martin Pitt wrote:

Created an attachment (id=34234)
smart blob with slightly broken sectors

BTW, I use this smart blob for playing around and testing, which is a particularly interesting one: it has a few bad sectors (correctly parsed), but not enough yet to be below the vendor-specified threshold.

  5 reallocated-sector-count   77   1   63   1783 sectors   0xf70600000000   prefail   online    yes   no
197 current-pending-sector     83   6    0   1727 sectors   0xbf0600000000   old-age   offline   n/a   n/a

So this can be loaded into skdump or udisks for testing the desktop integration all the way through:

$ sudo udisks --ata-smart-refresh /dev/sda --ata-smart-simulate /tmp/smart.blob

Reply at: https://bugs.launchpad.net/libatasmart/+bug/438136/comments/145

------------------------------------------------------------------------
On 2010-03-19T07:02:09+00:00 Martin Pitt wrote:

Created an attachment (id=34242)
Drop our own "many bad sectors" heuristic

This patch just uses the standard "compare normalized value against threshold". I know that it's not necessarily how you really want it to work, but it's a pragmatic solution to avoid all those false positives, which don't help people either. So of course feel free to entirely ignore it, but at least I want to post it here for full disclosure. (I'll apply it to Debian/Ubuntu; we have to get a release out.)

This patch is against the one in bug 26834.

Reply at: https://bugs.launchpad.net/libatasmart/+bug/438136/comments/146
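Illustrative sketch (not the attached patch itself; field names are hypothetical): the check the patch falls back to is the vendor-defined one, where a pre-fail attribute counts as failing once its normalized value drops to or below the vendor-set threshold.

/* Illustrative only, hypothetical field names. Some implementations
 * treat only strictly-below as failing; a threshold of 0 conventionally
 * means "never fails". */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

struct attribute {
        uint8_t current;    /* normalized value, e.g. 100 when healthy  */
        uint8_t worst;      /* lowest normalized value seen so far      */
        uint8_t threshold;  /* vendor-set limit, e.g. 5 for attribute 5 */
};

static bool failing_now(const struct attribute *a)
{
        return a->threshold != 0 && a->current <= a->threshold;
}

static bool failed_in_the_past(const struct attribute *a)
{
        return a->threshold != 0 && a->worst <= a->threshold;
}

int main(void)
{
        /* Values from the Maxtor example in the skdump comparison below:
         * current 226, worst 226, threshold 63. */
        struct attribute reallocated = { .current = 226, .worst = 226, .threshold = 63 };

        printf("failing now: %s\n", failing_now(&reallocated) ? "yes" : "no");
        printf("failed in the past: %s\n", failed_in_the_past(&reallocated) ? "yes" : "no");
        return 0;
}

With those Maxtor values neither check fires, which is consistent with the status change from BAD_SECTOR_MANY to BAD_SECTOR reported in the next comment.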
------------------------------------------------------------------------
On 2010-03-19T07:05:13+00:00 Martin Pitt wrote:

Oh, forgot: I compared

for i in blob-examples/*; do echo "-- $i"; ./skdump --load=$i; done > /tmp/atasmart-test.out

before and after, and get two differences like:

-Overall Status: BAD_SECTOR_MANY
+Overall Status: BAD_SECTOR

The first one is against blob-examples/Maxtor_96147H8--BAC51KJ0:

  5 reallocated-sector-count   226   226    63   69 sectors   0x450000000000   prefail   online   yes   yes

and the second one against blob-examples/WDC_WD5000AAKS--00TMA0-12.01C01:

  5 reallocated-sector-count   192   192   140   63 sectors   0x3f0000000000   prefail   online   yes   yes

so under the premise of changing the evaluation to use the normalized numbers, those are correct and expected changes.

Reply at: https://bugs.launchpad.net/libatasmart/+bug/438136/comments/147

------------------------------------------------------------------------
On 2010-07-04T02:09:56+00:00 cowbutt wrote:

(In reply to comment #1)
> The reason I picked log2() here is simply that we do want to allow more bad
> sectors on bigger drives than on small ones. But a linearly related threshold
> seemed to increase too quickly, so the next choice was logarithmic.
>
> Do you have any empiric example where the current thresholds do not work as
> they should?

According to http://www.seagate.com/ww/v/index.jsp?locale=en-US&name=SeaTools_Error_Codes_-_Seagate_Technology&vgnextoid=d173781e73d5d010VgnVCM100000dd04090aRCRD (which I first read about 18 months ago, when 1.5TB drives were brand new), "Current disk drives contain *thousands* [my emphasis] of spare sectors which are automatically reallocated if the drive senses difficulty reading or writing".

Therefore, it is my belief that your heuristic is off by somewhere between one and two orders of magnitude, as it only allows for 30 bad sectors on a 1TB drive (Seagate's article would imply it has at least 2000 spare sectors, and maybe more, of which 30 is only 1.5%). As you say, though, this is highly manufacturer- and model-dependent; Seagate's drives might be designed with very many more spare sectors than other manufacturers' drives.

The only sure-fire way to interpret the SMART attributes is to compare the cooked value with the vendor-set threshold for that attribute. If you are insistent upon doing something with the raw reallocated sector count attribute, I believe it would be far more useful to alert when it changes, or changes by a large number of sectors in a short period of time.

Reply at: https://bugs.launchpad.net/libatasmart/+bug/438136/comments/167
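Illustrative sketch (not part of the imported thread; the API is hypothetical): the alternative suggested in the last comment, alerting on growth of the raw reallocated-sector count rather than on its absolute value, could look roughly like this.

/* Illustrative only, hypothetical API: remember the previous reading and
 * react to growth, or to rapid growth, of the raw count. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

enum realloc_alert { REALLOC_OK, REALLOC_GREW, REALLOC_BURST };

struct realloc_history {
        uint64_t last_count;   /* raw reallocated-sector count last time */
        time_t   last_time;    /* when that reading was taken            */
};

static enum realloc_alert check_realloc(struct realloc_history *h,
                                        uint64_t count, time_t now,
                                        uint64_t burst_limit,
                                        time_t burst_window)
{
        enum realloc_alert r = REALLOC_OK;

        if (count > h->last_count) {
                uint64_t delta = count - h->last_count;

                /* Any growth is worth noting; many new reallocations in a
                 * short window are worth a louder alert. */
                if (now - h->last_time <= burst_window && delta > burst_limit)
                        r = REALLOC_BURST;
                else
                        r = REALLOC_GREW;
        }
        h->last_count = count;
        h->last_time  = now;
        return r;
}

int main(void)
{
        struct realloc_history h = { .last_count = 63, .last_time = 0 };

        /* e.g. 20 new reallocated sectors within an hour of the last poll */
        enum realloc_alert a = check_realloc(&h, 83, 3600, 10, 24 * 3600);
        printf("%s\n", a == REALLOC_BURST ? "burst" :
                       a == REALLOC_GREW  ? "grew"  : "ok");
        return 0;
}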
** Changed in: libatasmart
   Importance: Unknown => Medium

--
palimpsest bad sectors false positive
https://bugs.launchpad.net/bugs/438136
