Mayuresh wrote:
> Bob Proulx wrote:
> > Is your Load_Cycle_Count continuously increasing?
>
> Doesn't look like. It was 3634 when I started watching and over last few
> minutes it changed only to 3635.

That still seems like a rather high load_cycle_count.  And if it is
increasing every minute then I would investigate the issue further.
What does this say?

  hdparm -B /dev/sda

> > Install smartmontools.  I also think you should set up regular
> > drive selftests.  Ask if you want me to suggest something about this.
>
> Yes, please do suggest.

Is your laptop something that is mostly off but only sometimes on?  Or
something that is mostly on and sometimes mobile?  Or something
different?  Mobile devices are a little hard to schedule selftests on
because we want to run them when the device is otherwise idle but on
AC mains power.

I don't know a perfect answer for mobile devices so let me start by
explaining the default configuration, then explaining my preferred
configuration for always on systems, then guessing at something good
for a mobile device.

Install the smartmontools package.  The default configuration
dynamically searches for disk drives.  If smartd detects a failure it
will notify by sending email.  Here is the default config:

  # The word DEVICESCAN will cause any remaining lines in this
  # configuration file to be ignored: it tells smartd to scan for all
  # ATA and SCSI devices.  DEVICESCAN may be followed by any of the
  # Directives listed below, which will be applied to all devices that
  # are found.  Most users should comment out DEVICESCAN and explicitly
  # list the devices that they wish to monitor.
  DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner

The above is the default.  It is documented in the smartd.conf man
page.  The options listed are:

  -d TYPE Set the device type: ata, scsi, marvell, removable, 3ware,N, hpt,L/M/N
  -n MODE No check. MODE is one of: never, sleep, standby, idle
  -m ADD  Send warning email to ADD for -H, -l error, -l selftest, and -f
  -M TYPE Modify email warning behavior (see man page)

The reason it recommends removing DEVICESCAN and replacing it with an
explicit configuration is for systems with multiple disk drives.  A
server with two mirrored RAID1 disks might have one disk fail
completely.  If using DEVICESCAN it will only detect one disk and
won't know there should be a second one.  By explicitly telling it
that there should be two disks it can report the failure on the
missing one.

The default config is a good safe default in that it is installable on
any system and provides something.  Unfortunately it never runs any
selftests.  Therefore on an always on server I change the
configuration to be this:

  # Monitor all attributes, enable automatic offline data collection,
  # automatic attribute autosave, and start a short self-test every
  # weekday and a long self-test Saturdays, both between 3-4am.
  # Ignore attribute 194 temperature change.
  # Ignore attribute 190 airflow temperature change.
  # On failure run all installed scripts (to send notification email).
  /dev/sda -a -o on -S on -s (S/../../[1-5]/03|L/../../6/03) -I 194 -I 190 -m root -M exec /usr/share/smartmontools/smartd-runner
  /dev/sdb -a -o on -S on -s (S/../../[1-5]/03|L/../../6/03) -I 194 -I 190 -m root -M exec /usr/share/smartmontools/smartd-runner

And for those options, all of these are on:

  -a      Default: equivalent to -H -f -t -l error -l selftest -C 197 -U 198
  -H      Monitor SMART Health Status, report if failed
  -f      Monitor for failure of any 'Usage' Attributes
  -t      Equivalent to -p and -u Directives
  -p      Report changes in 'Prefailure' Normalized Attributes
  -u      Report changes in 'Usage' Normalized Attributes
  -o VAL  Enable/disable automatic offline tests (on/off)
  -S VAL  Enable/disable attribute autosave (on/off)
  -I ID   Ignore Attribute ID for -p, -u or -t Directive

Those should be relatively straightforward.  Basically all of the
above says to monitor important things and ignore unimportant things.

  -s REGE Start self-test when type/date matches regular expression (see man page)

That one is a mouthful.  That one is where my comments come in to
help.  With the man page documentation the

  -s (S/../../[1-5]/03|L/../../6/03)

option becomes this: start a short self-test every weekday and a long
self-test Saturdays, both between 3-4am (the hour field is 03 in both
halves of the expression).  That is the part that runs the selftests.
Without -s it doesn't.  The example file has examples that do almost
exactly this.  But those examples are commented out.

For a server I like the above configuration where the selftests are
run periodically.  But for a mobile laptop this is more difficult
because depending upon the user it might not be powered up at all
during that time.  Although if it is then it is likely to be on AC
mains power.  If it is on battery then we probably do not want to run
the self-tests, or at least not the long self-tests.

On my laptops I have upgraded to SSDs everywhere so don't need to
worry about this problem anymore.  Therefore I don't have a good
canned solution.
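
Coming back to the -s expression for a moment, since it is the least
obvious part.  As I read the smartd.conf man page, smartd builds a 12
character string T/MM/DD/d/HH (test type, month, day of month, day of
week with 1 = Monday, hour) and starts the test when the whole string
matches the extended regular expression.  Here is a small sketch that
imitates that matching with grep so a schedule can be tried out by
hand.  The function name would_run is my own invention, and this only
approximates what smartd does internally.

```shell
#!/bin/sh
# Schedule copied from the server configuration discussed above.
SCHEDULE='(S/../../[1-5]/03|L/../../6/03)'

# would_run TYPE MM/DD/d/HH   (hypothetical helper, not part of smartmontools)
# Succeeds if a test of TYPE (S = short, L = long) would start at the
# given time.  grep -x requires the whole string to match.
would_run() {
    printf '%s/%s\n' "$1" "$2" | grep -Eqx "$SCHEDULE"
}

# Check the current hour.  date's %u is the day of week with 1 = Monday.
now=$(date +%m/%d/%u/%H)
for type in S L; do
    if would_run "$type" "$now"; then
        echo "$type test would start this hour"
    fi
done
```

For example "would_run S 01/15/3/03" succeeds (a Wednesday at 3am)
while "would_run S 01/15/6/03" does not, because Saturday is reserved
for the long test.
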
I think for me I would use anacron to run the selftests by cron when
AC power is on.  At this point I think I must leave this as an
exercise for the reader.  Couple that with what you know about how you
use your device because everyone is different.  But I would read up on
"anacron" and "on_ac_power" and create a cron script that runs short
selftests daily and long selftests once a week or so.

Hope that helps!
Bob
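
P.S. To make the exercise a little more concrete, here is a rough,
untested sketch of the kind of script I have in mind.  Everything
specific in it is an assumption: that the drive is /dev/sda, that the
script lives in /etc/cron.daily (which anacron runs on Debian, so it
catches up after the machine has been off), that Saturday is the day
for the long test, and that on_ac_power is installed (on Debian it
comes from the powermgmt-base package).  The function names
choose_test and main are made up for illustration.  Adjust all of it
to how you use your laptop.

```shell
#!/bin/sh
# Hypothetical /etc/cron.daily/smart-selftest (name and path are assumptions).
DEVICE=/dev/sda

# choose_test DAY_OF_WEEK   (1 = Monday ... 7 = Sunday)
# Prints "long" on Saturday and "short" on every other day.
choose_test() {
    if [ "$1" -eq 6 ]; then
        echo long
    else
        echo short
    fi
}

main() {
    # Do nothing unless the required tools are installed.
    command -v on_ac_power >/dev/null 2>&1 || return 0
    command -v smartctl >/dev/null 2>&1 || return 0

    # on_ac_power exits 0 only when running on mains power.
    on_ac_power || return 0

    # Ask the drive to start the selftest; it runs in the background.
    smartctl -t "$(choose_test "$(date +%u)")" "$DEVICE"
}

main
```

Afterward "smartctl -l selftest /dev/sda" shows the selftest log so
you can see whether the tests actually ran and how they finished.
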