Re: irq timeout error

Mike McCarty Mon, 21 Dec 2009 00:09:26 -0800

Mykal Funk wrote:
> Mike McCarty wrote:
>> Mykal Funk wrote:
>>   
>>> While running a compile of GCC I got the following error:
>>>     
>> Oh, if you have a distro which can use SMART, and your
>> disc is SMART capable, you can ask it.
>>
>> # smartctl -i /dev/hda
>>   
> This command showed that the drive was SMART enabled, though it failed 
> to recognize the disc saying "Not in smartctl database".


Good. It isn't a problem that your program doesn't know your
disc specifically.

For most of these numbers, changes over time matter more
than the actual values.

Normal disc wearout follows the Weibull family of distributions.
This gives the so-called "bathtub curve" of failure rate against
time (failure rates are commonly quoted in FITs, "failures in
time"). The rate starts out high, with the so-called "infant
mortality", then levels out at a very low rate, followed by a
sharp rise near "end of life". When you start to see changes in
the report, indicating a trend change, then you start to worry.
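
To make the bathtub shape concrete, here is a small Python sketch
of the Weibull hazard (instantaneous failure) rate. The shape
values 0.5, 1.0 and 3.0 are picked purely for illustration and are
not measured from any real disc. A shape below 1 gives a falling
rate (infant mortality), exactly 1 gives a constant rate, and
above 1 gives a rising rate (wearout).

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard rate h(t) = (k/s) * (t/s)**(k-1), for t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


# Illustrative shape parameters only (assumptions, not drive data).
PHASES = {
    "infant mortality": 0.5,  # failure rate falls with age
    "useful life": 1.0,       # failure rate is constant
    "wearout": 3.0,           # failure rate rises with age
}

if __name__ == "__main__":
    for name, shape in PHASES.items():
        rates = [weibull_hazard(t, shape) for t in (0.5, 1.0, 2.0)]
        print(name, ["%.3f" % r for r in rates])
```

Stitching the three phases together end to end is what gives the
bathtub outline.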

[...]

> SMART overall-health self-assessment test result: PASSED
> Please note the following marginal Attributes:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      
> UPDATED  WHEN_FAILED RAW_VALUE
>   5 Reallocated_Sector_Ct   0x0033   100   001   050    Pre-fail  
> Always   In_the_past 0

This is the most worrisome one. These discs are made with
spare sectors. As the sectors start to go bad, the data
are copied off to other spare (unmapped) sectors, and the
disc pretends that the new (previously unmapped) sectors are
the actual originals. It lies to the system, and uses a
different sector, which previously was on the "spare list"
as a replacement. It's not unusual to find a few which have
been remapped, immediately after manufacture.
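
If it helps to picture the remapping, here is a toy model in
Python. It is only a sketch of the idea described above; the class
and method names are made up, and real drive firmware is certainly
not written like this.

```python
class ToyDisc:
    """Toy model of sector remapping (hypothetical, for illustration)."""

    def __init__(self, sectors, spares):
        self.sectors = sectors
        # Logical sector n lives at physical sector n until remapped.
        self.remap = {}
        # Spare physical sectors, numbered after the visible ones.
        self.spare_list = list(range(sectors, sectors + spares))
        self.reallocated_count = 0

    def physical(self, logical):
        """Return the physical sector that really holds `logical`."""
        if not 0 <= logical < self.sectors:
            raise IndexError("no such logical sector")
        return self.remap.get(logical, logical)

    def reallocate(self, logical):
        """Retire the sector behind `logical` and substitute a spare."""
        if not 0 <= logical < self.sectors:
            raise IndexError("no such logical sector")
        if not self.spare_list:
            raise RuntimeError("out of spare sectors")
        self.remap[logical] = self.spare_list.pop(0)
        self.reallocated_count += 1
```

The system keeps asking for the same logical sector number and
never sees that the data now lives somewhere else. Only the
reallocation count gives it away.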

When you see the remapping again starting to take place, then
the disc is likely nearing wearout. As mentioned above, the trend
is the issue. Watch this closely, and if you see the number
of remapped sectors going up, then you must not put any
important data on this disc.
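
As a rough sketch of what "watching the trend" could mean, assume
you jot down the Reallocated_Sector_Ct raw value every so often.
The function name and the three labels it returns are my own
invention, not anything smartctl provides.

```python
def remap_trend(readings):
    """Classify a time-ordered list of reallocated-sector counts."""
    if len(readings) < 2:
        return "not enough data"
    pairs = list(zip(readings, readings[1:]))
    if not any(later > earlier for earlier, later in pairs):
        return "stable"
    # A rise in the most recent interval is the worrying case.
    if readings[-1] > readings[-2]:
        return "rising"
    return "rose earlier, stable now"
```

For example, remap_trend([3, 3, 3, 3]) is "stable", while
remap_trend([3, 3, 4, 6]) is "rising", and the second case is the
one where you stop trusting the disc.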

>> # smartctl -a /dev/hda
>>   
> SMART Attributes Data Structure revision number: 4
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      
> UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000b   100   100   025    Pre-fail  
> Always       -       274871
>   3 Spin_Up_Time            0x0027   100   099   025    Pre-fail  
> Always       -       64

If this one is gradually worsening, then it likely indicates
bearing wearout.

>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   
> Always       -       1645

This is just the number of times the disc has been "spun up".
The value is only very roughly correlated to wearout. Total
hours is more closely correlated, as it relates to bearing
wearout. This one is related because each time the disc spins
down, the heads have to "land" on the surface and actually
contact it. When the disc spins up, they have to take off and
"fly". This means the discs rub on the heads, wearing them, and
must also overcome "stiction", which applies torque to the head
supports.

>   5 Reallocated_Sector_Ct   0x0033   100   001   050    Pre-fail  
> Always   In_the_past 0
>   7 Seek_Error_Rate         0x000b   100   100   025    Pre-fail  
> Always       -       91205

If this number is going up, then your disc may be having trouble
reading the servo calibration surface, which is used for all seeks.
There is one surface, with "blank" sectors on it, used for servo
control of the positioning. There is an electromagnet which is
used to position the heads over the selected track/cylinder. The
servo is used to do an adjustment. One head and one surface are
used to "read" that blank surface, and the head is adjusted in/out
until the signal level read by that head from that surface is
maximized. If the seek fails, that means either that the actual
cylinder selected was the wrong one, or that the sector read which
took place indicated a nonexistent track.

This may be the source of your timeouts. If the servo is having a
hard time properly positioning the heads, then you've got a problem.
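
The "adjust in/out until the signal is maximized" idea can be
sketched as a simple hill climb. This is a toy only: the signal
function, the step sizes and both function names are assumptions
for illustration, and a real drive does this in firmware with a
proper control loop.

```python
def track_signal(offset, centre=0.0):
    """Hypothetical signal level: strongest over the track centre."""
    return 1.0 / (1.0 + (offset - centre) ** 2)


def settle(offset, step=0.1, centre=0.0, max_moves=1000):
    """Nudge the head in or out until the signal stops improving."""
    for _ in range(max_moves):
        here = track_signal(offset, centre)
        if track_signal(offset + step, centre) > here:
            offset += step
        elif track_signal(offset - step, centre) > here:
            offset -= step
        else:
            # Neither direction helps, so try a finer adjustment.
            step /= 2
            if step < 1e-6:
                break
    return offset
```

If the signal from the servo surface is weak or noisy, a loop like
this has nothing solid to climb toward, which is one way seeks
could end up slow or wrong.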

>   9 Power_On_Hours          0x0012   080   080   000    Old_age   
> Always       -       1149279

THIS one always goes up, and is related to both bearing wearout,
and to

>  10 Spin_Retry_Count        0x0027   100   100   072    Pre-fail  
> Always       -       64
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   
> Always       -       832
> 
> Warning: device does not support Error Logging

These warnings are meaningless, more or less. They simply
indicate that the SMART on that drive isn't so very, er,
smart :-)

> Error SMART Error Log Read failed: Input/output error
> Smartctl: SMART Error Log Read Failed
> Warning: device does not support Self Test Logging
> Error SMART Error Self-Test Log Read failed: Input/output error
> Smartctl: SMART Self Test Log Read Failed
> Device does not support Selective Self Tests/Logging
> 
>> will give you more information about what may be wrong, if anything.
>>
>> Mike
>>   
> I've not dealt with hard discs on this level before. I am a little 
> unsure of how to interpret the data I received. It looks like it passed, 
> but you have that Reallocated_Sector_Ct to take into consideration. What 
> do you think?

I think that you need to watch this information, and look for trends
in error rates and sector remapping. If they seem to be stable, and
you aren't seeing more sectors getting remapped, then fine.
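
If you want to keep a record rather than eyeball it, something
along these lines could pull the attribute table out of a saved
report. It is a sketch under assumptions: it expects one attribute
per line with the ten columns shown in the header above (the
quoted output in this thread has been wrapped by the mailer), and
the function names are made up. The sample rows are the ones from
your report, unwrapped.

```python
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   001   050    Pre-fail  Always   In_the_past 0
  7 Seek_Error_Rate         0x000b   100   100   025    Pre-fail  Always       -       91205
"""

FIELDS = ("name", "flag", "value", "worst", "thresh",
          "type", "updated", "when_failed", "raw")


def parse_attributes(text):
    """Map attribute ID -> dict of the remaining nine columns."""
    table = {}
    for line in text.splitlines():
        # Split into at most ten columns; the raw value may hold spaces.
        parts = line.strip().split(None, 9)
        # Skip the header and anything that is not an attribute row.
        if len(parts) != 10 or not parts[0].isdigit():
            continue
        table[int(parts[0])] = dict(zip(FIELDS, parts[1:]))
    return table


def changed_raw(old, new):
    """Return {id: (old_raw, new_raw)} for raw values that differ."""
    return {
        attr_id: (old[attr_id]["raw"], new[attr_id]["raw"])
        for attr_id in old
        if attr_id in new and old[attr_id]["raw"] != new[attr_id]["raw"]
    }
```

Save a report now and another in a few weeks, parse both, and
changed_raw() shows which raw values moved in between.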

Mike
-- 
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
Oppose globalization and One World Governments like the UN.
This message made from 100% recycled bits.
You have found the bank of Larn.
I speak only for myself, and I am unanimous in that!
-- 
http://linuxfromscratch.org/mailman/listinfo/lfs-support
FAQ: http://www.linuxfromscratch.org/lfs/faq.html
Unsubscribe: See the above information page
