On 13/03/2012 08:09, Volodymyr Kostyrko wrote:
> The only load I know of that reliably causes a lockup within a few
> hours is memcached. The project has now been migrated to redis and the
> machines have survived for two weeks. The most common cause of a lockup
> is an ECC error.

I see.  That puts a different complexion on things.  Although it is
application specific it doesn't rule out hardware problems.  In fact,
given the nature of the error -- ECC problems -- it pretty much nails it
as something wrong with the RAM in that machine.

Given that memtest86 doesn't show any problems, and that you can run a
similar workload with different software, it suggests that you have one
or more memory sticks that are marginal.  Something like extra heat
due to higher rates of memory accesses from a particular application
could be tipping it over the edge into failure.
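
If you want to generate sustained memory traffic from user space without
running the application itself, something along the following lines may
help.  This is only a rough sketch: the function names, buffer size and
byte patterns are my own assumptions, not taken from any particular
tool, and it is no substitute for memtest86.  On ECC memory a corrected
error will not show up as a mismatch here; it would be reported by the
hardware or the OS instead, so treat this as a way of producing load
while you watch the system logs.

```python
def stress_pass(buf, byte):
    """Fill buf with one byte value, then read it back and compare.

    Returns True if the read-back matches what was written.
    """
    expected = bytes([byte]) * len(buf)
    buf[:] = expected          # bulk write over the whole buffer
    return buf == expected     # bulk read and compare


def stress(size_bytes=64 * 1024 * 1024, passes=8):
    """Run several write/verify passes over one large buffer.

    Returns a list of (pass_number, byte) pairs that failed verification.
    The patterns (all zeros, all ones, alternating bits) are assumptions
    chosen for illustration.
    """
    patterns = [0x00, 0xFF, 0xAA, 0x55]
    buf = bytearray(size_bytes)
    failures = []
    for n in range(passes):
        byte = patterns[n % len(patterns)]
        if not stress_pass(buf, byte):
            failures.append((n, byte))
    return failures


if __name__ == "__main__":
    bad = stress()
    print("failed passes:", bad if bad else "none")
```

Run it with a buffer that is a sizeable fraction of RAM, and for long
enough to warm the memory up, if you want it to be at all comparable to
the real workload.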

The 'marginal' behaviour need not be a fault in the memory stick per se.
It could simply be that the particular characteristics of the memory you
have installed are not exactly compatible with your motherboard.  In
theory, memory conforming to a particular standard should avoid this
sort of problem, but unfortunately that is not infallible.
Swapping the memory sticks for ones of equivalent specification from a
different manufacturer should give good results.

        Cheers,

        Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.
PGP: http://www.infracaninophile.co.uk/pgpkey

