On Mar 25, 2014, at 8:46 AM, Matthew Spilich wrote:

> The symptom:   The database machine (running postgres 9.1.9 on CentOS 6.4) is 
> running at low utilization most of the time, but once every day or two, it 
> will appear to slow down to the point where queries back up and clients are 
> unable to connect.  Once this event occurs, there are lots of concurrent 
> queries, I see slow queries appear in the logs, but there doesn't appear to 
> be anything abnormal that I have been able to see that causes this behavior.
...
> Has anyone on the forum seen something similar?   Any suggestions on what to 
> look at next?    If it is helpful to describe the server hardware, it's got 2 
> E5-2670 CPUs and 256 GB of RAM, and the database is hosted on 1.6TB RAID 10 
> local storage (15K 300 GB drives).  



I could be way off here, but years ago I experienced something like this (in 
Oracle land), and after some stressful chasing, a marginally failing RAID 
controller turned out to be the cause.  Same kind of event: steady traffic, then 
some I/O would not complete and normal operations would stack up.  Anyway, what 
you report reminded me of that event.  The E5 is a few years old; I wonder if 
the RAID controller firmware needs a patch?  I suppose a marginal power supply 
might cause a similar "hang."  Either way, marginal failures are very painful to 
track down.  Have you checked sar or the OS logs at the time of the event?
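If nobody is watching when it happens, something like the following can help catch the "I/O stops completing" pattern described above.  It is a minimal Python 3 sketch, assuming a Linux host that exposes /proc/diskstats; the helper names and the "sda" device name are hypothetical placeholders, and it is meant to complement sar, not replace it.  It samples each interval's completed-I/O count and in-flight count for one device and flags intervals where requests stayed queued but nothing completed.

```python
import time
from typing import List, Tuple


def parse_diskstats_line(line: str) -> Tuple[str, int, int]:
    """Return (device, reads+writes completed, I/Os in progress) for one
    /proc/diskstats line. Fields: 3=name, 4=reads done, 8=writes done,
    12=I/Os currently in progress (1-indexed)."""
    parts = line.split()
    device = parts[2]
    completed = int(parts[3]) + int(parts[7])
    in_progress = int(parts[11])
    return device, completed, in_progress


def find_stalls(samples: List[Tuple[int, int]]) -> List[int]:
    """Given successive (completed, in_progress) snapshots for one device,
    return the indices of intervals where I/O was in flight at both ends
    but the completed counter did not advance (a possible stall)."""
    stalls = []
    for i in range(1, len(samples)):
        prev_done, prev_in_flight = samples[i - 1]
        done, in_flight = samples[i]
        if prev_in_flight > 0 and in_flight > 0 and done == prev_done:
            stalls.append(i)
    return stalls


def sample_device(device: str, count: int, interval: float) -> List[Tuple[int, int]]:
    """Read /proc/diskstats `count` times, `interval` seconds apart,
    keeping only the line for `device` (hypothetical helper)."""
    samples = []
    for _ in range(count):
        with open("/proc/diskstats") as f:
            for line in f:
                name, done, in_flight = parse_diskstats_line(line)
                if name == device:
                    samples.append((done, in_flight))
                    break
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    # "sda" is a placeholder; substitute the RAID controller's virtual disk.
    snaps = sample_device("sda", count=60, interval=1.0)
    for idx in find_stalls(snaps):
        print(f"possible I/O stall in interval ending at sample {idx}")
```

Run from cron or a loop and logged with timestamps, a run of flagged intervals that lines up with the slowdowns would point toward the storage path (controller, firmware, disks) rather than PostgreSQL itself.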
