Re: Child Died

2009-09-07 Thread Kristian Lyngstol
On Wed, Sep 02, 2009 at 10:12:00AM -0400, maillis...@gmail.com wrote:
> I just started my first instance of varnish in production. Within 12 hours,
> there were alerts from our monitoring system that Varnish was taking 90% of
> the cpu. Right after that, I find these messages in /var/log/messages,
> several times over a 2 minute period:

Did you check syslog for assert errors too?

> varnishd[12461]: Child (20086) not responding to ping, killing it.
>
> The child restarted, and the stats and cache all disappeared.
>
> This is a machine with 8 gigs of ram and a pair of slightly older quad core
> xeons. The storage method is file with a 50 gig limit. At its peak, the
> machine is serving around 40 requests a second, about 5000k a second. The
> configs are the defaults.
>
> What should my first steps be to troubleshoot this? Is there a likely
> culprit?

The first thing I'd do is check syslog for assert errors. If it's being killed in
the same place, something must be wrong (... ).
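To make that check repeatable, here is a minimal Python sketch that counts the "not responding to ping" kills per child PID and collects any lines mentioning assert errors. The ping pattern is copied from the log line quoted above. The "Assert error" pattern is an assumption about how assert failures show up in syslog, so adjust it to match what your varnishd version actually logs. The function names are hypothetical helpers, not part of Varnish.

```python
import re
from collections import Counter

# Ping pattern copied from the log line quoted in this thread.
PING_KILL = re.compile(r"varnishd\[\d+\]: Child \((\d+)\) not responding to ping")
# Assumption: assert failures contain the text "Assert error"; adjust as needed.
ASSERT_ERR = re.compile(r"varnishd\[\d+\]:.*Assert error")


def summarize_varnish_log(lines):
    """Count ping kills per child PID and collect lines that mention assert errors."""
    kills = Counter()
    asserts = []
    for line in lines:
        match = PING_KILL.search(line)
        if match:
            kills[match.group(1)] += 1
        elif ASSERT_ERR.search(line):
            asserts.append(line.rstrip("\n"))
    return kills, asserts


def scan_syslog(path):
    """Run summarize_varnish_log over a syslog file such as /var/log/messages."""
    with open(path, errors="replace") as f:
        return summarize_varnish_log(f)
```

Usage: `kills, asserts = scan_syslog("/var/log/messages")`. If the same assert keeps appearing before each restart, that points at a real problem. If there are only ping kills, cli_timeout is worth a look.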

Secondly, I'd check the value of cli_timeout. Its default has changed over
time, and a very busy varnish can be slow to reply to pings from the
management process and thus get killed needlessly. You can check it with
the telnet interface or «varnishadm -T localhost:yourmanagementport
param.show cli_timeout». The new default is 10s, which should be enough,
though it might still be too low for a child with extremely busy threads.
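The varnishadm call above can be scripted from Python. This is a sketch under stated assumptions. The parser assumes the param.show output has a line starting with the parameter name followed by its value, for example "cli_timeout                10 [seconds]". The management address must be filled in with your own host and port. The helper names are hypothetical.

```python
import subprocess


def parse_param_value(output, name="cli_timeout"):
    """Return the numeric value of `name` from param.show output, or None if absent.

    Assumes the parameter's line starts with its name followed by the value.
    """
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            try:
                return float(fields[1])
            except ValueError:
                return None
    return None


def show_cli_timeout(mgmt):
    """Run `varnishadm -T <mgmt> param.show cli_timeout` and parse the result.

    `mgmt` is your management address, e.g. "localhost:<yourmanagementport>".
    """
    out = subprocess.run(
        ["varnishadm", "-T", mgmt, "param.show", "cli_timeout"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_param_value(out)
```

If the value comes back below 10, raising it is probably worth trying before digging deeper.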

You may also want to supply a varnishstat -1 (after varnish has had a
chance to warm up) and any custom VCL to the list.
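The varnishstat -1 output can also be checked before posting it. This minimal Python sketch assumes each line is whitespace-separated as counter name, value, per-second rate, then a free-text description; lines that don't fit are skipped. The hit ratio from the cache_hit and cache_miss counters is only one illustrative figure to look at. The function names are hypothetical.

```python
def parse_varnishstat_one(output):
    """Parse `varnishstat -1` text into {counter_name: integer_value}.

    Assumes lines look like: name  value  rate  description.
    """
    counters = {}
    for line in output.splitlines():
        fields = line.split(None, 3)
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def hit_ratio(counters):
    """cache_hit / (cache_hit + cache_miss), or None if either counter is missing."""
    hits = counters.get("cache_hit")
    misses = counters.get("cache_miss")
    if hits is None or misses is None or hits + misses == 0:
        return None
    return hits / (hits + misses)
```

Posting the raw output is still more useful to the list than any summary; the parser is just a quick local sanity check.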


-- 
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497


___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Child Died

2009-09-02 Thread maillists0
I just started my first instance of varnish in production. Within 12 hours,
there were alerts from our monitoring system that Varnish was taking 90% of
the cpu. Right after that, I find these messages in /var/log/messages,
several times over a 2 minute period:

varnishd[12461]: Child (20086) not responding to ping, killing it.

The child restarted, and the stats and cache all disappeared.

This is a machine with 8 gigs of ram and a pair of slightly older quad core
xeons. The storage method is file with a 50 gig limit. At its peak, the
machine is serving around 40 requests a second, about 5000k a second. The
configs are the defaults.

What should my first steps be to troubleshoot this? Is there a likely
culprit?