2009/7/14 Kristian Lyngstol krist...@redpill-linpro.com:
On Sat, Jul 11, 2009 at 12:21:38AM +0200, Lazy wrote:
We are having a hard time figuring out what's causing Varnish 503 errors.
Our backend is Apache (the Debian 5 default), the OS is Linux x86_64 2.6.26,
and everything is running on a single machine.
/usr/local/sbin/varnishd -a 0.0.0.0:80 -f /usr/local/etc/varnish/default.vcl \
    -s malloc -T localhost: -w 10,6000,300 -u nobody
6000 threads is too many. Since the limit is per pool (two pools here), it'll
cause up to 12,000 threads to start. That's not likely to go over all that
well. If you have that sort of traffic, you need to scale out. Also, a
10-thread minimum is pretty low.
I typically recommend setting the minimum thread count to match what you
expect your normal traffic to be at peak hours. It's probably a dedicated
machine, and idle threads have barely any overhead, while creating new
threads can take some time.
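As a rough sketch of turning "normal traffic at peak hours" into a minimum thread count: assuming one worker thread per in-flight request, Little's law gives the number of requests in flight as request rate times average time per request, and since the -w limits are per pool, that total is split across thread_pools. The function name and the traffic figures are hypothetical, not measurements from this setup.

```python
import math


def min_threads_per_pool(peak_req_per_s: float, avg_ms_per_req: float,
                         thread_pools: int) -> int:
    """Rough per-pool value for the first (min) field of varnishd -w.

    Little's law: requests in flight = arrival rate * time per request.
    Assumes one worker thread per in-flight request (an approximation).
    The total is divided across pools because -w limits are per pool.
    """
    in_flight = peak_req_per_s * avg_ms_per_req / 1000.0
    return math.ceil(in_flight / thread_pools)


# Hypothetical peak: 2000 req/s at 50 ms each -> 100 in flight -> 50 per pool.
print(min_threads_per_pool(2000, 50, 2))  # 50
```

Rounding up errs on the side of a slightly larger idle pool, which fits the point above that idle threads are cheap while thread creation is not.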
At first I had 3000 threads set and Varnish occasionally dropped
connections, so I doubled it.
So what would be recommended values?
Would -w 1024,1024 -p thread_pools=6 be OK?
The site is usually not so busy, but it sometimes has spikes of static
traffic (about 50 Mbps); that's why I raised the thread limit, as 3000 was
too low.
Is it safe to change thread_pools at runtime?
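For reference, here are the process-wide totals implied by the settings discussed in this thread, assuming (as noted above) that the -w limits apply per pool and that the current setup runs two pools, which is what the 12,000 figure implies. The helper name is illustrative only.

```python
def total_threads(thread_pools: int, per_pool: int) -> int:
    # -w min,max[,timeout] is applied per pool, so the process-wide
    # figure is simply pools * per-pool value.
    return thread_pools * per_pool


settings = [
    ("-w 10,6000 with 2 pools (min)", 2, 10),
    ("-w 10,6000 with 2 pools (max)", 2, 6000),
    ("-w 1024,1024 -p thread_pools=6 (min = max)", 6, 1024),
]
for label, pools, per_pool in settings:
    print(f"{label}: {total_threads(pools, per_pool)} threads")
```

This prints 20, 12000 and 6144 threads respectively. The 20 matches the "20 N worker threads" counter quoted further down, and with min equal to max the proposed setting would presumably keep all 6144 threads running from startup.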
Running with a single backend.
.connect_timeout = 1s; was added to the backend definition.
Any particular reason for adding that?
Originally it wasn't there; I added it trying to work around the issue.
I added
sub vcl_error {
    if (req.restarts < 10) {
        restart;
    }
}
(Is it possible to add a pause before doing the restart?)
No. This is also a dirty workaround for a fundamental problem.
In about 0.1% of requests we get:
10 TxRequest b POST
10 TxURL b /php
10 TxProtocol b HTTP/1.1
10 TxHeader b x-requested-with: XMLHttpRequest
10 TxHeader b Accept-Language: pl
10 TxHeader b Referer: http://www.x/php
10 TxHeader b Accept: text/html, */*
10 TxHeader b Content-Type: application/x-www-form-urlencoded
10 TxHeader b UA-CPU: x86
10 TxHeader b Accept-Encoding: gzip, deflate
10 TxHeader b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)
10 TxHeader b Content-Length: 8
10 TxHeader b Cookie: _.1
10 TxHeader b X-NovINet: v1.2
10 TxHeader b X-Varnish: 603437812
10 TxHeader b X-Forwarded-For: 79.162.xxx
10 BackendClose b default
31 VCL_call c error
31 VCL_return c deliver
31 Length c 465
31 VCL_call c deliver
31 VCL_return c deliver
31 TxProtocol c HTTP/1.1
31 TxStatus c 503
The machine is not overloaded: there are 150 Apache processes running, and
80% of them are idle.
What does "31 VCL_call c error" mean: a connection error, or did Apache
return an invalid response?
No, it just means that vcl_error is called. BackendClose notes that the
connection to the backend was closed.
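One way to spot this pattern in a larger varnishlog capture: in the excerpt above, the backend request is sent (TxRequest/TxHeader) and then BackendClose follows with no RxStatus line, i.e. apparently no response came back from Apache. A minimal sketch that flags such backend transactions, assuming the simplified "<fd> <tag> <b|c> <payload>" layout shown in the excerpt; the function name is hypothetical.

```python
from collections import defaultdict


def backend_closed_without_response(log_lines):
    """Return backend fds (in order) whose request was sent but which were
    closed before any RxStatus line was logged for them.

    Assumes the simplified varnishlog layout from the excerpt:
    "<fd> <tag> <b|c> <payload>". Client ("c") records and wrapped
    continuation lines are skipped.
    """
    sent = defaultdict(bool)        # fd -> TxRequest seen on this connection
    got_status = defaultdict(bool)  # fd -> RxStatus seen since that TxRequest
    flagged = []
    for line in log_lines:
        parts = line.split(None, 3)
        if len(parts) < 3 or not parts[0].isdigit() or parts[2] != "b":
            continue  # not a backend record
        fd, tag = int(parts[0]), parts[1]
        if tag == "TxRequest":
            sent[fd] = True
            got_status[fd] = False
        elif tag == "RxStatus":
            got_status[fd] = True
        elif tag == "BackendClose":
            if sent[fd] and not got_status[fd]:
                flagged.append(fd)
            sent[fd] = False
            got_status[fd] = False
    return flagged


excerpt = """\
10 TxRequest b POST
10 TxURL b /php
10 TxHeader b X-Varnish: 603437812
10 BackendClose b default
31 VCL_call c error
31 TxStatus c 503"""
print(backend_closed_without_response(excerpt.splitlines()))  # [10]
```

A backend connection that received a status line before being closed is not flagged, so the output should list only the suspicious transactions.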
Can I get more information about this error, using syslog in vcl_error or
maybe in some other way?
Possibly, but using syslog in vcl is the last thing I'd recommend.
Does your syslog say anything meaningful? Like assert-errors...
no, only info about admin commands
(...)
60064 Backend connections failures
This counter is old; it's not increasing now.
Did the error-rate go down once you solved this? What was causing these
problems?
It was related to load testing; in production it went away when I
raised MaxClients in Apache.
20 N worker threads
4152 N worker threads created
0 N worker threads not created
0 N worker threads limited
0 N queued work requests
226847 N overflowed work requests
This is what I mean by -w 10,6000 being wrong. After the initial startup,
overflowed work requests shouldn't grow much, and you're currently running
at only 20 threads (the minimum), which will cause overflows very fast
(consider how many connections a single client will use to fetch a front
page... You can easily imagine overflowing with just 3-4 concurrent
clients.)
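To make the "3-4 concurrent clients" arithmetic concrete: overflow starts once simultaneous requests exceed the threads already running. The number of parallel requests per client used below (6) is a hypothetical value, not a measured or browser-specific one, and the function name is illustrative.

```python
def clients_to_overflow(running_threads: int, requests_per_client: int) -> int:
    """Smallest number of simultaneous clients whose parallel requests
    exceed the worker threads already running, so that at least one
    request overflows and has to wait for a new thread."""
    # n clients overflow when n * requests_per_client > running_threads.
    return running_threads // requests_per_client + 1


# 20 threads running (two pools at the -w minimum of 10).
print(clients_to_overflow(20, 6))  # 4  (3 clients = 18 requests still fit)
```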
But that's not really causing any 503s. Just delays while threads are
created (and removed).
tcpdump of another 503 (Apache is running on port 88):
11:09:50.187842 IP x.x.x.x.50780 > x.x.x.x.88: S 88526893:88526893(0) win 32792 <mss 16396,sackOK,timestamp 532825309 0,nop,wscale 7>
11:09:50.187851 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0) ack 88526894 win 32768 <mss 16396,sackOK,timestamp 532825309 532825309,nop,wscale 7>
11:09:50.187867 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257 <nop,nop,timestamp 532825309 532825309>
11:09:53.187730 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0) ack 88526894 win 32768 <mss 16396,sackOK,timestamp 532826059 532825309,nop,wscale 7>
11:09:53.187740 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257 <nop,nop,timestamp 532826059 532826059,nop,nop,sack 1 {0:1}>
11:09:59.191730 IP x.x.x.x.88