Re: Varnish 2.0.6 nuking all my objects?
On Feb 25, 2010, at 2:56 PM, Barry Abrahamson wrote:

> In my case, Varnish took a cache of 1 million objects, purged 920k of them.
> When there were 80k objects left the child restarted, thus dumping the
> remaining 80k :)

Happened again - here is the backtrace info:

Child (7222) died signal=6
Child (7222) Panic message:
Assert error in STV_alloc(), stevedore.c line 71:
  Condition((st) != NULL) not true.
thread = (cache-worker)
Backtrace:
  0x41d655: pan_ic+85
  0x433815: STV_alloc+a5
  0x416ca4: Fetch+684
  0x41131f: cnt_fetch+cf
  0x4125a5: CNT_Session+3a5
  0x41f616: wrk_do_cnt_sess+86
  0x41eb90: wrk_thread+1b0
  0x7f79f61e0fc7: _end+7f79f5b7a147
  0x7f79f5abb59d: _end+7f79f545471d
sp = 0x7f542e45a008 {
  fd = 9, id = 9, xid = 116896,
  client = 10.2.255.5:22276,
  step = STP_FETCH,
  handling = discard,
  restarts = 0, esis = 0
  ws = 0x7f542e45a080 {
    id = "sess",
    {s,f,r,e} = {0x7f542e45a820,+347,(nil),+16384},
  },

The request information shows that it was apparently fetching a 1GB file from
the backend and trying to insert it into the cache.

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com

_______________________________________________
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: Varnish 2.0.6 nuking all my objects?
On Feb 25, 2010, at 12:47 PM, David Birdsong wrote:

> On Thu, Feb 25, 2010 at 8:41 AM, Barry Abrahamson wrote:
>>
>> On Feb 25, 2010, at 2:26 AM, David Birdsong wrote:
>>
>>> I have seen this happen.
>>>
>>> I have a similar hardware setup, though I changed the multi-ssd raid
>>> into 3 separate cache file arguments.
>>
>> Did you try RAID and switch to the separate cache files because
>> performance was better?
>
> seemingly so.
>
> for some reason enabling block_dump showed that kswapd was always
> writing to those devices despite there not being any swap space on
> them.
>
> i searched around fruitlessly to try to understand the overhead of
> software raid to explain this, but once i discovered varnish could
> take multiple cache files, i saw no reason for the software raid
> and just abandoned it.

Interesting - I will try it out! Thanks for the info.

>>> We had roughly 240GB storage space total; after about 2-3 weeks,
>>> sm_bfree reached ~20GB. lru_nuked started incrementing, sm_bfree
>>> climbed to ~60GB, but lru_nuking never stopped.
>>
>> How did you fix it?
>
> i haven't yet.
>
> i'm changing up how i cache content, such that lru_nuking can be
> better tolerated.

In my case, Varnish took a cache of 1 million objects, purged 920k of them.
When there were 80k objects left the child restarted, thus dumping the
remaining 80k :)

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
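For anyone following along: the "3 separate cache file arguments" setup David describes just means passing varnishd several -s file options, one per physical disk, instead of a single file on a software RAID 0 array. A rough sketch (paths, sizes, and ports here are hypothetical, not from the thread):

    # Hypothetical example: one -s file storage argument per physical SSD.
    # Varnish treats each as a separate stevedore.
    varnishd -a 0.0.0.0:80 -f /etc/varnish/default.vcl \
        -s file,/ssd1/varnish_cache.bin,80G \
        -s file,/ssd2/varnish_cache.bin,80G \
        -s file,/ssd3/varnish_cache.bin,80G

Note phk's comment elsewhere in this thread about a bug where Varnish would nuke from one stevedore but allocate from another - worth keeping in mind if you run multiple storage files on 2.0.x.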
Re: Varnish 2.0.6 nuking all my objects?
On Feb 25, 2010, at 3:54 AM, Poul-Henning Kamp wrote:

> In message , David Birdsong writes:
>
>> We had roughly 240GB storage space total; after about 2-3 weeks,
>> sm_bfree reached ~20GB. lru_nuked started incrementing, sm_bfree
>> climbed to ~60GB, but lru_nuking never stopped.
>
> We had a bug where we would nuke from one stevedore, but try to allocate
> from another. Not sure if the fix made it into any of the 2.0 releases;
> it will be in 2.1.

Thanks for the info - are the fixes in -trunk now?

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
Re: Varnish 2.0.6 nuking all my objects?
On Feb 25, 2010, at 2:26 AM, David Birdsong wrote:

> I have seen this happen.
>
> I have a similar hardware setup, though I changed the multi-ssd raid
> into 3 separate cache file arguments.

Did you try RAID and switch to the separate cache files because performance
was better?

> We had roughly 240GB storage space total; after about 2-3 weeks,
> sm_bfree reached ~20GB. lru_nuked started incrementing, sm_bfree
> climbed to ~60GB, but lru_nuking never stopped.

How did you fix it?

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
Varnish 2.0.6 nuking all my objects?
                           4519.58 SHM records
shm_writes        50380066  373.27 SHM writes
shm_flushes           9387    0.07 SHM flushes due to overflow
shm_cont             47763    0.35 SHM MTX contention
shm_cycles             226    0.00 SHM cycles through buffer
sm_nreq            4449213   32.96 allocator requests
sm_nobj             341160       . outstanding allocations
sm_balloc      11373072384       . bytes allocated
sm_bfree      116602589184       . bytes free
sma_nreq                 0    0.00 SMA allocator requests
sma_nobj                 0       . SMA outstanding allocations
sma_nbytes               0       . SMA outstanding bytes
sma_balloc               0       . SMA bytes allocated
sma_bfree                0       . SMA bytes free
sms_nreq             63997    0.47 SMS allocator requests
sms_nobj                 0       . SMS outstanding allocations
sms_nbytes 18446744073709548694  . SMS outstanding bytes
sms_balloc        31161028       . SMS bytes allocated
sms_bfree         31162489       . SMS bytes freed
backend_req        1821961   13.50 Backend requests made
n_vcl                    1    0.00 N vcl total
n_vcl_avail              1    0.00 N vcl available
n_vcl_discard            0    0.00 N vcl discarded
n_purge                  1       . N total active purges
n_purge_add              1    0.00 N new purges added
n_purge_retire           0    0.00 N old purges deleted
n_purge_obj_test         0    0.00 N objects tested
n_purge_re_test          0    0.00 N regexps tested against
n_purge_dups             0    0.00 N duplicate purges removed
hcb_nolock               0    0.00 HCB Lookups without lock
hcb_lock                 0    0.00 HCB Lookups with lock
hcb_insert               0    0.00 HCB Inserts
esi_parse                0    0.00 Objects ESI parsed (unlock)
esi_errors               0    0.00 ESI parse errors (unlock)

Obviously, this isn't good for my cache hit rate :) It is also using a lot of
CPU. Has anyone seen this happen before?

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
Re: is 2.0.2 not as efficient as 1.1.2 was?
On Mar 11, 2009, at 2:28 AM, Alex Lines wrote:

> Barry, Demitrious - did you ever find a solution here?

Nope. For now, we are just using more hardware to hide the problem :(

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
Re: is 2.0.2 not as efficient as 1.1.2 was?
On Feb 4, 2009, at 1:53 PM, Poul-Henning Kamp wrote:

> In message <37eadde4-a23a-4204-b04a-46d47d348...@automattic.com>,
> Barry Abrahamson writes:
>
>>> This week we upgraded to 2.0.2 and are using varnish's back end &
>>> director configuration for the same work. What we are seeing is that
>>> 2.0.2 holds about 60% of the objects in the same amount of cache space
>>> as 1.1.2 did (we tried tcmalloc, jemalloc, and mmap.)
>
> Your description does not make it obvious to me what is causing this,
> but one candidate could be the stored hash-string, in particular if
> your URLs are long.
>
> The new purge code (likely included in 2.0.3, but already available
> in -trunk) dispenses with the need to store the hash-string, so this
> theory could be tested.

Upgraded to trunk, didn't help.

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
Re: is 2.0.2 not as efficient as 1.1.2 was?
On Nov 25, 2008, at 5:37 PM, Demitrious Kelly wrote:

> Hello,
>
> We run Gravatar.com and use varnish to cache avatar responses. There
> are a ton of very small objects and lots of requests per second. Last
> week we were using 1.1.2 compiled against tcmalloc (-t 600 -w 1,4000,5
> -h classic,59 -p thread_pools 10 -p listen_depth 4096 -s malloc,16G).
> This used an nginx load balancer on a separate host as its back end,
> which distributed varnish's requests to our pool of webs. All was well.
>
> This week we upgraded to 2.0.2 and are using varnish's back end &
> director configuration for the same work. What we are seeing is that
> 2.0.2 holds about 60% of the objects in the same amount of cache space
> as 1.1.2 did (we tried tcmalloc, jemalloc, and mmap.) This caused us
> quite a few problems after the upgrade as varnish would start spiking
> the load on the boxes into the hundreds. We attempted tuning the
> lru_interval (up) and obj_workspace (down) but we couldn't get varnish
> to hold the same data that it used to on the same machines.
>
> Right now we've reduced the time that we keep cached objects
> drastically, bringing our cache hit rate down to 92% from 96%, which
> roughly doubled the requests (and load) on the web servers. It is,
> however, stable at this point. Obviously the idea of not keeping up
> with the latest versions of varnish is not what we want to do, however
> effectively doubling the requirements for scaling the service is just
> as unappealing.
>
> So, what we're asking is... how do we get varnish 2 to be as efficient
> as varnish 1 was? We're glad to try things... It takes a while to fill
> up the cache to the point that it can cause problems, so testing and
> reporting back will take some time, but we'd like this fixed and will
> put in some work. We're currently running the following cli options:
>
> -a 0.0.0.0:80 -f ... -P ... -T 10.1.94.43:6969 -t 600 -w 1,4000,5
> -h classic,59 -p thread_pools 10 -p listen_depth 4096 -s malloc,16G
>
> And our VCL looks like this (with most of the webs taken out for
> brevity since they're repeated verbatim with only numbers changed):
>
> backend web11 {
>     .host = "xxx";
>     .port = "8088";
>     .probe = {
>         .url = "xxx";
>         .timeout = 50 ms;
>         .interval = 5s;
>         .window = 2;
>         .threshold = 1;
>     }
> }
> backend web12 {
>     .host = "xxx";
>     .port = "8088";
>     .probe = {
>         .url = "xxx";
>         .timeout = 50 ms;
>         .interval = 5s;
>         .window = 2;
>         .threshold = 1;
>     }
> }
>
> director default random {
>     .retries = 3;
>     { .backend = web11; .weight = 1; }
>     { .backend = web12; .weight = 1; }
> }
>
> sub vcl_recv {
>     set req.backend = default;
>     set req.grace = 30s;
>     if ( req.url ~ "^/(avatar|userimage)" && req.http.cookie ) {
>         lookup;
>     }
> }
>
> sub vcl_fetch {
>     if (obj.ttl < 600s) {
>         set obj.ttl = 600s;
>     }
>     if (obj.status == 404) {
>         set obj.ttl = 30s;
>     }
>     if (obj.status == 500 || obj.status == 503) {
>         pass;
>     }
>     set obj.grace = 30s;
>     deliver;
> }
>
> sub vcl_deliver {
>     remove resp.http.Expires;
>     remove resp.http.Cache-Control;
>     set resp.http.Cache-Control = "public, max-age=600, proxy-revalidate";
>     deliver;
> }

Bump :)

Is anyone else seeing the same thing? I think it may be a result of the fact
that a lot of the cached responses are just headers (302 redirects) and don't
have any actual content. That is the only thing I can think of why we would
be seeing this issue and others wouldn't. I suspect most people using varnish
don't have stats that look like this:

10094887744  960644.65  847668.80 Total header bytes
22230934332 2174908.58 1866733.93 Total body bytes

I don't really want to revert to 1.1.2 because I like the general stability
and features of 2.x, but I don't have any real ideas on how to troubleshoot
why this would be happening. Any ideas would be appreciated.

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
Re: Random director in varnish trunk doesn't work?
On Jul 10, 2008, at 2:10 PM, Poul-Henning Kamp wrote:

> In message <[EMAIL PROTECTED]>, Barry Abrahamson writes:
>
>> Is anyone successfully using the random director in varnish trunk
>> (r2917)?
>
> Try #2919, I have fixed an off-by-one bug I introduced recently.

Worked. Thanks.

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
Random director in varnish trunk doesn't work?
Is anyone successfully using the random director in varnish trunk (r2917)?
I have something like this in my config:

backend web1 {
    .host = "10.0.1.1";
    .port = "8080";
}
backend web2 {
    .host = "10.0.1.2";
    .port = "8080";
}

director default random {
    { .backend = web1; .weight = 1; }
    { .backend = web2; .weight = 1; }
}

sub vcl_recv {
    set req.backend = default;
    ...

Varnish won't serve any requests -- it looks like the child just dies.
strace shows:

read(11, "Assert error in vdi_random_choos"..., 8191) = 102
write(2, "Child (30484) said Assert error "..., 84) = 84

Using a normal backend (not the random director) works fine.

Details: Debian Etch amd64, kernel 2.6.18-6. I am not sure if it is an
unhandled config/syntax error in my VCL or something else.

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
Re: Strategy for large cache sets
On Jul 1, 2008, at 2:16 PM, Skye Poier Nott wrote:

> I want to deploy Varnish with very large cache sizes (200GB or more)
> for large, long lived file sets.

We are doing this also and running into performance problems. We have tried
both file and swap storage, running on Linux (Debian). When the cache starts
to get large, we see huge load spikes (loads of 200+) caused by IO wait, but
no corresponding spikes in request rates or any of the varnish metrics
(except threads running, which I am pretty sure is a result of the load
spike and not the cause). We are currently running 1.1.2 but have tried
trunk and the same thing happens.

If you find anything useful in your testing, I would love to hear about it.

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
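In case it helps anyone reproducing this: the way we attribute the spikes to IO wait rather than to varnish itself is with standard Linux tools (nothing varnish-specific, commands below are the usual sysstat/procfs incantations):

    # Per-device utilization, queue depth, and await times, 5s intervals:
    iostat -x 5
    # Log every block-level write to dmesg to identify the writing process
    # (the kswapd-writing-to-cache-disks observation elsewhere in this
    # archive was found this way):
    echo 1 > /proc/sys/vm/block_dump

The block_dump toggle is noisy and needs root; remember to echo 0 back when done.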
Re: Conditional caching question
On Jun 2, 2008, at 8:41 AM, Poul-Henning Kamp wrote:

> In message <[EMAIL PROTECTED]>, David Pratt writes:
>
>> Hi. In most cases, I want a request to be passed to a backend where it
>> will be handled by the server. If frequency is high, however, I want to
>> add the object to the varnish cache and have varnish handle it. I am not
>> worried about a mechanism for keeping track of the frequency of
>> requests. The question is what is available to me to add an object/path
>> to the varnish cache if it was originally passed?
>
> I wouldn't say that your way of using varnish is backwards relative
> to the design objectives, but you do come close, since we assumed
> caching by default, and pass as exception, rather than the other
> way around.

We do this on WordPress.com to avoid filling our caches with infrequently
requested data. The way we handle it is: when an object reaches a certain
req/sec threshold, we send a header from the backend, and varnish is
configured to only insert objects into the cache which contain this custom
header. Based on phk's reply, I guess we are using varnish in a somewhat
backwards manner as well, since we assume pass as the default and insert as
the exception.

This used to work in 1.0.3. I have started to look into upgrading to trunk,
and it doesn't seem to work so well anymore. It looks like the first time a
URL is requested, if it is passed because it hasn't reached the threshold
and the header hasn't been set, all subsequent requests are automatically
"pass"ed. These show up as "Cache hits for pass" in varnishstat. Any way
around this?

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
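The header-gated insertion described above looks roughly like this in VCL; the header name X-Cache-This is hypothetical (not our exact config), and note that 1.x VCL spelled the caching action "insert" where 2.x uses "deliver":

    sub vcl_fetch {
        # Hypothetical header, set by the backend only once a URL
        # crosses the req/sec threshold:
        if (obj.http.X-Cache-This) {
            deliver;    /* cache and serve ("insert" in 1.x VCL) */
        }
        pass;           /* everything else stays uncached */
    }

This is just a sketch of the pattern; it does not address the hit-for-pass behavior in trunk described above.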
Re: Multiple varnish instances per server?
On Jun 1, 2008, at 1:38 PM, Michael S. Fischer wrote:

> Why are you using Varnish to serve primarily images? Modern
> webservers serve static files very efficiently off the filesystem.

Because we have about 6TB of content and are using Varnish as the "hot"
cache and S3 as the "cold" store.

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
Multiple varnish instances per server?
Hi,

Is anyone running multiple varnish instances per server (one per disk or
similar)? We are currently running a single varnish instance per server
using the file backend. Machines are Dual Opteron 2218, 4GB RAM, and 2 250GB
SATA drives. We have the cache file on a software RAID 0 array. Our cache
size is set to 300GB, but once we get to 100GB or so, IO starts to get very
spiky, causing loads to spike into the 100 range. Our expires are rather
long (1-2 weeks). My initial thought was that this was caused by cache file
fragmentation, but we are seeing similar issues when using the malloc
backend.

We were thinking that running 2 instances per server with smaller cache
files (one per physical disk) may improve our IO problems. Is there any
performance benefit/detriment to running multiple varnish instances per
server? Is there a performance hit for having a large cache? Request rates
aren't that high (50-150/sec), but the cached files are all images, some of
which can be rather big (3MB).

Also, is anyone else seeing similar issues under similar workloads?

-- 
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com
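For the record, the two-instance layout I'm describing would look something like this - instance names, ports, paths, and sizes are all hypothetical:

    # One varnishd per physical disk, each with its own instance name (-n),
    # listen port, admin port, and cache file. A load balancer or DNS
    # round-robin would spread traffic across the two listen ports.
    varnishd -n disk1 -a 0.0.0.0:8080 -T 127.0.0.1:6082 \
        -f /etc/varnish/default.vcl -s file,/disk1/varnish_cache.bin,140G
    varnishd -n disk2 -a 0.0.0.0:8081 -T 127.0.0.1:6083 \
        -f /etc/varnish/default.vcl -s file,/disk2/varnish_cache.bin,140G

The obvious trade-off is that the two caches are independent, so a hot object can end up stored twice unless the traffic split is URL-hashed.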