Re: Issues with mod_disk_cache and htcacheclean

Neil Gunton Mon, 05 Jan 2009 14:17:58 -0800

Ruediger Pluem wrote:

What information do your cookies contain? Are these session cookies that
are individual to each client? In this case the usage of mod_disk_cache
with Vary Cookies set would be bad. As these responses would be individual
you couldn't reuse the results anyway for other clients, so it would be
the best to leave caching to the individual client caches (e.g. browser caches).
If your cookies are like BACKGROUND=blue for some users and BACKGROUND=red
for other users you should think of incorporating these differences into
the URLs instead of into varying responses.

I use two cookies currently - one for user logins and one for options.
They are independent - people browsing the site may have either, or
both, or neither set.

I need to cache all dynamically generated content so that the server can
cope with slashdottings and links from other popular sites where lots of
people all click on the same link at the same time ("click storms").
Such links could go to any page on the site, and so I really need to
cache almost everything from mod_perl - with the exception of areas of
the site which are obviously user-specific, such as edit forms, users'
personal pages and so on. Those are no-cache.

I am very careful about setting expiration times, since with it being a
dynamic site and all, you don't want too many stale pages. So many of
the indexes (e.g. list of latest journal updates) have an expiration of
only 1-3 minutes, while other journal pages have expiration of 12 hours
or more.

I keep a 'version' field as part of the database records for most
content on the site, which is incremented whenever an object is edited.
Then when someone edits a journal, I include a special 'v=xxx' parameter
in subsequent links to pages on that journal, to differentiate it from
earlier versions. So the links from the (fast expiring) index pages such
as forums or journals index will quickly have the new link with the new
version. This allows me to have extensively cached content while still
having people see new edits quickly. Thus the cache is fairly high turnover.
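
The versioned-link idea above can be sketched roughly as follows. This is
only an illustration in Python, not the site's mod_perl code (which is not
shown in this thread); the function name and the example URL are
hypothetical, and only the 'v' parameter name comes from the description.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_link(url, version):
    """Return url with a v=<version> query parameter added or replaced.

    A new version yields a new URL, so a URL-keyed cache treats the
    edited page as a different object and the old copy simply expires.
    """
    parts = urlsplit(url)
    # Keep every existing parameter except a previous 'v'.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "v"]
    query.append(("v", str(version)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical journal page whose database record is at version 3.
    print(versioned_link("/journal/page?id=12", 3))
```

With this shape, only the short-lived index pages need to be regenerated
for readers to pick up the new link.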

The mod_disk_cache works very well, the only issue being keeping the
cache size under control without making iowait become noticeable as a
result. I have been finding that keeping the limit down to 100M rather
than 1000M, and making CacheDirLevels 2 rather than 3, and clearing out
the orphaned .header files, and running htcacheclean and my header
pruning script every 10 minutes, seems to make the server very
comfortable - the iowait goes away to unnoticeable levels.

All the app level code here was developed by me. This is a community
website for bicycle touring journals - www.crazyguyonabike.com. It
currently sees somewhere north of 100,000 page requests per day,
according to analog (and that's not including googlebot, which is on
there constantly). I am very interested in configuring the site to be
able to run efficiently on one reasonably well-spec'd server. Caching
dynamic content is a major part of being able to scale well to cope with
click storms.

Regarding the performance you should take a look at the following:

1. Use a separate filesystem for the cache.
2. Ensure that it is mounted with noatime option.
3. Check if you are using the right type of filesystem for this job. If the
   size of the individual cache files is rather small reiserfs can be much
   faster than ext3 if I remember correctly.

I currently use ext2 with noatime for the main filesystem (including
cache). I went to ext2 from ext3 because ext3 has extra overhead related
to keeping the journal (I believe that is the big difference between the
two these days). Though I do not have numbers, I do seem to have seen
disk performance increase since going back to ext2. I'm not sure if you
can install dir_index with ext2 without turning it into ext3 in the
process, but in any case I don't have dir_index enabled currently.

I was aware of the potential for using other filesystems for the cache,
and had thought about reiserfs as a possibility. However after I wrote
to the httpd users list a few weeks back asking about this very issue, I
got zero responses. I then went to the squid group and asked there too,
and similarly got zero useful responses. I agree that reiserfs might
handle many small files better, but I am wary of using that since the
trial of Hans Reiser - it kind of calls the future of his tool into
question, unfortunately.

2. Why does htcacheclean not keep the cache at the stated size limit? If
you say -l100M and then do a du and it says 200M, then that is
counter-intuitive, and actually wrong in real terms. It gets worse with
the larger caches - when I had 3 levels and cookie Vary headers on, the
limit for htcacheclean was 1000M, but the cache would grow to 3GB and up.


Again, this is an issue with the documentation. In fact htcacheclean does
not limit the size of the cache at all. It can grow indefinitely.
It only ensures that the size of the cache is being reduced back at least
to the given limit after it ran. The size of the cache is defined as the
sum of all filesizes in the cache. It does not consider the disk usage of
these files which can be larger and it also doesn't take the sizes of the
directories into account. I am not sure if a du like measurement of the
cache size would be implementable in a platform independent way, but I
may be wrong here.
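
The gap between the two measurements described above can be illustrated
with a small sketch. It is not htcacheclean's code; the function name is
made up, and it assumes a Unix-like system where st_blocks is reported in
512-byte units. The first number is the plain sum of file sizes, the
second is a rough du-style figure that also counts allocated blocks and
the directories themselves.

```python
import os


def cache_sizes(root):
    """Return (sum_of_file_sizes, approx_disk_usage) in bytes for root."""
    apparent = 0
    on_disk = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # du counts the space taken by directories too.
        on_disk += os.lstat(dirpath).st_blocks * 512
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size          # logical file size
            on_disk += st.st_blocks * 512   # blocks actually allocated
    return apparent, on_disk


if __name__ == "__main__":
    # Path taken from the htcacheclean command quoted in this thread.
    apparent, on_disk = cache_sizes("/var/cache/www")
    print("sum of file sizes:", apparent)
    print("approx. disk usage:", on_disk)
```

With many small files and several directory levels the second figure can
be much larger than the first, which would be consistent with du
reporting more than the -l limit.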


Ok, that's fine. You're right, it sounds like a documentation issue.

This seems to be a bug. Can you please try if the following patch fixes this?

I applied the patch and rebuilt httpd_proxy successfully. The new
htcacheclean runs ok, but still seems to leave behind the orphan .header
files. At least, I tried running htcacheclean in single run mode, thus:


htcacheclean -t -p/var/cache/www -l100M

Then I run my prune_cache_headers perl script, and it seems to still
find a bunch of orphaned .header files to delete. So it doesn't appear
to have fixed the issue. I did confirm that the patch was applied.
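
For readers who want to check their own cache for such leftovers, here is
a minimal sketch. It is not the prune_cache_headers script mentioned
above (which is not shown in this thread), and it is written in Python
rather than Perl. It assumes an on-disk layout in which each entry is
either NAME.header plus NAME.data, or NAME.header plus a NAME.header.vary
directory; that layout is an assumption and should be checked against the
actual cache. It only lists candidates and deletes nothing, since a
header that is still being written may briefly look orphaned.

```python
import os


def find_orphan_headers(cache_root):
    """List .header files with neither a .data file nor a .vary directory.

    Assumed layout (verify against your own cache before acting on it):
      NAME.header + NAME.data        -> ordinary cached response
      NAME.header + NAME.header.vary -> index for a response with Vary
    """
    suffix = ".header"
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            header = os.path.join(dirpath, name)
            data = header[:-len(suffix)] + ".data"
            vary_dir = header + ".vary"
            if not os.path.exists(data) and not os.path.isdir(vary_dir):
                orphans.append(header)
    return sorted(orphans)


if __name__ == "__main__":
    # Path taken from the htcacheclean command above; report only.
    for path in find_orphan_headers("/var/cache/www"):
        print(path)
```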

4. Will I be causing any potential problems for Apache by my deleting
the leftover .header files myself (ones which have no corresponding
.vary subdir)? Does that cause apache or htcacheclean to have potential
issues if you do this while they are running? If they are junk then I
can't see it being a problem, but it's unclear currently if they are
actually used or not.


IMHO not. The patch above does the same.


Great, thanks - good to know.

Thanks for your help!

Neil
