On Sun, Aug 07, 2005 at 09:59:15PM +0200, [EMAIL PROTECTED] wrote: > As you already mentioned the remove_url implementation > of mod_disk_cache is currently an empty dummy :-).
I've been thinking about that, but it's not entirely as easy as it first seems, or indeed as htcacheclean wrongly assumes. unlinking the data/header files is not good enough, directories are also resources, and consume space in the inode table. Also, most filesystems slow down as the ammount of hard-links in any directory increases. The hash function used by mod_cache is 128-bits in size, which given any conceivable settings for CacheDirLevels and CacheDirLength is still going to be more than enough to exhaust the inode table on any rational filesystem. Considering that one of the main uses of mod_cache is for mod_proxy users, where there is an infinity of keys (urls) this means that the filesystem will eventually become unusable unless the adminstator periodically weighs in an removes empty directories manually. Because of Windows and non-standard filesystems like AFS (read the GNU find(1) manpage section on -noleaf) assumptions about the directory link counts can't be made, which means a full-scale readdir() and directory traversal to remove the cache entry. This can be pretty a prettly slow operation. Possibly slow enough to merit implementing it in an APR_HOOK_REALLY_LAST hook, so that it can avoid slowing down the request serving. An alternative which I think was discussed here is to create a cgid-like process which deals with this kind of task, aswell as more complex things like managing just-to-expire files. But that kind of steps on the toes of the cache_requester SoC work. Either way, implementing an immediate unlink() on the files, just to be able to return useful things about the writability of the filesystem, but leave the more complex directory cleanup until the file has been served is what I'm thinking of. For what it's worth though, htcacheclean itself has this massive bug, and does not do any directory cleanup, so your patch isn't alone in doing this. > I had a similar problem with 404 responses, and wrote a patch for this which > is > currently in discussion (attached patch again to this mail): > > http://mail-archives.apache.org/mod_mbox/httpd-dev/200507.mbox/[EMAIL > PROTECTED] > > It actually does implement a removal of the files in mod_disk_cache and > should also handle your problem. If it does not, I am pretty sure that a > small modification > to the patch would do it. > > I would really appreciate if you find some time to review my patch. I'm not a committer, so my review is only informational. But I'm familiar enough with the cache mode, I've been running and patching it for my own purposes for years, so I've taken a look. I'm not really sure what it is aiming to do in relation to 404's. 404's are never saved to the cache in the first place, the check in mod_cache.c sorts that out; if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE && r->status != HTTP_MULTIPLE_CHOICES && r->status != HTTP_MOVED_PERMANENTLY && r->status != HTTP_NOT_MODIFIED) { /* RFC2616 13.4 we are allowed to cache 200, 203, 206, 300, 301 or 410 * We don't cache 206, because we don't (yet) cache partial responses. * We include 304 Not Modified here too as this is the origin server * telling us to serve the cached copy. */ reason = apr_psprintf(p, "Response status %d", r->status); } Are 404's being served incorrectly in some circumstances? In any case; Your remove_url code in mod_disk_cache takes the approach of just unlinking the data and header file I discussed above. But well, since so does htcacheclean, that's no reflection on the quality of the patch. I don't think the cache_removal_url filter is neccessary, the cache_save_filter already has a lot of code in place for handling the case of a stale cache handle where I think it is better placed. The (much smaller) patch I've just submitted, plus your changes to mod_disk_cache, mod_mem_cache and cache_storage.c would do the same job with a third of the code :) -- Colm MacCárthaigh Public Key: [EMAIL PROTECTED]
