Hi all,

I basically agree with Graham, with just one observation on multi-threaded subrequests. I believe the basic idea of forwarding multiple requests to the back end can be a very good one, but it needs some bounds, as Graham suggests.
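The sketch below shows one kind of bound. It assumes a fixed cap on the number of refresh requests in flight at once. The `refresh_limiter` type and its functions are hypothetical and are not part of mod_cache or APR. The limiter hands out at most `max` permits using C11 atomics, so a refresh that cannot get a permit simply stays queued instead of adding load to the back end.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical limiter (not part of mod_cache or APR) that caps how many
 * refresh requests may be in flight to the back end at once. */
typedef struct {
    atomic_int in_use; /* permits currently held */
    int max;           /* upper bound on concurrent back-end refreshes */
} refresh_limiter;

void limiter_init(refresh_limiter *l, int max)
{
    atomic_init(&l->in_use, 0);
    l->max = max;
}

/* Take a permit without blocking. Returns false when the cap is reached;
 * the caller should then leave the refresh queued and retry later. */
bool limiter_try_acquire(refresh_limiter *l)
{
    int cur = atomic_load(&l->in_use);
    while (cur < l->max) {
        /* On failure (including a spurious one), cur is reloaded with the
         * current value and the loop re-checks the cap. */
        if (atomic_compare_exchange_weak(&l->in_use, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Give the permit back once the back-end response has been handled. */
void limiter_release(refresh_limiter *l)
{
    int prev = atomic_fetch_sub(&l->in_use, 1);
    assert(prev > 0); /* catches a release without a matching acquire */
    (void)prev;       /* avoids an unused-variable warning under NDEBUG */
}
```

A requester thread would call `limiter_try_acquire()` before contacting the back end and `limiter_release()` after it finishes. The cap then bounds back-end load no matter how many requester threads exist.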
In my opinion, you could define a mod_cache_requester connection pool to the back-end server, with a size limit to avoid saturating the back end. Using this approach, you could design mod_cache_requester in this way:

- use a priority queue to keep track of the needed cache refresh requests and of the time each refresh is scheduled for (the scheduled time is the priority key);
- each time a URL is requested for the first time, cache the request data (in addition to the response headers) and add the required entry to the priority queue;
- each mod_cache_requester thread reads one URL from the queue and passes the previously stored request to the back end.

In this way you can build an "optimized" requester. What do you think of it?

Sergio

> From: "Graham Leggett"
>
> Parin Shah said:
>
> > When the page expires from the cache, it is removed from the cache and
> > thus the next request has to wait until that page is reloaded by the
> > back-end server.
>
> This is not strictly true - when a page expires from the cache, a
> conditional request is sent to the backend server, and if a fresher
> version is available it is updated, otherwise the existing cache contents
> are left alone. Place was left in the original cache design for serving
> multiple requests of the same non-fresh URL without fetching the backend
> URL many times, but this has not yet been implemented.
>
> The option to guarantee freshness of the cache is a very useful feature
> though.
>
> > Here is an overview of how I am planning to implement it.
> >
> > 1. When a page is requested and it exists in the cache, mod_cache
> > checks the expiry time of the page.
> >
> > 2. If (expiry time - current time) < Some_Constant_Value,
> > then mod_cache notifies mod_cache_requester about this page.
> > This communication between mod_cache and mod_cache_requester should
> > incur the least possible overhead, as it would affect the current
> > request's response time.
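Sergio's priority queue can be sketched as a binary min-heap keyed on the scheduled refresh time, so the next refresh due is always at the top. This sketch is single-threaded and minimal. The names `refresh_queue`, `rq_push`, `rq_schedule`, and `rq_pop_due` are hypothetical, and so are the fixed capacity and the policy of scheduling a refresh a fixed lead time before expiry. The stored URL stands in for the cached request data. Popping only entries with `refresh_at <= now` performs essentially the step 2 comparison, `(expiry time - current time)` against a constant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define RQ_CAPACITY 64 /* hypothetical fixed capacity for the sketch */
#define RQ_URL_MAX 256

/* A pending refresh: the URL (standing in for the stored request data)
 * and the time at which it should be re-requested. */
typedef struct {
    char url[RQ_URL_MAX];
    time_t refresh_at;
} refresh_entry;

/* Binary min-heap on refresh_at: items[0] is always the earliest refresh. */
typedef struct {
    refresh_entry items[RQ_CAPACITY];
    size_t len;
} refresh_queue;

void rq_init(refresh_queue *q) { q->len = 0; }

static void rq_swap(refresh_entry *a, refresh_entry *b)
{
    refresh_entry tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Insert a refresh; returns 0 on success, -1 if the queue is full. */
int rq_push(refresh_queue *q, const char *url, time_t refresh_at)
{
    assert(q != NULL && url != NULL);
    if (q->len == RQ_CAPACITY)
        return -1;
    size_t i = q->len++;
    strncpy(q->items[i].url, url, RQ_URL_MAX - 1);
    q->items[i].url[RQ_URL_MAX - 1] = '\0';
    q->items[i].refresh_at = refresh_at;
    /* Sift up while the parent is scheduled later than the new entry. */
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (q->items[parent].refresh_at <= q->items[i].refresh_at)
            break;
        rq_swap(&q->items[parent], &q->items[i]);
        i = parent;
    }
    return 0;
}

/* Assumed policy: schedule the refresh lead_secs before the entry expires. */
int rq_schedule(refresh_queue *q, const char *url, time_t expiry, time_t lead_secs)
{
    return rq_push(q, url, expiry - lead_secs);
}

/* If the earliest entry is due (refresh_at <= now), move it into *out
 * and return 1; otherwise leave the queue alone and return 0. */
int rq_pop_due(refresh_queue *q, time_t now, refresh_entry *out)
{
    assert(q != NULL && out != NULL);
    if (q->len == 0 || q->items[0].refresh_at > now)
        return 0;
    *out = q->items[0];
    q->items[0] = q->items[--q->len];
    /* Sift the moved entry down to restore the heap order. */
    size_t i = 0;
    for (;;) {
        size_t left = 2 * i + 1, right = left + 1, min = i;
        if (left < q->len && q->items[left].refresh_at < q->items[min].refresh_at)
            min = left;
        if (right < q->len && q->items[right].refresh_at < q->items[min].refresh_at)
            min = right;
        if (min == i)
            break;
        rq_swap(&q->items[i], &q->items[min]);
        i = min;
    }
    return 1;
}
```

If several requester threads share the queue, `rq_push` and `rq_pop_due` would need to be guarded by a mutex.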
>
> There are two approaches to this:
>
> - Cache freshness of a URL is checked on each hit to the URL. This runs
>   the risk of allowing non-popular (but possibly expensive) URLs to expire
>   without the chance to be refreshed.
>
> - Cache freshness is checked in an independent thread, which monitors the
>   cached URLs for freshness at predetermined intervals, and updates them
>   automatically and independently of the frontend.
>
> Either way, it would be useful for mod_cache_requester to operate
> independently of the cache serving requests, so that "cache freshening"
> doesn't slow down the frontend.
>
> I would vote for the second option - a "cache spider" that keeps it fresh.
>
> > 3. mod_cache_requester will re-request the page which is soon to expire.
> > Each such request is done through a separate thread so that multiple
> > pages can be re-requested simultaneously.
>
> Once mod_cache_requester has decided that a URL needs to be "freshened",
> all it needs to do is make a subrequest to that URL, setting the
> relevant Cache-Control headers to tell it to refresh the cache, and let
> the normal caching mechanism take its course.
>
> Putting the subrequests into separate threads isn't necessarily a good
> idea, as you don't want to put a sudden simultaneous load onto the backend
> server, or take up too much processing power of the frontend itself. You
> also probably want to keep things simple.
>
> > This request would force the server to reload the content of the page
> > into the cache even if it is already there. (This would reset the
> > expiry time of the page, so it would be able to stay in the cache
> > for a longer duration.)
>
> The cache code should already do this.
>
> > Please let me know what you think about this module. Also, I have some
> > questions and your help would be really useful.
> >
> > 1. What would be the best way for communication between mod_cache and
> > mod_cache_requester?
> > I believe that keeping mod_cache_requester in a
> > separate thread would be the best way.
>
> mod_cache_requester will need access to the backend caches so that it can
> query freshness. This is done through hooks made available for mod_cache
> to do the same thing.
>
> Firing off a separate thread/process for mod_cache_requester can be done
> when the server starts up and the module is initialised; however, keep in
> mind some of the limitations of threads and processes:
>
> - If the platform supports threads, then you can monitor the disk cache,
>   the memory cache, and the shared memory cache.
> - If the platform supports processes, then you can monitor the disk cache
>   and shared memory cache only.
>
> > 2. How should mod_cache_requester send the re-request to the main
> > server?
>
> You fire off a subrequest to a URL, and throw away the data that comes back.
>
> For some example code, look at mod_include.
>
> > 3. Other than these questions, any suggestion/correction is welcome.
> > Any pointers to the details of related modules (mod_cache,
> > communication between mod_cache and the backend server) would be helpful
> > too.
>
> Keep in mind that mod_cache is a framework, into which sub-modules are
> plugged to do the work of the backend caching.
>
> mod_cache_requester would probably be a submodule of mod_cache, using
> mod_cache-provided hooks to query elements in the cache.
>
> Regards,
> Graham
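Graham's preferred "cache spider" could work this way: on each pass, pick out a bounded number of near-expiry URLs and refresh only those. This also respects his point about not putting a sudden load on the backend. The sketch below is one possible approach, not mod_cache's actual behavior. `cached_url` and `spider_select` are hypothetical, and the entries stand in for whatever mod_cache's hooks expose. The round-robin cursor is an added assumption. It stops a long run of expiring entries at the front of the list from starving the rest.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical view of one cached URL, as the spider might obtain it
 * through mod_cache's hooks. */
typedef struct {
    const char *url;
    time_t expiry;
} cached_url;

/* One spider pass: starting at *cursor, examine each entry at most once
 * and record in out the indices of up to max_per_pass entries whose
 * remaining lifetime is below lead_secs. *cursor is advanced so that the
 * next pass resumes where this one stopped. Returns the number selected. */
size_t spider_select(const cached_url *entries, size_t n, time_t now,
                     time_t lead_secs, size_t max_per_pass,
                     size_t *cursor, size_t *out)
{
    assert(cursor != NULL && out != NULL);
    if (n == 0)
        return 0;
    size_t count = 0, visited = 0, i = *cursor % n;
    while (visited < n && count < max_per_pass) {
        /* Same test as step 2 of the proposal: expiry - now < lead. */
        if (entries[i].expiry - now < lead_secs)
            out[count++] = i;
        i = (i + 1) % n;
        visited++;
    }
    *cursor = i;
    return count;
}
```

At each interval, the spider thread would call `spider_select()` and issue a subrequest for each returned index, discarding the response body as Graham describes. It would then sleep until the next pass. Keeping selection separate from the requests makes the per-pass cap easy to tune and to test.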