Hi all,
I basically agree with Graham, with one observation about multi-threaded
subrequests.
I believe forwarding multiple requests to the back end in parallel can be a
very good idea, but it needs some bounds, as Graham suggests.

In my opinion you could give mod_cache_requester a connection pool to the back
end server, limited in size in order to avoid saturating the back end.
Using this approach you could design mod_cache_requester as follows:
- use a priority queue to keep track of pending cache refresh requests and
their scheduled refresh times (the scheduled time is the sort key for the
priority)
- each time a URL is requested for the first time, cache the request data (in
addition to the response headers) and add an entry with the required data to
the priority queue
- each mod_cache_requester thread reads one URL from the queue and passes the
previously stored request to the back end
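
To make the three points above concrete, here is a minimal sketch in Python
(not httpd C, just to show the shape of the idea). All names in it
(RefreshJob, RefreshQueue, run_requester, fetch) are hypothetical and are not
part of mod_cache. It assumes the stored request can be represented as a dict
and that fetch() is whatever actually contacts the back end.

```python
import heapq
import threading
from dataclasses import dataclass, field


@dataclass(order=True)
class RefreshJob:
    # Only `due` takes part in ordering: it is the priority (sort key).
    due: float
    url: str = field(compare=False)
    request: dict = field(compare=False)  # request data stored on first hit


class RefreshQueue:
    """Thread-safe priority queue of refresh jobs, earliest due time first."""

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()

    def schedule(self, url, request, due):
        with self._lock:
            heapq.heappush(self._heap, RefreshJob(due, url, request))

    def pop_due(self, now):
        """Return the earliest job if it is due at `now`, otherwise None."""
        with self._lock:
            if self._heap and self._heap[0].due <= now:
                return heapq.heappop(self._heap)
            return None


def run_requester(queue, fetch, pool_size, now):
    """Refresh every job due at `now` with at most `pool_size` workers.

    The number of worker threads is the bound on simultaneous back end
    requests (the hypothetical "connection pool" limit).
    """
    def worker():
        while True:
            job = queue.pop_due(now)
            if job is None:
                return
            fetch(job.url, job.request)

    threads = [threading.Thread(target=worker) for _ in range(pool_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The point of the design is that the pool size, not the number of expiring
URLs, decides how many requests hit the back end at the same time.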

In this way you get an "optimized" requester.
What do you think?

           Sergio

> From: "Graham Leggett"
>
> Parin Shah said:
> 
> > When the page expires from the cache, it is removed from cache and
> > thus next request has to wait until that page is reloaded by the
> > back-end server.
> 
> This is not strictly true - when a page expires from the cache, a
> conditional request is sent to the backend server, and if a fresher
> version is available it is updated, otherwise the existing cache contents
> are left alone. Room was left in the original cache design for serving
> multiple requests for the same non-fresh URL without fetching the backend
> URL many times, but this has not yet been implemented.
> 
> The option to guarantee freshness of the cache is a very useful feature
> though.
> 
> > Here is an overview of how I am planning to implement it.
> >
> > 1. when a page is requested and it exists in the cache, mod_cache
> > checks the expiry time of the page.
> >
> > 2. If (expiry time - current time) < Some_Constant_Value,
> > then mod_cache notifies mod_cache_requester about this page.
> > This communication between mod_cache and mod_cache_requester should
> > incur as little overhead as possible, as it affects the current
> > request's response time.
> 
> There are two approaches to this:
> 
> - Cache freshness of a URL is checked on each hit to the URL. This runs
> the risk of allowing non-popular (but possibly expensive) URLs to expire
> without the chance to be refreshed.
> 
> - Cache freshness is checked in an independent thread, which monitors the
> cached URLs for freshness at predetermined intervals, and updates them
> automatically and independently of the frontend.
> 
> Either way, it would be useful for mod_cache_requester to operate
> independently of the cache serving requests, so that "cache freshening"
> doesn't slow down the frontend.
> 
> I would vote for the second option - a "cache spider" that keeps it fresh.
> 
> > 3. mod_cache_requester will re-request the page which is soon-to-expire.
> > Each such request is done in a separate thread so that multiple
> > pages could be re-requested simultaneously.
> 
> Once mod_cache_requester has decided that a URL needs to be "freshened",
> all it needs to do is to make a subrequest to that URL setting the
> relevant Cache-Control headers to tell it to refresh the cache, and let
> the normal caching mechanism take its course.
> 
> Putting the subrequests into separate threads isn't necessarily a good
> idea, as you don't want to put a sudden simultaneous load onto the backend
> server, or take up too much processing power of the frontend itself. You
> also probably want to keep things simple.
> 
> > This request would force the server to reload the content of the page
> > into the cache even if it is already there. (this would reset the
> > expiry time of the page and thus it would be able to stay in the cache
> > for longer duration.)
> 
> The cache code should already do this.
> 
> > Please let me know what you think about this module. Also I have some
> > questions, and your help would be really useful.
> >
> > 1. What would be the best way to communicate between mod_cache and
> > mod_cache_requester? I believe that keeping mod_cache_requester in a
> > separate thread would be the best way.
> 
> mod_cache_requester will need access to the backend caches so that it can
> query freshness. This is done through hooks made available for mod_cache
> to do the same thing.
> 
> Firing off a separate thread/process for mod_cache_requester can be done
> when the server starts up and the module is initialised; however, keep in
> mind some of the limitations of threads and processes:
> 
> - If the platform supports threads, then you can monitor the disk cache,
> the memory cache, and the shared memory cache.
> - If the platform supports processes, then you can monitor the disk cache
> and shared memory cache only.
> 
> > 2. How should mod_cache_requester send the re-request to the main
> > server?
> 
> You fire off a subrequest to a URL, and throw away the data that comes back.
> 
> For some example code, look at mod_include.
> 
> > 3. Other than these questions, any suggestions or corrections are welcome.
> > Any pointers to the details of related modules (mod_cache,
> > communication between mod_cache and the backend server) would be helpful
> > too.
> 
> Keep in mind that mod_cache is a framework, into which sub-modules are
> plugged to do the work of the backend caching.
> 
> mod_cache_requester would probably be a submodule of mod_cache, using
> mod_cache-provided hooks to query elements in the cache.
> 
> Regards,
> Graham
> --
> 
