Parin Shah wrote:
On 7/15/05, Colm MacCarthaigh <[EMAIL PROTECTED]> wrote:

On Fri, Jul 15, 2005 at 01:23:29AM -0500, Parin Shah wrote:

- we need to maintain a counter for each url in this case, which would
decide the priority of the url. But maintaining this counter should be a
low overhead operation, I believe.

Is a counter strictly speaking the right approach? Why not a time of
last access?

I haven't run a statistical analysis, but based on my logs the likelihood
of a url being accessed is very highly correlated to how recently it has
been accessed before. A truly popular page will always have been
accessed recently, a page that is becoming popular (and therefore very
likely to get future hits) will have been accessed recently and a page
whose popularity is rapidly diminishing will not have been accessed
recently.



Last access time is definitely a better solution than the counter
mechanism. I would like to know other people's opinions too.


You should be using a mix of:

- number of requests
- last access time
- cost of reproducing the request

See memcache_gdsf_algorithm() in mod_mem_cache.c for an implementation of this, which assumes the 'length' of the request is related to the cost of reproducing the request.
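
To make the idea of mixing the three factors concrete, here is a rough sketch of a Greedy-Dual-Size-Frequency style score. It is not the code from mod_mem_cache.c. The function name and parameters (clock, hits, cost, size) are made up for illustration, and "clock" is assumed to be an aging value that only grows over time, so it stands in for recency of access.

```python
def gdsf_priority(clock, hits, cost, size):
    """Illustrative GDSF-style score; higher means more worth keeping or refreshing.

    clock: aging value of the queue at last access (assumed to only grow)
    hits:  number of requests seen for the object
    cost:  estimated cost of reproducing the object (hypothetical unit)
    size:  object length in bytes
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Frequent and expensive objects score higher; large objects score lower.
    return clock + hits * cost / size
```

With equal clock, cost and size, an object with more hits gets a higher score, and with equal hits a more recently touched object (larger clock) wins.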

The priority queue implementation is sitting in mod_mem_cache, and could be used to implement the 'refresh' queue, I would think.
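
As a sketch of what a priority-ordered 'refresh' queue might look like, here is a small example built on Python's heapq rather than the mod_mem_cache priority queue. The class and method names (RefreshQueue, touch, pop) are hypothetical. It assumes the caller computes a priority for each url, and that the highest priority url should be refreshed first.

```python
import heapq
import itertools


class RefreshQueue:
    """Illustrative refresh queue: pop() returns the highest-priority url."""

    def __init__(self):
        self._heap = []                # entries: (-priority, entry_id, url)
        self._live = {}                # url -> id of its newest entry
        self._ids = itertools.count()  # ids order entries and break ties

    def touch(self, url, priority):
        """Record a new priority for url; older entries for it become stale."""
        entry_id = next(self._ids)
        self._live[url] = entry_id
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, entry_id, url))

    def pop(self):
        """Return the url with the highest current priority, or None if empty."""
        while self._heap:
            _, entry_id, url = heapq.heappop(self._heap)
            if self._live.get(url) == entry_id:  # skip stale entries
                del self._live[url]
                return url
        return None
```

Stale entries are left in the heap and skipped on pop, which keeps touch() at O(log n) without having to search the heap on every access.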

Thanks,
Parin.

