Re: Possible new cache architecture
Brian Akins wrote: Some functions a provider should provide: init(args...) - initialize an instance :) open(instance, key) - open a cache object read_buffer(object, buffer, copy) - read an entire object into a buffer. The buffer may be read-only (i.e., it may be mmapped or part of an SQL statement), or make it a copy. read_bb(object, brigade, copy) - read an object into a brigade; copy if the flag is set. store_bb(object, brigade) - store a bucket brigade. store_buffer(object, buffer) - store a blob of data. close(object) Thoughts? I'm sure we may need more/better cache provider functions. It would be helpful if a provider could notify mod_cache (using some sort of callback function) when it is removing an object from its cache, so that mod_cache can take a look at the object being removed and decide to push it to the next-less-resource-critical provider. So if mem_cache_provider decides to remove the LRU object, mod_cache can push it to disk_cache_provider.
How to make a fake request? (Was: Re: It's that time of the year again)
Plüm, Rüdiger wrote: I have been spending some time removing the libcurl dependency by creating fake connections and requests. I didn't know we already have such functionality in proxy. Can you tell me where that code to create fake connections/requests is? I can use that code instead of libcurl for mod_cache_requestor. Having a permanent solution in the form of an apr-http-client would be even better, but we can use fake connections/requests until we have a permanent solution. - Please have a look at ap_proxy_connection_create / ap_proxy_make_fake_req in proxy_util.c Well, they are not much help, as ap_proxy_connection_create needs an actual connection, open socket, etc. Similarly, ap_proxy_make_fake_req needs the connection and an actual request to copy some data from. Now in my case, I have just a URI string and no actual request. So I (unsuccessfully) tried to fake the connection and request by creating them locally, but that doesn't seem to work because we don't have per_dir_config and filters available locally. This leads to segfaults when I run the locally created request as a subrequest. I could find some workaround by initializing all the failing data structures of the connection and request, but that seems neither clean nor maintainable. So libcurl seems the best option unless somebody has some other idea I can give a try. Thanks, Parin.
Re: It's that time of the year again
- mod_cache_requestor (which I don't think really took off) and 2 active committers. I still haven't given up on it. :-) I am trying to remove the libcurl dependency by creating a mocked-up connection and request. Hopefully, it will take off one day :-)
Re: It's that time of the year again
An example I'd like to do (or mentor someone on) is a mod_memcached that could serve as the basis of memcached-based modules. It could handle all the configuration details, errors, and general heavy lifting of memcached. It would then be very easy to write other modules that hook into memcached (e.g., a mod_cache provider). I have liked the idea of mod_memcached. I can work on it with you (if we have an SoC student for this project, I can work with him as well).
conn_rec mock up?
Hi, As of now, we cannot make requests without having an actual connection (conn_rec) to the server. For example, mod-cache-requester needs to re-request popular, soon-to-expire pages so that these pages are reloaded in the cache. Right now, it has to rely on libcurl to re-request a page, which is not very elegant. Other than mod-cache-requester, there are many other interesting things that could be done if we could create requests internally without requiring a conn_rec. I looked over the existing code, and it seems the best (and easiest) way to implement this is by mocking up a conn_rec. So basically we implement one more version of the core_create_conn() function, which would initialize the request URL, sockets, and various pools such that this conn_rec would work with the existing request_rec. We can pass this mocked-up conn_rec to make_sub_request and thus would be able to make requests internally. Any thoughts? I believe apr_http_client would solve this problem, but I am not sure about its status. If anybody is working on it and doesn't mind if I jump into the development cycle, let me know. I would love to get my hands dirty on this problem. -Parin.
mod-cache-requester source code
Hi, My svn account is created now, and I have committed the mod-cache-requester files. http://svn.apache.org/repos/asf/httpd/mod_cache_requester/ Along with these files, I had to make some changes in mod_cache.c, mod_cache.h, and config.m4 as well. I am attaching patches for all three files with this mail. - In mod_cache.h I have added the declaration of the cache_requester data structure:

+#define CACHE_REQUESTER_PROVIDER_GROUP "cache_requester"
+typedef struct {
+    int (*notify_cache_requester) (request_rec *r, cache_handle_t *handle);
+} cache_req_provider;

- In mod_cache.c I have added code for calling mod_cache_requester if the page is 'soon-to-expire':

+cache_req_provider *cache_requester;
+char *is_from_cache_requester;
+cache_handle_t *handle;
+
+cache_requester = (cache_req_provider *) ap_lookup_provider(CACHE_REQUESTER_PROVIDER_GROUP, "cache_req", "0");
+if (cache_requester != NULL) {
+    if (cache_requester->notify_cache_requester(r, cache->handle) == DECLINED) {
+        ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
+                     "Adding CACHE_SAVE filter.");
+        ap_add_output_filter_handle(cache_save_filter_handle, NULL, r,
+                                    r->connection);
+        return DECLINED;
+    }
+}

- config.m4 of mod_cache required modification to add mod_cache_requester. Please have a look at the code available in the repository and the patches attached with this mail, and let me know your thoughts. All your suggestions/corrections are welcome! Currently, mod-cache-requester has a dependency on libcurl. Basically, mod-cache-requester has to make a request for a page to refresh the cache, and it seems there is no straightforward way to do so if you don't have a connection available. That's why we had to use libcurl instead. So, as a next step, I am planning to investigate how to get rid of libcurl from the code. Any pointers in that direction would be really helpful. Thanks, Parin.
--- old_mod_cache.h 2006-02-18 19:09:07.0 -0600
+++ mod_cache.h 2006-02-23 22:41:00.0 -0600
@@ -229,6 +229,16 @@
     cache_provider_list *next;
 };
 
+
+/* Provider structure. Methods of this structure are used by mod_cache's quick handler to insert soon-to-expire pages into the queue. */
+
+#define CACHE_REQUESTER_PROVIDER_GROUP "cache_requester"
+
+typedef struct {
+    int (*notify_cache_requester) (request_rec *r, cache_handle_t *handle);
+} cache_req_provider;
+
+
 /* per request cache information */
 typedef struct {
     cache_provider_list *providers; /* possible cache providers */

--- old_mod_cache.c 2006-02-18 19:09:07.0 -0600
+++ mod_cache.c 2006-02-23 22:37:09.0 -0600
@@ -57,6 +57,10 @@
     cache_server_conf *conf;
     apr_bucket_brigade *out;
 
+    cache_req_provider *cache_requester;
+    char *is_from_cache_requester;
+    cache_handle_t *handle;
+
     /* Delay initialization until we know we are handling a GET */
     if (r->method_number != M_GET) {
         return DECLINED;
@@ -169,6 +173,18 @@
         return DECLINED;
     }
 
+    cache_requester = (cache_req_provider *) ap_lookup_provider(CACHE_REQUESTER_PROVIDER_GROUP, "cache_req", "0");
+    if (cache_requester != NULL) {
+        if (cache_requester->notify_cache_requester(r, cache->handle) == DECLINED) {
+            ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
+                         "Adding CACHE_SAVE filter.");
+            ap_add_output_filter_handle(cache_save_filter_handle, NULL, r,
+                                        r->connection);
+            return DECLINED;
+        }
+    }
+
+
     /* if we are a lookup, we are exiting soon one way or another; Restore
      * the headers.
      */
     if (lookup) {

--- old_config.m4 2006-02-23 22:44:30.0 -0600
+++ config.m4 2005-08-24 00:12:27.0 -0500
@@ -6,6 +6,7 @@
 APACHE_MODULE(file_cache, File cache, , , no)
 
+
 dnl # list of object files for mod_cache
 cache_objs="dnl
 mod_cache.lo dnl
@@ -19,8 +20,18 @@
 cache_pqueue.lo dnl
 cache_hash.lo dnl
 "
+
+cache_requester_objs="dnl
+mod_cache_requester.lo dnl
+"
+
 APACHE_MODULE(cache, dynamic file caching, $cache_objs, , no)
 APACHE_MODULE(disk_cache, disk caching module, , , no)
 APACHE_MODULE(mem_cache, memory caching module, $mem_cache_objs, , no)
+APACHE_MODULE(cache_requester, cache requester module, $cache_requester_objs, , no)
+
+if test "$enable_cache_requester" != "no"; then
+    APR_ADDTO(LDFLAGS, [`curl-config --libs`])
+fi
 
 APACHE_MODPATH_FINISH
Re: mod_cache performance
We tried to solve the thundering herd problem with the cache-requester module, which I have not committed yet. It is currently available on SourceForge. I have not found enough time to work on it after Summer of Code was over, as I was busy with my thesis and an internship. I have just relocated to California recently, so I should be actively working on cache-requester soon (in a couple of weeks, probably) once things settle down for me. Meanwhile, feel free to use mod-cache-requester if you find it useful. Let me know if I could be of any help. -Parin. On 1/12/06, Graham Leggett [EMAIL PROTECTED] wrote: Brian Akins wrote: A short list I have (mostly mod_disk_cache): - read_table and read_array seem slower than they should be - thundering herd when a popular object expires Solving thundering herd was one of the original design goals of the new cache that was never fully followed through. With the work getting the cache to handle transfer failures gracefully, the next logical step is to solve thundering herd. - writing out request headers when not varying I have submitted patches for some of these in the past, but I will try to quantify the performance implications of these. Submit them to bugzilla to make sure they do not drop on the floor, then nag this list. I will be off the air till the end of the month to have a holiday, but after that I definitely plan to get my hands dirty. Regards, Graham --
my SoC Experience @ httpd
Hi All, I would like to share some of my experiences (in brief :-) ) with the Summer of Code program. I worked on mod-cache-requester as part of this program, and I had a great time while working on this module; I learned a lot, got a chance to interact with great guys, and worked on one of the most popular web servers! - It has been a pleasure working with Ian. He is a great mentor and helped me a lot during every stage of the project, including the CLA paperwork! - And the dev@ list people helped me a lot throughout this project. People were really interested in all the problems I was facing and helped me get them out of the way. Thank you all for your help. - So, what next? I believe I will continue working on mod-cache-requester for a while. There is a lot of great stuff we could add to mod-cache-requester; some of it includes integrating subrequests and removing the dependency on libcurl, an even better page-popularity estimation algorithm, etc. - You can download mod-cache-requester from http://sourceforge.net/projects/cache-requester. I will commit it into the repository as soon as I get access. Thank you once again, Parin.
Re: mod_cache wishlist
On 8/24/05, Colm MacCarthaigh [EMAIL PROTECTED] wrote: On Wed, Aug 24, 2005 at 09:18:54AM -0500, Parin Shah wrote: I have fixed that memory leak problem. I also added a script to include libcurl whenever this module is included. I hope that it doesn't mean that libcurl is going to be a permanent solution, when subrequests (with minor changes) could serve the same purpose. Certainly not. We would eventually have mod-c-requester use subrequests and not libcurl, but my initial reaction after going through the subrequest code was that it may require significant refactoring. Will it work if you mark the subrequests as proxy requests? I.e., the same approach as mod_rewrite and the P flag. We had considered proxy requests before, but many of us felt it's not a good idea to use proxy requests for re-requests. The main reason was that proxy requests create a new connection to the main server. I have not checked the mod_rewrite code yet, so I will go through it, and if it seems to solve our problem then that would be great. Thanks for your input. It introduces a dependency on mod_proxy, but curl introduces a dependency on libcurl, so that's not so bad. You are right. I believe that shouldn't be a problem.
Re: mod_cache wishlist
Content definitely should not be served from the cache after it has expired, IMO. However, I think an approach like:

if ((now + interval) > expired) {
    if (!stat(tmpfile)) {
        update_cache_from_backend();
    }
}

i.e., revalidate the cache content N seconds before it is due to expire, would have the same effect but avoid serving stale content. To a large extent mod_cache_requester (which from inspection seems to be much further along than I thought) will solve this problem :) I am already working on it. I have also posted an initial version of this module. http://utdallas.edu/~parinshah/mod-c-requester.0.2.tar.gz -Parin.
Re: mod_cache wishlist
Cool. Very good start. Leaks memory like a sieve, but a good start. Ohh, I thought I was taking care of that. I mean, the code frees the memory when it is no longer needed, except during server shutdown. Anyway, I will go through the code again to check that. Also feel free to point out the code that is causing the memory leak. It would be cool if we could find a way to use Apache's subrequest stuff rather than curl. One less dependency, you know. I've also had issues with libcurl and SSL randomly coring. I would also prefer using subrequests instead of libcurl, but when I reviewed the make_sub_request code, I couldn't find a way to use it, as mod-c-requester doesn't have a connection and request available. I believe we might need some refactoring to make subrequests work in such scenarios. So for now I am using libcurl and will continue investigating other ways. Thanks, Parin.
Re: mod_cache wishlist
Ohh, I thought I was taking care of that. I mean, the code frees the memory when it is no longer needed, except during server shutdown. Anyway, I will go through the code again to check that. Also feel free to point out the code that is causing the memory leak. I'll look through it as well. The big thing I noticed was in regard to curl: for every call to curl_easy_init() you need a call to curl_easy_cleanup(). Quite right, I will go ahead and fix this. Also, you must call curl_slist_free_all() to free the list. If you want to use libcurl, you may want to use a reslist of curl handles. curl can do all the keepalive stuff, and you would avoid the overhead of constantly creating and deleting curls. Just call curl_easy_reset before giving it back to the reslist. Good point. This would improve performance for sure. Thanks for the suggestion. -Parin.
Re: mod_cache wishlist
Ohh, I thought I was taking care of that. I mean, the code frees the memory when it is no longer needed, except during server shutdown. Anyway, I will go through the code again to check that. Also feel free to point out the code that is causing the memory leak. I'll look through it as well. The big thing I noticed was in regard to curl: for every call to curl_easy_init() you need a call to curl_easy_cleanup(). Also, you must call curl_slist_free_all() to free the list. Quite right, I will go ahead and fix this. If you want to use libcurl, you may want to use a reslist of curl handles. curl can do all the keepalive stuff, and you would avoid the overhead of constantly creating and deleting curls. Just call curl_easy_reset before giving it back to the reslist. Good point. This would improve performance for sure. Thanks for the suggestion. -Parin.
Re: mod_cache wishlist
Hi, I have fixed that memory leak problem. I also added a script to include libcurl whenever this module is included. http://utdallas.edu/~parinshah/mod-c-requester.0.3.tar.gz Thanks, Parin. On 8/23/05, Parin Shah [EMAIL PROTECTED] wrote: Ohh, I thought I was taking care of that. I mean, the code frees the memory when it is no longer needed, except during server shutdown. Anyway, I will go through the code again to check that. Also feel free to point out the code that is causing the memory leak. I'll look through it as well. The big thing I noticed was in regard to curl: for every call to curl_easy_init() you need a call to curl_easy_cleanup(). Also, you must call curl_slist_free_all() to free the list. Quite right, I will go ahead and fix this. If you want to use libcurl, you may want to use a reslist of curl handles. curl can do all the keepalive stuff, and you would avoid the overhead of constantly creating and deleting curls. Just call curl_easy_reset before giving it back to the reslist. Good point. This would improve performance for sure. Thanks for the suggestion. -Parin.
Re: Initial mod-cache-requester
Hi All, I have added one directive for a secret code. This code is used to authenticate the requests created by mod-c-requester. Further details are in the readme file. The code is available at the same URL: http://utdallas.edu/~parinshah/mod-c-requester.0.2.tar.gz Thanks, Parin. On 8/19/05, Parin Shah [EMAIL PROTECTED] wrote: Hi All, please find the initial version of mod-cache-requester at the following URL: http://utdallas.edu/~parinshah/mod-c-requester.0.2.tar.gz As we have discussed before, it is not possible (or at least not possible without some refactoring) to use make_sub_request to re-request all soon-to-expire pages. So currently I am using libcurl to re-request all popular and soon-to-expire pages. We had considered one other solution as well, where we re-request during the original request context itself, but that has other problems, including keeping track of the popularity of pages, removing pages from the queue that are no longer in the cache, etc. So as of now, making a curl request seems the best solution to me. Further details about how to compile and configure this module are in the readme file. Any comments would be really helpful. Thanks, Parin.
Initial mod-cache-requester
Hi All, please find the initial version of mod-cache-requester at the following URL. http://utdallas.edu/~parinshah/mod-c-requester.0.2.tar.gz As we have discussed before, it is not possible (or at least not possible without some refactoring) to use make_sub_request to re-request all soon-to-expire pages. So currently I am using libcurl to re-request all popular and soon-to-expire pages. We had considered one other solution as well, where we re-request during the original request context itself, but that has other problems, including keeping track of the popularity of pages, removing pages from the queue that are no longer in the cache, etc. So as of now, making a curl request seems the best solution to me. Further details about how to compile and configure this module are in the readme file. Any comments would be really helpful. Thanks, Parin.
Re: how to make sub-requests?
What I meant was that you modify ap_read_request() to not crash when NULL is passed to it. As far as I am aware, the request_rec only needs certain fields copied out of the conn_rec, not all of which are required. - I played with it to make it work without a conn_rec, but it doesn't sound very clean to me. The main reason is that the request_rec itself needs a conn_rec, and also other conn_rec variables like lookup defaults. So even if we could make it work (and I am not 100% sure that it would), it wouldn't be an elegant solution. I wouldn't use proxy, as proxy has nothing to do with cache (you may build proxy without cache if you like, or cache without proxy). What I meant was that I might find similar functionality in mod_proxy that does the same thing. I am not planning to reuse it, but would use it as a reference to create requests for mod-c-requester. Rather, tack yourself onto mod_cache, so that after a request is complete, it can fire off a pass of the cache-freshening code while a conn_rec is still available to create fake requests from. This is a very good alternative solution. I will proceed in this direction if I cannot find any other way. -Parin.
Re: how to make sub-requests?
Thanks for your replies. There is a way to create a request_rec using the function ap_read_request(conn_rec *). Now, a conn_rec could be created using the ap_run_create_connection function, which takes a pool, socket, bucket allocator, and some other arguments. I wasn't sure how to get the socket information, as we cannot use the connection of any current request_rec for that. It would be really helpful if you could suggest some pointers for this. Thanks, Parin. On 8/8/05, William A. Rowe, Jr. [EMAIL PROTECTED] wrote: At 05:15 AM 8/8/2005, Graham Leggett wrote: Parin Shah said: we can store the original request_rec which was used when the page was served from cache, and then use it as a parameter to the above method. Is this approach fine? This isn't very clean; a request_rec is just a structure, and they should be relatively simple to make. Look for the code within the core that brings the request_rec into existence for the first time; you should be able to see how it gets created. Actually, I'd suggest you use the core to create that request_rec, rather than chasing upgrades for all the things you need to add by hand when the structure changes. The request_rec can and does grow on mmn minor bumps, so that's an extra reason to avoid ever allocating one yourself. You will have very minimal ABI compatibility if you allocate your own, and should probably emit a warning on startup if you see the mmn minor has moved. Bill
how to make sub-requests?
Hi All, I am currently working on mod-cache-requester. This module stores the URIs of all pages that are served from the cache, and it re-requests all popular pages that are soon to expire so that such pages are not removed from the cache. To implement that, mod-cache-requester should be able to make a subrequest for the pages that are about to expire. This could be achieved in the following ways, but each has its own benefits/limitations. - Using the make_sub_request and ap_sub_req_method_uri methods. These functions create a new request_rec for a subrequest, and they take the current request_rec as one of their arguments, but mod-cache-requester may not have a current request available. We could store the original request_rec which was used when the page was served from cache, and then use it as a parameter to the above methods. Is this approach fine? - A second approach would be to create a separate socket connection and make the request on that connection. Still, I personally believe this is not a very elegant solution. - Other than these methods, please let me know of other possible methods you are aware of. Waiting for some help, Thanks. Parin.
Re: mod_cache: Help
Hi again. Rici helped me out on this issue. Thanks, Parin. On 7/24/05, Parin Shah [EMAIL PROTECTED] wrote: Hi All, I am currently working on my first module, mod-cache-requester. I am planning to make it a sub-module of mod-cache. I have written a small piece of code which I want to integrate with mod-cache the way mod-mem-cache is integrated, i.e. it should have a separate .so file in the modules directory, and we should also have an --enable-mod-cache-requester option in configure. I can compile and build a new module which is totally independent, but in this case I am adding some structures in mod_cache.h, and that's why I am not too sure how to go about it. I would really appreciate your help on this issue. Thanks, Parin.
Re: mod-cache-requestor plan
Thanks Ian, Graham, and Sergio for your help. For the past couple of days I have been trying to figure out how our mod-cache-requester should spawn a thread (or set of threads). Currently, I am considering the following approach; please let me know what you think about it. - mod-cache-requester would be a sub-module of mod-cache, as Graham once suggested. - It would look similar to mod-mem-cache. It would have a provider (mod-cache-requester-provider, for lack of a better word for now) registered. - mod-cache (cache_url_handler, to be precise) will look up this provider and use the provider's methods to push any page which is soon to expire into the priority queue. - In the post_config of mod-cache-requester, our pqueue would be initialized along with mutexes and other stuff. - Now, we would create a new thread (or set of threads) in post_config which would basically contain an infinite loop. It (or they) will keep checking the pqueue and make subrequests accordingly. Does this make sense? If this approach is correct, then I have some questions regarding thread vs. process implementation. I will start discussing that once we have the main architecture in place. Thanks, Parin. On 7/20/05, Graham Leggett [EMAIL PROTECTED] wrote: Parin Shah wrote: 2. how mod-cache-requester can generate the sub request just to reload the content in the cache. Look inside mod_include - it uses subrequests to be able to embed pages within other pages. Regards, Graham --
Re: mod-cache-requestor plan
This would definitely relieve mod-cache from checking the status of a page every time. But then we would not be able to keep track of the popularity of the pages. But yes, this is a good observation. If we could come up with a mechanism where we could keep track of the popularity of pages (number of requests and last access time) without mod-cache's interference, then that would be a better approach. -Parin. On 7/22/05, Sergio Leonardi [EMAIL PROTECTED] wrote: The basic approach is OK for me; I just have one note. I think that mod_cache should put each cached page in the queue at the time its entry in the cache is created (or when its expire time has been changed), setting the proper regeneration time in the queue (e.g. regeneration time = page expire time - time spent for last page generation). In such a way there's no need to look up what's expiring; just sleep until something needs to be regenerated. Bye Sergio -----Original Message----- From: Parin Shah [mailto:[EMAIL PROTECTED]] Sent: Friday, July 22, 2005 8:02 To: dev@httpd.apache.org Subject: Re: mod-cache-requestor plan Thanks Ian, Graham, and Sergio for your help. For the past couple of days I have been trying to figure out how our mod-cache-requester should spawn a thread (or set of threads). Currently, I am considering the following approach; please let me know what you think about it. - mod-cache-requester would be a sub-module of mod-cache, as Graham once suggested. - It would look similar to mod-mem-cache. It would have a provider (mod-cache-requester-provider, for lack of a better word for now) registered. - mod-cache (cache_url_handler, to be precise) will look up this provider and use the provider's methods to push any page which is soon to expire into the priority queue. - In the post_config of mod-cache-requester, our pqueue would be initialized along with mutexes and other stuff. - Now, we would create a new thread (or set of threads) in post_config which would basically contain an infinite loop.
It (or they) will keep checking the pqueue and make subrequests accordingly. Does this make sense? If this approach is correct, then I have some questions regarding thread vs. process implementation. I will start discussing that once we have the main architecture in place. Thanks, Parin. On 7/20/05, Graham Leggett [EMAIL PROTECTED] wrote: Parin Shah wrote: 2. how mod-cache-requester can generate the sub request just to reload the content in the cache. Look inside mod_include - it uses subrequests to be able to embed pages within other pages. Regards, Graham --
Re: mod-cache-requestor plan
Hi All, We are now almost at a consensus about this new mod-cache-requester module's mechanism, and now I believe it's a good time to start implementing the module. But before I can do that, I need some help from you guys. - I am now comfortable with mod-cache, mod-mem-cache, cache_storage.c, cache_util.c, etc. - But I am still not too sure how to implement a couple of things. 1. How to start the new thread/process for mod-cache-requester when the server starts; any similar piece of code would help me a lot. 2. How mod-cache-requester can generate the subrequest just to reload the content in the cache. 3. In the current scheme, whenever mod-cache-requester pulls the first entry from the pqueue (the 'refresh' queue), it re-requests it to reload. Now, by the time this re-request is made, the page might actually have expired and been removed from the cache; in such a case, should mod-cache reload it or wait for the next legitimate request? Your thoughts on any/all of these issues would be really helpful. Thanks, Parin. On 7/19/05, Ian Holsman [EMAIL PROTECTED] wrote: Parin Shah wrote: you should be using a mix of: # requests, last access time, cost of reproducing the request. Just to double-check: we would insert an entry into the 'refresh' queue only if the page is requested and the page is soon to expire. Once it is in the queue, we would use the above parameters to calculate the priority. Is this correct? Or let me know if I have mistaken it. Yep, that's the idea: refresh the most popular pages first. See memcache_gdsf_algorithm() in mod_mem_cache.c for an implementation of this, which assumes the 'length' of a request is related to the cost of reproducing it. The priority queue implementation is sitting in mod_mem_cache, and could be used to implement the 'refresh' queue, I would think. I feel comfortable with the mod-cache and mod-mem-cache code now, but we also need to start a new thread/process for mod-cache-requester when the server starts. I am not too sure how we could implement it.
Any pointers to a similar piece of code would be really helpful to me. I don't have any code that does this to share with you (others might know of some). Thanks, Parin. --Ian
Re: mod-cache-requestor plan
you should be using a mix of: # requests, last access time, cost of reproducing the request. Just to double-check: we would insert an entry into the 'refresh' queue only if the page is requested and the page is soon to expire. Once it is in the queue, we would use the above parameters to calculate the priority. Is this correct? Or let me know if I have mistaken it. See memcache_gdsf_algorithm() in mod_mem_cache.c for an implementation of this, which assumes the 'length' of a request is related to the cost of reproducing it. The priority queue implementation is sitting in mod_mem_cache, and could be used to implement the 'refresh' queue, I would think. I feel comfortable with the mod-cache and mod-mem-cache code now, but we also need to start a new thread/process for mod-cache-requester when the server starts. I am not too sure how we could implement it. Any pointers to a similar piece of code would be really helpful to me. Thanks, Parin.
Re: mod-cache-requestor plan
On 7/16/05, Graham Leggett [EMAIL PROTECTED] wrote: Parin Shah wrote: - I would prefer the approach where we maintain a priority queue to keep track of popularity. But again, you guys have more insight and understanding, so whichever approach you decide on, I am ready to work on it! ;-) Beware of scope creep - we can always start with something simple, like a straight list of URLs, and then add the priority later, depending on how easy or difficult it is to do. - Good point. We could start with something simple, as you said. And adding the priority queue should not be difficult once we have the basic mechanism ready. Thanks, Parin.
Re: mod-cache-requestor plan
Thanks all for your thoughts on this issue. The priority re-fetch would make sure the popular pages are always in cache, while others are allowed to expire. So every request for an object would update a counter for that URL? - We need to maintain a counter per URL in this case, which would decide the priority of the URL. But maintaining this counter should be a low-overhead operation, I believe. Both approaches have disadvantages. I guess you just have to choose your poison :) - I would prefer the approach where we maintain a priority queue to keep track of popularity. But again, you guys have more insight and understanding, so whichever approach you decide on, I am ready to work on it! ;-) Thanks, Parin.
Re: mod-cache-requestor plan
On 7/15/05, Colm MacCarthaigh [EMAIL PROTECTED] wrote: On Fri, Jul 15, 2005 at 01:23:29AM -0500, Parin Shah wrote: - we need to maintain a counter per URL in this case, which would decide the priority of the URL. But maintaining this counter should be a low-overhead operation, I believe. Is a counter, strictly speaking, the right approach? Why not a time of last access? I haven't run a statistical analysis, but based on my logs the likelihood of a URL being accessed is very highly correlated with how recently it has been accessed before. A truly popular page will always have been accessed recently, a page that is becoming popular (and therefore very likely to get future hits) will have been accessed recently, and a page whose popularity is rapidly diminishing will not have been accessed recently. Last access time is definitely a better solution when compared to the counter mechanism. Would like to know other people's opinions too. Thanks, Parin.
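The last-access-time idea above can be sketched as follows: instead of a hit counter, keep the time of the most recent request per URL and refresh the most recently accessed entries first. The types and names here are made up for illustration, not existing mod_cache structures:

```c
#include <time.h>

/* One record per cached URL; updating it on a hit is a single store,
 * so the per-request overhead stays low. */
typedef struct {
    const char *url;
    time_t last_access;
} url_stat;

/* record a hit on this URL */
static void url_stat_touch(url_stat *s, time_t now)
{
    s->last_access = now;
}

/* nonzero if a should be refreshed in preference to b */
static int url_stat_hotter(const url_stat *a, const url_stat *b)
{
    return a->last_access > b->last_access;
}
```

A counter-based variant would replace `last_access` with a hit count, at the price of never forgetting a page that was popular long ago.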
Re: mod-cache-requestor plan
We have been down this road. The way one might solve it is to allow mod_cache to reload an object while serving the old one. Example: cache /A for 600 seconds; after 500 seconds, request /A with a special header (or from a special client, etc.) and the cache does not serve from cache, but rather pretends the cache has expired and does the normal refresh stuff. The cache will continue to serve /A even though it is refreshing it. As Graham suggested, such a mechanism will not refresh pages that are non-popular but expensive to load, which could incur a lot of overhead. But other than that, this looks like a really good solution. Also, one of the flaws of mod_disk_cache (at least the version I am looking at) is that it deletes objects before reloading them. It is better for many reasons to only replace them. That's the best way to accomplish what I described above. If we implement it the way you suggested, then this problem would automatically be solved. -Parin.
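The "pretend expired" decision above reduces to a small predicate: a normal request inside the TTL is served from cache, but a refresher request carrying a special marker (a hypothetical X-Cache-Refresh header, say) is treated as a miss, so the backend regenerates the entry while other clients keep getting the cached copy. A minimal sketch, with all names assumed:

```c
#include <time.h>

/* Decide whether to treat a cached entry as expired.  is_refresher is
 * nonzero for a request tagged by the refreshing client; everyone else
 * only sees a miss once the real expiry time has passed. */
static int treat_as_expired(int is_refresher, time_t expires, time_t now)
{
    return is_refresher || now >= expires;
}
```

This only works if the cache replaces objects on reload rather than deleting them first, as the mail points out about mod_disk_cache: otherwise concurrent clients hit the window where the object is gone.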
Re: mod-cache-requestor plan
- Cache freshness of a URL is checked on each hit to the URL. This runs the risk of allowing non-popular (but possibly expensive) URLs to expire without the chance to be refreshed. - Cache freshness is checked in an independent thread, which monitors the cached URLs for freshness at predetermined intervals, and updates them automatically and independently of the frontend. Either way, it would be useful for mod_cache_requester to operate independently of the cache serving requests, so that cache freshening doesn't slow down the frontend. I would vote for the second option - a cache spider that keeps it fresh. - In this case, what would be the criteria to determine which pages should be refreshed and which should be left out? Initially I thought that all the pages that are about to expire and have been requested should be refreshed. But if we consider keeping non-popular but expensive pages in the cache, how would mod-cache-requester make the decision? Once mod_cache_requester has decided that a URL needs to be freshened, all it needs to do is to make a subrequest to that URL, setting the relevant Cache-Control headers to tell it to refresh the cache, and let the normal caching mechanism take its course. - hmm, this seems to be the most elegant solution. mod_cache_requester would probably be a submodule of mod_cache, using mod_cache provided hooks to query elements in the cache. - Considering that mod-cache-requester would be using some of mod-cache's hooks to query the elements in the cache, would mod-cache-requester still be highly dependent on the platform (processes vs. threads)? Thanks a lot for all this valuable information, Graham. Parin.
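The subrequest approach Graham describes might look roughly like this inside httpd, using the standard subrequest API. This is a hedged sketch, not a working module: it assumes a live request_rec to borrow configuration and filters from (which, as the fake-request discussion elsewhere in this thread shows, is exactly the hard part for a standalone requester thread), and error handling is omitted:

```c
#include "httpd.h"
#include "http_request.h"
#include "apr_tables.h"

/* Freshen one cached URL: issue a subrequest with Cache-Control:
 * no-cache so the caching layer refetches from the backend and
 * re-stores the entry, letting the normal mechanism take its course. */
static void freshen_url(request_rec *r, const char *uri)
{
    request_rec *rr = ap_sub_req_lookup_uri(uri, r, NULL);
    if (rr == NULL) {
        return;
    }

    /* tell the cache not to answer this one from storage */
    apr_table_setn(rr->headers_in, "Cache-Control", "no-cache");

    if (rr->status == HTTP_OK) {
        ap_run_sub_req(rr);
    }
    ap_destroy_sub_req(rr);
}
```

In a real mod_cache_requester the output would also need to be discarded rather than streamed, e.g. by not attaching the normal output filter chain to the subrequest.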
Re: mod-cache-requestor plan
I believe the basic idea of forwarding multiple requests on the back end can be a very good idea, but needs some bounds as Graham suggests. - It's an interesting thought. But after Graham's opinion, I am not too sure about the ratio of performance improvement to the overhead incurred by threads. If we could gain a significant performance improvement (which would be the case when the server is lightly loaded), then it is worth going for multiple sub-requests with some upper bound. -Parin
mod-cache-requestor plan
Hi All, I am a newbie. I am going to work on mod-cache and a new module mod-cache-requester as part of the SoC program. A small description of the module is as follows. When a page expires from the cache, it is removed from the cache and thus the next request has to wait until that page is reloaded by the back-end server. But if we add one more module which re-requests the soon-to-expire pages, such pages won't be removed from the cache, which would reduce the response time. Here is an overview of how I am planning to implement it. 1. When a page is requested and it exists in the cache, mod_cache checks the expiry time of the page. 2. If (expiry time - current time) < Some_Constant_Value, then mod-cache notifies mod_cache_requester about this page. This communication between mod_cache and mod_cache_requester should incur the least overhead possible, as it would affect the current request's response time. 3. mod_cache_requester will re-request the page which is soon-to-expire. Each such request is done through a separate thread so that multiple pages can be re-requested simultaneously. This request would force the server to reload the content of the page into the cache even if it is already there. (This would reset the expiry time of the page and thus it would be able to stay in the cache for a longer duration.) Please let me know what you think about this module. Also I have some questions, and your help would be really useful. 1. What would be the best way for communication between mod_cache and mod_cache_requester? I believe that keeping mod_cache_requester in a separate thread would be the best way. 2. How should mod_cache_requester send the re-request to the main server? I believe that sending it as if the request had come from some client would be the best way to implement it. But we need to attach some special status to this request so that the cache lookup is bypassed and the output filter is not added, as we don't need to stream the output.
3. Other than these questions, any suggestion/correction is welcome. Any pointers to the details of related modules (mod-cache, and the communication between mod-cache and the backend server) would be helpful too. Thanks, Parin.
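Step 2 of the plan above reduces to a simple window check: an entry is handed to mod_cache_requester when it is still fresh but close to expiring. A minimal sketch, where REFRESH_WINDOW stands in for the plan's "Some_Constant_Value" (both names are made up for illustration):

```c
#include <time.h>

/* Seconds before expiry at which an entry becomes a refresh candidate. */
#define REFRESH_WINDOW 60

/* Nonzero when the entry is still fresh (expires > now) but within
 * REFRESH_WINDOW seconds of expiring, i.e. (expiry - now) < window. */
static int needs_refresh(time_t expires, time_t now)
{
    return expires > now && (expires - now) < REFRESH_WINDOW;
}
```

Keeping this to a single comparison on the hit path matters, since the plan requires the mod_cache-to-requester hand-off to add as little as possible to the current request's response time.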