Re: Possible new cache architecture

2006-04-27 Thread Parin Shah
 Brian Akins wrote:
 Some functions a provider should provide:
 init(args...) - initialize an instance :)
 open(instance, key) - open a cache object
 read_buffer(object, buffer, copy) - read entire object into buffer.
 buffer may be read only (ie, it may be mmapped or part of sql statement)
 or make it a copy.
 read_bb(object, brigade, copy) - read object into a brigade. copy if
 flag is set
 store_bb(object, brigade) - store a bucket brigade
 store_buffer(object, buffer) - store a blob of data
 close(object)

 Thoughts?  I'm sure we may need more/better cache provider functions.

It would be helpful if a provider could notify mod_cache (through some
sort of callback function) when it is about to remove an object from its
cache. mod_cache could then look at the object being removed and decide
whether to push it to the next, less resource-critical provider. For
example, if mem_cache_provider decides to evict its LRU object, mod_cache
can push it to disk_cache_provider.
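
A minimal sketch of the eviction callback idea, written as stand-alone C
rather than against the real provider API (which was still being designed
in this thread). Every name here (cache_tier, evict_cb, tier_store,
demote_to_next) is hypothetical, and each tier holds a single object only
to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout: nothing here is the real mod_cache API. */
typedef struct cache_tier cache_tier;

/* Callback a provider invokes just before it drops an object. */
typedef void (*evict_cb)(cache_tier *from, const char *key, const char *body);

struct cache_tier {
    const char *name;
    char key[64];       /* single-slot "cache", enough for the sketch */
    char body[256];
    int used;
    evict_cb on_evict;  /* set by the coordinating layer (mod_cache's role) */
    cache_tier *next;   /* next, less resource-critical tier, or NULL */
};

/* Store an object; if the slot is occupied, notify before overwriting. */
static void tier_store(cache_tier *t, const char *key, const char *body)
{
    assert(t != NULL && key != NULL && body != NULL);
    if (t->used && t->on_evict != NULL) {
        t->on_evict(t, t->key, t->body);
    }
    strncpy(t->key, key, sizeof(t->key) - 1);
    t->key[sizeof(t->key) - 1] = '\0';
    strncpy(t->body, body, sizeof(t->body) - 1);
    t->body[sizeof(t->body) - 1] = '\0';
    t->used = 1;
}

/* Coordinator policy: push the evicted object down to the next tier. */
static void demote_to_next(cache_tier *from, const char *key, const char *body)
{
    if (from->next != NULL) {
        tier_store(from->next, key, body);
    }
}
```

With a memory tier whose next pointer is a disk tier and whose on_evict is
demote_to_next, storing a second object in the memory tier moves the first
one to the disk tier instead of dropping it.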


how to make fake request? (Was Re:It's that time of the year again)

2006-04-20 Thread Parin Shah
 Plüm, Rüdiger,  wrote:


 I have been spending some time to remove the libcurl dependency by
 creating fake connection and requests. I didn't know we already have
 such functionality in proxy. Can you tell me where is that code to
 create fake connections/requests. I can use that code instead of
 libcurl for mod_cache_requestor. having permanent solution of
 apr-http-client would be even better. But we can use fake
 connection/requests until we have a permanent solutions.

- Please have a look at ap_proxy_connection_create /
ap_proxy_make_fake_req in proxy_util.c

Well, they are not much help: ap_proxy_connection_create needs an
actual connection (an open socket, etc.). Similarly,
ap_proxy_make_fake_req needs the connection and an actual request to
copy some data from. In my case, I have just a URI string and no actual
request.

So I tried (unsuccessfully) to fake the connection and request by
creating them locally. That doesn't work because per_dir_config and the
filters are not available locally, which leads to segfaults when I run
the locally created request as a subrequest. I could work around this by
initializing every connection and request data structure that fails, but
that seems neither clean nor maintainable. So libcurl seems the best
option unless somebody has another idea I can try.

Thanks,
Parin.


Re: It's that time of the year again

2006-04-17 Thread Parin Shah
 - mod_cache_requestor (which i don't think really took off)
 and 2 active committers.

I still haven't given up on it. :-) I am trying to remove the libcurl
dependency by creating mocked up connection and request. hopefully, it
would take off one day :-)


Re: It's that time of the year again

2006-04-17 Thread Parin Shah
 An example I'd like to do (or mentor someone) is a mod_memcached that
 could serve as the basis of memcached based modules.  It could handle
 all the configuration details, errors, and general heavy lifting of
 memcached.  It would then be very easy to write other modules that had
 hooks into memcached (ie, a mod_cache provider).

I like the idea of mod_memcached. I can work on it with you (if we
have an SoC student for this project, I can work with them as well).


conn_rec mock up?

2006-03-27 Thread Parin Shah
Hi,

At present we cannot make requests without an actual connection
(conn_rec) to the server.

For example, mod-cache-requester needs to request popular,
soon-to-expire pages so that they are reloaded into the cache. Right
now it has to rely on libcurl to re-request a page, which is not very
elegant.

Beyond mod-cache-requester, there are many other interesting things
that could be done if we could create requests internally without
requiring a conn_rec.

I looked over the existing code, and the best (and easiest) way to
implement this seems to be mocking up a conn_rec. Basically, we would
implement one more version of the core_create_conn() function that
initializes the request URL, sockets and various pools so that this
conn_rec works with the existing request_rec.

We could pass this mocked-up conn_rec to make_sub_request and thus be
able to make requests internally. Any thoughts?

I believe apr_http_client would solve this problem, but I am not sure
about its status. If anybody is working on it and doesn't mind my
jumping into the development cycle, let me know. I would love to get my
hands dirty with this problem.

-Parin.


mod-cache-requester source code

2006-02-24 Thread Parin Shah
Hi,

My svn account has been created, and I have committed the mod-cache-requester files.
http://svn.apache.org/repos/asf/httpd/mod_cache_requester/

Along with these files, I had to make some changes to mod_cache.c,
mod_cache.h and config.m4 as well. I am attaching patches for all
three files to this mail.

- In mod_cache.h I have added the declaration of the cache_requester data structure.

+#define CACHE_REQUESTER_PROVIDER_GROUP "cache_requester"

+typedef struct {
+int (*notify_cache_requester) (request_rec *r, cache_handle_t *handle);
+} cache_req_provider;
+

- in mod_cache.c I have added code for calling mod_cache_requester if
the page is 'soon-to-expire'

+cache_req_provider *cache_requester;
+char *is_from_cache_requester;
+cache_handle_t *handle;
+

+cache_requester = (cache_req_provider *)
ap_lookup_provider(CACHE_REQUESTER_PROVIDER_GROUP, "cache_req", "0");
+if(cache_requester != NULL) {
+if( cache_requester->notify_cache_requester(r, cache->handle)
== DECLINED) {
+ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
+ "Adding CACHE_SAVE filter.");
+ap_add_output_filter_handle(cache_save_filter_handle, NULL, r,
+r->connection);
+return DECLINED;
+}
+}

- config.m4 of mod_cache required modification to add mod_cache_requester.

Please have a look at the code available in the repository and the
patches attached to this mail, and let me know your thoughts. All your
suggestions/corrections are welcome!

Currently, mod-cache-requester depends on libcurl. mod-cache-requester
has to request a page to refresh the cache, and there seems to be no
straightforward way to do so without a connection available. That's
why we had to use libcurl instead.

As a next step, I am planning to investigate how to get rid of libcurl
in the code. Any pointers in that direction would be really helpful.

Thanks,
Parin.
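
The optional-provider pattern used in the mod_cache.c change above can be
sketched without httpd as follows. This is only an illustration of the
lookup-then-notify flow: the sk_ prefixed registry, types and status codes
are hypothetical stand-ins, not the ap_provider API, and the 60-second
threshold is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SK_OK = 0, SK_DECLINED = -1 };  /* hypothetical status codes */

typedef struct { const char *uri; long expires; } sk_request;

/* Mirrors the shape of cache_req_provider: one notification callback. */
typedef struct {
    int (*notify)(const sk_request *r, long now);
} sk_requester_provider;

typedef struct {
    const char *group, *name;
    const sk_requester_provider *provider;
} sk_registry_entry;

#define SK_MAX_PROVIDERS 8
static sk_registry_entry sk_registry[SK_MAX_PROVIDERS];
static size_t sk_registry_len;

/* Register a provider under group/name; returns 0 if the table is full. */
static int sk_register(const char *group, const char *name,
                       const sk_requester_provider *p)
{
    if (sk_registry_len == SK_MAX_PROVIDERS) return 0;
    sk_registry[sk_registry_len++] = (sk_registry_entry){ group, name, p };
    return 1;
}

/* Returns NULL when the optional module never registered itself. */
static const sk_requester_provider *sk_lookup(const char *group,
                                              const char *name)
{
    for (size_t i = 0; i < sk_registry_len; i++) {
        if (strcmp(sk_registry[i].group, group) == 0 &&
            strcmp(sk_registry[i].name, name) == 0) {
            return sk_registry[i].provider;
        }
    }
    return NULL;
}

/* Example callback: accept pages that expire within 60 seconds
 * (or already have); decline everything else. */
static int sk_notify_soon_to_expire(const sk_request *r, long now)
{
    assert(r != NULL);
    return (r->expires - now <= 60) ? SK_OK : SK_DECLINED;
}
```

The caller treats a NULL lookup result as "requester module not loaded" and
carries on, which is what lets the requester stay an optional module.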
--- old_mod_cache.h	2006-02-18 19:09:07.0 -0600
+++ mod_cache.h	2006-02-23 22:41:00.0 -0600
@@ -229,6 +229,16 @@
 cache_provider_list *next;
 };
 
+
+// Provider structure. methods of this structure are used by the mod-cache's quick handler to insert soon-to-expire pages into the queue.
+
+#define CACHE_REQUESTER_PROVIDER_GROUP "cache_requester"
+
+typedef struct {
+int (*notify_cache_requester) (request_rec *r, cache_handle_t *handle);
+} cache_req_provider;
+
+
 /* per request cache information */
 typedef struct {
 cache_provider_list *providers; /* possible cache providers */

--- old_mod_cache.c	2006-02-18 19:09:07.0 -0600
+++ mod_cache.c	2006-02-23 22:37:09.0 -0600
@@ -57,6 +57,10 @@
 cache_server_conf *conf;
 apr_bucket_brigade *out;
 
+cache_req_provider *cache_requester;
+char *is_from_cache_requester;
+cache_handle_t *handle;
+
 /* Delay initialization until we know we are handling a GET */
 if (r->method_number != M_GET) {
 return DECLINED;
@@ -169,6 +173,18 @@
 return DECLINED;
 }
 
+cache_requester = (cache_req_provider *) ap_lookup_provider(CACHE_REQUESTER_PROVIDER_GROUP, "cache_req", "0");
+if(cache_requester != NULL) {
+if( cache_requester->notify_cache_requester(r, cache->handle) == DECLINED) {
+ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
+ "Adding CACHE_SAVE filter.");
+ap_add_output_filter_handle(cache_save_filter_handle, NULL, r,
+r->connection);
+return DECLINED;
+}
+}
+
+
 /* if we are a lookup, we are exiting soon one way or another; Restore
  * the headers. */
 if (lookup) {

--- ol-config.m4	2006-02-23 22:44:30.0 -0600
+++ config.m4	2005-08-24 00:12:27.0 -0500
@@ -6,6 +6,7 @@
 
 APACHE_MODULE(file_cache, File cache, , , no)
 
+
 dnl #  list of object files for mod_cache
 cache_objs="dnl
 mod_cache.lo dnl
@@ -19,8 +20,18 @@
 cache_pqueue.lo dnl
 cache_hash.lo dnl
 "
+
+cache_requester_objs="dnl
+mod_cache_requester.lo dnl
+"
+
 APACHE_MODULE(cache, dynamic file caching, $cache_objs, , no)
 APACHE_MODULE(disk_cache, disk caching module, , , no)
 APACHE_MODULE(mem_cache, memory caching module, $mem_cache_objs, , no)
+APACHE_MODULE(cache_requester, cache requester module, $cache_requester_objs, , no)
+
+  if test "$enable_cache_requester" != "no"; then
+APR_ADDTO(LDFLAGS, [`curl-config --libs`])
+  fi
 
 APACHE_MODPATH_FINISH



Re: mod_cache performance

2006-01-12 Thread Parin Shah
We tried to solve the thundering herd problem with the cache-requester
module, which I have not committed yet. It is currently available on
SourceForge.

I have not found enough time to work on it since Summer of Code ended,
as I was busy with my thesis and an internship, and I have just
relocated to California. I should be actively working on cache-requester
again in a couple of weeks, once things settle down for me.

Meanwhile, feel free to use mod-cache-requester if you find it useful.
Let me know if I can be of any help.

-Parin.

On 1/12/06, Graham Leggett [EMAIL PROTECTED] wrote:
 Brian Akins wrote:

  A short list I have (mostly mod_disk_cache):
 
  -read_table and read_array seem slower than they should be
  -thundering herd when a popular object expires

 Thundering herd was one of the original design goals of the new cache
 that was never fully followed through. With the work getting the cache
 to handle transfer failures gracefully, the next logical step is to
 solve thundering herd.

  -writing out request headers when not varying
 
  I have submitted patches for some of these in the past, but I will try
  to quantify the performance implications of these.

 Submit them to bugzilla to make sure they do not drop on the floor, then
 nag this list. I will be off the air till the end of the month to have a
 holiday, but after that I definitely plan to get my hands dirty.

 Regards,
 Graham
 --





my SoC Experience @ httpd

2005-09-01 Thread Parin Shah
Hi All,

I would like to share some of my experiences (in brief :-) ) from the
Summer of Code program.

I worked on mod-cache-requester as part of this program, and I had a
great time working on this module: I learned a lot, got a chance to
interact with great people, and worked on one of the most popular web
servers!

- It has been a pleasure working with Ian. He is a great mentor and
helped me a lot at every stage of the project, including the CLA
paperwork!

- The people on the dev@ list helped me a lot throughout this project.
Everyone was really interested in the problems I was facing and helped
me get them out of the way. Thank you all for your help.

- So, what next? I plan to continue working on mod-cache-requester for
a while. There is a lot of great stuff we could add to it, such as
integrating sub-requests, removing the dependency on libcurl, and a
better page-popularity estimation algorithm.

- You can download mod-cache-requester from
http://sourceforge.net/projects/cache-requester. I will commit it to
the repository as soon as I get access.

Thank you once again,
Parin.


Re: mod_cache wishlist

2005-08-24 Thread Parin Shah
On 8/24/05, Colm MacCarthaigh [EMAIL PROTECTED] wrote:
 On Wed, Aug 24, 2005 at 09:18:54AM -0500, Parin Shah wrote:
   I have fixed that memory leak problem. also added script to include
   libcurl whenever this module is included.
  
   I hope that it doesn't mean that libcurl is going to be a permanent
   solution, when subrequests (with minor changes) could serve the same
   purpose.
 
  Certainly not, We would have mod-c-requester which uses sub-requests
  and not libcurl eventually. but my initial reaction after going
  through the subrequest code was that it may require significant
  refactoring.
 
 Will it work if you mark the subrequests as proxy requests? IE, the same
 approach as mod_rewrite and the P flag.
 
We had considered proxy requests before, but many of us felt it is not
a good idea to use proxy requests for re-requests. The main reason was
that a proxy request creates a new connection to the main server.

I have not checked the mod_rewrite code yet. I will go through it, and
if it solves our problem, that would be great.

Thanks for your input.

 It introduces a dependency on mod_proxy, but curl introduces a
 dependency on libcurl, so that's not so bad.

You are right. I believe that shouldn't be a problem.


Re: mod_cache wishlist

2005-08-23 Thread Parin Shah
 
  Content definitely should not be served from the cache after it has
  expired imo. However I think an approach like;
 
  if ((now + interval) > expired) {
      if (!stat(tmpfile)) {
          update_cache_from_backend();
      }
  }
 
  ie revalidate the cache content after N-seconds before it is due to be
  expired would have the same effect, but avoid serving stale content.
 
 To a large extent mod_cache_requester (which from inspection seems to be
 much further along than I thought) will solve this problem :)
 
I am already working on it. I have also posted initial version of this module. 

http://utdallas.edu/~parinshah/mod-c-requester.0.2.tar.gz

-Parin.
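
The "refresh shortly before expiry" test quoted in this message can be
written as a small stand-alone C helper. This is a sketch only: the
function name and parameters are hypothetical, and it covers just the time
comparison, not the tmpfile locking or the actual backend fetch.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper, not part of mod_cache: returns 1 when the entry is
 * still fresh but will expire within `interval` seconds, i.e. the window
 * in which a background refresh should be triggered. */
static int in_refresh_window(time_t now, time_t expires, time_t interval)
{
    assert(interval >= 0);
    if (now >= expires) {
        return 0;   /* already expired: must not be served from cache */
    }
    return (now + interval) > expires;
}
```

An entry that has already expired falls outside the window on purpose,
since the point of the scheme is to refresh before stale content would
ever be served.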


Re: mod_cache wishlist

2005-08-23 Thread Parin Shah
 
 Cool. Very good start.  Leaks memory like a sieve, but good start.
 
Oh, I thought I was taking care of it. I mean, the code frees memory
when it is no longer needed, except during server shutdown. Anyway, I
will go through the code again to check that. Also, feel free to point
out the code that is causing the memory leak.


 It would be cool if we could find a way to use Apache's subrequest stuff
 rather than curl.  One less dependency you know.  I've also had issues
 with libcurl and ssl randomly coreing.
 
I would also prefer using subrequests instead of libcurl. But when I
reviewed the make_sub_request code, I couldn't find a way to use it, as
mod-c-requester doesn't have a connection and request available.
I believe we might need some refactoring to make sub-requests work in
such scenarios. So for now I am using libcurl and will continue
investigating other ways.

Thanks,
Parin.


Re: mod_cache wishlist

2005-08-23 Thread Parin Shah
  ohh, I thought I was taking care of it. I mean, code frees the memory
  when no longer needed except during the shutdown of server. anyway I
  will go through the code again to check that. Also feel free to point
  out the code which is causing memory leak problem.
 
 I'll look through it as well.  Big thing I noticed was in regards to curl.
 
 for every call to curl_easy_init() you need a call to curl_easy_cleanup()
 
quite right. I will go ahead and fix this. 

 Also, you must call curl_slist_free_all() to free the list
 
 
 If you want to use libcurl, you may want to use a reslist of curl
 handles.  curl can do all the keepalive stuff and you would avoid the
 overhead of constantly creating deleting curls.  Just call
 curl_easy_reset before giving it back to reslist.
 
Good point. This would improve performance for sure. Thanks for the suggestion.

-Parin.
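
The handle-reuse suggestion quoted above (keep a pool of handles, reset
each one before returning it) can be sketched in plain C. This does not
use libcurl or APR's apr_reslist: fake_handle stands in for a curl easy
handle, and clearing the dirty flag plays the role of curl_easy_reset().
All names and the fixed pool size are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4

/* Stand-in for a curl easy handle; hypothetical, no libcurl involved. */
typedef struct {
    int in_use;
    int dirty;        /* per-request options currently set on the handle */
    int reuse_count;  /* how many times this handle has been handed out */
} fake_handle;

typedef struct {
    fake_handle slots[POOL_SIZE];
} handle_pool;

/* Hand out the first free handle, or NULL when all are busy. */
static fake_handle *pool_acquire(handle_pool *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!p->slots[i].in_use) {
            p->slots[i].in_use = 1;
            p->slots[i].reuse_count++;
            return &p->slots[i];
        }
    }
    return NULL;  /* pool exhausted; a real reslist could block or grow */
}

/* Reset the handle's per-request state and return it to the pool. */
static void pool_release(handle_pool *p, fake_handle *h)
{
    assert(p != NULL && h != NULL && h->in_use);
    h->dirty = 0;
    h->in_use = 0;
}
```

Handles are created once and reused, so the per-request cost is a reset
rather than an init/cleanup pair. Every handle in the pool still needs one
final cleanup at shutdown.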


Re: mod_cache wishlist

2005-08-23 Thread Parin Shah
Hi,

I have fixed the memory leak and added a script to link libcurl
whenever this module is included.

http://utdallas.edu/~parinshah/mod-c-requester.0.3.tar.gz

Thanks,
Parin.

On 8/23/05, Parin Shah [EMAIL PROTECTED] wrote:
   ohh, I thought I was taking care of it. I mean, code frees the memory
   when no longer needed except during the shutdown of server. anyway I
   will go through the code again to check that. Also feel free to point
   out the code which is causing memory leak problem.
 
  I'll look through it as well.  Big thing I noticed was in regards to curl.
 
  for every call to curl_easy_init() you need a call to curl_easy_cleanup()
 
  Also, you must call curl_slist_free_all() to free the list
 
 quite right. I will go ahead and fix this.
 
  Also, you must call curl_slist_free_all() to free the list
 
 
  If you want to use libcurl, you may want to use a reslist of curl
  handles.  curl can do all the keepalive stuff and you would avoid the
  overhead of constantly creating deleting curls.  Just call
  curl_easy_reset before giving it back to reslist.
 
 Good point. This would improve performance for sure. Thanks for the 
 suggestion.
 
 -Parin.



Re: Initial mod-cache-requester

2005-08-22 Thread Parin Shah
Hi All,

I have added a directive for a secret code, which is used to
authenticate the requests created by mod-c-requester. Further details
are in the README file.

Code is available at the same url,

http://utdallas.edu/~parinshah/mod-c-requester.0.2.tar.gz


Thanks,
Parin.

On 8/19/05, Parin Shah [EMAIL PROTECTED] wrote:
 Hi All,
 
 please find initial version of mod-cache-requester at the following url.
 
 http://utdallas.edu/~parinshah/mod-c-requester.0.2.tar.gz
 
 As we have discussed the issue before, it is not possible (or atleast
 not possible w/o some refactoring) to use  make-sub-request to
 re-request all soon-to-expire pages. So currently I am using libcurl
 to re-request all popular and soon to expire pages.
 
 We had considered one other solution also. there we decided to
 re-request durig the original request context itself, but it has some
 other problems including keeping track of popularity of pages,
 removing the pages from queue those are not there in the cache anymore
 etc.
 
 So as of now, making a curl request seems the best solution to me.
 Further details about how to compile and configure this module is in
 the readme file.
 
 Any comments would be really helpful.
 
 Thanks,
 Parin.



Initial mod-cache-requester

2005-08-19 Thread Parin Shah
Hi All,

please find initial version of mod-cache-requester at the following url.

http://utdallas.edu/~parinshah/mod-c-requester.0.2.tar.gz

As we have discussed before, it is not possible (or at least not
possible without some refactoring) to use make_sub_request to
re-request all soon-to-expire pages. So currently I am using libcurl
to re-request all popular, soon-to-expire pages.

We had also considered one other solution: re-requesting during the
original request context itself. But that has other problems, including
keeping track of the popularity of pages and removing pages from the
queue that are no longer in the cache.

So as of now, making a curl request seems the best solution to me.
Further details about how to compile and configure this module are in
the README file.

Any comments would be really helpful.

Thanks,
Parin.


Re: how to make sub-requests?

2005-08-10 Thread Parin Shah
 What I meant was that you modify the ap_read_request() to not crash when
 NULL is passed to it.
 As far as I am aware, the request_rec only needs certain fields copied
 out of the conn_rec, not all of which are required.
 

- I played with it to make it work without a conn_rec, but it doesn't
sound very clean to me. The main reason is that request_rec itself needs
the conn_rec, and also other conn_rec variables such as the lookup
defaults. So even if we could make it work (and I am not 100% sure it
would), it wouldn't be an elegant solution.

 I wouldn't use proxy, as proxy has nothing to do with cache (you may
 build proxy without cache if you like, or cache without proxy).

What I meant was that I might find similar functionality in mod_proxy
that does the same thing. I am not planning to reuse it, but I would use
it as a reference for creating requests in mod-c-requester.

 Rather tack yourself onto mod_cache, so that after a request is
 complete, it can fire off a pass of the cache freshening code while a
 connection_req is still available to create fake requests from.
This is a very good alternative solution. I will proceed in this
direction if I cannot find any other way.

-Parin.


Re: how to make sub-requests?

2005-08-09 Thread Parin Shah
Thanks for your replies.

There is a way to create a request_rec using the function
ap_read_request(conn_rec *).

A conn_rec could in turn be created using the ap_run_create_connection
function, which takes a pool, socket, bucket allocator and some other
arguments. I wasn't sure how to get the socket information, as we cannot
use the connection of any current request_rec for that.

It would be really helpful if you could suggest some pointers for this.

Thanks,
Parin.

On 8/8/05, William A. Rowe, Jr. [EMAIL PROTECTED] wrote:
 At 05:15 AM 8/8/2005, Graham Leggett wrote:
 Parin Shah said:
 
  we can store the original request_rec which was used when the page was
  served from cache, and then use it as a parameter to the above method.
  Is this approach fine?
 
 This isn't very clean, a request_rec is just a structure, they should be
 relatively simple to make. Look for the code within the core that brings
 the request_rec into existence for the first time, you should be able to
 see how it gets created.
 
 Actually I'd suggest you use the core to create that request_rec,
 rather than chasing upgrades for all the things you need to add
 custom when the structure changes.  The request_rec can and does
 grow on mmn minor bumps, so that's an extra reason to avoid ever
 allocating one yourself.  You will have very minimal ABI compat
 if you allocate your own, and should probably emit a warning on
 startup if you see the mmn minor has moved.
 
 Bill
 
 



how to make sub-requests?

2005-08-05 Thread Parin Shah
Hi All,

I am currently working on mod-cache-requester. This module stores the
URIs of all pages that are served from the cache, and it re-requests
popular pages that are about to expire so that they are not removed
from the cache.

To implement that, mod-cache-requester should be able to make a
sub-request for the pages that are about to expire.

This could be achieved in the following ways, but each has its own
benefits and limitations.

- Using make_sub_request or the ap_sub_req_method_uri functions. These
functions create a new request_rec for a sub-request and take the
current request_rec as one of their arguments. But mod-cache-requester
may not have a current request available.

We could store the original request_rec that was used when the page was
served from cache, and then pass it as a parameter to the above
functions. Is this approach fine?

- The second approach would be to create a separate socket connection
and make the request on this connection. I personally believe this is
not a very elegant solution.

- Other than these, please let me know of any other possible methods
you are aware of.

waiting for some help,
Thanks.
Parin.


Re: mod_cache: Help

2005-07-25 Thread Parin Shah
Hi again.

Rici helped me out on this issue.

Thanks,
Parin.

On 7/24/05, Parin Shah [EMAIL PROTECTED] wrote:
 Hi All,
 
 I am currently working on my first module, mod-cache-requester. I am
 planning to make it a sub-module of mod-cache.
 
 I have written a small piece of code which I want to integrate with
 mod-cache the way mod-mem-cache is integrated, i.e. it should have a
 separate .so file in the modules directory and we should also have an
 --enable-mod-cache-requester option in configure.
 
 I can compile and build a new module which is totally independent. But
 in this case I am adding a structure to mod_cache.h, and that's why
 I am not too sure how to go about it.
 
 I would really appreciate your help on this issue.
 
 Thanks,
 Parin.



Re: mod-cache-requestor plan

2005-07-22 Thread Parin Shah
Thanks Ian, Graham and Sergio for your help. 

For the past couple of days I have been trying to figure out how
mod-cache-requester should spawn its thread (or set of threads).
Currently, I am considering the following approach. Please let me know
what you think about it.

- mod-cache-requester would be a sub-module of mod-cache, as Graham
suggested once.

- It would look similar to mod-mem-cache. It would register a provider
(mod-cache-requester-provider, for lack of a better name for now).

- mod-cache (cache_url_handler, to be precise) would look up this
provider and use its methods to push any page that is soon to expire
into the priority queue.

- In the post-config hook of mod-cache-requester, our pqueue would be
initialized along with mutexes and other state.

- We would then create a new thread (or set of threads) in post-config,
which would basically run an infinite loop. It (or they) would keep
checking the pqueue and make sub-requests accordingly.

Does this make sense?

If this approach is correct, then I have some questions regarding the
thread vs. process implementation. I will start discussing them once we
have the main architecture in place.

Thanks,
Parin.
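
The refresh queue in this plan could look roughly like the stand-alone
binary max-heap below. httpd already ships a priority queue
(cache_pqueue), so this sketch is only meant to show the push/pop
behaviour the refresher thread would rely on. The names, the fixed
capacity and the "larger priority is refreshed first" convention are all
assumptions, and the locking the plan mentions is left out.

```c
#include <assert.h>
#include <stddef.h>

#define PQ_CAP 32

typedef struct { long priority; const char *url; } pq_item;
typedef struct { pq_item item[PQ_CAP]; size_t n; } pqueue;

static void pq_swap(pq_item *a, pq_item *b)
{
    pq_item t = *a; *a = *b; *b = t;
}

/* Insert an entry; returns 0 when the queue is full. */
static int pq_push(pqueue *q, long priority, const char *url)
{
    assert(q != NULL);
    if (q->n == PQ_CAP) return 0;
    size_t i = q->n++;
    q->item[i].priority = priority;
    q->item[i].url = url;
    /* Sift up until the parent is at least as large. */
    while (i > 0 && q->item[(i - 1) / 2].priority < q->item[i].priority) {
        pq_swap(&q->item[(i - 1) / 2], &q->item[i]);
        i = (i - 1) / 2;
    }
    return 1;
}

/* Remove the highest-priority entry; returns 0 when the queue is empty. */
static int pq_pop(pqueue *q, pq_item *out)
{
    assert(q != NULL && out != NULL);
    if (q->n == 0) return 0;
    *out = q->item[0];
    q->item[0] = q->item[--q->n];
    size_t i = 0;
    for (;;) {  /* sift down */
        size_t l = 2 * i + 1, r = l + 1, big = i;
        if (l < q->n && q->item[l].priority > q->item[big].priority) big = l;
        if (r < q->n && q->item[r].priority > q->item[big].priority) big = r;
        if (big == i) break;
        pq_swap(&q->item[i], &q->item[big]);
        i = big;
    }
    return 1;
}
```

In the threaded design described above, both functions would have to be
called with the queue mutex held, since the handler pushes while the
refresher thread pops.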

On 7/20/05, Graham Leggett [EMAIL PROTECTED] wrote:
 Parin Shah wrote:
 
  2. how mod-cache-requester can generate the sub request just to reload
  the content in the cache.
 
 Look inside mod_include - it uses subrequests to be able to embed pages
 within other pages.
 
 Regards,
 Graham
 --



Re: mod-cache-requestor plan

2005-07-22 Thread Parin Shah
This would definitely relieve mod-cache from checking the status of a
page every time. But then we would not be able to keep track of the
popularity of the pages.

But yes, this is a good observation. If we could come up with a
mechanism to keep track of the popularity of pages (number of
requests and last access time) without mod-cache's involvement, then
that would be a better approach.

-Parin.


On 7/22/05, Sergio Leonardi [EMAIL PROTECTED] wrote:
 The basic approach is ok for me, I just make a note.
 I think that mod_cache should put each cached page in the queue at the time
 its entry in the cache is created (or when its expire time has been
 changed), setting the proper regeneration time in the queue (e.g.
 regeneration time = page expire time - time spent for last page generation).
 
 In such a way there's no need to lookup for what's expiring, just sleep
 until something needs to be regenerated.
 Bye
 
 Sergio
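
 The scheduling rule above (regeneration time = page expire time minus
 time spent on the last generation) can be sketched in stand-alone C as
 follows. The struct and function names are hypothetical, and a linear
 scan stands in for whatever queue the module would really use.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef struct {
    const char *url;
    time_t expires;        /* when the cached copy expires */
    time_t last_gen_cost;  /* seconds the last generation took */
} refresh_entry;

/* regeneration time = expire time - time spent on last generation */
static time_t regen_time(const refresh_entry *e)
{
    return e->expires - e->last_gen_cost;
}

/* Index of the entry that must be regenerated first; n must be > 0. */
static size_t next_due(const refresh_entry *q, size_t n)
{
    assert(q != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (regen_time(&q[i]) < regen_time(&q[best])) best = i;
    }
    return best;
}

/* Seconds the refresher may sleep before the next entry is due. */
static time_t sleep_for(const refresh_entry *q, size_t n, time_t now)
{
    time_t due = regen_time(&q[next_due(q, n)]);
    return due > now ? due - now : 0;
}
```

 The refresher sleeps for sleep_for() seconds and then regenerates the
 entry at next_due(), so nothing has to poll the cache for expiring pages.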
 
 -Original Message-
 From: Parin Shah [mailto:[EMAIL PROTECTED]
 Sent: venerdì 22 luglio 2005 8.02
 To: dev@httpd.apache.org
 Subject: Re: mod-cache-requestor plan
 
 Thanks Ian, Graham and Sergio for your help.
 
 for past couple of days I am trying to figure out how our
 mod-cache-requester should spawn thread (or set of threads).
 Currently, I am considering following option. please let me know what
 you think about this approach.
 
 - mod-cache-requester would be a sub-module in mod-cache as Graham had
 suggested once.
 
 - it would look similar to mod-mem-cache. it would have provider
 (mod-cache-requester-provider, for lack of any better word for now)
 registered.
 
 - mod-cache (cache_url_handler to be precise)  will do lookup for this
 provider and will use this provider's methods to push any page which
 is soon-to-be-expired in the priority queue.
 
 - in the post config of the mod-cache-requester our pqueue would be
 initialized along with mutexes and other stuff.
 
 - now, we would create new thread (or set of threads) in the post
 config which would basically contain an infinite loop. it (or they)
 will keep checking pqueue and would make sub requests accordingly.
 
 Does this make sense?
 
 If this approach is correct then I have some questions regarding
 thread vs process implementation. I would start discussing it once we
 have main architecture in place.
 
 Thanks,
 Parin.
 
 On 7/20/05, Graham Leggett [EMAIL PROTECTED] wrote:
  Parin Shah wrote:
 
   2. how mod-cache-requester can generate the sub request just to reload
   the content in the cache.
 
  Look inside mod_include - it uses subrequests to be able to embed pages
  within other pages.
 
  Regards,
  Graham
  --
 
 



Re: mod-cache-requestor plan

2005-07-20 Thread Parin Shah
Hi All,

We are now almost at consensus about the mechanism of the new
mod-cache-requester module, and I believe it is a good time to start
implementing it.

But before I can do that, I need some help from you guys.

- I am now comfortable with mod-cache, mod-mem-cache, cache_storage.c,
cache_util.c etc.

- But I am still not sure how to implement a couple of things.

1. How to start the new thread/process for mod-cache-requester when the
server starts. Any similar piece of code would help me a lot.

2. How mod-cache-requester can generate a sub-request just to reload
the content in the cache.

3. In the current scheme, whenever mod-cache-requester pulls the first
entry from the pqueue ('refresh' queue), it re-requests the page to
reload it. By the time this re-request is made, the page might already
have expired and been removed from the cache. In that case, should
mod-cache reload it, or should it wait for the next legitimate request?

Your thoughts on any/all on these issues would be really helpful.

Thanks
Parin.

On 7/19/05, Ian Holsman [EMAIL PROTECTED] wrote:
 Parin Shah wrote:
 you should be using a mix of
 
 # requests
 last access time
 cost of reproducing the request.
 
 
 
  Just to double check, we would insert entry into the 'refresh queue'
  only if the page is requested and the page is soon-to-be-expired. once
  it is in the queue we would use above parameters to calculate the
  priority. Is this correct? or let me know If I have mistaken it.
 
 yep.
 thats the idea.
 refresh the most-popular pages first.
 
 
 see memcache_gdsf_algorithm() in mod_mem_cache.c for an implementation
 of this, which assumes 'length' of request is related to the cost of
 reproducing the request.
 
 the priority queue implementation is sitting in mod_mem_cache, and could
 be used to implement the 'refresh' queue I would think.
 
 
  I feel comfortable with mod-cache and mod-mem-cache code now. but we
  also need to start new thread/process for mod-cache-requester when
  server starts. I am not too sure how we could implement it. any
  pointers to the similar piece of code would be really helpful to me.
 
 I don't have any code which does this to share with you (others might
 know of some).
 
 
  Thanks,
  Parin.
 
 --Ian
 



Re: mod-cache-requestor plan

2005-07-17 Thread Parin Shah
 you should be using a mix of
 
 # requests
 last access time
 cost of reproducing the request.
 

Just to double check: we would insert an entry into the 'refresh queue'
only if the page is requested and is soon to expire. Once it is in the
queue, we would use the above parameters to calculate its priority. Is
this correct? Let me know if I have misunderstood.

 see memcache_gdsf_algorithm() in mod_mem_cache.c for an implementation
 of this, which assumes 'length' of request is related to the cost of
 reproducing the request.
 
 the priority queue implementation is sitting in mod_mem_cache, and could
 be used to implement the 'refresh' queue I would think.
 
I feel comfortable with the mod-cache and mod-mem-cache code now. But we
also need to start a new thread/process for mod-cache-requester when the
server starts, and I am not sure how to implement that. Any pointers to
a similar piece of code would be really helpful.

Thanks,
Parin.
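
For the question of starting a new thread when the server starts, the general shape is a long-lived worker that blocks on a queue and is started once from the startup path. The sketch below uses Python only to show that shape. It is not Apache or APR code, and `start_requester`, `refresh`, and the `None` stop sentinel are names invented for this illustration.

```python
import queue
import threading


def start_requester(work, refresh):
    """Start one long-lived worker that refreshes every URL put on 'work'."""

    def loop():
        while True:
            url = work.get()          # blocks until the cache side enqueues something
            if url is None:           # hypothetical stop sentinel, sent at shutdown
                break
            refresh(url)

    worker = threading.Thread(target=loop, name="cache-requester", daemon=True)
    worker.start()
    return worker


refreshed = []
work = queue.Queue()
worker = start_requester(work, refreshed.append)   # would be called once at server startup
work.put("/index.html")
work.put("/news")
work.put(None)                                     # ask the worker to exit
worker.join(timeout=5)
```

The request-serving side only ever calls `work.put(...)`, so the refresh work itself never runs on the thread that is answering a client.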


Re: mod-cache-requestor plan

2005-07-16 Thread Parin Shah
On 7/16/05, Graham Leggett [EMAIL PROTECTED] wrote:
 Parin Shah wrote:
 
  - I would prefer the approach where we maintain a priority queue to keep
  track of popularity. But again, you guys have more insight and
  understanding, so whichever approach you guys decide on, I am ready to
  work on it! ;-)
 
 Beware of scope creep - we can always start with something simple, like
 a straight list of URLs, and then add the priority later, depends on how
 easy or difficult it is to do.

- Good point. We could start with something simple, as you said. And
adding a priority queue should not be difficult once we have the basic
mechanism ready.

Thanks,
Parin.
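
A minimal sketch of the "straight list of URLs" starting point, assuming the only requirements are first-in, first-out order and not queuing the same URL twice. `UrlList` is a hypothetical name, not something in mod_cache.

```python
from collections import OrderedDict


class UrlList:
    """First-in, first-out list of URLs awaiting refresh; each URL is held once."""

    def __init__(self):
        self._urls = OrderedDict()

    def add(self, url):
        # Re-adding a URL that is already waiting keeps its original position.
        self._urls.setdefault(url, None)

    def next(self):
        url, _ = self._urls.popitem(last=False)   # oldest entry first
        return url

    def __len__(self):
        return len(self._urls)


pending = UrlList()
for url in ["/a", "/b", "/a", "/c"]:
    pending.add(url)
drained = [pending.next() for _ in range(len(pending))]
```

Because callers only use `add` and `next`, a priority ordering could later replace the FIFO order without touching the code that feeds or drains the list.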


Re: mod-cache-requestor plan

2005-07-15 Thread Parin Shah
Thanks all for your thoughts on this issue.

  The priority re-fetch would make sure the
  popular pages are always in cache, while others are allowed to die at
  their expense.
 
 
 So every request for an object would update a counter for that url?
 
- We need to maintain a counter for each URL in this case, which would
decide the priority of the URL. But maintaining this counter should be a
low-overhead operation, I believe.

 Both approaches have disadvantages.  I guess you just have to choose your
 poison :)
 
- I would prefer the approach where we maintain a priority queue to keep
track of popularity. But again, you guys have more insight and
understanding, so whichever approach you guys decide on, I am ready to
work on it! ;-)

Thanks,
Parin.


Re: mod-cache-requestor plan

2005-07-15 Thread Parin Shah
On 7/15/05, Colm MacCarthaigh [EMAIL PROTECTED] wrote:
 On Fri, Jul 15, 2005 at 01:23:29AM -0500, Parin Shah wrote:
  - We need to maintain a counter for each URL in this case, which would
  decide the priority of the URL. But maintaining this counter should be a
  low-overhead operation, I believe.
 
 Is a counter strictly speaking the right approach? Why not a time of
 last access?
 
 I haven't run a statistical analysis but based on my logs the likelihood
 of a url being accessed is very highly correlated to how recently it has
 been accessed before. A truly popular page will always have been
 accessed recently, a page that is becoming popular (and therefore very
 likely to get future hits) will have been accessed recently and a page
 whose popularity is rapidly diminishing will not have been accessed
 recently.
 

Last access time is definitely a better solution when compared to the
counter mechanism. I would like to know other people's opinions too.

Thanks,
Parin.
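
A small illustration of why the two signals can disagree. The page names and numbers are invented for the example and are not taken from any log. A hit counter keeps ranking a page whose popularity has faded, while last access time drops it to the bottom.

```python
def rank_by_hits(pages):
    """Most-requested first."""
    return sorted(pages, key=lambda p: pages[p]["hits"], reverse=True)


def rank_by_last_access(pages):
    """Most recently accessed first."""
    return sorted(pages, key=lambda p: pages[p]["last_access"], reverse=True)


# Invented sample data: timestamps are seconds on an arbitrary clock.
pages = {
    "/old-favourite": {"hits": 900, "last_access": 1000},   # popularity fading
    "/steady": {"hits": 400, "last_access": 9990},
    "/rising": {"hits": 30, "last_access": 9999},           # becoming popular
}

by_hits = rank_by_hits(pages)
by_recency = rank_by_last_access(pages)
```

Here `by_hits` puts "/old-favourite" first, while `by_recency` puts "/rising" first and "/old-favourite" last, which matches the behaviour Colm describes.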


Re: mod-cache-requestor plan

2005-07-12 Thread Parin Shah
 We have been down this road.  The way one might solve it is to allow
 mod_cache to be able to reload an object while serving the old one.
 
 Example:
 
 cache /A for 600 seconds
 
 after 500 seconds, request /A with special header (or from special client,
 etc) and cache does not serve from cache, but rather pretends the cache has
 expired.  do normal refresh stuff.
 
 The cache will continue to serve /A even though it is refreshing it
 

As Graham suggested, such a mechanism will not refresh pages that are
non-popular but expensive to load, which could incur a lot of
overhead. But other than that, this looks like a really good solution.

 
 Also, one of the flaws of mod_disk_cache (at least the version I am looking
 at) is that it deletes objects before reloading them.  It is better for many
 reasons to only replace them.  That's the best way to accomplish what I
 described above.

If we implement it the way you suggested, then this problem would
automatically be solved.

-Parin.
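
A sketch of the behaviour discussed in this message, using the numbers from the example (cache /A for 600 seconds, refresh after 500). The class and method names are hypothetical and this is not how mod_disk_cache is written. The point it shows is that the old entry stays readable until a new one replaces it in a single step.

```python
class Entry:
    def __init__(self, body, expires):
        self.body = body
        self.expires = expires


class Cache:
    def __init__(self, ttl=600, refresh_after=500):
        self.ttl = ttl
        self.refresh_after = refresh_after
        self._entries = {}

    def store(self, url, body, now):
        # A single assignment replaces the old entry; it is never deleted first.
        self._entries[url] = Entry(body, now + self.ttl)

    def lookup(self, url, now):
        entry = self._entries.get(url)
        if entry is None or now >= entry.expires:
            return None
        return entry.body

    def needs_refresh(self, url, now):
        entry = self._entries.get(url)
        if entry is None:
            return False
        stored_at = entry.expires - self.ttl
        return now - stored_at >= self.refresh_after


cache = Cache()
cache.store("/A", "v1", now=0)
at_400 = cache.needs_refresh("/A", now=400)   # too early to refresh
at_500 = cache.needs_refresh("/A", now=500)   # refresh should start now
during = cache.lookup("/A", now=500)          # old copy is still served meanwhile
cache.store("/A", "v2", now=510)              # refresh finished: replace in place
after = cache.lookup("/A", now=900)           # would have expired at 600 without the refresh
```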


Re: mod-cache-requestor plan

2005-07-11 Thread Parin Shah
 - Cache freshness of a URL is checked on each hit to the URL. This runs
 the risk of allowing non-popular (but possibly expensive) URLs to expire
 without the chance to be refreshed.
 - Cache freshness is checked in an independent thread, which monitors the
 cached URLs for freshness at predetermined intervals, and updates them
 automatically and independently of the frontend.
 Either way, it would be useful for mod_cache_requester to operate
 independently of the cache serving requests, so that cache freshening
 doesn't slow down the frontend.
 I would vote for the second option - a cache spider that keeps it fresh.

- In this case, what would be the criteria to determine which pages
should be refreshed and which should be left out? Initially I thought
that all the pages that are about to expire and have been requested
should be refreshed. But if we consider keeping non-popular but
expensive pages in the cache, how would mod-cache-requester make
the decision?

 Once mod_cache_requester has decided that a URL needs to be freshened,
 all it needs to do is to make a subrequest to that URL setting the
 relevant Cache-Control headers to tell it to refresh the cache, and let
 the normal caching mechanism take its course.

- Hmm, this seems to be the most elegant solution.

 mod_cache_requester would probably be a submodule of mod_cache, using
 mod_cache provided hooks to query elements in the cache.

- Considering that mod-cache-requester would be using some of mod-cache's
hooks to query the elements in the cache, would mod-cache-requester
still be highly dependent on the platform (process vs threads)?

Thanks a lot for all this valuable information, Graham.

Parin.
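
A sketch of the "cache spider" option under simple assumptions: the spider can see a mapping of URL to expiry time, and it refreshes anything that will expire within a fixed window. `due_for_refresh`, `refresh_headers`, and the sample index are hypothetical. The use of a request-side `Cache-Control: no-cache` is also an assumption about which header the cache would be configured to honour.

```python
def due_for_refresh(index, now, window):
    """Return URLs whose cached copy expires within 'window' seconds, soonest first."""
    soon = [(expires, url) for url, expires in index.items() if 0 <= expires - now < window]
    return [url for _, url in sorted(soon)]


def refresh_headers():
    # Assumption: a request-side 'no-cache' makes the cache go back to the origin.
    return {"Cache-Control": "no-cache"}


# url -> expiry time, made-up values on an arbitrary clock
index = {"/a": 1100, "/b": 1030, "/c": 1010, "/d": 900}
due = due_for_refresh(index, now=1000, window=60)
headers = refresh_headers()
```

Each periodic scan would then issue one subrequest per URL in `due` with `headers` attached, leaving the store itself to the normal caching path. Entries that have already expired ("/d" here) are left to the ordinary miss handling.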


Re: mod-cache-requestor plan

2005-07-11 Thread Parin Shah
 I believe the basic idea of
 forwarding multiple requests on the back end can be a very good idea,
 but needs some bounds as Graham suggests. ..

It's an interesting thought. But after Graham's opinion, I am not too
sure about the ratio of performance improvement to the overhead incurred
by threads. If we could gain a significant performance improvement (which
would be the case when the server is lightly loaded), then it is worth
going for multiple sub-requests with some upper bound.

-Parin
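
One way to picture "multiple sub-requests with some upper bound" is a fixed-size worker pool. The sketch below is illustrative only: `refresh_all`, `fake_fetch`, and the bound of 3 are invented, and the sleep stands in for a real sub-request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def refresh_all(urls, fetch, max_parallel=4):
    """Run fetch(url) for every URL with at most max_parallel running at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(fetch, urls))


lock = threading.Lock()
stats = {"running": 0, "peak": 0}


def fake_fetch(url):
    """Stand-in for a real sub-request; records how many calls overlap."""
    with lock:
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
    time.sleep(0.01)
    with lock:
        stats["running"] -= 1
    return url.upper()


results = refresh_all([f"/page{i}" for i in range(10)], fake_fetch, max_parallel=3)
```

However many URLs are waiting, no more than `max_parallel` fetches run at the same time, so the extra load on the back end stays bounded.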


mod-cache-requestor plan

2005-07-10 Thread Parin Shah
Hi All,

I am a newbie. I am going to work on mod-cache and a new module,
mod-cache-requester, as part of the SoC program.

Small description of the module is as follows.

When a page expires from the cache, it is removed from the cache and
thus the next request has to wait until that page is reloaded by the
back-end server. But if we add one more module which re-requests the
soon-to-expire pages, such pages won't be removed from the
cache, which would reduce the response time.

Here is an overview of how I am planning to implement it.

1. when a page is requested and it exists in the cache, mod_cache
checks the expiry time of the page.

2. If (expiry time - current time) < Some_Constant_Value,
then mod-cache notifies mod_cache_requester about this page.
This communication between mod_cache and mod_cache_requester should
incur the least possible overhead, as it would affect the current
request's response time.

3. mod_cache_requester will re-request the page which is soon-to-expire.
Each such request is done through a separate thread so that multiple
pages could be re-requested simultaneously.

This request would force the server to reload the content of the page
into the cache even if it is already there. (This would reset the
expiry time of the page and thus it would be able to stay in the cache
for a longer duration.)
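
Steps 1 and 2 can be sketched as follows, reading the comparison in step 2 as "less than" (a page is soon-to-expire when its remaining lifetime is below the constant). The 60-second window, the queue size, and the `on_cache_hit` name are assumptions made for the example. The hand-off is non-blocking so that it adds as little as possible to the current request's response time.

```python
import queue

REFRESH_WINDOW = 60          # stands in for Some_Constant_Value, in seconds (value is made up)
notifications = queue.Queue(maxsize=1000)


def on_cache_hit(url, expiry_time, now):
    """Steps 1 and 2: called on a cache hit; never blocks the request being served."""
    if expiry_time - now < REFRESH_WINDOW:
        try:
            notifications.put_nowait(url)     # hand-off to the requester
            return True
        except queue.Full:
            return False                      # drop the hint rather than delay the response
    return False


first = on_cache_hit("/a", expiry_time=1030, now=1000)    # 30 s left: notify
second = on_cache_hit("/b", expiry_time=1600, now=1000)   # 600 s left: nothing to do
```

Step 3 would then be whatever drains `notifications` on the requester side, outside the request path.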

Please let me know what you think about this module. Also, I have some
questions and your help would be really useful.

1. What would be the best way for mod_cache and mod_cache_requester to
communicate? I believe that keeping mod_cache_requester in a
separate thread would be the best way.

2. How should mod_cache_requester send the re-request to the main
server? I believe that sending it as if the request had come from
some client would be the best way to implement it. But we need to attach
some special status to this request so that cache_lookup is
bypassed and output_filter is not added, as we don't need to stream the
output.

3. Other than these questions, any suggestion/correction is welcome.
Any pointers to the details of related modules (mod-cache,
communication between mod-cache and the backend server) would be helpful
too.

Thanks,
Parin.