Re: Possible new cache architecture
Graham Leggett wrote:
> I think in the long run, a dedicated process is the way to go.

I think using a provider architecture would be best and keep complexity out of mod_cache. Some module(s) would implement the necessary cache management functions and mod_cache would push/pull/probe the "manager" using this interface. The manager may or may not be tied to the storage provider. We may have enough "generic interfaces" already to allow completely "stand alone" cache managers.

At least, that's how I would do it...

-- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
> -----Original Message-----
> From: Joe Orton
>
> > 1. This is an API change which might be hard to backport.
> > 2. I do not really like the close tie between the storage provider
> >    and the filter chain. It forces the provider to do things it
> >    should not care about from my point of view.
>
> At least this much could be solved I suppose by passing in a callback of
> type apr_brigade_flush which does the pass to f->next; the storage

Sorry, but I guess that I do not understand this completely. So instead of passing f->next to store_body and making it call ap_pass_brigade with the "small" brigade and f->next, you propose to create a callback function of type apr_brigade_flush inside mod_cache and pass the pointer to this function plus f->next to store_body, such that it can call this function with the "small" brigade and f->next as the ctx parameter of apr_brigade_flush? This function then of course calls ap_pass_brigade.

> provider could remain filter-agnostic then. No idea about your other
> issues, sorry.

I will keep on thinking about this. Thanks for your help.

Regards

Rüdiger
Re: Possible new cache architecture
On Wed, May 03, 2006 at 02:07:44PM +0200, Plüm, Rüdiger, VF EITO wrote:
> > -----Original Message-----
> > From: Joe Orton
> >
> > The way I would expect it to work would be by passing f->next in to
> > the store_body callback, it looks doomed to eat RAM as currently
> > designed. mod_disk_cache's store_body implementation can then do:
> >
> > 1. read bucket(s) from brigade, appending to some temp brigade
> > 2. write bucket(s) in temp brigade to cache file
> > 3. pass temp brigade on to f->next
> > 4. clear temp brigade to ensure memory is released
> > 5. goto 1
>
> Yes, this was also my idea, but I would like to avoid this, because:
>
> 1. This is an API change which might be hard to backport.
> 2. I do not really like the close tie between the storage provider
>    and the filter chain. It forces the provider to do things it
>    should not care about from my point of view.

At least this much could be solved I suppose by passing in a callback of type apr_brigade_flush which does the pass to f->next; the storage provider could remain filter-agnostic then. No idea about your other issues, sorry.

joe
Re: Possible new cache architecture
On 5/3/06, Graham Leggett <[EMAIL PROTECTED]> wrote:
> Gonzalo Arana wrote:
> > again, I am in the dark: why would cached request headers need to be
> > replaced or edited in the same entity?
>
> It's a requirement of the HTTP/1.1 spec.

Non-modified response headers to conditional requests need to update cached response headers. We should try to avoid 'dialog' with the cache backend.

> The catch is when the server sent "304 Not Modified" - you need to
> update your cache to say "yep, my cached entry is still fresh", ie
> update the headers, without touching the body, which hasn't changed.

I see the light now :). Having a single cache_admin proc/thread would make this easier, since any operation can be presented as atomic, while it may require more than a single syscall (I know, the goal is to avoid full entity duplication). Anyway, I guess a good policy is to have 'editable' content as binary data (i.e., no variable length). Perhaps this is not possible anyway :(. Of course, to avoid a 'dialog' between the httpd process and cache_admin, both cache_admin and httpd must be smart enough.

> > That's why I suggested a dedicated process/thread for cache
> > administration, which is not a good idea if too many lookups are
> > issued to this process on each request received.
>
> I think in the long run, a dedicated process is the way to go.

+1 :).

Regards, -- Gonzalo A. Arana
Re: Possible new cache architecture
On 05/03/2006 10:46 PM, Graham Leggett wrote:
> mod_cache definitely needs cache admin, currently it's implemented as an
> external program that is called via cron, which doesn't help if you're
> on a box without cron. Cache cleaning can be done either when a

Not completely true. According to the documentation you can start it as a daemon (-d, http://httpd.apache.org/docs/2.2/programs/htcacheclean.html#options) that runs periodically. Of course this daemon has to be started and configured separately from httpd, so it may not be the final solution.

Regards

Rüdiger
Re: Possible new cache architecture
Gonzalo Arana wrote:
> again, I am in the dark: why would cached request headers need to be
> replaced or edited in the same entity?

It's a requirement of the HTTP/1.1 spec.

HTTP requests can be conditional; in other words a browser (or a proxy) can ask a server "give me this URL, but only if it has changed from my cached copy".

If the server thinks that the file has changed (or Cache-Control: no-cache was specified), then the server will send a full response back, headers + body, and the browser/proxy replaces its cached copy with the new headers + body.

If the server thinks that the file is the same, ie it didn't change, the server sends back the magic code "304 Not Modified", and just the headers - without any body. These new headers must replace the existing headers in the browser/proxy's cached entry, making the cached entry "fresh" again. And here lies the problem.

Doing the request this way means you don't have to ask the backend "is my cached copy still fresh?", get an answer back "No", and then send a second request saying "ok then, give me the new data" - you can implement caching in one request. The catch is when the server sent "304 Not Modified" - you need to update your cache to say "yep, my cached entry is still fresh", ie update the headers, without touching the body, which hasn't changed.

> That's why I suggested a dedicated process/thread for cache
> administration, which is not a good idea if too many lookups are
> issued to this process on each request received.

mod_cache definitely needs cache admin; currently it's implemented as an external program that is called via cron, which doesn't help if you're on a box without cron. Cache cleaning can be done either when a connection is complete in the existing process (which may be simpler to implement, but it runs after every connection), or it can be done as you suggest, where a dedicated thread/process handles this independently.

I think in the long run, a dedicated process is the way to go.
Regards, Graham
Re: Possible new cache architecture
Roy T. Fielding wrote:
> That is a heck of a lot easier than convincing everyone to dump the
> current code based on an untested theory.

I think the idea may be a lot more tested than you think. Most things I "suggest" have had an incubation period somewhere... I'm fine with not screwing with current mod_cache. I just think it should be either: renamed or made generic. We may or may not need a generic mod_backend_cache. I have posted a "pseudo-implementation" that got lost in the latest thread bloat. I can repost if anyone is interested.

-- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
Thanks for bringing me to the light.

On 5/3/06, Graham Leggett <[EMAIL PROTECTED]> wrote:
> Gonzalo Arana wrote:
> > Excuse my ignorance in this matter, but about the 'cache sub-key'
> > issue, why not just use a generic cache (with some expiration model
> > -LRU, perhaps-) with a 'smart' comparison function?
>
> So far one of the best suggestions was from the patch posted recently,
> where the headers and body were in the same file, but where the headers
> were given "breathing room" before the cache body, so that the headers
> can be replaced (within reasonable limits). What this means is that each
> key/data entry is now a single file again (like in 1.3), which is much
> easier to clean up atomically.
>
> The problem still remains that an existing cache file's headers must be
> editable, without doing expensive operations like copying, and this

again, I am in the dark: why would cached request headers need to be replaced or edited in the same entity?

> editing must be atomic (no use one thread/process trying to serve
> content from the cache and halfway through, another thread tries to
> update the headers). This will require some form of locking, which may
> be too much of a performance drag, thus blowing the back-to-one-file
> idea out of the water.

This makes sense, but I still do not understand the origin of the problem (in-place header replacement).

> Problems with cache expiry though are a real problem that mod_cache
> suffers from now, and need to be fixed.

That's why I suggested a dedicated process/thread for cache administration, which is not a good idea if too many lookups are issued to this process on each request received.

Regards, -- Gonzalo A. Arana
Re: Possible new cache architecture
Roy T. Fielding wrote:
> For the record, Graham's statements were entirely correct, Brian's
> suggested architecture would slow the HTTP cache,

No. It would simplify the existing implementation. The existing implementation, as Graham has noted, is not "fully functional." Graham argues - and I'm still mulling it over - that a generic cache architecture would get in the way of making a fully functional http cache.

-- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
Gonzalo Arana wrote:
> Excuse my ignorance in this matter, but about the 'cache sub-key'
> issue, why not just use a generic cache (with some expiration model
> -LRU, perhaps-) with a 'smart' comparison function?

So far one of the best suggestions was from the patch posted recently, where the headers and body were in the same file, but where the headers were given "breathing room" before the cache body, so that the headers can be replaced (within reasonable limits). What this means is that each key/data entry is now a single file again (like in 1.3), which is much easier to clean up atomically.

The problem still remains that an existing cache file's headers must be editable, without doing expensive operations like copying, and this editing must be atomic (no use one thread/process trying to serve content from the cache and halfway through, another thread tries to update the headers). This will require some form of locking, which may be too much of a performance drag, thus blowing the back-to-one-file idea out of the water.

Problems with cache expiry though are a real problem that mod_cache suffers from now, and need to be fixed.

Regards, Graham
Re: Possible new cache architecture
On Wed, 3 May 2006 11:39:02 -0700 "Roy T. Fielding" <[EMAIL PROTECTED]> wrote:
> On May 3, 2006, at 5:56 AM, Davi Arnaut wrote:
> > On Wed, 3 May 2006 14:31:06 +0200 (SAST)
> > "Graham Leggett" <[EMAIL PROTECTED]> wrote:
> >> On Wed, May 3, 2006 1:26 am, Davi Arnaut said:
> >>>> Then you will end up with code that does not meet the requirements
> >>>> of HTTP, and you will have wasted your time.
> >>>
> >>> Yeah, right! How ? Hey, you are using the Monty Python argument
> >>> style. Can you point to even one requirement of HTTP that
> >>> my_cache_provider won't meet ?
> >>
> >> Yes. Atomic insertions and deletions, the ability to update headers
> >> independently of body, etc etc, just go back and read the thread.
> >
> > I can't argue with a zombie, you keep repeating the same
> > misunderstandings.
> >
> >> Seriously, please move this off list to keep the noise out of
> >> people's inboxes.
> >
> > Fine, I give up.
>
> For the record, Graham's statements were entirely correct, Brian's
> suggested architecture would slow the HTTP cache, and your responses
> have been amazingly childish for someone who has earned zero
> credibility on this list.

Fine, I do have zero credibility.

> I suggest you stop defending a half-baked design theory and just go
> ahead and implement something as a patch. If it works, that's great.
> If it slows the HTTP cache, I will veto it myself.

I'm already doing this.

> There is, of course, no reason why the HTTP cache has to use some new
> middle-layer back-end cache, so maybe you could just stop arguing
> about vaporware and simply implement a single mod_backend_cache that
> doesn't try to be all things to all people.
>
> Implement it and then convince people on the basis of measurements.
> That is a heck of a lot easier than convincing everyone to dump the
> current code based on an untested theory.

I just wanted to get comments (the original idea wasn't mine). It wasn't my intention to flame anyone, I'm not mad or anything. I was just stating my opinion. I may be wrong, but I don't give up easy. :)

-- Davi Arnaut
Re: Possible new cache architecture
On May 3, 2006, at 5:56 AM, Davi Arnaut wrote:
> On Wed, 3 May 2006 14:31:06 +0200 (SAST)
> "Graham Leggett" <[EMAIL PROTECTED]> wrote:
>> On Wed, May 3, 2006 1:26 am, Davi Arnaut said:
>>>> Then you will end up with code that does not meet the requirements
>>>> of HTTP, and you will have wasted your time.
>>>
>>> Yeah, right! How ? Hey, you are using the Monty Python argument
>>> style. Can you point to even one requirement of HTTP that
>>> my_cache_provider won't meet ?
>>
>> Yes. Atomic insertions and deletions, the ability to update headers
>> independently of body, etc etc, just go back and read the thread.
>
> I can't argue with a zombie, you keep repeating the same
> misunderstandings.
>
>> Seriously, please move this off list to keep the noise out of
>> people's inboxes.
>
> Fine, I give up.

For the record, Graham's statements were entirely correct, Brian's suggested architecture would slow the HTTP cache, and your responses have been amazingly childish for someone who has earned zero credibility on this list.

I suggest you stop defending a half-baked design theory and just go ahead and implement something as a patch. If it works, that's great. If it slows the HTTP cache, I will veto it myself.

There is, of course, no reason why the HTTP cache has to use some new middle-layer back-end cache, so maybe you could just stop arguing about vaporware and simply implement a single mod_backend_cache that doesn't try to be all things to all people.

Implement it and then convince people on the basis of measurements. That is a heck of a lot easier than convincing everyone to dump the current code based on an untested theory.

Roy
Re: Possible new cache architecture
Excuse my ignorance in this matter, but about the 'cache sub-key' issue, why not just use a generic cache (with some expiration model -LRU, perhaps-) with a 'smart' comparison function? We could use as key full request headers (perhaps somewhat parsed), and as a comparison function a clever enough code to handle Vary, entity aging and so on. Best regards, -- Gonzalo A. Arana
Re: Possible new cache architecture
Brian Akins wrote:
> Does this discussion belong off-list? I would think this is the type
> of thing we need to discuss on this list.

The technical discussion belongs on the list, flames not.

> Is there any consensus as to how to move forward? Do we just leave it
> as it is currently?

There is a patch on the table, let's review it.

Regards, Graham
Re: Possible new cache architecture
William A. Rowe, Jr. wrote:
> 1. This is a development list. If you don't want development
> discussions, don't subscribe.

I was referring to the flamebait; development discussions would obviously remain on the list.

Regards, Graham
Re: Possible new cache architecture
Brian Akins wrote:
> > Moving towards and keeping with the above goals is a far higher
> > priority than simplifying the generic backend cache interface.
>
> This response was a perfect summation of why we do *not* run the stock
> mod_cache here...

Having the source means you can customise and improve the code to better meet your needs, and in your case your modifications work for you, and your organisation has the resources to commission and maintain those modifications. The trouble is, in order to be accepted into httpd, your modifications have to work for everyone else as well.

Apparently, for example, the problem of trying to handle subkeys under a main key "is mod_http_cache's problem". Ok, so mod_http_cache now has to implement locking mechanisms to try and somehow turn the elegant (but overly simplistic) mod_cache into a cache that is practically useful. In the process we slow the cache down. The whole point of the cache is to speed things up. Suddenly, we lose the whole point of the exercise.

Regards, Graham
Re: Possible new cache architecture
Graham Leggett wrote:
> Seriously, please move this off list to keep the noise out of people's
> inboxes.

1. This is a development list. If you don't want development discussions, don't subscribe.

Bill
Re: Possible new cache architecture
Graham Leggett wrote:
> Seriously, please move this off list to keep the noise out of people's
> inboxes.

Does this discussion belong off-list? I would think this is the type of thing we need to discuss on this list. Is there any consensus as to how to move forward? Do we just leave it as it is currently?

-- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
Graham Leggett wrote:
> Moving towards and keeping with the above goals is a far higher
> priority than simplifying the generic backend cache interface.

This response was a perfect summation of why we do *not* run the stock mod_cache here...

-- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
On Wed, 3 May 2006 14:31:06 +0200 (SAST) "Graham Leggett" <[EMAIL PROTECTED]> wrote:
> On Wed, May 3, 2006 1:26 am, Davi Arnaut said:
>>> Then you will end up with code that does not meet the requirements
>>> of HTTP, and you will have wasted your time.
>>
>> Yeah, right! How ? Hey, you are using the Monty Python argument
>> style. Can you point to even one requirement of HTTP that
>> my_cache_provider won't meet ?
>
> Yes. Atomic insertions and deletions, the ability to update headers
> independently of body, etc etc, just go back and read the thread.

I can't argue with a zombie, you keep repeating the same misunderstandings.

> Seriously, please move this off list to keep the noise out of people's
> inboxes.

Fine, I give up.

-- Davi Arnaut
Re: Possible new cache architecture
On Wed, May 3, 2006 1:26 am, Davi Arnaut said:
>> Then you will end up with code that does not meet the requirements of
>> HTTP, and you will have wasted your time.
>
> Yeah, right! How ? Hey, you are using the Monty Python argument style.
> Can you point to even one requirement of HTTP that my_cache_provider
> won't meet ?

Yes. Atomic insertions and deletions, the ability to update headers independently of body, etc etc, just go back and read the thread.

Seriously, please move this off list to keep the noise out of people's inboxes.

Regards, Graham
Re: Possible new cache architecture
> -----Original Message-----
> From: Joe Orton
>
> The way I would expect it to work would be by passing f->next in to
> the store_body callback, it looks doomed to eat RAM as currently
> designed. mod_disk_cache's store_body implementation can then do:
>
> 1. read bucket(s) from brigade, appending to some temp brigade
> 2. write bucket(s) in temp brigade to cache file
> 3. pass temp brigade on to f->next
> 4. clear temp brigade to ensure memory is released
> 5. goto 1

Yes, this was also my idea, but I would like to avoid this, because:

1. This is an API change which might be hard to backport.
2. I do not really like the close tie between the storage provider and the filter chain. It forces the provider to do things it should not care about from my point of view.

Furthermore: what about mod_cache in this case? Do you want to skip ap_pass_brigade there, or do you want to clean up the original brigade inside store_body of mod_disk_cache and let mod_cache pass an empty brigade up the chain? If we decide to skip ap_pass_brigade inside mod_cache, all storage providers need to ensure that they pass the data up the chain, which seems like duplicated code to me and does not seem to belong to their core tasks. OTOH, doing this in mod_cache and only passing the small brigade to store_body of the provider has the drawback that mod_mem_cache wants to see the original file buckets in order to save the file descriptors of the files.

To be honest, currently I have no solution at hand that I really like, but I agree that this really needs to be changed.

Regards

Rüdiger
Re: Possible new cache architecture
On Tue, May 02, 2006 at 02:21:27PM +0200, Plüm, Rüdiger, VF EITO wrote:
> Another thing: I guess on systems with no mmap support the current
> implementation of mod_disk_cache will eat up a lot of memory if you
> cache a large local file, because it transforms the file bucket(s)
> into heap buckets in this case. Even if mmap is present I think that
> mod_disk_cache causes the file buckets to be transformed into many
> mmap buckets if the file is large. Thus we do not use sendfile in the
> case we cache the file.
> In the case that a brigade only contains file_buckets it might be
> possible to "copy" this brigade, send it up the chain and process the
> copy of the brigade for disk storage afterwards. Of course this opens
> a race if the file gets changed in between these operations.
> This approach does not work with socket or pipe buckets for obvious
> reasons. Even heap buckets seem to be a somewhat critical idea because
> of the added memory usage.

The way I would expect it to work would be by passing f->next in to the store_body callback, it looks doomed to eat RAM as currently designed. mod_disk_cache's store_body implementation can then do:

1. read bucket(s) from brigade, appending to some temp brigade
2. write bucket(s) in temp brigade to cache file
3. pass temp brigade on to f->next
4. clear temp brigade to ensure memory is released
5. goto 1

joe
Re: Possible new cache architecture
On Wed, 03 May 2006 01:09:03 +0200 Graham Leggett <[EMAIL PROTECTED]> wrote:
> Davi Arnaut wrote:
> > Graham, what I want is to be able to write a mod_cache backend
> > _without_ having to worry about HTTP.
>
> Then you will end up with code that does not meet the requirements of
> HTTP, and you will have wasted your time.

Yeah, right! How ? Hey, you are using the Monty Python argument style. Can you point to even one requirement of HTTP that my_cache_provider won't meet ?

> Please go through _all_ of the mod_cache architecture, and not just
> mod_disk_cache. Also read and understand HTTP/1.1 gateways and caches,
> and as you want to create a generic cache, read and understand
> mod_ldap, a module that will probably benefit from the availability of
> a generic cache. Then step back and see that mod_cache is a small part
> of a bigger picture. At this point you'll see that as nice as your
> idea of a simple generic cache interface is, it's not going to be the
> most elegant solution to the problem.

blah, blah.. you essentially said: "I don't want a simpler interface, I think the current mess is more elegant." I have shown you that I can even wrap your messy cache_provider hooks into a much simpler one, how can anything else be more elegant ?

-- Davi Arnaut
Re: Possible new cache architecture
Davi Arnaut wrote:
> Graham, what I want is to be able to write a mod_cache backend
> _without_ having to worry about HTTP.

Then you will end up with code that does not meet the requirements of HTTP, and you will have wasted your time.

Please go through _all_ of the mod_cache architecture, and not just mod_disk_cache. Also read and understand HTTP/1.1 gateways and caches, and as you want to create a generic cache, read and understand mod_ldap, a module that will probably benefit from the availability of a generic cache. Then step back and see that mod_cache is a small part of a bigger picture. At this point you'll see that as nice as your idea of a simple generic cache interface is, it's not going to be the most elegant solution to the problem.

Regards, Graham
Re: Possible new cache architecture
On Tue, 02 May 2006 23:31:13 +0200 Graham Leggett <[EMAIL PROTECTED]> wrote:
> Davi Arnaut wrote:
> >> The way HTTP caching works is a lot more complex than in your
> >> example, you haven't taken into account conditional HTTP requests.
> >
> > I've taken into account the actual mod_disk_cache code!
>
> mod_disk_cache doesn't contain any of the conditional HTTP request
> code, which is why you're not seeing it there.
>
> Please keep in mind that the existing mod_cache framework's goal is to
> be a fully HTTP/1.1 compliant, content generator neutral, efficient,
> error free and high performance cache.
>
> Moving towards and keeping with the above goals is a far higher
> priority than simplifying the generic backend cache interface.
>
> To sum up - the cache backend must fulfill the requirements of the
> cache frontend (generic or not), which in turn must fulfill the
> requirements of the users, who are browsers, web robot code, and
> humans. To try and prioritise this the other way round is putting the
> cart before the horse.

Graham, what I want is to be able to write a mod_cache backend _without_ having to worry about HTTP. _NOT_ to rewrite mod_disk/proxy/cache/whatever! You keep talking about HTTP this, HTTP that, I won't change the way it currently works. I just want to place a glue layer between the storage and the HTTP part. I could even wrap around your code (pseudocode; the fetch/store hooks take a key so the headers and body can live under separate keys):

    typedef struct {
        apr_status_t (*fetch) (cache_handle_t *h, const char *key,
                               apr_bucket_brigade *bb);
        apr_status_t (*store) (cache_handle_t *h, const char *key,
                               apr_bucket_brigade *bb);
        int (*remove) (const char *key);
    } my_cache_provider;

    typedef struct {
        const char *key_headers;
        const char *key_body;
    } my_cache_object;

    create_entity:
        my_cache_object *obj;
        obj->key_headers = hash_headers(request, whatever);
        obj->key_body = hash_body(request, whatever);

    open_entity:
        my_cache_object *obj;
        my_provider->fetch(h, obj->key_headers, header_brigade);
        /* if necessary, update obj->key_headers/body (vary..) */

    remove_url:
        my_provider->remove(obj->key_headers);
        my_provider->remove(obj->key_body);

    remove_entity:
        nop

    store_headers:
        my_cache_object *obj;
        /* if necessary, update obj->key_headers (vary..) */
        my_provider->store(h, obj->key_headers, header_brigade);

    store_body:
        my_cache_object *obj;
        my_provider->store(h, obj->key_body, body_brigade);

    recall_headers:
        my_cache_object *obj;
        my_provider->fetch(h, obj->key_headers, header_brigade);

    recall_body:
        my_cache_object *obj;
        my_provider->fetch(h, obj->key_body, body_brigade);

-- Davi Arnaut
Re: Possible new cache architecture
Davi Arnaut wrote:
> > The way HTTP caching works is a lot more complex than in your
> > example, you haven't taken into account conditional HTTP requests.
>
> I've taken into account the actual mod_disk_cache code!

mod_disk_cache doesn't contain any of the conditional HTTP request code, which is why you're not seeing it there.

Please keep in mind that the existing mod_cache framework's goal is to be a fully HTTP/1.1 compliant, content generator neutral, efficient, error free and high performance cache. Moving towards and keeping with the above goals is a far higher priority than simplifying the generic backend cache interface.

To sum up - the cache backend must fulfill the requirements of the cache frontend (generic or not), which in turn must fulfill the requirements of the users, who are browsers, web robot code, and humans. To try and prioritise this the other way round is putting the cart before the horse.

Regards, Graham
Re: Possible new cache architecture
On 5/2/06, Brian Akins <[EMAIL PROTECTED]> wrote:
> Gonzalo Arana wrote:
> > What problems have you seen with this approach? postfix uses this
> > architecture, for instance.
>
> Postfix implements SMTP, which is an asynchronous protocol.

And which problems might this approach bring?

> > Excuse my ignorance, what does "event mpm ... keep the balance very
> > good" mean?
>
> Not all your threads are tied up doing keepalives, for example.

Ah, I see (I was unfamiliar with the event MPM, sorry).

-- Gonzalo A. Arana
Re: Possible new cache architecture
Gonzalo Arana wrote:
> What problems have you seen with this approach? postfix uses this
> architecture, for instance.

Postfix implements SMTP, which is an asynchronous protocol.

> Excuse my ignorance, what does "event mpm ... keep the balance very
> good" mean?

Not all your threads are tied up doing keepalives, for example.

-- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
On 5/2/06, Brian Akins <[EMAIL PROTECTED]> wrote:
> Gonzalo Arana wrote:
> > A more suitable design for this task I think would be to make each
> > process have a special purpose: cache maintenance (purging expired
> > entries, purging entries to make room for new ones, creating new
> > entries, and so on), request processing (network/disk I/O, content
> > filtering, and so on), or whatever.
>
> In my experience, this always sounds good in theory, but just doesn't
> ever work in the real world. The event mpm is "sorta" a step in that
> direction, but seems to keep the balance pretty good.

What problems have you seen with this approach? postfix uses this architecture, for instance.

Excuse my ignorance, what does "event mpm ... keep the balance very good" mean?

-- Gonzalo A. Arana
Re: Possible new cache architecture
Gonzalo Arana wrote:
> A more suitable design for this task I think would be to make each
> process have a special purpose: cache maintenance (purging expired
> entries, purging entries to make room for new ones, creating new
> entries, and so on), request processing (network/disk I/O, content
> filtering, and so on), or whatever.

In my experience, this always sounds good in theory, but just doesn't ever work in the real world. The event mpm is "sorta" a step in that direction, but seems to keep the balance pretty good.

-- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
On Tue, 2 May 2006 17:22:00 +0200 (SAST) "Graham Leggett" <[EMAIL PROTECTED]> wrote:
> On Tue, May 2, 2006 7:06 pm, Davi Arnaut said:
> > There is not such scenario. I will simulate a request using the
> > disk_cache format:
>
> The way HTTP caching works is a lot more complex than in your example,
> you haven't taken into account conditional HTTP requests.

I've taken into account the actual mod_disk_cache code! Let me try to translate your typical scenario.

> A typical conditional scenario goes like this:
>
> - Browser asks for URL from httpd.

Same.

> - Mod_cache has a cached copy by looking up the headers BUT - it's
>   stale. mod_cache converts the browser's original request to a
>   conditional request by adding the header If-None-Match.

sed s/mod_cache/mod_http_cache

> - The backend server answers "no worries, what you have is still
>   fresh" by sending a "304 Not Modified".

sed s/mod_cache/mod_http_cache

> - mod_cache takes the headers from the 304, and replaces the headers
>   on the cached entry, in the process making the entry "fresh" again.

sed s/mod_cache/mod_http_cache

> - mod_cache hands the cached data back to the browser.

sed s/mod_cache/mod_http_cache

> Read http://www.ietf.org/rfc/rfc2616.txt section 13 (mainly) to see in
> detail how this works.

Again: we do not want to change the semantics, we only want to separate the HTTP specific part from the storage specific part. The HTTP specific parts of mod_disk_cache, mod_mem_cache and mod_cache are moved to a mod_http_cache, while retaining the storage specific parts. And mod_cache is the one who will combine those two layers. Again: it's the same thing as if we were replacing all mod_disk_cache file operations with hash table operations.

-- Davi Arnaut
Re: Possible new cache architecture
Seems to me that the thundering herd / performance degradation is inherent to the Apache design: all threads/processes are exact clones. A more suitable design for this task I think would be to make each process have a special purpose: cache maintenance (purging expired entries, purging entries to make room for new ones, creating new entries, and so on), request processing (network/disk I/O, content filtering, and so on), or whatever. This way, performance degradation caused by the cache mutex can be minimized. Request processors would only get queued/locked when querying the cache, which can be made a single operation if the cache is smart enough to figure out the right response from the original request, right? Regards, -- Gonzalo A. Arana
Re: Possible new cache architecture
On Tue, May 2, 2006 5:50 pm, Brian Akins said: > This seems more like a wish list. I just want to separate out the cache > and protocol stuff. HTTP compliance isn't a wish, it's a requirement. A patch that breaks compliance will end up being -1'ed. The thundering herd issues are also a requirement, as provision was made for it in the v2.0 design. The cache must deliver what the HTTP cache requires (which in turn delivers what users require), not the other way around. Separating the cache and the protocol has advantages, but it also has the disadvantage that fixing bugs like thundering herd may require interface changes, forcing people to have to wait for major version number changes before they see their problems fixed. In this scenario, the separation of cache and protocol is (very) nice to have, but not so nice that end users are disadvantaged. >> - The ability to amend a subkey (the headers) on an entry that is >> already >> cached. > > mod_http_cache should handle. to new mod_cache, it's just another > key/value. How does mod_http_cache do this without the need for locking (and thus performance degradation)? How does mod_cache guarantee that it won't expire the body without atomically expiring the headers with it? >> - The ability to invalidate a particular cached variant (ie headers + >> data) in one atomic step, without affecting threads that hold that >> cached >> entry open at the time. > > mod_http_cache should handle. Entry invalidation is definitely mod_cache's problem, it falls under cache size maintenance and expiry. Remember that mod_http_cache only runs when requests are present, entry invalidation has to happen whether there are requests present or not, via a separate thread, separate process, cron job, whatever. >> - The ability to read from a cached object that is still being written >> to. > > Nice to have. out of scope for what I am proposing. new mod_cache > should be the place to implement this if underlying provider supports it. 
It's not nice to have, no. It's a real problem that has inspired people to log bugs, and very recently, for one person to submit a patch. Regards, Graham --
Re: Possible new cache architecture
Graham Leggett wrote: To be HTTP compliant, and to solve thundering herd, we need the following from a cache: This seems more like a wish list. I just want to separate out the cache and protocol stuff. - The ability to amend a subkey (the headers) on an entry that is already cached. mod_http_cache should handle. to new mod_cache, it's just another key/value. - The ability to invalidate a particular cached variant (ie headers + data) in one atomic step, without affecting threads that hold that cached entry open at the time. mod_http_cache should handle. Keep a list of variants cached - this should use a provider interface as well. mod_cache would handle whatever locking, ref counting, etc, needs to be done, if any. - The ability to read from a cached object that is still being written to. Nice to have. out of scope for what I am proposing. new mod_cache should be the place to implement this if underlying provider supports it. - A guarantee that the result of a broken write (segfault, timeout, connection reset by peer, whatever) will not result in a broken cached entry (ie that the cached entry will eventually be invalidated, and all threads trying to read from it will eventually get an error). agreed. new mod_cache should handle this. Certainly separate the protocol from the physical cache, just make sure the physical cache delivers the shopping list above :) Most seem like protocol specific stuff. -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
On Tue, May 2, 2006 5:27 pm, Brian Akins said: > Still not sure how this is different from what we are proposing. we > really want to separate protocol from cache stuff. If we have a > "revalidate" for the generic cache it should address all your concerns. > ??? To be HTTP compliant, and to solve thundering herd, we need the following from a cache: - The ability to amend a subkey (the headers) on an entry that is already cached. - The ability to invalidate a particular cached variant (ie headers + data) in one atomic step, without affecting threads that hold that cached entry open at the time. - The ability to read from a cached object that is still being written to. - A guarantee that the result of a broken write (segfault, timeout, connection reset by peer, whatever) will not result in a broken cached entry (ie that the cached entry will eventually be invalidated, and all threads trying to read from it will eventually get an error). Certainly separate the protocol from the physical cache, just make sure the physical cache delivers the shopping list above :) Regards, Graham --
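Graham's four requirements can be expressed as a small provider interface. A sketch of one way to satisfy them in Python (everything here is hypothetical, not mod_cache's actual API): entries are immutable snapshots, so a reader that already holds one is unaffected when the variant's headers are amended or the variant is atomically invalidated.

```python
# Sketch of the "shopping list" as an interface: amend headers on an
# existing entry, invalidate a variant atomically, and let readers that
# hold an open entry keep reading it.  All names are invented.

class Entry:
    """Immutable snapshot of one cached variant (headers + body)."""
    def __init__(self, headers, body):
        self.headers = headers
        self.body = body


class VariantCache:
    def __init__(self):
        self._entries = {}  # variant key -> current Entry snapshot

    def store(self, key, headers, body):
        self._entries[key] = Entry(headers, body)

    def open(self, key):
        # Readers get a reference to the current snapshot.
        return self._entries.get(key)

    def amend_headers(self, key, headers):
        old = self._entries.get(key)
        if old is not None:
            # New snapshot: headers change, body is shared; the swap is
            # a single assignment, so readers never see a half-update.
            self._entries[key] = Entry(headers, old.body)

    def invalidate(self, key):
        # Removes the variant in one step; readers holding the old
        # snapshot keep it until they drop their reference.
        self._entries.pop(key, None)


cache = VariantCache()
cache.store("/foo#gzip", {"Etag": '"v1"'}, b"body-v1")
reader = cache.open("/foo#gzip")          # a request in flight
cache.amend_headers("/foo#gzip", {"Etag": '"v1"', "Age": "5"})
cache.invalidate("/foo#gzip")             # reader is unaffected
```

This relies on in-process reference semantics; an on-disk provider would need something like atomic rename to get the same guarantees, and the broken-write requirement would additionally need the entry to stay invisible until the write completes.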
Re: Possible new cache architecture
Graham Leggett wrote: The way HTTP caching works is a lot more complex than in your example, you haven't taken into account conditional HTTP requests. ... Still not sure how this is different from what we are proposing. we really want to separate protocol from cache stuff. If we have a "revalidate" for the generic cache it should address all your concerns. ??? -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
On Tue, May 2, 2006 7:06 pm, Davi Arnaut said: > There is not such scenario. I will simulate a request using the disk_cache > format: The way HTTP caching works is a lot more complex than in your example, you haven't taken into account conditional HTTP requests. A typical conditional scenario goes like this: - Browser asks for URL from httpd. - Mod_cache has a cached copy by looking up the headers BUT - it's stale. mod_cache converts the browser's original request to a conditional request by adding the header If-None-Match. - The backend server answers "no worries, what you have is still fresh" by sending a "304 Not Modified". - mod_cache takes the headers from the 304, and replaces the headers on the cached entry, in the process making the entry "fresh" again. - mod_cache hands the cached data back to the browser. Read http://www.ietf.org/rfc/rfc2616.txt section 13 (mainly) to see in detail how this works. Regards, Graham --
Re: Possible new cache architecture
On Tue, 2 May 2006 15:40:30 +0200 (SAST) "Graham Leggett" <[EMAIL PROTECTED]> wrote: > On Tue, May 2, 2006 3:24 pm, Brian Akins said: > > >> - the cache says "cool, will send my copy upstream. Oops, where has my > >> data gone?". > > > So, the cache says, okay must get content the old fashioned way (proxy, > > filesystem, magic fairies, etc.). > > > > Where's the issue? > > To rephrase it, a whole lot of extra code, which has to be written and > debugged, has to say "oops, ok sorry backend about the If-None-Match, I > thought I had it cached but I actually didn't, please can I have the full > file?". Then the backend gives you a response with different headers to > those you already delivered to the frontend. Oops. There is no such scenario. I will simulate a request using the disk_cache format: . Incoming client requests URI /foo/bar/baz . Request goes through mod_http_cache, Generate off of URI . mod_http_cache asks mod_cache for the data associated with key: .header . No data: . Fetch from upstream . Data Fetched: . If format #1 (Contains a list of Vary Headers): . Use each header name (from .header) with our request values (headers_in) to regenerate using HeaderName+ HeaderValue+URI . Ask mod_cache for data with key: .header . No data: . Fetch from upstream . Data: . Serve data to client . If format #2 . Serve data to client Where is the difference ? > Keeping the code as simple as possible will keep your code bug free, which > means less time debugging for you, and less time for end users trying to > figure out what the cause is of their weird symptoms. We are trying to keep it as simple as possible by separating the storage layer from the protocol layer. -- Davi Arnaut
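The two-step Vary lookup Davi walks through can be sketched as follows. This is illustrative only; the key-derivation scheme and the format #1 / format #2 entry shapes are invented to mirror the steps above, not mod_disk_cache's real on-disk format:

```python
# Sketch of the two-step Vary lookup: a format #1 entry lists the Vary
# header names; the variant key is regenerated from the request's values
# for those headers, and the format #2 entry holds the actual response.

def vary_key(uri, vary_names, request_headers):
    # Combine the request's values for the varied headers with the URI.
    parts = [request_headers.get(name, "") for name in sorted(vary_names)]
    return uri + "#" + "|".join(parts)

def lookup(store, uri, request_headers):
    entry = store.get(uri + ".header")
    if entry is None:
        return None                      # no data: fetch from upstream
    if "vary" in entry:                  # format #1: list of Vary headers
        key = vary_key(uri, entry["vary"], request_headers)
        entry = store.get(key + ".header")
        if entry is None:
            return None                  # this variant not cached yet
    return entry["response"]             # format #2: serve to client

store = {
    "/foo.header": {"vary": ["Accept-Encoding"]},
    "/foo#gzip.header": {"response": b"compressed body"},
}
```

Either lookup can miss, and the answer is the same in both cases: fetch from upstream. No revalidation race arises at this layer because the storage layer never decides freshness.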
Re: Possible new cache architecture
> -Ursprüngliche Nachricht- > Von: Niklas Edmundsson > > Correct. When caching a 4.3GB file on a 32bit arch it gets so > bad that > mmap eats all your address space and the thing segfaults. I initially > thought it was eating memory, but that's only if you have mmap > disabled. Ahh, good point. So I guess it's necessary to remove the mmap buckets from the brigade in the loop. Regards Rüdiger
Re: Possible new cache architecture
On Tue, May 2, 2006 3:24 pm, Brian Akins said: >> - the cache says "cool, will send my copy upstream. Oops, where has my >> data gone?". > So, the cache says, okay must get content the old fashioned way (proxy, > filesystem, magic fairies, etc.). > > Where's the issue? To rephrase it, a whole lot of extra code, which has to be written and debugged, has to say "oops, ok sorry backend about the If-None-Match, I thought I had it cached but I actually didn't, please can I have the full file?". Then the backend gives you a response with different headers to those you already delivered to the frontend. Oops. Keeping the code as simple as possible will keep your code bug free, which means less time debugging for you, and less time for end users trying to figure out what the cause is of their weird symptoms. Regards, Graham --
Re: Possible new cache architecture
On Tue, 2 May 2006 11:22:31 +0200 (MEST) Niklas Edmundsson <[EMAIL PROTECTED]> wrote: > On Mon, 1 May 2006, Davi Arnaut wrote: > > > More important, if we stick with the key/data concept it's possible to > > implement the header/body relationship under single or multiple keys. > > I've been hacking on mod_disk_cache to make it: > * Only store one set of data when one uncached item is accessed >simultaneously (currently all requests cache the file and the last >finished cache process "wins"). > * Don't wait until the whole item is cached, reply while caching >(currently it stalls). > * Don't block the requesting thread when requesting a large uncached >item, cache in the background and reply while caching (currently it >stalls). > > This is mostly aimed at serving huge static files from a slow disk > backend (typically an NFS export from a server holding all the disk), > such as http://ftp.acc.umu.se/ and http://ftp.heanet.ie/ . > > Doing this with the current mod_disk_cache disk layout was not > possible, doing the above without unnecessary locking means: > > * More or less atomic operations, so caching headers and data in >separate files gets very messy if you want to keep consistency. > * You can't use tempfiles since you want to be able to figure out >where the data is to be able to reply while caching. > * You want to know the size of the data in order to tell when you're >done (ie the current size of a file isn't necessarily the real size >of the body since it might be caching while we're reading it). > > In the light of our experiences, I really think that you want to have > a concept that allows you to keep the bond between header and data. > Yes, you can patch up a missing bond by requiring locking and stuff, but > I really prefer not having to lock cache files when doing read access. > When it comes to "make the common case fast" a lockless design is very > much preferred. 
I will repeat once again: there is no locking involved, unless your format of storing the header/data is really wrong. _The data format is up to the module using it_, while the storage backend is a completely different issue. > However, if all those issues are sorted out in the layer above disk > cache then the above observations becomes more or less moot. Yes, that's the point. > In any case the patch is more or less finished, independent testing > and auditing haven't been done yet but I can submit a preliminary > jumbo-patch if people are interested in having a look at it now. -- Davi Arnaut
Re: Possible new cache architecture
On Tue, 2 May 2006, Graham Leggett wrote: If it's: * Link to latest GNOME Live CD gets published on Slashdot. * A gazillion users click the link to download it. * mod_disk_cache starts a new instance of caching the file for each request, until someone has completed caching the file. Then this is the thundering herd problem :) OK :) Either a site is slashdotted (as in your case), or a cached entry expires, and suddenly the backend gets nailed until at least one request "wins", then we are back to normal serving from the cache. In your case, the "backend" is the disk, while in the bug from 1998, the backend was another webserver. Either way, same problem. OK. Then this patch solves the problem regardless of whether it's a static file or dynamically generated content since it only allows one instance to cache the file (OK, there's a small hole so there can be multiple instances but it's way smaller than now), all other instances deliver data as the caching process is writing it. Additionally, if it's a static file that's allowed to be cached in the background it solves: * Reduce chance of user getting bored since the data is delivered while being cached. * The user got bored and closed the connection so the painfully cached file gets deleted. Hmmm - thinking about this we try to cache the brigade (all X GB of it) first, then we try to write it to the network, thus the delay. Does your patch solve all of these already, or are they planned? It solves everything I've mentioned. The solution is probably not perfect for the not-static-file case since it falls back to the old behaviour of caching the whole file, but it should be a lot better than the current mod_disk_cache since the rest of the threads get reply-while-caching. There are issues here with the fact that the result is discarded if the connection is aborted, but I'm not familiar enough with apache filter internals to state that you can keep the result even though the connection is aborted. 
/Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Anything is edible if it's chopped finely enough =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Possible new cache architecture
Graham Leggett wrote: - the cache says "cool, will send my copy upstream. Oops, where has my data gone?". So, the cache says, okay must get content the old fashioned way (proxy, filesystem, magic fairies, etc.). Where's the issue? -- Brian Akins Lead Systems Engineer CNN Internet Technologies
Re: Possible new cache architecture
On Tue, 2 May 2006, Plüm, Rüdiger, VF EITO wrote: Another thing: I guess on systems with no mmap support the current implementation of mod_disk_cache will eat up a lot of memory if you cache a large local file, because it transforms the file bucket(s) into heap buckets in this case. Even if mmap is present I think that mod_disk_cache causes the file buckets to be transformed into many mmap buckets if the file is large. Thus we do not use sendfile in the case we cache the file. Correct. When caching a 4.3GB file on a 32bit arch it gets so bad that mmap eats all your address space and the thing segfaults. I initially thought it was eating memory, but that's only if you have mmap disabled. In the case that a brigade only contains file_buckets it might be possible to "copy" this brigade, send it up the chain and process the copy of the brigade for disk storage afterwards. Of course this opens a race if the file gets changed in between these operations. This approach does not work with socket or pipe buckets for obvious reasons. Even heap buckets seem to be a somewhat critical idea because of the added memory usage. I did the somewhat naive approach of only doing background caching when the buckets refer to a single sequential file. It's not perfect, but it solves the main case where you get a huge amount of data to store ... /Nikke - stumbled upon more than one bug when digging into mod_disk_cache -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Anything is edible if it's chopped finely enough =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Possible new cache architecture
On Tue, May 2, 2006 2:18 pm, Niklas Edmundsson said: > Exactly what is the thundering herd problem? I can guess the general > problem, but without a more precise definition I can't really say if > my patch fixes it or not. > > If it's: > * Link to latest GNOME Live CD gets published on Slashdot. > * A gazillion users click the link to download it. > * mod_disk_cache starts a new instance of caching the file for each >request, until someone has completed caching the file. Then this is the thundering herd problem :) Either a site is slashdotted (as in your case), or a cached entry expires, and suddenly the backend gets nailed until at least one request "wins", then we are back to normal serving from the cache. In your case, the "backend" is the disk, while in the bug from 1998, the backend was another webserver. Either way, same problem. > Then this patch solves the problem regardless of whether it's a static > file or dynamically generated content since it only allows one > instance to cache the file (OK, there's a small hole so there can be > multiple instances but it's way smaller than now), all other > instances deliver data as the caching process is writing it. > Additionally, if it's a static file that's allowed to be cached in > the background it solves: > * Reduce chance of user getting bored since the data is delivered >while being cached. > * The user got bored and closed the connection so the painfully cached >file gets deleted. Hmmm - thinking about this we try to cache the brigade (all X GB of it) first, then we try to write it to the network, thus the delay. Does your patch solve all of these already, or are they planned? Regards, Graham --
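The "only one instance caches the file" behaviour Niklas describes amounts to electing a single writer per cache entry. One classic lockless way to do that election on disk is an exclusive create: whoever creates the marker file wins and caches; everyone else reads while it writes. A sketch under invented file naming (this is not mod_disk_cache's real layout):

```python
# Sketch of single-writer election to avoid the thundering herd: an
# O_CREAT|O_EXCL open succeeds for exactly one caller per key, so
# concurrent requests for an uncached entry cause only one backend
# fetch.  The ".caching" marker name is hypothetical.

import os
import tempfile

cache_dir = tempfile.mkdtemp()

def try_become_cacher(key):
    path = os.path.join(cache_dir, key + ".caching")
    try:
        # O_EXCL makes create-if-absent atomic: one winner, no lock.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True          # we cache the entry; others read as we write
    except FileExistsError:
        return False         # someone else is already caching it

# Five simultaneous requests for the same uncached entry:
winners = [try_become_cacher("gnome-livecd") for _ in range(5)]
```

The small hole Niklas mentions would correspond here to crash handling: a dead winner leaves a stale marker, so a real implementation needs a staleness check or timeout on the marker file.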
Re: Possible new cache architecture
> -Ursprüngliche Nachricht- > Von: Graham Leggett > > The reason it does not work currently is that a local file > > usually is > > delivered in one brigade with, depending on the size of the > file, one or > > more > > file buckets. > > Hmmm - ok, this makes sense. > > Something I've never checked for, do output filters support > asynchronous > writes? I don't think so. Of course this would be a nice feature. Maybe somehow possible with Colm's ideas. Another thing: I guess on systems with no mmap support the current implementation of mod_disk_cache will eat up a lot of memory if you cache a large local file, because it transforms the file bucket(s) into heap buckets in this case. Even if mmap is present I think that mod_disk_cache causes the file buckets to be transformed into many mmap buckets if the file is large. Thus we do not use sendfile in the case we cache the file. In the case that a brigade only contains file_buckets it might be possible to "copy" this brigade, send it up the chain and process the copy of the brigade for disk storage afterwards. Of course this opens a race if the file gets changed in between these operations. This approach does not work with socket or pipe buckets for obvious reasons. Even heap buckets seem to be a somewhat critical idea because of the added memory usage. Regards Rüdiger
Re: Possible new cache architecture
On Tue, 2 May 2006, Graham Leggett wrote: This is great, in doing this you've been solving a proxy bug that was first reported in 1998 :). This already works in the case you get the data from the proxy backend. It does not work for local files that get cached (the scenario Niklas uses the cache for). Ok then I have misunderstood - I was referring to the thundering herd problem. Exactly what is the thundering herd problem? I can guess the general problem, but without a more precise definition I can't really say if my patch fixes it or not. If it's: * Link to latest GNOME Live CD gets published on Slashdot. * A gazillion users click the link to download it. * mod_disk_cache starts a new instance of caching the file for each request, until someone has completed caching the file. Then this patch solves the problem regardless of whether it's a static file or dynamically generated content since it only allows one instance to cache the file (OK, there's a small hole so there can be multiple instances but it's way smaller than now), all other instances deliver data as the caching process is writing it. Additionally, if it's a static file that's allowed to be cached in the background it solves: * Reduce chance of user getting bored since the data is delivered while being cached. * The user got bored and closed the connection so the painfully cached file gets deleted. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Illiterate? Write for information! =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Possible new cache architecture
On Tue, May 2, 2006 12:16 pm, Plüm, Rüdiger, VF EITO said: >> This is great, in doing this you've been solving a proxy bug that was >> first reported in 1998 :). > > This already works in the case you get the data from the proxy backend. It > does > not work for local files that get cached (the scenario Niklas uses the > cache > for). Ok then I have misunderstood - I was referring to the thundering herd problem. > The reason it does not work currently is that a local file > usually is > delivered in one brigade with, depending on the size of the file, one or > more > file buckets. Hmmm - ok, this makes sense. Something I've never checked for, do output filters support asynchronous writes? If they did, this might solve this problem - the write request would return immediately, allowing the read from file and write to cached file to continue while the write to network blocked. Regards, Graham --
Re: Possible new cache architecture
> -Ursprüngliche Nachricht- > Von: Graham Leggett > > * Don't block the requesting thread when requesting a large uncached > >item, cache in the background and reply while caching > (currently it > >stalls). > > This is great, in doing this you've been solving a proxy bug that was > first reported in 1998 :). This already works in the case you get the data from the proxy backend. It does not work for local files that get cached (the scenario Niklas uses the cache for). The reason it does not work currently is that a local file usually is delivered in one brigade with, depending on the size of the file, one or more file buckets. For Niklas' purposes Colm's ideas regarding the use of the new Linux system calls tee and splice will come in handy (http://mail-archives.apache.org/mod_mbox/apr-dev/200604.mbox/[EMAIL PROTECTED]) as they should speed up such things. Regards Rüdiger
Re: Possible new cache architecture
On Tue, May 2, 2006 11:22 am, Niklas Edmundsson said: > I've been hacking on mod_disk_cache to make it: > * Only store one set of data when one uncached item is accessed >simultaneously (currently all requests cache the file and the last >finished cache process "wins"). > * Don't wait until the whole item is cached, reply while caching >(currently it stalls). > * Don't block the requesting thread when requesting a large uncached >item, cache in the background and reply while caching (currently it >stalls). This is great, in doing this you've been solving a proxy bug that was first reported in 1998 :). The only things to be careful of is for Cache-Control: no-cache and friends to be handled gracefully (the partially cached file should be marked as "delete-me" so that the current request creates a new cache file / no cache file. Existing running downloads should be unaffected by this.), and for backend failures (either a timeout or a premature socket close) to cause the cache entry to be invalidated and deleted. > * More or less atomic operations, so caching headers and data in >separate files gets very messy if you want to keep consistency. Keep in mind that HTTP/1.1 compliance requires that the headers be updatable without changing the body. > * You can't use tempfiles since you want to be able to figure out >where the data is to be able to reply while caching. > * You want to know the size of the data in order to tell when you're >done (ie the current size of a file isn't necessarily the real size >of the body since it might be caching while we're reading it). The cache already wants to know the size of the data so that it can decide whether it's prepared to try and cache the file in the first place, so in theory this should not be a problem. > In any case the patch is more or less finished, independent testing > and auditing haven't been done yet but I can submit a preliminary > jumbo-patch if people are interested in having a look at it now. 
Post it, people can take a look. Regards, Graham --
Re: Possible new cache architecture
On Mon, 1 May 2006, Davi Arnaut wrote: More important, if we stick with the key/data concept it's possible to implement the header/body relationship under single or multiple keys. I've been hacking on mod_disk_cache to make it: * Only store one set of data when one uncached item is accessed simultaneously (currently all requests cache the file and the last finished cache process "wins"). * Don't wait until the whole item is cached, reply while caching (currently it stalls). * Don't block the requesting thread when requesting a large uncached item, cache in the background and reply while caching (currently it stalls). This is mostly aimed at serving huge static files from a slow disk backend (typically an NFS export from a server holding all the disk), such as http://ftp.acc.umu.se/ and http://ftp.heanet.ie/ . Doing this with the current mod_disk_cache disk layout was not possible, doing the above without unnecessary locking means: * More or less atomic operations, so caching headers and data in separate files gets very messy if you want to keep consistency. * You can't use tempfiles since you want to be able to figure out where the data is to be able to reply while caching. * You want to know the size of the data in order to tell when you're done (ie the current size of a file isn't necessarily the real size of the body since it might be caching while we're reading it). In the light of our experiences, I really think that you want to have a concept that allows you to keep the bond between header and data. Yes, you can patch up a missing bond by requiring locking and stuff, but I really prefer not having to lock cache files when doing read access. When it comes to "make the common case fast" a lockless design is very much preferred. However, if all those issues are sorted out in the layer above disk cache then the above observations becomes more or less moot. 
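Niklas' third point, knowing the size up front, is what makes lockless reply-while-caching possible: if the header records the declared body length, a reader can tell an in-progress entry apart from a complete one just by comparing file sizes, with no lock at all. A sketch under an invented on-disk format:

```python
# Sketch of reply-while-caching completeness detection: the declared
# length (from Content-Length at store time) is recorded separately,
# so the body file's current size, which lags while the caching thread
# is still appending, never has to be trusted as the real size.

import os
import tempfile

d = tempfile.mkdtemp()
body_path = os.path.join(d, "entry.body")

declared_len = 11          # recorded in the header file at store time

def entry_complete():
    # No locking: a reader just compares the growing file against the
    # declared length to know whether caching has finished.
    return os.path.getsize(body_path) >= declared_len

# The caching thread writes the body incrementally...
with open(body_path, "wb") as f:
    f.write(b"hello ")
    f.flush()
    partial = entry_complete()   # a concurrent reader sees "in progress"
    f.write(b"world")
complete = entry_complete()      # after the last write: "complete"
```

A reader that reaches end-of-file before `entry_complete()` is true simply waits for more data rather than concluding the body ended, which is exactly why tempfile-then-rename cannot be used here.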
In any case the patch is more or less finished, independent testing and auditing haven't been done yet but I can submit a preliminary jumbo-patch if people are interested in having a look at it now. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Want to forget all your troubles? Wear tight shoes. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: Possible new cache architecture
On Mon, 01 May 2006 22:46:44 +0200 Graham Leggett <[EMAIL PROTECTED]> wrote: > Brian Akins wrote: > > >> That's two hits to find whether something is cached. > > > > You must have two hits if you support vary. > > You need only one - bring up the original cached entry with the key, and > then use cheap subkeys over a very limited data set to find both the > variants and the header/data. > > >> How are races prevented? > > > > shouldn't be any. something is in the cache or not. if one "piece" of > > an http "object" is not valid or in cache, the object is invalid. > > Although other variants may be valid/in cache. > > I can think of one race off the top of my head: > > - the browser says "send me this URL". > > - the cache has it cached, but it's stale, so it asks the backend > "If-None-Match". > > - the cache reaper comes along, says "oh, this is stale", and reaps the > cached body (which is independent, remember?). The data is no longer > cached even though the headers still exist. > > - The backend says "304 Not Modified". > > - the cache says "cool, will send my copy upstream. Oops, where has my > data gone?". Sorry, but this only happens in your imagination. It's pretty obvious that mod_http_cache will handle this. > The end user will probably experience this as "oh, the website had a > glitch, let me try again", so it won't be reported as a bug. No. > Ok, so you tried to lock the body before going to the backend, but > searching for and locking the body would have been an additional wasted > cache hit if the backend answered with its own body. Not to mention > having to write and debug code to do this. Locks are not necessary, perhaps you are imagining something very different. If a data body disappears under mod_http_cache it is not a big deal! It will refuse to serve the request from the cache and a new version of the page will be cached. > Races need to be properly handled, and atomic cache operations will go a > long way to prevent them. 
I think we are discussing apples and oranges. First, we only want to *organize* the current cache code into a more layered solution. The current semantics won't change, yet! -- Davi Arnaut
Re: Possible new cache architecture
Graham Leggett wrote: Brian Akins wrote: That's two hits to find whether something is cached. You must have two hits if you support vary. You need only one - bring up the original cached entry with the key, and then use cheap subkeys over a very limited data set to find both the variants and the header/data. How are races prevented? shouldn't be any. something is in the cache or not. if one "piece" of an http "object" is not valid or in cache, the object is invalid. Although other variants may be valid/in cache. I can think of one race off the top of my head: - the browser says "send me this URL". - the cache has it cached, but it's stale, so it asks the backend "If-None-Match". - the cache reaper comes along, says "oh, this is stale", and reaps the cached body (which is independent, remember?). The data is no longer cached even though the headers still exist. - The backend says "304 Not Modified". - the cache says "cool, will send my copy upstream. Oops, where has my data gone?". I think that can be avoided by, instead of reaping the cached body, actually setting aside the cached body (public > private), by changing its key or whatnot. Then - throw it away after the backend says "200 OK", and replace it with something new. Or, rekey it a second time (private > public) when the backend reports "304 NOT MODIFIED". In the race, one will set it aside looking for another, the second will make a fresh request (it doesn't see it in the cache), and either the first or second request will wrap up -last- to place the final copy back into the cache, replacing the document from the winner. No harm no foul. Bill
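Bill's set-aside idea maps naturally onto atomic rename on a disk-backed store: instead of deleting the stale body before revalidating, move it out of the public namespace, then move it back on a 304 or discard it on a 200. A sketch with invented paths (not mod_disk_cache's real layout):

```python
# Sketch of the public -> private set-aside: rename is atomic on POSIX
# filesystems, so the body is never half-deleted during revalidation,
# and a 304 restores it with a second rename.

import os
import tempfile

d = tempfile.mkdtemp()
public = os.path.join(d, "entry.body")
private = os.path.join(d, "entry.body.setaside")

with open(public, "wb") as f:
    f.write(b"stale-but-maybe-fresh")

# Entry went stale: set the body aside before asking the backend.
os.rename(public, private)          # public -> private, atomic

backend_status = 304                # pretend the backend said Not Modified
if backend_status == 304:
    os.rename(private, public)      # rekey back: the entry is fresh again
else:
    os.remove(private)              # 200: discard, new response replaces it

restored = open(public, "rb").read()
```

During the window where the body is set aside, a concurrent request simply misses the cache and fetches fresh, which is Bill's "no harm no foul": whichever request finishes last leaves the final copy in place.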
Re: Possible new cache architecture
Brian Akins wrote:
> > That's two hits to find whether something is cached.
>
> You must have two hits if you support vary.

You need only one - bring up the original cached entry with the key, and then use cheap subkeys over a very limited data set to find both the variants and the header/data.

> > How are races prevented?
>
> shouldn't be any. something is in the cache or not. if one "piece" of
> an http "object" is not valid or in cache, the object is invalid.
> Although other variants may be valid/in cache.

I can think of one race off the top of my head:

- the browser says "send me this URL".

- the cache has it cached, but it's stale, so it asks the backend "If-None-Match".

- the cache reaper comes along, says "oh, this is stale", and reaps the cached body (which is independent, remember?). The data is no longer cached even though the headers still exist.

- The backend says "304 Not Modified".

- the cache says "cool, will send my copy upstream. Oops, where has my data gone?".

The end user will probably experience this as "oh, the website had a glitch, let me try again", so it won't be reported as a bug.

Ok, so you tried to lock the body before going to the backend, but searching for and locking the body would have been an additional wasted cache hit if the backend answered with its own body. Not to mention having to write and debug code to do this.

Races need to be properly handled, and atomic cache operations will go a long way to prevent them.

Regards,
Graham
Re: Possible new cache architecture
On Mon, 01 May 2006 15:46:58 -0400 Brian Akins <[EMAIL PROTECTED]> wrote:

> Graham Leggett wrote:
> > That's two hits to find whether something is cached.
>
> You must have two hits if you support vary.
>
> > How are races prevented?
>
> shouldn't be any. something is in the cache or not. if one "piece" of
> an http "object" is not valid or in cache, the object is invalid.
> Although other variants may be valid/in cache.

More important, if we stick with the key/data concept it's possible to implement the header/body relationship under single or multiple keys.

I think Brian wants mod_cache to be only a layer (glue) between the underlying providers and the cache users. Each set of problems is better dealt with in its own layer. The storage layer (cache providers) will only worry about storing the key/data pairs (and expiring?) while the "protocol" layer will deal with the underlying concepts of each protocol (mod_http_cache).

The current design leads to bloat; just look at mem_cache and disk_cache: both have their own duplicated quirks (serialize/unserialize, et cetera) and need special handling of the headers and file format. Under the new design this duplication will be gone, since we will assemble the HTTP-specific part and generalize the storage part.

--
Davi Arnaut
Re: Possible new cache architecture
William A. Rowe, Jr. wrote:
> And, of course, inserting the hit once it's composed is important, and
> can happen in parallel (3 clients looking for the same, and then
> fetching the same page from the origin). But it's harmless if the
> insertion is mutex protected, and the insertion can only happen once
> the page is fetched complete.

In the case of mod_disk_cache, the way I would do it is to have a deterministic tempfile, rather than use apr_tempfile, and open it EXCL.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
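The deterministic-tempfile idea above can be sketched in a few lines of C. The helper name and the path scheme are hypothetical, not mod_disk_cache's actual layout; the point is that every worker derives the *same* temp name from the cache key, so `O_CREAT|O_EXCL` guarantees only one worker at a time fills the entry:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch: all workers derive the same temp path from the cache key,
 * so O_CREAT|O_EXCL ensures only the first opener fills the entry.
 * Path scheme and function name are hypothetical. */
static int open_insert_slot(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0 && errno == EEXIST) {
        return -1;  /* another worker is already fetching this entry */
    }
    return fd;      /* we won; write the body, then rename() into place */
}
```

On success the winning worker would write the response body and `rename()` the file to its final key-derived name, which is atomic on POSIX filesystems, so readers never see a half-written entry.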
Re: Possible new cache architecture
Brian Akins wrote:
> Graham Leggett wrote:
> > That's two hits to find whether something is cached.
>
> You must have two hits if you support vary.

Well, one to three hits. One, if you use an arbitrary page (MRU or most frequently referenced would be most optimal, but it really doesn't matter) and then determine what varies, and if you are in the right place, or what that right place is (page by language, or whatever fields it varied by.) Three hits or more if your variant also varies ;)

> > How are races prevented?
>
> shouldn't be any. something is in the cache or not. if one "piece" of
> an http "object" is not valid or in cache, the object is invalid.
> Although other variants may be valid/in cache.

And, of course, inserting the hit once it's composed is important, and can happen in parallel (3 clients looking for the same, and then fetching the same page from the origin). But it's harmless if the insertion is mutex protected, and the insertion can only happen once the page is fetched complete.
Re: Possible new cache architecture
Graham Leggett wrote:
> That's two hits to find whether something is cached.

You must have two hits if you support vary.

> How are races prevented?

shouldn't be any. something is in the cache or not. if one "piece" of an http "object" is not valid or in cache, the object is invalid. Although other variants may be valid/in cache.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
Graham Leggett wrote:
> Or you can avoid this issue entirely by building a generic cache that
> works with key/subkey/data.

and then you have to find a way to bridge the gap between this interface and all the key/value caches that currently exist (memcache being the most popular example).

What if mod_http_cache had a way to "record" its cached objects? It could keep up with the relationships there. Basically, you have a provider that has a few functions that get called whenever mod_http_cache caches or expires an object.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
Brian Akins wrote:
> Nope. Look at the way the current http cache works. An http "object,"
> headers and data, is only valid if both headers and data are valid.

That's two hits to find whether something is cached.

How are races prevented?

Regards,
Graham
Re: Possible new cache architecture
Graham Leggett wrote:
> the independent caching of variants.

The example I posted should address this issue. I also have some ideas concerning the thundering herd problem; it's just a matter of whether you think it should be handled in cache or http_cache.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
Davi Arnaut wrote:
> > It's a design flaw to create problems that have to be specially coded
> > around, when you can avoid the problem entirely.
>
> Maybe I'm missing something, what problems do you foresee ?

There are lots of issues that were uncovered when I split the proxy and cache code for httpd v2.0.

A web cache requires two separately alterable cached entities (headers, body) just for caching a single variant. This pair of entities needs to expire and/or be forcibly expired (think Cache-Control no-cache) atomically. Sure, you can code and debug a lot of code to try and create the effect of atomically expiring multiple cache entries at once. Or you can avoid this issue entirely by building a generic cache that works with key/subkey/data.

There are a number of other issues that have been listed as bugs since httpd v1.3 that are still present, most notably the thundering herd problem, and the independent caching of variants.

There is no point in refactoring the cache code if the new code isn't going to be significantly better than the existing code.

Regards,
Graham
Re: Possible new cache architecture
Davi Arnaut wrote:
> This way it would be possible for one cache to act as a cache of
> another cache provider, mod_mem_cache would work as a small/fast MRU
> cache for mod_disk_cache.

Slightly off subject, but in my testing, mod_disk_cache is much faster than mod_mem_cache. Thanks to sendfile!

I was thinking about scenarios where each cache had its local cache (disk, mem, whatever) with memcache behind it. That way each "object" only has to be generated once for the entire "farm." This would be an easy way to have a distributed cache. Also, the squid-type htcp (or icp) could be a fallback for the local cache as well, without mucking up all the proxy and cache code.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
On Mon, 01 May 2006 09:02:31 -0400 Brian Akins <[EMAIL PROTECTED]> wrote:

> Here is a scenario. We will assume a cache "hit."

I think the usage scenario is clear. Moving on, I would like to be able to stack up the cache providers (like the apache filter chain). Basically, mod_cache will expose the functions:

add(key, value, expiration, flag)
get(key)
remove(key)

mod_cache will then pass the request (add/get or remove) down the chain, similar to the apache filter chain, ie:

apr_status_t mem_cache_get_filter(ap_cache_filter_t *f, apr_bucket_brigade *bb, ...);
apr_status_t disk_cache_get_filter(ap_cache_filter_t *f, apr_bucket_brigade *bb, ...);

This way it would be possible for one cache to act as a cache of another cache provider; mod_mem_cache would work as a small/fast MRU cache for mod_disk_cache.

--
Davi Arnaut
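The stacked-provider idea above can be sketched with a minimal vtable. The struct layout and the single-entry "storage" are purely illustrative, not the proposed httpd API; the point is that `get()` walks the chain from the fastest layer to the slowest, like a filter chain:

```c
#include <string.h>
#include <stddef.h>

/* Sketch of stacked cache providers: each layer answers get() or
 * defers to the next (slower) layer. A real provider would also have
 * add()/remove() and real storage; this toy layer holds at most one
 * key/value pair. All names are hypothetical. */
typedef struct cache_layer {
    const char *(*get)(struct cache_layer *self, const char *key);
    struct cache_layer *next;   /* next (slower) provider, or NULL */
    const char *ent_key;        /* toy single-entry storage */
    const char *ent_val;
} cache_layer;

static const char *toy_get(cache_layer *self, const char *key)
{
    if (self->ent_key && strcmp(self->ent_key, key) == 0)
        return self->ent_val;
    return NULL;
}

/* mod_cache's view: ask each layer in turn until one hits. */
static const char *chain_get(cache_layer *layer, const char *key)
{
    for (; layer; layer = layer->next) {
        const char *val = layer->get(layer, key);
        if (val)
            return val;
    }
    return NULL;   /* miss in every layer */
}
```

With a mem layer chained in front of a disk layer, `chain_get(&mem, key)` returns the disk copy on a memory miss; a fuller sketch would also promote that hit into the faster layer, which is what makes mod_mem_cache act as an MRU cache for mod_disk_cache.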
Re: Possible new cache architecture
On Mon, 01 May 2006 14:51:53 +0200 Graham Leggett <[EMAIL PROTECTED]> wrote:

> Davi Arnaut wrote:
> >> mod_cache need not be HTTP specific, it only needs the ability to cache
> >> multiple entities (data, headers) under the same key, and be able to
> >> replace zero or more entities independently of the other entities (think
> >> updating headers without updating content).
> >
> > mod_cache needs only to cache key/value pairs. The key/value format is up to
> > the mod_cache user.
>
> It's a design flaw to create problems that have to be specially coded
> around, when you can avoid the problem entirely.

Maybe I'm missing something, what problems do you foresee ?

> The cache needs to be generic, yes - but there is no need to stick to
> the "key/value" cliché of cache code, if a variation to this is going to
> make your life significantly easier.

And the variation is..?

--
Davi Arnaut
Re: Possible new cache architecture
Here is a scenario. We will assume a cache "hit."

Client asks for http://domain/uri.html?args

mod_http_cache generates a key: http-domain-uri.html-args-header, and asks mod_cache for the value with this key.

mod_cache fetches the value, looks at the expire time, it's good, and returns the "blob".

mod_http_cache examines the blob; it is vary information on Accept-Encoding.

mod_http_cache generates a new key: http-domain-uri.html-args-header-gzip (value from client), and asks mod_cache for the value with this key.

mod_cache fetches the value, looks at the expire time, it's good, and returns the "blob".

mod_http_cache examines the blob; it's a normal header blob. It does not "meet conditions", so we need to get the data.

mod_http_cache generates a new key: http-domain-uri.html-args-data-gzip (value from client), and asks mod_cache for the value with this key.

mod_cache fetches the value, looks at the expire time, it's good, and returns the "blob".

mod_http_cache returns headers and data to the client.

Notice there is a pattern to this...

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
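The pattern in the scenario above is just string concatenation over a fixed field order. A sketch of the key builder (the helper name and exact separator are hypothetical; only the scheme-host-uri-args-kind[-variant] pattern comes from the scenario):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of mod_http_cache key generation following the
 * scenario above: scheme-host-uri-args-kind[-variant]. */
static char *http_cache_key(char *buf, size_t len,
                            const char *scheme, const char *host,
                            const char *uri, const char *args,
                            const char *kind, const char *variant)
{
    if (variant)
        snprintf(buf, len, "%s-%s-%s-%s-%s-%s",
                 scheme, host, uri, args, kind, variant);
    else
        snprintf(buf, len, "%s-%s-%s-%s-%s",
                 scheme, host, uri, args, kind);
    return buf;
}
```

Since mod_cache only ever sees the finished string, the vary handling stays entirely inside mod_http_cache: a second lookup with the variant appended is indistinguishable, to the provider, from any other key.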
Re: Possible new cache architecture
Davi Arnaut wrote:
> mod_cache needs only to cache key/value pairs. The key/value format is
> up to the mod_cache user.

correct.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
Davi Arnaut wrote:
> > mod_cache need not be HTTP specific, it only needs the ability to cache
> > multiple entities (data, headers) under the same key, and be able to
> > replace zero or more entities independently of the other entities (think
> > updating headers without updating content).
>
> mod_cache needs only to cache key/value pairs. The key/value format is
> up to the mod_cache user.

It's a design flaw to create problems that have to be specially coded around, when you can avoid the problem entirely.

The cache needs to be generic, yes - but there is no need to stick to the "key/value" cliché of cache code, if a variation to this is going to make your life significantly easier.

Regards,
Graham
Re: Possible new cache architecture
Graham Leggett wrote:
> The potential danger with this is for race conditions to happen while
> expiring cache entries. If the data entity expired before the header
> entity, it potentially could confuse the cache - is the entry cached or
> not? The headers say yes, data says no.

Nope. Look at the way the current http cache works. An http "object," headers and data, is only valid if both headers and data are valid.

> Each variant should be an independent cached entry, the cache should
> allow different variants to be cached side by side.

Yes. Each is distinguished by its key.

> > As far as mod_cache is concerned these are 3 independent entries, but
> > mod_http_cache knows how to "stitch" them together.
> >
> > mod_cache should *not* be HTTP specific in any way.
>
> mod_cache need not be HTTP specific, it only needs the ability to cache
> multiple entities (data, headers) under the same key,

No.

> In other words, there must be the ability to cache by a key and a subkey.

No. mod_http_cache generates new keys for headers (key.header), data (key.data), and each variant (key1.header, key2.header, key1.data... etc.). As far as the underlying generic cache is concerned, they are all independent entries.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
On Sun, 30 Apr 2006 22:38:23 +0200 Graham Leggett <[EMAIL PROTECTED]> wrote:

> Brian Akins wrote:
> > mod_http_cache could just cache headers and data as separate cache entries.
>
> The potential danger with this is for race conditions to happen while
> expiring cache entries. If the data entity expired before the header
> entity, it potentially could confuse the cache - is the entry cached or
> not? The headers say yes, data says no.

If both the data and header have the same expiration time they should both be removed atomically, but this would be hard to achieve. The trick is to set the header to expire before the data. Also, this would confuse the cache user, not the cache itself.

> > So a given HTTP "object" may actually have 3 entries in the cache:
> > -first entry says: Vary on x,y,z
> > -second entry is headers for new key (generated with the vary info)
> > -third entry is the actual data
>
> Each variant should be an independent cached entry, the cache should
> allow different variants to be cached side by side.
>
> > As far as mod_cache is concerned these are 3 independent entries, but
> > mod_http_cache knows how to "stitch" them together.
> >
> > mod_cache should *not* be HTTP specific in any way.
>
> mod_cache need not be HTTP specific, it only needs the ability to cache
> multiple entities (data, headers) under the same key, and be able to
> replace zero or more entities independently of the other entities (think
> updating headers without updating content).

mod_cache needs only to cache key/value pairs. The key/value format is up to the mod_cache user.

--
Davi Arnaut
Re: Possible new cache architecture
Brian Akins wrote:
> mod_http_cache could just cache headers and data as separate cache entries.

The potential danger with this is for race conditions to happen while expiring cache entries. If the data entity expired before the header entity, it potentially could confuse the cache - is the entry cached or not? The headers say yes, data says no.

> So a given HTTP "object" may actually have 3 entries in the cache:
> -first entry says: Vary on x,y,z
> -second entry is headers for new key (generated with the vary info)
> -third entry is the actual data

Each variant should be an independent cached entry, the cache should allow different variants to be cached side by side.

> As far as mod_cache is concerned these are 3 independent entries, but
> mod_http_cache knows how to "stitch" them together.
>
> mod_cache should *not* be HTTP specific in any way.

mod_cache need not be HTTP specific, it only needs the ability to cache multiple entities (data, headers) under the same key, and be able to replace zero or more entities independently of the other entities (think updating headers without updating content). In other words, there must be the ability to cache by a key and a subkey.

Regards,
Graham
Re: Possible new cache architecture
Graham Leggett wrote:
> A question to ponder is just how generic should the cache be. An HTTP
> cache requires cache entries containing data and headers, either of
> which can be updated separately.

So any given HTTP "object" would actually be two objects in the cache: headers and data.

> As a result, the typical "cache a blob of data" interface isn't going
> to work, and needs to be kept in mind when looking at the cache
> interfaces.

mod_http_cache could just cache headers and data as separate cache entries.

So a given HTTP "object" may actually have 3 entries in the cache:
-first entry says: Vary on x,y,z
-second entry is headers for new key (generated with the vary info)
-third entry is the actual data

As far as mod_cache is concerned these are 3 independent entries, but mod_http_cache knows how to "stitch" them together.

mod_cache should *not* be HTTP specific in any way.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
Brian Akins wrote:
> The components:
>
> mod_http_cache: what mod_cache is currently
> mod_cache: a generic caching module - provides glue between providers
> and other modules. Think mod_dbd...
> cache providers: disk, mem, memcache, mysql, etc.

This sounds like a refactoring job, which is a good idea.

I think step one would be to rename mod_cache to be mod_http_cache as you suggest, then create a blank mod_cache, followed by some refactoring of the generalised methods and cache hooks into mod_cache. I think this exercise should uncover what needs to move, and what needs to be changed.

A question to ponder is just how generic should the cache be. An HTTP cache requires cache entries containing data and headers, either of which can be updated separately. As a result, the typical "cache a blob of data" interface isn't going to work, and needs to be kept in mind when looking at the cache interfaces.

Regards,
Graham
Re: Possible new cache architecture
Brian Akins wrote:
> Some functions a provider should provide:
>
> init(args...) - initialize an instance :)
> open(instance, key) - open a cache object
> read_buffer(object, buffer, copy) - read entire object into buffer.
> buffer may be read only (ie, it may be mmapped or part of sql statement)
> or make it a copy.
> read_bb(object, brigade, copy) - read object into a brigade. copy if
> flag is set
> store_bb(object, brigade) - store a bucket brigade
> store_buffer(object, buffer) - store a blob of data
> close(object)
>
> Thoughts? I'm sure we may need more/better cache provider functions.

It would be helpful if the provider can notify mod_cache (using some sort of callback function) when it is removing an object from its cache, so that mod_cache can take a look at the object being removed and decide to push it to the next, less resource-critical provider. So if mem_cache_provider decides to remove the LRU object, mod_cache can push it to disk_cache_provider.
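The notify-on-eviction idea above could look roughly like this: the provider invokes a callback registered by mod_cache before discarding an entry, and mod_cache may then re-store it through a cheaper provider. All names here are hypothetical, and the "provider" holds just one entry for illustration:

```c
#include <stddef.h>

/* Sketch of an eviction callback: the provider tells mod_cache what it
 * is about to drop, so mod_cache can demote the entry to the next,
 * less resource-critical provider. All names are hypothetical. */
typedef void (*cache_evict_cb)(const char *key, const void *data,
                               size_t len, void *ctx);

typedef struct toy_provider {
    const char *key;          /* toy single-entry storage */
    const void *data;
    size_t len;
    cache_evict_cb on_evict;  /* registered by mod_cache */
    void *evict_ctx;
} toy_provider;

/* Drop the current entry, notifying mod_cache first. */
static void toy_evict(toy_provider *p)
{
    if (p->key && p->on_evict)
        p->on_evict(p->key, p->data, p->len, p->evict_ctx);
    p->key = NULL;
    p->data = NULL;
    p->len = 0;
}

static int demoted = 0;

/* Stand-in for mod_cache pushing the entry to disk_cache_provider. */
static void demote_to_disk(const char *key, const void *data,
                           size_t len, void *ctx)
{
    (void)key; (void)data; (void)len; (void)ctx;
    demoted = 1;
}
```

So when mem_cache_provider evicts its LRU object, the callback fires with the key and data still intact, and mod_cache can hand them to the disk provider before the memory is released.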
Re: Possible new cache architecture
On Thursday 27 April 2006 15:04, Brian Akins wrote:

> The components:

How would this fit with the various half-HTTP caching standards floating around, and the SoC projects that have been mooted? It seems to me that cache is ripe for generalisation.

> mod_http_cache: what mod_cache is currently
> mod_cache: a generic caching module - provides glue between providers
> and other modules. Think mod_dbd...

... or mod_proxy ...

--
Nick Kew
Re: Possible new cache architecture
Bart van der Schans wrote:
> One thing about the current implementation. Mod_cache does server side
> caching, but also sets expires headers which trigger client (browser)
> caching. Right now you can't turn off setting the expires header with
> mod_cache. I think it would be nice to have an option to configure
> this. WDYT?

Yes, that should be configurable.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
Brian Akins wrote:
> The components:
> mod_http_cache: what mod_cache is currently

One thing about the current implementation: mod_cache does server side caching, but also sets expires headers which trigger client (browser) caching. Right now you can't turn off setting the expires header with mod_cache. I think it would be nice to have an option to configure this. WDYT?

Bart

--
Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel +31 (0)20 5224466 - [EMAIL PROTECTED] / http://www.hippo.nl
Re: Possible new cache architecture
Brian Akins wrote:
> mod_cache: a generic caching module - provides glue between providers

The more I think about it, this part doesn't even need to be httpd specific. It could be apr_cache. Not sure how that would screw things up. I also noticed that the whole providers thing is httpd and not apr...

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Possible new cache architecture
The components:

mod_http_cache: what mod_cache is currently
mod_cache: a generic caching module - provides glue between providers and other modules. Think mod_dbd...
cache providers: disk, mem, memcache, mysql, etc.

An example mod_http_cache:

-generate cache key
-ask mod_cache for object with key
-mod_cache checks provider(s) and returns object on "hit"
-object may contain vary info, regenerate key and ask mod_cache with new key (this would be equivalent to header)
-ask mod_cache for the body
-serve data to client

This would remove all the HTTP specific stuff from the cache providers, and Vary could be handled in a central location (mod_http_cache). And it *should* be fairly trivial to write and stack cache providers.

Some functions a provider should provide:

init(args...) - initialize an instance :)
open(instance, key) - open a cache object
read_buffer(object, buffer, copy) - read entire object into buffer. buffer may be read only (ie, it may be mmapped or part of sql statement) or make it a copy.
read_bb(object, brigade, copy) - read object into a brigade. copy if flag is set
store_bb(object, brigade) - store a bucket brigade
store_buffer(object, buffer) - store a blob of data
close(object)

Thoughts? I'm sure we may need more/better cache provider functions.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
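One way to picture the provider list above is as a set of functions around an opaque object handle. Here is a toy in-memory version; the signatures are simplified and hypothetical (a real provider would return apr_status_t, take an instance from init(), and have brigade variants), but the open/store/read/close lifecycle matches the list:

```c
#include <stdlib.h>
#include <string.h>

/* Toy sketch of the provider interface: open() yields an object handle
 * for a key, store_buffer()/read_buffer() move blobs in and out,
 * close() releases the handle. Names and signatures are hypothetical. */
typedef struct cache_object {
    char *data;
    size_t len;
} cache_object;

static cache_object *cache_open(const char *key)
{
    (void)key;                 /* toy: the key is not actually indexed */
    return calloc(1, sizeof(cache_object));
}

static int cache_store_buffer(cache_object *obj, const void *buf, size_t len)
{
    obj->data = malloc(len);
    if (!obj->data)
        return -1;
    memcpy(obj->data, buf, len);
    obj->len = len;
    return 0;
}

static int cache_read_buffer(cache_object *obj, const void **buf, size_t *len)
{
    *buf = obj->data;          /* read-only view, no copy */
    *len = obj->len;
    return 0;
}

static void cache_close(cache_object *obj)
{
    free(obj->data);
    free(obj);
}
```

The read-only (no-copy) path is the interesting design point: a disk provider could hand back an mmapped region and a SQL provider a pointer into its result set, with the `copy` flag in the proposed interface deciding whether the caller gets a private duplicate instead.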