Re: mod_cache and its ilk

2006-10-30 Thread Graham Leggett

Justin Erenkrantz wrote:


+1.  I would prefer that all bells and whistles be kept out of
mod_disk_cache and we provide another alternative that has more
complex behaviors that scales a little better for those who care.
(That was the whole reason behind a versioned provider interface to
begin with!)


In that case I suggest we revert mod_disk_cache to how it was before 
the large-file patches (but after the sets of bugfixes), and move the 
current modified mod_disk_cache to mod_large_disk_cache for further 
review?



1) cannot write entire response to disk for any content type before
sending anything to the client; filter acts by writing to cache and
client synchronously


My concern with this is we should be careful not to teach the
providers about the fact that it is sitting in an output filter chain.

In other words, I would prefer that we create an abstraction that does
not force the providers to have to worry about passing brigades down a
filter chain or dealing with filter errors.  It just needs to stash
some data away and that's all I want the provider to be concerned
with.

Perhaps we provide an optional continuation function to the
store_body() analogue that the provider can call after it writes a
bucket which mod_cache can then pass along down the filter chain on
its behalf.  Otherwise, we'd have way too much duplicate code to deal
with and that concerns me.

The implementation that was recently checked in means that we cannot
distinguish between a failed cache storage and a filter error.  I'd
like to preserve the ability to tell the difference, so that a failed
cache storage does not affect the client.  See my add'l item #5 below.


My original solution for this was, inside mod_cache, to loop through the 
brigade, and to split buckets larger than a threshold before giving the 
buckets to cache_body() and the network in turn.


cache_body() would then be guaranteed that buckets would never exceed a 
certain size.


The certain size was configurable via a directive.

At the time I did not know that the apr_bucket_read() function did 
exactly this anyway - this behaviour is undocumented in doxygen - and 
apparently 4MB is the maximum size that apr_bucket_read() will return.


Based on this, we could remove the directive and default it to 4MB (or 
the same constant used in apr_bucket_read()), and use apr_bucket_split.


This has the advantage of keeping the API clean as before, and it 
ensures that cache providers don't have to care about large buckets.


The cache also doesn't have to care how 4.7GB of buckets are stored 
temporarily between write-to-cache and write-to-network, as the split 
buckets are deleted after each iteration.



2) avoids writing a given URL/variant to the cache more than once
simultaneously using open/O_EXCL


There's problems with relying upon O_EXCL.  mod_disk_cache
purposefully lets the race condition happen as without inter-process
sync, it's not really easy to know who is actually around and is
likely to finish.  So, mod_disk_cache lets the last guy to succeed
'win'.  I think this leads to a far simpler design within
mod_disk_cache (it trades writing duplicates of the cached entity for
a higher chance of success), but I'd be curious to see if you can come
up with a scalable design using O_EXCL.  I tried before and never came
up with a clean or scalable design as the solutions with O_EXCL placed
too much faith in the fact that one selected client will actually
finish.  Although, at the end of the day, I'm willing to sacrifice
that (and let another provider solve it), but I'd like to at least
explore solutions that don't.


I'm not convinced the original last-guy-wins design was a good idea. On 
sites with high load, it meant many simultaneously cached copies would 
race to complete, consuming number_of_attempts * size of disk space 
unnecessarily and causing a potentially large load spike to the backend.


People with high-load sites cite this as a key reason not to use the 
cache, as the resulting load spike to their backends cannot be sustained.


The large_disk_cache has made an attempt to address this issue; where 
the above is a problem, large_disk_cache can step in.



3) for the case where multiple simultaneous requests are made for an
uncached but cacheable entity, will simply pass-through all requests for
which the open/O_EXCL fails and does not attempt to read from an
as-yet-incomplete cache file


+1 modulo if you can get a clean design per #2 above.

I just noticed that someone introduced an "open_header_timeout" which
incurs a penalty when a partial file is present (!).  So, let me add
the following constraint derived from #3:

4) No penalties are applied to the client when we do not have a
complete cached entity on hand.

Sleeping is never an acceptable solution.  If we don't have it, it's
okay to serve directly and avoid the cache.  mod_disk_cache takes the
opportunity to try to recache, but avoiding the cache is okay too.
Both are acceptable solutions, IMO; sleeping is not.

Re: mod_cache and its ilk

2006-10-30 Thread Justin Erenkrantz

On 10/30/06, Joe Orton <[EMAIL PROTECTED]> wrote:

On Mon, Oct 30, 2006 at 10:26:18AM -0700, Justin Erenkrantz wrote:
> My concern with this is we should be careful not to teach the
> providers about the fact that it is sitting in an output filter chain.

This is solvable if desired, but I'd like to address it separately to
keep this thread under control.


Sure.


> >2) avoids writing a given URL/variant to the cache more than once
> >simultaneously using open/O_EXCL
>
> There's problems with relying upon O_EXCL.  mod_disk_cache
> purposefully lets the race condition happen as without inter-process
> sync, it's not really easy to know who is actually around and is
> likely to finish.

Are you talking issues with O_EXCL not being reliable on e.g. NFS, or
just about the fact that the partial file left by a dead/hung process
subsequently prevents any chance to write to that cache file?


The latter.  (NFS is a lower priority - I'd like to see it work, but
if it can't no biggie, IMO.)


This is a tricky trade-off; I can go either way.  I would say it would
be OK to rely on htcacheclean having heuristics to expire (prematurely
if necessary) such partial files.


Hmm.  Maybe, but that's a bit more complex than I'd like to see.
*shrug*  -- justin


Re: mod_cache and its ilk

2006-10-30 Thread Joe Orton
On Mon, Oct 30, 2006 at 10:26:18AM -0700, Justin Erenkrantz wrote:
> On 10/30/06, Joe Orton <[EMAIL PROTECTED]> wrote:
> >1) cannot write entire response to disk for any content type before
> >sending anything to the client; filter acts by writing to cache and
> >client synchronously
> 
> My concern with this is we should be careful not to teach the
> providers about the fact that it is sitting in an output filter chain.

This is solvable if desired, but I'd like to address it separately to 
keep this thread under control.

> >2) avoids writing a given URL/variant to the cache more than once
> >simultaneously using open/O_EXCL
> 
> There's problems with relying upon O_EXCL.  mod_disk_cache
> purposefully lets the race condition happen as without inter-process
> sync, it's not really easy to know who is actually around and is
> likely to finish.

Are you talking issues with O_EXCL not being reliable on e.g. NFS, or 
just about the fact that the partial file left by a dead/hung process 
subsequently prevents any chance to write to that cache file?

This is a tricky trade-off; I can go either way.  I would say it would 
be OK to rely on htcacheclean having heuristics to expire (prematurely 
if necessary) such partial files.

> 4) No penalties are applied to the client when we do not have a
> complete cached entity on hand.

+1

> 5) If writing to the cache fails, the client doesn't need to know
> that, but the admin should get a nice note somewhere.

+1

joe


Re: mod_cache and its ilk

2006-10-30 Thread Justin Erenkrantz

On 10/30/06, Joe Orton <[EMAIL PROTECTED]> wrote:

A simple general-purpose disk cache which makes no assumptions about
speed of backend, speed of storage or speed of clients; is
single-threaded and does not involve any multi-process synchronisation
beyond open/O_EXCL.  Specifically:


+1.  I would prefer that all bells and whistles be kept out of
mod_disk_cache and we provide another alternative that has more
complex behaviors that scales a little better for those who care.
(That was the whole reason behind a versioned provider interface to
begin with!)


1) cannot write entire response to disk for any content type before
sending anything to the client; filter acts by writing to cache and
client synchronously


My concern with this is we should be careful not to teach the
providers about the fact that it is sitting in an output filter chain.

In other words, I would prefer that we create an abstraction that does
not force the providers to have to worry about passing brigades down a
filter chain or dealing with filter errors.  It just needs to stash
some data away and that's all I want the provider to be concerned
with.

Perhaps we provide an optional continuation function to the
store_body() analogue that the provider can call after it writes a
bucket which mod_cache can then pass along down the filter chain on
its behalf.  Otherwise, we'd have way too much duplicate code to deal
with and that concerns me.

The implementation that was recently checked in means that we cannot
distinguish between a failed cache storage and a filter error.  I'd
like to preserve the ability to tell the difference, so that a failed
cache storage does not affect the client.  See my add'l item #5 below.


2) avoids writing a given URL/variant to the cache more than once
simultaneously using open/O_EXCL


There's problems with relying upon O_EXCL.  mod_disk_cache
purposefully lets the race condition happen as without inter-process
sync, it's not really easy to know who is actually around and is
likely to finish.  So, mod_disk_cache lets the last guy to succeed
'win'.  I think this leads to a far simpler design within
mod_disk_cache (it trades writing duplicates of the cached entity for
a higher chance of success), but I'd be curious to see if you can come
up with a scalable design using O_EXCL.  I tried before and never came
up with a clean or scalable design as the solutions with O_EXCL placed
too much faith in the fact that one selected client will actually
finish.  Although, at the end of the day, I'm willing to sacrifice
that (and let another provider solve it), but I'd like to at least
explore solutions that don't.


3) for the case where multiple simultaneous requests are made for an
uncached but cacheable entity, will simply pass-through all requests for
which the open/O_EXCL fails and does not attempt to read from an
as-yet-incomplete cache file


+1 modulo if you can get a clean design per #2 above.

I just noticed that someone introduced an "open_header_timeout" which
incurs a penalty when a partial file is present (!).  So, let me add
the following constraint derived from #3:

4) No penalties are applied to the client when we do not have a
complete cached entity on hand.

Sleeping is never an acceptable solution.  If we don't have it, it's
okay to serve directly and avoid the cache.  mod_disk_cache takes the
opportunity to try to recache, but avoiding the cache is okay too.
Both are acceptable solutions, IMO; sleeping is not.

And, derived from the issue with the failure scenario:

5) If writing to the cache fails, the client doesn't need to know
that, but the admin should get a nice note somewhere.

On the surface, this speaks against the storage operation being a
filter in the traditional sense.


A disk cache which makes different assumptions about speed of backend or
uses more complex caching tricks should be in the tree as a different
provider in a module with a different name; the exact assumptions made
need to be well documented.  (Maybe something like
"mod_local_disk_cache" would be appropriate for the name?)

Needless to say, code making bogus assumptions about the output
filtering interface has no place in the tree and fails the "correctness"
test.


+1.  -- justin


Re: mod_cache and its ilk

2006-10-30 Thread Graham Leggett
On Mon, October 30, 2006 1:03 pm, Joe Orton wrote:

>> On Mon, October 30, 2006 12:07 pm, Joe Orton wrote:
>> > 1) cannot write entire response to disk for any content type before
>> > sending anything to the client; filter acts by writing to cache and
>> > client synchronously
>>
>> Justin vetoed this.
>
> Can we limit this thread to just gaining consensus on the goals?
>
> Justin vetoed a change which implemented a specific fix for a memory
> consumption issue in the cache; there are any number of different ways
> to fix that issue which are not contrary to the above goal.

Ok.

Regards,
Graham
--




Re: mod_cache and its ilk

2006-10-30 Thread Joe Orton
On Mon, Oct 30, 2006 at 12:18:30PM +0200, Graham Leggett wrote:
> On Mon, October 30, 2006 12:07 pm, Joe Orton wrote:
> > 1) cannot write entire response to disk for any content type before
> > sending anything to the client; filter acts by writing to cache and
> > client synchronously
> 
> Justin vetoed this.

Can we limit this thread to just gaining consensus on the goals?

Justin vetoed a change which implemented a specific fix for a memory 
consumption issue in the cache; there are any number of different ways 
to fix that issue which are not contrary to the above goal.

joe


Re: mod_cache and its ilk

2006-10-30 Thread Graham Leggett
On Mon, October 30, 2006 12:07 pm, Joe Orton wrote:

> Thanks Roy.  So, the goals for mod_disk_cache as I see it:
>
> A simple general-purpose disk cache which makes no assumptions about
> speed of backend, speed of storage or speed of clients; is
> single-threaded and does not involve any multi-process synchronisation
> beyond open/O_EXCL.  Specifically:

So, as has been proposed before, I suggest we have mod_disk_cache
implement the cache as described by Joe here, along with fixes to ensure
that it works (it currently doesn't).

Then, we have mod_large_disk_cache that does the extra stuff contributed
by Niklas Edmundsson. The end user can then make an intelligent choice as
to the best cache that fits their needs.

> 1) cannot write entire response to disk for any content type before
> sending anything to the client; filter acts by writing to cache and
> client synchronously

Justin vetoed this.

The mod_cache itself should be responsible for breaking any large buckets
into bite size chunks, to prevent the cache module from having to care
about large responses. This maintains the abstraction between
write-to-cache and write-to-network.

> A disk cache which makes different assumptions about speed of backend or
> uses more complex caching tricks should be in the tree as a different
> provider in a module with a different name; the exact assumptions made
> need to be well documented.  (Maybe something like
> "mod_local_disk_cache" would be appropriate for the name?)

+1 - I suggested mod_large_disk_cache, as it is tailored for large
responses, but I'm not attached to any specific name.

> Needless to say, code making bogus assumptions about the output
> filtering interface has no place in the tree and fails the "correctness"
> test.

Obviously, thus the notify solution.

Regards,
Graham
--




Re: mod_cache and its ilk

2006-10-30 Thread Joe Orton
Thanks Roy.  So, the goals for mod_disk_cache as I see it:

A simple general-purpose disk cache which makes no assumptions about 
speed of backend, speed of storage or speed of clients; is 
single-threaded and does not involve any multi-process synchronisation 
beyond open/O_EXCL.  Specifically:

1) cannot write entire response to disk for any content type before 
sending anything to the client; filter acts by writing to cache and 
client synchronously

2) avoids writing a given URL/variant to the cache more than once 
simultaneously using open/O_EXCL

3) for the case where multiple simultaneous requests are made for an 
uncached but cacheable entity, will simply pass-through all requests for 
which the open/O_EXCL fails and does not attempt to read from an 
as-yet-incomplete cache file

A disk cache which makes different assumptions about speed of backend or 
uses more complex caching tricks should be in the tree as a different 
provider in a module with a different name; the exact assumptions made 
need to be well documented.  (Maybe something like 
"mod_local_disk_cache" would be appropriate for the name?)

Needless to say, code making bogus assumptions about the output 
filtering interface has no place in the tree and fails the "correctness" 
test.

Regards,

joe


Re: mod_cache and its ilk

2006-10-29 Thread Graham Leggett

Roy T. Fielding wrote:


As far as *I* am concerned, changes to the cache code must be correct
first and then perform second, and both of those should be proven by
actual testing before being committed to trunk.


+1.

We have an existing cache that breaks in real world environments.

We have a contributed patch set from Niklas Edmundsson that addresses 
these issues, and is used in production. It works. A significant amount 
of work has been done to ensure that after each patch was committed, the 
code was tested and still worked.


It works for me, very well.

We have some very valid objections to some of the methods used in this 
patch set, and based on these objections a major part of one patch was 
rewritten, and the last patch in the set was never committed.


We also have some very clear things that the patch is not allowed to do 
- including but not limited to threading and forking.


In response to the above, the following has been identified:

- APR needs significantly improved documentation attached to its doxygen 
comments.


- APR needs a notifier API to determine whether ap_core_output_filter() 
will block. This addresses Joe's objection to the assumption that 
ap_core_output_filter() won't block on files. This also removes the need 
for any threading or forking.


- The notifier is also needed to remove the need to fstat and then 
sleep.


Further work is being done to solve the above issues, but too few people 
are testing this code. This is another call to download trunk, try it 
out, and report any issues encountered so that they may be fixed.


Regards,
Graham
--

