Re: Possible new cache architecture

2006-05-02 Thread Davi Arnaut
On Wed, 03 May 2006 01:09:03 +0200
Graham Leggett <[EMAIL PROTECTED]> wrote:

> Davi Arnaut wrote:
> 
> > Graham, what I want is to be able to write a mod_cache backend _without_
> > having to worry about HTTP.
> 
> Then you will end up with code that does not meet the requirements of 
> HTTP, and you will have wasted your time.

Yeah, right! How? Hey, you are using the Monty Python argument style.
Can you point to even one requirement of HTTP that my_cache_provider
won't meet?

> Please go through _all_ of the mod_cache architecture, and not just 
> mod_disk_cache. Also read and understand HTTP/1.1 gateways and caches, 
> and as you want to create a generic cache, read and understand mod_ldap, 
> a module that will probably benefit from the availability of a generic 
> cache. Then step back and see that mod_cache is a small part of a bigger 
> picture. At this point you'll see that as nice as your idea of a simple 
> generic cache interface is, it's not going to be the most elegant 
> solution to the problem.

blah, blah.. you essentially said: "I don't want a simpler interface,
I think the current mess is more elegant."

I have shown you that I can even wrap your messy cache_provider hooks
into a much simpler one; how can anything else be more elegant?

--
Davi Arnaut


Re: Possible new cache architecture

2006-05-02 Thread Graham Leggett

Davi Arnaut wrote:


Graham, what I want is to be able to write a mod_cache backend _without_
having to worry about HTTP.


Then you will end up with code that does not meet the requirements of 
HTTP, and you will have wasted your time.


Please go through _all_ of the mod_cache architecture, and not just 
mod_disk_cache. Also read and understand HTTP/1.1 gateways and caches, 
and as you want to create a generic cache, read and understand mod_ldap, 
a module that will probably benefit from the availability of a generic 
cache. Then step back and see that mod_cache is a small part of a bigger 
picture. At this point you'll see that as nice as your idea of a simple 
generic cache interface is, it's not going to be the most elegant 
solution to the problem.


Regards,
Graham
--




Re: Possible new cache architecture

2006-05-02 Thread Davi Arnaut
On Tue, 02 May 2006 23:31:13 +0200
Graham Leggett <[EMAIL PROTECTED]> wrote:

> Davi Arnaut wrote:
> 
> >> The way HTTP caching works is a lot more complex than in your example, you
> >> haven't taken into account conditional HTTP requests.
> > 
> > I've taken into account the actual mod_disk_cache code!
> 
> mod_disk_cache doesn't contain any of the conditional HTTP request code, 
> which is why you're not seeing it there.
> 
> Please keep in mind that the existing mod_cache framework's goal is to 
> be a fully HTTP/1.1 compliant, content generator neutral, efficient, 
> error free and high performance cache.
> 
> Moving towards and keeping with the above goals is a far higher priority 
> than simplifying the generic backend cache interface.
> 
> To sum up - the cache backend must fulfill the requirements of the cache 
> frontend (generic or not), which in turn must fulfill the requirements 
> of the users, who are browsers, web robot code, and humans. To try and 
> prioritise this the other way round is putting the cart before the horse.

Graham, what I want is to be able to write a mod_cache backend _without_
having to worry about HTTP. _NOT_ to rewrite mod_disk/proxy/cache/whatever!

You keep talking about HTTP this, HTTP that; I won't change the way it currently
works. I just want to place glue between the storage and the HTTP part.

I could even wrap around your code:

typedef struct {
apr_status_t (*fetch) (cache_handle_t *h, const char *key, apr_bucket_brigade *bb);
apr_status_t (*store) (cache_handle_t *h, const char *key, apr_bucket_brigade *bb);
int (*remove) (const char *key);
} my_cache_provider;

typedef struct {
const char *key_headers;
const char *key_body;
} my_cache_object;

create_entity:
my_cache_object *obj;

obj->key_headers = hash_headers(request, whatever);
obj->key_body = hash_body(request, whatever);

open_entity:
my_cache_object *obj;

my_provider->fetch(h, obj->key_headers, header_brigade);

// if necessary, update obj->key_headers/body (vary..)


remove_url:
my_provider->remove(obj->key_headers);
my_provider->remove(obj->key_body);

remove_entity:
nop

store_headers:
my_cache_object *obj;
// if necessary, update obj->key_headers (vary..)
my_provider->store(h, obj->key_headers, header_brigade);

store_body:
my_cache_object *obj;
my_provider->store(h, obj->key_body, body_brigade);

recall_headers:
my_cache_object *obj;
my_provider->fetch(h, obj->key_headers, header_brigade);

recall_body:
my_cache_object *obj;
my_provider->fetch(h, obj->key_body, body_brigade);
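
To make the wrapping concrete, here is a rough sketch (hypothetical names
only, not working httpd code; apr_brigade_create, apr_table_elts and
apr_brigade_printf are the real APR calls) of how store_headers above could
collapse into a single provider call: the HTTP layer flattens r->headers_out
into a brigade, and the backend only ever sees an opaque key plus a brigade.

static apr_status_t http_store_headers(request_rec *r, cache_handle_t *h,
                                       my_cache_provider *provider,
                                       my_cache_object *obj)
{
    apr_bucket_brigade *bb = apr_brigade_create(r->pool,
                                                r->connection->bucket_alloc);
    const apr_array_header_t *elts = apr_table_elts(r->headers_out);
    const apr_table_entry_t *e = (const apr_table_entry_t *) elts->elts;
    int i;

    /* flatten the header table into "Name: value" lines */
    for (i = 0; i < elts->nelts; i++) {
        apr_brigade_printf(bb, NULL, NULL, "%s: %s" CRLF, e[i].key, e[i].val);
    }

    /* the storage backend never needs to know these are HTTP headers */
    return provider->store(h, obj->key_headers, bb);
}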

--
Davi Arnaut


Re: Possible new cache architecture

2006-05-02 Thread Graham Leggett

Davi Arnaut wrote:


The way HTTP caching works is a lot more complex than in your example, you
haven't taken into account conditional HTTP requests.


I've taken into account the actual mod_disk_cache code!


mod_disk_cache doesn't contain any of the conditional HTTP request code, 
which is why you're not seeing it there.


Please keep in mind that the existing mod_cache framework's goal is to 
be a fully HTTP/1.1 compliant, content generator neutral, efficient, 
error free and high performance cache.


Moving towards and keeping with the above goals is a far higher priority 
than simplifying the generic backend cache interface.


To sum up - the cache backend must fulfill the requirements of the cache 
frontend (generic or not), which in turn must fulfill the requirements 
of the users, who are browsers, web robot code, and humans. To try and 
prioritise this the other way round is putting the cart before the horse.


Regards,
Graham
--




Re: Possible new cache architecture

2006-05-02 Thread Gonzalo Arana

On 5/2/06, Brian Akins <[EMAIL PROTECTED]> wrote:

Gonzalo Arana wrote:

> What problems have you seen with this approach?  postfix uses this
> architecture, for instance.

Postfix implements SMTP, which is an asynchronous protocol.


And which problems might this approach bring?


> Excuse my ignorance, what does "event mpm ... keep the balance very
> good" mean?

Not all your threads are tied up doing keepalives, for example.


Ah, I see (I was unfamiliar with the event MPM, sorry).

--
Gonzalo A. Arana


Re: Possible new cache architecture

2006-05-02 Thread Brian Akins

Gonzalo Arana wrote:


What problems have you seen with this approach?  postfix uses this
architecture, for instance.


Postfix implements SMTP, which is an asynchronous protocol.

Excuse my ignorance, what does "event mpm ... keep the balance very 
good" mean?


Not all your threads are tied up doing keepalives, for example.


--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Possible new cache architecture

2006-05-02 Thread Gonzalo Arana

On 5/2/06, Brian Akins <[EMAIL PROTECTED]> wrote:

Gonzalo Arana wrote:
> A more suitable design for this task I think would be to make each
> process have a special purpose: cache maintenance (purging expired
> entries, purging entries to make room for new ones, creating new
> entries, and so on), request processing (network/disk I/O, content
> filtering, and so on), or what ever.

In my experience, this always sounds good in theory, but just doesn't
ever work in the real world.  The event mpm is "sorta" a step in that
direction, but seems to keep the balance pretty good.


What problems have you seen with this approach?  postfix uses this
architecture, for instance.

Excuse my ignorance, what does "event mpm ... keep the balance very good" mean?

--
Gonzalo A. Arana


Re: Possible new cache architecture

2006-05-02 Thread Brian Akins

Gonzalo Arana wrote:

A more suitable design for this task I think would be to make each
process have a special purpose: cache maintenance (purging expired
entries, purging entries to make room for new ones, creating new
entries, and so on), request processing (network/disk I/O, content
filtering, and so on), or what ever.


In my experience, this always sounds good in theory, but just doesn't 
ever work in the real world.  The event mpm is "sorta" a step in that 
direction, but seems to keep the balance pretty good.




--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Possible new cache architecture

2006-05-02 Thread Davi Arnaut
On Tue, 2 May 2006 17:22:00 +0200 (SAST)
"Graham Leggett" <[EMAIL PROTECTED]> wrote:

> On Tue, May 2, 2006 7:06 pm, Davi Arnaut said:
> 
> > There is no such scenario. I will simulate a request using the disk_cache
> > format:
> 
> The way HTTP caching works is a lot more complex than in your example, you
> haven't taken into account conditional HTTP requests.

I've taken into account the actual mod_disk_cache code! Let me try to translate
your typical scenario.

> A typical conditional scenario goes like this:
> 
> - Browser asks for URL from httpd.

Same.

> - Mod_cache has a cached copy by looking up the headers BUT - it's stale.
> mod_cache converts the browser's original request to a conditional request
> by adding the header If-None-Match.

sed s/mod_cache/mod_http_cache

> - The backend server answers "no worries, what you have is still fresh" by
> sending a "304 Not Modified".

sed s/mod_cache/mod_http_cache

> - mod_cache takes the headers from the 304, and replaces the headers on
> the cached entry, in the process making the entry "fresh" again.

sed s/mod_cache/mod_http_cache

> - mod_cache hands the cached data back to the browser.

sed s/mod_cache/mod_http_cache

> Read http://www.ietf.org/rfc/rfc2616.txt section 13 (mainly) to see in
> detail how this works.

Again: we do not want to change the semantics, we only want to separate
the HTTP-specific part from the storage-specific part. The HTTP-specific
parts of mod_disk_cache, mod_mem_cache and mod_cache move into a
mod_http_cache, while the storage modules keep only the storage-specific
parts. And mod_cache is the layer that combines the two.

Again: it's the same thing as if we were replacing all mod_disk_cache file
operations by hash table operations.
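
As a toy illustration of that last point (a sketch only; apr_hash_get/set,
apr_brigade_puts and apr_brigade_pflatten are real APR calls, the rest is
hypothetical glue), a trivial in-memory backend speaks exactly the same
fetch/store-by-key language, with the file operations replaced by hash table
operations:

static apr_hash_t *mem_store;   /* created once from a long-lived pool */

static apr_status_t mem_fetch(const char *key, apr_bucket_brigade *bb)
{
    const char *data = apr_hash_get(mem_store, key, APR_HASH_KEY_STRING);
    if (!data) {
        return APR_NOTFOUND;
    }
    return apr_brigade_puts(bb, NULL, NULL, data);
}

static apr_status_t mem_put(const char *key, apr_bucket_brigade *bb)
{
    apr_pool_t *p = apr_hash_pool_get(mem_store);
    char *data;
    apr_size_t len;
    apr_status_t rv = apr_brigade_pflatten(bb, &data, &len, p);

    if (rv != APR_SUCCESS) {
        return rv;
    }
    /* toy code: treats the value as a string; a real backend would keep
       the length alongside the data */
    apr_hash_set(mem_store, apr_pstrdup(p, key), APR_HASH_KEY_STRING,
                 apr_pstrmemdup(p, data, len));
    return APR_SUCCESS;
}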

--
Davi Arnaut


Re: Possible new cache architecture

2006-05-02 Thread Gonzalo Arana

Seems to me that the thundering herd / performance degradation is
inherent to apache design: all threads/processes are exact clones.

A more suitable design for this task I think would be to make each
process have a special purpose: cache maintenance (purging expired
entries, purging entries to make room for new ones, creating new
entries, and so on), request processing (network/disk I/O, content
filtering, and so on), or what ever.

This way, performance degradation caused by cache mutex can be
minimized.  Request processors would only get queued/locked when
querying the cache, which can be made a single operation if cache is
smart enough to figure out the right response from original request,
right?

Regards,

--
Gonzalo A. Arana


Re: Possible new cache architecture

2006-05-02 Thread Graham Leggett
On Tue, May 2, 2006 5:50 pm, Brian Akins said:

> This seems more like a wish list.  I just want to separate out the cache
> and protocol stuff.

HTTP compliance isn't a wish, it's a requirement. A patch that breaks
compliance will end up being -1'ed.

Solving the thundering herd problem is also a requirement, as provision was
made for it in the v2.0 design. The cache must deliver what the HTTP cache
requires (which in turn delivers what users require), not the other way
around.

Separating the cache and the protocol has advantages, but it also has the
disadvantage that fixing bugs like thundering herd may require interface
changes, forcing people to have to wait for major version number changes
before they see their problems fixed.

In this scenario, the separation of cache and protocol is (very) nice to
have, but not so nice that it is worth disadvantaging end users.

>> - The ability to amend a subkey (the headers) on an entry that is
>> already
>> cached.
>
> mod_http_cache should handle.  to new mod_cache, it's just another
> key/value.

How does mod_http_cache do this without the need for locking (and thus
performance degradation)?

How does mod_cache guarantee that it won't expire the body without
atomically expiring the headers with it?

>> - The ability to invalidate a particular cached variant (ie headers +
>> data) in one atomic step, without affecting threads that hold that
>> cached
>> entry open at the time.
>
> mod_http_cache should handle.

Entry invalidation is definitely mod_cache's problem; it falls under cache
size maintenance and expiry.

Remember that mod_http_cache only runs when requests are present; entry
invalidation has to happen whether there are requests present or not, via
a separate thread, separate process, cron job, whatever.

>> - The ability to read from a cached object that is still being written
>> to.
>
> Nice to have.  out of scope for what I am proposing.  new mod_cache
> should be the place to implement this if underlying provider supports it.

It's not nice to have, no. It's a real problem that has inspired people to
log bugs, and very recently, for one person to submit a patch.

Regards,
Graham
--




[Fwd: 2.2+ security page empty?]

2006-05-02 Thread William A. Rowe, Jr.

An open forward from your friendly security team.


 Original Message 
Subject: 2.2+ security page empty?
Date: Tue, 2 May 2006 14:53:53 +0100 (BST)
From: Per Olausson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]


  There is nothing on the security page any more for 2.2; is there a bug
with the report you use to populate it?

  Surely, there should be something on there for the previous 2.2 releases
just as there is for 2.0 etc?

  Regards,


  Per







Re: Possible new cache architecture

2006-05-02 Thread Brian Akins

Graham Leggett wrote:


To be HTTP compliant, and to solve thundering herd, we need the following
from a cache:



This seems more like a wish list.  I just want to separate out the cache 
and protocol stuff.




- The ability to amend a subkey (the headers) on an entry that is already
cached.


mod_http_cache should handle.  to new mod_cache, it's just another 
key/value.



- The ability to invalidate a particular cached variant (ie headers +
data) in one atomic step, without affecting threads that hold that cached
entry open at the time.


mod_http_cache should handle. Keep a list of variants cached - this 
should use a provider interface as well.  mod_cache would handle 
whatever locking, ref counting, etc, needs to be done, if any.



- The ability to read from a cached object that is still being written to.


Nice to have.  out of scope for what I am proposing.  new mod_cache 
should be the place to implement this if underlying provider supports it.




- A guarantee that the result of a broken write (segfault, timeout,
connection reset by peer, whatever) will not result in a broken cached
entry (ie that the cached entry will eventually be invalidated, and all
threads trying to read from it will eventually get an error).


agreed.  new mod_cache should handle this.


Certainly separate the protocol from the physical cache, just make sure
the physical cache delivers the shopping list above :)


Most seem like protocol specific stuff.


--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Possible new cache architecture

2006-05-02 Thread Graham Leggett
On Tue, May 2, 2006 5:27 pm, Brian Akins said:

> Still not sure how this is different from what we are proposing.  we
> really want to separate protocol from cache stuff.  If we have a
> "revalidate" for the generic cache it should address all your concerns.
> ???

To be HTTP compliant, and to solve thundering herd, we need the following
from a cache:

- The ability to amend a subkey (the headers) on an entry that is already
cached.

- The ability to invalidate a particular cached variant (ie headers +
data) in one atomic step, without affecting threads that hold that cached
entry open at the time.

- The ability to read from a cached object that is still being written to.

- A guarantee that the result of a broken write (segfault, timeout,
connection reset by peer, whatever) will not result in a broken cached
entry (ie that the cached entry will eventually be invalidated, and all
threads trying to read from it will eventually get an error).

Certainly separate the protocol from the physical cache, just make sure
the physical cache delivers the shopping list above :)
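
Purely as a sketch of what that shopping list implies for a backend contract
(hypothetical names, not an existing httpd interface), it boils down to
roughly three operations plus a correctness guarantee:

typedef struct {
    /* replace one subkey (e.g. the headers) without touching the others */
    apr_status_t (*amend)(const char *key, const char *subkey,
                          apr_bucket_brigade *data);
    /* atomically mark a whole variant dead; readers that already hold the
       entry open keep their consistent view until they finish */
    apr_status_t (*invalidate)(const char *key);
    /* open for reading even if a writer has not finished yet; *complete is
       set once the entry is known to be fully and correctly written,
       otherwise readers must eventually see an error, never a broken file */
    apr_status_t (*open)(const char *key, int *complete, void **handle);
} generic_cache_ops;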

Regards,
Graham
--




Re: Possible new cache architecture

2006-05-02 Thread Brian Akins

Graham Leggett wrote:


The way HTTP caching works is a lot more complex than in your example, you
haven't taken into account conditional HTTP requests.
...


Still not sure how this is different from what we are proposing.  we 
really want to separate protocol from cache stuff.  If we have a 
"revalidate" for the generic cache it should address all your concerns.  ???





--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Possible new cache architecture

2006-05-02 Thread Graham Leggett
On Tue, May 2, 2006 7:06 pm, Davi Arnaut said:

> There is no such scenario. I will simulate a request using the disk_cache
> format:

The way HTTP caching works is a lot more complex than in your example, you
haven't taken into account conditional HTTP requests.

A typical conditional scenario goes like this:

- Browser asks for URL from httpd.

- Mod_cache has a cached copy by looking up the headers BUT - it's stale.
mod_cache converts the browser's original request to a conditional request
by adding the header If-None-Match.

- The backend server answers "no worries, what you have is still fresh" by
sending a "304 Not Modified".

- mod_cache takes the headers from the 304, and replaces the headers on
the cached entry, in the process making the entry "fresh" again.

- mod_cache hands the cached data back to the browser.

Read http://www.ietf.org/rfc/rfc2616.txt section 13 (mainly) to see in
detail how this works.
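
For the record, the conditional step above boils down to something like this
(a hedged sketch, not the actual mod_cache code; apr_table_get/apr_table_set
are the real APR calls, the function and the cached_headers table are
illustrative):

static void make_conditional(request_rec *r, apr_table_t *cached_headers)
{
    const char *etag = apr_table_get(cached_headers, "ETag");
    const char *lastmod = apr_table_get(cached_headers, "Last-Modified");

    /* turn the stale hit into a revalidation */
    if (etag) {
        apr_table_set(r->headers_in, "If-None-Match", etag);
    }
    if (lastmod) {
        apr_table_set(r->headers_in, "If-Modified-Since", lastmod);
    }
    /* on a 304 from the backend, the cached headers are replaced with the
       304's headers and the cached body is served; on a 200, the new entity
       replaces the cached one */
}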

Regards,
Graham
--




Re: mod_disk_cache patch, preview edition (was: new cache arch)

2006-05-02 Thread Graham Leggett
On Tue, May 2, 2006 3:50 pm, Niklas Edmundsson said:

> Are there partially cached files? If I request the last 200 bytes of a
> 4.3GB DVD image, the bucket brigade contains the complete file... The
> headers say ranges and all sorts of things but they don't match
> what's cached.

By "partially cached" I meant a file that was half cached, and other
processes/threads are serving content from that cache.

>> What may be useful is a cache header with some metadata in it giving the
>> total size and a "download failed" flag, which goes in front of the
>> headers. The metadata can also contain the offset of the body.
>
> I solved it with size in the body and a timeout mechanism, a "download
> failed" flag doesn't cope with segfaults.

True, but a timeout forces the end user to wait in cases where we already
know the backend is dead. This typically won't happen with a disk backend,
but it will happen with a mod_proxy backend (think connection reset by
peer).

> It's possible, but since I needed to hammer so hard at mod_disk_cache
> to get it in the shape I wanted it I set out to first get the whole
> thing working and then worry about breaking the patch into manageable
> pieces. For example, by doing it all-incremental there would have been
> a dozen or so disk format change-patches, and I really don't think you
> would have wanted that :)

We do want that if possible :) Small changes are easy to understand, and
thus in turn easy to get the three votes needed for inclusion into httpd
v2.2 from trunk.

Regards,
Graham
--




Re: Possible new cache architecture

2006-05-02 Thread Davi Arnaut
On Tue, 2 May 2006 15:40:30 +0200 (SAST)
"Graham Leggett" <[EMAIL PROTECTED]> wrote:

> On Tue, May 2, 2006 3:24 pm, Brian Akins said:
> 
> >> - the cache says "cool, will send my copy upstream. Oops, where has my
> >> data gone?".
> 
> > So, the cache says, okay must get content the old fashioned way (proxy,
> > filesystem, magic fairies, etc.).
> >
> > Where's the issue?
> 
> To rephrase it, a whole lot of extra code, which has to be written and
> debugged, has to say "oops, ok sorry backend about the If-None-Match, I
> thought I had it cached but I actually didn't, please can I have the full
> file?". Then the backend gives you a response with different headers to
> those you already delivered to the frontend. Oops.

There is no such scenario. I will simulate a request using the disk_cache
format:

. Incoming client requests URI /foo/bar/baz
. Request goes through mod_http_cache, generate <hash> off of the URI
. mod_http_cache asks mod_cache for the data associated with key: <hash>.header
  . No data:
    . Fetch from upstream
  . Data fetched:
    . If format #1 (contains a list of Vary headers):
      . Use each header name (from <hash>.header) with our request
        values (headers_in) to regenerate <hash> using HeaderName+
        HeaderValue+URI
      . Ask mod_cache for data with key: <hash>.header
        . No data:
          . Fetch from upstream
        . Data:
          . Serve data to client
    . If format #2:
      . Serve data to client

Where is the difference?
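
For concreteness, the <hash> regeneration for format #1 above might look
roughly like this (a sketch; ap_md5, apr_strtok, apr_pstrcat and
apr_table_get are real httpd/APR calls, but the function itself and its exact
mixing are illustrative, not mod_disk_cache's actual hashing code):

static const char *regen_vary_key(apr_pool_t *p, request_rec *r,
                                  const char *vary)
{
    /* mix HeaderName + HeaderValue for every header named in Vary,
       plus the URI itself, then hash the result into the new key */
    char *list = apr_pstrdup(p, vary);
    char *mix = apr_pstrdup(p, r->uri);
    char *last = NULL;
    char *name;

    for (name = apr_strtok(list, ", ", &last); name;
         name = apr_strtok(NULL, ", ", &last)) {
        const char *value = apr_table_get(r->headers_in, name);
        mix = apr_pstrcat(p, mix, name, value ? value : "", NULL);
    }
    return ap_md5(p, (const unsigned char *)mix);   /* hex digest string */
}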

> Keeping the code as simple as possible will keep your code bug free, which
> means less time debugging for you, and less time for end users trying to
> figure out what the cause is of their weird symptoms.

We are trying to make it as simple as possible by separating the storage
layer from the protocol layer.

--
Davi Arnaut


Re: Possible new cache architecture

2006-05-02 Thread Plüm , Rüdiger , VF EITO


> -Ursprüngliche Nachricht-
> Von: Niklas Edmundsson 

> 
> Correct. When caching a 4.3GB file on a 32bit arch it gets so bad that
> mmap eats all your address space and the thing segfaults. I initially
> thought it was eating memory, but that's only if you have mmap
> disabled.

Ahh, good point. So I guess we need to remove the mmap buckets
from the brigade in the loop.

Regards

Rüdiger


Re: mod_disk_cache patch, preview edition (was: new cache arch)

2006-05-02 Thread Niklas Edmundsson

On Tue, 2 May 2006, Graham Leggett wrote:


The need-size-issue goes for retrievals as well.


If you are going to read from partially cached files, you need a "total
size" field as well as a flag to say "give up, this attempt at caching
failed"


Are there partially cached files? If I request the last 200 bytes of a 
4.3GB DVD image, the bucket brigade contains the complete file... The 
headers say ranges and all sorts of things but they don't match
what's cached.



What may be useful is a cache header with some metadata in it giving the
total size and a "download failed" flag, which goes in front of the
headers. The metadata can also contain the offset of the body.


I solved it with size in the body and a timeout mechanism, a "download 
failed" flag doesn't cope with segfaults.



OK. It's attached. It has only had mild testing using the worker mpm
with mmap enabled, it needs a bit more testing and auditing before
trusting it too hard.

Note that this patch fixes a whole slew of other issues along the way,
the most notable ones being LFS on 32bit arch, don't eat all your
32bit memory/address space when caching huge files, provide
r->filename so %f in LogFormat works, and other smaller issues.


Is it possible to split the patch into separate fixes for each issue
(where practical)? It makes it easier to digest.


It's possible, but since I needed to hammer so hard at mod_disk_cache 
to get it in the shape I wanted it I set out to first get the whole 
thing working and then worry about breaking the patch into manageable 
pieces. For example, by doing it all-incremental there would have been 
a dozen or so disk format change-patches, and I really don't think you 
would have wanted that :)


As said, this is a preliminary jumbo patch for those interested in how 
we tackled the various problems involved (or those who love to take 
bleeding edge code for a spin and watch it falling into pieces when 
hitting a weird corner case ;).



Also the other fixes can be committed immediately/soon, depending on how
simple they are, which will simplify the final patch.


Yup. I'll update bug#39380 when we feel that we have a good solution.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 To err is Human. To blame someone else is politics.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Possible new cache architecture

2006-05-02 Thread Graham Leggett
On Tue, May 2, 2006 3:24 pm, Brian Akins said:

>> - the cache says "cool, will send my copy upstream. Oops, where has my
>> data gone?".

> So, the cache says, okay must get content the old fashioned way (proxy,
> filesystem, magic fairies, etc.).
>
> Where's the issue?

To rephrase it, a whole lot of extra code, which has to be written and
debugged, has to say "oops, ok sorry backend about the If-None-Match, I
thought I had it cached but I actually didn't, please can I have the full
file?". Then the backend gives you a response with different headers to
those you already delivered to the frontend. Oops.

Keeping the code as simple as possible will keep your code bug free, which
means less time debugging for you, and less time for end users trying to
figure out what the cause is of their weird symptoms.

Regards,
Graham
--




Re: [PATCH] #39275 MaxClients on startup [Was: Bug in 2.0.56-dev]

2006-05-02 Thread Jeff Trawick

On 5/2/06, Chris Darroch <[EMAIL PROTECTED]> wrote:


   If you can bear with me for a day or two more, I should have
a collection of patches ready.  These tackle the issue by
tracking the start and listener threads in a nice new spot in
the scoreboard, and also clean up various issues and bugs relating
to fork(), ThreadLimit, ServerLimit, MaxClients, etc.  They also
tackle some issues raised in various XXX comments and clean up
some stale cruft in scoreboard.h.


Note that perform_idle_server_maintenance() can't depend on the child
process to put stuff in the scoreboard in order to avoid a fork bomb. 
Assume fork() stalls for 30 seconds in the child before returning to
our code, and make sure there is no opportunity for ramping up the
number of processes while we wait for a child to get a chance to
initialize.


Re: Possible new cache architecture

2006-05-02 Thread Davi Arnaut
On Tue, 2 May 2006 11:22:31 +0200 (MEST)
Niklas Edmundsson <[EMAIL PROTECTED]> wrote:

> On Mon, 1 May 2006, Davi Arnaut wrote:
> 
> > More important, if we stick with the key/data concept it's possible to
> > implement the header/body relationship under single or multiple keys.
> 
> I've been hacking on mod_disk_cache to make it:
> * Only store one set of data when one uncached item is accessed
>simultaneously (currently all requests cache the file and the last
>finished cache process "wins").
> * Don't wait until the whole item is cached, reply while caching
>(currently it stalls).
> * Don't block the requesting thread when requesting a large uncached
>item, cache in the background and reply while caching (currently it
>stalls).
> 
> This is mostly aimed at serving huge static files from a slow disk 
> backend (typically an NFS export from a server holding all the disk), 
> such as http://ftp.acc.umu.se/ and http://ftp.heanet.ie/ .
> 
> Doing this with the current mod_disk_cache disk layout was not 
> possible; doing the above without unnecessary locking means:
> 
> * More or less atomic operations, so caching headers and data in
>separate files gets very messy if you want to keep consistency.
> * You can't use tempfiles since you want to be able to figure out
>where the data is to be able to reply while caching.
> * You want to know the size of the data in order to tell when you're
>done (ie the current size of a file isn't necessarily the real size
>of the body since it might be caching while we're reading it).
> 
> In the light of our experiences, I really think that you want to have 
> a concept that allows you to keep the bond between header and data. 
> Yes, you can patch up a missing bond by requiring locking and stuff, but
> I really prefer not having to lock cache files when doing read access. 
> When it comes to "make the common case fast" a lockless design is very 
> much preferred.

I will repeat once again: there is no locking involved, unless your format
of storing the header/data is really wrong. _The data format is up to
the module using it_, while the storage backend is a completely different
issue.

> However, if all those issues are sorted out in the layer above disk 
> cache then the above observations become more or less moot.

Yes, that's the point.

> In any case the patch is more or less finished, independent testing 
> and auditing haven't been done yet but I can submit a preliminary 
> jumbo-patch if people are interested in having a look at it now.

--
Davi Arnaut


Re: Possible new cache architecture

2006-05-02 Thread Niklas Edmundsson

On Tue, 2 May 2006, Graham Leggett wrote:


If it's:
* Link to latest GNOME Live CD gets published on Slashdot.
* A gazillion users click the link to download it.
* mod_disk_cache starts a new instance of caching the file for each
   request, until someone has completed caching the file.


Then this is the thundering herd problem :)


OK :)


Either a site is slashdotted (as in your case), or a cached entry expires,
and suddenly the backend gets nailed until at least one request "wins",
then we are back to normal serving from the cache.

In your case, the "backend" is the disk, while in the bug from 1998, the
backend was another webserver. Either way, same problem.


OK.


Then this patch solves the problem regardless of whether it's a static
file or dynamically generated content since it only allows one
instance to cache the file (OK, there's a small hole so there can be
multiple instances but it's way smaller than now), all other
instances deliver data as the caching process is writing it.



Additionally, if it's a static file that's allowed to be cached in
the background it solves:
* Reduce chance of user getting bored since the data is delivered
   while being cached.
* The user got bored and closed the connection so the painfully cached
   file gets deleted.


Hmmm - thinking about this we try to cache the brigade (all X GB of it)
first, then we try to write it to the network, thus the delay.

Does your patch solve all of these already, or are they planned?


It solves everything I've mentioned. The solution is probably not 
perfect for the not-static-file case since it falls back to the old 
behaviour of caching the whole file, but it should be a lot better 
than the current mod_disk_cache since the rest of the threads get 
reply-while-caching. There are issues here with the fact that the 
result is discarded if the connection is aborted, but I'm not familiar 
enough with apache filter internals to state that you can keep the 
result even though the connection is aborted.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Anything is edible if it's chopped finely enough
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_disk_cache patch, preview edition (was: new cache arch)

2006-05-02 Thread Graham Leggett
On Tue, May 2, 2006 2:03 pm, Niklas Edmundsson said:

>> This is great, in doing this you've been solving a proxy bug that was
>> first reported in 1998 :).
>
> OK. Stuck in the "File under L for Later" pile? ;)

Er no, it was under the "redesign the entire code to fix it" class of
bugs. :)

The v2.0 mod_cache design had provision for solving this problem, but it
was never completed. The v1.3 mod_proxy/cache design needed a major
rewrite to fix, the effort was instead put into v2.0.

> Regarding partially cached files, it understands when caching a file
> has failed and so on.

All the cache has to worry about is invalidating all partially cached
files where an upstream error occurred (timeout, connection reset by peer,
whatever), the end goal being to never inadvertently cache a broken file.

> They are. It seek()s to an offset where the body is stored so
> headers can be updated as long as they don't grow too much.

Ok, makes sense.

> The need-size-issue goes for retrievals as well.

If you are going to read from partially cached files, you need a "total
size" field as well as a flag to say "give up, this attempt at caching
failed"

What may be useful is a cache header with some metadata in it giving the
total size and a "download failed" flag, which goes in front of the
headers. The metadata can also contain the offset of the body.

> OK. It's attached. It has only had mild testing using the worker mpm
> with mmap enabled, it needs a bit more testing and auditing before
> trusting it too hard.
>
> Note that this patch fixes a whole slew of other issues along the way,
> the most notable ones being LFS on 32bit arch, don't eat all your
> 32bit memory/address space when caching huge files, provide
> r->filename so %f in LogFormat works, and other smaller issues.

Is it possible to split the patch into separate fixes for each issue
(where practical)? It makes it easier to digest.

Also the other fixes can be committed immediately/soon, depending on how
simple they are, which will simplify the final patch.

Regards,
Graham
--




Re: Possible new cache architecture

2006-05-02 Thread Brian Akins

Graham Leggett wrote:

- the cache says "cool, will send my copy upstream. Oops, where has my 
data gone?".





So, the cache says, okay must get content the old fashioned way (proxy, 
filesystem, magic fairies, etc.).


Where's the issue?



--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Possible new cache architecture

2006-05-02 Thread Niklas Edmundsson

On Tue, 2 May 2006, Plüm, Rüdiger, VF EITO wrote:


Another thing: I guess on systems with no mmap support the current
implementation of mod_disk_cache will eat up a lot of memory if you cache a
large local file,
because it transforms the file bucket(s) into heap buckets in this case.
Even if mmap is present I think that mod_disk_cache causes the file buckets
to be transformed into many mmap buckets if the file is large. Thus we do not
use sendfile in the case we cache the file.


Correct. When caching a 4.3GB file on a 32bit arch it gets so bad that 
mmap eats all your address space and the thing segfaults. I initially
thought it was eating memory, but that's only if you have mmap 
disabled.



In the case that a brigade only contains file_buckets it might be possible to
"copy" this brigade, send it up the chain and process the copy of the brigade
for disk storage afterwards. Of course this opens a race if the file gets
changed in between these operations.
This approach does not work with socket or pipe buckets for obvious reasons.
Even heap buckets seem to be a somewhat critical idea because of the 
added memory usage.


I took the somewhat naive approach of only doing background caching
when the buckets refer to a single sequential file. It's not perfect, 
but it solves the main case where you get a huge amount of data to 
store ...



/Nikke - stumbled upon more than one bug when digging into
 mod_disk_cache
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Anything is edible if it's chopped finely enough
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Possible new cache architecture

2006-05-02 Thread Graham Leggett
On Tue, May 2, 2006 2:18 pm, Niklas Edmundsson said:

> Exactly what is the thundering herd problem? I can guess the general
> problem, but without a more precise definition I can't really say if
> my patch fixes it or not.
>
> If it's:
> * Link to latest GNOME Live CD gets published on Slashdot.
> * A gazillion users click the link to download it.
> * mod_disk_cache starts a new instance of caching the file for each
>request, until someone has completed caching the file.

Then this is the thundering herd problem :)

Either a site is slashdotted (as in your case), or a cached entry expires,
and suddenly the backend gets nailed until at least one request "wins",
then we are back to normal serving from the cache.

In your case, the "backend" is the disk, while in the bug from 1998, the
backend was another webserver. Either way, same problem.

> Then this patch solves the problem regardless of whether it's a static
> file or dynamically generated content since it only allows one
> instance to cache the file (OK, there's a small hole so there can be
> multiple instances but it's way smaller than now), all other
> instances deliver data as the caching process is writing it.

> Additionally, if it's a static file that's allowed to be cached in
> the background it solves:
> * Reduce chance of user getting bored since the data is delivered
>while being cached.
> * The user got bored and closed the connection so the painfully cached
>file gets deleted.

Hmmm - thinking about this we try to cache the brigade (all X GB of it)
first, then we try to write it to the network, thus the delay.

Does your patch solve all of these already, or are they planned?

Regards,
Graham
--




Re: [PATCH] #39275 MaxClients on startup [Was: Bug in 2.0.56-dev]

2006-05-02 Thread Chris Darroch
Jeff Trawick wrote:

> On 5/1/06, Greg Ames <[EMAIL PROTECTED]> wrote:
>
> >> after more thought, there is a simpler patch that should do the job.  the
> >> key to both of these is how threads in SERVER_DEAD state with a pid in the
> >> scoreboard are treated.  this means that p_i_s_m forked on a previous timer
> >> pop but some thread never made it into SERVER_STARTING state.
> >>
> >> the difference:  this patch just counts those potential threads as idle,
> >> and allows MinSpareThreads worth of processes to be forked before putting
> >> on the brakes.  the previous patch pauses the forking immediately when the
> >> strange situation is detected but requires more code and a new variable.
> 
> new patch is fine with me; I think we've lost our other interested
> parties on this thread anyway ;)

   I'm still here!  I had to be away from the keyboard most of last
week, so have only returned to this thread (pun intended, alas) recently.
I also found myself fixing a variety of other things that turned up
in the vicinity of this issue; patches forthcoming on those subjects.

   This is indeed a very straightforward fix; I've been trying to
ponder its consequences overnight.  The questions I've got (no
answers yet, just thought I'd ping so you know I'm here) would be
(a) whether you want to allow for SERVER_GRACEFUL as well to be
counted as idle, and (b) whether there's any reason to want to
not proceed through the if (any_dead_threads && ...) logic that
follows in p_i_s_m().

   If you can bear with me for a day or two more, I should have
a collection of patches ready.  These tackle the issue by
tracking the start and listener threads in a nice new spot in
the scoreboard, and also clean up various issues and bugs relating
to fork(), ThreadLimit, ServerLimit, MaxClients, etc.  They also
tackle some issues raised in various XXX comments and clean up
some stale cruft in scoreboard.h.

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B



Re: Possible new cache architecture

2006-05-02 Thread Plüm , Rüdiger , VF EITO


> -Ursprüngliche Nachricht-
> Von: Graham Leggett 

> > The reason it does not work currently is that a local file usually is
> > delivered in one brigade with, depending on the size of the file, one
> > or more file buckets.
> 
> Hmmm - ok, this makes sense.
> 
> Something I've never checked: do output filters support asynchronous
> writes?

I don't think so. Of course this would be a nice feature. Maybe somehow
possible with Colm's ideas.
Another thing: I guess on systems with no mmap support the current
implementation of mod_disk_cache will eat up a lot of memory if you cache a
large local file, because it transforms the file bucket(s) into heap buckets
in this case. Even if mmap is present I think that mod_disk_cache causes the
file buckets to be transformed into many mmap buckets if the file is large.
Thus we do not use sendfile in the case where we cache the file.
In the case that a brigade only contains file_buckets it might be possible to
"copy" this brigade, send it up the chain and process the copy of the brigade
for disk storage afterwards. Of course this opens a race if the file gets
changed in between these operations.
This approach does not work with socket or pipe buckets for obvious reasons.
Even heap buckets seem to be a somewhat critical idea because of the added
memory usage.


Regards

Rüdiger



Re: Possible new cache architecture

2006-05-02 Thread Niklas Edmundsson

On Tue, 2 May 2006, Graham Leggett wrote:


This is great, in doing this you've been solving a proxy bug that was
first reported in 1998 :).


This already works in the case you get the data from the proxy backend. It
does not work for local files that get cached (the scenario Niklas uses the
cache for).


Ok then I have misunderstood - I was referring to the thundering herd
problem.


Exactly what is the thundering herd problem? I can guess the general 
problem, but without a more precise definition I can't really say if 
my patch fixes it or not.


If it's:
* Link to latest GNOME Live CD gets published on Slashdot.
* A gazillion users click the link to download it.
* mod_disk_cache starts a new instance of caching the file for each
  request, until someone has completed caching the file.

Then this patch solves the problem regardless of whether it's a static 
file or dynamically generated content since it only allows one 
instance to cache the file (OK, there's a small hole so there can be 
multiple instances but it's way smaller than now), all other
instances deliver data as the caching process is writing it.
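
One common lockless way of getting the "only one instance caches" behaviour
(a sketch of the general technique, not necessarily what this patch does;
apr_file_open and APR_STATUS_IS_EEXIST are real APR calls, cache_filename and
r are assumed context) is to let the would-be cachers race on an exclusive
create of the final cache file:

apr_file_t *fd;
apr_status_t rv = apr_file_open(&fd, cache_filename,
                                APR_WRITE | APR_CREATE | APR_EXCL,
                                APR_OS_DEFAULT, r->pool);
if (rv == APR_SUCCESS) {
    /* we won the race: cache the content and serve it while writing */
}
else if (APR_STATUS_IS_EEXIST(rv)) {
    /* someone else is already caching: open the same file read-only and
       follow it as it grows instead of hitting the backend again */
}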


Additionally, if it's a static file that's allowed to be cached in 
the background it solves:

* Reduce chance of user getting bored since the data is delivered
  while being cached.
* The user got bored and closed the connection so the painfully cached
  file gets deleted.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Illiterate?  Write for information!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


mod_disk_cache patch, preview edition (was: new cache arch)

2006-05-02 Thread Niklas Edmundsson

On Tue, 2 May 2006, Graham Leggett wrote:


I've been hacking on mod_disk_cache to make it:
* Only store one set of data when one uncached item is accessed
   simultaneously (currently all requests cache the file and the last
    finished cache process "wins").
* Don't wait until the whole item is cached, reply while caching
   (currently it stalls).
* Don't block the requesting thread when requesting a large uncached
   item, cache in the background and reply while caching (currently it
   stalls).


This is great, in doing this you've been solving a proxy bug that was
first reported in 1998 :).


OK. Stuck in the "File under L for Later" pile? ;)


The only things to be careful of are for Cache-Control: no-cache and
friends to be handled gracefully (the partially cached file should be
marked as "delete-me" so that the current request creates a new cache file
/ no cache file. Existing running downloads should be unaffected by
this.), and for backend failures (either a timeout or a premature socket
close) to cause the cache entry to be invalidated and deleted.


I haven't changed the handling of this, so any bugs in this regard 
shouldn't be my fault at least ;)


Regarding partially cached files, it understands when caching a file 
has failed and so on.



* More or less atomic operations, so caching headers and data in
   separate files gets very messy if you want to keep consistency.


Keep in mind that HTTP/1.1 compliance requires that the headers be
updatable without changing the body.


They are. It seek()s to an offset where the body is stored so
headers can be updated as long as they don't grow too much.



* You can't use tempfiles since you want to be able to figure out
   where the data is to be able to reply while caching.
* You want to know the size of the data in order to tell when you're
   done (ie the current size of a file isn't necessarily the real size
   of the body since it might be caching while we're reading it).


The cache already wants to know the size of the data so that it can decide
whether it's prepared to try and cache the file in the first place, so in
theory this should not be a problem.


The need-size-issue goes for retrievals as well.

You also have the "size unknown right now" issue, which this patch 
solves by writing a header with the size -1 and then updating it when 
the size is known.
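
For illustration, the size -1 trick amounts to roughly this (the struct
layout and function are hypothetical, not this patch's actual on-disk format;
apr_file_seek and apr_file_write_full are real APR calls):

typedef struct {
    apr_int32_t format;
    apr_off_t file_size;    /* -1 while the body is still being cached */
} disk_cache_info_sketch;

static apr_status_t finalize_size(apr_file_t *hfd, apr_off_t real_size)
{
    /* seek back to the size field written earlier and patch in the
       real value now that the body is complete */
    apr_off_t off = APR_OFFSETOF(disk_cache_info_sketch, file_size);
    apr_status_t rv = apr_file_seek(hfd, APR_SET, &off);

    if (rv != APR_SUCCESS) {
        return rv;
    }
    return apr_file_write_full(hfd, &real_size, sizeof(real_size), NULL);
}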



In any case the patch is more or less finished, independent testing
and auditing haven't been done yet but I can submit a preliminary
jumbo-patch if people are interested in having a look at it now.


Post it, people can take a look.


OK. It's attached. It has only had mild testing using the worker mpm 
with mmap enabled, it needs a bit more testing and auditing before 
trusting it too hard.


Note that this patch fixes a whole slew of other issues along the way, 
the most notable ones being LFS on 32bit arch, don't eat all your 
32bit memory/address space when caching huge files, provide
r->filename so %f in LogFormat works, and other smaller issues.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I am Zirofsky of Borg. I will reassimilate Alaska and Finland.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

httpd-2.2.2-mod_disk_cache-jumbo20060502.patch.gz
Description: Binary data


Re: Possible new cache architecture

2006-05-02 Thread Graham Leggett
On Tue, May 2, 2006 12:16 pm, Plüm, Rüdiger, VF EITO said:

>> This is great, in doing this you've been solving a proxy bug that was
>> first reported in 1998 :).
>
> This already works in the case you get the data from the proxy backend. It
> does not work for local files that get cached (the scenario Niklas uses the
> cache for).

Ok then I have misunderstood - I was referring to the thundering herd
problem.

> The reason it does not work currently is that a local file usually is
> delivered in one brigade with, depending on the size of the file, one or
> more file buckets.

Hmmm - ok, this makes sense.

Something I've never checked: do output filters support asynchronous
writes?

If they did, this might solve this problem - the write request would
return immediately, allowing the read from file and write to cached file
to continue while the write to network blocked.

Regards,
Graham
--




Re: [PATCH] #39275 MaxClients on startup [Was: Bug in 2.0.56-dev]

2006-05-02 Thread Jeff Trawick

On 5/1/06, Greg Ames <[EMAIL PROTECTED]> wrote:

Jeff Trawick wrote:

after more thought, there is a simpler patch that should do the job.  the key
to both of these is how threads in SERVER_DEAD state with a pid in the
scoreboard are treated.  this means that p_i_s_m forked on a previous timer pop
but some thread never made it into SERVER_STARTING state.

the difference:  this patch just counts those potential threads as idle, and
allows MinSpareThreads worth of processes to be forked before putting on the
brakes.  the previous patch pauses the forking immediately when the strange
situation is detected but requires more code and a new variable.


new patch is fine with me; I think we've lost our other interested
parties on this thread anyway ;)


Re: Possible new cache architecture

2006-05-02 Thread Plüm , Rüdiger , VF EITO


> -Ursprüngliche Nachricht-
> Von: Graham Leggett 
> > * Don't block the requesting thread when requestng a large uncached
> >item, cache in the background and reply while caching 
> (currently it
> >stalls).
> 
> This is great, in doing this you've been solving a proxy bug that was
> first reported in 1998 :).

This already works in the case you get the data from the proxy backend. It does
not work for local files that get cached (the scenario Niklas uses the cache
for). The reason it does not work currently is that a local file usually is
delivered in one brigade with, depending on the size of the file, one or more
file buckets.
For Niklas' purposes Colm's ideas regarding the use of the new Linux system calls
tee and splice will come in handy
(http://mail-archives.apache.org/mod_mbox/apr-dev/200604.mbox/[EMAIL PROTECTED])
as they should speed up such things.
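
For reference, the tee/splice pattern amounts to roughly this (a bare-syscall
sketch, Linux 2.6.17+, nothing to do with the bucket brigade code; proper
handling of partial transfers is omitted): the file is spliced into a pipe,
duplicated with tee, and each copy is spliced onwards, one to the client and
one to the cache file, without the data ever passing through user space.

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

static int cache_and_send(int file_fd, int cache_fd, int client_fd, size_t len)
{
    int net_pipe[2], cache_pipe[2];

    if (pipe(net_pipe) < 0 || pipe(cache_pipe) < 0)
        return -1;

    while (len > 0) {
        /* pull a chunk of the file into the first pipe */
        ssize_t n = splice(file_fd, NULL, net_pipe[1], NULL, len, SPLICE_F_MOVE);
        if (n <= 0)
            break;
        /* duplicate it into the second pipe without consuming it */
        if (tee(net_pipe[0], cache_pipe[1], n, 0) < 0)
            break;
        /* drain one copy to the client, the other to the cache file */
        if (splice(net_pipe[0], NULL, client_fd, NULL, n, SPLICE_F_MOVE) < 0)
            break;
        if (splice(cache_pipe[0], NULL, cache_fd, NULL, n, SPLICE_F_MOVE) < 0)
            break;
        len -= (size_t)n;
    }
    close(net_pipe[0]); close(net_pipe[1]);
    close(cache_pipe[0]); close(cache_pipe[1]);
    return len == 0 ? 0 : -1;
}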

Regards

Rüdiger



Re: Possible new cache architecture

2006-05-02 Thread Graham Leggett
On Tue, May 2, 2006 11:22 am, Niklas Edmundsson said:

> I've been hacking on mod_disk_cache to make it:
> * Only store one set of data when one uncached item is accessed
>simultaneously (currently all requests cache the file and the last
>finished cache process "wins").
> * Don't wait until the whole item is cached, reply while caching
>(currently it stalls).
> * Don't block the requesting thread when requesting a large uncached
>item, cache in the background and reply while caching (currently it
>stalls).

This is great, in doing this you've been solving a proxy bug that was
first reported in 1998 :).

The only things to be careful of are for Cache-Control: no-cache and
friends to be handled gracefully (the partially cached file should be
marked as "delete-me" so that the current request creates a new cache file
/ no cache file. Existing running downloads should be unaffected by
this.), and for backend failures (either a timeout or a premature socket
close) to cause the cache entry to be invalidated and deleted.

> * More or less atomic operations, so caching headers and data in
>separate files gets very messy if you want to keep consistency.

Keep in mind that HTTP/1.1 compliance requires that the headers be
updatable without changing the body.

> * You can't use tempfiles since you want to be able to figure out
>where the data is to be able to reply while caching.
> * You want to know the size of the data in order to tell when you're
>done (ie the current size of a file isn't necessarily the real size
>of the body since it might be caching while we're reading it).

The cache already wants to know the size of the data so that it can decide
whether it's prepared to try and cache the file in the first place, so in
theory this should not be a problem.

> In any case the patch is more or less finished, independent testing
> and auditing haven't been done yet but I can submit a preliminary
> jumbo-patch if people are interested in having a look at it now.

Post it, people can take a look.

Regards,
Graham
--




Re: Possible new cache architecture

2006-05-02 Thread Niklas Edmundsson

On Mon, 1 May 2006, Davi Arnaut wrote:


More important, if we stick with the key/data concept it's possible to
implement the header/body relationship under single or multiple keys.


I've been hacking on mod_disk_cache to make it:
* Only store one set of data when one uncached item is accessed
  simultaneously (currently all requests cache the file and the last
  finished cache process "wins").
* Don't wait until the whole item is cached, reply while caching
  (currently it stalls).
* Don't block the requesting thread when requesting a large uncached
  item, cache in the background and reply while caching (currently it
  stalls).

This is mostly aimed at serving huge static files from a slow disk 
backend (typically an NFS export from a server holding all the disk), 
such as http://ftp.acc.umu.se/ and http://ftp.heanet.ie/ .


Doing this with the current mod_disk_cache disk layout was not 
possible; doing the above without unnecessary locking means:


* More or less atomic operations, so caching headers and data in
  separate files gets very messy if you want to keep consistency.
* You can't use tempfiles since you want to be able to figure out
  where the data is to be able to reply while caching.
* You want to know the size of the data in order to tell when you're
  done (ie the current size of a file isn't necessarily the real size
  of the body since it might be caching while we're reading it).

In the light of our experiences, I really think that you want to have 
a concept that allows you to keep the bond between header and data. 
Yes, you can patch up a missing bond by requiring locking and stuff, but
I really prefer not having to lock cache files when doing read access. 
When it comes to "make the common case fast" a lockless design is very 
much preferred.


However, if all those issues are sorted out in the layer above disk 
cache then the above observations become more or less moot.


In any case the patch is more or less finished, independent testing 
and auditing haven't been done yet but I can submit a preliminary 
jumbo-patch if people are interested in having a look at it now.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Want to forget all your troubles? Wear tight shoes.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=