Re: Possible new cache architecture
On Mon, 01 May 2006 22:46:44 +0200 Graham Leggett <[EMAIL PROTECTED]> wrote:

> Brian Akins wrote:
>
> >> That's two hits to find whether something is cached.
> >
> > You must have two hits if you support vary.
>
> You need only one - bring up the original cached entry with the key, and
> then use cheap subkeys over a very limited data set to find both the
> variants and the header/data.
>
> >> How are races prevented?
> >
> > shouldn't be any. something is in the cache or not. if one "piece" of
> > an http "object" is not valid or in cache, the object is invalid.
> > Although other variants may be valid/in cache.
>
> I can think of one race off the top of my head:
>
> - the browser says "send me this URL".
>
> - the cache has it cached, but it's stale, so it asks the backend
> "If-None-Match".
>
> - the cache reaper comes along, says "oh, this is stale", and reaps the
> cached body (which is independent, remember?). The data is no longer
> cached even though the headers still exist.
>
> - The backend says "304 Not Modified".
>
> - the cache says "cool, will send my copy upstream. Oops, where has my
> data gone?".

Sorry, but this only happens in your imagination. It's pretty obvious that
mod_http_cache will handle this.

> The end user will probably experience this as "oh, the website had a
> glitch, let me try again", so it won't be reported as a bug.

No.

> Ok, so you tried to lock the body before going to the backend, but
> searching for and locking the body would have been an additional wasted
> cache hit if the backend answered with its own body. Not to mention
> having to write and debug code to do this.

Locks are not necessary; perhaps you are imagining something very
different. If a data body disappears under mod_http_cache, it is not a big
deal! It will refuse to serve the request from the cache and a new version
of the page will be cached.

> Races need to be properly handled, and atomic cache operations will go a
> long way to prevent them.

I think we are discussing apples and oranges. First, we only want to
*organize* the current cache code into a more layered solution. The current
semantics won't change, yet!

--
Davi Arnaut
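The behaviour described in the reply above (a missing body simply turns the lookup into a miss) might be sketched roughly as follows. This is only an illustration, not code from mod_cache: `struct entry`, `store_get` and `http_object_lookup` are hypothetical names, and the flat table stands in for whatever key/value provider sits underneath.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat key/value store: a plain table of entries.
 * A NULL key marks a slot whose entry has been reaped. */
struct entry {
    const char *key;
    const char *value;
};

/* Look up one key; returns NULL when the key is not stored. */
const char *store_get(const struct entry *tab, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].key != NULL && strcmp(tab[i].key, key) == 0)
            return tab[i].value;
    }
    return NULL;
}

/* An HTTP "object" counts as cached only if BOTH pieces are present.
 * Returns 1 on a hit; on a miss returns 0 and clears both outputs, so
 * the caller falls back to the origin and re-caches the page. */
int http_object_lookup(const struct entry *tab, size_t n,
                       const char *hdr_key, const char *body_key,
                       const char **hdr, const char **body)
{
    assert(hdr != NULL && body != NULL);
    *hdr = store_get(tab, n, hdr_key);
    *body = store_get(tab, n, body_key);
    if (*hdr == NULL || *body == NULL) {
        *hdr = NULL;
        *body = NULL;
        return 0;
    }
    return 1;
}
```

With this shape no lock is taken around the two lookups; a body that vanishes between them only costs one extra trip to the backend.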
Re: Possible new cache architecture
Graham Leggett wrote:
> Brian Akins wrote:
>
> >> That's two hits to find whether something is cached.
> >
> > You must have two hits if you support vary.
>
> You need only one - bring up the original cached entry with the key, and
> then use cheap subkeys over a very limited data set to find both the
> variants and the header/data.
>
> >> How are races prevented?
> >
> > shouldn't be any. something is in the cache or not. if one "piece" of
> > an http "object" is not valid or in cache, the object is invalid.
> > Although other variants may be valid/in cache.
>
> I can think of one race off the top of my head:
>
> - the browser says "send me this URL".
>
> - the cache has it cached, but it's stale, so it asks the backend
> "If-None-Match".
>
> - the cache reaper comes along, says "oh, this is stale", and reaps the
> cached body (which is independent, remember?). The data is no longer
> cached even though the headers still exist.
>
> - The backend says "304 Not Modified".
>
> - the cache says "cool, will send my copy upstream. Oops, where has my
> data gone?".

I think that can be avoided by, instead of reaping the cached body,
actually setting aside the cached body (public -> private), by changing its
key or whatnot. Then either

- throw it away after the backend says "200 OK", and replace it with
something new, or

- rekey it a second time (private -> public) when the backend reports
"304 NOT MODIFIED".

In the race, one will set it aside looking for another, the second will
make a fresh request (it doesn't see it in the cache), and either the first
or second request will wrap up -last- to place the final copy back into the
cache, replacing the document from the winner. No harm no foul.

Bill
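A minimal single-threaded sketch of the set-aside idea, under stated assumptions: the tiny fixed-size table, the function names (`cache_rekey`, `revalidate_begin`, `revalidate_finish`) and the private key string are all hypothetical, and a real implementation would need a per-request private key plus whatever locking the storage provider requires.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 8
#define KEY_LEN 64

struct slot { char key[KEY_LEN]; const char *body; int used; };
struct cache { struct slot slots[MAX_ENTRIES]; };

struct slot *cache_find(struct cache *c, const char *key)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (c->slots[i].used && strcmp(c->slots[i].key, key) == 0)
            return &c->slots[i];
    return NULL;
}

/* Store or replace the body under key; -1 if the table is full. */
int cache_put(struct cache *c, const char *key, const char *body)
{
    struct slot *s = cache_find(c, key);
    for (int i = 0; i < MAX_ENTRIES && s == NULL; i++)
        if (!c->slots[i].used)
            s = &c->slots[i];
    if (s == NULL)
        return -1;
    snprintf(s->key, KEY_LEN, "%s", key);
    s->body = body;
    s->used = 1;
    return 0;
}

void cache_remove(struct cache *c, const char *key)
{
    struct slot *s = cache_find(c, key);
    if (s != NULL)
        s->used = 0;
}

/* Move an entry to a new key, replacing whatever is stored there. */
int cache_rekey(struct cache *c, const char *from, const char *to)
{
    struct slot *s = cache_find(c, from);
    struct slot *old = cache_find(c, to);
    if (s == NULL)
        return -1;
    if (old != NULL && old != s)
        old->used = 0;
    snprintf(s->key, KEY_LEN, "%s", to);
    return 0;
}

/* Before asking the backend: set the stale copy aside (public -> private). */
int revalidate_begin(struct cache *c, const char *pub, const char *priv)
{
    return cache_rekey(c, pub, priv);
}

/* After the backend answers: restore on 304, otherwise replace. */
const char *revalidate_finish(struct cache *c, const char *pub,
                              const char *priv, int status,
                              const char *fresh_body)
{
    if (status == 304) {
        if (cache_rekey(c, priv, pub) != 0)
            return NULL;
        return cache_find(c, pub)->body;
    }
    cache_remove(c, priv);
    if (cache_put(c, pub, fresh_body) != 0)
        return NULL;
    return fresh_body;
}
```

While the entry sits under the private key, a second request sees a plain miss on the public key and fetches its own copy; whichever request finishes last overwrites the public entry, which matches the "no harm no foul" outcome described above.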
Re: [PATCH] #39275 MaxClients on startup [Was: Bug in 2.0.56-dev]
Jeff Trawick wrote:
> I have been working with a user on one of these fork bomb scenarios and
> assumed it was the child_init hook. But after giving them a test fix that
> relies on a child setting scoreboard fields in child_main before
> child-init hooks run, and also adds some debugging traces related to
> calling child-init hooks, it is clear that their stall occurs BEFORE the
> child-init hook. Which leaves a stretch of fairly simple code. (Best
> theory is bad stuff happening in an atfork handler registered by a
> third-party module or some library it uses. But that's beside the
> point.)

after more thought, there is a simpler patch that should do the job. the
key to both of these is how threads in SERVER_DEAD state with a pid in the
scoreboard are treated. this means that p_i_s_m
(perform_idle_server_maintenance) forked on a previous timer pop but some
thread never made it into SERVER_STARTING state.

the difference: this patch just counts those potential threads as idle, and
allows MinSpareThreads worth of processes to be forked before putting on
the brakes. the previous patch pauses the forking immediately when the
strange situation is detected but requires more code and a new variable.

I'm leaning toward this one because it is simpler. opinions?

Greg

--- server/mpm/worker/worker.c (revision 398659)
+++ server/mpm/worker/worker.c (working copy)
@@ -1422,7 +1422,7 @@
  */
 if (ps->pid != 0) { /* XXX just set all_dead_threads in outer for loop if no pid?
                        not much else matters */
-    if (status <= SERVER_READY && status != SERVER_DEAD &&
+    if (status <= SERVER_READY &&
         !ps->quiescing &&
         ps->generation == ap_my_generation) {
         ++idle_thread_count;
Re: Possible new cache architecture
Brian Akins wrote:

>> That's two hits to find whether something is cached.
>
> You must have two hits if you support vary.

You need only one - bring up the original cached entry with the key, and
then use cheap subkeys over a very limited data set to find both the
variants and the header/data.

>> How are races prevented?
>
> shouldn't be any. something is in the cache or not. if one "piece" of
> an http "object" is not valid or in cache, the object is invalid.
> Although other variants may be valid/in cache.

I can think of one race off the top of my head:

- the browser says "send me this URL".

- the cache has it cached, but it's stale, so it asks the backend
"If-None-Match".

- the cache reaper comes along, says "oh, this is stale", and reaps the
cached body (which is independent, remember?). The data is no longer
cached even though the headers still exist.

- The backend says "304 Not Modified".

- the cache says "cool, will send my copy upstream. Oops, where has my
data gone?".

The end user will probably experience this as "oh, the website had a
glitch, let me try again", so it won't be reported as a bug.

Ok, so you tried to lock the body before going to the backend, but
searching for and locking the body would have been an additional wasted
cache hit if the backend answered with its own body. Not to mention
having to write and debug code to do this.

Races need to be properly handled, and atomic cache operations will go a
long way to prevent them.

Regards,
Graham
Re: Possible new cache architecture
On Mon, 01 May 2006 15:46:58 -0400 Brian Akins <[EMAIL PROTECTED]> wrote:

> Graham Leggett wrote:
>
> > That's two hits to find whether something is cached.
>
> You must have two hits if you support vary.
>
> > How are races prevented?
>
> shouldn't be any. something is in the cache or not. if one "piece" of
> an http "object" is not valid or in cache, the object is invalid.
> Although other variants may be valid/in cache.
>

More important, if we stick with the key/data concept it's possible to
implement the header/body relationship under single or multiple keys.

I think Brian wants mod_cache to be only a layer (glue) between the
underlying providers and the cache users. Each set of problems is better
dealt with in its own layer. The storage layer (cache providers) is only
going to worry about storing the key/data pairs (and expiring ?) while the
"protocol" layer (mod_http_cache) will deal with the underlying concepts of
each protocol.

The current design leads to bloat: just look at mem_cache and disk_cache,
both have their own duplicated quirks (serialize/unserialize, et cetera)
and need special handling of the headers and file format. Under the new
design this duplication will be gone; the idea is that we assemble the
HTTP-specific part and generalize the storage part.

--
Davi Arnaut
Re: Possible new cache architecture
William A. Rowe, Jr. wrote:
> And, of course, inserting the hit once it's composed is important, and
> can happen in parallel (3 clients looking for the same, and then fetching
> the same page from the origin). But it's harmless if the insertion is
> mutex protected, and the insertion can only happen once the page is
> fetched complete.

in the case of mod_disk_cache the way I would do it is to have a
deterministic tempfile, rather than use apr_tempfile, and open it EXCL.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
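As an illustration of the deterministic-tempfile idea, here is a rough sketch using plain POSIX `open()` with `O_EXCL` as a stand-in for the APR equivalent (APR exposes a comparable `APR_EXCL` open flag). The `.tmp` suffix and the helper names `tempname_for`, `claim_tempfile` and `publish` are assumptions made for the example, not anything from mod_disk_cache.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Derive the temp file name from the final cache file name, so every
 * process fetching the same object computes the same temp path.
 * Returns 0 on success, -1 if buf is too small. */
int tempname_for(const char *final_path, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s.tmp", final_path);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Try to become the single writer for this object. O_CREAT | O_EXCL makes
 * the create fail with EEXIST when another process already holds the temp
 * file. Returns a file descriptor, or -1 with errno set. */
int claim_tempfile(const char *tmp_path)
{
    assert(tmp_path != NULL);
    return open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
}

/* Once the page is fetched completely, move it into place in one step. */
int publish(const char *tmp_path, const char *final_path)
{
    return rename(tmp_path, final_path);
}
```

A request that gets `EEXIST` knows someone else is already filling the entry and can serve straight from the origin without caching; the winner publishes only after the body is complete, which lines up with the "insertion can only happen once the page is fetched complete" condition quoted above.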
Re: Possible new cache architecture
Brian Akins wrote:
> Graham Leggett wrote:
>
> > That's two hits to find whether something is cached.
>
> You must have two hits if you support vary.

Well, one to three hits. One, if you use an arbitrary page (MRU or most
frequently referenced would be optimal, but it really doesn't matter) and
then determine what varies, and if you are in the right place, or what that
right place is (page by language, or whatever fields it varied by.) Three
hits or more if your variant also varies ;)

> > How are races prevented?
>
> shouldn't be any. something is in the cache or not. if one "piece" of
> an http "object" is not valid or in cache, the object is invalid.
> Although other variants may be valid/in cache.

And, of course, inserting the hit once it's composed is important, and can
happen in parallel (3 clients looking for the same, and then fetching the
same page from the origin). But it's harmless if the insertion is mutex
protected, and the insertion can only happen once the page is fetched
complete.
Re: Possible new cache architecture
Graham Leggett wrote:
> That's two hits to find whether something is cached.

You must have two hits if you support vary.

> How are races prevented?

shouldn't be any. something is in the cache or not. if one "piece" of
an http "object" is not valid or in cache, the object is invalid.
Although other variants may be valid/in cache.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
Graham Leggett wrote:
> Or you can avoid this issue entirely by building a generic cache that
> works with key/subkey/data.

and then you have to find a way to bridge the gap between this interface
and all the key/value caches that currently exist (memcache being the most
popular example).

what if mod_http_cache had a way to "record" its cached objects? It could
keep up with the relationships there. Basically, you have a provider that
has a few functions that get called whenever mod_http_cache caches or
expires an object.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
Brian Akins wrote:
> Nope. Look at the way the current http cache works. An http "object,"
> headers and data, is only valid if both headers and data are valid.

That's two hits to find whether something is cached.

How are races prevented?

Regards,
Graham
Re: Possible new cache architecture
Graham Leggett wrote:
> the independent caching of variants.

The example I posted should address this issue.

I also have some ideas concerning the thundering herd problem; it's just a
matter of whether you think it should be handled in cache or http_cache.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
Davi Arnaut wrote:
>> It's a design flaw to create problems that have to be specially coded
>> around, when you can avoid the problem entirely.
>
> Maybe I'm missing something, what problems do you foresee ?

There are lots of issues that were uncovered when I split the proxy and
cache code for httpd v2.0.

A web cache requires two separately alterable cached entities (headers,
body) just for caching a single variant. This pair of entities needs to
expire and/or be forcibly expired (think Cache-Control no-cache)
atomically.

Sure, you can code and debug a lot of code to try and create the effect of
atomically expiring multiple cache entries at once. Or you can avoid this
issue entirely by building a generic cache that works with
key/subkey/data.

There are a number of other issues that have been listed as bugs since
httpd v1.3 that are still present, most notably the thundering herd
problem, and the independent caching of variants.

There is no point in refactoring the cache code if the new code isn't going
to be significantly better than the existing code.

Regards,
Graham
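One way to picture the key/subkey/data shape argued for above is a single entry that owns all of its parts, so that one operation expires headers and body together. The sketch below is hypothetical throughout (`struct cache_entry`, `entry_set`, `entry_get`, `entry_expire`, the fixed `MAX_SUBKEYS`), and it is single-threaded; a shared cache would still need a lock or an atomic flag around `entry_expire`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SUBKEYS 4

struct subentry {
    const char *subkey;   /* e.g. "headers" or "body" */
    const char *data;
};

/* One cached object: a key plus a small set of subkey/data parts. */
struct cache_entry {
    const char *key;
    int valid;
    struct subentry parts[MAX_SUBKEYS];
    size_t nparts;
};

/* Add or replace one part without touching the others.
 * Returns 0 on success, -1 if the entry has no room left. */
int entry_set(struct cache_entry *e, const char *subkey, const char *data)
{
    assert(e != NULL && subkey != NULL);
    for (size_t i = 0; i < e->nparts; i++) {
        if (strcmp(e->parts[i].subkey, subkey) == 0) {
            e->parts[i].data = data;
            return 0;
        }
    }
    if (e->nparts == MAX_SUBKEYS)
        return -1;
    e->parts[e->nparts].subkey = subkey;
    e->parts[e->nparts].data = data;
    e->nparts++;
    e->valid = 1;
    return 0;
}

/* Fetch one part; NULL if the entry is expired or the part is absent. */
const char *entry_get(const struct cache_entry *e, const char *subkey)
{
    if (!e->valid)
        return NULL;
    for (size_t i = 0; i < e->nparts; i++) {
        if (strcmp(e->parts[i].subkey, subkey) == 0)
            return e->parts[i].data;
    }
    return NULL;
}

/* Expire the whole object: every part disappears in the same step. */
void entry_expire(struct cache_entry *e)
{
    e->valid = 0;
    e->nparts = 0;
}
```

Because the parts hang off one entry, there is no window in which the headers exist but the body has been reaped, and updating headers alone is a single `entry_set` call.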
Re: plain file name of a request
Greg Ames wrote:
> Markus Litz wrote:
>
> > Hello, how can I get only the filename of the requested URI? For
> > example, if "http://www.example.com/test.html" is requested, I only
> > want "test.html". request_rec::filename only gives the full filename
> > on disk.
>
> basename(r->filename)

:) Or portably, apr_filepath_name_get() declared in apr_lib.h
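For illustration only, the operation both answers describe (keep whatever follows the last path separator) can be sketched in portable C as below. `plain_name` is a hypothetical helper written for this example; it is not the APR function and it only handles '/' separators.

```c
#include <assert.h>
#include <string.h>

/* Return a pointer to the last component of path, i.e. the text after
 * the final '/'. The result points into the original string; nothing is
 * copied or modified. A path without any '/' is returned unchanged. */
const char *plain_name(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}
```

Unlike POSIX `basename()`, which is allowed to modify its argument, this helper leaves the input string alone, so it can be applied directly to a string the caller does not own.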
Re: plain file name of a request
Markus Litz wrote:
> Hello, how can I get only the filename of the requested URI? For
> example, if "http://www.example.com/test.html" is requested, I only
> want "test.html". request_rec::filename only gives the full filename
> on disk.

basename(r->filename)

Greg
Re: svn commit: r398494 - in /httpd/site/trunk: docs/security/vulnerabilities_13.html docs/security/vulnerabilities_20.html docs/security/vulnerabilities_22.html xdocs/security/vulnerabilities_22.xml
Mark J Cox wrote:
>> This killed the list of vulnerabilities for all versions. Was this intended?
>> And if yes, where can they be found now?
>
> Must be someone with bad java foo, fixing.
>

Er. ya. It wasn't my intention to break stuff, I just ran build.sh and it
kept saying it wanted to do this

java version "1.5.0_06"
Intel Mac.

How could a version of java change the behavior of the site build stuff?

-Paul
Re: svn commit: r398494 - in /httpd/site/trunk: docs/security/vulnerabilities_13.html docs/security/vulnerabilities_20.html docs/security/vulnerabilities_22.html xdocs/security/vulnerabilities_22.xml
> This killed the list of vulnerabilities for all versions. Was this intended?
> And if yes, where can they be found now?

Must be someone with bad java foo, fixing.

Mark
--
Mark J Cox | www.awe.com/mark
Re: Possible new cache architecture
Davi Arnaut wrote:
> This way it would be possible for one cache to act as a cache of another
> cache provider, mod_mem_cache would work as a small/fast MRU cache for
> mod_disk_cache.

Slightly off subject, but in my testing, mod_disk_cache is much faster than
mod_mem_cache. Thanks to sendfile!

I was thinking about scenarios where each cache had its local cache (disk,
mem, whatever) with memcache behind it. That way each "object" only has to
be generated once for the entire "farm." This would be an easy way to have
a distributed cache. Also, the squid type htcp (or icp) could be a fallback
for the local cache as well without mucking up all the proxy and cache
code.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
On Mon, 01 May 2006 09:02:31 -0400 Brian Akins <[EMAIL PROTECTED]> wrote:

> Here is a scenario. We will assume a cache "hit."

I think the usage scenario is clear. Moving on, I would like to be able to
stack up the cache providers (like the apache filter chain). Basically,
mod_cache will expose the functions:

add(key, value, expiration, flag)
get(key)
remove(key)

mod_cache will then pass the request (add/get or remove) down the chain,
similar to the apache filter chain. ie:

apr_status_t mem_cache_get_filter(ap_cache_filter_t *f,
                                  apr_bucket_brigade *bb, ...);
apr_status_t disk_cache_get_filter(ap_cache_filter_t *f,
                                   apr_bucket_brigade *bb, ...);

This way it would be possible for one cache to act as a cache of another
cache provider, mod_mem_cache would work as a small/fast MRU cache for
mod_disk_cache.

--
Davi Arnaut
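To make the stacking idea concrete, here is a small self-contained sketch of two providers chained together, where a hit in the lower one is copied into the one above it. Everything in it is an assumption for illustration: the `struct provider` layout, the `provider_*` and `chain_*` names, the omission of the expiration and flag arguments, and the round-robin overwrite that stands in for a real MRU policy. It does not use the filter-style prototypes from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 4

struct kv { const char *key; const char *value; };

/* One hypothetical cache provider; "below" is the next provider in the
 * stack (for example a small memory cache with a disk cache below it). */
struct provider {
    struct kv slots[SLOTS];
    size_t victim;              /* next slot to overwrite, round-robin */
    struct provider *below;
};

const char *provider_get(const struct provider *p, const char *key)
{
    for (size_t i = 0; i < SLOTS; i++)
        if (p->slots[i].key != NULL && strcmp(p->slots[i].key, key) == 0)
            return p->slots[i].value;
    return NULL;
}

/* Replace the value if the key exists, else overwrite the victim slot. */
void provider_add(struct provider *p, const char *key, const char *value)
{
    assert(key != NULL);
    for (size_t i = 0; i < SLOTS; i++) {
        if (p->slots[i].key != NULL && strcmp(p->slots[i].key, key) == 0) {
            p->slots[i].value = value;
            return;
        }
    }
    p->slots[p->victim].key = key;
    p->slots[p->victim].value = value;
    p->victim = (p->victim + 1) % SLOTS;
}

void provider_remove(struct provider *p, const char *key)
{
    for (size_t i = 0; i < SLOTS; i++) {
        if (p->slots[i].key != NULL && strcmp(p->slots[i].key, key) == 0) {
            p->slots[i].key = NULL;
            p->slots[i].value = NULL;
        }
    }
}

/* get: walk down the chain; on a hit, copy the value into every provider
 * above the one that answered, so the next lookup stops earlier. */
const char *chain_get(struct provider *top, const char *key)
{
    for (struct provider *p = top; p != NULL; p = p->below) {
        const char *v = provider_get(p, key);
        if (v != NULL) {
            for (struct provider *q = top; q != p; q = q->below)
                provider_add(q, key, v);
            return v;
        }
    }
    return NULL;
}

/* remove: pass the request to every provider in the chain. */
void chain_remove(struct provider *top, const char *key)
{
    for (struct provider *p = top; p != NULL; p = p->below)
        provider_remove(p, key);
}
```

The caller only ever talks to the top of the chain, which is the property that lets a memory provider act as a front cache for a disk provider without either one knowing about the other.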
Re: Possible new cache architecture
On Mon, 01 May 2006 14:51:53 +0200 Graham Leggett <[EMAIL PROTECTED]> wrote:

> Davi Arnaut wrote:
>
> >> mod_cache need not be HTTP specific, it only needs the ability to cache
> >> multiple entities (data, headers) under the same key, and be able to
> >> replace zero or more entities independently of the other entities (think
> >> updating headers without updating content).
> >
> > mod_cache needs only to cache key/value pairs. The key/value format is up to
> > the mod_cache user.
>
> It's a design flaw to create problems that have to be specially coded
> around, when you can avoid the problem entirely.

Maybe I'm missing something, what problems do you foresee ?

> The cache needs to be generic, yes - but there is no need to stick to
> the "key/value" cliché of cache code, if a variation to this is going to
> make your life significantly easier.
>

And the variation is..?

--
Davi Arnaut
Re: Possible new cache architecture
Here is a scenario. We will assume a cache "hit."

Client asks for http://domain/uri.html?args

mod_http_cache generates a key: http-domain-uri.html-args-header
asks mod_cache for value with this key.
mod_cache fetches the value, looks at expire time, it's good, and returns
the "blob"
mod_http_cache examines blob, it's vary information on Accept-Encoding.

mod_http_cache generates a new key: http-domain.html-args-header-gzip
(value from client)
asks mod_cache for value with this key.
mod_cache fetches the value, looks at expire time, it's good, and returns
the "blob"
mod_http_cache examines blob, it's a normal header blob. does not "meet
conditions", so it needs to get data.

mod_http_cache generates a new key: http-domain.html-args-data-gzip (value
from client)
asks mod_cache for value with this key.
mod_cache fetches the value, looks at expire time, it's good, and returns
the "blob"
mod_http_cache returns headers and data to client.

Notice there is a pattern to this...

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
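The pattern in the scenario (derive a flat key, ask the generic cache, derive the next key) could be sketched as a single key-building helper. This is a guess at the scheme rather than anything specified in the thread: `make_key` and its parameters are hypothetical, and the layout follows the first key in the scenario (`http-domain-uri.html-args-header`), with the variant value appended when there is one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a flat cache key of the form
 *   scheme-host-uri-args-part[-variant]
 * where part is "header" or "data" and variant is the client's value
 * for the varied header (for example "gzip"), or NULL for none.
 * Returns 0 on success, -1 if the key does not fit in buf. */
int make_key(char *buf, size_t len,
             const char *scheme, const char *host, const char *uri,
             const char *args, const char *part, const char *variant)
{
    int n;

    assert(buf != NULL && len > 0);
    if (variant != NULL && variant[0] != '\0')
        n = snprintf(buf, len, "%s-%s-%s-%s-%s-%s",
                     scheme, host, uri, args, part, variant);
    else
        n = snprintf(buf, len, "%s-%s-%s-%s-%s",
                     scheme, host, uri, args, part);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

As far as the generic cache is concerned, each key produced this way names an independent entry; only the HTTP layer knows that the header key, the variant header key and the variant data key belong to one object.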
[Fwd: svn commit: r398585 - in /httpd/site/trunk: docs/download.html docs/index.html xdocs/download.xml xdocs/index.xml]
> +Win32 Binary (Self extracting): <a href="[preferred]/httpd/binaries/win32/apache_1.3.35-win32-x86-no_src.exe">apache_1.3.35-win32-x86-no_src.exe</a>

There is no more .exe (and won't be). By 2006 everyone has at least msiexec
1.10 installed ;-)

Only -src.msi and -no_src.msi remain for 1.3, while 2.0 and 2.2 will have
-ssl.msi and -no-ssl.msi flavors.

Bill
Re: Possible new cache architecture
Davi Arnaut wrote:
> mod_cache needs only to cache key/value pairs. The key/value format is up
> to the mod_cache user.

correct.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Possible new cache architecture
Davi Arnaut wrote:
>> mod_cache need not be HTTP specific, it only needs the ability to cache
>> multiple entities (data, headers) under the same key, and be able to
>> replace zero or more entities independently of the other entities (think
>> updating headers without updating content).
>
> mod_cache needs only to cache key/value pairs. The key/value format is up
> to the mod_cache user.

It's a design flaw to create problems that have to be specially coded
around, when you can avoid the problem entirely.

The cache needs to be generic, yes - but there is no need to stick to the
"key/value" cliché of cache code, if a variation to this is going to make
your life significantly easier.

Regards,
Graham
Re: Possible new cache architecture
Graham Leggett wrote:
> The potential danger with this is for race conditions to happen while
> expiring cache entries. If the data entity expired before the header
> entity, it potentially could confuse the cache - is the entry cached or
> not? The headers say yes, data says no.

Nope. Look at the way the current http cache works. An http "object,"
headers and data, is only valid if both headers and data are valid.

> Each variant should be an independent cached entry, the cache should
> allow different variants to be cached side by side.

Yes. Each is distinguished by its key. As far as mod_cache is concerned
these are 3 independent entries, but mod_http_cache knows how to "stitch"
them together.

mod_cache should *not* be HTTP specific in any way.

> mod_cache need not be HTTP specific, it only needs the ability to cache
> multiple entities (data, headers) under the same key,

No.

> In other words, there must be the ability to cache by a key and a subkey.

No. mod_http_cache generates new keys for headers (key.header), data
(key.data) and each variant (key1.header, key2.header, key1.data... etc.).
As far as the underlying generic cache is concerned, they are all
independent entries.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: svn commit: r398492 - in /httpd/site/trunk: docs/download.html docs/index.html xdocs/download.xml xdocs/index.xml
On 05/01/2006 03:25 AM, [EMAIL PROTECTED] wrote:
> Author: pquerna
> Date: Sun Apr 30 18:25:38 2006
> New Revision: 398492
>
> URL: http://svn.apache.org/viewcvs?rev=398492&view=rev
> Log:
> Rev website for 2.2.2
>
> Modified:
> httpd/site/trunk/docs/download.html
> httpd/site/trunk/docs/index.html
> httpd/site/trunk/xdocs/download.xml
> httpd/site/trunk/xdocs/index.xml

I see that 2.2.2 and 2.0.58 are announced. What about 1.3.35? Did it not
hit the mirrors in time?

Regards

Rüdiger
Re: svn commit: r398494 - in /httpd/site/trunk: docs/security/vulnerabilities_13.html docs/security/vulnerabilities_20.html docs/security/vulnerabilities_22.html xdocs/security/vulnerabilities_22.xml
On 05/01/2006 03:32 AM, [EMAIL PROTECTED] wrote:
> Author: pquerna
> Date: Sun Apr 30 18:32:18 2006
> New Revision: 398494
>
> URL: http://svn.apache.org/viewcvs?rev=398494&view=rev
> Log:
> rebuild all.
>
> Modified:
> httpd/site/trunk/docs/security/vulnerabilities_13.html
> httpd/site/trunk/docs/security/vulnerabilities_20.html
> httpd/site/trunk/docs/security/vulnerabilities_22.html
> httpd/site/trunk/xdocs/security/vulnerabilities_22.xml

This killed the list of vulnerabilities for all versions. Was this
intended? And if yes, where can they be found now?

Anyway, many thanks for doing this release work :-).

Regards

Rüdiger