Re: mod_cache and its ilk
Justin Erenkrantz wrote:
> +1. I would prefer that all bells and whistles be kept out of
> mod_disk_cache and we provide another alternative that has more
> complex behaviors that scales a little better for those who care.
> (That was the whole reason behind a versioned provider interface to
> begin with!)

In that case I suggest we revert mod_disk_cache back to how it was before the large-file patches (but after the sets of bugfixes), and move the current modified mod_disk_cache to mod_large_disk_cache for further review?

> > 1) cannot write entire response to disk for any content type before
> > sending anything to the client; filter acts by writing to cache and
> > client synchronously
>
> My concern with this is we should be careful not to teach the
> providers about the fact that it is sitting in an output filter
> chain. In other words, I would prefer that we create an abstraction
> that does not force the providers to have to worry about passing
> brigades down a filter chain or dealing with filter errors. It just
> needs to stash some data away, and that's all I want the provider to
> be concerned with. Perhaps we provide an optional continuation
> function to the store_body() analogue that the provider can call
> after it writes a bucket, which mod_cache can then pass along down
> the filter chain on its behalf. Otherwise, we'd have way too much
> duplicate code to deal with, and that concerns me.
>
> The implementation that was recently checked in makes it so that we
> cannot distinguish between a failed cache storage and a filter error.
> I'd like to be able to maintain knowing the difference and not have a
> failed cache storage affect the client. See my add'l item #5 below.

My original solution for this was, inside mod_cache, to loop through the brigade and split buckets larger than a threshold before giving the buckets to cache_body() and the network in turn. cache_body() would then be guaranteed that buckets would never exceed a certain size. That size was configurable via a directive.
At the time I did not know that the apr_bucket_read() function did exactly this anyway - this behaviour is undocumented in doxygen - and apparently 4MB is the maximum size that apr_bucket_read() will return. Based on this, we could remove the directive, default it to 4MB (or the same constant used in apr_bucket_read()), and use apr_bucket_split(). This has the advantage of keeping the API clean as before, and it ensures that cache providers don't have to care about large buckets. The cache also doesn't have to care how 4.7GB of buckets are going to be stored temporarily between write-to-cache and write-to-network, as the buckets so split are deleted after each iteration.

> > 2) avoids writing a given URL/variant to the cache more than once
> > simultaneously using open/O_EXCL
>
> There are problems with relying upon O_EXCL. mod_disk_cache
> purposefully lets the race condition happen, as without inter-process
> sync it's not really easy to know who is actually around and is
> likely to finish. So, mod_disk_cache lets the last guy to succeed
> 'win'. I think this leads to a far simpler design within
> mod_disk_cache (it trades writing duplicates of the cached entity for
> a higher chance of success), but I'd be curious to see if you can
> come up with a scalable design using O_EXCL. I tried before and never
> came up with a clean or scalable design, as the solutions with
> O_EXCL placed too much faith in the fact that one selected client
> will actually finish. Although, at the end of the day, I'm willing to
> sacrifice that (and let another provider solve it), I'd like to at
> least explore solutions that don't.

I'm not convinced the original last-guy-wins design was a good idea. On sites with high load, it meant many simultaneous cache writers would race to complete, with number_of_attempts * size of disk space used unnecessarily, and a potentially large load spike to the backend.
People with high-load sites cite this as a key reason not to use the cache, as the resultant load spike to their backends cannot be sustained. large_disk_cache has made an attempt to address this issue; where the above is a problem, large_disk_cache can step in.

> > 3) for the case where multiple simultaneous requests are made for
> > an uncached but cacheable entity, will simply pass-through all
> > requests for which the open/O_EXCL fails and does not attempt to
> > read from an as-yet-incomplete cache file
>
> +1 modulo if you can get a clean design per #2 above.
>
> I just noticed that someone introduced an "open_header_timeout" which
> incurs a penalty when a partial file is present (!). So, let me add
> the following constraint derived from #3:
>
> 4) No penalties are applied to the client when we do not have a
> complete cached entity on hand.
>
> Sleeping is never an acceptable solution. If we don't have it, it's
> okay to serve directly and avoid the cache. mod_disk_cache takes the
> opportunity to try to recache, but avoiding the cache is okay too.
Re: mod_cache and its ilk
On 10/30/06, Joe Orton <[EMAIL PROTECTED]> wrote:
> On Mon, Oct 30, 2006 at 10:26:18AM -0700, Justin Erenkrantz wrote:
> > My concern with this is we should be careful not to teach the
> > providers about the fact that it is sitting in an output filter
> > chain.
>
> This is solvable if desired, but I'd like to address it separately to
> keep this thread under control.

Sure.

> > > 2) avoids writing a given URL/variant to the cache more than once
> > > simultaneously using open/O_EXCL
> >
> > There's problems with relying upon O_EXCL. mod_disk_cache
> > purposefully lets the race condition happen as without
> > inter-process sync, it's not really easy to know who is actually
> > around and is likely to finish.
>
> Are you talking issues with O_EXCL not being reliable on e.g. NFS, or
> just about the fact that the partial file left by a dead/hung process
> subsequently prevents any chance to write to that cache file?

The latter. (NFS is a lower priority - I'd like to see it work, but if it can't, no biggie, IMO.)

> This is a tricky trade-off; I can go either way. I would say it would
> be OK to rely on htcacheclean having heuristics to expire
> (prematurely if necessary) such partial files.

Hmm. Maybe, but that's a bit more complex than I'd like to see. *shrug* -- justin
Re: mod_cache and its ilk
On Mon, Oct 30, 2006 at 10:26:18AM -0700, Justin Erenkrantz wrote:
> On 10/30/06, Joe Orton <[EMAIL PROTECTED]> wrote:
> > 1) cannot write entire response to disk for any content type before
> > sending anything to the client; filter acts by writing to cache and
> > client synchronously
>
> My concern with this is we should be careful not to teach the
> providers about the fact that it is sitting in an output filter
> chain.

This is solvable if desired, but I'd like to address it separately to keep this thread under control.

> > 2) avoids writing a given URL/variant to the cache more than once
> > simultaneously using open/O_EXCL
>
> There's problems with relying upon O_EXCL. mod_disk_cache
> purposefully lets the race condition happen as without inter-process
> sync, it's not really easy to know who is actually around and is
> likely to finish.

Are you talking issues with O_EXCL not being reliable on e.g. NFS, or just about the fact that the partial file left by a dead/hung process subsequently prevents any chance to write to that cache file?

This is a tricky trade-off; I can go either way. I would say it would be OK to rely on htcacheclean having heuristics to expire (prematurely if necessary) such partial files.

> 4) No penalties are applied to the client when we do not have a
> complete cached entity on hand.

+1

> 5) If writing to the cache fails, the client doesn't need to know
> that, but the admin should get a nice note somewhere.

+1

joe
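For reference, the open/O_EXCL single-writer scheme the thread keeps returning to can be sketched in plain POSIX C. This is only an illustration - mod_disk_cache would go through APR's file APIs, and the function name here is made up:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Attempt to become the single writer for a cache entry.
 * Returns an open fd on success, or -1 if another process already
 * created the file (EEXIST) - the caller should then pass the
 * response through without touching the cache. */
static int try_exclusive_create(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1 && errno == EEXIST) {
        return -1;  /* someone else is (or was) writing this entry */
    }
    return fd;
}
```

The hazard Justin raises is visible in this sketch: if the winning process dies, the leftover partial file makes every later try_exclusive_create() fail until something (e.g. htcacheclean heuristics, as Joe suggests) removes it.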
Re: mod_cache and its ilk
On 10/30/06, Joe Orton <[EMAIL PROTECTED]> wrote:
> A simple general-purpose disk cache which makes no assumptions about
> speed of backend, speed of storage or speed of clients; is
> single-threaded and does not involve any multi-process
> synchronisation beyond open/O_EXCL. Specifically:

+1. I would prefer that all bells and whistles be kept out of mod_disk_cache and we provide another alternative that has more complex behaviors that scales a little better for those who care. (That was the whole reason behind a versioned provider interface to begin with!)

> 1) cannot write entire response to disk for any content type before
> sending anything to the client; filter acts by writing to cache and
> client synchronously

My concern with this is we should be careful not to teach the providers about the fact that it is sitting in an output filter chain. In other words, I would prefer that we create an abstraction that does not force the providers to have to worry about passing brigades down a filter chain or dealing with filter errors. It just needs to stash some data away, and that's all I want the provider to be concerned with. Perhaps we provide an optional continuation function to the store_body() analogue that the provider can call after it writes a bucket, which mod_cache can then pass along down the filter chain on its behalf. Otherwise, we'd have way too much duplicate code to deal with, and that concerns me.

The implementation that was recently checked in makes it so that we cannot distinguish between a failed cache storage and a filter error. I'd like to be able to maintain knowing the difference and not have a failed cache storage affect the client. See my add'l item #5 below.

> 2) avoids writing a given URL/variant to the cache more than once
> simultaneously using open/O_EXCL

There are problems with relying upon O_EXCL.
mod_disk_cache purposefully lets the race condition happen, as without inter-process sync it's not really easy to know who is actually around and is likely to finish. So, mod_disk_cache lets the last guy to succeed 'win'. I think this leads to a far simpler design within mod_disk_cache (it trades writing duplicates of the cached entity for a higher chance of success), but I'd be curious to see if you can come up with a scalable design using O_EXCL. I tried before and never came up with a clean or scalable design, as the solutions with O_EXCL placed too much faith in the fact that one selected client will actually finish. Although, at the end of the day, I'm willing to sacrifice that (and let another provider solve it), I'd like to at least explore solutions that don't.

> 3) for the case where multiple simultaneous requests are made for an
> uncached but cacheable entity, will simply pass-through all requests
> for which the open/O_EXCL fails and does not attempt to read from an
> as-yet-incomplete cache file

+1 modulo if you can get a clean design per #2 above.

I just noticed that someone introduced an "open_header_timeout" which incurs a penalty when a partial file is present (!). So, let me add the following constraint derived from #3:

4) No penalties are applied to the client when we do not have a complete cached entity on hand.

Sleeping is never an acceptable solution. If we don't have it, it's okay to serve directly and avoid the cache. mod_disk_cache takes the opportunity to try to recache, but avoiding the cache is okay too; both are acceptable solutions, IMO. Sleeping is an absolute no-no.

And, derived from the issue with the failure scenario:

5) If writing to the cache fails, the client doesn't need to know that, but the admin should get a nice note somewhere.

On the surface, this speaks against the storage operation being a filter in the traditional sense.
> A disk cache which makes different assumptions about speed of backend
> or uses more complex caching tricks should be in the tree as a
> different provider in a module with a different name; the exact
> assumptions made need to be well documented. (Maybe something like
> "mod_local_disk_cache" would be appropriate for the name?)
>
> Needless to say, code making bogus assumptions about the output
> filtering interface has no place in the tree and fails the
> "correctness" test.

+1. -- justin
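Justin's continuation-function idea could look roughly like the following sketch. This is plain C with entirely hypothetical names (the real store_body() provider interface differs): the provider only stashes bytes, and the optional callback is how mod_cache - not the provider - pushes the same bytes down the filter chain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback supplied by mod_cache: after the provider stores a chunk,
 * mod_cache forwards the same bytes down the output filter chain. */
typedef int (*cache_continuation_fn)(void *filter_ctx,
                                     const char *buf, size_t len);

/* Hypothetical provider-side store routine: it only knows how to
 * stash bytes; all filter-chain concerns live inside the callback. */
static int store_body_chunk(char *store, size_t *stored,
                            const char *buf, size_t len,
                            cache_continuation_fn cont, void *filter_ctx)
{
    memcpy(store + *stored, buf, len);  /* "write to the cache" */
    *stored += len;
    if (cont) {                         /* let mod_cache send it on */
        return cont(filter_ctx, buf, len);
    }
    return 0;
}

/* Demo continuation standing in for "pass down the filter chain". */
static char sent[64];
static size_t sent_len;
static int demo_send(void *ctx, const char *buf, size_t len)
{
    (void)ctx;
    memcpy(sent + sent_len, buf, len);
    sent_len += len;
    return 0;
}
```

With this shape, a cache-storage failure (memcpy stand-in) and a filter error (callback return value) stay distinguishable, which is the property Justin wants to preserve.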
Re: mod_cache and its ilk
On Mon, October 30, 2006 1:03 pm, Joe Orton wrote:
>> On Mon, October 30, 2006 12:07 pm, Joe Orton wrote:
>> > 1) cannot write entire response to disk for any content type
>> > before sending anything to the client; filter acts by writing to
>> > cache and client synchronously
>>
>> Justin vetoed this.
>
> Can we limit this thread to just gaining consensus on the goals?
>
> Justin vetoed a change which implemented a specific fix for a memory
> consumption issue in the cache; there are any number of different
> ways to fix that issue which are not contrary to the above goal.

Ok.

Regards, Graham --
Re: mod_cache and its ilk
On Mon, Oct 30, 2006 at 12:18:30PM +0200, Graham Leggett wrote:
> On Mon, October 30, 2006 12:07 pm, Joe Orton wrote:
> > 1) cannot write entire response to disk for any content type before
> > sending anything to the client; filter acts by writing to cache and
> > client synchronously
>
> Justin vetoed this.

Can we limit this thread to just gaining consensus on the goals?

Justin vetoed a change which implemented a specific fix for a memory consumption issue in the cache; there are any number of different ways to fix that issue which are not contrary to the above goal.

joe
Re: mod_cache and its ilk
On Mon, October 30, 2006 12:07 pm, Joe Orton wrote:
> Thanks Roy. So, the goals for mod_disk_cache as I see it:
>
> A simple general-purpose disk cache which makes no assumptions about
> speed of backend, speed of storage or speed of clients; is
> single-threaded and does not involve any multi-process
> synchronisation beyond open/O_EXCL. Specifically:

So, as has been suggested before, I suggest we have mod_disk_cache that implements the cache as described by Joe here, along with fixes to ensure that it works (it currently doesn't). Then, we have mod_large_disk_cache that does the extra stuff contributed by Niklas Edmundsson. The end user can then make an intelligent choice as to the best cache that fits their needs.

> 1) cannot write entire response to disk for any content type before
> sending anything to the client; filter acts by writing to cache and
> client synchronously

Justin vetoed this. mod_cache itself should be responsible for breaking any large buckets into bite-size chunks, to prevent the cache module from having to care about large responses. This maintains the abstraction between write-to-cache and write-to-network.

> A disk cache which makes different assumptions about speed of backend
> or uses more complex caching tricks should be in the tree as a
> different provider in a module with a different name; the exact
> assumptions made need to be well documented. (Maybe something like
> "mod_local_disk_cache" would be appropriate for the name?)

+1 - I suggested mod_large_disk_cache, as it is tailored for large responses, but I'm not attached to any specific name.

> Needless to say, code making bogus assumptions about the output
> filtering interface has no place in the tree and fails the
> "correctness" test.

Obviously, thus the notify solution.

Regards, Graham --
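The bite-size-chunk idea above can be illustrated with a small sketch. This is plain C with made-up names - real code would use apr_bucket_split() on a brigade, and the 4MB figure is the apr_bucket_read() limit mentioned earlier in the thread:

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_MAX ((size_t)4 * 1024 * 1024)  /* mirrors the 4MB apr_bucket_read() limit */

/* Walk a "bucket" of total_len bytes, splitting it into pieces no
 * larger than CHUNK_MAX.  Each piece is delivered (write-to-cache,
 * then write-to-network) and then dropped, so at most CHUNK_MAX bytes
 * are ever held between the two writes, however large the response.
 * Returns the number of pieces produced. */
static size_t deliver_in_chunks(size_t total_len,
                                void (*deliver)(size_t piece_len))
{
    size_t pieces = 0;
    while (total_len > 0) {
        size_t piece = total_len > CHUNK_MAX ? CHUNK_MAX : total_len;
        if (deliver) {
            deliver(piece);   /* write-to-cache, then write-to-network */
        }
        total_len -= piece;   /* the split-off bucket is deleted here */
        pieces++;
    }
    return pieces;
}
```

The point of the abstraction is visible in the signature: the cache provider behind deliver() never sees a piece larger than CHUNK_MAX, so it needs no special handling for a 4.7GB response.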
Re: mod_cache and its ilk
Thanks Roy. So, the goals for mod_disk_cache as I see it:

A simple general-purpose disk cache which makes no assumptions about speed of backend, speed of storage or speed of clients; is single-threaded and does not involve any multi-process synchronisation beyond open/O_EXCL. Specifically:

1) cannot write entire response to disk for any content type before sending anything to the client; filter acts by writing to cache and client synchronously

2) avoids writing a given URL/variant to the cache more than once simultaneously using open/O_EXCL

3) for the case where multiple simultaneous requests are made for an uncached but cacheable entity, will simply pass-through all requests for which the open/O_EXCL fails and does not attempt to read from an as-yet-incomplete cache file

A disk cache which makes different assumptions about speed of backend or uses more complex caching tricks should be in the tree as a different provider in a module with a different name; the exact assumptions made need to be well documented. (Maybe something like "mod_local_disk_cache" would be appropriate for the name?)

Needless to say, code making bogus assumptions about the output filtering interface has no place in the tree and fails the "correctness" test.

Regards, joe
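Goal 1, together with the failure-isolation constraint raised elsewhere in the thread (a cache write failure must not affect the client), might look roughly like this. A plain-C sketch with hypothetical names, not the actual filter code: each chunk goes to the cache and to the client in the same pass, and a cache error abandons caching while the client keeps being served.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*write_fn)(void *ctx, const char *buf, size_t len);

/* Serve one response chunk-by-chunk, writing each chunk to the cache
 * and to the client synchronously.  A cache-write failure stops
 * caching (the admin would get a log note) but does not affect the
 * client; only a client/filter error is fatal. */
static int serve_chunks(const char *body, size_t len, size_t chunk,
                        write_fn to_cache, void *cache_ctx,
                        write_fn to_client, void *client_ctx)
{
    int caching = (to_cache != NULL);
    size_t off = 0;
    while (off < len) {
        size_t n = (len - off) > chunk ? chunk : (len - off);
        if (caching && to_cache(cache_ctx, body + off, n) != 0) {
            caching = 0;   /* cache failure: abandon, client unaffected */
        }
        if (to_client(client_ctx, body + off, n) != 0) {
            return -1;     /* filter error is the one that matters */
        }
        off += n;
    }
    return 0;
}

/* Demo sinks for illustration. */
static char client_buf[64];
static size_t client_len;
static int client_write(void *ctx, const char *buf, size_t len)
{
    (void)ctx;
    memcpy(client_buf + client_len, buf, len);
    client_len += len;
    return 0;
}

static size_t cache_calls;
static int failing_cache_write(void *ctx, const char *buf, size_t len)
{
    (void)ctx; (void)buf; (void)len;
    return ++cache_calls > 1 ? -1 : 0;  /* fail on the second chunk */
}
```

Because nothing is buffered beyond the current chunk, this never writes the entire response to disk before sending anything to the client, which is the assumption-free behaviour goal 1 asks for.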
Re: mod_cache and its ilk
Roy T. Fielding wrote:
> As far as *I* am concerned, changes to the cache code must be correct
> first and then perform second, and both of those should be proven by
> actual testing before being committed to trunk.

+1.

We have an existing cache that breaks in real-world environments. We have a contributed patch set from Niklas Edmundsson that addresses these issues and is used in production. It works. A significant amount of work has been done to ensure that after each patch was committed, the code was tested and still worked. It works for me, very well.

We have some very valid objections to some of the methods used in this patch set; based on these objections, a major part of one patch was rewritten, and the last patch in the set was never committed. We also have some very clear things that the patch is not allowed to do - including, but not limited to, threading and forking.

In response to the above, the following has been identified:

- APR needs significantly improved documentation attached to its doxygen comments.

- APR needs a notifier API to determine whether ap_core_output_filter() will block. This addresses Joe's objection to the assumption that ap_core_output_filter() won't block on files. It also removes the need for any threading or forking.

- The notifier is also needed so that the need to fstat and then sleep is removed.

Further work is being done to solve the above issues, but too few people are testing this code. This is another call to download trunk, try it out, and report any issues encountered so that they may be fixed.

Regards, Graham