Graham Leggett wrote:
>> For the expiration case, there's a much easier solution than shadowing
>> the incomplete response. Add a new state for cache entries:
>> "being_updated." When you get a request for a cached object that's past
>> its expiration date, set the cache entry's state to "being_updated" and
>> start retrieving the new content. Meanwhile, as other threads handle
>> requests for the same object, they check the state of the cache entry
>> and, because it's currently being updated, they deliver the old copy
>> from the cache rather than dispatching the request to the backend
>> system that's already working on a different instance of the same
>> request. As long as the thread that's getting the new content can
>> replace the old content with the new content atomically, there's no
>> reason to make any other threads wait for the new content.
>
> Hmmm... ok, not a bad idea, but I see a few hassles.
>
> What if a user force-reloads the page? In theory the cached copy
> should be expired immediately - but here it isn't, because shadow
> threads need to access the cached content in the meantime.
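For concreteness, the scheme described above - a "being_updated" flag plus an atomic content swap, with a timestamp so that a hung updater can eventually be taken over - might look roughly like this. Python here is just pseudocode; ToyCache, CacheEntry, and RETRY_AFTER are invented names for illustration, not actual mod_cache internals:

```python
import threading
import time

class CacheEntry:
    def __init__(self, content, ttl):
        self.content = content
        self.expires = time.time() + ttl
        self.being_updated = False    # set while one thread refreshes
        self.update_started = 0.0     # timestamp for the takeover rule
        self.lock = threading.Lock()  # makes flag checks/sets atomic

class ToyCache:
    RETRY_AFTER = 30  # 'n' seconds before another thread may take over

    def __init__(self):
        self.entries = {}
        self.lock = threading.Lock()

    def get(self, key, fetch, ttl=60):
        with self.lock:
            entry = self.entries.get(key)
        if entry is None:
            # cache miss: fetch and store (shadowing not modeled here)
            entry = CacheEntry(fetch(), ttl)
            with self.lock:
                self.entries[key] = entry
            return entry.content
        now = time.time()
        with entry.lock:
            fresh = now < entry.expires
            # claim the refresh if nobody has, or if the current
            # updater looks hung (timestamp older than RETRY_AFTER)
            claim = (not fresh) and (
                not entry.being_updated
                or now - entry.update_started > self.RETRY_AFTER)
            if claim:
                entry.being_updated = True
                entry.update_started = now
        if claim:
            new_content = fetch()  # slow backend fetch, outside the lock
            with entry.lock:
                entry.content = new_content
                entry.expires = time.time() + ttl
                entry.being_updated = False
            return new_content
        # fresh hit, or stale while another thread is updating:
        # serve the old copy without waiting
        return entry.content

# usage demo: each fetch() call simulates one trip to the backend
cache = ToyCache()
calls = []
def fetch():
    calls.append(1)
    return "v%d" % len(calls)
```

The key property is that only the thread that wins the claim ever talks to the backend; every other thread is served the old copy immediately, which is exactly the "don't make anyone wait" behavior described above.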
I actually think we *shouldn't* expire the cached copy in this case. I
want the server, rather than the client, to decide whether to expire a
cache entry. If the client gets to dictate the expiration semantics,
then we're vulnerable to a DoS attack: all a malicious client would
need to do is submit lots of requests that invalidate cache entries.

> Also - there will be a load spike until the first page is cached (in
> the case where no previous cache existed).

Yep. I think the ideal solution would be to combine this technique with
shadowing support. Then mod_cache could do one of four things depending
on the cache state:

* cache hit, normal case: deliver the object from cache
* cache hit, while another thread is updating: deliver the old object
* cache miss, normal case: go retrieve the content
* cache miss, while another thread is retrieving: shadow the response
  in progress

> What happens if an attempt to update an expired cached page hangs? The
> proxy could find itself serving stale content for a long time while a
> timeout occurs.

One solution is to add a timestamp to the "being_updated" flag. If
another thread sees that the flag is set but the timestamp is more than
'n' seconds old, that thread sets the timestamp to the current time and
tries to retrieve the object itself. (This logic just needs to be
atomic so that we don't end up with multiple threads all deciding to
retrieve the content at once.)

>> * It's going to take a while to make the shadowing work
>> (due to all the race conditions that have to be addressed).
>
> I don't see any significant race conditions though.
>
> All that needs to happen is that all cached responses are
> readable-from-the-cache the moment they are created, rather than the
> moment the download is complete. A flag against the cache entry marks
> it as "still busy". If this flag is set, shadowing threads know to
> keep waiting for more data to appear in the cache.
> If the flag is not set, the shadow thread is a normal CACHE_OUT case,
> and finishes up the request knowing it to be complete.

The race conditions arise in cases where, for example, we find out
halfway through a streamed request that it isn't cacheable after all
because it has just exceeded the maximum cache size. We won't be
caching this response permanently, but we do need to keep the buffered
buckets around until every thread that is shadowing the response has
finished using them, and then, once the last of those threads is done,
delete the saved brigade.

But if the response is *really* large, we can't keep the whole thing in
memory until the last thread finishes delivering it. Instead, we'd have
to incrementally delete buckets from the start of the brigade as all
the threads finish with each one (and stop new threads from shadowing
the response from the moment we first delete a bucket). This can all be
done with reference counts and careful synchronization, but it's going
to be challenging to debug.

Brian
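P.S. The incremental-cleanup idea can be sketched like this, with Python standing in for APR bucket brigades (SharedBrigade and all its names are invented for illustration; the real thing would use refcounts in C):

```python
import threading

class SharedBrigade:
    """Toy model of the incremental cleanup described above: buckets at
    the head of the brigade are freed once every shadowing thread has
    consumed them, and once trimming starts no new shadows may join."""

    def __init__(self):
        self.buckets = []      # buckets not yet freed
        self.base = 0          # stream index of buckets[0]
        self.cursors = {}      # reader id -> next stream index to consume
        self.trimmed = False   # True once any head bucket has been freed
        self.lock = threading.Lock()

    def add_reader(self, rid):
        with self.lock:
            if self.trimmed:
                return False   # too late to start shadowing this response
            self.cursors[rid] = self.base
            return True

    def append(self, bucket):
        with self.lock:
            self.buckets.append(bucket)

    def read(self, rid):
        """Return the next bucket for this reader, or None if drained."""
        with self.lock:
            i = self.cursors[rid]
            if i - self.base >= len(self.buckets):
                return None
            bucket = self.buckets[i - self.base]
            self.cursors[rid] = i + 1
            self._trim()
            return bucket

    def done(self, rid):
        with self.lock:
            del self.cursors[rid]
            self._trim()

    def _trim(self):
        # free head buckets that every remaining reader has passed
        if not self.cursors:
            if self.buckets:
                self.trimmed = True
            self.buckets.clear()
            return
        low = min(self.cursors.values())
        while self.base < low and self.buckets:
            self.buckets.pop(0)
            self.base += 1
            self.trimmed = True

# usage demo: two shadows consume a three-bucket response
sb = SharedBrigade()
sb.add_reader("a")
sb.add_reader("b")
for x in ["x", "y", "z"]:
    sb.append(x)
```

The slowest reader's cursor acts as the reference count here: a head bucket stays alive exactly until the last thread still using it moves past it, which is the property the refcount scheme needs.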