Graham Leggett wrote:

>> For the expiration case, there's a much easier solution than shadowing
>> the incomplete response.  Add a new state for cache entries:
>> "being_updated."  When you get a request for a cached object that's
>> past its expiration date, set the cache entry's state to
>> "being_updated" and start retrieving the new content.  Meanwhile, as
>> other threads handle requests for the same object, they check the
>> state of the cache entry and, because it's currently being updated,
>> they deliver the old copy from the cache rather than dispatching the
>> request to the backend system that's already working on a different
>> instance of the same request.  As long as the thread that's getting
>> the new content can replace the old content with the new content
>> atomically, there's no reason to make any other threads wait for the
>> new content.
>
>
> Hmmm... ok, not a bad idea, but I see a few hassles.
>
> What if a user force-reloads the page? In theory the cached copy 
> should be expired immediately - but here it isn't, because shadow 
> threads need to access the cached content in the meantime.


I actually think we *shouldn't* expire the cached copy in this
case.  I want the server, rather than the client, to be able to
decide whether to expire a cache entry.  If the client gets to
dictate the expiration semantics, then we're vulnerable to a
DoS attack (all a malicious client would need to do is start
submitting lots of requests that invalidate cache entries).

> Also - there will be a load spike until the first page is cached (in 
> the case where no previous cache existed).


Yep.  I think the ideal solution would be to combine this
technique with shadowing support.  Then mod_cache could do
one of four things depending on the cache state:
  * cache hit, normal case: deliver the object from cache
  * cache hit, while another thread is updating: deliver the old object
  * cache miss, normal case: go retrieve the content
  * cache miss, while another thread is updating: shadow the response
    in progress
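
As a rough sketch of that four-way decision (not mod_cache code; the
names Action and choose_action and the three boolean inputs are made up
for illustration), the dispatch could look something like this.  I'm
assuming that an expired entry with no update in flight is handled like
a miss, i.e. the requesting thread becomes the updater:

```python
from enum import Enum, auto


class Action(Enum):
    # Hypothetical names for the four outcomes listed above.
    SERVE_CACHED = auto()   # cache hit, normal case
    SERVE_STALE = auto()    # cache hit, another thread is updating
    FETCH = auto()          # cache miss, normal case
    SHADOW = auto()         # cache miss, another thread is updating


def choose_action(have_copy, expired, being_updated):
    """Pick what to do for one request.

    have_copy      -- a complete cached object exists (fresh or stale)
    expired        -- that object is past its expiration date
    being_updated  -- some other thread is already retrieving the object
    """
    if have_copy and not expired:
        return Action.SERVE_CACHED
    if have_copy and being_updated:
        # Stale, but a refresh is already in flight: serve the old copy.
        return Action.SERVE_STALE
    if being_updated:
        # No complete copy at all, but a response is being retrieved.
        return Action.SHADOW
    # Nothing usable and nobody fetching: this thread goes to the backend.
    return Action.FETCH
```

The caller would still have to set "being_updated" atomically before
acting on FETCH; this only shows the branching.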

> What happens if an attempt to update an expired cached page hangs? The 
> proxy could find itself serving stale content for a long time while a 
> timeout occurs.


One solution is to add a timestamp alongside the "being_updated" flag.
If another thread sees that the flag is set but the timestamp is
more than 'n' seconds old, that thread sets the timestamp to the
current time and tries to retrieve the object itself.  (The check
and the timestamp update need to happen as one atomic step so that
we don't end up with multiple threads all deciding to retrieve the
content at once.)
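
A minimal sketch of that check-and-claim step, assuming a per-entry
mutex is acceptable.  CacheEntry, try_claim_update and finish_update
are hypothetical names, and the timeout is whatever 'n' ends up being:

```python
import threading
import time


class CacheEntry:
    """Illustrative only: tracks when the current update was started."""

    def __init__(self):
        self._lock = threading.Lock()
        self.update_started = None  # None means no update in progress

    def try_claim_update(self, timeout, now=None):
        """Return True if the calling thread should fetch the object.

        The test and the timestamp write happen under one lock, so at
        most one thread wins for any given claim.
        """
        if now is None:
            now = time.monotonic()
        with self._lock:
            idle = self.update_started is None
            if idle or now - self.update_started > timeout:
                self.update_started = now
                return True
            return False

    def finish_update(self):
        """Clear the flag once the new content has been stored."""
        with self._lock:
            self.update_started = None
```

A thread that gets False keeps serving the old copy.  One thing this
sketch does not handle: a hung updater that eventually wakes up and
calls finish_update() after a second thread has taken over.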

>>  * It's going to take a while to make the shadowing work
>>    (due to all the race conditions that have to be addressed).
>
>
> I don't see any significant race conditions though.
>
> All that needs to happen is that all cached responses are 
> readable-from-the-cache the moment they are created, rather than the 
> moment the download is complete. A flag against the cache entry marks 
> it as "still busy". If this flag is set, shadowed threads know to keep 
> waiting for more data to appear in the cache. If the flag is not set, 
> the shadow thread is a normal CACHE_OUT case, and finishes up the 
> request knowing it to be complete.


The race conditions arise in cases where, for example, we find out
halfway through a streamed request that it isn't cacheable after all
because it's just exceeded the max cache size.  Therefore we won't
be caching this response permanently, but we do need to keep the
buffered buckets around until all threads that are shadowing the
response have finished using those buckets.  And then, after the
last of those threads finishes using the data, we need to delete
the saved brigade.  But if the response is *really* large, we can't
keep the whole thing in memory until the last thread finishes
delivering it.  Instead, we'd have to incrementally delete buckets
from the start of the brigade as all the threads finish dealing
with each one.  (And keep new threads from shadowing the response
from the moment when we first delete a bucket.)  This can all be
done with reference counts and careful synchronization, but it's
going to be challenging to debug.
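
To make the bookkeeping concrete, here is a toy model of it (plain
Python, not APR buckets or brigades).  Instead of a reference count on
each bucket it tracks each reader's position and frees everything
behind the slowest reader, which I believe amounts to the same thing.
ShadowedResponse and all of its methods are invented for illustration:

```python
import threading
from collections import deque


class ShadowedResponse:
    """Toy stand-in for a saved brigade shared by shadowing threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buckets = deque()  # buckets that have not been freed yet
        self._base = 0           # stream index of self._buckets[0]
        self._readers = {}       # reader id -> index of next bucket to read
        self._next_id = 0
        self._uncacheable = False

    def append(self, bucket):
        with self._lock:
            self._buckets.append(bucket)

    def mark_uncacheable(self):
        """E.g. the response just exceeded the max cache size."""
        with self._lock:
            self._uncacheable = True
            self._trim()

    def attach(self):
        """Start shadowing; returns None once any bucket has been freed."""
        with self._lock:
            if self._base > 0:
                return None
            rid = self._next_id
            self._next_id += 1
            self._readers[rid] = 0
            return rid

    def read(self, rid):
        """Return the reader's next bucket, or None if none is ready."""
        with self._lock:
            offset = self._readers[rid] - self._base
            if offset >= len(self._buckets):
                return None
            bucket = self._buckets[offset]
            self._readers[rid] += 1
            self._trim()
            return bucket

    def detach(self, rid):
        with self._lock:
            del self._readers[rid]
            self._trim()

    def buffered(self):
        with self._lock:
            return len(self._buckets)

    def _trim(self):
        # Caller holds the lock.  Only free data we will not be caching.
        if not self._uncacheable:
            return
        end = self._base + len(self._buckets)
        slowest = min(self._readers.values(), default=end)
        while self._base < slowest:
            self._buckets.popleft()
            self._base += 1
```

Even this version has to get three things right under one lock: the
trim, the "no new readers after the first delete" rule, and the final
cleanup when the last reader detaches.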

Brian

