Joe Orton wrote:

There is no other acceptable solution AFAICS. Buffering the entire brigade (either to disk, or into RAM as the current code does) before writing to the client is not OK, polling on buckets is not possible, using threads is not OK, using non-blocking writes up the output filter chain is not possible. Any other ideas?

I managed to solve this problem last night.

It took a while and a lot of digging to figure out, but in the end it is relatively simple.

The ap_core_output_filter helps us out:

    /* Scan through the brigade and decide whether to attempt a write,
     * based on the following rules:
     *
     *  1) The new_bb is null: Do a nonblocking write of as much as
     *     possible: do a nonblocking write of as much data as possible,
     *     then save the rest in ctx->buffered_bb.  (If new_bb == NULL,
     *     it probably means that the MPM is doing asynchronous write
     *     completion and has just determined that this connection
     *     is writable.)
     *
    [snip]

Brigades handed to the output filter are written to the network with a non-blocking write. Any parts of a brigade that cannot be written without blocking are set aside to be sent the next time the filter is invoked with more data.

There is a catch: the output filter will only set aside a certain number of non-file buckets before it enforces a blocking write to clear the backlog and keep memory usage down. The way around this is to ensure that you always hand file buckets to the network.
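
To illustrate the rule, here is a minimal sketch with my own names and an assumed byte-based limit (the real filter's threshold and bookkeeping differ in detail): the filter only needs to fall back to a blocking write when the buckets it would have to set aside hold real data in memory. A file bucket carries only a descriptor plus an offset and length, so it is always cheap to keep.

    /* Hedged sketch of the setaside rule described above.  The names
     * MAX_SETASIDE_BYTES and too_much_in_memory are illustrative, not
     * the actual core output filter identifiers. */
    #include "apr_buckets.h"

    #define MAX_SETASIDE_BYTES (64 * 1024)   /* assumed limit */

    static int too_much_in_memory(apr_bucket_brigade *bb)
    {
        apr_bucket *e;
        apr_size_t in_memory = 0;

        for (e = APR_BRIGADE_FIRST(bb);
             e != APR_BRIGADE_SENTINEL(bb);
             e = APR_BUCKET_NEXT(e)) {

            /* File buckets are cheap to set aside: only the file
             * descriptor, offset and length are kept, not the data. */
            if (APR_BUCKET_IS_FILE(e)) {
                continue;
            }
            if (e->length != (apr_size_t)-1) {
                in_memory += e->length;
            }
        }

        /* Above the limit the filter does a blocking write to drain
         * the backlog; below it, the leftovers can be set aside. */
        return in_memory > MAX_SETASIDE_BYTES;
    }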

This way, the output filter will never block [1].

Enter mod_disk_cache.

One of the last things mod_disk_cache does after saving the body is to replace the buckets just written in the brigade, regardless of bucket type [2], with a file bucket pointing at the cached file and containing exactly the same data.
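
A minimal sketch of that substitution, assuming the cached file is already open and the chunk's offset and length are known (the function and variable names below are mine, not mod_disk_cache's):

    /* Drop whatever buckets were just written to the cache (heap, pipe,
     * CGI buckets, ...) and put back the same bytes as a single file
     * bucket over the cached file.  Illustrative names throughout. */
    #include "apr_buckets.h"

    static void replace_with_file_bucket(apr_bucket_brigade *bb,
                                         apr_file_t *cached_fd,
                                         apr_off_t offset,
                                         apr_size_t length,
                                         apr_pool_t *p)
    {
        apr_bucket *e;

        /* Throw away the buckets that were just written to disk... */
        apr_brigade_cleanup(bb);

        /* ...and replace them with a file bucket containing exactly the
         * same data, which the core output filter can cheaply set aside. */
        e = apr_bucket_file_create(cached_fd, offset, length, p,
                                   bb->bucket_alloc);
        APR_BRIGADE_INSERT_TAIL(bb, e);
    }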

This behaviour has two side effects. The first is that responses no longer hang around in RAM waiting to be sent to a slow client; they can now sit on disk instead [3], which potentially improves performance for "expensive" processes like CGI, which can go away immediately rather than hang around waiting for slow clients. The second is that the buckets handed to the output filter are file buckets, and can therefore be set aside and handled with non-blocking writes by the output filter.

Now, enter mod_cache.

None of the above would mean anything if the file buckets being sent consisted of a single 4.7GB bucket. In that case, save_body() would only finish after all 4.7GB had been written to disk, the network write would only start after that first complete invocation of save_body(), and by that point the browser would have got bored and gone away long ago.

Oops. What will we do?

But mod_cache no longer passes 4.7GB file buckets to the providers; it now splits them up into buckets with a maximum size defaulting to 16MB.
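
A rough sketch of that splitting, assuming a helper that walks a brigade and caps each bucket at the configured maximum (the constant and function names are mine; the real mod_cache code differs in detail and also has to deal with buckets of unknown length):

    #include "apr_buckets.h"

    #define CACHE_MAX_BUCKET_SIZE (16 * 1024 * 1024)   /* 16MB default */

    /* Split any over-sized bucket of known length into chunks no larger
     * than CACHE_MAX_BUCKET_SIZE.  apr_bucket_split() leaves the first
     * part in place and inserts the remainder after it, so the remainder
     * is checked again on the next loop iteration and split in turn. */
    static void split_large_buckets(apr_bucket_brigade *bb)
    {
        apr_bucket *e;

        for (e = APR_BRIGADE_FIRST(bb);
             e != APR_BRIGADE_SENTINEL(bb);
             e = APR_BUCKET_NEXT(e)) {

            if (e->length != (apr_size_t)-1
                && e->length > CACHE_MAX_BUCKET_SIZE) {
                apr_bucket_split(e, CACHE_MAX_BUCKET_SIZE);
            }
        }
    }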

So 16MB at a time gets written to the cache, then written to the network without blocking, then the next 16MB is written to the cache, and so on. Suddenly the write-to-cache-then-write-to-network problem is gone, without threads and without fork.

Run a wget on a 250MB file. Watch it being downloaded and cached at the same time: the size of the file in the cache tracks the downloaded size reported by wget. Run a second wget on the same file moments later. Watch it quickly read the file from the cache up to the point the first wget has reached, then track the first wget's progress from there on. Run cmp on the original file, the two downloaded files, and the cached body: all are identical.

Works like a charm.

The work is not finished. There are alternative use cases that still need to be checked. Some of those use cases are not practical to handle, and we will need to make decisions about them.

This code, however, is based on code that is running in production right now, so any remaining bugs should be reasonably clear cut and straightforward to fix.

I need some help with the behaviour of the brigades, especially with cleaning them up so that they don't hang around unnecessarily for the entire request.

I also need help finding better alternatives to some of the less savoury solutions that people are not happy with, like fstat/sleep.
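
For context, the fstat/sleep approach amounts to something like the following (a sketch with my own names: a reader that has caught up with the writer polls the growing cache file until more bytes appear):

    #include "apr_file_io.h"
    #include "apr_time.h"

    /* Wait until the cache file has grown beyond the "have" bytes we
     * have already served.  Polling with a fixed sleep is the part
     * people are unhappy with; real code also needs a timeout and a
     * check for the writer having aborted. */
    static apr_status_t wait_for_more_data(apr_file_t *fd, apr_off_t have)
    {
        apr_finfo_t finfo;
        apr_status_t rv;

        for (;;) {
            rv = apr_file_info_get(&finfo, APR_FINFO_SIZE, fd);
            if (rv != APR_SUCCESS) {
                return rv;
            }
            if (finfo.size > have) {
                return APR_SUCCESS;   /* the writer has added more data */
            }
            apr_sleep(10000);         /* 10ms, illustrative interval */
        }
    }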

Please don't mail me any more about copy_body(). This function is no longer necessary and will be removed next. Hopefully the above explains why copy_body() was attempted in the first place, flawed as it was. It was committed as is because it was a prerequisite of Niklas' second patch, which is a critical component of the above. It was better to commit and then change it than to never commit the first or second patches and never get anywhere.

[1] Testing shows this to be the case. More testing is needed to make sure it holds in all cases.

[2] I need to teach mod_disk_cache to handle metadata buckets more intelligently.

[3] Again, I need some help making sure brigades are cleared when they should be, and there are no leaks in mod_disk_cache.

Regards,
Graham