Hi,

It's quite clear that without some agreement we won't be able to
actually fix mod_cache's shortcomings. The idea now is to gather our
efforts, get consensus on the proposed fixes, and commit them one by one.

The current high priority issues can be summarized as:

* Buffering

. Problem:

For a moment, forget about file buckets and large files. What's really at
stake is proxy/cache brigade management when the arrival rate is too
high (e.g. a single 4.7GB file bucket, or high-rate input data to be
consumed by a relatively low-rate consumer).

By operating as a normal output filter, mod_cache must deal with
potentially large brigades that may contain bucket types other than the
stock ones, created by other filters in the chain.

The problem arises from the fact that the mod_disk_cache store function
traverses the brigade by itself, reading each bucket in order to write
its contents to disk, potentially filling memory with large chunks
of data allocated/created by the bucket type's read function (e.g. file
bucket).

. Constraints:

No threads/forked processes.
Bucket type specific workarounds won't work.
No core changes/knowledge, easily back-portable fixes are preferable.

. Proposed solution:

File buffering (or a part of Graham's last approach).

The solution consists of using the cache file as an output buffer by
splitting the buckets into smaller chunks and writing them to disk. Once
a chunk is written (apr_file_write_full), a new file bucket is created
with the offset and size of the just-written buffer. The old bucket is
deleted.

After that, the bucket is inserted into a temporary (empty) brigade and
sent down the output filter stack for (probably) network i/o.
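
To make the steps above concrete, here is a rough, self-contained sketch
of the idea in plain C. It is not the mod_disk_cache code and does not use
APR: stdio stands in for apr_file_write_full, the hypothetical file_ref
struct stands in for a file bucket, and the hypothetical pass_fn callback
stands in for sending the temporary brigade down the filter stack.
CHUNK_SIZE is an arbitrary value picked for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 8192 /* assumption: arbitrary chunk size for the sketch */

/* Stand-in for a file bucket: points into the cache file, holds no data. */
typedef struct {
    long   offset;
    size_t length;
} file_ref;

/* Stand-in for passing the temporary brigade to the next output filter. */
typedef int (*pass_fn)(FILE *cache, const file_ref *ref, void *ctx);

/*
 * Append `len` bytes of `data` to the cache file in CHUNK_SIZE pieces.
 * After each piece is on disk, hand downstream only a (offset, length)
 * reference to it, so no more than one chunk has to stay in memory.
 * Returns 0 on success, -1 on any I/O or downstream error.
 */
static int store_and_forward(FILE *cache, const char *data, size_t len,
                             pass_fn pass, void *ctx)
{
    size_t done = 0;

    while (done < len) {
        size_t n = (len - done < CHUNK_SIZE) ? len - done : CHUNK_SIZE;
        file_ref ref;

        /* Always write at the end; downstream may have moved the offset. */
        if (fseek(cache, 0, SEEK_END) != 0) return -1;
        ref.offset = ftell(cache);
        if (ref.offset < 0) return -1;
        ref.length = n;

        if (fwrite(data + done, 1, n, cache) != n) return -1;
        if (fflush(cache) != 0) return -1;

        /* In httpd the original in-memory bucket would be deleted here. */
        if (pass(cache, &ref, ctx) != 0) return -1;
        done += n;
    }
    return 0;
}

/* Example downstream consumer: reads each chunk back from the file. */
typedef struct {
    size_t chunks;
    size_t bytes;
} sink_state;

static int read_back_sink(FILE *cache, const file_ref *ref, void *ctx)
{
    sink_state *st = ctx;
    char buf[CHUNK_SIZE];

    assert(ref->length <= CHUNK_SIZE);
    if (fseek(cache, ref->offset, SEEK_SET) != 0) return -1;
    if (fread(buf, 1, ref->length, cache) != ref->length) return -1;

    st->chunks++;
    st->bytes += ref->length;
    return 0;
}
```

The point of the sketch is only the shape of the loop: write a bounded
chunk, replace it with a reference into the cache file, pass that
reference on, repeat. The working set stays around one chunk regardless
of how large the original bucket was.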

At a quick glance, this solution may sound absurd: the chunk is
already in memory, and the output filter might need it in memory again
soon. But there's no silver bullet, and it's a simple enough approach to
solve the growing memory problem without incurring performance
penalties.

Memory usage is kept low because the deleted buckets are released to
the free list. Performance won't suffer because, if there is enough
memory system-wide, the reads from the core output filter are going to
be served by the page cache or, better yet, sent with sendfile().

. Comments

Do you agree or disagree? Is there a better solution?

* Thundering herd/parallel downloads
* Partial downloads
* Code quality

Let's settle the buffering issue first.

--
Davi Arnaut
