Hi,

It's quite clear that without some agreement we won't be able to actually fix mod_cache's shortcomings. The idea now is to gather our efforts, reach consensus on the proposed fixes, and commit them one by one.
The current high priority issues can be summarized as:

* Buffering

  Problem: For a moment, forget about file buckets and large files; what's
  really at stake is proxy/cache brigade management when the arrival rate
  is too high (e.g. a single 4.7GB file bucket, or high-rate input data to
  be consumed at a relatively low rate). By operating as a normal output
  filter, mod_cache must deal with potentially large brigades of (possibly)
  non-stock bucket types created by other filters on the chain. The problem
  arises from the fact that the mod_disk_cache store function traverses the
  brigade by itself, reading each bucket in order to write its contents to
  disk, potentially filling memory with large chunks of data
  allocated/created by the bucket type's read function (e.g. a file
  bucket).

  Constraints: No threads/forked processes. Bucket-type-specific
  workarounds won't work. No core changes/knowledge; easily back-portable
  fixes are preferable.

  Proposed solution: File buffering (or a part of Graham's last approach).
  The solution consists of using the cache file as an output buffer by
  splitting the buckets into smaller chunks and writing them to disk. Once
  a chunk is written (apr_file_write_full), a new file bucket is created
  with the offset and size of the just-written buffer, and the old bucket
  is deleted. The new bucket is then inserted into a temporary (empty)
  brigade and sent down the output filter stack for (probably) network
  I/O.

  At a quick glance this solution may sound absurd -- the chunk is already
  in memory, and the output filter might need it in memory again soon. But
  there is no silver bullet, and it is a simple enough approach to solve
  the growing-memory problem without incurring performance penalties.
  Memory usage is kept low because the deleted buckets are released to the
  free list.
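  To make the split-and-spool idea concrete, here is a minimal sketch in
  plain, self-contained C (not actual APR/mod_disk_cache code): the bucket
  and brigade machinery is simulated, CHUNK_SIZE is an arbitrary split
  size, and spool_chunk/file_ref are hypothetical names standing in for
  apr_file_write_full plus file-bucket creation.

  ```c
  /* Sketch of the proposed file-buffering scheme.  A large in-memory
   * "bucket" is split into chunks; each chunk is appended to the cache
   * file, and only a small (offset, length) record -- standing in for
   * the newly created file bucket -- is kept in memory. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #define CHUNK_SIZE 4096              /* hypothetical split size */

  struct file_ref {                    /* stands in for a file bucket */
      long offset;
      size_t len;
  };

  /* Append `len` bytes to the cache file and return a file_ref saying
   * where they landed; mirrors apr_file_write_full() followed by file
   * bucket creation in the proposal. */
  static struct file_ref spool_chunk(FILE *cache, const char *data,
                                     size_t len)
  {
      struct file_ref ref;
      ref.offset = ftell(cache);
      ref.len = len;
      if (fwrite(data, 1, len, cache) != len) {
          perror("fwrite");
          exit(1);
      }
      return ref;
  }

  int main(void)
  {
      /* 10000 bytes of payload: larger than one chunk, so it splits. */
      size_t total = 10000;
      char *payload = malloc(total);
      memset(payload, 'x', total);

      FILE *cache = tmpfile();         /* stands in for the cache file */
      struct file_ref refs[8];
      int nrefs = 0;

      /* Split the large bucket into CHUNK_SIZE pieces and spool each;
       * after spool_chunk() the in-memory chunk can be dropped (the old
       * bucket is deleted) and the file_ref sent down the filter chain. */
      for (size_t off = 0; off < total; off += CHUNK_SIZE) {
          size_t len = total - off < CHUNK_SIZE ? total - off : CHUNK_SIZE;
          refs[nrefs++] = spool_chunk(cache, payload + off, len);
      }
      free(payload);                   /* memory freed; data is on disk */

      printf("chunks=%d last_offset=%ld last_len=%zu\n",
             nrefs, refs[nrefs - 1].offset, refs[nrefs - 1].len);
      return 0;
  }
  ```

  The point of the sketch is only the shape of the loop: memory held at
  any moment is bounded by one chunk, while the refs describing the data
  stay small regardless of response size.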
  Performance won't be hit because, if there is enough memory system-wide,
  the reads from the core output filter will be served by the page cache,
  or better yet, sendfile()d.

  Comments? Do you agree/disagree? Have a better solution?

* Thundering herd/parallel downloads

* Partial downloads

* Code quality

Let's settle the buffering issue first.

--
Davi Arnaut