On Mon, 12 Jul 2004, Ian Holsman wrote:

> ok, now before I start this let me say one thing, this is not for *ALL*
> requests, it will only work for ones which don't have content-length
> modifiable filters (like gzip) applied to the request, and it would be
> left to the webserver admin to figure out what they were, and if you
> could use this.

But that's not an issue if the byterange filter comes after any filters
that modify content (CONTENT_SET).

> ok..
> at the moment when a byterange request goes to a dynamic module, the
> dynamic module can not use any tricks to only serve the bytes requested,
> it *HAS* to serve the entire content up as buckets.

Indeed.  That only becomes a problem when a filter breaks pipelining.

> what I am proposing is something like:
>
> 1. the filter keeps an ordered list of range requests that the person
> requests.

> 2. it keeps state on how far it has processed in the file. thanks to
> knowing the length of the buckets processed so far.
>    Q: when do the actual headers get put in.. I think they are after no?

ITYM data, not "the file".  The case of a single file is trivial, and
can more efficiently be handled in a separate optimised execution path.
And some bucket types have to be read to get their length.

> 3. it then examines the bucket + bucket length to see which range
> requests match this range, if some do it grabs that range (possibly
> splitting/copying if it meets multiple ranges) and puts it on the right
> bits of each range request.
>
> 4. if the top range request is finished, it passes those buckets through.
>
> 5. repeat until EOS/Sentinel, flushing the ordered list at the end.
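
For concreteness, steps 1-5 might be sketched roughly as follows.  This is
a toy model only, not httpd code: Python bytes objects stand in for
buckets, ranges are assumed to be already resolved to absolute inclusive
offsets, and the name slice_ranges is hypothetical.  Unlike a real bucket
split, slicing bytes here does read and copy the data.

```python
def slice_ranges(chunks, ranges):
    """Yield (range_index, data) in request order as ranges complete.

    chunks: iterable of bytes objects (stand-ins for buckets).
    ranges: list of (start, end) pairs, end inclusive, in request order.
    Hypothetical helper for illustration; not part of any real API.
    """
    pending = [bytearray() for _ in ranges]  # data collected per range
    done = [False] * len(ranges)             # has the stream passed its end?
    next_out = 0                             # first range not yet passed on
    offset = 0                               # bytes of the stream seen so far

    for chunk in chunks:
        lo, hi = offset, offset + len(chunk)  # this chunk covers [lo, hi)
        for i, (start, end) in enumerate(ranges):
            s, e = max(start, lo), min(end + 1, hi)
            if s < e:
                # Overlap: copy the matching part onto this range (step 3).
                pending[i] += chunk[s - lo:e - lo]
            if end < hi:
                done[i] = True
        offset = hi
        # Step 4: pass on finished ranges, but only from the top of the list.
        while next_out < len(ranges) and done[next_out]:
            yield next_out, bytes(pending[next_out])
            next_out += 1

    # Step 5: end of stream, flush whatever is left in request order.
    for i in range(next_out, len(ranges)):
        yield i, bytes(pending[i])
```

With chunks [b"hello ", b"world", b"!"] and ranges [(6, 10), (0, 4)], the
second range is complete after the first chunk but has to be held until
the first range has gone out.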

That scheme doesn't fully address the risk of excessive memory usage,
particularly if we have to serve ranges in a perverse order.
I would propose two admin-configurable limits:

(1) Total data buffered in memory by the byterange filter.  This can be
computed in advance from the request headers.  If this is exceeded, the
filter should create a file bucket to store the data, and the ordered
list then references offsets into the file.

(2) A limit above which byteranges won't be served at all: most of us
have neither the memory nor the /tmp space for a gigabyte.
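
A minimal sketch of how the two limits could interact, assuming the ranges
have already been parsed into absolute inclusive offsets.  The names Plan,
plan_buffering, mem_limit and hard_limit are hypothetical, not existing
configuration directives, and the sum of range lengths is used as a
worst-case estimate of what might need buffering.

```python
from enum import Enum


class Plan(Enum):
    MEMORY = "buffer in memory"
    SPILL = "buffer in a temporary file"
    REFUSE = "do not serve byteranges"


def plan_buffering(ranges, mem_limit, hard_limit):
    """Pick a buffering strategy from the requested ranges alone.

    ranges: list of (start, end) pairs, end inclusive.
    mem_limit: hypothetical limit (1), bytes the filter may hold in memory.
    hard_limit: hypothetical limit (2), above which ranges are not served.
    """
    # Worst case: every requested byte has to be held before it can go out.
    total = sum(end - start + 1 for start, end in ranges)
    if total > hard_limit:
        return Plan.REFUSE
    if total > mem_limit:
        return Plan.SPILL
    return Plan.MEMORY
```

For example, plan_buffering([(0, 2047)], mem_limit=1024, hard_limit=4096)
selects the temporary-file path, since 2048 bytes exceeds the memory limit
but not the hard limit.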

> now.. this assumes that splitting a bucket (and copying) is a zero cost
> operation which doesn't actually *read* the bucket, is this true for
> most bucket types?
>
> would this kind of thing work?

As I said, the trivial cases should (transparently) be treated separately
and more simply.  Otherwise ... well, as discussed on IRC.

-- 
Nick Kew
