On 03/07/2013 23:26, Jim Schueler wrote:

> 
>>             Second, if there's no Content-Length header then how
>>             does one know how much
>>             data to read using $r->read?   
>>
>>             One answer is until $r->read returns zero bytes, of
>>             course.  But, is
>>             that guaranteed to always be the case, even for,
>>             say, pipelined requests?  
>>             My guess is yes because whatever is de-chunking the
> 
> read() is blocking.  So it never returns 0, even in a pipeline request
> (if no data is available, it simply waits).  I don't wish to discuss the
> merits here, but there is no technical imperative for a Content-Length
> field in the request header.
> 
>  -Jim

Probably.  If you were, for some reason, doing the de-chunking work
yourself, each chunk is preceded by a line giving its own size in hex,
so you'd know what size read to do.
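To make the wire format concrete, here is a minimal, language-neutral sketch of a de-chunker for a Transfer-Encoding: chunked body per RFC 2616 section 3.6.1 (the `dechunk` function name and the sample bytes are mine, not from the thread):

```python
# Each chunk is a hex size line ending in CRLF, then that many bytes of
# data, then CRLF; a size of 0 marks the last chunk, followed by any
# trailers and a final blank line.
def dechunk(raw: bytes) -> bytes:
    body = b""
    pos = 0
    while True:
        # Read the chunk-size line (hex, possibly with ;extensions).
        eol = raw.index(b"\r\n", pos)
        size = int(raw[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            break  # last-chunk; trailers (if any) come next
        body += raw[pos:pos + size]
        pos += size + 2  # skip the data and its trailing CRLF
    return body

wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
assert dechunk(wire) == b"Wikipedia"
```

This is also why a reader can stop without a Content-Length: the zero-size chunk is an in-band end-of-body marker.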

> 
> 
> 
> 
> 
> On Wed, 3 Jul 2013, Bill Moseley wrote:
> 
>> Hi Jim,
>> This is the Transfer-Encoding: chunked I was writing about:
>>
>> http://tools.ietf.org/html/rfc2616#section-3.6.1
>>
>>
>>
>> On Wed, Jul 3, 2013 at 11:34 AM, Jim Schueler <jschue...@eloquency.com>
>> wrote:
>>       I played around with chunking recently in the context of media
>>       streaming: The client is only requesting a "chunk" of data.
>>        "Chunking" is how media players perform a "seek".  It was
>>       originally implemented for FTP transfers:  E.g., to transfer a
>>       large file in (say 10K) chunks.  In the case that you describe
>>       below, if no Content-Length is specified, that indicates "send
>>       the remainder".
>>
>>       From what I know, a "chunk" request header is used this way to
>>       specify the server response.  It does not reflect anything about
>>       the data included in the body of the request.  So first, I would
>>       ask if you're confused about this request information.
>>
>>       Hypothetically, some browsers might try to upload large files in
>>       small chunks and the "chunk" header might reflect a push
>>       transfer.  I don't know if "chunk" is ever used for this
>>       purpose.  But it would require the following characteristics:
>>
>>         1.  The browser would need to originally inquire if the server
>>       is
>>             capable of this type of request.
>>         2.  Each chunk of data will arrive in a separate and
>>       independent HTTP
>>             request.  Not necessarily in the order they were sent.
>>         3.  Two or more requests may be handled by separate processes
>>             simultaneously that can't be written into a single
>>       destination.
>>         4.  Somehow the server needs to request a resend if a chunk is
>>       missing.
>>             Solving this problem requires an imaginative use of HTTP.
>>
>>       Sounds messy.  But might be appropriate for 100M+ sized uploads.
>>        This *may* reflect your situation.  Can you please confirm?
>>
>>       For a single process, the incoming content-length is
>>       unnecessary. Buffered I/O automatically knows when transmission
>>       is complete.  The read() argument is the buffer size, not the
>>       content length.  Whether you spool the buffer to disk or simply
>>       enlarge the buffer should be determined by your hardware
>>       capabilities.  This is standard IO behavior that has nothing to
>>       do with HTTP chunk.  Without a "Content-Length" header, after
>>       looping your read() operation, determine the length of the
>>       aggregate data and pass that to Catalyst.
>>
>>       But if you're confident that the complete request spans several
>>       smaller (chunked) HTTP requests, you'll need to address all the
>>       problems I've described above, plus the problem of re-assembling
>>       the whole thing for Catalyst.  I don't know anything about
>>       Plack, maybe it can perform all this required magic.
>>
>>       Otherwise, if the whole purpose of the Plack temporary file is
>>       to pass a file handle, you can pass a buffer as a file handle.
>>        Used to be IO::String, but now that functionality is built into
>>       the core.
>>
>>       By your last paragraph, I'm really lost.  Since you're already
>>       passing the request as a file handle, I'm guessing that Catalyst
>>       creates the temporary file for the *response* body.  Can you
>>       please clarify?  Also, what do you mean by "de-chunking"?  Is
>>       that the same thing as re-assembling?
>>
>>       Wish I could give a better answer.  Let me know if this helps.
>>
>>       -Jim
>>
>>
>>       On Tue, 2 Jul 2013, Bill Moseley wrote:
>>
>>             For requests that are chunked (Transfer-Encoding:
>>             chunked and no
>>             Content-Length header) calling $r->read returns
>>             unchunked data from the
>>             socket.
>>             That's indeed handy.  Is that mod_perl doing that
>>             un-chunking or is it
>>             Apache?
>>
>>             But, it leads to some questions.   
>>
>>             First, if $r->read reads unchunked data then why is
>>             there a
>>             Transfer-Encoding header saying that the content is
>>             chunked?   Shouldn't
>>             that header be removed?   How does one know if the
>>             content is chunked or
>>             not, otherwise?
>>
>>             Second, if there's no Content-Length header then how
>>             does one know how much
>>             data to read using $r->read?   
>>
>>             One answer is until $r->read returns zero bytes, of
>>             course.  But, is
>>             that guaranteed to always be the case, even for,
>>             say, pipelined requests?  
>>             My guess is yes because whatever is de-chunking the
>>             request knows to stop
>>             after reading the last chunk, trailer and empty
>>             line.   Can anyone elaborate
>>             on how Apache/mod_perl is doing this? 
>>
>>
>>             Perhaps I'm approaching this incorrectly, but this
>>             is all a bit untidy.
>>
>>             I'm using Catalyst and Catalyst needs a
>>             Content-Length.  So, I have a Plack
>>             Middleware component that creates a temporary file
>>             writing the buffer from
>>             $r->read( my $buffer, 64 * 1024 ) until that returns
>>             zero bytes.  I pass
>>             this file handle onto Catalyst.
>>
>>             Then, for some content-types, Catalyst (via
>>             HTTP::Body) writes the body to
>>             another temp file.    I don't know how
>>             Apache/mod_perl does its de-chunking,
>>             but I can call $r->read with a huge buffer length
>>             and Apache returns that.
>>              So, maybe Apache is buffering to disk, too.
>>
>>             In other words, for each tiny chunked JSON POST or
>>             PUT I'm creating two (or
>>             three?) temp files which doesn't seem ideal.
>>
>>
>>             --
>>             Bill Moseley
>>             mose...@hank.org
>>
>>
>>
>>
>> -- 
>> Bill Moseley
>> mose...@hank.org
>>
>>
