On Wed, Jul 3, 2013 at 4:31 PM, Jim Schueler <jschue...@eloquency.com> wrote:
> In light of Joe Schaefer's response, I appear to be outgunned.  So, if
> nothing else, can someone please clarify whether "de-chunked" means
> re-assembled?

Yes, where re-assembled means converted back to the original data stream,
without any sort of transport encoding.

> -Jim
>
> On Wed, 3 Jul 2013, Jim Schueler wrote:
>
>> Thanks for the prompt response, but this is your question, not mine.  I
>> hardly need an RTFM for my trouble.
>>
>> I drew my conclusions using a packet sniffer.  And as far-fetched as my
>> answer may seem, it's more plausible than your theory that Apache or
>> mod_perl is decoding a raw socket stream.
>>
>> The crux of your question seems to be how the request content gets
>> magically re-assembled.  I don't think it was ever disassembled in the
>> first place.  But if you don't like my answer, and you don't want to
>> ignore it either, then please restate the question.  I can't find any
>> definition for "unchunked", and Wiktionary's definition of "de-chunk"
>> says to "break apart a chunk", that is (counter-intuitively) to chunk a
>> chunk.
>>
>>> Second, if there's no Content-Length header then how does one know
>>> how much data to read using $r->read?
>>>
>>> One answer is until $r->read returns zero bytes, of course.  But is
>>> that guaranteed to always be the case, even for, say, pipelined
>>> requests?  My guess is yes because whatever is de-chunking the
>>
>> read() is blocking, so it never returns 0, even in a pipelined request
>> (if no data is available, it simply waits).  I don't wish to discuss
>> the merits here, but there is no technical imperative for a
>> Content-Length field in the request header.
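To make "re-assembled" concrete: a consumer that loops a fixed-size read until it returns no more data recovers the original stream regardless of how it was framed on the wire. A minimal Python sketch of that loop (a stand-in for calling mod_perl's $r->read repeatedly; the 8 KiB buffer size is an arbitrary choice):

```python
import io

def slurp(fh, bufsize=8192):
    """Re-assemble a stream by looping a fixed-size read until EOF,
    analogous to calling $r->read(my $buf, $size) until it returns 0."""
    chunks = []
    while True:
        buf = fh.read(bufsize)
        if not buf:          # zero bytes read: end of body
            break
        chunks.append(buf)
    return b"".join(chunks)

# io.BytesIO stands in for the request body stream here.
body = slurp(io.BytesIO(b"hello " * 5000))
assert body == b"hello " * 5000
```

The point of contention in the thread is only *when* that read returns zero bytes: for a de-chunked body, the layer doing the de-chunking knows where the body ends and can signal EOF without a Content-Length.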
>> -Jim
>>
>> On Wed, 3 Jul 2013, Bill Moseley wrote:
>>
>>> Hi Jim,
>>>
>>> This is the Transfer-Encoding: chunked I was writing about:
>>>
>>> http://tools.ietf.org/html/rfc2616#section-3.6.1
>>>
>>> On Wed, Jul 3, 2013 at 11:34 AM, Jim Schueler <jschue...@eloquency.com> wrote:
>>>
>>> I played around with chunking recently in the context of media
>>> streaming: the client is only requesting a "chunk" of data.
>>> "Chunking" is how media players perform a "seek".  It was originally
>>> implemented for FTP transfers, e.g. to transfer a large file in (say
>>> 10K) chunks.  In the case that you describe below, if no
>>> Content-Length is specified, that indicates "send the remainder".
>>>
>>> From what I know, a "chunk" request header is used this way to
>>> specify the server response.  It does not reflect anything about the
>>> data included in the body of the request.  So first, I would ask if
>>> you're confused about this request information.
>>>
>>> Hypothetically, some browsers might try to upload large files in
>>> small chunks, and the "chunk" header might reflect a push transfer.
>>> I don't know if "chunk" is ever used for this purpose.  But it would
>>> require the following characteristics:
>>>
>>>   1. The browser would need to inquire first whether the server is
>>>      capable of this type of request.
>>>   2. Each chunk of data will arrive in a separate and independent
>>>      HTTP request, not necessarily in the order they were sent.
>>>   3. Two or more requests may be handled simultaneously by separate
>>>      processes that can't write into a single destination.
>>>   4. Somehow the server needs to request a resend if a chunk is
>>>      missing.  Solving this problem requires an imaginative use of
>>>      HTTP.
>>>
>>> Sounds messy, but it might be appropriate for 100M+ sized uploads.
>>> This *may* reflect your situation.  Can you please confirm?
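For reference, the wire format Bill's link (RFC 2616, section 3.6.1) describes is simpler than a multi-request scheme: within a single request, each chunk is a hexadecimal size line, CRLF, that many bytes of data, CRLF, and the body ends with a zero-size chunk, an optional trailer, and a blank line. A minimal decoder sketch in Python (it ignores chunk extensions and trailer headers for brevity):

```python
def dechunk(raw: bytes) -> bytes:
    """Decode a Transfer-Encoding: chunked body (RFC 2616 section 3.6.1).
    Chunk extensions and trailer headers are skipped for brevity."""
    out = []
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # The size line is hex, possibly followed by ";name=value" extensions.
        size = int(raw[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:                 # last-chunk: body is complete
            break
        out.append(raw[pos:pos + size])
        pos += size + 2               # skip chunk data plus its trailing CRLF
    return b"".join(out)

wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
assert dechunk(wire) == b"Wikipedia"
```

This is what "de-chunking" removes before $r->read hands the caller data, which is also why no Content-Length is needed: the zero-size chunk marks the end.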
>>> For a single process, the incoming content-length is unnecessary.
>>> Buffered I/O automatically knows when transmission is complete.  The
>>> read() argument is the buffer size, not the content length.  Whether
>>> you spool the buffer to disk or simply enlarge the buffer should be
>>> determined by your hardware capabilities.  This is standard I/O
>>> behavior that has nothing to do with HTTP chunking.  Without a
>>> "Content-Length" header, after looping your read() operation,
>>> determine the length of the aggregate data and pass that to Catalyst.
>>>
>>> But if you're confident that the complete request spans several
>>> smaller (chunked) HTTP requests, you'll need to address all the
>>> problems I've described above, plus the problem of re-assembling the
>>> whole thing for Catalyst.  I don't know anything about Plack; maybe
>>> it can perform all this required magic.
>>>
>>> Otherwise, if the whole purpose of the Plack temporary file is to
>>> pass a file handle, you can pass a buffer as a file handle.  That
>>> used to be IO::String, but now that functionality is built into the
>>> core.
>>>
>>> By your last paragraph, I'm really lost.  Since you're already
>>> passing the request as a file handle, I'm guessing that Catalyst
>>> creates the temporary file for the *response* body.  Can you please
>>> clarify?  Also, what do you mean by "de-chunking"?  Is that the same
>>> thing as re-assembling?
>>>
>>> Wish I could give a better answer.  Let me know if this helps.
>>>
>>> -Jim
>>>
>>> On Tue, 2 Jul 2013, Bill Moseley wrote:
>>>
>>> For requests that are chunked (Transfer-Encoding: chunked and no
>>> Content-Length header) calling $r->read returns unchunked data from
>>> the socket.  That's indeed handy.  Is that mod_perl doing that
>>> un-chunking, or is it Apache?
>>>
>>> But it leads to some questions.
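The "buffer as a file handle" idea Jim mentions (in Perl, an in-core open on a scalar reference, formerly IO::String) has direct analogues in other languages; as a neutral illustration, Python's io.BytesIO gives a seekable file-handle interface over an in-memory buffer without touching disk:

```python
import io

# A small request body held entirely in memory.
buffer = b'{"key": "value"}'
fh = io.BytesIO(buffer)          # file-handle interface over the buffer

# Downstream code can treat it exactly like a spooled temp file:
assert fh.read(1) == b"{"
fh.seek(0)
assert fh.read() == buffer
assert len(buffer) == 16         # a Content-Length can be computed up front
```

For small bodies this sidesteps the temp file entirely, while still giving the consumer both a file handle and a known length.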
>>> First, if $r->read reads unchunked data then why is there a
>>> Transfer-Encoding header saying that the content is chunked?
>>> Shouldn't that header be removed?  How does one know whether the
>>> content is chunked or not, otherwise?
>>>
>>> Second, if there's no Content-Length header then how does one know
>>> how much data to read using $r->read?
>>>
>>> One answer is until $r->read returns zero bytes, of course.  But is
>>> that guaranteed to always be the case, even for, say, pipelined
>>> requests?  My guess is yes, because whatever is de-chunking the
>>> request knows to stop after reading the last chunk, trailer, and
>>> empty line.  Can anyone elaborate on how Apache/mod_perl is doing
>>> this?
>>>
>>> Perhaps I'm approaching this incorrectly, but this is all a bit
>>> untidy.
>>>
>>> I'm using Catalyst, and Catalyst needs a Content-Length.  So I have a
>>> Plack middleware component that creates a temporary file, writing the
>>> buffer from $r->read( my $buffer, 64 * 1024 ) until that returns zero
>>> bytes.  I pass this file handle on to Catalyst.
>>>
>>> Then, for some content types, Catalyst (via HTTP::Body) writes the
>>> body to another temp file.  I don't know how Apache/mod_perl does its
>>> de-chunking, but I can call $r->read with a huge buffer length and
>>> Apache returns that.  So maybe Apache is buffering to disk, too.
>>>
>>> In other words, for each tiny chunked JSON POST or PUT I'm creating
>>> two (or three?) temp files, which doesn't seem ideal.
>>>
>>> --
>>> Bill Moseley
>>> mose...@hank.org

--
Born in Roswell... married an alien...
http://emptyhammock.com/
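Bill's middleware arrangement (read the de-chunked body into a temporary file, then hand Catalyst a handle plus a computed Content-Length) can be sketched as follows. This is a Python stand-in, not mod_perl: the `read` callable plays the role of $r->read, and tempfile.SpooledTemporaryFile keeps small bodies in memory, spilling to disk only above a threshold, which would avoid one of the temp files for tiny JSON POSTs:

```python
import io
import tempfile

def spool_body(read, bufsize=64 * 1024, max_in_memory=1024 * 1024):
    """Spool an already-unchunked request body; return (filehandle, length).
    `read` plays the role of $r->read: it returns b'' once the body is
    exhausted.  SpooledTemporaryFile stays in RAM below max_in_memory,
    so small bodies never hit disk."""
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    length = 0
    while True:
        buf = read(bufsize)
        if not buf:                  # zero bytes: end of the de-chunked body
            break
        spool.write(buf)
        length += len(buf)
    spool.seek(0)                    # rewind so the consumer reads from the start
    return spool, length

fh, clen = spool_body(io.BytesIO(b'{"a":1}').read)
assert clen == 7
assert fh.read() == b'{"a":1}'
```

The computed `clen` is what a framework that insists on a Content-Length (as Catalyst does here) would be given alongside the handle.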