> [EMAIL PROTECTED] wrote...
>
> Hi to all,
>
> A new question to HTTP / RFC gurus.
>
> A customer has developed a custom PHP HTTP client,
> using HTTP 1.0 and compression.

That's like mixing Vodka and Beer... something could
easily puke... but OK... I hear ya...

> This HTTP client compress both request and replies.

Sure, why not.

> For replies it works great but for request we have
> a doubt.

I imagine so, yes.

> Since the HTTP client compress a request there is in
> HTTP header :
>
> Content-Encoding: gzip
>
> Also the Content-Length is set to the size of the
> plain request (not the size of the compressed request).
>
> Is it correct or should it send the Content-Length with
> the size of the compressed request ?
>
> In such case, it seems that mod_deflate INPUT filter should
> modify the Content-Length accordingly ?
>
> Thanks for your help

You've got some messed up code on your hands, Henri.

In your particular case... Content-length should ALWAYS be the
ACTUAL number of bytes on the wire. Anything else
is going to screw something up somewhere.
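
A minimal sketch of what that means on the client side, in Python
rather than the customer's PHP. The function name and the sample body
are made up for illustration. The only point is that Content-Length
is taken from the compressed bytes, not from the plain ones.

```python
import gzip

def build_gzip_request_headers(plain_body: bytes):
    """Return (headers, wire_body) for a gzip-compressed request body.

    Hypothetical helper for illustration, not part of any library.
    """
    wire_body = gzip.compress(plain_body)
    headers = {
        "Content-Encoding": "gzip",
        # Length of the bytes actually sent, NOT of the plain body.
        "Content-Length": str(len(wire_body)),
    }
    return headers, wire_body

plain = b"field=value&" * 200
headers, wire_body = build_gzip_request_headers(plain)
print("plain bytes:", len(plain))
print("Content-Length sent:", headers["Content-Length"])
```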

You have to remember the difference between 'Content-Encoding:'
and 'Transfer-Encoding:'. 'Transfer-Encoding:' is a TRANSPORT
layer thing but 'Content-Encoding:' is a PRESENTATION
layer thing.

When any HTTP request or response says that its BODY DATA
has 'Content-type: XXXX' and/or 'Content-Length: XXXX' what that
really means ( in early HTTP terms ) is...

Content-Type: = Original MIME type of original data (file).
Content-Length = Actual length of original data (file).

The original assumption in early HTTP was that this would always
represent some file on some disk and the 'Content-type:' was
usually just the file extension (mapped) and the 'Content-length:' was
whatever a 'stat()' call says the file length was.

When Content started to get produced dynamically ( does not
exist until asked for ) things got a little sticky but the CONCEPT
is still the same. Content-type: is supposed to be the MIME type
'as-if' the 'file' already existed and 'Content-length' would be the
exact number of ( PRESENTATION LAYER ) bytes 'as-if' the
'data file' was sitting on a disk somewhere.

If ANYTHING steps in to alter or filter or convert the 'content'
at the PRESENTATION layer then it MUST change the 'Content-Length'
as well because from the 'Content-xxxxx' perspective... the
content has, in fact, changed at the PRESENTATION layer.

There is no HTTP header field that looks like this...

Original-Content-Length: xxxx <- Length of data before P layer content changed

All you have to work with is this...

Content-length: xxxx <- Length of P layer data NOW after something changes it.

RFC 2616 says...

4.4 Message Length
3. If a Content-Length header field ( section 14.13 ) is present, its
decimal value in OCTETs represents BOTH the entity-length and
the transfer-length. The Content-Length header field must NOT be sent
if these two lengths are different.... [snip]

What this really means is...

3. If a ( PRESENTATION layer ) Content-Length header field
( section 14.13 ) is present, its decimal value in OCTETs represents
BOTH the entity-length ( Actual PRESENTATION layer length ) and
the transfer-length ( TRANSPORT layer length - actual number of
bytes on the wire ). The Content-Length header field must NOT be sent
if these two lengths are different.... [snip]

The last part is kind of moot since it's not uncommon at all for
presentation layer content-length to be 'different' from the actual
transport layer length. You will see it all the time 'out there'. The
only thing that gets you into real trouble is when the actual length
of the data is MORE than whatever the 'Content-length:' field says
it's supposed to be.

Example: Even with all the above being said... it is actually OK to
leave 'Content-Length:' set to the original size of the file IF you are
using GZIP or DEFLATE ( or any LZ77 ) to compress the content.
As long as the specified 'Content-length:' ( original size ) is MORE
than the number of compressed LZ77 bytes on the wire you will
usually still be OK.

Why?... because GZIP and ZLIB and all other LZ77 decompressors
can work out for themselves where the data ends and they don't
need HTTP to tell them. The size of the original file is (usually)
recorded in the stream itself ( GZIP stores it in its trailer ).

Even 'streamed compression' ( sic: ZLIB ) will KNOW when the
decompression has ended. There's an EOD signal built into
the stream itself... but that doesn't mean the Server will know
what the decompressor 'knows'.
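
To make that concrete, here is a small sketch using Python's standard
gzip and zlib modules (my choice for illustration, nothing to do with
mod_deflate's internals). In the gzip format the original length,
modulo 2^32, sits in the last four bytes of the stream (the ISIZE
field). A raw zlib stream carries no length at all, but the
decompressor still detects the end of the stream and leaves any
trailing bytes untouched.

```python
import gzip
import struct
import zlib

plain = b"hello world " * 100

# gzip: the final 4 bytes are ISIZE, the original length mod 2**32,
# stored little-endian.
gz = gzip.compress(plain)
isize = struct.unpack("<I", gz[-4:])[0]
print("ISIZE matches original length:", isize == len(plain))

# zlib: no length field, but the decompressor sees the end-of-data
# marker and stops there, even if more bytes follow on the wire.
stream = zlib.compress(plain) + b"TRAILING"
d = zlib.decompressobj()
out = d.decompress(stream)
print("end of stream seen:", d.eof)
print("bytes left over:", d.unused_data)
```

The decompressor knows where the data ends; whatever is reading the
socket in front of it does not, unless HTTP framing tells it.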

Which brings us to your 'action items', methinks.

If you are using 'streamed compression' ( Sic: ZLIB ) then there
will only be 2 ways that the Server knows how many bytes the
Client is actually SENDING...

1. The Content-Length in the request header is, in fact, the transfer
length and the Server will stop reading the upload when that length is
reached and won't 'timeout' or anything waiting for more data that
never arrives.

2. The Client is using HTTP/1.1 and "Transfer-encoding: chunked"
for the upload and there will be a ZERO LENGTH chunk
at the end of the compressed data ( even though the compressed
data has its own EOD signal ).

3. Client connection closes <-- NOT AN OPTION FOR Client -> Server !!

A Server can always just CLOSE a connection to say "I am done sending".
No such chance on the Client side. The moment the Client 'hangs up' that
is, of course, the end of the entire conversation. Nothing can come back.
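
For option 2, a rough sketch of chunked framing in Python. Both
function names are invented for this example and it ignores trailers.
It only shows that the body ends with a zero-length chunk, so the
receiver knows where the upload stops without any Content-Length.

```python
import gzip

def encode_chunked(body: bytes, chunk_size: int = 1024) -> bytes:
    """Frame body as HTTP/1.1 chunked data (hypothetical helper)."""
    out = bytearray()
    for i in range(0, len(body), chunk_size):
        chunk = body[i:i + chunk_size]
        out += b"%x\r\n" % len(chunk)   # chunk size in hex
        out += chunk + b"\r\n"
    out += b"0\r\n\r\n"                 # zero-length chunk ends the body
    return bytes(out)

def decode_chunked(data: bytes) -> bytes:
    """Reverse of encode_chunked; stops at the zero-length chunk."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # Ignore any chunk extensions after ';'.
        size = int(data[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2                 # skip chunk data and its CRLF
    return bytes(body)

compressed = gzip.compress(b"field=value&" * 200)
wire = encode_chunked(compressed, chunk_size=16)
print("ends with zero-length chunk:", wire.endswith(b"0\r\n\r\n"))
print("round trip ok:", decode_chunked(wire) == compressed)
```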

If there is any chance at all of using mod_deflate as an INPUT filter
then you better get your customer to do #1 and then either #2 or #3...

1. Send HTTP/1.1 requests ( Not 1.0 )
2. Make sure Content-length is the EXACT number of compressed bytes.
( OR )
3. Don't use Content-Length: at all... use "Transfer-encoding: chunked"
for the upload.

Later...
Kevin

PS: What you REALLY want to be using is "Transfer-encoding: gzip"
for the Client->Server uploads but that's another entire discussion.
Using DCE ( Dynamic Content Encoding ) to compress either
requests or responses is really just a big kludge because no one
seems to get RFC compliant and make "Transfer-Encoding:" a reality
on all major Servers and Clients. It's a 'trick' that just happens
to work because "Content-encoding: gzip" is the only kind of
compression most browsers even have a chance at doing correctly.

> Joshua Slive wrote:
>
>> On Wed, 31 Mar 2004, Henri Gomez wrote:
>>
>>>Also the Content-Length is set to the size of the
>>>plain request (not the size of the compressed request).
>>>
>>>Is it correct or should it send the Content-Length with
>>>the size of the compressed request ?
>>>
>>>In such case, it seems that mod_deflate INPUT filter should
>>>modify the Content-Length accordingly ?
>>
>>
>> The note at the bottom of this section:
>> http://httpd.apache.org/docs-2.0/mod/mod_deflate.html#enable
>> says that it should be the compressed length and that the server is not
>> responsible for changing the length during decompression.
>
> Ok.
>
> Since jk / jk2 are using content-length to forward data to tomcat,
> how does jk/jk2 know the correct size to be forwarded ?
>
> Regards
