Hi folks,

You need to be very careful with HTTP deflate encoding. Due to a common
misreading of the HTTP spec, there are two distinct implementations of HTTP
deflate out in the wild. The correct implementation uses RFC 1950 (the zlib
format). The incorrect implementation uses RFC 1951 (a raw deflate stream).
The reason seems to be the overloaded use of the term "deflate". Both zlib
and HTTP use the term, but unfortunately they don't mean the same thing. The
incorrect implementations use the zlib definition of deflate.
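
To make the difference concrete, here is a minimal sketch using Python's zlib
binding (my choice for illustration, not something the HTTP modules use). It
relies on the RFC 1950 layout: a 2-byte header, then the RFC 1951 data, then
a 4-byte Adler-32 checksum. The payload is made up for the example.

```python
import zlib

payload = b"Content-Encoding: deflate " * 8  # made-up sample data

# RFC 1950: zlib wrapper (2-byte header, deflate data, 4-byte Adler-32).
zlib_stream = zlib.compress(payload)

# RFC 1951: raw deflate data only; a negative wbits suppresses the wrapper.
raw = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw_stream = raw.compress(payload) + raw.flush()

# Split the zlib stream into its three parts and compare with the raw stream.
header, body, trailer = zlib_stream[:2], zlib_stream[2:-4], zlib_stream[-4:]
print(header.hex())
print(body == raw_stream)
print(int.from_bytes(trailer, "big") == zlib.adler32(payload))
```

With identical compression settings the middle part of the zlib stream should
match the raw stream, so the two flavours differ only in the wrapper.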

Last time I checked, Internet Explorer did it the wrong way. Opera can cope
with either. Older Netscape/Mozilla got it wrong, but I think recent Mozilla
has fixed it to handle both. I can't remember what Apache does.

This makes things slightly tricky if you are going to process content that
advertises itself as "Content-Encoding: deflate", because you can't tell
which of the two deflate implementations it is without doing a quick test on
the content. It is possible to do it though. You can test for RFC 1950 by
checking for the presence of a zlib header. To detect RFC 1951, you have to
attempt to decompress it. If that succeeds, you have RFC 1951; if it fails,
you don't.
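
The test described above can be sketched roughly as follows, again in Python
for illustration only. The function names and the "rfc1950"/"rfc1951" labels
are hypothetical. The header check follows RFC 1950: compression method 8, a
window-size field of at most 7, and the two header bytes forming a multiple
of 31 when read as a big-endian number.

```python
import zlib


def looks_like_zlib(data):
    """Return True if the first two bytes form a valid RFC 1950 header."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    return (
        (cmf & 0x0F) == 8                  # CM: 8 means deflate
        and (cmf >> 4) <= 7                # CINFO: window size up to 32K
        and ((cmf << 8) | flg) % 31 == 0   # FCHECK makes this a multiple of 31
    )


def inflate_http_deflate(data):
    """Decompress a 'deflate' body, returning (content, detected flavour)."""
    if looks_like_zlib(data):
        try:
            return zlib.decompress(data, zlib.MAX_WBITS), "rfc1950"
        except zlib.error:
            pass  # header matched by coincidence; fall through to raw deflate
    # Negative wbits tells zlib to expect a raw RFC 1951 stream.
    return zlib.decompress(data, -zlib.MAX_WBITS), "rfc1951"
```

If neither attempt works, the second call raises zlib.error, which is the
"if it fails, you don't" case.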

Things are much worse if you are running an HTTP proxy or an origin server
and an agent advertises "Accept-Encoding: deflate". Which of the two deflate
implementations do you use? If you knew all the possible User-Agent headers
you would ever get, you could maintain a database that mapped User-Agent to
deflate implementation type. Blech!
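
For what it's worth, such a User-Agent lookup might look something like the
sketch below (Python, illustration only). The table entries, the function
name and the default flavour are all hypothetical placeholders, not real
browser data.

```python
import zlib

# Hypothetical table: User-Agent substring -> deflate flavour the agent expects.
# The agent names are placeholders, not real browser data.
AGENT_FLAVOUR = {
    "ExampleRawAgent": "rfc1951",
    "ExampleZlibAgent": "rfc1950",
}


def deflate_for_agent(body, user_agent, default="rfc1950"):
    """Compress body in the flavour mapped to user_agent, else the default."""
    flavour = default
    for needle, value in AGENT_FLAVOUR.items():
        if needle in user_agent:
            flavour = value
            break
    # Positive wbits adds the zlib wrapper; negative wbits emits raw deflate.
    wbits = zlib.MAX_WBITS if flavour == "rfc1950" else -zlib.MAX_WBITS
    comp = zlib.compressobj(wbits=wbits)
    return flavour, comp.compress(body) + comp.flush()
```

The hard part is keeping the table accurate, not the code.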


Finally, I'm the author of Compress::Zlib, and I've been giving it a major
overhaul on-and-off over the last few months (I don't have a lot of free
time at the moment). One of my goals is to make it easier to use in the HTTP
modules (automatically figuring out which of the two deflate implementations
is used when doing an uncompress is already on my list), so if there are any
specific requests, now would be a good time to feed them back to me.

Paul

> -----Original Message-----
> From: David Carter [mailto:[EMAIL PROTECTED]
> Sent: 25 March 2003 10:44
> To: 'Mike Simons'; [EMAIL PROTECTED]
> Subject: RE: Net::HTTP does not use compressed transfers when it should
>
>
> Mike,
>
> If you're interested, I have some working Perl code that does deflate
> decompression. It does it at the application level, and needs to be moved
> down into Net::HTTP and/or LWP. There are some wrinkles related to handling
> of window_bits (or similar, don't have the code in front of me at the
> moment) that are not at all obvious.
>
> No need to teach mod_gzip deflate for testing - just find a site on the
> internet that already emits "Content-encoding: deflate" & test with it,
> such as http://www.homedepot.com
>
> All commercial "http accelerators" I have looked at use content-encoding
> rather than transfer-encoding. I think it has something to do with what
> Internet Explorer supports, or perhaps even how well it supports one vs.
> the other. It's been a couple of years since I worked with this
> extensively, so the details are a little foggy.
>
> I have written a Netscape (iPlanet) server plugin & CGI that apply deflate
> compression to data returned by any other CGI program, but unfortunately
> this code is proprietary.
>
> ---
> David Carter
> [EMAIL PROTECTED]
>
>
> > -----Original Message-----
> > From: Mike Simons [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, March 25, 2003 2:10 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Net::HTTP does not use compressed transfers when it should
> >
> > On Mon, Mar 24, 2003 at 01:59:56PM -0500, Mike Simons wrote:
> > >   Net::HTTP does not play nicely with mod_gzip from apache.
> > >
> > >   Net::HTTP sends 'TE:' headers, mod_gzip looks for 'Accept-encoding:'.
> > >
> > > - Any chance 'Accept-encoding:' can be advertised and
> > >   'Content-Encoding:' results can be decoded by Net::HTTP sometime soon?
> >
> >   So, I have something that works between mod_gzip and Net::HTTP,
> > using the gzip transfer type.  The data is transparently decompressed
> > by the HTTP module and block by block decompression is supported.
> >
> >   Patch attached ... in order for LWP to use this a minor patch is
> > needed to the http.pm module.
> >
> > - Who does code review or where do patches go?
> >
> >     Later,
> >       Mike Simons
> >
> >
> >   I'll try to clean it up somewhat tomorrow...
> >
> > First Draft BUGS:
> > ===
> > - The HTTP module advertises support for deflate, but doesn't handle
> >   that yet... from what I can tell mod_gzip cannot send deflate data.
> >   In order to get deflate working I need to teach mod_gzip to send
> >   deflate data...
> >
> > - The documentation isn't updated.
> >
> > - No attempt was made to support TE and Content-Encoded data at the same
> >   time.
> >
> > - This code has not been tested with Compress::Zlib uninstalled to
> >   verify that it still works there.
> >
> > - The decompression routine does block by block decompression, but
> >   in order to do this it calls a private Compress::Zlib method
> >   (_removeGzipHeader) to strip off the gzip header. This is the exact
> >   same function that the memGunzip call uses to strip the
> >   header...
> >     While it's unclean calling something else's private method,
> >   it would be worse re-implementing the prune here, because its size is
> >   dynamic.
> >
> > - If compression is requested it is important that client code not pay
> >   attention to the content-length value... that is not the number of
> >   bytes to read. Call the read method until it returns 0 bytes.
>
>
