I am forwarding your request to wikitech-l, in the hope that more people there can comment on this issue.
For those who did not follow the entire thread: the user does not send an Accept-Encoding: gzip header, but nevertheless gets a gzipped response.

On Thu, Nov 25, 2010 at 8:19 PM, Anand Ramanathan <rcan...@gmail.com> wrote:
> Bryan: No, I didn't set the Accept-Encoding header explicitly - I found the
> following related issue on bugzilla: 7098
>
> Andrew: Yes, thanks. I see that curl can support this, and so can open-uri.
>
> I wanted to clarify if I should be handling this in the client:
> As per HTTP 1.1 (section 14.3), for non-browser user agents, if no
> Accept-Encoding is explicitly set, the response should be the document
> itself if the server supports returning the document itself (identity).
> However, if the server is unable to return the document itself, it is
> preferable to return gzip or compressed content.
> I think this issue is happening whenever I hit a cache node that has the
> gzip, but not the identity cached. From a server standpoint, it seems like
> the right behavior. So, it is up to the client, which needs to do one of the
> following:
> a) Set Accept-Encoding to make gzip not-acceptable, and identity as
> acceptable. In this case, a cache node containing only gzip encoded document
> will miss, and eventually a node that contains the identity will return it.
> (This is a leap of faith, as I cannot target such a cache node explicitly.
> If a node has both gzip and identity content, and is responding with gzip
> for a request with no explicit Accept-Encoding set, then it violates the
> spec and is a bug. Can anyone comment on this?)
> b) Set Accept-Encoding to accept gzip or identity (or leave it unset), and
> on the client, if Content-Encoding is gzip, unzip it explicitly.
> I am fine with either of these approaches. Is this an accurate assessment of
> the issue and options?
> Thanks
> Anand
>
> On Thu, Nov 25, 2010 at 4:23 AM, Andrew Dunbar <hippytr...@gmail.com> wrote:
>>
>> On 25 November 2010 19:41, Anand Ramanathan <rcan...@gmail.com> wrote:
>> > Yes, confirmed that they are. It is gzip - what is the best way to deal with
>> > this? Is this a bug that is tracked, or is this something worth handling in
>> > client code (checking if gzip and manually unzipping)?
>> > Thanks
>> > Anand
>>
>> Curl can definitely handle gzipped responses. Here's something about
>> it from a very quick Google search:
>> http://curl.haxx.se/mail/curlphp-2004-01/0043.html
>>
>> Andrew Dunbar (hippietrail)
>>
>> > On Thu, Nov 25, 2010 at 12:12 AM, Bryan Tong Minh <bryan.tongm...@gmail.com> wrote:
>> >>
>> >> On Thu, Nov 25, 2010 at 9:02 AM, Anand Ramanathan <rcan...@gmail.com> wrote:
>> >> > OK, I got it again: Here is my curl output (headers + first few characters)
>> >> > for the garbled India wikipedia page (and the proper China wikipedia page
>> >> > for comparison below that):
>> >>
>> >> Can you verify that the first two characters are 0x1f and 0x8b
>> >> respectively? Looks like gzip.
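
For anyone who wants to handle this on the client side, here is a minimal sketch in Python of the two options Anand lists, together with the 0x1f 0x8b check Bryan asked about. The names `decode_body` and `IDENTITY_ONLY_HEADERS` are made up for this illustration and are not part of any MediaWiki, curl, or open-uri API. The sketch assumes the whole response body is already in memory as bytes, and it says nothing about how the caches will actually respond to the option (a) header.

```python
import gzip

# Every gzip stream starts with these two bytes (0x1f, 0x8b).
GZIP_MAGIC = b"\x1f\x8b"

# Option (a): ask for identity and mark gzip as not acceptable (q=0).
# Whether a given cache node honours this is an open question in the thread.
IDENTITY_ONLY_HEADERS = {"Accept-Encoding": "identity, gzip;q=0"}


def decode_body(body, content_encoding=None):
    """Option (b): return the uncompressed form of a response body.

    `content_encoding` is the value of the Content-Encoding response
    header, or None if the header was absent. The magic-byte check is a
    fallback for responses that are gzipped but not labelled as such.
    """
    declared_gzip = (content_encoding or "").strip().lower() in ("gzip", "x-gzip")
    looks_gzip = body[:2] == GZIP_MAGIC
    if declared_gzip or looks_gzip:
        # Raises an exception if the body claims to be gzip but is not.
        return gzip.decompress(body)
    return body
```

With this, a client would pass the raw body and the Content-Encoding header value to `decode_body` and get the page text back either way. Sending `IDENTITY_ONLY_HEADERS` is the alternative if unzipping on the client is not wanted.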
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api