Bryan: No, I didn't set the Accept-Encoding header explicitly. I found the
following related issue on Bugzilla:
7098<https://bugzilla.wikimedia.org/show_bug.cgi?id=7098>


Andrew: Yes, thanks. I see that curl can support this, and so can open-uri.

I wanted to clarify if I should be handling this in the client:

As per HTTP/1.1 (section
14.3<http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html>),
for non-browser user agents, if no Accept-Encoding is explicitly set, the
response should be the document itself (identity) if the server is able to
return it. However, if the server is unable to return the document itself,
it is preferable to return gzip or compress encoded content.
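
To make my reading of that rule concrete, here is a minimal Python sketch of
the choice a server or cache node would make when the request carries no
Accept-Encoding header. The function name and the `available` list are
hypothetical, purely for illustration; this is not how the Wikimedia caches
are actually implemented.

```python
def pick_coding_without_accept_encoding(available):
    """Pick a content-coding for a request that has no Accept-Encoding header.

    `available` is a hypothetical list of the codings the node holds for the
    document, e.g. ["gzip"] or ["gzip", "identity"].
    """
    # Prefer the unencoded document whenever the node has it.
    if "identity" in available:
        return "identity"
    # Otherwise fall back to the codings HTTP/1.0 clients commonly understand.
    for coding in ("gzip", "compress"):
        if coding in available:
            return coding
    # Nothing suitable is held by this node.
    return None
```

Under this reading, a node holding only the gzip copy returns gzip, which
matches what I am seeing.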

I think this issue happens whenever I hit a cache node that has the gzip
version cached but not the identity version. From the server's standpoint,
this seems like the right behavior. So it is up to the client, which needs to
do one of the following:

a) Set Accept-Encoding so that gzip is not acceptable and identity is
acceptable. In this case, a cache node containing only the gzip-encoded
document will miss, and eventually a node that contains the identity version
will return it. (This is a leap of faith, as I cannot target such a cache
node explicitly. If a node has both gzip and identity content, and responds
with gzip to a request with no explicit Accept-Encoding set, then it violates
the spec and is a bug. Can anyone comment on this?)
b) Set Accept-Encoding to accept gzip or identity (or leave it unset), and
on the client, if Content-Encoding is gzip, unzip it explicitly.
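
For reference, both options can be sketched in a few lines of Python using
only the standard library. The helper names are hypothetical, and the
fallback check on the first two bytes (0x1f 0x8b, as Bryan suggested) is my
own assumption for responses that arrive without a Content-Encoding header;
it would misfire on a document that is itself a gzip file.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def identity_only_headers():
    """Option (a): identity is acceptable, gzip is explicitly not (q=0)."""
    return {"Accept-Encoding": "identity, gzip;q=0"}


def decode_body(content_encoding, body):
    """Option (b): return the document bytes, unzipping them if needed.

    `content_encoding` is the value of the Content-Encoding response header,
    or None if the header is absent. `body` is the raw response body (bytes).
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    # Assumption: no header, but the body looks like gzip, so unzip it anyway.
    if not encoding and body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body
```

With option (b), the client passes whatever Content-Encoding value it
received, plus the raw body, to `decode_body` and uses the result as the page
content.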

I am fine with either of these approaches. Is this an accurate assessment of
the issue and options?

Thanks
Anand

On Thu, Nov 25, 2010 at 4:23 AM, Andrew Dunbar <hippytr...@gmail.com> wrote:

> On 25 November 2010 19:41, Anand Ramanathan <rcan...@gmail.com> wrote:
> > Yes, confirmed that they are. It is gzip - what is the best way to deal
> > with this? Is this a bug that is tracked, or is this something worth
> > handling in client code (checking if gzip and manually unzipping)?
> > Thanks
> > Anand
>
> Curl can definitely handle gzipped responses. Here's something about
> it from a very quick Google search:
> http://curl.haxx.se/mail/curlphp-2004-01/0043.html
>
> Andrew Dunbar (hippietrail)
>
>
> > On Thu, Nov 25, 2010 at 12:12 AM, Bryan Tong Minh
> > <bryan.tongm...@gmail.com> wrote:
> >>
> >> On Thu, Nov 25, 2010 at 9:02 AM, Anand Ramanathan <rcan...@gmail.com>
> >> wrote:
> >> > OK, I got it again: Here is my curl output (headers + first few
> >> > characters) for the garbled India wikipedia page (and the proper
> >> > China wikipedia page for comparison below that):
> >>
> >> Can you verify that the first two characters are 0x1f and 0x8b
> >> respectively? Looks like gzip.
> >>
> >> _______________________________________________
> >> Mediawiki-api mailing list
> >> Mediawiki-api@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
> >
> >
> > _______________________________________________
> > Mediawiki-api mailing list
> > Mediawiki-api@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
> >
> >
>
> _______________________________________________
> Mediawiki-api mailing list
> Mediawiki-api@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api