On Apr 2, 2011, at 3:29am, Oleg Kalnichevski wrote:

> On Sat, 2011-04-02 at 06:10 -0400, Chad La Joie wrote:
>> Okay, that makes sense.
>> 
>> To test this, is there a way I can force the content type on the client
>> side, prior to requesting the response entity, via the response object?
>> 
> 
> You can try adding an Accept and/or Accept-Charset header to the request
> message and see if the origin server responds appropriately.
> 
> However, you are generally better off using some sort of content
> detection algorithm, such as the one provided by the Apache Tika toolkit.
> I suspect wget does exactly that.
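
For anyone who wants to try the header suggestion quoted above, here is a minimal sketch. It uses Python's standard library rather than the HttpClient API discussed in this thread, the function name build_probe_request is hypothetical, and the URL is only a placeholder. It builds the request without sending it, and the origin server is of course free to ignore both headers.

```python
from urllib.request import Request


def build_probe_request(url: str, charset: str = "utf-8") -> Request:
    """Build (but do not send) a GET request that states the preferred
    media type and charset, so the server's reaction can be observed."""
    return Request(
        url,
        headers={
            # Prefer application/xml, fall back to text/xml.
            "Accept": "application/xml, text/xml;q=0.9",
            # Ask for a specific charset; servers may not honor this.
            "Accept-Charset": charset,
        },
        method="GET",
    )


# Placeholder URL, for illustration only.
request = build_probe_request("http://example.org/feed.xml")
```

Passing the request to urllib.request.urlopen and then checking the Content-Type of the response would show whether the server added a charset parameter.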

Tika tries to follow the recommendations of RFC 3023:
      If an application/xml entity is received where the charset
      parameter is omitted, no information is being provided about the
      charset by the MIME Content-Type header.  Conforming XML
      processors MUST follow the requirements in section 4.3.3 of [XML]
      that directly address this contingency.
This means it will look for a byte-order mark and an encoding declaration
inside the XML content.
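
To illustrate the idea, here is a rough sketch of that kind of check in Python. This is not Tika's actual code: the function name sniff_xml_encoding is made up for the example, and it only handles a byte-order mark or an encoding declaration written in an ASCII-compatible encoding. Documents in UTF-16 without a BOM, or in EBCDIC, which section 4.3.3 and Appendix F of the XML spec also cover, are not handled. The UTF-8 fallback follows the XML rule for entities that have neither a BOM nor a declaration.

```python
import codecs
import re

# Byte-order marks, longest first so UTF-32LE is not mistaken for UTF-16LE
# (the UTF-32LE BOM starts with the same two bytes as the UTF-16LE BOM).
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the data, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes, default: str = "UTF-8") -> str:
    """Guess the encoding of an XML entity from its leading bytes."""
    # 1. A byte-order mark, if present, decides the encoding family.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Otherwise look for an encoding declaration near the start.
    match = _DECL.match(data[:200])
    if match:
        return match.group(1).decode("ascii")
    # 3. No BOM and no declaration: XML says to assume UTF-8.
    return default
```

For example, sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>') returns "ISO-8859-1", while a document with neither a BOM nor a declaration falls back to "UTF-8".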

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






