Okay, I'll look into the server side of things.

As an aside, converting the stream to characters is generally a really
bad idea when it comes to XML parsing. Normally you just want to give the
parser the stream and let it figure out all the details.
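A minimal Java sketch of handing the parser the raw stream, using only the JDK's built-in DOM parser (javax.xml.parsers). The class name StreamXmlParse and the sample body are hypothetical. The point is that the parser reads the encoding from the XML declaration itself instead of the application picking a charset first.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class StreamXmlParse {
    // Give the parser the raw byte stream; it determines the encoding itself
    // from a byte order mark and/or the XML declaration.
    static Document parse(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(in);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical response body: ISO-8859-1 bytes with a matching declaration.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Jos\u00e9</name>";
        byte[] body = xml.getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(body));
        // The parser decoded byte 0xE9 as U+00E9 without any help from the caller.
        System.out.println(doc.getDocumentElement().getTextContent().equals("Jos\u00e9")); // true
    }
}
```

The same approach works for an HTTP response: pass the entity's InputStream directly to parse() rather than reading it into a String first.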

Thanks for the help.

On 4/4/11 8:22 AM, Oleg Kalnichevski wrote:
> On Mon, 2011-04-04 at 08:16 -0400, Chad La Joie wrote:
>> Then I guess I misunderstood what you were saying before.  Are you
>> suggesting then that the server is transcoding the file when it serves
>> it up?  And that the missing bytes actually go missing before HttpClient
>> gets the response?
> 
> 
> That is one possibility. Besides, I suspect that your application also
> needs to convert the response content to a stream of characters in order
> to be able to parse the XML message. This is another possibility for
> things to go screwy.   
> 
> Hope this helps
> 
> Oleg
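To illustrate how a character-conversion step can make bytes go missing, here is a small JDK-only Java sketch. It assumes, hypothetically, a body that is really ISO-8859-1 but gets decoded as UTF-8: the lone 0xE9 byte is not valid UTF-8, so it is replaced with U+FFFD and cannot be recovered by re-encoding. The class name CharsetRoundTrip is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
    // Decode the raw bytes with a charset, then encode the characters back
    // with the same charset, as an application's decode/re-encode step might.
    static byte[] roundTrip(byte[] raw, Charset cs) {
        String text = new String(raw, cs); // malformed input becomes U+FFFD
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        // Hypothetical body that is really ISO-8859-1: "caf" followed by 0xE9.
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Wrong guess: 0xE9 on its own is not valid UTF-8, so it is replaced
        // with U+FFFD and the original byte is lost.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8))); // false

        // Right guess: ISO-8859-1 maps every byte value to one char and back.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1))); // true
    }
}
```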
> 
>> On 4/4/11 8:08 AM, Oleg Kalnichevski wrote:
>>> On Mon, 2011-04-04 at 07:34 -0400, Chad La Joie wrote:
>>>> Yeah, unfortunately that didn't work.
>>>>
>>>> Is there any way to get the old v3 behavior that gives you access to the
>>>> raw bytes of the entity before any sort of character decoding is done?
>>>>
>>>> I strongly suspect that very few web servers out there are properly
>>>> configured to return the correct character encoding so this could
>>>> definitely be an ongoing problem.
>>>>
>>>
>>> EntityUtils.toByteArray returns raw response content without attempting
>>> to decode it. 
>>>
>>> http://hc.apache.org/httpcomponents-core-ga/httpcore/xref/org/apache/http/util/EntityUtils.html#81
>>>
>>> Oleg
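Once the raw bytes are in hand (with HttpClient 4.x, via EntityUtils.toByteArray(entity) as mentioned above), one way to check whether they really are the UTF-8 a server claims is a strict JDK decoder. This sketch uses only java.nio; the class name RawBytesCheck and the sample inputs are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class RawBytesCheck {
    // True if the bytes are well-formed UTF-8. Unlike new String(bytes, UTF_8),
    // which silently substitutes U+FFFD, a decoder set to REPORT throws on bad input.
    static boolean isValidUtf8(byte[] raw) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical bodies: the same text encoded two different ways.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false
    }
}
```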
>>>
>>>
>>>> On 4/2/11 6:29 AM, Oleg Kalnichevski wrote:
>>>>> On Sat, 2011-04-02 at 06:10 -0400, Chad La Joie wrote:
>>>>>> Okay, that makes sense.
>>>>>>
>>>>>> To test this, is there a way I can force the content type on the client
>>>>>> side, prior to requesting the response entity, via the response object?
>>>>>>
>>>>>
>>>>> You can try adding Accept and / or Accept-Charset header to the request
>>>>> message and see if the origin server responds appropriately.
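A sketch of the Accept / Accept-Charset suggestion. To keep it self-contained it uses the JDK 11+ java.net.http.HttpRequest builder as a stand-in; with HttpClient 4.x the equivalent would be calling addHeader(name, value) on an HttpGet. The URL and the application/xml media type are placeholder assumptions, and the server is free to ignore both headers.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class CharsetRequest {
    // Build (but do not send) a GET stating the preferred media type and charset.
    // These headers are only preferences; the origin server may ignore them.
    static HttpRequest build(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/xml")
                .header("Accept-Charset", "UTF-8")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder URL; substitute the real document location.
        HttpRequest request = build("http://example.org/document.xml");
        System.out.println(request.headers().firstValue("Accept-Charset").orElse("(none)"));
    }
}
```

After sending, comparing the Content-Type charset in the response with what was requested shows whether the server honours the header.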
>>>>>
>>>>> However, generally you might be better off using some sort of content
>>>>> detection algorithm, such as that provided by the Apache Tika toolkit. I suspect
>>>>> wget does exactly that.
>>>>>
>>>>> http://tika.apache.org/0.9/detection.html
>>>>> http://tika.apache.org/
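Tika's detectors are far more thorough, but the basic idea for XML can be sketched in a few lines of Java: check for a byte order mark, then for an encoding pseudo-attribute in the XML declaration, and otherwise fall back to UTF-8 (the XML default). EncodingSniffer is a simplified, hypothetical helper, not Tika's API; it does not handle BOM-less UTF-16/UTF-32 or EBCDIC input.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingSniffer {
    // Matches the encoding="..." pseudo-attribute of an XML declaration at the start.
    private static final Pattern DECL_ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Guess the encoding of an XML document from its first bytes.
    static String sniff(byte[] head) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        // A UTF-32LE BOM also starts with FF FE; it is not distinguished here.
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        // No BOM: read the start as single-byte text to find the declaration,
        // which is plain ASCII in every ASCII-compatible encoding.
        String prefix = new String(head, 0, Math.min(head.length, 200), StandardCharsets.ISO_8859_1);
        Matcher m = DECL_ENCODING.matcher(prefix);
        return m.find() ? m.group(1) : "UTF-8";
    }

    public static void main(String[] args) {
        byte[] declared = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        System.out.println(sniff(declared)); // ISO-8859-1
        System.out.println(sniff(bom));      // UTF-8
    }
}
```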
>>>>>
>>>>> Oleg
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: httpclient-users-unsubscr...@hc.apache.org
>>>>> For additional commands, e-mail: httpclient-users-h...@hc.apache.org
>>>>>
>>>>>
>>>>
>>>
>>
> 

-- 
Chad La Joie
http://itumi.biz
trusted identities, delivered