Dear Anand,

Like Paul said, I too have a feeling that your Solaris console is unable to
handle the character set in the file. If your Windows machine displays the
same file correctly but your Solaris machine does not, the client has
downloaded the bytes correctly. You might need to install the language pack
that comes with the Solaris OS.

Having said that, how do you get the data? You should use the
getResponseBodyAsString() method to make sure that the byte[] of the
response is correctly decoded. The HTTP response from the web server
(ideally) sets the "Content-Type" header to let clients determine the
encoding (charset) of the page. HttpClient reads this header to decode the
data. Here is how the relevant methods are implemented in HttpClient
(HttpMethodBase.java):

    public String getResponseBodyAsString() {
        if (!responseAvailable()) {
            return null;
        }
        String body;
        try {
            body = new String(getResponseBody(), getResponseCharSet());
        }
        catch(UnsupportedEncodingException e) {
            if (log.isWarnEnabled()) {
                log.warn("Unsupported request body charset: "
                        + e.getMessage());
            }
            body = new String(getResponseBody());
        }

        return body;
    }


    public String getResponseCharSet() {
        return getContentCharSet(getResponseHeader("Content-Type"));
    }


Now, if the HTTP response from the server does not set this header, the
HTTP client has to assume a default character set, and this might lead to
incorrect decoding of the bytes. So, at least for testing purposes, make
sure you access a page from a server that you know sets this header (try
http://yahoo.co.jp).
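To make the effect of a missing charset concrete, here is a minimal
standalone sketch. It is not HttpClient code: the class ContentTypeCharset
and its methods are hypothetical names, the ISO-8859-1 fallback is an
assumption about the default, and it uses java.nio.charset.StandardCharsets,
which needs a newer JDK than the one current when this thread was written.
It decodes the same UTF-8 bytes once with the charset taken from a
Content-Type value and once with the fallback.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of HttpClient.
final class ContentTypeCharset {

    // Assumed fallback when the header carries no usable charset parameter.
    static final Charset FALLBACK = StandardCharsets.ISO_8859_1;

    /** Returns the charset named in a Content-Type value, or FALLBACK. */
    static Charset charsetOf(String contentType) {
        if (contentType == null) {
            return FALLBACK;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive match on the "charset=" parameter name.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: fall back.
                    return FALLBACK;
                }
            }
        }
        return FALLBACK;
    }

    /** Decodes the raw response bytes using the charset from the header. */
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType));
    }

    public static void main(String[] args) {
        // U+2014 (a long dash) occupies three bytes in UTF-8.
        byte[] body = "\u2014".getBytes(StandardCharsets.UTF_8);
        // With the header, the three bytes decode to one character.
        System.out.println(decode(body, "text/html; charset=UTF-8").length());
        // Without it, the fallback turns each byte into its own character.
        System.out.println(decode(body, "text/html").length());
    }
}
```

The second call does not fail; it silently produces three wrong characters,
which is why a missing header is easy to overlook.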

Once you have the data as a String, output it using System.out.println().
If you still see ??, your console is unable to display the character set of
the page. Time to install the language pack...
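One way to check whether the question marks come from the output side rather
than from the download is to print through a PrintStream with an explicit
encoding. The sketch below is only an illustration: ConsoleEncodingDemo and
render are hypothetical names, and US-ASCII stands in for whatever limited
encoding the console might be using. It shows that a correctly decoded
String still comes out as ? when the output encoding cannot represent its
characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper for illustration; not part of HttpClient.
final class ConsoleEncodingDemo {

    /** Returns the text as a console using the given encoding would receive it. */
    static String render(String text, String consoleEncoding) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            // PrintStream substitutes '?' for characters the encoding cannot map.
            PrintStream out = new PrintStream(sink, true, consoleEncoding);
            out.print(text);
            out.flush();
            return new String(sink.toByteArray(), consoleEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + consoleEncoding, e);
        }
    }

    public static void main(String[] args) {
        // "café" followed by two Japanese characters, written with escapes.
        String text = "caf\u00e9 \u65e5\u672c";
        // An ASCII-only output replaces every non-ASCII character with '?'.
        System.out.println(render(text, "US-ASCII"));
        // A UTF-8 output preserves the text unchanged.
        System.out.println(render(text, "UTF-8").equals(text));
    }
}
```

If the String survives a UTF-8 round trip like this but the terminal still
shows ??, the data is fine and the console (locale or fonts) is the part to
fix.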


parag.


----- Original Message -----
From: "Anand Heda" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, November 07, 2002 8:45 AM
Subject: Re: Encoding Problem


Hi Paul,

Thanks for your reply! Unfortunately, I did what you suggested and it still
appeared as a question mark.  However, I tried the same code on a Windows
machine and it worked there.  I found that Solaris has the unicode encoding
as UnicodeBig, while Windows has it as UnicodeLittle.

Do you have any suggestions as to how to correct this? I assume I wouldn't
use the getResponseBodyAsString method, but probably getResponseBody (which
returns body as a byte array)? I am stuck on where to go from there on how to
properly convert it with the correct characters intact.

If you could provide any information or help, I would greatly appreciate it.
  Thanks so much!

Anand






>From: Paul Libbrecht <[EMAIL PROTECTED]>
>Reply-To: "Jakarta Commons Users List" <[EMAIL PROTECTED]>
>To: "Jakarta Commons Users List" <[EMAIL PROTECTED]>
>Subject: Re: Encoding Problem
>Date: Fri, 18 Oct 2002 23:42:08 +0200
>
>
>On Friday, October 18, 2002, at 11:35 , Anand Heda wrote:
>>I am both a java novice and a HTTPClient newbie, so please bear with me
>>:-).
>>
>>I have written some simple code which retrieves a webpage and outputs the
>>content of it.  However, certain characters (like ó, which is a long dash)
>>do not appear correctly when I output them.  Instead, a ? (question mark)
>>appears in their place.
>>
>>How can I correct this? (BTW, I am running on Solaris).
>
>
>Anand,
>
>You should be very careful when stating that "?" is not correct. It may
>actually be correct, but your console may not have the font to display
>it...
>Whenever you have to check an encoding, save the bytes to a file, then open
>it with an editor that knows how to handle encodings (like jEdit).
>
>Encodings are an enormously delicate matter when it comes to sending or
>receiving.
>For example, the message you sent had no encoding header (oddly). As a
>result... the long dash you describe is seen on my machine as an
>"o" with an acute accent...
>
>Paul
>
>
>--







>To unsubscribe, e-mail:
><mailto:commons-user-unsubscribe@jakarta.apache.org>
>For additional commands, e-mail:
><mailto:commons-user-help@jakarta.apache.org>

