Hi Roland, and thank you for answering.
It sure helped. The web server really was returning little squares (code
point 0). All I had to do to get it working was set a request
header saying that I accept UTF-8 as well:
post.setRequestHeader("accept-charset", "ISO-8859-1,UTF-8;q=0.7,*;q=0.7");
That changed the response encoding from ISO-8859-1 to UTF-8, and my
little squares turned into Cyrillic or Greek characters.
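For readers who are not using HttpClient, here is a rough equivalent of the same request header with the JDK's own `HttpURLConnection`. It is a sketch only: the URL and the class and method names are hypothetical, and nothing is sent over the network until `connect()` or `getInputStream()` is called. Which charset the server actually picks from this list is up to the server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class AcceptCharsetExample {
    // Builds (but does not connect) a POST request carrying the same
    // Accept-Charset value used with HttpClient above. Hypothetical helper.
    static HttpURLConnection prepare(String address) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
            conn.setRequestMethod("POST");
            // ISO-8859-1 has an implicit q=1; UTF-8 and any other charset get q=0.7
            conn.setRequestProperty("Accept-Charset", "ISO-8859-1,UTF-8;q=0.7,*;q=0.7");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical URL: no request is sent by this program
        HttpURLConnection conn = prepare("http://example.com/translate");
        System.out.println(conn.getRequestProperty("Accept-Charset"));
    }
}
```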
Thanks again!!
2006/10/27, Roland Weber <[EMAIL PROTECTED]>:
Hi Franck,
> But when I try to parse Babelfish's response to my request for a
> translation to Russian or Greek, I get little squares instead of
> Russian or Greek characters.
HttpClient never gives you little squares. It gives you bytes or
characters, and the characters may or may not be correct. If you
*print* the characters to some stream, they get *rendered*
in some font. Little squares usually indicate one of the following:
1. The character is a little square.
2. The character is correct, but not supported by the font.
   For example, a Japanese character printed in an ISO-Latin font.
3. The character is wrong.
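Case 3 is easy to reproduce with nothing but the JDK. The sketch below (class and method names are illustrative, and the sample word is made up, not taken from Babelfish) encodes a Russian word as UTF-8, then decodes the same bytes once with the matching charset and once with ISO-8859-1, and lists the resulting code points.

```java
import java.nio.charset.StandardCharsets;

public class WrongDecodingDemo {
    // Lists the code points of a string as "U+XXXX" values separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Russian "Privet" ("hello"), written with escapes to avoid source-encoding issues
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442";
        // The bytes a server would send if it encoded the text as UTF-8
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        String right = new String(body, StandardCharsets.UTF_8);      // matching charset
        String wrong = new String(body, StandardCharsets.ISO_8859_1); // mismatched charset

        System.out.println("right: " + right.equals(original) + " -> " + codePoints(right));
        System.out.println("wrong: " + wrong.equals(original) + " -> " + codePoints(wrong));
    }
}
```

The mis-decoded string is twice as long (one char per byte) and contains C1 control characters such as U+009F and U+0080, which often show up as little squares or not at all.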
> So I tried to use getResponseBody()
> on the post method to get an array of bytes so I could convert it
> using UTF-8, ISO-8859-1 or UTF-16. No matter which character encoding
> I use, I get those annoying little squares.
Try printing the code points:
System.out.println("character: " + ((int)c));
Then you know which value the character has in memory, and can
verify whether the problem is in character decoding or in the
range of characters supported by the font.
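A slightly extended version of that one-liner, as a sketch: it walks over code points rather than `char`s (so characters outside the Basic Multilingual Plane are not split into surrogates) and adds the Unicode name via `Character.getName` (Java 7+), which makes it obvious whether a character really is a square, e.g. U+25A1 WHITE SQUARE. The class name and sample input are illustrative.

```java
public class CodePointDump {
    // Returns one line per code point: decimal value, hex value and Unicode name.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb
            .append("character: ").append(cp)
            .append(String.format(" (U+%04X) ", cp))
            .append(Character.getName(cp))   // null for unassigned code points
            .append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample input: Greek small alpha, Cyrillic capital ZHE, WHITE SQUARE
        System.out.print(dump("\u03b1\u0416\u25a1"));
    }
}
```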
You can also try writing the characters to a binary file as
UTF-16, then opening that file with a text editor that supports
Unicode, such as WordPad (if you use Windows).
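One way to do that with `java.nio.file`, as a sketch: the text is encoded as UTF-16 little-endian with a leading byte order mark, which is the layout Windows Notepad saves as "Unicode" and which helps editors detect the encoding. The class name, temp-file prefix and sample text are assumptions; in practice you would pass the decoded response string.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf16FileDump {
    // Writes the text as UTF-16 little-endian preceded by a byte order mark (U+FEFF).
    static Path writeUtf16(String text, Path target) {
        try {
            byte[] bytes = ("\uFEFF" + text).getBytes(StandardCharsets.UTF_16LE);
            return Files.write(target, bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample text: Greek "Athina" (Athens) written with escapes
        String text = "\u0391\u03b8\u03ae\u03bd\u03b1";
        Path file = Files.createTempFile("response", ".txt");
        writeUtf16(text, file);
        System.out.println("Open " + file + " in WordPad or another Unicode-aware editor");
    }
}
```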
> String encoding = post.getResponseCharSet();
>
> String altavistaResponse /*= new String(post.getResponseBody(), "ISO-8859-1")*/;
> //altavistaResponse = new String(post.getResponseBody(), "ISO-8859-7");
> altavistaResponse = new String(post.getResponseBody(), "UTF-8");
> //altavistaResponse = new String(post.getResponseBody(), "UTF-16");
> //altavistaResponse = post.getResponseBodyAsString();
Well, what *is* the value of "encoding"? And which encoding does a browser
use to display the page when you visit it directly?
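The value of "encoding" comes from `getResponseCharSet()`, which reads the charset parameter of the response's Content-Type header. To see what a server actually declares, a hand-rolled version of that lookup might look like the sketch below (hypothetical helper, not HttpClient's code). It falls back to ISO-8859-1, the default RFC 2616 assigns to text types when no charset is given.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ResponseCharset {
    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=UTF-8", or ISO-8859-1 when the parameter is
    // missing or names a charset this JVM does not support.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String[] kv = param.trim().split("=", 2);
                if (kv.length == 2 && kv[0].trim().toLowerCase(Locale.ROOT).equals("charset")) {
                    String name = kv[1].trim().replace("\"", "");
                    try {
                        if (Charset.isSupported(name)) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalCharsetNameException e) {
                        // malformed charset name: keep looking, then use the default
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8")); // UTF-8
        System.out.println(charsetOf("text/html"));                // ISO-8859-1
    }
}
```

With the raw body in hand, decoding then becomes `new String(body, ResponseCharset.charsetOf(contentType))`, where `body` and `contentType` stand for the response bytes and header value.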
hope that helps,
Roland
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Best regards.
Franck MARTIN
[EMAIL PROTECTED]
http://javafreelance.com