I ran SELECT HEX(your_column) FROM your_table and, indeed, only '?' is being
stored (hex 3F, decimal 63). Thanks for the hint about Unicode support in MyODBC;
I'll read more on it tomorrow. I've had enough frustration for one day... :)

Thanks.

S Lopes



-----Original Message-----
From: Jeremy March [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 02, 2004 2:53 PM
To: [EMAIL PROTECTED]
Cc: Silvio Lopes de Oliveira
Subject: RE: Unicode characters become question marks


> You know, now I'm sure that the chars are getting stored as '?' as
> well. I tried the test you suggested again, but with a small
> modification. I typed:
>
>       SELECT IF(networkname='?', 1, 0) from networktable;
>
> and it returned 1. Because I used '?' instead of the Chinese char and
> it matched, then obviously the stored character is a '?'. So my
> conclusion is the same as James Huang's; the problem happens when the
> string is stored. But no solution yet, though.

The first thing to be sure to do is execute this query from the client:

SET CHARACTER SET utf8;

The best way to see what is actually being stored is to select the hex
value of the column:

SELECT HEX(your_column) FROM your_table;

To see the hex values as Unicode code points, convert the utf8 to ucs2:

SELECT HEX(CONVERT(your_column USING ucs2)) FROM your_table;
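To make the expected hex output concrete, here is a rough Python sketch of what
the two queries above should show for a correctly stored character, next to what
a lossy conversion produces. The helper names and the sample character are my own
choices for illustration, and the latin1 hop is only an assumption about where
the '?' might come from, not a diagnosis of your setup.

```python
def hex_utf8(text: str) -> str:
    """Roughly what HEX(col) would show for a value stored as utf8."""
    return text.encode("utf-8").hex().upper()


def hex_ucs2(text: str) -> str:
    """Roughly what HEX(CONVERT(col USING ucs2)) would show.

    UTF-16 big-endian matches UCS-2 for code points up to U+FFFF.
    """
    return text.encode("utf-16-be").hex().upper()


sample = "\u4e2d"  # an arbitrary Chinese character, picked for illustration

stored_ok = hex_utf8(sample)      # bytes of a correctly stored utf8 value
as_codepoint = hex_ucs2(sample)   # the same value read as a code point

# Assumption: somewhere on the way in, the text passes through latin1.
# A character with no latin1 equivalent is replaced by '?', which is 3F.
stored_lossy = sample.encode("latin-1", errors="replace").hex().upper()

print(stored_ok, as_codepoint, stored_lossy)
```

If HEX() returns 3F for a column that should hold a Chinese character, the
original bytes are gone and no later conversion will bring them back.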

I'm not sure whether this is equivalent to the Java example given earlier,
but this is how I always insert hex values directly:

INSERT INTO your_table VALUES (CONVERT(_ucs2 0x1234 USING utf8));

where 0x1234 is a Unicode code point (U+1234).  This way you can enter the
character as a code point and have it converted to the utf8 equivalent.
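For anyone who wants to check the result by hand, the conversion can be
approximated outside MySQL. This is only a sketch: the function name is made up,
and it uses UTF-16 big-endian as a stand-in for ucs2, which is the same thing
only for code points up to U+FFFF.

```python
def ucs2_hex_to_utf8_hex(ucs2_hex: str) -> str:
    """Approximate HEX(CONVERT(_ucs2 0x... USING utf8)) for BMP code points.

    ucs2_hex is the hex string without the 0x prefix, e.g. "1234".
    """
    raw = bytes.fromhex(ucs2_hex)
    # UTF-16 big-endian stands in for ucs2 here (an assumption of this sketch).
    text = raw.decode("utf-16-be")
    return text.encode("utf-8").hex().upper()


print(ucs2_hex_to_utf8_hex("1234"))
```

Running SELECT HEX(your_column) after the INSERT should then show the same
three-byte value for that row if the character was stored intact.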

FYI, I'm fairly sure that MyODBC will not support Unicode until version
3.52.  When I tried to use MyODBC for Unicode a while back, all I got was
'???'.

You might try upgrading to MySQL 4.1.2; it has better support for character
set conversions and a new ucs2_general_uca collation which uses the
Unicode Collation Algorithm.

Another thing to consider is that MySQL only supports utf8 characters up
to 3 bytes long.  I don't know whether any of the Chinese characters you
need take more than that, but if so that might be another reason to use ucs2.
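One way to find out is to measure the UTF-8 length of the characters you
actually need. The sketch below is a rough check with made-up helper names and
sample characters of my own choosing. Code points up to U+FFFF encode to at most
three bytes in UTF-8, and anything above that needs four.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_three_byte_utf8(text: str) -> bool:
    """True if every character in text needs at most 3 bytes in UTF-8."""
    return all(utf8_length(ch) <= 3 for ch in text)


common = "\u4e2d\u6587"   # two everyday Chinese characters (illustrative)
rare = "\U00020000"       # a character outside the U+0000..U+FFFF range

print(fits_three_byte_utf8(common), fits_three_byte_utf8(rare))
```

If the check passes for your real data, the 3-byte limit should not be what
is turning the characters into question marks.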

good luck,

Jeremy March


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
