Hi,

I'm having trouble storing specific UTF-8 characters in MySQL, for example the U+10330 character (UTF-8: 0xF0 0x90 0x8C 0xB0).
Background: I am trying to clone Wiktionary onto local intranets in a series of (disconnected) schools in Nepal. I'm encountering these problems when trying to import their big db dump, but have narrowed it down to a simple test case below. I am using MySQL 5.0.77 client and server on Linux.

I know these kinds of problems are commonly user errors, but I think I've covered all the bases. First, my command-line environment:

    # locale
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"

All UTF-8.

Snippets of my my.cnf:

    [mysqld]
    character_set_server=utf8

    [mysql]
    default-character-set=utf8

Inside mysql:

    mysql> SHOW VARIABLES LIKE "character\_set\_%";
    +--------------------------+--------+
    | Variable_name            | Value  |
    +--------------------------+--------+
    | character_set_client     | utf8   |
    | character_set_connection | utf8   |
    | character_set_database   | utf8   |
    | character_set_filesystem | binary |
    | character_set_results    | utf8   |
    | character_set_server     | utf8   |
    | character_set_system     | utf8   |
    +--------------------------+--------+

Hopefully I have convinced you that I am running in a true UTF-8 environment.

Now onto the issue. In the above environment I run:

    CREATE TABLE dsd (
      `page_id` int(10) unsigned NOT NULL auto_increment,
      `page_title` varchar(255) character set utf8 collate utf8_bin NOT NULL,
      PRIMARY KEY (`page_id`),
      UNIQUE KEY `name_title` (`page_title`)
    )

Now I insert one record with a known-working UTF-8 character:

    INSERT INTO dsd (page_title) VALUES (0xc2a3);

This is the UK pound sign: £
http://www.fileformat.info/info/unicode/char/00a3/index.htm

Running a SELECT statement shows that this was inserted just fine.
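The two hex values used in this test case (0xc2a3 and 0xF0 0x90 0x8C 0xB0) can be sanity-checked outside MySQL. Below is a minimal Python sketch, not part of the original test; the helper name `describe` is made up for illustration. It strictly decodes each byte sequence as UTF-8 and reports the code point, the encoded length in bytes, and whether the code point lies in the Basic Multilingual Plane (U+0000 to U+FFFF).

```python
def describe(raw: bytes):
    """Hypothetical helper: decode one UTF-8 character and summarize it.

    Returns (code_point, byte_length, in_bmp). Raises UnicodeDecodeError
    if the bytes are not well-formed UTF-8, and TypeError if they decode
    to more than one character.
    """
    text = raw.decode("utf-8")      # strict decoding is the default
    code_point = ord(text)          # requires exactly one character
    return code_point, len(raw), code_point <= 0xFFFF


if __name__ == "__main__":
    # The same byte values as the two INSERT statements in this post.
    for hex_value in ("c2a3", "f0908cb0"):
        cp, length, in_bmp = describe(bytes.fromhex(hex_value))
        print(f"0x{hex_value}: U+{cp:04X}, {length} bytes, in BMP: {in_bmp}")
```

The sketch only confirms that both sequences are well-formed UTF-8 and that they differ in encoded length (2 bytes versus 4 bytes); it says nothing about how MySQL itself handles them.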
Now the problematic character:

    INSERT INTO dsd (page_title) VALUES (0xf0908cb0);

This character is http://www.fileformat.info/info/unicode/char/10330/index.htm

This gives me the warning:

    Warning (Code 1366): Incorrect string value: '\xF0\x90\x8C\xB0' for column 'page_title' at row 1

and results in a zero-length string being inserted instead.

Can anyone else reproduce this? This is definitely a valid UTF-8 character. Why is MySQL rejecting it? The same happens if I input the character directly (rather than using the hex representation) and also if I input that character directly from a UTF-8 text file.

Any ideas?

Thanks,
Daniel