Re: Why UTF8 need 24bit in MySQL?
Hi,

On Tue, Jun 8, 2010 at 12:44 AM, Warren Young war...@etr-usa.com wrote:
> The Unicode consortium has stated that Unicode will never require more
> than 21 bits per character[*], and 24 bits is the next even multiple of
> 8 up from that.

Maybe off topic, but just curious: if 3 bytes is enough for every Unicode code point, then what is the use of 4-byte UTF-8?

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
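The question above turns on the difference between storing a code point as a plain integer and storing its UTF-8 encoding. As a rough sketch (the helper names `raw_bits`, `utf8_len`, and `utf8_bit_pattern` are made up for illustration, not part of MySQL or any library), the following shows that 21 raw bits fit in 3 bytes, while UTF-8 spends some bits of every byte on length and continuation markers and so needs a fourth byte above U+FFFF:

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines


def raw_bits(code_point: int) -> int:
    """Bits needed to store the code point as a plain integer."""
    return max(code_point.bit_length(), 1)


def utf8_len(code_point: int) -> int:
    """Number of bytes Python's UTF-8 encoder emits for the code point."""
    return len(chr(code_point).encode("utf-8"))


def utf8_bit_pattern(code_point: int) -> str:
    """The encoded bytes written out in binary, to make marker bits visible."""
    return " ".join(f"{b:08b}" for b in chr(code_point).encode("utf-8"))


if __name__ == "__main__":
    for cp in (0xFFFF, 0x10000, MAX_CODE_POINT):
        print(f"U+{cp:04X}: {raw_bits(cp)} raw bits, "
              f"{utf8_len(cp)} UTF-8 bytes -> {utf8_bit_pattern(cp)}")
```

In the printed bit patterns, the leading `1110` or `11110` of the first byte and the leading `10` of each following byte are markers, not payload. That overhead is why a 3-byte UTF-8 sequence carries only 16 bits even though 3 bytes hold 24.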
Why UTF8 need 24bit in MySQL?
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html

Since MySQL only supports the BMP, isn't 16 bits all that is actually needed?
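One way to look at this question is to measure how many bytes UTF-8 needs for code points inside the BMP. The sketch below uses only Python's built-in encoder; the function name `utf8_lengths` and the sample values are chosen here for illustration. It suggests that a BMP code point has at most 16 bits of value but can still take 3 bytes once encoded:

```python
def utf8_lengths(code_points):
    """Map each code point to the length in bytes of its UTF-8 encoding."""
    return {cp: len(chr(cp).encode("utf-8")) for cp in code_points}


# Boundary values of the 1-, 2- and 3-byte ranges inside the BMP.
BMP_SAMPLES = [0x7F, 0x80, 0x7FF, 0x800, 0xFFFF]

if __name__ == "__main__":
    for cp, n in utf8_lengths(BMP_SAMPLES).items():
        print(f"U+{cp:04X} -> {n} byte(s)")
```

Every code point from U+0800 to U+FFFF falls in the 3-byte range, so a column limited to the BMP still has to budget 3 bytes (24 bits) per character when the data is stored as UTF-8.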
Re: Why UTF8 need 24bit in MySQL?
On 6/7/2010 9:57 AM, Ryan Chan wrote:
> http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
>
> Since MySQL only supports the BMP, isn't 16 bits all that is actually needed?

I imagine they were thinking they'd extend the support to full Unicode in the future and didn't want you to have to dump and reload your databases when that happened.

The Unicode consortium has stated that Unicode will never require more than 21 bits per character[*], and 24 bits is the next even multiple of 8 up from that.

[*] Why 21? Because that's the maximum number of bits you can express in 4 bytes with UTF-8 encoding. If Unicode were allowed to use all 2^31 code points as originally envisioned, it would require up to 6 bytes per character in UTF-8 encoding. This promise makes UTF-8 code easier to write and easier to future-proof without bad performance penalties.
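The footnote's arithmetic can be checked directly. A minimal sketch, assuming the standard UTF-8 layout (a lead byte with n one-bits and a zero for an n-byte sequence, then continuation bytes that each carry 6 payload bits); `payload_bits` is a name invented for this example:

```python
def payload_bits(n_bytes: int) -> int:
    """Code point bits that fit in a UTF-8 sequence of n_bytes bytes.

    Covers the 1..6 byte forms of the original UTF-8 design; current
    UTF-8 stops at 4 bytes.
    """
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    if not 2 <= n_bytes <= 6:
        raise ValueError("UTF-8 sequences are 1 to 6 bytes long")
    lead_payload = 8 - (n_bytes + 1)      # lead byte: n ones, one zero, rest payload
    return lead_payload + 6 * (n_bytes - 1)  # each continuation byte: 10xxxxxx


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} byte(s): {payload_bits(n)} bits")
    # prints 7, 11, 16, 21, 26, 31 bits for 1..6 bytes
```

Four bytes give 21 bits, which covers everything up to U+10FFFF, and the 6-byte form gives 31 bits. Three bytes give 16 bits, which matches the BMP.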
Re: Why UTF8 need 24bit in MySQL?
On Jun 7, 2010, at 11:44 AM, Warren Young wrote:
> On 6/7/2010 9:57 AM, Ryan Chan wrote:
>> http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
>>
>> Since MySQL only supports the BMP, isn't 16 bits all that is actually needed?
>
> I imagine they were thinking they'd extend the support to full Unicode
> in the future and didn't want you to have to dump and reload your
> databases when that happened.
>
> The Unicode consortium has stated that Unicode will never require more
> than 21 bits per character[*], and 24 bits is the next even multiple of
> 8 up from that.
>
> [*] Why 21? Because that's the maximum number of bits you can express
> in 4 bytes with UTF-8 encoding. If Unicode were allowed to use all 2^31
> code points as originally envisioned, it would require up to 6 bytes
> per character in UTF-8 encoding. This promise makes UTF-8 code easier
> to write and easier to future-proof without bad performance penalties.

Supplementary Unicode characters (4-byte) are supported as of MySQL 5.5.3:

http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html
http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html

--
Paul DuBois
Oracle Corporation / MySQL Documentation Team
Madison, Wisconsin, USA
www.mysql.com
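For anyone deciding whether existing data needs the 4-byte support mentioned above (the `utf8mb4` character set in MySQL 5.5.3 and later), a quick client-side check is to look for code points above U+FFFF. This is only a sketch: `supplementary_chars` and `fits_in_three_byte_utf8` are hypothetical helper names, and the check runs on Python strings, not on the database itself.

```python
def supplementary_chars(text: str) -> list:
    """Return the characters in text that lie outside the BMP (above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_in_three_byte_utf8(text: str) -> bool:
    """True if every character encodes to at most 3 bytes in UTF-8."""
    return not supplementary_chars(text)


if __name__ == "__main__":
    for sample in ("naïve café 日本語", "smile \U0001F600"):
        extras = [f"U+{ord(ch):04X}" for ch in supplementary_chars(sample)]
        print(repr(sample), fits_in_three_byte_utf8(sample), extras)
```

Text that passes this check contains only BMP characters and so needs at most 3 bytes per character; anything it flags would need the 4-byte form.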