Re: Why UTF8 need 24bit in MySQL?

2010-06-08 Thread Ryan Chan
Hi,


On Tue, Jun 8, 2010 at 12:44 AM, Warren Young war...@etr-usa.com wrote:
> The Unicode consortium has stated that Unicode will
> never require more than 21 bits per character[*], and 24 bits is the next
> even multiple of 8 up from that.

Maybe off topic, but just curious: if 3 bytes are enough for all
Unicode code points, then what is the use of 4-byte UTF-8?

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org



Why UTF8 need 24bit in MySQL?

2010-06-07 Thread Ryan Chan
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html

Since MySQL only supports the BMP, aren't 16 bits all that is actually needed?




Re: Why UTF8 need 24bit in MySQL?

2010-06-07 Thread Warren Young

On 6/7/2010 9:57 AM, Ryan Chan wrote:

> http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
>
> Since MySQL only supports the BMP, aren't 16 bits all that is actually needed?


I imagine they were thinking they'd extend the support to full Unicode 
in the future and didn't want you to have to dump and reload your 
databases when that happened.  The Unicode consortium has stated that 
Unicode will never require more than 21 bits per character[*], and 24 
bits is the next even multiple of 8 up from that.


[*] Why 21?  Because that's the maximum number of bits you can express 
in 4 bytes with UTF-8 encoding (3 payload bits in the lead byte plus 6 
in each of the 3 continuation bytes).  If Unicode were allowed to use 
all 2^31 code points as originally envisioned, it would require up to 6 
bytes per character in UTF-8 encoding.  This promise makes UTF-8 code 
easier to write and easier to future-proof without bad performance 
penalties.
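
A rough Python sketch of the arithmetic in the footnote above, assuming the original 1-to-6-byte UTF-8 layout. The function name utf8_payload_bits is made up for illustration and is not part of any library.

```python
def utf8_payload_bits(n_bytes: int) -> int:
    """Code point bits carried by an n-byte UTF-8 sequence (original 1-6 byte layout)."""
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    if not 2 <= n_bytes <= 6:
        raise ValueError("UTF-8 sequences were defined for 1 to 6 bytes only")
    # Lead byte: n_bytes one-bits, then a zero, then the remaining payload bits.
    lead_bits = 7 - n_bytes
    # Each continuation byte (10xxxxxx) carries 6 payload bits.
    return lead_bits + 6 * (n_bytes - 1)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "byte(s):", utf8_payload_bits(n), "bits")
```

By this count 3 bytes of UTF-8 carry 16 bits, 4 bytes carry 21 bits, and 6 bytes carry 31 bits. A raw 24-bit integer could hold a 21-bit code point, but 3 bytes of UTF-8 cannot, because some bits in every byte are spent on framing.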





Re: Why UTF8 need 24bit in MySQL?

2010-06-07 Thread Paul DuBois

On Jun 7, 2010, at 11:44 AM, Warren Young wrote:

> On 6/7/2010 9:57 AM, Ryan Chan wrote:
>> http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
>>
>> Since MySQL only supports the BMP, aren't 16 bits all that is actually needed?
>
> I imagine they were thinking they'd extend the support to full Unicode in the 
> future and didn't want you to have to dump and reload your databases when 
> that happened.  The Unicode consortium has stated that Unicode will never 
> require more than 21 bits per character[*], and 24 bits is the next even 
> multiple of 8 up from that.
>
> [*] Why 21?  Because that's the maximum number of bits you can express in 4 
> bytes with UTF-8 encoding.  If Unicode were allowed to use all 2^31 code 
> points as originally envisioned, it would require up to 6 bytes per character 
> in UTF-8 encoding.  This promise makes UTF-8 code easier to write and easier 
> to future-proof without bad performance penalties.


Supplemental Unicode characters (4-byte) are supported as of MySQL 5.5.3:

http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html
http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html
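
To see which characters fall into the 4-byte category, here is a small Python sketch. It only uses Python's own UTF-8 encoder and does not talk to MySQL; the helper names utf8_len and fits_three_byte_utf8 are hypothetical.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes Python's UTF-8 encoder emits for a single character."""
    return len(ch.encode("utf-8"))


def fits_three_byte_utf8(text: str) -> bool:
    """True if every character is in the BMP (code point <= U+FFFF),
    i.e. encodable in at most 3 UTF-8 bytes."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji
    for ch in samples:
        print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s), BMP only: {fits_three_byte_utf8(ch)}")
```

The first three samples are BMP characters and take 1, 2, and 3 bytes; the last one lies outside the BMP and takes 4 bytes, so it is the kind of character a 3-byte-per-character limit cannot store.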

-- 
Paul DuBois
Oracle Corporation / MySQL Documentation Team
Madison, Wisconsin, USA
www.mysql.com

