On Wednesday, June 4, 2014 9:22:54 AM UTC+5:30, Chris Angelico wrote:
> On Wed, Jun 4, 2014 at 1:37 PM, Rustom Mody wrote:
> > And so a pure BMP-supporting implementation may be a reasonable
> > compromise. [As long as no surrogate-pairs are there]
>
> Not if you're working on the internet. There are several critical
> groups of characters that aren't in the BMP, such as:

Of course. But what has the internet to do with micropython?

This is their stated goal:

| Micro Python is a lean and fast implementation of the Python
| programming language (python.org) that is optimised to run on a
| microcontroller.

> 1) Most or all Chinese and Japanese characters

Don't know how you count 'most':

| One possible rationale is the desire to limit the size of the full
| Unicode character set, where CJK characters as represented by discrete
| ideograms may approach or exceed 100,000 (while those required for
| ordinary literacy in any language are probably under 3,000). Version 1
| of Unicode was designed to fit into 16 bits and only 20,940 characters
| (32%) out of the possible 65,536 were reserved for these CJK Unified
| Ideographs. Later Unicode has been extended to 21 bits allowing many
| more CJK characters (75,960 are assigned, with room for more).
|
| From http://en.wikipedia.org/wiki/Han_unification
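
To make the BMP question concrete, here is a minimal Python 3 sketch that checks whether a string stays inside the BMP and how many 16-bit units it would need in UTF-16. The helper names is_bmp_only and utf16_code_units are made up for illustration; nothing here is taken from Micro Python itself.

```python
BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def is_bmp_only(text):
    """Return True if every code point in text fits in 16 bits."""
    return all(ord(ch) <= BMP_MAX for ch in text)


def utf16_code_units(text):
    """Count the 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


samples = {
    # U+4E2D and U+6587 lie in the CJK Unified Ideographs block (inside the BMP).
    "common CJK": "\u4e2d\u6587",
    # U+20000 is the first code point of CJK Extension B (outside the BMP).
    "rare CJK": "\U00020000",
}

for label, s in samples.items():
    print(label, is_bmp_only(s), utf16_code_units(s))
```

The second sample is a single character, yet it needs two UTF-16 code units (a surrogate pair), which is exactly the case a BMP-only implementation would have to reject or mishandle.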