On Wed, Jun 4, 2014 at 1:37 PM, Rustom Mody <rustompm...@gmail.com> wrote:
> 2. My casual/cursory reading of the contents of the SMP-planes
> suggests that the stuff there is things like
> - Egyptian hieroglyphics
> - mahjong characters
> - ancient Greek musical symbols
> - alchemical symbols etc etc.
>
> IOW from pov of a universally acceptable character set this is mostly
> rubbish
>
> And so a pure BMP-supporting implementation may be a reasonable
> compromise. [As long as no surrogate-pairs are there]

Not if you're working on the internet. There are several critical
groups of characters that aren't in the BMP, such as:

1) Many less common Chinese and Japanese characters (the CJK
   extension blocks; the everyday ones are in the BMP)
2) Heaps of emoticons and fancy letters
3) Mathematical symbols

You can't ignore those. You might be able to say "Well, my program
will run slower if you throw these at it", but if you're going down
that route, you probably want the full FSR (PEP 393's flexible string
representation) and the advantages it confers on ASCII and Latin-1
strings.

Binding your program to BMP-only is nearly as dangerous as binding it
to ASCII-only; potentially worse, because you can run an awful lot of
artificial tests without ever remembering to stick in some astral
characters.

It's not rubbish. It's important stuff that you need to deal with.

ChrisA
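
P.S. To make the "stick in some astral characters" point concrete, here is a minimal sketch, assuming Python 3.3 or later (where iterating a str yields whole code points). The helper names astral_chars and utf16_units and the sample strings are made up for illustration; they are not part of any library.

```python
def astral_chars(text):
    """Return the characters of text that lie outside the BMP (above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def utf16_units(text):
    """Count the UTF-16 code units needed for text (astral characters take two)."""
    return len(text.encode("utf-16-le")) // 2


# Hypothetical test inputs, one per "width class" of string.
samples = {
    "ascii": "hello",
    "latin-1": "caf\u00e9",
    "bmp": "\u4e2d\u6587",  # common CJK ideographs, inside the BMP
    # emoji, a mathematical double-struck letter, a CJK Extension B ideograph
    "astral": "x\U0001F600\U0001D54F\U00020000",
}

for name, text in samples.items():
    found = [hex(ord(ch)) for ch in astral_chars(text)]
    print(name, "code points:", len(text),
          "UTF-16 units:", utf16_units(text), "astral:", found)
```

Where the code-point count and the UTF-16 unit count differ, a BMP-only or surrogate-pair implementation would see a different string length than the user does, so a test suite should include at least one string like the "astral" sample.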