In article <515c448c$0$29966$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> On Wed, 03 Apr 2013 09:43:06 -0400, Roy Smith wrote:
>
> [...]
> >> n = max(map(ord, s))
> >> 4 if n > 0xffff else 2 if n > 0xff else 1
> >
> > This has to inspect the entire string, no?
>
> Correct. A more efficient implementation would be:
>
> def char_size(s):
>     for n in map(ord, s):
>         if n > 0xFFFF: return 4
>         if n > 0xFF: return 2
>     return 1
>
>
> > I posted (essentially) this a few days ago:
> >
> >     if all(ord(c) <= 0xffff for c in s):
> >         return "it's all bmp"
> >     else:
> >         return "it's got astral crap in it"
>
>
> It's not "astral crap". People use it, and they'll use it more in the
> future. Just because you don't, doesn't give you leave to make
> disparaging remarks about it.
>
> Honestly, it's really painful to see how history repeats itself:
>
> "Bah humbug, why do we need to support the SMP astral crap? The Unicode
> BMP is more than enough for everybody."

Come on, guys. It was a joke. I'm the guy who was complaining that my
database doesn't support non-BMP, remember?
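For what it's worth, the quoted char_size() returns 2 as soon as it sees a code point above 0xFF, so a string like "\u0100\U0001F600" would come back as 2 rather than 4. Here is a minimal sketch of a version that still stops early on the first astral character but keeps scanning after a merely "wide" one. It assumes Python 3.3 or later, where ord() on each element of a str yields a full code point; the name is_all_bmp is just my label for the all() check quoted above, not anything from a library.

```python
def char_size(s):
    """Return 1, 2 or 4: the widest per-character size needed for s."""
    size = 1
    for c in s:
        n = ord(c)
        if n > 0xFFFF:
            return 4   # astral code point: nothing is wider, so stop here
        if n > 0xFF:
            size = 2   # remember it, but keep looking for an astral one
    return size


def is_all_bmp(s):
    """True if every code point in s lies in the Basic Multilingual Plane."""
    # all() short-circuits on the first code point above 0xFFFF.
    return all(ord(c) <= 0xFFFF for c in s)
```

Both functions still have to walk the whole string when there is nothing astral in it; the early exit only helps when an astral character shows up.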