In article <515be00e$0$29891$c3e8da3$54964...@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:

> On Wed, 03 Apr 2013 18:24:25 +1100, Chris Angelico wrote:
> 
> > On Wed, Apr 3, 2013 at 6:06 PM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> >> On Wed, Apr 3, 2013 at 12:52 AM, Chris Angelico <ros...@gmail.com>
> >> wrote:
> >>> Hmm. I was about to say "Can you just do a quick collections.Counter()
> >>> of the string widths in 3.3, as an easy way of seeing which ones use
> >>> BMP or higher characters", but I can't find a simple way to query a
> >>> string's width. Can't see it as a method of the string object, nor in
> >>> the string or sys modules. It ought to be easy enough at the C level -
> >>> just look up the two bits representing 'kind' - but I've not found it
> >>> exposed to Python. Is there anything?
> >>
> >> 4 if max(map(ord, s)) > 0xffff else 2 if max(map(ord, s)) > 0xff else 1
> > 
> > Yeah, that's iterating over the whole string (twice, if it isn't width
> > 4). 
> 
> Then don't write it as a one-liner :-P
> 
> n = max(map(ord, s))
> 4 if n > 0xffff else 2 if n > 0xff else 1

This has to inspect the entire string, no?  I posted (essentially) this 
a few days ago:

        if all(ord(c) <= 0xffff for c in s):
            return "it's all bmp"
        else:
            return "it's got astral crap in it"

I'm reasonably sure all() is smart enough to stop at the first False 
value.
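
For anyone who wants to check the short-circuit claim rather than take it on faith, here is a small sketch. The helper names (has_astral, counting_ords) and the sample string are made up for illustration, and it assumes Python 3.3 or later, where an astral character is a single code point.

```python
def has_astral(s):
    """Return True if s contains any code point above the BMP (> 0xFFFF)."""
    return any(ord(c) > 0xFFFF for c in s)


def counting_ords(s, counter):
    """Yield ord(c) for each character, recording how many were examined."""
    for c in s:
        counter[0] += 1
        yield ord(c)


# Hypothetical sample: an astral character at index 2, then 1000 BMP characters.
examined = [0]
s = "ab\U0001F600" + "x" * 1000

# all() stops pulling from the generator at the first False value.
all_bmp = all(n <= 0xFFFF for n in counting_ords(s, examined))
print(all_bmp, examined[0])
```

The counter shows how many characters were actually looked at before all() gave up. The early exit only helps when an astral character is present; a string that is entirely BMP still gets scanned to the end.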


> (sys.getsizeof(s) - sys.getsizeof(''))/len(s)
> 
I wouldn't trust getsizeof() to return exactly what you're looking for.
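
A rough way to see the difference is to compare the getsizeof() estimate against the width derived from the code points. This is only a sketch: the function names are invented here, and what getsizeof() reports is an implementation detail. As far as I know, in CPython 3.3 the fixed per-object overhead is not the same for ASCII and non-ASCII strings, so subtracting the size of '' does not always cancel it out.

```python
import sys


def width_by_ord(s):
    """Width (1, 2 or 4) implied by the largest code point in s."""
    n = max(map(ord, s)) if s else 0
    return 4 if n > 0xFFFF else 2 if n > 0xFF else 1


def width_by_getsizeof(s):
    """The estimate quoted above; s must be non-empty."""
    return (sys.getsizeof(s) - sys.getsizeof('')) / len(s)


# One sample per range: ASCII, Latin-1, BMP, astral.
for sample in ("a" * 100, "\xe9" * 100, "\u20ac" * 100, "\U0001F600" * 100):
    print(width_by_ord(sample), width_by_getsizeof(sample))
```

Where the two columns disagree, the getsizeof() figure is picking up overhead that has nothing to do with the per-character width, and the error grows as the string gets shorter.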