Re: Most pythonic way to truncate unicode?

John Machin Thu, 28 May 2009 21:11:32 -0700

John Machin <sjmachin <at> lexicon.net> writes:

> Andrew Fong <FongAndrew <at> gmail.com> writes:


> > Are there any built-in ways to do something like this already? Or do I
> > just have to iterate over the unicode string?
> 
> Converting each character to utf8 and checking the
> total number of bytes so far?
> Ooooh, sloooowwwwww!
> 

Somewhat faster: compare each character against the UTF-8 length
boundaries instead of encoding it:

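# Total UTF-8 length of unicode_string, without encoding anything.
u8len = 0
for u in unicode_string:
    if u <= u'\u007f':      # ASCII: 1 byte
        u8len += 1
    elif u <= u'\u07ff':    # up to U+07FF: 2 bytes
        u8len += 2
    elif u <= u'\uffff':    # rest of the BMP: 3 bytes
        u8len += 3
    else:                   # beyond the BMP: 4 bytes
        u8len += 4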
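
The loop above only counts bytes. To truncate, as the original question
asked, one could stop at the first character that would push the running
total past the limit. What follows is a minimal sketch, not a tested
library routine. The names utf8_char_len and truncate_utf8 are made up
for illustration. It assumes that iterating over the string yields whole
code points (Python 3, or a wide Python 2 build); on a narrow build a
character beyond the BMP arrives as two surrogates and would be counted
as 3 + 3 bytes instead of 4.

```python
def utf8_char_len(ch):
    """Return how many bytes one code point occupies in UTF-8."""
    cp = ord(ch)
    if cp <= 0x7F:
        return 1
    elif cp <= 0x7FF:
        return 2
    elif cp <= 0xFFFF:
        return 3
    return 4


def truncate_utf8(text, max_bytes):
    """Return the longest prefix of text that fits in max_bytes as UTF-8."""
    used = 0
    for i, ch in enumerate(text):
        size = utf8_char_len(ch)
        if used + size > max_bytes:
            # This character would overflow the budget: cut before it.
            return text[:i]
        used += size
    # The whole string fits.
    return text
```

Because the cut always falls between characters, the result never ends
in a partial multi-byte sequence, and the loop stops as soon as the
limit is reached rather than scanning the whole string.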

Cheers,
John

-- 
http://mail.python.org/mailman/listinfo/python-list
