On Wednesday, 16 August 2006 18:12, Abdelrazak Younes wrote:
> Lars Gullik Bjønnes wrote:
>
> > string.length() will be lying to you when you store utf-8 in it.
> 
> Why is that? Because of some trailing \0?

No. utf8 is a multibyte encoding: Some characters use one byte, some two 
and some up to four. The benefit of utf8 is that the ASCII characters 
use the same encoding as in the 7bit ASCII code. string.length() returns 
the number of bytes, so it does not always give the number of characters 
in the string if it is in utf8.
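
To make the mismatch concrete, here is a minimal sketch. The helper name 
count_code_points is made up for illustration, and it assumes the input is 
already valid utf8. It counts every byte that is not a continuation byte 
(bit pattern 10xxxxxx):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of any library: counts UTF-8 code points
// by skipping continuation bytes (those of the form 10xxxxxx).
// Assumes the input is valid UTF-8; no validation is performed.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return n;
}
```

For std::string s = "Bj\xC3\xB8nnes" (the name "Bjønnes" in utf8), 
s.length() is 8 while count_code_points(s) is 7, because "ø" takes two 
bytes. Note that this counts code points, which is still not always the 
same as what a user would call a character (think of combining accents).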

> If the different parts all talk the same language why would there be any 
> confusion? I mean, if it is just a matter of adding plus or minus one, 
> that's not a big deal. And I guess we could still of course subclass 
> basic_string and re-implement length(), couldn't we?

That would not be so easy, because we would need to parse the utf8 encoded 
string. Better leave that to some library.
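
To give an idea of what "parsing" means here, a rough sketch of a length 
function that also checks the byte structure. The names sequence_length 
and checked_length are invented for this example. It only checks the 
structure of lead and continuation bytes; it does not reject overlong 
encodings, surrogates or values above U+10FFFF, which a real decoder 
would have to do as well:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes announced by a lead byte,
// or 0 if the byte cannot start a sequence.
std::size_t sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // continuation or invalid byte
}

// Hypothetical helper: stores the number of code points in `count` and
// returns true, or returns false if the byte structure is malformed.
// Structural check only; overlong forms and surrogates are not rejected.
bool checked_length(const std::string& s, std::size_t& count) {
    count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t len =
            sequence_length(static_cast<unsigned char>(s[i]));
        if (len == 0 || len > s.size() - i) {
            return false;  // bad lead byte or truncated sequence
        }
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;  // expected a continuation byte
            }
        }
        i += len;
        ++count;
    }
    return true;
}
```

Even this incomplete version has several failure cases to handle, and it 
still says nothing about how the rest of the string interface (indexing, 
substr, find) should behave.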


Georg
