On Wednesday, 16 August 2006 18:12, Abdelrazak Younes wrote:
> Lars Gullik Bjønnes wrote:
>
> > string.length() will be lying to you when you store utf-8 in it.
>
> Why is that? Because of some trailing \0?

No. UTF-8 is a multibyte encoding: some characters use one byte, some two, and some more (up to four). The benefit of UTF-8 is that ASCII characters keep the same encoding as in 7-bit ASCII. string.length() counts bytes, so it does not always give the number of characters in a string that holds UTF-8.

> If the different parts all talk the same language why would there be any
> confusion? I mean, if it is just a matter of adding plus or minus one,
> that's not a big deal. And I guess we could still of course subclass
> basic_string and re-implement length(), couldn't we?

That would not be so easy, because we would need to parse the UTF-8 encoded string. Better to leave that to some library.

Georg
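
P.S. To illustrate the point about string.length() counting bytes, here is a minimal sketch. The helper name utf8_code_points is hypothetical (it comes from no library), and it assumes the input is already valid UTF-8. It counts code points only, by skipping continuation bytes of the form 10xxxxxx. It does no validation and does not handle combining characters, which is part of why a real library is the better choice.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only.
// Counts UTF-8 code points by counting every byte that is NOT a
// continuation byte (continuation bytes have the bit pattern 10xxxxxx).
// Assumes the input is well-formed UTF-8; no validation is done.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "Bj\xC3\xB8nnes" (the name "Bjønnes", where ø is encoded as the two bytes 0xC3 0xB8), length() returns 8 while utf8_code_points() returns 7.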
