Lars Gullik Bjønnes wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Georg Baum wrote:
| > Lars Gullik Bjønnes wrote:
| >
| >> Conversion between the different Unicode encodings is pretty cheap.
| > Yes, but what I am more concerned about is lots of ucs4_to_utf8 calls
| > (or vice versa) in the code. That just makes it a bit less readable.
| >
| >> | Since the po
| >> | files will eventually be in utf8 it seems natural to use utf8 for
| >> | _(), too.
| >>
| >> Yes. However, so that we can ignore the encoding of the po files, I am
| >> going to use bind_textdomain_codeset so that we always get utf-8.
| > Good.
| > Here comes the next bit: I discovered that the result of
| > std::vector<char> ucs4_to_utf8(boost::uint32_t c)
| > was never used as a vector. I changed it to std::string, and that
| > simplifies the code. In particular it removes manual fiddling with the
| > terminating '\0', which we should not do IMHO.
|
| Having had a closer look at "unicode.C", it seems that all uses of
| std::vector could be replaced by std::basic_string. We just have to
| replace "push_back" with "+="...
|
| Is there any reason why you chose std::vector, Lars?
Yes. I did not want to have any confusion.
string.length() will be lying to you when you store utf-8 in it.
Why is that? Because of some trailing \0?
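The likely reading of the remark about length() is not the trailing '\0' but multi-byte characters: a std::string holding UTF-8 reports its size in bytes, so the number differs from the character count as soon as a non-ASCII character appears. A minimal sketch, assuming two hypothetical helpers, encode_utf8 and count_code_points (neither is the actual ucs4_to_utf8 from unicode.C):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not the function in unicode.C: encode one UCS-4
// code point as UTF-8 and return it as a std::string, appending with +=.
// Surrogate values are not rejected in this sketch.
std::string encode_utf8(std::uint32_t c)
{
    assert(c <= 0x10FFFF); // precondition: inside the Unicode range
    std::string out;
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    return out;
}

// Hypothetical helper: count code points in well-formed UTF-8 by
// skipping continuation bytes (bit pattern 10xxxxxx).
std::size_t count_code_points(std::string const & utf8)
{
    std::size_t n = 0;
    for (std::string::const_iterator it = utf8.begin(); it != utf8.end(); ++it)
        if ((static_cast<unsigned char>(*it) & 0xC0) != 0x80)
            ++n;
    return n;
}
```

With these, the string built from "Bj", encode_utf8(0xF8) and "nnes" has a length() of 8 while count_code_points gives 7, because U+00F8 takes two bytes in UTF-8.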
I also had a look at the (qt4) frontend side and the code there will
also benefit from a switch to basic_string...
If the different parts all talk the same language, why would there be any
confusion? I mean, if it is just a matter of adding plus or minus one,
that's not a big deal. And I guess we could of course still subclass
basic_string and re-implement length(), couldn't we?
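One caveat on the subclassing idea: std::basic_string::length() is not virtual, so a length() defined in a derived class only hides the base version. Any code that receives the object as a plain std::string still gets the byte count. A small sketch, assuming a hypothetical Utf8String class and a hypothetical length_via_base function (neither exists in the code base):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical subclass, for illustration only.
class Utf8String : public std::string {
public:
    explicit Utf8String(std::string const & s) : std::string(s) {}

    // Hides (does not override) std::string::length():
    // counts code points by skipping continuation bytes (10xxxxxx).
    std::size_t length() const
    {
        std::size_t n = 0;
        for (const_iterator it = begin(); it != end(); ++it)
            if ((static_cast<unsigned char>(*it) & 0xC0) != 0x80)
                ++n;
        return n;
    }
};

// Stands in for any existing code that takes a std::string.
// The call below always resolves to std::string::length().
std::size_t length_via_base(std::string const & s)
{
    return s.length();
}
```

For a Utf8String holding the UTF-8 bytes of "Bjønnes", length() returns 7 when called on the object directly, but length_via_base returns 8. That mismatch is the kind of confusion a separate free function for counting characters would avoid.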
I guess you confused me :-(
Abdel.