On Wednesday, 16 August 2006 18:41, Abdelrazak Younes wrote:
> Hum... I am not sure I follow everything, but let me summarize what I
> understand from the current code. The std::vectors I am talking about are:
>
> * vector<char>: could be replaced by std::basic_string<char>
> * vector<unsigned char>: that is ucs2, right? That could be replaced by
>   std::basic_string<unsigned char>
> * vector<boost::uint32_t>: I guess that is ucs4, and that could be
>   replaced by std::basic_string<boost::uint32_t>

aka lyx::docstring

> Internally we should just use one of those three types.

IMO only the last one. ucs2 is only for talking to qt, but that can easily
be wrapped in fromqstr/toqstr, so we don't really need a ucs2 string type.

> The conversion
> to this complicated utf8 encoding should happen on input/output only.
> Handling a multi-byte encoding internally is just a recipe for a buggy
> future IMHO.
>
> So what do I not get right here?

multibyte != variable-byte. Multibyte is not bad per se. Both ucs2 and ucs4
use a fixed number of bytes for one character (2 and 4, respectively,
surprise, surprise!). The problem is a variable-byte encoding such as utf8.

Georg
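
To make the fixed-width vs. variable-width point concrete, here is a minimal sketch. It is not the LyX converter: utf8_char_count, utf8_to_ucs4 and demo are hypothetical names, std::u32string (C++11) stands in for a basic_string of 32-bit code units such as lyx::docstring, and the decoder assumes well-formed input and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count characters (code points) in a UTF-8 byte string.
// Continuation bytes have the bit pattern 10xxxxxx and are not counted.
std::size_t utf8_char_count(std::string const & s)
{
	std::size_t n = 0;
	for (unsigned char c : s)
		if ((c & 0xC0) != 0x80)
			++n;
	return n;
}

// Decode UTF-8 into a fixed-width 32-bit string (one element per character).
// Assumption: the input is well-formed; invalid sequences are not detected.
std::u32string utf8_to_ucs4(std::string const & s)
{
	std::u32string out;
	std::size_t i = 0;
	while (i < s.size()) {
		unsigned char const c = s[i];
		char32_t cp;
		std::size_t len;
		// The lead byte tells us how many bytes this character occupies.
		if (c < 0x80)      { cp = c;        len = 1; }
		else if (c < 0xE0) { cp = c & 0x1F; len = 2; }
		else if (c < 0xF0) { cp = c & 0x0F; len = 3; }
		else               { cp = c & 0x07; len = 4; }
		// Each continuation byte contributes its low 6 bits.
		for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
			cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
		out.push_back(cp);
		i += len;
	}
	return out;
}

void demo()
{
	// 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes) in UTF-8.
	std::string const utf8 = "a\xC3\xA9\xE2\x82\xAC";
	assert(utf8.size() == 6);            // bytes, not characters
	assert(utf8_char_count(utf8) == 3);  // needs a scan over all bytes

	std::u32string const ucs4 = utf8_to_ucs4(utf8);
	assert(ucs4.size() == 3);            // size() is the character count
	assert(ucs4[2] == 0x20AC);           // operator[] is the i-th character
}
```

With the utf8 string, size() and operator[] work on bytes, so finding the n-th character means scanning from the start. With the 32-bit string they work on characters directly, which is why the variable-byte conversion is best kept to input/output.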
