Georg Baum wrote:
Am Mittwoch, 16. August 2006 18:41 schrieb Abdelrazak Younes:
Hum... I am not sure I follow everything, but let me summarize what I
understand from the current code. The std::vectors I am talking about are:
* vector<char>: could be replaced by std::basic_string<char>
* vector<unsigned char>: that is ucs2, right? That could be replaced by
std::basic_string<unsigned char>
* vector<boost::uint32_t>: I guess that is ucs4, and that could be
replaced by std::basic_string<boost::uint32_t>,
aka lyx::docstring
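A minimal sketch of what a fixed-width ucs4 string buys us, using standard C++ rather than LyX's own typedef. It assumes std::u32string (std::basic_string<char32_t>) as a stand-in for lyx::docstring: a basic_string over a plain integer type such as boost::uint32_t relies on a std::char_traits specialization the standard does not provide, so some newer standard libraries reject it. The names ucs4_string and nth_char are hypothetical, for illustration only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical alias: a standard-conforming ucs4 string type.
// Not LyX's actual definition of lyx::docstring.
using ucs4_string = std::u32string;

// With a fixed-width encoding, size() is the number of characters and
// operator[] reaches the n-th character directly, in constant time.
char32_t nth_char(ucs4_string const & s, std::size_t n)
{
    return s[n];
}
```

For example, U"G\u00e4rten" has size() 6, and nth_char(s, 1) is the umlaut, with no decoding needed.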
So, IIUC, we could switch to basic_string for char, ucs2 and ucs4
without any problem. The utf8 case is an entirely different problem.
Internally we should just use one of those three types.
IMO only the last one. ucs2 is only for talking to qt, but that can easily
be wrapped in fromqstr/toqstr, so we don't really need a ucs2 string type.
Yes.
The conversion
to this complicated utf8 encoding should happen on input/output only.
Handling a multi-byte encoding internally is just a recipe for a buggy
future IMHO.
So what am I not getting right here?
multibyte != variable-byte. Multibyte is not bad per se.
Yes, that's what I meant.
Both ucs2 and ucs4
use a fixed number of bytes for one character (2 and 4, respectively,
surprise, surprise!). The problem is a variable-byte encoding such as
utf8.
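To make the fixed-width vs. variable-width distinction concrete, here is a sketch of a minimal UTF-8 encoder (an illustration, not LyX code; the function name to_utf8 is hypothetical). Every input character occupies exactly one char32_t, but its UTF-8 form takes 1 to 4 bytes, so finding the n-th character in a UTF-8 buffer requires scanning from the start. It assumes valid code points and does not reject surrogates or values above U+10FFFF.

```cpp
#include <string>

// Hypothetical helper: encode a ucs4 string as UTF-8.
// Assumes valid Unicode code points; no error checking.
std::string to_utf8(std::u32string const & in)
{
    std::string out;
    for (char32_t c : in) {
        if (c < 0x80) {                       // 1 byte: ASCII
            out += static_cast<char>(c);
        } else if (c < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                              // 4 bytes
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

The four characters of U"a\u00e9\u20ac\U0001F600" take 16 bytes as ucs4 but 1 + 2 + 3 + 4 = 10 bytes as UTF-8, with each character a different length.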
Yes, I understood that far; sorry for the mix-up. IMHO, the only code
that should deal with the utf8 encoding is the code that reads or
writes a file.
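A sketch of the "utf8 only at the I/O boundary" idea: the file is read as raw bytes and decoded once into a fixed-width ucs4 string, and nothing downstream ever sees UTF-8. This is an illustration under stated assumptions, not LyX's actual file-reading code: from_utf8 and read_utf8_file are hypothetical names, std::u32string stands in for lyx::docstring, and the decoder assumes well-formed input (a real one must reject malformed sequences).

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical decoder: UTF-8 bytes -> ucs4. Assumes well-formed input;
// a truncated trailing sequence is decoded from whatever bytes remain.
std::u32string from_utf8(std::string const & in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t c;
        int extra;  // number of continuation bytes after the lead byte
        if (b < 0x80)      { c = b;        extra = 0; }
        else if (b < 0xE0) { c = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { c = b & 0x0F; extra = 2; }
        else               { c = b & 0x07; extra = 3; }
        for (int k = 1; k <= extra && i + k < in.size(); ++k)
            c = (c << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        out += c;
        i += extra + 1;
    }
    return out;
}

// The only function that knows the file is UTF-8: read all bytes, decode once.
std::u32string read_utf8_file(std::string const & filename)
{
    std::ifstream ifs(filename, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(ifs)),
                      std::istreambuf_iterator<char>());
    return from_utf8(bytes);
}
```

The writing side would mirror this: keep the document as ucs4 in memory and encode to UTF-8 only when saving.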
Abdel.