Georg Baum wrote:
On Wednesday, 16 August 2006 18:41, Abdelrazak Younes wrote:
Hmm... I am not sure I follow everything, but let me summarize what I understand from the current code. The std::vectors I am talking about are:

* vector<char>: could be replaced by std::basic_string<char>
* vector<unsigned char>: that is ucs2, right? That could be replaced by std::basic_string<unsigned char>
* vector<boost::uint32_t>: I guess that is ucs4, and that could be replaced by std::basic_string<boost::uint32_t>

aka lyx::docstring

So, IIUC, we could switch to basic_string for char, ucs2 and ucs4 without any problem. The utf8 case is an entirely different problem.



Internally we should just use one of those three types.

IMO only the last one. ucs2 is only for talking to qt, but that can easily be wrapped in fromqstr/toqstr, so we don't really need a ucs2 string type.

Yes.


The conversion to this complicated utf8 encoding should happen on input/output only. Handling a variable-width encoding internally is just a recipe for a buggy future IMHO.

So what am I not getting right here?

multibyte != variable-byte. Multibyte is not bad per se.

Yes, that's what I meant.

Both ucs2 and ucs4 use a fixed number of bytes for one character (2 and 4, respectively, surprise, surprise!). The problem is a variable-byte encoding such as utf8.
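
To make the fixed-width versus variable-width distinction concrete, here is a minimal sketch. It is not LyX code: the helper names utf8_sequence_length and utf8_char_count are hypothetical, and std::u32string (C++11) stands in for a basic_string of 32-bit code units.

```cpp
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence announced by `lead`.
// Returns 0 for a continuation byte (10xxxxxx) or a byte that matches
// no lead pattern. Overlong or out-of-range forms are not validated.
inline std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Counting characters in UTF-8 needs a scan over all bytes: every byte
// that is not a continuation byte starts a new character.
inline std::size_t utf8_char_count(std::string const & s)
{
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}
```

With the utf8 bytes of "héllo" in a std::string, size() reports 6 bytes for 5 characters, and operator[] may land in the middle of a character. With the same text in a std::u32string, size() is the character count and operator[] always returns a whole character.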

Yes, I understood that far, sorry for the misunderstanding ("quiproquo"). IMHO, the only code that should refer to the utf8 encoding is code that handles writing or reading a file.
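
As a rough illustration of keeping utf8 at the file boundary only, the sketch below converts on input and output and leaves everything in between fixed-width. It is an assumption-laden sketch and not the actual LyX conversion code: the names utf8_to_ucs4 and ucs4_to_utf8 are hypothetical here, std::u32string stands in for the ucs4 string type, and malformed input is simply mapped to U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 bytes (e.g. as read from a file) into fixed-width UCS-4.
// Malformed or truncated sequences become U+FFFD. Overlong forms and
// surrogates are not rejected in this sketch.
inline std::u32string utf8_to_ucs4(std::string const & in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char const lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char const c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                ok = false;                   // not a continuation byte
            else
                cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        if (ok) { out.push_back(cp); i += len; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    return out;
}

// Encode UCS-4 as UTF-8 bytes (e.g. just before writing a file).
// Assumes every code point is at most 0x10FFFF.
inline std::string ucs4_to_utf8(std::u32string const & in)
{
    std::string out;
    for (char32_t cp : in) {
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The idea is that a file reader would call utf8_to_ucs4 once on the bytes it reads, and a file writer would call ucs4_to_utf8 once just before writing, so no other code has to know about utf8.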

Abdel.
