Georg Baum wrote:
On Sunday, 29 October 2006 18:26, Abdelrazak Younes wrote:
Yes, but I don't use that in my new utf8_to_ucs4() in docstring.[Ch], and I propose that we do the same for conversion.

We can't, because we don't know the length of the resulting string. For utf8 -> ucs4 we know that the result is at most as long as the input (and in many cases only a few bytes shorter, at least for European languages). For ucs4 -> utf8 we would have to allocate a result string 6 times the input length if we want to be able to convert everything, even though the average length is close to the input length. That is probably too much to be efficient.

I think it will be very efficient speed-wise. Memory-wise, the length is indeed multiplied by 6, but the actual increase in data size is less than that. If N is the number of Unicode characters, the ucs4 version occupies exactly 4xN bytes and the utf8 version at most 6xN bytes. So it is only a 50% increase in the worst case. Resizing a string to a smaller size is cheap.
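
To make the proposal concrete, here is a minimal sketch of what I have in mind. It is not the code in docstring.[Ch]; the name ucs4_to_utf8_sketch and the use of std::u32string are assumptions made for illustration only. It reserves the worst case of 6 bytes per character up front, encodes in place, and shrinks once at the end. It assumes every input value is below 0x80000000, the limit of the original 1-to-6 byte UTF-8 scheme.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not the function from docstring.[Ch].
// Assumes every input value is below 0x80000000 (original 6-byte UTF-8 range).
std::string ucs4_to_utf8_sketch(std::u32string const & in)
{
    // Lead-byte markers indexed by sequence length (2..6 bytes).
    static unsigned char const lead[] = {0, 0, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};

    // Worst case: 6 bytes per character, allocated once.
    std::string out(in.size() * 6, '\0');
    std::size_t n = 0; // number of bytes actually written

    for (char32_t c : in) {
        if (c < 0x80) {
            // ASCII maps to a single byte.
            out[n++] = static_cast<char>(c);
            continue;
        }
        // Number of bytes needed for this character.
        int const len = c < 0x800 ? 2
                      : c < 0x10000 ? 3
                      : c < 0x200000 ? 4
                      : c < 0x4000000 ? 5 : 6;
        // Fill the continuation bytes from the end, 6 bits at a time.
        for (int i = len - 1; i > 0; --i) {
            out[n + i] = static_cast<char>(0x80 | (c & 0x3F));
            c >>= 6;
        }
        // The remaining high bits go into the lead byte.
        out[n] = static_cast<char>(lead[len] | c);
        n += len;
    }

    // Shrinking only adjusts the size; no reallocation is needed.
    out.resize(n);
    return out;
}
```

For a text of N characters this allocates 6xN bytes once, compared to the 4xN bytes the ucs4 input already occupies, and the final resize() to a smaller size does not copy anything.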

Abdel.
