Georg Baum wrote:
On Sunday, 29 October 2006 18:26, Abdelrazak Younes wrote:
Yes, but I don't use that in my new utf8_to_ucs4() in docstring.[Ch], and
I propose that we do the same for conversion.
We can't, because we don't know the length of the resulting string. For
utf8 -> ucs4 we know that the result is at most as long as the input (and
in many cases only a few bytes shorter, at least for European languages).
For ucs4 -> utf8, if we want to be able to convert everything, we would
have to use a result string 6 times as long as the input, even though the
average result length is close to the input length. That is probably too
much to be efficient.
Speed-wise, I think it will be very efficient. Memory-wise, the length is
indeed multiplied by 6, but the actual data size increase is less than
that. If N is the number of Unicode characters, the ucs4 version occupies
exactly 4xN bytes and the utf8 version at most 6xN bytes, so it is only a
50% increase. Shrinking a string to a smaller size afterwards is cheap.
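
A rough sketch of the idea above (allocate the 6xN worst case, encode in
place, then shrink) might look like the following. The name
ucs4_to_utf8_sketch is hypothetical and is not the function in
docstring.[Ch]; the sketch assumes a C++11 std::u32string as the ucs4
input, uses the original 1-6 byte utf8 scheme discussed here, and does no
validation of the input values.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only (not the LyX implementation).
// Encodes UCS-4 code points as UTF-8 using the 1-6 byte scheme.
std::string ucs4_to_utf8_sketch(const std::u32string& in)
{
    // Worst case: every character needs 6 bytes, so reserve 6xN up front.
    std::string out(in.size() * 6, '\0');
    std::size_t pos = 0;

    for (char32_t c : in) {
        unsigned long cp = c;

        // Number of bytes needed for this code point.
        int len;
        if (cp < 0x80) len = 1;
        else if (cp < 0x800) len = 2;
        else if (cp < 0x10000) len = 3;
        else if (cp < 0x200000) len = 4;
        else if (cp < 0x4000000) len = 5;
        else len = 6;

        if (len == 1) {
            out[pos++] = static_cast<char>(cp);
            continue;
        }

        // Continuation bytes (10xxxxxx), filled from the last one backwards.
        for (int i = len - 1; i > 0; --i) {
            out[pos + i] = static_cast<char>(0x80 | (cp & 0x3F));
            cp >>= 6;
        }
        // Lead byte: 'len' high bits set, then the remaining payload bits.
        out[pos] = static_cast<char>(((0xFF << (8 - len)) & 0xFF) | cp);
        pos += len;
    }

    // Shrink to the number of bytes actually written.
    out.resize(pos);
    return out;
}
```

With this layout the temporary buffer is 6xN bytes against 4xN bytes for
the ucs4 input, and the final resize() only lowers the string's length.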
Abdel.