Re: [Patch] optimize utf8_to_ucs4

Abdelrazak Younes Sun, 29 Oct 2006 11:50:39 -0800

Georg Baum wrote:

On Sunday, 29 October 2006 18:26, Abdelrazak Younes wrote:
Yes, but I don't use that in my new utf8_to_ucs4() in docstring.[Ch] and I propose that we do the same for conversion.
We can't, because we don't know the length of the resulting string. For utf8 -> ucs4 we know that the result is at most as long as the input (and in many cases only a few bytes shorter, at least for European languages). For ucs4 -> utf8 we would have to use a result string with a length of 6 times the input length, with the average length close to the input length, if we want to be able to convert everything. That is probably too much to be efficient.

I think speed-wise it will be very efficient. Memory-wise, well, the length is indeed multiplied by 6, but the actual data size increase is less than that. If N is the number of Unicode characters, the ucs4 version would occupy exactly 4xN bytes and the utf8 version would contain at most 6xN bytes. So it is only a 50% increase. Resizing a string to a smaller size is cheap.
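
A minimal sketch of the preallocate-then-shrink idea, under stated assumptions: ucs4_to_utf8_sketch is a hypothetical name (not the function in docstring.[Ch]), the input is a std::vector<char32_t> with every value at most 0x7FFFFFFF, and the encoding is the original 1-to-6-byte UTF-8 scheme, which is where the factor of 6 comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration only; not the LyX implementation.
// Assumes every input value is <= 0x7FFFFFFF (the range of 6-byte UTF-8).
std::string ucs4_to_utf8_sketch(std::vector<char32_t> const & in)
{
    // Worst case is 6 bytes per code point, so allocate once up front.
    std::string out(in.size() * 6, '\0');
    std::size_t pos = 0;

    // Lead-byte prefixes indexed by sequence length (2..6 bytes).
    static unsigned char const lead[] = {0, 0, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};

    for (char32_t c : in) {
        // Number of bytes this code point needs.
        int n;
        if (c < 0x80) n = 1;
        else if (c < 0x800) n = 2;
        else if (c < 0x10000) n = 3;
        else if (c < 0x200000) n = 4;
        else if (c < 0x4000000) n = 5;
        else n = 6;

        if (n == 1) {
            out[pos++] = static_cast<char>(c);
            continue;
        }
        // Lead byte carries the top payload bits.
        out[pos++] = static_cast<char>(lead[n] | (c >> (6 * (n - 1))));
        // Each continuation byte carries the next 6 bits.
        for (int i = n - 2; i >= 0; --i)
            out[pos++] = static_cast<char>(0x80 | ((c >> (6 * i)) & 0x3F));
    }

    // We never write past the worst-case buffer.
    assert(pos <= out.size());
    // Shrinking only adjusts the length; no reallocation is required.
    out.resize(pos);
    return out;
}
```

The buffer is sized once and never grows inside the loop; the only cost beyond the conversion itself is the temporary over-allocation and the final resize().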


Abdel.
