Abdelrazak Younes wrote:
> Georg Baum wrote:
>> That would be conceptually wrong. If you convert a given UCS4 character
>> into an eightbit encoding, you never know whether the result will be only
>> one character, not even in fixed-width encodings. For example, the
>> single-byte fixed-width encoding iso_8859-7 has two modifier letters,
>> REVERSED COMMA and APOSTROPHE. Therefore a single UCS4 character can
>> result in two iso_8859-7 characters.
>
> If that is true, then we have a problem in Encoding::init(), because we
> only test the first 256 characters for fixed-width encodings.
Right. I overlooked that case.
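For illustration, a probe along the following lines lists the code points
that need more than one byte in a given eightbit encoding. This is only a
sketch: it assumes a glibc-style iconv() prototype and the encoding names
"UCS-4BE" and "ISO-8859-7", and which code points (if any) actually expand
depends on the iconv implementation.

// Sketch only: report how many bytes a single UCS4 code point occupies
// after conversion to an eightbit encoding.
#include <iconv.h>
#include <cerrno>
#include <cstdio>

// Returns the number of output bytes, or -1 if the code point cannot be
// converted at all.
static int bytes_for_ucs4(iconv_t cd, unsigned int cp)
{
    // Feed the code point as 4 big-endian bytes (matching "UCS-4BE").
    char in[4] = {
        static_cast<char>((cp >> 24) & 0xff),
        static_cast<char>((cp >> 16) & 0xff),
        static_cast<char>((cp >> 8) & 0xff),
        static_cast<char>(cp & 0xff)
    };
    char out[16];
    char * inbuf = in;
    size_t inbytes = sizeof(in);
    char * outbuf = out;
    size_t outbytes = sizeof(out);

    if (iconv(cd, &inbuf, &inbytes, &outbuf, &outbytes) == (size_t)(-1)) {
        // Not representable (or other error): reset the conversion state.
        iconv(cd, 0, 0, 0, 0);
        return -1;
    }
    return static_cast<int>(sizeof(out) - outbytes);
}

int main()
{
    iconv_t cd = iconv_open("ISO-8859-7", "UCS-4BE");
    if (cd == (iconv_t)(-1)) {
        std::perror("iconv_open");
        return 1;
    }
    // Greek Extended contains precomposed letters that an iconv
    // implementation might render as base letter + modifier letter,
    // i.e. as more than one eightbit character.
    for (unsigned int cp = 0x1f00; cp <= 0x1fff; ++cp) {
        int const n = bytes_for_ucs4(cd, cp);
        if (n > 1)
            std::printf("U+%04X -> %d bytes\n", cp, n);
    }
    iconv_close(cd);
    return 0;
}

The same kind of probe over the whole UCS4 range (instead of only the first
256 code points) would also show whether the fixed-width assumption really
holds for a given encoding.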
>> I believe that I once read about an encoding that needs more than 4 bytes
>> for one code point, but am not 100% sure. Since it does not cost anything
>> to support such a beast it should be supported IMHO.
>
> OK.
Note that the test in my patch might be incorrect, so supporting such a
beast does not quite come for free.
if (bytes >= 0)
    out.resize(bytes);
else if (errno == E2BIG)
    // Use the unoptimized version.
    // This only happens for exotic encodings.
    out = ucs4_to_eightbit(&ucs4, 1, encoding);
else
    out.clear();
should be better.
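To make the E2BIG case a bit more concrete, here is a self-contained sketch
of the same pattern. This is only a sketch, not the actual patch: the buffer
sizes, the state reset, the big-buffer retry and the name convert_one are
assumptions; in the patch the fallback would be the unoptimized
ucs4_to_eightbit(), and the exact iconv() prototype differs between
implementations.

// Sketch only: convert one UCS4 code point with a small fast-path buffer
// and fall back to a larger buffer when iconv reports E2BIG.
#include <iconv.h>
#include <cerrno>
#include <vector>

// cd must have been opened with iconv_open(<eightbit encoding>, "UCS-4BE").
static std::vector<char> convert_one(iconv_t cd, unsigned int cp)
{
    char in[4] = {
        static_cast<char>((cp >> 24) & 0xff),
        static_cast<char>((cp >> 16) & 0xff),
        static_cast<char>((cp >> 8) & 0xff),
        static_cast<char>(cp & 0xff)
    };

    // Fast path: assume the result fits into a few bytes.
    char small[4];
    char * inbuf = in;
    size_t inbytes = sizeof(in);
    char * outbuf = small;
    size_t outbytes = sizeof(small);

    std::vector<char> out;
    errno = 0;
    if (iconv(cd, &inbuf, &inbytes, &outbuf, &outbytes) != (size_t)(-1)) {
        out.assign(small, small + (sizeof(small) - outbytes));
    } else if (errno == E2BIG) {
        // Output buffer too small: reset the conversion state and retry
        // with a generously sized buffer. Should only happen for exotic
        // encodings.
        iconv(cd, 0, 0, 0, 0);
        char big[64];
        inbuf = in;
        inbytes = sizeof(in);
        outbuf = big;
        outbytes = sizeof(big);
        if (iconv(cd, &inbuf, &inbytes, &outbuf, &outbytes) != (size_t)(-1))
            out.assign(big, big + (sizeof(big) - outbytes));
    }
    // On any other error the result stays empty (shift-state flushing for
    // stateful encodings is omitted here).
    return out;
}

The caller would keep one iconv_t per target encoding (e.g. from
iconv_open("ISO-8859-7", "UCS-4BE")) and reuse it between calls.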
> So, I will think a bit more about this and try to find a correct
> solution for 1.5.0. Right now, the simplest solution I can think of is
> to generate the correspondence table between ucs4 and the different
> encodings using iconv and distribute that.
I also thought of that. I don't really like this solution, because not all
iconv implementations behave alike (this was discussed in bugzilla, but I
forgot the number), so a table that is valid for one implementation is not
necessarily valid for another.
Another possibility that avoids this problem is to define the maximum UCS4
code point (and maybe the minimum, too) for each encoding in lib/encodings.
I guess that this would speed up the table generation a lot, since the
exotic code points would not need to be tested.
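For illustration, the pruned generation loop could look roughly like this.
It is only a sketch: min_cp/max_cp and the probe callback are hypothetical,
and nothing here reflects the actual lib/encodings syntax.

// Sketch only: build the ucs4 -> eightbit correspondence table, probing
// just the code points inside the range declared for the encoding.
#include <map>
#include <vector>

typedef std::map<unsigned int, std::vector<char> > CharTable;

// probe() converts one code point and returns the resulting bytes, or an
// empty vector if the code point is not representable in the encoding.
CharTable build_table(unsigned int min_cp, unsigned int max_cp,
        std::vector<char> (*probe)(unsigned int cp))
{
    CharTable table;
    for (unsigned int cp = min_cp; cp <= max_cp; ++cp) {
        std::vector<char> const bytes = probe(cp);
        if (!bytes.empty())
            table[cp] = bytes;
    }
    return table;
}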
> Or generate them on first use
> in Encoding::init().
??? That is exactly what already happens!
Georg