On 13 June 2012 11:40, Henrik Sperre Johansen
<henrik.s.johan...@veloxit.no> wrote:
> On 13.06.2012 10:31, Philippe Marschall wrote:
>>
>> On 06/13/2012 04:44 AM, Igor Stasenko wrote:
>>>
>>> Hi, hardcore hackers.
>>> please take a look at the code and tell if it can be improved.
>>>
>>> The AsmJit snippet below transforms a Unicode integer value
>>> into a 1..4-byte UTF-8 sequence.
>>>
>>> Then the outer piece of code (which is not yet written) will
>>> accumulate the results of this snippet
>>> to do memory-aligned (4-byte) writes.
>>> That way, if 4 Unicode characters can be encoded into 4 UTF-8 bytes
>>> (which is mostly the case for the Latin-1 char range), there will be
>>> 4 memory reads (to read four 32-bit Unicode values) but only a single
>>> memory write (to write four 8-bit UTF-8 encoded values).
>>>
>>> The idea is to make utf-8 encoding speed close to memory copying speed :)
>>
>>
>> In Seaside 3.1 we go one step further. Imagine you have a long
>> ByteString and only a few non-ASCII characters. We do not want to have to
>> copy the whole string just to utf-8 encode a few characters, so we
>> combine the above approach with #next:putAll:startingAt: so that we only
>> have to encode and copy the non-ASCII characters, everything else is not
>> copied.
>>
>>
>> Cheers
>> Philippe
>>
>>
> Both Pharo and Squeak default TextConverters have done something similar for
> the last 1 1/2 years, see (in Pharo) nextPutByteString:toStream:
> What Igor describes seems aimed at encoding WideString -> utf8 though, which
> is still slow with the default converters.
>
> As to the assembly, is leadingChar gone entirely? Otherwise the branching
> may fail miserably.
>

Yes. In Pharo, leadingChar == 0 means Unicode.

Of course, I can add another branch to check whether the Unicode value is
greater than 16r10FFFF
and just fail the primitive if it is.
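
To make the extra branch concrete, here is a minimal sketch in Python (not
the AsmJit code itself) of the 1..4-byte encoding with the upper-bound check.
The names utf8_encode and MAX_CODE_POINT are made up for illustration, and
returning None stands in for failing the primitive. Only the 16r10FFFF limit
is checked; surrogate values are not rejected here.

```python
MAX_CODE_POINT = 0x10FFFF  # 16r10FFFF in Smalltalk notation


def utf8_encode(cp):
    """Return the UTF-8 bytes for one code point, or None on failure."""
    # The extra branch: anything above 16r10FFFF (or negative) is rejected,
    # which stands in for "fail the primitive".
    if cp < 0 or cp > MAX_CODE_POINT:
        return None
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

Without the first check, a value above 16r10FFFF would fall through to the
4-byte case and produce an invalid sequence, so the check has to come before
the length dispatch.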

>
> Cheers,
> Henry



-- 
Best regards,
Igor Stasenko.
