Re: [pypy-dev] Speeds of various utf8 operations

Phyo Arkar Sat, 04 Mar 2017 10:36:53 -0800

Does SSE mean https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions?


Is this much slower in comparison to CPython?

On Sun, Mar 5, 2017 at 12:32 AM Maciej Fijalkowski <[email protected]> wrote:

> Hello everyone
>
> I've been experimenting a bit with faster utf8 operations (and
> conversion that does not do much). I'm writing down the results so
> they don't get forgotten, as well as trying to put them in rpython
> comments.
>
> As far as non-SSE algorithms go, for things like splitlines, split,
> etc., it is important to walk the utf8 string quickly and check
> properties of characters.
>
> So far the finding has been that a lookup table, for example:
>
> def next_codepoint_pos(code, pos):
>     chr1 = ord(code[pos])
>     if chr1 < 0x80:
>         return pos + 1
>     return pos + ord(runicode._utf8_code_length[chr1 - 0x80])
>
> is significantly slower than the following code (neither does error checking):
>
> def next_codepoint_pos(code, pos):
>     chr1 = ord(code[pos])
>     if chr1 < 0x80:
>         return pos + 1
>     if 0xC2 <= chr1 <= 0xDF:
>         return pos + 2
>     if chr1 >= 0xE0 and chr1 <= 0xEF:
>         return pos + 3
>     return pos + 4
>
> The exact difference depends on how many multi-byte characters there
> are and how big the strings are. It's up to 40%, but as a general
> rule, the more ascii characters there are, the less of an impact it
> has, and the larger the strings are, the more impact memory/L2/L3
> cache has.
>
> PS. SSE will be faster still, but we might not want SSE for just splitlines
>
> Cheers,
> fijal
> _______________________________________________
> pypy-dev mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/pypy-dev
>
