Does SSE mean https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions?
Is this much slower in comparison to CPython?

On Sun, Mar 5, 2017 at 12:32 AM Maciej Fijalkowski <fij...@gmail.com> wrote:

> Hello everyone
>
> I've been experimenting a bit with faster utf8 operations (and
> conversion that does not do much). I'm writing down the results so
> they don't get forgotten, as well as trying to put them in rpython
> comments.
>
> As far as non-SSE algorithms go, for things like splitlines, split
> etc. it is important to walk the utf8 string quickly and check
> properties of characters.
>
> So far the current finding has been that a lookup table, for example:
>
> def next_codepoint_pos(code, pos):
>     chr1 = ord(code[pos])
>     if chr1 < 0x80:
>         return pos + 1
>     return pos + ord(runicode._utf8_code_length[chr1 - 0x80])
>
> is significantly slower than the following code (both don't do error
> checking):
>
> def next_codepoint_pos(code, pos):
>     chr1 = ord(code[pos])
>     if chr1 < 0x80:
>         return pos + 1
>     if 0xC2 <= chr1 <= 0xDF:
>         return pos + 2
>     if chr1 >= 0xE0 and chr1 <= 0xEF:
>         return pos + 3
>     return pos + 4
>
> The exact difference depends on how many multi-byte characters there
> are and how big the strings are. It's up to 40%, but as a general
> rule, the more ascii characters there are, the less of an impact it
> has, and the larger the strings are, the more impact memory/L2/L3
> cache has.
>
> PS. SSE will be faster still, but we might not want SSE for just
> splitlines
>
> Cheers,
> fijal
> _______________________________________________
> pypy-dev mailing list
> pypy-dev@python.org
> https://mail.python.org/mailman/listinfo/pypy-dev
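
To make the comparison easier to try outside RPython, here is a minimal sketch of the two variants quoted above, ported to plain Python 3 operating on bytes. Several things are my assumptions rather than anything from the original mail: the table UTF8_CODE_LENGTH is a hypothetical stand-in for runicode._utf8_code_length, the function names are made up, and the branching variant is written with 0xC2 <= b <= 0xDF and a single variable throughout, which I assume was the intent. Like the quoted code, neither variant does error checking.

```python
import timeit


def _length_for_lead(b):
    # Sequence length implied by a UTF-8 lead byte in 0x80..0xFF.
    # Continuation and invalid bytes map to 1 (assumption, no error checking).
    if 0xC2 <= b <= 0xDF:
        return 2
    if 0xE0 <= b <= 0xEF:
        return 3
    if 0xF0 <= b <= 0xF4:
        return 4
    return 1


# Hypothetical stand-in for runicode._utf8_code_length, indexed by (byte - 0x80).
UTF8_CODE_LENGTH = bytes(_length_for_lead(b) for b in range(0x80, 0x100))


def next_codepoint_pos_table(code, pos):
    # Lookup-table variant.
    b = code[pos]
    if b < 0x80:
        return pos + 1
    return pos + UTF8_CODE_LENGTH[b - 0x80]


def next_codepoint_pos_branch(code, pos):
    # Branching variant; anything past the 3-byte range is treated as 4 bytes.
    b = code[pos]
    if b < 0x80:
        return pos + 1
    if 0xC2 <= b <= 0xDF:
        return pos + 2
    if 0xE0 <= b <= 0xEF:
        return pos + 3
    return pos + 4


def count_codepoints(code, step):
    # Walk the whole byte string one codepoint at a time.
    pos = 0
    n = 0
    while pos < len(code):
        pos = step(code, pos)
        n += 1
    return n


# Mixed ASCII, 2-byte, 3-byte and 4-byte characters.
sample = "héllo wörld €uro 𝄞\n".encode("utf-8") * 1000

if __name__ == "__main__":
    for step in (next_codepoint_pos_table, next_codepoint_pos_branch):
        seconds = timeit.timeit(lambda: count_codepoints(sample, step), number=20)
        print(step.__name__, seconds)
```

Timings from CPython say little about what the RPython-translated code does, so this is only useful for checking that the two variants agree on valid input, not for reproducing the 40% figure.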