Yes ;)… at some point. Right now I'm still experimenting with the operations we think we need.
Cheers,
Richard

On Mar 8, 2017 6:50 PM, "David Edelsohn" <dje....@gmail.com> wrote:

> And POWER VSX and Z VX? ;-)
>
> - David
>
> On Wed, Mar 8, 2017 at 12:17 PM, Richard Plangger <planri...@gmail.com> wrote:
> > Hi,
> >
> > as we discussed at the sprint, I have now experimented with an SSE/AVX
> > implementation of 'len(utf8 string)' (this includes a check whether the
> > string is valid utf8). Since this is related to this mailing list thread
> > I'll just add it here!
> >
> > I ran some small measurements on it. Some explanation of the names:
> >
> > pypy-seq-.*: sequential implementation in C, nothing fancy, just a baseline
> > pypy-vec-sse4-.*: implementation using SSE4 (128-bit registers)
> > pypy-vec-avx2-.*: implementation using AVX2 (256-bit registers)
> > libunistring-.*: benchmarks the function u8_check of that GNU library;
> >   NO length is calculated
> > mystrlenutf8-.*: length calculation only (no validity check), using 64-bit
> >   words instead of per-byte iteration (see [1])
> >
> > .*-news-de: HTML of a German website (quite a lot of 2-byte code points), ~1 MB
> > .*-news-cn: worldjournal.com -> Mandarin (HTML website with lots of 4-byte
> >   code points), ~700 KB
> > .*-tipitaka-thai: XML page of a religious text with lots of 3-byte code
> >   points, ~4.5 MB (the original 300 KB file copied many times)
> >
> > Why is u8u16 missing? As far as I can tell there is no function in u8u16
> > that returns the length of a utf8 string and checks its validity at the
> > same time, short of rewriting it. u8u16 is really just for transforming
> > utf8 to utf16.
> >
> > Each benchmark run reads the content from a file (e.g. .*-news-de, a
> > German HTML news website), calls the C utf-8-get-length-and-check function
> > 10 times in a loop, and sums up the time per run (using clock_t clock(void)
> > in C, see man 3 clock).
> >
> > pypy-seq-news-de:              Median +- std dev: 76.0 us +- 1.4 us
> > pypy-sse4-vec-news-de:         Median +- std dev: 5.16 us +- 0.14 us
> > pypy-avx2-vec-news-de:         Median +- std dev: 384 ns +- 11 ns
> > libunistring-news-de:          Median +- std dev: 33.0 us +- 0.4 us
> > mystrlenutf8-news-de:          Median +- std dev: 9.25 us +- 0.22 us
> > pypy-seq-news-cn:              Median +- std dev: 59.8 us +- 1.2 us
> > pypy-sse4-vec-news-cn:         Median +- std dev: 7.70 us +- 0.12 us
> > pypy-avx2-vec-news-cn:         Median +- std dev: 23.3 ns +- 0.4 ns
> > libunistring-news-cn:          Median +- std dev: 30.5 us +- 0.4 us
> > mystrlenutf8-news-cn:          Median +- std dev: 6.54 us +- 0.20 us
> > pypy-seq-tipitaka-thai:        Median +- std dev: 939 us +- 39 us
> > pypy-sse4-vec-tipitaka-thai:   Median +- std dev: 425 us +- 7 us
> > pypy-avx2-vec-tipitaka-thai:   Median +- std dev: 19.9 ns +- 0.3 ns
> > libunistring-tipitaka-thai:    Median +- std dev: 615 us +- 28 us
> > mystrlenutf8-tipitaka-thai:    Median +- std dev: 45.1 us +- 7.9 us
> >   (WARNING: this benchmark seems unstable, the standard deviation is high
> >   (stdev/median: 17%); it should be rerun with more runs, samples and/or loops)
> >
> > What do you think?
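> >
> > (For context, the core of the vectorized versions is counting the bytes
> > that are NOT UTF-8 continuation bytes. The sketch below shows only that
> > counting half, with SSE2 intrinsics and made-up names; the real
> > implementation also performs the validity check and has SSE4/AVX2 variants.)
> >
> > #include <emmintrin.h>  /* SSE2 intrinsics are enough for the sketch */
> > #include <stddef.h>
> > #include <stdint.h>
> >
> > static size_t utf8_codepoint_count(const uint8_t *s, size_t len)
> > {
> >     size_t count = 0, i = 0;
> >     /* continuation bytes are 0x80..0xBF, i.e. signed byte value < -0x40 */
> >     const __m128i cont_limit = _mm_set1_epi8(-0x40);
> >
> >     for (; i + 16 <= len; i += 16) {
> >         __m128i chunk   = _mm_loadu_si128((const __m128i *)(s + i));
> >         __m128i is_cont = _mm_cmplt_epi8(chunk, cont_limit);
> >         int mask = _mm_movemask_epi8(is_cont);
> >         count += 16 - __builtin_popcount(mask);  /* gcc/clang builtin */
> >     }
> >     for (; i < len; i++)                         /* scalar tail */
> >         if ((s[i] & 0xC0) != 0x80)
> >             count++;
> >     return count;
> > }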
> >
> > I think it would even be a good idea to take a look at AVX512 (which
> > gives you a crazy amount of 512 bits (or 64 bytes) in your vector register).
> >
> > The AVX implementation is a bit fishy (compare pypy-avx2-vec-tipitaka-thai
> > and pypy-avx2-vec-news-cn). I need to recheck that; it would not make
> > sense to process 10x 4.5 MB in 20 ns but 10x 700 KB in 23 ns.
> >
> > As soon as I have ironed out the issue I'll start to think about indexing...
> >
> > Cheers,
> > Richard
> >
> > [1] http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html
> >
> > On 03/04/2017 07:01 PM, Maciej Fijalkowski wrote:
> >> Hello everyone
> >>
> >> I've been experimenting a bit with faster utf8 operations (and
> >> conversion that does not do much). I'm writing down the results so
> >> they don't get forgotten, as well as trying to put them in rpython
> >> comments.
> >>
> >> As far as non-SSE algorithms go, for things like splitlines, split,
> >> etc. it is important to walk the utf8 string quickly and check
> >> properties of characters.
> >>
> >> So far the finding has been that a lookup table, for example:
> >>
> >> def next_codepoint_pos(code, pos):
> >>     chr1 = ord(code[pos])
> >>     if chr1 < 0x80:
> >>         return pos + 1
> >>     return pos + ord(runicode._utf8_code_length[chr1 - 0x80])
> >>
> >> is significantly slower than the following code (neither version does
> >> error checking):
> >>
> >> def next_codepoint_pos(code, pos):
> >>     chr1 = ord(code[pos])
> >>     if chr1 < 0x80:
> >>         return pos + 1
> >>     if chr1 >= 0xC2 and chr1 <= 0xDF:
> >>         return pos + 2
> >>     if chr1 >= 0xE0 and chr1 <= 0xEF:
> >>         return pos + 3
> >>     return pos + 4
> >>
> >> The exact difference depends on how many multi-byte characters there
> >> are and how big the strings are. It's up to 40%, but as a general rule:
> >> the more ASCII characters there are, the smaller the impact; and the
> >> larger the strings are, the more impact the memory/L2/L3 caches have.
> >>
> >> PS. SSE will be faster still, but we might not want SSE for just splitlines
> >>
> >> Cheers,
> >> fijal
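PS: for reference, the word-at-a-time trick from the daemonology post ([1] above) boils down to roughly the following; this is only a sketch, not the exact code from that post, and it does no validity checking:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count UTF-8 code points by counting non-continuation bytes, 8 bytes at a
   time, no SIMD required. */
static size_t utf8_codepoint_count_words(const char *s, size_t len)
{
    const uint64_t ones = 0x0101010101010101ULL;
    size_t cont = 0, i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, 8);                 /* unaligned-safe load */
        /* bit 0 of each byte lane becomes 1 iff that byte is 10xxxxxx */
        uint64_t t = ((w >> 7) & ~(w >> 6)) & ones;
        cont += (t * ones) >> 56;             /* horizontal sum of the lanes */
    }
    for (; i < len; i++)                      /* scalar tail */
        if (((unsigned char)s[i] & 0xC0) == 0x80)
            cont++;
    return len - cont;
}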
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev