On Sun, May 21, 2017 at 3:46 PM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
> I guess instead of looking at the relative slowness and pondering
> acceleration tables, I should measure how much Chinese or Japanese
> text a Raspberry Pi 3 (the underpowered ARM device I have access to
> and that has predictable-enough scheduling to be benchmarkable in a
> usefully repeatable way, unlike Android devices) can legacy-encode in
> a tenth of a second or 1/24th of a second without an acceleration
> table. (I posit that with the network round trip happening afterwards,
> no one is going to care if the form encode step in the legacy case
> takes up to one movie frame duration. Possibly, the "don't care"
> allowance is much larger.)
Here are numbers from ARMv7 code running on an RPi3:

UTF-16 to Shift_JIS: 626000 characters per second, or the human-readable
non-markup text of a Wikipedia article in 1/60th of a second.

UTF-16 to GB18030 (same as GBK for the dominant parts): 206000 characters
per second, or the human-readable non-markup text of a Wikipedia article
in 1/15th of a second.

UTF-16 to Big5: 258000 characters per second, or the human-readable
non-markup text of a Wikipedia article in 1/20th of a second.

Considering that a user usually submits considerably less than a
Wikipedia article's worth of text in a form at a time, I think we can
conclude that, as far as user perception of form submission goes, it's
OK to ship Japanese and Chinese legacy encoders that do linear search
over decode-optimized data (no encode-specific data structures at all)
and that are extremely slow *relative* to UTF-16 to UTF-8 encode (by a
factor of over 200!).

The test data I used was:
https://github.com/hsivonen/encoding_bench/blob/master/src/wikipedia/zh_tw.txt
https://github.com/hsivonen/encoding_bench/blob/master/src/wikipedia/zh_cn.txt
https://github.com/hsivonen/encoding_bench/blob/master/src/wikipedia/ja.txt

So it's human-authored text, but my understanding is that the Simplified
Chinese version has been machine-mapped from the Traditional Chinese
version, so it's possible that some of the slowness of the Simplified
Chinese case is attributable to the conversion from Traditional Chinese
exercising less common characters than if the text had been
human-authored directly as Simplified Chinese.

Japanese is not fully ideographic, and the kana mapping is a matter of a
range check plus an offset, which is why the Shift_JIS case is so much
faster.

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
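P.S. To illustrate the "range check plus offset" point about kana: a
minimal Python sketch of that idea for the two contiguous kana blocks
(this is just an illustration of the technique, not the actual
encoding_rs code, which is Rust):

```python
def shift_jis_kana(cp):
    """Map a hiragana or katakana code point to its Shift_JIS byte pair
    using only a range check plus an offset (no lookup table).
    Returns None for code points outside the two base kana ranges."""
    if 0x3041 <= cp <= 0x3093:
        # Hiragana U+3041..U+3093 sit contiguously at lead byte 0x82,
        # trail bytes 0x9F..0xF1.
        return bytes([0x82, 0x9F + (cp - 0x3041)])
    if 0x30A1 <= cp <= 0x30F6:
        # Katakana U+30A1..U+30F6 sit at lead byte 0x83 starting at
        # trail 0x40, but Shift_JIS trail bytes skip 0x7F.
        trail = 0x40 + (cp - 0x30A1)
        if trail >= 0x7F:
            trail += 1
        return bytes([0x83, trail])
    return None
```

No per-character table search is needed for these ranges, which is why
mixed kana/kanji Japanese text encodes so much faster than all-ideographic
Chinese text when the ideographs take the slow linear-search path.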