Hello, internals!

While working on a new function, mb_str_split (https://wiki.php.net/rfc/mb_str_split), for the mbstring extension, I noticed an opportunity to significantly improve the performance of the mbfl library for the UTF-16 encoding. Currently, all variable-length encodings are processed byte by byte.
    for (int i = 0; i < string_length; ++i) { ....... }

UTF-8 strings, by contrast, are processed using a precomputed character-length table:

    while (i < string_length) {
        int m = mbtab[*p];
        i += m;
        .....
    }

The same approach can be used for the UTF-16 encoding, but the table would be 65536 bytes instead of the 256 bytes of the UTF-8 table. Moreover, two tables would be needed: one for UTF-16 big endian and one for UTF-16 little endian.

In my tests, this gives more than a 2x speedup. The implementation of the proposed concept is here: https://github.com/php/php-src/pull/3715/commits/d868059626290b7ba773b957045e08c3efb1d603#diff-22d593ced03b2cb94450d9f9990865c8R38

To do, or not to do: that is the question. What do you think?

Regards,
Ruslan

-- 
PHP Internals - PHP Runtime Development Mailing List