Hello, internals!
While working on a new function, mb_str_split
(https://wiki.php.net/rfc/mb_str_split), for the mbstring extension, I
noticed an opportunity to significantly improve the performance of the
mbfl library for the UTF-16 encoding.
Currently, variable-length encodings are processed byte by byte:
    for (size_t i = 0; i < string_length; ++i) {
        /* ... process one byte ... */
    }
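UTF-8 strings, by contrast, are processed with a precomputed table of
character lengths, indexed by the first byte of each character:
    while (i < string_length) {
        int m = mbtab[*p];  /* byte length of the character starting at p */
        i += m;
        p += m;
        /* ... */
    }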
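To make the table-driven loop concrete, here is a minimal, self-contained
sketch in C. The names utf8_len, utf8_table_init and utf8_count_chars are
made up for illustration and are not the mbfl identifiers. Advancing by one
byte on an invalid lead byte is also my assumption, not necessarily what
mbfl does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: byte length of a UTF-8 character, indexed by its
   first byte. */
static uint8_t utf8_len[256];

static void utf8_table_init(void)
{
    for (int b = 0; b < 256; ++b) {
        if (b < 0x80)      utf8_len[b] = 1; /* ASCII */
        else if (b < 0xC0) utf8_len[b] = 1; /* stray continuation byte (assumed: skip 1) */
        else if (b < 0xE0) utf8_len[b] = 2; /* 2-byte sequence */
        else if (b < 0xF0) utf8_len[b] = 3; /* 3-byte sequence */
        else if (b < 0xF8) utf8_len[b] = 4; /* 4-byte sequence */
        else               utf8_len[b] = 1; /* invalid lead byte (assumed: skip 1) */
    }
}

/* Count characters by jumping from lead byte to lead byte. A truncated
   final sequence still counts as one character. */
static size_t utf8_count_chars(const unsigned char *p, size_t n)
{
    size_t i = 0, count = 0;
    while (i < n) {
        i += utf8_len[p[i]];
        ++count;
    }
    return count;
}
```

Call utf8_table_init() once, then utf8_count_chars() visits one table
entry per character instead of one loop iteration per byte.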
The same approach can be used for the UTF-16 encoding, but the table
would take 65536 bytes, compared with 256 bytes for the UTF-8 table.
Moreover, two tables would be needed: one for UTF-16 big endian and one
for UTF-16 little endian.
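As a rough illustration of why two tables are needed, here is a sketch in
C. It assumes each table is indexed by the first two bytes of a character
in memory order, (first_byte << 8) | second_byte. The names utf16be_len,
utf16le_len, utf16_tables_init and utf16_count_chars are hypothetical, and
the code in the pull request may be organized differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tables: byte length of the character that starts with a
   given pair of bytes, indexed by (first_byte << 8) | second_byte. */
static uint8_t utf16be_len[65536];
static uint8_t utf16le_len[65536];

static void utf16_tables_init(void)
{
    for (uint32_t v = 0; v < 65536; ++v) {
        /* A high surrogate (0xD800..0xDBFF) starts a 4-byte surrogate
           pair; every other code unit is treated as a 2-byte character. */
        uint8_t len = (v >= 0xD800 && v <= 0xDBFF) ? 4 : 2;
        utf16be_len[v] = len;
        /* In little endian the same code unit is stored byte-swapped. */
        utf16le_len[((v & 0xFF) << 8) | (v >> 8)] = len;
    }
}

/* Count characters in n bytes of UTF-16, using the table that matches the
   byte order. A trailing odd byte is ignored, and a truncated surrogate
   pair still counts as one character. */
static size_t utf16_count_chars(const unsigned char *p, size_t n,
                                const uint8_t *tab)
{
    size_t i = 0, count = 0;
    while (i + 1 < n) {
        i += tab[(p[i] << 8) | p[i + 1]];
        ++count;
    }
    return count;
}
```

The loop is identical for both byte orders; only the table passed in
differs, which is where the second 65536-byte table comes from.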
In my tests, this gave a speedup of more than two times.
The implementation of the proposed concept is here:
https://github.com/php/php-src/pull/3715/commits/d868059626290b7ba773b957045e08c3efb1d603#diff-22d593ced03b2cb94450d9f9990865c8R38
To do, or not to do: that is the question.
What do you think?
Regards,
Ruslan
--
PHP Internals - PHP Runtime Development Mailing List