Re: Possible Unicode Problems in Busybox - Collect and Discussion

Harald Becker Wed, 13 Aug 2014 06:43:20 -0700

I've seen several implementations which use the mbtowc functions to test for
some special chars; this is not correct for UTF-8 in my opinion.

Counting the number of UTF-8 characters is really simple: just count all bytes except those with a value in the range 0x80 to 0xBF (the continuation bytes). This has two exceptions, 0xFE and 0xFF, which never appear in valid UTF-8, but I think it's not wrong to count them and behave as if they were characters.



Counting can be done with one logical and one compare instruction:

if ((c ^ 0x40) < 0xC0) n++;
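
As a minimal sketch, the test above could be wrapped in a counting loop like the one below. The function name utf8_count and the NUL-terminated input are assumptions made for illustration, not something taken from Busybox. Note that the byte has to be treated as unsigned char: with a signed char, bytes from 0x80 upward become negative after integer promotion and the compare would always succeed.

```c
#include <stddef.h>

/* Hypothetical helper: count the UTF-8 characters in a NUL-terminated
 * string by counting every byte that is not a continuation byte
 * (0x80..0xBF). The input is not validated. */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Work on the unsigned value so that the XOR result stays in
         * the range 0x00..0xFF. */
        unsigned char c = (unsigned char)*s;

        /* XOR with 0x40 maps 0x80..0xBF onto 0xC0..0xFF, so only
         * continuation bytes fail the compare. */
        if ((c ^ 0x40) < 0xC0)
            n++;
    }
    return n;
}
```

This counts code points in well-formed input, not display columns or combined characters, and the stray bytes 0xFE and 0xFF are counted as one character each, as described above.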


_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox
