On Wed, Aug 13, 2014 at 3:42 PM, Harald Becker <ra...@gmx.de> wrote:
>
>> I've seen several implementations which use mbtowc functions to test some
>> special chars; this is not correct for UTF-8 in my opinion.
>
>
> Counting the number of UTF-8 characters is really simple: just count all
> bytes except those with a value in the range 0x80 to 0xBF. This has two
> exceptions, 0xFE and 0xFF, which are not official UTF-8 bytes, but I think
> it's not wrong to count them and behave as such.
>
>
> Counting can be done with one logical and one compare instruction:
>
> if ((c ^ 0x40) < 0xC0) n++;
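
For reference, the byte-counting trick quoted above could be written as a standalone function roughly like this. It is only a sketch: the name utf8_count is made up for illustration and is not a BusyBox function, and it assumes the input is well-formed UTF-8. Note that the comparison only works if c is an unsigned char; with a plain (possibly signed) char, bytes >= 0x80 become negative and every byte gets counted.

```c
#include <stddef.h>

/* Hypothetical helper (not part of BusyBox): count UTF-8 characters
 * by counting every byte that is not a continuation byte (0x80..0xBF).
 * Assumes well-formed input; stray bytes such as 0xFE/0xFF are counted
 * as characters, as described in the quoted mail. */
size_t utf8_count(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++) {
		/* The cast matters: a signed char >= 0x80 would be negative */
		unsigned char c = (unsigned char)*s;

		/* XOR with 0x40 maps 0x80..0xBF onto 0xC0..0xFF */
		if ((c ^ 0x40) < 0xC0)
			n++;
	}
	return n;
}
```

Unlike unicode_strlen(), this sketch does no validation at all, so it will happily return a count for byte sequences that are not valid UTF-8.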

include/{libbb,unicode}.h already have a bunch of helpers to do
unicode_strlen(), and a few other typical functions:

typedef struct uni_stat_t {
	unsigned byte_count;
	unsigned unicode_count;
	unsigned unicode_width;
} uni_stat_t;
/* Returns a string with unprintable chars replaced by '?' or
 * SUBST_WCHAR. This function is unicode-aware. */
const char* FAST_FUNC printable_string(uni_stat_t *stats, const char *str);
/* Number of unicode chars. Falls back to strlen() on invalid unicode */
size_t FAST_FUNC unicode_strlen(const char *string);
/* Width on terminal */
size_t FAST_FUNC unicode_strwidth(const char *string);
enum {
	UNI_FLAG_PAD = (1 << 0),
};
char* FAST_FUNC unicode_conv_to_printable(uni_stat_t *stats, const char *src);
char* FAST_FUNC unicode_conv_to_printable_fixedwidth(/*uni_stat_t *stats,*/ const char *src, unsigned width);
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox