On Sat, Nov 11, 2023 at 09:06:41PM +0100, Bruno Haible wrote:
> [CCing bug-gnulib]
> Indeed, the c32* functions by design work only on those Unicode characters
> that can be represented as multibyte sequences in the current locale.
>
> I'll document this better in the Gnulib manual.
>
> Since you want texinfo to work on UTF-8 encoded text with characters outside
> the repertoire of the current locale, you'll need the libunistring functions,
> documented in
> <https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html>.
> Namely, replace c32width with uc_width.

Thanks, that seems to work perfectly.  I also changed c32isupper to
uc_is_upper.  The gnulib manual stated (node "isupper"):

  ‘c32isupper’
       This function operates in a locale dependent way, on 32-bit wide
       characters.  In order to use it, you first have to convert from
       multibyte to 32-bit wide characters, using the ‘mbrtoc32’ function.
       It is provided by the Gnulib module ‘c32isupper’.

  ...

  ‘uc_is_upper’
       This function operates in a locale independent way, on Unicode
       characters.  It is provided by the Gnulib module
       ‘unictype/ctype-upper’.

- and we wanted the "locale independent way".

I did not understand why uc_width was said to be "locale dependent":

  "These functions are locale dependent."

- from
<https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html#index-uc_005fwidth>.

I also don't understand the purpose of the "encoding" argument -- can this
always be "UTF-8"?

I'm also unclear on the exact relationship between the types char32_t,
ucs4_t and uint32_t.  For example, uc_width takes a ucs4_t argument but
u8_mbtouc writes to a char32_t variable.  In the code I committed, I used
a cast to ucs4_t when calling uc_width.
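
To make the type question concrete, here is a minimal sketch of the cast
described above.  It is not the code that was committed.  It assumes that
libunistring's <unitypes.h> defines ucs4_t as uint32_t, and it uses a
local stand-in typedef (my_ucs4_t) so that it builds without libunistring.
The helper name to_ucs4 is hypothetical.  C11 declares char32_t in
<uchar.h> as the same type as uint_least32_t, so on platforms that provide
uint32_t the conversion should preserve every Unicode code point.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* uint32_t */
#include <uchar.h>    /* char32_t (C11) */

/* Stand-in for libunistring's ucs4_t (assumed to be uint32_t there);
   declared locally so this sketch does not need libunistring.  */
typedef uint32_t my_ucs4_t;

/* Both types must be wide enough for any code point up to 0x10FFFF.  */
static_assert (sizeof (char32_t) * CHAR_BIT >= 32,
               "char32_t has at least 32 bits");
static_assert (sizeof (my_ucs4_t) * CHAR_BIT == 32,
               "ucs4_t stand-in has exactly 32 bits");

/* Hypothetical helper: pass a char32_t code point where a ucs4_t-like
   value is expected.  The cast only documents the intent; the value
   itself is unchanged for valid code points.  */
my_ucs4_t
to_ucs4 (char32_t c)
{
  return (my_ucs4_t) c;
}
```

With the real library, the equivalent call would presumably look like
uc_width ((ucs4_t) c, "UTF-8"), where c is the char32_t code point being
measured.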