On Sat, Jan 21, 2023 at 12:37:30PM -0500, Bruce Momjian wrote:
> Well, as one of the URLs I quoted said:
>
>     This is by design. wcwidth() is utterly broken. Any terminal or
>     terminal application that uses it is also utterly broken. Forget
>     about emoji wcwidth() doesn't even work with combining characters,
>     zero width joiners, flags, and a whole bunch of other things.
>
> So, either we have to find a function in the library that will do the
> looping over the string for us, or we need to identify the special
> Unicode characters that create grapheme clusters and handle them in our
> code.
I just checked whether wcswidth() would honor grapheme clusters, even though
wcwidth() does not.  It seems wcswidth() treats characters just like
wcwidth(): its result of 7 is simply the sum of the per-character widths
(2+2+0+1+0+2):

    $ LANG=en_US.UTF-8 grapheme_test
    wcswidth len=7

    bytes_consumed=4, wcwidth len=2
    bytes_consumed=4, wcwidth len=2
    bytes_consumed=3, wcwidth len=0
    bytes_consumed=3, wcwidth len=1
    bytes_consumed=3, wcwidth len=0
    bytes_consumed=4, wcwidth len=2

C test program attached.  This is on Debian 11.

-- 
  Bruce Momjian  <br...@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.
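#define _XOPEN_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>

int
main (int argc, char *argv[])
{
    char       *cp = "👩🏼⚕️🩺";
    wchar_t     wch[100];
    int         i = 0;          /* byte offset into cp (was uninitialized) */

    setlocale(LC_ALL, "en_US.UTF-8");

    /* width of the whole string, as computed by wcswidth() */
    mbstowcs(wch, cp, 100);
    printf("wcswidth len=%d\n\n", wcswidth(wch, 100));

    /* width of each code point, one multibyte character at a time */
    while (cp[i])
    {
        /* never let mbtowc() look past the end of the string */
        int         res = mbtowc(wch, cp + i, strlen(cp + i));
        int         len;

        if (res <= 0)
            break;              /* invalid or incomplete multibyte sequence */
        printf("bytes_consumed=%d, ", res);
        len = wcwidth(wch[0]);
        printf("wcwidth len=%d\n", len);
        i += res;
    }

    return 0;
}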