Pádraig Brady <[email protected]> writes:

>> It looks like we'll need to adjust the code to handle invalid chars
>> appropriately,
>> (and add tests). The following shows how upstream and i18n patch fold
>> treat invalid utf8 char \xC3 :
>>   $ for fold in src/fold /bin/fold; do
>>       for locale in C en_US.UTF-8; do
>>         echo "LC_ALL=$locale $fold"
>>         printf '\xC3' | LC_ALL=$locale $fold -w1 | od -Ax -tx1z -v | head -n1
>>       done
>>     done
>>   LC_ALL=C src/fold
>>   000000
>>   LC_ALL=en_US.UTF-8 src/fold
>>   000000
>>   LC_ALL=C /bin/fold
>>   000000 c3                                                >.<
>>   LC_ALL=en_US.UTF-8 /bin/fold
>>   000000 c3                                                >.<
>
> I suppose a concrete way to test that might be:
>
>   # https://datatracker.ietf.org/doc/rfc9839/
>   bad_unicode() { printf '\xC3|\u0000|\u0089|\uDEAD|\uD9BF\uDFFF\n'; }
>   test $({ bad_unicode | fold; bad_unicode; } | uniq | wc -l) = 1 || fail=1
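A byte-exact variant of that check might look something like the sketch below. It is only an illustration, not what will necessarily go into the test suite: the helper name passes_through is made up, and it assumes mktemp and a cmp that accepts "-" for standard input. It compares the filter's output against the saved input with cmp, so NUL bytes and invalid sequences are compared as raw bytes rather than as lines.

```shell
# Hypothetical helper: succeed only if the command given as arguments
# copies its standard input to standard output byte-for-byte.
passes_through() {
  tmp=$(mktemp) || return 2      # scratch copy of the input
  cat > "$tmp"                   # save stdin so it can be read twice
  "$@" < "$tmp" | cmp -s - "$tmp"
  rc=$?                          # exit status of cmp: 0 means identical
  rm -f "$tmp"
  return $rc
}

# Example: the lone byte \xC3 (octal 303) should survive fold -w1.
# The locale name is an assumption and may not exist on every system.
printf '\303' | passes_through env LC_ALL=en_US.UTF-8 fold -w1 || fail=1
```

With the behavior shown in the quoted output, src/fold would fail this check since it drops the byte, while /bin/fold would pass.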
Thanks, I'll have a look at it later today.

Also, thanks for this commit [1]. I am used to writing C23 since Gnulib
handles missing bool, etc. Too bad we can't do anything to make old
compilers support empty labels.

Collin

[1] https://github.com/coreutils/coreutils/commit/aec4f85476452310463b17c63d2ec3d2ac3d02aa
