Pádraig Brady <[email protected]> writes:

>> It looks like we'll need to adjust the code to handle invalid chars
>> appropriately (and add tests). The following shows how the upstream fold
>> and the i18n-patched fold treat the invalid UTF-8 byte \xC3:
>> $ for fold in src/fold /bin/fold; do
>>       for locale in C en_US.UTF-8; do
>>         echo "LC_ALL=$locale $fold"
>>         printf '\xC3' | LC_ALL=$locale $fold -w1 | od -Ax -tx1z -v | head -n1
>>       done
>>     done
>> LC_ALL=C src/fold
>> 000000
>> LC_ALL=en_US.UTF-8 src/fold
>> 000000
>> LC_ALL=C /bin/fold
>> 000000 c3                                               >.<
>> LC_ALL=en_US.UTF-8 /bin/fold
>> 000000 c3                                               >.<
>
> I suppose a concrete way to test that might be:
>
>   # https://datatracker.ietf.org/doc/rfc9839/
>   bad_unicode() { printf '\xC3|\u0000|\u0089|\uDEAD|\uD9BF\uDFFF\n'; }
>   test $({ bad_unicode | fold; bad_unicode; } | uniq | wc -l) = 1 || fail=1

Thanks, I'll have a look at it later today.
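
For reference, a minimal sketch of a byte-for-byte pass-through check,
assuming a hypothetical helper named passes_through (it is not part of
the coreutils test suite). It feeds a lone 0xC3 byte to any filter and
compares the output with cmp. The byte is written with the octal escape
\303, since \x escapes are not guaranteed by POSIX printf:

```shell
# Hypothetical helper: succeed only if the filter given as "$@" copies
# the sample input to its output unchanged.
passes_through() {
  sample=$(mktemp) && out=$(mktemp) || return 2
  printf '\303\n' > "$sample"      # lone UTF-8 lead byte 0xC3, then newline
  "$@" < "$sample" > "$out"        # run the filter under test
  cmp -s "$sample" "$out"; rc=$?   # 0 only if output is byte-identical
  rm -f "$sample" "$out"
  return $rc
}

# Example use; the locale name is an assumption and may not be installed.
passes_through env LC_ALL=en_US.UTF-8 fold -w1 ||
  echo "fold altered or dropped the invalid byte"
```

Comparing with cmp rather than uniq avoids depending on how uniq itself
treats invalid bytes in the current locale.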

Also, thanks for this commit [1]. I am used to writing C23 since Gnulib
handles missing bool, etc. Too bad we can't do anything to make old
compilers support empty labels.

Collin

[1] https://github.com/coreutils/coreutils/commit/aec4f85476452310463b17c63d2ec3d2ac3d02aa
