On 21/08/2025 19:48, Collin Funk wrote:
> Pádraig Brady <[email protected]> writes:
>
>> Re performance, it's good, but it would be great if we could
>> maintain the LC_ALL=C performance like the i18n patch does.
>> Some quick testing shows:
>>
>>   $ yes `seq 100` | head -n 1M > file.in
>>
>>   # Note /bin/fold has the (Fedora) i18n patch applied
>>   $ for L in en_US.UTF-8 C; do
>>       for FOLD in src/fold src/fold-c /bin/fold; do
>>         printf "LC_ALL=$L $FOLD: "
>>         time LC_ALL=$L $FOLD < file.in | wc -l
>>       done
>>     done
>>
>>   LC_ALL=en_US.UTF-8 src/fold: 4194304
>>   real 0m1.046s
>>   LC_ALL=en_US.UTF-8 src/fold-c: 4194304
>>   real 0m8.294s
>>   LC_ALL=en_US.UTF-8 /bin/fold: 4194304
>>   real 0m11.556s
>>   LC_ALL=C src/fold: 4194304
>>   real 0m0.979s
>>   LC_ALL=C src/fold-c: 4194304
>>   real 0m8.277s
>>   LC_ALL=C /bin/fold: 4194304
>>   real 0m0.976s
>>
>> I.e. we beat the i18n patch implementation,
>> but we don't shortcut the LC_ALL=C case.
>
> Good point. I think it is worth some extra code size and duplication
> to preserve the current speed using LC_ALL=C. Maybe something like
> this:
>
>     if (STREQ_OPT (locale_charset (), "ASCII",
>                    'A', 'S', 'C', 'I', 'I', 0, 0, 0, 0))
>       fold_file (...)
>     else
>       fold_file_multibyte (...)

Ideally as little would be duplicated as possible,
but yes that's one way to address the issue.
Note there is a hard_locale(LC_CTYPE) abstraction to detect this.

cheers,
Padraig
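
For illustration only, here is a rough sketch of what dispatching on the locale check could look like. It is not taken from any patch in this thread. is_hard_locale() is a hypothetical stand-in that only approximates gnulib's hard_locale() by comparing the locale name against "C" and "POSIX", and the two fold_file* functions are placeholders that just report which path was chosen.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for gnulib's hard_locale (): returns true
   unless CATEGORY is currently in the "C" or "POSIX" locale.  */
static bool
is_hard_locale (int category)
{
  const char *name = setlocale (category, NULL);
  if (!name)
    return false;
  return !(strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0);
}

/* Placeholders for the two code paths.  The real functions would
   fold the input; these only name the path taken.  */
static const char *
fold_file (void)
{
  return "unibyte";
}

static const char *
fold_file_multibyte (void)
{
  return "multibyte";
}

/* Take the fast unibyte path in the C/POSIX locale and the
   multibyte path otherwise.  */
static const char *
fold_dispatch (void)
{
  return is_hard_locale (LC_CTYPE) ? fold_file_multibyte () : fold_file ();
}
```

A caller would run setlocale (LC_ALL, "") once at startup and then call fold_dispatch (). With LC_ALL=C the unibyte path is selected, so the current speed in that locale would be preserved.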
