On 21/08/2025 19:48, Collin Funk wrote:
Pádraig Brady <[email protected]> writes:

Re performance, it's good, but it would be great if
we could maintain the LC_ALL=C performance like the i18n patch does.
Some quick testing shows:

   $ yes `seq 100` | head -n 1M > file.in

   # Note /bin/fold has the (Fedora) i18n patch applied
   $ for L in en_US.UTF-8 C; do
       for FOLD in src/fold src/fold-c /bin/fold; do
         printf "LC_ALL=$L $FOLD: "
         time LC_ALL=$L $FOLD < file.in | wc -l
       done
     done

   LC_ALL=en_US.UTF-8 src/fold: 4194304
   real 0m1.046s
   LC_ALL=en_US.UTF-8 src/fold-c: 4194304
   real 0m8.294s
   LC_ALL=en_US.UTF-8 /bin/fold: 4194304
   real 0m11.556s
   LC_ALL=C src/fold: 4194304
   real 0m0.979s
   LC_ALL=C src/fold-c: 4194304
   real 0m8.277s
   LC_ALL=C /bin/fold: 4194304
   real 0m0.976s

I.e. we beat the i18n patch implementation,
but we don't shortcut the LC_ALL=C case.

Good point. I think it is worth some extra code size and duplication to
preserve the current speed using LC_ALL=C. Maybe something like this:

     if (STREQ_OPT (locale_charset (), "ASCII",
                    'A', 'S', 'C', 'I', 'I', 0, 0, 0, 0))
       fold_file (...)
     else
       fold_file_multibyte (...)

Ideally as little would be duplicated as possible,
but yes that's one way to address the issue.
Note there is a hard_locale(LC_CTYPE) abstraction to detect this.
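
A minimal sketch of that kind of dispatch, for illustration only. It does
not use gnulib: is_hard_locale() below is a hypothetical stand-in that only
approximates hard_locale() by comparing the current locale name against "C"
and "POSIX", and the fold_path enum and choose_fold_path() are likewise
made-up names, not anything in the patch.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for gnulib's hard_locale (): return true when
   CATEGORY is set to something other than the "C" or "POSIX" locale.
   This is an approximation, not the gnulib implementation.  */
static bool
is_hard_locale (int category)
{
  /* A NULL second argument queries the current setting without changing it.  */
  char const *name = setlocale (category, NULL);
  if (!name)
    return false;
  return ! (strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0);
}

/* Hypothetical labels for the two code paths discussed above.  */
enum fold_path { FOLD_BYTES, FOLD_MULTIBYTE };

/* Pick the byte-oriented fast path unless LC_CTYPE is a "hard" locale.  */
static enum fold_path
choose_fold_path (void)
{
  return is_hard_locale (LC_CTYPE) ? FOLD_MULTIBYTE : FOLD_BYTES;
}
```

The caller would run setlocale (LC_ALL, "") first, as the utilities already
do at startup, and then branch once per file on the result, so the LC_ALL=C
case never touches the multibyte code.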

cheers,
Padraig
