Pádraig Brady <[email protected]> writes:
> Note building with -O3 gave the following warning
> which should be fixed, but which I've not looked at:
> In function 'mbfile_multi_getc',
> inlined from 'fold_file' at src/fold.c:164:7:
> ./lib/mbfile.h:235:14: error: writing 1 byte into a region of size 0
> [-Werror=stringop-overflow=]
> 235 | *p = *(p + bytes);
> | ^
> ./lib/mbfile.h: In function 'fold_file':
> ./lib/mbfile.h:82:8: note: at offset [36, 4294967268] into destination
> object 'buf' of size 4
> 82 | char buf[MBCHAR_BUF_SIZE];
> | ^
Thanks, I didn't compile with -O3. I'll have a look at it.
> Re performance, it's good, but it would be great if
> we could maintain the LC_ALL=C performance like the i18n patch does.
> Some quick testing shows:
>
> $ yes `seq 100` | head -n 1M > file.in
>
> # Note /bin/fold has the the (Fedora) i18n patch applied
> $ for L in en_US.UTF-8 C; do
> for FOLD in src/fold src/fold-c /bin/fold; do
> printf "LC_ALL=$L $FOLD: "
> time LC_ALL=$L $FOLD < file.in | wc -l
> done
> done
>
> LC_ALL=en_US.UTF-8 src/fold: 4194304
> real 0m1.046s
> LC_ALL=en_US.UTF-8 src/fold-c: 4194304
> real 0m8.294s
> LC_ALL=en_US.UTF-8 /bin/fold: 4194304
> real 0m11.556s
> LC_ALL=C src/fold: 4194304
> real 0m0.979s
> LC_ALL=C src/fold-c: 4194304
> real 0m8.277s
> LC_ALL=C /bin/fold: 4194304
> real 0m0.976s
>
> I.e. we beat the i18n patch implementation,
> but we don't shortcut the LC_ALL=C case.
Good point. I think it is worth some extra code size and duplication to
preserve the current speed using LC_ALL=C. Maybe something like this:
if (STREQ_OPT (locale_charset (), "ASCII",
'A', 'S', 'C', 'I', 'I', 0, 0, 0, 0))
fold_file (...)
else
fold_file_multibyte (...)
Where fold_file doesn't have to care about the extra information
contained in mb_file_t and mbf_char_t.
> BTW I've some general notes on i18n in coreutils at:
> https://www.pixelbeat.org/docs/coreutils_i18n/
Thanks! That is useful.
Collin