On 16/03/2026 14:26, Leonid Evdokimov wrote:
On Mon, Mar 2, 2026 at 8:51 PM Pádraig Brady <[email protected]> wrote:
The INTEL_JCC_ERRATUM stuff may be more generally applicable,
and more appropriate for a separate patch.
I've isolated the Jcc story into a separate patch and benchmarked it
with various GCC and Clang versions. I've summarized results in the
comment, but I can share raw numbers if those are interesting.
This raises another question. align-loops=32 gives a ~33% speedup
for GearHash on Skylake (2.0 cpb down to 1.33), comparable to the
Jcc fix (4.0 cpb down to 2.7).
What's the right thing to do in this case at the present moment?
Should it be reported as an issue to compiler[s]? Should coreutils
have default CFLAGS for specific files and/or CC versions that are
more convoluted than "-g -O2"?
Yes this is tricky.
Ideally we default to -O2 and leave it at that.
Newer hardware for example won't have to worry about the JCC issue,
so that might be left as a distro level / user level setting.
Though we don't want to hide significant perf wins
behind undocumented / esoteric flags.
Maybe we have a ./configure --enable-jcc-perf-mitigation option,
given that it is significant to our perf.
align-loops is fairly generic, but also may increase size.
Ideally we could tag the function or line with an asm directive or something
to keep this as focused as possible. Some interesting notes here:
https://maskray.me/blog/2025-08-24-understanding-alignment-from-source-to-object-file
It would be worth discussing this with compiler folks though,
as they'd be very interested in significant wins like that
if they could isolate the appropriate places.
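
As a rough sketch of the "tag the function" idea: GCC accepts a
per-function optimize attribute, which can request loop alignment for
one hot function instead of the whole build. The gear_hash function,
the table fill and the ALIGN_LOOPS_32 macro below are illustrative
assumptions, not the code from the patch. Clang does not implement this
attribute, so the macro expands to nothing there, and the GCC manual
recommends the attribute for debugging rather than production use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: request 32-byte loop alignment for one function.
   GCC only; Clang does not implement the optimize attribute.  */
#if defined __GNUC__ && !defined __clang__
# define ALIGN_LOOPS_32 __attribute__ ((optimize ("align-loops=32")))
#else
# define ALIGN_LOOPS_32
#endif

static uint64_t gear[256];

/* Fill the table with pseudo-random values (splitmix64-style mixer).
   This is an assumption for the sketch; a real implementation would
   likely use a fixed, precomputed table.  */
static void
gear_init (void)
{
  uint64_t x = 0;
  for (int i = 0; i < 256; i++)
    {
      uint64_t z = (x += 0x9e3779b97f4a7c15ULL);
      z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
      z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
      gear[i] = z ^ (z >> 31);
    }
}

/* Gear rolling hash: each byte shifts the state left by one bit and
   adds a table entry, so only the last 64 bytes influence the result.  */
ALIGN_LOOPS_32 static uint64_t
gear_hash (unsigned char const *buf, size_t len)
{
  uint64_t h = 0;
  for (size_t i = 0; i < len; i++)
    h = (h << 1) + gear[buf[i]];
  return h;
}
```

Whether the attribute actually moves the loop head to a 32-byte
boundary is worth confirming in the disassembly for each compiler
version, since the benchmark results depend on it.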
Was the issue with errnos in getlimits, too noisy logs?
The errnos in getlimits probably came from a missing $(git submodule
update) call; I failed to reproduce the issue.
The warning about scan_inference being "uninitialized" comes from
gcc-11.4. Newer GCC versions (12.x, 13.x, 14.x and 15.x) are all*
happy with the code.
*) God save cfarm maintainers!
It would be good to augment the "invalid rolling hash window"
error with a valid range for the selected hash.
I've gone through error messages and made them less horrible.
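
The suggestion about including the valid range could look roughly like
the sketch below. The helper name, its parameters and the exact wording
are assumptions made for illustration, not the text from the patch;
coreutils itself would report this through error () with a translatable
format string rather than snprintf.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: validate WINDOW against [MIN, MAX] for the hash
   named HASH.  On failure, write a diagnostic that states the valid
   range into MSG (of size MSGSIZE) and return false.  */
static bool
check_window (char const *hash, long window, long min, long max,
              char *msg, size_t msgsize)
{
  if (min <= window && window <= max)
    return true;
  snprintf (msg, msgsize,
            "invalid rolling hash window %ld for %s: valid range is %ld..%ld",
            window, hash, min, max);
  return false;
}
```

Keeping the limits as parameters lets each hash supply its own range,
so the message stays correct when a new hash is added.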
The patches are attached & force-pushed to the same branch at github.
You will also need to assign copyright for a change of this size.
WIP.
Excellent.
thanks,
Padraig