https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108
Tamar Christina <tnfchris at gcc dot gnu.org> changed:
           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2025-03-05
           Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org
     Ever confirmed|0                             |1
--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Confirmed.
The only early break vectorization happens in the reporting harness, in
benchmark::CSVReporter::ReportRuns(std::vector<benchmark::BenchmarkReporter::Run,
std::allocator<benchmark::BenchmarkReporter::Run> > const&).
However, I can reproduce the slowdown.
Take e.g. BM_UFlat, which is all scalar code.
The hot function is snappy::DecompressBranchless<char*>,
but for some reason memmove is no longer inlined after the PFA patch.
This causes the slowdown, as snappy performs small memmoves often.
Will take a look.
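
For illustration, here is a minimal sketch of the kind of small, fixed-size
memmove involved. This is not snappy's actual code: copy16 and copy_n are
hypothetical helpers. With a constant size the compiler would normally expand
the memmove inline as a few loads and stores, while a runtime size typically
becomes a library call. Compiling with "g++ -O2 -S" and looking for
"call memmove" in the output is one way to see which form was emitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from snappy): copy a fixed 16-byte chunk.
// The size is a compile-time constant, so an optimizing compiler is
// normally expected to expand this inline instead of calling memmove.
inline void copy16(char* dst, const char* src) {
    std::memmove(dst, src, 16);
}

// Hypothetical helper: the size is only known at run time, so this
// usually ends up as an out-of-line call to the library memmove.
inline void copy_n(char* dst, const char* src, std::size_t n) {
    std::memmove(dst, src, n);
}
```

Both helpers keep memmove's overlap-safe semantics. The only difference is
whether each call pays for a function call, which matters when the copies are
small and frequent.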