https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119009
--- Comment #3 from Michal Jireš <mjires at gcc dot gnu.org> ---
Thanks a lot for the script.
I have reproduced it:
# bad3714b - before my patch
BM_UIOVecSink/0 33.8 us 33.8 us 20659 bytes_per_second=2.82508G/s html
# 0895aef0 - my patch
BM_UIOVecSink/0 41.0 us 41.0 us 16890 bytes_per_second=2.32381G/s html
However, current trunk shows the opposite:
# 3605e057 - trunk
BM_UIOVecSink/0 33.7 us 33.7 us 20161 bytes_per_second=2.82955G/s html
# revert patch
BM_UIOVecSink/0 39.9 us 39.9 us 17399 bytes_per_second=2.38832G/s html
Is it still a problem on your machine with current trunk?
perf record/report of:
snappy_benchmark --benchmark_filter=BM_UIOVecSink/0 --benchmark_min_warmup_time=5 --benchmark_time_unit=us
shows the regression in these functions:
61.46% void snappy::SnappyDecompressor::DecompressAllTags<snappy::SnappyIOVecWriter>(snappy::SnappyIOVecWriter*)
25.65% snappy::(anonymous namespace)::IncrementalCopy(char const*, char*, char*, char*)
The relevant symbols:
_ZN6snappy18SnappyDecompressor17DecompressAllTagsINS_17SnappyIOVecWriterEEEvPT_
_ZN6snappy12_GLOBAL__N_1L15IncrementalCopyEPKcPcS3_S3_
are identical apart from address changes.
Changing the alignment of DecompressAllTags with asm("nop; nop") or
__attribute__((aligned(128))) removes the regression.
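For anyone who wants to try the alignment experiment on another function, here is a minimal sketch of the attribute variant. It assumes GCC or Clang; hot_loop, entry_is_aligned and check_hot_loop_alignment are hypothetical names used only for illustration and are not part of snappy.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a hot function such as DecompressAllTags.
// The aligned attribute requests a minimum alignment of 128 bytes for the
// function's first instruction; noinline keeps it as a separate symbol.
__attribute__((aligned(128), noinline))
int hot_loop(const char* data, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += data[i];
    return sum;
}

// Returns true when the function's entry address is a multiple of `alignment`.
bool entry_is_aligned(int (*fn)(const char*, int), std::uintptr_t alignment) {
    return reinterpret_cast<std::uintptr_t>(fn) % alignment == 0;
}

// Sanity check that the attribute took effect in this build.
void check_hot_loop_alignment() {
    assert(entry_is_aligned(&hot_loop, 128));
}
```

This only pins the function's entry point. Whether that shifts the branches inside the function into a layout that avoids the extra misses still has to be measured.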
19,023,629 branch-misses:u # bad3714b
53,781,446 branch-misses:u # 0895aef0
The underlying problem seems to be branch misses caused by different alignment,
but I cannot pinpoint any specific instruction(s) as the source.
I am not sure we can reliably prevent this. In any case, a reliable solution
would be unrelated to my patch.