mtrofin wrote:

Ah, ok. About that one: `perf stat -e instructions -r 10` actually shows a ~6% _improvement_ with the new change; when I look at `cycles` I see a 1% improvement. The wallclock average showed the same. But the wallclock measurements are actually pretty noisy, as much as 1.31% over the 10 reps.
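To make the "is a 1% shift inside the noise?" question concrete, here is a minimal sketch. It assumes "noise" means the largest deviation of any single run from the mean, which may not be the statistic behind the 1.31% figure above. The function names and the timing values are hypothetical, not measurements from this PR.

```python
from statistics import mean


def relative_spread(samples):
    """Largest deviation of any sample from the mean, as a percentage of the mean."""
    avg = mean(samples)
    return max(abs(s - avg) for s in samples) / avg * 100.0


def is_within_noise(before, after):
    """True when the shift in means is no larger than the bigger per-run spread."""
    delta = abs(mean(after) - mean(before)) / mean(before) * 100.0
    noise = max(relative_spread(before), relative_spread(after))
    return delta <= noise


# Hypothetical wallclock timings in seconds (illustration only).
before = [10.00, 10.10, 9.90, 10.05, 9.95]
after = [9.90, 10.00, 9.80, 9.95, 9.85]

if __name__ == "__main__":
    print(f"spread before: {relative_spread(before):.2f}%")
    print(f"spread after:  {relative_spread(after):.2f}%")
    print(f"shift within noise: {is_within_noise(before, after)}")
```

With a spread of roughly 1% on each side, a 1% shift in the mean cannot be told apart from run-to-run variation by this check, which is why the instruction counts are the more trustworthy signal here.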
Looking at the profile itself (`perf record -e instructions`):

Before:

```
+   13.23%    13.23%  ld.lld  lld        [.] llvm::MD5::body(llvm::ArrayRef<unsigne
+   12.83%    12.82%  ld.lld  lld        [.] llvm::SimpleBitstreamCursor::Read(unsi
+   10.19%    10.19%  ld.lld  lld        [.] llvm::SimpleBitstreamCursor::ReadVBR64
+    6.54%     0.00%  ld.lld  [unknown]  [.] 0000000000000000
+    4.64%     4.64%  ld.lld  lld        [.] llvm::BitstreamCursor::readRecord(unsi
+    2.11%     2.11%  ld.lld  lld        [.] llvm::xxh3_64bits(unsigned char const*
```

After:

```
+   15.02%    15.02%  ld.lld  lld        [.] llvm::SimpleBitstreamCursor::Read(unsi
+   10.64%    10.64%  ld.lld  lld        [.] llvm::SimpleBitstreamCursor::ReadVBR64
+    8.56%     0.00%  ld.lld  [unknown]  [.] 0000000000000000
+    6.48%     6.48%  ld.lld  lld        [.] llvm::BitstreamCursor::readRecord(unsi
+    3.19%     3.19%  ld.lld  lld        [.] llvm::MD5::body(llvm::ArrayRef<unsigne
+    2.96%     2.96%  ld.lld  lld        [.] llvm::xxh3_64bits(unsigned char const*
```

This lines up with the "we serialize more" (and so we deserialize during linking).

Not sure what to make of the `perf stat` results difference. Perhaps cold runs -> cold caches -> patch is deserialization heavy -> deserialization (of the GUIDs) costs more than MD5 hashing, but then (hot caches) "the tables flip".

https://github.com/llvm/llvm-project/pull/201849
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
