balazs-benics-sonarsource wrote:

Backporting this updated version to clang-19 was not easy but it allowed to 
verify this PR.
Unfortunately, there is a bug in this updated version, that I'll explain.
Even after I fixed this bug, however, on a sample it ran about 19.36% slower 
under analyzing hashing-heavy real-world code.
<details>
<summary>Example hashing-heavy real-world code from 
<i>hyperscan:state_compress.c:storecompressed512_64bit</i></summary>

<pre>
typedef unsigned long long __attribute__((aligned((8)))) u64a;

u64a compress64(u64a x, u64a m) {
  if ((x & m) == 0)
      return 0;
  x &= m;
  u64a mk = ~m << 1;
  for (unsigned i = 0; i < 6; i++) {
    u64a mp = mk ^ (mk << 1);
    mp ^= mp << 2;
    mp ^= mp << 4;
    mp ^= mp << 8;
    mp ^= mp << 16;
    mp ^= mp << 32;
    u64a mv = mp & m;
    m = (m ^ mv) | (mv >> (1 << i));
    u64a t = x & mv;
    x = (x ^ t) | (t >> (1 << i));
    mk = mk & ~mp;
  }
  return x;
}

void storecompressed512_64bit(u64a *m, u64a *x) {
  u64a v[8] = {
    compress64(x[0], m[0]),
    compress64(x[1], m[1]),
    compress64(x[2], m[2]),
    compress64(x[3], m[3]),
    compress64(x[4], m[4]),
    compress64(x[5], m[5]),
    compress64(x[6], m[6]),
    compress64(x[7], m[7]),
  };
  (void)v;
}
</pre>
</details>

The bug was that we inserted the `T` into the expected `InsertPos` 
corresponding to the `Profile` of `T` - so far so good, but instead of 
returning that, we checked if this is "overly complicated", and if so, we 
created the new `SymbolOverlyComplex` symbol and returned that instead. 
However, when the next time we get to call `acquired<T>`, we would simply get a 
successful lookup for the `Profile` of `T` and return the symbol we actually 
wanted to hide! instead of returning the `SymbolOverlyComplex` that we returned 
in the previous `acquire<T>` call. This took a while to debug.
My solution to this is pretty complicated and ugly. In short, check the 
complexity of the symbol before we do insertion/lookup.
If the complexity of this symbol would be below the threshold, then find or 
insert just like we did before.
Otherwise, we need to look up the wrapped symbol first, then check if we have a 
`SymbolOverlyComplex` with that wrapped symbol in the map. If there is, just 
return that one, otherwise we create it.

As you can tell, this is pretty convoluted. And its still slow. On the attached 
test case, I observed 19.36% slowdown (went from 1.503s to 1.794s).
So I have 2 ways of resolving this:
1) Continue the investigation about why this implementation is slow.
2) Revert back to the variant I originally proposed when I opened this PR.


https://github.com/llvm/llvm-project/pull/144327
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to