https://bugs.kde.org/show_bug.cgi?id=432801

--- Comment #17 from Julian Seward <jsew...@acm.org> ---
Interesting analysis, and a plausible patch; thank you for that.  This seems
like a new trick from LLVM.

I'm still struggling to understand what's going on, though.  I can see that

  for (size_t i = 0; i < plen; ++i)
    hp += pattern[i];

could be vectorised as you say, so that it loads 4 bytes at a time, and uses
punpcklbw twice to interleave them as described in comment 12.  But:

* where's the addition instruction that merges the lanes together?  I don't
  see that.

* what is the purpose of the pcmpgtd instruction?  The original sources
  contain a scalar comparison against zero:

  if (hp==j) {
    j++;
  }

  Is that related?  If so, how does a scalar 32-bit equality test against zero
  get translated into a vector 32x4 signed-greater-than operation?

---

In the patch, there's mention of biasing:

+      // From here on out, we're dealing with biased integers instead of 2's
+      // complement.

What does that mean, in this context?

Regarding the test:

* you put it in memcheck/tests/x86; "x86" here means 32-bit only.  Is that
  what you intended?  I would have expected it to go in the "amd64" directory.

* because the test is written in C, whether or not it tests what you expect it
  to test depends entirely on the compiler used to compile it.  And most
  likely, it won't be vectorised, or won't be vectorised in the same way.
  This kind of test really needs to be written in assembly (inline assembly)
  so we know what we're testing.
