https://bugs.kde.org/show_bug.cgi?id=432801

--- Comment #18 from Eyal <eyals...@gmail.com> ---
>Interesting analysis, and a plausible patch; thank you for that.  This seems
>like a new trick from LLVM.

Thanks.  Yes

>* where's the addition instruction that merges the lanes together?  I don't
>  see that.

It's just beyond the part of the asm that I put in this post.  I didn't include
it because it wasn't relevant to the bug.  Here you go:

   0x0000000000401225 <+213>:   movd   %edx,%xmm2
   0x0000000000401229 <+217>:   punpcklbw %xmm2,%xmm2
   0x000000000040122d <+221>:   punpcklwd %xmm2,%xmm3
   0x0000000000401231 <+225>:   movzwl 0xa(%rsp,%rsi,1),%edx
   0x0000000000401236 <+230>:   movd   %edx,%xmm2
   0x000000000040123a <+234>:   punpcklbw %xmm2,%xmm2
   0x000000000040123e <+238>:   punpcklwd %xmm2,%xmm2
   0x0000000000401242 <+242>:   pxor   %xmm4,%xmm4
   0x0000000000401246 <+246>:   pcmpgtd %xmm3,%xmm4
   0x000000000040124a <+250>:   psrad  $0x18,%xmm3
   0x000000000040124f <+255>:   punpckldq %xmm4,%xmm3
   0x0000000000401253 <+259>:   paddq  %xmm0,%xmm3

That last line is the add.

>* what is the purpose of the pcmpgtd instruction?  The original sources
>  contain a scalar comparison against zero

For sign-extension.  Because the numbers are 8-bit characters (char) but the
output is 32-bit signed (int), we need to sign extend the char.  The comparison
of 0>$xmm3 is filling the register with 1s if the number is negative.  The
following psrad does an arithmetic shift, sign extending the negative to fill
the 24-bit difference.  You can see the paddq below it.  I didn't analyze it
too much; I trust the LLVM people know what they're doing.

  if (hp==j) {
    j++;
  }

>  Is that related?  If so, how does a scalar 32-bit equality test against zero
>  get translated into a vector 32x4 signed-greater-than operation?

Later in the code the vector eventually gets collapsed into a single value,
like we need.  This comparison is what causes the error because memcheck only
reports the error when the value gets used in a branch, not when it's created.

I could have also returned the sum; that would have worked as well.

---

>In the patch, there's mention of biasing:
>
>+      // From here on out, we're dealing with biased integers instead of 2's
>+      // complement.
>
>What does that mean, in this context?

Probably just bad wording on my part.  What I should have just said is that the
MSB of a signed number is 0 for positive and 1 for negative so to get the max
or min value, we need to invert the MSB with xor.  Or another way to think
about it is that we convert the number space from 2's complement, where the
0x00 is 0 and 0xff is -1, to a "biased" space where 0x00 is -128, 0x7f is -1,
0x80 is 0, 0x81 is 1, all the way up to 0xff which is 127.  Each number is
stored in bits as it's value +128. Writing numbers that way fixes the problem
of 2's complement were a 1 in the MSB is *less* than a 0 in the MSB.

I will remove the confusing explanation and instead just mention the MSB thing.

>* you put it in memcheck/tests/x86; "x86" here means 32-bit only.  Is that
>  what you intended?  I would have expected it to go in the "amd64" directory.

I'll fix that.

>* because the test is written in C, whether or not it tests what you expect it
>  to test depends entirely on the compiler used to compile it.  And most
>  likely, it won't be vectorised, or won't be vectorised in the same way.
>  This kind of test really needs to be written in assembly (inline assembly)
>  so we know what we're testing.

Yes.  For that reason I put the binary in the patch!  That was bad.  I'll use
asm instructions.  I had a bad time with them when I was debugging but I'll
figure it out.  Also, I'll be able to test more of the edge cases.

There were bugs in the original patch anyway.  It's a delicate bit of code.

Because it's only for Intel SSE2 processors, if I put the test in the amd64
directory, is that enough to ensure that only the SSE2-enabled chips will run
the test?

-- 
You are receiving this mail because:
You are watching all bug changes.

Reply via email to