https://bugs.kde.org/show_bug.cgi?id=432801
--- Comment #18 from Eyal <eyals...@gmail.com> --- >Interesting analysis, and a plausible patch; thank you for that. This seems >like a new trick from LLVM. Thanks. Yes >* where's the addition instruction that merges the lanes together? I don't > see that. It's just beyond the part of the asm that I put in this post. I didn't include it because it wasn't relevant to the bug. Here you go: 0x0000000000401225 <+213>: movd %edx,%xmm2 0x0000000000401229 <+217>: punpcklbw %xmm2,%xmm2 0x000000000040122d <+221>: punpcklwd %xmm2,%xmm3 0x0000000000401231 <+225>: movzwl 0xa(%rsp,%rsi,1),%edx 0x0000000000401236 <+230>: movd %edx,%xmm2 0x000000000040123a <+234>: punpcklbw %xmm2,%xmm2 0x000000000040123e <+238>: punpcklwd %xmm2,%xmm2 0x0000000000401242 <+242>: pxor %xmm4,%xmm4 0x0000000000401246 <+246>: pcmpgtd %xmm3,%xmm4 0x000000000040124a <+250>: psrad $0x18,%xmm3 0x000000000040124f <+255>: punpckldq %xmm4,%xmm3 0x0000000000401253 <+259>: paddq %xmm0,%xmm3 That last line is the add. >* what is the purpose of the pcmpgtd instruction? The original sources > contain a scalar comparison against zero For sign-extension. Because the numbers are 8-bit characters (char) but the output is 32-bit signed (int), we need to sign extend the char. The comparison of 0>$xmm3 is filling the register with 1s if the number is negative. The following psrad does an arithmetic shift, sign extending the negative to fill the 24-bit difference. You can see the paddq below it. I didn't analyze it too much; I trust the LLVM people know what they're doing. if (hp==j) { j++; } > Is that related? If so, how does a scalar 32-bit equality test against zero > get translated into a vector 32x4 signed-greater-than operation? Later in the code the vector eventually gets collapsed into a single value, like we need. This comparison is what causes the error because memcheck only reports the error when the value gets used in a branch, not when it's created. I could have also returned the sum; that would have worked as well. --- >In the patch, there's mention of biasing: > >+ // From here on out, we're dealing with biased integers instead of 2's >+ // complement. > >What does that mean, in this context? Probably just bad wording on my part. What I should have just said is that the MSB of a signed number is 0 for positive and 1 for negative so to get the max or min value, we need to invert the MSB with xor. Or another way to think about it is that we convert the number space from 2's complement, where the 0x00 is 0 and 0xff is -1, to a "biased" space where 0x00 is -128, 0x7f is -1, 0x80 is 0, 0x81 is 1, all the way up to 0xff which is 127. Each number is stored in bits as it's value +128. Writing numbers that way fixes the problem of 2's complement were a 1 in the MSB is *less* than a 0 in the MSB. I will remove the confusing explanation and instead just mention the MSB thing. >* you put it in memcheck/tests/x86; "x86" here means 32-bit only. Is that > what you intended? I would have expected it to go in the "amd64" directory. I'll fix that. >* because the test is written in C, whether or not it tests what you expect it > to test depends entirely on the compiler used to compile it. And most > likely, it won't be vectorised, or won't be vectorised in the same way. > This kind of test really needs to be written in assembly (inline assembly) > so we know what we're testing. Yes. For that reason I put the binary in the patch! That was bad. I'll use asm instructions. I had a bad time with them when I was debugging but I'll figure it out. Also, I'll be able to test more of the edge cases. There were bugs in the original patch anyway. It's a delicate bit of code. Because it's only for Intel SSE2 processors, if I put the test in the amd64 directory, is that enough to ensure that only the SSE2-enabled chips will run the test? -- You are receiving this mail because: You are watching all bug changes.