https://bugs.kde.org/show_bug.cgi?id=432801

--- Comment #12 from Eyal <eyals...@gmail.com> ---
Okay, now I'm back to thinking that it's a valgrind issue.  But it's nothing
that valgrind can fix.  Here's the problematic asm code:

<main+229>     movd   %edx,%xmm2              (1)
<main+233>     punpcklbw %xmm2,%xmm2          (2)
<main+237>     punpcklwd %xmm2,%xmm3          (3)
<main+241>     movzwl 0xa(%rsp,%rsi,1),%edx
<main+246>     movd   %edx,%xmm2              (4)
<main+250>     punpcklbw %xmm2,%xmm2
<main+254>     punpcklwd %xmm2,%xmm2
<main+258>     pxor   %xmm4,%xmm4             (5)
<main+262>     pcmpgtd %xmm3,%xmm4            (6)
<main+266>     psrad  $0x18,%xmm3

This code is some SIMD math that the compiler generates for summing the
characters in a string, as in the original code.  Before this code runs, the
calls to sigaction have incidentally left garbage in the xmm registers.  That's
okay: sigaction is allowed to do that because the xmm registers are
caller-saved.  If the caller wanted them to keep valid contents across the
call, it was up to the caller to save and restore them.  No problem.

For the below explanation, we'll use letters (ABCD) for known bytes, X for
unknown bytes, and 0 for 0 bytes.

(1) is putting a known value into xmm2.  So xmm2 is now well-defined as
ABCD000000000000.

(2) is bytewise interleaving the value in xmm2 with itself.  So xmm2 is now
AABBCCDD00000000.

(3) is wordwise (16b) interleaving xmm2 with xmm3, and xmm3 still holds
whatever sigaction left in it.  xmm3 is now AAXXBBXXCCXXDDXX.

(4) notice that xmm2 has been clobbered with a new value.

(5) xmm4 is set to all 0: 0000000000000000

(6) is doing a signed double-word (32-bit) SIMD comparison of xmm3 and xmm4 and
putting the result as a 0 or -1 into xmm4.  If the xmm4 value is bigger than
the xmm3 value, the xmm4 double-word will be filled with ones.  Otherwise,
zero.  The comparison looks like this (MSB-first):

0000 > AAXX ? -1 : 0
0000 > BBXX ? -1 : 0
0000 > CCXX ? -1 : 0
0000 > DDXX ? -1 : 0
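
To make the byte shuffling concrete, here is a rough portable C model of steps
(1), (2), (3), (5) and (6).  It is only a sketch under some assumptions: the
type xmm and all the function names are made up for illustration, a register is
modelled as 16 bytes with byte 0 as the lowest, and A is taken to be the lowest
byte of %edx.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit register: byte 0 is the lowest byte. */
typedef struct { uint8_t b[16]; } xmm;

/* movd %r32,%xmm: the low 4 bytes come from the register, the rest are 0. */
xmm movd(uint32_t r) {
    xmm x;
    memset(&x, 0, sizeof x);
    for (int i = 0; i < 4; i++) x.b[i] = (uint8_t)(r >> (8 * i));
    return x;
}

/* punpcklbw src,dst: interleave the low 8 bytes, dst byte first. */
xmm punpcklbw(xmm src, xmm dst) {
    xmm out;
    for (int i = 0; i < 8; i++) {
        out.b[2 * i]     = dst.b[i];
        out.b[2 * i + 1] = src.b[i];
    }
    return out;
}

/* punpcklwd src,dst: interleave the low four 16-bit words, dst word first. */
xmm punpcklwd(xmm src, xmm dst) {
    xmm out;
    for (int i = 0; i < 4; i++) {
        out.b[4 * i]     = dst.b[2 * i];
        out.b[4 * i + 1] = dst.b[2 * i + 1];
        out.b[4 * i + 2] = src.b[2 * i];
        out.b[4 * i + 3] = src.b[2 * i + 1];
    }
    return out;
}

/* Read 32-bit lane i as an unsigned bit pattern. */
uint32_t lane(xmm x, int i) {
    return (uint32_t)x.b[4 * i] | (uint32_t)x.b[4 * i + 1] << 8 |
           (uint32_t)x.b[4 * i + 2] << 16 | (uint32_t)x.b[4 * i + 3] << 24;
}

/* pcmpgtd src,dst: a lane of dst becomes all ones if dst > src (signed). */
xmm pcmpgtd(xmm src, xmm dst) {
    xmm out;
    for (int i = 0; i < 4; i++) {
        /* Flipping the sign bit maps signed order onto unsigned order. */
        uint32_t d = lane(dst, i) ^ 0x80000000u;
        uint32_t s = lane(src, i) ^ 0x80000000u;
        uint8_t fill = d > s ? 0xFF : 0x00;
        for (int j = 0; j < 4; j++) out.b[4 * i + j] = fill;
    }
    return out;
}

/* Steps (1)-(3), (5), (6) for a given %edx and a given stale xmm3. */
xmm sign_mask(uint32_t edx, xmm stale_xmm3) {
    xmm xmm2 = movd(edx);                    /* (1) */
    xmm2 = punpcklbw(xmm2, xmm2);            /* (2) */
    xmm xmm3 = punpcklwd(xmm2, stale_xmm3);  /* (3) */
    xmm xmm4;
    memset(&xmm4, 0, sizeof xmm4);           /* (5) pxor %xmm4,%xmm4 */
    return pcmpgtd(xmm3, xmm4);              /* (6) */
}
```

Calling sign_mask with the same edx but different stale_xmm3 contents is a
quick way to see whether the stale bytes can leak into the mask.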

Considering the ABCDs:

  * If they are negative then the MSB is a 1 and zero is greater than all
negative numbers so we don't need to look any further.
  * If they are positive then the whole number is positive and we don't need to
look any further.
  * If they are zero then the number is either 0 or positive.  Either way, 0 is
not greater than a non-negative number so we don't need to look any further.

So as far as we're concerned, the output here is completely defined!
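
The three cases above can also be checked by brute force.  The following is a
small sketch, not anything taken from the original program: the function names
are hypothetical, and one lane is modelled as the 32-bit pattern AAXX with A
repeated in the top two bytes and XX as the unknown low 16 bits.

```c
#include <stdint.h>

/* One lane of (6): returns 1 if 0 > AAXX as signed 32-bit values, else 0. */
int zero_greater_than_lane(uint8_t a, uint16_t x) {
    uint32_t bits = (uint32_t)a << 24 | (uint32_t)a << 16 | x;
    /* Interpret the bit pattern as two's complement without relying on
       implementation-defined conversions. */
    int64_t v = bits >= 0x80000000u ? (int64_t)bits - 4294967296LL
                                    : (int64_t)bits;
    return 0 > v;
}

/* Returns 1 if, for every A, the result is the same for every possible XX. */
int result_ignores_x(void) {
    for (unsigned a = 0; a < 256; a++) {
        int expected = zero_greater_than_lane((uint8_t)a, 0);
        for (uint32_t x = 1; x <= 0xFFFFu; x++) {
            if (zero_greater_than_lane((uint8_t)a, (uint16_t)x) != expected)
                return 0;
        }
    }
    return 1;
}
```

result_ignores_x() walks all 256 values of A against all 65536 values of XX,
which is small enough to run in well under a second.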

valgrind memcheck doesn't track values, though, only whether or not a value was
explicitly defined.  So valgrind is seeing this:

QRST > AAXX ? -1 : 0

All the letters (other than X) are defined but valgrind's mechanism doesn't
keep track of whether or not they are zero.  Because it doesn't know that Q is
a 0, it might be that QR match AA!  In which case, the only way to know the
result is to know how ST compares to XX.  And XX is unknown so the result is
unknown.  The "undefinedness" propagates all the way to a branch statement
later in the code which valgrind detects and reports.
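
The difference between the two views can be sketched in C.  This is a
simplified model written for this comment, not memcheck's actual code: the
shadow32 type and both rules are assumptions made for illustration.  The
value-blind rule looks only at which bits are defined; the value-aware rule
also looks at the defined bits' values and asks whether any filling of the
unknown bits could change the outcome.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shadow value: undef has a 1 for every bit that is unknown. */
typedef struct { uint32_t value; uint32_t undef; } shadow32;

/* Map a two's-complement pattern to a key whose unsigned order matches
   signed order. */
uint32_t signed_key(uint32_t bits) { return bits ^ 0x80000000u; }

/* Value-blind rule: the result is defined only if every input bit is. */
bool gt_defined_value_blind(shadow32 a, shadow32 b) {
    return (a.undef | b.undef) == 0;
}

/* Smallest signed value possible: unknown sign bit = 1, other unknowns = 0. */
uint32_t min_bits(shadow32 s) {
    return (s.value & ~s.undef) | (s.undef & 0x80000000u);
}

/* Largest signed value possible: unknown sign bit = 0, other unknowns = 1. */
uint32_t max_bits(shadow32 s) {
    return (s.value | s.undef) & ~(s.undef & 0x80000000u);
}

/* Value-aware rule: a > b is defined if it holds for every way of filling
   in the unknown bits, or for none of them. */
bool gt_defined_value_aware(shadow32 a, shadow32 b) {
    bool always = signed_key(min_bits(a)) > signed_key(max_bits(b));
    bool never  = signed_key(max_bits(a)) <= signed_key(min_bits(b));
    return always || never;
}
```

With a = {0, 0} and b = {0x41410000, 0x0000FFFF}, the 0000 > AAXX case, the
value-blind rule reports the result as undefined while the value-aware rule
reports it as defined.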

---

In (4) above, we see that xmm2 is clobbered quickly after its use in (1,2,3).
So why use it at all?  Also, lots of the other code unpacks a register with
itself instead of going through a second register:

<main+246>     movd   %edx,%xmm2
<main+250>     punpcklbw %xmm2,%xmm2
<main+254>     punpcklwd %xmm2,%xmm2

<main+305>     movd   %edx,%xmm0
<main+309>     punpcklbw %xmm0,%xmm0
<main+313>     punpcklwd %xmm0,%xmm0

<main+322>     movd   %edx,%xmm1
<main+326>     punpcklbw %xmm1,%xmm1
<main+330>     punpcklwd %xmm1,%xmm1

So why not do the same here?
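
Here is a quick sketch of why the in-place form should be interchangeable.  It
assumes the only consumers of xmm3 are the pcmpgtd at main+262 and the psrad at
main+266 from the listing above, and the function names are hypothetical.
Per lane, the compiler's sequence yields AAXX and the in-place one yields AAAA;
after psrad $0x18 only the top byte survives in both.

```c
#include <stdint.h>

/* psrad $0x18 on one 32-bit lane: arithmetic shift right by 24, written
   without relying on implementation-defined signed shifts. */
int32_t psrad24(uint32_t lane) {
    int32_t top = (int32_t)(lane >> 24);   /* top byte, 0..255 */
    return top >= 128 ? top - 256 : top;   /* sign-extend it */
}

/* Lane from punpcklwd %xmm2,%xmm3: AAXX, with XX stale. */
uint32_t lane_via_scratch(uint8_t a, uint16_t stale) {
    return (uint32_t)a << 24 | (uint32_t)a << 16 | stale;
}

/* Lane from punpcklwd %xmm3,%xmm3 after unpacking in place: AAAA. */
uint32_t lane_in_place(uint8_t a) {
    return (uint32_t)a * 0x01010101u;
}

/* Returns 1 if both forms give the same shifted lane for every A and XX. */
int sequences_agree(void) {
    for (unsigned a = 0; a < 256; a++) {
        int32_t want = psrad24(lane_in_place((uint8_t)a));
        for (uint32_t x = 0; x <= 0xFFFFu; x++) {
            if (psrad24(lane_via_scratch((uint8_t)a, (uint16_t)x)) != want)
                return 0;
        }
    }
    return 1;
}
```

The sign bit is also the same in both forms, so the 0 > lane comparison gives
the same mask either way.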

If I write this code and compile it:

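void asm_test() {
  /* Sequence the compiler emitted: xmm2 is used as a scratch register. */
  __asm__ ("movd %edx, %xmm2");
  __asm__ ("punpcklbw %xmm2, %xmm2");
  __asm__ ("punpcklwd %xmm2, %xmm3");

  /* Candidate replacement: do everything in xmm3. */
  __asm__ ("movd %edx, %xmm3");
  __asm__ ("punpcklbw %xmm3, %xmm3");
  __asm__ ("punpcklwd %xmm3, %xmm3");
}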

And then run objdump -D on it:

  401697:       66 0f 6e d2             movd   %edx,%xmm2
  40169b:       66 0f 60 d2             punpcklbw %xmm2,%xmm2
  40169f:       66 0f 61 da             punpcklwd %xmm2,%xmm3
  4016a3:       66 0f 6e da             movd   %edx,%xmm3
  4016a7:       66 0f 60 db             punpcklbw %xmm3,%xmm3
  4016ab:       66 0f 61 db             punpcklwd %xmm3,%xmm3

I can see that the instruction sizes are the same.  So I can use emacs
hexl-mode or xxd (with xxd -r to write the bytes back) to modify the binary
and try it.  It only took a moment and it solved the problem.  Valgrind no
longer reports any errors.
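
For anyone who would rather script the edit than use a hex editor, the same
12-byte swap can be sketched in C.  The byte values are the ones from the
objdump listing above; the helper name patch_once and the choice to refuse
unless there is exactly one match are my own assumptions, and the sketch works
on a buffer already read into memory rather than on a file.

```c
#include <stddef.h>
#include <string.h>

/* Bytes of the sequence to replace, from the objdump listing. */
static const unsigned char OLD_SEQ[12] = {
    0x66, 0x0f, 0x6e, 0xd2,   /* movd      %edx,%xmm2  */
    0x66, 0x0f, 0x60, 0xd2,   /* punpcklbw %xmm2,%xmm2 */
    0x66, 0x0f, 0x61, 0xda    /* punpcklwd %xmm2,%xmm3 */
};

/* Same-length replacement that works entirely in xmm3. */
static const unsigned char NEW_SEQ[12] = {
    0x66, 0x0f, 0x6e, 0xda,   /* movd      %edx,%xmm3  */
    0x66, 0x0f, 0x60, 0xdb,   /* punpcklbw %xmm3,%xmm3 */
    0x66, 0x0f, 0x61, 0xdb    /* punpcklwd %xmm3,%xmm3 */
};

/* Hypothetical helper: overwrite OLD_SEQ with NEW_SEQ only if OLD_SEQ occurs
   exactly once in buf.  Returns the patched offset, or -1 if there were zero
   or several matches (in which case buf is left untouched). */
long patch_once(unsigned char *buf, size_t len) {
    size_t hits = 0, where = 0;
    for (size_t i = 0; i + sizeof OLD_SEQ <= len; i++) {
        if (memcmp(buf + i, OLD_SEQ, sizeof OLD_SEQ) == 0) {
            hits++;
            where = i;
        }
    }
    if (hits != 1) return -1;
    memcpy(buf + where, NEW_SEQ, sizeof NEW_SEQ);
    return (long)where;
}
```

Refusing on multiple matches is deliberate: the same three instructions could
appear elsewhere in a real binary, and patching the wrong copy would be hard to
notice.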
