[Bug target/113034] New: Miscompilation of __m128 ne comparison on LoongArch

c at jia dot je via Gcc-bugs Fri, 15 Dec 2023 09:56:14 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113034


            Bug ID: 113034
           Summary: Miscompilation of __m128 ne comparison on LoongArch
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: c at jia dot je
  Target Milestone: ---

Compile and run the following code:

```
#include <lsxintrin.h>
#include <stdio.h>
__m128i unord_vec(__m128 a, __m128 b) { return (a != a) | (b != b); }

int unord_float(float a, float b) { return (a != a) | (b != b); }

int main() {
  float nan = 0.0 / 0.0; // nan
  __m128 nan_vec = {nan, nan};
  int res_float = unord_float(nan, nan);
  __m128i res_vec = unord_vec(nan_vec, nan_vec);
  printf("%d %ld %ld\n", res_float, res_vec[0], res_vec[1]);
  return 0;
}
```

Compile command: `gcc-14 -mlsx test.c -O -o test`. The GCC version is the 14.0.0
20231203 snapshot.

Both functions perform an `unordered` comparison: `x != x` is true only when `x`
is NaN, so each reports whether either operand is NaN. The expected output:

```
1 1 1
```

Actual output:

```
1 0 0
```

Reading the assembly, `unord_vec` is wrongly compiled to `vfcmp.cne.s`, an
ordered not-equal comparison that is false for NaN operands:

```
unord_vec:
.LFB538 = .
        .cfi_startproc
        vinsgr2vr.d     $vr0,$r4,0
        vinsgr2vr.d     $vr0,$r5,1
        vinsgr2vr.d     $vr1,$r6,0
        vinsgr2vr.d     $vr1,$r7,1
        vfcmp.cne.s     $vr0,$vr0,$vr0
        vfcmp.cne.s     $vr1,$vr1,$vr1
        vor.v   $vr0,$vr0,$vr1
        vpickve2gr.du   $r4,$vr0,0
        vpickve2gr.du   $r5,$vr0,1
        jr      $r1
        .cfi_endproc
```

In contrast, `unord_float` is correctly compiled to `fcmp.cune.s` (unordered or
not equal):

```
unord_float:
.LFB539 = .
        .cfi_startproc
        addi.w  $r4,$r0,1                       # 0x1
        fcmp.cune.s     $fcc0,$f0,$f0
        bcnez   $fcc0,.L3
        or      $r4,$r0,$r0
.L3:
        addi.w  $r12,$r0,1                      # 0x1
        fcmp.cune.s     $fcc1,$f1,$f1
        bcnez   $fcc1,.L4
        or      $r12,$r0,$r0
.L4:
        or      $r4,$r4,$r12
        andi    $r4,$r4,1
        jr      $r1
        .cfi_endproc
```

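So the vector and scalar code paths disagree in the `unordered` case: the vector
path uses the ordered `cne` condition where the unordered `cune` condition is
required. Separately, both functions could be optimized to use `vfcmp.cun.s` and
`fcmp.cun.s` directly.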
