On Tue, 10 Feb 2015, Leon Alrae wrote:

> >  These cases could be addressed by either replacing subtraction from 0.0 
> > with multiplication by -1.0, or by tweaking the rounding mode as needed 
> > temporarily.  Given that the computational cost of multiplication is 
> > uncertain and likely higher or at best the same as the cost of addition or 
> > subtraction, I'd be leaning towards the latter solution.
> 
> My first thought was to treat zero in NEG.fmt as a special case and use
> float32_chs() for it. But tweaking the rounding mode temporarily
> probably is better as we will get consistent behaviour for zero as well
> as input denormals which are squashed in float32_sub() when
> flush_inputs_to_zero flag is set (actually I'm not sure if legacy fp
> instructions should flush input denormals, but according to the spec
> this is implementation dependent so I won't worry about this).

 As expected, setting CP1.FCSR.FS on a randomly picked R4400 processor:

CPU0 revision is: 00000440 (R4400SC)
FPU revision is: 00000500

does flush a NEG.fmt's input denormal to 0.  Given this program:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        union {
                double d;
                uint64_t i;
        } x = { .i = 0x000123456789abcdULL }, y, z;  /* x is a denormal */
        unsigned long tmp, fcsr;

        printf("x: %016" PRIx64 "\n", x.i);
        /* Negate with FCSR.FS (bit 24) set, then restore the old FCSR.  */
        asm volatile(
                "       cfc1    %1, $31\n"
                "       or      %2, %1, %4\n"
                "       ctc1    %2, $31\n"
                "       neg.d   %0, %3\n"
                "       ctc1    %1, $31"
                : "=f" (y.d), "=&r" (fcsr), "=&r" (tmp)
                : "f" (x.d), "r" (1 << 24));
        printf("y: %016" PRIx64 "\n", y.i);
        /* Negate the same denormal with FCSR left untouched.  */
        asm volatile(
                "       neg.d   %0, %1"
                : "=f" (z.d) : "f" (x.d));
        printf("z: %016" PRIx64 "\n", z.i);
        /* Negate +0 for completeness.  */
        x.i = 0;
        printf("+: %016" PRIx64 "\n", x.i);
        asm volatile(
                "       neg.d   %0, %1"
                : "=f" (y.d) : "f" (x.d));
        printf("-: %016" PRIx64 "\n", y.i);
        return 0;
}

I get this output:

x: 000123456789abcd
y: 8000000000000000
z: 800123456789abcd
+: 0000000000000000
-: 8000000000000000

under Linux.  According to the R4400 documentation, the value of `z' must 
have been calculated by the in-kernel emulator in the Unimplemented 
Operation handler, since on this processor implementation any denormalised 
operand causes this exception, except for compare instructions.  In any 
case all the results are consistent, so we don't actually have to do 
anything for the flush-to-zero mode; our calculation should work out as 
expected (as long as the `float_round_down' rounding mode is respected, 
that is).

 While at it, I included the result of the negation of 0 for completeness.

  Maciej
