[PATCHv2 1/6] powerpc: fix exception clearing in e500 SPE float emulation

2013-12-10 Thread Joseph S. Myers
 		if (cpu_has_feature(CPU_FTR_SPE)) {
+			/*
+			 * When the sticky exception bits are set
+			 * directly by userspace, it must call prctl
+			 * with PR_GET_FPEXC (with PR_FP_EXC_SW_ENABLE
+			 * in the existing prctl settings) or
+			 * PR_SET_FPEXC (with PR_FP_EXC_SW_ENABLE in
+			 * the bits being set).  <fenv.h> functions
+			 * saving and restoring the whole
+			 * floating-point environment need to do so
+			 * anyway to restore the prctl settings from
+			 * the saved environment.
+			 */
+			tsk->thread.spefscr_last = mfspr(SPRN_SPEFSCR);
 			tsk->thread.fpexc_mode = val &
 				(PR_FP_EXC_SW_ENABLE | PR_FP_ALL_EXCEPT);
 			return 0;
@@ -1206,9 +1219,22 @@ int get_fpexc_mode(struct task_struct *tsk, unsigned long adr)
 
 	if (tsk->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE)
 #ifdef CONFIG_SPE
-		if (cpu_has_feature(CPU_FTR_SPE))
+		if (cpu_has_feature(CPU_FTR_SPE)) {
+			/*
+			 * When the sticky exception bits are set
+			 * directly by userspace, it must call prctl
+			 * with PR_GET_FPEXC (with PR_FP_EXC_SW_ENABLE
+			 * in the existing prctl settings) or
+			 * PR_SET_FPEXC (with PR_FP_EXC_SW_ENABLE in
+			 * the bits being set).  <fenv.h> functions
+			 * saving and restoring the whole
+			 * floating-point environment need to do so
+			 * anyway to restore the prctl settings from
+			 * the saved environment.
+			 */
+			tsk->thread.spefscr_last = mfspr(SPRN_SPEFSCR);
 			val = tsk->thread.fpexc_mode;
-		else
+		} else
 			return -EINVAL;
 #else
 		return -EINVAL;
diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c
index a73f088..59835c6 100644
--- a/arch/powerpc/math-emu/math_efp.c
+++ b/arch/powerpc/math-emu/math_efp.c
@@ -630,9 +630,27 @@ update_ccr:
 	regs->ccr |= (IR << ((7 - ((speinsn >> 23) & 0x7)) << 2));
 
 update_regs:
-	__FPU_FPSCR &= ~FP_EX_MASK;
+	/*
+	 * If the invalid exception sticky bit was set by the
+	 * processor for non-finite input, but was not set before the
+	 * instruction being emulated, clear it.  Likewise for the
+	 * underflow bit, which may have been set by the processor
+	 * for exact underflow, not just inexact underflow when the
+	 * flag should be set for IEEE 754 semantics.  Other sticky
+	 * exceptions will only be set by the processor when they are
+	 * correct according to IEEE 754 semantics, and we must not
+	 * clear sticky bits that were already set before the emulated
+	 * instruction as they represent the user-visible sticky
+	 * exception status.  inexact traps to kernel are not
+	 * required for IEEE semantics and are not enabled by default,
+	 * so the inexact sticky bit may have been set by a previous
+	 * instruction without the kernel being aware of it.
+	 */
+	__FPU_FPSCR
+	  &= ~(FP_EX_INVALID | FP_EX_UNDERFLOW) | current->thread.spefscr_last;
 	__FPU_FPSCR |= (FP_CUR_EXCEPTIONS & FP_EX_MASK);
 	mtspr(SPRN_SPEFSCR, __FPU_FPSCR);
+	current->thread.spefscr_last = __FPU_FPSCR;
 
 	current->thread.evr[fc] = vc.wp[0];
 	regs->gpr[fc] = vc.wp[1];


-- 
Joseph S. Myers
jos...@codesourcery.com
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/6] powerpc: fix exception clearing in e500 SPE float emulation

2013-11-22 Thread Joseph S. Myers
On Fri, 22 Nov 2013, Scott Wood wrote:

> This sounds like an incompatible change to userspace API.  What about
> older glibc?  What about user code that directly manipulates these bits
> rather than going through libc, or uses a libc other than glibc?  Where
> is this API requirement documented?

The previous EGLIBC port, and the uClibc code copied from it, is 
fundamentally broken as regards any use of prctl for floating-point 
exceptions because it didn't use the PR_FP_EXC_SW_ENABLE bit in its prctl 
calls (and did various worse things, such as passing a pointer when prctl 
expected an integer).  If you avoid anything where prctl is used, the 
clearing of sticky bits still means it will never give anything 
approximating correct exception semantics with existing kernels.  I don't 
believe the patch makes things any worse for existing code that doesn't 
try to inform the kernel of changes to sticky bits - such code may get 
incorrect exceptions in some cases, but it would have done so anyway in 
other cases.

This is the best API I could come up with to fix the fundamentally broken 
nature of what came before, taking into account that in many cases a prctl 
call is already needed along with userspace manipulation of exception 
bits.  I'm not aware of any kernel documentation where this sort of 
subarchitecture-specific API detail is documented.  (The API also includes 
such things as needing to leave the spefscr trap-enable bits set and use 
prctl to control whether SIGFPE results from exceptions.)

> I think the impact of this could be reduced by using this mechanism only
> to clear bits, rather than set them.  That is, if the exception bit is
> unset, don't set it just because it's set in spefscr_last -- but if it's
> not set in spefscr_last, and the emulation code doesn't want to set it,
> then clear it.

It should already be the case in this patch that if a bit is clear in 
spefscr, and set in spefscr_last (i.e. userspace did not inform the kernel 
of clearing the bit, and no traps since then have resulted in the kernel 
noticing it was cleared), it won't get set unless the emulation code wants 
to set it.  The sole place spefscr_last is read is in the statement 
"__FPU_FPSCR &= ~(FP_EX_INVALID | FP_EX_UNDERFLOW) | 
current->thread.spefscr_last;" - if the bit is already clear in spefscr, 
this statement has no effect on it.
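
As a rough illustration of that argument, the masking statement can be modelled in standalone C. This is only a sketch: the DEMO_* bit positions and the function name are made up for the example and are not the kernel's FP_EX_* definitions.

```c
#include <stdint.h>

/* Illustrative sticky-bit positions (assumptions for this sketch; the
 * kernel's FP_EX_* values are defined elsewhere and not reproduced).  */
#define DEMO_EX_INVALID   0x10u
#define DEMO_EX_UNDERFLOW 0x04u
#define DEMO_EX_INEXACT   0x20u

/* Models the update_regs masking step: INVALID and UNDERFLOW survive
 * only if they were also set in the last value the kernel saw; every
 * other bit of the live register passes through unchanged.  An AND can
 * never turn a clear bit into a set bit.  */
static uint32_t demo_filter_sticky(uint32_t spefscr, uint32_t spefscr_last)
{
    spefscr &= ~(DEMO_EX_INVALID | DEMO_EX_UNDERFLOW) | spefscr_last;
    return spefscr;
}
```

A bit that is clear in the live value but set in the saved value stays clear, because the statement only ever ANDs into the live value.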

> Are there any cases where the exception bit can be set without the
> kernel taking a trap, or is userspace manipulation limited to clearing
> the bits?

Userspace can both set and clear the bits without a trap.  For example, 
fesetenv restores a saved value of spefscr which may both set and clear 
bits (and then it calls prctl because it needs to do so anyway to restore 
the saved state for which exceptions were enabled).  fesetexceptflag 
restores saved state of particular exceptions without a trap (so needs to 
call prctl specially to inform the kernel of a change).
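
A minimal sketch of the notification step, assuming a Linux userspace with <sys/prctl.h>: the function name is hypothetical, and the e500-only write to SPEFSCR that would precede the call is left out so the fragment compiles on any architecture.

```c
#include <sys/prctl.h>

/* Ask the kernel for the current FP exception mode.  With the patch
 * under discussion, an e500 SPE kernel whose saved mode includes
 * PR_FP_EXC_SW_ENABLE also resamples SPEFSCR at this point.  On other
 * hardware the call may simply fail with -1.  */
static int demo_notify_kernel_of_sticky_change(int *mode_out)
{
    /* A real fesetexceptflag would first write the new sticky bits to
     * SPEFSCR (mtspefscr); that step is omitted in this sketch.  */
    return prctl(PR_GET_FPEXC, (unsigned long) mode_out, 0, 0, 0);
}
```

PR_GET_FPEXC stores the mode through the pointer passed as the second argument, whereas PR_SET_FPEXC takes the mode as an integer.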

-- 
Joseph S. Myers
jos...@codesourcery.com


Ping^2 Re: [PATCH 0/6] powerpc/math-emu: e500 SPE float emulation fixes

2013-11-18 Thread Joseph S. Myers
Ping^2.  I still haven't seen any comments on any of these patches.

-- 
Joseph S. Myers
jos...@codesourcery.com


Ping Re: [PATCH 0/6] powerpc/math-emu: e500 SPE float emulation fixes

2013-11-11 Thread Joseph S. Myers
Ping.  I haven't seen any comments on any of these patches.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 2/6] powerpc: fix e500 SPE float rounding inexactness detection

2013-11-04 Thread Joseph S. Myers
From: Joseph Myers jos...@codesourcery.com

The e500 SPE floating-point emulation code for the rounding modes
rounding to positive or negative infinity (which may not be
implemented in hardware) tries to avoid emulating rounding if the
result was exact.  However, it tests inexactness using the sticky
bit with the cumulative result of previous operations, rather than
with the non-sticky bits relating to the operation that generated the
interrupt.  Furthermore, when a vector operation generates the
interrupt, it's possible that only one of the low and high parts is
inexact, and so only that part should have rounding emulated.  This
results in incorrect rounding of exact results in these modes when the
sticky bit is set from a previous operation.

(I'm not sure why the rounding interrupts are generated at all when
the result is exact, but empirically the hardware does generate them.)

This patch checks for inexactness using the correct bits of SPEFSCR,
and ensures that rounding only occurs when the relevant part of the
result was actually inexact.

Signed-off-by: Joseph Myers jos...@codesourcery.com

---

Previous submission: http://lkml.org/lkml/2013/10/4/497.
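
The distinction drawn in the description above, between the sticky bit and the per-operation bits, can be sketched in standalone C as follows.  The DEMO_* masks and the helper name are assumptions for the example; the authoritative SPEFSCR layout is in the kernel headers.  Unlike the patch, this sketch folds the vector check into the high-half flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder masks for this sketch (assumed, not authoritative).  */
#define DEMO_FINXS 0x00200000u  /* sticky inexact, accumulates over time */
#define DEMO_FG    0x00002000u  /* guard bit, low element, last operation */
#define DEMO_FX    0x00001000u  /* sticky-shift bit, low element */
#define DEMO_FGH   0x20000000u  /* guard bit, high element */
#define DEMO_FXH   0x10000000u  /* sticky-shift bit, high element */

/* Decide whether software rounding is needed, looking only at the bits
 * describing the operation that trapped.  The high half matters only
 * for vector instructions.  */
static bool demo_needs_rounding(uint32_t spefscr, bool is_vector,
                                bool *lo_inexact, bool *hi_inexact)
{
    *lo_inexact = (spefscr & (DEMO_FG | DEMO_FX)) != 0;
    *hi_inexact = is_vector && (spefscr & (DEMO_FGH | DEMO_FXH)) != 0;
    return *lo_inexact || *hi_inexact;
}
```

A register value with only the sticky bit set reports that no rounding is needed, which is the case the old test got wrong.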

diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c
index 59835c6..ecdf35d 100644
--- a/arch/powerpc/math-emu/math_efp.c
+++ b/arch/powerpc/math-emu/math_efp.c
@@ -680,7 +680,8 @@ int speround_handler(struct pt_regs *regs)
 {
 	union dw_union fgpr;
 	int s_lo, s_hi;
-	unsigned long speinsn, type, fc;
+	int lo_inexact, hi_inexact;
+	unsigned long speinsn, type, fc, fptype;
 
 	if (get_user(speinsn, (unsigned int __user *) regs->nip))
 		return -EFAULT;
@@ -693,8 +694,12 @@ int speround_handler(struct pt_regs *regs)
 	__FPU_FPSCR = mfspr(SPRN_SPEFSCR);
 	pr_debug("speinsn:%08lx spefscr:%08lx\n", speinsn, __FPU_FPSCR);
 
+	fptype = (speinsn >> 5) & 0x7;
+
 	/* No need to round if the result is exact */
-	if (!(__FPU_FPSCR & FP_EX_INEXACT))
+	lo_inexact = __FPU_FPSCR & (SPEFSCR_FG | SPEFSCR_FX);
+	hi_inexact = __FPU_FPSCR & (SPEFSCR_FGH | SPEFSCR_FXH);
+	if (!(lo_inexact || (hi_inexact && fptype == VCT)))
 		return 0;
 
 	fc = (speinsn >> 21) & 0x1f;
@@ -705,7 +710,7 @@ int speround_handler(struct pt_regs *regs)
 
 	pr_debug("round fgpr: %08x  %08x\n", fgpr.wp[0], fgpr.wp[1]);
 
-	switch ((speinsn >> 5) & 0x7) {
+	switch (fptype) {
 	/* Since SPE instructions on E500 core can handle round to nearest
 	 * and round toward zero with IEEE-754 complied, we just need
 	 * to handle round toward +Inf and round toward -Inf by software.
@@ -728,11 +733,15 @@ int speround_handler(struct pt_regs *regs)
 
 	case VCT:
 		if (FP_ROUNDMODE == FP_RND_PINF) {
-			if (!s_lo) fgpr.wp[1]++; /* Z_low > 0, choose Z1 */
-			if (!s_hi) fgpr.wp[0]++; /* Z_high word > 0, choose Z1 */
+			if (lo_inexact && !s_lo)
+				fgpr.wp[1]++; /* Z_low > 0, choose Z1 */
+			if (hi_inexact && !s_hi)
+				fgpr.wp[0]++; /* Z_high word > 0, choose Z1 */
 		} else { /* round to -Inf */
-			if (s_lo) fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
-			if (s_hi) fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
+			if (lo_inexact && s_lo)
+				fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
+			if (hi_inexact && s_hi)
+				fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
 		}
 		break;
 

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 3/6] math-emu: fix floating-point to integer unsigned saturation

2013-11-04 Thread Joseph S. Myers
From: Joseph Myers jos...@codesourcery.com

The math-emu macros _FP_TO_INT and _FP_TO_INT_ROUND are supposed to
saturate their results for out-of-range arguments, except in the case
rsigned == 2 (when instead the low bits of the result are taken).
However, in the case rsigned == 0 (converting to unsigned integers),
they mistakenly produce 0 for positive results and the maximum
unsigned integer for negative results, the opposite of correct
unsigned saturation.  This patch fixes the logic.

Signed-off-by: Joseph Myers jos...@codesourcery.com

---

Previous submission: http://lkml.org/lkml/2013/10/8/694.

I have made the corresponding changes to the glibc/libgcc copy of this
code, given that it would be desirable to resync the Linux and
glibc/libgcc copies (the latter has had many enhancements and bug
fixes since it was copied into Linux), although strictly this
incorrect saturation is only a bug when trying to emulate particular
instruction semantics, not when used in userspace to implement C
operations where the results of out-of-range conversions are
unspecified or undefined.
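
For reference, the intended unsigned saturation rule from the description can be written as a tiny standalone function.  This is a sketch with a hypothetical name, not the macro itself; sign_bit plays the role of X##_s.

```c
#include <limits.h>

/* Saturated result for an out-of-range value converted to unsigned
 * int: negative inputs clamp to 0, positive inputs to UINT_MAX.  */
static unsigned int demo_saturate_unsigned(int sign_bit)
{
    unsigned int r = 0;
    if (!sign_bit)      /* positive: all bits set */
        r = ~r;
    return r;
}
```

Before the fix the condition was inverted, so a large positive input produced 0 and a negative input produced the maximum value.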

diff --git a/include/math-emu/op-common.h b/include/math-emu/op-common.h
index 9696a5e..70fe5e9 100644
--- a/include/math-emu/op-common.h
+++ b/include/math-emu/op-common.h
@@ -685,7 +685,7 @@ do {							\
 	    else						\
 	      {							\
 		r = 0;						\
-		if (X##_s)					\
+		if (!X##_s)					\
 		  r = ~r;					\
 	      }							\
 	    FP_SET_EXCEPTION(FP_EX_INVALID);			\
@@ -762,7 +762,7 @@ do {							\
 	    if (!rsigned)					\
 	      {							\
 		r = 0;						\
-		if (X##_s)					\
+		if (!X##_s)					\
 		  r = ~r;					\
 	      }							\
 	    else if (rsigned != 2)				\

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 0/6] powerpc/math-emu: e500 SPE float emulation fixes

2013-11-04 Thread Joseph S. Myers
This patch series fixes various problems with the floating-point
emulation code for powerpc e500 SPE (some being issues with the
e500-specific emulation code, some with the generic math-emu headers).
All six patches were sent individually last month as the issues were
identified and fixed in the course of preparing the e500 glibc port,
and received no comments.  There are no substantive changes to the
patches in this version, but I've retested the glibc port (which is
now upstream, along with all the generic math-emu changes relevant to
the glibc soft-fp code, and various fixes to soft-fp corresponding to
fixes in the kernel code in the hope that at some point we can get the
kernel using the current soft-fp code again) with current kernel
sources with this patch series applied.

The only dependencies between patches in this series should be that
patch 5 (fix e500 SPE float to integer and fixed-point conversions)
depends on patch 2 (fix e500 SPE float rounding inexactness
detection).  Other than that, I think any subset of the patches can be
applied in any order, if some subset seems OK but there are concerns
about other patches in the series.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 4/6] math-emu: fix floating-point to integer overflow detection

2013-11-04 Thread Joseph S. Myers
From: Joseph Myers jos...@codesourcery.com

On overflow, the math-emu macro _FP_TO_INT_ROUND tries to saturate its
result (subject to the value of rsigned specifying the desired
overflow semantics).  However, if the rounding step has the effect of
increasing the exponent so as to cause overflow (if the rounded result
is 1 larger than the largest positive value with the given number of
bits, allowing for signedness), the overflow does not get detected,
meaning that for unsigned results 0 is produced instead of the maximum
unsigned integer with the given number of bits, without an exception
being raised for overflow, and that for signed results the minimum
(negative) value is produced instead of the maximum (positive) value,
again without an exception.  This patch makes the code check for
rounding increasing the exponent and adjusts the exponent value as
needed for the overflow check.

Signed-off-by: Joseph Myers jos...@codesourcery.com

---

Previous submission: http://lkml.org/lkml/2013/10/8/700.

This macro is not present in the glibc/libgcc version of the code.  It
remains the case both before and after this patch that the conversions
wrongly treat a signed result of the most negative integer as an
overflow, when actually only that integer minus 1 or smaller should be
an overflow, although this only means an incorrect exception rather
than affecting the value returned; that was one of the bugs I fixed in
the glibc/libgcc version of this code in 2006 (as part of a major
overhaul of the code including various interface changes, so not
trivially backportable to the kernel version).
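
The leading-zero comparison described above can be tried out in a simplified standalone model.  Everything here is an assumption made for illustration: a 32-bit fixed-point input with DEMO_WORKBITS guard bits, round-half-up instead of the macro's rounding modes, and hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_WORKBITS 3   /* guard bits below the integer part (assumed) */

/* Count leading zeros of a nonzero 32-bit value.  */
static int demo_clz32(uint32_t x)
{
    int n = 0;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* frac is the value scaled by 2^DEMO_WORKBITS; it must be at least
 * 1 << DEMO_WORKBITS and well below 2^31.  Stores the rounded integer
 * in *result and returns true if it needs more than `bits` bits.  */
static bool demo_round_overflows(uint32_t frac, int bits, uint32_t *result)
{
    int lz0 = demo_clz32(frac);
    int exponent = 31 - lz0 - DEMO_WORKBITS;   /* floor(log2(value)) */
    uint32_t rounded = frac + (1u << (DEMO_WORKBITS - 1));
    int lz1 = demo_clz32(rounded);

    if (lz1 < lz0)
        exponent++;        /* rounding carried into a new top bit */
    *result = rounded >> DEMO_WORKBITS;
    return exponent >= bits;
}
```

With bits = 8, the input 255.5 (frac 2044) rounds to 256.  The pre-rounding exponent of 7 would pass the overflow check, and only the increment after rounding exposes the overflow.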

diff --git a/include/math-emu/op-common.h b/include/math-emu/op-common.h
index 70fe5e9..6bdf8c6 100644
--- a/include/math-emu/op-common.h
+++ b/include/math-emu/op-common.h
@@ -743,12 +743,17 @@ do {							\
 	  }							\
 	else							\
 	  {							\
+	    int _lz0, _lz1;					\
 	    if (X##_e <= -_FP_WORKBITS - 1)			\
 	      _FP_FRAC_SET_##wc(X, _FP_MINFRAC_##wc);		\
 	    else						\
 	      _FP_FRAC_SRS_##wc(X, _FP_FRACBITS_##fs - 1 - X##_e,	\
 				_FP_WFRACBITS_##fs);		\
+	    _FP_FRAC_CLZ_##wc(_lz0, X);				\
 	    _FP_ROUND(wc, X);					\
+	    _FP_FRAC_CLZ_##wc(_lz1, X);				\
+	    if (_lz1 < _lz0)					\
+	      X##_e++; /* For overflow detection.  */		\
 	    _FP_FRAC_SRL_##wc(X, _FP_WORKBITS);			\
 	    _FP_FRAC_ASSEMBLE_##wc(r, X, rsize);		\
 	  }							\

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 6/6] powerpc: fix e500 SPE float SIGFPE generation

2013-11-04 Thread Joseph S. Myers
From: Joseph Myers jos...@codesourcery.com

The e500 SPE floating-point emulation code is called from
SPEFloatingPointException and SPEFloatingPointRoundException in
arch/powerpc/kernel/traps.c.  Those functions have support for
generating SIGFPE, but do_spe_mathemu and speround_handler don't
generate a return value to indicate that this should be done.  Such a
return value should depend on whether an exception is raised that has
been set via prctl to generate SIGFPE.  This patch adds the relevant
logic in these functions so that SIGFPE is generated as expected by
the glibc testsuite.

Signed-off-by: Joseph Myers jos...@codesourcery.com

---

Previous submission: http://lkml.org/lkml/2013/10/10/626.
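
The decision described above (raise SIGFPE only for exceptions that were both raised and enabled via prctl) can be sketched as a table-driven userspace function.  The PR_FP_EXC_* constants come from <sys/prctl.h> on Linux; the DEMO_EX_* flags and the function name are invented for this sketch and stand in for the kernel-internal FP_EX_* values.

```c
#include <stddef.h>
#include <sys/prctl.h>   /* PR_FP_EXC_* constants (Linux) */

/* Illustrative per-instruction exception flags (assumed values).  */
enum {
    DEMO_EX_DIVZERO   = 1 << 0,
    DEMO_EX_OVERFLOW  = 1 << 1,
    DEMO_EX_UNDERFLOW = 1 << 2,
    DEMO_EX_INEXACT   = 1 << 3,
    DEMO_EX_INVALID   = 1 << 4
};

/* Returns 1 if a signal should be delivered: software exception
 * enables must be on, and some raised exception must be enabled.  */
static int demo_should_raise_sigfpe(unsigned int cur_exceptions,
                                    unsigned int fpexc_mode)
{
    static const struct { unsigned int raised, enabled; } map[] = {
        { DEMO_EX_DIVZERO,   PR_FP_EXC_DIV },
        { DEMO_EX_OVERFLOW,  PR_FP_EXC_OVF },
        { DEMO_EX_UNDERFLOW, PR_FP_EXC_UND },
        { DEMO_EX_INEXACT,   PR_FP_EXC_RES },
        { DEMO_EX_INVALID,   PR_FP_EXC_INV },
    };
    size_t i;

    if (!(fpexc_mode & PR_FP_EXC_SW_ENABLE))
        return 0;
    for (i = 0; i < sizeof map / sizeof map[0]; i++)
        if ((cur_exceptions & map[i].raised)
            && (fpexc_mode & map[i].enabled))
            return 1;
    return 0;
}
```

The patch below spells the same pairs out as five explicit tests; the table form is only a presentational choice for the sketch.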

diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c
index 01a0abb..28337c9 100644
--- a/arch/powerpc/math-emu/math_efp.c
+++ b/arch/powerpc/math-emu/math_efp.c
@@ -20,6 +20,7 @@
  */
 
 #include <linux/types.h>
+#include <linux/prctl.h>
 
 #include <asm/uaccess.h>
 #include <asm/reg.h>
@@ -691,6 +692,23 @@ update_regs:
 	pr_debug("va: %08x  %08x\n", va.wp[0], va.wp[1]);
 	pr_debug("vb: %08x  %08x\n", vb.wp[0], vb.wp[1]);
 
+	if (current->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE) {
+		if ((FP_CUR_EXCEPTIONS & FP_EX_DIVZERO)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_DIV))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_OVERFLOW)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_OVF))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_UNDERFLOW)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_UND))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_INEXACT)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_RES))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_INVALID)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_INV))
+			return 1;
+	}
 	return 0;
 
 illegal:
@@ -867,6 +885,8 @@ int speround_handler(struct pt_regs *regs)
 
 	pr_debug("  to fgpr: %08x  %08x\n", fgpr.wp[0], fgpr.wp[1]);
 
+	if (current->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE)
+		return (current->thread.fpexc_mode & PR_FP_EXC_RES) ? 1 : 0;
 	return 0;
 }
 

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 1/6] powerpc: fix exception clearing in e500 SPE float emulation

2013-11-04 Thread Joseph S. Myers
 by the
+* processor for non-finite input, but was not set before the
+* instruction being emulated, clear it.  Likewise for the
+* underflow bit, which may have been set by the processor
+* for exact underflow, not just inexact underflow when the
+* flag should be set for IEEE 754 semantics.  Other sticky
+* exceptions will only be set by the processor when they are
+* correct according to IEEE 754 semantics, and we must not
+* clear sticky bits that were already set before the emulated
+* instruction as they represent the user-visible sticky
+* exception status.  inexact traps to kernel are not
+* required for IEEE semantics and are not enabled by default,
+* so the inexact sticky bit may have been set by a previous
+* instruction without the kernel being aware of it.
+*/
+	__FPU_FPSCR
+	  &= ~(FP_EX_INVALID | FP_EX_UNDERFLOW) | current->thread.spefscr_last;
 	__FPU_FPSCR |= (FP_CUR_EXCEPTIONS & FP_EX_MASK);
 	mtspr(SPRN_SPEFSCR, __FPU_FPSCR);
+	current->thread.spefscr_last = __FPU_FPSCR;
 
 	current->thread.evr[fc] = vc.wp[0];
 	regs->gpr[fc] = vc.wp[1];

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] powerpc: fix e500 SPE float SIGFPE generation

2013-10-10 Thread Joseph S. Myers
From: Joseph Myers jos...@codesourcery.com

The e500 SPE floating-point emulation code is called from
SPEFloatingPointException and SPEFloatingPointRoundException in
arch/powerpc/kernel/traps.c.  Those functions have support for
generating SIGFPE, but do_spe_mathemu and speround_handler don't
generate a return value to indicate that this should be done.  Such a
return value should depend on whether an exception is raised that has
been set via prctl to generate SIGFPE.  This patch adds the relevant
logic in these functions so that SIGFPE is generated as expected by
the glibc testsuite.

Signed-off-by: Joseph Myers jos...@codesourcery.com

---

This patch is not intended to depend on any of my previous patches
http://lkml.org/lkml/2013/10/4/495,
http://lkml.org/lkml/2013/10/4/497,
http://lkml.org/lkml/2013/10/8/694,
http://lkml.org/lkml/2013/10/8/700 and
http://lkml.org/lkml/2013/10/8/705, although testing has been on top
of that patch series and having all six patches will produce the best
results.

diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c
index 01a0abb..28337c9 100644
--- a/arch/powerpc/math-emu/math_efp.c
+++ b/arch/powerpc/math-emu/math_efp.c
@@ -20,6 +20,7 @@
  */
 
 #include <linux/types.h>
+#include <linux/prctl.h>
 
 #include <asm/uaccess.h>
 #include <asm/reg.h>
@@ -691,6 +692,23 @@ update_regs:
 	pr_debug("va: %08x  %08x\n", va.wp[0], va.wp[1]);
 	pr_debug("vb: %08x  %08x\n", vb.wp[0], vb.wp[1]);
 
+	if (current->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE) {
+		if ((FP_CUR_EXCEPTIONS & FP_EX_DIVZERO)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_DIV))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_OVERFLOW)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_OVF))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_UNDERFLOW)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_UND))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_INEXACT)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_RES))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_INVALID)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_INV))
+			return 1;
+	}
 	return 0;
 
 illegal:
@@ -867,6 +885,8 @@ int speround_handler(struct pt_regs *regs)
 
 	pr_debug("  to fgpr: %08x  %08x\n", fgpr.wp[0], fgpr.wp[1]);
 
+	if (current->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE)
+		return (current->thread.fpexc_mode & PR_FP_EXC_RES) ? 1 : 0;
 	return 0;
 }
 

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] math-emu: fix floating-point to integer unsigned saturation

2013-10-08 Thread Joseph S. Myers
From: Joseph Myers jos...@codesourcery.com

The math-emu macros _FP_TO_INT and _FP_TO_INT_ROUND are supposed to
saturate their results for out-of-range arguments, except in the case
rsigned == 2 (when instead the low bits of the result are taken).
However, in the case rsigned == 0 (converting to unsigned integers),
they mistakenly produce 0 for positive results and the maximum
unsigned integer for negative results, the opposite of correct
unsigned saturation.  This patch fixes the logic.

Signed-off-by: Joseph Myers jos...@codesourcery.com

---

I intend to make the corresponding changes to the glibc/libgcc copy of
this code, given that it would be desirable to resync the Linux and
glibc/libgcc copies (the latter has had many enhancements and bug
fixes since it was copied into Linux), although strictly this
incorrect saturation is only a bug when trying to emulate particular
instruction semantics, not when used in userspace to implement C
operations where the results of out-of-range conversions are
unspecified or undefined.

diff --git a/include/math-emu/op-common.h b/include/math-emu/op-common.h
index 9696a5e..70fe5e9 100644
--- a/include/math-emu/op-common.h
+++ b/include/math-emu/op-common.h
@@ -685,7 +685,7 @@ do {							\
 	    else						\
 	      {							\
 		r = 0;						\
-		if (X##_s)					\
+		if (!X##_s)					\
 		  r = ~r;					\
 	      }							\
 	    FP_SET_EXCEPTION(FP_EX_INVALID);			\
@@ -762,7 +762,7 @@ do {							\
 	    if (!rsigned)					\
 	      {							\
 		r = 0;						\
-		if (X##_s)					\
+		if (!X##_s)					\
 		  r = ~r;					\
 	      }							\
 	    else if (rsigned != 2)				\

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] math-emu: fix floating-point to integer overflow detection

2013-10-08 Thread Joseph S. Myers
From: Joseph Myers jos...@codesourcery.com

On overflow, the math-emu macro _FP_TO_INT_ROUND tries to saturate its
result (subject to the value of rsigned specifying the desired
overflow semantics).  However, if the rounding step has the effect of
increasing the exponent so as to cause overflow (if the rounded result
is 1 larger than the largest positive value with the given number of
bits, allowing for signedness), the overflow does not get detected,
meaning that for unsigned results 0 is produced instead of the maximum
unsigned integer with the given number of bits, without an exception
being raised for overflow, and that for signed results the minimum
(negative) value is produced instead of the maximum (positive) value,
again without an exception.  This patch makes the code check for
rounding increasing the exponent and adjusts the exponent value as
needed for the overflow check.

Signed-off-by: Joseph Myers jos...@codesourcery.com

---

This macro is not present in the glibc/libgcc version of the code.
This patch is independent of my separate patch
http://lkml.org/lkml/2013/10/8/694 to fix the results for unsigned
saturation, although you need both patches together to get the correct
results for the affected unsigned overflow case.  It remains the case
both before and after this patch that the conversions wrongly treat a
signed result of the most negative integer as an overflow, when
actually only that integer minus 1 or smaller should be an overflow,
although this only means an incorrect exception rather than affecting
the value returned; that was one of the bugs I fixed in the
glibc/libgcc version of this code in 2006 (as part of a major overhaul
of the code including various interface changes, so not trivially
backportable to the kernel version).

diff --git a/include/math-emu/op-common.h b/include/math-emu/op-common.h
index 9696a5e..6bdf8c6 100644
--- a/include/math-emu/op-common.h
+++ b/include/math-emu/op-common.h
@@ -743,12 +743,17 @@ do {							\
 	  }							\
 	else							\
 	  {							\
+	    int _lz0, _lz1;					\
 	    if (X##_e <= -_FP_WORKBITS - 1)			\
 	      _FP_FRAC_SET_##wc(X, _FP_MINFRAC_##wc);		\
 	    else						\
 	      _FP_FRAC_SRS_##wc(X, _FP_FRACBITS_##fs - 1 - X##_e,	\
 				_FP_WFRACBITS_##fs);		\
+	    _FP_FRAC_CLZ_##wc(_lz0, X);				\
 	    _FP_ROUND(wc, X);					\
+	    _FP_FRAC_CLZ_##wc(_lz1, X);				\
+	    if (_lz1 < _lz0)					\
+	      X##_e++; /* For overflow detection.  */		\
 	    _FP_FRAC_SRL_##wc(X, _FP_WORKBITS);			\
 	    _FP_FRAC_ASSEMBLE_##wc(r, X, rsize);		\
 	  }							\

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] powerpc: fix e500 SPE float to integer and fixed-point conversions

2013-10-08 Thread Joseph S. Myers
:
+		fp_result = 0;
+		s_lo = 0;
+		s_hi = 0;
+		break;
+
+	case EFSCTSI:
+	case EFSCTSF:
+		fp_result = 0;
+		/* Recover the sign of a zero result if possible.  */
+		if (fgpr.wp[1] == 0)
+			s_lo = regs->gpr[fb] & SIGN_BIT_S;
+		break;
+
+	case EVFSCTSI:
+	case EVFSCTSF:
+		fp_result = 0;
+		/* Recover the sign of a zero result if possible.  */
+		if (fgpr.wp[1] == 0)
+			s_lo = regs->gpr[fb] & SIGN_BIT_S;
+		if (fgpr.wp[0] == 0)
+			s_hi = current->thread.evr[fb] & SIGN_BIT_S;
+		break;
+
+	case EFDCTSI:
+	case EFDCTSF:
+		fp_result = 0;
+		s_hi = s_lo;
+		/* Recover the sign of a zero result if possible.  */
+		if (fgpr.wp[1] == 0)
+			s_hi = current->thread.evr[fb] & SIGN_BIT_S;
+		break;
+
+	default:
+		fp_result = 1;
+		break;
+	}
+
 	pr_debug("round fgpr: %08x  %08x\n", fgpr.wp[0], fgpr.wp[1]);
 
 	switch (fptype) {
@@ -719,15 +809,30 @@ int speround_handler(struct pt_regs *regs)
 		if ((FP_ROUNDMODE) == FP_RND_PINF) {
 			if (!s_lo) fgpr.wp[1]++; /* Z > 0, choose Z1 */
 		} else { /* round to -Inf */
-			if (s_lo) fgpr.wp[1]++; /* Z < 0, choose Z2 */
+			if (s_lo) {
+				if (fp_result)
+					fgpr.wp[1]++; /* Z < 0, choose Z2 */
+				else
+					fgpr.wp[1]--; /* Z < 0, choose Z2 */
+			}
 		}
 		break;
 
 	case DPFP:
 		if (FP_ROUNDMODE == FP_RND_PINF) {
-			if (!s_hi) fgpr.dp[0]++; /* Z > 0, choose Z1 */
+			if (!s_hi) {
+				if (fp_result)
+					fgpr.dp[0]++; /* Z > 0, choose Z1 */
+				else
+					fgpr.wp[1]++; /* Z > 0, choose Z1 */
+			}
 		} else { /* round to -Inf */
-			if (s_hi) fgpr.dp[0]++; /* Z < 0, choose Z2 */
+			if (s_hi) {
+				if (fp_result)
+					fgpr.dp[0]++; /* Z < 0, choose Z2 */
+				else
+					fgpr.wp[1]--; /* Z < 0, choose Z2 */
+			}
 		}
 		break;
 
@@ -738,10 +843,18 @@ int speround_handler(struct pt_regs *regs)
 			if (hi_inexact && !s_hi)
 				fgpr.wp[0]++; /* Z_high word > 0, choose Z1 */
 		} else { /* round to -Inf */
-			if (lo_inexact && s_lo)
-				fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
-			if (hi_inexact && s_hi)
-				fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
+			if (lo_inexact && s_lo) {
+				if (fp_result)
+					fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
+				else
+					fgpr.wp[1]--; /* Z_low < 0, choose Z2 */
+			}
+			if (hi_inexact && s_hi) {
+				if (fp_result)
+					fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
+				else
+					fgpr.wp[0]--; /* Z_high < 0, choose Z2 */
+			}
 		}
 		break;
 

-- 
Joseph S. Myers
jos...@codesourcery.com
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc: fix e500 SPE float to integer and fixed-point conversions

2013-10-08 Thread Joseph S. Myers
On Tue, 8 Oct 2013, Joseph S. Myers wrote:

> I'll send as a followup the testcase I used for verifying that the
> instructions (other than the theoretical conversions to 64-bit
> integers) produce the correct results.  In addition, this has been
> tested with the glibc testsuite (with the e500 port as posted at
> <https://sourceware.org/ml/libc-alpha/2013-10/msg00195.html>), where it
> improves the libm test results.

Here is that testcase.

#include <stdio.h>
#include <stdlib.h>

#define INFF __builtin_inff ()
#define INFD __builtin_inf ()
#define NANF __builtin_nanf ("")
#define NAND __builtin_nan ("")

/* e500 rounding modes: 0 = nearest, 1 = zero, 2 = up, 3 = down.  */

static inline void
set_rm (unsigned int mode)
{
  unsigned int spefscr;
  asm volatile ("mfspefscr %0" : "=r" (spefscr));
  spefscr = (spefscr & ~3) | mode;
  asm volatile ("mtspefscr %0" : : "r" (spefscr));
}

static int success_count, failure_count;

struct float_test_data
{
  float input;
  unsigned int expected[4];
};

struct double_test_data
{
  double input;
  unsigned int expected[4];
};

typedef float vfloat __attribute__ ((vector_size (8)));
typedef unsigned int vuint __attribute__ ((vector_size (8)));

union vfloat_union
{
  vfloat vf;
  float f[2];
};

union vuint_union
{
  vuint vui;
  unsigned int ui[2];
};

#define T(A, B, C, D, E) { (A), { (B), (C), (D), (E) } }
#define TZ(A, B) T (A, B, B, B, B)

static void
check_result (const char *insn, double input, unsigned int rm,
              unsigned int expected, unsigned int res)
{
  if (res == expected)
    success_count++;
  else
    {
      failure_count++;
      printf ("%s %a mode %u expected 0x%x (%d) got 0x%x (%d)\n",
              insn, input, rm, expected, (int) expected, res, (int) res);
    }
}

#define RUN_FLOAT_TESTS(INSN)                                           \
static void                                                             \
test_##INSN (void)                                                      \
{                                                                       \
  size_t i;                                                             \
  for (i = 0;                                                           \
       i < sizeof (INSN##_test_data) / sizeof (INSN##_test_data[0]);    \
       i++)                                                             \
    {                                                                   \
      unsigned int rm;                                                  \
      for (rm = 0; rm <= 3; rm++)                                       \
        {                                                               \
          set_rm (rm);                                                  \
          unsigned int res;                                             \
          asm volatile (#INSN " %0, %1"                                 \
                        : "=r" (res)                                    \
                        : "r" (INSN##_test_data[i].input));             \
          check_result (#INSN, INSN##_test_data[i].input, rm,           \
                        INSN##_test_data[i].expected[rm], res);         \
        }                                                               \
    }                                                                   \
}

#define RUN_VFLOAT_TESTS(INSN, TINSN)                                   \
static void                                                             \
test_##INSN (void)                                                      \
{                                                                       \
  size_t i;                                                             \
  for (i = 0;                                                           \
       i < sizeof (TINSN##_test_data) / sizeof (TINSN##_test_data[0]);  \
       i++)                                                             \
    {                                                                   \
      unsigned int rm;                                                  \
      for (rm = 0; rm <= 3; rm++)                                       \
        {                                                               \
          set_rm (rm);                                                  \
          union vfloat_union varg;                                      \
          union vuint_union vres;                                       \
          varg.f[0] = TINSN##_test_data[i].input;                       \
          varg.f[1] = 0;                                                \
          asm volatile (#INSN " %0, %1"                                 \
                        : "=r" (vres.vui)                               \
                        : "r" (varg.vf));                               \
          check_result (#INSN " (high)", TINSN##_test_data[i].input,    \
                        rm, TINSN##_test_data[i].expected[rm],          \
                        vres.ui[0

[PATCH] powerpc: fix e500 SPE float rounding inexactness detection

2013-10-05 Thread Joseph S. Myers
From: Joseph Myers <jos...@codesourcery.com>

The e500 SPE floating-point emulation code for the rounding modes
rounding to positive or negative infinity (which may not be
implemented in hardware) tries to avoid emulating rounding if the
result was exact.  However, it tests inexactness using the sticky
bit with the cumulative result of previous operations, rather than
with the non-sticky bits relating to the operation that generated the
interrupt.  This results in incorrect rounding of exact results in
these modes when the sticky bit is set from a previous operation.
Furthermore, when a vector operation generates the interrupt, it's
possible that only one of the low and high parts is inexact, and so
only that part should have rounding emulated.

(I'm not sure why the rounding interrupts are generated at all when
the result is exact, but empirically the hardware does generate them.)

This patch checks for inexactness using the correct bits of SPEFSCR,
and ensures that rounding only occurs when the relevant part of the
result was actually inexact.
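
For readers following along, the check described above can be
sketched in standalone C.  This is an illustration only, not the
kernel code: the bit values are assumptions modelled on the SPEFSCR
layout (guard/sticky bits FG/FX for the low part, FGH/FXH for the
high part, FINXS for the cumulative inexact flag), and the helper
names are hypothetical.

```c
#include <stdint.h>

/* Assumed SPEFSCR bit values, for illustration only.  */
#define SPEFSCR_FGH   0x20000000u /* guard bit, high part (non-sticky) */
#define SPEFSCR_FXH   0x10000000u /* sticky-of-discarded bits, high part */
#define SPEFSCR_FINXS 0x00200000u /* cumulative (sticky) inexact flag */
#define SPEFSCR_FG    0x00002000u /* guard bit, low part (non-sticky) */
#define SPEFSCR_FX    0x00001000u /* sticky-of-discarded bits, low part */

/* Nonzero if the low part of the last result was inexact.  */
static int
lo_inexact (uint32_t spefscr)
{
  return (spefscr & (SPEFSCR_FG | SPEFSCR_FX)) != 0;
}

/* Nonzero if the high part of the last result was inexact.  */
static int
hi_inexact (uint32_t spefscr)
{
  return (spefscr & (SPEFSCR_FGH | SPEFSCR_FXH)) != 0;
}

/* Hypothetical helper: rounding needs emulating only if a relevant
   part was inexact; the high part matters only for vector insns.
   The cumulative FINXS flag is deliberately ignored.  */
static int
needs_rounding (uint32_t spefscr, int is_vector)
{
  return lo_inexact (spefscr) || (is_vector && hi_inexact (spefscr));
}
```

With these assumed values, a SPEFSCR that has only the cumulative
inexact flag set from an earlier operation does not request rounding,
which is the case the old test got wrong.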

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c
index a73f088..ecdf35d 100644
--- a/arch/powerpc/math-emu/math_efp.c
+++ b/arch/powerpc/math-emu/math_efp.c
@@ -662,7 +680,8 @@ int speround_handler(struct pt_regs *regs)
 {
union dw_union fgpr;
int s_lo, s_hi;
-   unsigned long speinsn, type, fc;
+   int lo_inexact, hi_inexact;
+   unsigned long speinsn, type, fc, fptype;
 
if (get_user(speinsn, (unsigned int __user *) regs->nip))
return -EFAULT;
@@ -675,8 +694,12 @@ int speround_handler(struct pt_regs *regs)
__FPU_FPSCR = mfspr(SPRN_SPEFSCR);
pr_debug("speinsn:%08lx spefscr:%08lx\n", speinsn, __FPU_FPSCR);
 
+   fptype = (speinsn >> 5) & 0x7;
+
/* No need to round if the result is exact */
-   if (!(__FPU_FPSCR & FP_EX_INEXACT))
+   lo_inexact = __FPU_FPSCR & (SPEFSCR_FG | SPEFSCR_FX);
+   hi_inexact = __FPU_FPSCR & (SPEFSCR_FGH | SPEFSCR_FXH);
+   if (!(lo_inexact || (hi_inexact && fptype == VCT)))
return 0;
 
fc = (speinsn >> 21) & 0x1f;
@@ -687,7 +710,7 @@ int speround_handler(struct pt_regs *regs)
 
pr_debug("round fgpr: %08x  %08x\n", fgpr.wp[0], fgpr.wp[1]);
 
-   switch ((speinsn >> 5) & 0x7) {
+   switch (fptype) {
/* Since SPE instructions on E500 core can handle round to nearest
 * and round toward zero with IEEE-754 complied, we just need
 * to handle round toward +Inf and round toward -Inf by software.
@@ -710,11 +733,15 @@ int speround_handler(struct pt_regs *regs)
 
case VCT:
if (FP_ROUNDMODE == FP_RND_PINF) {
-   if (!s_lo) fgpr.wp[1]++; /* Z_low > 0, choose Z1 */
-   if (!s_hi) fgpr.wp[0]++; /* Z_high word > 0, choose Z1 */
+   if (lo_inexact && !s_lo)
+       fgpr.wp[1]++; /* Z_low > 0, choose Z1 */
+   if (hi_inexact && !s_hi)
+       fgpr.wp[0]++; /* Z_high word > 0, choose Z1 */
} else { /* round to -Inf */
-   if (s_lo) fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
-   if (s_hi) fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
+   if (lo_inexact && s_lo)
+       fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
+   if (hi_inexact && s_hi)
+       fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
}
break;
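
The per-part adjustment in the VCT case above can be illustrated
outside the kernel as follows.  This is a sketch under stated
assumptions, not the handler itself: it assumes the value delivered
by the hardware is an IEEE 754 single-precision bit pattern whose
magnitude was truncated toward zero, so that stepping to the next
representable value away from zero is a plain increment of the bit
pattern.  The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names for the two directed rounding modes.  */
enum round_dir { ROUND_PINF, ROUND_MINF };

/* BITS is a single-precision pattern assumed to have been truncated
   toward zero.  If the result was inexact, move one step away from
   zero when that is the direction the rounding mode asks for:
   positive values for +Inf, negative values for -Inf.  Exact results
   are returned unchanged.  */
static uint32_t
round_directed (uint32_t bits, int inexact, enum round_dir dir)
{
  int negative = (bits >> 31) & 1;

  if (!inexact)
    return bits;
  if (dir == ROUND_PINF && !negative)
    bits++;
  else if (dir == ROUND_MINF && negative)
    bits++;
  return bits;
}
```

Calling this once per half with that half's own inexact flag mirrors
the point of the patch: an exact half is left alone even when the
other half needs adjusting.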
 

-- 
Joseph S. Myers
jos...@codesourcery.com
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH] powerpc: fix exception clearing in e500 SPE float emulation

2013-10-05 Thread Joseph S. Myers
 >> 23) & 0x7)) << 2));
 
 update_regs:
-   __FPU_FPSCR &= ~FP_EX_MASK;
+   /*
+* If the invalid exception sticky bit was set by the
+* processor for non-finite input, but was not set before the
+* instruction being emulated, clear it.  Likewise for the
+* underflow bit, which may have been set by the processor
+* for exact underflow, not just inexact underflow when the
+* flag should be set for IEEE 754 semantics.  Other sticky
+* exceptions will only be set by the processor when they are
+* correct according to IEEE 754 semantics, and we must not
+* clear sticky bits that were already set before the emulated
+* instruction as they represent the user-visible sticky
+* exception status.  inexact traps to kernel are not
+* required for IEEE semantics and are not enabled by default,
+* so the inexact sticky bit may have been set by a previous
+* instruction without the kernel being aware of it.
+*/
+   __FPU_FPSCR
+ &= ~(FP_EX_INVALID | FP_EX_UNDERFLOW) | current->thread.spefscr_last;
__FPU_FPSCR |= (FP_CUR_EXCEPTIONS & FP_EX_MASK);
mtspr(SPRN_SPEFSCR, __FPU_FPSCR);
+   current->thread.spefscr_last = __FPU_FPSCR;
 
current->thread.evr[fc] = vc.wp[0];
regs->gpr[fc] = vc.wp[1];
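
To make the masking in the hunk above easier to follow, here is a
minimal standalone sketch of the same bit manipulation.  The bit
positions are placeholders chosen for illustration (assumptions, not
copied from the kernel headers), and merge_sticky is a hypothetical
name; only the shape of the expression follows the patch.

```c
#include <stdint.h>

/* Placeholder exception bits; positions are assumptions.  */
#define EX_INEXACT   (1u << 21)
#define EX_INVALID   (1u << 20)
#define EX_DIVZERO   (1u << 19)
#define EX_UNDERFLOW (1u << 18)
#define EX_OVERFLOW  (1u << 17)
#define EX_MASK (EX_INEXACT | EX_INVALID | EX_DIVZERO \
                 | EX_UNDERFLOW | EX_OVERFLOW)

/* HW:   status register as read after the trapping instruction.
   LAST: value the kernel last knew the register to hold.
   CUR:  exceptions raised by the software emulation.
   Invalid and underflow are dropped unless they were already set in
   LAST; every other bit in HW is kept; CUR's exceptions are added.  */
static uint32_t
merge_sticky (uint32_t hw, uint32_t last, uint32_t cur)
{
  hw &= ~(EX_INVALID | EX_UNDERFLOW) | last;
  hw |= cur & EX_MASK;
  return hw;
}
```

Note that the inexact bit is never cleared here, matching the
comment's point that it may have been set by an earlier instruction
without the kernel seeing it.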

-- 
Joseph S. Myers
jos...@codesourcery.com