[PATCHv2 1/6] powerpc: fix exception clearing in e500 SPE float emulation
 		if (cpu_has_feature(CPU_FTR_SPE)) {
+			/*
+			 * When the sticky exception bits are set
+			 * directly by userspace, it must call prctl
+			 * with PR_GET_FPEXC (with PR_FP_EXC_SW_ENABLE
+			 * in the existing prctl settings) or
+			 * PR_SET_FPEXC (with PR_FP_EXC_SW_ENABLE in
+			 * the bits being set).  fenv.h functions
+			 * saving and restoring the whole
+			 * floating-point environment need to do so
+			 * anyway to restore the prctl settings from
+			 * the saved environment.
+			 */
+			tsk->thread.spefscr_last = mfspr(SPRN_SPEFSCR);
 			tsk->thread.fpexc_mode = val &
 				(PR_FP_EXC_SW_ENABLE | PR_FP_ALL_EXCEPT);
 			return 0;
@@ -1206,9 +1219,22 @@ int get_fpexc_mode(struct task_struct *tsk, unsigned long adr)
 
 	if (tsk->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE)
 #ifdef CONFIG_SPE
-		if (cpu_has_feature(CPU_FTR_SPE))
+		if (cpu_has_feature(CPU_FTR_SPE)) {
+			/*
+			 * When the sticky exception bits are set
+			 * directly by userspace, it must call prctl
+			 * with PR_GET_FPEXC (with PR_FP_EXC_SW_ENABLE
+			 * in the existing prctl settings) or
+			 * PR_SET_FPEXC (with PR_FP_EXC_SW_ENABLE in
+			 * the bits being set).  fenv.h functions
+			 * saving and restoring the whole
+			 * floating-point environment need to do so
+			 * anyway to restore the prctl settings from
+			 * the saved environment.
+			 */
+			tsk->thread.spefscr_last = mfspr(SPRN_SPEFSCR);
 			val = tsk->thread.fpexc_mode;
-		else
+		} else
 			return -EINVAL;
 #else
 		return -EINVAL;
diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c
index a73f088..59835c6 100644
--- a/arch/powerpc/math-emu/math_efp.c
+++ b/arch/powerpc/math-emu/math_efp.c
@@ -630,9 +630,27 @@ update_ccr:
 	regs->ccr |= (IR << ((7 - ((speinsn >> 23) & 0x7)) << 2));
 
 update_regs:
-	__FPU_FPSCR &= ~FP_EX_MASK;
+	/*
+	 * If the invalid exception sticky bit was set by the
+	 * processor for non-finite input, but was not set before the
+	 * instruction being emulated, clear it.  Likewise for the
+	 * underflow bit, which may have been set by the processor
+	 * for exact underflow, not just inexact underflow when the
+	 * flag should be set for IEEE 754 semantics.  Other sticky
+	 * exceptions will only be set by the processor when they are
+	 * correct according to IEEE 754 semantics, and we must not
+	 * clear sticky bits that were already set before the emulated
+	 * instruction as they represent the user-visible sticky
+	 * exception status.  "inexact" traps to kernel are not
+	 * required for IEEE semantics and are not enabled by default,
+	 * so the "inexact" sticky bit may have been set by a previous
+	 * instruction without the kernel being aware of it.
+	 */
+	__FPU_FPSCR
+	  &= ~(FP_EX_INVALID | FP_EX_UNDERFLOW) | current->thread.spefscr_last;
 	__FPU_FPSCR |= (FP_CUR_EXCEPTIONS & FP_EX_MASK);
 	mtspr(SPRN_SPEFSCR, __FPU_FPSCR);
+	current->thread.spefscr_last = __FPU_FPSCR;
 
 	current->thread.evr[fc] = vc.wp[0];
 	regs->gpr[fc] = vc.wp[1];

-- 
Joseph S. Myers
jos...@codesourcery.com
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/6] powerpc: fix exception clearing in e500 SPE float emulation
On Fri, 22 Nov 2013, Scott Wood wrote:

> This sounds like an incompatible change to userspace API.  What about
> older glibc?  What about user code that directly manipulates these
> bits rather than going through libc, or uses a libc other than glibc?
> Where is this API requirement documented?

The previous EGLIBC port, and the uClibc code copied from it, is
fundamentally broken as regards any use of prctl for floating-point
exceptions because it didn't use the PR_FP_EXC_SW_ENABLE bit in its
prctl calls (and did various worse things, such as passing a pointer
when prctl expected an integer).  If you avoid anything where prctl is
used, the clearing of sticky bits still means it will never give
anything approximating correct exception semantics with existing
kernels.  I don't believe the patch makes things any worse for existing
code that doesn't try to inform the kernel of changes to sticky bits -
such code may get incorrect exceptions in some cases, but it would have
done so anyway in other cases.

This is the best API I could come up with to fix the fundamentally
broken nature of what came before, taking into account that in many
cases a prctl call is already needed along with userspace manipulation
of exception bits.

I'm not aware of any kernel documentation where this sort of
subarchitecture-specific API detail is documented.  (The API also
includes such things as needing to leave the spefscr trap-enable bits
set and use prctl to control whether SIGFPE results from exceptions.)

> I think the impact of this could be reduced by using this mechanism
> only to clear bits, rather than set them.  That is, if the exception
> bit is unset, don't set it just because it's set in spefscr_last --
> but if it's not set in spefscr_last, and the emulation code doesn't
> want to set it, then clear it.

It should already be the case in this patch that if a bit is clear in
spefscr, and set in spefscr_last (i.e. userspace did not inform the
kernel of clearing the bit, and no traps since then have resulted in
the kernel noticing it was cleared), it won't get set unless the
emulation code wants to set it.  The sole place spefscr_last is read is
in the statement

	__FPU_FPSCR
	  &= ~(FP_EX_INVALID | FP_EX_UNDERFLOW) | current->thread.spefscr_last;

- if the bit is already clear in spefscr, this statement has no effect
on it.

> Are there any cases where the exception bit can be set without the
> kernel taking a trap, or is userspace manipulation limited to clearing
> the bits?

Userspace can both set and clear the bits without a trap.  For example,
fesetenv restores a saved value of spefscr which may both set and clear
bits (and then it calls prctl because it needs to do so anyway to
restore the saved state for which exceptions were enabled).
fesetexceptflag restores saved state of particular exceptions without a
trap (so needs to call prctl specially to inform the kernel of a
change).

-- 
Joseph S. Myers
jos...@codesourcery.com
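The masking argument above can be modelled on ordinary integers. The following is a minimal sketch, not kernel code: the `DEMO_*` bit values and the name `demo_filter_sticky` are made up for illustration and do not reflect the real SPEFSCR layout.

```c
#include <stdint.h>

/* Hypothetical bit assignments, chosen only for this illustration. */
#define DEMO_EX_INVALID   0x1u
#define DEMO_EX_UNDERFLOW 0x2u
#define DEMO_EX_INEXACT   0x4u

/*
 * Model of the statement in the patch:
 *   spefscr &= ~(INVALID | UNDERFLOW) | spefscr_last;
 * An invalid or underflow sticky bit survives only if it was also set
 * in spefscr_last; all other bits pass through unchanged.  Because the
 * operation is an AND, it can never turn on a bit that is currently
 * clear in spefscr.
 */
uint32_t demo_filter_sticky(uint32_t spefscr, uint32_t spefscr_last)
{
	spefscr &= ~(DEMO_EX_INVALID | DEMO_EX_UNDERFLOW) | spefscr_last;
	return spefscr;
}
```

For instance, `demo_filter_sticky(0, DEMO_EX_INVALID)` yields 0: a bit that is clear in the register stays clear even when the saved copy still has it set, which is the case discussed in the reply.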
Ping^2 Re: [PATCH 0/6] powerpc/math-emu: e500 SPE float emulation fixes
Ping^2.  I still haven't seen any comments on any of these patches.

-- 
Joseph S. Myers
jos...@codesourcery.com
Ping Re: [PATCH 0/6] powerpc/math-emu: e500 SPE float emulation fixes
Ping.  I haven't seen any comments on any of these patches.

-- 
Joseph S. Myers
jos...@codesourcery.com
[PATCH 2/6] powerpc: fix e500 SPE float rounding inexactness detection
From: Joseph Myers <jos...@codesourcery.com>

The e500 SPE floating-point emulation code for the rounding modes
rounding to positive or negative infinity (which may not be implemented
in hardware) tries to avoid emulating rounding if the result was exact.
However, it tests inexactness using the sticky bit with the cumulative
result of previous operations, rather than with the non-sticky bits
relating to the operation that generated the interrupt.  Furthermore,
when a vector operation generates the interrupt, it's possible that
only one of the low and high parts is inexact, and so only that part
should have rounding emulated.  This results in incorrect rounding of
exact results in these modes when the sticky bit is set from a previous
operation.

(I'm not sure why the rounding interrupts are generated at all when the
result is exact, but empirically the hardware does generate them.)

This patch checks for inexactness using the correct bits of SPEFSCR,
and ensures that rounding only occurs when the relevant part of the
result was actually inexact.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

Previous submission: http://lkml.org/lkml/2013/10/4/497.

diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c
index 59835c6..ecdf35d 100644
--- a/arch/powerpc/math-emu/math_efp.c
+++ b/arch/powerpc/math-emu/math_efp.c
@@ -680,7 +680,8 @@ int speround_handler(struct pt_regs *regs)
 {
 	union dw_union fgpr;
 	int s_lo, s_hi;
-	unsigned long speinsn, type, fc;
+	int lo_inexact, hi_inexact;
+	unsigned long speinsn, type, fc, fptype;
 
 	if (get_user(speinsn, (unsigned int __user *) regs->nip))
 		return -EFAULT;
@@ -693,8 +694,12 @@ int speround_handler(struct pt_regs *regs)
 	__FPU_FPSCR = mfspr(SPRN_SPEFSCR);
 	pr_debug("speinsn:%08lx spefscr:%08lx\n", speinsn, __FPU_FPSCR);
 
+	fptype = (speinsn >> 5) & 0x7;
+
 	/* No need to round if the result is exact */
-	if (!(__FPU_FPSCR & FP_EX_INEXACT))
+	lo_inexact = __FPU_FPSCR & (SPEFSCR_FG | SPEFSCR_FX);
+	hi_inexact = __FPU_FPSCR & (SPEFSCR_FGH | SPEFSCR_FXH);
+	if (!(lo_inexact || (hi_inexact && fptype == VCT)))
 		return 0;
 
 	fc = (speinsn >> 21) & 0x1f;
@@ -705,7 +710,7 @@ int speround_handler(struct pt_regs *regs)
 
 	pr_debug("round fgpr: %08x %08x\n", fgpr.wp[0], fgpr.wp[1]);
 
-	switch ((speinsn >> 5) & 0x7) {
+	switch (fptype) {
 	/* Since SPE instructions on E500 core can handle round to nearest
 	 * and round toward zero with IEEE-754 complied, we just need
 	 * to handle round toward +Inf and round toward -Inf by software.
@@ -728,11 +733,15 @@ int speround_handler(struct pt_regs *regs)
 
 	case VCT:
 		if (FP_ROUNDMODE == FP_RND_PINF) {
-			if (!s_lo) fgpr.wp[1]++; /* Z_low > 0, choose Z1 */
-			if (!s_hi) fgpr.wp[0]++; /* Z_high word > 0, choose Z1 */
+			if (lo_inexact && !s_lo)
+				fgpr.wp[1]++; /* Z_low > 0, choose Z1 */
+			if (hi_inexact && !s_hi)
+				fgpr.wp[0]++; /* Z_high word > 0, choose Z1 */
 		} else { /* round to -Inf */
-			if (s_lo) fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
-			if (s_hi) fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
+			if (lo_inexact && s_lo)
+				fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
+			if (hi_inexact && s_hi)
+				fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
 		}
 		break;
 

-- 
Joseph S. Myers
jos...@codesourcery.com
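As a rough illustration of the check described in this patch, the sketch below separates a sticky inexact flag from per-operation guard/sticky bits. It is only a model: the `DEMO_*` bit values and function names are hypothetical and are not the real SPEFSCR definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration only. */
#define DEMO_FG    0x01u  /* low word: guard bit of the last operation  */
#define DEMO_FX    0x02u  /* low word: sticky-within-operation bit      */
#define DEMO_FGH   0x04u  /* high word: guard bit of the last operation */
#define DEMO_FXH   0x08u  /* high word: sticky-within-operation bit     */
#define DEMO_FINXS 0x100u /* cumulative inexact flag across operations  */

/* Was the low part of the most recent result inexact? */
bool demo_lo_inexact(uint32_t spefscr)
{
	return (spefscr & (DEMO_FG | DEMO_FX)) != 0;
}

/* Was the high part of the most recent result inexact? */
bool demo_hi_inexact(uint32_t spefscr)
{
	return (spefscr & (DEMO_FGH | DEMO_FXH)) != 0;
}

/*
 * Rounding emulation is needed only if the relevant part of this
 * operation's result was inexact.  The high part matters only for
 * vector instructions.  The cumulative flag is deliberately ignored.
 */
bool demo_needs_rounding(uint32_t spefscr, bool is_vector)
{
	return demo_lo_inexact(spefscr)
	       || (demo_hi_inexact(spefscr) && is_vector);
}
```

With this model, a register value that has only `DEMO_FINXS` set (left over from an earlier operation) does not trigger rounding, whereas the old test on the cumulative flag would have.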
[PATCH 3/6] math-emu: fix floating-point to integer unsigned saturation
From: Joseph Myers <jos...@codesourcery.com>

The math-emu macros _FP_TO_INT and _FP_TO_INT_ROUND are supposed to
saturate their results for out-of-range arguments, except in the case
rsigned == 2 (when instead the low bits of the result are taken).
However, in the case rsigned == 0 (converting to unsigned integers),
they mistakenly produce 0 for positive results and the maximum unsigned
integer for negative results, the opposite of correct unsigned
saturation.  This patch fixes the logic.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

Previous submission: http://lkml.org/lkml/2013/10/8/694.

I have made the corresponding changes to the glibc/libgcc copy of this
code, given that it would be desirable to resync the Linux and
glibc/libgcc copies (the latter has had many enhancements and bug fixes
since it was copied into Linux), although strictly this incorrect
saturation is only a bug when trying to emulate particular instruction
semantics, not when used in userspace to implement C operations where
the results of out-of-range conversions are unspecified or undefined.

diff --git a/include/math-emu/op-common.h b/include/math-emu/op-common.h
index 9696a5e..70fe5e9 100644
--- a/include/math-emu/op-common.h
+++ b/include/math-emu/op-common.h
@@ -685,7 +685,7 @@ do { \
           else \
             { \
               r = 0; \
-              if (X##_s) \
+              if (!X##_s) \
                 r = ~r; \
             } \
           FP_SET_EXCEPTION(FP_EX_INVALID); \
@@ -762,7 +762,7 @@ do { \
           if (!rsigned) \
             { \
               r = 0; \
-              if (X##_s) \
+              if (!X##_s) \
                 r = ~r; \
             } \
           else if (rsigned != 2) \

-- 
Joseph S. Myers
jos...@codesourcery.com
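The corrected branch can be illustrated in isolation. This is a small sketch of the saturation rule only, assuming a 32-bit result; `demo_saturate_unsigned` is a hypothetical helper name, and `sign` plays the role of `X##_s` (nonzero for a negative input).

```c
#include <stdint.h>

/*
 * Saturated result for an out-of-range conversion to an unsigned
 * 32-bit integer: negative inputs clamp to 0, positive inputs clamp
 * to the maximum value.  This mirrors the fixed "if (!X##_s) r = ~r;".
 */
uint32_t demo_saturate_unsigned(int sign)
{
	uint32_t r = 0;

	if (!sign)
		r = ~r;	/* all ones: largest unsigned value */
	return r;
}
```

Before the fix the condition was inverted, so the equivalent of `demo_saturate_unsigned(0)` would have returned 0 and a negative input would have returned all ones.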
[PATCH 0/6] powerpc/math-emu: e500 SPE float emulation fixes
This patch series fixes various problems with the floating-point
emulation code for powerpc e500 SPE (some being issues with the
e500-specific emulation code, some with the generic math-emu headers).

All six patches were sent individually last month as the issues were
identified and fixed in the course of preparing the e500 glibc port,
and received no comments.  There are no substantive changes to the
patches in this version, but I've retested the glibc port (which is now
upstream, along with all the generic math-emu changes relevant to the
glibc soft-fp code, and various fixes to soft-fp corresponding to fixes
in the kernel code in the hope that at some point we can get the kernel
using the current soft-fp code again) with current kernel sources with
this patch series applied.

The only dependencies between patches in this series should be that
patch 5 (fix e500 SPE float to integer and fixed-point conversions)
depends on patch 2 (fix e500 SPE float rounding inexactness detection).
Other than that, I think any subset of the patches can be applied in
any order, if some subset seems OK but there are concerns about other
patches in the series.

-- 
Joseph S. Myers
jos...@codesourcery.com
[PATCH 4/6] math-emu: fix floating-point to integer overflow detection
From: Joseph Myers <jos...@codesourcery.com>

On overflow, the math-emu macro _FP_TO_INT_ROUND tries to saturate its
result (subject to the value of rsigned specifying the desired overflow
semantics).  However, if the rounding step has the effect of increasing
the exponent so as to cause overflow (if the rounded result is 1 larger
than the largest positive value with the given number of bits, allowing
for signedness), the overflow does not get detected, meaning that for
unsigned results 0 is produced instead of the maximum unsigned integer
with the given number of bits, without an exception being raised for
overflow, and that for signed results the minimum (negative) value is
produced instead of the maximum (positive) value, again without an
exception.  This patch makes the code check for rounding increasing the
exponent and adjusts the exponent value as needed for the overflow
check.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

Previous submission: http://lkml.org/lkml/2013/10/8/700.

This macro is not present in the glibc/libgcc version of the code.

It remains the case both before and after this patch that the
conversions wrongly treat a signed result of the most negative integer
as an overflow, when actually only that integer minus 1 or smaller
should be an overflow, although this only means an incorrect exception
rather than affecting the value returned; that was one of the bugs I
fixed in the glibc/libgcc version of this code in 2006 (as part of a
major overhaul of the code including various interface changes, so not
trivially backportable to the kernel version).

diff --git a/include/math-emu/op-common.h b/include/math-emu/op-common.h
index 70fe5e9..6bdf8c6 100644
--- a/include/math-emu/op-common.h
+++ b/include/math-emu/op-common.h
@@ -743,12 +743,17 @@ do { \
         } \
       else \
         { \
+          int _lz0, _lz1; \
           if (X##_e <= -_FP_WORKBITS - 1) \
             _FP_FRAC_SET_##wc(X, _FP_MINFRAC_##wc); \
           else \
             _FP_FRAC_SRS_##wc(X, _FP_FRACBITS_##fs - 1 - X##_e, \
                               _FP_WFRACBITS_##fs); \
+          _FP_FRAC_CLZ_##wc(_lz0, X); \
           _FP_ROUND(wc, X); \
+          _FP_FRAC_CLZ_##wc(_lz1, X); \
+          if (_lz1 < _lz0) \
+            X##_e++; /* For overflow detection.  */ \
           _FP_FRAC_SRL_##wc(X, _FP_WORKBITS); \
           _FP_FRAC_ASSEMBLE_##wc(r, X, rsize); \
         } \

-- 
Joseph S. Myers
jos...@codesourcery.com
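The leading-zero comparison used by this patch can be shown on a plain 32-bit fraction. The sketch below is a simplified model, not the math-emu macros: `DEMO_WORKBITS`, the helper names, and the round-half-up rule standing in for `_FP_ROUND` are all assumptions made for illustration, and wraparound of the addition for very large inputs is not handled.

```c
#include <stdint.h>

#define DEMO_WORKBITS 3	/* hypothetical number of guard bits */

/* Count leading zeros of a 32-bit value (32 for zero). */
int demo_clz32(uint32_t x)
{
	int n = 0;

	if (x == 0)
		return 32;
	while (!(x & 0x80000000u)) {
		x <<= 1;
		n++;
	}
	return n;
}

/* Simplified stand-in for rounding: round half up, clear guard bits. */
uint32_t demo_round(uint32_t frac)
{
	uint32_t half = 1u << (DEMO_WORKBITS - 1);
	uint32_t mask = (1u << DEMO_WORKBITS) - 1;

	return (frac + half) & ~mask;
}

/*
 * If rounding carried into a new top bit, the rounded value has fewer
 * leading zeros than before, so the exponent used for the overflow
 * check must be one larger.
 */
int demo_exponent_after_round(uint32_t frac, int exp)
{
	int lz0 = demo_clz32(frac);
	int lz1 = demo_clz32(demo_round(frac));

	return (lz1 < lz0) ? exp + 1 : exp;
}
```

For example, with three guard bits the fraction 0x7ff represents 255.875 (exponent 7); rounding gives 0x800, i.e. 256, and the helper reports exponent 8. A fraction of 0x7f9 rounds to 0x7f8 and the exponent stays at 7.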
[PATCH 6/6] powerpc: fix e500 SPE float SIGFPE generation
From: Joseph Myers <jos...@codesourcery.com>

The e500 SPE floating-point emulation code is called from
SPEFloatingPointException and SPEFloatingPointRoundException in
arch/powerpc/kernel/traps.c.  Those functions have support for
generating SIGFPE, but do_spe_mathemu and speround_handler don't
generate a return value to indicate that this should be done.  Such a
return value should depend on whether an exception is raised that has
been set via prctl to generate SIGFPE.  This patch adds the relevant
logic in these functions so that SIGFPE is generated as expected by the
glibc testsuite.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

Previous submission: http://lkml.org/lkml/2013/10/10/626.

diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c
index 01a0abb..28337c9 100644
--- a/arch/powerpc/math-emu/math_efp.c
+++ b/arch/powerpc/math-emu/math_efp.c
@@ -20,6 +20,7 @@
  */
 
 #include <linux/types.h>
+#include <linux/prctl.h>
 
 #include <asm/uaccess.h>
 #include <asm/reg.h>
@@ -691,6 +692,23 @@ update_regs:
 	pr_debug("va: %08x %08x\n", va.wp[0], va.wp[1]);
 	pr_debug("vb: %08x %08x\n", vb.wp[0], vb.wp[1]);
 
+	if (current->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE) {
+		if ((FP_CUR_EXCEPTIONS & FP_EX_DIVZERO)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_DIV))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_OVERFLOW)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_OVF))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_UNDERFLOW)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_UND))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_INEXACT)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_RES))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_INVALID)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_INV))
+			return 1;
+	}
 	return 0;
 
 illegal:
@@ -867,6 +885,8 @@ int speround_handler(struct pt_regs *regs)
 
 	pr_debug(" to fgpr: %08x %08x\n", fgpr.wp[0], fgpr.wp[1]);
 
+	if (current->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE)
+		return (current->thread.fpexc_mode & PR_FP_EXC_RES) ? 1 : 0;
 	return 0;
 }
 

-- 
Joseph S. Myers
jos...@codesourcery.com
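The decision added to do_spe_mathemu amounts to pairing each raised exception with its enable bit. The following is a standalone sketch of that pairing, written as a table lookup rather than the chain of `if` statements in the patch. All `DEMO_*` constants and `demo_should_signal` are hypothetical names with made-up values; they stand in for the `FP_EX_*` and `PR_FP_EXC_*` macros and do not reproduce their real values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enable bits (model of the prctl mode word). */
#define DEMO_SW_ENABLE 0x80u
#define DEMO_EN_DIV    0x01u
#define DEMO_EN_OVF    0x02u
#define DEMO_EN_UND    0x04u
#define DEMO_EN_RES    0x08u
#define DEMO_EN_INV    0x10u

/* Hypothetical flags for exceptions raised by the current operation. */
#define DEMO_EX_DIVZERO   0x0100u
#define DEMO_EX_OVERFLOW  0x0200u
#define DEMO_EX_UNDERFLOW 0x0400u
#define DEMO_EX_INEXACT   0x0800u
#define DEMO_EX_INVALID   0x1000u

/* Return true if any raised exception has its enable bit set. */
bool demo_should_signal(uint32_t cur_exceptions, uint32_t fpexc_mode)
{
	static const struct {
		uint32_t raised;
		uint32_t enabled;
	} map[] = {
		{ DEMO_EX_DIVZERO,   DEMO_EN_DIV },
		{ DEMO_EX_OVERFLOW,  DEMO_EN_OVF },
		{ DEMO_EX_UNDERFLOW, DEMO_EN_UND },
		{ DEMO_EX_INEXACT,   DEMO_EN_RES },
		{ DEMO_EX_INVALID,   DEMO_EN_INV },
	};
	size_t i;

	/* Software exception handling must be switched on at all. */
	if (!(fpexc_mode & DEMO_SW_ENABLE))
		return false;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if ((cur_exceptions & map[i].raised)
		    && (fpexc_mode & map[i].enabled))
			return true;
	return false;
}
```

A true result corresponds to the handler returning 1, which the callers in traps.c treat as a request to deliver SIGFPE.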
[PATCH 1/6] powerpc: fix exception clearing in e500 SPE float emulation
by the
+	 * processor for non-finite input, but was not set before the
+	 * instruction being emulated, clear it.  Likewise for the
+	 * underflow bit, which may have been set by the processor
+	 * for exact underflow, not just inexact underflow when the
+	 * flag should be set for IEEE 754 semantics.  Other sticky
+	 * exceptions will only be set by the processor when they are
+	 * correct according to IEEE 754 semantics, and we must not
+	 * clear sticky bits that were already set before the emulated
+	 * instruction as they represent the user-visible sticky
+	 * exception status.  "inexact" traps to kernel are not
+	 * required for IEEE semantics and are not enabled by default,
+	 * so the "inexact" sticky bit may have been set by a previous
+	 * instruction without the kernel being aware of it.
+	 */
+	__FPU_FPSCR
+	  &= ~(FP_EX_INVALID | FP_EX_UNDERFLOW) | current->thread.spefscr_last;
 	__FPU_FPSCR |= (FP_CUR_EXCEPTIONS & FP_EX_MASK);
 	mtspr(SPRN_SPEFSCR, __FPU_FPSCR);
+	current->thread.spefscr_last = __FPU_FPSCR;
 
 	current->thread.evr[fc] = vc.wp[0];
 	regs->gpr[fc] = vc.wp[1];

-- 
Joseph S. Myers
jos...@codesourcery.com
[PATCH] powerpc: fix e500 SPE float SIGFPE generation
From: Joseph Myers <jos...@codesourcery.com>

The e500 SPE floating-point emulation code is called from
SPEFloatingPointException and SPEFloatingPointRoundException in
arch/powerpc/kernel/traps.c.  Those functions have support for
generating SIGFPE, but do_spe_mathemu and speround_handler don't
generate a return value to indicate that this should be done.  Such a
return value should depend on whether an exception is raised that has
been set via prctl to generate SIGFPE.  This patch adds the relevant
logic in these functions so that SIGFPE is generated as expected by the
glibc testsuite.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

This patch is not intended to depend on any of my previous patches
http://lkml.org/lkml/2013/10/4/495,
http://lkml.org/lkml/2013/10/4/497,
http://lkml.org/lkml/2013/10/8/694,
http://lkml.org/lkml/2013/10/8/700 and
http://lkml.org/lkml/2013/10/8/705, although testing has been on top of
that patch series and having all six patches will produce the best
results.

diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c
index 01a0abb..28337c9 100644
--- a/arch/powerpc/math-emu/math_efp.c
+++ b/arch/powerpc/math-emu/math_efp.c
@@ -20,6 +20,7 @@
  */
 
 #include <linux/types.h>
+#include <linux/prctl.h>
 
 #include <asm/uaccess.h>
 #include <asm/reg.h>
@@ -691,6 +692,23 @@ update_regs:
 	pr_debug("va: %08x %08x\n", va.wp[0], va.wp[1]);
 	pr_debug("vb: %08x %08x\n", vb.wp[0], vb.wp[1]);
 
+	if (current->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE) {
+		if ((FP_CUR_EXCEPTIONS & FP_EX_DIVZERO)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_DIV))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_OVERFLOW)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_OVF))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_UNDERFLOW)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_UND))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_INEXACT)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_RES))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_INVALID)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_INV))
+			return 1;
+	}
 	return 0;
 
 illegal:
@@ -867,6 +885,8 @@ int speround_handler(struct pt_regs *regs)
 
 	pr_debug(" to fgpr: %08x %08x\n", fgpr.wp[0], fgpr.wp[1]);
 
+	if (current->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE)
+		return (current->thread.fpexc_mode & PR_FP_EXC_RES) ? 1 : 0;
 	return 0;
 }
 

-- 
Joseph S. Myers
jos...@codesourcery.com
[PATCH] math-emu: fix floating-point to integer unsigned saturation
From: Joseph Myers <jos...@codesourcery.com>

The math-emu macros _FP_TO_INT and _FP_TO_INT_ROUND are supposed to
saturate their results for out-of-range arguments, except in the case
rsigned == 2 (when instead the low bits of the result are taken).
However, in the case rsigned == 0 (converting to unsigned integers),
they mistakenly produce 0 for positive results and the maximum unsigned
integer for negative results, the opposite of correct unsigned
saturation.  This patch fixes the logic.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

I intend to make the corresponding changes to the glibc/libgcc copy of
this code, given that it would be desirable to resync the Linux and
glibc/libgcc copies (the latter has had many enhancements and bug fixes
since it was copied into Linux), although strictly this incorrect
saturation is only a bug when trying to emulate particular instruction
semantics, not when used in userspace to implement C operations where
the results of out-of-range conversions are unspecified or undefined.

diff --git a/include/math-emu/op-common.h b/include/math-emu/op-common.h
index 9696a5e..70fe5e9 100644
--- a/include/math-emu/op-common.h
+++ b/include/math-emu/op-common.h
@@ -685,7 +685,7 @@ do { \
           else \
             { \
               r = 0; \
-              if (X##_s) \
+              if (!X##_s) \
                 r = ~r; \
             } \
           FP_SET_EXCEPTION(FP_EX_INVALID); \
@@ -762,7 +762,7 @@ do { \
           if (!rsigned) \
             { \
               r = 0; \
-              if (X##_s) \
+              if (!X##_s) \
                 r = ~r; \
             } \
           else if (rsigned != 2) \

-- 
Joseph S. Myers
jos...@codesourcery.com
[PATCH] math-emu: fix floating-point to integer overflow detection
From: Joseph Myers <jos...@codesourcery.com>

On overflow, the math-emu macro _FP_TO_INT_ROUND tries to saturate its
result (subject to the value of rsigned specifying the desired overflow
semantics).  However, if the rounding step has the effect of increasing
the exponent so as to cause overflow (if the rounded result is 1 larger
than the largest positive value with the given number of bits, allowing
for signedness), the overflow does not get detected, meaning that for
unsigned results 0 is produced instead of the maximum unsigned integer
with the given number of bits, without an exception being raised for
overflow, and that for signed results the minimum (negative) value is
produced instead of the maximum (positive) value, again without an
exception.  This patch makes the code check for rounding increasing the
exponent and adjusts the exponent value as needed for the overflow
check.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

This macro is not present in the glibc/libgcc version of the code.

This patch is independent of my separate patch
http://lkml.org/lkml/2013/10/8/694 to fix the results for unsigned
saturation, although you need both patches together to get the correct
results for the affected unsigned overflow case.

It remains the case both before and after this patch that the
conversions wrongly treat a signed result of the most negative integer
as an overflow, when actually only that integer minus 1 or smaller
should be an overflow, although this only means an incorrect exception
rather than affecting the value returned; that was one of the bugs I
fixed in the glibc/libgcc version of this code in 2006 (as part of a
major overhaul of the code including various interface changes, so not
trivially backportable to the kernel version).

diff --git a/include/math-emu/op-common.h b/include/math-emu/op-common.h
index 9696a5e..6bdf8c6 100644
--- a/include/math-emu/op-common.h
+++ b/include/math-emu/op-common.h
@@ -743,12 +743,17 @@ do { \
         } \
       else \
         { \
+          int _lz0, _lz1; \
           if (X##_e <= -_FP_WORKBITS - 1) \
             _FP_FRAC_SET_##wc(X, _FP_MINFRAC_##wc); \
           else \
             _FP_FRAC_SRS_##wc(X, _FP_FRACBITS_##fs - 1 - X##_e, \
                               _FP_WFRACBITS_##fs); \
+          _FP_FRAC_CLZ_##wc(_lz0, X); \
           _FP_ROUND(wc, X); \
+          _FP_FRAC_CLZ_##wc(_lz1, X); \
+          if (_lz1 < _lz0) \
+            X##_e++; /* For overflow detection.  */ \
           _FP_FRAC_SRL_##wc(X, _FP_WORKBITS); \
           _FP_FRAC_ASSEMBLE_##wc(r, X, rsize); \
         } \

-- 
Joseph S. Myers
jos...@codesourcery.com
[PATCH] powerpc: fix e500 SPE float to integer and fixed-point conversions
:
+		fp_result = 0;
+		s_lo = 0;
+		s_hi = 0;
+		break;
+
+	case EFSCTSI:
+	case EFSCTSF:
+		fp_result = 0;
+		/* Recover the sign of a zero result if possible.  */
+		if (fgpr.wp[1] == 0)
+			s_lo = regs->gpr[fb] & SIGN_BIT_S;
+		break;
+
+	case EVFSCTSI:
+	case EVFSCTSF:
+		fp_result = 0;
+		/* Recover the sign of a zero result if possible.  */
+		if (fgpr.wp[1] == 0)
+			s_lo = regs->gpr[fb] & SIGN_BIT_S;
+		if (fgpr.wp[0] == 0)
+			s_hi = current->thread.evr[fb] & SIGN_BIT_S;
+		break;
+
+	case EFDCTSI:
+	case EFDCTSF:
+		fp_result = 0;
+		s_hi = s_lo;
+		/* Recover the sign of a zero result if possible.  */
+		if (fgpr.wp[1] == 0)
+			s_hi = current->thread.evr[fb] & SIGN_BIT_S;
+		break;
+
+	default:
+		fp_result = 1;
+		break;
+	}
+
 	pr_debug("round fgpr: %08x %08x\n", fgpr.wp[0], fgpr.wp[1]);
 
 	switch (fptype) {
@@ -719,15 +809,30 @@ int speround_handler(struct pt_regs *regs)
 		if ((FP_ROUNDMODE) == FP_RND_PINF) {
 			if (!s_lo) fgpr.wp[1]++; /* Z > 0, choose Z1 */
 		} else { /* round to -Inf */
-			if (s_lo) fgpr.wp[1]++; /* Z < 0, choose Z2 */
+			if (s_lo) {
+				if (fp_result)
+					fgpr.wp[1]++; /* Z < 0, choose Z2 */
+				else
+					fgpr.wp[1]--; /* Z < 0, choose Z2 */
+			}
 		}
 		break;
 
 	case DPFP:
 		if (FP_ROUNDMODE == FP_RND_PINF) {
-			if (!s_hi) fgpr.dp[0]++; /* Z > 0, choose Z1 */
+			if (!s_hi) {
+				if (fp_result)
+					fgpr.dp[0]++; /* Z > 0, choose Z1 */
+				else
+					fgpr.wp[1]++; /* Z > 0, choose Z1 */
+			}
 		} else { /* round to -Inf */
-			if (s_hi) fgpr.dp[0]++; /* Z < 0, choose Z2 */
+			if (s_hi) {
+				if (fp_result)
+					fgpr.dp[0]++; /* Z < 0, choose Z2 */
+				else
+					fgpr.wp[1]--; /* Z < 0, choose Z2 */
+			}
 		}
 		break;
 
@@ -738,10 +843,18 @@ int speround_handler(struct pt_regs *regs)
 			if (hi_inexact && !s_hi)
 				fgpr.wp[0]++; /* Z_high word > 0, choose Z1 */
 		} else { /* round to -Inf */
-			if (lo_inexact && s_lo)
-				fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
-			if (hi_inexact && s_hi)
-				fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
+			if (lo_inexact && s_lo) {
+				if (fp_result)
+					fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
+				else
+					fgpr.wp[1]--; /* Z_low < 0, choose Z2 */
+			}
+			if (hi_inexact && s_hi) {
+				if (fp_result)
+					fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
+				else
+					fgpr.wp[0]--; /* Z_high < 0, choose Z2 */
+			}
 		}
 		break;
 

-- 
Joseph S. Myers
jos...@codesourcery.com
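The reason the diff distinguishes `fp_result` when rounding toward -Inf can be seen with two tiny helpers. This is a sketch under stated assumptions, not the kernel logic: it assumes the value handed to the handler is the result truncated toward zero, the helper names are hypothetical, and overflow at the most negative integer is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Floating-point result: the value is stored as sign and magnitude, so
 * for a negative inexact result, adding 1 to the bit pattern moves to
 * the next value further from zero, i.e. toward -Inf.
 */
uint32_t demo_round_down_float_bits(uint32_t bits, bool inexact)
{
	if (inexact && (bits & 0x80000000u))
		return bits + 1;
	return bits;
}

/*
 * Integer result: the value is two's complement, so moving toward -Inf
 * means subtracting 1.  The sign is passed separately because a
 * truncated result of 0 may come from a negative input such as -0.5.
 */
int32_t demo_round_down_int(int32_t truncated, bool inexact, bool negative)
{
	if (inexact && negative)
		return truncated - 1;
	return truncated;
}
```

For example, an input of -2.5 truncated to -2 becomes -3, and an input of -0.5 truncated to 0 becomes -1, which is why the patch recovers the sign of a zero result from the source operand.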
Re: [PATCH] powerpc: fix e500 SPE float to integer and fixed-point conversions
On Tue, 8 Oct 2013, Joseph S. Myers wrote:

> I'll send as a followup the testcase I used for verifying that the
> instructions (other than the theoretical conversions to 64-bit
> integers) produce the correct results.  In addition, this has been
> tested with the glibc testsuite (with the e500 port as posted at
> https://sourceware.org/ml/libc-alpha/2013-10/msg00195.html), where it
> improves the libm test results.

Here is that testcase.

#include <stdio.h>
#include <stdlib.h>

#define INFF __builtin_inff ()
#define INFD __builtin_inf ()
#define NANF __builtin_nanf ("")
#define NAND __builtin_nan ("")

/* e500 rounding modes: 0 = nearest, 1 = zero, 2 = up, 3 = down.  */

static inline void
set_rm (unsigned int mode)
{
  unsigned int spefscr;
  asm volatile ("mfspefscr %0" : "=r" (spefscr));
  spefscr = (spefscr & ~3) | mode;
  asm volatile ("mtspefscr %0" : : "r" (spefscr));
}

static int success_count, failure_count;

struct float_test_data
{
  float input;
  unsigned int expected[4];
};

struct double_test_data
{
  double input;
  unsigned int expected[4];
};

typedef float vfloat __attribute__ ((vector_size (8)));
typedef unsigned int vuint __attribute__ ((vector_size (8)));

union vfloat_union
{
  vfloat vf;
  float f[2];
};

union vuint_union
{
  vuint vui;
  unsigned int ui[2];
};

#define T(A, B, C, D, E) { (A), { (B), (C), (D), (E) } }
#define TZ(A, B) T (A, B, B, B, B)

static void
check_result (const char *insn, double input, unsigned int rm,
	      unsigned int expected, unsigned int res)
{
  if (res == expected)
    success_count++;
  else
    {
      failure_count++;
      printf ("%s %a mode %u expected 0x%x (%d) got 0x%x (%d)\n",
	      insn, input, rm, expected, (int) expected, res, (int) res);
    }
}

#define RUN_FLOAT_TESTS(INSN) \
static void \
test_##INSN (void) \
{ \
  size_t i; \
  for (i = 0; \
       i < sizeof (INSN##_test_data) / sizeof (INSN##_test_data[0]); \
       i++) \
    { \
      unsigned int rm; \
      for (rm = 0; rm <= 3; rm++) \
	{ \
	  set_rm (rm); \
	  unsigned int res; \
	  asm volatile (#INSN " %0, %1" \
			: "=r" (res) \
			: "r" (INSN##_test_data[i].input)); \
	  check_result (#INSN, INSN##_test_data[i].input, rm, \
			INSN##_test_data[i].expected[rm], res); \
	} \
    } \
}

#define RUN_VFLOAT_TESTS(INSN, TINSN) \
static void \
test_##INSN (void) \
{ \
  size_t i; \
  for (i = 0; \
       i < sizeof (TINSN##_test_data) / sizeof (TINSN##_test_data[0]); \
       i++) \
    { \
      unsigned int rm; \
      for (rm = 0; rm <= 3; rm++) \
	{ \
	  set_rm (rm); \
	  union vfloat_union varg; \
	  union vuint_union vres; \
	  varg.f[0] = TINSN##_test_data[i].input; \
	  varg.f[1] = 0; \
	  asm volatile (#INSN " %0, %1" \
			: "=r" (vres.vui) \
			: "r" (varg.vf)); \
	  check_result (#INSN " (high)", TINSN##_test_data[i].input, \
			rm, TINSN##_test_data[i].expected[rm], \
			vres.ui[0
[PATCH] powerpc: fix e500 SPE float rounding inexactness detection
From: Joseph Myers <jos...@codesourcery.com>

The e500 SPE floating-point emulation code for the rounding modes
rounding to positive or negative infinity (which may not be implemented
in hardware) tries to avoid emulating rounding if the result was exact.
However, it tests inexactness using the sticky bit with the cumulative
result of previous operations, rather than with the non-sticky bits
relating to the operation that generated the interrupt.  Furthermore,
when a vector operation generates the interrupt, it's possible that
only one of the low and high parts is inexact, and so only that part
should have rounding emulated.  This results in incorrect rounding of
exact results in these modes when the sticky bit is set from a previous
operation.

(I'm not sure why the rounding interrupts are generated at all when the
result is exact, but empirically the hardware does generate them.)

This patch checks for inexactness using the correct bits of SPEFSCR,
and ensures that rounding only occurs when the relevant part of the
result was actually inexact.

Signed-off-by: Joseph Myers <jos...@codesourcery.com>

---

diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c
index a73f088..ecdf35d 100644
--- a/arch/powerpc/math-emu/math_efp.c
+++ b/arch/powerpc/math-emu/math_efp.c
@@ -662,7 +680,8 @@ int speround_handler(struct pt_regs *regs)
 {
 	union dw_union fgpr;
 	int s_lo, s_hi;
-	unsigned long speinsn, type, fc;
+	int lo_inexact, hi_inexact;
+	unsigned long speinsn, type, fc, fptype;
 
 	if (get_user(speinsn, (unsigned int __user *) regs->nip))
 		return -EFAULT;
@@ -675,8 +694,12 @@ int speround_handler(struct pt_regs *regs)
 	__FPU_FPSCR = mfspr(SPRN_SPEFSCR);
 	pr_debug("speinsn:%08lx spefscr:%08lx\n", speinsn, __FPU_FPSCR);
 
+	fptype = (speinsn >> 5) & 0x7;
+
 	/* No need to round if the result is exact */
-	if (!(__FPU_FPSCR & FP_EX_INEXACT))
+	lo_inexact = __FPU_FPSCR & (SPEFSCR_FG | SPEFSCR_FX);
+	hi_inexact = __FPU_FPSCR & (SPEFSCR_FGH | SPEFSCR_FXH);
+	if (!(lo_inexact || (hi_inexact && fptype == VCT)))
 		return 0;
 
 	fc = (speinsn >> 21) & 0x1f;
@@ -687,7 +710,7 @@ int speround_handler(struct pt_regs *regs)
 
 	pr_debug("round fgpr: %08x %08x\n", fgpr.wp[0], fgpr.wp[1]);
 
-	switch ((speinsn >> 5) & 0x7) {
+	switch (fptype) {
 	/* Since SPE instructions on E500 core can handle round to nearest
 	 * and round toward zero with IEEE-754 complied, we just need
 	 * to handle round toward +Inf and round toward -Inf by software.
@@ -710,11 +733,15 @@ int speround_handler(struct pt_regs *regs)
 
 	case VCT:
 		if (FP_ROUNDMODE == FP_RND_PINF) {
-			if (!s_lo) fgpr.wp[1]++; /* Z_low > 0, choose Z1 */
-			if (!s_hi) fgpr.wp[0]++; /* Z_high word > 0, choose Z1 */
+			if (lo_inexact && !s_lo)
+				fgpr.wp[1]++; /* Z_low > 0, choose Z1 */
+			if (hi_inexact && !s_hi)
+				fgpr.wp[0]++; /* Z_high word > 0, choose Z1 */
 		} else { /* round to -Inf */
-			if (s_lo) fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
-			if (s_hi) fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
+			if (lo_inexact && s_lo)
+				fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
+			if (hi_inexact && s_hi)
+				fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
 		}
 		break;
 

-- 
Joseph S. Myers
jos...@codesourcery.com
[PATCH] powerpc: fix exception clearing in e500 SPE float emulation
23) & 0x7)) << 2));
 
 update_regs:
-	__FPU_FPSCR &= ~FP_EX_MASK;
+	/*
+	 * If the invalid exception sticky bit was set by the
+	 * processor for non-finite input, but was not set before the
+	 * instruction being emulated, clear it.  Likewise for the
+	 * underflow bit, which may have been set by the processor
+	 * for exact underflow, not just inexact underflow when the
+	 * flag should be set for IEEE 754 semantics.  Other sticky
+	 * exceptions will only be set by the processor when they are
+	 * correct according to IEEE 754 semantics, and we must not
+	 * clear sticky bits that were already set before the emulated
+	 * instruction as they represent the user-visible sticky
+	 * exception status.  "inexact" traps to kernel are not
+	 * required for IEEE semantics and are not enabled by default,
+	 * so the "inexact" sticky bit may have been set by a previous
+	 * instruction without the kernel being aware of it.
+	 */
+	__FPU_FPSCR
+	  &= ~(FP_EX_INVALID | FP_EX_UNDERFLOW) | current->thread.spefscr_last;
 	__FPU_FPSCR |= (FP_CUR_EXCEPTIONS & FP_EX_MASK);
 	mtspr(SPRN_SPEFSCR, __FPU_FPSCR);
+	current->thread.spefscr_last = __FPU_FPSCR;
 
 	current->thread.evr[fc] = vc.wp[0];
 	regs->gpr[fc] = vc.wp[1];

-- 
Joseph S. Myers
jos...@codesourcery.com