Re: fsqrt
The interesting thing I'm finding is that our math-emu is actually quite busted :-)

For example, look at fsqrt. It's defined as type AB, which is incorrect; it should be type XB. It ends up looking for its arguments in the wrong registers. The same goes for fsqrts, fre, and a few others.

Also, quite a few functions are simply left unimplemented (fre is completely missing from the decode; fres is there but returns -ENOSYS, as does frsqrte, etc.).

I'll post a quick fix for fsqrt{s} that I need for anaconda, but I would strongly encourage somebody from FSL (primary users of that stuff still) to have a close look at the rest. It shouldn't be terribly hard to fix them up and add the few missing instructions, even if not with the amount of precision requested by the spec (better than -ENOSYS).

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
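To make the AB versus XB point concrete, here is a minimal user-space sketch of how the operand form changes which register a one-source routine such as fsqrt ends up reading. It is not the math-emu decode itself: `operand_form`, `reg_field` and `pick_operands` are hypothetical names, and the sketch assumes that an AB-style decode hands the FRA field to the routine as its first source while an XB-style decode hands it FRB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tags named after the AB and XB operand types discussed above. */
enum operand_form { FORM_AB, FORM_XB };

struct operands {
    unsigned dest; /* FRT: destination FPR number */
    unsigned src;  /* FPR number a one-source routine would read */
};

/* Extract a 5-bit register field that starts `shift` bits above bit 0. */
static unsigned reg_field(uint32_t insn, unsigned shift)
{
    assert(shift <= 27);
    return (insn >> shift) & 0x1f;
}

static struct operands pick_operands(uint32_t insn, enum operand_form form)
{
    struct operands ops;

    ops.dest = reg_field(insn, 21);
    /* Assumption: AB passes FRA as the first source, XB passes FRB. */
    ops.src = (form == FORM_AB) ? reg_field(insn, 16) : reg_field(insn, 11);
    return ops;
}
```

For `fsqrt f1, f2` (encoded as 0xFC20102C), the XB choice yields source register 2, while the AB choice yields register 0, because the FRA field of fsqrt is always zero. That is the "wrong registers" symptom described in the message.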
Re: fsqrt
On Jun 7, 2013, at 4:30 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote:

> Right, looking more, the code really sucks.

Hey! :)

> …
> Either you use the existing apparent ability for MATH_EMU to operate in minimal mode, ie, load/store/fmr only (which seems to do exactly the same thing as the code in softemu8xx.c which we can then get rid of), or just get rid of that minimal mode altogether.

The purpose of that minimal code was that the signal handlers were hand-coded assembly that always did the FPU load/store regardless of compiler flags. Over time, it became convenient to emulate a few others, even with user space math emulation, for similar reasons.

> And while at it make it a general config option for all soft-emu processors (there is no bloody reason why that should be 8xx specific) or just get rid of the whole concept of half-emulation.

Looks like the code just evolved and the 8xx was never cleaned up. The partial emulation is a convenience for the libraries that may still have FP instructions directly coded. There are also permutations, such as a real FPU that lacks some instructions or no FPU at all, that affect how operands are fetched and results are stored, and these need to be considered for the various configuration options.

Thanks.

-- Dan
Re: fsqrt
Hi Ben.

On Jun 7, 2013, at 5:34 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote:

> The question is whether this is still relevant ?

The only answer I could provide is that it's dependent upon the libraries and how the distributions are built. It's also dependent upon processors with hardware FP that don't implement all instructions in hardware (who had that bright idea? :)) If distributions are fully all soft-fp in user space or all hardware FP, it removes the one reason that started the whole partial emulation option.

> … And if the answer is yes,

There are multiple options, but I believe they are solved today. One is the libraries coded with hardware load/store that are used by soft-fp, another is hardware FP that doesn't implement all instructions in hardware (which it seems is the basis of this thread, although I thought it was already solved). The variation here is that in the first case you have to read/write user space soft-fp stack registers, while in the latter you read/write real FP registers. There used to be a third variation where the stack was allocated and the emulation had to write both places due to compiler function APIs or optimizations. Of course, then there is the full-up kernel emulation where hardware is entirely lacking.

> … we still want that minimum emulation of load/stores/fmr as an option, is there any reason why we can't replace the one in softemu8xx with the existing (and unused) equivalent in do_mathemu ?

It appears to me that the 8xx custom code can be removed. I guess I should try to boot it up, if anyone even cares these days. :)

Thanks.

-- Dan
RE: fsqrt
-----Original Message-----
From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]

> Another question... Do you guys happen to have a patch to emulate fsqrt in the kernel ?
> Fedora 19 seems to be using it ... among others.

You mean this one
arch/powerpc/math-emu/fsqrt.c ?

Yes.

Roy
Re: fsqrt
On 06/07/2013 03:41 PM, Benjamin Herrenschmidt wrote:
> Another question... Do you guys happen to have a patch to emulate fsqrt in the kernel ?

Seems this is already emulated:

arch/powerpc/math-emu/fsqrt.c

You can enable CONFIG_MATH_EMULATION to try.

Tiejun
Re: fsqrt
On Fri, 2013-06-07 at 15:46 +0800, tiejun.chen wrote:
> On 06/07/2013 03:41 PM, Benjamin Herrenschmidt wrote:
> > Another question... Do you guys happen to have a patch to emulate fsqrt in the kernel ?
>
> Seems this is already emulated:
>
> arch/powerpc/math-emu/fsqrt.c
>
> You can enable CONFIG_MATH_EMULATION to try.

Is math emu expected to work at all on top of a real FPU ? I thought it didn't ... maybe I'm wrong.

Cheers,
Ben.
Re: fsqrt
On Fri, 2013-06-07 at 07:45 +0000, Zang Roy-R61911 wrote:
> > -----Original Message-----
> > From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
> >
> > Another question... Do you guys happen to have a patch to emulate fsqrt in the kernel ?
> > Fedora 19 seems to be using it ... among others.
>
> You mean this one
> arch/powerpc/math-emu/fsqrt.c ?

No. This is for setups that have no FPU, I don't think that will work with an actual FPU.

fsqrt is an optional instruction in the architecture and FSL chips don't implement it. However it looks like Fedora is compiled with a toolchain that generates it.

I've successfully launched the Fedora installer using this hack in the kernel:

From f75adba1ee91691d431e283184e15412114000d1 Mon Sep 17 00:00:00 2001
From: Benjamin Herrenschmidt b...@kernel.crashing.org
Date: Fri, 7 Jun 2013 18:42:44 +1000
Subject: [PATCH] Gross hack

---
 arch/powerpc/include/asm/ppc-opcode.h |    2 ++
 arch/powerpc/kernel/Makefile          |    4 ++-
 arch/powerpc/kernel/fsqrt-emu.c       |   44 +
 arch/powerpc/kernel/traps.c           |    7 ++
 4 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/kernel/fsqrt-emu.c

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index eccfc16..146c5e9 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -88,6 +88,8 @@
 #define PPC_INST_DCBA_MASK		0xfc0007fe
 #define PPC_INST_DCBAL			0x7c2005ec
 #define PPC_INST_DCBZL			0x7c2007ec
+#define PPC_INST_FSQRT			0xfc00002c
+#define PPC_INST_FSQRT_MASK		0xfc1f07fe
 #define PPC_INST_ICBT			0x7c00002c
 #define PPC_INST_ISEL			0x7c00001e
 #define PPC_INST_ISEL_MASK		0xfc00003e
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index f960a79..64c9962 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -26,6 +26,8 @@ CFLAGS_REMOVE_ftrace.o = -pg -mno-sched-epilog
 CFLAGS_REMOVE_time.o = -pg -mno-sched-epilog
 endif
 
+CFLAGS_REMOVE_fsqrt-emu.o = -msoft-float
+
 obj-y				:= cputable.o ptrace.o syscalls.o \
 				   irq.o align.o signal_32.o pmc.o vdso.o \
 				   process.o systbl.o idle.o \
@@ -34,7 +36,7 @@ obj-y				:= cputable.o ptrace.o syscalls.o \
 				   udbg.o misc.o io.o dma.o \
 				   misc_$(CONFIG_WORD_SIZE).o vdso32/
 obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
-				   signal_64.o ptrace32.o \
+				   signal_64.o ptrace32.o fsqrt-emu.o \
 				   paca.o nvram_64.o firmware.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
 obj-$(CONFIG_PPC_BOOK3S_64)	+= cpu_setup_ppc970.o cpu_setup_pa6t.o
diff --git a/arch/powerpc/kernel/fsqrt-emu.c b/arch/powerpc/kernel/fsqrt-emu.c
new file mode 100644
index 0000000..2da4f71
--- /dev/null
+++ b/arch/powerpc/kernel/fsqrt-emu.c
@@ -0,0 +1,44 @@
+#include <linux/kernel.h>
+#include <linux/preempt.h>
+#include <linux/sched.h>
+#include <asm/ptrace.h>
+#include <asm/processor.h>
+#include <asm/switch_to.h>
+
+static double crackpot_sqrt(double val)
+{
+    int i;
+    float x, y;
+    const float f = 1.5F;
+
+    x = val * 0.5F;
+    y = val;
+    i = * ( int * ) &y;
+    i = 0x5f3759df - ( i >> 1 );
+    y = * ( float * ) &i;
+    y = y * ( f - ( x * y * y ) );
+    y = y * ( f - ( x * y * y ) );
+    return val * y;
+}
+
+int emulate_fsqrt_inst(struct pt_regs *regs, u32 instword)
+{
+	unsigned int frt_r = (instword >> 21) & 0x1f;
+	unsigned int frb_r = (instword >> 21) & 0x1f;
+	double frt, frb;
+
+	/* XXX THIS WHOLE THING IS JUST A HACK ! */
+	preempt_disable();
+	enable_kernel_fp();
+	frb = current->thread.fpr[frb_r][0];
+	frt = crackpot_sqrt(frb);
+	current->thread.fpr[frt_r][0] = frt;
+	if (instword & 1)
+		regs->ccr = 0xff00;
+	preempt_enable();
+
+	return 0;
+}
+
+
+
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 6dfbb38..e677792 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -938,6 +938,8 @@ static int emulate_isel(struct pt_regs *regs, u32 instword)
 	return 0;
 }
 
+extern int emulate_fsqrt_inst(struct pt_regs *regs, u32 instword);
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 static inline bool tm_abort_check(struct pt_regs *regs, int cause)
 {
@@ -1018,6 +1020,11 @@ static int emulate_instruction(struct pt_regs *regs)
 		return emulate_isel(regs, instword);
 	}
 
+	if ((instword & PPC_INST_FSQRT_MASK) == PPC_INST_FSQRT) {
+		PPC_WARN_EMULATED(isel, regs);
+		return emulate_fsqrt_inst(regs, instword);
+	}
+
 #ifdef
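As a rough user-space illustration of the opcode match in the patch above, the sketch below checks an instruction word against an fsqrt pattern and pulls out the register fields. The helper names are hypothetical. The constants assume the match value is 0xfc00002c (primary opcode 63, extended opcode 22) with mask 0xfc1f07fe. Note that the posted hack uses a shift of 21 for both register numbers; in the instruction encoding FRB sits ten bits lower, so this sketch uses a shift of 11 for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: primary opcode 63, extended opcode 22. */
#define FSQRT_MATCH 0xfc00002cu
/* The mask leaves out FRT (bits 21..25), FRB (bits 11..15) and Rc (bit 0). */
#define FSQRT_MASK  0xfc1f07feu

static bool is_fsqrt(uint32_t insn)
{
    return (insn & FSQRT_MASK) == FSQRT_MATCH;
}

/* Destination register number. */
static unsigned fsqrt_frt(uint32_t insn)
{
    assert(is_fsqrt(insn));
    return (insn >> 21) & 0x1f;
}

/* Source register number: note the shift of 11, not 21. */
static unsigned fsqrt_frb(uint32_t insn)
{
    assert(is_fsqrt(insn));
    return (insn >> 11) & 0x1f;
}

/* True for the record form (fsqrt.), which also updates CR1. */
static bool fsqrt_rc(uint32_t insn)
{
    assert(is_fsqrt(insn));
    return (insn & 1) != 0;
}
```

With these definitions, `fsqrt f1, f2` (0xFC20102C) matches and decodes to FRT 1 and FRB 2, while the single-precision `fsqrts f1, f2` (0xEC20102C, primary opcode 59) does not match and would need its own pattern.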
Re: fsqrt
On Fri, 2013-06-07 at 18:53 +1000, Benjamin Herrenschmidt wrote:
> +
> +static double crackpot_sqrt(double val)
> +{
> +    int i;
> +    float x, y;
> +    const float f = 1.5F;
> +
> +    x = val * 0.5F;
> +    y = val;
> +    i = * ( int * ) &y;
> +    i = 0x5f3759df - ( i >> 1 );
> +    y = * ( float * ) &i;
> +    y = y * ( f - ( x * y * y ) );
> +    y = y * ( f - ( x * y * y ) );
> +    return val * y;
> +}
> +

For those interested, this is the Quake3 sqrt from Carmack ... there's plenty of literature about it one or two google clicks away :-)

Cheers,
Ben.
Re: fsqrt
On 06/07/2013 04:53 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2013-06-07 at 15:46 +0800, tiejun.chen wrote:
> > On 06/07/2013 03:41 PM, Benjamin Herrenschmidt wrote:
> > > Another question... Do you guys happen to have a patch to emulate fsqrt in the kernel ?
> >
> > Seems this is already emulated:
> >
> > arch/powerpc/math-emu/fsqrt.c
> >
> > You can enable CONFIG_MATH_EMULATION to try.
>
> Is math emu expected to work at all on top of a real FPU ? I thought it didn't ... maybe I'm wrong.

As I understand it, the real FPU often can't support all float instructions, so we have to enable this to emulate those unsupported float instructions in that scenario.

Tiejun
RE: fsqrt
> > +
> > +static double crackpot_sqrt(double val)
> > +{
> > +    int i;
> > +    float x, y;
> > +    const float f = 1.5F;
> > +
> > +    x = val * 0.5F;
> > +    y = val;
> > +    i = * ( int * ) &y;
> > +    i = 0x5f3759df - ( i >> 1 );
> > +    y = * ( float * ) &i;
> > +    y = y * ( f - ( x * y * y ) );
> > +    y = y * ( f - ( x * y * y ) );
> > +    return val * y;
> > +}
> > +
>
> For those interested, this is the Quake3 sqrt from Carmack ... there's plenty of literature about it one or two google clicks away :-)

I guess that is a rough enough approximation for graphics. However it will be miscompiled unless i and y are put in a union.

	David
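A minimal user-space sketch of the union variant suggested above might look like the following. It is not the code from the patch: `approx_sqrtf` is a hypothetical name, the function works on `float` throughout, and it assumes a 32-bit IEEE 754 `float` and a non-negative input. Reading a union member other than the one last written is permitted in C99 and later, which avoids relying on the compiler being told to ignore strict aliasing.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Approximate sqrt(val) as val * (1/sqrt(val)), using the well-known
 * bit-level initial guess followed by two Newton-Raphson refinements.
 */
static float approx_sqrtf(float val)
{
    union { float f; uint32_t u; } pun; /* type punning via a union */
    const float three_halves = 1.5f;
    float half, y;

    assert(val >= 0.0f);

    half = val * 0.5f;
    pun.f = val;
    pun.u = 0x5f3759dfu - (pun.u >> 1); /* initial guess for 1/sqrt(val) */
    y = pun.f;
    y = y * (three_halves - half * y * y); /* first refinement */
    y = y * (three_halves - half * y * y); /* second refinement */
    return val * y;
}
```

For example, `approx_sqrtf(4.0f)` lands very close to 2. The result is an approximation only and says nothing about FPSCR, NaN, or exception handling, which a real emulation of fsqrt would have to address.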
Re: fsqrt
On Fri, 2013-06-07 at 17:02 +0800, tiejun.chen wrote:
> On 06/07/2013 04:53 PM, Benjamin Herrenschmidt wrote:
> > On Fri, 2013-06-07 at 15:46 +0800, tiejun.chen wrote:
> > > On 06/07/2013 03:41 PM, Benjamin Herrenschmidt wrote:
> > > > Another question... Do you guys happen to have a patch to emulate fsqrt in the kernel ?
> > >
> > > Seems this is already emulated:
> > >
> > > arch/powerpc/math-emu/fsqrt.c
> > >
> > > You can enable CONFIG_MATH_EMULATION to try.
> >
> > Is math emu expected to work at all on top of a real FPU ? I thought it didn't ... maybe I'm wrong.
>
> As I understand it, the real FPU often can't support all float instructions, so we have to enable this to emulate those unsupported float instructions in that scenario.

Ok, two things come to mind here:

 - do_mathemu doesn't do giveup_fpu(), so the FPU state might still be in the live FP registers and not in the thread_struct, so it can't work... unless I missed something.

 - mathemu uses solely integers. For something like fsqrt it's going to suck a lot more than implementing it using floating point in the kernel.

Cheers,
Ben.
Re: fsqrt
On Fri, 2013-06-07 at 11:48 +0100, David Laight wrote:
> > For those interested, this is the Quake3 sqrt from Carmack ... there's plenty of literature about it one or two google clicks away :-)
>
> I guess that is a rough enough approximation for graphics. However it will be miscompiled unless i and y are put in a union.

It won't, the kernel disables strict aliasing :-)

Anyway, that was just a hack and plenty enough to get anaconda going; the bloody thing only uses fsqrt because its python crappola does something like exp(1.0) / sqrt(2.0) as part of its random number stuff. Honestly I could have made it just return 1.0 and it would probably have worked :-)

However, my point remains: it would be very much worthwhile for the kernel to have some reasonable emulation of those missing instructions (afaik only a handful) like we have for isel, popcnt* etc... especially since distros seem to be keen on enabling the use of them in their toolchain.

I don't personally have the bandwidth to do a clean implementation (that handles FP exceptions, NaNs, FPSCR, etc...) but I believe it would be valuable if somebody else did (hint hint hint :-) since without this, Fedora ppc64 is basically going to be a non-starter on those chips.

BTW. Did you guys (ie. FSL) finally add fsqrt to e6500 or is it still out ?

Cheers,
Ben.
Re: fsqrt
On Jun 7, 2013, at 7:14 AM, Benjamin Herrenschmidt wrote:

> On Fri, 2013-06-07 at 11:48 +0100, David Laight wrote:
> > > For those interested, this is the Quake3 sqrt from Carmack ... there's plenty of literature about it one or two google clicks away :-)
> >
> > I guess that is a rough enough approximation for graphics. However it will be miscompiled unless i and y are put in a union.
>
> It won't, the kernel disables strict aliasing :-)
>
> Anyway, that was just a hack and plenty enough to get anaconda going; the bloody thing only uses fsqrt because its python crappola does something like exp(1.0) / sqrt(2.0) as part of its random number stuff. Honestly I could have made it just return 1.0 and it would probably have worked :-)
>
> However, my point remains: it would be very much worthwhile for the kernel to have some reasonable emulation of those missing instructions (afaik only a handful) like we have for isel, popcnt* etc... especially since distros seem to be keen on enabling the use of them in their toolchain.
>
> I don't personally have the bandwidth to do a clean implementation (that handles FP exceptions, NaNs, FPSCR, etc...) but I believe it would be valuable if somebody else did (hint hint hint :-) since without this, Fedora ppc64 is basically going to be a non-starter on those chips.
>
> BTW. Did you guys (ie. FSL) finally add fsqrt to e6500 or is it still out ?
>
> Cheers,
> Ben.

Pretty sure fsqrt is still out of e6500.

- k
Re: fsqrt
On Fri, 2013-06-07 at 14:19 -0500, Kumar Gala wrote:
> > I don't personally have the bandwidth to do a clean implementation (that handles FP exceptions, NaNs, FPSCR, etc...) but I believe it would be valuable if somebody else did (hint hint hint :-) since without this, Fedora ppc64 is basically going to be a non-starter on those chips.
> >
> > BTW. Did you guys (ie. FSL) finally add fsqrt to e6500 or is it still out ?
> >
> > Cheers,
> > Ben.
>
> Pretty sure fsqrt is still out of e6500.

Ok, thinking out loud... looks like we might be able to just use the existing math-emu for that. From what I can tell, all it needs (other than enabling the config option) is a call to flush_fp_to_thread(current);

While talking math-emu... we seem to have some duplication between the code in do_mathemu, which can be compiled without CONFIG_MATH_EMULATION and in that case only emulates loads/stores/fmr, and the code in arch/powerpc/kernel/softemu8xx.c. Is there any reason we can't just get rid of the latter ?

Cheers,
Ben.
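The reason a flush is needed before emulating on top of a real FPU can be shown with a toy user-space model. Everything here is a stand-in: `toy_thread`, `hw_fpr`, `toy_flush_fp_to_thread` and `toy_emulate_square` are hypothetical names that only mimic the idea of a per-thread save area versus live hardware registers, and none of it is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_FPRS 32

/* Toy stand-in for the per-thread FP save area. */
struct toy_thread {
    double saved_fpr[NUM_FPRS]; /* the copy an emulator reads and writes */
    bool fp_live;               /* true while the "hardware" holds newer values */
};

/* Plays the role of the live FPU registers. */
static double hw_fpr[NUM_FPRS];

/* Toy analogue of flushing live FP state into the thread's save area. */
static void toy_flush_fp_to_thread(struct toy_thread *t)
{
    if (t->fp_live) {
        memcpy(t->saved_fpr, hw_fpr, sizeof hw_fpr);
        t->fp_live = false;
    }
}

/* An "emulated instruction" that only sees the save area: frt = frb * frb. */
static void toy_emulate_square(struct toy_thread *t, unsigned frt, unsigned frb)
{
    assert(frt < NUM_FPRS && frb < NUM_FPRS);
    t->saved_fpr[frt] = t->saved_fpr[frb] * t->saved_fpr[frb];
}
```

If the program last wrote 3.0 into live register 2 and the emulation runs without the flush, it squares whatever stale value sits in the save area. With the flush done first, it sees 3.0 and produces 9.0. In this model the result also still has to make its way back into the live registers before the program resumes, which the sketch leaves out.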
Re: fsqrt
On Sat, 2013-06-08 at 09:23 +1000, Benjamin Herrenschmidt wrote:
> Ok, thinking out loud... looks like we might be able to just use the existing math-emu for that. From what I can tell, all it needs (other than enabling the config option) is a call to flush_fp_to_thread(current);
>
> While talking math-emu... we seem to have some duplication between the code in do_mathemu, which can be compiled without CONFIG_MATH_EMULATION and in that case only emulates loads/stores/fmr, and the code in arch/powerpc/kernel/softemu8xx.c. Is there any reason we can't just get rid of the latter ?

Or just get completely rid of that minimal emulation ...

Cheers,
Ben.
Re: fsqrt
On Sat, 2013-06-08 at 09:25 +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2013-06-08 at 09:23 +1000, Benjamin Herrenschmidt wrote:
> > Ok, thinking out loud... looks like we might be able to just use the existing math-emu for that. From what I can tell, all it needs (other than enabling the config option) is a call to flush_fp_to_thread(current);
> >
> > While talking math-emu... we seem to have some duplication between the code in do_mathemu, which can be compiled without CONFIG_MATH_EMULATION and in that case only emulates loads/stores/fmr, and the code in arch/powerpc/kernel/softemu8xx.c. Is there any reason we can't just get rid of the latter ?
>
> Or just get completely rid of that minimal emulation ...

Right, looking more, the code really sucks.

Either you use the existing apparent ability for MATH_EMU to operate in minimal mode, ie, load/store/fmr only (which seems to do exactly the same thing as the code in softemu8xx.c which we can then get rid of), or just get rid of that minimal mode altogether.

And while at it make it a general config option for all soft-emu processors (there is no bloody reason why that should be 8xx specific) or just get rid of the whole concept of half-emulation.

Ie. CONFIG_MATH_EMULATION/CONFIG_MATH_HALF_ASSED_EMULATION

Ben.
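To illustrate what a "minimal mode" restricted to load/store/fmr could look like, here is a small user-space classifier. It is a sketch only: `in_minimal_set` and `minimal_emulate` are hypothetical names, the sketch covers just the D-form FP loads and stores (primary opcodes 48 through 55) plus fmr (primary opcode 63, extended opcode 72), and it deliberately leaves out the indexed load/store forms under primary opcode 31.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

static unsigned primary_opcode(uint32_t insn)
{
    return insn >> 26;
}

/* 10-bit extended opcode used by X-form instructions such as fmr. */
static unsigned extended_opcode(uint32_t insn)
{
    return (insn >> 1) & 0x3ff;
}

/* True if the instruction belongs to the reduced load/store/fmr subset. */
static bool in_minimal_set(uint32_t insn)
{
    unsigned op = primary_opcode(insn);

    if (op >= 48 && op <= 55) /* lfs, lfsu, lfd, lfdu, stfs, stfsu, stfd, stfdu */
        return true;
    if (op == 63 && extended_opcode(insn) == 72) /* fmr */
        return true;
    return false;
}

/* Hypothetical entry point: anything outside the subset is rejected. */
static int minimal_emulate(uint32_t insn)
{
    if (!in_minimal_set(insn))
        return -ENOSYS;
    /* ... the actual load/store/move would be performed here ... */
    return 0;
}
```

Under these assumptions `lfd f1, 0(r3)` (0xC8230000) and `fmr f1, f2` (0xFC201090) are accepted, while `fsqrt f1, f2` (0xFC20102C) falls through to -ENOSYS, which is why a user space built for hard FP needs more than the minimal subset.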
Re: fsqrt
On Fri, 2013-06-07 at 17:20 -0700, Dan Malek wrote:
> The purpose of that minimal code was that the signal handlers were hand-coded assembly that always did the FPU load/store regardless of compiler flags. Over time, it became convenient to emulate a few others, even with user space math emulation, for similar reasons.

The question is whether this is still relevant ?

And if the answer is yes, and we still want that minimum emulation of load/stores/fmr as an option, is there any reason why we can't replace the one in softemu8xx with the existing (and unused) equivalent in do_mathemu ?

Cheers,
Ben.
Re: fsqrt
On Fri, 2013-06-07 at 18:13 -0700, Dan Malek wrote:
> Hi Ben.
>
> On Jun 7, 2013, at 5:34 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote:
>
> > The question is whether this is still relevant ?
>
> The only answer I could provide is that it's dependent upon the libraries and how the distributions are built. It's also dependent upon processors with hardware FP that don't implement all instructions in hardware (who had that bright idea? :)) If distributions are fully all soft-fp in user space or all hardware FP, it removes the one reason that started the whole partial emulation option.

I'm not questioning the relevance of math-emu as a whole, but of the tiny subset which is duplicated in math-emu and softemu8xx, which emulates only load/stores/fmr.

If userspace is built with hard FP it is likely to use more than just that handful of instructions...

> > … And if the answer is yes,
>
> There are multiple options, but I believe they are solved today. One is the libraries coded with hardware load/store that are used by soft-fp, another is hardware FP that doesn't implement all instructions in hardware (which it seems is the basis of this thread, although I thought it was already solved).

Yes, it's indeed the basis of that thread, and yes, I thought it was already solved as well, but unless I missed something it is not, because the current Program Check handler calls do_mathemu without first flushing the hard FP state into the thread_struct.

However it's quite possible (I'll test when I get back to the office) that this is the only fix necessary, which is a one-liner, to make CONFIG_MATH_EMULATION work just fine in that case.

> The variation here is that in the first case you have to read/write user space soft-fp stack registers, while in the latter you read/write real FP registers.

We never do any user space soft-fp stack registers handling in the kernel. If we use full math emu (ie, no FP at all in HW), we simply use the normal thread_struct storage of FPRs to store the virtual user FP regs used by the emulation.

If user space uses full soft-fp (ie, -msoft-float), we should never see any of it in the kernel.

> There used to be a third variation where the stack was allocated and the emulation had to write both places due to compiler function APIs or optimizations. Of course, then there is the full-up kernel emulation where hardware is entirely lacking.

I don't know anything about that 3rd option, it certainly doesn't have any kernel impact that I can see :-) Full-up emulation is of course still there.

> > … we still want that minimum emulation of load/stores/fmr as an option, is there any reason why we can't replace the one in softemu8xx with the existing (and unused) equivalent in do_mathemu ?
>
> It appears to me that the 8xx custom code can be removed. I guess I should try to boot it up, if anyone even cares these days. :)

Thanks,

Cheers,
Ben.