Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote: Incidentally, combination of --enable-gprof and (default) --enable-pie won't build - it dies with ld(1) complaining about relocs in gcrt1.o. This sounds like a toolchain bug to me :-) Debian stable/amd64, gcc 4.7.2, binutils 2.22. And google search finds this, for example: http://osdir.com/ml/qemu-devel/2013-05/msg00710.html. That one has gcc 4.4.3. Anyway, adding --disable-pie to --enable-gprof gets it to build, but as I said, gprof is no better than perf and oprofile - same problem. Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and going through their kernel package build. I have perf report in front of me right now; the top ones are 41.77% qemu-system-alp perf-24701.map [.] 0x7fbbee558930 11.78% qemu-system-alp qemu-system-alpha[.] cpu_alpha_exec 4.95% qemu-system-alp [vdso] [.] 0x7fffdd7ff8de 2.40% qemu-system-alp qemu-system-alpha[.] phys_page_find 1.49% qemu-system-alp qemu-system-alpha[.] address_space_translate_internal 1.34% qemu-system-alp [kernel.kallsyms][k] read_hpet 1.26% qemu-system-alp qemu-system-alpha[.] tlb_set_page 1.23% qemu-system-alp qemu-system-alpha[.] find_next_bit 1.04% qemu-system-alp qemu-system-alpha[.] get_page_addr_code 1.01% qemu-system-alp libpthread-2.13.so [.] pthread_mutex_lock 0.88% qemu-system-alp qemu-system-alpha[.] helper_cmpbge 0.80% qemu-system-alp libc-2.13.so [.] __memset_sse2 0.72% qemu-system-alp libpthread-2.13.so [.] __pthread_mutex_unlock_usercnt 0.70% qemu-system-alp qemu-system-alpha[.] get_physical_address 0.69% qemu-system-alp qemu-system-alpha[.] address_space_translate 0.68% qemu-system-alp qemu-system-alpha[.] tcg_optimize 0.67% qemu-system-alp qemu-system-alpha[.] ldq_phys 0.63% qemu-system-alp qemu-system-alpha[.] qemu_get_ram_ptr 0.62% qemu-system-alp qemu-system-alpha[.] helper_le_ldq_mmu 0.57% qemu-system-alp qemu-system-alpha[.] memory_region_is_ram and cpu_alpha_exec() spends most of the time in inlined tb_find_fast(). It might be worth checking the actual distribution of the hash of virt address used by that sucker - I wonder if dividing its argument by 4 wouldn't improve the things, but I don't have stats on actual frequency of conflicts, etc. In any case, the first lump (42%) seems to be tastier ;-) There are all kinds of microoptimizations possible (e.g. helper_cmpbge() could be done by a couple of MMX insns on amd64 host[1]), but it would be nice to have some details on what we spend the time on in tcg output... [1] The reason why helper_cmpbge() shows up is that string functions on alpha use that insn a lot; it _might_ be worth optimizing.
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Wed, Jul 09, 2014 at 08:14:12AM -0700, Richard Henderson wrote: On 07/08/2014 10:47 PM, Al Viro wrote: So env-fpcr_flush_to_zero = env-fpcr_dnod env-fpcr_undz; is another bug - needs s/dnod/unfd/ there... That's exactly what I was looking at, thanks. BTW, that (unimplementeds being RAZ) is why AARM insists on having FP_C in software - FPCR isn't guaranteed to have the trap disable bits and, in fact, doesn't have anywhere to store IEEE_TRAP_ENABLE_DNO on actual hw. The software completion is where it has to be dealt with; note that both swcr_update_status() and ieee_swcr_to_fpcr() treat -ieee_state (i.e. our FP_C) as authoritative wrt trap enable bits, 21264 or not. Trap _status_ bits are different - there (on 21264) FPCR is considered authoritative, but that's it. Unimplemented trap bits are treated as trap enabled, so the completion gets to decide what it wants to do. If you want to keep FPCR authoritative for all those bits in linux-user case, we have to treat FPCR.DNOD as writable bit for that mode, which is why my variant slapped an ifdef CONFIG_USER_ONLY around env-fpcr_dnod = (val FPCR_DNOD) != 0; instead of ripping it out completely...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
Al Viro writes: On Tue, Jul 08, 2014 at 08:32:55PM +0100, Peter Maydell wrote: On 8 July 2014 18:20, Al Viro v...@zeniv.linux.org.uk wrote: On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote: snip Again, gprof isn't particulary useful - kernel-side profilers are at least as good. So I suspect that most of the people running into that simply shrug and use those instead. Narrowing it down to -pie didn't take long and I can confirm that this is the root cause of that breakage. Should make debugging said toolchain bug a bit easier, if anybody cares to do that... Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and going through their kernel package build. I have perf report in front of me right now; the top ones are 41.77% qemu-system-alp perf-24701.map [.] 0x7fbbee558930 11.78% qemu-system-alp qemu-system-alpha[.] cpu_alpha_exec and cpu_alpha_exec() spends most of the time in inlined tb_find_fast(). It might be worth checking the actual distribution of the hash of virt address used by that sucker - I wonder if dividing its argument by 4 wouldn't improve the things, but I don't have stats on actual frequency of conflicts, etc. In any case, the first lump (42%) seems to be tastier ;-) Depends on your point of view -- arguably we ought to be spending *more* time executing translated guest code... (As you say, the problem is that we don't have any breakdown of what things might turn out to be hotspots in the translated code.) Might be a fun project to teach perf that hits in such-and-such page should lead to lookup in a table annotating it. As in offsets 42..69 should be recorded as (this address + offset - 42). Then tcg could generate such tables and we'd get information like that much time is spent in the second host insn of instances of that code pattern generated by tcg_gen_shr_i64, etc. No idea if anything of that sort exists - qemu is not the only possible user for that; looks like it might be useful for any JIT profiling, so somebody could've done that already... Handily our patch tracker has remembered what I couldn't find ;-) https://patches.linaro.org/27229/ As I mentioned previously I plan to clean these up over the next week. -- Alex Bennée
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/08/2014 10:47 PM, Al Viro wrote: So env-fpcr_flush_to_zero = env-fpcr_dnod env-fpcr_undz; is another bug - needs s/dnod/unfd/ there... That's exactly what I was looking at, thanks. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Mon, Jul 07, 2014 at 11:03:08PM -0700, Richard Henderson wrote: On 07/07/2014 09:20 PM, Al Viro wrote: and I'm reasonably sure that this is what they did internally. You are proposing to do 4 cases in all their messy glory in qemu itself... Yes. Primarily because we *have* to do so for the linux-user case. And that's not even going into generating the right si_code for that SIGFPE. What produces those TARGET_GEN_FLTINE and friends? linux-user/main.c, cpu_loop. That's where we consume it; where is it produced? Sure, explicit gentrap in alpha code will lead there, with whatever we have in $16 deciding what'll go into si_code, but where does that happen on fp exception codepaths? IOW, what sets si_code on those?
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jul 08, 2014 at 08:32:55PM +0100, Peter Maydell wrote: On 8 July 2014 18:20, Al Viro v...@zeniv.linux.org.uk wrote: On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote: Incidentally, combination of --enable-gprof and (default) --enable-pie won't build - it dies with ld(1) complaining about relocs in gcrt1.o. This sounds like a toolchain bug to me :-) Debian stable/amd64, gcc 4.7.2, binutils 2.22. And google search finds this, for example: http://osdir.com/ml/qemu-devel/2013-05/msg00710.html. That one has gcc 4.4.3. That just makes it a long-standing toolchain bug. I don't see any reason why PIE + gprof shouldn't work, it just looks like gprof doesn't ship and link a PIE runtime. *nod* It's not a huge itch to scratch for me, and I'm not even sure whether the bug should be filed for gcc or for libc (probably the latter). In any case, having that information findable in list archives would probably be a good thing. Again, gprof isn't particulary useful - kernel-side profilers are at least as good. So I suspect that most of the people running into that simply shrug and use those instead. Narrowing it down to -pie didn't take long and I can confirm that this is the root cause of that breakage. Should make debugging said toolchain bug a bit easier, if anybody cares to do that... Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and going through their kernel package build. I have perf report in front of me right now; the top ones are 41.77% qemu-system-alp perf-24701.map [.] 0x7fbbee558930 11.78% qemu-system-alp qemu-system-alpha[.] cpu_alpha_exec and cpu_alpha_exec() spends most of the time in inlined tb_find_fast(). It might be worth checking the actual distribution of the hash of virt address used by that sucker - I wonder if dividing its argument by 4 wouldn't improve the things, but I don't have stats on actual frequency of conflicts, etc. In any case, the first lump (42%) seems to be tastier ;-) Depends on your point of view -- arguably we ought to be spending *more* time executing translated guest code... (As you say, the problem is that we don't have any breakdown of what things might turn out to be hotspots in the translated code.) Might be a fun project to teach perf that hits in such-and-such page should lead to lookup in a table annotating it. As in offsets 42..69 should be recorded as (this address + offset - 42). Then tcg could generate such tables and we'd get information like that much time is spent in the second host insn of instances of that code pattern generated by tcg_gen_shr_i64, etc. No idea if anything of that sort exists - qemu is not the only possible user for that; looks like it might be useful for any JIT profiling, so somebody could've done that already...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jul 08, 2014 at 09:05:10AM +0100, Peter Maydell wrote: The code we have currently may well be buggy, but the correct It is ;-/ We set TARGET_FPE_FLTINV unconditionally there. BTW, what's the reason why all these cpu_loop() instances can't go into linux-user/arch/something? Is that just because you have static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER; static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER; static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER; static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER; static int pending_cpus; and a bunch of inlines using them? As it is, about three quarters of linux-user/main.c consist of code under series of arch ifdefs... BTW, are there any more or less uptodate docs on qemu profiling? I mean, things like perf/oprofile on the host obviously end up lumping all tcg output together. Is there any way to get information beyond ~40% of time is spent in generated code, ~15% - in tb_find_fast(), and the rest is very much colder? Incidentally, combination of --enable-gprof and (default) --enable-pie won't build - it dies with ld(1) complaining about relocs in gcrt1.o. With --disable-pie it builds, but gprof of course has the same problem as perf and friends - generated code is transient, so we get no details ;-/ place to set si_code is (as Richard says) the Alpha cpu_loop() in linux-user/main.c, which has access to the trap type that just caused us to stop executing code, plus the CPUState, which should be enough information to set si_code correctly. In particular the GENTRAP case seems to be setting a variety of different si_code values for SIGFPE. Sigh... Well, having read through alpha_fp_emul() and the stuff it calls, I understand why they hadn't implemented DNOD in any released hardware. It's a bloody mess, with tons of interesting special cases. E.g. adding denorm to very large finite can push into overflow, with further effects depending on whether we have overflow and/or denorm IEEE traps disabled, etc. Frankly, I suspect that it's better to have qemu-system-alpha behave like the actual hardware does (including FPCR.DNOD can't be set) and keep the linux-user behaviour as is, for somebody brave and masochistic enough to fight that one. And no, it's nowhere near just let denorms ride through the normal softfloat code and play a bit with the flags it might raise. And then there's netbsd/alpha and openbsd/alpha, so in theory somebody might want to play with their software completion semantics (not identical to Linux one) for the sake of yet-to-be-written bsd-user alpha support... Anyway, how about the following delta? AFAICS, it gets qemu-system-alpha behaviour in sync with actual hardware without screwing qemu-alpha up. diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c index 9b297de..30cbf02 100644 --- a/target-alpha/fpu_helper.c +++ b/target-alpha/fpu_helper.c @@ -44,6 +44,12 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env) return get_float_exception_flags(FP_STATUS); } +enum { + Exc_Mask = float_flag_invalid | float_flag_int_overflow | + float_flag_divbyzero | float_flag_overflow | + float_flag_underflow | float_flag_inexact +}; + static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr, uint32_t exc, uint32_t regno, uint32_t hw_exc) { @@ -73,7 +79,7 @@ static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr, doesn't apply. */ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno) { -uint32_t exc = (uint8_t)env-fp_status.float_exception_flags; +uint32_t exc = (uint8_t)env-fp_status.float_exception_flags Exc_Mask; if (exc) { env-fpcr_exc_status |= exc; exc = ~ignore; @@ -86,7 +92,7 @@ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno) /* Raise exceptions for ieee fp insns with software completion. */ void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t ignore, uint32_t regno) { -uint32_t exc = (uint8_t)env-fp_status.float_exception_flags; +uint32_t exc = (uint8_t)env-fp_status.float_exception_flags Exc_Mask; if (exc) { env-fpcr_exc_status |= exc; exc = ~ignore; @@ -105,16 +111,14 @@ void helper_ieee_input(CPUAlphaState *env, uint64_t val) uint64_t frac = val 0xfull; if (exp == 0) { -/* Denormals without DNZ set raise an exception. */ -if (frac != 0 !env-fp_status.flush_inputs_to_zero) { -arith_excp(env, GETPC(), EXC_M_UNF, 0); +/* Denormals without /S raise an exception. */ +if (frac != 0) { +arith_excp(env, GETPC(), EXC_M_INV, 0); } } else if (exp == 0x7ff) { /* Infinity or NaN. */ -/* ??? I'm not sure these exception bit flags are correct. I do - know that the Linux kernel, at least, doesn't
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jul 08, 2014 at 09:59:33PM -0700, Richard Henderson wrote: On 07/08/2014 01:20 PM, Al Viro wrote: Aha... So you've caught that one already... I've looked at your branch; AFAICS, the only thing missing there is treating stores to FPCR.DNOD in system mode as not implemented (which it is in the code as well as in 21[0-3]64 hardware). Is it loaded and stored on 21264, or it is read-as-zero/write-ignore? RAZ, and the same on 21364 if Compaq manual for compiler-writers is to be believed. On 21264 bits 48..62 are writable, bit 63 is disjunction of bits 52..57 (stores are ignored), bits 0..47 are RAZ. AARM requires RAZ bits 0..46 and RAZ on everything optional that is unimplemented. IOW, DNOD is unimplemented there, all other optional ones are implemented. And according to https://archive.org/details/dec-comp_guide_v2 21364 doesn't implement DNOD either... Is UNDZ not required to be paired with DNOD? There are 4 bits having some relation to handling of denorms. DNZ and DNOD are about denorm inputs; UNDZ and UNFD - about denorm output. All of them have effect only for IEEE insns with /S in trap suffix. Rules: * if DNZ, denorm inputs are silently replaced with zero. * if !DNZ !DNOD, denorm inputs trigger trap (invalid). Same as what would happen without /S. * if !DNZ DNOD, perform operation on denorm(s). And I would like to play with whatever you are using to bring hardware from alternative universes. * if !UNFD, denorm output triggers trap (underflow). Same as what would happen without /S. * if UNFD UNDZ, denorm output is replaced with zero. * if UNFD !UNDZ, denorm output remains as is. So env-fpcr_flush_to_zero = env-fpcr_dnod env-fpcr_undz; is another bug - needs s/dnod/unfd/ there...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/08/2014 09:13 AM, Al Viro wrote: Frankly, I suspect that it's better to have qemu-system-alpha behave like the actual hardware does (including FPCR.DNOD can't be set) and keep the linux-user behaviour as is, for somebody brave and masochistic enough to fight that one. And no, it's nowhere near just let denorms ride through the normal softfloat code and play a bit with the flags it might raise. And then there's netbsd/alpha and openbsd/alpha, so in theory somebody might want to play with their software completion semantics (not identical to Linux one) for the sake of yet-to-be-written bsd-user alpha support... You're probably right there. I've pushed a couple more patches to the branch, split out from your patch here. I believe I've got it all, and havn't mucked things up in the process. I'll run some tests later today when I've got time. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 8 July 2014 18:20, Al Viro v...@zeniv.linux.org.uk wrote: On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote: Incidentally, combination of --enable-gprof and (default) --enable-pie won't build - it dies with ld(1) complaining about relocs in gcrt1.o. This sounds like a toolchain bug to me :-) Debian stable/amd64, gcc 4.7.2, binutils 2.22. And google search finds this, for example: http://osdir.com/ml/qemu-devel/2013-05/msg00710.html. That one has gcc 4.4.3. That just makes it a long-standing toolchain bug. I don't see any reason why PIE + gprof shouldn't work, it just looks like gprof doesn't ship and link a PIE runtime. Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and going through their kernel package build. I have perf report in front of me right now; the top ones are 41.77% qemu-system-alp perf-24701.map [.] 0x7fbbee558930 11.78% qemu-system-alp qemu-system-alpha[.] cpu_alpha_exec and cpu_alpha_exec() spends most of the time in inlined tb_find_fast(). It might be worth checking the actual distribution of the hash of virt address used by that sucker - I wonder if dividing its argument by 4 wouldn't improve the things, but I don't have stats on actual frequency of conflicts, etc. In any case, the first lump (42%) seems to be tastier ;-) Depends on your point of view -- arguably we ought to be spending *more* time executing translated guest code... (As you say, the problem is that we don't have any breakdown of what things might turn out to be hotspots in the translated code.) -- PMM
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/08/2014 01:20 PM, Al Viro wrote: Aha... So you've caught that one already... I've looked at your branch; AFAICS, the only thing missing there is treating stores to FPCR.DNOD in system mode as not implemented (which it is in the code as well as in 21[0-3]64 hardware). Is it loaded and stored on 21264, or it is read-as-zero/write-ignore? Is UNDZ not required to be paired with DNOD? r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
Peter Maydell writes: On 8 July 2014 17:13, Al Viro v...@zeniv.linux.org.uk wrote: On Tue, Jul 08, 2014 at 09:05:10AM +0100, Peter Maydell wrote: snip BTW, are there any more or less uptodate docs on qemu profiling? I mean, things like perf/oprofile on the host obviously end up lumping all tcg output together. Is there any way to get information beyond ~40% of time is spent in generated code, ~15% - in tb_find_fast(), and the rest is very much colder? Alex, did you say you'd done something with profiling recently? I posted some RFC patches up a while back that spit out the perf /tmp/perf.pid JIT maps that helps with breaking down which TCG TBs are the most executed. There is another set of patches which allow you to selectively dump translation blocks so you don't end up with multi-gigabyte log files. I'm going to be doing some profiling myself over the next few days so I'll clean-up the patches and re-submit to the list soon. -- Alex Bennée
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/07/2014 09:20 PM, Al Viro wrote: and I'm reasonably sure that this is what they did internally. You are proposing to do 4 cases in all their messy glory in qemu itself... Yes. Primarily because we *have* to do so for the linux-user case. And that's not even going into generating the right si_code for that SIGFPE. What produces those TARGET_GEN_FLTINE and friends? linux-user/main.c, cpu_loop. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 8 July 2014 08:13, Al Viro v...@zeniv.linux.org.uk wrote: Actually, that's badly worded; what codepath ends up setting si_code on e.g. fp addition overflows? In system mode it's done by completion code in the kernel, but AFAICS in user mode there are only two places where it might happen - one is gentrap handling and another - osf_setsysinfo(2) emulation for TARGET_SSI_IEEE_FP_CONTROL. What I don't understand is how do we get from float_raise(FP_STATUS, float_flag_overflow) in fpu/softfloat.c to either of those. IOW, suppose I do x = DBL_MAX; feenableexcept(FE_ALL_EXCEPT); x *= x; I understand how I'll get SIGFPE, but what will set correct si_code in siginfo I'll see in the hanler? The code we have currently may well be buggy, but the correct place to set si_code is (as Richard says) the Alpha cpu_loop() in linux-user/main.c, which has access to the trap type that just caused us to stop executing code, plus the CPUState, which should be enough information to set si_code correctly. In particular the GENTRAP case seems to be setting a variety of different si_code values for SIGFPE. thanks -- PMM
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/08/2014 01:05 AM, Peter Maydell wrote: On 8 July 2014 08:13, Al Viro v...@zeniv.linux.org.uk wrote: Actually, that's badly worded; what codepath ends up setting si_code on e.g. fp addition overflows? In system mode it's done by completion code in the kernel, but AFAICS in user mode there are only two places where it might happen - one is gentrap handling and another - osf_setsysinfo(2) emulation for TARGET_SSI_IEEE_FP_CONTROL. What I don't understand is how do we get from float_raise(FP_STATUS, float_flag_overflow) in fpu/softfloat.c to either of those. IOW, suppose I do x = DBL_MAX; feenableexcept(FE_ALL_EXCEPT); x *= x; I understand how I'll get SIGFPE, but what will set correct si_code in siginfo I'll see in the hanler? The code we have currently may well be buggy, but the correct place to set si_code is (as Richard says) the Alpha cpu_loop() in linux-user/main.c, which has access to the trap type that just caused us to stop executing code, plus the CPUState, which should be enough information to set si_code correctly. In particular the GENTRAP case seems to be setting a variety of different si_code values for SIGFPE. The gentrap case is a red-herring. The case you're looking for is EXC_ARITH. The path is from arith_excp dynamic_excp cpu_loop_exit longjmp cpu_exec cpu_loop It's also true that we don't install the correct si_code there, but we could. Mostly the gcc/glibc test cases really only care that SIGFPE gets raised, not what the codes are, so I haven't bothered. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jul 08, 2014 at 07:54:36AM +0100, Al Viro wrote: On Mon, Jul 07, 2014 at 11:03:08PM -0700, Richard Henderson wrote: On 07/07/2014 09:20 PM, Al Viro wrote: and I'm reasonably sure that this is what they did internally. You are proposing to do 4 cases in all their messy glory in qemu itself... Yes. Primarily because we *have* to do so for the linux-user case. And that's not even going into generating the right si_code for that SIGFPE. What produces those TARGET_GEN_FLTINE and friends? linux-user/main.c, cpu_loop. That's where we consume it; where is it produced? Sure, explicit gentrap in alpha code will lead there, with whatever we have in $16 deciding what'll go into si_code, but where does that happen on fp exception codepaths? IOW, what sets si_code on those? Actually, that's badly worded; what codepath ends up setting si_code on e.g. fp addition overflows? In system mode it's done by completion code in the kernel, but AFAICS in user mode there are only two places where it might happen - one is gentrap handling and another - osf_setsysinfo(2) emulation for TARGET_SSI_IEEE_FP_CONTROL. What I don't understand is how do we get from float_raise(FP_STATUS, float_flag_overflow) in fpu/softfloat.c to either of those. IOW, suppose I do x = DBL_MAX; feenableexcept(FE_ALL_EXCEPT); x *= x; I understand how I'll get SIGFPE, but what will set correct si_code in siginfo I'll see in the hanler?
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 8 July 2014 17:13, Al Viro v...@zeniv.linux.org.uk wrote: On Tue, Jul 08, 2014 at 09:05:10AM +0100, Peter Maydell wrote: The code we have currently may well be buggy, but the correct It is ;-/ We set TARGET_FPE_FLTINV unconditionally there. BTW, what's the reason why all these cpu_loop() instances can't go into linux-user/arch/something? It's just ancient code nobody's cleaned up yet. I do have move all this stuff into arch directories on my would like to do list, but I just haven't got round to it yet, since it's not actually actively broken (unlike many other areas of our codebase :-/). BTW, are there any more or less uptodate docs on qemu profiling? I mean, things like perf/oprofile on the host obviously end up lumping all tcg output together. Is there any way to get information beyond ~40% of time is spent in generated code, ~15% - in tb_find_fast(), and the rest is very much colder? Alex, did you say you'd done something with profiling recently? Incidentally, combination of --enable-gprof and (default) --enable-pie won't build - it dies with ld(1) complaining about relocs in gcrt1.o. This sounds like a toolchain bug to me :-) -- PMM
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jul 08, 2014 at 11:12:20AM -0700, Richard Henderson wrote: On 07/08/2014 09:13 AM, Al Viro wrote: Frankly, I suspect that it's better to have qemu-system-alpha behave like the actual hardware does (including FPCR.DNOD can't be set) and keep the linux-user behaviour as is, for somebody brave and masochistic enough to fight that one. And no, it's nowhere near just let denorms ride through the normal softfloat code and play a bit with the flags it might raise. And then there's netbsd/alpha and openbsd/alpha, so in theory somebody might want to play with their software completion semantics (not identical to Linux one) for the sake of yet-to-be-written bsd-user alpha support... You're probably right there. I've pushed a couple more patches to the branch, split out from your patch here. I believe I've got it all, and havn't mucked things up in the process. I'll run some tests later today when I've got time. Just one thing - 0x1f will make 32bit hosts whine about integer constant being too large. So will 0x1ful, unfortunately - it really ought to be ull.
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/08/2014 12:02 PM, Al Viro wrote: On Tue, Jul 08, 2014 at 11:12:20AM -0700, Richard Henderson wrote: On 07/08/2014 09:13 AM, Al Viro wrote: Frankly, I suspect that it's better to have qemu-system-alpha behave like the actual hardware does (including FPCR.DNOD can't be set) and keep the linux-user behaviour as is, for somebody brave and masochistic enough to fight that one. And no, it's nowhere near just let denorms ride through the normal softfloat code and play a bit with the flags it might raise. And then there's netbsd/alpha and openbsd/alpha, so in theory somebody might want to play with their software completion semantics (not identical to Linux one) for the sake of yet-to-be-written bsd-user alpha support... You're probably right there. I've pushed a couple more patches to the branch, split out from your patch here. I believe I've got it all, and havn't mucked things up in the process. I'll run some tests later today when I've got time. Just one thing - 0x1f will make 32bit hosts whine about integer constant being too large. So will 0x1ful, unfortunately - it really ought to be ull. I did use ull on the branch. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jul 08, 2014 at 12:04:10PM -0700, Richard Henderson wrote: Just one thing - 0x1f will make 32bit hosts whine about integer constant being too large. So will 0x1ful, unfortunately - it really ought to be ull. I did use ull on the branch. Aha... So you've caught that one already... I've looked at your branch; AFAICS, the only thing missing there is treating stores to FPCR.DNOD in system mode as not implemented (which it is in the code as well as in 21[0-3]64 hardware). Other than that everything seems to be fine; you are right about cvtql treatment - since that sucker doesn't have /i in any allowed trap suffices, we might as well just raise Inexact and let it be masked out - float_flag_inexact will be present in 'ignore'. And yes, folding the calculation itself in there obviously makes sense.
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/05/2014 03:55 PM, Al Viro wrote: +/* Input handing with software completion. Trap for denorms, + unless DNZ is set. *IF* we try to support DNOD (which + none of the produced hardware did, AFAICS), we'll need + to suppress the trap when FPCR.DNOD is set; then the + code downstream of that will need to cope with denorms + sans flush_input_to_zero. Most of it should work sanely, + but there's nothing to compare with... +*/ +void helper_ieee_input_s(CPUAlphaState *env, uint64_t val) +{ +if (unlikely(2 * val - 1 0x1f)) { + if (!FP_STATUS.flush_inputs_to_zero) { + arith_excp(env, GETPC(), EXC_M_INV | EXC_M_SWC, 0); + } +} +} + A couple of points here: 1) We should never raise this in user-only mode. In that mode, we emulate the whole fpu stack, all the way through from HW to the OS completion handler. 2) Because of that, we have the capability of doing the same thing in system mode. This lets us do more of the computation in the host, and less in the guest, which is faster. The only thing this makes more difficult is debugging the OS completion handlers within the kernel, since they'll only get invoked when SIGFPE needs to be sent. 3) If we do want to implement a mode where we faithfully send SWC for all of the bits of IEEE that real HW didn't implement, do we really need to avoid a store to the output register when signalling this? I.e. can we notice this condition after the fact with float_flag_input_denormal, rather than having another function call to prep the inputs? r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Mon, Jul 07, 2014 at 07:11:58AM -0700, Richard Henderson wrote: A couple of points here: 1) We should never raise this in user-only mode. In that mode, we emulate the whole fpu stack, all the way through from HW to the OS completion handler. How is that different from other cases where we have an exception raised by an fp operation? 2) Because of that, we have the capability of doing the same thing in system mode. This lets us do more of the computation in the host, and less in the guest, which is faster. The only thing this makes more difficult is debugging the OS completion handlers within the kernel, since they'll only get invoked when SIGFPE needs to be sent. Umm... The effect of software completion depends on current-ieee_state; how would you keep track of that outside of guest kernel? 3) If we do want to implement a mode where we faithfully send SWC for all of the bits of IEEE that real HW didn't implement, do we really need to avoid a store to the output register when signalling this? I.e. can we notice this condition after the fact with float_flag_input_denormal, rather than having another function call to prep the inputs? But flag_input_denormal is raised only when we do have DNZ set. Which is an entirely different case, where we should not (and do not) get an exception at all...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/07/2014 08:06 AM, Al Viro wrote: On Mon, Jul 07, 2014 at 07:11:58AM -0700, Richard Henderson wrote: A couple of points here: 1) We should never raise this in user-only mode. In that mode, we emulate the whole fpu stack, all the way through from HW to the OS completion handler. How is that different from other cases where we have an exception raised by an fp operation? In all other cases we know we're going to send SIGFPE. That's either through a non /S insn which the kernel wouldn't touch, or by having computed the true IEEE result and examined the exceptions to be raised. 2) Because of that, we have the capability of doing the same thing in system mode. This lets us do more of the computation in the host, and less in the guest, which is faster. The only thing this makes more difficult is debugging the OS completion handlers within the kernel, since they'll only get invoked when SIGFPE needs to be sent. Umm... The effect of software completion depends on current-ieee_state; how would you keep track of that outside of guest kernel? The kernel essentially keeps a copy of IEEE_STATE in the FPCR. I don't see any missing bits in ieee_swcr_to_fpcr, do you? While real hardware might ignore some of those bits once stored, qemu doesn't. While in real hardware one could force the FPCR and IEEE_STATE to differ, honestly that'd be a bug. (Although a silly one; I wish the kernel took the EV6 FPCR as gospel for everything, not just the status flags. That could make certain libm.so computations much faster.) 3) If we do want to implement a mode where we faithfully send SWC for all of the bits of IEEE that real HW didn't implement, do we really need to avoid a store to the output register when signalling this? I.e. can we notice this condition after the fact with float_flag_input_denormal, rather than having another function call to prep the inputs? But flag_input_denormal is raised only when we do have DNZ set. Which is an entirely different case, where we should not (and do not) get an exception at all... Ah, you're right about that. I'd mis-remembered the implementation. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Mon, Jul 07, 2014 at 09:20:28AM -0700, Richard Henderson wrote: How is that different from other cases where we have an exception raised by an fp operation? In all other cases we know we're going to send SIGFPE. That's either through a non /S insn which the kernel wouldn't touch, or by having computed the true IEEE result and examined the exceptions to be raised. Umm... Not quite. Consider e.g. CVTQL case. There we have the following picture in case of overflow (real hw with Linux on top of it): no suffix: IOV INE /v: IOV INE SIGFPE /sv, no IEEE INVE: IOV INE INV /sv, IEEE INVE: IOV INE INV SIGFPE This is after the completion had a chance to run. From the hw POV it's no suffix IOV INE no trap /v IOV INE trapIOV /sv IOV INE trapSWC,IOV and it's alpha_fp_emul() that does the rest in /sv case. Actually, it's even simpler: if overflow FPCR.INE = 1 raise IOV do usual trap suffix handling and I'm reasonably sure that this is what they did internally. You are proposing to do 4 cases in all their messy glory in qemu itself... And I wouldn't bet a dime on not having similar turds in other insns; after all, it's hard, let's offload it to software was only a part of motivation for software completions. We really don't like this part of IEEE standard and we'd love to tell you to see figure 1, but we need conformance, so you can mark an insn with /s and have the kernel do what IEEE requires is also very visible in their manual. Result is a mess - if you try to fold the kernel-side stuff into hardware, you end up with a pile of inconsistent behaviours. In principle, it's doable, especially since we are not really constrained by actual hw in terms of what we do in case of FPCR.DNOD being true - no actual hw could set it. So we want * hw behaviour without /s (denorms trap) * hw behaviour with /s without denorms * hw behaviour with /s with denorms with FPCR.DNZ (same as with 0) * kernel completion behaviour with /s with denorm and it might even be what they intended for DNOD to do. But it's going to be messy as hell. And that's not even going into generating the right si_code for that SIGFPE. What produces those TARGET_GEN_FLTINE and friends?
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Sat, Jul 05, 2014 at 02:40:55AM +0100, Al Viro wrote: a) softfloat.c raises flags we don't care about. So checking that FP_STATUS.float_exception_flags is non-zero is *not* good - we catch false positives that way. b) DNZ has effect *only* for /S insns. Without /S denorm means INV and that's it. FPCR.INV isn't set, at that. FPCR.INVD is ignored (it affects only insns with /S). c) without DNZ or DNOD denorms trip INV even with /S. Again, FPCR.INV is not set *and* FPCR.INVD is ignored. It does stop INV from SQRTT/SU on -1, but not on DBL_MIN/2 (and on SQRTT/SU(-1) FPCR.INV is set). Looks like this sucker is a separate kind of trap, the only similarity with INV being that it sets the same bit in trap summary word. BTW, CVTTQ/SVI on denorms with DNZ shouldn't set Inexact. Right now I have duplicate of 21264 SQRTT behaviour on everything except infinities; hadn't looked into those yet. I'm going to massage it a bit and see if the result causes any regressions for corner cases of MULT and friends. Hopefully I'll have something usable by tomorrow... Situation with infinities/NaNs: without /S we should trap on those guys for arithmetics and conversions; in trap summary we get EXC_M_INV (regardless of the argument) *and* (unlike the treatment of denorms there) we should set FPCR.INV. With /S they are passed to operation, which is responsible for raising whatever it wants to raise (so far they all seem to be doing the right thing in that area). With comparisons, denorm handling is the same as for arithemtics; i.e. with /S they trigger INV unless DNZ is set (and, presumably, working DNOD would have the same effect on them). Anyway, the current delta (on top of 26f86) follows; seems to get IEEE insns behave on non-finite arguments as they do on 21264. The main exception is that register bitmask supplied to trap isn't calculated in a bunch of cases; since its main purpose is to help locating the trapping insn and we report precise traps (amask feature bit 9), it's probably not an interesting problem. Current Linux kernel definitely won't look at that thing under qemu; an old one might, but it would have to be something older than 2.3... checks the history than 2.2.8, actually. And the impact is that insns with /S getting a denorm argument won't be properly emulated and you'll get SIGFPE. Again, it has to be a really old kernel (older than May 1999) to be affected at all. diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c index 9b297de..637d95e 100644 --- a/target-alpha/fpu_helper.c +++ b/target-alpha/fpu_helper.c @@ -44,6 +44,12 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env) return get_float_exception_flags(FP_STATUS); } +enum { + Exc_Mask = float_flag_invalid | float_flag_int_overflow | + float_flag_divbyzero | float_flag_overflow | + float_flag_underflow | float_flag_inexact +}; + static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr, uint32_t exc, uint32_t regno, uint32_t hw_exc) { @@ -73,7 +79,7 @@ static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr, doesn't apply. */ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno) { -uint32_t exc = (uint8_t)env-fp_status.float_exception_flags; +uint32_t exc = (uint8_t)env-fp_status.float_exception_flags Exc_Mask; if (exc) { env-fpcr_exc_status |= exc; exc = ~ignore; @@ -86,7 +92,7 @@ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno) /* Raise exceptions for ieee fp insns with software completion. */ void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t ignore, uint32_t regno) { -uint32_t exc = (uint8_t)env-fp_status.float_exception_flags; +uint32_t exc = (uint8_t)env-fp_status.float_exception_flags Exc_Mask; if (exc) { env-fpcr_exc_status |= exc; exc = ~ignore; @@ -105,16 +111,14 @@ void helper_ieee_input(CPUAlphaState *env, uint64_t val) uint64_t frac = val 0xfull; if (exp == 0) { -/* Denormals without DNZ set raise an exception. */ -if (frac != 0 !env-fp_status.flush_inputs_to_zero) { -arith_excp(env, GETPC(), EXC_M_UNF, 0); +/* Denormals without /S raise an exception. */ +if (frac != 0) { +arith_excp(env, GETPC(), EXC_M_INV, 0); } } else if (exp == 0x7ff) { -/* Infinity or NaN. */ -/* ??? I'm not sure these exception bit flags are correct. I do - know that the Linux kernel, at least, doesn't rely on them and - just emulates the insn to figure out what exception to use. */ -arith_excp(env, GETPC(), frac ? EXC_M_INV : EXC_M_FOV, 0); +/* Infinity or NaN */ +env-fpcr_exc_status |= float_flag_invalid; +arith_excp(env, GETPC(), EXC_M_INV, 0); } } @@ -125,16 +129,34 @@ void
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Sat, Jul 05, 2014 at 10:09:51PM +0100, Al Viro wrote: Anyway, the current delta (on top of 26f86) follows; seems to get IEEE insns behave on non-finite arguments as they do on 21264. The main exception is that register bitmask supplied to trap isn't calculated in a bunch of cases; since its main purpose is to help locating the trapping insn and we report precise traps (amask feature bit 9), it's probably not an interesting problem. Current Linux kernel definitely won't look at that thing under qemu; an old one might, but it would have to be something older than 2.3... checks the history than 2.2.8, actually. And the impact is that insns with /S getting a denorm argument won't be properly emulated and you'll get SIGFPE. Again, it has to be a really old kernel (older than May 1999) to be affected at all. ... and a followup (and the last part of exception handling for non-VAX insn inputs, AFAICS) - CVTQL. * whether it triggers a trap or not, it sets IOV and INE on overflow. * in case of trap it does *not* bugger off immediately - result is calculated, stored and only then we trap. * trap summary word is different from cvtql/v and cvtql/sv. IOW, it's yet another case of we think that IEEE semantics is stupid and if you need it, you'd damn better ask for it explicitly. Note that cvtql/v sets IOV|INE and hits SIGFPE no matter what, while cvtql/sv set INV instead and triggers SIGFPE only if FP_INVALID is enabled. All difference is kernel-side and it's triggered by EXC_M_SWC in summary word. AFAICS, that should be it for IEEE and shared insns, as far as exceptions on inputs are concerned. Combined delta follows: diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c index 9b297de..25c83b5 100644 --- a/target-alpha/fpu_helper.c +++ b/target-alpha/fpu_helper.c @@ -44,6 +44,12 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env) return get_float_exception_flags(FP_STATUS); } +enum { + Exc_Mask = float_flag_invalid | float_flag_int_overflow | + float_flag_divbyzero | float_flag_overflow | + float_flag_underflow | float_flag_inexact +}; + static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr, uint32_t exc, uint32_t regno, uint32_t hw_exc) { @@ -73,7 +79,7 @@ static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr, doesn't apply. */ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno) { -uint32_t exc = (uint8_t)env-fp_status.float_exception_flags; +uint32_t exc = (uint8_t)env-fp_status.float_exception_flags Exc_Mask; if (exc) { env-fpcr_exc_status |= exc; exc = ~ignore; @@ -86,7 +92,7 @@ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno) /* Raise exceptions for ieee fp insns with software completion. */ void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t ignore, uint32_t regno) { -uint32_t exc = (uint8_t)env-fp_status.float_exception_flags; +uint32_t exc = (uint8_t)env-fp_status.float_exception_flags Exc_Mask; if (exc) { env-fpcr_exc_status |= exc; exc = ~ignore; @@ -105,16 +111,14 @@ void helper_ieee_input(CPUAlphaState *env, uint64_t val) uint64_t frac = val 0xfull; if (exp == 0) { -/* Denormals without DNZ set raise an exception. */ -if (frac != 0 !env-fp_status.flush_inputs_to_zero) { -arith_excp(env, GETPC(), EXC_M_UNF, 0); +/* Denormals without /S raise an exception. */ +if (frac != 0) { +arith_excp(env, GETPC(), EXC_M_INV, 0); } } else if (exp == 0x7ff) { /* Infinity or NaN. */ -/* ??? I'm not sure these exception bit flags are correct. I do - know that the Linux kernel, at least, doesn't rely on them and - just emulates the insn to figure out what exception to use. */ -arith_excp(env, GETPC(), frac ? EXC_M_INV : EXC_M_FOV, 0); +env-fpcr_exc_status |= float_flag_invalid; +arith_excp(env, GETPC(), EXC_M_INV, 0); } } @@ -125,16 +129,34 @@ void helper_ieee_input_cmp(CPUAlphaState *env, uint64_t val) uint64_t frac = val 0xfull; if (exp == 0) { -/* Denormals without DNZ set raise an exception. */ -if (frac != 0 !env-fp_status.flush_inputs_to_zero) { -arith_excp(env, GETPC(), EXC_M_UNF, 0); +/* Denormals raise an exception. */ +if (frac != 0) { +arith_excp(env, GETPC(), EXC_M_INV, 0); } } else if (exp == 0x7ff frac) { /* NaN. */ +env-fpcr_exc_status |= float_flag_invalid; arith_excp(env, GETPC(), EXC_M_INV, 0); } } +/* Input handing with software completion. Trap for denorms, + unless DNZ is set. *IF* we try to support DNOD (which + none of the produced hardware did, AFAICS), we'll need + to suppress the trap when FPCR.DNOD is set;
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Thu, Jul 03, 2014 at 09:30:05PM -0700, Richard Henderson wrote: Another one is probably not worth bothering - PERR, CTPOP, CTLZ, UNPKBx and PKxB don't accept literal argument. For one thing, as(1) won't let you generate those, so it would have to be explicit .long 0x70001620 instead of perr $0,0,$0 On DS10 it gives SIGILL; under qemu it succeeds. Trivial to fix, anyway, if we care about that (if (islit) goto invalid_opc; in 1C.030..1C.037). Is it just 030..037, or everything under opcode 1C? No, just those. Sadly, V4 of the handbook doesn't mention *anything* about not actually allowing literals for any of these insns. It does - compare 4.13.1 (4-155, page 225) with 4.13.2 (two pages later). The former has MINxxx Ra.rq,Rb.rq,Rc.wq ! Operate format Ra.rq,#b.ib,Rc.wq MINxxx Ra.rq,Rb.rq,Rc.wq ! Operate format Ra.rq,#b.ib,Rc.wq The latter - PERRRa.rq,Rb.rq,Rc.wq ! Operate format And yes, PERR with bit 12 set will give invalid instruction trap, while e.g. MINSB8 won't. The Real Intrudprize Kwality(tm) of technical writing, that, but the information is, indeed, there. Verified on UP1000, which has 0x307 for feature bits, so all this stuff is really in hardware, not emulated. OPC1C is a mess - that's one place on alpha where decoder needs more than upper 6 bits to determine the format of instruction. Most of those guys are Operate (with an extra twist being that some don't take literals), but FTOIS and FTOIT are F-P, and only approximately so (its source refers to integer register, destination to floating point one). Note that function field is in bits 5--11 for Operate and 5--15 for F-P ;-/ Bit 11 allows to discriminate between those, since FTOIS and FTOIT have function 0x70 and 0x78 resp, while everything else has it lower than 0x40. Hell knows how that mess had come to be... Anyway, the situation with literals in OPC1C: 0, 1 (SEXT[BW]). Work, rejected by as(1). 0x30--0x37. Invalid instruction trap, as(1) (correctly) refuses to produce those. 0x38--0x3f. Work, accepted by as(1). 0x70, 0x78. Those are F-P, no literals for them. SEXTB/SEXTW are missing ARG_OPRL form in binutils opcodes/alpha-opc.c; probably not worth bothering...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
Denorms fun: a) softfloat.c raises flags we don't care about. So checking that FP_STATUS.float_exception_flags is non-zero is *not* good - we catch false positives that way. b) DNZ has effect *only* for /S insns. Without /S denorm means INV and that's it. FPCR.INV isn't set, at that. FPCR.INVD is ignored (it affects only insns with /S). c) without DNZ or DNOD denorms trip INV even with /S. Again, FPCR.INV is not set *and* FPCR.INVD is ignored. It does stop INV from SQRTT/SU on -1, but not on DBL_MIN/2 (and on SQRTT/SU(-1) FPCR.INV is set). Looks like this sucker is a separate kind of trap, the only similarity with INV being that it sets the same bit in trap summary word. d) at least on EV6 and EV67 DNOD *still* trips INV. According to the manual suppression of INV by DNOD is optional. And while their text might be interpreted as INV is suppressed if operation with denorm wouldn't result in something unpleasant (which would apply to sqrt(DBL_MIN/2)), the same behaviour happens on DBL_MIN/2 + DBL_MIN/2, where the result is a good finite value, so it really looks like DNOD doesn't suppress INV at all on these processors. Does anybody have 21364 to run some tests on? FWIW, hw testing had been done by direct printk from do_entArith(); it's before anything alpha_fpu_emu() does. Right now I have duplicate of 21264 SQRTT behaviour on everything except infinities; hadn't looked into those yet. I'm going to massage it a bit and see if the result causes any regressions for corner cases of MULT and friends. Hopefully I'll have something usable by tomorrow... Al, wondering if the original regression testsuite still exists somewhere in the bowels of Intel - DEC/Compaq/HP had to have one for testing the hardware back then...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Sat, Jul 05, 2014 at 02:40:55AM +0100, Al Viro wrote: d) at least on EV6 and EV67 DNOD *still* trips INV. According to the manual suppression of INV by DNOD is optional. And while their text might be interpreted as INV is suppressed if operation with denorm wouldn't result in something unpleasant (which would apply to sqrt(DBL_MIN/2)), the same behaviour happens on DBL_MIN/2 + DBL_MIN/2, where the result is a good finite value, so it really looks like DNOD doesn't suppress INV at all on these processors. Does anybody have 21364 to run some tests on? In fact, DNOD is simply not implemented on those guys - if you try to set it, the bit still reads zero. Worse, according to Compiler Writer's Guide for the 21264/21364, Alpha architecture FPCR bit 47 (DNOD) is not implemented by the 21264 or 21364. In other words, it looks like FPCR.DNOD is something from (never-produced) 21464.
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
More bugs: addl/v should sign-extend the result, as addl does. As it is, we have uint64_t helper_addlv(CPUAlphaState *env, uint64_t op1, uint64_t op2) { uint64_t tmp = op1; op1 = (uint32_t)(op1 + op2); if (unlikely((tmp ^ op2 ^ (-1UL)) (tmp ^ op1) (1UL 31))) { arith_excp(env, GETPC(), EXC_M_IOV, 0); } return op1; } IOW, #include stdio.h long r; void __attribute__((noinline)) f(void) { asm __volatile( subl $31, 1, $0\n\t addl $0, $0, $1\n\t addl/v $0, $0, $0\n\t subq $0, $1, $0\n\t stq$0, %0\n\t : =m(r): :$0, $1); } main() { f(); printf(%ld\n, r); } ends up printing 0 on actual hardware (all variants) and 4294967296 on qemu. Similar problem with subl/v - #include stdio.h long r; void __attribute__((noinline)) f(void) { asm __volatile( subl $31, 1, $0\n\t subl/v $31, 1, $1\n\t subq $0, $1, $0\n\t stq$0, %0\n\t : =m(r): :$0, $1); } main() { f(); printf(%ld\n, r); } prints 0 on actual hw and -4294967296 on qemu. What constraints do we have on qemu host, anyway? Two's-complement, (int32_t)(uint32_t)x == x for any int32_t x? helper_mullv() seems to assume that... Oh, crap - our mull/v is sensitive to upper 32 bits of multiplicands. If you put 1UL32 into one register, 1 into another and say mull/v, result will be 0 and no overflow. qemu does int64_t res = (int64_t)op1 * (int64_t)op2; if (unlikely((int32_t)res != res)) { arith_excp(env, GETPC(), EXC_M_IOV, 0); } return (int64_t)((int32_t)res); which leads to overflow trap triggered for no good reason... Incidentally, all those guys ({add,sub,mul}[lq]/v) *do* assign the result (same as the variant without /v would) before entering the trap. So arith_excp() is wrong here. FWIW, why not just generate trunc_i64_i32 tmp, va trunc_i64_i32 tmp2, vb muls2_i32 tmp2, tmp, tmp, tmp2 ext32s_i64 vc, tmp2 maybe_overflow_32 tmp where maybe_overflow throws IOV unless tmp is 0 or -1? That would appear to suffice for mull/v. mulq/v would be muls2_i64 vc, tmp, va, vb maybe_overflow_64 tmp addl/v: trunc_i64_i32 tmp, va trunc_i64_i32 tmp2, vb add2_i32 tmp2, tmp, tmp, zero, tmp2, zero ext32s_i64 vc, tmp2 maybe_overflow_32 tmp etc. We'd need two helpers, differing only in argument type. Simple if (unlikely(arg ~arg)) arith_excp(env, GETPC(), EXC_M_IOV, 0); would do. Not sure what flags would be needed in DEFINE_HELPER_... for those, though. Comments?
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Thu, Jul 03, 2014 at 07:51:04AM +0100, Al Viro wrote: FWIW, why not just generate trunc_i64_i32 tmp, va trunc_i64_i32 tmp2, vb muls2_i32 tmp2, tmp, tmp, tmp2 ext32s_i64 vc, tmp2 maybe_overflow_32 tmp where maybe_overflow throws IOV unless tmp is 0 or -1? to suffice for mull/v. mulq/v would be muls2_i64 vc, tmp, va, vb maybe_overflow_64 tmp addl/v: trunc_i64_i32 tmp, va trunc_i64_i32 tmp2, vb add2_i32 tmp2, tmp, tmp, zero, tmp2, zero ext32s_i64 vc, tmp2 maybe_overflow_32 tmp etc. Grr... Wrong check, obviously - we want to check that tmp + MSB(tmp2) is 0. Something like setcond_32 tmp2, tmp2, zero, TCG_COND_LT add_i32 tmp, tmp2, tmp callhelper_IOV_if_not_zero tmp for 32bit ones and setcond_64 tmp2, vc, zero, TCG_COND_LT add_i64 tmp, tmp2, tmp callhelper_IOV_if_not_zero tmp for 64bit ones, or would it be better just to pass both arguments to helper and let it deal with the check? I'm not familiar enough with TCG, sorry...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/03/2014 11:25 AM, Al Viro wrote: On Thu, Jul 03, 2014 at 07:51:04AM +0100, Al Viro wrote: FWIW, why not just generate trunc_i64_i32 tmp, va trunc_i64_i32 tmp2, vb muls2_i32 tmp2, tmp, tmp, tmp2 ext32s_i64 vc, tmp2 maybe_overflow_32 tmp where maybe_overflow throws IOV unless tmp is 0 or -1? to suffice for mull/v. mulq/v would be muls2_i64 vc, tmp, va, vb maybe_overflow_64 tmp addl/v: trunc_i64_i32 tmp, va trunc_i64_i32 tmp2, vb add2_i32 tmp2, tmp, tmp, zero, tmp2, zero ext32s_i64 vc, tmp2 maybe_overflow_32 tmp etc. Grr... Wrong check, obviously - we want to check that tmp + MSB(tmp2) is 0. Something like setcond_32 tmp2, tmp2, zero, TCG_COND_LT add_i32 tmp, tmp2, tmp callhelper_IOV_if_not_zero tmp for 32bit ones and setcond_64 tmp2, vc, zero, TCG_COND_LT add_i64 tmp, tmp2, tmp callhelper_IOV_if_not_zero tmp for 64bit ones, or would it be better just to pass both arguments to helper and let it deal with the check? I'm not familiar enough with TCG, sorry... I believe I have a tidy solution to these /v insns. New patch set shortly. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Thu, Jul 03, 2014 at 01:19:19PM -0700, Richard Henderson wrote: Grr... Wrong check, obviously - we want to check that tmp + MSB(tmp2) is 0. Something like setcond_32 tmp2, tmp2, zero, TCG_COND_LT add_i32 tmp, tmp2, tmp callhelper_IOV_if_not_zero tmp for 32bit ones and setcond_64 tmp2, vc, zero, TCG_COND_LT add_i64 tmp, tmp2, tmp callhelper_IOV_if_not_zero tmp for 64bit ones, or would it be better just to pass both arguments to helper and let it deal with the check? I'm not familiar enough with TCG, sorry... I believe I have a tidy solution to these /v insns. New patch set shortly. Hmm... +tcg_gen_eqv_i64(tmp, va, vb); +tcg_gen_mov_i64(tmp2, va); +tcg_gen_add_i64(vc, va, vb); +tcg_gen_xor_i64(tmp2, tmp2, vc); +tcg_gen_and_i64(tmp, tmp, tmp2); +tcg_gen_shri_i64(tmp, tmp, 63); +tcg_gen_movi_i64(tmp2, 0); +gen_helper_check_overflow(cpu_env, tmp, tmp2); How can that be correct? Suppose a = b = 0. We get tcg_gen_eqv_i64(tmp, va, vb); - tmp = -1 tcg_gen_mov_i64(tmp2, va); - tmp2 = 0 tcg_gen_add_i64(vc, va, vb);- c = 0 tcg_gen_xor_i64(tmp2, tmp2, vc);- tmp2 = 0 tcg_gen_and_i64(tmp, tmp, tmp2);- tmp = -1 tcg_gen_shri_i64(tmp, tmp, 63); - tmp = 1 tcg_gen_movi_i64(tmp2, 0); - tmp2 = 0 gen_helper_check_overflow(cpu_env, tmp, tmp2); - not equal, overflow. What am I missing here?
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 3 July 2014 23:47, Al Viro v...@zeniv.linux.org.uk wrote: How can that be correct? Suppose a = b = 0. We get tcg_gen_eqv_i64(tmp, va, vb); - tmp = -1 tcg_gen_mov_i64(tmp2, va); - tmp2 = 0 tcg_gen_add_i64(vc, va, vb);- c = 0 tcg_gen_xor_i64(tmp2, tmp2, vc);- tmp2 = 0 tcg_gen_and_i64(tmp, tmp, tmp2);- tmp = -1 tmp2 here is 0, so the result of this AND is 0, not -1... tcg_gen_shri_i64(tmp, tmp, 63); - tmp = 1 so tmp = 0 tcg_gen_movi_i64(tmp2, 0); - tmp2 = 0 gen_helper_check_overflow(cpu_env, tmp, tmp2); - not equal, overflow. and tmp == tmp2, no overflow. thanks -- PMM
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Fri, Jul 04, 2014 at 12:05:37AM +0100, Peter Maydell wrote: On 3 July 2014 23:47, Al Viro v...@zeniv.linux.org.uk wrote: How can that be correct? Suppose a = b = 0. We get tcg_gen_eqv_i64(tmp, va, vb); - tmp = -1 tcg_gen_mov_i64(tmp2, va); - tmp2 = 0 tcg_gen_add_i64(vc, va, vb);- c = 0 tcg_gen_xor_i64(tmp2, tmp2, vc);- tmp2 = 0 tcg_gen_and_i64(tmp, tmp, tmp2);- tmp = -1 tmp2 here is 0, so the result of this AND is 0, not -1... Doh. Misread it as tcg_gen_add_i64, sorry. Hmm... So it's ((a ^ ~b) (a ^ c) 0 as overflow condition, IOW MSB(a) == MSB(b) MSB(c) != MSB(a). OK, that works; might deserve a comment, though...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Thu, Jul 03, 2014 at 01:19:19PM -0700, Richard Henderson wrote: I believe I have a tidy solution to these /v insns. New patch set shortly. OK, looks sane. Next (trivial) bug: in translate_one() case 0xF800: /* WH64 */ /* No-op */ break; should be followed by case 0xFC00: /* WH64EN */ /* No-op */ break; As it is, asm __volatile( lda$0,%0\n\t wh64en ($0)\n\t :: m(r)); ends sending SIGILL. Another one is probably not worth bothering - PERR, CTPOP, CTLZ, UNPKBx and PKxB don't accept literal argument. For one thing, as(1) won't let you generate those, so it would have to be explicit .long 0x70001620 instead of perr $0,0,$0 On DS10 it gives SIGILL; under qemu it succeeds. Trivial to fix, anyway, if we care about that (if (islit) goto invalid_opc; in 1C.030..1C.037). Another interesting bit I _really_ don't want to touch right now is LDx_L/STx_C; what we get there is closer to compare-and-swap than to what the real hardware is doing. OTOH, considering the constraints on what can go between LDx_L and STx_C, I'm not sure whether it can lead to any real problems with the current qemu behaviour... Hell knows; could a long linear piece of code with LDL_L near the point where it runs out of space in block end up with QEMU switching to different cpu before we reach the matching STL_C? If so, there might be problems; on actual hardware CPU1: LDL_L reads 0 CPU2: store 1 ... CPU2: store 0 CPU1: STL_C would have STL_C fail. qemu implementation of those suckers will succeed. I'm not sure if anything in the kernel is sensitive to that, but analysis won't be fun...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/03/2014 05:50 PM, Al Viro wrote: OK, looks sane. Next (trivial) bug: in translate_one() case 0xF800: /* WH64 */ /* No-op */ break; should be followed by case 0xFC00: /* WH64EN */ /* No-op */ break; Huh. I don't have any documentation for EV7. Added. Another one is probably not worth bothering - PERR, CTPOP, CTLZ, UNPKBx and PKxB don't accept literal argument. For one thing, as(1) won't let you generate those, so it would have to be explicit .long 0x70001620 instead of perr $0,0,$0 On DS10 it gives SIGILL; under qemu it succeeds. Trivial to fix, anyway, if we care about that (if (islit) goto invalid_opc; in 1C.030..1C.037). Is it just 030..037, or everything under opcode 1C? Sadly, V4 of the handbook doesn't mention *anything* about not actually allowing literals for any of these insns. For now, I've updated insns in the range you describe, because it's easy. CPU1: LDL_L reads 0 CPU2: store 1 ... CPU2: store 0 CPU1: STL_C would have STL_C fail. qemu implementation of those suckers will succeed. I'm not sure if anything in the kernel is sensitive to that, but analysis won't be fun... I'm aware that lock/cond can be used in ways that we don't support, including STL_C to a different address on the same cacheline as the LDL_L. I'm also aware that if we actually did implement SMP, we would be vulnerable to the ABA error you describe above. That said, it's all moot until the PALcode grows actual SMP support for booting and signalling secondary cpus. Given that qemu implements SMP by multiplexing the guest cpus on a single host thread, and so we can't actually speed up the guest by implementing SMP, it's not seemed like a priority. The next thing I'd work on given oodles of time is to add block device support to the PALcode, so that the console could boot from disk like a real machine. In theory, most of this code can be stolen from SeaBIOS. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Wed, Jul 02, 2014 at 06:50:27AM +0100, Al Viro wrote: AFAICS, it leaves two possibilities - EV45 (AS200) vs. EV6 (DS10) and EV67 (qemu) _or_ some change in the kernel. I'll build 3.x kernel for DS10 and post the results; shouldn't take long... Actually, it's simpler - note that on *all* systems we end up with FPCR.INE set. So this swcr_update_status(unsigned long swcr, unsigned long fpcr) { /* EV6 implements most of the bits in hardware. Collect the acrued exception bits from the real fpcr. */ if (implver() == IMPLVER_EV6) { swcr = ~IEEE_STATUS_MASK; swcr |= (fpcr 35) IEEE_STATUS_MASK; } return swcr; } ends up with FE_INEXACT set on everything that has implver() return 2. Which is what EV6 and EV67 do and which is what qemu does by default. So no, it's not a kernel version difference; it's all kernel versions ignoring FPCR.INE when it calculates ieee_state on EV45 and using it on EV6 and friends. If we don't want FE_INEXACT seen by fetestexcept() after rounding 4.5, we'd better not use FPCR.INE - *all* variants of actual hardware (at least from 21064A to 21264) set that sucker, and 4.7 in Architecture Reference Manual very clearly requires such behaviour for any subset that isn't completely without floating point support.
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/01/2014 09:05 PM, Al Viro wrote: Which glibc version it is? I don't see such failures with your axp/axp-next (head at 6b38f4e7f); could you post the details on your reproducer? I've tried to guess the likely version by glibc.git, but I don't see nearbyint tests with such argument in any version there, so I couldn't find it that way... Glibc mainline, then look at math/test-double.out. I'm interested in the results of the following test. r~ #include stdio.h #include fenv.h #include math.h #include float.h static void div_su(void) { asm(divt/su %0,%1,$f0; trapb : : f(1.0), f(3.0) : $f0); } static void div_sui(void) { asm(divt/su %0,%1,$f0; trapb : : f(1.0), f(3.0) : $f0); } static void mul_su(void) { asm(mult/su %0,%0,$f0; trapb : : f(DBL_MIN) : $f0); } static void mul_sui(void) { asm(mult/sui %0,%0,$f0; trapb : : f(DBL_MIN) : $f0); } static void cvttq_45(void) { asm(cvttq/c %0,$f0; trapb : : f(4.5) : $f0); } static void cvttq_sv_45(void) { asm(cvttq/svc %0,$f0; trapb : : f(4.5) : $f0); } static void cvttq_svi_45(void) { asm(cvttq/svic %0,$f0; trapb : : f(4.5) : $f0); } static void cvttq_max(void) { asm(cvttq/c %0,$f0; trapb : : f(DBL_MAX) : $f0); } static void cvttq_sv_max(void) { asm(cvttq/svc %0,$f0; trapb : : f(DBL_MAX) : $f0); } static void cvttq_svi_max(void) { asm(cvttq/svic %0,$f0; trapb : : f(DBL_MAX) : $f0); } static struct test { void (*fn)(void); const char *name; } const tests[] = { { div_su, /su : 1/3 }, { div_sui, /sui : 1/3 }, { mul_su, /su : min*min }, { mul_sui, /sui : min*min }, { cvttq_45, /: (long)4.5 }, { cvttq_sv_45, /sv : (long)4.5 }, { cvttq_svi_45, /svi : (long)4.5 }, { cvttq_max, /: (long)max }, { cvttq_sv_max, /sv : (long)max }, { cvttq_svi_max, /svi : (long)max }, }; int main() { char result[8]; int i, e; for (i = 0; i sizeof(tests)/sizeof(struct test); ++i) { feclearexcept(FE_ALL_EXCEPT); tests[i].fn(); e = fetestexcept(FE_ALL_EXCEPT); result[0] = e FE_DIVBYZERO ? 'd' : '-'; result[1] = e FE_INEXACT ? 'i' : '-'; result[2] = e FE_INVALID ? 'I' : '-'; result[3] = e FE_OVERFLOW ? 'o' : '-'; result[4] = e FE_UNDERFLOW ? 'u' : '-'; result[5] = '\0'; printf(%-20s %s\n, tests[i].name, result); } return 0; }
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
I'm interested in the results of the following test. DS10: /su : 1/3 -i--- /sui : 1/3 -i--- /su : min*min -i--u /sui : min*min -i--u /: (long)4.5 -i--- /sv : (long)4.5 -i--- /svi : (long)4.5 -i--- /: (long)max -i--- /sv : (long)max -iI-- /svi : (long)max -iI-- AS200: /su : 1/3 - /sui : 1/3 - /su : min*min -i--u /sui : min*min -i--u /: (long)4.5 - /sv : (long)4.5 - /svi : (long)4.5 - /: (long)max - /sv : (long)max --I-- /svi : (long)max --I-- qemu: /su : 1/3 -i--- /sui : 1/3 -i--- /su : min*min -i--u /sui : min*min -i--u /: (long)4.5 -i--- /sv : (long)4.5 -i--- /svi : (long)4.5 -i--- /: (long)max -i--- /sv : (long)max -iI-- /svi : (long)max -iI-- IOW, same as EV6. The difference is due to the kernel trusting FPCR.INE as source for FE_INEXACT when it sees implver() returning 2. See a bit upthread for analysis...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/01/2014 11:17 PM, Al Viro wrote: If we don't want FE_INEXACT seen by fetestexcept() after rounding 4.5, we'd better not use FPCR.INE - *all* variants of actual hardware (at least from 21064A to 21264) set that sucker, and 4.7 in Architecture Reference Manual very clearly requires such behaviour for any subset that isn't completely without floating point support. Um, where do you see that? I see: # 4.7.6.4 IEEE-Compliant Arithmetic Without Inexact Exception # This model is similar to the model in Section 4.7.6.3, except this # model does not signal inexact results either by the inexact status # flag or by trapping. [...] This model is implemented by using IEEE # floating-point instructions with the /SU or /SV trap qualifiers. The important words to me being does not signal and inexact status flag. Thus in sysdeps/alpha/fpu/s_nearbyint.c I explicitly use cvttq/svd and not cvttq/svid. By my reading that means no inexact shall be raised. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Wed, Jul 02, 2014 at 08:26:53AM -0700, Richard Henderson wrote: On 07/01/2014 11:17 PM, Al Viro wrote: If we don't want FE_INEXACT seen by fetestexcept() after rounding 4.5, we'd better not use FPCR.INE - *all* variants of actual hardware (at least from 21064A to 21264) set that sucker, and 4.7 in Architecture Reference Manual very clearly requires such behaviour for any subset that isn't completely without floating point support. Um, where do you see that? I see: # 4.7.6.4 IEEE-Compliant Arithmetic Without Inexact Exception # This model is similar to the model in Section 4.7.6.3, except this # model does not signal inexact results either by the inexact status # flag or by trapping. [...] This model is implemented by using IEEE # floating-point instructions with the /SU or /SV trap qualifiers. The important words to me being does not signal and inexact status flag. Thus in sysdeps/alpha/fpu/s_nearbyint.c I explicitly use cvttq/svd and not cvttq/svid. By my reading that means no inexact shall be raised. What does that have to do with exceptions? cvttq/svd is not going to raise one; it *does* set that bit in FPCR, though. What happens afterwards is that fetestexcept() calls osf_getsysinfo(2) with GSI_IEEE_FP_CONTROL for op. Which does w = current_thread_info()-ieee_state IEEE_SW_MASK; w = swcr_update_status(w, rdfpcr()); and hands the value of w to caller. Now, look at swcr_update_status() (in arch/alpha/include/uapi/asm/fpu.h these days) and note that on 21264 it will throw away the status bits of -ieee_state and use 6 bits from FPCR instead. Note, BTW, that appendix B (IEEE conformance) claims (in B.1) conversions as hardware-implemented, with Software routines support remainder, round to integer in floating-point format, and convert binary to/from decimal right next to it.
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 06/30/2014 09:34 PM, Al Viro wrote: VAX operations are serious mess, but I'm not sure if we have them actually used anywhere in Linux kernel or userland. Always possible, of course, but... As far as I know, vax insns aren't used anywhere. If I were doing this port from scratch I'd leave them totally stubbed out. Let's not spend any kind of effort on this at all until other more important things are improved. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 06/30/2014 01:56 PM, Al Viro wrote: On Mon, Jun 30, 2014 at 11:39:43AM -0700, Richard Henderson wrote: Looks good. I've split it up into a couple of smaller patches, made some sylistic tweaks and pushed it to git://github.com/rth7680/qemu.git axp-next I'm starting to do some testing now, but a glance though would be helpful. Especially to see if I didn't make some silly mistake in the process. Hmm. I've just gotten through glibc testing and there are quite a few failures of the form Failure: nearbyint (4.5): Exception Inexact set in math/test-double.out. Any chance you can run the glibc math tests against real hardware and see if these pass? I have a feeling that qemu is now signaling inexact when the hardware doesn't for /SU (but not /SUI) instructions. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jul 01, 2014 at 10:03:06AM -0700, Richard Henderson wrote: On 06/30/2014 01:56 PM, Al Viro wrote: On Mon, Jun 30, 2014 at 11:39:43AM -0700, Richard Henderson wrote: Looks good. I've split it up into a couple of smaller patches, made some sylistic tweaks and pushed it to git://github.com/rth7680/qemu.git axp-next I'm starting to do some testing now, but a glance though would be helpful. Especially to see if I didn't make some silly mistake in the process. Hmm. I've just gotten through glibc testing and there are quite a few failures of the form Failure: nearbyint (4.5): Exception Inexact set in math/test-double.out. Any chance you can run the glibc math tests against real hardware and see if these pass? I have a feeling that qemu is now signaling inexact when the hardware doesn't for /SU (but not /SUI) instructions. Which glibc version? Better yet, could you throw preprocessed source my way? UP1000 box is not in a good shape and I'd rather avoid trying to run full glibc builds on it ;-/
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 1 July 2014 18:50, Al Viro v...@zeniv.linux.org.uk wrote: Which glibc version? Better yet, could you throw preprocessed source my way? UP1000 box is not in a good shape and I'd rather avoid trying to run full glibc builds on it ;-/ Would a 164LX be a useful (ie non-duplicate) extra resource for testing this stuff? That has a 21164 (EV5) in it. I haven't tried to boot it for some years, but I can have a try at resurrecting it if it would be helpful... thanks -- PMM
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 07/01/2014 11:23 AM, Peter Maydell wrote: On 1 July 2014 18:50, Al Viro v...@zeniv.linux.org.uk wrote: Which glibc version? Better yet, could you throw preprocessed source my way? UP1000 box is not in a good shape and I'd rather avoid trying to run full glibc builds on it ;-/ Would a 164LX be a useful (ie non-duplicate) extra resource for testing this stuff? That has a 21164 (EV5) in it. I haven't tried to boot it for some years, but I can have a try at resurrecting it if it would be helpful... An ev5 would be a good fallback. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 1 July 2014 19:30, Richard Henderson r...@twiddle.net wrote: On 07/01/2014 11:23 AM, Peter Maydell wrote: Would a 164LX be a useful (ie non-duplicate) extra resource for testing this stuff? That has a 21164 (EV5) in it. I haven't tried to boot it for some years, but I can have a try at resurrecting it if it would be helpful... An ev5 would be a good fallback. Well, it boots. /proc/cpuinfo claims it's an EV56 variation 7 revision 0. Currently running Debian etch. Let me know if you have anything you want me to run on it... thanks -- PMM
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jul 01, 2014 at 11:30:19AM -0700, Richard Henderson wrote: On 07/01/2014 11:23 AM, Peter Maydell wrote: On 1 July 2014 18:50, Al Viro v...@zeniv.linux.org.uk wrote: Which glibc version? Better yet, could you throw preprocessed source my way? UP1000 box is not in a good shape and I'd rather avoid trying to run full glibc builds on it ;-/ Would a 164LX be a useful (ie non-duplicate) extra resource for testing this stuff? That has a 21164 (EV5) in it. I haven't tried to boot it for some years, but I can have a try at resurrecting it if it would be helpful... An ev5 would be a good fallback. OK, DS10 resurrected and so far seems to be stable (I'll know by tomorrow; there's a possibility that chipset heatsink is dodgy, but so far it seems to be doing OK). That gives us a EV6 box. Which glibc version it is? I don't see such failures with your axp/axp-next (head at 6b38f4e7f); could you post the details on your reproducer? I've tried to guess the likely version by glibc.git, but I don't see nearbyint tests with such argument in any version there, so I couldn't find it that way...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Wed, Jul 02, 2014 at 05:05:08AM +0100, Al Viro wrote: OK, DS10 resurrected and so far seems to be stable (I'll know by tomorrow; there's a possibility that chipset heatsink is dodgy, but so far it seems to be doing OK). That gives us a EV6 box. Which glibc version it is? I don't see such failures with your axp/axp-next (head at 6b38f4e7f); could you post the details on your reproducer? I've tried to guess the likely version by glibc.git, but I don't see nearbyint tests with such argument in any version there, so I couldn't find it that way... FWIW, ; cat a.c 'EOF' #include stdio.h #include fenv.h volatile long x; void __attribute__((noinline)) f(double v) { x = v; } main() { unsigned long tmp, ret; static char *names[] = {IOV, INE, UNF, OVF, DZE, INV}; int i; feclearexcept(FE_ALL_EXCEPT); f(4.5); __asm__ __volatile__ ( stt $f0,%0\n\t trapb\n\t mf_fpcr $f0\n\t trapb\n\t stt $f0,%1\n\t ldt $f0,%0 : =m(tmp), =m(ret)); for (i = 0; i 6; i++) printf( %s, (ret (57-i)) 1 ? names[i] :); printf( %x , fetestexcept(FE_ALL_EXCEPT)); printf(FE_INEXACT = %x\n, FE_INEXACT); } EOF ; gcc -lm a.c ; ./a.out INE 20 FE_INEXACT = 20 ; uname -a Linux wynton 2.6.22-rc7 #1 Thu Aug 30 02:03:17 EDT 2007 alpha GNU/Linux That's on freshly resurrected DS10, just brought to the last debian/alpha (i.e. lenny). Kernel had been locally built back before the box has died. On miles (3.3.6+, AS200) result is different: INE 0 FE_INEXACT = 20 On qemu (with debian kernel from lenny - 2.6.26) it's the same as on DS10: INE 20 FE_INEXACT = 20 It _might_ be the difference between 3.3 and 2.6.20-somethine, but I doubt that. It's definitely not a matter of difference in libc versions - AS200 box has 2.13-38, but static binary built on DS10 (with its 2.7-18) copied on AS200 behaves there as locally built one (i.e. 0 from fetestexcept(), as opposed to FE_INEXACT the same static binary produces on DS10 and under qemu). AFAICS, it leaves two possibilities - EV45 (AS200) vs. EV6 (DS10) and EV67 (qemu) _or_ some change in the kernel. I'll build 3.x kernel for DS10 and post the results; shouldn't take long...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 06/25/2014 10:55 PM, Al Viro wrote: On Wed, Jun 25, 2014 at 08:01:17AM +0100, Al Viro wrote: On Tue, Jun 24, 2014 at 02:32:46PM -0700, Richard Henderson wrote: On 06/24/2014 02:24 PM, Al Viro wrote: Al, off to figure out the black magic TCG is using to generate calls... If you've a helper DEF_HELPER_1(halt, void, i64) then gen_helper_halt(...) will generate the tcg ops that result in the call. Another fun issue: CVTTQ is both underreporting the overflow *AND* reports the wrong kind - FOV instead of IOV. * it misses reporting overflows for case when it *knows* that overflow will happen - the need to shift up by more than 63 bits. Trivially fixed, of course. There overflow cases leave wrong result as well - should be 0. * it also misses reporting overflows for case when value is in ranges 2^63..2^64-1 and -2^64+1..-2^63-1. And yes, it's asymmetric - 2^63 is an overflow, -2^63 isn't. * overflow is reported by float_raise(float_flag_overflow, FP_STATUS). Wrong flag - it should be IOV, not FOV. And it should be set in FPCR regardless of the trap modifier (IOV, this VI thing is wrong - we should deal with that only when we generate a trap). * All overflow cases should raise INE as well. Could we steal bit 1 in float_exception_flags for IOV? It is (currently?) unused - enum { float_flag_invalid = 1, float_flag_divbyzero = 4, float_flag_overflow = 8, float_flag_underflow = 16, float_flag_inexact = 32, float_flag_input_denormal = 64, float_flag_output_denormal = 128 }; That would allow to deal with that crap nicely - we could have it raise the new flag, then have helper_fp_exc_raise_... for default trap mode mask it out (and yes, we need to set FPCR flags in default mode, as well as /U and /V - confirmed by direct experiment *and* by TFM). OK, I've managed to resurrect UP1000 box (FSVO resurrect - the southbridge DMA controller has always been buggered, with intermittent noise on one of the data lines; fans in CPU module are FUBAR as well - 17 and 20 RPM resp., so I don't risk keeping it running for long, etc.) Still, that allows to test EV67 and I hope to resurrect a DS10 box as well, which will allow for saner testing environment. Current delta follows, fixing gcc and libc testcases *and* AFAICS getting CVTTQ handling in line with what actual EV67 is doing. It's a dirty hack wrt float_raise() - relies on bit 1 never being raised by softfpu.c. I'll look into separating that bit, but it'll probably have non-zero costs ;-/ We need two flags - has IOV been raised during this insn (in this patch it's bit 1 of fp_status.float_exception_flags, cleaned along with those) and something to keep FPCR.IOV in (in this patch - bit 1 of fpcr_exc_status). Sure, we can add another uint8_t or two in struct CPUAlphaState, but that'll mean extra PITA in code and extra memory accesses... Review would be welcome. Looks good. I've split it up into a couple of smaller patches, made some sylistic tweaks and pushed it to git://github.com/rth7680/qemu.git axp-next I'm starting to do some testing now, but a glance though would be helpful. Especially to see if I didn't make some silly mistake in the process. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Mon, Jun 30, 2014 at 11:39:43AM -0700, Richard Henderson wrote: Looks good. I've split it up into a couple of smaller patches, made some sylistic tweaks and pushed it to git://github.com/rth7680/qemu.git axp-next I'm starting to do some testing now, but a glance though would be helpful. Especially to see if I didn't make some silly mistake in the process. The only problem I see at a glance is that CVTTQ should raise IOV|INE in ranges 2^63..2^64-1 and -2^64+1..-2^63-1 as well. That's what this || ((int64_t)(ret-sign) 0) thing there was about and yes, it does match the behaviour of actual hardware (verified both on EV45 and EV67). FWIW, it might be better to do what float64_to_int64_round_to_zero() is doing - i.e. if (shift = 0) { if (shift 64) ret = frac shift; if (shift 11 || a == LIT64(0xC3E0)) exc = 0; } since frac is between 1ULL52 and (1ULL53)-1, i.e. shift greater than 11 is guaranteed to overflow, shift less than 11 is guaranteed not to and shift exactly 11 won't overflow only in one case - frac == 1ULL52, sign = 1 (i.e. when we have -2^63 there). BTW, shift == 63 is interesting - we certainly overflow, but we want the result to be 0 or 2^63 depending on the least significant bit of mantissa, not always 0. IOW, 0x4720 should yield IOV|INE, with result being 0 and 0x4721 - IOV|INE and result 0x8000. Again, verified on actual hardware; the last patch I posted had been incorrect in the last case (both cases yield 0 with it, same as in mainline qemu). Incremental on top of your branch would be diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c --- a/target-alpha/fpu_helper.c +++ b/target-alpha/fpu_helper.c @@ -722,12 +722,10 @@ static inline uint64_t inline_cvttq(CPUAlphaState *env, uint64_t a, /* In this case the number is so large that we must shift the fraction left. There is no rounding to do. */ exc = float_flag_int_overflow | float_flag_inexact; -if (shift 63) { -ret = frac shift; -if ((ret shift) == frac) { -exc = 0; -} -} + if (shift 64) + ret = frac shift; +if (shift 11 || a == LIT64( 0xC3E0)) +exc = 0; } else { uint64_t round;
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Mon, Jun 30, 2014 at 09:56:35PM +0100, Al Viro wrote: FWIW, it might be better to do what float64_to_int64_round_to_zero() is doing - i.e. if (shift = 0) { if (shift 64) ret = frac shift; if (shift 11 || a == LIT64(0xC3E0)) exc = 0; } since frac is between 1ULL52 and (1ULL53)-1, i.e. shift greater than 11 is guaranteed to overflow, shift less than 11 is guaranteed not to and shift exactly 11 won't overflow only in one case - frac == 1ULL52, sign = 1 (i.e. when we have -2^63 there). BTW, shift == 63 is interesting - we certainly overflow, but we want the result to be 0 or 2^63 depending on the least significant bit of mantissa, not always 0. IOW, 0x4720 should yield IOV|INE, with result being 0 and 0x4721 - IOV|INE and result 0x8000. Again, verified on actual hardware; the last patch I posted had been incorrect in the last case (both cases yield 0 with it, same as in mainline qemu). While we are at it, CVTTQ yields INV on +-infinity, just as it does for NaNs. IOW, in inline_cvttq() exc = (frac ? float_flag_invalid : float_flag_int_overflow | float_flag_inexact); should be simply exc = float_flag_invalid; VAX operations are serious mess, but I'm not sure if we have them actually used anywhere in Linux kernel or userland. Always possible, of course, but...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jul 01, 2014 at 05:34:45AM +0100, Al Viro wrote: VAX operations are serious mess, but I'm not sure if we have them actually used anywhere in Linux kernel or userland. Always possible, of course, but... Grr... Truncated mail, sorry. Missing part: _If_ we decide that we want CVTGQ working correctly, we'll have the following pile of fun: * it needs non-saturating overflow handling, same as cvttq * it needs different rounding for CVTGQ and CVTGQ/C * CVTGQ/S needs EXC_M_SWC in the word fed to trap in INV case (i.e. when we see dirty zero or reserved). I think the right way to do it is to have it use float_raise() and finish with something similar to gen_fp_exc_raise(), except that... * VAX insns need a slightly different trap handling - fpcr_exc_mask is IEEE-only. * g_to_float64() isn't quite right here - we want e.g. 2^-1023 to result in 0 *and* we want inexact raised. As it is, we'll end up with exact 0.
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jun 24, 2014 at 02:32:46PM -0700, Richard Henderson wrote: On 06/24/2014 02:24 PM, Al Viro wrote: Al, off to figure out the black magic TCG is using to generate calls... If you've a helper DEF_HELPER_1(halt, void, i64) then gen_helper_halt(...) will generate the tcg ops that result in the call. Another fun issue: CVTTQ is both underreporting the overflow *AND* reports the wrong kind - FOV instead of IOV. * it misses reporting overflows for case when it *knows* that overflow will happen - the need to shift up by more than 63 bits. Trivially fixed, of course. There overflow cases leave wrong result as well - should be 0. * it also misses reporting overflows for case when value is in ranges 2^63..2^64-1 and -2^64+1..-2^63-1. And yes, it's asymmetric - 2^63 is an overflow, -2^63 isn't. * overflow is reported by float_raise(float_flag_overflow, FP_STATUS). Wrong flag - it should be IOV, not FOV. And it should be set in FPCR regardless of the trap modifier (IOV, this VI thing is wrong - we should deal with that only when we generate a trap). * All overflow cases should raise INE as well. Could we steal bit 1 in float_exception_flags for IOV? It is (currently?) unused - enum { float_flag_invalid = 1, float_flag_divbyzero = 4, float_flag_overflow = 8, float_flag_underflow = 16, float_flag_inexact = 32, float_flag_input_denormal = 64, float_flag_output_denormal = 128 }; That would allow to deal with that crap nicely - we could have it raise the new flag, then have helper_fp_exc_raise_... for default trap mode mask it out (and yes, we need to set FPCR flags in default mode, as well as /U and /V - confirmed by direct experiment *and* by TFM). If we can't... well, we could put that flag separately, but it would be more unpleasant. Folks?
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 25 June 2014 08:01, Al Viro v...@zeniv.linux.org.uk wrote: Could we steal bit 1 in float_exception_flags for IOV? It is (currently?) unused - enum { float_flag_invalid = 1, float_flag_divbyzero = 4, float_flag_overflow = 8, float_flag_underflow = 16, float_flag_inexact = 32, float_flag_input_denormal = 64, float_flag_output_denormal = 128 }; That would allow to deal with that crap nicely - we could have it raise the new flag, then have helper_fp_exc_raise_... for default trap mode mask it out (and yes, we need to set FPCR flags in default mode, as well as /U and /V - confirmed by direct experiment *and* by TFM). If we can't... well, we could put that flag separately, but it would be more unpleasant. Folks? I think it's OK to put extra float_flags in, provided you can define their semantics in terms that make sense for more than one architecture (even if only one arch actually happens to need them). The input_denormal/output_denormal flags only get used for ARM, for instance. However if you wanted to split overflow from integer overflow you'd need to fix up all the other targets which expect them to generate just one exception flag... (Note that any patch touching softfloat files needs to come with a statement that you're happy to license it under either the softfloat-2a or softfloat-2b licenses, because we're currently midway through the tedious process of trying to relicense it.) thanks -- PMM
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Wed, Jun 25, 2014 at 10:27:11AM +0100, Peter Maydell wrote: I think it's OK to put extra float_flags in, provided you can define their semantics in terms that make sense for more than one architecture (even if only one arch actually happens to need them). The input_denormal/output_denormal flags only get used for ARM, for instance. However if you wanted to split overflow from integer overflow you'd need to fix up all the other targets which expect them to generate just one exception flag... Hmm... On alpha it's generated only by the following: CVTTQ, CVTGQ, CVTQL. I.e. conversions to integer formats that can be held in FPU registers (double - s64, VAX double - s64 and s64 - s32). Does softfloat even have anything similar? As it is, it's all in alpha-specific code; double - s64 might have a chance to be generic (semantics: * denorms - 0, raise inexact, provided that they survived to that point and hadn't buggered off with invalid * exact integers in range -2^63 .. 2^63-1 - equivalent 64bit integer * values outside of that range (all with zero fractional part, since the weight of LSB of significand will be considerably greater than 1 by that point) - lower 64 bits of value, raise integer overflow and inexact. * values with non-zero fractional part - rounded according to rounding mode, raise inexact. ), but existing float64_to_int64() isn't it - very different behaviour on overflows. Incidentally, VAX double to s64 is buggered in that area - it *does* try to use float64_to_int64() and, on top of getting INV instead of IOV, gets the wrong result in case of overflow (MAX_LONG/MIN_LONG instead of value in -2^63..2^63-1 comparable modulo 2^64 with exact value taken as element of $\Bbb Z$). And s64-s32 is just plain weird - not in the part that has IOV raised on values outside of -2^31..2^31-1, but in the bit shuffling it's doing if the test passes; alpha FPU stores s32 value in bits 63-62/58-29, with the rest filled with zeroes. In any case, it's not splitting float_overflow_flag; similar cases in softfloat.c raise float_invalid_flag. I don't know if it would make sense to try and teach float64_to_int64() about this kind of return value on overflow... (Note that any patch touching softfloat files needs to come with a statement that you're happy to license it under either the softfloat-2a or softfloat-2b licenses, because we're currently midway through the tedious process of trying to relicense it.) Wouldn't be a problem, but I doubt that it would be particulary useful to touch softfloat.c due to the reasons above, anyway.
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 25 June 2014 15:26, Al Viro v...@zeniv.linux.org.uk wrote: Hmm... On alpha it's generated only by the following: CVTTQ, CVTGQ, CVTQL. I.e. conversions to integer formats that can be held in FPU registers (double - s64, VAX double - s64 and s64 - s32). Does softfloat even have anything similar? Well, VAX doubles are a bit out of scope for an IEEE emulation library :-) As it is, it's all in alpha-specific code; It does sound like that's the best place for it. In that case, you don't want to add a flag to the softfloat float_flags -- they are specifically for indicating softfloat's status/exceptions. Flags handled purely in CPU-specific code should be stored in the CPU specific state struct somewhere. thanks -- PMM
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Wed, Jun 25, 2014 at 08:01:17AM +0100, Al Viro wrote: On Tue, Jun 24, 2014 at 02:32:46PM -0700, Richard Henderson wrote: On 06/24/2014 02:24 PM, Al Viro wrote: Al, off to figure out the black magic TCG is using to generate calls... If you've a helper DEF_HELPER_1(halt, void, i64) then gen_helper_halt(...) will generate the tcg ops that result in the call. Another fun issue: CVTTQ is both underreporting the overflow *AND* reports the wrong kind - FOV instead of IOV. * it misses reporting overflows for case when it *knows* that overflow will happen - the need to shift up by more than 63 bits. Trivially fixed, of course. There overflow cases leave wrong result as well - should be 0. * it also misses reporting overflows for case when value is in ranges 2^63..2^64-1 and -2^64+1..-2^63-1. And yes, it's asymmetric - 2^63 is an overflow, -2^63 isn't. * overflow is reported by float_raise(float_flag_overflow, FP_STATUS). Wrong flag - it should be IOV, not FOV. And it should be set in FPCR regardless of the trap modifier (IOV, this VI thing is wrong - we should deal with that only when we generate a trap). * All overflow cases should raise INE as well. Could we steal bit 1 in float_exception_flags for IOV? It is (currently?) unused - enum { float_flag_invalid = 1, float_flag_divbyzero = 4, float_flag_overflow = 8, float_flag_underflow = 16, float_flag_inexact = 32, float_flag_input_denormal = 64, float_flag_output_denormal = 128 }; That would allow to deal with that crap nicely - we could have it raise the new flag, then have helper_fp_exc_raise_... for default trap mode mask it out (and yes, we need to set FPCR flags in default mode, as well as /U and /V - confirmed by direct experiment *and* by TFM). OK, I've managed to resurrect UP1000 box (FSVO resurrect - the southbridge DMA controller has always been buggered, with intermittent noise on one of the data lines; fans in CPU module are FUBAR as well - 17 and 20 RPM resp., so I don't risk keeping it running for long, etc.) Still, that allows to test EV67 and I hope to resurrect a DS10 box as well, which will allow for saner testing environment. Current delta follows, fixing gcc and libc testcases *and* AFAICS getting CVTTQ handling in line with what actual EV67 is doing. It's a dirty hack wrt float_raise() - relies on bit 1 never being raised by softfpu.c. I'll look into separating that bit, but it'll probably have non-zero costs ;-/ We need two flags - has IOV been raised during this insn (in this patch it's bit 1 of fp_status.float_exception_flags, cleaned along with those) and something to keep FPCR.IOV in (in this patch - bit 1 of fpcr_exc_status). Sure, we can add another uint8_t or two in struct CPUAlphaState, but that'll mean extra PITA in code and extra memory accesses... Review would be welcome. diff --git a/target-alpha/cpu.h b/target-alpha/cpu.h index d9b861f..047b9a2 100644 --- a/target-alpha/cpu.h +++ b/target-alpha/cpu.h @@ -152,6 +152,10 @@ enum { FP_ROUND_DYNAMIC = 0x3, }; +enum { +float_flag_IOV = 2, +}; + /* FPCR bits */ #define FPCR_SUM (1ULL 63) #define FPCR_INED (1ULL 62) diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c index d2d776c..2b39ea4 100644 --- a/target-alpha/fpu_helper.c +++ b/target-alpha/fpu_helper.c @@ -45,10 +45,11 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env) } static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr, - uint32_t exc, uint32_t regno) + uint32_t exc, uint32_t regno, uint32_t hw_exc) { if (exc) { -uint32_t hw_exc = 0; + if (hw_exc) +exc = ~env-fpcr_exc_mask; if (exc float_flag_invalid) { hw_exc |= EXC_M_INV; @@ -65,6 +66,9 @@ static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr, if (exc float_flag_inexact) { hw_exc |= EXC_M_INE; } +if (exc float_flag_IOV) { +hw_exc |= EXC_M_IOV; +} arith_excp(env, retaddr, hw_exc, 1ull regno); } @@ -73,18 +77,21 @@ static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr, /* Raise exceptions for ieee fp insns without software completion. In that case there are no exceptions that don't trap; the mask doesn't apply. */ -void helper_fp_exc_raise(CPUAlphaState *env, uint32_t exc, uint32_t regno) +void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno) { -inline_fp_exc_raise(env, GETPC(), exc, regno); +uint32_t exc = (uint8_t)env-fp_status.float_exception_flags; +if (exc) { + env-fpcr_exc_status |= exc; + inline_fp_exc_raise(env, GETPC(), exc ~ignore, regno, 0); +} } -/*
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jun 24, 2014 at 05:34:23AM +0100, Al Viro wrote: First of all, kudos - with current qemu tree qemu-alpha-system is working pretty well - debian install and a *lot* of builds work just fine. As in, getting from lenny to pretty complete squeeze toolchain, including gcj, openjdk6 and a lot of crap needed to satisfy build-deps of those, plus all priority:required and most of priority:important ones. It's a _lot_ of beating and the damn thing survives - the problems had been with debian packages themselves (fstatat() bug in lenny libc, epically buggered build-deps in gcc-defaults, etc.). As it is, one core of 6-way 3.3GHz phenom II is quite capable of running a home-grown autobuilder. Feels like ~250-300MHz alpha with a very fast disk... Remaining problems, AFAICS, are around floating point traps. I've found one in glibc testsuite (math/tests-misc.c; overflow in ADDS/SU ends up with wrong results from fetestexcept() - only FE_OVERFLOW is set, while the sucker expects FE_INEXACT as well and actual hardware sets both) and another in gcc one (with -funsafe-math-optimizations CVTST/S on denorms triggers SIGFPE/FPE_FLTINV). The libc one is a bug in gen_fp_exc_raise_ignore() - the difference between ADDS/SU and ADDS/SUI is only in trapping, not storing results in FPCR.INE and friends. Both will have the same effect on those and if (ignore) { tcg_gen_andi_i32(exc, exc, ~ignore); } in gen_fp_exc_raise_ignore() leads to exc ignore not reaching the update of env-fpcr_exc_status in helper_fp_exc_raise_s(). See 4.7.8: [quote] In addition, the FPCR gives a summary of each exception type for the exception conditions detected by all IEEE floating-point operates thus far, as well as an overall summary bit that indicates whether any of these exception conditions has been detected. The indiividual exception bits match exactly in purpose and order the exception bits found in the exception summary quadword that is pushed for arithmetic traps. However, for each instruction, these exception bits arse set independent of the trapping mode specified for the instruction. Therefore, even though trapping may be disabled for a certain exceptional condition, the fact that the exceptional condition was encountered by an instruction is still recorded in the FPCR. [end quote] And yes, on actual hardware both ADDS/SU and ADDS/SUI set FPCR.INE the same way - verified by direct experiment. BTW, here's another testcase: nclude stdio.h unsigned long __attribute__((noinline)) f(double x) { return (unsigned long)x;// SVCTQ/SVC } main() { unsigned long x; extern unsigned long __ieee_get_fp_control(void); printf(before:%lx\n, __ieee_get_fp_control()); x = f(1ULL63); printf(after:%lx\n, __ieee_get_fp_control()); printf(result:%lx\n, x); } On actual hardware: before:0 after:2 result:8000 On qemu: before:0 after:0 result:8000 IOW, gen_fcvttq() is also affected, not only gen_fp_exc_raise(). Can't we simply have separate helpers for various trap suffices, with all this work on getting exc, etc. taken inside them? It's not as if we distinguished many variants, after all... Right now we have: plain, /U, /V /S, /SU /SUI /SV /SVI and /SU should probably be separated from /S - we do want to suppress underflow traps on those (again, FPCR.UND should be set regardless). That's what, 5 or 6 helpers? Might want to separate /V and /U from plain - AFAICS, we get it wrong with things like ADDS/U vs. ADDS (it's just that normally underflow traps are disabled by FPCR.DUND). I hadn't experimented with those yet, but even if it turns out that they *are* different - 8 helpers instead of the 2 we currently have, sharing most of the actual source... Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV, rather than EXC_M_UNF?
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 06/23/2014 09:34 PM, Al Viro wrote: Anyway, delta that seems to fix the gcc one (gcc.dg/pr28796-2.c from gcc-4.3 and later) follows. Again, I'm not at all sure if handling of env-pc in there is safe from qemu POV and I'd like like to get comments on that from somebody more familiar with qemu guts. Thanks for the diagnosis on the gcc test case. I've been meaning to investigate some of these edge cases for quite a while and never quite got there. static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr, - uint32_t exc, uint32_t regno) + uint32_t exc, uint32_t regno, uint32_t sw) { if (exc) { -uint32_t hw_exc = 0; +uint32_t hw_exc = sw; if (exc float_flag_invalid) { hw_exc |= EXC_M_INV; @@ -75,7 +75,7 @@ static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr, doesn't apply. */ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t exc, uint32_t regno) { -inline_fp_exc_raise(env, GETPC(), exc, regno); +inline_fp_exc_raise(env, GETPC(), exc, regno, 0); } /* Raise exceptions for ieee fp insns with software completion. */ @@ -84,7 +84,7 @@ void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t exc, uint32_t regno) if (exc) { env-fpcr_exc_status |= exc; exc = ~env-fpcr_exc_mask; -inline_fp_exc_raise(env, GETPC(), exc, regno); +inline_fp_exc_raise(env, GETPC(), exc, regno, EXC_M_SWC); } } This part looks good. diff --git a/target-alpha/helper.c b/target-alpha/helper.c index 7c053a3..538c6b2 100644 --- a/target-alpha/helper.c +++ b/target-alpha/helper.c @@ -527,6 +527,7 @@ void QEMU_NORETURN dynamic_excp(CPUAlphaState *env, uintptr_t retaddr, env-error_code = error; if (retaddr) { cpu_restore_state(cs, retaddr); + env-pc += 4; This one needs a different fix, since dynamic_excp is also used from alpha_cpu_unassigned_access, and I'm pretty sure the mchk should have the address of the memory insn. But that should be easy to fix up. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 06/24/2014 09:52 AM, Al Viro wrote: unsigned long __attribute__((noinline)) f(double x) { return (unsigned long)x;// SVCTQ/SVC } main() { unsigned long x; extern unsigned long __ieee_get_fp_control(void); printf(before:%lx\n, __ieee_get_fp_control()); x = f(1ULL63); printf(after:%lx\n, __ieee_get_fp_control()); printf(result:%lx\n, x); } On actual hardware: before:0 after:2 result:8000 On qemu: before:0 after:0 result:8000 IOW, gen_fcvttq() is also affected, not only gen_fp_exc_raise(). Clearly a gross misunderstanding of what bits are actually computed, never mind what gets signaled. Thanks for the test. I've not had working hardware for a couple of years to validate what's supposed to get set and what isn't. Can't we simply have separate helpers for various trap suffices, with all this work on getting exc, etc. taken inside them? It's not as if we distinguished many variants, after all... Right now we have: plain, /U, /V /S, /SU /SUI /SV /SVI We used to have separate helpers... at least for the modes that had been implemented at the time. The combinatorial explosion ugly though -- 4 different versions of add, sub, etc. I thought the partial inlining was a decent solution, as far as maintainability, but it's not unreasonable to disagree. Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV, rather than EXC_M_UNF? No idea. Should they? r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jun 24, 2014 at 11:33:32AM -0700, Richard Henderson wrote: return (unsigned long)x;// SVCTQ/SVC CVTTQ/SVC, of course... Clearly a gross misunderstanding of what bits are actually computed, never mind what gets signaled. Thanks for the test. I've not had working hardware for a couple of years to validate what's supposed to get set and what isn't. If you have any ideas for testing, I do have working hw (the box that is currently alive is ev45, though; I _can_ try to boot a UP1000 one, but I make no promises regarding its fans, both in PSU and in CPU module ;-/) Can't we simply have separate helpers for various trap suffices, with all this work on getting exc, etc. taken inside them? It's not as if we distinguished many variants, after all... Right now we have: plain, /U, /V /S, /SU /SUI /SV /SVI We used to have separate helpers... at least for the modes that had been implemented at the time. The combinatorial explosion ugly though -- 4 different versions of add, sub, etc. I thought the partial inlining was a decent solution, as far as maintainability, but it's not unreasonable to disagree. Um? No, I mean having gen_fp_exc_raise() generate a call of one of the 8 helpers; gen_ieee_arith3() and friends would remain as-is. It's just that instead of generating load to exc, andi, call of helper_fp_exc_raise_s or call of helper_fp_exc_raise we would generate a call of one of the helper_fp_exc_raise{,_u,_v,_s,_su,_sui,_sv,_svi} and let that sucker deal with loading exc, updating -fpcr_exc_status and generating traps. Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV, rather than EXC_M_UNF? No idea. Should they? They seem to - both from the arch.manual and from direct experiment... And they do set FPCR.INV at the same time, not just trigger the trap.
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jun 24, 2014 at 11:23:01AM -0700, Richard Henderson wrote: env-error_code = error; if (retaddr) { cpu_restore_state(cs, retaddr); + env-pc += 4; This one needs a different fix, since dynamic_excp is also used from alpha_cpu_unassigned_access, and I'm pretty sure the mchk should have the address of the memory insn. But that should be easy to fix up. That's not a problem, actually - there we have dynamic_excp(env, 0, EXCP_MCHK, 0); so retaddr is going to be 0 and that env-pc += 4 won't be reached at all...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 06/24/2014 01:32 PM, Al Viro wrote: If you have any ideas for testing, I do have working hw (the box that is currently alive is ev45, though; I _can_ try to boot a UP1000 one, but I make no promises regarding its fans, both in PSU and in CPU module ;-/) Ah. Gotta be careful with ev4/45... half of the fpu is unimplemented, and so if you're not careful all you're testing is the kernel emulation behaviour. Um? No, I mean having gen_fp_exc_raise() generate a call of one of the 8 helpers; gen_ieee_arith3() and friends would remain as-is. It's just that instead of generating load to exc, andi, call of helper_fp_exc_raise_s or call of helper_fp_exc_raise we would generate a call of one of the helper_fp_exc_raise{,_u,_v,_s,_su,_sui,_sv,_svi} and let that sucker deal with loading exc, updating -fpcr_exc_status and generating traps. Ah, I getcha. Yes, that makes sense. Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV, rather than EXC_M_UNF? No idea. Should they? They seem to - both from the arch.manual and from direct experiment... And they do set FPCR.INV at the same time, not just trigger the trap. Ok. I'll try to make time to fix up some of this stuff this weekend. r~
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On Tue, Jun 24, 2014 at 01:57:52PM -0700, Richard Henderson wrote: On 06/24/2014 01:32 PM, Al Viro wrote: If you have any ideas for testing, I do have working hw (the box that is currently alive is ev45, though; I _can_ try to boot a UP1000 one, but I make no promises regarding its fans, both in PSU and in CPU module ;-/) Ah. Gotta be careful with ev4/45... half of the fpu is unimplemented, and so if you're not careful all you're testing is the kernel emulation behaviour. *nod* Um? No, I mean having gen_fp_exc_raise() generate a call of one of the 8 helpers; gen_ieee_arith3() and friends would remain as-is. It's just that instead of generating load to exc, andi, call of helper_fp_exc_raise_s or call of helper_fp_exc_raise we would generate a call of one of the helper_fp_exc_raise{,_u,_v,_s,_su,_sui,_sv,_svi} and let that sucker deal with loading exc, updating -fpcr_exc_status and generating traps. Ah, I getcha. Yes, that makes sense. FWIW, the crudest version of that is simply +env-fpcr_exc_status |= (uint8_t)env-fp_status.float_exception_flags; in the very beginning of helper_fp_exc_raise_s(). And yes, it recovers math/tests-misc.c, even though it's obviously not a good final fix. Al, off to figure out the black magic TCG is using to generate calls...
Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
On 06/24/2014 02:24 PM, Al Viro wrote: Al, off to figure out the black magic TCG is using to generate calls... If you've a helper DEF_HELPER_1(halt, void, i64) then gen_helper_halt(...) will generate the tcg ops that result in the call. r~