Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-10 Thread Al Viro
On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote:

  Incidentally, combination of --enable-gprof and (default) --enable-pie
  won't build - it dies with ld(1) complaining about relocs in gcrt1.o.
 
 This sounds like a toolchain bug to me :-)

Debian stable/amd64, gcc 4.7.2, binutils 2.22.  And google search finds
this, for example: http://osdir.com/ml/qemu-devel/2013-05/msg00710.html.
That one has gcc 4.4.3.

Anyway, adding --disable-pie to --enable-gprof gets it to build, but
as I said, gprof is no better than perf and oprofile - same problem.

Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and
going through their kernel package build.  I have perf report in front of
me right now; the top ones are
 41.77%  qemu-system-alp  perf-24701.map      [.] 0x7fbbee558930
 11.78%  qemu-system-alp  qemu-system-alpha   [.] cpu_alpha_exec
  4.95%  qemu-system-alp  [vdso]              [.] 0x7fffdd7ff8de
  2.40%  qemu-system-alp  qemu-system-alpha   [.] phys_page_find
  1.49%  qemu-system-alp  qemu-system-alpha   [.] address_space_translate_internal
  1.34%  qemu-system-alp  [kernel.kallsyms]   [k] read_hpet
  1.26%  qemu-system-alp  qemu-system-alpha   [.] tlb_set_page
  1.23%  qemu-system-alp  qemu-system-alpha   [.] find_next_bit
  1.04%  qemu-system-alp  qemu-system-alpha   [.] get_page_addr_code
  1.01%  qemu-system-alp  libpthread-2.13.so  [.] pthread_mutex_lock
  0.88%  qemu-system-alp  qemu-system-alpha   [.] helper_cmpbge
  0.80%  qemu-system-alp  libc-2.13.so        [.] __memset_sse2
  0.72%  qemu-system-alp  libpthread-2.13.so  [.] __pthread_mutex_unlock_usercnt
  0.70%  qemu-system-alp  qemu-system-alpha   [.] get_physical_address
  0.69%  qemu-system-alp  qemu-system-alpha   [.] address_space_translate
  0.68%  qemu-system-alp  qemu-system-alpha   [.] tcg_optimize
  0.67%  qemu-system-alp  qemu-system-alpha   [.] ldq_phys
  0.63%  qemu-system-alp  qemu-system-alpha   [.] qemu_get_ram_ptr
  0.62%  qemu-system-alp  qemu-system-alpha   [.] helper_le_ldq_mmu
  0.57%  qemu-system-alp  qemu-system-alpha   [.] memory_region_is_ram

and cpu_alpha_exec() spends most of the time in inlined tb_find_fast().
It might be worth checking the actual distribution of the hash of virt
address used by that sucker - I wonder if dividing its argument by 4
wouldn't improve the things, but I don't have stats on actual frequency
of conflicts, etc.  In any case, the first lump (42%) seems to be tastier ;-)
There are all kinds of microoptimizations possible (e.g. helper_cmpbge() could
be done by a couple of MMX insns on amd64 host[1]), but it would be nice to
have some details on what we spend the time on in tcg output...
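
To illustrate why dividing by 4 might matter: Alpha instructions are 4-byte
aligned, so the two low bits of any code address are always zero.  The toy
sketch below is not QEMU's actual hash; toy_hash_raw(), toy_hash_shifted()
and the table size are made up for illustration.  It only counts how many
buckets a run of consecutive instruction addresses can reach with and
without the shift.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BITS 12                 /* hypothetical table of 4096 buckets */
#define TOY_SIZE (1u << TOY_BITS)

/* Hash the address as is: bits 0..1 are always zero for aligned code. */
static unsigned toy_hash_raw(uint64_t pc)
{
    return (unsigned)(pc & (TOY_SIZE - 1));
}

/* Drop the two always-zero bits before masking. */
static unsigned toy_hash_shifted(uint64_t pc)
{
    return (unsigned)((pc >> 2) & (TOY_SIZE - 1));
}

/* Count distinct buckets hit by n consecutive 4-byte-aligned addresses. */
static unsigned buckets_used(unsigned (*hash)(uint64_t), uint64_t base,
                             size_t n)
{
    static unsigned char seen[TOY_SIZE];
    unsigned used = 0;

    memset(seen, 0, sizeof(seen));
    for (size_t i = 0; i < n; i++) {
        unsigned h = hash(base + 4 * i);
        if (!seen[h]) {
            seen[h] = 1;
            used++;
        }
    }
    return used;
}
```

In this toy setup the raw variant can reach only a quarter of the buckets.
Whether the real hash loses anything comparable depends on which address
bits it mixes in, so it would still need measuring.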

[1] The reason why helper_cmpbge() shows up is that string functions on alpha
use that insn a lot; it _might_ be worth optimizing.
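
For reference, a portable sketch of what CMPBGE computes: an unsigned
byte-by-byte "greater than or equal" compare of two 64-bit values, giving
one result bit per byte.  cmpbge_ref() is a hypothetical name, and this is
not the code of QEMU's helper_cmpbge(); it is just the plain loop that a
SIMD version would have to match.

```c
#include <stdint.h>

/* Bit i of the result is set when byte i of a >= byte i of b (unsigned). */
static uint64_t cmpbge_ref(uint64_t a, uint64_t b)
{
    uint64_t res = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t ba = (uint8_t)(a >> (i * 8));
        uint8_t bb = (uint8_t)(b >> (i * 8));

        if (ba >= bb) {
            res |= UINT64_C(1) << i;
        }
    }
    return res;
}
```

String code typically compares zero against a loaded quadword: 0 >= byte
holds only for zero bytes, so cmpbge_ref(0, x) yields a bitmask of the NUL
positions in x.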



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-10 Thread Al Viro
On Wed, Jul 09, 2014 at 08:14:12AM -0700, Richard Henderson wrote:
 On 07/08/2014 10:47 PM, Al Viro wrote:
  So env->fpcr_flush_to_zero = env->fpcr_dnod && env->fpcr_undz; is another
  bug - needs s/dnod/unfd/ there...
 
 That's exactly what I was looking at, thanks.

BTW, that (unimplementeds being RAZ) is why AARM insists on having FP_C in
software - FPCR isn't guaranteed to have the trap disable bits and, in fact,
doesn't have anywhere to store IEEE_TRAP_ENABLE_DNO on actual hw.  The
software completion is where it has to be dealt with; note that both
swcr_update_status() and ieee_swcr_to_fpcr() treat ->ieee_state (i.e. our FP_C)
as authoritative wrt trap enable bits, 21264 or not.  Trap _status_ bits are
different - there (on 21264) FPCR is considered authoritative, but that's it.

Unimplemented trap bits are treated as trap enabled, so the completion gets
to decide what it wants to do.  If you want to keep FPCR authoritative for
all those bits in linux-user case, we have to treat FPCR.DNOD as writable
bit for that mode, which is why my variant slapped an ifdef CONFIG_USER_ONLY
around 
 env->fpcr_dnod = (val & FPCR_DNOD) != 0;
instead of ripping it out completely...



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-10 Thread Alex Bennée

Al Viro writes:

 On Tue, Jul 08, 2014 at 08:32:55PM +0100, Peter Maydell wrote:
  On 8 July 2014 18:20, Al Viro v...@zeniv.linux.org.uk wrote:
  On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote:
<snip>
 Again, gprof isn't particularly useful - kernel-side profilers are at least as
 good.  So I suspect that most of the people running into that simply shrug and
 use those instead.  Narrowing it down to -pie didn't take long and I can
 confirm that this is the root cause of that breakage.  Should make debugging
 said toolchain bug a bit easier, if anybody cares to do that...

  Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and
  going through their kernel package build.  I have perf report in front of
  me right now; the top ones are
   41.77%  qemu-system-alp  perf-24701.map      [.] 0x7fbbee558930
   11.78%  qemu-system-alp  qemu-system-alpha   [.] cpu_alpha_exec
 
  and cpu_alpha_exec() spends most of the time in inlined tb_find_fast().
  It might be worth checking the actual distribution of the hash of virt
  address used by that sucker - I wonder if dividing its argument by 4
  wouldn't improve the things, but I don't have stats on actual frequency
  of conflicts, etc.  In any case, the first lump (42%) seems to be tastier 
  ;-)
 
 Depends on your point of view -- arguably we ought to be spending *more*
 time executing translated guest code... (As you say, the problem is that
 we don't have any breakdown of what things might turn out to be hotspots
 in the translated code.)

 Might be a fun project to teach perf that hits in such-and-such page should
 lead to lookup in a table annotating it.  As in "offsets 42..69 should be
 recorded as (this address + offset - 42)".  Then tcg could generate
 such tables and we'd get information like "that much time is spent in
 the second host insn of instances of that code pattern generated by
 tcg_gen_shr_i64", etc.

 No idea if anything of that sort exists - qemu is not the only possible user
 for that; looks like it might be useful for any JIT profiling, so somebody
 could've done that already...

Handily our patch tracker has remembered what I couldn't find ;-)

https://patches.linaro.org/27229/

As I mentioned previously I plan to clean these up over the next week.

-- 
Alex Bennée



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-10 Thread Richard Henderson
On 07/08/2014 10:47 PM, Al Viro wrote:
 So env->fpcr_flush_to_zero = env->fpcr_dnod && env->fpcr_undz; is another
 bug - needs s/dnod/unfd/ there...

That's exactly what I was looking at, thanks.


r~



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-10 Thread Al Viro
On Mon, Jul 07, 2014 at 11:03:08PM -0700, Richard Henderson wrote:
 On 07/07/2014 09:20 PM, Al Viro wrote:
  and I'm reasonably sure that this is what they did internally.  You are
  proposing to do 4 cases in all their messy glory in qemu itself...
 
 Yes.  Primarily because we *have* to do so for the linux-user case.
 
  And that's not even going into generating the right si_code for that SIGFPE.
  What produces those TARGET_GEN_FLTINE and friends?
 
 linux-user/main.c, cpu_loop.

That's where we consume it; where is it produced?  Sure, explicit
gentrap in alpha code will lead there, with whatever we have in
$16 deciding what'll go into si_code, but where does that happen on
fp exception codepaths?  IOW, what sets si_code on those?



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-10 Thread Al Viro
On Tue, Jul 08, 2014 at 08:32:55PM +0100, Peter Maydell wrote:
 On 8 July 2014 18:20, Al Viro v...@zeniv.linux.org.uk wrote:
  On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote:
 
   Incidentally, combination of --enable-gprof and (default) --enable-pie
   won't build - it dies with ld(1) complaining about relocs in gcrt1.o.
 
  This sounds like a toolchain bug to me :-)
 
  Debian stable/amd64, gcc 4.7.2, binutils 2.22.  And google search finds
  this, for example: http://osdir.com/ml/qemu-devel/2013-05/msg00710.html.
  That one has gcc 4.4.3.
 
 That just makes it a long-standing toolchain bug. I don't see any
 reason why PIE + gprof shouldn't work, it just looks like gprof
 doesn't ship and link a PIE runtime.

*nod*

It's not a huge itch to scratch for me, and I'm not even sure whether the
bug should be filed for gcc or for libc (probably the latter).  In any case,
having that information findable in list archives would probably be a good
thing.

Again, gprof isn't particularly useful - kernel-side profilers are at least as
good.  So I suspect that most of the people running into that simply shrug and
use those instead.  Narrowing it down to -pie didn't take long and I can
confirm that this is the root cause of that breakage.  Should make debugging
said toolchain bug a bit easier, if anybody cares to do that...

  Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and
  going through their kernel package build.  I have perf report in front of
  me right now; the top ones are
   41.77%  qemu-system-alp  perf-24701.map      [.] 0x7fbbee558930
   11.78%  qemu-system-alp  qemu-system-alpha   [.] cpu_alpha_exec
 
  and cpu_alpha_exec() spends most of the time in inlined tb_find_fast().
  It might be worth checking the actual distribution of the hash of virt
  address used by that sucker - I wonder if dividing its argument by 4
  wouldn't improve the things, but I don't have stats on actual frequency
  of conflicts, etc.  In any case, the first lump (42%) seems to be tastier 
  ;-)
 
 Depends on your point of view -- arguably we ought to be spending *more*
 time executing translated guest code... (As you say, the problem is that
 we don't have any breakdown of what things might turn out to be hotspots
 in the translated code.)

Might be a fun project to teach perf that hits in such-and-such page should
lead to lookup in a table annotating it.  As in "offsets 42..69 should be
recorded as (this address + offset - 42)".  Then tcg could generate
such tables and we'd get information like "that much time is spent in
the second host insn of instances of that code pattern generated by
tcg_gen_shr_i64", etc.

No idea if anything of that sort exists - qemu is not the only possible user
for that; looks like it might be useful for any JIT profiling, so somebody
could've done that already...



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-09 Thread Al Viro
On Tue, Jul 08, 2014 at 09:05:10AM +0100, Peter Maydell wrote:

 The code we have currently may well be buggy, but the correct

It is ;-/  We set TARGET_FPE_FLTINV unconditionally there.  BTW, what's
the reason why all these cpu_loop() instances can't go into
linux-user/arch/something?  Is that just because you have
static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
static int pending_cpus;
and a bunch of inlines using them?  As it is, about three quarters of
linux-user/main.c consist of code under series of arch ifdefs...

BTW, are there any more or less up-to-date docs on qemu profiling?  I mean,
things like perf/oprofile on the host obviously end up lumping all tcg
output together.  Is there any way to get information beyond "~40% of time
is spent in generated code, ~15% - in tb_find_fast(), and the rest is very
much colder"?

Incidentally, combination of --enable-gprof and (default) --enable-pie
won't build - it dies with ld(1) complaining about relocs in gcrt1.o.
With --disable-pie it builds, but gprof of course has the same problem
as perf and friends - generated code is transient, so we get no details ;-/

 place to set si_code is (as Richard says) the Alpha cpu_loop() in
 linux-user/main.c, which has access to the trap type that just
 caused us to stop executing code, plus the CPUState, which
 should be enough information to set si_code correctly. In
 particular the GENTRAP case seems to be setting a variety
 of different si_code values for SIGFPE.

Sigh...  Well, having read through alpha_fp_emul() and the stuff it calls,
I understand why they hadn't implemented DNOD in any released hardware.
It's a bloody mess, with tons of interesting special cases.  E.g. adding
denorm to very large finite can push into overflow, with further effects
depending on whether we have overflow and/or denorm IEEE traps disabled,
etc.

Frankly, I suspect that it's better to have qemu-system-alpha behave like
the actual hardware does (including FPCR.DNOD can't be set) and keep the
linux-user behaviour as is, for somebody brave and masochistic enough to
fight that one.  And no, it's nowhere near just let denorms ride through
the normal softfloat code and play a bit with the flags it might raise.
And then there's netbsd/alpha and openbsd/alpha, so in theory somebody might
want to play with their software completion semantics (not identical to Linux
one) for the sake of yet-to-be-written bsd-user alpha support...

Anyway, how about the following delta?  AFAICS, it gets qemu-system-alpha
behaviour in sync with actual hardware without screwing qemu-alpha up.

diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
index 9b297de..30cbf02 100644
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -44,6 +44,12 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env)
 return get_float_exception_flags(FP_STATUS);
 }
 
+enum {
+   Exc_Mask = float_flag_invalid | float_flag_int_overflow |
+  float_flag_divbyzero | float_flag_overflow |
+  float_flag_underflow | float_flag_inexact
+};
+
 static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
  uint32_t exc, uint32_t regno, uint32_t hw_exc)
 {
@@ -73,7 +79,7 @@ static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
doesn't apply.  */
 void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
 if (exc) {
 env->fpcr_exc_status |= exc;
 exc &= ~ignore;
@@ -86,7 +92,7 @@ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 /* Raise exceptions for ieee fp insns with software completion.  */
 void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
 if (exc) {
 env->fpcr_exc_status |= exc;
 exc &= ~ignore;
@@ -105,16 +111,14 @@ void helper_ieee_input(CPUAlphaState *env, uint64_t val)
 uint64_t frac = val  0xfull;
 
 if (exp == 0) {
-/* Denormals without DNZ set raise an exception.  */
-if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-arith_excp(env, GETPC(), EXC_M_UNF, 0);
+/* Denormals without /S raise an exception.  */
+if (frac != 0) {
+arith_excp(env, GETPC(), EXC_M_INV, 0);
 }
 } else if (exp == 0x7ff) {
 /* Infinity or NaN.  */
-/* ??? I'm not sure these exception bit flags are correct.  I do
-   know that the Linux kernel, at least, doesn't 

Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-09 Thread Al Viro
On Tue, Jul 08, 2014 at 09:59:33PM -0700, Richard Henderson wrote:
 On 07/08/2014 01:20 PM, Al Viro wrote:
  Aha...  So you've caught that one already...  I've looked at your branch;
  AFAICS, the only thing missing there is treating stores to FPCR.DNOD in
  system mode as not implemented (which it is in the code as well as in
  21[0-3]64 hardware).
 
 Is it loaded and stored on 21264, or it is read-as-zero/write-ignore?

RAZ, and the same on 21364 if Compaq manual for compiler-writers is to be
believed.

On 21264 bits 48..62 are writable, bit 63 is disjunction of bits 52..57
(stores are ignored), bits 0..47 are RAZ.  AARM requires RAZ bits 0..46
and RAZ on everything optional that is unimplemented.  IOW, DNOD is
unimplemented there, all other optional ones are implemented.  And
according to https://archive.org/details/dec-comp_guide_v2 21364 doesn't
implement DNOD either...

 Is UNDZ not required to be paired with DNOD?

There are 4 bits having some relation to handling of denorms.  DNZ and DNOD
are about denorm inputs; UNDZ and UNFD - about denorm output.  All of
them have effect only for IEEE insns with /S in trap suffix.

Rules:
* if DNZ, denorm inputs are silently replaced with zero.
* if !DNZ && !DNOD, denorm inputs trigger trap (invalid).  Same
as what would happen without /S.
* if !DNZ && DNOD, perform operation on denorm(s).  And I would like
to play with whatever you are using to bring hardware from alternative
universes.
* if !UNFD, denorm output triggers trap (underflow).  Same as what
would happen without /S.
* if UNFD && UNDZ, denorm output is replaced with zero.
* if UNFD && !UNDZ, denorm output remains as is.

So env->fpcr_flush_to_zero = env->fpcr_dnod && env->fpcr_undz; is another
bug - needs s/dnod/unfd/ there...
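
The rules above can be written down as a small decision table.  This is
only a sketch of the rules as stated in this message; the enum and both
function names are hypothetical and do not exist in QEMU.

```c
#include <stdbool.h>

/* Hypothetical names for the three possible outcomes. */
enum denorm_action {
    DENORM_FLUSH_TO_ZERO,   /* silently replace with zero */
    DENORM_TRAP,            /* trap, same as without /S */
    DENORM_KEEP             /* operate on / deliver the denormal as is */
};

/* Denormal input of an IEEE insn with /S: governed by DNZ and DNOD. */
static enum denorm_action denorm_input_action(bool dnz, bool dnod)
{
    if (dnz) {
        return DENORM_FLUSH_TO_ZERO;
    }
    return dnod ? DENORM_KEEP : DENORM_TRAP;    /* trap is "invalid" */
}

/* Denormal output of an IEEE insn with /S: governed by UNFD and UNDZ. */
static enum denorm_action denorm_output_action(bool unfd, bool undz)
{
    if (!unfd) {
        return DENORM_TRAP;                     /* trap is "underflow" */
    }
    return undz ? DENORM_FLUSH_TO_ZERO : DENORM_KEEP;
}
```

Output is flushed to zero exactly when both UNFD and UNDZ are set, with
DNOD playing no part in it, which is the s/dnod/unfd/ point above.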



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-09 Thread Richard Henderson
On 07/08/2014 09:13 AM, Al Viro wrote:
 Frankly, I suspect that it's better to have qemu-system-alpha behave like
 the actual hardware does (including FPCR.DNOD can't be set) and keep the
 linux-user behaviour as is, for somebody brave and masochistic enough to
 fight that one.  And no, it's nowhere near just let denorms ride through
 the normal softfloat code and play a bit with the flags it might raise.
 And then there's netbsd/alpha and openbsd/alpha, so in theory somebody might
 want to play with their software completion semantics (not identical to Linux
 one) for the sake of yet-to-be-written bsd-user alpha support...

You're probably right there.

I've pushed a couple more patches to the branch, split out from your patch
here.  I believe I've got it all, and haven't mucked things up in the process.
I'll run some tests later today when I've got time.


r~



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-09 Thread Peter Maydell
On 8 July 2014 18:20, Al Viro v...@zeniv.linux.org.uk wrote:
 On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote:

  Incidentally, combination of --enable-gprof and (default) --enable-pie
  won't build - it dies with ld(1) complaining about relocs in gcrt1.o.

 This sounds like a toolchain bug to me :-)

 Debian stable/amd64, gcc 4.7.2, binutils 2.22.  And google search finds
 this, for example: http://osdir.com/ml/qemu-devel/2013-05/msg00710.html.
 That one has gcc 4.4.3.

That just makes it a long-standing toolchain bug. I don't see any
reason why PIE + gprof shouldn't work, it just looks like gprof
doesn't ship and link a PIE runtime.

 Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and
 going through their kernel package build.  I have perf report in front of
 me right now; the top ones are
  41.77%  qemu-system-alp  perf-24701.map      [.] 0x7fbbee558930
  11.78%  qemu-system-alp  qemu-system-alpha   [.] cpu_alpha_exec

 and cpu_alpha_exec() spends most of the time in inlined tb_find_fast().
 It might be worth checking the actual distribution of the hash of virt
 address used by that sucker - I wonder if dividing its argument by 4
 wouldn't improve the things, but I don't have stats on actual frequency
 of conflicts, etc.  In any case, the first lump (42%) seems to be tastier ;-)

Depends on your point of view -- arguably we ought to be spending *more*
time executing translated guest code... (As you say, the problem is that
we don't have any breakdown of what things might turn out to be hotspots
in the translated code.)

-- PMM



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-09 Thread Richard Henderson
On 07/08/2014 01:20 PM, Al Viro wrote:
 Aha...  So you've caught that one already...  I've looked at your branch;
 AFAICS, the only thing missing there is treating stores to FPCR.DNOD in
 system mode as not implemented (which it is in the code as well as in
 21[0-3]64 hardware).

Is it loaded and stored on 21264, or it is read-as-zero/write-ignore?
Is UNDZ not required to be paired with DNOD?


r~



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-09 Thread Alex Bennée

Peter Maydell writes:

 On 8 July 2014 17:13, Al Viro v...@zeniv.linux.org.uk wrote:
 On Tue, Jul 08, 2014 at 09:05:10AM +0100, Peter Maydell wrote:

<snip>
 BTW, are there any more or less up-to-date docs on qemu profiling?  I mean,
 things like perf/oprofile on the host obviously end up lumping all tcg
 output together.  Is there any way to get information beyond "~40% of time
 is spent in generated code, ~15% - in tb_find_fast(), and the rest is very
 much colder"?

 Alex, did you say you'd done something with profiling recently?

I posted some RFC patches a while back that spit out the perf
/tmp/perf.pid JIT maps, which help with breaking down which TCG TBs are
the most executed.

There is another set of patches which allow you to selectively dump
translation blocks so you don't end up with multi-gigabyte log files.

I'm going to be doing some profiling myself over the next few days so
I'll clean-up the patches and re-submit to the list soon.
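
For context, perf resolves samples in JIT-generated code through a plain
text map file; the perf-24701.map entry in the report quoted in this thread
is such a file name.  The sketch below shows the general shape of that
interface (one "start size name" line per code region, both numbers in hex
without a 0x prefix).  It is not taken from the RFC patches, and the two
function names are made up.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Build the map file name perf looks for, given the profiled process id. */
static int perf_map_path(char *buf, size_t len, long pid)
{
    return snprintf(buf, len, "/tmp/perf-%ld.map", pid);
}

/* Format one entry: start address and size in hex, then a symbol name. */
static int perf_map_entry(char *buf, size_t len,
                          uint64_t start, uint64_t size, const char *name)
{
    return snprintf(buf, len, "%" PRIx64 " %" PRIx64 " %s\n",
                    start, size, name);
}
```

A JIT would append one such line per translated block, so that perf report
can show a name where it currently shows a raw address.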

-- 
Alex Bennée



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-08 Thread Richard Henderson
On 07/07/2014 09:20 PM, Al Viro wrote:
 and I'm reasonably sure that this is what they did internally.  You are
 proposing to do 4 cases in all their messy glory in qemu itself...

Yes.  Primarily because we *have* to do so for the linux-user case.

 And that's not even going into generating the right si_code for that SIGFPE.
 What produces those TARGET_GEN_FLTINE and friends?

linux-user/main.c, cpu_loop.


r~




Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-08 Thread Peter Maydell
On 8 July 2014 08:13, Al Viro v...@zeniv.linux.org.uk wrote:
 Actually, that's badly worded; what codepath ends up setting si_code on
 e.g. fp addition overflows?  In system mode it's done by completion code
 in the kernel, but AFAICS in user mode there are only two places where it
 might happen - one is gentrap handling and another - osf_setsysinfo(2)
 emulation for TARGET_SSI_IEEE_FP_CONTROL.  What I don't understand is how
 do we get from float_raise(FP_STATUS, float_flag_overflow) in fpu/softfloat.c
 to either of those.

 IOW, suppose I do
 x = DBL_MAX;
 feenableexcept(FE_ALL_EXCEPT);
 x *= x;
 I understand how I'll get SIGFPE, but what will set correct si_code in
 siginfo I'll see in the handler?

The code we have currently may well be buggy, but the correct
place to set si_code is (as Richard says) the Alpha cpu_loop() in
linux-user/main.c, which has access to the trap type that just
caused us to stop executing code, plus the CPUState, which
should be enough information to set si_code correctly. In
particular the GENTRAP case seems to be setting a variety
of different si_code values for SIGFPE.

thanks
-- PMM



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-08 Thread Richard Henderson
On 07/08/2014 01:05 AM, Peter Maydell wrote:
 On 8 July 2014 08:13, Al Viro v...@zeniv.linux.org.uk wrote:
 Actually, that's badly worded; what codepath ends up setting si_code on
 e.g. fp addition overflows?  In system mode it's done by completion code
 in the kernel, but AFAICS in user mode there are only two places where it
 might happen - one is gentrap handling and another - osf_setsysinfo(2)
 emulation for TARGET_SSI_IEEE_FP_CONTROL.  What I don't understand is how
 do we get from float_raise(FP_STATUS, float_flag_overflow) in 
 fpu/softfloat.c
 to either of those.

 IOW, suppose I do
 x = DBL_MAX;
 feenableexcept(FE_ALL_EXCEPT);
 x *= x;
 I understand how I'll get SIGFPE, but what will set correct si_code in
 siginfo I'll see in the handler?
 
 The code we have currently may well be buggy, but the correct
 place to set si_code is (as Richard says) the Alpha cpu_loop() in
 linux-user/main.c, which has access to the trap type that just
 caused us to stop executing code, plus the CPUState, which
 should be enough information to set si_code correctly. In
 particular the GENTRAP case seems to be setting a variety
 of different si_code values for SIGFPE.

The gentrap case is a red-herring.

The case you're looking for is EXC_ARITH.  The path is from

arith_excp
  dynamic_excp
cpu_loop_exit
  longjmp
  cpu_exec
cpu_loop

It's also true that we don't install the correct si_code there, but we could.
Mostly the gcc/glibc test cases really only care that SIGFPE gets raised, not
what the codes are, so I haven't bothered.
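
A minimal sketch of what installing a more specific si_code could look
like, as a pure function from an exception summary to a code.  Everything
here is an assumption for illustration: the names are hypothetical, the
summary bit values and the priority order are guesses that would need
checking against the kernel's completion code, and the si_code numbers are
the generic Linux FPE_* values.

```c
#include <stdint.h>

/* Assumed exception summary bits (illustrative values). */
enum {
    SK_EXC_SWC = 1,  SK_EXC_INV = 2,  SK_EXC_DZE = 4, SK_EXC_FOV = 8,
    SK_EXC_UNF = 16, SK_EXC_INE = 32, SK_EXC_IOV = 64
};

/* Generic Linux si_code values for SIGFPE. */
enum {
    SK_FPE_INTOVF = 2, SK_FPE_FLTDIV = 3, SK_FPE_FLTOVF = 4,
    SK_FPE_FLTUND = 5, SK_FPE_FLTRES = 6, SK_FPE_FLTINV = 7
};

/* Pick one si_code; the priority order below is an assumption. */
static int sketch_fpe_si_code(uint32_t exc)
{
    if (exc & SK_EXC_INV) return SK_FPE_FLTINV;
    if (exc & SK_EXC_DZE) return SK_FPE_FLTDIV;
    if (exc & SK_EXC_FOV) return SK_FPE_FLTOVF;
    if (exc & SK_EXC_UNF) return SK_FPE_FLTUND;
    if (exc & SK_EXC_INE) return SK_FPE_FLTRES;
    if (exc & SK_EXC_IOV) return SK_FPE_INTOVF;
    return SK_FPE_FLTINV;   /* fallback when no summary bit is set */
}
```

With something along these lines an overflow that also raises inexact would
report the overflow code instead of a fixed value.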


r~




Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-08 Thread Al Viro
On Tue, Jul 08, 2014 at 07:54:36AM +0100, Al Viro wrote:
 On Mon, Jul 07, 2014 at 11:03:08PM -0700, Richard Henderson wrote:
  On 07/07/2014 09:20 PM, Al Viro wrote:
   and I'm reasonably sure that this is what they did internally.  You are
   proposing to do 4 cases in all their messy glory in qemu itself...
  
  Yes.  Primarily because we *have* to do so for the linux-user case.
  
   And that's not even going into generating the right si_code for that 
   SIGFPE.
   What produces those TARGET_GEN_FLTINE and friends?
  
  linux-user/main.c, cpu_loop.
 
 That's where we consume it; where is it produced?  Sure, explicit
 gentrap in alpha code will lead there, with whatever we have in
 $16 deciding what'll go into si_code, but where does that happen on
 fp exception codepaths?  IOW, what sets si_code on those?

Actually, that's badly worded; what codepath ends up setting si_code on
e.g. fp addition overflows?  In system mode it's done by completion code
in the kernel, but AFAICS in user mode there are only two places where it
might happen - one is gentrap handling and another - osf_setsysinfo(2)
emulation for TARGET_SSI_IEEE_FP_CONTROL.  What I don't understand is how
do we get from float_raise(FP_STATUS, float_flag_overflow) in fpu/softfloat.c
to either of those.

IOW, suppose I do
x = DBL_MAX;
feenableexcept(FE_ALL_EXCEPT);
x *= x;
I understand how I'll get SIGFPE, but what will set correct si_code in
siginfo I'll see in the handler?



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-08 Thread Peter Maydell
On 8 July 2014 17:13, Al Viro v...@zeniv.linux.org.uk wrote:
 On Tue, Jul 08, 2014 at 09:05:10AM +0100, Peter Maydell wrote:

 The code we have currently may well be buggy, but the correct

 It is ;-/  We set TARGET_FPE_FLTINV unconditionally there.  BTW, what's
 the reason why all these cpu_loop() instances can't go into
 linux-user/arch/something?

It's just ancient code nobody's cleaned up yet. I do have
"move all this stuff into arch directories" on my "would like
to do" list, but I just haven't got round to it yet, since it's
not actually actively broken (unlike many other areas of
our codebase :-/).

 BTW, are there any more or less up-to-date docs on qemu profiling?  I mean,
 things like perf/oprofile on the host obviously end up lumping all tcg
 output together.  Is there any way to get information beyond "~40% of time
 is spent in generated code, ~15% - in tb_find_fast(), and the rest is very
 much colder"?

Alex, did you say you'd done something with profiling recently?

 Incidentally, combination of --enable-gprof and (default) --enable-pie
 won't build - it dies with ld(1) complaining about relocs in gcrt1.o.

This sounds like a toolchain bug to me :-)

-- PMM



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-08 Thread Al Viro
On Tue, Jul 08, 2014 at 11:12:20AM -0700, Richard Henderson wrote:
 On 07/08/2014 09:13 AM, Al Viro wrote:
  Frankly, I suspect that it's better to have qemu-system-alpha behave like
  the actual hardware does (including FPCR.DNOD can't be set) and keep the
  linux-user behaviour as is, for somebody brave and masochistic enough to
  fight that one.  And no, it's nowhere near just let denorms ride through
  the normal softfloat code and play a bit with the flags it might raise.
  And then there's netbsd/alpha and openbsd/alpha, so in theory somebody might
  want to play with their software completion semantics (not identical to 
  Linux
  one) for the sake of yet-to-be-written bsd-user alpha support...
 
 You're probably right there.
 
 I've pushed a couple more patches to the branch, split out from your patch
 here.  I believe I've got it all, and haven't mucked things up in the process.
 I'll run some tests later today when I've got time.

Just one thing - 0x1f will make 32bit hosts whine about integer
constant being too large.  So will 0x1ful, unfortunately - it
really ought to be ull.



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-08 Thread Richard Henderson
On 07/08/2014 12:02 PM, Al Viro wrote:
 On Tue, Jul 08, 2014 at 11:12:20AM -0700, Richard Henderson wrote:
 On 07/08/2014 09:13 AM, Al Viro wrote:
 Frankly, I suspect that it's better to have qemu-system-alpha behave like
 the actual hardware does (including FPCR.DNOD can't be set) and keep the
 linux-user behaviour as is, for somebody brave and masochistic enough to
 fight that one.  And no, it's nowhere near just let denorms ride through
 the normal softfloat code and play a bit with the flags it might raise.
 And then there's netbsd/alpha and openbsd/alpha, so in theory somebody might
 want to play with their software completion semantics (not identical to 
 Linux
 one) for the sake of yet-to-be-written bsd-user alpha support...

 You're probably right there.

 I've pushed a couple more patches to the branch, split out from your patch
 here.  I believe I've got it all, and haven't mucked things up in the process.
 I'll run some tests later today when I've got time.
 
 Just one thing - 0x1f will make 32bit hosts whine about integer
 constant being too large.  So will 0x1ful, unfortunately - it
 really ought to be ull.
 

I did use ull on the branch.


r~



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-08 Thread Al Viro
On Tue, Jul 08, 2014 at 12:04:10PM -0700, Richard Henderson wrote:

  Just one thing - 0x1f will make 32bit hosts whine about integer
  constant being too large.  So will 0x1ful, unfortunately - it
  really ought to be ull.
  
 
 I did use ull on the branch.

Aha...  So you've caught that one already...  I've looked at your branch;
AFAICS, the only thing missing there is treating stores to FPCR.DNOD in
system mode as not implemented (which it is in the code as well as in
21[0-3]64 hardware).  Other than that everything seems to be fine; you are
right about cvtql treatment - since that sucker doesn't have /i in any
allowed trap suffixes, we might as well just raise Inexact and let it be
masked out - float_flag_inexact will be present in 'ignore'.  And yes,
folding the calculation itself in there obviously makes sense.



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-07 Thread Richard Henderson
On 07/05/2014 03:55 PM, Al Viro wrote:
 +/* Input handling with software completion.  Trap for denorms,
 +   unless DNZ is set.  *IF* we try to support DNOD (which
 +   none of the produced hardware did, AFAICS), we'll need
 +   to suppress the trap when FPCR.DNOD is set; then the
 +   code downstream of that will need to cope with denorms
 +   sans flush_input_to_zero.  Most of it should work sanely,
 +   but there's nothing to compare with...
 +*/
 +void helper_ieee_input_s(CPUAlphaState *env, uint64_t val)
 +{
 +if (unlikely(2 * val - 1  0x1f)) {
 + if (!FP_STATUS.flush_inputs_to_zero) {
 + arith_excp(env, GETPC(), EXC_M_INV | EXC_M_SWC, 0);
 + }
 +}
 +}
 +
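
For readers unfamiliar with the single-compare idiom in the quoted helper,
here is a standalone sketch of denormal detection on the raw bits of an
IEEE double, next to the obvious field-by-field test.  The function names
are hypothetical, and the threshold of 2^53 - 1 is an assumption on my
part, since the constant in the quoted patch appears truncated in this
archive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field-by-field: exponent all zeroes, fraction non-zero. */
static bool is_denormal_fields(uint64_t val)
{
    uint64_t exp  = (val >> 52) & 0x7ff;
    uint64_t frac = val & ((UINT64_C(1) << 52) - 1);

    return exp == 0 && frac != 0;
}

/*
 * Single compare: doubling shifts the sign bit out, subtracting 1 makes
 * +0/-0 wrap around to the largest value, and anything with a non-zero
 * exponent ends up at or above 2^53 - 1.
 */
static bool is_denormal_onecmp(uint64_t val)
{
    return 2 * val - 1 < (UINT64_C(1) << 53) - 1;
}
```

UINT64_C() is used so that the constants stay 64 bits wide on 32-bit hosts
as well.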

A couple of points here:

1) We should never raise this in user-only mode.  In that mode, we emulate the
whole fpu stack, all the way through from HW to the OS completion handler.

2) Because of that, we have the capability of doing the same thing in system
mode.  This lets us do more of the computation in the host, and less in the
guest, which is faster.  The only thing this makes more difficult is debugging
the OS completion handlers within the kernel, since they'll only get invoked
when SIGFPE needs to be sent.

3) If we do want to implement a mode where we faithfully send SWC for all of
the bits of IEEE that real HW didn't implement, do we really need to avoid a
store to the output register when signalling this?  I.e. can we notice this
condition after the fact with float_flag_input_denormal, rather than having
another function call to prep the inputs?


r~



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-07 Thread Al Viro
On Mon, Jul 07, 2014 at 07:11:58AM -0700, Richard Henderson wrote:
 A couple of points here:
 
 1) We should never raise this in user-only mode.  In that mode, we emulate the
 whole fpu stack, all the way through from HW to the OS completion handler.

How is that different from other cases where we have an exception raised
by an fp operation?

 2) Because of that, we have the capability of doing the same thing in system
 mode.  This lets us do more of the computation in the host, and less in the
 guest, which is faster.  The only thing this makes more difficult is debugging
 the OS completion handlers within the kernel, since they'll only get invoked
 when SIGFPE needs to be sent.

Umm...  The effect of software completion depends on current->ieee_state;
how would you keep track of that outside of guest kernel?

 3) If we do want to implement a mode where we faithfully send SWC for all of
 the bits of IEEE that real HW didn't implement, do we really need to avoid a
 store to the output register when signalling this?  I.e. can we notice this
 condition after the fact with float_flag_input_denormal, rather than having
 another function call to prep the inputs?

But flag_input_denormal is raised only when we do have DNZ set.  Which is
an entirely different case, where we should not (and do not) get an exception
at all...



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-07 Thread Richard Henderson
On 07/07/2014 08:06 AM, Al Viro wrote:
 On Mon, Jul 07, 2014 at 07:11:58AM -0700, Richard Henderson wrote:
 A couple of points here:

 1) We should never raise this in user-only mode.  In that mode, we emulate the
 whole fpu stack, all the way through from HW to the OS completion handler.
 
 How is that different from other cases where we have an exception raised
 by an fp operation?

In all other cases we know we're going to send SIGFPE.  That's either through a
non /S insn which the kernel wouldn't touch, or by having computed the true
IEEE result and examined the exceptions to be raised.

 2) Because of that, we have the capability of doing the same thing in system
 mode.  This lets us do more of the computation in the host, and less in the
 guest, which is faster.  The only thing this makes more difficult is debugging
 the OS completion handlers within the kernel, since they'll only get invoked
 when SIGFPE needs to be sent.
 
 Umm...  The effect of software completion depends on current->ieee_state;
 how would you keep track of that outside of guest kernel?

The kernel essentially keeps a copy of IEEE_STATE in the FPCR.  I don't see any
missing bits in ieee_swcr_to_fpcr, do you?

While real hardware might ignore some of those bits once stored, qemu doesn't.

While in real hardware one could force the FPCR and IEEE_STATE to differ,
honestly that'd be a bug.  (Although a silly one; I wish the kernel took the
EV6 FPCR as gospel for everything, not just the status flags.  That could make
certain libm.so computations much faster.)

 
 3) If we do want to implement a mode where we faithfully send SWC for all of
 the bits of IEEE that real HW didn't implement, do we really need to avoid a
 store to the output register when signalling this?  I.e. can we notice this
 condition after the fact with float_flag_input_denormal, rather than having
 another function call to prep the inputs?
 
 But flag_input_denormal is raised only when we do have DNZ set.  Which is
 an entirely different case, where we should not (and do not) get an exception
 at all...

Ah, you're right about that.  I'd mis-remembered the implementation.


r~



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-07 Thread Al Viro
On Mon, Jul 07, 2014 at 09:20:28AM -0700, Richard Henderson wrote:
  How is that different from other cases where we have an exception raised
  by an fp operation?
 
 In all other cases we know we're going to send SIGFPE.  That's either through a
 non /S insn which the kernel wouldn't touch, or by having computed the true
 IEEE result and examined the exceptions to be raised.

Umm...  Not quite.  Consider e.g. CVTQL case.  There we have the following
picture in case of overflow (real hw with Linux on top of it):
    no suffix:          IOV INE
    /v:                 IOV INE SIGFPE
    /sv, no IEEE INVE:  IOV INE INV
    /sv, IEEE INVE:     IOV INE INV SIGFPE
This is after the completion had a chance to run.  From the hw POV it's
    no suffix           IOV INE  no trap
    /v                  IOV INE  trap  IOV
    /sv                 IOV INE  trap  SWC,IOV
and it's alpha_fp_emul() that does the rest in /sv case.  Actually, it's even
simpler:
    if overflow
        FPCR.INE = 1
        raise IOV
    do usual trap suffix handling
and I'm reasonably sure that this is what they did internally.  You are
proposing to do 4 cases in all their messy glory in qemu itself...
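
The hardware-side picture above can be written down as a small model. This is only a sketch of the description in this message: the type, the summary bit values and the function name are all hypothetical, and the result is simplified to "low 32 bits, sign-extended" rather than the real longword register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this models the description in the
   thread, not QEMU's or the kernel's actual data structures. */
enum suffix { SUFFIX_NONE, SUFFIX_V, SUFFIX_SV };
enum { SUM_IOV = 1, SUM_SWC = 2 };      /* made-up trap summary bits */

struct cvtql_out {
    int64_t result;     /* always computed and stored */
    bool fpcr_ine;      /* FPCR bits set by the insn */
    bool fpcr_iov;
    bool trap;
    unsigned summary;   /* meaningful only when trap is true */
};

static struct cvtql_out cvtql_model(int64_t q, enum suffix sfx)
{
    struct cvtql_out o = { 0 };

    /* Keep the low 32 bits, sign-extended (two's-complement host assumed). */
    o.result = (int64_t)(int32_t)(uint32_t)q;

    if (o.result != q) {                /* value did not fit: overflow */
        o.fpcr_ine = true;
        o.fpcr_iov = true;
        if (sfx == SUFFIX_V) {
            o.trap = true;
            o.summary = SUM_IOV;
        } else if (sfx == SUFFIX_SV) {
            o.trap = true;
            o.summary = SUM_SWC | SUM_IOV;
        }
    }
    return o;
}
```

Everything that distinguishes /sv from /v beyond the summary word (INV instead of IOV, conditional SIGFPE) would then stay in the guest kernel's completion handler, which is the point being argued here.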

And I wouldn't bet a dime on not having similar turds in other insns; after
all, "it's hard, let's offload it to software" was only a part of motivation
for software completions.  "We really don't like this part of IEEE standard
and we'd love to tell you to see figure 1, but we need conformance, so you
can mark an insn with /s and have the kernel do what IEEE requires" is also
very visible in their manual.

Result is a mess - if you try to fold the kernel-side stuff into hardware,
you end up with a pile of inconsistent behaviours.  In principle, it's
doable, especially since we are not really constrained by actual hw in terms
of what we do in case of FPCR.DNOD being true - no actual hw could set it.
So we want
* hw behaviour without /s (denorms trap)
* hw behaviour with /s without denorms
* hw behaviour with /s with denorms with FPCR.DNZ (same as with 0) 
* kernel completion behaviour with /s with denorm
and it might even be what they intended for DNOD to do.  But it's going
to be messy as hell.

And that's not even going into generating the right si_code for that SIGFPE.
What produces those TARGET_GEN_FLTINE and friends?



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-05 Thread Al Viro
On Sat, Jul 05, 2014 at 02:40:55AM +0100, Al Viro wrote:
 a) softfloat.c raises flags we don't care about.  So checking that
 FP_STATUS.float_exception_flags is non-zero is *not* good - we catch
 false positives that way.
 
 b) DNZ has effect *only* for /S insns.  Without /S denorm means INV and
 that's it.  FPCR.INV isn't set, at that.  FPCR.INVD is ignored (it affects
 only insns with /S).
 
 c) without DNZ or DNOD denorms trip INV even with /S.  Again, FPCR.INV is
 not set *and* FPCR.INVD is ignored.  It does stop INV from SQRTT/SU on
 -1, but not on DBL_MIN/2 (and on SQRTT/SU(-1) FPCR.INV is set).  Looks like
 this sucker is a separate kind of trap, the only similarity with INV being
 that it sets the same bit in trap summary word.

BTW, CVTTQ/SVI on denorms with DNZ shouldn't set Inexact.

 Right now I have duplicate of 21264 SQRTT behaviour on everything except
 infinities; hadn't looked into those yet.  I'm going to massage it a bit
 and see if the result causes any regressions for corner cases of MULT
 and friends.  Hopefully I'll have something usable by tomorrow...

Situation with infinities/NaNs: without /S we should trap on those guys
for arithmetics and conversions; in trap summary we get EXC_M_INV (regardless
of the argument) *and* (unlike the treatment of denorms there) we should
set FPCR.INV.  With /S they are passed to operation, which is responsible
for raising whatever it wants to raise (so far they all seem to be doing
the right thing in that area).

With comparisons, denorm handling is the same as for arithmetics; i.e.
with /S they trigger INV unless DNZ is set (and, presumably, working DNOD
would have the same effect on them).

Anyway, the current delta (on top of 26f86) follows; seems to get IEEE
insns behave on non-finite arguments as they do on 21264.  The main
exception is that register bitmask supplied to trap isn't calculated in
a bunch of cases; since its main purpose is to help locating the trapping
insn and we report precise traps (amask feature bit 9), it's probably not
an interesting problem.  Current Linux kernel definitely won't look at that
thing under qemu; an old one might, but it would have to be something
older than 2.3... <checks the history> than 2.2.8, actually.  And the impact
is that insns with /S getting a denorm argument won't be properly emulated
and you'll get SIGFPE.  Again, it has to be a really old kernel (older than
May 1999) to be affected at all.

diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
index 9b297de..637d95e 100644
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -44,6 +44,12 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env)
     return get_float_exception_flags(&FP_STATUS);
 }
 
+enum {
+    Exc_Mask = float_flag_invalid | float_flag_int_overflow |
+               float_flag_divbyzero | float_flag_overflow |
+               float_flag_underflow | float_flag_inexact
+};
+
 static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
                                  uint32_t exc, uint32_t regno, uint32_t hw_exc)
 {
@@ -73,7 +79,7 @@ static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
    doesn't apply.  */
 void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
     if (exc) {
         env->fpcr_exc_status |= exc;
         exc &= ~ignore;
@@ -86,7 +92,7 @@ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 /* Raise exceptions for ieee fp insns with software completion.  */
 void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
     if (exc) {
         env->fpcr_exc_status |= exc;
         exc &= ~ignore;
@@ -105,16 +111,14 @@ void helper_ieee_input(CPUAlphaState *env, uint64_t val)
     uint64_t frac = val & 0xfffffffffffffull;
 
     if (exp == 0) {
-        /* Denormals without DNZ set raise an exception.  */
-        if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-            arith_excp(env, GETPC(), EXC_M_UNF, 0);
+        /* Denormals without /S raise an exception.  */
+        if (frac != 0) {
+            arith_excp(env, GETPC(), EXC_M_INV, 0);
         }
     } else if (exp == 0x7ff) {
-        /* Infinity or NaN.  */
-        /* ??? I'm not sure these exception bit flags are correct.  I do
-           know that the Linux kernel, at least, doesn't rely on them and
-           just emulates the insn to figure out what exception to use.  */
-        arith_excp(env, GETPC(), frac ? EXC_M_INV : EXC_M_FOV, 0);
+        /* Infinity or NaN */
+        env->fpcr_exc_status |= float_flag_invalid;
+        arith_excp(env, GETPC(), EXC_M_INV, 0);
     }
 }
 
@@ -125,16 +129,34 @@ void 

Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-05 Thread Al Viro
On Sat, Jul 05, 2014 at 10:09:51PM +0100, Al Viro wrote:

 Anyway, the current delta (on top of 26f86) follows; seems to get IEEE
 insns behave on non-finite arguments as they do on 21264.  The main
 exception is that register bitmask supplied to trap isn't calculated in
 a bunch of cases; since its main purpose is to help locating the trapping
 insn and we report precise traps (amask feature bit 9), it's probably not
 an interesting problem.  Current Linux kernel definitely won't look at that
 thing under qemu; an old one might, but it would have to be something
 older than 2.3... <checks the history> than 2.2.8, actually.  And the impact
 is that insns with /S getting a denorm argument won't be properly emulated
 and you'll get SIGFPE.  Again, it has to be a really old kernel (older than
 May 1999) to be affected at all.

... and a followup (and the last part of exception handling for non-VAX
insn inputs, AFAICS) - CVTQL.

* whether it triggers a trap or not, it sets IOV and INE on overflow.
* in case of trap it does *not* bugger off immediately - result is
calculated, stored and only then we trap.
* trap summary word differs between cvtql/v and cvtql/sv.  IOW, it's yet
another case of "we think that IEEE semantics is stupid and if you need it,
you'd damn better ask for it explicitly".  Note that cvtql/v sets IOV|INE and
hits SIGFPE no matter what, while cvtql/sv sets INV instead and triggers SIGFPE
only if FP_INVALID is enabled.  All difference is kernel-side and it's
triggered by EXC_M_SWC in summary word.

AFAICS, that should be it for IEEE and shared insns, as far as exceptions
on inputs are concerned.

Combined delta follows:

diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
index 9b297de..25c83b5 100644
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -44,6 +44,12 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env)
     return get_float_exception_flags(&FP_STATUS);
 }
 
+enum {
+    Exc_Mask = float_flag_invalid | float_flag_int_overflow |
+               float_flag_divbyzero | float_flag_overflow |
+               float_flag_underflow | float_flag_inexact
+};
+
 static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
                                  uint32_t exc, uint32_t regno, uint32_t hw_exc)
 {
@@ -73,7 +79,7 @@ static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
    doesn't apply.  */
 void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
     if (exc) {
         env->fpcr_exc_status |= exc;
         exc &= ~ignore;
@@ -86,7 +92,7 @@ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 /* Raise exceptions for ieee fp insns with software completion.  */
 void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
     if (exc) {
         env->fpcr_exc_status |= exc;
         exc &= ~ignore;
@@ -105,16 +111,14 @@ void helper_ieee_input(CPUAlphaState *env, uint64_t val)
     uint64_t frac = val & 0xfffffffffffffull;
 
     if (exp == 0) {
-        /* Denormals without DNZ set raise an exception.  */
-        if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-            arith_excp(env, GETPC(), EXC_M_UNF, 0);
+        /* Denormals without /S raise an exception.  */
+        if (frac != 0) {
+            arith_excp(env, GETPC(), EXC_M_INV, 0);
         }
     } else if (exp == 0x7ff) {
         /* Infinity or NaN.  */
-        /* ??? I'm not sure these exception bit flags are correct.  I do
-           know that the Linux kernel, at least, doesn't rely on them and
-           just emulates the insn to figure out what exception to use.  */
-        arith_excp(env, GETPC(), frac ? EXC_M_INV : EXC_M_FOV, 0);
+        env->fpcr_exc_status |= float_flag_invalid;
+        arith_excp(env, GETPC(), EXC_M_INV, 0);
     }
 }
 
@@ -125,16 +129,34 @@ void helper_ieee_input_cmp(CPUAlphaState *env, uint64_t val)
     uint64_t frac = val & 0xfffffffffffffull;
 
     if (exp == 0) {
-        /* Denormals without DNZ set raise an exception.  */
-        if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-            arith_excp(env, GETPC(), EXC_M_UNF, 0);
+        /* Denormals raise an exception.  */
+        if (frac != 0) {
+            arith_excp(env, GETPC(), EXC_M_INV, 0);
         }
     } else if (exp == 0x7ff && frac) {
         /* NaN.  */
+        env->fpcr_exc_status |= float_flag_invalid;
         arith_excp(env, GETPC(), EXC_M_INV, 0);
     }
 }
 
+/* Input handling with software completion.  Trap for denorms,
+   unless DNZ is set.  *IF* we try to support DNOD (which
+   none of the produced hardware did, AFAICS), we'll need
+   to suppress the trap when FPCR.DNOD is set;

Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-04 Thread Al Viro
On Thu, Jul 03, 2014 at 09:30:05PM -0700, Richard Henderson wrote:

  Another one is probably not worth bothering - PERR, CTPOP, CTLZ, UNPKBx and PKxB
  don't accept literal argument.  For one thing, as(1) won't let you generate
  those, so it would have to be explicit
  .long 0x70001620
  instead of
  perr $0,0,$0
  On DS10 it gives SIGILL; under qemu it succeeds.  Trivial to fix, anyway,
  if we care about that (if (islit) goto invalid_opc; in 1C.030..1C.037).
 
 Is it just 030..037, or everything under opcode 1C?

No, just those.

 Sadly, V4 of the handbook doesn't mention *anything* about not actually
 allowing literals for any of these insns.

It does - compare 4.13.1 (4-155, page 225) with 4.13.2 (two pages later).
The former has
    MINxxx  Ra.rq,Rb.rq,Rc.wq   ! Operate format
            Ra.rq,#b.ib,Rc.wq
    MINxxx  Ra.rq,Rb.rq,Rc.wq   ! Operate format
            Ra.rq,#b.ib,Rc.wq
The latter -
    PERR    Ra.rq,Rb.rq,Rc.wq   ! Operate format

And yes, PERR with bit 12 set will give invalid instruction trap, while e.g.
MINSB8 won't.  The Real Intrudprize Kwality(tm) of technical writing, that,
but the information is, indeed, there.  Verified on UP1000, which has 0x307
for feature bits, so all this stuff is really in hardware, not emulated.

OPC1C is a mess - that's one place on alpha where decoder needs more than
upper 6 bits to determine the format of instruction.  Most of those guys
are Operate (with an extra twist being that some don't take literals),
but FTOIS and FTOIT are F-P, and only approximately so (its source refers
to integer register, destination to floating point one).  Note that
function field is in bits 5--11 for Operate and 5--15 for F-P ;-/  Bit
11 allows to discriminate between those, since FTOIS and FTOIT have function
0x70 and 0x78 resp, while everything else has it lower than 0x40.  Hell
knows how that mess had come to be...

Anyway, the situation with literals in OPC1C:
0, 1 (SEXT[BW]).  Work, rejected by as(1).
0x30--0x37.  Invalid instruction trap, as(1) (correctly) refuses to produce
those.
0x38--0x3f.  Work, accepted by as(1).
0x70, 0x78.  Those are F-P, no literals for them.

SEXTB/SEXTW are missing ARG_OPRL form in binutils opcodes/alpha-opc.c; probably
not worth bothering...
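
To make the encoding concrete, here is a small sketch that pulls apart the fields mentioned above and flags the literal forms reported as trapping. The field positions are the ones given in this thread (opcode in the upper 6 bits, literal flag in bit 12, Operate function in bits 5..11); the helper names are invented for the example and are not taken from QEMU's translator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field extraction for Operate-format instructions, per the thread:
   opcode in bits 31..26, literal flag in bit 12, function in bits 11..5. */
static unsigned insn_opcode(uint32_t insn) { return insn >> 26; }
static bool insn_islit(uint32_t insn)      { return (insn >> 12) & 1; }
static unsigned insn_fn7(uint32_t insn)    { return (insn >> 5) & 0x7f; }

/* Hypothetical predicate: true when an opcode-1C instruction uses the
   literal form that the thread reports as trapping on real hardware
   (functions 0x30..0x37). */
static bool opc1c_literal_is_invalid(uint32_t insn)
{
    unsigned fn = insn_fn7(insn);

    return insn_opcode(insn) == 0x1c && insn_islit(insn) &&
           fn >= 0x30 && fn <= 0x37;
}
```

With these definitions the word 0x70001620 from earlier in the thread decodes as opcode 0x1c, function 0x31, literal bit set, so the predicate reports it as invalid; the same function with bit 12 clear is accepted.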



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-04 Thread Al Viro
Denorms fun:

a) softfloat.c raises flags we don't care about.  So checking that
FP_STATUS.float_exception_flags is non-zero is *not* good - we catch
false positives that way.

b) DNZ has effect *only* for /S insns.  Without /S denorm means INV and
that's it.  FPCR.INV isn't set, at that.  FPCR.INVD is ignored (it affects
only insns with /S).

c) without DNZ or DNOD denorms trip INV even with /S.  Again, FPCR.INV is
not set *and* FPCR.INVD is ignored.  It does stop INV from SQRTT/SU on
-1, but not on DBL_MIN/2 (and on SQRTT/SU(-1) FPCR.INV is set).  Looks like
this sucker is a separate kind of trap, the only similarity with INV being
that it sets the same bit in trap summary word.

d) at least on EV6 and EV67 DNOD *still* trips INV.  According to the
manual, suppression of INV by DNOD is optional.  And while their text
might be interpreted as "INV is suppressed if operation with denorm
wouldn't result in something unpleasant" (which would apply to
sqrt(DBL_MIN/2)), the same behaviour happens on DBL_MIN/2 + DBL_MIN/2,
where the result is a good finite value, so it really looks like DNOD
doesn't suppress INV at all on these processors.
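
Point a) above amounts to masking the softfloat flags before deciding whether anything should be raised. A minimal sketch, with flag values chosen purely for illustration (softfloat's real definitions may differ) and a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
enum {
    FLAG_INVALID         = 1 << 0,
    FLAG_DIVBYZERO       = 1 << 1,
    FLAG_OVERFLOW        = 1 << 2,
    FLAG_UNDERFLOW       = 1 << 3,
    FLAG_INEXACT         = 1 << 4,
    FLAG_INPUT_DENORMAL  = 1 << 5,  /* bookkeeping, not a guest-visible trap */
    FLAG_OUTPUT_DENORMAL = 1 << 6
};

enum {
    TRAP_RELEVANT = FLAG_INVALID | FLAG_DIVBYZERO | FLAG_OVERFLOW |
                    FLAG_UNDERFLOW | FLAG_INEXACT
};

/* Return the exceptions that should actually be raised: drop flags the
   guest never sees, then drop the ones the instruction asked to ignore. */
static uint32_t exceptions_to_raise(uint32_t flags, uint32_t ignore)
{
    uint32_t exc = flags & TRAP_RELEVANT;

    exc &= ~ignore;
    return exc;
}
```

Testing the raw flag word for non-zero would treat a lone input-denormal flag as an exception; with the mask it yields nothing to raise.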

Does anybody have 21364 to run some tests on?

FWIW, hw testing had been done by direct printk from do_entArith(); it's
before anything alpha_fp_emul() does.

Right now I have duplicate of 21264 SQRTT behaviour on everything except
infinities; hadn't looked into those yet.  I'm going to massage it a bit
and see if the result causes any regressions for corner cases of MULT
and friends.  Hopefully I'll have something usable by tomorrow...

Al, wondering if the original regression testsuite still exists somewhere in
the bowels of Intel - DEC/Compaq/HP had to have one for testing the hardware
back then...



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-04 Thread Al Viro
On Sat, Jul 05, 2014 at 02:40:55AM +0100, Al Viro wrote:
 d) at least on EV6 and EV67 DNOD *still* trips INV.  According to the
 manual, suppression of INV by DNOD is optional.  And while their text
 might be interpreted as "INV is suppressed if operation with denorm
 wouldn't result in something unpleasant" (which would apply to
 sqrt(DBL_MIN/2)), the same behaviour happens on DBL_MIN/2 + DBL_MIN/2,
 where the result is a good finite value, so it really looks like DNOD
 doesn't suppress INV at all on these processors.
 
 Does anybody have 21364 to run some tests on?

In fact, DNOD is simply not implemented on those guys - if you try to set it,
the bit still reads zero.  Worse, according to Compiler Writer's Guide for the
21264/21364, "Alpha architecture FPCR bit 47 (DNOD) is not implemented
by the 21264 or 21364".

In other words, it looks like FPCR.DNOD is something from (never-produced)
21464.



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-03 Thread Al Viro
More bugs: addl/v should sign-extend the result, as addl does.
As it is, we have
uint64_t helper_addlv(CPUAlphaState *env, uint64_t op1, uint64_t op2)
{
    uint64_t tmp = op1;
    op1 = (uint32_t)(op1 + op2);
    if (unlikely((tmp ^ op2 ^ (-1UL)) & (tmp ^ op1) & (1UL << 31))) {
        arith_excp(env, GETPC(), EXC_M_IOV, 0);
    }
    return op1;
}

IOW,
#include <stdio.h>

long r;
void __attribute__((noinline)) f(void)
{
    asm __volatile(
        "subl   $31, 1, $0\n\t"
        "addl   $0, $0, $1\n\t"
        "addl/v $0, $0, $0\n\t"
        "subq   $0, $1, $0\n\t"
        "stq    $0, %0\n\t"
        : "=m"(r) : : "$0", "$1");
}

int main(void)
{
    f();
    printf("%ld\n", r);
    return 0;
}

ends up printing 0 on actual hardware (all variants) and 4294967296 on
qemu.  Similar problem with subl/v - 

#include <stdio.h>

long r;
void __attribute__((noinline)) f(void)
{
    asm __volatile(
        "subl   $31, 1, $0\n\t"
        "subl/v $31, 1, $1\n\t"
        "subq   $0, $1, $0\n\t"
        "stq    $0, %0\n\t"
        : "=m"(r) : : "$0", "$1");
}

int main(void)
{
    f();
    printf("%ld\n", r);
    return 0;
}

prints 0 on actual hw and -4294967296 on qemu.  What constraints do we have
on qemu host, anyway?  Two's-complement, (int32_t)(uint32_t)x == x for any
int32_t x?  helper_mullv() seems to assume that...
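
For comparison, a sketch of what the addl/v result computation could look like with the sign extension in place. This is not the fix that went into QEMU; it is a host-side model with a made-up name, written so that it avoids implementation-defined signed casts altogether, and it reports overflow through an out-parameter instead of trapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of ADDL/V: the 32-bit sum is sign-extended into the
   64-bit result, and *overflow reports signed 32-bit overflow.  The result
   is produced whether or not overflow occurred. */
static uint64_t addlv_model(uint64_t op1, uint64_t op2, bool *overflow)
{
    uint32_t a = (uint32_t)op1, b = (uint32_t)op2;
    uint32_t sum = a + b;               /* wraps modulo 2^32 */

    /* Overflow: operands share a sign bit and the sum's sign differs. */
    *overflow = (((a ^ ~b) & (a ^ sum)) >> 31) != 0;

    /* Replicate bit 31 into the upper half using unsigned arithmetic only. */
    return (uint64_t)sum | (((uint64_t)0 - (sum >> 31)) << 32);
}
```

Feeding it -1 and -1, as the first test program does, gives -2 as a full 64-bit value, so subtracting the plain addl result leaves 0.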

Oh, crap - our mull/v is sensitive to upper 32 bits of multiplicands.
If you put 1UL<<32 into one register, 1 into another and say mull/v,
result will be 0 and no overflow.  qemu does
    int64_t res = (int64_t)op1 * (int64_t)op2;

    if (unlikely((int32_t)res != res)) {
        arith_excp(env, GETPC(), EXC_M_IOV, 0);
    }
    return (int64_t)((int32_t)res);
which leads to overflow trap triggered for no good reason...

Incidentally, all those guys ({add,sub,mul}[lq]/v) *do* assign the result
(same as the variant without /v would) before entering the trap.  So
arith_excp() is wrong here.

FWIW, why not just generate
    trunc_i64_i32 tmp, va
    trunc_i64_i32 tmp2, vb
    muls2_i32 tmp2, tmp, tmp, tmp2
    ext32s_i64 vc, tmp2
    maybe_overflow_32 tmp
where maybe_overflow throws IOV unless tmp is 0 or -1?  That would appear
to suffice for mull/v.  mulq/v would be
    muls2_i64 vc, tmp, va, vb
    maybe_overflow_64 tmp
addl/v:
    trunc_i64_i32 tmp, va
    trunc_i64_i32 tmp2, vb
    add2_i32 tmp2, tmp, tmp, zero, tmp2, zero
    ext32s_i64 vc, tmp2
    maybe_overflow_32 tmp
etc.

We'd need two helpers, differing only in argument type.  Simple
    if (unlikely(arg && ~arg))
        arith_excp(env, GETPC(), EXC_M_IOV, 0);
would do.  Not sure what flags would be needed in DEFINE_HELPER_... for
those, though.  Comments?



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-03 Thread Al Viro
On Thu, Jul 03, 2014 at 07:51:04AM +0100, Al Viro wrote:

 FWIW, why not just generate
   trunc_i64_i32 tmp, va
   trunc_i64_i32 tmp2, vb
   muls2_i32 tmp2, tmp, tmp, tmp2
   ext32s_i64 vc, tmp2
   maybe_overflow_32 tmp
 where maybe_overflow throws IOV unless tmp is 0 or -1?  That would appear
 to suffice for mull/v.  mulq/v would be
   muls2_i64 vc, tmp, va, vb
   maybe_overflow_64 tmp
 addl/v:
   trunc_i64_i32 tmp, va
   trunc_i64_i32 tmp2, vb
   add2_i32 tmp2, tmp, tmp, zero, tmp2, zero
   ext32s_i64 vc, tmp2
   maybe_overflow_32 tmp
 etc.

Grr...  Wrong check, obviously - we want to check that tmp + MSB(tmp2) is 0.
Something like
    setcond_32  tmp2, tmp2, zero, TCG_COND_LT
    add_i32     tmp, tmp2, tmp
    call        helper_IOV_if_not_zero tmp
for 32bit ones and
    setcond_64  tmp2, vc, zero, TCG_COND_LT
    add_i64     tmp, tmp2, tmp
    call        helper_IOV_if_not_zero tmp
for 64bit ones, or would it be better just to pass both arguments to helper
and let it deal with the check?  I'm not familiar enough with TCG, sorry...
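
The corrected check (high half plus the sign bit of the low half must be zero) can be tried out on the host. The sketch below models mull/v under that rule; the function name is invented, only the low 32 bits of each operand take part, and a two's-complement host is assumed for the one signed conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of MULL/V.  Only the low 32 bits of each operand take
   part; the 64-bit product is split into high and low 32-bit halves. */
static uint64_t mullv_model(uint64_t op1, uint64_t op2, bool *overflow)
{
    /* Reinterpret the low halves as signed (two's-complement host assumed). */
    int64_t a = (int32_t)(uint32_t)op1;
    int64_t b = (int32_t)(uint32_t)op2;
    uint64_t prod = (uint64_t)(a * b);  /* |a * b| <= 2^62, fits in int64_t */
    uint32_t lo = (uint32_t)prod;
    uint32_t hi = (uint32_t)(prod >> 32);

    /* No overflow iff hi is the sign extension of lo, i.e. hi + MSB(lo)
       wraps to zero: hi == 0 with lo >= 0, or hi == 0xffffffff with lo < 0. */
    *overflow = (uint32_t)(hi + (lo >> 31)) != 0;

    /* The sign-extended low half is the result either way. */
    return (uint64_t)lo | (((uint64_t)0 - (lo >> 31)) << 32);
}
```

With 1UL<<32 in one operand and 1 in the other this gives 0 and no overflow, which is the hardware behaviour described earlier in the thread.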



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-03 Thread Richard Henderson
On 07/03/2014 11:25 AM, Al Viro wrote:
 On Thu, Jul 03, 2014 at 07:51:04AM +0100, Al Viro wrote:
 
 FWIW, why not just generate
  trunc_i64_i32 tmp, va
  trunc_i64_i32 tmp2, vb
  muls2_i32 tmp2, tmp, tmp, tmp2
  ext32s_i64 vc, tmp2
  maybe_overflow_32 tmp
 where maybe_overflow throws IOV unless tmp is 0 or -1?  That would appear
 to suffice for mull/v.  mulq/v would be
  muls2_i64 vc, tmp, va, vb
  maybe_overflow_64 tmp
 addl/v:
  trunc_i64_i32 tmp, va
  trunc_i64_i32 tmp2, vb
  add2_i32 tmp2, tmp, tmp, zero, tmp2, zero
  ext32s_i64 vc, tmp2
  maybe_overflow_32 tmp
 etc.
 
 Grr...  Wrong check, obviously - we want to check that tmp + MSB(tmp2) is 0.
 Something like
     setcond_32  tmp2, tmp2, zero, TCG_COND_LT
     add_i32     tmp, tmp2, tmp
     call        helper_IOV_if_not_zero tmp
 for 32bit ones and
     setcond_64  tmp2, vc, zero, TCG_COND_LT
     add_i64     tmp, tmp2, tmp
     call        helper_IOV_if_not_zero tmp
 for 64bit ones, or would it be better just to pass both arguments to helper
 and let it deal with the check?  I'm not familiar enough with TCG, sorry...
 

I believe I have a tidy solution to these /v insns.  New patch set shortly.


r~



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-03 Thread Al Viro
On Thu, Jul 03, 2014 at 01:19:19PM -0700, Richard Henderson wrote:
  Grr...  Wrong check, obviously - we want to check that tmp + MSB(tmp2) is 0.
  Something like
      setcond_32  tmp2, tmp2, zero, TCG_COND_LT
      add_i32     tmp, tmp2, tmp
      call        helper_IOV_if_not_zero tmp
  for 32bit ones and
      setcond_64  tmp2, vc, zero, TCG_COND_LT
      add_i64     tmp, tmp2, tmp
      call        helper_IOV_if_not_zero tmp
  for 64bit ones, or would it be better just to pass both arguments to helper
  and let it deal with the check?  I'm not familiar enough with TCG, sorry...
  
 
 I believe I have a tidy solution to these /v insns.  New patch set shortly.

Hmm...
+tcg_gen_eqv_i64(tmp, va, vb);
+tcg_gen_mov_i64(tmp2, va);
+tcg_gen_add_i64(vc, va, vb);
+tcg_gen_xor_i64(tmp2, tmp2, vc);
+tcg_gen_and_i64(tmp, tmp, tmp2);
+tcg_gen_shri_i64(tmp, tmp, 63);
+tcg_gen_movi_i64(tmp2, 0);
+gen_helper_check_overflow(cpu_env, tmp, tmp2);

How can that be correct?  Suppose a = b = 0.  We get
tcg_gen_eqv_i64(tmp, va, vb);                   ->  tmp = -1
tcg_gen_mov_i64(tmp2, va);                      ->  tmp2 = 0
tcg_gen_add_i64(vc, va, vb);                    ->  c = 0
tcg_gen_xor_i64(tmp2, tmp2, vc);                ->  tmp2 = 0
tcg_gen_and_i64(tmp, tmp, tmp2);                ->  tmp = -1
tcg_gen_shri_i64(tmp, tmp, 63);                 ->  tmp = 1
tcg_gen_movi_i64(tmp2, 0);                      ->  tmp2 = 0
gen_helper_check_overflow(cpu_env, tmp, tmp2);  -> not equal, overflow.

What am I missing here?



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-03 Thread Peter Maydell
On 3 July 2014 23:47, Al Viro v...@zeniv.linux.org.uk wrote:
 How can that be correct?  Suppose a = b = 0.  We get
 tcg_gen_eqv_i64(tmp, va, vb);                   ->  tmp = -1
 tcg_gen_mov_i64(tmp2, va);                      ->  tmp2 = 0
 tcg_gen_add_i64(vc, va, vb);                    ->  c = 0
 tcg_gen_xor_i64(tmp2, tmp2, vc);                ->  tmp2 = 0
 tcg_gen_and_i64(tmp, tmp, tmp2);                ->  tmp = -1

tmp2 here is 0, so the result of this AND is 0, not -1...

 tcg_gen_shri_i64(tmp, tmp, 63);                 ->  tmp = 1

so tmp = 0

 tcg_gen_movi_i64(tmp2, 0);                      ->  tmp2 = 0
 gen_helper_check_overflow(cpu_env, tmp, tmp2);  -> not equal, overflow.

and tmp == tmp2, no overflow.

thanks
-- PMM



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-03 Thread Al Viro
On Fri, Jul 04, 2014 at 12:05:37AM +0100, Peter Maydell wrote:
 On 3 July 2014 23:47, Al Viro v...@zeniv.linux.org.uk wrote:
  How can that be correct?  Suppose a = b = 0.  We get
  tcg_gen_eqv_i64(tmp, va, vb);                   ->  tmp = -1
  tcg_gen_mov_i64(tmp2, va);                      ->  tmp2 = 0
  tcg_gen_add_i64(vc, va, vb);                    ->  c = 0
  tcg_gen_xor_i64(tmp2, tmp2, vc);                ->  tmp2 = 0
  tcg_gen_and_i64(tmp, tmp, tmp2);                ->  tmp = -1
 
 tmp2 here is 0, so the result of this AND is 0, not -1...

Doh.  Misread it as tcg_gen_add_i64, sorry.

Hmm...  So it's ((a ^ ~b) & (a ^ c)) < 0 as overflow condition, IOW
MSB(a) == MSB(b) && MSB(c) != MSB(a).  OK, that works; might deserve
a comment, though...
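
Spelled out in plain C, the condition reads as below. This is a host-side sketch with a made-up function name, done on unsigned values so that the addition itself is well-defined; it mirrors the eqv/xor/and/shift sequence quoted above but is not the TCG code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed 64-bit add overflow: a and b have the same sign bit (a ^ ~b has
   its MSB set) and the sum's sign bit differs from a's (a ^ c has its MSB
   set). */
static bool add64_overflows(uint64_t a, uint64_t b)
{
    uint64_t c = a + b;                 /* wraps modulo 2^64 */

    return (((a ^ ~b) & (a ^ c)) >> 63) != 0;
}
```

For a = b = 0 the first term is all ones and the second is zero, so the AND is zero and no overflow is reported, matching the correction above.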



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-03 Thread Al Viro
On Thu, Jul 03, 2014 at 01:19:19PM -0700, Richard Henderson wrote:

 I believe I have a tidy solution to these /v insns.  New patch set shortly.

OK, looks sane.  Next (trivial) bug: in translate_one()
        case 0xF800:
            /* WH64 */
            /* No-op */
            break;
should be followed by
        case 0xFC00:
            /* WH64EN */
            /* No-op */
            break;

As it is,
    asm __volatile( "lda    $0,%0\n\t"
                    "wh64en ($0)\n\t" :: "m"(r));
ends up sending SIGILL.

Another one is probably not worth bothering - PERR, CTPOP, CTLZ, UNPKBx and PKxB
don't accept literal argument.  For one thing, as(1) won't let you generate
those, so it would have to be explicit
.long 0x70001620
instead of
perr $0,0,$0
On DS10 it gives SIGILL; under qemu it succeeds.  Trivial to fix, anyway,
if we care about that (if (islit) goto invalid_opc; in 1C.030..1C.037).

Another interesting bit I _really_ don't want to touch right now is LDx_L/STx_C;
what we get there is closer to compare-and-swap than to what the real
hardware is doing.  OTOH, considering the constraints on what can go between
LDx_L and STx_C, I'm not sure whether it can lead to any real problems with
the current qemu behaviour...

Hell knows; could a long linear piece of code with LDL_L near the point where
it runs out of space in block end up with QEMU switching to different cpu
before we reach the matching STL_C?  If so, there might be problems; on actual
hardware

CPU1: LDL_L reads 0
CPU2: store 1
...
CPU2: store 0
CPU1: STL_C
would have STL_C fail.  qemu implementation of those suckers will succeed.
I'm not sure if anything in the kernel is sensitive to that, but analysis
won't be fun...
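
The difference is easy to reproduce on the host with a compare-and-swap standing in for the emulated pair. The sketch below replays the interleaving above in a single thread using C11 atomics; it illustrates the general ABA effect and is not a model of QEMU's actual LDx_L/STx_C implementation. The function name and the stored value 42 are arbitrary.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Single-threaded replay of the interleaving from the thread, with a
   compare-and-swap standing in for the emulated LDL_L/STL_C pair. */
static bool replay_aba(void)
{
    atomic_int word = 0;

    int seen = atomic_load(&word);      /* "CPU1: LDL_L reads 0" */
    atomic_store(&word, 1);             /* "CPU2: store 1"       */
    atomic_store(&word, 0);             /* "CPU2: store 0"       */

    /* "CPU1: STL_C": a value comparison cannot tell that the word was
       written in between, so the conditional store goes through. */
    return atomic_compare_exchange_strong(&word, &seen, 42);
}
```

On real hardware the intervening stores would clear the lock flag and the store-conditional would fail; here the exchange succeeds because only the value is compared.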



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-03 Thread Richard Henderson
On 07/03/2014 05:50 PM, Al Viro wrote:
 OK, looks sane.  Next (trivial) bug: in translate_one()
 case 0xF800:
 /* WH64 */
 /* No-op */
 break;
 should be followed by
 case 0xFC00:
 /* WH64EN */
 /* No-op */
 break;

Huh.  I don't have any documentation for EV7.  Added.

 Another one is probably not worth bothering - PERR, CTPOP, CTLZ, UNPKBx and PKxB
 don't accept literal argument.  For one thing, as(1) won't let you generate
 those, so it would have to be explicit
   .long 0x70001620
 instead of
   perr $0,0,$0
 On DS10 it gives SIGILL; under qemu it succeeds.  Trivial to fix, anyway,
 if we care about that (if (islit) goto invalid_opc; in 1C.030..1C.037).

Is it just 030..037, or everything under opcode 1C?

Sadly, V4 of the handbook doesn't mention *anything* about not actually
allowing literals for any of these insns.

For now, I've updated insns in the range you describe, because it's easy.

 CPU1: LDL_L reads 0
 CPU2: store 1
 ...
 CPU2: store 0
 CPU1: STL_C
 would have STL_C fail.  qemu implementation of those suckers will succeed.
 I'm not sure if anything in the kernel is sensitive to that, but analysis
 won't be fun...

I'm aware that lock/cond can be used in ways that we don't support, including
STL_C to a different address on the same cacheline as the LDL_L.

I'm also aware that if we actually did implement SMP, we would be vulnerable to
the ABA error you describe above.

That said, it's all moot until the PALcode grows actual SMP support for booting
and signalling secondary cpus.  Given that qemu implements SMP by multiplexing
the guest cpus on a single host thread, and so we can't actually speed up the
guest by implementing SMP, it's not seemed like a priority.

The next thing I'd work on given oodles of time is to add block device support
to the PALcode, so that the console could boot from disk like a real machine.
In theory, most of this code can be stolen from SeaBIOS.


r~



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-02 Thread Al Viro
On Wed, Jul 02, 2014 at 06:50:27AM +0100, Al Viro wrote:

 AFAICS, it leaves two possibilities - EV45 (AS200) vs. EV6 (DS10) and EV67
 (qemu) _or_ some change in the kernel.  I'll build 3.x kernel for DS10 and
 post the results; shouldn't take long...

Actually, it's simpler - note that on *all* systems we end up with FPCR.INE
set.  So this
swcr_update_status(unsigned long swcr, unsigned long fpcr)
{
	/* EV6 implements most of the bits in hardware.  Collect
	   the acrued exception bits from the real fpcr.  */
	if (implver() == IMPLVER_EV6) {
		swcr &= ~IEEE_STATUS_MASK;
		swcr |= (fpcr >> 35) & IEEE_STATUS_MASK;
	}
	return swcr;
}
ends up with FE_INEXACT set on everything that has implver() return 2.
Which is what EV6 and EV67 do and which is what qemu does by default.

So no, it's not a kernel version difference; it's all kernel versions ignoring
FPCR.INE when it calculates ieee_state on EV45 and using it on EV6 and friends.

If we don't want FE_INEXACT seen by fetestexcept() after rounding 4.5, we'd
better not use FPCR.INE - *all* variants of actual hardware (at least from
21064A to 21264) set that sucker, and 4.7 in Architecture Reference Manual
very clearly requires such behaviour for any subset that isn't completely
without floating point support.
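
To make the EV45 vs. EV6 difference concrete, here is a host-runnable model
of that function with implver() turned into a parameter.  The bit positions
are what I believe the kernel's asm/fpu.h uses (status bits 17..22, FPCR.INE
at bit 56); treat them as assumptions and check the header.  The _model
suffix marks this as a sketch, not the kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, modelled on asm/fpu.h; verify before relying on them. */
#define IEEE_STATUS_INV  (1ULL << 17)
#define IEEE_STATUS_DZE  (1ULL << 18)
#define IEEE_STATUS_OVF  (1ULL << 19)
#define IEEE_STATUS_UNF  (1ULL << 20)
#define IEEE_STATUS_INE  (1ULL << 21)
#define IEEE_STATUS_DNO  (1ULL << 22)
#define IEEE_STATUS_MASK (IEEE_STATUS_INV | IEEE_STATUS_DZE | \
                          IEEE_STATUS_OVF | IEEE_STATUS_UNF | \
                          IEEE_STATUS_INE | IEEE_STATUS_DNO)
#define FPCR_INE         (1ULL << 56)

enum { IMPLVER_EV4 = 0, IMPLVER_EV5 = 1, IMPLVER_EV6 = 2 };

/* Same logic as swcr_update_status(), with implver passed in explicitly. */
static uint64_t swcr_update_status_model(uint64_t swcr, uint64_t fpcr,
                                         int implver)
{
    assert(implver >= IMPLVER_EV4 && implver <= IMPLVER_EV6);
    if (implver == IMPLVER_EV6) {
        /* Drop the software-maintained status, take it from FPCR instead. */
        swcr &= ~IEEE_STATUS_MASK;
        swcr |= (fpcr >> 35) & IEEE_STATUS_MASK;
    }
    return swcr;
}
```

With FPCR.INE set and an empty software status word, the EV6 case returns
IEEE_STATUS_INE while the EV4/EV5 case returns 0.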



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-02 Thread Richard Henderson
On 07/01/2014 09:05 PM, Al Viro wrote:
 Which glibc version it is?  I don't see such failures with your
 axp/axp-next (head at 6b38f4e7f); could you post the details on your
 reproducer?  I've tried to guess the likely version by glibc.git, but
 I don't see nearbyint tests with such argument in any version there,
 so I couldn't find it that way...

Glibc mainline, then look at math/test-double.out.

I'm interested in the results of the following test.


r~
#include <stdio.h>
#include <fenv.h>
#include <math.h>
#include <float.h>

static void div_su(void)
{
  asm("divt/su %0,%1,$f0; trapb" : : "f"(1.0), "f"(3.0) : "$f0");
}

static void div_sui(void)
{
  asm("divt/su %0,%1,$f0; trapb" : : "f"(1.0), "f"(3.0) : "$f0");
}

static void mul_su(void)
{
  asm("mult/su %0,%0,$f0; trapb" : : "f"(DBL_MIN) : "$f0");
}

static void mul_sui(void)
{
  asm("mult/sui %0,%0,$f0; trapb" : : "f"(DBL_MIN) : "$f0");
}

static void cvttq_45(void)
{
  asm("cvttq/c %0,$f0; trapb" : : "f"(4.5) : "$f0");
}

static void cvttq_sv_45(void)
{
  asm("cvttq/svc %0,$f0; trapb" : : "f"(4.5) : "$f0");
}

static void cvttq_svi_45(void)
{
  asm("cvttq/svic %0,$f0; trapb" : : "f"(4.5) : "$f0");
}

static void cvttq_max(void)
{
  asm("cvttq/c %0,$f0; trapb" : : "f"(DBL_MAX) : "$f0");
}

static void cvttq_sv_max(void)
{
  asm("cvttq/svc %0,$f0; trapb" : : "f"(DBL_MAX) : "$f0");
}

static void cvttq_svi_max(void)
{
  asm("cvttq/svic %0,$f0; trapb" : : "f"(DBL_MAX) : "$f0");
}

static struct test {
  void (*fn)(void);
  const char *name;
} const tests[] = {
  { div_su,		"/su  : 1/3" },
  { div_sui,		"/sui : 1/3" },
  { mul_su,		"/su  : min*min" },
  { mul_sui,		"/sui : min*min" },
  { cvttq_45,		"/    : (long)4.5" },
  { cvttq_sv_45,	"/sv  : (long)4.5" },
  { cvttq_svi_45,	"/svi : (long)4.5" },
  { cvttq_max,		"/    : (long)max" },
  { cvttq_sv_max,	"/sv  : (long)max" },
  { cvttq_svi_max,	"/svi : (long)max" },
};

int main()
{
  char result[8];
  int i, e;

  for (i = 0; i < sizeof(tests)/sizeof(struct test); ++i)
    {
      feclearexcept(FE_ALL_EXCEPT);
      tests[i].fn();

      e = fetestexcept(FE_ALL_EXCEPT);
      result[0] = e & FE_DIVBYZERO ? 'd' : '-';
      result[1] = e & FE_INEXACT ? 'i' : '-';
      result[2] = e & FE_INVALID ? 'I' : '-';
      result[3] = e & FE_OVERFLOW ? 'o' : '-';
      result[4] = e & FE_UNDERFLOW ? 'u' : '-';
      result[5] = '\0';
      printf("%-20s %s\n", tests[i].name, result);
    }

  return 0;
}


Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-02 Thread Al Viro
 I'm interested in the results of the following test.


DS10:
/su  : 1/3           -i---
/sui : 1/3           -i---
/su  : min*min       -i--u
/sui : min*min       -i--u
/    : (long)4.5     -i---
/sv  : (long)4.5     -i---
/svi : (long)4.5     -i---
/    : (long)max     -i---
/sv  : (long)max     -iI--
/svi : (long)max     -iI--

AS200:
/su  : 1/3           -----
/sui : 1/3           -----
/su  : min*min       -i--u
/sui : min*min       -i--u
/    : (long)4.5     -----
/sv  : (long)4.5     -----
/svi : (long)4.5     -----
/    : (long)max     -----
/sv  : (long)max     --I--
/svi : (long)max     --I--

qemu:
/su  : 1/3           -i---
/sui : 1/3           -i---
/su  : min*min       -i--u
/sui : min*min       -i--u
/    : (long)4.5     -i---
/sv  : (long)4.5     -i---
/svi : (long)4.5     -i---
/    : (long)max     -i---
/sv  : (long)max     -iI--
/svi : (long)max     -iI--

IOW, same as EV6.  The difference is due to the kernel trusting FPCR.INE
as source for FE_INEXACT when it sees implver() returning 2.  See a bit
upthread for analysis...



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-02 Thread Richard Henderson
On 07/01/2014 11:17 PM, Al Viro wrote:
 If we don't want FE_INEXACT seen by fetestexcept() after rounding 4.5, we'd
 better not use FPCR.INE - *all* variants of actual hardware (at least from
 21064A to 21264) set that sucker, and 4.7 in Architecture Reference Manual
 very clearly requires such behaviour for any subset that isn't completely
 without floating point support.

Um, where do you see that?  I see:

# 4.7.6.4 IEEE-Compliant Arithmetic Without Inexact Exception
# This model is similar to the model in Section 4.7.6.3, except this
# model does not signal inexact results either by the inexact status
# flag or by trapping. [...] This model is implemented by using IEEE
# floating-point instructions with the /SU or /SV trap qualifiers.

The important words to me being "does not signal" and "inexact status flag".

Thus in sysdeps/alpha/fpu/s_nearbyint.c I explicitly use cvttq/svd and not
cvttq/svid.  By my reading that means no inexact shall be raised.


r~



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-02 Thread Al Viro
On Wed, Jul 02, 2014 at 08:26:53AM -0700, Richard Henderson wrote:
 On 07/01/2014 11:17 PM, Al Viro wrote:
  If we don't want FE_INEXACT seen by fetestexcept() after rounding 4.5, we'd
  better not use FPCR.INE - *all* variants of actual hardware (at least from
  21064A to 21264) set that sucker, and 4.7 in Architecture Reference Manual
  very clearly requires such behaviour for any subset that isn't completely
  without floating point support.
 
 Um, where do you see that?  I see:
 
 # 4.7.6.4 IEEE-Compliant Arithmetic Without Inexact Exception
 # This model is similar to the model in Section 4.7.6.3, except this
 # model does not signal inexact results either by the inexact status
 # flag or by trapping. [...] This model is implemented by using IEEE
 # floating-point instructions with the /SU or /SV trap qualifiers.
 
 The important words to me being "does not signal" and "inexact status flag".
 
 Thus in sysdeps/alpha/fpu/s_nearbyint.c I explicitly use cvttq/svd and not
 cvttq/svid.  By my reading that means no inexact shall be raised.

What does that have to do with exceptions?  cvttq/svd is not going to raise
one; it *does* set that bit in FPCR, though.  What happens afterwards is
that fetestexcept() calls osf_getsysinfo(2) with GSI_IEEE_FP_CONTROL for op.
Which does
	w = current_thread_info()->ieee_state & IEEE_SW_MASK;
	w = swcr_update_status(w, rdfpcr());
and hands the value of w to caller.  Now, look at swcr_update_status()
(in arch/alpha/include/uapi/asm/fpu.h these days) and note that on 21264
it will throw away the status bits of ->ieee_state and use 6 bits from
FPCR instead.

Note, BTW, that appendix B (IEEE conformance) claims (in B.1) conversions as
hardware-implemented, with "Software routines support remainder, round to
integer in floating-point format, and convert binary to/from decimal" right
next to it.



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-01 Thread Richard Henderson
On 06/30/2014 09:34 PM, Al Viro wrote:
 VAX operations are serious mess, but I'm not sure if we have them actually
 used anywhere in Linux kernel or userland.  Always possible, of course, but...

As far as I know, vax insns aren't used anywhere.  If I were doing this port
from scratch I'd leave them totally stubbed out.

Let's not spend any kind of effort on this at all until other more important
things are improved.


r~



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-01 Thread Richard Henderson
On 06/30/2014 01:56 PM, Al Viro wrote:
 On Mon, Jun 30, 2014 at 11:39:43AM -0700, Richard Henderson wrote:
 
 Looks good.

 I've split it up into a couple of smaller patches, made some stylistic tweaks
 and pushed it to

   git://github.com/rth7680/qemu.git axp-next

 I'm starting to do some testing now, but a glance through would be helpful.
 Especially to see if I didn't make some silly mistake in the process.

Hmm.  I've just gotten through glibc testing and there are quite a few failures
of the form

Failure: nearbyint (4.5): Exception Inexact set

in math/test-double.out.

Any chance you can run the glibc math tests against real hardware and see if
these pass?  I have a feeling that qemu is now signaling inexact when the
hardware doesn't for /SU (but not /SUI) instructions.



r~



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-01 Thread Al Viro
On Tue, Jul 01, 2014 at 10:03:06AM -0700, Richard Henderson wrote:
 On 06/30/2014 01:56 PM, Al Viro wrote:
  On Mon, Jun 30, 2014 at 11:39:43AM -0700, Richard Henderson wrote:
  
  Looks good.
 
  I've split it up into a couple of smaller patches, made some stylistic tweaks
  and pushed it to
 
git://github.com/rth7680/qemu.git axp-next
 
  I'm starting to do some testing now, but a glance through would be helpful.
  Especially to see if I didn't make some silly mistake in the process.
 
 Hmm.  I've just gotten through glibc testing and there are quite a few 
 failures
 of the form
 
 Failure: nearbyint (4.5): Exception Inexact set
 
 in math/test-double.out.
 
 Any chance you can run the glibc math tests against real hardware and see if
 these pass?  I have a feeling that qemu is now signaling inexact when the
 hardware doesn't for /SU (but not /SUI) instructions.

Which glibc version?  Better yet, could you throw preprocessed source
my way?  UP1000 box is not in a good shape and I'd rather avoid trying to run
full glibc builds on it ;-/



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-01 Thread Peter Maydell
On 1 July 2014 18:50, Al Viro v...@zeniv.linux.org.uk wrote:
 Which glibc version?  Better yet, could you throw preprocessed source
 my way?  UP1000 box is not in a good shape and I'd rather avoid trying to run
 full glibc builds on it ;-/

Would a 164LX be a useful (ie non-duplicate) extra resource
for testing this stuff? That has a 21164 (EV5) in it. I haven't
tried to boot it for some years, but I can have a try at
resurrecting it if it would be helpful...

thanks
-- PMM



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-01 Thread Richard Henderson
On 07/01/2014 11:23 AM, Peter Maydell wrote:
 On 1 July 2014 18:50, Al Viro v...@zeniv.linux.org.uk wrote:
 Which glibc version?  Better yet, could you throw preprocessed source
 my way?  UP1000 box is not in a good shape and I'd rather avoid trying to run
 full glibc builds on it ;-/
 
 Would a 164LX be a useful (ie non-duplicate) extra resource
 for testing this stuff? That has a 21164 (EV5) in it. I haven't
 tried to boot it for some years, but I can have a try at
 resurrecting it if it would be helpful...

An ev5 would be a good fallback.


r~




Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-01 Thread Peter Maydell
On 1 July 2014 19:30, Richard Henderson r...@twiddle.net wrote:
 On 07/01/2014 11:23 AM, Peter Maydell wrote:
 Would a 164LX be a useful (ie non-duplicate) extra resource
 for testing this stuff? That has a 21164 (EV5) in it. I haven't
 tried to boot it for some years, but I can have a try at
 resurrecting it if it would be helpful...

 An ev5 would be a good fallback.

Well, it boots. /proc/cpuinfo claims it's an EV56 variation 7
revision 0. Currently running Debian etch. Let me know if
you have anything you want me to run on it...

thanks
-- PMM



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-01 Thread Al Viro
On Tue, Jul 01, 2014 at 11:30:19AM -0700, Richard Henderson wrote:
 On 07/01/2014 11:23 AM, Peter Maydell wrote:
  On 1 July 2014 18:50, Al Viro v...@zeniv.linux.org.uk wrote:
  Which glibc version?  Better yet, could you throw preprocessed source
  my way?  UP1000 box is not in a good shape and I'd rather avoid trying to 
  run
  full glibc builds on it ;-/
  
  Would a 164LX be a useful (ie non-duplicate) extra resource
  for testing this stuff? That has a 21164 (EV5) in it. I haven't
  tried to boot it for some years, but I can have a try at
  resurrecting it if it would be helpful...
 
 An ev5 would be a good fallback.

OK, DS10 resurrected and so far seems to be stable (I'll know by tomorrow;
there's a possibility that chipset heatsink is dodgy, but so far it seems
to be doing OK).  That gives us a EV6 box.

Which glibc version it is?  I don't see such failures with your
axp/axp-next (head at 6b38f4e7f); could you post the details on your
reproducer?  I've tried to guess the likely version by glibc.git, but
I don't see nearbyint tests with such argument in any version there,
so I couldn't find it that way...



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-01 Thread Al Viro
On Wed, Jul 02, 2014 at 05:05:08AM +0100, Al Viro wrote:
 OK, DS10 resurrected and so far seems to be stable (I'll know by tomorrow;
 there's a possibility that chipset heatsink is dodgy, but so far it seems
 to be doing OK).  That gives us a EV6 box.
 
 Which glibc version it is?  I don't see such failures with your
 axp/axp-next (head at 6b38f4e7f); could you post the details on your
 reproducer?  I've tried to guess the likely version by glibc.git, but
 I don't see nearbyint tests with such argument in any version there,
 so I couldn't find it that way...

FWIW,

; cat >a.c <<'EOF'
#include <stdio.h>
#include <fenv.h>

volatile long x;
void __attribute__((noinline)) f(double v)
{
	x = v;
}

main()
{
	unsigned long tmp, ret;
	static char *names[] = {"IOV", "INE", "UNF", "OVF", "DZE", "INV"};
	int i;

	feclearexcept(FE_ALL_EXCEPT);
	f(4.5);

	__asm__ __volatile__ (
		"stt $f0,%0\n\t"
		"trapb\n\t"
		"mf_fpcr $f0\n\t"
		"trapb\n\t"
		"stt $f0,%1\n\t"
		"ldt $f0,%0"
		: "=m"(tmp), "=m"(ret));
	for (i = 0; i < 6; i++)
		printf(" %s", (ret >> (57-i)) & 1 ? names[i] : "");
	printf(" %x ", fetestexcept(FE_ALL_EXCEPT));
	printf("FE_INEXACT = %x\n", FE_INEXACT);
}
EOF
; gcc -lm a.c
; ./a.out
  INE     200000 FE_INEXACT = 200000
; uname -a
Linux wynton 2.6.22-rc7 #1 Thu Aug 30 02:03:17 EDT 2007 alpha GNU/Linux

That's on freshly resurrected DS10, just brought to the last debian/alpha
(i.e. lenny).  Kernel had been locally built back before the box has died.

On miles (3.3.6+, AS200) result is different:
  INE     0 FE_INEXACT = 200000

On qemu (with debian kernel from lenny - 2.6.26) it's the same as on DS10:
  INE     200000 FE_INEXACT = 200000

It _might_ be the difference between 3.3 and 2.6.20-something, but I doubt
that.  It's definitely not a matter of difference in libc versions - AS200
box has 2.13-38, but static binary built on DS10 (with its 2.7-18) copied on
AS200 behaves there as locally built one (i.e. 0 from fetestexcept(), as
opposed to FE_INEXACT the same static binary produces on DS10 and under
qemu).
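
For anyone without an alpha at hand: the FPCR decoding done by the test's
loop can be exercised on any host.  A small sketch, assuming the same layout
the test relies on (IOV at bit 57 down to INV at bit 52); the function name
and the buffer handling are mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes the names of the status bits set in fpcr into buf, separated by
   single spaces.  Returns the resulting string length. */
static size_t decode_fpcr_status(uint64_t fpcr, char *buf, size_t len)
{
    static const char *const names[] = {
        "IOV", "INE", "UNF", "OVF", "DZE", "INV"
    };
    size_t used = 0;
    int i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < 6; i++) {
        if ((fpcr >> (57 - i)) & 1) {
            size_t n = strlen(names[i]);

            if (used + n + 2 > len)     /* separator + name + NUL */
                break;
            if (used)
                buf[used++] = ' ';
            memcpy(buf + used, names[i], n);
            used += n;
            buf[used] = '\0';
        }
    }
    return used;
}
```

Feeding it 1ULL << 56 gives "INE", which is the bit all three systems report
after the conversion of 4.5.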

AFAICS, it leaves two possibilities - EV45 (AS200) vs. EV6 (DS10) and EV67
(qemu) _or_ some change in the kernel.  I'll build 3.x kernel for DS10 and
post the results; shouldn't take long...



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-30 Thread Richard Henderson
On 06/25/2014 10:55 PM, Al Viro wrote:
 On Wed, Jun 25, 2014 at 08:01:17AM +0100, Al Viro wrote:
 On Tue, Jun 24, 2014 at 02:32:46PM -0700, Richard Henderson wrote:
 On 06/24/2014 02:24 PM, Al Viro wrote:
 Al, off to figure out the black magic TCG is using to generate calls...

 If you've a helper

 DEF_HELPER_1(halt, void, i64)

 then

   gen_helper_halt(...)

 will generate the tcg ops that result in the call.

 Another fun issue:

 CVTTQ is both underreporting the overflow *AND* reports the wrong kind - FOV
 instead of IOV.

  * it misses reporting overflows for case when it *knows* that
overflow will happen - the need to shift up by more than 63 bits.
Trivially fixed, of course.  There overflow cases leave wrong
result as well - should be 0.
  * it also misses reporting overflows for case when value is in
ranges 2^63..2^64-1 and -2^64+1..-2^63-1.  And yes, it's
asymmetric - 2^63 is an overflow, -2^63 isn't.
  * overflow is reported by float_raise(float_flag_overflow, FP_STATUS).
Wrong flag - it should be IOV, not FOV.  And it should be set
in FPCR regardless of the trap modifier (IOV, this VI thing is
wrong - we should deal with that only when we generate a trap).
  * All overflow cases should raise INE as well.

 Could we steal bit 1 in float_exception_flags for IOV?  It is (currently?)
 unused -
 enum {
 float_flag_invalid   =  1,
 float_flag_divbyzero =  4,
 float_flag_overflow  =  8,
 float_flag_underflow = 16,
 float_flag_inexact   = 32,
 float_flag_input_denormal = 64,
 float_flag_output_denormal = 128
 };

 That would allow to deal with that crap nicely - we could have it raise
 the new flag, then have helper_fp_exc_raise_... for default trap mode
 mask it out (and yes, we need to set FPCR flags in default mode, as well
 as /U and /V - confirmed by direct experiment *and* by TFM).
 
 OK, I've managed to resurrect UP1000 box (FSVO resurrect - the southbridge
 DMA controller has always been buggered, with intermittent noise on one of
 the data lines; fans in CPU module are FUBAR as well - 17 and 20 RPM resp.,
 so I don't risk keeping it running for long, etc.)
 
 Still, that allows to test EV67 and I hope to resurrect a DS10 box as well,
 which will allow for saner testing environment.
 
 Current delta follows, fixing gcc and libc testcases *and* AFAICS getting
 CVTTQ handling in line with what actual EV67 is doing.  It's a dirty hack
 wrt float_raise() - relies on bit 1 never being raised by softfpu.c.  I'll
 look into separating that bit, but it'll probably have non-zero costs ;-/
 We need two flags - has IOV been raised during this insn (in this patch
 it's bit 1 of fp_status.float_exception_flags, cleaned along with those)
 and something to keep FPCR.IOV in (in this patch - bit 1 of fpcr_exc_status).
 Sure, we can add another uint8_t or two in struct CPUAlphaState, but that'll
 mean extra PITA in code and extra memory accesses...
 
 Review would be welcome.

Looks good.

I've split it up into a couple of smaller patches, made some stylistic tweaks
and pushed it to

  git://github.com/rth7680/qemu.git axp-next

I'm starting to do some testing now, but a glance through would be helpful.
Especially to see if I didn't make some silly mistake in the process.


r~



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-30 Thread Al Viro
On Mon, Jun 30, 2014 at 11:39:43AM -0700, Richard Henderson wrote:

 Looks good.
 
 I've split it up into a couple of smaller patches, made some stylistic tweaks
 and pushed it to
 
   git://github.com/rth7680/qemu.git axp-next
 
 I'm starting to do some testing now, but a glance through would be helpful.
 Especially to see if I didn't make some silly mistake in the process.

The only problem I see at a glance is that CVTTQ should raise IOV|INE in
ranges 2^63..2^64-1 and -2^64+1..-2^63-1 as well.  That's what this
	|| ((int64_t)(ret-sign) < 0)
thing there was about and yes, it does match the behaviour of actual hardware
(verified both on EV45 and EV67).

FWIW, it might be better to do what float64_to_int64_round_to_zero() is doing -
i.e.
	if (shift >= 0) {
		if (shift < 64)
			ret = frac << shift;
		if (shift < 11 || a == LIT64(0xC3E0000000000000))
			exc = 0;
	}
since frac is between 1ULL<<52 and (1ULL<<53)-1, i.e. shift greater than 11
is guaranteed to overflow, shift less than 11 is guaranteed not to and shift
exactly 11 won't overflow only in one case - frac == 1ULL<<52, sign = 1 (i.e.
when we have -2^63 there).  BTW, shift == 63 is interesting - we certainly
overflow, but we want the result to be 0 or 2^63 depending on the least
significant bit of mantissa, not always 0.  IOW, 0x4720000000000000 should
yield IOV|INE, with result being 0 and 0x4720000000000001 - IOV|INE and
result 0x8000000000000000.  Again, verified on actual hardware; the last
patch I posted had been incorrect in the last case (both cases yield 0 with it,
same as in mainline qemu).
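
A host-runnable sketch of that rule, covering only the shift >= 0 branch
(finite inputs with magnitude at least 2^52).  It is not inline_cvttq()
itself: the function name and the EXC_* values are invented here, and the
full 64-bit patterns are spelled out (0x4720000000000000 is 2^115,
0xC3E0000000000000 is -2^63).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, for this sketch only. */
enum { EXC_IOV = 1, EXC_INE = 2 };

/* a is the raw IEEE double bit pattern.  The result is the low 64 bits of
   the truncated value; *exc is set to IOV|INE on integer overflow. */
static uint64_t cvttq_large(uint64_t a, int *exc)
{
    uint64_t sign = a >> 63;
    int exp = (int)((a >> 52) & 0x7ff);
    uint64_t frac = (a & 0xfffffffffffffULL) | (1ULL << 52);
    int shift = exp - 1023 - 52;
    uint64_t ret = 0;

    assert(exp != 0x7ff && shift >= 0);   /* no NaN/Inf, no rounding path */

    *exc = EXC_IOV | EXC_INE;
    if (shift < 64)
        ret = frac << shift;              /* high bits fall off: mod 2^64 */
    if (shift < 11 || a == 0xC3E0000000000000ULL)
        *exc = 0;                         /* fits, or exactly -2^63 */
    return sign ? -ret : ret;
}
```

With shift == 63 only the lowest mantissa bit survives the shift, which is
where the 0 vs. 2^63 results come from.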

Incremental on top of your branch would be

diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -722,12 +722,10 @@ static inline uint64_t inline_cvttq(CPUAlphaState *env, uint64_t a,
         /* In this case the number is so large that we must shift
            the fraction left.  There is no rounding to do.  */
         exc = float_flag_int_overflow | float_flag_inexact;
-        if (shift < 63) {
-            ret = frac << shift;
-            if ((ret >> shift) == frac) {
-                exc = 0;
-            }
-        }
+        if (shift < 64)
+            ret = frac << shift;
+        if (shift < 11 || a == LIT64(0xC3E0000000000000))
+            exc = 0;
     } else {
         uint64_t round;
 



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-30 Thread Al Viro
On Mon, Jun 30, 2014 at 09:56:35PM +0100, Al Viro wrote:
 FWIW, it might be better to do what float64_to_int64_round_to_zero() is
 doing - i.e.
 	if (shift >= 0) {
 		if (shift < 64)
 			ret = frac << shift;
 		if (shift < 11 || a == LIT64(0xC3E0000000000000))
 			exc = 0;
 	}
 since frac is between 1ULL<<52 and (1ULL<<53)-1, i.e. shift greater than 11
 is guaranteed to overflow, shift less than 11 is guaranteed not to and shift
 exactly 11 won't overflow only in one case - frac == 1ULL<<52, sign = 1 (i.e.
 when we have -2^63 there).  BTW, shift == 63 is interesting - we certainly
 overflow, but we want the result to be 0 or 2^63 depending on the least
 significant bit of mantissa, not always 0.  IOW, 0x4720000000000000 should
 yield IOV|INE, with result being 0 and 0x4720000000000001 - IOV|INE and
 result 0x8000000000000000.  Again, verified on actual hardware; the last
 patch I posted had been incorrect in the last case (both cases yield 0 with 
 it,
 same as in mainline qemu).

While we are at it, CVTTQ yields INV on +-infinity, just as it does for NaNs.
IOW, in inline_cvttq()
exc = (frac ? float_flag_invalid
   : float_flag_int_overflow | float_flag_inexact);
should be simply
exc = float_flag_invalid;

VAX operations are serious mess, but I'm not sure if we have them actually
used anywhere in Linux kernel or userland.  Always possible, of course, but...



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-30 Thread Al Viro
On Tue, Jul 01, 2014 at 05:34:45AM +0100, Al Viro wrote:

 VAX operations are serious mess, but I'm not sure if we have them actually
 used anywhere in Linux kernel or userland.  Always possible, of course, but...

Grr...  Truncated mail, sorry.  Missing part:

_If_ we decide that we want CVTGQ working correctly, we'll have the following
pile of fun:
* it needs non-saturating overflow handling, same as cvttq
* it needs different rounding for CVTGQ and CVTGQ/C
* CVTGQ/S needs EXC_M_SWC in the word fed to trap in INV case (i.e.
when we see dirty zero or reserved).  I think the right way to do it is to
have it use float_raise() and finish with something similar to
gen_fp_exc_raise(), except that...
* VAX insns need a slightly different trap handling - fpcr_exc_mask
is IEEE-only.
* g_to_float64() isn't quite right here - we want e.g. 2^-1023 to
result in 0 *and* we want inexact raised.  As it is, we'll end up with
exact 0.



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-25 Thread Al Viro
On Tue, Jun 24, 2014 at 02:32:46PM -0700, Richard Henderson wrote:
 On 06/24/2014 02:24 PM, Al Viro wrote:
  Al, off to figure out the black magic TCG is using to generate calls...
 
 If you've a helper
 
 DEF_HELPER_1(halt, void, i64)
 
 then
 
   gen_helper_halt(...)
 
 will generate the tcg ops that result in the call.

Another fun issue:

CVTTQ is both underreporting the overflow *AND* reports the wrong kind - FOV
instead of IOV.

* it misses reporting overflows for case when it *knows* that
  overflow will happen - the need to shift up by more than 63 bits.
  Trivially fixed, of course.  These overflow cases leave wrong
  result as well - should be 0.
* it also misses reporting overflows for case when value is in
  ranges 2^63..2^64-1 and -2^64+1..-2^63-1.  And yes, it's
  asymmetric - 2^63 is an overflow, -2^63 isn't.
* overflow is reported by float_raise(float_flag_overflow, FP_STATUS).
  Wrong flag - it should be IOV, not FOV.  And it should be set
  in FPCR regardless of the trap modifier (IOW, this VI thing is
  wrong - we should deal with that only when we generate a trap).
* All overflow cases should raise INE as well.

Could we steal bit 1 in float_exception_flags for IOV?  It is (currently?)
unused -
enum {
    float_flag_invalid   =  1,
    float_flag_divbyzero =  4,
    float_flag_overflow  =  8,
    float_flag_underflow = 16,
    float_flag_inexact   = 32,
    float_flag_input_denormal = 64,
    float_flag_output_denormal = 128
};

That would allow to deal with that crap nicely - we could have it raise
the new flag, then have helper_fp_exc_raise_... for default trap mode
mask it out (and yes, we need to set FPCR flags in default mode, as well
as /U and /V - confirmed by direct experiment *and* by TFM).

If we can't... well, we could put that flag separately, but it would be
more unpleasant.  Folks?
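
Roughly what I have in mind, as a standalone sketch rather than a patch.  It
assumes the free bit (value 2) is given the name float_flag_int_overflow;
struct fp_state, finish_insn() and DEFAULT_TRAP_MASK are invented for the
illustration and do not exist in the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    float_flag_invalid      =  1,
    float_flag_int_overflow =  2,   /* assumption: the currently unused bit */
    float_flag_divbyzero    =  4,
    float_flag_overflow     =  8,
    float_flag_underflow    = 16,
    float_flag_inexact      = 32
};

/* Hypothetical state: flags raised by the current insn, plus the sticky
   status that stands in for the FPCR exception bits. */
struct fp_state {
    uint8_t insn_flags;
    uint8_t sticky_flags;
};

/* Default trap mode (no /U, /V or /I): only these may cause a trap. */
#define DEFAULT_TRAP_MASK \
    (float_flag_invalid | float_flag_divbyzero | float_flag_overflow)

/* Folds the insn's flags into the sticky status unconditionally and
   returns the subset the trap path gets to see for this trap mode. */
static uint8_t finish_insn(struct fp_state *s, uint8_t trap_mask)
{
    uint8_t raised;

    assert(s != NULL);
    raised = s->insn_flags;
    s->insn_flags = 0;
    s->sticky_flags |= raised;      /* status is set in every trap mode */
    return raised & trap_mask;      /* default mode masks IOV/INE out */
}
```

An overflowing CVTTQ in default mode would then leave IOV and INE in the
sticky status without handing anything to the trap code.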



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-25 Thread Peter Maydell
On 25 June 2014 08:01, Al Viro v...@zeniv.linux.org.uk wrote:
 Could we steal bit 1 in float_exception_flags for IOV?  It is (currently?)
 unused -
 enum {
 float_flag_invalid   =  1,
 float_flag_divbyzero =  4,
 float_flag_overflow  =  8,
 float_flag_underflow = 16,
 float_flag_inexact   = 32,
 float_flag_input_denormal = 64,
 float_flag_output_denormal = 128
 };

 That would allow to deal with that crap nicely - we could have it raise
 the new flag, then have helper_fp_exc_raise_... for default trap mode
 mask it out (and yes, we need to set FPCR flags in default mode, as well
 as /U and /V - confirmed by direct experiment *and* by TFM).

 If we can't... well, we could put that flag separately, but it would be
 more unpleasant.  Folks?

I think it's OK to put extra float_flags in, provided you can define
their semantics in terms that make sense for more than one
architecture (even if only one arch actually happens to need them).
The input_denormal/output_denormal flags only get used for ARM,
for instance. However if you wanted to split overflow from integer
overflow you'd need to fix up all the other targets which expect
them to generate just one exception flag...

(Note that any patch touching softfloat files needs to come with
a statement that you're happy to license it under either the
softfloat-2a or softfloat-2b licenses, because we're currently
midway through the tedious process of trying to relicense it.)

thanks
-- PMM



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-25 Thread Al Viro
On Wed, Jun 25, 2014 at 10:27:11AM +0100, Peter Maydell wrote:

 I think it's OK to put extra float_flags in, provided you can define
 their semantics in terms that make sense for more than one
 architecture (even if only one arch actually happens to need them).
 The input_denormal/output_denormal flags only get used for ARM,
 for instance. However if you wanted to split overflow from integer
 overflow you'd need to fix up all the other targets which expect
 them to generate just one exception flag...

Hmm...  On alpha it's generated only by the following: CVTTQ, CVTGQ,
CVTQL.  I.e. conversions to integer formats that can be held in FPU
registers (double -> s64, VAX double -> s64 and s64 -> s32).  Does
softfloat even have anything similar?  As it is, it's all in alpha-specific
code; double -> s64 might have a chance to be generic (semantics:
	* denorms -> 0, raise inexact, provided that they survived to
that point and hadn't buggered off with invalid
	* exact integers in range -2^63 .. 2^63-1 -> equivalent 64bit
integer
	* values outside of that range (all with zero fractional part,
since the weight of LSB of significand will be considerably greater than 1
by that point) -> lower 64 bits of value, raise integer overflow and
inexact.
	* values with non-zero fractional part -> rounded according to
rounding mode, raise inexact.
), but existing float64_to_int64() isn't it - very different behaviour
on overflows.  Incidentally, VAX double to s64 is buggered in that area -
it *does* try to use float64_to_int64() and, on top of getting INV instead
of IOV, gets the wrong result in case of overflow (MAX_LONG/MIN_LONG instead
of value in -2^63..2^63-1 comparable modulo 2^64 with exact value taken
as element of $\Bbb Z$).

And s64->s32 is just plain weird - not in the part that has IOV raised on
values outside of -2^31..2^31-1, but in the bit shuffling it's doing if
the test passes; alpha FPU stores s32 value in bits 63-62/58-29, with the
rest filled with zeroes.
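
The shuffling in question, as a host-runnable sketch: bits 31..30 of the
longword land in register bits 63..62, bits 29..0 in register bits 58..29,
everything else is zero.  The function names are mine, and the range test
that raises IOV is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Place a 32-bit integer into the FP-register longword layout. */
static uint64_t cvtql_pack(int32_t v)
{
    uint64_t u = (uint32_t)v;

    return ((u & 0xC0000000ULL) << 32) | ((u & 0x3FFFFFFFULL) << 29);
}

/* Inverse of cvtql_pack(): gather the two fields back into 32 bits. */
static int32_t cvtlq_unpack(uint64_t r)
{
    uint32_t u = (uint32_t)(((r >> 32) & 0xC0000000ULL) |
                            ((r >> 29) & 0x3FFFFFFFULL));

    return (int32_t)u;
}

/* Small self-check: -1 must survive the round trip. */
static void cvtql_selfcheck(void)
{
    assert(cvtlq_unpack(cvtql_pack(-1)) == -1);
}
```

For instance, 1 packs to 0x20000000 and -1 packs to 0xC7FFFFFFE0000000.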

In any case, it's not splitting float_flag_overflow; similar cases in
softfloat.c raise float_flag_invalid.  I don't know if it would make
sense to try and teach float64_to_int64() about this kind of return
value on overflow...

 (Note that any patch touching softfloat files needs to come with
 a statement that you're happy to license it under either the
 softfloat-2a or softfloat-2b licenses, because we're currently
 midway through the tedious process of trying to relicense it.)

Wouldn't be a problem, but I doubt that it would be particularly useful to touch
softfloat.c due to the reasons above, anyway.



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-25 Thread Peter Maydell
On 25 June 2014 15:26, Al Viro v...@zeniv.linux.org.uk wrote:
 Hmm...  On alpha it's generated only by the following: CVTTQ, CVTGQ,
 CVTQL.  I.e. conversions to integer formats that can be held in FPU
 registers (double -> s64, VAX double -> s64 and s64 -> s32).  Does
 softfloat even have anything similar?

Well, VAX doubles are a bit out of scope for an IEEE emulation
library :-)

  As it is, it's all in alpha-specific code;

It does sound like that's the best place for it. In that case, you
don't want to add a flag to the softfloat float_flags -- they are
specifically for indicating softfloat's status/exceptions. Flags
handled purely in CPU-specific code should be stored in the
CPU specific state struct somewhere.

thanks
-- PMM



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-25 Thread Al Viro
On Wed, Jun 25, 2014 at 08:01:17AM +0100, Al Viro wrote:
 On Tue, Jun 24, 2014 at 02:32:46PM -0700, Richard Henderson wrote:
  On 06/24/2014 02:24 PM, Al Viro wrote:
   Al, off to figure out the black magic TCG is using to generate calls...
  
  If you've a helper
  
  DEF_HELPER_1(halt, void, i64)
  
  then
  
gen_helper_halt(...)
  
  will generate the tcg ops that result in the call.
 
 Another fun issue:
 
 CVTTQ both underreports overflow *AND* reports the wrong kind - FOV
 instead of IOV.
 
   * it misses reporting overflows for the case when it *knows* that
     overflow will happen - the need to shift up by more than 63 bits.
     Trivially fixed, of course.  These overflow cases leave the wrong
     result as well - should be 0.
   * it also misses reporting overflows for the case when the value is in
     ranges 2^63..2^64-1 and -2^64+1..-2^63-1.  And yes, it's
     asymmetric - 2^63 is an overflow, -2^63 isn't.
   * overflow is reported by float_raise(float_flag_overflow, FP_STATUS).
     Wrong flag - it should be IOV, not FOV.  And it should be set
     in FPCR regardless of the trap modifier (IOV, this VI thing is
     wrong - we should deal with that only when we generate a trap).
   * All overflow cases should raise INE as well.
 
 Could we steal bit 1 in float_exception_flags for IOV?  It is (currently?)
 unused -
 enum {
     float_flag_invalid   =  1,
     float_flag_divbyzero =  4,
     float_flag_overflow  =  8,
     float_flag_underflow = 16,
     float_flag_inexact   = 32,
     float_flag_input_denormal = 64,
     float_flag_output_denormal = 128
 };
 
 That would allow to deal with that crap nicely - we could have it raise
 the new flag, then have helper_fp_exc_raise_... for default trap mode
 mask it out (and yes, we need to set FPCR flags in default mode, as well
 as /U and /V - confirmed by direct experiment *and* by TFM).

OK, I've managed to resurrect UP1000 box (FSVO resurrect - the southbridge
DMA controller has always been buggered, with intermittent noise on one of
the data lines; fans in CPU module are FUBAR as well - 17 and 20 RPM resp.,
so I don't risk keeping it running for long, etc.)

Still, that allows testing EV67, and I hope to resurrect a DS10 box as well,
which will allow for a saner testing environment.

Current delta follows, fixing gcc and libc testcases *and* AFAICS getting
CVTTQ handling in line with what actual EV67 is doing.  It's a dirty hack
wrt float_raise() - relies on bit 1 never being raised by softfpu.c.  I'll
look into separating that bit, but it'll probably have non-zero costs ;-/
We need two flags - has IOV been raised during this insn (in this patch
it's bit 1 of fp_status.float_exception_flags, cleaned along with those)
and something to keep FPCR.IOV in (in this patch - bit 1 of fpcr_exc_status).
Sure, we can add another uint8_t or two in struct CPUAlphaState, but that'll
mean extra PITA in code and extra memory accesses...

Review would be welcome.

diff --git a/target-alpha/cpu.h b/target-alpha/cpu.h
index d9b861f..047b9a2 100644
--- a/target-alpha/cpu.h
+++ b/target-alpha/cpu.h
@@ -152,6 +152,10 @@ enum {
 FP_ROUND_DYNAMIC = 0x3,
 };
 
+enum {
+float_flag_IOV = 2,
+};
+
 /* FPCR bits */
 #define FPCR_SUM   (1ULL << 63)
 #define FPCR_INED  (1ULL << 62)
diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
index d2d776c..2b39ea4 100644
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -45,10 +45,11 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env)
 }
 
 static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr,
-                                       uint32_t exc, uint32_t regno)
+                                       uint32_t exc, uint32_t regno, uint32_t hw_exc)
 {
     if (exc) {
-        uint32_t hw_exc = 0;
+       if (hw_exc)
+            exc &= ~env->fpcr_exc_mask;
 
         if (exc & float_flag_invalid) {
             hw_exc |= EXC_M_INV;
@@ -65,6 +66,9 @@ static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr,
         if (exc & float_flag_inexact) {
             hw_exc |= EXC_M_INE;
         }
+        if (exc & float_flag_IOV) {
+            hw_exc |= EXC_M_IOV;
+        }
 
         arith_excp(env, retaddr, hw_exc, 1ull << regno);
     }
@@ -73,18 +77,21 @@ static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr,
 /* Raise exceptions for ieee fp insns without software completion.
    In that case there are no exceptions that don't trap; the mask
    doesn't apply.  */
-void helper_fp_exc_raise(CPUAlphaState *env, uint32_t exc, uint32_t regno)
+void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-    inline_fp_exc_raise(env, GETPC(), exc, regno);
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+    if (exc) {
+       env->fpcr_exc_status |= exc;
+       inline_fp_exc_raise(env, GETPC(), exc & ~ignore, regno, 0);
+    }
 }
 
-/* 

Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-24 Thread Al Viro
On Tue, Jun 24, 2014 at 05:34:23AM +0100, Al Viro wrote:
   First of all, kudos - with current qemu tree qemu-alpha-system is
 working pretty well - debian install and a *lot* of builds work just fine.
 As in, getting from lenny to pretty complete squeeze toolchain, including gcj,
 openjdk6 and a lot of crap needed to satisfy build-deps of those, plus all
 priority:required and most of priority:important ones.  It's a _lot_ of
 beating and the damn thing survives - the problems had been with debian
 packages themselves (fstatat() bug in lenny libc, epically buggered build-deps
 in gcc-defaults, etc.).  As it is, one core of 6-way 3.3GHz phenom II is quite
 capable of running a home-grown autobuilder.  Feels like ~250-300MHz alpha
 with a very fast disk...
 
   Remaining problems, AFAICS, are around floating point traps.
 I've found one in glibc testsuite (math/tests-misc.c; overflow in
 ADDS/SU ends up with wrong results from fetestexcept() - only FE_OVERFLOW is
 set, while the sucker expects FE_INEXACT as well and actual hardware sets
 both) and another in gcc one (with -funsafe-math-optimizations CVTST/S on
 denorms triggers SIGFPE/FPE_FLTINV).
 
   The libc one is a bug in gen_fp_exc_raise_ignore() - the difference
 between ADDS/SU and ADDS/SUI is only in trapping, not storing results in
 FPCR.INE and friends.  Both will have the same effect on those and
     if (ignore) {
         tcg_gen_andi_i32(exc, exc, ~ignore);
     }
 in gen_fp_exc_raise_ignore() leads to exc & ignore not reaching the
 update of env->fpcr_exc_status in helper_fp_exc_raise_s().  See 4.7.8:
 [quote]
   In addition, the FPCR gives a summary of each exception type for the 
   exception conditions detected by all IEEE floating-point operates thus
   far, as well as an overall summary bit that indicates whether any of
   these exception conditions has been detected. The individual exception
   bits match exactly in purpose and order the exception bits found in the
   exception summary quadword that is pushed for arithmetic traps. However,
   for each instruction, these exception bits are set independent of the
   trapping mode specified for the instruction. Therefore, even though
   trapping may be disabled for a certain exceptional condition, the fact
   that the exceptional condition was encountered by an instruction is
   still recorded in the FPCR.
 [end quote]
 And yes, on actual hardware both ADDS/SU and ADDS/SUI set FPCR.INE the same
 way - verified by direct experiment.

BTW, here's another testcase:
#include <stdio.h>

unsigned long __attribute__((noinline)) f(double x)
{
    return (unsigned long)x;    // SVCTQ/SVC
}

main()
{
    unsigned long x;
    extern unsigned long __ieee_get_fp_control(void);
    printf("before:%lx\n", __ieee_get_fp_control());
    x = f(1ULL<<63);
    printf("after:%lx\n", __ieee_get_fp_control());
    printf("result:%lx\n", x);
}

On actual hardware:
before:0
after:2
result:8000

On qemu:
before:0
after:0
result:8000


IOW, gen_fcvttq() is also affected, not only gen_fp_exc_raise().

Can't we simply have separate helpers for various trap suffixes, with
all this work on getting exc, etc. taken inside them?  It's not as if
we distinguished many variants, after all...  Right now we have:
plain, /U, /V
/S, /SU
/SUI
/SV
/SVI
and /SU should probably be separated from /S - we do want to suppress underflow
traps on those (again, FPCR.UND should be set regardless).  That's what, 5 or 6
helpers?  Might want to separate /V and /U from plain - AFAICS, we get it
wrong with things like ADDS/U vs. ADDS (it's just that normally underflow
traps are disabled by FPCR.DUND).  I hadn't experimented with those yet, but
even if it turns out that they *are* different - 8 helpers instead of the 2 we
currently have, sharing most of the actual source...

Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV,
rather than EXC_M_UNF?



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-24 Thread Richard Henderson
On 06/23/2014 09:34 PM, Al Viro wrote:
   Anyway, delta that seems to fix the gcc one (gcc.dg/pr28796-2.c from
 gcc-4.3 and later) follows.  Again, I'm not at all sure if handling of
 env->pc in there is safe from qemu POV and I'd like to get comments on
 that from somebody more familiar with qemu guts.

Thanks for the diagnosis on the gcc test case.  I've been meaning to
investigate some of these edge cases for quite a while and never quite
got there.

  static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr,
 -                                       uint32_t exc, uint32_t regno)
 +                                       uint32_t exc, uint32_t regno, uint32_t sw)
  {
      if (exc) {
 -        uint32_t hw_exc = 0;
 +        uint32_t hw_exc = sw;
  
          if (exc & float_flag_invalid) {
              hw_exc |= EXC_M_INV;
 @@ -75,7 +75,7 @@ static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr,
     doesn't apply.  */
  void helper_fp_exc_raise(CPUAlphaState *env, uint32_t exc, uint32_t regno)
  {
 -    inline_fp_exc_raise(env, GETPC(), exc, regno);
 +    inline_fp_exc_raise(env, GETPC(), exc, regno, 0);
  }
  
  /* Raise exceptions for ieee fp insns with software completion.  */
 @@ -84,7 +84,7 @@ void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t exc, uint32_t regno)
      if (exc) {
          env->fpcr_exc_status |= exc;
          exc &= ~env->fpcr_exc_mask;
 -        inline_fp_exc_raise(env, GETPC(), exc, regno);
 +        inline_fp_exc_raise(env, GETPC(), exc, regno, EXC_M_SWC);
      }
  }

This part looks good.

 diff --git a/target-alpha/helper.c b/target-alpha/helper.c
 index 7c053a3..538c6b2 100644
 --- a/target-alpha/helper.c
 +++ b/target-alpha/helper.c
 @@ -527,6 +527,7 @@ void QEMU_NORETURN dynamic_excp(CPUAlphaState *env, uintptr_t retaddr,
      env->error_code = error;
      if (retaddr) {
          cpu_restore_state(cs, retaddr);
 +        env->pc += 4;

This one needs a different fix, since dynamic_excp is also used from
alpha_cpu_unassigned_access, and I'm pretty sure the mchk should have the
address of the memory insn.  But that should be easy to fix up.


r~




Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-24 Thread Richard Henderson
On 06/24/2014 09:52 AM, Al Viro wrote:
 unsigned long __attribute__((noinline)) f(double x)
 {
     return (unsigned long)x;    // SVCTQ/SVC
 }
 
 main()
 {
     unsigned long x;
     extern unsigned long __ieee_get_fp_control(void);
     printf("before:%lx\n", __ieee_get_fp_control());
     x = f(1ULL<<63);
     printf("after:%lx\n", __ieee_get_fp_control());
     printf("result:%lx\n", x);
 }
 
 On actual hardware:
 before:0
 after:2
 result:8000
 
 On qemu:
 before:0
 after:0
 result:8000
 
 
 IOW, gen_fcvttq() is also affected, not only gen_fp_exc_raise().

Clearly a gross misunderstanding of what bits are actually computed, never mind
what gets signaled.

Thanks for the test.  I've not had working hardware for a couple of years to
validate what's supposed to get set and what isn't.

 Can't we simply have separate helpers for various trap suffixes, with
 all this work on getting exc, etc. taken inside them?  It's not as if
 we distinguished many variants, after all...  Right now we have:
   plain, /U, /V
   /S, /SU
   /SUI
   /SV
   /SVI

We used to have separate helpers... at least for the modes that had been
implemented at the time.  The combinatorial explosion is ugly though -- 4
different versions of add, sub, etc.  I thought the partial inlining was a
decent solution, as far as maintainability, but it's not unreasonable to
disagree.

 Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV,
 rather than EXC_M_UNF?

No idea.  Should they?


r~




Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-24 Thread Al Viro
On Tue, Jun 24, 2014 at 11:33:32AM -0700, Richard Henderson wrote:
  return (unsigned long)x;// SVCTQ/SVC
   CVTTQ/SVC, of course...
 Clearly a gross misunderstanding of what bits are actually computed, never mind
 what gets signaled.
 
 Thanks for the test.  I've not had working hardware for a couple of years to
 validate what's supposed to get set and what isn't.

If you have any ideas for testing, I do have working hw (the box that is
currently alive is ev45, though; I _can_ try to boot a UP1000 one, but
I make no promises regarding its fans, both in PSU and in CPU module ;-/)

  Can't we simply have separate helpers for various trap suffixes, with
  all this work on getting exc, etc. taken inside them?  It's not as if
  we distinguished many variants, after all...  Right now we have:
  plain, /U, /V
  /S, /SU
  /SUI
  /SV
  /SVI
 
 We used to have separate helpers... at least for the modes that had been
 implemented at the time.  The combinatorial explosion is ugly though -- 4
 different versions of add, sub, etc.  I thought the partial inlining was a
 decent solution, as far as maintainability, but it's not unreasonable to
 disagree.

Um?  No, I mean having gen_fp_exc_raise() generate a call of one of the 8
helpers; gen_ieee_arith3() and friends would remain as-is.  It's just that
instead of generating load to exc, andi, call of helper_fp_exc_raise_s or
call of helper_fp_exc_raise we would generate a call of one of the
helper_fp_exc_raise{,_u,_v,_s,_su,_sui,_sv,_svi} and let that sucker deal
with loading exc, updating ->fpcr_exc_status and generating traps.
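
One way to picture the proposal, as a sketch only: the translator picks a helper per trap suffix and the helper does the rest. The enum, the lookup function and the predicate below are hypothetical; only the eight helper names come from the list in the paragraph above, and nothing here is the actual QEMU translator code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enumeration of the trap suffixes named in the thread. */
enum sketch_trap_suffix {
    SK_PLAIN, SK_U, SK_V, SK_S, SK_SU, SK_SUI, SK_SV, SK_SVI,
    SK_NUM_SUFFIXES
};

/* Name of the helper the translator would emit a call to. */
static const char *sketch_helper_name(enum sketch_trap_suffix s)
{
    static const char *const names[SK_NUM_SUFFIXES] = {
        [SK_PLAIN] = "helper_fp_exc_raise",
        [SK_U]     = "helper_fp_exc_raise_u",
        [SK_V]     = "helper_fp_exc_raise_v",
        [SK_S]     = "helper_fp_exc_raise_s",
        [SK_SU]    = "helper_fp_exc_raise_su",
        [SK_SUI]   = "helper_fp_exc_raise_sui",
        [SK_SV]    = "helper_fp_exc_raise_sv",
        [SK_SVI]   = "helper_fp_exc_raise_svi",
    };

    assert((unsigned)s < SK_NUM_SUFFIXES);
    return names[s];
}

/* True for the suffixes that include /S (software completion). */
static bool sketch_has_sw_completion(enum sketch_trap_suffix s)
{
    return s >= SK_S && s < SK_NUM_SUFFIXES;
}
```

The arithmetic generators would not need to know any of this; only the single place that currently emits the load, andi and helper call would consult the table.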

  Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV,
  rather than EXC_M_UNF?
 
 No idea.  Should they?

They seem to - both from the arch.manual and from direct experiment...  And
they do set FPCR.INV at the same time, not just trigger the trap.
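
A small sketch of the behaviour described above, under the assumption that a denormal operand on an instruction without /S should be reported as INV. The names and the flag value are hypothetical, and what happens with /S is deliberately left out because the thread does not spell it out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_EXC_INV = 1 };    /* hypothetical value, local to this sketch */

/* IEEE double is denormal when the exponent field is 0 and the fraction isn't. */
static bool sketch_is_denormal(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof(bits));
    return ((bits >> 52) & 0x7ff) == 0
        && (bits & 0x000fffffffffffffULL) != 0;
}

/* Exception to report for the operands alone, before any arithmetic. */
static unsigned sketch_denorm_operand_exc(double a, double b, bool has_s)
{
    if (!has_s && (sketch_is_denormal(a) || sketch_is_denormal(b))) {
        return SKETCH_EXC_INV;
    }
    return 0;                   /* /S handling is not modelled here */
}
```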



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-24 Thread Al Viro
On Tue, Jun 24, 2014 at 11:23:01AM -0700, Richard Henderson wrote:
       env->error_code = error;
       if (retaddr) {
           cpu_restore_state(cs, retaddr);
  +        env->pc += 4;
 
 This one needs a different fix, since dynamic_excp is also used from
 alpha_cpu_unassigned_access, and I'm pretty sure the mchk should have the
 address of the memory insn.  But that should be easy to fix up.

That's not a problem, actually - there we have
    dynamic_excp(env, 0, EXCP_MCHK, 0);
so retaddr is going to be 0 and that env->pc += 4 won't be reached at all...



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-24 Thread Richard Henderson
On 06/24/2014 01:32 PM, Al Viro wrote:
 If you have any ideas for testing, I do have working hw (the box that is
 currently alive is ev45, though; I _can_ try to boot a UP1000 one, but
 I make no promises regarding its fans, both in PSU and in CPU module ;-/)

Ah.  Gotta be careful with ev4/45... half of the fpu is unimplemented, and so
if you're not careful all you're testing is the kernel emulation behaviour.

 Um?  No, I mean having gen_fp_exc_raise() generate a call of one of the 8
 helpers; gen_ieee_arith3() and friends would remain as-is.  It's just that
 instead of generating load to exc, andi, call of helper_fp_exc_raise_s or
 call of helper_fp_exc_raise we would generate a call of one of the
 helper_fp_exc_raise{,_u,_v,_s,_su,_sui,_sv,_svi} and let that sucker deal
 with loading exc, updating ->fpcr_exc_status and generating traps.

Ah, I getcha.  Yes, that makes sense.

 Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV,
 rather than EXC_M_UNF?

 No idea.  Should they?
 
 They seem to - both from the arch.manual and from direct experiment...  And
 they do set FPCR.INV at the same time, not just trigger the trap.

Ok.  I'll try to make time to fix up some of this stuff this weekend.


r~




Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-24 Thread Al Viro
On Tue, Jun 24, 2014 at 01:57:52PM -0700, Richard Henderson wrote:
 On 06/24/2014 01:32 PM, Al Viro wrote:
  If you have any ideas for testing, I do have working hw (the box that is
  currently alive is ev45, though; I _can_ try to boot a UP1000 one, but
  I make no promises regarding its fans, both in PSU and in CPU module ;-/)
 
 Ah.  Gotta be careful with ev4/45... half of the fpu is unimplemented, and so
 if you're not careful all you're testing is the kernel emulation behaviour.

*nod*

  Um?  No, I mean having gen_fp_exc_raise() generate a call of one of the 8
  helpers; gen_ieee_arith3() and friends would remain as-is.  It's just that
  instead of generating load to exc, andi, call of helper_fp_exc_raise_s or
  call of helper_fp_exc_raise we would generate a call of one of the
  helper_fp_exc_raise{,_u,_v,_s,_su,_sui,_sv,_svi} and let that sucker deal
  with loading exc, updating ->fpcr_exc_status and generating traps.
 
 Ah, I getcha.  Yes, that makes sense.

FWIW, the crudest version of that is simply
+    env->fpcr_exc_status |= (uint8_t)env->fp_status.float_exception_flags;
in the very beginning of helper_fp_exc_raise_s().  And yes, it recovers
math/tests-misc.c, even though it's obviously not a good final fix.

Al, off to figure out the black magic TCG is using to generate calls...



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-06-24 Thread Richard Henderson
On 06/24/2014 02:24 PM, Al Viro wrote:
 Al, off to figure out the black magic TCG is using to generate calls...

If you've a helper

DEF_HELPER_1(halt, void, i64)

then

  gen_helper_halt(...)

will generate the tcg ops that result in the call.


r~
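
A toy illustration of the idea in the reply above: one declaration yields both the helper's prototype and a gen_helper_* function that records a call to be made later. The `TOY_*` macro, the op array and `toy_run` are invented for this sketch and are not QEMU's real DEF_HELPER machinery; in particular the real gen_helper_* functions take TCG values rather than plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A recorded "call op": which helper to invoke, and with what argument. */
typedef struct ToyOp {
    void (*func)(uint64_t);
    uint64_t arg;
} ToyOp;

static ToyOp toy_ops[16];
static size_t toy_nops;

/* Declares helper_<name> and defines gen_helper_<name>, which only
   queues the call instead of performing it. */
#define TOY_DEF_HELPER_1(name)                                       \
    static void helper_##name(uint64_t arg);                         \
    static void gen_helper_##name(uint64_t arg)                      \
    {                                                                \
        assert(toy_nops < sizeof(toy_ops) / sizeof(toy_ops[0]));     \
        toy_ops[toy_nops].func = helper_##name;                      \
        toy_ops[toy_nops].arg = arg;                                 \
        toy_nops++;                                                  \
    }

TOY_DEF_HELPER_1(halt)

static uint64_t toy_halt_seen;

/* The helper body, written separately from the declaration above. */
static void helper_halt(uint64_t arg)
{
    toy_halt_seen = arg;
}

/* "Execute" the queued ops, which is when the helpers actually run. */
static void toy_run(void)
{
    for (size_t i = 0; i < toy_nops; i++) {
        toy_ops[i].func(toy_ops[i].arg);
    }
    toy_nops = 0;
}
```

The split matters for the discussion in this thread: whatever the translator decides (for example, which exception helper to call) is fixed when the op is queued, while the flag bookkeeping happens only when the helper runs.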