Re: Float problen running i386 inary on amd64

2007-11-17 Thread Bruce Evans

On Fri, 16 Nov 2007, Peter Jeremy wrote:


I've Cc'd bde@ because this relates to the FPU initialisation - which
he is the expert on.

On Thu, Nov 15, 2007 at 12:54:29PM +, Pete French wrote:

On Fri, Nov 02, 2007 at 10:04:48PM +, Pete French wrote:

int
main(int argc, char *argv[])
{
    if(atof("3.2") == atof("3.200"))
        puts("They are equal");
    else
        puts("They are NOT equal!");
    return 0;
}


Since the program as defined above does not include any prototype for
atof(), its return value is assumed to be int.  The i386 code for the
comparison is therefore:


Sorry, I didn't bother sticking the include lines in when I sent it
to the mailing list as I assumed it would be obvious that you need
to include the prototypes!


OK, sorry for the confusion.


Interestingly, if you recode like this:

   double x = atof("3.2");
   double y = atof("3.200");
   if(x == y)
       puts("They are equal");
   else
       puts("They are NOT equal!");

Then the problem goes away! Glancing at the assembly code they both appear to
be doing the same thing as regards the comparison.


Glance more closely.

Behaviour like this should be expected on i386 but not on amd64.  It
gives the well-known property of the sin() function, that sin(x) != sin(x)
for almost all x (!).  It happens because expressions _may_ be evaluated
in extra precision (this is perfectly standard), so identical expressions
may sometimes be evaluated in different precisions even, or especially,
if they are on the same line.  atof(s) and sin(x) are expressions, so
they may or may not be evaluated in extra precision.  Certainly they
may be evaluated in extra precision internally.  Then when they return
a result, C99 doesn't require discarding any extra precision.  (It only
requires a conversion if the type of the expression being returned is
different from the return type.  Then it requires a conversion as if by
assignment, and such conversions _are_ required to discard any extra
precision.  This gives the bizarre behaviour that, if a function returning
double uses long double internally until the return statement so as to
get extra precision, then it can only return double precision, since the
return statement discards the extra precision, while if it uses double
precision internally then it may return extra precision and the extra
bits may even be correct.)
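
Whether an implementation claims to evaluate expressions in extra
precision can be queried through the C99 macro FLT_EVAL_METHOD in
<float.h>.  The following is only a sketch: the helper names are made up
for illustration, and the code merely maps the values the standard
documents to the mantissa width used for double expressions.  It does not
check whether the compiler actually honours the rules about discarding
extra precision.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

#ifndef FLT_EVAL_METHOD
#define FLT_EVAL_METHOD (-1)	/* pre-C99 headers: treat as indeterminable */
#endif

/*
 * Hypothetical helper: mantissa bits used when evaluating a double
 * expression under a given FLT_EVAL_METHOD value.
 *   0: operations are evaluated in the operand type
 *   1: float and double operations are evaluated as double
 *   2: all operations are evaluated as long double
 * Any other value is indeterminable or implementation-defined: return -1.
 */
int
double_expr_mant_dig(int eval_method)
{
	switch (eval_method) {
	case 0:
	case 1:
		return (DBL_MANT_DIG);
	case 2:
		return (LDBL_MANT_DIG);
	default:
		return (-1);
	}
}

/* Print what this compiler claims for the current translation unit. */
void
report_eval_method(void)
{
	/* C requires long double to be at least as precise as double. */
	assert(LDBL_MANT_DIG >= DBL_MANT_DIG);
	printf("FLT_EVAL_METHOD = %d, double expressions use %d mantissa bits\n",
	    (int)FLT_EVAL_METHOD, double_expr_mant_dig((int)FLT_EVAL_METHOD));
}
```

Calling report_eval_method() from main() shows the claimed evaluation
method; a value of 2 is what one would expect to see where x87 extra
precision is in play.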

The actual behaviour depends on implementation details and bugs.
Programmers are supposed to get almost deterministic behaviour (with
no _may_'s) by using casts or assignments to discard any extra precision.
E.g., in functions that are declared as double, to actually return
only double precision, use return ((double)(x + y)) instead of return
(x + y), or assign the result to a double (maybe x += y; return (x);).
However, this is completely broken for gcc on i386's. For gcc on i386's,
casts and assignments _may_ actually work as required by C99.  The
-ffloat-store hack is often recommended for fixing problems in this
area, but it only works for assignments; casts remain broken, and the
results of expressions remain unpredictable and dependent on the
optimization level because intermediate values _may_ retain extra
precision depending on whether they are spilled to memory and perhaps
on other things (spilling certainly removes extra precision).  This
has been intentionally broken for about 20 years now.  It is hard to
fix without pessimizing almost everything in much the same way as
-ffloat-store.  The pessimization is larger than it was 20 years ago
since memory is relatively slower (though the stores now normally go
to L1 caches which are very fast, they add a relatively large amount
to pipeline latency) and register allocation is better.  It is hard
to write code that avoids the pessimization, since only code that uses
very long expressions with no assignments to even register variables
can avoid the stores.  (Store+load to discard the extra precision is
another implementation detail.  It is the fastest way, even if a value
with extra precision is in a register.)

To work around the gcc bugs, something like *(volatile double *)&x
must be used to reduce double x; to actually be a double.
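
A minimal sketch of that workaround follows.  The function names are
invented for the example, and the pointer cast is applied to the address
of x.  The idea is that an access through a volatile-qualified lvalue
obliges the compiler to do a real store and load, and a double-sized
memory slot cannot hold more than double precision.

```c
/*
 * Force a value through memory as a double.  The volatile qualifier
 * makes the compiler emit a real store followed by a real load.
 */
double
strip_extra_precision(double x)
{
	volatile double stored = x;	/* real store to a double-sized slot */

	return (stored);		/* real load: at most double precision */
}

/* The cast form, applied to the address of an ordinary local. */
double
sum_as_double(double a, double b)
{
	double x = a + b;		/* may still carry extra bits on i386 */

	return (*(volatile double *)&x);
}
```

On targets that never use extra precision both functions are no-ops apart
from the memory traffic, so they cost a little but change nothing.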

The actual behaviour is fairly easy to describe for (f(x) == f(x)):

amd64:
if f() returns float, then the value is returned in the low
quarter of an XMM register, so extra precision is automatically
discarded and the results are equal except in exceptional cases
(if f(x) is a NaN or varies due to internals in the function).
Assignment of the result(s) to variables of any type work
correctly and don't change the values since float is the lowest
precision.

if f() returns double, similarly except the value is returned in
the low half of an XMM register, and assignment of the result(s)
to variable(s) of type float would work correctly and 

Re: Float problen running i386 inary on amd64

2007-11-16 Thread Peter Jeremy
On Sat, Nov 17, 2007 at 04:53:22AM +1100, Bruce Evans wrote:
Behaviour like this should be expected on i386 but not on amd64.  It
gives the well-known property of the sin() function, that sin(x) != sin(x)
for almost all x (!).  It happens because expressions _may_ be evaluated
in extra precision (this is perfectly standard), so identical expressions 
may sometimes be evaluated in different precisions even, or especially,
if they are on the same line.

Thank you for your detailed analysis.  However, I believe you missed
the critical point (I may have removed too much reference to the
actual problem that Pete French saw): I can take a program that was
statically compiled on FreeBSD/i386, run it in legacy (i386) mode on
FreeBSD-6.3/amd64 and get different results.

Another (admittedly contrived) example:
jashank% uname -a  
FreeBSD jashank.vk2pj.dyndns.org 6.1-STABLE FreeBSD 6.1-STABLE #15: Wed Aug  2 
18:35:57 EST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/jashank  i386
jashank% cat y.c
#include <stdio.h>

double  one = 1.0;
double  three = 3.0;
double  third = 1.0/3.0;

int main(int argc, char **argv)
{
        if (one/three == third)
                puts("Equal");
        else
                puts("NOT Equal");
        return (0);
}
jashank% cc -O2 -fno-strict-aliasing -pipe -march=athlon  y.c  -static -o y
jashank% ./y
Equal
jashank% /sbin/sha256 y
SHA256 (y) = d44fe8c4c4b4beab6125ba603f2a34fa4d0280ff04d697e22594debf9efc9a1a
jashank% 

turion% uname -a
FreeBSD turion.vk2pj.dyndns.org 6.2-STABLE FreeBSD 6.2-STABLE #30: Tue Jul 31 
20:29:49 EST 2007 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/turion  amd64
turion% scp -p jashank:y .   
y 100%  146KB 145.9KB/s   00:00
turion% /sbin/sha256 y
SHA256 (y) = d44fe8c4c4b4beab6125ba603f2a34fa4d0280ff04d697e22594debf9efc9a1a
turion% ./y
NOT Equal
turion% 

This is identical code being executed in supposedly equivalent
environments giving different results.

I believe the fix is to initialise the FPU using __INITIAL_NPXCW__ in
ia32_setregs(), though I'm not sure how difficult this is in reality.
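
The effect can be imitated without touching the FPU control word.  As a
rough sketch, do the division in long double to stand in for an x87 left
at 64-bit precision, and in double to stand in for 53-bit precision.  This
assumes long double is wider than double and is really evaluated at that
width, which is typical for gcc on x86 systems that leave the x87 in
64-bit precision mode but is not guaranteed elsewhere.  The function names
are invented for the illustration.

```c
#include <float.h>

/* volatile keeps the compiler from folding the division at compile time */
static volatile double one = 1.0;
static volatile double three = 3.0;
static volatile double third = 1.0 / 3.0;

/* 1 if long double carries more mantissa bits than double here. */
int
long_double_is_wider(void)
{
	return (LDBL_MANT_DIG > DBL_MANT_DIG);
}

/* Quotient rounded to double (53-bit mantissa), then compared. */
int
equal_at_double_precision(void)
{
	volatile double q = one / three;	/* store discards extra bits */

	return (q == third);
}

/* Quotient kept in long double, standing in for 64-bit x87 precision. */
int
equal_at_extended_precision(void)
{
	long double q = (long double)one / (long double)three;

	return (q == (long double)third);
}
```

Since 1/3 has no finite binary expansion, the quotient rounded to a wider
mantissa differs from the stored double, which is the same mismatch the
"NOT Equal" run shows.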

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.




Re: Float problen running i386 inary on amd64

2007-11-16 Thread Bruce Evans

On Sat, 17 Nov 2007, Peter Jeremy wrote:


On Sat, Nov 17, 2007 at 04:53:22AM +1100, Bruce Evans wrote:

Behaviour like this should be expected on i386 but not on amd64.  It
gives the well-known property of the sin() function, that sin(x) != sin(x)
for almost all x (!).  It happens because expressions _may_ be evaluated
in extra precision (this is perfectly standard), so identical expressions
may sometimes be evaluated in different precisions even, or especially,
if they are on the same line.


Thank you for your detailed analysis.  However, I believe you missed
the critical point (I may have removed too much reference to the
actual problem that Pete French saw): I can take a program that was
statically compiled on FreeBSD/i386, run it in legacy (i386) mode on
FreeBSD-6.3/amd64 and get different results.

Another (admittedly contrived) example:
...


Ah, that explains it.  This was also a longstanding bug in the Linux
emulator.  linux_setregs() wasn't fixed to use the Linux npx control
word until relatively recently (2005).  Linux libraries used to set
the control word in the C library (crt), which I think is the right
place to initialize it since the correct initialization may depend on
the language, so the bug wasn't so obvious at first.


This is identical code being executed in supposedly equivalent
environments giving different results.

I believe the fix is to initialise the FPU using __INITIAL_NPXCW__ in
ia32_setregs(), though I'm not sure how difficult this is in reality.


Yes, that is the right fix.  It is moderately difficult to do correctly.
linux_setregs() now just uses fldcw(control) where control =
__LINUX_NPXCW__.  This depends on bugs to work, since direct accesses
to the FPU in the kernel are not supported.  They cause a DNA trap
which should be fatal.  amd64 is supposed to print a message about
this error, but it apparently doesn't, else log files would be fuller.
i386 doesn't even print a message.  npxdna() and fpudna() check related
invariants but not this one.

Correct code would do something like {fpu,npx}init(control) to
initialize the control word.  setregs() in RELENG_[1-4] does exactly
that -- npxinit() hides the complications.  Now {fpu,npx}init() is
only called once or twice at boot time for each CPU, and the complications
are a little larger since most initialization is delayed until the DNA
trap ({fpu,npx}init() now mainly sets up a copy of the initial FPU
state in memory for the trap handler to load later, and it cannot set
up per-thread state since the copy in memory is a global default).

The complications for delayed initialization are mainly to optimize
switching of the FPU state for signal handling, but are also used for
exec.  Another complication here is that signal handlers should be
given the default control word.  This is much more broken than for
setregs:
- there are sysent hooks for sendsig and sigreturn, but none for setting
  registers in sendsig.
- all FreeBSD sendsig's end up using the global default initial FPU state
  (if they support switching the FPU state at all).
- all Linux sendsig's are missing support for switching the FPU state.
- suppose that the initial FPU (or even CPU) state is language-dependent
  and this is implemented mainly in the language runtime startup.
  sendsig's would have a hard time determining the languages' defaults
  so as to set them.  The languages would need to set the defaults in
  signal trampolines.

Bruce
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Float problen running i386 inary on amd64

2007-11-15 Thread Pete French
 On Fri, Nov 02, 2007 at 10:04:48PM +, Pete French wrote:
  int
  main(int argc, char *argv[])
  {
      if(atof("3.2") == atof("3.200"))
          puts("They are equal");
      else
          puts("They are NOT equal!");
      return 0;
  }

 Since the program as defined above does not include any prototype for
 atof(), its return value is assumed to be int.  The i386 code for the
 comparison is therefore:

Sorry, I didn't bother sticking the include lines in when I sent it
to the mailing list as I assumed it would be obvious that you need
to include the prototypes! In the actual tests I did I included stdio.h
and stdlib.h, so the compiler did know the return type. The result is the
same, different behaviour when running the i386 binary on amd64.

 Note that this is comparing the %eax returned by each atof().  Since
 atof() actually returns a double in %st(0) and %eax is a scratch
 register, the results are completely undefined.

I just tried this with the actual code I used for the test (i.e. with the
header files included) and I get something a lot longer than the
assembler you posted. I don't really understand what it is doing as I don't
read 386 assembler, and it's not exactly self explanatory. But the error
is still there.

Interestingly, if you recode like this:

double x = atof("3.2");
double y = atof("3.200");
if(x == y)
    puts("They are equal");
else
    puts("They are NOT equal!");

Then the problem goes away! Glancing at the assembly code they both appear to
be doing the same thing as regards the comparison.

 Unfortunately, I can't explain why an i386 would be different to an amd64
 in i386 mode.

me neither :-(

So, this is a bug, yes? But is it a bug in FreeBSD or not?

-pete.


Re: Float problen running i386 inary on amd64

2007-11-15 Thread Peter Jeremy
I've Cc'd bde@ because this relates to the FPU initialisation - which
he is the expert on.

On Thu, Nov 15, 2007 at 12:54:29PM +, Pete French wrote:
 On Fri, Nov 02, 2007 at 10:04:48PM +, Pete French wrote:
 int
 main(int argc, char *argv[])
 {
     if(atof("3.2") == atof("3.200"))
         puts("They are equal");
     else
         puts("They are NOT equal!");
     return 0;
 }

 Since the program as defined above does not include any prototype for
 atof(), its return value is assumed to be int.  The i386 code for the
 comparison is therefore:

Sorry, I didn't bother sticking the include lines in when I sent it
to the mailing list as I assumed it would be obvious that you need
to include the prototypes!

OK, sorry for the confusion.

Interestingly, if you recode like this:

double x = atof("3.2");
double y = atof("3.200");
if(x == y)
    puts("They are equal");
else
    puts("They are NOT equal!");

Then the problem goes away! Glancing at the assembly code they both appear to
be doing the same thing as regards the comparison.

The underlying problem is that the amd64 FPU is initialised to 64-bit
precision mode, whilst the i386 FPU is initialised to 53-bit precision
mode (__INITIAL_FPUCW__ in amd64/include/fpu.h vs __INITIAL_NPXCW__ in
i386/include/npx.h).  It looks like the FPU is initialised during the
machine-dependent CPU initialisation and then inherited by subsequent
processes as they are fork()d.  The fix is probably to explicitly
initialise the FPU for legacy mode processes on the amd64.
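
The precision-control field occupies bits 8 and 9 of the x87 control
word.  As a sketch, the decoder below turns a control word into a mantissa
width.  The two sample values, 0x127F for __INITIAL_NPXCW__ and 0x037F for
__INITIAL_FPUCW__, are quoted from memory and should be checked against
the headers named above; the function and macro names are invented for
the example.

```c
#include <stdio.h>

/* Sample control words: assumed values, verify against npx.h and fpu.h. */
#define SAMPLE_I386_CW	0x127F	/* assumed __INITIAL_NPXCW__ */
#define SAMPLE_AMD64_CW	0x037F	/* assumed __INITIAL_FPUCW__ */

/*
 * Decode the x87 precision-control field (bits 8-9 of the control word).
 * Returns the mantissa width in bits, or -1 for the reserved encoding.
 */
int
x87_precision_bits(unsigned int cw)
{
	switch ((cw >> 8) & 0x3) {
	case 0:
		return (24);	/* single precision */
	case 2:
		return (53);	/* double precision */
	case 3:
		return (64);	/* extended precision */
	default:
		return (-1);	/* encoding 1 is reserved */
	}
}

/* Print the widths implied by the two assumed defaults. */
void
print_sample_precisions(void)
{
	printf("i386 default:  %d bits\n", x87_precision_bits(SAMPLE_I386_CW));
	printf("amd64 default: %d bits\n", x87_precision_bits(SAMPLE_AMD64_CW));
}
```

If the assumed values are right, the two defaults differ only in this
field, which would be enough to explain the differing comparison results.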

A work-around would be to call fpsetprec(FP_PD) (see machine/ieeefp.h)
at the start of main().
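
A hedged sketch of that work-around is below.  It assumes that <ieeefp.h>
on FreeBSD/i386 makes fpsetprec() and FP_PD visible (it is expected to
pull in machine/ieeefp.h); on any other platform the function does nothing
and says so.  The wrapper name and the HAVE_FPSETPREC macro are made up
for the example.

```c
#if defined(__FreeBSD__) && defined(__i386__)
#include <ieeefp.h>		/* assumed to provide fpsetprec() and FP_PD */
#define	HAVE_FPSETPREC	1
#else
#define	HAVE_FPSETPREC	0
#endif

/*
 * Ask for 53-bit (double) x87 precision where the interface exists.
 * Returns 1 if the request was made, 0 if this build has no fpsetprec().
 */
int
force_double_precision(void)
{
#if HAVE_FPSETPREC
	(void)fpsetprec(FP_PD);
	return (1);
#else
	return (0);
#endif
}
```

The call would go first in main(), before any floating-point work, so
that every later comparison runs with the same precision setting.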

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.




Re: Float problen running i386 inary on amd64

2007-11-14 Thread Peter Jeremy
On Fri, Nov 02, 2007 at 10:04:48PM +, Pete French wrote:
   int
   main(int argc, char *argv[])
   {
       if(atof("3.2") == atof("3.200"))
           puts("They are equal");
       else
           puts("They are NOT equal!");
       return 0;
   }

Since the program as defined above does not include any prototype for
atof(), its return value is assumed to be int.  The i386 code for the
comparison is therefore:

        movl    $.LC0, (%esp)
        call    atof
        movl    $.LC1, (%esp)
        movl    %eax, %ebx
        call    atof
        cmpl    %eax, %ebx
        je      .L7

Note that this is comparing the %eax returned by each atof().  Since
atof() actually returns a double in %st(0) and %eax is a scratch
register, the results are completely undefined.  Unfortunately, I
can't explain why an i386 would be different to an amd64 in i386 mode.
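
For comparison, a version with the prototype in scope might look like the
sketch below.  Including <stdlib.h> declares atof() as returning double,
so the comparison is done on floating-point values rather than on whatever
happens to be in %eax.  The function names are invented for the example,
and the results are stored in double variables before being compared.

```c
#include <stdio.h>
#include <stdlib.h>	/* declares atof() as returning double */

/* Compare two numeric strings after converting and storing them as double. */
int
parsed_equal(const char *a, const char *b)
{
	double x = atof(a);
	double y = atof(b);

	return (x == y);
}

/* Print the same messages as the original test program. */
void
report(const char *a, const char *b)
{
	puts(parsed_equal(a, b) ? "They are equal" : "They are NOT equal!");
}
```

Calling report("3.2", "3.200") from main() reproduces the original test
with a correct declaration in place.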

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.




Re: Float problen running i386 inary on amd64

2007-11-02 Thread Miguel Lopes Santos Ramos
 From: Pete French [EMAIL PROTECTED]

 Hi, I have a very simple program:


   int
   main(int argc, char *argv[])
   {
       if(atof("3.2") == atof("3.200"))
           puts("They are equal");
       else
           puts("They are NOT equal!");
       return 0;
   }


 This works as expected on both i386 and amd64. But if I take the compiled
 binary from the i386 system and run it on the amd64 system then it says they
 are not equal! I thought this was a library problem, but it even happens if
 I compile to a static binary, which would presumably mean the same code is
 running on both systems.

 I am using 6.3-PRERELEASE here

Unfortunately, I didn't have the luck of having it reproduced here.
Maybe because my i386 is on 7.0, different compiler (although the amd64 is
still on RELENG_6).

Since you've ruled out everything else by building a static binary,
did you try using the new C99 functions in fenv.h related to the
floating-point environment?

Miguel Ramos
Lisboa, Portugal