[Issue 13474] Discard excess precision for float and double (x87)

2017-01-16 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

--- Comment #32 from github-bugzi...@puremagic.com ---
Commits pushed to newCTFE at https://github.com/dlang/dmd

https://github.com/dlang/dmd/commit/6db2246e97c790e0988f024ccb25d0fb090d609a
fix Issue 13474 - Discard excess precision for float and double (x87)

https://github.com/dlang/dmd/commit/b9d6be259e2e54c66d8361675b65f717dd5e3fc4
Merge pull request #6247 from WalterBright/fix13474

--


[Issue 13474] Discard excess precision for float and double (x87)

2016-12-27 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

--- Comment #31 from github-bugzi...@puremagic.com ---
Commits pushed to scope at https://github.com/dlang/dmd

https://github.com/dlang/dmd/commit/6db2246e97c790e0988f024ccb25d0fb090d609a
fix Issue 13474 - Discard excess precision for float and double (x87)

https://github.com/dlang/dmd/commit/b9d6be259e2e54c66d8361675b65f717dd5e3fc4
Merge pull request #6247 from WalterBright/fix13474

--


[Issue 13474] Discard excess precision for float and double (x87)

2016-11-09 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

github-bugzi...@puremagic.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--


[Issue 13474] Discard excess precision for float and double (x87)

2016-11-09 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

--- Comment #30 from github-bugzi...@puremagic.com ---
Commits pushed to master at https://github.com/dlang/dmd

https://github.com/dlang/dmd/commit/6db2246e97c790e0988f024ccb25d0fb090d609a
fix Issue 13474 - Discard excess precision for float and double (x87)

https://github.com/dlang/dmd/commit/b9d6be259e2e54c66d8361675b65f717dd5e3fc4
Merge pull request #6247 from WalterBright/fix13474

fix Issue 13474 - Discard excess precision for float and double (x87)

--


[Issue 13474] Discard excess precision for float and double (x87)

2016-11-07 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

--- Comment #29 from Walter Bright  ---
(In reply to yebblies from comment #28)
> (In reply to Walter Bright from comment #26)
> I must have missed that.  I'm happy to review/merge dmd changes related to
> that.

No dmd changes are necessary, it will work as is. I designed it as an intrinsic
in case some future code gen scheme will cause it to not work.

> I'm worried the other approach will just cause a performance issue
> that's impossible to work around.

It could be worked around by using reals as temporaries instead of doubles, but
few programmers have that level of understanding of how floating point works.
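
A minimal sketch of that workaround, written in C rather than D (the reduced function from comment #22 is valid in both). It assumes C's `long double` plays the role of D's `real`, which holds for x87 targets under gcc but not for every compiler; the name `foo_real_temps` is hypothetical. Because the temporaries are declared at extended precision, the compiler has no narrower type to round to and so no store to memory is needed between operations:

```c
/* Reduced function from comment #22, with the temporaries declared as
   long double (D's `real`) instead of double. Hypothetical name. */
double foo_real_temps(double x, double t, double s, double c)
{
    long double y  = (long double)x - t;   /* kept at extended precision */
    long double cc = c + (y + s);          /* still no rounding to double */
    return (double)(s + cc);               /* rounded once, on return */
}
```

Note that this changes where rounding happens, so an algorithm that depends on the double-rounded value of `x - t` would need to be re-derived for it.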


> > 1. It is unknown what 32 bit x86 CPUs are used for embedded systems. I 
> > dislike adding more codegen switches, because every switch doubles the time 
> > it takes to run the test suite, and few developers set them correctly. (Who 
> > ever sets that blizzard of switches gcc has correctly?)
> 
> Who is using dmd on an embedded system?  Why?

I don't know. I'm reluctant to just break all their code just because I am
ignorant of them.

> Embedded system users are
> exactly the people who are setting gcc switches correctly.

In my experience with embedded systems developers, they aren't any more
sophisticated with detailed feature switch settings than any other systems code
developer. Most just copy the switch settings from project to project, in the
process losing any information about why those settings were set to begin with.

> Then again, wouldn't using unaligned loads/stores still be
> faster than using the x87?  Last I checked, it was... not fast.

I don't know. I also overlooked another point - the 32 bit ABI still uses the
ST0 register for floating point returns. Not sure what gcc does about that.
Anyhow, there is clearly some not insignificant engineering work to be done for
that. It's a question of whether it is worth it.

> Can you put together a dmd PR to go with druntime 1621?  I'm guessing it's
> pretty easy, since a new OPER will default to not being optimized?

As I mentioned, no dmd changes are currently necessary.

One last point. Yes, the intrinsics will work and will be a more efficient
solution. The problem, though, is people will port code that works, or will
type in code from a book, and then it will not work, and they will blame D. Not
many will know just where the intrinsics will need to be inserted. How many
here realized the store to t was the problem?

--


[Issue 13474] Discard excess precision for float and double (x87)

2016-11-07 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

--- Comment #28 from yebblies  ---
(In reply to Walter Bright from comment #26)
> > I think that's well covered by adding an intrinsic,
> 
> I produced a PR request for that in druntime. Nobody liked it, and it
> languishes unpulled.
> 
> https://github.com/dlang/druntime/pull/1621

I must have missed that.  I'm happy to review/merge dmd changes related to
that.  I'm worried the other approach will just cause a performance issue
that's impossible to work around.

> 1. It is unknown what 32 bit x86 CPUs are used for embedded systems. I 
> dislike adding more codegen switches, because every switch doubles the time 
> it takes to run the test suite, and few developers set them correctly. (Who 
> ever sets that blizzard of switches gcc has correctly?)

Who is using dmd on an embedded system?  Why?  Embedded system users are
exactly the people who are setting gcc switches correctly.

> 2. It's not a simple matter of turning it on, even though dmd generates XMM 
> code for OSX 32 bit. The trouble is in getting the stack aligned to 16 bytes. 
> The Linux way of doing that is different from OSX, so there's some 
> significant dev work to do to match it.

Yeah, I know.  Then again, wouldn't using unaligned loads/stores still be
faster than using the x87?  Last I checked, it was... not fast.

> I believe that making faster 64 bit code should have priority over making
> faster 32 bit code, based on the idea that users who feel the need for speed
> are going to be using -m64.

It's much easier to switch over to m64 on linux, which is why I'm still using
m32 on windows.  One day...

Can you put together a dmd PR to go with druntime 1621?  I'm guessing it's
pretty easy, since a new OPER will default to not being optimized?

--


[Issue 13474] Discard excess precision for float and double (x87)

2016-11-07 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

--- Comment #26 from Walter Bright  ---
> I think that's well covered by adding an intrinsic,

I produced a PR request for that in druntime. Nobody liked it, and it
languishes unpulled.

https://github.com/dlang/druntime/pull/1621

--


[Issue 13474] Discard excess precision for float and double (x87)

2016-11-07 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

--- Comment #27 from Walter Bright  ---
> stop supporting targets without xmm regs

A couple problems with this:

1. It is unknown what 32 bit x86 CPUs are used for embedded systems. I dislike
adding more codegen switches, because every switch doubles the time it takes to
run the test suite, and few developers set them correctly. (Who ever sets that
blizzard of switches gcc has correctly?)

2. It's not a simple matter of turning it on, even though dmd generates XMM
code for OSX 32 bit. The trouble is in getting the stack aligned to 16 bytes.
The Linux way of doing that is different from OSX, so there's some significant
dev work to do to match it.

I believe that making faster 64 bit code should have priority over making
faster 32 bit code, based on the idea that users who feel the need for speed
are going to be using -m64.

--


[Issue 13474] Discard excess precision for float and double (x87)

2016-11-07 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

--- Comment #25 from Walter Bright  ---
https://github.com/dlang/dmd/pull/6247

--


[Issue 13474] Discard excess precision for float and double (x87)

2016-11-07 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

--- Comment #23 from yebblies  ---

(In reply to Walter Bright from comment #22)
> 
> So I propose that the fix is to disable optimizing away the assignment to y
> for x87 code gen targets.

Are you suggesting disabling that optimization always, or allowing the
programmer to specify that that particular assignment shouldn't be optimized?

If the latter, I would rather stop supporting targets without xmm regs than
stop producing fast code on Win32 etc.

If the former, I think that's well covered by adding an intrinsic, so the code
becomes:

 double foo(double x, double t, double s, double c) {
     double y = __builtin_that_forces_rounding_to_double(x - t);
     c += y + s;
     return s + c;
 }

And this seems like something that could be handled fairly easily in the dmd
backend.  I think this covers all the cases where rounding must be required.
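
For illustration only, such an intrinsic can be approximated in portable C by routing the value through a `volatile` object, which obliges the compiler to perform the store and reload. The helper name `force_round_to_double` is hypothetical and is not the druntime API; this is a sketch of the idea, not the proposed implementation:

```c
/* Hypothetical stand-in for the intrinsic: the volatile object forces a
   real 64-bit store, which discards any excess x87 precision. */
static double force_round_to_double(double v)
{
    volatile double tmp = v;
    return tmp;
}

double foo(double x, double t, double s, double c)
{
    double y = force_round_to_double(x - t);  /* rounded exactly here */
    c += y + s;
    return s + c;
}
```

Only the one subtraction pays for the memory round trip; the remaining arithmetic can stay in registers.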

--


[Issue 13474] Discard excess precision for float and double (x87)

2016-11-07 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

--- Comment #24 from Илья Ярошенко  ---
(In reply to yebblies from comment #23)
> I would rather stop supporting targets without xmm regs than
> stop producing fast code on Win32 etc.

+1

--


[Issue 13474] Discard excess precision for float and double (x87)

2016-11-07 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

Walter Bright  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzi...@digitalmars.com

--- Comment #22 from Walter Bright  ---
This boils down to the following code:

 double foo(double x, double t, double s, double c) {
     double y = x - t;
     c += y + s;
     return s + c;
 }

The body of which, when optimized, looks like:

return s + (c + ((x - t) + s));

Or, in x87 instructions:

   fld     qword ptr 01Ch[ESP]
   fld     qword ptr 0Ch[ESP]
   fxch    ST(1)
   fsub    qword ptr 014h[ESP]
   fadd    qword ptr 0Ch[ESP]
   fadd    qword ptr 4[ESP]
   fstp    qword ptr 4[ESP]
   fadd    qword ptr 4[ESP]
   ret     020h

The algorithm relies on rounding to double precision of the (x-t) calculation.
The only way to get the x87 to do that is to actually assign it to memory. But
the compiler optimizes away the assignment to memory, because it is
substantially slower.

The 64 bit code does not have this problem, because the code gen looks like:

   push    RBP
   mov     RBP,RSP
   movsd   XMM4,XMM0
   movsd   XMM5,XMM1
   subsd   XMM3,XMM2
   addsd   XMM3,XMM5
   addsd   XMM4,XMM3
   movsd   XMM0,XMM5
   addsd   XMM0,XMM4
   pop     RBP
   ret

It's doing the same optimization, but the result is rounded to double because
the XMM registers are doubles.

Note that the following targets generate x87 code, not XMM code:

Win32, Linux32, FreeBSD32

because it is not guaranteed that the target has XMM registers. I suspect we
don't really care about the floating point performance on those targets, but we
do care that the code gives expected results.

So I propose that the fix is to disable optimizing away the assignment to y for
x87 code gen targets.
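
To show why the store matters, here is a sketch in C of a compensated summation whose inner step has the same shape as `foo` above (`c += (x - t) + sum`). That the original test case was this kind of algorithm (Neumaier's variant of Kahan summation) is an assumption; the function names are hypothetical. The `volatile` qualifier stands in for the proposed fix by forcing `t` to be stored, and therefore rounded to double:

```c
#include <stddef.h>

/* Small local helper so the sketch does not depend on libm. */
static double abs_d(double v) { return v < 0 ? -v : v; }

double neumaier_sum(const double *values, size_t n)
{
    double sum = 0.0;
    double c = 0.0;  /* accumulated compensation for lost low-order bits */
    for (size_t i = 0; i < n; i++) {
        double x = values[i];
        /* volatile forces an actual store, so t holds the double-rounded
           sum even if the arithmetic runs in 80-bit x87 registers */
        volatile double t = sum + x;
        if (abs_d(sum) >= abs_d(x))
            c += (sum - t) + x;   /* low-order bits of x were lost */
        else
            c += (x - t) + sum;   /* low-order bits of sum were lost */
        sum = t;
    }
    return sum + c;
}
```

The correction terms are only meaningful if `t` is exactly the value that later arithmetic sees, which is why keeping `t` at excess precision in a register can break the algorithm.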

--


[Issue 13474] Discard excess precision for float and double (x87)

2016-09-29 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

Martin Nowak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |c...@dawg.eu

--- Comment #21 from Martin Nowak  ---
(In reply to yebblies from comment #18)
> Maybe.  The logic in here does seem sound, although again I'm not an expert.
> 
> http://dlang.org/d-floating-point.html
> 
> So the idea is that strict double rounding would be a big performance hit

That doesn't make too much sense, and we shouldn't adapt the language to an x87
"coprocessor". Using the old FPU nowadays is a performance hit and should be
avoided unless you absolutely need the extra 16 bits of precision.

As C gets away with its default behavior, I don't think we need to make
-ffast-math our default.

--


[Issue 13474] Discard excess precision for float and double (x87)

2015-02-15 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

--- Comment #20 from yebblies  ---
(In reply to Илья Ярошенко from comment #19)
>
> C has this problem only when special compiler flags are enabled.
> All these algorithms are from Python source code (C) or Wikipedia (C, Pascal).

C does allow you to specify when excess precision should be lost, but it still
has extra precision enabled for intermediate expressions IIUC.  So the problem
still exists, but is easier to work around.
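
As a rough illustration of the C side: C99 exposes the evaluation mode through `FLT_EVAL_METHOD` in `<float.h>`, and specifies that casts and assignments discard extra range and precision. The helper names below are hypothetical, and whether a given compiler honors the cast rule can depend on its flags (for gcc, `-fexcess-precision=standard`), so treat this as a sketch:

```c
#include <float.h>

/* Interprets a FLT_EVAL_METHOD value: 2 means float and double operations
   are evaluated as long double; negative means indeterminable. */
int double_may_have_excess_precision(int method)
{
    return method == 2 || method < 0;
}

/* Under C99 rules the cast strips any excess precision from (x - t). */
double discard_excess(double x, double t)
{
    return (double)(x - t);
}

/* Reports the mode the current compiler advertises. */
int current_mode_has_excess_precision(void)
{
    return double_may_have_excess_precision(FLT_EVAL_METHOD);
}
```

Intermediate expressions without a cast or assignment may still be evaluated at the wider precision, which is the remaining problem described above.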

--


[Issue 13474] Discard excess precision for float and double (x87)

2015-02-15 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13474

Илья Ярошенко  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Discard excess precision    |Discard excess precision
                   |when returning double in    |for float and double (x87)
                   |x87 register                |

--