[Bug target/30255] register spills in x87 unit need to be 80-bit, not 64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30255
--- Comment #11 from rguenth at gcc dot gnu dot org 2006-12-27 16:21 ---
Just to mention it - you can use 'long double' to force 80-bit spills.
--- Comment #8 from ian at airs dot com 2006-12-19 14:57 ---
I think I agree that if we spill an 80387 register to the stack and then load the value back into an 80387 register, we should spill all 80 bits, rather than implicitly converting to DFmode or SFmode.

Unfortunately, this would be rather difficult to implement in the context of gcc's register allocator, because it is perfectly normal for gcc to spill values from one type of register and reload them into a different type of register. Thus the value might move between an 80387 register, a pair of ordinary x86 registers, and an SSE/SSE2 register, all in the same function, depending on how the value is used. Currently gcc simply says the value is DFmode or SFmode, and more or less ignores the fact that it is being represented as an 80-bit value in an 80387 register.

To implement this suggestion we would need to add a new notion: the mode of the spill value. And we would need to support secondary reloads to convert 80-bit spill values as required. That sounds rather complicated, but if we didn't do it, then I think we would still be inconsistent in some cases. I don't see any point in making this change unless we can always be consistent.

All in all, it's pretty hard for me to get excited about better support for the 80387 when all modern x86 chips support SSE2, which is more consistent and faster. See the option -mfpmath=sse.

--
ian at airs dot com changed:

           What    |Removed     |Added
           ----------------------------------------
                 CC|            |ian at airs dot com
--- Comment #9 from whaley at cs dot utsa dot edu 2006-12-19 16:04 ---
Ian,

Thanks for the info. I see I failed to consider the cross-register moves you mentioned. However, can't those be moved through memory, where something destined for a 64-bit register is first written from the 80-bit reg with round-down? Thus, you only do the round-down when you have to change register sets. In code compiled with -mfpmath=387, I would think that would occur pretty much only at the function epilogue for the return value . . . Anyway, I see how, depending on the framework, this may be more complicated than it seemed. However, my own compilation experience is that cross-precision/type conversions are always complicated.

> All in all, it's pretty hard for me to get excited about better support
> for the 80387 when all modern x86 chips support SSE2, which is more
> consistent and faster. See the option -mfpmath=sse.

First, it is consistent only in that it always has 64-bit precision. This is like preferring a car that can only achieve 30 MPH to one that can go 60, but only for short stretches, and must sometimes slow down to 30. The first is more consistent, but hardly to be preferred :)

It is certainly the case that the x87 is of decreasing importance. However, scalar SSE (the default with gcc) does *not* in general run as fast as the x87 on the present generation (I believe this common misconception comes from conflating vector and scalar performance; on AMDs, even vector performance is less than x87 for double precision). In particular, single precision scalar SSE seems to be much slower than x87 code, and double precision seems to be slightly slower *even when all 16 SSE regs are used, in contrast to the crappy 8-reg x87 stack*. Without proof, I ascribe the closer double performance to the availability of movlpd, which provides a low-cost scalar load not enjoyed by single precision (which must use movss).

The only platform where scalar SSE *may* be competitive or better is the Core2Duo, and I haven't had a chance to do benchmarks there to see. Note that there is one performance advantage that x87 code will pretty much always have, even once the archs improve their scalar SSE performance: it is much more compact, due to being defined earlier in the CISC instruction set, which can massively reduce your instruction load on heavily unrolled loops and allow more instructions to fit in the selection window.

Now, if the performance were even (rather than x87 being faster), numerical guys would still sometimes prefer the x87 in order to get that free extra precision. If 10,000 flops are done in 80-bit precision, your worst-case error is roughly epsilon. If they are done in 64-bit (SSE), your worst-case error is 10,000*epsilon. Which would you prefer if you were in the space ship whose flight path was being calculated? :)

Thanks, Clint
--- Comment #10 from whaley at cs dot utsa dot edu 2006-12-19 17:18 ---
Guys,

In the interests of full disclosure, I did some quick timings on the Core2Duo, and as I kind of suspected, scalar SSE crushed x87 there. I was pretty sure scalar SSE could achieve 2 flops/cycle, while Intel kept the x87 at 1 flop/cycle, and that's what my timings show. So, it does appear likely that the only people using the x87 on Intel in the future will be people who need the extra precision (and those people would really like this fix, I will point out :). All other Intel archs (P4, PIII, etc.) do 1 flop/cycle for both scalar SSE and x87.

On the AMDs, both x87 and scalar SSE can achieve 2 flops/cycle, with x87 running somewhat faster: only a slight advantage in double precision, but a more commanding one in single. It looks like the next generation of AMDs will increase the maximal flop rate of vector SSE, but it does not look like they will increase the max flop rate of scalar SSE, so this may continue to be the case going forward . . .

Cheers, Clint
--- Comment #1 from pinskia at gcc dot gnu dot org 2006-12-18 20:16 ---
*** This bug has been marked as a duplicate of 323 ***

--
pinskia at gcc dot gnu dot org changed:

           What    |Removed     |Added
           ----------------------------------------
             Status|UNCONFIRMED |RESOLVED
         Resolution|            |DUPLICATE
--- Comment #2 from whaley at cs dot utsa dot edu 2006-12-18 20:43 ---
Hi,

While it may be decided not to fix this problem, this is not a duplicate of bug 323, and so it should be closed for another reason if you want to ignore it. 323 has a problem because of the function call, where a programmer knows that a round-down can occur by examining the code. This problem is due to register spilling, and so no amount of source examination can determine whether it could occur. Therefore, 323 can be worked around by the knowledgeable user, and this one cannot. Also, 323 would require pragmas or something to prevent, whereas this problem could be completely avoided merely by spilling the 80-bit value when gcc decides to spill. Since this problem cannot be worked around, and has a much more discrete fix, it is very different indeed from the much harder to fix 323.

Thanks, Clint

--
whaley at cs dot utsa dot edu changed:

           What    |Removed     |Added
           ----------------------------------------
             Status|RESOLVED    |UNCONFIRMED
         Resolution|DUPLICATE   |
--- Comment #3 from whaley at cs dot utsa dot edu 2006-12-18 21:16 ---
BTW, in case it isn't obvious, here's the fix that I typically use for problems like bug 323, which I cannot use when it is gcc itself that is unpredictably spilling the computation:

#include <stdio.h>

void test(double x, double y)
{
    const double y2 = x + 1.0;
    volatile double v[2];
    v[0] = y2;
    v[1] = y;
    if (v[0] != v[1])
        printf("error\n");
}

The idea being that the volatile keyword prevents gcc from getting rid of the store/load cycle, which forces the round-down. This allows me to still do this kind of comparison without the speed loss associated with -ffloat-store (the compare itself becomes slow due to the store/load, but the body of the code runs as fast as normal), or the loss of precision associated with always rounding to 64 bits, as when you change the x87 control word.
--- Comment #4 from pinskia at gcc dot gnu dot org 2006-12-18 22:04 ---
The problem with register spilling and what PR 323 is talking about are really all the same issue; it is just exposed differently.

*** This bug has been marked as a duplicate of 323 ***

--
pinskia at gcc dot gnu dot org changed:

           What    |Removed     |Added
           ----------------------------------------
             Status|UNCONFIRMED |RESOLVED
         Resolution|            |DUPLICATE
--- Comment #5 from whaley at cs dot utsa dot edu 2006-12-18 22:14 ---
I cannot, of course, force you to admit it, but 323 is a bug fixable by the programmer, and this one is not. The other requires a lot of work in the compiler, and this one does not. So, viewing them as the same can be done, in the same way that all x87/gcc bugs are the same, or all precision bugs are the same; but since neither their genesis nor their solution is the same, it is misleading to do so. Saying you don't care to fix it is an honest answer; closing it because it is a duplicate of a much larger and harder problem, for which known workarounds exist, is not.
--- Comment #6 from pinskia at gcc dot gnu dot org 2006-12-18 23:02 ---
> I cannot, of course, force you to admit it, but 323 is a bug fixable by
> the programmer, and this one is not.

Depends on what you mean by fixable by the programmer, because most people don't know anything about precision issues. This was a design decision in the x86 back-end because it gives nice speed.
--- Comment #7 from whaley at cs dot utsa dot edu 2006-12-19 00:31 ---
> Depends on what you mean by fixable by the programmer, because most
> people don't know anything about precision issues.

Most people don't know programming at all, so I guess you are suggesting that errors that are fixable at the source-code level must nonetheless always be fixed by the compiler? More to the point, the people who truly care about precision *are* often aware of these kinds of fixes, but they are helpless in this case, unlike for bug 323 (which is why they should not be conflated).

My point was that for bug 323, there is something the user can do as a fix, and that something does not hurt overall performance or accuracy. Since the problem I reported is caused completely by gcc, impacts accuracy in the same way as reordering (which gcc prohibits), and there is nothing the user can do to fix it without drastic loss of performance or accuracy, gcc is the only place it can be fixed. This problem is a narrow, discrete case that can clearly be fixed by gcc, whereas 323 is a broad class of problems which cannot be fixed without adding to the C language the concept of mixed precisions within a type. Therefore, I strongly believe that it is perfectly valid to say that 323 cannot be solved in gcc, but clearly untrue to say that about this case, and so this bug report should have been closed as "we don't care", not as a duplicate.

> This was a design decision in the x86 back-end because it gives nice speed.

The fix I suggested would only slow spill code (note: I mean gcc-spilled code, not explicit loads/stores by the programmer), and would therefore make a noticeable performance difference in very few cases. Note that, unlike the straw man of bug 323, I am *not* advocating that gcc handle all extra-precision behavior, just its undetectable spill rounding. If the performance issue is greater than I suspect, obviously there could be a flag for this behavior.

I find it a bit anomalous that a compiler that is so picky about bit-level accuracy that it forbids reordering operations without a special flag feels free to randomly round in an algorithm, even though the fix would not hurt performance as much as not performing reordering optimizations does, and introduces the same type of error. That it does so on the most common platform on earth just adds to the beauty :)

Thanks, Clint