Re: Activate -mrecip with -ffast-math?
On 6/18/07, Brooks Moses [EMAIL PROTECTED] wrote:
> Giovanni Bajo wrote:
>> Both our goals are legitimate. But that's not the point. The point is what -ffast-math semantically means (the simplistic list of suboptions activated by it is of course insufficient, because it doesn't explain how to behave in the face of new options, like -mrecip). My proposal is: -ffast-math activates all the math-related optimizations that improve code speed while destroying floating-point accuracy.
>
> I don't think that's a workable proposal. If it is taken literally, it means that the optimization of converting all floating-point arithmetic to no-ops and replacing all references to floating-point variables with zeros is allowed (and would be appropriate under this option). And, personally, I don't think that documentation is of use if it can't be taken reasonably literally. There's a line between what's acceptable and what's not, and regardless of where exactly it is, the documentation needs to fairly clearly indicate its location.

I agree. 'Destroying floating-point accuracy' is too broad and discouraging. Even if in some cases this is exactly what happens - the error we introduce (if you define it as the difference between the results with and without -ffast-math) is essentially unbounded. Still, in most 'regular' cases we preserve accuracy quite well or even improve it (for some other metric of accuracy). This is really a hard-to-solve communication problem.

OTOH, if we start to produce NaN for sqrt(0.0), that is of course simply 'wrong', not inaccurate ;)

Richard.
Re: Activate -mrecip with -ffast-math?
> OTOH, if we start to produce NaN for sqrt(0.0) that is of course simply 'wrong', not inaccurate ;)

I still support the introduction of a special switch for this kind of transformation, -fwrong-math-optimizations. :-)

Paolo
Re: Activate -mrecip with -ffast-math?
On 6/19/07, Paolo Bonzini [EMAIL PROTECTED] wrote:
>> OTOH, if we start to produce NaN for sqrt(0.0) that is of course simply 'wrong', not inaccurate ;)
>
> I still support the introduction of a special switch for this kind of transformation, -fwrong-math-optimizations. :-)

Probably as useful and widely used as -fhello-world? ;)

Richard.
Re: Activate -mrecip with -ffast-math?
On 6/18/07, Uros Bizjak [EMAIL PROTECTED] wrote:
> On 6/18/07, tbp [EMAIL PROTECTED] wrote:
>> Until now, the contract was: you have to deal with (and contain) NaN and infinities. Fair enough, even if tricky that remained manageable. But if i can't expect a mere division by 0, or sqrt of 0 (quite common with FTZ/DAZ on) to give me respectively an infinity and 0 and instead get a NaN (which i can't filter, you remember?) because of the NR round, that's pure madness.
>
> The attached patch should fix these troubles for the cost of 2 extra clocks. The trick is to limit the result just below infinity for rsqrt, and this keeps 0.0*(inf-) -> 0.0.

I guess I'm still confused about how this will fix sqrt(x) -> rsqrt for x == 0, so can we have a testcase enumerating the now bogus cases?

Thx,
Richard.

> Uros.
>
> Index: i386.c
> ===
> --- i386.c (revision 125790)
> +++ i386.c (working copy)
> @@ -22590,7 +22590,7 @@ void ix86_emit_swdivsf (rtx res, rtx a,
>  void ix86_emit_swsqrtsf (rtx res, rtx a, enum machine_mode mode, bool recip)
>  {
> -  rtx x0, e0, e1, e2, e3, three, half;
> +  rtx x0, e0, e1, e2, e3, three, half, bignum;
>
>    x0 = gen_reg_rtx (mode);
>    e0 = gen_reg_rtx (mode);
> @@ -22600,15 +22600,18 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,
>
>    three = CONST_DOUBLE_FROM_REAL_VALUE (dconst3, SFmode);
>    half = CONST_DOUBLE_FROM_REAL_VALUE (dconsthalf, SFmode);
> +  bignum = gen_lowpart (SFmode, GEN_INT (0x7f7f));
>
>    if (VECTOR_MODE_P (mode))
>      {
>        three = ix86_build_const_vector (SFmode, true, three);
>        half = ix86_build_const_vector (SFmode, true, half);
> +      bignum = ix86_build_const_vector (SFmode, true, bignum);
>      }
>
>    three = force_reg (mode, three);
>    half = force_reg (mode, half);
> +  bignum = force_reg (mode, bignum);
>
>    /* sqrt(a) = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a))
>       1.0 / sqrt(a) = 0.5 * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)) */
> @@ -22617,6 +22620,9 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,
>    emit_insn (gen_rtx_SET (VOIDmode, x0,
>                            gen_rtx_UNSPEC (mode, gen_rtvec (1, a),
>                                            UNSPEC_RSQRT)));
>
> +  emit_insn (gen_rtx_SET (VOIDmode, x0,
> +                          gen_rtx_SMIN (mode, x0, bignum)));
> +
>    /* e0 = x0 * a */
>    emit_insn (gen_rtx_SET (VOIDmode, e0,
>                            gen_rtx_MULT (mode, x0, a)));
Re: Activate -mrecip with -ffast-math?
On 6/18/07, Richard Guenther [EMAIL PROTECTED] wrote:
> No, that's not the contract with -ffast-math. Note that -ffast-math enables -funsafe-math-optimizations, which is allowed to change results (add/remove rounding operations, contract expressions, do transforms like a/b to a * 1/b, do transformations that get you bigger errors than 0.5ulp, etc.)

I can't expect a division by a constant to survive -ffast-math unscathed, but then that's a change in precision and manageable. Being returned a NaN i'm not supposed to see for a common case, depending on some transformation, is something else entirely.

>> But if i can't expect a mere division by 0, or sqrt of 0 (quite common with FTZ/DAZ on) to give me respectively an infinity and 0 and instead get a NaN (which i can't filter, you remember?) because of the NR round, that's pure madness.
>
> Hm, which particular case are you concerned about (maybe it was mentioned, but I don't remember the details)? Note that -ffast-math enables -ffinite-math-only as well, so the compiler assumes nothing will result in NaNs or Infs.

Yes, and that's why it's such a pain to handle them correctly while in -ffast-math. But if i generate some, then i get what i've asked for (and i'm in for a local fix). Fair enough. I'm not going to give up, e.g., fast robust SSE ray/aabb slab tests (or ray/plane or...) because of some arbitrary rule; the hardware handles it just fine (yes there's a penalty, but then it's way faster than branching).

For example, when doing 1/x and sqrt(x) via reciprocal + NR, you first get an inf from said reciprocal, which then turns into a NaN in the NR stage unless you correct it by, say, doing a comparison to 0 and an 'and'. That's what ICC used to do behind your back. That's what you'll find on page 151 of the amdfam10 optimization manual. Because that's a common case. As far as i can see, there's no such provision in the current patch.

At the very least provide a means to look after those NaNs without losing sanity, like a way to enforce the argument order of min/max[ss|ps|pd] without resorting to inline asm.

> Well - certainly another reason for the Math BOF ;) We all expect very different things from -ffast-math or -funsafe-math-optimizations.

You mean fast unsafe? I think there's quite a margin between letting someone shoot himself in the foot and putting a gun to his head.
Re: Activate -mrecip with -ffast-math?
On 6/18/07, Giovanni Bajo [EMAIL PROTECTED] wrote:
> I understand your problems, but let me state that your objections are totally subjective. *You* need a specific behaviour from -ffast-math (eg: keep NaN/Inf), but that's not what *I* need. So, we have different goals.

No. My NaNs are my problem. Those generated by gcc aren't. At the very least provide a canonical (efficient) way to filter them (i.e. SSE min/max).
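To illustrate why the operand order of SSE min/max matters for filtering NaNs, here is a small portable C model of the MINSS/MAXSS rule: when the comparison is false, which includes the case where either operand is NaN, the second operand is returned. The function names (sse_min, sse_max, filter_nan) are hypothetical, and the model says nothing about what gcc does with the operands under -ffast-math.

```c
#include <assert.h>
#include <math.h>

/* Model of MINSS: returns b unless a < b holds (so NaN in a yields b). */
static float sse_min(float a, float b)
{
    return a < b ? a : b;
}

/* Model of MAXSS: returns b unless a > b holds (so NaN in a yields b). */
static float sse_max(float a, float b)
{
    return a > b ? a : b;
}

/* Clamp v into [lo, hi]. Because v is the FIRST operand of the max,
   a NaN in v is replaced by lo instead of propagating. */
static float filter_nan(float v, float lo, float hi)
{
    return sse_min(sse_max(v, lo), hi);
}

/* Holds on an IEEE 754 target when built without -ffast-math. */
void filter_nan_demo(void)
{
    assert(filter_nan(NAN, 0.0f, 1.0f) == 0.0f);      /* NaN filtered */
    assert(filter_nan(INFINITY, 0.0f, 1.0f) == 1.0f); /* inf clamped */
    assert(isnan(sse_max(0.0f, NAN)));  /* swapped order lets NaN through */
}
```

If the operands of the max are swapped, the NaN survives, which is why a guaranteed argument order is needed for this idiom to work.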
Re: Activate -mrecip with -ffast-math?
tbp wrote:
> For example, when doing 1/x and sqrt(x) via reciprocal + NR, you first get an inf from said reciprocal, which then turns into a NaN in the NR stage unless you correct it by, say, doing a comparison to 0 and an 'and'. That's what ICC used to do behind your back. That's what you'll find on page 151 of the amdfam10 optimization manual. Because that's a common case. As far as i can see, there's no such provision in the current patch. At the very least provide a means to look after those NaNs without losing sanity, like a way to enforce the argument order of min/max[ss|ps|pd] without resorting to inline asm.

But even if sqrt is corrected for 0.0 * inf, there would still be a lot of problems with the combinations of NR-enhanced rsqrt and rcp. Consider for example:

1.0/sqrt(a/b) alias rsqrt(a/b)

Having a=0, b != 0, the result is inf. This expression is mathematically equal to sqrt(b/a) and the compiler is free to do this optimization. In this case, b*rcp(a) produces NaN due to NR of rcp(a), and here we lose.

Let's correct both the rsqrt and rcp NR steps for 0.0, so we have NR-rsqrt(0.0) = inf, NR-rcp(0.0) = inf. Again, sqrt(b/a) will create sqrt(inf) = inf * rsqrt(inf), so the NR step for rsqrt will hit (0.0 * inf) from the other side. We lose, because there is no correction for the case where the input operand is infinity.

IMO, due to the limited range of operands for the -mrecip pass (inf, -inf), where 0.0 is excluded, it should be kept out of -ffast-math. There is no point in fixing reciprocals only for 0.0; we need to fix both conversions for infinity and 0.0, even in -ffast-math.

Uros.
Re: Activate -mrecip with -ffast-math?
Giovanni Bajo wrote:
> Both our goals are legitimate. But that's not the point. The point is what -ffast-math semantically means (the simplistic list of suboptions activated by it is of course insufficient, because it doesn't explain how to behave in the face of new options, like -mrecip). My proposal is: -ffast-math activates all the math-related optimizations that improve code speed while destroying floating-point accuracy.

I don't think that's a workable proposal. If it is taken literally, it means that the optimization of converting all floating-point arithmetic to no-ops and replacing all references to floating-point variables with zeros is allowed (and would be appropriate under this option).

And, personally, I don't think that documentation is of use if it can't be taken reasonably literally. There's a line between what's acceptable and what's not, and regardless of where exactly it is, the documentation needs to fairly clearly indicate its location.

- Brooks
Re: Activate -mrecip with -ffast-math?
On Jun 18, 2007, at 2:14 PM, Uros Bizjak wrote:
> tbp wrote:
>> For example, when doing 1/x and sqrt(x) via reciprocal + NR, you first get an inf from said reciprocal, which then turns into a NaN in the NR stage unless you correct it by, say, doing a comparison to 0 and an 'and'. That's what ICC used to do behind your back. That's what you'll find on page 151 of the amdfam10 optimization manual. Because that's a common case. As far as i can see, there's no such provision in the current patch. At the very least provide a means to look after those NaNs without losing sanity, like a way to enforce the argument order of min/max[ss|ps|pd] without resorting to inline asm.
>
> But even if sqrt is corrected for 0.0 * inf, there would still be a lot of problems with the combinations of NR-enhanced rsqrt and rcp. Consider for example:
>
> 1.0/sqrt(a/b) alias rsqrt(a/b)
>
> Having a=0, b != 0, the result is inf.

As already stated, -ffast-math turns on -ffinite-math-only, which allows the compiler to assume that a result of inf cannot happen, so gcc is allowed to ignore this possibility. Producing NaN instead of inf seems to be allowed.

> This expression is mathematically equal to sqrt(b/a) and the compiler is free to do this optimization. In this case, b*rcp(a) produces NaN due to NR of rcp(a), and here we lose.
>
> Let's correct both the rsqrt and rcp NR steps for 0.0, so we have NR-rsqrt(0.0) = inf, NR-rcp(0.0) = inf. Again, sqrt(b/a) will create sqrt(inf) = inf * rsqrt(inf), so the NR step for rsqrt will hit (0.0 * inf) from the other side. We lose, because there is no correction for the case where the input operand is infinity.
>
> IMO, due to the limited range of operands for the -mrecip pass (inf, -inf), where 0.0 is excluded, it should be kept out of -ffast-math. There is no point in fixing reciprocals only for 0.0; we need to fix both conversions for infinity and 0.0, even in -ffast-math.

I think that tbp just wants to ensure that sqrt(0.0)=0.0 even with your various reciprocal and sqrt optimizations. (I can't test the new code now, but I think he claims that with the new sqrt optimizations sqrt(0.) = NaN; if indeed it does this, then I would consider this a bug.)

I don't think he wants the optimizations to have to do the right thing when an argument or result of one of these operations is infinite or a NaN. Of course, he can correct me if I'm wrong.

Brad
Re: Activate -mrecip with -ffast-math?
On Jun 18, 2007, at 2:27 PM, Bradley Lucier wrote:
>> But even if sqrt is corrected for 0.0 * inf, there would still be a lot of problems with the combinations of NR-enhanced rsqrt and rcp. Consider for example:
>>
>> 1.0/sqrt(a/b) alias rsqrt(a/b)
>>
>> Having a=0, b != 0, the result is inf.
>
> As already stated, -ffast-math turns on -ffinite-math-only, which allows the compiler to assume that a result of inf cannot happen, so gcc is allowed to ignore this possibility. Producing NaN instead of inf seems to be allowed.

Let me restate this.

If -ffinite-math-only is specified, then producing NaN instead of inf should be allowed.

If -fno-finite-math-only is specified, then the generated code should do the right thing if an argument or result is inf or NaN.

In any case, I would consider it an error if the argument is finite, the result is supposed to be finite, and inf or NaN is produced.

Brad
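The last rule above can be written down as a small predicate, which may help when enumerating test cases. This is only a sketch of the rule as restated here, not anything in gcc; the function name is hypothetical. The checker itself has to be compiled without -ffinite-math-only, since that option lets the compiler assume isfinite() is always true.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* True when a result is an error under the rule above: the argument is
   finite, the mathematically expected result is finite, yet the code
   produced inf or NaN. Other cases (e.g. NaN where inf was expected)
   are left to the -ffinite-math-only / -fno-finite-math-only setting. */
static bool is_finite_math_error(float arg, float expected, float actual)
{
    return isfinite(arg) && isfinite(expected) && !isfinite(actual);
}

/* Holds on an IEEE 754 target when built without -ffast-math. */
void finite_math_rule_demo(void)
{
    /* sqrt(0.0) should be 0.0; producing NaN is an error in any mode. */
    assert(is_finite_math_error(0.0f, 0.0f, NAN));
    /* 1/0.0 should be inf; NaN instead is not covered by this rule. */
    assert(!is_finite_math_error(0.0f, INFINITY, NAN));
    /* A correct finite result is never an error. */
    assert(!is_finite_math_error(4.0f, 2.0f, 2.0f));
}
```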
Re: Activate -mrecip with -ffast-math?
On 6/18/07, Uros Bizjak [EMAIL PROTECTED] wrote:
> IMO, due to the limited range of operands for the -mrecip pass (inf, -inf), where 0.0 is excluded, it should be kept out of -ffast-math. There is no point in fixing reciprocals only for 0.0; we need to fix both conversions for infinity and 0.0, even in -ffast-math.

Indeed there are holes in every direction when you pull in such a transformation, and the cost of plugging every one of them would be prohibitive; the next batch of c2d supposedly will leave you with ~6 cycles to make it worthwhile for a sqrt. Of course it only gets worse when you start composing.

My point merely was that, considering one operation, you'd introduce a NaN for a not so special value (0) which, in a *fast* math scenario, could be produced at any previous stage due to denormal clamping, with no sane way to take care of it. Again, if you look at prior art (icc, AMD's manual...), that's the only special case they covered. Admittedly that's a trade-off, but not that unreasonable.

Now, an option to remove such transformations from the -ffast-math bag-o-tricks would be fine and would still buy gcc some Spec bragging rights :)
Re: Activate -mrecip with -ffast-math?
Bradley Lucier wrote:
> If -ffinite-math-only is specified, then producing NaN instead of inf should be allowed.

Agreed. After all, -ffinite-math-only says:

  Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.

Since the compiler can assume the output isn't a NaN or an Inf, it can freely switch one and the other.

> If -fno-finite-math-only is specified, then the generated code should do the right thing if an argument or result is inf or NaN.

Also agreed.

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
Re: Activate -mrecip with -ffast-math?
On 6/17/07, Uros Bizjak [EMAIL PROTECTED] wrote:
> Hello!
>
>> I was wondering if there are objections to automatically activating Uros' new -mrecip flag when -ffast-math is specified. It looks like a good match since -mrecip is exactly about fast non-precise mathematics.
>
> There is a discussion on the gcc-patches@ mailing list about this topic, in the "Re: [PATCH, middle-end, i386]: reciprocal rsqrt pass + full recip x86 backend support" thread [1]. The main problem is that one of the polyhedron tests segfaults with this patch (not a problem of the recip patch itself, but of the test's use of questionable FP equivalence tests and FP indexes into an array).

Of course there are cases with every optimization enabled by -ffast-math that can break existing programs. The fact that we know of one case beforehand shouldn't prevent us from enabling -mrecip at -ffast-math (provided -mno-recip still works, regardless of whether it is given before or after -ffast-math). [We'll at least get some more testing coverage this way]

Richard.
Re: Activate -mrecip with -ffast-math?
On 6/18/07, Richard Guenther [EMAIL PROTECTED] wrote:
> Of course there are cases with every optimization enabled by -ffast-math that can break existing programs. The fact that we know of one case beforehand shouldn't prevent us from enabling -mrecip at -ffast-math (provided -mno-recip still works, regardless of whether it is given before or after -ffast-math). [We'll at least get some more testing coverage this way]

Argh! Please do not make -ffast-math even more of a pain to work with than it is already. You have to enable it, on the whole compilation unit, to get anywhere near decent performance; there's no escape: either you do not turn it on and everything slows to a crawl, or you pay for not being able to inline from another unit.

Until now, the contract was: you have to deal with (and contain) NaN and infinities. Fair enough, even if tricky that remained manageable. But if i can't expect a mere division by 0, or sqrt of 0 (quite common with FTZ/DAZ on) to give me respectively an infinity and 0 and instead get a NaN (which i can't filter, you remember?) because of the NR round, that's pure madness.

So please, for the love of everything that's sacred, leave such stunts out of -ffast-math.

PS: and it's not like such reciprocals + NR couldn't be done with intrinsics, or couldn't easily handle such a common case.
Re: Activate -mrecip with -ffast-math?
On 6/18/07, tbp [EMAIL PROTECTED] wrote:
> On 6/18/07, Richard Guenther [EMAIL PROTECTED] wrote:
>> Of course there are cases with every optimization enabled by -ffast-math that can break existing programs. The fact that we know of one case beforehand shouldn't prevent us from enabling -mrecip at -ffast-math (provided -mno-recip still works, regardless of whether it is given before or after -ffast-math). [We'll at least get some more testing coverage this way]
>
> Argh! Please do not make -ffast-math even more of a pain to work with than it is already. You have to enable it, on the whole compilation unit, to get anywhere near decent performance; there's no escape: either you do not turn it on and everything slows to a crawl, or you pay for not being able to inline from another unit.
>
> Until now, the contract was: you have to deal with (and contain) NaN and infinities. Fair enough, even if tricky that remained manageable.

No, that's not the contract with -ffast-math. Note that -ffast-math enables -funsafe-math-optimizations, which is allowed to change results (add/remove rounding operations, contract expressions, do transforms like a/b to a * 1/b, do transformations that get you bigger errors than 0.5ulp, etc.)

> But if i can't expect a mere division by 0, or sqrt of 0 (quite common with FTZ/DAZ on) to give me respectively an infinity and 0 and instead get a NaN (which i can't filter, you remember?) because of the NR round, that's pure madness.

Hm, which particular case are you concerned about (maybe it was mentioned, but I don't remember the details)? Note that -ffast-math enables -ffinite-math-only as well, so the compiler assumes nothing will result in NaNs or Infs.

> So please, for the love of everything that's sacred, leave such stunts out of -ffast-math.

Well - certainly another reason for the Math BOF ;) We all expect very different things from -ffast-math or -funsafe-math-optimizations.

> PS: and it's not like such reciprocals + NR couldn't be done with intrinsics, or couldn't easily handle such a common case.

Well, most optimization challenges can be solved if we are allowed to touch the source ;)

Thanks,
Richard.
Re: Activate -mrecip with -ffast-math?
On 6/18/07, tbp [EMAIL PROTECTED] wrote:
> Until now, the contract was: you have to deal with (and contain) NaN and infinities. Fair enough, even if tricky that remained manageable. But if i can't expect a mere division by 0, or sqrt of 0 (quite common with FTZ/DAZ on) to give me respectively an infinity and 0 and instead get a NaN (which i can't filter, you remember?) because of the NR round, that's pure madness.

The attached patch should fix these troubles for the cost of 2 extra clocks. The trick is to limit the result just below infinity for rsqrt, and this keeps 0.0*(inf-) -> 0.0.

Uros.

Index: i386.c
===
--- i386.c (revision 125790)
+++ i386.c (working copy)
@@ -22590,7 +22590,7 @@ void ix86_emit_swdivsf (rtx res, rtx a,
 void ix86_emit_swsqrtsf (rtx res, rtx a, enum machine_mode mode, bool recip)
 {
-  rtx x0, e0, e1, e2, e3, three, half;
+  rtx x0, e0, e1, e2, e3, three, half, bignum;

   x0 = gen_reg_rtx (mode);
   e0 = gen_reg_rtx (mode);
@@ -22600,15 +22600,18 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,

   three = CONST_DOUBLE_FROM_REAL_VALUE (dconst3, SFmode);
   half = CONST_DOUBLE_FROM_REAL_VALUE (dconsthalf, SFmode);
+  bignum = gen_lowpart (SFmode, GEN_INT (0x7f7f));

   if (VECTOR_MODE_P (mode))
     {
       three = ix86_build_const_vector (SFmode, true, three);
       half = ix86_build_const_vector (SFmode, true, half);
+      bignum = ix86_build_const_vector (SFmode, true, bignum);
     }

   three = force_reg (mode, three);
   half = force_reg (mode, half);
+  bignum = force_reg (mode, bignum);

   /* sqrt(a) = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a))
      1.0 / sqrt(a) = 0.5 * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)) */
@@ -22617,6 +22620,9 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,
   emit_insn (gen_rtx_SET (VOIDmode, x0,
                           gen_rtx_UNSPEC (mode, gen_rtvec (1, a),
                                           UNSPEC_RSQRT)));

+  emit_insn (gen_rtx_SET (VOIDmode, x0,
+                          gen_rtx_SMIN (mode, x0, bignum)));
+
   /* e0 = x0 * a */
   emit_insn (gen_rtx_SET (VOIDmode, e0,
                           gen_rtx_MULT (mode, x0, a)));
Activate -mrecip with -ffast-math?
Hello,

I was wondering if there are objections to automatically activating Uros' new -mrecip flag when -ffast-math is specified. It looks like a good match since -mrecip is exactly about fast non-precise mathematics.

--
Giovanni Bajo
Re: Activate -mrecip with -ffast-math?
Hello!

> I was wondering if there are objections to automatically activating Uros' new -mrecip flag when -ffast-math is specified. It looks like a good match since -mrecip is exactly about fast non-precise mathematics.

There is a discussion on the gcc-patches@ mailing list about this topic, in the "Re: [PATCH, middle-end, i386]: reciprocal rsqrt pass + full recip x86 backend support" thread [1]. The main problem is that one of the polyhedron tests segfaults with this patch (not a problem of the recip patch itself, but of the test's use of questionable FP equivalence tests and FP indexes into an array).

Uros.

[1]: http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01146.html
Re: Activate -mrecip with -ffast-math?
On 17/06/2007 20.20, Uros Bizjak wrote:
>> I was wondering if there are objections to automatically activating Uros' new -mrecip flag when -ffast-math is specified. It looks like a good match since -mrecip is exactly about fast non-precise mathematics.
>
> There is a discussion on the gcc-patches@ mailing list about this topic, in the "Re: [PATCH, middle-end, i386]: reciprocal rsqrt pass + full recip x86 backend support" thread [1]. The main problem is that one of the polyhedron tests segfaults with this patch (not a problem of the recip patch itself, but of the test's use of questionable FP equivalence tests and FP indexes into an array).

My own humble 2c on this is that what Roger Sayle calls the black-and-white approach is what most users understand. I am no expert on floating-point arithmetic standards; I do understand that by default GCC follows the standards very closely, and that -ffast-math is the option for less accuracy, more speed. Simple users have simple needs.

I reckon simple users like me want an option that means: activate all options that speed up floating-point calculations at the cost of accuracy. I believe that option is -ffast-math today. If that's the semantics of the option, then -mrecip should be added to it.

But if you dispute this, and you believe that the current semantics of -ffast-math are different (that is: there is a track record of -ffast-math including only a selection of optimizations, chosen by some standard, like -O2, which doesn't mean every optimization), that's fine by me too. But please, give me a -ffaster-math or -fuber-fast-math that really means "turn on everything", thanks.

Either way, -ffast-math should be documented to explain its intended semantics, and not only how those semantics are currently implemented in GCC. This way, this discussion will not need to be reopened in the future.

--
Giovanni Bajo