Roland Scheidegger wrote:
> Rune Petersen wrote:
>> This patch:
>> - Fixes COS.
>> - Does range reductions for SIN & COS.
>> - Adds SCS.
>> - removes the optimized version of SIN & COS.
>> - tweaked weight (should help on precision).
>> - fixed a copy paste typo in emit_arith().
>>
>> Roland would you mind testing if the tweaked weight helped?
> Well I didn't test it first time (just quoting the numbers from the
> link you provided), but I guess that's fine too. I was actually
> wondering myself if it's better to optimize for absolute or relative
> error, so choosing a weight in-between should work too (the
> difference is not that big after all).
>
> A couple comments though:
> Since ((x + PI/2)/(2*PI))+0.5 is (x/(2*PI)) + (1/4 + 0.5) you could
> optimize away the first mad for the COS case.
Ah I see you're a bit short on consts, if you want to only use 2 (btw
I'd say there should be 32 not only 16 but I have no idea why the driver
restricts it to 16).
> Also, the comments for SCS seem a bit off. That's a pity, because
> without comments I can't really see what the code does at first sight
> :-). Looks like quite a few extra instructions though, are you sure
> not more could be shared for calculating both sin and cos?
I've looked a bit closer (this is an interesting optimization
problem...) and I think it should be doable with fewer instructions,
though ultimately I needed 2 temps instead of 1 (I don't think it's much
of a problem, 32 is plenty, PS2.0 only exposes 12).

Ok the equation was:
Q (4/pi x - 4/pi^2 x^2) + P (4/pi x - 4/pi^2 x^2)^2

Simplified to:
y = B * x + C * x * abs(x)
y = P * (y * abs(y) - y) + y

const0: B,C,pi,P
const1: 0.5pi, 0.75, 1/(2pi), 2.0pi

Here's what I came up with, in pseudo-code:

//should be 5 slots (I guess it might generate 6 due to force same-slot,
//but that needs fixing elsewhere)

//cos is even: cos(x) = cos(-x). So using simple trigo-fu
//we get sin(neg(abs(x)) + pi/2) = cos(x), no comparison needed and all
//values for sine stay inside [-pi,pi] ([-pi/2, pi/2], actually)
//hope it's ok to use neg+abs simultaneously?
temp.z = add(neg(abs(src)), const1.x)
temp.w = mul(src, C)

//temp.xy = B*x, C*x (cos), temp.w = C * x, temp2.w = B * x (sin)
temp.xy = mul(temp.z, BC)
temp2.w = mul(src, B)

//do cos in alpha slot not sin due to restricted swizzling
//sin y = B * x + C * x * abs(x)
temp2.z = mad(temp.w, abs(src), temp2.w)
//cos
temp2.w = mad(temp.y, abs(temp.z), temp.x)

temp.xy = mad(temp2.wzy, abs(temp2.wzy), neg(temp2.wzy))
// now temp.x holds y * abs(y) - y for cos, temp.y same for sin

dest.xy = mad(temp.xy, P, temp2.wzy)

range reduction for
cos:
x = (x/(2*PI))+0.75
x = frac(x)
x = (x*2*PI)-PI

sin:
x = (x/(2*PI))+HALF
x = frac(x)
x = (x*2*PI)-PI

Isn't that an elegant solution :-)
There may be any number of bugs, of course...

Roland
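The equations and range reduction in the message can be checked numerically on the CPU. The following is only a rough sketch in Python, not the driver code: the function names are made up for illustration, and P = 0.225 is an assumed weight (the message never states the tweaked value). B and C are read off the original equation as 4/pi and -4/pi^2, which implies Q = 1 - P.

```python
import math

# Constants read off "Q (4/pi x - 4/pi^2 x^2) + P (...)^2".
B = 4.0 / math.pi
C = -4.0 / math.pi ** 2
P = 0.225  # ASSUMPTION: the actual tweaked weight is not given in the message


def frac(v):
    """Fractional part, always in [0, 1), like the shader FRC instruction."""
    return v - math.floor(v)


def reduce_range(x, bias):
    """Map x into [-pi, pi). bias = 0.5 for sin, 0.75 for cos."""
    x = x / (2.0 * math.pi) + bias
    x = frac(x)
    return x * 2.0 * math.pi - math.pi


def parabola_sin(x):
    """Two-step parabola approximation of sin(x), valid for x in [-pi, pi]."""
    y = B * x + C * x * abs(x)
    return P * (y * abs(y) - y) + y


def approx_sin(x):
    return parabola_sin(reduce_range(x, 0.5))


def approx_cos(x):
    # The 0.75 bias folds the +pi/2 phase shift into the range reduction.
    return parabola_sin(reduce_range(x, 0.75))


def approx_scs(x):
    """SCS-style (cos, sin) pair for x already in [-pi, pi].

    cos(x) = sin(pi/2 - |x|), so no comparison and no second range
    reduction is needed; the sine argument stays inside [-pi/2, pi/2].
    """
    z = math.pi / 2.0 - abs(x)
    return parabola_sin(z), parabola_sin(x)
```

With the assumed P, comparing `approx_sin`, `approx_cos` and `approx_scs` against `math.sin` and `math.cos` over a few periods gives differences below 2e-3 in this sketch; a different weight would shift the balance between absolute and relative error.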