On Thu, 15 Jan 2015 15:32:59 +0100, Roland Scheidegger
<srol...@vmware.com> wrote:
On 15.01.2015 at 10:05, Iago Toral wrote:
Hi,
We have 16 dEQP tests that fail, at least on i965, because of
insufficient precision of the GLSL mod function.
Mesa lowers mod(x,y) to y * fract(x,y), so some precision can be
lost in the fract operation. Since the result is multiplied by y, the
total precision loss usually grows with the value of y.
Did you mean fract(x/y) here?
Below are some examples to give an idea of the magnitude of this error.
The values on the right represent the precision error for each case:
mod(-1.951171875, 1.9980468750) => 0.0000000447
mod(121.57, 13.29) => 0.0000023842
mod(3769.12, 321.99) => 0.0000762939
mod(3769.12, 1321.99) => 0.0001220703
mod(-987654.125, 123456.984375) => 0.0160663128
mod( 987654.125, 123456.984375) => 0.0312500000
As you see, for large enough values, the precision error becomes
significant.
This can be fixed by lowering mod(x,y) to x - y * floor(x/y) instead,
which is the suggested implementation in the GLSL docs. I have a local
patch in my tree that does this and it does indeed fix the problem. The
downside is that this implementation adds an extra ADD instruction to
the generated code (besides replacing fract with floor, which I guess
has similar cost).
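The difference between the two lowerings can be reproduced on the CPU. The following is only a minimal sketch: it emulates float32 arithmetic in Python by rounding to single precision after every operation. The helper names (f32, mod_fract, mod_floor) are made up for illustration, and the sketch assumes a correctly rounded division and an unfused multiply and subtract. Real GPUs typically use a reciprocal followed by a multiply, so the exact error values may differ from the ones listed above.

```python
import math
import struct


def f32(v):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def mod_fract(x, y):
    """Current lowering: y * fract(x / y), rounded to float32 after each step."""
    x, y = f32(x), f32(y)
    q = f32(x / y)                   # assumption: correctly rounded division
    frac = f32(q - math.floor(q))    # fract(q)
    return f32(y * frac)


def mod_floor(x, y):
    """Proposed lowering: x - y * floor(x / y), multiply and subtract rounded separately."""
    x, y = f32(x), f32(y)
    q = f32(x / y)                   # assumption: correctly rounded division
    prod = f32(y * math.floor(q))
    return f32(x - prod)


if __name__ == "__main__":
    # Both inputs are exactly representable in float32.
    x, y = -987654.125, 123456.984375
    # Reference value, evaluated in double precision.
    exact = x - y * math.floor(x / y)
    for name, fn in (("y * fract(x/y)", mod_fract),
                     ("x - y * floor(x/y)", mod_floor)):
        r = fn(x, y)
        print(f"{name:20s} = {r:.10f}  (abs error {abs(r - exact):.10f})")
```

Under these assumptions the floor form returns exactly 1.75 for this input pair, while the fract form is off by roughly 0.016, which is in line with the figure quoted above for the same inputs. For other inputs the two forms can end up with errors of similar size in this emulation, so this is an illustration rather than a proof.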
Since this is a case where there is some trade-off to the fix, I wonder
if we are interested in doing this or not. Is the precision fix worth
the additional ADD?
Well I can tell you that llvmpipe implements frc(x) as x - floor(x), so
this change looks good to me :-).
On a more serious note though, it looks to me like the cost of this
expression would be mostly dominated by the division, hence one more
add shouldn't be that bad. And if the test is legit, I don't think
there's much choice (unless you could make this optional for some old
GLSL versions if they didn't require that much precision, but even then
it's probably not worth bothering imho).
FWIW, I just typed out the following little piglit test and tried it on
R600:
[require]
GLSL >= 3.30
[vertex shader passthrough]
[fragment shader]
uniform float a;
uniform float b;
out vec4 colour;
void
main(void)
{
    // colour = vec4(b * fract(a / b)); // current lowering of mod(x,y)
    colour = vec4(a - b * floor(a / b)); // proposed lowering
}
[test]
clear color 0.5 0.5 0.5 0.5
clear
uniform float a 4.2
uniform float b 3.5
draw rect -1 -1 2 2
probe rgba 1 1 0.7 0.7 0.7 0.7
Resulting R600 assembly:
// y * fract(x/y)
// KC0[0].x is x and KC0[1].x is y
1 t: RECIP_IEEE T0.x, KC0[1].x
2 x: MUL T0.x, KC0[0].x, T0.x
3 x: FRACT T0.x, T0.x
4 x: MUL R0.x, KC0[1].x, T0.x
EXPORT_DONE PIXEL 0 R0.xxxx EOP
// x - y * floor(x/y)
1 t: RECIP_IEEE T0.x, KC0[1].x
2 x: MUL T0.x, KC0[0].x, T0.x
3 x: FLOOR T0.x, T0.x
4 x: MULADD R0.x, KC0[1].x, -T0.x, KC0[0].x
EXPORT_DONE PIXEL 0 R0.xxxx EOP
Same number of cycles/length of dependency chain/ALU pipe usage for both
methods.
I'd expect most architectures that can do a multiply-add with source
negate in a single operation to get similar results, with no extra cost
for the subtraction.
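As a side note on the multiply-add form: if the hardware multiply-add happens to be fused (a single rounding for the whole x - y * floor(x/y)), the result can also be more accurate than with a separate multiply and subtract. Whether a given GPU's MULADD rounds once or twice is hardware-specific and is not stated in this thread, so the following is only a sketch under that assumption. It emulates float32 in Python; all helper names are made up for illustration, the division is assumed to be correctly rounded, and the fused path is modelled with exact rational arithmetic followed by one rounding (normal range only).

```python
import math
import struct
from fractions import Fraction


def f32(v):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def round_fraction_to_f32(v):
    """Round an exact Fraction to float32 once (half-to-even, normal range only)."""
    if v == 0:
        return 0.0
    mag = abs(v)
    # Pick e so that 2**23 <= mag / 2**e < 2**24 (a 24-bit integer significand).
    e = math.frexp(float(mag))[1] - 24
    scaled = mag / Fraction(2) ** e
    while scaled >= 2 ** 24:
        e += 1
        scaled /= 2
    while scaled < 2 ** 23:
        e -= 1
        scaled *= 2
    n = round(scaled)                 # Fraction rounds half to even
    result = math.ldexp(n, e)
    return -result if v < 0 else result


def mod_floor_unfused(x, y):
    """x - y * floor(x / y) with the multiply and the subtract rounded separately."""
    x, y = f32(x), f32(y)
    n = math.floor(f32(x / y))
    return f32(x - f32(y * n))


def mod_floor_fused(x, y):
    """Same expression with one rounding at the end (assumed fused multiply-add)."""
    x, y = f32(x), f32(y)
    n = math.floor(f32(x / y))
    return round_fraction_to_f32(Fraction(x) - Fraction(y) * n)


if __name__ == "__main__":
    x, y = 987654.125, 123456.984375  # both exactly representable in float32
    print("unfused:", mod_floor_unfused(x, y))
    print("fused:  ", mod_floor_fused(x, y))
```

For this input pair the exact result is 123455.234375. In this emulation the fused variant returns exactly that value, while the unfused variant returns 123455.25 because the intermediate product y * 7 is not representable in float32.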
/Glenn
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev