On Thu, Apr 7, 2016 at 4:35 PM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> Many shaders contain expression trees of the form:
>
>     const_1 * (value * const_2)
>
> Reorganizing these to
>
>     (const_1 * const_2) * value
>
> will allow constant folding to combine the constants. Sometimes, these
> constants are 2 and 0.5, so we can remove a multiply altogether. Other
> times, it can create more immediate constants, which can actually hurt.
>
> Finding a good balance here is tricky. While much more could be done,
> this simple patch seems to have a lot of positive benefit while having
> a low downside.
>
> shader-db results on Broadwell:
>
> total instructions in shared programs: 8963768 -> 8961369 (-0.03%)
> instructions in affected programs: 438318 -> 435919 (-0.55%)
> helped: 1502
> HURT: 245
>
> total cycles in shared programs: 71527354 -> 71421516 (-0.15%)
> cycles in affected programs: 11541788 -> 11435950 (-0.92%)
> helped: 3445
> HURT: 1224
>
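The rewrite described in the quoted message can be sketched on a toy expression tree. This is a minimal C sketch under stated assumptions, not the actual NIR pass: `struct node`, `is_const()`, and `reassociate_const_mul()` are hypothetical names, multiplication is treated as commutative, and nodes are updated in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: constants, opaque values, and binary multiplies. */
enum node_kind { NODE_CONST, NODE_VAR, NODE_MUL };

struct node {
    enum node_kind kind;
    float value;              /* meaningful for NODE_CONST only */
    struct node *lhs, *rhs;   /* meaningful for NODE_MUL only */
};

static bool is_const(const struct node *n)
{
    return n->kind == NODE_CONST;
}

/* Hypothetical rewrite: turn const_1 * (value * const_2) into
 * (const_1 * const_2) * value, folding both constants into the outer one.
 * Either operand of each multiply may be the constant. The inner multiply
 * and the inner constant are simply left unused. Returns true if the tree
 * was changed. */
static bool reassociate_const_mul(struct node *n)
{
    assert(n != NULL);
    if (n->kind != NODE_MUL)
        return false;

    /* Find const_1 on either side of the outer multiply. */
    struct node *outer_const = is_const(n->lhs) ? n->lhs :
                               is_const(n->rhs) ? n->rhs : NULL;
    if (outer_const == NULL)
        return false;

    /* The other operand must itself be a multiply. */
    struct node *inner = (outer_const == n->lhs) ? n->rhs : n->lhs;
    if (inner->kind != NODE_MUL)
        return false;

    /* Find const_2 on either side of the inner multiply. */
    struct node *inner_const = is_const(inner->lhs) ? inner->lhs :
                               is_const(inner->rhs) ? inner->rhs : NULL;
    if (inner_const == NULL)
        return false;
    struct node *value = (inner_const == inner->lhs) ? inner->rhs : inner->lhs;

    /* Constant-fold, then hang `value` directly off the outer multiply. */
    outer_const->value *= inner_const->value;
    n->lhs = outer_const;
    n->rhs = value;
    return true;
}
```

With const_1 = 2.0 and const_2 = 0.5 the folded constant is 1.0, the case the quoted message mentions where a later pass could drop the multiply altogether. Since floating-point multiplication is not associative in general, a real compiler pass would presumably apply this only where inexact reassociation is acceptable.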
The series is

Reviewed-by: Matt Turner <matts...@gmail.com>

The shaders most hurt by this patch are... funny. They do

    s_texcoord_0 = texcoord + offset * vec4(-1.5,-1.5,-0.5,-1.5);
    s_texcoord_1 = texcoord + offset * vec4( 0.5,-1.5, 1.5,-1.5);
    s_texcoord_2 = texcoord + offset * vec4(-1.5,-0.5,-0.5,-0.5);
    s_texcoord_3 = texcoord + offset * vec4( 0.5,-0.5, 1.5,-0.5);
    s_texcoord_4 = texcoord + offset * vec4(-1.5, 0.5,-0.5, 0.5);
    s_texcoord_5 = texcoord + offset * vec4( 0.5, 0.5, 1.5, 0.5);
    s_texcoord_6 = texcoord + offset * vec4(-1.5, 1.5,-0.5, 1.5);
    s_texcoord_7 = texcoord + offset * vec4( 0.5, 1.5, 1.5, 1.5);

Today, we generate 8 MOV instructions with VF immediates. We could have
just loaded 0.5, -0.5, 1.5, and -1.5 with a single VF immediate and then
swizzled that as needed, but how would we recognize that all of these can
be combined? NIR CSE? Part of the difficulty is that each of the vec4s in
the source language includes at most 3 of the 4 immediate floats; none
contains all four.

Anyway, the programs go from 28 -> 38 instructions because, when things
are multiplied in a different order, the constants become 0.125, -0.125,
0.375, and -0.375, and since ±0.125 isn't representable as a VF we
generate even more silly instructions!

If we could recognize all of these as swizzles of vec4(0.5, -0.5, 1.5, -1.5)
at the NIR level, we could cut those hurt programs down by a lot.

Everything else just looks like noise to me.

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
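Why ±0.125 costs extra instructions can be illustrated with a small representability check. This is a hypothetical helper (`vf_representable()` is not a Mesa function), written as a sketch that assumes the restricted 8-bit "VF" float layout: 1 sign bit, a 3-bit exponent with a bias of 3 (unbiased range -3..4), and a 4-bit mantissa, with the encoding that would otherwise mean ±0.125 reserved for zero, consistent with the statement above that ±0.125 isn't representable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The bit manipulation below assumes a 32-bit IEEE-754 float. */
static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

/* Hypothetical check: can f be encoded as an assumed 8-bit VF immediate
 * (1 sign bit, 3-bit exponent biased by 3, 4-bit mantissa)? */
static bool vf_representable(float f)
{
    /* +0.0 and -0.0 have dedicated encodings. */
    if (f == 0.0f)
        return true;

    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the IEEE-754 bits */

    int exponent = (int)((bits >> 23) & 0xff) - 127;
    uint32_t mantissa = bits & 0x007fffff;

    /* Only the top 4 of the 23 mantissa bits may be set. */
    if (mantissa & 0x7ffff)
        return false;

    /* The unbiased exponent must fit in 3 bits with a bias of 3.
     * This also rejects denormals, infinities, and NaNs. */
    if (exponent < -3 || exponent > 4)
        return false;

    /* Assumed: exponent -3 with a zero mantissa (±0.125) would share the
     * encoding used for zero, so it is rejected. */
    if (exponent == -3 && mantissa == 0)
        return false;

    return true;
}
```

Under these assumptions 0.5, -0.5, 1.5, and -1.5 all pass, as do ±0.375 (1.5 * 2^-2), while ±0.125 is rejected. That matches the reply's explanation of why the reordered constants need extra instructions.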