On 05/08/2015 03:36 AM, Kenneth Graunke wrote:
> According to Glenn, shifts on R600 have 5x the throughput as multiplies.
>
> Intel GPUs have strange integer multiplication restrictions - on most
> hardware, MUL actually only does a 32-bit x 16-bit multiply. This
> means the arguments aren't commutative, which can limit our constant
> propagation options. SHL has no such restrictions.
>
> Shifting is probably reasonable on most people's hardware, so let's just
> do that.
>
> i965 shader-db results (using NIR for VS):
> total instructions in shared programs: 7432587 -> 7388982 (-0.59%)
> instructions in affected programs:     1360411 -> 1316806 (-3.21%)
> helped:                                5772
> HURT:                                  0
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> Cc: matts...@gmail.com
> Cc: ja...@jlekstrand.net
> ---
>  src/glsl/nir/nir_opt_algebraic.py | 5 +++++
>  1 file changed, 5 insertions(+)
>
> So...I found a bizarre issue with this patch.
>
>    (('imul', 4, a), ('ishl', a, 2)),
>
> totally optimizes things. However...
>
>    (('imul', a, 4), ('ishl', a, 2)),
>
> doesn't seem to do anything, even though imul is commutative, and nir_search
> should totally handle that...
>
> ▄▄ ▄▄ ▄▄ ▄▄▄▄▄▄▄▄ ▄▄▄▄▄ ▄▄
> ██ ██ ████ ▀▀▀██▀▀▀ █▀▀▀▀██ ██
> ▀█▄ ██ ▄█▀ ████ ██ ▄█▀ ██
> ██ ██ ██ ██ ██ ██ ▄██▀ ██
> ███▀▀███ ██████ ██ ██ ▀▀
> ███ ███ ▄██ ██▄ ██ ▄▄ ▄▄
> ▀▀▀ ▀▀▀ ▀▀ ▀▀ ▀▀ ▀▀ ▀▀
>
> If you know why, let me know, otherwise I may have to look into it when more
> awake.
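
For reference, the behaviour one would expect from a commutative match can be sketched in plain Python. This is a toy matcher, not nir_search: the names COMMUTATIVE, match and imul4_to_ishl are hypothetical, expressions are modelled as nested tuples, and pattern variables are plain strings. It only shows that a pattern written as ('imul', 4, a) should also fire on ('imul', a, 4) when both operand orders are tried.

```python
# Toy illustration only. This is NOT nir_search; every name is hypothetical.
COMMUTATIVE = {'imul', 'iadd'}


def match(pattern, expr, binds):
    """Match expr against pattern, recording variable bindings in binds."""
    if isinstance(pattern, str):
        # A pattern variable such as 'a': bind it, or check an earlier binding.
        if pattern in binds:
            return binds[pattern] == expr
        binds[pattern] = expr
        return True
    if isinstance(pattern, int):
        # A constant such as 4 must match exactly.
        return pattern == expr
    # Otherwise the pattern is a tuple: (opcode, src0, src1, ...).
    if (not isinstance(expr, tuple) or len(expr) != len(pattern)
            or expr[0] != pattern[0]):
        return False
    orders = [expr[1:]]
    if pattern[0] in COMMUTATIVE:
        # For commutative opcodes, also try the swapped operand order.
        orders.append(expr[1:][::-1])
    for srcs in orders:
        trial = dict(binds)  # do not leak bindings from a failed attempt
        if all(match(p, e, trial) for p, e in zip(pattern[1:], srcs)):
            binds.clear()
            binds.update(trial)
            return True
    return False


def imul4_to_ishl(expr):
    """Rewrite a multiply by 4 into a left shift by 2, if the pattern matches."""
    binds = {}
    if match(('imul', 4, 'a'), expr, binds):
        return ('ishl', binds['a'], 2)
    return expr
```

With this sketch, imul4_to_ishl(('imul', 4, 'x')) and imul4_to_ishl(('imul', 'x', 4)) both produce the shift, which is the result the patch was expected to give for either operand order.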

I've noticed a couple other weird things that I have been unable to
understand. Shaders like the one below end with fmul/ffma instead of
flrp, for example. I understand why that happens from GLSL IR
opt_algebraic, but it seems like nir_opt_algebraic should handle it.

[require]
GLSL >= 1.30

[vertex shader]
in vec4 v;
in vec2 tc_in;
out vec2 tc;

void main()
{
    gl_Position = v;
    tc = tc_in;
}

[fragment shader]
in vec2 tc;
out vec4 color;
uniform sampler2D s;
uniform float a;
uniform vec3 base_color;

void main()
{
    vec3 tex_color = texture(s, tc).xyz;
    color.xyz = (base_color * a) + (tex_color * (1.0 - a));
    color.a = 1.0;
}

> diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py
> index 400d60e..350471f 100644
> --- a/src/glsl/nir/nir_opt_algebraic.py
> +++ b/src/glsl/nir/nir_opt_algebraic.py
> @@ -247,6 +247,11 @@ late_optimizations = [
>     (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
>     (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
>     (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
> +
> +   # Multiplication by 4 comes up fairly often in indirect offset calculations.
> +   # Some GPUs have weird integer multiplication limitations, but shifts should work
> +   # equally well everywhere.
> +   (('imul', 4, a), ('ishl', a, 2)),

This should be conditionalized on whether the platform has native integers.

>  ]
>
>  print nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render()

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev