On Mon, Jan 17, 2011 at 10:40 PM, Eric Anholt<e...@anholt.net> wrote:
On Thu, 13 Jan 2011 17:40:39 +0100, Roland Scheidegger<srol...@vmware.com> wrote:
On 12.01.2011 23:04, Eric Anholt wrote:
This is a work-in-progress patch series to switch texenvprogram.c from
generating ARB_fp style Mesa IR to generating GLSL IR as its product.
For drivers without native GLSL codegen, that is then turned into the
Mesa IR that can be consumed. However, for 965 we don't use the Mesa
IR product and just use the GLSL output, producing much better code
thanks to the new backend. This is part of a long term goal to get
Mesa drivers off of Mesa IR and producing their instruction stream
directly from the GLSL IR.
I'm not planning on committing this series immediately, as I've still
got a regression in the 965 driver with texrect-many on the last
commit.
As a comparison, here's one of the shaders from openarena before:
So what does the code look like after conversion to Mesa IR? As long as
[SNIP]
So, there's one extra Mesa IR move added where we could compute into the
destination reg but don't. This is a general problem with
ir_to_mesa.cpp that affects GLSL pretty badly.
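To make the "extra move" concrete, here is a minimal sketch of the kind of peephole fold being described. It is not the logic of ir_to_mesa.cpp or _mesa_optimize_program; the `Inst` tuple and `fold_trailing_movs` are hypothetical names, and the model assumes whole-register writes (no writemasks or swizzles), which real Mesa IR does not guarantee.

```python
from typing import NamedTuple


class Inst(NamedTuple):
    """Hypothetical, simplified instruction: whole registers only."""
    op: str
    dst: str
    srcs: tuple


def fold_trailing_movs(insts):
    """Rewrite `OP tmp, ...; MOV dst, tmp` as `OP dst, ...`.

    The fold is only applied when tmp is a TEMP that no later
    instruction reads, so dropping the write to tmp is safe.
    """
    out = []
    i = 0
    while i < len(insts):
        cur = insts[i]
        nxt = insts[i + 1] if i + 1 < len(insts) else None
        foldable = (
            nxt is not None
            and nxt.op == "MOV"
            and nxt.srcs == (cur.dst,)
            and cur.dst.startswith("TEMP")
            and not any(cur.dst in later.srcs for later in insts[i + 2:])
        )
        if foldable:
            out.append(cur._replace(dst=nxt.dst))  # compute straight into dst
            i += 2                                 # skip the MOV
        else:
            out.append(cur)
            i += 1
    return out


prog = [
    Inst("MUL", "TEMP[1]", ("TEMP[0]", "INPUT[1]")),
    Inst("MOV", "OUTPUT[2]", ("TEMP[1]",)),
]
folded = fold_trailing_movs(prog)
```

With the two-instruction program above, `folded` holds a single MUL that writes OUTPUT[2] directly. If a later instruction still read TEMP[1], the pass would leave both instructions in place.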
I found pretty much the same thing when looking into tunnel:
# Fragment Program/Shader 0
0: TXP TEMP[0], INPUT[4].xyyw, texture[0], 2D;
1: MUL TEMP[1].xyz, TEMP[0], INPUT[1];
2: MOV TEMP[0].xyz, TEMP[1].xyzx;
3: MOV TEMP[0].w, INPUT[1].wwww;
4: MOV TEMP[2], TEMP[0];
5: MUL TEMP[0].x, INPUT[3].xxxx, STATE[1].wwww;
6: MUL TEMP[3].x, TEMP[0].xxxx, TEMP[0].xxxx;
7: EX2 TEMP[0].x, TEMP[3].-x-x-x-x;
8: MOV_SAT TEMP[3].x, TEMP[0].xxxx;
9: ADD TEMP[0].x, CONST[4].xxxx, TEMP[3].-x-x-x-x;
10: MUL TEMP[4].xyz, STATE[2].xyzz, TEMP[0].xxxx;
11: MAD TEMP[2].xyz, TEMP[1].xyzx, TEMP[3].xxxx, TEMP[4].xyzx;
12: MOV OUTPUT[2], TEMP[2];
13: END
# Fragment Program/Shader 0
0: TXP TEMP[0], INPUT[4], texture[0], 2D;
1: MUL_SAT TEMP[1].xyz, TEMP[0], INPUT[1];
2: MOV_SAT TEMP[1].w, INPUT[1];
3: MUL TEMP[2].x, STATE[0].wwww, INPUT[3].xxxx;
4: MUL TEMP[2].x, TEMP[2].xxxx, TEMP[2].xxxx;
5: EX2_SAT TEMP[2].x, TEMP[2].-x-x-x-x;
6: LRP OUTPUT[2].xyz, TEMP[2].xxxx, TEMP[1], STATE[1];
7: MOV OUTPUT[2].w, TEMP[1];
8: END
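For readers comparing the two dumps: the ADD/MUL/MAD sequence at instructions 9 to 11 of the first listing appears to compute the same fog blend as the single LRP in the second. The sketch below checks that on scalars, assuming the ARB_fragment_program definition LRP(a, b, c) = a*b + (1-a)*c and assuming CONST[4].x holds 1.0 (the dump does not show constant values). The function names are made up for illustration.

```python
import math


def lrp(a, b, c):
    # ARB_fragment_program LRP, per component: a * b + (1 - a) * c
    return a * b + (1.0 - a) * c


def add_mul_mad(fog, color, fog_color, one=1.0):
    # Mirrors instructions 9-11 of the first listing. `one` stands in
    # for CONST[4].x, which is ASSUMED to be 1.0.
    inv = one + (-fog)             # 9:  ADD  1 - fog
    scaled = fog_color * inv       # 10: MUL  fog_color * (1 - fog)
    return color * fog + scaled    # 11: MAD  color * fog + scaled


samples = [(0.0, 0.2, 0.9), (0.25, 0.5, 0.75), (1.0, 0.3, 0.6)]
same = all(
    math.isclose(lrp(f, c, fc), add_mul_mad(f, c, fc))
    for f, c, fc in samples
)
```

On these samples `same` is true, which is consistent with the longer sequence being a straightforward expansion of LRP rather than a different computation.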
I got similar results, though the effects are more visible here. Also
note that the new shader (the first listing) uses 5 temps compared to 3
for the old one. The FF setup, I think, only uses fog (or one texenv
modulate), so it's not just hard-to-program texenv setups that are
affected by this change.
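The instruction and temp counts can be checked mechanically. The following is a small sketch that parses the two dumps quoted above; `dump_stats` is a hypothetical helper, and it relies only on the `N: OPCODE ...` and `TEMP[n]` text layout visible in the listings.

```python
import re

FIRST_LISTING = """
0: TXP TEMP[0], INPUT[4].xyyw, texture[0], 2D;
1: MUL TEMP[1].xyz, TEMP[0], INPUT[1];
2: MOV TEMP[0].xyz, TEMP[1].xyzx;
3: MOV TEMP[0].w, INPUT[1].wwww;
4: MOV TEMP[2], TEMP[0];
5: MUL TEMP[0].x, INPUT[3].xxxx, STATE[1].wwww;
6: MUL TEMP[3].x, TEMP[0].xxxx, TEMP[0].xxxx;
7: EX2 TEMP[0].x, TEMP[3].-x-x-x-x;
8: MOV_SAT TEMP[3].x, TEMP[0].xxxx;
9: ADD TEMP[0].x, CONST[4].xxxx, TEMP[3].-x-x-x-x;
10: MUL TEMP[4].xyz, STATE[2].xyzz, TEMP[0].xxxx;
11: MAD TEMP[2].xyz, TEMP[1].xyzx, TEMP[3].xxxx, TEMP[4].xyzx;
12: MOV OUTPUT[2], TEMP[2];
13: END
"""

SECOND_LISTING = """
0: TXP TEMP[0], INPUT[4], texture[0], 2D;
1: MUL_SAT TEMP[1].xyz, TEMP[0], INPUT[1];
2: MOV_SAT TEMP[1].w, INPUT[1];
3: MUL TEMP[2].x, STATE[0].wwww, INPUT[3].xxxx;
4: MUL TEMP[2].x, TEMP[2].xxxx, TEMP[2].xxxx;
5: EX2_SAT TEMP[2].x, TEMP[2].-x-x-x-x;
6: LRP OUTPUT[2].xyz, TEMP[2].xxxx, TEMP[1], STATE[1];
7: MOV OUTPUT[2].w, TEMP[1];
8: END
"""


def dump_stats(dump):
    """Return (instruction count excluding END, distinct TEMP count)."""
    lines = [l for l in dump.strip().splitlines() if re.match(r"\s*\d+:", l)]
    insts = [l for l in lines
             if not l.split(":", 1)[1].strip().startswith("END")]
    temps = {int(n) for l in lines for n in re.findall(r"TEMP\[(\d+)\]", l)}
    return len(insts), len(temps)


first = dump_stats(FIRST_LISTING)    # (13, 5)
second = dump_stats(SECOND_LISTING)  # (8, 3)
```

So the first listing needs 13 instructions and 5 temps where the second needs 8 instructions and 3 temps.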
Now, looking at how this is generated, the new code seems to generate it
quite similarly to the old. After that, though, things get interesting:
after the generation step the old code is done and already in the
optimized form you see above. The new code, however, is far from done.
It first goes through various common GLSL IR optimization steps (see the
attached text file; the second and third shaders in the file are the
same, just with and without the inlining of GLSL IR). Finally it calls
_mesa_optimize_program, which gets it to its current form.
As for the code itself, it doesn't look as bad as I thought it would.
There are a lot of allocations and a fair bit of extra typing, though the
LOC count in the commit stays about the same, or is even slightly lower.
The reason is that texenv has its own implementation of ureg; not
counting that, a conversion to GLSL IR would add extra lines instead.
Of course, talking about optimality of Mesa IR is kind of a joke, as for
the drivers that directly consume it (i915, 965 VS, r200, and I'm
discounting r300+ as they have their own IR that Mesa IR gets translated
to and actually optimized), we miss huge opportunities to reduce
instruction count due to swizzle sources including -1, 0, 1 as options
but Mesa IR not taking advantage of it. If we were doing that right,
then the other MOV-reduction pass would hit and that extra move just
added here would go away, resulting in a net win.
This could be done with any of the IRs (provided numeric swizzling is
added), and it is something that I have been thinking about adding to
TGSI, as pretty much all hw supports it natively (the exception being
svga).
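As an illustration of what "numeric swizzling" buys, here is a minimal sketch of a source swizzle whose selectors may be a component, the constants 0 or 1, or a negated form of either. The string selector syntax and the `swizzle` function are hypothetical; they only mimic the idea (similar in spirit to the SWZ instruction in ARB_fragment_program), not any Mesa IR or TGSI encoding.

```python
def swizzle(src, selectors):
    """Apply an extended swizzle to a 4-component source.

    Each selector is one of "x", "y", "z", "w", "0", "1", optionally
    prefixed with "-" to negate the selected value.
    """
    out = []
    for sel in selectors:
        negate = sel.startswith("-")
        key = sel.lstrip("-")
        if key == "0":
            value = 0.0
        elif key == "1":
            value = 1.0
        else:
            value = src["xyzw".index(key)]
        out.append(-value if negate else value)
    return tuple(out)


reg = (0.25, 0.5, 0.75, 2.0)
mixed = swizzle(reg, ["x", "0", "1", "-1"])       # (0.25, 0.0, 1.0, -1.0)

# "1 - f" without a constant register: the 1 comes from the swizzle.
one_minus_f = swizzle(reg, ["1"] * 4)[0] + swizzle(reg, ["-x"] * 4)[0]
```

With such selectors, an operand like the CONST[4].xxxx in the ADD of the first listing would not need a constant slot at all if it only supplies 1.0 (an assumption, since the dump does not show constant values).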
Similarly, we add an extra indirection phase according to 915's
accounting of those on the second shader, but the fact that we don't
schedule those in our GLSL output anyway is a big issue for GLSL on
hardware with indirection limits.
it's not worse than the original I guess this should be ok, though for
those drivers consuming mesa IR I guess it's just more cpu time without
any real benefit?
Assuming that the setup the app did was already optimal for a
programmable GPU, yes. But I suspect that isn't generally the case --
while OA has reasonable looking fixed function setup (other than Mesa IR
we produce not using the swizzles), given how painful it is to program
using texenv I suspect there are a lot of "suboptimal" shader setups out
there that we could actually improve.
You posted some GLSL IR CPU optimization patches after pushing this
code, but only gave the delta between pre- and post-optimization. What
is the delta between the old Mesa IR code and the GLSL IR code? If you
didn't do any testing, can you give an estimate? We seem to be doing a
lot more CPU crunching for worse results.
For gallium we should probably address this some way
or another, it seems quite backward to do ff->glsl->mesa ir->tgsi.
I'm surprised you guys haven't forked off ir_to_mesa.cpp to something
that produces TGSI, since you seem to prefer it as the thing for drivers
to consume over GLSL IR.