Kenneth Graunke <[email protected]> writes: > On 09/22/2012 02:04 PM, Eric Anholt wrote: >> This makes a giant pile of code newly dead. It also fixes TXB on newer >> chipsets, which has been totally broken (I now have a piglit test for that). >> It passes the same set of Ian's ARB_fragment_program tests. It also improves >> high-settings ETQW performance by 3.2 +/- 1.9% (n=3), thanks to better >> optimization and having 8-wide along with 16-wide shaders. >> --- >> src/mesa/drivers/dri/i965/Makefile.sources | 1 + >> src/mesa/drivers/dri/i965/brw_fs.cpp | 36 +- >> src/mesa/drivers/dri/i965/brw_fs.h | 30 +- >> src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 22 +- >> src/mesa/drivers/dri/i965/brw_fs_fp.cpp | 781 >> ++++++++++++++++++++++++++ >> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 +- >> src/mesa/drivers/dri/i965/brw_wm.c | 58 +- >> src/mesa/drivers/dri/i965/brw_wm_state.c | 19 +- >> src/mesa/drivers/dri/i965/gen6_wm_state.c | 8 +- >> src/mesa/drivers/dri/i965/gen7_wm_state.c | 8 +- >> 10 files changed, 857 insertions(+), 109 deletions(-) >> create mode 100644 src/mesa/drivers/dri/i965/brw_fs_fp.cpp > > I think the LIT code may be broken (comments inline), and one comment is > wrong. Assuming you fix (or refute) those, then patches 1-8 are: > Reviewed-by: Kenneth Graunke <[email protected]> > > I haven't read through 9 and 10 yet, but I plan to soon.
>> +void
>> +fs_visitor::emit_fragment_program_code()
>> +{
>> + setup_fp_regs();
>> +
>> + fs_reg null = fs_reg(brw_null_reg());
>> +
>> + /* Keep a reg with 0.0 around, for reuse use by emit_sop so that it can
>
> "Keep a reg with 1.0 around, for reuse by emit_fp_sop"
> ^^^ (not 0.0) ^^ (function name)
fixed
>> + case OPCODE_DP2:
>> + case OPCODE_DP3:
>> + case OPCODE_DP4:
>> + case OPCODE_DPH: {
>> + fs_reg mul = fs_reg(this, glsl_type::float_type);
>> + fs_reg acc = fs_reg(this, glsl_type::float_type);
>> + int count;
>> +
>> + switch (fpi->Opcode) {
>> + case OPCODE_DP2: count = 2; break;
>> + case OPCODE_DP3: count = 3; break;
>> + case OPCODE_DP4: count = 4; break;
>> + case OPCODE_DPH: count = 3; break;
>> + default: assert(!"not reached"); count = 0; break;
>> + }
>> +
>> + emit(BRW_OPCODE_MUL, acc,
>> + regoffset(src[0], 0), regoffset(src[1], 0));
>> + for (int i = 1; i < count; i++) {
>> + emit(BRW_OPCODE_MUL, mul,
>> + regoffset(src[0], i), regoffset(src[1], i));
>> + emit(BRW_OPCODE_ADD, acc, acc, mul);
>> + }
>
> Future optimization: MAD would be nice here, but that can be done later.
Yeah, or even MACs. This is a codegen quality regression from
brw_wm_*.c.
>> + case OPCODE_LIT:
>> + /* From the ARB_fragment_program spec:
>> + *
>> + * tmp = VectorLoad(op0);
>> + * if (tmp.x < 0) tmp.x = 0;
>> + * if (tmp.y < 0) tmp.y = 0;
>> + * if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon);
>> + * else if (tmp.w > 128-epsilon) tmp.w = 128-epsilon;
>> + * result.x = 1.0;
>> + * result.y = tmp.x;
>> + * result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) :
>> 0.0;
>> + * result.w = 1.0;
>> + */
>> + if (fpi->DstReg.WriteMask & WRITEMASK_X)
>> + emit(BRW_OPCODE_MOV, regoffset(dst, 0), fs_reg(1.0f));
>> +
>> + if (fpi->DstReg.WriteMask & WRITEMASK_YZ) {
>> + fs_inst *inst;
>> + inst = emit(BRW_OPCODE_CMP, null,
>> + regoffset(src[0], 0), fs_reg(0.0f));
>> + inst->conditional_mod = BRW_CONDITIONAL_LE;
>> +
>> + if (fpi->DstReg.WriteMask & WRITEMASK_Y) {
>> + emit(BRW_OPCODE_MOV, regoffset(dst, 1), regoffset(src[0],
>> 0));
>> + inst = emit(BRW_OPCODE_MOV, regoffset(dst, 1), fs_reg(0.0f));
>> + inst->predicated = true;
>> + }
>> +
>> + if (fpi->DstReg.WriteMask & WRITEMASK_Z) {
>> + emit_math(SHADER_OPCODE_POW, regoffset(dst, 2),
>> + regoffset(src[0], 1), regoffset(src[0], 3));
>> +
>> + inst = emit(BRW_OPCODE_MOV, regoffset(dst, 2), fs_reg(0.0f));
>> + inst->predicated = true;
>
> This looks broken...don't you need to handle clamping to (-128, 128)?
Ah, I lifted this code directly from the former backend. I'll put in a
note that I didn't change behavior.
I don't know of any use for the -128,128 clamping. I think it's just a
spec artifact from this extension being written for a particular
hardware instruction set. We should probably fix it for thoroughness,
but I don't expect app behavior to change. Do you want to see that done
in this series?
>> + case OPCODE_SCS:
>> + if (fpi->DstReg.WriteMask & WRITEMASK_X) {
>> + emit_math(SHADER_OPCODE_COS, regoffset(dst, 0),
>> + regoffset(src[0], 0));
>> + }
>> +
>> + if (fpi->DstReg.WriteMask & WRITEMASK_Y) {
>> + emit_math(SHADER_OPCODE_SIN, regoffset(dst, 1),
>> + regoffset(src[0], 1));
>> + }
>> + break;
>
> Future optimization: we could use the actual SINCOS math instruction
> when asking for WRITEMASK_XY. But I don't know how common that is.
I haven't seen one yet. If we did, given that GLSL doesn't have a
sincos, we probably would want to do it as a peephole optimization.
(I have seen sincos-applicable shaders on the vertex side, though. Not
well-written ones, just examples)
pgpciDfRlTI50.pgp
Description: PGP signature
_______________________________________________ mesa-dev mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/mesa-dev
