On 9 August 2013 17:14, Chris Forbes <chr...@ijw.co.nz> wrote: > V2: - Use the new VS_OPCODE_UNPACK_FLAGS_SIMD4X2 to correctly split the > flags for the two vertices being processed together. > - Don't apply bogus masking of clip flags. The set of plane enables > aren't included in the shader key, and we wouldn't want the > recompiles anyway. > > Signed-off-by: Chris Forbes <chr...@ijw.co.nz> > --- > src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 25 > ++++++++++++++----------- > 1 file changed, 14 insertions(+), 11 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp > index f80777b..5e007de 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp > @@ -2625,7 +2625,6 @@ vec4_visitor::emit_psiz_and_flags(struct brw_reg reg) > dst_reg header1 = dst_reg(this, glsl_type::uvec4_type); > dst_reg header1_w = header1; > header1_w.writemask = WRITEMASK_W; > - GLuint i; > > emit(MOV(header1, 0u)); > > @@ -2637,18 +2636,22 @@ vec4_visitor::emit_psiz_and_flags(struct brw_reg > reg) > emit(AND(header1_w, src_reg(header1_w), 0x7ff << 8)); > } > > - current_annotation = "Clipping flags"; > - for (i = 0; i < key->nr_userclip_plane_consts; i++) { > - vec4_instruction *inst; > - gl_varying_slot slot = (prog_data->vue_map.slots_valid & > VARYING_BIT_CLIP_VERTEX) > - ? VARYING_SLOT_CLIP_VERTEX : VARYING_SLOT_POS; > + if (key->userclip_active) { > + current_annotation = "Clipping flags"; > + dst_reg temp = dst_reg(this, glsl_type::uint_type); > + dst_reg temp2 = dst_reg(this, glsl_type::uint_type); > > - inst = emit(DP4(dst_null_f(), src_reg(output_reg[slot]), > - src_reg(this->userplane[i]))); > - inst->conditional_mod = BRW_CONDITIONAL_L; > + emit(CMP(dst_null_f(), > src_reg(output_reg[VARYING_SLOT_CLIP_DIST0]), src_reg(0.0f), > BRW_CONDITIONAL_L)); > + emit(VS_OPCODE_UNPACK_FLAGS_SIMD4X2, temp, src_reg(0)); > + emit(AND(temp2, src_reg(temp), src_reg(0x0fu))); >
I think this AND is unnecessary. VS_OPCODE_UNPACK_FLAGS_SIMD4X2 is guaranteed to produce outputs that only have bits 0-3 set. > > - inst = emit(OR(header1_w, src_reg(header1_w), 1u << i)); > - inst->predicate = BRW_PREDICATE_NORMAL; > + emit(CMP(dst_null_f(), > src_reg(output_reg[VARYING_SLOT_CLIP_DIST1]), src_reg(0.0f), > BRW_CONDITIONAL_L)); > + emit(VS_OPCODE_UNPACK_FLAGS_SIMD4X2, temp, src_reg(0)); > + emit(AND(temp, src_reg(temp), src_reg(0x0fu))); > Same with this AND. > + emit(SHL(temp, src_reg(temp), src_reg(4))); > + emit(OR(temp2, src_reg(temp2), src_reg(temp))); > + > + emit(OR(header1_w, src_reg(header1_w), src_reg(temp2))); > Slight nit-pick: re-using temp registers introduces extra dependencies, making it more difficult for the instruction scheduler to produce efficient code. It probably won't produce a significant performance difference, but you might want to consider using a separate temp register for each intermediate result. It shouldn't create a sizeable amount of extra register pressure since this code executes at the end ofthe shader anyhow. > } > > /* i965 clipping workaround: > -- > 1.8.3.4 > All of my suggestions on this patch are minor, so whether or not you decide to follow them, this patch is: Reviewed-by: Paul Berry <stereotype...@gmail.com>
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev