Re: [Mesa-dev] [PATCH] mesa: fix make check for ARB_texture_gather
On 10/02/2013 06:11 PM, Chris Forbes wrote:
> Clean up inconsistency in enum decoration:
> - Use the undecorated enums where possible.
> - MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB remains decorated, since it
>   has no undecorated equivalent in GL4.
>
> Signed-off-by: Chris Forbes chr...@ijw.co.nz
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70054

Reviewed-by: Kenneth Graunke kenn...@whitecape.org
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 70054] EnumStrings.LookUpByNumber regression
https://bugs.freedesktop.org/show_bug.cgi?id=70054

Chris Forbes chr...@ijw.co.nz changed:

           What    |Removed |Added
-----------------------------------------
         Status    |NEW     |RESOLVED
     Resolution    |---     |FIXED

--- Comment #1 from Chris Forbes chr...@ijw.co.nz ---
Fix is on master now.
Re: [Mesa-dev] gallium clear and depth mask clarification
I believe depth clears should not be affected by pipe_depth_state::writemask.

I suspect that the only reason the depth mask is not explicitly mentioned is that it is a boolean, unlike the color/stencil write masks, which are proper bitmasks. The depth write mask therefore carries no information beyond what the PIPE_CLEAR_DEPTH bit already expresses.

Jose

----- Original Message -----
> Just want to check an inconsistency: GL clears respect glDepthMask, but
> the gallium docs don't explicitly mention depth masking. They say clear
> isn't affected by color or stencil write masks; should that sentence
> mention depth as well?
>
> Dave.
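To make the point above concrete, here is a minimal toy model, not gallium code. All names (`TOY_CLEAR_DEPTH`, `ToyDepthState`, `toy_clear`, `toy_gl_clear_bits`) are hypothetical stand-ins, and the enum values are made up for the sketch. It assumes the clear looks only at the buffer bits, and that a GL front end that wants to honour glDepthMask would simply not request the depth bit.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; the values are NOT taken from the gallium headers.
enum : unsigned {
   TOY_CLEAR_DEPTH   = 1u << 0,
   TOY_CLEAR_STENCIL = 1u << 1,
};

struct ToyDepthState {
   bool writemask;   // consulted by draws in this model, never by clears
};

struct ToyDepthBuffer {
   std::vector<float> z;
};

// Clears depth whenever the DEPTH bit is set; the write mask is ignored.
inline void toy_clear(ToyDepthBuffer &buf, const ToyDepthState &,
                      unsigned buffers, float depth)
{
   if (buffers & TOY_CLEAR_DEPTH) {
      for (float &v : buf.z)
         v = depth;
   }
}

// One way a GL front end could respect glDepthMask on top of that:
// drop the depth bit before calling the clear.
inline unsigned toy_gl_clear_bits(bool gl_depth_mask, unsigned requested)
{
   return gl_depth_mask ? requested : (requested & ~TOY_CLEAR_DEPTH);
}
```

Since the depth mask is a single boolean, "masked" and "bit not requested" describe the same thing in this model, which is the argument made in the reply.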
Re: [Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event
> I don't think this is right, with this patch we remove *all* events from
> the command queue, signalled or not, every time the command queue is
> flushed.

You are right, I got the logic wrong here (see also http://lists.freedesktop.org/archives/mesa-dev/2013-September/044363.html).

The problem is that I have an application that causes a leak of event objects. That is, some events are never deleted from the queue. I will have to debug this further, but I am somewhat busy right now since I have just relocated.
Re: [Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event
> Do you have any example of a real world application that relies on this?
> Or at least some reasonable use case?

The problem is that the queue is only cleared from already signalled events when we flush it. And we might not do this if the user only calls clWaitForEvents once the corresponding event has already been signalled.

I am fine with not flushing the queue, but we should at least make sure that signalled events are freed early enough.
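The idea of freeing signalled events without a flush could look roughly like the following sketch. This is not clover code: `ToyEvent`, `ToyQueue` and `reap_signalled` are hypothetical names, and the sketch assumes the queue simply holds shared references to its events.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical event: only tracks whether it has signalled.
struct ToyEvent {
   bool signalled = false;
};

// Hypothetical queue holding references to the events submitted to it.
struct ToyQueue {
   std::vector<std::shared_ptr<ToyEvent>> queued;

   // Drop only the events that have already signalled; pending events
   // stay queued, so nothing unsignalled is lost.
   void reap_signalled()
   {
      queued.erase(
         std::remove_if(queued.begin(), queued.end(),
                        [](const std::shared_ptr<ToyEvent> &e) {
                           return e->signalled;
                        }),
         queued.end());
   }
};
```

A routine like this could be called from the wait path as well as from the flush path, so that signalled events do not accumulate when the application never triggers a flush.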
[Mesa-dev] [PATCH 1/3] gallivm: ignore rho approximation for cube maps
From: Roland Scheidegger srol...@vmware.com

There's two reasons for this:
1) even when ignoring rho approximation for cube maps, the result is
still not correct, but it's better as the max error at edges is now
sqrt(2) instead of 2 (which was a full mip level), same as it is for
ordinary 2d maps when doing rho approximations (so the error actually goes
from factor 2 at edges and sqrt(2) completely inside a face to sqrt(2) at
edges and 0 inside a face).
2) I want to repurpose rho_no_approx for cubemaps for fully correct cubemap
derivatives (so don't need yet another debug var).
---
 src/gallium/auxiliary/gallivm/lp_bld_sample.c | 34 +
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index c775382..ea6bec7 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -269,10 +269,8 @@ lp_build_rho(struct lp_build_sample_context *bld,
       /* Could optimize this for single quad just skip the broadcast */
       cubesize = lp_build_extract_broadcast(gallivm, bld->float_size_in_type,
                                             rho_bld->type, float_size, index0);
-      if (no_rho_opt) {
-         /* skipping sqrt hence returning rho squared */
-         cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
-      }
+      /* skipping sqrt hence returning rho squared */
+      cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
       rho = lp_build_mul(rho_bld, cubesize, rho);
    }
    else if (derivs && !(bld->static_texture_state->target == PIPE_TEXTURE_CUBE)) {
@@ -757,8 +755,8 @@ lp_build_lod_selector(struct lp_build_sample_context *bld,
    }
    else {
       LLVMValueRef rho;
-      boolean rho_squared = (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) &&
-                            (bld->dims > 1);
+      boolean rho_squared = ((gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) &&
+                             (bld->dims > 1)) || cube_rho;
 
       rho = lp_build_rho(bld, texture_unit, s, t, r, cube_rho, derivs);
 
@@ -1602,31 +1600,23 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
        * know the texture is square which simplifies things (we can omit the
        * size mul which happens very early completely here and do it at the
        * very end).
+       * Also always do calculations according to GALLIVM_DEBUG_NO_RHO_APPROX
+       * since the error can get quite big otherwise at edges.
+       * (With no_rho_approx max error is sqrt(2) at edges, same as it is
+       * without no_rho_approx for 2d textures, otherwise it would be factor 2.)
        */
       ddx_ddy[0] = lp_build_packed_ddx_ddy_twocoord(coord_bld, s, t);
       ddx_ddy[1] = lp_build_packed_ddx_ddy_onecoord(coord_bld, r);
 
-      if (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) {
-         ddx_ddy[0] = lp_build_mul(coord_bld, ddx_ddy[0], ddx_ddy[0]);
-         ddx_ddy[1] = lp_build_mul(coord_bld, ddx_ddy[1], ddx_ddy[1]);
-      }
-      else {
-         ddx_ddy[0] = lp_build_abs(coord_bld, ddx_ddy[0]);
-         ddx_ddy[1] = lp_build_abs(coord_bld, ddx_ddy[1]);
-      }
+      ddx_ddy[0] = lp_build_mul(coord_bld, ddx_ddy[0], ddx_ddy[0]);
+      ddx_ddy[1] = lp_build_mul(coord_bld, ddx_ddy[1], ddx_ddy[1]);
 
       tmp[0] = lp_build_swizzle_aos(coord_bld, ddx_ddy[0], swizzle01);
       tmp[1] = lp_build_swizzle_aos(coord_bld, ddx_ddy[0], swizzle23);
       tmp[2] = lp_build_swizzle_aos(coord_bld, ddx_ddy[1], swizzle02);
 
-      if (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) {
-         rho_vec = lp_build_add(coord_bld, tmp[0], tmp[1]);
-         rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]);
-      }
-      else {
-         rho_vec = lp_build_max(coord_bld, tmp[0], tmp[1]);
-         rho_vec = lp_build_max(coord_bld, rho_vec, tmp[2]);
-      }
+      rho_vec = lp_build_add(coord_bld, tmp[0], tmp[1]);
+      rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]);
 
       tmp[0] = lp_build_swizzle_aos(coord_bld, rho_vec, swizzle0);
       tmp[1] = lp_build_swizzle_aos(coord_bld, rho_vec, swizzle1);
-- 
1.7.9.5
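As a rough illustration of the sqrt(2) figure quoted in the commit message, the following sketch compares the exact length of a two-component derivative vector with the "largest absolute component" approximation. The helper names are hypothetical and this is scalar arithmetic only, not the gallivm vector code. It also shows why returning rho squared is convenient: the square root can be folded into the log2 as a factor of 0.5.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Exact squared length of a derivative vector (a, b); no sqrt is taken,
// in the spirit of the "skipping sqrt hence returning rho squared" comment.
inline double rho_squared_exact(double a, double b)
{
   return a * a + b * b;
}

// Approximation: the largest absolute component stands in for the length.
inline double rho_approx(double a, double b)
{
   return std::max(std::fabs(a), std::fabs(b));
}

// log2(sqrt(x)) == 0.5 * log2(x), so the lod can be taken from rho squared.
inline double lod_from_rho_squared(double rho2)
{
   return 0.5 * std::log2(rho2);
}
```

For a = b the exact length is sqrt(2) times the approximation, which is half a mip level; for an axis-aligned vector the two agree. A factor of 2 would be a full mip level, matching the wording in the commit message.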
[Mesa-dev] [PATCH 2/3] gallivm: handle explicit derivatives for cubemaps
From: Roland Scheidegger srol...@vmware.com

They need some special handling. Quite complicated.
Additionally, use the same code for implicit derivatives too if no_rho_approx
and no_quad_lod is set, because it seems while generally it should be ok to
use per quad lod for implicit derivatives there's at least some test which
insists that in case of cubemaps the shared lod value MUST come from a pixel
inside the primitive (due to the derivatives becoming different if a different
larger major axis is chosen).
---
 src/gallium/auxiliary/gallivm/lp_bld_sample.c     | 221 +++--
 src/gallium/auxiliary/gallivm/lp_bld_sample.h     |   3 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |  35 +++-
 3 files changed, 231 insertions(+), 28 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index ea6bec7..ce05522 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -273,7 +273,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
       cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
       rho = lp_build_mul(rho_bld, cubesize, rho);
    }
-   else if (derivs && !(bld->static_texture_state->target == PIPE_TEXTURE_CUBE)) {
+   else if (derivs) {
       LLVMValueRef ddmax[3], ddx[3], ddy[3];
       for (i = 0; i < dims; i++) {
          LLVMValueRef floatdim;
@@ -1488,8 +1488,9 @@ lp_build_cube_face(struct lp_build_sample_context *bld,
 void
 lp_build_cube_lookup(struct lp_build_sample_context *bld,
                      LLVMValueRef *coords,
-                     const struct lp_derivatives *derivs, /* optional */
+                     const struct lp_derivatives *derivs_in, /* optional */
                      LLVMValueRef *rho,
+                     struct lp_derivatives *derivs_out, /* optional */
                      boolean need_derivs)
 {
    struct lp_build_context *coord_bld = &bld->coord_bld;
@@ -1512,8 +1513,6 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
        * the edge). Still this is possibly a win over just selecting the same face
        * for all pixels. Unfortunately, something like that doesn't work for
        * explicit derivatives.
-       * TODO: handle explicit derivatives by transforming them alongside coords
-       * somehow.
        */
       struct lp_build_context *cint_bld = &bld->int_coord_bld;
       struct lp_type intctype = cint_bld->type;
@@ -1522,7 +1521,7 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
       LLVMValueRef as_ge_at, maxasat, ar_ge_as_at;
       LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz;
       LLVMValueRef tnegi, rnegi;
-      LLVMValueRef ma, mai, ima;
+      LLVMValueRef ma, mai, imahalfpos;
       LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld->type, 0.5);
       LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype,
                                                      1 << (intctype.width - 1));
@@ -1561,7 +1560,195 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
       maxasat = lp_build_max(coord_bld, as, at);
       ar_ge_as_at = lp_build_cmp(coord_bld, PIPE_FUNC_GEQUAL, ar, maxasat);
 
-      if (need_derivs) {
+      if (need_derivs && (derivs_in ||
+          ((gallivm_debug & GALLIVM_DEBUG_NO_QUAD_LOD) &&
+           (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX)))) {
+         /*
+          * XXX: This is really really complex.
+          * It is a bit overkill to use this for implicit derivatives as well,
+          * no way this is worth the cost in practice, but seems to be the
+          * only way for getting accurate and per-pixel lod values.
+          */
+         LLVMValueRef imapos, tmp, ddx[3], ddy[3];
+         LLVMValueRef madx, mady, madxdivma, madydivma;
+         LLVMValueRef sdxi, tdxi, rdxi, signsdx, signtdx, signrdx;
+         LLVMValueRef sdyi, tdyi, rdyi, signsdy, signtdy, signrdy;
+         LLVMValueRef tdxnegi, rdxnegi, tdynegi, rdynegi;
+         LLVMValueRef sdxnewx, sdxnewy, sdxnewz, tdxnewx, tdxnewy, tdxnewz;
+         LLVMValueRef sdynewx, sdynewy, sdynewz, tdynewx, tdynewy, tdynewz;
+         LLVMValueRef face_sdx, face_tdx, face_sdy, face_tdy;
+         LLVMValueRef posHalf = lp_build_const_vec(coord_bld->gallivm,
+                                                   coord_bld->type, 0.5);
+         /*
+          * s = 1/2 * ( sc / ma + 1)
+          * t = 1/2 * ( tc / ma + 1)
+          *
+          * s' = 1/2 * (sc' * ma - sc * ma') / ma^2
+          * t' = 1/2 * (tc' * ma - tc * ma') / ma^2
+          *
+          * dx.s = 0.5 * (dx.sc - sc * dx.ma / ma) / ma
+          * dx.t = 0.5 * (dx.tc - tc * dx.ma / ma) / ma
+          * dy.s = 0.5 * (dy.sc - sc * dy.ma / ma) / ma
+          * dy.t = 0.5 * (dy.tc - tc * dy.ma / ma) / ma
+          */
+
+         /* select ma, calculate ima */
+         ma = lp_build_select(coord_bld, as_ge_at, s, t);
+         ma = lp_build_select(coord_bld, ar_ge_as_at, r, ma);
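The formulas in the patch comment are the quotient rule applied to s = 1/2 * (sc / ma + 1). The scalar sketch below, with hypothetical helper names and plain doubles instead of LLVM values, lets one check the dx.s form against a finite difference. It follows the comment literally and does not model the sign handling of ma.

```cpp
#include <cassert>
#include <cmath>

// s = 1/2 * (sc / ma + 1), as written in the patch comment.
inline double face_coord(double sc, double ma)
{
   return 0.5 * (sc / ma + 1.0);
}

// dx.s = 0.5 * (dx.sc - sc * dx.ma / ma) / ma
// This is 1/2 * (sc' * ma - sc * ma') / ma^2 with one division by ma
// pulled out of the parentheses.
inline double face_coord_deriv(double sc, double ma, double d_sc, double d_ma)
{
   return 0.5 * (d_sc - sc * d_ma / ma) / ma;
}
```

For example, with sc = 0.3, ma = 2, dx.sc = 0.1 and dx.ma = 0.05 the formula gives 0.5 * (0.1 - 0.0075) / 2 = 0.023125, and stepping both inputs by a small h along (dx.sc, dx.ma) gives the same slope to within the step error.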
[Mesa-dev] [PATCH 3/3] gallivm: kill old per-quad face selection code
From: Roland Scheidegger srol...@vmware.com Not used since ages, and it wouldn't work at all with explicit derivatives now (not that it did before as it ignored them but now the code would just use the derivs pre-projected which would be quite random numbers). --- src/gallium/auxiliary/gallivm/lp_bld_sample.c | 751 +++-- 1 file changed, 313 insertions(+), 438 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c index ce05522..3fac981 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c @@ -1493,323 +1493,135 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld, struct lp_derivatives *derivs_out, /* optional */ boolean need_derivs) { + /* +* Do per-pixel face selection. We cannot however (as we used to do) +* simply calculate the derivs afterwards (which is very bogus for +* explicit derivs btw) because the values would be random when +* not all pixels lie on the same face. So what we do here is just +* calculate the derivatives after scaling the coords by the absolute +* value of the inverse major axis, and essentially do rho calculation +* steps as if it were a 3d texture. This is perfect if all pixels hit +* the same face, but not so great at edges, I believe the max error +* should be sqrt(2) with no_rho_approx or 2 otherwise (essentially measuring +* the 3d distance between 2 points on the cube instead of measuring up/down +* the edge). Still this is possibly a win over just selecting the same face +* for all pixels. Unfortunately, something like that doesn't work for +* explicit derivatives. 
+*/ struct lp_build_context *coord_bld = bld-coord_bld; LLVMBuilderRef builder = bld-gallivm-builder; struct gallivm_state *gallivm = bld-gallivm; LLVMValueRef si, ti, ri; + struct lp_build_context *cint_bld = bld-int_coord_bld; + struct lp_type intctype = cint_bld-type; + LLVMValueRef signs, signt, signr, signma; + LLVMValueRef as, at, ar, face, face_s, face_t; + LLVMValueRef as_ge_at, maxasat, ar_ge_as_at; + LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz; + LLVMValueRef tnegi, rnegi; + LLVMValueRef ma, mai, imahalfpos; + LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld-type, 0.5); + LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype, + 1 (intctype.width - 1)); + LLVMValueRef signshift = lp_build_const_int_vec(gallivm, intctype, + intctype.width -1); + LLVMValueRef facex = lp_build_const_int_vec(gallivm, intctype, PIPE_TEX_FACE_POS_X); + LLVMValueRef facey = lp_build_const_int_vec(gallivm, intctype, PIPE_TEX_FACE_POS_Y); + LLVMValueRef facez = lp_build_const_int_vec(gallivm, intctype, PIPE_TEX_FACE_POS_Z); + LLVMValueRef s = coords[0]; + LLVMValueRef t = coords[1]; + LLVMValueRef r = coords[2]; + + assert(PIPE_TEX_FACE_NEG_X == PIPE_TEX_FACE_POS_X + 1); + assert(PIPE_TEX_FACE_NEG_Y == PIPE_TEX_FACE_POS_Y + 1); + assert(PIPE_TEX_FACE_NEG_Z == PIPE_TEX_FACE_POS_Z + 1); - if (1 || coord_bld-type.length 4) { - /* - * Do per-pixel face selection. We cannot however (as we used to do) - * simply calculate the derivs afterwards (which is very bogus for - * explicit derivs btw) because the values would be random when - * not all pixels lie on the same face. So what we do here is just - * calculate the derivatives after scaling the coords by the absolute - * value of the inverse major axis, and essentially do rho calculation - * steps as if it were a 3d texture. 
This is perfect if all pixels hit - * the same face, but not so great at edges, I believe the max error - * should be sqrt(2) with no_rho_approx or 2 otherwise (essentially measuring - * the 3d distance between 2 points on the cube instead of measuring up/down - * the edge). Still this is possibly a win over just selecting the same face - * for all pixels. Unfortunately, something like that doesn't work for - * explicit derivatives. - */ - struct lp_build_context *cint_bld = bld-int_coord_bld; - struct lp_type intctype = cint_bld-type; - LLVMValueRef signs, signt, signr, signma; - LLVMValueRef as, at, ar, face, face_s, face_t; - LLVMValueRef as_ge_at, maxasat, ar_ge_as_at; - LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz; - LLVMValueRef tnegi, rnegi; - LLVMValueRef ma, mai, imahalfpos; - LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld-type, 0.5); - LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype, - 1 (intctype.width - 1)); - LLVMValueRef signshift =
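A scalar sketch of the per-pixel face selection that the declarations above set up may help when reading the vectorised version. The names are hypothetical, and the face numbering (+X, -X, +Y, -Y, +Z, -Z as 0..5) is an assumption for the sketch; the patch itself only asserts that each negative face is the positive face plus one.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Assumed numbering for this sketch only; NEG_* == POS_* + 1 as the patch asserts.
enum ToyFace { POS_X = 0, NEG_X, POS_Y, NEG_Y, POS_Z, NEG_Z };

// Pick the face from the major axis: the coordinate with the largest
// absolute value selects the axis, its sign selects the positive or
// negative face.
inline int toy_select_face(float s, float t, float r)
{
   float as = std::fabs(s), at = std::fabs(t), ar = std::fabs(r);
   bool as_ge_at = as >= at;
   bool ar_ge_as_at = ar >= std::max(as, at);

   float ma = ar_ge_as_at ? r : (as_ge_at ? s : t);
   int base = ar_ge_as_at ? POS_Z : (as_ge_at ? POS_X : POS_Y);

   // The sign bit of the major axis adds 0 or 1 to the positive face index.
   return base + (std::signbit(ma) ? 1 : 0);
}
```

The two comparisons mirror `as_ge_at` and `ar_ge_as_at` in the patch, so ties resolve towards Z first and then X in this sketch.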
Re: [Mesa-dev] [PATCH] gen7: Use logical, not physical, dims in 3DSTATE_DEPTH_BUFFER (v2)
It would be good to test HSW too.

Reviewed-by: Jordan Justen jordan.l.jus...@intel.com

On Wed, 2013-10-02 at 17:50 -0700, Chad Versace wrote:
> In 3DSTATE_DEPTH_BUFFER, we set Width and Height to the miptree slice's
> physical dimensions. (Logical and physical dimensions may differ for
> multisample surfaces). However, in SURFACE_STATE, we always set Width and
> Height to the slice's logical dimensions. We should do the same for
> 3DSTATE_DEPTH_BUFFER, because the hw docs say so.
>
> No Piglit regressions (-x glx -x glean) on Ivybridge with Wayland.
>
> v2: No Piglit regressions, for real this time.
>
> CC: Jordan Justen jordan.l.jus...@intel.com
> CC: Eric Anholt e...@anholt.org
> Signed-off-by: Chad Versace chad.vers...@linux.intel.com
> ---
>
> My first patch was garbage. It segfaulted in gen7_misc_state.c and hung
> the GPU in gen7_blorp.cpp. I thought I regression tested it, but my
> Piglit automation scripts suck; the scripts just ran Piglit twice on
> master.
>
> I really regression tested this patch. I promise.
>
>  src/mesa/drivers/dri/i965/gen7_blorp.cpp    | 4 ++--
>  src/mesa/drivers/dri/i965/gen7_misc_state.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
> index 9df3d92..f64e536 100644
> --- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
> +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
> @@ -706,8 +706,8 @@ gen7_blorp_emit_depth_stencil_config(struct brw_context *brw,
>        surfwidth = params->depth.width;
>        surfheight = params->depth.height;
>     } else {
> -      surfwidth = params->depth.mt->physical_width0;
> -      surfheight = params->depth.mt->physical_height0;
> +      surfwidth = params->depth.mt->logical_width0;
> +      surfheight = params->depth.mt->logical_height0;
>     }
>  
>     /* 3DSTATE_DEPTH_BUFFER */
> diff --git a/src/mesa/drivers/dri/i965/gen7_misc_state.c b/src/mesa/drivers/dri/i965/gen7_misc_state.c
> index eb942cf..3f3833e 100644
> --- a/src/mesa/drivers/dri/i965/gen7_misc_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_misc_state.c
> @@ -93,8 +93,8 @@ gen7_emit_depth_stencil_hiz(struct brw_context *brw,
>     lod = irb ? irb->mt_level - irb->mt->first_level : 0;
>  
>     if (mt) {
> -      width = mt->physical_width0;
> -      height = mt->physical_height0;
> +      width = mt->logical_width0;
> +      height = mt->logical_height0;
>     }
>  
>     /* _NEW_DEPTH, _NEW_STENCIL, _NEW_BUFFERS */
[Mesa-dev] [PATCH 09/10] i965/fs: Add a peephole pass to combine ADD with ADDC/SUBB.
v2: Check fixed_hw_reg.{file,nr} instead of dst.reg.
v3: Store the bool emitted_addc_or_subb in the class, not static.
---
 src/mesa/drivers/dri/i965/brw_fs.h           |   3 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 104 +++
 2 files changed, 107 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 6a53e59..c703c2b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -345,6 +345,7 @@ public:
                  fs_reg src0, fs_reg src1);
    bool try_emit_saturate(ir_expression *ir);
    bool try_emit_mad(ir_expression *ir, int mul_arg);
+   void try_combine_add_with_addc_subb();
    void try_replace_with_sel();
    void emit_bool_to_cond_code(ir_rvalue *condition);
    void emit_if_gen6(ir_if *ir);
@@ -458,6 +459,8 @@ public:
 
    int force_uncompressed_stack;
    int force_sechalf_stack;
+
+   bool emitted_addc_or_subb;
 };
 
 /**
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index b8c30e6..8accbd6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -313,6 +313,102 @@ fs_visitor::try_emit_mad(ir_expression *ir, int mul_arg)
    return true;
 }
 
+/**
+ * The addition and carry in the uaddCarry() built-in function are implemented
+ * separately as ir_binop_add and ir_binop_carry respectively. i965 generates
+ * ADDC and a MOV from the accumulator for the carry.
+ *
+ * The generated code for uaddCarry(uint x, uint y, out uint carry) would look
+ * like this:
+ *
+ *    addc null, x, y
+ *    mov  carry, acc0
+ *    add  sum, x, y
+ *
+ * This peephole pass optimizes this into
+ *
+ *    addc sum, x, y
+ *    mov  carry, acc0
+ *
+ * usubBorrow() works in the same fashion.
+ */
+void
+fs_visitor::try_combine_add_with_addc_subb()
+{
+   /* ADDC/SUBB was introduced in gen7. */
+   if (brw->gen < 7)
+      return;
+
+   fs_inst *add_inst = (fs_inst *) instructions.get_tail();
+   assert(add_inst->opcode == BRW_OPCODE_ADD);
+
+   /* ADDC/SUBB only operates on UD. */
+   if (add_inst->dst.type != BRW_REGISTER_TYPE_UD ||
+       add_inst->src[0].type != BRW_REGISTER_TYPE_UD ||
+       add_inst->src[1].type != BRW_REGISTER_TYPE_UD)
+      return;
+
+   bool found = false;
+   fs_inst *match = (fs_inst *) add_inst->prev;
+   /* The ADDC should appear within 8 instructions of ADD for a vec4. SUBB
+    * should appear farther away because of the extra MOV negates.
+    */
+   for (int i = 0; i < 16; i++, match = (fs_inst *) match->prev) {
+      if (match->is_head_sentinel())
+         return;
+
+      /* Look for an ADDC/SUBB instruction whose destination is the null
+       * register (ir_binop_carry emits ADDC with null destination; same for
+       * ir_binop_borrow with SUBB) and whose sources are identical to those
+       * of the ADD.
+       */
+      if (match->opcode != BRW_OPCODE_ADDC && match->opcode != BRW_OPCODE_SUBB)
+         continue;
+
+      /* Only look for newly emitted ADDC/SUBB with null destination. */
+      if (match->dst.file != HW_REG ||
+          match->dst.fixed_hw_reg.file != BRW_ARCHITECTURE_REGISTER_FILE ||
+          match->dst.fixed_hw_reg.nr != BRW_ARF_NULL)
+         continue;
+
+      fs_reg *src0 = &add_inst->src[0];
+      fs_reg *src1 = &add_inst->src[1];
+
+      /* For SUBB, the ADD's second source will contain a negate modifier
+       * which at this point will be in the form of a
+       *
+       *    MOV dst, -src
+       *
+       * instruction, so src[1].file will be GRF, even if it's a uniform push
+       * constant.
+       */
+      if (match->src[1].reg != add_inst->src[1].reg) {
+         /* The negating MOV should be immediately before the ADD. */
+         fs_inst *mov_inst = (fs_inst *) add_inst->prev;
+         if (mov_inst->opcode != BRW_OPCODE_MOV)
+            continue;
+
+         src1 = &mov_inst->src[0];
+      }
+
+      /* If everything matches, we're done. */
+      if (match->src[0].file == src0->file &&
+          match->src[1].file == src1->file &&
+          match->src[0].reg == src0->reg &&
+          match->src[1].reg == src1->reg &&
+          match->src[0].reg_offset == src0->reg_offset &&
+          match->src[1].reg_offset == src1->reg_offset) {
+         found = true;
+         break;
+      }
+   }
+
+   if (found) {
+      match->dst = add_inst->dst;
+      add_inst->remove();
+   }
+}
+
 void
 fs_visitor::visit(ir_expression *ir)
 {
@@ -415,6 +511,8 @@ fs_visitor::visit(ir_expression *ir)
 
    case ir_binop_add:
       emit(ADD(this->result, op[0], op[1]));
+      if (emitted_addc_or_subb)
+         try_combine_add_with_addc_subb();
       break;
    case ir_binop_sub:
       assert(!"not reached: should be handled by ir_sub_to_add_neg");
@@ -451,6 +549,8 @@ fs_visitor::visit(ir_expression *ir)
       if (brw->gen >= 7 && dispatch_width == 16)
          fail("16-wide explicit accumulator operands unsupported\n");
 
+      emitted_addc_or_subb =
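The shape of the peephole can be shown on a toy instruction list. This is only a sketch: `ToyInst` and `toy_combine_add_with_addc` are hypothetical, registers are plain strings, "null" stands in for the null register, and the SUBB case with its negating MOV is left out. The backwards search window of 16 is taken from the fs patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instruction record for the sketch.
struct ToyInst {
   std::string op;     // "add", "addc", "mov", ...
   std::string dst;    // "null" stands in for the null register
   std::string src0;
   std::string src1;
};

// If the last instruction is an ADD and a recent ADDC with a null
// destination reads the same sources, retarget the ADDC to the ADD's
// destination and drop the ADD. Returns true if the list was changed.
inline bool toy_combine_add_with_addc(std::vector<ToyInst> &insts,
                                      int window = 16)
{
   if (insts.empty() || insts.back().op != "add")
      return false;

   const ToyInst add = insts.back();
   const int last = static_cast<int>(insts.size()) - 2;

   // Walk backwards from the instruction just before the ADD.
   for (int i = last; i >= 0 && last - i < window; i--) {
      ToyInst &m = insts[i];
      if (m.op == "addc" && m.dst == "null" &&
          m.src0 == add.src0 && m.src1 == add.src1) {
         m.dst = add.dst;
         insts.pop_back();
         return true;
      }
   }
   return false;
}
```

Run on the three-instruction sequence from the patch comment (addc null, x, y; mov carry, acc0; add sum, x, y), the sketch leaves addc sum, x, y followed by the mov.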
[Mesa-dev] [PATCH 10/10] i965/vs: Add a peephole pass to combine ADD with ADDC/SUBB.
v2: Check fixed_hw_reg.{file,nr} instead of dst.reg.
v3: Store the bool emitted_addc_or_subb in the class, not static.
---
 src/mesa/drivers/dri/i965/brw_vec4.h           |   3 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 104 -
 2 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 25427d7..9e2204d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -507,6 +507,7 @@ public:
 
    bool try_emit_sat(ir_expression *ir);
    bool try_emit_mad(ir_expression *ir, int mul_arg);
+   void try_combine_add_with_addc_subb();
    void resolve_ud_negate(src_reg *reg);
 
    src_reg get_timestamp();
@@ -530,6 +531,8 @@ protected:
    virtual int compute_array_stride(ir_dereference_array *ir);
 
    const bool debug_flag;
+
+   bool emitted_addc_or_subb;
 };
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index ffb2cfc..74bdd4d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1122,6 +1122,102 @@ vec4_visitor::try_emit_mad(ir_expression *ir, int mul_arg)
    return true;
 }
 
+/**
+ * The addition and carry in the uaddCarry() built-in function are implemented
+ * separately as ir_binop_add and ir_binop_carry respectively. i965 generates
+ * ADDC and a MOV from the accumulator for the carry.
+ *
+ * The generated code for uaddCarry(uint x, uint y, out uint carry) would look
+ * like this:
+ *
+ *    addc null, x, y
+ *    mov  carry, acc0
+ *    add  sum, x, y
+ *
+ * This peephole pass optimizes this into
+ *
+ *    addc sum, x, y
+ *    mov  carry, acc0
+ *
+ * usubBorrow() works in the same fashion.
+ */
+void
+vec4_visitor::try_combine_add_with_addc_subb()
+{
+   /* ADDC/SUBB was introduced in gen7. */
+   if (brw->gen < 7)
+      return;
+
+   vec4_instruction *add_inst = (vec4_instruction *) instructions.get_tail();
+   assert(add_inst->opcode == BRW_OPCODE_ADD);
+
+   /* ADDC/SUBB only operates on UD. */
+   if (add_inst->dst.type != BRW_REGISTER_TYPE_UD ||
+       add_inst->src[0].type != BRW_REGISTER_TYPE_UD ||
+       add_inst->src[1].type != BRW_REGISTER_TYPE_UD)
+      return;
+
+   bool found = false;
+   vec4_instruction *match = (vec4_instruction *) add_inst->prev;
+   /* The ADDC should appear within 2 instructions of ADD. SUBB should appear
+    * farther away because of the extra MOV negate.
+    */
+   for (int i = 0; i < 4; i++, match = (vec4_instruction *) match->prev) {
+      if (match->is_head_sentinel())
+         return;
+
+      /* Look for an ADDC/SUBB instruction whose destination is the null
+       * register (ir_binop_carry emits ADDC with null destination; same for
+       * ir_binop_borrow with SUBB) and whose sources are identical to those
+       * of the ADD.
+       */
+      if (match->opcode != BRW_OPCODE_ADDC && match->opcode != BRW_OPCODE_SUBB)
+         continue;
+
+      /* Only look for newly emitted ADDC/SUBB with null destination. */
+      if (match->dst.file != HW_REG ||
+          match->dst.fixed_hw_reg.file != BRW_ARCHITECTURE_REGISTER_FILE ||
+          match->dst.fixed_hw_reg.nr != BRW_ARF_NULL)
+         continue;
+
+      src_reg *src0 = &add_inst->src[0];
+      src_reg *src1 = &add_inst->src[1];
+
+      /* For SUBB, the ADD's second source will contain a negate modifier
+       * which at this point will be in the form of a
+       *
+       *    MOV dst, -src
+       *
+       * instruction, so src[1].file will be GRF, even if it's a uniform push
+       * constant.
+       */
+      if (match->src[1].reg != add_inst->src[1].reg) {
+         /* The negating MOV should be immediately before the ADD. */
+         vec4_instruction *mov_inst = (vec4_instruction *) add_inst->prev;
+         if (mov_inst->opcode != BRW_OPCODE_MOV)
+            continue;
+
+         src1 = &mov_inst->src[0];
+      }
+
+      /* If everything matches, we're done. */
+      if (match->src[0].file == src0->file &&
+          match->src[1].file == src1->file &&
+          match->src[0].reg == src0->reg &&
+          match->src[1].reg == src1->reg &&
+          match->src[0].reg_offset == src0->reg_offset &&
+          match->src[1].reg_offset == src1->reg_offset) {
+         found = true;
+         break;
+      }
+   }
+
+   if (found) {
+      match->dst = add_inst->dst;
+      add_inst->remove();
+   }
+}
+
 void
 vec4_visitor::emit_bool_comparison(unsigned int op,
                                    dst_reg dst, src_reg src0, src_reg src1)
@@ -1319,6 +1415,8 @@ vec4_visitor::visit(ir_expression *ir)
 
    case ir_binop_add:
       emit(ADD(result_dst, op[0], op[1]));
+      if (emitted_addc_or_subb)
+         try_combine_add_with_addc_subb();
       break;
    case ir_binop_sub:
       assert(!"not reached: should be handled by ir_sub_to_add_neg");
@@ -1359,6 +1457,8 @@ vec4_visitor::visit(ir_expression *ir)
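For reference, the scalar semantics of the two GLSL built-ins that these passes target can be written down in a few lines. The helper names are hypothetical and this is a CPU-side model for 32-bit unsigned values, not driver code. It shows why the sum and the carry come from the same addition, which is what lets the ADD be folded into the ADDC.

```cpp
#include <cassert>
#include <cstdint>

// Model of uaddCarry(x, y, carry): the sum wraps modulo 2^32 and the
// carry is 1 exactly when the addition overflowed.
inline uint32_t toy_uadd_carry(uint32_t x, uint32_t y, uint32_t &carry)
{
   uint32_t sum = x + y;            // unsigned arithmetic wraps
   carry = sum < x ? 1u : 0u;
   return sum;
}

// Model of usubBorrow(x, y, borrow): the difference wraps modulo 2^32
// and the borrow is 1 exactly when y is larger than x.
inline uint32_t toy_usub_borrow(uint32_t x, uint32_t y, uint32_t &borrow)
{
   borrow = x < y ? 1u : 0u;
   return x - y;
}
```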
Re: [Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event
Niels Ole Salscheider niels_...@salscheider-online.de writes:

>> Do you have any example of a real world application that relies on this?
>> Or at least some reasonable use case?
>
> The problem is that the queue is only cleared from already signalled
> events when we flush it. And we might not do this if the user only calls
> clWaitForEvents once the corresponding event has already been signalled.
>
> I am fine with not flushing the queue, but we should at least make sure
> that signalled events are freed early enough.

So your application calls neither clFlush() explicitly nor any blocking call on that specific event, and it stalls forever polling an event with clGetEventInfo() that never gets flushed to the GPU? Is that the problem you've seen? Is it an open source application?

Thanks.
Re: [Mesa-dev] [PATCH 2/3] gallivm: handle explicit derivatives for cubemaps
On 10/03/2013 09:42 AM, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com They need some special handling. Quite complicated. Additionally, use the same code for implicit derivatives too if no_rho_approx and no_quad_lod is set, because it seems while generally it should be ok to use per quad lod for implicit derivatives there's at least some test which insists that in case of cubemaps the shared lod value MUST come from a pixel inside the primitive (due to the derivatives becoming different if a different larger major axis is chosen). --- src/gallium/auxiliary/gallivm/lp_bld_sample.c | 221 +++-- src/gallium/auxiliary/gallivm/lp_bld_sample.h |3 +- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 35 +++- 3 files changed, 231 insertions(+), 28 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c index ea6bec7..ce05522 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c @@ -273,7 +273,7 @@ lp_build_rho(struct lp_build_sample_context *bld, cubesize = lp_build_mul(rho_bld, cubesize, cubesize); rho = lp_build_mul(rho_bld, cubesize, rho); } - else if (derivs !(bld-static_texture_state-target == PIPE_TEXTURE_CUBE)) { + else if (derivs) { LLVMValueRef ddmax[3], ddx[3], ddy[3]; for (i = 0; i dims; i++) { LLVMValueRef floatdim; @@ -1488,8 +1488,9 @@ lp_build_cube_face(struct lp_build_sample_context *bld, void lp_build_cube_lookup(struct lp_build_sample_context *bld, LLVMValueRef *coords, - const struct lp_derivatives *derivs, /* optional */ + const struct lp_derivatives *derivs_in, /* optional */ LLVMValueRef *rho, + struct lp_derivatives *derivs_out, /* optional */ boolean need_derivs) { struct lp_build_context *coord_bld = bld-coord_bld; @@ -1512,8 +1513,6 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld, * the edge). Still this is possibly a win over just selecting the same face * for all pixels. 
Unfortunately, something like that doesn't work for * explicit derivatives. - * TODO: handle explicit derivatives by transforming them alongside coords - * somehow. */ struct lp_build_context *cint_bld = bld-int_coord_bld; struct lp_type intctype = cint_bld-type; @@ -1522,7 +1521,7 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld, LLVMValueRef as_ge_at, maxasat, ar_ge_as_at; LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz; LLVMValueRef tnegi, rnegi; - LLVMValueRef ma, mai, ima; + LLVMValueRef ma, mai, imahalfpos; LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld-type, 0.5); LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype, 1 (intctype.width - 1)); @@ -1561,7 +1560,195 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld, maxasat = lp_build_max(coord_bld, as, at); ar_ge_as_at = lp_build_cmp(coord_bld, PIPE_FUNC_GEQUAL, ar, maxasat); - if (need_derivs) { + if (need_derivs (derivs_in || + ((gallivm_debug GALLIVM_DEBUG_NO_QUAD_LOD) + (gallivm_debug GALLIVM_DEBUG_NO_RHO_APPROX { + /* + * XXX: This is really really complex. + * It is a bit overkill to use this for implicit derivatives as well, + * no way this is worth the cost in practice, but seems to be the + * only way for getting accurate and per-pixel lod values. 
+ */ + LLVMValueRef imapos, tmp, ddx[3], ddy[3]; + LLVMValueRef madx, mady, madxdivma, madydivma; + LLVMValueRef sdxi, tdxi, rdxi, signsdx, signtdx, signrdx; + LLVMValueRef sdyi, tdyi, rdyi, signsdy, signtdy, signrdy; + LLVMValueRef tdxnegi, rdxnegi, tdynegi, rdynegi; + LLVMValueRef sdxnewx, sdxnewy, sdxnewz, tdxnewx, tdxnewy, tdxnewz; + LLVMValueRef sdynewx, sdynewy, sdynewz, tdynewx, tdynewy, tdynewz; + LLVMValueRef face_sdx, face_tdx, face_sdy, face_tdy; + LLVMValueRef posHalf = lp_build_const_vec(coord_bld-gallivm, + coord_bld-type, 0.5); + /* + * s = 1/2 * ( sc / ma + 1) + * t = 1/2 * ( tc / ma + 1) + * + * s' = 1/2 * (sc' * ma - sc * ma') / ma^2 + * t' = 1/2 * (tc' * ma - tc * ma') / ma^2 + * + * dx.s = 0.5 * (dx.sc - sc * dx.ma / ma) / ma + * dx.t = 0.5 * (dx.tc - tc * dx.ma / ma) / ma + * dy.s = 0.5 * (dy.sc - sc * dy.ma / ma) / ma + * dy.t = 0.5 * (dy.tc - tc * dy.ma / ma) / ma + */ + + /* select ma, calculate ima */ + ma =
Re: [Mesa-dev] [PATCH 1/3] gallivm: ignore rho approximation for cube maps
On 10/03/2013 09:42 AM, srol...@vmware.com wrote:
> From: Roland Scheidegger srol...@vmware.com
>
> There's two reasons for this:
> 1) even when ignoring rho approximation for cube maps, the result is
>    still not correct, but it's better as the max error at edges is now
>    sqrt(2) instead of 2 (which was a full mip level), same as it is for
>    ordinary 2d maps when doing rho approximations (so the error actually
>    goes from factor 2 at edges and sqrt(2) completely inside a face to
>    sqrt(2) at edges and 0 inside a face).
> 2) I want to repurpose rho_no_approx for cubemaps for fully correct
>    cubemap derivatives (so don't need yet another debug var).
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_sample.c | 34 ++++++++++++----------------------
>  1 file changed, 12 insertions(+), 22 deletions(-)
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
> index c775382..ea6bec7 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
> @@ -269,10 +269,8 @@ lp_build_rho(struct lp_build_sample_context *bld,
>        /* Could optimize this for single quad just skip the broadcast */
>        cubesize = lp_build_extract_broadcast(gallivm, bld->float_size_in_type,
>                                              rho_bld->type, float_size, index0);
> -      if (no_rho_opt) {
> -         /* skipping sqrt hence returning rho squared */
> -         cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
> -      }
> +      /* skipping sqrt hence returning rho squared */
> +      cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
>        rho = lp_build_mul(rho_bld, cubesize, rho);
>     }
>     else if (derivs && !(bld->static_texture_state->target == PIPE_TEXTURE_CUBE)) {
> @@ -757,8 +755,8 @@ lp_build_lod_selector(struct lp_build_sample_context *bld,
>     }
>     else {
>        LLVMValueRef rho;
> -      boolean rho_squared = (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) &&
> -                            (bld->dims > 1);
> +      boolean rho_squared = ((gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) &&
> +                             (bld->dims > 1)) || cube_rho;
>        rho = lp_build_rho(bld, texture_unit, s, t, r, cube_rho, derivs);
> @@ -1602,31 +1600,23 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
>            * know the texture is square which simplifies things (we can omit the
>            * size mul which happens very early completely here and do it at the
>            * very end).
> +          * Also always do calculations according to GALLIVM_DEBUG_NO_RHO_APPROX
> +          * since the error can get quite big otherwise at edges.
> +          * (With no_rho_approx max error is sqrt(2) at edges, same as it is
> +          * without no_rho_approx for 2d textures, otherwise it would be factor 2.)
>            */
>           ddx_ddy[0] = lp_build_packed_ddx_ddy_twocoord(coord_bld, s, t);
>           ddx_ddy[1] = lp_build_packed_ddx_ddy_onecoord(coord_bld, r);
> -         if (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) {
> -            ddx_ddy[0] = lp_build_mul(coord_bld, ddx_ddy[0], ddx_ddy[0]);
> -            ddx_ddy[1] = lp_build_mul(coord_bld, ddx_ddy[1], ddx_ddy[1]);
> -         }
> -         else {
> -            ddx_ddy[0] = lp_build_abs(coord_bld, ddx_ddy[0]);
> -            ddx_ddy[1] = lp_build_abs(coord_bld, ddx_ddy[1]);
> -         }
> +         ddx_ddy[0] = lp_build_mul(coord_bld, ddx_ddy[0], ddx_ddy[0]);
> +         ddx_ddy[1] = lp_build_mul(coord_bld, ddx_ddy[1], ddx_ddy[1]);
>           tmp[0] = lp_build_swizzle_aos(coord_bld, ddx_ddy[0], swizzle01);
>           tmp[1] = lp_build_swizzle_aos(coord_bld, ddx_ddy[0], swizzle23);
>           tmp[2] = lp_build_swizzle_aos(coord_bld, ddx_ddy[1], swizzle02);
> -         if (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) {
> -            rho_vec = lp_build_add(coord_bld, tmp[0], tmp[1]);
> -            rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]);
> -         }
> -         else {
> -            rho_vec = lp_build_max(coord_bld, tmp[0], tmp[1]);
> -            rho_vec = lp_build_max(coord_bld, rho_vec, tmp[2]);
> -         }
> +         rho_vec = lp_build_add(coord_bld, tmp[0], tmp[1]);
> +         rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]);

I don't know how often we have these 3-way lp_build_add() sequences, but
would an lp_build_add3(bld, a, b, c) be useful?

>           tmp[0] = lp_build_swizzle_aos(coord_bld, rho_vec, swizzle0);
>           tmp[1] = lp_build_swizzle_aos(coord_bld, rho_vec, swizzle1);
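For readers following the rho discussion, here is a minimal scalar sketch of the two calculations the patch chooses between. It is an illustration only: the names rho_sq_exact and rho_approx are made up here and are not gallivm functions, and the real code emits LLVM IR over packed derivative vectors rather than working on plain floats.

```c
#include <assert.h>

/* Hypothetical scalar stand-ins for illustration; not part of gallivm. */

static float sketch_absf(float v)
{
    return v < 0.0f ? -v : v;
}

static float sketch_max3(float a, float b, float c)
{
    float m = a > b ? a : b;
    return m > c ? m : c;
}

/*
 * "no_rho_approx" style: sum of the squared derivatives. The sqrt is
 * skipped, so the result is rho squared and a caller would have to
 * halve the log2 of it when deriving a lod.
 */
float rho_sq_exact(float ds, float dt, float dr)
{
    return ds * ds + dt * dt + dr * dr;
}

/* Approximation style: largest absolute derivative component. */
float rho_approx(float ds, float dt, float dr)
{
    return sketch_max3(sketch_absf(ds), sketch_absf(dt), sketch_absf(dr));
}
```

With derivatives (1, 1, 0) the first function yields rho squared = 2, i.e. rho = sqrt(2), while the approximation yields 1. That ratio is consistent with the sqrt(2) error quoted in the commit message for ordinary 2d maps.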
Re: [Mesa-dev] [PATCH 3/3] gallivm: kill old per-quad face selection code
On 10/03/2013 09:42 AM, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com Not used since ages, and it wouldn't work at all with explicit derivatives now (not that it did before as it ignored them but now the code would just use the derivs pre-projected which would be quite random numbers). --- src/gallium/auxiliary/gallivm/lp_bld_sample.c | 751 +++-- 1 file changed, 313 insertions(+), 438 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c index ce05522..3fac981 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c @@ -1493,323 +1493,135 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld, struct lp_derivatives *derivs_out, /* optional */ boolean need_derivs) { + /* +* Do per-pixel face selection. We cannot however (as we used to do) +* simply calculate the derivs afterwards (which is very bogus for +* explicit derivs btw) because the values would be random when +* not all pixels lie on the same face. So what we do here is just +* calculate the derivatives after scaling the coords by the absolute +* value of the inverse major axis, and essentially do rho calculation +* steps as if it were a 3d texture. This is perfect if all pixels hit +* the same face, but not so great at edges, I believe the max error +* should be sqrt(2) with no_rho_approx or 2 otherwise (essentially measuring +* the 3d distance between 2 points on the cube instead of measuring up/down +* the edge). Still this is possibly a win over just selecting the same face +* for all pixels. Unfortunately, something like that doesn't work for +* explicit derivatives. 
+*/ struct lp_build_context *coord_bld = bld-coord_bld; LLVMBuilderRef builder = bld-gallivm-builder; struct gallivm_state *gallivm = bld-gallivm; LLVMValueRef si, ti, ri; + struct lp_build_context *cint_bld = bld-int_coord_bld; + struct lp_type intctype = cint_bld-type; + LLVMValueRef signs, signt, signr, signma; + LLVMValueRef as, at, ar, face, face_s, face_t; + LLVMValueRef as_ge_at, maxasat, ar_ge_as_at; + LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz; + LLVMValueRef tnegi, rnegi; + LLVMValueRef ma, mai, imahalfpos; + LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld-type, 0.5); + LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype, + 1 (intctype.width - 1)); + LLVMValueRef signshift = lp_build_const_int_vec(gallivm, intctype, + intctype.width -1); + LLVMValueRef facex = lp_build_const_int_vec(gallivm, intctype, PIPE_TEX_FACE_POS_X); + LLVMValueRef facey = lp_build_const_int_vec(gallivm, intctype, PIPE_TEX_FACE_POS_Y); + LLVMValueRef facez = lp_build_const_int_vec(gallivm, intctype, PIPE_TEX_FACE_POS_Z); + LLVMValueRef s = coords[0]; + LLVMValueRef t = coords[1]; + LLVMValueRef r = coords[2]; + + assert(PIPE_TEX_FACE_NEG_X == PIPE_TEX_FACE_POS_X + 1); + assert(PIPE_TEX_FACE_NEG_Y == PIPE_TEX_FACE_POS_Y + 1); + assert(PIPE_TEX_FACE_NEG_Z == PIPE_TEX_FACE_POS_Z + 1); - if (1 || coord_bld-type.length 4) { - /* - * Do per-pixel face selection. We cannot however (as we used to do) - * simply calculate the derivs afterwards (which is very bogus for - * explicit derivs btw) because the values would be random when - * not all pixels lie on the same face. So what we do here is just - * calculate the derivatives after scaling the coords by the absolute - * value of the inverse major axis, and essentially do rho calculation - * steps as if it were a 3d texture. 
This is perfect if all pixels hit - * the same face, but not so great at edges, I believe the max error - * should be sqrt(2) with no_rho_approx or 2 otherwise (essentially measuring - * the 3d distance between 2 points on the cube instead of measuring up/down - * the edge). Still this is possibly a win over just selecting the same face - * for all pixels. Unfortunately, something like that doesn't work for - * explicit derivatives. - */ - struct lp_build_context *cint_bld = bld-int_coord_bld; - struct lp_type intctype = cint_bld-type; - LLVMValueRef signs, signt, signr, signma; - LLVMValueRef as, at, ar, face, face_s, face_t; - LLVMValueRef as_ge_at, maxasat, ar_ge_as_at; - LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz; - LLVMValueRef tnegi, rnegi; - LLVMValueRef ma, mai, imahalfpos; - LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld-type, 0.5); - LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype, - 1 (intctype.width - 1)); -
[Mesa-dev] [PATCH] radeonsi/compute: Fix segfault caused by recent refactoring
From: Tom Stellard thomas.stell...@amd.com

---
 src/gallium/drivers/radeon/r600_pipe_common.c  | 4 ++++
 src/gallium/drivers/radeonsi/radeonsi_shader.c | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
index 852993c..b038740 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -249,6 +249,10 @@ static unsigned tgsi_get_processor_type(const struct tgsi_token *tokens)
 bool r600_can_dump_shader(struct r600_common_screen *rscreen,
                           const struct tgsi_token *tokens)
 {
+        /* Compute shader don't have tgsi_tokens */
+        if (!tokens)
+                return (rscreen->debug_flags & DBG_CS) != 0;
+
         switch (tgsi_get_processor_type(tokens)) {
         case TGSI_PROCESSOR_VERTEX:
                 return (rscreen->debug_flags & DBG_VS) != 0;
diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 7ed3d26..97ed4e3 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -1759,8 +1759,8 @@ int si_compile_llvm(struct r600_context *rctx, struct si_pipe_shader *shader,
         unsigned i;
         uint32_t *ptr;
         struct radeon_llvm_binary binary;
-        bool dump = r600_can_dump_shader(&rctx->screen->b, shader->selector->tokens);
-
+        bool dump = r600_can_dump_shader(&rctx->screen->b,
+                        shader->selector ? shader->selector->tokens : NULL);
         memset(&binary, 0, sizeof(binary));
         radeon_llvm_compile(mod, &binary,
                 r600_get_llvm_processor_name(rctx->screen->b.family), dump);
--
1.8.1.5
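The shape of the fix, reduced to a self-contained sketch. All types, flags and function names below (sketch_*, SKETCH_DBG_*) are simplified stand-ins invented for illustration and are not the radeon driver definitions; the non-NULL branch is also collapsed to a single case, where the real function switches on the processor type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver types. */
enum { SKETCH_DBG_VS = 1 << 0, SKETCH_DBG_CS = 1 << 1 };

struct sketch_tokens   { int processor; };
struct sketch_selector { const struct sketch_tokens *tokens; };
struct sketch_shader   { const struct sketch_selector *selector; };

/* Callee side: treat "no tokens" as a compute shader instead of
 * dereferencing a NULL pointer. */
bool sketch_can_dump(unsigned debug_flags, const struct sketch_tokens *tokens)
{
    if (!tokens)
        return (debug_flags & SKETCH_DBG_CS) != 0;

    /* Simplification: every shader with tokens is treated as a vertex
     * shader here. */
    return (debug_flags & SKETCH_DBG_VS) != 0;
}

/* Caller side: only follow selector->tokens when a selector exists. */
bool sketch_should_dump(unsigned debug_flags, const struct sketch_shader *shader)
{
    return sketch_can_dump(debug_flags,
                           shader->selector ? shader->selector->tokens : NULL);
}
```

Both halves are needed in this sketch: the caller guard avoids the NULL selector dereference, and the callee guard gives the NULL tokens case a defined meaning.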
Re: [Mesa-dev] [PATCHv2] configure: set HAVE_COMMON_DRI when building only swrast
On Wed, Oct 2, 2013 at 3:45 PM, Emil Velikov emil.l.veli...@gmail.com wrote:
> With commit cb1febb07, I have incorrectly removed HAVE_COMMON_DRI
> assuming that swrast does not need to build the translations for
> driconf options, as effectively swrast/drisw does not use them.
>
> With the incoming unification work of dri and drisw, it makes sense
> just to revert the offending hunk.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70057
> Reported-by: Vinson Lee v...@freedesktop.org
> Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
> ---
> v2: resolve typos in the commit message. Thanks Ian
>
>  configure.ac | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/configure.ac b/configure.ac
> index e7c8223..9546163 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1823,6 +1823,7 @@ if test x$with_gallium_drivers != x; then
>          if test x$enable_dri = xyes; then
>              GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS dri-swrast"
> +            HAVE_COMMON_DRI=yes
>          fi
>          if test x$enable_vdpau = xyes; then
>              GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS vdpau-softpipe"
> --
> 1.8.4

This patch fixes the build for me.

Tested-by: Vinson Lee v...@freedesktop.org
[Mesa-dev] [PATCH] configure: set HAVE_COMMON_DRI when building only swrast
With commit cb1febb07, I have incorrectly removed HAVE_COMMON_DRI
assuming that swrast does not need to build the translations for
driconf options, as effectively swrast/drisw does not use them.

With the incoming unification work of dri and drisw, it makes sense
just to revert the offending hunk.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70057
Reported-by: Vinson Lee v...@freedesktop.org
Tested-by: Vinson Lee v...@freedesktop.org
Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
---
Vinson, as I'm short of commit access (for obvious reasons), feel free
to commit this if you're happy with it (and no one objects).

I've made a silly assumption, which I'll try not to repeat in the future.

Cheers
Emil
---
 configure.ac | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configure.ac b/configure.ac
index e7c8223..9546163 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1823,6 +1823,7 @@ if test x$with_gallium_drivers != x; then
         if test x$enable_dri = xyes; then
             GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS dri-swrast"
+            HAVE_COMMON_DRI=yes
         fi
         if test x$enable_vdpau = xyes; then
             GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS vdpau-softpipe"
--
1.8.4
Re: [Mesa-dev] [PATCH 2/3] gallivm: handle explicit derivatives for cubemaps
Am 03.10.2013 21:39, schrieb Brian Paul: On 10/03/2013 09:42 AM, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com They need some special handling. Quite complicated. Additionally, use the same code for implicit derivatives too if no_rho_approx and no_quad_lod is set, because it seems while generally it should be ok to use per quad lod for implicit derivatives there's at least some test which insists that in case of cubemaps the shared lod value MUST come from a pixel inside the primitive (due to the derivatives becoming different if a different larger major axis is chosen). --- src/gallium/auxiliary/gallivm/lp_bld_sample.c | 221 +++-- src/gallium/auxiliary/gallivm/lp_bld_sample.h |3 +- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 35 +++- 3 files changed, 231 insertions(+), 28 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c index ea6bec7..ce05522 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c @@ -273,7 +273,7 @@ lp_build_rho(struct lp_build_sample_context *bld, cubesize = lp_build_mul(rho_bld, cubesize, cubesize); rho = lp_build_mul(rho_bld, cubesize, rho); } - else if (derivs !(bld-static_texture_state-target == PIPE_TEXTURE_CUBE)) { + else if (derivs) { LLVMValueRef ddmax[3], ddx[3], ddy[3]; for (i = 0; i dims; i++) { LLVMValueRef floatdim; @@ -1488,8 +1488,9 @@ lp_build_cube_face(struct lp_build_sample_context *bld, void lp_build_cube_lookup(struct lp_build_sample_context *bld, LLVMValueRef *coords, - const struct lp_derivatives *derivs, /* optional */ + const struct lp_derivatives *derivs_in, /* optional */ LLVMValueRef *rho, + struct lp_derivatives *derivs_out, /* optional */ boolean need_derivs) { struct lp_build_context *coord_bld = bld-coord_bld; @@ -1512,8 +1513,6 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld, * the edge). 
Still this is possibly a win over just selecting the same face * for all pixels. Unfortunately, something like that doesn't work for * explicit derivatives. - * TODO: handle explicit derivatives by transforming them alongside coords - * somehow. */ struct lp_build_context *cint_bld = bld-int_coord_bld; struct lp_type intctype = cint_bld-type; @@ -1522,7 +1521,7 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld, LLVMValueRef as_ge_at, maxasat, ar_ge_as_at; LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz; LLVMValueRef tnegi, rnegi; - LLVMValueRef ma, mai, ima; + LLVMValueRef ma, mai, imahalfpos; LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld-type, 0.5); LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype, 1 (intctype.width - 1)); @@ -1561,7 +1560,195 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld, maxasat = lp_build_max(coord_bld, as, at); ar_ge_as_at = lp_build_cmp(coord_bld, PIPE_FUNC_GEQUAL, ar, maxasat); - if (need_derivs) { + if (need_derivs (derivs_in || + ((gallivm_debug GALLIVM_DEBUG_NO_QUAD_LOD) + (gallivm_debug GALLIVM_DEBUG_NO_RHO_APPROX { + /* + * XXX: This is really really complex. + * It is a bit overkill to use this for implicit derivatives as well, + * no way this is worth the cost in practice, but seems to be the + * only way for getting accurate and per-pixel lod values. 
+ */ + LLVMValueRef imapos, tmp, ddx[3], ddy[3]; + LLVMValueRef madx, mady, madxdivma, madydivma; + LLVMValueRef sdxi, tdxi, rdxi, signsdx, signtdx, signrdx; + LLVMValueRef sdyi, tdyi, rdyi, signsdy, signtdy, signrdy; + LLVMValueRef tdxnegi, rdxnegi, tdynegi, rdynegi; + LLVMValueRef sdxnewx, sdxnewy, sdxnewz, tdxnewx, tdxnewy, tdxnewz; + LLVMValueRef sdynewx, sdynewy, sdynewz, tdynewx, tdynewy, tdynewz; + LLVMValueRef face_sdx, face_tdx, face_sdy, face_tdy; + LLVMValueRef posHalf = lp_build_const_vec(coord_bld-gallivm, + coord_bld-type, 0.5); + /* + * s = 1/2 * ( sc / ma + 1) + * t = 1/2 * ( tc / ma + 1) + * + * s' = 1/2 * (sc' * ma - sc * ma') / ma^2 + * t' = 1/2 * (tc' * ma - tc * ma') / ma^2 + * + * dx.s = 0.5 * (dx.sc - sc * dx.ma / ma) / ma + * dx.t = 0.5 * (dx.tc - tc * dx.ma / ma) / ma + * dy.s = 0.5 * (dy.sc - sc * dy.ma / ma) / ma +
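The derivative formulas in the patch comment above are the quotient rule applied to s = 1/2 * (sc / ma + 1). A scalar sketch, assuming a nonzero major axis value; the function names are hypothetical and the actual patch performs this on LLVM vector values with additional per-face selection and sign handling that is not reproduced here.

```c
#include <assert.h>

/* Hypothetical scalar helpers for illustration; not gallivm code.
 * "ma" is the major axis value and is assumed to be nonzero. */

/* s = 1/2 * (sc / ma + 1) */
float sketch_face_coord(float sc, float ma)
{
    return 0.5f * (sc / ma + 1.0f);
}

/*
 * Quotient rule: s' = 1/2 * (sc' * ma - sc * ma') / ma^2
 * rewritten as:  d.s = 0.5 * (d.sc - sc * d.ma / ma) / ma
 * where d_sc and d_ma are the derivatives of sc and ma along the same
 * screen direction (dx or dy).
 */
float sketch_face_deriv(float sc, float ma, float d_sc, float d_ma)
{
    return 0.5f * (d_sc - sc * d_ma / ma) / ma;
}
```

For example, with sc = 1, ma = 2, d_sc = 1 and d_ma = 0 the derivative is 0.25; raising d_ma to 2 cancels it to 0, since sc / ma then stays constant to first order.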
Re: [Mesa-dev] [PATCH] radeonsi/compute: Fix segfault caused by recent refactoring
Reviewed-by: Marek Olšák marek.ol...@amd.com

Marek

On Thu, Oct 3, 2013 at 11:39 PM, Tom Stellard t...@stellard.net wrote:
> From: Tom Stellard thomas.stell...@amd.com
> [...]
Re: [Mesa-dev] [PATCH 1/2] [v2] i965: Extract region use from hiz depth buffer
On 10/01/2013 04:36 PM, Ben Widawsky wrote: On Tue, Oct 01, 2013 at 01:06:02PM -0700, Chad Versace wrote: On 09/30/2013 12:35 PM, Ben Widawsky wrote: Starting with Ivybridge, the hierarchical had relaxed requirements for its allocation. Following a simple formula in the bspec was all you needed to satisfy the requirement. To prepare the code for this, extract all places where the miptree was used, when we really only needed the region. This allows an upcoming patch to simply allocate the region, and not the whole miptree. v2: Don't use intel_region. Instead use bo + stride. We actually do store the stride in libdrm, but it is inaccessible in the current libdrm version. CC: Chad Versace chad.vers...@linux.intel.com Signed-off-by: Ben Widawsky b...@bwidawsk.net --- src/mesa/drivers/dri/i965/brw_misc_state.c| 11 +--- src/mesa/drivers/dri/i965/gen6_blorp.cpp | 20 +-- src/mesa/drivers/dri/i965/gen7_blorp.cpp | 6 ++--- src/mesa/drivers/dri/i965/gen7_misc_state.c | 5 ++-- src/mesa/drivers/dri/i965/intel_fbo.c | 4 +-- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 36 +++ src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 6 - 7 files changed, 52 insertions(+), 36 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c index 7f4cd6f..23ffeab 100644 --- a/src/mesa/drivers/dri/i965/brw_misc_state.c +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c @@ -210,8 +210,12 @@ brw_get_depthstencil_tile_masks(struct intel_mipmap_tree *depth_mt, tile_mask_x, tile_mask_y, false); if (intel_miptree_slice_has_hiz(depth_mt, depth_level, depth_layer)) { +uint32_t tmp; uint32_t hiz_tile_mask_x, hiz_tile_mask_y; - intel_region_get_tile_masks(depth_mt-hiz_mt-region, +struct intel_region region = { .cpp = depth_mt-cpp }; + +drm_intel_bo_get_tiling(depth_mt-hiz_buffer.bo, region.tiling, tmp); + intel_region_get_tile_masks(region, hiz_tile_mask_x, hiz_tile_mask_y, false); /* Each HiZ row represents 2 rows of pixels */ @@ -667,11 +671,10 @@ 
brw_emit_depth_stencil_hiz(struct brw_context *brw, /* Emit hiz buffer. */ if (hiz) { - struct intel_mipmap_tree *hiz_mt = depth_mt-hiz_mt; BEGIN_BATCH(3); OUT_BATCH((_3DSTATE_HIER_DEPTH_BUFFER 16) | (3 - 2)); -OUT_BATCH(hiz_mt-region-pitch - 1); -OUT_RELOC(hiz_mt-region-bo, +OUT_BATCH(depth_mt-hiz_buffer.stride - 1); +OUT_RELOC(depth_mt-hiz_buffer.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, brw-depthstencil.hiz_offset); ADVANCE_BATCH(); diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp b/src/mesa/drivers/dri/i965/gen6_blorp.cpp index da523e5..fc3a331 100644 --- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp +++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp @@ -887,16 +887,22 @@ gen6_blorp_emit_depth_stencil_config(struct brw_context *brw, /* 3DSTATE_HIER_DEPTH_BUFFER */ { - struct intel_region *hiz_region = params-depth.mt-hiz_mt-region; - uint32_t hiz_offset = - intel_region_get_aligned_offset(hiz_region, - draw_x ~tile_mask_x, - (draw_y ~tile_mask_y) / 2, false); + uint32_t hiz_offset, tmp; + struct intel_mipmap_tree *depth_mt = params-depth.mt; + struct intel_region hiz_region; + + hiz_region.cpp = depth_mt-cpp; + hiz_region.pitch = depth_mt-hiz_buffer.stride; + drm_intel_bo_get_tiling(depth_mt-hiz_buffer.bo, hiz_region.tiling, tmp); This initialization of hiz_region subtly differs from the initialization in the previous hunk that uses the designated initializer syntax. When using designated initializers, all uninitialized fields are initialized to 0. Here, the uninitialized fields have undefined values. Please use designated initializers here to prevent undefined behavior. 
+ + hiz_offset = intel_region_get_aligned_offset(hiz_region, + draw_x ~tile_mask_x, + (draw_y ~tile_mask_y) / 2, false); BEGIN_BATCH(3); OUT_BATCH((_3DSTATE_HIER_DEPTH_BUFFER 16) | (3 - 2)); - OUT_BATCH(hiz_region-pitch - 1); - OUT_RELOC(hiz_region-bo, + OUT_BATCH(hiz_region.pitch - 1); + OUT_RELOC(depth_mt-hiz_buffer.bo, The 'hiz_region' is a temporary thing that will eventually die off as we clean up the driver. So, replace OUT_BATCH(hiz_region.pitch - 1) with OUT_BATCH(depth_mt-hiz_buffer.stride - 1). (As a nice little side-effect, the sequence of OUT_BATCH's look more symmetric that way). I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, hiz_offset);
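The review comment about designated initializers can be illustrated with a small C99 sketch. The struct below is a made-up stand-in, not the real intel_region, and the field set is an assumption chosen only to show the zeroing behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for illustration; not the real intel_region. */
struct sketch_region {
    unsigned cpp;
    unsigned pitch;
    unsigned tiling;
    void *bo;
};

/*
 * With a designated initializer, every member that is not named is
 * initialized as if it had static storage duration, i.e. to zero or
 * a null pointer. Assigning members one by one to a plain local
 * instead leaves the remaining members indeterminate.
 */
struct sketch_region sketch_make_region(unsigned cpp)
{
    struct sketch_region region = { .cpp = cpp };
    return region;
}
```

Only cpp is named in the initializer, yet pitch, tiling and bo are all guaranteed to be zero in the returned value.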
Re: [Mesa-dev] [PATCH 1/2] [v2] i965: Extract region use from hiz depth buffer
On 10/01/2013 04:48 PM, Ben Widawsky wrote:
> On Tue, Oct 01, 2013 at 01:06:02PM -0700, Chad Versace wrote:
>> On 09/30/2013 12:35 PM, Ben Widawsky wrote:
>>> diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
>>> index da523e5..fc3a331 100644
>>> --- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
>>> +++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
>>> @@ -887,16 +887,22 @@ gen6_blorp_emit_depth_stencil_config(struct brw_context *brw,
>>>     /* 3DSTATE_HIER_DEPTH_BUFFER */
>>>     {
>>> -      struct intel_region *hiz_region = params->depth.mt->hiz_mt->region;
>>> -      uint32_t hiz_offset =
>>> -         intel_region_get_aligned_offset(hiz_region,
>>> -                                         draw_x & ~tile_mask_x,
>>> -                                         (draw_y & ~tile_mask_y) / 2, false);
>>> +      uint32_t hiz_offset, tmp;
>>> +      struct intel_mipmap_tree *depth_mt = params->depth.mt;
>>> +      struct intel_region hiz_region;
>>> +
>>> +      hiz_region.cpp = depth_mt->cpp;
>>> +      hiz_region.pitch = depth_mt->hiz_buffer.stride;
>>> +      drm_intel_bo_get_tiling(depth_mt->hiz_buffer.bo, &hiz_region.tiling, &tmp);
>>
>> This initialization of hiz_region subtly differs from the initialization
>> in the previous hunk that uses the designated initializer syntax. When
>> using designated initializers, all uninitialized fields are initialized
>> to 0. Here, the uninitialized fields have undefined values. Please use
>> designated initializers here to prevent undefined behavior.
>>
>>> +
>>> +      hiz_offset = intel_region_get_aligned_offset(&hiz_region,
>>> +                                                   draw_x & ~tile_mask_x,
>>> +                                                   (draw_y & ~tile_mask_y) / 2, false);
>>>        BEGIN_BATCH(3);
>>>        OUT_BATCH((_3DSTATE_HIER_DEPTH_BUFFER << 16) | (3 - 2));
>>> -      OUT_BATCH(hiz_region->pitch - 1);
>>> -      OUT_RELOC(hiz_region->bo,
>>> +      OUT_BATCH(hiz_region.pitch - 1);
>>> +      OUT_RELOC(depth_mt->hiz_buffer.bo,
>>
>> The 'hiz_region' is a temporary thing that will eventually die off as we
>> clean up the driver. So, replace OUT_BATCH(hiz_region.pitch - 1) with
>> OUT_BATCH(depth_mt->hiz_buffer.stride - 1). (As a nice little side-effect,
>> the sequence of OUT_BATCH's look more symmetric that way).
>
> Are you referring to memset? The only initializer is
> intel_region_alloc_internal() which I do not have access to (and indeed
> it seems like the wrong thing to make it extern).

As you pointed out yesterday, this is C++ code, so a designated struct
initializer can't be used. Oops. We still need to avoid passing around
uninitialized data, though. I think memset is a good choice here.
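A minimal sketch of the memset approach suggested above, written so that it also compiles as C++. The struct and function names are invented for illustration and do not match the driver's intel_region; the point is only that the whole object is zeroed before the known fields are filled in.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for illustration; not the real intel_region. */
struct sketch_hiz_region {
    unsigned cpp;
    unsigned pitch;
    unsigned tiling;
};

/*
 * Zero the whole struct first, then set the members that are known.
 * Any member that is not explicitly assigned afterwards reads as 0
 * rather than as leftover stack contents.
 */
void sketch_init_hiz_region(struct sketch_hiz_region *region,
                            unsigned cpp, unsigned pitch)
{
    memset(region, 0, sizeof(*region));
    region->cpp = cpp;
    region->pitch = pitch;
}
```

The caller can then hand the struct to helpers that read fields it never set, such as tiling in this sketch, without passing indeterminate data around.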
Re: [Mesa-dev] Janitorial work: no more intel_context.[ch]; tidying
On 10/02/2013 12:51 PM, Matt Turner wrote:
> On Wed, Oct 2, 2013 at 11:02 AM, Ian Romanick i...@freedesktop.org wrote:
>> (Adding Alan to the CC list.)
>>
>> On 10/01/2013 10:51 PM, Vinson Lee wrote:
>>> On Mon, Sep 30, 2013 at 10:21 PM, Kenneth Graunke kenn...@whitecape.org wrote:
>>>> On 09/27/2013 06:24 PM, Emil Velikov wrote:
>>>>> * With the recent split of the intel driver codebase, the new i965
>>>>>   headers have been getting a bunch of #pragma once over the standard
>>>>>   #ifndef _HEADER_H_... Are those intentional?
>>>>
>>>> Yup, that's intentional. #pragma once doesn't require inventing a
>>>> unique #define name, is less typing, and is faster on some compilers.
>>>>
>>>> I actually forgot that it wasn't standard. It's supported basically
>>>> everywhere, though, so I'd be really shocked if it caused problems.
>>>
>>> Oracle Solaris Studio does not support #pragma once.
>>
>> Is there *any* reason to use that compiler over GCC? This isn't the
>> first time that we've discovered it to be lacking some feature that GCC,
>> clang, and Visual Studio all support. :(
>
> Before we go down this rabbit hole -- Vinson said it doesn't support
> #pragma once. He didn't say it caused problems. I don't expect it does,
> since we're already using it and have been for a long time.

It probably just means that you have to do #pragma once along with the
standard #ifndef ... #endif wrapper. I'm not opposed to doing that. I
just didn't think it was necessary anymore.

However, note that brw_blorp.h, brw_fs.h, brw_shader.h, gen6_blorp.h,
gen7_blorp.h, and intel_resolve_map.h already use #pragma once and don't
use the standard #ifndef...#endif wrapping. I think those are all C++
based, though... Maybe we should switch those.

--Ken
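A sketch of the belt-and-braces header layout discussed above, combining #pragma once with a conventional include guard. The guard macro, struct and function are placeholders invented for this example and do not correspond to any Mesa header.

```c
/* sketch_example.h: hypothetical header, for illustration only. */
#pragma once              /* fast path on compilers that support it */
#ifndef SKETCH_EXAMPLE_H  /* portable fallback for those that do not */
#define SKETCH_EXAMPLE_H

#include <assert.h>

struct sketch_extent {
    int width;
    int height;
};

/* Small inline helper so the header has something to guard. */
static inline int sketch_extent_area(struct sketch_extent e)
{
    assert(e.width >= 0 && e.height >= 0);
    return e.width * e.height;
}

#endif /* SKETCH_EXAMPLE_H */
```

A compiler that does not recognise the pragma is expected to ignore it and rely on the #ifndef guard. Note that compiling such a file directly rather than including it may trigger a warning about #pragma once in the main file on some compilers.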