Re: [Mesa-dev] [PATCH] mesa: fix make check for ARB_texture_gather

2013-10-03 Thread Kenneth Graunke
On 10/02/2013 06:11 PM, Chris Forbes wrote:
> Clean up inconsistency in enum decoration:
> - Use the undecorated enums where possible.
> - MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB remains decorated, since it
>   has no undecorated equivalent in GL4.
>
> Signed-off-by: Chris Forbes chr...@ijw.co.nz

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70054
Reviewed-by: Kenneth Graunke kenn...@whitecape.org
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 70054] EnumStrings.LookUpByNumber regression

2013-10-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70054

Chris Forbes chr...@ijw.co.nz changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Chris Forbes chr...@ijw.co.nz ---
Fix is on master now.



Re: [Mesa-dev] gallium clear and depth mask clarification

2013-10-03 Thread Jose Fonseca
I believe depth clears should not be affected by pipe_depth_state::writemask.

I suspect that the only reason the depth mask is not explicitly mentioned is 
that it is a boolean, unlike the color/stencil write masks, which are proper 
bitmasks.

Therefore the depth write mask carries no information beyond what is already 
expressed by the PIPE_CLEAR_DEPTH bit.
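
A minimal sketch of the semantics described here, not Gallium's actual code. All toy_* names and the bit values are hypothetical stand-ins; only the idea (the clear looks at the buffers bit and never at the depth write mask) comes from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the values are illustrative, not Gallium's. */
#define TOY_CLEAR_DEPTH   (1u << 0)
#define TOY_CLEAR_STENCIL (1u << 1)

struct toy_depth_state {
   bool writemask;   /* boolean depth write mask, as used for draws */
};

struct toy_zbuffer {
   float depth;
};

/* The clear ignores the depth write mask: the buffers bit alone decides. */
static void
toy_clear(struct toy_zbuffer *zb, const struct toy_depth_state *state,
          unsigned buffers, float depth)
{
   (void) state;   /* deliberately unused */
   if (buffers & TOY_CLEAR_DEPTH)
      zb->depth = depth;
}

/* Hypothetical GL-side helper: GL clears respect glDepthMask, so the
 * caller drops the depth bit before calling toy_clear(). */
static unsigned
toy_gl_clear_bits(unsigned buffers, bool gl_depth_mask)
{
   return gl_depth_mask ? buffers : (buffers & ~TOY_CLEAR_DEPTH);
}
```

Under this reading, whatever sits above the driver would be the one to translate glDepthMask into the presence or absence of the depth bit.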

Jose

- Original Message -
> Just want to check an inconsistency,
>
> so GL clears respect glDepthMask, gallium docs don't explicitly
> mention depth masking, they say clear isn't affected by color or
> stencil write masks, should that sentence contain depth?
>
> Dave.


Re: [Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event

2013-10-03 Thread Niels Ole Salscheider
> I don't think this is right, with this patch we remove *all* events from
> the command queue, signalled or not, every time the command queue is
> flushed.

You are right, I got the logic wrong here (see also 
http://lists.freedesktop.org/archives/mesa-dev/2013-September/044363.html).

The problem is that I have an application that causes a leak of event objects. 
That is, some events are never deleted from the queue. I will have to debug 
this further, but I am somewhat busy right now since I have just relocated.


Re: [Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event

2013-10-03 Thread Niels Ole Salscheider
> Do you have any example of a real world application that relies on this?
> Or at least some reasonable use case?

The problem is that the queue is only cleared from already signalled events 
when we flush it. And we might not do this if the user only calls 
clWaitForEvents once the corresponding event has already been signalled.

I am fine with not flushing the queue, but we should at least make sure that 
signalled events are freed early enough.
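
One way to free signalled events without flushing could look like the sketch below. This is plain C for illustration only, not clover code (clover is C++), and toy_event and toy_prune_signalled are hypothetical names. It removes only the events that have already signalled and leaves pending ones queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event record; a real event carries far more state. */
struct toy_event {
   bool signalled;
};

/* Compact the queue in place, keeping only events that have not yet
 * signalled. Returns the new number of queued events. */
static size_t
toy_prune_signalled(struct toy_event **queue, size_t count)
{
   size_t kept = 0;

   assert(queue != NULL || count == 0);
   for (size_t i = 0; i < count; i++) {
      if (!queue[i]->signalled)
         queue[kept++] = queue[i];
   }
   return kept;
}
```

Calling something like this from the wait path as well as the flush path would bound the queue's growth without forcing a flush.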


[Mesa-dev] [PATCH 1/3] gallivm: ignore rho approximation for cube maps

2013-10-03 Thread sroland
From: Roland Scheidegger srol...@vmware.com

There are two reasons for this:
1) even when ignoring rho approximation for cube maps, the result is still
not correct, but it's better as the max error at edges is now sqrt(2) instead
of 2 (which was a full mip level), same as it is for ordinary 2d maps when
doing rho approximations (so the error actually goes from factor 2 at edges and
sqrt(2) completely inside a face to sqrt(2) at edges and 0 inside a face).
2) I want to repurpose rho_no_approx for cubemaps for fully correct cubemap
derivatives (so don't need yet another debug var).
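
For reference, the sqrt(2) figure for ordinary 2d maps can be seen from a small sketch. This is not gallivm code and the toy_* names are hypothetical. Both functions return rho squared, mirroring the "skipping sqrt" convention in the patch: the approximation takes the larger squared derivative, the exact form sums them.

```c
#include <assert.h>

/* Approximate rho^2: the larger of the two squared derivatives,
 * i.e. max(|du|, |dv|) squared. */
static float
toy_rho2_approx(float du, float dv)
{
   float a = du * du;
   float b = dv * dv;
   return a > b ? a : b;
}

/* Exact rho^2: squared length of the derivative vector. */
static float
toy_rho2_exact(float du, float dv)
{
   return du * du + dv * dv;
}
```

The two agree when one derivative is zero and differ most when du == dv, where the exact value is twice the approximate one. A factor 2 in rho squared is a factor sqrt(2) in rho, which is half a mip level.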
---
 src/gallium/auxiliary/gallivm/lp_bld_sample.c |   34 +
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c 
b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index c775382..ea6bec7 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -269,10 +269,8 @@ lp_build_rho(struct lp_build_sample_context *bld,
   /* Could optimize this for single quad just skip the broadcast */
   cubesize = lp_build_extract_broadcast(gallivm, bld->float_size_in_type,
 rho_bld->type, float_size, index0);
-  if (no_rho_opt) {
- /* skipping sqrt hence returning rho squared */
- cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
-  }
+  /* skipping sqrt hence returning rho squared */
+  cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
   rho = lp_build_mul(rho_bld, cubesize, rho);
}
else if (derivs && !(bld->static_texture_state->target == 
PIPE_TEXTURE_CUBE)) {
@@ -757,8 +755,8 @@ lp_build_lod_selector(struct lp_build_sample_context *bld,
   }
   else {
  LLVMValueRef rho;
- boolean rho_squared = (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) &&
-   (bld->dims > 1);
+ boolean rho_squared = ((gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) &&
+(bld->dims > 1)) || cube_rho;
 
  rho = lp_build_rho(bld, texture_unit, s, t, r, cube_rho, derivs);
 
@@ -1602,31 +1600,23 @@ lp_build_cube_lookup(struct lp_build_sample_context 
*bld,
   * know the texture is square which simplifies things (we can omit the
   * size mul which happens very early completely here and do it at the
   * very end).
+  * Also always do calculations according to 
GALLIVM_DEBUG_NO_RHO_APPROX
+  * since the error can get quite big otherwise at edges.
+  * (With no_rho_approx max error is sqrt(2) at edges, same as it is
+  * without no_rho_approx for 2d textures, otherwise it would be 
factor 2.)
   */
  ddx_ddy[0] = lp_build_packed_ddx_ddy_twocoord(coord_bld, s, t);
  ddx_ddy[1] = lp_build_packed_ddx_ddy_onecoord(coord_bld, r);
 
- if (gallivm_debug  GALLIVM_DEBUG_NO_RHO_APPROX) {
-ddx_ddy[0] = lp_build_mul(coord_bld, ddx_ddy[0], ddx_ddy[0]);
-ddx_ddy[1] = lp_build_mul(coord_bld, ddx_ddy[1], ddx_ddy[1]);
- }
- else {
-ddx_ddy[0] = lp_build_abs(coord_bld, ddx_ddy[0]);
-ddx_ddy[1] = lp_build_abs(coord_bld, ddx_ddy[1]);
- }
+ ddx_ddy[0] = lp_build_mul(coord_bld, ddx_ddy[0], ddx_ddy[0]);
+ ddx_ddy[1] = lp_build_mul(coord_bld, ddx_ddy[1], ddx_ddy[1]);
 
  tmp[0] = lp_build_swizzle_aos(coord_bld, ddx_ddy[0], swizzle01);
  tmp[1] = lp_build_swizzle_aos(coord_bld, ddx_ddy[0], swizzle23);
  tmp[2] = lp_build_swizzle_aos(coord_bld, ddx_ddy[1], swizzle02);
 
- if (gallivm_debug  GALLIVM_DEBUG_NO_RHO_APPROX) {
-rho_vec = lp_build_add(coord_bld, tmp[0], tmp[1]);
-rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]);
- }
- else {
-rho_vec = lp_build_max(coord_bld, tmp[0], tmp[1]);
-rho_vec = lp_build_max(coord_bld, rho_vec, tmp[2]);
- }
+ rho_vec = lp_build_add(coord_bld, tmp[0], tmp[1]);
+ rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]);
 
  tmp[0] = lp_build_swizzle_aos(coord_bld, rho_vec, swizzle0);
  tmp[1] = lp_build_swizzle_aos(coord_bld, rho_vec, swizzle1);
-- 
1.7.9.5


[Mesa-dev] [PATCH 2/3] gallivm: handle explicit derivatives for cubemaps

2013-10-03 Thread sroland
From: Roland Scheidegger srol...@vmware.com

They need some special handling. Quite complicated.
Additionally, use the same code for implicit derivatives too if no_rho_approx
and no_quad_lod is set, because it seems while generally it should be ok
to use per quad lod for implicit derivatives there's at least some test which
insists that in case of cubemaps the shared lod value MUST come from a pixel
inside the primitive (due to the derivatives becoming different if a different
larger major axis is chosen).
---
 src/gallium/auxiliary/gallivm/lp_bld_sample.c |  221 +++--
 src/gallium/auxiliary/gallivm/lp_bld_sample.h |3 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |   35 +++-
 3 files changed, 231 insertions(+), 28 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c 
b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index ea6bec7..ce05522 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -273,7 +273,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
   cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
   rho = lp_build_mul(rho_bld, cubesize, rho);
}
-   else if (derivs && !(bld->static_texture_state->target == 
PIPE_TEXTURE_CUBE)) {
+   else if (derivs) {
   LLVMValueRef ddmax[3], ddx[3], ddy[3];
   for (i = 0; i < dims; i++) {
  LLVMValueRef floatdim;
@@ -1488,8 +1488,9 @@ lp_build_cube_face(struct lp_build_sample_context *bld,
 void
 lp_build_cube_lookup(struct lp_build_sample_context *bld,
  LLVMValueRef *coords,
- const struct lp_derivatives *derivs, /* optional */
+ const struct lp_derivatives *derivs_in, /* optional */
  LLVMValueRef *rho,
+ struct lp_derivatives *derivs_out, /* optional */
  boolean need_derivs)
 {
struct lp_build_context *coord_bld = bld-coord_bld;
@@ -1512,8 +1513,6 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
* the edge). Still this is possibly a win over just selecting the same 
face
* for all pixels. Unfortunately, something like that doesn't work for
* explicit derivatives.
-   * TODO: handle explicit derivatives by transforming them alongside 
coords
-   * somehow.
*/
   struct lp_build_context *cint_bld = bld-int_coord_bld;
   struct lp_type intctype = cint_bld-type;
@@ -1522,7 +1521,7 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
   LLVMValueRef as_ge_at, maxasat, ar_ge_as_at;
   LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz;
   LLVMValueRef tnegi, rnegi;
-  LLVMValueRef ma, mai, ima;
+  LLVMValueRef ma, mai, imahalfpos;
   LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld-type, 0.5);
   LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype,
  1  (intctype.width - 
1));
@@ -1561,7 +1560,195 @@ lp_build_cube_lookup(struct lp_build_sample_context 
*bld,
   maxasat = lp_build_max(coord_bld, as, at);
   ar_ge_as_at = lp_build_cmp(coord_bld, PIPE_FUNC_GEQUAL, ar, maxasat);
 
-  if (need_derivs) {
+  if (need_derivs && (derivs_in ||
+  ((gallivm_debug & GALLIVM_DEBUG_NO_QUAD_LOD) &&
+   (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX)))) {
+ /*
+  * XXX: This is really really complex.
+  * It is a bit overkill to use this for implicit derivatives as well,
+  * no way this is worth the cost in practice, but seems to be the
+  * only way for getting accurate and per-pixel lod values.
+  */
+ LLVMValueRef imapos, tmp, ddx[3], ddy[3];
+ LLVMValueRef madx, mady, madxdivma, madydivma;
+ LLVMValueRef sdxi, tdxi, rdxi, signsdx, signtdx, signrdx;
+ LLVMValueRef sdyi, tdyi, rdyi, signsdy, signtdy, signrdy;
+ LLVMValueRef tdxnegi, rdxnegi, tdynegi, rdynegi;
+ LLVMValueRef sdxnewx, sdxnewy, sdxnewz, tdxnewx, tdxnewy, tdxnewz;
+ LLVMValueRef sdynewx, sdynewy, sdynewz, tdynewx, tdynewy, tdynewz;
+ LLVMValueRef face_sdx, face_tdx, face_sdy, face_tdy;
+ LLVMValueRef posHalf = lp_build_const_vec(coord_bld-gallivm,
+   coord_bld-type, 0.5);
+ /*
+  * s = 1/2 * ( sc / ma + 1)
+  * t = 1/2 * ( tc / ma + 1)
+  *
+  * s' = 1/2 * (sc' * ma - sc * ma') / ma^2
+  * t' = 1/2 * (tc' * ma - tc * ma') / ma^2
+  *
+  * dx.s = 0.5 * (dx.sc - sc * dx.ma / ma) / ma
+  * dx.t = 0.5 * (dx.tc - tc * dx.ma / ma) / ma
+  * dy.s = 0.5 * (dy.sc - sc * dy.ma / ma) / ma
+  * dy.t = 0.5 * (dy.tc - tc * dy.ma / ma) / ma
+  */
+
+ /* select ma, calculate ima */
+ ma = lp_build_select(coord_bld, as_ge_at, s, t);
+ ma = lp_build_select(coord_bld, ar_ge_as_at, r, ma);
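
The formulas in the comment above follow from the quotient rule applied to s = 1/2 * (sc / ma + 1). A scalar sketch, not the vectorized gallivm code and using hypothetical toy_* names, is given below. It follows the comment's formula as written and ignores any sign handling of ma that the real code may perform.

```c
#include <assert.h>

/* Face coordinate from the comment: s = 1/2 * (sc / ma + 1). */
static double
toy_face_coord(double sc, double ma)
{
   assert(ma != 0.0);
   return 0.5 * (sc / ma + 1.0);
}

/* Derivative of the face coordinate along one screen axis:
 * d.s = 0.5 * (d.sc - sc * d.ma / ma) / ma
 * where dsc and dma are the derivatives of sc and ma along that axis. */
static double
toy_face_deriv(double sc, double ma, double dsc, double dma)
{
   assert(ma != 0.0);
   return 0.5 * (dsc - sc * dma / ma) / ma;
}
```

A quick sanity check is to compare toy_face_deriv() against a central finite difference of toy_face_coord() for sc and ma that vary linearly along the axis; the two should agree to within rounding.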

[Mesa-dev] [PATCH 3/3] gallivm: kill old per-quad face selection code

2013-10-03 Thread sroland
From: Roland Scheidegger srol...@vmware.com

Not used for ages, and it wouldn't work at all with explicit derivatives now
(not that it did before, as it ignored them, but now the code would just use
the derivs from before projection, which would be quite random numbers).
---
 src/gallium/auxiliary/gallivm/lp_bld_sample.c |  751 +++--
 1 file changed, 313 insertions(+), 438 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c 
b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index ce05522..3fac981 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -1493,323 +1493,135 @@ lp_build_cube_lookup(struct lp_build_sample_context 
*bld,
  struct lp_derivatives *derivs_out, /* optional */
  boolean need_derivs)
 {
+   /*
+* Do per-pixel face selection. We cannot however (as we used to do)
+* simply calculate the derivs afterwards (which is very bogus for
+* explicit derivs btw) because the values would be random when
+* not all pixels lie on the same face. So what we do here is just
+* calculate the derivatives after scaling the coords by the absolute
+* value of the inverse major axis, and essentially do rho calculation
+* steps as if it were a 3d texture. This is perfect if all pixels hit
+* the same face, but not so great at edges, I believe the max error
+* should be sqrt(2) with no_rho_approx or 2 otherwise (essentially 
measuring
+* the 3d distance between 2 points on the cube instead of measuring up/down
+* the edge). Still this is possibly a win over just selecting the same face
+* for all pixels. Unfortunately, something like that doesn't work for
+* explicit derivatives.
+*/
struct lp_build_context *coord_bld = bld-coord_bld;
LLVMBuilderRef builder = bld-gallivm-builder;
struct gallivm_state *gallivm = bld-gallivm;
LLVMValueRef si, ti, ri;
+   struct lp_build_context *cint_bld = bld-int_coord_bld;
+   struct lp_type intctype = cint_bld-type;
+   LLVMValueRef signs, signt, signr, signma;
+   LLVMValueRef as, at, ar, face, face_s, face_t;
+   LLVMValueRef as_ge_at, maxasat, ar_ge_as_at;
+   LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz;
+   LLVMValueRef tnegi, rnegi;
+   LLVMValueRef ma, mai, imahalfpos;
+   LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld-type, 0.5);
+   LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype,
+  1  (intctype.width - 1));
+   LLVMValueRef signshift = lp_build_const_int_vec(gallivm, intctype,
+   intctype.width -1);
+   LLVMValueRef facex = lp_build_const_int_vec(gallivm, intctype, 
PIPE_TEX_FACE_POS_X);
+   LLVMValueRef facey = lp_build_const_int_vec(gallivm, intctype, 
PIPE_TEX_FACE_POS_Y);
+   LLVMValueRef facez = lp_build_const_int_vec(gallivm, intctype, 
PIPE_TEX_FACE_POS_Z);
+   LLVMValueRef s = coords[0];
+   LLVMValueRef t = coords[1];
+   LLVMValueRef r = coords[2];
+
+   assert(PIPE_TEX_FACE_NEG_X == PIPE_TEX_FACE_POS_X + 1);
+   assert(PIPE_TEX_FACE_NEG_Y == PIPE_TEX_FACE_POS_Y + 1);
+   assert(PIPE_TEX_FACE_NEG_Z == PIPE_TEX_FACE_POS_Z + 1);
 
-   if (1 || coord_bld-type.length  4) {
-  /*
-   * Do per-pixel face selection. We cannot however (as we used to do)
-   * simply calculate the derivs afterwards (which is very bogus for
-   * explicit derivs btw) because the values would be random when
-   * not all pixels lie on the same face. So what we do here is just
-   * calculate the derivatives after scaling the coords by the absolute
-   * value of the inverse major axis, and essentially do rho calculation
-   * steps as if it were a 3d texture. This is perfect if all pixels hit
-   * the same face, but not so great at edges, I believe the max error
-   * should be sqrt(2) with no_rho_approx or 2 otherwise (essentially 
measuring
-   * the 3d distance between 2 points on the cube instead of measuring 
up/down
-   * the edge). Still this is possibly a win over just selecting the same 
face
-   * for all pixels. Unfortunately, something like that doesn't work for
-   * explicit derivatives.
-   */
-  struct lp_build_context *cint_bld = bld-int_coord_bld;
-  struct lp_type intctype = cint_bld-type;
-  LLVMValueRef signs, signt, signr, signma;
-  LLVMValueRef as, at, ar, face, face_s, face_t;
-  LLVMValueRef as_ge_at, maxasat, ar_ge_as_at;
-  LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz;
-  LLVMValueRef tnegi, rnegi;
-  LLVMValueRef ma, mai, imahalfpos;
-  LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld-type, 0.5);
-  LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype,
- 1  (intctype.width - 
1));
-  LLVMValueRef signshift = 

Re: [Mesa-dev] [PATCH] gen7: Use logical, not physical, dims in 3DSTATE_DEPTH_BUFFER (v2)

2013-10-03 Thread Jordan Justen
It would be good to test HSW too.

Reviewed-by: Jordan Justen jordan.l.jus...@intel.com

On Wed, 2013-10-02 at 17:50 -0700, Chad Versace wrote:
 In 3DSTATE_DEPTH_BUFFER, we set Width and Height to the miptree slice's
 physical dimensions. (Logical and physical dimensions may differ for
 multisample surfaces).
 
 However, in SURFACE_STATE, we always set Width and Height to the slice's
 logical dimensions. We should do the same for 3DSTATE_DEPTH_BUFFER,
 because the hw docs say so.
 
 No Piglit regressions (-x glx -x glean) on Ivybridge with Wayland.
 
 v2: No Piglit regressions, for real this time.
 
 CC: Jordan Justen jordan.l.jus...@intel.com
 CC: Eric Anholt e...@anholt.org
 Signed-off-by: Chad Versace chad.vers...@linux.intel.com
 ---
 
 My first patch was garbage. It segfaulted in gen7_misc_state.c and hung the 
 GPU
 in gen7_blorp.cpp. I thought I regression tested it, but my Piglit automation
 scripts suck; the scripts just ran Piglit twice on master.
 
 I really regression tested this patch. I promise.
 
 
 
  src/mesa/drivers/dri/i965/gen7_blorp.cpp| 4 ++--
  src/mesa/drivers/dri/i965/gen7_misc_state.c | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp 
 b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
 index 9df3d92..f64e536 100644
 --- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
 +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
 @@ -706,8 +706,8 @@ gen7_blorp_emit_depth_stencil_config(struct brw_context 
 *brw,
surfwidth = params-depth.width;
surfheight = params-depth.height;
 } else {
 -  surfwidth = params-depth.mt-physical_width0;
 -  surfheight = params-depth.mt-physical_height0;
 +  surfwidth = params-depth.mt-logical_width0;
 +  surfheight = params-depth.mt-logical_height0;
 }
  
 /* 3DSTATE_DEPTH_BUFFER */
 diff --git a/src/mesa/drivers/dri/i965/gen7_misc_state.c 
 b/src/mesa/drivers/dri/i965/gen7_misc_state.c
 index eb942cf..3f3833e 100644
 --- a/src/mesa/drivers/dri/i965/gen7_misc_state.c
 +++ b/src/mesa/drivers/dri/i965/gen7_misc_state.c
 @@ -93,8 +93,8 @@ gen7_emit_depth_stencil_hiz(struct brw_context *brw,
 lod = irb ? irb-mt_level - irb-mt-first_level : 0;
  
 if (mt) {
 -  width = mt-physical_width0;
 -  height = mt-physical_height0;
 +  width = mt-logical_width0;
 +  height = mt-logical_height0;
 }
  
 /* _NEW_DEPTH, _NEW_STENCIL, _NEW_BUFFERS */




[Mesa-dev] [PATCH 09/10] i965/fs: Add a peephole pass to combine ADD with ADDC/SUBB.

2013-10-03 Thread Matt Turner
v2: Check fixed_hw_reg.{file,nr} instead of dst.reg.
v3: Store the bool emitted_addc_or_subb in the class, not static.
---
 src/mesa/drivers/dri/i965/brw_fs.h   |   3 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 104 +++
 2 files changed, 107 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 6a53e59..c703c2b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -345,6 +345,7 @@ public:
 fs_reg src0, fs_reg src1);
bool try_emit_saturate(ir_expression *ir);
bool try_emit_mad(ir_expression *ir, int mul_arg);
+   void try_combine_add_with_addc_subb();
void try_replace_with_sel();
void emit_bool_to_cond_code(ir_rvalue *condition);
void emit_if_gen6(ir_if *ir);
@@ -458,6 +459,8 @@ public:
 
int force_uncompressed_stack;
int force_sechalf_stack;
+
+   bool emitted_addc_or_subb;
 };
 
 /**
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index b8c30e6..8accbd6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -313,6 +313,102 @@ fs_visitor::try_emit_mad(ir_expression *ir, int mul_arg)
return true;
 }
 
+/**
+ * The addition and carry in the uaddCarry() built-in function are implemented
+ * separately as ir_binop_add and ir_binop_carry respectively. i965 generates
+ * ADDC and a MOV from the accumulator for the carry.
+ *
+ * The generated code for uaddCarry(uint x, uint y, out uint carry) would look
+ * like this:
+ *
+ *addc null, x, y
+ *mov  carry, acc0
+ *add  sum, x, y
+ *
+ * This peephole pass optimizes this into
+ *
+ *addc sum, x, y
+ *mov carry, acc0
+ *
+ * usubBorrow() works in the same fashion.
+ */
+void
+fs_visitor::try_combine_add_with_addc_subb()
+{
+   /* ADDC/SUBB was introduced in gen7. */
+   if (brw->gen < 7)
+  return;
+
+   fs_inst *add_inst = (fs_inst *) instructions.get_tail();
+   assert(add_inst->opcode == BRW_OPCODE_ADD);
+
+   /* ADDC/SUBB only operates on UD. */
+   if (add_inst->dst.type != BRW_REGISTER_TYPE_UD ||
+   add_inst->src[0].type != BRW_REGISTER_TYPE_UD ||
+   add_inst->src[1].type != BRW_REGISTER_TYPE_UD)
+  return;
+
+   bool found = false;
+   fs_inst *match = (fs_inst *) add_inst->prev;
+   /* The ADDC should appear within 8 instructions of ADD for a vec4. SUBB
+* should appear farther away because of the extra MOV negates.
+*/
+   for (int i = 0; i < 16; i++, match = (fs_inst *) match->prev) {
+  if (match->is_head_sentinel())
+ return;
+
+  /* Look for an ADDC/SUBB instruction whose destination is the null
+   * register (ir_binop_carry emits ADDC with null destination; same for
+   * ir_binop_borrow with SUBB) and whose sources are identical to those
+   * of the ADD.
+   */
+  if (match->opcode != BRW_OPCODE_ADDC && match->opcode != BRW_OPCODE_SUBB)
+ continue;
+
+  /* Only look for newly emitted ADDC/SUBB with null destination. */
+  if (match->dst.file != HW_REG ||
+  match->dst.fixed_hw_reg.file != BRW_ARCHITECTURE_REGISTER_FILE ||
+  match->dst.fixed_hw_reg.nr != BRW_ARF_NULL)
+ continue;
+
+  fs_reg *src0 = &add_inst->src[0];
+  fs_reg *src1 = &add_inst->src[1];
+
+  /* For SUBB, the ADD's second source will contain a negate modifier
+   * which at this point will be in the form of a
+   *
+   *MOV dst, -src
+   *
+   * instruction, so src[1].file will be GRF, even if it's a uniform push
+   * constant.
+   */
+  if (match->src[1].reg != add_inst->src[1].reg) {
+ /* The negating MOV should be immediately before the ADD. */
+ fs_inst *mov_inst = (fs_inst *) add_inst->prev;
+ if (mov_inst->opcode != BRW_OPCODE_MOV)
+continue;
+
+ src1 = &mov_inst->src[0];
+  }
+
+  /* If everything matches, we're done. */
+  if (match->src[0].file == src0->file &&
+  match->src[1].file == src1->file &&
+  match->src[0].reg == src0->reg &&
+  match->src[1].reg == src1->reg &&
+  match->src[0].reg_offset == src0->reg_offset &&
+  match->src[1].reg_offset == src1->reg_offset) {
+ found = true;
+ break;
+  }
+   }
+
+   if (found) {
+  match->dst = add_inst->dst;
+  add_inst->remove();
+   }
+}
+
 void
 fs_visitor::visit(ir_expression *ir)
 {
@@ -415,6 +511,8 @@ fs_visitor::visit(ir_expression *ir)
 
case ir_binop_add:
   emit(ADD(this-result, op[0], op[1]));
+  if (emitted_addc_or_subb)
+ try_combine_add_with_addc_subb();
   break;
case ir_binop_sub:
   assert(!"not reached: should be handled by ir_sub_to_add_neg");
@@ -451,6 +549,8 @@ fs_visitor::visit(ir_expression *ir)
   if (brw->gen >= 7 && dispatch_width == 16)
  fail("16-wide explicit accumulator operands unsupported\n");
 
+  emitted_addc_or_subb = 
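
The idea of the peephole can be sketched on a toy instruction array. This is a simplified illustration in plain C, not the i965 code: every toy_* name is hypothetical, registers are plain integers, and the SUBB case with its negating MOV is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_opcode { TOY_ADD, TOY_ADDC, TOY_MOV };

#define TOY_NULL_REG (-1)   /* stands in for the null destination */

struct toy_inst {
   enum toy_opcode op;
   int dst, src0, src1;     /* register numbers */
   bool removed;            /* set instead of unlinking from a list */
};

/* Look back at most `window` instructions from the ADD at index `add`
 * for an ADDC with a null destination and the same two sources. If one
 * is found, let the ADDC write the sum and drop the ADD. */
static bool
toy_combine_add_with_addc(struct toy_inst *insts, size_t add, size_t window)
{
   assert(insts[add].op == TOY_ADD);

   for (size_t n = 1; n <= window && n <= add; n++) {
      struct toy_inst *m = &insts[add - n];

      if (m->removed || m->op != TOY_ADDC || m->dst != TOY_NULL_REG)
         continue;
      if (m->src0 != insts[add].src0 || m->src1 != insts[add].src1)
         continue;

      m->dst = insts[add].dst;
      insts[add].removed = true;
      return true;
   }
   return false;
}
```

On the three-instruction sequence from the comment (addc null, x, y; mov carry, acc0; add sum, x, y) this leaves the addc writing sum and marks the add as removed, which is the transformation the patch describes.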

[Mesa-dev] [PATCH 10/10] i965/vs: Add a peephole pass to combine ADD with ADDC/SUBB.

2013-10-03 Thread Matt Turner
v2: Check fixed_hw_reg.{file,nr} instead of dst.reg.
v3: Store the bool emitted_addc_or_subb in the class, not static.
---
 src/mesa/drivers/dri/i965/brw_vec4.h   |   3 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 104 -
 2 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 25427d7..9e2204d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -507,6 +507,7 @@ public:
 
bool try_emit_sat(ir_expression *ir);
bool try_emit_mad(ir_expression *ir, int mul_arg);
+   void try_combine_add_with_addc_subb();
void resolve_ud_negate(src_reg *reg);
 
src_reg get_timestamp();
@@ -530,6 +531,8 @@ protected:
virtual int compute_array_stride(ir_dereference_array *ir);
 
const bool debug_flag;
+
+   bool emitted_addc_or_subb;
 };
 
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index ffb2cfc..74bdd4d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1122,6 +1122,102 @@ vec4_visitor::try_emit_mad(ir_expression *ir, int 
mul_arg)
return true;
 }
 
+/**
+ * The addition and carry in the uaddCarry() built-in function are implemented
+ * separately as ir_binop_add and ir_binop_carry respectively. i965 generates
+ * ADDC and a MOV from the accumulator for the carry.
+ *
+ * The generated code for uaddCarry(uint x, uint y, out uint carry) would look
+ * like this:
+ *
+ *addc null, x, y
+ *mov  carry, acc0
+ *add  sum, x, y
+ *
+ * This peephole pass optimizes this into
+ *
+ *addc sum, x, y
+ *mov carry, acc0
+ *
+ * usubBorrow() works in the same fashion.
+ */
+void
+vec4_visitor::try_combine_add_with_addc_subb()
+{
+   /* ADDC/SUBB was introduced in gen7. */
+   if (brw->gen < 7)
+  return;
+
+   vec4_instruction *add_inst = (vec4_instruction *) instructions.get_tail();
+   assert(add_inst->opcode == BRW_OPCODE_ADD);
+
+   /* ADDC/SUBB only operates on UD. */
+   if (add_inst->dst.type != BRW_REGISTER_TYPE_UD ||
+   add_inst->src[0].type != BRW_REGISTER_TYPE_UD ||
+   add_inst->src[1].type != BRW_REGISTER_TYPE_UD)
+  return;
+
+   bool found = false;
+   vec4_instruction *match = (vec4_instruction *) add_inst->prev;
+   /* The ADDC should appear within 2 instructions of ADD. SUBB should appear
+* farther away because of the extra MOV negate.
+*/
+   for (int i = 0; i < 4; i++, match = (vec4_instruction *) match->prev) {
+  if (match->is_head_sentinel())
+ return;
+
+  /* Look for an ADDC/SUBB instruction whose destination is the null
+   * register (ir_binop_carry emits ADDC with null destination; same for
+   * ir_binop_borrow with SUBB) and whose sources are identical to those
+   * of the ADD.
+   */
+  if (match->opcode != BRW_OPCODE_ADDC && match->opcode != BRW_OPCODE_SUBB)
+ continue;
+
+  /* Only look for newly emitted ADDC/SUBB with null destination. */
+  if (match->dst.file != HW_REG ||
+  match->dst.fixed_hw_reg.file != BRW_ARCHITECTURE_REGISTER_FILE ||
+  match->dst.fixed_hw_reg.nr != BRW_ARF_NULL)
+ continue;
+
+  src_reg *src0 = &add_inst->src[0];
+  src_reg *src1 = &add_inst->src[1];
+
+  /* For SUBB, the ADD's second source will contain a negate modifier
+   * which at this point will be in the form of a
+   *
+   *MOV dst, -src
+   *
+   * instruction, so src[1].file will be GRF, even if it's a uniform push
+   * constant.
+   */
+  if (match->src[1].reg != add_inst->src[1].reg) {
+ /* The negating MOV should be immediately before the ADD. */
+ vec4_instruction *mov_inst = (vec4_instruction *) add_inst->prev;
+ if (mov_inst->opcode != BRW_OPCODE_MOV)
+continue;
+
+ src1 = &mov_inst->src[0];
+  }
+
+  /* If everything matches, we're done. */
+  if (match->src[0].file == src0->file &&
+  match->src[1].file == src1->file &&
+  match->src[0].reg == src0->reg &&
+  match->src[1].reg == src1->reg &&
+  match->src[0].reg_offset == src0->reg_offset &&
+  match->src[1].reg_offset == src1->reg_offset) {
+ found = true;
+ break;
+  }
+   }
+
+   if (found) {
+  match->dst = add_inst->dst;
+  add_inst->remove();
+   }
+}
+
 void
 vec4_visitor::emit_bool_comparison(unsigned int op,
 dst_reg dst, src_reg src0, src_reg src1)
@@ -1319,6 +1415,8 @@ vec4_visitor::visit(ir_expression *ir)
 
case ir_binop_add:
   emit(ADD(result_dst, op[0], op[1]));
+  if (emitted_addc_or_subb)
+ try_combine_add_with_addc_subb();
   break;
case ir_binop_sub:
   assert(!"not reached: should be handled by ir_sub_to_add_neg");
@@ -1359,6 +1457,8 @@ vec4_visitor::visit(ir_expression *ir)
   

Re: [Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event

2013-10-03 Thread Francisco Jerez
Niels Ole Salscheider niels_...@salscheider-online.de writes:

>> Do you have any example of a real world application that relies on this?
>> Or at least some reasonable use case?
>
> The problem is that the queue is only cleared from already signalled events 
> when we flush it. And we might not do this if the user only calls 
> clWaitForEvents once the corresponding event has already been signalled.
>
> I am fine with not flushing the queue, but we should at least make sure that 
> signalled events are freed early enough.

So your application doesn't call clFlush() explicitly nor any blocking
call on that specific event and it stalls forever polling an event with
clGetEventInfo() that never gets flushed to the GPU?  Is that the
problem you've seen?  Is it an open source application?

Thanks.




Re: [Mesa-dev] [PATCH 2/3] gallivm: handle explicit derivatives for cubemaps

2013-10-03 Thread Brian Paul

On 10/03/2013 09:42 AM, srol...@vmware.com wrote:

From: Roland Scheidegger srol...@vmware.com

They need some special handling. Quite complicated.
Additionally, use the same code for implicit derivatives too if no_rho_approx
and no_quad_lod is set, because it seems while generally it should be ok
to use per quad lod for implicit derivatives there's at least some test which
insists that in case of cubemaps the shared lod value MUST come from a pixel
inside the primitive (due to the derivatives becoming different if a different
larger major axis is chosen).
---
  src/gallium/auxiliary/gallivm/lp_bld_sample.c |  221 +++--
  src/gallium/auxiliary/gallivm/lp_bld_sample.h |3 +-
  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |   35 +++-
  3 files changed, 231 insertions(+), 28 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c 
b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index ea6bec7..ce05522 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -273,7 +273,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
rho = lp_build_mul(rho_bld, cubesize, rho);
 }
-   else if (derivs  !(bld-static_texture_state-target == 
PIPE_TEXTURE_CUBE)) {
+   else if (derivs) {
LLVMValueRef ddmax[3], ddx[3], ddy[3];
for (i = 0; i  dims; i++) {
   LLVMValueRef floatdim;
@@ -1488,8 +1488,9 @@ lp_build_cube_face(struct lp_build_sample_context *bld,
  void
  lp_build_cube_lookup(struct lp_build_sample_context *bld,
   LLVMValueRef *coords,
- const struct lp_derivatives *derivs, /* optional */
+ const struct lp_derivatives *derivs_in, /* optional */
   LLVMValueRef *rho,
+ struct lp_derivatives *derivs_out, /* optional */
   boolean need_derivs)
  {
 struct lp_build_context *coord_bld = &bld->coord_bld;
@@ -1512,8 +1513,6 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
 * the edge). Still this is possibly a win over just selecting the same face
 * for all pixels. Unfortunately, something like that doesn't work for
 * explicit derivatives.
-   * TODO: handle explicit derivatives by transforming them alongside coords
-   * somehow.
 */
struct lp_build_context *cint_bld = &bld->int_coord_bld;
struct lp_type intctype = cint_bld->type;
@@ -1522,7 +1521,7 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
LLVMValueRef as_ge_at, maxasat, ar_ge_as_at;
LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz;
LLVMValueRef tnegi, rnegi;
-  LLVMValueRef ma, mai, ima;
+  LLVMValueRef ma, mai, imahalfpos;
LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld->type, 0.5);
LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype,
   1 << (intctype.width - 1));
@@ -1561,7 +1560,195 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
maxasat = lp_build_max(coord_bld, as, at);
ar_ge_as_at = lp_build_cmp(coord_bld, PIPE_FUNC_GEQUAL, ar, maxasat);

-  if (need_derivs) {
+  if (need_derivs && (derivs_in ||
+  ((gallivm_debug & GALLIVM_DEBUG_NO_QUAD_LOD) &&
+   (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX)))) {
+ /*
+  * XXX: This is really really complex.
+  * It is a bit overkill to use this for implicit derivatives as well,
+  * no way this is worth the cost in practice, but seems to be the
+  * only way for getting accurate and per-pixel lod values.
+  */
+ LLVMValueRef imapos, tmp, ddx[3], ddy[3];
+ LLVMValueRef madx, mady, madxdivma, madydivma;
+ LLVMValueRef sdxi, tdxi, rdxi, signsdx, signtdx, signrdx;
+ LLVMValueRef sdyi, tdyi, rdyi, signsdy, signtdy, signrdy;
+ LLVMValueRef tdxnegi, rdxnegi, tdynegi, rdynegi;
+ LLVMValueRef sdxnewx, sdxnewy, sdxnewz, tdxnewx, tdxnewy, tdxnewz;
+ LLVMValueRef sdynewx, sdynewy, sdynewz, tdynewx, tdynewy, tdynewz;
+ LLVMValueRef face_sdx, face_tdx, face_sdy, face_tdy;
+ LLVMValueRef posHalf = lp_build_const_vec(coord_bld->gallivm,
+   coord_bld->type, 0.5);
+ /*
+  * s = 1/2 * ( sc / ma + 1)
+  * t = 1/2 * ( tc / ma + 1)
+  *
+  * s' = 1/2 * (sc' * ma - sc * ma') / ma^2
+  * t' = 1/2 * (tc' * ma - tc * ma') / ma^2
+  *
+  * dx.s = 0.5 * (dx.sc - sc * dx.ma / ma) / ma
+  * dx.t = 0.5 * (dx.tc - tc * dx.ma / ma) / ma
+  * dy.s = 0.5 * (dy.sc - sc * dy.ma / ma) / ma
+  * dy.t = 0.5 * (dy.tc - tc * dy.ma / ma) / ma
+  */
+
+ /* select ma, calculate ima */
+ ma = 
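
The quotient-rule formulas quoted in the patch comment can be sanity-checked outside of gallivm. The following is only a scalar sketch, not the patch's vectorized code: the function names are made up for illustration, and it assumes a positive major axis `ma` so that no sign handling is needed. It compares the rearranged form `0.5 * (d.sc - sc * d.ma / ma) / ma` against a central finite difference of `s = 1/2 * (sc / ma + 1)`.

```c
/* Projected cube face coordinate: s = 1/2 * (sc / ma + 1). */
static double cube_face_coord(double sc, double ma)
{
   return 0.5 * (sc / ma + 1.0);
}

/*
 * Derivative of s via the quotient rule, in the rearranged form used in
 * the patch comment: d.s = 0.5 * (d.sc - sc * d.ma / ma) / ma
 * (dsc and dma are the derivatives of sc and ma along one screen axis).
 */
static double cube_face_deriv(double sc, double ma, double dsc, double dma)
{
   return 0.5 * (dsc - sc * dma / ma) / ma;
}

/* Central finite difference of cube_face_coord along (dsc, dma). */
static double cube_face_deriv_numeric(double sc, double ma,
                                      double dsc, double dma, double h)
{
   double hi = cube_face_coord(sc + h * dsc, ma + h * dma);
   double lo = cube_face_coord(sc - h * dsc, ma - h * dma);
   return (hi - lo) / (2.0 * h);
}
```

With sc = 0.3, ma = 2.0, d.sc = 0.1 and d.ma = 0.05 the analytic form gives 0.023125, and the finite difference agrees to well within rounding error. The same form applies to t and to the dy derivatives.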

Re: [Mesa-dev] [PATCH 1/3] gallivm: ignore rho approximation for cube maps

2013-10-03 Thread Brian Paul

On 10/03/2013 09:42 AM, srol...@vmware.com wrote:

From: Roland Scheidegger srol...@vmware.com

There are two reasons for this:
1) even when ignoring rho approximation for cube maps, the result is still
not correct, but it's better as the max error at edges is now sqrt(2) instead
of 2 (which was a full mip level), same as it is for ordinary 2d maps when
doing rho approximations (so the error actually goes from factor 2 at edges and
sqrt(2) completely inside a face to sqrt(2) at edges and 0 inside a face).
2) I want to repurpose rho_no_approx for cubemaps for fully correct cubemap
derivatives (so don't need yet another debug var).
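
To illustrate the sqrt(2) figure for ordinary 2d maps mentioned in point 1, here is a small scalar sketch. It is not gallivm code: the helper names are hypothetical, and it only assumes the usual definitions, where the exact rho is the longer of the two screen-space derivative vectors and the approximation takes the largest absolute component instead.

```c
static double max2(double a, double b) { return a > b ? a : b; }
static double abs1(double a) { return a < 0.0 ? -a : a; }

/*
 * Exact rho, returned squared (the sqrt is skipped, as in the patch):
 * max(|d/dx|^2, |d/dy|^2) for a 2d texture.
 */
static double rho_squared_exact(double dsdx, double dtdx,
                                double dsdy, double dtdy)
{
   return max2(dsdx * dsdx + dtdx * dtdx, dsdy * dsdy + dtdy * dtdy);
}

/* Approximated rho: largest absolute component, no squares and no sqrt. */
static double rho_approx(double dsdx, double dtdx, double dsdy, double dtdy)
{
   return max2(max2(abs1(dsdx), abs1(dtdx)), max2(abs1(dsdy), abs1(dtdy)));
}
```

For derivatives (1, 1) along x and (0, 0) along y the exact squared rho is 2 while the approximation yields 1, so rho is underestimated by a factor of sqrt(2), which is half a mip level. For axis-aligned derivatives such as (3, 0) and (0, -2) the two agree.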
---
  src/gallium/auxiliary/gallivm/lp_bld_sample.c |   34 +
  1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index c775382..ea6bec7 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -269,10 +269,8 @@ lp_build_rho(struct lp_build_sample_context *bld,
/* Could optimize this for single quad just skip the broadcast */
cubesize = lp_build_extract_broadcast(gallivm, bld->float_size_in_type,
  rho_bld->type, float_size, index0);
-  if (no_rho_opt) {
- /* skipping sqrt hence returning rho squared */
- cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
-  }
+  /* skipping sqrt hence returning rho squared */
+  cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
rho = lp_build_mul(rho_bld, cubesize, rho);
 }
 else if (derivs && !(bld->static_texture_state->target == PIPE_TEXTURE_CUBE)) {
@@ -757,8 +755,8 @@ lp_build_lod_selector(struct lp_build_sample_context *bld,
}
else {
   LLVMValueRef rho;
- boolean rho_squared = (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) &&
-   (bld->dims > 1);
+ boolean rho_squared = ((gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) &&
+(bld->dims > 1)) || cube_rho;

   rho = lp_build_rho(bld, texture_unit, s, t, r, cube_rho, derivs);

@@ -1602,31 +1600,23 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
* know the texture is square which simplifies things (we can omit the
* size mul which happens very early completely here and do it at the
* very end).
+  * Also always do calculations according to GALLIVM_DEBUG_NO_RHO_APPROX
+  * since the error can get quite big otherwise at edges.
+  * (With no_rho_approx max error is sqrt(2) at edges, same as it is
+  * without no_rho_approx for 2d textures, otherwise it would be factor 2.)
*/
   ddx_ddy[0] = lp_build_packed_ddx_ddy_twocoord(coord_bld, s, t);
   ddx_ddy[1] = lp_build_packed_ddx_ddy_onecoord(coord_bld, r);

- if (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) {
-ddx_ddy[0] = lp_build_mul(coord_bld, ddx_ddy[0], ddx_ddy[0]);
-ddx_ddy[1] = lp_build_mul(coord_bld, ddx_ddy[1], ddx_ddy[1]);
- }
- else {
-ddx_ddy[0] = lp_build_abs(coord_bld, ddx_ddy[0]);
-ddx_ddy[1] = lp_build_abs(coord_bld, ddx_ddy[1]);
- }
+ ddx_ddy[0] = lp_build_mul(coord_bld, ddx_ddy[0], ddx_ddy[0]);
+ ddx_ddy[1] = lp_build_mul(coord_bld, ddx_ddy[1], ddx_ddy[1]);

   tmp[0] = lp_build_swizzle_aos(coord_bld, ddx_ddy[0], swizzle01);
   tmp[1] = lp_build_swizzle_aos(coord_bld, ddx_ddy[0], swizzle23);
   tmp[2] = lp_build_swizzle_aos(coord_bld, ddx_ddy[1], swizzle02);

- if (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX) {
-rho_vec = lp_build_add(coord_bld, tmp[0], tmp[1]);
-rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]);
- }
- else {
-rho_vec = lp_build_max(coord_bld, tmp[0], tmp[1]);
-rho_vec = lp_build_max(coord_bld, rho_vec, tmp[2]);
- }
+ rho_vec = lp_build_add(coord_bld, tmp[0], tmp[1]);
+ rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]);


I don't know how often we have these 3-way lp_build_add() sequences, but 
would an lp_build_add3(bld, a, b, c) be useful?





   tmp[0] = lp_build_swizzle_aos(coord_bld, rho_vec, swizzle0);
   tmp[1] = lp_build_swizzle_aos(coord_bld, rho_vec, swizzle1);





Re: [Mesa-dev] [PATCH 3/3] gallivm: kill old per-quad face selection code

2013-10-03 Thread Brian Paul

On 10/03/2013 09:42 AM, srol...@vmware.com wrote:

From: Roland Scheidegger srol...@vmware.com

Not used in ages, and it wouldn't work at all with explicit derivatives now
(not that it did before, as it ignored them, but now the code would just use
the pre-projected derivs, which would be quite random numbers).
---
  src/gallium/auxiliary/gallivm/lp_bld_sample.c |  751 +++--
  1 file changed, 313 insertions(+), 438 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index ce05522..3fac981 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -1493,323 +1493,135 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
   struct lp_derivatives *derivs_out, /* optional */
   boolean need_derivs)
  {
+   /*
+* Do per-pixel face selection. We cannot however (as we used to do)
+* simply calculate the derivs afterwards (which is very bogus for
+* explicit derivs btw) because the values would be random when
+* not all pixels lie on the same face. So what we do here is just
+* calculate the derivatives after scaling the coords by the absolute
+* value of the inverse major axis, and essentially do rho calculation
+* steps as if it were a 3d texture. This is perfect if all pixels hit
+* the same face, but not so great at edges, I believe the max error
+* should be sqrt(2) with no_rho_approx or 2 otherwise (essentially measuring
+* the 3d distance between 2 points on the cube instead of measuring up/down
+* the edge). Still this is possibly a win over just selecting the same face
+* for all pixels. Unfortunately, something like that doesn't work for
+* explicit derivatives.
+*/
 struct lp_build_context *coord_bld = &bld->coord_bld;
 LLVMBuilderRef builder = bld->gallivm->builder;
 struct gallivm_state *gallivm = bld->gallivm;
 LLVMValueRef si, ti, ri;
+   struct lp_build_context *cint_bld = &bld->int_coord_bld;
+   struct lp_type intctype = cint_bld->type;
+   LLVMValueRef signs, signt, signr, signma;
+   LLVMValueRef as, at, ar, face, face_s, face_t;
+   LLVMValueRef as_ge_at, maxasat, ar_ge_as_at;
+   LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz;
+   LLVMValueRef tnegi, rnegi;
+   LLVMValueRef ma, mai, imahalfpos;
+   LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld->type, 0.5);
+   LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype,
+  1 << (intctype.width - 1));
+   LLVMValueRef signshift = lp_build_const_int_vec(gallivm, intctype,
+   intctype.width -1);
+   LLVMValueRef facex = lp_build_const_int_vec(gallivm, intctype, PIPE_TEX_FACE_POS_X);
+   LLVMValueRef facey = lp_build_const_int_vec(gallivm, intctype, PIPE_TEX_FACE_POS_Y);
+   LLVMValueRef facez = lp_build_const_int_vec(gallivm, intctype, PIPE_TEX_FACE_POS_Z);
+   LLVMValueRef s = coords[0];
+   LLVMValueRef t = coords[1];
+   LLVMValueRef r = coords[2];
+
+   assert(PIPE_TEX_FACE_NEG_X == PIPE_TEX_FACE_POS_X + 1);
+   assert(PIPE_TEX_FACE_NEG_Y == PIPE_TEX_FACE_POS_Y + 1);
+   assert(PIPE_TEX_FACE_NEG_Z == PIPE_TEX_FACE_POS_Z + 1);

-   if (1 || coord_bld->type.length > 4) {
-  /*
-   * Do per-pixel face selection. We cannot however (as we used to do)
-   * simply calculate the derivs afterwards (which is very bogus for
-   * explicit derivs btw) because the values would be random when
-   * not all pixels lie on the same face. So what we do here is just
-   * calculate the derivatives after scaling the coords by the absolute
-   * value of the inverse major axis, and essentially do rho calculation
-   * steps as if it were a 3d texture. This is perfect if all pixels hit
-   * the same face, but not so great at edges, I believe the max error
-   * should be sqrt(2) with no_rho_approx or 2 otherwise (essentially measuring
-   * the 3d distance between 2 points on the cube instead of measuring up/down
-   * the edge). Still this is possibly a win over just selecting the same face
-   * for all pixels. Unfortunately, something like that doesn't work for
-   * explicit derivatives.
-   */
-  struct lp_build_context *cint_bld = &bld->int_coord_bld;
-  struct lp_type intctype = cint_bld->type;
-  LLVMValueRef signs, signt, signr, signma;
-  LLVMValueRef as, at, ar, face, face_s, face_t;
-  LLVMValueRef as_ge_at, maxasat, ar_ge_as_at;
-  LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz;
-  LLVMValueRef tnegi, rnegi;
-  LLVMValueRef ma, mai, imahalfpos;
-  LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld->type, 0.5);
-  LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype,
- 1 << (intctype.width - 1));
-  

[Mesa-dev] [PATCH] radeonsi/compute: Fix segfault caused by recent refactoring

2013-10-03 Thread Tom Stellard
From: Tom Stellard thomas.stell...@amd.com

---
 src/gallium/drivers/radeon/r600_pipe_common.c  | 4 ++++
 src/gallium/drivers/radeonsi/radeonsi_shader.c | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
index 852993c..b038740 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -249,6 +249,10 @@ static unsigned tgsi_get_processor_type(const struct tgsi_token *tokens)
 bool r600_can_dump_shader(struct r600_common_screen *rscreen,
  const struct tgsi_token *tokens)
 {
+   /* Compute shaders don't have tgsi_tokens */
+   if (!tokens)
+   return (rscreen->debug_flags & DBG_CS) != 0;
+
switch (tgsi_get_processor_type(tokens)) {
case TGSI_PROCESSOR_VERTEX:
return (rscreen->debug_flags & DBG_VS) != 0;
diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 7ed3d26..97ed4e3 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -1759,8 +1759,8 @@ int si_compile_llvm(struct r600_context *rctx, struct si_pipe_shader *shader,
unsigned i;
uint32_t *ptr;
struct radeon_llvm_binary binary;
-   bool dump = r600_can_dump_shader(&rctx->screen->b, shader->selector->tokens);
-
+   bool dump = r600_can_dump_shader(&rctx->screen->b,
+   shader->selector ? shader->selector->tokens : NULL);
memset(&binary, 0, sizeof(binary));
radeon_llvm_compile(mod, &binary,
r600_get_llvm_processor_name(rctx->screen->b.family), dump);
-- 
1.8.1.5



Re: [Mesa-dev] [PATCHv2] configure: set HAVE_COMMON_DRI when building only swrast

2013-10-03 Thread Vinson Lee
On Wed, Oct 2, 2013 at 3:45 PM, Emil Velikov emil.l.veli...@gmail.com wrote:
 With commit cb1febb07, I have incorrectly removed HAVE_COMMON_DRI
 assuming that swrast does not need to build the translations for
 driconf options, as effectively swrast/drisw does not use them.

 With the incoming unification work of dri and drisw, it makes
 sense just to revert the offending hunk.

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70057
 Reported-by: Vinson Lee v...@freedesktop.org
 Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
 ---
 v2: resolve typos in the commit message. Thanks Ian

  configure.ac | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/configure.ac b/configure.ac
 index e7c8223..9546163 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -1823,6 +1823,7 @@ if test "x$with_gallium_drivers" != x; then

  if test "x$enable_dri" = xyes; then
  GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS dri-swrast"
 +HAVE_COMMON_DRI=yes
  fi
  if test "x$enable_vdpau" = xyes; then
  GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS vdpau-softpipe"
 --
 1.8.4



This patch fixes the build for me.

Tested-by: Vinson Lee v...@freedesktop.org


[Mesa-dev] [PATCH] configure: set HAVE_COMMON_DRI when building only swrast

2013-10-03 Thread Emil Velikov
With commit cb1febb07, I have incorrectly removed HAVE_COMMON_DRI
assuming that swrast does not need to build the translations for
driconf options, as effectively swrast/drisw does not use them.

With the incoming unification work of dri and drisw, it makes
sense just to revert the offending hunk.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70057
Reported-by: Vinson Lee v...@freedesktop.org
Tested-by: Vinson Lee v...@freedesktop.org
Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
---

Vinson, as I'm short of commit access (for obvious reasons), feel
free to commit if you're happy with this (and no one objects).
I've made a silly assumption, which I'll try not to repeat in the
future.

Cheers
Emil
---

 configure.ac | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configure.ac b/configure.ac
index e7c8223..9546163 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1823,6 +1823,7 @@ if test "x$with_gallium_drivers" != x; then
 
 if test "x$enable_dri" = xyes; then
 GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS dri-swrast"
+HAVE_COMMON_DRI=yes
 fi
 if test "x$enable_vdpau" = xyes; then
 GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS vdpau-softpipe"
-- 
1.8.4



Re: [Mesa-dev] [PATCH 2/3] gallivm: handle explicit derivatives for cubemaps

2013-10-03 Thread Roland Scheidegger
Am 03.10.2013 21:39, schrieb Brian Paul:
 On 10/03/2013 09:42 AM, srol...@vmware.com wrote:
 From: Roland Scheidegger srol...@vmware.com

 They need some special handling. Quite complicated.
 Additionally, use the same code for implicit derivatives too if no_rho_approx
 and no_quad_lod is set, because it seems while generally it should be ok
 to use per quad lod for implicit derivatives there's at least some test which
 insists that in case of cubemaps the shared lod value MUST come from a pixel
 inside the primitive (due to the derivatives becoming different if a different
 larger major axis is chosen).
 ---
   src/gallium/auxiliary/gallivm/lp_bld_sample.c |  221 +++--
   src/gallium/auxiliary/gallivm/lp_bld_sample.h |3 +-
   src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |   35 +++-
   3 files changed, 231 insertions(+), 28 deletions(-)

 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
 index ea6bec7..ce05522 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
 @@ -273,7 +273,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
 cubesize = lp_build_mul(rho_bld, cubesize, cubesize);
 rho = lp_build_mul(rho_bld, cubesize, rho);
  }
 -   else if (derivs && !(bld->static_texture_state->target == PIPE_TEXTURE_CUBE)) {
 +   else if (derivs) {
 LLVMValueRef ddmax[3], ddx[3], ddy[3];
 for (i = 0; i < dims; i++) {
LLVMValueRef floatdim;
 @@ -1488,8 +1488,9 @@ lp_build_cube_face(struct lp_build_sample_context *bld,
   void
   lp_build_cube_lookup(struct lp_build_sample_context *bld,
LLVMValueRef *coords,
 - const struct lp_derivatives *derivs, /* optional */
 + const struct lp_derivatives *derivs_in, /* optional */
LLVMValueRef *rho,
 + struct lp_derivatives *derivs_out, /* optional */
boolean need_derivs)
   {
  struct lp_build_context *coord_bld = &bld->coord_bld;
 @@ -1512,8 +1513,6 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
  * the edge). Still this is possibly a win over just selecting the same face
  * for all pixels. Unfortunately, something like that doesn't work for
  * explicit derivatives.
 -   * TODO: handle explicit derivatives by transforming them alongside coords
 -   * somehow.
  */
 struct lp_build_context *cint_bld = &bld->int_coord_bld;
 struct lp_type intctype = cint_bld->type;
 @@ -1522,7 +1521,7 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
 LLVMValueRef as_ge_at, maxasat, ar_ge_as_at;
 LLVMValueRef snewx, tnewx, snewy, tnewy, snewz, tnewz;
 LLVMValueRef tnegi, rnegi;
 -  LLVMValueRef ma, mai, ima;
 +  LLVMValueRef ma, mai, imahalfpos;
 LLVMValueRef posHalf = lp_build_const_vec(gallivm, coord_bld->type, 0.5);
 LLVMValueRef signmask = lp_build_const_int_vec(gallivm, intctype,
1 << (intctype.width - 1));
 @@ -1561,7 +1560,195 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
 maxasat = lp_build_max(coord_bld, as, at);
 ar_ge_as_at = lp_build_cmp(coord_bld, PIPE_FUNC_GEQUAL, ar, maxasat);

 -  if (need_derivs) {
 +  if (need_derivs && (derivs_in ||
 +  ((gallivm_debug & GALLIVM_DEBUG_NO_QUAD_LOD) &&
 +   (gallivm_debug & GALLIVM_DEBUG_NO_RHO_APPROX)))) {
 + /*
 +  * XXX: This is really really complex.
 +  * It is a bit overkill to use this for implicit derivatives as well,
 +  * no way this is worth the cost in practice, but seems to be the
 +  * only way for getting accurate and per-pixel lod values.
 +  */
 + LLVMValueRef imapos, tmp, ddx[3], ddy[3];
 + LLVMValueRef madx, mady, madxdivma, madydivma;
 + LLVMValueRef sdxi, tdxi, rdxi, signsdx, signtdx, signrdx;
 + LLVMValueRef sdyi, tdyi, rdyi, signsdy, signtdy, signrdy;
 + LLVMValueRef tdxnegi, rdxnegi, tdynegi, rdynegi;
 + LLVMValueRef sdxnewx, sdxnewy, sdxnewz, tdxnewx, tdxnewy, tdxnewz;
 + LLVMValueRef sdynewx, sdynewy, sdynewz, tdynewx, tdynewy, tdynewz;
 + LLVMValueRef face_sdx, face_tdx, face_sdy, face_tdy;
 + LLVMValueRef posHalf = lp_build_const_vec(coord_bld->gallivm,
 +   coord_bld->type, 0.5);
 + /*
 +  * s = 1/2 * ( sc / ma + 1)
 +  * t = 1/2 * ( tc / ma + 1)
 +  *
 +  * s' = 1/2 * (sc' * ma - sc * ma') / ma^2
 +  * t' = 1/2 * (tc' * ma - tc * ma') / ma^2
 +  *
 +  * dx.s = 0.5 * (dx.sc - sc * dx.ma / ma) / ma
 +  * dx.t = 0.5 * (dx.tc - tc * dx.ma / ma) / ma
 +  * dy.s = 0.5 * (dy.sc - sc * dy.ma / ma) / ma
 +   

Re: [Mesa-dev] [PATCH] radeonsi/compute: Fix segfault caused by recent refactoring

2013-10-03 Thread Marek Olšák
Reviewed-by: Marek Olšák marek.ol...@amd.com

Marek

On Thu, Oct 3, 2013 at 11:39 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 ---
  src/gallium/drivers/radeon/r600_pipe_common.c  | 4 ++++
  src/gallium/drivers/radeonsi/radeonsi_shader.c | 4 ++--
  2 files changed, 6 insertions(+), 2 deletions(-)

 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
 index 852993c..b038740 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.c
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
 @@ -249,6 +249,10 @@ static unsigned tgsi_get_processor_type(const struct tgsi_token *tokens)
  bool r600_can_dump_shader(struct r600_common_screen *rscreen,
   const struct tgsi_token *tokens)
  {
 +   /* Compute shaders don't have tgsi_tokens */
 +   if (!tokens)
 +   return (rscreen->debug_flags & DBG_CS) != 0;
 +
 switch (tgsi_get_processor_type(tokens)) {
 case TGSI_PROCESSOR_VERTEX:
 return (rscreen->debug_flags & DBG_VS) != 0;
 diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c b/src/gallium/drivers/radeonsi/radeonsi_shader.c
 index 7ed3d26..97ed4e3 100644
 --- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
 +++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
 @@ -1759,8 +1759,8 @@ int si_compile_llvm(struct r600_context *rctx, struct si_pipe_shader *shader,
 unsigned i;
 uint32_t *ptr;
 struct radeon_llvm_binary binary;
 -   bool dump = r600_can_dump_shader(&rctx->screen->b, shader->selector->tokens);
 -
 +   bool dump = r600_can_dump_shader(&rctx->screen->b,
 +   shader->selector ? shader->selector->tokens : NULL);
 memset(&binary, 0, sizeof(binary));
 radeon_llvm_compile(mod, &binary,
 r600_get_llvm_processor_name(rctx->screen->b.family), dump);
 --
 1.8.1.5



Re: [Mesa-dev] [PATCH 1/2] [v2] i965: Extract region use from hiz depth buffer

2013-10-03 Thread Chad Versace

On 10/01/2013 04:36 PM, Ben Widawsky wrote:

On Tue, Oct 01, 2013 at 01:06:02PM -0700, Chad Versace wrote:

On 09/30/2013 12:35 PM, Ben Widawsky wrote:

Starting with Ivybridge, the hierarchical depth buffer had relaxed requirements
for its allocation. Following a simple formula in the bspec was all you
needed to satisfy the requirement.

To prepare the code for this, extract all places where the miptree was
used, when we really only needed the region. This allows an upcoming
patch to simply allocate the region, and not the whole miptree.

v2: Don't use intel_region. Instead use bo + stride. We actually do
store the stride in libdrm, but it is inaccessible in the current
libdrm version.

CC: Chad Versace chad.vers...@linux.intel.com
Signed-off-by: Ben Widawsky b...@bwidawsk.net
---
  src/mesa/drivers/dri/i965/brw_misc_state.c| 11 +---
  src/mesa/drivers/dri/i965/gen6_blorp.cpp  | 20 +--
  src/mesa/drivers/dri/i965/gen7_blorp.cpp  |  6 ++---
  src/mesa/drivers/dri/i965/gen7_misc_state.c   |  5 ++--
  src/mesa/drivers/dri/i965/intel_fbo.c |  4 +--
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 36 +++
  src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  6 -
  7 files changed, 52 insertions(+), 36 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 7f4cd6f..23ffeab 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -210,8 +210,12 @@ brw_get_depthstencil_tile_masks(struct intel_mipmap_tree *depth_mt,
&tile_mask_x, &tile_mask_y, false);

if (intel_miptree_slice_has_hiz(depth_mt, depth_level, depth_layer)) {
+uint32_t tmp;
   uint32_t hiz_tile_mask_x, hiz_tile_mask_y;
- intel_region_get_tile_masks(depth_mt->hiz_mt->region,
+struct intel_region region = { .cpp = depth_mt->cpp };
+
+drm_intel_bo_get_tiling(depth_mt->hiz_buffer.bo, &region.tiling, &tmp);
+ intel_region_get_tile_masks(&region,
   &hiz_tile_mask_x, &hiz_tile_mask_y, false);

   /* Each HiZ row represents 2 rows of pixels */
@@ -667,11 +671,10 @@ brw_emit_depth_stencil_hiz(struct brw_context *brw,

/* Emit hiz buffer. */
if (hiz) {
- struct intel_mipmap_tree *hiz_mt = depth_mt->hiz_mt;
 BEGIN_BATCH(3);
 OUT_BATCH((_3DSTATE_HIER_DEPTH_BUFFER << 16) | (3 - 2));
-OUT_BATCH(hiz_mt->region->pitch - 1);
-OUT_RELOC(hiz_mt->region->bo,
+OUT_BATCH(depth_mt->hiz_buffer.stride - 1);
+OUT_RELOC(depth_mt->hiz_buffer.bo,
   I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER,
   brw->depthstencil.hiz_offset);
 ADVANCE_BATCH();
diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
index da523e5..fc3a331 100644
--- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
@@ -887,16 +887,22 @@ gen6_blorp_emit_depth_stencil_config(struct brw_context *brw,

 /* 3DSTATE_HIER_DEPTH_BUFFER */
 {
-  struct intel_region *hiz_region = params->depth.mt->hiz_mt->region;
-  uint32_t hiz_offset =
- intel_region_get_aligned_offset(hiz_region,
- draw_x & ~tile_mask_x,
- (draw_y & ~tile_mask_y) / 2, false);
+  uint32_t hiz_offset, tmp;
+  struct intel_mipmap_tree *depth_mt = params->depth.mt;
+  struct intel_region hiz_region;
+
+  hiz_region.cpp = depth_mt->cpp;
+  hiz_region.pitch = depth_mt->hiz_buffer.stride;
+  drm_intel_bo_get_tiling(depth_mt->hiz_buffer.bo, &hiz_region.tiling, &tmp);


This initialization of hiz_region subtly differs from the initialization in the
previous hunk that uses the designated initializer syntax. When using designated
initializers, all uninitialized fields are initialized to 0. Here, the
uninitialized fields have undefined values. Please use designated initializers
here to prevent undefined behavior.


+
+  hiz_offset = intel_region_get_aligned_offset(&hiz_region,
+  draw_x & ~tile_mask_x,
+  (draw_y & ~tile_mask_y) / 2, false);



BEGIN_BATCH(3);
OUT_BATCH((_3DSTATE_HIER_DEPTH_BUFFER << 16) | (3 - 2));
-  OUT_BATCH(hiz_region->pitch - 1);
-  OUT_RELOC(hiz_region->bo,
+  OUT_BATCH(hiz_region.pitch - 1);
+  OUT_RELOC(depth_mt->hiz_buffer.bo,


The 'hiz_region' is a temporary thing that will eventually die off as we clean
up the driver. So, replace OUT_BATCH(hiz_region.pitch - 1) with
OUT_BATCH(depth_mt->hiz_buffer.stride - 1). (As a nice little side-effect, the
sequence of OUT_BATCHes looks more symmetric that way.)


  I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER,
  hiz_offset);

Re: [Mesa-dev] [PATCH 1/2] [v2] i965: Extract region use from hiz depth buffer

2013-10-03 Thread Chad Versace

On 10/01/2013 04:48 PM, Ben Widawsky wrote:

On Tue, Oct 01, 2013 at 01:06:02PM -0700, Chad Versace wrote:

On 09/30/2013 12:35 PM, Ben Widawsky wrote:



diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
index da523e5..fc3a331 100644
--- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
@@ -887,16 +887,22 @@ gen6_blorp_emit_depth_stencil_config(struct brw_context *brw,

 /* 3DSTATE_HIER_DEPTH_BUFFER */
 {
-  struct intel_region *hiz_region = params->depth.mt->hiz_mt->region;
-  uint32_t hiz_offset =
- intel_region_get_aligned_offset(hiz_region,
- draw_x & ~tile_mask_x,
- (draw_y & ~tile_mask_y) / 2, false);
+  uint32_t hiz_offset, tmp;
+  struct intel_mipmap_tree *depth_mt = params->depth.mt;
+  struct intel_region hiz_region;
+
+  hiz_region.cpp = depth_mt->cpp;
+  hiz_region.pitch = depth_mt->hiz_buffer.stride;
+  drm_intel_bo_get_tiling(depth_mt->hiz_buffer.bo, &hiz_region.tiling, &tmp);


This initialization of hiz_region subtly differs from the initialization in the
previous hunk that uses the designated initializer syntax. When using designated
initializers, all uninitialized fields are initialized to 0. Here, the
uninitialized fields have undefined values. Please use designated initializers
here to prevent undefined behavior.


+
+  hiz_offset = intel_region_get_aligned_offset(&hiz_region,
+  draw_x & ~tile_mask_x,
+  (draw_y & ~tile_mask_y) / 2, false);



BEGIN_BATCH(3);
OUT_BATCH((_3DSTATE_HIER_DEPTH_BUFFER << 16) | (3 - 2));
-  OUT_BATCH(hiz_region->pitch - 1);
-  OUT_RELOC(hiz_region->bo,
+  OUT_BATCH(hiz_region.pitch - 1);
+  OUT_RELOC(depth_mt->hiz_buffer.bo,


The 'hiz_region' is a temporary thing that will eventually die off as we clean
up the driver. So, replace OUT_BATCH(hiz_region.pitch - 1) with
OUT_BATCH(depth_mt->hiz_buffer.stride - 1). (As a nice little side-effect, the
sequence of OUT_BATCHes looks more symmetric that way.)


Are you referring to memset? The only initializer is
intel_region_alloc_internal() which I do not have access to (and indeed
it seems like the wrong thing to make it extern).


As you pointed out yesterday, this is C++ code, so a designated struct
initializer can't be used. Oops.

We still need to avoid passing around uninitialized data, though. I think
memset is a good choice here.
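
A minimal sketch of the memset approach suggested above, written so that it compiles as either C or C++. The struct below is a made-up stand-in, not the driver's real `struct intel_region`, and `make_hiz_region` is a hypothetical helper used only for illustration; the point is just that every field the caller does not set ends up as zero rather than indeterminate.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the driver's region struct; field set is illustrative only. */
struct region_sketch {
   uint32_t cpp;
   uint32_t pitch;
   uint32_t tiling;
   uint32_t width;    /* never set by the caller below */
   uint32_t height;   /* never set by the caller below */
};

/* Build a temporary region on the stack without leaving any field
 * uninitialized: zero the whole struct first, then fill in what we know. */
static struct region_sketch
make_hiz_region(uint32_t cpp, uint32_t stride, uint32_t tiling)
{
   struct region_sketch region;

   memset(&region, 0, sizeof(region));
   region.cpp = cpp;
   region.pitch = stride;
   region.tiling = tiling;
   return region;
}
```

Calling `make_hiz_region(4, 512, 1)` yields a struct whose `width` and `height` are 0, which matches what the designated initializer in the C hunk would have produced.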



Re: [Mesa-dev] Janitorial work: no more intel_context.[ch]; tidying

2013-10-03 Thread Kenneth Graunke
On 10/02/2013 12:51 PM, Matt Turner wrote:
 On Wed, Oct 2, 2013 at 11:02 AM, Ian Romanick i...@freedesktop.org wrote:
 (Adding Alan to the CC list.)

 On 10/01/2013 10:51 PM, Vinson Lee wrote:
 On Mon, Sep 30, 2013 at 10:21 PM, Kenneth Graunke kenn...@whitecape.org wrote:
 On 09/27/2013 06:24 PM, Emil Velikov wrote:
 * With the recent split of the intel driver codebase, the new i965
 headers have been getting a bunch of #pragma once over the standard
 #ifndef _HEADER_H_... Are those intentional?

 Yup, that's intentional.  #pragma once doesn't require inventing a
 unique #define name, is less typing, and is faster on some compilers.

 I actually forgot that it wasn't standard.  It's supported basically
 everywhere, though, so I'd be really shocked if it caused problems.

 Oracle Solaris Studio does not support #pragma once.

 Is there *any* reason to use that compiler over GCC?  This isn't the
 first time that we've discovered it to be lacking some feature that GCC,
 clang, and Visual Studio all support. :(
 
 Before we go down this rabbit hole -- Vinson said it doesn't support
 #pragma once. He didn't say it caused problems. I don't expect it does,
 since we're already using it and have been for a long time.
 
 It probably just means that you have to use #pragma once along with the
 standard #ifndef ... #endif wrapper.

I'm not opposed to doing that.  I just didn't think it was necessary
anymore.

However, note that brw_blorp.h, brw_fs.h, brw_shader.h, gen6_blorp.h,
gen7_blorp.h, and intel_resolve_map.h already use #pragma once and don't
use the standard #ifndef...#endif wrapping.  I think those are all C++
based, though...

Maybe we should switch those.
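
For reference, a header that combines both mechanisms might look like the sketch below. The file name, guard macro and contents are invented for illustration and are not taken from the i965 headers. A compiler that understands `#pragma once` can skip re-reading the file, while one that does not recognize the pragma ignores it and falls back to the include guard.

```c
/* example_header.h (hypothetical file name) */
#pragma once
#ifndef EXAMPLE_HEADER_H
#define EXAMPLE_HEADER_H

/* Illustrative contents only. */
struct example_state {
   int value;
};

static inline int example_get(const struct example_state *s)
{
   return s->value;
}

#endif /* EXAMPLE_HEADER_H */
```

Including this header twice in one translation unit is harmless either way, since the guard macro is already defined on the second pass.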

--Ken