Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block

2015-10-13 Thread Iago Toral
On Tue, 2015-10-13 at 09:44 -0700, Jordan Justen wrote:
> On 2015-10-13 05:17:37, Francisco Jerez wrote:
> > Iago Toral Quiroga  writes:
> > 
> > > This fixes the following test:
> > >
> > > [require]
> > > GL >= 3.3
> > > GLSL >= 3.30
> > > GL_ARB_shader_storage_buffer_object
> > >
> > > [fragment shader]
> > > #version 330
> > > #extension GL_ARB_shader_storage_buffer_object: require
> > >
> > > buffer SSBO {
> > > mat4 sm4;
> > > };
> > >
> > >
> > > mat4 um4;
> > >
> > > void main() {
> > > sm4 *= um4;
> > 
> > This is using the value of "um4", which is never assigned, so liveness
> > analysis will correctly extend its live interval up to the start of the
> > block.
> 
> This test was derived by simplifying a CTS test case.
> 
> Anyway, I'm not sure what happened on the way to the commit message,
> but um4 should be a uniform.
> 
> http://sprunge.us/cEUe

Oh yes, that was me playing around with the example. The patches also
fix the uniform version. Jordan, can you verify if this fixes the CTS
test case?

In any case, since Curro is working on a more general fix for this
stuff, I guess you'd rather wait for his patches...

Iago

> -Jordan
> 
> > The other problem here seems to be that the liveness analysis pass
> > doesn't consider scalar writes (like the ones emitted by the
> > combine_constants optimization pass and by emit_uniformize()) to fully
> > define the contents of a register, so it will incorrectly extend up the
> > live interval of registers defined using scalar writes even if all
> > components ever used in the shader have been defined individually.
> > Incidentally the use-def-chains-based implementation of liveness
> > analysis I'm working on will fix this issue easily.
> > 
> > > sm4[0][0] = 0.0;
> > > }
> > >
> > > [test]
> > > link success
> > >
> > > where our liveness analysis works really bad because the control-flow part
> > > will expand register liveness to the end of only block we have (so every
> > > register will be marked alive until the end of the program). This 
> > > artificially
> > > increases register pressure to a point in which we run out of registers.
> > > Of course, this is only a simple optimization for a trivial case, the
> > > underlying problem still exists and would manifest when we have more than
> > > one block, but it helps simple shaders such as the one in the example 
> > > above
> > > without any effort. I guess the real fix would involve re-thinking parts 
> > > of the
> > > liveness analysis algorithm...
> > >
> > > Jordan, there is another thing that we can improve for this shader that is
> > > specific to SSBOs: we emit the same ssbo load multiple times because we 
> > > are
> > > playing it safe just in case there are writes in between. I think we can 
> > > try to
> > > do better and not re-issue the same load if we don't have ssbo stores to
> > > the same address in between. I'll try to come up with a patch for this
> > > (hopefully we can do this inside lower_ubo_reference).
> > >
> > > The actual fix goes into patches 2 (FS) and 4 (VS). Patches 1,3 are just
> > > indentation fixes in the code around these.
> > >
> > > Iago Toral Quiroga (4):
> > >   i965/fs: Fix indentation in fs_live_variables::compute_start_end
> > >   i965/fs: skip control-flow aware liveness analysis if we only have 1
> > > block
> > >   i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals
> > >   i965/vec4: skip control-flow aware liveness analysis if we only have 1
> > > block
> > >
> > >  .../drivers/dri/i965/brw_fs_live_variables.cpp | 24 
> > > ++
> > >  .../drivers/dri/i965/brw_vec4_live_variables.cpp   | 23 
> > > +
> > >  2 files changed, 30 insertions(+), 17 deletions(-)
> > >
> > > -- 
> > > 1.9.1
> > >
> > > ___
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > 
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 10/10] i965/gen9: Support fast clears for 32b float

2015-10-13 Thread Ben Widawsky
SKL supports the ability to do fast clears and resolves of 32b RGBA as both
integer and floats. This patch only enables float color clears because we
haven't yet enabled integer color clears, (HW support for that was added in
BDW).

This is enabled separate because it is a new feature to SKL and so it might have
some issues.

NOTE: This patch has 2 regressions with 16F_LUMINANCE and 16F_INTENSITY which
needs to be resolved before merging. The rest of the test suites are happy.
./bin/ext_framebuffer_multisample-formats [2468]
...
Testing GL_LUMINANCE16F_ARB
Probe at (0,0)
  Expected: 0.00
  Observed: 0.50
Probe at (0,0)
  Expected: 0.00
  Observed: 0.50

Not-Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 ++--
 src/mesa/drivers/dri/i965/gen8_surface_state.c  | 8 
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 9c51ffb..aa36794 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -360,8 +360,12 @@ is_color_fast_clear_compatible(struct brw_context *brw,
}
 
for (int i = 0; i < 4; i++) {
-  if (color->f[i] != 0.0f && color->f[i] != 1.0f &&
-  _mesa_format_has_color_component(format, i)) {
+  if (!_mesa_format_has_color_component(format, i)) {
+ continue;
+  }
+
+  if (brw->gen < 9 &&
+  color->f[i] != 0.0f && color->f[i] != 1.0f) {
  return false;
   }
}
diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index b19b492..ca0cedc 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -188,14 +188,6 @@ gen8_emit_fast_clear_color(struct brw_context *brw,
uint32_t *surf)
 {
if (brw->gen >= 9) {
-#define check_fast_clear_val(x) \
-  assert(mt->gen9_fast_clear_color.f[x] == 0.0 || \
- mt->gen9_fast_clear_color.f[x] == 1.0)
-  check_fast_clear_val(0);
-  check_fast_clear_val(1);
-  check_fast_clear_val(2);
-  check_fast_clear_val(3);
-#undef check_fast_clear_val
   surf[12] = mt->gen9_fast_clear_color.ui[0];
   surf[13] = mt->gen9_fast_clear_color.ui[1];
   surf[14] = mt->gen9_fast_clear_color.ui[2];
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/10] i965/gen8+: Remove redundant zeroing of surface state

2015-10-13 Thread Ben Widawsky
The allocate_surface_state already zeroes out the surface state, and doing it
later in the function is destructive for what we want to accomplish when we
split out support for gen9 fast clears (next patch).

NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent
to remove the other instances as well. I can make an argument both ways (open
coding it, vs. not). I can rework the next patch if requires.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index 18b8665..eaaecd3 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -284,8 +284,6 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
 SET_FIELD((aux_mt->pitch / tile_w) - 1,
   GEN8_SURFACE_AUX_PITCH) |
 aux_mode;
-   } else {
-  surf[6] = 0;
}
 
surf[7] = mt->fast_clear_color_value |
@@ -302,11 +300,7 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
   aux_mt->bo, 0,
   I915_GEM_DOMAIN_SAMPLER,
   (rw ? I915_GEM_DOMAIN_SAMPLER : 0));
-   } else {
-  surf[10] = 0;
-  surf[11] = 0;
}
-   surf[12] = 0;
 
/* Emit relocation to surface contents */
drm_intel_bo_emit_reloc(brw->batch.bo,
@@ -514,8 +508,6 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
 SET_FIELD((aux_mt->pitch / tile_w) - 1,
   GEN8_SURFACE_AUX_PITCH) |
 aux_mode;
-   } else {
-  surf[6] = 0;
}
 
surf[7] = mt->fast_clear_color_value |
@@ -533,11 +525,7 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
   offset + 10 * 4,
   aux_mt->bo, 0,
   I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER);
-   } else {
-  surf[10] = 0;
-  surf[11] = 0;
}
-   surf[12] = 0;
 
drm_intel_bo_emit_reloc(brw->batch.bo,
offset + 8 * 4,
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-13 Thread Ben Widawsky
The impetus for this patch comes from a seemingly benign statement within the
spec (quoted within the patch). For me, this patch was at some point critical
for getting stable piglit results (though this did not seem to be the case on a
branch Chad was working on).

It is very important for clearing multiple color buffer attachments and can be
observed in the following piglit tests:
spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
spec/ext_framebuffer_multisample/blit-multiple-render-targets 0

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 +
 1 file changed, 84 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 7bf52f0..9e6711e 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable)
brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
 }
 
+/**
+ * Individually fast clear each color buffer attachment. On previous gens this
+ * isn't required. The motivation for this comes from one line (which seems to
+ * be specific to SKL+). The list item is in section titled _MCS Buffer for
+ * Render Target(s)_
+ *
+ *   "Since only one RT is bound with a clear pass, only one RT can be cleared
+ *   at a time. To clear multiple RTs, multiple clear passes are required."
+ *
+ * The code follows the same idea as the resolve code which creates a fake FBO
+ * to avoid interfering with too much of the GL state.
+ */
+static void
+fast_clear_attachments(struct brw_context *brw,
+   struct gl_framebuffer *fb,
+   uint32_t fast_clear_buffers,
+   struct rect fast_clear_rect)
+{
+   assert(brw->gen >= 9);
+   struct gl_context *ctx = >ctx;
+
+   GLuint old_fb = ctx->DrawBuffer->Name;
+
+   for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) {
+  struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf];
+  struct intel_renderbuffer *irb = intel_renderbuffer(rb);
+  GLuint fbo, rbo;
+  int index = fb->_ColorDrawBufferIndexes[buf];
+
+  if (!((1 << index) & fast_clear_buffers))
+ continue;
+
+  _mesa_GenFramebuffers(1, );
+  rbo = brw_get_rb_for_slice(brw, irb->mt, 0, 0, false);
+
+  _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
+  _mesa_FramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER,
+GL_COLOR_ATTACHMENT0,
+GL_RENDERBUFFER, rbo);
+  _mesa_DrawBuffer(GL_COLOR_ATTACHMENT0);
+
+  brw_fast_clear_init(brw);
+
+  use_rectlist(brw, true);
+
+  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
+
+  /* SKL+ also has a resolve mode for compressed render targets and thus 
more
+   * bits to let us select the type of resolve.  For fast clear resolves, 
it
+   * turns out we can use the same value as pre-SKL though.
+   */
+  set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
+  brw_draw_rectlist(ctx, _clear_rect, MAX2(1, fb->MaxNumLayers));
+  set_fast_clear_op(brw, 0);
+  use_rectlist(brw, false);
+
+  _mesa_DeleteRenderbuffers(1, );
+  _mesa_DeleteFramebuffers(1, );
+
+  /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll
+   * resolve them eventually.
+   */
+  irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
+   }
+
+   _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, old_fb);
+}
+
 bool
 brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb,
 GLbitfield buffers, bool partial_clear)
@@ -600,12 +668,27 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
use_rectlist(brw, true);
 
layers = MAX2(1, fb->MaxNumLayers);
-   if (fast_clear_buffers) {
+
+   if (brw->gen >= 9 && fast_clear_buffers) {
+  fast_clear_attachments(brw, fb, fast_clear_buffers, fast_clear_rect);
+   } else if (fast_clear_buffers) {
   _mesa_meta_drawbuffers_from_bitfield(fast_clear_buffers);
   brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
   set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
   brw_draw_rectlist(ctx, _clear_rect, layers);
   set_fast_clear_op(brw, 0);
+
+  /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll
+   * resolve them eventually.
+   */
+  for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) {
+ struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf];
+ struct intel_renderbuffer *irb = intel_renderbuffer(rb);
+ int index = fb->_ColorDrawBufferIndexes[buf];
+
+ if ((1 << index) & fast_clear_buffers)
+irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
+  }
}
 
if (rep_clear_buffers) {
@@ -614,18 +697,6 @@ 

[Mesa-dev] [PATCH 00/10] Support Skylake MCS buffers (fast clears)

2015-10-13 Thread Ben Widawsky
This patch series adds support for fast color clears on SKL as it exists on
previous generations of hardware minus the new hardware restriction on surface
formats. Additionally, it adds support for utilizing clear values with up to 32b
per color channel (see note at the bottom). It is based on work originally done
by Kristian, so thanks to him for that initial work as well as helping me debug
some of the issues.

Additionally, thanks to Chad for helping track down the last bug in the 
rectangle
scaling code which was (for me) being masked by another bug (#3 below). I
imagine it would have been several more weeks at least before I uncovered it.

We knew that SKL added the extra DWORDs to the RENDER_SURFACE_STATE in order to
support the 32b per channel. As it turned out though, Skylake made other changes
to support this which caused weird failures which seemed to interfere with
each other.

1. Not all surface formats support lossless compression.
2. Clearing multiple color buffer attachments must happen in n passes
3. Change to the scaling factors for the MCS surface - SKL has 2x height (this
was the bug which Chad helped uncover, I had it correct in my patch from March
http://lists.freedesktop.org/archives/mesa-dev/2015-March/079084.html, but we
had other problems which prevented merge, including #1 and #2 above).

I have no piglit, dEQP or CTS regressions (except for the last patch). I haven't
yet, but will collect perf data on this ASAP. Historically we've come to expect
this to provide large gains in tests which are memory bandwidth limited and
doing many clears.

Ben Widawsky (10):
  i965/gen8+: Remove redundant zeroing of surface state
  i965/gen8+: Extract color clear surface state
  i965/skl: Enable fast color clears on SKL
  i965/skl: skip fast clears for certain surface formats
  i965/meta/gen9: Individually fast clear color attachments
  Revert "i965/gen9: Disable MCS for 1x color surfaces"
  Revert "i965/gen9: Enable rep clears on gen9"
  i965/meta: Assert fast clears and rep clears never overlap
  i965/meta: Remove fast_clear_color variable
  i965/gen9: Support fast clears for 32b float

 src/mesa/drivers/dri/i965/brw_context.h |   1 +
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 172 ++--
 src/mesa/drivers/dri/i965/brw_surface_formats.c |  27 
 src/mesa/drivers/dri/i965/gen8_surface_state.c  |  48 ---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  20 +--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |   7 +-
 6 files changed, 205 insertions(+), 70 deletions(-)

-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/10] Revert "i965/gen9: Disable MCS for 1x color surfaces"

2015-10-13 Thread Ben Widawsky
This reverts commit dcd59a9e322edeea74187bcad65a8e56c0bfaaa2.
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index f108b75..c723f79 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -207,14 +207,6 @@ intel_miptree_supports_non_msrt_fast_clear(struct 
brw_context *brw,
if (brw->gen < 7)
   return false;
 
-   if (brw->gen >= 9) {
-  /* FINISHME: Enable singlesample fast MCS clears on SKL after all GPU
-   * FINISHME: hangs are resolved.
-   */
-  perf_debug("singlesample fast MCS clears disabled on gen9");
-  return false;
-   }
-
if (mt->disable_aux_buffers)
   return false;
 
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 07/10] Revert "i965/gen9: Enable rep clears on gen9"

2015-10-13 Thread Ben Widawsky
This reverts commit 8a0c85b25853decb4a110b6d36d79c4f095d437b.

It's not a strict revert because I don't want to bring back the gen < 9 check at
this point in time.
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 9e6711e..97094ae 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -537,11 +537,6 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
   if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)
  clear_type = REP_CLEAR;
 
-  if (brw->gen >= 9 && clear_type == FAST_CLEAR) {
- perf_debug("fast MCS clears are disabled on gen9");
- clear_type = REP_CLEAR;
-  }
-
   /* We can't do scissored fast clears because of the restrictions on the
* fast clear rectangle size.
*/
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 03/10] i965/skl: Enable fast color clears on SKL

2015-10-13 Thread Ben Widawsky
Based on a patch originally from Kristian. Skylake has extended capabilities
with regard to fast clears, but that is saved for another patch.

The same effect could be acheived with the following, however I think the way
I've done it is more in line with how the docs explain it.
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
   /* In release builds, fall through */
case I915_TILING_Y:
   *width_px = 32 / mt->cpp;
-  *height = 4;
+  if (brw->gen >= 9)
+ *height = 2;
+  else
+ *height = 4;

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 54 +
 src/mesa/drivers/dri/i965/gen8_surface_state.c  | 34 
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  9 +
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |  7 +++-
 4 files changed, 78 insertions(+), 26 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index fbde3f0..7bf52f0 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -204,7 +204,7 @@ brw_draw_rectlist(struct gl_context *ctx, struct rect 
*rect, int num_instances)
 }
 
 static void
-get_fast_clear_rect(struct gl_framebuffer *fb,
+get_fast_clear_rect(struct brw_context *brw, struct gl_framebuffer *fb,
 struct intel_renderbuffer *irb, struct rect *rect)
 {
unsigned int x_align, y_align;
@@ -228,7 +228,14 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
*/
   intel_get_non_msrt_mcs_alignment(irb->mt, _align, _align);
   x_align *= 16;
-  y_align *= 32;
+
+  /* SKL+ line alignment requirement for Y-tiled are half those of the 
prior
+   * generations.
+   */
+  if (brw->gen >= 9)
+ y_align *= 16;
+  else
+ y_align *= 32;
 
   /* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render
* Target(s)", beneath the "Fast Color Clear" bullet (p327):
@@ -265,8 +272,10 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
* terms of (width,height) of the RT.
*
* MSAA  Width of Clear Rect  Height of Clear Rect
+   *  2X Ceil(1/8*width)  Ceil(1/2*height)
*  4X Ceil(1/8*width)  Ceil(1/2*height)
*  8X Ceil(1/2*width)  Ceil(1/2*height)
+   * 16X widthCeil(1/2*height)
*
* The text "with upper left co-ordinate to coincide with actual
* rectangle being cleared" is a little confusing--it seems to imply
@@ -289,6 +298,9 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
   case 8:
  x_scaledown = 2;
  break;
+  case 16:
+ x_scaledown = 1;
+ break;
   default:
  unreachable("Unexpected sample count for fast clear");
   }
@@ -358,18 +370,24 @@ is_color_fast_clear_compatible(struct brw_context *brw,
 
 /**
  * Convert the given color to a bitfield suitable for ORing into DWORD 7 of
- * SURFACE_STATE.
+ * SURFACE_STATE (DWORD 12-15 on SKL+).
  */
-static uint32_t
-compute_fast_clear_color_bits(const union gl_color_union *color)
+static void
+set_fast_clear_color(struct brw_context *brw,
+ struct intel_mipmap_tree *mt,
+ const union gl_color_union *color)
 {
-   uint32_t bits = 0;
-   for (int i = 0; i < 4; i++) {
-  /* Testing for non-0 works for integer and float colors */
-  if (color->f[i] != 0.0f)
- bits |= 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));
+   if (brw->gen >= 9) {
+  mt->gen9_fast_clear_color = *color;
+   } else {
+  mt->fast_clear_color_value = 0;
+  for (int i = 0; i < 4; i++) {
+ /* Testing for non-0 works for integer and float colors */
+ if (color->f[i] != 0.0f)
+ mt->fast_clear_color_value |=
+1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));
+  }
}
-   return bits;
 }
 
 static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 };
@@ -504,8 +522,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
 
   switch (clear_type) {
   case FAST_CLEAR:
- irb->mt->fast_clear_color_value =
-compute_fast_clear_color_bits(>Color.ClearColor);
+ set_fast_clear_color(brw, irb->mt, >Color.ClearColor);
  irb->need_downsample = true;
 
  /* If the buffer is already in INTEL_FAST_CLEAR_STATE_CLEAR, the
@@ -521,7 +538,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
  irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_RESOLVED;
  irb->need_downsample = true;
  fast_clear_buffers |= 1 << index;
- get_fast_clear_rect(fb, irb, _clear_rect);
+ get_fast_clear_rect(brw, 

[Mesa-dev] [PATCH 02/10] i965/gen8+: Extract color clear surface state

2015-10-13 Thread Ben Widawsky
On future generation platforms the color clear value is stored elsewhere in the
surface state. By extracting this logic, we can cleanly implement the difference
in an upcoming patch.

Should have no functional impact.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index eaaecd3..e70c15b 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -384,6 +384,12 @@ gen8_emit_null_surface_state(struct brw_context *brw,
  SET_FIELD(height - 1, GEN7_SURFACE_HEIGHT);
 }
 
+static void
+gen8_emit_fast_clear_color(struct intel_mipmap_tree *mt, uint32_t *surf)
+{
+   surf[7] |= mt->fast_clear_color_value;
+}
+
 /**
  * Sets up a surface state structure to point at the given region.
  * While it is only used for the front/back buffer currently, it should be
@@ -510,11 +516,11 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
 aux_mode;
}
 
-   surf[7] = mt->fast_clear_color_value |
- SET_FIELD(HSW_SCS_RED,   GEN7_SURFACE_SCS_R) |
- SET_FIELD(HSW_SCS_GREEN, GEN7_SURFACE_SCS_G) |
- SET_FIELD(HSW_SCS_BLUE,  GEN7_SURFACE_SCS_B) |
- SET_FIELD(HSW_SCS_ALPHA, GEN7_SURFACE_SCS_A);
+   gen8_emit_fast_clear_color(mt, surf);
+   surf[7] |= SET_FIELD(HSW_SCS_RED,   GEN7_SURFACE_SCS_R) |
+  SET_FIELD(HSW_SCS_GREEN, GEN7_SURFACE_SCS_G) |
+  SET_FIELD(HSW_SCS_BLUE,  GEN7_SURFACE_SCS_B) |
+  SET_FIELD(HSW_SCS_ALPHA, GEN7_SURFACE_SCS_A);
 
assert(mt->offset % mt->cpp == 0);
*((uint64_t *) [8]) = mt->bo->offset64 + mt->offset; /* reloc */
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 09/10] i965/meta: Remove fast_clear_color variable

2015-10-13 Thread Ben Widawsky
It doesn't actually serve a purpose AFAICT (in fact, I'm not certain what it's
meant to do).

Cc: Kristian Høgsberg 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 41afc9a..9c51ffb 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -390,8 +390,6 @@ set_fast_clear_color(struct brw_context *brw,
}
 }
 
-static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 };
-
 static void
 set_fast_clear_op(struct brw_context *brw, uint32_t op)
 {
@@ -472,7 +470,7 @@ fast_clear_attachments(struct brw_context *brw,
 
   use_rectlist(brw, true);
 
-  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
+  brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
 
   /* SKL+ also has a resolve mode for compressed render targets and thus 
more
* bits to let us select the type of resolve.  For fast clear resolves, 
it
@@ -670,7 +668,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
   fast_clear_attachments(brw, fb, fast_clear_buffers, fast_clear_rect);
} else if (fast_clear_buffers) {
   _mesa_meta_drawbuffers_from_bitfield(fast_clear_buffers);
-  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
+  brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
   set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
   brw_draw_rectlist(ctx, _clear_rect, layers);
   set_fast_clear_op(brw, 0);
@@ -785,7 +783,7 @@ brw_meta_resolve_color(struct brw_context *brw,
 
use_rectlist(brw, true);
 
-   brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
+   brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
 
/* SKL+ also has a resolve mode for compressed render targets and thus more
 * bits to let us select the type of resolve.  For fast clear resolves, it
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 08/10] i965/meta: Assert fast clears and rep clears never overlap

2015-10-13 Thread Ben Widawsky
There is nothing wrong with the code today, but as one modifies the code it
turns out to be not too difficult to mess up the code, and this easy assertion
should catch such driver implementation failures quickly.

Cc: Kristian Høgsberg 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 97094ae..41afc9a 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -616,6 +616,8 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
   }
}
 
+   assert((fast_clear_buffers & rep_clear_buffers) == 0);
+
if (!(fast_clear_buffers | rep_clear_buffers)) {
   if (plain_clear_buffers)
  /* If we only have plain clears, skip the meta save/restore. */
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 04/10] i965/skl: skip fast clears for certain surface formats

2015-10-13 Thread Ben Widawsky
Initially I had this planned as a patch to be squashed in to the enabling patch
because there is no point enabling fast clears without this. However, Chad
merged a patch which disables fast clears on gen9 explicitly, and so I can hide
this behind the revert of that patch. This is a nice I really wanted this patch
as a distinct patch for review. This is a new, weird, and poorly documented
restriction for SKL. (In fact, I am still not 100% certain the restriction is
entirely necessary, but there are around 30 piglit regressions without this).

SKL adds compressible render targets and as a result mutates some of the
programming for fast clears and resolves. There is a new internal surface type
called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "The Auxiliary surface is
a CCS (Color Control Surface) with compression disabled or an MCS with
compression enabled, depending on number of multisamples. MCS (Multisample
Control Surface) is a special type of CCS."

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_context.h |  1 +
 src/mesa/drivers/dri/i965/brw_surface_formats.c | 27 +
 src/mesa/drivers/dri/i965/gen8_surface_state.c  |  8 ++--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  3 +++
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index e59478a..32b8250 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1546,6 +1546,7 @@ struct brw_context
 
uint32_t render_target_format[MESA_FORMAT_COUNT];
bool format_supported_as_render_target[MESA_FORMAT_COUNT];
+   bool losslessly_compressable[MESA_FORMAT_COUNT];
 
/* Interpolation modes, one byte per vue slot.
 * Used Gen4/5 by the clip|sf|wm stages. Ignored on Gen6+.
diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index 97fff60..d706ecc 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -693,6 +693,33 @@ brw_init_surface_formats(struct brw_context *brw)
   }
}
 
+   if (brw->gen >= 9) {
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_UNORM16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_SNORM16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RG_SINT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RG_UINT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBX_FLOAT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_B8G8R8A8_UNORM] = true;
+  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_UNORM] = true;
+  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_SNORM] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT8] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT8] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RG_SINT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RG_UINT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_R_UINT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_R_SINT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_R_FLOAT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_B8G8R8X8_UNORM] = true;
+   }
+
/* We will check this table for FBO completeness, but the surface format
 * table above only covered color rendering.
 */
diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index 995b4dd..b19b492 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -243,8 +243,10 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
* "When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, HALIGN
*  16 must be used."
*/
-  if (brw->gen >= 9 || mt->num_samples == 1)
+  if (brw->gen >= 9 || mt->num_samples == 1) {
  assert(mt->halign == 16);
+ assert(mt->num_samples || brw->losslessly_compressable[mt->format] == 
true);
+  }
}
 
const uint32_t surf_type = translate_tex_target(target);
@@ -488,8 +490,10 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
* "When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, HALIGN
*  16 must be used."
*/
-  if 

Re: [Mesa-dev] [PATCH] mesa: remove unused functions in program.c

2015-10-13 Thread Matt Turner
On Tue, Oct 13, 2015 at 7:16 PM, Brian Paul  wrote:
> replace_registers() and adjust_param_indexes() were unused.
> ---

Yep, noticed this too. Found it strange that someone would have
managed to find and delete unused functions that gcc doesn't warn
about while leaving in place functions that it does...

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Restore compute shader support in brw_nir_lower_inputs

2015-10-13 Thread Kenneth Graunke
On Tuesday, October 13, 2015 01:44:55 PM Jordan Justen wrote:
> The commit shown below caused compute shaders to hit the unreachable
> in the default of the switch block. Restore compute shaders to use the
> fragment shader path.
> 
> Also, simplify the fragment/compute path to only support scalar mode.
> 
> commit 2953c3d76178d7589947e6ea1dbd902b7b02b3d4
> Author: Kenneth Graunke 
> Date:   Fri Aug 14 15:15:11 2015 -0700
> 
> i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.
> 
> Signed-off-by: Jordan Justen 
> Cc: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_nir.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
> b/src/mesa/drivers/dri/i965/brw_nir.c
> index 4f35d81..357ee4f 100644
> --- a/src/mesa/drivers/dri/i965/brw_nir.c
> +++ b/src/mesa/drivers/dri/i965/brw_nir.c
> @@ -96,8 +96,10 @@ brw_nir_lower_inputs(nir_shader *nir, bool is_scalar)
>}
>break;
> case MESA_SHADER_FRAGMENT:
> +   case MESA_SHADER_COMPUTE:
> +  assert(is_scalar);
>nir_assign_var_locations(>inputs, >num_inputs,
> -   is_scalar ? type_size_scalar : 
> type_size_vec4);
> +   type_size_scalar);
>break;
> default:
>unreachable("unsupported shader stage");
> 

I didn't think compute shaders had inputs...so it might make more sense
just to do:

   case MESA_SHADER_COMPUTE:
  /* Compute shaders have no inputs. */
  break;

Either way, sorry for breaking this, and...
Reviewed-by: Kenneth Graunke 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] glsl: Support uint index in do_vec_index_to_cond_assign

2015-10-13 Thread Tapani Pälli

Series is
Reviewed-by: Tapani Pälli 

(Note that due to recent changes, test won't pass but fails like:
glcts: brw_nir.c:103: brw_nir_lower_inputs: Assertion `!"unsupported 
shader stage"' failed.)


On 10/14/2015 12:27 AM, Jordan Justen wrote:

The ES31-CTS.compute_shader.pipeline-compute-chain test case was
generating an unsigned index by using gl_LocalInvocationID.x and
gl_LocalInvocationID.y as array indices.

Signed-off-by: Jordan Justen 
---
  src/glsl/lower_vec_index_to_cond_assign.cpp | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/glsl/lower_vec_index_to_cond_assign.cpp 
b/src/glsl/lower_vec_index_to_cond_assign.cpp
index 0c3394a..b623882 100644
--- a/src/glsl/lower_vec_index_to_cond_assign.cpp
+++ b/src/glsl/lower_vec_index_to_cond_assign.cpp
@@ -88,7 +88,9 @@ 
ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(void *mem_
 exec_list list;
  
 /* Store the index to a temporary to avoid reusing its tree. */

-   index = new(base_ir) ir_variable(glsl_type::int_type,
+   assert(orig_index->type == glsl_type::int_type ||
+  orig_index->type == glsl_type::uint_type);
+   index = new(base_ir) ir_variable(orig_index->type,
"vec_index_tmp_i",
ir_var_temporary);
 list.push_tail(index);


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 03/10] i965/skl: Enable fast color clears on SKL

2015-10-13 Thread Matt Turner
On Tue, Oct 13, 2015 at 8:50 PM, Ben Widawsky
 wrote:
> Based on a patch originally from Kristian. Skylake has extended capabilities
> with regard to fast clears, but that is saved for another patch.
>
> The same effect could be acheived with the following, however I think the way
> I've done it is more in line with how the docs explain it.
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
>/* In release builds, fall through */
> case I915_TILING_Y:
>*width_px = 32 / mt->cpp;
> -  *height = 4;
> +  if (brw->gen >= 9)
> + *height = 2;
> +  else
> + *height = 4;
>
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 54 
> +
>  src/mesa/drivers/dri/i965/gen8_surface_state.c  | 34 
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  9 +
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |  7 +++-
>  4 files changed, 78 insertions(+), 26 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index fbde3f0..7bf52f0 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -204,7 +204,7 @@ brw_draw_rectlist(struct gl_context *ctx, struct rect 
> *rect, int num_instances)
>  }
>
>  static void
> -get_fast_clear_rect(struct gl_framebuffer *fb,
> +get_fast_clear_rect(struct brw_context *brw, struct gl_framebuffer *fb,
>  struct intel_renderbuffer *irb, struct rect *rect)
>  {
> unsigned int x_align, y_align;
> @@ -228,7 +228,14 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
> */
>intel_get_non_msrt_mcs_alignment(irb->mt, _align, _align);
>x_align *= 16;
> -  y_align *= 32;
> +
> +  /* SKL+ line alignment requirement for Y-tiled are half those of the 
> prior
> +   * generations.
> +   */
> +  if (brw->gen >= 9)
> + y_align *= 16;
> +  else
> + y_align *= 32;
>
>/* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render
> * Target(s)", beneath the "Fast Color Clear" bullet (p327):
> @@ -265,8 +272,10 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
> * terms of (width,height) of the RT.
> *
> * MSAA  Width of Clear Rect  Height of Clear Rect
> +   *  2X Ceil(1/8*width)  Ceil(1/2*height)
> *  4X Ceil(1/8*width)  Ceil(1/2*height)
> *  8X Ceil(1/2*width)  Ceil(1/2*height)
> +   * 16X widthCeil(1/2*height)
> *
> * The text "with upper left co-ordinate to coincide with actual
> * rectangle being cleared" is a little confusing--it seems to imply
> @@ -289,6 +298,9 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
>case 8:
>   x_scaledown = 2;
>   break;
> +  case 16:
> + x_scaledown = 1;
> + break;
>default:
>   unreachable("Unexpected sample count for fast clear");
>}
> @@ -358,18 +370,24 @@ is_color_fast_clear_compatible(struct brw_context *brw,
>
>  /**
>   * Convert the given color to a bitfield suitable for ORing into DWORD 7 of
> - * SURFACE_STATE.
> + * SURFACE_STATE (DWORD 12-15 on SKL+).
>   */
> -static uint32_t
> -compute_fast_clear_color_bits(const union gl_color_union *color)
> +static void
> +set_fast_clear_color(struct brw_context *brw,
> + struct intel_mipmap_tree *mt,
> + const union gl_color_union *color)
>  {
> -   uint32_t bits = 0;
> -   for (int i = 0; i < 4; i++) {
> -  /* Testing for non-0 works for integer and float colors */
> -  if (color->f[i] != 0.0f)
> - bits |= 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));
> +   if (brw->gen >= 9) {
> +  mt->gen9_fast_clear_color = *color;
> +   } else {
> +  mt->fast_clear_color_value = 0;
> +  for (int i = 0; i < 4; i++) {
> + /* Testing for non-0 works for integer and float colors */
> + if (color->f[i] != 0.0f)
> + mt->fast_clear_color_value |=
> +1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));

Lets put braces around this since it spans multiple lines.

> +  }
> }
> -   return bits;
>  }
>
>  static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 };
> @@ -504,8 +522,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
> gl_framebuffer *fb,
>
>switch (clear_type) {
>case FAST_CLEAR:
> - irb->mt->fast_clear_color_value =
> -compute_fast_clear_color_bits(>Color.ClearColor);
> + set_fast_clear_color(brw, irb->mt, >Color.ClearColor);
>   irb->need_downsample = true;
>
>   /* If the buffer is 

Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-13 Thread Matt Turner
On Tue, Oct 13, 2015 at 9:12 PM, Matt Turner  wrote:
> On Tue, Oct 13, 2015 at 8:50 PM, Ben Widawsky
>  wrote:
>> The impetus for this patch comes from a seemingly benign statement within the
>> spec (quoted within the patch). For me, this patch was at some point critical
>> for getting stable piglit results (though this did not seem to be the case 
>> on a
>> branch Chad was working on).
>>
>> It is very important for clearing multiple color buffer attachments and can 
>> be
>> observed in the following piglit tests:
>> spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
>> spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
>>
>> Signed-off-by: Ben Widawsky 
>> ---
>>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 
>> +
>>  1 file changed, 84 insertions(+), 13 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
>> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
>> index 7bf52f0..9e6711e 100644
>> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
>> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
>> @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable)
>> brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
>>  }
>>
>> +/**
>> + * Individually fast clear each color buffer attachment. On previous gens 
>> this
>> + * isn't required. The motivation for this comes from one line (which seems 
>> to
>> + * be specific to SKL+). The list item is in section titled _MCS Buffer for
>> + * Render Target(s)_
>> + *
>> + *   "Since only one RT is bound with a clear pass, only one RT can be 
>> cleared
>> + *   at a time. To clear multiple RTs, multiple clear passes are required."
>> + *
>> + * The code follows the same idea as the resolve code which creates a fake 
>> FBO
>> + * to avoid interfering with too much of the GL state.
>> + */
>> +static void
>> +fast_clear_attachments(struct brw_context *brw,
>> +   struct gl_framebuffer *fb,
>> +   uint32_t fast_clear_buffers,
>> +   struct rect fast_clear_rect)
>> +{
>> +   assert(brw->gen >= 9);
>> +   struct gl_context *ctx = >ctx;
>> +
>> +   GLuint old_fb = ctx->DrawBuffer->Name;
>> +
>> +   for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) {
>> +  struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf];
>> +  struct intel_renderbuffer *irb = intel_renderbuffer(rb);
>> +  GLuint fbo, rbo;
>> +  int index = fb->_ColorDrawBufferIndexes[buf];
>> +
>> +  if (!((1 << index) & fast_clear_buffers))
>
> Small suggestion: I think this would read better as
> ((fast_clear_buffers & (1 << index)) != 0.

Oops, sorry -- of course I meant ((fast_clear_buffers & (1 << index)) == 0.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: remove unused texUnit local in _mesa_BindTextureUnit()

2015-10-13 Thread Brian Paul
The texture unit is error-checked before this and the texUnit var
is unused, so remove it.
---
 src/mesa/main/texobj.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c
index b571b1b..3182920 100644
--- a/src/mesa/main/texobj.c
+++ b/src/mesa/main/texobj.c
@@ -1759,19 +1759,12 @@ _mesa_BindTextureUnit(GLuint unit, GLuint texture)
 {
GET_CURRENT_CONTEXT(ctx);
struct gl_texture_object *texObj;
-   struct gl_texture_unit *texUnit;
 
if (unit >= _mesa_max_tex_unit(ctx)) {
   _mesa_error(ctx, GL_INVALID_VALUE, "glBindTextureUnit(unit=%u)", unit);
   return;
}
 
-   texUnit = _mesa_get_tex_unit(ctx, unit);
-   assert(texUnit);
-   if (!texUnit) {
-  return;
-   }
-
if (MESA_VERBOSE & (VERBOSE_API|VERBOSE_TEXTURE))
   _mesa_debug(ctx, "glBindTextureUnit %s %d\n",
   _mesa_enum_to_string(GL_TEXTURE0+unit), (GLint) texture);
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: remove unused functions in program.c

2015-10-13 Thread Brian Paul
replace_registers() and adjust_param_indexes() were unused.
---
 src/mesa/program/program.c | 51 --
 1 file changed, 51 deletions(-)

diff --git a/src/mesa/program/program.c b/src/mesa/program/program.c
index c35a89b..86de5e9 100644
--- a/src/mesa/program/program.c
+++ b/src/mesa/program/program.c
@@ -450,57 +450,6 @@ _mesa_delete_instructions(struct gl_program *prog, GLuint 
start, GLuint count)
 
 
 /**
- * Search instructions for registers that match (oldFile, oldIndex),
- * replacing them with (newFile, newIndex).
- */
-static void
-replace_registers(struct prog_instruction *inst, GLuint numInst,
-  GLuint oldFile, GLuint oldIndex,
-  GLuint newFile, GLuint newIndex)
-{
-   GLuint i, j;
-   for (i = 0; i < numInst; i++) {
-  /* src regs */
-  for (j = 0; j < _mesa_num_inst_src_regs(inst[i].Opcode); j++) {
- if (inst[i].SrcReg[j].File == oldFile &&
- inst[i].SrcReg[j].Index == oldIndex) {
-inst[i].SrcReg[j].File = newFile;
-inst[i].SrcReg[j].Index = newIndex;
- }
-  }
-  /* dst reg */
-  if (inst[i].DstReg.File == oldFile && inst[i].DstReg.Index == oldIndex) {
- inst[i].DstReg.File = newFile;
- inst[i].DstReg.Index = newIndex;
-  }
-   }
-}
-
-
-/**
- * Search instructions for references to program parameters.  When found,
- * increment the parameter index by 'offset'.
- * Used when combining programs.
- */
-static void
-adjust_param_indexes(struct prog_instruction *inst, GLuint numInst,
- GLuint offset)
-{
-   GLuint i, j;
-   for (i = 0; i < numInst; i++) {
-  for (j = 0; j < _mesa_num_inst_src_regs(inst[i].Opcode); j++) {
- GLuint f = inst[i].SrcReg[j].File;
- if (f == PROGRAM_CONSTANT ||
- f == PROGRAM_UNIFORM ||
- f == PROGRAM_STATE_VAR) {
-inst[i].SrcReg[j].Index += offset;
- }
-  }
-   }
-}
-
-
-/**
  * Populate the 'used' array with flags indicating which registers (TEMPs,
  * INPUTs, OUTPUTs, etc, are used by the given program.
  * \param file  type of register to scan for
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-13 Thread Alejandro Piñeiro


On 13/10/15 03:10, Matt Turner wrote:
> On Mon, Oct 12, 2015 at 4:25 PM, Matt Turner  wrote:
>> On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro  
>> wrote:
>>> vec4 port of fs_cmod_propagation.
>>>
>>> Shader-db results:
>>> total instructions in shared programs: 6241226 -> 6224469 (-0.27%)
>>> instructions in affected programs: 498213 -> 481456 (-3.36%)
>>> helped:3082
>>> HURT:  0
>>> ---
>>>
>>> The final outcome is really similar to fs_brw_cmod_propagation. In
>>> fact the only difference is that on fs we have this:
>>>  if (scan_inst->overwrites_reg(inst->src[0])) {
>>> if (scan_inst->is_partial_write() ||
>>> scan_inst->dst.reg_offset != inst->src[0].reg_offset)
>>>break;
>>>
>>> And on vec4 (this commit) we have this:
>>>  if (inst->src[0].in_range(scan_inst->dst,
>>>scan_inst->regs_written)) {
>>>
>>> if ((scan_inst->predicate && scan_inst->opcode != 
>>> BRW_OPCODE_SEL) ||
>>> scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
>>> (scan_inst->dst.writemask != WRITEMASK_X && 
>>> scan_inst->dst.writemask != WRITEMASK_XYZW))
>>>break;
>>>
>>> if (scan_inst->dst.writemask == WRITEMASK_XYZW &&
>>> inst->src[0].swizzle != BRW_SWIZZLE_XYZW) {
>>>break;
>>> }
>>>
>>> So at some point I thought about refactoring it and having one common,
>>> like with opt_predicated_break, but that one was possible with just
>>> backend_instructions, while here we would need to deal with
>>> vec4_instructions and fs_inst, that could be somewhat messy, so
>>> I'm leaving this as it is.
>>>
>>>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp |   1 +
>>>  src/mesa/drivers/dri/i965/brw_vec4.h   |   1 +
>>>  .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 
>>> +
>>>  4 files changed, 166 insertions(+)
>>>  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
>>> b/src/mesa/drivers/dri/i965/Makefile.sources
>>> index 81ef628..c1836d6 100644
>>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>>> @@ -56,6 +56,7 @@ i965_compiler_FILES = \
>>> brw_util.c \
>>> brw_util.h \
>>> brw_vec4_builder.h \
>>> +   brw_vec4_cmod_propagation.cpp \
>>> brw_vec4_copy_propagation.cpp \
>>> brw_vec4.cpp \
>>> brw_vec4_cse.cpp \
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index e966b96..55e381b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -1867,6 +1867,7 @@ vec4_visitor::run()
>>>OPT(dead_code_eliminate);
>>>OPT(dead_control_flow_eliminate, this);
>>>OPT(opt_copy_propagation);
>>> +  OPT(opt_cmod_propagation);
>>>OPT(opt_cse);
>>>OPT(opt_algebraic);
>>>OPT(opt_register_coalesce);
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.h
>>> index 5e3500c..3c1711d 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
>>> @@ -149,6 +149,7 @@ public:
>>> int var_range_start(unsigned v, unsigned n) const;
>>> int var_range_end(unsigned v, unsigned n) const;
>>> bool virtual_grf_interferes(int a, int b);
>>> +   bool opt_cmod_propagation();
>>> bool opt_copy_propagation(bool do_constant_prop = true);
>>> bool opt_cse_local(bblock_t *block);
>>> bool opt_cse();
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>>> new file mode 100644
>>> index 000..7e39d2b
>>> --- /dev/null
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>>> @@ -0,0 +1,163 @@
>>> +/*
>>> + * Copyright © 2015 Intel Corporation
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the 
>>> "Software"),
>>> + * to deal in the Software without restriction, including without 
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice (including the 
>>> next
>>> + * paragraph) shall be included in all copies or substantial portions of 
>>> the
>>> + * Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT 

Re: [Mesa-dev] [PATCH V4 2/6] glsl: assign hidden uniforms their slot id earlier

2015-10-13 Thread Timothy Arceri
On Mon, 2015-10-12 at 01:06 +0200, Marek Olšák wrote:
> On Sun, Oct 11, 2015 at 9:20 AM, Timothy Arceri <
> t_arc...@yahoo.com.au> wrote:
> > On Sat, 2015-10-10 at 18:06 +0200, Marek Olšák wrote:
> > > Hi Timothy,
> > > 
> > > One of these 3 commits breaks compilation for Talos shaders with
> > > gallium. My piglit patch "glsl-1.30/sampler-bug: ..." contains a
> > > minimal test case. I can't say which commit, because Mesa fails
> > > to
> > > build between them.
> > >  It has something to do with uniforms, structures,
> > > and samplers.
> > > 
> > > commit dcd9cd03837545055ce2a315e7e8840cc3254d1a
> > > Author: Timothy Arceri 
> > > Date:   Sun Aug 30 12:50:34 2015 +1000
> > > 
> > > glsl: store uniform slot id in var location field
> > > 
> > 
> > Hi Marek,
> > 
> > The piglit test passes on my intel based laptop I can't test on
> > anything else until later this week (as I'm traveling) but my guess
> > is
> > its something to do with the above patch.
> > 
> > The other two patches shouldn't change anything for gallium drivers
> > "glsl: assign hidden uniforms their slot id earlier" just assigns
> > hidden uniforms their slot id earlier and there shouldn't be any
> > difference once the IR gets to glsl_to_tgsi.
> > 
> > Also "glsl: order indices for samplers inside a struct array"
> > shouldn't
> > change things either as in your test the sampler are not inside the
> > struct.
> > 
> > There is some code in the glsl_to_tgsi pass that looks like the
> > location field would have always been -1 for uniforms other the UBO
> > and
> > UBO members maybe this has something to do with the problem now
> > that
> > all uniforms now get a non -1 value.
> > 
> >   case ir_var_uniform:
> >  entry = new(mem_ctx) variable_storage(var,
> > PROGRAM_UNIFORM,
> >var->data.location);
> >  this->variables.push_tail(entry);
> >  break;
> > 
> > I hope this helps get you started. If you haven't figured it out by
> > later in the week than I'll take a look on my desktop once I get
> > home.
> 
> The problem is that _mesa_get_sampler_uniform_value returns a sampler
> index which is greater than 16. If I allow large sampler indices, it
> starts to fail with sampler.file==TGSI_FILE_NULL in the TGSI codegen
> later. I didn't investigate further.
> 
> Marek


So from my bisect run your test was passing until:

a907b5dd162b7911b8c21f6d54837831bc078059 is the first bad commit
commit a907b5dd162b7911b8c21f6d54837831bc078059
Author: Marek Olšák 
Date:   Mon Oct 5 03:26:48 2015 +0200

st/mesa: translate fragment shaders into TGSI when we get them

Reviewed-by: Dave Airlie 
Reviewed-by: Brian Paul 
Tested-by: Brian Paul 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: fix matrix stride calculation for std430's row_major matrices with two columns

2015-10-13 Thread Lofstedt, Marta
Thanks Samuel,

This also fixes 10+ OpenGL ES 3.1 test cases.

Reviewed-by: Marta Lofstedt 

> -Original Message-
> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> Behalf Of Samuel Iglesias Gonsalvez
> Sent: Tuesday, October 13, 2015 8:39 AM
> To: mesa-dev@lists.freedesktop.org
> Subject: [Mesa-dev] [PATCH] glsl: fix matrix stride calculation for std430's
> row_major matrices with two columns
> 
> This is the result of applying several rules:
> 
> From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform Block Layout":
> 
> "2. If the member is a two- or four-component vector with components
> consuming N basic machine units, the base alignment is 2N or 4N,
> respectively."
> [...]
> "4. If the member is an array of scalars or vectors, the base alignment and
> array stride are set to match the base alignment of a single array element,
> according to rules (1), (2), and (3), and rounded up to the base alignment of 
> a
> vec4."
> [...]
> "7. If the member is a row-major matrix with C columns and R rows, the
> matrix is stored identically to an array of R row vectors with C components
> each, according to rule (4)."
> [...]
> "When using the std430 storage layout, shader storage blocks will be laid out
> in buffer storage identically to uniform and shader storage blocks using the
> std140 layout, except that the base alignment and stride of arrays of scalars
> and vectors in rule 4 and of structures in rule 9 are not rounded up a 
> multiple
> of the base alignment of a vec4."
> 
> In summary: vec2 has a base alignment of 2*N, a row-major mat2xY is stored
> like an array of Y row vectors with 2 components each. Because of std430
> storage layout, the base alignment of the array of vectors is not rounded up
> to vec4, so it is still 2*N.
> 
> Fixes 15 dEQP tests:
> 
> dEQP-
> GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_m
> at2
> dEQP-
> GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediu
> mp_mat2
> dEQP-
> GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_
> mat2
> dEQP-
> GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_m
> at2x3
> dEQP-
> GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediu
> mp_mat2x3
> dEQP-
> GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_
> mat2x3
> dEQP-
> GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_m
> at2x4
> dEQP-
> GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediu
> mp_mat2x4
> dEQP-
> GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_
> mat2x4
> dEQP-
> GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2
> dEQP-
> GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x3
> dEQP-
> GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x4
> dEQP-
> GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major
> _mat2
> dEQP-
> GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major
> _mat2x3
> dEQP-
> GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major
> _mat2x4
> 
> v2:
> - Add spec quote in both commit log and code (Timothy)
> 
> Signed-off-by: Samuel Iglesias Gonsalvez 
> Cc: Timothy Arceri 
> ---
>  src/glsl/lower_ubo_reference.cpp | 26 +-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/src/glsl/lower_ubo_reference.cpp
> b/src/glsl/lower_ubo_reference.cpp
> index 247620e..c8ec5c1 100644
> --- a/src/glsl/lower_ubo_reference.cpp
> +++ b/src/glsl/lower_ubo_reference.cpp
> @@ -744,7 +744,31 @@ lower_ubo_reference_visitor::emit_access(bool
> is_write,
> * or 32 depending on the number of columns.
> */
>assert(matrix_columns <= 4);
> -  unsigned matrix_stride = glsl_align(matrix_columns * N, 16);
> +  unsigned matrix_stride = 0;
> +  /* Matrix stride for std430 mat2xY matrices are not rounded up to
> +   * vec4 size. From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform
> +   * Block Layout":
> +   *
> +   * "2. If the member is a two- or four-component vector with
> components
> +   * consuming N basic machine units, the base alignment is 2N or 4N,
> +   * respectively." [...]
> +   * "4. If the member is an array of scalars or vectors, the base 
> alignment
> +   * and array stride are set to match the base alignment of a single 
> array
> +   * element, according to rules (1), (2), and (3), and rounded up to the
> +   * base alignment of a vec4." [...]
> +   * "7. If the member is a row-major matrix with C columns and R rows,
> the
> +   * matrix is stored identically to an array of R row vectors with C
> +   * components each, according to rule (4)." [...]
> +   * "When using the std430 storage layout, shader storage blocks will be
> +   * laid out in 

Re: [Mesa-dev] [PATCH V4 2/6] glsl: assign hidden uniforms their slot id earlier

2015-10-13 Thread Timothy Arceri
On Mon, 2015-10-12 at 01:06 +0200, Marek Olšák wrote:
> On Sun, Oct 11, 2015 at 9:20 AM, Timothy Arceri <
> t_arc...@yahoo.com.au> wrote:
> > On Sat, 2015-10-10 at 18:06 +0200, Marek Olšák wrote:
> > > Hi Timothy,
> > > 
> > > One of these 3 commits breaks compilation for Talos shaders with
> > > gallium. My piglit patch "glsl-1.30/sampler-bug: ..." contains a
> > > minimal test case. I can't say which commit, because Mesa fails
> > > to
> > > build between them.
> > >  It has something to do with uniforms, structures,
> > > and samplers.
> > > 
> > > commit dcd9cd03837545055ce2a315e7e8840cc3254d1a
> > > Author: Timothy Arceri 
> > > Date:   Sun Aug 30 12:50:34 2015 +1000
> > > 
> > > glsl: store uniform slot id in var location field
> > > 
> > 
> > Hi Marek,
> > 
> > The piglit test passes on my intel based laptop I can't test on
> > anything else until later this week (as I'm traveling) but my guess
> > is
> > its something to do with the above patch.
> > 
> > The other two patches shouldn't change anything for gallium drivers
> > "glsl: assign hidden uniforms their slot id earlier" just assigns
> > hidden uniforms their slot id earlier and there shouldn't be any
> > difference once the IR gets to glsl_to_tgsi.
> > 
> > Also "glsl: order indices for samplers inside a struct array"
> > shouldn't
> > change things either as in your test the sampler are not inside the
> > struct.
> > 
> > There is some code in the glsl_to_tgsi pass that looks like the
> > location field would have always been -1 for uniforms other the UBO
> > and
> > UBO members maybe this has something to do with the problem now
> > that
> > all uniforms now get a non -1 value.
> > 
> >   case ir_var_uniform:
> >  entry = new(mem_ctx) variable_storage(var,
> > PROGRAM_UNIFORM,
> >var->data.location);
> >  this->variables.push_tail(entry);
> >  break;
> > 
> > I hope this helps get you started. If you haven't figured it out by
> > later in the week than I'll take a look on my desktop once I get
> > home.
> 
> The problem is that _mesa_get_sampler_uniform_value returns a sampler
> index which is greater than 16. If I allow large sampler indices, it
> starts to fail with sampler.file==TGSI_FILE_NULL in the TGSI codegen
> later. I didn't investigate further.

Hi Marek,

I just built the ilo driver to test this and I was able to get this
error but only after updating master.

I don't have time for a full bisect right now as I'm about to get on a
flight but commit 64831832791139328a67b80387f062d39e304d24 is good and
is well after this commit.

Tim

> 
> Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Set api prefix to version string when overriding version

2015-10-13 Thread Boyan Ding
2015-10-13 13:49 GMT+08:00 Tapani Pälli :
> Otherwise there are problems when user overrides version and application
> such as Piglit wants to detect used api with glGetString(GL_VERSION).
>
> Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1.
>
> Before:
> "3.1 Mesa 11.1.0-devel (git-24a1a15)"
>
> After:
> "OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)"
>
> Signed-off-by: Tapani Pälli 
> ---
>  src/mesa/main/version.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c
> index 498b2f8..a1b1db5 100644
> --- a/src/mesa/main/version.c
> +++ b/src/mesa/main/version.c
> @@ -24,6 +24,7 @@
>
>
>  #include 
> +#include "context.h"
>  #include "imports.h"
>  #include "mtypes.h"
>  #include "version.h"
> @@ -181,7 +182,7 @@ _mesa_override_gl_version(struct gl_context *ctx)
>  {
> if (_mesa_override_gl_version_contextless(>Const, >API,
>   >Version)) {
> -  create_version_string(ctx, "");
> +  create_version_string(ctx, _mesa_is_gles(ctx) ? "OpenGL ES " : "OpenGL 
> ");


Things are a little bit funny here, because the version strings of
OpenGL and ES are different.

From OpenGL 4.5 (Core Profile) spec, Page 541:

"""
The VERSION and SHADING_LANGUAGE_VERSION strings are laid out as
follows:

"""

From OpenGL ES 3.2 spec, Page 436:

"""
The VERSION string is laid out as follows:
"OpenGL ES N.M vendor-specific information"
"""

So I don't think the "OpenGL" prefix is necessary on desktop GL.

Regards,
Boyan Ding

> }
>  }
>
> --
> 2.4.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: fix matrix stride calculation for std430's row_major matrices with two columns

2015-10-13 Thread Samuel Iglesias Gonsalvez
This is the result of applying several rules:

From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform Block Layout":

"2. If the member is a two- or four-component vector with components
consuming N basic machine units, the base alignment is 2N or 4N,
respectively."
[...]
"4. If the member is an array of scalars or vectors, the base alignment
and array stride are set to match the base alignment of a single array
element, according to rules (1), (2), and (3), and rounded up to the
base alignment of a vec4."
[...]
"7. If the member is a row-major matrix with C columns and R rows, the
matrix is stored identically to an array of R row vectors with C
components each, according to rule (4)."
[...]
"When using the std430 storage layout, shader storage blocks will be
laid out in buffer storage identically to uniform and shader storage
blocks using the std140 layout, except that the base alignment and
stride of arrays of scalars and vectors in rule 4 and of structures in
rule 9 are not rounded up a multiple of the base alignment of a vec4."

In summary: vec2 has a base alignment of 2*N, a row-major mat2xY is
stored like an array of Y row vectors with 2 components each. Because
of std430 storage layout, the base alignment of the array of vectors
is not rounded up to vec4, so it is still 2*N.

Fixes 15 dEQP tests:

dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2x4
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2x4
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x4
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2x3
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2x4

v2:
- Add spec quote in both commit log and code (Timothy)

Signed-off-by: Samuel Iglesias Gonsalvez 
Cc: Timothy Arceri 
---
 src/glsl/lower_ubo_reference.cpp | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/glsl/lower_ubo_reference.cpp b/src/glsl/lower_ubo_reference.cpp
index 247620e..c8ec5c1 100644
--- a/src/glsl/lower_ubo_reference.cpp
+++ b/src/glsl/lower_ubo_reference.cpp
@@ -744,7 +744,31 @@ lower_ubo_reference_visitor::emit_access(bool is_write,
* or 32 depending on the number of columns.
*/
   assert(matrix_columns <= 4);
-  unsigned matrix_stride = glsl_align(matrix_columns * N, 16);
+  unsigned matrix_stride = 0;
+  /* Matrix stride for std430 mat2xY matrices are not rounded up to
+   * vec4 size. From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform
+   * Block Layout":
+   *
+   * "2. If the member is a two- or four-component vector with components
+   * consuming N basic machine units, the base alignment is 2N or 4N,
+   * respectively." [...]
+   * "4. If the member is an array of scalars or vectors, the base 
alignment
+   * and array stride are set to match the base alignment of a single array
+   * element, according to rules (1), (2), and (3), and rounded up to the
+   * base alignment of a vec4." [...]
+   * "7. If the member is a row-major matrix with C columns and R rows, the
+   * matrix is stored identically to an array of R row vectors with C
+   * components each, according to rule (4)." [...]
+   * "When using the std430 storage layout, shader storage blocks will be
+   * laid out in buffer storage identically to uniform and shader storage
+   * blocks using the std140 layout, except that the base alignment and
+   * stride of arrays of scalars and vectors in rule 4 and of structures in
+   * rule 9 are not rounded up a multiple of the base alignment of a vec4."
+   */
+  if (packing == GLSL_INTERFACE_PACKING_STD430 && matrix_columns == 2)
+ matrix_stride = 2 * N;
+  else
+ matrix_stride = glsl_align(matrix_columns * N, 16);
 
   const glsl_type *deref_type = deref->type->base_type == GLSL_TYPE_FLOAT ?
  glsl_type::float_type : glsl_type::double_type;
-- 
2.1.4

___
mesa-dev mailing list

Re: [Mesa-dev] [PATCH] mesa: Set api prefix to version string when overriding version

2015-10-13 Thread Tapani Pälli



On 10/13/2015 09:35 AM, Boyan Ding wrote:

2015-10-13 13:49 GMT+08:00 Tapani Pälli :

Otherwise there are problems when user overrides version and application
such as Piglit wants to detect used api with glGetString(GL_VERSION).

Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1.

Before:
 "3.1 Mesa 11.1.0-devel (git-24a1a15)"

After:
 "OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)"

Signed-off-by: Tapani Pälli 
---
  src/mesa/main/version.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c
index 498b2f8..a1b1db5 100644
--- a/src/mesa/main/version.c
+++ b/src/mesa/main/version.c
@@ -24,6 +24,7 @@


  #include 
+#include "context.h"
  #include "imports.h"
  #include "mtypes.h"
  #include "version.h"
@@ -181,7 +182,7 @@ _mesa_override_gl_version(struct gl_context *ctx)
  {
 if (_mesa_override_gl_version_contextless(>Const, >API,
   >Version)) {
-  create_version_string(ctx, "");
+  create_version_string(ctx, _mesa_is_gles(ctx) ? "OpenGL ES " : "OpenGL 
");



Things are a little bit funny here, because the version strings of
OpenGL and ES are different.

 From OpenGL 4.5 (Core Profile) spec, Page 541:

"""
The VERSION and SHADING_LANGUAGE_VERSION strings are laid out as
follows:
 
"""

 From OpenGL ES 3.2 spec, Page 436:

"""
The VERSION string is laid out as follows:
 "OpenGL ES N.M vendor-specific information"
"""

So I don't think the "OpenGL" prefix is necessary on desktop GL.


ok, I was not a aware of such inconsistency. I guess it should be ok 
then to change "OpenGL " as "" in the patch.


Thanks for pointing this out!


Regards,
Boyan Ding


 }
  }

--
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] radeonsi: implement vertex color clamping

2015-10-13 Thread Michel Dänzer
On 11.10.2015 10:17, Marek Olšák wrote:
> From: Marek Olšák 
> 
> This is only supported in the compatibility profile (without GS and tess).

[...]

> diff --git a/src/gallium/drivers/radeonsi/si_state.c 
> b/src/gallium/drivers/radeonsi/si_state.c
> index 3aafe8a..a81060a 100644
> --- a/src/gallium/drivers/radeonsi/si_state.c
> +++ b/src/gallium/drivers/radeonsi/si_state.c
> @@ -760,6 +760,8 @@ static void *si_create_rs_state(struct pipe_context *ctx,
>  state->fill_back != PIPE_POLYGON_MODE_FILL) |
>   
> S_028814_POLYMODE_FRONT_PTYPE(si_translate_fill(state->fill_front)) |
>   
> S_028814_POLYMODE_BACK_PTYPE(si_translate_fill(state->fill_back)));
> + si_pm4_set_reg(pm4, R_00B130_SPI_SHADER_USER_DATA_VS_0 +
> + SI_SGPR_VS_STATE_BITS * 4, 
> state->clamp_vertex_color);

The second line added here seems to have a few too many spaces of
indentation. With that fixed, this series is

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH V4 2/6] glsl: assign hidden uniforms their slot id earlier

2015-10-13 Thread Michel Dänzer
On 11.10.2015 16:20, Timothy Arceri wrote:
> On Sat, 2015-10-10 at 18:06 +0200, Marek Olšák wrote:
>> Hi Timothy,
>>
>> One of these 3 commits breaks compilation for Talos shaders with
>> gallium. My piglit patch "glsl-1.30/sampler-bug: ..." contains a
>> minimal test case. I can't say which commit, because Mesa fails to
>> build between them.
>>  It has something to do with uniforms, structures,
>> and samplers.
>>
>> commit dcd9cd03837545055ce2a315e7e8840cc3254d1a
>> Author: Timothy Arceri 
>> Date:   Sun Aug 30 12:50:34 2015 +1000
>>
>> glsl: store uniform slot id in var location field
>>
> 
> Hi Marek,
> 
> The piglit test passes on my intel based laptop I can't test on
> anything else until later this week (as I'm traveling) [...]

FWIW, surely you can test llvmpipe on that laptop? :)


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/10] RadeonSI: Better LLVM IR generation

2015-10-13 Thread Michel Dänzer
On 11.10.2015 10:29, Marek Olšák wrote:
> 
> This patch series improves IR generation for radeonsi. Most of it removes
> uses of AMDGPU intrinsics.

Patches 3-7 are

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: couple shader_enums cleanups

2015-10-13 Thread Emil Velikov
On 10 October 2015 at 01:56, Rob Clark  wrote:
> On Fri, Oct 9, 2015 at 8:07 PM, Emil Velikov  wrote:
>> On 9 October 2015 at 23:27, Rob Clark  wrote:
>>> On Fri, Oct 9, 2015 at 5:57 PM, Emil Velikov  
>>> wrote:
 On 9 October 2015 at 21:31, Rob Clark  wrote:
> From: Rob Clark 
>
> Add missing enum to gl_system_value_name() and move VARYING_SLOT_MAX /
> FRAG_RESULT_MAX / etc into shader_enums.h as suggested by Emil.
>
> Reported-by: Emil Velikov 
> Signed-off-by: Rob Clark 
> ---
> Note: punted on the STATIC_ASSERT() thing for now..  kinda wanted some-
> think more like tgsi_strings_check() where we check everything once on
> startup, so you don't have to trigger something that calls the various
> xyz_name() to realize something is out of sync.
>
 I'm confused - isn't static_assert a compile thing ?
>>>
>>> oh, heh, good point..
>>>
>  src/glsl/nir/shader_enums.c | 1 +
>  src/glsl/nir/shader_enums.h | 7 +++
>  src/mesa/main/mtypes.h  | 5 -
>  3 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/src/glsl/nir/shader_enums.c b/src/glsl/nir/shader_enums.c
> index 3722475..7fcbe81 100644
> --- a/src/glsl/nir/shader_enums.c
> +++ b/src/glsl/nir/shader_enums.c
> @@ -169,6 +169,7 @@ const char * gl_system_value_name(gl_system_value 
> sysval)
>   ENUM(SYSTEM_VALUE_TESS_LEVEL_INNER),
>   ENUM(SYSTEM_VALUE_LOCAL_INVOCATION_ID),
>   ENUM(SYSTEM_VALUE_WORK_GROUP_ID),
> + ENUM(SYSTEM_VALUE_NUM_WORK_GROUPS),
>   ENUM(SYSTEM_VALUE_VERTEX_CNT),
> };
> return NAME(sysval);
> diff --git a/src/glsl/nir/shader_enums.h b/src/glsl/nir/shader_enums.h
> index 2a5d2c5..77638ba 100644
> --- a/src/glsl/nir/shader_enums.h
> +++ b/src/glsl/nir/shader_enums.h
> @@ -233,6 +233,11 @@ typedef enum
> VARYING_SLOT_VAR31,
>  } gl_varying_slot;
>
> +
> +#define VARYING_SLOT_MAX   (VARYING_SLOT_VAR0 + MAX_VARYING)
 I'd keep this and FRAG_RESULT_MAX defined as FOO + 1. Otherwise we'll
 end in funny spaghetti of header dependencies due to the extra two
 macros.
>>>
>>> not quite sure, but it sounds like you are asking to open-code
>>> MAX_VARYINGS?  I don't think that is needed since that seems to
>>> basically be a global config.h type thing (I mean mesa/main/config.h,
>>> not autoconf one)
>>>
>>> There isn't really *supposed* to be a VARYING_SLOT_VAR(n-1) (ignoring
>>> the fact that I added them simply to get nice string values to print
>>> out in nir_print)
>>>
>> I was thinking that #define VARYING_SLOT_MAX (VARYING_SLOT_VAR + 1)
>> /*VARYING_SLOT_VAR(n) */ is the preferred way.
>> Hmm can you please point me in the right direction, as to why having
>> VARYING_SLOT_VAR(n-1) is a bad idea ?
>
> A few of the enums are meant to have their last element be interpreted
> as xyz0 + n (where n < #define MAX_xyz from config.h)
>
> I guess technically the max isn't something that is likely to change
> frequently but if we ignore the MAX_xyz from config.h that means we
> have two places to update if the max was ever increased.
>
If we go for the 0 + MAX_foo approach, we'll get some lovely build
warnings. On the other hand things will break nicely with +1. I'm
slightly leaning towards the latter, as it seems that currently things
are subtle/brittle and by breaking them (albeit unlikely in the near
future) we'll get a victim^Wvolunteer to untangle/make things obvious.

Just my 2c. I'm not proposing that you have to do anything on the
topic, neither am I volunteering.

> +#define VARYING_SLOT_PATCH0(VARYING_SLOT_MAX)
> +#define VARYING_SLOT_TESS_MAX  (VARYING_SLOT_PATCH0 + MAX_VARYING)
> +
 As there are defined 'on top'/'after' of the existing enum perhaps we
 should keep them alongside their PRIM_MAX...UNKNOWN brethren ?
>>>
>>> PRIM_MAX is a different thing, not related to anything that is already
>>> in shader_enums..
>>>
>> I had in mind the relationship wrt the respective enums/macros, rather
>> than their meaning. Although with the _MAX above in mind, I'm likely
>> missing something.
>
> prim == gl draw primitive (ie. tris/tristrip/quads/points/etc)..  not
> something that is internal to mesa or in shader_enums.h..
>
That's precisely what I meant. PRIM_MAX and VARYING_SLOT_MAX are well
defined quantities with the former unlikey to ever change.
VARYING_SLOT_PATCH0 and VARYING_SLOT_TESS_MAX on the other hand, are
'duck taped' on top of the existing gl_varying_slot enum. Similar to
PRIM_OUTSIDE_BEGIN_END and PRIM_UNKNOWN.

Cheers,
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] tgsi: try and handle overflowing shaders.

2015-10-13 Thread Marek Olšák
On Tue, Oct 13, 2015 at 6:40 AM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This is used to detect error in virgl if we overflow the shader
> dumping buffers.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/gallium/auxiliary/tgsi/tgsi_dump.c | 10 --
>  src/gallium/auxiliary/tgsi/tgsi_dump.h |  2 +-
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c 
> b/src/gallium/auxiliary/tgsi/tgsi_dump.c
> index 33f6a56..f341783 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
> @@ -708,6 +708,7 @@ struct str_dump_ctx
> char *str;
> char *ptr;
> int left;
> +   bool nospace;
>  };
>
>  static void
> @@ -730,10 +731,11 @@ str_dump_ctx_printf(struct dump_ctx *ctx, const char 
> *format, ...)
>   sctx->ptr += written;
>   sctx->left -= written;
>}
> -   }
> +   } else
> +  sctx->nospace = true;
>  }
>
> -void
> +int
>  tgsi_dump_str(
> const struct tgsi_token *tokens,
> uint flags,
> @@ -760,6 +762,7 @@ tgsi_dump_str(
> ctx.str[0] = 0;
> ctx.ptr = str;
> ctx.left = (int)size;
> +   ctx.nospace = false;
>
> if (flags & TGSI_DUMP_FLOAT_AS_HEX)
>ctx.base.dump_float_as_hex = TRUE;
> @@ -767,6 +770,8 @@ tgsi_dump_str(
>ctx.base.dump_float_as_hex = FALSE;
>
> tgsi_iterate_shader( tokens,  );
> +
> +   return (ctx.nospace == true) ? -1 : 0;

Why not just return bool meaning success/failure?

Anyway:

Reviewed-by: Marek Olšák 

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/10] RadeonSI cleanups

2015-10-13 Thread Michel Dänzer
On 11.10.2015 10:11, Marek Olšák wrote:
> Nothing special here other than cleanups. One patch disables NaNs for LS and 
> HS, and there's also one GS shader leak fix.
> 
> Please review.

Patches 1, 3-6 & 8 are

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH v2] configure.ac: ensure RM is set

2015-10-13 Thread Emil Velikov
On 10 October 2015 at 07:42, Jonathan Gray  wrote:
> GNU make predefines RM to rm -f but this is not required by POSIX
> so ensure that RM is set.  This fixes "make clean" on OpenBSD.
>
> v2: use AC_CHECK_PROG
>
I'm ambivalent if we go the AC_CHECK_PROG vs ?= route.

I'll let others pick their favourite, but both are
Reviewed-by: Emil Velikov 

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] vbo: fix incorrect switch statement in init_mat_currval()

2015-10-13 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Tue, Oct 13, 2015 at 4:45 AM, Brian Paul  wrote:
> The variable 'i' is a value in [0, MAT_ATTRIB_MAX-1] so subtracting
> VERT_ATTRIB_GENERIC0 gave a bogus value and we executed the default
> switch clause for all loop iterations.
>
> This doesn't fix any known issues but was clearly incorrect.
>
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/mesa/vbo/vbo_context.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c
> index e3eb286..802955d 100644
> --- a/src/mesa/vbo/vbo_context.c
> +++ b/src/mesa/vbo/vbo_context.c
> @@ -121,7 +121,7 @@ static void init_mat_currval(struct gl_context *ctx)
>/* Size is fixed for the material attributes, for others will
> * be determined at runtime:
> */
> -  switch (i - VERT_ATTRIB_GENERIC0) {
> +  switch (i) {
>case MAT_ATTRIB_FRONT_SHININESS:
>case MAT_ATTRIB_BACK_SHININESS:
>  cl->Size = 1;
> --
> 1.9.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/4] i965/vec4: skip control-flow aware liveness analysis if we only have 1 block

2015-10-13 Thread Iago Toral Quiroga
Otherwise, what this is going to do is mark every variable alive in the
single block we have, and thus, alive since its definition until the
program ends, which is really bad in situations where register pressure
is high.

Of course, this does not address the real problem: that our liveness
analysis is not good enough, but it helps some simple cases with no
effort.
---
 src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp
index 6782379..afdecc2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp
@@ -287,6 +287,13 @@ vec4_visitor::calculate_live_intervals()
 */
this->live_intervals = new(mem_ctx) vec4_live_variables(alloc, cfg);
 
+   /* If we only have one block this pass would only mark all used variables
+* as alive in the block, and thus, alive in the entire program. We don't
+* want that.
+*/
+   if (cfg->block_list.length() <= 1)
+  return;
+
foreach_block (block, cfg) {
   struct block_data *bd = _intervals->block_data[block->num];
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals

2015-10-13 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp
index cc688ef..6782379 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp
@@ -291,15 +291,15 @@ vec4_visitor::calculate_live_intervals()
   struct block_data *bd = _intervals->block_data[block->num];
 
   for (int i = 0; i < live_intervals->num_vars; i++) {
-if (BITSET_TEST(bd->livein, i)) {
-   start[i] = MIN2(start[i], block->start_ip);
-   end[i] = MAX2(end[i], block->start_ip);
-}
+ if (BITSET_TEST(bd->livein, i)) {
+start[i] = MIN2(start[i], block->start_ip);
+end[i] = MAX2(end[i], block->start_ip);
+ }
 
-if (BITSET_TEST(bd->liveout, i)) {
-   start[i] = MIN2(start[i], block->end_ip);
-   end[i] = MAX2(end[i], block->end_ip);
-}
+ if (BITSET_TEST(bd->liveout, i)) {
+start[i] = MIN2(start[i], block->end_ip);
+end[i] = MAX2(end[i], block->end_ip);
+ }
   }
}
 }
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/4] i965/fs: Fix indentation in fs_live_variables::compute_start_end

2015-10-13 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
index 19aec92..ce066a9 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
@@ -259,16 +259,15 @@ fs_live_variables::compute_start_end()
   struct block_data *bd = _data[block->num];
 
   for (int i = 0; i < num_vars; i++) {
-if (BITSET_TEST(bd->livein, i)) {
-   start[i] = MIN2(start[i], block->start_ip);
-   end[i] = MAX2(end[i], block->start_ip);
-}
-
-if (BITSET_TEST(bd->liveout, i)) {
-   start[i] = MIN2(start[i], block->end_ip);
-   end[i] = MAX2(end[i], block->end_ip);
-}
+ if (BITSET_TEST(bd->livein, i)) {
+start[i] = MIN2(start[i], block->start_ip);
+end[i] = MAX2(end[i], block->start_ip);
+ }
 
+ if (BITSET_TEST(bd->liveout, i)) {
+start[i] = MIN2(start[i], block->end_ip);
+end[i] = MAX2(end[i], block->end_ip);
+ }
   }
}
 }
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block

2015-10-13 Thread Iago Toral Quiroga
This fixes the following test:

[require]
GL >= 3.3
GLSL >= 3.30
GL_ARB_shader_storage_buffer_object

[fragment shader]
#version 330
#extension GL_ARB_shader_storage_buffer_object: require

buffer SSBO {
mat4 sm4;
};


mat4 um4;

void main() {
sm4 *= um4;
sm4[0][0] = 0.0;
}

[test]
link success

where our liveness analysis works really bad because the control-flow part
will expand register liveness to the end of only block we have (so every
register will be marked alive until the end of the program). This artificially
increases register pressure to a point in which we run out of registers.
Of course, this is only a simple optimization for a trivial case, the
underlying problem still exists and would manifest when we have more than
one block, but it helps simple shaders such as the one in the example above
without any effort. I guess the real fix would involve re-thinking parts of the
liveness analysis algorithm...

Jordan, there is another thing that we can improve for this shader that is
specific to SSBOs: we emit the same ssbo load multiple times because we are
playing it safe just in case there are writes in between. I think we can try to
do better and not re-issue the same load if we don't have ssbo stores to
the same address in between. I'll try to come up with a patch for this
(hopefully we can do this inside lower_ubo_reference).

The actual fix goes into patches 2 (FS) and 4 (VS). Patches 1,3 are just
indentation fixes in the code around these.

Iago Toral Quiroga (4):
  i965/fs: Fix indentation in fs_live_variables::compute_start_end
  i965/fs: skip control-flow aware liveness analysis if we only have 1
block
  i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals
  i965/vec4: skip control-flow aware liveness analysis if we only have 1
block

 .../drivers/dri/i965/brw_fs_live_variables.cpp | 24 ++
 .../drivers/dri/i965/brw_vec4_live_variables.cpp   | 23 +
 2 files changed, 30 insertions(+), 17 deletions(-)

-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] i965/fs: skip control-flow aware liveness analysis if we only have 1 block

2015-10-13 Thread Iago Toral Quiroga
Otherwise, what this is going to do is mark every variable alive in the
single block we have, and thus, alive since its definition until the
program ends, which is really bad in situations where register pressure
is high.

Of course, this does not address the real problem: that our liveness
analysis is not good enough, but it helps some simple cases with no
effort.
---
 src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
index ce066a9..ce99bd3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
@@ -255,6 +255,13 @@ fs_live_variables::compute_live_variables()
 void
 fs_live_variables::compute_start_end()
 {
+   /* If we only have one block this pass would only mark all used variables
+* as alive in the block, and thus, alive in the entire program. We don't
+* want that.
+*/
+   if (cfg->block_list.length() <= 1)
+  return;
+
foreach_block (block, cfg) {
   struct block_data *bd = _data[block->num];
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH V4 2/6] glsl: assign hidden uniforms their slot id earlier

2015-10-13 Thread Marek Olšák
On Tue, Oct 13, 2015 at 10:13 AM, Timothy Arceri  wrote:
> On Mon, 2015-10-12 at 01:06 +0200, Marek Olšák wrote:
>> On Sun, Oct 11, 2015 at 9:20 AM, Timothy Arceri <
>> t_arc...@yahoo.com.au> wrote:
>> > On Sat, 2015-10-10 at 18:06 +0200, Marek Olšák wrote:
>> > > Hi Timothy,
>> > >
>> > > One of these 3 commits breaks compilation for Talos shaders with
>> > > gallium. My piglit patch "glsl-1.30/sampler-bug: ..." contains a
>> > > minimal test case. I can't say which commit, because Mesa fails
>> > > to
>> > > build between them.
>> > >  It has something to do with uniforms, structures,
>> > > and samplers.
>> > >
>> > > commit dcd9cd03837545055ce2a315e7e8840cc3254d1a
>> > > Author: Timothy Arceri 
>> > > Date:   Sun Aug 30 12:50:34 2015 +1000
>> > >
>> > > glsl: store uniform slot id in var location field
>> > >
>> >
>> > Hi Marek,
>> >
>> > The piglit test passes on my intel based laptop I can't test on
>> > anything else until later this week (as I'm traveling) but my guess
>> > is
>> > its something to do with the above patch.
>> >
>> > The other two patches shouldn't change anything for gallium drivers
>> > "glsl: assign hidden uniforms their slot id earlier" just assigns
>> > hidden uniforms their slot id earlier and there shouldn't be any
>> > difference once the IR gets to glsl_to_tgsi.
>> >
>> > Also "glsl: order indices for samplers inside a struct array"
>> > shouldn't
>> > change things either as in your test the sampler are not inside the
>> > struct.
>> >
>> > There is some code in the glsl_to_tgsi pass that looks like the
>> > location field would have always been -1 for uniforms other the UBO
>> > and
>> > UBO members maybe this has something to do with the problem now
>> > that
>> > all uniforms now get a non -1 value.
>> >
>> >   case ir_var_uniform:
>> >  entry = new(mem_ctx) variable_storage(var,
>> > PROGRAM_UNIFORM,
>> >var->data.location);
>> >  this->variables.push_tail(entry);
>> >  break;
>> >
>> > I hope this helps get you started. If you haven't figured it out by
>> > later in the week than I'll take a look on my desktop once I get
>> > home.
>>
>> The problem is that _mesa_get_sampler_uniform_value returns a sampler
>> index which is greater than 16. If I allow large sampler indices, it
>> starts to fail with sampler.file==TGSI_FILE_NULL in the TGSI codegen
>> later. I didn't investigate further.
>>
>> Marek
>
>
> So from my bisect run your test was passing until:
>
> a907b5dd162b7911b8c21f6d54837831bc078059 is the first bad commit
> commit a907b5dd162b7911b8c21f6d54837831bc078059
> Author: Marek Olšák 
> Date:   Mon Oct 5 03:26:48 2015 +0200
>
> st/mesa: translate fragment shaders into TGSI when we get them
>
> Reviewed-by: Dave Airlie 
> Reviewed-by: Brian Paul 
> Tested-by: Brian Paul 

It's a linker test, so you really need to set ST_DEBUG=precompile to
get that shader translated before this commit. Using --enable-debug
should make it fail an assertion earlier without ST_DEBUG=precompile.

You didn't need to bisect. I had bisected the bug already and it's one
of your 3 commits.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] program: convert _mesa_init_gl_program() to take struct gl_program *

2015-10-13 Thread Emil Velikov
Rather than accepting a void pointer, only to down and up cast around
it, convert the function to take the base (struct gl_program) pointer.

Cc: Marek Olšák 
Signed-off-by: Emil Velikov 
---
 src/mesa/drivers/dri/i915/i915_fragprog.c  |  9 ++--
 src/mesa/drivers/dri/i965/brw_program.c|  6 +--
 .../drivers/dri/i965/test_fs_cmod_propagation.cpp  |  2 +-
 .../dri/i965/test_fs_saturate_propagation.cpp  |  2 +-
 .../dri/i965/test_vec4_copy_propagation.cpp|  2 +-
 .../dri/i965/test_vec4_register_coalesce.cpp   |  2 +-
 src/mesa/drivers/dri/r200/r200_vertprog.c  | 17 +++
 src/mesa/program/program.c | 55 ++
 src/mesa/program/program.h |  2 +-
 src/mesa/state_tracker/st_cb_program.c | 38 ---
 10 files changed, 68 insertions(+), 67 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/i915_fragprog.c 
b/src/mesa/drivers/dri/i915/i915_fragprog.c
index 237d219..59d7959 100644
--- a/src/mesa/drivers/dri/i915/i915_fragprog.c
+++ b/src/mesa/drivers/dri/i915/i915_fragprog.c
@@ -1315,9 +1315,10 @@ static struct gl_program *
 i915NewProgram(struct gl_context * ctx, GLenum target, GLuint id)
 {
switch (target) {
-   case GL_VERTEX_PROGRAM_ARB:
-  return _mesa_init_gl_program(CALLOC_STRUCT(gl_vertex_program),
-   target, id);
+   case GL_VERTEX_PROGRAM_ARB: {
+  struct gl_vertex_program *prog = CALLOC_STRUCT(gl_vertex_program);
+  return _mesa_init_gl_program(>Base, target, id);
+   }
 
case GL_FRAGMENT_PROGRAM_ARB:{
  struct i915_fragment_program *prog =
@@ -1325,7 +1326,7 @@ i915NewProgram(struct gl_context * ctx, GLenum target, 
GLuint id)
  if (prog) {
 i915_init_program(I915_CONTEXT(ctx), prog);
 
-return _mesa_init_gl_program(prog, target, id);
+return _mesa_init_gl_program(>FragProg.Base, target, id);
  }
  else
 return NULL;
diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
b/src/mesa/drivers/dri/i965/brw_program.c
index 164c3d7..b547d07 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -69,7 +69,7 @@ static struct gl_program *brwNewProgram( struct gl_context 
*ctx,
   if (prog) {
 prog->id = get_new_program_id(brw->intelScreen);
 
-return _mesa_init_gl_program(>program, target, id);
+return _mesa_init_gl_program(>program.Base, target, id);
   }
   else
 return NULL;
@@ -80,7 +80,7 @@ static struct gl_program *brwNewProgram( struct gl_context 
*ctx,
   if (prog) {
 prog->id = get_new_program_id(brw->intelScreen);
 
-return _mesa_init_gl_program(>program, target, id);
+return _mesa_init_gl_program(>program.Base, target, id);
   }
   else
 return NULL;
@@ -102,7 +102,7 @@ static struct gl_program *brwNewProgram( struct gl_context 
*ctx,
   if (prog) {
  prog->id = get_new_program_id(brw->intelScreen);
 
- return _mesa_init_gl_program(>program, target, id);
+ return _mesa_init_gl_program(>program.Base, target, id);
   } else {
  return NULL;
   }
diff --git a/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp 
b/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp
index 7eee426..5f80f90 100644
--- a/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp
@@ -66,7 +66,7 @@ void cmod_propagation_test::SetUp()
 
v = new cmod_propagation_fs_visitor(compiler, prog_data, shader);
 
-   _mesa_init_gl_program(>program, GL_FRAGMENT_SHADER, 0);
+   _mesa_init_gl_program(>program.Base, GL_FRAGMENT_SHADER, 0);
 
devinfo->gen = 4;
 }
diff --git a/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp 
b/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp
index fefde4b..32e8b8f 100644
--- a/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp
@@ -66,7 +66,7 @@ void saturate_propagation_test::SetUp()
 
v = new saturate_propagation_fs_visitor(compiler, prog_data, shader);
 
-   _mesa_init_gl_program(>program, GL_FRAGMENT_SHADER, 0);
+   _mesa_init_gl_program(>program.Base, GL_FRAGMENT_SHADER, 0);
 
devinfo->gen = 4;
 }
diff --git a/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp
index 4a87e6e..e80b71b 100644
--- a/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp
@@ -98,7 +98,7 @@ void copy_propagation_test::SetUp()
 
v = new copy_propagation_vec4_visitor(compiler, shader);
 
-   _mesa_init_gl_program(>program, GL_VERTEX_SHADER, 0);
+   _mesa_init_gl_program(>program.Base, GL_VERTEX_SHADER, 0);
 
devinfo->gen = 4;
 }
diff --git 

Re: [Mesa-dev] [PATCH] program: convert _mesa_init_gl_program() to take struct gl_program *

2015-10-13 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Tue, Oct 13, 2015 at 12:26 PM, Emil Velikov  wrote:
> Rather than accepting a void pointer, only to down and up cast around
> it, convert the function to take the base (struct gl_program) pointer.
>
> Cc: Marek Olšák 
> Signed-off-by: Emil Velikov 
> ---
>  src/mesa/drivers/dri/i915/i915_fragprog.c  |  9 ++--
>  src/mesa/drivers/dri/i965/brw_program.c|  6 +--
>  .../drivers/dri/i965/test_fs_cmod_propagation.cpp  |  2 +-
>  .../dri/i965/test_fs_saturate_propagation.cpp  |  2 +-
>  .../dri/i965/test_vec4_copy_propagation.cpp|  2 +-
>  .../dri/i965/test_vec4_register_coalesce.cpp   |  2 +-
>  src/mesa/drivers/dri/r200/r200_vertprog.c  | 17 +++
>  src/mesa/program/program.c | 55 
> ++
>  src/mesa/program/program.h |  2 +-
>  src/mesa/state_tracker/st_cb_program.c | 38 ---
>  10 files changed, 68 insertions(+), 67 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i915/i915_fragprog.c 
> b/src/mesa/drivers/dri/i915/i915_fragprog.c
> index 237d219..59d7959 100644
> --- a/src/mesa/drivers/dri/i915/i915_fragprog.c
> +++ b/src/mesa/drivers/dri/i915/i915_fragprog.c
> @@ -1315,9 +1315,10 @@ static struct gl_program *
>  i915NewProgram(struct gl_context * ctx, GLenum target, GLuint id)
>  {
> switch (target) {
> -   case GL_VERTEX_PROGRAM_ARB:
> -  return _mesa_init_gl_program(CALLOC_STRUCT(gl_vertex_program),
> -   target, id);
> +   case GL_VERTEX_PROGRAM_ARB: {
> +  struct gl_vertex_program *prog = CALLOC_STRUCT(gl_vertex_program);
> +  return _mesa_init_gl_program(>Base, target, id);
> +   }
>
> case GL_FRAGMENT_PROGRAM_ARB:{
>   struct i915_fragment_program *prog =
> @@ -1325,7 +1326,7 @@ i915NewProgram(struct gl_context * ctx, GLenum target, 
> GLuint id)
>   if (prog) {
>  i915_init_program(I915_CONTEXT(ctx), prog);
>
> -return _mesa_init_gl_program(prog, target, id);
> +return _mesa_init_gl_program(>FragProg.Base, target, id);
>   }
>   else
>  return NULL;
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
> b/src/mesa/drivers/dri/i965/brw_program.c
> index 164c3d7..b547d07 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -69,7 +69,7 @@ static struct gl_program *brwNewProgram( struct gl_context 
> *ctx,
>if (prog) {
>  prog->id = get_new_program_id(brw->intelScreen);
>
> -return _mesa_init_gl_program(>program, target, id);
> +return _mesa_init_gl_program(>program.Base, target, id);
>}
>else
>  return NULL;
> @@ -80,7 +80,7 @@ static struct gl_program *brwNewProgram( struct gl_context 
> *ctx,
>if (prog) {
>  prog->id = get_new_program_id(brw->intelScreen);
>
> -return _mesa_init_gl_program(>program, target, id);
> +return _mesa_init_gl_program(>program.Base, target, id);
>}
>else
>  return NULL;
> @@ -102,7 +102,7 @@ static struct gl_program *brwNewProgram( struct 
> gl_context *ctx,
>if (prog) {
>   prog->id = get_new_program_id(brw->intelScreen);
>
> - return _mesa_init_gl_program(>program, target, id);
> + return _mesa_init_gl_program(>program.Base, target, id);
>} else {
>   return NULL;
>}
> diff --git a/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp 
> b/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp
> index 7eee426..5f80f90 100644
> --- a/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp
> @@ -66,7 +66,7 @@ void cmod_propagation_test::SetUp()
>
> v = new cmod_propagation_fs_visitor(compiler, prog_data, shader);
>
> -   _mesa_init_gl_program(>program, GL_FRAGMENT_SHADER, 0);
> +   _mesa_init_gl_program(>program.Base, GL_FRAGMENT_SHADER, 0);
>
> devinfo->gen = 4;
>  }
> diff --git a/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp 
> b/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp
> index fefde4b..32e8b8f 100644
> --- a/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp
> @@ -66,7 +66,7 @@ void saturate_propagation_test::SetUp()
>
> v = new saturate_propagation_fs_visitor(compiler, prog_data, shader);
>
> -   _mesa_init_gl_program(>program, GL_FRAGMENT_SHADER, 0);
> +   _mesa_init_gl_program(>program.Base, GL_FRAGMENT_SHADER, 0);
>
> devinfo->gen = 4;
>  }
> diff --git a/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp 
> b/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp
> index 4a87e6e..e80b71b 100644
> --- a/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp
> 

Re: [Mesa-dev] [PATCH 1/4] tgsi: add option to dump floats as hex values

2015-10-13 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Tue, Oct 13, 2015 at 6:40 AM, Dave Airlie  wrote:
> This adds support to the parser to accept hex values as floats,
> and then adds support to the dumper to allow the user to select
> to dump float as 32-bit hex numbers.
>
> This is required to get accurate values for virgl use of TGSI.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/gallium/auxiliary/tgsi/tgsi_dump.c | 19 ++-
>  src/gallium/auxiliary/tgsi/tgsi_dump.h |  2 ++
>  src/gallium/auxiliary/tgsi/tgsi_text.c | 11 ++-
>  3 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c 
> b/src/gallium/auxiliary/tgsi/tgsi_dump.c
> index 8ceb5b4..33f6a56 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
> @@ -29,6 +29,7 @@
>  #include "util/u_string.h"
>  #include "util/u_math.h"
>  #include "util/u_memory.h"
> +#include "util/u_math.h"
>  #include "tgsi_dump.h"
>  #include "tgsi_info.h"
>  #include "tgsi_iterate.h"
> @@ -43,6 +44,8 @@ struct dump_ctx
>  {
> struct tgsi_iterate_context iter;
>
> +   boolean dump_float_as_hex;
> +
> uint instno;
> uint immno;
> int indent;
> @@ -88,6 +91,7 @@ dump_enum(
>  #define SID(I)  ctx->dump_printf( ctx, "%d", I )
>  #define FLT(F)  ctx->dump_printf( ctx, "%10.4f", F )
>  #define DBL(D)  ctx->dump_printf( ctx, "%10.8f", D )
> +#define HFLT(F) ctx->dump_printf( ctx, "0x%08x", fui((F)) )
>  #define ENM(E,ENUMS)dump_enum( ctx, E, ENUMS, sizeof( ENUMS ) / sizeof( 
> *ENUMS ) )
>
>  const char *
> @@ -251,7 +255,10 @@ dump_imm_data(struct tgsi_iterate_context *iter,
>   break;
>}
>case TGSI_IMM_FLOAT32:
> - FLT( data[i].Float );
> + if (ctx->dump_float_as_hex)
> +HFLT( data[i].Float );
> + else
> +FLT( data[i].Float );
>   break;
>case TGSI_IMM_UINT32:
>   UID(data[i].Uint);
> @@ -681,6 +688,11 @@ tgsi_dump_to_file(const struct tgsi_token *tokens, uint 
> flags, FILE *file)
> ctx.indentation = 0;
> ctx.file = file;
>
> +   if (flags & TGSI_DUMP_FLOAT_AS_HEX)
> +  ctx.dump_float_as_hex = TRUE;
> +   else
> +  ctx.dump_float_as_hex = FALSE;
> +
> tgsi_iterate_shader( tokens,  );
>  }
>
> @@ -749,6 +761,11 @@ tgsi_dump_str(
> ctx.ptr = str;
> ctx.left = (int)size;
>
> +   if (flags & TGSI_DUMP_FLOAT_AS_HEX)
> +  ctx.base.dump_float_as_hex = TRUE;
> +   else
> +  ctx.base.dump_float_as_hex = FALSE;
> +
> tgsi_iterate_shader( tokens,  );
>  }
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.h 
> b/src/gallium/auxiliary/tgsi/tgsi_dump.h
> index 7c8f92e..b98 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_dump.h
> +++ b/src/gallium/auxiliary/tgsi/tgsi_dump.h
> @@ -38,6 +38,8 @@
>  extern "C" {
>  #endif
>
> +#define TGSI_DUMP_FLOAT_AS_HEX (1 << 0)
> +
>  void
>  tgsi_dump_str(
> const struct tgsi_token *tokens,
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c 
> b/src/gallium/auxiliary/tgsi/tgsi_text.c
> index 3e3ed5b..4a82c9b 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_text.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
> @@ -195,8 +195,15 @@ static boolean parse_float( const char **pcur, float 
> *val )
> boolean integral_part = FALSE;
> boolean fractional_part = FALSE;
>
> -   *val = (float) atof( cur );
> +   if (*cur == '0' && *(cur + 1) == 'x') {
> +  union fi fi;
> +  fi.ui = strtoul(cur, NULL, 16);
> +  *val = fi.f;
> +  cur += 10;
> +  goto out;
> +   }
>
> +   *val = (float) atof( cur );
> if (*cur == '-' || *cur == '+')
>cur++;
> if (is_digit( cur )) {
> @@ -228,6 +235,8 @@ static boolean parse_float( const char **pcur, float *val 
> )
>else
>   return FALSE;
> }
> +
> +out:
> *pcur = cur;
> return TRUE;
>  }
> --
> 2.4.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 07/10] radeonsi: unify shader delete functions

2015-10-13 Thread Michel Dänzer
On 11.10.2015 10:11, Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 84 
> +
>  1 file changed, 17 insertions(+), 67 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
> b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index 9d05cb5..cc053bb 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -907,11 +907,21 @@ static void si_bind_ps_shader(struct pipe_context *ctx, 
> void *state)
>   si_mark_atom_dirty(sctx, >cb_target_mask);
>  }
>  
> -static void si_delete_shader_selector(struct pipe_context *ctx,
> -   struct si_shader_selector *sel)
> +static void si_delete_shader_selector(struct pipe_context *ctx, void *state)
>  {
>   struct si_context *sctx = (struct si_context *)ctx;
> + struct si_shader_selector *sel = (struct si_shader_selector *)state;
>   struct si_shader *p = sel->current, *c;
> + struct si_shader_selector **current_shader[SI_NUM_SHADERS] = {
> + [PIPE_SHADER_VERTEX] = >vs_shader,
> + [PIPE_SHADER_TESS_CTRL] = >tcs_shader,
> + [PIPE_SHADER_TESS_EVAL] = >tes_shader,
> + [PIPE_SHADER_GEOMETRY] = >gs_shader,
> + [PIPE_SHADER_FRAGMENT] = >ps_shader,
> + };

A switch (sel->type) statement might allow the compiler to generate more
efficient code than this. Either way though, this patch is

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 02/10] radeonsi: cleanup si_llvm_init_export_args

2015-10-13 Thread Michel Dänzer
On 11.10.2015 10:11, Marek Olšák wrote:
> From: Marek Olšák 

The shortlog should say "clean up" (verb) instead of "cleanup" (noun).
Same for patches 9 & 10.


> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
> b/src/gallium/drivers/radeonsi/si_shader.c
> index 32a702f..109a805 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -1306,6 +1306,22 @@ static void si_llvm_init_export_args(struct 
> lp_build_tgsi_context *bld_base,
>   unsigned compressed = 0;
>   unsigned chan;
>  
> + /* XXX: This controls which components of the output
> +  * registers actually get exported. (e.g bit 0 means export
> +  * X component, bit 1 means export Y component, etc.)  I'm
> +  * hard coding this to 0xf for now.  In the future, we might
> +  * want to do something else. */

The "*/" should go on its own line.


With those fixed, this patch and patches 9 & 10 are

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Plan to work on opt_peephole_sel optimization for vec4

2015-10-13 Thread Antía Puentes
On lun, 2015-10-12 at 15:55 +0300, Francisco Jerez wrote:
> Antía Puentes  writes:
> > On dom, 2015-10-11 at 10:35 -0700, Matt Turner wrote:
> >> I would expect big improvements in the vec4 backend from making its
> >> copy propagation pass global.
> >
> > I will be working on global copy propagation then.
> >
> 
> Heh...  I suggest you hold yourself from working on this for a while.
> 
> While working on some fragment discard improvements last week I noticed
> that a seemingly harmless change in the CFG could cause shitloads of
> shader-db regressions (>8k).  The reason is that a bunch of optimization
> passes of the compiler back-end are fully local (including VEC4 copy
> propagation) and become ineffective anytime your dataflow happens to
> traverse edges of the CFG.
> 
> Instead of reimplementing the intricate dataflow propagation logic (e.g.
> what fs_copy_prop_dataflow does) in every pass (or rather every pass
> that needs to consider two or more instructions at once), I've given a
> shot at implementing use-def chains in the back-end (didn't have a
> bigger hammer at hand).  Use-def chains should simplify this sort of
> optimization passes massively and allow them to work globally without
> any additional effort: Instead of looping back the instruction list to
> find out whether the last instruction that wrote a register had some
> specific form (c.f. cmod_propagation, saturate_propagation,
> compute_to_mrf, which are all local and have O(n^2) complexity), you
> just look-up the use entry for the instruction you're interested in and
> look at its immediate neighbours in the use-def graph, which represent
> the instruction(s) that generated the variable.
> 
> Use-def chains can be implemented trivially on an SSA IR so as an added
> benefit it should become easier to port optimization passes to SSA once
> they've been rewritten in terms of use-def chains.
> 
> I've got an algorithm to construct the use-def graph basically working
> on the FS back-end (I'm not sharing it publicly yet though because in
> its current form it would likely offend some people's sensibilities -- I
> could send you a branch in private though if you're interested).  I
> haven't started porting other optimization passes to use it yet, but
> it's next on my to-do list.

Hi Francisco,

that sounds great. Do you intend to do the same for vec4? Otherwise, I
can try to port your work to vec4 once you are done with FS.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH v2] configure.ac: ensure RM is set

2015-10-13 Thread Jonathan Gray
On Tue, Oct 13, 2015 at 11:25:08AM +0100, Emil Velikov wrote:
> On 10 October 2015 at 07:42, Jonathan Gray  wrote:
> > GNU make predefines RM to rm -f but this is not required by POSIX
> > so ensure that RM is set.  This fixes "make clean" on OpenBSD.
> >
> > v2: use AC_CHECK_PROG
> >
> I'm ambivalent if we go the AC_CHECK_PROG vs ?= route.
> 
> I'll let others pick their favourite, but both are
> Reviewed-by: Emil Velikov 
> 
> -Emil

The ?= diff was a screwup as that syntax is valid in make but not shell.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block

2015-10-13 Thread Iago Toral
On Tue, 2015-10-13 at 15:17 +0300, Francisco Jerez wrote:
> Iago Toral Quiroga  writes:
> 
> > This fixes the following test:
> >
> > [require]
> > GL >= 3.3
> > GLSL >= 3.30
> > GL_ARB_shader_storage_buffer_object
> >
> > [fragment shader]
> > #version 330
> > #extension GL_ARB_shader_storage_buffer_object: require
> >
> > buffer SSBO {
> > mat4 sm4;
> > };
> >
> >
> > mat4 um4;
> >
> > void main() {
> > sm4 *= um4;
> 
> This is using the value of "um4", which is never assigned, so liveness
> analysis will correctly extend its live interval up to the start of the
> block.
> 
> The other problem here seems to be that the liveness analysis pass
> doesn't consider scalar writes (like the ones emitted by the
> combine_constants optimization pass and by emit_uniformize()) to fully
> define the contents of a register, so it will incorrectly extend up the
> live interval of registers defined using scalar writes even if all
> components ever used in the shader have been defined individually.
> Incidentally the use-def-chains-based implementation of liveness
> analysis I'm working on will fix this issue easily.

Great, thanks Curro! Once you make your change public I can verify that
they fix this too.

BTW, even if your changes fix this I wonder if my patch is still valid:
control-flow analysis should not add anything relevant to liveness
analysis if we only have one block, right?

Iago

> > sm4[0][0] = 0.0;
> > }
> >
> > [test]
> > link success
> >
> > where our liveness analysis works really bad because the control-flow part
> > will expand register liveness to the end of only block we have (so every
> > register will be marked alive until the end of the program). This 
> > artificially
> > increases register pressure to a point in which we run out of registers.
> > Of course, this is only a simple optimization for a trivial case, the
> > underlying problem still exists and would manifest when we have more than
> > one block, but it helps simple shaders such as the one in the example above
> > without any effort. I guess the real fix would involve re-thinking parts of 
> > the
> > liveness analysis algorithm...
> >
> > Jordan, there is another thing that we can improve for this shader that is
> > specific to SSBOs: we emit the same ssbo load multiple times because we are
> > playing it safe just in case there are writes in between. I think we can 
> > try to
> > do better and not re-issue the same load if we don't have ssbo stores to
> > the same address in between. I'll try to come up with a patch for this
> > (hopefully we can do this inside lower_ubo_reference).
> >
> > The actual fix goes into patches 2 (FS) and 4 (VS). Patches 1,3 are just
> > indentation fixes in the code around these.
> >
> > Iago Toral Quiroga (4):
> >   i965/fs: Fix indentation in fs_live_variables::compute_start_end
> >   i965/fs: skip control-flow aware liveness analysis if we only have 1
> > block
> >   i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals
> >   i965/vec4: skip control-flow aware liveness analysis if we only have 1
> > block
> >
> >  .../drivers/dri/i965/brw_fs_live_variables.cpp | 24 
> > ++
> >  .../drivers/dri/i965/brw_vec4_live_variables.cpp   | 23 
> > +
> >  2 files changed, 30 insertions(+), 17 deletions(-)
> >
> > -- 
> > 1.9.1
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block

2015-10-13 Thread Francisco Jerez
Iago Toral  writes:

> On Tue, 2015-10-13 at 15:17 +0300, Francisco Jerez wrote:
>> Iago Toral Quiroga  writes:
>> 
>> > This fixes the following test:
>> >
>> > [require]
>> > GL >= 3.3
>> > GLSL >= 3.30
>> > GL_ARB_shader_storage_buffer_object
>> >
>> > [fragment shader]
>> > #version 330
>> > #extension GL_ARB_shader_storage_buffer_object: require
>> >
>> > buffer SSBO {
>> > mat4 sm4;
>> > };
>> >
>> >
>> > mat4 um4;
>> >
>> > void main() {
>> > sm4 *= um4;
>> 
>> This is using the value of "um4", which is never assigned, so liveness
>> analysis will correctly extend its live interval up to the start of the
>> block.
>> 
>> The other problem here seems to be that the liveness analysis pass
>> doesn't consider scalar writes (like the ones emitted by the
>> combine_constants optimization pass and by emit_uniformize()) to fully
>> define the contents of a register, so it will incorrectly extend up the
>> live interval of registers defined using scalar writes even if all
>> components ever used in the shader have been defined individually.
>> Incidentally the use-def-chains-based implementation of liveness
>> analysis I'm working on will fix this issue easily.
>
> Great, thanks Curro! Once you make your change public I can verify that
> they fix this too.
>
> BTW, even if your changes fix this I wonder if my patch is still valid:
> control-flow analysis should not add anything relevant to liveness
> analysis if we only have one block, right?
>

Yeah, that should be okay.  I'm not strongly opposed to this change as
temporary hack-around for the meantime, but the comments seem somewhat
misleading.

Either way patches 1 and 3 (i.e. the indentation fixes) are:

Reviewed-by: Francisco Jerez 

> Iago
>
>> > sm4[0][0] = 0.0;
>> > }
>> >
>> > [test]
>> > link success
>> >
>> > where our liveness analysis works really bad because the control-flow part
>> > will expand register liveness to the end of only block we have (so every
>> > register will be marked alive until the end of the program). This 
>> > artificially
>> > increases register pressure to a point in which we run out of registers.
>> > Of course, this is only a simple optimization for a trivial case, the
>> > underlying problem still exists and would manifest when we have more than
>> > one block, but it helps simple shaders such as the one in the example above
>> > without any effort. I guess the real fix would involve re-thinking parts 
>> > of the
>> > liveness analysis algorithm...
>> >
>> > Jordan, there is another thing that we can improve for this shader that is
>> > specific to SSBOs: we emit the same ssbo load multiple times because we are
>> > playing it safe just in case there are writes in between. I think we can 
>> > try to
>> > do better and not re-issue the same load if we don't have ssbo stores to
>> > the same address in between. I'll try to come up with a patch for this
>> > (hopefully we can do this inside lower_ubo_reference).
>> >
>> > The actual fix goes into patches 2 (FS) and 4 (VS). Patches 1,3 are just
>> > indentation fixes in the code around these.
>> >
>> > Iago Toral Quiroga (4):
>> >   i965/fs: Fix indentation in fs_live_variables::compute_start_end
>> >   i965/fs: skip control-flow aware liveness analysis if we only have 1
>> > block
>> >   i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals
>> >   i965/vec4: skip control-flow aware liveness analysis if we only have 1
>> > block
>> >
>> >  .../drivers/dri/i965/brw_fs_live_variables.cpp | 24 
>> > ++
>> >  .../drivers/dri/i965/brw_vec4_live_variables.cpp   | 23 
>> > +
>> >  2 files changed, 30 insertions(+), 17 deletions(-)
>> >
>> > -- 
>> > 1.9.1
>> >
>> > ___
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Plan to work on opt_peephole_sel optimization for vec4

2015-10-13 Thread Francisco Jerez
Antía Puentes  writes:

> On lun, 2015-10-12 at 15:55 +0300, Francisco Jerez wrote:
>> Antía Puentes  writes:
>> > On dom, 2015-10-11 at 10:35 -0700, Matt Turner wrote:
>> >> I would expect big improvements in the vec4 backend from making its
>> >> copy propagation pass global.
>> >
>> > I will be working on global copy propagation then.
>> >
>> 
>> Heh...  I suggest you hold yourself from working on this for a while.
>> 
>> While working on some fragment discard improvements last week I noticed
>> that a seemingly harmless change in the CFG could cause shitloads of
>> shader-db regressions (>8k).  The reason is that a bunch of optimization
>> passes of the compiler back-end are fully local (including VEC4 copy
>> propagation) and become ineffective anytime your dataflow happens to
>> traverse edges of the CFG.
>> 
>> Instead of reimplementing the intricate dataflow propagation logic (e.g.
>> what fs_copy_prop_dataflow does) in every pass (or rather every pass
>> that needs to consider two or more instructions at once), I've given a
>> shot at implementing use-def chains in the back-end (didn't have a
>> bigger hammer at hand).  Use-def chains should simplify this sort of
>> optimization passes massively and allow them to work globally without
>> any additional effort: Instead of looping back the instruction list to
>> find out whether the last instruction that wrote a register had some
>> specific form (c.f. cmod_propagation, saturate_propagation,
>> compute_to_mrf, which are all local and have O(n^2) complexity), you
>> just look-up the use entry for the instruction you're interested in and
>> look at its immediate neighbours in the use-def graph, which represent
>> the instruction(s) that generated the variable.
>> 
>> Use-def chains can be implemented trivially on an SSA IR so as an added
>> benefit it should become easier to port optimization passes to SSA once
>> they've been rewritten in terms of use-def chains.
>> 
>> I've got an algorithm to construct the use-def graph basically working
>> on the FS back-end (I'm not sharing it publicly yet though because in
>> its current form it would likely offend some people's sensibilities -- I
>> could send you a branch in private though if you're interested).  I
>> haven't started porting other optimization passes to use it yet, but
>> it's next on my to-do list.
>
> Hi Francisco,
>
> that sounds great. Do you intend to do the same for vec4? Otherwise, I
> can try to port your work to vec4 once you are done with FS.

Pretty much the whole use-def analysis logic is independent from the
back-end, so it should be straightforward to port to VEC4.  However I
might need some help at some point rewriting other optimization passes
in terms of use-def chains instead of their current hand-rolled dataflow
analysis -- I'll get back to you once I have something solid for you to
use as starting point. ;)


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] glsl: calculate TOP_LEVEL_ARRAY_SIZE and STRIDE when adding resources

2015-10-13 Thread Samuel Iglesias Gonsálvez
Series is:

Reviewed-by: Samuel Iglesias Gonsálvez 

Thanks!

Sam

On 13/10/15 13:40, Tapani Pälli wrote:
> Patch moves existing calculation code from shader_query.cpp to happen
> during program resource list creation.
> 
> No Piglit or CTS regressions were observed during testing.
> 
> Signed-off-by: Tapani Pälli 
> ---
>  src/glsl/linker.cpp| 241 
>  src/mesa/main/shader_query.cpp | 244 
> +
>  2 files changed, 243 insertions(+), 242 deletions(-)
> 
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index a97b4ef..3422fba 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -3389,6 +3389,242 @@ add_packed_varyings(struct gl_shader_program *shProg, 
> int stage)
> return true;
>  }
>  
> +static char*
> +get_top_level_name(const char *name)
> +{
> +   const char *first_dot = strchr(name, '.');
> +   const char *first_square_bracket = strchr(name, '[');
> +   int name_size = 0;
> +   /* From ARB_program_interface_query spec:
> +*
> +* "For the property TOP_LEVEL_ARRAY_SIZE, a single integer identifying 
> the
> +*  number of active array elements of the top-level shader storage block
> +*  member containing to the active variable is written to .  If 
> the
> +*  top-level block member is not declared as an array, the value one is
> +*  written to .  If the top-level block member is an array with 
> no
> +*  declared size, the value zero is written to .
> +*/
> +
> +   /* The buffer variable is on top level.*/
> +   if (!first_square_bracket && !first_dot)
> +  name_size = strlen(name);
> +   else if ((!first_square_bracket ||
> +(first_dot && first_dot < first_square_bracket)))
> +  name_size = first_dot - name;
> +   else
> +  name_size = first_square_bracket - name;
> +
> +   return strndup(name, name_size);
> +}
> +
> +static char*
> +get_var_name(const char *name)
> +{
> +   const char *first_dot = strchr(name, '.');
> +
> +   if (!first_dot)
> +  return strdup(name);
> +
> +   return strndup(first_dot+1, strlen(first_dot) - 1);
> +}
> +
> +static bool
> +is_top_level_shader_storage_block_member(const char* name,
> + const char* interface_name,
> + const char* field_name)
> +{
> +   bool result = false;
> +
> +   /* If the given variable is already a top-level shader storage
> +* block member, then return array_size = 1.
> +* We could have two possibilities: if we have an instanced
> +* shader storage block or not instanced.
> +*
> +* For the first, we check create a name as it was in top level and
> +* compare it with the real name. If they are the same, then
> +* the variable is already at top-level.
> +*
> +* Full instanced name is: interface name + '.' + var name +
> +*NULL character
> +*/
> +   int name_length = strlen(interface_name) + 1 + strlen(field_name) + 1;
> +   char *full_instanced_name = (char *) calloc(name_length, sizeof(char));
> +   if (!full_instanced_name) {
> +  fprintf(stderr, "%s: Cannot allocate space for name\n", __func__);
> +  return false;
> +   }
> +
> +   snprintf(full_instanced_name, name_length, "%s.%s",
> +interface_name, field_name);
> +
> +   /* Check if its top-level shader storage block member of an
> +* instanced interface block, or of a unnamed interface block.
> +*/
> +   if (strcmp(name, full_instanced_name) == 0 ||
> +   strcmp(name, field_name) == 0)
> +  result = true;
> +
> +   free(full_instanced_name);
> +   return result;
> +}
> +
> +static void
> +calculate_array_size(struct gl_shader_program *shProg,
> + struct gl_uniform_storage *uni)
> +{
> +   int block_index = uni->block_index;
> +   int array_size = -1;
> +   char *var_name = get_top_level_name(uni->name);
> +   char *interface_name =
> +  get_top_level_name(shProg->UniformBlocks[block_index].Name);
> +
> +   if (strcmp(var_name, interface_name) == 0) {
> +  /* Deal with instanced array of SSBOs */
> +  char *temp_name = get_var_name(uni->name);
> +  free(var_name);
> +  var_name = get_top_level_name(temp_name);
> +  free(temp_name);
> +   }
> +
> +   for (unsigned i = 0; i < shProg->NumShaders; i++) {
> +  if (shProg->Shaders[i] == NULL)
> + continue;
> +
> +  const gl_shader *stage = shProg->Shaders[i];
> +  foreach_in_list(ir_instruction, node, stage->ir) {
> + ir_variable *var = node->as_variable();
> + if (!var || !var->get_interface_type() ||
> + var->data.mode != ir_var_shader_storage)
> +continue;
> +
> + const glsl_type *interface = var->get_interface_type();
> +
> + if (strcmp(interface_name, interface->name) != 0)
> +continue;
> +
> + for (unsigned i = 0; i < 

Re: [Mesa-dev] [PATCH 01/11] vbo: get rid of needless NR_MAT_ATTRIBS constant

2015-10-13 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek

On Tue, Oct 13, 2015 at 4:45 AM, Brian Paul  wrote:
> ---
>  src/mesa/vbo/vbo_context.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c
> index 802955d..b5dc45c 100644
> --- a/src/mesa/vbo/vbo_context.c
> +++ b/src/mesa/vbo/vbo_context.c
> @@ -33,7 +33,6 @@
>  #include "vbo.h"
>  #include "vbo_context.h"
>
> -#define NR_MAT_ATTRIBS 12
>
>  static GLuint check_size( const GLfloat *attr )
>  {
> @@ -108,14 +107,12 @@ static void init_mat_currval(struct gl_context *ctx)
>>currval[VBO_ATTRIB_MAT_FRONT_AMBIENT];
> GLuint i;
>
> -   assert(NR_MAT_ATTRIBS == MAT_ATTRIB_MAX);
> -
> -   memset(arrays, 0, sizeof(*arrays) * NR_MAT_ATTRIBS);
> +   memset(arrays, 0, sizeof(*arrays) * MAT_ATTRIB_MAX);
>
> /* Set up a constant (StrideB == 0) array for each current
>  * attribute:
>  */
> -   for (i = 0; i < NR_MAT_ATTRIBS; i++) {
> +   for (i = 0; i < MAT_ATTRIB_MAX; i++) {
>struct gl_client_array *cl = [i];
>
>/* Size is fixed for the material attributes, for others will
> @@ -175,7 +172,7 @@ GLboolean _vbo_CreateContext( struct gl_context *ctx )
>for (i = 0; i < ARRAY_SIZE(vbo->map_vp_none); i++)
>  vbo->map_vp_none[i] = i;
>/* map material attribs to generic slots */
> -  for (i = 0; i < NR_MAT_ATTRIBS; i++)
> +  for (i = 0; i < MAT_ATTRIB_MAX; i++)
>  vbo->map_vp_none[VERT_ATTRIB_GENERIC(i)]
>  = VBO_ATTRIB_MAT_FRONT_AMBIENT + i;
>
> --
> 1.9.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] glsl: calculate TOP_LEVEL_ARRAY_SIZE and STRIDE when adding resources

2015-10-13 Thread Tapani Pälli
Patch moves existing calculation code from shader_query.cpp to happen
during program resource list creation.

No Piglit or CTS regressions were observed during testing.

Signed-off-by: Tapani Pälli 
---
 src/glsl/linker.cpp| 241 
 src/mesa/main/shader_query.cpp | 244 +
 2 files changed, 243 insertions(+), 242 deletions(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index a97b4ef..3422fba 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -3389,6 +3389,242 @@ add_packed_varyings(struct gl_shader_program *shProg, 
int stage)
return true;
 }
 
+static char*
+get_top_level_name(const char *name)
+{
+   const char *first_dot = strchr(name, '.');
+   const char *first_square_bracket = strchr(name, '[');
+   int name_size = 0;
+   /* From ARB_program_interface_query spec:
+*
+* "For the property TOP_LEVEL_ARRAY_SIZE, a single integer identifying the
+*  number of active array elements of the top-level shader storage block
+*  member containing to the active variable is written to .  If the
+*  top-level block member is not declared as an array, the value one is
+*  written to .  If the top-level block member is an array with no
+*  declared size, the value zero is written to .
+*/
+
+   /* The buffer variable is on top level.*/
+   if (!first_square_bracket && !first_dot)
+  name_size = strlen(name);
+   else if ((!first_square_bracket ||
+(first_dot && first_dot < first_square_bracket)))
+  name_size = first_dot - name;
+   else
+  name_size = first_square_bracket - name;
+
+   return strndup(name, name_size);
+}
+
+static char*
+get_var_name(const char *name)
+{
+   const char *first_dot = strchr(name, '.');
+
+   if (!first_dot)
+  return strdup(name);
+
+   return strndup(first_dot+1, strlen(first_dot) - 1);
+}
+
+static bool
+is_top_level_shader_storage_block_member(const char* name,
+ const char* interface_name,
+ const char* field_name)
+{
+   bool result = false;
+
+   /* If the given variable is already a top-level shader storage
+* block member, then return array_size = 1.
+* We could have two possibilities: if we have an instanced
+* shader storage block or not instanced.
+*
+* For the first, we check create a name as it was in top level and
+* compare it with the real name. If they are the same, then
+* the variable is already at top-level.
+*
+* Full instanced name is: interface name + '.' + var name +
+*NULL character
+*/
+   int name_length = strlen(interface_name) + 1 + strlen(field_name) + 1;
+   char *full_instanced_name = (char *) calloc(name_length, sizeof(char));
+   if (!full_instanced_name) {
+  fprintf(stderr, "%s: Cannot allocate space for name\n", __func__);
+  return false;
+   }
+
+   snprintf(full_instanced_name, name_length, "%s.%s",
+interface_name, field_name);
+
+   /* Check if its top-level shader storage block member of an
+* instanced interface block, or of a unnamed interface block.
+*/
+   if (strcmp(name, full_instanced_name) == 0 ||
+   strcmp(name, field_name) == 0)
+  result = true;
+
+   free(full_instanced_name);
+   return result;
+}
+
+static void
+calculate_array_size(struct gl_shader_program *shProg,
+ struct gl_uniform_storage *uni)
+{
+   int block_index = uni->block_index;
+   int array_size = -1;
+   char *var_name = get_top_level_name(uni->name);
+   char *interface_name =
+  get_top_level_name(shProg->UniformBlocks[block_index].Name);
+
+   if (strcmp(var_name, interface_name) == 0) {
+  /* Deal with instanced array of SSBOs */
+  char *temp_name = get_var_name(uni->name);
+  free(var_name);
+  var_name = get_top_level_name(temp_name);
+  free(temp_name);
+   }
+
+   for (unsigned i = 0; i < shProg->NumShaders; i++) {
+  if (shProg->Shaders[i] == NULL)
+ continue;
+
+  const gl_shader *stage = shProg->Shaders[i];
+  foreach_in_list(ir_instruction, node, stage->ir) {
+ ir_variable *var = node->as_variable();
+ if (!var || !var->get_interface_type() ||
+ var->data.mode != ir_var_shader_storage)
+continue;
+
+ const glsl_type *interface = var->get_interface_type();
+
+ if (strcmp(interface_name, interface->name) != 0)
+continue;
+
+ for (unsigned i = 0; i < interface->length; i++) {
+const glsl_struct_field *field = >fields.structure[i];
+if (strcmp(field->name, var_name) != 0)
+   continue;
+/* From GL_ARB_program_interface_query spec:
+ *
+ * "For the property TOP_LEVEL_ARRAY_SIZE, a single integer
+ * identifying the number of active array elements of the top-level
+ 

[Mesa-dev] [PATCH 1/2] glsl: add top level array size and stride to gl_uniform_storage

2015-10-13 Thread Tapani Pälli
Patch adds 2 new fields to gl_uniform_storage so that we don't need to
calculate these values during runtime shader queries. This is required by
upcoming changes to free GLSL IR after linking.

Patch moves 3 booleans inside structure so that structure size stays the
same after this change.

Signed-off-by: Tapani Pälli 
---
 src/glsl/ir_uniform.h | 32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/src/glsl/ir_uniform.h b/src/glsl/ir_uniform.h
index 50fe76b..1854279 100644
--- a/src/glsl/ir_uniform.h
+++ b/src/glsl/ir_uniform.h
@@ -162,6 +162,22 @@ struct gl_uniform_storage {
/** @} */
 
/**
+* This is a compiler-generated uniform that should not be advertised
+* via the API.
+*/
+   bool hidden;
+
+   /**
+* This is a built-in uniform that should not be modified through any gl 
API.
+*/
+   bool builtin;
+
+   /**
+* This is a shader storage buffer variable, not an uniform.
+*/
+   bool is_shader_storage;
+
+   /**
 * Index within gl_shader_program::AtomicBuffers[] of the atomic
 * counter buffer this uniform is stored in, or -1 if this is not
 * an atomic counter.
@@ -181,20 +197,16 @@ struct gl_uniform_storage {
unsigned num_compatible_subroutines;
 
/**
-* This is a compiler-generated uniform that should not be advertised
-* via the API.
+* A single integer identifying the number of active array elements of
+* the top-level shader storage block member (GL_TOP_LEVEL_ARRAY_SIZE).
 */
-   bool hidden;
+   unsigned top_level_array_size;
 
/**
-* This is a built-in uniform that should not be modified through any gl 
API.
+* A single integer identifying the stride between array elements of the
+* top-level shader storage block member. (GL_TOP_LEVEL_ARRAY_STRIDE).
 */
-   bool builtin;
-
-   /**
-* This is a shader storage buffer variable, not an uniform.
-*/
-   bool is_shader_storage;
+   unsigned top_level_array_stride;
 };
 
 #ifdef __cplusplus
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block

2015-10-13 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> This fixes the following test:
>
> [require]
> GL >= 3.3
> GLSL >= 3.30
> GL_ARB_shader_storage_buffer_object
>
> [fragment shader]
> #version 330
> #extension GL_ARB_shader_storage_buffer_object: require
>
> buffer SSBO {
> mat4 sm4;
> };
>
>
> mat4 um4;
>
> void main() {
> sm4 *= um4;

This is using the value of "um4", which is never assigned, so liveness
analysis will correctly extend its live interval up to the start of the
block.

The other problem here seems to be that the liveness analysis pass
doesn't consider scalar writes (like the ones emitted by the
combine_constants optimization pass and by emit_uniformize()) to fully
define the contents of a register, so it will incorrectly extend up the
live interval of registers defined using scalar writes even if all
components ever used in the shader have been defined individually.
Incidentally the use-def-chains-based implementation of liveness
analysis I'm working on will fix this issue easily.

> sm4[0][0] = 0.0;
> }
>
> [test]
> link success
>
> where our liveness analysis works really bad because the control-flow part
> will expand register liveness to the end of only block we have (so every
> register will be marked alive until the end of the program). This artificially
> increases register pressure to a point in which we run out of registers.
> Of course, this is only a simple optimization for a trivial case, the
> underlying problem still exists and would manifest when we have more than
> one block, but it helps simple shaders such as the one in the example above
> without any effort. I guess the real fix would involve re-thinking parts of 
> the
> liveness analysis algorithm...
>
> Jordan, there is another thing that we can improve for this shader that is
> specific to SSBOs: we emit the same ssbo load multiple times because we are
> playing it safe just in case there are writes in between. I think we can try 
> to
> do better and not re-issue the same load if we don't have ssbo stores to
> the same address in between. I'll try to come up with a patch for this
> (hopefully we can do this inside lower_ubo_reference).
>
> The actual fix goes into patches 2 (FS) and 4 (VS). Patches 1,3 are just
> indentation fixes in the code around these.
>
> Iago Toral Quiroga (4):
>   i965/fs: Fix indentation in fs_live_variables::compute_start_end
>   i965/fs: skip control-flow aware liveness analysis if we only have 1
> block
>   i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals
>   i965/vec4: skip control-flow aware liveness analysis if we only have 1
> block
>
>  .../drivers/dri/i965/brw_fs_live_variables.cpp | 24 
> ++
>  .../drivers/dri/i965/brw_vec4_live_variables.cpp   | 23 +
>  2 files changed, 30 insertions(+), 17 deletions(-)
>
> -- 
> 1.9.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 2/2] gallium: add tegra support

2015-10-13 Thread Alexandre Courbot
On Mon, Oct 12, 2015 at 12:09 AM, Christian Gmeiner
 wrote:
> This commit adds tegra support, which uses the renderonly driver
> library.
>
> Signed-off-by: Christian Gmeiner 
> ---
>  configure.ac   | 19 +++-
>  src/gallium/Makefile.am|  6 +++
>  .../auxiliary/target-helpers/inline_drm_helper.h   | 29 
>  src/gallium/drivers/tegra/Automake.inc | 10 +
>  src/gallium/drivers/tegra/Makefile.am  |  9 
>  src/gallium/targets/dri/Makefile.am|  2 +
>  src/gallium/winsys/tegra/drm/Android.mk| 34 +++
>  src/gallium/winsys/tegra/drm/Makefile.am   | 33 ++
>  src/gallium/winsys/tegra/drm/Makefile.sources  |  3 ++
>  src/gallium/winsys/tegra/drm/tegra_drm_public.h| 31 +
>  src/gallium/winsys/tegra/drm/tegra_drm_winsys.c| 51 
> ++
>  11 files changed, 226 insertions(+), 1 deletion(-)
>  create mode 100644 src/gallium/drivers/tegra/Automake.inc
>  create mode 100644 src/gallium/drivers/tegra/Makefile.am
>  create mode 100644 src/gallium/winsys/tegra/drm/Android.mk
>  create mode 100644 src/gallium/winsys/tegra/drm/Makefile.am
>  create mode 100644 src/gallium/winsys/tegra/drm/Makefile.sources
>  create mode 100644 src/gallium/winsys/tegra/drm/tegra_drm_public.h
>  create mode 100644 src/gallium/winsys/tegra/drm/tegra_drm_winsys.c
>
> diff --git a/configure.ac b/configure.ac
> index ea485b1..9fb8244 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -75,6 +75,7 @@ LIBDRM_INTEL_REQUIRED=2.4.61
>  LIBDRM_NVVIEUX_REQUIRED=2.4.33
>  LIBDRM_NOUVEAU_REQUIRED=2.4.62
>  LIBDRM_FREEDRENO_REQUIRED=2.4.65
> +LIBDRM_TEGRA_REQUIRED=2.4.58
>  DRI2PROTO_REQUIRED=2.6
>  DRI3PROTO_REQUIRED=1.0
>  PRESENTPROTO_REQUIRED=1.0
> @@ -864,7 +865,7 @@ GALLIUM_DRIVERS_DEFAULT="r300,r600,svga,swrast"
>  AC_ARG_WITH([gallium-drivers],
>  [AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@],
>  [comma delimited Gallium drivers list, e.g.
> -"i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4"
> +"i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,tegra,vc4"
>  @<:@default=r300,r600,svga,swrast@:>@])],
>  [with_gallium_drivers="$withval"],
>  [with_gallium_drivers="$GALLIUM_DRIVERS_DEFAULT"])
> @@ -2166,6 +2167,12 @@ if test -n "$with_gallium_drivers"; then
>  HAVE_GALLIUM_LLVMPIPE=yes
>  fi
>  ;;
> +xtegra)
> +HAVE_GALLIUM_TEGRA=yes
> +PKG_CHECK_MODULES([TEGRA], [libdrm_tegra >= 
> $LIBDRM_TEGRA_REQUIRED])
> +gallium_require_drm "tegra"
> +gallium_require_drm_loader
> +;;
>  xvc4)
>  HAVE_GALLIUM_VC4=yes
>  gallium_require_drm "vc4"
> @@ -2181,6 +2188,13 @@ if test -n "$with_gallium_drivers"; then
>  done
>  fi
>
> +dnl We need to validate some needed dependencies for renderonly drivers.
> +
> +if test "x$HAVE_GALLIUM_NOUVEAU" != xyes -a "x$HAVE_GALLIUM_TEGRA" == xyes  
> ; then
> +AC_ERROR([Building with tegra requires that nouveau])
> +fi
> +
> +
>  dnl Set LLVM_LIBS - This is done after the driver configuration so
>  dnl that drivers can add additional components to LLVM_COMPONENTS.
>  dnl Previously, gallium drivers were updating LLVM_LIBS directly
> @@ -2245,6 +2259,7 @@ AM_CONDITIONAL(HAVE_GALLIUM_FREEDRENO, test 
> "x$HAVE_GALLIUM_FREEDRENO" = xyes)
>  AM_CONDITIONAL(HAVE_GALLIUM_SOFTPIPE, test "x$HAVE_GALLIUM_SOFTPIPE" = xyes)
>  AM_CONDITIONAL(HAVE_GALLIUM_LLVMPIPE, test "x$HAVE_GALLIUM_LLVMPIPE" = xyes)
>  AM_CONDITIONAL(HAVE_GALLIUM_VC4, test "x$HAVE_GALLIUM_VC4" = xyes)
> +AM_CONDITIONAL(HAVE_GALLIUM_TEGRA, test "x$HAVE_GALLIUM_TEGRA" = xyes)
>
>  AM_CONDITIONAL(HAVE_GALLIUM_STATIC_TARGETS, test 
> "x$enable_shared_pipe_drivers" = xno)
>
> @@ -2364,6 +2379,7 @@ AC_CONFIG_FILES([Makefile
> src/gallium/drivers/renderonly/Makefile
> src/gallium/drivers/softpipe/Makefile
> src/gallium/drivers/svga/Makefile
> +   src/gallium/drivers/tegra/Makefile
> src/gallium/drivers/trace/Makefile
> src/gallium/drivers/vc4/Makefile
> src/gallium/state_trackers/clover/Makefile
> @@ -2406,6 +2422,7 @@ AC_CONFIG_FILES([Makefile
> src/gallium/winsys/sw/wrapper/Makefile
> src/gallium/winsys/sw/xlib/Makefile
> src/gallium/winsys/vc4/drm/Makefile
> +   src/gallium/winsys/tegra/drm/Makefile
> src/gbm/Makefile
> src/gbm/main/gbm.pc
> src/glsl/Makefile
> diff --git a/src/gallium/Makefile.am b/src/gallium/Makefile.am
> index a7c3606..7278300 100644
> --- a/src/gallium/Makefile.am
> +++ b/src/gallium/Makefile.am
> @@ -82,6 +82,12 @@ if HAVE_GALLIUM_VC4
>  SUBDIRS += 

Re: [Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)

2015-10-13 Thread Michel Thierry

On 10/6/2015 2:12 PM, Michel Thierry wrote:

On 10/5/2015 7:06 PM, Kristian Høgsberg wrote:

On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry
 wrote:

On 9/14/2015 2:54 PM, Michał Winiarski wrote:


On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote:


Gen8+ supports 48-bit virtual addresses, but some objects must
always be
allocated inside the 32-bit address range.

In specific, any resource used with flat/heapless
(0x-0xf000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset
are limited to 32-bits.

The i915 driver has been modified to provide a flag to set when the
4GB
limit is not necessary in a given bo
(EXEC_OBJECT_SUPPORTS_48B_ADDRESS).
48-bit range will only be used when explicitly requested.

Callers to the existing drm_intel_bo_emit_reloc function should set
the
use_48b_address_range flag beforehand, in order to use full ppgtt
range.

v2: Make set/clear functions nops on pre-gen8 platforms, and use them
  internally in emit_reloc functions (Ben)
  s/48BADDRESS/48B_ADDRESS/ (Dave)
v3: Keep set/clear functions internal, no-one needs to use them
directly.
v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type
  before enabling set/clear function, print full offsets in debug
  statements, using port of lower_32_bits and upper_32_bits
from linux
  kernel (Michał)

References:
http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html
Cc: Ben Widawsky 
Cc: Michał Winiarski 



+Kristian

LGTM.
Acked-by: Michał Winiarski 


Signed-off-by: Michel Thierry 




Hi Kristian,

More comments on this?
I've resent the mesa patch with the last changes
(http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html).


Michał has agreed with the interface and will use it for his work.
If mesa doesn't plan to use this for now, it's ok. The kernel changes
have
been done to prevent any regressions when the support 48-bit flag is not
set.


I didn't get any replies to my last comments on this:

http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html

Basically, I tried explicitly placing buffers above 8G and didn't see
the HW problem described (ie it all worked fine).  I still think
there's some confusion as to what the W/A is about.

Here's my take: the W/A is a SW-only workaround to handle the cases
where a driver uses heapless and 48-bit ppgtt. The problem is that the
heaps are limited to 4G but with 48bit ppgtt a buffer can be placed
anywhere it the 48 bit address space. If that happens it's consideredd
out-of-bounds for the heap and access fails. To prevent this we need
to make sure the bos we address in a heapless fashion are placed below
4g.

For mesa, we only configure general state base as heap-less, which
affects scratch bos. What this boils down to is that we need the 4G
restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS
and 3DSTATE_PS (and tesselation stage eventually). Look for the
OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c
and gen8_ps_state.c


I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it
isn't exclusive to the scratch bos (the error state points to that
instruction).




Hi,

Anymore inputs about this patch?
AFAIK, if changes are required based on further comments from Kristian, 
these will be in the mesa patch[1], not libdrm. Also, Michał will use 
this flag in another project.


Thanks,

-Michel

[1]http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)

2015-10-13 Thread Michel Thierry

On 10/13/2015 3:13 PM, Emil Velikov wrote:

On 13 October 2015 at 13:16, Michel Thierry  wrote:

On 10/6/2015 2:12 PM, Michel Thierry wrote:


On 10/5/2015 7:06 PM, Kristian Høgsberg wrote:


On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry
 wrote:


On 9/14/2015 2:54 PM, Michał Winiarski wrote:



On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote:



Gen8+ supports 48-bit virtual addresses, but some objects must
always be
allocated inside the 32-bit address range.

In specific, any resource used with flat/heapless
(0x-0xf000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset
are limited to 32-bits.

The i915 driver has been modified to provide a flag to set when the
4GB
limit is not necessary in a given bo
(EXEC_OBJECT_SUPPORTS_48B_ADDRESS).
48-bit range will only be used when explicitly requested.

Callers to the existing drm_intel_bo_emit_reloc function should set
the
use_48b_address_range flag beforehand, in order to use full ppgtt
range.

v2: Make set/clear functions nops on pre-gen8 platforms, and use them
   internally in emit_reloc functions (Ben)
   s/48BADDRESS/48B_ADDRESS/ (Dave)
v3: Keep set/clear functions internal, no-one needs to use them
directly.
v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type
   before enabling set/clear function, print full offsets in debug
   statements, using port of lower_32_bits and upper_32_bits
from linux
   kernel (Michał)

References:
http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html
Cc: Ben Widawsky 
Cc: Michał Winiarski 




+Kristian

LGTM.
Acked-by: Michał Winiarski 


Signed-off-by: Michel Thierry 





Hi Kristian,

More comments on this?
I've resent the mesa patch with the last changes

(http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html).


Michał has agreed with the interface and will use it for his work.
If mesa doesn't plan to use this for now, it's ok. The kernel changes
have
been done to prevent any regressions when the support 48-bit flag is not
set.



I didn't get any replies to my last comments on this:

http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html

Basically, I tried explicitly placing buffers above 8G and didn't see
the HW problem described (ie it all worked fine).  I still think
there's some confusion as to what the W/A is about.

Here's my take: the W/A is a SW-only workaround to handle the cases
where a driver uses heapless and 48-bit ppgtt. The problem is that the
heaps are limited to 4G but with 48bit ppgtt a buffer can be placed
anywhere it the 48 bit address space. If that happens it's consideredd
out-of-bounds for the heap and access fails. To prevent this we need
to make sure the bos we address in a heapless fashion are placed below
4g.

For mesa, we only configure general state base as heap-less, which
affects scratch bos. What this boils down to is that we need the 4G
restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS
and 3DSTATE_PS (and tesselation stage eventually). Look for the
OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c
and gen8_ps_state.c



I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it
isn't exclusive to the scratch bos (the error state points to that
instruction).




Hi,

Anymore inputs about this patch?
AFAIK, if changes are required based on further comments from Kristian,
these will be in the mesa patch[1], not libdrm. Also, Michał will use this
flag in another project.


The comment seems quite brief and I'm not sure it fully addresses
Kristian's concern. I'd suspect that providing reference to the HW
documentation (confirming your assumption) might be beneficial.



Sure, attached is the hang I get if I don't set the restriction in 
gen8_misc_state.c and try to use the full 48-bit range 
(i915_error_state_nowa.txt). This is just running gfxbench t-rex.


I see the same hang signature when it is only applied to the scratch bos 
(in gen8_vs_state.c, gen8_gs_state.c and gen8_ps_state.c - 
i915_error_state_scratchbo.txt).


3DSTATE_VIEWPORT_STATE_POINTERS_CC (0x7823) is defined here:
https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol02a-commandreference-instructions_0.pdf 
(page 256)


Thanks,

-Michel


i915_error_states.7z
Description: Binary data
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/6] nir: remove dependency on glsl

2015-10-13 Thread Emil Velikov
Hi Rob,

On 10 October 2015 at 19:47, Rob Clark  wrote:
> From: Rob Clark 
>
> Move glsl_types into NIR, now that the dependency on glsl_symbol_table
> has been split out.
>
> Possibly makes sense to rename things at this point, but if we do that
> I'd like to keep it split out into a separate patch to make git history
> easier to follow (IMHO).
>
> Signed-off-by: Rob Clark 
> ---
>  src/glsl/Makefile.am   |3 -
>  src/glsl/Makefile.sources  |4 +-
>  src/glsl/builtin_type_macros.h |  172 --
>  src/glsl/glsl_types.cpp| 1729 
> 
>  src/glsl/glsl_types.h  |  867 --
>  src/glsl/nir/builtin_type_macros.h |  172 ++
>  src/glsl/nir/glsl_types.cpp| 1729 
> 
>  src/glsl/nir/glsl_types.h  |  867 ++
>  src/glsl/nir/nir_types.h   |2 +-
>  .../drivers/dri/i965/brw_cubemap_normalize.cpp |2 +-
>  src/mesa/drivers/dri/i965/brw_fs.cpp   |2 +-
>  src/mesa/drivers/dri/i965/brw_fs.h |2 +-
>  .../dri/i965/brw_fs_channel_expressions.cpp|2 +-
>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |2 +-
>  .../drivers/dri/i965/brw_fs_vector_splitting.cpp   |2 +-
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |2 +-
>  .../dri/i965/brw_lower_unnormalized_offset.cpp |2 +-
>  .../drivers/dri/i965/brw_schedule_instructions.cpp |2 +-
>  src/mesa/main/ff_fragment_shader.cpp   |2 +-
>  src/mesa/main/uniforms.h   |2 +-
>  src/mesa/program/ir_to_mesa.cpp|2 +-
>  src/mesa/program/sampler.cpp   |2 +-
>  22 files changed, 2784 insertions(+), 2787 deletions(-)
>  delete mode 100644 src/glsl/builtin_type_macros.h
>  delete mode 100644 src/glsl/glsl_types.cpp
>  delete mode 100644 src/glsl/glsl_types.h
>  create mode 100644 src/glsl/nir/builtin_type_macros.h
>  create mode 100644 src/glsl/nir/glsl_types.cpp
>  create mode 100644 src/glsl/nir/glsl_types.h
>
> diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am
> index 347919b..437c6a5 100644
> --- a/src/glsl/Makefile.am
> +++ b/src/glsl/Makefile.am
> @@ -148,9 +148,6 @@ libglsl_la_SOURCES =  
>   \
>
>
>  libnir_la_SOURCES =\
> -   glsl_types.cpp  \
> -   builtin_types.cpp   \
> -   glsl_symbol_table.cpp   \
> $(NIR_FILES)\
> $(NIR_GENERATED_FILES)
>
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index 436949c..6e61f23 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -20,6 +20,8 @@ NIR_GENERATED_FILES = \
>  NIR_FILES = \
> nir/glsl_to_nir.cpp \
> nir/glsl_to_nir.h \
> +   nir/glsl_types.cpp \
> +   nir/glsl_types.h \
> nir/nir.c \
> nir/nir.h \
> nir/nir_array.h \
> @@ -103,8 +105,6 @@ LIBGLSL_FILES = \
> glsl_parser_extras.h \
> glsl_symbol_table.cpp \
> glsl_symbol_table.h \
> -   glsl_types.cpp \
> -   glsl_types.h \
> hir_field_selection.cpp \
> ir_basic_block.cpp \
> ir_basic_block.h \
Can we split this into two (or more) patches.
 - move the files from glsl to glsl/nir, updating scons/android. note
scons is missing everything NIR related.
 - fold/nuke the additional glsl requirements, from NIR.

From autotools side alone the patch looks great.

Thank you
Emil

P.S. Please use -M for git to detect code movement when generating the patch.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] glsl: couple shader_enums cleanups

2015-10-13 Thread Emil Velikov
On 10 October 2015 at 19:47, Rob Clark  wrote:
> From: Rob Clark 
>
> Add missing enum to gl_system_value_name() and move VARYING_SLOT_MAX /
> FRAG_RESULT_MAX / etc into shader_enums.h as suggested by Emil.
>
> v2: add STATIC_ASSERT()'s
>
> Reported-by: Emil Velikov 
> Signed-off-by: Rob Clark 
> ---
>  src/glsl/nir/shader_enums.c | 8 
>  src/glsl/nir/shader_enums.h | 7 +++
>  src/mesa/main/mtypes.h  | 5 -
>  3 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/src/glsl/nir/shader_enums.c b/src/glsl/nir/shader_enums.c
> index 3722475..66a25e7 100644
> --- a/src/glsl/nir/shader_enums.c
> +++ b/src/glsl/nir/shader_enums.c
> @@ -28,6 +28,7 @@
>
>  #include "shader_enums.h"
>  #include "util/macros.h"
> +#include "mesa/main/config.h"
>
>  #define ENUM(x) [x] = #x
>  #define NAME(val) val) < ARRAY_SIZE(names)) && names[(val)]) ? 
> names[(val)] : "UNKNOWN")
> @@ -42,6 +43,7 @@ const char * gl_shader_stage_name(gl_shader_stage stage)
>ENUM(MESA_SHADER_FRAGMENT),
>ENUM(MESA_SHADER_COMPUTE),
> };
> +   STATIC_ASSERT(ARRAY_SIZE(names) == MESA_SHADER_STAGES);
> return NAME(stage);
>  }
>
> @@ -82,6 +84,7 @@ const char * gl_vert_attrib_name(gl_vert_attrib attrib)
>ENUM(VERT_ATTRIB_GENERIC14),
>ENUM(VERT_ATTRIB_GENERIC15),
> };
> +   STATIC_ASSERT(ARRAY_SIZE(names) == VERT_ATTRIB_MAX);
> return NAME(attrib);
>  }
>
> @@ -147,6 +150,7 @@ const char * gl_varying_slot_name(gl_varying_slot slot)
>ENUM(VARYING_SLOT_VAR30),
>ENUM(VARYING_SLOT_VAR31),
> };
> +   STATIC_ASSERT(ARRAY_SIZE(names) == VARYING_SLOT_MAX);
> return NAME(slot);
>  }
>
> @@ -169,8 +173,10 @@ const char * gl_system_value_name(gl_system_value sysval)
>   ENUM(SYSTEM_VALUE_TESS_LEVEL_INNER),
>   ENUM(SYSTEM_VALUE_LOCAL_INVOCATION_ID),
>   ENUM(SYSTEM_VALUE_WORK_GROUP_ID),
> + ENUM(SYSTEM_VALUE_NUM_WORK_GROUPS),
>   ENUM(SYSTEM_VALUE_VERTEX_CNT),
> };
> +   STATIC_ASSERT(ARRAY_SIZE(names) == SYSTEM_VALUE_MAX);
> return NAME(sysval);
>  }
>
> @@ -182,6 +188,7 @@ const char * glsl_interp_qualifier_name(enum 
> glsl_interp_qualifier qual)
>ENUM(INTERP_QUALIFIER_FLAT),
>ENUM(INTERP_QUALIFIER_NOPERSPECTIVE),
> };
> +   STATIC_ASSERT(ARRAY_SIZE(names) == INTERP_QUALIFIER_COUNT);
> return NAME(qual);
>  }
>
> @@ -201,5 +208,6 @@ const char * gl_frag_result_name(gl_frag_result result)
>ENUM(FRAG_RESULT_DATA6),
>ENUM(FRAG_RESULT_DATA7),
> };
> +   STATIC_ASSERT(ARRAY_SIZE(names) == FRAG_RESULT_MAX);
> return NAME(result);
>  }
> diff --git a/src/glsl/nir/shader_enums.h b/src/glsl/nir/shader_enums.h
> index 2a5d2c5..77638ba 100644
> --- a/src/glsl/nir/shader_enums.h
> +++ b/src/glsl/nir/shader_enums.h
> @@ -233,6 +233,11 @@ typedef enum
> VARYING_SLOT_VAR31,
>  } gl_varying_slot;
>
> +
> +#define VARYING_SLOT_MAX   (VARYING_SLOT_VAR0 + MAX_VARYING)
> +#define VARYING_SLOT_PATCH0(VARYING_SLOT_MAX)
> +#define VARYING_SLOT_TESS_MAX  (VARYING_SLOT_PATCH0 + MAX_VARYING)
> +
As mentioned before I'm not sure (and perhaps not the best person to
comment) on the correct definition of the macros, so I'd call this
Acked-by: Emil Velikov 

For patches 2,3,4 and 6
Reviewed-by: Emil Velikov 

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)

2015-10-13 Thread Emil Velikov
On 13 October 2015 at 13:16, Michel Thierry  wrote:
> On 10/6/2015 2:12 PM, Michel Thierry wrote:
>>
>> On 10/5/2015 7:06 PM, Kristian Høgsberg wrote:
>>>
>>> On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry
>>>  wrote:

 On 9/14/2015 2:54 PM, Michał Winiarski wrote:
>
>
> On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote:
>>
>>
>> Gen8+ supports 48-bit virtual addresses, but some objects must
>> always be
>> allocated inside the 32-bit address range.
>>
>> In specific, any resource used with flat/heapless
>> (0x-0xf000)
>> General State Heap (GSH) or Instruction State Heap (ISH) must be in a
>> 32-bit range, because the General State Offset and Instruction State
>> Offset
>> are limited to 32-bits.
>>
>> The i915 driver has been modified to provide a flag to set when the
>> 4GB
>> limit is not necessary in a given bo
>> (EXEC_OBJECT_SUPPORTS_48B_ADDRESS).
>> 48-bit range will only be used when explicitly requested.
>>
>> Callers to the existing drm_intel_bo_emit_reloc function should set
>> the
>> use_48b_address_range flag beforehand, in order to use full ppgtt
>> range.
>>
>> v2: Make set/clear functions nops on pre-gen8 platforms, and use them
>>   internally in emit_reloc functions (Ben)
>>   s/48BADDRESS/48B_ADDRESS/ (Dave)
>> v3: Keep set/clear functions internal, no-one needs to use them
>> directly.
>> v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type
>>   before enabling set/clear function, print full offsets in debug
>>   statements, using port of lower_32_bits and upper_32_bits
>> from linux
>>   kernel (Michał)
>>
>> References:
>> http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html
>> Cc: Ben Widawsky 
>> Cc: Michał Winiarski 
>
>
>
> +Kristian
>
> LGTM.
> Acked-by: Michał Winiarski 
>
>> Signed-off-by: Michel Thierry 




 Hi Kristian,

 More comments on this?
 I've resent the mesa patch with the last changes

 (http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html).


 Michał has agreed with the interface and will use it for his work.
 If mesa doesn't plan to use this for now, it's ok. The kernel changes
 have
 been done to prevent any regressions when the support 48-bit flag is not
 set.
>>>
>>>
>>> I didn't get any replies to my last comments on this:
>>>
>>> http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html
>>>
>>> Basically, I tried explicitly placing buffers above 8G and didn't see
>>> the HW problem described (ie it all worked fine).  I still think
>>> there's some confusion as to what the W/A is about.
>>>
>>> Here's my take: the W/A is a SW-only workaround to handle the cases
>>> where a driver uses heapless and 48-bit ppgtt. The problem is that the
>>> heaps are limited to 4G but with 48bit ppgtt a buffer can be placed
>>> anywhere it the 48 bit address space. If that happens it's consideredd
>>> out-of-bounds for the heap and access fails. To prevent this we need
>>> to make sure the bos we address in a heapless fashion are placed below
>>> 4g.
>>>
>>> For mesa, we only configure general state base as heap-less, which
>>> affects scratch bos. What this boils down to is that we need the 4G
>>> restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS
>>> and 3DSTATE_PS (and tesselation stage eventually). Look for the
>>> OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c
>>> and gen8_ps_state.c
>>
>>
>> I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it
>> isn't exclusive to the scratch bos (the error state points to that
>> instruction).
>>
>>
>
> Hi,
>
> Anymore inputs about this patch?
> AFAIK, if changes are required based on further comments from Kristian,
> these will be in the mesa patch[1], not libdrm. Also, Michał will use this
> flag in another project.
>
The comment seems quite brief and I'm not sure it fully addresses
Kristian's concern. I'd suspect that providing reference to the HW
documentation (confirming your assumption) might be beneficial.

Regards,
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome

2015-10-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=90264

Patryk Zawadzki  changed:

   What|Removed |Added

 CC||pat...@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/6] nir: remove dependency on glsl

2015-10-13 Thread Rob Clark
On Tue, Oct 13, 2015 at 11:22 AM, Emil Velikov  wrote:
> Hi Rob,
>
> On 10 October 2015 at 19:47, Rob Clark  wrote:
>> From: Rob Clark 
>>
>> Move glsl_types into NIR, now that the dependency on glsl_symbol_table
>> has been split out.
>>
>> Possibly makes sense to rename things at this point, but if we do that
>> I'd like to keep it split out into a separate patch to make git history
>> easier to follow (IMHO).
>>
>> Signed-off-by: Rob Clark 
>> ---
>>  src/glsl/Makefile.am   |3 -
>>  src/glsl/Makefile.sources  |4 +-
>>  src/glsl/builtin_type_macros.h |  172 --
>>  src/glsl/glsl_types.cpp| 1729 
>> 
>>  src/glsl/glsl_types.h  |  867 --
>>  src/glsl/nir/builtin_type_macros.h |  172 ++
>>  src/glsl/nir/glsl_types.cpp| 1729 
>> 
>>  src/glsl/nir/glsl_types.h  |  867 ++
>>  src/glsl/nir/nir_types.h   |2 +-
>>  .../drivers/dri/i965/brw_cubemap_normalize.cpp |2 +-
>>  src/mesa/drivers/dri/i965/brw_fs.cpp   |2 +-
>>  src/mesa/drivers/dri/i965/brw_fs.h |2 +-
>>  .../dri/i965/brw_fs_channel_expressions.cpp|2 +-
>>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |2 +-
>>  .../drivers/dri/i965/brw_fs_vector_splitting.cpp   |2 +-
>>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |2 +-
>>  .../dri/i965/brw_lower_unnormalized_offset.cpp |2 +-
>>  .../drivers/dri/i965/brw_schedule_instructions.cpp |2 +-
>>  src/mesa/main/ff_fragment_shader.cpp   |2 +-
>>  src/mesa/main/uniforms.h   |2 +-
>>  src/mesa/program/ir_to_mesa.cpp|2 +-
>>  src/mesa/program/sampler.cpp   |2 +-
>>  22 files changed, 2784 insertions(+), 2787 deletions(-)
>>  delete mode 100644 src/glsl/builtin_type_macros.h
>>  delete mode 100644 src/glsl/glsl_types.cpp
>>  delete mode 100644 src/glsl/glsl_types.h
>>  create mode 100644 src/glsl/nir/builtin_type_macros.h
>>  create mode 100644 src/glsl/nir/glsl_types.cpp
>>  create mode 100644 src/glsl/nir/glsl_types.h
>>
>> diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am
>> index 347919b..437c6a5 100644
>> --- a/src/glsl/Makefile.am
>> +++ b/src/glsl/Makefile.am
>> @@ -148,9 +148,6 @@ libglsl_la_SOURCES = 
>>\
>>
>>
>>  libnir_la_SOURCES =\
>> -   glsl_types.cpp  \
>> -   builtin_types.cpp   \
>> -   glsl_symbol_table.cpp   \
>> $(NIR_FILES)\
>> $(NIR_GENERATED_FILES)
>>
>> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
>> index 436949c..6e61f23 100644
>> --- a/src/glsl/Makefile.sources
>> +++ b/src/glsl/Makefile.sources
>> @@ -20,6 +20,8 @@ NIR_GENERATED_FILES = \
>>  NIR_FILES = \
>> nir/glsl_to_nir.cpp \
>> nir/glsl_to_nir.h \
>> +   nir/glsl_types.cpp \
>> +   nir/glsl_types.h \
>> nir/nir.c \
>> nir/nir.h \
>> nir/nir_array.h \
>> @@ -103,8 +105,6 @@ LIBGLSL_FILES = \
>> glsl_parser_extras.h \
>> glsl_symbol_table.cpp \
>> glsl_symbol_table.h \
>> -   glsl_types.cpp \
>> -   glsl_types.h \
>> hir_field_selection.cpp \
>> ir_basic_block.cpp \
>> ir_basic_block.h \
> Can we split this into two (or more) patches.
>  - move the files from glsl to glsl/nir, updating scons/android. note
> scons is missing everything NIR related.
>  - fold/nuke the additional glsl requirements, from NIR.

It is already split up this way.. this patch is primarily the move
(plus header path tweaks, etc, to keep things compiling).  I don't see
how it could be split up any finer while keeping bisectability (ie.
not breaking compile in the middle).

That said, I did completely ignore scons/android.  I don't know the
first thing about scons or how to do a scons build, so I think I'll
ignore that and let someone else fix it up.  I suppose I could fix
android build, although I can't build android on my laptop (and I
guess I'd have to rebase some of the other android related stuff that
isn't upstream yet), so maybe I'll just fix that in a follow-on patch
this weekend.

BR,
-R

> From autotools side alone the patch looks great.
>
> Thank you
> Emil
>
> P.S. Please use -M for git to detect code movement when generating the patch.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] glsl: couple shader_enums cleanups

2015-10-13 Thread Jason Ekstrand
On Oct 13, 2015 8:26 AM, "Emil Velikov"  wrote:
>
> On 10 October 2015 at 19:47, Rob Clark  wrote:
> > From: Rob Clark 
> >
> > Add missing enum to gl_system_value_name() and move VARYING_SLOT_MAX /
> > FRAG_RESULT_MAX / etc into shader_enums.h as suggested by Emil.
> >
> > v2: add STATIC_ASSERT()'s
> >
> > Reported-by: Emil Velikov 
> > Signed-off-by: Rob Clark 
> > ---
> >  src/glsl/nir/shader_enums.c | 8 
> >  src/glsl/nir/shader_enums.h | 7 +++
> >  src/mesa/main/mtypes.h  | 5 -
> >  3 files changed, 15 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/glsl/nir/shader_enums.c b/src/glsl/nir/shader_enums.c
> > index 3722475..66a25e7 100644
> > --- a/src/glsl/nir/shader_enums.c
> > +++ b/src/glsl/nir/shader_enums.c
> > @@ -28,6 +28,7 @@
> >
> >  #include "shader_enums.h"
> >  #include "util/macros.h"
> > +#include "mesa/main/config.h"
> >
> >  #define ENUM(x) [x] = #x
> >  #define NAME(val) val) < ARRAY_SIZE(names)) && names[(val)]) ?
names[(val)] : "UNKNOWN")
> > @@ -42,6 +43,7 @@ const char * gl_shader_stage_name(gl_shader_stage
stage)
> >ENUM(MESA_SHADER_FRAGMENT),
> >ENUM(MESA_SHADER_COMPUTE),
> > };
> > +   STATIC_ASSERT(ARRAY_SIZE(names) == MESA_SHADER_STAGES);
> > return NAME(stage);
> >  }
> >
> > @@ -82,6 +84,7 @@ const char * gl_vert_attrib_name(gl_vert_attrib
attrib)
> >ENUM(VERT_ATTRIB_GENERIC14),
> >ENUM(VERT_ATTRIB_GENERIC15),
> > };
> > +   STATIC_ASSERT(ARRAY_SIZE(names) == VERT_ATTRIB_MAX);
> > return NAME(attrib);
> >  }
> >
> > @@ -147,6 +150,7 @@ const char * gl_varying_slot_name(gl_varying_slot
slot)
> >ENUM(VARYING_SLOT_VAR30),
> >ENUM(VARYING_SLOT_VAR31),
> > };
> > +   STATIC_ASSERT(ARRAY_SIZE(names) == VARYING_SLOT_MAX);
> > return NAME(slot);
> >  }
> >
> > @@ -169,8 +173,10 @@ const char * gl_system_value_name(gl_system_value
sysval)
> >   ENUM(SYSTEM_VALUE_TESS_LEVEL_INNER),
> >   ENUM(SYSTEM_VALUE_LOCAL_INVOCATION_ID),
> >   ENUM(SYSTEM_VALUE_WORK_GROUP_ID),
> > + ENUM(SYSTEM_VALUE_NUM_WORK_GROUPS),
> >   ENUM(SYSTEM_VALUE_VERTEX_CNT),
> > };
> > +   STATIC_ASSERT(ARRAY_SIZE(names) == SYSTEM_VALUE_MAX);
> > return NAME(sysval);
> >  }
> >
> > @@ -182,6 +188,7 @@ const char * glsl_interp_qualifier_name(enum
glsl_interp_qualifier qual)
> >ENUM(INTERP_QUALIFIER_FLAT),
> >ENUM(INTERP_QUALIFIER_NOPERSPECTIVE),
> > };
> > +   STATIC_ASSERT(ARRAY_SIZE(names) == INTERP_QUALIFIER_COUNT);
> > return NAME(qual);
> >  }
> >
> > @@ -201,5 +208,6 @@ const char * gl_frag_result_name(gl_frag_result
result)
> >ENUM(FRAG_RESULT_DATA6),
> >ENUM(FRAG_RESULT_DATA7),
> > };
> > +   STATIC_ASSERT(ARRAY_SIZE(names) == FRAG_RESULT_MAX);
> > return NAME(result);
> >  }
> > diff --git a/src/glsl/nir/shader_enums.h b/src/glsl/nir/shader_enums.h
> > index 2a5d2c5..77638ba 100644
> > --- a/src/glsl/nir/shader_enums.h
> > +++ b/src/glsl/nir/shader_enums.h
> > @@ -233,6 +233,11 @@ typedef enum
> > VARYING_SLOT_VAR31,
> >  } gl_varying_slot;
> >
> > +
> > +#define VARYING_SLOT_MAX   (VARYING_SLOT_VAR0 + MAX_VARYING)
> > +#define VARYING_SLOT_PATCH0(VARYING_SLOT_MAX)
> > +#define VARYING_SLOT_TESS_MAX  (VARYING_SLOT_PATCH0 + MAX_VARYING)
> > +
> As mentioned before I'm not sure (and perhaps not the best person to
> comment) on the correct definition of the macros, so I'd call this
> Acked-by: Emil Velikov 
>
> For patches 2,3,4 and 6
> Reviewed-by: Emil Velikov 

You can have my R-B on the series as well. I think we're going to do even
more compiler source shuffling soon but this is at least the right
direction.
--Jason
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block

2015-10-13 Thread Jordan Justen
On 2015-10-13 05:17:37, Francisco Jerez wrote:
> Iago Toral Quiroga  writes:
> 
> > This fixes the following test:
> >
> > [require]
> > GL >= 3.3
> > GLSL >= 3.30
> > GL_ARB_shader_storage_buffer_object
> >
> > [fragment shader]
> > #version 330
> > #extension GL_ARB_shader_storage_buffer_object: require
> >
> > buffer SSBO {
> > mat4 sm4;
> > };
> >
> >
> > mat4 um4;
> >
> > void main() {
> > sm4 *= um4;
> 
> This is using the value of "um4", which is never assigned, so liveness
> analysis will correctly extend its live interval up to the start of the
> block.

This test was derived by simplifying a CTS test case.

Anyway, I'm not sure what happened on the way to the commit message,
but um4 should be a uniform.

http://sprunge.us/cEUe

-Jordan

> The other problem here seems to be that the liveness analysis pass
> doesn't consider scalar writes (like the ones emitted by the
> combine_constants optimization pass and by emit_uniformize()) to fully
> define the contents of a register, so it will incorrectly extend up the
> live interval of registers defined using scalar writes even if all
> components ever used in the shader have been defined individually.
> Incidentally the use-def-chains-based implementation of liveness
> analysis I'm working on will fix this issue easily.
> 
> > sm4[0][0] = 0.0;
> > }
> >
> > [test]
> > link success
> >
> > where our liveness analysis works really bad because the control-flow part
> > will expand register liveness to the end of only block we have (so every
> > register will be marked alive until the end of the program). This 
> > artificially
> > increases register pressure to a point in which we run out of registers.
> > Of course, this is only a simple optimization for a trivial case, the
> > underlying problem still exists and would manifest when we have more than
> > one block, but it helps simple shaders such as the one in the example above
> > without any effort. I guess the real fix would involve re-thinking parts of 
> > the
> > liveness analysis algorithm...
> >
> > Jordan, there is another thing that we can improve for this shader that is
> > specific to SSBOs: we emit the same ssbo load multiple times because we are
> > playing it safe just in case there are writes in between. I think we can 
> > try to
> > do better and not re-issue the same load if we don't have ssbo stores to
> > the same address in between. I'll try to come up with a patch for this
> > (hopefully we can do this inside lower_ubo_reference).
> >
> > The actual fix goes into patches 2 (FS) and 4 (VS). Patches 1,3 are just
> > indentation fixes in the code around these.
> >
> > Iago Toral Quiroga (4):
> >   i965/fs: Fix indentation in fs_live_variables::compute_start_end
> >   i965/fs: skip control-flow aware liveness analysis if we only have 1
> > block
> >   i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals
> >   i965/vec4: skip control-flow aware liveness analysis if we only have 1
> > block
> >
> >  .../drivers/dri/i965/brw_fs_live_variables.cpp | 24 
> > ++
> >  .../drivers/dri/i965/brw_vec4_live_variables.cpp   | 23 
> > +
> >  2 files changed, 30 insertions(+), 17 deletions(-)
> >
> > -- 
> > 1.9.1
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/6] nir: remove dependency on glsl

2015-10-13 Thread Emil Velikov
On 13 October 2015 at 16:37, Rob Clark  wrote:
> On Tue, Oct 13, 2015 at 11:22 AM, Emil Velikov  
> wrote:
>> Hi Rob,
>>
>> On 10 October 2015 at 19:47, Rob Clark  wrote:
>>> From: Rob Clark 
>>>
>>> Move glsl_types into NIR, now that the dependency on glsl_symbol_table
>>> has been split out.
>>>
>>> Possibly makes sense to rename things at this point, but if we do that
>>> I'd like to keep it split out into a separate patch to make git history
>>> easier to follow (IMHO).
>>>
>>> Signed-off-by: Rob Clark 
>>> ---
>>>  src/glsl/Makefile.am   |3 -
>>>  src/glsl/Makefile.sources  |4 +-
>>>  src/glsl/builtin_type_macros.h |  172 --
>>>  src/glsl/glsl_types.cpp| 1729 
>>> 
>>>  src/glsl/glsl_types.h  |  867 --
>>>  src/glsl/nir/builtin_type_macros.h |  172 ++
>>>  src/glsl/nir/glsl_types.cpp| 1729 
>>> 
>>>  src/glsl/nir/glsl_types.h  |  867 ++
>>>  src/glsl/nir/nir_types.h   |2 +-
>>>  .../drivers/dri/i965/brw_cubemap_normalize.cpp |2 +-
>>>  src/mesa/drivers/dri/i965/brw_fs.cpp   |2 +-
>>>  src/mesa/drivers/dri/i965/brw_fs.h |2 +-
>>>  .../dri/i965/brw_fs_channel_expressions.cpp|2 +-
>>>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |2 +-
>>>  .../drivers/dri/i965/brw_fs_vector_splitting.cpp   |2 +-
>>>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |2 +-
>>>  .../dri/i965/brw_lower_unnormalized_offset.cpp |2 +-
>>>  .../drivers/dri/i965/brw_schedule_instructions.cpp |2 +-
>>>  src/mesa/main/ff_fragment_shader.cpp   |2 +-
>>>  src/mesa/main/uniforms.h   |2 +-
>>>  src/mesa/program/ir_to_mesa.cpp|2 +-
>>>  src/mesa/program/sampler.cpp   |2 +-
>>>  22 files changed, 2784 insertions(+), 2787 deletions(-)
>>>  delete mode 100644 src/glsl/builtin_type_macros.h
>>>  delete mode 100644 src/glsl/glsl_types.cpp
>>>  delete mode 100644 src/glsl/glsl_types.h
>>>  create mode 100644 src/glsl/nir/builtin_type_macros.h
>>>  create mode 100644 src/glsl/nir/glsl_types.cpp
>>>  create mode 100644 src/glsl/nir/glsl_types.h
>>>
>>> diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am
>>> index 347919b..437c6a5 100644
>>> --- a/src/glsl/Makefile.am
>>> +++ b/src/glsl/Makefile.am
>>> @@ -148,9 +148,6 @@ libglsl_la_SOURCES =
>>> \
>>>
>>>
>>>  libnir_la_SOURCES =\
>>> -   glsl_types.cpp  \
>>> -   builtin_types.cpp   \
>>> -   glsl_symbol_table.cpp   \
>>> $(NIR_FILES)\
>>> $(NIR_GENERATED_FILES)
>>>
>>> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
>>> index 436949c..6e61f23 100644
>>> --- a/src/glsl/Makefile.sources
>>> +++ b/src/glsl/Makefile.sources
>>> @@ -20,6 +20,8 @@ NIR_GENERATED_FILES = \
>>>  NIR_FILES = \
>>> nir/glsl_to_nir.cpp \
>>> nir/glsl_to_nir.h \
>>> +   nir/glsl_types.cpp \
>>> +   nir/glsl_types.h \
>>> nir/nir.c \
>>> nir/nir.h \
>>> nir/nir_array.h \
>>> @@ -103,8 +105,6 @@ LIBGLSL_FILES = \
>>> glsl_parser_extras.h \
>>> glsl_symbol_table.cpp \
>>> glsl_symbol_table.h \
>>> -   glsl_types.cpp \
>>> -   glsl_types.h \
>>> hir_field_selection.cpp \
>>> ir_basic_block.cpp \
>>> ir_basic_block.h \
>> Can we split this into two (or more) patches.
>>  - move the files from glsl to glsl/nir, updating scons/android. note
>> scons is missing everything NIR related.
>>  - fold/nuke the additional glsl requirements, from NIR.
>
> It is already split up this way.. this patch is primarily the move
> (plus header path tweaks, etc, to keep things compiling).  I don't see
> how it could be split up any finer while keeping bisectability (ie.
> not breaking compile in the middle).
>
> That said, I did completely ignore scons/android.  I don't know the
> first thing about scons or how to do a scons build, so I think I'll
> ignore that and let someone else fix it up.  I suppose I could fix
> android build, although I can't build android on my laptop (and I
> guess I'd have to rebase some of the other android related stuff that
> isn't upstream yet), so maybe I'll just fix that in a follow-on patch
> this weekend.
>
Essentially you'd want (I've expanded greatly, but feel free do you
your own split)
 - Add libnir for scons.
 - Android split out libnir static lib.
 - Move the file - GLSL_FILES -> 

Re: [Mesa-dev] [Mesa-stable] [PATCH v2] configure.ac: ensure RM is set

2015-10-13 Thread Matt Turner
On Tue, Oct 13, 2015 at 3:25 AM, Emil Velikov  wrote:
> On 10 October 2015 at 07:42, Jonathan Gray  wrote:
>> GNU make predefines RM to rm -f but this is not required by POSIX
>> so ensure that RM is set.  This fixes "make clean" on OpenBSD.
>>
>> v2: use AC_CHECK_PROG
>>
> I'm ambivalent if we go the AC_CHECK_PROG vs ?= route.
>
> I'll let others pick their favourite, but both are
> Reviewed-by: Emil Velikov 

I think AC_CHECK_PROG is the way to go.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: pass caller name to create_textures()

2015-10-13 Thread Anuj Phogat
On Mon, Oct 12, 2015 at 7:44 PM, Brian Paul  wrote:
> Simpler than the dsa flag approach.
> ---
>  src/mesa/main/texobj.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c
> index 60c55ae..b571b1b 100644
> --- a/src/mesa/main/texobj.c
> +++ b/src/mesa/main/texobj.c
> @@ -1205,17 +1205,16 @@ invalidate_tex_image_error_check(struct gl_context 
> *ctx, GLuint texture,
>   */
>  static void
>  create_textures(struct gl_context *ctx, GLenum target,
> -GLsizei n, GLuint *textures, bool dsa)
> +GLsizei n, GLuint *textures, const char *caller)
>  {
> GLuint first;
> GLint i;
> -   const char *func = dsa ? "Create" : "Gen";
>
> if (MESA_VERBOSE & (VERBOSE_API|VERBOSE_TEXTURE))
> -  _mesa_debug(ctx, "gl%sTextures %d\n", func, n);
> +  _mesa_debug(ctx, "%s %d\n", caller, n);
>
> if (n < 0) {
> -  _mesa_error( ctx, GL_INVALID_VALUE, "gl%sTextures(n < 0)", func );
> +  _mesa_error(ctx, GL_INVALID_VALUE, "%s(n < 0)", caller);
>return;
> }
>
> @@ -1236,7 +1235,7 @@ create_textures(struct gl_context *ctx, GLenum target,
>texObj = ctx->Driver.NewTextureObject(ctx, name, target);
>if (!texObj) {
>   mtx_unlock(>Shared->Mutex);
> - _mesa_error(ctx, GL_OUT_OF_MEMORY, "gl%sTextures", func);
> + _mesa_error(ctx, GL_OUT_OF_MEMORY, "gl%sTextures", caller);
_mesa_error(ctx, GL_OUT_OF_MEMORY, "%s", caller);

>   return;
>}
>
> @@ -1273,7 +1272,7 @@ void GLAPIENTRY
>  _mesa_GenTextures(GLsizei n, GLuint *textures)
>  {
> GET_CURRENT_CONTEXT(ctx);
> -   create_textures(ctx, 0, n, textures, false);
> +   create_textures(ctx, 0, n, textures, "glGenTextures");
>  }
>
>  /**
> @@ -1306,7 +1305,7 @@ _mesa_CreateTextures(GLenum target, GLsizei n, GLuint 
> *textures)
>return;
> }
>
> -   create_textures(ctx, target, n, textures, true);
> +   create_textures(ctx, target, n, textures, "glCreateTextures");
>  }
>
>  /**
> --
> 1.9.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

With the suggested change:
Reviewed-by: Anuj Phogat 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] nir: split SSBO min/max atomic instrinsics into signed/unsigned versions

2015-10-13 Thread Kristian Høgsberg
On Tue, Sep 29, 2015 at 09:25:27AM +0200, Iago Toral Quiroga wrote:
> NIR is typeless so this is the only way to keep track of the
> type to select the proper atomic to use.
> 
> v2:
>   - Use imin,imax,umin,umax for the intrinsic names (Connor Abbott)
>   - Change message for unreachable paths (Michael Schellenberger)

Looks good,

Reviewed-by: Kristian Høgsberg 

> ---
>  src/glsl/nir/glsl_to_nir.cpp   | 22 ++
>  src/glsl/nir/nir_intrinsics.h  |  6 --
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 20 ++--
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 22 +++---
>  4 files changed, 43 insertions(+), 27 deletions(-)
> 
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index f03a107..25f41a7 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -662,9 +662,21 @@ nir_visitor::visit(ir_call *ir)
>} else if (strcmp(ir->callee_name(), 
> "__intrinsic_ssbo_atomic_xor_internal") == 0) {
>   op = nir_intrinsic_ssbo_atomic_xor;
>} else if (strcmp(ir->callee_name(), 
> "__intrinsic_ssbo_atomic_min_internal") == 0) {
> - op = nir_intrinsic_ssbo_atomic_min;
> + assert(ir->return_deref);
> + if (ir->return_deref->type == glsl_type::int_type)
> +op = nir_intrinsic_ssbo_atomic_imin;
> + else if (ir->return_deref->type == glsl_type::uint_type)
> +op = nir_intrinsic_ssbo_atomic_umin;
> + else
> +unreachable("Invalid type");
>} else if (strcmp(ir->callee_name(), 
> "__intrinsic_ssbo_atomic_max_internal") == 0) {
> - op = nir_intrinsic_ssbo_atomic_max;
> + assert(ir->return_deref);
> + if (ir->return_deref->type == glsl_type::int_type)
> +op = nir_intrinsic_ssbo_atomic_imax;
> + else if (ir->return_deref->type == glsl_type::uint_type)
> +op = nir_intrinsic_ssbo_atomic_umax;
> + else
> +unreachable("Invalid type");
>} else if (strcmp(ir->callee_name(), 
> "__intrinsic_ssbo_atomic_exchange_internal") == 0) {
>   op = nir_intrinsic_ssbo_atomic_exchange;
>} else if (strcmp(ir->callee_name(), 
> "__intrinsic_ssbo_atomic_comp_swap_internal") == 0) {
> @@ -874,8 +886,10 @@ nir_visitor::visit(ir_call *ir)
>   break;
>}
>case nir_intrinsic_ssbo_atomic_add:
> -  case nir_intrinsic_ssbo_atomic_min:
> -  case nir_intrinsic_ssbo_atomic_max:
> +  case nir_intrinsic_ssbo_atomic_imin:
> +  case nir_intrinsic_ssbo_atomic_umin:
> +  case nir_intrinsic_ssbo_atomic_imax:
> +  case nir_intrinsic_ssbo_atomic_umax:
>case nir_intrinsic_ssbo_atomic_and:
>case nir_intrinsic_ssbo_atomic_or:
>case nir_intrinsic_ssbo_atomic_xor:
> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
> index 06f1b02..ff42bb2 100644
> --- a/src/glsl/nir/nir_intrinsics.h
> +++ b/src/glsl/nir/nir_intrinsics.h
> @@ -174,8 +174,10 @@ INTRINSIC(image_samples, 0, ARR(), true, 1, 1, 0,
>   * 3: For CompSwap only: the second data parameter.
>   */
>  INTRINSIC(ssbo_atomic_add, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
> -INTRINSIC(ssbo_atomic_min, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
> -INTRINSIC(ssbo_atomic_max, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
> +INTRINSIC(ssbo_atomic_imin, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
> +INTRINSIC(ssbo_atomic_umin, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
> +INTRINSIC(ssbo_atomic_imax, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
> +INTRINSIC(ssbo_atomic_umax, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
>  INTRINSIC(ssbo_atomic_and, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
>  INTRINSIC(ssbo_atomic_or, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
>  INTRINSIC(ssbo_atomic_xor, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index a2bc5c6..0b9555c 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1870,17 +1870,17 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
> nir_intrinsic_instr *instr
> case nir_intrinsic_ssbo_atomic_add:
>nir_emit_ssbo_atomic(bld, BRW_AOP_ADD, instr);
>break;
> -   case nir_intrinsic_ssbo_atomic_min:
> -  if (dest.type == BRW_REGISTER_TYPE_D)
> - nir_emit_ssbo_atomic(bld, BRW_AOP_IMIN, instr);
> -  else
> - nir_emit_ssbo_atomic(bld, BRW_AOP_UMIN, instr);
> +   case nir_intrinsic_ssbo_atomic_imin:
> +  nir_emit_ssbo_atomic(bld, BRW_AOP_IMIN, instr);
>break;
> -   case nir_intrinsic_ssbo_atomic_max:
> -  if (dest.type == BRW_REGISTER_TYPE_D)
> - nir_emit_ssbo_atomic(bld, BRW_AOP_IMAX, instr);
> -  else
> - nir_emit_ssbo_atomic(bld, BRW_AOP_UMAX, instr);
> +   case nir_intrinsic_ssbo_atomic_umin:
> +  nir_emit_ssbo_atomic(bld, BRW_AOP_UMIN, instr);
> +  break;
> +   case 

[Mesa-dev] [RFC 1/4] glsl IR: add always_active_io attribute to ir_variable

2015-10-13 Thread Gregory Hainaut
The value will be set in separate-shader program when an input/output
must remains active (i.e. deadcode removal isn't allowed because it will create
interface location/name-matching mismatch)

v3:
* Rename the attribute
* Use ir_variable directly instead of ir_variable_refcount_visitor
* Move the foreach IR code in the linker file

Signed-off-by: Gregory Hainaut 
---
 src/glsl/ir.cpp |  1 +
 src/glsl/ir.h   |  7 +
 src/glsl/linker.cpp | 73 +
 3 files changed, 81 insertions(+)

diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
index 2c45b9e..f3f7355 100644
--- a/src/glsl/ir.cpp
+++ b/src/glsl/ir.cpp
@@ -1650,6 +1650,7 @@ ir_variable::ir_variable(const struct glsl_type *type, 
const char *name,
this->data.pixel_center_integer = false;
this->data.depth_layout = ir_depth_layout_none;
this->data.used = false;
+   this->data.always_active_io = false;
this->data.read_only = false;
this->data.centroid = false;
this->data.sample = false;
diff --git a/src/glsl/ir.h b/src/glsl/ir.h
index 43a2bf0..a91add9 100644
--- a/src/glsl/ir.h
+++ b/src/glsl/ir.h
@@ -658,6 +658,13 @@ public:
   unsigned assigned:1;
 
   /**
+   * When separate shader programs are enabled, only interstage
+   * variables can be safely removed of the shader interface. Others
+   * input/output must remains active.
+   */
+  unsigned always_active_io:1;
+
+  /**
* Enum indicating how the variable was declared.  See
* ir_var_declaration_type.
*
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index a97b4ef..3a30d0c 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -3599,6 +3599,70 @@ link_assign_subroutine_types(struct gl_shader_program 
*prog)
}
 }
 
+static void
+ir_set_always_active_io(exec_list *ir, ir_variable_mode io_mode)
+{
+   assert(mode == ir_var_shader_in || mode == ir_var_shader_out);
+
+   foreach_in_list(ir_instruction, node, ir) {
+  ir_variable *const var = node->as_variable();
+
+  if (var == NULL || var->data.mode != io_mode)
+ continue;
+
+  var->data.always_active_io = true;
+   }
+}
+
+static void
+set_always_active_io(struct gl_shader_program *prog)
+{
+   unsigned first, last;
+   assert(prog->SeparateShader);
+
+   first = MESA_SHADER_STAGES;
+   last = 0;
+
+   /* Determine first and last stage. Excluding the compute stage */
+   for (unsigned i = 0; i < MESA_SHADER_COMPUTE; i++) {
+  if (!prog->_LinkedShaders[i])
+ continue;
+  if (first == MESA_SHADER_STAGES)
+ first = i;
+  last = i;
+   }
+
+   if (first == MESA_SHADER_STAGES)
+  return;
+
+   for (unsigned stage = 0; stage < MESA_SHADER_STAGES; stage++) {
+  gl_shader *sh = prog->_LinkedShaders[stage];
+  if (!sh)
+ continue;
+
+  if (first == last) {
+ /* Single shader program: allowed inactive variable
+  * 1/ input of the VS
+  * 2/ output of the FS
+  */
+ if (stage != MESA_SHADER_VERTEX)
+ir_set_always_active_io(sh->ir, ir_var_shader_in);
+ if (stage != MESA_SHADER_FRAGMENT)
+ir_set_always_active_io(sh->ir, ir_var_shader_out);
+  } else {
+ /* Multiple shaders program: allowed inactive variable
+  * 1/ input of the VS
+  * 2/ output of the FS
+  * 3/ interstage variables
+  */
+ if (stage == first && stage != MESA_SHADER_VERTEX)
+ir_set_always_active_io(sh->ir, ir_var_shader_in);
+ else if (stage == last && stage != MESA_SHADER_FRAGMENT)
+ir_set_always_active_io(sh->ir, ir_var_shader_out);
+  }
+   }
+}
+
 void
 link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
 {
@@ -3858,6 +3922,15 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
   }
}
 
+
+   /**
+* When separate shader programs are enabled, only interstage
+* variables can be safely removed of the shader interface. Others
+* input/output must remains active.
+*/
+   if (prog->SeparateShader)
+  set_always_active_io(prog);
+
if (!interstage_cross_validate_uniform_blocks(prog))
   goto done;
 
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC 2/4] glsl IR: only allow optimization of interstage variable

2015-10-13 Thread Gregory Hainaut
GL_ARB_separate_shader_objects allow to match by name variable or block
interface. Input varying can't be removed because it is will impact the
location assignment.

It fixes the bug 79783 and likely any application that uses
GL_ARB_separate_shader_objects extension.

piglit test: arb_separate_shader_object-rendezvous_by_name

Signed-off-by: Gregory Hainaut 
---
 src/glsl/opt_dead_code.cpp | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/glsl/opt_dead_code.cpp b/src/glsl/opt_dead_code.cpp
index 071485a..37329f6 100644
--- a/src/glsl/opt_dead_code.cpp
+++ b/src/glsl/opt_dead_code.cpp
@@ -75,6 +75,24 @@ do_dead_code(exec_list *instructions, bool 
uniform_locations_assigned)
  || !entry->declaration)
 continue;
 
+  /* Section 7.4.1 (Shader Interface Matching) of the OpenGL 4.5
+   * (Core Profile) spec says:
+   *
+   *"With separable program objects, interfaces between shader
+   *stages may involve the outputs from one program object and the
+   *inputs from a second program object.  For such interfaces, it is
+   *not possible to detect mismatches at link time, because the
+   *programs are linked separately. When each such program is
+   *linked, all inputs or outputs interfacing with another program
+   *stage are treated as active."
+   */
+  if (entry->var->data.always_active_io &&
+(!entry->var->data.explicit_location ||
+ entry->var->data.location >= VARYING_SLOT_VAR0) &&
+(entry->var->data.mode == ir_var_shader_in ||
+ entry->var->data.mode == ir_var_shader_out))
+ continue;
+
   if (entry->assign) {
 /* Remove a single dead assignment to the variable we found.
  * Don't do so if it's a shader or function output or a shader
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC 0/4] V3: Improve GLSL support of GL_ARB_separate_shader_objects

2015-10-13 Thread Gregory Hainaut
New version that improves previous code quality and fixes several outstanding 
issues.
Piglit wasn't run yet.

v3:
Squash commit 1&2
* Use a better name for the new attribute: always_active_io
* Use ir_variable directly instead of ir_variable_refcount_visitor
* Put related code in linker.cpp

Add 2 new commits to fix wrong interface matching in more complex case.
Commit 3: avoid collision between user and linker slot assignment
Commit 4: avoid unpredictable sorting of varying

Commit 1/2/3 fix the piglit test: arb_separate_shader_object-rendezvous_by_name 
posted on piglit ML
Commit 4 was tested on the PCSX2 application. I need to implement a new test.

Gregory Hainaut (4):
  glsl IR: add always_active_io attribute to ir_variable
  glsl IR: only allow optimization of interstage variable
  glsl: avoid linker and user varying location to overlap
  glsl: don't sort varying in separate shader mode

 src/glsl/ir.cpp|  1 +
 src/glsl/ir.h  |  7 +
 src/glsl/link_varyings.cpp | 76 ++
 src/glsl/linker.cpp| 73 
 src/glsl/opt_dead_code.cpp | 18 +++
 5 files changed, 169 insertions(+), 6 deletions(-)

-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC 3/4] glsl: avoid linker and user varying location to overlap

2015-10-13 Thread Gregory Hainaut
Current behavior on the interface matching:

layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
out1; // Assigned to VARYING_SLOT_VAR0 by the linker

New behavior on the interface matching:

layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
out1; // Assigned to VARYING_SLOT_VAR1 by the linker

piglit: arb_separate_shader_object-rendezvous_by_name
Signed-off-by: Gregory Hainaut 
---
 src/glsl/link_varyings.cpp | 46 +++---
 1 file changed, 43 insertions(+), 3 deletions(-)

diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index 7e77a67..6d6e9b5 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -766,7 +766,7 @@ public:
gl_shader_stage consumer_stage);
~varying_matches();
void record(ir_variable *producer_var, ir_variable *consumer_var);
-   unsigned assign_locations();
+   unsigned assign_locations(uint64_t reserved_slots);
void store_locations() const;
 
 private:
@@ -986,7 +986,7 @@ varying_matches::record(ir_variable *producer_var, 
ir_variable *consumer_var)
  * passed to varying_matches::record().
  */
 unsigned
-varying_matches::assign_locations()
+varying_matches::assign_locations(uint64_t reserved_slots)
 {
/* Sort varying matches into an order that makes them easy to pack. */
qsort(this->matches, this->num_matches, sizeof(*this->matches),
@@ -1013,6 +1013,10 @@ varying_matches::assign_locations()
   != this->matches[i].packing_class) {
  *location = ALIGN(*location, 4);
   }
+  while ((*location < MAX_VARYING * 4u) &&
+(reserved_slots & (1u << *location / 4u))) {
+ *location = ALIGN(*location + 1, 4);
+  }
 
   this->matches[i].generic_location = *location;
 
@@ -1376,6 +1380,38 @@ canonicalize_shader_io(exec_list *ir, enum 
ir_variable_mode io_mode)
 }
 
 /**
+ * Generate a bitfield map of the already reserved slots for a shader.
+ *
+ * In theory a 32 bits value will be enough but a 64 bits value is future 
proof.
+ */
+uint64_t
+reserved_varying_slot(struct gl_shader *stage, ir_variable_mode io_mode)
+{
+   assert(mode == ir_var_shader_in || mode == ir_var_shader_out);
+   assert(MAX_VARYING <= 64); /* avoid an overflow of the returned value */
+
+   uint64_t slots = 0;
+   int var_slot;
+
+   if (!stage)
+  return slots;
+
+   foreach_in_list(ir_instruction, node, stage->ir) {
+  ir_variable *const var = node->as_variable();
+
+  if (var == NULL || var->data.mode != io_mode || 
!var->data.explicit_location)
+ continue;
+
+  var_slot = var->data.location - VARYING_SLOT_VAR0;
+  if (var_slot >= 0 && var_slot < MAX_VARYING)
+ slots |= 1u << var_slot;
+   }
+
+   return slots;
+}
+
+
+/**
  * Assign locations for all variables that are produced in one pipeline stage
  * (the "producer") and consumed in the next stage (the "consumer").
  *
@@ -1550,7 +1586,11 @@ assign_varying_locations(struct gl_context *ctx,
  matches.record(matched_candidate->toplevel_var, NULL);
}
 
-   const unsigned slots_used = matches.assign_locations();
+   const uint64_t reserved_slots =
+  reserved_varying_slot(producer, ir_var_shader_out) |
+  reserved_varying_slot(consumer, ir_var_shader_in);
+
+   const unsigned slots_used = matches.assign_locations(reserved_slots);
matches.store_locations();
 
for (unsigned i = 0; i < num_tfeedback_decls; ++i) {
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC 4/4] glsl: don't sort varying in separate shader mode

2015-10-13 Thread Gregory Hainaut
Current issue is the addition of FLAT qualifier on varying_matches::record()
which break the varying expected order

Future issue is the removal of the interpolation qualifier matching constrain

In my humble opinion, it is the responsability of the GL developer to optimize
their slots assignment in SSO with the help of GL_ARB_enhanced_layouts

Signed-off-by: Gregory Hainaut 
---
 src/glsl/link_varyings.cpp | 36 ++--
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index 6d6e9b5..31efee7 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -766,7 +766,7 @@ public:
gl_shader_stage consumer_stage);
~varying_matches();
void record(ir_variable *producer_var, ir_variable *consumer_var);
-   unsigned assign_locations(uint64_t reserved_slots);
+   unsigned assign_locations(uint64_t reserved_slots, bool separate_shader);
void store_locations() const;
 
 private:
@@ -986,11 +986,34 @@ varying_matches::record(ir_variable *producer_var, 
ir_variable *consumer_var)
  * passed to varying_matches::record().
  */
 unsigned
-varying_matches::assign_locations(uint64_t reserved_slots)
+varying_matches::assign_locations(uint64_t reserved_slots, bool 
separate_shader)
 {
-   /* Sort varying matches into an order that makes them easy to pack. */
-   qsort(this->matches, this->num_matches, sizeof(*this->matches),
- _matches::match_comparator);
+   /* Disable the varying sorting for separate shader program
+* 1/ All programs must sort the code in the same order to guarantee the
+*interface matching. However varying_matches::record() will change the
+*interpolation qualifier of some stages.
+*
+* 2/ GLSL version 4.50 removes the matching constrain on the interpolation
+*qualifier.
+*
+* Chapter 4.5 of GLSL 4.40:
+*"The type and presence of interpolation qualifiers of variables with
+*the same name declared in all linked shaders for the same cross-stage
+*interface must match, otherwise the link command will fail.
+*
+*When comparing an output from one stage to an input of a subsequent
+*stage, the input and output don't match if their interpolation
+*qualifiers (or lack thereof) are not the same."
+*
+* Chapter 4.5 of GLSL 4.50:
+*"It is a link-time error if, within the same stage, the interpolation
+*qualifiers of variables of the same name do not match."
+*/
+   if (!separate_shader) {
+  /* Sort varying matches into an order that makes them easy to pack. */
+  qsort(this->matches, this->num_matches, sizeof(*this->matches),
+_matches::match_comparator);
+   }
 
unsigned generic_location = 0;
unsigned generic_patch_location = MAX_VARYING*4;
@@ -1590,7 +1613,8 @@ assign_varying_locations(struct gl_context *ctx,
   reserved_varying_slot(producer, ir_var_shader_out) |
   reserved_varying_slot(consumer, ir_var_shader_in);
 
-   const unsigned slots_used = matches.assign_locations(reserved_slots);
+   const unsigned slots_used = matches.assign_locations(reserved_slots,
+prog->SeparateShader);
matches.store_locations();
 
for (unsigned i = 0; i < num_tfeedback_decls; ++i) {
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] llvmpipe broken on Skylake Pentium (LP_NATIVE_VECTOR_WIDTH=128)

2015-10-13 Thread Adam Jackson
On Mon, 2015-10-12 at 22:55 +0200, Roland Scheidegger wrote:

> I don't know that looks like a generic string you're getting back.
> x86-64 IIRC implies sse2 in llvm, but not the other sseX flags (and if
> we detected sse41 we're going to use intrinsics for these, which then
> may not be available in llvm _potentially_ (I'm really not sure here as
> this code changed, and mesa generally is only adapted to such changes
> once it breaks..., but the string here will be something less generic
> like "ivybridge" usually on real hw).

This was the clue I needed, SSE4.1 intrinsics die because llvm doesn't
think they're legal for the target, because llvm 3.6 doesn't know about
broadwell or skylake so they just appear as generic.  Adding their
model numbers to the Haswell path seems to do the trick (since
apparently Haswell Pentium has the same property re AVX).

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] llvmpipe broken on Skylake Pentium (LP_NATIVE_VECTOR_WIDTH=128)

2015-10-13 Thread Roland Scheidegger
Am 13.10.2015 um 19:59 schrieb Adam Jackson:
> On Mon, 2015-10-12 at 22:55 +0200, Roland Scheidegger wrote:
> 
>> I don't know that looks like a generic string you're getting back.
>> x86-64 IIRC implies sse2 in llvm, but not the other sseX flags (and if
>> we detected sse41 we're going to use intrinsics for these, which then
>> may not be available in llvm _potentially_ (I'm really not sure here as
>> this code changed, and mesa generally is only adapted to such changes
>> once it breaks..., but the string here will be something less generic
>> like "ivybridge" usually on real hw).
> 
> This was the clue I needed, SSE4.1 intrinsics die because llvm doesn't
> think they're legal for the target, because llvm 3.6 doesn't know about
> broadwell or skylake so they just appear as generic.  Adding their
> model numbers to the Haswell path seems to do the trick (since
> apparently Haswell Pentium has the same property re AVX).
> 
> - ajax
> 

I think though it's still possible to tell llvm that these are legal by
setting mattr's, so they'd work regardless the cpu detected. So we
could/should probably fix this in mesa too. Albeit as mentioned the
mechanism there changed somewhat over time in llvm.

Roland

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] glsl: Support uint index in lower_vector_insert

2015-10-13 Thread Jordan Justen
The ES31-CTS.compute_shader.pipeline-compute-chain test case was
generating an unsigned index by using gl_LocalInvocationID.x and
gl_LocalInvocationID.y as array indices.

Signed-off-by: Jordan Justen 
---
 src/glsl/lower_vector_insert.cpp | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/glsl/lower_vector_insert.cpp b/src/glsl/lower_vector_insert.cpp
index 6d7cfa9..26d31b0 100644
--- a/src/glsl/lower_vector_insert.cpp
+++ b/src/glsl/lower_vector_insert.cpp
@@ -108,9 +108,13 @@ vector_insert_visitor::handle_rvalue(ir_rvalue **rv)
   factory.emit(assign(temp, expr->operands[0]));
   factory.emit(assign(src_temp, expr->operands[1]));
 
+  assert(expr->operands[2]->type == glsl_type::int_type ||
+ expr->operands[2]->type == glsl_type::uint_type);
+
   for (unsigned i = 0; i < expr->type->vector_elements; i++) {
  ir_constant *const cmp_index =
-new(factory.mem_ctx) ir_constant(int(i));
+ir_constant::zero(factory.mem_ctx, expr->operands[2]->type);
+ cmp_index->value.u[0] = i;
 
  ir_variable *const cmp_result =
 factory.make_temp(glsl_type::bool_type, "index_condition");
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Restore compute shader support in brw_nir_lower_inputs

2015-10-13 Thread Jordan Justen
The commit shown below caused compute shaders to hit the unreachable
in the default of the switch block. Restore compute shaders to use the
fragment shader path.

Also, simplify the fragment/compute path to only support scalar mode.

commit 2953c3d76178d7589947e6ea1dbd902b7b02b3d4
Author: Kenneth Graunke 
Date:   Fri Aug 14 15:15:11 2015 -0700

i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.

Signed-off-by: Jordan Justen 
Cc: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_nir.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index 4f35d81..357ee4f 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -96,8 +96,10 @@ brw_nir_lower_inputs(nir_shader *nir, bool is_scalar)
   }
   break;
case MESA_SHADER_FRAGMENT:
+   case MESA_SHADER_COMPUTE:
+  assert(is_scalar);
   nir_assign_var_locations(>inputs, >num_inputs,
-   is_scalar ? type_size_scalar : type_size_vec4);
+   type_size_scalar);
   break;
default:
   unreachable("unsupported shader stage");
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] glsl: Support uint index in do_vec_index_to_cond_assign

2015-10-13 Thread Jordan Justen
The ES31-CTS.compute_shader.pipeline-compute-chain test case was
generating an unsigned index by using gl_LocalInvocationID.x and
gl_LocalInvocationID.y as array indices.

Signed-off-by: Jordan Justen 
---
 src/glsl/lower_vec_index_to_cond_assign.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/glsl/lower_vec_index_to_cond_assign.cpp 
b/src/glsl/lower_vec_index_to_cond_assign.cpp
index 0c3394a..b623882 100644
--- a/src/glsl/lower_vec_index_to_cond_assign.cpp
+++ b/src/glsl/lower_vec_index_to_cond_assign.cpp
@@ -88,7 +88,9 @@ 
ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(void *mem_
exec_list list;
 
/* Store the index to a temporary to avoid reusing its tree. */
-   index = new(base_ir) ir_variable(glsl_type::int_type,
+   assert(orig_index->type == glsl_type::int_type ||
+  orig_index->type == glsl_type::uint_type);
+   index = new(base_ir) ir_variable(orig_index->type,
"vec_index_tmp_i",
ir_var_temporary);
list.push_tail(index);
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: Fix variable_referenced() for vector_{extract, insert} expressions

2015-10-13 Thread Kristian Høgsberg
On Mon, Oct 05, 2015 at 11:42:43AM +0200, Iago Toral Quiroga wrote:
> We get these when we operate on vector variables with array accessors
> (i.e. things like a[0] where 'a' is a vec4). When we call 
> variable_referenced()
> on these expressions we want to return a reference to 'a' instead of NULL.
> 
> This fixes a problem where we pass a[0] as the first argument to an atomic
> SSBO function that expects a buffer variable. In order to check this, we use
> variable_referenced(), but that is currently returning NULL in this case, 
> since
> the underlying rvalue is a vector_extract expression.

Reviewed-by: Kristian Høgsberg 

> ---
>  src/glsl/ir.cpp | 16 
>  src/glsl/ir.h   |  2 ++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
> index 2c45b9e..4c22843 100644
> --- a/src/glsl/ir.cpp
> +++ b/src/glsl/ir.cpp
> @@ -662,6 +662,22 @@ ir_expression::get_operator(const char *str)
> return (ir_expression_operation) -1;
>  }
>  
> +ir_variable *
> +ir_expression::variable_referenced() const
> +{
> +   switch (operation) {
> +  case ir_binop_vector_extract:
> +  case ir_triop_vector_insert:
> + /* We get these for things like a[0] where a is a vector type. In 
> these
> +  * cases we want variable_referenced() to return the actual vector
> +  * variable this is wrapping.
> +  */
> + return operands[0]->variable_referenced();
> +  default:
> + return ir_rvalue::variable_referenced();
> +   }
> +}
> +
>  ir_constant::ir_constant()
> : ir_rvalue(ir_type_constant)
>  {
> diff --git a/src/glsl/ir.h b/src/glsl/ir.h
> index 43a2bf0..9c9f22d 100644
> --- a/src/glsl/ir.h
> +++ b/src/glsl/ir.h
> @@ -1731,6 +1731,8 @@ public:
>  
> virtual ir_visitor_status accept(ir_hierarchical_visitor *);
>  
> +   virtual ir_variable *variable_referenced() const;
> +
> ir_expression_operation operation;
> ir_rvalue *operands[4];
>  };
> -- 
> 1.9.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-13 Thread Matt Turner
On Tue, Oct 13, 2015 at 1:49 AM, Alejandro Piñeiro  wrote:
> On 13/10/15 03:10, Matt Turner wrote:
>> Looks like this is causing an intermittent failure on HSW in our
>> Jenkins system (but I'm not able to reproduce locally) and a
>> consistent failure on G45. I'll investigate that.
>
> Ok, will hold on, and meanwhile I will try to reproduce the problem on
> HSW. Unfortunately I don't have any G45 available, so I will not be able
> to help on that. Thanks for taking a look there.

Okay, I think I see what's going on:

--- good2015-10-13 13:34:40.753357253 -0700
+++ bad 2015-10-13 13:36:06.114290094 -0700
-and(8)  g6<1>.xDg6<4,4,1>.xD1D  { align16 };
-cmp.nz.f0(8)null-g6<4,4,1>.xD   0D  { align16 };
+and.nz.f0(8)g6<1>.xDg6<4,4,1>.xD1D  { align16 };

We're propagating a .nz conditional mod from a CMP with a null dest
(that has a writemask of .xyzw) to an AND that has a writemask of only
.x, so only the .x channel of the flag is now being updated.

I think for now the thing to do is add

   (inst->dst.writemask & ~scan_inst->dst.writemask) != 0)

to the list of conditions under which we bail out. That is, if the
instruction we want to propagate the cmod onto writes fewer channels,
we can't do the optimization.

With that, the block should look like:

 if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) ||
 scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
 (scan_inst->dst.writemask != WRITEMASK_X &&
  scan_inst->dst.writemask != WRITEMASK_XYZW) ||
 (scan_inst->dst.writemask == WRITEMASK_XYZW &&
  inst->src[0].swizzle != BRW_SWIZZLE_XYZW) ||
 (inst->dst.writemask & ~scan_inst->dst.writemask) != 0)
break;

The good news is that, unless I've done something wrong, this doesn't
affect any shaders in shader-db on ILK or HSW (I only tested those
two, but I expect the results are the same everywhere). Apparently
this is a pretty rare case.

With that change, my R-b still stands, though we should have a unit
test for this case as well in 3/5.

Thanks!
Matt
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)

2015-10-13 Thread Kristian Høgsberg
On Tue, Oct 13, 2015 at 7:55 AM, Michel Thierry
 wrote:
> On 10/13/2015 3:13 PM, Emil Velikov wrote:
>>
>> On 13 October 2015 at 13:16, Michel Thierry 
>> wrote:
>>>
>>> On 10/6/2015 2:12 PM, Michel Thierry wrote:


 On 10/5/2015 7:06 PM, Kristian Høgsberg wrote:
>
>
> On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry
>  wrote:
>>
>>
>> On 9/14/2015 2:54 PM, Michał Winiarski wrote:
>>>
>>>
>>>
>>> On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote:



 Gen8+ supports 48-bit virtual addresses, but some objects must
 always be
 allocated inside the 32-bit address range.

 In specific, any resource used with flat/heapless
 (0x-0xf000)
 General State Heap (GSH) or Instruction State Heap (ISH) must be in
 a
 32-bit range, because the General State Offset and Instruction State
 Offset
 are limited to 32-bits.

 The i915 driver has been modified to provide a flag to set when the
 4GB
 limit is not necessary in a given bo
 (EXEC_OBJECT_SUPPORTS_48B_ADDRESS).
 48-bit range will only be used when explicitly requested.

 Callers to the existing drm_intel_bo_emit_reloc function should set
 the
 use_48b_address_range flag beforehand, in order to use full ppgtt
 range.

 v2: Make set/clear functions nops on pre-gen8 platforms, and use
 them
internally in emit_reloc functions (Ben)
s/48BADDRESS/48B_ADDRESS/ (Dave)
 v3: Keep set/clear functions internal, no-one needs to use them
 directly.
 v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type
before enabling set/clear function, print full offsets in
 debug
statements, using port of lower_32_bits and upper_32_bits
 from linux
kernel (Michał)

 References:

 http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html
 Cc: Ben Widawsky 
 Cc: Michał Winiarski 
>>>
>>>
>>>
>>>
>>> +Kristian
>>>
>>> LGTM.
>>> Acked-by: Michał Winiarski 
>>>
 Signed-off-by: Michel Thierry 
>>
>>
>>
>>
>>
>> Hi Kristian,
>>
>> More comments on this?
>> I've resent the mesa patch with the last changes
>>
>>
>> (http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html).
>>
>>
>> Michał has agreed with the interface and will use it for his work.
>> If mesa doesn't plan to use this for now, it's ok. The kernel changes
>> have
>> been done to prevent any regressions when the support 48-bit flag is
>> not
>> set.
>
>
>
> I didn't get any replies to my last comments on this:
>
> http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html
>
> Basically, I tried explicitly placing buffers above 8G and didn't see
> the HW problem described (ie it all worked fine).  I still think
> there's some confusion as to what the W/A is about.
>
> Here's my take: the W/A is a SW-only workaround to handle the cases
> where a driver uses heapless and 48-bit ppgtt. The problem is that the
> heaps are limited to 4G but with 48bit ppgtt a buffer can be placed
> anywhere it the 48 bit address space. If that happens it's consideredd
> out-of-bounds for the heap and access fails. To prevent this we need
> to make sure the bos we address in a heapless fashion are placed below
> 4g.
>
> For mesa, we only configure general state base as heap-less, which
> affects scratch bos. What this boils down to is that we need the 4G
> restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS
> and 3DSTATE_PS (and tesselation stage eventually). Look for the
> OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c
> and gen8_ps_state.c



 I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it
 isn't exclusive to the scratch bos (the error state points to that
 instruction).


>>>
>>> Hi,
>>>
>>> Anymore inputs about this patch?
>>> AFAIK, if changes are required based on further comments from Kristian,
>>> these will be in the mesa patch[1], not libdrm. Also, Michał will use
>>> this
>>> flag in another project.
>>>
>> The comment seems quite brief and I'm not sure it fully addresses
>> Kristian's concern. I'd suspect that providing reference to the HW
>> documentation (confirming your assumption) might be beneficial.
>>
>
> Sure, attached is the hang I get if 

Re: [Mesa-dev] [PATCH v2] nir: split SSBO min/max atomic instrinsics into signed/unsigned versions

2015-10-13 Thread Kenneth Graunke
On Tuesday, September 29, 2015 09:25:27 AM Iago Toral Quiroga wrote:
> NIR is typeless so this is the only way to keep track of the
> type to select the proper atomic to use.
> 
> v2:
>   - Use imin,imax,umin,umax for the intrinsic names (Connor Abbott)
>   - Change message for unreachable paths (Michael Schellenberger)
> ---
>  src/glsl/nir/glsl_to_nir.cpp   | 22 ++
>  src/glsl/nir/nir_intrinsics.h  |  6 --
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 20 ++--
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 22 +++---
>  4 files changed, 43 insertions(+), 27 deletions(-)

Thanks, Iago!  This looks good to me.

Reviewed-by: Kenneth Graunke 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 6/7] radeonsi: Add DCC decompress.

2015-10-13 Thread Marek Olšák
On Tue, Oct 13, 2015 at 2:19 AM, Marek Olšák  wrote:
> On Mon, Oct 12, 2015 at 5:05 PM, Bas Nieuwenhuizen
>  wrote:
>> Hi Marek,
>>
>> Thanks for the quick review.
>>
>> I do not think I understand the sharing semantics. We currently have
>> fast clear for scanout surfaces with the CMASK and eliminate it on
>> flush resource. I would think we could do that similarly with DCC fast
>> clear. Both require a flush_resource after modifying the resource
>> before other applications can use it.
>
> I meant not using fast clear for shared textures, so flush_resource
> would be a no-op. We can't use fast clear for shared textures, because
> we can't communicate clear color changes between processes.
>
>>
>> Furthermore, if we disable DCC for image stores, we also need to
>> communicate that. We could leave DCC enabled for sampling as long as
>> the DCC buffer stays in decompressed state. But we would need to
>> communicate that DCC should not be used anymore for rendering.
>
> Image stores are the only big problem preventing DCC sharing. If we
> share textures with DCC enabled, we can't disable it, which means we
> have to decompress before image stores are used, which can result in
> performance that is worse than without DCC.

I've been thinking about this and I propose the following solution:
1) If a texture isn't shared, decompress and disable DCC permanently
for that texture when we get a shader with an image store.
2) If a texture is shared, decompress before it's used by image
stores, but don't disable DCC. This case shouldn't occur and we should
avoid using image stores for 2D acceleration.

A side note: DCC and SDMA are mutually exclusive, so we must choose
carefully which one we're going to use for 2D acceleration. I'll leave
that decision to Michel. If we decide to use DCC for 2D accel, SDMA
image blit support will be pretty useless on VI.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Don't hardcode FS in "validation failed!" message.

2015-10-13 Thread Kenneth Graunke
Instead, print "Scalar VS" or "Scalar FS".  Otherwise it's really
confusing which stage is broken.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_validate.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_validate.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_validate.cpp
index d0e04f3..814c551 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_validate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_validate.cpp
@@ -32,7 +32,7 @@
 
 #define fsv_assert(cond) \
if (!(cond)) { \
-  fprintf(stderr, "ASSERT: FS validation failed!\n"); \
+  fprintf(stderr, "ASSERT: Scalar %s validation failed!\n", stage_abbrev); 
\
   dump_instruction(inst, stderr); \
   fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, #cond); \
   abort(); \
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] Implement separate index spaces for UBOs and SSBOs

2015-10-13 Thread Kristian Høgsberg
On Fri, Oct 09, 2015 at 03:23:33PM +0200, Iago Toral Quiroga wrote:
> See the rationale for this in [1], no piglit regressions observed in my
> IvyBridge laptop.
> 
> Patch 1: Renames {Num}UniformBlocks to {Num}BufferInterfaceBlocks. This is
> more consistent with the current implementation, since right now
> UniformBlocks contains both UBOs and SSBOs.
> 
> Patch 2: Adds separate UniformBlocks and ShaderStorageBlocks arrays.
> 
> Patch 3: Changes lower_ubo_reference pass to use UniformBlocks and
>  ShaderStorageBlocks instead of BufferInterfaceBlocks so we get
>  block indices in separate index spaces.
> 
> Patch 4: Changes i965 to work with separate index spaces.
> 
> I think there are other places that use BufferInterfaceBlocks after
> linking that can be rewritten to use UniformBlocks or ShaderStorageBlocks
> more efficiently. I'd like to tackle that too, but I think it makes sense
> to land this first I think.

Sure, that sounds good. I wasn't sure about the name
'BufferInterfaceBlocks', but thinking about it, I think it's just
right.  They're interface blocks that are backed by buffers (as
opposed to input or output interface blocks).

Series

Reviewed-by: Kristian Høgsberg 

> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/095951.html
> 
> Iago Toral Quiroga (4):
>   mesa: Rename {Num}UniformBlocks to {Num}BufferInterfaceBlocks
>   mesa: Add {Num}UniformBlocks and {Num}ShaderStorageBlocks to
> gl_shader{_program}
>   glsl/lower_ubo_reference: lower UBOs and SSBOs to separate index
> spaces
>   i965: Adapt SSBOs to work with their own separate index space
> 
>  src/glsl/link_uniform_initializers.cpp   |   4 +-
>  src/glsl/link_uniforms.cpp   |  22 ++---
>  src/glsl/linker.cpp  | 105 
> ++-
>  src/glsl/lower_ubo_reference.cpp |  22 +++--
>  src/glsl/nir/glsl_to_nir.cpp |   2 +-
>  src/glsl/standalone_scaffolding.cpp  |  11 ++-
>  src/mesa/drivers/dri/i965/brw_context.h  |   4 +-
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp |  24 +++---
>  src/mesa/drivers/dri/i965/brw_shader.cpp |   7 +-
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp   |  24 +++---
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c |  79 +
>  src/mesa/main/mtypes.h   |  53 +++-
>  src/mesa/main/shader_query.cpp   |   4 +-
>  src/mesa/main/shaderapi.c|   4 +-
>  src/mesa/main/shaderobj.c|   4 +-
>  src/mesa/main/uniforms.c |  12 +--
>  src/mesa/state_tracker/st_atom_constbuf.c|   4 +-
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp   |   4 +-
>  18 files changed, 257 insertions(+), 132 deletions(-)
> 
> -- 
> 1.9.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Don't hardcode FS in "validation failed!" message.

2015-10-13 Thread Kristian Høgsberg
On Tue, Oct 13, 2015 at 3:17 PM, Kenneth Graunke  wrote:
> Instead, print "Scalar VS" or "Scalar FS".  Otherwise it's really
> confusing which stage is broken.
>
> Signed-off-by: Kenneth Graunke 

Reviewed-by: Kristian Høgsberg 

> ---
>  src/mesa/drivers/dri/i965/brw_fs_validate.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_validate.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_validate.cpp
> index d0e04f3..814c551 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_validate.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_validate.cpp
> @@ -32,7 +32,7 @@
>
>  #define fsv_assert(cond) \
> if (!(cond)) { \
> -  fprintf(stderr, "ASSERT: FS validation failed!\n"); \
> +  fprintf(stderr, "ASSERT: Scalar %s validation failed!\n", 
> stage_abbrev); \
>dump_instruction(inst, stderr); \
>fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, #cond); \
>abort(); \
> --
> 2.6.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] nir: Add helpers for creating variables and adding them to lists

2015-10-13 Thread Iago Toral
On Mon, 2015-10-12 at 07:47 -0700, Jason Ekstrand wrote:
> 
> On Oct 12, 2015 1:26 AM, "Iago Toral"  wrote:
> >
> > On Fri, 2015-10-09 at 07:09 -0700, Jason Ekstrand wrote:
> > > ---
> > >  src/glsl/nir/glsl_to_nir.cpp   | 40 -
> > >  src/glsl/nir/nir.c | 66
> ++
> > >  src/glsl/nir/nir.h | 20 +
> > >  src/mesa/program/prog_to_nir.c | 19 +---
> > >  4 files changed, 99 insertions(+), 46 deletions(-)
> > >
> > > diff --git a/src/glsl/nir/glsl_to_nir.cpp
> b/src/glsl/nir/glsl_to_nir.cpp
> > > index 874e361..5250080 100644
> > > --- a/src/glsl/nir/glsl_to_nir.cpp
> > > +++ b/src/glsl/nir/glsl_to_nir.cpp
> > > @@ -389,35 +389,10 @@ nir_visitor::visit(ir_variable *ir)
> > >
> > > var->interface_type = ir->get_interface_type();
> > >
> > > -   switch (var->data.mode) {
> > > -   case nir_var_local:
> > > -  exec_list_push_tail(>locals, >node);
> > > -  break;
> > > -
> > > -   case nir_var_global:
> > > -  exec_list_push_tail(>globals, >node);
> > > -  break;
> > > -
> > > -   case nir_var_shader_in:
> > > -  exec_list_push_tail(>inputs, >node);
> > > -  break;
> > > -
> > > -   case nir_var_shader_out:
> > > -  exec_list_push_tail(>outputs, >node);
> > > -  break;
> > > -
> > > -   case nir_var_uniform:
> > > -   case nir_var_shader_storage:
> > > -  exec_list_push_tail(>uniforms, >node);
> > > -  break;
> > > -
> > > -   case nir_var_system_value:
> > > -  exec_list_push_tail(>system_values, >node);
> > > -  break;
> > > -
> > > -   default:
> > > -  unreachable("not reached");
> > > -   }
> > > +   if (var->data.mode == nir_var_local)
> > > +  nir_function_impl_add_variable(impl, var);
> > > +   else
> > > +  nir_shader_add_variable(shader, var);
> > >
> > > _mesa_hash_table_insert(var_table, ir, var);
> > > this->var = var;
> > > @@ -2023,13 +1998,10 @@ nir_visitor::visit(ir_constant *ir)
> > >  * constant initializer and return a dereference.
> > >  */
> > >
> > > -   nir_variable *var = ralloc(this->shader, nir_variable);
> > > -   var->name = ralloc_strdup(var, "const_temp");
> > > -   var->type = ir->type;
> > > -   var->data.mode = nir_var_local;
> > > +   nir_variable *var =
> > > +  nir_local_variable_create(this->impl, ir->type,
> "const_temp");
> > > var->data.read_only = true;
> > > var->constant_initializer = constant_copy(ir, var);
> > > -   exec_list_push_tail(>impl->locals, >node);
> > >
> > > this->deref_head = nir_deref_var_create(this->shader, var);
> > > this->deref_tail = >deref_head->deref;
> > > diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
> > > index e12da80..3645726 100644
> > > --- a/src/glsl/nir/nir.c
> > > +++ b/src/glsl/nir/nir.c
> > > @@ -103,6 +103,72 @@ nir_reg_remove(nir_register *reg)
> > > exec_node_remove(>node);
> > >  }
> > >
> > > +void
> > > +nir_shader_add_variable(nir_shader *shader, nir_variable *var)
> > > +{
> > > +   switch (var->data.mode) {
> > > +   case nir_var_local:
> > > +  assert(!"nir_shader_add_variable cannot be used for local
> variables");
> > > +  break;
> > > +
> > > +   case nir_var_global:
> > > +  exec_list_push_tail(>globals, >node);
> > > +  break;
> > > +
> > > +   case nir_var_shader_in:
> > > +  exec_list_push_tail(>inputs, >node);
> > > +  break;
> > > +
> > > +   case nir_var_shader_out:
> > > +  exec_list_push_tail(>outputs, >node);
> > > +  break;
> > > +
> > > +   case nir_var_uniform:
> > > +   case nir_var_shader_storage:
> > > +  exec_list_push_tail(>uniforms, >node);
> > > +  break;
> > > +
> > > +   case nir_var_system_value:
> > > +  exec_list_push_tail(>system_values, >node);
> > > +  break;
> > > +   }
> > > +}
> > > +
> > > +nir_variable *
> > > +nir_variable_create(nir_shader *shader, nir_variable_mode mode,
> > > +const struct glsl_type *type, const char
> *name)
> > > +{
> > > +   nir_variable *var = rzalloc(shader, nir_variable);
> > > +   var->name = ralloc_strdup(var, name);
> > > +   var->type = type;
> > > +   var->data.mode = mode;
> > > +
> > > +   if ((mode == nir_var_shader_in && shader->stage !=
> MESA_SHADER_VERTEX) ||
> > > +   (mode == nir_var_shader_out && shader->stage !=
> MESA_SHADER_FRAGMENT))
> > > +  var->data.interpolation = INTERP_QUALIFIER_SMOOTH;
> > > +
> > > +   if (mode == nir_var_shader_in || mode == nir_var_uniform)
> > > +  var->data.read_only = true;
> > > +
> > > +   nir_shader_add_variable(shader, var);
> > > +
> > > +   return var;
> > > +}
> > > +
> > > +nir_variable *
> > > +nir_local_variable_create(nir_function_impl *impl,
> > > +  const struct glsl_type *type, const
> char *name)
> > > +{
> > > +   nir_variable *var = rzalloc(impl->overload->function->shader,
> nir_variable);
> > > +   var->name = ralloc_strdup(var, name);
> > > +   var->type = type;