date:20150625

Re: [Mesa-dev] [PATCH] i965/gen9: use an unreserved surface alignment value

2015-06-25 Thread Pohjolainen, Topi

On Wed, Jun 24, 2015 at 05:57:13PM -0700, Anuj Phogat wrote:
 On Wed, Jun 24, 2015 at 3:51 PM, Nanley Chery nanleych...@gmail.com wrote:
  From: Nanley Chery nanley.g.ch...@intel.com
 
  Although the horizontal and vertical alignment fields are ignored here,
  0 is a reserved value for them and may cause undefined behavior. Change
  the default value to an abitrary valid one.
 
  Signed-off-by: Nanley Chery nanley.g.ch...@intel.com
  ---
   src/mesa/drivers/dri/i965/gen8_surface_state.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
 
  diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
  b/src/mesa/drivers/dri/i965/gen8_surface_state.c
  index b2d1a57..22ae960 100644
  --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
  +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
  @@ -93,7 +93,7 @@ vertical_alignment(const struct brw_context *brw,
  if (brw-gen  8 
  (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
   surf_type == BRW_SURFACE_1D))
  -  return 0;
  +  return GEN8_SURFACE_VALIGN_4;
 
  switch (mt-align_h) {
  case 4:
  @@ -118,7 +118,7 @@ horizontal_alignment(const struct brw_context *brw,
  if (brw-gen  8 
  (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
   gen9_use_linear_1d_layout(brw, mt)))
  -  return 0;
  +  return GEN8_SURFACE_HALIGN_4;
 
  switch (mt-align_w) {
  case 4:
  --
  2.4.4
 
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev
 
 Good find Nanley. We had no known issues with value 0 but it's
 always nice to avoid undefined behavior :).

Right, I thought about this when I reviewed the original. The spec says
it is ignored in these cases and hence the reserved value seemed fine. Now
that we put something meaningful there, somebody is going to compare the
spec and wonder why we set it to 4. If we added also a comment here that
says this is just an arbitrary (non-reserved) value and really ignored
by the hardware, it would prevent misunderstandings. What do you guys think?

 
 Reviewed-by: Anuj Phogat anuj.pho...@gmail.com
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] glsls: Modify exec_list to avoid strict-aliasing violations

2015-06-25 Thread Davin McCall


On 25/06/15 01:13, Dave Airlie wrote:

 -fno-strict-aliasing:with strict aliasing:
libGL.so  699188  699188(no change)
*_dri.so 9575876 9563104(-2772)


Use the size command to get the actual text segment size,

otherwise debugging symbols can drown changes.

Dave.


I had stripped debugging symbols. size gives me:

   text   databssdechexfilename
 590628   6540   1636 598804  92314 libGL.so.1.2.0

   text   databssdechexfilename
 590628   6540   1636 598804  92314 libGL.so.1.2.0

   text   databssdechexfilename
8477420 265925197861610721961 a39aa9r600_dri.so

   text   databssdechexfilename
8464211 265925197861610708752 a36710r600_dri.so

I.e. about a 10k difference in the text size of the dri modules.

Davin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 0/4] i965: use SSA values when we can

2015-06-25 Thread Connor Abbott

Before, we were using a hack where when we converted out of SSA, we set
a parent_instr field of the nir_register to indicate that the register
was actually an SSA value. But in the future, we want to handle SSA
values directly, and right now we're creating an extra nir_register for
everything, even if it's not involved in a phi node. This series removes
that hack for i965 and gets us using SSA values directly in most cases.

The only other user of nir_convert_from_ssa() is vc4, which I believed I
changed correctly, and it doesn't seem to use
nir_register::parent_instr, based on my grepping. I tried to
compile-test it, but it assumed I was using the simulator and died, so
it would be nice to at least compile-test it.

The changes are also available at:

git://people.freedesktop.org/~cwabbott0/mesa i965-use-ssa

Connor Abbott (4):
  nir/from_ssa: add a flag to not convert everything to SSA
  i965/fs: use SSA values directly
  nir: remove nir_src_get_parent_instr()
  nir: remove parent_instr from nir_register

 src/gallium/drivers/vc4/vc4_program.c  |  2 +-
 src/glsl/nir/nir.c |  1 -
 src/glsl/nir/nir.h | 25 ++--
 src/glsl/nir/nir_from_ssa.c| 33 +-
 src/mesa/drivers/dri/i965/brw_fs.h |  5 ++
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 73 ++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  1 +
 src/mesa/drivers/dri/i965/brw_nir.c|  2 +-
 .../dri/i965/brw_nir_analyze_boolean_resolves.c| 12 ++--
 9 files changed, 84 insertions(+), 70 deletions(-)

-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/4] nir: remove nir_src_get_parent_instr()

2015-06-25 Thread Connor Abbott

It's now unused.

Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/glsl/nir/nir.h | 10 --
 1 file changed, 10 deletions(-)

diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 2116f60..b33c9c5 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -565,16 +565,6 @@ nir_src_for_reg(nir_register *reg)
return src;
 }
 
-static inline nir_instr *
-nir_src_get_parent_instr(const nir_src *src)
-{
-   if (src-is_ssa) {
-  return src-ssa-parent_instr;
-   } else {
-  return src-reg.reg-parent_instr;
-   }
-}
-
 static inline nir_dest
 nir_dest_for_reg(nir_register *reg)
 {
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 08/17] i965/fs: Make better use of the builder in shader_time

2015-06-25 Thread Jason Ekstrand

On Tue, Jun 23, 2015 at 2:09 AM, Pohjolainen, Topi
topi.pohjolai...@intel.com wrote:
 On Thu, Jun 18, 2015 at 05:51:37PM -0700, Jason Ekstrand wrote:
 Previously, we were just depending on register widths to ensure that
 various things were exec_size of 1 etc.  Now, we do so explicitly using the
 builder.
 ---
  src/mesa/drivers/dri/i965/brw_fs.cpp | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index c13ac7d..740b51d 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -557,7 +557,7 @@ fs_visitor::get_timestamp(const fs_builder bld)
 /* We want to read the 3 fields we care about even if it's not enabled in
  * the dispatch.
  */
 -   bld.exec_all().MOV(dst, ts);
 +   bld.group(4, 0).exec_all().MOV(dst, ts);

 Just to make sure I understand correctly, we want SIMD4 in order to read wide
 enough to get all the mentioned 3 fields?

I believe so, yes.


 /* The caller wants the low 32 bits of the timestamp.  Since it's running
  * at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds,
 @@ -637,17 +637,19 @@ fs_visitor::emit_shader_time_end()
 start.negate = true;
 fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 1);
 diff.set_smear(0);
 -   ibld.ADD(diff, start, shader_end_time);
 +
 +   const fs_builder cbld = ibld.group(1, 0);
 +   cbld.group(1, 0).ADD(diff, start, shader_end_time);

 /* If there were no instructions between the two timestamp gets, the diff
  * is 2 cycles.  Remove that overhead, so I can forget about that when
  * trying to determine the time taken for single instructions.
  */
 -   ibld.ADD(diff, diff, fs_reg(-2u));
 -   SHADER_TIME_ADD(ibld, type, diff);
 -   SHADER_TIME_ADD(ibld, written_type, fs_reg(1u));
 +   cbld.ADD(diff, diff, fs_reg(-2u));
 +   SHADER_TIME_ADD(cbld, type, diff);
 +   SHADER_TIME_ADD(cbld, written_type, fs_reg(1u));
 ibld.emit(BRW_OPCODE_ELSE);
 -   SHADER_TIME_ADD(ibld, reset_type, fs_reg(1u));
 +   SHADER_TIME_ADD(cbld, reset_type, fs_reg(1u));
 ibld.emit(BRW_OPCODE_ENDIF);
  }

 --
 2.4.3

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v4 3/6] mesa/es3.1: enable GL_ARB_texture_multisample for GLES 3.1

2015-06-25 Thread Ilia Mirkin

On Thu, Jun 25, 2015 at 5:08 AM, Marta Lofstedt
marta.lofst...@linux.intel.com wrote:
 From: Marta Lofstedt marta.lofst...@intel.com

 v4 : only expose GL_ARB_texture_multisample enums
 for gles 3.1 and desktop GL.

I was suspicious of this logic. Based on my reading of the code, what
your ARB_texture_multisample_es31 thing does is expose those enums
when *either* the driver enables ARB_texture_multisample *or* the
current context is ES3.1.

ARB_draw_indirect_es31 has the same problem, btw.

I could have misread the get.c extra_ext() logic, but I don't think I
have. As far as I can tell there's no way to (generically) AND these
things.

What you really need to do is create a whole new [GL, GL_CORE, ES31]
section in get_hash_params and update get_hash_generator.py
accordingly.

Cheers,

  -ilia


 Signed-off-by: Marta Lofstedt marta.lofst...@intel.com
 ---
  src/mesa/main/get.c  |  6 ++
  src/mesa/main/get_hash_params.py | 18 +-
  2 files changed, 15 insertions(+), 9 deletions(-)

 diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
 index 4a0537d..6148349 100644
 --- a/src/mesa/main/get.c
 +++ b/src/mesa/main/get.c
 @@ -367,6 +367,12 @@ static const int extra_ARB_shader_atomic_counters_es31[] 
 = {
 EXTRA_END
  };

 +static const int extra_ARB_texture_multisample_es31[] = {
 +   EXT(ARB_texture_multisample),
 +   EXTRA_API_ES31,
 +   EXTRA_END
 +};
 +
  EXTRA_EXT(ARB_texture_cube_map);
  EXTRA_EXT(EXT_texture_array);
  EXTRA_EXT(NV_fog_distance);
 diff --git a/src/mesa/main/get_hash_params.py 
 b/src/mesa/main/get_hash_params.py
 index e25dbe1..0026000 100644
 --- a/src/mesa/main/get_hash_params.py
 +++ b/src/mesa/main/get_hash_params.py
 @@ -426,6 +426,15 @@ descriptor=[
[ MAX_FRAGMENT_ATOMIC_COUNTERS, 
 CONTEXT_INT(Const.Program[MESA_SHADER_FRAGMENT].MaxAtomicCounters), 
 extra_ARB_shader_atomic_counters_es31 ],
[ MAX_COMBINED_ATOMIC_COUNTER_BUFFERS, 
 CONTEXT_INT(Const.MaxCombinedAtomicBuffers), 
 extra_ARB_shader_atomic_counters_es31 ],
[ MAX_COMBINED_ATOMIC_COUNTERS, 
 CONTEXT_INT(Const.MaxCombinedAtomicCounters), 
 extra_ARB_shader_atomic_counters_es31 ],
 +
 +# GL_ARB_texture_multisample / GLES 3.1
 +  [ TEXTURE_BINDING_2D_MULTISAMPLE, LOC_CUSTOM, TYPE_INT, 
 TEXTURE_2D_MULTISAMPLE_INDEX, extra_ARB_texture_multisample_es31 ],
 +  [ TEXTURE_BINDING_2D_MULTISAMPLE_ARRAY, LOC_CUSTOM, TYPE_INT, 
 TEXTURE_2D_MULTISAMPLE_ARRAY_INDEX, extra_ARB_texture_multisample_es31 ],
 +  [ MAX_COLOR_TEXTURE_SAMPLES, CONTEXT_INT(Const.MaxColorTextureSamples), 
 extra_ARB_texture_multisample_es31 ],
 +  [ MAX_DEPTH_TEXTURE_SAMPLES, CONTEXT_INT(Const.MaxDepthTextureSamples), 
 extra_ARB_texture_multisample_es31 ],
 +  [ MAX_INTEGER_SAMPLES, CONTEXT_INT(Const.MaxIntegerSamples), 
 extra_ARB_texture_multisample_es31 ],
 +  [ SAMPLE_MASK, CONTEXT_BOOL(Multisample.SampleMask), 
 extra_ARB_texture_multisample_es31 ],
 +  [ MAX_SAMPLE_MASK_WORDS, CONST(1), extra_ARB_texture_multisample_es31 
 ],
  ]},

  # Enums in OpenGL Core profile and ES 3.1
 @@ -717,15 +726,6 @@ descriptor=[
[ TEXTURE_BUFFER_FORMAT_ARB, LOC_CUSTOM, TYPE_INT, 0, 
 extra_texture_buffer_object ],
[ TEXTURE_BUFFER_ARB, LOC_CUSTOM, TYPE_INT, 0, 
 extra_texture_buffer_object ],

 -# GL_ARB_texture_multisample / GL 3.2
 -  [ TEXTURE_BINDING_2D_MULTISAMPLE, LOC_CUSTOM, TYPE_INT, 
 TEXTURE_2D_MULTISAMPLE_INDEX, extra_ARB_texture_multisample ],
 -  [ TEXTURE_BINDING_2D_MULTISAMPLE_ARRAY, LOC_CUSTOM, TYPE_INT, 
 TEXTURE_2D_MULTISAMPLE_ARRAY_INDEX, extra_ARB_texture_multisample ],
 -  [ MAX_COLOR_TEXTURE_SAMPLES, CONTEXT_INT(Const.MaxColorTextureSamples), 
 extra_ARB_texture_multisample ],
 -  [ MAX_DEPTH_TEXTURE_SAMPLES, CONTEXT_INT(Const.MaxDepthTextureSamples), 
 extra_ARB_texture_multisample ],
 -  [ MAX_INTEGER_SAMPLES, CONTEXT_INT(Const.MaxIntegerSamples), 
 extra_ARB_texture_multisample ],
 -  [ SAMPLE_MASK, CONTEXT_BOOL(Multisample.SampleMask), 
 extra_ARB_texture_multisample ],
 -  [ MAX_SAMPLE_MASK_WORDS, CONST(1), extra_ARB_texture_multisample ],
 -
  # GL 3.0
[ CONTEXT_FLAGS, CONTEXT_INT(Const.ContextFlags), extra_version_30 ],

 --
 1.9.1

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 16/19] i965/fs: Use the builder dispatch_width for computing register offsets

2015-06-25 Thread Jason Ekstrand

Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index d4cc43d..d94a842 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -72,7 +72,7 @@ offset(fs_reg reg, const brw::fs_builder bld, unsigned delta)
case MRF:
case ATTR:
   return byte_offset(reg,
- delta * MAX2(reg.width * reg.stride, 1) *
+ delta * bld.dispatch_width() * reg.stride *
  type_sz(reg.type));
case UNIFORM:
   reg.reg_offset += delta;
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] radeonsi: add support for geometry shader invocations.

2015-06-25 Thread Dave Airlie

From: Dave Airlie airl...@redhat.com

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/radeonsi/si_shader.c| 5 +
 src/gallium/drivers/radeonsi/si_shader.h| 1 +
 src/gallium/drivers/radeonsi/si_state.c | 1 -
 src/gallium/drivers/radeonsi/si_state_shaders.c | 7 +++
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 87608a1..665ce83 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -630,6 +630,11 @@ static void declare_system_value(
 SI_PARAM_BASE_VERTEX);
break;
 
+   case TGSI_SEMANTIC_INVOCATIONID:
+   value = LLVMGetParam(radeon_bld-main_fn,
+SI_PARAM_GS_INSTANCE_ID);
+   break;
+
case TGSI_SEMANTIC_SAMPLEID:
value = get_sample_id(radeon_bld);
break;
diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
b/src/gallium/drivers/radeonsi/si_shader.h
index 51055af..b4339ae 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -115,6 +115,7 @@ struct si_shader_selector {
 
unsignedgs_output_prim;
unsignedgs_max_out_vertices;
+   unsignedgs_num_invocations;
uint64_tgs_used_inputs; /* mask of get_unique_index bits */
 };
 
diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 752467b..0dd08a2 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -3078,7 +3078,6 @@ void si_init_config(struct si_context *sctx)
si_pm4_set_reg(pm4, R_028B60_VGT_GS_VERT_ITEMSIZE_1, 0);
si_pm4_set_reg(pm4, R_028B64_VGT_GS_VERT_ITEMSIZE_2, 0);
si_pm4_set_reg(pm4, R_028B68_VGT_GS_VERT_ITEMSIZE_3, 0);
-   si_pm4_set_reg(pm4, R_028B90_VGT_GS_INSTANCE_CNT, 0);
 
si_pm4_set_reg(pm4, R_028B98_VGT_STRMOUT_BUFFER_CONFIG, 0x0);
si_pm4_set_reg(pm4, R_028AB4_VGT_REUSE_OFF, 0);
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 48128fa..eef3baa 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -76,6 +76,7 @@ static void si_shader_gs(struct si_shader *shader)
unsigned gs_vert_itemsize = shader-selector-info.num_outputs * (16  
2);
unsigned gs_max_vert_out = shader-selector-gs_max_out_vertices;
unsigned gsvs_itemsize = gs_vert_itemsize * gs_max_vert_out;
+   unsigned gs_num_invocations = shader-selector-gs_num_invocations;
unsigned cut_mode;
struct si_pm4_state *pm4;
unsigned num_sgprs, num_user_sgprs;
@@ -118,6 +119,10 @@ static void si_shader_gs(struct si_shader *shader)
 
si_pm4_set_reg(pm4, R_028B5C_VGT_GS_VERT_ITEMSIZE, gs_vert_itemsize);
 
+   si_pm4_set_reg(pm4, R_028B90_VGT_GS_INSTANCE_CNT,
+  S_028B90_CNT(MIN2(gs_num_invocations, 127)) |
+  S_028B90_ENABLE(gs_num_invocations  0));
+
va = shader-bo-gpu_address;
si_pm4_add_bo(pm4, shader-bo, RADEON_USAGE_READ, 
RADEON_PRIO_SHADER_DATA);
si_pm4_set_reg(pm4, R_00B220_SPI_SHADER_PGM_LO_GS, va  8);
@@ -490,6 +495,8 @@ static void *si_create_shader_state(struct pipe_context 
*ctx,
sel-info.properties[TGSI_PROPERTY_GS_OUTPUT_PRIM];
sel-gs_max_out_vertices =

sel-info.properties[TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES];
+   sel-gs_num_invocations =
+   sel-info.properties[TGSI_PROPERTY_GS_INVOCATIONS];
 
for (i = 0; i  sel-info.num_inputs; i++) {
unsigned name = sel-info.input_semantic_name[i];
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] gallium/hud: prevent NULL pointer dereference with pipe_query functions

2015-06-25 Thread Samuel Pitoiset




On 06/25/2015 02:36 PM, Marek Olšák wrote:

What's the point of drawing a HUD pane if a query cannot be created?


With my series which adds support for global performance counters on 
NV50, query_create() may fail if we want to monitor *two* different 
query types with the HUD.
This limitation is due to how the HUD uses the pipe_query interface and 
this doesn't fit well with the underlying interface exposed by Nouveau.


In other words, with two different query types the scenario is as follows:
CREATE Q1, BEGIN Q1, CREATE Q2, BEGIN Q2, END Q1, RESULT Q1, BEGIN Q1, 
END Q2, RESULT Q2, BEGIN Q2, END Q1, and so on.


But, with nv50/nvc0 drivers I need to schedule hardware counters at 
query creation and this is going to be pretty hard without a really 
weird workaround.
Hence, only one query type can be monitored simultaneously, and 
query_create() fails.


A better scenario for nouveau drivers will be:
CREATE Q1, CREATE Q2, BEGIN Q1, BEGIN Q2, END Q1, END Q2, RESULT Q1, 
RESULT Q2, BEGIN Q1, and so on.
This could allow to introduce, for example, begin_all_queries() and 
end_all_queries() to be able to create/begin/end all queries in one shot 
*only*.


My plan is to change this behaviour but it will require lot of changes 
in the HUD mainly because queries are collected by pane.



Can we detect this during initialization?


I'm not sure if we can detect this at initialization and if this is 
going to be easy to do.
But, how can we handle the case where a driver will only fail one time 
to create a query? Do we need to remove the pane? Not sure.
This is going to be hard to say, especially because nouveau drivers 
could fail if no hardware counters are available.




Marek

On Wed, Jun 24, 2015 at 9:26 PM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:

The HUD doesn't check if query_create() fails and it calls other
pipe_query functions with NULL pointer instead of a valid query object.

Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
---
  src/gallium/auxiliary/hud/hud_driver_query.c | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/src/gallium/auxiliary/hud/hud_driver_query.c 
b/src/gallium/auxiliary/hud/hud_driver_query.c
index 603aba7..ee71678 100644
--- a/src/gallium/auxiliary/hud/hud_driver_query.c
+++ b/src/gallium/auxiliary/hud/hud_driver_query.c
@@ -62,7 +62,8 @@ query_new_value(struct hud_graph *gr)
 uint64_t now = os_time_get();

 if (info-last_time) {
-  pipe-end_query(pipe, info-query[info-head]);
+  if (info-query[info-head])
+ pipe-end_query(pipe, info-query[info-head]);

/* read query results */
while (1) {
@@ -70,7 +71,7 @@ query_new_value(struct hud_graph *gr)
   union pipe_query_result result;
   uint64_t *res64 = (uint64_t *)result;

- if (pipe-get_query_result(pipe, query, FALSE, result)) {
+ if (query  pipe-get_query_result(pipe, query, FALSE, result)) {
  info-results_cumulative += res64[info-result_index];
  info-num_results++;

@@ -88,7 +89,8 @@ query_new_value(struct hud_graph *gr)
 gallium_hud: all queries are busy after %i frames, 
 can't add another query\n,
 NUM_QUERIES);
-   pipe-destroy_query(pipe, info-query[info-head]);
+   if (info-query[info-head])
+  pipe-destroy_query(pipe, info-query[info-head]);
 info-query[info-head] =
   pipe-create_query(pipe, info-query_type, 0);
  }
@@ -113,15 +115,15 @@ query_new_value(struct hud_graph *gr)
   info-results_cumulative = 0;
   info-num_results = 0;
}
-
-  pipe-begin_query(pipe, info-query[info-head]);
 }
 else {
/* initialize */
info-last_time = now;
info-query[info-head] = pipe-create_query(pipe, info-query_type, 0);
-  pipe-begin_query(pipe, info-query[info-head]);
 }
+
+   if (info-query[info-head])
+  pipe-begin_query(pipe, info-query[info-head]);
  }

  static void
--
2.4.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome

2015-06-25 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=90264

--- Comment #23 from Furkan fal...@gmail.com ---
Still there for me.

I'm using Ubuntu 15.04, Chrome 43.0.2357.130, Linux 4.1, and Mesa
10.7~git1506250730.d1663c~gd~v from oibaf ppa.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/6] i965: Define HW-binding table and resource streamer control opcodes

2015-06-25 Thread Abdiel Janulgue

v2: Use macros for HW binding table edits (Topi)
v3: Add Broadwell support.
v4: Make hardware binding table bit definitions even more clearer (Ken)

Cc: kenn...@whitecape.org
Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
Reviewed-by: Kenneth Graunke kenn...@whitecape.org
Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com
---
 src/mesa/drivers/dri/i965/brw_defines.h | 30 ++
 src/mesa/drivers/dri/i965/intel_reg.h   |  3 +++
 2 files changed, 33 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 66b9abc..ca0064e 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1640,6 +1640,36 @@ enum brw_message_target {
 #define _3DSTATE_BINDING_TABLE_POINTERS_GS 0x7829 /* GEN7+ */
 #define _3DSTATE_BINDING_TABLE_POINTERS_PS 0x782A /* GEN7+ */
 
+#define _3DSTATE_BINDING_TABLE_POOL_ALLOC   0x7919 /* GEN7.5+ */
+#define BRW_HW_BINDING_TABLE_ENABLE (1  11)
+#define GEN7_HW_BT_POOL_MOCS_SHIFT  7
+#define GEN7_HW_BT_POOL_MOCS_MASK   INTEL_MASK(10, 7)
+#define GEN8_HW_BT_POOL_MOCS_SHIFT  0
+#define GEN8_HW_BT_POOL_MOCS_MASK   INTEL_MASK(6, 0)
+/* Only required in HSW */
+#define HSW_BT_POOL_ALLOC_MUST_BE_ONE   (3  5)
+
+#define _3DSTATE_BINDING_TABLE_EDIT_VS  0x7843 /* GEN7.5 */
+#define _3DSTATE_BINDING_TABLE_EDIT_GS  0x7844 /* GEN7.5 */
+#define _3DSTATE_BINDING_TABLE_EDIT_HS  0x7845 /* GEN7.5 */
+#define _3DSTATE_BINDING_TABLE_EDIT_DS  0x7846 /* GEN7.5 */
+#define _3DSTATE_BINDING_TABLE_EDIT_PS  0x7847 /* GEN7.5 */
+#define BRW_BINDING_TABLE_INDEX_SHIFT   16
+#define BRW_BINDING_TABLE_INDEX_MASKINTEL_MASK(23, 16)
+
+#define BRW_BINDING_TABLE_EDIT_TARGET_ALL   3
+#define BRW_BINDING_TABLE_EDIT_TARGET_CORE1 2
+#define BRW_BINDING_TABLE_EDIT_TARGET_CORE0 1
+/* In HSW, when editing binding table entries to surface state offsets,
+ * the surface state offset is a 16-bit value aligned to 32 bytes. But
+ * Surface State Pointer in dword 2 is [15:0]. Right shift surf_offset
+ * by 5 bits so it won't disturb bit 16 (which is used as the binding
+ * table index entry), otherwise it would hang the GPU.
+ */
+#define HSW_SURFACE_STATE_EDIT(value)   (value  5)
+/* Same as Haswell, but surface state offsets now aligned to 64 bytes.*/
+#define GEN8_SURFACE_STATE_EDIT(value)  (value  6)
+
 #define _3DSTATE_SAMPLER_STATE_POINTERS0x7802 /* GEN6+ */
 # define PS_SAMPLER_STATE_CHANGE   (1  12)
 # define GS_SAMPLER_STATE_CHANGE   (1  9)
diff --git a/src/mesa/drivers/dri/i965/intel_reg.h 
b/src/mesa/drivers/dri/i965/intel_reg.h
index bd14e18..98adf45 100644
--- a/src/mesa/drivers/dri/i965/intel_reg.h
+++ b/src/mesa/drivers/dri/i965/intel_reg.h
@@ -47,6 +47,9 @@
 /* Load a value from memory into a register.  Only available on Gen7+. */
 #define GEN7_MI_LOAD_REGISTER_MEM  (CMD_MI | (0x29  23))
 # define MI_LOAD_REGISTER_MEM_USE_GGTT (1  22)
+/* Haswell RS control */
+#define MI_RS_CONTROL   (CMD_MI | (0x6  23))
+#define MI_RS_STORE_DATA_IMM(CMD_MI | (0x2b  23))
 
 /* Manipulate the predicate bit based on some register values. Only on Gen7+ */
 #define GEN7_MI_PREDICATE  (CMD_MI | (0xC  23))
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/6] i965: Enable resource streamer for the batchbuffer

2015-06-25 Thread Abdiel Janulgue

Check first if the hardware and kernel supports resource streamer. If this
is allowed, tell the kernel to enable the resource streamer enable bit on
MI_BATCHBUFFER_START by specifying I915_EXEC_RESOURCE_STREAMER
execbuffer flags.

v2: - Use new I915_PARAM_HAS_RESOURCE_STREAMER ioctl to check if kernel
  supports RS (Ken).
- Add brw_device_info::has_resource_streamer and toggle it for
  Haswell, Broadwell, Cherryview, Skylake, and Broxton (Ken).
v3: - Update I915_PARAM_HAS_RESOURCE_STREAMER to match updated kernel.
v4: - Always inspect the getparam.value (Chris Wilson).

Cc: kenn...@whitecape.org
Cc: ch...@chris-wilson.co.uk
Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com
---
 src/mesa/drivers/dri/i965/brw_context.c   |  5 +
 src/mesa/drivers/dri/i965/brw_context.h   |  1 +
 src/mesa/drivers/dri/i965/brw_device_info.c   |  5 -
 src/mesa/drivers/dri/i965/brw_device_info.h   |  1 +
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  8 +++-
 src/mesa/drivers/dri/i965/intel_screen.c  | 14 ++
 src/mesa/drivers/dri/i965/intel_screen.h  |  5 +
 7 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 4b51fe5..6b08216 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -865,6 +865,11 @@ brwCreateContext(gl_api api,
 
brw-predicate.state = BRW_PREDICATE_STATE_RENDER;
 
+   brw-use_resource_streamer = devinfo-has_resource_streamer 
+  screen-has_resource_streamer 
+  (brw_env_var_as_boolean(INTEL_USE_HW_BT, false) ||
+   brw_env_var_as_boolean(INTEL_USE_GATHER, false));
+
ctx-VertexProgram._MaintainTnlProgram = true;
ctx-FragmentProgram._MaintainTexEnvProgram = true;
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 3553f6e..10d8f1e 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1130,6 +1130,7 @@ struct brw_context
bool has_pln;
bool no_simd8;
bool use_rep_send;
+   bool use_resource_streamer;
 
/**
 * Some versions of Gen hardware don't do centroid interpolation correctly
diff --git a/src/mesa/drivers/dri/i965/brw_device_info.c 
b/src/mesa/drivers/dri/i965/brw_device_info.c
index 342e566..51a91b6 100644
--- a/src/mesa/drivers/dri/i965/brw_device_info.c
+++ b/src/mesa/drivers/dri/i965/brw_device_info.c
@@ -170,7 +170,8 @@ static const struct brw_device_info brw_device_info_byt = {
 #define HSW_FEATURES \
GEN7_FEATURES,\
.is_haswell = true,   \
-   .supports_simd16_3src = true
+   .supports_simd16_3src = true, \
+   .has_resource_streamer = true
 
 static const struct brw_device_info brw_device_info_hsw_gt1 = {
HSW_FEATURES, .gt = 1,
@@ -229,6 +230,7 @@ static const struct brw_device_info brw_device_info_hsw_gt3 
= {
 #define GEN8_FEATURES   \
.gen = 8,\
.has_hiz_and_separate_stencil = true,\
+   .has_resource_streamer = true,   \
.must_use_separate_stencil = true,   \
.has_llc = true, \
.has_pln = true, \
@@ -301,6 +303,7 @@ static const struct brw_device_info brw_device_info_chv = {
 #define GEN9_FEATURES   \
.gen = 9,\
.has_hiz_and_separate_stencil = true,\
+   .has_resource_streamer = true,   \
.must_use_separate_stencil = true,   \
.has_llc = true, \
.has_pln = true, \
diff --git a/src/mesa/drivers/dri/i965/brw_device_info.h 
b/src/mesa/drivers/dri/i965/brw_device_info.h
index 7b7a1fc..2a73e93 100644
--- a/src/mesa/drivers/dri/i965/brw_device_info.h
+++ b/src/mesa/drivers/dri/i965/brw_device_info.h
@@ -46,6 +46,7 @@ struct brw_device_info
bool has_compr4;
bool has_surface_tile_offset;
bool supports_simd16_3src;
+   bool has_resource_streamer;
 
/**
 * Quirks:
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 54081a1..4e82003 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -267,6 +267,11 @@ throttle(struct brw_context *brw)
}
 }
 
+/* Drop when RS headers get pulled to libdrm */
+#ifndef I915_EXEC_RESOURCE_STREAMER
+#define I915_EXEC_RESOURCE_STREAMER (115)
+#endif
+
 /* TODO: Push this whole function into bufmgr.
  */
 static int
@@ -293,7 +298,8 @@ do_flush_locked(struct brw_context *brw)
   if (brw-gen = 6  batch-ring == BLT_RING) {
  flags = I915_EXEC_BLT;
   } else {
- flags = I915_EXEC_RENDER;
+ flags = I915_EXEC_RENDER |
+

[Mesa-dev] [PATCH 5/6] i965: Upload binding tables in hw-generated binding table format.

2015-06-25 Thread Abdiel Janulgue

When hardware-generated binding tables are enabled, use the hw-generated
binding table format when uploading binding table state.

Normally, the CS will will just consume the binding table pointer commands
as pipelined state. When the RS is enabled however, the RS flushes whatever
edited surface state entries of our on-chip binding table to the binding
table pool before passing the command on to the CS.

Note that the the binding table pointer offset is relative to the binding table
pool base address when resource streamer instead of the surface state base 
address.

v2: Fix possible buffer overflow when allocating a chunk out of the
hw-binding table pool (Ken).

Cc: kenn...@whitecape.org
Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 73 --
 1 file changed, 57 insertions(+), 16 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index a0bc2f3..a438d10 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -50,6 +50,26 @@ static const GLuint stage_to_bt_edit[MESA_SHADER_FRAGMENT + 
1] = {
_3DSTATE_BINDING_TABLE_EDIT_PS,
 };
 
+static uint32_t
+reserve_hw_bt_space(struct brw_context *brw, unsigned bytes)
+{
+   if (brw-hw_bt_pool.next_offset + bytes = brw-hw_bt_pool.bo-size - 128) {
+  gen7_reset_hw_bt_pool_offsets(brw);
+   }
+
+   uint32_t offset = brw-hw_bt_pool.next_offset;
+
+   /* From the Haswell PRM, Volume 2b: Command Reference: Instructions,
+* 3DSTATE_BINDING_TABLE_POINTERS_xS:
+*
+* If HW Binding Table is enabled, the offset is relative to the
+*  Binding Table Pool Base Address and the alignment is 64 bytes.
+*/
+   brw-hw_bt_pool.next_offset += ALIGN(bytes, 64);
+
+   return offset;
+}
+
 /**
  * Upload a shader stage's binding table as indirect state.
  *
@@ -70,30 +90,51 @@ brw_upload_binding_table(struct brw_context *brw,
 
   stage_state-bind_bo_offset = 0;
} else {
-  /* Upload a new binding table. */
-  if (INTEL_DEBUG  DEBUG_SHADER_TIME) {
- brw-vtbl.emit_buffer_surface_state(
-brw, stage_state-surf_offset[
-prog_data-binding_table.shader_time_start],
-brw-shader_time.bo, 0, BRW_SURFACEFORMAT_RAW,
-brw-shader_time.bo-size, 1, true);
+  /* When RS is enabled use hw-binding table uploads, otherwise fallback to
+   * software-uploads.
+   */
+  if (brw-use_resource_streamer) {
+ gen7_update_binding_table_from_array(brw, stage_state-stage,
+  stage_state-surf_offset,
+  prog_data-binding_table
+  .size_bytes / 4);
+  } else {
+ /* Upload a new binding table. */
+ if (INTEL_DEBUG  DEBUG_SHADER_TIME) {
+brw-vtbl.emit_buffer_surface_state(
+   brw, stage_state-surf_offset[
+  prog_data-binding_table.shader_time_start],
+   brw-shader_time.bo, 0, BRW_SURFACEFORMAT_RAW,
+   brw-shader_time.bo-size, 1, true);
+ }
+
+ uint32_t *bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
+  prog_data-binding_table.size_bytes,
+  32,
+  stage_state-bind_bo_offset);
+
+ /* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */
+ memcpy(bind, stage_state-surf_offset,
+prog_data-binding_table.size_bytes);
   }
-
-  uint32_t *bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
-   prog_data-binding_table.size_bytes, 32,
-   stage_state-bind_bo_offset);
-
-  /* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */
-  memcpy(bind, stage_state-surf_offset,
- prog_data-binding_table.size_bytes);
}
 
brw-ctx.NewDriverState |= brw_new_binding_table;
 
if (brw-gen = 7) {
+
+  if (brw-use_resource_streamer)
+ stage_state-bind_bo_offset =
+reserve_hw_bt_space(brw, prog_data-binding_table.size_bytes);
+
   BEGIN_BATCH(2);
   OUT_BATCH(packet_name  16 | (2 - 2));
-  OUT_BATCH(stage_state-bind_bo_offset);
+  /* Align SurfaceStateOffset[16:6] format to [15:5] PS Binding Table field
+   * when hw-generated binding table is enabled.
+   */
+  OUT_BATCH(brw-use_resource_streamer ?
+(stage_state-bind_bo_offset  1) :
+stage_state-bind_bo_offset);
   ADVANCE_BATCH();
}
 }
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/6] i965: Implement interface to edit binding table entries

2015-06-25 Thread Abdiel Janulgue

Unlike normal software binding tables where the driver has to manually
generate and fill a binding table array which are then uploaded to the
hardware, the resource streamer instead presents the driver with an option
to fill out slots for individual binding table indices. The hardware
accumulates the state for these combined edits which it then automatically
flushes to a binding table pool when the binding table pointer state
command is invoked.

v2: Clarify binding table edit bit aligment (Topi).
v3: Make comments and function names more clearer (Ken).

Reviewed-by: Kenneth Graunke kenn...@whitecape.org
Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 55 ++
 src/mesa/drivers/dri/i965/brw_state.h  |  9 +
 2 files changed, 64 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index 6bc540f..a0bc2f3 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -44,6 +44,12 @@
 #include brw_state.h
 #include intel_batchbuffer.h
 
+static const GLuint stage_to_bt_edit[MESA_SHADER_FRAGMENT + 1] = {
+   _3DSTATE_BINDING_TABLE_EDIT_VS,
+   _3DSTATE_BINDING_TABLE_EDIT_GS,
+   _3DSTATE_BINDING_TABLE_EDIT_PS,
+};
+
 /**
  * Upload a shader stage's binding table as indirect state.
  *
@@ -171,6 +177,55 @@ const struct brw_tracked_state brw_gs_binding_table = {
 };
 
 /**
+ * Edit a single entry in a hardware-generated binding table
+ */
+void
+gen7_edit_hw_binding_table_entry(struct brw_context *brw,
+ gl_shader_stage stage,
+ uint32_t index,
+ uint32_t surf_offset)
+{
+   assert(stage = MESA_SHADER_FRAGMENT);
+
+   uint32_t dw2 = SET_FIELD(index, BRW_BINDING_TABLE_INDEX) |
+  (brw-gen = 8 ? GEN8_SURFACE_STATE_EDIT(surf_offset) :
+   HSW_SURFACE_STATE_EDIT(surf_offset));
+
+   BEGIN_BATCH(3);
+   OUT_BATCH(stage_to_bt_edit[stage]  16 | (3 - 2));
+   OUT_BATCH(BRW_BINDING_TABLE_EDIT_TARGET_ALL);
+   OUT_BATCH(dw2);
+   ADVANCE_BATCH();
+}
+
+/**
+ * Upload a whole hardware binding table for the given stage.
+ *
+ * Takes an array of surface offsets and the number of binding table
+ * entries.
+ */
+void
+gen7_update_binding_table_from_array(struct brw_context *brw,
+ gl_shader_stage stage,
+ const uint32_t* binding_table,
+ int num_surfaces)
+{
+   uint32_t dw2 = 0;
+   assert(stage = MESA_SHADER_FRAGMENT);
+
+   BEGIN_BATCH(num_surfaces + 2);
+   OUT_BATCH(stage_to_bt_edit[stage]  16 | num_surfaces);
+   OUT_BATCH(BRW_BINDING_TABLE_EDIT_TARGET_ALL);
+   for (int i = 0; i  num_surfaces; i++) {
+  dw2 = SET_FIELD(i, BRW_BINDING_TABLE_INDEX) |
+ (brw-gen = 8 ? GEN8_SURFACE_STATE_EDIT(binding_table[i]) :
+  HSW_SURFACE_STATE_EDIT(binding_table[i]));
+  OUT_BATCH(dw2);
+   }
+   ADVANCE_BATCH();
+}
+
+/**
  * Hardware-generated binding tables for the resource streamer
  */
 void
diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index f8ef98f..2eff1b5 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -374,6 +374,15 @@ gen7_upload_constant_state(struct brw_context *brw,
bool active, unsigned opcode);
 
 void gen7_rs_control(struct brw_context *brw, int enable);
+
+void gen7_edit_hw_binding_table_entry(struct brw_context *brw,
+  gl_shader_stage stage,
+  uint32_t index,
+  uint32_t surf_offset);
+void gen7_update_binding_table_from_array(struct brw_context *brw,
+  gl_shader_stage stage,
+  const uint32_t* binding_table,
+  int num_surfaces);
 void gen7_enable_hw_binding_tables(struct brw_context *brw);
 void gen7_disable_hw_binding_tables(struct brw_context *brw);
 void gen7_reset_hw_bt_pool_offsets(struct brw_context *brw);
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 6/6] i965: Disable resource streamer in BLORP

2015-06-25 Thread Abdiel Janulgue

Switch off hardware-generated binding tables and gather push
constants in the blorp. Blorp requires only a minimal set of
simple constants. There is no need for the extra complexity
to program a gather table entry into the pipeline.

Cc: kenn...@whitecape.org
Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com
---
 src/mesa/drivers/dri/i965/gen7_blorp.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp 
b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
index abace6d..9822dc1 100644
--- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
@@ -794,6 +794,8 @@ gen7_blorp_exec(struct brw_context *brw,
}
depthstencil_offset = gen6_blorp_emit_depth_stencil_state(brw, params);
gen7_blorp_emit_depth_stencil_state_pointers(brw, depthstencil_offset);
+   if (brw-use_resource_streamer)
+  gen7_disable_hw_binding_tables(brw);
if (params-use_wm_prog) {
   uint32_t wm_surf_offset_renderbuffer;
   uint32_t wm_surf_offset_texture = 0;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/6] i965: Enable hardware-generated binding tables on render path.

2015-06-25 Thread Abdiel Janulgue

This patch implements the binding table enable command which is also
used to allocate a binding table pool where where hardware-generated
binding table entries are flushed into. Each binding table offset in
the binding table pool is unique per each shader stage that are
enabled within a batch.

Also insert the required brw_tracked_state objects to enable
hw-generated binding tables in normal render path.

v2: - Use MOCS in binding table pool alloc for GEN8
- Fix spurious offset when allocating binding table pool entry
  and start from zero instead.
v3  - Include GEN8 fix for spurious offset above.

Cc: kenn...@whitecape.org
Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 87 ++
 src/mesa/drivers/dri/i965/brw_context.c|  4 ++
 src/mesa/drivers/dri/i965/brw_context.h|  6 ++
 src/mesa/drivers/dri/i965/brw_state.h  |  6 ++
 src/mesa/drivers/dri/i965/brw_state_upload.c   |  4 ++
 src/mesa/drivers/dri/i965/gen7_disable.c   |  4 +-
 src/mesa/drivers/dri/i965/gen8_disable.c   |  4 +-
 src/mesa/drivers/dri/i965/intel_batchbuffer.c  |  4 ++
 8 files changed, 115 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index 98ff0dd..6bc540f 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -170,6 +170,93 @@ const struct brw_tracked_state brw_gs_binding_table = {
.emit = brw_gs_upload_binding_table,
 };
 
+/**
+ * Hardware-generated binding tables for the resource streamer
+ */
+void
+gen7_disable_hw_binding_tables(struct brw_context *brw)
+{
+   int pkt_len = brw-gen = 8 ? 4 : 3;
+
+   BEGIN_BATCH(3);
+   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC  16 | (pkt_len - 2));
+   OUT_BATCH(brw-is_haswell ? HSW_BT_POOL_ALLOC_MUST_BE_ONE : 0);
+   OUT_BATCH(0);
+   ADVANCE_BATCH();
+
+   /* From the Haswell PRM, Volume 7: 3D Media GPGPU,
+* 3DSTATE_BINDING_TABLE_POOL_ALLOC  Programming Note:
+*
+* When switching between HW and SW binding table generation, SW must
+* issue a state cache invalidate.
+*/
+   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE);
+}
+
+void
+gen7_enable_hw_binding_tables(struct brw_context *brw)
+{
+   if (!brw-use_resource_streamer)
+  return;
+
+   if (!brw-hw_bt_pool.bo) {
+  /* We use a single re-usable buffer object for the lifetime of the
+   * context and size it to maximum allowed binding tables that can be
+   * programmed per batch:
+   *
+   * From the Haswell PRM, Volume 7: 3D Media GPGPU,
+   * 3DSTATE_BINDING_TABLE_POOL_ALLOC  Programming Note:
+   * A maximum of 16,383 Binding tables are allowed in any batch buffer
+   */
+  static const int max_size = 16383 * 4;
+  brw-hw_bt_pool.bo = drm_intel_bo_alloc(brw-bufmgr, hw_bt,
+  max_size, 64);
+  brw-hw_bt_pool.next_offset = 0;
+   }
+
+   int pkt_len = brw-gen = 8 ? 4 : 3;
+   uint32_t dw1 = BRW_HW_BINDING_TABLE_ENABLE;
+   if (brw-is_haswell)
+  dw1 |= SET_FIELD(GEN7_MOCS_L3, GEN7_HW_BT_POOL_MOCS) |
+ HSW_BT_POOL_ALLOC_MUST_BE_ONE;
+   else if (brw-gen = 8)
+  dw1 |= BDW_MOCS_WB;
+
+   BEGIN_BATCH(3);
+   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC  16 | (pkt_len - 2));
+   if (brw-gen = 8) {
+  OUT_RELOC64(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1);
+  OUT_BATCH(brw-hw_bt_pool.bo-size);
+   } else {
+  OUT_RELOC(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1);
+  OUT_RELOC(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0,
+ brw-hw_bt_pool.bo-size);
+   }
+   ADVANCE_BATCH();
+
+   /* From the Haswell PRM, Volume 7: 3D Media GPGPU,
+* 3DSTATE_BINDING_TABLE_POOL_ALLOC  Programming Note:
+*
+* When switching between HW and SW binding table generation, SW must
+* issue a state cache invalidate.
+*/
+   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE);
+}
+
+void
+gen7_reset_hw_bt_pool_offsets(struct brw_context *brw)
+{
+   brw-hw_bt_pool.next_offset = 0;
+}
+
+const struct brw_tracked_state gen7_hw_binding_tables = {
+   .dirty = {
+  .mesa = 0,
+  .brw = BRW_NEW_BATCH,
+   },
+   .emit = gen7_enable_hw_binding_tables
+};
+
 /** @} */
 
 /**
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 6b08216..f2bd2d3 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -936,6 +936,10 @@ intelDestroyContext(__DRIcontext * driContextPriv)
if (brw-wm.base.scratch_bo)
   drm_intel_bo_unreference(brw-wm.base.scratch_bo);
 
+   gen7_reset_hw_bt_pool_offsets(brw);
+   drm_intel_bo_unreference(brw-hw_bt_pool.bo);
+   brw-hw_bt_pool.bo = NULL;
+
drm_intel_gem_context_destroy(brw-hw_ctx);
 
if (ctx-swrast_context) {
diff --git

[Mesa-dev] i965: hardware-generated binding tables

2015-06-25 Thread Abdiel Janulgue

Time for another rebase, yet again. My previous patches were getting stale...

Changes since last posting:
 - Always inspect the getparam.value in the I915_PARAM_HAS_RESOURCE_STREAMER
   ioctl as suggested by Chris Wilson.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa : NULL check InfoLog

2015-06-25 Thread Lofstedt, Marta

 -Original Message-
 From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
 Behalf Of Matt Turner
 Sent: Tuesday, June 23, 2015 11:05 PM
 To: Marta Lofstedt
 Cc: mesa-dev@lists.freedesktop.org
 Subject: Re: [Mesa-dev] [PATCH] mesa : NULL check InfoLog

 On Tue, Jun 23, 2015 at 4:03 AM, Marta Lofstedt
 marta.lofst...@linux.intel.com wrote:
  From: Marta Lofstedt marta.lofst...@intel.com

  When a program is compiled, but linking failed the
  sh-InfoLog could be NULL. This is expoloited
  by OpenGL ES 3.1 conformance tests.

  Signed-off-by: Marta Lofstedt marta.lofst...@intel.com
  ---
   src/mesa/main/shaderapi.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

  diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
  index a4296ad..c783c69 100644
  --- a/src/mesa/main/shaderapi.c
  +++ b/src/mesa/main/shaderapi.c
  @@ -1920,8 +1920,8 @@ _mesa_create_shader_program(struct
 gl_context* ctx, GLboolean separate,
  }
   #endif
   }
  -
  -ralloc_strcat(shProg-InfoLog, sh-InfoLog);
  + if (sh-InfoLog)
  +   ralloc_strcat(shProg-InfoLog, sh-InfoLog);

 Wrong indentation on this line.
There is a strange mix of tabs and space indentation in this file. I hope it is
Correct of me to use space indentation in my V2.

 Surely just not writing to the info log isn't the right fix? If it's null, 
 shouldn't
 we instead ralloc_strdup() the string?
I ralloc_strdup the shProg-InfoLog in my V2.

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
--
Intel Sweden AB
Registered Office: Knarrarnasgatan 15, 164 40 Kista, Stockholm, Sweden
Registration Number: 556189-6027

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC] gallium/hud: fix issue w/ tgsi_to_nir

2015-06-25 Thread Marek Olšák

This documentation was only written for TEMPs. I agree that TEMP
indexing doesn't necessarily need arrays to work, but INPUT and OUTPUT
indexing always needs them.

Marek

On Thu, Jun 25, 2015 at 1:31 AM, Rob Clark robdcl...@gmail.com wrote:
 tgsi.rst currently says:

 
 If no ArrayID is specified with an indirect addressing operand the whole
 register file might be accessed by this operand. This is strongly discouraged
 and will prevent packing of scalar/vec2 arrays and effective alias analysis.
 

 I'd be pretty happy to remove it and replace it w/ something to the
 effect of indirect addressing where no ArrayID is specified is
 undefined :-)

 BR,
 -R

 On Wed, Jun 24, 2015 at 6:17 PM, Marek Olšák mar...@gmail.com wrote:
 Indirect addressing without the ArrayID is undefined in the general
 case. You need the ArrayID to be able to tell where the array
 declaration starts and what its semantic name is.

 Marek

 On Wed, Jun 24, 2015 at 8:35 PM, Rob Clark robdcl...@gmail.com wrote:
 On Wed, Jun 24, 2015 at 12:31 PM, Marek Olšák mar...@gmail.com wrote:
 Yes, ArrayID != 0 means it's an array, otherwise it's just a compact
 way to add declarations.

 ok.. hmm, I guess tgsi.rst needs an update then..

 I recently added the array support for inputs and outputs, we just
 need to enable it on non-radeon non-swrast drivers:
 http://cgit.freedesktop.org/mesa/mesa/commit/?id=b6ebe7eabf54936a02acc0968e718e0c264a73f5

 ok, in principle (after fixing ttn based on the assumption that we
 won't get indirect access if ArrayID==0, and now that tgsi f/e is
 dropped) freedreno should be ok for array in/out's..  I'll enable the
 cap and see what happens after fixing ttn and debugging some indirect
 issues (seems either there is something I don't understand about the
 hw yet, or maybe an a4xx bug.. right now shaders that I think should
 work aren't and I'm in android hell trying to get some more traces
 from blob :-/)

 BR,
 -R

 It would be nice to enable the arrays on all drivers instead of
 working around it indefinitely.

 Marek

 On Wed, Jun 24, 2015 at 1:31 PM, Rob Clark robdcl...@gmail.com wrote:
 Ok, I *thought* we didn't get ArrayID on IN/OUT, but only TEMP?  If it
 is safe to assume that we always get ArrayID that makes it much
 easier.

 BR,
 -R

 On Wed, Jun 24, 2015 at 5:39 AM, Marek Olšák mar...@gmail.com wrote:
 It's not an array, because the ArrayID is 0. It's a valid non-array
 declaration. If any TGSI user doesn't understand it, that user should
 be fixed.

 Marek

 On Tue, Jun 23, 2015 at 3:20 PM, Rob Clark robdcl...@gmail.com wrote:
 From: Rob Clark robcl...@freedesktop.org

 Ok, so actually there is a ttn issue here to fix as well.. but it
 brought up a question in my mind.  When ttn sees something like

   DCL IN[0..1]

 it will treat that as an array (which in the end will result in
 constraints about where the registers get allocated.  Which is not
 really ideal.

 With glsl we don't actually get input arrays (but instead a bunch
 of MOV's to a TEMP array) currently.  So I'm not quite sure how
 an actual input array should look.  (But my preference would be
 IN[a..b] for arrays and IN[c] otherwise)
 ---
  src/gallium/auxiliary/hud/hud_context.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/auxiliary/hud/hud_context.c 
 b/src/gallium/auxiliary/hud/hud_context.c
 index 6a124f7..2b6d3a7 100644
 --- a/src/gallium/auxiliary/hud/hud_context.c
 +++ b/src/gallium/auxiliary/hud/hud_context.c
 @@ -1163,7 +1163,8 @@ hud_create(struct pipe_context *pipe, struct 
 cso_context *cso)
 {
static const char *vertex_shader_text = {
   VERT\n
 - DCL IN[0..1]\n
 + DCL IN[0]\n
 + DCL IN[1]\n
   DCL OUT[0], POSITION\n
   DCL OUT[1], COLOR[0]\n /* color */
   DCL OUT[2], GENERIC[0]\n /* texcoord */
 --
 2.4.3

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] glsls: Modify exec_list to avoid strict-aliasing violations

2015-06-25 Thread Eero Tamminen


Hi,

On 06/25/2015 02:48 AM, Davin McCall wrote:

In terms of performance:

(export LIBGL_ALWAYS_SOFTWARE=1; time glmark2)


For Intel driver, INTEL_NO_HW=1 could be used.

(Do other drivers have something similar?)



-fno-strict-aliasing:

glmark2 Score: 244
real5m34.707s
user11m36.192s
sys0m29.596s

with strict aliasing:

glmark2 Score: 247
real5m34.438s
user11m29.904s
sys0m29.556s

Again, only a very small improvement when strict aliasing is enabled.
With the above in mind it is reasonable to question whether this patch
is worthwhile.


IMHO to evaluate that, we would need data for more drivers, not just for 
SW rendering.




However, it's done, and it seems to work, so I offer it for review.



- Eero


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] gallium/hud: prevent NULL pointer dereference with pipe_query functions

2015-06-25 Thread Marek Olšák

What's the point of drawing a HUD pane if a query cannot be created?
Can we detect this during initialization?

Marek

On Wed, Jun 24, 2015 at 9:26 PM, Samuel Pitoiset
samuel.pitoi...@gmail.com wrote:
 The HUD doesn't check if query_create() fails and it calls other
 pipe_query functions with NULL pointer instead of a valid query object.

 Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com
 ---
  src/gallium/auxiliary/hud/hud_driver_query.c | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)

 diff --git a/src/gallium/auxiliary/hud/hud_driver_query.c 
 b/src/gallium/auxiliary/hud/hud_driver_query.c
 index 603aba7..ee71678 100644
 --- a/src/gallium/auxiliary/hud/hud_driver_query.c
 +++ b/src/gallium/auxiliary/hud/hud_driver_query.c
 @@ -62,7 +62,8 @@ query_new_value(struct hud_graph *gr)
 uint64_t now = os_time_get();

 if (info-last_time) {
 -  pipe-end_query(pipe, info-query[info-head]);
 +  if (info-query[info-head])
 + pipe-end_query(pipe, info-query[info-head]);

/* read query results */
while (1) {
 @@ -70,7 +71,7 @@ query_new_value(struct hud_graph *gr)
   union pipe_query_result result;
   uint64_t *res64 = (uint64_t *)result;

 - if (pipe-get_query_result(pipe, query, FALSE, result)) {
 + if (query  pipe-get_query_result(pipe, query, FALSE, result)) {
  info-results_cumulative += res64[info-result_index];
  info-num_results++;

 @@ -88,7 +89,8 @@ query_new_value(struct hud_graph *gr)
 gallium_hud: all queries are busy after %i frames, 
 can't add another query\n,
 NUM_QUERIES);
 -   pipe-destroy_query(pipe, info-query[info-head]);
 +   if (info-query[info-head])
 +  pipe-destroy_query(pipe, info-query[info-head]);
 info-query[info-head] =
   pipe-create_query(pipe, info-query_type, 0);
  }
 @@ -113,15 +115,15 @@ query_new_value(struct hud_graph *gr)
   info-results_cumulative = 0;
   info-num_results = 0;
}
 -
 -  pipe-begin_query(pipe, info-query[info-head]);
 }
 else {
/* initialize */
info-last_time = now;
info-query[info-head] = pipe-create_query(pipe, info-query_type, 
 0);
 -  pipe-begin_query(pipe, info-query[info-head]);
 }
 +
 +   if (info-query[info-head])
 +  pipe-begin_query(pipe, info-query[info-head]);
  }

  static void
 --
 2.4.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] glsls: Modify exec_list to avoid strict-aliasing violations

2015-06-25 Thread Davin McCall


On 25/06/15 14:32, Eero Tamminen wrote:

Hi,

On 06/25/2015 03:53 PM, Davin McCall wrote:

On 25/06/15 12:27, Eero Tamminen wrote:

On 06/25/2015 02:48 AM, Davin McCall wrote:

In terms of performance:

(export LIBGL_ALWAYS_SOFTWARE=1; time glmark2)


For Intel driver, INTEL_NO_HW=1 could be used.

(Do other drivers have something similar?)


Unfortunately I do not have an Intel display set up.


If you can get libINTEL_DEVID_OVERRIDEdrm to use libdrm_intel, you can 
fake desired

InteL HW to Mesa with INTEL_DEVID_OVERRIDE environment variable.

Similarly to INTEL_NO_HW, it prevents batches from being submitted
to GPU.



Ok, thanks, I'll look into this shortly. Any pointers on how to get 
libdrm to use libdrm_intel?





When testing 3D driver CPU side optimizations, one should either use 
test specifically written for testing driver CPU overhead (proprietary 
benchmarks have such tests) or force test-case to be CPU bound e.g. 
with INTEL_NO_HW.




Understood. The 'user' time divided by the glmark2 score should however 
give a relative indication of the CPU processing required per frame, right?


Davin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa: remove unnecessary checks in _mesa_readpixels_needs_slow_path

2015-06-25 Thread Marek Olšák

Gallium should be alright. We'll let you know if we find a regression,
but I don't think there will be any.

Reviewed-by: Marek Olšák marek.ol...@amd.com

Marek

On Thu, Jun 25, 2015 at 2:45 AM, Anuj Phogat anuj.pho...@gmail.com wrote:
 On Tue, Jun 23, 2015 at 3:34 AM, Iago Toral Quiroga ito...@igalia.com wrote:
 readpixels_can_use_memcpy will later call 
 _mesa_format_matches_format_and_type
 which does much tighter checks than these to decide if we can use
 memcpy for readpixels.

 Also, the checks do not seem to be extensive enough anyway, since we are
 checking for signed/unsigned conversion only when the framebuffer has 
 integers,
 but the same checks could be done for other types anyway, since as long as
 there is a signed/unsigned conversion we can't memcpy.

 No regressions observed on i965/llvmpipe.
 ---
 The way gallium uses _mesa_format_matches_format_and_type and
 _mesa_readpixels_needs_slow_path is a bit different, they call these
 functions to decide if they want to fallback to Mesa's implementation
 of ReadPixels, not exactly to check if we can memcpy... so it is unclear
 to me if some combinations may still need these checks to make the right
 decision. I did not see any regressions with llvmpipe at least, so I am
 assuming that they are not needed, but maybe someone wants to give this
 patch a test run on nouveau or radeon, just in case. If they are needed
 I guess the right thing would be to move the checks to st_cb_readpixels.c.

  src/mesa/main/readpix.c | 16 
  1 file changed, 16 deletions(-)

 diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
 index a3357cd..e256695 100644
 --- a/src/mesa/main/readpix.c
 +++ b/src/mesa/main/readpix.c
 @@ -128,7 +128,6 @@ _mesa_readpixels_needs_slow_path(const struct gl_context 
 *ctx, GLenum format,
  {
 struct gl_renderbuffer *rb =
   _mesa_get_read_renderbuffer_for_format(ctx, format);
 -   GLenum srcType;

 assert(rb);

 @@ -153,21 +152,6 @@ _mesa_readpixels_needs_slow_path(const struct 
 gl_context *ctx, GLenum format,
   return GL_TRUE;
}

 -  /* Conversion between signed and unsigned integers needs masking
 -   * (it isn't just memcpy). */
 -  srcType = _mesa_get_format_datatype(rb-Format);
 -
 -  if ((srcType == GL_INT 
 -   (type == GL_UNSIGNED_INT ||
 -type == GL_UNSIGNED_SHORT ||
 -type == GL_UNSIGNED_BYTE)) ||
 -  (srcType == GL_UNSIGNED_INT 
 -   (type == GL_INT ||
 -type == GL_SHORT ||
 -type == GL_BYTE))) {
 - return GL_TRUE;
 -  }
 -
/* And finally, see if there are any transfer ops. */
return get_readpixels_transfer_ops(ctx, rb-Format, format, type,
   uses_blit) != 0;
 --
 1.9.1

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

 Thanks for the patch Iago.
 Reviewed-by: Anuj Phogat anuj.pho...@gmail.com

 You might want to wait few days to hear any comments on the gallium
 usage of the function. If no one objects you can push it upstream.
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome

2015-06-25 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=90264

--- Comment #21 from Jose P. lbdkm...@sharklasers.com ---
This seems to have been fixed, I can't see the corruption anymore. I'm using
Chromium 43.0.2357.81 on Ubuntu 14.04 (64-bit) and Mesa 10.7.0-devel
(git-20dca37 2015-06-23 trusty-oibaf-ppa).

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] configure: use $target_cpu, not $host_cpu when setting asm_arch

2015-06-25 Thread Brian Paul



I've got 32-bit libs building on 64-bit Ubuntu and Fedora.  But I've 
found a weird problem.


On Fedora 22, for example, Mesa's make install creates a 
/usr/lib/libGL.la file which contains the line:


dependency_libs=' -L/usr/lib -lexpat -L/usr/lib64 /usr/lib64/libglapi.la 
-lXext -lXdamage -lXfixes -lX11-xcb -lX11 -lxcb-glx -lxcb-dri2 -lxcb 
-lXxf86vm /usr/lib/libdrm.la -ludev -lm -lpthread -ldl'


Note the -L/usr/lib/64 and /usr/lib64/libglapi.la parts.

When I try to build 32-bit Mesa demos, libtools finds this libGL.la file 
and tries to link the 32-bit glxgears.o (for example) with 
/usr/lib64/libglapi.so.  That fails, of course.


Removing the /usr/lib/libGL.la file is a work-around but I'm interested 
in finding a better solution.


-Brian
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC] gallium/hud: fix issue w/ tgsi_to_nir

2015-06-25 Thread Rob Clark

hmm, well I still think tgsi.rst should get some clarification, if for
no other reason than I misunderstood it (and therefore sooner or later
I guess someone else will too)..

I can take a shot at a doc patch..

BR,
-R

On Thu, Jun 25, 2015 at 8:00 AM, Marek Olšák mar...@gmail.com wrote:
 This documentation was only written for TEMPs. I agree that TEMP
 indexing doesn't necessarily need arrays to work, but INPUT and OUTPUT
 indexing always needs them.

 Marek

 On Thu, Jun 25, 2015 at 1:31 AM, Rob Clark robdcl...@gmail.com wrote:
 tgsi.rst currently says:

 
 If no ArrayID is specified with an indirect addressing operand the whole
 register file might be accessed by this operand. This is strongly discouraged
 and will prevent packing of scalar/vec2 arrays and effective alias analysis.
 

 I'd be pretty happy to remove it and replace it w/ something to the
 effect of indirect addressing where no ArrayID is specified is
 undefined :-)

 BR,
 -R

 On Wed, Jun 24, 2015 at 6:17 PM, Marek Olšák mar...@gmail.com wrote:
 Indirect addressing without the ArrayID is undefined in the general
 case. You need the ArrayID to be able to tell where the array
 declaration starts and what its semantic name is.

 Marek

 On Wed, Jun 24, 2015 at 8:35 PM, Rob Clark robdcl...@gmail.com wrote:
 On Wed, Jun 24, 2015 at 12:31 PM, Marek Olšák mar...@gmail.com wrote:
 Yes, ArrayID != 0 means it's an array, otherwise it's just a compact
 way to add declarations.

 ok.. hmm, I guess tgsi.rst needs an update then..

 I recently added the array support for inputs and outputs, we just
 need to enable it on non-radeon non-swrast drivers:
 http://cgit.freedesktop.org/mesa/mesa/commit/?id=b6ebe7eabf54936a02acc0968e718e0c264a73f5

 ok, in principle (after fixing ttn based on the assumption that we
 won't get indirect access if ArrayID==0, and now that tgsi f/e is
 dropped) freedreno should be ok for array in/out's..  I'll enable the
 cap and see what happens after fixing ttn and debugging some indirect
 issues (seems either there is something I don't understand about the
 hw yet, or maybe an a4xx bug.. right now shaders that I think should
 work aren't and I'm in android hell trying to get some more traces
 from blob :-/)

 BR,
 -R

 It would be nice to enable the arrays on all drivers instead of
 working around it indefinitely.

 Marek

 On Wed, Jun 24, 2015 at 1:31 PM, Rob Clark robdcl...@gmail.com wrote:
 Ok, I *thought* we didn't get ArrayID on IN/OUT, but only TEMP?  If it
 is safe to assume that we always get ArrayID that makes it much
 easier.

 BR,
 -R

 On Wed, Jun 24, 2015 at 5:39 AM, Marek Olšák mar...@gmail.com wrote:
 It's not an array, because the ArrayID is 0. It's a valid non-array
 declaration. If any TGSI user doesn't understand it, that user should
 be fixed.

 Marek

 On Tue, Jun 23, 2015 at 3:20 PM, Rob Clark robdcl...@gmail.com wrote:
 From: Rob Clark robcl...@freedesktop.org

 Ok, so actually there is a ttn issue here to fix as well.. but it
 brought up a question in my mind.  When ttn sees something like

   DCL IN[0..1]

 it will treat that as an array (which in the end will result in
 constraints about where the registers get allocated.  Which is not
 really ideal.

 With glsl we don't actually get input arrays (but instead a bunch
 of MOV's to a TEMP array) currently.  So I'm not quite sure how
 an actual input array should look.  (But my preference would be
 IN[a..b] for arrays and IN[c] otherwise)
 ---
  src/gallium/auxiliary/hud/hud_context.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/auxiliary/hud/hud_context.c 
 b/src/gallium/auxiliary/hud/hud_context.c
 index 6a124f7..2b6d3a7 100644
 --- a/src/gallium/auxiliary/hud/hud_context.c
 +++ b/src/gallium/auxiliary/hud/hud_context.c
 @@ -1163,7 +1163,8 @@ hud_create(struct pipe_context *pipe, struct 
 cso_context *cso)
 {
static const char *vertex_shader_text = {
   VERT\n
 - DCL IN[0..1]\n
 + DCL IN[0]\n
 + DCL IN[1]\n
   DCL OUT[0], POSITION\n
   DCL OUT[1], COLOR[0]\n /* color */
   DCL OUT[2], GENERIC[0]\n /* texcoord */
 --
 2.4.3

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] glsls: Modify exec_list to avoid strict-aliasing violations

2015-06-25 Thread Davin McCall


Hi Eero,

On 25/06/15 12:27, Eero Tamminen wrote:

Hi,

On 06/25/2015 02:48 AM, Davin McCall wrote:

In terms of performance:

(export LIBGL_ALWAYS_SOFTWARE=1; time glmark2)


For Intel driver, INTEL_NO_HW=1 could be used.

(Do other drivers have something similar?)


Unfortunately I do not have an Intel display set up. What exactly would 
INTEL_NO_HW=1 do?





Again, only a very small improvement when strict aliasing is enabled.
With the above in mind it is reasonable to question whether this patch
is worthwhile.


IMHO to evaluate that, we would need data for more drivers, not just 
for SW rendering.


Given that we are talking about using a compiler option when compiling 
Mesa itself, I do not imagine that we would see a more significant 
change in performance when using HW assisted rendering. The code that 
runs on the GPU is the same in either case.


Using the R600 driver (and Radeon 6770 hardware) I get:


With -fno-strict-aliasing:
glmark2 Score: 1614
real5m31.729s
user0m43.828s
sys0m20.892s

With strict aliasing:
glmark2 Score: 1626
real5m31.725s
user0m43.496s
sys0m20.924s


This is all on x86, 32-bit. It's possible that more of an improvement 
would be seen on less register-starved architectures (including Intel 
x86-64), since more register spills/loads can potentially be avoided. I 
don't have the capability to test that right now. It's entirely possible 
that compiling without -fno-strict-aliasing on x86-64 will not (at 
present, even with this patch applied) produce a working library.


Davin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: Delete linked GLSL IR when using NIR.

2015-06-25 Thread Tapani Pälli

Huh I see this went in already, I've noticed a problem and thought to 
share it.


Currently program resource list (used by gl api shader queries) is 
generated in linker, before backend LinkShader call. What this means is 
that it relies on frontend optimization passes to get rid of dead inputs 
and outputs. But .. this does not seem to always happen, sometimes these 
get removed only during backend optimization passes. I have a bug on 
this as #90925.


There's 2 possibilities to move with this, either move resource list 
creation to happen after LinkShader (which would mean we should not free 
IR during LinkShader) or try to fix frontend dead code removal to 
recognize the case in the bug. I will keep digging why the variable in 
question is not recognized by the frontend passes, just wanted to let 
you know!



On 06/11/2015 07:40 PM, Jason Ekstrand wrote:

On Thu, Jun 11, 2015 at 12:41 AM, Tapani Pälli tapani.pa...@intel.com wrote:

This is based on Kenneth's patch to delete 'most of the IR'. Due to
linker changes to clone variables, we can now free all of IR.

Saves 58MB of memory when replaying a Dota 2 trace on Broadwell.


I think we've saved ~50 MB 3 times now on that one dota trace.  Good work guys!


Signed-off-by: Tapani Pälli tapani.pa...@intel.com
---
  src/mesa/drivers/dri/i965/brw_shader.cpp | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 76285f2..99de1cd 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -297,8 +297,11 @@ brw_link_shader(struct gl_context *ctx, struct 
gl_shader_program *shProg)

brw_add_texrect_params(prog);

-  if (options-NirOptions)
+  if (options-NirOptions) {
   prog-nir = brw_create_nir(brw, shProg, prog, (gl_shader_stage) 
stage);
+ ralloc_free(shader-ir);
+ shader-ir = NULL;
+  }

_mesa_reference_program(ctx, prog, NULL);
 }
--
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2] mesa : NULL check InfoLog

2015-06-25 Thread Marta Lofstedt

From: Marta Lofstedt marta.lofst...@intel.com

When a program is compiled, but linking failed the
sh-InfoLog could be NULL. This is expoloited
by OpenGL ES 3.1 conformance tests.

V2: ralloc_strdup shProg-InfoLog

Signed-off-by: Marta Lofstedt marta.lofst...@intel.com
---
 src/mesa/main/shaderapi.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index a4296ad..bc6625a 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -1921,7 +1921,10 @@ _mesa_create_shader_program(struct gl_context* ctx, 
GLboolean separate,
 #endif
 }
 
-ralloc_strcat(shProg-InfoLog, sh-InfoLog);
+ if (sh-InfoLog)
+ralloc_strcat(shProg-InfoLog, sh-InfoLog);
+ else
+ralloc_strdup(ctx, shProg-InfoLog);
   }
 
   delete_shader(ctx, shader);
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/3] i965: Stop aux data compare preventing program binary re-use

2015-06-25 Thread Topi Pohjolainen

Items in the program cache consist of three things: key, the data
representing the instructions and auxiliary data representing
uniform storage. The data consisting of instructions is stored into
a drm buffer object while the key and the auxiliary data reside in
malloced section. Now the cache uploading is equipped with a check
that iterates over existing items and seeks to find a another item
using identical instruction data than the one being just uploaded.
If such is found there is no need to add another section into the
drm buffer object holding identical copy of the existing one. The
item just being uploaded should instead simply point to the same
offset in the underlying drm buffer object.

Unfortunately the check for the matching instruction data is
coupled with a check for matching auxiliary data also. This
effectively prevents the cache from ever containing two items
that could share a section in the drm buffer object.

The constraint for the instruction data and auxiliary data to
match is, fortunately, unnecessary strong. When items are stored
into the cache they will anyway contain their own copy of the
auxiliary data (even if they matched - which they in real world
never will). The only thing the items would be sharing is the
instruction data and hence we should only check for that to match
and nothing else.

No piglit regression in jenkins.

CC: Kenneth Graunke kenn...@whitecape.org
Signed-off-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_state_cache.c | 52 +++--
 1 file changed, 20 insertions(+), 32 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c 
b/src/mesa/drivers/dri/i965/brw_state_cache.c
index d42b4b4..a0ec280 100644
--- a/src/mesa/drivers/dri/i965/brw_state_cache.c
+++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
@@ -200,36 +200,23 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t 
new_size)
 }
 
 /**
- * Attempts to find an item in the cache with identical data and aux
- * data to use
+ * Attempts to find an item in the cache with identical data.
  */
-static bool
-brw_try_upload_using_copy(struct brw_cache *cache,
- struct brw_cache_item *result_item,
- const void *data,
- const void *aux)
+static const struct brw_cache_item *
+brw_lookup_prog(const struct brw_cache *cache,
+enum brw_cache_id cache_id,
+const void *data, unsigned data_size)
 {
-   struct brw_context *brw = cache-brw;
+   const struct brw_context *brw = cache-brw;
int i;
-   struct brw_cache_item *item;
+   const struct brw_cache_item *item;
 
for (i = 0; i  cache-size; i++) {
   for (item = cache-items[i]; item; item = item-next) {
-const void *item_aux = item-key + item-key_size;
 int ret;
 
-if (item-cache_id != result_item-cache_id ||
-item-size != result_item-size ||
-item-aux_size != result_item-aux_size) {
-   continue;
-}
-
- if (cache-aux_compare[result_item-cache_id]) {
-if (!cache-aux_compare[result_item-cache_id](item_aux, aux))
-   continue;
- } else if (memcmp(item_aux, aux, item-aux_size) != 0) {
+if (item-cache_id != cache_id || item-size != data_size)
continue;
-}
 
  if (!brw-has_llc)
 drm_intel_bo_map(cache-bo, false);
@@ -239,13 +226,11 @@ brw_try_upload_using_copy(struct brw_cache *cache,
 if (ret)
continue;
 
-result_item-offset = item-offset;
-
-return true;
+return item;
   }
}
 
-   return false;
+   return NULL;
 }
 
 static uint32_t
@@ -294,6 +279,8 @@ brw_upload_cache(struct brw_cache *cache,
 {
struct brw_context *brw = cache-brw;
struct brw_cache_item *item = CALLOC_STRUCT(brw_cache_item);
+   const struct brw_cache_item *matching_data =
+  brw_lookup_prog(cache, cache_id, data, data_size);
GLuint hash;
void *tmp;
 
@@ -305,14 +292,15 @@ brw_upload_cache(struct brw_cache *cache,
hash = hash_key(item);
item-hash = hash;
 
-   /* If we can find a matching prog/prog_data combo in the cache
-* already, then reuse the existing stuff.  This will mean not
-* flagging CACHE_NEW_* when transitioning between the two
-* equivalent hash keys.  This is notably useful for programs
-* generating shaders at runtime, where multiple shaders may
-* compile to the thing in our backend.
+   /* If we can find a matching prog in the cache already, then reuse the
+* existing stuff without creating new copy into the underlying buffer
+* object. This is notably useful for programs generating shaders at
+* runtime, where multiple shaders may compile to the same thing in our
+* backend.
 */
-   if (!brw_try_upload_using_copy(cache, item, data, aux)) {
+   if (matching_data) {
+  item-offset = matching_data-offset;
+   } else {

[Mesa-dev] [PATCH 2/3] i965: Only write program to cache when it doesn't exist yet

2015-06-25 Thread Topi Pohjolainen

Current logic re-writes the same data when existing data is found.
Not that this actually matters at the moment in practice, the
contraint for finding matching data is too severe to ever allow
data to be shared between two items in the cache.

CC: Kenneth Graunke kenn...@whitecape.org
Signed-off-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_state_cache.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c 
b/src/mesa/drivers/dri/i965/brw_state_cache.c
index 97a41b9..d42b4b4 100644
--- a/src/mesa/drivers/dri/i965/brw_state_cache.c
+++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
@@ -314,6 +314,13 @@ brw_upload_cache(struct brw_cache *cache,
 */
if (!brw_try_upload_using_copy(cache, item, data, aux)) {
   item-offset = brw_alloc_item_data(cache, data_size);
+
+  /* Copy data to the buffer */
+  if (brw-has_llc) {
+ memcpy((char *)cache-bo-virtual + item-offset, data, data_size);
+  } else {
+ drm_intel_bo_subdata(cache-bo, item-offset, data_size, data);
+  }
}
 
/* Set up the memory containing the key and aux_data */
@@ -332,13 +339,6 @@ brw_upload_cache(struct brw_cache *cache,
cache-items[hash] = item;
cache-n_items++;
 
-   /* Copy data to the buffer */
-   if (brw-has_llc) {
-  memcpy((char *) cache-bo-virtual + item-offset, data, data_size);
-   } else {
-  drm_intel_bo_subdata(cache-bo, item-offset, data_size, data);
-   }
-
*out_offset = item-offset;
*(void **)out_aux = (void *)((char *)item-key + item-key_size);
cache-brw-ctx.NewDriverState |= 1  cache_id;
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/3] i965: Rename brw_upload_item_data to brw_alloc_item_data

2015-06-25 Thread Topi Pohjolainen

and simplify the interface to take directly the size and to return
the offset. The routine does nothing more than allocate, it doesn't
upload anything.

CC: Kenneth Graunke kenn...@whitecape.org
Signed-off-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_state_cache.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c 
b/src/mesa/drivers/dri/i965/brw_state_cache.c
index 157b33d..97a41b9 100644
--- a/src/mesa/drivers/dri/i965/brw_state_cache.c
+++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
@@ -248,18 +248,17 @@ brw_try_upload_using_copy(struct brw_cache *cache,
return false;
 }
 
-static void
-brw_upload_item_data(struct brw_cache *cache,
-struct brw_cache_item *item,
-const void *data)
+static uint32_t
+brw_alloc_item_data(struct brw_cache *cache, uint32_t size)
 {
+   uint32_t offset;
struct brw_context *brw = cache-brw;
 
/* Allocate space in the cache BO for our new program. */
-   if (cache-next_offset + item-size  cache-bo-size) {
+   if (cache-next_offset + size  cache-bo-size) {
   uint32_t new_size = cache-bo-size * 2;
 
-  while (cache-next_offset + item-size  new_size)
+  while (cache-next_offset + size  new_size)
 new_size *= 2;
 
   brw_cache_new_bo(cache, new_size);
@@ -273,10 +272,12 @@ brw_upload_item_data(struct brw_cache *cache,
   brw_cache_new_bo(cache, cache-bo-size);
}
 
-   item-offset = cache-next_offset;
+   offset = cache-next_offset;
 
/* Programs are always 64-byte aligned, so set up the next one now */
-   cache-next_offset = ALIGN(item-offset + item-size, 64);
+   cache-next_offset = ALIGN(offset + size, 64);
+
+   return offset;
 }
 
 void
@@ -312,7 +313,7 @@ brw_upload_cache(struct brw_cache *cache,
 * compile to the thing in our backend.
 */
if (!brw_try_upload_using_copy(cache, item, data, aux)) {
-  brw_upload_item_data(cache, item, data);
+  item-offset = brw_alloc_item_data(cache, data_size);
}
 
/* Set up the memory containing the key and aux_data */
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome

2015-06-25 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=90264

Boyan Ding stu_...@126.com changed:

   What|Removed |Added

 CC||stu_...@126.com

--- Comment #22 from Boyan Ding stu_...@126.com ---
This bug also occur on my computer with Intel graphics. It seems that it only
occur with DRI3 enabled, tested with both xf86-video-intel and
xf86-video-modesetting drivers.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] glsls: Modify exec_list to avoid strict-aliasing violations

2015-06-25 Thread Eero Tamminen


Hi,

On 06/25/2015 03:53 PM, Davin McCall wrote:

On 25/06/15 12:27, Eero Tamminen wrote:

On 06/25/2015 02:48 AM, Davin McCall wrote:

In terms of performance:

(export LIBGL_ALWAYS_SOFTWARE=1; time glmark2)


For Intel driver, INTEL_NO_HW=1 could be used.

(Do other drivers have something similar?)


Unfortunately I do not have an Intel display set up.


If you can get libINTEL_DEVID_OVERRIDEdrm to use libdrm_intel, you can 
fake desired

InteL HW to Mesa with INTEL_DEVID_OVERRIDE environment variable.

Similarly to INTEL_NO_HW, it prevents batches from being submitted
to GPU.



What exactly would INTEL_NO_HW=1 do?


http://www.mesa3d.org/envvars.html

INTEL_NO_HW - if set to 1, prevents batches from being submitted to the 
hardware.


I.e. driver does only CPU / user-space side things and skips GPU / 
kernel side work.




Again, only a very small improvement when strict aliasing is enabled.
With the above in mind it is reasonable to question whether this patch
is worthwhile.


IMHO to evaluate that, we would need data for more drivers, not just
for SW rendering.


Given that we are talking about using a compiler option when compiling
Mesa itself, I do not imagine that we would see a more significant
change in performance when using HW assisted rendering. The code that
runs on the GPU is the same in either case.


When testing 3D driver CPU side optimizations, one should either use 
test specifically written for testing driver CPU overhead (proprietary 
benchmarks have such tests) or force test-case to be CPU bound e.g. with 
INTEL_NO_HW.




Using the R600 driver (and Radeon 6770 hardware) I get:


With -fno-strict-aliasing:
glmark2 Score: 1614
real5m31.729s
user0m43.828s
sys0m20.892s

With strict aliasing:
glmark2 Score: 1626
real5m31.725s
user0m43.496s
sys0m20.924s


This is all on x86, 32-bit. It's possible that more of an improvement
would be seen on less register-starved architectures (including Intel
x86-64), since more register spills/loads can potentially be avoided. I
don't have the capability to test that right now. It's entirely possible
that compiling without -fno-strict-aliasing on x86-64 will not (at
present, even with this patch applied) produce a working library.



- Eero


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 02/19] i965/fs: Actually set/use the mlen for gen7 uniform pull constant loads

2015-06-25 Thread Jason Ekstrand

Previously, we were allocating the payload with different sizes per gen and
then figuring out the mlen in the generator based on gen.  This meant,
among other things, that the higher level passes knew nothing about it.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 19 ---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  9 +++--
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 3ec8e6a..31dfb24 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2909,14 +2909,18 @@ fs_visitor::lower_uniform_pull_constant_loads()
  assert(const_offset_reg.file == IMM 
 const_offset_reg.type == BRW_REGISTER_TYPE_UD);
  const_offset_reg.fixed_hw_reg.dw1.ud /= 4;
- fs_reg payload = fs_reg(GRF, alloc.allocate(1));
 
- /* We have to use a message header on Skylake to get SIMD4x2 mode.
-  * Reserve space for the register.
-  */
+ fs_reg payload, offset;
  if (devinfo-gen = 9) {
-payload.reg_offset++;
-alloc.sizes[payload.reg] = 2;
+/* We have to use a message header on Skylake to get SIMD4x2
+ * mode.  Reserve space for the register.
+*/
+offset = payload = fs_reg(GRF, alloc.allocate(2));
+offset.reg_offset++;
+inst-mlen = 2;
+ } else {
+offset = payload = fs_reg(GRF, alloc.allocate(1));
+inst-mlen = 1;
  }
 
  /* This is actually going to be a MOV, but since only the first dword
@@ -2925,7 +2929,7 @@ fs_visitor::lower_uniform_pull_constant_loads()
   * by live variable analysis, or register allocation will explode.
   */
  fs_inst *setup = new(mem_ctx) fs_inst(FS_OPCODE_SET_SIMD4X2_OFFSET,
-   8, payload, const_offset_reg);
+   8, offset, const_offset_reg);
  setup-force_writemask_all = true;
 
  setup-ir = inst-ir;
@@ -2938,6 +2942,7 @@ fs_visitor::lower_uniform_pull_constant_loads()
   */
  inst-opcode = FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7;
  inst-src[1] = payload;
+ inst-base_mrf = -1;
 
  invalidate_live_intervals();
   } else {
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 2ed0bac..8d821ab 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1054,7 +1054,6 @@ 
fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst,
struct brw_reg index,
struct brw_reg offset)
 {
-   assert(inst-mlen == 0);
assert(index.type == BRW_REGISTER_TYPE_UD);
 
assert(offset.file == BRW_GENERAL_REGISTER_FILE);
@@ -1069,12 +1068,10 @@ 
fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst,
 
struct brw_reg src = offset;
bool header_present = false;
-   int mlen = 1;
 
if (devinfo-gen = 9) {
   /* Skylake requires a message header in order to use SIMD4x2 mode. */
-  src = retype(brw_vec4_grf(offset.nr - 1, 0), BRW_REGISTER_TYPE_UD);
-  mlen = 2;
+  src = retype(brw_vec4_grf(offset.nr, 0), BRW_REGISTER_TYPE_UD);
   header_present = true;
 
   brw_push_insn_state(p);
@@ -1105,7 +1102,7 @@ 
fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst,
   0, /* LD message ignores sampler unit */
   GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
   1, /* rlen */
-  mlen,
+  inst-mlen,
   header_present,
   BRW_SAMPLER_SIMD_MODE_SIMD4X2,
   0);
@@ -1135,7 +1132,7 @@ 
fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst,
   0, /* LD message ignores sampler unit */
   GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
   1, /* rlen */
-  mlen,
+  inst-mlen,
   header_present,
   BRW_SAMPLER_SIMD_MODE_SIMD4X2,
   0);
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 05/19] i965/fs: Explicitly set the exec_size on the add(32) in interpolation setup

2015-06-25 Thread Jason Ekstrand

Soon we will start using the builder to explicitly set all the execution
sizes.  We could make a 32-wide builder, but the builder asserts that we
never grow it which is usually a reasonable assumption.  Sinc this one
instruction is a bit of an odd-ball, we just set the exec_size explicitly.

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

v2: Explicitly new the fs_inst instead of using the builder and setting
exec_size after the fact.

v3: Set force_writemask_all with the builder instead of directly.  The
builder over-writes it if we set it manually.  Also, if we don't have
force_writemask_all in the builder it will assert-fail on SIMD32.
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 9a4bad6..8976c25 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1357,10 +1357,12 @@ fs_visitor::emit_interpolation_setup_gen6()
*/
   fs_reg int_pixel_xy(GRF, alloc.allocate(dispatch_width / 8),
   BRW_REGISTER_TYPE_UW, dispatch_width * 2);
-  abld.exec_all()
-  .ADD(int_pixel_xy,
-   fs_reg(stride(suboffset(g1_uw, 4), 1, 4, 0)),
-   fs_reg(brw_imm_v(0x11001010)));
+  fs_inst *add =
+ new (mem_ctx) fs_inst(BRW_OPCODE_ADD, dispatch_width * 2,
+   int_pixel_xy,
+   fs_reg(stride(suboffset(g1_uw, 4), 1, 4, 0)),
+   fs_reg(brw_imm_v(0x11001010)));
+  abld.exec_all().emit(add);
 
   this-pixel_x = vgrf(glsl_type::float_type);
   this-pixel_y = vgrf(glsl_type::float_type);
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 10/19] i965/fs: Make better use of the builder in shader_time

2015-06-25 Thread Jason Ekstrand

Previously, we were just depending on register widths to ensure that
various things were exec_size of 1 etc.  Now, we do so explicitly using the
builder.

Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 9855bfb..8024fae 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -557,7 +557,7 @@ fs_visitor::get_timestamp(const fs_builder bld)
/* We want to read the 3 fields we care about even if it's not enabled in
 * the dispatch.
 */
-   bld.exec_all().MOV(dst, ts);
+   bld.group(4, 0).exec_all().MOV(dst, ts);
 
/* The caller wants the low 32 bits of the timestamp.  Since it's running
 * at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds,
@@ -604,17 +604,19 @@ fs_visitor::emit_shader_time_end()
start.negate = true;
fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 1);
diff.set_smear(0);
-   ibld.ADD(diff, start, shader_end_time);
+
+   const fs_builder cbld = ibld.group(1, 0);
+   cbld.group(1, 0).ADD(diff, start, shader_end_time);
 
/* If there were no instructions between the two timestamp gets, the diff
 * is 2 cycles.  Remove that overhead, so I can forget about that when
 * trying to determine the time taken for single instructions.
 */
-   ibld.ADD(diff, diff, fs_reg(-2u));
-   SHADER_TIME_ADD(ibld, 0, diff);
-   SHADER_TIME_ADD(ibld, 1, fs_reg(1u));
+   cbld.ADD(diff, diff, fs_reg(-2u));
+   SHADER_TIME_ADD(cbld, 0, diff);
+   SHADER_TIME_ADD(cbld, 1, fs_reg(1u));
ibld.emit(BRW_OPCODE_ELSE);
-   SHADER_TIME_ADD(ibld, 2, fs_reg(1u));
+   SHADER_TIME_ADD(cbld, 2, fs_reg(1u));
ibld.emit(BRW_OPCODE_ENDIF);
 }
 
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 12/19] i965/fs: Use exec_size for determining regs read/written and partial writes

2015-06-25 Thread Jason Ekstrand

Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index d1e253a..4f56865 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -101,7 +101,7 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const 
fs_reg dst,
case MRF:
case ATTR:
   this-regs_written =
- DIV_ROUND_UP(MAX2(dst.width * dst.stride, 1) * type_sz(dst.type), 32);
+ DIV_ROUND_UP(MAX2(exec_size * dst.stride, 1) * type_sz(dst.type), 32);
   break;
case BAD_FILE:
   this-regs_written = 0;
@@ -675,7 +675,7 @@ bool
 fs_inst::is_partial_write() const
 {
return ((this-predicate  this-opcode != BRW_OPCODE_SEL) ||
-   (this-dst.width * type_sz(this-dst.type))  32 ||
+   (this-exec_size * type_sz(this-dst.type))  32 ||
!this-dst.is_contiguous());
 }
 
@@ -729,8 +729,8 @@ fs_inst::regs_read(int arg) const
   if (src[arg].stride == 0) {
  return 1;
   } else {
- int size = src[arg].width * src[arg].stride * type_sz(src[arg].type);
- return (size + 31) / 32;
+ int size = this-exec_size * src[arg].stride * type_sz(src[arg].type);
+ return DIV_ROUND_UP(size, 32);
   }
case MRF:
   unreachable(MRF registers are not allowed as sources);
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 04/19] i965/fs: Report the right value in fs_inst::regs_read() for PIXEL_X/Y

2015-06-25 Thread Jason Ekstrand

Reviewed-by: Iago Toral Quiroga ito...@igalia.com
Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 589b74c..6cf9e96 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -726,6 +726,12 @@ fs_inst::regs_read(int arg) const
  return exec_size / 4;
   break;
 
+   case FS_OPCODE_PIXEL_X:
+   case FS_OPCODE_PIXEL_Y:
+  if (arg == 0)
+ return 2;
+  break;
+
default:
   if (is_tex()  arg == 0  src[0].file == GRF)
  return mlen;
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 01/19] i965/fs: Use a switch statement in fs_inst::regs_read()

2015-06-25 Thread Jason Ekstrand

This makes things a little simpler, more efficient, and quite a bit more
readable.

Reviewed-by: Iago Toral Quiroga ito...@igalia.com
Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 45 ++--
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 4292aa6..3ec8e6a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -701,28 +701,29 @@ fs_inst::is_partial_write() const
 int
 fs_inst::regs_read(int arg) const
 {
-   if (is_tex()  arg == 0  src[0].file == GRF) {
-  return mlen;
-   } else if (opcode == FS_OPCODE_FB_WRITE  arg == 0) {
-  return mlen;
-   } else if (opcode == SHADER_OPCODE_URB_WRITE_SIMD8  arg == 0) {
-  return mlen;
-   } else if (opcode == SHADER_OPCODE_UNTYPED_ATOMIC  arg == 0) {
-  return mlen;
-   } else if (opcode == SHADER_OPCODE_UNTYPED_SURFACE_READ  arg == 0) {
-  return mlen;
-   } else if (opcode == SHADER_OPCODE_UNTYPED_SURFACE_WRITE  arg == 0) {
-  return mlen;
-   } else if (opcode == SHADER_OPCODE_TYPED_ATOMIC  arg == 0) {
-  return mlen;
-   } else if (opcode == SHADER_OPCODE_TYPED_SURFACE_READ  arg == 0) {
-  return mlen;
-   } else if (opcode == SHADER_OPCODE_TYPED_SURFACE_WRITE  arg == 0) {
-  return mlen;
-   } else if (opcode == FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET  arg == 0) {
-  return mlen;
-   } else if (opcode == FS_OPCODE_LINTERP  arg == 0) {
-  return exec_size / 4;
+   switch (opcode) {
+   case FS_OPCODE_FB_WRITE:
+   case SHADER_OPCODE_URB_WRITE_SIMD8:
+   case SHADER_OPCODE_UNTYPED_ATOMIC:
+   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+   case SHADER_OPCODE_UNTYPED_SURFACE_WRITE:
+   case SHADER_OPCODE_TYPED_ATOMIC:
+   case SHADER_OPCODE_TYPED_SURFACE_READ:
+   case SHADER_OPCODE_TYPED_SURFACE_WRITE:
+   case FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET:
+  if (arg == 0)
+ return mlen;
+  break;
+
+   case FS_OPCODE_LINTERP:
+  if (arg == 0)
+ return exec_size / 4;
+  break;
+
+   default:
+  if (is_tex()  arg == 0  src[0].file == GRF)
+ return mlen;
+  break;
}
 
switch (src[arg].file) {
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 17/19] i965/fs: Use exec_size instead of dst.width for computing component size

2015-06-25 Thread Jason Ekstrand

There are a variety of places where we use dst.width / 8 to compute the
size of a single logical channel.  Instead, we should be using exec_size.

Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs.cpp| 6 +++---
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp| 2 +-
 src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp  | 2 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp| 2 +-
 src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp | 4 ++--
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index aeaa1c4..6e8d9e8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2274,12 +2274,12 @@ fs_visitor::opt_register_renaming()
 
   if (depth == 0 
   inst-dst.file == GRF 
-  alloc.sizes[inst-dst.reg] == inst-dst.width / 8 
+  alloc.sizes[inst-dst.reg] == inst-exec_size / 8 
   !inst-is_partial_write()) {
  if (remap[dst] == -1) {
 remap[dst] = dst;
  } else {
-remap[dst] = alloc.allocate(inst-dst.width / 8);
+remap[dst] = alloc.allocate(inst-exec_size / 8);
 inst-dst.reg = remap[dst];
 progress = true;
  }
@@ -2410,7 +2410,7 @@ fs_visitor::compute_to_mrf()
 /* Things returning more than one register would need us to
  * understand coalescing out more than one MOV at a time.
  */
-if (scan_inst-regs_written  scan_inst-dst.width / 8)
+if (scan_inst-regs_written  scan_inst-exec_size / 8)
break;
 
/* SEND instructions can't have MRF as a destination. */
diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index 29d1f2a..29b46b9 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -179,7 +179,7 @@ static void
 create_copy_instr(const fs_builder bld, fs_inst *inst, fs_reg src, bool 
negate)
 {
int written = inst-regs_written;
-   int dst_width = inst-dst.width / 8;
+   int dst_width = inst-exec_size / 8;
const fs_builder ubld = bld.group(inst-exec_size, inst-force_sechalf)
   .exec_all(inst-force_writemask_all);
fs_inst *copy;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
index 2ad7079..149c0f0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
@@ -196,7 +196,7 @@ fs_visitor::register_coalesce()
 continue;
  }
  reg_to_offset[offset] = inst-dst.reg_offset;
- if (inst-src[0].width == 16)
+ if (inst-exec_size == 16)
 reg_to_offset[offset + 1] = inst-dst.reg_offset + 1;
  mov[offset] = inst;
  channels_remaining -= inst-regs_written;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 7651e96..df24fb9 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -912,7 +912,7 @@ fs_visitor::emit_texture(ir_texture_opcode op,
   bld.emit(SHADER_OPCODE_INT_QUOTIENT, fixed_depth, depth, fs_reg(6));
 
   fs_reg *fixed_payload = ralloc_array(mem_ctx, fs_reg, 
inst-regs_written);
-  int components = inst-regs_written / (dst.width / 8);
+  int components = inst-regs_written / (inst-exec_size / 8);
   for (int i = 0; i  components; i++) {
  if (i == 2) {
 fixed_payload[i] = fixed_depth;
diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp 
b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
index ee0add5..b49961f 100644
--- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
@@ -1314,8 +1314,8 @@ fs_instruction_scheduler::choose_instruction_to_schedule()
 * single-result send is probably actually reducing register
 * pressure.
 */
-   if (inst-regs_written = inst-dst.width / 8 
-   chosen_inst-regs_written  chosen_inst-dst.width / 8) {
+   if (inst-regs_written = inst-exec_size / 8 
+   chosen_inst-regs_written  chosen_inst-exec_size / 8) {
   chosen = n;
   continue;
} else if (inst-regs_written  chosen_inst-regs_written) {
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 15/19] i965/fs: Use the builder dispatch width instead of dst.width for pull constants

2015-06-25 Thread Jason Ekstrand

Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 6e45fa7..aeaa1c4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -188,7 +188,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
bld,
bld.ADD(vec4_offset, varying_offset, fs_reg(const_offset  ~3));
 
int scale = 1;
-   if (devinfo-gen == 4  dst.width == 8) {
+   if (devinfo-gen == 4  bld.dispatch_width() == 8) {
   /* Pre-gen5, we can either use a SIMD8 message that requires (header,
* u, v, r) as parameters, or we can just use the SIMD16 message
* consisting of (header, u).  We choose the second, at the cost of a
@@ -204,9 +204,9 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
bld,
   op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD;
 
assert(dst.width % 8 == 0);
-   int regs_written = 4 * (dst.width / 8) * scale;
+   int regs_written = 4 * (bld.dispatch_width() / 8) * scale;
fs_reg vec4_result = fs_reg(GRF, alloc.allocate(regs_written),
-   dst.type, dst.width);
+   dst.type, bld.dispatch_width());
fs_inst *inst = bld.emit(op, vec4_result, surf_index, vec4_offset);
inst-regs_written = regs_written;
 
@@ -216,7 +216,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
bld,
   if (devinfo-gen == 4)
  inst-mlen = 3;
   else
- inst-mlen = 1 + dispatch_width / 8;
+ inst-mlen = 1 + bld.dispatch_width() / 8;
}
 
bld.MOV(dst, offset(vec4_result, bld, (const_offset  3) * scale));
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 19/19] i965/fs: Remove the width field from fs_reg

2015-06-25 Thread Jason Ekstrand

As of now, the width field is no longer used for anything.  The width field
seemed like a good idea at the time but is actually entirely redundant
with the instruction's execution size.  Initially, it gave us the ability
to easily set the instructions execution size based entirely on register
widths.  With the builder, we can easiliy set the sizes explicitly and the
width field doesn't have as much purpose.  At this point, it's just
redundant information that can get out of sync so it really needs to go.

Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 62 --
 src/mesa/drivers/dri/i965/brw_fs_builder.h | 19 ++-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |  4 --
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  6 +--
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |  4 +-
 .../drivers/dri/i965/brw_fs_register_coalesce.cpp  |  1 -
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 26 -
 src/mesa/drivers/dri/i965/brw_ir_fs.h  | 15 +-
 8 files changed, 30 insertions(+), 107 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 6e8d9e8..a96424e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -203,10 +203,8 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
bld,
else
   op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD;
 
-   assert(dst.width % 8 == 0);
int regs_written = 4 * (bld.dispatch_width() / 8) * scale;
-   fs_reg vec4_result = fs_reg(GRF, alloc.allocate(regs_written),
-   dst.type, bld.dispatch_width());
+   fs_reg vec4_result = fs_reg(GRF, alloc.allocate(regs_written), dst.type);
fs_inst *inst = bld.emit(op, vec4_result, surf_index, vec4_offset);
inst-regs_written = regs_written;
 
@@ -310,7 +308,6 @@ fs_inst::is_copy_payload(const brw::simple_allocator 
grf_alloc) const
 
for (int i = 0; i  this-sources; i++) {
   reg.type = this-src[i].type;
-  reg.width = this-src[i].width;
   if (!this-src[i].equals(reg))
  return false;
 
@@ -366,7 +363,6 @@ fs_reg::fs_reg(float f)
this-file = IMM;
this-type = BRW_REGISTER_TYPE_F;
this-fixed_hw_reg.dw1.f = f;
-   this-width = 1;
 }
 
 /** Immediate value constructor. */
@@ -376,7 +372,6 @@ fs_reg::fs_reg(int32_t i)
this-file = IMM;
this-type = BRW_REGISTER_TYPE_D;
this-fixed_hw_reg.dw1.d = i;
-   this-width = 1;
 }
 
 /** Immediate value constructor. */
@@ -386,7 +381,6 @@ fs_reg::fs_reg(uint32_t u)
this-file = IMM;
this-type = BRW_REGISTER_TYPE_UD;
this-fixed_hw_reg.dw1.ud = u;
-   this-width = 1;
 }
 
 /** Vector float immediate value constructor. */
@@ -417,7 +411,6 @@ fs_reg::fs_reg(struct brw_reg fixed_hw_reg)
this-file = HW_REG;
this-fixed_hw_reg = fixed_hw_reg;
this-type = fixed_hw_reg.type;
-   this-width = 1  fixed_hw_reg.width;
 }
 
 bool
@@ -432,7 +425,6 @@ fs_reg::equals(const fs_reg r) const
abs == r.abs 
!reladdr  !r.reladdr 
memcmp(fixed_hw_reg, r.fixed_hw_reg, sizeof(fixed_hw_reg)) == 0 
-   width == r.width 
stride == r.stride);
 }
 
@@ -504,7 +496,7 @@ fs_visitor::get_timestamp(const fs_builder bld)
   0),
  BRW_REGISTER_TYPE_UD));
 
-   fs_reg dst = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 4);
+   fs_reg dst = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD);
 
/* We want to read the 3 fields we care about even if it's not enabled in
 * the dispatch.
@@ -554,7 +546,7 @@ fs_visitor::emit_shader_time_end()
 
fs_reg start = shader_start_time;
start.negate = true;
-   fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 1);
+   fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD);
diff.set_smear(0);
 
const fs_builder cbld = ibld.group(1, 0);
@@ -803,7 +795,7 @@ fs_visitor::vgrf(const glsl_type *const type)
 {
int reg_width = dispatch_width / 8;
return fs_reg(GRF, alloc.allocate(type_size(type) * reg_width),
- brw_type_for_base_type(type), dispatch_width);
+ brw_type_for_base_type(type));
 }
 
 /** Fixed HW reg constructor. */
@@ -813,14 +805,6 @@ fs_reg::fs_reg(enum register_file file, int reg)
this-file = file;
this-reg = reg;
this-type = BRW_REGISTER_TYPE_F;
-
-   switch (file) {
-   case UNIFORM:
-  this-width = 1;
-  break;
-   default:
-  this-width = 8;
-   }
 }
 
 /** Fixed HW reg constructor. */
@@ -830,25 +814,6 @@ fs_reg::fs_reg(enum register_file file, int reg, enum 
brw_reg_type type)
this-file = file;
this-reg = reg;
this-type = type;
-
-   switch (file) {
-   case UNIFORM:
-  this-width = 1;
-  break;
-   default:
-  this-width = 8;
-   }
-}
-
-/** Fixed HW reg constructor. */
-fs_reg::fs_reg(enum

[Mesa-dev] [PATCH v2 14/19] i965/fs: Remove exec_size guessing from fs_inst::init()

2015-06-25 Thread Jason Ekstrand

Now that all of the non-explicit constructors are gone, we don't need to
guess anymore.

Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 22 --
 1 file changed, 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 4f56865..6e45fa7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -68,28 +68,6 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const 
fs_reg dst,
 
assert(dst.file != IMM  dst.file != UNIFORM);
 
-   /* If exec_size == 0, try to guess it from the registers.  Since all
-* manner of things may use hardware registers, we first try to guess
-* based on GRF registers.  If this fails, we will go ahead and take the
-* width from the destination register.
-*/
-   if (this-exec_size == 0) {
-  if (dst.file == GRF) {
- this-exec_size = dst.width;
-  } else {
- for (unsigned i = 0; i  sources; ++i) {
-if (src[i].file != GRF  src[i].file != ATTR)
-   continue;
-
-if (this-exec_size = 1)
-   this-exec_size = src[i].width;
-assert(src[i].width == 1 || src[i].width == this-exec_size);
- }
-  }
-
-  if (this-exec_size == 0  dst.file != BAD_FILE)
- this-exec_size = dst.width;
-   }
assert(this-exec_size != 0);
 
this-conditional_mod = BRW_CONDITIONAL_NONE;
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 18/19] i965/fs_generator: Use inst-exec_size for determining hardware reg widths

2015-06-25 Thread Jason Ekstrand

Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 8d821ab..0a70bdc 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -48,7 +48,7 @@ static uint32_t brw_file_from_reg(fs_reg *reg)
 }
 
 static struct brw_reg
-brw_reg_from_fs_reg(fs_reg *reg)
+brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg)
 {
struct brw_reg brw_reg;
 
@@ -57,10 +57,10 @@ brw_reg_from_fs_reg(fs_reg *reg)
case MRF:
   if (reg-stride == 0) {
  brw_reg = brw_vec1_reg(brw_file_from_reg(reg), reg-reg, 0);
-  } else if (reg-width  8) {
+  } else if (inst-exec_size  8) {
  brw_reg = brw_vec8_reg(brw_file_from_reg(reg), reg-reg, 0);
- brw_reg = stride(brw_reg, reg-width * reg-stride,
-  reg-width, reg-stride);
+ brw_reg = stride(brw_reg, inst-exec_size * reg-stride,
+  inst-exec_size, reg-stride);
   } else {
  /* From the Haswell PRM:
   *
@@ -414,7 +414,7 @@ fs_generator::generate_blorp_fb_write(fs_inst *inst)
brw_fb_WRITE(p,
 16 /* dispatch_width */,
 brw_message_reg(inst-base_mrf),
-brw_reg_from_fs_reg(inst-src[0]),
+brw_reg_from_fs_reg(inst, inst-src[0]),
 BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD16_SINGLE_SOURCE,
 inst-target,
 inst-mlen,
@@ -1560,7 +1560,7 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
  annotate(p-devinfo, annotation, cfg, inst, p-next_insn_offset);
 
   for (unsigned int i = 0; i  inst-sources; i++) {
-src[i] = brw_reg_from_fs_reg(inst-src[i]);
+src[i] = brw_reg_from_fs_reg(inst, inst-src[i]);
 
 /* The accumulator result appears to get used for the
  * conditional modifier generation.  When negating a UD
@@ -1572,7 +1572,7 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
inst-src[i].type != BRW_REGISTER_TYPE_UD ||
!inst-src[i].negate);
   }
-  dst = brw_reg_from_fs_reg(inst-dst);
+  dst = brw_reg_from_fs_reg(inst, inst-dst);
 
   brw_set_default_predicate_control(p, inst-predicate);
   brw_set_default_predicate_inverse(p, inst-predicate_inverse);
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 13/19] i965/fs_builder: Use the dispatch width for setting exec sizes

2015-06-25 Thread Jason Ekstrand

Previously we used dst.width but the two *should* be the same.

Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs_builder.h | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
b/src/mesa/drivers/dri/i965/brw_fs_builder.h
index c823190..8af16a0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
+++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
@@ -235,7 +235,7 @@ namespace brw {
   instruction *
   emit(enum opcode opcode, const dst_reg dst) const
   {
- return emit(instruction(opcode, dst.width, dst));
+ return emit(instruction(opcode, dispatch_width(), dst));
   }
 
   /**
@@ -253,11 +253,11 @@ namespace brw {
  case SHADER_OPCODE_SIN:
  case SHADER_OPCODE_COS:
 return fix_math_instruction(
-   emit(instruction(opcode, dst.width, dst,
+   emit(instruction(opcode, dispatch_width(), dst,
 fix_math_operand(src0;
 
  default:
-return emit(instruction(opcode, dst.width, dst, src0));
+return emit(instruction(opcode, dispatch_width(), dst, src0));
  }
   }
 
@@ -273,12 +273,12 @@ namespace brw {
  case SHADER_OPCODE_INT_QUOTIENT:
  case SHADER_OPCODE_INT_REMAINDER:
 return fix_math_instruction(
-   emit(instruction(opcode, dst.width, dst,
+   emit(instruction(opcode, dispatch_width(), dst,
 fix_math_operand(src0),
 fix_math_operand(src1;
 
  default:
-return emit(instruction(opcode, dst.width, dst, src0, src1));
+return emit(instruction(opcode, dispatch_width(), dst, src0, 
src1));
 
  }
   }
@@ -295,13 +295,14 @@ namespace brw {
  case BRW_OPCODE_BFI2:
  case BRW_OPCODE_MAD:
  case BRW_OPCODE_LRP:
-return emit(instruction(opcode, dst.width, dst,
+return emit(instruction(opcode, dispatch_width(), dst,
 fix_3src_operand(src0),
 fix_3src_operand(src1),
 fix_3src_operand(src2)));
 
  default:
-return emit(instruction(opcode, dst.width, dst, src0, src1, src2));
+return emit(instruction(opcode, dispatch_width(), dst,
+src0, src1, src2));
  }
   }
 
@@ -517,7 +518,8 @@ namespace brw {
   {
  assert(dst.width % 8 == 0);
  instruction *inst = emit(instruction(SHADER_OPCODE_LOAD_PAYLOAD,
-  dst.width, dst, src, sources));
+  dispatch_width(), dst,
+  src, sources));
  inst-header_size = header_size;
 
  for (unsigned i = 0; i  header_size; i++)
@@ -528,7 +530,7 @@ namespace brw {
  for (unsigned i = header_size; i  sources; ++i)
 assert(src[i].file != GRF ||
src[i].width == dst.width);
- inst-regs_written += (sources - header_size) * (dst.width / 8);
+ inst-regs_written += (sources - header_size) * (dispatch_width() / 
8);
 
  return inst;
   }
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: add support for viewport array (v2)

2015-06-25 Thread Marek Olšák

In that case, feel free to push.

Reviewed-by: Marek Olšák marek.ol...@amd.com

Marek

On Thu, Jun 25, 2015 at 10:25 PM, Dave Airlie airl...@gmail.com wrote:
 On 26 June 2015 at 00:26, Marek Olšák mar...@gmail.com wrote:
 Hi Dave,

 The change in si_shader_io_get_unique_index can be dropped. The
 function is only used for shaders before GS.

 Ok okay I was hitting the assert in there for the layer/viewport index cases,
 but if the patch you pushed to master helps I'll drop it.

 This looks good, but I've had a different plan for this feature:

 Yeah I thought you might, I just wanted to hack something up to see it working
 since it seems new Dota 2 uses this feature.

 I'd like the states to be converted into 2 atoms:

 1 r600_atom for all 16 viewports
 1 r600_atom for all 16 scissors

 Each atom should have a bitmask saying which slots are dirty. (the same
 idea as resource slots)

 The emit functions should only emit dirty viewports/scissors.

 Also, the emit functions shouldn't emit non-zero viewports/scissors
 if the viewport index isn't written by the hardware VS stage
 (si_get_vs_info(sctx)-...). This should keep the same level of
 effectiveness as before. When a shader that writes the viewport index
 is bound *and* there are any dirty viewports or scissors, that's the
 right time to mark the atoms as dirty again, so that non-zero dirty
 viewports/scissors are finally emitted.

 I'm not sure if I'll get to doing more of it, I just had a day at home,
 and wanted to get up to speed on radeonsi a bit, so I can just leave
 that here, in case anyone wants to pick it up,

 Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/4] nir: remove nir_src_get_parent_instr()

2015-06-25 Thread Jason Ekstrand

Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com

On Thu, Jun 25, 2015 at 12:29 PM, Connor Abbott cwabbo...@gmail.com wrote:
 It's now unused.

 Signed-off-by: Connor Abbott cwabbo...@gmail.com
 ---
  src/glsl/nir/nir.h | 10 --
  1 file changed, 10 deletions(-)

 diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
 index 2116f60..b33c9c5 100644
 --- a/src/glsl/nir/nir.h
 +++ b/src/glsl/nir/nir.h
 @@ -565,16 +565,6 @@ nir_src_for_reg(nir_register *reg)
 return src;
  }

 -static inline nir_instr *
 -nir_src_get_parent_instr(const nir_src *src)
 -{
 -   if (src-is_ssa) {
 -  return src-ssa-parent_instr;
 -   } else {
 -  return src-reg.reg-parent_instr;
 -   }
 -}
 -
  static inline nir_dest
  nir_dest_for_reg(nir_register *reg)
  {
 --
 2.4.3

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v4 0/6] Continue enabling Open GL ES 3.1

2015-06-25 Thread Tapani Pälli



Reviewed-by: Tapani Pälli tapani.pa...@intel.com

(no v4 really needed for the little nitpicks but I forgot to say this)

On 06/25/2015 12:08 PM, Marta Lofstedt wrote:

This are for the V4s.

Marta Lofstedt (6):
   mesa/es3.1: enable GL_ARB_shader_image_load_store for gles3.1
   mesa/es3.1: enable GL_ARB_shader_atomic_counters for GLES 3.1
   mesa/es3.1: enable GL_ARB_texture_multisample for GLES 3.1
   mesa/es3.1: enable GL_ARB_texture_gather for GLES 3.1
   mesa/es3.1: enable GL_ARB_compute_shader for GLES 3.1
   mesa/es3.1: enable GL_ARB_explicit_uniform_location for GLES 3.1

  src/mesa/main/get.c  | 36 +
  src/mesa/main/get_hash_params.py | 86 +---
  2 files changed, 82 insertions(+), 40 deletions(-)


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 00/19] i965/fs: Remove the width field from fs_reg

2015-06-25 Thread Jason Ekstrand

This is a re-send of the series I did a week or two ago to remove the width
field from the fs_reg class.  I really didn't want to do a re-send but
there have been enough fixes since then that I thought it was worth
re-sending.  Most of these patches have already been reviewed but not all.

02: New.  Needs to be reviewed by someone familiar with SKL

03: Needs re-review.  This one is affected by 02.

05: Needs re-review.  This one went through a lot of changes to actually
get it right.  It should be the way we want now.

08: New.  It's just moving code around so it should be trivial.

09: New.  This is a complete replacement of patch 07 from the previous
series.

Cc: Topi Pohjolainen topi.pohjolai...@intel.com
Cc: Iago Toral Quiroga ito...@igalia.com
Cc: Francisco Jerez curroje...@riseup.net
Cc: Neil Roberts n...@linux.intel.com

Jason Ekstrand (19):
  i965/fs: Use a switch statement in fs_inst::regs_read()
  i965/fs: Actually set/use the mlen for gen7 uniform pull constant
loads
  i965/fs: Fix fs_inst::regs_read() for uniform pull constant loads
  i965/fs: Report the right value in fs_inst::regs_read() for PIXEL_X/Y
  i965/fs: Explicitly set the exec_size on the add(32) in interpolation
setup
  i965/fs: Set the builder group for emitting FB-write stencil/AA alpha
  i965/blorp: Explicitly set execution sizes for new'd instructions
  i965/fs: Move offset(fs_reg, unsigned) to brw_fs.h
  i965/fs: Add a builder argument to offset()
  i965/fs: Make better use of the builder in shader_time
  i965/fs: Remove fs_inst constructors that don't take an explicit
exec_size
  i965/fs: Use exec_size for determining regs read/written and partial
writes
  i965/fs_builder: Use the dispatch width for setting exec sizes
  i965/fs: Remove exec_size guessing from fs_inst::init()
  i965/fs: Use the builder dispatch width instead of dst.width for pull
constants
  i965/fs: Use the builder dispatch_width for computing register offsets
  i965/fs: Use exec_size instead of dst.width for computing component
size
  i965/fs_generator: Use inst-exec_size for determining hardware reg
widths
  i965/fs: Remove the width field from fs_reg

 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp|   9 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 266 -
 src/mesa/drivers/dri/i965/brw_fs.h |  21 ++
 src/mesa/drivers/dri/i965/brw_fs_builder.h |  37 ++-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |   4 -
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  10 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  23 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  64 ++---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |   4 +-
 .../drivers/dri/i965/brw_fs_register_coalesce.cpp  |   3 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 183 +++---
 src/mesa/drivers/dri/i965/brw_ir_fs.h  |  45 +---
 .../drivers/dri/i965/brw_schedule_instructions.cpp |   4 +-
 13 files changed, 287 insertions(+), 386 deletions(-)

-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: add support for viewport array (v2)

2015-06-25 Thread Dave Airlie

On 26 June 2015 at 00:26, Marek Olšák mar...@gmail.com wrote:
 Hi Dave,

 The change in si_shader_io_get_unique_index can be dropped. The
 function is only used for shaders before GS.

Ok okay I was hitting the assert in there for the layer/viewport index cases,
but if the patch you pushed to master helps I'll drop it.

 This looks good, but I've had a different plan for this feature:

Yeah I thought you might, I just wanted to hack something up to see it working
since it seems new Dota 2 uses this feature.

 I'd like the states to be converted into 2 atoms:

 1 r600_atom for all 16 viewports
 1 r600_atom for all 16 scissors

 Each atom should have a bitmask saying which slots are dirty. (the same
 idea as resource slots)

 The emit functions should only emit dirty viewports/scissors.

 Also, the emit functions shouldn't emit non-zero viewports/scissors
 if the viewport index isn't written by the hardware VS stage
 (si_get_vs_info(sctx)-...). This should keep the same level of
 effectiveness as before. When a shader that writes the viewport index
 is bound *and* there are any dirty viewports or scissors, that's the
 right time to mark the atoms as dirty again, so that non-zero dirty
 viewports/scissors are finally emitted.

I'm not sure if I'll get to doing more of it, I just had a day at home,
and wanted to get up to speed on radeonsi a bit, so I can just leave
that here, in case anyone wants to pick it up,

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 11/19] i965/fs: Remove fs_inst constructors that don't take an explicit exec_size

2015-06-25 Thread Jason Ekstrand

Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 30 ++
 src/mesa/drivers/dri/i965/brw_fs_builder.h |  2 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  6 --
 src/mesa/drivers/dri/i965/brw_ir_fs.h  |  9 +
 4 files changed, 8 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 8024fae..d1e253a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -126,9 +126,9 @@ fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size)
init(opcode, exec_size, reg_undef, NULL, 0);
 }
 
-fs_inst::fs_inst(enum opcode opcode, const fs_reg dst)
+fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst)
 {
-   init(opcode, 0, dst, NULL, 0);
+   init(opcode, exec_size, dst, NULL, 0);
 }
 
 fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst,
@@ -138,12 +138,6 @@ fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, 
const fs_reg dst,
init(opcode, exec_size, dst, src, 1);
 }
 
-fs_inst::fs_inst(enum opcode opcode, const fs_reg dst, const fs_reg src0)
-{
-   const fs_reg src[1] = { src0 };
-   init(opcode, 0, dst, src, 1);
-}
-
 fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst,
  const fs_reg src0, const fs_reg src1)
 {
@@ -151,13 +145,6 @@ fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, 
const fs_reg dst,
init(opcode, exec_size, dst, src, 2);
 }
 
-fs_inst::fs_inst(enum opcode opcode, const fs_reg dst, const fs_reg src0,
- const fs_reg src1)
-{
-   const fs_reg src[2] = { src0, src1 };
-   init(opcode, 0, dst, src, 2);
-}
-
 fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst,
  const fs_reg src0, const fs_reg src1, const fs_reg src2)
 {
@@ -165,19 +152,6 @@ fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, 
const fs_reg dst,
init(opcode, exec_size, dst, src, 3);
 }
 
-fs_inst::fs_inst(enum opcode opcode, const fs_reg dst, const fs_reg src0,
- const fs_reg src1, const fs_reg src2)
-{
-   const fs_reg src[3] = { src0, src1, src2 };
-   init(opcode, 0, dst, src, 3);
-}
-
-fs_inst::fs_inst(enum opcode opcode, const fs_reg dst,
- const fs_reg src[], unsigned sources)
-{
-   init(opcode, 0, dst, src, sources);
-}
-
 fs_inst::fs_inst(enum opcode opcode, uint8_t exec_width, const fs_reg dst,
  const fs_reg src[], unsigned sources)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
b/src/mesa/drivers/dri/i965/brw_fs_builder.h
index 58ac598..c823190 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
+++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
@@ -235,7 +235,7 @@ namespace brw {
   instruction *
   emit(enum opcode opcode, const dst_reg dst) const
   {
- return emit(instruction(opcode, dst));
+ return emit(instruction(opcode, dst.width, dst));
   }
 
   /**
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 61eb904..50d6014 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -109,7 +109,8 @@ fs_visitor::nir_setup_inputs(nir_shader *shader)
  if (var-data.location == VARYING_SLOT_POS) {
 reg = *emit_fragcoord_interpolation(var-data.pixel_center_integer,
 var-data.origin_upper_left);
-emit_percomp(bld, fs_inst(BRW_OPCODE_MOV, input, reg), 0xF);
+emit_percomp(bld, fs_inst(BRW_OPCODE_MOV, bld.dispatch_width(),
+  input, reg), 0xF);
  } else {
 emit_general_interpolation(input, var-name, var-type,
(glsl_interp_qualifier) 
var-data.interpolation,
@@ -1743,7 +1744,8 @@ fs_visitor::nir_emit_texture(const fs_builder bld, 
nir_tex_instr *instr)
fs_reg dest = get_nir_dest(instr-dest);
dest.type = this-result.type;
unsigned num_components = nir_tex_instr_dest_size(instr);
-   emit_percomp(bld, fs_inst(BRW_OPCODE_MOV, dest, this-result),
+   emit_percomp(bld, fs_inst(BRW_OPCODE_MOV, bld.dispatch_width(),
+ dest, this-result),
 (1  num_components) - 1);
 }
 
diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
b/src/mesa/drivers/dri/i965/brw_ir_fs.h
index 16b20be..d6b617a 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
@@ -189,20 +189,13 @@ public:
 
fs_inst();
fs_inst(enum opcode opcode, uint8_t exec_size);
-   fs_inst(enum opcode opcode, const fs_reg dst);
+   fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst);
fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst,
const fs_reg src0);
-   fs_inst(enum opcode opcode, const fs_reg dst, const

[Mesa-dev] [PATCH v2 09/19] i965/fs: Add a builder argument to offset()

2015-06-25 Thread Jason Ekstrand

---
 src/mesa/drivers/dri/i965/brw_fs.cpp |  42 
 src/mesa/drivers/dri/i965/brw_fs.h   |   2 +-
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp |   2 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp |  58 +--
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 143 ++-
 5 files changed, 128 insertions(+), 119 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 6cf9e96..9855bfb 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -267,7 +267,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
bld,
  inst-mlen = 1 + dispatch_width / 8;
}
 
-   bld.MOV(dst, offset(vec4_result, (const_offset  3) * scale));
+   bld.MOV(dst, offset(vec4_result, bld, (const_offset  3) * scale));
 }
 
 /**
@@ -361,7 +361,12 @@ fs_inst::is_copy_payload(const brw::simple_allocator 
grf_alloc) const
   reg.width = this-src[i].width;
   if (!this-src[i].equals(reg))
  return false;
-  reg = ::offset(reg, 1);
+
+  if (i  this-header_size) {
+ reg.reg_offset += 1;
+  } else {
+ reg.reg_offset += this-exec_size / 8;
+  }
}
 
return true;
@@ -920,7 +925,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
pixel_center_integer,
} else {
   bld.ADD(wpos, this-pixel_x, fs_reg(0.5f));
}
-   wpos = offset(wpos, 1);
+   wpos = offset(wpos, bld, 1);
 
/* gl_FragCoord.y */
if (!flip  pixel_center_integer) {
@@ -936,7 +941,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
pixel_center_integer,
 
   bld.ADD(wpos, pixel_y, fs_reg(offset));
}
-   wpos = offset(wpos, 1);
+   wpos = offset(wpos, bld, 1);
 
/* gl_FragCoord.z */
if (devinfo-gen = 6) {
@@ -946,7 +951,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
pixel_center_integer,
this-delta_xy[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC],
interp_reg(VARYING_SLOT_POS, 2));
}
-   wpos = offset(wpos, 1);
+   wpos = offset(wpos, bld, 1);
 
/* gl_FragCoord.w: Already set up in emit_interpolation */
bld.MOV(wpos, this-wpos_w);
@@ -1029,7 +1034,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, const 
char *name,
/* If there's no incoming setup data for this slot, don't
 * emit interpolation for it.
 */
-   attr = offset(attr, type-vector_elements);
+   attr = offset(attr, bld, type-vector_elements);
location++;
continue;
 }
@@ -1044,7 +1049,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, const 
char *name,
   interp = suboffset(interp, 3);
interp.type = attr.type;
bld.emit(FS_OPCODE_CINTERP, attr, fs_reg(interp));
-  attr = offset(attr, 1);
+  attr = offset(attr, bld, 1);
}
 } else {
/* Smooth/noperspective interpolation case. */
@@ -1082,7 +1087,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, const 
char *name,
if (devinfo-gen  6  interpolation_mode == 
INTERP_QUALIFIER_SMOOTH) {
   bld.MUL(attr, attr, this-pixel_w);
}
-  attr = offset(attr, 1);
+  attr = offset(attr, bld, 1);
}
 
 }
@@ -1190,7 +1195,7 @@ fs_visitor::emit_samplepos_setup()
}
/* Compute gl_SamplePosition.x */
compute_sample_position(pos, int_sample_x);
-   pos = offset(pos, 1);
+   pos = offset(pos, abld, 1);
if (dispatch_width == 8) {
   abld.MOV(int_sample_y, fs_reg(suboffset(sample_pos_reg, 1)));
} else {
@@ -2980,10 +2985,6 @@ fs_visitor::lower_load_payload()
 
   assert(inst-dst.file == MRF || inst-dst.file == GRF);
   assert(inst-saturate == false);
-
-  const fs_builder ibld = bld.group(inst-exec_size, inst-force_sechalf)
- .exec_all(inst-force_writemask_all)
- .at(block, inst);
   fs_reg dst = inst-dst;
 
   /* Get rid of COMPR4.  We'll add it back in if we need it */
@@ -2991,17 +2992,23 @@ fs_visitor::lower_load_payload()
  dst.reg = dst.reg  ~BRW_MRF_COMPR4;
 
   dst.width = 8;
+  const fs_builder hbld = bld.group(8, 0).exec_all().at(block, inst);
+
   for (uint8_t i = 0; i  inst-header_size; i++) {
  if (inst-src[i].file != BAD_FILE) {
 fs_reg mov_dst = retype(dst, BRW_REGISTER_TYPE_UD);
 fs_reg mov_src = retype(inst-src[i], BRW_REGISTER_TYPE_UD);
 mov_src.width = 8;
-ibld.exec_all().MOV(mov_dst, mov_src);
+hbld.MOV(mov_dst, mov_src);
  }
- dst = offset(dst, 1);
+ dst = offset(dst, hbld, 1);
   }
 
   dst.width = inst-exec_size;
+  const fs_builder ibld = bld.group(inst-exec_size, inst-force_sechalf)
+ .exec_all(inst-force_writemask_all)
+

[Mesa-dev] [PATCH v2 08/19] i965/fs: Move offset(fs_reg, unsigned) to brw_fs.h

2015-06-25 Thread Jason Ekstrand

Shortly, offset() will depend on the builder so we need it moved to some
place where it has access to that.
---
 src/mesa/drivers/dri/i965/brw_fs.h| 21 +
 src/mesa/drivers/dri/i965/brw_ir_fs.h | 21 -
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 243baf6..c1819cc 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -62,6 +62,27 @@ namespace brw {
class fs_live_variables;
 }
 
+static inline fs_reg
+offset(fs_reg reg, unsigned delta)
+{
+   switch (reg.file) {
+   case BAD_FILE:
+  break;
+   case GRF:
+   case MRF:
+   case ATTR:
+  return byte_offset(reg,
+ delta * MAX2(reg.width * reg.stride, 1) *
+ type_sz(reg.type));
+   case UNIFORM:
+  reg.reg_offset += delta;
+  break;
+   default:
+  assert(delta == 0);
+   }
+   return reg;
+}
+
 /**
  * The fragment shader front-end.
  *
diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
b/src/mesa/drivers/dri/i965/brw_ir_fs.h
index 96dc20d..16b20be 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
@@ -129,27 +129,6 @@ horiz_offset(fs_reg reg, unsigned delta)
 }
 
 static inline fs_reg
-offset(fs_reg reg, unsigned delta)
-{
-   switch (reg.file) {
-   case BAD_FILE:
-  break;
-   case GRF:
-   case MRF:
-   case ATTR:
-  return byte_offset(reg,
- delta * MAX2(reg.width * reg.stride, 1) *
- type_sz(reg.type));
-   case UNIFORM:
-  reg.reg_offset += delta;
-  break;
-   default:
-  assert(delta == 0);
-   }
-   return reg;
-}
-
-static inline fs_reg
 component(fs_reg reg, unsigned idx)
 {
assert(reg.subreg_offset == 0);
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 06/19] i965/fs: Set the builder group for emitting FB-write stencil/AA alpha

2015-06-25 Thread Jason Ekstrand

Reviewed-by: Iago Toral Quiroga ito...@igalia.com
Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 8976c25..2341d02 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1529,7 +1529,7 @@ fs_visitor::emit_single_fb_write(const fs_builder bld,
 
if (payload.aa_dest_stencil_reg) {
   sources[length] = fs_reg(GRF, alloc.allocate(1));
-  bld.exec_all().annotate(FB write stencil/AA alpha)
+  bld.group(8, 0).exec_all().annotate(FB write stencil/AA alpha)
  .MOV(sources[length],
   fs_reg(brw_vec8_grf(payload.aa_dest_stencil_reg, 0)));
   length++;
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 03/19] i965/fs: Fix fs_inst::regs_read() for uniform pull constant loads

2015-06-25 Thread Jason Ekstrand

Previously, fs_inst::regs_read() fell back to depending on the register
width for the second source.  This isn't really correct since it isn't a
SIMD8 value at all, but a SIMD4x2 value.  This commit changes it to
explicitly be always one register.

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

v2: Use mlen for determining the number of registers written
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 31dfb24..589b74c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -715,6 +715,12 @@ fs_inst::regs_read(int arg) const
  return mlen;
   break;
 
+   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7:
+  /* The payload is actually stored in src1 */
+  if (arg == 1)
+ return mlen;
+  break;
+
case FS_OPCODE_LINTERP:
   if (arg == 0)
  return exec_size / 4;
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 07/19] i965/blorp: Explicitly set execution sizes for new'd instructions

2015-06-25 Thread Jason Ekstrand

This doesn't affect instructions allocated using the builder.

Reviewed-by: Iago Toral Quiroga ito...@igalia.com
Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
---
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
index 789520c..d458ad8 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
@@ -73,7 +73,7 @@ brw_blorp_eu_emitter::emit_kill_if_outside_rect(const struct 
brw_reg x,
emit_cmp(BRW_CONDITIONAL_L, x, dst_x1)-predicate = BRW_PREDICATE_NORMAL;
emit_cmp(BRW_CONDITIONAL_L, y, dst_y1)-predicate = BRW_PREDICATE_NORMAL;
 
-   fs_inst *inst = new (mem_ctx) fs_inst(BRW_OPCODE_AND, g1, f0, g1);
+   fs_inst *inst = new (mem_ctx) fs_inst(BRW_OPCODE_AND, 16, g1, f0, g1);
inst-force_writemask_all = true;
insts.push_tail(inst);
 }
@@ -84,7 +84,7 @@ brw_blorp_eu_emitter::emit_texture_lookup(const struct 
brw_reg dst,
   unsigned base_mrf,
   unsigned msg_length)
 {
-   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, brw_message_reg(base_mrf),
+   fs_inst *inst = new (mem_ctx) fs_inst(op, 16, dst, 
brw_message_reg(base_mrf),
  fs_reg(0u));
 
inst-base_mrf = base_mrf;
@@ -119,7 +119,8 @@ brw_blorp_eu_emitter::emit_combine(enum opcode 
combine_opcode,
 {
assert(combine_opcode == BRW_OPCODE_ADD || combine_opcode == 
BRW_OPCODE_AVG);
 
-   insts.push_tail(new (mem_ctx) fs_inst(combine_opcode, dst, src_1, src_2));
+   insts.push_tail(new (mem_ctx) fs_inst(combine_opcode, 16, dst,
+ src_1, src_2));
 }
 
 fs_inst *
@@ -127,7 +128,7 @@ brw_blorp_eu_emitter::emit_cmp(enum brw_conditional_mod op,
const struct brw_reg x,
const struct brw_reg y)
 {
-   fs_inst *cmp = new (mem_ctx) fs_inst(BRW_OPCODE_CMP,
+   fs_inst *cmp = new (mem_ctx) fs_inst(BRW_OPCODE_CMP, 16,
 vec16(brw_null_reg()), x, y);
cmp-conditional_mod = op;
insts.push_tail(cmp);
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: add support for geometry shader invocations.

2015-06-25 Thread Ilia Mirkin

On Thu, Jun 25, 2015 at 4:26 PM, Dave Airlie airl...@gmail.com wrote:
 From: Dave Airlie airl...@redhat.com

 Signed-off-by: Dave Airlie airl...@redhat.com
 ---
  src/gallium/drivers/radeonsi/si_shader.c| 5 +
  src/gallium/drivers/radeonsi/si_shader.h| 1 +
  src/gallium/drivers/radeonsi/si_state.c | 1 -
  src/gallium/drivers/radeonsi/si_state_shaders.c | 7 +++
  4 files changed, 13 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
 b/src/gallium/drivers/radeonsi/si_shader.c
 index 87608a1..665ce83 100644
 --- a/src/gallium/drivers/radeonsi/si_shader.c
 +++ b/src/gallium/drivers/radeonsi/si_shader.c
 @@ -630,6 +630,11 @@ static void declare_system_value(
  SI_PARAM_BASE_VERTEX);
 break;

 +   case TGSI_SEMANTIC_INVOCATIONID:
 +   value = LLVMGetParam(radeon_bld-main_fn,
 +SI_PARAM_GS_INSTANCE_ID);
 +   break;
 +
 case TGSI_SEMANTIC_SAMPLEID:
 value = get_sample_id(radeon_bld);
 break;
 diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
 b/src/gallium/drivers/radeonsi/si_shader.h
 index 51055af..b4339ae 100644
 --- a/src/gallium/drivers/radeonsi/si_shader.h
 +++ b/src/gallium/drivers/radeonsi/si_shader.h
 @@ -115,6 +115,7 @@ struct si_shader_selector {

 unsignedgs_output_prim;
 unsignedgs_max_out_vertices;
 +   unsignedgs_num_invocations;
 uint64_tgs_used_inputs; /* mask of get_unique_index bits */
  };

 diff --git a/src/gallium/drivers/radeonsi/si_state.c 
 b/src/gallium/drivers/radeonsi/si_state.c
 index 752467b..0dd08a2 100644
 --- a/src/gallium/drivers/radeonsi/si_state.c
 +++ b/src/gallium/drivers/radeonsi/si_state.c
 @@ -3078,7 +3078,6 @@ void si_init_config(struct si_context *sctx)
 si_pm4_set_reg(pm4, R_028B60_VGT_GS_VERT_ITEMSIZE_1, 0);
 si_pm4_set_reg(pm4, R_028B64_VGT_GS_VERT_ITEMSIZE_2, 0);
 si_pm4_set_reg(pm4, R_028B68_VGT_GS_VERT_ITEMSIZE_3, 0);
 -   si_pm4_set_reg(pm4, R_028B90_VGT_GS_INSTANCE_CNT, 0);

 si_pm4_set_reg(pm4, R_028B98_VGT_STRMOUT_BUFFER_CONFIG, 0x0);
 si_pm4_set_reg(pm4, R_028AB4_VGT_REUSE_OFF, 0);
 diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
 b/src/gallium/drivers/radeonsi/si_state_shaders.c
 index 48128fa..eef3baa 100644
 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
 +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
 @@ -76,6 +76,7 @@ static void si_shader_gs(struct si_shader *shader)
 unsigned gs_vert_itemsize = shader-selector-info.num_outputs * (16 
  2);
 unsigned gs_max_vert_out = shader-selector-gs_max_out_vertices;
 unsigned gsvs_itemsize = gs_vert_itemsize * gs_max_vert_out;
 +   unsigned gs_num_invocations = shader-selector-gs_num_invocations;
 unsigned cut_mode;
 struct si_pm4_state *pm4;
 unsigned num_sgprs, num_user_sgprs;
 @@ -118,6 +119,10 @@ static void si_shader_gs(struct si_shader *shader)

 si_pm4_set_reg(pm4, R_028B5C_VGT_GS_VERT_ITEMSIZE, gs_vert_itemsize);

 +   si_pm4_set_reg(pm4, R_028B90_VGT_GS_INSTANCE_CNT,
 +  S_028B90_CNT(MIN2(gs_num_invocations, 127)) |
 +  S_028B90_ENABLE(gs_num_invocations  0));

Without knowing how radeonsi works, just want to point out that
previously GS_INSTANCE_CNT was being always set to 0, even though a
single invocation was still being run. Perhaps this really needs to be
gs_num_invocations_minus_one?

GS_INVOCATIONS will always be at least 1.

 +
 va = shader-bo-gpu_address;
 si_pm4_add_bo(pm4, shader-bo, RADEON_USAGE_READ, 
 RADEON_PRIO_SHADER_DATA);
 si_pm4_set_reg(pm4, R_00B220_SPI_SHADER_PGM_LO_GS, va  8);
 @@ -490,6 +495,8 @@ static void *si_create_shader_state(struct pipe_context 
 *ctx,
 sel-info.properties[TGSI_PROPERTY_GS_OUTPUT_PRIM];
 sel-gs_max_out_vertices =
 
 sel-info.properties[TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES];
 +   sel-gs_num_invocations =
 +   sel-info.properties[TGSI_PROPERTY_GS_INVOCATIONS];

 for (i = 0; i  sel-info.num_inputs; i++) {
 unsigned name = sel-info.input_semantic_name[i];
 --
 2.4.3

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/4] nir/from_ssa: add a flag to not convert everything to SSA

2015-06-25 Thread Jason Ekstrand

On Thu, Jun 25, 2015 at 12:29 PM, Connor Abbott cwabbo...@gmail.com wrote:
 We already don't convert constants out of SSA, and in our backend we'd
 like to have only one way of saying something is still in SSA.

 The one tricky part about this is that we may now leave some undef
 instructions around if they aren't part of a phi-web, so we have to be
 more careful about deleting them.

 Signed-off-by: Connor Abbott cwabbo...@gmail.com
 ---
  src/gallium/drivers/vc4/vc4_program.c |  2 +-
  src/glsl/nir/nir.h|  7 ++-
  src/glsl/nir/nir_from_ssa.c   | 25 ++---
  src/mesa/drivers/dri/i965/brw_nir.c   |  2 +-
  4 files changed, 26 insertions(+), 10 deletions(-)

 diff --git a/src/gallium/drivers/vc4/vc4_program.c 
 b/src/gallium/drivers/vc4/vc4_program.c
 index 2061631..1a550e1 100644
 --- a/src/gallium/drivers/vc4/vc4_program.c
 +++ b/src/gallium/drivers/vc4/vc4_program.c
 @@ -2102,7 +2102,7 @@ vc4_shader_ntq(struct vc4_context *vc4, enum qstage 
 stage,

  nir_remove_dead_variables(c-s);

 -nir_convert_from_ssa(c-s);
 +nir_convert_from_ssa(c-s, true);

  if (vc4_debug  VC4_DEBUG_SHADERDB) {
  fprintf(stderr, SHADER-DB: %s prog %d/%d: %d NIR 
 instructions\n,
 diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
 index 697d37e..2116f60 100644
 --- a/src/glsl/nir/nir.h
 +++ b/src/glsl/nir/nir.h
 @@ -1676,7 +1676,12 @@ bool nir_ssa_defs_interfere(nir_ssa_def *a, 
 nir_ssa_def *b);

  void nir_convert_to_ssa_impl(nir_function_impl *impl);
  void nir_convert_to_ssa(nir_shader *shader);
 -void nir_convert_from_ssa(nir_shader *shader);
 +
 +/* If convert_everything is true, convert all values (even those not involved
 + * in a phi node) to registers. If false, only convert SSA values involved in
 + * phi nodes to registers.
 + */
 +void nir_convert_from_ssa(nir_shader *shader, bool convert_everything);

I don't think convert everything is really what we want to call it.
A better idea might be to flip the bool and call it phi_webs_only.

With that changed,
Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com


  bool nir_opt_algebraic(nir_shader *shader);
  bool nir_opt_algebraic_late(nir_shader *shader);
 diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c
 index 67733e6..966c2fe 100644
 --- a/src/glsl/nir/nir_from_ssa.c
 +++ b/src/glsl/nir/nir_from_ssa.c
 @@ -37,6 +37,7 @@
  struct from_ssa_state {
 void *mem_ctx;
 void *dead_ctx;
 +   bool convert_everything;
 struct hash_table *merge_node_table;
 nir_instr *instr;
 nir_function_impl *impl;
 @@ -482,6 +483,9 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state)

reg = node-set-reg;
 } else {
 +  if (!state-convert_everything)
 + return true;
 +
/* We leave load_const SSA values alone.  They act as immediates to
 * the backend.  If it got coalesced into a phi, that's ok.
 */
 @@ -505,8 +509,15 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state)
 nir_ssa_def_rewrite_uses(def, nir_src_for_reg(reg), state-mem_ctx);
 assert(list_empty(def-uses)  list_empty(def-if_uses));

 -   if (def-parent_instr-type == nir_instr_type_ssa_undef)
 +   if (def-parent_instr-type == nir_instr_type_ssa_undef) {
 +  /* If it's an ssa_undef instruction, remove it since we know we just 
 got
 +   * rid of all its uses.
 +   */
 +  nir_instr *parent_instr = def-parent_instr;
 +  nir_instr_remove(parent_instr);
 +  ralloc_steal(state-dead_ctx, parent_instr);
return true;
 +   }

 assert(def-parent_instr-type != nir_instr_type_load_const);

 @@ -523,7 +534,7 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state)
  }

  /* Resolves ssa definitions to registers.  While we're at it, we also
 - * remove phi nodes and ssa_undef instructions
 + * remove phi nodes.
   */
  static bool
  resolve_registers_block(nir_block *block, void *void_state)
 @@ -534,8 +545,7 @@ resolve_registers_block(nir_block *block, void 
 *void_state)
state-instr = instr;
nir_foreach_ssa_def(instr, rewrite_ssa_def, state);

 -  if (instr-type == nir_instr_type_ssa_undef ||
 -  instr-type == nir_instr_type_phi) {
 +  if (instr-type == nir_instr_type_phi) {
   nir_instr_remove(instr);
   ralloc_steal(state-dead_ctx, instr);
}
 @@ -765,13 +775,14 @@ resolve_parallel_copies_block(nir_block *block, void 
 *void_state)
  }

  static void
 -nir_convert_from_ssa_impl(nir_function_impl *impl)
 +nir_convert_from_ssa_impl(nir_function_impl *impl, bool convert_everything)
  {
 struct from_ssa_state state;

 state.mem_ctx = ralloc_parent(impl);
 state.dead_ctx = ralloc_context(NULL);
 state.impl = impl;
 +   state.convert_everything = convert_everything;
 state.merge_node_table = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
  _mesa_key_pointer_equal);

 @@ -801,10 +812,10 @@

Re: [Mesa-dev] [PATCH 4/4] nir: remove parent_instr from nir_register

2015-06-25 Thread Jason Ekstrand

Yes, please!  It was nice at the time, but it was always a hack.

Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com

On Thu, Jun 25, 2015 at 12:29 PM, Connor Abbott cwabbo...@gmail.com wrote:
 It's no longer used

 Signed-off-by: Connor Abbott cwabbo...@gmail.com
 ---
  src/glsl/nir/nir.c  | 1 -
  src/glsl/nir/nir.h  | 8 
  src/glsl/nir/nir_from_ssa.c | 8 
  3 files changed, 17 deletions(-)

 diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
 index f03e80a..f661249 100644
 --- a/src/glsl/nir/nir.c
 +++ b/src/glsl/nir/nir.c
 @@ -57,7 +57,6 @@ reg_create(void *mem_ctx, struct exec_list *list)
  {
 nir_register *reg = ralloc(mem_ctx, nir_register);

 -   reg-parent_instr = NULL;
 list_inithead(reg-uses);
 list_inithead(reg-defs);
 list_inithead(reg-if_uses);
 diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
 index b33c9c5..e818acc 100644
 --- a/src/glsl/nir/nir.h
 +++ b/src/glsl/nir/nir.h
 @@ -389,14 +389,6 @@ typedef struct {
  */
 bool is_packed;

 -   /**
 -* If this pointer is non-NULL then this register has exactly one
 -* definition and that definition dominates all of its uses.  This is
 -* set by the out-of-SSA pass so that backends can get SSA-like
 -* information even once they have gone out of SSA.
 -*/
 -   struct nir_instr *parent_instr;
 -
 /** set of nir_instr's where this register is used (read from) */
 struct list_head uses;

 diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c
 index 966c2fe..57bbdde 100644
 --- a/src/glsl/nir/nir_from_ssa.c
 +++ b/src/glsl/nir/nir_from_ssa.c
 @@ -496,14 +496,6 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state)
reg-name = def-name;
reg-num_components = def-num_components;
reg-num_array_elems = 0;
 -
 -  /* This register comes from an SSA definition that is defined and not
 -   * part of a phi-web.  Therefore, we know it has a single unique
 -   * definition that dominates all of its uses; we can copy the
 -   * parent_instr from the SSA def safely.
 -   */
 -  if (def-parent_instr-type != nir_instr_type_ssa_undef)
 - reg-parent_instr = def-parent_instr;
 }

 nir_ssa_def_rewrite_uses(def, nir_src_for_reg(reg), state-mem_ctx);
 --
 2.4.3

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/gen9: use an unreserved surface alignment value

2015-06-25 Thread Pohjolainen, Topi

On Thu, Jun 25, 2015 at 08:40:33AM -0700, Nanley Chery wrote:
 On Thu, Jun 25, 2015 at 12:37 AM, Pohjolainen, Topi
 topi.pohjolai...@intel.com wrote:
  On Wed, Jun 24, 2015 at 05:57:13PM -0700, Anuj Phogat wrote:
  On Wed, Jun 24, 2015 at 3:51 PM, Nanley Chery nanleych...@gmail.com 
  wrote:
   From: Nanley Chery nanley.g.ch...@intel.com
  
   Although the horizontal and vertical alignment fields are ignored here,
   0 is a reserved value for them and may cause undefined behavior. Change
   the default value to an abitrary valid one.
  
   Signed-off-by: Nanley Chery nanley.g.ch...@intel.com
   ---
src/mesa/drivers/dri/i965/gen8_surface_state.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
  
   diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
   b/src/mesa/drivers/dri/i965/gen8_surface_state.c
   index b2d1a57..22ae960 100644
   --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
   +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
   @@ -93,7 +93,7 @@ vertical_alignment(const struct brw_context *brw,
   if (brw-gen  8 
   (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
surf_type == BRW_SURFACE_1D))
   -  return 0;
   +  return GEN8_SURFACE_VALIGN_4;
  
   switch (mt-align_h) {
   case 4:
   @@ -118,7 +118,7 @@ horizontal_alignment(const struct brw_context *brw,
   if (brw-gen  8 
   (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
gen9_use_linear_1d_layout(brw, mt)))
   -  return 0;
   +  return GEN8_SURFACE_HALIGN_4;
  
   switch (mt-align_w) {
   case 4:
   --
   2.4.4
  
   ___
   mesa-dev mailing list
   mesa-dev@lists.freedesktop.org
   http://lists.freedesktop.org/mailman/listinfo/mesa-dev
 
  Good find Nanley. We had no known issues with value 0 but it's
  always nice to avoid undefined behavior :).
 
  Right, I thought about this when I reviewed the original. The spec says
  it is ignored in these cases and hence the reserved value seemed fine. Now
  that we put something meaningful there, somebody is going to compare the
  spec and wonder why we set it to 4. If we added also a comment here that
  says this is just an arbitrary (non-reserved) value and really ignored
  by the hardware, it would prevent misunderstandings. What do you guys think?
 
 There's enough space to insert Set to an arbitrary non-reserved
 value. in both of the comments preceding the conditional without
 adding an extra line. I wouldn't mind including it.

Sounds great, thanks!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 91111] [BYT regression] dEQP-GLES3.functional.shaders.texture_functions.texturelod.usamplercube_*

2015-06-25 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=9

Bug ID: 9
   Summary: [BYT regression]
dEQP-GLES3.functional.shaders.texture_functions.textur
elod.usamplercube_*
   Product: Mesa
   Version: 10.6
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: brian.wil...@intel.com
QA Contact: mesa-dev@lists.freedesktop.org

Tested Mesa commit:
c2a0600d5b0645533ba442b5ab879b23c2564a4d: i965: Don't set NirOptions for
stages that will use the vec4 backend.
Plus patches:
- 10.6-i965-do-not-round-line-width-when-multisampling-or-a.patch
- 10.6-mesa-add-GL_RED-GL_RG-support-for-floating-point-tex.patch
- 10.6-i965-Momentarily-pretend-to-support-ARB_texture_sten.patch
- 10.6-Revert-i965-Advertise-a-line-width-of-40.0-on-Cherry.patch

Testing a Mesa update on ChromeOS, the following dEQP tests fail on BYT:
-
dEQP-GLES3.functional.shaders.texture_functions.texturelod.usamplercube_vertex
-
dEQP-GLES3.functional.shaders.texture_functions.texturelod.usamplercube_fragment

Sample failure:
Image comparison failed: difference = 450.979, threshold = 0.05

These tests pass on HSW, BDW and BSW, and passed on BYT with an older Mesa
commit (November 18, 2014: 129178893b2260df22db96327c5ca9c2ce7db046 glsl:
Generate unique names for each const array lowered to uniforms + ~10 patches)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/4] i965/fs: use SSA values directly

2015-06-25 Thread Jason Ekstrand

On Thu, Jun 25, 2015 at 12:29 PM, Connor Abbott cwabbo...@gmail.com wrote:
 Before, we would use registers, but set a magical parent_instr field
 to indicate that it was actually purely an SSA value (i.e., it wasn't
 involved in any phi nodes). Instead, just use SSA values directly, which
 lets us get rid of the hack and reduces memory usage since we're not
 allocating a nir_register for every value. It also makes our handling of
 load_const more consistent compared to the other instructions.

 Signed-off-by: Connor Abbott cwabbo...@gmail.com
 ---
  src/mesa/drivers/dri/i965/brw_fs.h |  5 ++
  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 73 
 ++
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  1 +
  src/mesa/drivers/dri/i965/brw_nir.c|  2 +-
  .../dri/i965/brw_nir_analyze_boolean_resolves.c| 12 ++--
  5 files changed, 59 insertions(+), 34 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
 b/src/mesa/drivers/dri/i965/brw_fs.h
 index 243baf6..a6fee35 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.h
 +++ b/src/mesa/drivers/dri/i965/brw_fs.h
 @@ -249,6 +249,10 @@ public:
 void nir_emit_block(nir_block *block);
 void nir_emit_instr(nir_instr *instr);
 void nir_emit_alu(const brw::fs_builder bld, nir_alu_instr *instr);
 +   void nir_emit_load_const(const brw::fs_builder bld,
 +nir_load_const_instr *instr);
 +   void nir_emit_undef(const brw::fs_builder bld,
 +   nir_ssa_undef_instr *instr);
 void nir_emit_intrinsic(const brw::fs_builder bld,
 nir_intrinsic_instr *instr);
 void nir_emit_texture(const brw::fs_builder bld,
 @@ -345,6 +349,7 @@ public:
 unsigned max_grf;

 fs_reg *nir_locals;
 +   fs_reg *nir_ssa_values;
 fs_reg *nir_globals;
 fs_reg nir_inputs;
 fs_reg nir_outputs;
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 index 59081ea..58896d7 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 @@ -366,6 +366,9 @@ fs_visitor::nir_emit_impl(nir_function_impl *impl)
nir_locals[reg-index] = bld.vgrf(BRW_REGISTER_TYPE_F, size);
 }

 +   nir_ssa_values = reralloc(mem_ctx, nir_ssa_values, fs_reg,
 + impl-ssa_alloc);
 +
 nir_emit_cf_list(impl-body);
  }

 @@ -459,9 +462,11 @@ fs_visitor::nir_emit_instr(nir_instr *instr)
break;

 case nir_instr_type_load_const:
 -  /* We can hit these, but we do nothing now and use them as
 -   * immediates later.
 -   */
 +  nir_emit_load_const(abld, nir_instr_as_load_const(instr));
 +  break;
 +
 +   case nir_instr_type_ssa_undef:
 +  nir_emit_undef(abld, nir_instr_as_ssa_undef(instr));
break;

 case nir_instr_type_jump:
 @@ -495,17 +500,12 @@ bool
  fs_visitor::optimize_frontfacing_ternary(nir_alu_instr *instr,
   const fs_reg result)
  {
 -   if (instr-src[0].src.is_ssa ||
 -   !instr-src[0].src.reg.reg ||
 -   !instr-src[0].src.reg.reg-parent_instr)
 -  return false;
 -
 -   if (instr-src[0].src.reg.reg-parent_instr-type !=
 -   nir_instr_type_intrinsic)
 +   if (!instr-src[0].src.is_ssa ||
 +   instr-src[0].src.ssa-parent_instr-type != nir_instr_type_intrinsic)
return false;

 nir_intrinsic_instr *src0 =
 -  nir_instr_as_intrinsic(instr-src[0].src.reg.reg-parent_instr);
 +  nir_instr_as_intrinsic(instr-src[0].src.ssa-parent_instr);

 if (src0-intrinsic != nir_intrinsic_load_front_face)
return false;
 @@ -1146,6 +1146,25 @@ fs_visitor::nir_emit_alu(const fs_builder bld, 
 nir_alu_instr *instr)
 }
  }

 +void
 +fs_visitor::nir_emit_load_const(const fs_builder bld,
 +nir_load_const_instr *instr)
 +{
 +   fs_reg reg = bld.vgrf(BRW_REGISTER_TYPE_D, instr-def.num_components);
 +
 +   for (unsigned i = 0; i  instr-def.num_components; i++)
 +  bld.MOV(offset(reg, i), fs_reg(instr-value.i[i]));
 +
 +   nir_ssa_values[instr-def.index] = reg;
 +}
 +
 +void
 +fs_visitor::nir_emit_undef(const fs_builder bld, nir_ssa_undef_instr *instr)
 +{
 +   nir_ssa_values[instr-def.index] = bld.vgrf(BRW_REGISTER_TYPE_D,
 +   instr-def.num_components);
 +}
 +
  static fs_reg
  fs_reg_for_nir_reg(fs_visitor *v, nir_register *nir_reg,
 unsigned base_offset, nir_src *indirect)
 @@ -1171,30 +1190,30 @@ fs_reg_for_nir_reg(fs_visitor *v, nir_register 
 *nir_reg,
  fs_reg
  fs_visitor::get_nir_src(nir_src src)
  {
 +   fs_reg reg;
 if (src.is_ssa) {
 -  assert(src.ssa-parent_instr-type == nir_instr_type_load_const);
 -  nir_load_const_instr *load = 
 nir_instr_as_load_const(src.ssa-parent_instr);
 -  fs_reg reg = bld.vgrf(BRW_REGISTER_TYPE_D, src.ssa-num_components);
 -
 -  for (unsigned i = 0; i

Re: [Mesa-dev] [PATCH 0/4] i965: use SSA values when we can

2015-06-25 Thread Jason Ekstrand

And, you got some shader-db stats:

total instructions in shared programs: 6078991 - 6073118 (-0.10%)
instructions in affected programs: 402221 - 396348 (-1.46%)
helped:1527
HURT:  0
GAINED:8
LOST:  2

I'm not sure which commit it was that helped.  I'm guessing one of our
on-the-fly peepholes started working better.


On Thu, Jun 25, 2015 at 12:29 PM, Connor Abbott cwabbo...@gmail.com wrote:
 Before, we were using a hack where when we converted out of SSA, we set
 a parent_instr field of the nir_register to indicate that the register
 was actually an SSA value. But in the future, we want to handle SSA
 values directly, and right now we're creating an extra nir_register for
 everything, even if it's not involved in a phi node. This series removes
 that hack for i965 and gets us using SSA values directly in most cases.

 The only other user of nir_convert_from_ssa() is vc4, which I believed I
 changed correctly, and it doesn't seem to use
 nir_register::parent_instr, based on my grepping. I tried to
 compile-test it, but it assumed I was using the simulator and died, so
 it would be nice to at least compile-test it.

 The changes are also available at:

 git://people.freedesktop.org/~cwabbott0/mesa i965-use-ssa

 Connor Abbott (4):
   nir/from_ssa: add a flag to not convert everything to SSA
   i965/fs: use SSA values directly
   nir: remove nir_src_get_parent_instr()
   nir: remove parent_instr from nir_register

  src/gallium/drivers/vc4/vc4_program.c  |  2 +-
  src/glsl/nir/nir.c |  1 -
  src/glsl/nir/nir.h | 25 ++--
  src/glsl/nir/nir_from_ssa.c| 33 +-
  src/mesa/drivers/dri/i965/brw_fs.h |  5 ++
  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 73 
 ++
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  1 +
  src/mesa/drivers/dri/i965/brw_nir.c|  2 +-
  .../dri/i965/brw_nir_analyze_boolean_resolves.c| 12 ++--
  9 files changed, 84 insertions(+), 70 deletions(-)

 --
 2.4.3

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 5/7] util: port _mesa_strto[df] to C

2015-06-25 Thread Erik Faye-Lund

_mesa_strtod and _mesa_strtof are only used from the GLSL compiler and
the ARB_[vertex|fragment]_program code, meaning that the locale doesn't
need to be initialized before the first OpenGL context gets initialized.

So let's use explicit initialization from the one-time init code instead
of depending on a C++ compiler to initialize at image-load time.

Signed-off-by: Erik Faye-Lund kusmab...@gmail.com
Reviewed-by: Matt Turner matts...@gmail.com
---
 src/glsl/glcpp/glcpp.c|  3 ++
 src/glsl/main.cpp |  3 ++
 src/mesa/main/context.c   |  3 ++
 src/util/Makefile.sources |  2 +-
 src/util/strtod.c | 78 +++
 src/util/strtod.cpp   | 75 -
 src/util/strtod.h |  3 ++
 7 files changed, 91 insertions(+), 76 deletions(-)
 create mode 100644 src/util/strtod.c
 delete mode 100644 src/util/strtod.cpp

diff --git a/src/glsl/glcpp/glcpp.c b/src/glsl/glcpp/glcpp.c
index 5144516..c62f4ef 100644
--- a/src/glsl/glcpp/glcpp.c
+++ b/src/glsl/glcpp/glcpp.c
@@ -29,6 +29,7 @@
 #include glcpp.h
 #include main/mtypes.h
 #include main/shaderobj.h
+#include util/strtod.h
 
 extern int glcpp_parser_debug;
 
@@ -168,6 +169,8 @@ main (int argc, char *argv[])
if (shader == NULL)
   return 1;
 
+   _mesa_locale_init();
+
ret = glcpp_preprocess(ctx, shader, info_log, NULL, gl_ctx);
 
printf(%s, shader);
diff --git a/src/glsl/main.cpp b/src/glsl/main.cpp
index 2341298..58651df 100644
--- a/src/glsl/main.cpp
+++ b/src/glsl/main.cpp
@@ -38,6 +38,7 @@
 #include program/hash_table.h
 #include loop_analysis.h
 #include standalone_scaffolding.h
+#include util/strtod.h
 
 static int glsl_version = 330;
 
@@ -46,6 +47,8 @@ initialize_context(struct gl_context *ctx, gl_api api)
 {
initialize_context_to_defaults(ctx, api);
 
+   _mesa_locale_init();
+
/* The standalone compiler needs to claim support for almost
 * everything in order to compile the built-in functions.
 */
diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index c4af8ea..e68de68 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -120,6 +120,7 @@
 #include shaderobj.h
 #include shaderimage.h
 #include util/simple_list.h
+#include util/strtod.h
 #include state.h
 #include stencil.h
 #include texcompress_s3tc.h
@@ -374,6 +375,8 @@ one_time_init( struct gl_context *ctx )
   assert( sizeof(GLint) == 4 );
   assert( sizeof(GLuint) == 4 );
 
+  _mesa_locale_init();
+
   _mesa_one_time_init_extension_overrides();
 
   _mesa_get_cpu_features();
diff --git a/src/util/Makefile.sources b/src/util/Makefile.sources
index dc55939..82df3bc 100644
--- a/src/util/Makefile.sources
+++ b/src/util/Makefile.sources
@@ -19,7 +19,7 @@ MESA_UTIL_FILES :=\
set.c \
set.h \
simple_list.h \
-   strtod.cpp \
+   strtod.c \
strtod.h \
texcompress_rgtc_tmp.h \
u_atomic.h
diff --git a/src/util/strtod.c b/src/util/strtod.c
new file mode 100644
index 000..e5e6f76
--- /dev/null
+++ b/src/util/strtod.c
@@ -0,0 +1,78 @@
+/*
+ * Copyright 2010 VMware, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * Software), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#include stdlib.h
+
+#ifdef _GNU_SOURCE
+#include locale.h
+#ifdef HAVE_XLOCALE_H
+#include xlocale.h
+static locale_t loc;
+static int initialized;
+#endif
+#endif
+
+#include strtod.h
+
+
+void
+_mesa_locale_init(void)
+{
+#if defined(_GNU_SOURCE)  defined(HAVE_XLOCALE_H)
+   loc = newlocale(LC_CTYPE_MASK, C, NULL);
+#endif
+}
+
+/**
+ * Wrapper around strtod which uses the C locale so the decimal
+ * point is always '.'
+ */
+double
+_mesa_strtod(const char *s, char **end)
+{
+#if defined(_GNU_SOURCE)  defined(HAVE_XLOCALE_H)
+   return strtod_l(s, end, loc);
+#else
+   return strtod(s, end);
+#endif
+}
+
+
+/**
+ * Wrapper around strtof which uses

[Mesa-dev] [PATCH v2 6/7] mesa/main: free locale at exit

2015-06-25 Thread Erik Faye-Lund

In order to save a small leak if mesa is continously loaded and
unloaded, let's free the locale when the shared object is unloaded.

Signed-off-by: Erik Faye-Lund kusmab...@gmail.com
---
 src/mesa/main/context.c | 12 +++-
 src/util/strtod.c   |  8 
 src/util/strtod.h   |  3 +++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index e68de68..dee1fa8 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -346,6 +346,16 @@ _mesa_destroy_visual( struct gl_config *vis )
 mtx_t OneTimeLock = _MTX_INITIALIZER_NP;
 
 
+/**
+ * Calls all the various one-time-fini functions in Mesa
+ */
+
+static void
+one_time_fini()
+{
+   _mesa_destroy_shader_compiler();
+   _mesa_locale_fini();
+}
 
 /**
  * Calls all the various one-time-init functions in Mesa.
@@ -385,7 +395,7 @@ one_time_init( struct gl_context *ctx )
  _mesa_ubyte_to_float_color_tab[i] = (float) i / 255.0F;
   }
 
-  atexit(_mesa_destroy_shader_compiler);
+  atexit(one_time_fini);
 
 #if defined(DEBUG)  defined(__DATE__)  defined(__TIME__)
   if (MESA_VERBOSE != 0) {
diff --git a/src/util/strtod.c b/src/util/strtod.c
index e5e6f76..5c36b05 100644
--- a/src/util/strtod.c
+++ b/src/util/strtod.c
@@ -46,6 +46,14 @@ _mesa_locale_init(void)
 #endif
 }
 
+void
+_mesa_locale_fini(void)
+{
+#if defined(_GNU_SOURCE)  defined(HAVE_XLOCALE_H)
+   freelocale(loc);
+#endif
+}
+
 /**
  * Wrapper around strtod which uses the C locale so the decimal
  * point is always '.'
diff --git a/src/util/strtod.h b/src/util/strtod.h
index b7e2beb..60e15cf 100644
--- a/src/util/strtod.h
+++ b/src/util/strtod.h
@@ -34,6 +34,9 @@ extern C {
 extern void
 _mesa_locale_init(void);
 
+extern void
+_mesa_locale_fini(void);
+
 extern double
 _mesa_strtod(const char *s, char **end);
 
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 7/7] util: assert to verify that locale is initialized

2015-06-25 Thread Erik Faye-Lund

Add an assert to Verify that the locale has been initialized when
we call strtod. This might help some developers sleep better at
night.

Signed-off-by: Erik Faye-Lund kusmab...@gmail.com
---
 src/util/strtod.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/util/strtod.c b/src/util/strtod.c
index 5c36b05..7a73041 100644
--- a/src/util/strtod.c
+++ b/src/util/strtod.c
@@ -25,6 +25,7 @@
 
 
 #include stdlib.h
+#include assert.h
 
 #ifdef _GNU_SOURCE
 #include locale.h
@@ -43,6 +44,7 @@ _mesa_locale_init(void)
 {
 #if defined(_GNU_SOURCE)  defined(HAVE_XLOCALE_H)
loc = newlocale(LC_CTYPE_MASK, C, NULL);
+   initialized = 1;
 #endif
 }
 
@@ -51,6 +53,7 @@ _mesa_locale_fini(void)
 {
 #if defined(_GNU_SOURCE)  defined(HAVE_XLOCALE_H)
freelocale(loc);
+   initialized = 0;
 #endif
 }
 
@@ -62,6 +65,7 @@ double
 _mesa_strtod(const char *s, char **end)
 {
 #if defined(_GNU_SOURCE)  defined(HAVE_XLOCALE_H)
+   assert(initialized);
return strtod_l(s, end, loc);
 #else
return strtod(s, end);
@@ -77,6 +81,7 @@ float
 _mesa_strtof(const char *s, char **end)
 {
 #if defined(_GNU_SOURCE)  defined(HAVE_XLOCALE_H)
+   assert(initialized);
return strtof_l(s, end, loc);
 #elif defined(HAVE_STRTOF)
return strtof(s, end);
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 4/7] glsl: No need to lock in _mesa_glsl_release_types

2015-06-25 Thread Erik Faye-Lund

This function only gets called while mesa is unloading, so there's
no potential of racing or multiple calls at the same time. So let's
just get rid of the locking.

Signed-off-by: Erik Faye-Lund kusmab...@gmail.com
---
 src/glsl/glsl_types.cpp | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
index f675e90..dbe2382 100644
--- a/src/glsl/glsl_types.cpp
+++ b/src/glsl/glsl_types.cpp
@@ -324,8 +324,10 @@ const glsl_type *glsl_type::get_scalar_type() const
 void
 _mesa_glsl_release_types(void)
 {
-   mtx_lock(glsl_type::mutex);
-
+   /* should only be called during atexit (either when unloading shared
+* object, or if process terminates), so no mutex-locking should be
+* nessecary.
+*/
if (glsl_type::array_types != NULL) {
   hash_table_dtor(glsl_type::array_types);
   glsl_type::array_types = NULL;
@@ -335,8 +337,6 @@ _mesa_glsl_release_types(void)
   hash_table_dtor(glsl_type::record_types);
   glsl_type::record_types = NULL;
}
-
-   mtx_unlock(glsl_type::mutex);
 }
 
 
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 1/7] mesa/main: Get rid of outdated GDB-hack

2015-06-25 Thread Erik Faye-Lund

All of these enums are now in use around in the code, so there's no need
to explicitly use them here any more.

Signed-off-by: Erik Faye-Lund kusmab...@gmail.com
---
 src/mesa/main/context.c | 27 ---
 1 file changed, 27 deletions(-)

diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index 79fa018..265f98a 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -338,31 +338,6 @@ _mesa_destroy_visual( struct gl_config *vis )
 
 
 /**
- * This is lame.  gdb only seems to recognize enum types that are
- * actually used somewhere.  We want to be able to print/use enum
- * values such as TEXTURE_2D_INDEX in gdb.  But we don't actually use
- * the gl_texture_index type anywhere.  Thus, this lame function.
- */
-static void
-dummy_enum_func(void)
-{
-   gl_buffer_index bi = BUFFER_FRONT_LEFT;
-   gl_face_index fi = FACE_POS_X;
-   gl_frag_result fr = FRAG_RESULT_DEPTH;
-   gl_texture_index ti = TEXTURE_2D_ARRAY_INDEX;
-   gl_vert_attrib va = VERT_ATTRIB_POS;
-   gl_varying_slot vs = VARYING_SLOT_POS;
-
-   (void) bi;
-   (void) fi;
-   (void) fr;
-   (void) ti;
-   (void) va;
-   (void) vs;
-}
-
-
-/**
  * One-time initialization mutex lock.
  *
  * \sa Used by one_time_init().
@@ -434,8 +409,6 @@ one_time_init( struct gl_context *ctx )
 * #ifdef tests here.
 */
atexit(_mesa_destroy_shader_compiler);
-
-   dummy_enum_func();
 }
 
 
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 2/7] dri: don't touch the shader compiler

2015-06-25 Thread Erik Faye-Lund

This function is for deleting per-screen resources, and the shader
compiler resources are not of such nature. Besides, dri shouldn't
need to even know about the presence of a shader compiler.

These resources will already be released when mesa gets unloaded,
and that should be sufficient.

Signed-off-by: Erik Faye-Lund kusmab...@gmail.com
---
 src/mesa/drivers/dri/common/dri_util.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/mesa/drivers/dri/common/dri_util.c 
b/src/mesa/drivers/dri/common/dri_util.c
index e7ababe..ae4592c 100644
--- a/src/mesa/drivers/dri/common/dri_util.c
+++ b/src/mesa/drivers/dri/common/dri_util.c
@@ -46,7 +46,6 @@
 #include dri_util.h
 #include utils.h
 #include xmlpool.h
-#include ../glsl/glsl_parser_extras.h
 #include main/mtypes.h
 #include main/version.h
 #include main/errors.h
@@ -238,8 +237,6 @@ static void driDestroyScreen(__DRIscreen *psp)
 * stream open to the X-server anymore.
 */
 
-   _mesa_destroy_shader_compiler();
-
psp-driver-DestroyScreen(psp);
 
driDestroyOptionCache(psp-optionCache);
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 3/7] mesa/main: only call _mesa_destroy_shader_compiler once on exit

2015-06-25 Thread Erik Faye-Lund

There's no point in calling _mesa_destroy_shader_compiler multiple
times on exit; the resources will only be released once anyway.

So let's move the atexit-call into the part that is only called
once.

Signed-off-by: Erik Faye-Lund kusmab...@gmail.com
---
 src/mesa/main/context.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index 265f98a..c4af8ea 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -382,6 +382,8 @@ one_time_init( struct gl_context *ctx )
  _mesa_ubyte_to_float_color_tab[i] = (float) i / 255.0F;
   }
 
+  atexit(_mesa_destroy_shader_compiler);
+
 #if defined(DEBUG)  defined(__DATE__)  defined(__TIME__)
   if (MESA_VERBOSE != 0) {
 _mesa_debug(ctx, Mesa %s DEBUG build %s %s\n,
@@ -404,11 +406,6 @@ one_time_init( struct gl_context *ctx )
api_init_mask |= 1  ctx-API;
 
mtx_unlock(OneTimeLock);
-
-   /* Hopefully atexit() is widely available.  If not, we may need some
-* #ifdef tests here.
-*/
-   atexit(_mesa_destroy_shader_compiler);
 }
 
 
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 0/7] port _mesa_strto[df] to C

2015-06-25 Thread Erik Faye-Lund

Back in March[1], I sent a patch porting _mesa_strto[df] to
C rather than C++. I fixed up the patch according to the
criticism, but unfortunately I dropped the ball before I sent
out the result. So here I am, picking it back up!

This time I've taken a deeper dive into the whole init/deinit
of Mesa, and cleaned up a bunch of stuff in that area. And as
a result, this time we end up freeing the locale also.

No Piglit regressions observed.

[1]: 1426446329-23984-1-git-send-email-kusmab...@gmail.com

Erik Faye-Lund (7):
  mesa/main: Get rid of outdated GDB-hack
  dri: don't touch the shader compiler
  mesa/main: only call _mesa_destroy_shader_compiler once on exit
  glsl: No need to lock in _mesa_glsl_release_types
  util: port _mesa_strto[df] to C
  mesa/main: free locale at exit
  util: assert to verify that locale is initialized

 src/glsl/glcpp/glcpp.c |  3 ++
 src/glsl/glsl_types.cpp|  8 +--
 src/glsl/main.cpp  |  3 ++
 src/mesa/drivers/dri/common/dri_util.c |  3 --
 src/mesa/main/context.c| 47 ++
 src/util/Makefile.sources  |  2 +-
 src/util/strtod.c  | 91 ++
 src/util/strtod.cpp| 75 
 src/util/strtod.h  |  6 +++
 9 files changed, 123 insertions(+), 115 deletions(-)
 create mode 100644 src/util/strtod.c
 delete mode 100644 src/util/strtod.cpp

-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: add support for viewport array (v2)

2015-06-25 Thread Marek Olšák

Wait a moment, how did it fail with si_shader_io_get_unique_index? The
function shouldn't be called for ES with the viewport index, because
ES can't pass the output to GS. If it was called, ignoring the
viewport index in si_llvm_emit_es_epilogue should fix it.

Marek

On Thu, Jun 25, 2015 at 10:29 PM, Marek Olšák mar...@gmail.com wrote:
 In that case, feel free to push.

 Reviewed-by: Marek Olšák marek.ol...@amd.com

 Marek

 On Thu, Jun 25, 2015 at 10:25 PM, Dave Airlie airl...@gmail.com wrote:
 On 26 June 2015 at 00:26, Marek Olšák mar...@gmail.com wrote:
 Hi Dave,

 The change in si_shader_io_get_unique_index can be dropped. The
 function is only used for shaders before GS.

 Ok okay I was hitting the assert in there for the layer/viewport index cases,
 but if the patch you pushed to master helps I'll drop it.

 This looks good, but I've had a different plan for this feature:

 Yeah I thought you might, I just wanted to hack something up to see it 
 working
 since it seems new Dota 2 uses this feature.

 I'd like the states to be converted into 2 atoms:

 1 r600_atom for all 16 viewports
 1 r600_atom for all 16 scissors

 Each atom should have a bitmask saying which slots are dirty. (the same
 idea as resource slots)

 The emit functions should only emit dirty viewports/scissors.

 Also, the emit functions shouldn't emit non-zero viewports/scissors
 if the viewport index isn't written by the hardware VS stage
 (si_get_vs_info(sctx)-...). This should keep the same level of
 effectiveness as before. When a shader that writes the viewport index
 is bound *and* there are any dirty viewports or scissors, that's the
 right time to mark the atoms as dirty again, so that non-zero dirty
 viewports/scissors are finally emitted.

 I'm not sure if I'll get to doing more of it, I just had a day at home,
 and wanted to get up to speed on radeonsi a bit, so I can just leave
 that here, in case anyone wants to pick it up,

 Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 91112] [HSW/BYT] dEQP-GLES3.functional.uniform_api.random.23 test timeout

2015-06-25 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=91112

Bug ID: 91112
   Summary: [HSW/BYT] dEQP-GLES3.functional.uniform_api.random.23
test timeout
   Product: Mesa
   Version: 10.6
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: brian.wil...@intel.com
QA Contact: mesa-dev@lists.freedesktop.org

This isn't an outright test failure per-se, but dEQP is marking the test
timeout as such.

Timeout is on: dEQP-GLES3.functional.uniform_api.random.23

Mesa commit being tested:
c2a0600d5b0645533ba442b5ab879b23c2564a4d: i965: Don't set NirOptions for
stages that will use the vec4 backend.
Plus patches:
- 10.6-i965-do-not-round-line-width-when-multisampling-or-a.patch
- 10.6-mesa-add-GL_RED-GL_RG-support-for-floating-point-tex.patch
- 10.6-i965-Momentarily-pretend-to-support-ARB_texture_sten.patch
- 10.6-Revert-i965-Advertise-a-line-width-of-40.0-on-Cherry.patch

This test passes on BDW and BSW, and passed on HSW/BYT with an older Mesa
commit (November 18, 2014: 129178893b2260df22db96327c5ca9c2ce7db046 glsl:
Generate unique names for each const array lowered to uniforms + ~10 patches)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: add support for geometry shader invocations.

2015-06-25 Thread Ilia Mirkin

On Thu, Jun 25, 2015 at 4:29 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 On Thu, Jun 25, 2015 at 4:26 PM, Dave Airlie airl...@gmail.com wrote:
 From: Dave Airlie airl...@redhat.com

 Signed-off-by: Dave Airlie airl...@redhat.com
 ---
  src/gallium/drivers/radeonsi/si_shader.c| 5 +
  src/gallium/drivers/radeonsi/si_shader.h| 1 +
  src/gallium/drivers/radeonsi/si_state.c | 1 -
  src/gallium/drivers/radeonsi/si_state_shaders.c | 7 +++
  4 files changed, 13 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
 b/src/gallium/drivers/radeonsi/si_shader.c
 index 87608a1..665ce83 100644
 --- a/src/gallium/drivers/radeonsi/si_shader.c
 +++ b/src/gallium/drivers/radeonsi/si_shader.c
 @@ -630,6 +630,11 @@ static void declare_system_value(
  SI_PARAM_BASE_VERTEX);
 break;

 +   case TGSI_SEMANTIC_INVOCATIONID:
 +   value = LLVMGetParam(radeon_bld-main_fn,
 +SI_PARAM_GS_INSTANCE_ID);
 +   break;
 +
 case TGSI_SEMANTIC_SAMPLEID:
 value = get_sample_id(radeon_bld);
 break;
 diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
 b/src/gallium/drivers/radeonsi/si_shader.h
 index 51055af..b4339ae 100644
 --- a/src/gallium/drivers/radeonsi/si_shader.h
 +++ b/src/gallium/drivers/radeonsi/si_shader.h
 @@ -115,6 +115,7 @@ struct si_shader_selector {

 unsignedgs_output_prim;
 unsignedgs_max_out_vertices;
 +   unsignedgs_num_invocations;
 uint64_tgs_used_inputs; /* mask of get_unique_index bits */
  };

 diff --git a/src/gallium/drivers/radeonsi/si_state.c 
 b/src/gallium/drivers/radeonsi/si_state.c
 index 752467b..0dd08a2 100644
 --- a/src/gallium/drivers/radeonsi/si_state.c
 +++ b/src/gallium/drivers/radeonsi/si_state.c
 @@ -3078,7 +3078,6 @@ void si_init_config(struct si_context *sctx)
 si_pm4_set_reg(pm4, R_028B60_VGT_GS_VERT_ITEMSIZE_1, 0);
 si_pm4_set_reg(pm4, R_028B64_VGT_GS_VERT_ITEMSIZE_2, 0);
 si_pm4_set_reg(pm4, R_028B68_VGT_GS_VERT_ITEMSIZE_3, 0);
 -   si_pm4_set_reg(pm4, R_028B90_VGT_GS_INSTANCE_CNT, 0);

 si_pm4_set_reg(pm4, R_028B98_VGT_STRMOUT_BUFFER_CONFIG, 0x0);
 si_pm4_set_reg(pm4, R_028AB4_VGT_REUSE_OFF, 0);
 diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
 b/src/gallium/drivers/radeonsi/si_state_shaders.c
 index 48128fa..eef3baa 100644
 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
 +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
 @@ -76,6 +76,7 @@ static void si_shader_gs(struct si_shader *shader)
 unsigned gs_vert_itemsize = shader-selector-info.num_outputs * (16 
  2);
 unsigned gs_max_vert_out = shader-selector-gs_max_out_vertices;
 unsigned gsvs_itemsize = gs_vert_itemsize * gs_max_vert_out;
 +   unsigned gs_num_invocations = shader-selector-gs_num_invocations;
 unsigned cut_mode;
 struct si_pm4_state *pm4;
 unsigned num_sgprs, num_user_sgprs;
 @@ -118,6 +119,10 @@ static void si_shader_gs(struct si_shader *shader)

 si_pm4_set_reg(pm4, R_028B5C_VGT_GS_VERT_ITEMSIZE, gs_vert_itemsize);

 +   si_pm4_set_reg(pm4, R_028B90_VGT_GS_INSTANCE_CNT,
 +  S_028B90_CNT(MIN2(gs_num_invocations, 127)) |
 +  S_028B90_ENABLE(gs_num_invocations  0));

 Without knowing how radeonsi works, just want to point out that
 previously GS_INSTANCE_CNT was being always set to 0, even though a
 single invocation was still being run. Perhaps this really needs to be
 gs_num_invocations_minus_one?

 GS_INVOCATIONS will always be at least 1.

OK, I looked at the docs. Looks like the enable is for instance id
generation, and if invocations == 0, that's also treated as a disable.

This may end up generating the instanceid unnecessarily, but that's
probably OK. I might try changing that enable to be  1, but probably
not a big deal either way.

  -ilia


 +
 va = shader-bo-gpu_address;
 si_pm4_add_bo(pm4, shader-bo, RADEON_USAGE_READ, 
 RADEON_PRIO_SHADER_DATA);
 si_pm4_set_reg(pm4, R_00B220_SPI_SHADER_PGM_LO_GS, va  8);
 @@ -490,6 +495,8 @@ static void *si_create_shader_state(struct pipe_context 
 *ctx,
 sel-info.properties[TGSI_PROPERTY_GS_OUTPUT_PRIM];
 sel-gs_max_out_vertices =
 
 sel-info.properties[TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES];
 +   sel-gs_num_invocations =
 +   sel-info.properties[TGSI_PROPERTY_GS_INVOCATIONS];

 for (i = 0; i  sel-info.num_inputs; i++) {
 unsigned name = sel-info.input_semantic_name[i];
 --
 2.4.3

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 91106] glx: make check fails to build on osx

2015-06-25 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=91106

Julien Isorce julien.iso...@gmail.com changed:

   What|Removed |Added

 CC||bri...@vmware.com,
   ||emil.l.veli...@gmail.com,
   ||jerem...@freedesktop.org,
   ||jon.tur...@dronecode.org.uk
   ||, k...@bitplanet.net,
   ||matts...@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH mesa] i965/gen8+: bo in state base address must be in 32-bit address range

2015-06-25 Thread Michel Thierry


On 6/24/2015 4:51 AM, Ben Widawsky wrote:

Hi. Feel free to Cc me on patches of this nature. I am far behind on mesa-dev,
and no longer read intel-gfx. I'm probably one of the sensible people to look at
this...

On Tue, Jun 23, 2015 at 01:21:27PM +0100, Michel Thierry wrote:

Gen8+ supports 48-bit virtual addresses, but some objects must always be
allocated inside the 32-bit address range.

In specific, any resource used with flat/heapless (0x-0xf000)
General State Heap (GSH) or Intruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State Offset
are limited to 32-bits.


I don't think GSH, or ISH are well known terms that have every appeared
anywhere. I'd just keep the bit after the final comma (...because ...)



Set provided bo flag when the 4GB limit is not necessary, to be able to use
the full address space.


I'm glad you got around to this. We'd been putting it off for a long time.



Cc: mesa-dev@lists.freedesktop.org
Signed-off-by: Michel Thierry michel.thie...@intel.com
---
  src/mesa/drivers/dri/i965/gen8_misc_state.c   | 6 +++---
  src/mesa/drivers/dri/i965/intel_batchbuffer.h | 7 +++
  2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_misc_state.c 
b/src/mesa/drivers/dri/i965/gen8_misc_state.c
index b20038e..26531d0 100644
--- a/src/mesa/drivers/dri/i965/gen8_misc_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_misc_state.c
@@ -41,17 +41,17 @@ void gen8_upload_state_base_address(struct brw_context *brw)
 OUT_BATCH(0);
 OUT_BATCH(mocs_wb  16);
 /* Surface state base address: */
-   OUT_RELOC64(brw-batch.bo, I915_GEM_DOMAIN_SAMPLER, 0,
+   OUT_RELOC64_32BWA(brw-batch.bo, I915_GEM_DOMAIN_SAMPLER, 0,
 mocs_wb  4 | 1);
 /* Dynamic state base address: */
-   OUT_RELOC64(brw-batch.bo,
+   OUT_RELOC64_32BWA(brw-batch.bo,
 I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION, 0,
 mocs_wb  4 | 1);
 /* Indirect object base address: MEDIA_OBJECT data */
 OUT_BATCH(mocs_wb  4 | 1);
 OUT_BATCH(0);
 /* Instruction base address: shader kernels (incl. SIP) */
-   OUT_RELOC64(brw-cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
+   OUT_RELOC64_32BWA(brw-cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
 mocs_wb  4 | 1);

 /* General state buffer size */
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index 7bdd836..5aa741e 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -177,6 +177,13 @@ intel_batchbuffer_advance(struct brw_context *brw)

  /* Handle 48-bit address relocations for Gen8+ */
  #define OUT_RELOC64(buf, read_domains, write_domain, delta) do { \
+   drm_intel_bo_set_supports_48baddress(buf); \
+   intel_batchbuffer_emit_reloc64(brw, buf, read_domains, write_domain, 
delta);\
+} while (0)
+
+/* Handle 48-bit address relocations for Gen8+, ask for 32-bit address */
+#define OUT_RELOC64_32BWA(buf, read_domains, write_domain, delta) do { \
+   drm_intel_bo_clear_supports_48baddress(buf); \
 intel_batchbuffer_emit_reloc64(brw, buf, read_domains, write_domain, 
delta);   \
  } while (0)



First and least bikesheddy, you need to bump the required libdrm in the
configure.ac to support this new libdrm function (maybe you did, but I don't see
it on mesa-dev).


I didn't bump the libdrm version either. I'll make sure to include this 
in the next version.




More bikesheddy, and forgive me here because I haven't looked at any of the
kernel interfaces or libdrm patches (you can Cc those to mesa-dev if they're
relevant fwiw).

Presumably at the end of the day it's drm_intel_bo_emit_reloc which needs to
know about these limitations. Unfortunately we don't have a flags field there.
The implementation here seems like a somewhat cumbersome workaround for that (it
looks like the context execbuf which is pretty crappy - yes, I know who the
author was). Have you already discussed adding a new emit_reloc? I suppose if
people are opposed to a new emit reloc, the only I'd like to see different is
have the functions which need the workaround just call OUT_RELOC, instead of
OUT_RELOC64 (put a comment in the call sites), and make OUT_RELOC call the
drm_intel_bo_clear_supports_48baddress() (which is obviously a nop on pre-gen8
platforms). The OUT_RELOC64 case should be left alone - we shouldn't need to
tell libdrm that I want a 64bit relocation, and it can actually be 64b...


Right, at the end I was only looking to flag the exec_objects that can 
be safely outside the first 4GBs (it was decided to be an opt-in to not 
break any other user). I can change it like you suggest, OUT_RELOC will 
mean that we need the wa, while OUT_RELOC64 means we don't.


If you want to take a look, these are the libdrm and kernel changes:

libdrm:

Re: [Mesa-dev] [Mesa-stable] [PATCH 2/6] clover: Call clBuildProgram() notification function when build completes v2

2015-06-25 Thread Emil Velikov

On 3 June 2015 at 14:27, Francisco Jerez curroje...@riseup.net wrote:
 Emil Velikov emil.l.veli...@gmail.com writes:

 Hi Tom,

 On 31 March 2015 at 15:29, Francisco Jerez curroje...@riseup.net wrote:
 Tom Stellard thomas.stell...@amd.com writes:

 v2:
   - Only call notification for build errors
   - Fix clCompileProgram()

 Cc: 10.5 10.4 mesa-sta...@lists.freedesktop.org
 ---
  src/gallium/state_trackers/clover/api/program.cpp | 16 ++--
  1 file changed, 14 insertions(+), 2 deletions(-)

 diff --git a/src/gallium/state_trackers/clover/api/program.cpp 
 b/src/gallium/state_trackers/clover/api/program.cpp
 index 60184ed..5cd543c 100644
 --- a/src/gallium/state_trackers/clover/api/program.cpp
 +++ b/src/gallium/state_trackers/clover/api/program.cpp
 @@ -180,12 +180,18 @@ clBuildProgram(cl_program d_prog, cl_uint num_devs,
 validate_build_program_common(prog, num_devs, d_devs, pfn_notify, 
 user_data);

 prog.build(devs, opts);
 +   if (pfn_notify)
 +  pfn_notify(d_prog, user_data);

 Maybe leave blank lines around the conditional where preceded/followed
 by another statement?

 return CL_SUCCESS;
 +} catch (const build_error e) {
 +   if (pfn_notify)
 +  pfn_notify(d_prog, user_data);
 +   if (e.get() == CL_COMPILE_PROGRAM_FAILURE)
 +  return CL_BUILD_PROGRAM_FAILURE;
 +   return e.get();

 You can just return CL_BUILD_PROGRAM_FAILURE here unconditionally.

  } catch (error e) {
 if (e.get() == CL_INVALID_COMPILER_OPTIONS)
return CL_INVALID_BUILD_OPTIONS;
 -   if (e.get() == CL_COMPILE_PROGRAM_FAILURE)
 -  return CL_BUILD_PROGRAM_FAILURE;
 return e.get();
  }

 @@ -223,8 +229,14 @@ clCompileProgram(cl_program d_prog, cl_uint num_devs,
objsallow_empty_tag(d_header_progs, num_headers));

 prog.build(devs, opts, headers);
 +   if (pfn_notify)
 +  pfn_notify(d_prog, user_data);

 Same here.

 Did you had the chance to address Francisco's comments ? Did this
 patch fell through the cracks, or should I consider it
 obsolete/rejected ?


 I think the patch was good it just fell through the cracks.  Tom?

Humble reminder.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Mesa-stable] [PATCH 4/4] glsl: validate sampler array indexing for 'constant-index-expression'

2015-06-25 Thread Emil Velikov

Hi gents,

On 9 June 2015 at 14:09, Francisco Jerez curroje...@riseup.net wrote:
 Francisco Jerez curroje...@riseup.net writes:

 Tapani Pälli tapani.pa...@intel.com writes:

 Desktop GLSL  130 and GLSL ES  300 allow sampler array indexing where
 index can contain a loop induction variable. This extra check will warn
 during linking if some of the indexes could not be turned in to constant
 expressions.

 v2: warning instead of error for backends that did not enable
 UnrollSamplerArrayDynamicIndexing option (have dynamic indexing)

 Signed-off-by: Tapani Pälli tapani.pa...@intel.com
 Cc: 10.5 and 10.6 mesa-sta...@lists.freedesktop.org
 ---
  src/glsl/linker.cpp | 77 
 +
  1 file changed, 77 insertions(+)

 diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
 index 9978380..27d7c18 100644
 --- a/src/glsl/linker.cpp
 +++ b/src/glsl/linker.cpp
 @@ -346,6 +346,39 @@ private:
 bool uses_non_zero_stream;
  };

 +/* Class that finds array derefs and check if indexes are dynamic. */
 +class dynamic_sampler_array_indexing_visitor : public 
 ir_hierarchical_visitor
 +{
 +public:
 +   dynamic_sampler_array_indexing_visitor() :
 +  dynamic_sampler_array_indexing(false)
 +   {
 +   }
 +
 +   ir_visitor_status visit_enter(ir_dereference_array *ir)
 +   {
 +  if (!ir-variable_referenced())
 + return visit_continue;
 +
 +  if (!ir-variable_referenced()-type-contains_sampler())
 + return visit_continue;
 +
 +  if (!ir-array_index-constant_expression_value()) {
 + dynamic_sampler_array_indexing = true;
 + return visit_stop;
 +  }
 +  return visit_continue;
 +   }
 +
 +   bool uses_dynamic_sampler_array_indexing()
 +   {
 +  return dynamic_sampler_array_indexing;
 +   }
 +
 +private:
 +   bool dynamic_sampler_array_indexing;
 +};
 +
  } /* anonymous namespace */

  void
 @@ -2736,6 +2769,40 @@ build_program_resource_list(struct gl_context *ctx,
  */
  }

 +/**
 + * This check is done to make sure we allow only constant expression
 + * indexing and constant-index-expression (indexing with an expression
 + * that includes loop induction variable).
 + */
 +static bool
 +validate_sampler_array_indexing(struct gl_context *ctx,
 +struct gl_shader_program *prog)
 +{
 +   dynamic_sampler_array_indexing_visitor v;
 +   for (unsigned i = 0; i  MESA_SHADER_STAGES; i++) {
 +  if (prog-_LinkedShaders[i] == NULL)
 + continue;
 +
 +  bool no_dynamic_indexing =
 + 
 ctx-Const.ShaderCompilerOptions[i].UnrollSamplerArrayDynamicIndexing;
 +
 +  /* Search for array derefs in shader. */
 +  v.run(prog-_LinkedShaders[i]-ir);
 +  if (v.uses_dynamic_sampler_array_indexing()) {
 + const char *msg = sampler arrays indexed with non-constant 
 +   expressions is forbidden in GLSL %s %u;

 For the sake of clarity, maybe add that it's sampler array indexing with
 *general* non-constant expressions what is forbidden, loop induction
 variables are allowed and they are technically a kind of non-constant
 expression.

 + /* Backend has indicated that it has no dynamic indexing support. 
 */
 + if (no_dynamic_indexing) {
 +linker_error(prog, msg, prog-IsES ? ES : , prog-Version);
 +return false;
 + } else {
 +linker_warning(prog, msg, prog-IsES ? ES : , 
 prog-Version);

 It seems a bit mean to spam the user with another warning at link time,
 you've already warned in PATCH 01 that this feature will be removed in
 more recent GLSL versions.  If you drop the warning:

 Er, nevermind, the warning here is indeed subtly different (you are
 doing a kind of indexing not considered under the
 constant-index-expression wording), disregard my comment about dropping
 the warning, it seems fine to warn the user twice.

 Reviewed-by: Francisco Jerez curroje...@riseup.net

If I understand things correctly the series (patches 1/4 and 4/4 from
stable pov) are reviewed but are not in master. Are there any
obstacles/objections against doing so ?

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/gen9: use an unreserved surface alignment value

2015-06-25 Thread Nanley Chery

On Thu, Jun 25, 2015 at 12:37 AM, Pohjolainen, Topi
topi.pohjolai...@intel.com wrote:
 On Wed, Jun 24, 2015 at 05:57:13PM -0700, Anuj Phogat wrote:
 On Wed, Jun 24, 2015 at 3:51 PM, Nanley Chery nanleych...@gmail.com wrote:
  From: Nanley Chery nanley.g.ch...@intel.com
 
  Although the horizontal and vertical alignment fields are ignored here,
  0 is a reserved value for them and may cause undefined behavior. Change
  the default value to an abitrary valid one.
 
  Signed-off-by: Nanley Chery nanley.g.ch...@intel.com
  ---
   src/mesa/drivers/dri/i965/gen8_surface_state.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
 
  diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
  b/src/mesa/drivers/dri/i965/gen8_surface_state.c
  index b2d1a57..22ae960 100644
  --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
  +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
  @@ -93,7 +93,7 @@ vertical_alignment(const struct brw_context *brw,
  if (brw-gen  8 
  (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
   surf_type == BRW_SURFACE_1D))
  -  return 0;
  +  return GEN8_SURFACE_VALIGN_4;
 
  switch (mt-align_h) {
  case 4:
  @@ -118,7 +118,7 @@ horizontal_alignment(const struct brw_context *brw,
  if (brw-gen  8 
  (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
   gen9_use_linear_1d_layout(brw, mt)))
  -  return 0;
  +  return GEN8_SURFACE_HALIGN_4;
 
  switch (mt-align_w) {
  case 4:
  --
  2.4.4
 
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev

 Good find Nanley. We had no known issues with value 0 but it's
 always nice to avoid undefined behavior :).

 Right, I thought about this when I reviewed the original. The spec says
 it is ignored in these cases and hence the reserved value seemed fine. Now
 that we put something meaningful there, somebody is going to compare the
 spec and wonder why we set it to 4. If we added also a comment here that
 says this is just an arbitrary (non-reserved) value and really ignored
 by the hardware, it would prevent misunderstandings. What do you guys think?

There's enough space to insert Set to an arbitrary non-reserved
value. in both of the comments preceding the conditional without
adding an extra line. I wouldn't mind including it.

Thanks,
Nanley
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: add support for viewport array (v2)

2015-06-25 Thread Marek Olšák

Hi Dave,

The change in si_shader_io_get_unique_index can be dropped. The
function is only used for shaders before GS.

This looks good, but I've had a different plan for this feature:

I'd like the states to be converted into 2 atoms:

1 r600_atom for all 16 viewports
1 r600_atom for all 16 scissors

Each atom should have a bitmask saying which slots are dirty. (the same
idea as resource slots)

The emit functions should only emit dirty viewports/scissors.

Also, the emit functions shouldn't emit non-zero viewports/scissors
if the viewport index isn't written by the hardware VS stage
(si_get_vs_info(sctx)-...). This should keep the same level of
effectiveness as before. When a shader that writes the viewport index
is bound *and* there are any dirty viewports or scissors, that's the
right time to mark the atoms as dirty again, so that non-zero dirty
viewports/scissors are finally emitted.

Marek

On Thu, Jun 25, 2015 at 6:38 AM, Dave Airlie airl...@gmail.com wrote:
 From: Dave Airlie airl...@redhat.com

 This isn't pretty and I'd suggest it the pm4 interface builder
 could be tweaked to do this more efficently, but I'd need
 guidance on how that would look.

 This seems to pass the few piglit tests I threw at it.

 v2: handle passing layer/viewport index to fragment shader.
 fix crash in blit changes,
 add support to io_get_unique_index for layer/viewport index
 update docs.

 Signed-off-by: Dave Airlie airl...@redhat.com
 ---
  docs/GL3.txt|  4 +-
  docs/relnotes/10.7.0.html   |  3 ++
  src/gallium/drivers/radeonsi/si_blit.c  |  8 +--
  src/gallium/drivers/radeonsi/si_pipe.c  |  2 +-
  src/gallium/drivers/radeonsi/si_shader.c| 26 +++---
  src/gallium/drivers/radeonsi/si_state.c | 66 
 +++--
  src/gallium/drivers/radeonsi/si_state.h |  4 +-
  src/gallium/drivers/radeonsi/si_state_shaders.c |  2 -
  8 files changed, 73 insertions(+), 42 deletions(-)

 diff --git a/docs/GL3.txt b/docs/GL3.txt
 index 220bcc8..df913bd 100644
 --- a/docs/GL3.txt
 +++ b/docs/GL3.txt
 @@ -128,7 +128,7 @@ GL 4.1, GLSL 4.10:
GL_ARB_separate_shader_objects   DONE (all drivers)
GL_ARB_shader_precision  started (Micah)
GL_ARB_vertex_attrib_64bit   DONE (nvc0, softpipe)
 -  GL_ARB_viewport_arrayDONE (i965, nv50, 
 nvc0, r600, llvmpipe)
 +  GL_ARB_viewport_arrayDONE (i965, nv50, 
 nvc0, r600, radeonsi, llvmpipe)


  GL 4.2, GLSL 4.20:
 @@ -156,7 +156,7 @@ GL 4.3, GLSL 4.30:
GL_ARB_copy_imageDONE (i965) (gallium 
 - in progress, VMware)
GL_KHR_debug DONE (all drivers)
GL_ARB_explicit_uniform_location DONE (all drivers 
 that support GLSL)
 -  GL_ARB_fragment_layer_viewport   DONE (nv50, nvc0, 
 r600, llvmpipe)
 +  GL_ARB_fragment_layer_viewport   DONE (nv50, nvc0, 
 r600, radeonsi, llvmpipe)
GL_ARB_framebuffer_no_attachmentsDONE (i965)
GL_ARB_internalformat_query2 not started
GL_ARB_invalidate_subdataDONE (all drivers)
 diff --git a/docs/relnotes/10.7.0.html b/docs/relnotes/10.7.0.html
 index e089889..fcc5081 100644
 --- a/docs/relnotes/10.7.0.html
 +++ b/docs/relnotes/10.7.0.html
 @@ -44,8 +44,11 @@ Note: some of the new features are only available with 
 certain drivers.
  /p

  ul
 +liGL_AMD_vertex_shader_viewport_index on radeonsi/li
  liGL_ARB_framebuffer_no_attachments on i965/li
  liGL_ARB_shader_stencil_export on llvmpipe/li
 +liGL_ARB_viewport_array on radeonsi/li
 +liGL_ARB_fragment_layer_viewport on radeonsi/li
  /ul

  h2Bug fixes/h2
 diff --git a/src/gallium/drivers/radeonsi/si_blit.c 
 b/src/gallium/drivers/radeonsi/si_blit.c
 index 1f2c408..6c7b383 100644
 --- a/src/gallium/drivers/radeonsi/si_blit.c
 +++ b/src/gallium/drivers/radeonsi/si_blit.c
 @@ -63,11 +63,11 @@ static void si_blitter_begin(struct pipe_context *ctx, 
 enum si_blitter_op op)
 util_blitter_save_sample_mask(sctx-blitter,
   
 sctx-queued.named.sample_mask-sample_mask);
 }
 -   if (sctx-queued.named.viewport) {
 -   util_blitter_save_viewport(sctx-blitter, 
 sctx-queued.named.viewport-viewport);
 +   if (sctx-queued.named.viewport[0]) {
 +   util_blitter_save_viewport(sctx-blitter, 
 sctx-queued.named.viewport[0]-viewport);
 }
 -   if (sctx-queued.named.scissor) {
 -   util_blitter_save_scissor(sctx-blitter, 
 sctx-queued.named.scissor-scissor);
 +   if (sctx-queued.named.scissor[0]) {
 +   util_blitter_save_scissor(sctx-blitter, 
 sctx-queued.named.scissor[0]-scissor);
 }

Re: [Mesa-dev] [Mesa-stable] [PATCH 2/6] clover: Call clBuildProgram() notification function when build completes v2

2015-06-25 Thread Tom Stellard

On Thu, Jun 25, 2015 at 03:19:40PM +0100, Emil Velikov wrote:
 On 3 June 2015 at 14:27, Francisco Jerez curroje...@riseup.net wrote:
  Emil Velikov emil.l.veli...@gmail.com writes:
 
  Hi Tom,
 
  On 31 March 2015 at 15:29, Francisco Jerez curroje...@riseup.net wrote:
  Tom Stellard thomas.stell...@amd.com writes:
 
  v2:
- Only call notification for build errors
- Fix clCompileProgram()
 
  Cc: 10.5 10.4 mesa-sta...@lists.freedesktop.org
  ---
   src/gallium/state_trackers/clover/api/program.cpp | 16 ++--
   1 file changed, 14 insertions(+), 2 deletions(-)
 
  diff --git a/src/gallium/state_trackers/clover/api/program.cpp 
  b/src/gallium/state_trackers/clover/api/program.cpp
  index 60184ed..5cd543c 100644
  --- a/src/gallium/state_trackers/clover/api/program.cpp
  +++ b/src/gallium/state_trackers/clover/api/program.cpp
  @@ -180,12 +180,18 @@ clBuildProgram(cl_program d_prog, cl_uint num_devs,
  validate_build_program_common(prog, num_devs, d_devs, pfn_notify, 
  user_data);
 
  prog.build(devs, opts);
  +   if (pfn_notify)
  +  pfn_notify(d_prog, user_data);
 
  Maybe leave blank lines around the conditional where preceded/followed
  by another statement?
 
  return CL_SUCCESS;
  +} catch (const build_error e) {
  +   if (pfn_notify)
  +  pfn_notify(d_prog, user_data);
  +   if (e.get() == CL_COMPILE_PROGRAM_FAILURE)
  +  return CL_BUILD_PROGRAM_FAILURE;
  +   return e.get();
 
  You can just return CL_BUILD_PROGRAM_FAILURE here unconditionally.
 
   } catch (error e) {
  if (e.get() == CL_INVALID_COMPILER_OPTIONS)
 return CL_INVALID_BUILD_OPTIONS;
  -   if (e.get() == CL_COMPILE_PROGRAM_FAILURE)
  -  return CL_BUILD_PROGRAM_FAILURE;
  return e.get();
   }
 
  @@ -223,8 +229,14 @@ clCompileProgram(cl_program d_prog, cl_uint 
  num_devs,
 objsallow_empty_tag(d_header_progs, num_headers));
 
  prog.build(devs, opts, headers);
  +   if (pfn_notify)
  +  pfn_notify(d_prog, user_data);
 
  Same here.
 
  Did you had the chance to address Francisco's comments ? Did this
  patch fell through the cracks, or should I consider it
  obsolete/rejected ?
 
 
  I think the patch was good it just fell through the cracks.  Tom?
 
 Humble reminder.

Sorry, I haven't had a chance to update this patch, you can ignore it.

-Tom
 
 -Emil
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] clover: fix event handling of buffer operations

2015-06-25 Thread Grigori Goronzy


On 2015-06-09 22:52, Francisco Jerez wrote:

+
+   if (blocking)
+  hev().wait();
+


hard_event::wait() may fail, so this should probably be done before the
ret_object() call to avoid leaks.


Alright... C++ exceptions are a minefield. :)


Is there any reason you didn't make
the same change in clEnqueueReadBuffer() and clEnqueueWriteBuffer()?



Must be an oversight. I think I did that, or at least I intended to do 
so.



Same comment as above.  Also note that this is being more strict than
the spec requires (which I believe is what Tom was referring to).  From
the CL 1.2 spec:

| If blocking_write is CL_TRUE, the OpenCL implementation copies the 
data

| referred to by ptr and enqueues the write operation in the
| command-queue. The memory pointed to by ptr can be reused by the
| application after the clEnqueueWriteBufferRect call returns.

The spec is giving you no guarantee that the write to the actual memory
object will be complete by the time the clEnqueueWriteBufferRect call
returns -- Only that your data will have been buffered somewhere and 
the

memory pointed to by the argument can be reused immediately by the
application.  The reason why I was reluctant to make this change last
time it came up was that it's likely to hurt performance unnecessarily
because the wait() call blocks until *all* previous commands in the 
same

queue have completed execution, even though in the most common case the
copy is performed synchronously using soft_copy_op(), so the wait() 
call

is redundant even for blocking copies.



OK, maybe we could drop the wait completely for all of the write 
calls.



The case with blocking reads is similar, the copy is handled
synchronously using soft_copy_op() when no user events are present in
the list of dependencies, so calling wait() on the event is unnecessary
to guarantee that the execution of the read has completed, and will
cause a pipe_context flush and wait until the most recent fence is
signalled.



I think it's reasonable to expect that the event is ready for profile 
queries after a blocking read has finished. That was the initial 
motivation for this patch. Other implementations behave like that. I 
didn't expect wait() to completely flush everything. Won't that cause a 
lot of needless flushing with event wait lists?



Ideally we would have a weaker variant of event::wait()
(e.g. wait_signalled()) that doesn't flush and just waits for the
associated action call-back to have been executed without giving any
guarantees about the corresponding GPU command.  The event interface
doesn't expose such a functionality right now, I'm attaching two
(completely untested) patches implementing it, you should be able to 
use

them as starting point to fix blocking transfers.



Thanks, I'll look into that later when I get some free time.

Grigori
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/gen9: use an unreserved surface alignment value

2015-06-25 Thread Anuj Phogat

On Thu, Jun 25, 2015 at 12:37 AM, Pohjolainen, Topi
topi.pohjolai...@intel.com wrote:
 On Wed, Jun 24, 2015 at 05:57:13PM -0700, Anuj Phogat wrote:
 On Wed, Jun 24, 2015 at 3:51 PM, Nanley Chery nanleych...@gmail.com wrote:
  From: Nanley Chery nanley.g.ch...@intel.com
 
  Although the horizontal and vertical alignment fields are ignored here,
  0 is a reserved value for them and may cause undefined behavior. Change
  the default value to an abitrary valid one.
 
  Signed-off-by: Nanley Chery nanley.g.ch...@intel.com
  ---
   src/mesa/drivers/dri/i965/gen8_surface_state.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
 
  diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
  b/src/mesa/drivers/dri/i965/gen8_surface_state.c
  index b2d1a57..22ae960 100644
  --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
  +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
  @@ -93,7 +93,7 @@ vertical_alignment(const struct brw_context *brw,
  if (brw-gen  8 
  (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
   surf_type == BRW_SURFACE_1D))
  -  return 0;
  +  return GEN8_SURFACE_VALIGN_4;
 
  switch (mt-align_h) {
  case 4:
  @@ -118,7 +118,7 @@ horizontal_alignment(const struct brw_context *brw,
  if (brw-gen  8 
  (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE ||
   gen9_use_linear_1d_layout(brw, mt)))
  -  return 0;
  +  return GEN8_SURFACE_HALIGN_4;
 
  switch (mt-align_w) {
  case 4:
  --
  2.4.4
 
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev

 Good find Nanley. We had no known issues with value 0 but it's
 always nice to avoid undefined behavior :).

 Right, I thought about this when I reviewed the original. The spec says
 it is ignored in these cases and hence the reserved value seemed fine. Now
 that we put something meaningful there, somebody is going to compare the
 spec and wonder why we set it to 4. If we added also a comment here that
 says this is just an arbitrary (non-reserved) value and really ignored
 by the hardware, it would prevent misunderstandings. What do you guys think?
That's a good suggestion Topi.


 Reviewed-by: Anuj Phogat anuj.pho...@gmail.com
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] clover: implement CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE

2015-06-25 Thread Grigori Goronzy


On 2015-05-28 13:04, Grigori Goronzy wrote:

Work-group size should always be aligned to subgroup size; this is a
basic requirement, otherwise some work-items will be no-operation.

It might make sense to refine the value according to a kernel's
resource usage, but that's a possible optimization for the future.


Ping?

This is rather simple, but I'd like an Rb, if possible. That also goes 
for the Gallium support patch.


Grigori


---
 src/gallium/state_trackers/clover/api/kernel.cpp  | 2 +-
 src/gallium/state_trackers/clover/core/device.cpp | 5 +
 src/gallium/state_trackers/clover/core/device.hpp | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/api/kernel.cpp
b/src/gallium/state_trackers/clover/api/kernel.cpp
index 05cc392..857a152 100644
--- a/src/gallium/state_trackers/clover/api/kernel.cpp
+++ b/src/gallium/state_trackers/clover/api/kernel.cpp
@@ -169,7 +169,7 @@ clGetKernelWorkGroupInfo(cl_kernel d_kern,
cl_device_id d_dev,
   break;

case CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE:
-  buf.as_scalarsize_t() = 1;
+  buf.as_scalarsize_t() = dev.subgroup_size();
   break;

case CL_KERNEL_PRIVATE_MEM_SIZE:
diff --git a/src/gallium/state_trackers/clover/core/device.cpp
b/src/gallium/state_trackers/clover/core/device.cpp
index 42b45b7..c42d1d2 100644
--- a/src/gallium/state_trackers/clover/core/device.cpp
+++ b/src/gallium/state_trackers/clover/core/device.cpp
@@ -185,6 +185,11 @@ device::max_block_size() const {
return { v.begin(), v.end() };
 }

+cl_uint
+device::subgroup_size() const {
+   return get_compute_paramuint32_t(pipe, 
PIPE_COMPUTE_CAP_SUBGROUP_SIZE)[0];

+}
+
 std::string
 device::device_name() const {
return pipe-get_name(pipe);
diff --git a/src/gallium/state_trackers/clover/core/device.hpp
b/src/gallium/state_trackers/clover/core/device.hpp
index de5fc6b..2857847 100644
--- a/src/gallium/state_trackers/clover/core/device.hpp
+++ b/src/gallium/state_trackers/clover/core/device.hpp
@@ -67,6 +67,7 @@ namespace clover {
   bool has_doubles() const;

   std::vectorsize_t max_block_size() const;
+  cl_uint subgroup_size() const;
   std::string device_name() const;
   std::string vendor_name() const;
   enum pipe_shader_ir ir_format() const;

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] ARB_arrays_of_arrays GLSL ES

2015-06-25 Thread Jason Ekstrand

On Thu, Jun 25, 2015 at 1:19 AM, Timothy Arceri t_arc...@yahoo.com.au wrote:
 On Wed, 2015-06-24 at 11:17 -0700, Jason Ekstrand wrote:
 On Sat, Jun 20, 2015 at 5:32 AM, Timothy Arceri 
 t_arc...@yahoo.com.au wrote:
  Hi all,
 
  The restrictions in ES make the extension easier to implement so
  I thought I'd try get this stuff reviewed an committed before
  finishing
  up the full extension.
  The bits that I'm still working on for the desktop version are AoA
  inputs
  outputs, and interface blocks.
 
  The only thing I know is definatly missing in this series for ES is
  support for indirect indexing of samplers, but that didn't seem
  like
  something that should hold up the series.
 
  Once the SSBO series lands (with a patch that restricts unsized
  arrays)
  then all the AoA ES conformance tests will pass.
 
  There are already a bunch of piglit tests in git but I've just sent
  a
  series with all the patches still waiting review here:
  http://lists.freedesktop.org/archives/piglit/2015-June/016312.html
 
  I haven't made a patch marking this as done yet because currently
  the i965 backend takes a very long time trying to optimise some of
  the
  conformance tests. They still pass but they are taking 15-minutes+
  just
  to compile so this really needs to be sorted out first. If someone
  with
  more knowledge in this area than me wants to take a look at this I
  would
  be greatful for being pointed in the right direction.

 Can you try it with this patch:

 http://lists.freedesktop.org/archives/mesa-dev/2015-June/086011.html

 Hi Jason,

 I tried that patch it didn't seem to make any difference, then I
 noticed its for fs. The slowdown is currently happening in
 vec4_live_variables. I've also noticed other large slowdowns with some
 piglit tests I've been working on this time in the register allocate
 code.

I just sent an equivalent patch for vec4:

http://patchwork.freedesktop.org/patch/52801/

I would love to know if it helps something so that I can put a good
justification in the commit message beyond the same as for FS.
--Jason

 Once I finish fixing up some bugs with my current patchset I'll start
 digging deeper, just thought since it's so noticeably slow it might be
 easy for someone with good knowledge of the backend to point out where
 I should start looking, or to say your doing this wrong.

 One thing the slow shaders have in common is the use of arrays of
 arrays in nested loops so maybe something funny is going on when they
 go through the optimisation paths, I haven't spent much time looking at
 any of these passes yet so its highly likely they don't work as
 expected for arrays of arrays.

 Thanks anyway,
 Tim



  If useful the series is in the 'gles4' branch of the repo here:
  https://github.com/tarceri/Mesa_arrays_of_arrays.git
 
  Thanks,
  Tim
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965/vec4_live_variables: Do liveness analysis bottom-to-top

2015-06-25 Thread Jason Ekstrand

From Muchnick's Advanced Compiler Design and Implementation:

To determine which variables are live at each point in a flowgraph, we
perform a backward data-flow analysis

Previously, we were walking the blocks forwards and updating the livein and
then the liveout.  However, the livein calculation depends on the liveout
and the liveout depends on the successor blocks.  The net result is that it
takes one full iteration to go from liveout to livein and then another
full iteration to propagate to the predecessors.  This works out to an
O(n^2) computation where n is the number of blocks.  If we run things in
the other order, it's O(nl) where l is the maximum loop depth which is
practically bounded by 3.

In b2c6ba0c4b21391dc35018e1c8c4f7f7d8952bea, we made this same change in
the FS backend to great effect.  Might as well keep it consistent and make
the same change for vec4.
---
 .../drivers/dri/i965/brw_vec4_live_variables.cpp   | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp
index 95b9d90..29b4a53 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp
@@ -133,27 +133,9 @@ vec4_live_variables::compute_live_variables()
while (cont) {
   cont = false;
 
-  foreach_block (block, cfg) {
+  foreach_block_reverse (block, cfg) {
  struct block_data *bd = block_data[block-num];
 
-/* Update livein */
-for (int i = 0; i  bitset_words; i++) {
-BITSET_WORD new_livein = (bd-use[i] |
-  (bd-liveout[i] 
-   ~bd-def[i]));
-if (new_livein  ~bd-livein[i]) {
-   bd-livein[i] |= new_livein;
-   cont = true;
-   }
-}
- BITSET_WORD new_livein = (bd-flag_use[0] |
-   (bd-flag_liveout[0] 
-~bd-flag_def[0]));
- if (new_livein  ~bd-flag_livein[0]) {
-bd-flag_livein[0] |= new_livein;
-cont = true;
- }
-
 /* Update liveout */
 foreach_list_typed(bblock_link, child_link, link, block-children) {
 struct block_data *child_bd = block_data[child_link-block-num];
@@ -173,6 +155,24 @@ vec4_live_variables::compute_live_variables()
cont = true;
 }
 }
+
+ /* Update livein */
+ for (int i = 0; i  bitset_words; i++) {
+BITSET_WORD new_livein = (bd-use[i] |
+  (bd-liveout[i] 
+   ~bd-def[i]));
+if (new_livein  ~bd-livein[i]) {
+   bd-livein[i] |= new_livein;
+   cont = true;
+}
+ }
+ BITSET_WORD new_livein = (bd-flag_use[0] |
+   (bd-flag_liveout[0] 
+~bd-flag_def[0]));
+ if (new_livein  ~bd-flag_livein[0]) {
+bd-flag_livein[0] |= new_livein;
+cont = true;
+ }
   }
}
 }
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/skl: Use more compact hiz dimensions

2015-06-25 Thread Jordan Justen

Reviewed-by: Jordan Justen jordan.l.jus...@intel.com

On 2015-06-24 20:07:54, Ben Widawsky wrote:
 gen8 had some special restrictions which don't seem to carry over to gen9.
 Quoting the spec for SKL:
 The Z_Height and Z_Width values must equal those present in
 3DSTATE_DEPTH_BUFFER incremented by one.
 
 This fixes nothing in piglit (and regresses nothing).
 
 Cc: Jordan Justen jordan.l.jus...@intel.com
 Signed-off-by: Ben Widawsky b...@bwidawsk.net
 ---
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 32 
 ++-
  1 file changed, 17 insertions(+), 15 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
 b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 index 6aa969a..432a47c 100644
 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 @@ -1550,21 +1550,23 @@ intel_gen8_hiz_buf_create(struct brw_context *brw,
 /* Gen7 PRM Volume 2, Part 1, 11.5.3 Hierarchical Depth Buffer documents
  * adjustments required for Z_Height and Z_Width based on multisampling.
  */
 -   switch (mt-num_samples) {
 -   case 0:
 -   case 1:
 -  break;
 -   case 2:
 -   case 4:
 -  z_width *= 2;
 -  z_height *= 2;
 -  break;
 -   case 8:
 -  z_width *= 4;
 -  z_height *= 2;
 -  break;
 -   default:
 -  unreachable(unsupported sample count);
 +   if (brw-gen  9) {
 +  switch (mt-num_samples) {
 +  case 0:
 +  case 1:
 + break;
 +  case 2:
 +  case 4:
 + z_width *= 2;
 + z_height *= 2;
 + break;
 +  case 8:
 + z_width *= 4;
 + z_height *= 2;
 + break;
 +  default:
 + unreachable(unsupported sample count);
 +  }
 }
  
 const unsigned vertical_align = 8; /* 'j' in the docs */
 -- 
 2.4.4
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] New stable-branch 10.6 candidate pushed

2015-06-25 Thread Emil Velikov

Hello list,

The candidate for the Mesa 10.6.1 is now available. Currently we have:
 - 22 queued
 - 19 nominated (outstanding)
 - and 0 rejected (obsolete) patches

The present queue consist of core mesa and glsl patches (amonst which
a Dota2 Reborn bugfix) affecting all dri drivers, nouveau specific
fixes and a selection of shared-glapi commits - from build fixes, to
ones ensuring that the dri modules can be loaded.

Take a look at section Mesa stable queue for more information.

Testing
---
The following results are against piglit 246791c51ec.


Changes - classic i965(snb)
---
None.


Changes - swrast classic

None.


Changes - gallium softpipe
--
None.


Changes - gallium llvmpipe (LLVM 3.6)
-
None.


Testing reports/general approval

Any testing reports (or general approval of the state of the branch)
will be greatly appreciated.


Trivial merge conflicts
---
None.


The plan is to have 10.6.1 this Friday(26th of June).

If you have any questions or comments that you would like to share
before the release, please go ahead.


Cheers,
Emil


Mesa stable queue
-

Nominated (19)
==

Anuj Phogat (8):
  mesa: Turn get_readpixels_transfer_ops() in to a global function
  meta: Fix transfer operations check in meta pbo path for readpixels
  mesa: Fix conditions to test signed, unsigned integer format
  mesa: Add a mesa utility function
_mesa_need_signed_unsigned_int_conversion()
  meta: Abort meta pbo path if readpixels need signed-unsigned
  meta: Don't do fragment color clamping in case of ReadPixels
  mesa: Add a helper function _mesa_need_luminance_to_rgb_conversion()
  meta: Fix reading luminance texture as rgba in
_mesa_meta_pbo_GetTexSubImage()

Boyan Ding (1):
  i915: Add XRGB format to intel_screen_make_configs

Brian Paul (1):
  configure: don't try to build gallium DRI drivers if --disable-dri is set

Emil Velikov (1):
  bugzilla_mesa.sh: sort the bugs list by number

Kenneth Graunke (1)
  i965/fs: Fix ir_txs in emit_texture_gen4_simd16().

Mario Kleiner(1):
  nouveau: Use dup fd as key in drm-winsys hash table to fix ZaphodHeads

Matt Turner (1):
  i965/fs: Don't mess up stride for uniform integer multiplication.

Tapani Pälli (2):
  glsl: Allow dynamic sampler array indexing with GLSL ES  3.00
  glsl: validate sampler array indexing for 'constant-index-expression'

Tom Stellard (3):
  clover: Call clBuildProgram() notification function when build
completes v2
  gallium/drivers: Add threadsafe wrappers for pipe_context and pipe_screen
  clover: Use threadsafe wrappers for pipe_screen and pipe_context


Queued (22)
===

Anuj Phogat (4):
  mesa: Handle integer formats in need_rgb_to_luminance_conversion()
  mesa: Use helper function need_rgb_to_luminance_conversion()
  mesa: Turn need_rgb_to_luminance_conversion() in to a global function
  meta: Abort meta path if ReadPixels need rgb to luminance conversion

Ben Widawsky (1):
  i965/gen9: Implement Push Constant Buffer workaround

Boyan Ding (2):
  egl/x11: Set version of swrastLoader to 2
  egl/x11: Remove duplicate call to dri2_x11_add_configs_for_visuals

Emil Velikov (5):
  docs: Add sha256sums for the 10.6.0 release
  configure: warn about shared_glapi  xlib-glx only when both are set
  configure: error out when building backend-less libEGL
  configure: error out when building libEGL without shared-glapi
  gbm: do not (over)link against libglapi.so

Frank Henigman (1):
  gbm: dlopen libglapi so gbm_create_device works

Ilia Mirkin (9):
  nvc0/ir: fix collection of first uses for texture barrier insertion
  nv50,nvc0: clamp uniform size to 64k
  nvc0/ir: can't have a join on a load with an indirect source
  glsl: handle conversions to double when comparing param matches
  glsl: add version checks to conditionals for builtin variable enablement
  mesa: add GL_PROGRAM_PIPELINE support in KHR_debug calls
  glsl: binding point is a texture unit, which is a combined space
  nvc0: always put all tfb bufs into bufctx
  nv50,nvc0: make sure to pushbuf_refn before putting bo into pushbuf_data
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965: Remove special case for layered drawbuffer attachments.

2015-06-25 Thread Kenneth Graunke

When binding a layered texture, the layer is already 0.  There's no need
to special case this.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/gen6_surface_state.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_surface_state.c 
b/src/mesa/drivers/dri/i965/gen6_surface_state.c
index 03e913a..39de62f 100644
--- a/src/mesa/drivers/dri/i965/gen6_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_surface_state.c
@@ -88,7 +88,8 @@ gen6_update_renderbuffer_surface(struct brw_context *brw,
   break;
}
 
-   const int min_array_element = layered ? 0 : irb-mt_layer;
+   const int min_array_element = irb-mt_layer;
+   assert(!layered || irb-mt_layer == 0);
 
surf[0] = SET_FIELD(surftype, BRW_SURFACE_TYPE) |
  SET_FIELD(format, BRW_SURFACE_FORMAT);
-- 
2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/skl: Use more compact hiz dimensions

2015-06-25 Thread Anuj Phogat

On Wed, Jun 24, 2015 at 8:07 PM, Ben Widawsky
benjamin.widaw...@intel.com wrote:
 gen8 had some special restrictions which don't seem to carry over to gen9.
 Quoting the spec for SKL:
 The Z_Height and Z_Width values must equal those present in
 3DSTATE_DEPTH_BUFFER incremented by one.

 This fixes nothing in piglit (and regresses nothing).

 Cc: Jordan Justen jordan.l.jus...@intel.com
 Signed-off-by: Ben Widawsky b...@bwidawsk.net
 ---
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 32 
 ++-
  1 file changed, 17 insertions(+), 15 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
 b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 index 6aa969a..432a47c 100644
 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 @@ -1550,21 +1550,23 @@ intel_gen8_hiz_buf_create(struct brw_context *brw,
 /* Gen7 PRM Volume 2, Part 1, 11.5.3 Hierarchical Depth Buffer documents
  * adjustments required for Z_Height and Z_Width based on multisampling.
  */
 -   switch (mt-num_samples) {
 -   case 0:
 -   case 1:
 -  break;
 -   case 2:
 -   case 4:
 -  z_width *= 2;
 -  z_height *= 2;
 -  break;
 -   case 8:
 -  z_width *= 4;
 -  z_height *= 2;
 -  break;
 -   default:
 -  unreachable(unsupported sample count);
 +   if (brw-gen  9) {
 +  switch (mt-num_samples) {
 +  case 0:
 +  case 1:
 + break;
 +  case 2:
 +  case 4:
 + z_width *= 2;
 + z_height *= 2;
 + break;
 +  case 8:
 + z_width *= 4;
 + z_height *= 2;
 + break;
 +  default:
 + unreachable(unsupported sample count);
 +  }
 }

 const unsigned vertical_align = 8; /* 'j' in the docs */
 --
 2.4.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reviewed-by: Anuj Phogat anuj.pho...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965/gen6: Set up layer constraints properly for depth buffers.

2015-06-25 Thread Kenneth Graunke

This ports over Chris Forbes' equivalent fixes in gen7_misc_state.c
from commit 77d55ef4819436ebbf9786a1e720ec00707bbb19.

No Piglit changes on Sandybridge.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/gen6_depth_state.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_depth_state.c 
b/src/mesa/drivers/dri/i965/gen6_depth_state.c
index 8f0d7dc..febd478 100644
--- a/src/mesa/drivers/dri/i965/gen6_depth_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_depth_state.c
@@ -73,7 +73,7 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw,
rb = (struct gl_renderbuffer*) irb;
 
if (rb) {
-  depth = MAX2(rb-Depth, 1);
+  depth = MAX2(irb-layer_count, 1);
   if (rb-TexImage)
  gl_target = rb-TexImage-TexObject-Target;
}
@@ -89,6 +89,10 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw,
   surftype = BRW_SURFACE_2D;
   depth *= 6;
   break;
+   case GL_TEXTURE_3D:
+  assert(mt);
+  depth = MAX2(mt-logical_depth0, 1);
+  /* fallthrough */
default:
   surftype = translate_tex_target(gl_target);
   break;
-- 
2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: Remove special case for layered drawbuffer attachments.

2015-06-25 Thread Anuj Phogat

On Thu, Jun 25, 2015 at 10:08 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 When binding a layered texture, the layer is already 0.  There's no need
 to special case this.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/gen6_surface_state.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/gen6_surface_state.c 
 b/src/mesa/drivers/dri/i965/gen6_surface_state.c
 index 03e913a..39de62f 100644
 --- a/src/mesa/drivers/dri/i965/gen6_surface_state.c
 +++ b/src/mesa/drivers/dri/i965/gen6_surface_state.c
 @@ -88,7 +88,8 @@ gen6_update_renderbuffer_surface(struct brw_context *brw,
break;
 }

 -   const int min_array_element = layered ? 0 : irb-mt_layer;
 +   const int min_array_element = irb-mt_layer;
 +   assert(!layered || irb-mt_layer == 0);

 surf[0] = SET_FIELD(surftype, BRW_SURFACE_TYPE) |
   SET_FIELD(format, BRW_SURFACE_FORMAT);
 --
 2.4.0

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reviewed-by: Anuj Phogat anuj.pho...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/cs: Initialize GPGPU Thread Count

2015-06-25 Thread Ben Widawsky

On Thu, Jun 11, 2015 at 09:04:45PM -0700, Jordan Justen wrote:
 This field should always be set for gen8. In the bdw PRM, Volume 2d:
 Command Reference: Structures under INTERFACE_DESCRIPTOR_DATA, DWORD
 6, Bits 9:0, Number of Threads in GPGPU Thread Group:
 
 This field should not be set to 0 even if the barrier is disabled,
 since an accurate value is needed for proper pre-emption.

I am pretty skeptical that we actually need this. It's pretty clear this is a
requirement for preemption, and do we ever plan to support gpgpu preemption with
compute shaders?

Since I did some research, here's why I think it's a requirement. BDW added
supported of doing mid thread-group preemption. Doing this requires that when
the workload is restored, it has a concept of how many more threads need to
complete.

So I don't see this patch as being a requirement, but it shouldn't hurt and
probably makes looking a debug slightly easier. Also, if we do ever support
preemption, it should work. One comment inline, and then it's
Reviewed-by: Ben Widawsky b...@bwidawsk.net

 
 In the HSW PRM, the it doesn't mention that it must always be set, but
 it should not hurt.
 
 Reported-by: Kristian Høgsberg k...@bitplanet.net
 Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
 Cc: Kristian Høgsberg k...@bitplanet.net
 ---
  src/mesa/drivers/dri/i965/brw_cs.cpp| 19 +++
  src/mesa/drivers/dri/i965/brw_defines.h |  5 +
  2 files changed, 24 insertions(+)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_cs.cpp 
 b/src/mesa/drivers/dri/i965/brw_cs.cpp
 index 1f2a9d2..44c76ba 100644
 --- a/src/mesa/drivers/dri/i965/brw_cs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_cs.cpp
 @@ -284,6 +284,17 @@ brw_cs_precompile(struct gl_context *ctx,
  }
  
  
 +static unsigned
 +get_cs_thread_count(const struct brw_cs_prog_data *cs_prog_data)
 +{
 +   const unsigned simd_size = cs_prog_data-simd_size;
 +   unsigned group_size = cs_prog_data-local_size[0] *
 +  cs_prog_data-local_size[1] * cs_prog_data-local_size[2];
 +
 +   return (group_size + simd_size - 1) / simd_size;
 +}
 +
 +
  static void
  brw_upload_cs_state(struct brw_context *brw)
  {
 @@ -309,6 +320,8 @@ brw_upload_cs_state(struct brw_context *brw)
  
 prog_data-binding_table.size_bytes,
  32, 
 stage_state-bind_bo_offset);
  
 +   unsigned threads = get_cs_thread_count(cs_prog_data);
 +
 uint32_t dwords = brw-gen  8 ? 8 : 9;
 BEGIN_BATCH(dwords);
 OUT_BATCH(MEDIA_VFE_STATE  16 | (dwords - 2));
 @@ -358,6 +371,12 @@ brw_upload_cs_state(struct brw_context *brw)
 desc[dw++] = 0;
 desc[dw++] = 0;
 desc[dw++] = stage_state-bind_bo_offset;
 +   desc[dw++] = 0;
 +   const uint32_t media_threads =
 +  brw-gen = 8 ?
 +  SET_FIELD(threads, GEN8_MEDIA_GPGPU_THREAD_COUNT) :
 +  SET_FIELD(threads, MEDIA_GPGPU_THREAD_COUNT);
 +   desc[dw++] = media_threads;

What's the deal with, The maximum value for global barriers is limited by the
number of threads in the system, or by 511, Can we add an assert?

  
 BEGIN_BATCH(4);
 OUT_BATCH(MEDIA_INTERFACE_DESCRIPTOR_LOAD  16 | (4 - 2));
 diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
 b/src/mesa/drivers/dri/i965/brw_defines.h
 index f6da305..2a8f500 100644
 --- a/src/mesa/drivers/dri/i965/brw_defines.h
 +++ b/src/mesa/drivers/dri/i965/brw_defines.h
 @@ -2495,6 +2495,11 @@ enum brw_wm_barycentric_interp_mode {
  # define MEDIA_VFE_STATE_CURBE_ALLOC_MASK   INTEL_MASK(15, 0)
  
  #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002
 +/* GEN7 DW5, GEN8+ DW6 */
 +# define MEDIA_GPGPU_THREAD_COUNT_SHIFT 0
 +# define MEDIA_GPGPU_THREAD_COUNT_MASK  INTEL_MASK(7, 0)
 +# define GEN8_MEDIA_GPGPU_THREAD_COUNT_SHIFT0
 +# define GEN8_MEDIA_GPGPU_THREAD_COUNT_MASK INTEL_MASK(9, 0)
  #define MEDIA_STATE_FLUSH   0x7004
  #define GPGPU_WALKER0x7105
  /* GEN8+ DW2 */
 -- 
 2.1.4
 
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/4] nir/from_ssa: add a flag to not convert everything to SSA

2015-06-25 Thread Connor Abbott

We already don't convert constants out of SSA, and in our backend we'd
like to have only one way of saying something is still in SSA.

The one tricky part about this is that we may now leave some undef
instructions around if they aren't part of a phi-web, so we have to be
more careful about deleting them.

Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/gallium/drivers/vc4/vc4_program.c |  2 +-
 src/glsl/nir/nir.h|  7 ++-
 src/glsl/nir/nir_from_ssa.c   | 25 ++---
 src/mesa/drivers/dri/i965/brw_nir.c   |  2 +-
 4 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/vc4/vc4_program.c 
b/src/gallium/drivers/vc4/vc4_program.c
index 2061631..1a550e1 100644
--- a/src/gallium/drivers/vc4/vc4_program.c
+++ b/src/gallium/drivers/vc4/vc4_program.c
@@ -2102,7 +2102,7 @@ vc4_shader_ntq(struct vc4_context *vc4, enum qstage stage,
 
 nir_remove_dead_variables(c-s);
 
-nir_convert_from_ssa(c-s);
+nir_convert_from_ssa(c-s, true);
 
 if (vc4_debug  VC4_DEBUG_SHADERDB) {
 fprintf(stderr, SHADER-DB: %s prog %d/%d: %d NIR 
instructions\n,
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 697d37e..2116f60 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1676,7 +1676,12 @@ bool nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def 
*b);
 
 void nir_convert_to_ssa_impl(nir_function_impl *impl);
 void nir_convert_to_ssa(nir_shader *shader);
-void nir_convert_from_ssa(nir_shader *shader);
+
+/* If convert_everything is true, convert all values (even those not involved
+ * in a phi node) to registers. If false, only convert SSA values involved in
+ * phi nodes to registers.
+ */
+void nir_convert_from_ssa(nir_shader *shader, bool convert_everything);
 
 bool nir_opt_algebraic(nir_shader *shader);
 bool nir_opt_algebraic_late(nir_shader *shader);
diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c
index 67733e6..966c2fe 100644
--- a/src/glsl/nir/nir_from_ssa.c
+++ b/src/glsl/nir/nir_from_ssa.c
@@ -37,6 +37,7 @@
 struct from_ssa_state {
void *mem_ctx;
void *dead_ctx;
+   bool convert_everything;
struct hash_table *merge_node_table;
nir_instr *instr;
nir_function_impl *impl;
@@ -482,6 +483,9 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state)
 
   reg = node-set-reg;
} else {
+  if (!state-convert_everything)
+ return true;
+
   /* We leave load_const SSA values alone.  They act as immediates to
* the backend.  If it got coalesced into a phi, that's ok.
*/
@@ -505,8 +509,15 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state)
nir_ssa_def_rewrite_uses(def, nir_src_for_reg(reg), state-mem_ctx);
assert(list_empty(def-uses)  list_empty(def-if_uses));
 
-   if (def-parent_instr-type == nir_instr_type_ssa_undef)
+   if (def-parent_instr-type == nir_instr_type_ssa_undef) {
+  /* If it's an ssa_undef instruction, remove it since we know we just got
+   * rid of all its uses.
+   */
+  nir_instr *parent_instr = def-parent_instr;
+  nir_instr_remove(parent_instr);
+  ralloc_steal(state-dead_ctx, parent_instr);
   return true;
+   }
 
assert(def-parent_instr-type != nir_instr_type_load_const);
 
@@ -523,7 +534,7 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state)
 }
 
 /* Resolves ssa definitions to registers.  While we're at it, we also
- * remove phi nodes and ssa_undef instructions
+ * remove phi nodes.
  */
 static bool
 resolve_registers_block(nir_block *block, void *void_state)
@@ -534,8 +545,7 @@ resolve_registers_block(nir_block *block, void *void_state)
   state-instr = instr;
   nir_foreach_ssa_def(instr, rewrite_ssa_def, state);
 
-  if (instr-type == nir_instr_type_ssa_undef ||
-  instr-type == nir_instr_type_phi) {
+  if (instr-type == nir_instr_type_phi) {
  nir_instr_remove(instr);
  ralloc_steal(state-dead_ctx, instr);
   }
@@ -765,13 +775,14 @@ resolve_parallel_copies_block(nir_block *block, void 
*void_state)
 }
 
 static void
-nir_convert_from_ssa_impl(nir_function_impl *impl)
+nir_convert_from_ssa_impl(nir_function_impl *impl, bool convert_everything)
 {
struct from_ssa_state state;
 
state.mem_ctx = ralloc_parent(impl);
state.dead_ctx = ralloc_context(NULL);
state.impl = impl;
+   state.convert_everything = convert_everything;
state.merge_node_table = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
 _mesa_key_pointer_equal);
 
@@ -801,10 +812,10 @@ nir_convert_from_ssa_impl(nir_function_impl *impl)
 }
 
 void
-nir_convert_from_ssa(nir_shader *shader)
+nir_convert_from_ssa(nir_shader *shader, bool convert_everything)
 {
nir_foreach_overload(shader, overload) {
   if (overload-impl)
- nir_convert_from_ssa_impl(overload-impl);
+ nir_convert_from_ssa_impl(overload-impl, convert_everything);
}
 }
diff

[Mesa-dev] [PATCH 4/4] nir: remove parent_instr from nir_register

2015-06-25 Thread Connor Abbott

It's no longer used

Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/glsl/nir/nir.c  | 1 -
 src/glsl/nir/nir.h  | 8 
 src/glsl/nir/nir_from_ssa.c | 8 
 3 files changed, 17 deletions(-)

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index f03e80a..f661249 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -57,7 +57,6 @@ reg_create(void *mem_ctx, struct exec_list *list)
 {
nir_register *reg = ralloc(mem_ctx, nir_register);
 
-   reg-parent_instr = NULL;
list_inithead(reg-uses);
list_inithead(reg-defs);
list_inithead(reg-if_uses);
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index b33c9c5..e818acc 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -389,14 +389,6 @@ typedef struct {
 */
bool is_packed;
 
-   /**
-* If this pointer is non-NULL then this register has exactly one
-* definition and that definition dominates all of its uses.  This is
-* set by the out-of-SSA pass so that backends can get SSA-like
-* information even once they have gone out of SSA.
-*/
-   struct nir_instr *parent_instr;
-
/** set of nir_instr's where this register is used (read from) */
struct list_head uses;
 
diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c
index 966c2fe..57bbdde 100644
--- a/src/glsl/nir/nir_from_ssa.c
+++ b/src/glsl/nir/nir_from_ssa.c
@@ -496,14 +496,6 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state)
   reg-name = def-name;
   reg-num_components = def-num_components;
   reg-num_array_elems = 0;
-
-  /* This register comes from an SSA definition that is defined and not
-   * part of a phi-web.  Therefore, we know it has a single unique
-   * definition that dominates all of its uses; we can copy the
-   * parent_instr from the SSA def safely.
-   */
-  if (def-parent_instr-type != nir_instr_type_ssa_undef)
- reg-parent_instr = def-parent_instr;
}
 
nir_ssa_def_rewrite_uses(def, nir_src_for_reg(reg), state-mem_ctx);
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/4] i965/fs: use SSA values directly

2015-06-25 Thread Connor Abbott

Before, we would use registers, but set a magical parent_instr field
to indicate that it was actually purely an SSA value (i.e., it wasn't
involved in any phi nodes). Instead, just use SSA values directly, which
lets us get rid of the hack and reduces memory usage since we're not
allocating a nir_register for every value. It also makes our handling of
load_const more consistent compared to the other instructions.

Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/mesa/drivers/dri/i965/brw_fs.h |  5 ++
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 73 ++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  1 +
 src/mesa/drivers/dri/i965/brw_nir.c|  2 +-
 .../dri/i965/brw_nir_analyze_boolean_resolves.c| 12 ++--
 5 files changed, 59 insertions(+), 34 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 243baf6..a6fee35 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -249,6 +249,10 @@ public:
void nir_emit_block(nir_block *block);
void nir_emit_instr(nir_instr *instr);
void nir_emit_alu(const brw::fs_builder bld, nir_alu_instr *instr);
+   void nir_emit_load_const(const brw::fs_builder bld,
+nir_load_const_instr *instr);
+   void nir_emit_undef(const brw::fs_builder bld,
+   nir_ssa_undef_instr *instr);
void nir_emit_intrinsic(const brw::fs_builder bld,
nir_intrinsic_instr *instr);
void nir_emit_texture(const brw::fs_builder bld,
@@ -345,6 +349,7 @@ public:
unsigned max_grf;
 
fs_reg *nir_locals;
+   fs_reg *nir_ssa_values;
fs_reg *nir_globals;
fs_reg nir_inputs;
fs_reg nir_outputs;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 59081ea..58896d7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -366,6 +366,9 @@ fs_visitor::nir_emit_impl(nir_function_impl *impl)
   nir_locals[reg-index] = bld.vgrf(BRW_REGISTER_TYPE_F, size);
}
 
+   nir_ssa_values = reralloc(mem_ctx, nir_ssa_values, fs_reg,
+ impl-ssa_alloc);
+
nir_emit_cf_list(impl-body);
 }
 
@@ -459,9 +462,11 @@ fs_visitor::nir_emit_instr(nir_instr *instr)
   break;
 
case nir_instr_type_load_const:
-  /* We can hit these, but we do nothing now and use them as
-   * immediates later.
-   */
+  nir_emit_load_const(abld, nir_instr_as_load_const(instr));
+  break;
+
+   case nir_instr_type_ssa_undef:
+  nir_emit_undef(abld, nir_instr_as_ssa_undef(instr));
   break;
 
case nir_instr_type_jump:
@@ -495,17 +500,12 @@ bool
 fs_visitor::optimize_frontfacing_ternary(nir_alu_instr *instr,
  const fs_reg result)
 {
-   if (instr-src[0].src.is_ssa ||
-   !instr-src[0].src.reg.reg ||
-   !instr-src[0].src.reg.reg-parent_instr)
-  return false;
-
-   if (instr-src[0].src.reg.reg-parent_instr-type !=
-   nir_instr_type_intrinsic)
+   if (!instr-src[0].src.is_ssa ||
+   instr-src[0].src.ssa-parent_instr-type != nir_instr_type_intrinsic)
   return false;
 
nir_intrinsic_instr *src0 =
-  nir_instr_as_intrinsic(instr-src[0].src.reg.reg-parent_instr);
+  nir_instr_as_intrinsic(instr-src[0].src.ssa-parent_instr);
 
if (src0-intrinsic != nir_intrinsic_load_front_face)
   return false;
@@ -1146,6 +1146,25 @@ fs_visitor::nir_emit_alu(const fs_builder bld, 
nir_alu_instr *instr)
}
 }
 
+void
+fs_visitor::nir_emit_load_const(const fs_builder bld,
+nir_load_const_instr *instr)
+{
+   fs_reg reg = bld.vgrf(BRW_REGISTER_TYPE_D, instr-def.num_components);
+
+   for (unsigned i = 0; i  instr-def.num_components; i++)
+  bld.MOV(offset(reg, i), fs_reg(instr-value.i[i]));
+
+   nir_ssa_values[instr-def.index] = reg;
+}
+
+void
+fs_visitor::nir_emit_undef(const fs_builder bld, nir_ssa_undef_instr *instr)
+{
+   nir_ssa_values[instr-def.index] = bld.vgrf(BRW_REGISTER_TYPE_D,
+   instr-def.num_components);
+}
+
 static fs_reg
 fs_reg_for_nir_reg(fs_visitor *v, nir_register *nir_reg,
unsigned base_offset, nir_src *indirect)
@@ -1171,30 +1190,30 @@ fs_reg_for_nir_reg(fs_visitor *v, nir_register *nir_reg,
 fs_reg
 fs_visitor::get_nir_src(nir_src src)
 {
+   fs_reg reg;
if (src.is_ssa) {
-  assert(src.ssa-parent_instr-type == nir_instr_type_load_const);
-  nir_load_const_instr *load = 
nir_instr_as_load_const(src.ssa-parent_instr);
-  fs_reg reg = bld.vgrf(BRW_REGISTER_TYPE_D, src.ssa-num_components);
-
-  for (unsigned i = 0; i  src.ssa-num_components; ++i)
- bld.MOV(offset(reg, i), fs_reg(load-value.i[i]));
-
-  return reg;
+  reg = nir_ssa_values[src.ssa-index];
} else {
-  fs_reg reg =

Re: [Mesa-dev] [PATCH v4 3/6] mesa/es3.1: enable GL_ARB_texture_multisample for GLES 3.1

2015-06-25 Thread Ilia Mirkin

On Thu, Jun 25, 2015 at 4:22 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 On Thu, Jun 25, 2015 at 5:08 AM, Marta Lofstedt
 marta.lofst...@linux.intel.com wrote:
 From: Marta Lofstedt marta.lofst...@intel.com

 v4 : only expose GL_ARB_texture_multisample enums
 for gles 3.1 and desktop GL.

 I was suspicious of this logic. Based on my reading of the code, what
 your ARB_texture_multisample_es31 thing does is expose those enums
 when *either* the driver enables ARB_texture_multisample *or* the
 current context is ES3.1.

 ARB_draw_indirect_es31 has the same problem, btw.

 I could have misread the get.c extra_ext() logic, but I don't think I
 have. As far as I can tell there's no way to (generically) AND these
 things.

 What you really need to do is create a whole new [GL, GL_CORE, ES31]
 section in get_hash_params and update get_hash_generator.py
 accordingly.

An alternative, BTW, is to just say screw it and expose the enums in
GL ES 3.0. In that case, just move the enums into a section that
includes GLES3. I'm not sure how big of a deal it is. But your current
patches are definitely wrong.

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v3 11/18] mesa/macros: add power-of-two assertions for alignment macros

2015-06-25 Thread Nanley Chery

How about if I create a patch which puts the greater than 0 check into
is_power_of_two()?

(value  0)  (value  (value - 1)) == 0);

Thanks,
Nanley

On Wed, Jun 24, 2015 at 3:22 PM, Anuj Phogat anuj.pho...@gmail.com wrote:
 On Mon, Jun 22, 2015 at 4:02 PM, Nanley Chery nanleych...@gmail.com wrote:
 From: Nanley Chery nanley.g.ch...@intel.com

 ALIGN and ROUND_DOWN_TO both require that the alignment value passed
 into the macro be a power of two in the comments. Using software assertions
 verifies this to be the case.

 v2: use static inline functions instead of gcc-specific statement 
 expressions.

 Signed-off-by: Nanley Chery nanley.g.ch...@intel.com
 ---
  src/mesa/drivers/dri/i965/brw_fs_nir.cpp |  2 +-
  src/mesa/main/macros.h   | 16 +---
  2 files changed, 14 insertions(+), 4 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 index 59081ea..1a57784 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 @@ -134,7 +134,7 @@ fs_visitor::nir_setup_outputs(nir_shader *shader)
 : var-type-vector_elements;

if (stage == MESA_SHADER_VERTEX) {
 - for (int i = 0; i  ALIGN(type_size(var-type), 4) / 4; i++) {
 + for (unsigned int i = 0; i  ALIGN(type_size(var-type), 4) / 4; 
 i++) {
  int output = var-data.location + i;
  this-outputs[output] = offset(reg, 4 * i);
  this-output_components[output] = vector_elements;
 diff --git a/src/mesa/main/macros.h b/src/mesa/main/macros.h
 index 0608650..4a640ad 100644
 --- a/src/mesa/main/macros.h
 +++ b/src/mesa/main/macros.h
 @@ -684,7 +684,7 @@ minify(unsigned value, unsigned levels)
   * Note that this considers 0 a power of two.
   */
  static inline bool
 -is_power_of_two(unsigned value)
 +is_power_of_two(uintptr_t value)
  {
 return (value  (value - 1)) == 0;
  }
 @@ -700,7 +700,12 @@ is_power_of_two(unsigned value)
   *
   * \sa ROUND_DOWN_TO()
   */
 -#define ALIGN(value, alignment)  (((value) + (alignment) - 1)  
 ~((alignment) - 1))
 +static inline uintptr_t
 +ALIGN(uintptr_t value, uintptr_t alignment)
 +{
 +  assert(is_power_of_two(alignment));
 Also handle the 0 alignment. is_power_of_two() returns true for 0.
 Use assert(alignment  0  is_power_of_two(alignment))?
 +  return (((value) + (alignment) - 1)  ~((alignment) - 1));
 +}

  /**
   * Align a value down to an alignment value
 @@ -713,7 +718,12 @@ is_power_of_two(unsigned value)
   *
   * \sa ALIGN()
   */
 -#define ROUND_DOWN_TO(value, alignment) ((value)  ~(alignment - 1))
 +static inline uintptr_t
 +ROUND_DOWN_TO(uintptr_t value, uintptr_t alignment)
 +{
 +  assert(is_power_of_two(alignment));
 Here as well.
 +  return ((value)  ~(alignment - 1));
 +}


  /** Cross product of two 3-element vectors */
 --
 2.4.2

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

 With above changes and indentation fixes:
 Reviewed-by: Anuj Phogat anuj.pho...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 4/7] glsl: No need to lock in _mesa_glsl_release_types

2015-06-25 Thread Matt Turner

On Thu, Jun 25, 2015 at 2:05 PM, Erik Faye-Lund kusmab...@gmail.com wrote:
 This function only gets called while mesa is unloading, so there's
 no potential of racing or multiple calls at the same time. So let's
 just get rid of the locking.

 Signed-off-by: Erik Faye-Lund kusmab...@gmail.com
 ---
  src/glsl/glsl_types.cpp | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

 diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
 index f675e90..dbe2382 100644
 --- a/src/glsl/glsl_types.cpp
 +++ b/src/glsl/glsl_types.cpp
 @@ -324,8 +324,10 @@ const glsl_type *glsl_type::get_scalar_type() const
  void
  _mesa_glsl_release_types(void)
  {
 -   mtx_lock(glsl_type::mutex);
 -
 +   /* should only be called during atexit (either when unloading shared

Let's capitalize should

 +* object, or if process terminates), so no mutex-locking should be
 +* nessecary.

typo: necessary
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

1 2 >

1 - 100 of 139 matches

Mail list logo