[Mesa-dev] [PATCH] i965/clip: Fix brw_clip_unfilled.c/compute_offset's assembly.

2014-08-06 Thread Kenneth Graunke
Due to the destination register width of 1 or 2, these instructions get
ExecSize 1 or 2.  But dir and offset (used as src0) are both registers
of width 4, violating the execsize >= width assertion.

I honestly don't think this could have ever worked.

Fixes Piglit's polygon-offset and polygon-mode-offset tests on Gen4-5.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70441
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_clip_unfilled.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Only tested on Ironlake.  No Piglit regressions.  Two fixes.

diff --git a/src/mesa/drivers/dri/i965/brw_clip_unfilled.c b/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
index 5104276..82d7b64 100644
--- a/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
+++ b/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
@@ -198,7 +198,7 @@ static void compute_offset( struct brw_clip_compile *c )
    struct brw_reg dir = c->reg.dir;
 
    brw_math_invert(p, get_element(off, 2), get_element(dir, 2));
-   brw_MUL(p, vec2(off), dir, get_element(off, 2));
+   brw_MUL(p, vec2(off), vec2(dir), get_element(off, 2));
 
    brw_CMP(p,
            vec1(brw_null_reg()),
@@ -210,8 +210,8 @@ static void compute_offset( struct brw_clip_compile *c )
            brw_abs(get_element(off, 0)), brw_abs(get_element(off, 1)));
    brw_inst_set_pred_control(brw, brw_last_inst, BRW_PREDICATE_NORMAL);
 
-   brw_MUL(p, vec1(off), off, brw_imm_f(c->key.offset_factor));
-   brw_ADD(p, vec1(off), off, brw_imm_f(c->key.offset_units));
+   brw_MUL(p, vec1(off), vec1(off), brw_imm_f(c->key.offset_factor));
+   brw_ADD(p, vec1(off), vec1(off), brw_imm_f(c->key.offset_units));
 }
 
 
-- 
1.9.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
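
For readers unfamiliar with the constraint the commit message refers to, here is a minimal sketch of the idea in plain C. It assumes the check has the form execsize >= width and that the execution size follows the destination width, as the message describes. The toy_reg type and the toy_* helpers are hypothetical and are not the driver's brw_reg API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a register region. Only the
 * region width matters for this illustration. */
struct toy_reg {
   unsigned width;   /* elements per row: 1, 2, 4, 8 or 16 */
};

/* Narrow a region to the given width, loosely in the spirit of
 * vec1()/vec2() in the patch. */
static struct toy_reg toy_vec(struct toy_reg reg, unsigned width)
{
   assert(width == 1 || width == 2 || width == 4 ||
          width == 8 || width == 16);
   reg.width = width;
   return reg;
}

/* The execution size follows the destination width; a source region
 * passes the check only if execsize >= its width. */
static bool toy_src_ok(struct toy_reg dst, struct toy_reg src)
{
   unsigned execsize = dst.width;
   return execsize >= src.width;
}
```

With a width-4 register for both off and dir, toy_src_ok(toy_vec(off, 2), dir) is false, while toy_src_ok(toy_vec(off, 2), toy_vec(dir, 2)) is true, which is the shape of the fix in the patch.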


Re: [Mesa-dev] [PATCH] i965/clip: Fix brw_clip_unfilled.c/compute_offset's assembly.

2014-08-06 Thread Chris Forbes
Reviewed-by: Chris Forbes chr...@ijw.co.nz



Re: [Mesa-dev] [PATCH v3 04/19] i965/gen6: Adjust render height in errata case for MSAA

2014-08-06 Thread Kenneth Graunke
On Friday, August 01, 2014 12:53:34 AM Jordan Justen wrote:
> In the gen6 PRM Volume 1 Part 1: Graphics Core, Section
> 7.18.3.7.1 (Surface Arrays For all surfaces other than separate
> stencil buffer):
>
>  [DevSNB] Errata: Sampler MSAA Qpitch will be 4 greater than the
>   value calculated in the equation above , for every other odd Surface
>   Height starting from 1 i.e. 1,5,9,13
>
> Since this Qpitch errata only impacts the sampler, we have to adjust
> the input for the rendering surface to achieve the same qpitch. For
> the affected heights, we increment the height by 1 for the rendering
> surface.
>
> Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
> ---
>  src/mesa/drivers/dri/i965/gen6_surface_state.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen6_surface_state.c b/src/mesa/drivers/dri/i965/gen6_surface_state.c
> index db58de9..141ca6f 100644
> --- a/src/mesa/drivers/dri/i965/gen6_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_surface_state.c
> @@ -96,8 +96,24 @@ gen6_update_renderbuffer_surface(struct brw_context *brw,
>     /* reloc */
>     surf[1] = mt->bo->offset64;
>
> +   /* In the gen6 PRM Volume 1 Part 1: Graphics Core, Section 7.18.3.7.1
> +    * (Surface Arrays For all surfaces other than separate stencil buffer):
> +    *
> +    * [DevSNB] Errata: Sampler MSAA Qpitch will be 4 greater than the value
> +    *  calculated in the equation above , for every other odd Surface Height
> +    *  starting from 1 i.e. 1,5,9,13
> +    *
> +    * Since this Qpitch errata only impacts the sampler, we have to adjust the
> +    * input for the rendering surface to achieve the same qpitch. For the
> +    * affected heights, we increment the height by 1 for the rendering
> +    * surface.
> +    */
> +   int height0 = irb->mt->logical_height0;
> +   if (brw->gen == 6 && irb->mt->num_samples > 1 && (height0 % 4) == 1)
> +      height0++;
> +
>     surf[2] = SET_FIELD(mt->logical_width0 - 1, BRW_SURFACE_WIDTH) |
> -             SET_FIELD(mt->logical_height0 - 1, BRW_SURFACE_HEIGHT) |
> +             SET_FIELD(height0 - 1, BRW_SURFACE_HEIGHT) |
>               SET_FIELD(irb->mt_level - irb->mt->first_level, BRW_SURFACE_LOD);
>
>     surf[3] = brw_get_surface_tiling_bits(mt->tiling) |
 

FWIW, I believe this code is correct after all.  I worked through a lot of math 
to show the effects this has on QPitch, and it originally didn't work out.  It 
turns out our QPitch computation in brw_tex_layout.c is wrong; once I did the 
correct QPitch computation, it worked out.

I'm going to write patches to fix that.

So this gets a:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org

But please wait on the series for a bit - I'd like to look over the rest, and 
see how my qpitch fixes affect things.  I'll also post my math demonstrating 
why this does the right thing.  Sorry for the hold up...

Really nice work tracking this down.
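
As a quick illustration of the height adjustment under review, the following is a standalone sketch of the rule in plain C. It reads the quoted condition as "gen == 6, more than one sample, and height % 4 == 1", which is an assumption about the operators in the patch. The toy_* names are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* True for the heights named in the quoted erratum: 1, 5, 9, 13, ... */
static bool toy_height_hits_erratum(int height)
{
   return (height % 4) == 1;
}

/* Height to program for the rendering surface. gen and num_samples are
 * plain ints here; in the driver they come from the context and the
 * miptree. */
static int toy_render_height(int gen, int num_samples, int logical_height)
{
   assert(logical_height > 0);
   if (gen == 6 && num_samples > 1 &&
       toy_height_hits_erratum(logical_height))
      return logical_height + 1;
   return logical_height;
}
```

For example, a 4x MSAA surface of height 5 on gen6 would be programmed with height 6, while a single-sampled surface of the same height is left alone. This sketch says nothing about the QPitch computation itself, which is the part still under discussion in this thread.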



Re: [Mesa-dev] Mesa (master): mesa/formats: Add layout and swizzle information

2014-08-06 Thread Michel Dänzer
On 06.08.2014 03:08, Jason Ekstrand wrote:
> Module: Mesa
> Branch: master
> Commit: 850fb0d1dca616179d3239a7b7bd94fe1979604c
> URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=850fb0d1dca616179d3239a7b7bd94fe1979604c
>
> Author: Jason Ekstrand jason.ekstr...@intel.com
> Date:   Thu Jul 10 23:59:42 2014 -0700
>
> mesa/formats: Add layout and swizzle information
>
> v2: Move the MESA_FORMAT_SWIZZLE enum to the top of the file
>
> Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
> Reviewed-by: Brian Paul bri...@vmware.com

As of this commit, ~20 depth/stencil related piglit tests have regressed with 
the radeonsi driver compared to before your changes. See below for an example 
failure of the draw-pixels test.

That test is already broken with the previous commits, each of them with 
slightly different failure symptoms.


Mesa 10.3.0-devel implementation error: Unexpected inFormat GL_STENCIL_INDEX
Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
Mesa 10.3.0-devel implementation error: Unexpected inFormat 0x
Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
Mesa 10.3.0-devel implementation error: Unexpected inFormat 0x
Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
Probe stencil at (0, 0)
  Expected: 50
  Observed: 100
Probe color at (0,0)
  Expected: 1.00 0.00 0.00 1.00
  Observed: 0.00 0.00 0.00 1.00
Probe color at (0,0)
  Expected: 0.00 0.396078 0.00 1.00
  Observed: 0.00 0.384314 0.00 1.00
Probe color at (0,0)
  Expected: 0.00 0.00 0.396078 1.00
  Observed: 0.00 0.00 0.384314 1.00
Probe color at (0,0)
  Expected: 0.396078 0.427451 0.00 1.00
  Observed: 0.384314 0.415686 0.00 1.00
Probe color at (0,0)
  Expected: 0.396078 0.427451 0.458824 1.00
  Observed: 0.384314 0.415686 0.447059 1.00
Probe color at (0,0)
  Expected: 0.458824 0.427451 0.396078 1.00
  Observed: 0.447059 0.415686 0.384314 1.00
Probe color at (0,0)
  Expected: 0.458824 0.427451 0.396078 0.490196
  Observed: 0.447059 0.415686 0.384314 0.478431
Probe color at (0,0)
  Expected: 0.396078 0.396078 0.396078 0.427451
  Observed: 0.384314 0.384314 0.384314 0.415686
Mesa 10.3.0-devel implementation error: Unexpected inFormat 0x
Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
Mesa 10.3.0-devel implementation error: Unexpected inFormat 0x
Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
draw-pixels: ../../../src/mesa/main/pack.c:4669: _mesa_unpack_color_span_uint: 
Assertion `srcFormat == 0x1903 || srcFormat == 0x1904 || srcFormat == 0x1905 || 
srcFormat == 0x1906 || srcFormat == 0x1909 || srcFormat == 0x190A || srcFormat 
== 0x8049 || srcFormat == 0x8227 || srcFormat == 0x1907 || srcFormat == 0x80E0 
|| srcFormat == 0x1908 || srcFormat == 0x80E1 || srcFormat == 0x8000 || 
srcFormat == 0x8D94 || srcFormat == 0x8D95 || srcFormat == 0x8D96 || srcFormat 
== 0x8D97 || srcFormat == 0x8228 || srcFormat == 0x8D98 || srcFormat == 0x8D99 
|| srcFormat == 0x8D9A || srcFormat == 0x8D9B || srcFormat == 0x8D9C || 
srcFormat == 0x8D9D' failed.


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer


Re: [Mesa-dev] Mesa (master): mesa/formats: Add layout and swizzle information

2014-08-06 Thread Michel Dänzer
On 06.08.2014 18:28, Michel Dänzer wrote:
> On 06.08.2014 03:08, Jason Ekstrand wrote:
>> Module: Mesa
>> Branch: master
>> Commit: 850fb0d1dca616179d3239a7b7bd94fe1979604c
>> URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=850fb0d1dca616179d3239a7b7bd94fe1979604c
>>
>> Author: Jason Ekstrand jason.ekstr...@intel.com
>> Date:   Thu Jul 10 23:59:42 2014 -0700
>>
>> mesa/formats: Add layout and swizzle information
>>
>> v2: Move the MESA_FORMAT_SWIZZLE enum to the top of the file
>>
>> Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
>> Reviewed-by: Brian Paul bri...@vmware.com
>
> As of this commit, ~20 depth/stencil related piglit tests have
> regressed with the radeonsi driver compared to before your changes.
> See below for an example failure of the draw-pixels test.
>
> That test is already broken with the previous commits, each of them
> with slightly different failure symptoms.

I meant to write: 'That test is already broken with the three previous
commits, [...]'


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer


Re: [Mesa-dev] RFC: mesa/st dynamic sampler support in tgsi

2014-08-06 Thread Marek Olšák
On Wed, Aug 6, 2014 at 4:02 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
> On Tue, Aug 5, 2014 at 5:25 PM, Roland Scheidegger srol...@vmware.com wrote:
>> From a gallium perspective, indirect temp regs are already working - so
>> something like
>> MOV TEMP[0], TEMP[TEMP[1].x] should work.
>> Indirect registers are supported for inputs, outputs, temps, constants,
>> and immediates even, but the indirect reg itself must come from a temp
>> or address reg (I am not 100% certain where that restriction comes
>> from). I have no idea which drivers support it, all I can tell is that
>> it works with llvmpipe.
>> I sort of doubt it is supported for samplers right now in gallium though
>> technically it might be possible to express this already.
>
> Well, with my limited patch + ChrisF's small patches to mesa core, the
> dynamic sampler stuff works for nvc0, except for the issues I
> outlined. Not sure what you mean by supported in gallium. Perhaps I
> have an incorrect view of things, but I see gallium as an amorphous
> thing that we can change to our heart's content.
>
>> A cap bit for the ability to support dynamic indexing of shaders (plus
>> whatever is needed for making it work like declaration of sampler
>> arrays) would certainly be needed in any case. For drivers supporting
>
> Right... so it's not like shaders will start magically containing
> these things, it'll only happen if ARB_gs5 is enabled (probably via
> PIPE_CAP_GLSL >= 400). Which presumably means that the backend
> supports whatever we're throwing at it.
>
>> this I would certainly expect them to allow temp regs as the indirect
>> reg. I guess it would be nice if we'd just use temp regs instead of
>> address reg in glsl to tgsi conversion if a driver supports it. I think
>> for modern drivers this makes a lot more sense than trying to shove
>> everything into address regs.
>
> Agreed. With the exception that I guess we also need to support
> indexing with float values? (i.e. ARL) This would have to be treated
> with some care. Not sure when that comes up though... perhaps only if
> !native_integers, which won't be an issue with any of the hw that
> we're talking about.

If you really want to lower ARL into a temp, I recommend using F2I,
which is equivalent in behavior. For UARL, MOV will do.

Also, I don't think GLSL sampler arrays have to be declared as arrays
in TGSI. Array declarations are really only needed for TEMPs, because
they allow better register allocation. Every other shader resource has
a fixed location and would not benefit from it.

If GLSL is strict about out-of-bounds access, I recommend always
clamping the index in glsl_to_tgsi.

Marek
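
To make the clamping suggestion concrete, here is a minimal sketch of the arithmetic in plain C. It only shows the intended result of clamping a dynamic index to the declared array size. How glsl_to_tgsi would actually emit this as TGSI instructions is not shown, and toy_clamp_index is a hypothetical name.

```c
#include <assert.h>

/* Clamp a dynamically computed index into [0, array_size - 1] so that
 * an out-of-bounds value can never select a resource outside the
 * declared array. */
static unsigned toy_clamp_index(int index, unsigned array_size)
{
   assert(array_size > 0);
   if (index < 0)
      return 0;
   if ((unsigned)index >= array_size)
      return array_size - 1;
   return (unsigned)index;
}
```

With a four-element sampler array, an index of 9 would select element 3 and an index of -3 would select element 0.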


Re: [Mesa-dev] [PATCH] glsl: support unsigned increment in ir_loop controls

2014-08-06 Thread Tapani Pälli
On 08/05/2014 05:41 PM, Michel Dänzer wrote:
> On 31.07.2014 15:05, Michel Dänzer wrote:
>> On 30.07.2014 20:11, Tapani Pälli wrote:
>>> Current version can create ir_expression where operands have
>>> different base type, patch adds support for unsigned type.
>>>
>>> Signed-off-by: Tapani Pälli tapani.pa...@intel.com
>>> https://bugs.freedesktop.org/show_bug.cgi?id=80880
>> Tested-by: Michel Dänzer michel.daen...@amd.com
> Can this go in? This is the only remaining issue preventing the UE4
> demos from working on Gallium drivers.

I've been waiting for r-b, it's quite simple so if anyone has time ...

// Tapani


[Mesa-dev] [PATCH] glsl: implement switch flow control using a loop

2014-08-06 Thread Tapani Pälli
This patch removes the old variable-based logic for handling a break
inside a switch. The switch is put inside a loop so that the existing
infrastructure for loop flow control can be used for the switch; dead
code elimination now also works properly.

A 'continue' inside a switch now needs special handling: it is
detected, the switch's loop is broken out of, and continue is then
called for the outer loop.
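
The lowering described above can be illustrated at source level in plain C. This is only an analogy for what the patch does on the IR; the toy_* functions are hypothetical, and only the continue_inside flag name is borrowed from the patch. The first function uses an ordinary switch, the second wraps the switch body in a loop, turns break into a loop break, and turns continue into "set flag, break, then continue the outer loop".

```c
#include <assert.h>
#include <stdbool.h>

/* Reference version: an ordinary switch inside a loop. */
static int toy_sum_plain(const int *v, int n)
{
   int sum = 0;
   assert(n >= 0);
   for (int i = 0; i < n; i++) {
      switch (v[i]) {
      case 0:
         continue;            /* skips the increment below */
      case 1:
         sum += 10;
         break;
      default:
         sum += v[i];
         break;
      }
      sum += 1;
   }
   return sum;
}

/* Lowered version: the switch body runs inside a loop that is always
 * left after one pass. */
static int toy_sum_lowered(const int *v, int n)
{
   int sum = 0;
   assert(n >= 0);
   for (int i = 0; i < n; i++) {
      bool continue_inside = false;
      for (;;) {                      /* loop standing in for the switch */
         if (v[i] == 0) {
            continue_inside = true;   /* 'continue' inside the switch */
            break;
         } else if (v[i] == 1) {
            sum += 10;
            break;                    /* 'break' inside the switch */
         } else {
            sum += v[i];
         }
         break;                       /* end of switch body */
      }
      if (continue_inside)
         continue;                    /* re-issued for the outer loop */
      sum += 1;
   }
   return sum;
}
```

Both functions return the same value for any input, for instance 20 for the array {0, 1, 5, 0, 2}.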

Fixes following Piglit tests:

   fs-exec-after-break.shader_test
   fs-conditional-break.shader_test

No Piglit or es3conform regressions.

Signed-off-by: Tapani Pälli tapani.pa...@intel.com
---
 src/glsl/ast_to_hir.cpp   | 101 +++---
 src/glsl/glsl_parser_extras.h |   4 +-
 2 files changed, 68 insertions(+), 37 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 30b02d0..4e3c48c 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -4366,7 +4366,7 @@ ast_jump_statement::hir(exec_list *instructions,
           * loop.
           */
          if (state->loop_nesting_ast != NULL &&
-             mode == ast_continue) {
+             mode == ast_continue && !state->switch_state.is_switch_innermost) {
             if (state->loop_nesting_ast->rest_expression) {
                state->loop_nesting_ast->rest_expression->hir(instructions,
                                                              state);
@@ -4378,19 +4378,27 @@ ast_jump_statement::hir(exec_list *instructions,
          }
 
          if (state->switch_state.is_switch_innermost &&
+             mode == ast_continue) {
+            /* Set 'continue_inside' to true. */
+            ir_rvalue *const true_val = new (ctx) ir_constant(true);
+            ir_dereference_variable *deref_continue_inside_var =
+               new(ctx) ir_dereference_variable(state->switch_state.continue_inside);
+            instructions->push_tail(new(ctx) ir_assignment(deref_continue_inside_var,
+                                                           true_val));
+
+            /* Break out from the switch, continue for the loop will
+             * be called right after switch. */
+            ir_loop_jump *const jump =
+               new(ctx) ir_loop_jump(ir_loop_jump::jump_break);
+            instructions->push_tail(jump);
+
+         } else if (state->switch_state.is_switch_innermost &&
              mode == ast_break) {
-            /* Force break out of switch by setting is_break switch state.
-             */
-            ir_variable *const is_break_var = state->switch_state.is_break_var;
-            ir_dereference_variable *const deref_is_break_var =
-               new(ctx) ir_dereference_variable(is_break_var);
-            ir_constant *const true_val = new(ctx) ir_constant(true);
-            ir_assignment *const set_break_var =
-               new(ctx) ir_assignment(deref_is_break_var, true_val);
-
-            instructions->push_tail(set_break_var);
-         }
-         else {
+            /* Force break out of switch by inserting a break. */
+            ir_loop_jump *const jump =
+               new(ctx) ir_loop_jump(ir_loop_jump::jump_break);
+            instructions->push_tail(jump);
+         } else {
             ir_loop_jump *const jump =
                new(ctx) ir_loop_jump((mode == ast_break)
                                      ? ir_loop_jump::jump_break
@@ -4502,19 +4510,19 @@ ast_switch_statement::hir(exec_list *instructions,
    instructions->push_tail(new(ctx) ir_assignment(deref_is_fallthru_var,
                                                   is_fallthru_val));
 
-   /* Initalize is_break state to false.
+   /* Initialize continue_inside state to false.
     */
-   ir_rvalue *const is_break_val = new (ctx) ir_constant(false);
-   state->switch_state.is_break_var =
+   state->switch_state.continue_inside =
       new(ctx) ir_variable(glsl_type::bool_type,
-                           "switch_is_break_tmp",
+                           "continue_inside_tmp",
                            ir_var_temporary);
-   instructions->push_tail(state->switch_state.is_break_var);
+   instructions->push_tail(state->switch_state.continue_inside);
 
-   ir_dereference_variable *deref_is_break_var =
-      new(ctx) ir_dereference_variable(state->switch_state.is_break_var);
-   instructions->push_tail(new(ctx) ir_assignment(deref_is_break_var,
-                                                  is_break_val));
+   ir_rvalue *const false_val = new (ctx) ir_constant(false);
+   ir_dereference_variable *deref_continue_inside_var =
+      new(ctx) ir_dereference_variable(state->switch_state.continue_inside);
+   instructions->push_tail(new(ctx) ir_assignment(deref_continue_inside_var,
+                                                  false_val));
 
    state->switch_state.run_default =
       new(ctx) ir_variable(glsl_type::bool_type,
@@ -4522,13 +4530,46 @@ ast_switch_statement::hir(exec_list *instructions,
                            ir_var_temporary);
    instructions->push_tail(state->switch_state.run_default);
 

Re: [Mesa-dev] [PATCH 4/4] radeon: cache the last used userptr

2014-08-06 Thread Marek Olšák
What is this patch good for?

Marek

On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote:
 From: Christian König christian.koe...@amd.com

 Signed-off-by: Christian König christian.koe...@amd.com
 ---
  src/gallium/drivers/radeon/r600_pipe_common.c |  9 ++
  src/gallium/drivers/radeon/r600_pipe_common.h | 11 +++
  src/gallium/drivers/radeon/r600_texture.c | 41 
 +--
  3 files changed, 59 insertions(+), 2 deletions(-)

 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
 b/src/gallium/drivers/radeon/r600_pipe_common.c
 index 69d344e..f745311 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.c
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
 @@ -770,11 +770,20 @@ bool r600_common_screen_init(struct r600_common_screen 
 *rscreen,
 }
 }

 +   pipe_mutex_init(rscreen-userptr_lock);
 +
 return true;
  }

  void r600_destroy_common_screen(struct r600_common_screen *rscreen)
  {
 +   unsigned i;
 +
 +   for (i = 0; i  R600_USERPTR_CACHE; ++i)
 +   pipe_resource_reference((struct pipe_resource 
 **)rscreen-userptr[i].tex, NULL);
 +
 +   pipe_mutex_destroy(rscreen-userptr_lock);
 +
 pipe_mutex_destroy(rscreen-aux_context_lock);
 rscreen-aux_context-destroy(rscreen-aux_context);

 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
 b/src/gallium/drivers/radeon/r600_pipe_common.h
 index dcec2bb..88dbaf8 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.h
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
 @@ -97,6 +97,8 @@

  #define R600_MAP_BUFFER_ALIGNMENT 64

 +#define R600_USERPTR_CACHE 32
 +
  struct r600_common_context;

  struct radeon_shader_binary {
 @@ -258,6 +260,15 @@ struct r600_common_screen {
 struct r600_resource*trace_bo;
 uint32_t*trace_ptr;
 unsignedcs_count;
 +
 +   struct {
 +   struct r600_texture *tex;
 +   void*pointer;
 +   unsignedoffset;
 +   unsignedsize;
 +   }   userptr[R600_USERPTR_CACHE];
 +   unsigneduserptr_idx;
 +   pipe_mutex  userptr_lock;
  };

  /* This encapsulates a state or an operation which can emitted into the GPU
 diff --git a/src/gallium/drivers/radeon/r600_texture.c 
 b/src/gallium/drivers/radeon/r600_texture.c
 index 89b3b55..c3ff96c 100644
 --- a/src/gallium/drivers/radeon/r600_texture.c
 +++ b/src/gallium/drivers/radeon/r600_texture.c
 @@ -855,10 +855,11 @@ static struct r600_texture 
 *r600_texture_from_ptr(struct pipe_screen *screen,
  {
 struct r600_common_screen *rscreen = (struct 
 r600_common_screen*)screen;
 struct radeon_surface surface = {};
 +   struct pipe_resource *res = NULL;
 struct r600_texture *tex;
 unsigned offset, size;
 struct pb_buffer *buf;
 -   int r;
 +   int r, i;

 /* Support only 2D textures without mipmaps */
 if ((templ-target != PIPE_TEXTURE_2D  templ-target != 
 PIPE_TEXTURE_RECT) ||
 @@ -877,16 +878,52 @@ static struct r600_texture 
 *r600_texture_from_ptr(struct pipe_screen *screen,
 if (size  64*1024)
 return NULL;

 +   pipe_mutex_lock(rscreen-userptr_lock);
 +   for (i = 0; i  R600_USERPTR_CACHE; ++i) {
 +
 +   if (rscreen-userptr[i].pointer != pointer ||
 +   rscreen-userptr[i].offset != offset ||
 +   rscreen-userptr[i].size != size ||
 +   !rscreen-userptr[i].tex)
 +   continue;
 +
 +   tex = rscreen-userptr[i].tex;
 +   if (tex-resource.b.b.width0 != templ-width0 
 +   tex-resource.b.b.height0 != templ-height0 
 +   tex-resource.b.b.target != templ-target 
 +   tex-resource.b.b.format != templ-format)
 +   continue;
 +
 +   pipe_resource_reference(res, (struct pipe_resource *)tex);
 +   pipe_mutex_unlock(rscreen-userptr_lock);
 +   return (struct r600_texture *)res;
 +   }
 +   pipe_mutex_unlock(rscreen-userptr_lock);
 +
 buf = rscreen-ws-buffer_from_ptr(rscreen-ws, pointer, size);
 if (!buf)
 return NULL;

 -   r = r600_init_surface(rscreen, surface, templ, 
 RADEON_SURF_MODE_LINEAR_ALIGNED, false);
 +   r = r600_init_surface(rscreen, surface, templ, 
 RADEON_SURF_MODE_LINEAR, false);
 if (r)
 return NULL;

 tex = r600_texture_create_object(screen, templ, stride, buf, 
 surface);
 tex-surface.level[0].offset += offset;
 +
 +   pipe_mutex_lock(rscreen-userptr_lock);
 +   ++rscreen-userptr_idx;
 +   rscreen-userptr_idx %= R600_USERPTR_CACHE;
 +
 +   i = rscreen-userptr_idx;
 +  

Re: [Mesa-dev] [PATCH 4/4] radeon: cache the last used userptr

2014-08-06 Thread Christian König

> What is this patch good for?

Nothing in particular, I just wanted to test how much overhead creating 
a new BO each time we do transfer_inline_write actually adds.


BTW: Implementing transfer_inline_write using userptrs was just a proof 
of concept. It turned out to actually be way slower than just copying 
with the CPU because we need to block for the copy to complete.


For a real use case we need to support creating textures from 
application supplied pointers and implement the matching OpenGL 
extensions, but you probably know that better than I do.


Christian.
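
For reference, the caching scheme in the quoted patch (a small fixed array of entries keyed on pointer, offset and size, with new entries written at an index that advances modulo the array size) can be sketched in plain C as follows. This is a simplified illustration, not the driver code: the toy_* names are hypothetical, an int handle stands in for the cached texture, the slot count is reduced from 32 to 4, and the mutex the patch takes around lookups and inserts is left out.

```c
#include <assert.h>
#include <string.h>

#define TOY_CACHE_SLOTS 4   /* the quoted patch uses 32 */

struct toy_entry {
   const void *pointer;
   unsigned offset;
   unsigned size;
   int handle;              /* stand-in for the texture; 0 means empty */
};

struct toy_cache {
   struct toy_entry slot[TOY_CACHE_SLOTS];
   unsigned idx;            /* last slot written */
};

static void toy_cache_init(struct toy_cache *c)
{
   memset(c, 0, sizeof(*c));
}

/* Return the cached handle for an exact key match, or 0 on a miss. */
static int toy_cache_lookup(const struct toy_cache *c, const void *pointer,
                            unsigned offset, unsigned size)
{
   for (unsigned i = 0; i < TOY_CACHE_SLOTS; ++i) {
      const struct toy_entry *e = &c->slot[i];
      if (e->handle && e->pointer == pointer &&
          e->offset == offset && e->size == size)
         return e->handle;
   }
   return 0;
}

/* Advance the write index and overwrite whatever was in that slot. */
static void toy_cache_insert(struct toy_cache *c, const void *pointer,
                             unsigned offset, unsigned size, int handle)
{
   assert(handle != 0);
   c->idx = (c->idx + 1) % TOY_CACHE_SLOTS;
   c->slot[c->idx].pointer = pointer;
   c->slot[c->idx].offset = offset;
   c->slot[c->idx].size = size;
   c->slot[c->idx].handle = handle;
}
```

An entry survives until TOY_CACHE_SLOTS further inserts have wrapped around to its slot, so the cache only helps when the same pointer, offset and size are reused within that window.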


Re: [Mesa-dev] [PATCH 1/4] radeonsi: implement partial DMA copies v2

2014-08-06 Thread Marek Olšák
For patch 1 & 2:

Reviewed-by: Marek Olšák marek.ol...@amd.com

How was the DMA code tested? I think the best thing would be to switch
resource_copy_region to dma_copy just for testing and run piglit. (you
also probably want to avoid recursion between dma_copy and
resource_copy_region)

Marek

On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote:
 From: Christian König christian.koe...@amd.com

 v2: fix a couple of typos and bugs

 Signed-off-by: Christian König christian.koe...@amd.com
 ---
  src/gallium/drivers/radeonsi/si_dma.c | 85 
 +++
  src/gallium/drivers/radeonsi/sid.h|  1 +
  2 files changed, 68 insertions(+), 18 deletions(-)

 diff --git a/src/gallium/drivers/radeonsi/si_dma.c 
 b/src/gallium/drivers/radeonsi/si_dma.c
 index 26f1e1b..4d72f62 100644
 --- a/src/gallium/drivers/radeonsi/si_dma.c
 +++ b/src/gallium/drivers/radeonsi/si_dma.c
 @@ -111,6 +111,48 @@ static void si_dma_copy_buffer(struct si_context *ctx,
 }
  }

 +static void si_dma_copy_partial(struct si_context *ctx,
 +   struct pipe_resource *dst,
 +   uint64_t dst_offset,
 +   uint32_t dst_slice_size,
 +   uint32_t dst_pitch,
 +   struct pipe_resource *src,
 +   uint64_t src_offset,
 +   uint32_t src_slice_size,
 +   uint32_t src_pitch,
 +   uint32_t width,
 +   uint32_t height,
 +   uint32_t depth,
 +   unsigned bpp)
 +{
 +   struct radeon_winsys_cs *cs = ctx-b.rings.dma.cs;
 +   struct r600_resource *rdst = (struct r600_resource*)dst;
 +   struct r600_resource *rsrc = (struct r600_resource*)src;
 +
 +   dst_offset += r600_resource_va(ctx-screen-b.b, dst);
 +   src_offset += r600_resource_va(ctx-screen-b.b, src);
 +
 +   r600_need_dma_space(ctx-b, 9);
 +
 +   r600_context_bo_reloc(ctx-b, ctx-b.rings.dma, rsrc, 
 RADEON_USAGE_READ,
 + RADEON_PRIO_MIN);
 +   r600_context_bo_reloc(ctx-b, ctx-b.rings.dma, rdst, 
 RADEON_USAGE_WRITE,
 + RADEON_PRIO_MIN);
 +
 +   radeon_emit(cs, SI_DMA_PACKET(SI_DMA_PACKET_COPY, 
 SI_DMA_COPY_PARTIAL, 0x0));
 +
 +   radeon_emit(cs, src_offset  0x);
 +   radeon_emit(cs, ((src_offset  32UL)  0xff) | (src_pitch  13));
 +   radeon_emit(cs, src_slice_size);
 +
 +   radeon_emit(cs, dst_offset  0x);
 +   radeon_emit(cs, ((dst_offset  32UL)  0xff) | (dst_pitch  13));
 +   radeon_emit(cs, dst_slice_size);
 +
 +   radeon_emit(cs, width | (height  16));
 +   radeon_emit(cs, depth | (util_logbase2(bpp)  29));
 +}
 +
  static void si_dma_copy_tile(struct si_context *ctx,
  struct pipe_resource *dst,
  unsigned dst_level,
 @@ -299,33 +341,40 @@ void si_dma_copy(struct pipe_context *ctx,
 src_mode = src_mode == RADEON_SURF_MODE_LINEAR_ALIGNED ? 
 RADEON_SURF_MODE_LINEAR : src_mode;
 dst_mode = dst_mode == RADEON_SURF_MODE_LINEAR_ALIGNED ? 
 RADEON_SURF_MODE_LINEAR : dst_mode;

 -   if (src_pitch != dst_pitch || src_box-x || dst_x || src_w != dst_w) {
 -   /* FIXME si can do partial blit */
 -   goto fallback;
 -   }
 -   /* the x test here are currently useless (because we don't support 
 partial blit)
 -* but keep them around so we don't forget about those
 -*/
 -   if ((src_pitch % 8) || (src_box-x % 8) || (dst_x % 8) || (src_box-y 
 % 8) || (dst_y % 8)) {
 +   if (((src_pitch % 8) || (src_box-x % 8) || (dst_x % 8) || 
 (src_box-y % 8) || (dst_y % 8)) 
 +   ((src_mode != RADEON_SURF_MODE_LINEAR) || (dst_mode != 
 RADEON_SURF_MODE_LINEAR))) {
 goto fallback;
 }

 if (src_mode == dst_mode) {
 uint64_t dst_offset, src_offset;
 -   /* simple dma blit would do NOTE code here assume :
 -*   src_box.x/y == 0
 -*   dst_x/y == 0
 -*   dst_pitch == src_pitch
 -*/
 -   src_offset= rsrc-surface.level[src_level].offset;
 -   src_offset += rsrc-surface.level[src_level].slice_size * 
 src_box-z;
 +   uint32_t dst_slice_size, src_slice_size;
 +
 +   src_slice_size = rsrc-surface.level[src_level].slice_size;
 +   src_offset = rsrc-surface.level[src_level].offset;
 +   src_offset += src_slice_size * src_box-z;
 src_offset += src_y * src_pitch + src_x * bpp;
 +
 +   dst_slice_size = rdst-surface.level[dst_level].slice_size;
 dst_offset = rdst-surface.level[dst_level].offset;
 -   dst_offset += 

Re: [Mesa-dev] [PATCH 3/4] radeon: accelerate transfer_inline_write

2014-08-06 Thread Marek Olšák
On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote:
 From: Christian König christian.koe...@amd.com

 Not completely implemented, cause we need DMA copy support for every hw 
 generation.

 Signed-off-by: Christian König christian.koe...@amd.com
 ---
  src/gallium/drivers/radeon/r600_buffer_common.c |   2 +-
  src/gallium/drivers/radeon/r600_pipe_common.c   |   2 +-
  src/gallium/drivers/radeon/r600_texture.c   | 104 
 ++--
  3 files changed, 100 insertions(+), 8 deletions(-)

 diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
 b/src/gallium/drivers/radeon/r600_buffer_common.c
 index d747cbc..28ab30c 100644
 --- a/src/gallium/drivers/radeon/r600_buffer_common.c
 +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
 @@ -372,7 +372,7 @@ static const struct u_resource_vtbl r600_buffer_vtbl =
 r600_buffer_transfer_map,   /* transfer_map */
 NULL,   /* transfer_flush_region */
 r600_buffer_transfer_unmap, /* transfer_unmap */
 -   NULL/* transfer_inline_write */
 +   u_default_transfer_inline_write /* transfer_inline_write */
  };

  struct pipe_resource *r600_buffer_create(struct pipe_screen *screen,
 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
 b/src/gallium/drivers/radeon/r600_pipe_common.c
 index 3476021..69d344e 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.c
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
 @@ -153,7 +153,7 @@ bool r600_common_context_init(struct r600_common_context 
 *rctx,
 rctx-b.transfer_map = u_transfer_map_vtbl;
 rctx-b.transfer_flush_region = u_default_transfer_flush_region;
 rctx-b.transfer_unmap = u_transfer_unmap_vtbl;
 -   rctx-b.transfer_inline_write = u_default_transfer_inline_write;
 +   rctx-b.transfer_inline_write = u_transfer_inline_write_vtbl;
  rctx-b.memory_barrier = r600_memory_barrier;
 rctx-b.flush = r600_flush_from_st;

 diff --git a/src/gallium/drivers/radeon/r600_texture.c 
 b/src/gallium/drivers/radeon/r600_texture.c
 index 482bbff..89b3b55 100644
 --- a/src/gallium/drivers/radeon/r600_texture.c
 +++ b/src/gallium/drivers/radeon/r600_texture.c
 @@ -849,6 +849,47 @@ static struct pipe_resource 
 *r600_texture_from_handle(struct pipe_screen *screen
   stride, 
 buf, surface);
  }

 +static struct r600_texture *r600_texture_from_ptr(struct pipe_screen *screen,
 + const struct pipe_resource 
 *templ,
 + void *pointer, unsigned 
 stride)
 +{
 +   struct r600_common_screen *rscreen = (struct 
 r600_common_screen*)screen;
 +   struct radeon_surface surface = {};
 +   struct r600_texture *tex;
 +   unsigned offset, size;
 +   struct pb_buffer *buf;
 +   int r;
 +
 +   /* Support only 2D textures without mipmaps */
 +   if ((templ->target != PIPE_TEXTURE_2D && templ->target != PIPE_TEXTURE_RECT) ||
 +       templ->depth0 != 1 || templ->last_level != 0)
 +           return NULL;
 +
 +   /* stride needs to be at least dw aligned */
 +   if (stride % 4)
 +           return NULL;
 +
 +   offset = ((uintptr_t)pointer) & 0xfff;
 +   pointer = (void *)(((uintptr_t)pointer) - offset);
 +   size = align(stride * templ->height0 + offset, 0x1000);
 +
 +   /* avoid the overhead for small copies */
 +   if (size < 64*1024)
 +           return NULL;
 +
 +   buf = rscreen->ws->buffer_from_ptr(rscreen->ws, pointer, size);
 +   if (!buf)
 +           return NULL;
 +
 +   r = r600_init_surface(rscreen, &surface, templ,
 +                         RADEON_SURF_MODE_LINEAR_ALIGNED, false);

I know you change it in the next patch, but I think the alignment for
LINEAR (not ALIGNED) is 8 pixels, right? Of course, libdrm_radeon
should be checked to make sure it doesn't over-align the stride. The
safest thing would be to check if stride == surface[0].pitch_in_bytes.


 +   if (r)
 +   return NULL;
 +
 +   tex = r600_texture_create_object(screen, templ, stride, buf, 
 surface);
 +   tex-surface.level[0].offset += offset;
 +   return tex;
 +}
 +
  bool r600_init_flushed_depth_texture(struct pipe_context *ctx,
  struct pipe_resource *texture,
  struct r600_texture **staging)
 @@ -1112,14 +1153,65 @@ static void r600_texture_transfer_unmap(struct 
 pipe_context *ctx,
 FREE(transfer);
  }

 +static void r600_texture_transfer_inline_write(struct pipe_context *ctx,
 +  struct pipe_resource *dst,
 +  unsigned level, unsigned usage,
 +  const struct pipe_box *box,
 +  const void *data,
 +  

Re: [Mesa-dev] [PATCH 4/4] radeon: cache the last used userptr

2014-08-06 Thread Marek Olšák
I only know about AMD_pinned_memory, which is for buffers only. I
don't know about an API for creating textures from user pointers. Yes,
there are pixel buffer objects, but they are a lot more difficult to
implement and they are defined such that a zero-copy approach to get a
texture is not possible.

Marek

On Wed, Aug 6, 2014 at 1:39 PM, Christian König deathsim...@vodafone.de wrote:
 What is this patch good for?

 Nothing in particular, I just wanted to test how much overhead creating a
 new BO each time we do transfer_inline_write actually adds.

 BTW: Implementing transfer_inline_write using userptrs was just a proof of
 concept. It turned out to actually be way slower than just copying with the
 CPU because we need to block for the copy to complete.

 For a real use case we need to support creating textures from application
 supplied pointers and implement the matching OpenGL extensions, but you
 probably know that better than I do.

 Christian.

 Am 06.08.2014 um 13:24 schrieb Marek Olšák:

 What is this patch good for?

 Marek

 On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de
 wrote:

 From: Christian König christian.koe...@amd.com

 Signed-off-by: Christian König christian.koe...@amd.com
 ---
   src/gallium/drivers/radeon/r600_pipe_common.c |  9 ++
   src/gallium/drivers/radeon/r600_pipe_common.h | 11 +++
   src/gallium/drivers/radeon/r600_texture.c | 41
 +--
   3 files changed, 59 insertions(+), 2 deletions(-)

 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c
 b/src/gallium/drivers/radeon/r600_pipe_common.c
 index 69d344e..f745311 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.c
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
 @@ -770,11 +770,20 @@ bool r600_common_screen_init(struct
 r600_common_screen *rscreen,
  }
  }

 +   pipe_mutex_init(rscreen-userptr_lock);
 +
  return true;
   }

   void r600_destroy_common_screen(struct r600_common_screen *rscreen)
   {
 +   unsigned i;
 +
 +   for (i = 0; i  R600_USERPTR_CACHE; ++i)
 +   pipe_resource_reference((struct pipe_resource
 **)rscreen-userptr[i].tex, NULL);
 +
 +   pipe_mutex_destroy(rscreen-userptr_lock);
 +
  pipe_mutex_destroy(rscreen-aux_context_lock);
  rscreen-aux_context-destroy(rscreen-aux_context);

 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h
 b/src/gallium/drivers/radeon/r600_pipe_common.h
 index dcec2bb..88dbaf8 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.h
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
 @@ -97,6 +97,8 @@

   #define R600_MAP_BUFFER_ALIGNMENT 64

 +#define R600_USERPTR_CACHE 32
 +
   struct r600_common_context;

   struct radeon_shader_binary {
 @@ -258,6 +260,15 @@ struct r600_common_screen {
  struct r600_resource*trace_bo;
  uint32_t*trace_ptr;
  unsignedcs_count;
 +
 +   struct {
 +   struct r600_texture *tex;
 +   void*pointer;
 +   unsignedoffset;
 +   unsignedsize;
 +   }   userptr[R600_USERPTR_CACHE];
 +   unsigneduserptr_idx;
 +   pipe_mutex  userptr_lock;
   };

   /* This encapsulates a state or an operation which can emitted into the
 GPU
 diff --git a/src/gallium/drivers/radeon/r600_texture.c
 b/src/gallium/drivers/radeon/r600_texture.c
 index 89b3b55..c3ff96c 100644
 --- a/src/gallium/drivers/radeon/r600_texture.c
 +++ b/src/gallium/drivers/radeon/r600_texture.c
 @@ -855,10 +855,11 @@ static struct r600_texture
 *r600_texture_from_ptr(struct pipe_screen *screen,
   {
  struct r600_common_screen *rscreen = (struct
 r600_common_screen*)screen;
  struct radeon_surface surface = {};
 +   struct pipe_resource *res = NULL;
  struct r600_texture *tex;
  unsigned offset, size;
  struct pb_buffer *buf;
 -   int r;
 +   int r, i;

  /* Support only 2D textures without mipmaps */
  if ((templ-target != PIPE_TEXTURE_2D  templ-target !=
 PIPE_TEXTURE_RECT) ||
 @@ -877,16 +878,52 @@ static struct r600_texture
 *r600_texture_from_ptr(struct pipe_screen *screen,
  if (size  64*1024)
  return NULL;

 +   pipe_mutex_lock(rscreen-userptr_lock);
 +   for (i = 0; i  R600_USERPTR_CACHE; ++i) {
 +
 +   if (rscreen-userptr[i].pointer != pointer ||
 +   rscreen-userptr[i].offset != offset ||
 +   rscreen-userptr[i].size != size ||
 +   !rscreen-userptr[i].tex)
 +   continue;
 +
 +   tex = rscreen-userptr[i].tex;
 +   if (tex-resource.b.b.width0 != templ-width0 
 +   tex-resource.b.b.height0 != templ-height0 
 +   

Re: [Mesa-dev] [PATCH 05/20] i965/cfg: Add a foreach_inst_in_block_safe macro.

2014-08-06 Thread Pohjolainen, Topi
On Tue, Aug 05, 2014 at 09:14:55PM +0300, Pohjolainen, Topi wrote:
 On Thu, Jul 24, 2014 at 07:54:12PM -0700, Matt Turner wrote:
  ---
   src/mesa/drivers/dri/i965/brw_cfg.h | 8 
   1 file changed, 8 insertions(+)
  
  diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h 
  b/src/mesa/drivers/dri/i965/brw_cfg.h
  index a5d2df5..913a1ed 100644
  --- a/src/mesa/drivers/dri/i965/brw_cfg.h
  +++ b/src/mesa/drivers/dri/i965/brw_cfg.h
  @@ -120,6 +120,14 @@ struct cfg_t {
   __inst != __block->end->next;  \
   __inst = (__type *)__inst->next)
   
  +#define foreach_inst_in_block_safe(__type, __inst, __block)    \
  +   for (__type *__inst = (__type *)__block->start,             \
  +               *__next = (__type *)__inst->next,               \
  +               *__end = (__type *)__block->end->next->next;    \
 
 Patches 4 and 7 make sense but the double ->next->next here is not obvious
 to me.

I tried handwriting instructions into blocks (this is purely arbitrary):

ip    opcode
----------------
0  :  BRW_OPCODE_?
..
k  :  BRW_OPCODE_IF
k+1:  BRW_OPCODE_?
..
n  :  BRW_OPCODE_ELSE
n+1:  BRW_OPCODE_?
..
m  :  BRW_OPCODE_ENDIF
m+1:  BRW_OPCODE_?
..
t  :  BRW_OPCODE_?


Following the logic in the constructor of cfg_t, I would deduce this:

block 0:
   start_ip = 0
   num = 0
   start = inst_0
   end = inst_k  (if)

block 1:
   start_ip = k+1
   num = 1
   start = inst_k+1
   end = inst_n  (else)

block 2:
   start_ip = n+1
   num = 2
   start = inst_n+1
   end = inst_m-1

block 3:
   start_ip = m
   num = 3
   start = inst_m(endif)
   end = inst_t


And as instructions inherit from exec_node, for block 3 end->next
should be NULL, right?
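
To make the question concrete, here is a minimal C sketch (not the Mesa code) of a "safe" iteration whose stop marker is end->next->next. It assumes a list in which the last real node points at a tail sentinel and only that sentinel has next == NULL; the names node, list and remove_odd_in_range are made up for illustration. Under that assumption, end->next of the last block is the sentinel rather than NULL, and the loop has to compare the saved next pointer (not the current node) against the stop marker so that the end node itself is still visited.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly linked node; not Mesa's exec_node. */
struct node {
    struct node *next;
    struct node *prev;
    int value;
};

/* Hypothetical list with explicit sentinels: tail.next is always NULL. */
struct list {
    struct node head;
    struct node tail;
};

static void list_init(struct list *l)
{
    l->head.prev = NULL;
    l->head.next = &l->tail;
    l->tail.prev = &l->head;
    l->tail.next = NULL;
}

static void list_push_tail(struct list *l, struct node *n)
{
    n->prev = l->tail.prev;
    n->next = &l->tail;
    l->tail.prev->next = n;
    l->tail.prev = n;
}

static void node_remove(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = NULL;
}

/* Visit every node in [start, end] and unlink the odd ones.
 * 'next' is saved before the body runs, so unlinking 'cur' is safe.
 * The loop stops once the saved 'next' reaches end->next->next, which
 * means 'cur' has just advanced past 'end'. Returns the visit count. */
static int remove_odd_in_range(struct node *start, struct node *end)
{
    assert(start != NULL && end != NULL);

    int visited = 0;
    struct node *cur = start;
    struct node *next = cur->next;
    struct node *stop = end->next->next; /* NULL when end is the last node */

    for (; next != stop; cur = next, next = next->next) {
        visited++;
        if (cur->value & 1)
            node_remove(cur);
    }
    return visited;
}
```

With five nodes holding 1..5 and the range covering the whole list, all five are visited and only 2 and 4 stay linked. The sketch reads the stop marker once, before anything is removed, and it does not cover a body that removes nodes other than the current one.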

 
  +        __next != __end;                                       \
  +        __inst = __next,                                       \
  +        __next = (__type *)__next->next)
  +
   #define foreach_inst_in_block_reverse(__type, __inst, __block) \
      for (__type *__inst = (__type *)__block->end;               \
           __inst != __block->start->prev;                        \
  -- 
  1.8.5.5
  
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] radeonsi: implement partial DMA copies v2

2014-08-06 Thread Christian König

Am 06.08.2014 um 13:42 schrieb Marek Olšák:

For patches 1 & 2:

Reviewed-by: Marek Olšák marek.ol...@amd.com

How was the DMA code tested? I think the best thing would be to switch
resource_copy_region to dma_copy just for testing and run piglit. (you
also probably want to avoid recursion between dma_copy and
resource_copy_region)


Yeah, good idea. I just won't have time for that right now, since I need 
to start on the next requirement asap.


I tested it through VDPAU playback without UVD, e.g. mplayer/vdpau state 
tracker copying the frames using transfer_inline_write to a video buffer 
and compositing that with subtitles.


That seems to be quite a good way of exercising all kinds of different 
texture and rectangle sizes (because of the different letters in the 
subtitles). It took me a while to get everything working.


Christian.



Marek

On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote:

From: Christian König christian.koe...@amd.com

v2: fix a couple of typos and bugs

Signed-off-by: Christian König christian.koe...@amd.com
---
  src/gallium/drivers/radeonsi/si_dma.c | 85 +++
  src/gallium/drivers/radeonsi/sid.h|  1 +
  2 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_dma.c 
b/src/gallium/drivers/radeonsi/si_dma.c
index 26f1e1b..4d72f62 100644
--- a/src/gallium/drivers/radeonsi/si_dma.c
+++ b/src/gallium/drivers/radeonsi/si_dma.c
@@ -111,6 +111,48 @@ static void si_dma_copy_buffer(struct si_context *ctx,
 }
  }

+static void si_dma_copy_partial(struct si_context *ctx,
+   struct pipe_resource *dst,
+   uint64_t dst_offset,
+   uint32_t dst_slice_size,
+   uint32_t dst_pitch,
+   struct pipe_resource *src,
+   uint64_t src_offset,
+   uint32_t src_slice_size,
+   uint32_t src_pitch,
+   uint32_t width,
+   uint32_t height,
+   uint32_t depth,
+   unsigned bpp)
+{
+   struct radeon_winsys_cs *cs = ctx-b.rings.dma.cs;
+   struct r600_resource *rdst = (struct r600_resource*)dst;
+   struct r600_resource *rsrc = (struct r600_resource*)src;
+
+   dst_offset += r600_resource_va(ctx-screen-b.b, dst);
+   src_offset += r600_resource_va(ctx-screen-b.b, src);
+
+   r600_need_dma_space(ctx-b, 9);
+
+   r600_context_bo_reloc(ctx-b, ctx-b.rings.dma, rsrc, 
RADEON_USAGE_READ,
+ RADEON_PRIO_MIN);
+   r600_context_bo_reloc(ctx-b, ctx-b.rings.dma, rdst, 
RADEON_USAGE_WRITE,
+ RADEON_PRIO_MIN);
+
+   radeon_emit(cs, SI_DMA_PACKET(SI_DMA_PACKET_COPY, SI_DMA_COPY_PARTIAL, 
0x0));
+
+   radeon_emit(cs, src_offset & 0xffffffff);
+   radeon_emit(cs, ((src_offset >> 32UL) & 0xff) | (src_pitch << 13));
+   radeon_emit(cs, src_slice_size);
+
+   radeon_emit(cs, dst_offset & 0xffffffff);
+   radeon_emit(cs, ((dst_offset >> 32UL) & 0xff) | (dst_pitch << 13));
+   radeon_emit(cs, dst_slice_size);
+
+   radeon_emit(cs, width | (height << 16));
+   radeon_emit(cs, depth | (util_logbase2(bpp) << 29));
+}
+
  static void si_dma_copy_tile(struct si_context *ctx,
  struct pipe_resource *dst,
  unsigned dst_level,
@@ -299,33 +341,40 @@ void si_dma_copy(struct pipe_context *ctx,
 src_mode = src_mode == RADEON_SURF_MODE_LINEAR_ALIGNED ? 
RADEON_SURF_MODE_LINEAR : src_mode;
 dst_mode = dst_mode == RADEON_SURF_MODE_LINEAR_ALIGNED ? 
RADEON_SURF_MODE_LINEAR : dst_mode;

-   if (src_pitch != dst_pitch || src_box-x || dst_x || src_w != dst_w) {
-   /* FIXME si can do partial blit */
-   goto fallback;
-   }
-   /* the x test here are currently useless (because we don't support 
partial blit)
-* but keep them around so we don't forget about those
-*/
-   if ((src_pitch % 8) || (src_box-x % 8) || (dst_x % 8) || (src_box-y % 
8) || (dst_y % 8)) {
+   if (((src_pitch % 8) || (src_box->x % 8) || (dst_x % 8) ||
+        (src_box->y % 8) || (dst_y % 8)) &&
+       ((src_mode != RADEON_SURF_MODE_LINEAR) ||
+        (dst_mode != RADEON_SURF_MODE_LINEAR))) {
 goto fallback;
 }

 if (src_mode == dst_mode) {
 uint64_t dst_offset, src_offset;
-   /* simple dma blit would do NOTE code here assume :
-*   src_box.x/y == 0
-*   dst_x/y == 0
-*   dst_pitch == src_pitch
-*/
-   src_offset= rsrc-surface.level[src_level].offset;
-   src_offset += rsrc-surface.level[src_level].slice_size * 
src_box-z;
+   uint32_t dst_slice_size, 

Re: [Mesa-dev] [PATCH 3/4] radeon: accelerate transfer_inline_write

2014-08-06 Thread Christian König

Am 06.08.2014 um 13:45 schrieb Marek Olšák:

On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote:

From: Christian König christian.koe...@amd.com

Not completely implemented, cause we need DMA copy support for every hw 
generation.

Signed-off-by: Christian König christian.koe...@amd.com
---
  src/gallium/drivers/radeon/r600_buffer_common.c |   2 +-
  src/gallium/drivers/radeon/r600_pipe_common.c   |   2 +-
  src/gallium/drivers/radeon/r600_texture.c   | 104 ++--
  3 files changed, 100 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index d747cbc..28ab30c 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -372,7 +372,7 @@ static const struct u_resource_vtbl r600_buffer_vtbl =
 r600_buffer_transfer_map,   /* transfer_map */
 NULL,   /* transfer_flush_region */
 r600_buffer_transfer_unmap, /* transfer_unmap */
-   NULL/* transfer_inline_write */
+   u_default_transfer_inline_write /* transfer_inline_write */
  };

  struct pipe_resource *r600_buffer_create(struct pipe_screen *screen,
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 3476021..69d344e 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -153,7 +153,7 @@ bool r600_common_context_init(struct r600_common_context 
*rctx,
 rctx-b.transfer_map = u_transfer_map_vtbl;
 rctx-b.transfer_flush_region = u_default_transfer_flush_region;
 rctx-b.transfer_unmap = u_transfer_unmap_vtbl;
-   rctx-b.transfer_inline_write = u_default_transfer_inline_write;
+   rctx-b.transfer_inline_write = u_transfer_inline_write_vtbl;
  rctx-b.memory_barrier = r600_memory_barrier;
 rctx-b.flush = r600_flush_from_st;

diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index 482bbff..89b3b55 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -849,6 +849,47 @@ static struct pipe_resource 
*r600_texture_from_handle(struct pipe_screen *screen
   stride, buf, 
surface);
  }

+static struct r600_texture *r600_texture_from_ptr(struct pipe_screen *screen,
+ const struct pipe_resource 
*templ,
+ void *pointer, unsigned 
stride)
+{
+   struct r600_common_screen *rscreen = (struct r600_common_screen*)screen;
+   struct radeon_surface surface = {};
+   struct r600_texture *tex;
+   unsigned offset, size;
+   struct pb_buffer *buf;
+   int r;
+
+   /* Support only 2D textures without mipmaps */
+   if ((templ->target != PIPE_TEXTURE_2D && templ->target != PIPE_TEXTURE_RECT) ||
+       templ->depth0 != 1 || templ->last_level != 0)
+           return NULL;
+
+   /* stride needs to be at least dw aligned */
+   if (stride % 4)
+           return NULL;
+
+   offset = ((uintptr_t)pointer) & 0xfff;
+   pointer = (void *)(((uintptr_t)pointer) - offset);
+   size = align(stride * templ->height0 + offset, 0x1000);
+
+   /* avoid the overhead for small copies */
+   if (size < 64*1024)
+           return NULL;
+
+   buf = rscreen->ws->buffer_from_ptr(rscreen->ws, pointer, size);
+   if (!buf)
+           return NULL;
+
+   r = r600_init_surface(rscreen, &surface, templ,
+                         RADEON_SURF_MODE_LINEAR_ALIGNED, false);

I know you change it the next patch, but I think the alignment for
LINEAR (not ALIGNED) is 8 pixels, right? Of course, libdrm_radeon
should be reviewed if it doesn't over-align the stride. The safest
thing would be to check if stride == surface[0].pitch_in_bytes.


Yeah, correct.

The problem here is that even RADEON_SURF_MODE_LINEAR can't handle all 
the different alignments an application could come up with for the base 
pointer and stride. The only thing that can handle dword-aligned or even 
byte-aligned subwindow copies is the async DMA partial copy command, and 
that is only available on NI+.


Apart from that, testing whether libdrm_radeon really comes up with the 
correct stride is indeed a good idea.
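
As a rough illustration of the two checks discussed here, the following C sketch splits a user pointer into a page-aligned base plus offset (the same arithmetic as in the quoted r600_texture_from_ptr) and compares the application's stride against the pitch an allocator would pick. The helper names, the struct and the align_pixels parameter are hypothetical, and the 8-pixel LINEAR alignment is only the guess made in this thread, not a verified hardware requirement.

```c
#include <stdbool.h>
#include <stdint.h>

#define USERPTR_PAGE_SIZE 0x1000u /* assumed 4 KiB pages, as in the patch */

/* Round v up to a multiple of a; a must be a power of two. */
static uint32_t align_up(uint32_t v, uint32_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Hypothetical result type: page-aligned base, offset into the first
 * page, and the page-aligned size that has to be mapped. */
struct userptr_range {
    uintptr_t base;
    uint32_t offset;
    uint32_t size;
};

static struct userptr_range userptr_split(uintptr_t ptr, uint32_t stride,
                                          uint32_t height)
{
    struct userptr_range r;

    r.offset = (uint32_t)(ptr & (USERPTR_PAGE_SIZE - 1));
    r.base = ptr - r.offset;
    r.size = align_up(stride * height + r.offset, USERPTR_PAGE_SIZE);
    return r;
}

/* True if the application's stride equals the pitch obtained by aligning
 * the width to align_pixels and multiplying by bytes per pixel. */
static bool stride_matches_pitch(uint32_t stride, uint32_t width,
                                 uint32_t bpp, uint32_t align_pixels)
{
    uint32_t pitch_in_bytes = align_up(width, align_pixels) * bpp;

    return stride == pitch_in_bytes;
}
```

In a driver the pitch would of course come from the surface the allocator actually computed rather than from a formula like this; the sketch only shows the shape of the comparison.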






+   if (r)
+   return NULL;
+
+   tex = r600_texture_create_object(screen, templ, stride, buf, surface);
+   tex-surface.level[0].offset += offset;
+   return tex;
+}
+
  bool r600_init_flushed_depth_texture(struct pipe_context *ctx,
  struct pipe_resource *texture,
  struct r600_texture **staging)
@@ -1112,14 +1153,65 @@ static void r600_texture_transfer_unmap(struct 

Re: [Mesa-dev] [PATCH 11/20] i965: Add basic-block aware backend_instruction::insert_* methods.

2014-08-06 Thread Pohjolainen, Topi
On Thu, Jul 24, 2014 at 07:54:18PM -0700, Matt Turner wrote:
 ---
  src/mesa/drivers/dri/i965/brw_shader.cpp | 80 
 
  src/mesa/drivers/dri/i965/brw_shader.h   |  5 ++
  2 files changed, 85 insertions(+)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
 b/src/mesa/drivers/dri/i965/brw_shader.cpp
 index 47535a9..ba93cbc 100644
 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
 @@ -740,6 +740,86 @@ backend_instruction::has_side_effects() const
  }
  
  void
 +backend_instruction::insert_after(bblock_t *block, backend_instruction *inst)
 +{
 +   bool found = false; (void) found;
 +   foreach_inst_in_block (backend_instruction, i, block) {
 +  if (this == i) {
 + found = true;
 +  }
 +   }
 +   assert(found || !"Instruction not in block");
 +
 +   block->end_ip++;
 +
 +   for (bblock_t *block_iter = (bblock_t *)block->link.next;
 +        !block_iter->link.is_tail_sentinel();
 +        block_iter = (bblock_t *)block_iter->link.next) {
 +      block_iter->start_ip++;
 +      block_iter->end_ip++;
 +   }
 +
 +   if (block->end == this)
 +      block->end = inst;
 +
 +   this->insert_after(inst);

If you called exec_node::insert_after(inst) explicitly here instead, would
you still need the using declaration in the header?

 +}
 +
 +void
 +backend_instruction::insert_before(bblock_t *block, backend_instruction 
 *inst)
 +{
 +   bool found = false; (void) found;
 +   foreach_inst_in_block (backend_instruction, i, block) {
 +  if (this == i) {
 + found = true;
 +  }
 +   }
 +   assert(found || !"Instruction not in block");
 +
 +   block->end_ip++;
 +
 +   for (bblock_t *block_iter = (bblock_t *)block->link.next;
 +        !block_iter->link.is_tail_sentinel();
 +        block_iter = (bblock_t *)block_iter->link.next) {
 +      block_iter->start_ip++;
 +      block_iter->end_ip++;
 +   }
 +
 +   if (block->start == this)
 +      block->start = inst;
 +
 +   this->insert_before(inst);
 +}
 +
 +void
 +backend_instruction::insert_before(bblock_t *block, exec_list *list)
 +{
 +   bool found = false; (void) found;
 +   foreach_inst_in_block (backend_instruction, i, block) {
 +  if (this == i) {
 + found = true;
 +  }
 +   }
 +   assert(found || !"Instruction not in block");

This check is common to all three cases and could be refactored into its own
function, say check_inst_in_block(). That would also document these seven
lines nicely.

 +
 +   unsigned num_inst = list->length();
 +
 +   block->end_ip += num_inst;
 +
 +   for (bblock_t *block_iter = (bblock_t *)block->link.next;
 +        !block_iter->link.is_tail_sentinel();
 +        block_iter = (bblock_t *)block_iter->link.next) {
 +      block_iter->start_ip += num_inst;
 +      block_iter->end_ip += num_inst;
 +   }

Same here: this iteration is identical in all three functions and could be
its own member function, with an argument giving the adjustment size.
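
A minimal sketch of the two helpers suggested above, written in plain C rather than the patch's C++ so that it stands alone. The struct layouts and the names inst_in_block and adjust_later_block_ips are assumptions for illustration only; the real code walks bblock_t links and stops at a tail sentinel instead of a NULL next pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for backend_instruction and bblock_t. */
struct inst {
    struct inst *next;
};

struct block {
    struct block *next;  /* NULL after the last block (assumption) */
    struct inst *start;
    struct inst *end;
    int start_ip;
    int end_ip;
};

/* Walk the block from start to end (inclusive) looking for target.
 * This is the check each insert_* method repeats before asserting. */
static bool inst_in_block(const struct block *blk, const struct inst *target)
{
    for (const struct inst *i = blk->start; i != NULL; i = i->next) {
        if (i == target)
            return true;
        if (i == blk->end)
            break;
    }
    return false;
}

/* After inserting delta instructions into blk, grow blk and shift the
 * ip range of every later block by the same amount. */
static void adjust_later_block_ips(struct block *blk, int delta)
{
    blk->end_ip += delta;

    for (struct block *it = blk->next; it != NULL; it = it->next) {
        it->start_ip += delta;
        it->end_ip += delta;
    }
}
```

With helpers like these, each insert_* method would reduce to an assertion on inst_in_block(), one call to adjust_later_block_ips() with 1 or the list length, the start/end fix-up, and the actual list insertion.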

 +
 +   if (block->start == this)
 +      block->start = (backend_instruction *)list->get_head();
 +
 +   this->insert_before(list);
 +}
 +
 +void
  backend_instruction::remove(bblock_t *block)
  {
 bool found = false; (void) found;
 diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
 b/src/mesa/drivers/dri/i965/brw_shader.h
 index 4b80ea9..d174d5c 100644
 --- a/src/mesa/drivers/dri/i965/brw_shader.h
 +++ b/src/mesa/drivers/dri/i965/brw_shader.h
 @@ -92,6 +92,11 @@ struct backend_instruction : public exec_node {
  
 using exec_node::remove;
 void remove(bblock_t *block);
 +   using exec_node::insert_after;
 +   void insert_after(bblock_t *block, backend_instruction *inst);
 +   using exec_node::insert_before;
 +   void insert_before(bblock_t *block, backend_instruction *inst);
 +   void insert_before(bblock_t *block, exec_list *list);
  
 /**
  * True if the instruction has side effects other than writing to
 -- 
 1.8.5.5
 
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/4] radeon: accelerate transfer_inline_write

2014-08-06 Thread Marek Olšák
On Wed, Aug 6, 2014 at 2:44 PM, Christian König deathsim...@vodafone.de wrote:
 Am 06.08.2014 um 13:45 schrieb Marek Olšák:

 On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de
 wrote:

 From: Christian König christian.koe...@amd.com

 Not completely implemented, cause we need DMA copy support for every hw
 generation.

 Signed-off-by: Christian König christian.koe...@amd.com
 ---
   src/gallium/drivers/radeon/r600_buffer_common.c |   2 +-
   src/gallium/drivers/radeon/r600_pipe_common.c   |   2 +-
   src/gallium/drivers/radeon/r600_texture.c   | 104
 ++--
   3 files changed, 100 insertions(+), 8 deletions(-)

 diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c
 b/src/gallium/drivers/radeon/r600_buffer_common.c
 index d747cbc..28ab30c 100644
 --- a/src/gallium/drivers/radeon/r600_buffer_common.c
 +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
 @@ -372,7 +372,7 @@ static const struct u_resource_vtbl r600_buffer_vtbl
 =
  r600_buffer_transfer_map,   /* transfer_map */
  NULL,   /* transfer_flush_region */
  r600_buffer_transfer_unmap, /* transfer_unmap */
 -   NULL/* transfer_inline_write */
 +   u_default_transfer_inline_write /* transfer_inline_write */
   };

   struct pipe_resource *r600_buffer_create(struct pipe_screen *screen,
 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c
 b/src/gallium/drivers/radeon/r600_pipe_common.c
 index 3476021..69d344e 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.c
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
 @@ -153,7 +153,7 @@ bool r600_common_context_init(struct
 r600_common_context *rctx,
  rctx-b.transfer_map = u_transfer_map_vtbl;
  rctx-b.transfer_flush_region = u_default_transfer_flush_region;
  rctx-b.transfer_unmap = u_transfer_unmap_vtbl;
 -   rctx-b.transfer_inline_write = u_default_transfer_inline_write;
 +   rctx-b.transfer_inline_write = u_transfer_inline_write_vtbl;
   rctx-b.memory_barrier = r600_memory_barrier;
  rctx-b.flush = r600_flush_from_st;

 diff --git a/src/gallium/drivers/radeon/r600_texture.c
 b/src/gallium/drivers/radeon/r600_texture.c
 index 482bbff..89b3b55 100644
 --- a/src/gallium/drivers/radeon/r600_texture.c
 +++ b/src/gallium/drivers/radeon/r600_texture.c
 @@ -849,6 +849,47 @@ static struct pipe_resource
 *r600_texture_from_handle(struct pipe_screen *screen

 stride, buf, surface);
   }

 +static struct r600_texture *r600_texture_from_ptr(struct pipe_screen
 *screen,
 + const struct
 pipe_resource *templ,
 + void *pointer, unsigned
 stride)
 +{
 +   struct r600_common_screen *rscreen = (struct
 r600_common_screen*)screen;
 +   struct radeon_surface surface = {};
 +   struct r600_texture *tex;
 +   unsigned offset, size;
 +   struct pb_buffer *buf;
 +   int r;
 +
 +   /* Support only 2D textures without mipmaps */
  +   if ((templ->target != PIPE_TEXTURE_2D && templ->target != PIPE_TEXTURE_RECT) ||
  +       templ->depth0 != 1 || templ->last_level != 0)
  +           return NULL;
  +
  +   /* stride needs to be at least dw aligned */
  +   if (stride % 4)
  +           return NULL;
  +
  +   offset = ((uintptr_t)pointer) & 0xfff;
  +   pointer = (void *)(((uintptr_t)pointer) - offset);
  +   size = align(stride * templ->height0 + offset, 0x1000);
  +
  +   /* avoid the overhead for small copies */
  +   if (size < 64*1024)
  +           return NULL;
  +
  +   buf = rscreen->ws->buffer_from_ptr(rscreen->ws, pointer, size);
  +   if (!buf)
  +           return NULL;
  +
  +   r = r600_init_surface(rscreen, &surface, templ,
  +                         RADEON_SURF_MODE_LINEAR_ALIGNED, false);

 I know you change it the next patch, but I think the alignment for
 LINEAR (not ALIGNED) is 8 pixels, right? Of course, libdrm_radeon
 should be reviewed if it doesn't over-align the stride. The safest
 thing would be to check if stride == surface[0].pitch_in_bytes.


 Yeah, correct.

 The problem here is that even RADEON_SURF_MODE_LINEAR couldn't even handle
 all different alignments the application could come up with for the base
 pointer and stride. The only thing that can handle dword aligned or even
 byte aligned subwindow copies is the async DMA partial copy command and that
 is only available on NI+.

 Apart from that testing if libdrm_radeon really comes up with the correct
 stride is indeed a good idea.




 +   if (r)
 +   return NULL;
 +
 +   tex = r600_texture_create_object(screen, templ, stride, buf,
 surface);
 +   tex-surface.level[0].offset += offset;
 +   return tex;
 +}
 +
   bool r600_init_flushed_depth_texture(struct pipe_context *ctx,
   struct pipe_resource *texture,
   

Re: [Mesa-dev] [PATCH 00/10] [RFC] Probably useless algebraic optimizations

2014-08-06 Thread Thomas Helland
2014-08-04 21:25 GMT+02:00 Eric Anholt e...@anholt.net:
 thomashellan...@gmail.com writes:

 From: Thomas Helland thomashellan...@gmail.com

 When writing that A || (A && B) patch some
 days ago I also wrote some other patches
 that have no impact on my collection of shaders.
 (shader-db + Some TF2 and Portal-shaders).
 No reduction in instruction count, and no
 significant increase in compilation time.

 I decided to put them up here anyway, as
 with your collection of shaders maybe YMMV.

 I'm definitely interested in seeing our optimizer gain features like
 this, even if we don't have samples of code triggering them in our
 database yet.  What we have in shader-db from real-world apps is a
 subset of what our compiler will encounter -- it doesn't tend to cover
 code by novice shader developers, nor does it cover more-complex,
 more-code-generated code we expect to see in the future.

This was my initial thought too.
Also, while some of these patterns are simple and would likely be spotted
by a seasoned programmer, they may end up in our tree as a result of
other optimizations simplifying the code.


 If the patches are cleaned up to use spaces instead of tabs, and avoid
 trailing whitespace, patches 1-4, 7-8, and 10 are:

 Reviewed-by: Eric Anholt e...@anholt.net


I'll get these cleaned up and posted to the list again soon.
I don't have commit access, so I'll need someone to push them for me.

 For the sub case, I'm going to want to disable lower_sub_to_add_neg on my
 hardware, since I've got SUB but no negate modifier on operands.  This
 makes the (A - neg(B)) patch interesting to me.  However, since neg(A) -
 B -> neg(A+B) was questioned, and it would be no change for me as well,
 I think we should probably drop that half.

I'll rewrite this to drop the neg(A)-B part, and post it along with the rest.
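
For reference, the two rewrites that stay in the series can be sanity-checked by brute force. This is only a small C sketch with made-up function names; it checks the boolean pattern exhaustively and the subtraction pattern on a small integer range, and says nothing about floating-point corner cases or about how the GLSL IR pass matches these patterns.

```c
#include <stdbool.h>

/* A || (A && B) == A for every boolean A and B (absorption law). */
static bool or_and_absorbs(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            if ((a || (a && b)) != a)
                return false;
    return true;
}

/* A - neg(B) == A + B, checked for all integer pairs in [lo, hi]. */
static bool sub_neg_is_add(int lo, int hi)
{
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            if (a - (-b) != a + b)
                return false;
    return true;
}
```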


 The min/max patches I'm not that interested in -- I think that class of
 optimization would be better handled in a pass that can track various
 bounds that values might have over time, rather than being a special
 case in algebraic.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] radeon, r200: fix buffer validation after CS flush

2014-08-06 Thread Marek Olšák
From: Marek Olšák marek.ol...@amd.com

This validates all bound buffers (CB, ZB, textures, DMA) at the beginning
of CS. This fixes bo->space_accounted assertion failures.

Tested by: Jochen Rollwagen joro-2...@t-online.de
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/r200/r200_context.c|  1 +
 src/mesa/drivers/dri/r200/r200_state.c  |  2 +-
 src/mesa/drivers/dri/r200/r200_state.h  |  1 +
 src/mesa/drivers/dri/radeon/radeon_common.c | 14 +-
 src/mesa/drivers/dri/radeon/radeon_common_context.h |  1 +
 src/mesa/drivers/dri/radeon/radeon_context.c|  1 +
 src/mesa/drivers/dri/radeon/radeon_state.c  |  2 +-
 src/mesa/drivers/dri/radeon/radeon_state.h  |  1 +
 8 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/r200/r200_context.c 
b/src/mesa/drivers/dri/r200/r200_context.c
index 71dfcf3..d5749f3 100644
--- a/src/mesa/drivers/dri/r200/r200_context.c
+++ b/src/mesa/drivers/dri/r200/r200_context.c
@@ -190,6 +190,7 @@ static void r200_init_vtbl(radeonContextPtr radeon)
radeon->vtbl.check_blit = r200_check_blit;
radeon->vtbl.blit = r200_blit;
radeon->vtbl.is_format_renderable = radeonIsFormatRenderable;
+   radeon->vtbl.revalidate_all_buffers = r200ValidateBuffers;
 }
 
 
diff --git a/src/mesa/drivers/dri/r200/r200_state.c 
b/src/mesa/drivers/dri/r200/r200_state.c
index 2c7b652..983430f 100644
--- a/src/mesa/drivers/dri/r200/r200_state.c
+++ b/src/mesa/drivers/dri/r200/r200_state.c
@@ -2210,7 +2210,7 @@ static void update_texturematrix( struct gl_context *ctx )
}
 }
 
-static GLboolean r200ValidateBuffers(struct gl_context *ctx)
+GLboolean r200ValidateBuffers(struct gl_context *ctx)
 {
r200ContextPtr rmesa = R200_CONTEXT(ctx);
struct radeon_renderbuffer *rrb;
diff --git a/src/mesa/drivers/dri/r200/r200_state.h 
b/src/mesa/drivers/dri/r200/r200_state.h
index db0f01f..a396b06 100644
--- a/src/mesa/drivers/dri/r200/r200_state.h
+++ b/src/mesa/drivers/dri/r200/r200_state.h
@@ -47,6 +47,7 @@ extern void r200UpdateViewportOffset( struct gl_context *ctx 
);
 extern void r200UpdateWindow( struct gl_context *ctx );
 extern void r200UpdateDrawBuffer(struct gl_context *ctx);
 
+extern GLboolean r200ValidateBuffers(struct gl_context *ctx);
 extern GLboolean r200ValidateState( struct gl_context *ctx );
 
 extern void r200_vtbl_update_scissor( struct gl_context *ctx );
diff --git a/src/mesa/drivers/dri/radeon/radeon_common.c 
b/src/mesa/drivers/dri/radeon/radeon_common.c
index 67c6dc7..515e55a 100644
--- a/src/mesa/drivers/dri/radeon/radeon_common.c
+++ b/src/mesa/drivers/dri/radeon/radeon_common.c
@@ -532,17 +532,6 @@ static INLINE void radeonEmitAtoms(radeonContextPtr 
radeon, GLboolean emitAll)
COMMIT_BATCH();
 }
 
-static GLboolean radeon_revalidate_bos(struct gl_context *ctx)
-{
-   radeonContextPtr radeon = RADEON_CONTEXT(ctx);
-   int ret;
-
-   ret = radeon_cs_space_check(radeon->cmdbuf.cs);
-   if (ret == RADEON_CS_SPACE_FLUSH)
-   return GL_FALSE;
-   return GL_TRUE;
-}
-
 void radeonEmitState(radeonContextPtr radeon)
 {
radeon_print(RADEON_STATE, RADEON_NORMAL, "%s\n", __FUNCTION__);
@@ -661,9 +650,8 @@ int rcommonFlushCmdBufLocked(radeonContextPtr rmesa, const 
char *caller)
radeon_cs_erase(rmesa->cmdbuf.cs);
rmesa->cmdbuf.flushing = 0;
 
-   if (radeon_revalidate_bos(&rmesa->glCtx) == GL_FALSE) {
+   if (!rmesa->vtbl.revalidate_all_buffers(&rmesa->glCtx))
fprintf(stderr,"failed to revalidate buffers\n");
-   }
 
return ret;
 }
diff --git a/src/mesa/drivers/dri/radeon/radeon_common_context.h 
b/src/mesa/drivers/dri/radeon/radeon_common_context.h
index 6cd1535..ac3e7b5 100644
--- a/src/mesa/drivers/dri/radeon/radeon_common_context.h
+++ b/src/mesa/drivers/dri/radeon/radeon_common_context.h
@@ -496,6 +496,7 @@ struct radeon_context {
 unsigned reg_height,
 unsigned flip_y);
   unsigned (*is_format_renderable)(mesa_format mesa_format);
+  GLboolean (*revalidate_all_buffers)(struct gl_context *ctx);
} vtbl;
 };
 
diff --git a/src/mesa/drivers/dri/radeon/radeon_context.c 
b/src/mesa/drivers/dri/radeon/radeon_context.c
index 1ceb4ab..edd94e2 100644
--- a/src/mesa/drivers/dri/radeon/radeon_context.c
+++ b/src/mesa/drivers/dri/radeon/radeon_context.c
@@ -157,6 +157,7 @@ static void r100_init_vtbl(radeonContextPtr radeon)
radeon->vtbl.check_blit = r100_check_blit;
radeon->vtbl.blit = r100_blit;
radeon->vtbl.is_format_renderable = radeonIsFormatRenderable;
+   radeon->vtbl.revalidate_all_buffers = r100ValidateBuffers;
 }
 
 /* Create the device specific context.
diff --git a/src/mesa/drivers/dri/radeon/radeon_state.c 
b/src/mesa/drivers/dri/radeon/radeon_state.c
index f6bc5df..843b041 100644
--- a/src/mesa/drivers/dri/radeon/radeon_state.c
+++ b/src/mesa/drivers/dri/radeon/radeon_state.c
@@ -1992,7 +1992,7 @@ static void 

[Mesa-dev] [PATCH] st/mesa: dump TGSI before calling into the driver

2014-08-06 Thread Marek Olšák
From: Marek Olšák marek.ol...@amd.com

If the driver crashes in create_xx_shader, you want to see the shader.
---
 src/mesa/state_tracker/st_program.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/src/mesa/state_tracker/st_program.c 
b/src/mesa/state_tracker/st_program.c
index 9d7b7c4..fbf8930 100644
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -393,13 +393,12 @@ st_translate_vertex_program(struct st_context *st,
   &vpv->tgsi.stream_output);
}
 
-   vpv->driver_shader = pipe->create_vs_state(pipe, &vpv->tgsi);
-
if (ST_DEBUG & DEBUG_TGSI) {
-  tgsi_dump( vpv->tgsi.tokens, 0 );
+  tgsi_dump(vpv->tgsi.tokens, 0);
   debug_printf("\n");
}
 
+   vpv->driver_shader = pipe->create_vs_state(pipe, &vpv->tgsi);
return vpv;
 
 fail:
@@ -804,15 +803,15 @@ st_translate_fragment_program(struct st_context *st,
variant->tgsi.tokens = ureg_get_tokens( ureg, NULL );
ureg_destroy( ureg );
 
-   /* fill in variant */
-   variant->driver_shader = pipe->create_fs_state(pipe, &variant->tgsi);
-   variant->key = *key;
-
if (ST_DEBUG & DEBUG_TGSI) {
-  tgsi_dump( variant->tgsi.tokens, 0/*TGSI_DUMP_VERBOSE*/ );
+  tgsi_dump(variant->tgsi.tokens, 0/*TGSI_DUMP_VERBOSE*/);
   debug_printf("\n");
}
 
+   /* fill in variant */
+   variant->driver_shader = pipe->create_fs_state(pipe, &variant->tgsi);
+   variant->key = *key;
+
if (deleteFP) {
   /* Free the temporary program made above */
   struct gl_fragment_program *fp = &stfp->Base;
@@ -1173,10 +1172,6 @@ st_translate_geometry_program(struct st_context *st,
   &stgp->tgsi.stream_output);
}
 
-   /* fill in new variant */
-   gpv->driver_shader = pipe->create_gs_state(pipe, &stgp->tgsi);
-   gpv->key = *key;
-
if ((ST_DEBUG & DEBUG_TGSI) && (ST_DEBUG & DEBUG_MESA)) {
   _mesa_print_program(&stgp->Base.Base);
   debug_printf("\n");
@@ -1187,6 +1182,9 @@ st_translate_geometry_program(struct st_context *st,
   debug_printf("\n");
}
 
+   /* fill in new variant */
+   gpv->driver_shader = pipe->create_gs_state(pipe, &stgp->tgsi);
+   gpv->key = *key;
return gpv;
 }
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] radeonsi: always prefer SWITCH_ON_EOP(0) on CIK

2014-08-06 Thread Marek Olšák
From: Marek Olšák marek.ol...@amd.com

The code is rewritten to take known constraints into account, while always
using 0 by default.

This should improve performance for multi-SE parts in theory.

A debug option is also added for easier debugging. (If there are hangs,
use the option. If the hangs go away, you have found the problem.)
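
As a reading aid, the constraints described above can be restated as a small standalone C function. This is only a sketch that mirrors the logic of the patch below: the struct, its fields and the function name are hypothetical simplifications, and the real driver reads these values from its context and draw info.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the inputs the driver looks at. */
struct draw_params {
   unsigned max_se;            /* number of shader engines */
   bool is_hawaii;
   bool prim_needs_wd_switch;  /* polygon, line loop, tri fan, tri strip adj */
   bool primitive_restart;
   bool indirect;
   unsigned instance_count;
   bool line_stipple;
   bool debug_switch_on_eop;   /* the new debug option */
};

struct eop_switches {
   bool wd;
   bool ia;
};

static struct eop_switches choose_switch_on_eop(const struct draw_params *p)
{
   /* SWITCH_ON_EOP(0) is the preferred default for both. */
   struct eop_switches s = { false, false };

   /* No effect with fewer than 4 shader engines; otherwise hardware rules. */
   if (p->max_se < 4 || p->prim_needs_wd_switch || p->primitive_restart)
      s.wd = true;

   /* Hawaii workaround for instanced and indirect draws. */
   if (p->is_hawaii && (p->indirect || p->instance_count > 1))
      s.wd = true;

   /* Line stippling (hardware requirement) or the debug override. */
   if (p->line_stipple || p->debug_switch_on_eop) {
      s.ia = true;
      s.wd = true;
   }

   /* If the WD switch is false, the IA switch must be false too. */
   assert(s.wd || !s.ia);
   return s;
}
```

Only the last condition ever sets the IA switch, and it sets the WD switch at the same time, so the assertion holds by construction.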
---
 src/gallium/drivers/radeon/r600_pipe_common.c |  2 +-
 src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
 src/gallium/drivers/radeonsi/si_state_draw.c  | 33 ---
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 17 
 4 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 3476021..eb44d72 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -239,7 +239,6 @@ static const struct debug_named_value 
common_debug_options[] = {
{ "vm", DBG_VM, "Print virtual addresses when creating resources" },
{ "trace_cs", DBG_TRACE_CS, "Trace cs and write rlockup_<csid>.c file 
with faulty cs" },
 
-
/* shaders */
{ "fs", DBG_FS, "Print fetch shaders" },
{ "vs", DBG_VS, "Print vertex shaders" },
@@ -254,6 +253,7 @@ static const struct debug_named_value 
common_debug_options[] = {
{ "noinvalrange", DBG_NO_DISCARD_RANGE, "Disable handling of 
INVALIDATE_RANGE map flags" },
{ "no2d", DBG_NO_2D_TILING, "Disable 2D tiling" },
{ "notiling", DBG_NO_TILING, "Disable tiling" },
+   { "switch_on_eop", DBG_SWITCH_ON_EOP, "Program WD/IA to switch on 
end-of-packet." },
 
DEBUG_NAMED_VALUE_END /* must be last */
 };
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index dcec2bb..ac69d5b 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -93,6 +93,7 @@
 #define DBG_NO_DISCARD_RANGE   (1 << 12)
 #define DBG_NO_2D_TILING   (1 << 13)
 #define DBG_NO_TILING  (1 << 14)
+#define DBG_SWITCH_ON_EOP  (1 << 15)
 /* The maximum allowed bit is 15. */
 
 #define R600_MAP_BUFFER_ALIGNMENT 64
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index 4e808a3..ae839ba 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -401,25 +401,40 @@ static bool si_update_draw_info_state(struct si_context 
*sctx,
 
if (sctx->b.chip_class >= CIK) {
struct si_state_rasterizer *rs = sctx->queued.named.rasterizer;
-   bool wd_switch_on_eop = prim == V_008958_DI_PT_POLYGON ||
-   prim == V_008958_DI_PT_LINELOOP ||
-   prim == V_008958_DI_PT_TRIFAN ||
-   prim == V_008958_DI_PT_TRISTRIP_ADJ ||
-   info->primitive_restart ||
-   (rs ? rs->line_stipple_enable : false);
-   /* If the WD switch is false, the IA switch must be false too. 
*/
-   bool ia_switch_on_eop = wd_switch_on_eop;
unsigned primgroup_size = 64;
 
+   /* SWITCH_ON_EOP(0) is always preferable. */
+   bool wd_switch_on_eop = false;
+   bool ia_switch_on_eop = false;
+
+   /* WD_SWITCH_ON_EOP has no effect on GPUs with less than
+* 4 shader engines. Set 1 to pass the assertion below.
+* The other cases are hardware requirements. */
+   if (sctx->b.screen->info.max_se < 4 ||
+   prim == V_008958_DI_PT_POLYGON ||
+   prim == V_008958_DI_PT_LINELOOP ||
+   prim == V_008958_DI_PT_TRIFAN ||
+   prim == V_008958_DI_PT_TRISTRIP_ADJ ||
+   info->primitive_restart)
+   wd_switch_on_eop = true;
+
/* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP 
is 0.
 * We don't know that for indirect drawing, so treat it as
 * always problematic. */
if (sctx->b.family == CHIP_HAWAII &&
-   (info->indirect || info->instance_count > 1)) {
+   (info->indirect || info->instance_count > 1))
wd_switch_on_eop = true;
+
+   /* This is a hardware requirement. */
+   if ((rs && rs->line_stipple_enable) ||
+   (sctx->b.screen->debug_flags & DBG_SWITCH_ON_EOP)) {
ia_switch_on_eop = true;
+   wd_switch_on_eop = true;
}
 
+   /* If the WD switch is false, the IA switch must be false too. 
*/
+   assert(wd_switch_on_eop || !ia_switch_on_eop);
+
si_pm4_set_reg(pm4, R_028B74_VGT_DISPATCH_DRAW_INDEX,
   

[Mesa-dev] [PATCH 1/2] radeonsi: fix a hang with instancing in Unigine Heaven/Valley on Hawaii

2014-08-06 Thread Marek Olšák
From: Marek Olšák marek.ol...@amd.com

This isn't documented anywhere, but it's the only thing that works
for this case.
---
 src/gallium/drivers/radeonsi/si_state_draw.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index eb21ba1..4e808a3 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -411,14 +411,11 @@ static bool si_update_draw_info_state(struct si_context 
*sctx,
bool ia_switch_on_eop = wd_switch_on_eop;
unsigned primgroup_size = 64;
 
-   /* Hawaii hangs if instancing is enabled and each instance
-* is smaller than a prim group and WD_SWITCH_ON_EOP is 0.
+   /* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP 
is 0.
 * We don't know that for indirect drawing, so treat it as
 * always problematic. */
if (sctx->b.family == CHIP_HAWAII &&
-   (info->indirect ||
-(info->instance_count > 1 &&
- u_prims_for_vertices(info->mode, info->count) < 
primgroup_size))) {
+   (info->indirect || info->instance_count > 1)) {
wd_switch_on_eop = true;
ia_switch_on_eop = true;
}
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: make vertex array type error checking a little more efficient

2014-08-06 Thread Brian Paul

On 08/05/2014 10:35 AM, Roland Scheidegger wrote:

Am 30.07.2014 19:08, schrieb Brian Paul:

Compute the bitmask of supported array types once instead of every
time we call a GL vertex array function.
---
  src/mesa/main/mtypes.h |3 ++
  src/mesa/main/varray.c |   86 +++-
  2 files changed, 59 insertions(+), 30 deletions(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 3f60a55..f5ce360 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -1693,6 +1693,9 @@ struct gl_array_attrib

 /** One of the DRAW_xxx flags, not consumed by drivers */
 gl_draw_method DrawMethod;
+
+   /** Legal array datatypes */
+   GLbitfield LegalTypesMask;
  };


diff --git a/src/mesa/main/varray.c b/src/mesa/main/varray.c
index 46956ef..0356858 100644
--- a/src/mesa/main/varray.c
+++ b/src/mesa/main/varray.c
@@ -179,6 +179,53 @@ vertex_binding_divisor(struct gl_context *ctx, GLuint 
bindingIndex,


  /**
+ * Examine the API profile and extensions to determine which types are legal
+ * for vertex arrays.  This is called once from update_array_format().
+ */
+static GLbitfield
+get_legal_types_mask(const struct gl_context *ctx)
+{
+   GLbitfield legalTypesMask = ~0u; /* all */

I think it would be better to list all possible values explicitly here,
otherwise you have lots of impossible bits in there in the end. It
should not make any difference really, though for instance for debugging
the legalTypesMask might look a little confusing otherwise.


OK, I can do that in a follow-up.
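
For reference, the difference between starting from ~0u and starting from an explicit list can be sketched as follows. The bit names and values here are illustrative placeholders, not the definitions in varray.c, and only one "illegal type" rule is modeled.

```c
#include <stdint.h>

/* Illustrative type bits; names and values are placeholders. */
#define EX_BYTE_BIT    (1u << 0)
#define EX_SHORT_BIT   (1u << 1)
#define EX_INT_BIT     (1u << 2)
#define EX_FLOAT_BIT   (1u << 3)
#define EX_DOUBLE_BIT  (1u << 4)

/* Every bit that actually corresponds to a datatype. */
#define EX_ALL_TYPE_BITS (EX_BYTE_BIT | EX_SHORT_BIT | EX_INT_BIT | \
                          EX_FLOAT_BIT | EX_DOUBLE_BIT)

/* Start from all ones: correct when tested against real type bits, but the
 * result also carries 27 bits that do not correspond to any type. */
static uint32_t legal_mask_from_all_ones(int is_gles)
{
   uint32_t mask = ~0u;
   if (is_gles)
      mask &= ~EX_DOUBLE_BIT;
   return mask;
}

/* Start from the explicit list: same answers, and the value is readable
 * in a debugger. */
static uint32_t legal_mask_explicit(int is_gles)
{
   uint32_t mask = EX_ALL_TYPE_BITS;
   if (is_gles)
      mask &= ~EX_DOUBLE_BIT;
   return mask;
}
```

Both variants agree on every real type bit, so the change would be cosmetic, as noted above. Either way the mask stays non-zero, so the "== 0 means not yet computed" check in the patch keeps working.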





+
+   if (_mesa_is_gles(ctx)) {
+  legalTypesMask &= ~(FIXED_GL_BIT |
+  DOUBLE_BIT |
+  UNSIGNED_INT_10F_11F_11F_REV_BIT);
+
+  /* GL_INT and GL_UNSIGNED_INT data is not allowed in OpenGL ES until
+   * 3.0.  The 2_10_10_10 types are added in OpenGL ES 3.0 or
+   * GL_OES_vertex_type_10_10_10_2.  GL_HALF_FLOAT data is not allowed
+   * until 3.0 or with the GL_OES_vertex_half float extension, which isn't
+   * quite as trivial as we'd like because it uses a different enum value
+   * for GL_HALF_FLOAT_OES.
+   */
+  if (ctx->Version < 30) {
+ legalTypesMask &= ~(UNSIGNED_INT_BIT |
+ INT_BIT |
+ UNSIGNED_INT_2_10_10_10_REV_BIT |
+ INT_2_10_10_10_REV_BIT |
+ HALF_BIT);
+  }
+   }
+   else {
+  legalTypesMask &= ~FIXED_ES_BIT;
+
+  if (!ctx->Extensions.ARB_ES2_compatibility)
+ legalTypesMask &= ~FIXED_GL_BIT;
+
+  if (!ctx->Extensions.ARB_vertex_type_2_10_10_10_rev)
+ legalTypesMask &= ~(UNSIGNED_INT_2_10_10_10_REV_BIT |
+ INT_2_10_10_10_REV_BIT);
+
+  if (!ctx->Extensions.ARB_vertex_type_10f_11f_11f_rev)
+ legalTypesMask &= ~UNSIGNED_INT_10F_11F_11F_REV_BIT;
+   }
+
+   return legalTypesMask;
+}
+
+
+/**
   * Does error checking and updates the format in an attrib array.
   *
   * Called by update_array() and VertexAttrib*Format().
@@ -208,40 +255,19 @@ update_array_format(struct gl_context *ctx,
 GLuint elementSize;
 GLenum format = GL_RGBA;

-   if (_mesa_is_gles(ctx)) {
-  legalTypesMask &= ~(FIXED_GL_BIT | DOUBLE_BIT | 
UNSIGNED_INT_10F_11F_11F_REV_BIT);
-
-  /* GL_INT and GL_UNSIGNED_INT data is not allowed in OpenGL ES until
-   * 3.0.  The 2_10_10_10 types are added in OpenGL ES 3.0 or
-   * GL_OES_vertex_type_10_10_10_2.  GL_HALF_FLOAT data is not allowed
-   * until 3.0 or with the GL_OES_vertex_half float extension, which isn't
-   * quite as trivial as we'd like because it uses a different enum value
-   * for GL_HALF_FLOAT_OES.
+   if (ctx->Array.LegalTypesMask == 0) {
+  /* One-time initialization.  We can't do this in _mesa_init_varrays()
+   * below because extensions are not yet enabled at that point.
 */
-  if (ctx->Version < 30) {
- legalTypesMask &= ~(UNSIGNED_INT_BIT
- | INT_BIT
- | UNSIGNED_INT_2_10_10_10_REV_BIT
- | INT_2_10_10_10_REV_BIT
- | HALF_BIT);
-  }
+  ctx->Array.LegalTypesMask = get_legal_types_mask(ctx);
+   }
+
+   legalTypesMask &= ctx->Array.LegalTypesMask;

+   if (_mesa_is_gles(ctx) && sizeMax == BGRA_OR_4) {
/* BGRA ordering is not supported in ES contexts.
 */
-  if (sizeMax == BGRA_OR_4)
- sizeMax = 4;
-   } else {
-  legalTypesMask &= ~FIXED_ES_BIT;
-
-  if (!ctx->Extensions.ARB_ES2_compatibility)
- legalTypesMask &= ~FIXED_GL_BIT;
-
-  if (!ctx->Extensions.ARB_vertex_type_2_10_10_10_rev)
- legalTypesMask &= ~(UNSIGNED_INT_2_10_10_10_REV_BIT |
- INT_2_10_10_10_REV_BIT);
-
-  if (!ctx->Extensions.ARB_vertex_type_10f_11f_11f_rev)
- legalTypesMask &= ~UNSIGNED_INT_10F_11F_11F_REV_BIT;
+  sizeMax = 4;
 

Re: [Mesa-dev] [PATCH] radeon, r200: fix buffer validation after CS flush

2014-08-06 Thread Alex Deucher
On Wed, Aug 6, 2014 at 9:28 AM, Marek Olšák mar...@gmail.com wrote:
 From: Marek Olšák marek.ol...@amd.com

 This validates all bound buffers (CB, ZB, textures, DMA) at the beginning
 of CS. This fixes bo->space_accounted assertion failures.

 Tested by: Jochen Rollwagen joro-2...@t-online.de
 Cc: mesa-sta...@lists.freedesktop.org

Reviewed-by: Alex Deucher alexander.deuc...@amd.com

 ---
  src/mesa/drivers/dri/r200/r200_context.c|  1 +
  src/mesa/drivers/dri/r200/r200_state.c  |  2 +-
  src/mesa/drivers/dri/r200/r200_state.h  |  1 +
  src/mesa/drivers/dri/radeon/radeon_common.c | 14 +-
  src/mesa/drivers/dri/radeon/radeon_common_context.h |  1 +
  src/mesa/drivers/dri/radeon/radeon_context.c|  1 +
  src/mesa/drivers/dri/radeon/radeon_state.c  |  2 +-
  src/mesa/drivers/dri/radeon/radeon_state.h  |  1 +
  8 files changed, 8 insertions(+), 15 deletions(-)

 diff --git a/src/mesa/drivers/dri/r200/r200_context.c 
 b/src/mesa/drivers/dri/r200/r200_context.c
 index 71dfcf3..d5749f3 100644
 --- a/src/mesa/drivers/dri/r200/r200_context.c
 +++ b/src/mesa/drivers/dri/r200/r200_context.c
 @@ -190,6 +190,7 @@ static void r200_init_vtbl(radeonContextPtr radeon)
 radeon->vtbl.check_blit = r200_check_blit;
 radeon->vtbl.blit = r200_blit;
 radeon->vtbl.is_format_renderable = radeonIsFormatRenderable;
 +   radeon->vtbl.revalidate_all_buffers = r200ValidateBuffers;
  }


 diff --git a/src/mesa/drivers/dri/r200/r200_state.c 
 b/src/mesa/drivers/dri/r200/r200_state.c
 index 2c7b652..983430f 100644
 --- a/src/mesa/drivers/dri/r200/r200_state.c
 +++ b/src/mesa/drivers/dri/r200/r200_state.c
 @@ -2210,7 +2210,7 @@ static void update_texturematrix( struct gl_context 
 *ctx )
 }
  }

 -static GLboolean r200ValidateBuffers(struct gl_context *ctx)
 +GLboolean r200ValidateBuffers(struct gl_context *ctx)
  {
 r200ContextPtr rmesa = R200_CONTEXT(ctx);
 struct radeon_renderbuffer *rrb;
 diff --git a/src/mesa/drivers/dri/r200/r200_state.h 
 b/src/mesa/drivers/dri/r200/r200_state.h
 index db0f01f..a396b06 100644
 --- a/src/mesa/drivers/dri/r200/r200_state.h
 +++ b/src/mesa/drivers/dri/r200/r200_state.h
 @@ -47,6 +47,7 @@ extern void r200UpdateViewportOffset( struct gl_context 
 *ctx );
  extern void r200UpdateWindow( struct gl_context *ctx );
  extern void r200UpdateDrawBuffer(struct gl_context *ctx);

 +extern GLboolean r200ValidateBuffers(struct gl_context *ctx);
  extern GLboolean r200ValidateState( struct gl_context *ctx );

  extern void r200_vtbl_update_scissor( struct gl_context *ctx );
 diff --git a/src/mesa/drivers/dri/radeon/radeon_common.c 
 b/src/mesa/drivers/dri/radeon/radeon_common.c
 index 67c6dc7..515e55a 100644
 --- a/src/mesa/drivers/dri/radeon/radeon_common.c
 +++ b/src/mesa/drivers/dri/radeon/radeon_common.c
 @@ -532,17 +532,6 @@ static INLINE void radeonEmitAtoms(radeonContextPtr 
 radeon, GLboolean emitAll)
 COMMIT_BATCH();
  }

 -static GLboolean radeon_revalidate_bos(struct gl_context *ctx)
 -{
 -   radeonContextPtr radeon = RADEON_CONTEXT(ctx);
 -   int ret;
 -
 -   ret = radeon_cs_space_check(radeon->cmdbuf.cs);
 -   if (ret == RADEON_CS_SPACE_FLUSH)
 -   return GL_FALSE;
 -   return GL_TRUE;
 -}
 -
  void radeonEmitState(radeonContextPtr radeon)
  {
 radeon_print(RADEON_STATE, RADEON_NORMAL, "%s\n", __FUNCTION__);
 @@ -661,9 +650,8 @@ int rcommonFlushCmdBufLocked(radeonContextPtr rmesa, 
 const char *caller)
 radeon_cs_erase(rmesa->cmdbuf.cs);
 rmesa->cmdbuf.flushing = 0;

 -   if (radeon_revalidate_bos(&rmesa->glCtx) == GL_FALSE) {
 +   if (!rmesa->vtbl.revalidate_all_buffers(&rmesa->glCtx))
 fprintf(stderr,"failed to revalidate buffers\n");
 -   }

 return ret;
  }
 diff --git a/src/mesa/drivers/dri/radeon/radeon_common_context.h 
 b/src/mesa/drivers/dri/radeon/radeon_common_context.h
 index 6cd1535..ac3e7b5 100644
 --- a/src/mesa/drivers/dri/radeon/radeon_common_context.h
 +++ b/src/mesa/drivers/dri/radeon/radeon_common_context.h
 @@ -496,6 +496,7 @@ struct radeon_context {
  unsigned reg_height,
  unsigned flip_y);
unsigned (*is_format_renderable)(mesa_format mesa_format);
 +  GLboolean (*revalidate_all_buffers)(struct gl_context *ctx);
 } vtbl;
  };

 diff --git a/src/mesa/drivers/dri/radeon/radeon_context.c 
 b/src/mesa/drivers/dri/radeon/radeon_context.c
 index 1ceb4ab..edd94e2 100644
 --- a/src/mesa/drivers/dri/radeon/radeon_context.c
 +++ b/src/mesa/drivers/dri/radeon/radeon_context.c
 @@ -157,6 +157,7 @@ static void r100_init_vtbl(radeonContextPtr radeon)
 radeon->vtbl.check_blit = r100_check_blit;
 radeon->vtbl.blit = r100_blit;
 radeon->vtbl.is_format_renderable = radeonIsFormatRenderable;
 +   radeon->vtbl.revalidate_all_buffers = r100ValidateBuffers;
  }

  /* Create the device specific context.
 diff --git 

Re: [Mesa-dev] [PATCH 2/2] radeonsi: always prefer SWITCH_ON_EOP(0) on CIK

2014-08-06 Thread Alex Deucher
On Wed, Aug 6, 2014 at 9:32 AM, Marek Olšák mar...@gmail.com wrote:
 From: Marek Olšák marek.ol...@amd.com

 The code is rewritten to take known constraints into account, while always
 using 0 by default.

 This should improve performance for multi-SE parts in theory.

 A debug option is also added for easier debugging. (If there are hangs,
 use the option. If the hangs go away, you have found the problem.)

Just one comment below.  With that addressed:

Reviewed-by: Alex Deucher alexander.deuc...@amd.com

 ---
  src/gallium/drivers/radeon/r600_pipe_common.c |  2 +-
  src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
  src/gallium/drivers/radeonsi/si_state_draw.c  | 33 
 ---
  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 17 
  4 files changed, 43 insertions(+), 10 deletions(-)

 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
 b/src/gallium/drivers/radeon/r600_pipe_common.c
 index 3476021..eb44d72 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.c
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
 @@ -239,7 +239,6 @@ static const struct debug_named_value 
 common_debug_options[] = {
 { "vm", DBG_VM, "Print virtual addresses when creating resources" },
 { "trace_cs", DBG_TRACE_CS, "Trace cs and write rlockup_<csid>.c file 
 with faulty cs" },

 -
 /* shaders */
 { "fs", DBG_FS, "Print fetch shaders" },
 { "vs", DBG_VS, "Print vertex shaders" },
 @@ -254,6 +253,7 @@ static const struct debug_named_value 
 common_debug_options[] = {
 { "noinvalrange", DBG_NO_DISCARD_RANGE, "Disable handling of 
 INVALIDATE_RANGE map flags" },
 { "no2d", DBG_NO_2D_TILING, "Disable 2D tiling" },
 { "notiling", DBG_NO_TILING, "Disable tiling" },
 +   { "switch_on_eop", DBG_SWITCH_ON_EOP, "Program WD/IA to switch on 
 end-of-packet." },

 DEBUG_NAMED_VALUE_END /* must be last */
  };
 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
 b/src/gallium/drivers/radeon/r600_pipe_common.h
 index dcec2bb..ac69d5b 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.h
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
 @@ -93,6 +93,7 @@
  #define DBG_NO_DISCARD_RANGE   (1 << 12)
  #define DBG_NO_2D_TILING   (1 << 13)
  #define DBG_NO_TILING  (1 << 14)
 +#define DBG_SWITCH_ON_EOP  (1 << 15)
  /* The maximum allowed bit is 15. */

  #define R600_MAP_BUFFER_ALIGNMENT 64
 diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
 b/src/gallium/drivers/radeonsi/si_state_draw.c
 index 4e808a3..ae839ba 100644
 --- a/src/gallium/drivers/radeonsi/si_state_draw.c
 +++ b/src/gallium/drivers/radeonsi/si_state_draw.c
 @@ -401,25 +401,40 @@ static bool si_update_draw_info_state(struct si_context 
 *sctx,

 if (sctx->b.chip_class >= CIK) {
 struct si_state_rasterizer *rs = 
 sctx->queued.named.rasterizer;
 -   bool wd_switch_on_eop = prim == V_008958_DI_PT_POLYGON ||
 -   prim == V_008958_DI_PT_LINELOOP ||
 -   prim == V_008958_DI_PT_TRIFAN ||
 -   prim == V_008958_DI_PT_TRISTRIP_ADJ ||
 -   info->primitive_restart ||
 -   (rs ? rs->line_stipple_enable : 
 false);
 -   /* If the WD switch is false, the IA switch must be false 
 too. */
 -   bool ia_switch_on_eop = wd_switch_on_eop;
 unsigned primgroup_size = 64;

 +   /* SWITCH_ON_EOP(0) is always preferable. */
 +   bool wd_switch_on_eop = false;
 +   bool ia_switch_on_eop = false;
 +
 +   /* WD_SWITCH_ON_EOP has no effect on GPUs with less than
 +* 4 shader engines. Set 1 to pass the assertion below.
 +* The other cases are hardware requirements. */
 +   if (sctx->b.screen->info.max_se < 4 ||
 +   prim == V_008958_DI_PT_POLYGON ||
 +   prim == V_008958_DI_PT_LINELOOP ||
 +   prim == V_008958_DI_PT_TRIFAN ||
 +   prim == V_008958_DI_PT_TRISTRIP_ADJ ||
 +   info->primitive_restart)
 +   wd_switch_on_eop = true;
 +
 /* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP 
 is 0.
  * We don't know that for indirect drawing, so treat it as
  * always problematic. */
 if (sctx->b.family == CHIP_HAWAII &&
 -   (info->indirect || info->instance_count > 1)) {
 +   (info->indirect || info->instance_count > 1))
 wd_switch_on_eop = true;
 +
 +   /* This is a hardware requirement. */
 +   if ((rs && rs->line_stipple_enable) ||
 +   (sctx->b.screen->debug_flags & DBG_SWITCH_ON_EOP)) {
 ia_switch_on_eop = true;
 +   wd_switch_on_eop = true;
  

Re: [Mesa-dev] [PATCH 1/2] radeonsi: fix a hang with instancing in Unigine Heaven/Valley on Hawaii

2014-08-06 Thread Alex Deucher
On Wed, Aug 6, 2014 at 9:32 AM, Marek Olšák mar...@gmail.com wrote:
 From: Marek Olšák marek.ol...@amd.com

 This isn't documented anywhere, but it's the only thing that works
 for this case.

Reviewed-by: Alex Deucher alexander.deuc...@amd.com


 ---
  src/gallium/drivers/radeonsi/si_state_draw.c | 7 ++-
  1 file changed, 2 insertions(+), 5 deletions(-)

 diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
 b/src/gallium/drivers/radeonsi/si_state_draw.c
 index eb21ba1..4e808a3 100644
 --- a/src/gallium/drivers/radeonsi/si_state_draw.c
 +++ b/src/gallium/drivers/radeonsi/si_state_draw.c
 @@ -411,14 +411,11 @@ static bool si_update_draw_info_state(struct si_context 
 *sctx,
 bool ia_switch_on_eop = wd_switch_on_eop;
 unsigned primgroup_size = 64;

 -   /* Hawaii hangs if instancing is enabled and each instance
 -* is smaller than a prim group and WD_SWITCH_ON_EOP is 0.
 +   /* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP 
 is 0.
  * We don't know that for indirect drawing, so treat it as
  * always problematic. */
 if (sctx->b.family == CHIP_HAWAII &&
 -   (info->indirect ||
 -(info->instance_count > 1 &&
 - u_prims_for_vertices(info->mode, info->count) < 
 primgroup_size))) {
 +   (info->indirect || info->instance_count > 1)) {
 wd_switch_on_eop = true;
 ia_switch_on_eop = true;
 }
 --
 1.9.1

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] draw: fix clipvertex trouble if position comes from gs

2014-08-06 Thread Zack Rusin
On Aug 5, 2014, at 9:40 PM, srol...@vmware.com wrote:
 From: Roland Scheidegger srol...@vmware.com
 
 If the vertex shader has no position but the gs has, the clipvertex output
 was -1 (because it's the same as vs position in this case if there's no
 explicit clipvertex output). This caused crashes (or assertion failures) in
 clipping since in the end position (which came from gs) was different from
 cv (-1) and we then tried to use the bogus cv input.
 Rather than just test for -1 cv value in clipping, make it explicitly return
 the position output of the gs instead which seems cleaner (since we really
 don't want to use the clipvertex value from the vs (it could be a valid value
 in the (unsupported) case of vs writing clipvertex but still using a gs).
 This fixes piglit shader_runner clip-distance-out-values.shader_test.

Great. Well done! Both of those look good. 

Reviewed-by: Zack Rusin za...@vmware.com

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: support unsigned increment in ir_loop controls

2014-08-06 Thread Ian Romanick
On 07/30/2014 04:11 AM, Tapani Pälli wrote:
 Current version can create ir_expression where operands have
 different base type, patch adds support for unsigned type.
 
 Signed-off-by: Tapani Pälli tapani.pa...@intel.com
 https://bugs.freedesktop.org/show_bug.cgi?id=80880
 ---
  src/glsl/loop_controls.cpp | 18 +++---
  1 file changed, 15 insertions(+), 3 deletions(-)
 
 diff --git a/src/glsl/loop_controls.cpp b/src/glsl/loop_controls.cpp
 index 36b49eb..419f9c1 100644
 --- a/src/glsl/loop_controls.cpp
 +++ b/src/glsl/loop_controls.cpp
 @@ -123,9 +123,21 @@ calculate_iterations(ir_rvalue *from, ir_rvalue *to, 
 ir_rvalue *increment,
 bool valid_loop = false;
  
 for (unsigned i = 0; i < Elements(bias); i++) {
 -  iter = (increment->type->is_integer())
 -  ? new(mem_ctx) ir_constant(iter_value + bias[i])
 -  : new(mem_ctx) ir_constant(float(iter_value + bias[i]));
 +
 +  /* Increment may be of type int, uint or float. */
 +  switch (increment->type->base_type) {
 +  case GLSL_TYPE_INT:
 + iter = new(mem_ctx) ir_constant(iter_value + bias[i]);
 + break;
 +  case GLSL_TYPE_UINT:
 + iter = new(mem_ctx) ir_constant(unsigned(iter_value + bias[i]));
 + break;
 +  case GLSL_TYPE_FLOAT:
 + iter = new(mem_ctx) ir_constant(float(iter_value + bias[i]));
 + break;
 +  default:
 +  assert(!"Unsupported type for loop iterator.");

Right... because this code was written when we only had int and
float types.

Two things:

  - Use spaces instead of tabs.  It looks like the surrounding code
uses tabs, and that's my fault.  We're trying to fix that in new
code.

  - Change the assert to unreachable("Unsupported type for loop iterator.")

With those fixed, this patch is

Reviewed-by: Ian Romanick ian.d.roman...@intel.com
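
As an illustration of the requested shape, here is a small sketch in plain C (the patch itself is C++). Everything in it is hypothetical: the enum, the constant struct and the UNREACHABLE macro are stand-ins written for this example, and the macro only approximates what an unreachable() helper typically does (assert in debug builds, compiler hint otherwise).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for an unreachable() helper; __builtin_unreachable is a
 * GCC/Clang extension, so other compilers only get the assert. */
#if defined(__GNUC__)
#define UNREACHABLE(msg) do { assert(!msg); __builtin_unreachable(); } while (0)
#else
#define UNREACHABLE(msg) assert(!msg)
#endif

/* Hypothetical base types and constant representation. */
enum base_type { TYPE_INT, TYPE_UINT, TYPE_FLOAT, TYPE_BOOL };

struct constant {
   enum base_type type;
   union {
      int32_t i;
      uint32_t u;
      float f;
   } value;
};

/* Build the iteration-count constant with the same base type as the
 * loop increment, so both operands of the later multiply agree. */
static struct constant make_iter_constant(enum base_type type, int iter_value)
{
   struct constant c = { type, { 0 } };

   switch (type) {
   case TYPE_INT:
      c.value.i = iter_value;
      break;
   case TYPE_UINT:
      c.value.u = (uint32_t)iter_value;
      break;
   case TYPE_FLOAT:
      c.value.f = (float)iter_value;
      break;
   default:
      UNREACHABLE("Unsupported type for loop iterator.");
   }
   return c;
}
```

The default case documents that no other type is expected here, rather than silently producing a constant of the wrong type.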

 +  }
  
ir_expression *const mul =
new(mem_ctx) ir_expression(ir_binop_mul, increment->type, iter,
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] RFC: mesa/st dynamic sampler support in tgsi

2014-08-06 Thread Roland Scheidegger
Am 06.08.2014 13:00, schrieb Marek Olšák:
 On Wed, Aug 6, 2014 at 4:02 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 On Tue, Aug 5, 2014 at 5:25 PM, Roland Scheidegger srol...@vmware.com 
 wrote:
 From a gallium perspective, indirect temp regs are already working - so
 something like
 MOV TEMP[0], TEMP[TEMP[1].x] should work.
 Indirect registers are supported for inputs, outputs, temps, constants,
 and immediates even, but the indirect reg itself must come from a temp
 or address reg (I am not 100% certain where that restriction comes
 from). I have no idea which drivers support it, all I can tell is that
 it works with llvmpipe.
 I sort of doubt it is supported for samplers right now in gallium though
 technically it might be possible to express this already.

 Well, with my limited patch + ChrisF's small patches to mesa core, the
 dynamic sampler stuff works for nvc0, except for the issues I
 outlined. Not sure what you mean by supported in gallium. Perhaps I
 have an incorrect view of things, but I see gallium as an amorphous
 thing that we can change to our heart's content.

 A cap bit for the ability to support dynamic indexing of shaders (plus
 whatever is needed for making it work like declaration of sampler
 arrays) would certainly be needed in any case. For drivers supporting

 Right... so it's not like shaders will start magically containing
 these things, it'll only happen if ARB_gs5 is enabled (probably via
 PIPE_CAP_GLSL >= 400). Which presumably means that the backend
 supports whatever we're throwing at it.

 this I would certainly expect them to allow temp regs as the indirect
 reg. I guess it would be nice if we'd just use temp regs instead of
 address reg in glsl to tgsi conversion if a driver supports it. I think
 for modern drivers this makes a lot more sense than trying to shove
 everything into address regs.

 Agreed. With the exception that I guess we also need to support
 indexing with float values? (i.e. ARL) This would have to be treated
 with some care. Not sure when that comes up though... perhaps only if
 !native_integers, which won't be an issue with any of the hw that
 we're talking about.
 
 If you really want to lower ARL into a temp, I recommend using F2I,
 which is equivalent in behavior. For UARL, MOV will do.
 
 Also, I don't think GLSL sampler arrays have to be declared as arrays
 in TGSI. Array declarations are really only needed for TEMPs, because
 they allow better register allocation. Every other shader resource has
 a fixed location and would not benefit from it.
I think not requiring them to be declared as an array is a bad idea. It
may well be true that hw drivers can't really benefit from it but in any
case it would be trivial to handle in the drivers. It gives you the
ability to easily see what values are legal in the end as a sampler
index, might help with debugging some day. Besides, it's just bad
style imho to index into things which aren't arrays, that is applicable
to all languages, so I can't see why it should be different for tgsi.
But I guess it's not all _that_ important.

Roland


 
 If GLSL is strict about out-of-bounds access, I recommend always
 clamping the index in glsl_to_tgsi.
 
 Marek
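
For illustration, the clamping suggested above amounts to the following
arithmetic. This is only a sketch in plain C, not code from glsl_to_tgsi;
clamp_sampler_index and its parameters are hypothetical names, and the
choice to clamp negative values to element 0 is an assumption.

```c
#include <assert.h>

/* Hypothetical helper: clamp a dynamic index into [0, array_size - 1]
 * so that an out-of-bounds access still selects a valid array element. */
static unsigned
clamp_sampler_index(int index, unsigned array_size)
{
   assert(array_size > 0);
   if (index < 0)
      return 0;
   if ((unsigned)index >= array_size)
      return array_size - 1;
   return (unsigned)index;
}
```

In a real translator the clamp would be emitted as shader instructions
ahead of the indirect access rather than evaluated on the CPU.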
 



Re: [Mesa-dev] RFC: mesa/st dynamic sampler support in tgsi

2014-08-06 Thread Ilia Mirkin
On Wed, Aug 6, 2014 at 10:52 AM, Roland Scheidegger srol...@vmware.com wrote:
 Am 06.08.2014 13:00, schrieb Marek Olšák:
 On Wed, Aug 6, 2014 at 4:02 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 On Tue, Aug 5, 2014 at 5:25 PM, Roland Scheidegger srol...@vmware.com 
 wrote:
 From a gallium perspective, indirect temp regs are already working - so
 something like
 MOV TEMP[0], TEMP[TEMP[1].x] should work.
 Indirect registers are supported for inputs, outputs, temps, constants,
 and immediates even, but the indirect reg itself must come from a temp
 or address reg (I am not 100% certain where that restriction comes
 from). I have no idea which drivers support it, all I can tell is that
 it works with llvmpipe.
 I sort of doubt it is supported for samplers right now in gallium though
 technically it might be possible to express this already.

 Well, with my limited patch + ChrisF's small patches to mesa core, the
 dynamic sampler stuff works for nvc0, except for the issues I
 outlined. Not sure what you mean by supported in gallium. Perhaps I
 have an incorrect view of things, but I see gallium as an amorphous
 thing that we can change to our heart's content.

 A cap bit for the ability to support dynamic indexing of shaders (plus
 whatever is needed for making it work like declaration of sampler
 arrays) would certainly be needed in any case. For drivers supporting

 Right... so it's not like shaders will start magically containing
 these things, it'll only happen if ARB_gs5 is enabled (probably via
 PIPE_CAP_GLSL >= 400). Which presumably means that the backend
 supports whatever we're throwing at it.

 this I would certainly expect them to allow temp regs as the indirect
 reg. I guess it would be nice if we'd just use temp regs instead of
 address reg in glsl to tgsi conversion if a driver supports it. I think
 for modern drivers this makes a lot more sense than trying to shove
 everything into address regs.

 Agreed. With the exception that I guess we also need to support
 indexing with float values? (i.e. ARL) This would have to be treated
 with some care. Not sure when that comes up though... perhaps only if
 !native_integers, which won't be an issue with any of the hw that
 we're talking about.

 If you really want to lower ARL into a temp, I recommend using F2I,
 which is equivalent in behavior. For UARL, MOV will do.

 Also, I don't think GLSL sampler arrays have to be declared as arrays
 in TGSI. Array declarations are really only needed for TEMPs, because
 they allow better register allocation. Every other shader resource has
 a fixed location and would not benefit from it.
 I think not requiring them to be declared as an array is a bad idea. It
 may well be true that hw drivers can't really benefit from it but in any
 case it would be trivial to handle in the drivers. It gives you the
 ability to easily see what values are legal in the end as a sampler
 index, might help with debugging some day. Besides, it's just bad

You would see that based on the declarations anyways, no?

 style imho to index into things which aren't arrays, that is applicable
 to all languages, so I can't see why it should be different for tgsi.
 But I guess it's not all _that_ important.

Well, it might be important to put this in some context -- sampler
arrays are perfectly legal in GLSL today. What's not legal in pre-gs5
glsl (although based on some comments glsl 110 might have allowed it)
are the dynamic indices.

  -ilia


Re: [Mesa-dev] [PATCH] st/mesa: dump TGSI before calling into the driver

2014-08-06 Thread Ilia Mirkin
On Wed, Aug 6, 2014 at 9:33 AM, Marek Olšák mar...@gmail.com wrote:
 From: Marek Olšák marek.ol...@amd.com

 If the driver crashes in create_xx_shader, you want to see the shader.

Reviewed-by: Ilia Mirkin imir...@alum.mit.edu

 ---
  src/mesa/state_tracker/st_program.c | 22 ++
  1 file changed, 10 insertions(+), 12 deletions(-)

 diff --git a/src/mesa/state_tracker/st_program.c b/src/mesa/state_tracker/st_program.c
 index 9d7b7c4..fbf8930 100644
 --- a/src/mesa/state_tracker/st_program.c
 +++ b/src/mesa/state_tracker/st_program.c
 @@ -393,13 +393,12 @@ st_translate_vertex_program(struct st_context *st,
                                       &vpv->tgsi.stream_output);
     }

 -   vpv->driver_shader = pipe->create_vs_state(pipe, &vpv->tgsi);
 -
     if (ST_DEBUG & DEBUG_TGSI) {
 -      tgsi_dump( vpv->tgsi.tokens, 0 );
 +      tgsi_dump(vpv->tgsi.tokens, 0);
        debug_printf("\n");
     }

 +   vpv->driver_shader = pipe->create_vs_state(pipe, &vpv->tgsi);
     return vpv;

  fail:
 @@ -804,15 +803,15 @@ st_translate_fragment_program(struct st_context *st,
     variant->tgsi.tokens = ureg_get_tokens( ureg, NULL );
     ureg_destroy( ureg );

 -   /* fill in variant */
 -   variant->driver_shader = pipe->create_fs_state(pipe, &variant->tgsi);
 -   variant->key = *key;
 -
     if (ST_DEBUG & DEBUG_TGSI) {
 -      tgsi_dump( variant->tgsi.tokens, 0/*TGSI_DUMP_VERBOSE*/ );
 +      tgsi_dump(variant->tgsi.tokens, 0/*TGSI_DUMP_VERBOSE*/);
        debug_printf("\n");
     }

 +   /* fill in variant */
 +   variant->driver_shader = pipe->create_fs_state(pipe, &variant->tgsi);
 +   variant->key = *key;
 +
     if (deleteFP) {
        /* Free the temporary program made above */
        struct gl_fragment_program *fp = &stfp->Base;
 @@ -1173,10 +1172,6 @@ st_translate_geometry_program(struct st_context *st,
                                       &stgp->tgsi.stream_output);
     }

 -   /* fill in new variant */
 -   gpv->driver_shader = pipe->create_gs_state(pipe, &stgp->tgsi);
 -   gpv->key = *key;
 -
     if ((ST_DEBUG & DEBUG_TGSI) && (ST_DEBUG & DEBUG_MESA)) {
        _mesa_print_program(&stgp->Base.Base);
        debug_printf("\n");
 @@ -1187,6 +1182,9 @@ st_translate_geometry_program(struct st_context *st,
        debug_printf("\n");
     }

 +   /* fill in new variant */
 +   gpv->driver_shader = pipe->create_gs_state(pipe, &stgp->tgsi);
 +   gpv->key = *key;
     return gpv;
  }

 --
 1.9.1



Re: [Mesa-dev] RFC: mesa/st dynamic sampler support in tgsi

2014-08-06 Thread Roland Scheidegger
Am 06.08.2014 17:03, schrieb Ilia Mirkin:
 On Wed, Aug 6, 2014 at 10:52 AM, Roland Scheidegger srol...@vmware.com 
 wrote:
 Am 06.08.2014 13:00, schrieb Marek Olšák:
 On Wed, Aug 6, 2014 at 4:02 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 On Tue, Aug 5, 2014 at 5:25 PM, Roland Scheidegger srol...@vmware.com 
 wrote:
 From a gallium perspective, indirect temp regs are already working - so
 something like
 MOV TEMP[0], TEMP[TEMP[1].x] should work.
 Indirect registers are supported for inputs, outputs, temps, constants,
 and immediates even, but the indirect reg itself must come from a temp
 or address reg (I am not 100% certain where that restriction comes
 from). I have no idea which drivers support it, all I can tell is that
 it works with llvmpipe.
 I sort of doubt it is supported for samplers right now in gallium though
 technically it might be possible to express this already.

 Well, with my limited patch + ChrisF's small patches to mesa core, the
 dynamic sampler stuff works for nvc0, except for the issues I
 outlined. Not sure what you mean by supported in gallium. Perhaps I
 have an incorrect view of things, but I see gallium as an amorphous
 thing that we can change to our heart's content.

 A cap bit for the ability to support dynamic indexing of shaders (plus
 whatever is needed for making it work like declaration of sampler
 arrays) would certainly be needed in any case. For drivers supporting

 Right... so it's not like shaders will start magically containing
 these things, it'll only happen if ARB_gs5 is enabled (probably via
 PIPE_CAP_GLSL >= 400). Which presumably means that the backend
 supports whatever we're throwing at it.

 this I would certainly expect them to allow temp regs as the indirect
 reg. I guess it would be nice if we'd just use temp regs instead of
 address reg in glsl to tgsi conversion if a driver supports it. I think
 for modern drivers this makes a lot more sense than trying to shove
 everything into address regs.

 Agreed. With the exception that I guess we also need to support
 indexing with float values? (i.e. ARL) This would have to be treated
 with some care. Not sure when that comes up though... perhaps only if
 !native_integers, which won't be an issue with any of the hw that
 we're talking about.

 If you really want to lower ARL into a temp, I recommend using F2I,
 which is equivalent in behavior. For UARL, MOV will do.

 Also, I don't think GLSL sampler arrays have to be declared as arrays
 in TGSI. Array declarations are really only needed for TEMPs, because
 they allow better register allocation. Every other shader resource has
 a fixed location and would not benefit from it.
 I think not requiring them to be declared as an array is a bad idea. It
 may well be true that hw drivers can't really benefit from it but in any
 case it would be trivial to handle in the drivers. It gives you the
 ability to easily see what values are legal in the end as a sampler
 index, might help with debugging some day. Besides, it's just bad
 
 You would see that based on the declarations anyways, no?
How so? If you've got 15 samplers declared it is still not legal to
index into the 15th one if your sampler array is starting at 0 with 5
entries (or maybe it is legal but results undefined). That is at least
my understanding of the spec. (Of course if I'm wrong here then indeed
sampler arrays are worthless.)
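
To make the example concrete, here is a minimal sketch in C of the
legality check an explicit array declaration would allow. It assumes a
declaration records a first slot and a size; struct sampler_array_decl
and sampler_index_is_legal are hypothetical names, not TGSI or Mesa API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one declared sampler array. */
struct sampler_array_decl {
   unsigned first;   /* first sampler slot occupied by the array */
   unsigned size;    /* number of elements in the array */
};

/* True if 'slot' falls inside the array, i.e. first <= slot < first + size. */
static bool
sampler_index_is_legal(const struct sampler_array_decl *arr, unsigned slot)
{
   assert(arr->size > 0);
   return slot >= arr->first && slot - arr->first < arr->size;
}
```

With an array starting at 0 with 5 entries, slot 4 passes the check while
slot 14 (the 15th sampler) does not, even though that sampler is declared.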

 
 style imho to index into things which aren't arrays, that is applicable
 to all languages, so I can't see why it should be different for tgsi.
 But I guess it's not all _that_ important.
 
 Well, it might be important to put this in some context -- sampler
 arrays are perfectly legal in GLSL today. What's not legal in pre-gs5
 glsl (although based on some comments glsl 110 might have allowed it)
 are the dynamic indices.
Yes, but without dynamically indexing into it the sampler array can
easily be flattened since you always got the corresponding immediate
index. That is, it was never addressed as an array in tgsi.
FWIW this is the same story with d3d10 - resource dcls could be arrays
but the index had to be an immediate.

Roland




Re: [Mesa-dev] RFC: mesa/st dynamic sampler support in tgsi

2014-08-06 Thread Ilia Mirkin
On Wed, Aug 6, 2014 at 11:15 AM, Roland Scheidegger srol...@vmware.com wrote:
 Am 06.08.2014 17:03, schrieb Ilia Mirkin:
 On Wed, Aug 6, 2014 at 10:52 AM, Roland Scheidegger srol...@vmware.com 
 wrote:
 Am 06.08.2014 13:00, schrieb Marek Olšák:
 On Wed, Aug 6, 2014 at 4:02 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 On Tue, Aug 5, 2014 at 5:25 PM, Roland Scheidegger srol...@vmware.com 
 wrote:
 From a gallium perspective, indirect temp regs are already working - so
 something like
 MOV TEMP[0], TEMP[TEMP[1].x] should work.
 Indirect registers are supported for inputs, outputs, temps, constants,
 and immediates even, but the indirect reg itself must come from a temp
 or address reg (I am not 100% certain where that restriction comes
 from). I have no idea which drivers support it, all I can tell is that
 it works with llvmpipe.
 I sort of doubt it is supported for samplers right now in gallium though
 technically it might be possible to express this already.

 Well, with my limited patch + ChrisF's small patches to mesa core, the
 dynamic sampler stuff works for nvc0, except for the issues I
 outlined. Not sure what you mean by supported in gallium. Perhaps I
 have an incorrect view of things, but I see gallium as an amorphous
 thing that we can change to our heart's content.

 A cap bit for the ability to support dynamic indexing of shaders (plus
 whatever is needed for making it work like declaration of sampler
 arrays) would certainly be needed in any case. For drivers supporting

 Right... so it's not like shaders will start magically containing
 these things, it'll only happen if ARB_gs5 is enabled (probably via
 PIPE_CAP_GLSL >= 400). Which presumably means that the backend
 supports whatever we're throwing at it.

 this I would certainly expect them to allow temp regs as the indirect
 reg. I guess it would be nice if we'd just use temp regs instead of
 address reg in glsl to tgsi conversion if a driver supports it. I think
 for modern drivers this makes a lot more sense than trying to shove
 everything into address regs.

 Agreed. With the exception that I guess we also need to support
 indexing with float values? (i.e. ARL) This would have to be treated
 with some care. Not sure when that comes up though... perhaps only if
 !native_integers, which won't be an issue with any of the hw that
 we're talking about.

 If you really want to lower ARL into a temp, I recommend using F2I,
 which is equivalent in behavior. For UARL, MOV will do.

 Also, I don't think GLSL sampler arrays have to be declared as arrays
 in TGSI. Array declarations are really only needed for TEMPs, because
 they allow better register allocation. Every other shader resource has
 a fixed location and would not benefit from it.
 I think not requiring them to be declared as an array is a bad idea. It
 may well be true that hw drivers can't really benefit from it but in any
 case it would be trivial to handle in the drivers. It gives you the
 ability to easily see what values are legal in the end as a sampler
 index, might help with debugging some day. Besides, it's just bad

 You would see that based on the declarations anyways, no?
 How so? If you've got 15 samplers declared it is still not legal to
 index into the 15th one if your sampler array is starting at 0 with 5
 entries (or maybe it is legal but results undefined). That is at least
 my understanding of the spec. (Of course if I'm wrong here then indeed
 sampler arrays are worthless.)

That is indeed not legal. So right -- you wouldn't see where the
arrays are. But is that really worth worrying about at the TGSI level?
Anyways, I'll send my patch once perl gets unbroken on my system, and
you can rip it apart then :) Doing the array thing would be a giant
complication for what I perceive to be fairly little gain. The thing
is that the information of what's an array where is long lost by the
time the declarations are created -- there's just a bitmask of used
samplers.
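
As a rough sketch of what "just a bitmask of used samplers" means when a
whole array has to be marked used: the helper below sets one bit per
array element. It is plain C for illustration only; mark_samplers_used
is a hypothetical name and the 32-slot limit is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Set bits [base, base + array_size) in a mask of used sampler slots.
 * A non-array sampler is simply the array_size == 1 case. */
static uint32_t
mark_samplers_used(uint32_t used, unsigned base, unsigned array_size)
{
   assert(array_size >= 1 && base + array_size <= 32);
   uint32_t range = (array_size == 32) ? UINT32_MAX
                                        : ((UINT32_C(1) << array_size) - 1);
   return used | (range << base);
}
```

Once the bits are merged into the mask, nothing records that slots 1..3
came from one array and slot 0 from a plain sampler, which is the
information loss described above.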



 style imho to index into things which aren't arrays, that is applicable
 to all languages, so I can't see why it should be different for tgsi.
 But I guess it's not all _that_ important.

 Well, it might be important to put this in some context -- sampler
 arrays are perfectly legal in GLSL today. What's not legal in pre-gs5
 glsl (although based on some comments glsl 110 might have allowed it)
 are the dynamic indices.
 Yes, but without dynamically indexing into it the sampler array can
 easily be flattened since you always got the corresponding immediate
 index. That is, it was never addressed as an array in tgsi.
 FWIW this is the same story with d3d10 - resource dcls could be arrays
 but the index had to be an immediate.

 Roland




[Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets

2014-08-06 Thread Ilia Mirkin
Replace the plain sampler index with a register reference to a sampler.
We also need to keep track of the sampler array size when there is a
relative reference so that we can mark the whole array used.

To facilitate implementation, we add a separate ADDR register that
exclusively handles the sampler relative address. Other approaches would
be more invasive.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---

_mesa_get_sampler_array_nonconst_index is a function added by a patch that
ChrisF is working on... basically it returns NULL unless it's a nonconst
access.

I've done a very modest amount of piglit testing, but I definitely need to do
some more. The nvc0 bits aren't 100% ready -- I noticed that in some odd
situations the arguments to the tex instruction will get all mangled. But for
a simple case that mixes non-array and array samplers, it looks something like
this:

FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL OUT[0], COLOR
DCL SAMP[0]
DCL SAMP[1]
DCL SAMP[2]
DCL SAMP[3]
DCL CONST[0..1]
DCL TEMP[0..1], LOCAL
DCL ADDR[0..2]
IMM[0] FLT32 {0., 0., 0., 0.}
  0: MOV TEMP[0].xy, IN[0].xyyy
  1: TEX TEMP[0], TEMP[0], SAMP[0], 2D
  2: MOV TEMP[1].xy, IN[0].xyyy
  3: UARL ADDR[2].x, CONST[1].
  4: TEX TEMP[1], TEMP[1], SAMP[ADDR[2].x+1], 2D
  5: MUL TEMP[1], TEMP[1], CONST[0].
  6: MAD TEMP[0], TEMP[0], IMM[0]., TEMP[1]
  7: MOV OUT[0], TEMP[0]
  8: END

 src/gallium/auxiliary/tgsi/tgsi_ureg.c | 2 +-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 60 +-
 2 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
index dcf0cb5..6d3ac91 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
@@ -78,7 +78,7 @@ struct ureg_tokens {
 #define UREG_MAX_OUTPUT PIPE_MAX_SHADER_OUTPUTS
 #define UREG_MAX_CONSTANT_RANGE 32
 #define UREG_MAX_IMMEDIATE 4096
-#define UREG_MAX_ADDR 2
+#define UREG_MAX_ADDR 3
 #define UREG_MAX_PRED 1
 #define UREG_MAX_ARRAY_TEMPS 256
 
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index c5e2eb5..0d5c3ed 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -245,7 +245,8 @@ public:
    ir_instruction *ir;
    GLboolean cond_update;
    bool saturate;
-   int sampler; /**< sampler index */
+   st_src_reg sampler; /**< sampler register */
+   int sampler_array_size; /**< 1-based size of sampler array, 1 if not array */
    int tex_target; /**< One of TEXTURE_*_INDEX */
    GLboolean tex_shadow;
 
@@ -476,6 +477,7 @@ static st_dst_reg undef_dst = st_dst_reg(PROGRAM_UNDEFINED, SWIZZLE_NOOP, GLSL_T
 
 static st_dst_reg address_reg = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, GLSL_TYPE_FLOAT, 0);
 static st_dst_reg address_reg2 = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, GLSL_TYPE_FLOAT, 1);
+static st_dst_reg sampler_reladdr = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, GLSL_TYPE_FLOAT, 2);
 
 static void
 fail_link(struct gl_shader_program *prog, const char *fmt, ...) PRINTFLIKE(2, 3);
@@ -2799,6 +2801,8 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
    glsl_to_tgsi_instruction *inst = NULL;
    unsigned opcode = TGSI_OPCODE_NOP;
    const glsl_type *sampler_type = ir->sampler->type;
+   ir_rvalue *sampler_index =
+      _mesa_get_sampler_array_nonconst_index(ir->sampler);
    bool is_cube_array = false;
    unsigned i;
 
@@ -3016,6 +3020,11 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
       coord_dst.writemask = WRITEMASK_XYZW;
    }
 
+   if (sampler_index) {
+      sampler_index->accept(this);
+      emit_arl(ir, sampler_reladdr, this->result);
+   }
+
    if (opcode == TGSI_OPCODE_TXD)
       inst = emit(ir, opcode, result_dst, coord, dx, dy);
    else if (opcode == TGSI_OPCODE_TXQ) {
@@ -3045,9 +3054,18 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
    if (ir->shadow_comparitor)
       inst->tex_shadow = GL_TRUE;
 
-   inst->sampler = _mesa_get_sampler_uniform_value(ir->sampler,
-                                                   this->shader_program,
-                                                   this->prog);
+   inst->sampler.index = _mesa_get_sampler_uniform_value(ir->sampler,
+                                                         this->shader_program,
+                                                         this->prog);
+   if (sampler_index) {
+      inst->sampler.reladdr = ralloc(mem_ctx, st_src_reg);
+      memcpy(inst->sampler.reladdr, &sampler_reladdr, sizeof(sampler_reladdr));
+      inst->sampler_array_size =
+         ir->sampler->as_dereference_array()
+            ->array->variable_referenced()->type->length;
+   } else {
+      inst->sampler_array_size = 1;
+   }
 
    if (ir->offset) {
       for (i = 0; i < MAX_GLSL_TEXTURE_OFFSET && offset[i].file != PROGRAM_UNDEFINED; i++)
@@ -3215,10 +3233,12 @@ count_resources(glsl_to_tgsi_visitor *v, gl_program 

Re: [Mesa-dev] RFC: mesa/st dynamic sampler support in tgsi

2014-08-06 Thread Roland Scheidegger
Am 06.08.2014 17:20, schrieb Ilia Mirkin:
 On Wed, Aug 6, 2014 at 11:15 AM, Roland Scheidegger srol...@vmware.com 
 wrote:
 Am 06.08.2014 17:03, schrieb Ilia Mirkin:
 On Wed, Aug 6, 2014 at 10:52 AM, Roland Scheidegger srol...@vmware.com 
 wrote:
 Am 06.08.2014 13:00, schrieb Marek Olšák:
 On Wed, Aug 6, 2014 at 4:02 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 On Tue, Aug 5, 2014 at 5:25 PM, Roland Scheidegger srol...@vmware.com 
 wrote:
 From a gallium perspective, indirect temp regs are already working - so
 something like
 MOV TEMP[0], TEMP[TEMP[1].x] should work.
 Indirect registers are supported for inputs, outputs, temps, constants,
 and immediates even, but the indirect reg itself must come from a temp
 or address reg (I am not 100% certain where that restriction comes
 from). I have no idea which drivers support it, all I can tell is that
 it works with llvmpipe.
 I sort of doubt it is supported for samplers right now in gallium though
 technically it might be possible to express this already.

 Well, with my limited patch + ChrisF's small patches to mesa core, the
 dynamic sampler stuff works for nvc0, except for the issues I
 outlined. Not sure what you mean by supported in gallium. Perhaps I
 have an incorrect view of things, but I see gallium as an amorphous
 thing that we can change to our heart's content.

 A cap bit for the ability to support dynamic indexing of shaders (plus
 whatever is needed for making it work like declaration of sampler
 arrays) would certainly be needed in any case. For drivers supporting

 Right... so it's not like shaders will start magically containing
 these things, it'll only happen if ARB_gs5 is enabled (probably via
 PIPE_CAP_GLSL >= 400). Which presumably means that the backend
 supports whatever we're throwing at it.

 this I would certainly expect them to allow temp regs as the indirect
 reg. I guess it would be nice if we'd just use temp regs instead of
 address reg in glsl to tgsi conversion if a driver supports it. I think
 for modern drivers this makes a lot more sense than trying to shove
 everything into address regs.

 Agreed. With the exception that I guess we also need to support
 indexing with float values? (i.e. ARL) This would have to be treated
 with some care. Not sure when that comes up though... perhaps only if
 !native_integers, which won't be an issue with any of the hw that
 we're talking about.

 If you really want to lower ARL into a temp, I recommend using F2I,
 which is equivalent in behavior. For UARL, MOV will do.

 Also, I don't think GLSL sampler arrays have to be declared as arrays
 in TGSI. Array declarations are really only needed for TEMPs, because
 they allow better register allocation. Every other shader resource has
 a fixed location and would not benefit from it.
 I think not requiring them to be declared as an array is a bad idea. It
 may well be true that hw drivers can't really benefit from it but in any
 case it would be trivial to handle in the drivers. It gives you the
 ability to easily see what values are legal in the end as a sampler
 index, might help with debugging some day. Besides, it's just bad

 You would see that based on the declarations anyways, no?
 How so? If you've got 15 samplers declared it is still not legal to
 index into the 15th one if your sampler array is starting at 0 with 5
 entries (or maybe it is legal but results undefined). That is at least
 my understanding of the spec. (Of course if I'm wrong here then indeed
 sampler arrays are worthless.)
 
 That is indeed not legal. So right -- you wouldn't see where the
 arrays are. But is that really worth worrying about at the TGSI level?
 Anyways, I'll send my patch once perl gets unbroken on my system, and
 you can rip it apart then :) Doing the array thing would be a giant
 complication for what I perceive to be fairly little gain. The thing
 is that the information of what's an array where is long lost by the
 time the declarations are created -- there's just a bitmask of used
 samplers.
Oh, I wasn't aware of that. I thought you got that information pretty easily.
Yeah in that case I guess it's not worth bothering. In any case we could
tighten that up later if necessary.

Roland




 style imho to index into things which aren't arrays, that is applicable
 to all languages, so I can't see why it should be different for tgsi.
 But I guess it's not all _that_ important.

 Well, it might be important to put this in some context -- sampler
 arrays are perfectly legal in GLSL today. What's not legal in pre-gs5
 glsl (although based on some comments glsl 110 might have allowed it)
 are the dynamic indices.
 Yes, but without dynamically indexing into it the sampler array can
 easily be flattened since you always got the corresponding immediate
 index. That is, it was never addressed as an array in tgsi.
 FWIW this is the same story with d3d10 - resource dcls could be arrays
 but the index had to be an immediate.

 Roland




Re: [Mesa-dev] [PATCH 2/2] radeonsi: always prefer SWITCH_ON_EOP(0) on CIK

2014-08-06 Thread Marek Olšák
On Wed, Aug 6, 2014 at 4:01 PM, Alex Deucher alexdeuc...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 9:32 AM, Marek Olšák mar...@gmail.com wrote:
 From: Marek Olšák marek.ol...@amd.com

 The code is rewritten to take known constraints into account, while always
 using 0 by default.

 This should improve performance for multi-SE parts in theory.

 A debug option is also added for easier debugging. (If there are hangs,
 use the option. If the hangs go away, you have found the problem.)
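
 For readers skimming the thread, the WD_SWITCH_ON_EOP part of the
 constraints can be restated as a small standalone C sketch. It only
 mirrors the conditions visible in the quoted hunk below (the message is
 cut off before the line-stipple and IA handling, so those are left out);
 struct draw_params, enum prim_type and needs_wd_switch_on_eop are
 hypothetical names, not radeonsi API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the V_008958_DI_PT_* primitive types. */
enum prim_type {
   PRIM_TRIANGLES,
   PRIM_POLYGON,
   PRIM_LINELOOP,
   PRIM_TRIFAN,
   PRIM_TRISTRIP_ADJ
};

/* Hypothetical bundle of the draw/chip state the decision depends on. */
struct draw_params {
   enum prim_type prim;
   unsigned max_se;          /* number of shader engines */
   bool primitive_restart;
   bool is_hawaii;
   bool indirect;
   unsigned instance_count;
};

/* Default is 0 (false); it is only raised where a constraint requires it. */
static bool
needs_wd_switch_on_eop(const struct draw_params *d)
{
   assert(d);

   /* No effect below 4 shader engines; the rest are hardware requirements. */
   if (d->max_se < 4 ||
       d->prim == PRIM_POLYGON ||
       d->prim == PRIM_LINELOOP ||
       d->prim == PRIM_TRIFAN ||
       d->prim == PRIM_TRISTRIP_ADJ ||
       d->primitive_restart)
      return true;

   /* Hawaii: instancing (and, conservatively, indirect draws) needs it. */
   if (d->is_hawaii && (d->indirect || d->instance_count > 1))
      return true;

   return false;
}
```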

 Just one comment below.  With that addressed:

 Reviewed-by: Alex Deucher alexander.deuc...@amd.com

 ---
  src/gallium/drivers/radeon/r600_pipe_common.c |  2 +-
  src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
  src/gallium/drivers/radeonsi/si_state_draw.c  | 33 
 ---
  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 17 
  4 files changed, 43 insertions(+), 10 deletions(-)

 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
 index 3476021..eb44d72 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.c
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
 @@ -239,7 +239,6 @@ static const struct debug_named_value common_debug_options[] = {
          { "vm", DBG_VM, "Print virtual addresses when creating resources" },
          { "trace_cs", DBG_TRACE_CS, "Trace cs and write rlockup_<csid>.c file with faulty cs" },

 -
          /* shaders */
          { "fs", DBG_FS, "Print fetch shaders" },
          { "vs", DBG_VS, "Print vertex shaders" },
 @@ -254,6 +253,7 @@ static const struct debug_named_value common_debug_options[] = {
          { "noinvalrange", DBG_NO_DISCARD_RANGE, "Disable handling of INVALIDATE_RANGE map flags" },
          { "no2d", DBG_NO_2D_TILING, "Disable 2D tiling" },
          { "notiling", DBG_NO_TILING, "Disable tiling" },
 +        { "switch_on_eop", DBG_SWITCH_ON_EOP, "Program WD/IA to switch on end-of-packet." },

          DEBUG_NAMED_VALUE_END /* must be last */
  };
 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
 index dcec2bb..ac69d5b 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.h
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
 @@ -93,6 +93,7 @@
  #define DBG_NO_DISCARD_RANGE   (1 << 12)
  #define DBG_NO_2D_TILING       (1 << 13)
  #define DBG_NO_TILING          (1 << 14)
 +#define DBG_SWITCH_ON_EOP      (1 << 15)
  /* The maximum allowed bit is 15. */

  #define R600_MAP_BUFFER_ALIGNMENT 64
 diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
 index 4e808a3..ae839ba 100644
 --- a/src/gallium/drivers/radeonsi/si_state_draw.c
 +++ b/src/gallium/drivers/radeonsi/si_state_draw.c
 @@ -401,25 +401,40 @@ static bool si_update_draw_info_state(struct si_context *sctx,

          if (sctx->b.chip_class >= CIK) {
                  struct si_state_rasterizer *rs = sctx->queued.named.rasterizer;
 -                bool wd_switch_on_eop = prim == V_008958_DI_PT_POLYGON ||
 -                                        prim == V_008958_DI_PT_LINELOOP ||
 -                                        prim == V_008958_DI_PT_TRIFAN ||
 -                                        prim == V_008958_DI_PT_TRISTRIP_ADJ ||
 -                                        info->primitive_restart ||
 -                                        (rs ? rs->line_stipple_enable : false);
 -                /* If the WD switch is false, the IA switch must be false too. */
 -                bool ia_switch_on_eop = wd_switch_on_eop;
                  unsigned primgroup_size = 64;

 +                /* SWITCH_ON_EOP(0) is always preferable. */
 +                bool wd_switch_on_eop = false;
 +                bool ia_switch_on_eop = false;
 +
 +                /* WD_SWITCH_ON_EOP has no effect on GPUs with less than
 +                 * 4 shader engines. Set 1 to pass the assertion below.
 +                 * The other cases are hardware requirements. */
 +                if (sctx->b.screen->info.max_se < 4 ||
 +                    prim == V_008958_DI_PT_POLYGON ||
 +                    prim == V_008958_DI_PT_LINELOOP ||
 +                    prim == V_008958_DI_PT_TRIFAN ||
 +                    prim == V_008958_DI_PT_TRISTRIP_ADJ ||
 +                    info->primitive_restart)
 +                        wd_switch_on_eop = true;
 +
                  /* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP is 0.
                   * We don't know that for indirect drawing, so treat it as
                   * always problematic. */
                  if (sctx->b.family == CHIP_HAWAII &&
 -                    (info->indirect || info->instance_count > 1)) {
 +                    (info->indirect || info->instance_count > 1))
                          wd_switch_on_eop = true;
 +
 +                /* This is a hardware requirement. */
 +                if ((rs && rs->line_stipple_enable) ||
 +                    (sctx->b.screen->debug_flags & DBG_SWITCH_ON_EOP)) {
 

Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets

2014-08-06 Thread Marek Olšák
I guess PIPE_SHADER_CAP_MAX_ADDRS is now useless, because it can be
derived from GLSL_FEATURE_LEVEL, right?

Marek

On Wed, Aug 6, 2014 at 5:25 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 Replace the plain sampler index with a register reference to a sampler.
 We also need to keep track of the sampler array size when there is a
 relative reference so that we can mark the whole array used.

 To facilitate implementation, we add a separate ADDR register that
 exclusively handles the sampler relative address. Other approaches would
 be more invasive.

 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---

 _mesa_get_sampler_array_nonconst_index is a function added by a patch that
 ChrisF is working on... basically it returns NULL unless it's a nonconst
 access.

 I've done a very modest amount of piglit testing, but I definitely need to do
 some more. The nvc0 bits aren't 100% ready -- I noticed that in some odd
 situations the arguments to the tex instruction will get all mangled. But for
 a simple case that mixes non-array and array samplers, it looks something like
 this:

 FRAG
 PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
 DCL IN[0], GENERIC[0], PERSPECTIVE
 DCL OUT[0], COLOR
 DCL SAMP[0]
 DCL SAMP[1]
 DCL SAMP[2]
 DCL SAMP[3]
 DCL CONST[0..1]
 DCL TEMP[0..1], LOCAL
 DCL ADDR[0..2]
 IMM[0] FLT32 {0., 0., 0., 0.}
   0: MOV TEMP[0].xy, IN[0].xyyy
   1: TEX TEMP[0], TEMP[0], SAMP[0], 2D
   2: MOV TEMP[1].xy, IN[0].xyyy
   3: UARL ADDR[2].x, CONST[1].
   4: TEX TEMP[1], TEMP[1], SAMP[ADDR[2].x+1], 2D
   5: MUL TEMP[1], TEMP[1], CONST[0].
   6: MAD TEMP[0], TEMP[0], IMM[0]., TEMP[1]
   7: MOV OUT[0], TEMP[0]
   8: END

  src/gallium/auxiliary/tgsi/tgsi_ureg.c | 2 +-
  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 60 +-
  2 files changed, 44 insertions(+), 18 deletions(-)

 diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c 
 b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
 index dcf0cb5..6d3ac91 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
 +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
 @@ -78,7 +78,7 @@ struct ureg_tokens {
  #define UREG_MAX_OUTPUT PIPE_MAX_SHADER_OUTPUTS
  #define UREG_MAX_CONSTANT_RANGE 32
  #define UREG_MAX_IMMEDIATE 4096
 -#define UREG_MAX_ADDR 2
 +#define UREG_MAX_ADDR 3
  #define UREG_MAX_PRED 1
  #define UREG_MAX_ARRAY_TEMPS 256

 diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
 b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 index c5e2eb5..0d5c3ed 100644
 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 @@ -245,7 +245,8 @@ public:
 ir_instruction *ir;
 GLboolean cond_update;
 bool saturate;
 -   int sampler; /** sampler index */
 +   st_src_reg sampler; /** sampler register */
 +   int sampler_array_size; /** 1-based size of sampler array, 1 if not 
 array */
 int tex_target; /** One of TEXTURE_*_INDEX */
 GLboolean tex_shadow;

 @@ -476,6 +477,7 @@ static st_dst_reg undef_dst = 
 st_dst_reg(PROGRAM_UNDEFINED, SWIZZLE_NOOP, GLSL_T

  static st_dst_reg address_reg = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, 
 GLSL_TYPE_FLOAT, 0);
  static st_dst_reg address_reg2 = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, 
 GLSL_TYPE_FLOAT, 1);
 +static st_dst_reg sampler_reladdr = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, 
 GLSL_TYPE_FLOAT, 2);

  static void
  fail_link(struct gl_shader_program *prog, const char *fmt, ...) 
 PRINTFLIKE(2, 3);
 @@ -2799,6 +2801,8 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
 glsl_to_tgsi_instruction *inst = NULL;
 unsigned opcode = TGSI_OPCODE_NOP;
  const glsl_type *sampler_type = ir->sampler->type;
 +   ir_rvalue *sampler_index =
 +  _mesa_get_sampler_array_nonconst_index(ir->sampler);
 bool is_cube_array = false;
 unsigned i;

 @@ -3016,6 +3020,11 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
coord_dst.writemask = WRITEMASK_XYZW;
 }

 +   if (sampler_index) {
 +  sampler_index->accept(this);
 +  emit_arl(ir, sampler_reladdr, this->result);
 +   }
 +
 if (opcode == TGSI_OPCODE_TXD)
inst = emit(ir, opcode, result_dst, coord, dx, dy);
 else if (opcode == TGSI_OPCODE_TXQ) {
 @@ -3045,9 +3054,18 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
  if (ir->shadow_comparitor)
 inst->tex_shadow = GL_TRUE;

 -   inst->sampler = _mesa_get_sampler_uniform_value(ir->sampler,
 -  this->shader_program,
 -  this->prog);
 +   inst->sampler.index = _mesa_get_sampler_uniform_value(ir->sampler,
 + this->shader_program,
 + this->prog);
 +   if (sampler_index) {
 +  inst->sampler.reladdr = ralloc(mem_ctx, st_src_reg);
 +  memcpy(inst->sampler.reladdr, &sampler_reladdr, sizeof(sampler_reladdr));
 +  inst->sampler_array_size =
 + ir->sampler->as_dereference_array()
 +

Re: [Mesa-dev] [PATCH 2/2] radeonsi: always prefer SWITCH_ON_EOP(0) on CIK

2014-08-06 Thread Alex Deucher
On Wed, Aug 6, 2014 at 11:30 AM, Marek Olšák mar...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 4:01 PM, Alex Deucher alexdeuc...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 9:32 AM, Marek Olšák mar...@gmail.com wrote:
 From: Marek Olšák marek.ol...@amd.com

 The code is rewritten to take known constraints into account, while always
 using 0 by default.

 This should improve performance for multi-SE parts in theory.

 A debug option is also added for easier debugging. (If there are hangs,
 use the option. If the hangs go away, you have found the problem.)

 Just one comment below.  With that addressed:

 Reviewed-by: Alex Deucher alexander.deuc...@amd.com

 ---
  src/gallium/drivers/radeon/r600_pipe_common.c |  2 +-
  src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
  src/gallium/drivers/radeonsi/si_state_draw.c  | 33 
 ---
  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 17 
  4 files changed, 43 insertions(+), 10 deletions(-)

 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
 b/src/gallium/drivers/radeon/r600_pipe_common.c
 index 3476021..eb44d72 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.c
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
 @@ -239,7 +239,6 @@ static const struct debug_named_value 
 common_debug_options[] = {
  { "vm", DBG_VM, "Print virtual addresses when creating resources" },
  { "trace_cs", DBG_TRACE_CS, "Trace cs and write rlockup_csid.c file with faulty cs" },

 -
  /* shaders */
  { "fs", DBG_FS, "Print fetch shaders" },
  { "vs", DBG_VS, "Print vertex shaders" },
 @@ -254,6 +253,7 @@ static const struct debug_named_value common_debug_options[] = {
  { "noinvalrange", DBG_NO_DISCARD_RANGE, "Disable handling of INVALIDATE_RANGE map flags" },
  { "no2d", DBG_NO_2D_TILING, "Disable 2D tiling" },
  { "notiling", DBG_NO_TILING, "Disable tiling" },
 +   { "switch_on_eop", DBG_SWITCH_ON_EOP, "Program WD/IA to switch on end-of-packet." },

 DEBUG_NAMED_VALUE_END /* must be last */
  };
 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
 b/src/gallium/drivers/radeon/r600_pipe_common.h
 index dcec2bb..ac69d5b 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.h
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
 @@ -93,6 +93,7 @@
  #define DBG_NO_DISCARD_RANGE   (1 << 12)
  #define DBG_NO_2D_TILING   (1 << 13)
  #define DBG_NO_TILING  (1 << 14)
 +#define DBG_SWITCH_ON_EOP  (1 << 15)
  /* The maximum allowed bit is 15. */

  #define R600_MAP_BUFFER_ALIGNMENT 64
 diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
 b/src/gallium/drivers/radeonsi/si_state_draw.c
 index 4e808a3..ae839ba 100644
 --- a/src/gallium/drivers/radeonsi/si_state_draw.c
 +++ b/src/gallium/drivers/radeonsi/si_state_draw.c
 @@ -401,25 +401,40 @@ static bool si_update_draw_info_state(struct 
 si_context *sctx,

  if (sctx->b.chip_class >= CIK) {
  struct si_state_rasterizer *rs = sctx->queued.named.rasterizer;
 -   bool wd_switch_on_eop = prim == V_008958_DI_PT_POLYGON ||
 -   prim == V_008958_DI_PT_LINELOOP ||
 -   prim == V_008958_DI_PT_TRIFAN ||
 -   prim == V_008958_DI_PT_TRISTRIP_ADJ ||
 -   info->primitive_restart ||
 -   (rs ? rs->line_stipple_enable : false);
 -   /* If the WD switch is false, the IA switch must be false too. */
 -   bool ia_switch_on_eop = wd_switch_on_eop;
  unsigned primgroup_size = 64;

 +   /* SWITCH_ON_EOP(0) is always preferable. */
 +   bool wd_switch_on_eop = false;
 +   bool ia_switch_on_eop = false;
 +
 +   /* WD_SWITCH_ON_EOP has no effect on GPUs with less than
 +* 4 shader engines. Set 1 to pass the assertion below.
 +* The other cases are hardware requirements. */
 +   if (sctx->b.screen->info.max_se < 4 ||
 +   prim == V_008958_DI_PT_POLYGON ||
 +   prim == V_008958_DI_PT_LINELOOP ||
 +   prim == V_008958_DI_PT_TRIFAN ||
 +   prim == V_008958_DI_PT_TRISTRIP_ADJ ||
 +   info->primitive_restart)
 +   wd_switch_on_eop = true;
 +
  /* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP is 0.
   * We don't know that for indirect drawing, so treat it as
   * always problematic. */
  if (sctx->b.family == CHIP_HAWAII &&
 -   (info->indirect || info->instance_count > 1)) {
 +   (info->indirect || info->instance_count > 1))
  wd_switch_on_eop = true;
 +
 +   /* This is a hardware requirement. */
 +   if ((rs && rs->line_stipple_enable) ||
 +   
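
For readers skimming the truncated hunk above, the visible conditions that force WD_SWITCH_ON_EOP to 1 can be modelled roughly as below. This is only a sketch in Python, not the driver code: the primitive constants and the function name are hypothetical stand-ins, the comparisons `max_se < 4` and `instance_count > 1` are assumed readings of operators the archive dropped, and the final line-stipple/debug-flag condition is left out because its body is cut off in the quote.

```python
# Hypothetical stand-ins for the V_008958_DI_PT_* constants in the patch.
POLYGON, LINELOOP, TRIFAN, TRISTRIP_ADJ, TRILIST = range(5)
EOP_PRIMS = {POLYGON, LINELOOP, TRIFAN, TRISTRIP_ADJ}


def wd_switch_on_eop(max_se, prim, primitive_restart,
                     is_hawaii=False, indirect=False, instance_count=1):
    """Return True when the visible conditions force WD_SWITCH_ON_EOP to 1."""
    # With fewer than 4 shader engines the bit has no effect, so the patch
    # sets it to 1; the listed primitive types and primitive restart are
    # described as hardware requirements.
    if max_se < 4 or prim in EOP_PRIMS or primitive_restart:
        return True
    # Hawaii workaround: instanced draws hang with the bit cleared, and
    # indirect draws are treated as always problematic.
    if is_hawaii and (indirect or instance_count > 1):
        return True
    # Default preferred by the patch: SWITCH_ON_EOP(0).
    return False
```

With these assumptions, a plain triangle list on a 4-SE part keeps the bit at 0, while the same draw on a 2-SE part, or an instanced draw on Hawaii, sets it to 1.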

Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets

2014-08-06 Thread Ilia Mirkin
  pc->MaxAddressRegs = pc->MaxNativeAddressRegs =
 _min(screen->get_shader_param(screen, sh, PIPE_SHADER_CAP_MAX_ADDRS),
  MAX_PROGRAM_ADDRESS_REGS);

Not really sure what that's referring to... ARB_vp/fp or something?

Anyways, this is definitely a bit of a violation of that. OTOH, so is
the indirect UBO indexing and indirect GS input access (assuming
that's allowed), since those would use ADDR[1] and every driver
(except nv30) returns 1, and sometimes 0 -- including
nv50/nvc0/r600/radeonsi.

So... dunno what the proper way to proceed is. Fix drivers to claim
higher numbers? Continue the tradition of ignoring it and relying on
the fact that GPU's that don't support it also won't support the
features that cause it to get used?

On Wed, Aug 6, 2014 at 11:45 AM, Marek Olšák mar...@gmail.com wrote:
 I guess PIPE_SHADER_CAP_MAX_ADDRS is now useless, because it can be
 derived from GLSL_FEATURE_LEVEL, right?

 Marek


Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets

2014-08-06 Thread Marek Olšák
On Wed, Aug 6, 2014 at 5:53 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
>   pc->MaxAddressRegs = pc->MaxNativeAddressRegs =
>  _min(screen->get_shader_param(screen, sh, PIPE_SHADER_CAP_MAX_ADDRS),
>   MAX_PROGRAM_ADDRESS_REGS);
>
> Not really sure what that's referring to... ARB_vp/fp or something?

Yes, ARB_vp needs 1, ARB_fp doesn't support indirect addressing (expects 0).

>
> Anyways, this is definitely a bit of a violation of that. OTOH, so is
> the indirect UBO indexing and indirect GS input access (assuming
> that's allowed), since those would use ADDR[1] and every driver
> (except nv30) returns 1, and sometimes 0 -- including
> nv50/nvc0/r600/radeonsi.
>
> So... dunno what the proper way to proceed is. Fix drivers to claim
> higher numbers? Continue the tradition of ignoring it and relying on
> the fact that GPU's that don't support it also won't support the
> features that cause it to get used?

You don't have to worry about that for now. We can clean it up later.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4] winsys/radeon: fix nop packet padding for hawaii

2014-08-06 Thread Alex Deucher
On Mon, Aug 4, 2014 at 6:48 AM, Andreas Boll andreas.boll@gmail.com wrote:
 The initial firmware for hawaii does not support type3 nop packet.
 Detect the new hawaii firmware with query RADEON_INFO_ACCEL_WORKING2.
 If the returned value is 3, then the new firmware is used.

 This patch uses type2 for the old firmware and type3 for the new firmware.

 It fixes the cases when the old firmware is used and the user wants to
 manually enable acceleration.
 The two possible scenarios are:
  - the kernel has no support for the new firmware.
  - the kernel has support for the new firmware but only the old firmware
is available.

 Additionally, this patch disables GPU acceleration on hawaii if the kernel
 returns a value < 2. In this case the kernel lacks the required fixes
 for proper acceleration.

 v2:
  - Fix indentation
  - Use private struct radeon_drm_winsys instead of public struct radeon_info
  - Rename r600_accel_working2 to accel_working2

 v3:
  - Use type2 nop packet for returned value < 3

 v4:
  - Fail to initialize winsys for returned value < 2

 Cc: mesa-sta...@lists.freedesktop.org
 Cc: Alex Deucher alexander.deuc...@amd.com
 Cc: Jérôme Glisse jgli...@redhat.com
 Cc: Marek Olšák marek.ol...@amd.com
 Cc: Michel Dänzer michel.daen...@amd.com
 Signed-off-by: Andreas Boll andreas.boll@gmail.com

Reviewed-by: Alex Deucher alexander.deuc...@amd.com

 ---

 Unfortunately I can't test this patch myself since I don't own a hawaii card.
 So I'd need someone to test this patch on kernel >= 3.16-rc7 + these patches
 [1-2].

 This patch would bring us one step further for hawaii acceleration on kernel
 3.16.

 Finally we can enable hawaii acceleration if the query returns >= 2 [3].

 Andreas.

 [1] http://lists.freedesktop.org/archives/dri-devel/2014-August/065305.html
 [2] http://lists.freedesktop.org/archives/dri-devel/2014-August/065306.html
 [3] http://lists.x.org/archives/xorg-driver-ati/2014-August/026534.html

  src/gallium/winsys/radeon/drm/radeon_drm_cs.c |  6 +-
  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 10 ++
  src/gallium/winsys/radeon/drm/radeon_drm_winsys.h |  1 +
  3 files changed, 16 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c 
 b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 index a06ecb2..dd109af 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 @@ -446,8 +446,12 @@ static void radeon_drm_cs_flush(struct radeon_winsys_cs 
 *rcs,
  case RING_GFX:
  /* pad DMA ring to 8 DWs to meet CP fetch alignment requirements
   * r6xx, requires at least 4 dw alignment to avoid a hw bug.
 + * hawaii with old firmware needs type2 nop packet.
 + * accel_working2 with value 2 indicates the new firmware.
   */
 -if (cs->ws->info.chip_class <= SI) {
 +if (cs->ws->info.chip_class <= SI ||
 +(cs->ws->info.family == CHIP_HAWAII &&
 + cs->ws->accel_working2 < 3)) {
  while (rcs->cdw & 7)
  OUT_CS(&cs->base, 0x80000000); /* type2 nop packet */
  } else {
 diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c 
 b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
 index 910d06b..ecff0e7 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
 +++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
 @@ -395,6 +395,16 @@ static boolean do_winsys_init(struct radeon_drm_winsys 
 *ws)
  radeon_get_drm_value(ws->fd, RADEON_INFO_MAX_SH_PER_SE, NULL,
   &ws->info.max_sh_per_se);

 +radeon_get_drm_value(ws->fd, RADEON_INFO_ACCEL_WORKING2, NULL,
 + &ws->accel_working2);
 +if (ws->info.family == CHIP_HAWAII && ws->accel_working2 < 2) {
 +fprintf(stderr, "radeon: GPU acceleration for Hawaii disabled, "
 +"returned accel_working2 value %u is smaller than 2. "
 +"Please install a newer kernel.\n",
 +ws->accel_working2);
 +return FALSE;
 +}
 +
  if (radeon_get_drm_value(ws->fd, RADEON_INFO_SI_TILE_MODE_ARRAY, NULL,
   ws->info.si_tile_mode_array)) {
  ws->info.si_tile_mode_array_valid = TRUE;
 diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h 
 b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h
 index ea6f7f0..aebc391 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h
 +++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h
 @@ -55,6 +55,7 @@ struct radeon_drm_winsys {
  enum radeon_generation gen;
  struct radeon_info info;
  uint32_t va_start;
 +uint32_t accel_working2;

  struct pb_manager *kman;
  struct pb_manager *cman_vram;
 --
 2.0.1
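
The firmware handling described in the quoted patch can be summarised with a small model. The sketch below is Python rather than the winsys C code, covers only the Hawaii path, and uses hypothetical function names; it assumes the thresholds lost in the archive are "value < 2 disables acceleration" and "value < 3 uses the type2 nop", and that the ring is padded to a multiple of 8 dwords as the `cdw & 7` loop suggests.

```python
def hawaii_nop_policy(accel_working2):
    """Map the RADEON_INFO_ACCEL_WORKING2 value to the behaviour in the patch."""
    if accel_working2 < 2:
        # Kernel lacks the required fixes: winsys initialisation fails.
        return "disabled"
    if accel_working2 < 3:
        # Old firmware: type3 nop is not supported, pad with type2.
        return "type2"
    # New firmware: type3 nop packets can be used for padding.
    return "type3"


def padding_dwords(cdw, align=8):
    """Number of nop dwords needed so that cdw becomes a multiple of align."""
    return (-cdw) % align
```

For example, a command stream holding 13 dwords needs 3 dwords of padding, and one already at 16 needs none.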


[Mesa-dev] [PATCH 01/12] mesa: Add the GL_ARB_texture_compression_bptc extension

2014-08-06 Thread Neil Roberts
This adds a boolean in the gl_extensions struct for
GL_ARB_texture_compression_bptc as well as an entry in extension_table.
---
 src/mesa/main/extensions.c | 1 +
 src/mesa/main/mtypes.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 9ac8377..f3197f9 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -156,6 +156,7 @@ static const struct extension extension_table[] = {
    { "GL_ARB_texture_buffer_object_rgb32", o(ARB_texture_buffer_object_rgb32), GLC, 2009 },
    { "GL_ARB_texture_buffer_range", o(ARB_texture_buffer_range), GLC, 2012 },
    { "GL_ARB_texture_compression", o(dummy_true), GLL, 2000 },
+   { "GL_ARB_texture_compression_bptc", o(ARB_texture_compression_bptc), GL, 2010 },
    { "GL_ARB_texture_compression_rgtc", o(ARB_texture_compression_rgtc), GL, 2004 },
    { "GL_ARB_texture_cube_map", o(ARB_texture_cube_map), GLL, 1999 },
    { "GL_ARB_texture_cube_map_array", o(ARB_texture_cube_map_array), GL, 2009 },
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index f5ce360..312a336 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3574,6 +3574,7 @@ struct gl_extensions
GLboolean ARB_texture_buffer_object;
GLboolean ARB_texture_buffer_object_rgb32;
GLboolean ARB_texture_buffer_range;
+   GLboolean ARB_texture_compression_bptc;
GLboolean ARB_texture_compression_rgtc;
GLboolean ARB_texture_cube_map;
GLboolean ARB_texture_cube_map_array;
-- 
1.9.3



[Mesa-dev] [PATCH 04/12] mesa/format_info: Add support for the BPTC layout

2014-08-06 Thread Neil Roberts
Adds the ‘bptc’ layout to get_channel_bits. The channel bits for BPTC depend
on the mode but as it only has to be an approximation we can set it to 4 like
for S3TC.
---
 src/mesa/main/format_info.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py
index a0eecd3..fc40dc4 100644
--- a/src/mesa/main/format_info.py
+++ b/src/mesa/main/format_info.py
@@ -110,7 +110,7 @@ def get_channel_bits(fmat, chan_name):
if fmat.is_compressed():
   # These values are pretty-much bogus, but OpenGL requires that we
   # return an approximate number of bits.
-  if fmat.layout == 's3tc':
+  if fmat.layout in ('s3tc', 'bptc'):
  return 4 if fmat.has_channel(chan_name) else 0
   elif fmat.layout == 'fxt1':
  if chan_name in 'rgb':
-- 
1.9.3
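
As a quick illustration of what the one-line change does, here is a minimal standalone sketch. `ExampleFormat` and `approx_compressed_channel_bits` are hypothetical names made up for this note; only the `layout` attribute, `has_channel()` and the 4-bit approximation are taken from the patch.

```python
class ExampleFormat:
    """Hypothetical stand-in for the format object used by format_info.py."""

    def __init__(self, layout, channels):
        self.layout = layout
        self.channels = set(channels)

    def has_channel(self, name):
        return name in self.channels


def approx_compressed_channel_bits(fmat, chan_name):
    # S3TC and BPTC report the same nominal 4 bits for each channel that
    # is present and 0 for channels that are absent.
    if fmat.layout in ('s3tc', 'bptc'):
        return 4 if fmat.has_channel(chan_name) else 0
    raise NotImplementedError(fmat.layout)


bptc_rgb = ExampleFormat('bptc', 'rgb')
print(approx_compressed_channel_bits(bptc_rgb, 'r'))
print(approx_compressed_channel_bits(bptc_rgb, 'a'))
```

The value is deliberately approximate: the real per-channel precision of a BPTC block depends on the mode chosen for that block.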



[Mesa-dev] [PATCH 03/12] mesa/format_info: Add support for compressed floating-point formats

2014-08-06 Thread Neil Roberts
If the name of a compressed texture format has ‘FLOAT’ in it, it will now set
the data type of the format to GL_FLOAT. This will be needed for the BPTC
half-float formats.
---
 src/mesa/main/format_info.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py
index 448bd00..a0eecd3 100644
--- a/src/mesa/main/format_info.py
+++ b/src/mesa/main/format_info.py
@@ -62,7 +62,9 @@ def get_gl_base_format(fmat):
 
 def get_gl_data_type(fmat):
if fmat.is_compressed():
-  if 'SIGNED' in fmat.name or 'SNORM' in fmat.name:
+  if 'FLOAT' in fmat.name:
+ return 'GL_FLOAT'
+  elif 'SIGNED' in fmat.name or 'SNORM' in fmat.name:
  return 'GL_SIGNED_NORMALIZED'
   else:
  return 'GL_UNSIGNED_NORMALIZED'
-- 
1.9.3
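
The order of the checks matters here, because the string 'UNSIGNED' itself contains 'SIGNED'. A minimal sketch of the same name-based classification, written as a free function on the format name (the function name and the SNORM example name are hypothetical; the BPTC names are the ones used later in this series):

```python
def compressed_gl_data_type(name):
    # 'FLOAT' has to be tested first: both BPTC float format names would
    # otherwise match the 'SIGNED' substring check below.
    if 'FLOAT' in name:
        return 'GL_FLOAT'
    elif 'SIGNED' in name or 'SNORM' in name:
        return 'GL_SIGNED_NORMALIZED'
    else:
        return 'GL_UNSIGNED_NORMALIZED'


for name in ('MESA_FORMAT_BPTC_RGBA_UNORM',
             'MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT',
             'MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT'):
    print(name, compressed_gl_data_type(name))
```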



[Mesa-dev] [PATCH 10/12] swrast: Enable GL_ARB_texture_compression_bptc

2014-08-06 Thread Neil Roberts
Enables BPTC texture compression on the software rasterizer.
---
 src/mesa/main/extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index f3197f9..7732249 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -449,6 +449,7 @@ _mesa_enable_sw_extensions(struct gl_context *ctx)
    ctx->Extensions.ARB_point_sprite = GL_TRUE;
    ctx->Extensions.ARB_shadow = GL_TRUE;
    ctx->Extensions.ARB_texture_border_clamp = GL_TRUE;
+   ctx->Extensions.ARB_texture_compression_bptc = GL_TRUE;
    ctx->Extensions.ARB_texture_cube_map = GL_TRUE;
    ctx->Extensions.ARB_texture_env_combine = GL_TRUE;
    ctx->Extensions.ARB_texture_env_crossbar = GL_TRUE;
-- 
1.9.3



[Mesa-dev] [PATCH v2 0/12] Add support for BPTC texture compression

2014-08-06 Thread Neil Roberts
Here is a v2 of the BPTC texture compression series. The main
difference is that instead of going via DXT3 for the UNORM formats it
now always uses the custom naïve compressor for all formats. This
doesn't give very good-looking results but it is fast and doesn't add
any dependencies. There was some discussion about alternative
approaches on the list here:

http://lists.freedesktop.org/archives/mesa-dev/2014-July/064436.html

I didn't manage to get any consensus on whether this approach is the
right thing to do so I thought I would just post the patches and see
what happens.

The other changes are:

• The patches are rebased on top of Jason Ekstrand's texstore changes.
  This required some modification to format_info.py.

• Added a patch to make glGenerateMipmap work with the BPTC formats.

• Added a patch to make the meta implementation of glGetTexImage work
  with the two floating-point formats.

• Added the formats to some format query functions that were missed.
  (There are a lot of switches for formats spread around Mesa!)

• Fixed setting the alpha component to 1.0 when fetching from the RGB
  half-float formats.

• Fixed fetching the alpha component from sRGB formats.

• Fixed the quantization step for the half-float compressor.

• Fixed a typo causing a bug in the compressor for textures with a
  width that isn't a multiple of four.

The patches are also available on Github here:

https://github.com/bpeel/mesa/commits/wip/bptc

There are piglit tests for BPTC in a branch here:

https://github.com/bpeel/piglit/commits/wip/bptc

- Neil



[Mesa-dev] [PATCH v2 06/12] mesa: Add texel fetch functions for BPTC-compressed textures

2014-08-06 Thread Neil Roberts
Adds functions to fetch from any of the four BPTC-compressed formats.

v2: Set the alpha component to 1.0 when fetching from the half-float formats
instead of leaving it uninitialised. Don't linearize the alpha component
when fetching from sRGB.
---
 src/mesa/Makefile.sources|   1 +
 src/mesa/main/texcompress.c  |   6 +
 src/mesa/main/texcompress_bptc.c | 960 +++
 src/mesa/main/texcompress_bptc.h |  34 ++
 4 files changed, 1001 insertions(+)
 create mode 100644 src/mesa/main/texcompress_bptc.c
 create mode 100644 src/mesa/main/texcompress_bptc.h

diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources
index 45c53ca..d495bd1 100644
--- a/src/mesa/Makefile.sources
+++ b/src/mesa/Makefile.sources
@@ -96,6 +96,7 @@ MAIN_FILES = \
$(SRCDIR)main/stencil.c \
$(SRCDIR)main/syncobj.c \
$(SRCDIR)main/texcompress.c \
+   $(SRCDIR)main/texcompress_bptc.c \
$(SRCDIR)main/texcompress_cpal.c \
$(SRCDIR)main/texcompress_rgtc.c \
$(SRCDIR)main/texcompress_s3tc.c \
diff --git a/src/mesa/main/texcompress.c b/src/mesa/main/texcompress.c
index 53c0ea0..b4efeee 100644
--- a/src/mesa/main/texcompress.c
+++ b/src/mesa/main/texcompress.c
@@ -42,6 +42,7 @@
 #include "texcompress_rgtc.h"
 #include "texcompress_s3tc.h"
 #include "texcompress_etc.h"
+#include "texcompress_bptc.h"
 
 
 /**
@@ -610,6 +611,11 @@ _mesa_get_compressed_fetch_func(mesa_format format)
   return _mesa_get_compressed_rgtc_func(format);
case MESA_FORMAT_ETC1_RGB8:
   return _mesa_get_etc_fetch_func(format);
+   case MESA_FORMAT_BPTC_RGBA_UNORM:
+   case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM:
+   case MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT:
+   case MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT:
+  return _mesa_get_bptc_fetch_func(format);
default:
   return NULL;
}
diff --git a/src/mesa/main/texcompress_bptc.c b/src/mesa/main/texcompress_bptc.c
new file mode 100644
index 0000000..7ec294b
--- /dev/null
+++ b/src/mesa/main/texcompress_bptc.c
@@ -0,0 +1,960 @@
+/*
+ * Copyright (C) 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file texcompress_bptc.c
+ * GL_ARB_texture_compression_bptc support.
+ */
+
+#include <stdbool.h>
+#include "texcompress.h"
+#include "texcompress_bptc.h"
+#include "util/format_srgb.h"
+#include "texstore.h"
+#include "macros.h"
+#include "image.h"
+
+#define BLOCK_SIZE 4
+#define N_PARTITIONS 64
+#define BLOCK_BYTES 16
+
+struct bptc_unorm_mode {
+   int n_subsets;
+   int n_partition_bits;
+   bool has_rotation_bits;
+   bool has_index_selection_bit;
+   int n_color_bits;
+   int n_alpha_bits;
+   bool has_endpoint_pbits;
+   bool has_shared_pbits;
+   int n_index_bits;
+   int n_secondary_index_bits;
+};
+
+struct bptc_float_bitfield {
+   int8_t endpoint;
+   uint8_t component;
+   uint8_t offset;
+   uint8_t n_bits;
+   bool reverse;
+};
+
+struct bptc_float_mode {
+   bool reserved;
+   bool transformed_endpoints;
+   int n_partition_bits;
+   int n_endpoint_bits;
+   int n_index_bits;
+   int n_delta_bits[3];
+   struct bptc_float_bitfield bitfields[24];
+};
+
+static const struct bptc_unorm_mode
+bptc_unorm_modes[] = {
+   /* 0 */ { 3, 4, false, false, 4, 0, true,  false, 3, 0 },
+   /* 1 */ { 2, 6, false, false, 6, 0, false, true,  3, 0 },
+   /* 2 */ { 3, 6, false, false, 5, 0, false, false, 2, 0 },
+   /* 3 */ { 2, 6, false, false, 7, 0, true,  false, 2, 0 },
+   /* 4 */ { 1, 0, true,  true,  5, 6, false, false, 2, 3 },
+   /* 5 */ { 1, 0, true,  false, 7, 8, false, false, 2, 2 },
+   /* 6 */ { 1, 0, false, false, 7, 7, true,  false, 4, 0 },
+   /* 7 */ { 2, 6, false, false, 5, 5, true,  false, 2, 0 }
+};
+
+static const struct bptc_float_mode
+bptc_float_modes[] = {
+   /* 00 */
+   { false, true, 5, 10, 3, { 5, 5, 5 },
+ { { 2, 1, 4, 1, false }, { 2, 2, 4, 1, false }, { 3, 2, 4, 1, false },
+   { 0, 0, 0, 10, false }, 
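
One way to sanity-check the bptc_unorm_modes table quoted above is to add up the bits each mode consumes and confirm that it fills the 16-byte block (BLOCK_BYTES * 8 = 128 bits). The Python sketch below does that. The accounting rules it applies (a unary mode prefix of mode+1 bits, 2 rotation bits, 1 index-selection bit, and one implicit bit saved for each anchor index) come from the general BPTC block layout rather than from the part of the patch visible here, so treat them as assumptions of this note.

```python
from collections import namedtuple

# Field order follows struct bptc_unorm_mode in the patch.
UnormMode = namedtuple("UnormMode", [
    "n_subsets", "n_partition_bits", "has_rotation_bits",
    "has_index_selection_bit", "n_color_bits", "n_alpha_bits",
    "has_endpoint_pbits", "has_shared_pbits", "n_index_bits",
    "n_secondary_index_bits"])

MODES = [
    UnormMode(3, 4, False, False, 4, 0, True,  False, 3, 0),
    UnormMode(2, 6, False, False, 6, 0, False, True,  3, 0),
    UnormMode(3, 6, False, False, 5, 0, False, False, 2, 0),
    UnormMode(2, 6, False, False, 7, 0, True,  False, 2, 0),
    UnormMode(1, 0, True,  True,  5, 6, False, False, 2, 3),
    UnormMode(1, 0, True,  False, 7, 8, False, False, 2, 2),
    UnormMode(1, 0, False, False, 7, 7, True,  False, 4, 0),
    UnormMode(2, 6, False, False, 5, 5, True,  False, 2, 0),
]


def block_bits(mode_num, m):
    """Total number of bits used by a block encoded with the given mode."""
    bits = mode_num + 1                       # unary mode prefix
    bits += m.n_partition_bits
    bits += 2 if m.has_rotation_bits else 0
    bits += 1 if m.has_index_selection_bit else 0
    endpoints = 2 * m.n_subsets               # two endpoints per subset
    bits += endpoints * (3 * m.n_color_bits + m.n_alpha_bits)
    if m.has_endpoint_pbits:
        bits += endpoints                     # one p-bit per endpoint
    if m.has_shared_pbits:
        bits += m.n_subsets                   # one p-bit per subset
    # 16 texels per block; each subset's anchor index drops one bit.
    bits += 16 * m.n_index_bits - m.n_subsets
    if m.n_secondary_index_bits:
        # The two modes with secondary indices have a single subset.
        bits += 16 * m.n_secondary_index_bits - 1
    return bits


for num, mode in enumerate(MODES):
    print(num, block_bits(num, mode))
```

Under these assumptions every mode comes out at exactly 128 bits, which is a cheap consistency check when transcribing such tables.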

[Mesa-dev] [PATCH 08/12] mesa/main: Modify generate_mipmap_compressed to cope with float textures

2014-08-06 Thread Neil Roberts
Once we add BPTC texture support we will need to generate mipmaps for
compressed floating point textures too. Most of the code seems to already be
there but it just needs a few extra lines to get it to use GL_FLOAT instead of
GL_UNSIGNED_BYTE as the type for the temporary buffers.
---
 src/mesa/main/mipmap.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/mipmap.c b/src/mesa/main/mipmap.c
index cc109cc..fdaa682 100644
--- a/src/mesa/main/mipmap.c
+++ b/src/mesa/main/mipmap.c
@@ -2038,12 +2038,15 @@ generate_mipmap_compressed(struct gl_context *ctx, 
GLenum target,
 
components = _mesa_format_num_components(temp_format);
 
-   /* Revisit this if we get compressed formats with >8 bits per component */
-   if (_mesa_get_format_datatype(srcImage->TexFormat)
-   == GL_SIGNED_NORMALIZED) {
+   switch (_mesa_get_format_datatype(srcImage->TexFormat)) {
+   case GL_FLOAT:
+  temp_datatype = GL_FLOAT;
+  break;
+   case GL_SIGNED_NORMALIZED:
+  /* Revisit this if we get compressed formats with >8 bits per component */
       temp_datatype = GL_BYTE;
-   }
-   else {
+  break;
+   default:
       temp_datatype = GL_UNSIGNED_BYTE;
    }
 
-- 
1.9.3
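
The effect of the new switch is a three-way mapping from the format's data type to the type of the temporary buffer. A minimal Python sketch of that mapping, using strings in place of the GL enums (the function name is hypothetical):

```python
def temp_buffer_datatype(format_datatype):
    """Pick the temporary buffer type used while generating mipmaps."""
    mapping = {
        'GL_FLOAT': 'GL_FLOAT',              # new: BPTC half-float formats
        'GL_SIGNED_NORMALIZED': 'GL_BYTE',
    }
    # Everything else keeps the previous default.
    return mapping.get(format_datatype, 'GL_UNSIGNED_BYTE')
```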

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

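The datatype selection in the patch above can be read in isolation. The following is a minimal sketch, not Mesa code: the SKETCH_* enums and both function names are hypothetical stand-ins for the GL enums and for the logic inside generate_mipmap_compressed. It shows why the choice matters: a float temporary needs sizeof(float) bytes per component rather than one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the GL datatype enums used in the patch. */
enum sketch_format_datatype {
   SKETCH_UNSIGNED_NORMALIZED,
   SKETCH_SIGNED_NORMALIZED,
   SKETCH_FLOAT
};

/* Hypothetical stand-ins for GL_UNSIGNED_BYTE, GL_BYTE and GL_FLOAT. */
enum sketch_temp_type {
   SKETCH_TEMP_UNSIGNED_BYTE,
   SKETCH_TEMP_BYTE,
   SKETCH_TEMP_FLOAT
};

/* Mirror of the switch in the patch: float formats get a float temporary,
 * signed formats a signed byte one, everything else unsigned bytes. */
enum sketch_temp_type
sketch_pick_temp_type(enum sketch_format_datatype datatype)
{
   switch (datatype) {
   case SKETCH_FLOAT:
      return SKETCH_TEMP_FLOAT;
   case SKETCH_SIGNED_NORMALIZED:
      return SKETCH_TEMP_BYTE;
   default:
      return SKETCH_TEMP_UNSIGNED_BYTE;
   }
}

/* Size in bytes of one row of the temporary (uncompressed) image. */
size_t
sketch_temp_row_bytes(int width, int components, enum sketch_temp_type type)
{
   size_t bytes_per_component =
      (type == SKETCH_TEMP_FLOAT) ? sizeof(float) : 1;

   assert(width > 0 && components > 0);
   return (size_t) width * (size_t) components * bytes_per_component;
}
```

For example, a 16-texel RGB row takes 48 bytes as unsigned bytes and 16 * 3 * sizeof(float) bytes as floats, so the temporary buffers have to be sized from the chosen type.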

[Mesa-dev] [PATCH v2 05/12] mesa: Add the format enums for BPTC-compressed images

2014-08-06 Thread Neil Roberts
This adds the following four Mesa image format enums which correspond to the
four BPTC compressed texture formats:

 MESA_FORMAT_BPTC_RGBA_UNORM
 MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM
 MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT
 MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT

It also updates the format information functions to handle these and the
corresponding GL enums.

v2: Also modify _mesa_get_format_color_encoding, _mesa_get_srgb_format_linear
and _mesa_get_uncompressed_format
---
 src/mesa/main/formats.c  | 20 
 src/mesa/main/formats.csv|  6 ++
 src/mesa/main/formats.h  |  6 ++
 src/mesa/main/glformats.c| 10 ++
 src/mesa/main/texcompress.c  | 24 
 src/mesa/main/texformat.c|  8 
 src/mesa/main/teximage.c | 14 ++
 src/mesa/swrast/s_texfetch.c | 24 
 8 files changed, 112 insertions(+)

diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c
index f03425e..a5e06ce 100644
--- a/src/mesa/main/formats.c
+++ b/src/mesa/main/formats.c
@@ -369,6 +369,7 @@ _mesa_get_format_color_encoding(mesa_format format)
case MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC:
case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1:
case MESA_FORMAT_B8G8R8X8_SRGB:
+   case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM:
   return GL_SRGB;
default:
   return GL_LINEAR;
@@ -426,6 +427,9 @@ _mesa_get_srgb_format_linear(mesa_format format)
case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1:
   format = MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1;
   break;
+   case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM:
+  format = MESA_FORMAT_BPTC_RGBA_UNORM;
+  break;
case MESA_FORMAT_B8G8R8X8_SRGB:
   format = MESA_FORMAT_B8G8R8X8_UNORM;
   break;
@@ -491,6 +495,12 @@ _mesa_get_uncompressed_format(mesa_format format)
case MESA_FORMAT_ETC2_RG11_EAC:
case MESA_FORMAT_ETC2_SIGNED_RG11_EAC:
   return MESA_FORMAT_R16G16_UNORM;
+   case MESA_FORMAT_BPTC_RGBA_UNORM:
+   case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM:
+  return MESA_FORMAT_A8B8G8R8_UNORM;
+   case MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT:
+   case MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT:
+  return MESA_FORMAT_RGB_FLOAT32;
default:
 #ifdef DEBUG
   assert(!_mesa_is_format_compressed(format));
@@ -968,6 +978,10 @@ _mesa_format_to_type_and_comps(mesa_format format,
case MESA_FORMAT_ETC2_SIGNED_RG11_EAC:
case MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1:
case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1:
+   case MESA_FORMAT_BPTC_RGBA_UNORM:
+   case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM:
+   case MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT:
+   case MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT:
   /* XXX generate error instead? */
   *datatype = GL_UNSIGNED_BYTE;
   *comps = 0;
@@ -1524,6 +1538,12 @@ _mesa_format_matches_format_and_type(mesa_format mesa_format,
case MESA_FORMAT_RGBA_DXT5:
   return GL_FALSE;
 
+   case MESA_FORMAT_BPTC_RGBA_UNORM:
+   case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM:
+   case MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT:
+   case MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT:
+  return GL_FALSE;
+
case MESA_FORMAT_RGBA_FLOAT32:
       return format == GL_RGBA && type == GL_FLOAT && !swapBytes;
case MESA_FORMAT_RGBA_FLOAT16:
diff --git a/src/mesa/main/formats.csv b/src/mesa/main/formats.csv
index 5abb706..fdd4341 100644
--- a/src/mesa/main/formats.csv
+++ b/src/mesa/main/formats.csv
@@ -280,3 +280,9 @@ MESA_FORMAT_ETC2_SIGNED_R11_EAC   , etc2  , 4, 4, x64 , , ,
 MESA_FORMAT_ETC2_SIGNED_RG11_EAC  , etc2  , 4, 4, x128, , , , xyzw, rgb
 MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1 , etc2  , 4, 4, x64 , , , , xyzw, rgb
 MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, etc2  , 4, 4, x128, , , , xyzw, srgb
+
+# BPTC compressed formats
+MESA_FORMAT_BPTC_RGBA_UNORM   , bptc  , 4, 4, x128, , , , xyzw, rgb
+MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM , bptc  , 4, 4, x128, , , , xyzw, srgb
+MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT , bptc  , 4, 4, x128, , , , xyz1, rgb
+MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT   , bptc  , 4, 4, x128, , , , xyz1, rgb
diff --git a/src/mesa/main/formats.h b/src/mesa/main/formats.h
index 457c8ab..83a7367 100644
--- a/src/mesa/main/formats.h
+++ b/src/mesa/main/formats.h
@@ -427,6 +427,12 @@ typedef enum
MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1,
MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1,
 
+   /* BPTC compressed formats */
+   MESA_FORMAT_BPTC_RGBA_UNORM,
+   MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM,
+   MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT,
+   MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT,
+
MESA_FORMAT_COUNT
 } mesa_format;
 
diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
index 0fb25ba..00478f9 100644
--- a/src/mesa/main/glformats.c
+++ b/src/mesa/main/glformats.c
@@ -787,6 +787,10 @@ _mesa_is_color_format(GLenum format)
   case GL_COMPRESSED_SIGNED_RG11_EAC:
   case 

[Mesa-dev] [PATCH 02/12] mesa: Fix the base format for GL_COMPRESSED_RGB_BPTC_*_FLOAT_ARB

2014-08-06 Thread Neil Roberts
The signed and unsigned half-float BPTC-compressed formats were being reported
as having a base format of GL_RGBA but they don't store an alpha channel so it
should be GL_RGB.
---
 src/mesa/main/texcompress.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/texcompress.c b/src/mesa/main/texcompress.c
index 9dbfe9f..fb3ea02 100644
--- a/src/mesa/main/texcompress.c
+++ b/src/mesa/main/texcompress.c
@@ -92,6 +92,8 @@ _mesa_gl_compressed_format_base_format(GLenum format)
 
case GL_COMPRESSED_RGB:
case GL_COMPRESSED_SRGB:
+   case GL_COMPRESSED_RGB_BPTC_SIGNED_FLOAT_ARB:
+   case GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT_ARB:
case GL_COMPRESSED_RGB_S3TC_DXT1_EXT:
case GL_COMPRESSED_RGB_FXT1_3DFX:
case GL_COMPRESSED_SRGB_S3TC_DXT1_EXT:
@@ -104,8 +106,6 @@ _mesa_gl_compressed_format_base_format(GLenum format)
case GL_COMPRESSED_SRGB_ALPHA:
case GL_COMPRESSED_RGBA_BPTC_UNORM_ARB:
case GL_COMPRESSED_SRGB_ALPHA_BPTC_UNORM_ARB:
-   case GL_COMPRESSED_RGB_BPTC_SIGNED_FLOAT_ARB:
-   case GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT_ARB:
case GL_COMPRESSED_RGBA_S3TC_DXT1_EXT:
case GL_COMPRESSED_RGBA_S3TC_DXT3_EXT:
case GL_COMPRESSED_RGBA_S3TC_DXT5_EXT:
-- 
1.9.3



[Mesa-dev] [PATCH 12/12] docs: Update release notes and GL3.txt for GL_ARB_texture_compression_bptc

2014-08-06 Thread Neil Roberts
---
 docs/GL3.txt| 2 +-
 docs/relnotes/10.3.html | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index e241257..973495c 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -138,7 +138,7 @@ GL 4.1:
 GL 4.2:
 
   GLSL 4.2                                 not started
-  GL_ARB_texture_compression_bptc          not started
+  GL_ARB_texture_compression_bptc          DONE (i965)
   GL_ARB_compressed_texture_pixel_storage  DONE (all drivers)
   GL_ARB_shader_atomic_counters            DONE (i965)
   GL_ARB_texture_storage                   DONE (all drivers)
diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html
index f023ca6..0bb417d 100644
--- a/docs/relnotes/10.3.html
+++ b/docs/relnotes/10.3.html
@@ -61,6 +61,7 @@ Note: some of the new features are only available with certain drivers.
 <li>GL_ARB_clear_texture on i965</li>
 <li>A new software rasterizer driver (kms_swrast_dri.so) that works with
 DRM drivers that don't have a full-fledged GEM (such as qxl or simpledrm)</li>
+<li>GL_ARB_texture_compression_bptc on i965/gen7+</li>
 </ul>
 
 
-- 
1.9.3



[Mesa-dev] [PATCH 11/12] mesa/meta: Support decompressing floating-point formats

2014-08-06 Thread Neil Roberts
Previously the Meta implementation of glGetTexImage would fall back to
_mesa_get_teximage if the texture was not using an unsigned normalised
format. However, in order to support the half-float formats of BPTC textures we
can make it render to a floating-point renderbuffer instead. This patch makes
decompress_state have two FBOs, one for the GL_RGBA format and one for
GL_RGBA32F. If a floating-point texture is encountered it will try setting up
a floating-point FBO. It will now also check the status of the FBO and fall
back to _mesa_get_teximage if the FBO is not complete.
---
 src/mesa/drivers/common/meta.c | 97 --
 src/mesa/drivers/common/meta.h | 14 +-
 2 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index f8f0ee3..c3764ee 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -2940,14 +2940,22 @@ _mesa_meta_CopyTexSubImage(struct gl_context *ctx, GLuint dims,
    free(buf);
 }
 
+static void
+meta_decompress_fbo_cleanup(struct decompress_fbo_state *decompress_fbo)
+{
+   if (decompress_fbo->FBO != 0) {
+      _mesa_DeleteFramebuffers(1, &decompress_fbo->FBO);
+      _mesa_DeleteRenderbuffers(1, &decompress_fbo->RBO);
+   }
+
+   memset(decompress_fbo, 0, sizeof(*decompress_fbo));
+}
 
 static void
 meta_decompress_cleanup(struct decompress_state *decompress)
 {
-   if (decompress->FBO != 0) {
-      _mesa_DeleteFramebuffers(1, &decompress->FBO);
-      _mesa_DeleteRenderbuffers(1, &decompress->RBO);
-   }
+   meta_decompress_fbo_cleanup(&decompress->byteFBO);
+   meta_decompress_fbo_cleanup(&decompress->floatFBO);
 
    if (decompress->VAO != 0) {
       _mesa_DeleteVertexArrays(1, &decompress->VAO);
@@ -2969,7 +2977,7 @@ meta_decompress_cleanup(struct decompress_state *decompress)
  * \param dest  destination buffer
  * \param destRowLength  dest image rowLength (ala GL_PACK_ROW_LENGTH)
  */
-static void
+static bool
 decompress_texture_image(struct gl_context *ctx,
  struct gl_texture_image *texImage,
  GLuint slice,
@@ -2977,17 +2985,33 @@ decompress_texture_image(struct gl_context *ctx,
                          GLvoid *dest)
 {
    struct decompress_state *decompress = &ctx->Meta->Decompress;
+   struct decompress_fbo_state *decompress_fbo;
    struct gl_texture_object *texObj = texImage->TexObject;
    const GLint width = texImage->Width;
    const GLint height = texImage->Height;
    const GLint depth = texImage->Height;
    const GLenum target = texObj->Target;
+   GLenum rbFormat;
    GLenum faceTarget;
    struct vertex verts[4];
    GLuint samplerSave;
+   GLenum status;
    const bool use_glsl_version = ctx->Extensions.ARB_vertex_shader &&
                                       ctx->Extensions.ARB_fragment_shader;
 
+   switch (_mesa_get_format_datatype(texImage->TexFormat)) {
+   case GL_FLOAT:
+      decompress_fbo = &decompress->floatFBO;
+      rbFormat = GL_RGBA32F;
+      break;
+   case GL_UNSIGNED_NORMALIZED:
+      decompress_fbo = &decompress->byteFBO;
+      rbFormat = GL_RGBA;
+      break;
+   default:
+      return false;
+   }
+
    if (slice > 0) {
       assert(target == GL_TEXTURE_3D ||
              target == GL_TEXTURE_2D_ARRAY ||
@@ -2998,11 +3022,11 @@ decompress_texture_image(struct gl_context *ctx,
    case GL_TEXTURE_1D:
    case GL_TEXTURE_1D_ARRAY:
       assert(!"No compressed 1D textures.");
-      return;
+      return false;
 
    case GL_TEXTURE_3D:
       assert(!"No compressed 3D textures.");
-      return;
+      return false;
 
    case GL_TEXTURE_CUBE_MAP_ARRAY:
       faceTarget = GL_TEXTURE_CUBE_MAP_POSITIVE_X + (slice % 6);
@@ -3024,27 +3048,35 @@ decompress_texture_image(struct gl_context *ctx,
          ctx->Texture.Unit[ctx->Texture.CurrentUnit].Sampler->Name : 0;
 
    /* Create/bind FBO/renderbuffer */
-   if (decompress->FBO == 0) {
-      _mesa_GenFramebuffers(1, &decompress->FBO);
-      _mesa_GenRenderbuffers(1, &decompress->RBO);
-      _mesa_BindFramebuffer(GL_FRAMEBUFFER_EXT, decompress->FBO);
-      _mesa_BindRenderbuffer(GL_RENDERBUFFER_EXT, decompress->RBO);
+   if (decompress_fbo->FBO == 0) {
+      _mesa_GenFramebuffers(1, &decompress_fbo->FBO);
+      _mesa_GenRenderbuffers(1, &decompress_fbo->RBO);
+      _mesa_BindFramebuffer(GL_FRAMEBUFFER_EXT, decompress_fbo->FBO);
+      _mesa_BindRenderbuffer(GL_RENDERBUFFER_EXT, decompress_fbo->RBO);
       _mesa_FramebufferRenderbuffer(GL_FRAMEBUFFER_EXT,
                                     GL_COLOR_ATTACHMENT0_EXT,
                                     GL_RENDERBUFFER_EXT,
-                                    decompress->RBO);
+                                    decompress_fbo->RBO);
    }
    else {
-      _mesa_BindFramebuffer(GL_FRAMEBUFFER_EXT, decompress->FBO);
+      _mesa_BindFramebuffer(GL_FRAMEBUFFER_EXT, decompress_fbo->FBO);
    }
 
    /* alloc dest surface */
-   if (width > decompress->Width || height > decompress->Height)

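The FBO selection described in this patch boils down to "pick one of two cached slots by datatype, or tell the caller to fall back". The sketch below illustrates just that control flow without any GL calls. All names (sketch_fbo_slot, sketch_select_fbo and the SKETCH_* enums) are hypothetical and only loosely modelled on decompress_fbo_state in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the format datatype and renderbuffer format. */
enum sketch_datatype {
   SKETCH_DT_UNSIGNED_NORMALIZED,
   SKETCH_DT_SIGNED_NORMALIZED,
   SKETCH_DT_FLOAT
};

enum sketch_rb_format {
   SKETCH_RB_RGBA,
   SKETCH_RB_RGBA32F
};

/* One cached FBO/renderbuffer pair; 0 means "not created yet". */
struct sketch_fbo_slot {
   unsigned fbo, rbo;
   int width, height;
};

struct sketch_decompress {
   struct sketch_fbo_slot byte_fbo;
   struct sketch_fbo_slot float_fbo;
};

/* Returns true and fills the outputs when the datatype can be rendered;
 * returns false when the caller should use the software fallback path. */
bool
sketch_select_fbo(struct sketch_decompress *state,
                  enum sketch_datatype datatype,
                  struct sketch_fbo_slot **slot_out,
                  enum sketch_rb_format *rb_format_out)
{
   assert(state != NULL && slot_out != NULL && rb_format_out != NULL);

   switch (datatype) {
   case SKETCH_DT_FLOAT:
      *slot_out = &state->float_fbo;
      *rb_format_out = SKETCH_RB_RGBA32F;
      return true;
   case SKETCH_DT_UNSIGNED_NORMALIZED:
      *slot_out = &state->byte_fbo;
      *rb_format_out = SKETCH_RB_RGBA;
      return true;
   default:
      return false;
   }
}
```

Keeping the two slots separate means that switching between byte and float textures would not force the renderbuffer storage to be reallocated each time, which appears to be the motivation for splitting the state in the patch.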
[Mesa-dev] [PATCH v2 07/12] mesa: Add texstore functions for BPTC-compressed textures

2014-08-06 Thread Neil Roberts
This adds compressors for all four of the BPTC compressed-texture formats. The
compressor is written from scratch and takes a very simple approach. It always
uses a single mode of the BPTC format (4 for unorm and 3 for half-floats) and
picks the two endpoints by dividing the texels into those which have more or
less than the average luminance of the block and then calculating an average
color of the texels within each division.

It's probably not really sensible to try to use BPTC compression at runtime
because, for example, the Nvidia offline compression tool can take on
the order of an hour to compress a full-screen image. With that in mind I
don't think it's worth having a proper compressor in Mesa, and this approach
gives reasonable results for a usage that is basically a corner case.

v2: Always use the custom compressor, even for the unorm formats. Fix the
quantization step for the half-float format compressor. Fixed a typo which
was breaking the right-hand edge of half-float textures with a width that
isn't a multiple of four.
---
 src/mesa/main/texcompress_bptc.c | 689 +++
 src/mesa/main/texcompress_bptc.h |  10 +
 src/mesa/main/texstore.c |  10 +
 3 files changed, 709 insertions(+)

diff --git a/src/mesa/main/texcompress_bptc.c b/src/mesa/main/texcompress_bptc.c
index 7ec294b..9204f12 100644
--- a/src/mesa/main/texcompress_bptc.c
+++ b/src/mesa/main/texcompress_bptc.c
@@ -69,6 +69,12 @@ struct bptc_float_mode {
struct bptc_float_bitfield bitfields[24];
 };
 
+struct bit_writer {
+   uint8_t buf;
+   int pos;
+   uint8_t *dst;
+};
+
 static const struct bptc_unorm_mode
 bptc_unorm_modes[] = {
/* 0 */ { 3, 4, false, false, 4, 0, true,  false, 3, 0 },
@@ -958,3 +964,686 @@ _mesa_get_bptc_fetch_func(mesa_format format)
   return NULL;
}
 }
+
+static void
+write_bits(struct bit_writer *writer, int n_bits, int value)
+{
+   do {
+      if (n_bits + writer->pos >= 8) {
+         *(writer->dst++) = writer->buf | (value << writer->pos);
+         writer->buf = 0;
+         value >>= (8 - writer->pos);
+         n_bits -= (8 - writer->pos);
+         writer->pos = 0;
+      } else {
+         writer->buf |= value << writer->pos;
+         writer->pos += n_bits;
+         break;
+      }
+   } while (n_bits > 0);
+}
+
+static void
+get_average_luminance_alpha_unorm(int width, int height,
+  const uint8_t *src, int src_rowstride,
+  int *average_luminance, int *average_alpha)
+{
+   int luminance_sum = 0, alpha_sum = 0;
+   int y, x;
+
+   for (y = 0; y < height; y++) {
+      for (x = 0; x < width; x++) {
+ luminance_sum += src[0] + src[1] + src[2];
+ alpha_sum += src[3];
+ src += 4;
+  }
+  src += src_rowstride - width * 4;
+   }
+
+   *average_luminance = luminance_sum / (width * height);
+   *average_alpha = alpha_sum / (width * height);
+}
+
+static void
+get_rgba_endpoints_unorm(int width, int height,
+ const uint8_t *src, int src_rowstride,
+ int average_luminance, int average_alpha,
+ uint8_t endpoints[][4])
+{
+   int endpoint_luminances[2];
+   int midpoint;
+   int sums[2][4];
+   int endpoint;
+   int luminance;
+   uint8_t temp[3];
+   const uint8_t *p = src;
+   int rgb_left_endpoint_count = 0;
+   int alpha_left_endpoint_count = 0;
+   int y, x, i;
+
+   memset(sums, 0, sizeof sums);
+
+   for (y = 0; y < height; y++) {
+      for (x = 0; x < width; x++) {
+         luminance = p[0] + p[1] + p[2];
+         if (luminance < average_luminance) {
+            endpoint = 0;
+            rgb_left_endpoint_count++;
+         } else {
+            endpoint = 1;
+         }
+         for (i = 0; i < 3; i++)
+            sums[endpoint][i] += p[i];
+
+         if (p[2] < average_alpha) {
+            endpoint = 0;
+            alpha_left_endpoint_count++;
+         } else {
+            endpoint = 1;
+         }
+         sums[endpoint][3] += p[3];
+
+         p += 4;
+      }
+
+      p += src_rowstride - width * 4;
+   }
+
+   if (rgb_left_endpoint_count == 0 ||
+       rgb_left_endpoint_count == width * height) {
+      for (i = 0; i < 3; i++)
+         endpoints[0][i] = endpoints[1][i] =
+            (sums[0][i] + sums[1][i]) / (width * height);
+   } else {
+      for (i = 0; i < 3; i++) {
+         endpoints[0][i] = sums[0][i] / rgb_left_endpoint_count;
+         endpoints[1][i] = (sums[1][i] /
+                            (width * height - rgb_left_endpoint_count));
+      }
+   }
+
+   if (alpha_left_endpoint_count == 0 ||
+       alpha_left_endpoint_count == width * height) {
+      endpoints[0][3] = endpoints[1][3] =
+         (sums[0][3] + sums[1][3]) / (width * height);
+   } else {
+      endpoints[0][3] = sums[0][3] / alpha_left_endpoint_count;
+      endpoints[1][3] = (sums[1][3] /
+                         (width * height - alpha_left_endpoint_count));
+   }
+
+   

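The endpoint selection described in the commit message (split the texels at the block's average luminance, then average each side) can be sketched on its own. This is a simplified, RGB-only illustration under my own assumptions, not the code from the patch: sketch_split_endpoints_rgb is a hypothetical name, the texels are tightly packed with no row stride, and alpha is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Split a block of packed RGB texels at its average luminance and write the
 * average colour of the darker side to endpoints[0] and of the brighter side
 * to endpoints[1]. If every texel lands on one side, both endpoints get the
 * average colour of the whole block. */
void
sketch_split_endpoints_rgb(const uint8_t *texels, int n_texels,
                           uint8_t endpoints[2][3])
{
   int sums[2][3] = { { 0, 0, 0 }, { 0, 0, 0 } };
   int counts[2] = { 0, 0 };
   int luminance_sum = 0;
   int average_luminance;
   int i, c;

   assert(texels != 0 && n_texels > 0);

   /* First pass: average luminance, using R + G + B as the luminance. */
   for (i = 0; i < n_texels; i++)
      luminance_sum += texels[i * 3] + texels[i * 3 + 1] + texels[i * 3 + 2];
   average_luminance = luminance_sum / n_texels;

   /* Second pass: accumulate each texel into the side it falls on. */
   for (i = 0; i < n_texels; i++) {
      const uint8_t *p = texels + i * 3;
      int side = (p[0] + p[1] + p[2] < average_luminance) ? 0 : 1;

      counts[side]++;
      for (c = 0; c < 3; c++)
         sums[side][c] += p[c];
   }

   for (c = 0; c < 3; c++) {
      if (counts[0] == 0 || counts[1] == 0) {
         endpoints[0][c] = endpoints[1][c] =
            (uint8_t) ((sums[0][c] + sums[1][c]) / n_texels);
      } else {
         endpoints[0][c] = (uint8_t) (sums[0][c] / counts[0]);
         endpoints[1][c] = (uint8_t) (sums[1][c] / counts[1]);
      }
   }
}
```

With the four grey texels 10, 20, 200 and 220, the average luminance is 337, so the first two fall on the dark side and the endpoints come out as 15 and 210. A flat block yields two identical endpoints.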
[Mesa-dev] [PATCH 09/12] i965: Enable the GL_ARB_texture_compression_bptc extension

2014-08-06 Thread Neil Roberts
Enables the BPTC extension on Gen >= 7 and adds the necessary format mappings to
get the right surface type value.
---
 src/mesa/drivers/dri/i965/brw_surface_formats.c | 5 +
 src/mesa/drivers/dri/i965/intel_extensions.c| 2 ++
 2 files changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index 41f4221..974f2df 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -487,6 +487,11 @@ brw_format_for_mesa_format(mesa_format mesa_format)
       [MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1] = BRW_SURFACEFORMAT_ETC2_RGB8_PTA,
       [MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1] = BRW_SURFACEFORMAT_ETC2_SRGB8_PTA,
 
+      [MESA_FORMAT_BPTC_RGBA_UNORM] = BRW_SURFACEFORMAT_BC7_UNORM,
+      [MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM] = BRW_SURFACEFORMAT_BC7_UNORM_SRGB,
+      [MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT] = BRW_SURFACEFORMAT_BC6H_SF16,
+      [MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT] = BRW_SURFACEFORMAT_BC6H_UF16,
+
       [MESA_FORMAT_A_SNORM8] = 0,
       [MESA_FORMAT_L_SNORM8] = 0,
       [MESA_FORMAT_L8A8_SNORM] = 0,
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index 4ee8636..b14b9c7 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -302,6 +302,8 @@ intelInitExtensions(struct gl_context *ctx)
          ctx->Extensions.ARB_viewport_array = true;
          ctx->Extensions.AMD_vertex_shader_viewport_index = true;
       }
+
+      ctx->Extensions.ARB_texture_compression_bptc = true;
    }
 
    if (brw->gen >= 8) {
-- 
1.9.3



Re: [Mesa-dev] [PATCH] i965: Fix z_offset computation in intel_miptree_unmap_depthstencil()

2014-08-06 Thread Neil Roberts
I'd just like to point out that I made a nearly identical patch before
this patch was posted but I didn't get any review despite prodding
people a few times on #dri-devel. Maybe we should try to get into the
habit of searching patchwork for existing patches before posting to the
list. Does anyone have any suggestions for how I can get my patches more
noticed?

http://patchwork.freedesktop.org/patch/27168/

I also made a piglit test for the problem here:

http://cgit.freedesktop.org/piglit/commit/?id=108a17a4d78bcc7480754d2104b4

Regards,
- Neil

Jordan Justen jljus...@gmail.com writes:

 Reviewed-by: Jordan Justen jordan.l.jus...@intel.com

 On Wed, Jul 16, 2014 at 3:32 PM, Anuj Phogat anuj.pho...@gmail.com wrote:
 The bug is triggered by using glTexSubImage2d() with GL_DEPTH_STENCIL
 as base internal format and non-zero x, y offsets. Currently x, y
 offsets are ignored while updating the texture image.

 Fixes Khronos GLES3 CTS tests:
 npot_tex_sub_image_2d
 npot_tex_sub_image_3d
 npot_pbo_tex_sub_image_2d
 npot_pbo_tex_sub_image_2d

 Cc: mesa-sta...@lists.freedesktop.org
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 ---
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 index 2ab0faa..b36ffc7 100644
 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 @@ -2129,9 +2129,9 @@ intel_miptree_unmap_depthstencil(struct brw_context *brw,
  x + s_image_x + map->x,
  y + s_image_y + map->y,
  brw->has_swizzling);
 -   ptrdiff_t z_offset = ((y + z_image_y) *
 +   ptrdiff_t z_offset = ((y + z_image_y + map->y) *
 (z_mt->pitch / 4) +
 - (x + z_image_x));
 + (x + z_image_x + map->x));

 if (map_z32f_x24s8) {
 z_map[z_offset] = packed_map[(y * map->w + x) * 2 + 0];
 --
 1.9.3


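For readers following the quoted fix, here is the offset arithmetic worked through in a standalone form. This is a sketch under stated assumptions rather than the driver code: sketch_z_offset is a hypothetical helper, the pitch is given in bytes, and each depth texel is taken to be 4 bytes wide (hence the division by 4, as in the patch).

```c
#include <assert.h>
#include <stddef.h>

/* Index, in 4-byte texels, of the depth value for pixel (x, y) of a mapped
 * sub-rectangle. (z_image_x, z_image_y) is where the image starts inside the
 * miptree and (map_x, map_y) is where the mapped rectangle starts inside the
 * image. Dropping map_x/map_y is the bug the patch fixes. */
ptrdiff_t
sketch_z_offset(int x, int y,
                int z_image_x, int z_image_y,
                int map_x, int map_y,
                int pitch_bytes)
{
   assert(pitch_bytes > 0 && pitch_bytes % 4 == 0);

   return (ptrdiff_t) (y + z_image_y + map_y) * (pitch_bytes / 4) +
          (x + z_image_x + map_x);
}
```

With a 64-byte pitch (16 texels per row), the first pixel of a map that starts at (2, 3) lands at index 3 * 16 + 2 = 50. Without the map offsets it would land at index 0, which matches the symptom described above: the x, y offsets are ignored.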

Re: [Mesa-dev] [PATCH 04/12] mesa/format_info: Add support for the BPTC layout

2014-08-06 Thread Jason Ekstrand
This looks fine to me.
Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com


On Wed, Aug 6, 2014 at 9:27 AM, Neil Roberts n...@linux.intel.com wrote:

 Adds the ‘bptc’ layout to get_channel_bits. The channel bits for BPTC depend
 on the mode but as it only has to be an approximation we can set it to 4 like
 for S3TC.
 ---
  src/mesa/main/format_info.py | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py
 index a0eecd3..fc40dc4 100644
 --- a/src/mesa/main/format_info.py
 +++ b/src/mesa/main/format_info.py
 @@ -110,7 +110,7 @@ def get_channel_bits(fmat, chan_name):
 if fmat.is_compressed():
# These values are pretty-much bogus, but OpenGL requires that we
# return an approximate number of bits.
 -  if fmat.layout == 's3tc':
 +  if fmat.layout in ('s3tc', 'bptc'):
   return 4 if fmat.has_channel(chan_name) else 0
elif fmat.layout == 'fxt1':
   if chan_name in 'rgb':
 --
 1.9.3



Re: [Mesa-dev] [PATCH 03/12] mesa/format_info: Add support for compressed floating-point formats

2014-08-06 Thread Jason Ekstrand
Looks fine
Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com


On Wed, Aug 6, 2014 at 9:27 AM, Neil Roberts n...@linux.intel.com wrote:

 If the name of a compressed texture format has ‘FLOAT’ in it it will now set
 the data type of the format to GL_FLOAT. This will be needed for the BPTC
 half-float formats.
 ---
  src/mesa/main/format_info.py | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py
 index 448bd00..a0eecd3 100644
 --- a/src/mesa/main/format_info.py
 +++ b/src/mesa/main/format_info.py
 @@ -62,7 +62,9 @@ def get_gl_base_format(fmat):

  def get_gl_data_type(fmat):
 if fmat.is_compressed():
 -  if 'SIGNED' in fmat.name or 'SNORM' in fmat.name:
 +  if 'FLOAT' in fmat.name:
 + return 'GL_FLOAT'
 +  elif 'SIGNED' in fmat.name or 'SNORM' in fmat.name:
   return 'GL_SIGNED_NORMALIZED'
else:
   return 'GL_UNSIGNED_NORMALIZED'
 --
 1.9.3


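One detail of the quoted change is easy to miss: the order of the substring checks matters, because "UNSIGNED" itself contains "SIGNED". The following C rendering of the Python logic is only a sketch (sketch_compressed_data_type is a hypothetical name, and the original lives in format_info.py), but it makes the ordering issue easy to test.

```c
#include <assert.h>
#include <string.h>

/* Derive the GL data type of a compressed format from its name. The FLOAT
 * check has to come first: both ..._SIGNED_FLOAT and ..._UNSIGNED_FLOAT
 * contain the substring "SIGNED" and would otherwise be classified as
 * signed normalized. */
const char *
sketch_compressed_data_type(const char *format_name)
{
   assert(format_name != NULL);

   if (strstr(format_name, "FLOAT") != NULL)
      return "GL_FLOAT";
   if (strstr(format_name, "SIGNED") != NULL ||
       strstr(format_name, "SNORM") != NULL)
      return "GL_SIGNED_NORMALIZED";
   return "GL_UNSIGNED_NORMALIZED";
}
```

So MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT maps to GL_FLOAT, MESA_FORMAT_ETC2_SIGNED_R11_EAC to GL_SIGNED_NORMALIZED, and MESA_FORMAT_BPTC_RGBA_UNORM to GL_UNSIGNED_NORMALIZED.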

Re: [Mesa-dev] [PATCH 04/12] mesa/format_info: Add support for the BPTC layout

2014-08-06 Thread Jason Ekstrand
Sorry, said that just a little early.  Do we really want 4 bits for a
floating-point format?  How many bits does nvidia report?
--Jason


On Wed, Aug 6, 2014 at 9:55 AM, Jason Ekstrand ja...@jlekstrand.net wrote:

 This looks fine to me.
 Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com


 On Wed, Aug 6, 2014 at 9:27 AM, Neil Roberts n...@linux.intel.com wrote:

 Adds the ‘bptc’ layout to get_channel_bits. The channel bits for BPTC depend
 on the mode but as it only has to be an approximation we can set it to 4 like
 for S3TC.
 ---
  src/mesa/main/format_info.py | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py
 index a0eecd3..fc40dc4 100644
 --- a/src/mesa/main/format_info.py
 +++ b/src/mesa/main/format_info.py
 @@ -110,7 +110,7 @@ def get_channel_bits(fmat, chan_name):
 if fmat.is_compressed():
# These values are pretty-much bogus, but OpenGL requires that we
# return an approximate number of bits.
 -  if fmat.layout == 's3tc':
 +  if fmat.layout in ('s3tc', 'bptc'):
   return 4 if fmat.has_channel(chan_name) else 0
elif fmat.layout == 'fxt1':
   if chan_name in 'rgb':
 --
 1.9.3



Re: [Mesa-dev] Mesa (master): mesa/formats: Add layout and swizzle information

2014-08-06 Thread Jason Ekstrand
Michael,
Could you please point me at the failing tests.  I don't have a radeon, but
I can run with llvmpipe or dri swrast and try to find the bug that way.
--Jason Ekstrand


On Wed, Aug 6, 2014 at 2:36 AM, Michel Dänzer mic...@daenzer.net wrote:

 On 06.08.2014 18:28, Michel Dänzer wrote:
  On 06.08.2014 03:08, Jason Ekstrand wrote:
  Module: Mesa
  Branch: master
  Commit: 850fb0d1dca616179d3239a7b7bd94fe1979604c
  URL:
 http://cgit.freedesktop.org/mesa/mesa/commit/?id=850fb0d1dca616179d3239a7b7bd94fe1979604c
 
  Author: Jason Ekstrand jason.ekstr...@intel.com
  Date:   Thu Jul 10 23:59:42 2014 -0700
 
  mesa/formats: Add layout and swizzle information
 
  v2: Move the MESA_FORMAT_SWIZZLE enum to the top of the file
 
  Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
  Reviewed-by: Brian Paul bri...@vmware.com
 
  As of this commit, ~20 depth/stencil related piglit tests have
  regressed with the radeonsi driver compared to before your changes.
  See below for an example failure of the draw-pixels test.
 
  That test is already broken with the previous commits, each of them
  with slightly different failure symptoms.

 I meant to write: 'That test is already broken with the three previous
 commits, [...]'


 --
 Earthling Michel Dänzer|  http://www.amd.com
 Libre software enthusiast  |Mesa and X developer



[Mesa-dev] [PATCH 1/2] mesa/formats: Use the correct swizzle parameter for the 11-bit EAC formats

2014-08-06 Thread Jason Ekstrand
Red-only formats should be x001 and RG formats should be xy01.

Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
---
 src/mesa/main/formats.csv | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/formats.csv b/src/mesa/main/formats.csv
index 5abb706..f45e34b 100644
--- a/src/mesa/main/formats.csv
+++ b/src/mesa/main/formats.csv
@@ -274,9 +274,9 @@ MESA_FORMAT_ETC2_RGB8 , etc2  , 4, 4, x64 , , ,
 MESA_FORMAT_ETC2_SRGB8, etc2  , 4, 4, x64 , , , , xyz1, srgb
 MESA_FORMAT_ETC2_RGBA8_EAC, etc2  , 4, 4, x128, , , , xyzw, rgb
 MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC , etc2  , 4, 4, x128, , , , xyzw, srgb
-MESA_FORMAT_ETC2_R11_EAC  , etc2  , 4, 4, x64 , , , , xyzw, rgb
-MESA_FORMAT_ETC2_RG11_EAC , etc2  , 4, 4, x128, , , , xyzw, rgb
-MESA_FORMAT_ETC2_SIGNED_R11_EAC   , etc2  , 4, 4, x64 , , , , xyzw, rgb
-MESA_FORMAT_ETC2_SIGNED_RG11_EAC  , etc2  , 4, 4, x128, , , , xyzw, rgb
+MESA_FORMAT_ETC2_R11_EAC  , etc2  , 4, 4, x64 , , , , x001, rgb
+MESA_FORMAT_ETC2_RG11_EAC , etc2  , 4, 4, x128, , , , xy01, rgb
+MESA_FORMAT_ETC2_SIGNED_R11_EAC   , etc2  , 4, 4, x64 , , , , x001, rgb
+MESA_FORMAT_ETC2_SIGNED_RG11_EAC  , etc2  , 4, 4, x128, , , , xy01, rgb
 MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1 , etc2  , 4, 4, x64 , , , , xyzw, rgb
 MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, etc2  , 4, 4, x128, , , , xyzw, srgb
-- 
2.0.4


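To make the x001 / xy01 notation concrete, here is a small sketch of how such a four-character swizzle could be applied. This reflects my reading of the formats.csv column (each output channel names a stored channel x/y/z/w or a constant 0/1); sketch_apply_swizzle is a hypothetical helper, not a Mesa function.

```c
#include <assert.h>

/* Build an RGBA value from stored channels according to a 4-character
 * swizzle string: 'x', 'y', 'z', 'w' select a stored channel, '0' and '1'
 * produce the constants 0.0 and 1.0. */
void
sketch_apply_swizzle(const char *swizzle, const float src[4], float dst[4])
{
   int i;

   assert(swizzle != 0);

   for (i = 0; i < 4; i++) {
      switch (swizzle[i]) {
      case 'x': dst[i] = src[0]; break;
      case 'y': dst[i] = src[1]; break;
      case 'z': dst[i] = src[2]; break;
      case 'w': dst[i] = src[3]; break;
      case '0': dst[i] = 0.0f; break;
      case '1': dst[i] = 1.0f; break;
      default:
         assert(!"unknown swizzle character");
         dst[i] = 0.0f;
         break;
      }
   }
}
```

Under this reading, a red-only texel with stored value 0.5 becomes (0.5, 0, 0, 1) with x001, whereas the old xyzw entry would have pulled green, blue and alpha from channels the format does not store.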

Re: [Mesa-dev] Mesa (master): mesa/formats: Add layout and swizzle information

2014-08-06 Thread Jason Ekstrand
Michael,
With the two patches I just sent to the list, the generated format_info
structure is now binary-identical to the original structure committed to git,
with the following two exceptions. The string name parameter for
MESA_FORMAT_R9G9B9E5_FLOAT was updated from MESA_FORMAT_RGB9_E5 to
MESA_FORMAT_R9G9B9E5_FLOAT. The LATC formats now report 8 bits of
precision instead of 4; this makes the LATC formats match the RGTC formats,
which use exactly the same compression, just interpreted differently.

I'm really confused about where the bug is coming from.  That said, I'm
going to run some llvmpipe tests to see if I can reproduce.
--Jason Ekstrand


On Wed, Aug 6, 2014 at 10:02 AM, Jason Ekstrand ja...@jlekstrand.net
wrote:

 Michael,
 Could you please point me at the failing tests.  I don't have a radeon,
 but I can run with llvmpipe or dri swrast and try to find the bug that way.
 --Jason Ekstrand


 On Wed, Aug 6, 2014 at 2:36 AM, Michel Dänzer mic...@daenzer.net wrote:

 On 06.08.2014 18:28, Michel Dänzer wrote:
  On 06.08.2014 03:08, Jason Ekstrand wrote:
  Module: Mesa
  Branch: master
  Commit: 850fb0d1dca616179d3239a7b7bd94fe1979604c
  URL:
 http://cgit.freedesktop.org/mesa/mesa/commit/?id=850fb0d1dca616179d3239a7b7bd94fe1979604c
 
  Author: Jason Ekstrand jason.ekstr...@intel.com
  Date:   Thu Jul 10 23:59:42 2014 -0700
 
  mesa/formats: Add layout and swizzle information
 
  v2: Move the MESA_FORMAT_SWIZZLE enum to the top of the file
 
  Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
  Reviewed-by: Brian Paul bri...@vmware.com
 
  As of this commit, ~20 depth/stencil related piglit tests have
  regressed with the radeonsi driver compared to before your changes.
  See below for an example failure of the draw-pixels test.
 
  That test is already broken with the previous commits, each of them
  with slightly different failure symptoms.

 I meant to write: 'That test is already broken with the three previous
 commits, [...]'


 --
 Earthling Michel Dänzer|  http://www.amd.com
 Libre software enthusiast  |Mesa and X developer





[Mesa-dev] [PATCH 2/2] mesa/formats: Fix the size of ETC2_SRGB8_PUNCHTHROUGH_ALPHA1

2014-08-06 Thread Jason Ekstrand
Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
---
 src/mesa/main/formats.csv | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/formats.csv b/src/mesa/main/formats.csv
index f45e34b..eade6fa 100644
--- a/src/mesa/main/formats.csv
+++ b/src/mesa/main/formats.csv
@@ -279,4 +279,4 @@ MESA_FORMAT_ETC2_RG11_EAC , etc2  , 4, 4, x128, , ,
 MESA_FORMAT_ETC2_SIGNED_R11_EAC   , etc2  , 4, 4, x64 , , , , x001, rgb
 MESA_FORMAT_ETC2_SIGNED_RG11_EAC  , etc2  , 4, 4, x128, , , , xy01, rgb
 MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1 , etc2  , 4, 4, x64 , , , , xyzw, rgb
-MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, etc2  , 4, 4, x128, , , , xyzw, srgb
+MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, etc2  , 4, 4, x64 , , , , xyzw, srgb
-- 
2.0.4


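The x64 versus x128 column matters because it determines how much storage a compressed image needs. As a rough illustration, assuming the "4, 4, x64" fields mean a 4x4 texel block stored in 64 bits, the size could be computed as below. sketch_compressed_size is a hypothetical helper, not a Mesa function.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a block-compressed image: round the dimensions up to
 * whole blocks, then multiply by the block size in bytes. */
size_t
sketch_compressed_size(int width, int height,
                       int block_width, int block_height, int block_bits)
{
   size_t blocks_x, blocks_y;

   assert(width > 0 && height > 0);
   assert(block_width > 0 && block_height > 0);
   assert(block_bits > 0 && block_bits % 8 == 0);

   blocks_x = ((size_t) width + block_width - 1) / block_width;
   blocks_y = ((size_t) height + block_height - 1) / block_height;

   return blocks_x * blocks_y * ((size_t) block_bits / 8);
}
```

A 16x16 image is 16 blocks, so it takes 128 bytes at 64 bits per block but 256 bytes at 128 bits per block. A format listed with the wrong block size would therefore report twice the correct storage size.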

Re: [Mesa-dev] Mesa (master): mesa/formats: Add layout and swizzle information

2014-08-06 Thread Marek Olšák
FYI, it seems to be DrawPixels(GL_STENCIL_INDEX) that is broken. We
actually use S8 texturing for DrawPixels and some of the functions you
changed probably don't support S8 anymore.

Marek

On Wed, Aug 6, 2014 at 7:37 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
 Michael,
 With the two patches I just sent to the list, the generated format_info
 structure is now binary-identical to the original structure commited to git
 with the following two exceptions: The string name parameter for
 MESA_FORMAT_R9G9B9E5_FLOAT was updated from MESA_FORMAT_RGB9_E5 to
 MESA_FORMAT_R9G9B9E5_FLOAT.  The LATC formats now report 8 bits of
 precision instead of 4;  This makes the LATC formats match the RGTC formats
 which use exactly the same compression just interpreted differently.

 I'm really confused about where the bug is coming from.  That said, I'm
 going to run some llvmpipe tests to see if I can reproduce.
 --Jason Ekstrand


 On Wed, Aug 6, 2014 at 10:02 AM, Jason Ekstrand ja...@jlekstrand.net
 wrote:

 Michael,
 Could you please point me at the failing tests.  I don't have a radeon,
 but I can run with llvmpipe or dri swrast and try to find the bug that way.
 --Jason Ekstrand


 On Wed, Aug 6, 2014 at 2:36 AM, Michel Dänzer mic...@daenzer.net wrote:

 On 06.08.2014 18:28, Michel Dänzer wrote:
  On 06.08.2014 03:08, Jason Ekstrand wrote:
  Module: Mesa
  Branch: master
  Commit: 850fb0d1dca616179d3239a7b7bd94fe1979604c
  URL:
  http://cgit.freedesktop.org/mesa/mesa/commit/?id=850fb0d1dca616179d3239a7b7bd94fe1979604c
 
  Author: Jason Ekstrand jason.ekstr...@intel.com
  Date:   Thu Jul 10 23:59:42 2014 -0700
 
  mesa/formats: Add layout and swizzle information
 
  v2: Move the MESA_FORMAT_SWIZZLE enum to the top of the file
 
  Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
  Reviewed-by: Brian Paul bri...@vmware.com
 
  As of this commit, ~20 depth/stencil related piglit tests have
  regressed with the radeonsi driver compared to before your changes.
  See below for an example failure of the draw-pixels test.
 
  That test is already broken with the previous commits, each of them
  with slightly different failure symptoms.

 I meant to write: 'That test is already broken with the three previous
 commits, [...]'


 --
 Earthling Michel Dänzer|  http://www.amd.com
 Libre software enthusiast  |Mesa and X developer






Re: [Mesa-dev] [PATCH 08/20] i965/cfg: Add functions to test if a block is a successor/predecessor.

2014-08-06 Thread Matt Turner
On Tue, Aug 5, 2014 at 10:21 AM, Pohjolainen, Topi
topi.pohjolai...@intel.com wrote:
 On Thu, Jul 24, 2014 at 07:54:15PM -0700, Matt Turner wrote:
 ---
  src/mesa/drivers/dri/i965/brw_cfg.cpp | 24 
  src/mesa/drivers/dri/i965/brw_cfg.h   |  2 ++
  2 files changed, 26 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp 
 b/src/mesa/drivers/dri/i965/brw_cfg.cpp
 index d806b83..9cd8b9f 100644
 --- a/src/mesa/drivers/dri/i965/brw_cfg.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp
 @@ -71,6 +71,30 @@ bblock_t::add_successor(void *mem_ctx, bblock_t 
 *successor)
 children.push_tail(::link(mem_ctx, successor));
  }

 +bool
 +bblock_t::is_predecessor_of(const bblock_t *block) const
 +{
 +   foreach_list_typed_safe (bblock_link, parent, link, &block->parents) {

 I read patch number three again, and noticed this small formatting change
 there as well. I haven't seen us leaving the space before ( anywhere else.

I figure I should have a space between the macro and the ( since we
put a space there for regular for loops.

 +  if (parent->block == this) {
 + return true;
 +  }

 We have one line blocks with and without {}. I just thought I mention in
 case you didn't mean to.

Right, I'll drop the {}.


Re: [Mesa-dev] [PATCH 05/20] i965/cfg: Add a foreach_inst_in_block_safe macro.

2014-08-06 Thread Matt Turner
On Wed, Aug 6, 2014 at 5:22 AM, Pohjolainen, Topi
topi.pohjolai...@intel.com wrote:
 On Tue, Aug 05, 2014 at 09:14:55PM +0300, Pohjolainen, Topi wrote:
 On Thu, Jul 24, 2014 at 07:54:12PM -0700, Matt Turner wrote:
  ---
   src/mesa/drivers/dri/i965/brw_cfg.h | 8 
   1 file changed, 8 insertions(+)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h 
  b/src/mesa/drivers/dri/i965/brw_cfg.h
  index a5d2df5..913a1ed 100644
  --- a/src/mesa/drivers/dri/i965/brw_cfg.h
  +++ b/src/mesa/drivers/dri/i965/brw_cfg.h
  @@ -120,6 +120,14 @@ struct cfg_t {
   __inst != __block->end->next;  \
   __inst = (__type *)__inst->next)
 
  +#define foreach_inst_in_block_safe(__type, __inst, __block)\
  +   for (__type *__inst = (__type *)__block->start, \
  +   *__next = (__type *)__inst->next,   \
  +   *__end = (__type *)__block->end->next->next;\

 Patches 4 and 7 make sense but the double ->next->next here is not obvious
 to me.

Right, yep. exec_list uses head and tail sentinels, so the double-next
handles that. Explained below:

 I tried handwriting instructions into blocks (this is purely arbitrary):

 ip     opcode
 --
 0  :  BRW_OPCODE_?
 ..
 k  :  BRW_OPCODE_IF
 k+1:  BRW_OPCODE_?
 ..
 n  :  BRW_OPCODE_ELSE
 n+1:  BRW_OPCODE_?
 ..
 m  :  BRW_OPCODE_ENDIF
 m+1:  BRW_OPCODE_?
 ..
 t  :  BRW_OPCODE_?


 Following the logic in the constructor of cfg_t, I would deduce this:

 block 0:
start_ip = 0
num = 0
start = inst_0
end = inst_k  (if)

 block 1:
start_ip = k+1
num = 1
start = inst_k+1
end = inst_n  (else)

 block 2:
start_ip = n+1
num = 2
start = inst_n+1
end = inst_m-1

 block 3:
start_ip = m
num = 3
start = inst_m(endif)
end = inst_t


 And as instructions are inherited from exec_node, for block 3 end->next
 should be NULL, right?

Since exec_list uses head and tail sentinels, block[3]->end->next
will actually be the tail sentinel (and block[2]->end->next will be
the first instruction of block[3]).

The __end variable prevents us from dereferencing NULL if we remove
the last instruction in a block (and therefore remove the block). Note
that the continuing condition is (__next != __end). For each block, we
want to iterate through the instructions until we hit
block->end->next->next because if the block

 - isn't the last block, end->next->next will be two nodes (I say
node, rather than instruction because of the tail sentinel) after the
end

 - is the last block, end->next->next will be NULL.

In both cases we want to compare with __next, which after the
iteration is one past the node after block->end.

Does that make sense? There are really two things to remember: (1)
head and tail sentinels, and (2) this macro is _safe, so we're
comparing with __next (i.e., one past the end).
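
To make the sentinel argument concrete, here is a minimal sketch in C of the
same safe-iteration pattern. It is not the Mesa code: struct node, struct list
and all helper names are hypothetical stand-ins for exec_node and exec_list,
and the loop is written out as a while loop instead of the macro. The point it
illustrates is that the stop node is computed once, before anything can be
unlinked, and that the loop compares the saved next pointer against it.

```c
#include <stddef.h>

/* Hypothetical stand-in for exec_node; only the links matter here. */
struct node {
   struct node *next;
   struct node *prev;
   int value;
};

/* Hypothetical list with embedded head and tail sentinels. */
struct list {
   struct node head;   /* head.prev is always NULL */
   struct node tail;   /* tail.next is always NULL */
};

static void list_init(struct list *l)
{
   l->head.prev = NULL;
   l->head.next = &l->tail;
   l->tail.prev = &l->head;
   l->tail.next = NULL;
}

static void list_push_tail(struct list *l, struct node *n)
{
   n->prev = l->tail.prev;
   n->next = &l->tail;
   l->tail.prev->next = n;
   l->tail.prev = n;
}

static void node_remove(struct node *n)
{
   n->prev->next = n->next;
   n->next->prev = n->prev;
   n->next = n->prev = NULL;
}

/* Visit every node in [start, end] and unlink the ones holding an odd
 * value.  Returns the number of nodes removed. */
static int remove_odd(struct node *start, struct node *end)
{
   int removed = 0;
   /* Two links past the end: NULL if end is the last real node (because
    * end->next is then the tail sentinel), otherwise a real node. */
   struct node *stop = end->next->next;
   struct node *n = start;
   struct node *next = n->next;

   while (next != stop) {
      if (n->value & 1) {
         node_remove(n);   /* safe: next was saved before unlinking */
         removed++;
      }
      n = next;
      next = n->next;
   }
   return removed;
}
```

With nodes 1..5 in one list and the range covering all of them, the loop
visits node 5 while next points at the tail sentinel, then stops once n is the
sentinel and next is NULL, so the sentinel itself is never visited.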


[Mesa-dev] [PATCH 0/9] i915: Gen2 texturing fixes and a few random patches

2014-08-06 Thread ville . syrjala
From: Ville Syrjälä ville.syrj...@linux.intel.com

I had a few rainy days during my summer vacation so I decided to fix a
chromium-bsu texturing problem that was nagging me for a while now. I
ended up fixing a few other things too that I spotted mostly from reading
the code.

The aniso vs. mip filter thing probably comes down to personal preference,
but at least to me aniso+mip nearest looks better than trilinear. At least
when playing the old classic glaxium :)

I have no idea if the scissor patch makes any difference anywhere. I just
caught the note in the spec and noticed we're doing it in the opposite order.

The rest should be pretty clear.

Ville Syrjälä (9):
  i915: Only use TEXCOORDTYPE_VECTOR with cube maps on gen2
  i915: Fix GL_DOT3_RGBA a bit
  i915: Use L8A8 instead of I8 to simulate A8 on gen2
  i915: Override mip filter to nearest with aniso
  i915: Accept GL_DEPTH_STENCIL GL_DEPTH_COMPONENT formats for
renderbuffers
  i915: Kill intel_context::hw_stencil
  i915: Protect macro argument for TEXTURE_SET()
  i915: Don't call _mesa_meta_glsl_Clear() on gen2
  i915: Emit 3DSTATE_SCISSOR_RECTANGLE_0 before 3DSTATE_SCISSOR_ENABLE

 src/mesa/drivers/dri/i915/i830_context.h|  8 +++---
 src/mesa/drivers/dri/i915/i830_reg.h|  2 +-
 src/mesa/drivers/dri/i915/i830_state.c  |  4 +--
 src/mesa/drivers/dri/i915/i830_texblend.c   |  5 ++--
 src/mesa/drivers/dri/i915/i830_texstate.c   |  4 +--
 src/mesa/drivers/dri/i915/i830_vtbl.c   | 39 +++--
 src/mesa/drivers/dri/i915/i915_context.c|  3 ++-
 src/mesa/drivers/dri/i915/i915_context.h|  8 +++---
 src/mesa/drivers/dri/i915/i915_state.c  |  4 +--
 src/mesa/drivers/dri/i915/i915_vtbl.c   |  8 +++---
 src/mesa/drivers/dri/i915/intel_clear.c |  2 +-
 src/mesa/drivers/dri/i915/intel_context.c   |  1 -
 src/mesa/drivers/dri/i915/intel_context.h   |  1 -
 src/mesa/drivers/dri/i915/intel_fbo.c   |  9 +++
 src/mesa/drivers/dri/i915/intel_tex_image.c | 22 
 15 files changed, 76 insertions(+), 44 deletions(-)

-- 
1.8.5.5



[Mesa-dev] [PATCH 3/9] i915: Use L8A8 instead of I8 to simulate A8 on gen2

2014-08-06 Thread ville . syrjala
From: Ville Syrjälä ville.syrj...@linux.intel.com

Gen2 doesn't support the A8 texture format. Currently the driver
substitutes it with I8, but that results in incorrect RGB values.
Use L8A8 instead. We end up wasting a bit of memory, but at least
we should get the correct results.
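
As an illustration of why I8 gives wrong RGB while a luminance-alpha format
does not, here is a small sketch in C. It only models how the three base
formats expand to RGBA when sampled; the struct and function names are
hypothetical, and the byte order used in expand_a8_to_la() (luminance first,
alpha second) is an assumption, not something taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* RGBA result of sampling one texel, 8 bits per channel. */
struct rgba8 {
   uint8_t r, g, b, a;
};

/* GL_ALPHA semantics: RGB is zero, alpha comes from the texel. */
static struct rgba8 sample_alpha(uint8_t a)
{
   return (struct rgba8){ 0, 0, 0, a };
}

/* GL_INTENSITY semantics: the texel is replicated into all four channels. */
static struct rgba8 sample_i8(uint8_t i)
{
   return (struct rgba8){ i, i, i, i };
}

/* GL_LUMINANCE_ALPHA semantics: L goes to RGB, A goes to alpha. */
static struct rgba8 sample_l8a8(uint8_t l, uint8_t a)
{
   return (struct rgba8){ l, l, l, a };
}

/* Expand n alpha-only texels into (L, A) byte pairs with L = 0.
 * The (L, A) byte order is assumed for illustration only. */
static void expand_a8_to_la(const uint8_t *src, uint8_t *dst, size_t n)
{
   for (size_t i = 0; i < n; i++) {
      dst[2 * i + 0] = 0;        /* luminance: forces RGB to zero */
      dst[2 * i + 1] = src[i];   /* alpha: copied unchanged */
   }
}
```

For an alpha value of 0x80, sample_i8() yields RGB of 0x80, whereas the
expanded pair sampled through sample_l8a8() yields the same (0, 0, 0, 0x80)
as sample_alpha(), at the cost of two bytes per texel instead of one.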

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72819
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80050
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38873
Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/i830_texstate.c   |  2 --
 src/mesa/drivers/dri/i915/i915_context.c|  3 ++-
 src/mesa/drivers/dri/i915/intel_tex_image.c | 22 ++
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/i830_texstate.c 
b/src/mesa/drivers/dri/i915/i830_texstate.c
index 58d3356..b1414c7 100644
--- a/src/mesa/drivers/dri/i915/i830_texstate.c
+++ b/src/mesa/drivers/dri/i915/i830_texstate.c
@@ -47,8 +47,6 @@ translate_texture_format(GLuint mesa_format)
   return MAPSURF_8BIT | MT_8BIT_L8;
case MESA_FORMAT_I_UNORM8:
   return MAPSURF_8BIT | MT_8BIT_I8;
-   case MESA_FORMAT_A_UNORM8:
-  return MAPSURF_8BIT | MT_8BIT_I8; /* Kludge! */
case MESA_FORMAT_L8A8_UNORM:
   return MAPSURF_16BIT | MT_16BIT_AY88;
case MESA_FORMAT_B5G6R5_UNORM:
diff --git a/src/mesa/drivers/dri/i915/i915_context.c 
b/src/mesa/drivers/dri/i915/i915_context.c
index 7f43896..3fd571d 100644
--- a/src/mesa/drivers/dri/i915/i915_context.c
+++ b/src/mesa/drivers/dri/i915/i915_context.c
@@ -109,7 +109,8 @@ intel_init_texture_formats(struct gl_context *ctx)
ctx->TextureFormatSupported[MESA_FORMAT_B5G5R5A1_UNORM] = true;
ctx->TextureFormatSupported[MESA_FORMAT_B5G6R5_UNORM] = true;
ctx->TextureFormatSupported[MESA_FORMAT_L_UNORM8] = true;
-   ctx->TextureFormatSupported[MESA_FORMAT_A_UNORM8] = true;
+   if (intel->gen == 3)
+  ctx->TextureFormatSupported[MESA_FORMAT_A_UNORM8] = true;
ctx->TextureFormatSupported[MESA_FORMAT_I_UNORM8] = true;
ctx->TextureFormatSupported[MESA_FORMAT_L8A8_UNORM] = true;
 
diff --git a/src/mesa/drivers/dri/i915/intel_tex_image.c 
b/src/mesa/drivers/dri/i915/intel_tex_image.c
index 57674b9..be9a4ff 100644
--- a/src/mesa/drivers/dri/i915/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i915/intel_tex_image.c
@@ -14,6 +14,7 @@
 #include "main/texobj.h"
 #include "main/teximage.h"
 #include "main/texstore.h"
+#include "main/texformat.h"
 
 #include "intel_context.h"
 #include "intel_mipmap_tree.h"
@@ -362,9 +363,30 @@ intel_image_target_texture_2d(struct gl_context *ctx, 
GLenum target,
   image->tile_x, image->tile_y);
 }
 
+static mesa_format intel_choose_tex_format(struct gl_context *ctx,
+   GLenum target,
+   GLint internalFormat,
+   GLenum format, GLenum type)
+{
+   struct intel_context *intel = intel_context(ctx);
+
+   switch (internalFormat) {
+   case GL_ALPHA:
+   case GL_ALPHA4:
+   case GL_ALPHA8:
+  /* no A8 on gen2 :( */
+  if (intel->gen == 2)
+ return MESA_FORMAT_L8A8_UNORM;
+  /* fall through */
+   default:
+  return _mesa_choose_tex_format(ctx, target, internalFormat, format, type);
+   }
+}
+
 void
 intelInitTextureImageFuncs(struct dd_function_table *functions)
 {
functions->TexImage = intelTexImage;
+   functions->ChooseTextureFormat = intel_choose_tex_format;
functions->EGLImageTargetTexture2D = intel_image_target_texture_2d;
 }
-- 
1.8.5.5



[Mesa-dev] [PATCH 1/9] i915: Only use TEXCOORDTYPE_VECTOR with cube maps on gen2

2014-08-06 Thread ville . syrjala
From: Ville Syrjälä ville.syrj...@linux.intel.com

Check that the target is GL_TEXTURE_CUBE_MAP before emitting
TEXCOORDTYPE_VECTOR texture coordinates.

I'm not sure if the hardware would like CARTESIAN coordinates
with cube maps, and as I'm too lazy to find out just emit the
VECTOR coordinates for cube maps always. For other targets use
CARTESIAN or HOMOGENEOUS depending on the number of texture
coordinates provided.
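
The selection rule described above can be sketched as a standalone C function.
This is only an illustration of the decision, not the driver code: the enum,
the struct and the function name are hypothetical, and the real patch works on
hardware state words and EMIT_* attributes instead.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three coordinate types named in the text. */
enum coord_type {
   COORD_CARTESIAN,
   COORD_HOMOGENEOUS,
   COORD_VECTOR
};

struct coord_choice {
   enum coord_type type;
   int emit_size;   /* number of components actually emitted */
   bool valid;      /* false when the unit should be skipped */
};

/* is_cube_map: whether the bound texture target is a cube map.
 * sz: number of texture coordinates provided for this unit. */
static struct coord_choice choose_coord_type(bool is_cube_map, int sz)
{
   struct coord_choice c = { COORD_CARTESIAN, 0, false };

   if (is_cube_map) {
      /* Cube maps always get VECTOR coordinates. */
      c.type = COORD_VECTOR;
      c.emit_size = 3;
      c.valid = true;
      return c;
   }

   switch (sz) {
   case 1:
   case 2:
   case 3:
      /* Three coordinates on a non-cube target no longer mean VECTOR. */
      c.type = COORD_CARTESIAN;
      c.emit_size = 2;
      c.valid = true;
      break;
   case 4:
      c.type = COORD_HOMOGENEOUS;
      c.emit_size = 3;
      c.valid = true;
      break;
   default:
      break;   /* nothing to emit for this unit */
   }
   return c;
}
```

The case that matters for the fix is choose_coord_type(false, 3): a 2D texture
with three coordinates now ends up as CARTESIAN with two components.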

Fixes rendering of the electric background texture in chromium-bsu
main menu. We appear to be provided with three texture coordinates
there (I'm guessing due to the funky texture matrix rotation it does).
So the code would decide to use TEXCOORDTYPE_VECTOR instead of
TEXCOORDTYPE_CARTESIAN even though we're dealing with a 2D texture.
The results weren't what one might expect.

demos/cubemap still works, which hopefully indicates that this doesn't
break things.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/i830_vtbl.c | 37 ++-
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/i830_vtbl.c 
b/src/mesa/drivers/dri/i915/i830_vtbl.c
index 53d408b..0f22d86 100644
--- a/src/mesa/drivers/dri/i915/i830_vtbl.c
+++ b/src/mesa/drivers/dri/i915/i830_vtbl.c
@@ -134,27 +134,28 @@ i830_render_start(struct intel_context *intel)
 GLuint mcs = (i830->state.Tex[i][I830_TEXREG_MCS] &
   ~TEXCOORDTYPE_MASK);
 
-switch (sz) {
-case 1:
-case 2:
-   emit = EMIT_2F;
-   sz = 2;
-   mcs |= TEXCOORDTYPE_CARTESIAN;
-   break;
-case 3:
+if (intel->ctx.Texture.Unit[i]._Current->Target == GL_TEXTURE_CUBE_MAP) {
emit = EMIT_3F;
sz = 3;
mcs |= TEXCOORDTYPE_VECTOR;
-   break;
-case 4:
-   emit = EMIT_3F_XYW;
-   sz = 3;
-   mcs |= TEXCOORDTYPE_HOMOGENEOUS;
-   break;
-default:
-   continue;
-};
-
+} else {
+   switch (sz) {
+   case 1:
+   case 2:
+   case 3:
+  emit = EMIT_2F;
+  sz = 2;
+  mcs |= TEXCOORDTYPE_CARTESIAN;
+  break;
+   case 4:
+  emit = EMIT_3F_XYW;
+  sz = 3;
+  mcs |= TEXCOORDTYPE_HOMOGENEOUS;
+  break;
+   default:
+  continue;
+   }
+}
 
 EMIT_ATTR(_TNL_ATTRIB_TEX0 + i, emit, 0);
 v2 |= VRTX_TEX_SET_FMT(count, SZ_TO_HW(sz));
-- 
1.8.5.5



[Mesa-dev] [PATCH 6/9] i915: Kill intel_context::hw_stencil

2014-08-06 Thread ville . syrjala
From: Ville Syrjälä ville.syrj...@linux.intel.com

ctx.hw_stencil is not used anywhere so kill it.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/intel_context.c | 1 -
 src/mesa/drivers/dri/i915/intel_context.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/intel_context.c 
b/src/mesa/drivers/dri/i915/intel_context.c
index 3104776..12a1d2b 100644
--- a/src/mesa/drivers/dri/i915/intel_context.c
+++ b/src/mesa/drivers/dri/i915/intel_context.c
@@ -507,7 +507,6 @@ intelInitContext(struct intel_context *intel,
 
_mesa_meta_init(ctx);
 
-   intel->hw_stencil = mesaVis && mesaVis->stencilBits && mesaVis->depthBits == 24;
intel->hw_stipple = 1;
 
intel->RenderIndex = ~0;
diff --git a/src/mesa/drivers/dri/i915/intel_context.h 
b/src/mesa/drivers/dri/i915/intel_context.h
index fccf821..c314594 100644
--- a/src/mesa/drivers/dri/i915/intel_context.h
+++ b/src/mesa/drivers/dri/i915/intel_context.h
@@ -226,7 +226,6 @@ struct intel_context
 
GLfloat polygon_offset_scale;/* dependent on depth_scale, bpp */
 
-   bool hw_stencil;
bool hw_stipple;
bool no_rast;
bool always_flush_batch;
-- 
1.8.5.5



[Mesa-dev] [PATCH 5/9] i915: Accept GL_DEPTH_STENCIL GL_DEPTH_COMPONENT formats for renderbuffers

2014-08-06 Thread ville . syrjala
From: Ville Syrjälä ville.syrj...@linux.intel.com

Gen2 doesn't support depth/stencil textures, and since

 commit c1d4d4999303f9167b20f4e0674b9436e6295cf7
 Author: Ville Syrjälä ville.syrj...@linux.intel.com
 Date:   Thu Apr 24 14:11:43 2014 +0300

i915: Don't advertise Z formats in TextureFormatSupported on gen2

depth/stencil formats are no longer accepted as texture formats.
However we still want depth/stencil renderbuffers, so add explicit
format checks to intel_alloc_renderbuffer_storage() to allow such
things.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/intel_fbo.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/mesa/drivers/dri/i915/intel_fbo.c 
b/src/mesa/drivers/dri/i915/intel_fbo.c
index a77c3d6..b260d16 100644
--- a/src/mesa/drivers/dri/i915/intel_fbo.c
+++ b/src/mesa/drivers/dri/i915/intel_fbo.c
@@ -180,6 +180,15 @@ intel_renderbuffer_format(struct gl_context * ctx, GLenum 
internalFormat)
   return intel->ctx.Driver.ChooseTextureFormat(ctx, GL_TEXTURE_2D,
internalFormat,
GL_NONE, GL_NONE);
+
+   case GL_DEPTH_COMPONENT16:
+  return MESA_FORMAT_Z_UNORM16;
+   case GL_DEPTH_COMPONENT:
+   case GL_DEPTH_COMPONENT24:
+   case GL_DEPTH_COMPONENT32:
+  return MESA_FORMAT_Z24_UNORM_X8_UINT;
+   case GL_DEPTH_STENCIL_EXT:
+   case GL_DEPTH24_STENCIL8_EXT:
case GL_STENCIL_INDEX:
case GL_STENCIL_INDEX1_EXT:
case GL_STENCIL_INDEX4_EXT:
-- 
1.8.5.5



[Mesa-dev] [PATCH 9/9] i915: Emit 3DSTATE_SCISSOR_RECTANGLE_0 before 3DSTATE_SCISSOR_ENABLE

2014-08-06 Thread ville . syrjala
From: Ville Syrjälä ville.syrj...@linux.intel.com

According to gen2 BSpec the pipeline must be flushed at least up to the
windower before changing the scissor rect enable field. Emitting the
3DSTATE_SCISSOR_RECTANGLE_0 before 3DSTATE_SCISSOR_ENABLE is sufficient
to do that.

gen3 BSpec no longer has that piece of text, but let's make the same
change there too for symmetry. The spec does still say that the scissor
rectangle must be defined before enabling it, so the new order does seem
more in line with the spec.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/i830_context.h | 8 
 src/mesa/drivers/dri/i915/i830_state.c   | 4 ++--
 src/mesa/drivers/dri/i915/i830_vtbl.c| 2 +-
 src/mesa/drivers/dri/i915/i915_context.h | 8 
 src/mesa/drivers/dri/i915/i915_state.c   | 4 ++--
 src/mesa/drivers/dri/i915/i915_vtbl.c| 8 
 6 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/i830_context.h 
b/src/mesa/drivers/dri/i915/i830_context.h
index 1a7222d..09076c3 100644
--- a/src/mesa/drivers/dri/i915/i830_context.h
+++ b/src/mesa/drivers/dri/i915/i830_context.h
@@ -55,10 +55,10 @@
 #define I830_DESTREG_DBUFADDR1 3
 #define I830_DESTREG_DV0 4
 #define I830_DESTREG_DV1 5
-#define I830_DESTREG_SENABLE 6
-#define I830_DESTREG_SR0 7
-#define I830_DESTREG_SR1 8
-#define I830_DESTREG_SR2 9
+#define I830_DESTREG_SR0 6
+#define I830_DESTREG_SR1 7
+#define I830_DESTREG_SR2 8
+#define I830_DESTREG_SENABLE 9
 #define I830_DESTREG_DRAWRECT0 10
 #define I830_DESTREG_DRAWRECT1 11
 #define I830_DESTREG_DRAWRECT2 12
diff --git a/src/mesa/drivers/dri/i915/i830_state.c 
b/src/mesa/drivers/dri/i915/i830_state.c
index bae9204..3e379f3 100644
--- a/src/mesa/drivers/dri/i915/i830_state.c
+++ b/src/mesa/drivers/dri/i915/i830_state.c
@@ -1069,11 +1069,11 @@ i830_init_packets(struct i830_context *i830)
i830->state.Stipple[I830_STPREG_ST0] = _3DSTATE_STIPPLE;
 
i830->state.Buffer[I830_DESTREG_DV0] = _3DSTATE_DST_BUF_VARS_CMD;
-   i830->state.Buffer[I830_DESTREG_SENABLE] = (_3DSTATE_SCISSOR_ENABLE_CMD |
-   DISABLE_SCISSOR_RECT);
i830->state.Buffer[I830_DESTREG_SR0] = _3DSTATE_SCISSOR_RECT_0_CMD;
i830->state.Buffer[I830_DESTREG_SR1] = 0;
i830->state.Buffer[I830_DESTREG_SR2] = 0;
+   i830->state.Buffer[I830_DESTREG_SENABLE] = (_3DSTATE_SCISSOR_ENABLE_CMD |
+   DISABLE_SCISSOR_RECT);
 }
 
 void
diff --git a/src/mesa/drivers/dri/i915/i830_vtbl.c 
b/src/mesa/drivers/dri/i915/i830_vtbl.c
index 0f22d86..91da977 100644
--- a/src/mesa/drivers/dri/i915/i830_vtbl.c
+++ b/src/mesa/drivers/dri/i915/i830_vtbl.c
@@ -511,10 +511,10 @@ i830_emit_state(struct intel_context *intel)
 
   OUT_BATCH(state->Buffer[I830_DESTREG_DV0]);
   OUT_BATCH(state->Buffer[I830_DESTREG_DV1]);
-  OUT_BATCH(state->Buffer[I830_DESTREG_SENABLE]);
   OUT_BATCH(state->Buffer[I830_DESTREG_SR0]);
   OUT_BATCH(state->Buffer[I830_DESTREG_SR1]);
   OUT_BATCH(state->Buffer[I830_DESTREG_SR2]);
+  OUT_BATCH(state->Buffer[I830_DESTREG_SENABLE]);
 
   assert(state->Buffer[I830_DESTREG_DRAWRECT0] != MI_NOOP);
   OUT_BATCH(state->Buffer[I830_DESTREG_DRAWRECT0]);
diff --git a/src/mesa/drivers/dri/i915/i915_context.h 
b/src/mesa/drivers/dri/i915/i915_context.h
index 34af202..10f1f8b 100644
--- a/src/mesa/drivers/dri/i915/i915_context.h
+++ b/src/mesa/drivers/dri/i915/i915_context.h
@@ -64,10 +64,10 @@
 #define I915_DESTREG_DBUFADDR1 4
 #define I915_DESTREG_DV0 6
 #define I915_DESTREG_DV1 7
-#define I915_DESTREG_SENABLE 8
-#define I915_DESTREG_SR0 9
-#define I915_DESTREG_SR1 10
-#define I915_DESTREG_SR2 11
+#define I915_DESTREG_SR0 8
+#define I915_DESTREG_SR1 9
+#define I915_DESTREG_SR2 10
+#define I915_DESTREG_SENABLE 11
 #define I915_DESTREG_DRAWRECT0 12
 #define I915_DESTREG_DRAWRECT1 13
 #define I915_DESTREG_DRAWRECT2 14
diff --git a/src/mesa/drivers/dri/i915/i915_state.c 
b/src/mesa/drivers/dri/i915/i915_state.c
index f31b271..203e4a0 100644
--- a/src/mesa/drivers/dri/i915/i915_state.c
+++ b/src/mesa/drivers/dri/i915/i915_state.c
@@ -988,11 +988,11 @@ i915_init_packets(struct i915_context *i915)
   i915->state.Buffer[I915_DESTREG_DV0] = _3DSTATE_DST_BUF_VARS_CMD;
 
   /* scissor */
-  i915->state.Buffer[I915_DESTREG_SENABLE] =
- (_3DSTATE_SCISSOR_ENABLE_CMD | DISABLE_SCISSOR_RECT);
   i915->state.Buffer[I915_DESTREG_SR0] = _3DSTATE_SCISSOR_RECT_0_CMD;
   i915->state.Buffer[I915_DESTREG_SR1] = 0;
   i915->state.Buffer[I915_DESTREG_SR2] = 0;
+  i915->state.Buffer[I915_DESTREG_SENABLE] =
+ (_3DSTATE_SCISSOR_ENABLE_CMD | DISABLE_SCISSOR_RECT);
}
 
i915->state.RasterRules[I915_RASTER_RULES] = _3DSTATE_RASTER_RULES_CMD |
diff --git a/src/mesa/drivers/dri/i915/i915_vtbl.c 
b/src/mesa/drivers/dri/i915/i915_vtbl.c
index 74173d4..706e0c3 100644
--- a/src/mesa/drivers/dri/i915/i915_vtbl.c
+++ 

[Mesa-dev] [PATCH 8/9] i915: Don't call _mesa_meta_glsl_Clear() on gen2

2014-08-06 Thread ville . syrjala
From: Ville Syrjälä ville.syrj...@linux.intel.com

Gen2 doesn't have fragment shaders so we shouldn't be calling
_mesa_meta_glsl_Clear() on gen2. Restore the appropriate
ARB_fragment_shader check to the clear path which was lost in:

 commit 94f22fbe787214580a1a13a774114d2650c166cb
 Author: Tapani Pälli tapani.pa...@intel.com
 Date:   Wed Aug 8 20:46:45 2012 +0300

intel: use _mesa_meta_Clear with OpenGL ES 1.1 v2

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/intel_clear.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i915/intel_clear.c 
b/src/mesa/drivers/dri/i915/intel_clear.c
index 1762c1d..5374e19 100644
--- a/src/mesa/drivers/dri/i915/intel_clear.c
+++ b/src/mesa/drivers/dri/i915/intel_clear.c
@@ -179,7 +179,7 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
 
if (tri_mask) {
   debug_mask("tri", tri_mask);
-  if (ctx->API == API_OPENGLES)
+  if (ctx->API == API_OPENGLES || !ctx->Extensions.ARB_fragment_shader)
 _mesa_meta_Clear(&intel->ctx, tri_mask);
   else
 _mesa_meta_glsl_Clear(&intel->ctx, tri_mask);
-- 
1.8.5.5



[Mesa-dev] [PATCH 7/9] i915: Protect macro argument for TEXTURE_SET()

2014-08-06 Thread ville . syrjala
From: Ville Syrjälä ville.syrj...@linux.intel.com

TEXTURE_SET() is the only register macro that forgets to wrap the
argument evaluation in parens. Only simple integers are passed to this
macro so there's no bug but still it seems prudent to add the
parens.
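
For readers who want to see what the missing parens could do, here is a small
sketch in C. The _OLD and _NEW suffixes and the two helper functions are
hypothetical names used only to put both macro variants side by side; nothing
in the driver passes a compound expression like this today.

```c
#include <stdint.h>

/* Hypothetical names: the _OLD/_NEW suffixes are not in the patch. */
#define TEXTURE_SET_OLD(x) (x << 29)
#define TEXTURE_SET_NEW(x) ((x) << 29)

/* Expands to (1u | 2u << 29): the shift binds tighter than '|',
 * so only the 2u is shifted. */
static uint32_t texture_set_old_example(void)
{
   return TEXTURE_SET_OLD(1u | 2u);
}

/* Expands to ((1u | 2u) << 29): the whole argument is shifted. */
static uint32_t texture_set_new_example(void)
{
   return TEXTURE_SET_NEW(1u | 2u);
}
```

With a plain integer argument both variants agree, which is why the existing
callers are unaffected; they only diverge once the argument contains an
operator with lower precedence than the shift.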

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/i830_reg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i915/i830_reg.h 
b/src/mesa/drivers/dri/i915/i830_reg.h
index e08cbe5..d7ccc16 100644
--- a/src/mesa/drivers/dri/i915/i830_reg.h
+++ b/src/mesa/drivers/dri/i915/i830_reg.h
@@ -256,7 +256,7 @@
 
 #define _3DSTATE_MAP_COORD_TRANSFORM   ((3<<29)|(0x1d<<24)|(0x8c<<16))
 #define DISABLE_TEX_TRANSFORM  (1<<28)
-#define TEXTURE_SET(x) (x<<29)
+#define TEXTURE_SET(x) ((x)<<29)
 
 #define _3DSTATE_VERTEX_TRANSFORM  ((3<<29)|(0x1d<<24)|(0x8b<<16))
 #define DISABLE_VIEWPORT_TRANSFORM (1<<31)
-- 
1.8.5.5



[Mesa-dev] [PATCH 4/9] i915: Override mip filter to nearest with aniso

2014-08-06 Thread ville . syrjala
From: Ville Syrjälä ville.syrj...@linux.intel.com

gen2 doesn't support linear mip filter with anisotropic min/mag
filtering. The hardware would automagically downgrade the min/mag
filters to linear in such cases, which IMO looks worse than forcing
the mip filter to nearest.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/i830_texstate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i915/i830_texstate.c 
b/src/mesa/drivers/dri/i915/i830_texstate.c
index b1414c7..00731e6 100644
--- a/src/mesa/drivers/dri/i915/i830_texstate.c
+++ b/src/mesa/drivers/dri/i915/i830_texstate.c
@@ -225,6 +225,8 @@ i830_update_tex_unit(struct intel_context *intel, GLuint 
unit, GLuint ss3)
   if (sampler->MaxAnisotropy > 1.0) {
  minFilt = FILTER_ANISOTROPIC;
  magFilt = FILTER_ANISOTROPIC;
+ /* no trilinear + anisotropic */
+ mipFilt = MIPFILTER_NEAREST;
   }
   else {
  switch (sampler->MagFilter) {
-- 
1.8.5.5



[Mesa-dev] [PATCH 2/9] i915: Fix GL_DOT3_RGBA a bit

2014-08-06 Thread ville . syrjala
From: Ville Syrjälä ville.syrj...@linux.intel.com

The spec says using DOT4 for alpha is undefined unless DOT4 is also used
for color. It seems to do the right thing anyway, but better safe than sorry.

Also override numAlphaArgs to 2 for DOT4 since that's what it wants.
This might fix something in case the specified alpha mode has only one
argument. Also avoids emitting a needless 3DSTATE_MAP_BLEND_ARG if
the specified alpha mode has three arguments.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/i830_texblend.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/i830_texblend.c 
b/src/mesa/drivers/dri/i915/i830_texblend.c
index 236be59..feea383 100644
--- a/src/mesa/drivers/dri/i915/i830_texblend.c
+++ b/src/mesa/drivers/dri/i915/i830_texblend.c
@@ -136,7 +136,7 @@ i830SetTexEnvCombine(struct i830_context * i830,
  GLuint texel_op, GLuint * state, const GLfloat * factor)
 {
const GLuint numColorArgs = combine->_NumArgsRGB;
-   const GLuint numAlphaArgs = combine->_NumArgsA;
+   GLuint numAlphaArgs = combine->_NumArgsA;
 
GLuint blendop;
GLuint ablendop;
@@ -209,7 +209,7 @@ i830SetTexEnvCombine(struct i830_context * i830,
   break;
case GL_DOT3_RGBA_EXT:
case GL_DOT3_RGBA:
-  blendop = TEXBLENDOP_DOT3;
+  blendop = TEXBLENDOP_DOT4;
   break;
default:
   return pass_through(state, blendUnit);
@@ -273,6 +273,7 @@ i830SetTexEnvCombine(struct i830_context * i830,
if (combine->ModeRGB == GL_DOT3_RGBA_EXT ||
combine->ModeRGB == GL_DOT3_RGBA) {
   ablendop = TEXBLENDOP_DOT4;
+  numAlphaArgs = 2;
   args_A[0] = TEXBLENDARG_FACTOR;   /* the global factor */
   args_A[1] = TEXBLENDARG_FACTOR;
   args_A[2] = TEXBLENDARG_FACTOR;
-- 
1.8.5.5



Re: [Mesa-dev] [PATCH] i965: Fix z_offset computation in intel_miptree_unmap_depthstencil()

2014-08-06 Thread Anuj Phogat
On Wed, Aug 6, 2014 at 9:50 AM, Neil Roberts n...@linux.intel.com wrote:
 I'd just like to point out that I made a nearly identical patch before
 this patch was posted but I didn't get any review despite prodding
 people a few times on #dri-devel. Maybe we should try to get into the
 habit of searching patchwork for existing patches before posting to the
 list. Does anyone have any suggestions for how I can get my patches more
 noticed?

I'm sorry Neil. I should've looked first on the mailing list to avoid this
duplication of efforts. I'll try to look more often in patchwork for the
patches pending review. I would also request everyone to keep the status
of their patches on patchwork updated. That'll make it easier to go through
pending patches. I usually send out a reminder on the mailing list to attract
wider attention for my patches.

 http://patchwork.freedesktop.org/patch/27168/

 I also made a piglit test for the problem here:

 http://cgit.freedesktop.org/piglit/commit/?id=108a17a4d78bcc7480754d2104b4

 Regards,
 - Neil

 Jordan Justen jljus...@gmail.com writes:

 Reviewed-by: Jordan Justen jordan.l.jus...@intel.com

 On Wed, Jul 16, 2014 at 3:32 PM, Anuj Phogat anuj.pho...@gmail.com wrote:
 The bug is triggered by using glTexSubImage2d() with GL_DEPTH_STENCIL
 as base internal format and non-zero x, y offsets. Currently x, y
 offsets are ignored while updating the texture image.
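
A worked example of the offset arithmetic may help here. The sketch below is
plain C with hypothetical names (struct map_rect, z_offset_old,
z_offset_fixed); it assumes 4 bytes per depth texel, which is what the
division of the pitch by 4 in the patch suggests, and it is not the driver
function itself.

```c
#include <stddef.h>

/* Hypothetical: origin of the mapped sub-rectangle within the image. */
struct map_rect {
   int x, y;
};

/* Offset in texels that ignores the map origin. */
static ptrdiff_t z_offset_old(int x, int y, int z_image_x, int z_image_y,
                              int pitch_bytes)
{
   return (ptrdiff_t)(y + z_image_y) * (pitch_bytes / 4) + (x + z_image_x);
}

/* Offset in texels that also adds the map origin to both axes. */
static ptrdiff_t z_offset_fixed(int x, int y, int z_image_x, int z_image_y,
                                int pitch_bytes, struct map_rect map)
{
   return (ptrdiff_t)(y + z_image_y + map.y) * (pitch_bytes / 4) +
          (x + z_image_x + map.x);
}
```

With a 256-byte pitch (64 texels per row), an image at (0, 0) and a map
origin of (8, 4), the first texel of the mapped region lands at offset 0 with
the old formula but at 4 * 64 + 8 = 264 with the fixed one.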

 Fixes Khronos GLES3 CTS tests:
 npot_tex_sub_image_2d
 npot_tex_sub_image_3d
 npot_pbo_tex_sub_image_2d
 npot_pbo_tex_sub_image_2d

 Cc: mesa-sta...@lists.freedesktop.org
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 ---
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
 b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 index 2ab0faa..b36ffc7 100644
 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
 @@ -2129,9 +2129,9 @@ intel_miptree_unmap_depthstencil(struct brw_context 
 *brw,
  x + s_image_x + map->x,
  y + s_image_y + map->y,
  brw->has_swizzling);
 -   ptrdiff_t z_offset = ((y + z_image_y) *
 +   ptrdiff_t z_offset = ((y + z_image_y + map->y) *
   (z_mt->pitch / 4) +
 - (x + z_image_x));
 + (x + z_image_x + map->x));

 if (map_z32f_x24s8) {
   z_map[z_offset] = packed_map[(y * map->w + x) * 2 + 0];
 --
 1.9.3



Re: [Mesa-dev] [PATCH 04/11] i965/blorp: Put sampler index in src1 of texture ops

2014-08-06 Thread Chris Forbes
Yes -- there's no interaction between patches 3 and 4. Blorp only
shares the generator.

On Tue, Aug 5, 2014 at 1:32 PM, Ian Romanick i...@freedesktop.org wrote:
 Does it still build with patch 3 without patch 4?

 On 08/04/2014 01:58 AM, Chris Forbes wrote:
 Signed-off-by: Chris Forbes chr...@ijw.co.nz
 ---
  src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp 
 b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
 index c1676a9..7d4b327 100644
 --- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
 @@ -78,7 +78,8 @@ brw_blorp_eu_emitter::emit_texture_lookup(const struct 
 brw_reg dst,
unsigned base_mrf,
unsigned msg_length)
  {
 -   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, 
 brw_message_reg(base_mrf));
 +   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, brw_message_reg(base_mrf),
 + fs_reg(0u));

 inst->base_mrf = base_mrf;
 inst->mlen = msg_length;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 0/12] Add support for BPTC texture compression

2014-08-06 Thread Chris Forbes
Does this actually work on all Gen7?

The IVB PRM Vol 4 Part 1 Page 83 says:

   Errata: BC6H_SF16, BC6H_UF16, and BC7_SRGB are not supported and
may result in data corruption if used.

On Thu, Aug 7, 2014 at 4:27 AM, Neil Roberts n...@linux.intel.com wrote:
 Here is a v2 of the BPTC texture compression series. The main
 difference is that instead of going via DXT3 for the UNORM formats it
 now always uses the custom naïve compressor for all formats. This
 doesn't give very good-looking results but it is fast and doesn't add
 any dependencies. There was some discussion about alternative
 approaches on the list here:

 http://lists.freedesktop.org/archives/mesa-dev/2014-July/064436.html

 I didn't manage to get any consensus on whether this approach is the
 right thing to do so I thought I would just post the patches and see
 what happens.

 The other changes are:

 • The patches are rebased on top of Jason Ekstrand's texstore changes.
   This required some modification to format_info.py.

 • Added a patch to make glGenerateMipmap work with the BPTC formats.

 • Added a patch to make the meta implementation of glGetTexImage work
   with the two floating-point formats.

 • Added the formats to some format query functions that were missed.
   (There are a lot of switches for formats spread around Mesa!)

 • Fixed setting the alpha component to 1.0 when fetching from the RGB
   half-float formats.

 • Fixed fetching the alpha component from sRGB formats.

 • Fixed the quantization step for the half-float compressor.

 • Fixed a typo causing a bug in the compressor for textures with a
   width that isn't a multiple of four.

 The patches are also available on Github here:

 https://github.com/bpeel/mesa/commits/wip/bptc

 There are piglit tests for BPTC in a branch here:

 https://github.com/bpeel/piglit/commits/wip/bptc

 - Neil

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Merging VC4 driver

2014-08-06 Thread Eric Anholt
I'd like to start merging the VC4 driver.  I've got a lot of things
working under sim (piglit's at 5212/6726 for a slightly-trimmed
quick.py), and once I find where I put my serial cable I hope to get the
kernel fixed up and passing even more than that on HW.

I'm at 80 commits right now, with 3 initial huge commits then actual
incremental development.  I don't think other people are going to want
to review all of this (45 files changed, 9277 insertions(+), 5
deletions(-)), so I'm feeling ready to go ahead on my own.  What I'll
throw out for (maybe) review, though, is the stuff outside of the
driver:

diff --git a/configure.ac b/configure.ac
index a3b3abd..9679c4c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -724,7 +724,7 @@ GALLIUM_DRIVERS_DEFAULT=r300,r600,svga,swrast
 AC_ARG_WITH([gallium-drivers],
 [AS_HELP_STRING([--with-gallium-drivers@:@=DIRS...@:@],
 [comma delimited Gallium drivers list, e.g.
-i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast
+i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4
 @:@default=r300,r600,svga,swrast@:@])],
 [with_gallium_drivers=$withval],
 [with_gallium_drivers=$GALLIUM_DRIVERS_DEFAULT])
@@ -2003,6 +2003,19 @@ if test -n $with_gallium_drivers; then
 GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS dri/kms-swrast"
 fi
 ;;
+xvc4)
+HAVE_GALLIUM_VC4=yes
+gallium_require_drm_loader
+GALLIUM_DRIVERS_DIRS="$GALLIUM_DRIVERS_DIRS vc4"
+gallium_check_st vc4/drm dri-vc4   
+DRICOMMON_NEED_LIBDRM=yes
+
+case $host_cpu in
+i?86 | x86_64 | amd64)
+USE_VC4_SIMULATOR=yes
+;;
+esac
+;;
 *)
 AC_MSG_ERROR([Unknown Gallium driver: $driver])
 ;;
@@ -2064,6 +2077,7 @@ AM_CONDITIONAL(HAVE_GALLIUM_NOUVEAU, test 
x$HAVE_GALLIUM_NOUVEAU = xyes)
 AM_CONDITIONAL(HAVE_GALLIUM_FREEDRENO, test x$HAVE_GALLIUM_FREEDRENO = xyes)
 AM_CONDITIONAL(HAVE_GALLIUM_SOFTPIPE, test x$HAVE_GALLIUM_SOFTPIPE = xyes)
 AM_CONDITIONAL(HAVE_GALLIUM_LLVMPIPE, test x$HAVE_GALLIUM_LLVMPIPE = xyes)
+AM_CONDITIONAL(HAVE_GALLIUM_VC4, test x$HAVE_GALLIUM_VC4 = xyes)
 
 AM_CONDITIONAL(NEED_GALLIUM_SOFTPIPE_DRIVER, test x$HAVE_GALLIUM_SVGA = xyes 
-o \
   x$HAVE_GALLIUM_SOFTPIPE = 
xyes)
@@ -2129,6 +2143,7 @@ AM_CONDITIONAL(HAVE_LOADER_GALLIUM, test 
x$enable_gallium_loader = xyes)
 AM_CONDITIONAL(HAVE_DRM_LOADER_GALLIUM, test x$enable_gallium_drm_loader = 
xyes)
 AM_CONDITIONAL(HAVE_GALLIUM_COMPUTE, test x$enable_opencl = xyes)
 AM_CONDITIONAL(HAVE_MESA_LLVM, test x$MESA_LLVM = x1)
+AM_CONDITIONAL(USE_VC4_SIMULATOR, test x$USE_VC4_SIMULATOR = xyes)
 
 AC_SUBST([ELF_LIB])
 
@@ -2201,6 +2216,7 @@ AC_CONFIG_FILES([Makefile
src/gallium/drivers/softpipe/Makefile
src/gallium/drivers/svga/Makefile
src/gallium/drivers/trace/Makefile
+   src/gallium/drivers/vc4/Makefile
src/gallium/state_trackers/Makefile
src/gallium/state_trackers/clover/Makefile
src/gallium/state_trackers/dri/Makefile
@@ -2243,6 +2259,7 @@ AC_CONFIG_FILES([Makefile
src/gallium/winsys/sw/wayland/Makefile
src/gallium/winsys/sw/wrapper/Makefile
src/gallium/winsys/sw/xlib/Makefile
+   src/gallium/winsys/vc4/drm/Makefile
src/gbm/Makefile
src/gbm/main/gbm.pc
src/glsl/Makefile
diff --git a/src/gallium/auxiliary/target-helpers/inline_drm_helper.h 
b/src/gallium/auxiliary/target-helpers/inline_drm_helper.h
index 5d02da7..4ef94de 100644
--- a/src/gallium/auxiliary/target-helpers/inline_drm_helper.h
+++ b/src/gallium/auxiliary/target-helpers/inline_drm_helper.h
@@ -54,6 +54,10 @@
 #include "freedreno/drm/freedreno_drm_public.h"
 #endif
 
+#if GALLIUM_VC4
+#include "vc4/drm/vc4_drm_public.h"
+#endif
+
 static char* driver_name = NULL;
 
 /* XXX: We need to teardown the winsys if *screen_create() fails. */
@@ -286,6 +290,48 @@ pipe_freedreno_create_screen(int fd)
 }
 #endif
 
+#if defined(GALLIUM_VC4)
+#if defined(DRI_TARGET)
+
+const __DRIextension **__driDriverGetExtensions_vc4(void);
+
+PUBLIC const __DRIextension **__driDriverGetExtensions_vc4(void)
+{
+   globalDriverAPI = galliumdrm_driver_api;
+   return galliumdrm_driver_extensions;
+}
+
+#if defined(USE_VC4_SIMULATOR)
+const __DRIextension **__driDriverGetExtensions_i965(void);
+
+/**
+ * When building using the simulator (on x86), we advertise ourselves as the
+ * i965 driver so that you can just make a directory with a link from
+ * i965_dri.so to the built vc4_dri.so, and point LIBGL_DRIVERS_PATH to that
+ * on your i965-using host to run the driver under simulation.
+ *
+ * This is, of course, incompatible with building with the ilo driver, but you
+ * shouldn't be building that anyway.

Re: [Mesa-dev] [PATCH 3/9] i915: Use L8A8 instead of I8 to simulate A8 on gen2

2014-08-06 Thread Eric Anholt
ville.syrj...@linux.intel.com writes:

 From: Ville Syrjälä ville.syrj...@linux.intel.com

 Gen2 doesn't support the A8 texture format. Currently the driver
 substitutes it with I8, but that results in incorrect RGB values.
 Use A8L8 instead. We end up wasting a bit of memory, but at least
 we should get the correct results.

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72819
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80050
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38873
 Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
 ---
  src/mesa/drivers/dri/i915/i830_texstate.c   |  2 --
  src/mesa/drivers/dri/i915/i915_context.c|  3 ++-
  src/mesa/drivers/dri/i915/intel_tex_image.c | 22 ++
  3 files changed, 24 insertions(+), 3 deletions(-)

 diff --git a/src/mesa/drivers/dri/i915/i830_texstate.c 
 b/src/mesa/drivers/dri/i915/i830_texstate.c
 index 58d3356..b1414c7 100644
 --- a/src/mesa/drivers/dri/i915/i830_texstate.c
 +++ b/src/mesa/drivers/dri/i915/i830_texstate.c
 @@ -47,8 +47,6 @@ translate_texture_format(GLuint mesa_format)
return MAPSURF_8BIT | MT_8BIT_L8;
 case MESA_FORMAT_I_UNORM8:
return MAPSURF_8BIT | MT_8BIT_I8;
 -   case MESA_FORMAT_A_UNORM8:
 -  return MAPSURF_8BIT | MT_8BIT_I8; /* Kludge! */
 case MESA_FORMAT_L8A8_UNORM:
return MAPSURF_16BIT | MT_16BIT_AY88;
 case MESA_FORMAT_B5G6R5_UNORM:
 diff --git a/src/mesa/drivers/dri/i915/i915_context.c 
 b/src/mesa/drivers/dri/i915/i915_context.c
 index 7f43896..3fd571d 100644
 --- a/src/mesa/drivers/dri/i915/i915_context.c
 +++ b/src/mesa/drivers/dri/i915/i915_context.c
 @@ -109,7 +109,8 @@ intel_init_texture_formats(struct gl_context *ctx)
 ctx->TextureFormatSupported[MESA_FORMAT_B5G5R5A1_UNORM] = true;
 ctx->TextureFormatSupported[MESA_FORMAT_B5G6R5_UNORM] = true;
 ctx->TextureFormatSupported[MESA_FORMAT_L_UNORM8] = true;
 -   ctx->TextureFormatSupported[MESA_FORMAT_A_UNORM8] = true;
 +   if (intel->gen == 3)
 +  ctx->TextureFormatSupported[MESA_FORMAT_A_UNORM8] = true;
 ctx->TextureFormatSupported[MESA_FORMAT_I_UNORM8] = true;
 ctx->TextureFormatSupported[MESA_FORMAT_L8A8_UNORM] = true;
  
 diff --git a/src/mesa/drivers/dri/i915/intel_tex_image.c 
 b/src/mesa/drivers/dri/i915/intel_tex_image.c
 index 57674b9..be9a4ff 100644
 --- a/src/mesa/drivers/dri/i915/intel_tex_image.c
 +++ b/src/mesa/drivers/dri/i915/intel_tex_image.c
 @@ -14,6 +14,7 @@
  #include "main/texobj.h"
  #include "main/teximage.h"
  #include "main/texstore.h"
 +#include "main/texformat.h"
  
  #include "intel_context.h"
  #include "intel_mipmap_tree.h"
 @@ -362,9 +363,30 @@ intel_image_target_texture_2d(struct gl_context *ctx, 
 GLenum target,
image->tile_x, image->tile_y);
  }
  
 +static mesa_format intel_choose_tex_format(struct gl_context *ctx,
 +   GLenum target,
 +   GLint internalFormat,
 +   GLenum format, GLenum type)
 +{
 +   struct intel_context *intel = intel_context(ctx);
 +
 +   switch (internalFormat) {
 +   case GL_ALPHA:
 +   case GL_ALPHA4:
 +   case GL_ALPHA8:
 +  /* no A8 on gen2 :( */
 +  if (intel->gen == 2)
 + return MESA_FORMAT_L8A8_UNORM;
 +  /* fall through */
 +   default:
 +  return _mesa_choose_tex_format(ctx, target, internalFormat, format, 
 type);
 +   }
 +}

Instead, I'd rather see _mesa_choose_tex_format just grow another case:

  RETURN_IF_SUPPORTED(MESA_FORMAT_L8A8_UNORM);
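
The fallback suggested above can be pictured with a small standalone sketch. Everything in it is an assumption made for illustration: the `sketch_` names, the reduced format enum, and the macro body are stand-ins modeled on the `RETURN_IF_SUPPORTED` name, not Mesa's actual definitions. The idea is that the shared format chooser tries A8 first and falls back to L8A8 when the driver has not marked A8 as supported, so the driver would not need its own wrapper function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for Mesa's format enum. */
enum sketch_format {
   SKETCH_FORMAT_NONE,
   SKETCH_FORMAT_A_UNORM8,
   SKETCH_FORMAT_L8A8_UNORM,
   SKETCH_FORMAT_COUNT
};

/* Hypothetical stand-in for the per-context support table. */
struct sketch_context {
   bool TextureFormatSupported[SKETCH_FORMAT_COUNT];
};

/* Assumed macro body: return the format as soon as the driver supports it. */
#define RETURN_IF_SUPPORTED(f)                \
   do {                                       \
      if (ctx->TextureFormatSupported[f])     \
         return f;                            \
   } while (0)

/* Choose a format for GL_ALPHA-style requests: prefer A8, else L8A8. */
static enum sketch_format
sketch_choose_alpha_format(const struct sketch_context *ctx)
{
   assert(ctx != NULL);
   RETURN_IF_SUPPORTED(SKETCH_FORMAT_A_UNORM8);
   RETURN_IF_SUPPORTED(SKETCH_FORMAT_L8A8_UNORM); /* the extra case */
   return SKETCH_FORMAT_NONE;
}
```

With A8 left unset in the support table (as the patch does for gen2), the chooser lands on L8A8; with A8 set, A8 is still preferred.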


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 04/11] i965/blorp: Put sampler index in src1 of texture ops

2014-08-06 Thread Ian Romanick
On 08/06/2014 12:40 PM, Chris Forbes wrote:
 Yes -- there's no interaction between patches 3 and 4. Blorp only
 shares the generator.

Okay.  Then these two are also

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

 On Tue, Aug 5, 2014 at 1:32 PM, Ian Romanick i...@freedesktop.org wrote:
 Does it still build with patch 3 without patch 4?

 On 08/04/2014 01:58 AM, Chris Forbes wrote:
 Signed-off-by: Chris Forbes chr...@ijw.co.nz
 ---
  src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp 
 b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
 index c1676a9..7d4b327 100644
 --- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
 @@ -78,7 +78,8 @@ brw_blorp_eu_emitter::emit_texture_lookup(const struct 
 brw_reg dst,
unsigned base_mrf,
unsigned msg_length)
  {
 -   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, 
 brw_message_reg(base_mrf));
 +   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, 
 brw_message_reg(base_mrf),
 + fs_reg(0u));

 inst->base_mrf = base_mrf;
 inst->mlen = msg_length;


 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/9] i915: Don't call _mesa_meta_glsl_Clear() on gen2

2014-08-06 Thread Eric Anholt
ville.syrj...@linux.intel.com writes:

 From: Ville Syrjälä ville.syrj...@linux.intel.com

 Gen2 doesn't have fragmnts shaders so we shouldn't be calling

spelling^

Other than that, patches 4-9 are:

Reviewed-by: Eric Anholt e...@anholt.net


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 04/11] i965/blorp: Put sampler index in src1 of texture ops

2014-08-06 Thread Chris Forbes
OK, assuming no one complains, I'll push the series later today.

On Thu, Aug 7, 2014 at 8:44 AM, Ian Romanick i...@freedesktop.org wrote:
 On 08/06/2014 12:40 PM, Chris Forbes wrote:
 Yes -- there's no interaction between patches 3 and 4. Blorp only
 shares the generator.

 Okay.  Then these two are also

 Reviewed-by: Ian Romanick ian.d.roman...@intel.com

 On Tue, Aug 5, 2014 at 1:32 PM, Ian Romanick i...@freedesktop.org wrote:
 Does it still build with patch 3 without patch 4?

 On 08/04/2014 01:58 AM, Chris Forbes wrote:
 Signed-off-by: Chris Forbes chr...@ijw.co.nz
 ---
  src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp 
 b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
 index c1676a9..7d4b327 100644
 --- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
 @@ -78,7 +78,8 @@ brw_blorp_eu_emitter::emit_texture_lookup(const struct 
 brw_reg dst,
unsigned base_mrf,
unsigned msg_length)
  {
 -   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, 
 brw_message_reg(base_mrf));
 +   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, 
 brw_message_reg(base_mrf),
 + fs_reg(0u));

 inst->base_mrf = base_mrf;
 inst->mlen = msg_length;




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Merging VC4 driver

2014-08-06 Thread Roland Scheidegger
Am 06.08.2014 22:33, schrieb Eric Anholt:
 I'd like to start merging the VC4 driver.  I've got a lot of things
 working under sim (piglit's at 5212/6726 for a slightly-trimmed
 quick.py), and once I find where I put my serial cable I hope to get the
 kernel fixed up and passing even more than that on HW.
 
 I'm at 80 commits right now, with 3 initial huge commits then actual
 incremental development.  I don't think other people are going to want
 to review all of this (45 files changed, 9277 insertions(+), 5
 deletions(-)), so I'm feeling ready to go ahead on my own.  What I'll
 throw out for (maybe) review, though, is the stuff outside of the
 driver:
 
 diff --git a/configure.ac b/configure.ac
 index a3b3abd..9679c4c 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -724,7 +724,7 @@ GALLIUM_DRIVERS_DEFAULT=r300,r600,svga,swrast
  AC_ARG_WITH([gallium-drivers],
  [AS_HELP_STRING([--with-gallium-drivers@:@=DIRS...@:@],
  [comma delimited Gallium drivers list, e.g.
 -i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast
 +i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4
  @:@default=r300,r600,svga,swrast@:@])],
  [with_gallium_drivers=$withval],
  [with_gallium_drivers=$GALLIUM_DRIVERS_DEFAULT])
 @@ -2003,6 +2003,19 @@ if test -n $with_gallium_drivers; then
  GALLIUM_TARGET_DIRS=$GALLIUM_TARGET_DIRS dri/kms-swrast
  fi
  ;;
 +xvc4)
 +HAVE_GALLIUM_VC4=yes
 +gallium_require_drm_loader
 +GALLIUM_DRIVERS_DIRS=$GALLIUM_DRIVERS_DIRS vc4
 +gallium_check_st vc4/drm dri-vc4   
 +DRICOMMON_NEED_LIBDRM=yes
 +
 +case $host_cpu in
 +i?86 | x86_64 | amd64)
 +USE_VC4_SIMULATOR=yes
 +;;
 +esac
 +;;
  *)
  AC_MSG_ERROR([Unknown Gallium driver: $driver])
  ;;
 @@ -2064,6 +2077,7 @@ AM_CONDITIONAL(HAVE_GALLIUM_NOUVEAU, test 
 x$HAVE_GALLIUM_NOUVEAU = xyes)
  AM_CONDITIONAL(HAVE_GALLIUM_FREEDRENO, test x$HAVE_GALLIUM_FREEDRENO = 
 xyes)
  AM_CONDITIONAL(HAVE_GALLIUM_SOFTPIPE, test x$HAVE_GALLIUM_SOFTPIPE = xyes)
  AM_CONDITIONAL(HAVE_GALLIUM_LLVMPIPE, test x$HAVE_GALLIUM_LLVMPIPE = xyes)
 +AM_CONDITIONAL(HAVE_GALLIUM_VC4, test x$HAVE_GALLIUM_VC4 = xyes)
  
  AM_CONDITIONAL(NEED_GALLIUM_SOFTPIPE_DRIVER, test x$HAVE_GALLIUM_SVGA = 
 xyes -o \
x$HAVE_GALLIUM_SOFTPIPE 
 = xyes)
 @@ -2129,6 +2143,7 @@ AM_CONDITIONAL(HAVE_LOADER_GALLIUM, test 
 x$enable_gallium_loader = xyes)
  AM_CONDITIONAL(HAVE_DRM_LOADER_GALLIUM, test x$enable_gallium_drm_loader = 
 xyes)
  AM_CONDITIONAL(HAVE_GALLIUM_COMPUTE, test x$enable_opencl = xyes)
  AM_CONDITIONAL(HAVE_MESA_LLVM, test x$MESA_LLVM = x1)
 +AM_CONDITIONAL(USE_VC4_SIMULATOR, test x$USE_VC4_SIMULATOR = xyes)
  
  AC_SUBST([ELF_LIB])
  
 @@ -2201,6 +2216,7 @@ AC_CONFIG_FILES([Makefile
   src/gallium/drivers/softpipe/Makefile
   src/gallium/drivers/svga/Makefile
   src/gallium/drivers/trace/Makefile
 + src/gallium/drivers/vc4/Makefile
   src/gallium/state_trackers/Makefile
   src/gallium/state_trackers/clover/Makefile
   src/gallium/state_trackers/dri/Makefile
 @@ -2243,6 +2259,7 @@ AC_CONFIG_FILES([Makefile
   src/gallium/winsys/sw/wayland/Makefile
   src/gallium/winsys/sw/wrapper/Makefile
   src/gallium/winsys/sw/xlib/Makefile
 + src/gallium/winsys/vc4/drm/Makefile
   src/gbm/Makefile
   src/gbm/main/gbm.pc
   src/glsl/Makefile
 diff --git a/src/gallium/auxiliary/target-helpers/inline_drm_helper.h 
 b/src/gallium/auxiliary/target-helpers/inline_drm_helper.h
 index 5d02da7..4ef94de 100644
 --- a/src/gallium/auxiliary/target-helpers/inline_drm_helper.h
 +++ b/src/gallium/auxiliary/target-helpers/inline_drm_helper.h
 @@ -54,6 +54,10 @@
  #include freedreno/drm/freedreno_drm_public.h
  #endif
  
 +#if GALLIUM_VC4
 +#include vc4/drm/vc4_drm_public.h
 +#endif
 +
  static char* driver_name = NULL;
  
  /* XXX: We need to teardown the winsys if *screen_create() fails. */
 @@ -286,6 +290,48 @@ pipe_freedreno_create_screen(int fd)
  }
  #endif
  
 +#if defined(GALLIUM_VC4)
 +#if defined(DRI_TARGET)
 +
 +const __DRIextension **__driDriverGetExtensions_vc4(void);
 +
 +PUBLIC const __DRIextension **__driDriverGetExtensions_vc4(void)
 +{
 +   globalDriverAPI = galliumdrm_driver_api;
 +   return galliumdrm_driver_extensions;
 +}
 +
 +#if defined(USE_VC4_SIMULATOR)
 +const __DRIextension **__driDriverGetExtensions_i965(void);
 +
 +/**
 + * When building using the simulator (on x86), we advertise ourselves as the
 + * i965 driver so that you can just make a directory with a link from
 + * i965_dri.so to the built vc4_dri.so, and point LIBGL_DRIVERS_PATH to that
 + * on your i965-using host to run the driver 

[Mesa-dev] [PATCH 3/3] r600, radeonsi: Copy implicit args provided by clover

2014-08-06 Thread Jan Vesely
Signed-off-by: Jan Vesely jan.ves...@rutgers.edu
---
 src/gallium/drivers/r600/evergreen_compute.c | 14 --
 src/gallium/drivers/r600/evergreen_compute.h |  1 -
 src/gallium/drivers/radeonsi/si_compute.c|  6 +++---
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index d50f343..37910fb 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -268,11 +268,12 @@ static void evergreen_bind_compute_state(struct 
pipe_context *ctx_, void *state)
  * (x,y,z)
  * DWORDS 9+ : Kernel parameters
  */
-void evergreen_compute_upload_input(
+static void evergreen_compute_upload_input(
struct pipe_context *ctx_,
const uint *block_layout,
const uint *grid_layout,
-   const void *input)
+   const void *input,
+   size_t kinput_size)
 {
struct r600_context *ctx = (struct r600_context *)ctx_;
struct r600_pipe_compute *shader = ctx->cs_shader_state.shader;
@@ -280,7 +281,7 @@ void evergreen_compute_upload_input(
/* We need to reserve 9 dwords (36 bytes) for implicit kernel
 * parameters.
 */
-   unsigned input_size = shader->input_size + 36;
+   unsigned input_size = kinput_size + 36;
uint32_t * num_work_groups_start;
uint32_t * global_size_start;
uint32_t * local_size_start;
@@ -320,7 +321,7 @@ void evergreen_compute_upload_input(
memcpy(local_size_start, block_layout, 3 * sizeof(uint));
 
/* Copy the kernel inputs */
-   memcpy(kernel_parameters_start, input, shader->input_size);
+   memcpy(kernel_parameters_start, input, kinput_size);
 
for (i = 0; i < (input_size / 4); i++) {
COMPUTE_DBG(ctx->screen, "input %i : %u\n", i,
@@ -541,7 +542,7 @@ void evergreen_emit_cs_shader(
 static void evergreen_launch_grid(
struct pipe_context *ctx_,
const uint *block_layout, const uint *grid_layout,
-   uint32_t pc, const void *input, size_t size)
+   uint32_t pc, const void *input, size_t input_size)
 {
struct r600_context *ctx = (struct r600_context *)ctx_;
 
@@ -584,7 +585,8 @@ static void evergreen_launch_grid(
 #endif
shader->active_kernel = kernel;
ctx->cs_shader_state.kernel_index = pc;
-   evergreen_compute_upload_input(ctx_, block_layout, grid_layout, input);
+   evergreen_compute_upload_input(ctx_, block_layout, grid_layout, input,
+   input_size);
compute_emit_cs(ctx, block_layout, grid_layout);
 }
 
diff --git a/src/gallium/drivers/r600/evergreen_compute.h 
b/src/gallium/drivers/r600/evergreen_compute.h
index 4fb53a1..570ab2a 100644
--- a/src/gallium/drivers/r600/evergreen_compute.h
+++ b/src/gallium/drivers/r600/evergreen_compute.h
@@ -40,7 +40,6 @@ struct r600_resource_global {
 
 void *evergreen_create_compute_state(struct pipe_context *ctx, const struct 
pipe_compute_state *cso);
 void evergreen_delete_compute_state(struct pipe_context *ctx, void *state);
-void evergreen_compute_upload_input(struct pipe_context *context, const uint 
*block_layout, const uint *grid_layout, const void *input);
 void evergreen_init_atom_start_compute_cs(struct r600_context *rctx);
 void evergreen_init_compute_state_functions(struct r600_context *rctx);
 void evergreen_emit_cs_shader(struct r600_context *rctx, struct r600_atom * 
atom);
diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index 9a90470..66df65f 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -162,7 +162,7 @@ static unsigned compute_num_waves_for_scratch(
 static void si_launch_grid(
struct pipe_context *ctx,
const uint *block_layout, const uint *grid_layout,
-   uint32_t pc, const void *input, size_t size)
+   uint32_t pc, const void *input, size_t input_size)
 {
struct si_context *sctx = (struct si_context*)ctx;
struct si_pipe_compute *program = sctx->cs_shader_state.program;
@@ -197,7 +197,7 @@ static void si_launch_grid(
/* Upload the kernel arguments */
 
/* The extra num_work_size_bytes are for work group / work item size 
information */
-   kernel_args_size = program->input_size + num_work_size_bytes + 8 /* For 
scratch va */;
+   kernel_args_size = input_size + num_work_size_bytes + 8 /* For scratch 
va */;
 
kernel_args = MALLOC(kernel_args_size);
for (i = 0; i < 3; i++) {
@@ -209,7 +209,7 @@ static void si_launch_grid(
num_waves_for_scratch = compute_num_waves_for_scratch(
&sctx->screen->b.info, block_layout, grid_layout);
 
-   memcpy(kernel_args + (num_work_size_bytes / 4), input, 
program->input_size);
+   memcpy(kernel_args + (num_work_size_bytes / 4), input, 

[Mesa-dev] [PATCH 2/3] clover: Add work dimension implicit param to input

2014-08-06 Thread Jan Vesely
Signed-off-by: Jan Vesely jan.ves...@rutgers.edu
---
 src/gallium/state_trackers/clover/core/kernel.cpp | 162 --
 1 file changed, 85 insertions(+), 77 deletions(-)

diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp 
b/src/gallium/state_trackers/clover/core/kernel.cpp
index 68e91d5..7a88de1 100644
--- a/src/gallium/state_trackers/clover/core/kernel.cpp
+++ b/src/gallium/state_trackers/clover/core/kernel.cpp
@@ -28,6 +28,82 @@
 
 using namespace clover;
 
+namespace {
+   template<typename T>
+   std::vector<uint8_t>
+   bytes(const T &x) {
+  return { (uint8_t *)&x, (uint8_t *)&x + sizeof(x) };
+   }
+
+   ///
+   /// Transform buffer \a v from the native byte order into the byte
+   /// order specified by \a e.
+   ///
+   template<typename T>
+   void
+   byteswap(T &v, pipe_endian e) {
+  if (PIPE_ENDIAN_NATIVE != e)
+ std::reverse(v.begin(), v.end());
+   }
+
+   ///
+   /// Pad buffer \a v to the next multiple of \a n.
+   ///
+   template<typename T>
+   void
+   align(T &v, size_t n) {
+  v.resize(util_align_npot(v.size(), n));
+   }
+
+   bool
+   msb(const std::vector<uint8_t> &s) {
+  if (PIPE_ENDIAN_NATIVE == PIPE_ENDIAN_LITTLE)
+ return s.back() & 0x80;
+  else
+ return s.front() & 0x80;
+   }
+
+   ///
+   /// Resize buffer \a v to size \a n using sign or zero extension
+   /// according to \a ext.
+   ///
+   template<typename T>
+   void
+   extend(T &v, enum module::argument::ext_type ext, size_t n) {
+  const size_t m = std::min(v.size(), n);
+  const bool sign_ext = (ext == module::argument::sign_ext);
+  const uint8_t fill = (sign_ext && msb(v) ? ~0 : 0);
+  T w(n, fill);
+
+  if (PIPE_ENDIAN_NATIVE == PIPE_ENDIAN_LITTLE)
+ std::copy_n(v.begin(), m, w.begin());
+  else
+ std::copy_n(v.end() - m, m, w.end() - m);
+
+  std::swap(v, w);
+   }
+
+   ///
+   /// Append buffer \a w to \a v.
+   ///
+   template<typename T>
+   void
+   insert(T &v, const T &w) {
+  v.insert(v.end(), w.begin(), w.end());
+   }
+
+   ///
+   /// Append \a n elements to the end of buffer \a v.
+   ///
+   template<typename T>
+   size_t
+   allocate(T &v, size_t n) {
+  size_t pos = v.size();
+  v.resize(pos + n);
+  return pos;
+   }
+}
+
 kernel::kernel(clover::program &prog, const std::string &name,
const std::vector<module::argument> &margs) :
program(prog), _name(name), exec(*this) {
@@ -77,6 +153,10 @@ kernel::launch(command_queue &q,
  return (uint32_t *)&exec.input[h];
   }, exec.g_handles);
 
+   // Implicit arguments
+   auto dims = bytes(cl_uint(block_size.size()));
+   byteswap(dims, q.device().endianness());
+
q.pipe->bind_compute_state(q.pipe, st);
q.pipe->bind_sampler_states(q.pipe, PIPE_SHADER_COMPUTE,
0, exec.samplers.size(),
@@ -89,11 +169,15 @@ kernel::launch(command_queue &q,
q.pipe->set_global_binding(q.pipe, 0, exec.g_buffers.size(),
   exec.g_buffers.data(), g_handles.data());
 
+   // Create local copy for implicit arguments
+   auto local_input = exec.input;
+   insert(local_input, dims);
+
q.pipe->launch_grid(q.pipe,
pad_vector(q, block_size, 1).data(),
pad_vector(q, reduced_grid_size, 1).data(),
find(name_equals(_name), m.syms).offset,
-   exec.input.data(), exec.input.size());
+   local_input.data(), local_input.size());
 
q.pipe->set_global_binding(q.pipe, 0, exec.g_buffers.size(), NULL, NULL);
q.pipe->set_compute_resources(q.pipe, 0, exec.resources.size(), NULL);
@@ -206,82 +290,6 @@ kernel::exec_context::unbind() {
mem_local = 0;
 }
 
-namespace {
-   template<typename T>
-   std::vector<uint8_t>
-   bytes(const T &x) {
-  return { (uint8_t *)&x, (uint8_t *)&x + sizeof(x) };
-   }
-
-   ///
-   /// Transform buffer \a v from the native byte order into the byte
-   /// order specified by \a e.
-   ///
-   template<typename T>
-   void
-   byteswap(T &v, pipe_endian e) {
-  if (PIPE_ENDIAN_NATIVE != e)
- std::reverse(v.begin(), v.end());
-   }
-
-   ///
-   /// Pad buffer \a v to the next multiple of \a n.
-   ///
-   template<typename T>
-   void
-   align(T &v, size_t n) {
-  v.resize(util_align_npot(v.size(), n));
-   }
-
-   bool
-   msb(const std::vector<uint8_t> &s) {
-  if (PIPE_ENDIAN_NATIVE == PIPE_ENDIAN_LITTLE)
- return s.back() & 0x80;
-  else
- return s.front() & 0x80;
-   }
-
-   ///
-   /// Resize buffer \a v to size \a n using sign or zero extension
-   /// according to \a ext.
-   ///
-   template<typename T>
-   void
-   extend(T &v, enum module::argument::ext_type ext, size_t n) {
-  const size_t m = std::min(v.size(), n);
-  const bool sign_ext = (ext == module::argument::sign_ext);
-  const uint8_t fill = (sign_ext && msb(v) ? ~0 : 0);
-  T w(n, fill);
-
-  if (PIPE_ENDIAN_NATIVE == PIPE_ENDIAN_LITTLE)
-  

[Mesa-dev] [PATCH 1/3] gallium: Pass input data size to launch_grid

2014-08-06 Thread Jan Vesely
Future commits add implicit parameters so
we can no longer rely on shader param size

Signed-off-by: Jan Vesely jan.ves...@rutgers.edu
---
 src/gallium/drivers/ilo/ilo_gpgpu.c   |  2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c   |  2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h   |  4 +--
 src/gallium/drivers/nouveau/nvc0/nve4_compute.c   |  2 +-
 src/gallium/drivers/r600/evergreen_compute.c  |  2 +-
 src/gallium/drivers/radeonsi/si_compute.c |  2 +-
 src/gallium/include/pipe/p_context.h  |  2 +-
 src/gallium/state_trackers/clover/core/kernel.cpp |  2 +-
 src/gallium/tests/trivial/compute.c   | 40 +++
 9 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/src/gallium/drivers/ilo/ilo_gpgpu.c 
b/src/gallium/drivers/ilo/ilo_gpgpu.c
index b17a518..d995db2 100644
--- a/src/gallium/drivers/ilo/ilo_gpgpu.c
+++ b/src/gallium/drivers/ilo/ilo_gpgpu.c
@@ -35,7 +35,7 @@
 static void
 ilo_launch_grid(struct pipe_context *pipe,
 const uint *block_layout, const uint *grid_layout,
-uint32_t pc, const void *input)
+uint32_t pc, const void *input, size_t size)
 {
 }
 
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index ad287a2..55b71e2 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -197,7 +197,7 @@ void
 nvc0_launch_grid(struct pipe_context *pipe,
  const uint *block_layout, const uint *grid_layout,
  uint32_t label,
- const void *input)
+ const void *input, size_t size)
 {
struct nvc0_context *nvc0 = nvc0_context(pipe);
struct nouveau_pushbuf *push = nvc0->base.pushbuf;
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index ebeb8c4..2e901fa 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -353,10 +353,10 @@ void nvc0_push_vbo(struct nvc0_context *, const struct 
pipe_draw_info *);
 
 /* nve4_compute.c */
 void nve4_launch_grid(struct pipe_context *,
-  const uint *, const uint *, uint32_t, const void *);
+  const uint *, const uint *, uint32_t, const void *, 
size_t);
 
 /* nvc0_compute.c */
 void nvc0_launch_grid(struct pipe_context *,
-  const uint *, const uint *, uint32_t, const void *);
+  const uint *, const uint *, uint32_t, const void *, 
size_t);
 
 #endif
diff --git a/src/gallium/drivers/nouveau/nvc0/nve4_compute.c 
b/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
index f243316..e408ec8 100644
--- a/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
@@ -432,7 +432,7 @@ void
 nve4_launch_grid(struct pipe_context *pipe,
  const uint *block_layout, const uint *grid_layout,
  uint32_t label,
- const void *input)
+ const void *input, size_t size)
 {
struct nvc0_context *nvc0 = nvc0_context(pipe);
struct nouveau_pushbuf *push = nvc0->base.pushbuf;
diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index 1970414..d50f343 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -541,7 +541,7 @@ void evergreen_emit_cs_shader(
 static void evergreen_launch_grid(
struct pipe_context *ctx_,
const uint *block_layout, const uint *grid_layout,
-   uint32_t pc, const void *input)
+   uint32_t pc, const void *input, size_t size)
 {
struct r600_context *ctx = (struct r600_context *)ctx_;
 
diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index 42e4fec..9a90470 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -162,7 +162,7 @@ static unsigned compute_num_waves_for_scratch(
 static void si_launch_grid(
struct pipe_context *ctx,
const uint *block_layout, const uint *grid_layout,
-   uint32_t pc, const void *input)
+   uint32_t pc, const void *input, size_t size)
 {
struct si_context *sctx = (struct si_context*)ctx;
struct si_pipe_compute *program = sctx->cs_shader_state.program;
diff --git a/src/gallium/include/pipe/p_context.h 
b/src/gallium/include/pipe/p_context.h
index af5674f..e71be02 100644
--- a/src/gallium/include/pipe/p_context.h
+++ b/src/gallium/include/pipe/p_context.h
@@ -523,7 +523,7 @@ struct pipe_context {
 */
void (*launch_grid)(struct pipe_context *context,
const uint *block_layout, const uint *grid_layout,
-   uint32_t pc, const void *input);
+   

[Mesa-dev] [PATCH 0/3] cl workdim v2

2014-08-06 Thread Jan Vesely
This respin includes Francisco's approach of providing implicit args
in the arg vector passed from clover, and Tom's idea of appending
implicit args after the kernel args.

I assumed it's not safe to modify exec.input, so the input vector is copied
before appending work dim.

Passes the get-work-dim piglit test on turks without any regressions.
I have not tested SI, as I don't have the hw.

jan
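
For illustration only (this is not code from the series): a minimal C sketch
of the copy-then-append step described above, assuming the work dimension is
passed as a 32-bit value placed directly after the explicit kernel arguments.
The helper name append_work_dim is hypothetical, and alignment or padding of
the appended value is ignored here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the patches: returns a freshly allocated
 * copy of the kernel input with a 32-bit work dimension appended after the
 * explicit kernel arguments.  The caller's buffer is never modified, which
 * mirrors the "copy before appending" approach of the cover letter. */
static void *
append_work_dim(const void *input, size_t size, uint32_t work_dim,
                size_t *out_size)
{
   uint8_t *copy = malloc(size + sizeof(work_dim));

   if (!copy)
      return NULL;

   if (size)
      memcpy(copy, input, size);                      /* explicit args */
   memcpy(copy + size, &work_dim, sizeof(work_dim));  /* implicit arg  */

   *out_size = size + sizeof(work_dim);
   return copy;
}
```

The new total size is what a driver would then receive through the extra
size_t parameter that patch 1 adds to launch_grid.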




Jan Vesely (3):
  gallium: Pass input data size to launch_grid
  clover: Add work dimension implicit param to input
  r600,radeonsi: Copy implicit args provided by clover

 src/gallium/drivers/ilo/ilo_gpgpu.c   |   2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c   |   2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h   |   4 +-
 src/gallium/drivers/nouveau/nvc0/nve4_compute.c   |   2 +-
 src/gallium/drivers/r600/evergreen_compute.c  |  14 +-
 src/gallium/drivers/r600/evergreen_compute.h  |   1 -
 src/gallium/drivers/radeonsi/si_compute.c |   6 +-
 src/gallium/include/pipe/p_context.h  |   2 +-
 src/gallium/state_trackers/clover/core/kernel.cpp | 162 --
 src/gallium/tests/trivial/compute.c   |  40 +++---
 10 files changed, 122 insertions(+), 113 deletions(-)

-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] mesa/formats: Fix the size of ETC2_SRGB8_PUNCHTHROUGH_ALPHA1

2014-08-06 Thread Anuj Phogat
On Wed, Aug 6, 2014 at 10:31 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
> Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
> ---
>  src/mesa/main/formats.csv | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/formats.csv b/src/mesa/main/formats.csv
> index f45e34b..eade6fa 100644
> --- a/src/mesa/main/formats.csv
> +++ b/src/mesa/main/formats.csv
> @@ -279,4 +279,4 @@ MESA_FORMAT_ETC2_RG11_EAC , etc2  , 4, 4, x128, , ,
>  MESA_FORMAT_ETC2_SIGNED_R11_EAC   , etc2  , 4, 4, x64 , , ,  , x001, rgb
>  MESA_FORMAT_ETC2_SIGNED_RG11_EAC  , etc2  , 4, 4, x128, , ,  , xy01, rgb
>  MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1 , etc2  , 4, 4, x64 , , ,  , xyzw, rgb
> -MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, etc2  , 4, 4, x128, , ,  , xyzw, srgb
> +MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, etc2  , 4, 4, x64 , , ,  , xyzw, srgb
> --
> 2.0.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Both patches are:
Reviewed-by: Anuj Phogat anuj.pho...@gmail.com


[Mesa-dev] [PATCH 2/6] gallium/radeon: store VM address in r600_resource

2014-08-06 Thread Marek Olšák
From: Marek Olšák marek.ol...@amd.com

This will help to get rid of the buffer_get_virtual_address calls.
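
As a rough illustration of the idea (not the driver code itself): the address
is looked up once when the resource is set up, and later users read the
cached field instead of calling back into the winsys. All toy_* names below
are hypothetical stand-ins for the real winsys and resource types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a winsys buffer that knows its VM address. */
struct toy_buffer {
   uint64_t va;
};

/* Counts winsys queries so the effect of caching is observable. */
static unsigned toy_va_queries;

static uint64_t
toy_get_virtual_address(const struct toy_buffer *buf)
{
   toy_va_queries++;
   return buf->va;
}

/* Hypothetical stand-in for r600_resource with the new cached field. */
struct toy_resource {
   const struct toy_buffer *cs_buf;
   uint64_t gpu_address;
};

/* Query the address once, at creation time. */
static void
toy_resource_init(struct toy_resource *res, const struct toy_buffer *buf)
{
   res->cs_buf = buf;
   res->gpu_address = toy_get_virtual_address(buf);
}

/* Later users only read the cached value. */
static uint64_t
toy_resource_va_plus(const struct toy_resource *res, uint64_t offset)
{
   return res->gpu_address + offset;
}
```

However many times the address is needed afterwards, the winsys is asked only
once per buffer assignment in this sketch.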
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 7 +--
 src/gallium/drivers/radeon/r600_pipe_common.h   | 1 +
 src/gallium/drivers/radeon/r600_texture.c   | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index d747cbc..a580685 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -168,14 +168,17 @@ bool r600_init_resource(struct r600_common_screen *rscreen,
old_buf = res->buf;
res->cs_buf = rscreen->ws->buffer_get_cs_handle(new_buf); /* should be atomic */
res->buf = new_buf; /* should be atomic */
+
+   if (rscreen->info.r600_virtual_address)
+   res->gpu_address = rscreen->ws->buffer_get_virtual_address(res->cs_buf);
+
pb_reference(&old_buf, NULL);
 
util_range_set_empty(&res->valid_buffer_range);
 
if (rscreen->debug_flags & DBG_VM && res->b.b.target == PIPE_BUFFER) {
fprintf(stderr, "VM start=0x%"PRIX64"  end=0x%"PRIX64" | Buffer %u bytes\n",
-   r600_resource_va(&rscreen->b, &res->b.b),
-   r600_resource_va(&rscreen->b, &res->b.b) + res->buf->size,
+   res->gpu_address, res->gpu_address + res->buf->size,
res->buf->size);
}
return true;
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index ac69d5b..59d0b3e 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -127,6 +127,7 @@ struct r600_resource {
/* Winsys objects. */
struct pb_buffer                *buf;
struct radeon_winsys_cs_handle  *cs_buf;
+   uint64_t                        gpu_address;
 
/* Resource state. */
enum radeon_bo_domain   domains;
diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index 482bbff..326aca4 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -655,6 +655,7 @@ r600_texture_create_object(struct pipe_screen *screen,
} else {
resource->buf = buf;
resource->cs_buf = rscreen->ws->buffer_get_cs_handle(buf);
+   resource->gpu_address = rscreen->ws->buffer_get_virtual_address(resource->cs_buf);
resource->domains = rscreen->ws->buffer_get_initial_domain(resource->cs_buf);
}
 
-- 
1.9.1



[Mesa-dev] [PATCH 3/6] radeonsi: use gpu_address from r600_resource

2014-08-06 Thread Marek Olšák
From: Marek Olšák marek.ol...@amd.com

---
 src/gallium/drivers/radeonsi/si_compute.c | 10 
 src/gallium/drivers/radeonsi/si_descriptors.c | 33 ---
 src/gallium/drivers/radeonsi/si_dma.c | 12 +-
 src/gallium/drivers/radeonsi/si_hw_context.c  |  2 +-
 src/gallium/drivers/radeonsi/si_state.c   | 17 ++
 src/gallium/drivers/radeonsi/si_state_draw.c  | 23 ---
 6 files changed, 41 insertions(+), 56 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index 42e4fec..12e4f56 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -114,7 +114,7 @@ static void si_set_global_binding(
uint64_t va;
uint32_t offset;
program->global_buffers[i] = resources[i];
-   va = r600_resource_va(ctx->screen, resources[i]);
+   va = r600_resource(resources[i])->gpu_address;
offset = util_le32_to_cpu(*handles[i]);
va += offset;
va = util_cpu_to_le64(va);
@@ -223,8 +223,7 @@ static void si_launch_grid(
si_resource_create_custom(sctx->b.b.screen,
PIPE_USAGE_DEFAULT, scratch_bytes);
}
-   scratch_buffer_va = r600_resource_va(ctx->screen,
-   (struct pipe_resource*)shader->scratch_bo);
+   scratch_buffer_va = shader->scratch_bo->gpu_address;
si_pm4_add_bo(pm4, shader->scratch_bo,
RADEON_USAGE_READWRITE,
RADEON_PRIO_SHADER_RESOURCE_RW);
@@ -238,8 +237,7 @@ static void si_launch_grid(
 
si_upload_const_buffer(sctx, kernel_args_buffer, (uint8_t*)kernel_args,
kernel_args_size, kernel_args_offset);
-   kernel_args_va = r600_resource_va(ctx->screen,
-   (struct pipe_resource*)kernel_args_buffer);
+   kernel_args_va = kernel_args_buffer->gpu_address;
kernel_args_va += kernel_args_offset;
 
si_pm4_add_bo(pm4, kernel_args_buffer, RADEON_USAGE_READ, 
RADEON_PRIO_SHADER_DATA);
@@ -285,7 +283,7 @@ static void si_launch_grid(
0x190 /* Default value */);
}
 
-   shader_va = r600_resource_va(ctx->screen, (void *)shader->bo);
+   shader_va = shader->bo->gpu_address;
si_pm4_add_bo(pm4, shader->bo, RADEON_USAGE_READ, RADEON_PRIO_SHADER_DATA);
si_pm4_set_reg(pm4, R_00B830_COMPUTE_PGM_LO, (shader_va >> 8) & 0xffffffff);
si_pm4_set_reg(pm4, R_00B834_COMPUTE_PGM_HI, shader_va >> 40);
diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 171de45..81ad14b 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -113,8 +113,6 @@ static void si_init_descriptors(struct si_context *sctx,
unsigned num_elements,
void (*emit_func)(struct si_context *ctx, 
struct r600_atom *state))
 {
-   uint64_t va;
-
assert(num_elements <= sizeof(desc->enabled_mask)*8);
assert(num_elements <= sizeof(desc->dirty_mask)*8);
 
@@ -131,11 +129,11 @@ static void si_init_descriptors(struct si_context *sctx,
 
r600_context_bo_reloc(&sctx->b, &sctx->b.rings.gfx, desc->buffer,
  RADEON_USAGE_READWRITE, RADEON_PRIO_SHADER_DATA);
-   va = r600_resource_va(sctx->b.b.screen, &desc->buffer->b.b);
 
/* We don't check for CS space here, because this should be called
 * only once at context initialization. */
-   si_emit_cp_dma_clear_buffer(sctx, va, desc->buffer->b.b.width0, 0,
+   si_emit_cp_dma_clear_buffer(sctx, desc->buffer->gpu_address,
+   desc->buffer->b.b.width0, 0,
R600_CP_DMA_SYNC);
 }
 
@@ -170,7 +168,7 @@ static void si_emit_shader_pointer(struct si_context *sctx,
 {
struct si_descriptors *desc = (struct si_descriptors*)atom;
struct radeon_winsys_cs *cs = sctx->b.rings.gfx.cs;
-   uint64_t va = r600_resource_va(sctx->b.b.screen, &desc->buffer->b.b) +
+   uint64_t va = desc->buffer->gpu_address +
  desc->current_context_id * desc->context_size +
  desc->buffer_offset;
 
@@ -205,7 +203,7 @@ static void si_emit_descriptors(struct si_context *sctx,
 
assert(dirty_mask);
 
-   va_base = r600_resource_va(sctx->b.b.screen, &desc->buffer->b.b);
+   va_base = desc->buffer->gpu_address;
 
/* Copy the descriptors to a new context slot. */
/* XXX Consider using TC or L2 for this copy on CIK. */
@@ -567,7 +565,6 @@ static void si_vertex_buffers_begin_new_cs(struct 
si_context *sctx)
 
 void si_update_vertex_buffers(struct 

[Mesa-dev] [PATCH 6/6] gallium/radeon: remove r600_resource_va

2014-08-06 Thread Marek Olšák
From: Marek Olšák marek.ol...@amd.com

---
 src/gallium/drivers/radeon/r600_cs.h | 9 -
 1 file changed, 9 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_cs.h 
b/src/gallium/drivers/radeon/r600_cs.h
index b30b465..3cee760 100644
--- a/src/gallium/drivers/radeon/r600_cs.h
+++ b/src/gallium/drivers/radeon/r600_cs.h
@@ -33,15 +33,6 @@
 #include "r600_pipe_common.h"
 #include "r600d_common.h"
 
-static INLINE uint64_t r600_resource_va(struct pipe_screen *screen,
-   struct pipe_resource *resource)
-{
-   struct r600_common_screen *rscreen = (struct r600_common_screen*)screen;
-   struct r600_resource *rresource = (struct r600_resource*)resource;
-
-   return rscreen->ws->buffer_get_virtual_address(rresource->cs_buf);
-}
-
 static INLINE unsigned r600_context_bo_reloc(struct r600_common_context *rctx,
 struct r600_ring *ring,
 struct r600_resource *rbo,
-- 
1.9.1



[Mesa-dev] [PATCH 4/6] r600g: use gpu_address from r600_resource

2014-08-06 Thread Marek Olšák
From: Marek Olšák marek.ol...@amd.com

---
 src/gallium/drivers/r600/evergreen_compute.c|  5 +--
 src/gallium/drivers/r600/evergreen_hw_context.c |  6 ++--
 src/gallium/drivers/r600/evergreen_state.c  | 47 +++--
 src/gallium/drivers/r600/r600_hw_context.c  |  4 +--
 src/gallium/drivers/r600/r600_state_common.c|  6 ++--
 5 files changed, 29 insertions(+), 39 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index 1970414..402c871 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -521,12 +521,9 @@ void evergreen_emit_cs_shader(
struct r600_pipe_compute *shader = state->shader;
struct r600_kernel *kernel = &shader->kernels[state->kernel_index];
struct radeon_winsys_cs *cs = rctx->b.rings.gfx.cs;
-   uint64_t va;
-
-   va = r600_resource_va(&rctx->screen->b.b, &kernel->code_bo->b.b);
 
r600_write_compute_context_reg_seq(cs, R_0288D0_SQ_PGM_START_LS, 3);
-   radeon_emit(cs, va >> 8); /* R_0288D0_SQ_PGM_START_LS */
+   radeon_emit(cs, kernel->code_bo->gpu_address >> 8); /* R_0288D0_SQ_PGM_START_LS */
radeon_emit(cs,   /* R_0288D4_SQ_PGM_RESOURCES_LS */
S_0288D4_NUM_GPRS(kernel->bc.ngpr)
| S_0288D4_STACK_SIZE(kernel->bc.nstack));
diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
b/src/gallium/drivers/r600/evergreen_hw_context.c
index f95a17e..63c2906 100644
--- a/src/gallium/drivers/r600/evergreen_hw_context.c
+++ b/src/gallium/drivers/r600/evergreen_hw_context.c
@@ -46,8 +46,8 @@ void evergreen_dma_copy_buffer(struct r600_context *rctx,
util_range_add(&rdst->valid_buffer_range, dst_offset,
   dst_offset + size);
 
-   dst_offset += r600_resource_va(&rctx->screen->b.b, dst);
-   src_offset += r600_resource_va(&rctx->screen->b.b, src);
+   dst_offset += rdst->gpu_address;
+   src_offset += rsrc->gpu_address;
 
/* see if we use dword or byte copy */
if (!(dst_offset % 4) && !(src_offset % 4) && !(size % 4)) {
@@ -97,7 +97,7 @@ void evergreen_cp_dma_clear_buffer(struct r600_context *rctx,
util_range_add(&r600_resource(dst)->valid_buffer_range, offset,
   offset + size);
 
-   offset += r600_resource_va(&rctx->screen->b.b, dst);
+   offset += r600_resource(dst)->gpu_address;
 
/* Flush the cache where the resource is bound. */
rctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE |
diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 63811e8..4598ccf 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -600,7 +600,6 @@ texture_buffer_sampler_view(struct r600_pipe_sampler_view 
*view,
unsigned width0, unsigned height0)

 {
-   struct pipe_context *ctx = view->base.context;
struct r600_texture *tmp = (struct r600_texture*)view->base.texture;
uint64_t va;
int stride = util_format_get_blocksize(view->base.format);
@@ -624,7 +623,7 @@ texture_buffer_sampler_view(struct r600_pipe_sampler_view *view,
 
swizzle_res = r600_get_swizzle_combined(desc->swizzle, swizzle, TRUE);
 
-   va = r600_resource_va(ctx->screen, view->base.texture) + offset;
+   va = tmp->resource.gpu_address + offset;
view->tex_resource = &tmp->resource;
 
view->skip_mip_address_reloc = true;
@@ -781,7 +780,7 @@ evergreen_create_sampler_view_custom(struct pipe_context 
*ctx,
} else if (texture->target == PIPE_TEXTURE_CUBE_ARRAY)
depth = texture->array_size / 6;
 
-   va = r600_resource_va(ctx->screen, texture);
+   va = tmp->resource.gpu_address;
 
view->tex_resource = &tmp->resource;
view->tex_resource_words[0] = (S_030000_DIM(r600_tex_dim(texture->target, texture->nr_samples)) |
@@ -941,8 +940,7 @@ void evergreen_init_color_surface_rat(struct r600_context 
*rctx,
endian = ENDIAN_NONE;
}
 
-   surf->cb_color_base =
-   r600_resource_va(rctx->b.b.screen, pipe_buffer) >> 8;
+   surf->cb_color_base = r600_resource(pipe_buffer)->gpu_address >> 8;
 
surf->cb_color_pitch = (pitch / 8) - 1;
 
@@ -980,7 +978,6 @@ void evergreen_init_color_surface(struct r600_context *rctx,
 {
struct r600_screen *rscreen = rctx->screen;
struct r600_texture *rtex = (struct r600_texture*)surf->base.texture;
-   struct pipe_resource *pipe_tex = surf->base.texture;
unsigned level = surf->base.u.tex.level;
unsigned pitch, slice;
unsigned color_info, color_attrib, color_dim = 0, color_view;
@@ -1139,7 +1136,7 @@ void evergreen_init_color_surface(struct r600_context 
*rctx,
color_info |= S_028C70_COMPRESSION(1);
}
 
-   base_offset = r600_resource_va(rctx->b.b.screen, pipe_tex);

[Mesa-dev] [PATCH 5/6] gallium/radeon: use gpu_address from r600_resource

2014-08-06 Thread Marek Olšák
From: Marek Olšák marek.ol...@amd.com

---
 src/gallium/drivers/radeon/r600_query.c | 14 ++
 src/gallium/drivers/radeon/r600_streamout.c |  9 +++--
 src/gallium/drivers/radeon/r600_texture.c   | 12 +---
 3 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_query.c 
b/src/gallium/drivers/radeon/r600_query.c
index 92863cb..503737c 100644
--- a/src/gallium/drivers/radeon/r600_query.c
+++ b/src/gallium/drivers/radeon/r600_query.c
@@ -171,8 +171,7 @@ static void r600_emit_query_begin(struct 
r600_common_context *ctx, struct r600_q
}
 
/* emit begin query */
-   va = r600_resource_va(ctx->b.screen, (void*)query->buffer.buf);
-   va += query->buffer.results_end;
+   va = query->buffer.buf->gpu_address + query->buffer.results_end;
 
switch (query->type) {
case PIPE_QUERY_OCCLUSION_COUNTER:
@@ -233,7 +232,8 @@ static void r600_emit_query_end(struct r600_common_context 
*ctx, struct r600_que
ctx->need_gfx_cs_space(&ctx->b, query->num_cs_dw, FALSE);
}
 
-   va = r600_resource_va(ctx->b.screen, (void*)query->buffer.buf);
+   va = query->buffer.buf->gpu_address;
+
/* emit end query */
switch (query->type) {
case PIPE_QUERY_OCCLUSION_COUNTER:
@@ -329,7 +329,7 @@ static void r600_emit_query_predication(struct 
r600_common_context *ctx, struct
/* emit predicate packets for all data blocks */
for (qbuf = &query->buffer; qbuf; qbuf = qbuf->previous) {
unsigned results_base = 0;
-   uint64_t va = r600_resource_va(ctx->b.screen, &qbuf->buf->b.b);
+   uint64_t va = qbuf->buf->gpu_address;
 
while (results_base < qbuf->results_end) {
radeon_emit(cs, PKT3(PKT3_SET_PREDICATION, 1, 0));
@@ -826,7 +826,6 @@ void r600_query_init_backend_mask(struct 
r600_common_context *ctx)
uint32_t *results;
unsigned num_backends = ctx->screen->info.r600_num_backends;
unsigned i, mask = 0;
-   uint64_t va;
 
/* if backend_map query is supported by the kernel */
if (ctx->screen->info.r600_backend_map_valid) {
@@ -861,7 +860,6 @@ void r600_query_init_backend_mask(struct 
r600_common_context *ctx)
   PIPE_USAGE_STAGING, ctx->max_db*16);
if (!buffer)
goto err;
-   va = r600_resource_va(ctx->b.screen, (void*)buffer);
 
/* initialize buffer with zeroes */
results = r600_buffer_map_sync_with_rings(ctx, buffer, PIPE_TRANSFER_WRITE);
@@ -872,8 +870,8 @@ void r600_query_init_backend_mask(struct r600_common_context *ctx)
/* emit EVENT_WRITE for ZPASS_DONE */
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_ZPASS_DONE) | EVENT_INDEX(1));
-   radeon_emit(cs, va);
-   radeon_emit(cs, va >> 32);
+   radeon_emit(cs, buffer->gpu_address);
+   radeon_emit(cs, buffer->gpu_address >> 32);
 
r600_emit_reloc(ctx, &ctx->rings.gfx, buffer, RADEON_USAGE_WRITE, RADEON_PRIO_MIN);
 
diff --git a/src/gallium/drivers/radeon/r600_streamout.c 
b/src/gallium/drivers/radeon/r600_streamout.c
index cb72ada..e2413c2 100644
--- a/src/gallium/drivers/radeon/r600_streamout.c
+++ b/src/gallium/drivers/radeon/r600_streamout.c
@@ -212,8 +212,7 @@ static void r600_emit_streamout_begin(struct 
r600_common_context *rctx, struct r
 t[i]->b.buffer_size) >> 2);/* BUFFER_SIZE (in DW) */
radeon_emit(cs, stride_in_dw[i]);   /* VTX_STRIDE (in DW) */
} else {
-   uint64_t va = r600_resource_va(rctx->b.screen,
-  (void*)t[i]->b.buffer);
+   uint64_t va = r600_resource(t[i]->b.buffer)->gpu_address;
 
update_flags |= SURFACE_BASE_UPDATE_STRMOUT(i);
 
@@ -239,8 +238,7 @@ static void r600_emit_streamout_begin(struct 
r600_common_context *rctx, struct r
}
 
if (rctx->streamout.append_bitmask & (1 << i)) {
-   uint64_t va = r600_resource_va(rctx->b.screen,
-  (void*)t[i]->buf_filled_size) +
+   uint64_t va = t[i]->buf_filled_size->gpu_address +
  t[i]->buf_filled_size_offset;
 
/* Append. */
@@ -286,8 +284,7 @@ void r600_emit_streamout_end(struct r600_common_context 
*rctx)
if (!t[i])
continue;
 
-   va = r600_resource_va(rctx->b.screen,
- (void*)t[i]->buf_filled_size) + t[i]->buf_filled_size_offset;
+   va = t[i]->buf_filled_size->gpu_address + t[i]->buf_filled_size_offset;
   

[Mesa-dev] [PATCH 1/6] r600g: remove useless r600_resource_va calls

2014-08-06 Thread Marek Olšák
From: Marek Olšák marek.ol...@amd.com

R600-R700 don't support virtual memory.
---
 src/gallium/drivers/r600/r600_state.c | 27 +--
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 258ffd1..607b199 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -595,25 +595,22 @@ texture_buffer_sampler_view(struct r600_pipe_sampler_view 
*view,
unsigned width0, unsigned height0)

 {
-   struct pipe_context *ctx = view->base.context;
struct r600_texture *tmp = (struct r600_texture*)view->base.texture;
-   uint64_t va;
int stride = util_format_get_blocksize(view->base.format);
unsigned format, num_format, format_comp, endian;
-   unsigned offset = view->base.u.buf.first_element * stride;
+   uint64_t offset = view->base.u.buf.first_element * stride;
unsigned size = (view->base.u.buf.last_element - view->base.u.buf.first_element + 1) * stride;
 
r600_vertex_data_type(view->base.format,
  &format, &num_format, &format_comp,
  &endian);
 
-   va = r600_resource_va(ctx->screen, view->base.texture) + offset;
view->tex_resource = &tmp->resource;
-
view->skip_mip_address_reloc = true;
-   view->tex_resource_words[0] = va;
+
+   view->tex_resource_words[0] = offset;
view->tex_resource_words[1] = size - 1;
-   view->tex_resource_words[2] = S_038008_BASE_ADDRESS_HI(va >> 32UL) |
+   view->tex_resource_words[2] = S_038008_BASE_ADDRESS_HI(offset >> 32UL) |
S_038008_STRIDE(stride) |
S_038008_DATA_FORMAT(format) |
S_038008_NUM_FORMAT_ALL(num_format) |
@@ -1105,8 +1102,7 @@ static void r600_init_depth_surface(struct r600_context 
*rctx,
 
/* use htile only for first level */
if (rtex->htile_buffer && !level) {
-   uint64_t va = r600_resource_va(&rctx->screen->b.b, &rtex->htile_buffer->b.b);
-   surf->db_htile_data_base = va >> 8;
+   surf->db_htile_data_base = 0;
surf-db_htile_surface = S_028D24_HTILE_WIDTH(1) |
S_028D24_HTILE_HEIGHT(1) |
S_028D24_FULL_CACHE(1) |
@@ -1944,7 +1940,6 @@ static void r600_emit_shader_stages(struct r600_context 
*rctx, struct r600_atom
 
 static void r600_emit_gs_rings(struct r600_context *rctx, struct r600_atom *a)
 {
-   struct pipe_screen *screen = rctx->b.b.screen;
struct radeon_winsys_cs *cs = rctx->b.rings.gfx.cs;
struct r600_gs_rings_state *state = (struct r600_gs_rings_state*)a;
struct r600_resource *rbuffer;
@@ -1955,8 +1950,7 @@ static void r600_emit_gs_rings(struct r600_context *rctx, 
struct r600_atom *a)
 
if (state->enable) {
rbuffer =(struct r600_resource*)state->esgs_ring.buffer;
-   r600_write_config_reg(cs, R_008C40_SQ_ESGS_RING_BASE,
-   (r600_resource_va(screen, &rbuffer->b.b)) >> 8);
+   r600_write_config_reg(cs, R_008C40_SQ_ESGS_RING_BASE, 0);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx, rbuffer,
  RADEON_USAGE_READWRITE,
@@ -1965,8 +1959,7 @@ static void r600_emit_gs_rings(struct r600_context *rctx, struct r600_atom *a)
state->esgs_ring.buffer_size >> 8);
 
rbuffer =(struct r600_resource*)state->gsvs_ring.buffer;
-   r600_write_config_reg(cs, R_008C48_SQ_GSVS_RING_BASE,
-   (r600_resource_va(screen, &rbuffer->b.b)) >> 8);
+   r600_write_config_reg(cs, R_008C48_SQ_GSVS_RING_BASE, 0);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx, rbuffer,
  RADEON_USAGE_READWRITE,
@@ -2644,8 +2637,7 @@ void r600_update_gs_state(struct pipe_context *ctx, 
struct r600_pipe_shader *sha
r600_store_context_reg(cb, R_02887C_SQ_PGM_RESOURCES_GS,
   S_02887C_NUM_GPRS(rshader->bc.ngpr) |
   S_02887C_STACK_SIZE(rshader->bc.nstack));
-   r600_store_context_reg(cb, R_02886C_SQ_PGM_START_GS,
-  r600_resource_va(ctx->screen, (void *)shader->bo) >> 8);
+   r600_store_context_reg(cb, R_02886C_SQ_PGM_START_GS, 0);
/* After that, the NOP relocation packet must be emitted (shader->bo, RADEON_USAGE_READ). */
 }
 
@@ -2659,8 +2651,7 @@ void r600_update_es_state(struct pipe_context *ctx, 
struct r600_pipe_shader *sha
r600_store_context_reg(cb, R_028890_SQ_PGM_RESOURCES_ES,
   

Re: [Mesa-dev] [PATCH 08/20] i965/cfg: Add functions to test if a block is a successor/predecessor.

2014-08-06 Thread Matt Turner
On Wed, Aug 6, 2014 at 11:19 AM, Matt Turner matts...@gmail.com wrote:
> On Tue, Aug 5, 2014 at 10:21 AM, Pohjolainen, Topi
> topi.pohjolai...@intel.com wrote:
>> On Thu, Jul 24, 2014 at 07:54:15PM -0700, Matt Turner wrote:
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_cfg.cpp | 24 
>>>  src/mesa/drivers/dri/i965/brw_cfg.h   |  2 ++
>>>  2 files changed, 26 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp b/src/mesa/drivers/dri/i965/brw_cfg.cpp
>>> index d806b83..9cd8b9f 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_cfg.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp
>>> @@ -71,6 +71,30 @@ bblock_t::add_successor(void *mem_ctx, bblock_t *successor)
>>>     children.push_tail(::link(mem_ctx, successor));
>>>  }
>>>
>>> +bool
>>> +bblock_t::is_predecessor_of(const bblock_t *block) const
>>> +{
>>> +   foreach_list_typed_safe (bblock_link, parent, link, &block->parents) {
>>
>> I read patch number three again, and noticed this small formatting change
>> there as well. I haven't seen us leaving the space before ( anywhere else.
>
> I figure I should have a space between the macro and the ( since we
> put a space there for regular for loops.
>
>>> +  if (parent->block == this) {
>>> + return true;
>>> +  }
>>
>> We have one-line blocks with and without {}. I just thought I'd mention it
>> in case you didn't mean to.
>
> Right, I'll drop the {}.

Oh, this is inside another {} set. I'd rather include the braces when
they're nested.
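
For readers without the tree at hand, the test under discussion can be
sketched in plain C as below. This is only an illustration: the real code is
C++ and walks an exec_list, while the toy_block type and its fixed-size parent
array here are hypothetical. The inner braces are kept, matching the
preference for braces in nested blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified basic block: only the parent links matter here. */
struct toy_block {
   const struct toy_block *parents[4];
   size_t num_parents;
};

/* Returns true if 'self' appears in the parent list of 'block', i.e. if
 * control can flow directly from 'self' into 'block'. */
static bool
toy_is_predecessor_of(const struct toy_block *self,
                      const struct toy_block *block)
{
   for (size_t i = 0; i < block->num_parents; i++) {
      if (block->parents[i] == self) {
         return true;
      }
   }

   return false;
}
```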
___


[Mesa-dev] [PATCH] gallium: remove PIPE_SHADER_CAP_MAX_ADDRS

2014-08-06 Thread Marek Olšák
From: Marek Olšák marek.ol...@amd.com

This limit is fixed in Mesa core and cannot be changed.
It only affects ARB_vertex_program and ARB_fragment_program.

The minimum value for ARB_vertex_program is 1 according to the spec.
The maximum value for ARB_vertex_program is limited to 1 by Mesa core.

The value should be zero for ARB_fragment_program, because it doesn't
support ARL.

Finally, drivers shouldn't mess with these values arbitrarily.
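
To make the fixed limits above concrete, here is a small C sketch. It is an
illustration under assumptions, not the code in st_extensions.c: the enum and
function names are hypothetical, and only the values (1 for
ARB_vertex_program, 0 for ARB_fragment_program) come from the description
above.

```c
#include <assert.h>

/* Hypothetical program kinds, for illustration only. */
enum toy_program_kind {
   TOY_VERTEX_PROGRAM,
   TOY_FRAGMENT_PROGRAM
};

/* Address register limit as a constant per program kind rather than a
 * per-driver capability. */
static unsigned
toy_max_address_regs(enum toy_program_kind kind)
{
   switch (kind) {
   case TOY_VERTEX_PROGRAM:
      return 1;   /* spec minimum is 1, and core Mesa caps it at 1 */
   case TOY_FRAGMENT_PROGRAM:
      return 0;   /* no ARL in ARB_fragment_program */
   }

   return 0;
}
```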
---

Sidenote: Does anybody use predicates in TGSI?

 src/gallium/auxiliary/gallivm/lp_bld_limits.h| 2 --
 src/gallium/auxiliary/tgsi/tgsi_exec.h   | 3 ---
 src/gallium/docs/source/screen.rst   | 1 -
 src/gallium/drivers/freedreno/freedreno_screen.c | 2 --
 src/gallium/drivers/i915/i915_screen.c   | 2 --
 src/gallium/drivers/ilo/ilo_screen.c | 2 --
 src/gallium/drivers/nouveau/nv30/nv30_screen.c   | 4 
 src/gallium/drivers/nouveau/nv50/nv50_screen.c   | 2 --
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 2 --
 src/gallium/drivers/r300/r300_screen.c   | 3 ---
 src/gallium/drivers/r600/r600_pipe.c | 3 ---
 src/gallium/drivers/radeonsi/si_pipe.c   | 3 ---
 src/gallium/drivers/svga/svga_screen.c   | 3 ---
 src/gallium/include/pipe/p_defines.h | 1 -
 src/mesa/state_tracker/st_extensions.c   | 3 +--
 15 files changed, 1 insertion(+), 35 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_limits.h 
b/src/gallium/auxiliary/gallivm/lp_bld_limits.h
index eb83ea8..a96ab29 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_limits.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_limits.h
@@ -103,8 +103,6 @@ gallivm_get_shader_param(enum pipe_shader_cap param)
   return PIPE_MAX_CONSTANT_BUFFERS;
case PIPE_SHADER_CAP_MAX_TEMPS:
   return LP_MAX_TGSI_TEMPS;
-   case PIPE_SHADER_CAP_MAX_ADDRS:
-  return LP_MAX_TGSI_ADDRS;
case PIPE_SHADER_CAP_MAX_PREDS:
   return LP_MAX_TGSI_PREDS;
case PIPE_SHADER_CAP_TGSI_CONT_SUPPORTED:
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h 
b/src/gallium/auxiliary/tgsi/tgsi_exec.h
index c6fd3d7..4720ec6 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
@@ -193,7 +193,6 @@ struct tgsi_sampler
 #define TGSI_EXEC_NUM_TEMP_R        4
 
 #define TGSI_EXEC_TEMP_ADDR (TGSI_EXEC_NUM_TEMPS + 8)
-#define TGSI_EXEC_NUM_ADDRS 1
 
 /* predicate register */
 #define TGSI_EXEC_TEMP_P0   (TGSI_EXEC_NUM_TEMPS + 9)
@@ -433,8 +432,6 @@ tgsi_exec_get_shader_param(enum pipe_shader_cap param)
   return PIPE_MAX_CONSTANT_BUFFERS;
case PIPE_SHADER_CAP_MAX_TEMPS:
   return TGSI_EXEC_NUM_TEMPS;
-   case PIPE_SHADER_CAP_MAX_ADDRS:
-  return TGSI_EXEC_NUM_ADDRS;
case PIPE_SHADER_CAP_MAX_PREDS:
   return TGSI_EXEC_NUM_PREDS;
case PIPE_SHADER_CAP_TGSI_CONT_SUPPORTED:
diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index 74cecc2..814e3ae 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -269,7 +269,6 @@ file is still supported. In that case, the constbuf index 
is assumed
 to be 0.
 
 * ``PIPE_SHADER_CAP_MAX_TEMPS``: The maximum number of temporary registers.
-* ``PIPE_SHADER_CAP_MAX_ADDRS``: The maximum number of address registers.
 * ``PIPE_SHADER_CAP_MAX_PREDS``: The maximum number of predicate registers.
 * ``PIPE_SHADER_CAP_TGSI_CONT_SUPPORTED``: Whether the continue opcode is 
supported.
 * ``PIPE_SHADER_CAP_INDIRECT_INPUT_ADDR``: Whether indirect addressing
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index 8fae5dd..5fb7352 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -327,8 +327,6 @@ fd_screen_get_shader_param(struct pipe_screen *pscreen, 
unsigned shader,
return 16;
case PIPE_SHADER_CAP_MAX_TEMPS:
return 64; /* Max native temporaries. */
-   case PIPE_SHADER_CAP_MAX_ADDRS:
-   return 1; /* Max native address registers */
case PIPE_SHADER_CAP_MAX_CONST_BUFFER_SIZE:
return ((screen->gpu_id >= 300) ? 1024 : 64) * sizeof(float[4]);
case PIPE_SHADER_CAP_MAX_CONST_BUFFERS:
diff --git a/src/gallium/drivers/i915/i915_screen.c 
b/src/gallium/drivers/i915/i915_screen.c
index 133c773..ca3dd4a 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -135,8 +135,6 @@ i915_get_shader_param(struct pipe_screen *screen, unsigned 
shader, enum pipe_sha
  return 1;
   case PIPE_SHADER_CAP_MAX_TEMPS:
  return 12; /* XXX: 12 - 32 ? */
-  case PIPE_SHADER_CAP_MAX_ADDRS:
- return 0;
   case PIPE_SHADER_CAP_MAX_PREDS:
  return 0;
   case PIPE_SHADER_CAP_TGSI_CONT_SUPPORTED:
diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
b/src/gallium/drivers/ilo/ilo_screen.c

Re: [Mesa-dev] Merging VC4 driver

2014-08-06 Thread Eric Anholt
Roland Scheidegger srol...@vmware.com writes:

> On 06.08.2014 22:33, Eric Anholt wrote:
>
>> + * When building using the simulator (on x86), we advertise ourselves as the
>> + * i965 driver so that you can just make a directory with a link from
>> + * i965_dri.so to the built vc4_dri.so, and point LIBGL_DRIVERS_PATH to that
>> + * on your i965-using host to run the driver under simulation.
>> + *
>> + * This is, of course, incompatible with building with the ilo driver, but you
>> + * shouldn't be building that anyway.
>> + */
>> +PUBLIC const __DRIextension **__driDriverGetExtensions_i965(void)
>> +{
>> +   globalDriverAPI = &galliumdrm_driver_api;
>> +   return galliumdrm_driver_extensions;
>> +}
>> +#endif
> I have no idea how that simulator works, but this looks like a fairly
> gross hack to me. Couldn't you use something similar to how the software
> based drivers are loaded or anything like that?

I need a DRI fd and normal DRI buffer management from a host x server
(which I interact with using the dumb ioctls).  Are you thinking of
something that would provide that in a simpler way?




[Mesa-dev] [Bug 82268] New: Add support for the OpenRISC architecture (or1k)

2014-08-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=82268

  Priority: medium
Bug ID: 82268
  Assignee: mesa-dev@lists.freedesktop.org
   Summary: Add support for the OpenRISC architecture (or1k)
  Severity: enhancement
Classification: Unclassified
OS: All
  Reporter: manuel.montez...@gmail.com
  Hardware: Other
Status: NEW
   Version: 10.1
 Component: Mesa core
   Product: Mesa

Created attachment 104182
  --> https://bugs.freedesktop.org/attachment.cgi?id=104182&action=edit
mesa-or1k.patch

From Debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=749172

From: Christian Svensson deb...@cmd.nu
Date: Sat, 24 May 2014 21:37:01 +0200

Package: mesa
Version: 10.1.2
Severity: wishlist
Tags: upstream patch

Dear Maintainer,

This trivial patch adds support for or1k.

-- 
You are receiving this mail because:
You are the assignee for the bug.

