Re: [Mesa-dev] [PATCH] glsl: Flip around "if" statements with empty "then" blocks.

2013-05-03 Thread Matt Turner
On Fri, May 3, 2013 at 3:05 PM, Eric Anholt  wrote:
> This cleans up some funny-looking code in some unigine shaders I was
> looking at.  Also slightly helps on planeshift and a few shaders in an
> upcoming Valve release.
>
> total instructions in shared programs: 1653715 -> 1653587 (-0.01%)
> instructions in affected programs: 16550 -> 16422 (-0.77%)
> ---
>  src/glsl/opt_if_simplification.cpp | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/src/glsl/opt_if_simplification.cpp b/src/glsl/opt_if_simplification.cpp
> index e1a23d9..db59b13 100644
> --- a/src/glsl/opt_if_simplification.cpp
> +++ b/src/glsl/opt_if_simplification.cpp
> @@ -25,7 +25,8 @@
>   * \file opt_if_simplification.cpp
>   *
>   * Moves constant branches of if statements out to the surrounding
> - * instruction stream.
> + * instruction stream, and inverts if conditionals to avoid empty
> + * "then" blocks.
>   */
>
>  #include "ir.h"
> @@ -101,6 +102,30 @@ ir_if_simplification_visitor::visit_leave(ir_if *ir)
>        }
>        ir->remove();
>        this->made_progress = true;
> +      return visit_continue;
> +   }
> +
> +   /* Turn:
> +    *
> +    *     if (cond) {
> +    *     } else {
> +    *         do_work();
> +    *     }
> +    *
> +    * into :
> +    *
> +    *     if (!cond)
> +    *         do_work();
> +    *
> +    * which avoids control flow for "else" (which is usually more
> +    * expensive than normal operations), and the "not" can usually be
> +    * folded into the generation of "cond" anyway.
> +    */
> +   if (ir->then_instructions.is_empty()) {
> +      ir->condition = new(ralloc_parent(ir->condition))
> +         ir_expression(ir_unop_logic_not, ir->condition);
> +      ir->else_instructions.move_nodes_to(&ir->then_instructions);
> +      this->made_progress = true;
>     }
>
>     return visit_continue;
> --

Reviewed-by: Matt Turner 
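
For readers skimming the thread, the rewrite in the patch can be sketched
outside of Mesa. This is only a toy model under stated assumptions: ToyIf and
flip_empty_then are hypothetical names, instruction lists are plain strings,
and the negation is modeled by wrapping the condition text instead of building
an ir_expression with ir_unop_logic_not.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ir_if; not a Mesa type.
struct ToyIf {
    std::string condition;
    std::vector<std::string> then_instructions;
    std::vector<std::string> else_instructions;
};

// Returns true when the statement was rewritten, in the spirit of
// made_progress in the patch.
bool flip_empty_then(ToyIf &stmt)
{
    if (!stmt.then_instructions.empty())
        return false;

    // Negate the condition and move the "else" body into the "then" slot.
    // "then" is known to be empty here, so a swap leaves "else" empty.
    stmt.condition = "!(" + stmt.condition + ")";
    std::swap(stmt.then_instructions, stmt.else_instructions);
    return true;
}
```

A statement whose "then" block already has instructions is left untouched, so
the function can be called on every if statement without extra checks.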
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 15/14] i965/fs: Don't emit 16-wide BFI1 instructions.

2013-05-03 Thread Matt Turner
On Fri, May 3, 2013 at 5:02 PM, Eric Anholt  wrote:
> Matt Turner  writes:
>
>> The Haswell Bspec says "A SIMD16 instruction is not allowed." (but
>> 16-wide BFI1 works for me so far). Since GLSL's bitfieldInsert()
>> function takes int parameters BFI1 produces the same results in all
>> channels, so there's never any reason to emit a 16-wide BFI1.
>
> Suppose the first 8 pixels didn't take a branch, while the second 8 did,
> and you're bitfield inserting in this branch.  You'd be using undefined
> values for src[0].

You're right. Thanks. I'll send an updated patch.
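
The objection can be illustrated with a small software model of a 16-channel
register and an execution mask. Everything here is an assumption made for
illustration: the names are hypothetical, the channel model is not how the
hardware is programmed, and bfi1() assumes the instruction builds a mask of
`width` ones shifted left by `offset`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

constexpr int kChannels = 16;

// std::nullopt marks a channel that no instruction has written yet.
using Reg = std::array<std::optional<std::uint32_t>, kChannels>;

// Assumed BFI1 result: `width` one bits starting at bit `offset`.
std::uint32_t bfi1(std::uint32_t width, std::uint32_t offset)
{
    const std::uint32_t ones = width >= 32 ? 0xffffffffu : ((1u << width) - 1u);
    return offset >= 32 ? 0u : ones << offset;
}

// Runs BFI1 on channels [first, first + count), skipping channels that are
// disabled in the execution mask.
void emit_bfi1(Reg &dst, std::uint16_t exec_mask, int first, int count,
               std::uint32_t width, std::uint32_t offset)
{
    for (int ch = first; ch < first + count; ++ch) {
        if (exec_mask & (1u << ch))
            dst[ch] = bfi1(width, offset);
    }
}
```

With exec_mask = 0xff00 (only the second eight channels took the branch), an
8-wide emit over channels 0..7 writes nothing at all, and channels 8..15 stay
unwritten until a second 8-wide emit covers them. That is the case where a
later instruction would read undefined values.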


[Mesa-dev] [PATCH 1/2] i965/gen7: Add defines for Memory Object Control State

2013-05-03 Thread Chad Versace
CC: Stéphane Marchesin 
CC: Kenneth Graunke 
CC: Eric Anholt 
CC: Matt Turner 
Signed-off-by: Chad Versace 
---

 src/mesa/drivers/dri/i965/brw_defines.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 6517f24..7895caf 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -503,6 +503,8 @@
 #define BRW_SURFACE_Y_OFFSET_MASK  INTEL_MASK(23, 20)
 #define GEN7_SURFACE_MIN_LOD_SHIFT  4
 #define GEN7_SURFACE_MIN_LOD_MASK   INTEL_MASK(7, 4)
+#define GEN7_SURFACE_MOCS_SHIFT 16
+#define GEN7_SURFACE_MOCS_MASK  INTEL_MASK(19, 16)
 
 /* Surface state DW6 */
 #define GEN7_SURFACE_MCS_ENABLE (1 << 0)
@@ -1660,6 +1662,24 @@ enum brw_wm_barycentric_interp_mode {
  */
 #define BRW_MAX_NUM_BUFFER_ENTRIES (1 << 27)
 
+/* Memory Object Control State, Gen7 */
+/* L3 Cacheability Control */
+#define GEN7_MOCS_L3_UNCACHEABLE        0
+#define GEN7_MOCS_L3_CACHEABLE          1
+/* LCC Cacheability Control */
+#define GEN7_MOCS_LCC_USE_PTE           (0 << 1)
+#define GEN7_MOCS_LCC_CACHEABLE         (1 << 1)
+
+/* Memory Object Control State, Haswell */
+/* L3 Cacheability Control */
+#define HSW_MOCS_L3_UNCACHEABLE         0
+#define HSW_MOCS_L3_CACHEABLE           1
+/* LCC Cacheability Control */
+#define HSW_MOCS_LCC_USE_PTE            (0 << 1)
+#define HSW_MOCS_LCC_UNCACHEABLE        (1 << 1)
+#define HSW_MOCS_LCC_WB_TO_ALL          (2 << 1)
+#define HSW_MOCS_LCC_WB_TO_SOME         (3 << 1)
+
 #include "intel_chipset.h"
 
 #endif
-- 
1.8.1.4



[Mesa-dev] [PATCH 2/2] i965/hsw: Set MOCS for surfaces

2013-05-03 Thread Chad Versace
The driver was setting MOCS (Memory Object Control State) to 0 for all
objects. This patch sets it as follows:
renderbuffer, depthbuffer => LLC uncacheable, L3 cacheable
texture, stencil, hiz => LLC cacheable, L3 cacheable

The goal here is to avoid blowing out the LLC with too-large buffers.
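
As a rough sketch of what ends up in the surface state, the mapping and the
bit packing can be written as standalone C++. The constant values are copied
from patch 1/2; mask_bits, hsw_mocs_for, pack_mocs and the Usage enum are
hypothetical names introduced here, and mask_bits only assumes that
INTEL_MASK(high, low) sets bits high..low inclusive.

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;

// Values copied from patch 1/2 of this series.
constexpr uint32_t HSW_MOCS_L3_CACHEABLE    = 1;
constexpr uint32_t HSW_MOCS_LCC_UNCACHEABLE = 1u << 1;
constexpr uint32_t HSW_MOCS_LCC_WB_TO_ALL   = 2u << 1;
constexpr unsigned GEN7_SURFACE_MOCS_SHIFT  = 16;

// Hypothetical helper: bits high..low (inclusive) set.
constexpr uint32_t mask_bits(unsigned high, unsigned low)
{
    return ((1u << (high - low + 1)) - 1u) << low;
}
constexpr uint32_t GEN7_SURFACE_MOCS_MASK = mask_bits(19, 16);

// Hypothetical mirror of enum brw_mocs_usage.
enum class Usage { RenderTarget, Texture, Depth, Stencil, HiZ };

// Same usage-only heuristic as brw_get_mocs() on Haswell.
uint32_t hsw_mocs_for(Usage usage)
{
    switch (usage) {
    case Usage::RenderTarget:
    case Usage::Depth:
        return HSW_MOCS_LCC_UNCACHEABLE | HSW_MOCS_L3_CACHEABLE;
    case Usage::Texture:
    case Usage::Stencil:
    case Usage::HiZ:
        return HSW_MOCS_LCC_WB_TO_ALL | HSW_MOCS_L3_CACHEABLE;
    }
    return 0;
}

// Places the MOCS value in bits 19:16 of a surface-state dword.
uint32_t pack_mocs(uint32_t mocs)
{
    return (mocs << GEN7_SURFACE_MOCS_SHIFT) & GEN7_SURFACE_MOCS_MASK;
}
```

Under these assumptions a render target packs to 0x30000 and a texture to
0x50000 in the dword that carries the MOCS field.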

Performance:
Haswell Harris Beach GT3
Android 4.2.2
kernel based on 3.8-4fc7c97

GLBenchmark 2.5.1 Egypt HD C24Z16 Offscreen DXT1
+32.0309% +/- 0.775397%,  n = 5, 95% confidence

GLBenchmark 2.7 T-Rex HD C24Z16 Offscreen Fixed timestep ETC1
+20.2435% +/- 0.821163%,  n = 5, 95% confidence

CC: Stéphane Marchesin 
CC: Kenneth Graunke 
CC: Eric Anholt 
CC: Matt Turner 
Signed-off-by: Chad Versace 
---
 src/mesa/drivers/dri/i965/brw_context.c   | 31 +++
 src/mesa/drivers/dri/i965/brw_context.h   | 13 ++
 src/mesa/drivers/dri/i965/gen7_blorp.cpp  | 14 +++---
 src/mesa/drivers/dri/i965/gen7_misc_state.c   | 16 ++--
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |  7 +++--
 5 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 4650553..edcf59d 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -418,3 +418,34 @@ brwCreateContext(int api,
return true;
 }
 
+/**
+ * Get the region's Memory Object Control State.
+ */
+uint32_t
+brw_get_mocs(struct brw_context *brw,
+ struct intel_region *region,
+ enum brw_mocs_usage usage)
+{
+   struct intel_context *intel = &brw->intel;
+   uint32_t mocs = 0;
+
+   if (intel->is_haswell) {
+      /* This heuristic is dumb: it considers the buffer's usage, but not its
+       * size. A more intelligent heuristic may give us better performance.
+       */
+      switch (usage) {
+      case BRW_MOCS_USAGE_RB_SURFACE:
+      case BRW_MOCS_USAGE_DEPTH:
+         /* These surfaces are usually so large they blow out the LLC. */
+         mocs = HSW_MOCS_LCC_UNCACHEABLE | HSW_MOCS_L3_CACHEABLE;
+         break;
+      case BRW_MOCS_USAGE_TEX_SURFACE:
+      case BRW_MOCS_USAGE_HIZ:
+      case BRW_MOCS_USAGE_STENCIL:
+         mocs = HSW_MOCS_LCC_WB_TO_ALL | HSW_MOCS_L3_CACHEABLE;
+         break;
+      }
+   }
+
+   return mocs;
+}
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index c682501..172d36d 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1371,6 +1371,19 @@ gen7_emit_depth_stencil_hiz(struct brw_context *brw,
 uint32_t width, uint32_t height,
 uint32_t tile_x, uint32_t tile_y);
 
+enum brw_mocs_usage {
+   BRW_MOCS_USAGE_RB_SURFACE,
+   BRW_MOCS_USAGE_TEX_SURFACE,
+   BRW_MOCS_USAGE_DEPTH,
+   BRW_MOCS_USAGE_STENCIL,
+   BRW_MOCS_USAGE_HIZ,
+};
+
+uint32_t
+brw_get_mocs(struct brw_context *brw,
+ struct intel_region *region,
+ enum brw_mocs_usage usage);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
index 1c23866..48e98dc 100644
--- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
@@ -151,6 +151,9 @@ gen7_blorp_emit_surface_state(struct brw_context *brw,
 */
struct intel_region *region = surface->mt->region;
uint32_t tile_x, tile_y;
+   uint32_t mocs = brw_get_mocs(brw, region, is_render_target
+? BRW_MOCS_USAGE_RB_SURFACE
+: BRW_MOCS_USAGE_TEX_SURFACE);
 
uint32_t tiling = surface->map_stencil_as_y_tiled
   ? I915_TILING_Y : region->tiling;
@@ -183,7 +186,8 @@ gen7_blorp_emit_surface_state(struct brw_context *brw,
assert(tile_x % 4 == 0);
assert(tile_y % 2 == 0);
surf[5] = SET_FIELD(tile_x / 4, BRW_SURFACE_X_OFFSET) |
- SET_FIELD(tile_y / 2, BRW_SURFACE_Y_OFFSET);
+ SET_FIELD(tile_y / 2, BRW_SURFACE_Y_OFFSET) |
+ SET_FIELD(mocs, GEN7_SURFACE_MOCS);
 
surf[2] = SET_FIELD(width - 1, GEN7_SURFACE_WIDTH) |
  SET_FIELD(height - 1, GEN7_SURFACE_HEIGHT);
@@ -598,6 +602,8 @@ gen7_blorp_emit_depth_stencil_config(struct brw_context *brw,
  intel_region_get_aligned_offset(params->depth.mt->region,
  draw_x & ~tile_mask_x,
  draw_y & ~tile_mask_y, false);
+  uint32_t depth_mocs = brw_get_mocs(brw, params->depth.mt->region,
+ BRW_MOCS_USAGE_DEPTH);
 
   /* According to the Sandy Bridge PRM, volume 2 part 1, pp326-327
* (3DSTATE_DEPTH_BUFFER dw5), in the documentation for "Depth
@@ -634,7 +640,7 @@ gen7_blorp_emit_depth_stencil_config(struct brw_context *brw,
 offset);

[Mesa-dev] [PATCH 0/2] i965/hsw: Set MOCS for surfaces

2013-05-03 Thread Chad Versace
+32% on GLBenchmark 2.5 Egypt Offscreen DXT1. Tested on Harris Beach GT3 with
Android.

Please please please, someone run Piglit on this series.  I currently have no
Haswell hardware that is stable on non-Android Linux, and so can't run Piglit
myself.  I don't want to commit this series until it's been regression tested.

This series lives on my "mocs" branch.

Chad Versace (2):
  i965/gen7: Add defines for Memory Object Control State
  i965/hsw: Set MOCS for surfaces

 src/mesa/drivers/dri/i965/brw_context.c   | 31 +++
 src/mesa/drivers/dri/i965/brw_context.h   | 13 ++
 src/mesa/drivers/dri/i965/brw_defines.h   | 20 +++
 src/mesa/drivers/dri/i965/gen7_blorp.cpp  | 14 +++---
 src/mesa/drivers/dri/i965/gen7_misc_state.c   | 16 ++--
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |  7 +++--
 6 files changed, 94 insertions(+), 7 deletions(-)

-- 
1.8.1.4



Re: [Mesa-dev] [PATCH] i965: Enable fast clears on non-8x4-aligned sizes.

2013-05-03 Thread Eric Anholt
Kenneth Graunke  writes:

> On 05/02/2013 01:58 PM, Eric Anholt wrote:
>> Improves glb2.7 performance at a misaligned size by 2.3% +/- 0.7% (n=11).
>> The workaround was to avoid bad primitive/surface sizes, but that's worked
>> around as of a14dc4f92cdad6177d83f051a088a66e31a973bc.  (One might note
>> that pre-gen7 we don't know that the right half of an 8x4 at the right
>> edge is actually our pixels, but we're already clobbering those pixels for
>> depth resolves anyway and more work would be required to avoid that).
>> ---
>>   src/mesa/drivers/dri/i965/brw_clear.c | 13 -
>>   1 file changed, 13 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_clear.c b/src/mesa/drivers/dri/i965/brw_clear.c
>> index 4ee456a..2b999bf 100644
>> --- a/src/mesa/drivers/dri/i965/brw_clear.c
>> +++ b/src/mesa/drivers/dri/i965/brw_clear.c
>> @@ -128,19 +128,6 @@ brw_fast_clear_depth(struct gl_context *ctx)
>> return false;
>>  }
>>
>> -   /* The rendered area has to be 8x4 samples, not resolved pixels, so we look
>> -* at the miptree slice dimensions instead of renderbuffer size.
>> -*/
>> -   if (mt->level[depth_irb->mt_level].width % 8 != 0 ||
>> -   mt->level[depth_irb->mt_level].height % 4 != 0) {
>> -  perf_debug("Failed to fast clear depth due to width/height %d,%d not "
>> - "being aligned to 8,4.  Possible 5%% performance win if "
>> - "avoided\n",
>> - mt->level[depth_irb->mt_level].width,
>> - mt->level[depth_irb->mt_level].height);
>> -  return false;
>> -   }
>> -
>>  uint32_t depth_clear_value;
>>  switch (mt->format) {
>>  case MESA_FORMAT_Z32_FLOAT_X24S8:
>>
>
> Would be worth regression testing on Gen6.

Passed piglit.




Re: [Mesa-dev] [PATCH 15/14] i965/fs: Don't emit 16-wide BFI1 instructions.

2013-05-03 Thread Eric Anholt
Matt Turner  writes:

> The Haswell Bspec says "A SIMD16 instruction is not allowed." (but
> 16-wide BFI1 works for me so far). Since GLSL's bitfieldInsert()
> function takes int parameters BFI1 produces the same results in all
> channels, so there's never any reason to emit a 16-wide BFI1.

Suppose the first 8 pixels didn't take a branch, while the second 8 did,
and you're bitfield inserting in this branch.  You'd be using undefined
values for src[0].




[Mesa-dev] [PATCH 12/12] glsl: Death to array dereferences of vectors!

2013-05-03 Thread Ian Romanick
From: Ian Romanick 

Now that all the places that used to generate array dereferences of
vectors have been changed to generate either ir_binop_vector_extract or
ir_triop_vector_insert (or both), remove all support for dealing with
this deprecated construct.

As an added safeguard, modify ir_validate to reject ir_dereference_array
of a vector.

v2: Convert tabs to spaces.  Suggested by Eric.

Signed-off-by: Ian Romanick 
Reviewed-by: Eric Anholt 
---
 src/glsl/ir_validate.cpp|  29 +++
 src/glsl/lower_vec_index_to_cond_assign.cpp | 116 +---
 src/glsl/lower_vec_index_to_swizzle.cpp |  56 +-
 3 files changed, 32 insertions(+), 169 deletions(-)

diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp
index 95bb0fe..52cfd39 100644
--- a/src/glsl/ir_validate.cpp
+++ b/src/glsl/ir_validate.cpp
@@ -69,6 +69,8 @@ public:
virtual ir_visitor_status visit_leave(ir_expression *ir);
virtual ir_visitor_status visit_leave(ir_swizzle *ir);
 
+   virtual ir_visitor_status visit_enter(class ir_dereference_array *);
+
virtual ir_visitor_status visit_enter(ir_assignment *ir);
virtual ir_visitor_status visit_enter(ir_call *ir);
 
@@ -102,6 +104,33 @@ ir_validate::visit(ir_dereference_variable *ir)
 }
 
 ir_visitor_status
+ir_validate::visit_enter(class ir_dereference_array *ir)
+{
+   if (!ir->array->type->is_array() && !ir->array->type->is_matrix()) {
+  printf("ir_dereference_array @ %p does not specify an array or a "
+ "matrix\n",
+ (void *) ir);
+  ir->print();
+  printf("\n");
+  abort();
+   }
+
+   if (!ir->array_index->type->is_scalar()) {
+  printf("ir_dereference_array @ %p does not have scalar index: %s\n",
+ (void *) ir, ir->array_index->type->name);
+  abort();
+   }
+
+   if (!ir->array_index->type->is_integer()) {
+  printf("ir_dereference_array @ %p does not have integer index: %s\n",
+ (void *) ir, ir->array_index->type->name);
+  abort();
+   }
+
+   return visit_continue;
+}
+
+ir_visitor_status
 ir_validate::visit_enter(ir_if *ir)
 {
if (ir->condition->type != glsl_type::bool_type) {
diff --git a/src/glsl/lower_vec_index_to_cond_assign.cpp b/src/glsl/lower_vec_index_to_cond_assign.cpp
index 9d248ca..8808596 100644
--- a/src/glsl/lower_vec_index_to_cond_assign.cpp
+++ b/src/glsl/lower_vec_index_to_cond_assign.cpp
@@ -52,7 +52,6 @@ public:
   progress = false;
}
 
-   ir_rvalue *convert_vec_index_to_cond_assign(ir_rvalue *val);
ir_rvalue *convert_vec_index_to_cond_assign(void *mem_ctx,
ir_rvalue *orig_vector,
ir_rvalue *orig_index,
@@ -142,26 +141,6 @@ ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(void *mem_
 }
 
 ir_rvalue *
-ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(ir_rvalue *ir)
-{
-   ir_dereference_array *orig_deref = ir->as_dereference_array();
-
-   if (!orig_deref)
-  return ir;
-
-   if (orig_deref->array->type->is_matrix() ||
-   orig_deref->array->type->is_array())
-  return ir;
-
-   assert(orig_deref->array_index->type->base_type == GLSL_TYPE_INT);
-
-   return convert_vec_index_to_cond_assign(ralloc_parent(ir),
-   orig_deref->array,
-   orig_deref->array_index,
-   ir->type);
-}
-
-ir_rvalue *
 ir_vec_index_to_cond_assign_visitor::convert_vector_extract_to_cond_assign(ir_rvalue *ir)
 {
ir_expression *const expr = ir->as_expression();
@@ -181,7 +160,6 @@ ir_vec_index_to_cond_assign_visitor::visit_enter(ir_expression *ir)
unsigned int i;
 
for (i = 0; i < ir->get_num_operands(); i++) {
-  ir->operands[i] = convert_vec_index_to_cond_assign(ir->operands[i]);
   ir->operands[i] = convert_vector_extract_to_cond_assign(ir->operands[i]);
}
 
@@ -195,7 +173,6 @@ ir_vec_index_to_cond_assign_visitor::visit_enter(ir_swizzle *ir)
 * the result of indexing a vector is.  But maybe at some point we'll end up
 * using swizzling of scalars for vector construction.
 */
-   ir->val = convert_vec_index_to_cond_assign(ir->val);
ir->val = convert_vector_extract_to_cond_assign(ir->val);
 
return visit_continue;
@@ -204,95 +181,12 @@ ir_vec_index_to_cond_assign_visitor::visit_enter(ir_swizzle *ir)
 ir_visitor_status
 ir_vec_index_to_cond_assign_visitor::visit_leave(ir_assignment *ir)
 {
-   ir_variable *index, *var;
-   ir_dereference_variable *deref;
-   ir_assignment *assign;
-   unsigned i;
-
-   ir->rhs = convert_vec_index_to_cond_assign(ir->rhs);
ir->rhs = convert_vector_extract_to_cond_assign(ir->rhs);
 
if (ir->condition) {
-  ir->condition = convert_vec_index_to_cond_assign(ir->condition);
   ir->condition = convert_vector_extract_to_cond_assign(ir->condition);
}
 
-   /* Last, handle 

[Mesa-dev] [PATCH 11/12] glsl: Generate correct ir_binop_vector_extract code for out and inout parameters

2013-05-03 Thread Ian Romanick
From: Ian Romanick 

Like with type conversions on out parameters, some extra copies need to
occur to handle these cases.  The fundamental problem is that
ir_binop_vector_extract is not an lvalue, but out and inout parameters
must be lvalues.  A previous patch dealt with a similar problem in the
LHS of ir_assignment.
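
At the value level, the extra copies look roughly like the sketch below. It
is a plain C++ analogy, not GLSL IR: Vec4, vector_extract, vector_insert, f
and call_f_on_component are hypothetical names, and vector_insert takes its
arguments in the (vector, scalar, index) order used by the ir_expression
constructor calls in this series.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// Value-level model of ir_binop_vector_extract.
float vector_extract(const Vec4 &v, std::size_t index)
{
    return v.at(index);
}

// Value-level model of ir_triop_vector_insert: returns a copy of v with
// one component replaced.
Vec4 vector_insert(Vec4 v, float scalar, std::size_t index)
{
    v.at(index) = scalar;
    return v;
}

// Stand-in for a GLSL function declared as: void f(inout float x);
void f(float &x)
{
    x = x * 2.0f + 1.0f;
}

// Roughly what a call like f(v[i]) has to become once v[i] is a
// vector-extract expression instead of an lvalue.
void call_f_on_component(Vec4 &v, std::size_t i)
{
    float inout_tmp = vector_extract(v, i);  // copy in (inout only)
    f(inout_tmp);                            // the call sees a real lvalue
    v = vector_insert(v, inout_tmp, i);      // copy out after the call
}
```

For a pure out parameter the copy-in step would be dropped; the copy-out
through vector_insert stays the same.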

v2: Convert tabs to spaces.  Suggested by Eric.

Signed-off-by: Ian Romanick 
Reviewed-by: Eric Anholt 
---
 src/glsl/ast_function.cpp | 149 +++---
 1 file changed, 102 insertions(+), 47 deletions(-)

diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp
index 26f72cf..00e0c05 100644
--- a/src/glsl/ast_function.cpp
+++ b/src/glsl/ast_function.cpp
@@ -165,10 +165,18 @@ verify_parameter_modes(_mesa_glsl_parse_state *state,
 actual->variable_referenced()->name);
return false;
 } else if (!actual->is_lvalue()) {
-   _mesa_glsl_error(&loc, state,
-"function parameter '%s %s' is not an lvalue",
-mode, formal->name);
-   return false;
+/* Even though ir_binop_vector_extract is not an l-value, let it
+ * slop through.  generate_call will handle it correctly.
+ */
+ir_expression *const expr = ((ir_rvalue *) actual)->as_expression();
+if (expr == NULL
+|| expr->operation != ir_binop_vector_extract
+|| !expr->operands[0]->is_lvalue()) {
+   _mesa_glsl_error(&loc, state,
+"function parameter '%s %s' is not an lvalue",
+mode, formal->name);
+   return false;
+}
 }
   }
 
@@ -178,6 +186,93 @@ verify_parameter_modes(_mesa_glsl_parse_state *state,
return true;
 }
 
+static void
+fix_parameter(void *mem_ctx, ir_rvalue *actual, const glsl_type *formal_type,
+  exec_list *before_instructions, exec_list *after_instructions,
+  bool parameter_is_inout)
+{
+   ir_expression *const expr = actual->as_expression();
+
+   /* If the types match exactly and the parameter is not a vector-extract,
+* nothing needs to be done to fix the parameter.
+*/
+   if (formal_type == actual->type
+   && (expr == NULL || expr->operation != ir_binop_vector_extract))
+  return;
+
+   /* To convert an out parameter, we need to create a temporary variable to
+* hold the value before conversion, and then perform the conversion after
+* the function call returns.
+*
+* This has the effect of transforming code like this:
+*
+*   void f(out int x);
+*   float value;
+*   f(value);
+*
+* Into IR that's equivalent to this:
+*
+*   void f(out int x);
+*   float value;
+*   int out_parameter_conversion;
+*   f(out_parameter_conversion);
+*   value = float(out_parameter_conversion);
+*
+* If the parameter is an ir_expression of ir_binop_vector_extract,
+* additional conversion is needed in the post-call re-write.
+*/
+   ir_variable *tmp =
+  new(mem_ctx) ir_variable(formal_type, "inout_tmp", ir_var_temporary);
+
+   before_instructions->push_tail(tmp);
+
+   /* If the parameter is an inout parameter, copy the value of the actual
+* parameter to the new temporary.  Note that no type conversion is allowed
+* here because inout parameters must match types exactly.
+*/
+   if (parameter_is_inout) {
+  /* Inout parameters should never require conversion, since that would
+   * require an implicit conversion to exist both to and from the formal
+   * parameter type, and there are no bidirectional implicit conversions.
+   */
+  assert (actual->type == formal_type);
+
+  ir_dereference_variable *const deref_tmp_1 =
+ new(mem_ctx) ir_dereference_variable(tmp);
+  ir_assignment *const assignment =
+ new(mem_ctx) ir_assignment(deref_tmp_1, actual);
+  before_instructions->push_tail(assignment);
+   }
+
+   /* Replace the parameter in the call with a dereference of the new
+* temporary.
+*/
+   ir_dereference_variable *const deref_tmp_2 =
+  new(mem_ctx) ir_dereference_variable(tmp);
+   actual->replace_with(deref_tmp_2);
+
+
+   /* Copy the temporary variable to the actual parameter with optional
+* type conversion applied.
+*/
+   ir_rvalue *rhs = new(mem_ctx) ir_dereference_variable(tmp);
+   if (actual->type != formal_type)
+  rhs = convert_component(rhs, actual->type);
+
+   ir_rvalue *lhs = actual;
+   if (expr != NULL && expr->operation == ir_binop_vector_extract) {
+  rhs = new(mem_ctx) ir_expression(ir_triop_vector_insert,
+   expr->operands[0]->type,
+   expr->operands[0]->clone(mem_ctx, NULL),
+   rhs,
+   expr->operands[1

[Mesa-dev] [PATCH 10/12] glsl: Use vector-insert and vector-extract on elements of gl_ClipDistanceMESA

2013-05-03 Thread Ian Romanick
From: Ian Romanick 

Variable indexing into vectors using ir_dereference_array is being
removed, so this lowering pass has to generate something different.

v2: Convert tabs to spaces.  Suggested by Eric.

Signed-off-by: Ian Romanick 
Cc: Paul Berry 
---
 src/glsl/lower_clip_distance.cpp | 36 ++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/src/glsl/lower_clip_distance.cpp b/src/glsl/lower_clip_distance.cpp
index 19068fb..c93c821e 100644
--- a/src/glsl/lower_clip_distance.cpp
+++ b/src/glsl/lower_clip_distance.cpp
@@ -197,10 +197,17 @@ lower_clip_distance_visitor::handle_rvalue(ir_rvalue **rv)
  ir_rvalue *swizzle_index;
  this->create_indices(array->array_index, array_index, swizzle_index);
  void *mem_ctx = ralloc_parent(array);
- array->array =
+
+ ir_dereference_array *const ClipDistanceMESA_deref =
 new(mem_ctx) ir_dereference_array(this->new_clip_distance_var,
   array_index);
- array->array_index = swizzle_index;
+
+ ir_expression *const expr =
+new(mem_ctx) ir_expression(ir_binop_vector_extract,
+   ClipDistanceMESA_deref,
+   swizzle_index);
+
+ *rv = expr;
   }
}
 }
@@ -280,7 +287,32 @@ lower_clip_distance_visitor::visit_leave(ir_assignment *ir)
   return visit_continue;
}
 
+   /* Handle the LHS as if it were an r-value.  This may cause the LHS to get
+* replaced with an ir_expression or ir_binop_vector_extract.  If this
+* occurs, replace it with a dereference of the vector, and replace the RHS
+* with an ir_triop_vector_insert.
+*/
handle_rvalue((ir_rvalue **)&ir->lhs);
+   if (ir->lhs->ir_type == ir_type_expression) {
+  ir_expression *const expr = (ir_expression *) ir->lhs;
+
+  /* The expression must be of the form:
+   *
+   * (vector_extract gl_ClipDistanceMESA[i], j).
+   */
+  assert(expr->operation == ir_binop_vector_extract);
+  assert(expr->operands[0]->ir_type == ir_type_dereference_array);
+
+  ir_dereference *const new_lhs = (ir_dereference *) expr->operands[0];
+  ir->rhs = new(ctx) ir_expression(ir_triop_vector_insert,
+   new_lhs->type,
+   new_lhs->clone(ctx, NULL),
+   ir->rhs,
+   expr->operands[1]);
+  ir->set_lhs(new_lhs);
+  ir->write_mask = (1U << new_lhs->type->vector_elements) - 1;
+   }
+
return rvalue_visit(ir);
 }
 
-- 
1.8.1.4



[Mesa-dev] [PATCH 09/12] glsl: Convert lower_clip_distance_visitor to be an ir_rvalue_visitor

2013-05-03 Thread Ian Romanick
From: Ian Romanick 

Right now the lower_clip_distance_visitor lowers variable indexing into
gl_ClipDistance into variable indexing into both the array
gl_ClipDistanceMESA and the vectors of that array.  For example,

gl_ClipDistance[i] = f;

becomes

gl_ClipDistanceMESA[i/4][i%4] = f;

However, variable indexing into vectors using ir_dereference_array is
being removed.  Instead, ir_expression with ir_triop_vector_insert will
be used.  The above code will become

gl_ClipDistanceMESA[i/4] = vector_insert(gl_ClipDistanceMESA[i/4],
 i % 4,
 f);

In order to do this, an ir_rvalue_visitor will need to be used.  This
commit is really just a refactor to get ready for that.
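
The index arithmetic above can be checked with a small value-level model in
C++. The names are hypothetical, the array is assumed to hold 8 clip
distances packed into two vec4s, and vector_insert takes (vector, scalar,
index), which is the operand order the follow-up patch passes to
ir_expression rather than the order written in the pseudo-code above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// Assumed size for illustration: 8 clip distances packed into 2 vec4s.
using PackedClip = std::array<Vec4, 2>;

// Value-level model of ir_triop_vector_insert.
Vec4 vector_insert(Vec4 v, float scalar, std::size_t index)
{
    v.at(index) = scalar;
    return v;
}

// gl_ClipDistance[i] = f, expressed against the packed array: a whole-vector
// write whose new value comes from vector_insert.
void store_clip_distance(PackedClip &packed, std::size_t i, float f)
{
    packed.at(i / 4) = vector_insert(packed.at(i / 4), f, i % 4);
}

// Reading gl_ClipDistance[i] back: array index i / 4, component i % 4.
float load_clip_distance(const PackedClip &packed, std::size_t i)
{
    return packed.at(i / 4).at(i % 4);
}
```

Storing to element 5 touches only component 1 of the second vec4; every
other component keeps its previous value.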

v2: Convert tabs to spaces.  Suggested by Eric.

Signed-off-by: Ian Romanick 
Cc: Paul Berry 
---
 src/glsl/lower_clip_distance.cpp | 136 +--
 1 file changed, 86 insertions(+), 50 deletions(-)

diff --git a/src/glsl/lower_clip_distance.cpp b/src/glsl/lower_clip_distance.cpp
index 643807d..19068fb 100644
--- a/src/glsl/lower_clip_distance.cpp
+++ b/src/glsl/lower_clip_distance.cpp
@@ -46,10 +46,10 @@
  */
 
 #include "glsl_symbol_table.h"
-#include "ir_hierarchical_visitor.h"
+#include "ir_rvalue_visitor.h"
 #include "ir.h"
 
-class lower_clip_distance_visitor : public ir_hierarchical_visitor {
+class lower_clip_distance_visitor : public ir_rvalue_visitor {
 public:
lower_clip_distance_visitor()
   : progress(false), old_clip_distance_var(NULL),
@@ -59,11 +59,12 @@ public:
 
virtual ir_visitor_status visit(ir_variable *);
void create_indices(ir_rvalue*, ir_rvalue *&, ir_rvalue *&);
-   virtual ir_visitor_status visit_leave(ir_dereference_array *);
virtual ir_visitor_status visit_leave(ir_assignment *);
void visit_new_assignment(ir_assignment *ir);
virtual ir_visitor_status visit_leave(ir_call *);
 
+   virtual void handle_rvalue(ir_rvalue **rvalue);
+
bool progress;
 
/**
@@ -173,33 +174,35 @@ lower_clip_distance_visitor::create_indices(ir_rvalue *old_index,
 }
 
 
-/**
- * Replace any expression that indexes into the gl_ClipDistance array with an
- * expression that indexes into one of the vec4's in gl_ClipDistanceMESA and
- * accesses the appropriate component.
- */
-ir_visitor_status
-lower_clip_distance_visitor::visit_leave(ir_dereference_array *ir)
+void
+lower_clip_distance_visitor::handle_rvalue(ir_rvalue **rv)
 {
/* If the gl_ClipDistance var hasn't been declared yet, then
 * there's no way this deref can refer to it.
 */
-   if (!this->old_clip_distance_var)
-  return visit_continue;
-
-   ir_dereference_variable *old_var_ref = ir->array->as_dereference_variable();
-   if (old_var_ref && old_var_ref->var == this->old_clip_distance_var) {
-  this->progress = true;
-  ir_rvalue *array_index;
-  ir_rvalue *swizzle_index;
-  this->create_indices(ir->array_index, array_index, swizzle_index);
-  void *mem_ctx = ralloc_parent(ir);
-  ir->array = new(mem_ctx) ir_dereference_array(
- this->new_clip_distance_var, array_index);
-  ir->array_index = swizzle_index;
+   if (!this->old_clip_distance_var || *rv == NULL)
+  return;
+
+   ir_dereference_array *const array = (*rv)->as_dereference_array();
+   if (array != NULL) {
+  /* Replace any expression that indexes into the gl_ClipDistance array
+   * with an expression that indexes into one of the vec4's in
+   * gl_ClipDistanceMESA and accesses the appropriate component.
+   */
+  ir_dereference_variable *old_var_ref =
+ array->array->as_dereference_variable();
+  if (old_var_ref && old_var_ref->var == this->old_clip_distance_var) {
+ this->progress = true;
+ ir_rvalue *array_index;
+ ir_rvalue *swizzle_index;
+ this->create_indices(array->array_index, array_index, swizzle_index);
+ void *mem_ctx = ralloc_parent(array);
+ array->array =
+new(mem_ctx) ir_dereference_array(this->new_clip_distance_var,
+  array_index);
+ array->array_index = swizzle_index;
+  }
}
-
-   return visit_continue;
 }
 
 
@@ -214,38 +217,71 @@ lower_clip_distance_visitor::visit_leave(ir_assignment *ir)
 {
ir_dereference_variable *lhs_var = ir->lhs->as_dereference_variable();
ir_dereference_variable *rhs_var = ir->rhs->as_dereference_variable();
-   if ((lhs_var && lhs_var->var == this->old_clip_distance_var)
-   || (rhs_var && rhs_var->var == this->old_clip_distance_var)) {
-  /* LHS or RHS of the assignment is the entire gl_ClipDistance array.
-   * Since we are reshaping gl_ClipDistance from an array of floats to an
-   * array of vec4's, this isn't going to work as a bulk assignment
-   * anymore, so unroll it to element-by-element assignments and lower
-   * each of them.
-   *
-   * Note: to unroll into 

Re: [Mesa-dev] [PATCH] glsl: Flip around "if" statements with empty "then" blocks.

2013-05-03 Thread Kenneth Graunke

On 05/03/2013 03:05 PM, Eric Anholt wrote:

> This cleans up some funny-looking code in some unigine shaders I was
> looking at.  Also slightly helps on planeshift and a few shaders in an
> upcoming Valve release.
>
> total instructions in shared programs: 1653715 -> 1653587 (-0.01%)
> instructions in affected programs: 16550 -> 16422 (-0.77%)
> ---
>  src/glsl/opt_if_simplification.cpp | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)


What a strange occurrence :)  Obvious improvement though.

Reviewed-by: Kenneth Graunke 



[Mesa-dev] [PATCH 08/12] glsl: Generate ir_binop_vector_extract for indexing of vectors

2013-05-03 Thread Ian Romanick
From: Ian Romanick 

Now ir_dereference_array of a vector will never occur in the RHS of an
expression.

v2: Add back the { } around the if-statement body to make it more
readable.  Suggested by Eric.

Signed-off-by: Ian Romanick 
Reviewed-by: Eric Anholt 
---
 src/glsl/ast_array_index.cpp | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/src/glsl/ast_array_index.cpp b/src/glsl/ast_array_index.cpp
index 862f64c..4baeb6f 100644
--- a/src/glsl/ast_array_index.cpp
+++ b/src/glsl/ast_array_index.cpp
@@ -31,8 +31,6 @@ _mesa_ast_array_index_to_hir(void *mem_ctx,
 ir_rvalue *array, ir_rvalue *idx,
 YYLTYPE &loc, YYLTYPE &idx_loc)
 {
-   ir_rvalue *result = new(mem_ctx) ir_dereference_array(array, idx);
-
if (!array->type->is_error()
&& !array->type->is_array()
&& !array->type->is_matrix()
@@ -40,7 +38,6 @@ _mesa_ast_array_index_to_hir(void *mem_ctx,
   _mesa_glsl_error(& idx_loc, state,
   "cannot dereference non-array / non-matrix / "
   "non-vector");
-  result->type = glsl_type::error_type;
}
 
if (!idx->type->is_error()) {
@@ -174,5 +171,20 @@ _mesa_ast_array_index_to_hir(void *mem_ctx,
   }
}
 
-   return result;
+   /* After performing all of the error checking, generate the IR for the
+* expression.
+*/
+   if (array->type->is_array()
+   || array->type->is_matrix()) {
+  return new(mem_ctx) ir_dereference_array(array, idx);
+   } else if (array->type->is_vector()) {
+  return new(mem_ctx) ir_expression(ir_binop_vector_extract, array, idx);
+   } else if (array->type->is_error()) {
+  return array;
+   } else {
+  ir_rvalue *result = new(mem_ctx) ir_dereference_array(array, idx);
+  result->type = glsl_type::error_type;
+
+  return result;
+   }
 }
-- 
1.8.1.4



[Mesa-dev] [PATCH 07/12] glsl: Convert ir_binop_vector_extract in the LHS to ir_triop_vector_insert

2013-05-03 Thread Ian Romanick
From: Ian Romanick 

The ast_array_index code can't know whether to generate an
ir_binop_vector_extract or an ir_triop_vector_insert.  Instead it will
always generate ir_binop_vector_extract, and the LHS and RHS have to be
re-written.

v2: Convert tabs to spaces.  Suggested by Eric.

Signed-off-by: Ian Romanick 
Reviewed-by: Eric Anholt 
---
 src/glsl/ast_to_hir.cpp | 24 
 1 file changed, 24 insertions(+)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index e595110..b206380 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -672,6 +672,30 @@ do_assignment(exec_list *instructions, struct 
_mesa_glsl_parse_state *state,
void *ctx = state;
bool error_emitted = (lhs->type->is_error() || rhs->type->is_error());
 
+   /* If the assignment LHS comes back as an ir_binop_vector_extract
+* expression, move it to the RHS as an ir_triop_vector_insert.
+*/
+   if (lhs->ir_type == ir_type_expression) {
+  ir_expression *const expr = lhs->as_expression();
+
+  if (unlikely(expr->operation == ir_binop_vector_extract)) {
+ ir_rvalue *new_rhs =
+validate_assignment(state, lhs->type, rhs, is_initializer);
+
+ if (new_rhs == NULL) {
+_mesa_glsl_error(& lhs_loc, state, "type mismatch");
+return lhs;
+ } else {
+rhs = new(ctx) ir_expression(ir_triop_vector_insert,
+ expr->operands[0]->type,
+ expr->operands[0],
+ new_rhs,
+ expr->operands[1]);
+lhs = expr->operands[0]->clone(ctx, NULL);
+ }
+  }
+   }
+
ir_variable *lhs_var = lhs->variable_referenced();
if (lhs_var)
   lhs_var->assigned = true;
-- 
1.8.1.4
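The rewrite in `do_assignment` turns `v[i] = x` into `v = vector_insert(v, x, i)`. A minimal sketch of that transformation on a toy expression tree follows. `Expr`, `Op`, `make_var`, `make_expr` and `rewrite_assignment` are hypothetical and only model the shape of the change; the real code works on `ir_rvalue` nodes, validates the RHS type first, and clones the vector operand for the new LHS instead of sharing it.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR, not Mesa's ir_rvalue hierarchy.
enum class Op { Variable, VectorExtract, VectorInsert };

struct Expr {
    Op op;
    std::string name;                             // used only by Op::Variable
    std::vector<std::shared_ptr<Expr>> operands;  // empty for variables
};

using ExprPtr = std::shared_ptr<Expr>;

ExprPtr make_var(const std::string &name) {
    return std::make_shared<Expr>(Expr{Op::Variable, name, {}});
}

ExprPtr make_expr(Op op, std::vector<ExprPtr> operands) {
    return std::make_shared<Expr>(Expr{op, "", std::move(operands)});
}

struct Assignment {
    ExprPtr lhs;
    ExprPtr rhs;
};

// Rewrites "extract(vec, idx) = value" into "vec = insert(vec, value, idx)".
// Any other LHS is passed through untouched.
Assignment rewrite_assignment(ExprPtr lhs, ExprPtr rhs) {
    if (lhs->op != Op::VectorExtract)
        return {lhs, rhs};

    ExprPtr vec = lhs->operands[0];
    ExprPtr idx = lhs->operands[1];
    // Operand order matches the patch: vector, new value, index.
    return {vec, make_expr(Op::VectorInsert, {vec, rhs, idx})};
}
```

After the rewrite the LHS is always a plain vector, so later passes only ever see whole-vector stores.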



[Mesa-dev] [PATCH 06/12] glsl: Add lowering pass for ir_triop_vector_insert

2013-05-03 Thread Ian Romanick
From: Ian Romanick 

This will eventually replace do_vec_index_to_cond_assign.  This lowering
pass is called in all the places where do_vec_index_to_cond_assign or
do_vec_index_to_swizzle is called.

v2: Use WRITEMASK_* instead of integer literals.  Use a more concise
method of generating broadcast_index.  Both suggested by Eric.

Signed-off-by: Ian Romanick 
Reviewed-by: Eric Anholt 
---
 src/glsl/Makefile.sources  |   1 +
 src/glsl/glsl_parser_extras.cpp|   1 +
 src/glsl/ir_optimization.h |   1 +
 src/glsl/lower_vector_insert.cpp   | 160 +
 src/mesa/drivers/dri/i965/brw_shader.cpp   |   1 +
 src/mesa/program/ir_to_mesa.cpp|   1 +
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp |   1 +
 7 files changed, 166 insertions(+)
 create mode 100644 src/glsl/lower_vector_insert.cpp

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index 674a05f..8e2dc1b 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -69,6 +69,7 @@ LIBGLSL_FILES = \
$(GLSL_SRCDIR)/lower_vec_index_to_cond_assign.cpp \
$(GLSL_SRCDIR)/lower_vec_index_to_swizzle.cpp \
$(GLSL_SRCDIR)/lower_vector.cpp \
+   $(GLSL_SRCDIR)/lower_vector_insert.cpp \
$(GLSL_SRCDIR)/lower_output_reads.cpp \
$(GLSL_SRCDIR)/lower_ubo_reference.cpp \
$(GLSL_SRCDIR)/opt_algebraic.cpp \
diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index 0992294..d38e967 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -1236,6 +1236,7 @@ do_common_optimization(exec_list *ir, bool linked,
progress = do_algebraic(ir) || progress;
progress = do_lower_jumps(ir) || progress;
progress = do_vec_index_to_swizzle(ir) || progress;
+   progress = lower_vector_insert(ir, false) || progress;
progress = do_swizzle_swizzle(ir) || progress;
progress = do_noop_swizzle(ir) || progress;
 
diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
index a8885d7..0216e46 100644
--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -106,6 +106,7 @@ void lower_ubo_reference(struct gl_shader *shader, 
exec_list *instructions);
 void lower_packed_varyings(void *mem_ctx, unsigned location_base,
unsigned locations_used, ir_variable_mode mode,
gl_shader *shader);
+bool lower_vector_insert(exec_list *instructions, bool 
lower_nonconstant_index);
 bool optimize_redundant_jumps(exec_list *instructions);
 bool optimize_split_arrays(exec_list *instructions, bool linked);
 
diff --git a/src/glsl/lower_vector_insert.cpp b/src/glsl/lower_vector_insert.cpp
new file mode 100644
index 000..3dbc263
--- /dev/null
+++ b/src/glsl/lower_vector_insert.cpp
@@ -0,0 +1,160 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include "ir.h"
+#include "ir_builder.h"
+#include "ir_rvalue_visitor.h"
+#include "ir_optimization.h"
+
+using namespace ir_builder;
+
+class vector_insert_visitor : public ir_rvalue_visitor {
+public:
+   vector_insert_visitor(bool lower_nonconstant_index)
+  : progress(false), lower_nonconstant_index(lower_nonconstant_index)
+   {
+  factory.instructions = &factory_instructions;
+   }
+
+   virtual ~vector_insert_visitor()
+   {
+  assert(factory_instructions.is_empty());
+   }
+
+   virtual void handle_rvalue(ir_rvalue **rv);
+
+   ir_factory factory;
+   exec_list factory_instructions;
+   bool progress;
+   bool lower_nonconstant_index;
+};
+
+
+void
+vector_insert_visitor::handle_rvalue(ir_rvalue **rv)
+{
+   if (*rv == NULL || (*rv)->ir_type != ir_type_expression)
+  return;
+
+   ir_expression *const expr = (ir_expression *) *rv;
+
+   if (likely(expr->operation != ir_triop_vector_insert))
+  return;
+
+   factory.mem_ctx = ralloc_pare

[Mesa-dev] [PATCH 05/12] glsl: Lower ir_binop_vector_extract to conditional moves

2013-05-03 Thread Ian Romanick
From: Ian Romanick 

Lower ir_binop_vector_extract with a non-constant index to a series of
conditional moves.  This is exactly like ir_dereference_array of a
vector with a non-constant index.

v2: Convert tabs to spaces.  Suggested by Eric.

Signed-off-by: Ian Romanick 
Reviewed-by: Eric Anholt 
---
 src/glsl/lower_vec_index_to_cond_assign.cpp | 32 -
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/src/glsl/lower_vec_index_to_cond_assign.cpp 
b/src/glsl/lower_vec_index_to_cond_assign.cpp
index 9c0d92e..9d248ca 100644
--- a/src/glsl/lower_vec_index_to_cond_assign.cpp
+++ b/src/glsl/lower_vec_index_to_cond_assign.cpp
@@ -58,6 +58,8 @@ public:
ir_rvalue *orig_index,
const glsl_type *type);
 
+   ir_rvalue *convert_vector_extract_to_cond_assign(ir_rvalue *ir);
+
virtual ir_visitor_status visit_enter(ir_expression *);
virtual ir_visitor_status visit_enter(ir_swizzle *);
virtual ir_visitor_status visit_leave(ir_assignment *);
@@ -159,6 +161,20 @@ 
ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(ir_rvalue
ir->type);
 }
 
+ir_rvalue *
+ir_vec_index_to_cond_assign_visitor::convert_vector_extract_to_cond_assign(ir_rvalue
 *ir)
+{
+   ir_expression *const expr = ir->as_expression();
+
+   if (expr == NULL || expr->operation != ir_binop_vector_extract)
+  return ir;
+
+   return convert_vec_index_to_cond_assign(ralloc_parent(ir),
+   expr->operands[0],
+   expr->operands[1],
+   ir->type);
+}
+
 ir_visitor_status
 ir_vec_index_to_cond_assign_visitor::visit_enter(ir_expression *ir)
 {
@@ -166,6 +182,7 @@ 
ir_vec_index_to_cond_assign_visitor::visit_enter(ir_expression *ir)
 
for (i = 0; i < ir->get_num_operands(); i++) {
   ir->operands[i] = convert_vec_index_to_cond_assign(ir->operands[i]);
+  ir->operands[i] = convert_vector_extract_to_cond_assign(ir->operands[i]);
}
 
return visit_continue;
@@ -179,6 +196,7 @@ ir_vec_index_to_cond_assign_visitor::visit_enter(ir_swizzle 
*ir)
 * using swizzling of scalars for vector construction.
 */
ir->val = convert_vec_index_to_cond_assign(ir->val);
+   ir->val = convert_vector_extract_to_cond_assign(ir->val);
 
return visit_continue;
 }
@@ -192,8 +210,12 @@ 
ir_vec_index_to_cond_assign_visitor::visit_leave(ir_assignment *ir)
unsigned i;
 
ir->rhs = convert_vec_index_to_cond_assign(ir->rhs);
-   if (ir->condition)
+   ir->rhs = convert_vector_extract_to_cond_assign(ir->rhs);
+
+   if (ir->condition) {
   ir->condition = convert_vec_index_to_cond_assign(ir->condition);
+  ir->condition = convert_vector_extract_to_cond_assign(ir->condition);
+   }
 
/* Last, handle the LHS */
ir_dereference_array *orig_deref = ir->lhs->as_dereference_array();
@@ -283,6 +305,12 @@ ir_vec_index_to_cond_assign_visitor::visit_enter(ir_call 
*ir)
 
   if (new_param != param) {
 param->replace_with(new_param);
+  } else {
+ new_param = convert_vector_extract_to_cond_assign(param);
+
+ if (new_param != param) {
+param->replace_with(new_param);
+ }
   }
}
 
@@ -294,6 +322,7 @@ ir_vec_index_to_cond_assign_visitor::visit_enter(ir_return 
*ir)
 {
if (ir->value) {
   ir->value = convert_vec_index_to_cond_assign(ir->value);
+  ir->value = convert_vector_extract_to_cond_assign(ir->value);
}
 
return visit_continue;
@@ -303,6 +332,7 @@ ir_visitor_status
 ir_vec_index_to_cond_assign_visitor::visit_enter(ir_if *ir)
 {
ir->condition = convert_vec_index_to_cond_assign(ir->condition);
+   ir->condition = convert_vector_extract_to_cond_assign(ir->condition);
 
return visit_continue;
 }
-- 
1.8.1.4
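The idea behind lowering a non-constant index to conditional moves can be sketched on plain values. The function below is a hypothetical illustration, not code from the pass: it visits every component and lets a per-component comparison decide whether that component is copied into the result. The initial value of `result` is an assumption of this sketch; GLSL leaves out-of-range non-constant indexing undefined.

```cpp
#include <array>
#include <cstddef>

// Selects v[index] without addressing the vector by a runtime value: each
// component is read at a fixed position and copied only when its position
// equals the requested index.
float extract_with_cond_moves(const std::array<float, 4> &v, int index) {
    float result = 0.0f;  // assumption: out-of-range indices leave this value
    for (std::size_t i = 0; i < v.size(); ++i) {
        const bool matches = (index == static_cast<int>(i));
        result = matches ? v[i] : result;  // models one conditional move
    }
    return result;
}
```

Once the loop is unrolled, every read of `v` uses a constant position, which is what hardware without indirect register addressing needs.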



[Mesa-dev] [PATCH 04/12] glsl: Lower ir_binop_vector_extract to swizzle

2013-05-03 Thread Ian Romanick
From: Ian Romanick 

Lower ir_binop_vector_extract with a constant index to a swizzle.  This
is exactly like ir_dereference_array of a vector with a constant index.

v2: Convert tabs to spaces.  Suggested by Eric.

v3: Correctly call convert_vector_extract_to_swizzle in
ir_vec_index_to_swizzle_visitor::visit_enter(ir_call *ir).  Suggested by
Ken.

Signed-off-by: Ian Romanick 
Reviewed-by: Eric Anholt 
---
 src/glsl/lower_vec_index_to_swizzle.cpp | 47 -
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/src/glsl/lower_vec_index_to_swizzle.cpp 
b/src/glsl/lower_vec_index_to_swizzle.cpp
index 264d6dc..9d0d696 100644
--- a/src/glsl/lower_vec_index_to_swizzle.cpp
+++ b/src/glsl/lower_vec_index_to_swizzle.cpp
@@ -47,6 +47,7 @@ public:
}
 
ir_rvalue *convert_vec_index_to_swizzle(ir_rvalue *val);
+   ir_rvalue *convert_vector_extract_to_swizzle(ir_rvalue *val);
 
virtual ir_visitor_status visit_enter(ir_expression *);
virtual ir_visitor_status visit_enter(ir_swizzle *);
@@ -93,11 +94,45 @@ 
ir_vec_index_to_swizzle_visitor::convert_vec_index_to_swizzle(ir_rvalue *ir)
 * large.  For simplicity sake, just clamp the index to [0, size-1].
 */
const int i = MIN2(MAX2(ir_constant->value.i[0], 0),
- ((int) deref->array->type->vector_elements - 1));
+  ((int) deref->array->type->vector_elements - 1));
 
return new(ctx) ir_swizzle(deref->array, i, 0, 0, 0, 1);
 }
 
+ir_rvalue *
+ir_vec_index_to_swizzle_visitor::convert_vector_extract_to_swizzle(ir_rvalue 
*ir)
+{
+   ir_expression *const expr = ir->as_expression();
+   if (expr == NULL || expr->operation != ir_binop_vector_extract)
+  return ir;
+
+   ir_constant *const idx = expr->operands[1]->constant_expression_value();
+   if (idx == NULL)
+  return ir;
+
+   void *ctx = ralloc_parent(ir);
+   this->progress = true;
+
+   /* Page 40 of the GLSL 1.20 spec says:
+*
+* "When indexing with non-constant expressions, behavior is undefined
+* if the index is negative, or greater than or equal to the size of
+* the vector."
+*
+* The quoted spec text mentions non-constant expressions, but this code
+* operates on constants.  These constants are the result of non-constant
+* expressions that have been optimized to constants.  The common case here
+* is a loop counter from an unrolled loop that is used to index a vector.
+*
+* The ir_swizzle constructor gets angry if the index is negative or too
+* large.  For simplicity sake, just clamp the index to [0, size-1].
+*/
+   const int i = MIN2(MAX2(idx->value.i[0], 0),
+  ((int) expr->operands[0]->type->vector_elements - 1));
+
+   return new(ctx) ir_swizzle(expr->operands[0], i, 0, 0, 0, 1);
+}
+
 ir_visitor_status
 ir_vec_index_to_swizzle_visitor::visit_enter(ir_expression *ir)
 {
@@ -105,6 +140,7 @@ ir_vec_index_to_swizzle_visitor::visit_enter(ir_expression 
*ir)
 
for (i = 0; i < ir->get_num_operands(); i++) {
   ir->operands[i] = convert_vec_index_to_swizzle(ir->operands[i]);
+  ir->operands[i] = convert_vector_extract_to_swizzle(ir->operands[i]);
}
 
return visit_continue;
@@ -127,6 +163,7 @@ ir_vec_index_to_swizzle_visitor::visit_enter(ir_assignment 
*ir)
 {
ir->set_lhs(convert_vec_index_to_swizzle(ir->lhs));
ir->rhs = convert_vec_index_to_swizzle(ir->rhs);
+   ir->rhs = convert_vector_extract_to_swizzle(ir->rhs);
 
return visit_continue;
 }
@@ -140,6 +177,12 @@ ir_vec_index_to_swizzle_visitor::visit_enter(ir_call *ir)
 
   if (new_param != param) {
 param->replace_with(new_param);
+  } else {
+ new_param = convert_vector_extract_to_swizzle(param);
+
+ if (new_param != param) {
+param->replace_with(new_param);
+ }
   }
}
 
@@ -151,6 +194,7 @@ ir_vec_index_to_swizzle_visitor::visit_enter(ir_return *ir)
 {
if (ir->value) {
   ir->value = convert_vec_index_to_swizzle(ir->value);
+  ir->value = convert_vector_extract_to_swizzle(ir->value);
}
 
return visit_continue;
@@ -160,6 +204,7 @@ ir_visitor_status
 ir_vec_index_to_swizzle_visitor::visit_enter(ir_if *ir)
 {
ir->condition = convert_vec_index_to_swizzle(ir->condition);
+   ir->condition = convert_vector_extract_to_swizzle(ir->condition);
 
return visit_continue;
 }
-- 
1.8.1.4
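The decision made by `convert_vector_extract_to_swizzle` can be sketched in isolation: a constant index becomes a one-component swizzle after clamping to `[0, size-1]`, and a non-constant index is left alone for the conditional-move pass. `SwizzleResult` and `lower_extract_to_swizzle` are hypothetical names for this sketch only.

```cpp
#include <algorithm>

// Hypothetical result type: whether the extract was lowered, and if so
// which source component the single-component swizzle reads.
struct SwizzleResult {
    bool lowered;
    int component;
};

SwizzleResult lower_extract_to_swizzle(bool index_is_constant,
                                       int index_value,
                                       int vector_elements) {
    if (!index_is_constant)
        return {false, 0};  // left for the non-constant lowering

    // Same clamp as the patch: negative or too-large indices are pulled
    // back into range before the swizzle is built.
    const int clamped =
        std::min(std::max(index_value, 0), vector_elements - 1);
    return {true, clamped};
}
```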



[Mesa-dev] [PATCH 03/12] glsl: Refactor part of convert_vec_index_to_cond_assign

2013-05-03 Thread Ian Romanick
From: Ian Romanick 

Use a first function that extracts the vector being indexed and the index
from the deref, then calls a second function that does the real work.

Coming patches will add a new ir_expression for variable indexing into a
vector.  Having the lowering pass split into two functions will make it
much easier to lower the new ir_expression.

v2: Convert tabs to spaces.  Suggested by Eric.

v3: Move some bits from a later patch back to this patch so that it
actually compiles.  Suggested by Ken.

Signed-off-by: Ian Romanick 
Reviewed-by: Eric Anholt 
---
 src/glsl/lower_vec_index_to_cond_assign.cpp | 57 +++--
 1 file changed, 37 insertions(+), 20 deletions(-)

diff --git a/src/glsl/lower_vec_index_to_cond_assign.cpp 
b/src/glsl/lower_vec_index_to_cond_assign.cpp
index f85875f..9c0d92e 100644
--- a/src/glsl/lower_vec_index_to_cond_assign.cpp
+++ b/src/glsl/lower_vec_index_to_cond_assign.cpp
@@ -53,6 +53,10 @@ public:
}
 
ir_rvalue *convert_vec_index_to_cond_assign(ir_rvalue *val);
+   ir_rvalue *convert_vec_index_to_cond_assign(void *mem_ctx,
+   ir_rvalue *orig_vector,
+   ir_rvalue *orig_index,
+   const glsl_type *type);
 
virtual ir_visitor_status visit_enter(ir_expression *);
virtual ir_visitor_status visit_enter(ir_swizzle *);
@@ -65,24 +69,16 @@ public:
 };
 
 ir_rvalue *
-ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(ir_rvalue
 *ir)
+ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(void 
*mem_ctx,
+  
ir_rvalue *orig_vector,
+  
ir_rvalue *orig_index,
+  const 
glsl_type *type)
 {
-   ir_dereference_array *orig_deref = ir->as_dereference_array();
ir_assignment *assign, *value_assign;
ir_variable *index, *var, *value;
ir_dereference *deref, *deref_value;
unsigned i;
 
-   if (!orig_deref)
-  return ir;
-
-   if (orig_deref->array->type->is_matrix() ||
-   orig_deref->array->type->is_array())
-  return ir;
-
-   void *mem_ctx = ralloc_parent(ir);
-
-   assert(orig_deref->array_index->type->base_type == GLSL_TYPE_INT);
 
exec_list list;
 
@@ -92,19 +88,19 @@ 
ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(ir_rvalue
ir_var_temporary);
list.push_tail(index);
deref = new(base_ir) ir_dereference_variable(index);
-   assign = new(base_ir) ir_assignment(deref, orig_deref->array_index, NULL);
+   assign = new(base_ir) ir_assignment(deref, orig_index, NULL);
list.push_tail(assign);
 
/* Store the value inside a temp, thus avoiding matrixes duplication */
-   value = new(base_ir) ir_variable(orig_deref->array->type, "vec_value_tmp",
-   ir_var_temporary);
+   value = new(base_ir) ir_variable(orig_vector->type, "vec_value_tmp",
+ir_var_temporary);
list.push_tail(value);
deref_value = new(base_ir) ir_dereference_variable(value);
-   value_assign = new(base_ir) ir_assignment(deref_value, orig_deref->array);
+   value_assign = new(base_ir) ir_assignment(deref_value, orig_vector);
list.push_tail(value_assign);
 
/* Temporary where we store whichever value we swizzle out. */
-   var = new(base_ir) ir_variable(ir->type, "vec_index_tmp_v",
+   var = new(base_ir) ir_variable(type, "vec_index_tmp_v",
  ir_var_temporary);
list.push_tail(var);
 
@@ -113,13 +109,14 @@ 
ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(ir_rvalue
 */
ir_rvalue *const cond_deref =
   compare_index_block(&list, index, 0,
- orig_deref->array->type->vector_elements,
+  orig_vector->type->vector_elements,
  mem_ctx);
 
/* Generate a conditional move of each vector element to the temp. */
-   for (i = 0; i < orig_deref->array->type->vector_elements; i++) {
+   for (i = 0; i < orig_vector->type->vector_elements; i++) {
   ir_rvalue *condition_swizzle =
-new(base_ir) ir_swizzle(cond_deref->clone(ir, NULL), i, 0, 0, 0, 1);
+ new(base_ir) ir_swizzle(cond_deref->clone(mem_ctx, NULL),
+ i, 0, 0, 0, 1);
 
   /* Just clone the rest of the deref chain when trying to get at the
* underlying variable.
@@ -142,6 +139,26 @@ 
ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(ir_rvalue
return new(base_ir) ir_dereference_variable(var);
 }
 
+ir_rvalue *
+ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(ir_rvalue
 *ir)
+{
+   ir_dereference_array *orig_deref = ir->as_dereference_array();
+
+   if (!orig_deref)

[Mesa-dev] [PATCH 02/12] glsl: Add ir_triop_vector_insert

2013-05-03 Thread Ian Romanick
From: Ian Romanick 

The new opcode is used to generate a new vector with a single field from
the source vector replaced.  This will eventually replace
ir_dereference_array of vectors in the LHS of assignments.

v2: Convert tabs to spaces.  Suggested by Eric.

v3: Add constant expression handling for ir_triop_vector_insert.  This
prevents the constant matrix inversion tests from regressing.  Duh.

Signed-off-by: Ian Romanick 
Reviewed-by: Eric Anholt 
---
 src/glsl/ir.cpp |  1 +
 src/glsl/ir.h   | 11 ++-
 src/glsl/ir_constant_expression.cpp | 25 +
 src/glsl/ir_validate.cpp|  9 +
 src/mesa/program/ir_to_mesa.cpp |  1 +
 5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
index f4596db..336ff95 100644
--- a/src/glsl/ir.cpp
+++ b/src/glsl/ir.cpp
@@ -511,6 +511,7 @@ static const char *const operator_strs[] = {
"ubo_load",
"vector_extract",
"lrp",
+   "vector_insert",
"vector",
 };
 
diff --git a/src/glsl/ir.h b/src/glsl/ir.h
index ba6a61b..878e360 100644
--- a/src/glsl/ir.h
+++ b/src/glsl/ir.h
@@ -1132,9 +1132,18 @@ enum ir_expression_operation {
ir_triop_lrp,
 
/**
+* Generate a value with one field of a vector changed
+*
+* operand0 is the vector
+* operand1 is the value to write into the vector result
+* operand2 is the index in operand0 to be modified
+*/
+   ir_triop_vector_insert,
+
+   /**
 * A sentinel marking the last of the ternary operations.
 */
-   ir_last_triop = ir_triop_lrp,
+   ir_last_triop = ir_triop_vector_insert,
 
ir_quadop_vector,
 
diff --git a/src/glsl/ir_constant_expression.cpp 
b/src/glsl/ir_constant_expression.cpp
index 963c88b..acd906d 100644
--- a/src/glsl/ir_constant_expression.cpp
+++ b/src/glsl/ir_constant_expression.cpp
@@ -1291,6 +1291,31 @@ ir_expression::constant_expression_value(struct 
hash_table *variable_context)
   break;
}
 
+   case ir_triop_vector_insert: {
+  const unsigned idx = op[2]->value.u[0];
+
+  memcpy(&data, &op[0]->value, sizeof(data));
+
+  switch (this->type->base_type) {
+  case GLSL_TYPE_INT:
+data.i[idx] = op[1]->value.i[0];
+break;
+  case GLSL_TYPE_UINT:
+data.u[idx] = op[1]->value.u[0];
+break;
+  case GLSL_TYPE_FLOAT:
+data.f[idx] = op[1]->value.f[0];
+break;
+  case GLSL_TYPE_BOOL:
+data.b[idx] = op[1]->value.b[0];
+break;
+  default:
+assert(!"Should not get here.");
+break;
+  }
+  break;
+   }
+
case ir_quadop_vector:
   for (unsigned c = 0; c < this->type->vector_elements; c++) {
 switch (this->type->base_type) {
diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp
index 0b9631f2..95bb0fe 100644
--- a/src/glsl/ir_validate.cpp
+++ b/src/glsl/ir_validate.cpp
@@ -480,6 +480,15 @@ ir_validate::visit_leave(ir_expression *ir)
   assert(ir->operands[2]->type == ir->operands[0]->type || 
ir->operands[2]->type == glsl_type::float_type);
   break;
 
+   case ir_triop_vector_insert:
+  assert(ir->operands[0]->type->is_vector());
+  assert(ir->operands[1]->type->is_scalar());
+  assert(ir->operands[0]->type->base_type == 
ir->operands[1]->type->base_type);
+  assert(ir->operands[2]->type->is_scalar()
+ && ir->operands[2]->type->is_integer());
+  assert(ir->type == ir->operands[0]->type);
+  break;
+
case ir_quadop_vector:
   /* The vector operator collects some number of scalars and generates a
* vector from them.
diff --git a/src/mesa/program/ir_to_mesa.cpp b/src/mesa/program/ir_to_mesa.cpp
index 7d351c0..eb64347 100644
--- a/src/mesa/program/ir_to_mesa.cpp
+++ b/src/mesa/program/ir_to_mesa.cpp
@@ -1486,6 +1486,7 @@ ir_to_mesa_visitor::visit(ir_expression *ir)
   break;
 
case ir_binop_vector_extract:
+   case ir_triop_vector_insert:
case ir_quadop_vector:
   /* This operation should have already been handled.
*/
-- 
1.8.1.4
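The semantics of `ir_triop_vector_insert` (a new vector with one field replaced, source vector untouched) can be sketched on a plain array. `vector_insert` below is a hypothetical helper for illustration; the range assertion is an assumption of the sketch, since the constant-expression code in the patch writes `data[idx]` without clamping.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Returns a copy of vec with component `index` replaced by `value`.
// vec is taken by value, so the caller's vector is never modified.
std::array<float, 4> vector_insert(std::array<float, 4> vec,
                                   float value,
                                   std::size_t index) {
    assert(index < vec.size());  // assumption: callers pass a valid index
    vec[index] = value;
    return vec;
}
```

Because the operation yields a value instead of writing through a dereference, it composes with the rest of the expression tree like any other `ir_expression`.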



[Mesa-dev] [PATCH 01/12] glsl: Add ir_binop_vector_extract

2013-05-03 Thread Ian Romanick
From: Ian Romanick 

The new opcode is used to get a single field from a vector.  The field
index may not be constant.  This will eventually replace
ir_dereference_array of vectors.  This is similar to the extractelement
instruction in LLVM IR.

http://llvm.org/docs/LangRef.html#extractelement-instruction

v2: Convert tabs to spaces.  Suggested by Eric.

v3: Add array index range checking to ir_binop_vector_extract constant
expression handling.  Suggested by Ken.

Signed-off-by: Ian Romanick 
Reviewed-by: Eric Anholt 
---
 src/glsl/ir.cpp |  5 +
 src/glsl/ir.h   | 10 +-
 src/glsl/ir_constant_expression.cpp | 36 +---
 src/glsl/ir_validate.cpp|  6 ++
 src/mesa/program/ir_to_mesa.cpp |  1 +
 5 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
index 05b77da..f4596db 100644
--- a/src/glsl/ir.cpp
+++ b/src/glsl/ir.cpp
@@ -399,6 +399,10 @@ ir_expression::ir_expression(int op, ir_rvalue *op0, 
ir_rvalue *op1)
   this->type = op0->type;
   break;
 
+   case ir_binop_vector_extract:
+  this->type = op0->type->get_scalar_type();
+  break;
+
default:
   assert(!"not reached: missing automatic type setup for ir_expression");
   this->type = glsl_type::float_type;
@@ -505,6 +509,7 @@ static const char *const operator_strs[] = {
"pow",
"packHalf2x16_split",
"ubo_load",
+   "vector_extract",
"lrp",
"vector",
 };
diff --git a/src/glsl/ir.h b/src/glsl/ir.h
index 878a946..ba6a61b 100644
--- a/src/glsl/ir.h
+++ b/src/glsl/ir.h
@@ -1117,9 +1117,17 @@ enum ir_expression_operation {
ir_binop_ubo_load,
 
/**
+* Extract a scalar from a vector
+*
+* operand0 is the vector
+* operand1 is the index of the field to read from operand0
+*/
+   ir_binop_vector_extract,
+
+   /**
 * A sentinel marking the last of the binary operations.
 */
-   ir_last_binop = ir_binop_ubo_load,
+   ir_last_binop = ir_binop_vector_extract,
 
ir_triop_lrp,
 
diff --git a/src/glsl/ir_constant_expression.cpp 
b/src/glsl/ir_constant_expression.cpp
index c09e56a..963c88b 100644
--- a/src/glsl/ir_constant_expression.cpp
+++ b/src/glsl/ir_constant_expression.cpp
@@ -391,9 +391,16 @@ ir_expression::constant_expression_value(struct hash_table 
*variable_context)
}
 
if (op[1] != NULL)
-  assert(op[0]->type->base_type == op[1]->type->base_type ||
-this->operation == ir_binop_lshift ||
-this->operation == ir_binop_rshift);
+  switch (this->operation) {
+  case ir_binop_lshift:
+  case ir_binop_rshift:
+  case ir_binop_vector_extract:
+ break;
+
+  default:
+ assert(op[0]->type->base_type == op[1]->type->base_type);
+ break;
+  }
 
bool op0_scalar = op[0]->type->is_scalar();
bool op1_scalar = op[1] != NULL && op[1]->type->is_scalar();
@@ -1230,6 +1237,29 @@ ir_expression::constant_expression_value(struct 
hash_table *variable_context)
   }
   break;
 
+   case ir_binop_vector_extract: {
+  const int c = MAX2(0, MIN2(op[1]->value.i[0],
+ op[0]->type->vector_elements - 1));
+
+  switch (op[0]->type->base_type) {
+  case GLSL_TYPE_UINT:
+ data.u[0] = op[0]->value.u[c];
+ break;
+  case GLSL_TYPE_INT:
+ data.i[0] = op[0]->value.i[c];
+ break;
+  case GLSL_TYPE_FLOAT:
+ data.f[0] = op[0]->value.f[c];
+ break;
+  case GLSL_TYPE_BOOL:
+ data.b[0] = op[0]->value.b[c];
+ break;
+  default:
+ assert(0);
+  }
+  break;
+   }
+
case ir_binop_bit_xor:
   for (unsigned c = 0, c0 = 0, c1 = 0;
c < components;
diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp
index 699c192..0b9631f2 100644
--- a/src/glsl/ir_validate.cpp
+++ b/src/glsl/ir_validate.cpp
@@ -468,6 +468,12 @@ ir_validate::visit_leave(ir_expression *ir)
   assert(ir->operands[1]->type == glsl_type::uint_type);
   break;
 
+   case ir_binop_vector_extract:
+  assert(ir->operands[0]->type->is_vector());
+  assert(ir->operands[1]->type->is_scalar()
+ && ir->operands[1]->type->is_integer());
+  break;
+
case ir_triop_lrp:
   assert(ir->operands[0]->type->base_type == GLSL_TYPE_FLOAT);
   assert(ir->operands[0]->type == ir->operands[1]->type);
diff --git a/src/mesa/program/ir_to_mesa.cpp b/src/mesa/program/ir_to_mesa.cpp
index 14cf5ba..7d351c0 100644
--- a/src/mesa/program/ir_to_mesa.cpp
+++ b/src/mesa/program/ir_to_mesa.cpp
@@ -1485,6 +1485,7 @@ ir_to_mesa_visitor::visit(ir_expression *ir)
   emit(ir, OPCODE_LRP, result_dst, op[2], op[1], op[0]);
   break;
 
+   case ir_binop_vector_extract:
case ir_quadop_vector:
   /* This operation should have already been handled.
*/
-- 
1.8.1.4
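The constant folding added for `ir_binop_vector_extract` clamps the index before reading the component. A minimal sketch of the same arithmetic on a plain array follows; `fold_vector_extract` is a hypothetical name, and the fixed four-slot storage is an assumption that loosely mirrors how a constant holds up to four components.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Reads one component of a constant vector. The index is clamped to
// [0, vector_elements - 1], as in the range checking added in v3.
float fold_vector_extract(const std::array<float, 4> &vec,
                          int vector_elements,
                          int index) {
    const int c = std::max(0, std::min(index, vector_elements - 1));
    return vec[static_cast<std::size_t>(c)];
}
```

With a three-component vector, an index of 5 reads component 2 and an index of -2 reads component 0, so the fold never touches storage outside the vector.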


Re: [Mesa-dev] [PATCH] r600g: don't emit surface_sync after FLUSH_AND_INV_EVENT

2013-05-03 Thread Aaron Watry
I know it's been pushed already, but this also fixes some lockups that
I was seeing on Barts (HD6850) when running piglit's OpenCL tests.

Thanks for fixing this.

--Aaron

On Fri, May 3, 2013 at 9:47 AM, Marek Olšák  wrote:
> Reviewed-by: Marek Olšák 
>
> Marek
>
> On Fri, May 3, 2013 at 4:01 PM,   wrote:
>> From: Alex Deucher 
>>
>> It shouldn't be needed since the FLUSH_AND_INV_EVENT has already
>> made sure the destination caches are flushed.  Additionally,
>> we didn't previously emit the surface_sync until this commit:
>> http://cgit.freedesktop.org/mesa/mesa/commit/?id=e5e4c07e7964a3258ed02b530bcdc24c0650204b
>> Emitting them together causes hangs in compute on cayman/TN
>> and hangs in Heaven on evergreen.
>>
>> Note: this patch is a candidate for the 9.1 branch, but requires:
>> http://cgit.freedesktop.org/mesa/mesa/commit/?id=156bcca62c9f4e79e78929f72bc085757f36a65a
>> as well.
>>
>> Signed-off-by: Alex Deucher 
>> ---
>>  src/gallium/drivers/r600/r600_hw_context.c |   26 --
>>  1 files changed, 0 insertions(+), 26 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
>> b/src/gallium/drivers/r600/r600_hw_context.c
>> index 6d8b2cf..944b666 100644
>> --- a/src/gallium/drivers/r600/r600_hw_context.c
>> +++ b/src/gallium/drivers/r600/r600_hw_context.c
>> @@ -226,32 +226,6 @@ void r600_flush_emit(struct r600_context *rctx)
>> if (rctx->flags & R600_CONTEXT_FLUSH_AND_INV) {
>> cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
>> cs->buf[cs->cdw++] = 
>> EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
>> -   if (rctx->chip_class >= EVERGREEN) {
>> -   /* We were previously setting the CB and DB bits on
>> -* cp_coher_cntl, but this is unnecessary since
>> -* we are emitting the
>> -* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
>> -* Setting the CB bits was causing lockups when using
>> -* compute on cayman.
>> -*
>> -* XXX: Do even need to emit a surface sync packet 
>> here?
>> -* Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
>> -* surface sync was not being emitted with the
>> -* R600_CONTEXT_FLUSH_AND_INV flag.
>> -*/
>> -   cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
>> -   S_0085F0_DB_ACTION_ENA(1) |
>> -   S_0085F0_SH_ACTION_ENA(1) |
>> -   S_0085F0_SMX_ACTION_ENA(1) |
>> -   S_0085F0_FULL_CACHE_ENA(1);
>> -   } else {
>> -   cp_coher_cntl = S_0085F0_SMX_ACTION_ENA(1) |
>> -   S_0085F0_SH_ACTION_ENA(1) |
>> -   S_0085F0_VC_ACTION_ENA(1) |
>> -   S_0085F0_TC_ACTION_ENA(1) |
>> -   S_0085F0_FULL_CACHE_ENA(1);
>> -   }
>> -   emit_flush = 1;
>> }
>>
>> if (rctx->flags & R600_CONTEXT_INVAL_READ_CACHES) {
>> --
>> 1.7.7.5
>>


Re: [Mesa-dev] [PATCH 2/2] vbo: implement primitive merging for glBegin/End sequences

2013-05-03 Thread Brian Paul

On 05/03/2013 03:52 PM, Eric Anholt wrote:

> Brian Paul  writes:
>
>> diff --git a/src/mesa/vbo/vbo_save_api.c b/src/mesa/vbo/vbo_save_api.c
>> index b8dd90c..9ce3c6e 100644
>> --- a/src/mesa/vbo/vbo_save_api.c
>> +++ b/src/mesa/vbo/vbo_save_api.c
>> @@ -305,9 +305,9 @@ _save_reset_counters(struct gl_context *ctx)
>>    * previous prim.
>>    */
>>   static void
>> -vbo_merge_prims(struct gl_context *ctx,
>> -struct _mesa_prim *prim_list,
>> -GLuint *prim_count)
>> +merge_prims(struct gl_context *ctx,
>> +struct _mesa_prim *prim_list,
>> +GLuint *prim_count)
>>   {
>>      GLuint i;
>>      struct _mesa_prim *prev_prim = prim_list;
>> @@ -315,10 +315,9 @@ vbo_merge_prims(struct gl_context *ctx,
>>      for (i = 1; i < *prim_count; i++) {
>>         struct _mesa_prim *this_prim = prim_list + i;
>>
>> -  if (this_prim->mode == prev_prim->mode &&
>> -  this_prim->mode == GL_QUADS &&
>> -  this_prim->count % 4 == 0 &&
>> -  prev_prim->count % 4 == 0 &&
>> +  vbo_try_prim_conversion(this_prim);
>> +
>> +  if (vbo_can_merge_prims(prev_prim, this_prim) &&
>>         this_prim->start == prev_prim->start + prev_prim->count &&
>>         this_prim->basevertex == prev_prim->basevertex &&
>>         this_prim->num_instances == prev_prim->num_instances &&
>
> These other checks look like they should be in vbo_can_merge_prims.

Will do.

> Other than that,
>
> Reviewed-by: Eric Anholt


Thanks.

-Brian
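The review suggestion amounts to keeping every mergeability condition inside one predicate. A rough sketch of what that could look like follows, using only the checks visible in the quoted hunk; the real condition continues past the quoted lines and the eventual `vbo_can_merge_prims` may differ. `Prim`, `can_merge_prims` and `merge_into` are hypothetical stand-ins, not the Mesa functions.

```cpp
// Hypothetical, trimmed-down stand-in for struct _mesa_prim.
struct Prim {
    unsigned mode;
    unsigned start;
    unsigned count;
    int basevertex;
    unsigned num_instances;
};

const unsigned kQuads = 0x0007;  // numeric value of GL_QUADS

// All checks from the quoted hunk folded into one predicate.
bool can_merge_prims(const Prim &prev, const Prim &cur) {
    return cur.mode == prev.mode &&
           cur.mode == kQuads &&
           cur.count % 4 == 0 &&
           prev.count % 4 == 0 &&
           cur.start == prev.start + prev.count &&
           cur.basevertex == prev.basevertex &&
           cur.num_instances == prev.num_instances;
}

// Extends prev to cover cur. The caller must have checked
// can_merge_prims(prev, cur) first.
void merge_into(Prim &prev, const Prim &cur) {
    prev.count += cur.count;
}
```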


[Mesa-dev] [PATCH 3/3] r600g/llvm: Parse config values in register / value pairs

2013-05-03 Thread Tom Stellard
From: Tom Stellard 

Rather than relying on a predetermined order for the config values.
---
 src/gallium/drivers/r600/r600_llvm.c | 33 ++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_llvm.c 
b/src/gallium/drivers/r600/r600_llvm.c
index f83caee..c004879 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -7,6 +7,7 @@
 #include "util/u_double_list.h"
 #include "util/u_memory.h"
 
+#include "evergreend.h"
 #include "r600_asm.h"
 #include "r600_sq.h"
 #include "r600_opcodes.h"
@@ -550,6 +551,12 @@ LLVMModuleRef r600_tgsi_llvm(
return ctx->gallivm.module;
 }
 
+/* We need to define these R600 registers here, because we can't include
+ * evergreend.h and r600d.h.
+ */
+#define R_028868_SQ_PGM_RESOURCES_VS 0x028868
+#define R_028850_SQ_PGM_RESOURCES_PS 0x028850
+
 unsigned r600_llvm_compile(
LLVMModuleRef mod,
enum radeon_family family,
@@ -560,6 +567,7 @@ unsigned r600_llvm_compile(
unsigned r;
struct radeon_llvm_binary binary;
const char * gpu_family = r600_llvm_gpu_string(family);
+   unsigned i;
 
r = radeon_llvm_compile(mod, &binary, gpu_family, dump);
 
@@ -568,9 +576,28 @@ unsigned r600_llvm_compile(
memcpy(bc->bytecode, binary.code, binary.code_size);
bc->ndw = binary.code_size / 4;
 
-   bc->ngpr = util_le32_to_cpu(*(uint32_t*)binary.config);
-   bc->nstack = util_le32_to_cpu(*(uint32_t*)(binary.config + 4));
-   *use_kill = util_le32_to_cpu(*(uint32_t*)(binary.config + 8));
+   for (i = 0; i < binary.config_size; i+= 8) {
+   unsigned reg =
+   util_le32_to_cpu(*(uint32_t*)(binary.config + i));
+   unsigned value =
+   util_le32_to_cpu(*(uint32_t*)(binary.config + i + 4));
+   switch (reg) {
+   /* R600 / R700 */
+   case R_028850_SQ_PGM_RESOURCES_PS:
+   case R_028868_SQ_PGM_RESOURCES_VS:
+   /* Evergreen / Northern Islands */
+   case R_028844_SQ_PGM_RESOURCES_PS:
+   case R_028860_SQ_PGM_RESOURCES_VS:
+   case R_0288D4_SQ_PGM_RESOURCES_LS:
+   bc->ngpr = G_028844_NUM_GPRS(value);
+   bc->nstack = G_028844_STACK_SIZE(value);
+   break;
+   case R_02880C_DB_SHADER_CONTROL:
+   *use_kill = G_02880C_KILL_ENABLE(value);
+   break;
+   }
+   }
+
return r;
 }
 
-- 
1.7.11.4
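
The register / value parsing loop in the patch above can be tried outside the
driver. What follows is only a minimal sketch: the toy_* names are
hypothetical, the two register offsets are taken from the macro names in the
patch, and the bit positions of the fields are assumptions made for
illustration rather than values copied from evergreend.h. Unlike the patch,
the sketch stops before a trailing partial pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets follow the macro names in the patch; the field positions used
 * below are assumptions for this sketch, not copied from evergreend.h. */
#define TOY_SQ_PGM_RESOURCES_PS 0x028844u
#define TOY_DB_SHADER_CONTROL   0x02880Cu

/* Hypothetical container for the values the driver stores in bc/use_kill. */
struct toy_shader_config {
    unsigned ngpr;
    unsigned nstack;
    unsigned use_kill;
};

/* Read one little-endian 32-bit word without relying on host byte order. */
static uint32_t toy_read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the blob as (register, value) pairs, in whatever order they appear. */
static void toy_parse_config(const unsigned char *config, size_t size,
                             struct toy_shader_config *out)
{
    size_t i;

    assert(config != NULL && out != NULL);

    /* Stop before a trailing partial pair instead of reading past the end. */
    for (i = 0; i + 8 <= size; i += 8) {
        uint32_t reg = toy_read_le32(config + i);
        uint32_t value = toy_read_le32(config + i + 4);

        switch (reg) {
        case TOY_SQ_PGM_RESOURCES_PS:
            out->ngpr = value & 0xffu;          /* assumed: bits 0..7 */
            out->nstack = (value >> 8) & 0xffu; /* assumed: bits 8..15 */
            break;
        case TOY_DB_SHADER_CONTROL:
            out->use_kill = (value >> 6) & 1u;  /* assumed: bit 6 */
            break;
        default:
            /* Unknown registers are skipped, so pair order does not matter. */
            break;
        }
    }
}
```

Because each pair carries its own register offset, the producer can add or
reorder entries without breaking a consumer written this way.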



[Mesa-dev] [PATCH 2/3] r600g/llvm: Don't feed LLVM output through r600_bytecode_build()

2013-05-03 Thread Tom Stellard
From: Tom Stellard 

The LLVM backend emits raw ISA now, so we can just use its output
unmodified.
---
 src/gallium/drivers/r600/r600_llvm.c   |  11 +-
 src/gallium/drivers/r600/r600_llvm.h   |   2 -
 src/gallium/drivers/r600/r600_shader.c | 401 ++---
 3 files changed, 20 insertions(+), 394 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_llvm.c 
b/src/gallium/drivers/r600/r600_llvm.c
index 83d7340..f83caee 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -552,8 +552,6 @@ LLVMModuleRef r600_tgsi_llvm(
 
 unsigned r600_llvm_compile(
LLVMModuleRef mod,
-   unsigned char ** inst_bytes,
-   unsigned * inst_byte_count,
enum radeon_family family,
struct r600_bytecode *bc,
boolean *use_kill,
@@ -562,9 +560,14 @@ unsigned r600_llvm_compile(
unsigned r;
struct radeon_llvm_binary binary;
const char * gpu_family = r600_llvm_gpu_string(family);
+
r = radeon_llvm_compile(mod, &binary, gpu_family, dump);
-   *inst_bytes = binary.code;
-   *inst_byte_count = binary.code_size;
+
+   assert(binary.code_size % 4 == 0);
+   bc->bytecode = CALLOC(1, binary.code_size);
+   memcpy(bc->bytecode, binary.code, binary.code_size);
+   bc->ndw = binary.code_size / 4;
+
bc->ngpr = util_le32_to_cpu(*(uint32_t*)binary.config);
bc->nstack = util_le32_to_cpu(*(uint32_t*)(binary.config + 4));
*use_kill = util_le32_to_cpu(*(uint32_t*)(binary.config + 8));
diff --git a/src/gallium/drivers/r600/r600_llvm.h 
b/src/gallium/drivers/r600/r600_llvm.h
index 50bbca6..e036bff 100644
--- a/src/gallium/drivers/r600/r600_llvm.h
+++ b/src/gallium/drivers/r600/r600_llvm.h
@@ -18,8 +18,6 @@ LLVMModuleRef r600_tgsi_llvm(
 
 unsigned r600_llvm_compile(
LLVMModuleRef mod,
-   unsigned char ** inst_bytes,
-   unsigned * inst_byte_count,
enum radeon_family family,
struct r600_bytecode *bc,
boolean *use_kill,
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index b3b124e..78266c0 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -158,10 +158,17 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
R600_ERR("translation from TGSI failed !\n");
return r;
}
-   r = r600_bytecode_build(&shader->shader.bc);
-   if (r) {
-   R600_ERR("building bytecode failed !\n");
-   return r;
+
+   /* Check if the bytecode has already been built.  When using the llvm
+* backend, r600_shader_from_tgsi() will take care of building the
+* bytecode.
+*/
+   if (!shader->shader.bc.bytecode) {
+   r = r600_bytecode_build(&shader->shader.bc);
+   if (r) {
+   R600_ERR("building bytecode failed !\n");
+   return r;
+   }
}
 
if (dump && !sb_disasm) {
@@ -284,23 +291,11 @@ static int tgsi_bgnloop(struct r600_shader_ctx *ctx);
 static int tgsi_endloop(struct r600_shader_ctx *ctx);
 static int tgsi_loop_brk_cont(struct r600_shader_ctx *ctx);
 
-/*
- * bytestream -> r600 shader
- *
- * These functions are used to transform the output of the LLVM backend into
- * struct r600_bytecode.
- */
-
-static void r600_bytecode_from_byte_stream(struct r600_shader_ctx *ctx,
-   unsigned char * bytes,  unsigned num_bytes);
-
 #ifdef HAVE_OPENCL
 int r600_compute_shader_create(struct pipe_context * ctx,
LLVMModuleRef mod,  struct r600_bytecode * bytecode)
 {
struct r600_context *r600_ctx = (struct r600_context *)ctx;
-   unsigned char * bytes;
-   unsigned byte_count;
struct r600_shader_ctx shader_ctx;
boolean use_kill = false;
bool dump = (r600_ctx->screen->debug_flags & DBG_CS) != 0;
@@ -313,13 +308,8 @@ int r600_compute_shader_create(struct pipe_context * ctx,
   r600_ctx->screen->msaa_texture_support);
shader_ctx.bc->type = TGSI_PROCESSOR_COMPUTE;
shader_ctx.bc->isa = r600_ctx->isa;
-   r600_llvm_compile(mod, &bytes, &byte_count, r600_ctx->family,
+   r600_llvm_compile(mod, r600_ctx->family,
shader_ctx.bc, &use_kill, dump);
-   r600_bytecode_from_byte_stream(&shader_ctx, bytes, byte_count);
-   if (shader_ctx.bc->chip_class == CAYMAN) {
-   cm_bytecode_add_cf_end(shader_ctx.bc);
-   }
-   r600_bytecode_build(shader_ctx.bc);
 
if (dump && !sb_disasm) {
r600_bytecode_disasm(shader_ctx.bc);
@@ -328,366 +318,11 @@ int r600_compute_shader_create(struct pipe_context * ctx,
R600_ERR("r600_sb_bytecode_process failed!\n");
}
 
-   free(bytes);
return 1;
 }
 
 #endif /* HAVE_OPENCL */
 
-static uint32_t i32_from_byte_stream(unsigned char * by

[Mesa-dev] [PATCH 1/3] r600g/llvm: Don't emit CALL_FS for vertex shaders

2013-05-03 Thread Tom Stellard
From: Tom Stellard 

The LLVM backend takes care of this now.
---
 src/gallium/drivers/r600/r600_shader.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 9afd57f..b3b124e 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1371,13 +1371,6 @@ static int r600_shader_from_tgsi(struct r600_screen 
*rscreen,
for (i = 0; i < TGSI_FILE_COUNT; i++) {
ctx.file_offset[i] = 0;
}
-   if (ctx.type == TGSI_PROCESSOR_VERTEX) {
-   ctx.file_offset[TGSI_FILE_INPUT] = 1;
-   r600_bytecode_add_cfinst(ctx.bc, CF_OP_CALL_FS);
-   }
-   if (ctx.type == TGSI_PROCESSOR_FRAGMENT && ctx.bc->chip_class >= 
EVERGREEN) {
-   ctx.file_offset[TGSI_FILE_INPUT] = evergreen_gpr_count(&ctx);
-   }
 
 #ifdef R600_USE_LLVM
if (use_llvm && ctx.info.indirect_files && (ctx.info.indirect_files & 
(1 << TGSI_FILE_CONSTANT)) != ctx.info.indirect_files) {
@@ -1387,6 +1380,15 @@ static int r600_shader_from_tgsi(struct r600_screen 
*rscreen,
use_llvm = 0;
}
 #endif
+   if (ctx.type == TGSI_PROCESSOR_VERTEX) {
+   ctx.file_offset[TGSI_FILE_INPUT] = 1;
+   if (!use_llvm) {
+   r600_bytecode_add_cfinst(ctx.bc, CF_OP_CALL_FS);
+   }
+   }
+   if (ctx.type == TGSI_PROCESSOR_FRAGMENT && ctx.bc->chip_class >= 
EVERGREEN) {
+   ctx.file_offset[TGSI_FILE_INPUT] = evergreen_gpr_count(&ctx);
+   }
ctx.use_llvm = use_llvm;
 
if (use_llvm) {
-- 
1.7.11.4



[Mesa-dev] R600 Patchset: Emit true ISA

2013-05-03 Thread Tom Stellard
Hi,

The attached patches modify the CodeEmitter to emit true ISA.
Previously, we were prefixing all instructions with an instruction type
byte.

Vincent did most of the work to convert the CodeEmitter to true ISA,
these patches are just the last few cleanups that are needed to finish
the project.

Please test/review.

Thanks,
Tom
From 4fc6af0637de0eae0542a987e93d467bad3a4eee Mon Sep 17 00:00:00 2001
From: Tom Stellard 
Date: Fri, 3 May 2013 11:17:18 -0700
Subject: [PATCH 1/4] R600: Emit ISA for CALL_FS_* instructions

---
 lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp |  1 -
 test/CodeGen/R600/call_fs.ll   | 15 +++
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 test/CodeGen/R600/call_fs.ll

diff --git a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp 
b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
index 7c83d86..8261477 100644
--- a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
+++ b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
@@ -262,7 +262,6 @@ void R600MCCodeEmitter::EncodeInstruction(const MCInst &MI, 
raw_ostream &OS,
 }
 case AMDGPU::CF_CALL_FS_EG:
 case AMDGPU::CF_CALL_FS_R600:
-  return;
 case AMDGPU::CF_TC_EG:
 case AMDGPU::CF_VC_EG:
 case AMDGPU::CF_TC_R600:
diff --git a/test/CodeGen/R600/call_fs.ll b/test/CodeGen/R600/call_fs.ll
new file mode 100644
index 000..fd21b72
--- /dev/null
+++ b/test/CodeGen/R600/call_fs.ll
@@ -0,0 +1,15 @@
+
+; RUN: llc < %s -march=r600 -mcpu=redwood -show-mc-encoding -o - | FileCheck 
--check-prefix=EG-CHECK %s
+; RUN: llc < %s -march=r600 -mcpu=rv710 -show-mc-encoding -o - | FileCheck 
--check-prefix=R600-CHECK %s
+
+; EG-CHECK: @call_fs
+; EG-CHECK: CALL_FS  ; encoding: [0x03,0x00,0x00,0x00,0x00,0x00,0x00,0xc0,0x84]
+; R600-CHECK: @call_fs
+; R600-CHECK:CALL_FS ; encoding: [0x03,0x00,0x00,0x00,0x00,0x00,0x00,0x80,0x89]
+
+
+define void @call_fs() #0 {
+  ret void
+}
+
+attributes #0 = { "ShaderType"="1" } ; Vertex Shader
-- 
1.7.11.4

From 24b983199b5c81eb0362f4c583eafb842255acc5 Mon Sep 17 00:00:00 2001
From: Tom Stellard 
Date: Fri, 3 May 2013 11:38:33 -0700
Subject: [PATCH 2/4] R600: Stop emitting the instruction type byte before
 each instruction

---
 lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp | 35 ++
 test/CodeGen/R600/call_fs.ll   |  4 +--
 test/CodeGen/R600/cf_end.ll|  4 +--
 3 files changed, 6 insertions(+), 37 deletions(-)

diff --git a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp 
b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
index 8261477..10dee20 100644
--- a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
+++ b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
@@ -9,12 +9,8 @@
 //
 /// \file
 ///
-/// This code emitter outputs bytecode that is understood by the r600g driver
-/// in the Mesa [1] project.  The bytecode is very similar to the hardware's 
ISA,
-/// but it still needs to be run through a finalizer in order to be executed
-/// by the GPU.
-///
-/// [1] http://www.mesa3d.org/
+/// \brief The R600 code emitter produces machine code that can be executed
+/// directly on the GPU device.
 //
 
//===--===//
 
@@ -95,16 +91,6 @@ enum RegElement {
   ELEMENT_W
 };
 
-enum InstrTypes {
-  INSTR_ALU = 0,
-  INSTR_TEX,
-  INSTR_FC,
-  INSTR_NATIVE,
-  INSTR_VTX,
-  INSTR_EXPORT,
-  INSTR_CFALU
-};
-
 enum FCInstr {
   FC_IF_PREDICATE = 0,
   FC_ELSE,
@@ -152,7 +138,6 @@ void R600MCCodeEmitter::EncodeInstruction(const MCInst &MI, 
raw_ostream &OS,
 case AMDGPU::RAT_WRITE_CACHELESS_32_eg:
 case AMDGPU::RAT_WRITE_CACHELESS_128_eg: {
   uint64_t inst = getBinaryCodeForInstr(MI, Fixups);
-  EmitByte(INSTR_NATIVE, OS);
   Emit(inst, OS);
   break;
 }
@@ -170,9 +155,7 @@ void R600MCCodeEmitter::EncodeInstruction(const MCInst &MI, 
raw_ostream &OS,
   uint32_t InstWord2 = MI.getOperand(2).getImm(); // Offset
   InstWord2 |= 1 << 19;
 
-  EmitByte(INSTR_NATIVE, OS);
   Emit(InstWord01, OS);
-  EmitByte(INSTR_NATIVE, OS);
   Emit(InstWord2, OS);
   Emit((u_int32_t) 0, OS);
   break;
@@ -246,9 +229,7 @@ void R600MCCodeEmitter::EncodeInstruction(const MCInst &MI, 
raw_ostream &OS,
   SrcSelect[ELEMENT_W] << 29 | Offsets[0] << 0 | Offsets[1] << 5 |
   Offsets[2] << 10;
 
-  EmitByte(INSTR_NATIVE, OS);
   Emit(Word01, OS);
-  EmitByte(INSTR_NATIVE, OS);
   Emit(Word2, OS);
   Emit((u_int32_t) 0, OS);
   break;
@@ -256,7 +237,6 @@ void R600MCCodeEmitter::EncodeInstruction(const MCInst &MI, 
raw_ostream &OS,
 case AMDGPU::CF_ALU:
 case AMDGPU::CF_ALU_PUSH_BEFORE: {
   uint64_t Inst = getBinaryCodeForInstr(MI, Fixups);
-  EmitByte(INSTR_NATIVE, OS);
   Emit(Inst, OS);
   break;
 }
@@ -289,13 +269,11 @@ void R600MCCodeEmitter::EncodeInstruction(const MCInst 
&MI, raw_ostream &

[Mesa-dev] [PATCH] glsl: Flip around "if" statements with empty "then" blocks.

2013-05-03 Thread Eric Anholt
This cleans up some funny-looking code in some unigine shaders I was
looking at.  Also slightly helps on planeshift and a few shaders in an
upcoming Valve release.

total instructions in shared programs: 1653715 -> 1653587 (-0.01%)
instructions in affected programs: 16550 -> 16422 (-0.77%)
---
 src/glsl/opt_if_simplification.cpp | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/src/glsl/opt_if_simplification.cpp 
b/src/glsl/opt_if_simplification.cpp
index e1a23d9..db59b13 100644
--- a/src/glsl/opt_if_simplification.cpp
+++ b/src/glsl/opt_if_simplification.cpp
@@ -25,7 +25,8 @@
  * \file opt_if_simplification.cpp
  *
  * Moves constant branches of if statements out to the surrounding
- * instruction stream.
+ * instruction stream, and inverts if conditionals to avoid empty
+ * "then" blocks.
  */
 
 #include "ir.h"
@@ -101,6 +102,30 @@ ir_if_simplification_visitor::visit_leave(ir_if *ir)
   }
   ir->remove();
   this->made_progress = true;
+  return visit_continue;
+   }
+
+   /* Turn:
+*
+* if (cond) {
+* } else {
+* do_work();
+* }
+*
+* into :
+*
+* if (!cond)
+* do_work();
+*
+* which avoids control flow for "else" (which is usually more
+* expensive than normal operations), and the "not" can usually be
+* folded into the generation of "cond" anyway.
+*/
+   if (ir->then_instructions.is_empty()) {
+  ir->condition = new(ralloc_parent(ir->condition))
+ir_expression(ir_unop_logic_not, ir->condition);
+  ir->else_instructions.move_nodes_to(&ir->then_instructions);
+  this->made_progress = true;
}
 
return visit_continue;
-- 
1.8.3.rc0
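
The transformation described in the comment above can be sketched on a toy
statement type. This is an illustration only: struct toy_if and
toy_flip_empty_then() are hypothetical and are not Mesa's ir_if API. Two
simplifications are assumed: the sketch toggles a flag instead of wrapping
the condition in a new logical-not expression, and it leaves the statement
alone when the "else" block is empty too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an "if" node: bodies are lists of labels. */
struct toy_if {
    bool negate_cond;              /* true when the condition is inverted */
    const char *const *then_body;
    size_t then_len;
    const char *const *else_body;
    size_t else_len;
};

/* Turn "if (c) {} else { work }" into "if (!c) { work }".
 * Returns true when the statement was changed. */
static bool toy_flip_empty_then(struct toy_if *stmt)
{
    assert(stmt != NULL);

    /* Only applies when "then" is empty and "else" has something to move. */
    if (stmt->then_len != 0 || stmt->else_len == 0)
        return false;

    stmt->negate_cond = !stmt->negate_cond;

    /* Move the "else" instructions into the "then" block. */
    stmt->then_body = stmt->else_body;
    stmt->then_len = stmt->else_len;
    stmt->else_body = NULL;
    stmt->else_len = 0;
    return true;
}
```

Returning whether anything changed mirrors the made_progress flag in the
patch, so a caller can rerun its passes until nothing changes.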



Re: [Mesa-dev] [PATCH 2/2] vbo: implement primitive merging for glBegin/End sequences

2013-05-03 Thread Eric Anholt
Brian Paul  writes:
> diff --git a/src/mesa/vbo/vbo_save_api.c b/src/mesa/vbo/vbo_save_api.c
> index b8dd90c..9ce3c6e 100644
> --- a/src/mesa/vbo/vbo_save_api.c
> +++ b/src/mesa/vbo/vbo_save_api.c
> @@ -305,9 +305,9 @@ _save_reset_counters(struct gl_context *ctx)
>   * previous prim.
>   */
>  static void
> -vbo_merge_prims(struct gl_context *ctx,
> -struct _mesa_prim *prim_list,
> -GLuint *prim_count)
> +merge_prims(struct gl_context *ctx,
> +struct _mesa_prim *prim_list,
> +GLuint *prim_count)
>  {
> GLuint i;
> struct _mesa_prim *prev_prim = prim_list;
> @@ -315,10 +315,9 @@ vbo_merge_prims(struct gl_context *ctx,
> for (i = 1; i < *prim_count; i++) {
>struct _mesa_prim *this_prim = prim_list + i;
>  
> -  if (this_prim->mode == prev_prim->mode &&
> -  this_prim->mode == GL_QUADS &&
> -  this_prim->count % 4 == 0 &&
> -  prev_prim->count % 4 == 0 &&
> +  vbo_try_prim_conversion(this_prim);
> +
> +  if (vbo_can_merge_prims(prev_prim, this_prim) &&
>this_prim->start == prev_prim->start + prev_prim->count &&
>this_prim->basevertex == prev_prim->basevertex &&
>this_prim->num_instances == prev_prim->num_instances &&

These other checks look like they should be in vbo_can_merge_prims.

Other than that,

Reviewed-by: Eric Anholt 




[Mesa-dev] [PATCH] gallium: more tgsi documentation updates

2013-05-03 Thread sroland
From: Roland Scheidegger 

Adds the remaining integer opcodes, and some opcodes are moved to more
appropriate places, along with getting rid of the (already nearly empty)
ps_2_x section. Though the CAP bits for some of these are still a bit in
the air so the documentation isn't quite as watertight as is desirable.
---
 src/gallium/docs/source/tgsi.rst |  381 +-
 1 file changed, 251 insertions(+), 130 deletions(-)

diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index b2f7a85..b479fcf 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -473,7 +473,7 @@ This instruction replicates its result.
 
 .. opcode:: KILP - Predicated Discard
 
-  discard
+  Not really predicated, just unconditional discard
 
 
 .. opcode:: PK2H - Pack Two 16-bit Floats
@@ -720,25 +720,6 @@ This instruction replicates its result.
   dst.w = round(src.w)
 
 
-.. opcode:: BRA - Branch
-
-  pc = target
-
-.. note::
-
-   Considered for removal.
-
-.. opcode:: CAL - Subroutine Call
-
-  push(pc)
-  pc = target
-
-
-.. opcode:: RET - Subroutine Call Return
-
-  pc = pop()
-
-
 .. opcode:: SSG - Set Sign
 
 .. math::
@@ -856,99 +837,6 @@ This instruction replicates its result.
   dst = texture_sample(unit, coord, lod)
 
 
-.. opcode:: BRK - Break
-
-  Unconditionally moves the point of execution to the instruction after the
-  next endloop or endswitch. The instruction must appear within a loop/endloop
-  or switch/endswitch.
-
-
-.. opcode:: BREAKC - Break Conditional
-
-  Conditionally moves the point of execution to the instruction after the
-  next endloop or endswitch. The instruction must appear within a loop/endloop
-  or switch/endswitch.
-  Condition evaluates to true if src0.x != 0 where src0.x is interpreted
-  as an integer register.
-
-
-.. opcode:: CONT - Continue
-
-  TBD
-
-.. note::
-
-   Support for CONT is determined by a special capability bit,
-   ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
-
-
-.. opcode:: IF - Float If
-
-  Start an IF ... ELSE .. ENDIF block.  Condition evaluates to true if
-
-src0.x != 0.0
-
-  where src0.x is interpreted as a floating point register.
-
-
-.. opcode:: UIF - Bitwise If
-
-  Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if
-
-src0.x != 0
-
-  where src0.x is interpreted as an integer register.
-
-
-.. opcode:: ELSE - Else
-
-  Starts an else block, after an IF or UIF statement.
-
-
-.. opcode:: ENDIF - End If
-
-  Ends an IF or UIF block.
-
-
-.. opcode:: SWITCH - Switch
-
-   Starts a C-style switch expression. The switch consists of one or multiple
-   CASE statements, and at most one DEFAULT statement. Execution of a statement
-   ends when a BRK is hit, but just like in C falling through to other cases
-   without a break is allowed. Similarly, DEFAULT label is allowed anywhere not
-   just as last statement, and fallthrough is allowed into/from it.
-   CASE src arguments are evaluated at bit level against the SWITCH src 
argument.
-
-   Example:
-   SWITCH src[0].x
-   CASE src[0].x
-   (some instructions here)
-   (optional BRK here)
-   DEFAULT
-   (some instructions here)
-   (optional BRK here)
-   CASE src[0].x
-   (some instructions here)
-   (optional BRK here)
-   ENDSWITCH
-
-
-.. opcode:: CASE - Switch case
-
-   This represents a switch case label. The src arg must be an integer 
immediate.
-
-
-.. opcode:: DEFAULT - Switch default
-
-   This represents the default case in the switch, which is taken if no other
-   case matches.
-
-
-.. opcode:: ENDSWITCH - End of switch
-
-   Ends a switch expression.
-
-
 .. opcode:: PUSHA - Push Address Register On Stack
 
   push(src.x)
@@ -980,6 +868,28 @@ This instruction replicates its result.
Considered for removal.
 
 
+.. opcode:: BRA - Branch
+
+  pc = target
+
+.. note::
+
+   Considered for removal.
+
+
+.. opcode:: CALLNZ - Subroutine Call If Not Zero
+
+   TBD
+
+.. note::
+
+   Considered for cleanup.
+
+.. note::
+
+   Considered for removal.
+
+
 Compute ISA
 
 
@@ -1380,8 +1290,6 @@ Support for these opcodes indicated by 
PIPE_SHADER_CAP_INTEGERS (all of them?)
   dst.w = src0.w >> (unsigned) src1.x
 
 
-
-
 .. opcode:: UCMP - Integer Conditional Move
 
 .. math::
@@ -1395,6 +1303,115 @@ Support for these opcodes indicated by 
PIPE_SHADER_CAP_INTEGERS (all of them?)
   dst.w = src0.w ? src1.w : src2.w
 
 
+
+.. opcode:: ISSG - Integer Set Sign
+
+.. math::
+
+  dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
+
+  dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
+
+  dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
+
+  dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
+
+
+
+.. opcode:: ISLT - Signed Integer Set On Less Than
+
+.. math::
+
+  dst.x = (src0.x < src1.x) ? ~0 : 0
+
+  dst.y = (src0.y < src1.y) ? ~0 : 0
+
+  dst.z = (src0.z < src1.z) ? ~0 : 0
+
+  dst.w = (src0.w < src1.w) ? ~0 : 0
+
+
+.. opcode:: USLT - Unsigned Integer Set

[Mesa-dev] [Bug 39527] 3D Driving-School - missing textures

2013-05-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=39527

David "okias" Heidelberger  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |WORKSFORME

--- Comment #2 from David "okias" Heidelberger  ---
Well, I tested on r600 today, and it seems to have all textures loaded.
CLOSING.



Re: [Mesa-dev] [PATCH] i965: Enable fast clears on non-8x4-aligned sizes.

2013-05-03 Thread Chad Versace

On 05/02/2013 04:58 PM, Kenneth Graunke wrote:
> On 05/02/2013 01:58 PM, Eric Anholt wrote:
>> Improves glb2.7 performance at a misaligned size by 2.3% +/- 0.7% (n=11).
>> The workaround was to avoid bad primitive/surface sizes, but that's worked
>> around as of a14dc4f92cdad6177d83f051a088a66e31a973bc.  (One might note
>> that pre-gen7 we don't know that the right half of an 8x4 at the right
>> edge is actually our pixels, but we're already clobbering those pixels for
>> depth resolves anyway and more work would be required to avoid that).
>> ---
>>   src/mesa/drivers/dri/i965/brw_clear.c | 13 -
>>   1 file changed, 13 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_clear.c
>> b/src/mesa/drivers/dri/i965/brw_clear.c
>> index 4ee456a..2b999bf 100644
>> --- a/src/mesa/drivers/dri/i965/brw_clear.c
>> +++ b/src/mesa/drivers/dri/i965/brw_clear.c
>> @@ -128,19 +128,6 @@ brw_fast_clear_depth(struct gl_context *ctx)
>> return false;
>>  }
>>
>> -   /* The rendered area has to be 8x4 samples, not resolved pixels, so we look
>> -* at the miptree slice dimensions instead of renderbuffer size.
>> -*/
>> -   if (mt->level[depth_irb->mt_level].width % 8 != 0 ||
>> -   mt->level[depth_irb->mt_level].height % 4 != 0) {
>> -  perf_debug("Failed to fast clear depth due to width/height %d,%d not "
>> - "being aligned to 8,4.  Possible 5%% performance win if "
>> - "avoided\n",
>> - mt->level[depth_irb->mt_level].width,
>> - mt->level[depth_irb->mt_level].height);
>> -  return false;
>> -   }
>> -
>>  uint32_t depth_clear_value;
>>  switch (mt->format) {
>>  case MESA_FORMAT_Z32_FLOAT_X24S8:
>
> Would be worth regression testing on Gen6.
>
> Really glad to see this working.
> Reviewed-by: Kenneth Graunke

Reviewed-by: Chad Versace



Re: [Mesa-dev] visual leak?

2013-05-03 Thread Andy Skinner
Thank you.  As far as I can tell, the pointer in question was allocated to be 
stored in the object that is freed there, and XMesaDestroyVisual frees them 
together.

But because of your caution, and because it is a lot smaller than the larger 
object which will be freed if I register it with the display, I'll wait on that.

thanks
andy

-----Original Message-----
From: Brian Paul [mailto:bri...@vmware.com] 
Sent: Friday, May 03, 2013 1:04 PM
To: Andy Skinner
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] visual leak?

On 05/03/2013 09:42 AM, Andy Skinner wrote:
> Hi, Brian.  Did you have thoughts on these?
>
> Would it make sense, if we can't update Mesa right now, to just put the 
> register_with_display() call into Fake_glXChooseFBConfig(), as it is for 
> Fake_glxChooseVisual()?  It allows the visual to be cleaned up when the 
> Display is closed.  Unless it introduces something else we'd have to chase 
> down, we think we'd have to do a smaller job to make that change (for now) 
> than to go to more recent Mesa.

Sounds fine.


> Whether we do that or not, should destroy_visuals_on_display() free both the 
> pointers that XMesaDestroyVisual() frees?  This looks like a leak to me, even 
> in current Mesa.

I don't think so.  There's several XVisualInfo pointers that we hang 
onto.  Some come from the user/application (and we can't free those) 
and others are internal to Mesa's GLX.  I'd have to study/fix the code 
to sort that out.  We risk a double-free otherwise.

-Brian


>
> thanks
> andy
>
> -----Original Message-----
> From: Andy Skinner
> Sent: Monday, April 29, 2013 1:04 PM
> To: 'Brian Paul'
> Cc: mesa-dev@lists.freedesktop.org
> Subject: RE: [Mesa-dev] visual leak?
>
> Thanks.  I realized today that newer Mesa registers the display with that 
> callback for choosing FBConfig, but our older version only does it for 
> Visual.  So it isn't getting freed when we close the display.
>
> Do you think the register_with_display() call can be put into 
> Fake_glXChooseFBConfig() on its own?  We'd like to update Mesa, but it may 
> not be a good time in our release cycle.
>
> Also, destroy_visuals_on_display calls free, but XMesaDestroyVisual can also 
> call free on v->visinfo.  Should destroy_visuals_on_display call 
> XMesaDestroyVisual, or do both free()s?
>
> thanks
> andy
>
> -----Original Message-----
> From: Brian Paul [mailto:bri...@vmware.com]
> Sent: Monday, April 29, 2013 11:39 AM
> To: Andy Skinner
> Cc: mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] visual leak?
>
> On 04/26/2013 03:23 PM, Andy Skinner wrote:
>> Hi. We are seeing a leak of things created in XMesaCreateVisual, as
>> called from fakeglx.c. I don't see anywhere that XMesaDestroyVisual is
>> called. I see a free() in destroy_visuals_on_display, but I don't know
>> how often that is called. Are these visuals intended to be kept around?
>>
>> The visual is allocated in XMesaCreateVisual, from save_glx_visual,
>> from choose_visual, from Fake_glXChooseFBConfig.
>>
>> It may be that the problem is that I'm not finding and reusing one in
>> the list, but separate from that, I am curious why XMesaDestroyVisual
>> isn't called.
>>
>> Any thoughts?
>
> There's no GLX function for explicitly destroying a visual so that's why 
> XMesaDestroyVisual() isn't called.  Instead, we register an X display 
> callback that's called when the X Display is closed which frees visuals and 
> other things.
>
> So, when you call XCloseDisplay, they should get freed.
>
> -Brian



Re: [Mesa-dev] [PATCH 10/12] sso: update glGet: GL_PROGRAM_PIPELINE_BINDING

2013-05-03 Thread Matt Turner
On Fri, May 3, 2013 at 10:44 AM, Gregory Hainaut
 wrote:
> ---
>  src/mesa/main/get.c  |9 +
>  src/mesa/main/get_hash_params.py |3 +++
>  2 files changed, 12 insertions(+)
>
> diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
> index 54159c0..6cbb7db 100644
> --- a/src/mesa/main/get.c
> +++ b/src/mesa/main/get.c
> @@ -369,6 +369,7 @@ EXTRA_EXT(ARB_map_buffer_alignment);
>  EXTRA_EXT(ARB_texture_cube_map_array);
>  EXTRA_EXT(ARB_texture_buffer_range);
>  EXTRA_EXT(ARB_texture_multisample);
> +EXTRA_EXT(ARB_separate_shader_objects);
>
>  static const int
>  extra_ARB_color_buffer_float_or_glcore[] = {
> @@ -889,6 +890,14 @@ find_custom_value(struct gl_context *ctx, const struct 
> value_desc *d, union valu
>   _mesa_problem(ctx, "driver doesn't implement GetTimestamp");
>}
>break;
> +   /* GL_ARB_separate_shader_objects */
> +   case GL_PROGRAM_PIPELINE_BINDING:
> +  if (ctx->Pipeline.Current) {
> + v->value_int = ctx->Pipeline.Current->Name;
> +  } else {
> + v->value_int = 0;
> +  }
> +  break;
> }
>  }

This looks believable, but I can't find a description in the extension
spec or GL 4.1+ specs that says precisely what this query is supposed
to do. Looks like it's just mentioned in the extension spec, and not
at all in GL 4.1+ specs.

> diff --git a/src/mesa/main/get_hash_params.py 
> b/src/mesa/main/get_hash_params.py
> index 2b97da6..43a11cf 100644
> --- a/src/mesa/main/get_hash_params.py
> +++ b/src/mesa/main/get_hash_params.py
> @@ -709,6 +709,9 @@ descriptor=[
>
>  # GL_ARB_texture_cube_map_array
>[ "TEXTURE_BINDING_CUBE_MAP_ARRAY_ARB", "LOC_CUSTOM, TYPE_INT, 
> TEXTURE_CUBE_ARRAY_INDEX, extra_ARB_texture_cube_map_array" ],
> +
> +# GL_ARB_separate_shader_objects
> +  [ "PROGRAM_PIPELINE_BINDING", "LOC_CUSTOM, TYPE_INT, 
> GL_PROGRAM_PIPELINE_BINDING, extra_ARB_separate_shader_objects" ],
>  ]},
>
>  # Enums restricted to OpenGL Core profile
> --
> 1.7.10.4
>


Re: [Mesa-dev] [PATCH 04/12] sso: implement ActiveShaderProgram & GetProgramPipelineiv

2013-05-03 Thread Matt Turner
On Fri, May 3, 2013 at 10:44 AM, Gregory Hainaut
 wrote:
> V2:
> * Rename object
> * Formatting improvement
> ---
>  src/mesa/main/pipelineobj.c |   77 
> +++
>  1 file changed, 77 insertions(+)
>
> diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
> index d81bd0e..ffbeeae 100644
> --- a/src/mesa/main/pipelineobj.c
> +++ b/src/mesa/main/pipelineobj.c
> @@ -231,6 +231,30 @@ _mesa_UseProgramStages(GLuint pipeline, GLbitfield 
> stages, GLuint program)
>  void GLAPIENTRY
>  _mesa_ActiveShaderProgram(GLuint pipeline, GLuint program)
>  {
> +   GET_CURRENT_CONTEXT(ctx);
> +   struct gl_shader_program *shProg = (program != 0)
> +  ? _mesa_lookup_shader_program_err(ctx, program, 
> "glActiveShaderProgram(program)")
> +  : NULL;
> +
> +   struct gl_pipeline_object *pipe = lookup_pipeline_object(ctx, pipeline);
> +
> +   if (!pipe) {
> +  _mesa_error(ctx, GL_INVALID_OPERATION, 
> "glActiveShaderProgram(pipeline)");
> +  return;
> +   }
> +
> +   /* Object is created by any Pipeline call but glGenProgramPipelines,
> +* glIsProgramPipeline and GetProgramPipelineInfoLog
> +*/
> +   pipe->EverBound = GL_TRUE;
> +
> +   if ((shProg != NULL) && !shProg->LinkStatus) {
> +  _mesa_error(ctx, GL_INVALID_OPERATION,
> +"glActiveShaderProgram(program %u not linked)", shProg->Name);
> +  return;
> +   }
> +
> +   _mesa_reference_shader_program(ctx, &pipe->ActiveProgram, shProg);
>  }
>
>  /**
> @@ -348,6 +372,59 @@ _mesa_IsProgramPipeline(GLuint pipeline)
>  void GLAPIENTRY
>  _mesa_GetProgramPipelineiv(GLuint pipeline, GLenum pname, GLint *params)
>  {
> +   GET_CURRENT_CONTEXT(ctx);
> +   struct gl_pipeline_object *pipe = lookup_pipeline_object(ctx, pipeline);
> +
> +   /* Are geometry shaders available in this context?
> +*/
> +   const bool has_gs = _mesa_is_desktop_gl(ctx) && 
> ctx->Extensions.ARB_geometry_shader4;

The geometry shaders in GL 3.2 aren't exactly the same as in
ARB_geometry_shader4, so it's conceivable that we could support 3.2
without ARB_geometry_shader4. Probably should be changed to

const bool has_gs = _mesa_is_desktop_gl(ctx) && (ctx->Version >= 32 ||
ctx->Extensions.ARB_geometry_shader4);

> +
> +   if (!pipe) {
> +  _mesa_error(ctx, GL_INVALID_OPERATION, 
> "glGetProgramPipelineiv(pipeline)");
> +  return;
> +   }
> +
> +   /* Object is created by any Pipeline call but glGenProgramPipelines,
> +* glIsProgramPipeline and GetProgramPipelineInfoLog
> +*/
> +   pipe->EverBound = GL_TRUE;
> +
> +   switch (pname) {
> +   case GL_ACTIVE_PROGRAM:
> +  *params = pipe->ActiveProgram ? pipe->ActiveProgram->Name : 0;
> +  return;
> +   case GL_INFO_LOG_LENGTH:
> +  // TODO
> +  *params = 0;
> +  return;
> +   case GL_VALIDATE_STATUS:
> +  *params = pipe->ValidationStatus;
> +  return;
> +   case GL_VERTEX_SHADER:
> +  *params = pipe->CurrentVertexProgram ? 
> pipe->CurrentVertexProgram->Name : 0;
> +  return;
> +   case GL_TESS_EVALUATION_SHADER:
> +  /* NOT YET SUPPORTED */
> +  break;
> +   case GL_TESS_CONTROL_SHADER:
> +  /* NOT YET SUPPORTED */
> +  break;
> +   case GL_GEOMETRY_SHADER:
> +  if (!has_gs) break;
> +  *params = pipe->CurrentGeometryProgram ? 
> pipe->CurrentGeometryProgram->Name : 0;;
> +  return;
> +   case GL_FRAGMENT_SHADER:
> +  *params = pipe->CurrentFragmentProgram ? 
> pipe->CurrentFragmentProgram->Name : 0;;

Double ; at the ends of these two lines.

> +  return;
> +   case GL_COMPUTE_SHADER:
> +  /* NOT YET SUPPORTED */
> +  break;

GL_COMPUTE_SHADER isn't valid, even in GL 4.3 where compute shaders
are part of core.

> +   default:
> +  break;
> +   }

The spec says
"""
Dependencies on ARB_geometry_shader4, EXT_geometry_shader4, NV_geometry_-
shader4, and/or OpenGL version 3.2

If none of ARB_geometry_shader4, EXT_geometry_shader4, NV_geometry_-
shader4, or OpenGL 3.2 are supported by the implementation, ignore
all references to geometry shaders and generate an INVALID_ENUM
error when UseProgramStages is called with GEOMETRY_SHADER_BIT
set in <stages>.
"""

Similar wording appears for tessellation. So I think that if a shader stage
isn't supported, this function should return INVALID_ENUM.
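
Roughly what I have in mind, as a stand-alone sketch rather than actual Mesa
code. Everything here (query_stage, struct caps, the STAGE_* and ERR_*
constants) is a made-up stand-in for illustration; the point is only that a
query for an unsupported stage falls through to the same INVALID_ENUM path
as an unknown pname and leaves the output untouched.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GL enums and error codes; not real GL values. */
enum stage { STAGE_VERTEX, STAGE_GEOMETRY, STAGE_FRAGMENT, STAGE_TESS_CONTROL };
enum error { ERR_NONE, ERR_INVALID_ENUM };

/* Hypothetical capability flag for the current context. */
struct caps {
   bool has_geometry;
};

/* Current program name per stage (0 means "none"). */
struct pipeline {
   unsigned vertex, geometry, fragment;
};

/* Write the program name for `stage` to *out, or report INVALID_ENUM
 * when the stage is not supported (leaving *out untouched).
 */
static enum error
query_stage(const struct caps *caps, const struct pipeline *pipe,
            enum stage stage, unsigned *out)
{
   switch (stage) {
   case STAGE_VERTEX:
      *out = pipe->vertex;
      return ERR_NONE;
   case STAGE_GEOMETRY:
      if (!caps->has_geometry)
         break;               /* unsupported: fall out to the error path */
      *out = pipe->geometry;
      return ERR_NONE;
   case STAGE_FRAGMENT:
      *out = pipe->fragment;
      return ERR_NONE;
   case STAGE_TESS_CONTROL:
      break;                  /* never supported in this sketch */
   }
   return ERR_INVALID_ENUM;
}
```

With has_geometry false, query_stage(..., STAGE_GEOMETRY, ...) reports
ERR_INVALID_ENUM instead of silently returning 0.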

> +
> +   _mesa_error(ctx, GL_INVALID_ENUM, "glGetProgramPipelineiv(pname=%s)",
> + _mesa_lookup_enum_by_nr(pname));
>  }
>
>  /**
> --


Re: [Mesa-dev] [PATCH 2/2] vbo: implement primitive merging for glBegin/End sequences

2013-05-03 Thread Jose Fonseca
Series looks alright AFAICT.
Jose

- Original Message -
> A surprising number of apps and benchmarks have poor code like this:
> 
> glBegin(GL_LINE_STRIP);
> glVertex(v1);
> glVertex(v2);
> glEnd();
> // Possibly some no-op state changes here
> glBegin(GL_LINE_STRIP);
> glVertex(v3);
> glVertex(v4);
> glEnd();
> // repeat many, many times.
> 
> The above sequence can be converted into:
> 
> glBegin(GL_LINES);
> glVertex(v1);
> glVertex(v2);
> glVertex(v3);
> glVertex(v4);
> glEnd();
> 
> Similarly for GL_POINTS, GL_TRIANGLES, etc.
> 
> Merging was already implemented for GL_QUADS in the display list code.
> Now other prim types are handled and it's also done for immediate mode.
> 
> In one case:
>  before   after
> ---
> number of st_draw_vbo() calls: 141  45
> number of _mesa_prims issued: 7520 632
> ---
>  src/mesa/vbo/vbo_exec_api.c |   30 ++
>  src/mesa/vbo/vbo_save_api.c |   18 --
>  2 files changed, 38 insertions(+), 10 deletions(-)
> 
> diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c
> index 88cce98..600398c 100644
> --- a/src/mesa/vbo/vbo_exec_api.c
> +++ b/src/mesa/vbo/vbo_exec_api.c
> @@ -719,6 +719,34 @@ static void GLAPIENTRY vbo_exec_Begin( GLenum mode )
>  
>  
>  /**
> + * Try to merge / concatenate the two most recent VBO primitives.
> + */
> +static void
> +try_vbo_merge(struct vbo_exec_context *exec)
> +{
> +   struct _mesa_prim *cur =  &exec->vtx.prim[exec->vtx.prim_count - 1];
> +
> +   assert(exec->vtx.prim_count >= 1);
> +
> +   vbo_try_prim_conversion(cur);
> +
> +   if (exec->vtx.prim_count >= 2) {
> +  struct _mesa_prim *prev = &exec->vtx.prim[exec->vtx.prim_count - 2];
> +  assert(prev == cur - 1);
> +
> +  if (vbo_can_merge_prims(prev, cur)) {
> + assert(cur->begin);
> + assert(cur->end);
> + assert(prev->begin);
> + assert(prev->end);
> + vbo_merge_prims(prev, cur);
> + exec->vtx.prim_count--;  /* drop the last primitive */
> +  }
> +   }
> +}
> +
> +
> +/**
>   * Called via glEnd.
>   */
>  static void GLAPIENTRY vbo_exec_End( void )
> @@ -744,6 +772,8 @@ static void GLAPIENTRY vbo_exec_End( void )
>  
>exec->vtx.prim[i].end = 1;
>exec->vtx.prim[i].count = idx - exec->vtx.prim[i].start;
> +
> +  try_vbo_merge(exec);
> }
>  
> ctx->Driver.CurrentExecPrimitive = PRIM_OUTSIDE_BEGIN_END;
> diff --git a/src/mesa/vbo/vbo_save_api.c b/src/mesa/vbo/vbo_save_api.c
> index b8dd90c..9ce3c6e 100644
> --- a/src/mesa/vbo/vbo_save_api.c
> +++ b/src/mesa/vbo/vbo_save_api.c
> @@ -305,9 +305,9 @@ _save_reset_counters(struct gl_context *ctx)
>   * previous prim.
>   */
>  static void
> -vbo_merge_prims(struct gl_context *ctx,
> -struct _mesa_prim *prim_list,
> -GLuint *prim_count)
> +merge_prims(struct gl_context *ctx,
> +struct _mesa_prim *prim_list,
> +GLuint *prim_count)
>  {
> GLuint i;
> struct _mesa_prim *prev_prim = prim_list;
> @@ -315,10 +315,9 @@ vbo_merge_prims(struct gl_context *ctx,
> for (i = 1; i < *prim_count; i++) {
>struct _mesa_prim *this_prim = prim_list + i;
>  
> -  if (this_prim->mode == prev_prim->mode &&
> -  this_prim->mode == GL_QUADS &&
> -  this_prim->count % 4 == 0 &&
> -  prev_prim->count % 4 == 0 &&
> +  vbo_try_prim_conversion(this_prim);
> +
> +  if (vbo_can_merge_prims(prev_prim, this_prim) &&
>this_prim->start == prev_prim->start + prev_prim->count &&
>this_prim->basevertex == prev_prim->basevertex &&
>this_prim->num_instances == prev_prim->num_instances &&
> @@ -326,8 +325,7 @@ vbo_merge_prims(struct gl_context *ctx,
>   /* We've found a prim that just extend the previous one.  Tack it
>* onto the previous one, and let this primitive struct get
>dropped.
>*/
> - prev_prim->count += this_prim->count;
> - prev_prim->end = this_prim->end;
> + vbo_merge_prims(prev_prim, this_prim);
>   continue;
>}
>  
> @@ -420,7 +418,7 @@ _save_compile_vertex_list(struct gl_context *ctx)
>  */
> save->copied.nr = _save_copy_vertices(ctx, node, save->buffer);
>  
> -   vbo_merge_prims(ctx, node->prim, &node->prim_count);
> +   merge_prims(ctx, node->prim, &node->prim_count);
>  
> /* Deal with GL_COMPILE_AND_EXECUTE:
>  */
> --
> 1.7.3.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
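
For readers who want the idea in isolation: the merge described in the quoted
commit message can be sketched as below. This is a simplified illustration
under stated assumptions, not the patch's code. The bodies of
vbo_try_prim_conversion, vbo_can_merge_prims and vbo_merge_prims are not part
of this diff, so struct prim, try_conversion, can_merge and merge_last are
hypothetical names, and the real helpers may check more fields (the diff also
compares basevertex and num_instances, for example).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for Mesa's struct _mesa_prim. */
enum mode { MODE_POINTS, MODE_LINES, MODE_LINE_STRIP, MODE_TRIANGLES };

struct prim {
   enum mode mode;
   unsigned start;   /* index of the first vertex */
   unsigned count;   /* number of vertices */
};

/* Vertices per independent primitive; 0 for connected types (strips). */
static unsigned
verts_per_prim(enum mode m)
{
   switch (m) {
   case MODE_POINTS:     return 1;
   case MODE_LINES:      return 2;
   case MODE_TRIANGLES:  return 3;
   case MODE_LINE_STRIP: break;
   }
   return 0;
}

/* A two-vertex line strip draws the same thing as one GL_LINES segment. */
static void
try_conversion(struct prim *p)
{
   if (p->mode == MODE_LINE_STRIP && p->count == 2)
      p->mode = MODE_LINES;
}

/* Two batches can be concatenated only if they use the same independent
 * primitive type, each holds whole primitives, and the second one starts
 * right where the first one ends.
 */
static bool
can_merge(const struct prim *a, const struct prim *b)
{
   unsigned n = verts_per_prim(a->mode);

   return a->mode == b->mode &&
          n != 0 &&
          a->count % n == 0 &&
          b->count % n == 0 &&
          b->start == a->start + a->count;
}

/* Called once per finished batch: fold the newest primitive into the
 * previous one when possible.  Returns the new primitive count.
 */
static unsigned
merge_last(struct prim *prims, unsigned n)
{
   assert(n >= 1);

   try_conversion(&prims[n - 1]);

   if (n >= 2 && can_merge(&prims[n - 2], &prims[n - 1])) {
      prims[n - 2].count += prims[n - 1].count;
      n--;   /* drop the last primitive */
   }
   return n;
}
```

Feeding it the two two-vertex line strips from the example above leaves a
single MODE_LINES primitive with four vertices; two three-vertex strips stay
separate because a strip is not an independent primitive type.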


[Mesa-dev] [PATCH 12/12] sso: Finally enable the extension on Gallium

2013-05-03 Thread Gregory Hainaut
Note: it probably works on other drivers.
---
 src/mesa/state_tracker/st_extensions.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index f986480..4ce74f2 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -519,6 +519,7 @@ void st_init_extensions(struct st_context *st)
ctx->Extensions.ARB_half_float_vertex = GL_TRUE;
ctx->Extensions.ARB_internalformat_query = GL_TRUE;
ctx->Extensions.ARB_map_buffer_range = GL_TRUE;
+   ctx->Extensions.ARB_separate_shader_objects = GL_TRUE;
ctx->Extensions.ARB_shader_objects = GL_TRUE;
ctx->Extensions.ARB_shading_language_100 = GL_TRUE;
ctx->Extensions.ARB_texture_border_clamp = GL_TRUE; /* XXX temp */
-- 
1.7.10.4



[Mesa-dev] [PATCH 11/12] sso: implement ValidateProgramPipeline and GetProgramPipelineInfoLog

2013-05-03 Thread Gregory Hainaut
Implementation note:
I don't use context for ralloc (don't know how).

The check on PROGRAM_SEPARABLE flags is also done when the pipeline isn't bound.
It doesn't make any sense in a DSA style API.

Maybe we could replace _mesa_validate_program_pipeline by
_mesa_validate_program_pipeline.
For example, we could recreate a dummy pipeline object. However, the new function
also checks the TEXTURE_IMAGE_UNIT number; I'm not sure of the impact.

V2:
Fix memory leak with ralloc_strdup
Formatting improvement
---
 src/mesa/main/context.c |9 ++
 src/mesa/main/mtypes.h  |2 +
 src/mesa/main/pipelineobj.c |  221 ++-
 src/mesa/main/pipelineobj.h |3 +
 src/mesa/main/uniform_query.cpp |   71 +
 src/mesa/main/uniforms.h|3 +
 6 files changed, 305 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index 6a0619a..559e21f 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -1767,6 +1767,7 @@ _mesa_check_blend_func_error(struct gl_context *ctx)
  * Prior to drawing anything with glBegin, glDrawArrays, etc. this function
  * is called to see if it's valid to render.  This involves checking that
  * the current shader is valid and the framebuffer is complete.
+ * It also check the current pipeline object is valid if any.
  * If an error is detected it'll be recorded here.
  * \return GL_TRUE if OK to render, GL_FALSE if not
  */
@@ -1876,6 +1877,14 @@ _mesa_valid_to_render(struct gl_context *ctx, const char 
*where)
   }
}
 
+   /* A pipeline object is bound */
+   if (ctx->_Shader->Name && !ctx->_Shader->Validated) {
+  /* Error message will be printed inside _mesa_validate_program_pipeline 
*/
+  if (!_mesa_validate_program_pipeline(ctx, ctx->_Shader, GL_TRUE)) {
+ return GL_FALSE;
+  }
+   }
+
if (ctx->DrawBuffer->_Status != GL_FRAMEBUFFER_COMPLETE_EXT) {
   _mesa_error(ctx, GL_INVALID_FRAMEBUFFER_OPERATION_EXT,
   "%s(incomplete framebuffer)", where);
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index adf518b..ce88860 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2424,6 +2424,8 @@ struct gl_pipeline_object
GLboolean Validated; /**< Pipeline Validation status */
 
GLboolean EverBound; /**< Has the pipeline object been 
created */
+
+   GLchar *InfoLog;
 };
 
 /**
diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
index 2d18192..d7948ff 100644
--- a/src/mesa/main/pipelineobj.c
+++ b/src/mesa/main/pipelineobj.c
@@ -87,6 +87,7 @@ _mesa_new_pipeline_object(struct gl_context *ctx, GLuint name)
   _glthread_INIT_MUTEX(obj->Mutex);
   obj->RefCount = 1;
   obj->Flags = _mesa_get_shader_flags();
+  obj->InfoLog = ralloc_strdup(obj, "");
}
 
return obj;
@@ -339,13 +340,15 @@ _mesa_UseProgramStages(GLuint pipeline, GLbitfield 
stages, GLuint program)
 *disabled (tessellation control and evaluation, geometry), or have
 *undefined results (core profile vertex and fragment).
 */
-
if (stages & GL_VERTEX_SHADER_BIT)
   _mesa_use_shader_program(ctx, GL_VERTEX_SHADER, shProg, pipe);
if (stages & GL_FRAGMENT_SHADER_BIT)
   _mesa_use_shader_program(ctx, GL_FRAGMENT_SHADER, shProg, pipe);
if (stages & GL_GEOMETRY_SHADER_BIT)
   _mesa_use_shader_program(ctx, GL_GEOMETRY_SHADER_ARB, shProg, pipe);
+
+   /* Validation would need to be redone */
+   pipe->Validated = GL_FALSE;
 }
 
 /**
@@ -573,11 +576,10 @@ _mesa_GetProgramPipelineiv(GLuint pipeline, GLenum pname, 
GLint *params)
   *params = pipe->ActiveProgram ? pipe->ActiveProgram->Name : 0;
   return;
case GL_INFO_LOG_LENGTH:
-  // TODO
-  *params = 0;
+  *params = pipe->Validated;
   return;
case GL_VALIDATE_STATUS:
-  *params = pipe->ValidationStatus;
+  *params = pipe->Validated;
   return;
case GL_VERTEX_SHADER:
   *params = pipe->CurrentVertexProgram ? pipe->CurrentVertexProgram->Name 
: 0;
@@ -606,16 +608,227 @@ _mesa_GetProgramPipelineiv(GLuint pipeline, GLenum 
pname, GLint *params)
  _mesa_lookup_enum_by_nr(pname));
 }
 
+static GLboolean
+ProgramEnabledEverywhere(struct gl_pipeline_object *pipe,
+ struct gl_shader_program *prog,
+ char *errMsg, size_t errMsgLength)
+{
+   if (!prog) return GL_TRUE;
+
+   GLboolean status = GL_TRUE;
+
+   if (prog->_LinkedShaders[MESA_SHADER_VERTEX]) {
+  if (pipe->CurrentVertexProgram) {
+ if (prog->Name != pipe->CurrentVertexProgram->Name) {
+status = GL_FALSE;
+ }
+  } else {
+ status = GL_FALSE;
+  }
+   }
+
+   if (prog->_LinkedShaders[MESA_SHADER_FRAGMENT]) {
+  if (pipe->CurrentFragmentProgram) {
+ if (prog->Name != pipe->CurrentFragmentProgram->Name) {
+status = GL_FALSE;
+ }
+  } else {
+  

[Mesa-dev] [PATCH 10/12] sso: update glGet: GL_PROGRAM_PIPELINE_BINDING

2013-05-03 Thread Gregory Hainaut
---
 src/mesa/main/get.c  |9 +
 src/mesa/main/get_hash_params.py |3 +++
 2 files changed, 12 insertions(+)

diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 54159c0..6cbb7db 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -369,6 +369,7 @@ EXTRA_EXT(ARB_map_buffer_alignment);
 EXTRA_EXT(ARB_texture_cube_map_array);
 EXTRA_EXT(ARB_texture_buffer_range);
 EXTRA_EXT(ARB_texture_multisample);
+EXTRA_EXT(ARB_separate_shader_objects);
 
 static const int
 extra_ARB_color_buffer_float_or_glcore[] = {
@@ -889,6 +890,14 @@ find_custom_value(struct gl_context *ctx, const struct 
value_desc *d, union valu
  _mesa_problem(ctx, "driver doesn't implement GetTimestamp");
   }
   break;
+   /* GL_ARB_separate_shader_objects */
+   case GL_PROGRAM_PIPELINE_BINDING:
+  if (ctx->Pipeline.Current) {
+ v->value_int = ctx->Pipeline.Current->Name;
+  } else {
+ v->value_int = 0;
+  }
+  break;
}
 }
 
diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py
index 2b97da6..43a11cf 100644
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -709,6 +709,9 @@ descriptor=[
 
 # GL_ARB_texture_cube_map_array
   [ "TEXTURE_BINDING_CUBE_MAP_ARRAY_ARB", "LOC_CUSTOM, TYPE_INT, 
TEXTURE_CUBE_ARRAY_INDEX, extra_ARB_texture_cube_map_array" ],
+
+# GL_ARB_separate_shader_objects
+  [ "PROGRAM_PIPELINE_BINDING", "LOC_CUSTOM, TYPE_INT, 
GL_PROGRAM_PIPELINE_BINDING, extra_ARB_separate_shader_objects" ],
 ]},
 
 # Enums restricted to OpenGL Core profile
-- 
1.7.10.4



[Mesa-dev] [PATCH 09/12] sso: implement BindProgramPipeline

2013-05-03 Thread Gregory Hainaut
These tests now pass in piglit:
The updated ext_transform_feedback-api-errors:useprogstage_noactive 
useprogstage_active bind_pipeline
arb_separate_shader_object-GetProgramPipelineiv
arb_separate_shader_object-IsProgramPipeline

For the moment I reuse Driver.UseProgram, but I guess it would be better to create
a UseProgramStages function. Opinions are welcome.

V2: formatting & rename
---
 src/mesa/main/pipelineobj.c |   60 +++
 1 file changed, 60 insertions(+)

diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
index 3cdca63..2d18192 100644
--- a/src/mesa/main/pipelineobj.c
+++ b/src/mesa/main/pipelineobj.c
@@ -35,6 +35,10 @@
  * DeletePipelineObject
  * 2/ We probably need a UseProgramStages driver function. It would avoir to
  * dirty all stages
+ * 3/ Maybe create a bind pipeline driver function to notify that all
+ * program was updated. Currently I call UseProgram with NULL because there 
isn't
+ * a single program
+ * 4/ When vertices need to be flushed (FLUSH_VERTICES)
  */
 
 #include "main/glheader.h"
@@ -382,6 +386,62 @@ _mesa_ActiveShaderProgram(GLuint pipeline, GLuint program)
 void GLAPIENTRY
 _mesa_BindProgramPipeline(GLuint pipeline)
 {
+   GET_CURRENT_CONTEXT(ctx);
+   struct gl_pipeline_object *newObj = NULL;
+
+   if (ctx->_Shader->Name == pipeline)
+  return;   /* rebinding the same pipeline object- no change */
+
+   /*
+*  An INVALID_OPERATION error is generated :
+*  by BindProgramPipeline if the current transform feedback object is 
active
+*  and not paused;
+*/
+   if (_mesa_is_xfb_active_and_unpaused(ctx)) {
+  _mesa_error(ctx, GL_INVALID_OPERATION,
+"glBindProgramPipeline(transform feedback active)");
+  return;
+   }
+
+   /*
+* Get pointer to new pipeline object (newObj)
+*/
+   if (pipeline) {
+  /* non-default pipeline object */
+  newObj = lookup_pipeline_object(ctx, pipeline);
+  if (!newObj) {
+ _mesa_error(ctx, GL_INVALID_OPERATION, "glBindProgramPipeline(non-gen 
name)");
+ return;
+  }
+
+  /* Object is created by any Pipeline call but glGenProgramPipelines,
+   * glIsProgramPipeline and GetProgramPipelineInfoLog
+   */
+  newObj->EverBound = GL_TRUE;
+   }
+
+   /* First bind the Pipeline to pipeline binding point */
+   _mesa_reference_pipeline_object(ctx, &ctx->Pipeline.Current, newObj);
+
+   /* Spec say:
+* if any program is bound to the context, the current pipeline object is
+* ignored.
+*/
+   if (&ctx->Shader != ctx->_Shader) {
+  if (pipeline) {
+ /* Bound the pipeline to the current program and
+  * restore the pipeline state
+  */
+ _mesa_reference_pipeline_object(ctx, &ctx->_Shader, newObj);
+  } else {
+ /* Unbind the pipeline */
+ _mesa_reference_pipeline_object(ctx, &ctx->_Shader, 
ctx->Pipeline.Default);
+  }
+  FLUSH_VERTICES(ctx, _NEW_PROGRAM | _NEW_PROGRAM_CONSTANTS);
+  /* FIXME */
+  if (ctx->Driver.UseProgram)
+ ctx->Driver.UseProgram(ctx, NULL);
+   }
 }
 
 /**
-- 
1.7.10.4



[Mesa-dev] [PATCH 08/12] sso: Implement _mesa_UseProgramStages

2013-05-03 Thread Gregory Hainaut
Implement _mesa_UseProgramStages =>
arb_separate_shader_object-GetProgramPipelineiv now passes :)

Extend use_shader_program to support a different target. This allows the function
to be reused to update the pipeline state. Note that I bypass the flush when the
target isn't current. Maybe it would be better to create a new UseProgramStages
driver function.

V2: formatting & rename
---
 src/mesa/main/pipelineobj.c |  111 +++
 src/mesa/main/shaderapi.c   |   32 +++--
 src/mesa/main/shaderapi.h   |3 +-
 3 files changed, 131 insertions(+), 15 deletions(-)

diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
index 5add307..3cdca63 100644
--- a/src/mesa/main/pipelineobj.c
+++ b/src/mesa/main/pipelineobj.c
@@ -33,6 +33,8 @@
  * XXX things to do:
  * 1/ Do we need to create 2 new drivers functions: CreatePipelineObject
  * DeletePipelineObject
+ * 2/ We probably need a UseProgramStages driver function. It would avoir to
+ * dirty all stages
  */
 
 #include "main/glheader.h"
@@ -231,6 +233,115 @@ _mesa_reference_pipeline_object_(struct gl_context *ctx,
 void GLAPIENTRY
 _mesa_UseProgramStages(GLuint pipeline, GLbitfield stages, GLuint program)
 {
+   GET_CURRENT_CONTEXT(ctx);
+
+   struct gl_pipeline_object *pipe = lookup_pipeline_object(ctx, pipeline);
+   struct gl_shader_program *shProg = NULL;
+
+   if (!pipe) {
+  _mesa_error(ctx, GL_INVALID_OPERATION, "glUseProgramStages(pipeline)");
+  return;
+   }
+
+   /* Object is created by any Pipeline call but glGenProgramPipelines,
+* glIsProgramPipeline and GetProgramPipelineInfoLog
+*/
+   pipe->EverBound = GL_TRUE;
+
+   /* NOT YET SUPPORTED:
+* GL_TESS_CONTROL_SHADER_BIT
+* GL_TESS_EVALUATION_SHADER_BIT
+* GL_COMPUTE_SHADER_BIT
+*/
+   GLbitfield any_valid_stages = GL_VERTEX_SHADER_BIT | GL_FRAGMENT_SHADER_BIT;
+   if (_mesa_is_desktop_gl(ctx) && ctx->Extensions.ARB_geometry_shader4)
+  any_valid_stages |= GL_GEOMETRY_SHADER_BIT;
+
+   if (stages != GL_ALL_SHADER_BITS && (stages  & ~any_valid_stages) != 0) {
+  _mesa_error(ctx, GL_INVALID_VALUE, "glUseProgramStages(Stages)");
+  return;
+   }
+
+   /*
+*  An INVALID_OPERATION error is generated :
+*  by UseProgramStages if the program pipeline object it refers to is 
current
+*  and the current transform feedback object is active and not paused;
+*/
+   /*
+* 6a. Should the fragment shader program object be allowed to changed
+* within transform feedback mode?
+* RESOLVED:  No, this should generate an GL_INVALID_OPERATION error.
+*/
+   if (ctx->_Shader == pipe) {
+  if (_mesa_is_xfb_active_and_unpaused(ctx)) {
+ _mesa_error(ctx, GL_INVALID_OPERATION,
+   "glUseProgramStages(transform feedback active)");
+ return;
+  }
+   }
+
+   if (program) {
+  /* An INVALID_OPERATION error is generated if program is the name of a
+   * shader object
+   */
+  struct gl_shader *sh = _mesa_lookup_shader(ctx, program);
+  if (sh != NULL) {
+ _mesa_error(ctx, GL_INVALID_OPERATION,
+   "glUseProgramStages(progam is a shader object)");
+ return;
+  }
+
+  /* An INVALID_VALUE error is generated if program is not the name of ei-
+   * ther a program or shader object
+   */
+  shProg = _mesa_lookup_shader_program(ctx, program);
+  if (shProg == NULL) {
+ _mesa_error(ctx, GL_INVALID_VALUE,
+   "glUseProgramStages(progam is not a program object)");
+ return;
+  }
+
+  /* An INVALID_OPERATION error is generated if the program object named
+   * by program was linked without the PROGRAM_SEPARABLE parameter set, has
+   * not been linked, or was last linked unsuccessfully. The corresponding 
shader
+   * stages in pipeline are not modified.
+   */
+  if (!shProg->LinkStatus) {
+ _mesa_error(ctx, GL_INVALID_OPERATION,
+   "glUseProgramStages(program not linked)");
+ return;
+  }
+  if (!shProg->SeparateShader) {
+ _mesa_error(ctx, GL_INVALID_OPERATION,
+   "glUseProgramStages(program wasn't linked with the 
PROGRAM_SEPARABLE flag)");
+ return;
+  }
+   }
+
+   /*
+*  7.  What happens if you have a program object current for a shader 
stage,
+*but the program object doesn't contain an executable for that stage?
+
+*RESOLVED:  This is not an error; instead it is as though there were no
+*program bound to that stage.  We have two different notions for
+*programs bound to shader stages.  A program is "current" for a stage
+*if it bound to that stage in the active program pipeline object.  A
+*program is "active" for a stage if it is current and it has an
+*executable for this stage.  In this case, the program would be current
+*but not active.
+
+*When no program is active for a stage, the stage will be replaced

[Mesa-dev] [PATCH 05/12] sso: replace Shader binding point with _Shader

2013-05-03 Thread Gregory Hainaut
To avoid NULL pointer checks, a default pipeline object is installed in _Shader
when no program is current.

The spec says that UseProgram/UseShaderProgramEXT/ActiveProgramEXT have a higher
priority than the pipeline object. When the default program is uninstalled, the
pipeline is used if any was bound.
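
The priority rule can be sketched in isolation as follows. This is only an
illustration under simplifying assumptions, not the code in this patch:
struct context, update_effective, use_program and bind_pipeline are
hypothetical names, `effective` plays the role of _Shader, and reference
counting is left out entirely.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for gl_pipeline_object. */
struct pipeline {
   unsigned name;
   unsigned program;   /* 0 means "no program" */
};

/* Hypothetical, simplified stand-in for the relevant gl_context fields. */
struct context {
   struct pipeline shader;             /* state set by UseProgram */
   struct pipeline *pipeline_current;  /* bound pipeline, may be NULL */
   struct pipeline pipeline_default;   /* empty object, always present */
   struct pipeline *effective;         /* plays the role of _Shader */
};

/* Re-derive the effective binding: a program set through UseProgram wins,
 * then a bound pipeline, then the default object, so that `effective`
 * is never NULL once this has run.
 */
static void
update_effective(struct context *ctx)
{
   if (ctx->shader.program != 0)
      ctx->effective = &ctx->shader;
   else if (ctx->pipeline_current != NULL)
      ctx->effective = ctx->pipeline_current;
   else
      ctx->effective = &ctx->pipeline_default;
}

/* Sketch of UseProgram: program 0 uninstalls the current program. */
static void
use_program(struct context *ctx, unsigned program)
{
   ctx->shader.program = program;
   update_effective(ctx);
}

/* Sketch of BindProgramPipeline: NULL unbinds the pipeline. */
static void
bind_pipeline(struct context *ctx, struct pipeline *pipe)
{
   ctx->pipeline_current = pipe;
   update_effective(ctx);
}
```

Callers can then dereference `effective` without a NULL check, which is the
whole reason for installing the default object.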

Note: A careful rename needs to be done now...

V2: formatting improvement
---
 src/mesa/main/mtypes.h  |5 +++
 src/mesa/main/pipelineobj.c |8 
 src/mesa/main/shaderapi.c   |   87 +--
 3 files changed, 97 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index f979cd0..adf518b 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2434,6 +2434,9 @@ struct gl_pipeline_shader_state
/** Currently bound pipeline object. See _mesa_BindProgramPipeline() */
struct gl_pipeline_object *Current;
 
+   /* Default Object to ensure that _Shader is never NULL */
+   struct gl_pipeline_object *Default;
+
/** Pipeline objects */
struct _mesa_HashTable *Objects;
 };
@@ -3546,6 +3549,8 @@ struct gl_context
 
struct gl_pipeline_shader_state Pipeline; /**< GLSL pipeline shader object 
state */
struct gl_pipeline_object Shader; /**< GLSL shader object state */
+   struct gl_pipeline_object *_Shader; /**< Points to ::Shader or 
::Pipeline.Current
+ or ::Pipeline.Default */
struct gl_shader_compiler_options ShaderCompilerOptions[MESA_SHADER_TYPES];
 
struct gl_query_state Query;  /**< occlusion, timer queries */
diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
index ffbeeae..5add307 100644
--- a/src/mesa/main/pipelineobj.c
+++ b/src/mesa/main/pipelineobj.c
@@ -95,6 +95,10 @@ _mesa_init_pipeline(struct gl_context *ctx)
ctx->Pipeline.Objects = _mesa_NewHashTable();
 
ctx->Pipeline.Current = NULL;
+
+   /* Install a default Pipeline */
+   ctx->Pipeline.Default = _mesa_new_pipeline_object(ctx, 0);
+   _mesa_reference_pipeline_object(ctx, &ctx->_Shader, ctx->Pipeline.Default);
 }
 
 
@@ -118,6 +122,10 @@ _mesa_free_pipeline_data(struct gl_context *ctx)
 {
_mesa_HashDeleteAll(ctx->Pipeline.Objects, delete_pipelineobj_cb, ctx);
_mesa_DeleteHashTable(ctx->Pipeline.Objects);
+
+   _mesa_reference_pipeline_object(ctx, &ctx->_Shader, NULL);
+   _mesa_delete_pipeline_object(ctx, ctx->Pipeline.Default);
+
 }
 
 /**
diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 46072ba..c25f49f 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -43,6 +43,7 @@
 #include "main/hash.h"
 #include "main/mfeatures.h"
 #include "main/mtypes.h"
+#include "main/pipelineobj.h"
 #include "main/shaderapi.h"
 #include "main/shaderobj.h"
 #include "main/transformfeedback.h"
@@ -138,6 +139,8 @@ _mesa_free_shader_state(struct gl_context *ctx)
_mesa_reference_shader_program(ctx, &ctx->Shader.ActiveProgram, NULL);
 
/* Extended for ARB_separate_shader_objects */
+   _mesa_reference_pipeline_object(ctx, &ctx->_Shader, NULL);
+
assert(ctx->Shader.RefCount == 1);
_glthread_DESTROY_MUTEX(ctx->Shader.Mutex);
 }
@@ -1455,7 +1458,29 @@ _mesa_UseProgram(GLhandleARB program)
   shProg = NULL;
}
 
-   _mesa_use_program(ctx, shProg);
+   /*
+* The executable code for an individual shader stage is taken from the
+* current program for that stage.  If there is a current program object
+* for any shader stage or for uniform updates established by UseProgram,
+* UseShaderProgramEXT, or ActiveProgramEXT, the current program for that
+* stage (if any) is considered current.  Otherwise, if there is a bound
+* program pipeline object ...
+*/
+   if (program) {
+  /* Attach shader state to the binding point */
+  _mesa_reference_pipeline_object(ctx, &ctx->_Shader, &ctx->Shader);
+  /* Update the program */
+  _mesa_use_program(ctx, shProg);
+   } else {
+  /* Must be done first: detach the progam */
+  _mesa_use_program(ctx, shProg);
+  /* Unattach shader_state binding point */
+  _mesa_reference_pipeline_object(ctx, &ctx->_Shader, 
ctx->Pipeline.Default);
+  /* If a pipeline was bound, rebind it */
+  if (ctx->Pipeline.Current) {
+ _mesa_BindProgramPipeline(ctx->Pipeline.Current->Name);
+  }
+   }
 }
 
 
@@ -1778,7 +1803,35 @@ _mesa_UseShaderProgramEXT(GLenum type, GLuint program)
   }
}
 
-   _mesa_use_shader_program(ctx, type, shProg);
+   /*
+* The executable code for an individual shader stage is taken from the
+* current program for that stage.  If there is a current program object
+* for any shader stage or for uniform updates established by UseProgram,
+* UseShaderProgramEXT, or ActiveProgramEXT, the current program for that
+* stage (if any) is considered current.  Otherwise, if there is a bound
+* program pipeline object ...
+*/
+   if (program) {
+  /* Attach shader state to the binding

[Mesa-dev] [PATCH 07/12] sso: update meta state

2013-05-03 Thread Gregory Hainaut
Save and restore the _Shader/Pipeline binding point. Rationale: we don't want any
conflict when the program is unattached.

V2: formatting improvement
---
 src/mesa/drivers/common/meta.c |   28 +---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index 927573d..01a63bd 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -51,6 +51,7 @@
 #include "main/macros.h"
 #include "main/matrix.h"
 #include "main/mipmap.h"
+#include "main/pipelineobj.h"
 #include "main/pixel.h"
 #include "main/pbo.h"
 #include "main/polygon.h"
@@ -142,6 +143,8 @@ struct save_state
struct gl_shader_program *GeometryShader;
struct gl_shader_program *FragmentShader;
struct gl_shader_program *ActiveShader;
+   struct gl_pipeline_object   *_Shader;
+   struct gl_pipeline_object   *Pipeline;
 
/** MESA_META_STENCIL_TEST */
struct gl_stencil_attrib Stencil;
@@ -615,6 +618,14 @@ _mesa_meta_begin(struct gl_context *ctx, GLbitfield state)
  _mesa_set_enable(ctx, GL_FRAGMENT_SHADER_ATI, GL_FALSE);
   }
 
+  if (ctx->Extensions.ARB_separate_shader_objects) {
+ /* Warning it must be done before _mesa_UseProgram call */
+ _mesa_reference_pipeline_object(ctx, &save->_Shader, ctx->_Shader);
+ _mesa_reference_pipeline_object(ctx, &save->Pipeline,
+ ctx->Pipeline.Current);
+ _mesa_BindProgramPipeline(0);
+  }
+
   if (ctx->Extensions.ARB_shader_objects) {
 _mesa_reference_shader_program(ctx, &save->VertexShader,
ctx->_Shader->CurrentVertexProgram);
@@ -954,16 +965,26 @@ _mesa_meta_end(struct gl_context *ctx)
   save->ATIFragmentShaderEnabled);
   }
 
+  /* Warning it must be done before _mesa_use_shader_program call */
+  if (ctx->Extensions.ARB_separate_shader_objects) {
+ _mesa_reference_pipeline_object(ctx, &ctx->_Shader, save->_Shader);
+ _mesa_reference_pipeline_object(ctx, &ctx->Pipeline.Current,
+ save->Pipeline);
+ _mesa_reference_pipeline_object(ctx, &save->Pipeline, NULL);
+  }
+
   if (ctx->Extensions.ARB_vertex_shader)
-_mesa_use_shader_program(ctx, GL_VERTEX_SHADER, save->VertexShader);
+_mesa_use_shader_program(ctx, GL_VERTEX_SHADER, save->VertexShader,
+  ctx->_Shader);
 
   if (ctx->Extensions.ARB_geometry_shader4)
 _mesa_use_shader_program(ctx, GL_GEOMETRY_SHADER_ARB,
- save->GeometryShader);
+ save->GeometryShader, ctx->_Shader);
 
   if (ctx->Extensions.ARB_fragment_shader)
 _mesa_use_shader_program(ctx, GL_FRAGMENT_SHADER,
- save->FragmentShader);
+ save->FragmentShader, ctx->_Shader);
+
 
   _mesa_reference_shader_program(ctx, &ctx->_Shader->ActiveProgram,
 save->ActiveShader);
@@ -972,6 +993,7 @@ _mesa_meta_end(struct gl_context *ctx)
   _mesa_reference_shader_program(ctx, &save->GeometryShader, NULL);
   _mesa_reference_shader_program(ctx, &save->FragmentShader, NULL);
   _mesa_reference_shader_program(ctx, &save->ActiveShader, NULL);
+  _mesa_reference_pipeline_object(ctx, &save->_Shader, NULL);
}
 
if (state & MESA_META_STENCIL_TEST) {
-- 
1.7.10.4



[Mesa-dev] [PATCH 06/12] sso: rename Shader to the pointer _Shader

2013-05-03 Thread Gregory Hainaut
Basically a sed, except for shaderapi.c and get.c.
get.c => GL_CURRENT_PROGRAM always refers to the "old" UseProgram behavior
shaderapi.c => the old API still updates the Shader object directly

V2: formatting improvement
---
 src/mesa/drivers/common/meta.c   |   10 ++--
 src/mesa/drivers/dri/i965/brw_gs.c   |2 +-
 src/mesa/drivers/dri/i965/brw_shader.cpp |4 +-
 src/mesa/drivers/dri/i965/brw_vs.c   |4 +-
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c |2 +-
 src/mesa/drivers/dri/i965/brw_wm.c   |2 +-
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c |2 +-
 src/mesa/drivers/dri/i965/gen6_sol.c |6 +-
 src/mesa/drivers/dri/i965/gen6_vs_state.c|2 +-
 src/mesa/drivers/dri/i965/gen6_wm_state.c|2 +-
 src/mesa/drivers/dri/i965/gen7_sol_state.c   |4 +-
 src/mesa/drivers/dri/i965/gen7_vs_state.c|2 +-
 src/mesa/drivers/dri/i965/gen7_wm_state.c|2 +-
 src/mesa/main/api_validate.c |2 +-
 src/mesa/main/context.c  |   32 +--
 src/mesa/main/ff_fragment_shader.cpp |8 +--
 src/mesa/main/get.c  |   10 
 src/mesa/main/shaderapi.c|   26 -
 src/mesa/main/state.c|   14 ++---
 src/mesa/main/texstate.c |   12 ++--
 src/mesa/main/transformfeedback.c|4 +-
 src/mesa/main/uniform_query.cpp  |4 +-
 src/mesa/main/uniforms.c |   66 +++---
 src/mesa/program/ir_to_mesa.cpp  |   12 ++--
 src/mesa/state_tracker/st_atom_clip.c|2 +-
 src/mesa/state_tracker/st_atom_constbuf.c|4 +-
 src/mesa/state_tracker/st_cb_drawpixels.c|2 +-
 src/mesa/state_tracker/st_draw.c |6 +-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp   |2 +-
 src/mesa/state_tracker/st_program.c  |6 +-
 src/mesa/swrast/s_fragprog.c |2 +-
 31 files changed, 134 insertions(+), 124 deletions(-)

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index e3ab82b..927573d 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -617,13 +617,13 @@ _mesa_meta_begin(struct gl_context *ctx, GLbitfield state)
 
   if (ctx->Extensions.ARB_shader_objects) {
 _mesa_reference_shader_program(ctx, &save->VertexShader,
-   ctx->Shader.CurrentVertexProgram);
+   ctx->_Shader->CurrentVertexProgram);
 _mesa_reference_shader_program(ctx, &save->GeometryShader,
-   ctx->Shader.CurrentGeometryProgram);
+   ctx->_Shader->CurrentGeometryProgram);
 _mesa_reference_shader_program(ctx, &save->FragmentShader,
-   ctx->Shader.CurrentFragmentProgram);
+   ctx->_Shader->CurrentFragmentProgram);
 _mesa_reference_shader_program(ctx, &save->ActiveShader,
-   ctx->Shader.ActiveProgram);
+   ctx->_Shader->ActiveProgram);
 
  _mesa_UseProgram(0);
   }
@@ -965,7 +965,7 @@ _mesa_meta_end(struct gl_context *ctx)
 _mesa_use_shader_program(ctx, GL_FRAGMENT_SHADER,
  save->FragmentShader);
 
-  _mesa_reference_shader_program(ctx, &ctx->Shader.ActiveProgram,
+  _mesa_reference_shader_program(ctx, &ctx->_Shader->ActiveProgram,
 save->ActiveShader);
 
   _mesa_reference_shader_program(ctx, &save->VertexShader, NULL);
diff --git a/src/mesa/drivers/dri/i965/brw_gs.c b/src/mesa/drivers/dri/i965/brw_gs.c
index caa3b3e..31b01df 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -189,7 +189,7 @@ static void populate_key( struct brw_context *brw,
   /* _NEW_TRANSFORM_FEEDBACK */
   if (_mesa_is_xfb_active_and_unpaused(ctx)) {
  const struct gl_shader_program *shaderprog =
-ctx->Shader.CurrentVertexProgram;
+ctx->_Shader->CurrentVertexProgram;
  const struct gl_transform_feedback_info *linked_xfb_info =
 &shaderprog->LinkedTransformFeedback;
  int i;
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index b3bd1b9..7259fde 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -257,7 +257,7 @@ brw_link_shader(struct gl_context *ctx, struct gl_shader_program *shProg)
 
   _mesa_reference_program(ctx, &prog, NULL);
 
-  if (ctx->Shader.Flags & GLSL_DUMP) {
+  if (ctx->_Shader->Flags & GLSL_DUMP) {
  printf("\n");
  pri

[Mesa-dev] [PATCH 04/12] sso: implement ActiveShaderProgram & GetProgramPipelineiv

2013-05-03 Thread Gregory Hainaut
V2:
* Rename object
* Formatting improvement
---
 src/mesa/main/pipelineobj.c |   77 +++
 1 file changed, 77 insertions(+)

diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
index d81bd0e..ffbeeae 100644
--- a/src/mesa/main/pipelineobj.c
+++ b/src/mesa/main/pipelineobj.c
@@ -231,6 +231,30 @@ _mesa_UseProgramStages(GLuint pipeline, GLbitfield stages, GLuint program)
 void GLAPIENTRY
 _mesa_ActiveShaderProgram(GLuint pipeline, GLuint program)
 {
+   GET_CURRENT_CONTEXT(ctx);
+   struct gl_shader_program *shProg = (program != 0)
+  ? _mesa_lookup_shader_program_err(ctx, program, "glActiveShaderProgram(program)")
+  : NULL;
+
+   struct gl_pipeline_object *pipe = lookup_pipeline_object(ctx, pipeline);
+
+   if (!pipe) {
+  _mesa_error(ctx, GL_INVALID_OPERATION, "glActiveShaderProgram(pipeline)");
+  return;
+   }
+
+   /* Object is created by any Pipeline call but glGenProgramPipelines,
+* glIsProgramPipeline and GetProgramPipelineInfoLog
+*/
+   pipe->EverBound = GL_TRUE;
+
+   if ((shProg != NULL) && !shProg->LinkStatus) {
+  _mesa_error(ctx, GL_INVALID_OPERATION,
+"glActiveShaderProgram(program %u not linked)", shProg->Name);
+  return;
+   }
+
+   _mesa_reference_shader_program(ctx, &pipe->ActiveProgram, shProg);
 }
 
 /**
@@ -348,6 +372,59 @@ _mesa_IsProgramPipeline(GLuint pipeline)
 void GLAPIENTRY
 _mesa_GetProgramPipelineiv(GLuint pipeline, GLenum pname, GLint *params)
 {
+   GET_CURRENT_CONTEXT(ctx);
+   struct gl_pipeline_object *pipe = lookup_pipeline_object(ctx, pipeline);
+
+   /* Are geometry shaders available in this context?
+*/
+   const bool has_gs = _mesa_is_desktop_gl(ctx) && ctx->Extensions.ARB_geometry_shader4;
+
+   if (!pipe) {
+  _mesa_error(ctx, GL_INVALID_OPERATION, "glGetProgramPipelineiv(pipeline)");
+  return;
+   }
+
+   /* Object is created by any Pipeline call but glGenProgramPipelines,
+* glIsProgramPipeline and GetProgramPipelineInfoLog
+*/
+   pipe->EverBound = GL_TRUE;
+
+   switch (pname) {
+   case GL_ACTIVE_PROGRAM:
+  *params = pipe->ActiveProgram ? pipe->ActiveProgram->Name : 0;
+  return;
+   case GL_INFO_LOG_LENGTH:
+  // TODO
+  *params = 0;
+  return;
+   case GL_VALIDATE_STATUS:
+  *params = pipe->ValidationStatus;
+  return;
+   case GL_VERTEX_SHADER:
+  *params = pipe->CurrentVertexProgram ? pipe->CurrentVertexProgram->Name : 0;
+  return;
+   case GL_TESS_EVALUATION_SHADER:
+  /* NOT YET SUPPORTED */
+  break;
+   case GL_TESS_CONTROL_SHADER:
+  /* NOT YET SUPPORTED */
+  break;
+   case GL_GEOMETRY_SHADER:
+  if (!has_gs) break;
+  *params = pipe->CurrentGeometryProgram ? pipe->CurrentGeometryProgram->Name : 0;
+  return;
+   case GL_FRAGMENT_SHADER:
+  *params = pipe->CurrentFragmentProgram ? pipe->CurrentFragmentProgram->Name : 0;
+  return;
+   case GL_COMPUTE_SHADER:
+  /* NOT YET SUPPORTED */
+  break;
+   default:
+  break;
+   }
+
+   _mesa_error(ctx, GL_INVALID_ENUM, "glGetProgramPipelineiv(pname=%s)",
+ _mesa_lookup_enum_by_nr(pname));
 }
 
 /**
-- 
1.7.10.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 03/12] sso: add support of GL_PROGRAM_SEPARABLE and CreateShaderProgramv

2013-05-03 Thread Gregory Hainaut
V1:
CreateShaderProgramv is similar to CreateShaderProgramEXT. The 2 differences are:
1/ it takes an array of strings
2/ it supports the GL_PROGRAM_SEPARABLE (aka SeparateShader) flag

V2: Formatting improvement
---
 src/mesa/main/mtypes.h|5 +++
 src/mesa/main/shaderapi.c |   94 +++--
 src/mesa/main/shaderapi.h |3 +-
 3 files changed, 72 insertions(+), 30 deletions(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 4487068..f979cd0 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2255,6 +2255,11 @@ struct gl_shader_program
 * glClear()).
 */
GLboolean InternalSeparateShader;
+   /* ARB_separate_shader_objects
+* indicates whether program can be bound for individual pipeline stages using
+* UseProgramStages after it is next linked.
+*/
+   GLboolean SeparateShader;
 
GLuint NumShaders;  /**< number of attached shaders */
struct gl_shader **Shaders; /**< List of attached the shaders */
diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 774163d..46072ba 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -622,6 +622,11 @@ get_programiv(struct gl_context *ctx, GLuint program, GLenum pname, GLint *param
case GL_PROGRAM_BINARY_LENGTH:
   *params = 0;
   return;
+   case GL_PROGRAM_SEPARABLE:
+  if (!ctx->Extensions.ARB_separate_shader_objects)
+ break;
+  *params = shProg->SeparateShader;
+  return;
default:
   break;
}
@@ -1702,6 +1707,25 @@ _mesa_ProgramParameteri(GLuint program, GLenum pname, GLint value)
*/
   shProg->BinaryRetreivableHint = value;
   return;
+
+   case GL_PROGRAM_SEPARABLE:
+  if (!ctx->Extensions.ARB_separate_shader_objects)
+ break;
+
+  /* Spec imply that the behavior is the same as ARB_get_program_binary
+   * Chapter 7.3 Program Objects
+   */
+  if (value != GL_TRUE && value != GL_FALSE) {
+ _mesa_error(ctx, GL_INVALID_VALUE,
+ "glProgramParameteri(pname=%s, value=%d): "
+ "value must be 0 or 1.",
+ _mesa_lookup_enum_by_nr(pname),
+ value);
+ return;
+  }
+  shProg->SeparateShader = value;
+  return;
+
default:
   break;
}
@@ -1773,59 +1797,71 @@ _mesa_ActiveProgramEXT(GLuint program)
return;
 }
 
-
-/**
- * For GL_EXT_separate_shader_objects
- */
-GLuint GLAPIENTRY
-_mesa_CreateShaderProgramEXT(GLenum type, const GLchar *string)
+static GLuint
+_mesa_create_shader_program(struct gl_context* ctx, GLboolean separate,
+GLenum type, GLsizei count, const GLchar* const *strings)
 {
-   GET_CURRENT_CONTEXT(ctx);
const GLuint shader = create_shader(ctx, type);
GLuint program = 0;
 
if (shader) {
-  shader_source(ctx, shader, _mesa_strdup(string));
+  _mesa_ShaderSource(shader, count, strings, NULL);
+
   compile_shader(ctx, shader);
 
   program = create_shader_program(ctx);
   if (program) {
-struct gl_shader_program *shProg;
-struct gl_shader *sh;
-GLint compiled = GL_FALSE;
+ struct gl_shader_program *shProg;
+ struct gl_shader *sh;
+ GLint compiled = GL_FALSE;
 
-shProg = _mesa_lookup_shader_program(ctx, program);
-sh = _mesa_lookup_shader(ctx, shader);
+ shProg = _mesa_lookup_shader_program(ctx, program);
+ sh = _mesa_lookup_shader(ctx, shader);
 
-get_shaderiv(ctx, shader, GL_COMPILE_STATUS, &compiled);
-if (compiled) {
-   attach_shader(ctx, program, shader);
-   link_program(ctx, program);
-   detach_shader(ctx, program, shader);
+ shProg->SeparateShader = separate;
 
+ get_shaderiv(ctx, shader, GL_COMPILE_STATUS, &compiled);
+ if (compiled) {
+attach_shader(ctx, program, shader);
+link_program(ctx, program);
+detach_shader(ctx, program, shader);
 #if 0
-   /* Possibly... */
-   if (active-user-defined-varyings-in-linked-program) {
-  append-error-to-info-log;
-  shProg->LinkStatus = GL_FALSE;
-   }
+/* Possibly... */
+if (active-user-defined-varyings-in-linked-program) {
+   append-error-to-info-log;
+   shProg->LinkStatus = GL_FALSE;
+}
 #endif
-}
+ }
 
-ralloc_strcat(&shProg->InfoLog, sh->InfoLog);
+ ralloc_strcat(&shProg->InfoLog, sh->InfoLog);
   }
-
   delete_shader(ctx, shader);
}
-
return program;
 }
 
 /**
+ * For GL_EXT_separate_shader_objects
+ */
+GLuint GLAPIENTRY
+_mesa_CreateShaderProgramEXT(GLenum type, const GLchar *string)
+{
+   GET_CURRENT_CONTEXT(ctx);
+
+   return _mesa_create_shader_program(ctx, GL_FALSE, type, 1, &string);
+}
+
+/**
  * ARB_separate_shader_objects: Compile & Link Program
+ * Bas

[Mesa-dev] [PATCH 02/12] sso: Add pipeline container/state

2013-05-03 Thread Gregory Hainaut
V1:
* Extend gl_shader_state as pipeline object state
* Add a new container gl_pipeline_shader_state that contains
  binding point of the previous object
* Update mesa init/free shader state due to the extension of
  the attribute
* Add an init/free pipeline function for the context
* Implement GenProgramPipeline/DeleteProgramPipeline/IsProgramPipeline.
 I based my work on the VAO code.

V2:
* Rename gl_shader_state to gl_pipeline_object
* Rename Pipeline.PipelineObj to Pipeline.Current
* Rename ValidationStatus to Validated
* Formatting improvement
---
 src/mesa/main/context.c |3 +
 src/mesa/main/mtypes.h  |   30 +-
 src/mesa/main/pipelineobj.c |  234 ++-
 src/mesa/main/pipelineobj.h |   25 +
 src/mesa/main/shaderapi.c   |   14 ++-
 src/mesa/main/shaderapi.h   |3 +
 6 files changed, 303 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index 0539934..d089827 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -106,6 +106,7 @@
 #include "macros.h"
 #include "matrix.h"
 #include "multisample.h"
+#include "pipelineobj.h"
 #include "pixel.h"
 #include "pixelstore.h"
 #include "points.h"
@@ -762,6 +763,7 @@ init_attrib_groups(struct gl_context *ctx)
_mesa_init_lighting( ctx );
_mesa_init_matrix( ctx );
_mesa_init_multisample( ctx );
+   _mesa_init_pipeline( ctx );
_mesa_init_pixel( ctx );
_mesa_init_pixelstore( ctx );
_mesa_init_point( ctx );
@@ -1167,6 +1169,7 @@ _mesa_free_context_data( struct gl_context *ctx )
_mesa_free_texture_data( ctx );
_mesa_free_matrix_data( ctx );
_mesa_free_viewport_data( ctx );
+   _mesa_free_pipeline_data(ctx);
_mesa_free_program_data(ctx);
_mesa_free_shader_state(ctx);
_mesa_free_queryobj_data(ctx);
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 05d8518..4487068 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2381,9 +2381,19 @@ struct gl_shader_program
 
 /**
  * Context state for GLSL vertex/fragment shaders.
+ * Extended to support pipeline object
  */
-struct gl_shader_state
+struct gl_pipeline_object
 {
+   /** Name of the pipeline object as received from glGenProgramPipelines.
+* It would be 0 for shaders without separate shader objects.
+*/
+   GLuint Name;
+
+   GLint RefCount;
+
+   _glthread_Mutex Mutex;
+
/**
 * Programs used for rendering
 *
@@ -2405,8 +2415,23 @@ struct gl_shader_state
struct gl_shader_program *ActiveProgram;
 
GLbitfield Flags;/**< Mask of GLSL_x flags */
+
+   GLboolean Validated; /**< Pipeline Validation status */
+
+   GLboolean EverBound; /**< Has the pipeline object been created */
 };
 
+/**
+ * Context state for GLSL pipeline shaders.
+ */
+struct gl_pipeline_shader_state
+{
+   /** Currently bound pipeline object. See _mesa_BindProgramPipeline() */
+   struct gl_pipeline_object *Current;
+
+   /** Pipeline objects */
+   struct _mesa_HashTable *Objects;
+};
 
 /**
  * Compiler options for a single GLSL shaders type
@@ -3514,7 +3539,8 @@ struct gl_context
struct gl_geometry_program_state GeometryProgram;
struct gl_ati_fragment_shader_state ATIFragmentShader;
 
-   struct gl_shader_state Shader; /**< GLSL shader object state */
+   struct gl_pipeline_shader_state Pipeline; /**< GLSL pipeline shader object state */
+   struct gl_pipeline_object Shader; /**< GLSL shader object state */
struct gl_shader_compiler_options ShaderCompilerOptions[MESA_SHADER_TYPES];
 
struct gl_query_state Query;  /**< occlusion, timer queries */
diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
index e7e628b..d81bd0e 100644
--- a/src/mesa/main/pipelineobj.c
+++ b/src/mesa/main/pipelineobj.c
@@ -30,6 +30,9 @@
  * Implementation of pipeline object related API functions. Based on
  * GL_ARB_separate_shader_objects extension.
  *
+ * XXX things to do:
+ * 1/ Do we need to create 2 new drivers functions: CreatePipelineObject
+ * DeletePipelineObject
  */
 
 #include "main/glheader.h"
@@ -51,6 +54,168 @@
 #include "../glsl/glsl_parser_extras.h"
 #include "../glsl/ir_uniform.h"
 
+/**
+ * Delete a pipeline object.
+ */
+void
+_mesa_delete_pipeline_object(struct gl_context *ctx, struct gl_pipeline_object *obj)
+{
+   _mesa_reference_shader_program(ctx, &obj->_CurrentFragmentProgram, NULL);
+   _mesa_reference_shader_program(ctx, &obj->CurrentFragmentProgram, NULL);
+   _mesa_reference_shader_program(ctx, &obj->CurrentVertexProgram, NULL);
+   _mesa_reference_shader_program(ctx, &obj->CurrentGeometryProgram, NULL);
+   _mesa_reference_shader_program(ctx, &obj->ActiveProgram, NULL);
+   _glthread_DESTROY_MUTEX(obj->Mutex);
+   ralloc_free(obj);
+}
+
+/**
+ * Allocate and initialize a new pipeline object.
+ */
+static struct gl_pipeline_object *
+_mesa_new_pipeline_object(struct gl_context *ctx, GLuint name)
+{
+   struct gl_pipeline_object *obj = rz

[Mesa-dev] [PATCH 00/12] RFC: add support of ARB_separate_shader_object extensions V3

2013-05-03 Thread Gregory Hainaut
Please find an implementation of the ARB_separate_shader_objects extension. I
concentrated mostly on the state part of the extension, aka the pipeline object.
I think GLSL already compiles programs separately anyway.

I tested my implementation with the tests that I sent to the piglit mailing list.
All tests pass, but a test for the new uniform functions is still missing. Besides,
some parts are still unimplemented:
1/ GLX protocol: dropped
2/ Display lists: will be done later if someone is interested, otherwise dropped

Things that bother me:
1/ I implemented the feature as pure Mesa state. I don't know if there is any
   rule about when to create driver functions. Maybe it would be better to add
   CreatePipelineObject/DeletePipelineObject/BindPipeline/UseProgramStages.
   Opinions are welcome.

A GitHub branch can be found here (based on v2, but v3 only adds comments):
* https://github.com/gregory38/mesa-sso/tree/sso2

V2: 
* fix ralloc memory leak
* follow mesa formatting rules
* rename Pipeline.PipelineObj to Pipeline.Current
* rename gl_shader_state to gl_pipeline_object
* rebase on current mesa

V3:
* properly resend the patches to the mesa mailing list
* Add minor comments


Note: Rebase done on ac1118d53c0b22db8dcd6fcdcd2d1a245037dbc1

Gregory Hainaut (12):
  sso: Create extensions entry points
  sso: Add pipeline container/state
  sso: add support of GL_PROGRAM_SEPARABLE and CreateShaderProgramv
  sso: implement ActiveShaderProgram & GetProgramPipelineiv
  sso: replace Shader binding point with _Shader
  sso: rename Shader to the pointer _Shader
  sso: update meta state
  sso: Implement _mesa_UseProgramStages
  sso: implement BindProgramPipeline
  sso: update glGet: GL_PROGRAM_PIPELINE_BINDING
  sso: implement ValidateProgramPipeline and GetProgramPipelineInfoLog
  sso: Finally enable the extension on Gallium

 src/mapi/glapi/gen/ARB_separate_shader_objects.xml |  401 ++
 src/mapi/glapi/gen/Makefile.am |1 +
 src/mapi/glapi/gen/gl_API.xml  |6 +-
 src/mapi/glapi/gen/gl_genexec.py   |1 +
 src/mesa/drivers/common/meta.c |   38 +-
 src/mesa/drivers/dri/i965/brw_gs.c |2 +-
 src/mesa/drivers/dri/i965/brw_shader.cpp   |4 +-
 src/mesa/drivers/dri/i965/brw_vs.c |4 +-
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c   |2 +-
 src/mesa/drivers/dri/i965/brw_wm.c |2 +-
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c   |2 +-
 src/mesa/drivers/dri/i965/gen6_sol.c   |6 +-
 src/mesa/drivers/dri/i965/gen6_vs_state.c  |2 +-
 src/mesa/drivers/dri/i965/gen6_wm_state.c  |2 +-
 src/mesa/drivers/dri/i965/gen7_sol_state.c |4 +-
 src/mesa/drivers/dri/i965/gen7_vs_state.c  |2 +-
 src/mesa/drivers/dri/i965/gen7_wm_state.c  |2 +-
 src/mesa/main/api_validate.c   |2 +-
 src/mesa/main/context.c|   44 +-
 src/mesa/main/extensions.c |1 +
 src/mesa/main/ff_fragment_shader.cpp   |8 +-
 src/mesa/main/get.c|   19 +
 src/mesa/main/get_hash_params.py   |3 +
 src/mesa/main/mtypes.h |   43 +-
 src/mesa/main/pipelineobj.c|  834 
 src/mesa/main/pipelineobj.h|  102 +++
 src/mesa/main/shaderapi.c  |  252 --
 src/mesa/main/shaderapi.h  |   10 +-
 src/mesa/main/state.c  |   14 +-
 src/mesa/main/texstate.c   |   12 +-
 src/mesa/main/transformfeedback.c  |4 +-
 src/mesa/main/uniform_query.cpp|   75 +-
 src/mesa/main/uniforms.c   |  466 ++-
 src/mesa/main/uniforms.h   |   86 ++
 src/mesa/program/ir_to_mesa.cpp|   12 +-
 src/mesa/sources.mak   |1 +
 src/mesa/state_tracker/st_atom_clip.c  |2 +-
 src/mesa/state_tracker/st_atom_constbuf.c  |4 +-
 src/mesa/state_tracker/st_cb_drawpixels.c  |2 +-
 src/mesa/state_tracker/st_draw.c   |6 +-
 src/mesa/state_tracker/st_extensions.c |1 +
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp |2 +-
 src/mesa/state_tracker/st_program.c|6 +-
 src/mesa/swrast/s_fragprog.c   |2 +-
 44 files changed, 2319 insertions(+), 175 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_separate_shader_objects.xml
 create mode 100644 src/mesa/main/pipelineobj.c
 create mode 100644 src/mesa/main/pipelineobj.h

-- 
1.7.10.4



Re: [Mesa-dev] [PATCH] st/xvmc/tests: Fix build failure, v2

2013-05-03 Thread Andreas Boll
Pushed as
http://cgit.freedesktop.org/mesa/mesa/commit/?id=5ff81cfd8640d02dc78d736cad5020d54ef7a0dc


2013/5/3 Matt Turner 

> On Fri, May 3, 2013 at 1:48 AM, Lauri Kasanen  wrote:
> > v2: Removed extra libs as requested by Matt Turner.
> >
> > Signed-off-by: Lauri Kasanen 
> > ---
> >  src/gallium/state_trackers/xvmc/Makefile.am |2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/src/gallium/state_trackers/xvmc/Makefile.am b/src/gallium/state_trackers/xvmc/Makefile.am
> > index 8ac7406..184432e 100644
> > --- a/src/gallium/state_trackers/xvmc/Makefile.am
> > +++ b/src/gallium/state_trackers/xvmc/Makefile.am
> > @@ -44,7 +44,7 @@ check_PROGRAMS = \
> >  TESTS = $(check_PROGRAMS)
> >  noinst_PROGRAMS = tests/xvmc_bench
> >
> > -TEST_LIBS = -lXvMCW -lXvMC -lXv -lX11
> > +TEST_LIBS = $(XVMC_LIBS) -lXvMCW
> >  tests_test_context_SOURCES = tests/test_context.c tests/testlib.c
> >  tests_test_context_LDADD = $(TEST_LIBS)
> >  tests_test_surface_SOURCES = tests/test_surface.c tests/testlib.c
> > --
> > 1.7.2.1
> >
>
> Reviewed-by: Matt Turner 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH] r600g: Correctly initialize the shader key, v2

2013-05-03 Thread Andreas Boll
Pushed as
http://cgit.freedesktop.org/mesa/mesa/commit/?id=e495d88453076a4fe552502d66d6a3869ab70e4a


2013/5/3 Marek Olšák 

> As long as the commit message contains a note matching this regular
> expression '^[[:space:]]*NOTE: .*[Cc]andidate', it won't be missed.
> Usually the release manager takes care of cherry-picking if there are
> no conflicts.
>
> Marek
>
> On Fri, May 3, 2013 at 5:39 PM, Lauri Kasanen  wrote:
> > On Fri, 3 May 2013 10:54:09 -0400
> > Alex Deucher  wrote:
> >
> >> On Fri, May 3, 2013 at 9:55 AM, Lauri Kasanen  wrote:
> >> > This patch fixes constant shader recompiles by initializing the struct
> >> > to zero. For completeness, memcpy is used to copy the key to the shader
> >> > struct.
> >> >
> >>
> >> Candidate for the stable branch?
> >>
> >> Alex
> >
> > Yes, please apply to all stable branches where it applies cleanly.
> >
> > - Lauri
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] visual leak?

2013-05-03 Thread Brian Paul

On 05/03/2013 09:42 AM, Andy Skinner wrote:
> Hi, Brian.  Did you have thoughts on these?
>
> Would it make sense, if we can't update Mesa right now, to just put the
> register_with_display() call into Fake_glXChooseFBConfig(), as it is for
> Fake_glXChooseVisual()?  It allows the visual to be cleaned up when the Display
> is closed.  Unless it introduces something else we'd have to chase down, we
> think we'd have to do a smaller job to make that change (for now) than to go to
> more recent Mesa.

Sounds fine.

> Whether we do that or not, should destroy_visuals_on_display() free both the
> pointers that XMesaDestroyVisual() frees?  This looks like a leak to me, even
> in current Mesa.


I don't think so.  There's several XVisualInfo pointers that we hang 
onto.  Some come from the user/application (and we can't free those) 
and others are internal to Mesa's GLX.  I'd have to study/fix the code 
to sort that out.  We risk a double-free otherwise.
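
The ownership problem described above can be sketched as follows. This is only an illustration under stated assumptions: every name in it (sketch_visinfo, sketch_xmesa_visual, owns_visinfo, and both functions) is hypothetical and is not how Mesa's GLX code tracks its XVisualInfo pointers. It shows one way a destroy path could free only the pointers it allocated itself and leave application-supplied ones alone.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for XVisualInfo. */
struct sketch_visinfo {
   int depth;
};

/* Hypothetical stand-in for an XMesa visual. */
struct sketch_xmesa_visual {
   struct sketch_visinfo *visinfo;
   int owns_visinfo;   /* 1: allocated internally, 0: borrowed from the caller */
};

/* Create a visual.  With copy_info set, the visual keeps its own copy of
 * the info and owns it; otherwise it only borrows the caller's pointer. */
struct sketch_xmesa_visual *
sketch_create_visual(struct sketch_visinfo *info, int copy_info)
{
   struct sketch_xmesa_visual *v;

   assert(info != NULL);

   v = calloc(1, sizeof *v);
   if (v == NULL)
      return NULL;

   if (copy_info) {
      v->visinfo = malloc(sizeof *v->visinfo);
      if (v->visinfo == NULL) {
         free(v);
         return NULL;
      }
      *v->visinfo = *info;
      v->owns_visinfo = 1;
   } else {
      v->visinfo = info;
   }
   return v;
}

/* Destroy a visual.  Returns 1 if the info was freed as well, 0 if it was
 * left to the caller because the visual never owned it. */
int
sketch_destroy_visual(struct sketch_xmesa_visual *v)
{
   int freed_info = 0;

   assert(v != NULL);

   if (v->owns_visinfo) {
      free(v->visinfo);
      freed_info = 1;
   }
   free(v);
   return freed_info;
}
```

With a flag like this, a cleanup routine can call the destroy function for every visual without risking a double-free of a pointer the application still holds.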


-Brian




thanks
andy

-Original Message-
From: Andy Skinner
Sent: Monday, April 29, 2013 1:04 PM
To: 'Brian Paul'
Cc: mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] visual leak?

Thanks.  I realized today that newer Mesa registers the display with that 
callback for choosing FBConfig, but our older version only does it for Visual.  
So it isn't getting freed when we close the display.

Do you think the register_with_display() call can be put into 
Fake_glXChooseFBConfig() on its own?  We'd like to update Mesa, but it may not 
be a good time in our release cycle.

Also, destroy_visuals_on_display calls free, but XMesaDestroyVisual can also call 
free on v->visinfo.  Should destroy_visuals_on_display call XMesaDestroyVisual, 
or do both free()s?

thanks
andy

-Original Message-
From: Brian Paul [mailto:bri...@vmware.com]
Sent: Monday, April 29, 2013 11:39 AM
To: Andy Skinner
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] visual leak?

On 04/26/2013 03:23 PM, Andy Skinner wrote:

Hi. We are seeing a leak of things created in XMesaCreateVisual, as
called from fakeglx.c. I don't see anywhere that XMesaDestroyVisual is
called. I see a free() in destroy_visuals_on_display, but I don't know
how often that is called. Are these visuals intended to be kept around?

The visual is allocated in XMesaCreateVisual, from save_glx_visual,
from choose_visual, from Fake_glXChooseFBConfig.

It may be that the problem is that I'm not finding and reusing one in
the list, but separate from that, I am curious why XMesaDestroyVisual
isn't called.

Any thoughts?


There's no GLX function for explicitly destroying a visual so that's why 
XMesaDestroyVisual() isn't called.  Instead, we register an X display callback 
that's called when the X Display is closed which frees visuals and other things.

So, when you call XCloseDisplay, they should get freed.

-Brian




Re: [Mesa-dev] [PATCH] st/xvmc/tests: Fix build failure, v2

2013-05-03 Thread Matt Turner
On Fri, May 3, 2013 at 1:48 AM, Lauri Kasanen  wrote:
> v2: Removed extra libs as requested by Matt Turner.
>
> Signed-off-by: Lauri Kasanen 
> ---
>  src/gallium/state_trackers/xvmc/Makefile.am |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/src/gallium/state_trackers/xvmc/Makefile.am b/src/gallium/state_trackers/xvmc/Makefile.am
> index 8ac7406..184432e 100644
> --- a/src/gallium/state_trackers/xvmc/Makefile.am
> +++ b/src/gallium/state_trackers/xvmc/Makefile.am
> @@ -44,7 +44,7 @@ check_PROGRAMS = \
>  TESTS = $(check_PROGRAMS)
>  noinst_PROGRAMS = tests/xvmc_bench
>
> -TEST_LIBS = -lXvMCW -lXvMC -lXv -lX11
> +TEST_LIBS = $(XVMC_LIBS) -lXvMCW
>  tests_test_context_SOURCES = tests/test_context.c tests/testlib.c
>  tests_test_context_LDADD = $(TEST_LIBS)
>  tests_test_surface_SOURCES = tests/test_surface.c tests/testlib.c
> --
> 1.7.2.1
>

Reviewed-by: Matt Turner 


Re: [Mesa-dev] [PATCH] r600g: Correctly initialize the shader key, v2

2013-05-03 Thread Marek Olšák
As long as the commit message contains a note matching this regular
expression '^[[:space:]]*NOTE: .*[Cc]andidate', it won't be missed.
Usually the release manager takes care of cherry-picking if there are
no conflicts.
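
A minimal sketch of checking a commit message against the expression above. It assumes POSIX extended regular expressions and per-line matching; how the release tooling actually applies the pattern is not stated in this thread, and the function name is_stable_candidate is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if any line of the commit message matches the note pattern,
 * 0 if no line matches, and -1 if the pattern fails to compile. */
int
is_stable_candidate(const char *commit_message)
{
   /* Pattern copied from the mail.  REG_NEWLINE makes '^' match at the
    * start of every line, not only at the start of the whole message. */
   static const char pattern[] = "^[[:space:]]*NOTE: .*[Cc]andidate";
   regex_t re;
   int rc;

   assert(commit_message != NULL);

   if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE | REG_NOSUB) != 0)
      return -1;

   rc = regexec(&re, commit_message, 0, NULL, 0);
   regfree(&re);

   return rc == 0 ? 1 : 0;
}
```

A message with a line such as "NOTE: This is a candidate for the stable branches." matches; a message that only says "Candidate for stable?" does not, because the "NOTE: " prefix is missing.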

Marek

On Fri, May 3, 2013 at 5:39 PM, Lauri Kasanen  wrote:
> On Fri, 3 May 2013 10:54:09 -0400
> Alex Deucher  wrote:
>
>> On Fri, May 3, 2013 at 9:55 AM, Lauri Kasanen  wrote:
>> > This patch fixes constant shader recompiles by initializing the struct
>> > to zero. For completeness, memcpy is used to copy the key to the shader
>> > struct.
>> >
>>
>> Candidate for the stable branch?
>>
>> Alex
>
> Yes, please apply to all stable branches where it applies cleanly.
>
> - Lauri
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 64170] [llvmpipe] piglit fbo-cubemap regression

2013-05-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=64170

Roland Scheidegger  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |WONTFIX

--- Comment #1 from Roland Scheidegger  ---
This is due to inaccuracies calculating rho (it goes away with no_rho_approx
set).

Examining this, the problem is at the mip level it's failing, there are 8x8
texels and 8x8 pixels, but the mapping to screen-aligned quad is such that at
the pixel it's failing there's only one pixel in the quad visible. And the
mapping to texel coords is done in such a way that this is in fact a texel
right at the corner of the cube map - the test is written in a way to use the
same face for all visible pixels, but I'm near certain the other 3 invisible
pixels map to different 2 faces. And the derivatives are not quite calculated
correctly in such a case (I believe this test really hits very close to worst
case, which is factor 2, hence one miplevel, which is exactly what the result
shows).
(Note that no_rho_approx does not fix the systematic error calculating derivatives
if the pixels don't lie on the same face, but the max error will drop to
sqrt(2) which is enough to make it pass.)
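
The error factors mentioned above translate into mip levels as follows, assuming the usual OpenGL definition of the base level of detail as the base-2 logarithm of rho (before any bias or clamping):

```latex
\lambda = \log_2 \rho, \qquad
\log_2(2\rho) = \lambda + 1, \qquad
\log_2(\sqrt{2}\,\rho) = \lambda + \tfrac{1}{2}
```

So a rho that is off by a factor of 2 shifts the computed level of detail by one full mip level, while a worst-case factor of sqrt(2) shifts it by half a level.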

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] visual leak?

2013-05-03 Thread Andy Skinner
Hi, Brian.  Did you have thoughts on these?

Would it make sense, if we can't update Mesa right now, to just put the 
register_with_display() call into Fake_glXChooseFBConfig(), as it is for 
Fake_glXChooseVisual()?  It allows the visual to be cleaned up when the Display 
is closed.  Unless it introduces something else we'd have to chase down, we 
think we'd have to do a smaller job to make that change (for now) than to go to 
more recent Mesa.

Whether we do that or not, should destroy_visuals_on_display() free both the 
pointers that XMesaDestroyVisual() frees?  This looks like a leak to me, even 
in current Mesa.

thanks
andy

-Original Message-
From: Andy Skinner 
Sent: Monday, April 29, 2013 1:04 PM
To: 'Brian Paul'
Cc: mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] visual leak?

Thanks.  I realized today that newer Mesa registers the display with that 
callback for choosing FBConfig, but our older version only does it for Visual.  
So it isn't getting freed when we close the display.

Do you think the register_with_display() call can be put into 
Fake_glXChooseFBConfig() on its own?  We'd like to update Mesa, but it may not 
be a good time in our release cycle.

Also, destroy_visuals_on_display calls free, but XMesaDestroyVisual can also 
call free on v->visinfo.  Should destroy_visuals_on_display call 
XMesaDestroyVisual, or do both free()s?

thanks
andy

-Original Message-
From: Brian Paul [mailto:bri...@vmware.com] 
Sent: Monday, April 29, 2013 11:39 AM
To: Andy Skinner
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] visual leak?

On 04/26/2013 03:23 PM, Andy Skinner wrote:
> Hi. We are seeing a leak of things created in XMesaCreateVisual, as 
> called from fakeglx.c. I don't see anywhere that XMesaDestroyVisual is 
> called. I see a free() in destroy_visuals_on_display, but I don't know 
> how often that is called. Are these visuals intended to be kept around?
>
> The visual is allocated in XMesaCreateVisual, from save_glx_visual, 
> from choose_visual, from Fake_glXChooseFBConfig.
>
> It may be that the problem is that I'm not finding and reusing one in 
> the list, but separate from that, I am curious why XMesaDestroyVisual 
> isn't called.
>
> Any thoughts?

There's no GLX function for explicitly destroying a visual so that's why 
XMesaDestroyVisual() isn't called.  Instead, we register an X display callback 
that's called when the X Display is closed which frees visuals and other things.

So, when you call XCloseDisplay, they should get freed.
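
The cleanup scheme described above, and the way it can leak, can be sketched without Xlib. Everything here is a hypothetical stand-in (sketch_display, sketch_visual, and both functions are not Mesa or Xlib names): visuals are kept on a per-display list, and the list is only freed at close time if some entry point registered the close callback.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in Mesa or Xlib. */
struct sketch_visual {
   int id;
   struct sketch_visual *next;
};

struct sketch_display {
   struct sketch_visual *visuals;   /* visuals created for this display */
   int cleanup_registered;          /* has a close callback been hooked up? */
};

/* Record a new visual.  'register_cleanup' models whether the calling
 * entry point also registers the close-display callback. */
struct sketch_visual *
sketch_save_visual(struct sketch_display *dpy, int id, int register_cleanup)
{
   struct sketch_visual *v;

   assert(dpy != NULL);

   v = calloc(1, sizeof *v);
   if (v == NULL)
      return NULL;

   v->id = id;
   v->next = dpy->visuals;
   dpy->visuals = v;
   if (register_cleanup)
      dpy->cleanup_registered = 1;
   return v;
}

/* Models closing the display: the visuals are freed only if a callback
 * was registered.  Returns the number of visuals freed. */
int
sketch_close_display(struct sketch_display *dpy)
{
   int freed = 0;

   assert(dpy != NULL);

   if (!dpy->cleanup_registered)
      return 0;   /* nothing hooked up: the list would be leaked */

   while (dpy->visuals != NULL) {
      struct sketch_visual *next = dpy->visuals->next;
      free(dpy->visuals);
      dpy->visuals = next;
      freed++;
   }
   return freed;
}
```

If only one entry point passes register_cleanup, visuals created solely through the other path are never freed at close, which is the kind of leak reported in this thread.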

-Brian


Re: [Mesa-dev] [PATCH] r600g: Correctly initialize the shader key, v2

2013-05-03 Thread Lauri Kasanen
On Fri, 3 May 2013 10:54:09 -0400
Alex Deucher  wrote:

> On Fri, May 3, 2013 at 9:55 AM, Lauri Kasanen  wrote:
> > This patch fixes constant shader recompiles by initializing the struct
> > to zero. For completeness, memcpy is used to copy the key to the shader
> > struct.
> >
> 
> Candidate for the stable branch?
> 
> Alex

Yes, please apply to all stable branches where it applies cleanly.

- Lauri


Re: [Mesa-dev] [PATCHv2] gallium: fix type of flags in pipe_context::flush()

2013-05-03 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek
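
The build error in the patch quoted below arises because C++ does not implicitly convert an int (for example a literal 0 or an OR of flag bits) to an enum type, while C accepts it. A minimal sketch of the "unsigned flags" convention follows; the enum and function names are hypothetical and the bit values are not the real Gallium ones.

```c
#include <assert.h>

/* Hypothetical flag bits, not the real Gallium enum values. */
enum sketch_flush_flags {
   SKETCH_FLUSH_END_OF_FRAME = 1 << 0,
   SKETCH_FLUSH_DEFERRED     = 1 << 1
};

/* Taking 'unsigned' lets callers pass 0, a single flag, or an OR of
 * several flags from both C and C++ without a cast.
 * Returns the number of flag bits that are set. */
unsigned
sketch_flush(unsigned flags)
{
   const unsigned known = SKETCH_FLUSH_END_OF_FRAME | SKETCH_FLUSH_DEFERRED;
   unsigned count = 0;

   /* Reject bits that are not part of the flag set. */
   assert((flags & ~known) == 0);

   for (; flags != 0; flags >>= 1)
      count += flags & 1u;
   return count;
}
```

Had the parameter been declared as enum sketch_flush_flags, a C++ caller would need an explicit cast for every combination of flags.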

On Fri, May 3, 2013 at 5:19 PM, Chia-I Wu  wrote:
> It should be unsigned, not enum pipe_flush_flags.
>
> Fixed a build error:
>
>   src/gallium/state_trackers/egl/android/native_android.cpp:426:29: error:
>   invalid conversion from 'int' to 'pipe_flush_flags' [-fpermissive]
>
> v2: replace all occurrences of enum pipe_flush_flags by unsigned
>
> Signed-off-by: Chia-I Wu 
> ---
>  src/gallium/drivers/freedreno/freedreno_context.c |2 +-
>  src/gallium/drivers/galahad/glhd_context.c|2 +-
>  src/gallium/drivers/i915/i915_batch.h |2 +-
>  src/gallium/drivers/i915/i915_flush.c |4 ++--
>  src/gallium/drivers/identity/id_context.c |2 +-
>  src/gallium/drivers/ilo/ilo_context.c |2 +-
>  src/gallium/drivers/llvmpipe/lp_context.c |2 +-
>  src/gallium/drivers/noop/noop_pipe.c  |2 +-
>  src/gallium/drivers/nv30/nv30_context.c   |2 +-
>  src/gallium/drivers/nv50/nv50_context.c   |2 +-
>  src/gallium/drivers/nvc0/nvc0_context.c   |2 +-
>  src/gallium/drivers/r300/r300_flush.c |2 +-
>  src/gallium/drivers/r600/r600_pipe.c  |2 +-
>  src/gallium/drivers/radeonsi/radeonsi_pipe.c  |2 +-
>  src/gallium/drivers/rbug/rbug_context.c   |2 +-
>  src/gallium/drivers/softpipe/sp_flush.c   |2 +-
>  src/gallium/drivers/softpipe/sp_flush.h   |2 +-
>  src/gallium/drivers/svga/svga_pipe_flush.c|2 +-
>  src/gallium/drivers/trace/tr_context.c|2 +-
>  src/gallium/include/pipe/p_context.h  |2 +-
>  src/gallium/state_trackers/clover/core/queue.cpp  |2 +-
>  src/gallium/state_trackers/vega/vg_manager.c  |2 +-
>  src/mesa/state_tracker/st_cb_flush.c  |2 +-
>  src/mesa/state_tracker/st_cb_flush.h  |2 +-
>  src/mesa/state_tracker/st_manager.c   |2 +-
>  25 files changed, 26 insertions(+), 26 deletions(-)
>
> diff --git a/src/gallium/drivers/freedreno/freedreno_context.c 
> b/src/gallium/drivers/freedreno/freedreno_context.c
> index 4753f58..64c21fe 100644
> --- a/src/gallium/drivers/freedreno/freedreno_context.c
> +++ b/src/gallium/drivers/freedreno/freedreno_context.c
> @@ -96,7 +96,7 @@ fd_context_render(struct pipe_context *pctx)
>
>  static void
>  fd_context_flush(struct pipe_context *pctx, struct pipe_fence_handle **fence,
> -   enum pipe_flush_flags flags)
> +   unsigned flags)
>  {
> DBG("fence=%p", fence);
>
> diff --git a/src/gallium/drivers/galahad/glhd_context.c 
> b/src/gallium/drivers/galahad/glhd_context.c
> index 8901b6a..a73a3ad 100644
> --- a/src/gallium/drivers/galahad/glhd_context.c
> +++ b/src/gallium/drivers/galahad/glhd_context.c
> @@ -844,7 +844,7 @@ galahad_context_clear_depth_stencil(struct pipe_context 
> *_pipe,
>  static void
>  galahad_context_flush(struct pipe_context *_pipe,
>struct pipe_fence_handle **fence,
> -  enum pipe_flush_flags flags)
> +  unsigned flags)
>  {
> struct galahad_context *glhd_pipe = galahad_context(_pipe);
> struct pipe_context *pipe = glhd_pipe->pipe;
> diff --git a/src/gallium/drivers/i915/i915_batch.h 
> b/src/gallium/drivers/i915/i915_batch.h
> index 5f2b324..ab0f8c8 100644
> --- a/src/gallium/drivers/i915/i915_batch.h
> +++ b/src/gallium/drivers/i915/i915_batch.h
> @@ -55,6 +55,6 @@
>   */
>  extern void i915_flush(struct i915_context *i915,
> struct pipe_fence_handle **fence,
> -   enum pipe_flush_flags flags);
> +   unsigned flags);
>
>  #endif
> diff --git a/src/gallium/drivers/i915/i915_flush.c 
> b/src/gallium/drivers/i915/i915_flush.c
> index 3db6ca1..0dca722 100644
> --- a/src/gallium/drivers/i915/i915_flush.c
> +++ b/src/gallium/drivers/i915/i915_flush.c
> @@ -40,7 +40,7 @@
>
>  static void i915_flush_pipe( struct pipe_context *pipe,
>   struct pipe_fence_handle **fence,
> - enum pipe_flush_flags flags )
> + unsigned flags )
>  {
> struct i915_context *i915 = i915_context(pipe);
> enum i915_winsys_flush_flags winsys_flags = I915_FLUSH_ASYNC;
> @@ -71,7 +71,7 @@ void i915_init_flush_functions( struct i915_context *i915 )
>   */
>  void i915_flush(struct i915_context *i915,
>  struct pipe_fence_handle **fence,
> -enum pipe_flush_flags flags)
> +unsigned flags)
>  {
> struct i915_winsys_batchbuffer *batch = i915->batch;
>
> diff --git a/src/gallium/drivers/identity/id_context.c 
> b/src/gallium/drivers/identity/id_context.c
> index d0b67ef..0eff6c9 100644
> --- a/src/gallium/drivers/identity/id_context.c
> +++ b/src/gallium/drivers/identity/id_context.c
> @@ -712,7 +712,7 @@ identity_clear_depth_stencil(struct pipe_context *_pip

Re: [Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

2013-05-03 Thread Roland Scheidegger
Am 03.05.2013 16:45, schrieb Christoph Bumiller:
> On 03.05.2013 16:32, Jose Fonseca wrote:
>>
>> - Original Message -
>>> Am 03.05.2013 06:58, schrieb Jose Fonseca:

>>>> - Original Message -
>>>>> Currently, there's no way to get the high bits of a 32x32
>>>>> signed/unsigned integer multiplication with tgsi. However, all of
>>>>> d3d10, OpenGL, and OpenCL support that, so we need it as well.
>>>>> There's essentially two ways how it could be done: - a
>>>>> 2-destination instruction returning both high and low bits (this
>>>>> is how it looks like in d3d10 and glsl) - use the existing umul for
>>>>> the low bits and have another instruction for the high bits (this
>>>>> is how it looks like in opencl)
>>>>>
>>>>> Well there's other possibilities but these looked like they'd match
>>>>> both APIs and HW reasonably (well with the exception of things like
>>>>> sse2 which would prefer 2x2 32bit inputs and return 2x64bit as one
>>>>> reg...).
>>>>>
>>>>> Actually it's two new instructions because unlike for the low bits
>>>>> it matters for the high bits if the source operands are signed or
>>>>> unsigned.
>>>>>
>>>>> Personally I'm favoring two separate instructions for low and high
>>>>> bits to not have to deal with multi-destination instructions, but
>>>>> if someone makes a strong case for one returning both low and high
>>>>> bits I could be convinced otherwise. I think though two
>>>>> instructions matches most hw very well (with the exception of
>>>>> software renderers and possibly intel graphics but then a good
>>>>> backend could certainly recognize this).
>>>> Roland,
>>>>
>>>> I don't know about GPU HW, but I think that what you propose will
>>>> forever prevent decent SSE code generation with LLVM.
>>>>
>>>> Using two separate opcodes for hi/low bits relies on common
>>>> sub-expression elimination to merge the two multiplication operations
>>>> back into one.  But I strongly doubt that even LLVM's optimization
>>>> passes will be able to do that.
>>>>
>>>> Getting the 64bits results with LLVM will require sign extend the
>>>> source arguments (http://llvm.org/docs/LangRef.html#mul-instruction )
>>>> or SSE intrinsics. Eitherway, the expressions for the low and high
>>>> bit will be radically different, so we'll end with two multiplies in
>>>> the end -- which I think it is simply inadmissible -- TGSI should not
>>>> stand in the way of backends generating good code.
>>> You can't generate good code either way, this is a deficiency of sse
>>> instruction set.
>>> As I've outlined in another email, I think the best you can do with
>>> sse41 is:
>>> - shuffle both src args (put 2nd/4th elements into 1st/3rd slot)
>>> - 2xpmuldq/pmuludq for doing the 32x32->64bit mul for both 1st/3rd and
>>> 2nd/4th element
>>> - shuffle the high bits into place (I think this needs 3 hw shuffle
>>> instructions)
>>> - shuffle the low bits into place (can benefit from shuffles for high
>>> bits, so just one another shuffle)
>>>
>>> Maybe you can do better with more clever shuffles, but in any case the
>>> low bits will always require one (at least) additional shuffle.
>>>
>>> If you have separate opcodes, everything will be the same, except the
>>> last step you'll just ignore that shuffle and instead just use the
>>> pmulld instruction, which will do exactly what you need for the low
>>> bits. Sure multiplications are more effort for the hw, but hell it even
>>> has the same throughput on most cpus compared to a shuffle, just latency
>>> is worse. In any case it would be 8 vs 8 instructions, with just one
>>> instruction of them very slightly worse. We have much more optimization
>>> opportunities elsewhere than that (I agree that with sse2, which lacks
>>> pmulld, it would be worse, but we never particularly cared about that).
>> That's the thing -- if we have 32x32->64 opcodes we can fine tune this 
>> later. If we stick with separate high bit opcodes then that ability is lost 
>> (at least without coming back and changing TGSI again).
>>
>>>> So I strongly think this is a bad idea. TGSI has support for multiple
>>>> destinations, though we never made much use of it. I see nothing
>>>> special about it.
>>>>
>>>> If you can prove me wrong -- that LLVM can handle merge the
>>>> multiplies -- fine.  But I do think we have bigger fish to fry, so
>>>> I'd prefer we don't put too much time debating this.
>>> No I doubt llvm can merge it (though in theory nothing would prevent it
>>> from recognizing the pattern). My guess is it will do scalar extraction,
>>> and use the imul/mul instructions (which can return 2x32bit numbers even
>>> on 32bit), then combine the vectors back together (most likely element
>>> by element). If it actually does it like that, a separate mul for the
>>> low bits would be in fact a win, because it would save the 4 reinsertion
>>> of the elements at the cost of just one vector mul (llvm uses pmulld
>>> just fine). But looking at this that way doesn't really make sense, we

Re: [Mesa-dev] [PATCH] gallium: fix type of flags in pipe_context::flush()

2013-05-03 Thread Chia-I Wu
On Fri, May 3, 2013 at 4:00 PM, Jose Fonseca  wrote:
> - Original Message -
>> It should be unsigned, not enum pipe_flush_flags.
>>
>> Fixed a build error:
>>
>>   src/gallium/state_trackers/egl/android/native_android.cpp:426:29: error:
>>   invalid conversion from 'int' to 'pipe_flush_flags' [-fpermissive]
>>
>
> While I agree that using enum for bitmasks is not a great idea, I think it is 
> a bit too late to go back on this now, as enum pipe_flush_flags is now used 
> all over the place.  Please just add (enum pipe_flush_flags) cast to silence. 
>  That is what is done in other cpp files.
I decided to give the patch another shot by replacing all occurrences
of "enum pipe_flush_flags" with unsigned.  It is not uncommon for
gallium to have interface changes, and it is a bit hard for me to let
it go when noticing an enum is used for bitmasks.

I compile-tested the patch with most drivers (r600 and radeonsi
excluded for requiring LLVM 3.3) and all state trackers that use "enum
pipe_flush_flags" this time.
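
For anyone wondering why the enum parameter trips up C++ callers: OR-ing two unscoped enum values gives an int, and C++ (unlike C) won't convert an int back to the enum type without a cast. Below is a minimal sketch of the "unsigned flags" style; the DEMO_* names are invented for the example and are not the real gallium enum.

```c
#include <assert.h>

/* Hypothetical flag names, only here to illustrate the pattern. */
enum demo_flush_flags {
   DEMO_FLUSH_END_OF_FRAME = 1 << 0,
   DEMO_FLUSH_ASYNC        = 1 << 1
};

/* Because the parameter is a plain unsigned, any OR-ed combination of the
 * enum values can be passed without a cast, from C and from C++ alike.
 * Returns how many of the known flag bits were set. */
static int demo_flush(unsigned flags)
{
   int actions = 0;

   if (flags & DEMO_FLUSH_END_OF_FRAME)
      actions++;
   if (flags & DEMO_FLUSH_ASYNC)
      actions++;
   return actions;
}
```

With an enum-typed parameter, a C++ caller would have to write the cast on every combined value, which is what the other cpp files do today.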

>
> Jose



--
o...@lunarg.com


Re: [Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

2013-05-03 Thread Roland Scheidegger
Am 03.05.2013 16:32, schrieb Jose Fonseca:
> 
> 
> - Original Message -
>> Am 03.05.2013 06:58, schrieb Jose Fonseca:
>>>
>>>
>>> - Original Message -
>>>> Currently, there's no way to get the high bits of a 32x32
>>>> signed/unsigned integer multiplication with tgsi. However, all of
>>>> d3d10, OpenGL, and OpenCL support that, so we need it as well.
>>>> There's essentially two ways how it could be done: - a
>>>> 2-destination instruction returning both high and low bits (this
>>>> is how it looks like in d3d10 and glsl) - use the existing umul for
>>>> the low bits and have another instruction for the high bits (this
>>>> is how it looks like in opencl)
>>>>
>>>> Well there's other possibilities but these looked like they'd match
>>>> both APIs and HW reasonably (well with the exception of things like
>>>> sse2 which would prefer 2x2 32bit inputs and return 2x64bit as one
>>>> reg...).
>>>>
>>>> Actually it's two new instructions because unlike for the low bits
>>>> it matters for the high bits if the source operands are signed or
>>>> unsigned.
>>>>
>>>> Personally I'm favoring two separate instructions for low and high
>>>> bits to not have to deal with multi-destination instructions, but
>>>> if someone makes a strong case for one returning both low and high
>>>> bits I could be convinced otherwise. I think though two
>>>> instructions matches most hw very well (with the exception of
>>>> software renderers and possibly intel graphics but then a good
>>>> backend could certainly recognize this).
>>>
>>> Roland,
>>>
>>> I don't know about GPU HW, but I think that what you propose will
>>> forever prevent decent SSE code generation with LLVM.
>>>
>>> Using two separate opcodes for hi/low bits relies on common
>>> sub-expression elimination to merge the two multiplication operations
>>> back into one.  But I strongly doubt that even LLVM's optimization
>>> passes will be able to do that.
>>>
>>> Getting the 64bits results with LLVM will require sign extend the
>>> source arguments (http://llvm.org/docs/LangRef.html#mul-instruction )
>>> or SSE intrinsics. Eitherway, the expressions for the low and high
>>> bit will be radically different, so we'll end with two multiplies in
>>> the end -- which I think it is simply inadmissible -- TGSI should not
>>> stand in the way of backends generating good code.
> 
>> You can't generate good code either way, this is a deficiency of sse
>> instruction set.
>> As I've outlined in another email, I think the best you can do with
>> sse41 is:
>> - shuffle both src args (put 2nd/4th elements into 1st/3rd slot)
>> - 2xpmuldq/pmuludq for doing the 32x32->64bit mul for both 1st/3rd and
>> 2nd/4th element
>> - shuffle the high bits into place (I think this needs 3 hw shuffle
>> instructions)
>> - shuffle the low bits into place (can benefit from shuffles for high
>> bits, so just one another shuffle)
>>
>> Maybe you can do better with more clever shuffles, but in any case the
>> low bits will always require one (at least) additional shuffle.
>>
>> If you have separate opcodes, everything will be the same, except the
>> last step you'll just ignore that shuffle and instead just use the
>> pmulld instruction, which will do exactly what you need for the low
>> bits. Sure multiplications are more effort for the hw, but hell it even
>> has the same throughput on most cpus compared to a shuffle, just latency
>> is worse. In any case it would be 8 vs 8 instructions, with just one
>> instruction of them very slightly worse. We have much more optimization
>> opportunities elsewhere than that (I agree that with sse2, which lacks
>> pmulld, it would be worse, but we never particularly cared about that).
> 
> That's the thing -- if we have 32x32->64 opcodes we can fine tune this later. 
> If we stick with separate high bit opcodes then that ability is lost (at 
> least without coming back and changing TGSI again).
Well it wouldn't really be a 32x32->64 opcode but a 32x32->2x32 one of
course.
How do you want to fine tune that? The APIs demand the result returned
as low/high parts. And there's no way around the fact that sse41 just
doesn't have the instructions to do that natively, no matter if it's two
separate opcodes or just one. And really, I'm not even going to start
worrying about the cost of a mul vs. a shuffle; that just plain makes no
sense.
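
To spell out the low/high split being discussed, here is a scalar C sketch of what the separate opcodes would compute. The function names are placeholders of mine, not proposed TGSI opcode names. It widens to 64 bits, multiplies, and keeps one half; the conversion back to int32_t in the signed case is implementation-defined in C but wraps on the usual compilers.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of the unsigned 32x32 product. */
static uint32_t demo_umul_hi(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* High 32 bits of the signed 32x32 product. */
static int32_t demo_imul_hi(int32_t a, int32_t b)
{
   int64_t wide = (int64_t)a * (int64_t)b;

   /* Shift as unsigned so the result does not depend on how the
    * compiler shifts negative values. */
   return (int32_t)(uint32_t)((uint64_t)wide >> 32);
}

/* Low 32 bits: identical bit pattern for signed and unsigned inputs,
 * which is why a single existing low-bits opcode is enough. */
static uint32_t demo_mul_lo(uint32_t a, uint32_t b)
{
   return a * b;
}
```

For example, with the bit pattern 0xFFFFFFFF times 2 the low word is 0xFFFFFFFE either way, but the high word is 1 when the inputs are treated as unsigned and 0xFFFFFFFF (that is, -1) when they are treated as signed.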


> 
>>>
>>> So I strongly think this is a bad idea. TGSI has support for multiple
>>> destinations, though we never made much use of it. I see nothing
>>> special about it.
>>>
>>> If you can prove me wrong -- that LLVM can handle merge the
>>> multiplies -- fine.  But I do think we have bigger fish to fry, so
>>> I'd prefer we don't put too much time debating this.
>>
>> No I doubt llvm can merge it (though in theory nothing would prevent it
>> from recognizing the pattern). My guess is it will do scalar extraction,
>> and use the imul/mul instructions (which can return 2x32bit numbers even
>> on 32bit), then combine the vect

[Mesa-dev] [PATCHv2] gallium: fix type of flags in pipe_context::flush()

2013-05-03 Thread Chia-I Wu
It should be unsigned, not enum pipe_flush_flags.

Fixed a build error:

  src/gallium/state_trackers/egl/android/native_android.cpp:426:29: error:
  invalid conversion from 'int' to 'pipe_flush_flags' [-fpermissive]

v2: replace all occurrences of enum pipe_flush_flags by unsigned

Signed-off-by: Chia-I Wu 
---
 src/gallium/drivers/freedreno/freedreno_context.c |2 +-
 src/gallium/drivers/galahad/glhd_context.c|2 +-
 src/gallium/drivers/i915/i915_batch.h |2 +-
 src/gallium/drivers/i915/i915_flush.c |4 ++--
 src/gallium/drivers/identity/id_context.c |2 +-
 src/gallium/drivers/ilo/ilo_context.c |2 +-
 src/gallium/drivers/llvmpipe/lp_context.c |2 +-
 src/gallium/drivers/noop/noop_pipe.c  |2 +-
 src/gallium/drivers/nv30/nv30_context.c   |2 +-
 src/gallium/drivers/nv50/nv50_context.c   |2 +-
 src/gallium/drivers/nvc0/nvc0_context.c   |2 +-
 src/gallium/drivers/r300/r300_flush.c |2 +-
 src/gallium/drivers/r600/r600_pipe.c  |2 +-
 src/gallium/drivers/radeonsi/radeonsi_pipe.c  |2 +-
 src/gallium/drivers/rbug/rbug_context.c   |2 +-
 src/gallium/drivers/softpipe/sp_flush.c   |2 +-
 src/gallium/drivers/softpipe/sp_flush.h   |2 +-
 src/gallium/drivers/svga/svga_pipe_flush.c|2 +-
 src/gallium/drivers/trace/tr_context.c|2 +-
 src/gallium/include/pipe/p_context.h  |2 +-
 src/gallium/state_trackers/clover/core/queue.cpp  |2 +-
 src/gallium/state_trackers/vega/vg_manager.c  |2 +-
 src/mesa/state_tracker/st_cb_flush.c  |2 +-
 src/mesa/state_tracker/st_cb_flush.h  |2 +-
 src/mesa/state_tracker/st_manager.c   |2 +-
 25 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/src/gallium/drivers/freedreno/freedreno_context.c 
b/src/gallium/drivers/freedreno/freedreno_context.c
index 4753f58..64c21fe 100644
--- a/src/gallium/drivers/freedreno/freedreno_context.c
+++ b/src/gallium/drivers/freedreno/freedreno_context.c
@@ -96,7 +96,7 @@ fd_context_render(struct pipe_context *pctx)
 
 static void
 fd_context_flush(struct pipe_context *pctx, struct pipe_fence_handle **fence,
-   enum pipe_flush_flags flags)
+   unsigned flags)
 {
DBG("fence=%p", fence);
 
diff --git a/src/gallium/drivers/galahad/glhd_context.c 
b/src/gallium/drivers/galahad/glhd_context.c
index 8901b6a..a73a3ad 100644
--- a/src/gallium/drivers/galahad/glhd_context.c
+++ b/src/gallium/drivers/galahad/glhd_context.c
@@ -844,7 +844,7 @@ galahad_context_clear_depth_stencil(struct pipe_context 
*_pipe,
 static void
 galahad_context_flush(struct pipe_context *_pipe,
   struct pipe_fence_handle **fence,
-  enum pipe_flush_flags flags)
+  unsigned flags)
 {
struct galahad_context *glhd_pipe = galahad_context(_pipe);
struct pipe_context *pipe = glhd_pipe->pipe;
diff --git a/src/gallium/drivers/i915/i915_batch.h 
b/src/gallium/drivers/i915/i915_batch.h
index 5f2b324..ab0f8c8 100644
--- a/src/gallium/drivers/i915/i915_batch.h
+++ b/src/gallium/drivers/i915/i915_batch.h
@@ -55,6 +55,6 @@
  */
 extern void i915_flush(struct i915_context *i915,
struct pipe_fence_handle **fence,
-   enum pipe_flush_flags flags);
+   unsigned flags);
 
 #endif
diff --git a/src/gallium/drivers/i915/i915_flush.c 
b/src/gallium/drivers/i915/i915_flush.c
index 3db6ca1..0dca722 100644
--- a/src/gallium/drivers/i915/i915_flush.c
+++ b/src/gallium/drivers/i915/i915_flush.c
@@ -40,7 +40,7 @@
 
 static void i915_flush_pipe( struct pipe_context *pipe,
  struct pipe_fence_handle **fence,
- enum pipe_flush_flags flags )
+ unsigned flags )
 {
struct i915_context *i915 = i915_context(pipe);
enum i915_winsys_flush_flags winsys_flags = I915_FLUSH_ASYNC;
@@ -71,7 +71,7 @@ void i915_init_flush_functions( struct i915_context *i915 )
  */
 void i915_flush(struct i915_context *i915,
 struct pipe_fence_handle **fence,
-enum pipe_flush_flags flags)
+unsigned flags)
 {
struct i915_winsys_batchbuffer *batch = i915->batch;
 
diff --git a/src/gallium/drivers/identity/id_context.c 
b/src/gallium/drivers/identity/id_context.c
index d0b67ef..0eff6c9 100644
--- a/src/gallium/drivers/identity/id_context.c
+++ b/src/gallium/drivers/identity/id_context.c
@@ -712,7 +712,7 @@ identity_clear_depth_stencil(struct pipe_context *_pipe,
 static void
 identity_flush(struct pipe_context *_pipe,
struct pipe_fence_handle **fence,
-   enum pipe_flush_flags flags)
+   unsigned flags)
 {
struct identity_context *id_pipe = identity_context(_pipe);
struct pipe_context *pipe = i

Re: [Mesa-dev] [PATCH] r600g: Correctly initialize the shader key, v2

2013-05-03 Thread Alex Deucher
On Fri, May 3, 2013 at 9:55 AM, Lauri Kasanen  wrote:
> Assigning a struct only copies the members - any padding is left as is.
>
> Thus this code:
>
> struct foo_t foo;
> foo = bar;
>
> leaves the padding of foo intact, ie uninitialized random garbage.
>
> This patch fixes constant shader recompiles by initializing the struct
> to zero. For completeness, memcpy is used to copy the key to the shader
> struct.
>
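
A small sketch of the padding issue described above, using a made-up struct (demo_key is not the real r600 shader key). It assumes the keys end up being compared byte-wise with something like memcmp. Strictly speaking the C standard leaves padding bytes unspecified after a store, so zeroing first and then filling the members one by one is the conservative variant; a member-wise compare avoids the question entirely.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical key; on common ABIs there are 3 padding bytes after "kind". */
struct demo_key {
   char kind;
   int  value;
};

/* Zero the whole object, padding included, then fill in the members. */
static void demo_key_init(struct demo_key *key, char kind, int value)
{
   assert(key != NULL);
   memset(key, 0, sizeof(*key));
   key->kind = kind;
   key->value = value;
}

/* Byte-wise compare: only reliable if the padding was zeroed on both sides. */
static int demo_key_equal_bytes(const struct demo_key *a,
                                const struct demo_key *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}

/* Member-wise compare: never looks at the padding at all. */
static int demo_key_equal_members(const struct demo_key *a,
                                  const struct demo_key *b)
{
   return a->kind == b->kind && a->value == b->value;
}
```

If one of the two keys were left uninitialized before its members were set, the byte-wise compare could report a mismatch even though every member agrees, which would look exactly like a spurious recompile.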

Candidate for the stable branch?

Alex

> Signed-off-by: Lauri Kasanen 
> ---
>  src/gallium/drivers/r600/r600_state_common.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_state_common.c
> b/src/gallium/drivers/r600/r600_state_common.c
> index 87a2e2e..fbac576 100644
> --- a/src/gallium/drivers/r600/r600_state_common.c
> +++ b/src/gallium/drivers/r600/r600_state_common.c
> @@ -715,6 +715,7 @@ static int r600_shader_select(struct pipe_context *ctx,
> struct r600_pipe_shader * shader = NULL;
> int r;
>
> +   memset(&key, 0, sizeof(key));
> key = r600_shader_selector_key(ctx, sel);
>
> /* Check if we don't need to change anything.
> @@ -762,7 +763,7 @@ static int r600_shader_select(struct pipe_context *ctx,
> key = r600_shader_selector_key(ctx, sel);
> }
>
> -   shader->key = key;
> +   memcpy(&shader->key, &key, sizeof(key));
> sel->num_shaders++;
> }
>
> --
> 1.7.2.1
>


Re: [Mesa-dev] [PATCH] r600g: don't emit surface_sync after FLUSH_AND_INV_EVENT

2013-05-03 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Fri, May 3, 2013 at 4:01 PM,   wrote:
> From: Alex Deucher 
>
> It shouldn't be needed since the FLUSH_AND_INV_EVENT has already
> made sure the destination caches are flushed.  Additionally,
> we didn't previously emit the surface_sync until this commit:
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=e5e4c07e7964a3258ed02b530bcdc24c0650204b
> Emitting them together causes hangs in compute on cayman/TN
> and hangs in Heaven on evergreen.
>
> Note: this patch is a candidate for the 9.1 branch, but requires:
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=156bcca62c9f4e79e78929f72bc085757f36a65a
> as well.
>
> Signed-off-by: Alex Deucher 
> ---
>  src/gallium/drivers/r600/r600_hw_context.c |   26 --
>  1 files changed, 0 insertions(+), 26 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
> b/src/gallium/drivers/r600/r600_hw_context.c
> index 6d8b2cf..944b666 100644
> --- a/src/gallium/drivers/r600/r600_hw_context.c
> +++ b/src/gallium/drivers/r600/r600_hw_context.c
> @@ -226,32 +226,6 @@ void r600_flush_emit(struct r600_context *rctx)
> if (rctx->flags & R600_CONTEXT_FLUSH_AND_INV) {
> cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
> cs->buf[cs->cdw++] = 
> EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
> -   if (rctx->chip_class >= EVERGREEN) {
> -   /* We were previously setting the CB and DB bits on
> -* cp_coher_cntl, but this is unnecessary since
> -* we are emitting the
> -* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
> -* Setting the CB bits was causing lockups when using
> -* compute on cayman.
> -*
> -* XXX: Do even need to emit a surface sync packet 
> here?
> -* Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
> -* surface sync was not being emitted with the
> -* R600_CONTEXT_FLUSH_AND_INV flag.
> -*/
> -   cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
> -   S_0085F0_DB_ACTION_ENA(1) |
> -   S_0085F0_SH_ACTION_ENA(1) |
> -   S_0085F0_SMX_ACTION_ENA(1) |
> -   S_0085F0_FULL_CACHE_ENA(1);
> -   } else {
> -   cp_coher_cntl = S_0085F0_SMX_ACTION_ENA(1) |
> -   S_0085F0_SH_ACTION_ENA(1) |
> -   S_0085F0_VC_ACTION_ENA(1) |
> -   S_0085F0_TC_ACTION_ENA(1) |
> -   S_0085F0_FULL_CACHE_ENA(1);
> -   }
> -   emit_flush = 1;
> }
>
> if (rctx->flags & R600_CONTEXT_INVAL_READ_CACHES) {
> --
> 1.7.7.5
>


Re: [Mesa-dev] [PATCH] r600g: Correctly initialize the shader key, v2

2013-05-03 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Fri, May 3, 2013 at 3:55 PM, Lauri Kasanen  wrote:
> Assigning a struct only copies the members - any padding is left as is.
>
> Thus this code:
>
> struct foo_t foo;
> foo = bar;
>
> leaves the padding of foo intact, ie uninitialized random garbage.
>
> This patch fixes constant shader recompiles by initializing the struct
> to zero. For completeness, memcpy is used to copy the key to the shader
> struct.
>
> Signed-off-by: Lauri Kasanen 
> ---
>  src/gallium/drivers/r600/r600_state_common.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_state_common.c
> b/src/gallium/drivers/r600/r600_state_common.c
> index 87a2e2e..fbac576 100644
> --- a/src/gallium/drivers/r600/r600_state_common.c
> +++ b/src/gallium/drivers/r600/r600_state_common.c
> @@ -715,6 +715,7 @@ static int r600_shader_select(struct pipe_context *ctx,
> struct r600_pipe_shader * shader = NULL;
> int r;
>
> +   memset(&key, 0, sizeof(key));
> key = r600_shader_selector_key(ctx, sel);
>
> /* Check if we don't need to change anything.
> @@ -762,7 +763,7 @@ static int r600_shader_select(struct pipe_context *ctx,
> key = r600_shader_selector_key(ctx, sel);
> }
>
> -   shader->key = key;
> +   memcpy(&shader->key, &key, sizeof(key));
> sel->num_shaders++;
> }
>
> --
> 1.7.2.1
>


Re: [Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

2013-05-03 Thread Christoph Bumiller
On 03.05.2013 16:32, Jose Fonseca wrote:
>
> - Original Message -
>> Am 03.05.2013 06:58, schrieb Jose Fonseca:
>>>
>>> - Original Message -
>>>> Currently, there's no way to get the high bits of a 32x32
>>>> signed/unsigned integer multiplication with tgsi. However, all of
>>>> d3d10, OpenGL, and OpenCL support that, so we need it as well.
>>>> There's essentially two ways how it could be done: - a
>>>> 2-destination instruction returning both high and low bits (this
>>>> is how it looks like in d3d10 and glsl) - use the existing umul for
>>>> the low bits and have another instruction for the high bits (this
>>>> is how it looks like in opencl)
>>>>
>>>> Well there's other possibilities but these looked like they'd match
>>>> both APIs and HW reasonably (well with the exception of things like
>>>> sse2 which would prefer 2x2 32bit inputs and return 2x64bit as one
>>>> reg...).
>>>>
>>>> Actually it's two new instructions because unlike for the low bits
>>>> it matters for the high bits if the source operands are signed or
>>>> unsigned.
>>>>
>>>> Personally I'm favoring two separate instructions for low and high
>>>> bits to not have to deal with multi-destination instructions, but
>>>> if someone makes a strong case for one returning both low and high
>>>> bits I could be convinced otherwise. I think though two
>>>> instructions matches most hw very well (with the exception of
>>>> software renderers and possibly intel graphics but then a good
>>>> backend could certainly recognize this).
>>> Roland,
>>>
>>> I don't know about GPU HW, but I think that what you propose will
>>> forever prevent decent SSE code generation with LLVM.
>>>
>>> Using two separate opcodes for hi/low bits relies on common
>>> sub-expression elimination to merge the two multiplication operations
>>> back into one.  But I strongly doubt that even LLVM's optimization
>>> passes will be able to do that.
>>>
>>> Getting the 64bits results with LLVM will require sign extend the
>>> source arguments (http://llvm.org/docs/LangRef.html#mul-instruction )
>>> or SSE intrinsics. Eitherway, the expressions for the low and high
>>> bit will be radically different, so we'll end with two multiplies in
>>> the end -- which I think it is simply inadmissible -- TGSI should not
>>> stand in the way of backends generating good code.
>> You can't generate good code either way, this is a deficiency of sse
>> instruction set.
>> As I've outlined in another email, I think the best you can do with
>> sse41 is:
>> - shuffle both src args (put 2nd/4th elements into 1st/3rd slot)
>> - 2xpmuldq/pmuludq for doing the 32x32->64bit mul for both 1st/3rd and
>> 2nd/4th element
>> - shuffle the high bits into place (I think this needs 3 hw shuffle
>> instructions)
>> - shuffle the low bits into place (can benefit from shuffles for high
>> bits, so just one another shuffle)
>>
>> Maybe you can do better with more clever shuffles, but in any case the
>> low bits will always require one (at least) additional shuffle.
>>
>> If you have separate opcodes, everything will be the same, except the
>> last step you'll just ignore that shuffle and instead just use the
>> pmulld instruction, which will do exactly what you need for the low
>> bits. Sure multiplications are more effort for the hw, but hell it even
>> has the same throughput on most cpus compared to a shuffle, just latency
>> is worse. In any case it would be 8 vs 8 instructions, with just one
>> instruction of them very slightly worse. We have much more optimization
>> opportunities elsewhere than that (I agree that with sse2, which lacks
>> pmulld, it would be worse, but we never particularly cared about that).
> That's the thing -- if we have 32x32->64 opcodes we can fine tune this later. 
> If we stick with separate high bit opcodes then that ability is lost (at 
> least without coming back and changing TGSI again).
>
>>> So I strongly think this is a bad idea. TGSI has support for multiple
>>> destinations, though we never made much use of it. I see nothing
>>> special about it.
>>>
>>> If you can prove me wrong -- that LLVM can handle merge the
>>> multiplies -- fine.  But I do think we have bigger fish to fry, so
>>> I'd prefer we don't put too much time debating this.
>> No I doubt llvm can merge it (though in theory nothing would prevent it
>> from recognizing the pattern). My guess is it will do scalar extraction,
>> and use the imul/mul instructions (which can return 2x32bit numbers even
>> on 32bit), then combine the vectors back together (most likely element
>> by element). If it actually does it like that, a separate mul for the
>> low bits would be in fact a win, because it would save the 4 reinsertion
>> of the elements at the cost of just one vector mul (llvm uses pmulld
>> just fine). But looking at this that way doesn't really make sense, we
>> need instructions which make sense for everybody and aren't specified to
>> suit one very peculiar implementation.
>> But even if it gen

Re: [Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

2013-05-03 Thread Jose Fonseca


- Original Message -
> Am 03.05.2013 06:58, schrieb Jose Fonseca:
> > 
> > 
> > - Original Message -
> >> Currently, there's no way to get the high bits of a 32x32
> >> signed/unsigned integer multiplication with tgsi. However, all of
> >> d3d10, OpenGL, and OpenCL support that, so we need it as well.
> >> There's essentially two ways how it could be done: - a
> >> 2-destination instruction returning both high and low bits (this
> >> is how it looks like in d3d10 and glsl) - use the existing umul for
> >> the low bits and have another instruction for the high bits (this
> >> is how it looks like in opencl)
> >> 
> >> Well there's other possibilities but these looked like they'd match
> >> both APIs and HW reasonably (well with the exception of things like
> >> sse2 which would prefer 2x2 32bit inputs and return 2x64bit as one
> >> reg...).
> >> 
> >> Actually it's two new instructions because unlike for the low bits
> >> it matters for the high bits if the source operands are signed or
> >> unsigned.
> >> 
> >> Personally I'm favoring two separate instructions for low and high
> >> bits to not have to deal with multi-destination instructions, but
> >> if someone makes a strong case for one returning both low and high
> >> bits I could be convinced otherwise. I think though two
> >> instructions matches most hw very well (with the exception of
> >> software renderers and possibly intel graphics but then a good
> >> backend could certainly recognize this).
> > 
> > Roland,
> > 
> > I don't know about GPU HW, but I think that what you propose will
> > forever prevent decent SSE code generation with LLVM.
> > 
> > Using two separate opcodes for hi/low bits relies on common
> > sub-expression elimination to merge the two multiplication operations
> > back into one.  But I strongly doubt that even LLVM's optimization
> > passes will be able to do that.
> > 
> > Getting the 64bit results with LLVM will require sign-extending the
> > source arguments (http://llvm.org/docs/LangRef.html#mul-instruction )
> > or SSE intrinsics. Either way, the expressions for the low and high
> > bits will be radically different, so we'll end up with two multiplies
> > in the end -- which I think is simply inadmissible -- TGSI should not
> > stand in the way of backends generating good code.

> You can't generate good code either way, this is a deficiency of sse
> instruction set.
> As I've outlined in another email, I think the best you can do with
> sse41 is:
> - shuffle both src args (put 2nd/4th elements into 1st/3rd slot)
> - 2xpmuldq/pmuludq for doing the 32x32->64bit mul for both 1st/3rd and
> 2nd/4th element
> - shuffle the high bits into place (I think this needs 3 hw shuffle
> instructions)
> - shuffle the low bits into place (can benefit from shuffles for high
> bits, so just one another shuffle)
> 
> Maybe you can do better with more clever shuffles, but in any case the
> low bits will always require one (at least) additional shuffle.
>
> If you have separate opcodes, everything will be the same, except the
> last step you'll just ignore that shuffle and instead just use the
> pmulld instruction, which will do exactly what you need for the low
> bits. Sure multiplications are more effort for the hw, but hell it even
> has the same throughput on most cpus compared to a shuffle, just latency
> is worse. In any case it would be 8 vs 8 instructions, with just one
> instruction of them very slightly worse. We have much more optimization
> opportunities elsewhere than that (I agree that with sse2, which lacks
> pmulld, it would be worse, but we never particularly cared about that).

That's the thing -- if we have 32x32->64 opcodes we can fine tune this later. 
If we stick with separate high bit opcodes then that ability is lost (at least 
without coming back and changing TGSI again).

> > 
> > So I strongly think this is a bad idea. TGSI has support for multiple
> > destinations, though we never made much use of it. I see nothing
> > special about it.
> > 
> > If you can prove me wrong -- that LLVM can handle merging the
> > multiplies -- fine.  But I do think we have bigger fish to fry, so
> > I'd prefer we don't put too much time debating this.
> 
> No I doubt llvm can merge it (though in theory nothing would prevent it
> from recognizing the pattern). My guess is it will do scalar extraction,
> and use the imul/mul instructions (which can return 2x32bit numbers even
> on 32bit), then combine the vectors back together (most likely element
> by element). If it actually does it like that, a separate mul for the
> low bits would be in fact a win, because it would save the 4 reinsertion
> of the elements at the cost of just one vector mul (llvm uses pmulld
> just fine). But looking at this that way doesn't really make sense, we
> need instructions which make sense for everybody and aren't specified to
> suit one very peculiar implementation.
> But even if it generates optimal code, fact is that the multiply for
> gett

Re: [Mesa-dev] [PATCH 1/3] gallivm: Move LLVMStartMultithreaded() static initializer into gallivm

2013-05-03 Thread Jose Fonseca
Looks ok by me.

- Original Message -
> From: Tom Stellard 
> 
> This does not solve all of the problems with using LLVM in a
> multithreaded environment, but it should help in some cases.
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_misc.cpp   | 15 +++
>  src/gallium/drivers/radeon/radeon_llvm_emit.cpp | 14 --
>  2 files changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> index 717afa7..1500602 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> @@ -77,6 +77,21 @@
>  
>  #include "lp_bld_misc.h"
>  
> +namespace {
> +
> +class LLVMEnsureMultithreaded {
> +public:
> +   LLVMEnsureMultithreaded()
> +   {
> +  if (!LLVMIsMultithreaded()) {
> + LLVMStartMultithreaded();
> +  }
> +   }
> +};
> +
> +static LLVMEnsureMultithreaded lLVMEnsureMultithreaded;
> +
> +}
>  
>  extern "C" void
>  lp_set_target_options(void)
> diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.cpp
> b/src/gallium/drivers/radeon/radeon_llvm_emit.cpp
> index 55dad9b..03eb5f2 100644
> --- a/src/gallium/drivers/radeon/radeon_llvm_emit.cpp
> +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.cpp
> @@ -58,20 +58,6 @@
>  
>  using namespace llvm;
>  
> -namespace {
> -
> -class LLVMEnsureMultithreaded {
> -public:
> -   LLVMEnsureMultithreaded()
> -   {
> -  llvm_start_multithreaded();
> -   }
> -};
> -
> -static LLVMEnsureMultithreaded lLVMEnsureMultithreaded;
> -
> -}
> -
>  /**
>   * Set the shader type we want to compile
>   *
> --
> 1.7.11.4
> 


Re: [Mesa-dev] [PATCH 2/2] scons: remove nouveau build

2013-05-03 Thread Jose Fonseca
Fine by me.

Jose

- Original Message -
> One build system for linux/unix only drivers should be enough.
> Additionally the nouveau target was disabled anyway.
> ---
>  src/gallium/SConscript |   15 ---
>  src/gallium/drivers/nouveau/SConscript |   10 --
>  src/gallium/drivers/nv30/SConscript|   11 ---
>  src/gallium/drivers/nv50/SConscript|   11 ---
>  src/gallium/drivers/nvc0/SConscript|   11 ---
>  5 files changed, 58 deletions(-)
>  delete mode 100644 src/gallium/drivers/nouveau/SConscript
>  delete mode 100644 src/gallium/drivers/nv30/SConscript
>  delete mode 100644 src/gallium/drivers/nv50/SConscript
>  delete mode 100644 src/gallium/drivers/nvc0/SConscript
> 
> diff --git a/src/gallium/SConscript b/src/gallium/SConscript
> index c53cd39..a3edc65 100644
> --- a/src/gallium/SConscript
> +++ b/src/gallium/SConscript
> @@ -27,19 +27,6 @@ if not env['msvc']:
>  'drivers/i915/SConscript',
>  ])
>  
> -if env['drm']:
> -# These drivers depend on drm headers
> -# XXX: nouveau drivers have a tight dependency on libdrm, so to enable
> -# we need some version logic before we enable them. Also, ATM there is
> -# no nouveau target in scons
> -# if env['drm_nouveau']:
> -# SConscript([
> -# 'drivers/nouveau/SConscript',
> -# 'drivers/nv50/SConscript',
> -# 'drivers/nvc0/SConscript',
> -# 'drivers/nvfx/SConscript',
> -# ])
> -
>  #
>  # State trackers
>  #
> @@ -132,7 +119,6 @@ if not env['embedded']:
>  'targets/SConscript.dri',
>  'targets/dri-swrast/SConscript',
>  'targets/dri-vmwgfx/SConscript',
> -#'targets/dri-nouveau/SConscript',
>  ])
>  if env['drm_intel']:
>  SConscript([
> @@ -142,7 +128,6 @@ if not env['embedded']:
>  if env['xorg'] and env['drm']:
>  SConscript([
>  #'targets/xorg-i915/SConscript',
> -#'targets/xorg-nouveau/SConscript',
>  ])
>  
>  
> diff --git a/src/gallium/drivers/nouveau/SConscript
> b/src/gallium/drivers/nouveau/SConscript
> deleted file mode 100644
> index 418744c..000
> --- a/src/gallium/drivers/nouveau/SConscript
> +++ /dev/null
> @@ -1,10 +0,0 @@
> -Import('*')
> -
> -env = env.Clone()
> -
> -nouveau = env.ConvenienceLibrary(
> -target = 'nouveau',
> -source = env.ParseSourceList('Makefile.sources', 'C_SOURCES')
> -)
> -
> -Export('nouveau')
> diff --git a/src/gallium/drivers/nv30/SConscript
> b/src/gallium/drivers/nv30/SConscript
> deleted file mode 100644
> index 3c2ff03..000
> --- a/src/gallium/drivers/nv30/SConscript
> +++ /dev/null
> @@ -1,11 +0,0 @@
> -Import('*')
> -
> -env = env.Clone()
> -
> -nv30 = env.ConvenienceLibrary(
> -target = 'nv30',
> -source = env.ParseSourceList('Makefile.sources',
> -['C_SOURCES', 'CPP_SOURCES'])
> -)
> -
> -Export('nv30')
> diff --git a/src/gallium/drivers/nv50/SConscript
> b/src/gallium/drivers/nv50/SConscript
> deleted file mode 100644
> index 3baaa8a..000
> --- a/src/gallium/drivers/nv50/SConscript
> +++ /dev/null
> @@ -1,11 +0,0 @@
> -Import('*')
> -
> -env = env.Clone()
> -
> -nv50 = env.ConvenienceLibrary(
> -target = 'nv50',
> -source = env.ParseSourceList('Makefile.sources',
> -['C_SOURCES', 'CPP_SOURCES'])
> -)
> -
> -Export('nv50')
> diff --git a/src/gallium/drivers/nvc0/SConscript
> b/src/gallium/drivers/nvc0/SConscript
> deleted file mode 100644
> index 4e456d1..000
> --- a/src/gallium/drivers/nvc0/SConscript
> +++ /dev/null
> @@ -1,11 +0,0 @@
> -Import('*')
> -
> -env = env.Clone()
> -
> -nvc0 = env.ConvenienceLibrary(
> -target = 'nvc0',
> -source = env.ParseSourceList('Makefile.sources',
> -['C_SOURCES', 'CPP_SOURCES'])
> -)
> -
> -Export('nvc0')
> --
> 1.7.10.4
> 


Re: [Mesa-dev] [PATCH] r600g: don't emit surface_sync after FLUSH_AND_INV_EVENT

2013-05-03 Thread Tom Stellard
On Fri, May 03, 2013 at 10:01:00AM -0400, alexdeuc...@gmail.com wrote:
> From: Alex Deucher 
>

Reviewed-by: Tom Stellard 
 
> It shouldn't be needed since the FLUSH_AND_INV_EVENT has already
> made sure the destination caches are flushed.  Additionally,
> we didn't previously emit the surface_sync until this commit:
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=e5e4c07e7964a3258ed02b530bcdc24c0650204b
> Emitting them together causes hangs in compute on cayman/TN
> and hangs in Heaven on evergreen.
> 
> Note: this patch is a candidate for the 9.1 branch, but requires:
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=156bcca62c9f4e79e78929f72bc085757f36a65a
> as well.
> 
> Signed-off-by: Alex Deucher 
> ---
>  src/gallium/drivers/r600/r600_hw_context.c |   26 --
>  1 files changed, 0 insertions(+), 26 deletions(-)
> 
> diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
> b/src/gallium/drivers/r600/r600_hw_context.c
> index 6d8b2cf..944b666 100644
> --- a/src/gallium/drivers/r600/r600_hw_context.c
> +++ b/src/gallium/drivers/r600/r600_hw_context.c
> @@ -226,32 +226,6 @@ void r600_flush_emit(struct r600_context *rctx)
>   if (rctx->flags & R600_CONTEXT_FLUSH_AND_INV) {
>   cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
>   cs->buf[cs->cdw++] = 
> EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
> - if (rctx->chip_class >= EVERGREEN) {
> - /* We were previously setting the CB and DB bits on
> -  * cp_coher_cntl, but this is unnecessary since
> -  * we are emitting the
> -  * EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
> -  * Setting the CB bits was causing lockups when using
> -  * compute on cayman.
> -  *
> -  * XXX: Do even need to emit a surface sync packet here?
> -  * Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
> -  * surface sync was not being emitted with the
> -  * R600_CONTEXT_FLUSH_AND_INV flag.
> -  */
> - cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
> - S_0085F0_DB_ACTION_ENA(1) |
> - S_0085F0_SH_ACTION_ENA(1) |
> - S_0085F0_SMX_ACTION_ENA(1) |
> - S_0085F0_FULL_CACHE_ENA(1);
> - } else {
> - cp_coher_cntl = S_0085F0_SMX_ACTION_ENA(1) |
> - S_0085F0_SH_ACTION_ENA(1) |
> - S_0085F0_VC_ACTION_ENA(1) |
> - S_0085F0_TC_ACTION_ENA(1) |
> - S_0085F0_FULL_CACHE_ENA(1);
> - }
> - emit_flush = 1;
>   }
>  
>   if (rctx->flags & R600_CONTEXT_INVAL_READ_CACHES) {
> -- 
> 1.7.7.5
> 


[Mesa-dev] [PATCH] r600g: don't emit surface_sync after FLUSH_AND_INV_EVENT

2013-05-03 Thread alexdeucher
From: Alex Deucher 

It shouldn't be needed since the FLUSH_AND_INV_EVENT has already
made sure the destination caches are flushed.  Additionally,
we didn't previously emit the surface_sync until this commit:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=e5e4c07e7964a3258ed02b530bcdc24c0650204b
Emitting them together causes hangs in compute on cayman/TN
and hangs in Heaven on evergreen.

Note: this patch is a candidate for the 9.1 branch, but requires:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=156bcca62c9f4e79e78929f72bc085757f36a65a
as well.

Signed-off-by: Alex Deucher 
---
 src/gallium/drivers/r600/r600_hw_context.c |   26 --
 1 files changed, 0 insertions(+), 26 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 6d8b2cf..944b666 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -226,32 +226,6 @@ void r600_flush_emit(struct r600_context *rctx)
if (rctx->flags & R600_CONTEXT_FLUSH_AND_INV) {
cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
cs->buf[cs->cdw++] = 
EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
-   if (rctx->chip_class >= EVERGREEN) {
-   /* We were previously setting the CB and DB bits on
-* cp_coher_cntl, but this is unnecessary since
-* we are emitting the
-* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
-* Setting the CB bits was causing lockups when using
-* compute on cayman.
-*
-* XXX: Do even need to emit a surface sync packet here?
-* Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
-* surface sync was not being emitted with the
-* R600_CONTEXT_FLUSH_AND_INV flag.
-*/
-   cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
-   S_0085F0_DB_ACTION_ENA(1) |
-   S_0085F0_SH_ACTION_ENA(1) |
-   S_0085F0_SMX_ACTION_ENA(1) |
-   S_0085F0_FULL_CACHE_ENA(1);
-   } else {
-   cp_coher_cntl = S_0085F0_SMX_ACTION_ENA(1) |
-   S_0085F0_SH_ACTION_ENA(1) |
-   S_0085F0_VC_ACTION_ENA(1) |
-   S_0085F0_TC_ACTION_ENA(1) |
-   S_0085F0_FULL_CACHE_ENA(1);
-   }
-   emit_flush = 1;
}
 
if (rctx->flags & R600_CONTEXT_INVAL_READ_CACHES) {
-- 
1.7.7.5



[Mesa-dev] [PATCH] r600g: Correctly initialize the shader key, v2

2013-05-03 Thread Lauri Kasanen
Assigning a struct only copies the members - any padding is left as is.

Thus this code:

struct foo_t foo;
foo = bar;

leaves the padding of foo intact, ie uninitialized random garbage.

This patch fixes constant shader recompiles by initializing the struct
to zero. For completeness, memcpy is used to copy the key to the shader
struct.
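
As a minimal sketch of the pattern (not the r600g code itself), the key can be cleared as a whole and then filled member by member, so that a bytewise comparison of two keys is meaningful. The struct and helper names below are hypothetical, and the sketch assumes, as GCC and Clang do in practice, that storing to a member leaves the padding bytes alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical key with padding: on common ABIs three padding bytes
 * follow 'alpha_to_one' so that 'nr_cbufs' is 4-byte aligned. */
struct shader_key {
    unsigned char alpha_to_one;
    unsigned nr_cbufs;
};

/* Clear the whole object (padding included), then store each member. */
static void shader_key_init(struct shader_key *key,
                            unsigned char alpha_to_one, unsigned nr_cbufs)
{
    assert(key != NULL);
    memset(key, 0, sizeof(*key));
    key->alpha_to_one = alpha_to_one;
    key->nr_cbufs = nr_cbufs;
}

/* Bytewise comparison; only reliable when the padding of both keys
 * was zeroed as in shader_key_init(). */
static bool shader_key_equal(const struct shader_key *a,
                             const struct shader_key *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

With uninitialized padding, two keys with identical members could compare unequal under memcmp, which is the kind of mismatch that would trigger a needless recompile.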

Signed-off-by: Lauri Kasanen 
---
 src/gallium/drivers/r600/r600_state_common.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c
b/src/gallium/drivers/r600/r600_state_common.c
index 87a2e2e..fbac576 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -715,6 +715,7 @@ static int r600_shader_select(struct pipe_context *ctx,
struct r600_pipe_shader * shader = NULL;
int r;
 
+   memset(&key, 0, sizeof(key));
key = r600_shader_selector_key(ctx, sel);
 
/* Check if we don't need to change anything.
@@ -762,7 +763,7 @@ static int r600_shader_select(struct pipe_context *ctx,
key = r600_shader_selector_key(ctx, sel);
}
 
-   shader->key = key;
+   memcpy(&shader->key, &key, sizeof(key));
sel->num_shaders++;
}
 
-- 
1.7.2.1



Re: [Mesa-dev] [PATCH] draw/pt: adjust overflow calculations

2013-05-03 Thread Jose Fonseca


- Original Message -
> On Thu, May 2, 2013 at 11:40 PM, Zack Rusin  wrote:
> > gallium lies. buffer_size is not actually buffer_size but available
> > size, which is 'buffer_size - buffer_offset' so by adding buffer
> > offset we'd incorrectly compute overflow.
> 
> Maybe add a comment to that effect in the code?

Yes, in the pipe_stream_output_target code.

Otherwise looks good to me Zack.
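
Something along these lines is what I would expect the comment to spell out. This is only a sketch: the struct is a hypothetical stand-in for the two fields in question, and so_write_overflows() is a made-up helper, not the draw module's function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the stream output target fields discussed. */
struct so_target {
    size_t buffer_offset; /* start of the target inside the buffer */
    size_t buffer_size;   /* bytes available *after* buffer_offset,
                           * not the size of the whole buffer */
};

/* Overflow check against the available size. buffer_offset must not be
 * added here, because buffer_size already excludes it. */
static bool so_write_overflows(const struct so_target *t,
                               size_t bytes_written, size_t bytes_to_write)
{
    assert(t != NULL);
    return bytes_written + bytes_to_write > t->buffer_size;
}
```

With buffer_offset = 64 and buffer_size = 32, writing 16 more bytes after 16 already written fits exactly; adding the offset to the left-hand side would wrongly report an overflow.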

Jose


Re: [Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

2013-05-03 Thread Roland Scheidegger
Yes, that's why I said it looks like separate low and high bits in
opencl. So in opencl you will get the low and high parts separately anyway.
If we have only one instruction, we would probably also want to be
able to say that we only need one or the other destination, to avoid
extra work, and I'm not sure we can do that easily without adding some
more code.

Roland


On 03.05.2013 15:05, Aaron Watry wrote:
> Not sure if this helps much, but...
> 
> With gentype being one of:
> char, uchar, short, ushort, int, uint, long, ulong, and the widths
> being scalar, 2, 3, 4, 8, or 16 components wide.
> 
> From the OpenCL 1.1 spec:
> gentype mul_hi(gentype x, gentype y):
> Computes x * y and returns the high half of the product of x and y
> 
> gentype mad_hi(gentype a, gentype b, gentype c):
> Returns mul_hi(a, b) + c
> 
> --Aaron
> 
> 
> On Fri, May 3, 2013 at 5:31 AM, Marek Olšák  wrote:
>> FWIW, this maps nicely to r600, which also has separate instructions
>> for the low and high 32 bits. As to what option is better, it really
>> depends on whether shading languages and OpenCL expose the
>> instructions directly through functions, or whether they just have
>> 64-bit integers.
>>
>> Marek
>>
>> On Fri, May 3, 2013 at 1:29 AM, Roland Scheidegger  
>> wrote:
>>> Currently, there's no way to get the high bits of a 32x32
>>> signed/unsigned integer multiplication with tgsi.
>>> However, all of d3d10, OpenGL, and OpenCL support that, so we need it as
>>> well.
>>> There's essentially two ways how it could be done:
>>> - a 2-destination instruction returning both high and low bits (this is
>>> how it looks like in d3d10 and glsl)
>>> - use the existing umul for the low bits and have another instruction
>>> for the high bits (this is how it looks like in opencl)
>>>
>>> Well there's other possibilities but these looked like they'd match both
>>> APIs and HW reasonably (well with the exception of things like sse2
>>> which would prefer 2x2 32bit inputs and return 2x64bit as one reg...).
>>>
>>> Actually it's two new instructions because unlike for the low bits it
>>> matters for the high bits if the source operands are signed or unsigned.
>>>
>>> Personally I'm favoring two separate instructions for low and high bits
>>> to not have to deal with multi-destination instructions, but if someone
>>> makes a strong case for one returning both low and high bits I could be
>>> convinced otherwise. I think though two instructions matches most hw
>>> very well (with the exception of software renderers and possibly intel
>>> graphics but then a good backend could certainly recognize this).
>>>
>>> So here's what the docs would say about these instructions:
>>>
>>>
>>> .. opcode:: IMUL_HI - Signed Integer Multiply High Bits
>>>
>>>The high 32bits of the multiplication of 2 signed integers is returned.
>>>
>>> .. math::
>>>
>>>   dst.x = (src0.x \times src1.x) >> 32
>>>
>>>   dst.y = (src0.y \times src1.y) >> 32
>>>
>>>   dst.z = (src0.z \times src1.z) >> 32
>>>
>>>   dst.w = (src0.w \times src1.w) >> 32
>>>
>>>
>>> .. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
>>>
>>>The high 32bits of the multiplication of 2 unsigned integers is returned.
>>>
>>> .. math::
>>>
>>>   dst.x = (src0.x \times src1.x) >> 32
>>>
>>>   dst.y = (src0.y \times src1.y) >> 32
>>>
>>>   dst.z = (src0.z \times src1.z) >> 32
>>>
>>>   dst.w = (src0.w \times src1.w) >> 32
>>>
>>>
>>> Roland


Re: [Mesa-dev] [PATCH 1/2] gallivm: allow negation of all integer types

2013-05-03 Thread Jose Fonseca
- Original Message -
> > Sorry to hear the hw doesn't support it, but this is supported by d3d10
> > so it's quite likely some hw indeed supports it. There's always some
> > things some hw can't do natively.
> 
> Well I was hoping before adding new things for sw driver to gallium we
> should confirm if it
> makes sense for hw.

I think that's what we've been doing all along.

Integer modifier support was posted for review, and at the time you observed
that R600 didn't support it
( http://lists.freedesktop.org/archives/mesa-dev/2013-February/034639.html ),
but it didn't sound like you objected, and nobody else came forward to weigh
in one way or the other.  If we made a bad decision we can always revert it,
but it's not as if we are pushing our own agenda without taking other
opinions into consideration.

D3D10 is an abstraction that many HW vendors follow closely, so sticking with
it when in doubt is IMO a good rule of thumb.

> > I'd be fine to just restrict it to some opcodes, e.g. exclude logic ops
> > if that helps you. Note that tgsi (and d3d10 for that matter) also for
> > instance don't have a integer SUB opcode, so the natural way for
> > encoding that would of course just be with a ADD with negate modifier on
> > second arg, and I bet your hw could do that one.
> 
> Nope, all r600 hw has SUB_INT. The negate bit is always specified to
> only operate
> on float instructions.

If we decide to forbid integer modifiers, then maybe we should have an ISUB.



Anyway, I don't feel too strongly about integer modifiers either way.

If no hardware supports it natively, then I'm fine passing on the 
responsibility to the state tracker.  It would be nice to know what other HW 
besides R600 does.

But when in doubt, sticking with existing abstractions, or leaning towards the
most generic representation, are IMO good rules of thumb.  Even a very dumb
TGSI translator can easily handle the negative integer modifier by emitting an
auxiliary instruction, whereas the opposite -- combining an INEG with other
operations -- requires more complex optimization passes.
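
To illustrate how little the "dumb translator" direction needs, here is a rough sketch over a made-up toy IR. The types, opcodes and lower_int_negate() are hypothetical and are not TGSI's actual data structures; the point is only that each negated integer source becomes one extra INEG into a fresh temporary.

```c
#include <assert.h>
#include <stddef.h>

/* Toy IR, purely illustrative. */
enum opcode { OP_IADD, OP_INEG };

struct operand {
    int reg;
    int negate; /* source modifier: use -reg instead of reg */
};

struct inst {
    enum opcode op;
    int dst;
    int num_src;
    struct operand src[2];
};

/* Rewrite 'in' so that no source carries a negate modifier. Writes up to
 * three instructions to 'out' and returns how many were written. */
static size_t lower_int_negate(const struct inst *in, struct inst out[3],
                               int *next_temp)
{
    size_t n = 0;
    struct inst lowered = *in;

    assert(in->num_src >= 0 && in->num_src <= 2);

    for (int i = 0; i < in->num_src; i++) {
        if (!in->src[i].negate)
            continue;

        /* Emit "INEG tmp, src" and point the original at tmp. */
        struct inst neg = { 0 };
        neg.op = OP_INEG;
        neg.dst = (*next_temp)++;
        neg.num_src = 1;
        neg.src[0].reg = in->src[i].reg;
        out[n++] = neg;

        lowered.src[i].reg = neg.dst;
        lowered.src[i].negate = 0;
    }

    out[n++] = lowered;
    return n;
}
```

Going the other way, folding a standalone INEG back into the modifier of its users, needs use/def information and a check that the temporary has no other readers, which is why I consider it the harder direction.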


Jose


Re: [Mesa-dev] [PATCH 3/3] r600g: Don't set the dest cache bits on surface sync for R600_CONTEXT_FLUSH_AND_INV

2013-05-03 Thread Vadim Girlin

On 05/03/2013 05:36 PM, Alex Deucher wrote:
> On Fri, May 3, 2013 at 9:30 AM, Vadim Girlin  wrote:
>> This patch results in lockups with Heaven on juniper for me.
>
> Does dropping the surface_sync packet completely help?  We shouldn't
> need a surface_sync packet after a CACHE_FLUSH_AND_INV_EVENT packet
> and prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b we never emitted
> it.

Yes, this patch fixed it.

Vadim

> Alex

diff --git a/src/gallium/drivers/r600/r600_hw_context.c
b/src/gallium/drivers/r600/r600_hw_context.c
index 6d8b2cf..944b666 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -226,32 +226,6 @@ void r600_flush_emit(struct r600_context *rctx)
 if (rctx->flags & R600_CONTEXT_FLUSH_AND_INV) {
 cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
 cs->buf[cs->cdw++] =
EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
-   if (rctx->chip_class >= EVERGREEN) {
-   /* We were previously setting the CB and DB bits on
-* cp_coher_cntl, but this is unnecessary since
-* we are emitting the
-* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
-* Setting the CB bits was causing lockups when using
-* compute on cayman.
-*
-* XXX: Do even need to emit a surface sync packet here?
-* Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
-* surface sync was not being emitted with the
-* R600_CONTEXT_FLUSH_AND_INV flag.
-*/
-   cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
-   S_0085F0_DB_ACTION_ENA(1) |
-   S_0085F0_SH_ACTION_ENA(1) |
-   S_0085F0_SMX_ACTION_ENA(1) |
-   S_0085F0_FULL_CACHE_ENA(1);
-   } else {
-   cp_coher_cntl = S_0085F0_SMX_ACTION_ENA(1) |
-   S_0085F0_SH_ACTION_ENA(1) |
-   S_0085F0_VC_ACTION_ENA(1) |
-   S_0085F0_TC_ACTION_ENA(1) |
-   S_0085F0_FULL_CACHE_ENA(1);
-   }
-   emit_flush = 1;
 }

 if (rctx->flags & R600_CONTEXT_INVAL_READ_CACHES) {




Vadim



On 04/26/2013 09:21 PM, Tom Stellard wrote:


From: Tom Stellard 

We are already emitting a EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet
when this flush flag is set, so flushing the dest caches with a
SURFACE_SYNC should not be necessary.

The motivation for this change is that emitting a SURFACE_SYNC packet with
the CB bits set was causing compute shaders to hang on Cayman.
---
   src/gallium/drivers/r600/r600_hw_context.c | 28
+---
   1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c
b/src/gallium/drivers/r600/r600_hw_context.c
index b4fb3bf..8aebd25 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -231,21 +231,19 @@ void r600_flush_emit(struct r600_context *rctx)
 cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
 cs->buf[cs->cdw++] =
EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
 if (rctx->chip_class >= EVERGREEN) {
-   cp_coher_cntl = S_0085F0_CB0_DEST_BASE_ENA(1) |
-   S_0085F0_CB1_DEST_BASE_ENA(1) |
-   S_0085F0_CB2_DEST_BASE_ENA(1) |
-   S_0085F0_CB3_DEST_BASE_ENA(1) |
-   S_0085F0_CB4_DEST_BASE_ENA(1) |
-   S_0085F0_CB5_DEST_BASE_ENA(1) |
-   S_0085F0_CB6_DEST_BASE_ENA(1) |
-   S_0085F0_CB7_DEST_BASE_ENA(1) |
-   S_0085F0_CB8_DEST_BASE_ENA(1) |
-   S_0085F0_CB9_DEST_BASE_ENA(1) |
-   S_0085F0_CB10_DEST_BASE_ENA(1) |
-   S_0085F0_CB11_DEST_BASE_ENA(1) |
-   S_0085F0_DB_DEST_BASE_ENA(1) |
-   S_0085F0_TC_ACTION_ENA(1) |
-   S_0085F0_CB_ACTION_ENA(1) |
+   /* We were previously setting the CB and DB bits
on
+* cp_coher_cntl, but this is unnecessary since
+* we are emitting the
+* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
+* Setting the 

Re: [Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

2013-05-03 Thread Roland Scheidegger
On 03.05.2013 06:58, Jose Fonseca wrote:
> 
> 
> - Original Message -
>> Currently, there's no way to get the high bits of a 32x32 
>> signed/unsigned integer multiplication with tgsi. However, all of
>> d3d10, OpenGL, and OpenCL support that, so we need it as well. 
>> There's essentially two ways how it could be done: - a
>> 2-destination instruction returning both high and low bits (this
>> is how it looks like in d3d10 and glsl) - use the existing umul for
>> the low bits and have another instruction for the high bits (this
>> is how it looks like in opencl)
>> 
>> Well there's other possibilities but these looked like they'd match
>> both APIs and HW reasonably (well with the exception of things like
>> sse2 which would prefer 2x2 32bit inputs and return 2x64bit as one
>> reg...).
>> 
>> Actually it's two new instructions because unlike for the low bits
>> it matters for the high bits if the source operands are signed or
>> unsigned.
>> 
>> Personally I'm favoring two separate instructions for low and high
>> bits to not have to deal with multi-destination instructions, but
>> if someone makes a strong case for one returning both low and high
>> bits I could be convinced otherwise. I think though two
>> instructions matches most hw very well (with the exception of
>> software renderers and possibly intel graphics but then a good
>> backend could certainly recognize this).
> 
> Roland,
> 
> I don't know about GPU HW, but I think that what you propose will
> forever prevent decent SSE code generation with LLVM.
> 
> Using two separate opcodes for hi/low bits relies on common
> sub-expression elimination to merge the two multiplication operations
> back into one.  But I strongly doubt that even LLVM's optimization
> passes will be able to do that.
> 
> Getting the 64bit results with LLVM will require sign-extending the
> source arguments (http://llvm.org/docs/LangRef.html#mul-instruction )
> or SSE intrinsics. Either way, the expressions for the low and high
> bits will be radically different, so we'll end up with two multiplies
> in the end -- which I think is simply inadmissible -- TGSI should not
> stand in the way of backends generating good code.
You can't generate good code either way; this is a deficiency of the sse
instruction set.
As I've outlined in another email, I think the best you can do with
sse41 is:
- shuffle both src args (put 2nd/4th elements into 1st/3rd slot)
- 2xpmuldq/pmuludq for doing the 32x32->64bit mul for both 1st/3rd and
2nd/4th element
- shuffle the high bits into place (I think this needs 3 hw shuffle
instructions)
- shuffle the low bits into place (can benefit from the shuffles for the
high bits, so just one more shuffle)

Maybe you can do better with more clever shuffles, but in any case the
low bits will always require (at least) one additional shuffle.

If you have separate opcodes, everything will be the same, except that in
the last step you skip that shuffle and instead use the pmulld
instruction, which does exactly what you need for the low bits. Sure,
multiplications are more effort for the hw, but hell, it even has the same
throughput as a shuffle on most cpus, just worse latency. In any case it
would be 8 vs 8 instructions, with just one of them very slightly worse.
We have much bigger optimization opportunities elsewhere (I agree that
with sse2, which lacks pmulld, it would be worse, but we never
particularly cared about that).
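
For reference, the per-channel semantics under discussion (not the SSE sequence itself) can be modeled in scalar C roughly as below. The helper names mul_lo, umul_hi and imul_hi are made up for illustration, and the signed variant assumes an arithmetic right shift of negative values, which is implementation-defined in C but what GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the product: the same bits come out for signed and
 * unsigned inputs, so one low-bits multiply serves both. */
static uint32_t mul_lo(uint32_t a, uint32_t b)
{
    return a * b; /* wraps modulo 2^32 */
}

/* High 32 bits of the unsigned 32x32 -> 64 bit product. */
static uint32_t umul_hi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits of the signed 32x32 -> 64 bit product
 * (assumes arithmetic right shift, see above). */
static int32_t imul_hi(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Same bit patterns read as unsigned and as signed: the low halves
 * agree, the high halves do not. */
static void mul_hi_selfcheck(void)
{
    assert(mul_lo(0xFFFFFFFEu, 3u) == 0xFFFFFFFAu);
    assert(mul_lo((uint32_t)-2, 3u) == 0xFFFFFFFAu);
    assert(umul_hi(0xFFFFFFFEu, 3u) == 2u);
    assert(imul_hi(-2, 3) == -1);
}
```

This is why only the high-bits opcode needs a signed and an unsigned flavour, while the existing low-bits multiply can be shared.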



> 
> So I strongly think this is a bad idea. TGSI has support for multiple
> destinations, though we never made much use of it. I see nothing
> special about it.
> 
> If you can prove me wrong -- that LLVM can handle merging the
> multiplies -- fine.  But I do think we have bigger fish to fry, so
> I'd prefer we don't put too much time debating this.

No I doubt llvm can merge it (though in theory nothing would prevent it
from recognizing the pattern). My guess is it will do scalar extraction,
and use the imul/mul instructions (which can return 2x32bit numbers even
on 32bit), then combine the vectors back together (most likely element
by element). If it actually does it like that, a separate mul for the
low bits would in fact be a win, because it would save the 4 reinsertions
of the elements at the cost of just one vector mul (llvm uses pmulld
just fine). But looking at it that way doesn't really make sense; we
need instructions which make sense for everybody and aren't specified to
suit one very peculiar implementation.
But even if it generates optimal code, fact is that the multiply for
getting the low bits is essentially noise in the whole instruction
sequence. And who knows maybe intel will one day add some pmulhd/pmulhud
instruction (which just makes plain more sense for vector instruction
sets rather than the weird expanding muls).
So I really don't see how that will prevent good code from being
generated. Yes it will be one more multiplication (3 instead of 2 if
doing everything vectorized) bu

Re: [Mesa-dev] [PATCH 3/3] r600g: Don't set the dest cache bits on surface sync for R600_CONTEXT_FLUSH_AND_INV

2013-05-03 Thread Alex Deucher
On Fri, May 3, 2013 at 9:30 AM, Vadim Girlin  wrote:
> This patch results in lockups with Heaven on juniper for me.

Does dropping the surface_sync packet completely help?  We shouldn't
need a surface_sync packet after a CACHE_FLUSH_AND_INV_EVENT packet
and prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b we never emitted
it.

Alex

diff --git a/src/gallium/drivers/r600/r600_hw_context.c
b/src/gallium/drivers/r600/r600_hw_context.c
index 6d8b2cf..944b666 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -226,32 +226,6 @@ void r600_flush_emit(struct r600_context *rctx)
if (rctx->flags & R600_CONTEXT_FLUSH_AND_INV) {
cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
cs->buf[cs->cdw++] =
EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
-   if (rctx->chip_class >= EVERGREEN) {
-   /* We were previously setting the CB and DB bits on
-* cp_coher_cntl, but this is unnecessary since
-* we are emitting the
-* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
-* Setting the CB bits was causing lockups when using
-* compute on cayman.
-*
-* XXX: Do even need to emit a surface sync packet here?
-* Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
-* surface sync was not being emitted with the
-* R600_CONTEXT_FLUSH_AND_INV flag.
-*/
-   cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
-   S_0085F0_DB_ACTION_ENA(1) |
-   S_0085F0_SH_ACTION_ENA(1) |
-   S_0085F0_SMX_ACTION_ENA(1) |
-   S_0085F0_FULL_CACHE_ENA(1);
-   } else {
-   cp_coher_cntl = S_0085F0_SMX_ACTION_ENA(1) |
-   S_0085F0_SH_ACTION_ENA(1) |
-   S_0085F0_VC_ACTION_ENA(1) |
-   S_0085F0_TC_ACTION_ENA(1) |
-   S_0085F0_FULL_CACHE_ENA(1);
-   }
-   emit_flush = 1;
}

if (rctx->flags & R600_CONTEXT_INVAL_READ_CACHES) {


>
> Vadim
>
>
>
> On 04/26/2013 09:21 PM, Tom Stellard wrote:
>>
>> From: Tom Stellard 
>>
>> We are already emitting a EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet
>> when this flush flag is set, so flushing the dest caches with a
>> SURFACE_SYNC should not be necessary.
>>
>> The motivation for this change is that emitting a SURFACE_SYNC packet with
>> the CB bits set was causing compute shaders to hang on Cayman.
>> ---
>>   src/gallium/drivers/r600/r600_hw_context.c | 28
>> +---
>>   1 file changed, 13 insertions(+), 15 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_hw_context.c
>> b/src/gallium/drivers/r600/r600_hw_context.c
>> index b4fb3bf..8aebd25 100644
>> --- a/src/gallium/drivers/r600/r600_hw_context.c
>> +++ b/src/gallium/drivers/r600/r600_hw_context.c
>> @@ -231,21 +231,19 @@ void r600_flush_emit(struct r600_context *rctx)
>> cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
>> cs->buf[cs->cdw++] =
>> EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
>> if (rctx->chip_class >= EVERGREEN) {
>> -   cp_coher_cntl = S_0085F0_CB0_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB1_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB2_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB3_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB4_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB5_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB6_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB7_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB8_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB9_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB10_DEST_BASE_ENA(1) |
>> -   S_0085F0_CB11_DEST_BASE_ENA(1) |
>> -   S_0085F0_DB_DEST_BASE_ENA(1) |
>> -   S_0085F0_TC_ACTION_ENA(1) |
>> -   S_0085F0_CB_ACTION_ENA(1) |
>> +   /* We were previously setting the CB and DB bits
>> on
>> +* cp_coher_cntl, but this is unnecessary since
>> +* we are emitting the
>> +* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet

Re: [Mesa-dev] [PATCH 3/3] r600g: Don't set the dest cache bits on surface sync for R600_CONTEXT_FLUSH_AND_INV

2013-05-03 Thread Vadim Girlin

This patch results in lockups with Heaven on juniper for me.

Vadim


On 04/26/2013 09:21 PM, Tom Stellard wrote:

From: Tom Stellard 

We are already emitting a EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet
when this flush flag is set, so flushing the dest caches with a
SURFACE_SYNC should not be necessary.

The motivation for this change is that emitting a SURFACE_SYNC packet with
the CB bits set was causing compute shaders to hang on Cayman.
---
  src/gallium/drivers/r600/r600_hw_context.c | 28 +---
  1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index b4fb3bf..8aebd25 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -231,21 +231,19 @@ void r600_flush_emit(struct r600_context *rctx)
cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
cs->buf[cs->cdw++] = 
EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
if (rctx->chip_class >= EVERGREEN) {
-   cp_coher_cntl = S_0085F0_CB0_DEST_BASE_ENA(1) |
-   S_0085F0_CB1_DEST_BASE_ENA(1) |
-   S_0085F0_CB2_DEST_BASE_ENA(1) |
-   S_0085F0_CB3_DEST_BASE_ENA(1) |
-   S_0085F0_CB4_DEST_BASE_ENA(1) |
-   S_0085F0_CB5_DEST_BASE_ENA(1) |
-   S_0085F0_CB6_DEST_BASE_ENA(1) |
-   S_0085F0_CB7_DEST_BASE_ENA(1) |
-   S_0085F0_CB8_DEST_BASE_ENA(1) |
-   S_0085F0_CB9_DEST_BASE_ENA(1) |
-   S_0085F0_CB10_DEST_BASE_ENA(1) |
-   S_0085F0_CB11_DEST_BASE_ENA(1) |
-   S_0085F0_DB_DEST_BASE_ENA(1) |
-   S_0085F0_TC_ACTION_ENA(1) |
-   S_0085F0_CB_ACTION_ENA(1) |
+   /* We were previously setting the CB and DB bits on
+* cp_coher_cntl, but this is unnecessary since
+* we are emitting the
+* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
+* Setting the CB bits was causing lockups when using
+* compute on cayman.
+*
+* XXX: Do even need to emit a surface sync packet here?
+* Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
+* surface sync was not being emitted with the
+* R600_CONTEXT_FLUSH_AND_INV flag.
+*/
+   cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
S_0085F0_DB_ACTION_ENA(1) |
S_0085F0_SH_ACTION_ENA(1) |
S_0085F0_SMX_ACTION_ENA(1) |



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] gallium: tgsi documentation updates and clarification for integer opcodes.

2013-05-03 Thread Jose Fonseca


- Original Message -
> From: Roland Scheidegger 
> 
> A lot of them were missing. Others were moved from the Compute ISA
> to a new Integer ISA section as that seemed more appropriate.
> ---
>  src/gallium/docs/source/tgsi.rst |  362
>  ++
>  1 file changed, 289 insertions(+), 73 deletions(-)
> 
> diff --git a/src/gallium/docs/source/tgsi.rst
> b/src/gallium/docs/source/tgsi.rst
> index a528fd2..b7caf63 100644
> --- a/src/gallium/docs/source/tgsi.rst
> +++ b/src/gallium/docs/source/tgsi.rst
> @@ -872,6 +872,16 @@ This instruction replicates its result.
>as an integer register.
>  
>  
> +.. opcode:: CONT - Continue
> +
> +  TBD
> +
> +.. note::
> +
> +   Support for CONT is determined by a special capability bit,
> +   ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
> +
> +
>  .. opcode:: IF - Float If
>  
>Start an IF ... ELSE .. ENDIF block.  Condition evaluates to true if
> @@ -977,6 +987,7 @@ These opcodes are primarily provided for special-use
> computational shaders.
>  Support for these opcodes indicated by a special pipe capability bit (TBD).
>  
>  XXX so let's discuss it, yeah?
> +XXX doesn't look like most of the opcodes really belong here.
>  
>  .. opcode:: CEIL - Ceiling
>  
> @@ -991,7 +1002,89 @@ XXX so let's discuss it, yeah?
>dst.w = \lceil src.w\rceil
>  
>  
> -.. opcode:: I2F - Integer To Float
> +.. opcode:: TRUNC - Truncate
> +
> +.. math::
> +
> +  dst.x = trunc(src.x)
> +
> +  dst.y = trunc(src.y)
> +
> +  dst.z = trunc(src.z)
> +
> +  dst.w = trunc(src.w)
> +
> +
> +.. opcode:: MOD - Modulus
> +
> +.. math::
> +
> +  dst.x = src0.x \bmod src1.x
> +
> +  dst.y = src0.y \bmod src1.y
> +
> +  dst.z = src0.z \bmod src1.z
> +
> +  dst.w = src0.w \bmod src1.w
> +
> +
> +.. opcode:: UARL - Integer Address Register Load
> +
> +  Moves the contents of the source register, assumed to be an integer, into
> the
> +  destination register, which is assumed to be an address (ADDR) register.
> +
> +
> +.. opcode:: SAD - Sum Of Absolute Differences
> +
> +.. math::
> +
> +  dst.x = |src0.x - src1.x| + src2.x
> +
> +  dst.y = |src0.y - src1.y| + src2.y
> +
> +  dst.z = |src0.z - src1.z| + src2.z
> +
> +  dst.w = |src0.w - src1.w| + src2.w
> +
> +
> +.. opcode:: TXF - Texel Fetch (as per NV_gpu_shader4), extract a single
> texel
> +  from a specified texture image. The source sampler may
> +   not be a CUBE or SHADOW.
> +  src 0 is a four-component signed integer vector used to
> +   identify the single texel accessed. 3 components + level.
> +   src 1 is a 3 component constant signed integer vector,
> +   with each component only have a range of
> +   -8..+8 (hw only seems to deal with this range, interface
> +   allows for up to unsigned int).
> +   TXF(uint_vec coord, int_vec offset).
> +
> +
> +.. opcode:: TXQ - Texture Size Query (as per NV_gpu_program4)
> +  retrieve the dimensions of the texture
> +  depending on the target. For 1D (width), 2D/RECT/CUBE
> +   (width, height), 3D (width, height, depth),
> +   1D array (width, layers), 2D array (width, height, layers)
> +
> +.. math::
> +
> +  lod = src0


  src0.x ?

Otherwise looks good. Thanks for taking the time to clean these up.

Jose

> +
> +  dst.x = texture_width(unit, lod)
> +
> +  dst.y = texture_height(unit, lod)
> +
> +  dst.z = texture_depth(unit, lod)
> +
> +
> +Integer ISA
> +
> +These opcodes are used for integer operations.
> +Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of
> them?)
> +
> +
> +.. opcode:: I2F - Signed Integer To Float
> +
> +   Rounding is unspecified (round to nearest even suggested).
>  
>  .. math::
>  
> @@ -1004,56 +1097,157 @@ XXX so let's discuss it, yeah?
>dst.w = (float) src.w
>  
>  
> -.. opcode:: NOT - Bitwise Not
> +.. opcode:: U2F - Unsigned Integer To Float
> +
> +   Rounding is unspecified (round to nearest even suggested).
>  
>  .. math::
>  
> -  dst.x = ~src.x
> +  dst.x = (float) src.x
>  
> -  dst.y = ~src.y
> +  dst.y = (float) src.y
>  
> -  dst.z = ~src.z
> +  dst.z = (float) src.z
>  
> -  dst.w = ~src.w
> +  dst.w = (float) src.w
>  
>  
> -.. opcode:: TRUNC - Truncate
> +.. opcode:: F2I - Float to Signed Integer
> +
> +   Rounding is towards zero (truncate).
> +   Values outside signed range (including NaNs) produce undefined results.
>  
>  .. math::
>  
> -  dst.x = trunc(src.x)
> +  dst.x = (int) src.x
>  
> -  dst.y = trunc(src.y)
> +  dst.y = (int) src.y
>  
> -  dst.z = trunc(src.z)
> +  dst.z = (int) src.z
>  
> -  dst.w = trunc(src.w)
> +  dst.w = (int) src.w
>  
>  
> -.. opcode:: SHL - Shift Left
> +.. opcode:: F2U - Float to Unsigned Integer
> +
> +   Rounding is towards zero (truncate).
> +   Values outside unsigned range (including NaNs) produce undefined results.
>  
>  ..

Re: [Mesa-dev] [PATCH] tgsi/ureg: make the dst register match the src indirection

2013-05-03 Thread Jose Fonseca
- Original Message -
> In ureg src registers could have an indirect register that was
> either a temp or an addr register, while dst registers allowed
> only addr. That made moving between them a little difficult so
> make them behave the same way and allow temp's and addr registers
> as indirect files for both (tgsi supports it, just ureg didn't).
> 
> Signed-off-by: Zack Rusin 
> ---
>  src/gallium/auxiliary/tgsi/tgsi_ureg.c |3 ++-
>  src/gallium/auxiliary/tgsi/tgsi_ureg.h |   14 ++
>  2 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> index 5a28b89..432ed00 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> @@ -258,6 +258,7 @@ ureg_dst_register( unsigned file,
> dst.File  = file;
> dst.WriteMask = TGSI_WRITEMASK_XYZW;
> dst.Indirect  = 0;
> +   dst.IndirectFile = TGSI_FILE_NULL;
> dst.IndirectIndex = 0;
> dst.IndirectSwizzle = 0;
> dst.Saturate  = 0;
> @@ -943,7 +944,7 @@ ureg_emit_dst( struct ureg_program *ureg,
> 
> if (dst.Indirect) {
>out[n].value = 0;
> -  out[n].ind.File = TGSI_FILE_ADDRESS;
> +  out[n].ind.File = dst.IndirectFile;
>out[n].ind.Swizzle = dst.IndirectSwizzle;
>out[n].ind.Index = dst.IndirectIndex;
>out[n].ind.ArrayID = dst.ArrayID;
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> index 4ad0242..1d7923b 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> @@ -83,7 +83,8 @@ struct ureg_dst
> unsigned PredSwizzleZ: 2;  /* TGSI_SWIZZLE_ */
> unsigned PredSwizzleW: 2;  /* TGSI_SWIZZLE_ */
> int  Index   : 16; /* SINT */
> -   int  IndirectIndex   : 16; /* SINT */
> +   int  IndirectIndex   : 12; /* SINT */

I think it's better to keep IndirectIndex the same size, to match ureg_src.

ureg_src is already bigger than 64 bits, and things still work, so I see no
benefit in handicapping ureg_dst.

Otherwise looks good to me.

Jose


Re: [Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

2013-05-03 Thread Aaron Watry
Not sure if this helps much, but...

With gentype being one of:
char, uchar, short, ushort, int, uint, long, ulong, and the widths
being scalar, 2, 3, 4, 8, or 16 components wide.

From the OpenCL 1.1 spec:
gentype mul_hi (gentype x, gentype y)
Computes x * y and returns the high half of the product of x and y.

gentype mad_hi (gentype a, gentype b, gentype c)
Returns mul_hi(a, b) + c.
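
For the 32-bit scalar case, the mul_hi/mad_hi semantics above can be
sketched in plain C by widening to 64 bits. This is only an illustration,
not how any driver or backend implements it. The function names
(umul_hi32, imul_hi32, umad_hi32) are made up for this example, and the
signed variant assumes the compiler does an arithmetic right shift on
negative values, which C leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical helper: high 32 bits of an unsigned 32x32 multiply. */
static uint32_t umul_hi32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* Hypothetical helper: high 32 bits of a signed 32x32 multiply.
 * Assumes >> on a negative int64_t is an arithmetic shift. */
static int32_t imul_hi32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* mad_hi for unsigned 32-bit values: mul_hi(a, b) + c, wrapping modulo 2^32. */
static uint32_t umad_hi32(uint32_t a, uint32_t b, uint32_t c)
{
    return umul_hi32(a, b) + c;
}
```

The low 32 bits of the product are the same for signed and unsigned
operands, so only the high half needs a signed and an unsigned variant.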

--Aaron


On Fri, May 3, 2013 at 5:31 AM, Marek Olšák  wrote:
> FWIW, this maps nicely to r600, which also has separate instructions
> for the low and high 32 bits. As to what option is better, it really
> depends on whether shading languages and OpenCL expose the
> instructions directly through functions, or whether they just have
> 64-bit integers.
>
> Marek
>
> On Fri, May 3, 2013 at 1:29 AM, Roland Scheidegger  wrote:
>> Currently, there's no way to get the high bits of a 32x32
>> signed/unsigned integer multiplication with tgsi.
>> However, all of d3d10, OpenGL, and OpenCL support that, so we need it as
>> well.
>> There's essentially two ways how it could be done:
>> - a 2-destination instruction returning both high and low bits (this is
>> how it looks like in d3d10 and glsl)
>> - use the existing umul for the low bits and have another instruction
>> for the high bits (this is how it looks like in opencl)
>>
>> Well there's other possibilities but these looked like they'd match both
>> APIs and HW reasonably (well with the exception of things like sse2
>> which would prefer 2x2 32bit inputs and return 2x64bit as one reg...).
>>
>> Actually it's two new instructions because unlike for the low bits it
>> matters for the high bits if the source operands are signed or unsigned.
>>
>> Personally I'm favoring two separate instructions for low and high bits
>> to not have to deal with multi-destination instructions, but if someone
>> makes a strong case for one returning both low and high bits I could be
>> convinced otherwise. I think though two instructions matches most hw
>> very well (with the exception of software renderers and possibly intel
>> graphics but then a good backend could certainly recognize this).
>>
>> So here's what the docs would say about these instructions:
>>
>>
>> .. opcode:: IMUL_HI - Signed Integer Multiply High Bits
>>
>>The high 32bits of the multiplication of 2 signed integers is returned.
>>
>> .. math::
>>
>>   dst.x = src0.x \times src1.x >> 32
>>
>>   dst.y = src0.y \times src1.y >> 32
>>
>>   dst.z = src0.z \times src1.z >> 32
>>
>>   dst.w = src0.w \times src1.w >> 32
>>
>>
>> .. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
>>
>>The high 32bits of the multiplication of 2 unsigned integers is returned.
>>
>> .. math::
>>
>>   dst.x = src0.x \times src1.x >> 32
>>
>>   dst.y = src0.y \times src1.y >> 32
>>
>>   dst.z = src0.z \times src1.z >> 32
>>
>>   dst.w = src0.w \times src1.w >> 32
>>
>>
>> Roland


Re: [Mesa-dev] [PATCH] draw/pt: adjust overflow calculations

2013-05-03 Thread Alex Deucher
On Thu, May 2, 2013 at 11:40 PM, Zack Rusin  wrote:
> gallium lies. buffer_size is not actually buffer_size but available
> size, which is 'buffer_size - buffer_offset' so by adding buffer
> offset we'd incorrectly compute overflow.

Maybe add a comment to that effect in the code?
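
As a rough sketch of what such a comment could sit next to, here is a
simplified overflow check written under the assumption stated in the
commit message, namely that buffer_size already means "bytes available
past buffer_offset". The struct and function below (demo_so_target,
demo_so_would_overflow) are hypothetical stand-ins, not the real draw
module types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stream-output target. */
struct demo_so_target {
    size_t buffer_size;     /* bytes available past buffer_offset,
                             * NOT the size of the whole buffer */
    size_t internal_offset; /* bytes already written into that window */
};

/* Returns true if writing 'bytes' more bytes would run past the window.
 * buffer_size already excludes buffer_offset, so the offset must not be
 * added a second time here. */
static bool demo_so_would_overflow(const struct demo_so_target *t, size_t bytes)
{
    assert(t != NULL);
    if (t->internal_offset > t->buffer_size)
        return true;
    return bytes > t->buffer_size - t->internal_offset;
}
```

Comparing against the remaining space rather than summing the two values
also avoids wrap-around in the size_t arithmetic.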

Alex

>
> Signed-off-by: Zack Rusin 
> ---
>  src/gallium/auxiliary/draw/draw_pt_so_emit.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/src/gallium/auxiliary/draw/draw_pt_so_emit.c 
> b/src/gallium/auxiliary/draw/draw_pt_so_emit.c
> index 9b55773..d624a99 100644
> --- a/src/gallium/auxiliary/draw/draw_pt_so_emit.c
> +++ b/src/gallium/auxiliary/draw/draw_pt_so_emit.c
> @@ -131,8 +131,7 @@ static void so_emit_prim(struct pt_so_emit *so,
> for (i = 0; i < draw->so.num_targets; i++) {
>struct draw_so_target *target = draw->so.targets[i];
>if (target) {
> - buffer_total_bytes[i] = target->internal_offset +
> -target->target.buffer_offset;
> + buffer_total_bytes[i] = target->internal_offset;
>} else {
>   buffer_total_bytes[i] = 0;
>}
> --
> 1.7.10.4


Re: [Mesa-dev] [PATCH] r600g: Correctly initialize the shader key

2013-05-03 Thread Vadim Girlin

On 05/03/2013 03:10 PM, Lauri Kasanen wrote:

Assigning a struct only copies the members - any padding is left as is.

Thus this code:

struct foo;
foo = bar;

leaves the padding of foo intact, ie uninitialized random garbage.

This patch fixes constant shader recompiles by initializing the struct
to zero.

Signed-off-by: Lauri Kasanen 
---
  src/gallium/drivers/r600/r600_state_common.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c
b/src/gallium/drivers/r600/r600_state_common.c index 87a2e2e..bf7cc39
100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -710,7 +710,7 @@ static int r600_shader_select(struct pipe_context
*ctx, struct r600_pipe_shader_selector* sel,
  bool *dirty)
  {
-   struct r600_shader_key key;
+   struct r600_shader_key key = {0};


I suspect the effect of this initialization on padding is undefined. 
Probably it's safer to use memset.
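
A minimal sketch of the memset approach, assuming the key is later
compared byte-wise (for example with memcmp). The struct and helpers
below (demo_key, demo_key_init, demo_key_equal) are hypothetical and only
stand in for the real r600_shader_key handling. Strictly speaking the C
standard lets padding bytes take unspecified values when a member is
stored, so this relies on the usual compiler behaviour of leaving padding
alone.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a shader key; not the real r600_shader_key.
 * On common ABIs there are padding bytes after 'two_side'. */
struct demo_key {
    unsigned char two_side;
    unsigned int  nr_cbufs;
};

/* Zero the whole object, padding included, then fill in the members. */
static void demo_key_init(struct demo_key *key,
                          unsigned char two_side, unsigned int nr_cbufs)
{
    assert(key != NULL);
    memset(key, 0, sizeof(*key));
    key->two_side = two_side;
    key->nr_cbufs = nr_cbufs;
}

/* Byte-wise comparison, as a lookup keyed on the raw struct might do. */
static int demo_key_equal(const struct demo_key *a, const struct demo_key *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

With both keys initialized this way, two keys with equal members also
compare equal byte for byte, whatever was on the stack beforehand.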


Vadim


struct r600_context *rctx = (struct r600_context *)ctx;
struct r600_pipe_shader * shader = NULL;
int r;





Re: [Mesa-dev] r600 sb test results

2013-05-03 Thread Lauri Kasanen
On Fri, 03 May 2013 01:27:27 +0400
Vadim Girlin  wrote:

> I'm almost sure that the same issue that you have with glxgears affects 
> your app too, so you might want to wait until we resolve the problem 
> with gears, possibly this will solve other rendering issues as well.
>
...
>
> By the way, I won't be very surprised if some old gcc release simply 
> fails at handling bitfields which are used to store both the keys of 
> shader variants in r600g and bytecode data in r600-sb (the same data 
> that ends up being broken in your glxgears dump), IIRC there were 
> bitfields-related bugs.

It's not a bug, but undefined behavior AFAIK. I sent a patch that fixes
the constant rebuilds (r600g: Correctly initialize the shader key).

With these currently pending patches applied, I get no more visual
distortion, and the fps improves a bit (28 -> 32).

Just in case it'd be useful to you, here's the current sb,vs,ps output:
http://bayfiles.net/file/PnH3/9BRcGY/foo_shaders.gz

- Lauri


[Mesa-dev] [PATCH] r600g: Correctly initialize the shader key

2013-05-03 Thread Lauri Kasanen
Assigning a struct only copies the members - any padding is left as is.

Thus this code:

struct foo;
foo = bar;

leaves the padding of foo intact, ie uninitialized random garbage.

This patch fixes constant shader recompiles by initializing the struct
to zero.

Signed-off-by: Lauri Kasanen 
---
 src/gallium/drivers/r600/r600_state_common.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c
b/src/gallium/drivers/r600/r600_state_common.c index 87a2e2e..bf7cc39
100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -710,7 +710,7 @@ static int r600_shader_select(struct pipe_context
*ctx, struct r600_pipe_shader_selector* sel,
 bool *dirty)
 {
-   struct r600_shader_key key;
+   struct r600_shader_key key = {0};
struct r600_context *rctx = (struct r600_context *)ctx;
struct r600_pipe_shader * shader = NULL;
int r;
-- 
1.7.2.1



Re: [Mesa-dev] [PATCH] r600g: use old shader disassembler by default

2013-05-03 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Fri, May 3, 2013 at 10:04 AM, Vadim Girlin  wrote:
> New disassembler is not completely isolated yet from further processing
> in r600g/sb that is not required for printing the dump, so it has higher
> probability to fail in case of any unexpected features in the bytecode.
>
> This patch adds "sbdisasm" flag for R600_DEBUG that allows to use new
> disassembler in r600g/sb for shader dumps when shader optimization
> is not enabled.
>
> If shader optimization is enabled, new disassembler is used by default.
>
> Signed-off-by: Vadim Girlin 
> ---
>  src/gallium/drivers/r600/r600_asm.c| 13 +++--
>  src/gallium/drivers/r600/r600_pipe.c   |  1 +
>  src/gallium/drivers/r600/r600_pipe.h   |  1 +
>  src/gallium/drivers/r600/r600_shader.c | 22 +-
>  4 files changed, 18 insertions(+), 19 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_asm.c 
> b/src/gallium/drivers/r600/r600_asm.c
> index 81b84ec..df0376a 100644
> --- a/src/gallium/drivers/r600/r600_asm.c
> +++ b/src/gallium/drivers/r600/r600_asm.c
> @@ -2281,6 +2281,7 @@ void *r600_create_vertex_fetch_shader(struct 
> pipe_context *ctx,
> uint32_t *bytecode;
> int i, j, r, fs_size;
> struct r600_fetch_shader *shader;
> +   unsigned sb_disasm = rctx->screen->debug_flags & (DBG_SB_DISASM | 
> DBG_SB);
>
> assert(count < 32);
>
> @@ -2387,13 +2388,13 @@ void *r600_create_vertex_fetch_shader(struct 
> pipe_context *ctx,
> fprintf(stderr, "\n");
> }
>
> -#if 0
> -   r600_bytecode_disasm(&bc);
> +   if (!sb_disasm) {
> +   r600_bytecode_disasm(&bc);
>
> -   fprintf(stderr, 
> "__\n");
> -#else
> -   r600_sb_bytecode_process(rctx, &bc, NULL, 1 /*dump*/, 0 
> /*optimize*/);
> -#endif
> +   fprintf(stderr, 
> "__\n");
> +   } else {
> +   r600_sb_bytecode_process(rctx, &bc, NULL, 1 /*dump*/, 
> 0 /*optimize*/);
> +   }
> }
>
> fs_size = bc.ndw*4;
> diff --git a/src/gallium/drivers/r600/r600_pipe.c 
> b/src/gallium/drivers/r600/r600_pipe.c
> index 4991fb2..daadaeb 100644
> --- a/src/gallium/drivers/r600/r600_pipe.c
> +++ b/src/gallium/drivers/r600/r600_pipe.c
> @@ -73,6 +73,7 @@ static const struct debug_named_value debug_options[] = {
> { "sbstat", DBG_SB_STAT, "Print optimization statistics for shaders" 
> },
> { "sbdump", DBG_SB_DUMP, "Print IR dumps after some optimization 
> passes" },
> { "sbnofallback", DBG_SB_NO_FALLBACK, "Abort on errors instead of 
> fallback" },
> +   { "sbdisasm", DBG_SB_DISASM, "Use sb disassembler for shader dumps" },
>
> DEBUG_NAMED_VALUE_END /* must be last */
>  };
> diff --git a/src/gallium/drivers/r600/r600_pipe.h 
> b/src/gallium/drivers/r600/r600_pipe.h
> index 61e2022..bb4e429 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -264,6 +264,7 @@ typedef boolean (*r600g_dma_blit_t)(struct pipe_context 
> *ctx,
>  #define DBG_SB_STAT(1 << 24)
>  #define DBG_SB_DUMP(1 << 25)
>  #define DBG_SB_NO_FALLBACK (1 << 26)
> +#define DBG_SB_DISASM  (1 << 27)
>
>  struct r600_tiling_info {
> unsigned num_channels;
> diff --git a/src/gallium/drivers/r600/r600_shader.c 
> b/src/gallium/drivers/r600/r600_shader.c
> index 49218e5..9afd57f 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -141,6 +141,7 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
> uint32_t *ptr;
> bool dump = r600_can_dump_shader(rctx->screen, 
> tgsi_get_processor_type(sel->tokens));
> unsigned use_sb = rctx->screen->debug_flags & DBG_SB;
> +   unsigned sb_disasm = use_sb || (rctx->screen->debug_flags & 
> DBG_SB_DISASM);
>
> shader->shader.bc.isa = rctx->isa;
>
> @@ -163,21 +164,18 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
> return r;
> }
>
> -#if 0
> -   if (dump) {
> +   if (dump && !sb_disasm) {
> fprintf(stderr, 
> "--\n");
> r600_bytecode_disasm(&shader->shader.bc);
> fprintf(stderr, 
> "__\n");
> -   }
> -#else
> -   if (dump || use_sb) {
> -   r = r600_sb_bytecode_process(rctx, &shader->shader.bc, 
> &shader->shader, dump, use_sb);
> +   } else if ((dump && sb_disasm) || use_sb) {
> +   r = r600_sb_bytecode_process(rctx, &shader->shader.bc, 
> &shader->shader,
> +dump, use_sb);
> if (r) {
> R600_ERR("r600_sb_byte

Re: [Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

2013-05-03 Thread Marek Olšák
FWIW, this maps nicely to r600, which also has separate instructions
for the low and high 32 bits. As to what option is better, it really
depends on whether shading languages and OpenCL expose the
instructions directly through functions, or whether they just have
64-bit integers.

Marek

On Fri, May 3, 2013 at 1:29 AM, Roland Scheidegger  wrote:
> Currently, there's no way to get the high bits of a 32x32
> signed/unsigned integer multiplication with tgsi.
> However, all of d3d10, OpenGL, and OpenCL support that, so we need it as
> well.
> There's essentially two ways how it could be done:
> - a 2-destination instruction returning both high and low bits (this is
> how it looks like in d3d10 and glsl)
> - use the existing umul for the low bits and have another instruction
> for the high bits (this is how it looks like in opencl)
>
> Well there's other possibilities but these looked like they'd match both
> APIs and HW reasonably (well with the exception of things like sse2
> which would prefer 2x2 32bit inputs and return 2x64bit as one reg...).
>
> Actually it's two new instructions because unlike for the low bits it
> matters for the high bits if the source operands are signed or unsigned.
>
> Personally I'm favoring two separate instructions for low and high bits
> to not have to deal with multi-destination instructions, but if someone
> makes a strong case for one returning both low and high bits I could be
> convinced otherwise. I think though two instructions matches most hw
> very well (with the exception of software renderers and possibly intel
> graphics but then a good backend could certainly recognize this).
>
> So here's what the docs would say about these instructions:
>
>
> .. opcode:: IMUL_HI - Signed Integer Multiply High Bits
>
>The high 32bits of the multiplication of 2 signed integers is returned.
>
> .. math::
>
>   dst.x = src0.x \times src1.x >> 32
>
>   dst.y = src0.y \times src1.y >> 32
>
>   dst.z = src0.z \times src1.z >> 32
>
>   dst.w = src0.w \times src1.w >> 32
>
>
> .. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
>
>The high 32bits of the multiplication of 2 unsigned integers is returned.
>
> .. math::
>
>   dst.x = src0.x \times src1.x >> 32
>
>   dst.y = src0.y \times src1.y >> 32
>
>   dst.z = src0.z \times src1.z >> 32
>
>   dst.w = src0.w \times src1.w >> 32
>
>
> Roland


[Mesa-dev] [PATCH 2/2] scons: remove nouveau build

2013-05-03 Thread Andreas Boll
One build system for linux/unix only drivers should be enough.
Additionally the nouveau target was disabled anyway.
---
 src/gallium/SConscript |   15 ---
 src/gallium/drivers/nouveau/SConscript |   10 --
 src/gallium/drivers/nv30/SConscript|   11 ---
 src/gallium/drivers/nv50/SConscript|   11 ---
 src/gallium/drivers/nvc0/SConscript|   11 ---
 5 files changed, 58 deletions(-)
 delete mode 100644 src/gallium/drivers/nouveau/SConscript
 delete mode 100644 src/gallium/drivers/nv30/SConscript
 delete mode 100644 src/gallium/drivers/nv50/SConscript
 delete mode 100644 src/gallium/drivers/nvc0/SConscript

diff --git a/src/gallium/SConscript b/src/gallium/SConscript
index c53cd39..a3edc65 100644
--- a/src/gallium/SConscript
+++ b/src/gallium/SConscript
@@ -27,19 +27,6 @@ if not env['msvc']:
 'drivers/i915/SConscript',
 ])
 
-if env['drm']:
-# These drivers depend on drm headers
-# XXX: nouveau drivers have a tight dependency on libdrm, so to enable
-# we need some version logic before we enable them. Also, ATM there is
-# no nouveau target in scons
-# if env['drm_nouveau']:
-# SConscript([
-# 'drivers/nouveau/SConscript', 
-# 'drivers/nv50/SConscript', 
-# 'drivers/nvc0/SConscript', 
-# 'drivers/nvfx/SConscript',
-# ])
-
 #
 # State trackers
 #
@@ -132,7 +119,6 @@ if not env['embedded']:
 'targets/SConscript.dri',
 'targets/dri-swrast/SConscript',
 'targets/dri-vmwgfx/SConscript',
-#'targets/dri-nouveau/SConscript',
 ])
 if env['drm_intel']:
 SConscript([
@@ -142,7 +128,6 @@ if not env['embedded']:
 if env['xorg'] and env['drm']:
 SConscript([
 #'targets/xorg-i915/SConscript',
-#'targets/xorg-nouveau/SConscript',
 ])
 
 
diff --git a/src/gallium/drivers/nouveau/SConscript 
b/src/gallium/drivers/nouveau/SConscript
deleted file mode 100644
index 418744c..000
--- a/src/gallium/drivers/nouveau/SConscript
+++ /dev/null
@@ -1,10 +0,0 @@
-Import('*')
-
-env = env.Clone()
-
-nouveau = env.ConvenienceLibrary(
-target = 'nouveau',
-source = env.ParseSourceList('Makefile.sources', 'C_SOURCES')
-)
-
-Export('nouveau')
diff --git a/src/gallium/drivers/nv30/SConscript 
b/src/gallium/drivers/nv30/SConscript
deleted file mode 100644
index 3c2ff03..000
--- a/src/gallium/drivers/nv30/SConscript
+++ /dev/null
@@ -1,11 +0,0 @@
-Import('*')
-
-env = env.Clone()
-
-nv30 = env.ConvenienceLibrary(
-target = 'nv30',
-source = env.ParseSourceList('Makefile.sources',
-['C_SOURCES', 'CPP_SOURCES'])
-)
-
-Export('nv30')
diff --git a/src/gallium/drivers/nv50/SConscript 
b/src/gallium/drivers/nv50/SConscript
deleted file mode 100644
index 3baaa8a..000
--- a/src/gallium/drivers/nv50/SConscript
+++ /dev/null
@@ -1,11 +0,0 @@
-Import('*')
-
-env = env.Clone()
-
-nv50 = env.ConvenienceLibrary(
-target = 'nv50',
-source = env.ParseSourceList('Makefile.sources',
-['C_SOURCES', 'CPP_SOURCES'])
-)
-
-Export('nv50')
diff --git a/src/gallium/drivers/nvc0/SConscript 
b/src/gallium/drivers/nvc0/SConscript
deleted file mode 100644
index 4e456d1..000
--- a/src/gallium/drivers/nvc0/SConscript
+++ /dev/null
@@ -1,11 +0,0 @@
-Import('*')
-
-env = env.Clone()
-
-nvc0 = env.ConvenienceLibrary(
-target = 'nvc0',
-source = env.ParseSourceList('Makefile.sources',
-['C_SOURCES', 'CPP_SOURCES'])
-)
-
-Export('nvc0')
-- 
1.7.10.4



[Mesa-dev] [PATCH 1/2] scons: remove radeon build

2013-05-03 Thread Andreas Boll
One build system for linux/unix only drivers should be enough.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48694
---
 scons/gallium.py|1 -
 src/gallium/SConscript  |   24 ---
 src/gallium/drivers/r300/SConscript |   24 ---
 src/gallium/drivers/r600/SConscript |   18 -
 src/gallium/drivers/radeonsi/SConscript |   17 
 src/gallium/targets/dri-r300/SConscript |   26 -
 src/gallium/targets/dri-r600/SConscript |   28 ---
 src/gallium/targets/dri-radeonsi/SConscript |   25 
 src/gallium/targets/egl-static/SConscript   |9 -
 src/gallium/winsys/radeon/drm/SConscript|   14 --
 10 files changed, 186 deletions(-)
 delete mode 100644 src/gallium/drivers/r300/SConscript
 delete mode 100644 src/gallium/drivers/r600/SConscript
 delete mode 100644 src/gallium/drivers/radeonsi/SConscript
 delete mode 100644 src/gallium/targets/dri-r300/SConscript
 delete mode 100644 src/gallium/targets/dri-r600/SConscript
 delete mode 100644 src/gallium/targets/dri-radeonsi/SConscript
 delete mode 100644 src/gallium/winsys/radeon/drm/SConscript

diff --git a/scons/gallium.py b/scons/gallium.py
index 7eb515e..2e341e9 100755
--- a/scons/gallium.py
+++ b/scons/gallium.py
@@ -535,7 +535,6 @@ def generate(env):
 env.PkgCheckModules('XF86VIDMODE', ['xxf86vm'])
 env.PkgCheckModules('DRM', ['libdrm >= 2.4.24'])
 env.PkgCheckModules('DRM_INTEL', ['libdrm_intel >= 2.4.30'])
-env.PkgCheckModules('DRM_RADEON', ['libdrm_radeon >= 2.4.42'])
 env.PkgCheckModules('XORG', ['xorg-server >= 1.6.0'])
 env.PkgCheckModules('KMS', ['libkms >= 2.4.24'])
 env.PkgCheckModules('UDEV', ['libudev > 150'])
diff --git a/src/gallium/SConscript b/src/gallium/SConscript
index f281f4c..c53cd39 100644
--- a/src/gallium/SConscript
+++ b/src/gallium/SConscript
@@ -29,15 +29,6 @@ if not env['msvc']:
 
 if env['drm']:
 # These drivers depend on drm headers
-if env['drm_radeon']:
-SConscript([
-'drivers/r300/SConscript',
-'drivers/r600/SConscript',
-])
-if env['llvm']:
-SConscript([
-'drivers/radeonsi/SConscript',
-])
 # XXX: nouveau drivers have a tight dependency on libdrm, so to enable
 # we need some version logic before we enable them. Also, ATM there is
 # no nouveau target in scons
@@ -110,11 +101,6 @@ if env['dri']:
 'winsys/i915/drm/SConscript',
 ])
 
-if env['drm_radeon']:
-SConscript([
-'winsys/radeon/drm/SConscript',
-])
-
 #
 # Targets
 #
@@ -152,21 +138,11 @@ if not env['embedded']:
 SConscript([
 'targets/dri-i915/SConscript',
 ])
-if env['drm_radeon']:
-SConscript([
-'targets/dri-r300/SConscript',
-'targets/dri-r600/SConscript',
-])
-if env['llvm']:
-SConscript([
-'targets/dri-radeonsi/SConscript',
-])
 
 if env['xorg'] and env['drm']:
 SConscript([
 #'targets/xorg-i915/SConscript',
 #'targets/xorg-nouveau/SConscript',
-#'targets/xorg-radeon/SConscript',
 ])
 
 
diff --git a/src/gallium/drivers/r300/SConscript b/src/gallium/drivers/r300/SConscript
deleted file mode 100644
index 7ff0732..000
--- a/src/gallium/drivers/r300/SConscript
+++ /dev/null
@@ -1,24 +0,0 @@
-Import('*')
-
-env = env.Clone()
-
-env.PkgUseModules('DRM_RADEON')
-
-env.Append(CPPPATH = [
-'#/include', 
-'#/src/mesa',
-'#/src/glsl',
-'#/src/mapi',
-])
-
-r300 = env.ConvenienceLibrary(
-target = 'r300',
-source = env.ParseSourceList('Makefile.sources', 'C_SOURCES') + [
-'#/src/glsl/ralloc.c',
-'#/src/mesa/program/register_allocate.c'
-])
-
-env.Alias('r300', r300)
-
-Export('r300')
-
diff --git a/src/gallium/drivers/r600/SConscript b/src/gallium/drivers/r600/SConscript
deleted file mode 100644
index 658be50..000
--- a/src/gallium/drivers/r600/SConscript
+++ /dev/null
@@ -1,18 +0,0 @@
-Import('*')
-
-env = env.Clone()
-
-env.PkgUseModules('DRM_RADEON')
-
-env.Append(CPPPATH = [
-'#/include',
-])
-
-r600 = env.ConvenienceLibrary(
-target = 'r600',
-source = env.ParseSourceList('Makefile.sources', 'C_SOURCES')
-)
-
-env.Alias('r600', r600)
-
-Export('r600')
diff --git a/src/gallium/drivers/radeonsi/SConscript b/src/gallium/drivers/radeonsi/SConscript
deleted file mode 100644
index f2d2bec..000
--- a/src/gallium/drivers/radeonsi/SConscript
+++ /dev/null
@@ -1,17 +0,0 @@
-Import('*')
-
-env = env.Clone()
-
-env.Append(CPPPATH = [
-'#/include',
-'#/src/gallium/drivers/radeon',
-])
-
-radeonsi = env.ConvenienceLibrary(
-target = 'radeonsi',
-source = en

[Mesa-dev] [Bug 59591] BUILD make distclean fails

2013-05-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=59591

Andy Furniss  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED



Re: [Mesa-dev] [PATCH] st/xvmc/tests: Fix build failure, v2

2013-05-03 Thread Christian König

On 03.05.2013 10:48, Lauri Kasanen wrote:

v2: Removed extra libs as requested by Matt Turner.

Signed-off-by: Lauri Kasanen 


Reviewed-by: Christian König 


---
  src/gallium/state_trackers/xvmc/Makefile.am |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/state_trackers/xvmc/Makefile.am b/src/gallium/state_trackers/xvmc/Makefile.am
index 8ac7406..184432e 100644
--- a/src/gallium/state_trackers/xvmc/Makefile.am
+++ b/src/gallium/state_trackers/xvmc/Makefile.am
@@ -44,7 +44,7 @@ check_PROGRAMS = \
  TESTS = $(check_PROGRAMS)
  noinst_PROGRAMS = tests/xvmc_bench
  
-TEST_LIBS = -lXvMCW -lXvMC -lXv -lX11
+TEST_LIBS = $(XVMC_LIBS) -lXvMCW
  tests_test_context_SOURCES = tests/test_context.c tests/testlib.c
  tests_test_context_LDADD = $(TEST_LIBS)
  tests_test_surface_SOURCES = tests/test_surface.c tests/testlib.c




Re: [Mesa-dev] r600-sb: glxgears wrong rendering

2013-05-03 Thread Lauri Kasanen
On Fri, 03 May 2013 00:39:09 +0400
Vadim Girlin  wrote:

> I see some issues in the dump; it looks like the compiler doesn't 
> zero-initialize some data (particularly alu_node::bc) in cases where I 
> expect it. Possibly it's my bug, I'll look into it, but the data in 
> question is definitely zero-initialized with gcc 4.8 (also this worked 
> fine for me previously with gcc 4.7). I suspect this could be related to 
> the issues of old gcc releases (e.g. something like [1]).
> 
> Please test with the attached patch.
>
>   [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33916

With this patch, glxgears has correct output.

Here's the dump with the patch:
http://bayfiles.net/file/PnC9/us249L/glxgears_new.gz

Please note that the behavior you relied on - implicit zeroing of C++
class members - is unreliable in other compilers too. I find it's also
bad practice not to init members in the constructor if you expect them
to have some particular value.
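
To make the point about constructor initialization concrete, here is a minimal C++ sketch. The names bc_fields_sketch and alu_node_sketch are hypothetical stand-ins, not the real r600g/sb types; the only claim is that initializing every member explicitly removes any dependence on how a given compiler handles implicit zeroing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a bytecode record (not the real r600g/sb type).
struct bc_fields_sketch {
    uint32_t op;
    uint32_t dst_gpr;

    // Every member gets an explicit value, so nothing depends on
    // implicit zero-initialization by the compiler.
    bc_fields_sketch() : op(0), dst_gpr(0) {}
};

// Hypothetical node type that embeds the record above.
struct alu_node_sketch {
    bc_fields_sketch bc;
    int flags;

    // bc() runs the constructor above; flags is set explicitly as well.
    alu_node_sketch() : bc(), flags(0) {}
};

// Returns true when a freshly constructed node has all fields zeroed.
bool fresh_node_is_zeroed()
{
    alu_node_sketch n;
    return n.bc.op == 0 && n.bc.dst_gpr == 0 && n.flags == 0;
}
```

With the constructors written out like this, the result should not vary between gcc releases or other compilers.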

- Lauri


[Mesa-dev] [PATCH] st/xvmc/tests: Fix build failure, v2

2013-05-03 Thread Lauri Kasanen
v2: Removed extra libs as requested by Matt Turner.

Signed-off-by: Lauri Kasanen 
---
 src/gallium/state_trackers/xvmc/Makefile.am |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/state_trackers/xvmc/Makefile.am b/src/gallium/state_trackers/xvmc/Makefile.am
index 8ac7406..184432e 100644
--- a/src/gallium/state_trackers/xvmc/Makefile.am
+++ b/src/gallium/state_trackers/xvmc/Makefile.am
@@ -44,7 +44,7 @@ check_PROGRAMS = \
 TESTS = $(check_PROGRAMS)
 noinst_PROGRAMS = tests/xvmc_bench
 
-TEST_LIBS = -lXvMCW -lXvMC -lXv -lX11
+TEST_LIBS = $(XVMC_LIBS) -lXvMCW
 tests_test_context_SOURCES = tests/test_context.c tests/testlib.c
 tests_test_context_LDADD = $(TEST_LIBS)
 tests_test_surface_SOURCES = tests/test_surface.c tests/testlib.c
-- 
1.7.2.1



[Mesa-dev] [PATCH] r600g: use old shader disassembler by default

2013-05-03 Thread Vadim Girlin
The new disassembler is not yet completely isolated from further processing
in r600g/sb that is not required for printing the dump, so it is more
likely to fail if the bytecode contains any unexpected features.

This patch adds an "sbdisasm" flag for R600_DEBUG that enables the new
disassembler in r600g/sb for shader dumps when shader optimization
is not enabled.

If shader optimization is enabled, the new disassembler is used by default.
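
Put another way, the choice of disassembler for a shader dump can be summarised by the following C++ sketch. It is a simplified model of the description above, not the code in the patch: the SK_* constants and pick_disasm() are hypothetical names, and the bit positions are chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical flag bits standing in for the "sb" and "sbdisasm" options.
const unsigned SK_DBG_SB        = 1u << 0;
const unsigned SK_DBG_SB_DISASM = 1u << 1;

enum sk_disasm { SK_DISASM_NONE, SK_DISASM_OLD, SK_DISASM_SB };

// Which disassembler prints the dump, given whether dumping is enabled
// and which debug flags are set.
sk_disasm pick_disasm(bool dump, unsigned debug_flags)
{
    if (!dump)
        return SK_DISASM_NONE;

    bool use_sb    = (debug_flags & SK_DBG_SB) != 0;
    bool sb_disasm = use_sb || (debug_flags & SK_DBG_SB_DISASM) != 0;

    // The old disassembler is the default; sb is used only when
    // optimization is on or "sbdisasm" was requested explicitly.
    return sb_disasm ? SK_DISASM_SB : SK_DISASM_OLD;
}
```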

Signed-off-by: Vadim Girlin 
---
 src/gallium/drivers/r600/r600_asm.c| 13 +++--
 src/gallium/drivers/r600/r600_pipe.c   |  1 +
 src/gallium/drivers/r600/r600_pipe.h   |  1 +
 src/gallium/drivers/r600/r600_shader.c | 22 +-
 4 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index 81b84ec..df0376a 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -2281,6 +2281,7 @@ void *r600_create_vertex_fetch_shader(struct pipe_context *ctx,
uint32_t *bytecode;
int i, j, r, fs_size;
struct r600_fetch_shader *shader;
+   unsigned sb_disasm = rctx->screen->debug_flags & (DBG_SB_DISASM | DBG_SB);
 
assert(count < 32);
 
@@ -2387,13 +2388,13 @@ void *r600_create_vertex_fetch_shader(struct pipe_context *ctx,
fprintf(stderr, "\n");
}
 
-#if 0
-   r600_bytecode_disasm(&bc);
+   if (!sb_disasm) {
+   r600_bytecode_disasm(&bc);
 
-   fprintf(stderr, "__\n");
-#else
-   r600_sb_bytecode_process(rctx, &bc, NULL, 1 /*dump*/, 0 /*optimize*/);
-#endif
+   fprintf(stderr, "__\n");
+   } else {
+   r600_sb_bytecode_process(rctx, &bc, NULL, 1 /*dump*/, 0 /*optimize*/);
+   }
}
 
fs_size = bc.ndw*4;
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index 4991fb2..daadaeb 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -73,6 +73,7 @@ static const struct debug_named_value debug_options[] = {
{ "sbstat", DBG_SB_STAT, "Print optimization statistics for shaders" },
{ "sbdump", DBG_SB_DUMP, "Print IR dumps after some optimization passes" },
{ "sbnofallback", DBG_SB_NO_FALLBACK, "Abort on errors instead of fallback" },
+   { "sbdisasm", DBG_SB_DISASM, "Use sb disassembler for shader dumps" },
 
DEBUG_NAMED_VALUE_END /* must be last */
 };
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index 61e2022..bb4e429 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -264,6 +264,7 @@ typedef boolean (*r600g_dma_blit_t)(struct pipe_context *ctx,
 #define DBG_SB_STAT(1 << 24)
 #define DBG_SB_DUMP(1 << 25)
 #define DBG_SB_NO_FALLBACK (1 << 26)
+#define DBG_SB_DISASM  (1 << 27)
 
 struct r600_tiling_info {
unsigned num_channels;
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 49218e5..9afd57f 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -141,6 +141,7 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
uint32_t *ptr;
bool dump = r600_can_dump_shader(rctx->screen, tgsi_get_processor_type(sel->tokens));
unsigned use_sb = rctx->screen->debug_flags & DBG_SB;
+   unsigned sb_disasm = use_sb || (rctx->screen->debug_flags & DBG_SB_DISASM);
 
shader->shader.bc.isa = rctx->isa;
 
@@ -163,21 +164,18 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
return r;
}
 
-#if 0
-   if (dump) {
+   if (dump && !sb_disasm) {
fprintf(stderr, "--\n");
r600_bytecode_disasm(&shader->shader.bc);
fprintf(stderr, "__\n");
-   }
-#else
-   if (dump || use_sb) {
-   r = r600_sb_bytecode_process(rctx, &shader->shader.bc, &shader->shader, dump, use_sb);
+   } else if ((dump && sb_disasm) || use_sb) {
+   r = r600_sb_bytecode_process(rctx, &shader->shader.bc, &shader->shader,
+dump, use_sb);
if (r) {
R600_ERR("r600_sb_bytecode_process failed !\n");
return r;
}
}
-#endif
 
/* Store the shader in a buffer. */
if (shader->bo == NULL) {
@@ -307,6 +305,8 @@ int r600_compute_shader_create(struct pipe_context * ctx,
boolean use_kill = false;
bool dump = (r600_ct

Re: [Mesa-dev] [PATCH] gallium: fix type of flags in pipe_context::flush()

2013-05-03 Thread Jose Fonseca
- Original Message -
> It should be unsigned, not enum pipe_flush_flags.
> 
> Fixed a build error:
> 
>   src/gallium/state_trackers/egl/android/native_android.cpp:426:29: error:
>   invalid conversion from 'int' to 'pipe_flush_flags' [-fpermissive]
> 

While I agree that using an enum for bitmasks is not a great idea, I think it is a 
bit too late to go back on this now, as enum pipe_flush_flags is used all 
over the place.  Please just add an (enum pipe_flush_flags) cast to silence the 
error.  That is what is done in other cpp files.
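
For reference, the kind of cast meant here looks roughly like this in C++. The enum and function names below are hypothetical (modelled on pipe_flush_flags, not copied from Gallium), and the flag values are illustrative only.

```cpp
#include <cassert>

// Hypothetical flag enum; the real pipe_flush_flags values may differ.
enum sketch_flush_flags {
    SKETCH_FLUSH_A = 1 << 0,
    SKETCH_FLUSH_B = 1 << 1
};

// Stand-in for a flush() entry point that takes the enum type.
// Returns the flags it received so a caller can inspect them.
unsigned sketch_flush(enum sketch_flush_flags flags)
{
    return static_cast<unsigned>(flags);
}

unsigned flush_both()
{
    // In C++, SKETCH_FLUSH_A | SKETCH_FLUSH_B has type int, so passing
    // it directly is rejected as an invalid conversion from 'int' to
    // the enum type. The explicit cast silences that error.
    return sketch_flush((enum sketch_flush_flags)
                        (SKETCH_FLUSH_A | SKETCH_FLUSH_B));
}
```

The same source compiles as C without the cast, which is presumably why the problem only shows up in the cpp files.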

Jose


Re: [Mesa-dev] [PATCH 1/2] gallivm: allow negation of all integer types

2013-05-03 Thread Michel Dänzer
On Fri, 2013-05-03 at 02:00 +0200, Roland Scheidegger wrote: 
> On 03.05.2013 00:29, Dave Airlie wrote:
> > On Fri, May 3, 2013 at 6:04 AM, Zack Rusin  wrote:
> >>> Well in contrast to the IF/UIF they'd be really redundant unless I'm
> >>> missing something so just for supporting negation on inputs or not this
> >>> looks like not really worth it (and as said there are also other signed
> >>> instructions where supporting negation doesn't really make sense). OTOH
> >>> you're right it would be just 3 more instructions (imul/iadd/imad) I
> >>> believe so it wouldn't be too bad. But I'm fine with just supporting
> >>> negation on unsigned instructions, after all it's just two's complement
> >>> negation regardless if it's signed or not. And just document these umul
> >>> functions work for both signed and unsigned.
> >>
> >> k, I'll just push that last patch then. If someone doesn't like it or we 
> >> decide to do it some other way, we can always redo it later. For now 
> >> this will be enough to fix the umad handling.
> >>
> > 
> > I don't like this, sorry for being slow :-)
> > 
> > Mainly because I don't think any hw has the ability to apply negate on
> > non-float instructions (at least radeon doesn't), and I'd like gallium
> > to be an interface that reflects the hw.
> > 
> > Granted, the hw also doesn't have IMAD/UMAD; we have to decompose those 
> > anyways.
> 
> Well we already said before that negation (but not abs) was supported on
> signed inputs, this changes it to just say it's also supported on
> unsigned ones, but that's really just because uadd/umul/umad are
> ignorant of signedness.
> Sorry to hear the hw doesn't support it, but this is supported by d3d10
> so it's quite likely some hw indeed supports it. There's always some
> things some hw can't do natively.
> I'd be fine to just restrict it to some opcodes, e.g. exclude logic ops
> if that helps you.

FWIW, I don't think modifiers make sense for logic ops, as they just
operate on individual bits of the operands, regardless of any particular
encoding of the whole operands.


> Note that tgsi (and d3d10 for that matter) also for instance don't
> have a integer SUB opcode, so the natural way for encoding that would
> of course just be with a ADD with negate modifier on second arg, and I
> bet your hw could do that one.

I agree it shouldn't matter that much how this is expressed in TGSI. 
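
As a small illustration of the point made earlier in the thread, that negation is just two's complement regardless of signedness, here is a C++ sketch. The function names are made up for this example and are not TGSI or gallivm API. It expresses an integer SUB as an ADD with a negated second operand, working on raw 32-bit patterns.

```cpp
#include <cassert>
#include <cstdint>

// Two's complement negation on the raw 32-bit pattern. The result is
// the same bit pattern whether the value is read as signed or unsigned.
uint32_t neg_bits(uint32_t x)
{
    return ~x + 1u;
}

// Integer SUB expressed as ADD with a negate modifier on the second
// operand. Unsigned arithmetic wraps modulo 2^32, which matches what
// a signed subtract produces at the bit level.
uint32_t sub_via_add_neg(uint32_t a, uint32_t b)
{
    return a + neg_bits(b);
}
```

For instance, sub_via_add_neg(5, 7) yields the bit pattern 0xFFFFFFFE, which reads as -2 when interpreted as a signed 32-bit value.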

-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer


[Mesa-dev] [PATCH 2/2] android: libsync is needed on Android 4.2+ for any driver

2013-05-03 Thread Chia-I Wu
Add libsync not only for MESA_BUILD_CLASSIC, but also for MESA_BUILD_GALLIUM.

Signed-off-by: Chia-I Wu 
---
 src/egl/main/Android.mk |8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/src/egl/main/Android.mk b/src/egl/main/Android.mk
index b2cbe05..30e5ea7 100644
--- a/src/egl/main/Android.mk
+++ b/src/egl/main/Android.mk
@@ -68,6 +68,9 @@ LOCAL_SHARED_LIBRARIES := \
libcutils \
libgralloc_drm \
 
+ifeq ($(shell echo "$(MESA_ANDROID_VERSION) >= 4.2" | bc),1)
+LOCAL_SHARED_LIBRARIES += libsync
+endif
 
 # add libdrm if there are hardware drivers
 ifneq ($(MESA_GPU_DRIVERS),swrast)
@@ -81,11 +84,6 @@ LOCAL_STATIC_LIBRARIES += libmesa_egl_dri2
 # require i915_dri and/or i965_dri
 LOCAL_REQUIRED_MODULES += \
$(addsuffix _dri, $(filter i915 i965, $(MESA_GPU_DRIVERS)))
-
-ifeq ($(shell echo "$(MESA_ANDROID_VERSION) >= 4.2" | bc),1)
-LOCAL_SHARED_LIBRARIES += \
-libsync
-endif
 endif # MESA_BUILD_CLASSIC
 
 ifeq ($(strip $(MESA_BUILD_GALLIUM)),true)
-- 
1.7.10.4



[Mesa-dev] [PATCH 1/2] android: add ilo to the build system

2013-05-03 Thread Chia-I Wu
It can be selected with

  BOARD_GPU_DRIVERS := ilo

Signed-off-by: Chia-I Wu 
---
 Android.mk|4 +--
 src/egl/main/Android.mk   |6 +
 src/gallium/Android.mk|5 
 src/gallium/drivers/ilo/Android.mk|   39 
 src/gallium/targets/egl-static/Android.mk |3 +++
 src/gallium/winsys/intel/drm/Android.mk   |   40 +
 6 files changed, 95 insertions(+), 2 deletions(-)
 create mode 100644 src/gallium/drivers/ilo/Android.mk
 create mode 100644 src/gallium/winsys/intel/drm/Android.mk

diff --git a/Android.mk b/Android.mk
index 80c0e13..c3c1f08 100644
--- a/Android.mk
+++ b/Android.mk
@@ -24,7 +24,7 @@
 # BOARD_GPU_DRIVERS should be defined.  The valid values are
 #
 #   classic drivers: i915 i965
-#   gallium drivers: swrast i915g nouveau r300g r600g radeonsi vmwgfx
+#   gallium drivers: swrast i915g ilo nouveau r300g r600g radeonsi vmwgfx
 #
 # The main target is libGLES_mesa.  For each classic driver enabled, a DRI
 # module will also be built.  DRI modules will be loaded by libGLES_mesa.
@@ -42,7 +42,7 @@ DRM_TOP := external/drm
 DRM_GRALLOC_TOP := hardware/drm_gralloc
 
 classic_drivers := i915 i965
-gallium_drivers := swrast i915g nouveau r300g r600g radeonsi vmwgfx
+gallium_drivers := swrast i915g ilo nouveau r300g r600g radeonsi vmwgfx
 
 MESA_GPU_DRIVERS := $(strip $(BOARD_GPU_DRIVERS))
 
diff --git a/src/egl/main/Android.mk b/src/egl/main/Android.mk
index 3751ecf..b2cbe05 100644
--- a/src/egl/main/Android.mk
+++ b/src/egl/main/Android.mk
@@ -103,6 +103,12 @@ gallium_DRIVERS += libmesa_winsys_i915 libmesa_pipe_i915
 LOCAL_SHARED_LIBRARIES += libdrm_intel
 endif
 
+# ilo
+ifneq ($(filter ilo, $(MESA_GPU_DRIVERS)),)
+gallium_DRIVERS += libmesa_winsys_intel libmesa_pipe_ilo
+LOCAL_SHARED_LIBRARIES += libdrm_intel
+endif
+
 # nouveau
 ifneq ($(filter nouveau, $(MESA_GPU_DRIVERS)),)
 gallium_DRIVERS += \
diff --git a/src/gallium/Android.mk b/src/gallium/Android.mk
index 1d002d0..4e5acc8 100644
--- a/src/gallium/Android.mk
+++ b/src/gallium/Android.mk
@@ -39,6 +39,11 @@ ifneq ($(filter i915g, $(MESA_GPU_DRIVERS)),)
 SUBDIRS += winsys/i915/drm drivers/i915
 endif
 
+# ilo
+ifneq ($(filter ilo, $(MESA_GPU_DRIVERS)),)
+SUBDIRS += winsys/intel/drm drivers/ilo
+endif
+
 # nouveau
 ifneq ($(filter nouveau, $(MESA_GPU_DRIVERS)),)
 SUBDIRS += \
diff --git a/src/gallium/drivers/ilo/Android.mk b/src/gallium/drivers/ilo/Android.mk
new file mode 100644
index 000..5e70d3d
--- /dev/null
+++ b/src/gallium/drivers/ilo/Android.mk
@@ -0,0 +1,39 @@
+# Mesa 3-D graphics library
+#
+# Copyright (C) 2013 LunarG Inc.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included
+# in all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+# DEALINGS IN THE SOFTWARE.
+
+LOCAL_PATH := $(call my-dir)
+
+# get C_SOURCES
+include $(LOCAL_PATH)/Makefile.sources
+
+include $(CLEAR_VARS)
+
+LOCAL_C_INCLUDES := \
+   $(LOCAL_PATH)/include \
+   $(GALLIUM_TOP)/winsys/intel/drm
+
+LOCAL_SRC_FILES := $(C_SOURCES)
+
+LOCAL_MODULE := libmesa_pipe_ilo
+
+include $(GALLIUM_COMMON_MK)
+include $(BUILD_STATIC_LIBRARY)
diff --git a/src/gallium/targets/egl-static/Android.mk b/src/gallium/targets/egl-static/Android.mk
index 99c0812..06975db 100644
--- a/src/gallium/targets/egl-static/Android.mk
+++ b/src/gallium/targets/egl-static/Android.mk
@@ -56,6 +56,9 @@ endif
 ifneq ($(filter i915g, $(MESA_GPU_DRIVERS)),)
 LOCAL_CFLAGS += -D_EGL_PIPE_I915=1
 endif
+ifneq ($(filter ilo, $(MESA_GPU_DRIVERS)),)
+LOCAL_CFLAGS += -D_EGL_PIPE_ILO=1
+endif
 ifneq ($(filter nouveau, $(MESA_GPU_DRIVERS)),)
 LOCAL_CFLAGS += -D_EGL_PIPE_NOUVEAU=1
 endif
diff --git a/src/gallium/winsys/intel/drm/Android.mk b/src/gallium/winsys/intel/drm/Android.mk
new file mode 100644
index 000..a375b59
--- /dev/null
+++ b/src/gallium/winsys/intel/drm/Android.mk
@@ -0,0 +1,40 @@
+# Mesa 3-D graphics library
+#
+# Copyright (C) 2013 LunarG Inc.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# cop

[Mesa-dev] [PATCH 0/2] android: add ilo

2013-05-03 Thread Chia-I Wu
Hi,

This patch series allows ilo to be selected as the GPU driver on Android-IA.
The second patch fixes a linking error, which should be common to other
Gallium-based drivers.