date:20160513

Re: [Mesa-dev] [PATCH 2/2] glsl/linker: Include the interface name for input and output blocks

2016-05-13 Thread Kenneth Graunke

On Friday, May 13, 2016 6:42:54 PM PDT Ian Romanick wrote:
> From: Ian Romanick 
> 
> On my oes_shader_io_blocks branch, this fixes 71
> dEQP-GLES31.functional.program_interface_query.* tests.
> 
> Signed-off-by: Ian Romanick 
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/compiler/glsl/linker.cpp | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
> index 41b43ab..3749585 100644
> --- a/src/compiler/glsl/linker.cpp
> +++ b/src/compiler/glsl/linker.cpp
> @@ -3654,6 +3654,21 @@ add_shader_variable(struct gl_shader_program *shProg, 
unsigned stage_mask,
> }
>  
> default: {
> +  /* Issue #16 of the ARB_program_interface_query spec says:
> +   *
> +   * "* If a variable is a member of an interface block without an
> +   *instance name, it is enumerated using just the variable name.
> +   *
> +   *  * If a variable is a member of an interface block with an instance
> +   *name, it is enumerated as "BlockName.Member", where "BlockName" is
> +   *the name of the interface block (not the instance name) and
> +   *"Member" is the name of the variable."

lol..."if it's in a block with one kind of name, use the block's other 
name..."

Reviewed-by: Kenneth Graunke 

> +   */
> +  const char *prefixed_name = var->data.from_named_ifc_block
> + ? ralloc_asprintf(shProg, "%s.%s", var->get_interface_type()->name,
> +   name)
> + : name;
> +
>/* The ARB_program_interface_query spec says:
> *
> * "For an active variable declared as a single instance of a 
basic
> @@ -3661,7 +3676,7 @@ add_shader_variable(struct gl_shader_program *shProg, 
unsigned stage_mask,
> * from the shader source."
> */
>gl_shader_variable *sha_v =
> - create_shader_variable(shProg, var, name, type,
> + create_shader_variable(shProg, var, prefixed_name, type,
>  use_implicit_location, location);
>if (!sha_v)
>   return false;
> 
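
The naming rule quoted from Issue #16 can be sketched in isolation. This is only an illustration of the rule as quoted in the patch comment: ShaderVar and enumerated_name() are hypothetical names, not Mesa types or functions, and the real patch uses var->data.from_named_ifc_block and ralloc_asprintf() instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the linker's per-variable data; not ir_variable.
struct ShaderVar {
   std::string member_name;    // name of the variable inside the block
   std::string block_name;     // interface block name, e.g. "VertexData"
   std::string instance_name;  // instance name, empty if the block has none
};

// Returns the name used to enumerate the variable under the quoted rule:
// no instance name -> just the member name;
// instance name    -> "BlockName.Member" (block name, not instance name).
std::string enumerated_name(const ShaderVar &v)
{
   if (v.instance_name.empty())
      return v.member_name;
   return v.block_name + "." + v.member_name;
}
```

Note that the instance name only decides whether a prefix is added; it never appears in the result, which is the oddity Kenneth's comment points at.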



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] glsl: Drop bad ASSERT_TRUE in gl_CullDistance link_varyings test.

2016-05-13 Thread Kenneth Graunke

I don't know what the intention was here, but this function returns
void.  We can't assert anything about its return value.

Fixes "make check" failures.

Signed-off-by: Kenneth Graunke 
---
 src/compiler/glsl/tests/varyings_test.cpp | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/compiler/glsl/tests/varyings_test.cpp 
b/src/compiler/glsl/tests/varyings_test.cpp
index 936f495..09bf1eb 100644
--- a/src/compiler/glsl/tests/varyings_test.cpp
+++ b/src/compiler/glsl/tests/varyings_test.cpp
@@ -210,11 +210,11 @@ TEST_F(link_varyings, gl_CullDistance)
 
ir.push_tail(culldistance);
 
-   ASSERT_TRUE(linker::populate_consumer_input_sets(mem_ctx,
-&ir,
-consumer_inputs,
-consumer_interface_inputs,
-junk));
+   linker::populate_consumer_input_sets(mem_ctx,
+&ir,
+consumer_inputs,
+consumer_interface_inputs,
+junk);
 
EXPECT_EQ(culldistance, junk[VARYING_SLOT_CULL_DIST0]);
EXPECT_TRUE(is_empty(consumer_inputs));
-- 
2.8.2
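
The problem described in the commit message can be shown with a small stand-alone sketch. collect_even() and collected_expected() are hypothetical helpers invented for the illustration; they only mirror the shape of a function that returns void and reports through an output parameter.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper that returns void and reports its result only
// through the output parameter, like the function touched by the patch.
void collect_even(const std::vector<int> &in, std::vector<int> &out)
{
   for (int v : in)
      if (v % 2 == 0)
         out.push_back(v);
}

// A check has to inspect the side effect. Wrapping the call itself in an
// assertion, e.g. assert(collect_even(in, out)), does not compile because
// a void expression cannot be converted to bool.
bool collected_expected(const std::vector<int> &in,
                        const std::vector<int> &expected)
{
   std::vector<int> out;
   collect_even(in, out);
   return out == expected;
}
```

This is the same pattern the test ends up with: call the function unconditionally, then use EXPECT_* on the data it filled in.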


Re: [Mesa-dev] [PATCH] i965: Fix undefined df bits in brw_reg comparisons.

2016-05-13 Thread Kenneth Graunke

On Friday, May 13, 2016 5:58:25 PM PDT Francisco Jerez wrote:
> Kenneth Graunke  writes:
> 
> > Commit 5310bca024f77da40ea6f4c275455f9cb0528f9e added a new "double df"
> > field to the brw_reg struct, adding an extra 4 bytes of data that isn't
> > usually initialized (or may contain irrelevant garbage if the struct is
> > mutated).  This means that it's no longer safe to memcmp().
> >
> > Instead, add a brw_regs_equal() function which ignores the extra df bits
> > unless they matter.  To keep the implementation cheap, we wrap the first
> > set of fields in a union/struct so that we can use a single DWord
> > comparison.
> >
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
> >  src/mesa/drivers/dri/i965/brw_reg.h  | 27 
+---
> >  src/mesa/drivers/dri/i965/brw_shader.cpp |  2 +-
> >  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
> >  4 files changed, 22 insertions(+), 11 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/
drivers/dri/i965/brw_fs_generator.cpp
> > index 4f6f3a3..3b50a82 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > @@ -1010,7 +1010,7 @@ fs_generator::generate_tex(fs_inst *inst, struct 
brw_reg dst, struct brw_reg src
> >brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> >brw_set_default_access_mode(p, BRW_ALIGN_1);
> >  
> > -  if (memcmp(&surface_reg, &sampler_reg, sizeof(surface_reg)) == 0) {
> > +  if (brw_regs_equal(&surface_reg, &sampler_reg)) {
> >   brw_MUL(p, addr, sampler_reg, brw_imm_uw(0x101));
> >} else {
> >   brw_SHL(p, addr, sampler_reg, brw_imm_ud(8));
> > diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/
i965/brw_reg.h
> > index 6d51623..71e1024 100644
> > --- a/src/mesa/drivers/dri/i965/brw_reg.h
> > +++ b/src/mesa/drivers/dri/i965/brw_reg.h
> > @@ -234,14 +234,19 @@ uint32_t brw_swizzle_immediate(enum brw_reg_type 
type, uint32_t x, unsigned swz)
> >   * or "structure of array" form:
> >   */
> >  struct brw_reg {
> > -   enum brw_reg_type type:4;
> > -   enum brw_reg_file file:3;  /* :2 hardware format */
> > -   unsigned negate:1; /* source only */
> > -   unsigned abs:1;/* source only */
> > -   unsigned address_mode:1;   /* relative addressing, hopefully! */
> > -   unsigned pad0:1;
> > -   unsigned subnr:5;  /* :1 in align16 */
> > -   unsigned nr:16;
> > +   union {
> > +  struct {
> > + enum brw_reg_type type:4;
> > + enum brw_reg_file file:3;  /* :2 hardware format */
> > + unsigned negate:1; /* source only */
> > + unsigned abs:1;/* source only */
> > + unsigned address_mode:1;   /* relative addressing, 
hopefully! */
> > + unsigned pad0:1;
> > + unsigned subnr:5;  /* :1 in align16 */
> > + unsigned nr:16;
> > +  };
> > +  uint32_t bits;
> > +   };
> >  
> > union {
> >struct {
> > @@ -261,6 +266,12 @@ struct brw_reg {
> > };
> >  };
> >  
> > +static inline bool
> > +brw_regs_equal(const struct brw_reg *a, const struct brw_reg *b)
> > +{
> > +   const bool df = a->type == BRW_REGISTER_TYPE_DF && a->file == IMM;
> > +   return a->bits == b->bits && (df ? a->df == b->df : a->ud == b->ud);
> > +}
> >  
> >  struct brw_indirect {
> > unsigned addr_subnr:4;
> > diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/
dri/i965/brw_shader.cpp
> > index a23f14e..8d9e309 100644
> > --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> > @@ -687,7 +687,7 @@ backend_shader::backend_shader(const struct 
brw_compiler *compiler,
> >  bool
> >  backend_reg::equals(const backend_reg &r) const
> >  {
> > -   return memcmp((brw_reg *)this, (brw_reg *)&r, sizeof(brw_reg)) == 0 &&
> > +   return brw_regs_equal((brw_reg *)this, (brw_reg *)&r) &&
> 
> These casts should be redundant now, if the upcast is safe the compiler
> will be able to figure it out for you.  With that cleaned up:
> 
> Reviewed-by: Francisco Jerez 

Right, they're not necessary.  Dropped locally, thanks!
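
For readers following along, the memcmp() hazard from the commit message can be reproduced with a reduced struct. This is a sketch under assumptions: Reg, init_ud_reg() and regs_equal() are hypothetical and do not match the real brw_reg layout; is_df_imm stands in for the "type is DF and file is IMM" test in brw_regs_equal().

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified register description; NOT the real brw_reg layout.
struct Reg {
   uint32_t bits;       // packed header fields, compared as a single dword
   bool is_df_imm;      // stands in for "type is DF and file is IMM"
   union {
      double df;        // 8-byte payload, meaningful only for DF immediates
      uint32_t ud;      // 4-byte payload used by every other kind
   };
};

// Fills every byte of r with `junk`, then sets the fields of a non-DF
// register. The payload bytes not covered by `ud` keep the junk value.
void init_ud_reg(Reg &r, uint32_t bits, uint32_t ud, unsigned char junk)
{
   std::memset(&r, junk, sizeof(r));
   r.bits = bits;
   r.is_df_imm = false;
   r.ud = ud;
}

// Compares only the bytes that carry meaning for the register's kind.
bool regs_equal(const Reg &a, const Reg &b)
{
   if (a.bits != b.bits || a.is_df_imm != b.is_df_imm)
      return false;
   return a.is_df_imm ? a.df == b.df : a.ud == b.ud;
}
```

Two registers initialized with the same bits and ud but different junk bytes compare equal with regs_equal(), while a memcmp() over the whole struct sees the leftover bytes and reports a difference.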



[Mesa-dev] [PATCH 2/2] glsl/linker: Include the interface name for input and output blocks

2016-05-13 Thread Ian Romanick

From: Ian Romanick 

On my oes_shader_io_blocks branch, this fixes 71
dEQP-GLES31.functional.program_interface_query.* tests.

Signed-off-by: Ian Romanick 
Cc: mesa-sta...@lists.freedesktop.org
---
 src/compiler/glsl/linker.cpp | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index 41b43ab..3749585 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -3654,6 +3654,21 @@ add_shader_variable(struct gl_shader_program *shProg, 
unsigned stage_mask,
}
 
default: {
+  /* Issue #16 of the ARB_program_interface_query spec says:
+   *
+   * "* If a variable is a member of an interface block without an
+   *instance name, it is enumerated using just the variable name.
+   *
+   *  * If a variable is a member of an interface block with an instance
+   *name, it is enumerated as "BlockName.Member", where "BlockName" is
+   *the name of the interface block (not the instance name) and
+   *"Member" is the name of the variable."
+   */
+  const char *prefixed_name = var->data.from_named_ifc_block
+ ? ralloc_asprintf(shProg, "%s.%s", var->get_interface_type()->name,
+   name)
+ : name;
+
   /* The ARB_program_interface_query spec says:
*
* "For an active variable declared as a single instance of a basic
@@ -3661,7 +3676,7 @@ add_shader_variable(struct gl_shader_program *shProg, 
unsigned stage_mask,
* from the shader source."
*/
   gl_shader_variable *sha_v =
- create_shader_variable(shProg, var, name, type,
+ create_shader_variable(shProg, var, prefixed_name, type,
 use_implicit_location, location);
   if (!sha_v)
  return false;
-- 
2.5.5


[Mesa-dev] [PATCH 1/2] glsl/linker: Use canonical format for ARB_program_interface_query spec quotes

2016-05-13 Thread Ian Romanick

From: Ian Romanick 

Signed-off-by: Ian Romanick 
---
 src/compiler/glsl/linker.cpp | 100 ++-
 1 file changed, 51 insertions(+), 49 deletions(-)

diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index eae10655..41b43ab 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -3414,12 +3414,12 @@ should_add_buffer_variable(struct gl_shader_program 
*shProg,
if (found_interface)
   name = name + block_name_len + 1;
 
-   /* From: ARB_program_interface_query extension:
+   /* The ARB_program_interface_query spec says:
 *
-*  "For an active shader storage block member declared as an array, an
-*   entry will be generated only for the first array element, regardless
-*   of its type.  For arrays of aggregate types, the enumeration rules are
-*   applied recursively for the single enumerated array element.
+* "For an active shader storage block member declared as an array, an
+* entry will be generated only for the first array element, regardless
+* of its type.  For arrays of aggregate types, the enumeration rules
+* are applied recursively for the single enumerated array element."
 */
const char *struct_first_dot = strchr(name, '.');
const char *first_square_bracket = strchr(name, '[');
@@ -3585,19 +3585,20 @@ create_shader_variable(struct gl_shader_program *shProg,
if (!out->name)
   return NULL;
 
-   /* From the ARB_program_interface_query specification:
+   /* The ARB_program_interface_query spec says:
 *
-* "Not all active variables are assigned valid locations; the
-*  following variables will have an effective location of -1:
+* "Not all active variables are assigned valid locations; the
+* following variables will have an effective location of -1:
 *
-*  * uniforms declared as atomic counters;
+*  * uniforms declared as atomic counters;
 *
-*  * members of a uniform block;
+*  * members of a uniform block;
 *
-*  * built-in inputs, outputs, and uniforms (starting with "gl_"); and
+*  * built-in inputs, outputs, and uniforms (starting with "gl_"); and
 *
-*  * inputs or outputs not declared with a "location" layout qualifier,
-*except for vertex shader inputs and fragment shader outputs."
+*  * inputs or outputs not declared with a "location" layout
+*qualifier, except for vertex shader inputs and fragment shader
+*outputs."
 */
if (in->type->base_type == GLSL_TYPE_ATOMIC_UINT ||
is_gl_identifier(in->name) ||
@@ -3628,14 +3629,14 @@ add_shader_variable(struct gl_shader_program *shProg, 
unsigned stage_mask,
 
switch (type->base_type) {
case GLSL_TYPE_STRUCT: {
-  /* From the ARB_program_interface_query specification:
+  /* The ARB_program_interface_query spec says:
*
-   *  "For an active variable declared as a structure, a separate entry
-   *   will be generated for each active structure member.  The name of
-   *   each entry is formed by concatenating the name of the structure,
-   *   the "."  character, and the name of the structure member.  If a
-   *   structure member to enumerate is itself a structure or array, these
-   *   enumeration rules are applied recursively."
+   * "For an active variable declared as a structure, a separate entry
+   * will be generated for each active structure member.  The name of
+   * each entry is formed by concatenating the name of the structure,
+   * the "."  character, and the name of the structure member.  If a
+   * structure member to enumerate is itself a structure or array,
+   * these enumeration rules are applied recursively."
*/
   unsigned field_location = location;
   for (unsigned i = 0; i < type->length; i++) {
@@ -3653,11 +3654,11 @@ add_shader_variable(struct gl_shader_program *shProg, 
unsigned stage_mask,
}
 
default: {
-  /* From the ARB_program_interface_query specification:
+  /* The ARB_program_interface_query spec says:
*
-   *  "For an active variable declared as a single instance of a basic
-   *   type, a single entry will be generated, using the variable name
-   *   from the shader source."
+   * "For an active variable declared as a single instance of a basic
+   * type, a single entry will be generated, using the variable name
+   * from the shader source."
*/
   gl_shader_variable *sha_v =
  create_shader_variable(shProg, var, name, type,
@@ -3791,14 +3792,16 @@ get_top_level_name(const char *name)
const char *first_dot = strchr(name, '.');
const char *first_square_bracket = strchr(name, '[');
int name_size = 0;
-   /* From ARB_program_interface_query spec:
+
+   /* The

Re: [Mesa-dev] [RFC] nir/validate: on failure, dump shader w/ offending line annotated

2016-05-13 Thread Rob Clark

On Fri, May 13, 2016 at 9:13 PM, Connor Abbott  wrote:
> On Fri, May 13, 2016 at 9:07 PM, Rob Clark  wrote:
>> On Fri, May 13, 2016 at 8:23 PM, Connor Abbott  wrote:
>>> On Fri, May 13, 2016 at 4:14 PM, Rob Clark  wrote:
>>>> On Fri, May 13, 2016 at 4:10 PM, Jason Ekstrand wrote:
>>>>> On Fri, May 13, 2016 at 1:02 PM, Rob Clark wrote:
>>>>>>
>>>>>> From: Rob Clark
>>>>>>
>>>>>> If we assert in nir_validate_shader(), print the shader with the
>>>>>> offending instruction prefixed with "=>" to make it easier to find what
>>>>>> part of the shader nir_validate is complaining about.
>>>>>>
>>>>>> Macro funny-business in nir_validate() was just to avoid changing a
>>>>>> bazillion assert() lines to validate_assert() (or similar) for the point
>>>>>> of an RFC ;-)
>>>>>
>>>>>
>>>>> I love this idea.  I just wish it worked for more than just instructions.
>>>>> It would also be fantastic if it were somehow able to print more than one
>>>>> error.  Maybe something where we tie printing and validation together
>>>>> somehow?  Just a thought.
>>>>
>>>> hmm, err_instr could easily become a void* (or array of void*?) to
>>>> match var's/etc too..
>>>>
>>>> and nir_validate could easily keep a list of fails (maybe up to some
>>>> threshold), and only assert at the end if num_errors > 0..
>>>>
>>>> That might be an easier way to go than merging the two existing
>>>> passes..  although if I was starting from scratch merging the two
>>>> might have been the better approach
>>>
>>> If we want to show multiple failures, we probably want to display the
>>> assertion failure inline when printing -- otherwise things might get
>>> confusing when reading the output (which assertion goes with which
>>> line?). We could have a way of adding annotation strings to an
>>> instruction/variable/etc. when printing it, and then have nir_validate
>>> use that. I'd imagine it might be useful for other things too, like
>>> printing the results of an analysis pass for unit tests.
>>
>> fwiw, what I did was nir_validate constructs a hashtable mapping
>> offending object (currently instr or var, but I guess we could add
>> whatever) to assert string.. then passes that to nir_print which does
>> hashtable lookups as it prints instr/var/whatever and displays all the
>> failed assert msgs inline with the dump of the shader.  Seems to work
>> pretty well..
>>
>> But I haven't yet done a rebase -i to squash that vs other various
>> interleaved unrelated fixup patches.. and still need to wire up the
>> error reporting for printing var's (which I didn't need yet for what I
>> was debugging) so I'll get around to cleaning that up and resending
>> sometime this weekend
>
> Ok. I'd call them annotations instead of assert strings inside of
> nir_print.c (since they're really more general than that) but
> otherwise seems like a good idea.

right, I'll concat the "error:" part of the msg in nir_validate, and
rename things a bit to generalize.

BR,
-R
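
The "keep a list of fails and only assert at the end" idea, with the "error:" prefix added on the validator side, might look roughly like the sketch below. Everything here is hypothetical (ValidateState, validate_check(), validate_values(), and the threshold of 16); it is not the NIR code, only an outline of the approach discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical validator state; names are illustrative, not NIR's.
struct ValidateState {
   std::vector<std::string> errors;  // recorded failures, in order
   std::size_t max_errors = 16;      // assumed threshold, stop recording past it
};

// Records a failure instead of aborting on the first one. The "error: "
// prefix is added here so the printer can treat the text as a generic
// annotation. Returns cond so callers can skip dependent checks.
bool validate_check(ValidateState &st, bool cond, const std::string &what)
{
   if (!cond && st.errors.size() < st.max_errors)
      st.errors.push_back("error: " + what);
   return cond;
}

// Example pass: every value must be non-negative and below `limit`.
// Returns the number of recorded failures; a caller would assert on
// that count once the whole pass has finished.
std::size_t validate_values(ValidateState &st, const std::vector<int> &vals,
                            int limit)
{
   for (std::size_t i = 0; i < vals.size(); i++) {
      validate_check(st, vals[i] >= 0,
                     "value " + std::to_string(i) + " is negative");
      validate_check(st, vals[i] < limit,
                     "value " + std::to_string(i) + " exceeds limit");
   }
   return st.errors.size();
}
```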


>>
>> BR,
>> -R
>>
>>

>>>> BR,
>>>> -R
>>>>
>>>>>>
>>>>>> Example output: http://hastebin.com/raw/qorirayazu
>>>>>> ---
>>>>>>  src/compiler/nir/nir.h  |  1 +
>>>>>>  src/compiler/nir/nir_print.c| 14 +-
>>>>>>  src/compiler/nir/nir_validate.c | 15 +++
>>>>>>  3 files changed, 29 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>>>>>> index ade584c..6bb9fbe 100644
>>>>>> --- a/src/compiler/nir/nir.h
>>>>>> +++ b/src/compiler/nir/nir.h
>>>>>> @@ -2196,6 +2196,7 @@ unsigned nir_index_instrs(nir_function_impl *impl);
>>>>>>  void nir_index_blocks(nir_function_impl *impl);
>>>>>>
>>>>>>  void nir_print_shader(nir_shader *shader, FILE *fp);
>>>>>> +void nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr
>>>>>> *instr);
>>>>>>  void nir_print_instr(const nir_instr *instr, FILE *fp);
>>>>>>
>>>>>>  nir_shader *nir_shader_clone(void *mem_ctx, const nir_shader *s);
>>>>>> diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
>>>>>> index a36561e..3b25a49 100644
>>>>>> --- a/src/compiler/nir/nir_print.c
>>>>>> +++ b/src/compiler/nir/nir_print.c
>>>>>> @@ -53,6 +53,8 @@ typedef struct {
>>>>>>
>>>>>> /* an index used to make new non-conflicting names */
>>>>>> unsigned index;
>>>>>> +
>>>>>> +   nir_instr *err_instr;
>>>>>>  } print_state;
>>>>>>
>>>>>>  static void
>>>>>> @@ -916,6 +918,8 @@ print_block(nir_block *block, print_state *state,
>>>>>> unsigned tabs)
>>>>>> free(preds);
>>>>>>
>>>>>> nir_foreach_instr(instr, block) {
>>>>>> +  if (instr == state->err_instr)
>>>>>> + fprintf(fp, "=>");
>>>>>> print_instr(instr, state, tabs);
>>>>>> fprintf(fp, "\n");
>>>>>> }
>>>>>> @@ -1090,11 +1094,13 @@ destroy_print_state(print_state *state)
>>>>>>  }
>>>>>>
>>>>>>  void
>>>>>>

Re: [Mesa-dev] [RFC] nir/validate: on failure, dump shader w/ offending line annotated

2016-05-13 Thread Connor Abbott

On Fri, May 13, 2016 at 9:07 PM, Rob Clark  wrote:
> On Fri, May 13, 2016 at 8:23 PM, Connor Abbott  wrote:
>> On Fri, May 13, 2016 at 4:14 PM, Rob Clark  wrote:
>>> On Fri, May 13, 2016 at 4:10 PM, Jason Ekstrand  
>>> wrote:
>>>> On Fri, May 13, 2016 at 1:02 PM, Rob Clark wrote:
>>>>>
>>>>> From: Rob Clark
>>>>>
>>>>> If we assert in nir_validate_shader(), print the shader with the
>>>>> offending instruction prefixed with "=>" to make it easier to find what
>>>>> part of the shader nir_validate is complaining about.
>>>>>
>>>>> Macro funny-business in nir_validate() was just to avoid changing a
>>>>> bazillion assert() lines to validate_assert() (or similar) for the point
>>>>> of an RFC ;-)
>>>>
>>>>
>>>> I love this idea.  I just wish it worked for more than just instructions.
>>>> It would also be fantastic if it were somehow able to print more than one
>>>> error.  Maybe something where we tie printing and validation together
>>>> somehow?  Just a thought.
>>>
>>> hmm, err_instr could easily become a void* (or array of void*?) to
>>> match var's/etc too..
>>>
>>> and nir_validate could easily keep a list of fails (maybe up to some
>>> threshold), and only assert at the end if num_errors > 0..
>>>
>>> That might be an easier way to go than merging the two existing
>>> passes..  although if I was starting from scratch merging the two
>>> might have been the better approach
>>
>> If we want to show multiple failures, we probably want to display the
>> assertion failure inline when printing -- otherwise things might get
>> confusing when reading the output (which assertion goes with which
>> line?). We could have a way of adding annotation strings to an
>> instruction/variable/etc. when printing it, and then have nir_validate
>> use that. I'd imagine it might be useful for other things too, like
>> printing the results of an analysis pass for unit tests.
>
> fwiw, what I did was nir_validate constructs a hashtable mapping
> offending object (currently instr or var, but I guess we could add
> whatever) to assert string.. then passes that to nir_print which does
> hashtable lookups as it prints instr/var/whatever and displays all the
> failed assert msgs inline with the dump of the shader.  Seems to work
> pretty well..
>
> But I haven't yet done a rebase -i to squash that vs other various
> interleaved unrelated fixup patches.. and still need to wire up the
> error reporting for printing var's (which I didn't need yet for what I
> was debugging) so I'll get around to cleaning that up and resending
> sometime this weekend

Ok. I'd call them annotations instead of assert strings inside of
nir_print.c (since they're really more general than that) but
otherwise seems like a good idea.

>
> BR,
> -R
>
>
>>>
>>> BR,
>>> -R
>>>
>>>>>
>>>>> Example output: http://hastebin.com/raw/qorirayazu
>>>>> ---
>>>>>  src/compiler/nir/nir.h  |  1 +
>>>>>  src/compiler/nir/nir_print.c| 14 +-
>>>>>  src/compiler/nir/nir_validate.c | 15 +++
>>>>>  3 files changed, 29 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>>>>> index ade584c..6bb9fbe 100644
>>>>> --- a/src/compiler/nir/nir.h
>>>>> +++ b/src/compiler/nir/nir.h
>>>>> @@ -2196,6 +2196,7 @@ unsigned nir_index_instrs(nir_function_impl *impl);
>>>>>  void nir_index_blocks(nir_function_impl *impl);
>>>>>
>>>>>  void nir_print_shader(nir_shader *shader, FILE *fp);
>>>>> +void nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr
>>>>> *instr);
>>>>>  void nir_print_instr(const nir_instr *instr, FILE *fp);
>>>>>
>>>>>  nir_shader *nir_shader_clone(void *mem_ctx, const nir_shader *s);
>>>>> diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
>>>>> index a36561e..3b25a49 100644
>>>>> --- a/src/compiler/nir/nir_print.c
>>>>> +++ b/src/compiler/nir/nir_print.c
>>>>> @@ -53,6 +53,8 @@ typedef struct {
>>>>>
>>>>> /* an index used to make new non-conflicting names */
>>>>> unsigned index;
>>>>> +
>>>>> +   nir_instr *err_instr;
>>>>>  } print_state;
>>>>>
>>>>>  static void
>>>>> @@ -916,6 +918,8 @@ print_block(nir_block *block, print_state *state,
>>>>> unsigned tabs)
>>>>> free(preds);
>>>>>
>>>>> nir_foreach_instr(instr, block) {
>>>>> +  if (instr == state->err_instr)
>>>>> + fprintf(fp, "=>");
>>>>> print_instr(instr, state, tabs);
>>>>> fprintf(fp, "\n");
>>>>> }
>>>>> @@ -1090,11 +1094,13 @@ destroy_print_state(print_state *state)
>>>>>  }
>>>>>
>>>>>  void
>>>>> -nir_print_shader(nir_shader *shader, FILE *fp)
>>>>> +nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr *instr)
>>>>>  {
>>>>> print_state state;
>>>>> init_print_state(&state, shader, fp);
>>>>>
>>>>> +   state.err_instr = instr;
>>>>> +
>>>>> fprintf(fp, "shader: %s\n",

Re: [Mesa-dev] [RFC] nir/validate: on failure, dump shader w/ offending line annotated

2016-05-13 Thread Rob Clark

On Fri, May 13, 2016 at 8:23 PM, Connor Abbott  wrote:
> On Fri, May 13, 2016 at 4:14 PM, Rob Clark  wrote:
>> On Fri, May 13, 2016 at 4:10 PM, Jason Ekstrand  wrote:
>>> On Fri, May 13, 2016 at 1:02 PM, Rob Clark  wrote:

>>>> From: Rob Clark
>>>>
>>>> If we assert in nir_validate_shader(), print the shader with the
>>>> offending instruction prefixed with "=>" to make it easier to find what
>>>> part of the shader nir_validate is complaining about.
>>>>
>>>> Macro funny-business in nir_validate() was just to avoid changing a
>>>> bazillion assert() lines to validate_assert() (or similar) for the point
>>>> of an RFC ;-)
>>>
>>>
>>> I love this idea.  I just wish it worked for more than just instructions.
>>> It would also be fantastic if it were somehow able to print more than one
>>> error.  Maybe something where we tie printing and validation together
>>> somehow?  Just a thought.
>>
>> hmm, err_instr could easily become a void* (or array of void*?) to
>> match var's/etc too..
>>
>> and nir_validate could easily keep a list of fails (maybe up to some
>> threshold), and only assert at the end if num_errors > 0..
>>
>> That might be an easier way to go than merging the two existing
>> passes..  although if I was starting from scratch merging the two
>> might have been the better approach
>
> If we want to show multiple failures, we probably want to display the
> assertion failure inline when printing -- otherwise things might get
> confusing when reading the output (which assertion goes with which
> line?). We could have a way of adding annotation strings to an
> instruction/variable/etc. when printing it, and then have nir_validate
> use that. I'd imagine it might be useful for other things too, like
> printing the results of an analysis pass for unit tests.

fwiw, what I did was nir_validate constructs a hashtable mapping
offending object (currently instr or var, but I guess we could add
whatever) to assert string.. then passes that to nir_print which does
hashtable lookups as it prints instr/var/whatever and displays all the
failed assert msgs inline with the dump of the shader.  Seems to work
pretty well..

But I haven't yet done a rebase -i to squash that vs other various
interleaved unrelated fixup patches.. and still need to wire up the
error reporting for printing var's (which I didn't need yet for what I
was debugging) so I'll get around to cleaning that up and resending
sometime this weekend

BR,
-R
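
A rough sketch of the hashtable approach described above, for anyone who wants to see the shape of it before the cleaned-up patch arrives. Instr, Annotations and print_annotated() are hypothetical names; the real code would key on nir_instr/nir_variable pointers and use Mesa's own hash table rather than std::unordered_map.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an instruction; not a nir_instr.
struct Instr {
   std::string text;
};

// Maps an offending object (instruction, variable, ...) to the message
// that should be printed next to it.
using Annotations = std::unordered_map<const void *, std::string>;

// Prints every instruction on its own line and, if the annotation table
// has an entry for it, the message on the following line prefixed "=> ".
std::string print_annotated(const std::vector<Instr> &instrs,
                            const Annotations &notes)
{
   std::ostringstream out;
   for (const Instr &instr : instrs) {
      out << instr.text << "\n";
      auto it = notes.find(&instr);
      if (it != notes.end())
         out << "=> " << it->second << "\n";
   }
   return out.str();
}
```

Keying on const void * is what lets one table cover instructions and variables alike, at the cost of the printer not knowing what kind of object an entry refers to.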


>>
>> BR,
>> -R
>>

>>>> Example output: http://hastebin.com/raw/qorirayazu
>>>> ---
>>>>  src/compiler/nir/nir.h  |  1 +
>>>>  src/compiler/nir/nir_print.c| 14 +-
>>>>  src/compiler/nir/nir_validate.c | 15 +++
>>>>  3 files changed, 29 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>>>> index ade584c..6bb9fbe 100644
>>>> --- a/src/compiler/nir/nir.h
>>>> +++ b/src/compiler/nir/nir.h
>>>> @@ -2196,6 +2196,7 @@ unsigned nir_index_instrs(nir_function_impl *impl);
>>>>  void nir_index_blocks(nir_function_impl *impl);
>>>>
>>>>  void nir_print_shader(nir_shader *shader, FILE *fp);
>>>> +void nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr
>>>> *instr);
>>>>  void nir_print_instr(const nir_instr *instr, FILE *fp);
>>>>
>>>>  nir_shader *nir_shader_clone(void *mem_ctx, const nir_shader *s);
>>>> diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
>>>> index a36561e..3b25a49 100644
>>>> --- a/src/compiler/nir/nir_print.c
>>>> +++ b/src/compiler/nir/nir_print.c
>>>> @@ -53,6 +53,8 @@ typedef struct {
>>>>
>>>> /* an index used to make new non-conflicting names */
>>>> unsigned index;
>>>> +
>>>> +   nir_instr *err_instr;
>>>>  } print_state;
>>>>
>>>>  static void
>>>> @@ -916,6 +918,8 @@ print_block(nir_block *block, print_state *state,
>>>> unsigned tabs)
>>>> free(preds);
>>>>
>>>> nir_foreach_instr(instr, block) {
>>>> +  if (instr == state->err_instr)
>>>> + fprintf(fp, "=>");
>>>> print_instr(instr, state, tabs);
>>>> fprintf(fp, "\n");
>>>> }
>>>> @@ -1090,11 +1094,13 @@ destroy_print_state(print_state *state)
>>>>  }
>>>>
>>>>  void
>>>> -nir_print_shader(nir_shader *shader, FILE *fp)
>>>> +nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr *instr)
>>>>  {
>>>> print_state state;
>>>> init_print_state(&state, shader, fp);
>>>>
>>>> +   state.err_instr = instr;
>>>> +
>>>> fprintf(fp, "shader: %s\n", gl_shader_stage_name(shader->stage));
>>>>
>>>> if (shader->info.name)
>>>> @@ -1144,6 +1150,12 @@ nir_print_shader(nir_shader *shader, FILE *fp)
>>>>  }
>>>>
>>>>  void
>>>> +nir_print_shader(nir_shader *shader, FILE *fp)
>>>> +{
>>>> +   nir_print_shader_err(shader, fp, NULL);
>>>> +}
>>>> +
>>>> +void
>>>>  nir_print_instr(const nir_instr *instr, FILE *fp)
>>>>  {
>>>> print_state state = {

Re: [Mesa-dev] [PATCH] i965: Fix undefined df bits in brw_reg comparisons.

2016-05-13 Thread Francisco Jerez

Kenneth Graunke  writes:

> Commit 5310bca024f77da40ea6f4c275455f9cb0528f9e added a new "double df"
> field to the brw_reg struct, adding an extra 4 bytes of data that isn't
> usually initialized (or may contain irrelevant garbage if the struct is
> mutated).  This means that it's no longer safe to memcmp().
>
> Instead, add a brw_regs_equal() function which ignores the extra df bits
> unless they matter.  To keep the implementation cheap, we wrap the first
> set of fields in a union/struct so that we can use a single DWord
> comparison.
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
>  src/mesa/drivers/dri/i965/brw_reg.h  | 27 
> +---
>  src/mesa/drivers/dri/i965/brw_shader.cpp |  2 +-
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
>  4 files changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 4f6f3a3..3b50a82 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -1010,7 +1010,7 @@ fs_generator::generate_tex(fs_inst *inst, struct 
> brw_reg dst, struct brw_reg src
>brw_set_default_mask_control(p, BRW_MASK_DISABLE);
>brw_set_default_access_mode(p, BRW_ALIGN_1);
>  
> -  if (memcmp(&surface_reg, &sampler_reg, sizeof(surface_reg)) == 0) {
> +  if (brw_regs_equal(&surface_reg, &sampler_reg)) {
>   brw_MUL(p, addr, sampler_reg, brw_imm_uw(0x101));
>} else {
>   brw_SHL(p, addr, sampler_reg, brw_imm_ud(8));
> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
> b/src/mesa/drivers/dri/i965/brw_reg.h
> index 6d51623..71e1024 100644
> --- a/src/mesa/drivers/dri/i965/brw_reg.h
> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
> @@ -234,14 +234,19 @@ uint32_t brw_swizzle_immediate(enum brw_reg_type type, 
> uint32_t x, unsigned swz)
>   * or "structure of array" form:
>   */
>  struct brw_reg {
> -   enum brw_reg_type type:4;
> -   enum brw_reg_file file:3;  /* :2 hardware format */
> -   unsigned negate:1; /* source only */
> -   unsigned abs:1;/* source only */
> -   unsigned address_mode:1;   /* relative addressing, hopefully! */
> -   unsigned pad0:1;
> -   unsigned subnr:5;  /* :1 in align16 */
> -   unsigned nr:16;
> +   union {
> +  struct {
> + enum brw_reg_type type:4;
> + enum brw_reg_file file:3;  /* :2 hardware format */
> + unsigned negate:1; /* source only */
> + unsigned abs:1;/* source only */
> + unsigned address_mode:1;   /* relative addressing, hopefully! */
> + unsigned pad0:1;
> + unsigned subnr:5;  /* :1 in align16 */
> + unsigned nr:16;
> +  };
> +  uint32_t bits;
> +   };
>  
> union {
>struct {
> @@ -261,6 +266,12 @@ struct brw_reg {
> };
>  };
>  
> +static inline bool
> +brw_regs_equal(const struct brw_reg *a, const struct brw_reg *b)
> +{
> +   const bool df = a->type == BRW_REGISTER_TYPE_DF && a->file == IMM;
> +   return a->bits == b->bits && (df ? a->df == b->df : a->ud == b->ud);
> +}
>  
>  struct brw_indirect {
> unsigned addr_subnr:4;
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
> b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index a23f14e..8d9e309 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -687,7 +687,7 @@ backend_shader::backend_shader(const struct brw_compiler 
> *compiler,
>  bool
>  backend_reg::equals(const backend_reg &r) const
>  {
> -   return memcmp((brw_reg *)this, (brw_reg *)&r, sizeof(brw_reg)) == 0 &&
> +   return brw_regs_equal((brw_reg *)this, (brw_reg *)&r) &&

These casts should be redundant now, if the upcast is safe the compiler
will be able to figure it out for you.  With that cleaned up:

Reviewed-by: Francisco Jerez 

>reg_offset == r.reg_offset;
>  }
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> index 4b44c3a..baf4422 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> @@ -295,7 +295,7 @@ generate_tex(struct brw_codegen *p,
>brw_set_default_mask_control(p, BRW_MASK_DISABLE);
>brw_set_default_access_mode(p, BRW_ALIGN_1);
>  
> -  if (memcmp(&surface_reg, &sampler_reg, sizeof(surface_reg)) == 0) {
> +  if (brw_regs_equal(&surface_reg, &sampler_reg)) {
>   brw_MUL(p, addr, sampler_reg, brw_imm_uw(0x101));
>} else {
>   brw_SHL(p, addr, sampler_reg, brw_imm_ud(8));
> -- 
> 2.8.2


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

Re: [Mesa-dev] [PATCH v2 27/30] i965/tes/scalar: Fix load input for doubles

2016-05-13 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
Reviewed-by: Francisco Jerez 

> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 2160127..57ab020 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -2716,8 +2716,8 @@ fs_visitor::nir_emit_tes_intrinsic(const fs_builder 
> &bld,
>   if (imm_offset < max_push_slots) {
>  fs_reg src = fs_reg(ATTR, imm_offset / 2, dest.type);
>  for (int i = 0; i < instr->num_components; i++) {
> -   bld.MOV(offset(dest, bld, i),
> -   component(src, 4 * (imm_offset % 2) + i));
> +   unsigned comp = 16 / type_sz(dest.type) * (imm_offset % 2) + 
> i;
> +   bld.MOV(offset(dest, bld, i), component(src, comp));
>  }
>  tes_prog_data->base.urb_read_length =
> MAX2(tes_prog_data->base.urb_read_length,
> -- 
> 2.5.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965: Fix undefined df bits in brw_reg comparisons.

2016-05-13 Thread Kenneth Graunke

Commit 5310bca024f77da40ea6f4c275455f9cb0528f9e added a new "double df"
field to the brw_reg struct, adding an extra 4 bytes of data that isn't
usually initialized (or may contain irrelevant garbage if the struct is
mutated).  This means that it's no longer safe to memcmp().

Instead, add a brw_regs_equal() function which ignores the extra df bits
unless they matter.  To keep the implementation cheap, we wrap the first
set of fields in a union/struct so that we can use a single DWord
comparison.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
 src/mesa/drivers/dri/i965/brw_reg.h  | 27 +---
 src/mesa/drivers/dri/i965/brw_shader.cpp |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
 4 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 4f6f3a3..3b50a82 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1010,7 +1010,7 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
   brw_set_default_access_mode(p, BRW_ALIGN_1);
 
-  if (memcmp(&surface_reg, &sampler_reg, sizeof(surface_reg)) == 0) {
+  if (brw_regs_equal(&surface_reg, &sampler_reg)) {
  brw_MUL(p, addr, sampler_reg, brw_imm_uw(0x101));
   } else {
  brw_SHL(p, addr, sampler_reg, brw_imm_ud(8));
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
b/src/mesa/drivers/dri/i965/brw_reg.h
index 6d51623..71e1024 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -234,14 +234,19 @@ uint32_t brw_swizzle_immediate(enum brw_reg_type type, 
uint32_t x, unsigned swz)
  * or "structure of array" form:
  */
 struct brw_reg {
-   enum brw_reg_type type:4;
-   enum brw_reg_file file:3;  /* :2 hardware format */
-   unsigned negate:1; /* source only */
-   unsigned abs:1;/* source only */
-   unsigned address_mode:1;   /* relative addressing, hopefully! */
-   unsigned pad0:1;
-   unsigned subnr:5;  /* :1 in align16 */
-   unsigned nr:16;
+   union {
+  struct {
+ enum brw_reg_type type:4;
+ enum brw_reg_file file:3;  /* :2 hardware format */
+ unsigned negate:1; /* source only */
+ unsigned abs:1;/* source only */
+ unsigned address_mode:1;   /* relative addressing, hopefully! */
+ unsigned pad0:1;
+ unsigned subnr:5;  /* :1 in align16 */
+ unsigned nr:16;
+  };
+  uint32_t bits;
+   };
 
union {
   struct {
@@ -261,6 +266,12 @@ struct brw_reg {
};
 };
 
+static inline bool
+brw_regs_equal(const struct brw_reg *a, const struct brw_reg *b)
+{
+   const bool df = a->type == BRW_REGISTER_TYPE_DF && a->file == IMM;
+   return a->bits == b->bits && (df ? a->df == b->df : a->ud == b->ud);
+}
 
 struct brw_indirect {
unsigned addr_subnr:4;
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index a23f14e..8d9e309 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -687,7 +687,7 @@ backend_shader::backend_shader(const struct brw_compiler 
*compiler,
 bool
 backend_reg::equals(const backend_reg &r) const
 {
-   return memcmp((brw_reg *)this, (brw_reg *)&r, sizeof(brw_reg)) == 0 &&
+   return brw_regs_equal((brw_reg *)this, (brw_reg *)&r) &&
   reg_offset == r.reg_offset;
 }
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 4b44c3a..baf4422 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -295,7 +295,7 @@ generate_tex(struct brw_codegen *p,
   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
   brw_set_default_access_mode(p, BRW_ALIGN_1);
 
-  if (memcmp(&surface_reg, &sampler_reg, sizeof(surface_reg)) == 0) {
+  if (brw_regs_equal(&surface_reg, &sampler_reg)) {
  brw_MUL(p, addr, sampler_reg, brw_imm_uw(0x101));
   } else {
  brw_SHL(p, addr, sampler_reg, brw_imm_ud(8));
-- 
2.8.2
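
For readers following the thread, the comparison rule described in the
commit message can be sketched outside of Mesa.  This is a minimal
illustration under stated assumptions, not the driver code: Reg,
make_header, regs_equal and all constant values are hypothetical
stand-ins, the header is modelled as a plain 32-bit word instead of a
bitfield union, and the 64-bit payload is compared by bit pattern where
the patch compares df as a double.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encodings, chosen only for this sketch.
constexpr std::uint32_t TYPE_MASK  = 0xF;  // low 4 bits hold the type
constexpr std::uint32_t FILE_SHIFT = 4;    // the next 3 bits hold the file
constexpr std::uint32_t FILE_MASK  = 0x7;
constexpr std::uint32_t TYPE_D   = 1;
constexpr std::uint32_t TYPE_DF  = 2;
constexpr std::uint32_t FILE_IMM = 3;

// Stand-in for brw_reg: one 32-bit header word plus a payload whose upper
// 32 bits are only meaningful for double-precision immediates.
struct Reg {
   std::uint32_t header;
   std::uint64_t payload;
};

static std::uint32_t make_header(std::uint32_t type, std::uint32_t file)
{
   return (type & TYPE_MASK) | ((file & FILE_MASK) << FILE_SHIFT);
}

// Compare the header with a single 32-bit test, then compare only the
// payload bits that matter for this kind of register.
static bool regs_equal(const Reg &a, const Reg &b)
{
   const bool df_imm =
      (a.header & TYPE_MASK) == TYPE_DF &&
      ((a.header >> FILE_SHIFT) & FILE_MASK) == FILE_IMM;

   if (a.header != b.header)
      return false;

   return df_imm ? a.payload == b.payload
                 : static_cast<std::uint32_t>(a.payload) ==
                   static_cast<std::uint32_t>(b.payload);
}
```

Unlike memcmp(), this never reads struct padding, and stale upper payload
bits only affect the result when the register really is a 64-bit
immediate.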

Re: [Mesa-dev] [PATCH v2 22/30] i965/vec4: handle doubles in type_size_vec4()

2016-05-13 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> The scalar backend uses this to check URB input sizes.
> ---
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index 507f2ee..a0e18c6 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -587,16 +587,20 @@ type_size_vec4(const struct glsl_type *type)
> case GLSL_TYPE_INT:
> case GLSL_TYPE_FLOAT:
> case GLSL_TYPE_BOOL:
> +   case GLSL_TYPE_DOUBLE:
>if (type->is_matrix()) {
> -  return type->matrix_columns;
> + const glsl_type *col_type = type->column_type();
> + unsigned col_slots = col_type->is_dual_slot_double() ? 2 : 1;
> + return type->matrix_columns * col_slots;
>} else {
>/* Regardless of size of vector, it gets a vec4. This is bad
> * packing for things like floats, but otherwise arrays become a
> * mess.  Hopefully a later pass over the code can pack scalars
> * down if appropriate.
> */
> -  return 1;
> + return type->is_dual_slot_double() ? 2 : 1;
>}
> +  break;

Redundant break after return -- With that cleaned up:

Reviewed-by: Francisco Jerez 

> case GLSL_TYPE_ARRAY:
>assert(type->length > 0);
>return type_size_vec4(type->fields.array) * type->length;
> @@ -619,7 +623,6 @@ type_size_vec4(const struct glsl_type *type)
> case GLSL_TYPE_IMAGE:
>return DIV_ROUND_UP(BRW_IMAGE_PARAM_SIZE, 4);
> case GLSL_TYPE_VOID:
> -   case GLSL_TYPE_DOUBLE:
> case GLSL_TYPE_ERROR:
> case GLSL_TYPE_INTERFACE:
> case GLSL_TYPE_FUNCTION:
> -- 
> 2.5.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 18/30] i965/fs: support doubles with SSBO loads

2016-05-13 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>

Same comment as for PATCH 17, and likewise:

Reviewed-by: Francisco Jerez 

> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 73b9082..ae95448 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -3516,13 +3516,8 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
> nir_intrinsic_instr *instr
>}
>  
>/* Read the vector */
> -  fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
> - 1 /* dims */,
> - instr->num_components,
> - BRW_PREDICATE_NONE);
> -  read_result.type = dest.type;
> -  for (int i = 0; i < instr->num_components; i++)
> - bld.MOV(offset(dest, bld, i), offset(read_result, bld, i));
> +  do_untyped_vector_read(bld, surf_index, offset_reg,
> + dest, instr->num_components);
>  
>break;
> }
> -- 
> 2.5.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] nir/validate: on failure, dump shader w/ offending line annotated

2016-05-13 Thread Connor Abbott

On Fri, May 13, 2016 at 4:14 PM, Rob Clark  wrote:
> On Fri, May 13, 2016 at 4:10 PM, Jason Ekstrand  wrote:
>> On Fri, May 13, 2016 at 1:02 PM, Rob Clark  wrote:
>>>
>>> From: Rob Clark 
>>>
>>> If we assert in nir_validate_shader(), print the shader with the
>>> offending instruction prefixed with "=>" to make it easier to find what
>>> part of the shader nir_validate is complaining about.
>>>
>>> Macro funny-business in nir_validate() was just to avoid changing a
>>> bazillion assert() lines to validate_assert() (or similar) for the point
>>> of an RFC ;-)
>>
>>
>> I love this idea.  I just wish it worked for more than just instructions.
>> It would also be fantastic if it were somehow able to print more than one
>> error.  Maybe something where we tie printing and validation together
>> somehow?  Just a thought.
>
> hmm, err_instr could easily become a void* (or array of void*?) to
> match var's/etc too..
>
> and nir_validate could easily keep a list of fails (maybe up to some
> threshold), and only assert at the end if num_errors > 0..
>
> That might be an easier way to go than merging the two existing
> passes..  although if I was starting from scratch merging the two
> might have been the better approach

If we want to show multiple failures, we probably want to display the
assertion failure inline when printing -- otherwise things might get
confusing when reading the output (which assertion goes with which
line?). We could have a way of adding annotation strings to an
instruction/variable/etc. when printing it, and then have nir_validate
use that. I'd imagine it might be useful for other things too, like
printing the results of an analysis pass for unit tests.
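
A rough sketch of what I mean, with made-up names: Instr stands in for
nir_instr, and Annotations and print_annotated are hypothetical helpers
that do not exist in NIR today.  The validator (or an analysis pass)
would fill the table, and the printer would emit every note right under
the object it belongs to.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for nir_instr; only the printed text matters here.
struct Instr {
   std::string text;
};

// Side table mapping any printed object (instruction, variable, ...) to
// the messages that should appear next to it.
using Annotations = std::map<const void *, std::vector<std::string>>;

// Print a block and place each annotation directly below the line it
// refers to, so several failures stay readable.
static std::string print_annotated(const std::vector<Instr> &block,
                                   const Annotations &notes)
{
   std::ostringstream out;
   for (const Instr &instr : block) {
      out << instr.text << "\n";
      auto it = notes.find(&instr);
      if (it != notes.end()) {
         for (const std::string &msg : it->second)
            out << "   => " << msg << "\n";
      }
   }
   return out.str();
}
```

Keying the table on a void pointer is what would let the same mechanism
cover variables and other objects, not just instructions.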

>
> BR,
> -R
>
>>>
>>> Example output: http://hastebin.com/raw/qorirayazu
>>> ---
>>>  src/compiler/nir/nir.h  |  1 +
>>>  src/compiler/nir/nir_print.c| 14 +-
>>>  src/compiler/nir/nir_validate.c | 15 +++
>>>  3 files changed, 29 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>>> index ade584c..6bb9fbe 100644
>>> --- a/src/compiler/nir/nir.h
>>> +++ b/src/compiler/nir/nir.h
>>> @@ -2196,6 +2196,7 @@ unsigned nir_index_instrs(nir_function_impl *impl);
>>>  void nir_index_blocks(nir_function_impl *impl);
>>>
>>>  void nir_print_shader(nir_shader *shader, FILE *fp);
>>> +void nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr
>>> *instr);
>>>  void nir_print_instr(const nir_instr *instr, FILE *fp);
>>>
>>>  nir_shader *nir_shader_clone(void *mem_ctx, const nir_shader *s);
>>> diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
>>> index a36561e..3b25a49 100644
>>> --- a/src/compiler/nir/nir_print.c
>>> +++ b/src/compiler/nir/nir_print.c
>>> @@ -53,6 +53,8 @@ typedef struct {
>>>
>>> /* an index used to make new non-conflicting names */
>>> unsigned index;
>>> +
>>> +   nir_instr *err_instr;
>>>  } print_state;
>>>
>>>  static void
>>> @@ -916,6 +918,8 @@ print_block(nir_block *block, print_state *state,
>>> unsigned tabs)
>>> free(preds);
>>>
>>> nir_foreach_instr(instr, block) {
>>> +  if (instr == state->err_instr)
>>> + fprintf(fp, "=>");
>>>print_instr(instr, state, tabs);
>>>fprintf(fp, "\n");
>>> }
>>> @@ -1090,11 +1094,13 @@ destroy_print_state(print_state *state)
>>>  }
>>>
>>>  void
>>> -nir_print_shader(nir_shader *shader, FILE *fp)
>>> +nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr *instr)
>>>  {
>>> print_state state;
>>> init_print_state(&state, shader, fp);
>>>
>>> +   state.err_instr = instr;
>>> +
>>> fprintf(fp, "shader: %s\n", gl_shader_stage_name(shader->stage));
>>>
>>> if (shader->info.name)
>>> @@ -1144,6 +1150,12 @@ nir_print_shader(nir_shader *shader, FILE *fp)
>>>  }
>>>
>>>  void
>>> +nir_print_shader(nir_shader *shader, FILE *fp)
>>> +{
>>> +   nir_print_shader_err(shader, fp, NULL);
>>> +}
>>> +
>>> +void
>>>  nir_print_instr(const nir_instr *instr, FILE *fp)
>>>  {
>>> print_state state = {
>>> diff --git a/src/compiler/nir/nir_validate.c
>>> b/src/compiler/nir/nir_validate.c
>>> index 84334d4..b47087f 100644
>>> --- a/src/compiler/nir/nir_validate.c
>>> +++ b/src/compiler/nir/nir_validate.c
>>> @@ -97,6 +97,21 @@ typedef struct {
>>> struct hash_table *var_defs;
>>>  } validate_state;
>>>
>>> +
>>> +
>>> +static void
>>> +dump_assert(validate_state *state, const char *failed)
>>> +{
>>> +   fprintf(stderr, "validate failed: %s\n", failed);
>>> +   if (state->instr)
>>> +  nir_print_shader_err(state->shader, stderr, state->instr);
>>> +}
>>> +
>>> +#define __assert assert
>>> +#undef assert
>>> +#define assert(x) do { if (!(x)) { dump_assert(state, #x);
>>> __assert_fail(#x, __FILE__, __LINE__, __func__); } } while (0)
>>> +
>>> +
>>>  static void validate_src(nir_src *src, validate_state *state);
>>>
>>>  static void
>>> --

Re: [Mesa-dev] [PATCH v2 17/30] i965/fs: support double with shared variable loads

2016-05-13 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> Reviewed-by: Kenneth Graunke 

If you change the argument ordering of do_untyped_vector_read, make sure
you remember to update this patch.  With that taken into account:

Reviewed-by: Francisco Jerez 

> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 4af5979..73b9082 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -3041,14 +3041,8 @@ fs_visitor::nir_emit_cs_intrinsic(const fs_builder 
> &bld,
>}
>  
>/* Read the vector */
> -  fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
> - 1 /* dims */,
> - instr->num_components,
> - BRW_PREDICATE_NONE);
> -  read_result.type = dest.type;
> -  for (int i = 0; i < instr->num_components; i++)
> - bld.MOV(offset(dest, bld, i), offset(read_result, bld, i));
> -
> +  do_untyped_vector_read(bld, surf_index, offset_reg,
> + dest, instr->num_components);
>break;
> }
>  
> -- 
> 2.5.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 16/30] i965/fs: Add do_untyped_vector_read helper

2016-05-13 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> We are going to need the same logic for anything that reads
> doubles via untyped messages (CS shared variables and SSBOs). Add a
> helper function with that logic so that we can reuse it.
>
> Reviewed-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.h   |  6 +++
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 64 
> 
>  2 files changed, 70 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> b/src/mesa/drivers/dri/i965/brw_fs.h
> index 1970ad0..1aeacae 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -110,6 +110,12 @@ public:
>  const fs_reg src,
>  uint32_t components);
>  
> +   void do_untyped_vector_read(const brw::fs_builder &bld,
> +   const fs_reg surf_index,
> +   const fs_reg offset_reg,
> +   const fs_reg dest,
> +   unsigned num_components);
> +
> bool run_fs(bool do_rep_send);
> bool run_vs(gl_clip_plane *clip_planes);
> bool run_tcs_single_patch();
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 02f1e81..4af5979 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -3106,6 +3106,70 @@ fs_visitor::nir_emit_cs_intrinsic(const fs_builder 
> &bld,
>  }
>  
>  void
> +fs_visitor::do_untyped_vector_read(const fs_builder &bld,
> +   const fs_reg surf_index,
> +   const fs_reg offset_reg,
> +   const fs_reg dest,

Usually we put the destination register before any source registers in
function argument lists.

> +   unsigned num_components)
> +{
> +   if (type_sz(dest.type) <= 4) {

The code below isn't going to work for type_sz() < 4.  Maybe make the
condition 'type_sz(..) == 4'?

> +  fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
> + 1 /* dims */,
> + num_components,
> + BRW_PREDICATE_NONE);
> +  read_result.type = dest.type;
> +  for (unsigned i = 0; i < num_components; i++)
> + bld.MOV(offset(dest, bld, i), offset(read_result, bld, i));
> +   } else {

This can only work for type_sz() == 8.  How about you make this an 'else
if' statement and put an else statement at the end with a single
unreachable() call in it?
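
The control flow being suggested can be sketched in isolation.  This is
only an illustration under assumptions: plan_reads is a hypothetical
function that returns how many 32-bit components each untyped read would
request, and the exception stands in for Mesa's unreachable() macro.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// For each untyped read message, return the number of 32-bit components
// it would request.  Only 4-byte and 8-byte types are handled.
static std::vector<unsigned> plan_reads(unsigned type_size,
                                        unsigned num_components)
{
   std::vector<unsigned> reads;

   if (type_size == 4) {
      // One message covers the whole vector.
      reads.push_back(num_components);
   } else if (type_size == 8) {
      // Each message carries at most two 64-bit components, that is,
      // four 32-bit components.
      unsigned remaining = num_components;
      while (remaining > 0) {
         const unsigned iter_components = std::min(2u, remaining);
         reads.push_back(iter_components * 2);
         remaining -= iter_components;
      }
   } else {
      // Stand-in for unreachable(): any other size is a caller bug.
      throw std::logic_error("unsupported type size");
   }

   return reads;
}
```

With explicit size checks, a future 16-bit or 128-bit type fails loudly
instead of silently taking one of the two existing paths.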

> +  /* Reading a dvec, so we need to:
> +   *
> +   * 1. Multiply num_components by 2, to account for the fact that we
> +   *need to read 64-bit components.
> +   * 2. Shuffle the result of the load to form valid 64-bit elements
> +   * 3. Emit a second load (for components z/w) if needed.
> +   */
> +  fs_reg read_offset = vgrf(glsl_type::uint_type);
> +  bld.MOV(read_offset, offset_reg);
> +
> +  int iters = num_components <= 2 ? 1 : 2;
> +
> +  /* Load the dvec, the first iteration loads components x/y, the second
> +   * iteration, if needed, loads components z/w
> +   */
> +  for (int it = 0; it < iters; it++) {
> + /* Compute number of components to read in this iteration */
> + int iter_components = MIN2(2, num_components);
> + num_components -= iter_components;
> +
> + /* Read. Since this message reads 32-bit components, we need to
> +  * read twice as many components.
> +  */
> + fs_reg read_result = emit_untyped_read(bld, surf_index, read_offset,
> +1 /* dims */,
> +iter_components * 2,
> +BRW_PREDICATE_NONE);
> +

You'd save some retypes in the code below (actually all of them -- and
also avoid inadvertent type conversion if the destination type is an
integer) if you allocated a separate virtual register here for the
packed 64-bit result, like:

|const fs_reg packed_result = bld.vgrf(dest.type, iter_components);

and then use it as destination for the shuffle function.  With the above
taken into account:

Reviewed-by: Francisco Jerez 

> + /* Shuffle the 32-bit load result into valid 64-bit data */
> + shuffle_32bit_load_result_to_64bit_data(
> +bld,
> +retype(read_result, BRW_REGISTER_TYPE_DF),
> +retype(read_result, BRW_REGISTER_TYPE_F),
> +iter_components);
> +
> + /* Move each component to its destination */
> + read_result = retype(read_result,

Re: [Mesa-dev] [PATCH 1/2] util: Add ATTRIBUTE_RETURNS_NONNULL.

2016-05-13 Thread Kenneth Graunke

On Friday, May 13, 2016 2:31:30 PM PDT Matt Turner wrote:
> ---
>  configure.ac| 1 +
>  m4/ax_gcc_func_attribute.m4 | 7 +++
>  src/util/macros.h   | 6 ++
>  3 files changed, 14 insertions(+)

I wonder if Coverity pays attention to this.  Either way, seems handy.

Series is:
Reviewed-by: Kenneth Graunke 

Re: [Mesa-dev] [PATCH 2/2] i965: initialize the alignment related bits in struct brw_reg

2016-05-13 Thread Francisco Jerez

Kenneth Graunke  writes:

> On Friday, May 13, 2016 1:21:03 PM PDT Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez  writes:
>> 
>> > With the inclusion of the "df" field in the union, this union is going
>> > to be at the offset 8 because of the alignment rules. The alignment
>> > bits in the middle are uninitialized and valgrind complains with errors
>> > similar to this:
>> >
>> > ==10298== Conditional jump or move depends on uninitialised value(s)
>> > ==10298==at 0x4C31D52: __memcmp_sse4_1 (in /usr/lib/valgrind/
> vgpreload_memcheck-amd64-linux.so)
>> > ==10298==by 0xAB16663: backend_reg::equals(backend_reg const&) const 
> (brw_shader.cpp:690)
>> > ==10298==by 0xAAB629D: fs_reg::equals(fs_reg&) const (brw_fs.cpp:456)
>> > ==10298==by 0xAAD4452: operands_match(fs_inst*, fs_inst*, bool*) 
> (brw_fs_cse.cpp:161)
>> > ==10298==by 0xAAD46C3: instructions_match(fs_inst*, fs_inst*, bool*) 
> (brw_fs_cse.cpp:187)
>> > ==10298==by 0xAAD4BAA: fs_visitor::opt_cse_local(bblock_t*) 
> (brw_fs_cse.cpp:251)
>> > ==10298==by 0xAAD5216: fs_visitor::opt_cse() (brw_fs_cse.cpp:361)
>> > ==10298==by 0xAAC8AAD: fs_visitor::optimize() (brw_fs.cpp:5401)
>> > ==10298==by 0xAACB9DC: fs_visitor::run_fs(bool) (brw_fs.cpp:5803)
>> > ==10298==by 0xAACC38B: brw_compile_fs (brw_fs.cpp:6029)
>> > ==10298==by 0xAA39796: brw_codegen_wm_prog (brw_wm.c:137)
>> > ==10298==by 0xAA3B068: brw_fs_precompile (brw_wm.c:637)
>> >
>> > This patch adds an explicit padding and initializes it to zero.
>> >
>> > Signed-off-by: Samuel Iglesias Gonsálvez 
>> > ---
>> >
>> > This patch replaces the following one:
>> >
>> > [PATCH 2/2] i965: check each field separately in backend_end::equals()
>> >
>> >  src/mesa/drivers/dri/i965/brw_reg.h | 5 -
>> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/
> i965/brw_reg.h
>> > index 3b76d7d..ebb7f29 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_reg.h
>> > +++ b/src/mesa/drivers/dri/i965/brw_reg.h
>> > @@ -243,6 +243,9 @@ struct brw_reg {
>> > unsigned subnr:5;  /* :1 in align16 */
>> > unsigned nr:16;
>> >  
>> > +   /* IMPORTANT: adjust padding bits if you add new fields */
>> > +   unsigned padding:32;
>> > +
>> 
>> Ugh!  It seems terribly fragile to me to make assumptions about the
>> amount of (implementation-defined) padding that you're going to end up
>> with.  It would be awful if someone builds the driver on a different
>> compiler or architecture that happens to align things differently, which
>> would cause the whole compiler back-end to behave non-deterministically
>> (possibly without any obvious sign of anything being wrong other than
>> decreased shader performance).  I think the two least insane
>> possibilities we have to fix the problem are:
>> 
>>  - memset() the whole struct at the top of brw_reg() and anywhere else a
>>brw_reg struct is initialized.
>
> This would still break in the case of:
>
> struct brw_reg foo = brw_imm_df(-1.0); // imm.df = 0xBFF0
> struct brw_reg bar = brw_imm_df(-2.0); // imm.df = 0xC000
>
> foo.type = BRW_REGISTER_TYPE_D;
> bar.type = BRW_REGISTER_TYPE_D;
> foo.f = 123;
> bar.f = 123;
>
> Here, the values are the same, but the top 32 bits are different garbage.
> Initialized, but irrelevant.
>
Yeah, good point -- Let's go with the other approach in that case.

>>  - Accept the reality that the struct contains some amount of undefined
>>padding and define a helper function (e.g. brw_regs_equal() in
>>brw_reg.h) to do the comparison manually, then use it everywhere we
>>currently use memcmp() to compare brw_regs.
>
> I think this is the best approach.
>
>> Any suggestions Matt?


Re: [Mesa-dev] [PATCH 2/2] i965: initialize the alignment related bits in struct brw_reg

2016-05-13 Thread Kenneth Graunke

On Friday, May 13, 2016 1:21:03 PM PDT Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez  writes:
> 
> > With the inclusion of the "df" field in the union, this union is going
> > to be at the offset 8 because of the alignment rules. The alignment
> > bits in the middle are uninitialized and valgrind complains with errors
> > similar to this:
> >
> > ==10298== Conditional jump or move depends on uninitialised value(s)
> > ==10298==at 0x4C31D52: __memcmp_sse4_1 (in /usr/lib/valgrind/
vgpreload_memcheck-amd64-linux.so)
> > ==10298==by 0xAB16663: backend_reg::equals(backend_reg const&) const 
(brw_shader.cpp:690)
> > ==10298==by 0xAAB629D: fs_reg::equals(fs_reg&) const (brw_fs.cpp:456)
> > ==10298==by 0xAAD4452: operands_match(fs_inst*, fs_inst*, bool*) 
(brw_fs_cse.cpp:161)
> > ==10298==by 0xAAD46C3: instructions_match(fs_inst*, fs_inst*, bool*) 
(brw_fs_cse.cpp:187)
> > ==10298==by 0xAAD4BAA: fs_visitor::opt_cse_local(bblock_t*) 
(brw_fs_cse.cpp:251)
> > ==10298==by 0xAAD5216: fs_visitor::opt_cse() (brw_fs_cse.cpp:361)
> > ==10298==by 0xAAC8AAD: fs_visitor::optimize() (brw_fs.cpp:5401)
> > ==10298==by 0xAACB9DC: fs_visitor::run_fs(bool) (brw_fs.cpp:5803)
> > ==10298==by 0xAACC38B: brw_compile_fs (brw_fs.cpp:6029)
> > ==10298==by 0xAA39796: brw_codegen_wm_prog (brw_wm.c:137)
> > ==10298==by 0xAA3B068: brw_fs_precompile (brw_wm.c:637)
> >
> > This patch adds an explicit padding and initializes it to zero.
> >
> > Signed-off-by: Samuel Iglesias Gonsálvez 
> > ---
> >
> > This patch replaces the following one:
> >
> > [PATCH 2/2] i965: check each field separately in backend_end::equals()
> >
> >  src/mesa/drivers/dri/i965/brw_reg.h | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/
i965/brw_reg.h
> > index 3b76d7d..ebb7f29 100644
> > --- a/src/mesa/drivers/dri/i965/brw_reg.h
> > +++ b/src/mesa/drivers/dri/i965/brw_reg.h
> > @@ -243,6 +243,9 @@ struct brw_reg {
> > unsigned subnr:5;  /* :1 in align16 */
> > unsigned nr:16;
> >  
> > +   /* IMPORTANT: adjust padding bits if you add new fields */
> > +   unsigned padding:32;
> > +
> 
> Ugh!  It seems terribly fragile to me to make assumptions about the
> amount of (implementation-defined) padding that you're going to end up
> with.  It would be awful if someone builds the driver on a different
> compiler or architecture that happens to align things differently, which
> would cause the whole compiler back-end to behave non-deterministically
> (possibly without any obvious sign of anything being wrong other than
> decreased shader performance).  I think the two least insane
> possibilities we have to fix the problem are:
> 
>  - memset() the whole struct at the top of brw_reg() and anywhere else a
>brw_reg struct is initialized.

This would still break in the case of:

struct brw_reg foo = brw_imm_df(-1.0); // imm.df = 0xBFF0
struct brw_reg bar = brw_imm_df(-2.0); // imm.df = 0xC000

foo.type = BRW_REGISTER_TYPE_D;
bar.type = BRW_REGISTER_TYPE_D;
foo.f = 123;
bar.f = 123;

Here, the values are the same, but the top 32 bits are different garbage.
Initialized, but irrelevant.
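
To make that concrete, here is a small standalone model of the case
above.  It is a sketch under assumptions, not brw_reg itself: Payload,
make_df and write_f are hypothetical, doubles are assumed to be 64-bit
IEEE 754, and the narrow write is modelled as replacing the low 32 bits,
which is what a 32-bit store into the union does on a little-endian host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of the 8-byte immediate storage.
struct Payload {
   std::uint64_t raw;
};

static std::uint64_t double_bits(double d)
{
   static_assert(sizeof(double) == sizeof(std::uint64_t),
                 "sketch assumes a 64-bit double");
   std::uint64_t u;
   std::memcpy(&u, &d, sizeof u);
   return u;
}

static std::uint32_t float_bits(float f)
{
   static_assert(sizeof(float) == sizeof(std::uint32_t),
                 "sketch assumes a 32-bit float");
   std::uint32_t u;
   std::memcpy(&u, &f, sizeof u);
   return u;
}

static Payload make_df(double d)
{
   return Payload{double_bits(d)};
}

// A 32-bit write only replaces the low half; the upper half keeps
// whatever the earlier 64-bit write left there.
static void write_f(Payload &p, float f)
{
   p.raw = (p.raw & 0xFFFFFFFF00000000ull) | float_bits(f);
}

static bool raw_equal(const Payload &a, const Payload &b)
{
   return std::memcmp(&a, &b, sizeof(Payload)) == 0;
}

static bool low32_equal(const Payload &a, const Payload &b)
{
   return static_cast<std::uint32_t>(a.raw) ==
          static_cast<std::uint32_t>(b.raw);
}
```

After make_df(-1.0) and make_df(-2.0) are both overwritten with
write_f(..., 123.0f), low32_equal() reports a match while raw_equal()
does not, even though every byte has been initialized.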

>  - Accept the reality that the struct contains some amount of undefined
>padding and define a helper function (e.g. brw_regs_equal() in
>brw_reg.h) to do the comparison manually, then use it everywhere we
>currently use memcmp() to compare brw_regs.

I think this is the best approach.

> Any suggestions Matt?


Re: [Mesa-dev] [RFC] nir/validate: on failure, dump shader w/ offending line annotated

2016-05-13 Thread Eric Anholt

Rob Clark  writes:

> From: Rob Clark 
>
> If we assert in nir_validate_shader(), print the shader with the
> offending instruction prefixed with "=>" to make it easier to find what
> part of the shader nir_validate is complaining about.
>
> Macro funny-business in nir_validate() was just to avoid changing a
> bazillion assert() lines to validate_assert() (or similar) for the point
> of an RFC ;-)

This is an awesome idea.


Re: [Mesa-dev] [PATCH v2 15/30] i965/fs: support doubles with UBO loads

2016-05-13 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> UBO loads with constant offset use the UNIFORM_PULL_CONSTANT_LOAD
> instruction, which reads 16 bytes (a vec4) of data from memory. For dvec
> types this only provides components x and y. Thus, if we are reading
> more than 2 components we need to issue a second load at offset+16 to
> read the next 16-byte chunk with components z and w.
>
> UBO loads with non-constant offset emit a load for each component
> in the vector (and rely on CSE to fix redundant loads), so we only
> need to consider the size of the data type when computing the offset
> of each element in a vector.
>
> v2 (Sam):
> - Adapt the code to use component() (Curro).
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> Reviewed-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 52 
> +++-
>  1 file changed, 45 insertions(+), 7 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 2d57fd3..02f1e81 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -3362,6 +3362,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
> nir_intrinsic_instr *instr
> nir->info.num_ubos - 1);
>}
>  
> +  /* Number of 32-bit slots in the type */
> +  unsigned type_slots = MAX2(1, type_sz(dest.type) / 4);
> +
>nir_const_value *const_offset = nir_src_as_const_value(instr->src[1]);
>if (const_offset == NULL) {
>   fs_reg base_offset = retype(get_nir_src(instr->src[1]),
> @@ -3369,19 +3372,54 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
> nir_intrinsic_instr *instr
>  
>   for (int i = 0; i < instr->num_components; i++)
>  VARYING_PULL_CONSTANT_LOAD(bld, offset(dest, bld, i), surf_index,
> -   base_offset, i * 4);
> +   base_offset, i * 4 * type_slots);

Why not 'i * type_sz(...)'?  As before it seems like type_slots is just
going to introduce rounding errors here for no benefit?

>} else {
> + /* Even if we are loading doubles, a pull constant load will load
> +  * a 32-bit vec4, so should only reserve vgrf space for that. If we
> +  * need to load a full dvec4 we will have to emit 2 loads. This is
> +  * similar to demote_pull_constants(), except that in that case we
> +  * see individual accesses to each component of the vector and then
> +  * we let CSE deal with duplicate loads. Here we see a vector access
> +  * and we have to split it if necessary.
> +  */
>   fs_reg packed_consts = vgrf(glsl_type::float_type);
>   packed_consts.type = dest.type;
>  
> - struct brw_reg const_offset_reg = brw_imm_ud(const_offset->u32[0] & 
> ~15);
> - bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, packed_consts,
> -  surf_index, const_offset_reg);
> + unsigned const_offset_aligned = const_offset->u32[0] & ~15;
> +
> + /* A vec4 only contains half of a dvec4, if we need more than 2
> +  * components of a dvec4 we will have to issue another load for
> +  * components z and w
> +  */
> + int num_components;
> + if (type_slots == 1)
> +num_components = instr->num_components;
> + else
> +num_components = MIN2(2, instr->num_components);
>
> - const fs_reg consts = byte_offset(packed_consts, const_offset->u32[0] % 16);
> + int remaining_components = instr->num_components;
> + while (remaining_components > 0) {
> +/* Read the vec4 from a 16-byte aligned offset */
> +struct brw_reg const_offset_reg = brw_imm_ud(const_offset_aligned);
> +bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
> + retype(packed_consts, BRW_REGISTER_TYPE_F),
> + surf_index, const_offset_reg);
>  
> - for (unsigned i = 0; i < instr->num_components; i++)
> -bld.MOV(offset(dest, bld, i), component(consts, i));
> +const fs_reg consts = byte_offset(packed_consts, (const_offset->u32[0] % 16));

This looks really fishy to me: if the initial offset is not 16B-aligned
you'll apply the same sub-16B offset to the result from each one of the
subsequent pull constant loads.  Also, you don't seem to take into
account whether the initial offset is misaligned in the calculation of
num_components.  If it is, it looks like the first pull constant load
could return fewer than "num_components" usable components and you would
end up reading past the end of the return payload of the message.
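
A rough sketch of the second point, in plain C rather than i965 backend
code: it only counts how many whole components a single 16-byte load
returns past a given byte offset. The helper names are made up for
illustration and are not Mesa functions.

```c
#include <stdint.h>

/* Hypothetical helper (not a Mesa function): number of whole components of
 * type_size bytes that remain in one 16-byte pull constant load when the
 * data starts const_offset % 16 bytes into it.
 */
static unsigned
usable_components(uint32_t const_offset, unsigned type_size)
{
   const unsigned misalignment = const_offset % 16;
   return (16 - misalignment) / type_size;
}

/* Hypothetical helper: components to copy out of the current load, clamped
 * to what is still left to read.
 */
static unsigned
components_this_load(uint32_t const_offset, unsigned type_size,
                     unsigned remaining)
{
   const unsigned usable = usable_components(const_offset, type_size);
   return remaining < usable ? remaining : usable;
}
```

With doubles (8 bytes) and an initial offset of 8, only one component is
usable from the first load, so a fixed MIN2(2, num_components) would
over-read under these assumptions.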

> +unsigned dest_offset = instr->num_components - remaining_components;
> +
> +for

Re: [Mesa-dev] ARB_cull_distance support v4?

2016-05-13 Thread Roland Scheidegger

Am 13.05.2016 um 23:12 schrieb Dave Airlie:
> On 14 May 2016 at 01:12, Roland Scheidegger  wrote:
>> Am 13.05.2016 um 06:41 schrieb Dave Airlie:
>>> This is just the core patches, as I think the lowering was pretty
>>> broken in the last couple of reposts.
>>>
>>> The lowering now lowers to one array of 8 or whatever. I need
>>> to recheck the gallium and llvmpipe bits on top of this, as I think
>>> llvmpipe will be broken.
>> Maybe. draw expects separate clip and cull dists, each packed as vec4s
>> (it could probably handle up to 2 vec4 for each).
>>
>>>
>>> I think I'm going to rip out the CULLDIST semantic from gallium,
>>> it really isn't what the hw wants.
>>>
>>
>> I can't really see what the output is going to look like from your
>> change, but there are reasons things are the way they are. This is, of
>> course, all inspired by d3d10 (this even predates the gl cull dist
>> extension) - d3d10 has these weirdo packed vec4. The problem is, in
>> d3d10, you can have a vec4 output declared, with x component being an
>> ordinary output, yz being a clipdist, and w being a cull dist. But in
>> gallium, we can't really have different semantics per output - hence
>> clip and cull must be in different outputs (and nothing else can be
>> packed into the same vars).
>> A single array for clip and cull dist each probably would have been
>> cleaner, but we didn't have input/output arrays for system values
>> either at that time, so gallium's design is something which looks like
>> neither what gl, d3d10 nor probably hw wants, but was simple enough to
>> translate and worked.
> 
> Ouch, d3d10 is ugly here. You'll probably need to patch up your state tracker
> to work with the new code, since I don't have access to it, and I'd prefer
> not to keep the in-tree code too messy just to support something crazy like
> the above.
> 
> Hopefully it should be easy enough to translate to the new scheme, since at
> least most of the hardware seems to work off either vec4[2], or float[8]
> and some bitmask.

Yes, shouldn't be much of a problem, since we already had to do lots of
translation. (Initially, there were not actually any properties
associated with clip/cull dists, the number of components was figured
out from the mask of the register. That stopped working though when
Marek changed those inputs to be potentially arrays. I hope that still
works...)

Roland

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 14/30] i965/fs: fix pull constant load component selection for doubles

2016-05-13 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 starting at a
> constant offset that is 16-byte aligned. If we need to access an unaligned
> offset we emit a load with an aligned offset and use the remaining constant
> offset to select the component into the vec4 result that we are interested
> in. This component must be computed in units of the type size, since that
> is what fs_reg::set_smear expects.
>
> This patch does this change in the two places where we use this message:
> In demote_pull_constants when we lower uniform access with constant offset
> into the pull constant buffer and in UBO loads with constant offset.
>
> v2 (Sam):
> - Fix set_smear() in fs_visitor::lower_constant_loads(), take into account
> source type instead and remove MAX2 (Curro).
> - Improve changes to nir_intrinsic_load_ubo case in nir_emit_intrinsic()
> (Curro).
>
> Signed-off-by: Samuel Iglesias Gonsálvez 

Reviewed-by: Francisco Jerez 
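
For reference, the unit conversion described in the commit message can be
sketched stand-alone in C. This assumes pull_index counts 32-bit slots (as
the old `pull_index & 3` suggests); the function name is hypothetical and
only mirrors the expression passed to set_smear() in the patch.

```c
/* Hypothetical stand-alone version of the expression in the patch.
 * (pull_index & 3) is the dword position inside the 16-byte vec4; multiply
 * by 4 to get a byte offset, then divide by the type size to express the
 * component in units of the source type.
 */
static unsigned
smear_component(unsigned pull_index, unsigned type_size)
{
   return (pull_index & 3) * 4 / type_size;
}
```

For a 32-bit type the result is unchanged from `pull_index & 3`; for a
64-bit type dword 2 of the vec4 becomes component 1.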

> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp |  3 ++-
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 13 +++--
>  2 files changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 2383d2c..ebc5128 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2316,7 +2316,8 @@ fs_visitor::lower_constant_loads()
>   inst->src[i].file = VGRF;
>   inst->src[i].nr = dst.nr;
>   inst->src[i].reg_offset = 0;
> - inst->src[i].set_smear(pull_index & 3);
> + inst->src[i].set_smear((pull_index & 3) * 4 /
> +type_sz(inst->src[i].type));
>  
>   brw_mark_surface_used(prog_data, index);
>}
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 4648c58..2d57fd3 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -3378,17 +3378,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>   bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, packed_consts,
>surf_index, const_offset_reg);
>  
> - for (unsigned i = 0; i < instr->num_components; i++) {
> -packed_consts.set_smear(const_offset->u32[0] % 16 / 4 + i);
> + const fs_reg consts = byte_offset(packed_consts, const_offset->u32[0] % 16);
>  
> -/* The std140 packing rules don't allow vectors to cross 16-byte
> - * boundaries, and a reg is 32 bytes.
> - */
> -assert(packed_consts.subreg_offset < 32);
> -
> -bld.MOV(dest, packed_consts);
> -dest = offset(dest, bld, 1);
> - }
> + for (unsigned i = 0; i < instr->num_components; i++)
> +bld.MOV(offset(dest, bld, i), component(consts, i));
>}
>break;
> }
> -- 
> 2.5.0
>

Re: [Mesa-dev] [PATCH 09/11] tgsi: remove culldist semantic.

2016-05-13 Thread Ilia Mirkin

On Fri, May 13, 2016 at 6:43 PM, Roland Scheidegger  wrote:
> Am 13.05.2016 um 23:10 schrieb Dave Airlie:
>> From: Dave Airlie 
>>
>> This isn't used anymore in the tree, culldist's
>> are part of the clipdist semantic, we could in theory
>> rename it, but I'm not sure there is much point, and
>> I'd have to be careful with virgl.
>>
>> Signed-off-by: Dave Airlie 
>> ---
>>  src/gallium/auxiliary/tgsi/tgsi_strings.c  |  1 -
>>  src/gallium/docs/source/tgsi.rst   | 22 ++
>>  src/gallium/include/pipe/p_shader_tokens.h |  1 -
>>  3 files changed, 18 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c b/src/gallium/auxiliary/tgsi/tgsi_strings.c
>> index 306ab4f..c13f7ea 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
>> @@ -85,7 +85,6 @@ const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT] =
>> "PCOORD",
>> "VIEWPORT_INDEX",
>> "LAYER",
>> -   "CULLDIST",
>> "SAMPLEID",
>> "SAMPLEPOS",
>> "SAMPLEMASK",
>> diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
>> index 4315707..ab12490 100644
>> --- a/src/gallium/docs/source/tgsi.rst
>> +++ b/src/gallium/docs/source/tgsi.rst
>> @@ -2876,18 +2876,32 @@ annotated with those semantics.
>>  TGSI_SEMANTIC_CLIPDIST
>>  ""
>>
>> +Note this covers clipping and culling distances.
>> +
>>  When components of vertex elements are identified this way, these
>>  values are each assumed to be a float32 signed distance to a plane.
>> +
>> +For clip distances:
>>  Primitive setup only invokes rasterization on pixels for which
>> -the interpolated plane distances are >= 0. Multiple clip planes
>> -can be implemented simultaneously, by annotating multiple
>> -components of one or more vertex elements with the above specified
>> -semantic. The limits on both clip and cull distances are bound
>> +the interpolated plane distances are >= 0.
>> +
>> +For cull distances:
>> +Primitives will be completely discarded if the plane distance
>> +for all of the vertices in the primitive are < 0.
>> +If a vertex has a cull distance of NaN, that vertex counts as "out"
>> +(as if its < 0);
>> +
>> +Multiple clip/cull planes can be implemented simultaneously, by
>> +annotating multiple components of one or more vertex elements with
>> +the above specified semantic.
>> +The limits on both clip and cull distances are bound
>>  by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
>>  the maximum number of components that can be used to hold the
>>  distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
>>  which specifies the maximum number of registers which can be
>>  annotated with those semantics.
>> +The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
>> +are used to divide up the 2 x vec4 space between clipping and culling.
> This should really say how it's determined which one is which (so clip
> dists come first).
>
>
> You should remove the TGSI_SEMANTIC_CULLDIST section.
>
> For patch 10, shouldn't this work with softpipe too?
>
> Honestly, I'm not a big fan of packing clip and cull dists in the same
> regs (it's still not the same as what d3d10 does in any case). My
> opinion is that since we generally don't allow different semantics within
> the same reg, I see no good reason to allow it here (and clip dists and
> cull dists, albeit somewhat similar, are still different). So, if some
> drivers wanted it in different regs and some in the same regs, I'd
> prefer it to be different regs in the interface, with drivers having to
> merge it when required, just because it looks cleaner. But if all hw
> really wants it like that, 6,8-11 are
> Reviewed-by: Roland Scheidegger 
> (But I'd like to hear from other drivers' authors.)

AFAIK all DX10+ NVIDIA, Intel, and AMD hardware wants it that way. I'm
not aware of any other hw out there that supports culling.
(Indications are that Adreno doesn't do either clipping or culling.
Not sure about the others.)

  -ilia

Re: [Mesa-dev] [PATCH v2 12/30] i965/fs: Fix fs_visitor::VARYING_PULL_CONSTANT_LOAD for doubles

2016-05-13 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> Reviewed-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 4827dea..2383d2c 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -194,8 +194,15 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
> else
>op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD;
>  
> +   /* The pull load message will load a vec4 (16 bytes). If we are loading
> +* a double this means we are only loading 2 elements worth of data.
> +* We also want to use a 32-bit data type for the dst of the load operation
> +* so other parts of the driver don't get confused about the size of the
> +* result.
> +*/
> int regs_written = 4 * (bld.dispatch_width() / 8) * scale;
> -   fs_reg vec4_result = fs_reg(VGRF, alloc.allocate(regs_written), dst.type);
> +   fs_reg vec4_result = fs_reg(VGRF, alloc.allocate(regs_written),
> +   BRW_REGISTER_TYPE_F);
> fs_inst *inst = bld.emit(op, vec4_result, surf_index, vec4_offset);
> inst->regs_written = regs_written;
>  
> @@ -208,7 +215,15 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
>   inst->mlen = 1 + bld.dispatch_width() / 8;
> }
>  
> -   bld.MOV(dst, offset(vec4_result, bld, ((const_offset & 0xf) / 4) * scale));
> +   if (type_sz(dst.type) == 8) {

Add 'assert(scale == 1)' here because you're not taking it into account.

> +  shuffle_32bit_load_result_to_64bit_data(
> + bld, retype(vec4_result, dst.type), vec4_result, 2);
> +   }
> +
> +   vec4_result.type = dst.type;
> +   int type_slots = MAX2(type_sz(dst.type) / 4, 1);

This code is definitely not going to work for type sizes other than 8 or
4, the MAX2(..., 1) will only conceal the problem.  It seems weird that
you convert the type size into an integer number of 32-bit slots
(potentially introducing rounding errors) only to convert it back to
bytes in the next line (because const_offset is expressed in bytes
regardless).  How about:

| bld.MOV(dst, offset(vec4_result, bld,
| (const_offset & 0xf) / type_sz(vec4_result.type) * scale));

and remove the type_slots variable.  With that fixed:

Reviewed-by: Francisco Jerez 

(It would be nice to do the shuffle_32bit_load_result_to_64bit_data()
directly into the destination and clean up the retyping slightly, but I
guess that can be done as a follow-up).
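
To make the rounding concern concrete, here is a small C sketch comparing
the two index computations outside the driver. The function names are
hypothetical; the arithmetic is copied from the patch and from the
suggestion above, with the type size passed in as a plain integer.

```c
static unsigned
max2(unsigned a, unsigned b)
{
   return a > b ? a : b;
}

/* Index as computed in the patch under review: the type size is first
 * rounded to a whole number of 32-bit slots.
 */
static unsigned
index_with_type_slots(unsigned const_offset, unsigned type_size,
                      unsigned scale)
{
   const unsigned type_slots = max2(type_size / 4, 1);
   return ((const_offset & 0xf) / (4 * type_slots)) * scale;
}

/* Index as suggested in the review: divide the byte offset by the type
 * size directly.
 */
static unsigned
index_with_type_sz(unsigned const_offset, unsigned type_size,
                   unsigned scale)
{
   return (const_offset & 0xf) / type_size * scale;
}
```

The two agree for 4-byte and 8-byte types. For a hypothetical 2-byte type
at byte offset 6 they do not: the slot-based version yields 1 where the
byte-based version yields 3.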

> +   bld.MOV(dst, offset(vec4_result, bld,
> +   ((const_offset & 0xf) / (4 * type_slots)) * scale));
>  }
>  
>  /**
> -- 
> 2.5.0
>

Re: [Mesa-dev] [PATCH 09/11] tgsi: remove culldist semantic.

2016-05-13 Thread Roland Scheidegger

Am 13.05.2016 um 23:10 schrieb Dave Airlie:
> From: Dave Airlie 
> 
> This isn't used anymore in the tree, culldist's
> are part of the clipdist semantic, we could in theory
> rename it, but I'm not sure there is much point, and
> I'd have to be careful with virgl.
> 
> Signed-off-by: Dave Airlie 
> ---
>  src/gallium/auxiliary/tgsi/tgsi_strings.c  |  1 -
>  src/gallium/docs/source/tgsi.rst   | 22 ++
>  src/gallium/include/pipe/p_shader_tokens.h |  1 -
>  3 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c b/src/gallium/auxiliary/tgsi/tgsi_strings.c
> index 306ab4f..c13f7ea 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
> @@ -85,7 +85,6 @@ const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT] =
> "PCOORD",
> "VIEWPORT_INDEX",
> "LAYER",
> -   "CULLDIST",
> "SAMPLEID",
> "SAMPLEPOS",
> "SAMPLEMASK",
> diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
> index 4315707..ab12490 100644
> --- a/src/gallium/docs/source/tgsi.rst
> +++ b/src/gallium/docs/source/tgsi.rst
> @@ -2876,18 +2876,32 @@ annotated with those semantics.
>  TGSI_SEMANTIC_CLIPDIST
>  ""
>  
> +Note this covers clipping and culling distances.
> +
>  When components of vertex elements are identified this way, these
>  values are each assumed to be a float32 signed distance to a plane.
> +
> +For clip distances:
>  Primitive setup only invokes rasterization on pixels for which
> -the interpolated plane distances are >= 0. Multiple clip planes
> -can be implemented simultaneously, by annotating multiple
> -components of one or more vertex elements with the above specified
> -semantic. The limits on both clip and cull distances are bound
> +the interpolated plane distances are >= 0.
> +
> +For cull distances:
> +Primitives will be completely discarded if the plane distance
> +for all of the vertices in the primitive are < 0.
> +If a vertex has a cull distance of NaN, that vertex counts as "out"
> +(as if its < 0);
> +
> +Multiple clip/cull planes can be implemented simultaneously, by
> +annotating multiple components of one or more vertex elements with
> +the above specified semantic.
> +The limits on both clip and cull distances are bound
>  by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
>  the maximum number of components that can be used to hold the
>  distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
>  which specifies the maximum number of registers which can be
>  annotated with those semantics.
> +The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
> +are used to divide up the 2 x vec4 space between clipping and culling.
This should really say how it's determined which one is which (so clip
dists come first).
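
As a sketch of the layout in question, the C below maps a clip or cull
distance index to a (register, component) pair in the 2 x vec4 space. It
assumes clip distances come first and cull distances follow immediately
with no padding; that packing and all names here are assumptions for
illustration, not taken from the gallium headers.

```c
/* Hypothetical illustration only: position of one distance value inside
 * the 2 x vec4 CLIPDIST space.
 */
struct dist_slot {
   unsigned reg;   /* which vec4: 0 or 1 */
   unsigned comp;  /* component within that vec4: 0..3 */
};

/* Clip distances are assumed to start at component 0 of the first vec4. */
static struct dist_slot
clip_dist_slot(unsigned i)
{
   struct dist_slot s = { i / 4, i % 4 };
   return s;
}

/* Cull distances are assumed to follow directly after num_clip clip
 * distances.
 */
static struct dist_slot
cull_dist_slot(unsigned num_clip, unsigned i)
{
   const unsigned idx = num_clip + i;
   struct dist_slot s = { idx / 4, idx % 4 };
   return s;
}
```

With three clip distances enabled, the first cull distance would land in
the w component of the first vec4 and the second in the x component of the
second vec4.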


You should remove the TGSI_SEMANTIC_CULLDIST section.

For patch 10, shouldn't this work with softpipe too?

Honestly, I'm not a big fan of packing clip and cull dists in the same
regs (it's still not the same as what d3d10 does in any case). My
opinion is that since we generally don't allow different semantics within
the same reg, I see no good reason to allow it here (and clip dists and
cull dists, albeit somewhat similar, are still different). So, if some
drivers wanted it in different regs and some in the same regs, I'd
prefer it to be different regs in the interface, with drivers having to
merge it when required, just because it looks cleaner. But if all hw
really wants it like that, 6,8-11 are
Reviewed-by: Roland Scheidegger 
(But I'd like to hear from other drivers' authors.)

Roland


Re: [Mesa-dev] [PATCH] i965: check tcs for NULL dereference

2016-05-13 Thread Kenneth Graunke

On Friday, May 13, 2016 1:11:28 PM PDT Mark Janes wrote:
> Coverity issue 1361544 found an instance where the tcs variable is
> checked for NULL, but unconditionally dereferenced later in the same
> function.
> 
> CC: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_tcs.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_tcs.c b/src/mesa/drivers/dri/i965/brw_tcs.c
> index e8178c6..9589fa5 100644
> --- a/src/mesa/drivers/dri/i965/brw_tcs.c
> +++ b/src/mesa/drivers/dri/i965/brw_tcs.c
> @@ -278,14 +278,16 @@ brw_codegen_tcs_prog(struct brw_context *brw,
>  
> if (unlikely(brw->perf_debug)) {
>struct brw_shader *btcs = (struct brw_shader *) tcs;
> -  if (btcs->compiled_once) {
> - brw_tcs_debug_recompile(brw, shader_prog, key);
> +  if (btcs) {
> + if (btcs->compiled_once) {
> +brw_tcs_debug_recompile(brw, shader_prog, key);
> + }
> + btcs->compiled_once = true;
>}
>if (start_busy && !drm_intel_bo_busy(brw->batch.last_bo)) {
>   perf_debug("TCS compile took %.03f ms and stalled the GPU\n",
>  (get_time() - start_time) * 1000);
>}
> -  btcs->compiled_once = true;
> }
>  
> /* Scratch space is used for register spilling */
> 

Reviewed-by: Kenneth Graunke 



Re: [Mesa-dev] ARB_cull_distance (final?) and llvmpipe support

2016-05-13 Thread Kristian Høgsberg Kristensen

Dave Airlie  writes:

> This is hopefully the final posting for this series, I've gotten
> the lowering pass to look like I wanted, which is to say it lowers
> to vec4[2].
>
> TGSI then uses the CLIPDIST semantic and the two properties to
> work out what is what. This means the CULLDIST semantic is no longer
> required.
>
> So I've ripped out CULLDIST from draw, and anywhere else it was used,
> and fixed draw to use the new API, as it more closely reflects how
> some of the hw works.
>
> I've also fixed the array size maximum checks, however the piglit
> test expects a link error when a compile error is a valid result.

This all works for me on i965. Patches 1-6

Reviewed-by: Kristian Høgsberg 

I'd be fine with landing the core part of the series now :)

Kristian

Re: [Mesa-dev] [PATCH] genxml: Use llroundf() and store to appropriate type.

2016-05-13 Thread Kristian Høgsberg

On Fri, May 13, 2016 at 2:31 PM, Matt Turner  wrote:
> Both functions return uint64_t, so I expect the masking/shifting should
> be done on 64-bit types.

Yea, this is more consistent. We don't have any fixed point fields
over 32 bits, but there's no good reason to mix 32bit and 64bit types
here.
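
A minimal C sketch of the mask-and-shift step done entirely on 64-bit
types, modeled on the patched __gen_sfixed(). The function name is
hypothetical, and the rounding step (llroundf() in the patch) is left out
so the sketch takes an already-rounded integer.

```c
#include <stdint.h>

/* Hypothetical helper modeled on the tail of __gen_sfixed(): place a
 * signed, already-rounded fixed-point value into bits start..end of a
 * 64-bit word. The mask strips the sign-extension bits above the field.
 */
static uint64_t
pack_signed_field(int64_t int_val, uint32_t start, uint32_t end)
{
   const uint64_t mask = ~0ull >> (64 - (end - start + 1));
   return ((uint64_t)int_val & mask) << start;
}
```

Keeping every operand 64 bits wide means a field that starts at bit 32 or
above is packed the same way as one in the low dword.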

Reviewed-by: Kristian Høgsberg 

> ---
>  src/intel/genxml/gen_pack_header.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/genxml/gen_pack_header.py b/src/intel/genxml/gen_pack_header.py
> index 9ef7122..47870b3 100644
> --- a/src/intel/genxml/gen_pack_header.py
> +++ b/src/intel/genxml/gen_pack_header.py
> @@ -131,7 +131,7 @@ __gen_sfixed(float v, uint32_t start, uint32_t end, uint32_t fract_bits)
> assert(min <= v && v <= max);
>  #endif
>
> -   const int32_t int_val = roundf(v * factor);
> +   const int64_t int_val = llroundf(v * factor);
> const uint64_t mask = ~0ull >> (64 - (end - start + 1));
>
> return (int_val & mask) << start;
> @@ -150,7 +150,7 @@ __gen_ufixed(float v, uint32_t start, uint32_t end, uint32_t fract_bits)
> assert(min <= v && v <= max);
>  #endif
>
> -   const uint32_t uint_val = roundf(v * factor);
> +   const uint64_t uint_val = llroundf(v * factor);
>
> return uint_val << start;
>  }
> --
> 2.7.3
>

Re: [Mesa-dev] [PATCH v2 11/30] i965/fs: add shuffle_32bit_load_result_to_64bit_data helper

2016-05-13 Thread Francisco Jerez

Iago Toral  writes:

> On Thu, 2016-05-12 at 20:01 -0700, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez  writes:
>> 
>> > From: Iago Toral Quiroga 
>> >
>> > There will be a few places where we need to shuffle the result of a 32-bit
>> > load into valid 64-bit data, so extract this logic into a separate helper
>> > that we can reuse.
>> >
>> > The shuffling needs to operate with WE_all set because we are changing the
>> > layout of the data across the various channels. Otherwise we will run into
>> > problems in non-uniform control-flow scenarios.
>> >
>> I guess you could remove this paragraph because it no longer applies to
>> the last version of the patch.
>
> Ooops, yes!
>
>> > v2 (Curro):
>> > - Use subscript() instead of stride()
>> > - Assert on the input types rather than retyping.
>> > - Use offset() instead of horiz_offset(), drop the multiplier definition.
>> > - Do not use a temporary for the writes and drop force_writemask_all.
>> 
>> Don't pretend you took my "don't use a temporary" suggestion into
>> account. :P
>
> Oh right, I did write the patch with that change included but then ran
> into the issue we discussed when src == dst and forgot to remove this.
>
>> > - Mark component_i as const.
>> 
>> Did you forget to git-add something?
>
> No, somehow I lost the change in the process... I'll put it back.
>
>> > - Make the function name lower case.
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 53 
>> >  src/mesa/drivers/dri/i965/brw_fs.h   |  5 
>> >  2 files changed, 58 insertions(+)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > index 15d5759..4827dea 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > @@ -212,6 +212,59 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
>> >  }
>> >  
>> >  /**
>> > + * This helper takes the result of a load operation that reads 32-bit elements
>> > + * in this format:
>> > + *
>> > + * x x x x x x x x
>> > + * y y y y y y y y
>> > + * z z z z z z z z
>> > + * w w w w w w w w
>> > + *
>> > + * and shuffles the data to get this:
>> > + *
>> > + * x y x y x y x y
>> > + * x y x y x y x y
>> > + * z w z w z w z w
>> > + * z w z w z w z w
>> > + *
>> > + * Which is exactly what we want if the load is reading 64-bit components
>> > + * like doubles, where x represents the low 32-bit of the x double component
>> > + * and y represents the high 32-bit of the x double component (likewise with
>> > + * z and w for double component y). The parameter @components represents
>> > + * the number of 64-bit components present in @src. This would typically be
>> > + * 2 at most, since we can only fit 2 double elements in the result of a
>> > + * vec4 load.
>> > + *
>> > + * Notice that @dst and @src can be the same register.
>> > + */
>> > +void
>> > +fs_visitor::shuffle_32bit_load_result_to_64bit_data(const fs_builder &bld,
>> 
>> There was a second reason I had in mind when I suggested it would
>> improve encapsulation to take this out of fs_visitor: The function has
>> absolutely nothing to do with visiting, it uses no internal or external
>> fs_visitor data structures or interfaces, it doesn't even use the "this"
>> pointer.  Defining a function that could perfectly be stand-alone inside
>> an object it has no need to be in (in this case it doesn't even have any
>> logical relation with) actually *decreases* encapsulation because it
>> exposes the object's internals to the function unnecessarily.
>> 
>> Either way the back-end code is already plagued by this anti-pattern so
>> I'm not going to complain if you keep the code as-is -- You could argue
>> you're just being consistent with the existing practice. ;)
>
> I agree with all your reasoning, my only disagreement is with the fact
> that brw_fs_nir.cpp is a better place for it considering that this has
> nothing to do with NIR, but since we have to put this somewhere let's
> put it there for now.
>
Oh, I don't really mind where you put the function, the only reason I
suggested that was that I assumed you were only using it from the NIR
front-end, but brw_fs.cpp does seem like a better fit if you're using it
elsewhere.
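
For readers following the quoted comment, the layout change it draws can
be modelled on the CPU with plain arrays. This is only an illustration in
C under the assumption of 8 channels per row (as in the diagram) and a
little-endian packing of low and high halves; it uses a separate
destination, so it does not model the src == dst case discussed above, and
the names are hypothetical.

```c
#include <stdint.h>

#define CHANNELS 8  /* SIMD8, matching the rows drawn in the quoted comment */

/* Hypothetical CPU model of the shuffle. src holds 2 * components rows of
 * 32-bit values (the x row, the y row, ...); dst receives one row of 64-bit
 * values per 64-bit component, with the even source row as the low half
 * and the odd source row as the high half of each channel.
 */
static void
shuffle_32_to_64(uint64_t dst[][CHANNELS], uint32_t src[][CHANNELS],
                 unsigned components)
{
   for (unsigned c = 0; c < components; c++) {
      for (unsigned ch = 0; ch < CHANNELS; ch++) {
         const uint64_t lo = src[2 * c][ch];
         const uint64_t hi = src[2 * c + 1][ch];
         dst[c][ch] = lo | (hi << 32);
      }
   }
}
```

Viewed as 32-bit words, each destination row then reads lo, hi, lo, hi and
so on, which is the "x y x y" pattern in the comment.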

>> > +const fs_reg dst,
>> > +const fs_reg src,
>> 
>> Pass by reference.
>
> Ok.
>
>> > +uint32_t components)
>> > +{
>> > +   assert(type_sz(src.type) == 4);
>> > +   assert(type_sz(dst.type) == 8);
>> > +
>> > +   /* A temporary that we will use to shuffle the 32-bit data of each
>> > +* component in the vector into valid 64-bit data. We can't write directly
>> > +* to dst because dst can be (and would usually be) the same as src
>> > +* and in that case the

[Mesa-dev] [PATCH 1/2] util: Add ATTRIBUTE_RETURNS_NONNULL.

2016-05-13 Thread Matt Turner

---
 configure.ac| 1 +
 m4/ax_gcc_func_attribute.m4 | 7 +++
 src/util/macros.h   | 6 ++
 3 files changed, 14 insertions(+)

diff --git a/configure.ac b/configure.ac
index 023110e..9fbfe4d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -224,6 +224,7 @@ AX_GCC_FUNC_ATTRIBUTE([format])
 AX_GCC_FUNC_ATTRIBUTE([malloc])
 AX_GCC_FUNC_ATTRIBUTE([packed])
 AX_GCC_FUNC_ATTRIBUTE([pure])
+AX_GCC_FUNC_ATTRIBUTE([returns_nonnull])
 AX_GCC_FUNC_ATTRIBUTE([unused])
 AX_GCC_FUNC_ATTRIBUTE([warn_unused_result])
 
diff --git a/m4/ax_gcc_func_attribute.m4 b/m4/ax_gcc_func_attribute.m4
index 4e0ecbb..2e67ea2 100644
--- a/m4/ax_gcc_func_attribute.m4
+++ b/m4/ax_gcc_func_attribute.m4
@@ -53,6 +53,7 @@
 #optimize
 #packed
 #pure
+#returns_nonnull
 #unused
 #used
 #visibility
@@ -76,6 +77,9 @@
 
 #serial 2
 
+# mattst88:
+# Added support for returns_nonnull attribute
+
 AC_DEFUN([AX_GCC_FUNC_ATTRIBUTE], [
 AS_VAR_PUSHDEF([ac_var], [ax_cv_have_func_attribute_$1])
 
@@ -175,6 +179,9 @@ AC_DEFUN([AX_GCC_FUNC_ATTRIBUTE], [
 [pure], [
 int foo( void ) __attribute__(($1));
 ],
+[returns_nonnull], [
+int *foo( void ) __attribute__(($1));
+],
 [unused], [
 int foo( void ) __attribute__(($1));
 ],
diff --git a/src/util/macros.h b/src/util/macros.h
index c0bfb15..9ddf675 100644
--- a/src/util/macros.h
+++ b/src/util/macros.h
@@ -154,6 +154,12 @@ do {   \
 #define ATTRIBUTE_PURE
 #endif
 
+#ifdef HAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL
+#define ATTRIBUTE_RETURNS_NONNULL __attribute__((__returns_nonnull__))
+#else
+#define ATTRIBUTE_RETURNS_NONNULL
+#endif
+
 #ifdef __cplusplus
 /**
  * Macro function that evaluates to true if T is a trivially
-- 
2.7.3


[Mesa-dev] [PATCH 2/2] nir: Mark nir_start_block()/nir_impl_last_block() with returns_nonnull.

2016-05-13 Thread Matt Turner

---
 src/compiler/nir/nir.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 8a616d4..7ea6791 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -34,6 +34,7 @@
 #include "util/ralloc.h"
 #include "util/set.h"
 #include "util/bitset.h"
+#include "util/macros.h"
 #include "compiler/nir_types.h"
 #include "compiler/shader_enums.h"
 #include 
@@ -1544,16 +1545,16 @@ typedef struct {
nir_metadata valid_metadata;
 } nir_function_impl;
 
-static inline nir_block *
+ATTRIBUTE_RETURNS_NONNULL static inline nir_block *
 nir_start_block(nir_function_impl *impl)
 {
-   return (nir_block *) exec_list_get_head(&impl->body);
+   return (nir_block *) impl->body.head;
 }
 
-static inline nir_block *
+ATTRIBUTE_RETURNS_NONNULL static inline nir_block *
 nir_impl_last_block(nir_function_impl *impl)
 {
-   return (nir_block *) exec_list_get_tail(&impl->body);
+   return (nir_block *) impl->body.tail_pred;
 }
 
 static inline nir_cf_node *
-- 
2.7.3


[Mesa-dev] [PATCH] genxml: Use llroundf() and store to appropriate type.

2016-05-13 Thread Matt Turner

Both functions return uint64_t, so I expect the masking/shifting should
be done on 64-bit types.
---
 src/intel/genxml/gen_pack_header.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/intel/genxml/gen_pack_header.py b/src/intel/genxml/gen_pack_header.py
index 9ef7122..47870b3 100644
--- a/src/intel/genxml/gen_pack_header.py
+++ b/src/intel/genxml/gen_pack_header.py
@@ -131,7 +131,7 @@ __gen_sfixed(float v, uint32_t start, uint32_t end, uint32_t fract_bits)
assert(min <= v && v <= max);
 #endif
 
-   const int32_t int_val = roundf(v * factor);
+   const int64_t int_val = llroundf(v * factor);
const uint64_t mask = ~0ull >> (64 - (end - start + 1));
 
return (int_val & mask) << start;
@@ -150,7 +150,7 @@ __gen_ufixed(float v, uint32_t start, uint32_t end, uint32_t fract_bits)
assert(min <= v && v <= max);
 #endif
 
-   const uint32_t uint_val = roundf(v * factor);
+   const uint64_t uint_val = llroundf(v * factor);
 
return uint_val << start;
 }
-- 
2.7.3


Re: [Mesa-dev] [PATCH 0/5] ARB_internalformat_query2 support for OpenGL ES and other fixes

2016-05-13 Thread Alejandro Piñeiro



On 13/05/16 17:06, Ilia Mirkin wrote:
> On Fri, May 13, 2016 at 10:57 AM, Alejandro Piñeiro
>  wrote:
>> Earlier this year, support for ARB_internalformat_query2 landed
>> [1][2], initially only for desktop GL.
>>
>> But looking more carefully at the spec [3], we found the following:
>>
>> "Dependencies
>>
>>  OpenGL 2.0 or OpenGL ES 2.0 is required"
>>
>> Note the *or*. Additionally, the spec lists other GL ES 2.0/3.0
>> dependencies. That means the extension can also be applied to
>> GL ES 2.0/3.0. FWIW, this mistake is common, as it also happens in
>> the Khronos registry XML (Khronos bug created [4]).
> Are you sure it's not a mistake the other way? There's no ES extension
> number allocated, and no vendor drivers expose this ext on ES, and
> this would be the first GL_ARB_* ext to be exposed in ES... normally
> these become GL_OES_bla or GL_KHR_bla.

It seems that you were right:
https://www.khronos.org/bugzilla/show_bug.cgi?id=1498#c1

Although then I don't understand why ARB_internalformat_query2 has those
dependencies on OpenGL ES 2.0/3.x and OES extensions:
https://www.khronos.org/bugzilla/show_bug.cgi?id=1498#c2

:/

BR

Re: [Mesa-dev] ARB_cull_distance support v4?

2016-05-13 Thread Dave Airlie

On 14 May 2016 at 01:12, Roland Scheidegger  wrote:
> Am 13.05.2016 um 06:41 schrieb Dave Airlie:
>> This is just the core patches, as I think the lowering was pretty
>> broken in the last couple of reposts.
>>
>> The lowering now lowers to one array of 8 or whatever. I need
>> to recheck the gallium and llvmpipe bits on top of this, as I think
>> llvmpipe will be broken.
> Maybe. draw expects separate clip and cull dists, each packed as vec4s
> (it could probably handle up to 2 vec4 for each).
>
>>
>> I think I'm going to rip out the CULLDIST semantic from gallium,
>> it really isn't what the hw wants.
>>
>
> I can't really see what the output is going to look like from your
> change, but there are reasons things are the way they are. This is, of
> course, all inspired by d3d10 (this even predates the gl cull dist
> extension) - d3d10 has these weirdo packed vec4s. The problem is, in
> d3d10, you can have a vec4 output declared, with the x component being an
> ordinary output, yz being a clipdist, and w being a cull dist. But in
> gallium, we can't really have different semantics per output - hence
> clip and cull must be in different outputs (and nothing else can be
> packed into the same vars).
> A single array for clip and cull dist each probably would have been
> cleaner, but we didn't have input/output arrays for system values
> at that time either, so gallium's design is something which looks like
> neither what gl, d3d10, nor probably hw wants, but was simple enough to
> translate and worked.

Ouch, d3d10 is ugly here. You'll probably need to patch up your state tracker
to work with the new code, since I don't have access to it, and I'd prefer
not to keep the in-tree code too messy just to support something crazy like
the above.

Hopefully it should be easy enough to translate to the new scheme, since at
least most of the hardware seems to work off either vec4[2], or float[8] and
some bitmask.

Dave.

[Mesa-dev] [PATCH 11/11] docs: update ARB_cull_distance status.

2016-05-13 Thread Dave Airlie

From: Dave Airlie 

Signed-off-by: Dave Airlie 
---
 docs/GL3.txt  | 2 +-
 docs/relnotes/11.3.0.html | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index e2dabea..04976c6 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -211,7 +211,7 @@ GL 4.5, GLSL 4.50:
   GL_ARB_ES3_1_compatibilitynot started
   GL_ARB_clip_control   DONE (i965, nv50, 
nvc0, r600, radeonsi, llvmpipe, softpipe)
   GL_ARB_conditional_render_invertedDONE (i965, nv50, 
nvc0, r600, radeonsi, llvmpipe, softpipe)
-  GL_ARB_cull_distance  in progress (Tobias)
+  GL_ARB_cull_distance  DONE (llvmpipe)
   GL_ARB_derivative_control DONE (i965, nv50, 
nvc0, r600, radeonsi)
   GL_ARB_direct_state_accessDONE (all drivers)
   GL_ARB_get_texture_sub_image  DONE (all drivers)
diff --git a/docs/relnotes/11.3.0.html b/docs/relnotes/11.3.0.html
index 4977afe..f772ac0 100644
--- a/docs/relnotes/11.3.0.html
+++ b/docs/relnotes/11.3.0.html
@@ -46,6 +46,7 @@ Note: some of the new features are only available with 
certain drivers.
 
 OpenGL 4.2 on radeonsi
 GL_ARB_compute_shader on radeonsi, softpipe
+GL_ARB_cull_distance on llvmpipe
 GL_ARB_framebuffer_no_attachments on nvc0, r600, radeonsi, softpipe
 GL_ARB_internalformat_query2 on all drivers
 GL_ARB_query_buffer_object on i965/hsw+
-- 
2.5.5


[Mesa-dev] [PATCH 02/11] mesa/main: Add support for GL_ARB_cull_distance (v2)

2016-05-13 Thread Dave Airlie

From: Tobias Klausmann 

airlied:
v2: rename LowerClipDistance to LowerCombinedClipCullDistance.
I don't think we want any other behaviour with any current hw.

Signed-off-by: Tobias Klausmann 
Reviewed-by: Edward O'Callaghan 
Reviewed-by: Ian Romanick 
Signed-off-by: Dave Airlie 
---
 src/compiler/glsl/link_varyings.cpp   |  2 +-
 src/compiler/glsl/linker.cpp  |  2 +-
 src/compiler/glsl/lower_clip_distance.cpp |  2 +-
 src/mesa/drivers/dri/i965/brw_compiler.c  |  2 +-
 src/mesa/main/extensions_table.h  |  1 +
 src/mesa/main/get.c   |  1 +
 src/mesa/main/get_hash_params.py  |  4 
 src/mesa/main/mtypes.h| 14 +-
 src/mesa/main/shaderapi.c |  3 +++
 src/mesa/state_tracker/st_extensions.c|  2 +-
 10 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/src/compiler/glsl/link_varyings.cpp 
b/src/compiler/glsl/link_varyings.cpp
index 34e82c7..2555cc9 100644
--- a/src/compiler/glsl/link_varyings.cpp
+++ b/src/compiler/glsl/link_varyings.cpp
@@ -627,7 +627,7 @@ tfeedback_decl::init(struct gl_context *ctx, const void 
*mem_ctx,
 * class must behave specially to account for the fact that gl_ClipDistance
 * is converted from a float[8] to a vec4[2].
 */
-   if (ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].LowerClipDistance 
&&
+   if 
(ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].LowerCombinedClipCullDistance
 &&
strcmp(this->var_name, "gl_ClipDistance") == 0) {
   this->lowered_builtin_array_variable = clip_distance;
}
diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index 0268b74..2a520bc 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -4560,7 +4560,7 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
   if (!prog->LinkStatus)
 goto done;
 
-  if (ctx->Const.ShaderCompilerOptions[i].LowerClipDistance) {
+  if (ctx->Const.ShaderCompilerOptions[i].LowerCombinedClipCullDistance) {
  lower_clip_distance(prog->_LinkedShaders[i]);
   }
 
diff --git a/src/compiler/glsl/lower_clip_distance.cpp 
b/src/compiler/glsl/lower_clip_distance.cpp
index 1ada215..5d9468d 100644
--- a/src/compiler/glsl/lower_clip_distance.cpp
+++ b/src/compiler/glsl/lower_clip_distance.cpp
@@ -42,7 +42,7 @@
  *
  * Since some hardware may not internally represent gl_ClipDistance as a pair
  * of vec4's, this lowering pass is optional.  To enable it, set the
- * LowerClipDistance flag in gl_shader_compiler_options to true.
+ * LowerCombinedClipCullDistance flag in gl_shader_compiler_options to true.
  */
 
 #include "glsl_symbol_table.h"
diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c 
b/src/mesa/drivers/dri/i965/brw_compiler.c
index 1e3fb41..82131db 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.c
+++ b/src/mesa/drivers/dri/i965/brw_compiler.c
@@ -168,7 +168,7 @@ brw_compiler_create(void *mem_ctx, const struct 
brw_device_info *devinfo)
   compiler->glsl_compiler_options[i].EmitNoMainReturn = true;
   compiler->glsl_compiler_options[i].EmitNoIndirectInput = true;
   compiler->glsl_compiler_options[i].EmitNoIndirectUniform = false;
-  compiler->glsl_compiler_options[i].LowerClipDistance = true;
+  compiler->glsl_compiler_options[i].LowerCombinedClipCullDistance = true;
 
   bool is_scalar = compiler->scalar_stage[i];
 
diff --git a/src/mesa/main/extensions_table.h b/src/mesa/main/extensions_table.h
index 18a5505..471b19f 100644
--- a/src/mesa/main/extensions_table.h
+++ b/src/mesa/main/extensions_table.h
@@ -44,6 +44,7 @@ EXT(ARB_conditional_render_inverted , 
ARB_conditional_render_inverted
 EXT(ARB_conservative_depth  , ARB_conservative_depth   
  , GLL, GLC,  x ,  x , 2011)
 EXT(ARB_copy_buffer , dummy_true   
  , GLL, GLC,  x ,  x , 2008)
 EXT(ARB_copy_image  , ARB_copy_image   
  , GLL, GLC,  x ,  x , 2012)
+EXT(ARB_cull_distance   , ARB_cull_distance
  , GLL, GLC,  x ,  x , 2014)
 EXT(ARB_debug_output, dummy_true   
  , GLL, GLC,  x ,  x , 2009)
 EXT(ARB_depth_buffer_float  , ARB_depth_buffer_float   
  , GLL, GLC,  x ,  x , 2008)
 EXT(ARB_depth_clamp , ARB_depth_clamp  
  , GLL, GLC,  x ,  x , 2003)
diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 6829c33..e3a0a11 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -465,6 +465,7 @@ EXTRA_EXT(ARB_shader_storage_buffer_object);
 EXTRA_EXT(ARB_indirect_parameters);
 EXTRA_EXT(ATI_meminfo);
 EXTRA_EXT(NVX_gpu_memory_info);

[Mesa-dev] [PATCH 05/11] glsl: Add arb_cull_distance support (v3)

2016-05-13 Thread Dave Airlie

From: Tobias Klausmann 

v2: make too large array a compile error
v3: squash mesa/prog patch to avoid static compiler errors in bisect

Signed-off-by: Tobias Klausmann 
Signed-off-by: Dave Airlie 
---
 src/compiler/glsl/ast_to_hir.cpp |  46 
 src/compiler/glsl/builtin_variables.cpp  |  11 ++-
 src/compiler/glsl/glcpp/glcpp-parse.y|   3 +
 src/compiler/glsl/glsl_parser_extras.cpp |   1 +
 src/compiler/glsl/glsl_parser_extras.h   |   8 ++
 src/compiler/glsl/link_varyings.cpp  |  10 +++
 src/compiler/glsl/link_varyings.h|   1 +
 src/compiler/glsl/linker.cpp | 108 +--
 src/compiler/glsl/standalone_scaffolding.cpp |   1 +
 src/compiler/glsl/tests/varyings_test.cpp|  27 +++
 src/compiler/shader_enums.h  |   4 +
 src/mesa/program/prog_print.c|   4 +
 12 files changed, 185 insertions(+), 39 deletions(-)

diff --git a/src/compiler/glsl/ast_to_hir.cpp b/src/compiler/glsl/ast_to_hir.cpp
index 5a1fc9f..338edc8 100644
--- a/src/compiler/glsl/ast_to_hir.cpp
+++ b/src/compiler/glsl/ast_to_hir.cpp
@@ -1196,20 +1196,38 @@ check_builtin_array_max_size(const char *name, unsigned 
size,
   _mesa_glsl_error(&loc, state, "`gl_TexCoord' array size cannot "
"be larger than gl_MaxTextureCoords (%u)",
state->Const.MaxTextureCoords);
-   } else if (strcmp("gl_ClipDistance", name) == 0
-  && size > state->Const.MaxClipPlanes) {
-  /* From section 7.1 (Vertex Shader Special Variables) of the
-   * GLSL 1.30 spec:
-   *
-   *   "The gl_ClipDistance array is predeclared as unsized and
-   *   must be sized by the shader either redeclaring it with a
-   *   size or indexing it only with integral constant
-   *   expressions. ... The size can be at most
-   *   gl_MaxClipDistances."
-   */
-  _mesa_glsl_error(&loc, state, "`gl_ClipDistance' array size cannot "
-   "be larger than gl_MaxClipDistances (%u)",
-   state->Const.MaxClipPlanes);
+   } else if (strcmp("gl_ClipDistance", name) == 0) {
+  state->clip_dist_size = size;
+  if (size + state->cull_dist_size > state->Const.MaxClipPlanes) {
+ /* From section 7.1 (Vertex Shader Special Variables) of the
+  * GLSL 1.30 spec:
+  *
+  *   "The gl_ClipDistance array is predeclared as unsized and
+  *   must be sized by the shader either redeclaring it with a
+  *   size or indexing it only with integral constant
+  *   expressions. ... The size can be at most
+  *   gl_MaxClipDistances."
+  */
+ _mesa_glsl_error(&loc, state, "`gl_ClipDistance' array size cannot "
+  "be larger than gl_MaxClipDistances (%u)",
+  state->Const.MaxClipPlanes);
+  }
+   } else if (strcmp("gl_CullDistance", name) == 0) {
+  state->cull_dist_size = size;
+  if (size + state->clip_dist_size > state->Const.MaxClipPlanes) {
+ /* From the ARB_cull_distance spec:
+  *
+  *   "The gl_CullDistance array is predeclared as unsized and
+  *must be sized by the shader either redeclaring it with
+  *a size or indexing it only with integral constant
+  *expressions. The size determines the number and set of
+  *enabled cull distances and can be at most
+  *gl_MaxCullDistances."
+  */
+ _mesa_glsl_error(&loc, state, "`gl_CullDistance' array size cannot "
+  "be larger than gl_MaxCullDistances (%u)",
+  state->Const.MaxClipPlanes);
+  }
}
 }
 
diff --git a/src/compiler/glsl/builtin_variables.cpp 
b/src/compiler/glsl/builtin_variables.cpp
index cc32990..ff8a7e2 100644
--- a/src/compiler/glsl/builtin_variables.cpp
+++ b/src/compiler/glsl/builtin_variables.cpp
@@ -302,7 +302,7 @@ public:
const glsl_type *construct_interface_instance() const;
 
 private:
-   glsl_struct_field fields[10];
+   glsl_struct_field fields[11];
unsigned num_fields;
 };
 
@@ -678,6 +678,11 @@ builtin_variable_generator::generate_constants()
   add_const("gl_MaxClipDistances", state->Const.MaxClipPlanes);
   add_const("gl_MaxVaryingComponents", state->ctx->Const.MaxVarying * 4);
}
+   if (state->is_version(450, 0) || state->ARB_cull_distance_enable) {
+  add_const("gl_MaxCullDistances", state->Const.MaxClipPlanes);
+  add_const("gl_MaxCombinedClipAndCullDistances",
+state->Const.MaxClipPlanes);
+   }
 
if (state->has_geometry_shader()) {
   add_const("gl_MaxVertexOutputComponents",
@@ -1249,6 +1254,10 @@ builtin_variable_generator::generate_varyings()
add_varying(VARYING_SLOT_CLIP_DIST0, array(float_t, 0),
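
The size check in the ast_to_hir.cpp hunk above can be sketched in isolation as follows. This is only an illustration under stated assumptions: struct sketch_state and sketch_check_distance_size are hypothetical stand-ins for the parser state and for check_builtin_array_max_size(), and max_clip_planes plays the role of Const.MaxClipPlanes. The point it shows is that each array's declared size is recorded first, and the limit is then applied to the sum of clip and cull sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the parser state fields touched by the patch. */
struct sketch_state {
   unsigned clip_dist_size;
   unsigned cull_dist_size;
   unsigned max_clip_planes;   /* plays the role of Const.MaxClipPlanes */
};

/* Record the declared size of a builtin distance array and report whether
 * the combined clip + cull size still fits within the shared limit.
 * Returns false where the patch would emit a compile error. */
static bool
sketch_check_distance_size(struct sketch_state *state,
                           const char *name, unsigned size)
{
   if (strcmp(name, "gl_ClipDistance") == 0) {
      state->clip_dist_size = size;
      return size + state->cull_dist_size <= state->max_clip_planes;
   }
   if (strcmp(name, "gl_CullDistance") == 0) {
      state->cull_dist_size = size;
      return size + state->clip_dist_size <= state->max_clip_planes;
   }
   return true; /* other arrays are not checked in this sketch */
}
```

Because both branches update the state before checking, the result depends on the order in which the two arrays are sized, which matches the structure of the hunk.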

[Mesa-dev] [PATCH 06/11] gallium: Add a pipe cap for arb_cull_distance

2016-05-13 Thread Dave Airlie

From: Tobias Klausmann 

This lets us safely enable or disable the extension as needed.

Signed-off-by: Tobias Klausmann 
Reviewed-by: Edward O'Callaghan 
Reviewed-by: Marek Olšák 
Signed-off-by: Dave Airlie 
---
 src/gallium/docs/source/screen.rst   | 2 ++
 src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
 src/gallium/drivers/i915/i915_screen.c   | 1 +
 src/gallium/drivers/ilo/ilo_screen.c | 1 +
 src/gallium/drivers/llvmpipe/lp_screen.c | 1 +
 src/gallium/drivers/nouveau/nv30/nv30_screen.c   | 1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c   | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 1 +
 src/gallium/drivers/r300/r300_screen.c   | 1 +
 src/gallium/drivers/r600/r600_pipe.c | 1 +
 src/gallium/drivers/radeonsi/si_pipe.c   | 1 +
 src/gallium/drivers/softpipe/sp_screen.c | 1 +
 src/gallium/drivers/svga/svga_screen.c   | 1 +
 src/gallium/drivers/vc4/vc4_screen.c | 1 +
 src/gallium/include/pipe/p_defines.h | 1 +
 15 files changed, 16 insertions(+)

diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index 9451075..315a6a1 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -336,6 +336,8 @@ The integer capabilities:
   PIPE_CONTEXT_ROBUST_BUFFER_ACCESS. See the ARB_robust_buffer_access_behavior
   extension for information on the required behavior for out of bounds accesses
   and accesses to unbound resources.
+* ``PIPE_CAP_CULL_DISTANCE``: Whether the driver supports the arb_cull_distance
+  extension and thus implements proper support for culling planes.
 
 
 .. _pipe_capf:
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index fee44fa..916151c 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -257,6 +257,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_SURFACE_REINTERPRET_BLOCKS:
case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
+   case PIPE_CAP_CULL_DISTANCE:
return 0;
 
case PIPE_CAP_MAX_VIEWPORTS:
diff --git a/src/gallium/drivers/i915/i915_screen.c 
b/src/gallium/drivers/i915/i915_screen.c
index 9b6a660..802b25f 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -271,6 +271,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap 
cap)
case PIPE_CAP_PCI_FUNCTION:
case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
+   case PIPE_CAP_CULL_DISTANCE:
   return 0;
 
case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
b/src/gallium/drivers/ilo/ilo_screen.c
index 2fc1873..ddeebc9 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -500,6 +500,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
case PIPE_CAP_PCI_FUNCTION:
case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
+   case PIPE_CAP_CULL_DISTANCE:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c 
b/src/gallium/drivers/llvmpipe/lp_screen.c
index 4f61de8..c6c18ee 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -321,6 +321,7 @@ llvmpipe_get_param(struct pipe_screen *screen, enum 
pipe_cap param)
case PIPE_CAP_PCI_FUNCTION:
case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
+   case PIPE_CAP_CULL_DISTANCE:
   return 0;
}
/* should only get here on unhandled cases */
diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c 
b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
index 400e9f5..87c3449 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
@@ -194,6 +194,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_PCI_FUNCTION:
case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
+   case PIPE_CAP_CULL_DISTANCE:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index ef114e5..0912150 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -247,6 +247,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_PCI_FUNCTION:
case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
case

[Mesa-dev] [PATCH 07/11] mesa/st: Add support for GL_ARB_cull_distance (v2)

2016-05-13 Thread Dave Airlie

From: Tobias Klausmann 

v2: don't bother with cull dist varyings except to assert.

Signed-off-by: Tobias Klausmann 
Reviewed-by: Marek Olšák 
Signed-off-by: Dave Airlie 
---
 src/mesa/state_tracker/st_extensions.c |  1 +
 src/mesa/state_tracker/st_program.c| 26 ++
 2 files changed, 27 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index 746f4fc..4b9a3bd 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -574,6 +574,7 @@ void st_init_extensions(struct pipe_screen *screen,
   { o(ARB_color_buffer_float),   PIPE_CAP_VERTEX_COLOR_UNCLAMPED   
},
   { o(ARB_conditional_render_inverted),  
PIPE_CAP_CONDITIONAL_RENDER_INVERTED  },
   { o(ARB_copy_image),   
PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS },
+  { o(ARB_cull_distance),PIPE_CAP_CULL_DISTANCE
},
   { o(ARB_depth_clamp),  PIPE_CAP_DEPTH_CLIP_DISABLE   
},
   { o(ARB_depth_texture),PIPE_CAP_TEXTURE_SHADOW_MAP   
},
   { o(ARB_derivative_control),   PIPE_CAP_TGSI_FS_FINE_DERIVATIVE  
},
diff --git a/src/mesa/state_tracker/st_program.c 
b/src/mesa/state_tracker/st_program.c
index 444e5aa..4e37a17 100644
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -310,6 +310,11 @@ st_translate_vertex_program(struct st_context *st,
 output_semantic_name[slot] = TGSI_SEMANTIC_CLIPDIST;
 output_semantic_index[slot] = 1;
 break;
+ case VARYING_SLOT_CULL_DIST0:
+ case VARYING_SLOT_CULL_DIST1:
+/* these should have been lowered by GLSL */
+assert(0);
+break;
  case VARYING_SLOT_EDGE:
 assert(0);
 break;
@@ -366,6 +371,9 @@ st_translate_vertex_program(struct st_context *st,
if (stvp->Base.Base.ClipDistanceArraySize)
   ureg_property(ureg, TGSI_PROPERTY_NUM_CLIPDIST_ENABLED,
 stvp->Base.Base.ClipDistanceArraySize);
+   if (stvp->Base.Base.CullDistanceArraySize)
+  ureg_property(ureg, TGSI_PROPERTY_NUM_CULLDIST_ENABLED,
+stvp->Base.Base.CullDistanceArraySize);
 
if (ST_DEBUG & DEBUG_MESA) {
   _mesa_print_program(&stvp->Base.Base);
@@ -627,6 +635,11 @@ st_translate_fragment_program(struct st_context *st,
 input_semantic_index[slot] = 1;
 interpMode[slot] = TGSI_INTERPOLATE_PERSPECTIVE;
 break;
+ case VARYING_SLOT_CULL_DIST0:
+ case VARYING_SLOT_CULL_DIST1:
+/* these should have been lowered by GLSL */
+assert(0);
+break;
 /* In most cases, there is nothing special about these
  * inputs, so adopt a convention to use the generic
  * semantic name and the mesa VARYING_SLOT_ number as the
@@ -1044,6 +1057,9 @@ st_translate_program_common(struct st_context *st,
if (prog->ClipDistanceArraySize)
   ureg_property(ureg, TGSI_PROPERTY_NUM_CLIPDIST_ENABLED,
 prog->ClipDistanceArraySize);
+   if (prog->CullDistanceArraySize)
+  ureg_property(ureg, TGSI_PROPERTY_NUM_CULLDIST_ENABLED,
+prog->CullDistanceArraySize);
 
/*
 * Convert Mesa program inputs to TGSI input register semantics.
@@ -1089,6 +1105,11 @@ st_translate_program_common(struct st_context *st,
 input_semantic_name[slot] = TGSI_SEMANTIC_CLIPDIST;
 input_semantic_index[slot] = 1;
 break;
+ case VARYING_SLOT_CULL_DIST0:
+ case VARYING_SLOT_CULL_DIST1:
+/* these should have been lowered by GLSL */
+assert(0);
+break;
  case VARYING_SLOT_PSIZ:
 input_semantic_name[slot] = TGSI_SEMANTIC_PSIZE;
 input_semantic_index[slot] = 0;
@@ -1191,6 +1212,11 @@ st_translate_program_common(struct st_context *st,
 output_semantic_name[slot] = TGSI_SEMANTIC_CLIPDIST;
 output_semantic_index[slot] = 1;
 break;
+ case VARYING_SLOT_CULL_DIST0:
+ case VARYING_SLOT_CULL_DIST1:
+/* these should have been lowered by GLSL */
+assert(0);
+break;
  case VARYING_SLOT_LAYER:
 output_semantic_name[slot] = TGSI_SEMANTIC_LAYER;
 output_semantic_index[slot] = 0;
-- 
2.5.5


[Mesa-dev] [PATCH 10/11] llvmpipe: Enable cull_distance as draw supports it.

2016-05-13 Thread Dave Airlie

From: Tobias Klausmann 

Signed-off-by: Tobias Klausmann 
Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/llvmpipe/lp_screen.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c 
b/src/gallium/drivers/llvmpipe/lp_screen.c
index c6c18ee..6aef5c9 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -291,6 +291,8 @@ llvmpipe_get_param(struct pipe_screen *screen, enum 
pipe_cap param)
case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
   return 1;
+   case PIPE_CAP_CULL_DISTANCE:
+  return 1;
case PIPE_CAP_MULTISAMPLE_Z_RESOLVE:
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
@@ -321,7 +323,6 @@ llvmpipe_get_param(struct pipe_screen *screen, enum 
pipe_cap param)
case PIPE_CAP_PCI_FUNCTION:
case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
-   case PIPE_CAP_CULL_DISTANCE:
   return 0;
}
/* should only get here on unhandled cases */
-- 
2.5.5


[Mesa-dev] [PATCH 09/11] tgsi: remove culldist semantic.

2016-05-13 Thread Dave Airlie

From: Dave Airlie 

This isn't used in the tree anymore; cull distances are now
part of the clipdist semantic. We could in theory
rename it, but I'm not sure there is much point, and
I'd have to be careful with virgl.

Signed-off-by: Dave Airlie 
---
 src/gallium/auxiliary/tgsi/tgsi_strings.c  |  1 -
 src/gallium/docs/source/tgsi.rst   | 22 ++
 src/gallium/include/pipe/p_shader_tokens.h |  1 -
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c 
b/src/gallium/auxiliary/tgsi/tgsi_strings.c
index 306ab4f..c13f7ea 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
@@ -85,7 +85,6 @@ const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT] =
"PCOORD",
"VIEWPORT_INDEX",
"LAYER",
-   "CULLDIST",
"SAMPLEID",
"SAMPLEPOS",
"SAMPLEMASK",
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index 4315707..ab12490 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -2876,18 +2876,32 @@ annotated with those semantics.
 TGSI_SEMANTIC_CLIPDIST
 ""
 
+Note this covers clipping and culling distances.
+
 When components of vertex elements are identified this way, these
 values are each assumed to be a float32 signed distance to a plane.
+
+For clip distances:
 Primitive setup only invokes rasterization on pixels for which
-the interpolated plane distances are >= 0. Multiple clip planes
-can be implemented simultaneously, by annotating multiple
-components of one or more vertex elements with the above specified
-semantic. The limits on both clip and cull distances are bound
+the interpolated plane distances are >= 0.
+
+For cull distances:
+Primitives will be completely discarded if the plane distances
+for all of the vertices in the primitive are < 0.
+If a vertex has a cull distance of NaN, that vertex counts as "out"
+(as if it's < 0).
+
+Multiple clip/cull planes can be implemented simultaneously, by
+annotating multiple components of one or more vertex elements with
+the above specified semantic.
+The limits on both clip and cull distances are bound
 by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
 the maximum number of components that can be used to hold the
 distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
 which specifies the maximum number of registers which can be
 annotated with those semantics.
+The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
+are used to divide up the 2 x vec4 space between clipping and culling.
 
 TGSI_SEMANTIC_SAMPLEID
 ""
diff --git a/src/gallium/include/pipe/p_shader_tokens.h 
b/src/gallium/include/pipe/p_shader_tokens.h
index 514b339..b9d28fe 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -185,7 +185,6 @@ enum tgsi_semantic {
TGSI_SEMANTIC_PCOORD,  /**< point sprite coordinate */
TGSI_SEMANTIC_VIEWPORT_INDEX,  /**< viewport index */
TGSI_SEMANTIC_LAYER,   /**< layer (rendertarget index) */
-   TGSI_SEMANTIC_CULLDIST,
TGSI_SEMANTIC_SAMPLEID,
TGSI_SEMANTIC_SAMPLEPOS,
TGSI_SEMANTIC_SAMPLEMASK,
-- 
2.5.5


[Mesa-dev] [PATCH 04/11] glsl: Extend lowering pass for gl_ClipDistance to support other arrays (v4)

2016-05-13 Thread Dave Airlie

From: Tobias Klausmann 

This will come in handy when we want to lower gl_CullDistance into
gl_CullDistanceMESA.

[airlied: drop separate APIs for clip/cull - just use single API
to call both passes.]

v3: reexamine my sanity; this was pretty broken. The new code
creates one copy of gl_ClipDistanceMESA as the clip distance
varying and lowers everything into it in two passes, one for clips
and one for culls.
v4: rework to use the passed-in clip/cull sizes instead of the
array sizes.

Signed-off-by: Tobias Klausmann 
Signed-off-by: Dave Airlie 
---
 src/compiler/glsl/ir_optimization.h  |   4 +-
 src/compiler/glsl/linker.cpp |   4 +-
 src/compiler/glsl/lower_distance.cpp | 248 ++-
 3 files changed, 161 insertions(+), 95 deletions(-)

diff --git a/src/compiler/glsl/ir_optimization.h 
b/src/compiler/glsl/ir_optimization.h
index f9599a3..2a1b709 100644
--- a/src/compiler/glsl/ir_optimization.h
+++ b/src/compiler/glsl/ir_optimization.h
@@ -117,7 +117,9 @@ bool lower_variable_index_to_cond_assign(gl_shader_stage 
stage,
 bool lower_temp, bool lower_uniform);
 bool lower_quadop_vector(exec_list *instructions, bool dont_lower_swz);
 bool lower_const_arrays_to_uniforms(exec_list *instructions);
-bool lower_clip_distance(gl_shader *shader);
+bool lower_combined_clip_cull_distance(gl_shader *shader,
+   uint8_t clipDistanceArraySize,
+   uint8_t cullDistanceArraySize);
 void lower_output_reads(unsigned stage, exec_list *instructions);
 bool lower_packing_builtins(exec_list *instructions, int op_mask);
 void lower_shared_reference(struct gl_shader *shader, unsigned *shared_size);
diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index 2a520bc..a85072d 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -4561,7 +4561,9 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
 goto done;
 
   if (ctx->Const.ShaderCompilerOptions[i].LowerCombinedClipCullDistance) {
- lower_clip_distance(prog->_LinkedShaders[i]);
+ lower_combined_clip_cull_distance(prog->_LinkedShaders[i], 
+   
(uint8_t)prog->LastClipDistanceArraySize,
+   
(uint8_t)prog->LastCullDistanceArraySize);
   }
 
   if (ctx->Const.LowerTessLevel) {
diff --git a/src/compiler/glsl/lower_distance.cpp 
b/src/compiler/glsl/lower_distance.cpp
index 301afe4..0316fb5 100644
--- a/src/compiler/glsl/lower_distance.cpp
+++ b/src/compiler/glsl/lower_distance.cpp
@@ -25,14 +25,17 @@
  * \file lower_distance.cpp
  *
  * This pass accounts for the difference between the way
- * gl_ClipDistance is declared in standard GLSL (as an array of
- * floats), and the way it is frequently implemented in hardware (as
- * a pair of vec4s, with four clip distances packed into each).
+ * gl_ClipDistance or gl_CullDistance is declared in standard GLSL
+ * (as an array of floats), and the way it is frequently implemented
+ * in hardware (as a pair of vec4s, with four clip or cull distances
+ * packed into each).
  *
- * The declaration of gl_ClipDistance is replaced with a declaration
- * of gl_ClipDistanceMESA, and any references to gl_ClipDistance are
- * translated to refer to gl_ClipDistanceMESA with the appropriate
- * swizzling of array indices.  For instance:
+ * The declarations of gl_ClipDistance or gl_CullDistance are replaced
+ * with a single declaration of gl_ClipDistanceMESA.
+ * Any references to the original gl_ClipDistance or gl_CullDistance
+ * are translated to refer to gl_ClipDistanceMESA with the appropriate
+ * swizzling of array indices.
+ * For instance:
  *
  *   gl_ClipDistance[i]
  *
@@ -40,11 +43,12 @@
  *
  *   gl_ClipDistanceMESA[i>>2][i&3]
  *
- * Since some hardware may not internally represent gl_ClipDistance as a pair
- * of vec4's, this lowering pass is optional.  To enable it, set the
- * LowerCombinedClipCullDistance flag in gl_shader_compiler_options to true.
+ * Since some hardware may not internally represent these arrays as a
+ * pair of vec4's, this lowering pass is optional.  To enable it, set
+ * the LowerCombinedClipCullDistance flag in gl_shader_compiler_options to 
true.
  */
 
+#include <string>
 #include "glsl_symbol_table.h"
 #include "ir_rvalue_visitor.h"
 #include "ir.h"
@@ -54,10 +58,16 @@ namespace {
 
 class lower_distance_visitor : public ir_rvalue_visitor {
 public:
-   explicit lower_distance_visitor(gl_shader_stage shader_stage)
+   explicit lower_distance_visitor(gl_shader_stage shader_stage,
+   std::string in_name, ir_variable *out_var,
+   int num_clip_dist, int num_cull_dist,
+   bool is_cull, bool replace_var)
   : progress(false),
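
The index translation described in the file comment above (gl_ClipDistance[i] becoming gl_ClipDistanceMESA[i>>2][i&3]) can be sketched as plain arithmetic. This is not the lowering pass itself: struct sketch_slot and sketch_lower_index are hypothetical, and placing the cull distances directly after the clip distances in the shared 2 x vec4 space is an assumption of this sketch rather than something the quoted hunks show.

```c
#include <assert.h>

/* Hypothetical result type: a position inside gl_ClipDistanceMESA. */
struct sketch_slot {
   unsigned vec;   /* which vec4 (0 or 1) */
   unsigned comp;  /* which component within that vec4 (0..3) */
};

/* Map index i of gl_ClipDistance (is_cull == 0) or gl_CullDistance
 * (is_cull != 0) to a slot in the combined array. */
static struct sketch_slot
sketch_lower_index(unsigned i, unsigned num_clip_dist, int is_cull)
{
   /* Assumption: cull distances start right after the clip distances. */
   const unsigned flat = is_cull ? num_clip_dist + i : i;
   assert(flat < 8);

   struct sketch_slot slot = { flat >> 2, flat & 3 };
   return slot;
}
```

For example, with three clip distances declared, gl_CullDistance[1] would land in the first component of the second vec4 under this assumption.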

[Mesa-dev] [PATCH 08/11] draw: stop using CULLDIST semantic.

2016-05-13 Thread Dave Airlie

From: Dave Airlie 

The way the HW works doesn't really fit with having
two semantics for this.

The GLSL compiler emits 2 vec4s and two properties;
this makes draw use those instead of the CULLDIST semantic.

Signed-off-by: Dave Airlie 
---
 src/gallium/auxiliary/draw/draw_cliptest_tmp.h |  4 ++--
 src/gallium/auxiliary/draw/draw_context.c  | 16 +++-
 src/gallium/auxiliary/draw/draw_gs.c   |  7 +--
 src/gallium/auxiliary/draw/draw_gs.h   |  3 +--
 src/gallium/auxiliary/draw/draw_llvm.c |  4 ++--
 src/gallium/auxiliary/draw/draw_pipe_clip.c|  2 +-
 src/gallium/auxiliary/draw/draw_pipe_cull.c| 25 +++--
 src/gallium/auxiliary/draw/draw_private.h  |  5 ++---
 src/gallium/auxiliary/draw/draw_vs.c   | 10 +++---
 src/gallium/auxiliary/draw/draw_vs.h   |  3 +--
 src/gallium/auxiliary/tgsi/tgsi_scan.c |  3 +--
 11 files changed, 32 insertions(+), 50 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h 
b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
index 34add82..6fbefa5 100644
--- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
+++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
@@ -49,8 +49,8 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs,
int num_written_clipdistance =
   draw_current_shader_num_written_clipdistances(pvs->draw);
 
-   cd[0] = draw_current_shader_clipdistance_output(pvs->draw, 0);
-   cd[1] = draw_current_shader_clipdistance_output(pvs->draw, 1);
+   cd[0] = draw_current_shader_ccdistance_output(pvs->draw, 0);
+   cd[1] = draw_current_shader_ccdistance_output(pvs->draw, 1);
 
if (cd[0] != pos || cd[1] != pos)
   have_cd = true;
diff --git a/src/gallium/auxiliary/draw/draw_context.c 
b/src/gallium/auxiliary/draw/draw_context.c
index 3f36b34..6305761 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -887,12 +887,12 @@ draw_current_shader_clipvertex_output(const struct 
draw_context *draw)
 }
 
 uint
-draw_current_shader_clipdistance_output(const struct draw_context *draw, int 
index)
+draw_current_shader_ccdistance_output(const struct draw_context *draw, int 
index)
 {
debug_assert(index < PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
if (draw->gs.geometry_shader)
-  return draw->gs.geometry_shader->clipdistance_output[index];
-   return draw->vs.clipdistance_output[index];
+  return draw->gs.geometry_shader->ccdistance_output[index];
+   return draw->vs.ccdistance_output[index];
 }
 
 
@@ -904,16 +904,6 @@ draw_current_shader_num_written_clipdistances(const struct 
draw_context *draw)
return draw->vs.vertex_shader->info.num_written_clipdistance;
 }
 
-
-uint
-draw_current_shader_culldistance_output(const struct draw_context *draw, int 
index)
-{
-   debug_assert(index < PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
-   if (draw->gs.geometry_shader)
-  return draw->gs.geometry_shader->culldistance_output[index];
-   return draw->vs.vertex_shader->culldistance_output[index];
-}
-
 uint
 draw_current_shader_num_written_culldistances(const struct draw_context *draw)
 {
diff --git a/src/gallium/auxiliary/draw/draw_gs.c 
b/src/gallium/auxiliary/draw/draw_gs.c
index 6cf8846..18af1d9 100644
--- a/src/gallium/auxiliary/draw/draw_gs.c
+++ b/src/gallium/auxiliary/draw/draw_gs.c
@@ -803,12 +803,7 @@ draw_create_geometry_shader(struct draw_context *draw,
   if (gs->info.output_semantic_name[i] == TGSI_SEMANTIC_CLIPDIST) {
  debug_assert(gs->info.output_semantic_index[i] <
   PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
- gs->clipdistance_output[gs->info.output_semantic_index[i]] = i;
-  }
-  if (gs->info.output_semantic_name[i] == TGSI_SEMANTIC_CULLDIST) {
- debug_assert(gs->info.output_semantic_index[i] <
-  PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
- gs->culldistance_output[gs->info.output_semantic_index[i]] = i;
+ gs->ccdistance_output[gs->info.output_semantic_index[i]] = i;
   }
}
 
diff --git a/src/gallium/auxiliary/draw/draw_gs.h 
b/src/gallium/auxiliary/draw/draw_gs.h
index d256a05..149278d 100644
--- a/src/gallium/auxiliary/draw/draw_gs.h
+++ b/src/gallium/auxiliary/draw/draw_gs.h
@@ -68,8 +68,7 @@ struct draw_geometry_shader {
struct tgsi_shader_info info;
unsigned position_output;
unsigned viewport_index_output;
-   unsigned clipdistance_output[PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT];
-   unsigned culldistance_output[PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT];
+   unsigned ccdistance_output[PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT];
 
unsigned max_output_vertices;
unsigned primitive_boundary;
diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index c1460cc..b098132 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++

[Mesa-dev] [PATCH 01/11] glapi: Add GL_ARB_cull_distance

2016-05-13 Thread Dave Airlie

From: Tobias Klausmann 

Signed-off-by: Tobias Klausmann 
Reviewed-by: Edward O'Callaghan 
Signed-off-by: Dave Airlie 
---
 src/mapi/glapi/gen/gl_API.xml | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 4bfa9ef..2fcba0b 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8269,7 +8269,12 @@
 
 
 
-
+
+
+
+
+
+
 
 http://www.w3.org/2001/XInclude"/>
 
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 03/11] glsl: rename lower_clip_distance to lower_distance.

2016-05-13 Thread Dave Airlie

From: Dave Airlie 

This just renames the file in anticipation of adding cull lowering,
and renames the internals.

Signed-off-by: Tobias Klausmann 
Signed-off-by: Dave Airlie 
Reviewed-by: Edward O'Callaghan 
---
 src/compiler/Makefile.sources |   2 +-
 src/compiler/glsl/lower_clip_distance.cpp | 574 --
 src/compiler/glsl/lower_distance.cpp  | 574 ++
 3 files changed, 575 insertions(+), 575 deletions(-)
 delete mode 100644 src/compiler/glsl/lower_clip_distance.cpp
 create mode 100644 src/compiler/glsl/lower_distance.cpp

diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index 66fbd84..97f9eb4 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -77,10 +77,10 @@ LIBGLSL_FILES = \
glsl/loop_unroll.cpp \
glsl/lower_buffer_access.cpp \
glsl/lower_buffer_access.h \
-   glsl/lower_clip_distance.cpp \
glsl/lower_const_arrays_to_uniforms.cpp \
glsl/lower_discard.cpp \
glsl/lower_discard_flow.cpp \
+   glsl/lower_distance.cpp \
glsl/lower_if_to_cond_assign.cpp \
glsl/lower_instructions.cpp \
glsl/lower_jumps.cpp \
diff --git a/src/compiler/glsl/lower_clip_distance.cpp 
b/src/compiler/glsl/lower_clip_distance.cpp
deleted file mode 100644
index 5d9468d..000
--- a/src/compiler/glsl/lower_clip_distance.cpp
+++ /dev/null
@@ -1,574 +0,0 @@
-/*
- * Copyright © 2011 Intel Corporation
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS IN THE SOFTWARE.
- */
-
-/**
- * \file lower_clip_distance.cpp
- *
- * This pass accounts for the difference between the way
- * gl_ClipDistance is declared in standard GLSL (as an array of
- * floats), and the way it is frequently implemented in hardware (as
- * a pair of vec4s, with four clip distances packed into each).
- *
- * The declaration of gl_ClipDistance is replaced with a declaration
- * of gl_ClipDistanceMESA, and any references to gl_ClipDistance are
- * translated to refer to gl_ClipDistanceMESA with the appropriate
- * swizzling of array indices.  For instance:
- *
- *   gl_ClipDistance[i]
- *
- * is translated into:
- *
- *   gl_ClipDistanceMESA[i>>2][i&3]
- *
- * Since some hardware may not internally represent gl_ClipDistance as a pair
- * of vec4's, this lowering pass is optional.  To enable it, set the
- * LowerCombinedClipCullDistance flag in gl_shader_compiler_options to true.
- */
-
-#include "glsl_symbol_table.h"
-#include "ir_rvalue_visitor.h"
-#include "ir.h"
-#include "program/prog_instruction.h" /* For WRITEMASK_* */
-
-namespace {
-
-class lower_clip_distance_visitor : public ir_rvalue_visitor {
-public:
-   explicit lower_clip_distance_visitor(gl_shader_stage shader_stage)
-  : progress(false), old_clip_distance_out_var(NULL),
-old_clip_distance_in_var(NULL), new_clip_distance_out_var(NULL),
-new_clip_distance_in_var(NULL), shader_stage(shader_stage)
-   {
-   }
-
-   virtual ir_visitor_status visit(ir_variable *);
-   void create_indices(ir_rvalue*, ir_rvalue *&, ir_rvalue *&);
-   bool is_clip_distance_vec8(ir_rvalue *ir);
-   ir_rvalue *lower_clip_distance_vec8(ir_rvalue *ir);
-   virtual ir_visitor_status visit_leave(ir_assignment *);
-   void visit_new_assignment(ir_assignment *ir);
-   virtual ir_visitor_status visit_leave(ir_call *);
-
-   virtual void handle_rvalue(ir_rvalue **rvalue);
-
-   void fix_lhs(ir_assignment *);
-
-   bool progress;
-
-   /**
-* Pointer to the declaration of gl_ClipDistance, if found.
-*
-* Note:
-*
-* - the in_var is for geometry and both tessellation shader inputs only.
-*
-* - since gl_ClipDistance is available in tessellation control,
-*   tessellation evaluation and geometry shaders as both an input
-*   and an

[Mesa-dev] arb_cull_distance (last time for sure)

2016-05-13 Thread Dave Airlie

Same patch series as last time, except:

 - squashed the mesa/prog patch, as it broke bisecting with static asserts
   on builds.

 - renamed more stuff in the renaming patch (most clip_distance->distance),
   which should make reviewing the lowering pass patch easier.

Dave.


Re: [Mesa-dev] ARB_cull_distance (final?) and llvmpipe support

2016-05-13 Thread Dave Airlie

On 13 May 2016 at 21:45, Tobias Klausmann
 wrote:
> Hi Dave,
>
> I was not aware you were actively working on this as well. I had a series
> posted 5 days ago which got some criticism and reviews [1]. The most important
> points were:

I wasn't really, then krh asked me how it worked, and when I read the
code I realised it didn't work like I thought, so I decided to fix it
up and try to get it merged.

>
> 1. split functional change and renaming of the lowering pass [Ian]

I've done that now,
>
> 2. check max clip/cull array sizes in link_shaders for all stages [Ian]

I did that in this series I think. (just earlier so it's a compile error).
>
> 3. drop culldist semantics, which you already did [Ilia]

And that should be done.

>
>
> If you are interested in changes made to satisfy 1+2, you can fetch patches
> from here: https://git.thm.de/tjkl80/mesa.git arb-cull-distance
>

For 1, I've hopefully arrived at a similar place; I'll repost what I did.
For 2, I think I fixed it elsewhere.

Thanks, and sorry for treading on your toes!

Dave.

Re: [Mesa-dev] [PATCH 1/2] i965: Enable ES 3.2 sample shading extensions.

2016-05-13 Thread Ian Romanick

This series looks good to me.  Perhaps docs/GL3.txt needs some updates?

Reviewed-by: Ian Romanick 

On 05/12/2016 06:31 PM, Kenneth Graunke wrote:
> This enables:
> - GL_OES_sample_shading
> - GL_OES_sample_variables
> - GL_OES_shader_multisample_interpolation
> 
> We pass all the CTS tests, and all but 8 of the dEQP-GLES31 tests.
> Half of the failing dEQP tests appear to be broken tests; the other
> four still need investigating but cover an obscure corner case.
> 
> Otherwise, the functionality is in great shape, and actually in better
> shape than the ARB_sample_shading support we've shipped for years
> (we fixed a number of bugs when enabling this).  So it makes sense to
> enable it for our users in the Mesa 12.0 release.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
> b/src/mesa/drivers/dri/i965/intel_extensions.c
> index 8d98788..7f44c1d 100644
> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> @@ -308,6 +308,7 @@ intelInitExtensions(struct gl_context *ctx)
>ctx->Extensions.EXT_framebuffer_multisample_blit_scaled = true;
>ctx->Extensions.EXT_transform_feedback = true;
>ctx->Extensions.OES_depth_texture_cube_map = true;
> +  ctx->Extensions.OES_sample_variables = true;
>  
>ctx->Extensions.ARB_timer_query = brw->intelScreen->hw_has_timestamp;
>  
> 


Re: [Mesa-dev] [PATCH] anv/copy: Fix Copying between Buffers and Images of different dimensions

2016-05-13 Thread Jason Ekstrand

On Mon, May 9, 2016 at 12:26 PM, Nanley Chery  wrote:

> From: Nanley Chery 
>
> This function previously assumed that the Buffer and Image had matching
> dimensions. However, it is possible to copy from a buffer with larger
> dimensions than the image. Modify the copy function to enable this.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95292
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_meta_copy.c | 19 +++
>  1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_meta_copy.c
> b/src/intel/vulkan/anv_meta_copy.c
> index 982fa7e..1d131d3 100644
> --- a/src/intel/vulkan/anv_meta_copy.c
> +++ b/src/intel/vulkan/anv_meta_copy.c
> @@ -128,18 +128,20 @@ meta_copy_buffer_to_image(struct anv_cmd_buffer
> *cmd_buffer,
>const VkOffset3D img_offset_el =
>   meta_region_offset_el(image, [r].imageOffset);
>const VkExtent3D bufferExtent = {
> - .width = pRegions[r].bufferRowLength,
> - .height = pRegions[r].bufferImageHeight,
> + .width  = MAX(pRegions[r].bufferRowLength,
> +   pRegions[r].imageExtent.width),
>

As I commented on IRC, I think this would be better as
"pRegions[r].bufferRowLength ? pRegions[r].bufferRowLength :
pRegions[r].imageExtent.width"

With that,

Reviewed-by: Jason Ekstrand 


> + .height = MAX(pRegions[r].bufferImageHeight,
> +   pRegions[r].imageExtent.height),
>};
> -
> -  /* Start creating blit rect */
>const VkExtent3D buf_extent_el =
>   meta_region_extent_el(image, );
> +
> +  /* Start creating blit rect */
>const VkExtent3D img_extent_el =
>   meta_region_extent_el(image, [r].imageExtent);
>struct anv_meta_blit2d_rect rect = {
> - .width = MAX2(buf_extent_el.width, img_extent_el.width),
> - .height = MAX2(buf_extent_el.height, img_extent_el.height),
> + .width = img_extent_el.width,
> + .height =  img_extent_el.height,
>};
>
>/* Create blit surfaces */
> @@ -153,7 +155,7 @@ meta_copy_buffer_to_image(struct anv_cmd_buffer
> *cmd_buffer,
>   .tiling = ISL_TILING_LINEAR,
>   .base_offset = buffer->offset + pRegions[r].bufferOffset,
>   .bs = forward ? image->format->isl_layout->bs : img_bsurf.bs,
> - .pitch = rect.width * buf_bsurf.bs,
> + .pitch = buf_extent_el.width * buf_bsurf.bs,
>};
>
>/* Set direction-dependent variables */
> @@ -188,7 +190,8 @@ meta_copy_buffer_to_image(struct anv_cmd_buffer
> *cmd_buffer,
>* increment the offset directly in the image effectively
>* re-binding it to different backing memory.
>*/
> - buf_bsurf.base_offset += rect.width * rect.height * buf_bsurf.bs
> ;
> + buf_bsurf.base_offset += buf_extent_el.width *
> +  buf_extent_el.height * buf_bsurf.bs;
>
>   if (image->type == VK_IMAGE_TYPE_3D)
>  slice_3d++;
> --
> 2.8.2
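
The fallback suggested in the review can be sketched as follows. In Vulkan, a bufferRowLength or bufferImageHeight of zero means the buffer data is tightly packed according to imageExtent. The struct below is a hypothetical stand-in for the relevant VkBufferImageCopy fields so the sketch builds without the Vulkan headers; the function names are likewise made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the VkBufferImageCopy fields used here. */
struct copy_region {
   uint32_t buffer_row_length;    /* 0 means "tightly packed" */
   uint32_t buffer_image_height;  /* 0 means "tightly packed" */
   uint32_t image_width;
   uint32_t image_height;
};

/* Row length of the buffer data, in texels. */
static uint32_t
effective_row_length(const struct copy_region *r)
{
   return r->buffer_row_length ? r->buffer_row_length : r->image_width;
}

/* Number of rows per image slice in the buffer data. */
static uint32_t
effective_image_height(const struct copy_region *r)
{
   return r->buffer_image_height ? r->buffer_image_height : r->image_height;
}
```

Compared with taking the maximum of the two values, the conditional states the "zero means tightly packed" rule directly.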
>

Re: [Mesa-dev] [PATCH v2 11/15] glsl/linker: dvec3/dvec4 may consume twice input vertex attributes

2016-05-13 Thread Dave Airlie

On 13 May 2016 at 18:34, Juan A. Suarez Romero  wrote:
> On Fri, 2016-05-13 at 05:34 +1000, Dave Airlie wrote:
>> On 13 May 2016 4:28 AM, "Antia Puentes"  wrote:
>> >
>> >
>> > From: "Juan A. Suarez Romero" 
>> >
>> > From the GL 4.5 core spec, section 11.1.1 (Vertex Attributes):
>> >
>> > "A program with more than the value of MAX_VERTEX_ATTRIBS
>> > active attribute variables may fail to link, unless
>> > device-dependent optimizations are able to make the program
>> > fit within available hardware resources. For the purposes
>> > of this test, attribute variables of the type dvec3, dvec4,
>> > dmat2x3, dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4 may
>> > count as consuming twice as many attributes as equivalent
>> > single-precision types. While these types use the same number
>> > of generic attributes as their single-precision equivalents,
>> > implementations are permitted to consume two single-precision
>> > vectors of internal storage for each three- or four-component
>> > double-precision vector."
>> >
>> > This commit adds a flag that allows drivers to specify whether dvec3,
>> > dvec4, dmat2x3, dmat2x4, dmat3, dmat3x4, dmat4x3 and dmat4 count as
>> > consuming twice as many attributes as equivalent single-precision
>> > types (default value being false).
>> Doesn't this patch break all the drivers currently implementing this
>> extension?
>>
>> If I read it correctly, it creates the new Const, and then turns off
>> the feature.
>>
>
>
> Right. That const defines if those doubles consume two locations (flag
> as true) or just one (flag as false), for the purposes of checking if
> it reaches the MAX_VERTEX_ATTRIBS.
>
> And the default value is to count as one (flag as false). The reason is
> that this is what is happening right now in that function, except when
> we use explicit location.
>
> When you added the code to count doubles as consuming two locations,
> you only did it if the locations were explicit. In the other case,
> double attributes are counted as consuming one attribute.
>
> I don't know if you only added it with explicit location for a good
> reason, or just forgot to add in the general case.
>
> So I took the general case as the default one.
>
> If the general case should actually count the doubles as consuming two
> (as in the explicit case), then we can either flip the flag's default
> to true, or remove the flag entirely and force all drivers to count
> doubles as consuming two attributes.

For MAX_VERTEX_ATTRIBS I think we always want to count as
2. It was an oversight on my part, I think, if I missed that.

But yes, be careful with locations: a dvec3 consumes two attributes,
but not two locations.

so
location 3 dvec3
location 4 dvec3

is valid, but they consume 4 attributes.

Dave.
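
The distinction between locations and attributes can be sketched like this. The enum and function names are hypothetical and only vector types are covered; the doubling rule for dvec3/dvec4 follows the spec text quoted above.

```c
#include <stdbool.h>

/* Hypothetical type tags, not Mesa's glsl_type. */
enum attr_type { ATTR_VEC4, ATTR_DVEC2, ATTR_DVEC3, ATTR_DVEC4 };

/* dvec3 and dvec4 may count twice against MAX_VERTEX_ATTRIBS. */
static bool
is_wide_double(enum attr_type t)
{
   return t == ATTR_DVEC3 || t == ATTR_DVEC4;
}

/* Attributes consumed for the MAX_VERTEX_ATTRIBS check. */
static unsigned
count_attribs(const enum attr_type *types, unsigned n)
{
   unsigned total = 0;
   for (unsigned i = 0; i < n; i++)
      total += is_wide_double(types[i]) ? 2 : 1;
   return total;
}

/* Locations occupied: one per vector input, doubles included. */
static unsigned
count_locations(const enum attr_type *types, unsigned n)
{
   (void) types;
   return n;
}
```

For the two dvec3 inputs at locations 3 and 4 in the example above, this yields two locations but four attributes.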

Re: [Mesa-dev] [PATCH 2/2] i965: initialize the alignment related bits in struct brw_reg

2016-05-13 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> With the inclusion of the "df" field in the union, this union is going
> to be at the offset 8 because of the alignment rules. The alignment
> bits in the middle are uninitialized and valgrind complains with errors
> similar to this:
>
> ==10298== Conditional jump or move depends on uninitialised value(s)
> ==10298==at 0x4C31D52: __memcmp_sse4_1 (in 
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==10298==by 0xAB16663: backend_reg::equals(backend_reg const&) const 
> (brw_shader.cpp:690)
> ==10298==by 0xAAB629D: fs_reg::equals(fs_reg&) const (brw_fs.cpp:456)
> ==10298==by 0xAAD4452: operands_match(fs_inst*, fs_inst*, bool*) 
> (brw_fs_cse.cpp:161)
> ==10298==by 0xAAD46C3: instructions_match(fs_inst*, fs_inst*, bool*) 
> (brw_fs_cse.cpp:187)
> ==10298==by 0xAAD4BAA: fs_visitor::opt_cse_local(bblock_t*) 
> (brw_fs_cse.cpp:251)
> ==10298==by 0xAAD5216: fs_visitor::opt_cse() (brw_fs_cse.cpp:361)
> ==10298==by 0xAAC8AAD: fs_visitor::optimize() (brw_fs.cpp:5401)
> ==10298==by 0xAACB9DC: fs_visitor::run_fs(bool) (brw_fs.cpp:5803)
> ==10298==by 0xAACC38B: brw_compile_fs (brw_fs.cpp:6029)
> ==10298==by 0xAA39796: brw_codegen_wm_prog (brw_wm.c:137)
> ==10298==by 0xAA3B068: brw_fs_precompile (brw_wm.c:637)
>
> This patch adds an explicit padding and initializes it to zero.
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>
> This patch replaces the following one:
>
> [PATCH 2/2] i965: check each field separately in backend_end::equals()
>
>  src/mesa/drivers/dri/i965/brw_reg.h | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
> b/src/mesa/drivers/dri/i965/brw_reg.h
> index 3b76d7d..ebb7f29 100644
> --- a/src/mesa/drivers/dri/i965/brw_reg.h
> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
> @@ -243,6 +243,9 @@ struct brw_reg {
> unsigned subnr:5;  /* :1 in align16 */
> unsigned nr:16;
>  
> +   /* IMPORTANT: adjust padding bits if you add new fields */
> +   unsigned padding:32;
> +

Ugh!  It seems terribly fragile to me to make assumptions about the
amount of (implementation-defined) padding that you're going to end up
with.  It would be awful if someone built the driver on a different
compiler or architecture that happens to align things differently, which
would cause the whole compiler back-end to behave non-deterministically
(possibly without any obvious sign of anything being wrong other than
decreased shader performance).  I think the two least insane
possibilities we have to fix the problem are:

 - memset() the whole struct at the top of brw_reg() and anywhere else a
   brw_reg struct is initialized.

 - Accept the reality that the struct contains some amount of undefined
   padding and define a helper function (e.g. brw_regs_equal() in
   brw_reg.h) to do the comparison manually, then use it everywhere we
   currently use memcmp() to compare brw_regs.

Any suggestions Matt?

> union {
>struct {
>   unsigned swizzle:8;  /* src only, align16 only */
> @@ -337,7 +340,7 @@ brw_reg(enum brw_reg_file file,
> reg.pad0 = 0;
> reg.subnr = subnr * type_sz(type);
> reg.nr = nr;
> -
> +   reg.padding = 0;
> /* Initialize all union's bits to zero before setting them. */
> reg.df = 0;
>  
> -- 
> 2.5.0
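
The second option above (a helper that compares fields instead of raw bytes) can be sketched on a cut-down struct. The struct layout and the name regs_equal are hypothetical, not the real brw_reg or a proposed brw_regs_equal(); the point is only that padding bytes never take part in the comparison.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical cut-down register struct. On common ABIs the compiler
 * inserts padding between the bitfields and the 8-byte-aligned double. */
struct reg {
   unsigned file:3;
   unsigned nr:16;
   double df;
};

/* Compare field by field so padding bytes are never inspected.
 * The double is compared bitwise, which keeps memcmp()-like semantics
 * for values such as -0.0 and NaN. */
static bool
regs_equal(const struct reg *a, const struct reg *b)
{
   return a->file == b->file &&
          a->nr == b->nr &&
          memcmp(&a->df, &b->df, sizeof(a->df)) == 0;
}
```

A helper like this gives the same answer whatever the padding bytes contain, at the cost of having to be updated whenever a field is added.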



Re: [Mesa-dev] [RFC] nir/validate: on failure, dump shader w/ offending line annotated

2016-05-13 Thread Rob Clark

On Fri, May 13, 2016 at 4:14 PM, Rob Clark  wrote:
> On Fri, May 13, 2016 at 4:10 PM, Jason Ekstrand  wrote:
>> On Fri, May 13, 2016 at 1:02 PM, Rob Clark  wrote:
>>>
>>> From: Rob Clark 
>>>
>>> If we assert in nir_validate_shader(), print the shader with the
>>> offending instruction prefixed with "=>" to make it easier to find what
>>> part of the shader nir_validate is complaining about.
>>>
>>> Macro funny-business in nir_validate() was just to avoid changing a
>>> bazillion assert() lines to validate_assert() (or similar) for the point
>>> of an RFC ;-)
>>
>>
>> I love this idea.  I just wish it worked for more than just instructions.
>> It would also be fantastic if it were somehow able to print more than one
>> error.  Maybe something where we tie printing and validation together
>> somehow?  Just a thought.
>
> hmm, err_instr could easily become a void* (or array of void*?) to
> match var's/etc too..

(or really what we want is a hashset, I guess..)

> and nir_validate could easily keep a list of fails (maybe up to some
> threshold), and only assert at the end if num_errors > 0..
>
> That might be an easier way to go than merging the two existing
> passes..  although if I was starting from scratch merging the two
> might have been the better approach
>
> BR,
> -R
>
>>>
>>> Example output: http://hastebin.com/raw/qorirayazu
>>> ---
>>>  src/compiler/nir/nir.h  |  1 +
>>>  src/compiler/nir/nir_print.c| 14 +-
>>>  src/compiler/nir/nir_validate.c | 15 +++
>>>  3 files changed, 29 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>>> index ade584c..6bb9fbe 100644
>>> --- a/src/compiler/nir/nir.h
>>> +++ b/src/compiler/nir/nir.h
>>> @@ -2196,6 +2196,7 @@ unsigned nir_index_instrs(nir_function_impl *impl);
>>>  void nir_index_blocks(nir_function_impl *impl);
>>>
>>>  void nir_print_shader(nir_shader *shader, FILE *fp);
>>> +void nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr
>>> *instr);
>>>  void nir_print_instr(const nir_instr *instr, FILE *fp);
>>>
>>>  nir_shader *nir_shader_clone(void *mem_ctx, const nir_shader *s);
>>> diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
>>> index a36561e..3b25a49 100644
>>> --- a/src/compiler/nir/nir_print.c
>>> +++ b/src/compiler/nir/nir_print.c
>>> @@ -53,6 +53,8 @@ typedef struct {
>>>
>>> /* an index used to make new non-conflicting names */
>>> unsigned index;
>>> +
>>> +   nir_instr *err_instr;
>>>  } print_state;
>>>
>>>  static void
>>> @@ -916,6 +918,8 @@ print_block(nir_block *block, print_state *state,
>>> unsigned tabs)
>>> free(preds);
>>>
>>> nir_foreach_instr(instr, block) {
>>> +  if (instr == state->err_instr)
>>> + fprintf(fp, "=>");
>>>print_instr(instr, state, tabs);
>>>fprintf(fp, "\n");
>>> }
>>> @@ -1090,11 +1094,13 @@ destroy_print_state(print_state *state)
>>>  }
>>>
>>>  void
>>> -nir_print_shader(nir_shader *shader, FILE *fp)
>>> +nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr *instr)
>>>  {
>>> print_state state;
>>> init_print_state(, shader, fp);
>>>
>>> +   state.err_instr = instr;
>>> +
>>> fprintf(fp, "shader: %s\n", gl_shader_stage_name(shader->stage));
>>>
>>> if (shader->info.name)
>>> @@ -1144,6 +1150,12 @@ nir_print_shader(nir_shader *shader, FILE *fp)
>>>  }
>>>
>>>  void
>>> +nir_print_shader(nir_shader *shader, FILE *fp)
>>> +{
>>> +   nir_print_shader_err(shader, fp, NULL);
>>> +}
>>> +
>>> +void
>>>  nir_print_instr(const nir_instr *instr, FILE *fp)
>>>  {
>>> print_state state = {
>>> diff --git a/src/compiler/nir/nir_validate.c
>>> b/src/compiler/nir/nir_validate.c
>>> index 84334d4..b47087f 100644
>>> --- a/src/compiler/nir/nir_validate.c
>>> +++ b/src/compiler/nir/nir_validate.c
>>> @@ -97,6 +97,21 @@ typedef struct {
>>> struct hash_table *var_defs;
>>>  } validate_state;
>>>
>>> +
>>> +
>>> +static void
>>> +dump_assert(validate_state *state, const char *failed)
>>> +{
>>> +   fprintf(stderr, "validate failed: %s\n", failed);
>>> +   if (state->instr)
>>> +  nir_print_shader_err(state->shader, stderr, state->instr);
>>> +}
>>> +
>>> +#define __assert assert
>>> +#undef assert
>>> +#define assert(x) do { if (!(x)) { dump_assert(state, #x);
>>> __assert_fail(#x, __FILE__, __LINE__, __func__); } } while (0)
>>> +
>>> +
>>>  static void validate_src(nir_src *src, validate_state *state);
>>>
>>>  static void
>>> --
>>> 2.5.5
>>>
>>

Re: [Mesa-dev] [RFC] nir/validate: on failure, dump shader w/ offending line annotated

2016-05-13 Thread Rob Clark

On Fri, May 13, 2016 at 4:10 PM, Jason Ekstrand  wrote:
> On Fri, May 13, 2016 at 1:02 PM, Rob Clark  wrote:
>>
>> From: Rob Clark 
>>
>> If we assert in nir_validate_shader(), print the shader with the
>> offending instruction prefixed with "=>" to make it easier to find what
>> part of the shader nir_validate is complaining about.
>>
>> Macro funny-business in nir_validate() was just to avoid changing a
>> bazillion assert() lines to validate_assert() (or similar) for the point
>> of an RFC ;-)
>
>
> I love this idea.  I just wish it worked for more than just instructions.
> It would also be fantastic if it were somehow able to print more than one
> error.  Maybe something where we tie printing and validation together
> somehow?  Just a thought.

hmm, err_instr could easily become a void* (or array of void*?) to
match var's/etc too..

and nir_validate could easily keep a list of fails (maybe up to some
threshold), and only assert at the end if num_errors > 0..

That might be an easier way to go than merging the two existing
passes..  although if I was starting from scratch merging the two
might have been the better approach

BR,
-R

>>
>> Example output: http://hastebin.com/raw/qorirayazu
>> ---
>>  src/compiler/nir/nir.h  |  1 +
>>  src/compiler/nir/nir_print.c| 14 +-
>>  src/compiler/nir/nir_validate.c | 15 +++
>>  3 files changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index ade584c..6bb9fbe 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -2196,6 +2196,7 @@ unsigned nir_index_instrs(nir_function_impl *impl);
>>  void nir_index_blocks(nir_function_impl *impl);
>>
>>  void nir_print_shader(nir_shader *shader, FILE *fp);
>> +void nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr
>> *instr);
>>  void nir_print_instr(const nir_instr *instr, FILE *fp);
>>
>>  nir_shader *nir_shader_clone(void *mem_ctx, const nir_shader *s);
>> diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
>> index a36561e..3b25a49 100644
>> --- a/src/compiler/nir/nir_print.c
>> +++ b/src/compiler/nir/nir_print.c
>> @@ -53,6 +53,8 @@ typedef struct {
>>
>> /* an index used to make new non-conflicting names */
>> unsigned index;
>> +
>> +   nir_instr *err_instr;
>>  } print_state;
>>
>>  static void
>> @@ -916,6 +918,8 @@ print_block(nir_block *block, print_state *state,
>> unsigned tabs)
>> free(preds);
>>
>> nir_foreach_instr(instr, block) {
>> +  if (instr == state->err_instr)
>> + fprintf(fp, "=>");
>>print_instr(instr, state, tabs);
>>fprintf(fp, "\n");
>> }
>> @@ -1090,11 +1094,13 @@ destroy_print_state(print_state *state)
>>  }
>>
>>  void
>> -nir_print_shader(nir_shader *shader, FILE *fp)
>> +nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr *instr)
>>  {
>> print_state state;
>> init_print_state(, shader, fp);
>>
>> +   state.err_instr = instr;
>> +
>> fprintf(fp, "shader: %s\n", gl_shader_stage_name(shader->stage));
>>
>> if (shader->info.name)
>> @@ -1144,6 +1150,12 @@ nir_print_shader(nir_shader *shader, FILE *fp)
>>  }
>>
>>  void
>> +nir_print_shader(nir_shader *shader, FILE *fp)
>> +{
>> +   nir_print_shader_err(shader, fp, NULL);
>> +}
>> +
>> +void
>>  nir_print_instr(const nir_instr *instr, FILE *fp)
>>  {
>> print_state state = {
>> diff --git a/src/compiler/nir/nir_validate.c
>> b/src/compiler/nir/nir_validate.c
>> index 84334d4..b47087f 100644
>> --- a/src/compiler/nir/nir_validate.c
>> +++ b/src/compiler/nir/nir_validate.c
>> @@ -97,6 +97,21 @@ typedef struct {
>> struct hash_table *var_defs;
>>  } validate_state;
>>
>> +
>> +
>> +static void
>> +dump_assert(validate_state *state, const char *failed)
>> +{
>> +   fprintf(stderr, "validate failed: %s\n", failed);
>> +   if (state->instr)
>> +  nir_print_shader_err(state->shader, stderr, state->instr);
>> +}
>> +
>> +#define __assert assert
>> +#undef assert
>> +#define assert(x) do { if (!(x)) { dump_assert(state, #x);
>> __assert_fail(#x, __FILE__, __LINE__, __func__); } } while (0)
>> +
>> +
>>  static void validate_src(nir_src *src, validate_state *state);
>>
>>  static void
>> --
>> 2.5.5
>>
>
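
The idea floated in the reply above (keep a list of failures up to some threshold and only assert at the end) could look roughly like this. Everything here is hypothetical: the struct, the MAX_ERRORS threshold, and the validate_assert() macro are illustrations, not existing NIR API.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_ERRORS 16  /* hypothetical cap on recorded failures */

struct validate_log {
   const char *msgs[MAX_ERRORS];  /* stringified failed conditions */
   const void *objs[MAX_ERRORS];  /* offending instr/var/etc. */
   unsigned num_errors;           /* total failures, may exceed the cap */
};

/* Record a failed check instead of aborting right away. */
static bool
log_check(struct validate_log *log, bool cond, const char *expr,
          const void *obj)
{
   if (!cond) {
      if (log->num_errors < MAX_ERRORS) {
         log->msgs[log->num_errors] = expr;
         log->objs[log->num_errors] = obj;
      }
      log->num_errors++;
   }
   return cond;
}

#define validate_assert(log, obj, cond) \
   log_check((log), (cond), #cond, (obj))

/* Print the recorded failures and return the total count, so the
 * caller can assert once at the end if it is non-zero. */
static unsigned
log_report(const struct validate_log *log, FILE *fp)
{
   unsigned shown = log->num_errors < MAX_ERRORS ? log->num_errors
                                                 : MAX_ERRORS;
   for (unsigned i = 0; i < shown; i++)
      fprintf(fp, "validate failed: %s (object %p)\n",
              log->msgs[i], (void *) log->objs[i]);
   return log->num_errors;
}
```

The recorded object pointers could then be handed to the printer so that every offending item gets the "=>" marker, not just a single instruction.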

[Mesa-dev] [PATCH] i965: check tcs for NULL dereference

2016-05-13 Thread Mark Janes

Coverity issue 1361544 found an instance where the tcs variable is
checked for NULL, but unconditionally dereferenced later in the same
function.

CC: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_tcs.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tcs.c 
b/src/mesa/drivers/dri/i965/brw_tcs.c
index e8178c6..9589fa5 100644
--- a/src/mesa/drivers/dri/i965/brw_tcs.c
+++ b/src/mesa/drivers/dri/i965/brw_tcs.c
@@ -278,14 +278,16 @@ brw_codegen_tcs_prog(struct brw_context *brw,
 
if (unlikely(brw->perf_debug)) {
   struct brw_shader *btcs = (struct brw_shader *) tcs;
-  if (btcs->compiled_once) {
- brw_tcs_debug_recompile(brw, shader_prog, key);
+  if (btcs) {
+ if (btcs->compiled_once) {
+brw_tcs_debug_recompile(brw, shader_prog, key);
+ }
+ btcs->compiled_once = true;
   }
   if (start_busy && !drm_intel_bo_busy(brw->batch.last_bo)) {
  perf_debug("TCS compile took %.03f ms and stalled the GPU\n",
 (get_time() - start_time) * 1000);
   }
-  btcs->compiled_once = true;
}
 
/* Scratch space is used for register spilling */
-- 
2.8.1


Re: [Mesa-dev] [RFC] nir/validate: on failure, dump shader w/ offending line annotated

2016-05-13 Thread Jason Ekstrand

On Fri, May 13, 2016 at 1:02 PM, Rob Clark  wrote:

> From: Rob Clark 
>
> If we assert in nir_validate_shader(), print the shader with the
> offending instruction prefixed with "=>" to make it easier to find what
> part of the shader nir_validate is complaining about.
>
> Macro funny-business in nir_validate() was just to avoid changing a
> bazillion assert() lines to validate_assert() (or similar) for the point
> of an RFC ;-)
>

I love this idea.  I just wish it worked for more than just instructions.
It would also be fantastic if it were somehow able to print more than one
error.  Maybe something where we tie printing and validation together
somehow?  Just a thought.
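
One way to read the "more than one error" idea is a validate_assert() that
records the failure and keeps going, so that every offending instruction can
be marked when the shader is finally printed.  What follows is only a sketch
under stated assumptions, not what the RFC does: sketch_validate_state,
sketch_record_failure(), sketch_report() and MAX_ERRORS are hypothetical
names, and the instruction is kept as an opaque pointer rather than a
nir_instr.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_ERRORS 16 /* hypothetical cap on recorded failures */

/* Hypothetical validator state; the real validate_state has other fields. */
typedef struct {
   const void *instr; /* instruction being validated, or NULL */
   unsigned num_errors;
   struct {
      const void *instr;
      const char *cond;
      int line;
   } errors[MAX_ERRORS];
} sketch_validate_state;

/* Remember a failed condition instead of aborting; always returns false. */
static bool
sketch_record_failure(sketch_validate_state *state, const char *cond, int line)
{
   if (state->num_errors < MAX_ERRORS) {
      state->errors[state->num_errors].instr = state->instr;
      state->errors[state->num_errors].cond = cond;
      state->errors[state->num_errors].line = line;
      state->num_errors++;
   }
   return false;
}

/* Evaluates to true when cond holds, otherwise records it and yields false. */
#define validate_assert(state, cond) \
   ((cond) ? true : sketch_record_failure((state), #cond, __LINE__))

/* Print every recorded failure after validation has finished. */
static void
sketch_report(const sketch_validate_state *state, FILE *fp)
{
   for (unsigned i = 0; i < state->num_errors; i++) {
      fprintf(fp, "validate failed: %s (line %d)\n",
              state->errors[i].cond, state->errors[i].line);
   }
}
```

A printer could then compare each instruction against the recorded instr
pointers instead of a single err_instr.  The cost is that code following a
failed check has to tolerate the broken invariant, which is presumably why a
plain assert() is the easier starting point.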


> Example output: http://hastebin.com/raw/qorirayazu
> ---
>  src/compiler/nir/nir.h  |  1 +
>  src/compiler/nir/nir_print.c| 14 +-
>  src/compiler/nir/nir_validate.c | 15 +++
>  3 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index ade584c..6bb9fbe 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2196,6 +2196,7 @@ unsigned nir_index_instrs(nir_function_impl *impl);
>  void nir_index_blocks(nir_function_impl *impl);
>
>  void nir_print_shader(nir_shader *shader, FILE *fp);
> +void nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr *instr);
>  void nir_print_instr(const nir_instr *instr, FILE *fp);
>
>  nir_shader *nir_shader_clone(void *mem_ctx, const nir_shader *s);
> diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
> index a36561e..3b25a49 100644
> --- a/src/compiler/nir/nir_print.c
> +++ b/src/compiler/nir/nir_print.c
> @@ -53,6 +53,8 @@ typedef struct {
>
> /* an index used to make new non-conflicting names */
> unsigned index;
> +
> +   nir_instr *err_instr;
>  } print_state;
>
>  static void
> @@ -916,6 +918,8 @@ print_block(nir_block *block, print_state *state,
> unsigned tabs)
> free(preds);
>
> nir_foreach_instr(instr, block) {
> +  if (instr == state->err_instr)
> + fprintf(fp, "=>");
>print_instr(instr, state, tabs);
>fprintf(fp, "\n");
> }
> @@ -1090,11 +1094,13 @@ destroy_print_state(print_state *state)
>  }
>
>  void
> -nir_print_shader(nir_shader *shader, FILE *fp)
> +nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr *instr)
>  {
> print_state state;
> init_print_state(&state, shader, fp);
>
> +   state.err_instr = instr;
> +
> fprintf(fp, "shader: %s\n", gl_shader_stage_name(shader->stage));
>
> if (shader->info.name)
> @@ -1144,6 +1150,12 @@ nir_print_shader(nir_shader *shader, FILE *fp)
>  }
>
>  void
> +nir_print_shader(nir_shader *shader, FILE *fp)
> +{
> +   nir_print_shader_err(shader, fp, NULL);
> +}
> +
> +void
>  nir_print_instr(const nir_instr *instr, FILE *fp)
>  {
> print_state state = {
> diff --git a/src/compiler/nir/nir_validate.c
> b/src/compiler/nir/nir_validate.c
> index 84334d4..b47087f 100644
> --- a/src/compiler/nir/nir_validate.c
> +++ b/src/compiler/nir/nir_validate.c
> @@ -97,6 +97,21 @@ typedef struct {
> struct hash_table *var_defs;
>  } validate_state;
>
> +
> +
> +static void
> +dump_assert(validate_state *state, const char *failed)
> +{
> +   fprintf(stderr, "validate failed: %s\n", failed);
> +   if (state->instr)
> +  nir_print_shader_err(state->shader, stderr, state->instr);
> +}
> +
> +#define __assert assert
> +#undef assert
> +#define assert(x) do { if (!(x)) { dump_assert(state, #x);
> __assert_fail(#x, __FILE__, __LINE__, __func__); } } while (0)
> +
> +
>  static void validate_src(nir_src *src, validate_state *state);
>
>  static void
> --
> 2.5.5
>
>

[Mesa-dev] [RFC] nir/validate: on failure, dump shader w/ offending line annotated

2016-05-13 Thread Rob Clark

From: Rob Clark 

If we assert in nir_validate_shader(), print the shader with the
offending instruction prefixed with "=>" to make it easier to find what
part of the shader nir_validate is complaining about.

Macro funny-business in nir_validate() was just to avoid changing a
bazillion assert() lines to validate_assert() (or similar) for the point
of an RFC ;-)

Example output: http://hastebin.com/raw/qorirayazu
---
 src/compiler/nir/nir.h  |  1 +
 src/compiler/nir/nir_print.c| 14 +-
 src/compiler/nir/nir_validate.c | 15 +++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index ade584c..6bb9fbe 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2196,6 +2196,7 @@ unsigned nir_index_instrs(nir_function_impl *impl);
 void nir_index_blocks(nir_function_impl *impl);
 
 void nir_print_shader(nir_shader *shader, FILE *fp);
+void nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr *instr);
 void nir_print_instr(const nir_instr *instr, FILE *fp);
 
 nir_shader *nir_shader_clone(void *mem_ctx, const nir_shader *s);
diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
index a36561e..3b25a49 100644
--- a/src/compiler/nir/nir_print.c
+++ b/src/compiler/nir/nir_print.c
@@ -53,6 +53,8 @@ typedef struct {
 
/* an index used to make new non-conflicting names */
unsigned index;
+
+   nir_instr *err_instr;
 } print_state;
 
 static void
@@ -916,6 +918,8 @@ print_block(nir_block *block, print_state *state, unsigned 
tabs)
free(preds);
 
nir_foreach_instr(instr, block) {
+  if (instr == state->err_instr)
+ fprintf(fp, "=>");
   print_instr(instr, state, tabs);
   fprintf(fp, "\n");
}
@@ -1090,11 +1094,13 @@ destroy_print_state(print_state *state)
 }
 
 void
-nir_print_shader(nir_shader *shader, FILE *fp)
+nir_print_shader_err(nir_shader *shader, FILE *fp, nir_instr *instr)
 {
print_state state;
init_print_state(&state, shader, fp);
 
+   state.err_instr = instr;
+
fprintf(fp, "shader: %s\n", gl_shader_stage_name(shader->stage));
 
if (shader->info.name)
@@ -1144,6 +1150,12 @@ nir_print_shader(nir_shader *shader, FILE *fp)
 }
 
 void
+nir_print_shader(nir_shader *shader, FILE *fp)
+{
+   nir_print_shader_err(shader, fp, NULL);
+}
+
+void
 nir_print_instr(const nir_instr *instr, FILE *fp)
 {
print_state state = {
diff --git a/src/compiler/nir/nir_validate.c b/src/compiler/nir/nir_validate.c
index 84334d4..b47087f 100644
--- a/src/compiler/nir/nir_validate.c
+++ b/src/compiler/nir/nir_validate.c
@@ -97,6 +97,21 @@ typedef struct {
struct hash_table *var_defs;
 } validate_state;
 
+
+
+static void
+dump_assert(validate_state *state, const char *failed)
+{
+   fprintf(stderr, "validate failed: %s\n", failed);
+   if (state->instr)
+  nir_print_shader_err(state->shader, stderr, state->instr);
+}
+
+#define __assert assert
+#undef assert
+#define assert(x) do { if (!(x)) { dump_assert(state, #x); __assert_fail(#x, __FILE__, __LINE__, __func__); } } while (0)
+
+
 static void validate_src(nir_src *src, validate_state *state);
 
 static void
-- 
2.5.5


Re: [Mesa-dev] [PATCH] i965: Make exec_size 16 word/byte registers use exec_size halving again.

2016-05-13 Thread Francisco Jerez

Francisco Jerez  writes:

> Francisco Jerez  writes:
>
>> Kenneth Graunke  writes:
>>
>>> On Friday, May 13, 2016 3:39:29 AM PDT Connor Abbott wrote:
>>>> My understanding is that compression isn't necessary here, at least on
>>>> newer gens (I don't know much about gen4/5). Could you explain why a
>>>> <16,16,1>:w region is illegal? It would be nice to get a PRM citation
>>>> in the comment below.
>>>
>>> Matt mentioned it was illegal in a bugzilla comment, but it certainly
>>> seems legal to me, at least with W types.  However,  we /are/ using
>>> compression...and using both together seems wrong...
>>>
>>> fs_generator::generate_code() contains:
>>>
>>>if (dispatch_width == 16)
>>>   brw_set_default_compression_control(p, BRW_COMPRESSION_COMPRESSED);
>>>
>> That seems pretty bogus (and Connor's patch looks correct to me).  For
>> SIMD32 I had to fix the generator to set compression control to
>> BRW_COMPRESSION_COMPRESSED for instructions that write multiple
>> registers *only* -- Do you want me to try if that alone fixes the
>> regressions on 965GM?
>>
>
> I just looked at the bug report -- It looks like what is going on is
> that the instruction writes to a 32-bit 16-wide register so it must be
> compressed either way, but the code in brw_reg_from_fs_reg() fails to
> consider whether the instruction will be compressed while deciding
> whether to do the width halving or not -- AFAICT both the new and old
> code were bogus in different ways ;), we probably want something like:
>
> | if (needs_compression(inst)) {
> |// Use half register width
> | } else {
> |// Use half register width

Oops, of course here you'd have to use the full register width.

> | }
>
> The implementation of needs_compression() would have to
> be whatever we base the instruction compression control field on --
> Right now it would be something like:
>
> | static bool
> | needs_compression(fs_inst *inst)
> | {
> |return inst->dst.component_size(inst->exec_size) > REG_SIZE;
> | }
>
> which is not quite correct where sources and destination don't agree on
> the number of components read or written, but that's already broken and
> fixing it properly probably belongs in a separate change.
>
>>> It looks like we also have code a little ways down to set compression
>>> control based on the register region, so maybe we don't need the above code.
>>> But I'm kind of hesitant to delete it - we might need it for some
>>> instructions like IF which don't have sources/destinations.
>>>
>>
>> Control flow instructions have to be uncompressed anyway.  :)
>>
>>> --Ken



[Mesa-dev] [Bug 95346] Stellaris - Black/super dark planets

2016-05-13 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95346

Alexander Tsoy  changed:

   What|Removed |Added

 CC||alexan...@tsoy.me

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH] i965: Make exec_size 16 word/byte registers use exec_size halving again.

2016-05-13 Thread Francisco Jerez

Francisco Jerez  writes:

> Kenneth Graunke  writes:
>
>> On Friday, May 13, 2016 3:39:29 AM PDT Connor Abbott wrote:
>>> My understanding is that compression isn't necessary here, at least on
>>> newer gens (I don't know much about gen4/5). Could you explain why a
>>> <16,16,1>:w region is illegal? It would be nice to get a PRM citation
>>> in the comment below.
>>
>> Matt mentioned it was illegal in a bugzilla comment, but it certainly
>> seems legal to me, at least with W types.  However,  we /are/ using
>> compression...and using both together seems wrong...
>>
>> fs_generator::generate_code() contains:
>>
>>if (dispatch_width == 16)
>>   brw_set_default_compression_control(p, BRW_COMPRESSION_COMPRESSED);
>>
> That seems pretty bogus (and Connor's patch looks correct to me).  For
> SIMD32 I had to fix the generator to set compression control to
> BRW_COMPRESSION_COMPRESSED for instructions that write multiple
> registers *only* -- Do you want me to try if that alone fixes the
> regressions on 965GM?
>

I just looked at the bug report -- It looks like what is going on is
that the instruction writes to a 32-bit 16-wide register so it must be
compressed either way, but the code in brw_reg_from_fs_reg() fails to
consider whether the instruction will be compressed while deciding
whether to do the width halving or not -- AFAICT both the new and old
code were bogus in different ways ;), we probably want something like:

| if (needs_compression(inst)) {
|// Use half register width
| } else {
|// Use half register width
| }

The implementation of needs_compression() would have to
be whatever we base the instruction compression control field on --
Right now it would be something like:

| static bool
| needs_compression(fs_inst *inst)
| {
|return inst->dst.component_size(inst->exec_size) > REG_SIZE;
| }

which is not quite correct where sources and destination don't agree on
the number of components read or written, but that's already broken and
fixing it properly probably belongs in a separate change.
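
A self-contained sketch of the decision described above, with the else branch
using the full register width as corrected in the follow-up in this thread.
Everything here is a simplified assumption: struct sketch_inst and the
sketch_* helpers are hypothetical stand-ins for fs_inst and
brw_reg_from_fs_reg(), the destination stride is ignored, and REG_SIZE is
taken to be 32 bytes.

```c
#include <stdbool.h>

#define REG_SIZE 32 /* assumed bytes per GRF register */

/* Hypothetical, simplified stand-in for fs_inst. */
struct sketch_inst {
   unsigned exec_size;     /* number of channels, e.g. 8 or 16 */
   unsigned dst_type_size; /* bytes per destination component */
};

/* Compressed when the destination spans more than one register. */
static bool
sketch_needs_compression(const struct sketch_inst *inst)
{
   return inst->dst_type_size * inst->exec_size > REG_SIZE;
}

/* Region width to use when building the hardware register. */
static unsigned
sketch_region_width(const struct sketch_inst *inst)
{
   if (sketch_needs_compression(inst))
      return inst->exec_size / 2; /* half register width */
   else
      return inst->exec_size;     /* full register width */
}
```

Under these assumptions a 16-wide instruction with a 32-bit destination
writes 64 bytes, is compressed, and gets a width of 8, while a 16-wide
instruction with a 16-bit destination fits in one register and keeps the
full width of 16.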

>> It looks like we also have code a little ways down to set compression
>> control based on the register region, so maybe we don't need the above code.
>> But I'm kind of hesitant to delete it - we might need it for some
>> instructions like IF which don't have sources/destinations.
>>
>
> Control flow instructions have to be uncompressed anyway.  :)
>
>> --Ken



Re: [Mesa-dev] [PATCH 19/28] i965/blorp: Use NIR for clear shaders

2016-05-13 Thread Jason Ekstrand

On Fri, May 13, 2016 at 11:48 AM, Pohjolainen, Topi <
topi.pohjolai...@intel.com> wrote:

> On Tue, May 10, 2016 at 04:16:39PM -0700, Jason Ekstrand wrote:
> > ---
> >  src/mesa/drivers/dri/i965/brw_blorp_clear.cpp | 184
> ++
> >  1 file changed, 39 insertions(+), 145 deletions(-)
>
> Can you also add:
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95373
>

Done


> We concluded it wasn't worth fixing the warning because you were about to
> delete the old compiler.
>
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> > index 94b8277..3925d28 100644
> > --- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> > @@ -37,6 +37,8 @@
> >  #include "brw_eu.h"
> >  #include "brw_state.h"
> >
> > +#include "nir_builder.h"
> > +
> >  #define FILE_DEBUG_FLAG DEBUG_BLORP
> >
> >  struct brw_blorp_const_color_prog_key
> > @@ -45,78 +47,55 @@ struct brw_blorp_const_color_prog_key
> > bool pad[3];
> >  };
> >
> > -class brw_blorp_const_color_program
> > +static void
> > +brw_blorp_params_get_clear_kernel(struct brw_context *brw,
> > +  struct brw_blorp_params *params,
> > +  bool use_replicated_data)
> >  {
> > -public:
> > -   brw_blorp_const_color_program(struct brw_context *brw,
> > - const brw_blorp_const_color_prog_key
> *key);
> > -   ~brw_blorp_const_color_program();
> > +   struct brw_blorp_const_color_prog_key blorp_key;
> > +   memset(&blorp_key, 0, sizeof(blorp_key));
> > +   blorp_key.use_simd16_replicated_data = use_replicated_data;
> >
> > -   const GLuint *compile(struct brw_context *brw, GLuint *program_size);
> > +   if (brw_search_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
> > +&blorp_key, sizeof(blorp_key),
> > +&params->wm_prog_kernel, &params->wm_prog_data))
> > +  return;
> >
> > -   brw_blorp_prog_data prog_data;
> > +   void *mem_ctx = ralloc_context(NULL);
> >
> > -private:
> > -   void alloc_regs();
> > +   nir_builder b;
> > +   nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
> > +   b.shader->info.name = ralloc_strdup(b.shader, "BLORP-clear");
> >
> > -   void *mem_ctx;
> > -   const brw_blorp_const_color_prog_key *key;
> > -   struct brw_codegen func;
> > +   nir_variable *u_color = nir_variable_create(b.shader,
> nir_var_uniform,
> > +   glsl_vec4_type(),
> "u_color");
> > +   u_color->data.location = 0;
> >
> > -   /* Thread dispatch header */
> > -   struct brw_reg R0;
> > +   nir_variable *frag_color = nir_variable_create(b.shader,
> nir_var_shader_out,
> > +  glsl_vec4_type(),
> > +  "gl_FragColor");
> > +   frag_color->data.location = FRAG_RESULT_COLOR;
> >
> > -   /* Pixel X/Y coordinates (always in R1). */
> > -   struct brw_reg R1;
> > +   nir_copy_var(&b, frag_color, u_color);
> >
> > -   /* Register with push constants (a single vec4) */
> > -   struct brw_reg clear_rgba;
> > +   struct brw_wm_prog_key wm_key;
> > +   brw_blorp_init_wm_prog_key(&wm_key);
> >
> > -   /* MRF used for render target writes */
> > -   GLuint base_mrf;
> > -};
> > +   struct brw_blorp_prog_data prog_data;
> > +   brw_blorp_prog_data_init(&prog_data);
> >
> > -brw_blorp_const_color_program::brw_blorp_const_color_program(
> > -  struct brw_context *brw,
> > -  const brw_blorp_const_color_prog_key *key)
> > -   : mem_ctx(ralloc_context(NULL)),
> > - key(key),
> > - R0(),
> > - R1(),
> > - clear_rgba(),
> > - base_mrf(0)
> > -{
> > -   prog_data.first_curbe_grf_0 = 0;
> > -   prog_data.persample_msaa_dispatch = false;
> > -   brw_init_codegen(brw->intelScreen->devinfo, &func, mem_ctx);
> > -}
> > +   unsigned program_size;
> > +   const unsigned *program =
> > +  brw_blorp_compile_nir_shader(brw, b.shader, &wm_key,
> use_replicated_data,
> > +   &prog_data, &program_size);
> >
> > -brw_blorp_const_color_program::~brw_blorp_const_color_program()
> > -{
> > -   ralloc_free(mem_ctx);
> > -}
> > -
> > -static void
> > -brw_blorp_params_get_clear_kernel(struct brw_context *brw,
> > -  struct brw_blorp_params *params,
> > -  bool use_replicated_data)
> > -{
> > -   struct brw_blorp_const_color_prog_key blorp_key;
> > -   memset(&blorp_key, 0, sizeof(blorp_key));
> > -   blorp_key.use_simd16_replicated_data = use_replicated_data;
> > +   brw_upload_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
> > +&blorp_key, sizeof(blorp_key),
> > +program, program_size,
> > +&prog_data, sizeof(prog_data),
> > +&params->wm_prog_kernel, &params->wm_prog_data);
> >
> > -   if (!brw_search_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
> > - &blorp_key,

Re: [Mesa-dev] [PATCH] i965: Make exec_size 16 word/byte registers use exec_size halving again.

2016-05-13 Thread Francisco Jerez

Kenneth Graunke  writes:

> On Friday, May 13, 2016 3:39:29 AM PDT Connor Abbott wrote:
>> My understanding is that compression isn't necessary here, at least on
>> newer gens (I don't know much about gen4/5). Could you explain why a
>> <16,16,1>:w region is illegal? It would be nice to get a PRM citation
>> in the comment below.
>
> Matt mentioned it was illegal in a bugzilla comment, but it certainly
> seems legal to me, at least with W types.  However,  we /are/ using
> compression...and using both together seems wrong...
>
> fs_generator::generate_code() contains:
>
>if (dispatch_width == 16)
>   brw_set_default_compression_control(p, BRW_COMPRESSION_COMPRESSED);
>
That seems pretty bogus (and Connor's patch looks correct to me).  For
SIMD32 I had to fix the generator to set compression control to
BRW_COMPRESSION_COMPRESSED for instructions that write multiple
registers *only* -- Do you want me to try if that alone fixes the
regressions on 965GM?

> It looks like we also have code a little ways down to set compression
> control based on the register region, so maybe we don't need the above code.
> But I'm kind of hesitant to delete it - we might need it for some
> instructions like IF which don't have sources/destinations.
>

Control flow instructions have to be uncompressed anyway.  :)

> --Ken



Re: [Mesa-dev] [PATCH 19/28] i965/blorp: Use NIR for clear shaders

2016-05-13 Thread Pohjolainen, Topi

On Tue, May 10, 2016 at 04:16:39PM -0700, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_blorp_clear.cpp | 184 
> ++
>  1 file changed, 39 insertions(+), 145 deletions(-)

Can you also add:

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95373

We concluded it wasn't worth fixing the warning because you were about to
delete the old compiler.

> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp 
> b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> index 94b8277..3925d28 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> @@ -37,6 +37,8 @@
>  #include "brw_eu.h"
>  #include "brw_state.h"
>  
> +#include "nir_builder.h"
> +
>  #define FILE_DEBUG_FLAG DEBUG_BLORP
>  
>  struct brw_blorp_const_color_prog_key
> @@ -45,78 +47,55 @@ struct brw_blorp_const_color_prog_key
> bool pad[3];
>  };
>  
> -class brw_blorp_const_color_program
> +static void
> +brw_blorp_params_get_clear_kernel(struct brw_context *brw,
> +  struct brw_blorp_params *params,
> +  bool use_replicated_data)
>  {
> -public:
> -   brw_blorp_const_color_program(struct brw_context *brw,
> - const brw_blorp_const_color_prog_key *key);
> -   ~brw_blorp_const_color_program();
> +   struct brw_blorp_const_color_prog_key blorp_key;
> +   memset(&blorp_key, 0, sizeof(blorp_key));
> +   blorp_key.use_simd16_replicated_data = use_replicated_data;
>  
> -   const GLuint *compile(struct brw_context *brw, GLuint *program_size);
> +   if (brw_search_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
> +&blorp_key, sizeof(blorp_key),
> +&params->wm_prog_kernel, &params->wm_prog_data))
> +  return;
>  
> -   brw_blorp_prog_data prog_data;
> +   void *mem_ctx = ralloc_context(NULL);
>  
> -private:
> -   void alloc_regs();
> +   nir_builder b;
> +   nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
> +   b.shader->info.name = ralloc_strdup(b.shader, "BLORP-clear");
>  
> -   void *mem_ctx;
> -   const brw_blorp_const_color_prog_key *key;
> -   struct brw_codegen func;
> +   nir_variable *u_color = nir_variable_create(b.shader, nir_var_uniform,
> +   glsl_vec4_type(), "u_color");
> +   u_color->data.location = 0;
>  
> -   /* Thread dispatch header */
> -   struct brw_reg R0;
> +   nir_variable *frag_color = nir_variable_create(b.shader, 
> nir_var_shader_out,
> +  glsl_vec4_type(),
> +  "gl_FragColor");
> +   frag_color->data.location = FRAG_RESULT_COLOR;
>  
> -   /* Pixel X/Y coordinates (always in R1). */
> -   struct brw_reg R1;
> +   nir_copy_var(&b, frag_color, u_color);
>  
> -   /* Register with push constants (a single vec4) */
> -   struct brw_reg clear_rgba;
> +   struct brw_wm_prog_key wm_key;
> +   brw_blorp_init_wm_prog_key(&wm_key);
>  
> -   /* MRF used for render target writes */
> -   GLuint base_mrf;
> -};
> +   struct brw_blorp_prog_data prog_data;
> +   brw_blorp_prog_data_init(&prog_data);
>  
> -brw_blorp_const_color_program::brw_blorp_const_color_program(
> -  struct brw_context *brw,
> -  const brw_blorp_const_color_prog_key *key)
> -   : mem_ctx(ralloc_context(NULL)),
> - key(key),
> - R0(),
> - R1(),
> - clear_rgba(),
> - base_mrf(0)
> -{
> -   prog_data.first_curbe_grf_0 = 0;
> -   prog_data.persample_msaa_dispatch = false;
> -   brw_init_codegen(brw->intelScreen->devinfo, &func, mem_ctx);
> -}
> +   unsigned program_size;
> +   const unsigned *program =
> +  brw_blorp_compile_nir_shader(brw, b.shader, &wm_key,
> use_replicated_data,
> +   &prog_data, &program_size);
>  
> -brw_blorp_const_color_program::~brw_blorp_const_color_program()
> -{
> -   ralloc_free(mem_ctx);
> -}
> -
> -static void
> -brw_blorp_params_get_clear_kernel(struct brw_context *brw,
> -  struct brw_blorp_params *params,
> -  bool use_replicated_data)
> -{
> -   struct brw_blorp_const_color_prog_key blorp_key;
> -   memset(&blorp_key, 0, sizeof(blorp_key));
> -   blorp_key.use_simd16_replicated_data = use_replicated_data;
> +   brw_upload_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
> +&blorp_key, sizeof(blorp_key),
> +program, program_size,
> +&prog_data, sizeof(prog_data),
> +&params->wm_prog_kernel, &params->wm_prog_data);
>  
> -   if (!brw_search_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
> - &blorp_key, sizeof(blorp_key),
> - &params->wm_prog_kernel, &params->wm_prog_data)) {
> -  brw_blorp_const_color_program prog(brw, &blorp_key);
> -  GLuint program_size;
> -  const GLuint *program = prog.compile(brw, &program_size);
> -  brw_upload_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
> -   &blorp_key, sizeof(blorp_key),
> -

Re: [Mesa-dev] [PATCH 2/2] i965: Flip interpolateAtOffset's y offset when necessary.

2016-05-13 Thread Jason Ekstrand

On Fri, May 13, 2016 at 1:42 AM, Kenneth Graunke 
wrote:

> Fixes 5 dEQP-GLES31.functional.shaders.multisample_interpolation tests:
> - interpolate_at_offset.no_qualifiers.default_framebuffer
> - interpolate_at_offset.centroid_qualifier.default_framebuffer
> - interpolate_at_offset.sample_qualifier.default_framebuffer
> - interpolate_at_offset.at_sample_position.default_framebuffer
> - interpolate_at_offset.array_element.default_framebuffer
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 8 ++--
>  src/mesa/drivers/dri/i965/brw_wm.c   | 3 ++-
>  2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 4648c58..5890750 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -2871,9 +2871,12 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder
> &bld,
>case nir_intrinsic_interp_var_at_offset: {
>   nir_const_value *const_offset =
> nir_src_as_const_value(instr->src[0]);
>
> + const bool flip = !wm_key->render_to_fbo;
> +
>   if (const_offset) {
>  unsigned off_x = MIN2((int)(const_offset->f32[0] * 16), 7) &
> 0xf;
> -unsigned off_y = MIN2((int)(const_offset->f32[1] * 16), 7) &
> 0xf;
> +unsigned off_y = MIN2((int)(const_offset->f32[1] * 16 *
> +(flip ? -1 : 1)), 7) & 0xf;
>
>  emit_pixel_interpolater_send(bld,
>
> FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET,
> @@ -2889,7 +2892,8 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder
> &bld,
> fs_reg temp = vgrf(glsl_type::float_type);
> bld.MUL(temp, offset(offset_src, bld, i),
> brw_imm_f(16.0f));
> fs_reg itemp = vgrf(glsl_type::int_type);
> -   bld.MOV(itemp, temp);  /* float to int */
> +   /* float to int */
> +   bld.MOV(itemp, (i == 1 && flip) ? negate(temp) : temp);
>
> /* Clamp the upper end of the range to +7/16.
>  * ARB_gpu_shader5 requires that we support a maximum
> offset
> diff --git a/src/mesa/drivers/dri/i965/brw_wm.c
> b/src/mesa/drivers/dri/i965/brw_wm.c
> index ced9708..192e8e2 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm.c
> @@ -511,7 +511,8 @@ brw_wm_populate_key(struct brw_context *brw, struct
> brw_wm_prog_key *key)
>key->drawable_height = _mesa_geometric_height(ctx->DrawBuffer);
> }
>
> -   if ((fp->program.Base.InputsRead & VARYING_BIT_POS) ||
> program_uses_dfdy) {
> +   if ((fp->program.Base.InputsRead & VARYING_BIT_POS) ||
> +   program_uses_dfdy || prog->nir->info.uses_interp_var_at_offset) {
>

It's kind of lame that we have to add something to nir_shader_info just for
optimistically setting the key.  :-(  I guess not that many shaders use
things that actually need render_to_fbo so it's best to not set it all the
time.  Thanks for fixing the key bit!

Reviewed-by: Jason Ekstrand 


>key->render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer);
> }
>
> --
> 2.8.2
>
>
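
For reference, the constant-offset arithmetic in the quoted hunk can be tried
in isolation.  This is only a sketch that mirrors the patch's expression:
sketch_encode_offset() is a hypothetical helper, not a function in the
driver, and it assumes (as the patch does) that the offset is scaled by 16,
clamped at +7 and packed into a 4-bit two's-complement field.

```c
#include <stdbool.h>

/* Pack one interpolateAtOffset component into a 4-bit field of 1/16ths. */
static unsigned
sketch_encode_offset(float offset, bool flip)
{
   /* Scale to 1/16 pixel units, negating y when rendering is flipped. */
   int fixed = (int)(offset * 16.0f * (flip ? -1.0f : 1.0f));

   /* Clamp the upper end of the range to +7/16. */
   if (fixed > 7)
      fixed = 7;

   /* Keep the low four bits; negative values wrap to two's complement. */
   return (unsigned)fixed & 0xf;
}
```

With these assumptions an offset of 0.25 encodes as 0x4 unflipped and 0xc
(that is, -4) flipped, and anything at or above +0.5 after the optional
negation clamps to 0x7.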

Re: [Mesa-dev] [PATCH 1/2] nir: Add a nir->info.uses_interp_var_at_offset flag.

2016-05-13 Thread Jason Ekstrand

On Fri, May 13, 2016 at 1:42 AM, Kenneth Graunke 
wrote:

> It would probably make more sense to set this from nir_gather_info()
> in case we manage to dead code eliminate these intrinsics.  However,
> we haven't transitioned the GL driver to using that pass yet...
>

Please add it anyway so nir_gather_info doesn't get out-of-date.


>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/compiler/nir/glsl_to_nir.cpp | 3 +++
>  src/compiler/nir/nir.h   | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/src/compiler/nir/glsl_to_nir.cpp
> b/src/compiler/nir/glsl_to_nir.cpp
> index fb1d421..e82d98a 100644
> --- a/src/compiler/nir/glsl_to_nir.cpp
> +++ b/src/compiler/nir/glsl_to_nir.cpp
> @@ -1276,6 +1276,9 @@ nir_visitor::visit(ir_expression *ir)
>intrin->intrinsic == nir_intrinsic_interp_var_at_sample)
>   intrin->src[0] =
> nir_src_for_ssa(evaluate_rvalue(ir->operands[1]));
>
> +  if (intrin->intrinsic == nir_intrinsic_interp_var_at_offset)
> + shader->info.uses_interp_var_at_offset = true;
> +
>unsigned bit_size =  glsl_get_bit_size(deref->type);
>add_instr(&intrin->instr, deref->type->vector_elements, bit_size);
>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 20927a2..d12792d 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -1710,6 +1710,9 @@ typedef struct nir_shader_info {
> /* Whether or not this shader ever uses textureGather() */
> bool uses_texture_gather;
>
> +   /** Whether or not this shader uses nir_intrinsic_interp_var_at_offset
> */
> +   bool uses_interp_var_at_offset;
> +
> /* Whether or not this shader uses the gl_ClipDistance output */
> bool uses_clip_distance_out;
>
> --
> 2.8.2
>
>

Re: [Mesa-dev] [PATCH 1/2] i965: Enable ES 3.2 sample shading extensions.

2016-05-13 Thread Ilia Mirkin

On Fri, May 13, 2016 at 2:05 PM, Kenneth Graunke  wrote:
> On Thursday, May 12, 2016 11:28:35 PM PDT Ilia Mirkin wrote:
>> I think it's more than 8... for example, these 6 fail for me as well:
>>
>> dEQP-
> GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.negative.interpolate_struct_member,Fail
>> dEQP-
> GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.negative.interpolate_constant,Fail
>> dEQP-
> GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.interpolate_struct_member,Fail
>> dEQP-
> GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.interpolate_constant,Fail
>> dEQP-
> GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.negative.interpolate_struct_member,Fail
>> dEQP-
> GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.negative.interpolate_constant,Fail
>>
>> However I don't think it's such a huge deal. (We end up successfully
>> compiling a shader we shouldn't.)
>
> Ah, right.  I should mention, I'm still blacklisting anything not in
> the Android mustpass list.  So that's very likely true.

Yeah, and I sent fixes for them last night. Will push tonight in all likelihood.

  -ilia

Re: [Mesa-dev] [PATCH 1/2] i965: Enable ES 3.2 sample shading extensions.

2016-05-13 Thread Kenneth Graunke

On Thursday, May 12, 2016 11:28:35 PM PDT Ilia Mirkin wrote:
> I think it's more than 8... for example, these 6 fail for me as well:
> 
> dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.negative.interpolate_struct_member,Fail
> dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.negative.interpolate_constant,Fail
> dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.interpolate_struct_member,Fail
> dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.interpolate_constant,Fail
> dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.negative.interpolate_struct_member,Fail
> dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.negative.interpolate_constant,Fail
> 
> However I don't think it's such a huge deal. (We end up successfully
> compiling a shader we shouldn't.)

Ah, right.  I should mention, I'm still blacklisting anything not in
the Android mustpass list.  So that's very likely true.

--Ken



Re: [Mesa-dev] [PATCH] i965: Make exec_size 16 word/byte registers use exec_size halving again.

2016-05-13 Thread Kenneth Graunke

On Friday, May 13, 2016 3:39:29 AM PDT Connor Abbott wrote:
> My understanding is that compression isn't necessary here, at least on
> newer gens (I don't know much about gen4/5). Could you explain why a
> <16,16,1>:w region is illegal? It would be nice to get a PRM citation
> in the comment below.

Matt mentioned it was illegal in a bugzilla comment, but it certainly
seems legal to me, at least with W types.  However,  we /are/ using
compression...and using both together seems wrong...

fs_generator::generate_code() contains:

   if (dispatch_width == 16)
      brw_set_default_compression_control(p, BRW_COMPRESSION_COMPRESSED);

It looks like we also have code a little ways down to set compression
control based on the register region, so maybe we don't need the above code.
But I'm kind of hesitant to delete it - we might need it for some
instructions like IF which don't have sources/destinations.

--Ken


Re: [Mesa-dev] [PATCH v2] i965/blorp: Special-case the clear color in MSAA resolves

2016-05-13 Thread Jason Ekstrand

On Wed, May 11, 2016 at 7:42 PM, Jason Ekstrand 
wrote:

> The current MSAA resolve code has a special case for when the MCS value is 0.
> In this case we only need to sample once because we know that all values are in
> slice 0.  This commit adds a second optimization that detects the magic MCS
> value that indicates the clear color, grabs the color from a push
> constant, and avoids sampling altogether.  On a microbenchmark written by
> Neil Roberts that tests resolving surfaces with just clear color, this
> improves performance by 60% for 8x, 40% for 4x, and 28% for 2x MSAA on my
> SKL gte3 laptop.  The benchmark can be found on the ML archive:
>
> https://lists.freedesktop.org/archives/mesa-dev/2016-February/108077.html
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.h|   4 +-
>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 101
> +--
>  2 files changed, 100 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h
> b/src/mesa/drivers/dri/i965/brw_blorp.h
> index 15114d0..9d71ca4 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.h
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
> @@ -197,7 +197,9 @@ struct brw_blorp_wm_push_constants
> uint32_t src_z;
>
> /* Pad out to an integral number of registers */
> -   uint32_t pad[5];
> +   uint32_t pad;
> +
> +   union gl_color_union clear_color;
>  };
>
>  #define BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS \
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> index 514a316..45b696d 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> @@ -346,6 +346,7 @@ struct brw_blorp_blit_vars {
>nir_variable *offset;
> } u_x_transform, u_y_transform;
> nir_variable *u_src_z;
> +   nir_variable *u_clear_color;
>
> /* gl_FragCoord */
> nir_variable *frag_coord;
> @@ -374,6 +375,7 @@ brw_blorp_blit_vars_init(nir_builder *b, struct
> brw_blorp_blit_vars *v,
> LOAD_UNIFORM(y_transform.multiplier, glsl_float_type())
> LOAD_UNIFORM(y_transform.offset, glsl_float_type())
> LOAD_UNIFORM(src_z, glsl_uint_type())
> +   LOAD_UNIFORM(clear_color, glsl_vec4_type())
>
>  #undef DECL_UNIFORM
>
> @@ -858,7 +860,8 @@ static nir_ssa_def *
>  blorp_nir_manual_blend_average(nir_builder *b, nir_ssa_def *pos,
> unsigned tex_samples,
> enum intel_msaa_layout tex_layout,
> -   enum brw_reg_type dst_type)
> +   enum brw_reg_type dst_type,
> +   struct brw_blorp_blit_vars *v)
>  {
> /* If non-null, this is the outer-most if statement */
> nir_if *outer_if = NULL;
> @@ -867,9 +870,53 @@ blorp_nir_manual_blend_average(nir_builder *b,
> nir_ssa_def *pos,
>nir_local_variable_create(b->impl, glsl_vec4_type(), "color");
>
> nir_ssa_def *mcs = NULL;
> -   if (tex_layout == INTEL_MSAA_LAYOUT_CMS)
> +   if (tex_layout == INTEL_MSAA_LAYOUT_CMS) {
>mcs = blorp_nir_txf_ms_mcs(b, pos);
>
> +  /* The MCS buffer stores a packed value that provides a mapping from
> +   * samples to array slices.  The magic value of all ones means that
> all
> +   * samples have the clear color.  In this case, we can
> short-circuit the
> +   * sampling process and just use the clear color that we pushed
> into the
> +   * shader.
> +   */
> +  nir_ssa_def *is_clear_color;
> +  switch (tex_samples) {
> +  case 2:
> + /* Empirical evidence suggests that the value returned from the
> +  * sampler is not always 0x3 for clear color so we need to mask
> it.
> +  */
> + is_clear_color =
> +nir_ieq(b, nir_iand(b, nir_channel(b, mcs, 0), nir_imm_int(b,
> 0x3)),
> +   nir_imm_int(b, 0x3));
> + break;
> +  case 4:
> + is_clear_color =
> +nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, 0xff));
> + break;
> +  case 8:
> + is_clear_color =
> +nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, ~0));
> + break;
> +  case 16:
> + is_clear_color =
> +nir_ior(b, nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b,
> ~0)),
>

This needs to be nir_iand.  Fixed locally...
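
To illustrate why the 16x case needs an AND rather than an OR, here is a
minimal sketch in plain C that models the per-sample-count comparison on the
MCS words. The function name and its parameters are made up for illustration
and are not part of the patch; mcs0 and mcs1 simply stand in for channels 0
and 1 of the MCS fetch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when the MCS words read for one pixel
 * encode "all samples have the clear color".  mcs1 is only meaningful
 * for 16x.
 */
static bool
mcs_is_clear_color(unsigned samples, uint32_t mcs0, uint32_t mcs1)
{
   switch (samples) {
   case 2:
      /* Only the low two bits are trusted, so mask before comparing. */
      return (mcs0 & 0x3) == 0x3;
   case 4:
      return mcs0 == 0xff;
   case 8:
      return mcs0 == UINT32_MAX;
   case 16:
      /* Both words must be all ones.  OR-ing the two comparisons would
       * also accept a pixel where only one of the words is all ones.
       */
      return mcs0 == UINT32_MAX && mcs1 == UINT32_MAX;
   default:
      assert(0 && "invalid sample count");
      return false;
   }
}
```

With an OR in the 16x case, a pixel whose first word is all ones but whose
second word maps some samples to real slices would be treated as clear color
and its samples would never be read.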


> +   nir_ieq(b, nir_channel(b, mcs, 1), nir_imm_int(b,
> ~0)));
> + break;
> +  default:
> + unreachable("Invalid sample count");
> +  }
> +
> +  nir_if *if_stmt = nir_if_create(b->shader);
> +  if_stmt->condition = nir_src_for_ssa(is_clear_color);
> +  nir_cf_node_insert(b->cursor, &if_stmt->cf_node);
> +
> +  b->cursor = nir_after_cf_list(&if_stmt->then_list);
> +  nir_store_var(b, color, nir_load_var(b, v->u_clear_color), 0xf);
> +
> +  b->cursor = nir_after_cf_list(&if_stmt->else_list);
> +  outer_if = if_stmt;
> +   }
> +
>

Re: [Mesa-dev] [PATCH] egl: Check if API is supported when using eglBindAPI.

2016-05-13 Thread Manolova, Plamena

Hi Daniel,
Thanks for reviewing!

On Fri, May 13, 2016 at 5:09 PM, Daniel Stone  wrote:

> Hi,
>
> On 13 May 2016 at 17:03, Plamena Manolova 
> wrote:
> > @@ -444,6 +444,8 @@ _eglCreateAPIsString(_EGLDisplay *dpy)
> >strcat(dpy->ClientAPIsString, "OpenVG ");
> >
> > assert(strlen(dpy->ClientAPIsString) <
> sizeof(dpy->ClientAPIsString));
> > +
> > +   _eglGlobal.ClientAPIsString = dpy->ClientAPIsString;
>
> What happens when the display is destroyed and this is freed? Or when
> different displays have different supported APIs?
>

You're right, this would cause trouble. I think we have two alternatives
here:
1: We could copy the string, but that wouldn't address the case in which
different displays have different APIs.
2: We could fetch the current context and use the ClientAPIsString of the
display associated with it. If there's no current context we could simply
return EGL_FALSE, since we'd have no way of verifying whether the API is
supported.
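
A rough sketch of what alternative 2 could look like, under some loud
assumptions: the struct and enum names below are hypothetical stand-ins rather
than the real EGL types, and the display's supported APIs are modelled as a
bitmask instead of the ClientAPIsString, purely to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the EGL display/context types. */
enum client_api_bit {
   API_BIT_OPENGL    = 1u << 0,
   API_BIT_OPENGL_ES = 1u << 1,
   API_BIT_OPENVG    = 1u << 2,
};

struct display {
   unsigned client_apis;   /* OR of client_api_bit values */
};

struct context {
   const struct display *dpy;
};

/* Checks the requested API against the display of the current context.
 * Without a current context there is nothing to check against, so the
 * request is rejected.
 */
static bool
api_supported_by_current(const struct context *current,
                         enum client_api_bit api)
{
   if (current == NULL || current->dpy == NULL)
      return false;

   return (current->dpy->client_apis & api) != 0;
}
```

Because the lookup goes through the current context each time, nothing global
points at per-display storage, and two displays with different supported APIs
give different answers.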

>
> > @@ -69,7 +70,26 @@ struct _egl_thread_info
> >  static inline EGLBoolean
> >  _eglIsApiValid(EGLenum api)
> >  {
> > -   return (api >= _EGL_API_FIRST_API && api <= _EGL_API_LAST_API);
> > +   char *api_string;
> > +   switch (api) {
> > +  case EGL_OPENGL_API:
> > + api_string = "OpenGL";
> > + break;
> > +  case EGL_OPENGL_ES_API:
> > + api_string = "OpenGL_ES";
> > + break;
> > +  case EGL_OPENVG_API:
> > + api_string = "OpenVG";
> > + break;
> > +  default:
> > + return EGL_FALSE;
> > +  break;
> > +   }
> > +
> > +   if (strstr(api_string, _eglGlobal.ClientAPIsString))
> > +  return EGL_TRUE;
> > +   else
> > +  return EGL_FALSE;
>
> This is trivially broken: it returns TRUE if a display only supports
> OpenGL ES, but you request to bind OpenGL.
>
>
Thank you for catching this! Silly mistake on my part.


> Cheers,
> Daniel
>

Re: [Mesa-dev] [PATCH] egl: Check if API is supported when using eglBindAPI.

2016-05-13 Thread Daniel Stone

Hi,

On 13 May 2016 at 17:03, Plamena Manolova  wrote:
> @@ -444,6 +444,8 @@ _eglCreateAPIsString(_EGLDisplay *dpy)
>strcat(dpy->ClientAPIsString, "OpenVG ");
>
> assert(strlen(dpy->ClientAPIsString) < sizeof(dpy->ClientAPIsString));
> +
> +   _eglGlobal.ClientAPIsString = dpy->ClientAPIsString;

What happens when the display is destroyed and this is freed? Or when
different displays have different supported APIs?

> @@ -69,7 +70,26 @@ struct _egl_thread_info
>  static inline EGLBoolean
>  _eglIsApiValid(EGLenum api)
>  {
> -   return (api >= _EGL_API_FIRST_API && api <= _EGL_API_LAST_API);
> +   char *api_string;
> +   switch (api) {
> +  case EGL_OPENGL_API:
> + api_string = "OpenGL";
> + break;
> +  case EGL_OPENGL_ES_API:
> + api_string = "OpenGL_ES";
> + break;
> +  case EGL_OPENVG_API:
> + api_string = "OpenVG";
> + break;
> +  default:
> + return EGL_FALSE;
> +  break;
> +   }
> +
> +   if (strstr(api_string, _eglGlobal.ClientAPIsString))
> +  return EGL_TRUE;
> +   else
> +  return EGL_FALSE;

This is trivially broken: it returns TRUE if a display only supports
OpenGL ES, but you request to bind OpenGL.

Cheers,
Daniel
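
The substring problem described above can be avoided by matching whole
space-separated tokens. The following is only a sketch: the helper name is
invented for illustration, and it assumes the API list is built the way
_eglCreateAPIsString builds it, as names separated by single spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` appears as a complete
 * space-separated token in `list` (for example "OpenGL OpenGL_ES ").
 */
static bool
api_list_contains(const char *list, const char *name)
{
   assert(list != NULL && name != NULL);

   const size_t len = strlen(name);
   if (len == 0)
      return false;

   /* Walk over every substring hit and accept it only if it is bounded
    * by the start/end of the list or by spaces on both sides.
    */
   for (const char *p = list; (p = strstr(p, name)) != NULL; p += len) {
      const bool starts_token = (p == list) || (p[-1] == ' ');
      const bool ends_token = (p[len] == '\0') || (p[len] == ' ');
      if (starts_token && ends_token)
         return true;
   }

   return false;
}
```

Note that the list is the haystack and the requested name is the needle. With
this check, a list of "OpenGL_ES " no longer matches a request for "OpenGL".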

Re: [Mesa-dev] [PATCH 0/5] ARB_internalformat_query2 support for OpenGL ES and other fixes

2016-05-13 Thread Alejandro Piñeiro

On 13/05/16 17:06, Ilia Mirkin wrote:
> On Fri, May 13, 2016 at 10:57 AM, Alejandro Piñeiro
>  wrote:
>> Earlier this year the support for ARB_internalformat_query2 has landed
>> [1][2], initially only for desktop GL.
>>
>> But looking more carefully to the spec [3], we found the following:
>>
>> "Dependencies
>>
>>  OpenGL 2.0 or OpenGL ES 2.0 is required"
>>
>> Note the *or*. Additionally the spec list other GL ES 2.0/3.0
>> dependencies. So that means that the extension can be also applied to
>> GL ES 2.0/3.0. FWIW, this mistake is common, as it also happens with
>> the khronos registry xml (khronos bug created [4]).
> Are you sure it's not a mistake the other way? There's no ES extension
> number allocated, and no vendor drivers expose this ext on ES, and
> this would be the first GL_ARB_* ext to be exposed in ES... normally
> these become GL_OES_bla or GL_KHR_bla.

Yes, initially I also found it strange that this would be the only ARB_
extension exposed in ES. But there are too many OpenGL ES
references and dependencies in ARB_internalformat_query2 for it to be
considered a mistake the other way. Here is a detailed list of the
references to OpenGL ES:

* In the list of dependencies at the beginning (line 42), it lists the
following extensions:
   * OES_texture_3D (line 51)
   * OES_depth_texture (line 60)

* Then it gives further details about the OpenGL ES dependencies inside
the document:
   * Dependencies on OpenGL ES 2.0 (line 850): lists valid targets/pnames
for OpenGL ES 2.0
   * Dependencies on OES_texture_3D (line 876)
   * Dependencies on OpenGL ES 3.0 (line 879): lists valid targets/pnames
for OpenGL ES 3.0
   * Dependencies on ARB_depth_texture and OES_depth_texture (line 1063)

* Additionally, OpenGL ES is mentioned on the Overview. Quoting (line 89)

"While much of this information can be determined for a single GL version
by careful examination of the specification, support for many of these
properties has been gradually introduced over a number of API
revisions. This can observed when considering the range in functionality
between the various versions of GL 2, 3, and 4, as well as GL ES 2 and 3.

In the case of an application which wishes to be scalable and able to run
on a variety of possible GL or GL ES versions without being specifically
tailored for each version, it must either have knowledge of the
specifications built up into either the code or tables, or it must do
a number of tests on startup to determine which capabilities are
present."

From all those references, my conclusion is that this extension should
also be supported on OpenGL ES.

About the extension number: good point, I hadn't noticed that. For
EXT_ extensions that are supported on both OpenGL and OpenGL ES (like
EXT_texture_sRGB_decode), it is true that they define two extension numbers,
one for OpenGL and another for OpenGL ES. I'm not sure whether that is
needed for ARB extensions too. In any case, if it is needed, I'm still
convinced that query2 was intended to be available on OpenGL ES, so this
would be another minor bug in the extension, similar to the bug I opened
for the registry gl.xml file, and an extension number for OpenGL ES
would be needed.

BR


[Mesa-dev] [PATCH] egl: Check if API is supported when using eglBindAPI.

2016-05-13 Thread Plamena Manolova

According to the EGL specification, we must check whether an API is
supported before binding it. If it is not, eglBindAPI
should return EGL_FALSE and generate an EGL_BAD_PARAMETER error.

Signed-off-by: Plamena Manolova 
---
 src/egl/main/eglapi.c |  2 ++
 src/egl/main/eglcurrent.h | 22 +-
 src/egl/main/eglglobals.h |  1 +
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index eb612c0..0931c3d 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -444,6 +444,8 @@ _eglCreateAPIsString(_EGLDisplay *dpy)
   strcat(dpy->ClientAPIsString, "OpenVG ");
 
assert(strlen(dpy->ClientAPIsString) < sizeof(dpy->ClientAPIsString));
+
+   _eglGlobal.ClientAPIsString = dpy->ClientAPIsString;
 }
 
 static void
diff --git a/src/egl/main/eglcurrent.h b/src/egl/main/eglcurrent.h
index 1e386ac..0e3bd56 100644
--- a/src/egl/main/eglcurrent.h
+++ b/src/egl/main/eglcurrent.h
@@ -32,6 +32,7 @@
 #include "c99_compat.h"
 
 #include "egltypedefs.h"
+#include "eglglobals.h"
 
 
 #ifdef __cplusplus
@@ -69,7 +70,26 @@ struct _egl_thread_info
 static inline EGLBoolean
 _eglIsApiValid(EGLenum api)
 {
-   return (api >= _EGL_API_FIRST_API && api <= _EGL_API_LAST_API);
+   char *api_string;
+   switch (api) {
+  case EGL_OPENGL_API:
+ api_string = "OpenGL";
+ break;
+  case EGL_OPENGL_ES_API:
+ api_string = "OpenGL_ES";
+ break;
+  case EGL_OPENVG_API:
+ api_string = "OpenVG";
+ break;
+  default:
+ return EGL_FALSE;
+  break;
+   }
+
+   if (strstr(api_string, _eglGlobal.ClientAPIsString))
+  return EGL_TRUE;
+   else
+  return EGL_FALSE;
 }
 
 
diff --git a/src/egl/main/eglglobals.h b/src/egl/main/eglglobals.h
index ae1b75b..b770d4b 100644
--- a/src/egl/main/eglglobals.h
+++ b/src/egl/main/eglglobals.h
@@ -51,6 +51,7 @@ struct _egl_global
void (*AtExitCalls[10])(void);
 
const char *ClientExtensionString;
+   const char *ClientAPIsString;
 };
 
 
-- 
2.7.4


Re: [Mesa-dev] ARB_cull_distance support v4?

2016-05-13 Thread Roland Scheidegger

Am 13.05.2016 um 06:41 schrieb Dave Airlie:
> This is just the core patches, as I think the lowering was pretty
> broken in the last couple of reposts.
> 
> The lowering now lowers to one array of 8 or whatever. I need
> to recheck the gallium and llvmpipe bits on top of this, as I think
> llvmpipe will be broken.
Maybe. draw expects separate clip and cull dists, each packed as vec4s
(it could probably handle up to 2 vec4 for each).

> 
> I think I'm going to rip out the CULLDIST semantic from gallium,
> it really isn't what the hw wants.
> 

I can't really see what the output is going to look like from your
change, but there are reasons things are the way they are. This is, of
course, all inspired by d3d10 (this even predates the gl cull dist
extension) - d3d10 has these weirdo packed vec4s. The problem is, in
d3d10, you can have a vec4 output declared, with the x component being an
ordinary output, yz being a clipdist, and w being a cull dist. But in
gallium, we can't really have different semantics per output - hence
clip and cull must be in different outputs (and nothing else can be
packed into the same vars).
A single array each for clip and cull dist probably would have been
cleaner, but we didn't have input/output arrays for system values
either at that time, so gallium's design is something which looks like
neither what gl, d3d10 nor probably hw wants, but was simple enough to
translate and worked.

Roland


Re: [Mesa-dev] [PATCH 0/5] ARB_internalformat_query2 support for OpenGL ES and other fixes

2016-05-13 Thread Ilia Mirkin

On Fri, May 13, 2016 at 10:57 AM, Alejandro Piñeiro
 wrote:
> Earlier this year the support for ARB_internalformat_query2 has landed
> [1][2], initially only for desktop GL.
>
> But looking more carefully to the spec [3], we found the following:
>
> "Dependencies
>
>  OpenGL 2.0 or OpenGL ES 2.0 is required"
>
> Note the *or*. Additionally the spec list other GL ES 2.0/3.0
> dependencies. So that means that the extension can be also applied to
> GL ES 2.0/3.0. FWIW, this mistake is common, as it also happens with
> the khronos registry xml (khronos bug created [4]).

Are you sure it's not a mistake the other way? There's no ES extension
number allocated, and no vendor drivers expose this ext on ES, and
this would be the first GL_ARB_* ext to be exposed in ES... normally
these become GL_OES_bla or GL_KHR_bla.

  -ilia

>
> Fortunately, when the extension was initially implemented, we already
> took into account most of the GL ES dependencies defined at the spec,
> so we don't need a lot of changes on mesa now. There are more on the piglit
> tests (I will send a series for piglit in short).
>
> So this series include two patches that provides the support of this
> extension in OpenGL ES:
>
>  * [PATCH 4/5] mesa/glformats: add desktop gl checks on _mesa_base_tex_format
>  * [PATCH 5/5] mesa/main: expose ARB_internalformat_query2 on ES2.
>
> The other three patches are not related with OpenGL ES, but I think
> that it is better/tidier to keep all the unreviewed patches for
> ARB_internalformat_query2 on the same series. Two of those three were
> sent at the beginning of the month [5] (so it is technically a
> re-send).
>
> As mentioned, I will send in short a equivalent series for piglit. It
> is worth to mention that with this series there will be two deqp tests
> that will start to fail:
>   * deqp-gles3@functional@negative_api@state@get_internalformativ
>   * 
> deqp-gles31@functional@debug@negative_coverage@get_error@state@get_internalformativ
>
> And two warnings:
>   * 
> deqp-gles31@functional@debug@negative_coverage@callbacks@state@get_internalformativ
>   * 
> deqp-gles31@functional@debug@negative_coverage@log@state@get_internalformativ
>
> This is caused because those tests are checking that
> GetInternalformativ returns error for some pname/target/internalformat
> that were wrong with query1 but are not anymore with query2. I
> provided patches to solve this problem [6][7]
>
> Best regards
>
> [1] https://lists.freedesktop.org/archives/mesa-dev/2016-February/106397.html
> [2] https://lists.freedesktop.org/archives/mesa-dev/2016-March/108956.html
> [3] https://www.opengl.org/registry/specs/ARB/internalformat_query2.txt
> [4] https://www.khronos.org/bugzilla/show_bug.cgi?id=1496
> [5] https://lists.freedesktop.org/archives/mesa-dev/2016-May/115736.html
> [6] https://android-review.googlesource.com/#/c/229484/
> [7] https://android-review.googlesource.com/#/c/229485/
>
> Alejandro Piñeiro (5):
>   i965/formatquery: remove INTERNALFORMAT_PREFERRED implementation
>   mesa/formatquery: add a comment to clarify INTERNALFORMAT_PREFERRED
>   mesa/formatquery: expand NUM_SAMPLE_COUNTS OpenGL ES comment
>   mesa/glformats: add desktop gl checks on _mesa_base_tex_format
>   mesa/main: expose ARB_internalformat_query2 on ES2.
>
>  src/mapi/glapi/gen/ARB_internalformat_query2.xml |  2 +-
>  src/mesa/drivers/dri/i965/brw_formatquery.c  | 71 --
>  src/mesa/main/extensions_table.h |  2 +-
>  src/mesa/main/formatquery.c  |  8 ++-
>  src/mesa/main/glformats.c| 76 
> +++-
>  5 files changed, 71 insertions(+), 88 deletions(-)
>
> --
> 2.7.4
>

Re: [Mesa-dev] [PATCH] glsl: make sure that interpolateAt arguments are variables

2016-05-13 Thread Eduardo Lima Mitev

Patch is:

Reviewed-by: Eduardo Lima Mitev 

On 05/13/2016 05:55 AM, Ilia Mirkin wrote:
> In the case of a constant, it might have been propagated through and
> variable_referenced() returns NULL. Error out in that case.
> 
> Fixes 3 dEQP tests:
> 
> dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.negative.interpolate_constant
> dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.interpolate_constant
> dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.negative.interpolate_constant
> 
> Signed-off-by: Ilia Mirkin 
> ---
>  src/compiler/glsl/ast_function.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/compiler/glsl/ast_function.cpp 
> b/src/compiler/glsl/ast_function.cpp
> index 37fb3e79..4db3dd0 100644
> --- a/src/compiler/glsl/ast_function.cpp
> +++ b/src/compiler/glsl/ast_function.cpp
> @@ -209,7 +209,7 @@ verify_parameter_modes(_mesa_glsl_parse_state *state,
>/* Verify that shader_in parameters are shader inputs */
>if (formal->data.must_be_shader_input) {
>   ir_variable *var = actual->variable_referenced();
> - if (var && var->data.mode != ir_var_shader_in) {
> + if (!var || var->data.mode != ir_var_shader_in) {
>  _mesa_glsl_error(&loc, state,
>   "parameter `%s` must be a shader input",
>   formal->name);
> 
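
For readers skimming the one-line change, here is a small sketch of how the
two conditions differ when the argument has no backing variable. The types and
function names are invented for illustration and are much simpler than the
real IR classes; only the shape of the condition is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the IR types. */
enum var_mode { VAR_MODE_TEMPORARY, VAR_MODE_SHADER_IN };

struct variable {
   enum var_mode mode;
};

/* Old condition: an argument with no backing variable (var == NULL,
 * e.g. a propagated constant) is never reported as an error.
 */
static bool
is_error_old(const struct variable *var)
{
   return var && var->mode != VAR_MODE_SHADER_IN;
}

/* New condition: a missing variable is rejected as well. */
static bool
is_error_new(const struct variable *var)
{
   return !var || var->mode != VAR_MODE_SHADER_IN;
}
```

The two conditions agree for every non-NULL variable; they only differ for
NULL, which is exactly the constant-argument case the negative tests exercise.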


[Mesa-dev] [PATCH 5/5] mesa/main: expose ARB_internalformat_query2 on ES2.

2016-05-13 Thread Alejandro Piñeiro

From the ARB_internalformat_query2 spec:
"Dependencies
 OpenGL 2.0 or OpenGL ES 2.0 is required."

Additionally there are other mentions to ES 2.0 and 3.0 on the spec.

Acked-by: Eduardo Lima 
Acked-by: Antia Puentes 
---
 src/mapi/glapi/gen/ARB_internalformat_query2.xml | 2 +-
 src/mesa/main/extensions_table.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mapi/glapi/gen/ARB_internalformat_query2.xml 
b/src/mapi/glapi/gen/ARB_internalformat_query2.xml
index 9b0f320..073c14f 100644
--- a/src/mapi/glapi/gen/ARB_internalformat_query2.xml
+++ b/src/mapi/glapi/gen/ARB_internalformat_query2.xml
@@ -107,7 +107,7 @@
 
 
 
-
+
 
 
 
diff --git a/src/mesa/main/extensions_table.h b/src/mesa/main/extensions_table.h
index 18a5505..b07fca8 100644
--- a/src/mesa/main/extensions_table.h
+++ b/src/mesa/main/extensions_table.h
@@ -75,7 +75,7 @@ EXT(ARB_half_float_vertex   , 
ARB_half_float_vertex
 EXT(ARB_indirect_parameters , ARB_indirect_parameters  
  ,  x , GLC,  x ,  x , 2013)
 EXT(ARB_instanced_arrays, ARB_instanced_arrays 
  , GLL, GLC,  x ,  x , 2008)
 EXT(ARB_internalformat_query, ARB_internalformat_query 
  , GLL, GLC,  x ,  x , 2011)
-EXT(ARB_internalformat_query2   , ARB_internalformat_query2
  , GLL, GLC,  x ,  x , 2013)
+EXT(ARB_internalformat_query2   , ARB_internalformat_query2
  , GLL, GLC,  x , ES2, 2013)
 EXT(ARB_invalidate_subdata  , dummy_true   
  , GLL, GLC,  x ,  x , 2012)
 EXT(ARB_map_buffer_alignment, dummy_true   
  , GLL, GLC,  x ,  x , 2011)
 EXT(ARB_map_buffer_range, ARB_map_buffer_range 
  , GLL, GLC,  x ,  x , 2008)
-- 
2.7.4


[Mesa-dev] [PATCH 4/5] mesa/glformats: add desktop gl checks on _mesa_base_tex_format

2016-05-13 Thread Alejandro Piñeiro

Several internalformats are not supported on GL ES, so
_mesa_base_tex_format should return -1 for them in that case. This is
needed for the ARB_internalformat_query2 implementation to decide correctly
whether a resource is supported on OpenGL ES.

FWIW, in some cases _mesa_base_fbo_format has equivalent checks
for those internalformats, although in this function they are implemented
as a check/break in most cases, to keep consistency within the function.

Acked-by: Eduardo Lima 
Acked-by: Antia Puentes 
---
 src/mesa/main/glformats.c | 76 ++-
 1 file changed, 62 insertions(+), 14 deletions(-)

diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
index 24ce7b0..26644ec 100644
--- a/src/mesa/main/glformats.c
+++ b/src/mesa/main/glformats.c
@@ -2293,25 +2293,28 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
GLint internalFormat)
case 3:
   return (ctx->API != API_OPENGL_CORE) ? GL_RGB : -1;
case GL_RGB:
+   case GL_RGB8:
+  return GL_RGB;
case GL_R3_G3_B2:
case GL_RGB4:
case GL_RGB5:
-   case GL_RGB8:
case GL_RGB10:
case GL_RGB12:
case GL_RGB16:
-  return GL_RGB;
+  return _mesa_is_desktop_gl(ctx) ? GL_RGB : -1;
case 4:
   return (ctx->API != API_OPENGL_CORE) ? GL_RGBA : -1;
case GL_RGBA:
-   case GL_RGBA2:
case GL_RGBA4:
case GL_RGB5_A1:
case GL_RGBA8:
-   case GL_RGB10_A2:
+  return GL_RGBA;
+   case GL_RGBA2:
case GL_RGBA12:
case GL_RGBA16:
-  return GL_RGBA;
+  return _mesa_is_desktop_gl(ctx) ? GL_RGBA : -1;
+   case GL_RGB10_A2:
+  return _mesa_is_desktop_gl(ctx) || _mesa_is_gles3(ctx) ? GL_RGBA : -1;
default:
   ; /* fallthrough */
}
@@ -2341,7 +2344,10 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
GLint internalFormat)
   case GL_DEPTH_COMPONENT:
   case GL_DEPTH_COMPONENT16:
   case GL_DEPTH_COMPONENT24:
+ return GL_DEPTH_COMPONENT;
   case GL_DEPTH_COMPONENT32:
+ if (!_mesa_is_desktop_gl(ctx))
+break;
  return GL_DEPTH_COMPONENT;
   case GL_DEPTH_STENCIL:
   case GL_DEPTH24_STENCIL8:
@@ -2374,8 +2380,12 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
GLint internalFormat)
case GL_COMPRESSED_INTENSITY:
   return GL_INTENSITY;
case GL_COMPRESSED_RGB:
+  if (!_mesa_is_desktop_gl(ctx))
+ break;
   return GL_RGB;
case GL_COMPRESSED_RGBA:
+  if (!_mesa_is_desktop_gl(ctx))
+ break;
   return GL_RGBA;
default:
   ; /* fallthrough */
@@ -2426,37 +2436,57 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
GLint internalFormat)
 
if (ctx->Extensions.EXT_texture_snorm) {
   switch (internalFormat) {
-  case GL_RED_SNORM:
   case GL_R8_SNORM:
+ return GL_RED;
+  case GL_RED_SNORM:
   case GL_R16_SNORM:
+ if (!_mesa_is_desktop_gl(ctx))
+break;
  return GL_RED;
-  case GL_RG_SNORM:
   case GL_RG8_SNORM:
+ return GL_RG;
+  case GL_RG_SNORM:
   case GL_RG16_SNORM:
+ if (!_mesa_is_desktop_gl(ctx))
+break;
  return GL_RG;
-  case GL_RGB_SNORM:
   case GL_RGB8_SNORM:
+ return GL_RGB;
+  case GL_RGB_SNORM:
   case GL_RGB16_SNORM:
+ if (!_mesa_is_desktop_gl(ctx))
+break;
  return GL_RGB;
-  case GL_RGBA_SNORM:
   case GL_RGBA8_SNORM:
+ return GL_RGBA;
+  case GL_RGBA_SNORM:
   case GL_RGBA16_SNORM:
+ if (!_mesa_is_desktop_gl(ctx))
+break;
  return GL_RGBA;
   case GL_ALPHA_SNORM:
   case GL_ALPHA8_SNORM:
   case GL_ALPHA16_SNORM:
+ if (!_mesa_is_desktop_gl(ctx))
+break;
  return GL_ALPHA;
   case GL_LUMINANCE_SNORM:
   case GL_LUMINANCE8_SNORM:
   case GL_LUMINANCE16_SNORM:
+ if (!_mesa_is_desktop_gl(ctx))
+break;
  return GL_LUMINANCE;
   case GL_LUMINANCE_ALPHA_SNORM:
   case GL_LUMINANCE8_ALPHA8_SNORM:
   case GL_LUMINANCE16_ALPHA16_SNORM:
+ if (!_mesa_is_desktop_gl(ctx))
+break;
  return GL_LUMINANCE_ALPHA;
   case GL_INTENSITY_SNORM:
   case GL_INTENSITY8_SNORM:
   case GL_INTENSITY16_SNORM:
+ if (!_mesa_is_desktop_gl(ctx))
+break;
  return GL_INTENSITY;
   default:
  ; /* fallthrough */
@@ -2465,21 +2495,31 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
GLint internalFormat)
 
if (ctx->Extensions.EXT_texture_sRGB) {
   switch (internalFormat) {
-  case GL_SRGB_EXT:
   case GL_SRGB8_EXT:
+ return GL_RGB;
+  case GL_SRGB_EXT:
   case GL_COMPRESSED_SRGB_EXT:
+ if (!_mesa_is_desktop_gl(ctx))
+break;
  return GL_RGB;
-  case GL_SRGB_ALPHA_EXT:
   case GL_SRGB8_ALPHA8_EXT:
+ return GL_RGBA;
+  case

[Mesa-dev] [PATCH 0/5] ARB_internalformat_query2 support for OpenGL ES and other fixes

2016-05-13 Thread Alejandro Piñeiro

Earlier this year, support for ARB_internalformat_query2 landed
[1][2], initially only for desktop GL.

But looking more carefully at the spec [3], we found the following:

"Dependencies

 OpenGL 2.0 or OpenGL ES 2.0 is required"

Note the *or*. Additionally, the spec lists other GL ES 2.0/3.0
dependencies. That means that the extension can also be applied to
GL ES 2.0/3.0. FWIW, this mistake is common, as it also happens in
the Khronos registry XML (Khronos bug created [4]).

Fortunately, when the extension was initially implemented, we already
took into account most of the GL ES dependencies defined in the spec,
so we don't need a lot of changes in mesa now. More changes are needed in
the piglit tests (I will send a series for piglit shortly).

This series includes two patches that provide support for this
extension in OpenGL ES:

 * [PATCH 4/5] mesa/glformats: add desktop gl checks on _mesa_base_tex_format
 * [PATCH 5/5] mesa/main: expose ARB_internalformat_query2 on ES2.

The other three patches are not related to OpenGL ES, but I think
that it is better/tidier to keep all the unreviewed patches for
ARB_internalformat_query2 in the same series. Two of those three were
sent at the beginning of the month [5] (so this is technically a
re-send).

As mentioned, I will shortly send an equivalent series for piglit. It
is worth mentioning that with this series two deqp tests
will start to fail:
  * deqp-gles3@functional@negative_api@state@get_internalformativ
  * deqp-gles31@functional@debug@negative_coverage@get_error@state@get_internalformativ

And two warnings:
  * deqp-gles31@functional@debug@negative_coverage@callbacks@state@get_internalformativ
  * deqp-gles31@functional@debug@negative_coverage@log@state@get_internalformativ

This happens because those tests check that
GetInternalformativ returns an error for some pname/target/internalformat
combinations that were invalid with query1 but are valid with query2. I
provided patches to solve this problem [6][7].

Best regards

[1] https://lists.freedesktop.org/archives/mesa-dev/2016-February/106397.html
[2] https://lists.freedesktop.org/archives/mesa-dev/2016-March/108956.html
[3] https://www.opengl.org/registry/specs/ARB/internalformat_query2.txt
[4] https://www.khronos.org/bugzilla/show_bug.cgi?id=1496
[5] https://lists.freedesktop.org/archives/mesa-dev/2016-May/115736.html
[6] https://android-review.googlesource.com/#/c/229484/
[7] https://android-review.googlesource.com/#/c/229485/

Alejandro Piñeiro (5):
  i965/formatquery: remove INTERNALFORMAT_PREFERRED implementation
  mesa/formatquery: add a comment to clarify INTERNALFORMAT_PREFERRED
  mesa/formatquery: expand NUM_SAMPLE_COUNTS OpenGL ES comment
  mesa/glformats: add desktop gl checks on _mesa_base_tex_format
  mesa/main: expose ARB_internalformat_query2 on ES2.

 src/mapi/glapi/gen/ARB_internalformat_query2.xml |  2 +-
 src/mesa/drivers/dri/i965/brw_formatquery.c  | 71 --
 src/mesa/main/extensions_table.h |  2 +-
 src/mesa/main/formatquery.c  |  8 ++-
 src/mesa/main/glformats.c| 76 +++-
 5 files changed, 71 insertions(+), 88 deletions(-)

-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/5] mesa/formatquery: expand NUM_SAMPLE_COUNTS OpenGL ES comment

2016-05-13 Thread Alejandro Piñeiro

For ES 3.0, the spec says that NUM_SAMPLE_COUNTS will always be zero
for some formats. On ES 3.1 it can be non-zero for those formats.

The current code correctly checks against exactly version 3.0, but the
comment only mentions the 3.0 spec. It is clearer to mention both.

Acked-by: Eduardo Lima 
Acked-by: Antia Puentes 
---
 src/mesa/main/formatquery.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/main/formatquery.c b/src/mesa/main/formatquery.c
index 1f21d17..1dfec5c 100644
--- a/src/mesa/main/formatquery.c
+++ b/src/mesa/main/formatquery.c
@@ -877,6 +877,9 @@ _mesa_GetInternalformativ(GLenum target, GLenum internalformat, GLenum pname,
* "Since multisampling is not supported for signed and unsigned
* integer internal formats, the value of NUM_SAMPLE_COUNTS will be
* zero for such formats.
+   *
+   * But that is not true for GL ES 3.1. This is the reason why we are
+   * checking against exactly version 30, instead of using _mesa_is_gles3.
*/
   if (pname == GL_NUM_SAMPLE_COUNTS && ctx->API == API_OPENGLES2 &&
   ctx->Version == 30 && _mesa_is_enum_format_integer(internalformat)) {
-- 
2.7.4
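
As a side note, the difference between the exact version check and an
"any ES 3.x" check can be sketched as below. This is only a minimal
illustration: ctx_sketch, api_sketch and both helper names are made up
for the example and are not Mesa's real types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the context fields used by the check. */
enum api_sketch { API_SKETCH_GL_CORE, API_SKETCH_GLES2 };

struct ctx_sketch {
   enum api_sketch api;
   unsigned version;   /* 30 for ES 3.0, 31 for ES 3.1, ... */
};

/* "Any ES 3.x" style check: true for 3.0, 3.1 and later. */
bool
is_gles3_sketch(const struct ctx_sketch *ctx)
{
   assert(ctx != NULL);
   return ctx->api == API_SKETCH_GLES2 && ctx->version >= 30;
}

/* Exact check: only ES 3.0 forces a zero sample count for integer formats. */
bool
forces_zero_sample_counts(const struct ctx_sketch *ctx, bool integer_format)
{
   assert(ctx != NULL);
   return ctx->api == API_SKETCH_GLES2 && ctx->version == 30 &&
          integer_format;
}
```

With an ES 3.1 context both helpers disagree, which is why the broader
check would be wrong here.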


[Mesa-dev] [PATCH 2/5] mesa/formatquery: add a comment to clarify INTERNALFORMAT_PREFERRED

2016-05-13 Thread Alejandro Piñeiro

The comment clarifies that the driver is called only to try to get
a preferred internalformat, and that it was already checked if the
format is supported or not.

Acked-by: Eduardo Lima 
Acked-by: Antia Puentes 
---
 src/mesa/main/formatquery.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/formatquery.c b/src/mesa/main/formatquery.c
index 215c14f..1f21d17 100644
--- a/src/mesa/main/formatquery.c
+++ b/src/mesa/main/formatquery.c
@@ -902,7 +902,10 @@ _mesa_GetInternalformativ(GLenum target, GLenum internalformat, GLenum pname,
* format for representing resources of the specified <internalformat> is
* returned in <params>.
*
-   * Therefore, we let the driver answer.
+   * Therefore, we let the driver answer. Note that if we reach this
+   * point, it means that the internalformat is supported, so the driver
+   * is called just to try to get a preferred format. If not supported,
+   * GL_NONE was already returned and the driver is not called.
*/
   ctx->Driver.QueryInternalFormat(ctx, target, internalformat, pname,
   buffer);
-- 
2.7.4


[Mesa-dev] [PATCH 1/5] i965/formatquery: remove INTERNALFORMAT_PREFERRED implementation

2016-05-13 Thread Alejandro Piñeiro

Right now the implementation only checks whether the internalformat is
supported. But that check is wrong, returning unsupported for some
internalformats. Additionally, mesa/main already checks whether the
internalformat is supported before calling the driver hook, so this
extra check is not needed.

Acked-by: Eduardo Lima 
Acked-by: Antia Puentes 
---
 src/mesa/drivers/dri/i965/brw_formatquery.c | 71 -
 1 file changed, 71 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_formatquery.c b/src/mesa/drivers/dri/i965/brw_formatquery.c
index 210109b..8f7a910 100644
--- a/src/mesa/drivers/dri/i965/brw_formatquery.c
+++ b/src/mesa/drivers/dri/i965/brw_formatquery.c
@@ -65,46 +65,6 @@ brw_query_samples_for_format(struct gl_context *ctx, GLenum target,
}
 }
 
-/**
- * Returns a generic GL type from an internal format, so that it can be used
- * together with the base format to obtain a mesa_format by calling
- * mesa_format_from_format_and_type().
- */
-static GLenum
-get_generic_type_for_internal_format(GLenum internalFormat)
-{
-   if (_mesa_is_color_format(internalFormat)) {
-  if (_mesa_is_enum_format_unsigned_int(internalFormat))
- return GL_UNSIGNED_BYTE;
-  else if (_mesa_is_enum_format_signed_int(internalFormat))
- return GL_BYTE;
-   } else {
-  switch (internalFormat) {
-  case GL_STENCIL_INDEX:
-  case GL_STENCIL_INDEX8:
- return GL_UNSIGNED_BYTE;
-  case GL_DEPTH_COMPONENT:
-  case GL_DEPTH_COMPONENT16:
- return GL_UNSIGNED_SHORT;
-  case GL_DEPTH_COMPONENT24:
-  case GL_DEPTH_COMPONENT32:
- return GL_UNSIGNED_INT;
-  case GL_DEPTH_COMPONENT32F:
- return GL_FLOAT;
-  case GL_DEPTH_STENCIL:
-  case GL_DEPTH24_STENCIL8:
- return GL_UNSIGNED_INT_24_8;
-  case GL_DEPTH32F_STENCIL8:
- return GL_FLOAT_32_UNSIGNED_INT_24_8_REV;
-  default:
- /* fall-through */
- break;
-  }
-   }
-
-   return GL_FLOAT;
-}
-
 void
 brw_query_internal_format(struct gl_context *ctx, GLenum target,
   GLenum internalFormat, GLenum pname, GLint *params)
@@ -129,37 +89,6 @@ brw_query_internal_format(struct gl_context *ctx, GLenum target,
   break;
}
 
-   case GL_INTERNALFORMAT_PREFERRED: {
-  params[0] = GL_NONE;
-
-  /* We need to resolve an internal format that is compatible with
-   * the passed internal format, and optimal to the driver. By now,
-   * we just validate that the passed internal format is supported by
-   * the driver, and if so return the same internal format, otherwise
-   * return GL_NONE.
-   *
-   * For validating the internal format, we use the
-   * ctx->TextureFormatSupported map to check that a BRW surface format
-   * exists, that can be derived from the internal format. But this
-   * expects a mesa_format, not an internal format. So we need to "come up"
-   * with a type that is generic enough, to resolve the mesa_format first.
-   */
-  GLenum type = get_generic_type_for_internal_format(internalFormat);
-
-  /* Get a mesa_format from the internal format and type. */
-  GLint base_format = _mesa_base_tex_format(ctx, internalFormat);
-  if (base_format != -1) {
- mesa_format mesa_format =
-_mesa_format_from_format_and_type(base_format, type);
-
- if (mesa_format < MESA_FORMAT_COUNT &&
- ctx->TextureFormatSupported[mesa_format]) {
-params[0] = internalFormat;
- }
-  }
-  break;
-   }
-
default:
   /* By default, we call the driver hook's fallback function from the frontend,
* which has generic implementation for all pnames.
-- 
2.7.4


[Mesa-dev] [PATCH 2/2] i965: initialize the alignment related bits in struct brw_reg

2016-05-13 Thread Samuel Iglesias Gonsálvez

With the inclusion of the "df" field in the union, the union is placed
at offset 8 because of the alignment rules. The padding bits in between
are left uninitialized, and valgrind complains with errors similar to
this:

==10298== Conditional jump or move depends on uninitialised value(s)
==10298==    at 0x4C31D52: __memcmp_sse4_1 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10298==    by 0xAB16663: backend_reg::equals(backend_reg const&) const (brw_shader.cpp:690)
==10298==    by 0xAAB629D: fs_reg::equals(fs_reg&) const (brw_fs.cpp:456)
==10298==    by 0xAAD4452: operands_match(fs_inst*, fs_inst*, bool*) (brw_fs_cse.cpp:161)
==10298==    by 0xAAD46C3: instructions_match(fs_inst*, fs_inst*, bool*) (brw_fs_cse.cpp:187)
==10298==    by 0xAAD4BAA: fs_visitor::opt_cse_local(bblock_t*) (brw_fs_cse.cpp:251)
==10298==    by 0xAAD5216: fs_visitor::opt_cse() (brw_fs_cse.cpp:361)
==10298==    by 0xAAC8AAD: fs_visitor::optimize() (brw_fs.cpp:5401)
==10298==    by 0xAACB9DC: fs_visitor::run_fs(bool) (brw_fs.cpp:5803)
==10298==    by 0xAACC38B: brw_compile_fs (brw_fs.cpp:6029)
==10298==    by 0xAA39796: brw_codegen_wm_prog (brw_wm.c:137)
==10298==    by 0xAA3B068: brw_fs_precompile (brw_wm.c:637)

This patch adds an explicit padding and initializes it to zero.

Signed-off-by: Samuel Iglesias Gonsálvez 
---

This patch replaces the following one:

[PATCH 2/2] i965: check each field separately in backend_end::equals()

 src/mesa/drivers/dri/i965/brw_reg.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h
index 3b76d7d..ebb7f29 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -243,6 +243,9 @@ struct brw_reg {
unsigned subnr:5;  /* :1 in align16 */
unsigned nr:16;
 
+   /* IMPORTANT: adjust padding bits if you add new fields */
+   unsigned padding:32;
+
union {
   struct {
  unsigned swizzle:8;  /* src only, align16 only */
@@ -337,7 +340,7 @@ brw_reg(enum brw_reg_file file,
reg.pad0 = 0;
reg.subnr = subnr * type_sz(type);
reg.nr = nr;
-
+   reg.padding = 0;
/* Initialize all union's bits to zero before setting them. */
reg.df = 0;
 
-- 
2.5.0
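
For reference, the idea behind the explicit padding can be sketched in
plain C as below. This is a simplified model and not the real struct
brw_reg: reg_sketch, make_reg and reg_equals are hypothetical names, and
the packed bitfields are collapsed into a single 32-bit word. The point
is only that a memcmp()-based comparison is reliable once every byte of
the struct belongs to a member that the constructor initializes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct brw_reg. */
struct reg_sketch {
   uint32_t bits;      /* stands in for the packed type/file/nr bitfields */
   uint32_t padding;   /* explicit padding, zeroed by the constructor */
   union {
      double df;       /* 8-byte member that places the union at offset 8 */
      uint32_t ud;
   } imm;
};

/* With the explicit member, no implicit padding bytes should remain. */
static_assert(sizeof(struct reg_sketch) ==
              2 * sizeof(uint32_t) + sizeof(double),
              "unexpected implicit padding in reg_sketch");

struct reg_sketch
make_reg(uint32_t bits, double df)
{
   struct reg_sketch reg;

   /* Simulate stack garbage so that a forgotten field would show up. */
   memset(&reg, 0xAA, sizeof(reg));

   reg.bits = bits;
   reg.padding = 0;
   reg.imm.df = df;    /* writes all 8 bytes of the union */
   return reg;
}

/* Byte-wise comparison, in the spirit of a memcmp()-based equals(). */
bool
reg_equals(const struct reg_sketch *a, const struct reg_sketch *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}
```

Dropping the "reg.padding = 0" line in this model would leave the 0xAA
filler in place, which is the kind of uninitialized data valgrind
reports in the real code.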


Re: [Mesa-dev] ARB_cull_distance (final?) and llvmpipe support

2016-05-13 Thread Tobias Klausmann


Hi Dave,

I was not aware that you were actively working on this as well. I posted
a series 5 days ago which got some criticism and reviews [1]. The most
important points were:


1. split functional change and renaming of the lowering pass [Ian]

2. check max clip/cull array sizes in link_shaders for all stages [Ian]

3. drop culldist semantics, which you already did [Ilia]


If you are interested in changes made to satisfy 1+2, you can fetch 
patches from here: https://git.thm.de/tjkl80/mesa.git arb-cull-distance



[1] https://lists.freedesktop.org/archives/mesa-dev/2016-May/115909.html

Greetings,

Tobias


On 13.05.2016 08:14, Dave Airlie wrote:

This is hopefully the final posting for this series, I've gotten
the lowering pass to look like I wanted, which is to say it lowers
to vec4[2].

TGSI then uses the CLIPDIST semantic and the two properties to
work out what is what. This means the CULLDIST semantic is no longer
required.

So I've ripped out CULLDIST from draw, and anywhere else it was used,
and fixed draw to use the new API, as it more closely reflects how
some of the hw works.

I've also fixed the array size maximum checks, however the piglit
test expects a link error when a compile error is a valid result.

Dave.


Re: [Mesa-dev] GBM backend dynamic dispatch method

2016-05-13 Thread Daniel Vetter

On Fri, May 13, 2016 at 02:33:13PM +0800, Jammy Zhou wrote:
> 2016-05-13 14:01 GMT+08:00 Nicolai Hähnle:
> 
> > On 13.05.2016 00:22, Jammy Zhou wrote:
> >
> >> 2016-05-13 12:39 GMT+08:00 Nicolai Hähnle:
> >>
> >> On 12.05.2016 20:20, Jammy Zhou wrote:
> >>
> >> 2016-05-12 17:39 GMT+08:00 Michel Dänzer:
> >>
> >>  On 12.05.2016 17:58, Yu, Qiang wrote:
> >>  > Oh, what a crazy idea. So you mean it can work like this?
> >>  >
> >>  > 1. use the libgbm/gbm_dri/libEGL/libGLES from mesa which will load
> >>  > radeonsi_dri.so
> >>  >
> >>  > 2. libGL/amdgpu_dri.so from amdgpu-pro
> >>
> >>  glamor uses libEGL/GBM and libGL, so this could only work with Mesa's
> >>  libGL (or the GLVND one in the future). Can amdgpu_dri.so work with
> >>  Mesa's libGL right now?
> >>
> >>
> >> I think amdgpu_dri.so is not completely compatible with Mesa's libGL
> >> (considering some special feature requirements for amdgpu-pro and Mesa's
> >> evolving). Another problem is that Mesa's libgbm cannot share necessary
> >> buffer attributes (such as tiling info, etc) with amdgpu_dri.so at this
> >> moment.
> >>
> >> I think the long-term plan for such attributes is passing them via
> >> amdgpu_bo_metadata (which is defined in libdrm's amdgpu.h). This
> >> metadata is read and written directly through libdrm_amdgpu, and so
> >> libgbm doesn't have to be involved as far as I can see.
> >>
> >>
> >> Yes, amdgpu_bo_metadata is exactly the right place for this kind of
> >> information. But IMHO there are still several things to take care of.
> >> Did I miss something?
> >> - Same meta data definition ("umd_metadata" field) should be used by
> >> radeonsi and amdgpu-pro.
> >>
> >
> > I absolutely agree that we need to coordinate on how the metadata fields
> > are used.
> >
> > At this time, radeonsi uses and sets the explicit members of
> > amdgpu_bo_metadata, i.e. tiling_info and size_metadata. As far as I can
> > see, no flags have been defined - flags and umd_metadata are preserved by
> > radeonsi if a different UMD were to set them, and are otherwise initialized
> > to 0.
> >
> >
> > - We need some way to translate gbm_bo or EGLImage into amdgpu_bo, so
> >> that libdrm_amdgpu interfaces can be used.
> >>
> >
> > In general, how to do this kind of mapping depends on the situation. For
> > example, for gbm_bo it is the GBM backend that allocates the gbm_bo
> > structure, so C-style inheritance can be used. For example, the DRI backend
> > has a type:
> >
> > struct gbm_dri_bo {
> >struct gbm_drm_bo base;
> >
> >__DRIimage *image;
> >
> >/* Used for cursors and the swrast front BO */
> >uint32_t handle, size;
> >void *map;
> > };
> >
> > It will allocate a struct gbm_dri_bo, and pass a pointer to base back to
> > the caller. Then, callbacks implemented by the backend cast the provided
> > gbm_bo pointer to gbm_dri_bo. The GBM backend implementation in amdgpu-pro
> > can use the same trick, and store whatever internal info it requires in the
> > "derived" structure, e.g. an amdgpu_bo_handle.
> >
> 
> To clarify, I was talking about retrieving the meta data from gbm_bo in
> amdgpu-pro. This doesn't work if the DRI backend (radeonsi_dri.so) is used
> with mesa libgbm, which was mentioned in previous discussion.

Randomly jumping in here with a few notes. I'm digging through the entire
egl/gbm/wayland stack right now to figure out what metadata needs to be
added so that we can describe buffers faithfully and completely, including
tiling, compression and all that, in a generic way.

It will be just for color buffers (i.e. anything you might actually want
to share), because that's the only thing we need to share across
drivers/processes, but I think that's the same thing you need here. The
basic idea is to add the fb_modifiers we have added to the kernel (and
which are defined in drm_fourcc.h) to the DRIimage interfaces, both for
exporting and importing. It's always been the idea that drm_fourcc.h could
also contain fb_modifiers which are only useful for GL->GL sharing, so no
requirement to have an in-kernel user for each define. E.g. when certain
tiling layouts can't be scanned out.

That should be enough that you could talk to 2 completely different dri
drivers, and share buffers with them in a generic way.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [Mesa-dev] [PATCH v2 00/30] Finishing arb_gpu_shader_fp64 support to the i965 scalar backend

2016-05-13 Thread Iago Toral

On Thu, 2016-05-12 at 13:35 +0200, Samuel Iglesias Gonsálvez wrote:
> Hi,
> 
> this version includes all the feedback received to v1 plus a few new
> patches (22-27) that deal with 64bit URB read/writes, which was
> missing in v1. Below is a list of patches that still need to get the Rb:
> 
> [PATCH v2 02/30] i965/fs: Fix propagation of copies with strided source.
> [PATCH v2 05/30] i965/fs: Simplify and fix register offset calculation
> [PATCH v2 06/30] i965/fs: Reindent register offset calculation of
> [PATCH v2 07/30] i965/fs: fix copy propagation of partially invalidated
> [PATCH v2 11/30] i965/fs: add shuffle_32bit_load_result_to_64bit_data
> [PATCH v2 14/30] i965/fs: fix pull constant load component selection for
> [PATCH v2 18/30] i965/fs: support doubles with SSBO loads
> [PATCH v2 19/30] i965/fs: add shuffle_64bit_data_for_32bit_write helper
> [PATCH v2 20/30] i965/fs: support doubles with ssbo stores
> [PATCH v2 21/30] i965/fs: support doubles with shared variable stores
> [PATCH v2 22/30] i965/vec4: handle doubles in type_size_vec4()
> [PATCH v2 23/30] i965/fs: fix number of output components for doubles
> [PATCH v2 24/30] i965/fs: fix nir_intrinsic_store_output for doubles
> [PATCH v2 25/30] i965/tcs/scalar: fix load input for doubles
> [PATCH v2 26/30] i965/tcs/scalar: fix store output for doubles
> [PATCH v2 27/30] i965/tes/scalar: Fix load input for doubles

I've just sent a v3 for patches 19 and 21. The former gets rid of the
temporary like Curro suggested since in this case we really don't want
to do the shuffling in-place. The latter fixes a related bug where we
were doing in-place shuffling before a write which we shouldn't.

I think we have addressed all the other comments too, including moving
the shuffling functions to brw_fs_nir.cpp. I also went ahead and made
the do_untyped_vector_read helper static to brw_fs_nir.cpp (instead of a
fs_visitor method) since Curro's reasoning for the shuffling functions
applies to this helper just as much.

All these changes have been merged in our
i965-fp64-scalar-backend-part2-to-push branch for review / testing.

I think that at this point we only need the thumbs-up for those two v3
patches and see if Curro has more feedback since I believe he did not
have time to go through all the patches yet. If Curro does not find
anything major we should be able to land this tomorrow.

> There is still some ongoing discussion about where to put the
> shuffling functions but it does not make sense to postpone review of v2
> because of that, so for now we kept them in brw_fs.cpp and if we
> finally agree to move them to brw_fs_nir.cpp we will do that before
> pushing.
> 
> We have not observed any piglit regressions in ILK, SNB, IVB, HSW, BDW
> or SKL compared against master's ba3f0b6.
> 
> This series enables fp64 for gen8+ only and requires scalar GS, TCS and
> TES so these gens can do fp64 in these stages via the scalar backend,
> as the vec4 backend is not ready yet. Support to enable the scalar
> backend by default for all 3 stages has already landed in master so we
> should be all set in this regard.
> 
> As usual, a branch with the series is available for testing here:
> $ git clone -b i965-fp64-scalar-backend-part2-to-push  
> https://github.com/Igalia/mesa.git
> 
> All the new fp64 tests we wrote have also landed in piglit, except for
> patch [0]. We have a branch available with that test included here:
> 
> $ git clone -b arb_gpu_shader_fp64 https://github.com/Igalia/piglit.git
> 
> Thanks,
> 
> Sam
> 
> [0] https://lists.freedesktop.org/archives/piglit/2016-May/019761.html
> 
> Francisco Jerez (5):
>   i965/fs: Fix propagation of copies with strided source.
>   i965/fs: Simplify and fix register offset calculation of
> try_copy_propagate().
>   i965/fs: Reindent register offset calculation of try_copy_propagate().
>   i965/fs: Stop using the LOAD_PAYLOAD instruction in lower_simd_width.
>   i965/fs: Fix and document component().
> 
> Iago Toral Quiroga (25):
>   i965/fs: fix subreg_offset overflow in byte_offset()
>   i965/fs: Fix copy propagation of load payload for double operands
>   i965/fs: disallow type change in copy-propagation if types have
> different sizes
>   i965/fs: fix copy propagation of partially invalidated entries
>   i965/fs: fix copy propagation from load payload
>   i965/fs: fix copy/constant propagation regioning checks
>   i965/fs: add shuffle_32bit_load_result_to_64bit_data helper
>   i965/fs: Fix fs_visitor::VARYING_PULL_CONSTANT_LOAD for doubles
>   i965/fs: fix pull constant load component selection for doubles
>   i965/fs: support doubles with UBO loads
>   i965/fs: Add do_untyped_vector_read helper
>   i965/fs: support double with shared variable loads
>   i965/fs: support doubles with SSBO loads
>   i965/fs: add shuffle_64bit_data_for_32bit_write helper
>   i965/fs: support doubles with ssbo stores
>   i965/fs: support doubles with shared variable stores
>   i965/vec4: handle doubles in type_size_vec4()
>

[Mesa-dev] [PATCH v3] i965/fs: add shuffle_64bit_data_for_32bit_write helper

2016-05-13 Thread Iago Toral Quiroga

This does the inverse operation of shuffle_32bit_load_result_to_64bit_data
and we will use it when we need to write 64-bit data in the layout expected
by untyped write messages.

v2 (curro):
- Use subscript() instead of stride()
- Assert on the input types rather than silently retyping.
- Use offset() instead of horiz_offset(), drop the multiplier definition.
- Drop the temporary vgrf and force_writemask_all.
- Make component_i const.
- Move to brw_fs_nir.cpp

v3 (curro):
- Pass dst and src by reference.
- Simplify allocation of tmp register.
- Move to brw_fs_nir.cpp.
- Get rid of the temporary.

v3 (Iago):
- Check that the src and dst regions do not overlap, since that would
  typically be a bug in the caller.

Reviewed-by: Kenneth Graunke 
Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_fs.h   |  5 +
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 32 
 2 files changed, 37 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 3f49a64..9faeaa1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -541,3 +541,8 @@ void shuffle_32bit_load_result_to_64bit_data(const brw::fs_builder &bld,
  const fs_reg &dst,
  const fs_reg &src,
  uint32_t components);
+
+void shuffle_64bit_data_for_32bit_write(const brw::fs_builder &bld,
+const fs_reg &dst,
+const fs_reg &src,
+uint32_t components);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index ebae3c4..1820512 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -4102,3 +4102,35 @@ shuffle_32bit_load_result_to_64bit_data(const fs_builder &bld,
   bld.MOV(offset(dst, bld, i), tmp);
}
 }
+
+/**
+ * This helper does the inverse operation of
+ * SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA.
+ *
+ * We need to do this when we are going to use untyped write messages that
+ * operate with 32-bit components in order to arrange our 64-bit data to be
+ * in the expected layout.
+ *
+ * Notice that callers of this function, unlike in the case of the inverse
+ * operation, would typically need to call this with dst and src being
+ * different registers, since they would otherwise corrupt the original
+ * 64-bit data they are about to write. Because of this the function checks
+ * that the src and dst regions involved in the operation do not overlap.
+ */
+void
+shuffle_64bit_data_for_32bit_write(const fs_builder &bld,
+   const fs_reg &dst,
+   const fs_reg &src,
+   uint32_t components)
+{
+   assert(type_sz(src.type) == 8);
+   assert(type_sz(dst.type) == 4);
+
+   assert(!src.in_range(dst, 2 * components * bld.dispatch_width() / 8));
+
+   for (unsigned i = 0; i < components; i++) {
+  const fs_reg component_i = offset(src, bld, i);
+  bld.MOV(offset(dst, bld, 2 * i), subscript(component_i, dst.type, 0));
+  bld.MOV(offset(dst, bld, 2 * i + 1), subscript(component_i, dst.type, 1));
+   }
+}
-- 
1.9.1
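
To make the resulting layout easier to picture, here is a small CPU-side
model of the shuffle. It is only a sketch under simplifying assumptions:
registers are modelled as flat arrays with "width" channels per
component, and the function name and parameters are made up for the
example. It is not the i965 code, which emits MOV instructions instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model: src holds `components` 64-bit components, each `width` channels
 * wide. dst receives 2 * components 32-bit components of the same width:
 * component 2*i gets the low dwords of source component i and component
 * 2*i+1 gets the high dwords. src and dst must not overlap.
 */
void
shuffle_64bit_for_32bit_write_model(uint32_t *dst, const uint64_t *src,
                                    unsigned components, unsigned width)
{
   assert(dst != NULL && src != NULL);

   for (unsigned i = 0; i < components; i++) {
      for (unsigned ch = 0; ch < width; ch++) {
         const uint64_t value = src[i * width + ch];

         /* Low 32 bits go to the even 32-bit component. */
         dst[(2 * i) * width + ch] = (uint32_t) value;
         /* High 32 bits go to the odd 32-bit component. */
         dst[(2 * i + 1) * width + ch] = (uint32_t) (value >> 32);
      }
   }
}
```

Writing into a separate dst array mirrors the requirement in the patch
that the source and destination regions do not overlap.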


[Mesa-dev] [PATCH v3] i965/fs: support doubles with shared variable stores

2016-05-13 Thread Iago Toral Quiroga

This is pretty much the same as what we do with SSBOs.

v2: do not shuffle in-place, it is not safe since the original 64-bit data
could be used after the write, instead use a temporary like we do
for SSBO stores (Iago)

Reviewed-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 40 
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 67c1022..e535d85 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -3121,6 +3121,29 @@ fs_visitor::nir_emit_cs_intrinsic(const fs_builder &bld,
   /* Writemask */
   unsigned writemask = instr->const_index[1];
 
+  /* get_nir_src() retypes to integer. Be wary of 64-bit types though
+   * since the untyped writes below operate in units of 32-bits, which
+   * means that we need to write twice as many components each time.
+   * Also, we have to shuffle 64-bit data to be in the appropriate layout
+   * expected by our 32-bit write messages.
+   */
+  unsigned type_size = 4;
+  unsigned bit_size = instr->src[0].is_ssa ?
+ instr->src[0].ssa->bit_size : instr->src[0].reg.reg->bit_size;
+  if (bit_size == 64) {
+ type_size = 8;
+ fs_reg tmp =
+   fs_reg(VGRF, alloc.allocate(alloc.sizes[val_reg.nr]), val_reg.type);
+ shuffle_64bit_data_for_32bit_write(
+bld,
+retype(tmp, BRW_REGISTER_TYPE_F),
+retype(val_reg, BRW_REGISTER_TYPE_DF),
+instr->num_components);
+ val_reg = tmp;
+  }
+
+  unsigned type_slots = type_size / 4;
+
   /* Combine groups of consecutive enabled channels in one write
* message. We use ffs to find the first enabled channel and then ffs on
* the bit-inverse, down-shifted writemask to determine the length of
@@ -3129,22 +3152,29 @@ fs_visitor::nir_emit_cs_intrinsic(const fs_builder &bld,
   while (writemask) {
  unsigned first_component = ffs(writemask) - 1;
  unsigned length = ffs(~(writemask >> first_component)) - 1;
- fs_reg offset_reg;
 
+ /* We can't write more than 2 64-bit components at once. Limit the
+  * length of the write to what we can do and let the next iteration
+  * handle the rest
+  */
+ if (type_size > 4)
+length = MIN2(2, length);
+
+ fs_reg offset_reg;
  nir_const_value *const_offset = nir_src_as_const_value(instr->src[1]);
  if (const_offset) {
 offset_reg = brw_imm_ud(instr->const_index[0] + const_offset->u32[0] +
-4 * first_component);
+type_size * first_component);
  } else {
 offset_reg = vgrf(glsl_type::uint_type);
 bld.ADD(offset_reg,
 retype(get_nir_src(instr->src[1]), BRW_REGISTER_TYPE_UD),
-brw_imm_ud(instr->const_index[0] + 4 * first_component));
+brw_imm_ud(instr->const_index[0] + type_size * first_component));
  }
 
  emit_untyped_write(bld, surf_index, offset_reg,
-offset(val_reg, bld, first_component),
-1 /* dims */, length,
+offset(val_reg, bld, first_component * type_slots),
+1 /* dims */, length * type_slots,
 BRW_PREDICATE_NONE);
 
  /* Clear the bits in the writemask that we just wrote, then try
-- 
1.9.1


[Mesa-dev] [PATCH 2/2] i965: Flip interpolateAtOffset's y offset when necessary.

2016-05-13 Thread Kenneth Graunke

Fixes 5 dEQP-GLES31.functional.shaders.multisample_interpolation tests:
- interpolate_at_offset.no_qualifiers.default_framebuffer
- interpolate_at_offset.centroid_qualifier.default_framebuffer
- interpolate_at_offset.sample_qualifier.default_framebuffer
- interpolate_at_offset.at_sample_position.default_framebuffer
- interpolate_at_offset.array_element.default_framebuffer

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 8 ++--
 src/mesa/drivers/dri/i965/brw_wm.c   | 3 ++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 4648c58..5890750 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -2871,9 +2871,12 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder &bld,
   case nir_intrinsic_interp_var_at_offset: {
  nir_const_value *const_offset = nir_src_as_const_value(instr->src[0]);
 
+ const bool flip = !wm_key->render_to_fbo;
+
  if (const_offset) {
 unsigned off_x = MIN2((int)(const_offset->f32[0] * 16), 7) & 0xf;
-unsigned off_y = MIN2((int)(const_offset->f32[1] * 16), 7) & 0xf;
+unsigned off_y = MIN2((int)(const_offset->f32[1] * 16 *
+(flip ? -1 : 1)), 7) & 0xf;
 
 emit_pixel_interpolater_send(bld,
  FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET,
@@ -2889,7 +2892,8 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder &bld,
fs_reg temp = vgrf(glsl_type::float_type);
bld.MUL(temp, offset(offset_src, bld, i), brw_imm_f(16.0f));
fs_reg itemp = vgrf(glsl_type::int_type);
-   bld.MOV(itemp, temp);  /* float to int */
+   /* float to int */
+   bld.MOV(itemp, (i == 1 && flip) ? negate(temp) : temp);
 
/* Clamp the upper end of the range to +7/16.
 * ARB_gpu_shader5 requires that we support a maximum offset
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c
index ced9708..192e8e2 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -511,7 +511,8 @@ brw_wm_populate_key(struct brw_context *brw, struct brw_wm_prog_key *key)
   key->drawable_height = _mesa_geometric_height(ctx->DrawBuffer);
}
 
-   if ((fp->program.Base.InputsRead & VARYING_BIT_POS) || program_uses_dfdy) {
+   if ((fp->program.Base.InputsRead & VARYING_BIT_POS) ||
+   program_uses_dfdy || prog->nir->info.uses_interp_var_at_offset) {
   key->render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer);
}
 
-- 
2.8.2


[Mesa-dev] [PATCH 1/2] nir: Add a nir->info.uses_interp_var_at_offset flag.

2016-05-13 Thread Kenneth Graunke

It would probably make more sense to set this from nir_gather_info()
in case we manage to dead code eliminate these intrinsics.  However,
we haven't transitioned the GL driver to using that pass yet...

Signed-off-by: Kenneth Graunke 
---
 src/compiler/nir/glsl_to_nir.cpp | 3 +++
 src/compiler/nir/nir.h   | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/src/compiler/nir/glsl_to_nir.cpp b/src/compiler/nir/glsl_to_nir.cpp
index fb1d421..e82d98a 100644
--- a/src/compiler/nir/glsl_to_nir.cpp
+++ b/src/compiler/nir/glsl_to_nir.cpp
@@ -1276,6 +1276,9 @@ nir_visitor::visit(ir_expression *ir)
   intrin->intrinsic == nir_intrinsic_interp_var_at_sample)
  intrin->src[0] = nir_src_for_ssa(evaluate_rvalue(ir->operands[1]));
 
+  if (intrin->intrinsic == nir_intrinsic_interp_var_at_offset)
+ shader->info.uses_interp_var_at_offset = true;
+
   unsigned bit_size =  glsl_get_bit_size(deref->type);
   add_instr(&intrin->instr, deref->type->vector_elements, bit_size);
 
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 20927a2..d12792d 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1710,6 +1710,9 @@ typedef struct nir_shader_info {
/* Whether or not this shader ever uses textureGather() */
bool uses_texture_gather;
 
+   /** Whether or not this shader uses nir_intrinsic_interp_var_at_offset */
+   bool uses_interp_var_at_offset;
+
/* Whether or not this shader uses the gl_ClipDistance output */
bool uses_clip_distance_out;
 
-- 
2.8.2


Re: [Mesa-dev] [PATCH v2 11/15] glsl/linker: dvec3/dvec4 may consume twice input vertex attributes

2016-05-13 Thread Juan A. Suarez Romero

On Thu, 2016-05-12 at 15:42 -0700, Kenneth Graunke wrote:
> I'm a bit confused - it looks like we already do this check slightly
> earlier in the function.  Why do we need to do it again (or later?)?

In the earlier case, we are using explicit locations, so we already know
how many locations we are consuming.

In the latter case, we do not have explicit locations, so we first need
to calculate the locations used, and at the end check whether we have
enough of them.

J.A.
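
A rough sketch of the counting I have in mind for the non-explicit
case, just to illustrate the idea. The struct, the function and the
boolean parameter are hypothetical names for this example and do not
match the linker code; the real check compares the total against
MAX_VERTEX_ATTRIBS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one active vertex attribute. */
struct attrib_sketch {
   unsigned columns;   /* 1 for vectors, N for matrices with N columns */
   unsigned rows;      /* components per column, 1..4 */
   bool is_double;
};

/*
 * Count the attribute slots used. When doubles_consume_two is set,
 * every double column with 3 or 4 components (dvec3, dvec4, dmat2x3,
 * ...) counts as two slots instead of one.
 */
unsigned
count_attrib_slots(const struct attrib_sketch *attribs, size_t count,
                   bool doubles_consume_two)
{
   unsigned slots = 0;

   for (size_t i = 0; i < count; i++) {
      unsigned per_column = 1;

      assert(attribs[i].rows >= 1 && attribs[i].rows <= 4);

      if (doubles_consume_two && attribs[i].is_double &&
          attribs[i].rows > 2)
         per_column = 2;

      slots += attribs[i].columns * per_column;
   }

   return slots;
}
```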


Re: [Mesa-dev] [PATCH v2 11/15] glsl/linker: dvec3/dvec4 may consume twice input vertex attributes

2016-05-13 Thread Juan A. Suarez Romero

On Fri, 2016-05-13 at 05:34 +1000, Dave Airlie wrote:
> On 13 May 2016 4:28 AM, "Antia Puentes"  wrote:
> > 
> > 
> > From: "Juan A. Suarez Romero" 
> > 
> > From the GL 4.5 core spec, section 11.1.1 (Vertex Attributes):
> > 
> > "A program with more than the value of MAX_VERTEX_ATTRIBS
> > active attribute variables may fail to link, unless
> > device-dependent optimizations are able to make the program
> > fit within available hardware resources. For the purposes
> > of this test, attribute variables of the type dvec3, dvec4,
> > dmat2x3, dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4 may
> > count as consuming twice as many attributes as equivalent
> > single-precision types. While these types use the same number
> > of generic attributes as their single-precision equivalents,
> > implementations are permitted to consume two single-precision
> > vectors of internal storage for each three- or four-component
> > double-precision vector."
> > 
> > This commit adds a flag that allows drivers to specify whether dvec3,
> > dvec4, dmat2x3, dmat2x4, dmat3, dmat3x4, dmat4x3 and dmat4 count as
> > consuming twice as many attributes as the equivalent single-precision
> > types (default value being false).
> Doesn't this patch break all the drivers currently implementing this
> extension?
> 
> If I read it correctly, it creates the new Const, and then turns off
> the feature.
> 


Right. That const defines whether those doubles consume two locations (flag
set to true) or just one (flag set to false) for the purposes of checking
whether the program reaches MAX_VERTEX_ATTRIBS.

The default is to count them as one (flag set to false). The reason is
that this is what currently happens in that function, except when
we use an explicit location.

When you added the code to count doubles as consuming two locations,
you only did it when the locations were explicit. In the other case,
double attributes are counted as consuming one attribute.

I don't know whether you added it only for explicit locations for a good
reason, or just forgot to add it in the general case.

So I took the general case as the default one.

If the general case should actually count the doubles as consuming two
(as in the explicit case), then we can either swap the flag and set it
to true by default, or remove the flag entirely and force all drivers
to count doubles as consuming two attributes.
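
To make the two counting policies concrete, here is a minimal sketch of the limit check being discussed. It is not the linker code: the `AttribType` struct, the function names, and the `doubles_count_twice` parameter are hypothetical stand-ins for the flag proposed in the patch. The only rule taken from the spec quote is that three- and four-component double-precision columns may be charged twice against MAX_VERTEX_ATTRIBS.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified description of a vertex shader input type.
struct AttribType {
   unsigned columns;   // 1 for scalars and vectors, C for a matCxR type
   unsigned rows;      // components per column, 1..4
   bool is_double;     // true for double-precision types
};

// Slots charged against MAX_VERTEX_ATTRIBS for one input. A column with
// three or four double components is charged twice only when the
// (assumed) driver flag is set; everything else costs one slot per column.
unsigned attrib_slots_for_limit(const AttribType &t, bool doubles_count_twice)
{
   assert(t.rows >= 1 && t.rows <= 4);
   const bool wide_double = t.is_double && t.rows >= 3;
   return t.columns * ((doubles_count_twice && wide_double) ? 2u : 1u);
}

// Sum the slots of all inputs and compare against the limit.
bool fits_in_limit(const std::vector<AttribType> &inputs,
                   unsigned max_attribs, bool doubles_count_twice)
{
   unsigned used = 0;
   for (const AttribType &t : inputs)
      used += attrib_slots_for_limit(t, doubles_count_twice);
   return used <= max_attribs;
}
```

With the flag off, sixteen dvec4 inputs fit a limit of 16; with the flag on, the same program needs 32 slots and would fail the check, which is the behaviour difference the thread is arguing about.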


> I haven't found any hardware that doesn't consume two locations, so I
> didn't care to implement optional support for it back when I wrote this.
> Adding optional support and breaking all the gallium drivers doesn't seem
> like the correct answer at this time either.
> 
> Dave.

Re: [Mesa-dev] [PATCH 0/8] i965: Stop using brw_meta for blits and clears

2016-05-13 Thread Pohjolainen, Topi

On Thu, May 12, 2016 at 04:30:42PM -0700, Jason Ekstrand wrote:
> Now that blorp is up-and-running on gen9, there's not much stopping us from
> using blorp basically everywhere on gen6+.  The only real problem is 16x
> MSAA.  This little series adds 16x MSAA support to blorp and starts using
> it for practically everything.

I checked the 16x MSAA case, and it makes sense to me. Good call to
introduce the nir_mask_shift_or() helper. Series is:

Reviewed-by: Topi Pohjolainen 

> 
> Jason Ekstrand (8):
>   i965: Move brw_get_rb_for_slice to brw_meta_util
>   i965: Move brw_is_color_fast_clear_compatible to brw_meta_util
>   i965; Move brw_meta_get_*_rect to brw_meta_util.c
>   i965: move brw_meta_set_fast_clear_color to brw_meta_util.c
>   i965/blorp: Add support for 16x MSAA
>   i965: Use blorp for all updownsample blits
>   i965: Use blorp for all stencil blits
>   i965: Use blorp for all clears
> 
>  src/mesa/drivers/dri/i965/Makefile.sources|   3 -
>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp  |  81 +-
>  src/mesa/drivers/dri/i965/brw_clear.c |   8 -
>  src/mesa/drivers/dri/i965/brw_context.c   |   1 -
>  src/mesa/drivers/dri/i965/brw_context.h   |  29 -
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c   | 919 --
>  src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c | 566 -
>  src/mesa/drivers/dri/i965/brw_meta_updownsample.c | 150 
>  src/mesa/drivers/dri/i965/brw_meta_util.c | 350 
>  src/mesa/drivers/dri/i965/brw_meta_util.h |   5 +
>  src/mesa/drivers/dri/i965/intel_fbo.c |   7 +-
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c |  34 +-
>  12 files changed, 442 insertions(+), 1711 deletions(-)
>  delete mode 100644 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
>  delete mode 100644 src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c
>  delete mode 100644 src/mesa/drivers/dri/i965/brw_meta_updownsample.c
> 
> -- 
> 2.5.0.400.gff86faf
> 

[Mesa-dev] [PATCH] glsl: add unit tests data vertex/expected outcome for uninitialized warning

2016-05-13 Thread Alejandro Piñeiro

v2: fix 025 test. Add three more tests (Ian Romanick)
---

This patch adds the two tests Ian suggested:
  026-out-function-parameter-shaderout.vert
  027-inout-function-parameter-shaderout.vert

Plus one combining function parameters and undefined/then-defined arrays
with undefined indexes:
  030-array-as-function-parameter.vert

It also updates test 025. I have just sent a v3 of patch 02 of this
series, due to an error detected by test 026.

 .../glsl/tests/warnings/000-basic-test.vert| 10 
 .../tests/warnings/000-basic-test.vert.expected|  1 +
 .../warnings/001-use-undefined-then-define.vert| 12 ++
 .../001-use-undefined-then-define.vert.expected|  1 +
 src/compiler/glsl/tests/warnings/002-loop.vert | 23 ++
 .../glsl/tests/warnings/002-loop.vert.expected |  3 +++
 src/compiler/glsl/tests/warnings/003-less.vert | 17 +
 .../glsl/tests/warnings/003-less.vert.expected |  1 +
 src/compiler/glsl/tests/warnings/004-greater.vert  | 17 +
 .../glsl/tests/warnings/004-greater.vert.expected  |  1 +
 src/compiler/glsl/tests/warnings/005-lequal.vert   | 17 +
 .../glsl/tests/warnings/005-lequal.vert.expected   |  1 +
 src/compiler/glsl/tests/warnings/006-gequal.vert   | 17 +
 .../glsl/tests/warnings/006-gequal.vert.expected   |  1 +
 src/compiler/glsl/tests/warnings/007-test-mod.vert | 25 +++
 .../glsl/tests/warnings/007-test-mod.vert.expected |  3 +++
 .../glsl/tests/warnings/008-mulassign.vert | 12 ++
 .../tests/warnings/008-mulassign.vert.expected |  1 +
 .../glsl/tests/warnings/009-div-assign.vert| 12 ++
 .../tests/warnings/009-div-assign.vert.expected|  1 +
 .../glsl/tests/warnings/010-add-assign.vert| 12 ++
 .../tests/warnings/010-add-assign.vert.expected|  1 +
 .../glsl/tests/warnings/011-sub-assign.vert| 12 ++
 .../tests/warnings/011-sub-assign.vert.expected|  1 +
 .../glsl/tests/warnings/012-modassign.vert | 12 ++
 .../tests/warnings/012-modassign.vert.expected |  1 +
 src/compiler/glsl/tests/warnings/013-lsassign.vert | 12 ++
 .../glsl/tests/warnings/013-lsassign.vert.expected |  1 +
 src/compiler/glsl/tests/warnings/014-rsassign.vert | 12 ++
 .../glsl/tests/warnings/014-rsassign.vert.expected |  1 +
 .../glsl/tests/warnings/015-andassign.vert | 12 ++
 .../tests/warnings/015-andassign.vert.expected |  1 +
 src/compiler/glsl/tests/warnings/016-orassign.vert | 12 ++
 .../glsl/tests/warnings/016-orassign.vert.expected |  1 +
 .../glsl/tests/warnings/017-xorassign.vert | 12 ++
 .../tests/warnings/017-xorassign.vert.expected |  1 +
 src/compiler/glsl/tests/warnings/018-bitand.vert   | 24 +++
 .../glsl/tests/warnings/018-bitand.vert.expected   |  3 +++
 src/compiler/glsl/tests/warnings/019-array.vert| 23 ++
 .../glsl/tests/warnings/019-array.vert.expected|  5 
 .../glsl/tests/warnings/020-array-length.vert  | 12 ++
 .../tests/warnings/020-array-length.vert.expected  |  0
 src/compiler/glsl/tests/warnings/021-lshift.vert   | 25 +++
 .../glsl/tests/warnings/021-lshift.vert.expected   |  3 +++
 src/compiler/glsl/tests/warnings/022-rshift.vert   | 25 +++
 .../glsl/tests/warnings/022-rshift.vert.expected   |  3 +++
 src/compiler/glsl/tests/warnings/023-switch.vert   | 28 ++
 .../glsl/tests/warnings/023-switch.vert.expected   |  3 +++
 .../glsl/tests/warnings/024-shaderout.vert | 19 +++
 .../tests/warnings/024-shaderout.vert.expected |  2 ++
 .../tests/warnings/025-function-parameters.vert| 16 +
 .../warnings/025-function-parameters.vert.expected |  2 ++
 .../026-out-function-parameter-shaderout.vert  | 14 +++
 ...-out-function-parameter-shaderout.vert.expected |  0
 .../027-inout-function-parameter-shaderout.vert| 14 +++
 ...nout-function-parameter-shaderout.vert.expected |  1 +
 .../glsl/tests/warnings/028-conditional.vert   | 17 +
 .../tests/warnings/028-conditional.vert.expected   |  6 +
 .../glsl/tests/warnings/029-fieldselection.vert| 23 ++
 .../warnings/029-fieldselection.vert.expected  |  1 +
 .../warnings/030-array-as-function-parameter.vert  | 17 +
 .../030-array-as-function-parameter.vert.expected  |  7 ++
 62 files changed, 573 insertions(+)
 create mode 100644 src/compiler/glsl/tests/warnings/000-basic-test.vert
 create mode 100644 src/compiler/glsl/tests/warnings/000-basic-test.vert.expected
 create mode 100644 src/compiler/glsl/tests/warnings/001-use-undefined-then-define.vert
 create mode 100644 src/compiler/glsl/tests/warnings/001-use-undefined-then-define.vert.expected
 create mode 100644 src/compiler/glsl/tests/warnings/002-loop.vert
 create mode 100644

[Mesa-dev] [PATCH v3] glsl: do not raise uninitialized warning with out function parameters

2016-05-13 Thread Alejandro Piñeiro

Silence the warning by default for function parameters, as the
parameters need to be processed first in order to have the actual and the
formal parameters and the function signature. The warning is then raised,
if needed, in verify_parameter_modes, where the other in/out/inout mode
checks are done.

v2: fix comment style, multi-line condition style, simplify check,
remove extra blank (Ian Romanick)
v3: inout function parameters can raise the warning too (Ian
Romanick)
---

One of the extra tests Ian proposed for patch 05 showed that the patch
was wrong, as inout function parameters should raise the warning too.

So for inout vars the warning should also be raised inside the out/inout
branch (the one that starts with the comment /* Verify that 'out' and
'inout' ...), before var->data.assigned is set.
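
The rule described above can be sketched independently of the GLSL compiler. This is only an illustration of the decision logic: the enums, the `ActualArg` struct, and `check_argument` are hypothetical names, and `has_gl_prefix` merely stands in for the compiler's `is_gl_identifier` check. The point it shows is the ordering: an inout argument is tested for "used uninitialized" before it is marked as assigned, while a pure out argument never warns.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the formal parameter and variable modes.
enum class ParamMode { In, ConstIn, Out, InOut };
enum class VarMode { Auto, ShaderOut, Other };

struct ActualArg {
   std::string name;
   VarMode mode;
   bool assigned;   // has the variable been written before this call?
};

// Built-in names start with "gl_"; they are exempt from the warning.
bool has_gl_prefix(const std::string &name)
{
   return name.compare(0, 3, "gl_") == 0;
}

// Returns true when passing `arg` for a parameter of mode `formal` should
// produce a "used uninitialized" warning. As a side effect, out and inout
// arguments are marked as assigned, after the warning check has run.
bool check_argument(ParamMode formal, ActualArg &arg)
{
   assert(!arg.name.empty());

   // in, const in and inout parameters read the incoming value.
   const bool reads_value = formal != ParamMode::Out;
   const bool warn = reads_value &&
      (arg.mode == VarMode::Auto || arg.mode == VarMode::ShaderOut) &&
      !arg.assigned &&
      !has_gl_prefix(arg.name);

   // out and inout parameters write the variable back.
   if (formal == ParamMode::Out || formal == ParamMode::InOut)
      arg.assigned = true;

   return warn;
}
```

Passing an unassigned local to an out parameter is silent and leaves it assigned; passing the same kind of variable to an inout or in parameter warns, which matches what tests 026 and 027 are described as checking.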


 src/compiler/glsl/ast_function.cpp | 28 
 1 file changed, 28 insertions(+)

diff --git a/src/compiler/glsl/ast_function.cpp b/src/compiler/glsl/ast_function.cpp
index 37fb3e79..68bccbd 100644
--- a/src/compiler/glsl/ast_function.cpp
+++ b/src/compiler/glsl/ast_function.cpp
@@ -43,6 +43,12 @@ process_parameters(exec_list *instructions, exec_list *actual_parameters,
unsigned count = 0;
 
foreach_list_typed(ast_node, ast, link, parameters) {
+  /* We need to process the parameters first in order to know if we can
+   * raise or not a unitialized warning. Calling set_is_lhs silence the
+   * warning for now. Raising the warning or not will be checked at
+   * verify_parameter_modes.
+   */
+  ast->set_is_lhs(true);
   ir_rvalue *result = ast->hir(instructions, state);
 
   ir_constant *const constant = result->constant_expression_value();
@@ -247,6 +253,16 @@ verify_parameter_modes(_mesa_glsl_parse_state *state,
 }
 
 ir_variable *var = actual->variable_referenced();
+
+ if (var && formal->data.mode == ir_var_function_inout) {
+if ((var->data.mode == ir_var_auto || var->data.mode == ir_var_shader_out) &&
+!var->data.assigned &&
+!is_gl_identifier(var->name)) {
+   _mesa_glsl_warning(&loc, state, "`%s' used uninitialized",
+  var->name);
+}
+ }
+
 if (var)
var->data.assigned = true;
 
@@ -263,6 +279,18 @@ verify_parameter_modes(_mesa_glsl_parse_state *state,
  mode, formal->name);
 return false;
 }
+  } else {
+ assert(formal->data.mode == ir_var_function_in ||
+formal->data.mode == ir_var_const_in);
+ ir_variable *var = actual->variable_referenced();
+ if (var) {
+if ((var->data.mode == ir_var_auto || var->data.mode == ir_var_shader_out) &&
+!var->data.assigned &&
+!is_gl_identifier(var->name)) {
+   _mesa_glsl_warning(&loc, state, "`%s' used uninitialized",
+  var->name);
+}
+ }
   }
 
   if (formal->type->is_image() &&
-- 
2.7.4


Re: [Mesa-dev] [PATCH 06/28] i965/gen7_wm: Move where we set the fast clear op

2016-05-13 Thread Jason Ekstrand

On Fri, May 13, 2016 at 12:58 AM, Pohjolainen, Topi <
topi.pohjolai...@intel.com> wrote:

> On Tue, May 10, 2016 at 04:16:26PM -0700, Jason Ekstrand wrote:
> > This better matches gen8 state setup
>
> I don't think anything in the rest of the series depends on this:
>

It mostly just made the next "reshuffle everything" patch a bit cleaner.


> Acked-by: Topi Pohjolainen 
>
> > ---
> >  src/mesa/drivers/dri/i965/gen7_wm_state.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c b/src/mesa/drivers/dri/i965/gen7_wm_state.c
> > index 17dea99..8d2e2c3 100644
> > --- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
> > @@ -214,6 +214,8 @@ gen7_upload_ps_state(struct brw_context *brw,
> > if (prog_data->num_varying_inputs != 0)
> >dw4 |= GEN7_PS_ATTRIBUTE_ENABLE;
> >
> > +   dw4 |= fast_clear_op;
> > +
> > if (prog_data->prog_offset_16 || prog_data->no_8) {
> >dw4 |= GEN7_PS_16_DISPATCH_ENABLE;
> >
> > @@ -243,8 +245,6 @@ gen7_upload_ps_state(struct brw_context *brw,
> >ksp0 = stage_state->prog_offset;
> > }
> >
> > -   dw4 |= fast_clear_op;
> > -
> > BEGIN_BATCH(8);
> > OUT_BATCH(_3DSTATE_PS << 16 | (8 - 2));
> > OUT_BATCH(ksp0);
> > --
> > 2.5.0.400.gff86faf
> >

Re: [Mesa-dev] [PATCH 06/28] i965/gen7_wm: Move where we set the fast clear op

2016-05-13 Thread Pohjolainen, Topi

On Tue, May 10, 2016 at 04:16:26PM -0700, Jason Ekstrand wrote:
> This better matches gen8 state setup

I don't think anything in the rest of the series depends on this:

Acked-by: Topi Pohjolainen 

> ---
>  src/mesa/drivers/dri/i965/gen7_wm_state.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c b/src/mesa/drivers/dri/i965/gen7_wm_state.c
> index 17dea99..8d2e2c3 100644
> --- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
> @@ -214,6 +214,8 @@ gen7_upload_ps_state(struct brw_context *brw,
> if (prog_data->num_varying_inputs != 0)
>dw4 |= GEN7_PS_ATTRIBUTE_ENABLE;
>  
> +   dw4 |= fast_clear_op;
> +
> if (prog_data->prog_offset_16 || prog_data->no_8) {
>dw4 |= GEN7_PS_16_DISPATCH_ENABLE;
>  
> @@ -243,8 +245,6 @@ gen7_upload_ps_state(struct brw_context *brw,
>ksp0 = stage_state->prog_offset;
> }
>  
> -   dw4 |= fast_clear_op;
> -
> BEGIN_BATCH(8);
> OUT_BATCH(_3DSTATE_PS << 16 | (8 - 2));
> OUT_BATCH(ksp0);
> -- 
> 2.5.0.400.gff86faf
> 

Re: [Mesa-dev] [PATCH 05/28] i965/fs: Stop setting dispatch_grf_start_reg from the visitor

2016-05-13 Thread Pohjolainen, Topi

On Tue, May 10, 2016 at 04:16:25PM -0700, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp  | 18 --
>  src/mesa/drivers/dri/i965/brw_shader.cpp  |  1 +
>  src/mesa/drivers/dri/i965/brw_vec4.cpp|  2 ++
>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |  1 +
>  src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp|  1 +
>  5 files changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp

Reviewed-by: Topi Pohjolainen 

Re: [Mesa-dev] [PATCH 04/28] i965/fs: Clean up the logic in compile_fs a bit

2016-05-13 Thread Pohjolainen, Topi

On Tue, May 10, 2016 at 04:16:24PM -0700, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 73 
> 
>  1 file changed, 41 insertions(+), 32 deletions(-)

Reviewed-by: Topi Pohjolainen 

> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 71e759d..d136ba8 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -6017,52 +6017,56 @@ brw_compile_fs(const struct brw_compiler *compiler, void *log_data,
> key->persample_interp,
> shader);
>  
> -   fs_visitor v(compiler, log_data, mem_ctx, key,
> -&prog_data->base, prog, shader, 8,
> -shader_time_index8);
> -   if (!v.run_fs(false /* do_rep_send */)) {
> +   cfg_t *simd8_cfg = NULL, *simd16_cfg = NULL;
> +
> +   fs_visitor v8(compiler, log_data, mem_ctx, key,
> + &prog_data->base, prog, shader, 8,
> + shader_time_index8);
> +   if (!v8.run_fs(false /* do_rep_send */)) {
>if (error_str)
> - *error_str = ralloc_strdup(mem_ctx, v.fail_msg);
> + *error_str = ralloc_strdup(mem_ctx, v8.fail_msg);
>  
>return NULL;
> +   } else if (likely(!(INTEL_DEBUG & DEBUG_NO8))) {
> +  simd8_cfg = v8.cfg;
> }
>  
> -   cfg_t *simd16_cfg = NULL;
> -   fs_visitor v2(compiler, log_data, mem_ctx, key,
> - &prog_data->base, prog, shader, 16,
> - shader_time_index16);
> -   if (likely(!(INTEL_DEBUG & DEBUG_NO16) || use_rep_send)) {
> -  if (!v.simd16_unsupported) {
> - /* Try a SIMD16 compile */
> - v2.import_uniforms();
> - if (!v2.run_fs(use_rep_send)) {
> -compiler->shader_perf_log(log_data,
> -  "SIMD16 shader failed to compile: %s",
> -  v2.fail_msg);
> - } else {
> -simd16_cfg = v2.cfg;
> - }
> +   if (!v8.simd16_unsupported &&
> +   likely(!(INTEL_DEBUG & DEBUG_NO16) || use_rep_send)) {
> +  /* Try a SIMD16 compile */
> +  fs_visitor v16(compiler, log_data, mem_ctx, key,
> + &prog_data->base, prog, shader, 16,
> + shader_time_index16);
> +  v16.import_uniforms();
> +  if (!v16.run_fs(use_rep_send)) {
> + compiler->shader_perf_log(log_data,
> +   "SIMD16 shader failed to compile: %s",
> +   v16.fail_msg);
> +  } else {
> + simd16_cfg = v16.cfg;
>}
> }
>  
> +   /* When the caller requests a repclear shader, they want SIMD16-only */
> +   if (use_rep_send)
> +  simd8_cfg = NULL;
> +
> +   /* Prior to Iron Lake, the PS had a single shader offset with a jump table
> +* at the top to select the shader.  We've never implemented that.
> +* Instead, we just give them exactly one shader and we pick the widest one
> +* available.
> +*/
> +   if (compiler->devinfo->gen < 5 && simd16_cfg)
> +  simd8_cfg = NULL;
> +
> /* We have to compute the flat inputs after the visitor is finished running
>  * because it relies on prog_data->urb_setup which is computed in
>  * fs_visitor::calculate_urb_setup().
>  */
> brw_compute_flat_inputs(prog_data, key->flat_shade, shader);
>  
> -   cfg_t *simd8_cfg;
> -   int no_simd8 = (INTEL_DEBUG & DEBUG_NO8) || use_rep_send;
> -   if ((no_simd8 || compiler->devinfo->gen < 5) && simd16_cfg) {
> -  simd8_cfg = NULL;
> -  prog_data->no_8 = true;
> -   } else {
> -  simd8_cfg = v.cfg;
> -  prog_data->no_8 = false;
> -   }
> -
> fs_generator g(compiler, log_data, mem_ctx, (void *) key, &prog_data->base,
> -  v.promoted_constants, v.runtime_check_aads_emit,
> +  v8.promoted_constants, v8.runtime_check_aads_emit,
>MESA_SHADER_FRAGMENT);
>  
> if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
> @@ -6072,8 +6076,13 @@ brw_compile_fs(const struct brw_compiler *compiler, void *log_data,
>   shader->info.name));
> }
>  
> -   if (simd8_cfg)
> +   if (simd8_cfg) {
>g.generate_code(simd8_cfg, 8);
> +  prog_data->no_8 = false;
> +   } else {
> +  prog_data->no_8 = true;
> +   }
> +
> if (simd16_cfg)
>prog_data->prog_offset_16 = g.generate_code(simd16_cfg, 16);
>  
> -- 
> 2.5.0.400.gff86faf
> 

Re: [Mesa-dev] [PATCH] i965: Make exec_size 16 word/byte registers use exec_size halving again.

2016-05-13 Thread Connor Abbott

My understanding is that compression isn't necessary here, at least on
newer gens (I don't know much about gen4/5). Could you explain why a
<16,16,1>:w region is illegal? It would be nice to get a PRM citation
in the comment below.

On Fri, May 13, 2016 at 3:02 AM, Kenneth Graunke  wrote:
> In a5d7e144eaf43fee37e6ff9e2de194407087632b, Connor generalized the
> exec_size halving code to handle more cases.  However, he accidentally
> made exec_size 16 instructions with word/byte types skip the halving
> code, producing invalid regions, and regressing a lot of Piglit tests
> on some 965GM systems.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95370
> Signed-off-by: Kenneth Graunke 
> Cc: Connor Abbott 
> Cc: Francisco Jerez 
> Cc: Matt Turner 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 4f6f3a3..383450b 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -65,7 +65,8 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned gen)
> case VGRF:
>if (reg->stride == 0) {
>   brw_reg = brw_vec1_reg(brw_file_from_reg(reg), reg->nr, 0);
> -  } else if (inst->exec_size * reg->stride * type_sz(reg->type) <= 32) {
> +  } else if (inst->exec_size <= 8 &&
> + inst->exec_size * reg->stride * type_sz(reg->type) <= 32) {
>   brw_reg = brw_vecn_reg(inst->exec_size, brw_file_from_reg(reg),
>  reg->nr, 0);
>   brw_reg = stride(brw_reg, inst->exec_size * reg->stride,
> --
> 2.8.2
>
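
For reference, the effect of the one-line change in the quoted patch can be seen by evaluating the old and new conditions side by side. This is a sketch with hypothetical function names, not the generator code; the constant 32 is the byte limit used in the patched condition, and the stride == 0 case is excluded because the original code handles it in a separate branch.

```cpp
#include <cassert>

// Condition before the patch: the whole region fits in 32 bytes.
bool single_region_old(unsigned exec_size, unsigned stride, unsigned type_sz)
{
   assert(stride > 0);   // stride 0 is handled by a separate branch
   return exec_size * stride * type_sz <= 32;
}

// Condition after the patch: additionally require exec_size <= 8, so that
// exec_size 16 instructions always fall through to the halving path.
bool single_region_new(unsigned exec_size, unsigned stride, unsigned type_sz)
{
   assert(stride > 0);
   return exec_size <= 8 && exec_size * stride * type_sz <= 32;
}
```

For an exec_size 16 instruction on a packed word type (stride 1, 2 bytes), the product is exactly 32, so the old condition accepts it and the new one rejects it. That is the <16,16,1>:w case asked about above. For exec_size 8 with a packed 4-byte type both conditions agree.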
