Re: [Mesa-dev] [PATCH 02/74] glsl: Add ir_var_buffer

2015-05-19 Thread Iago Toral
On Tue, 2015-05-19 at 15:14 -0700, Jordan Justen wrote:
> On 2015-05-14 07:06:05, Iago Toral Quiroga wrote:
> > From: Kristian Høgsberg 
> > 
> > This will be used to identify buffer variables inside shader storage
> > buffer objects, which are very similar to uniforms except for a few
> > differences, most important of which is that they are writable.
> > 
> > Since buffer variables are so similar to uniforms, we will almost always
> > want them to go through the same paths as uniforms.
> > ---
> >  src/glsl/builtin_variables.cpp   |  5 +++--
> >  src/glsl/glsl_symbol_table.cpp   | 16 +++-
> >  src/glsl/ir.cpp  |  3 +++
> >  src/glsl/ir.h|  4 +++-
> >  src/glsl/ir_function.cpp |  1 +
> >  src/glsl/ir_print_visitor.cpp|  3 ++-
> >  src/glsl/ir_reader.cpp   |  2 ++
> >  src/glsl/loop_unroll.cpp |  1 +
> >  src/glsl/lower_named_interface_blocks.cpp|  5 +++--
> >  src/glsl/lower_variable_index_to_cond_assign.cpp |  1 +
> >  src/glsl/opt_structure_splitting.cpp |  5 +++--
> >  11 files changed, 33 insertions(+), 13 deletions(-)
> > 
> > diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp
> > index 6806aa1..a3ed79d 100644
> > --- a/src/glsl/builtin_variables.cpp
> > +++ b/src/glsl/builtin_variables.cpp
> > @@ -436,11 +436,12 @@ builtin_variable_generator::add_variable(const char 
> > *name,
> >var->data.read_only = true;
> >break;
> > case ir_var_shader_out:
> > +   case ir_var_buffer:
> 
> I think ir_var_shader_storage or ir_var_ssbo is better.

Let's use ir_var_shader_storage then.

FWIW, the ir_var_buffer name was related to the fact that shader storage
variables are called buffer variables by the spec, although I guess the
name might be a bit too generic.

Iago
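
To illustrate the "similar to uniforms, but writable" point from the commit
message, here is a minimal sketch in C. The enum and the function name are
hypothetical stand-ins, not Mesa's definitions, and the set of read-only modes
is an assumption based on the comment in the quoted hunk (uniforms, shader
inputs and system values are read-only; outputs and buffer variables are not).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down mode enum for illustration; not Mesa's. */
enum var_mode {
   VAR_AUTO = 0,
   VAR_UNIFORM,
   VAR_SHADER_STORAGE,   /* the mode this thread agrees to call shader_storage */
   VAR_SHADER_IN,
   VAR_SHADER_OUT,
   VAR_SYSTEM_VALUE,
   VAR_MODE_COUNT
};

/* Returns true when a built-in variable of this mode would be read-only.
 * Assumption: buffer variables fall through the same case as shader
 * outputs, i.e. they stay writable, as the quoted hunk suggests.
 */
static bool
builtin_is_read_only(enum var_mode mode)
{
   assert((int)mode >= 0 && mode < VAR_MODE_COUNT);

   switch (mode) {
   case VAR_UNIFORM:
   case VAR_SHADER_IN:
   case VAR_SYSTEM_VALUE:
      return true;
   case VAR_SHADER_OUT:
   case VAR_SHADER_STORAGE:
   default:
      return false;
   }
}
```

The only difference from the uniform path in this sketch is the missing
read-only flag, which is why most code can treat both modes alike.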

> -Jordan
> 
> >break;
> > default:
> >/* The only variables that are added using this function should be
> > -   * uniforms, shader inputs, and shader outputs, constants (which use
> > -   * ir_var_auto), and system values.
> > +   * uniforms, buffers, shader inputs, and shader outputs, constants
> > +   * (which use ir_var_auto), and system values.
> > */
> >assert(0);
> >break;
> > diff --git a/src/glsl/glsl_symbol_table.cpp b/src/glsl/glsl_symbol_table.cpp
> > index 2294dda..e557689 100644
> > --- a/src/glsl/glsl_symbol_table.cpp
> > +++ b/src/glsl/glsl_symbol_table.cpp
> > @@ -36,6 +36,9 @@ public:
> >case ir_var_uniform:
> >   dest = &ibu;
> >   break;
> > +  case ir_var_buffer:
> > + dest = &ibb;
> > + break;
> >case ir_var_shader_in:
> >   dest = &ibi;
> >   break;
> > @@ -60,6 +63,8 @@ public:
> >switch (mode) {
> >case ir_var_uniform:
> >   return ibu;
> > +  case ir_var_buffer:
> > + return ibb;
> >case ir_var_shader_in:
> >   return ibi;
> >case ir_var_shader_out:
> > @@ -71,24 +76,25 @@ public:
> > }
> >  
> > symbol_table_entry(ir_variable *v)   :
> > -  v(v), f(0), t(0), ibu(0), ibi(0), ibo(0), a(0) {}
> > +  v(v), f(0), t(0), ibu(0), ibb(0), ibi(0), ibo(0), a(0) {}
> > symbol_table_entry(ir_function *f)   :
> > -  v(0), f(f), t(0), ibu(0), ibi(0), ibo(0), a(0) {}
> > +  v(0), f(f), t(0), ibu(0), ibb(0), ibi(0), ibo(0), a(0) {}
> > symbol_table_entry(const glsl_type *t)   :
> > -  v(0), f(0), t(t), ibu(0), ibi(0), ibo(0), a(0) {}
> > +  v(0), f(0), t(t), ibu(0), ibb(0), ibi(0), ibo(0), a(0) {}
> > symbol_table_entry(const glsl_type *t, enum ir_variable_mode mode) :
> > -  v(0), f(0), t(0), ibu(0), ibi(0), ibo(0), a(0)
> > +  v(0), f(0), t(0), ibu(0), ibb(0), ibi(0), ibo(0), a(0)
> > {
> >assert(t->is_interface());
> >add_interface(t, mode);
> > }
> > symbol_table_entry(const class ast_type_specifier *a):
> > -  v(0), f(0), t(0), ibu(0), ibi(0), ibo(0), a(a) {}
> > +  v(0), f(0), t(0), ibu(0), ibb(0), ibi(0), ibo(0), a(a) {}
> >  
> > ir_variable *v;
> > ir_function *f;
> > const glsl_type *t;
> > const glsl_type *ibu;
> > +   const glsl_type *ibb;
> > con

Re: [Mesa-dev] [PATCH] docs: Mark ARB_shader_storage_buffer_object as in progress

2015-05-25 Thread Iago Toral
On Mon, 2015-05-25 at 20:15 +1200, Chris Forbes wrote:
> Hardly needed, but:

I know, I should've sent this patch when we started working on this... I
got some comments asking why this is marked as not started when there are
patches on the mailing list, so I guess this will help make the current
state clear until the patches land.

Thanks,
Iago

> Reviewed-by: Chris Forbes 
> 
> On Mon, May 25, 2015 at 7:41 PM, Iago Toral Quiroga  wrote:
> > ---
> >  docs/GL3.txt | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/docs/GL3.txt b/docs/GL3.txt
> > index 9d56ee5..44a824b 100644
> > --- a/docs/GL3.txt
> > +++ b/docs/GL3.txt
> > @@ -164,7 +164,7 @@ GL 4.3, GLSL 4.30:
> >GL_ARB_program_interface_query   DONE (all drivers)
> >GL_ARB_robust_buffer_access_behavior not started
> >GL_ARB_shader_image_size in progress (Martin Peres)
> > -  GL_ARB_shader_storage_buffer_object  not started
> > +  GL_ARB_shader_storage_buffer_object  in progress (Iago Toral, Samuel Iglesias)
> >GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
> >GL_ARB_texture_buffer_range  DONE (nv50, nvc0, i965, r600, radeonsi, llvmpipe)
> >GL_ARB_texture_query_levels  DONE (all drivers that support GLSL 1.30)
> > @@ -221,7 +221,7 @@ GLES3.1, GLSL ES 3.1
> >GL_ARB_shader_atomic_countersDONE (i965)
> >GL_ARB_shader_image_load_store   in progress (curro)
> >GL_ARB_shader_image_size in progress (Martin Peres)
> > -  GL_ARB_shader_storage_buffer_object  not started
> > +  GL_ARB_shader_storage_buffer_object  in progress (Iago Toral, Samuel Iglesias)
> >GL_ARB_shading_language_packing  DONE (all drivers)
> >GL_ARB_separate_shader_objects   DONE (all drivers)
> >GL_ARB_stencil_texturing DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
> > --
> > 1.9.1
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 02/82] glsl: Add ir_var_shader_storage

2015-06-05 Thread Iago Toral
On Thu, 2015-06-04 at 00:12 -0700, Jordan Justen wrote:
> On 2015-06-03 00:00:52, Iago Toral Quiroga wrote:
> > From: Kristian Høgsberg 
(...)
> > diff --git a/src/glsl/ir.h b/src/glsl/ir.h
> > index fab1cd2..dd10298 100644
> > --- a/src/glsl/ir.h
> > +++ b/src/glsl/ir.h
> > @@ -323,6 +323,7 @@ protected:
> >  enum ir_variable_mode {
> > ir_var_auto = 0, /**< Function local variables and globals. */
> > ir_var_uniform,  /**< Variable declared as a uniform. */
> > +   ir_var_shader_storage,   /**< Variable declared as an ssbo. */
> > ir_var_shader_in,
> > ir_var_shader_out,
> > ir_var_function_in,
> > @@ -444,7 +445,9 @@ public:
> >  */
> > inline bool is_in_uniform_block() const
> 
> How about renaming this? is_in_buffer_block? (shader storage and
> uniform buffer interface blocks...)

Sure, do you think we should do the rename in the same commit or
separately?

Iago
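
For reference, a rough sketch of what the renamed predicate could look like,
written as plain C rather than as an ir_variable member. The struct, the enum
and the field names are hypothetical; the only thing taken from the thread is
the idea that both uniform and shader storage interface blocks should count as
"buffer blocks".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the IR types discussed above. */
enum var_mode {
   VAR_AUTO = 0,
   VAR_UNIFORM,
   VAR_SHADER_STORAGE,
   VAR_SHADER_IN,
   VAR_SHADER_OUT
};

struct variable {
   enum var_mode mode;
   const void *interface_type;   /* non-NULL when declared inside a block */
};

/* True for variables that live in a uniform block or in a shader storage
 * block. Assumption: membership is detected via a non-NULL interface type.
 */
static bool
is_in_buffer_block(const struct variable *var)
{
   assert(var != NULL);

   return (var->mode == VAR_UNIFORM || var->mode == VAR_SHADER_STORAGE) &&
          var->interface_type != NULL;
}
```

With a single predicate covering both modes, callers that only care about
"backed by a buffer object" would not need to change when SSBOs are added.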

> With that, Reviewed-by: Jordan Justen 




Re: [Mesa-dev] [PATCH] i965: do not round line width when multisampling or antialiasing are enabled

2015-06-09 Thread Iago Toral
On Tue, 2015-06-09 at 01:49 -0700, Kenneth Graunke wrote:
> On Tuesday, June 09, 2015 08:49:54 AM Iago Toral Quiroga wrote:
> > In commit fe74fee8fa721a we rounded the line width to the nearest integer to
> > match the GLES3 spec requirements stated in section 13.4.2.1, but that seems
> > to break a dEQP test that renders wide lines in some multisampling scenarios.
> > 
> > Ian noted that the OpenGL 4.4 spec has the following similar text:
> > 
> > "The actual width of non-antialiased lines is determined by rounding the
> > supplied width to the nearest integer, then clamping it to the
> > implementation-dependent maximum non-antialiased line width."
> > 
> > and suggested that when ES removed antialiased lines, they removed
> > "non-antialiased" from that paragraph but probably should not have.
> > 
> > Going by that note, this patch restricts the quantization implemented in
> > fe74fee8fa721a only to regular aliased lines. This seems to fix the broken
> > test without causing regressions in any of the dEQP line rasterization tests
> > (dEQP-GLES3.functional.rasterization.*line*).
> > 
> > Fixes:
> > dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.primitives.lines_wide
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90749
> > ---
> >  src/mesa/drivers/dri/i965/gen6_sf_state.c | 5 +++--
> >  src/mesa/drivers/dri/i965/gen7_sf_state.c | 6 --
> >  src/mesa/drivers/dri/i965/gen8_sf_state.c | 5 +++--
> >  3 files changed, 10 insertions(+), 6 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c 
> > b/src/mesa/drivers/dri/i965/gen6_sf_state.c
> > index e445ce2..b674268 100644
> > --- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
> > @@ -364,8 +364,9 @@ upload_sf_state(struct brw_context *brw)
> >/* OpenGL dictates that line width should be rounded to the nearest
> > * integer
> > */
> > -  float line_width =
> > - roundf(CLAMP(ctx->Line.Width, 0.0, ctx->Const.MaxLineWidth));
> > > +  float line_width = CLAMP(ctx->Line.Width, 0.0, ctx->Const.MaxLineWidth);
> > > +  if (!ctx->Multisample._Enabled && !ctx->Line.SmoothFlag)
> > > + line_width = CLAMP(roundf(line_width), 0.0, ctx->Const.MaxLineWidth);
> >uint32_t line_width_u3_7 = U_FIXED(line_width, 7);
> >  
> >/* Line width of 0 is not allowed when MSAA enabled */
> > diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c 
> > b/src/mesa/drivers/dri/i965/gen7_sf_state.c
> > index 58e3337..07b4394 100644
> > --- a/src/mesa/drivers/dri/i965/gen7_sf_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen7_sf_state.c
> > @@ -195,9 +195,11 @@ upload_sf_state(struct brw_context *brw)
> >/* OpenGL dictates that line width should be rounded to the nearest
> > * integer
> > */
> > -  float line_width =
> > - roundf(CLAMP(ctx->Line.Width, 0.0, ctx->Const.MaxLineWidth));
> > > +  float line_width = CLAMP(ctx->Line.Width, 0.0, ctx->Const.MaxLineWidth);
> > > +  if (!ctx->Multisample._Enabled && !ctx->Line.SmoothFlag)
> > > + line_width = CLAMP(roundf(line_width), 0.0, ctx->Const.MaxLineWidth);
> >uint32_t line_width_u3_7 = U_FIXED(line_width, 7);
> > +
> >/* Line width of 0 is not allowed when MSAA enabled */
> >if (ctx->Multisample._Enabled) {
> >   if (line_width_u3_7 == 0)
> > diff --git a/src/mesa/drivers/dri/i965/gen8_sf_state.c 
> > b/src/mesa/drivers/dri/i965/gen8_sf_state.c
> > index 52a21b6..6c765f0 100644
> > --- a/src/mesa/drivers/dri/i965/gen8_sf_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen8_sf_state.c
> > @@ -157,8 +157,9 @@ upload_sf(struct brw_context *brw)
> > /* OpenGL dictates that line width should be rounded to the nearest
> >  * integer
> >  */
> > -   float line_width =
> > -  roundf(CLAMP(ctx->Line.Width, 0.0, ctx->Const.MaxLineWidth));
> > +   float line_width = CLAMP(ctx->Line.Width, 0.0, ctx->Const.MaxLineWidth);
> > +   if (!ctx->Multisample._Enabled && !ctx->Line.SmoothFlag)
> > +  line_width = CLAMP(roundf(line_width), 0.0, ctx->Const.MaxLineWidth);
> > uint32_t line_width_u3_7 = U_FIXED(line_width, 7);
> > if (line_width_u3_7 == 0)
> >line_width_u3_7 = 1;
> > 
> 
> Why not drop the second CLAMP?  Given our curre

Re: [Mesa-dev] [PATCH] i965: do not round line width when multisampling or antialiasing are enabled

2015-06-09 Thread Iago Toral
On Tue, 2015-06-09 at 10:38 -0700, Kenneth Graunke wrote:
> On Tuesday, June 09, 2015 11:29:48 AM Iago Toral wrote:
> > On Tue, 2015-06-09 at 01:49 -0700, Kenneth Graunke wrote:
> > > On Tuesday, June 09, 2015 08:49:54 AM Iago Toral Quiroga wrote:
> > > > In commit fe74fee8fa721a we rounded the line width to the nearest integer to
> > > > match the GLES3 spec requirements stated in section 13.4.2.1, but that seems
> > > > to break a dEQP test that renders wide lines in some multisampling scenarios.
> > > > 
> > > > Ian noted that the OpenGL 4.4 spec has the following similar text:
> > > > 
> > > > "The actual width of non-antialiased lines is determined by rounding the
> > > > supplied width to the nearest integer, then clamping it to the
> > > > implementation-dependent maximum non-antialiased line width."
> > > > 
> > > > and suggested that when ES removed antialiased lines, they removed
> > > > "non-antialiased" from that paragraph but probably should not have.
> > > > 
> > > > Going by that note, this patch restricts the quantization implemented in
> > > > fe74fee8fa721a only to regular aliased lines. This seems to fix the broken
> > > > test without causing regressions in any of the dEQP line rasterization tests
> > > > (dEQP-GLES3.functional.rasterization.*line*).
> > > > 
> > > > Fixes:
> > > > dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.primitives.lines_wide
> > > > 
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90749
> > > > ---
> > > >  src/mesa/drivers/dri/i965/gen6_sf_state.c | 5 +++--
> > > >  src/mesa/drivers/dri/i965/gen7_sf_state.c | 6 --
> > > >  src/mesa/drivers/dri/i965/gen8_sf_state.c | 5 +++--
> > > >  3 files changed, 10 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c 
> > > > b/src/mesa/drivers/dri/i965/gen6_sf_state.c
> > > > index e445ce2..b674268 100644
> > > > --- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
> > > > +++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
> > > > @@ -364,8 +364,9 @@ upload_sf_state(struct brw_context *brw)
> > > >/* OpenGL dictates that line width should be rounded to the 
> > > > nearest
> > > > * integer
> > > > */
> > > > -  float line_width =
> > > > - roundf(CLAMP(ctx->Line.Width, 0.0, ctx->Const.MaxLineWidth));
> > > > > +  float line_width = CLAMP(ctx->Line.Width, 0.0, ctx->Const.MaxLineWidth);
> > > > > +  if (!ctx->Multisample._Enabled && !ctx->Line.SmoothFlag)
> > > > > + line_width = CLAMP(roundf(line_width), 0.0, ctx->Const.MaxLineWidth);
> > > >uint32_t line_width_u3_7 = U_FIXED(line_width, 7);
> > > >  
> > > >/* Line width of 0 is not allowed when MSAA enabled */
> > > > diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c 
> > > > b/src/mesa/drivers/dri/i965/gen7_sf_state.c
> > > > index 58e3337..07b4394 100644
> > > > --- a/src/mesa/drivers/dri/i965/gen7_sf_state.c
> > > > +++ b/src/mesa/drivers/dri/i965/gen7_sf_state.c
> > > > @@ -195,9 +195,11 @@ upload_sf_state(struct brw_context *brw)
> > > >/* OpenGL dictates that line width should be rounded to the 
> > > > nearest
> > > > * integer
> > > > */
> > > > -  float line_width =
> > > > - roundf(CLAMP(ctx->Line.Width, 0.0, ctx->Const.MaxLineWidth));
> > > > > +  float line_width = CLAMP(ctx->Line.Width, 0.0, ctx->Const.MaxLineWidth);
> > > > > +  if (!ctx->Multisample._Enabled && !ctx->Line.SmoothFlag)
> > > > > + line_width = CLAMP(roundf(line_width), 0.0, ctx->Const.MaxLineWidth);
> > > >uint32_t line_width_u3_7 = U_FIXED(line_width, 7);
> > > > +
> > > >/* Line width of 0 is not allowed when MSAA enabled */
> > > >if (ctx->Multisample._Enabled) {
> > > >   if (line_width_u3_7 == 0)
> > > > diff --git a/src/mesa/drivers/dri/i965/gen8_sf_state.c 
> >

Re: [Mesa-dev] [PATCH v2] i965: do not round line width when multisampling or antialiasing are enabled

2015-06-10 Thread Iago Toral
On Wed, 2015-06-10 at 17:03 -0700, Kenneth Graunke wrote:
> On Wednesday, June 10, 2015 09:07:32 AM Iago Toral Quiroga wrote:
> > In commit fe74fee8fa721a we rounded the line width to the nearest integer to
> > match the GLES3 spec requirements stated in section 13.4.2.1, but that seems
> > to break a dEQP test that renders wide lines in some multisampling scenarios.
> > 
> > Ian noted that the OpenGL 4.4 spec has the following similar text:
> > 
> > "The actual width of non-antialiased lines is determined by rounding the
> > supplied width to the nearest integer, then clamping it to the
> > implementation-dependent maximum non-antialiased line width."
> > 
> > and suggested that when ES removed antialiased lines, they removed
> > "non-antialiased" from that paragraph but probably should not have.
> > 
> > Going by that note, this patch restricts the quantization implemented in
> > fe74fee8fa721a only to regular aliased lines. This seems to keep the
> > tests fixed with that commit passing while fixing the broken test.
> > 
> > v2:
> >   - Drop one of the clamps (Ken, Marius)
> >   - Add a rule to prevent advertising line widths that when rounded go 
> > beyond
> > the limits allowed by the hardware (Ken)
> >   - Update comments in the code accordingly (Ian)
> >   - Put the code in a utility function (Ian)
> > 
> > Fixes:
> > dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.primitives.lines_wide
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90749
> 
> Looks good to me.  Thanks for doing this!
> 
> Reviewed-by: Kenneth Graunke 
> 
> By the way, I noticed that Marius' line-width < 1.5 code never got added
> to gen8_sf_state.c, so a couple Piglit tests fail still.  It might be
> nice to put that in the helper function too.  Feel like making a
> follow-up patch?

Sure, I'll take care of that.

Iago
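
The v2 notes above (one clamp, rounding only for plain aliased lines, code in
a utility function) can be sketched roughly as follows. This is not the helper
from the patch: the function name and parameters are hypothetical, and the
rounding is written by hand so the snippet has no libm dependency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns the line width to program, given the
 * requested width, the advertised maximum, and whether multisampling or
 * line smoothing is enabled.
 */
static float
sketch_line_width(float requested, float max_width,
                  bool multisample, bool line_smooth)
{
   assert(max_width >= 0.0f);

   /* Negative widths end up at zero either way, so clamp the low end first. */
   float width = requested < 0.0f ? 0.0f : requested;

   /* Only plain aliased lines are quantized to the nearest integer. */
   if (!multisample && !line_smooth) {
      /* Stand-in for roundf(); adequate here because width is non-negative. */
      width = (float)(int)(width + 0.5f);
   }

   /* A single clamp after rounding keeps the result within the limit. */
   return width > max_width ? max_width : width;
}
```

Clamping after rounding is what makes the second clamp unnecessary: a width
that rounds up past the maximum is still brought back into range.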



Re: [Mesa-dev] [PATCH] i965/gen8: Fix antialiased line rendering with width < 1.5

2015-06-11 Thread Iago Toral
On Thu, 2015-06-11 at 00:18 -0700, Kenneth Graunke wrote:
> On Thursday, June 11, 2015 09:03:37 AM Iago Toral Quiroga wrote:
> > The same fix Marius implemented for gen6 (commit a9b04d8a) and
> > gen7 (commit 24ecf37a).
> > ---
> >  src/mesa/drivers/dri/i965/gen8_sf_state.c | 22 --
> >  1 file changed, 20 insertions(+), 2 deletions(-)
> > 
> > Ken, I don't have gen8 hardware available to test this so it would
> > be nice if someone else verifies that this fixes the piglit tests you
> > mentioned.
> 
> Oh, sorry - I was hoping we could include this in the helper function,
> since it's the exact same code for Gen6/7/8...

Ah right, you did mention that, sorry I forgot about it. I'll send a new
version shortly.

Iago

> > diff --git a/src/mesa/drivers/dri/i965/gen8_sf_state.c 
> > b/src/mesa/drivers/dri/i965/gen8_sf_state.c
> > index 83ef62b..22c1c4a 100644
> > --- a/src/mesa/drivers/dri/i965/gen8_sf_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen8_sf_state.c
> > @@ -156,8 +156,26 @@ upload_sf(struct brw_context *brw)
> > /* _NEW_LINE */
> > float line_width = brw_get_line_width(brw);
> > uint32_t line_width_u3_7 = U_FIXED(line_width, 7);
> > -   if (line_width_u3_7 == 0)
> > -  line_width_u3_7 = 1;
> > +
> > +   /* Line width of 0 is not allowed when MSAA enabled */
> > +   if (ctx->Multisample._Enabled) {
> > +  if (line_width_u3_7 == 0)
> > +  line_width_u3_7 = 1;
> > +   } else if (ctx->Line.SmoothFlag && ctx->Line.Width < 1.5) {
> > +  /* For 1 pixel line thickness or less, the general
> > +   * anti-aliasing algorithm gives up, and a garbage line is
> > +   * generated.  Setting a Line Width of 0.0 specifies the
> > +   * rasterization of the "thinnest" (one-pixel-wide),
> > +   * non-antialiased lines.
> > +   *
> > +   * Lines rendered with zero Line Width are rasterized using
> > +   * Grid Intersection Quantization rules as specified by
> > +   * bspec section 6.3.12.1 Zero-Width (Cosmetic) Line
> > +   * Rasterization.
> > +   */
> > +  line_width_u3_7 = 0;
> > +   }
> > +
> > if (brw->gen >= 9 || brw->is_cherryview) {
> >dw1 |= line_width_u3_7 << GEN9_SF_LINE_WIDTH_SHIFT;
> > } else {
> > 
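
As a reading aid for the hunk above, here is a self-contained sketch of the
same decision in plain C. The names are hypothetical, and the fixed-point
conversion is an assumption about what U_FIXED(value, 7) does (scale by 2^7
and truncate), inferred from the "u3_7" suffix rather than taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed U3.7 conversion: 3 integer bits, 7 fractional bits. */
static uint32_t
to_u3_7(float value)
{
   assert(value >= 0.0f && value < 8.0f);
   return (uint32_t)(value * (1 << 7));
}

/* width is the already clamped/rounded line width, requested is the width
 * the application asked for (ctx->Line.Width in the quoted code).
 */
static uint32_t
sketch_hw_line_width(float width, float requested,
                     bool multisample, bool line_smooth)
{
   uint32_t fixed = to_u3_7(width);

   if (multisample) {
      /* Zero is not allowed with MSAA: use the smallest non-zero width. */
      if (fixed == 0)
         fixed = 1;
   } else if (line_smooth && requested < 1.5f) {
      /* Thin smooth lines: fall back to zero-width (cosmetic) lines. */
      fixed = 0;
   }

   return fixed;
}
```

For example, a smooth line of width 1.2 without MSAA would be programmed as 0,
while the same request with MSAA enabled keeps its fixed-point value.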




Re: [Mesa-dev] [PATCH] nir: Recognize max(min(a, 1.0), 0.0) as fsat(a).

2015-06-11 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Thu, 2015-06-11 at 01:59 -0700, Kenneth Graunke wrote:
> We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected
> this variant (which is also handled by the GLSL IR pass).
> 
> shader-db results on Broadwell:
> total instructions in shared programs: 7363046 -> 7362788 (-0.00%)
> instructions in affected programs: 11928 -> 11670 (-2.16%)
> helped:64
> HURT:  0
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/glsl/nir/nir_opt_algebraic.py | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/glsl/nir/nir_opt_algebraic.py 
> b/src/glsl/nir/nir_opt_algebraic.py
> index eace791..3068445 100644
> --- a/src/glsl/nir/nir_opt_algebraic.py
> +++ b/src/glsl/nir/nir_opt_algebraic.py
> @@ -101,6 +101,7 @@ optimizations = [
> (('umin', a, a), a),
> (('umax', a, a), a),
> (('fmin', ('fmax', a, 0.0), 1.0), ('fsat', a), '!options->lower_fsat'),
> +   (('fmax', ('fmin', a, 1.0), 0.0), ('fsat', a), '!options->lower_fsat'),
> (('fsat', a), ('fmin', ('fmax', a, 0.0), 1.0), 'options->lower_fsat'),
> (('fsat', ('fsat', a)), ('fsat', a)),
> (('fmin', ('fmax', ('fmin', ('fmax', a, 0.0), 1.0), 0.0), 1.0), ('fmin', 
> ('fmax', a, 0.0), 1.0)),




Re: [Mesa-dev] [PATCH] i965: Re-index SSA definitions before printing NIR code.

2015-06-11 Thread Iago Toral
Thanks Ken, this does make debugging NIR code a bit easier :)

Reviewed-by: Iago Toral Quiroga 

On Wed, 2015-06-10 at 02:39 -0700, Kenneth Graunke wrote:
> This makes the SSA definitions use sequential numbers (0, 1, 2, ...)
> instead of seemingly random ones.  There's not much point normally,
> but it makes debug output much easier to read.
> 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_nir.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
> b/src/mesa/drivers/dri/i965/brw_nir.c
> index 142162c..c13708a 100644
> --- a/src/mesa/drivers/dri/i965/brw_nir.c
> +++ b/src/mesa/drivers/dri/i965/brw_nir.c
> @@ -167,6 +167,12 @@ brw_create_nir(struct brw_context *brw,
> nir_validate_shader(nir);
>  
> if (unlikely(debug_enabled)) {
> +  /* Re-index SSA defs so we print more sensible numbers. */
> +  nir_foreach_overload(nir, overload) {
> + if (overload->impl)
> +nir_index_ssa_defs(overload->impl);
> +  }
> +
>fprintf(stderr, "NIR (SSA form) for %s shader:\n",
>_mesa_shader_stage_to_string(stage));
>nir_print_shader(nir, stderr);
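
The effect of re-indexing can be pictured with a toy model in C. Nothing here
is NIR code: the struct and function are hypothetical and only show the idea
of replacing scattered indices with 0, 1, 2, ... in definition order.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an SSA definition; only the index matters here. */
struct toy_ssa_def {
   unsigned index;
};

/* Assigns sequential indices in array order and returns how many were used. */
static unsigned
toy_index_ssa_defs(struct toy_ssa_def *defs, size_t count)
{
   assert(defs != NULL || count == 0);

   unsigned next = 0;
   for (size_t i = 0; i < count; i++)
      defs[i].index = next++;

   return next;
}
```

Definitions that carried indices such as 17, 4 and 42 after earlier passes
would print as 0, 1 and 2 after this step, which is easier to follow in a dump.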




Re: [Mesa-dev] [PATCH 0.5/3] mesa: Handle integer formats in need_rgb_to_luminance_conversion()

2015-06-15 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Thu, 2015-06-11 at 09:58 -0700, Anuj Phogat wrote:
> Signed-off-by: Anuj Phogat 
> ---
>  src/mesa/main/readpix.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
> index df46f83..9166a50 100644
> --- a/src/mesa/main/readpix.c
> +++ b/src/mesa/main/readpix.c
> @@ -54,7 +54,10 @@ need_rgb_to_luminance_conversion(mesa_format texFormat, 
> GLenum format)
> return (baseTexFormat == GL_RG ||
> baseTexFormat == GL_RGB ||
> baseTexFormat == GL_RGBA) &&
> -  (format == GL_LUMINANCE || format == GL_LUMINANCE_ALPHA);
> +  (format == GL_LUMINANCE ||
> +   format == GL_LUMINANCE_ALPHA ||
> +   format == GL_LUMINANCE_INTEGER_EXT ||
> +   format == GL_LUMINANCE_ALPHA_INTEGER_EXT);
>  }
>  
> 
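
The check in the hunk above can be restated as a small standalone sketch. The
enum values and function names are hypothetical placeholders for the GL base
formats, not the real GLenum constants; the logic mirrors the quoted
condition: an RG/RGB/RGBA source read back as any luminance format, integer
variants included, needs the conversion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placeholders for the GL formats named in the patch. */
enum sketch_format {
   FMT_RED,
   FMT_RG,
   FMT_RGB,
   FMT_RGBA,
   FMT_LUMINANCE,
   FMT_LUMINANCE_ALPHA,
   FMT_LUMINANCE_INTEGER,
   FMT_LUMINANCE_ALPHA_INTEGER
};

/* True when the destination format is any luminance flavor. */
static bool
is_luminance_dst(enum sketch_format dst)
{
   return dst == FMT_LUMINANCE ||
          dst == FMT_LUMINANCE_ALPHA ||
          dst == FMT_LUMINANCE_INTEGER ||
          dst == FMT_LUMINANCE_ALPHA_INTEGER;
}

/* True when an RG/RGB/RGBA source is read back as a luminance format. */
static bool
sketch_need_rgb_to_luminance(enum sketch_format src_base,
                             enum sketch_format dst)
{
   bool src_is_color = src_base == FMT_RG ||
                       src_base == FMT_RGB ||
                       src_base == FMT_RGBA;

   return src_is_color && is_luminance_dst(dst);
}
```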




Re: [Mesa-dev] [PATCH 1/3] mesa: Use helper function need_rgb_to_luminance_conversion()

2015-06-15 Thread Iago Toral
Oh, I missed that we had that helper function available for this...

With the patch that adds the integer formats to the helper function this
is:

Reviewed-by: Iago Toral Quiroga 

On Wed, 2015-06-10 at 16:54 -0700, Anuj Phogat wrote:
> Signed-off-by: Anuj Phogat 
> ---
>  src/mesa/main/readpix.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
> index 9166a50..cba9db8 100644
> --- a/src/mesa/main/readpix.c
> +++ b/src/mesa/main/readpix.c
> @@ -421,7 +421,7 @@ read_rgba_pixels( struct gl_context *ctx,
>const struct gl_pixelstore_attrib *packing )
>  {
> GLbitfield transferOps;
> -   bool dst_is_integer, dst_is_luminance, needs_rebase;
> +   bool dst_is_integer, convert_rgb_to_lum, needs_rebase;
> int dst_stride, src_stride, rb_stride;
> uint32_t dst_format, src_format;
> GLubyte *dst, *map;
> @@ -442,10 +442,7 @@ read_rgba_pixels( struct gl_context *ctx,
> dst_is_integer = _mesa_is_enum_format_integer(format);
> dst_stride = _mesa_image_row_stride(packing, width, format, type);
> dst_format = _mesa_format_from_format_and_type(format, type);
> -   dst_is_luminance = format == GL_LUMINANCE ||
> -  format == GL_LUMINANCE_ALPHA ||
> -  format == GL_LUMINANCE_INTEGER_EXT ||
> -  format == GL_LUMINANCE_ALPHA_INTEGER_EXT;
> +   convert_rgb_to_lum = need_rgb_to_luminance_conversion(rb->Format, format);
> dst = (GLubyte *) _mesa_image_address2d(packing, pixels, width, height,
> format, type, 0, 0);
>  
> @@ -493,7 +490,7 @@ read_rgba_pixels( struct gl_context *ctx,
>  */
> assert(!transferOps || (transferOps && !dst_is_integer));
>  
> -   needs_rgba = transferOps || dst_is_luminance;
> +   needs_rgba = transferOps || convert_rgb_to_lum;
> rgba = NULL;
> if (needs_rgba) {
>uint32_t rgba_format;
> @@ -566,7 +563,7 @@ read_rgba_pixels( struct gl_context *ctx,
>  * If the dst format is Luminance, we need to do the conversion by 
> computing
>  * L=R+G+B values.
>  */
> -   if (!dst_is_luminance) {
> +   if (!convert_rgb_to_lum) {
>_mesa_format_convert(dst, dst_format, dst_stride,
> src, src_format, src_stride,
> width, height,




Re: [Mesa-dev] [PATCH 2/9] i965: Fix textureGrad with cube samplers

2015-06-17 Thread Iago Toral
On Tue, 2015-06-16 at 23:52 -0700, Ben Widawsky wrote:
> On Tue, Feb 24, 2015 at 07:02:50PM +0100, Eduardo Lima Mitev wrote:
> > From: Iago Toral Quiroga 
> > 
> > We can't use sampler messages with gradient information (like
> > sample_g or sample_d) to deal with this scenario because according
> > to the PRM:
> > 
> > "The r coordinate and its gradients are required only for surface
> > types that use the third coordinate. Usage of this message type on
> > cube surfaces assumes that the u, v, and gradients have already been
> > transformed onto the appropriate face, but still in [-1,+1] range.
> > The r coordinate contains the faceid, and the r gradients are ignored
> > by hardware."
> > 
> > Instead, we should lower this to compute the LOD manually based on the
> > gradients and use a different sample message that takes the computed
> > LOD instead of the gradients. This is already being done in
> > brw_lower_texture_gradients.cpp, but it is restricted to shadow
> > samplers only, although there is a comment stating that we should
> > probably do this also for samplerCube and samplerCubeArray.
> > 
> > Because of this, both dEQP and Piglit test cases for textureGrad with
> > cube maps currently fail.
> > 
> > This patch does two things:
> > 1) Activates the texturegrad lowering pass for all cube samplers.
> > 2) Corrects the computation of the LOD value for cube samplers.
> > 
> > I had to do 2) because for cube maps the calculations implemented
> > in the lowering pass always compute a value of rho that is twice
> > the value we want (so we get a LOD value one unit larger than we
> > want). This only happens for cube map samplers (all kinds). I am
> > not sure about why we need to do this, but I suspect that it is
> > related to the fact that cube map coordinates, when transported
> > to a specific face in the cube, are in the range [-1, 1] instead of
> > [0, 1] so we probably need to divide the derivatives by 2 when
> > we compute the LOD. Doing that would produce the same result as
> > dividing the final rho computation by 2 (or removing a unit
> > from the computed LOD, which is what we are doing here).
> > 
> > Fixes the following piglit tests:
> > bin/tex-miplevel-selection textureGrad Cube -auto -fbo
> > bin/tex-miplevel-selection textureGrad CubeArray -auto -fbo
> > bin/tex-miplevel-selection textureGrad CubeShadow -auto -fbo
> > 
> > Fixes 10 dEQP tests in the following category:
> > dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*cube*
> 
> What happened to this patch? It seems like we still need/want it, and it
> seemed like people had been looking at it. Was it just missing a formal review?

Yes, I think nobody has given it a formal Reviewed-by yet.

Iago
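
The suspicion in the commit message about the factor of two can be written
out. This is only a sketch of the argument, assuming the LOD is the base-2
logarithm of rho and that rho scales linearly with the coordinate derivatives.
The symbols s, u, rho_s, rho_u and lambda are introduced here for illustration
and do not appear in the patch.

```latex
% s is a cube-face coordinate in [-1, 1]; u is the same position in [0, 1].
u = \frac{s + 1}{2}
\quad\Longrightarrow\quad
\frac{\partial u}{\partial x} = \frac{1}{2}\,\frac{\partial s}{\partial x}

% Halving every derivative halves rho, which lowers the LOD by exactly one.
\rho_u = \frac{\rho_s}{2}
\quad\Longrightarrow\quad
\lambda_u = \log_2 \rho_u = \log_2 \rho_s - 1
```

Under those assumptions, dividing the derivatives by 2, dividing rho by 2 and
subtracting 1 from the computed LOD all give the same result, which matches
the equivalence described in the commit message.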

> > ---
> >  .../dri/i965/brw_lower_texture_gradients.cpp   | 26 
> > +++---
> >  1 file changed, 18 insertions(+), 8 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp 
> > b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
> > index 9679d28..878a54e 100644
> > --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
> > @@ -89,19 +89,18 @@ txs_type(const glsl_type *type)
> >  ir_visitor_status
> >  lower_texture_grad_visitor::visit_leave(ir_texture *ir)
> >  {
> > -   /* Only lower textureGrad with shadow samplers */
> > -   if (ir->op != ir_txd || !ir->shadow_comparitor)
> > +   /* Only lower textureGrad with cube maps or shadow samplers */
> > +   if (ir->op != ir_txd ||
> > +  (ir->sampler->type->sampler_dimensionality != GLSL_SAMPLER_DIM_CUBE 
> > &&
> > +   !ir->shadow_comparitor))
> >return visit_continue;
> >  
> 
> Do you need this for GLSL_SAMPLER_DIM_3D? It seems to fit with the PRM blurb
> about "use the third coordinate"
> 
> > -   /* Lower textureGrad() with samplerCubeShadow even if we have the 
> > sample_d_c
> > +   /* Lower textureGrad() with samplerCube* even if we have the sample_d_c
> >  * message.  GLSL provides gradients for the 'r' coordinate.  
> > Unfortunately:
> >  *
> >  * From the Ivybridge PRM, Volume 4, Part 1, sample_d message 
> > description:
> >  * "The r coordinate contains the faceid, and the r gradients are 
> > ignored
> >  *  by hardware."
> > -*
> > -* We likely need to do a similar treatment for samplerCube and

Re: [Mesa-dev] [PATCH 2/9] i965: Fix textureGrad with cube samplers

2015-06-17 Thread Iago Toral
On Tue, 2015-06-16 at 23:52 -0700, Ben Widawsky wrote:
> On Tue, Feb 24, 2015 at 07:02:50PM +0100, Eduardo Lima Mitev wrote:
> > From: Iago Toral Quiroga 
> > 
> > We can't use sampler messages with gradient information (like
> > sample_g or sample_d) to deal with this scenario because according
> > to the PRM:
> > 
> > "The r coordinate and its gradients are required only for surface
> > types that use the third coordinate. Usage of this message type on
> > cube surfaces assumes that the u, v, and gradients have already been
> > transformed onto the appropriate face, but still in [-1,+1] range.
> > The r coordinate contains the faceid, and the r gradients are ignored
> > by hardware."
> > 
> > Instead, we should lower this to compute the LOD manually based on the
> > gradients and use a different sample message that takes the computed
> > LOD instead of the gradients. This is already being done in
> > brw_lower_texture_gradients.cpp, but it is restricted to shadow
> > samplers only, although there is a comment stating that we should
> > probably do this also for samplerCube and samplerCubeArray.
> > 
> > Because of this, both dEQP and Piglit test cases for textureGrad with
> > cube maps currently fail.
> > 
> > This patch does two things:
> > 1) Activates the texturegrad lowering pass for all cube samplers.
> > 2) Corrects the computation of the LOD value for cube samplers.
> > 
> > I had to do 2) because for cube maps the calculations implemented
> > in the lowering pass always compute a value of rho that is twice
> > the value we want (so we get a LOD value one unit larger than we
> > want). This only happens for cube map samplers (all kinds). I am
> > not sure about why we need to do this, but I suspect that it is
> > related to the fact that cube map coordinates, when transported
> > to a specific face in the cube, are in the range [-1, 1] instead of
> > [0, 1] so we probably need to divide the derivatives by 2 when
> > we compute the LOD. Doing that would produce the same result as
> > dividing the final rho computation by 2 (or removing a unit
> > from the computed LOD, which is what we are doing here).
> > 
> > Fixes the following piglit tests:
> > bin/tex-miplevel-selection textureGrad Cube -auto -fbo
> > bin/tex-miplevel-selection textureGrad CubeArray -auto -fbo
> > bin/tex-miplevel-selection textureGrad CubeShadow -auto -fbo
> > 
> > Fixes 10 dEQP tests in the following category:
> > dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*cube*
> 
> What happened to this patch? It seems like we still need/want it, and it seemed
> like people had been looking at it. Was it just missing a formal review?
> 
> > ---
> >  .../dri/i965/brw_lower_texture_gradients.cpp   | 26 +++---
> >  1 file changed, 18 insertions(+), 8 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
> > index 9679d28..878a54e 100644
> > --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
> > @@ -89,19 +89,18 @@ txs_type(const glsl_type *type)
> >  ir_visitor_status
> >  lower_texture_grad_visitor::visit_leave(ir_texture *ir)
> >  {
> > -   /* Only lower textureGrad with shadow samplers */
> > -   if (ir->op != ir_txd || !ir->shadow_comparitor)
> > +   /* Only lower textureGrad with cube maps or shadow samplers */
> > +   if (ir->op != ir_txd ||
> > +  (ir->sampler->type->sampler_dimensionality != GLSL_SAMPLER_DIM_CUBE &&
> > +   !ir->shadow_comparitor))
> >return visit_continue;
> >  
> 
> Do you need this for GLSL_SAMPLER_DIM_3D? It seems to fit with the PRM blurb
> about "use the third coordinate"

I did not identify the need to do this for that kind, at least I did not
find any tests that failed in either deqp or piglit.
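
FWIW, the arithmetic behind point 2) of the commit message quoted above can be
sketched like this. It is only an illustration and assumes the usual
LOD = log2(rho) relation; lod_from_rho() and cube_lod_from_rho() are made-up
names, not functions from brw_lower_texture_gradients.cpp:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not Mesa code): LOD derived from the scale factor rho.
static inline float lod_from_rho(float rho)
{
   assert(rho > 0.0f);
   return std::log2(rho);
}

// Hypothetical helper: if the per-face cube coordinates span [-1, 1] (length 2)
// instead of [0, 1], the derivatives and therefore rho come out twice as large.
// Halving rho is the same as removing one unit from the LOD:
//    log2(rho / 2) = log2(rho) - 1
static inline float cube_lod_from_rho(float rho)
{
   return lod_from_rho(rho) - 1.0f;
}
```

So cube_lod_from_rho(8.0f) should agree with lod_from_rho(4.0f), which is the
equivalence the commit message relies on.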

> > -   /* Lower textureGrad() with samplerCubeShadow even if we have the sample_d_c
> > +   /* Lower textureGrad() with samplerCube* even if we have the sample_d_c
> >  * message.  GLSL provides gradients for the 'r' coordinate.  Unfortunately:
> >  *
> >  * From the Ivybridge PRM, Volume 4, Part 1, sample_d message description:
> >  * "The r coordinate contains the faceid, and the r gradients are ignored
> >  *  by hardware."
> > -*
> > -

Re: [Mesa-dev] [PATCH] nir: add helper to get # of src/dest components

2015-06-17 Thread Iago Toral
On Mon, 2015-06-08 at 15:45 -0400, Rob Clark wrote:
> From: Rob Clark 
> 
> I need something like this in a couple places.  And didn't see anything
> like it anywhere.

We ended up doing something similar in our work-in-progress nir/vec4
pass, it makes the code a bit easier to read, so if nobody else has
objections this is:

Reviewed-by: Iago Toral Quiroga 

> Signed-off-by: Rob Clark 
> ---
> v2: Added similar helper for nir_src, and cleaned up a few places that
> open coded this.  There are a couple left (such as validate_alu_src())
> but that handle is_packed differently so I thought it best to leave
> them as-is.


A quick grep suggests that nobody is ever setting is_packed to true...
the only code that reads is_packed is in nir_validate and it is there
only to avoid doing validations on the number of components when
is_packed is set to true, so I guess it is okay to keep that code as it
is.

Iago

>  src/glsl/nir/nir.h  | 18 ++
>  src/glsl/nir/nir_from_ssa.c | 10 ++
>  src/glsl/nir/nir_validate.c |  4 +---
>  3 files changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index 697d37e..06bbb0c 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -541,6 +541,24 @@ typedef struct {
>  #define nir_foreach_def_safe(reg, dest) \
> list_for_each_entry_safe(nir_dest, dest, &(reg)->defs, reg.def_link)
>  
> +static inline unsigned
> +nir_dest_num_components(nir_dest *dest)
> +{
> +   if (dest->is_ssa)
> +  return dest->ssa.num_components;
> +   else
> +  return dest->reg.reg->num_components;
> +}
> +
> +static inline unsigned
> +nir_src_num_components(nir_src *src)
> +{
> +   if (src->is_ssa)
> +  return src->ssa->num_components;
> +   else
> +  return src->reg.reg->num_components;
> +}
> +
>  static inline nir_src
>  nir_src_for_ssa(nir_ssa_def *def)
>  {
> diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c
> index 67733e6..23c798d 100644
> --- a/src/glsl/nir/nir_from_ssa.c
> +++ b/src/glsl/nir/nir_from_ssa.c
> @@ -553,10 +553,7 @@ emit_copy(nir_parallel_copy_instr *pcopy, nir_src src, nir_src dest_src,
>dest_src.reg.indirect == NULL &&
>dest_src.reg.base_offset == 0);
>  
> -   if (src.is_ssa)
> -  assert(src.ssa->num_components >= dest_src.reg.reg->num_components);
> -   else
> -  assert(src.reg.reg->num_components >= dest_src.reg.reg->num_components);
> +   assert(nir_src_num_components(&src) == nir_src_num_components(&dest_src));
>  
> nir_alu_instr *mov = nir_alu_instr_create(mem_ctx, nir_op_imov);
> nir_src_copy(&mov->src[0].src, &src, mem_ctx);
> @@ -712,10 +709,7 @@ resolve_parallel_copy(nir_parallel_copy_instr *pcopy,
>nir_register *reg = nir_local_reg_create(state->impl);
>reg->name = "copy_temp";
>reg->num_array_elems = 0;
> -  if (values[b].is_ssa)
> - reg->num_components = values[b].ssa->num_components;
> -  else
> - reg->num_components = values[b].reg.reg->num_components;
> +  reg->num_components = nir_src_num_components(&values[b]);
>values[num_vals].is_ssa = false;
>values[num_vals].reg.reg = reg;
>  
> diff --git a/src/glsl/nir/nir_validate.c b/src/glsl/nir/nir_validate.c
> index da92ed9..c781912 100644
> --- a/src/glsl/nir/nir_validate.c
> +++ b/src/glsl/nir/nir_validate.c
> @@ -262,9 +262,7 @@ validate_dest(nir_dest *dest, validate_state *state)
>  static void
>  validate_alu_dest(nir_alu_dest *dest, validate_state *state)
>  {
> -   unsigned dest_size =
> -  dest->dest.is_ssa ? dest->dest.ssa.num_components
> -: dest->dest.reg.reg->num_components;
> +   unsigned dest_size = nir_dest_num_components(&dest->dest);
> bool is_packed = !dest->dest.is_ssa && dest->dest.reg.reg->is_packed;
> /*
>  * validate that the instruction doesn't write to components not in the


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 23/82] glsl: Do not do CSE for expressions involving SSBO loads

2015-06-17 Thread Iago Toral
On Wed, 2015-06-17 at 17:20 -0700, Jordan Justen wrote:
> I wanted to question whether this was required, based on this text
> from the extension spec:
> 
> "The ability to write to buffer objects creates the potential for
>  multiple independent shader invocations to read and write the same
>  underlying memory. The same issue exists with the
>  ARB_shader_image_load_store extension provided in OpenGL 4.2, which
>  can write to texture objects and buffers. In both cases, the
>  specification makes few guarantees related to the relative order of
>  memory reads and writes performed by the shader invocations."
> 
> But I'm not sure if we can reconcile CSE with 'memoryBarrier' and
> 'barrier'. curro, any thoughts from image load/store?

I think the problem is within the same thread, that text above talks
about multiple invocations reading from and writing to the same
location, but within the same invocation, the order of reads and writes
must be preserved:

"Buffer variable memory reads and writes within a single shader
invocation are processed in order.  However, the order of reads and
writes performed in one invocation relative to those performed by
another invocation is largely undefined."

For example, if X is a shader storage buffer variable and we have code
like this with just one invocation:

ssbo_store(X, 1);
a = ssbo_load(X) + 1; // a = 2
ssbo_store(X, 2);
b = ssbo_load(X) + 1; // b = 3

CSE could mess it up like this:

ssbo_store(X, 1);
tmp = ssbo_load(X) + 1; // tmp = 2
a = tmp;
ssbo_store(X, 2);
b = tmp;

which would be incorrect. I think I wrote this patch after seeing
something like this happening. The CSE pass clearly states that it does
not support write variables after all.
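
To make the difference concrete, the two sequences above can be replayed as
plain C++. This is just a toy model with a single invocation: the Ssbo struct
and the ssbo_store()/ssbo_load() helpers are stand-ins that mirror the
pseudocode, not anything from the GLSL IR:

```cpp
#include <cassert>

// Stand-in for one shader storage buffer location (hypothetical, not Mesa IR).
struct Ssbo {
   int x = 0;
};

static inline void ssbo_store(Ssbo &s, int v) { s.x = v; }
static inline int ssbo_load(const Ssbo &s) { return s.x; }

struct Result {
   int a;
   int b;
};

// Reads and writes processed in program order, as the spec requires
// within a single invocation.
static inline Result run_in_order()
{
   Ssbo X;
   Result r;
   ssbo_store(X, 1);
   r.a = ssbo_load(X) + 1;
   ssbo_store(X, 2);
   r.b = ssbo_load(X) + 1;
   return r;
}

// What a CSE pass would produce if it treated both loads as the same
// expression: the second read reuses a value computed before the store.
static inline Result run_with_bad_cse()
{
   Ssbo X;
   Result r;
   ssbo_store(X, 1);
   int tmp = ssbo_load(X) + 1;
   r.a = tmp;
   ssbo_store(X, 2);
   r.b = tmp; /* stale: ignores the store of 2 */
   return r;
}
```

run_in_order() ends with b one larger than a, while run_with_bad_cse() leaves
both at the same value, which is the miscompilation described above.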

Also, notice the same would apply if there are multiple invocations but
the shader code used something like gl_VertexID or gl_FragCoord to make
each invocation read from/write to a different address within the SSBO
buffer (I imagine this is the usual way to operate with SSBOs). In these
cases, even if we have multiple invocations, keeping the relative order
of reads and writes within each one is necessary.

Iago

> -Jordan
> 
> On 2015-06-03 00:01:13, Iago Toral Quiroga wrote:
> > SSBOs are read/write and this CSE pass only handles read-only variables.
> > ---
> >  src/glsl/opt_cse.cpp | 33 -
> >  1 file changed, 32 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/glsl/opt_cse.cpp b/src/glsl/opt_cse.cpp
> > index 4b8e9a0..a05ab46 100644
> > --- a/src/glsl/opt_cse.cpp
> > +++ b/src/glsl/opt_cse.cpp
> > @@ -245,6 +245,28 @@ contains_rvalue(ir_rvalue *haystack, ir_rvalue *needle)
> >  }
> >  
> >  static bool
> > +expression_contains_ssbo_load(ir_expression *expr)
> > +{
> > +   if (expr->operation == ir_binop_ssbo_load)
> > +  return true;
> > +
> > +   for (unsigned i = 0; i < expr->get_num_operands(); i++) {
> > +  ir_rvalue *op = expr->operands[i];
> > +  if (op->ir_type == ir_type_expression &&
> > +  expression_contains_ssbo_load(op->as_expression())) {
> > + return true;
> > +  } else if (op->ir_type == ir_type_swizzle) {
> > + ir_swizzle *swizzle = op->as_swizzle();
> > + ir_expression *val = swizzle->val->as_expression();
> > + if (val && expression_contains_ssbo_load(val))
> > +return true;
> > +  }
> > +   }
> > +
> > +   return false;
> > +}
> > +
> > +static bool
> >  is_cse_candidate(ir_rvalue *ir)
> >  {
> > /* Our temporary variable assignment generation isn't ready to handle
> > @@ -260,7 +282,16 @@ is_cse_candidate(ir_rvalue *ir)
> >  * to variable-index array dereferences at some point.
> >  */
> > switch (ir->ir_type) {
> > -   case ir_type_expression:
> > +   case ir_type_expression: {
> > + /* Skip expressions involving SSBO loads, since these operate on
> > +  * read-write variables, meaning that the same ssbo_load expression
> > +  * may return a different value if the underlying buffer storage
> > +  * is written in between.
> > +  */
> > + if (expression_contains_ssbo_load(ir->as_expression()))
> > +return false;
> > +  }
> > +  break;
> > case ir_type_texture:
> >break;
> > default:
> > -- 
> > 1.9.1
> > 
> 


Re: [Mesa-dev] [PATCH 02/14] meta: Fix transfer operations check in meta pbo path for readpixels

2015-06-18 Thread Iago Toral
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
> Without this patch, arb_color_buffer_float-readpixels test fails, when
> forced to use meta pbo path.
> 
> Signed-off-by: Anuj Phogat 
> Cc: 
> ---
>  src/mesa/drivers/common/meta_tex_subimage.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c
> index d2474f5..00364f8 100644
> --- a/src/mesa/drivers/common/meta_tex_subimage.c
> +++ b/src/mesa/drivers/common/meta_tex_subimage.c
> @@ -273,12 +273,14 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims,
> format == GL_COLOR_INDEX)
>return false;
>  
> -   if (ctx->_ImageTransferState)
> -  return false;
> -
> -

That test uses glReadPixels so it should call this with tex_image set to
NULL and it should flow through the 'if' block you have below. The call to
_mesa_get_readpixels_transfer_ops that you add below looks like it does
what we want for a pixel read from a framebuffer instead of simply
checking ctx->_ImageTransferState directly. I suppose this is what fixes
the test, right?

The patch also removes the ctx->_ImageTransferState check for the case
where we are reading from a real texture (tex_image != NULL), that seems
unrelated to fixing arb_color_buffer_float-readpixels... Looking at the
texture read code from getteximage.c it seems like this should be fine
since that file does not seem to use that field for anything either, so
I guess the check might not be valid in this case.

I think it would be nice if you updated the changelog to explain these
things.

Iago

> +   /* Don't use meta path for readpixels in below conditions. */
> if (!tex_image) {
>rb = ctx->ReadBuffer->_ColorReadBuffer;
> +
> +  if (_mesa_get_readpixels_transfer_ops(ctx, rb->Format, format,
> +type, GL_FALSE))
> + return false;
> +
>if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format))
>   return false;
> }


Re: [Mesa-dev] [PATCH 03/14] mesa: Fix conditions to test signed, unsigned integer format

2015-06-18 Thread Iago Toral
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
> Signed-off-by: Anuj Phogat 
> Cc: 
> ---
>  src/mesa/main/readpix.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
> index caa2648..a9416ef 100644
> --- a/src/mesa/main/readpix.c
> +++ b/src/mesa/main/readpix.c
> @@ -160,10 +160,12 @@ _mesa_readpixels_needs_slow_path(const struct gl_context *ctx, GLenum format,
>srcType = _mesa_get_format_datatype(rb->Format);
>  
>if ((srcType == GL_INT &&
> +   _mesa_is_enum_format_integer(format) &&
> (type == GL_UNSIGNED_INT ||
>  type == GL_UNSIGNED_SHORT ||
>  type == GL_UNSIGNED_BYTE)) ||
>(srcType == GL_UNSIGNED_INT &&
> +   _mesa_is_enum_format_integer(format) &&
> (type == GL_INT ||
>  type == GL_SHORT ||
>  type == GL_BYTE))) {

As far as I understand this code we are trying to see if we can use
memcpy to directly copy the contents of the framebuffer to the
destination buffer. In that case, as long as the src/dst types have
different sign we can't just use memcpy, right? In fact it looks like we
might need to expand the checks to include the cases where srcType is
GL_(UNSIGNED_)SHORT and GL_(UNSIGNED_)BYTE as well.
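
A tiny example of what I mean, using a single 8-bit component. The helper
names are made up for illustration, and convert_clamped() assumes that the
value-preserving conversion clamps to the range of the destination type:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// What a memcpy fast path produces: the bit pattern is copied untouched.
static inline uint8_t copy_raw(int8_t src)
{
   uint8_t dst;
   std::memcpy(&dst, &src, sizeof(dst));
   return dst;
}

// What a conversion path would produce, assuming (for this sketch) that
// out-of-range values are clamped to the destination range.
static inline uint8_t convert_clamped(int8_t src)
{
   return src < 0 ? 0 : static_cast<uint8_t>(src);
}
```

For non-negative values both paths agree, but for a negative source value
copy_raw() yields a large unsigned number while convert_clamped() yields 0,
so the two paths are not interchangeable once the signs differ.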

That said, I think this code is not necessary with the call to
_mesa_format_matches_format_and_type that we do immediately after this,
since that will check that the framebuffer format exactly matches the
destination format anyway, which is a much tighter check than this. In
fact, a quick piglit run without these checks does not seem to break any
tests on i965. Gallium uses these two functions in a slightly different
way in st_cb_readpixels.c though, so I wonder if their use case requires
these checks to exist in this function anyway.

Iago

Re: [Mesa-dev] [PATCH 03/14] mesa: Fix conditions to test signed, unsigned integer format

2015-06-18 Thread Iago Toral
On Thu, 2015-06-18 at 09:19 -0700, Anuj Phogat wrote:
> On Thu, Jun 18, 2015 at 7:09 AM, Iago Toral  wrote:
> > On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
> >> Signed-off-by: Anuj Phogat 
> >> Cc: 
> >> ---
> >>  src/mesa/main/readpix.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
> >> index caa2648..a9416ef 100644
> >> --- a/src/mesa/main/readpix.c
> >> +++ b/src/mesa/main/readpix.c
> >> @@ -160,10 +160,12 @@ _mesa_readpixels_needs_slow_path(const struct gl_context *ctx, GLenum format,
> >>srcType = _mesa_get_format_datatype(rb->Format);
> >>
> >>if ((srcType == GL_INT &&
> >> +   _mesa_is_enum_format_integer(format) &&
> >> (type == GL_UNSIGNED_INT ||
> >>  type == GL_UNSIGNED_SHORT ||
> >>  type == GL_UNSIGNED_BYTE)) ||
> >>(srcType == GL_UNSIGNED_INT &&
> >> +   _mesa_is_enum_format_integer(format) &&
> >> (type == GL_INT ||
> >>  type == GL_SHORT ||
> >>  type == GL_BYTE))) {
> >
> > As far as I understand this code we are trying to see if we can use
> > memcpy to directly copy the contents of the framebuffer to the
> > destination buffer. In that case, as long as the src/dst types have
> > different sign we can't just use memcpy, right? In fact it looks like we
> > might need to expand the checks to include the cases where srcType is
> > GL_(UNSIGNED_)SHORT and GL_(UNSIGNED_)BYTE as well.
> >
> srcType returned by _mesa_get_format_datatype() is one of:
> GL_UNSIGNED_NORMALIZED
> GL_SIGNED_NORMALIZED
> GL_UNSIGNED_INT
> GL_INT
> GL_FLOAT
> So, the suggested checks for srcType are not required.

Oh, right, although I think that does not invalidate my point: can we
memcpy from a GL_UNSIGNED_NORMALIZED to a format with type GL_FLOAT or
GL_SIGNED_NORMALIZED? It does not look like these checks here are
thorough.

In any case, that's beyond the point of your patch. Talking specifically
about your patch: can we memcpy, for example, from a _signed_ integer
format like MESA_FORMAT_R_SINT8 to an _unsigned_ format (integer or
not)? I don't think we can, in which case your patch would not look
correct to me.

Also, as I said in my previous e-mail, I feel like these checks here do
not add anything, at least when this function is called from
readpixels_can_use_memcpy, since even if we return true here, we will
later call _mesa_format_matches_format_and_type and that would check
that the formats match anyway before going through the memcpy path. Even
the other use of this function in Gallium would call
_mesa_format_matches_format_and_type before it calls this... That's why
I think we probably want to remove these checks from this function and
rely on _mesa_format_matches_format_and_type exclusively to check
allowed formats and types.

> > That said, I think this code is not necessary with the call to
> > _mesa_format_matches_format_and_type that we do immediately after this,
> > since that will check that the framebuffer format exactly matches the
> > destination format anyway, which is a much tighter check than this. In
> > fact, a quick piglit run without these checks does not seem to break any
> > tests on i965. Gallium uses these two functions in a slightly different
> > way in st_cb_readpixels.c though, so I wonder if their use case requires
> > these checks to exist in this function anyway.
> >
> > Iago
> >
> 


Re: [Mesa-dev] [PATCH 01/17] i965/fs: Use a switch statement in fs_inst::regs_read()

2015-06-19 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Thu, 2015-06-18 at 17:50 -0700, Jason Ekstrand wrote:
> This makes things a little simpler, more efficient, and quite a bit more
> readable.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 45 ++--
>  1 file changed, 23 insertions(+), 22 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 5563c5a..37b6d0d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -744,28 +744,29 @@ fs_inst::is_partial_write() const
>  int
>  fs_inst::regs_read(int arg) const
>  {
> -   if (is_tex() && arg == 0 && src[0].file == GRF) {
> -  return mlen;
> -   } else if (opcode == FS_OPCODE_FB_WRITE && arg == 0) {
> -  return mlen;
> -   } else if (opcode == SHADER_OPCODE_URB_WRITE_SIMD8 && arg == 0) {
> -  return mlen;
> -   } else if (opcode == SHADER_OPCODE_UNTYPED_ATOMIC && arg == 0) {
> -  return mlen;
> -   } else if (opcode == SHADER_OPCODE_UNTYPED_SURFACE_READ && arg == 0) {
> -  return mlen;
> -   } else if (opcode == SHADER_OPCODE_UNTYPED_SURFACE_WRITE && arg == 0) {
> -  return mlen;
> -   } else if (opcode == SHADER_OPCODE_TYPED_ATOMIC && arg == 0) {
> -  return mlen;
> -   } else if (opcode == SHADER_OPCODE_TYPED_SURFACE_READ && arg == 0) {
> -  return mlen;
> -   } else if (opcode == SHADER_OPCODE_TYPED_SURFACE_WRITE && arg == 0) {
> -  return mlen;
> -   } else if (opcode == FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET && arg == 0) {
> -  return mlen;
> -   } else if (opcode == FS_OPCODE_LINTERP && arg == 0) {
> -  return exec_size / 4;
> +   switch (opcode) {
> +   case FS_OPCODE_FB_WRITE:
> +   case SHADER_OPCODE_URB_WRITE_SIMD8:
> +   case SHADER_OPCODE_UNTYPED_ATOMIC:
> +   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
> +   case SHADER_OPCODE_UNTYPED_SURFACE_WRITE:
> +   case SHADER_OPCODE_TYPED_ATOMIC:
> +   case SHADER_OPCODE_TYPED_SURFACE_READ:
> +   case SHADER_OPCODE_TYPED_SURFACE_WRITE:
> +   case FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET:
> +  if (arg == 0)
> + return mlen;
> +  break;
> +
> +   case FS_OPCODE_LINTERP:
> +  if (arg == 0)
> + return exec_size / 4;
> +  break;
> +
> +   default:
> +  if (is_tex() && arg == 0 && src[0].file == GRF)
> + return mlen;
> +  break;
> }
>  
> switch (src[arg].file) {


Re: [Mesa-dev] [PATCH 02/17] i965/fs: Fix fs_inst::regs_read() for uniform pull constant loads

2015-06-19 Thread Iago Toral
On Thu, 2015-06-18 at 17:50 -0700, Jason Ekstrand wrote:
> Previously, fs_inst::regs_read() fell back to depending on the register
> width for the second source.  This isn't really correct since it isn't a
> SIMD8 value at all, but a SIMD4x2 value.  This commit changes it to
> explicitly be always one register.

Right, in fact we were incorrectly computing a size of 2 registers in
SIMD16 executions because of this.
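
To spell out where the 2 comes from, here is a rough model of a width-based
fallback. It assumes the fallback rounds width * stride * type size up to whole
32-byte registers; regs_from_width() and regs_for_pull_constant_offset() are
hypothetical names, not the actual fs_inst::regs_read() code:

```cpp
#include <cassert>

// Hypothetical model of a width-based fallback: bytes touched by the source,
// rounded up to whole 32-byte GRF registers.
static inline int regs_from_width(int width, int stride, int type_size)
{
   assert(width > 0 && stride > 0 && type_size > 0);
   const int grf_bytes = 32;
   return (width * stride * type_size + grf_bytes - 1) / grf_bytes;
}

// With the patch, the SIMD4x2 offset operand is reported as exactly one
// register, independent of the dispatch width.
static inline int regs_for_pull_constant_offset()
{
   return 1;
}
```

With a 4-byte type and stride 1, the width-based model gives 1 register for a
width of 8 but 2 registers for a width of 16, whereas the operand really
occupies a single register in both cases.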

Reviewed-by: Iago Toral Quiroga 

> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 37b6d0d..ce56657 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -763,6 +763,12 @@ fs_inst::regs_read(int arg) const
>   return exec_size / 4;
>break;
>  
> +   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7:
> +  /* The second argument is a single SIMD4x2 register */
> +  if (arg == 1)
> + return 1;
> +  break;
> +
> default:
>if (is_tex() && arg == 0 && src[0].file == GRF)
>   return mlen;


Re: [Mesa-dev] [PATCH 03/17] i965/fs: Report the right value in fs_inst::regs_read() for PIXEL_X/Y

2015-06-19 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Thu, 2015-06-18 at 17:50 -0700, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index ce56657..4f98d63 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -769,6 +769,12 @@ fs_inst::regs_read(int arg) const
>   return 1;
>break;
>  
> +   case FS_OPCODE_PIXEL_X:
> +   case FS_OPCODE_PIXEL_Y:
> +  if (arg == 0)
> + return 2;
> +  break;
> +
> default:
>if (is_tex() && arg == 0 && src[0].file == GRF)
>   return mlen;


Re: [Mesa-dev] [PATCH 04/17] i965/fs: Explicitly set the exec_size on the add(32) in interpolation setup

2015-06-19 Thread Iago Toral
On Thu, 2015-06-18 at 17:50 -0700, Jason Ekstrand wrote:
> Soon we will start using the builder to explicitly set all the execution
> sizes.  We could make a 32-wide builder, but the builder asserts that we
> never grow it which is usually a reasonable assumption.  Since this one
> instruction is a bit of an odd-ball, we just set the exec_size explicitly.

So if I understand it right, the only point of this change is making
explicit that this instruction has a different execution size to ensure
that we notice it when we rewrite the code to set explicit execution
sizes with the new builder, right?

> ---
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 4770838..b00825e 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -1357,10 +1357,11 @@ fs_visitor::emit_interpolation_setup_gen6()
> */
>fs_reg int_pixel_xy(GRF, alloc.allocate(dispatch_width / 8),
>BRW_REGISTER_TYPE_UW, dispatch_width * 2);
> -  abld.exec_all()
> -  .ADD(int_pixel_xy,
> -   fs_reg(stride(suboffset(g1_uw, 4), 1, 4, 0)),
> -   fs_reg(brw_imm_v(0x11001010)));
> +  fs_inst *add = abld.exec_all()
> + .ADD(int_pixel_xy,
> +  fs_reg(stride(suboffset(g1_uw, 4), 1, 4, 0)),
> +  fs_reg(brw_imm_v(0x11001010)));
> +  add->exec_size = dispatch_width * 2;
>  
>this->pixel_x = vgrf(glsl_type::float_type);
>this->pixel_y = vgrf(glsl_type::float_type);


Re: [Mesa-dev] [PATCH 04/17] i965/fs: Explicitly set the exec_size on the add(32) in interpolation setup

2015-06-21 Thread Iago Toral
On Fri, 2015-06-19 at 07:01 -0700, Jason Ekstrand wrote:
> 
> On Jun 19, 2015 5:09 AM, "Iago Toral"  wrote:
> >
> > On Thu, 2015-06-18 at 17:50 -0700, Jason Ekstrand wrote:
> > > Soon we will start using the builder to explicitly set all the execution
> > > sizes.  We could make a 32-wide builder, but the builder asserts that we
> > > never grow it which is usually a reasonable assumption.  Since this one
> > > instruction is a bit of an odd-ball, we just set the exec_size explicitly.
> >
> > So if I understand it right, the only point of this change is making
> > explicit that this instruction has a different execution size to ensure
> > that we notice it when we rewrite the code to set explicit execution
> > sizes with the new builder, right?
> 
> No, it's more that there is no good way to set it to SIMD32 with the
> builder because changing dispatch width in the builder can only go
> down and not up.

Aha, I see now, thanks.
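
Just to check my understanding, this is how I picture the constraint. It is a
toy model only: toy_builder, toy_inst and emit_wide_add() are invented for this
sketch and do not reflect the real builder interface:

```cpp
#include <cassert>

// Toy stand-in for a builder that can be narrowed but never widened.
class toy_builder {
public:
   explicit toy_builder(unsigned width) : width_(width) {}

   unsigned dispatch_width() const { return width_; }

   // The width can only go down (or stay the same), never up.
   bool can_group(unsigned n) const { return n <= width_; }

   toy_builder group(unsigned n) const
   {
      assert(can_group(n));
      return toy_builder(n);
   }

private:
   unsigned width_;
};

// Toy instruction that only tracks its execution size.
struct toy_inst {
   unsigned exec_size;
};

// Emit at the builder's width, then patch the execution size afterwards,
// since the builder itself cannot be asked for a wider instruction.
static inline toy_inst emit_wide_add(const toy_builder &bld)
{
   toy_inst add = { bld.dispatch_width() };
   add.exec_size = bld.dispatch_width() * 2;
   return add;
}
```

In this model a 16-wide builder can hand out an 8-wide one but not a 32-wide
one, so the only way to get the wider instruction is to override exec_size on
the instruction after it has been emitted.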

>   In retrospect, I should have explicitly created the fs_inst rather
> than using the builder to emit it 16-wide and changing it later.

Yeah, I guess that would've made more sense, not a big deal anyway:

Reviewed-by: Iago Toral Quiroga 

> The reason this patch can stand on its own is because, at this point
> in the series, the builder still uses the exec size guessing based on
> register widths.
> 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 9 +
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > > index 4770838..b00825e 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > > @@ -1357,10 +1357,11 @@ fs_visitor::emit_interpolation_setup_gen6()
> > > */
> > >fs_reg int_pixel_xy(GRF, alloc.allocate(dispatch_width / 8),
> > >BRW_REGISTER_TYPE_UW, dispatch_width * 2);
> > > -  abld.exec_all()
> > > -  .ADD(int_pixel_xy,
> > > -   fs_reg(stride(suboffset(g1_uw, 4), 1, 4, 0)),
> > > -   fs_reg(brw_imm_v(0x11001010)));
> > > +  fs_inst *add = abld.exec_all()
> > > + .ADD(int_pixel_xy,
> > > +  fs_reg(stride(suboffset(g1_uw, 4), 1, 4, 0)),
> > > +  fs_reg(brw_imm_v(0x11001010)));
> > > +  add->exec_size = dispatch_width * 2;
> > >
> > >this->pixel_x = vgrf(glsl_type::float_type);
> > >this->pixel_y = vgrf(glsl_type::float_type);
> >
> >
> 
> 


Re: [Mesa-dev] [PATCH 03/14] mesa: Fix conditions to test signed, unsigned integer format

2015-06-21 Thread Iago Toral
On Fri, 2015-06-19 at 13:32 -0700, Anuj Phogat wrote:
> On Thu, Jun 18, 2015 at 11:41 PM, Iago Toral  wrote:
> > On Thu, 2015-06-18 at 09:19 -0700, Anuj Phogat wrote:
> >> On Thu, Jun 18, 2015 at 7:09 AM, Iago Toral  wrote:
> >> > On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
> >> >> Signed-off-by: Anuj Phogat 
> >> >> Cc: 
> >> >> ---
> >> >>  src/mesa/main/readpix.c | 2 ++
> >> >>  1 file changed, 2 insertions(+)
> >> >>
> >> >> diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
> >> >> index caa2648..a9416ef 100644
> >> >> --- a/src/mesa/main/readpix.c
> >> >> +++ b/src/mesa/main/readpix.c
> >> >> @@ -160,10 +160,12 @@ _mesa_readpixels_needs_slow_path(const struct gl_context *ctx, GLenum format,
> >> >>srcType = _mesa_get_format_datatype(rb->Format);
> >> >>
> >> >>if ((srcType == GL_INT &&
> >> >> +   _mesa_is_enum_format_integer(format) &&
> >> >> (type == GL_UNSIGNED_INT ||
> >> >>  type == GL_UNSIGNED_SHORT ||
> >> >>  type == GL_UNSIGNED_BYTE)) ||
> >> >>(srcType == GL_UNSIGNED_INT &&
> >> >> +   _mesa_is_enum_format_integer(format) &&
> >> >> (type == GL_INT ||
> >> >>  type == GL_SHORT ||
> >> >>  type == GL_BYTE))) {
> >> >
> >> > As far as I understand this code we are trying to see if we can use
> >> > memcpy to directly copy the contents of the framebuffer to the
> >> > destination buffer. In that case, as long as the src/dst types have
> >> > different sign we can't just use memcpy, right? In fact it looks like we
> >> > might need to expand the checks to include the cases where srcType is
> >> > GL_(UNSIGNED_)SHORT and GL_(UNSIGNED_)BYTE as well.
> >> >
> >> srcType returned by _mesa_get_format_datatype() is one of:
> >> GL_UNSIGNED_NORMALIZED
> >> GL_SIGNED_NORMALIZED
> >> GL_UNSIGNED_INT
> >> GL_INT
> >> GL_FLOAT
> >> So, the suggested checks for srcType are not required.
> >
> > Oh, right, although I think that does not invalidate my point: can we
> > memcpy from a GL_UNSIGNED_NORMALIZED to a format with type GL_FLOAT or
> > GL_SIGNED_NORMALIZED? It does not look like these checks here are
> > thorough.
> >
> Helper function _mesa_need_signed_unsigned_int_conversion() is
> > meant to do the checks only for integer formats. Maybe add another
> function to do the missing checks for other formats?

I have no concerns about the _mesa_need_signed_unsigned_int_conversion
function that you add in a later patch for your PBO work, my concern is
related to the fact that you are assuming that the checks that you need
in the PBO path are the same that we have in
_mesa_readpixels_needs_slow_path, so you make both the same when I think
they are trying to address different things.

In your PBO code, you can't handle signed/unsigned integer conversions,
so you need to detect that and fall back to another path. That should be
fine I guess and the function _mesa_need_signed_unsigned_int_conversion
does what you need, so no problems there.

However, in _mesa_readpixels_needs_slow_path I think we don't want to
just do integer checking. The purpose of the function is to tell whether
we can use memcpy to copy pixels from the framebuffer to the dst, and if
we have types with different signs, *whether they are integer or not*,
we can't, so limiting the check only to integer types does not look
right to me. The key aspect here is that what this function needs to
check is not specific to integer types, even if the current code only
seems to check things when the framebuffer has an integer format.
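
As an illustration of the non-integer case, take one 8-bit normalized
component. The helpers below are made up for this example; they only use the
standard normalized-to-float mappings (c / 127 clamped to -1 for signed,
c / 255 for unsigned):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Signed normalized 8-bit value to float: c / 127, clamped to -1.0.
static inline float snorm8_to_float(int8_t v)
{
   return std::fmax(v / 127.0f, -1.0f);
}

// Unsigned normalized 8-bit value to float: c / 255.
static inline float unorm8_to_float(uint8_t v)
{
   return v / 255.0f;
}

// What a memcpy would do: keep the bits, change only the interpretation.
static inline uint8_t bits_as_unorm8(int8_t v)
{
   uint8_t out;
   std::memcpy(&out, &v, sizeof(out));
   return out;
}
```

A signed normalized -127 means -1.0, but the same byte read back as unsigned
normalized is 129 / 255, roughly 0.506, so a plain memcpy changes the value
even though no integer format is involved.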

> > In any case, that's beyond the point of your patch. Talking specifically
> > about your patch: can we memcpy, for example, from a _signed_ integer
> > format like MESA_FORMAT_R_SINT8 to an _unsigned_ format (integer or
> > not)? I don't think we can, in which case your patch would not look
> > correct to me.
> >
> Reading integer format to a non integer format is not allowed in
> glReadPixels. That's why those cases are not relevant here and
> we just check for integer formats. From ext_texture_integer:
> "INVALID_OPERATION is generated by ReadPixels if <format> is
> an integer format and the color buffer is not an integer format, or
>  if <format> is not an integer format and th

Re: [Mesa-dev] [PATCH v2 26/82] glsl: Don't do copy propagation on buffer variables

2015-06-22 Thread Iago Toral
On Mon, 2015-06-22 at 14:28 -0700, Jordan Justen wrote:
> 24-26 once again makes me wonder if these optimizations *can* be used
> with SSBOs based on the same ext spec wording I referenced before:
> 
> "The ability to write to buffer objects creates the potential for
>  multiple independent shader invocations to read and write the same
>  underlying memory. The same issue exists with the
>  ARB_shader_image_load_store extension provided in OpenGL 4.2, which
>  can write to texture objects and buffers. In both cases, the
>  specification makes few guarantees related to the relative order of
>  memory reads and writes performed by the shader invocations."
> 
> In these patches "other threads" were specifically mentioned.
> 
> Did these patches also prevent bad things from happening in generated
> code? (Like mentioned for patch 23.)

I think the problem here is the possibility for shaders to use
memoryBarrier():

"SHADER_STORAGE_BARRIER_BIT:  Memory accesses using shader buffer
variables issued after the barrier will reflect data written by
shaders prior to the barrier.  Additionally, assignments to and atomic
operations performed on shader buffer variables after the barrier will
not execute until all memory accesses (e.g., loads, stores, texture
fetches, vertex fetches) initiated prior to the barrier complete."

I think in these cases we can't allow these optimizations to kick in. 

That said, maybe we can check if we are using any memorybarriers in the
shader code and decide if we want to enable these optimizations for
ssbos based on that. I think we can try to do that in a later patch.
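
For illustration, the conservative behaviour of the patch can be sketched
in plain C roughly as follows. This is a simplified model under stated
assumptions, not the actual pass: the variable, copy_entry and copy_table
types and the add_copy() function are hypothetical, and the fixed-size
table only stands in for the ACP list used by
ir_copy_propagation_visitor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical variable modes; VAR_SHADER_STORAGE plays the role of
 * ir_var_shader_storage in the patch.
 */
enum var_mode { VAR_AUTO, VAR_UNIFORM, VAR_SHADER_STORAGE };

struct variable {
   const char *name;
   enum var_mode mode;
};

struct copy_entry {
   const struct variable *lhs;
   const struct variable *rhs;
};

#define MAX_COPIES 16

struct copy_table {
   struct copy_entry entries[MAX_COPIES];
   size_t count;
};

/* Record "lhs = rhs" as an available copy. Returns false, and records
 * nothing, when lhs lives in shader storage, because another invocation
 * may overwrite it before the copy is used. Also returns false when the
 * table is full.
 */
bool
add_copy(struct copy_table *table,
         const struct variable *lhs, const struct variable *rhs)
{
   if (lhs->mode == VAR_SHADER_STORAGE)
      return false;

   if (table->count == MAX_COPIES)
      return false;

   table->entries[table->count].lhs = lhs;
   table->entries[table->count].rhs = rhs;
   table->count++;
   return true;
}
```

A later refinement along the lines suggested above could relax the first
test when the shader contains no memory barriers, but that is not what
this patch does.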

Iago

> -Jordan
> 
> On 2015-06-03 00:01:16, Iago Toral Quiroga wrote:
> > Since the backing storage for these is shared we cannot ensure that the
> > value won't change by writes from other threads.
> > ---
> >  src/glsl/opt_copy_propagation.cpp | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/glsl/opt_copy_propagation.cpp 
> > b/src/glsl/opt_copy_propagation.cpp
> > index 806027b..f206995 100644
> > --- a/src/glsl/opt_copy_propagation.cpp
> > +++ b/src/glsl/opt_copy_propagation.cpp
> > @@ -330,7 +330,7 @@ ir_copy_propagation_visitor::add_copy(ir_assignment *ir)
> >   */
> >  ir->condition = new(ralloc_parent(ir)) ir_constant(false);
> >  this->progress = true;
> > -  } else {
> > +  } else if (lhs_var->data.mode != ir_var_shader_storage) {
> >  entry = new(this->acp) acp_entry(lhs_var, rhs_var);
> >  this->acp->push_tail(entry);
> >}
> > -- 
> > 1.9.1
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 03/14] mesa: Fix conditions to test signed, unsigned integer format

2015-06-22 Thread Iago Toral
On Mon, 2015-06-22 at 12:35 -0700, Anuj Phogat wrote:
> On Sun, Jun 21, 2015 at 11:25 PM, Iago Toral  wrote:
> > On Fri, 2015-06-19 at 13:32 -0700, Anuj Phogat wrote:
> >> On Thu, Jun 18, 2015 at 11:41 PM, Iago Toral  wrote:
> >> > On Thu, 2015-06-18 at 09:19 -0700, Anuj Phogat wrote:
> >> >> On Thu, Jun 18, 2015 at 7:09 AM, Iago Toral  wrote:
> >> >> > On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
> >> >> >> Signed-off-by: Anuj Phogat 
> >> >> >> Cc: 
> >> >> >> ---
> >> >> >>  src/mesa/main/readpix.c | 2 ++
> >> >> >>  1 file changed, 2 insertions(+)
> >> >> >>
> >> >> >> diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
> >> >> >> index caa2648..a9416ef 100644
> >> >> >> --- a/src/mesa/main/readpix.c
> >> >> >> +++ b/src/mesa/main/readpix.c
> >> >> >> @@ -160,10 +160,12 @@ _mesa_readpixels_needs_slow_path(const struct 
> >> >> >> gl_context *ctx, GLenum format,
> >> >> >>srcType = _mesa_get_format_datatype(rb->Format);
> >> >> >>
> >> >> >>if ((srcType == GL_INT &&
> >> >> >> +   _mesa_is_enum_format_integer(format) &&
> >> >> >> (type == GL_UNSIGNED_INT ||
> >> >> >>  type == GL_UNSIGNED_SHORT ||
> >> >> >>  type == GL_UNSIGNED_BYTE)) ||
> >> >> >>(srcType == GL_UNSIGNED_INT &&
> >> >> >> +   _mesa_is_enum_format_integer(format) &&
> >> >> >> (type == GL_INT ||
> >> >> >>  type == GL_SHORT ||
> >> >> >>  type == GL_BYTE))) {
> >> >> >
> >> >> > As far as I understand this code we are trying to see if we can use
> >> >> > memcpy to directly copy the contents of the framebuffer to the
> >> >> > destination buffer. In that case, as long as the src/dst types have
> >> >> > different sign we can't just use memcpy, right? In fact it looks like 
> >> >> > we
> >> >> > might need to expand the checks to include the cases where srcType is
> >> >> > GL_(UNSIGNED_)SHORT and GL_(UNSIGNED_)BYTE as well.
> >> >> >
> >> >> srcType returned by _mesa_get_format_datatype() is one of:
> >> >> GL_UNSIGNED_NORMALIZED
> >> >> GL_SIGNED_NORMALIZED
> >> >> GL_UNSIGNED_INT
> >> >> GL_INT
> >> >> GL_FLOAT
> >> >> So, the suggested checks for srcType are not required.
> >> >
> >> > Oh, right, although I think that does not invalidate my point: can we
> >> > memcpy from a GL_UNSIGNED_NORMALIZED to a format with type GL_FLOAT or
> >> > GL_SIGNED_NORMALIZED? It does not look like these checks here are
> >> > thorough.
> >> >
> >> Helper function _mesa_need_signed_unsigned_int_conversion() is
> >> meant to do the checks only for integer formats. Maybe add another
> >> function to do the missing checks for other formats?
> >
> > I have no concerns about the _mesa_need_signed_unsigned_int_conversion
> > function that you add in a later patch for your PBO work, my concern is
> > related to the fact that you are assuming that the checks that you need
> > in the PBO path are the same that we have in
> > _mesa_readpixels_needs_slow_path, so you make both the same when I think
> > they are trying to address different things.
> >
> > In your PBO code, you can't handle signed/unsigned integer conversions,
> > so you need to detect that and fall back to another path. That should be
> > fine I guess and the function _mesa_need_signed_unsigned_int_conversion
> > does what you need, so no problems there.
> >
> > However, in _mesa_readpixels_needs_slow_path I think we don't want to
> > just do integer checking. The purpose of the function is to tell whether
> > we can use memcpy to copy pixels from the framebuffer to the dst, and if
> > we have types with different signs, *whether they are integer or not*,
> > we can't, so limiting the check only to integer types does not look
> > right to me. The key aspect here is that what this function needs to
> > check is not specific to integer types, even if the current code only

Re: [Mesa-dev] [PATCH 03/14] mesa: Fix conditions to test signed, unsigned integer format

2015-06-23 Thread Iago Toral
On Tue, 2015-06-23 at 08:54 +0200, Iago Toral wrote:
> On Mon, 2015-06-22 at 12:35 -0700, Anuj Phogat wrote:
> > On Sun, Jun 21, 2015 at 11:25 PM, Iago Toral  wrote:
> > > On Fri, 2015-06-19 at 13:32 -0700, Anuj Phogat wrote:
> > >> On Thu, Jun 18, 2015 at 11:41 PM, Iago Toral  wrote:
> > >> > On Thu, 2015-06-18 at 09:19 -0700, Anuj Phogat wrote:
> > >> >> On Thu, Jun 18, 2015 at 7:09 AM, Iago Toral  wrote:
> > >> >> > On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
> > >> >> >> Signed-off-by: Anuj Phogat 
> > >> >> >> Cc: 
> > >> >> >> ---
> > >> >> >>  src/mesa/main/readpix.c | 2 ++
> > >> >> >>  1 file changed, 2 insertions(+)
> > >> >> >>
> > >> >> >> diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
> > >> >> >> index caa2648..a9416ef 100644
> > >> >> >> --- a/src/mesa/main/readpix.c
> > >> >> >> +++ b/src/mesa/main/readpix.c
> > >> >> >> @@ -160,10 +160,12 @@ _mesa_readpixels_needs_slow_path(const 
> > >> >> >> struct gl_context *ctx, GLenum format,
> > >> >> >>srcType = _mesa_get_format_datatype(rb->Format);
> > >> >> >>
> > >> >> >>if ((srcType == GL_INT &&
> > >> >> >> +   _mesa_is_enum_format_integer(format) &&
> > >> >> >> (type == GL_UNSIGNED_INT ||
> > >> >> >>  type == GL_UNSIGNED_SHORT ||
> > >> >> >>  type == GL_UNSIGNED_BYTE)) ||
> > >> >> >>(srcType == GL_UNSIGNED_INT &&
> > >> >> >> +   _mesa_is_enum_format_integer(format) &&
> > >> >> >> (type == GL_INT ||
> > >> >> >>  type == GL_SHORT ||
> > >> >> >>  type == GL_BYTE))) {
> > >> >> >
> > >> >> > As far as I understand this code we are trying to see if we can use
> > >> >> > memcpy to directly copy the contents of the framebuffer to the
> > >> >> > destination buffer. In that case, as long as the src/dst types have
> > >> >> > different sign we can't just use memcpy, right? In fact it looks 
> > >> >> > like we
> > >> >> > might need to expand the checks to include the cases where srcType 
> > >> >> > is
> > >> >> > GL_(UNSIGNED_)SHORT and GL_(UNSIGNED_)BYTE as well.
> > >> >> >
> > >> >> srcType returned by _mesa_get_format_datatype() is one of:
> > >> >> GL_UNSIGNED_NORMALIZED
> > >> >> GL_SIGNED_NORMALIZED
> > >> >> GL_UNSIGNED_INT
> > >> >> GL_INT
> > >> >> GL_FLOAT
> > >> >> So, the suggested checks for srcType are not required.
> > >> >
> > >> > Oh, right, although I think that does not invalidate my point: can we
> > >> > memcpy from a GL_UNSIGNED_NORMALIZED to a format with type GL_FLOAT or
> > >> > GL_SIGNED_NORMALIZED? It does not look like these checks here are
> > >> > thorough.
> > >> >
> > >> Helper function _mesa_need_signed_unsigned_int_conversion() is
> > >> meant to do the checks only for integer formats. Maybe add another
> > >> function to do the missing checks for other formats?
> > >
> > > I have no concerns about the _mesa_need_signed_unsigned_int_conversion
> > > function that you add in a later patch for your PBO work, my concern is
> > > related to the fact that you are assuming that the checks that you need
> > > in the PBO path are the same that we have in
> > > _mesa_readpixels_needs_slow_path, so you make both the same when I think
> > > they are trying to address different things.
> > >
> > > In your PBO code, you can't handle signed/unsigned integer conversions,
> > > so you need to detect that and fall back to another path. That should be
> > > fine I guess and the function _mesa_need_signed_unsigned_int_conversion
> > > does what you need, so no problems there.
> > >
> > > However, in _mesa_readpixels_needs_slow_path I think we don't want to
> > > just do integer checking. The purpose o

Re: [Mesa-dev] [PATCH 06/17] i965/blorp: Explicitly set execution sizes for new'd instructions

2015-06-23 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Thu, 2015-06-18 at 17:51 -0700, Jason Ekstrand wrote:
> This doesn't affect instructions allocated using the builder.
> ---
>  src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp 
> b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
> index c1b7609..f655a0c 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
> @@ -72,7 +72,7 @@ brw_blorp_eu_emitter::emit_kill_if_outside_rect(const 
> struct brw_reg &x,
> emit_cmp(BRW_CONDITIONAL_L, x, dst_x1)->predicate = BRW_PREDICATE_NORMAL;
> emit_cmp(BRW_CONDITIONAL_L, y, dst_y1)->predicate = BRW_PREDICATE_NORMAL;
>  
> -   fs_inst *inst = new (mem_ctx) fs_inst(BRW_OPCODE_AND, g1, f0, g1);
> +   fs_inst *inst = new (mem_ctx) fs_inst(BRW_OPCODE_AND, 16, g1, f0, g1);
> inst->force_writemask_all = true;
> insts.push_tail(inst);
>  }
> @@ -83,7 +83,7 @@ brw_blorp_eu_emitter::emit_texture_lookup(const struct 
> brw_reg &dst,
>unsigned base_mrf,
>unsigned msg_length)
>  {
> -   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, brw_message_reg(base_mrf),
> +   fs_inst *inst = new (mem_ctx) fs_inst(op, 16, dst, 
> brw_message_reg(base_mrf),
>   fs_reg(0u));
>  
> inst->base_mrf = base_mrf;
> @@ -118,7 +118,8 @@ brw_blorp_eu_emitter::emit_combine(enum opcode 
> combine_opcode,
>  {
> assert(combine_opcode == BRW_OPCODE_ADD || combine_opcode == 
> BRW_OPCODE_AVG);
>  
> -   insts.push_tail(new (mem_ctx) fs_inst(combine_opcode, dst, src_1, src_2));
> +   insts.push_tail(new (mem_ctx) fs_inst(combine_opcode, 16, dst,
> + src_1, src_2));
>  }
>  
>  fs_inst *
> @@ -126,7 +127,7 @@ brw_blorp_eu_emitter::emit_cmp(enum brw_conditional_mod 
> op,
> const struct brw_reg &x,
> const struct brw_reg &y)
>  {
> -   fs_inst *cmp = new (mem_ctx) fs_inst(BRW_OPCODE_CMP,
> +   fs_inst *cmp = new (mem_ctx) fs_inst(BRW_OPCODE_CMP, 16,
>  vec16(brw_null_reg()), x, y);
> cmp->conditional_mod = op;
> insts.push_tail(cmp);




Re: [Mesa-dev] [PATCH 05/17] i965/fs: Set the builder group for emitting FB-write stencil/AA alpha

2015-06-23 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Thu, 2015-06-18 at 17:50 -0700, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index b00825e..8a43ec8 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -1528,7 +1528,7 @@ fs_visitor::emit_single_fb_write(const fs_builder &bld,
>  
> if (payload.aa_dest_stencil_reg) {
>sources[length] = fs_reg(GRF, alloc.allocate(1));
> -  bld.exec_all().annotate("FB write stencil/AA alpha")
> +  bld.group(8, 0).exec_all().annotate("FB write stencil/AA alpha")
>   .MOV(sources[length],
>fs_reg(brw_vec8_grf(payload.aa_dest_stencil_reg, 0)));
>length++;




Re: [Mesa-dev] [PATCH 07/17] i965/fs: Move offset() and half() to the fs_builder

2015-06-23 Thread Iago Toral
On Thu, 2015-06-18 at 17:51 -0700, Jason Ekstrand wrote:
> We want to move these into the builder so that they know the current
> builder's dispatch width.  This will be needed by a later commit.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp |  52 ++
>  src/mesa/drivers/dri/i965/brw_fs_builder.h   |  46 +
>  src/mesa/drivers/dri/i965/brw_fs_cse.cpp |   2 +-
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp |  60 +--
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 149 ++-
>  src/mesa/drivers/dri/i965/brw_ir_fs.h|  51 -
>  6 files changed, 182 insertions(+), 178 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 4f98d63..c13ac7d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -267,7 +267,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
> &bld,
>   inst->mlen = 1 + dispatch_width / 8;
> }
>  
> -   bld.MOV(dst, offset(vec4_result, (const_offset & 3) * scale));
> +   bld.MOV(dst, bld.offset(vec4_result, (const_offset & 3) * scale));
>  }
>  
>  /**
> @@ -361,7 +361,12 @@ fs_inst::is_copy_payload(const brw::simple_allocator 
> &grf_alloc) const
>reg.width = this->src[i].width;
>if (!this->src[i].equals(reg))
>   return false;
> -  reg = ::offset(reg, 1);
> +
> +  if (i < this->header_size) {
> + reg.reg_offset += 1;
> +  } else {
> + reg.reg_offset += this->exec_size / 8;
> +  }

This here looks like you are squashing a fix. Should it go in a separate
patch?
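
For reference, the stepping that the new hunk performs can be sketched in
plain C as follows. This only mirrors the arithmetic in the quoted diff
(one register per header source, exec_size / 8 registers for the rest);
the function name and its parameters are hypothetical and not part of the
i965 backend.

```c
/* Number of registers to advance after source i of a payload.
 * Header sources occupy a single register; the remaining sources occupy
 * exec_size / 8 registers, i.e. 1 for SIMD8 and 2 for SIMD16.
 */
unsigned
payload_reg_advance(unsigned i, unsigned header_size, unsigned exec_size)
{
   if (i < header_size)
      return 1;

   return exec_size / 8;
}
```

The old code always advanced by offset(reg, 1), so the two versions only
differ for the non-header sources, which is why this reads like a
behavioural change rather than a pure move.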

> }
>  
> return true;
> @@ -963,7 +968,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
> pixel_center_integer,
> } else {
>bld.ADD(wpos, this->pixel_x, fs_reg(0.5f));
> }
> -   wpos = offset(wpos, 1);
> +   wpos = bld.offset(wpos, 1);
>  
> /* gl_FragCoord.y */
> if (!flip && pixel_center_integer) {
> @@ -979,7 +984,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
> pixel_center_integer,
>  
>bld.ADD(wpos, pixel_y, fs_reg(offset));
> }
> -   wpos = offset(wpos, 1);
> +   wpos = bld.offset(wpos, 1);
>  
> /* gl_FragCoord.z */
> if (devinfo->gen >= 6) {
> @@ -989,7 +994,7 @@ fs_visitor::emit_fragcoord_interpolation(bool 
> pixel_center_integer,
> this->delta_xy[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC],
> interp_reg(VARYING_SLOT_POS, 2));
> }
> -   wpos = offset(wpos, 1);
> +   wpos = bld.offset(wpos, 1);
>  
> /* gl_FragCoord.w: Already set up in emit_interpolation */
> bld.MOV(wpos, this->wpos_w);
> @@ -1072,7 +1077,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, 
> const char *name,
>   /* If there's no incoming setup data for this slot, don't
>* emit interpolation for it.
>*/
> - attr = offset(attr, type->vector_elements);
> + attr = bld.offset(attr, type->vector_elements);
>   location++;
>   continue;
>}
> @@ -1087,7 +1092,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, 
> const char *name,
>  interp = suboffset(interp, 3);
> interp.type = attr.type;
> bld.emit(FS_OPCODE_CINTERP, attr, fs_reg(interp));
> -attr = offset(attr, 1);
> +attr = bld.offset(attr, 1);
>   }
>} else {
>   /* Smooth/noperspective interpolation case. */
> @@ -1125,7 +1130,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, 
> const char *name,
> if (devinfo->gen < 6 && interpolation_mode == 
> INTERP_QUALIFIER_SMOOTH) {
>bld.MUL(attr, attr, this->pixel_w);
> }
> -attr = offset(attr, 1);
> +attr = bld.offset(attr, 1);
>   }
>  
>}
> @@ -1227,19 +1232,19 @@ fs_visitor::emit_samplepos_setup()
> if (dispatch_width == 8) {
>abld.MOV(int_sample_x, fs_reg(sample_pos_reg));
> } else {
> -  abld.half(0).MOV(half(int_sample_x, 0), fs_reg(sample_pos_reg));
> -  abld.half(1).MOV(half(int_sample_x, 1),
> +  abld.half(0).MOV(abld.half(int_sample_x, 0), fs_reg(sample_pos_reg));
> +  abld.half(1).MOV(abld.half(int_sample_x, 1),
> fs_reg(suboffset(sample_pos_reg, 16)));
> }
> /* Compute gl_SamplePosition.x */
> compute_sample_position(pos, int_sample_x);
> -   pos = offset(pos, 1);
> +   pos = abld.offset(pos, 1);
> if (dispatch_width == 8) {
>abld.MOV(int_sample_y, fs_reg(suboffset(sample_pos_reg, 1)));
> } else {
> -  abld.half(0).MOV(half(int_sample_y, 0),
> +  abld.half(0).MOV(abld.half(int_sample_y, 0),
> fs_reg(suboffset(sample_pos_reg, 1)));
> -  abld.half(1).MOV(half(int_sample_y, 1),
> +  abld.half(1).MOV(abld.half(int_sample_y, 1),
> fs_reg(suboffset(sample_pos_reg, 17)));
> }
> /* Compute gl_Sa

Re: [Mesa-dev] [PATCH v2 26/82] glsl: Don't do copy propagation on buffer variables

2015-06-24 Thread Iago Toral
On Tue, 2015-06-23 at 15:45 -0700, Jordan Justen wrote:
> On 2015-06-22 23:38:14, Iago Toral wrote:
> > On Mon, 2015-06-22 at 14:28 -0700, Jordan Justen wrote:
> > > 24-26 once again makes me wonder if these optimizations *can* be used
> > > with SSBOs based on the same ext spec wording I referenced before:
> > > 
> > > "The ability to write to buffer objects creates the potential for
> > >  multiple independent shader invocations to read and write the same
> > >  underlying memory. The same issue exists with the
> > >  ARB_shader_image_load_store extension provided in OpenGL 4.2, which
> > >  can write to texture objects and buffers. In both cases, the
> > >  specification makes few guarantees related to the relative order of
> > >  memory reads and writes performed by the shader invocations."
> > > 
> > > In these patches "other threads" were specifically mentioned.
> > > 
> > > Did these patches also prevent bad things from happening in generated
> > > code? (Like mentioned for patch 23.)
> > 
> > I think the problem here is the possibility for shaders to use
> > memoryBarrier():
> > 
> > "SHADER_STORAGE_BARRIER_BIT:  Memory accesses using shader buffer
> > variables issued after the barrier will reflect data written by
> > shaders prior to the barrier.  Additionally, assignments to and atomic
> > operations performed on shader buffer variables after the barrier will
> > not execute until all memory accesses (e.g., loads, stores, texture
> > fetches, vertex fetches) initiated prior to the barrier complete."
> > 
> > I think in these cases we can't allow these optimizations to kick in. 
> > 
> > That said, maybe we can check if we are using any memorybarriers in the
> > shader code and decide if we want to enable these optimizations for
> > ssbos based on that. I think we can try to do that in a later patch.
> 
> Ok. What do you think about updating the commit messages on these
> three patches?
> 
> For example, currently you have:
> 
> "Since the backing storage for these is shared we cannot ensure that
>  the value won't change by writes from other threads."
> 
> How does something like this sound?
> 
> "Since the backing storage for these is shared we cannot ensure that
>  the value won't change by writes from other threads. Normally SSBO
>  accesses are not guaranteed to be synchronized with other threads,
>  except when memoryBarrier is used. So, we might be able to optimize
>  some SSBO accesses, but for now we always take the safe path and emit
>  the SSBO access."

Sure, that makes things more clear.

> With a change like that to these commit messages, you can add
> Reviewed-by: Jordan Justen 
> to all 3 patches.

Thanks Jordan!

Iago

> > > On 2015-06-03 00:01:16, Iago Toral Quiroga wrote:
> > > > Since the backing storage for these is shared we cannot ensure that the
> > > > value won't change by writes from other threads.
> > > > ---
> > > >  src/glsl/opt_copy_propagation.cpp | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/src/glsl/opt_copy_propagation.cpp 
> > > > b/src/glsl/opt_copy_propagation.cpp
> > > > index 806027b..f206995 100644
> > > > --- a/src/glsl/opt_copy_propagation.cpp
> > > > +++ b/src/glsl/opt_copy_propagation.cpp
> > > > @@ -330,7 +330,7 @@ ir_copy_propagation_visitor::add_copy(ir_assignment 
> > > > *ir)
> > > >   */
> > > >  ir->condition = new(ralloc_parent(ir)) ir_constant(false);
> > > >  this->progress = true;
> > > > -  } else {
> > > > +  } else if (lhs_var->data.mode != ir_var_shader_storage) {
> > > >  entry = new(this->acp) acp_entry(lhs_var, rhs_var);
> > > >  this->acp->push_tail(entry);
> > > >}
> > > > -- 
> > > > 1.9.1
> > > > 
> > > 
> > 
> > 
> 




Re: [Mesa-dev] [PATCH v2 23/82] glsl: Do not do CSE for expressions involving SSBO loads

2015-06-24 Thread Iago Toral
On Wed, 2015-06-24 at 15:43 +0300, Francisco Jerez wrote:
> Iago Toral  writes:
> 
> > On Wed, 2015-06-17 at 17:20 -0700, Jordan Justen wrote:
> >> I wanted to question whether this was required, based on this text
> >> from the extension spec:
> >> 
> >> "The ability to write to buffer objects creates the potential for
> >>  multiple independent shader invocations to read and write the same
> >>  underlying memory. The same issue exists with the
> >>  ARB_shader_image_load_store extension provided in OpenGL 4.2, which
> >>  can write to texture objects and buffers. In both cases, the
> >>  specification makes few guarantees related to the relative order of
> >>  memory reads and writes performed by the shader invocations."
> >> 
> >> But I'm not sure if we can reconcile CSE with 'memoryBarrier' and
> >> 'barrier'. curro, any thoughts from image load/store?
> >
> > I think the problem is within the same thread, that text above talks
> > about multiple invocations reading from and writing to the same
> > location, but within the same invocation, the order of reads and writes
> > must be preserved:
> >
> > "Buffer variable memory reads and writes within a single shader
> > invocation are processed in order.  However, the order of reads and
> > writes performed in one invocation relative to those performed by
> > another invocation is largely undefined."
> >
> > For example, if X is a shader storage buffer variable and we have code
> > like this with just one invocation:
> >
> > ssbo_store(X, 1);
> > a = ssbo_load(X) + 1; // a = 2
> > ssbo_store(X, 2);
> > b = ssbo_load(X) + 1; // b = 3
> >
> > CSE could mess it up like this:
> >
> > ssbo_store(X, 1);
> > tmp = ssbo_load(X) + 1; // tmp = 2
> > a = tmp;
> > ssbo_store(X, 2);
> > b = tmp;
> >
> > which would be incorrect. I think I wrote this patch after seeing
> > something like this happening. The CSE pass clearly states that it does
> > not support write variables after all.
> >
> > Also, notice the same would apply if there are multiple invocations but
> > the shader code used something like gl_VertexID or gl_FragCoord to make
> > each invocation read from/write to a different address within the SSBO
> > buffer (I imagine this is the usual way to operate with SSBOs). In these
> > cases, even if we have multiple invocations, keeping the relative order
> > of reads and writes within each one is necessary.
> >
> 
> AFAICT the reason why this (and many of the other changes in GLSL
> optimization passes) is needed is because SSBO loads have been
> implemented as ir_expression nodes instead of being lowered into
> intrinsics (as other side-effectful operations do like
> ARB_shader_image_load_store and ARB_shader_atomic_counters).  This
> surely broke the assumption of a number of optimization passes that
> ir_expression nodes behave as pure functions.  I guess the reason why
> you've done it this way is because UBO loads were already being
> represented as expressions, so I see why you may have wanted to use the
> same approach for SSBOs even though there is a fundamental difference
> between the two: UBO loads have no side effects and are constant for a
> given set of arguments and a given shader execution, SSBO loads and
> stores are not.  SSBO stores couldn't be accommodated into the same
> framework so easily, and you decided to create a separate ir node for
> them, which seems inconsistent with loads.  Intrinsics would probably
> have been a good fit for both loads and stores, and would have made all
> these optimization changes unnecessary...
> 
> P.S.: Sorry for the late reply, I was on vacation when I was CC'ed.

Right, your assessment of the reasons behind the current
implementation is correct. I was not aware of these issues when I
decided to go with the current implementation; it does now look like
going with GLSL intrinsics would have made things a bit easier. I
suppose it would make sense to revisit the implementation in the near
future, taking your work on ARB_shader_image_load_store as a reference.
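
For the record, the store/load sequence quoted above can be reproduced
with a small C model. This is only a sketch of the single-invocation
case: ssbo_store() and ssbo_load() are hypothetical helpers operating on
an ordinary int, and the second function hand-applies the incorrect CSE
transformation rather than running any real optimization pass.

```c
/* Hypothetical helpers modelling a one-element shader storage buffer. */
void
ssbo_store(int *x, int value)
{
   *x = value;
}

int
ssbo_load(const int *x)
{
   return *x;
}

/* Reads and writes processed in program order. */
void
run_in_order(int *x, int *a, int *b)
{
   ssbo_store(x, 1);
   *a = ssbo_load(x) + 1;   /* a = 2 */
   ssbo_store(x, 2);
   *b = ssbo_load(x) + 1;   /* b = 3 */
}

/* Same code after CSE wrongly treats the load as a pure expression. */
void
run_with_bad_cse(int *x, int *a, int *b)
{
   ssbo_store(x, 1);
   int tmp = ssbo_load(x) + 1;   /* tmp = 2 */
   *a = tmp;
   ssbo_store(x, 2);
   *b = tmp;                     /* b = 2, the second store is ignored */
}
```

The two functions disagree on b, which is exactly the breakage that
treating SSBO loads as pure ir_expression nodes makes possible.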

Iago

> > Iago
> >
> >> -Jordan
> >> 
> >> On 2015-06-03 00:01:13, Iago Toral Quiroga wrote:
> >> > SSBOs are read/write and this CSE pass only handles read-only variables.
> >> > ---
> >> >  src/glsl/opt_cse.cpp | 33 -
> >> >  1 file changed, 32 insertions(+), 1 deletion(-)
> >> > 
> >> > diff --git a/src/glsl/opt_cse.cpp

Re: [Mesa-dev] [PATCH v2 05/14] meta: Abort meta pbo path if readpixels need signed-unsigned conversion

2015-06-24 Thread Iago Toral
On Wed, 2015-06-24 at 18:39 -0700, Anuj Phogat wrote:
> Meta pbo path for ReadPixels rely on BlitFramebuffer which doesn't support
> signed to unsigned integer conversions and vice versa.
> 
> Without this patch, piglit test fbo_integer_readpixels_sint_uint fails, when
> forced to use the meta pbo path.
> 
> v2: Make need_rgb_to_luminance_conversion() a static function. (Iago)

I think you meant "need_signed_unsigned_int_conversion()" here.

Other than that:

Reviewed-by: Iago Toral Quiroga 


> Bump up the comment and the commit message. (Jason)
> 
> Signed-off-by: Anuj Phogat 
> Reviewed-by: Jason Ekstrand 
> Cc: Iago Toral 
> Cc: 
> ---
>  src/mesa/drivers/common/meta_tex_subimage.c | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
> b/src/mesa/drivers/common/meta_tex_subimage.c
> index 00364f8..a617b77 100644
> --- a/src/mesa/drivers/common/meta_tex_subimage.c
> +++ b/src/mesa/drivers/common/meta_tex_subimage.c
> @@ -248,6 +248,23 @@ fail:
> return success;
>  }
>  
> +static bool
> +need_signed_unsigned_int_conversion(mesa_format rbFormat,
> +GLenum format, GLenum type)
> +{
> +   const GLenum srcType = _mesa_get_format_datatype(rbFormat);
> +   return (srcType == GL_INT &&
> +   _mesa_is_enum_format_integer(format) &&
> +   (type == GL_UNSIGNED_INT ||
> +type == GL_UNSIGNED_SHORT ||
> +type == GL_UNSIGNED_BYTE)) ||
> +  (srcType == GL_UNSIGNED_INT &&
> +   _mesa_is_enum_format_integer(format) &&
> +   (type == GL_INT ||
> +type == GL_SHORT ||
> +type == GL_BYTE));
> +}
> +
>  bool
>  _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims,
>struct gl_texture_image *tex_image,
> @@ -283,6 +300,14 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
> GLuint dims,
>  
>if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format))
>   return false;
> +
> +  /* This function rely on BlitFramebuffer to fill in the pixel data for
> +   * ReadPixels. But, BlitFrameBuffer doesn't support signed to unsigned
> +   * or unsigned to signed integer conversions. OpenGL spec expects an
> +   * invalid operation in that case.
> +   */
> +  if (need_signed_unsigned_int_conversion(rb->Format, format, type))
> + return false;
> }
>  
> /* For arrays, use a tall (height * depth) 2D texture but taking into




Re: [Mesa-dev] [PATCH v2 03/19] i965/fs: Fix fs_inst::regs_read() for uniform pull constant loads

2015-06-25 Thread Iago Toral
On Thu, 2015-06-25 at 13:24 -0700, Jason Ekstrand wrote:
> Previously, fs_inst::regs_read() fell back to depending on the register
> width for the second source.  This isn't really correct since it isn't a
> SIMD8 value at all, but a SIMD4x2 value.  This commit changes it to
> explicitly be always one register.
> 
> Reviewed-by: Iago Toral Quiroga 
> 
> v2: Use mlen for determining the number of registers written

I think you meant "read" instead of "written".

With that change my review stays. I had also missed that Skylake
required a header.

Iago

> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 31dfb24..589b74c 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -715,6 +715,12 @@ fs_inst::regs_read(int arg) const
>   return mlen;
>break;
>  
> +   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7:
> +  /* The payload is actually stored in src1 */
> +  if (arg == 1)
> + return mlen;
> +  break;
> +
> case FS_OPCODE_LINTERP:
>if (arg == 0)
>   return exec_size / 4;




Re: [Mesa-dev] [PATCH v2 05/19] i965/fs: Explicitly set the exec_size on the add(32) in interpolation setup

2015-06-25 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Thu, 2015-06-25 at 13:24 -0700, Jason Ekstrand wrote:
> Soon we will start using the builder to explicitly set all the execution
> sizes.  We could make a 32-wide builder, but the builder asserts that we
> never grow it which is usually a reasonable assumption.  Sinc this one
   since
> instruction is a bit of an odd-ball, we just set the exec_size explicitly.
> 
> Reviewed-by: Iago Toral Quiroga 
> 
> v2: Explicitly new the fs_inst instead of using the builder and setting
> exec_size after the fact.
> 
> v3: Set force_writemask_all with the builder instead of directly.  The
> builder over-writes it if we set it manually.  Also, if we don't have
> force_writemask_all in the builder it will assert-fail on SIMD32.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 9a4bad6..8976c25 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -1357,10 +1357,12 @@ fs_visitor::emit_interpolation_setup_gen6()
> */
>fs_reg int_pixel_xy(GRF, alloc.allocate(dispatch_width / 8),
>BRW_REGISTER_TYPE_UW, dispatch_width * 2);
> -  abld.exec_all()
> -  .ADD(int_pixel_xy,
> -   fs_reg(stride(suboffset(g1_uw, 4), 1, 4, 0)),
> -   fs_reg(brw_imm_v(0x11001010)));
> +  fs_inst *add =
> + new (mem_ctx) fs_inst(BRW_OPCODE_ADD, dispatch_width * 2,
> +   int_pixel_xy,
> +   fs_reg(stride(suboffset(g1_uw, 4), 1, 4, 0)),
> +   fs_reg(brw_imm_v(0x11001010)));
> +  abld.exec_all().emit(add);
>  
>this->pixel_x = vgrf(glsl_type::float_type);
>this->pixel_y = vgrf(glsl_type::float_type);




Re: [Mesa-dev] [PATCH v2 08/19] i965/fs: Move offset(fs_reg, unsigned) to brw_fs.h

2015-06-25 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Thu, 2015-06-25 at 13:24 -0700, Jason Ekstrand wrote:
> Shortly, offset() will depend on the builder so we need it moved to some
> place where it has access to that.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.h| 21 +
>  src/mesa/drivers/dri/i965/brw_ir_fs.h | 21 -
>  2 files changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> b/src/mesa/drivers/dri/i965/brw_fs.h
> index 243baf6..c1819cc 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -62,6 +62,27 @@ namespace brw {
> class fs_live_variables;
>  }
>  
> +static inline fs_reg
> +offset(fs_reg reg, unsigned delta)
> +{
> +   switch (reg.file) {
> +   case BAD_FILE:
> +  break;
> +   case GRF:
> +   case MRF:
> +   case ATTR:
> +  return byte_offset(reg,
> + delta * MAX2(reg.width * reg.stride, 1) *
> + type_sz(reg.type));
> +   case UNIFORM:
> +  reg.reg_offset += delta;
> +  break;
> +   default:
> +  assert(delta == 0);
> +   }
> +   return reg;
> +}
> +
>  /**
>   * The fragment shader front-end.
>   *
> diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
> b/src/mesa/drivers/dri/i965/brw_ir_fs.h
> index 96dc20d..16b20be 100644
> --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
> @@ -129,27 +129,6 @@ horiz_offset(fs_reg reg, unsigned delta)
>  }
>  
>  static inline fs_reg
> -offset(fs_reg reg, unsigned delta)
> -{
> -   switch (reg.file) {
> -   case BAD_FILE:
> -  break;
> -   case GRF:
> -   case MRF:
> -   case ATTR:
> -  return byte_offset(reg,
> - delta * MAX2(reg.width * reg.stride, 1) *
> - type_sz(reg.type));
> -   case UNIFORM:
> -  reg.reg_offset += delta;
> -  break;
> -   default:
> -  assert(delta == 0);
> -   }
> -   return reg;
> -}
> -
> -static inline fs_reg
>  component(fs_reg reg, unsigned idx)
>  {
> assert(reg.subreg_offset == 0);




Re: [Mesa-dev] [PATCH v2] i965: Handle scratch accesses where reladdr also points to scratch space

2015-03-23 Thread Iago Toral
Thanks for reviewing! I'll write a new version including the changes you
suggest:

On Fri, 2015-03-20 at 19:21 +0200, Francisco Jerez wrote:
> Iago Toral Quiroga  writes:
> 
> > This is a problem when we have IR like this:
> >
> > (array_ref (var_ref temps) (swiz x (expression ivec4 bitcast_f2i
> >(swiz  (array_ref (var_ref temps) (constant int (2)) ) )) )) ) )
> >
> > where we are indexing an array with the result of an expression that
> > accesses the same array.
> >
> > In this scenario, temps will be moved to scratch space and we will need
> > to add scratch reads/writes for all accesses to temps, however, the
> > current implementation does not consider the case where a reladdr pointer
> > (obtained by indexing into temps through an expression) points to a register
> > that is also stored in scratch space (as in this case, where the expression
> > used to index temps accesses temps[2]), and thus, requires a scratch read
> > before it is accessed.
> >
> > v2 (Francisco Jerez):
> >  - Handle also recursive reladdr addressing.
> >  - Do not memcpy dst_reg into src_reg when rewriting reladdr.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89508
> > ---
> > A couple of notes for reviewers:
> >
> > 1. The implementation resolves recursive reladdr scratch accesses in one go
> > so we don't have to do multiple passes over the complete set of 
> > instructions.
> > This is more performant than doing something similar to what we do in
> > move_uniform_array_access_to_pull_constants.
> >
> 
> I doubt it will make any appreciable difference, multiple indirection is
> extremely unusual in practice.  My suggestion is don't increase
> complexity with the sole purpose of improving performance unless you
> have evidence that it will be appreciably beneficial in some realistic
> scenario.  Otherwise you run the risk of increasing the maintenance
> burden and distracting yourself and other developers from, say, working
> on improving performance where it really matters, often causing the
> effect opposite to what you originally intended.  As Donald Knuth said,
> premature optimization is the root of all evil. ;)

I agree that premature optimization is not a good idea in general, but
that was not really what drove me to do it. We have plenty of things in
Mesa and the drivers that go through the entire set of instructions
multiple times during compilation and linking, so it seemed like a good
idea to try, in general, to avoid doing this when there is no good
reason, and in this case it looked pretty easy to avoid. I agree that
this case is probably irrelevant in the big picture though.

> That said, I think you could fix this problem efficiently while keeping
> complexity under control if you factor out the logic you have duplicated
> between destination and source handling into a separate function, say:
> 
> | src_reg
> | emit_resolve_reladdr(int scratch_loc[], bblock_t *block,
> |  vec4_instruction *inst, src_reg src)
> 
> that would emit the necessary scratch loads into a temporary right
> before instruction inst and return either the temporary or src itself,
> if it was already in the expected form.  To handle destination
> indirection you'd just pass *inst->dst.reladdr as src and the next
> instruction as inst, and to handle source indirection you could pass
> inst->src[i] directly.  The nested looping you have that finds the last
> non-null reladdr of the chain in every iteration would become completely
> unnecessary by having emit_resolve_reladdr() call itself recursively to
> resolve src.reladdr if non-null.

Looks like a good idea, I'll give it a try.
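
For reference, a minimal sketch of the recursive shape suggested above. It
uses a toy register type, not Mesa's src_reg, and only counts the scratch
loads instead of emitting instructions; every toy_* name is hypothetical
and exists just for this illustration. Temporaries are numbered past the
original register count, so the scratch_loc[] lookup has to be guarded.

```c
#include <stddef.h>

#define TOY_MAX_REGS 16

/* Hypothetical, simplified register: NOT Mesa's src_reg. */
struct toy_reg {
   int nr;                   /* virtual register number */
   struct toy_reg *reladdr;  /* optional indirect index, NULL if none */
};

struct toy_ctx {
   int scratch_loc[TOY_MAX_REGS]; /* scratch offset, or -1 if not spilled */
   int num_regs;                  /* registers that existed before the pass */
   int next_temp;                 /* next free temporary register number */
   int loads_emitted;             /* stands in for emitting a scratch read */
};

/* Returns either src itself (already usable) or a fresh temporary that
 * would hold the value loaded from scratch.
 */
static struct toy_reg
toy_resolve_reladdr(struct toy_ctx *ctx, struct toy_reg src)
{
   /* Resolve the index first, so nested indirection such as a[b[0]]
    * is handled innermost-first without an explicit loop.
    */
   if (src.reladdr)
      *src.reladdr = toy_resolve_reladdr(ctx, *src.reladdr);

   /* Temporaries live beyond num_regs and are never in scratch. */
   if (src.nr >= ctx->num_regs || ctx->scratch_loc[src.nr] < 0)
      return src;

   /* Spilled register: "load" it into a new temporary. */
   struct toy_reg temp = { ctx->next_temp++, NULL };
   ctx->loads_emitted++;
   return temp;
}
```

With two spilled arrays a and b, resolving a[b[0]] in this model produces
one load for the index and one for the outer access.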

> A few more nit-picks below.
> 
> > 2. Once we start handling recursive reladdr we are rewriting reladdr
> > accesses to point to the destination registers of the scratch loads. This
> > means that alloc.count increases and we can have a reladdr pointing to a
> > register location beyond  the original alloc.count, so we should take this
> > into account when indexing scratch_loc[] with reladdr registers.
> >
> > I tested this for recursive reladdr accesses in both src and dst, including
> > indexing different arrays: a[b[a[b[0]]]], etc and seems to work well.
> >
> > No piglit regressions on IvyBridge.
> >
> > As a side note, I also noticed that opt_array_splitting.cpp does not handle
> > these situations well and hits an assertion in some cases where it wrongly
> > assumes that an array only has constant indexing. This problem is happening
> > in master and is unrelated to this patch so I'll 

Re: [Mesa-dev] [PATCH 4/5] glsl: Consider active all elements of a shared/std140 block array

2015-04-07 Thread Iago Toral
Besides fixing the mentioned dEQP crashes, this patch also generally
fixes instance arrays with UBOs. The problem we have now is that each
element in the UBO instance array is a separate UBO mapped to a specific
binding point (and thus, a separate buffer), but we kill the instances
that are not being referenced in the shader code, so if we have
something like this:

layout(std140, binding=2) uniform Fragments {
   vec4 v0;
   vec4 v1;
} inst[3];

And then the shader code only references inst[1], for example:

vec4 tfOutput0 = inst[1].v0;

That UBO read for inst[1].v0 can fail because we are killing the UBOs
for inst[0] and inst[2], and we shouldn't.

I hit this while developing SSBO, which is the same thing, and this
patch fixes the problem.

Iago

On Wed, 2015-03-11 at 10:01 +0100, Eduardo Lima Mitev wrote:
> From: Antia Puentes 
> 
> Commit 1ca25ab (glsl: Do not eliminate 'shared' or 'std140'
> blocks or block members) considers active 'shared' and 'std140'
> uniform blocks and uniform block arrays but did not include the
> block array elements. It was possible to have an active uniform
> block array without any elements marked as used, making the
> assertion ((b->num_array_elements > 0) == b->type->is_array())
> in link_uniform_blocks fail.
> 
> Fixes the following 5 dEQP tests:
> 
>  * dEQP-GLES3.functional.ubo.random.nested_structs_instance_arrays.18
>  * dEQP-GLES3.functional.ubo.random.nested_structs_instance_arrays.24
>  * dEQP-GLES3.functional.ubo.random.nested_structs_arrays_instance_arrays.19
>  * dEQP-GLES3.functional.ubo.random.all_per_block_buffers.49
>  * dEQP-GLES3.functional.ubo.random.all_shared_buffer.36
> ---
>  src/glsl/link_uniform_block_active_visitor.cpp | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/src/glsl/link_uniform_block_active_visitor.cpp 
> b/src/glsl/link_uniform_block_active_visitor.cpp
> index 292cde3..8379750 100644
> --- a/src/glsl/link_uniform_block_active_visitor.cpp
> +++ b/src/glsl/link_uniform_block_active_visitor.cpp
> @@ -105,6 +105,22 @@ link_uniform_block_active_visitor::visit(ir_variable 
> *var)
> assert(b->num_array_elements == 0);
> assert(b->array_elements == NULL);
> assert(b->type != NULL);
> +   assert(!b->type->is_array() || b->has_instance_name);
> +
> +   /* For uniform block arrays declared with a shared or std140 layout
> +* qualifier, mark all its instances as used.
> +*/
> +   if (b->type->is_array() && b->type->length > 0) {
> +  b->num_array_elements = b->type->length;
> +  b->array_elements = reralloc(this->mem_ctx,
> +   b->array_elements,
> +   unsigned,
> +   b->num_array_elements);
> +
> +  for (unsigned i = 0; i < b->num_array_elements; i++) {
> + b->array_elements[i] = i;
> +  }
> +   }
>  
> return visit_continue;
>  }
> @@ -146,6 +162,13 @@ 
> link_uniform_block_active_visitor::visit_enter(ir_dereference_array *ir)
> assert((b->num_array_elements == 0) == (b->array_elements == NULL));
> assert(b->type != NULL);
>  
> +   /* If the block array was declared with a shared or std140 layout 
> qualifier,
> +* all its instances have been already marked as used (see
> +* link_uniform_block_active_visitor::visit(ir_variable *) function).
> +*/
> +   if (var->type->interface_packing == GLSL_INTERFACE_PACKING_PACKED)
> +  return visit_continue;
> +
> ir_constant *c = ir->array_index->as_constant();
>  
> if (c) {




Re: [Mesa-dev] [PATCH 11/11] i965: Fix textureSize for Lod > 0 with non-mipmap filters

2015-04-13 Thread Iago Toral
Hi Ben,

On Fri, 2015-04-10 at 15:34 -0700, Ben Widawsky wrote:
> On Tue, Feb 10, 2015 at 04:40:48PM +0100, Eduardo Lima Mitev wrote:
> > From: Iago Toral Quiroga 
> > 
> > Currently, when the MinFilter is GL_LINEAR or GL_NEAREST we hide the
> > actual miplevel count from the hardware (and we avoid re-creating
> > the miptree structure with all the levels), since we don't expect
> > levels other than the base level to be needed. Unfortunately,
> > GLSL's textureSize() function is an exception to this rule. This
> > function takes a lod parameter that we need to use to return the
> > size of the appropriate miplevel (if it exists). The spec only
> > requires that the miplevel exists, so even if the sampler is
> > configured with a linear or nearest MinFilter, as far as the user
> > has uploaded miplevels for the texture, textureSize() should return
> > the appropriate sizes.
> > 
> > This patch fixes this by exposing the actual miplevel count for all
> > sampling engine textures while keeping the original implementation
> > for render targets (for render target textures we do not provide
> > the miplevel count but the actual LOD we are writing to, so we
> > want to make sure that we make this the base level).
> > 
> > Fixes 28 dEQP tests in the following category:
> > dEQP-GLES3.functional.shaders.texture_functions.texturesize.*
> > ---
> >  src/mesa/drivers/dri/i965/intel_tex_validate.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/intel_tex_validate.c 
> > b/src/mesa/drivers/dri/i965/intel_tex_validate.c
> > index 0bf0393..06aeca6 100644
> > --- a/src/mesa/drivers/dri/i965/intel_tex_validate.c
> > +++ b/src/mesa/drivers/dri/i965/intel_tex_validate.c
> > @@ -47,8 +47,10 @@ intel_update_max_level(struct intel_texture_object 
> > *intelObj,
> >  {
> > struct gl_texture_object *tObj = &intelObj->base;
> >  
> > -   if (sampler->MinFilter == GL_NEAREST ||
> > -   sampler->MinFilter == GL_LINEAR) {
> > +   if (!tObj->_MipmapComplete ||
> > +   (tObj->_RenderToTexture &&
> > +(sampler->MinFilter == GL_NEAREST ||
> > + sampler->MinFilter == GL_LINEAR))) {
> >intelObj->_MaxLevel = tObj->BaseLevel;
> > } else {
> >intelObj->_MaxLevel = tObj->_MaxLevel;
> 
> Forgive my ignorance on this subject in advance...
> 
> I don't know if _MipmapComplete is correct here. Looking at
> _mesa_test_texobj_completeness, it /seems/ like almost all the cases which 
> would
> mark the texture as !_MipmapComplete are errors, but reading the comments for
> incomplete() suggests to me there are potentially valid cases where you could
> use textureSize(), but have an incomplete mipmap. IN OTHER WORDS... it seems 
> to
> me that you may want something other than tObj->BaseLevel even when
> !tObj->_MipmapComplete. Clarification would be greatly appreciated.

That was necessary to fix a piglit test regression. I don't remember
which piglit test it was, sorry :(, but I do remember the problem:
the test uploaded a mipmap-complete texture first, did some tests with
it, and then uploaded level 0 again with a _different_ texture
size (which makes the texture mipmap-incomplete again, because all
levels other than the base are incorrect at that point). In this
scenario only the base level is valid and we want to make sure that we
only upload that. Not doing this would lead to a crash in the driver at
some point later on.

Looking at the comments in incomplete(), cases 1 and 3 seem like errors
to me, and case 2  is exactly the case triggered by the piglit test I
mention above (in which we want only the base level uploaded). I think
_MipmapComplete can only be TRUE when levels other than the base level
are valid.

Make sense?
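
To spell out the condition from the patch, here is a small stand-alone
sketch of the level selection. The struct and enum are reduced stand-ins
invented for this example (they are not the real gl_texture_object or
sampler types); only the boolean logic mirrors the patch.

```c
#include <stdbool.h>

/* Hypothetical, reduced texture state for illustration only. */
struct toy_tex {
   bool mipmap_complete;    /* levels other than the base are valid */
   bool render_to_texture;  /* texture is bound as a render target */
   int base_level;
   int max_level;
};

enum toy_min_filter {
   TOY_NEAREST,
   TOY_LINEAR,
   TOY_LINEAR_MIPMAP_LINEAR
};

/* Returns the last miplevel that would be exposed to the hardware. */
static int
toy_pick_max_level(const struct toy_tex *tex, enum toy_min_filter filter)
{
   bool non_mipmap_filter = filter == TOY_NEAREST || filter == TOY_LINEAR;

   /* Only the base level is usable if the mipmap is incomplete, and
    * render targets keep the old behaviour for non-mipmap filters.
    */
   if (!tex->mipmap_complete ||
       (tex->render_to_texture && non_mipmap_filter))
      return tex->base_level;

   /* Otherwise expose every level, so textureSize() with lod > 0 works
    * even with a GL_NEAREST or GL_LINEAR MinFilter.
    */
   return tex->max_level;
}
```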

> I assume the use of _MipmapComplete was what required adding the bit about
> tObj->_RenderToTexture (because certain textures are considered complete even
> when they're not??). I don't know enough to assert correctness for that 
> though.
> Would be nice if you could dumb this down for me as well.

Not really, the reason to add _RenderToTexture is different, it is
related to this paragraph from the commit log:

"This patch fixes this by exposing the actual miplevel count for all
sampling engine textures while keeping the original implementation for
render targets (for render target textures we do not provide
the miplevel count but the actual LOD we are writing to, so we
want to make sure that we make this the base level)."

So what I found is that for render targets the driver 

Re: [Mesa-dev] [PATCH 2/2] mesa: Always return a value in _mesa_format_from_format_and_type

2015-05-04 Thread Iago Toral
On Mon, 2015-05-04 at 05:52 -0700, Jason Ekstrand wrote:
> 
> On May 4, 2015 4:25 AM, "Iago Toral Quiroga" 
> wrote:
> >
> > Return MESA_FORMAT_NONE if no matching type was found. We have an
> > assertion here, but we should return something anyway to avoid
> > confusion with non-debug builds.
> > ---
> >  src/mesa/main/glformats.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
> > index 8ced579..fddd048 100644
> > --- a/src/mesa/main/glformats.c
> > +++ b/src/mesa/main/glformats.c
> > @@ -2752,4 +2752,5 @@ _mesa_format_from_format_and_type(GLenum
> format, GLenum type)
> >  * format in that case.
> >  */
> > unreachable("Unsupported format");
> > +   return MESA_FORMAT_NONE;
> 
> If we want this to have well-defined results, this should not be an
> unreachable().  The semantics of unreachable allow the compiler to
> generate wrong code for the case where it does get reached.  We could
> switch it to an assert or we could just drop this patch. I'd be a fan
> of the latter.
> --Jason

Ok, let's drop this patch then.

Iago
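
For completeness, a sketch of the other option Jason mentions (an assert
plus a well-defined return value instead of unreachable()). The enum
values and the function are made up for this example and are not the real
glformats.c code; the point is only that the fall-through path has defined
behaviour in non-debug builds.

```c
#include <assert.h>

/* Hypothetical types and formats, for illustration only. */
enum toy_type { TOY_TYPE_UBYTE = 1, TOY_TYPE_USHORT = 2 };
enum toy_format { TOY_FORMAT_NONE, TOY_FORMAT_R8, TOY_FORMAT_R16 };

static enum toy_format
toy_format_from_type(int type)
{
   switch (type) {
   case TOY_TYPE_UBYTE:
      return TOY_FORMAT_R8;
   case TOY_TYPE_USHORT:
      return TOY_FORMAT_R16;
   default:
      break;
   }

   /* Debug builds stop here; builds with NDEBUG fall through and
    * return a sentinel the caller can check.
    */
   assert(!"Unsupported format");
   return TOY_FORMAT_NONE;
}
```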



Re: [Mesa-dev] [PATCH] i965: Fix SHADER_OPCODE_UNTYPED_SURFACE_WRITE to mark written surfaces

2015-05-05 Thread Iago Toral
On Tue, 2015-05-05 at 16:41 +0300, Francisco Jerez wrote:
> Iago Toral Quiroga  writes:
> 
> > We can do this when the surface index is an immediate, as we do for
> > reads. Otherwise the visitor should handle this.
> > ---
> > Francisco, maybe you choose not to do this for a reason? It seems a bit
> > inconsistent with the reads though. Also, other read/write messages seem
> > to mark surfaces in the generator when the surface index is an immediate
> > and leave the visitor to handle non-immediate scenarios.
> >
> 
> Yeah, this deviation from the other untyped surface opcodes was
> intentional, actually I sent a patch to remove the implicit call to
> brw_mark_surface_used() for untyped surface reads and atomics which I
> haven't landed yet [1].

Good to know, since I'll be using these functions I'll make sure I
always mark used surfaces in the visitor code then.

>   The reason is that the surface index may not be
> an immediate, so the visitor cannot in general rely on the generator to
> do it.  Whether the generator can do it or not is even dependent on the
> extent that optimization passes are able to simplify the program, so it
> could lead to hard to debug errors.  It seems more consistent and less
> error-prone to have the visitor handle this unconditionally rather than
> sharing the responsibility between visitor and generator depending on
> the situation.

Yes, makes sense.

Thanks,
Iago

> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-May/083278.html
> 
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > index b8b8ed1..cf4b9ae 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > @@ -2247,6 +2247,8 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> > dispatch_width)
> >   assert(src[2].file == BRW_IMMEDIATE_VALUE);
> >   brw_untyped_surface_write(p, src[0], src[1],
> > inst->mlen, src[2].dw1.ud);
> > + if (src[1].file == BRW_IMMEDIATE_VALUE)
> > +brw_mark_surface_used(prog_data, src[1].dw1.ud);
> >   break;
> >  
> >case SHADER_OPCODE_TYPED_ATOMIC:
> > -- 
> > 1.9.1




Re: [Mesa-dev] [PATCH] glsl: Fix memory leak in builtin_builder::_image_prototype.

2014-10-05 Thread Iago Toral
On Fri, 2014-10-03 at 13:27 +0300, Francisco Jerez wrote:
> Kenneth Graunke  writes:
> 
> > On Thursday, October 02, 2014 11:28:35 AM Ian Romanick wrote:
> >> Since this is Curro's code, I'm CC'ing him.
> >> 
> >> On 10/01/2014 03:12 AM, Iago Toral Quiroga wrote:
> >> > in_var calls the ir_variable constructor, which dups the variable name.
> >> > ---
> >> >  src/glsl/builtin_functions.cpp | 8 +---
> >> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >> > 
> >> > diff --git a/src/glsl/builtin_functions.cpp 
> >> > b/src/glsl/builtin_functions.cpp
> >> > index 5a024cb..7d61fcc 100644
> >> > --- a/src/glsl/builtin_functions.cpp
> >> > +++ b/src/glsl/builtin_functions.cpp
> >> > @@ -4465,9 +4465,11 @@ builtin_builder::_image_prototype(const glsl_type 
> >> > *image_type,
> >> >sig->parameters.push_tail(in_var(glsl_type::int_type, "sample"));
> >> >  
> >> > /* Data arguments. */
> >> > -   for (unsigned i = 0; i < num_arguments; ++i)
> >> > -  sig->parameters.push_tail(in_var(data_type,
> >> > -   ralloc_asprintf(NULL, "arg%d", 
> >> > i)));
> >> > +   for (unsigned i = 0; i < num_arguments; ++i) {
> >> > +  char *arg_name = ralloc_asprintf(NULL, "arg%d", i);
> >> > +  sig->parameters.push_tail(in_var(data_type, arg_name));
> >> > +  ralloc_free(arg_name);
> >> > +   }
> >> 
> >> Using a NULL memory context is generally bad... precisely because it
> >> often leads to memory leaks.
> >> 
> >> There are a couple ways to fix this.  Since all of the image functions
> >> have a limited number of parameters, we could either:
> >> 
> >>  - Have a fixed size buffer that we snprintf to.
> >> 
> >>  - Have a table of all the parameter names.
> >> 
> >>  - Since this is the function prototype, I don't think we need names for
> >> the parameters at all.  Just pass NULL?
> >
> > Does anything even use the names?  I don't think anything does...at which 
> > point, why not just call them all "arg" and be done with it?
> >
> 
> Aren't the names useful for debugging and error reporting?  But sure,
> you're right that "arg1" isn't a lot more meaningful than "arg".

Francisco: so I understand that you would be okay with just passing
"arg" for all these parameters as Kenneth suggested? I can send the
patch for this if we agree to do this.

Iago
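
As a reference for the first alternative Ian lists (a fixed-size buffer
with snprintf), a minimal sketch follows. The helper name and buffer sizes
are assumptions made for this example, not code from builtin_functions.cpp.
Because the buffer lives on the stack there is nothing to free, which is
what avoids the leak from the NULL ralloc context.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "arg<i>" into buf.
 * Returns 1 on success, 0 if the name did not fit in the buffer.
 */
static int
toy_format_arg_name(char *buf, size_t size, unsigned i)
{
   int len = snprintf(buf, size, "arg%u", i);

   /* snprintf returns the length it wanted to write, so a value
    * >= size means the output was truncated.
    */
   return len >= 0 && (size_t) len < size;
}
```

This relies on the callee copying the name, which is what the commit
message says the ir_variable constructor does.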



Re: [Mesa-dev] [PATCH] mesa/main: Fix unpack_R5G6B5_UNORM.

2014-10-07 Thread Iago Toral

On 2014-10-07 21:46, Jason Ekstrand wrote:

On Oct 7, 2014 12:04 PM, "Iago Toral Quiroga" 
wrote:
 >
 > There is a comment warning about the fact that this is not doing
what we
 > expect, so fix it. This should be doing the same as
unpack_B5G6R5_UNORM
 > but swapping B and R.
 > ---
 >
 > Jason started some work to auto-generate the format_pack.c and
 > format_unpack.c files which I think has this fixed. I am continuing
his
 > work on this at the moment, but I guess it might make sense to fix
this in
 > the current code too while that work is on-going.

Not much time to reply right now, but I seem to recall there being a
bit more to fix here.  What about packing and swrast's texel fetch
implementation.  Are those OK for this format?


No piglit regressions observed.


On what drivers?  It might be good to test on swrast and llvmpipe. 
I'm not super-concerned there but we should at least try not to break
it.
 --Jason


That was on Intel, but you are right, at least classic swrast has some 
regressions with this.


For reference, I did not see any regressions on Gallium Softpipe, but I 
could only run a small subset of tests with this driver (-t texture -t 
color -t format); otherwise it would hog my CPU and eventually crash 
my system. I could not test with llvmpipe: for some reason, even though 
I am building with llvm and I see the llvmpipe sources being built, Mesa 
insists on using the softpipe driver at runtime...


Since there are regressions on swrast at least I guess we should just 
drop this patch until we have a proper fix for all drivers.


Iago
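
For readers following along, a stand-alone sketch of the unpacking that
the patch proposes, for a single texel. The bit layout (R in bits 0-4, G in
bits 5-10, B in bits 11-15) is taken from the patch, not from an
authoritative definition of MESA_FORMAT_R5G6B5_UNORM, and the function name
is hypothetical. It divides by the channel maximum instead of multiplying
by the reciprocal as the patch does, so that full-scale values come out as
exactly 1.0.

```c
#include <stdint.h>

/* Hypothetical helper: unpack one 16-bit texel into float RGBA,
 * assuming R is stored in the low 5 bits and B in the high 5 bits.
 */
static void
toy_unpack_r5g6b5_unorm(uint16_t texel, float rgba[4])
{
   rgba[0] = (float) ((texel      ) & 0x1f) / 31.0f;  /* 5-bit red */
   rgba[1] = (float) ((texel >>  5) & 0x3f) / 63.0f;  /* 6-bit green */
   rgba[2] = (float) ((texel >> 11) & 0x1f) / 31.0f;  /* 5-bit blue */
   rgba[3] = 1.0f;                                    /* no alpha stored */
}
```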




 >  src/mesa/main/format_unpack.c | 10 +++---
 >  1 file changed, 3 insertions(+), 7 deletions(-)
 >
 > diff --git a/src/mesa/main/format_unpack.c
b/src/mesa/main/format_unpack.c
 > index d5628a9..11b028c 100644
 > --- a/src/mesa/main/format_unpack.c
 > +++ b/src/mesa/main/format_unpack.c
 > @@ -207,16 +207,12 @@ unpack_B5G6R5_UNORM(const void *src, GLfloat
dst[][4], GLuint n)
 >  static void
 >  unpack_R5G6B5_UNORM(const void *src, GLfloat dst[][4], GLuint n)
 >  {
 > -   /* Warning: this function does not match the current Mesa
definition
 > -    * of MESA_FORMAT_R5G6B5_UNORM.
 > -    */
 >     const GLushort *s = ((const GLushort *) src);
 >     GLuint i;
 >     for (i = 0; i < n; i++) {
 > -      GLuint t = (s[i] >> 8) | (s[i] << 8); /* byte swap */
 > -      dst[i][RCOMP] = UBYTE_TO_FLOAT( ((t >> 8) & 0xf8) | ((t >> 13) & 0x7) );
 > -      dst[i][GCOMP] = UBYTE_TO_FLOAT( ((t >> 3) & 0xfc) | ((t >>  9) & 0x3) );
 > -      dst[i][BCOMP] = UBYTE_TO_FLOAT( ((t << 3) & 0xf8) | ((t >>  2) & 0x7) );
 > +      dst[i][RCOMP] = ((s[i]      ) & 0x1f) * (1.0F / 31.0F);
 > +      dst[i][GCOMP] = ((s[i] >> 5 ) & 0x3f) * (1.0F / 63.0F);
 > +      dst[i][BCOMP] = ((s[i] >> 11) & 0x1f) * (1.0F / 31.0F);
 >        dst[i][ACOMP] = 1.0F;
 >     }
 >  }
 > --
 > 1.9.1
 >


Re: [Mesa-dev] [PATCH] mesa/main: Fix unpack_R5G6B5_UNORM.

2014-10-24 Thread Iago Toral

On 2014-10-08 07:11, Jason Ekstrand wrote:

On Oct 8, 2014 6:36 AM, "Iago Toral"  wrote:
 >
 > On 2014-10-07 21:46, Jason Ekstrand wrote:
 >
 >> On Oct 7, 2014 12:04 PM, "Iago Toral Quiroga" 
 >> wrote:
 >>  >
 >>  > There is a comment warning about the fact that this is not
doing
 >> what we
 >>  > expect, so fix it. This should be doing the same as
 >> unpack_B5G6R5_UNORM
 >>  > but swapping B and R.
 >>  > ---
 >>  >
 >>  > Jason started some work to auto-generate the format_pack.c and
 >>  > format_unpack.c files which I think has this fixed. I am
continuing
 >> his
 >>  > work on this at the moment, but I guess it might make sense to
fix
 >> this in
 >>  > the current code too while that work is on-going.
 >>
 >> Not much time to reply right now, but I seem to recall there being
a
 >> bit more to fix here.  What about packing and swrast's texel
fetch
 >> implementation.  Are those OK for this format?
 >>
 >>> No piglit regressions observed.
 >>
 >>
 >> On what drivers?  It might be good to test on swrast and
llvmpipe. 
 >> I'm not super-concerned there but we should at least try not to
break
 >> it.
 >>  --Jason
 >
 >
 > That was on Intel, but you are right, at least classic swrast has
some regressions with this.
 >
 > For reference, I did not see any regressions on Gallium Softpipe,
 > but I could only run a small subset of tests with this driver (-t
 > texture -t color -t format); otherwise it would hog my CPU and
 > eventually crash my system. I could not test with llvmpipe: for some
 > reason, even though I am building with llvm and I see the llvmpipe
 > sources being built, Mesa insists on using the softpipe driver at
 > runtime...

I can give you the configure flags for testing llvmpipe if you'd
like. 


I am using --with-gallium-drivers=swrast, I think that and having llvm 
should be all, right?


Configure reports that I have llvm, I see sources under 
src/gallium/drivers/llvmpipe/ being built, and libllvmpipe.la is being 
generated, so I figure that my configure flags are working fine. My 
problem seems to be at runtime: I thought that as long as llvmpipe is 
built it would take priority over softpipe, but for some reason that is 
not happening for me...



Yes, doing a full piglit run on llvmpipe or swrast requires a
pretty beefy desktop and still takes quite a while.


Since there are regressions on swrast at least I guess we should

just drop this patch until we have a proper fix for all drivers.

I'm not as worried about llvmpipe because it fails about half the
tests it attempts anyway.  I would like to know what's going on with
swrast though.

Brian, any ideas?

--Jason




 > Iago
 >
 >>>
 >>  >  src/mesa/main/format_unpack.c | 10 +++---
 >>  >  1 file changed, 3 insertions(+), 7 deletions(-)
 >>  >
 >>  > diff --git a/src/mesa/main/format_unpack.c
 >> b/src/mesa/main/format_unpack.c
 >>  > index d5628a9..11b028c 100644
 >>  > --- a/src/mesa/main/format_unpack.c
 >>  > +++ b/src/mesa/main/format_unpack.c
 >>  > @@ -207,16 +207,12 @@ unpack_B5G6R5_UNORM(const void *src,
GLfloat
 >> dst[][4], GLuint n)
 >>  >  static void
 >>  >  unpack_R5G6B5_UNORM(const void *src, GLfloat dst[][4],
GLuint n)
 >>  >  {
 >>  > -   /* Warning: this function does not match the current
Mesa
 >> definition
 >>  > -    * of MESA_FORMAT_R5G6B5_UNORM.
 >>  > -    */
 >>  >     const GLushort *s = ((const GLushort *) src);
 >>  >     GLuint i;
 >>  >     for (i = 0; i < n; i++) {
 >>  > -      GLuint t = (s[i] >> 8) | (s[i] << 8); /* byte swap */
 >>  > -      dst[i][RCOMP] = UBYTE_TO_FLOAT( ((t >> 8) & 0xf8) | ((t >> 13) & 0x7) );
 >>  > -      dst[i][GCOMP] = UBYTE_TO_FLOAT( ((t >> 3) & 0xfc) | ((t >>  9) & 0x3) );
 >>  > -      dst[i][BCOMP] = UBYTE_TO_FLOAT( ((t << 3) & 0xf8) | ((t >>  2) & 0x7) );
 >>  > +      dst[i][RCOMP] = ((s[i]      ) & 0x1f) * (1.0F / 31.0F);
 >>  > +      dst[i][GCOMP] = ((s[i] >> 5 ) & 0x3f) * (1.0F / 63.0F);
 >>  > +      dst[i][BCOMP] = ((s[i] >> 11) & 0x1f) * (1.0F / 31.0F);
 >>  >        dst[i][ACOMP] = 1.0F;
 >>  >     }
 >>  >  }
 >>  > --
 >>  > 1.9.1
 >>  >


Re: [Mesa-dev] [PULL] i965: rename brw_gs -> brw_ff_gs; rename brw_vec4_gs -> brw_gs.

2014-10-29 Thread Iago Toral
On Tue, 2014-10-28 at 19:27 -0700, Kenneth Graunke wrote:
> Hello,
> 
> I'd like to rename some files in i965:
> 
> - brw_gs.c  -> brw_ff_gs.c
> - brw_gs.h  -> brw_ff_gs.h
> - brw_gs_emit.c -> brw_ff_gs_emit.c
> - brw_vec4_gs.c -> brw_gs.c
> - brw_vec4_gs.h -> brw_gs.h
> 
> The current "brw_gs" files are about emulating fixed-function functionality 
> (VF primitive decomposition and SOL) via the geometry shader; actual 
> programmable geometry shader code is handled by brw_vec4_gs.[ch].
> 
> With the advent of SIMD8 geometry shaders, "vec4_gs" will be confusing.  "gs" 
> is nicer.  Most of the legacy code uses the "ff_gs" name already - when Paul 
> respun his GS series, he renamed everything, but didn't change the filenames.
> 
> Objections?  Acks?
> 
> The "i965-reorg" branch of my tree (~kwg/mesa) has two patches to do the 
> renames:
> 
> i965: Rename brw_vec4_gs.[ch] to brw_gs.[ch].
> i965: Rename brw_gs{,_emit}.[ch] to brw_ff_gs{,_emit}.[ch].
> 
> Since they're purely "git mv" and #include fixes, I figured mailing out the 
> diff would be useless.

I think this is a good idea. I remember being a bit confused by the file
names while I was working on geometry shaders. The new names look
better to me.

Iago


Re: [Mesa-dev] [PATCH 00/20] Auto-generate pack/unpack functions

2014-11-18 Thread Iago Toral
I forgot to say that the series is available for testing here:
https://github.com/Igalia/mesa/tree/itoral-autogen-packing-review

Also, one of the patches was held for review by the list owner due to
its size (patch 12, which handles auto-generation of format_unpack.c),
so reviewers can fetch it from that link too if it does not reach the
list in the end.

Iago

On Tue, 2014-11-18 at 09:43 +0100, Iago Toral Quiroga wrote:
> This is the first of two series of patches to address:
> https://bugs.freedesktop.org/show_bug.cgi?id=84566
> 
> The idea is that we have a lot of format conversion code scattered through
> different files in the repository, a lot of that is redundant / duplicated,
> so this intends to address that issue.
> 
> The goal of this first series is to address auto-generation of our pack/unpack
> functions (format_pack.c and format_unpack.c). Currently, we  have a ton of
> hand-coded pack/unpack functions for lots of formats, but we can auto-generate
> most of that code instead, so this series handles this.
> 
> This is based on initial work by Jason Ekstrand.
> 
> Tested on i965, classic swrast and gallium (radeon, nouveau, llvmpipe) without
> regressions.
> 
> For software drivers we worked with a trimmed set of piglit tests (related to
> format conversion), ~5700 tests selected with the following filter:
> 
> -t format -t color -t tex -t image -t swizzle -t clamp -t rgb -t lum -t pix
> -t fbo -t frame
> 
> Summary of the patches:
>  * Patches 1-7 are general fixes to the current code that were found while
>working on this.
>  * Patches 8-16 implement auto-generation of pack/unpack functions.
>  * Patches 17-20 make use of the auto-generated pack/unpack functions in
>various places to simplify the current code.
> 
> Notice that some of the fixes in patches 1-7 will become obsolete as soon as
> we auto-generate the pack/unpack functions, but we thought it would make sense
> to keep them in the patch set anyway since we started from that base and they
> should be correct fixes to the currently existing code.
> 
> Iago Toral Quiroga (1):
>   swrast: Remove unused variable.
> 
> Jason Ekstrand (9):
>   mesa/format_utils: Fix a bug in one of the format helper functions
>   mesa: Fix packing/unpacking of MESA_FORMAT_R5G6B5_UNORM
>   mesa/colormac: Remove an unused macro
>   mesa: Fix A1R5G5B5 packing/unpacking
>   mesa/format_utils: Prefix and expose the conversion helper functions
>   mesa: Add a concept of an array format
>   mesa: Add a _mesa_is_format_color_format helper
>   mesa: Autogenerate most of format_pack.c
>   mesa: Autogenerate format_unpack.c
> 
> Samuel Iglesias Gonsalvez (10):
>   mesa: Fix get_texbuffer_format().
>   mesa: Fix _mesa_swizzle_and_convert integer conversions to clamp
> properly
>   mesa: Add _mesa_pack_uint_rgba_row() format conversion function
>   mesa: Add non-normalized formats support for ubyte packing functions
>   mesa/format_pack: Add _mesa_pack_int_rgba_row()
>   mesa/formats: add new mesa formats and their pack/unpack functions.
>   mesa: use format conversion functions in swrast
>   mesa/pack: use autogenerated format_pack functions
>   mesa/main/pack_tmp.h: Add float conversion support
>   mesa/pack: refactor _mesa_pack_rgba_span_float()
> 
>  src/mesa/Makefile.am   |   18 +
>  src/mesa/Makefile.sources  |4 +-
>  src/mesa/main/colormac.h   |3 -
>  src/mesa/main/format_convert.py|   71 +
>  src/mesa/main/format_info.py   |   41 +
>  src/mesa/main/format_pack.c| 2994 
>  src/mesa/main/format_pack.c.mako   | 1147 ++
>  src/mesa/main/format_pack.h|6 +
>  src/mesa/main/format_unpack.c  | 4400 
> 
>  src/mesa/main/format_unpack.c.mako |  914 
>  src/mesa/main/format_utils.c   |  248 +-
>  src/mesa/main/format_utils.h   |  105 +
>  src/mesa/main/formats.c|  193 +-
>  src/mesa/main/formats.csv  |   13 +
>  src/mesa/main/formats.h|   73 +
>  src/mesa/main/pack.c   | 2111 +++--
>  src/mesa/main/pack_tmp.h   |   76 +-
>  src/mesa/main/run_mako.py  |7 +
>  src/mesa/main/teximage.c   |4 +-
>  src/mesa/main/texstore.c   |2 +-
>  src/mesa/swrast/s_drawpix.c|3 -
>  src/mesa/swrast/s_texfetch.c   |   13 +
>  src/mesa/swrast/s_texfetch_tmp.h   | 1359 +--
>  23 files changed, 3222 insertions(+), 10583 deletions(-)
>  create mode 100644 src/mesa/main/format_convert.py
>  delete mode 100644 src/mesa/main/format_pack.c
>  create mode 100644 src/mesa/main/format_pack.c.mako
>  delete mode 100644 src/mesa/main/format_unpack.c
>  create mode 100644 src/mesa/main/format_unpack.c.mako
>  create mode 100644 src/mesa/main/run_mako.py
> 



Re: [Mesa-dev] [PATCH 09/20] mesa: Add a concept of an array format

2014-11-18 Thread Iago Toral
On Tue, 2014-11-18 at 10:46 -0800, Jason Ekstrand wrote:
> 
> 
> On Tue, Nov 18, 2014 at 12:43 AM, Iago Toral Quiroga
>  wrote:
> From: Jason Ekstrand 
> 
> An array format is a 32-bit integer format identifier that can
> represent
> any format that can be represented as an array of standard GL
> datatypes.
> While the MESA_FORMAT enums provide several of these, they
> don't account
> for all of them.
> 
> v2 by Iago Toral Quiroga :
> - Set pad to 0 and array_format_bit to 1 for all mesa array
> formats.
> - Fix array_format_flip_channels, since it was not doing what
> was expected.
> 
> 
> I'm not terribly surprised, I never got a chance to test that function
> before I handed it off to you.  I'm curious how you found it though,
> given that it's only run on big endian architectures.  Maybe it was
> just obviously wrong.  (silly me).  In any case, the new version looks
> good to me.

I noticed that because at some point I thought I needed to flip the
channels for GL_UNSIGNED_INT_8_8_8_8 in certain cases (I was wrong about
that in the end, but that drove me to look into it), and when I looked
at the function it was pretty obvious that it was not doing what was
expected.

Iago

> --Jason
> 
>  
> ---
>  src/mesa/main/format_info.py | 41
> 
>  src/mesa/main/formats.c  | 56
> +++-
>  src/mesa/main/formats.h  | 55
> +++
>  3 files changed, 151 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/main/format_info.py
> b/src/mesa/main/format_info.py
> index 7424fe0..315767d 100644
> --- a/src/mesa/main/format_info.py
> +++ b/src/mesa/main/format_info.py
> @@ -98,6 +98,32 @@ def get_gl_data_type(fmat):
> else:
>assert False
> 
> +def get_array_format_datatype(chan):
> +   if chan.type == parser.FLOAT:
> +  if chan.size == 16:
> + return 'MESA_ARRAY_FORMAT_TYPE_HALF'
> +  elif chan.size == 32:
> + return 'MESA_ARRAY_FORMAT_TYPE_FLOAT'
> +  else:
> + assert False
> +   elif chan.type in (parser.SIGNED, parser.UNSIGNED):
> +  datatype = 'MESA_ARRAY_FORMAT_TYPE_'
> +  if chan.type == parser.UNSIGNED:
> + datatype += 'U'
> +
> +  if chan.size == 8:
> + datatype += 'BYTE'
> +  elif chan.size == 16:
> + datatype += 'SHORT'
> +  elif chan.size == 32:
> + datatype += 'INT'
> +  else:
> + print chan.size
> + assert False
> +  return datatype
> +   else:
> +  assert False
> +
>  def get_mesa_layout(fmat):
> if fmat.layout == 'array':
>return 'MESA_FORMAT_LAYOUT_ARRAY'
> @@ -192,6 +218,21 @@ for fmat in formats:
> 
> int(fmat.block_size() / 8))
> 
> print '  {{ {0} }},'.format(', '.join(map(str,
> fmat.swizzle)))
> +   if fmat.is_array() and fmat.colorspace in ('rgb', 'srgb'):
> +  chan = fmat.array_element()
> +  print '   {0} ,'.format(', '.join([
> + get_array_format_datatype(chan),
> + str(int(chan.norm)),
> + str(len(fmat.channels)),
> + str(fmat.swizzle[0]),
> + str(fmat.swizzle[1]),
> + str(fmat.swizzle[2]),
> + str(fmat.swizzle[3]),
> + str(int(0)),
> + str(int(1))
> +  ]))
> +   else:
> +  print '  {{ MESA_ARRAY_FORMAT_TYPE_UBYTE, 0, 0, 0,
> 0, 0, 0, 0, 0 }},'
> print '   },'
> 
>  print '};'
> diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c
> index 7ec0507..f86925e 100644
> --- a/src/mesa/main/formats.c
> +++ b/src/mesa/main/formats.c
> @@ -71,6 +71,7 @@ struct gl_format_info
> GLubyte BytesPerBlock;
> 
> uint8_t Swizzle[4];
> +   mesa_array_f

Re: [Mesa-dev] [PATCH 16/29] mesa: use master conversion function to implement get_tex_rgba_uncompressed()

2014-11-18 Thread Iago Toral
On Tue, 2014-11-18 at 08:00 -0700, Brian Paul wrote:
> On 11/18/2014 02:23 AM, Iago Toral Quiroga wrote:
> > From: Samuel Iglesias Gonsalvez 
> >
> > This covers glGetTexImage for uncompressed color formats.
> >
> > Signed-off-by: Samuel Iglesias Gonsalvez 
> > Signed-off-by: Eduardo Lima Mitev 
> > ---
> >   src/mesa/main/texgetimage.c | 170 
> > ++--
> >   1 file changed, 117 insertions(+), 53 deletions(-)
> >
> > diff --git a/src/mesa/main/texgetimage.c b/src/mesa/main/texgetimage.c
> > index cb5f793..84cd53e 100644
> > --- a/src/mesa/main/texgetimage.c
> > +++ b/src/mesa/main/texgetimage.c
> > @@ -45,7 +45,8 @@
> >   #include "texgetimage.h"
> >   #include "teximage.h"
> >   #include "texstore.h"
> > -
> > +#include "format_utils.h"
> > +#include "pixeltransfer.h"
> >
> >
> >   /**
> > @@ -380,20 +381,10 @@ get_tex_rgba_uncompressed(struct gl_context *ctx, 
> > GLuint dimensions,
> >  GLenum rebaseFormat = GL_NONE;
> >  GLuint height = texImage->Height;
> >  GLuint depth = texImage->Depth;
> > -   GLuint img, row;
> > -   GLfloat (*rgba)[4];
> > -   GLuint (*rgba_uint)[4];
> > -   GLboolean tex_is_integer = 
> > _mesa_is_format_integer_color(texImage->TexFormat);
> > -   GLboolean tex_is_uint = _mesa_is_format_unsigned(texImage->TexFormat);
> > +   GLuint img;
> >  GLenum texBaseFormat = 
> > _mesa_get_format_base_format(texImage->TexFormat);
> >
> > -   /* Allocate buffer for one row of texels */
> > -   rgba = malloc(4 * width * sizeof(GLfloat));
> > -   rgba_uint = (GLuint (*)[4]) rgba;
> > -   if (!rgba) {
> > -  _mesa_error(ctx, GL_OUT_OF_MEMORY, "glGetTexImage()");
> > -  return;
> > -   }
> > +   assert (depth <= 1 || dimensions > 2);
> >
> >  if (texImage->TexObject->Target == GL_TEXTURE_1D_ARRAY) {
> > depth = height;
> > @@ -413,9 +404,7 @@ get_tex_rgba_uncompressed(struct gl_context *ctx, 
> > GLuint dimensions,
> >texImage->_BaseFormat == GL_RGB ||
> >texImage->_BaseFormat == GL_RG) &&
> >   (destBaseFormat == GL_LUMINANCE ||
> > - destBaseFormat == GL_LUMINANCE_ALPHA ||
> > - destBaseFormat == GL_LUMINANCE_INTEGER_EXT ||
> > - destBaseFormat == GL_LUMINANCE_ALPHA_INTEGER_EXT)) {
> > + destBaseFormat == GL_LUMINANCE_ALPHA)) {
> > /* If we're reading back an RGB(A) texture as luminance then we need
> >  * to return L=tex(R).  Note, that's different from glReadPixels 
> > which
> >  * returns L=R+G+B.
> > @@ -467,6 +456,22 @@ get_tex_rgba_uncompressed(struct gl_context *ctx, 
> > GLuint dimensions,
> > }
> >  }
> >
> > +   /* Describe the dst format */
> > +   GLboolean dst_is_integer = _mesa_is_enum_format_integer(format);
> > +   uint32_t dst_format = _mesa_format_from_format_and_type(format, type);
> > +   int dst_stride = _mesa_image_row_stride(&ctx->Pack, width, format, 
> > type);
> 
> This won't build with MSVC.  Declarations cannot follow code.

Oops! I think I have done this in a number of patches.
Noted, I'll clean these up. Thanks!
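
The cleanup Brian asks for is the usual C89 pattern: hoist the declarations to the top of the enclosing block and turn the mid-block initialisers into plain assignments. A minimal sketch, with made-up function names that are not taken from texgetimage.c:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed for one row of pixels. */
static size_t
row_stride(size_t width, size_t bytes_per_pixel)
{
   return width * bytes_per_pixel;
}

static size_t
image_size(size_t width, size_t height, size_t bytes_per_pixel)
{
   /* C89 style: every declaration comes before the first statement. */
   size_t stride;
   size_t total;

   assert(bytes_per_pixel > 0);   /* first statement of the block */

   /* What used to be an initialiser at a mid-block declaration
    * becomes a plain assignment. */
   stride = row_stride(width, bytes_per_pixel);
   total = stride * height;
   return total;
}
```

An alternative that keeps the initialisers is to open a new nested block, since a declaration at the start of any block is acceptable under the same rule.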

> 
> 
> > +
> > +   /* Since _mesa_format_convert does not handle transferOps we need to 
> > handle
> > +* them before we call the function. This requires to convert to RGBA 
> > float
> > +* first so we can call _mesa_apply_rgba_transfer_ops. If the dst 
> > format is
> > +* integer we can ignore transferOps.
> > +*
> > +* Some source formats (Luminance) will also require to be rebased to 
> > obtain
> > +* the expected results and this also requires to convert to RGBA first.
> > +*/
> > +   assert(!transferOps || (transferOps && !dst_is_integer));
> > +   bool needs_rgba = (transferOps || rebaseFormat != GL_NONE);
> > +
> >  for (img = 0; img < depth; img++) {
> > GLubyte *srcMap;
> > GLint rowstride;
> > @@ -475,46 +480,105 @@ get_tex_rgba_uncompressed(struct gl_context *ctx, 
> > GLuint dimensions,
> > ctx->Driver.MapTextureImage(ctx, texImage, img,
> > 0, 0, width, height, GL_MAP_READ_BIT,
> > &srcMap, &rowstride);
> > -  if (srcMap) {
> > -

Re: [Mesa-dev] [PATCH 00/20] Auto-generate pack/unpack functions

2014-11-19 Thread Iago Toral
On Tue, 2014-11-18 at 13:41 -0800, Jason Ekstrand wrote:
> Iago,
> 
> Most of this looks pretty good to me.  The one primary concern I have
> is in the handling of integer formats.  I made the comment in a couple
> of patches, but I'll make it in general here.  In a lot of the code,
> when you convert from integer formats to float, you treat them as if
> they are normalized.  Can you explain why you are doing this?  It
> seems very wrong to me.

Right, I have been discussing this with Samuel and it does look wrong.
He will change the code and run a new piglit run to verify the changes.
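
To make Jason's concern concrete, here is a small sketch of the two conversions being confused. The helper names are hypothetical and only illustrate the arithmetic; they are not the functions in format_utils.c.

```c
#include <assert.h>
#include <stdint.h>

/* Normalized conversion: the full range of a 'bits'-wide unsigned
 * channel maps onto [0.0, 1.0]. */
static float
unorm_to_float(uint32_t x, unsigned bits)
{
   uint32_t max;

   assert(bits >= 1 && bits <= 16);
   max = (1u << bits) - 1u;
   return (float)x / (float)max;
}

/* Pure integer (non-normalized) conversion: the numeric value is
 * preserved, so 255 stays 255.0. */
static float
uint_to_float(uint32_t x)
{
   return (float)x;
}
```

Treating an integer format as normalized would turn a stored 255 into 1.0 instead of 255.0, which is the behaviour being questioned above.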

> 
> One other issue is that I couldn't actually get it to compile.  This
> is probably due to the fact that I always build out-of-tree, so
> sourcedir and builddir are not the same.  Not really sure what's going
> on there.

Mmm... that's weird. I think I remember seeing a patch that added a new
file and could be the source of that issue. We will look into it.

> 
> Other than that, It's looking pretty good.  I'll try and get to
> reviewing your second patch series tomorrow.  Since my R-B obviously
> doesn't mean much on the code I wrote I'll try and dig up a second
> reviewer as well.

Yes, that makes sense.
Thanks for looking into these patches so fast!

Iago

> --Jason
> 
> 
> On Tue, Nov 18, 2014 at 12:43 AM, Iago Toral Quiroga
>  wrote:
> This is the first of two series of patches to address:
> https://bugs.freedesktop.org/show_bug.cgi?id=84566
> 
> The idea is that we have a lot of format conversion code
> scattered through
> different files in the repository, a lot of that is
> redundant / duplicated,
> so this intends to address that issue.
> 
> The goal of this first series is to address auto-generation of
> our pack/unpack
> functions (format_pack.c and format_unpack.c). Currently, we
> have a ton of
> hand-coded pack/unpack functions for lots of formats, but we
> can auto-generate
> most of that code instead, so this series handles this.
> 
> This is based on initial work by Jason Ekstrand.
> 
> Tested on i965, classic swrast and gallium (radeon, nouveau,
> llvmpipe) without
> regressions.
> 
> For software drivers we worked with a trimmed set of piglit
> tests (related to
> format conversion), ~5700 tests selected with the following
> filter:
> 
> -t format -t color -t tex -t image -t swizzle -t clamp -t rgb
> -t lum -t pix
> -t fbo -t frame
> 
> Summary of the patches:
>  * Patches 1-7 are general fixes to the current code that were
> found while
>working on this.
>  * Patches 8-16 implement auto-generation of pack/unpack
> functions.
>  * Patches 17-20 make use of the auto-generated pack/unpack
> functions in
>various places to simplify the current code.
> 
> Notice that some of the fixes in patches 1-7 will become
> obsolete as soon as
> we auto-generate the pack/unpack functions, but we thought it
> would make sense
> to keep them in the patch set anyway since we started from
> that base and they
> should be correct fixes to the currently existing code.
> 
> Iago Toral Quiroga (1):
>   swrast: Remove unused variable.
> 
> Jason Ekstrand (9):
>   mesa/format_utils: Fix a bug in one of the format helper
> functions
>   mesa: Fix packing/unpacking of MESA_FORMAT_R5G6B5_UNORM
>   mesa/colormac: Remove an unused macro
>   mesa: Fix A1R5G5B5 packing/unpacking
>   mesa/format_utils: Prefix and expose the conversion helper
> functions
>   mesa: Add a concept of an array format
>   mesa: Add a _mesa_is_format_color_format helper
>   mesa: Autogenerate most of format_pack.c
>   mesa: Autogenerate format_unpack.c
> 
> Samuel Iglesias Gonsalvez (10):
>   mesa: Fix get_texbuffer_format().
>   mesa: Fix _mesa_swizzle_and_convert integer conversions to
> clamp
> properly
>   mesa: Add _mesa_pack_uint_rgba_row() format conversion
> function
>   mesa: Add non-normalized formats support for ubyte packing
> functions
>   mesa/format_pack: Add _mesa_pack_int_rgba_row()
>   mesa/formats: add new mesa formats and their pack/unpack
> functions.
>   mesa: u

Re: [Mesa-dev] [PATCH 00/20] Auto-generate pack/unpack functions

2014-11-19 Thread Iago Toral
On Wed, 2014-11-19 at 17:09 +0900, Michel Dänzer wrote:
> On 18.11.2014 17:43, Iago Toral Quiroga wrote:
> >
> > For software drivers we worked with a trimmed set of piglit tests (related 
> > to
> > format conversion), ~5700 tests selected with the following filter:
> >
> > -t format -t color -t tex -t image -t swizzle -t clamp -t rgb -t lum -t pix
> > -t fbo -t frame
> 
> Any particular reason for not testing at least piglit gpu.py with 
> llvmpipe? Last time I tried that a few months ago, it didn't take much 
> more than ten minutes on a quad-core A10-7850K.

Not really, we tried to run the full suite but many tests would take
forever and we thought we should just cut it down to the tests that
seemed more related to the kind of stuff we were working on. Also, since
we would run piglit very often to verify our changes, especially towards
the end of the development, we needed something manageable.

We will give gpu.py a try too. Thanks!

Iago



Re: [Mesa-dev] Using the 'f' suffix to create a float from an integer literal

2014-11-19 Thread Iago Toral
On Wed, 2014-11-19 at 10:27 -0800, Ian Romanick wrote:
> On 11/19/2014 03:47 AM, Iago Toral Quiroga wrote:
> > Hi,
> > 
> > I came across a GLSL test that checks that doing something like this in
> > a shader should fail:
> 
> Is this one of the dEQP tests?

Yes.

> > float value = 1f;
> 
> It seems like we have a test related to this in piglit somewhere... it
> looks like tests/shaders/glsl-floating-constant-120.shader_test uses
> that syntax, but it's not explicitly testing that feature.
> 
> > However, this works fine in Mesa. Checking the spec I  see:
> > 
> > Floating-point constants are defined as follows.
> >  floating-constant:
> >fractional-constant exponent-part(opt) floating-suffix(opt)
> >digit-sequence exponent-part floating-suffix(opt)
> >  fractional-constant:
> >digit-sequence . digit-sequence
> >digit-sequence .
> >. digit-sequence
> >  exponent-part:
> >e sign(opt) digit-sequence
> >E sign(opt) digit-sequence
> >  sign: one of
> >+ -
> >  digit-sequence:
> >digit
> >digit-sequence digit
> >  floating-suffix: one of
> >f F
> > 
> > which suggests that the test is correct and Mesa has a bug. According to
> > the above rules, however, something like this is fine:
> > 
> > float f = 1e2f;
> > 
> > which I find kind of weird if the other case is not valid, so I wonder
> > if there is a bug in the spec or this is on purpose for some reason that
> > I am missing.
> > 
> > Then, to make matters worse, I read this in opengl.org wiki [1]:
> > 
> > "A numeric literal that uses a decimal is by default of type float. To
> > create a float literal from an integer value, use the suffix f or F as
> > in C/C++."
> > 
> > which contradicts the spec and the test and is aligned with the current
> > way Mesa works.
> > 
> > So: does anyone know what version is right? Could this be a bug in the
> > spec? and if it is not, do we want to change the behavior to follow the
> > spec as it is now?
> 
> The $64,000 question: What do other GLSL compilers (including, perhaps,
> glslang) do?  This seems like one of the cases where nobody is likely to
> follow the spec, and some application will depend on that.  If that's
> the case, I'll submit a spec bug.

Good point. I'll try to check a few cases and reply here. Thanks!

Iago

> > Thanks,
> > Iago
> > 
> > [1] https://www.opengl.org/wiki/Data_Type_%28GLSL%29
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 



Re: [Mesa-dev] [PATCH 02/29] mesa: Set normalized=true for float array formats.

2014-11-19 Thread Iago Toral
Hi Jason,

We discussed this some weeks ago, actually; the detailed explanation is
here:
https://bugs.freedesktop.org/show_bug.cgi?id=84566#c5

The short answer is that this is necessary because there is a normalized
parameter to _mesa_swizzle_and_convert, and when we deal with float
types we want to set this to true.

Iago
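
As a rough illustration of what the patch quoted below changes, here is a sketch of an array format descriptor with the normalized flag derived the same way. The field order follows the RGBA_FLOAT initializer in the quoted diff; the bit widths, the type names and the helper are assumptions made for this sketch, not the definitions in formats.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the MESA_ARRAY_FORMAT_TYPE_* values. */
enum sketch_datatype {
   SKETCH_TYPE_UBYTE,
   SKETCH_TYPE_FLOAT
};

/* Field order as in the quoted initializer; the widths are assumed
 * and chosen so that they add up to 32 bits. */
typedef union {
   struct {
      unsigned type:4;
      unsigned normalized:1;
      unsigned num_channels:3;
      unsigned swizzle_x:3;
      unsigned swizzle_y:3;
      unsigned swizzle_z:3;
      unsigned swizzle_w:3;
      unsigned pad:11;
      unsigned array_format_bit:1;
   } f;
   uint32_t as_uint;
} sketch_array_format;

/* Mirrors "norm = chan.norm or chan.type == parser.FLOAT". */
static sketch_array_format
make_array_format(enum sketch_datatype type, int chan_norm,
                  unsigned num_channels)
{
   sketch_array_format fmt;

   assert(num_channels >= 1 && num_channels <= 4);
   fmt.as_uint = 0;
   fmt.f.type = type;
   fmt.f.normalized = (chan_norm || type == SKETCH_TYPE_FLOAT) ? 1 : 0;
   fmt.f.num_channels = num_channels;
   fmt.f.swizzle_x = 0;
   fmt.f.swizzle_y = 1;
   fmt.f.swizzle_z = 2;
   fmt.f.swizzle_w = 3;
   fmt.f.pad = 0;
   fmt.f.array_format_bit = 1;
   return fmt;
}
```

Because the whole descriptor is compared through as_uint, two formats that differ only in the normalized bit are different formats, so the flag has to be set the same way everywhere a float format is described.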

On Wed, 2014-11-19 at 11:31 -0800, Jason Ekstrand wrote:
> I'm not sure what I think about this.  Does it make a functional
> change other than consistency?
> 
> --Jason
> 
> 
> On Tue, Nov 18, 2014 at 1:23 AM, Iago Toral Quiroga
>  wrote:
> In order to check if a format is normalized Mesa does
> something like this:
> normalized = !_mesa_is_enum_format_integer(srcFormat);
> 
> So all float types will set normalized to true. Since our
> mesa_array_format
> includes a normalized flag for each type we want to make it
> consistent with
> this.
> ---
>  src/mesa/main/format_info.py | 3 ++-
>  src/mesa/main/format_utils.c | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/main/format_info.py
> b/src/mesa/main/format_info.py
> index 315767d..d4bc276 100644
> --- a/src/mesa/main/format_info.py
> +++ b/src/mesa/main/format_info.py
> @@ -220,9 +220,10 @@ for fmat in formats:
> print '  {{ {0} }},'.format(', '.join(map(str,
> fmat.swizzle)))
> if fmat.is_array() and fmat.colorspace in ('rgb', 'srgb'):
>chan = fmat.array_element()
> +  norm = chan.norm or chan.type == parser.FLOAT
>print '   {0} ,'.format(', '.join([
>   get_array_format_datatype(chan),
> - str(int(chan.norm)),
> + str(int(norm)),
>   str(len(fmat.channels)),
>   str(fmat.swizzle[0]),
>   str(fmat.swizzle[1]),
> diff --git a/src/mesa/main/format_utils.c
> b/src/mesa/main/format_utils.c
> index c3815cb..1d65f2b 100644
> --- a/src/mesa/main/format_utils.c
> +++ b/src/mesa/main/format_utils.c
> @@ -30,7 +30,7 @@
> 
>  mesa_array_format RGBA_FLOAT = {{
> MESA_ARRAY_FORMAT_TYPE_FLOAT,
> -   0,
> +   1,
> 4,
> 0, 1, 2, 3,
> 0, 1
> --
> 1.9.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 



Re: [Mesa-dev] [PATCH 05/29] mesa: Consider internal base format in _mesa_format_convert

2014-11-19 Thread Iago Toral
On Wed, 2014-11-19 at 11:43 -0800, Jason Ekstrand wrote:
> A couple of specific comments are below.  More generally, why are you
> only considering the base format on two cases?  Do we never use it for
> anything else?

I thought about that too but when I looked at the original code it
seemed that it only cared about the base format in these two scenarios, so
I thought that maybe the conversion cases that could be affected are
all handled in those two paths. I'll check again though, just in case I
missed something.

> On Tue, Nov 18, 2014 at 1:23 AM, Iago Toral Quiroga
>  wrote:
> Add a dst_internal_format parameter to _mesa_format_convert,
> that represents
> the base internal format for texture/pixel uploads, so we can
> do the right
> thing when the driver has selected a different internal format
> for the target
> dst format.
> ---
>  src/mesa/main/format_utils.c | 65
> +++-
>  src/mesa/main/format_utils.h |  2 +-
>  2 files changed, 65 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/main/format_utils.c
> b/src/mesa/main/format_utils.c
> index fc59e86..5964689 100644
> --- a/src/mesa/main/format_utils.c
> +++ b/src/mesa/main/format_utils.c
> @@ -303,7 +303,7 @@ _mesa_compute_component_mapping(GLenum
> inFormat, GLenum outFormat, GLubyte *map)
>  void
>  _mesa_format_convert(void *void_dst, uint32_t dst_format,
> size_t dst_stride,
>   void *void_src, uint32_t src_format,
> size_t src_stride,
> - size_t width, size_t height)
> + size_t width, size_t height, GLenum
> dst_internal_format)
>  {
> uint8_t *dst = (uint8_t *)void_dst;
> uint8_t *src = (uint8_t *)void_src;
> @@ -422,6 +422,36 @@ _mesa_format_convert(void *void_dst,
> uint32_t dst_format, size_t dst_stride,
> if (src_array_format.as_uint && dst_array_format.as_uint)
> {
>assert(src_array_format.normalized ==
> dst_array_format.normalized);
> 
> +  /* If the base format of our dst is not the same as the
> provided base
> +   * format it means that we are converting to a
> different format
> +   * than the one originally requested by the client.
> This can happen when
> +   * the internal base format requested is not supported
> by the driver.
> +   * In this case we need to consider the requested
> internal base format to
> +   * compute the correct swizzle operation from src to
> dst. We will do this
> +   * by computing the swizzle transform
> src->rgba->base->rgba->dst instead
> +   * of src->rgba->dst.
> +   */
> +  mesa_format dst_mesa_format;
> +  if (dst_format & MESA_ARRAY_FORMAT_BIT)
> + dst_mesa_format =
> _mesa_format_from_array_format(dst_format);
> +  else
> + dst_mesa_format = dst_format;
> 
> 
> Let's put an extra line here so it doesn't get confused with the below
> if statement
> 
>  
> +  if (dst_internal_format !=
> _mesa_get_format_base_format(dst_mesa_format)) {
> + /* Compute src2rgba as src->rgba->base->rgba */
> + uint8_t rgba2base[4], base2rgba[4], swizzle[4];
> + _mesa_compute_component_mapping(GL_RGBA,
> dst_internal_format, rgba2base);
> + _mesa_compute_component_mapping(dst_internal_format,
> GL_RGBA, base2rgba);
> +
> + for (i = 0; i < 4; i++) {
> +if (base2rgba[i] > MESA_FORMAT_SWIZZLE_W)
> +   swizzle[i] = base2rgba[i];
> +else
> +   swizzle[i] =
> src2rgba[rgba2base[base2rgba[i]]];
> 
> 
> This doesn't work for composing three swizzles.  If you get a ZERO or
> ONE in rgba2base, you'll read the wrong memory from src2rgba.

Actually, the problem is worse, because the mapping written by
_mesa_compute_component_mapping is a 6-component mapping and we are
passing a 4-component array. I'll fix this.
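
One way to avoid the out-of-bounds read Jason points out is to compose the swizzles pairwise and let constant entries pass through at every step. The sketch below uses its own local enum and function names; the numbering of the constants is an assumption of the example, not a statement about Mesa's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the swizzle values used by this sketch. */
enum { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

/* out = "apply first, then second".  A ZERO/ONE entry in 'second' is
 * copied through instead of being used as an index into 'first'.
 * 'out' must not alias 'first'. */
static void
compose_swizzle(const uint8_t first[4], const uint8_t second[4],
                uint8_t out[4])
{
   int i;

   for (i = 0; i < 4; i++) {
      assert(second[i] <= SWZ_ONE);
      out[i] = second[i] > SWZ_W ? second[i] : first[second[i]];
   }
}

/* Compose three swizzles (for example src->rgba, rgba->base and
 * base->rgba) with the constant check applied at both steps. */
static void
compose_three(const uint8_t a[4], const uint8_t b[4], const uint8_t c[4],
              uint8_t out[4])
{
   uint8_t tmp[4];

   compose_swizzle(a, b, tmp);
   compose_swizzle(tmp, c, out);
}
```

Going through a temporary keeps every index in the 0..3 range, so a ZERO or ONE in the middle mapping can never be used to read past the end of the first array.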

>  
> 
> + }
> + memcpy(src2rgba, swizzle, sizeof(src2rgba));
> +  }
> +
> +  /* Compute src2dst from src2rg

Re: [Mesa-dev] [PATCH 06/29] mesa: Avoid pack/unpack fast paths if base internal format != base format

2014-11-19 Thread Iago Toral
On Wed, 2014-11-19 at 11:57 -0800, Jason Ekstrand wrote:
> A couple of general comments on this patch:
> 
> 
> 1) The prerequisites should be moved to before the first patch in the
> series and it should be squashed into the patch that introduces the
> function.  There are one or two more patches which also modify it and
> those should also be squashed in.

Ok.

> 
> 2) I wonder if giving _mesa_format_convert an internal swizzle
> wouldn't be better than a destination internal format.  There are a
> couple of reasons for this:
> 
> a) It's more general.  If it doesn't cost us anything extra to do
> it that way, we might as well.

I think that would only work directly for conversions between array
formats, but what if we have, for example, a texture upload from RGB565
to a Luminance format (so that we get an RGBA base format)? That would
not go through _mesa_swizzle_and_convert and would require unpacking to
RGBA and then packing to the dst... and unless the client has provided the
dst format as an array format, that won't use _mesa_swizzle_and_convert
either. That should not be a problem when the calls to
_mesa_format_convert come from things like glTexImage or glReadPixels,
since in these cases they compute the dst format from a GL type and if it
is an array format we should get that, but in other cases that might not
be the case...

> b) It removes the GL enum dependency from the _mesa_format_convert
> c) I think this implementation misses the case where we download a
> texture that is stored in a different format than its base format.
> For instance, if you are downloading a GL_RGB texture as GL_RGBA but
> it's stored as GL_RGBA.  I think with the current implementation,
> you'll get the junk in the alpha component of the texture's backing
> storage instead of a constant alpha value of 1.

That's correct. In the implementation of readpixels and getteximage we
had to rebase the results in some cases to account for that.

Iago
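
The rebase step mentioned above can be pictured with a very small sketch for the GL_RGB-stored-as-RGBA case from point c). The function name and the float[4] row layout are assumptions of this example and do not describe the actual readpixels or getteximage code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rebase for a texture whose base format is RGB but whose
 * backing storage is RGBA: the stored alpha is meaningless, so it is
 * replaced by a constant 1.0 after unpacking to RGBA floats. */
static void
rebase_rgb_alpha(float rgba[][4], size_t n)
{
   size_t i;

   assert(rgba != NULL);
   for (i = 0; i < n; i++)
      rgba[i][3] = 1.0f;
}
```

The colour channels are left untouched; only the component that the base format does not define is overwritten.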

> 
> On Tue, Nov 18, 2014 at 1:23 AM, Iago Toral Quiroga
>  wrote:
> In general, if the dst base internal format and the selected
> dst format are
> different we can't be sure that directly packing or unpacking
> to the destination
> format will produce correct results. In these cases we
> probably want to fall
> back to other paths (like mesa_swizzle_and_convert) that can
> handle these
> situations better.
> 
> An example of this includes a luminance internal format for
> which the driver
> decided to use something like BGRA. In these case, unpacking
> to BGRA won't
> match the requirements of the luminance internal format.
> 
> In the future we may want to allow these fast paths for
> specific cases
> where we know the direct pack/unpack functions will do the
> right thing.
> ---
>  src/mesa/main/format_utils.c | 137
> +++
>  1 file changed, 72 insertions(+), 65 deletions(-)
> 
> diff --git a/src/mesa/main/format_utils.c
> b/src/mesa/main/format_utils.c
> index 5964689..34c90d9 100644
> --- a/src/mesa/main/format_utils.c
> +++ b/src/mesa/main/format_utils.c
> @@ -331,65 +331,82 @@ _mesa_format_convert(void *void_dst,
> uint32_t dst_format, size_t dst_stride,
>dst_array_format.as_uint =
> _mesa_format_to_array_format(dst_format);
> }
> 
> -   /* Handle the cases where we can directly unpack */
> -   if (!(src_format & MESA_ARRAY_FORMAT_BIT)) {
> -  if (dst_array_format.as_uint == RGBA_FLOAT.as_uint)
> {
> - for (row = 0; row < height; ++row) {
> -_mesa_unpack_rgba_row(src_format, width,
> -  src, (float (*)[4])dst);
> -src += src_stride;
> -dst += dst_stride;
> - }
> - return;
> -  } else if (dst_array_format.as_uint ==
> RGBA_UBYTE.as_uint &&
> - !_mesa_is_format_integer_color(src_format))
> {
> - for (row = 0; row < height; ++row) {
> -_mesa_unpack_ubyte_rgba_row(src_format, width,
> -src, (uint8_t
> (*)[4])dst);
> -src += src_stride;
> -dst += dst_stride;
> - }
> - return;
> -  } else if (dst_ar

Re: [Mesa-dev] [PATCH 09/29] mesa: Add _mesa_swap2_copy and _mesa_swap4_copy

2014-11-20 Thread Iago Toral
No good reason really, simply that the original functions seemed simpler
for the case of in-place swapping since you don't have to pass the dst
parameter explicitly, so I figured there was a marginal gain in letting
them stay, especially since their implementation is just an inline call
to the other version. Do you prefer the other solution?

Iago
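
The arrangement described above, with the copying variants doing the work and the in-place versions reduced to thin wrappers, can be sketched as follows. Fixed-width types and shortened names are used here for illustration; this is not the code from image.c or image.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-swap n 16-bit words from src into dst (dst may equal src). */
static void
swap2_copy(uint16_t *dst, const uint16_t *src, size_t n)
{
   size_t i;

   assert(dst != NULL && src != NULL);
   for (i = 0; i < n; i++)
      dst[i] = (uint16_t)((src[i] >> 8) | ((src[i] << 8) & 0xff00));
}

/* Byte-swap n 32-bit words from src into dst (dst may equal src). */
static void
swap4_copy(uint32_t *dst, const uint32_t *src, size_t n)
{
   size_t i;

   assert(dst != NULL && src != NULL);
   for (i = 0; i < n; i++) {
      uint32_t b = src[i];
      dst[i] = (b >> 24)
             | ((b >> 8) & 0xff00)
             | ((b << 8) & 0xff0000)
             | ((b << 24) & 0xff000000);
   }
}

/* In-place versions: just call the copying variant with dst == src. */
static void
swap2(uint16_t *p, size_t n)
{
   swap2_copy(p, p, n);
}

static void
swap4(uint32_t *p, size_t n)
{
   swap4_copy(p, p, n);
}
```

Each element is read in full before its slot is written, so passing the same array as source and destination is safe, and application-owned data can be swapped into a separate buffer without being modified.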

On Wed, 2014-11-19 at 12:00 -0800, Jason Ekstrand wrote:
> Any reason why you added 2 new functions, instead of just altering the
> ones we have and updating where they are used?
> 
> 
> On Tue, Nov 18, 2014 at 1:23 AM, Iago Toral Quiroga
>  wrote:
> We have _mesa_swap{2,4} but these do in-place byte-swapping
> only. The new
> functions receive an extra parameter so we can swap bytes on a
> source
> input array and store the results in a (possibly different)
> destination
> array.
> 
> This is useful to implement byte-swapping in pixel uploads,
> since in this
> case we need to swap bytes on the src data which is owned by
> the
> application so we can't do an in-place byte swap.
> ---
>  src/mesa/main/image.c | 25 +
>  src/mesa/main/image.h | 10 --
>  2 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/src/mesa/main/image.c b/src/mesa/main/image.c
> index 4ea5f04..9ad97c5 100644
> --- a/src/mesa/main/image.c
> +++ b/src/mesa/main/image.c
> @@ -41,36 +41,45 @@
> 
> 
>  /**
> - * Flip the order of the 2 bytes in each word in the given
> array.
> + * Flip the order of the 2 bytes in each word in the given
> array (src) and
> + * store the result in another array (dst). For in-place
> byte-swapping this
> + * function can be called with the same array for src and
> dst.
>   *
> - * \param p array.
> + * \param dst the array where byte-swapped data will be
> stored.
> + * \param src the array with the source data we want to
> byte-swap.
>   * \param n number of words.
>   */
>  void
> -_mesa_swap2( GLushort *p, GLuint n )
> +_mesa_swap2_copy( GLushort *dst, GLushort *src, GLuint n )
>  {
> GLuint i;
> for (i = 0; i < n; i++) {
> -  p[i] = (p[i] >> 8) | ((p[i] << 8) & 0xff00);
> +  dst[i] = (src[i] >> 8) | ((src[i] << 8) & 0xff00);
> }
>  }
> 
> 
> 
>  /*
> - * Flip the order of the 4 bytes in each word in the given
> array.
> + * Flip the order of the 4 bytes in each word in the given
> array (src) and
> + * store the result in another array (dst). For in-place
> byte-swapping this
> + * function can be called with the same array for src and
> dst.
> + *
> + * \param dst the array where byte-swapped data will be
> stored.
> + * \param src the array with the source data we want to
> byte-swap.
> + * \param n number of words.
>   */
>  void
> -_mesa_swap4( GLuint *p, GLuint n )
> +_mesa_swap4_copy( GLuint *dst, GLuint *src, GLuint n )
>  {
> GLuint i, a, b;
> for (i = 0; i < n; i++) {
> -  b = p[i];
> +  b = src[i];
>a =  (b >> 24)
> | ((b >> 8) & 0xff00)
> | ((b << 8) & 0xff0000)
> | ((b << 24) & 0xff000000);
> -  p[i] = a;
> +  dst[i] = a;
> }
>  }
> 
> diff --git a/src/mesa/main/image.h b/src/mesa/main/image.h
> index abd84bf..79c6e68 100644
> --- a/src/mesa/main/image.h
> +++ b/src/mesa/main/image.h
> @@ -33,10 +33,16 @@ struct gl_context;
>  struct gl_pixelstore_attrib;
> 
>  extern void
> -_mesa_swap2( GLushort *p, GLuint n );
> +_mesa_swap2_copy( GLushort *dst, GLushort *src, GLuint n );
> 
>  extern void
> -_mesa_swap4( GLuint *p, GLuint n );
> +_mesa_swap4_copy( GLuint *dst, GLuint *src, GLuint n );
> +
> +static inline void
> +_mesa_swap2( GLushort *p, GLuint n ) { _mesa_swap2_copy(p, p,
> n); }
> +
> +static inline void
> +_mesa_swap4( GLuint *p, GLuint 

Re: [Mesa-dev] [PATCH 03/29] mesa: Do not assert on integer<->non-integer direct pack/unpack fast paths

2014-11-20 Thread Iago Toral
It is explained here:
https://bugs.freedesktop.org/show_bug.cgi?id=84566#c35

So one example of this was a glReadPixels where we want to store the
pixel data as RGBA UINT, but the render buffer format we read from is
MESA_FORMAT_B8G8R8A8_UNORM. There are piglit tests that hit this case.
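
As a rough sketch of what guarding the fast paths means for that case (the
enums and the function name below are made up for illustration, they are
not Mesa identifiers):

```c
#include <stdbool.h>

/* Hypothetical classification of the destination array format. */
enum array_kind { KIND_RGBA_FLOAT, KIND_RGBA_UBYTE, KIND_RGBA_UINT, KIND_OTHER };

/* Hypothetical names for the conversion paths. */
enum convert_path {
   PATH_UNPACK_FLOAT,
   PATH_UNPACK_UBYTE,
   PATH_UNPACK_UINT,
   PATH_GENERIC
};

/* Pick a direct unpack path only when a matching unpack function is
 * expected to exist; otherwise fall through to the generic path instead
 * of asserting.
 */
enum convert_path
choose_unpack_path(enum array_kind dst_kind, bool src_is_integer)
{
   if (dst_kind == KIND_RGBA_FLOAT)
      return PATH_UNPACK_FLOAT;
   if (dst_kind == KIND_RGBA_UBYTE && !src_is_integer)
      return PATH_UNPACK_UBYTE;
   if (dst_kind == KIND_RGBA_UINT && src_is_integer)
      return PATH_UNPACK_UINT;
   return PATH_GENERIC;
}
```

With this shape, reading a non-integer source such as B8G8R8A8_UNORM into
RGBA UINT gives choose_unpack_path(KIND_RGBA_UINT, false), which selects
PATH_GENERIC rather than tripping an assertion.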

Iago

On Wed, 2014-11-19 at 12:04 -0800, Jason Ekstrand wrote:
> Can you remind me again as to what formats hit these paths?  I
> remember you hitting them, but I'm still not really seeing how it
> happens.
> 
> --Jason
> 
> 
> On Tue, Nov 18, 2014 at 1:23 AM, Iago Toral Quiroga
>  wrote:
> We can have conversions from non-integer types to integer
> types, so remove
> the assertions for these in the pack/unpack fast paths. It
> could be that
> we do not have all the necessary pack/unpack functions in
> these cases though,
> so protect these paths with conditionals and let
> _mesa_format_convert use
> other paths to resolve these kinds of conversions if necessary.
> ---
>  src/mesa/main/format_utils.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/src/mesa/main/format_utils.c
> b/src/mesa/main/format_utils.c
> index 1d65f2b..56a3b8d 100644
> --- a/src/mesa/main/format_utils.c
> +++ b/src/mesa/main/format_utils.c
> @@ -143,8 +143,8 @@ _mesa_format_convert(void *void_dst,
> uint32_t dst_format, size_t dst_stride,
>  dst += dst_stride;
>   }
>   return;
> -  } else if (dst_array_format.as_uint ==
> RGBA_UBYTE.as_uint) {
> - assert(!_mesa_is_format_integer_color(src_format));
> +  } else if (dst_array_format.as_uint ==
> RGBA_UBYTE.as_uint &&
> + !_mesa_is_format_integer_color(src_format))
> {
>   for (row = 0; row < height; ++row) {
>  _mesa_unpack_ubyte_rgba_row(src_format, width,
>  src, (uint8_t
> (*)[4])dst);
> @@ -152,8 +152,8 @@ _mesa_format_convert(void *void_dst,
> uint32_t dst_format, size_t dst_stride,
>  dst += dst_stride;
>   }
>   return;
> -  } else if (dst_array_format.as_uint ==
> RGBA_UINT.as_uint) {
> - assert(_mesa_is_format_integer_color(src_format));
> +  } else if (dst_array_format.as_uint ==
> RGBA_UINT.as_uint &&
> + _mesa_is_format_integer_color(src_format)) {
>   for (row = 0; row < height; ++row) {
>  _mesa_unpack_uint_rgba_row(src_format, width,
> src, (uint32_t
> (*)[4])dst);
> @@ -174,8 +174,8 @@ _mesa_format_convert(void *void_dst,
> uint32_t dst_format, size_t dst_stride,
>  dst += dst_stride;
>   }
>   return;
> -  } else if (src_array_format.as_uint ==
> RGBA_UBYTE.as_uint) {
> - assert(!_mesa_is_format_integer_color(dst_format));
> +  } else if (src_array_format.as_uint ==
> RGBA_UBYTE.as_uint &&
> + !_mesa_is_format_integer_color(dst_format))
> {
>   for (row = 0; row < height; ++row) {
>  _mesa_pack_ubyte_rgba_row(dst_format, width,
>(const uint8_t
> (*)[4])src, dst);
> @@ -183,8 +183,8 @@ _mesa_format_convert(void *void_dst,
> uint32_t dst_format, size_t dst_stride,
>  dst += dst_stride;
>   }
>   return;
> -  } else if (src_array_format.as_uint ==
> RGBA_UINT.as_uint) {
> - assert(_mesa_is_format_integer_color(dst_format));
> +  } else if (src_array_format.as_uint ==
> RGBA_UINT.as_uint &&
> + _mesa_is_format_integer_color(dst_format)) {
>   for (row = 0; row < height; ++row) {
>  _mesa_pack_uint_rgba_row(dst_format, width,
>   (const uint32_t
> (*)[4])src, dst);
> --
> 1.9.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 




Re: [Mesa-dev] [PATCH 07/29] mesa: Add helper to convert a GL format and type to a mesa (array) format.

2014-11-20 Thread Iago Toral
On Wed, 2014-11-19 at 12:11 -0800, Jason Ekstrand wrote:
> General comment:  Maybe this would be better in gltypes rather than in
> mesa_formats

Ok, I'll move it.

> On Tue, Nov 18, 2014 at 1:23 AM, Iago Toral Quiroga
>  wrote:
> ---
>  src/mesa/main/formats.c | 285
> 
>  src/mesa/main/formats.h |   3 +
>  2 files changed, 288 insertions(+)
(...)
> +
> +/**
> +* Take an OpenGL format (GL_RGB, GL_RGBA, etc), OpenGL data
> type (GL_INT,
> +* GL_FLOAT, etc) and return a matching mesa_array_format or a
> mesa_format
> +* otherwise (for non-array formats).
> +*
> +* This function will typically be used to compute a mesa
> format from a GL type
> +* so we can then call _mesa_format_convert. This function
> does
> +* not consider byte swapping, so it returns types assuming
> that no byte
> +* swapping is involved. If byte swapping is involved then
> clients are supposed
> +* to handle that on their side before calling
> _mesa_format_convert.
> +*
> +* This function returns an uint32_t that can pack a
> mesa_format or a
> +* mesa_array_format. Clients must check the mesa array format
> bit
> +* (MESA_ARRAY_FORMAT_BIT) on the return value to know if the
> returned
> +* format is a mesa_array_format or a mesa_format.
> +*/
> +uint32_t
> +_mesa_format_from_format_and_type(GLenum format, GLenum type)
> +{
> +   mesa_array_format array_format;
> +
> +   bool is_array_format = true;
> +
> +   /* Map the OpenGL data type to an array format data type
> */
> +   switch (type) {
> +   case GL_UNSIGNED_BYTE:
> +  array_format.type = MESA_ARRAY_FORMAT_TYPE_UBYTE;
> +  break;
> +   case GL_BYTE:
> +  array_format.type = MESA_ARRAY_FORMAT_TYPE_BYTE;
> +  break;
> +   case GL_UNSIGNED_SHORT:
> +  array_format.type = MESA_ARRAY_FORMAT_TYPE_USHORT;
> +  break;
> +   case GL_SHORT:
> +  array_format.type = MESA_ARRAY_FORMAT_TYPE_SHORT;
> +  break;
> +   case GL_UNSIGNED_INT:
> +  array_format.type = MESA_ARRAY_FORMAT_TYPE_UINT;
> +  break;
> +   case GL_INT:
> +  array_format.type = MESA_ARRAY_FORMAT_TYPE_INT;
> +  break;
> +   case GL_HALF_FLOAT:
> +  array_format.type = MESA_ARRAY_FORMAT_TYPE_HALF;
> +  break;
> +   case GL_FLOAT:
> +  array_format.type = MESA_ARRAY_FORMAT_TYPE_FLOAT;
> +  break;
> +   case GL_UNSIGNED_INT_8_8_8_8:
> +   case GL_UNSIGNED_INT_8_8_8_8_REV:
> +  array_format.type = MESA_ARRAY_FORMAT_TYPE_UBYTE;
> 
> 
> If you put these in the GL type switch below as returning the
> MESA_FORMAT_R8G8B8A8 or whatever, then the code in
> mesa_format_get_array_format will fix up the swizzling for you.

Do you mean returning a mesa_format instead of a mesa_array_format for
the *8_8_8_8* types? Sure, we can do that.
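
For context on why the *8_8_8_8* types need special care when treated as
byte arrays: they are packed types, so the order of the components in
memory depends on the host byte order. A small sketch of that, with helper
names that are only illustrative:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true when the least significant byte is stored first. */
bool
host_is_little_endian(void)
{
   const uint32_t probe = 1;
   uint8_t first;
   memcpy(&first, &probe, 1);
   return first == 1;
}

/* GL_UNSIGNED_INT_8_8_8_8 layout: the first component sits in the most
 * significant byte of the 32-bit word.
 */
uint32_t
pack_8888(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   return ((uint32_t)r << 24) | ((uint32_t)g << 16) | ((uint32_t)b << 8) | a;
}

/* Fills order[] with the component index (0=R, 1=G, 2=B, 3=A) found at
 * each byte offset in memory for a packed 8_8_8_8 pixel.
 */
void
memory_order_8888(uint8_t order[4])
{
   const uint32_t pixel = pack_8888(0, 1, 2, 3);
   memcpy(order, &pixel, 4);
}
```

On a little-endian host the bytes come out as A, B, G, R, while on a
big-endian host they come out as R, G, B, A. That difference is what the
swizzle swap in the patch compensates for.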

>  
> +  break;
> +   default:
> +  is_array_format = false;
> +  break;
> +   }
> +
> +   /* Next we extract array swizzle information from the
> OpenGL format */
> +   if (is_array_format) {
> +  is_array_format =
> +
>  _mesa_array_format_set_swizzle_from_format(&array_format,
> format);
> +   }
> +
> +   /* If this is an array format type after checking data
> type and format,
> +* fill in the remaining data
> +*/
> +   if (is_array_format) {
> +  array_format.normalized = !
> _mesa_is_enum_format_integer(format);
> +  array_format.num_channels =
> _mesa_components_in_format(format);
> +  array_format.pad = 0;
> +  array_format.array_format_bit = 1;
> +
> +  /* Check if we need to swap the swizzle for
> GL_UNSIGNED_INT_8_8_8_8* */
> + bool swap_swizzle =
> + (type == GL_UNSIGNED_INT_8_8_8_8 &&
> _mesa_little_endian()) ||
> + (type == GL_UNSIGNED_INT_8_8_8_8_REV && !
> _mesa_little_endian());
> +  if (swap_swizzle) {

Re: [Mesa-dev] [PATCH 09/29] mesa: Add _mesa_swap2_copy and _mesa_swap4_copy

2014-11-20 Thread Iago Toral
On Wed, 2014-11-19 at 14:15 -0600, Patrick Baggett wrote:
> 
> 
> On Tue, Nov 18, 2014 at 3:23 AM, Iago Toral Quiroga
>  wrote:
> We have _mesa_swap{2,4} but these do in-place byte-swapping
> only. The new
> functions receive an extra parameter so we can swap bytes on a
> source
> input array and store the results in a (possibly different)
> destination
> array.
> 
> 
> 
> If this is being split into an "in-place" and "different pointers"
> version, I think using the "restrict" keyword would be useful here to
> improve the performance. Then, the in-place one cannot be implemented
> as copy(p,p,n), but the code isn't overly complicated.

I did not know about 'restrict', thanks for the tip!

It kind of defeats the original idea of not duplicating the code but it
is true that it is not particularly complex anyway, so maybe it is worth
it if Jason agrees with having two versions of the functions instead of
just one in the end. Jason, what do you think?
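
For reference, a sketch of how the two-version approach could look with
C99 'restrict'. The names and the use of stdint types are assumptions made
for the example, not a proposed patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Copying variant: 'restrict' promises the compiler that dst and src do
 * not overlap, so callers must never pass the same array for both.
 */
void
swap2_copy(uint16_t *restrict dst, const uint16_t *restrict src, size_t n)
{
   for (size_t i = 0; i < n; i++)
      dst[i] = (uint16_t)((src[i] >> 8) | (src[i] << 8));
}

/* In-place variant: it needs its own loop, because calling
 * swap2_copy(p, p, n) would break the no-aliasing promise above.
 */
void
swap2_inplace(uint16_t *p, size_t n)
{
   for (size_t i = 0; i < n; i++)
      p[i] = (uint16_t)((p[i] >> 8) | (p[i] << 8));
}
```

The cost is the duplicated loop body; whether the compiler actually
generates better code for the restrict version would need to be measured.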

Iago

> 
>  
> This is useful to implement byte-swapping in pixel uploads,
> since in this
> case we need to swap bytes on the src data which is owned by
> the
> application so we can't do an in-place byte swap.
> ---
>  src/mesa/main/image.c | 25 +
>  src/mesa/main/image.h | 10 --
>  2 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/src/mesa/main/image.c b/src/mesa/main/image.c
> index 4ea5f04..9ad97c5 100644
> --- a/src/mesa/main/image.c
> +++ b/src/mesa/main/image.c
> @@ -41,36 +41,45 @@
> 
> 
>  /**
> - * Flip the order of the 2 bytes in each word in the given
> array.
> + * Flip the order of the 2 bytes in each word in the given
> array (src) and
> + * store the result in another array (dst). For in-place
> byte-swapping this
> + * function can be called with the same array for src and
> dst.
>   *
> - * \param p array.
> + * \param dst the array where byte-swapped data will be
> stored.
> + * \param src the array with the source data we want to
> byte-swap.
>   * \param n number of words.
>   */
>  void
> -_mesa_swap2( GLushort *p, GLuint n )
> +_mesa_swap2_copy( GLushort *dst, GLushort *src, GLuint n )
>  {
> GLuint i;
> for (i = 0; i < n; i++) {
> -  p[i] = (p[i] >> 8) | ((p[i] << 8) & 0xff00);
> +  dst[i] = (src[i] >> 8) | ((src[i] << 8) & 0xff00);
> }
>  }
> 
> 
> 
>  /*
> - * Flip the order of the 4 bytes in each word in the given
> array.
> + * Flip the order of the 4 bytes in each word in the given
> array (src) and
> + * store the result in another array (dst). For in-place
> byte-swapping this
> + * function can be called with the same array for src and
> dst.
> + *
> + * \param dst the array where byte-swapped data will be
> stored.
> + * \param src the array with the source data we want to
> byte-swap.
> + * \param n number of words.
>   */
>  void
> -_mesa_swap4( GLuint *p, GLuint n )
> +_mesa_swap4_copy( GLuint *dst, GLuint *src, GLuint n )
>  {
> GLuint i, a, b;
> for (i = 0; i < n; i++) {
> -  b = p[i];
> +  b = src[i];
>a =  (b >> 24)
> | ((b >> 8) & 0xff00)
> | ((b << 8) & 0xff0000)
> | ((b << 24) & 0xff000000);
> -  p[i] = a;
> +  dst[i] = a;
> }
>  }
> 
> diff --git a/src/mesa/main/image.h b/src/mesa/main/image.h
> index abd84bf..79c6e68 100644
> --- a/src/mesa/main/image.h
> +++ b/src/mesa/main/image.h
> @@ -33,10 +33,16 @@ struct gl_context;
>  struct gl_pixelstore_attrib;
> 
>  extern void
> -_mesa_swap2( GLushort *p, GLuint n );
> +_mesa_swap2_copy( GLushort *dst, GLushort *src, GLuint n );
> 
>  extern void
> -_mesa_swap4( GLuint *p, GLuint n );
> +_mesa_swap4_copy( GLuint *dst, GLuint *src, GLuint n );
> +
> +s

Re: [Mesa-dev] [PATCH 14/20] mesa: Add non-normalized formats support for ubyte packing functions

2014-11-20 Thread Iago Toral
On Thu, 2014-11-20 at 08:15 +0100, Samuel Iglesias Gonsálvez wrote:
> On Tue, 2014-11-18 at 11:08 -0800, Jason Ekstrand wrote:
> > 
> > 
> > On Tue, Nov 18, 2014 at 12:44 AM, Iago Toral Quiroga
> >  wrote:
> > From: Samuel Iglesias Gonsalvez 
> > 
> > Signed-off-by: Samuel Iglesias Gonsalvez
> > 
> > ---
> >  src/mesa/main/format_pack.c.mako | 10 +-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/main/format_pack.c.mako
> > b/src/mesa/main/format_pack.c.mako
> > index b9f4656..97adf6e 100644
> > --- a/src/mesa/main/format_pack.c.mako
> > +++ b/src/mesa/main/format_pack.c.mako
> > @@ -84,7 +84,15 @@ pack_ubyte_${f.short_name()}(const GLubyte
> > src[4], void *dst)
> >%endif
> > 
> >${channel_datatype(c)} ${c.name} =
> > -  %if c.type == parser.UNSIGNED:
> > +  %if not f.is_normalized():
> > + %if c.type == parser.FLOAT and c.size == 32:
> > +UBYTE_TO_FLOAT(src[${i}]);
> > + %elif c.type == parser.FLOAT and c.size == 16:
> > +_mesa_float_to_half(UBYTE_TO_FLOAT(src[${i}]));
> > 
> > 
> > Same question here as in the previous patch.  Why are we using
> > UBYTE_TO_FLOAT?
> > 
> 
> This is what current format_pack.c is doing for those formats and some
> piglit tests complain if it is not there.
> 

Jason, this looks correct to me: we are packing from an ubyte type to a
half float type, so first we need to convert from ubyte to float and
then downgrade the float to a half float, we don't currently have means
to convert directly from an ubyte to a half float, right?
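
To spell out the two steps, here is a simplified sketch. It assumes the
ubyte-to-float step normalizes to [0, 1] (u / 255), and the half
conversion is a deliberately reduced stand-in for _mesa_float_to_half: it
truncates the mantissa, flushes small values to zero, and maps anything
too large (including NaN) to infinity.

```c
#include <stdint.h>
#include <string.h>

/* Step 1: ubyte to float, normalized to [0, 1] (assumed behaviour). */
float
ubyte_to_float(uint8_t u)
{
   return (float)u / 255.0f;
}

/* Step 2: float to IEEE half. Simplified: truncating, no subnormals. */
uint16_t
float_to_half_simple(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof bits);

   const uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
   /* Re-bias the exponent from float (127) to half (15). */
   const int32_t exp = (int32_t)((bits >> 23) & 0xffu) - 127 + 15;
   /* Keep the top 10 of the 23 mantissa bits. */
   const uint16_t mant = (uint16_t)((bits >> 13) & 0x3ffu);

   if (exp <= 0)
      return sign;                       /* too small: signed zero */
   if (exp >= 31)
      return (uint16_t)(sign | 0x7c00u); /* too large: infinity */
   return (uint16_t)(sign | ((uint32_t)exp << 10) | mant);
}

/* The conversion chain discussed above: ubyte -> float -> half. */
uint16_t
ubyte_to_half(uint8_t u)
{
   return float_to_half_simple(ubyte_to_float(u));
}
```

With these definitions ubyte_to_half(255) yields 0x3c00, the half
encoding of 1.0.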

Iago

> >  
> > + %else:
> > +(${channel_datatype(c)}) src[${i}];
> > + %endif
> > +  %elif c.type == parser.UNSIGNED:
> >   %if f.colorspace == 'srgb' and c.name in 'rgb':
> >  util_format_linear_to_srgb_8unorm(src[${i}]);
> >   %else:
> > --
> > 1.9.1
> > 
> 
> 




Re: [Mesa-dev] Using the 'f' suffix to create a float from an integer literal

2014-11-20 Thread Iago Toral
On Thu, 2014-11-20 at 08:08 +0100, Iago Toral wrote:
> On Wed, 2014-11-19 at 10:27 -0800, Ian Romanick wrote:
> > On 11/19/2014 03:47 AM, Iago Toral Quiroga wrote:
> > > Hi,
> > > 
> > > I came across a GLSL test that checks that doing something like this in
> > > a shader should fail:
> > 
> > Is this one of the dEQP tests?
> 
> Yes.
> 
> > > float value = 1f;
> > 
> > It seems like we have a test related to this in piglit somewhere... it
> > looks like tests/shaders/glsl-floating-constant-120.shader_test uses
> > that syntax, but it's not explicitly testing that feature.
> > 
> > > > However, this works fine in Mesa. Checking the spec I see:
> > > 
> > > Floating-point constants are defined as follows.
> > >  floating-constant:
> > >fractional-constant exponent-part(opt) floating-suffix(opt)
> > >digit-sequence exponent-part floating-suffix(opt)
> > >  fractional-constant:
> > >digit-sequence . digit-sequence
> > >digit-sequence .
> > >. digit-sequence
> > >  exponent-part:
> > >e sign(opt) digit-sequence
> > >E sign(opt) digit-sequence
> > >  sign: one of
> > >+ -
> > >  digit-sequence:
> > >digit
> > >digit-sequence digit
> > >  floating-suffix: one of
> > >f F
> > > 
> > > which suggests that the test is correct and Mesa has a bug. According to
> > > the above rules, however, something like this is fine:
> > > 
> > > float f = 1e2f;
> > > 
> > > which I find kind of weird if the other case is not valid, so I wonder
> > > if there is a bug in the spec or this is on purpose for some reason that
> > > I am missing.
> > > 
> > > Then, to make matters worse, I read this in opengl.org wiki [1]:
> > > 
> > > "A numeric literal that uses a decimal is by default of type float​. To
> > > create a float literal from an integer value, use the suffix f​ or F​ as
> > > in C/C++."
> > > 
> > > which contradicts the spec and the test and is aligned with the current
> > > way Mesa works.
> > > 
> > > So: does anyone know what version is right? Could this be a bug in the
> > > spec? and if it is not, do we want to change the behavior to follow the
> > > spec as it is now?
> > 
> > The $64,000 question: What do other GLSL compilers (including, perhaps,
> > glslang) do?  This seems like one of the cases where nobody is likely to
> > follow the spec, and some application will depend on that.  If that's
> > the case, I'll submit a spec bug.
> 
> Good point. I'll try to check a few cases and reply here. Thanks!

I did a quick test on AMD Radeon and nVidia proprietary drivers since I
had these easily available. AMD fails to compile (so it follows the
spec) but nVidia works (so same case as Mesa).

This confirms your guess: different drivers are doing different things.
Is this enough to file a spec bug? I imagine that the result on glslang
won't change anything, but I can try to install it and test there too if
you think that's interesting anyway.
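
In case it helps when comparing compilers, below is a small checker that
follows the floating-constant grammar quoted earlier in this thread, and
nothing beyond it. The function name is made up, and this is only a sketch
of the grammar, not the lexer of any real compiler:

```c
#include <ctype.h>
#include <stdbool.h>

/* Advance past a (possibly empty) run of decimal digits. */
static const char *
skip_digits(const char *s)
{
   while (isdigit((unsigned char)*s))
      s++;
   return s;
}

/* Returns true if s matches the quoted grammar:
 *   fractional-constant exponent-part(opt) floating-suffix(opt)
 *   digit-sequence exponent-part floating-suffix(opt)
 */
bool
is_glsl_float_constant(const char *s)
{
   const char *p = skip_digits(s);
   const bool int_digits = (p != s);
   bool has_point = false, frac_digits = false, has_exp = false;

   if (*p == '.') {
      has_point = true;
      const char *q = skip_digits(p + 1);
      frac_digits = (q != p + 1);
      p = q;
   }

   /* A fractional-constant needs digits on at least one side of the
    * point; without a point we need a leading digit-sequence.
    */
   if (has_point ? (!int_digits && !frac_digits) : !int_digits)
      return false;

   if (*p == 'e' || *p == 'E') {
      p++;
      if (*p == '+' || *p == '-')
         p++;
      const char *q = skip_digits(p);
      if (q == p)
         return false; /* exponent-part requires digits */
      p = q;
      has_exp = true;
   }

   /* A bare digit-sequence is only valid with an exponent-part, which
    * is what rules out "1f" while still allowing "1e2f".
    */
   if (!has_point && !has_exp)
      return false;

   if (*p == 'f' || *p == 'F')
      p++;

   return *p == '\0';
}
```

Under this reading "1f" is rejected while "1e2f", "1.f" and ".5F" are
accepted, which matches the behaviour the test expects.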

Iago



Re: [Mesa-dev] [PATCH 01/29] mesa: Add an implementation of a master convert function.

2014-11-20 Thread Iago Toral
On Wed, 2014-11-19 at 11:28 -0800, Jason Ekstrand wrote:
> By and large, this looks good to me.  Most of my comments are cosmetic
> or suggestions for added documentation.  There is one issue that I
> think is subtly wrong with integer format conversion but that should
> be easy to fix.
> 
> --Jason
> 
> On Tue, Nov 18, 2014 at 1:23 AM, Iago Toral Quiroga
>  wrote:
> From: Jason Ekstrand 
>     
> v2 by Iago Toral :
> 
> - When testing if we can directly pack we should use the src
> format to check
>   if we are packing from an RGBA format. The original code
> used the dst format
>   for the ubyte case by mistake.
> - Fixed incorrect number of bits for dst, it was computed
> using the src format
>   instead of the dst format.
> - If the dst format is an array format, check if it is signed.
> We were only
>   checking this for the case where it was not an array format,
> but we need
>   to know this in both scenarios.
> - Fixed incorrect swizzle transform for the cases where we
> convert between
>   array formats.
> - Compute is_signed and bits only once and for the dst format.
> We were
>   computing these for the src format too but they were
> overwritten by the
>   dst values immediately after.
> - Be more careful when selecting the integer path.
> Specifically, check that
>   both src and dst are integer types. Checking only one of
> them should suffice
>   since OpenGL does not allow conversions between normalized
> and integer types,
>   but putting extra care here makes sense and also makes the
> actual requirements
>   for this path more clear.
> - The format argument for pack functions is the destination
> format we are
>   packing to, not the source format (which has to be RGBA).
> - Expose RGBA_* to other files. These will come in handy
> when in need to
>   test if a given array format is RGBA or in need to pass RGBA
> formats to
>   mesa_format_convert.
> 
> v3 by Samuel Iglesias :
> 
> - Add an RGBA_INT definition.
> ---
>  src/mesa/main/format_utils.c | 378
> +++
>  src/mesa/main/format_utils.h |  10 ++
>  src/mesa/main/formats.h  |  15 +-
>  3 files changed, 399 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/main/format_utils.c
> b/src/mesa/main/format_utils.c
> index fcbbba4..c3815cb 100644
> --- a/src/mesa/main/format_utils.c
> +++ b/src/mesa/main/format_utils.c
> @@ -25,6 +25,384 @@
>  #include "format_utils.h"
>  #include "glformats.h"
>  #include "macros.h"
> +#include "format_pack.h"
> +#include "format_unpack.h"
> +
> +mesa_array_format RGBA_FLOAT = {{
> +   MESA_ARRAY_FORMAT_TYPE_FLOAT,
> +   0,
> +   4,
> +   0, 1, 2, 3,
> +   0, 1
> +}};
> +
> +mesa_array_format RGBA_UBYTE = {{
> +   MESA_ARRAY_FORMAT_TYPE_UBYTE,
> +   1,
> +   4,
> +   0, 1, 2, 3,
> +   0, 1
> +}};
> +
> +mesa_array_format RGBA_UINT = {{
> +   MESA_ARRAY_FORMAT_TYPE_UINT,
> +   0,
> +   4,
> +   0, 1, 2, 3,
> +   0, 1
> +}};
> +
> +mesa_array_format RGBA_INT = {{
> +   MESA_ARRAY_FORMAT_TYPE_INT,
> +   0,
> +   4,
> +   0, 1, 2, 3,
> +   0, 1
> +}};
> +
> +static void
> +invert_swizzle(uint8_t dst[4], const uint8_t src[4])
> +{
> +   int i, j;
> +
> +   dst[0] = MESA_FORMAT_SWIZZLE_NONE;
> +   dst[1] = MESA_FORMAT_SWIZZLE_NONE;
> +   dst[2] = MESA_FORMAT_SWIZZLE_NONE;
> +   dst[3] = MESA_FORMAT_SWIZZLE_NONE;
> +
> +   for (i = 0; i < 4; ++i)
> +  for (j = 0; j < 4; ++j)
> + if (src[j] == i && dst[i] ==
> MESA_FORMAT_SWIZZLE_NONE)
> +dst[i] = j;
> +}
> +
> +static GLenum
> +gl_type_for_array_format_datatype(enum
> mesa_array_format_datatype type)
>

Re: [Mesa-dev] [PATCH 05/29] mesa: Consider internal base format in _mesa_format_convert

2014-11-20 Thread Iago Toral
On Thu, 2014-11-20 at 08:42 +0100, Iago Toral wrote:
> On Wed, 2014-11-19 at 11:43 -0800, Jason Ekstrand wrote:
> > A couple of specific comments are below.  More generally, why are you
> > only considering the base format on two cases?  Do we never use it for
> > anything else?
> 
> I thought about that too but when I looked at the original code it
> seemed that it only cared for the base format in these two scenarios, so
> I thought that maybe the conversions cases that could be affected are
> all handled in those two paths. I'll check again though, just in case I
> missed something.

I don't know how I came to that conclusion but it seems wrong after
looking at the original code in texstore.c, which considers the base
internal format in the integer, float and ubyte paths too, so we should
do the same in _mesa_format_convert. I'll fix that.
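
To illustrate the src->rgba->base->rgba step, here is a sketch of
composing two swizzles and applying the result. The SWZ_* constants, the
function names and the luminance-like mappings in the usage note are
assumptions for the example; they are not what
_mesa_compute_component_mapping produces.

```c
#include <stdint.h>

/* Swizzle entries: a source channel index, or a constant 0 or 1. */
enum { SWZ_X = 0, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

/* A swizzle s means dst[i] = src[s[i]]. This builds the single swizzle
 * equivalent to applying 'first' and then 'second'.
 */
void
compose_swizzle(uint8_t out[4], const uint8_t first[4],
                const uint8_t second[4])
{
   for (int i = 0; i < 4; i++)
      out[i] = second[i] <= SWZ_W ? first[second[i]] : second[i];
}

/* Apply a swizzle to one RGBA pixel. dst and src must not overlap. */
void
apply_swizzle(float dst[4], const float src[4], const uint8_t swz[4])
{
   for (int i = 0; i < 4; i++) {
      if (swz[i] == SWZ_ZERO)
         dst[i] = 0.0f;
      else if (swz[i] == SWZ_ONE)
         dst[i] = 1.0f;
      else
         dst[i] = src[swz[i]];
   }
}
```

For a luminance-like base format one could take rgba2base = {X, ZERO,
ZERO, ONE} and base2rgba = {X, X, X, ONE}. Composing them gives {X, X, X,
ONE}, so the pixel (0.2, 0.5, 0.7, 0.3) becomes (0.2, 0.2, 0.2, 1.0)
instead of keeping its original green, blue and alpha values.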

Iago

> > On Tue, Nov 18, 2014 at 1:23 AM, Iago Toral Quiroga
> >  wrote:
> > Add a dst_internal_format parameter to _mesa_format_convert,
> > that represents
> > the base internal format for texture/pixel uploads, so we can
> > do the right
> > thing when the driver has selected a different internal format
> > for the target
> > dst format.
> > ---
> >  src/mesa/main/format_utils.c | 65
> > +++-
> >  src/mesa/main/format_utils.h |  2 +-
> >  2 files changed, 65 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/mesa/main/format_utils.c
> > b/src/mesa/main/format_utils.c
> > index fc59e86..5964689 100644
> > --- a/src/mesa/main/format_utils.c
> > +++ b/src/mesa/main/format_utils.c
> > @@ -303,7 +303,7 @@ _mesa_compute_component_mapping(GLenum
> > inFormat, GLenum outFormat, GLubyte *map)
> >  void
> >  _mesa_format_convert(void *void_dst, uint32_t dst_format,
> > size_t dst_stride,
> >   void *void_src, uint32_t src_format,
> > size_t src_stride,
> > - size_t width, size_t height)
> > + size_t width, size_t height, GLenum
> > dst_internal_format)
> >  {
> > uint8_t *dst = (uint8_t *)void_dst;
> > uint8_t *src = (uint8_t *)void_src;
> > @@ -422,6 +422,36 @@ _mesa_format_convert(void *void_dst,
> > uint32_t dst_format, size_t dst_stride,
> > if (src_array_format.as_uint && dst_array_format.as_uint)
> > {
> >assert(src_array_format.normalized ==
> > dst_array_format.normalized);
> > 
> > +  /* If the base format of our dst is not the same as the
> > provided base
> > +   * format it means that we are converting to a
> > different format
> > +   * than the one originally requested by the client.
> > This can happen when
> > +   * the internal base format requested is not supported
> > by the driver.
> > +   * In this case we need to consider the requested
> > internal base format to
> > +   * compute the correct swizzle operation from src to
> > dst. We will do this
> > +   * by computing the swizzle transform
> > src->rgba->base->rgba->dst instead
> > +   * of src->rgba->dst.
> > +   */
> > +  mesa_format dst_mesa_format;
> > +  if (dst_format & MESA_ARRAY_FORMAT_BIT)
> > + dst_mesa_format =
> > _mesa_format_from_array_format(dst_format);
> > +  else
> > + dst_mesa_format = dst_format;
> > 
> > 
> > Let's put an extra line here so it doesn't get confused with the below
> > if statement
> > 
> >  
> > +  if (dst_internal_format !=
> > _mesa_get_format_base_format(dst_mesa_format)) {
> > + /* Compute src2rgba as src->rgba->base->rgba */
> > + uint8_t rgba2base[4], base2rgba[4], swizzle[4];
> > + _mesa_compute_component_mapping(GL_RGBA,
> > dst_internal_format, rgba2base);
> > + _mesa_compute_component_mapping(dst_internal_format,
> > GL_RGBA, base2rgba);
> > +
> > + for (i = 0; i < 4; i++) 

Re: [Mesa-dev] [PATCH 06/29] mesa: Avoid pack/unpack fast paths if base internal format != base format

2014-11-20 Thread Iago Toral
On Thu, 2014-11-20 at 10:49 -0800, Jason Ekstrand wrote:
> 
> 
> On Wed, Nov 19, 2014 at 11:58 PM, Iago Toral 
> wrote:
> On Wed, 2014-11-19 at 11:57 -0800, Jason Ekstrand wrote:
> > A couple of general comments on this patch:
> >
> >
> > 1) The prerequisites should be moved to before the first
> patch in the
> > series and it should be squashed into the patch that
> introduces the
> > function.  There are one or two more patches which also
> modify it and
> > those should also be squashed in.
> 
> Ok.
> 
> >
> > 2) I wonder if giving _mesa_format_convert an internal
> swizzle
> > wouldn't be better than a destination internal format.
> There are a
> > couple of reasons for this:
> >
> > a) It's more general.  If it doesn't cost us anything
> extra to do
> > it that way, we might as well.
> 
> I think that would only work directly for conversions between
> array
> formats, but what if we have, for example, a texture upload
> from RGB565
> to a Luminance format (so that we get an RGBA base format)?
> that would
> not go though _mesa_swizzle_and_convert and would require to
> unpack to
> RGBA and then pack to the dst... and unless the client has
> provided the
> dst format as an array format that won't use
> _mesa_swizzle_and_convert
> either. That should not be a problem when the calls to
> _mesa_format_convert come from things like glTexImage or
> glReadPixels,
> since in these cases we compute the dst format from a GL type
> and if it
> is an array format we should get that, but in other cases that
> might not
> be the case...
>  
> 
> I'm a bit confused about what you mean here.  If the user passes in a
> non-trivial swizzle and we have to pack and unpack on both sides then
> we have to unpack, swizzle, and then repack.  We would still have to
> do this if all you pass in is an internal format. 

I was confused by the fact that we are currently not doing this in all
paths (I mean, swizzling after unpacking if necessary), but you are
right.

>  Fortunately, the _mesa_swizzle_and_convert function can be used to do
> an in-place swizzle as long as the source and destination types have
> the same number of bits per pixel.
> 
> If one side of the pack/repack is an array format, we can just build
> the swizzling into the one _mesa_swizzle_and_convert call.

Yes, I get your point now. Thanks!

> 
> > b) It removes the GL enum dependency from the
> _mesa_format_convert
> > c) I think this implementation misses the case where we
> download a
> > texture that is stored in a different format than its base
> format.
> > For instance, if you are downloading a GL_RGB texture as
> GL_RGBA but
> > it's stored as GL_RGBA.  I think with the current
> implementation,
> > you'll get the junk in the alpha component of the texture's
> backing
> > storage instead of a constant alpha value of 1.
> 
> That's correct. In the implementation of readpixels and
> getteximage we
> had to rebase the results in some cases to account for that.
> 
> Iago
> 
> >
> > On Tue, Nov 18, 2014 at 1:23 AM, Iago Toral Quiroga
> >  wrote:
> > In general, if the dst base internal format and the
> selected
> > dst format are
> > different we can't be sure that directly packing or
> unpacking
> > to the destination
> > format will produce correct results. In these cases
> we
> > probably want to fall
> > back to other paths (like mesa_swizzle_and_convert)
> that can
> > handle these
> > situations better.
> >
> > An example of this includes a luminance internal
> format for
> > which the driver
> > decided to use something like BGRA. In this case,
> unpacking
> > to BGRA won't
> > match the requirements of the luminance int

Re: [Mesa-dev] [PATCH 02/29] mesa: Set normalized=true for float array formats.

2014-11-20 Thread Iago Toral
On Thu, 2014-11-20 at 10:40 -0800, Jason Ekstrand wrote:
> 
> 
> On Wed, Nov 19, 2014 at 11:24 PM, Iago Toral 
> wrote:
> Hi Jason,
> 
> we discussed this some weeks ago actually, the detailed
> explanation is
> here:
> https://bugs.freedesktop.org/show_bug.cgi?id=84566#c5
> 
> the short answer is that this is necessary because there is a
> normalized
> parameter to _mesa_swizzle_and_convert, and when we deal with
> float
> types we want to set this to true.
> 
> 
> I went back and looked at that and I thought the result of the
> discussion was to fix the assert in mesa_format_convert and compute
> the normalized parameter correctly.  After that, I thought this
> shouldn't be strictly needed.  It may still be a good idea for
> consistency, but I want to make sure we're doing the right thing in
> mesa_format_convert

With this patch, in mesa_format_convert we simply take the "normalized"
value for mesa_swizzle_and_convert from the normalized field of the
array format, since we make sure that all float array formats will have
this set to 1.

Without this patch we would have to do something like this (pseudocode)
in mesa_format_convert:

normalized = array_format.normalized || array_format.type == FLOAT ||  
 array_format.type == HALF_FLOAT;

We can do it either way, I just think that the latter is a bit
inconsistent because:

a) why would we want to generate array formats with a normalized setting
of 0 if we then want to set normalized to true when they are involved?

b) Other parts of Mesa check if a format is normalized by doing
normalized = !_mesa_is_enum_format_integer(srcFormat), which will make
float types normalized.
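
For clarity, the pseudocode above written out as compilable C. The struct
and enum here are simplified stand-ins I made up for the example, not the
real mesa_array_format definition:

```c
#include <stdbool.h>

/* Simplified stand-in for the array format data types. */
enum datatype { TYPE_UBYTE, TYPE_UINT, TYPE_HALF, TYPE_FLOAT };

/* Simplified stand-in for the fields of an array format that matter
 * for this discussion.
 */
struct array_format_info {
   enum datatype type;
   bool normalized;
};

/* What mesa_format_convert would have to compute if float array
 * formats kept normalized = 0: treat float and half-float as
 * normalized regardless of the stored flag.
 */
bool
convert_normalized_flag(const struct array_format_info *fmt)
{
   return fmt->normalized ||
          fmt->type == TYPE_FLOAT ||
          fmt->type == TYPE_HALF;
}
```

With the patch applied the stored flag already carries this information
for float formats, so the helper would reduce to reading fmt->normalized.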

Iago

> Iago
> 
> On Wed, 2014-11-19 at 11:31 -0800, Jason Ekstrand wrote:
> > I'm not sure what I think about this.  Does it make a
> functional
> > change other than consistency?
> >
> > --Jason
> >
> >
> > On Tue, Nov 18, 2014 at 1:23 AM, Iago Toral Quiroga
> >  wrote:
> > In order to check if a format is normalized Mesa
> does
> > something like this:
> > normalized = !
> _mesa_is_enum_format_integer(srcFormat);
> >
> > So all float types will set normalized to true.
> Since our
> > mesa_array_format
> > includes a normalized flag for each type we want to
> make it
> > consistent with
> > this.
> > ---
> >  src/mesa/main/format_info.py | 3 ++-
> >  src/mesa/main/format_utils.c | 2 +-
> >  2 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/mesa/main/format_info.py
> > b/src/mesa/main/format_info.py
> > index 315767d..d4bc276 100644
> > --- a/src/mesa/main/format_info.py
> > +++ b/src/mesa/main/format_info.py
> > @@ -220,9 +220,10 @@ for fmat in formats:
> > print '  {{ {0} }},'.format(',
> '.join(map(str,
> > fmat.swizzle)))
> > if fmat.is_array() and fmat.colorspace in
> ('rgb', 'srgb'):
> >chan = fmat.array_element()
> > +  norm = chan.norm or chan.type == parser.FLOAT
> >print '   {0} ,'.format(',
> '.join([
> >   get_array_format_datatype(chan),
> > - str(int(chan.norm)),
> > + str(int(norm)),
> >   str(len(fmat.channels)),
> >   str(fmat.swizzle[0]),
> >   str(fmat.swizzle[1]),
> > diff --git a/src/mesa/main/format_utils.c
> > b/src/mesa/main/format_utils.c
> > index c3815cb..1d65f2b 100644
> > --- a/src/mesa/main/format_utils.c
> > +++ b/src/mesa/main/format_utils.c
> > @@ -30,7 +30,7 @@
> >
> >  mesa_array_format RGBA_FLOAT = {{
> > MESA_ARRAY_FORMAT_TYPE_FLOAT,
> > -   0,
> > +   1,
> > 4,
> > 0, 

Re: [Mesa-dev] [PATCH 01/29] mesa: Add an implementation of a master convert function.

2014-11-21 Thread Iago Toral
On Thu, 2014-11-20 at 11:10 -0800, Jason Ekstrand wrote:
> 
> 
> On Thu, Nov 20, 2014 at 1:48 AM, Iago Toral  wrote:
> On Wed, 2014-11-19 at 11:28 -0800, Jason Ekstrand wrote:
> > By and large, this looks good to me.  Most of my comments
> are cosmetic
> > or suggestions for added documentation.  There is one issue
> that I
> > think is subtly wrong with integer format conversion but
> that should
> > be easy to fix.
> >
> > --Jason
>     >
> > On Tue, Nov 18, 2014 at 1:23 AM, Iago Toral Quiroga
> >  wrote:
>     > From: Jason Ekstrand 
> >
> > v2 by Iago Toral :
> >
> > - When testing if we can directly pack we should use
> the src
> > format to check
> >   if we are packing from an RGBA format. The
> original code
> > used the dst format
> >   for the ubyte case by mistake.
> > - Fixed incorrect number of bits for dst, it was
> computed
> > using the src format
> >   instead of the dst format.
> > - If the dst format is an array format, check if it
> is signed.
> > We were only
> >   checking this for the case where it was not an
> array format,
> > but we need
> >   to know this in both scenarios.
> > - Fixed incorrect swizzle transform for the cases
> where we
> > convert between
> >   array formats.
> > - Compute is_signed and bits only once and for the
> dst format.
> > We were
> >   computing these for the src format too but they
> were
> > overwritten by the
> >   dst values immediately after.
> > - Be more careful when selecting the integer path.
> > Specifically, check that
> >   both src and dst are integer types. Checking only
> one of
> > them should suffice
> >   since OpenGL does not allow conversions between
> normalized
> > and integer types,
> >   but putting extra care here makes sense and also
> makes the
> > actual requirements
> >   for this path more clear.
> > - The format argument for pack functions is the
> destination
> > format we are
> >   packing to, not the source format (which has to be
> RGBA).
> > - Expose RGBA_* to other files. These will come
> in handy
> > when in need to
> >   test if a given array format is RGBA or in need to
> pass RGBA
> > formats to
> >   mesa_format_convert.
> >
> > v3 by Samuel Iglesias :
> >
> > - Add an RGBA_INT definition.
> > ---
> >  src/mesa/main/format_utils.c | 378
> > +++
> >  src/mesa/main/format_utils.h |  10 ++
> >  src/mesa/main/formats.h  |  15 +-
> >  3 files changed, 399 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/mesa/main/format_utils.c
> > b/src/mesa/main/format_utils.c
> > index fcbbba4..c3815cb 100644
> > --- a/src/mesa/main/format_utils.c
> > +++ b/src/mesa/main/format_utils.c
> > @@ -25,6 +25,384 @@
> >  #include "format_utils.h"
> >  #include "glformats.h"
> >  #include "macros.h"
> > +#include "format_pack.h"
> > +#include "format_unpack.h"
> > +
> > +mesa_array_format RGBA_FLOAT = {{
> > +   MESA_ARRAY_FORMAT_TYPE_FLOAT,
> > +   0,
> > +   4,
> > +   0, 1, 2, 3,
> > +   0, 1
> > +}};
> > +

Re: [Mesa-dev] [PATCH 01/29] mesa: Add an implementation of a master convert function.

2014-11-21 Thread Iago Toral
On Fri, 2014-11-21 at 17:40 +0900, Michel Dänzer wrote:
> On 21.11.2014 17:07, Iago Toral wrote:
> > On Thu, 2014-11-20 at 11:10 -0800, Jason Ekstrand wrote:
> >> On Thu, Nov 20, 2014 at 1:48 AM, Iago Toral  wrote:
> >>  Just out of curiosity: is there any gain in avoiding the GL
> >>  types in the
> >>  conversion code?
> >>
> >>
> >> As I said in my reply to Jose on the 00/20 patch, we would like to
> >> eventually move the format conversion stuff to a common helper library
> >> that can be shared by mesa main and the gallium code.  If we are going
> >> to do that, then we don't want any GL dependencies.
> >
> > Yes, I know, I was just wondering why Mesa and Gallium would need/want
> > to avoid the GL dependencies in that helper library, since clients of
> > that library would usually be things like glTexImage or glReadPixels and
> > these deal with GL types.
> 
> Since OpenGL is only one out of many APIs provided by state trackers on 
> top of Gallium (some of which have nothing to do with OpenGL), code 
> below the Gallium interface using GL types directly is a layering violation.

Aha, I see now. That makes sense.
Thanks for the explanation!

Iago


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 03/29] mesa: Do not assert on integer<->non-integer direct pack/unpack fast paths

2014-11-21 Thread Iago Toral
On Thu, 2014-11-20 at 21:35 -0800, Jason Ekstrand wrote:
> 
> 
> On Thu, Nov 20, 2014 at 9:33 PM, Jason Ekstrand 
> wrote:
> 
> 
> On Thu, Nov 20, 2014 at 12:29 AM, Iago Toral
>  wrote:
> It is explained here:
> https://bugs.freedesktop.org/show_bug.cgi?id=84566#c35
> 
> So one example of this was a glReadPixels where we
> want to store the
> pixel data as RGBA UINT, but the render buffer format
> we  read from is
> MESA_FORMAT_B8G8R8A8_UNORM. There are piglit tests
> that hit this case.
> 
> 
> I'm still not seeing how this is allowed.  From the 4.2 core
> spec:
> 
> "If format is one of RED , GREEN , BLUE , RG , RGB , RGBA ,
> BGR , or BGRA , then
> red, green, blue, and alpha values are obtained from the
> selected buffer at each
> pixel location.
> If format is an integer format and the color buffer is not an
> integer format, or
> if the color buffer is an integer format and format is not an
> integer format, an
> INVALID_OPERATION error is generated."
> 
> 
> I also checked the 3.3 compatibility spec and it says the same
> thing.  This seems to indicate that that combination should
> result in GL_INVALID_OPERATION.
> 
> 
> 
> I also just CC'd Ian, our local spec expert.  Maybe he can shed a
> little light on this.

No need. I have reverted the commit and run piglit again on i965 and
swrast and I don't hit the assert any more, so I guess that when I was
hitting it, it was because of a bug somewhere in the GL->Mesa type
mapping that I must have fixed after adding this patch.

I'll remove the patch in the second version of the series.
>  
> 
> Iago
> 
> On Wed, 2014-11-19 at 12:04 -0800, Jason Ekstrand
> wrote:
> > Can you remind me again as to what formats hit these
> paths?  I
> > remember you hitting them, but I'm still not really
>     seeing how it
> > happens.
> >
> > --Jason
> >
> >
> > On Tue, Nov 18, 2014 at 1:23 AM, Iago Toral Quiroga
> >  wrote:
> > We can have conversions from non-integer
> types to integer
> > types, so remove
> > the assertions for these in the pack/unpack
> fast paths. It
> > could be that
> > we do not have all the necessary pack/unpack
> functions in
> > these cases though,
> > so protect these paths with conditionals and
> let
> > _mesa_format_convert use
> > other paths to resolve these kind of
> conversions if necessary.
> > ---
> >  src/mesa/main/format_utils.c | 16
> 
> >  1 file changed, 8 insertions(+), 8
> deletions(-)
> >
> > diff --git a/src/mesa/main/format_utils.c
> > b/src/mesa/main/format_utils.c
> > index 1d65f2b..56a3b8d 100644
> > --- a/src/mesa/main/format_utils.c
> > +++ b/src/mesa/main/format_utils.c
> > @@ -143,8 +143,8 @@
> _mesa_format_convert(void *void_dst,
> > uint32_t dst_format, size_t dst_stride,
> >  dst += dst_stride;
> >   }
> >   return;
> > -  } else if (dst_array_format.as_uint
> ==
> > RGBA_UBYTE.as_uint) {
> > - assert(!
> _mesa_is_format_integer_color(src_format));
> > +  } else if (dst_array_format.as_uint
> ==
> > RGBA_UBYTE.as_uint &&
> > + !
>  

Re: [Mesa-dev] Using the 'f' suffix to create a float from an integer literal

2014-11-24 Thread Iago Toral
On Thu, 2014-11-20 at 09:33 -0800, Ian Romanick wrote:
> On 11/20/2014 05:33 AM, Neil Roberts wrote:
> > For what it's worth, I did a quick grep through the internal and public
> > shader-db and I couldn't find anything using this.
> > 
> >  git grep -P '\b(? > 
> > If AMD disallows it then it seems like it would be reasonably safe to
> > disallow it in Mesa too.
> > 
> > GCC disallows it too which could be an indication that people won't have
> > a habit of using it.
> 
> So... the GLSL spec actually follows C?  Then we should definitely
> follow the spec, and there's no need for a GLSL spec bug.  If AMD
> disallows it, then there are not likely to be any applications that
> depend on it... so I agree with Neil that we're safe to disallow it too.
> 
> I'm still curious about glslang... if glslang allows it, I'll file a bug
> against glslang. :)

glslang follows the spec: it emits an error and fails to compile a
shader that uses that syntax.
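
For reference, the grammar quoted further down in this thread can be
checked with a small hand-written matcher. This is only a sketch of that
grammar as quoted (it is not taken from Mesa or glslang, and the function
names are made up); it rejects "1f" and accepts "1e2f" and "1.f".

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Consume a run of decimal digits; return how many were consumed. */
static size_t skip_digits(const char **p)
{
   size_t n = 0;
   while (isdigit((unsigned char)**p)) {
      (*p)++;
      n++;
   }
   return n;
}

/* Return true if s matches the floating-constant grammar quoted in this
 * thread:
 *   fractional-constant exponent-part(opt) floating-suffix(opt)
 *   digit-sequence exponent-part floating-suffix(opt)
 */
static bool is_glsl_floating_constant(const char *s)
{
   const char *p = s;
   size_t int_digits = skip_digits(&p);
   size_t frac_digits = 0;
   bool has_dot = false;
   bool has_exponent = false;

   if (*p == '.') {
      has_dot = true;
      p++;
      frac_digits = skip_digits(&p);
   }

   /* A fractional-constant needs the dot plus digits on at least one side. */
   bool fractional = has_dot && (int_digits > 0 || frac_digits > 0);
   if (has_dot && !fractional)
      return false;               /* a lone "." */
   if (!has_dot && int_digits == 0)
      return false;               /* no digits at all */

   if (*p == 'e' || *p == 'E') {
      p++;
      if (*p == '+' || *p == '-')
         p++;
      if (skip_digits(&p) == 0)
         return false;            /* exponent without digits */
      has_exponent = true;
   }

   /* Without a dot the exponent-part is mandatory, so "1f" fails here. */
   if (!fractional && !has_exponent)
      return false;

   if (*p == 'f' || *p == 'F')
      p++;

   return *p == '\0';
}
```

Note that a plain "1" is also rejected, which is expected: it is an
integer constant, not a floating constant.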

Iago

> > - Neil
> > 
> > Iago Toral  writes:
> > 
> >> On Thu, 2014-11-20 at 08:08 +0100, Iago Toral wrote:
> >>> On Wed, 2014-11-19 at 10:27 -0800, Ian Romanick wrote:
> >>>> On 11/19/2014 03:47 AM, Iago Toral Quiroga wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I came across a GLSL test that checks that doing something like this in
> >>>>> a shader should fail:
> >>>>
> >>>> Is this one of the dEQP tests?
> >>>
> >>> Yes.
> >>>
> >>>>> float value = 1f;
> >>>>
> >>>> It seems like we have a test related to this in piglit somewhere... it
> >>>> looks like tests/shaders/glsl-floating-constant-120.shader_test uses
> >>>> that syntax, but it's not explicitly testing that feature.
> >>>>
> >>>>> However, this works fine in Mesa. Checking the spec I  see:
> >>>>>
> >>>>> Floating-point constants are defined as follows.
> >>>>>  floating-constant:
> >>>>>fractional-constant exponent-part(opt) floating-suffix(opt)
> >>>>>digit-sequence exponent-part floating-suffix(opt)
> >>>>>  fractional-constant:
> >>>>>digit-sequence . digit-sequence
> >>>>>digit-sequence .
> >>>>>. digit-sequence
> >>>>>  exponent-part:
> >>>>>e sign(opt) digit-sequence
> >>>>>E sign(opt) digit-sequence
> >>>>>  sign: one of
> >>>>>+ -
> >>>>>  digit-sequence:
> >>>>>digit
> >>>>>digit-sequence digit
> >>>>>  floating-suffix: one of
> >>>>>f F
> >>>>>
> >>>>> which suggests that the test is correct and Mesa has a bug. According to
> >>>>> the above rules, however, something like this is fine:
> >>>>>
> >>>>> float f = 1e2f;
> >>>>>
> >>>>> which I find kind of weird if the other case is not valid, so I wonder
> >>>>> if there is a bug in the spec or this is on purpose for some reason that
> >>>>> I am missing.
> >>>>>
> >>>>> Then, to make matters worse, I read this in opengl.org wiki [1]:
> >>>>>
> >>>>> "A numeric literal that uses a decimal is by default of type float​. To
> >>>>> create a float literal from an integer value, use the suffix f​ or F​ as
> >>>>> in C/C++."
> >>>>>
> >>>>> which contradicts the spec and the test and is aligned with the current
> >>>>> way Mesa works.
> >>>>>
> >>>>> So: does anyone know what version is right? Could this be a bug in the
> >>>>> spec? and if it is not, do we want to change the behavior to follow the
> >>>>> spec as it is now?
> >>>>
> >>>> The $64,000 question: What do other GLSL compilers (including, perhaps,
> >>>> glslang) do?  This seems like one of the cases where nobody is likely to
> >>>> follow the spec, and some application will depend on that.  If that's
> >>>> the case, I'll submit a spec bug.
> >>>
> >>> Good point. I'll try to check a few cases and reply here. Thanks!
> >>
> >> I did a quick test on AMD Radeon and nVidia proprietary drivers since I
> >> had these easily available. AMD fails to compile (so it follows the
> >> spec) but nVidia works (so same case as Mesa).
> >>
> >> This confirms your guess: different drivers are doing different things.
> >> Is this enough to file a spec bug? I imagine that the result on glslang
> >> won't change anything, but I can try to install it and test there too if
> >> you think that's interesting anyway.
> >>
> >> Iago
> >>
> >> ___
> >> mesa-dev mailing list
> >> mesa-dev@lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 




Re: [Mesa-dev] [PATCH v2] i965: Relax accumulator dependency scheduling on Gen < 6

2014-05-13 Thread Iago Toral
Hi Matt, are you okay with excluding FS_OPCODE_CINTERP from the list of
virtual instructions to consider in Gen < 6? As far as I can see it is
implemented as a single MOV instruction and the accumulator does not
seem to be used but maybe I am missing something.

Iago

On Wed, 2014-05-07 at 09:58 +0200, Iago Toral Quiroga wrote:
> Many instructions implicitly update the accumulator on Gen < 6. The 
> instruction
> scheduling code just calls add_barrier_deps() for each accumulator access on
> these platforms, but a large class of operations don't actually update the
> accumulator -- mostly move and logical instructions. Teaching the scheduling
> code about this would allow more flexibility to schedule instructions.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77740
> ---
> 
> This version properly identifies virtual instructions that write to the
> accumulator in Gen < 6 as indicated by Matt. FS_OPCODE_CINTERP is excluded
> because it seems to be implemented as a MOV.
> 
> Passes piglit tests on IronLake.
> 
>  .../drivers/dri/i965/brw_schedule_instructions.cpp | 84 
> +++---
>  src/mesa/drivers/dri/i965/brw_shader.cpp   | 10 +++
>  src/mesa/drivers/dri/i965/brw_shader.h |  1 +
>  3 files changed, 36 insertions(+), 59 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp 
> b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> index 8cc6908..6f8f405 100644
> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> @@ -742,8 +742,6 @@ fs_instruction_scheduler::is_compressed(fs_inst *inst)
>  void
>  fs_instruction_scheduler::calculate_deps()
>  {
> -   const bool gen6plus = v->brw->gen >= 6;
> -
> /* Pre-register-allocation, this tracks the last write per VGRF (so
>  * different reg_offsets within it can interfere when they shouldn't).
>  * After register allocation, reg_offsets are gone and we track individual
> @@ -803,7 +801,7 @@ fs_instruction_scheduler::calculate_deps()
>  } else {
> add_dep(last_fixed_grf_write, n);
>  }
> - } else if (inst->src[i].is_accumulator() && gen6plus) {
> + } else if (inst->src[i].is_accumulator()) {
>  add_dep(last_accumulator_write, n);
>} else if (inst->src[i].file != BAD_FILE &&
>   inst->src[i].file != IMM &&
> @@ -828,11 +826,7 @@ fs_instruction_scheduler::calculate_deps()
>}
>  
>if (inst->reads_accumulator_implicitly()) {
> - if (gen6plus) {
> -add_dep(last_accumulator_write, n);
> - } else {
> -add_barrier_deps(n);
> - }
> + add_dep(last_accumulator_write, n);
>}
>  
>/* write-after-write deps. */
> @@ -867,7 +861,7 @@ fs_instruction_scheduler::calculate_deps()
>   } else {
>  last_fixed_grf_write = n;
>   }
> -  } else if (inst->dst.is_accumulator() && gen6plus) {
> +  } else if (inst->dst.is_accumulator()) {
>   add_dep(last_accumulator_write, n);
>   last_accumulator_write = n;
>} else if (inst->dst.file != BAD_FILE &&
> @@ -887,13 +881,10 @@ fs_instruction_scheduler::calculate_deps()
>last_conditional_mod[inst->flag_subreg] = n;
>}
>  
> -  if (inst->writes_accumulator) {
> - if (gen6plus) {
> -add_dep(last_accumulator_write, n);
> -last_accumulator_write = n;
> - } else {
> -add_barrier_deps(n);
> - }
> +  if (inst->writes_accumulator_implicitly(v->brw->gen) &&
> +  !inst->dst.is_accumulator()) {
> + add_dep(last_accumulator_write, n);
> + last_accumulator_write = n;
>}
> }
>  
> @@ -933,7 +924,7 @@ fs_instruction_scheduler::calculate_deps()
>  } else {
> add_dep(n, last_fixed_grf_write);
>  }
> - } else if (inst->src[i].is_accumulator() && gen6plus) {
> + } else if (inst->src[i].is_accumulator()) {
>  add_dep(n, last_accumulator_write);
>   } else if (inst->src[i].file != BAD_FILE &&
>   inst->src[i].file != IMM &&
> @@ -958,11 +949,7 @@ fs_instruction_scheduler::calculate_deps()
>}
>  
>if (inst->reads_accumulator_implicitly()) {
> - if (gen6plus) {
> -add_dep(n, last_accumulator_write);
> - } else {
> -add_barrier_deps(n);
> -  

[Mesa-dev] i965: Need help testing patches to fix bogus antaliasing in Gen4/Gen5

2014-05-23 Thread Iago Toral
Hi,

Short story:

I have two implementations of a patch to fix this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=78679

I would need help testing both on Gen4 (I don't have Gen4 hardware
available) and I would also like to understand why only one of them works
on Gen5. A patch that makes both implementations available for testing
is attached to the bug report.

Following are the details of the problem and some issues I see with the
two implementations I have. Hopefully someone can help me address the
questions/problems I mention below.

Long story:

The bug report is about evaluating whether c->runtime_check_aads_emit
is still needed and, if so, putting back the code that dealt with it,
which was removed in commit 098acf6c843.

I found that this flag is relevant for correct rendering in Gen < 6 when
the following requirements are met:

- Rendering polygons (not lines)
- Rendering exactly one of the polygon's faces in GL_LINE mode and the
other in GL_FILL mode (and none of them are culled).
- Enable smooth lines (antialiased lines).

I attached a sample program to the bug report that generates this
scenario. Running the program in IronLake shows incorrect rendering of
the GL_FILL face of the polygon and correct rendering of the GL_LINE
face.

The idea here is that with this setup in Gen < 6 the hardware generates
a runtime bit that indicates whether AA data needs to be sent as part of
the frame buffer write SEND message. Particularly, AA data has to be
sent when rendering the GL_LINE face of the polygon, but not when
rendering the GL_FILL face for things to work properly.

The current code always sends AA info which produces the incorrect
rendering results in the sample program I mentioned. Instead,
runtime_check_aads_emit should be checked and if TRUE we should generate
code that tests this bit at runtime and changes the SEND message
accordingly.

The code removed in the mentioned commit apparently addressed this for
Gen4, however rendering in Gen5 was not addressed even before that
commit because the code path that included the code to deal with this
did not run in Gen5, so for IronLake this has always been broken as far
as I can tell and, presumably, it has been broken for Gen < 5 since that
commit.

The original code used a brw_JMPI instruction to handle the runtime
check when runtime_check_aads_emit==TRUE. Adapted to the current code
base it would look something like this:

 brw_set_conditionalmod(p, BRW_CONDITIONAL_NZ);
 brw_AND(p,
 v1_null_ud,
 get_element_ud(brw_vec8_grf(1,0), 6),
 brw_imm_ud(1<<26));
 jmp = brw_JMPI(p, ip, ip, brw_imm_ud(0)) - p->store;
 {
fire_fb_write(inst, inst->base_mrf+1, implied_header, inst->mlen-1);
 }
 brw_land_fwd_jump(p, jmp);
 fire_fb_write(inst, inst->base_mrf, implied_header, inst->mlen);

Basically the idea here is to shift the message header by one position
to fill in the space that would otherwise be used for AA data (stored in
base_mrf+2) when the runtime test indicates that we should not send AA
info in the SEND message.
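
The decision itself can be written down in plain C. This is only a model
of which (base_mrf, mlen) pair ends up in the SEND, not driver code: the
struct and function names are invented, and the polarity (bit 26 of r1.6
set means AA data is sent) is my reading of the BRW_CONDITIONAL_NZ test
in the snippet above.

```c
#include <stdint.h>

#define AA_DATA_BIT (1u << 26)   /* bit tested in r1.6 by the AND above */

/* Hypothetical container for the two values that differ between writes. */
struct fb_write_params {
   int base_mrf;
   int mlen;
};

static struct fb_write_params
select_fb_write(uint32_t r1_6, int base_mrf, int mlen)
{
   struct fb_write_params p = { base_mrf, mlen };

   if ((r1_6 & AA_DATA_BIT) == 0) {
      /* No AA data wanted: start the message one register later and
       * send one register less. */
      p.base_mrf = base_mrf + 1;
      p.mlen = mlen - 1;
   }
   return p;
}
```

Whatever control flow is used (JMPI or IF/ELSE), exactly one of these two
parameter sets should reach the hardware for a given pixel block.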

However, at least in my IronLake hardware this does not produce correct
rendering results either. The face of the polygon in GL_FILL mode,
although different from before, is still not rendered properly. The
original code included comments suggesting that the resulting program
would be equivalent to an IF/ELSE structure where only one of the
fire_fb_writes will be executed.

However, I found that this other code, which in theory should be
equivalent, produces correct results:

 struct brw_reg v1_null_ud =
   vec1(retype(brw_null_reg(), BRW_REGISTER_TYPE_UD));
 struct brw_reg tmp_reg_ud = brw_vec1_grf(BRW_MAX_GRF-1, 0);
 brw_AND(p,
 tmp_reg_ud,
 get_element_ud(brw_vec8_grf(1,0), 6),
 brw_imm_ud(1<<26));
 brw_CMP(p,
 v1_null_ud,
 BRW_CONDITIONAL_Z,
 tmp_reg_ud,
 brw_imm_ud(0));
 brw_IF(p, BRW_EXECUTE_1);
 {
fire_fb_write(inst, inst->base_mrf+1, implied_header, inst->mlen-1);
 }
 brw_ELSE(p);
 {
fire_fb_write(inst, inst->base_mrf, implied_header, inst->mlen);
 }
 brw_ENDIF(p);

I have no idea why this happens. Also, since the implementation that
follows the strategy from the original code does not work on IronLake, I
wonder if that means that it was not working correctly in Gen < 5
either.

Can anyone explain if there is any difference between both
implementations? If it helps, here is the assembly code produced for
both implementations (notice that fire_fb_write produces a mov and a
send):

IF/ELSE/ENDIF version:

and(1)   g127<1>F  g1.6<0,1,0>UD  0x0400UD{ align1 nomask };
cmp.e.f0(1)  null  g127<0,1,0>F   0xUD{ align1 nomask };
(+f0) if(1)ip 6D  { align1 switch };
mov(8)   m3<1>Fg1<8,8,1>F { align1 nomask };
send(16) 2   null  g0<8,8,1>UW
  write (0, 8, 4, 0) mlen 10 rlen 0   { align1 nomask EOT };
else(1)ip 65

Re: [Mesa-dev] [PATCH 2/3] i965: Add runtime checks for line antialiasing in Gen < 6.

2014-05-29 Thread Iago Toral
Hi Kenneth,

On Wed, 2014-05-28 at 12:42 -0700, Kenneth Graunke wrote:
> On 05/27/2014 03:50 AM, Iago Toral Quiroga wrote:
> > In Gen < 6 the hardware generates a runtime bit that indicates whether AA 
> > data
> > has to be sent as part of the framebuffer write SEND message. This affects 
> > the
> > specific case where we have setup antialiased line rendering and we render
> > polygons which have one face setup in GL_LINE mode (line antialiasing
> > will be used) and the other one in GL_FILL mode (no line antialiasing 
> > needed).
> > 
> > Currently we are not doing this runtime test and instead we always send AA
> > > data, which produces incorrect rendering of the GL_FILL face of the polygon
> > > in the aforementioned scenario (verified on Ironlake and GM45).
> > 
> > In Gen4 this is, likely, a regression introduced with commit 098acf6c843. In
> > Gen5 this has never worked properly. Gen > 5 are not affected by this.
> > 
> > The patch fixes the problem by adding the appropriate runtime check and
> > adjusting the framebuffer write message accordingly in the conflictive
> > scenario (detected with fs_visitor::runtime_check_aads_emit == TRUE).
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78679
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.h   |  4 ++
> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 86 
> > +---
> >  2 files changed, 58 insertions(+), 32 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> > b/src/mesa/drivers/dri/i965/brw_fs.h
> > index 60a4906..ab8912f 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.h
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> > @@ -452,6 +452,10 @@ public:
> >  
> > void emit_color_write(int target, int index, int first_color_mrf);
> > void emit_alpha_test();
> > +   void do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
> > + bool header_present);
> > +   void emit_fb_write(int target, int base_mrf, int mlen, bool eot,
> > +  bool header_present);
> > void emit_fb_writes();
> >  
> > void emit_shader_time_begin();
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
> > b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > index 171f063..4c3897b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > @@ -2731,6 +2731,54 @@ fs_visitor::emit_alpha_test()
> >  }
> >  
> >  void
> > +fs_visitor::do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
> > + bool header_present)
> > +{
> > +   fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
> > +   inst->target = target;
> > +   inst->base_mrf = base_mrf;
> > +   inst->mlen = mlen;
> > +   inst->eot = eot;
> > +   inst->header_present = header_present;
> > +   if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
> > +  inst->predicate = BRW_PREDICATE_NORMAL;
> > +  inst->flag_subreg = 1;
> > +   }
> > +}
> > +
> > +void
> > +fs_visitor::emit_fb_write(int target, int base_mrf, int mlen, bool eot,
> > +  bool header_present)
> > +{
> > +   if (!runtime_check_aads_emit) {
> > +  do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
> > +   } else {
> > +  /* This can only happen in Gen < 6
> > +   */
> > +  fs_reg reg_tmp_ud = fs_reg(this, glsl_type::uint_type);
> > +  emit(AND(reg_tmp_ud,
> > +   fs_reg(get_element_ud(brw_vec8_grf(1,0), 6)),
> 
> I think
> 
> retype(brw_vec1_grf(1, 6), BRW_REGISTER_TYPE_UD)
> 
> might be a little clearer than:
> 
> get_element_ud(brw_vec8_grf(1,0), 6))
> 
> since it just refers to r1.6 right away, rather than r1.0 modified to
> have a suboffset of 6.

Sure, that looks better.

> > +   fs_reg(brw_imm_ud(1<<26;
> > +  emit(CMP(reg_null_ud,
> > +   reg_tmp_ud,
> > +   fs_reg(brw_imm_ud(0)),
> > +   BRW_CONDITIONAL_Z));
> 
> You can actually generate a flag condition directly from the AND
> instruction, and eliminate the CMP:
> 
> fs_inst *inst =
>emit(AND(reg_null_ud,
> fs_reg(retype(brw_vec1_grf(1, 6), BRW_REGISTER_TYPE_UD),
> fs_reg(0)));
> inst->conditional_mod = BRW_CONDITIONAL_Z;
> 
> (you might have to use vec1(retype(brw_null_reg

Re: [Mesa-dev] [PATCH 2/3] i965: Add runtime checks for line antialiasing in Gen < 6.

2014-06-03 Thread Iago Toral
On Thu, 2014-05-29 at 08:31 -0700, Kenneth Graunke wrote:
(...)
> >>> +  emit(IF(BRW_PREDICATE_NORMAL));
> >>> +  {
> >>> + /* Shift message header one register since we are not sending
> >>> +  * AA data stored in base_mrf+2
> >>> +  */
> >>> + do_emit_fb_write(target, base_mrf + 1, mlen - 1, eot, 
> >>> header_present);
> >>> +  }
> >>> +  emit(BRW_OPCODE_ELSE);
> >>> +  {
> >>> + do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
> >>> +  }
> >>> +  emit(BRW_OPCODE_ENDIF);
> >>
> >> I suppose this probably works, but I've never seen a program end with an
> >> ENDIF.  I'm not really comfortable with doing that, since the ENDIF
> >> should never be executed.  With JMPI or ADD brw_ip_reg(), we'd just jump
> >> over one instruction and then execute one FB write or the other, which
> >> seems really straightforward.
> >>
> >> But, looking at the bug, I see you tried both of those suggestions, and
> >> it didn't work for some reason.  Huh.  It really should...maybe I can
> >> look into why...
> > 
> > > Right, I don't understand why that code did not work for me either.
> > > Moreover, the code did not work as I expected in other ways, for example I
> > noticed that reverting the condition and the order of the fb_writes
> > (which should be completely the same thing) changed the behavior too...
> > I remember that it produced GPU hangs but I tried so many versions that
> > I can't tell... it felt as if the two fb_writes were being executed.  I
> > tried all kinds of things to make it work as intended, even including a
> > manual JMPI at the end of the IF branch to force the jump over the
> > second fb_write... but nothing.
> > 
> > I commented about this and included the C code and the generated
> > assembly for the JMPI version hoping that someone could tell if
> > something was off here:
> > 
> > http://lists.freedesktop.org/archives/mesa-dev/2014-May/059985.html
> > 
> > Maybe you can have a quick look at that see if there is something
> > obvious that I am missing...
> 
> Yeah, I'll try and take a look...

I found what was going on with this: I have to manually set the
execution_size of the JMPI instruction to BRW_EXECUTE_1. That seems to
fix the problem.

> > Also, when I was trying the JMPI version I was developing the patches in
> > the generator rather than the visitor. I suppose that should not make
> > any difference though.
> > 
> > One last thing, JMPI was recently made private to the sf module, so if
> > we want to do go with it we need to revert f3cb2e6ed705.
> 
> It wasn't, actually - brw_JMPI is still available in brw_eu_emit.c.
> It's only brw_land_fwd_jump that moved to brw_sf_state.c, but all that
> does is set the jump distance on the JMPI instruction.  It seems like
> you know the jump distance when emitting the JMPI (jump over 1 FB write
> and maybe some MOVs) so you could just set it straightaway, and skip
> that function.  Or you could revert it - that'd be fine too.

Oh, right... I did not notice that.

Still, brw_land_fwd_jump() seems more convenient and safer than
computing the jump distance by hand. A manual distance would be
especially error-prone if someone changes the code later, since they
would have to know that adding or removing instructions inside the
conditional also means updating the jump distance. So I think
reverting is the best option.
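
To illustrate why landing the jump after the fact is less fragile, here
is a toy model in C. None of these names exist in Mesa; it only shows the
pattern of emitting a forward jump with an unknown distance and patching
it once the skipped instructions have been emitted. Real hardware counts
the distance in hardware-specific units, which this sketch ignores.

```c
#include <stddef.h>

#define TOY_MAX_INSTS 64

/* Hypothetical toy instruction: only the fields the example needs. */
struct toy_inst {
   int is_jump;
   int jump_distance;   /* instructions to skip after the jump itself */
};

struct toy_program {
   struct toy_inst insts[TOY_MAX_INSTS];
   size_t count;
};

/* Append a non-jump instruction and return its index.
 * The caller must keep count below TOY_MAX_INSTS. */
static size_t toy_emit(struct toy_program *p)
{
   p->insts[p->count].is_jump = 0;
   p->insts[p->count].jump_distance = 0;
   return p->count++;
}

/* Append a forward jump whose distance is not known yet. */
static size_t toy_emit_fwd_jump(struct toy_program *p)
{
   size_t idx = toy_emit(p);
   p->insts[idx].is_jump = 1;
   return idx;
}

/* Patch the jump so it lands on the next instruction to be emitted. */
static void toy_land_fwd_jump(struct toy_program *p, size_t jmp)
{
   p->insts[jmp].jump_distance = (int)(p->count - jmp - 1);
}
```

With this shape, adding or removing an instruction between the jump and
the landing point needs no other change, whereas a hard-coded distance
would silently become wrong.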

Iago



Re: [Mesa-dev] [PATCH v2 1/4] i965: Always set a valid block end pointer

2014-06-09 Thread Iago Toral
On Mon, 2014-06-09 at 02:22 -0700, Kenneth Graunke wrote:
> On Thursday, June 05, 2014 03:03:05 PM Iago Toral Quiroga wrote:
> > When an instruction stream ends in a block structure (like an IF/ELSE/ENDIF)
> the
> > last block's end pointer will not be set, leading to a crash later on in
> > fs_live_variables::setup_def_use().
> > 
> > If we have not assigned the end pointer of the last block, set it to the 
> last
> > instruction.
> > ---
> >  src/mesa/drivers/dri/i965/brw_cfg.cpp | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp 
> b/src/mesa/drivers/dri/i965/brw_cfg.cpp
> > index 6bf99f1..d4647c4 100644
> > --- a/src/mesa/drivers/dri/i965/brw_cfg.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp
> > @@ -257,6 +257,11 @@ cfg_t::cfg_t(exec_list *instructions)
> >}
> > }
> >  
> > +   /* If the instruction stream ended with a block structure we need to
> > +  set the block's end pointer to the last instruction here */
> > +   if (!cur->end)
> > +  cur->end = (backend_instruction *)instructions->get_tail();
> > +
> > cur->end_ip = ip;
> >  
> > make_block_array();
> > 
> 
> I'm not too excited about this patch - threads must terminate with a SEND 
> message to a particular shared function, such as a URB write or FB write.  So 
> the instruction stream really shouldn't end in an IF/ELSE/ENDIF.
> 
> I think we should probably just drop it, as it isn't needed for the latest 
> version of your series anyway.

Agreed.

> That said, perhaps we should add an assert(cur->end) here?  It seems like 
> catching this problem earlier than fs_live_variables::setup_def_use would be 
> nice.  What do you think?

Yes, that would be better.

Should I just push a commit adding the assertion then? I guess there is
not much to review about it :)

Iago
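
To illustrate the assertion idea discussed above, here is a minimal sketch.
It assumes simplified stand-in types: Instruction, Block,
block_is_terminated() and finish_last_block() are hypothetical names, not
the real brw_cfg.cpp code. It only shows the invariant being checked at the
point where the last block is closed, so that a missing end pointer is
caught there rather than later in fs_live_variables::setup_def_use().

```cpp
#include <cassert>

// Hypothetical stand-ins for the i965 backend types; not the real Mesa classes.
struct Instruction {
   int opcode;
};

struct Block {
   Instruction *start = nullptr;
   Instruction *end = nullptr;
   int end_ip = -1;
};

// True when the block has a valid end pointer, i.e. the condition that
// an assert(cur->end) would check.
bool block_is_terminated(const Block &b)
{
   return b.end != nullptr;
}

// Closes the final block of an instruction stream. The assertion fires
// here, at the point where the invariant is first violated.
void finish_last_block(Block &cur, int ip)
{
   assert(block_is_terminated(cur) && "last block has no end instruction");
   cur.end_ip = ip;
}
```

With a block whose end pointer has been set, finish_last_block() simply
records the final instruction index; with a null end pointer a debug build
would abort at the assertion.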



Re: [Mesa-dev] [PATCH v2 4/4] i965/fs: Add Gen < 6 runtime checks for line antialiasing.

2014-06-09 Thread Iago Toral
On Mon, 2014-06-09 at 02:31 -0700, Kenneth Graunke wrote:
> On Thursday, June 05, 2014 03:03:08 PM Iago Toral Quiroga wrote:
> > In Gen < 6 the hardware generates a runtime bit that indicates whether AA
> > data has to be sent as part of the framebuffer write SEND message. This
> > affects the specific case where we have set up antialiased line rendering
> > and we render polygons which have one face set up in GL_LINE mode (line
> > antialiasing will be used) and the other one in GL_FILL mode (no line
> > antialiasing needed).
> > 
> > Currently we are not doing this runtime test and instead we always send AA
> > data, which produces incorrect rendering of the GL_FILL face of the polygon
> > in the aforementioned scenario (verified on Ironlake and GM45).
> > 
> > In Gen4 this is likely a regression introduced with commit 098acf6c843. In
> > Gen5 this has never worked properly. Gen > 5 is not affected by this.
> > 
> > The patch fixes the problem by adding the appropriate runtime check and
> > adjusting the framebuffer write message accordingly in the problematic
> > scenario.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78679
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.h |  4 ++
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 88 
> ++
> >  2 files changed, 65 insertions(+), 27 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> b/src/mesa/drivers/dri/i965/brw_fs.h
> > index 02311a6..cda344e 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.h
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> > @@ -617,6 +617,10 @@ public:
> >  
> >  private:
> > void generate_code(exec_list *instructions);
> > +   void fire_fb_write(fs_inst *inst,
> > +  GLuint base_reg,
> > +  struct brw_reg implied_header,
> > +  GLuint nr);
> > void generate_fb_write(fs_inst *inst);
> > void generate_blorp_fb_write(fs_inst *inst);
> > void generate_pixel_xy(struct brw_reg dst, bool is_x);
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > index f4e4826..04c9b74 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > @@ -98,11 +98,47 @@ fs_generator::patch_discard_jumps_to_fb_writes()
> >  }
> >  
> >  void
> > +fs_generator::fire_fb_write(fs_inst *inst,
> > +GLuint base_reg,
> > +struct brw_reg implied_header,
> > +GLuint nr)
> > +{
> > +   uint32_t msg_control;
> > +
> > +   if (brw->gen < 6) {
> > +  brw_MOV(p,
> > +  brw_message_reg(base_reg + 1),
> > +  brw_vec8_grf(1, 0));
> > +   }
> > +
> > +   if (this->dual_source_output)
> > +  msg_control = 
> BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD8_DUAL_SOURCE_SUBSPAN01;
> > +   else if (dispatch_width == 16)
> > +  msg_control = BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD16_SINGLE_SOURCE;
> > +   else
> > +  msg_control = 
> BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD8_SINGLE_SOURCE_SUBSPAN01;
> > +
> > +   uint32_t surf_index =
> > +  prog_data->binding_table.render_target_start + inst->target;
> > +
> > +   brw_fb_WRITE(p,
> > +dispatch_width,
> > +base_reg,
> > +implied_header,
> > +msg_control,
> > +surf_index,
> > +nr,
> > +0,
> > +inst->eot,
> > +inst->header_present);
> > +
> > +   brw_mark_surface_used(&prog_data->base, surf_index);
> > +}
> > +
> > +void
> >  fs_generator::generate_fb_write(fs_inst *inst)
> >  {
> > -   bool eot = inst->eot;
> > struct brw_reg implied_header;
> > -   uint32_t msg_control;
> >  
> > /* Header is 2 regs, g0 and g1 are the contents. g0 will be implied
> >  * move, here's g1.
> > @@ -155,38 +191,36 @@ fs_generator::generate_fb_write(fs_inst *inst)
> >  implied_header = brw_null_reg();
> >} else {
> >  implied_header = retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UW);
> > -
> > -brw_MOV(p,
> > -brw_message_reg(inst->base_mrf + 1),
> > -brw_vec8_grf(1, 0));
> >

Re: [Mesa-dev] [PATCH 1/2] i965: Fix Haswell discard regressions since Gen4-5 line AA fix.

2014-06-10 Thread Iago Toral
Urgh, sorry about that :-(
I'll be more careful when manipulating the state flags next time.

Iago

On Tue, 2014-06-10 at 20:12 -0700, Kenneth Graunke wrote:
> In commit dc2d3a7f5c217a7cee92380fbf503924a9591bea, Iago accidentally
> moved fire_fb_write() above the brw_pop_insn_state(), which caused the
> SEND to lose its predication and change from WE_normal to WE_all.
> Haswell uses predicated SENDs for discards, so this broke Piglit's
> tests for discards.
> 
> We want the Gen4-5 MOV to be uncompressed, unpredicated, and unmasked,
> but the actual FB write itself should respect those.  So, pop state
> first, and force it again around the single MOV.
> 
> Signed-off-by: Kenneth Graunke 
> Cc: Iago Toral Quiroga 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> I originally ran Piglit on those patches on Crestline, but neglected to test
> Haswell, and missed this.  Whoops.  But easily fixed!
> 
> I tested these two patches on both Crestline and Haswell this time,
> and also tested Iago's line_aa demo to make sure it still worked too.
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 3787d24..d21b4e2 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -106,9 +106,14 @@ fs_generator::fire_fb_write(fs_inst *inst,
> uint32_t msg_control;
>  
> if (brw->gen < 6) {
> +  brw_push_insn_state(p);
> +  brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> +  brw_set_default_predicate_control(p, BRW_PREDICATE_NONE);
> +  brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
>brw_MOV(p,
>brw_message_reg(base_reg + 1),
>brw_vec8_grf(1, 0));
> +  brw_pop_insn_state(p);
> }
>  
> if (this->dual_source_output)
> @@ -196,6 +201,8 @@ fs_generator::generate_fb_write(fs_inst *inst)
>implied_header = brw_null_reg();
> }
>  
> +   brw_pop_insn_state(p);
> +
> if (!runtime_check_aads_emit) {
>fire_fb_write(inst, inst->base_mrf, implied_header, inst->mlen);
> } else {
> @@ -221,8 +228,6 @@ fs_generator::generate_fb_write(fs_inst *inst)
>brw_land_fwd_jump(p, jmp);
>fire_fb_write(inst, inst->base_mrf, implied_header, inst->mlen);
> }
> -
> -   brw_pop_insn_state(p);
>  }
>  
>  void
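
The ordering issue described in the commit message above can be sketched
with a small model. This is an illustration under stated assumptions:
InsnState, Emitter and fire_fb_write_sketch() are hypothetical names and do
not reproduce the real brw_eu API. The sketch shows a forced state being
pushed around a single MOV while the following SEND keeps the caller's
predication and mask settings.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the default instruction state; the real i965
// state carries more fields.
struct InsnState {
   bool predicated = false;
   bool mask_disabled = false;
};

// One emitted instruction together with the state it was emitted under.
struct Emitted {
   const char *name;
   InsnState state;
};

class Emitter {
public:
   InsnState current;
   std::vector<Emitted> program;

   void push_state() { saved.push_back(current); }

   void pop_state()
   {
      assert(!saved.empty());
      current = saved.back();
      saved.pop_back();
   }

   void emit(const char *name) { program.push_back({name, current}); }

private:
   std::vector<InsnState> saved;
};

// Emits the header MOV unpredicated and unmasked, then restores the
// caller's state so that the SEND respects predication again.
void fire_fb_write_sketch(Emitter &e)
{
   e.push_state();
   e.current.mask_disabled = true;
   e.current.predicated = false;
   e.emit("MOV");
   e.pop_state();
   e.emit("SEND");
}
```

If the caller had enabled predication before the call, the recorded MOV is
unpredicated while the recorded SEND is still predicated, which is the
behaviour the fix restores.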




Re: [Mesa-dev] [PATCH 00/18] Multi-stream support for geometry shaders

2014-06-11 Thread Iago Toral
Hi Chris,

thanks for the quick review!

On Wed, 2014-06-11 at 21:45 +1200, Chris Forbes wrote:
> I sent comments on patches 1, 3, 5, 9, 11, 16, 18

We will look into these.

> Patches 2, 4, 6-8, 10, 12-15, 17 are:
> 
> Reviewed-by: Chris Forbes 
> 
> You should also include a patch to docs/GL3.txt marking off this
> subfeature for i965 :)
> 
> 
> Do you have a bunch of piglits which exercise all the tricky corners
> of this? I see a few tests
> which require the existence of the stream layout qualifier, but not
> covering all the behavior.

No, so far we have been testing this with a standalone program. We will
check what already exists in piglit and add missing test cases.

Iago

> -- Chris
> 
> 
> 
> On Wed, Jun 11, 2014 at 7:49 PM, Iago Toral Quiroga  wrote:
> > This series brings multi-stream support to geometry shaders as defined by
> > ARB_gpu_shader5. It also covers some missing multi-stream functionality from
> > ARB_transform_feedback3. This is combined work from Samuel Iglesias and
> > myself.
> >
> > The series includes both required infrastructure in Mesa and specific
> > implementation for i965.
> >
> > Summary:
> > Patch 1: GLSL parsing bits.
> > Patches 2-8: transform feedback support.
> > Patches 9-13: support for vertex/primitive emission to non-zero streams.
> > Patches 14-15: ir_reader support.
> > Patches 16-18: transform feedback queries support (ARB_transform_feedback3).
> >
> > Notes:
> > I am not very happy with patch 11 but I could not find a better way to do
> > this; hopefully someone here knows how to do this better. There is a comment
> > in the patch explaining the problem, but here is a summary:
> >
> > EmitVertex() is not a built-in function like most others: when it is
> > called we don't really generate code, we simply generate an ir_emit_vertex
> > to mark the point where vertex emission is to be done. This is then
> > processed at native code-generation time via
> > vec4_gs_visitor::visit(ir_emit_vertex *) (in the i965 driver) to do what
> > this means for the specific hardware we are dealing with. EmitStreamVertex(n)
> > is the same thing, except that it takes a compile-time constant parameter
> > with the stream id. The problem we have here is that this input variable and
> > its value are not really important until we generate native code; it won't
> > be used for anything until that moment, and this makes Mesa think that it is
> > unused in the dead-code optimization passes that we run before native code
> > generation. As a consequence, these passes will kill the variable and by the
> > time we reach vec4_gs_visitor::visit(ir_emit_vertex *) we have lost the
> > streamId information.
> >
> > The way patch 11 works around this problem is to never generate an ir_call
> > for EmitStreamVertex(n). Instead, when it detects that we are calling this
> > function, it evaluates the parameters at that moment (we can do this because
> > by definition the parameter must be a compile-time constant expression) and
> > generates the ir_emit_vertex with the stream value directly. This is more
> > efficient, solves the problem and also solves another thing: we want to
> > know whether we are emitting vertices to streams other than 0, and this also
> > gives us the opportunity to detect this situation in a place where we can
> > set state accordingly for later use. However, having to create a special
> > case for this does not look very nice. The same goes for
> > EndStreamPrimitive(n).
> >
> > Iago Toral Quiroga (16):
> >   mesa: add StreamId information to transform feedback outputs.
> >   i965: Enable transform feedback for streams > 0
> >   glsl: Assign GLSL StreamIds to transform feedback outputs.
> >   glsl: Fail to link if inter-stage input/outputs are not assigned to
> > stream 0
> >   glsl: Add methods to retrieve a varying's name and streamId.
> >   glsl: Two varyings can't write to the same buffer from different
> > streams.
> >   glsl: Only geometry shader outputs can be associated with non-zero
> > streams.
> >   glsl: Store info about geometry shaders that emit vertices to non-zero
> > streams.
> >   i965/gs: Set number of control data bits for stream mode.
> >   glsl: Add support for EmitStreamVertex() and EndStreamPrimitive().
> >   glsl: Can't emit vertices to non-zero streams with output !=
> > GL_POINTS.
> >   i965/gs: Set control d

Re: [Mesa-dev] [PATCH 05/18] glsl: Fail to link if inter-stage input/outputs are not assigned to stream 0

2014-06-11 Thread Iago Toral
I am a bit confused here:

Reading the code it looks like input_var should be NULL if there is no
consumer stage. In that case, if this is a separable program and there
is no consumer, then input_var must be NULL too, in which case the
linker_error would never take place.

If input_var is not NULL then there has to be a consumer stage, and in
that case we should check that we can link both stages, right?

Iago

On Wed, 2014-06-11 at 20:48 +1200, Chris Forbes wrote:
> This would appear to prohibit the use of multiple streams in separable
> programs entirely. I don't think that's the right thing.
> 
> On Wed, Jun 11, 2014 at 7:49 PM, Iago Toral Quiroga  wrote:
> > Outputs that are linked to inputs in the next stage must be output to 
> > stream 0,
> > otherwise we should fail to link.
> > ---
> >  src/glsl/link_varyings.cpp | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
> > index 9883c0b..0066b4e 100644
> > --- a/src/glsl/link_varyings.cpp
> > +++ b/src/glsl/link_varyings.cpp
> > @@ -1343,6 +1343,13 @@ assign_varying_locations(struct gl_context *ctx,
> >* consumer stage, add the output.
> >*/
> >   if (input_var || (prog->SeparateShader && consumer == NULL)) {
> > +/* Only stream 0 outputs can be consumed in the next stage */
> > +if (input_var && output_var->data.streamId != 0) {
> > +   linker_error(prog, "output %s is assigned to stream=%d but "
> > +"is linked to an input, which requires 
> > stream=0",
> > +output_var->name, output_var->data.streamId);
> > +   return false;
> > +}
> >  matches.record(output_var, input_var);
> >   }
> >}
> > --
> > 1.9.1
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 




Re: [Mesa-dev] [PATCH 05/18] glsl: Fail to link if inter-stage input/outputs are not assigned to stream 0

2014-06-11 Thread Iago Toral
It may be a bit less confusing to read if I take the inner if out, so
I'll rewrite it like this:

if (input_var || (prog->SeparateShader && consumer == NULL)) {
   matches.record(output_var, input_var);
}

if (input_var && output_var->data.streamId != 0) {
   linker_error(prog, ...);
   return false;
}
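
For reference, the rule being discussed can be sketched as a standalone
function. This is only an illustration: Var, LinkResult and check_output()
are hypothetical names, not the link_varyings.cpp code, and the error check
is placed first here, which gives the same outcome since an error aborts
linking anyway.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for ir_variable.
struct Var {
   std::string name;
   int stream_id;
};

enum class LinkResult { Skipped, Recorded, Error };

// input_var is null when no consumer input matches the output.
LinkResult check_output(const Var *output_var, const Var *input_var,
                        bool separate_shader, bool has_consumer)
{
   // Only stream 0 outputs can be consumed by the next stage.
   if (input_var && output_var->stream_id != 0)
      return LinkResult::Error;

   // The output is recorded if it is matched by an input, or if this is
   // a separable program with no consumer stage.
   if (input_var || (separate_shader && !has_consumer))
      return LinkResult::Recorded;

   return LinkResult::Skipped;
}
```

Note that an output on a non-zero stream in a separable program without a
consumer is still recorded, since input_var is null in that case and the
stream check never triggers.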

On Thu, 2014-06-12 at 07:43 +1200, Chris Forbes wrote:
> I had misread it -- yes, this looks correct.
> 
> On Thu, Jun 12, 2014 at 12:02 AM, Iago Toral  wrote:
> > I am a bit confused here:
> >
> > Reading the code it looks like input_var should be NULL if there is no
> > consumer stage. In that case, if this is a separable program and there
> > is no consumer, then input_var must be NULL too, in which case the
> > linker_error would never take place.
> >
> > If input_var is not NULL then there has to be a consumer stage, and in
> > that case we should check that we can link both stages, right?
> >
> > Iago
> >
> > On Wed, 2014-06-11 at 20:48 +1200, Chris Forbes wrote:
> >> This would appear to prohibit the use of multiple streams in separable
> >> programs entirely. I don't think that's the right thing.
> >>
> >> On Wed, Jun 11, 2014 at 7:49 PM, Iago Toral Quiroga  
> >> wrote:
> >> > Outputs that are linked to inputs in the next stage must be output to 
> >> > stream 0,
> >> > otherwise we should fail to link.
> >> > ---
> >> >  src/glsl/link_varyings.cpp | 7 +++
> >> >  1 file changed, 7 insertions(+)
> >> >
> >> > diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
> >> > index 9883c0b..0066b4e 100644
> >> > --- a/src/glsl/link_varyings.cpp
> >> > +++ b/src/glsl/link_varyings.cpp
> >> > @@ -1343,6 +1343,13 @@ assign_varying_locations(struct gl_context *ctx,
> >> >* consumer stage, add the output.
> >> >*/
> >> >   if (input_var || (prog->SeparateShader && consumer == NULL)) {
> >> > +/* Only stream 0 outputs can be consumed in the next stage 
> >> > */
> >> > +if (input_var && output_var->data.streamId != 0) {
> >> > +   linker_error(prog, "output %s is assigned to stream=%d 
> >> > but "
> >> > +"is linked to an input, which requires 
> >> > stream=0",
> >> > +output_var->name, 
> >> > output_var->data.streamId);
> >> > +   return false;
> >> > +}
> >> >  matches.record(output_var, input_var);
> >> >   }
> >> >}
> >> > --
> >> > 1.9.1
> >> >
> >> > ___
> >> > mesa-dev mailing list
> >> > mesa-dev@lists.freedesktop.org
> >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>
> >
> >
> 




Re: [Mesa-dev] [PATCH 11/18] glsl: Add support for EmitStreamVertex() and EndStreamPrimitive().

2014-06-13 Thread Iago Toral
On Wed, 2014-06-11 at 21:25 +1200, Chris Forbes wrote:
> This is pretty weird.
> 
> We should be able to generate a normal builtin function body here
> which consists of just the ir_emit_vertex, passing the stream
> parameter to it. This would then get inlined like any other function
> leaving a bare ir_emit_vertex / ir_end_primitive with a constant
> operand. If one of the optimization passes eats that, then it's
> broken.
> 
> 
> On Wed, Jun 11, 2014 at 7:49 PM, Iago Toral Quiroga  wrote:
> > ---
> >  src/glsl/ast_function.cpp  | 37 +-
> >  src/glsl/builtin_functions.cpp | 60 
> > ++
> >  src/glsl/ir.h  | 18 -
> >  3 files changed, 103 insertions(+), 12 deletions(-)
> >
> > diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp
> > index 8e91a1e..1c4d7e7 100644
> > --- a/src/glsl/ast_function.cpp
> > +++ b/src/glsl/ast_function.cpp
> > @@ -1722,15 +1722,40 @@ ast_function_expression::hir(exec_list 
> > *instructions,
> >ir_function_signature *sig =
> >  match_function_by_name(func_name, &actual_parameters, state);
> >
> > +  bool is_emit_stream_vertex = !strcmp(func_name, "EmitStreamVertex");
> > +  bool is_end_stream_primitive = !strcmp(func_name, 
> > "EndStreamPrimitive");
> > +
> >ir_rvalue *value = NULL;
> >if (sig == NULL) {
> > -no_matching_function_error(func_name, &loc, &actual_parameters, 
> > state);
> > -value = ir_rvalue::error_value(ctx);
> > + no_matching_function_error(func_name, &loc, &actual_parameters, 
> > state);
> > + value = ir_rvalue::error_value(ctx);
> >} else if (!verify_parameter_modes(state, sig, actual_parameters, 
> > this->expressions)) {
> > -/* an error has already been emitted */
> > -value = ir_rvalue::error_value(ctx);
> > -  } else {
> > -value = generate_call(instructions, sig, &actual_parameters, 
> > state);
> > + /* an error has already been emitted */
> > + value = ir_rvalue::error_value(ctx);
> > +  } else if (is_emit_stream_vertex || is_end_stream_primitive) {
> > + /* See builtin_builder::_EmitStreamVertex() to understand why we 
> > need
> > +  * to handle these as a special case.
> > +  */
> > + ir_rvalue *stream_param = (ir_rvalue *) actual_parameters.head;
> > + ir_constant *stream_const = stream_param->as_constant();
> > + /* stream_const should not be NULL if we got here */
> > + assert(stream_const);
> > + int stream_id = stream_const->value.i[0];
> > + if (stream_id < 0 || stream_id >= MAX_VERTEX_STREAMS) {
> > +_mesa_glsl_error(&loc, state,
> > + "Invalid call %s(%d). Accepted "
> > + "values for the stream parameter are in the "
> > + "range [0, %d].",
> > + func_name, stream_id, MAX_VERTEX_STREAMS - 1);
> > + }
> > + /* Only enable multi-stream if we emit vertices to non-zero 
> > streams */
> > + state->gs_uses_streams = state->gs_uses_streams || stream_id > 0;
> > + if (is_emit_stream_vertex)
> > +instructions->push_tail(new(ctx) ir_emit_vertex(stream_id));
> > + else
> > +instructions->push_tail(new(ctx) ir_end_primitive(stream_id));
> > +   } else {
> > + value = generate_call(instructions, sig, &actual_parameters, 
> > state);
> >}
> >
> >return value;
> > diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
> > index f9f0686..412e8f3 100644
> > --- a/src/glsl/builtin_functions.cpp
> > +++ b/src/glsl/builtin_functions.cpp
> > @@ -359,6 +359,12 @@ shader_image_load_store(const _mesa_glsl_parse_state 
> > *state)
> > state->ARB_shader_image_load_store_enable);
> >  }
> >
> > +static bool
> > +gs_streams(const _mesa_glsl_parse_state *state)
> > +{
> > +   return gpu_shader5(state) && gs_only(state);
> > +}
> > +
> >  /** @} */
> >
> >  
> > /**/
> > @@ -594,6 +600,10 @@ private:
> >
> > B0(EmitVertex)
> > B0(EndPrimitive)
> 

Re: [Mesa-dev] [PATCH 11/18] glsl: Add support for EmitStreamVertex() and EndStreamPrimitive().

2014-06-13 Thread Iago Toral
I forgot to add an important piece of info. I also had to add this in
opt_dead_code.cpp, do_dead_code():

if (strcmp(entry->var->name, "stream") == 0)
   continue;

Without that, the variable referenced by ir_emit_vertex() gets trashed
anyway, whether the ralloc context in link_shaders is destroyed or not.

Iago

On Fri, 2014-06-13 at 10:09 +0200, Iago Toral wrote:
> After debugging I have more information about what is going on. There
> are two problems, one is that the stream variable in ir_emit_vertex gets
> trashed and the other one is that even if we manage to avoid that it
> won't get its value assigned. I explain how these two come to happen
> below and maybe someone can point me to what I am doing wrong:
> 
> first, this is how I am defining EmitStreamVertex():
> 
> ir_function_signature *
> builtin_builder::_EmitStreamVertex(builtin_available_predicate avail,
>const glsl_type *stream_type)
> {
>ir_variable *stream =
>   new(mem_ctx) ir_variable(stream_type, "stream", ir_var_const_in);
>MAKE_SIG(glsl_type::void_type, avail, 1, stream);
>ir_emit_vertex *ir = new(mem_ctx) ir_emit_vertex();
>ir->stream = var_ref(stream);
>body.emit(ir);
>return sig;
> }
> 
> The pattern is similar to other built-in functions. Notice how
> ir_emit_vertex will take a reference to the input stream variable.
> 
> And this is how I am defining ir_emit_vertex:
> 
> class ir_emit_vertex : public ir_instruction {
> public:
>  ir_emit_vertex() : ir_instruction(ir_type_emit_vertex), stream(NULL) {}
> 
>  virtual void accept(ir_visitor *v) { v->visit(this); }
> 
>  virtual ir_emit_vertex *clone(void *mem_ctx, struct hash_table *ht) const
>  {
> ir_emit_vertex *ir = new(mem_ctx) ir_emit_vertex();
> if (this->stream)
>ir->stream = this->stream->clone(mem_ctx, ht);
> return ir;
>  }
> 
>  virtual ir_visitor_status accept(ir_hierarchical_visitor *);
>  ir_dereference_variable *stream;
> };
> 
> Again, I don't see anything special as it follows the same pattern as
> other IR definitions in ir.h.
> 
> If I only do this, then, by the time I reach
> vec4_gs_visitor::visit(ir_emit_vertex *ir), ir->stream has been trashed.
> 
> Is this expected? Am I missing something in EmitStreamVertex(),
> ir_emit_vertex or somewhere else that is causing this?
> 
> Valgrind says the variable gets killed with the destruction of the
> ralloc context created in link_shaders. And indeed, removing the
> ralloc_free lets the variable reach the visitor.  I suppose this is not
> expected (otherwise we would have this problem in any built-in function
> that accepts input parameters). Just in case, I noticed this code in
> link_shaders:
> 
>   clone_ir_list(mem_ctx, linked->ir, main->ir);
> 
> It seems that it clones code using that ralloc context created in
> link_shaders, so I changed it to be:
> 
>clone_ir_list(linked, linked->ir, main->ir);
> 
> And it fixes the problem, but I suppose this is only a workaround for
> the real problem.
> 
> As for the second problem, if I bypass the variable trashing by removing
> the call to ralloc_free in link_shaders() or by doing the change above,
> then when we reach  vec4_gs_visitor::visit(ir_emit_vertex *ir), if I do
> ((ir_rvalue*)ir->stream)->as_constant() it will still return NULL, so it
> is useless. I want to read the value of the variable here, which I
> should be able to do since this is a constant expression.
> 
> However, as far as I can see by looking into ir_call::generate_inline()
> this seems to be expected: inputs to functions get a *new* variable for
> them, where the actual parameter value is set via an ir_assignment:
> 
> parameters[i] = sig_param->clone(ctx, ht);
> parameters[i]->data.mode = ir_var_auto;
> (...)
> assign =
>new(ctx) ir_assignment(new(ctx) 
>   ir_dereference_variable(parameters[i]),
>   param, NULL);
> next_ir->insert_before(assign);
> (...)
> 
> And then it clones the body of the function, like so:
> 
> foreach_list(n, &callee->body) {
>ir_instruction *ir = (ir_instruction *) n;
>ir_instruction *new_ir = ir->clone(ctx, ht);
> 
>new_instructions.push_tail(new_ir);
>visit_tree(new_ir, replace_return_with_assignment, 
>   this->return_deref);
> }
> 
> In our case, there is only one instruction here: ir_emit_vertex, and
> when cloning it we are also cloning its reference to the stream
> variable, but this is *different* from the parameter variable where we
> copied the ac

Re: [Mesa-dev] [PATCH 11/18] glsl: Add support for EmitStreamVertex() and EndStreamPrimitive().

2014-06-13 Thread Iago Toral
On Fri, 2014-06-13 at 10:28 +0200, Iago Toral wrote:
> I forgot to add an important piece of info. I also had to add this in
> opt_dead_code.cpp, do_dead_code():
> 
> if (strcmp(entry->var->name, "stream") == 0)
>    continue;
> 
> Without that, the variable referenced by ir_emit_vertex() gets trashed
> anyway, whether the ralloc context in link_shaders is destroyed or not.

The variable is killed because it is not used, as I was anticipating in
my patch, but I don't think the optimization passes are broken; I think
this is expected to happen.

This is the code generated for EmitStreamVertex(0) after function
inlining:

(declare () int stream)
(assign  (x) (var_ref stream)  (constant int (0)) )
(emit-vertex)

(...)

(function EmitStreamVertex
  (signature void
(parameters
  (declare (const_in ) int stream)
)
(
  (emit-vertex)
))
)

And this is after the dead code elimination passes (dead_code and
dead_code_local), first the assignment is removed:

(declare () int stream)
(emit-vertex)

And finally, in a second pass, it removes the declaration too, leaving:

(emit-vertex)

This seems to make sense: it is killing a variable that is, at this stage,
not used for anything. So, as I was saying in the original patch, I
think we can't do EmitStreamVertex(n) like any other built-in function,
because we won't be using its input parameter in the body of the
function for anything. The variable's value is to be used in the visitor
when it is time to generate the native code, and that happens after the
optimization passes. So we either need to grab its constant value before
the optimization passes (as my original patch did), or we have to find a
way to tell the optimization passes not to touch this variable
specifically (and then we would still have to figure out how to access
that temporary variable holding the stream value from the visitor).

Iago
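
The behaviour described above can be reproduced with a toy model. This is a
minimal sketch, assuming a hypothetical mini-IR (Instr and
eliminate_dead_assignments() are made-up names and far simpler than Mesa's
ir.h and opt_dead_code.cpp). It shows that an assignment survives only if
some later instruction actually reads the variable, which is why an
emit-vertex without an operand cannot keep "stream" alive.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical mini-IR with just two instruction kinds.
struct Instr {
   enum Kind { Assign, EmitVertex } kind;
   std::string dst;      // Assign: the variable written
   std::string operand;  // EmitVertex: the variable read, empty if none
};

// Removes assignments to variables that no instruction reads.
std::vector<Instr> eliminate_dead_assignments(const std::vector<Instr> &in)
{
   // Collect every variable that is read somewhere.
   std::set<std::string> read;
   for (const Instr &i : in) {
      if (i.kind == Instr::EmitVertex && !i.operand.empty())
         read.insert(i.operand);
   }

   // Keep everything except assignments whose destination is never read.
   std::vector<Instr> out;
   for (const Instr &i : in) {
      if (i.kind != Instr::Assign || read.count(i.dst))
         out.push_back(i);
   }
   return out;
}
```

Feeding this pass an assignment to "stream" followed by an emit-vertex with
no operand leaves only the emit-vertex, while an emit-vertex that names
"stream" as its operand preserves the assignment.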

> Iago
> 
> On Fri, 2014-06-13 at 10:09 +0200, Iago Toral wrote:
> > After debugging I have more information about what is going on. There
> > are two problems, one is that the stream variable in ir_emit_vertex gets
> > trashed and the other one is that even if we manage to avoid that it
> > won't get its value assigned. I explain how these two come to happen
> > below and maybe someone can point me to what I am doing wrong:
> > 
> > first, this is how I am defining EmitStreamVertex():
> > 
> > ir_function_signature *
> > builtin_builder::_EmitStreamVertex(builtin_available_predicate avail,
> >const glsl_type *stream_type)
> > {
> >ir_variable *stream =
> >   new(mem_ctx) ir_variable(stream_type, "stream", ir_var_const_in);
> >MAKE_SIG(glsl_type::void_type, avail, 1, stream);
> >ir_emit_vertex *ir = new(mem_ctx) ir_emit_vertex();
> >ir->stream = var_ref(stream);
> >body.emit(ir);
> >return sig;
> > }
> > 
> > The pattern is similar to other built-in functions. Notice how
> > ir_emit_vertex will take a reference to the input stream variable.
> > 
> > And this is how I am defining ir_emit_vertex:
> > 
> > class ir_emit_vertex : public ir_instruction {
> > public:
> >  ir_emit_vertex() : ir_instruction(ir_type_emit_vertex), stream(NULL) {}
> > 
> >  virtual void accept(ir_visitor *v) { v->visit(this); }
> > 
> >  virtual ir_emit_vertex *clone(void *mem_ctx, struct hash_table *ht) const
> >  {
> > ir_emit_vertex *ir = new(mem_ctx) ir_emit_vertex();
> > if (this->stream)
> >ir->stream = this->stream->clone(mem_ctx, ht);
> > return ir;
> >  }
> > 
> >  virtual ir_visitor_status accept(ir_hierarchical_visitor *);
> >  ir_dereference_variable *stream;
> > };
> > 
> > Again, I don't see anything special as it follows the same pattern as
> > other IR definitions in ir.h.
> > 
> > If I only do this, then, by the time I reach
> > vec4_gs_visitor::visit(ir_emit_vertex *ir), ir->stream has been trashed.
> > 
> > Is this expected? Am I missing something in EmitStreamVertex(),
> > ir_emit_vertex or somewhere else that is causing this?
> > 
> > Valgrind says the variable gets killed with the destruction of the
> > ralloc context created in link_shaders. And indeed, removing the
> > ralloc_free lets the variable reach the visitor.  I suppose this is not
> > expected (otherwise we would have this problem in any built-in function
> > that accepts input parameters). Just in case, I noticed this code in
> > link_shaders:
> > 
> >   clone_ir_list(mem_ctx, linked->ir, main->ir);
> > 
> >

Re: [Mesa-dev] [PATCH 11/18] glsl: Add support for EmitStreamVertex() and EndStreamPrimitive().

2014-06-16 Thread Iago Toral
Hi Chris,

On Sat, 2014-06-14 at 08:34 +1200, Chris Forbes wrote:
> Right, this happens because ir_emit_vertex doesn't take a proper
> operand, so it can't keep it alive.
> 
> What I think you want to do is change the stream in ir_emit_vertex and
> ir_end_primitive to be a pointer to ir_rvalue (and apply the various
> tweaks required to consider it alive; have rvalue visitors descend
> into it; etc) then emit:

Thanks for the tip, I will look into this. I am not sure what you
mean by applying the various tweaks required to consider the stream
alive (unless you mean the rvalue visitor stuff specifically), but I'll
try to look for clues in other built-in functions.

Iago

> (function EmitStreamVertex
>   (signature void
> (parameters
>   (declare (const_in ) int stream)
> )
> (
>   (emit-vertex (var_ref stream))
> ))
> )
> 
> which would inline in your case to
> 
> 
> (declare () int stream)
> (assign  (x) (var_ref stream)  (constant int (0)) )
> (emit-vertex (var_ref stream))
> 
> 
> and then after constant propagation,
> 
> (emit-vertex (constant int (0)) )
> 
> which you can then pick out in your later visitors just as easily as
> you can with the integer you're currently storing.
> 
> 
> On Fri, Jun 13, 2014 at 11:52 PM, Iago Toral  wrote:
> > On Fri, 2014-06-13 at 10:28 +0200, Iago Toral wrote:
> >> I forgot to add an important piece of info. I also had to add this in
> >> opt_dead_code.cpp, do_dead_code():
> >>
> >> if (strcmp(entry->var->name, "stream") == 0)
> >>    continue;
> >>
> >> Without that, the variable referenced by ir_emit_vertex() gets trashed
> >> anyway, whether the ralloc context in link_shaders is destroyed or not.
> >
> > The variable is killed because it is not used, as I was anticipating in
> > my patch, but I don't think the optimization passes are broken, I think
> > this is expected to happen:
> >
> > This is the code generated for EmitStreamVertex(0) after function
> > inlining:
> >
> > (declare () int stream)
> > (assign  (x) (var_ref stream)  (constant int (0)) )
> > (emit-vertex)
> >
> > (...)
> >
> > (function EmitStreamVertex
> >   (signature void
> > (parameters
> >   (declare (const_in ) int stream)
> > )
> > (
> >   (emit-vertex)
> > ))
> > )
> >
> > And this is after the dead code elimination passes (dead_code and
> > dead_code_local), first the assignment is removed:
> >
> > (declare () int stream)
> > (emit-vertex)
> >
> > And finally, in a second pass, it removes the declaration too, leaving:
> >
> > (emit-vertex)
> >
> > Seems to make sense: it is killing a variable that is, at this stage,
> > not used for anything. So, as I was saying in the original patch, I
> > think we can't do EmitStreamVertex(n) like any other built-in function
> > because we won't be using its input parameter in the body of the
> > function for anything, the variable's value is to be used in the visitor
> > when it is time to generate the native code and that happens after the
> > optimization passes, so we need to grab its constant value before the
> > optimization passes (as my original patch did) or we have to find a way
> > to tell the optimization passes that it should not touch this variable
> > specifically (and then we would still have to figure out how to access
> > that temporary variable holding the stream value from the visitor).
> >
> > Iago
> >
> >> Iago
> >>
> >> On Fri, 2014-06-13 at 10:09 +0200, Iago Toral wrote:
> >> > After debugging I have more information about what is going on. There
> >> > are two problems, one is that the stream variable in ir_emit_vertex gets
> >> > trashed and the other one is that even if we manage to avoid that it
> >> > won't get its value assigned. I explain how these two come to happen
> >> > below and maybe someone can point me to what I am doing wrong:
> >> >
> >> > first, this is how I am defining EmitStreamVertex():
> >> >
> >> > ir_function_signature *
> >> > builtin_builder::_EmitStreamVertex(builtin_available_predicate avail,
> >> >const glsl_type *stream_type)
> >> > {
> >> >ir_variable *stream =
> >> >   new(mem_ctx) ir_variable(stream_type, "stream", ir_var_const_in);
> >> >

Re: [Mesa-dev] [PATCH] glsl: Expand matrix flip optimization pass to cover more cases.

2014-06-17 Thread Iago Toral
Hi Ian,

On Mon, 2014-06-16 at 12:20 -0700, Ian Romanick wrote:
> On 04/21/2014 04:38 AM, Iago Toral Quiroga wrote:
> > Currently it only considers the cases of gl_ModelViewProjectionMatrix and
> > gl_TextureMatrix. The same optimization can be done also for
> > gl_ModelViewMatrix, gl_ProjectionMatrix and the corresponding inverses.
> 
> I've been looking at something similar recently.  I've been looking at
> flipping user-defined matrices.
> 
> Rather than adding a bunch of individual handles to flippable
> matrices, I think we should add a hash-table of possible names.  For the
> built-in uniforms, we would add any matrix uniform that doesn't have
> "Transpose" in the name.  Something like:
> 
> struct matrix_and_transpose {
> ir_variable *matrix;
> ir_variable *transpose_matrix;
> };
> 
> I think all of the names have regular enough patterns that we should be
> able to generate the non-transpose name from the transpose name.  Vice
> versa shouldn't be necessary.  If the transpose name is encountered
> first, add the structure with matrix set NULL and the name derived from
> the transpose name.

Right, I think this is better too.
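
A minimal sketch of the pairing scheme Ian describes, keyed by the non-transpose name. It assumes every transposed built-in ends in "Transpose"; `variable`, `base_matrix_name`, `matrix_table` and `register_matrix` are hypothetical names used only for illustration, and std::unordered_map stands in for Mesa's hash table:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for ir_variable: only the name matters here.
struct variable {
   std::string name;
};

struct matrix_and_transpose {
   const variable *matrix = nullptr;
   const variable *transpose_matrix = nullptr;
};

// Returns the non-transpose name for a "...Transpose" name, or the name
// itself when it does not end in "Transpose".
static std::string
base_matrix_name(const std::string &name)
{
   static const std::string suffix = "Transpose";
   if (name.size() > suffix.size() &&
       name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0)
      return name.substr(0, name.size() - suffix.size());
   return name;
}

using matrix_table = std::unordered_map<std::string, matrix_and_transpose>;

// Registers one variable under its non-transpose name. The entry is created
// on first sight of either half of the pair, so declaration order does not
// matter.
static void
register_matrix(matrix_table &table, const variable &var)
{
   const std::string key = base_matrix_name(var.name);
   matrix_and_transpose &entry = table[key];
   if (key == var.name)
      entry.matrix = &var;
   else
      entry.transpose_matrix = &var;
}
```

If the transpose is encountered first, the entry simply sits there with matrix left NULL until the plain matrix shows up, which is the behaviour suggested above.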

I understand that you are already working on this then? If not I can
give this a go as soon as I send the v2 of the series for the
multi-stream support (hopefully tomorrow).

Iago

> One other comment below.
> 
> > ---
> >  src/glsl/opt_flip_matrices.cpp | 118 
> > +++--
> >  1 file changed, 91 insertions(+), 27 deletions(-)
> > 
> > diff --git a/src/glsl/opt_flip_matrices.cpp b/src/glsl/opt_flip_matrices.cpp
> > index 9044fd6..bf09b78 100644
> > --- a/src/glsl/opt_flip_matrices.cpp
> > +++ b/src/glsl/opt_flip_matrices.cpp
> > @@ -29,8 +29,9 @@
> >   * On some hardware, this is more efficient.
> >   *
> >   * This currently only does the conversion for built-in matrices which
> > - * already have transposed equivalents.  Namely, 
> > gl_ModelViewProjectionMatrix
> > - * and gl_TextureMatrix.
> > + * already have transposed equivalents. Namely, gl_ModelViewMatrix,
> > + * gl_ProjectionMatrix, gl_ModelViewProjectionMatrix, gl_TextureMatrix and
> > + * their inverses.
> >   */
> >  #include "ir.h"
> >  #include "ir_optimization.h"
> > @@ -42,18 +43,39 @@ public:
> > matrix_flipper(exec_list *instructions)
> > {
> >progress = false;
> > +  p_transpose = NULL;
> > +  p_inv_transpose = NULL;
> > +  mv_transpose = NULL;
> > +  mv_inv_transpose = NULL;
> >mvp_transpose = NULL;
> > +  mvp_inv_transpose = NULL;
> >texmat_transpose = NULL;
> > +  texmat_inv_transpose = NULL;
> >  
> >foreach_list(n, instructions) {
> >   ir_instruction *ir = (ir_instruction *) n;
> >   ir_variable *var = ir->as_variable();
> >   if (!var)
> >  continue;
> > - if (strcmp(var->name, "gl_ModelViewProjectionMatrixTranspose") == 
> > 0)
> > +
> > + if (strcmp(var->name, "gl_ProjectionMatrixTranspose") == 0)
> > +p_transpose = var;
> > + else if (strcmp(var->name, "gl_ProjectionMatrixInverseTranspose") 
> > == 0)
> > +p_inv_transpose = var;
> > + else if (strcmp(var->name, "gl_ModelViewMatrixTranspose") == 0)
> > +mv_transpose = var;
> > + else if (strcmp(var->name, "gl_ModelViewMatrixInverseTranspose") 
> > == 0)
> > +mv_inv_transpose = var;
> > + else if (strcmp(var->name,
> > + "gl_ModelViewProjectionMatrixTranspose") == 0)
> >  mvp_transpose = var;
> > - if (strcmp(var->name, "gl_TextureMatrixTranspose") == 0)
> > + else if (strcmp(var->name,
> > + "gl_ModelViewProjectionMatrixInverseTranspose") 
> > == 0)
> > +mvp_inv_transpose = var;
> > + else if (strcmp(var->name, "gl_TextureMatrixTranspose") == 0)
> >  texmat_transpose = var;
> > + else if (strcmp(var->name, "gl_TextureMatrixInverseTranspose") == 
> > 0)
> > +texmat_inv_transpose = var;
> >}
> > }
> >  
> > @@ -62,11 +84,49 @@ public:
> > bool progress;
> >  
> >  private:
> > +   ir_variable *p_transpose;
> > +   ir_variable *p_inv_transpose;
>

Re: [Mesa-dev] [PATCH v2] glsl: Expand matrix flip optimization pass to cover more cases.

2014-06-18 Thread Iago Toral
On Wed, 2014-06-18 at 14:43 -0700, Ian Romanick wrote:
> On 06/18/2014 05:03 AM, Iago Toral Quiroga wrote:
> > Also, as suggested by Ian Romanick, make it so we don't need a bunch of
> > individual handles to flippable matrices, instead we register
> > matrix/transpose_matrix pairs in a hash table for all built-in matrices
> > using the non-transpose matrix name as key.
> > ---
> >  src/glsl/opt_flip_matrices.cpp | 145 
> > ++---
> >  1 file changed, 108 insertions(+), 37 deletions(-)
> > 
> > diff --git a/src/glsl/opt_flip_matrices.cpp b/src/glsl/opt_flip_matrices.cpp
> > index 9044fd6..d0b8125 100644
> > --- a/src/glsl/opt_flip_matrices.cpp
> > +++ b/src/glsl/opt_flip_matrices.cpp
> > @@ -29,44 +29,132 @@
> >   * On some hardware, this is more efficient.
> >   *
> >   * This currently only does the conversion for built-in matrices which
> > - * already have transposed equivalents.  Namely, 
> > gl_ModelViewProjectionMatrix
> > - * and gl_TextureMatrix.
> > + * already have transposed equivalents.
> >   */
> >  #include "ir.h"
> >  #include "ir_optimization.h"
> >  #include "main/macros.h"
> > +#include "program/hash_table.h"
> >  
> >  namespace {
> > +
> >  class matrix_flipper : public ir_hierarchical_visitor {
> >  public:
> > +   struct matrix_and_transpose {
> > +  ir_variable *matrix;
> > +  ir_variable *transpose_matrix;
> > +   };
> > +
> > matrix_flipper(exec_list *instructions)
> > {
> >progress = false;
> > -  mvp_transpose = NULL;
> > -  texmat_transpose = NULL;
> > +
> > +  /* Build a hash table of built-in matrices and their transposes.
> > +   *
> > +   * The key for the entries in the hash table is the non-transpose 
> > matrix
> > +   * name. This assumes that all built-in transpose matrices have the
> > +   * "Transpose" suffix.
> > +   */
> > +  ht = hash_table_ctor(0, hash_table_string_hash,
> > +   hash_table_string_compare);
> >  
> >foreach_list(n, instructions) {
> >   ir_instruction *ir = (ir_instruction *) n;
> >   ir_variable *var = ir->as_variable();
> > - if (!var)
> > +
> > + /* Must be a matrix */
> > + if (!var || !var->type->is_matrix())
> >  continue;
> 
> gl_TextureMatrix is an array of matrices, so var->type->is_matrix()
> will fail.  I think you want:
> 
>  if (!var)
> continue;
> 
>  /* Must be a matrix or array of matrices. */
>  if (!var->type->is_matrix() && 
>  !(var->type->is_array() && var->type->fields.array->is_matrix()))
> continue;

Oh, right.

> > - if (strcmp(var->name, "gl_ModelViewProjectionMatrixTranspose") == 
> > 0)
> > -mvp_transpose = var;
> > - if (strcmp(var->name, "gl_TextureMatrixTranspose") == 0)
> > -texmat_transpose = var;
> > + /* Must be a built-in */
> > + if (strstr(var->name, "gl_") != var->name)
> > +continue;
> 
> The name has to start with gl_, not just contain it.  Use
> is_gl_identifier(var->name) instead.

Actually, this does check that the name starts with it (note the comparison
against var->name), but I'll use is_gl_identifier.
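
For reference, a small standalone sketch of why the strstr() comparison is a prefix test, next to a strncmp() variant that gives the same answer. Both helper names are made up for this example, and is_gl_identifier itself is not reproduced here:

```cpp
#include <cassert>
#include <cstring>

// True when name begins with "gl_". strstr() returns a pointer to the first
// occurrence, so comparing the result against name itself tests for a match
// at offset 0. Without a match at the start it keeps scanning the string.
static bool
starts_with_gl_strstr(const char *name)
{
   return std::strstr(name, "gl_") == name;
}

// Same result, but only ever looks at the first three characters.
static bool
starts_with_gl_strncmp(const char *name)
{
   return std::strncmp(name, "gl_", 3) == 0;
}
```

The two agree on every input; the second form just makes the intent more obvious to a reader, which I guess is the point of using a dedicated helper.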

> > +
> > + /* Create a new entry for this matrix if we don't have one yet */
> > + bool new_entry = false;
> > + struct matrix_and_transpose *entry =
> > +(struct matrix_and_transpose *) hash_table_find(ht, var->name);
> > + if (!entry) {
> > +new_entry = true;
> > +entry = new struct matrix_and_transpose();
> > +entry->matrix = NULL;
> > +entry->transpose_matrix = NULL;
> > + }
> > +
> > + const char *transpose_ptr = strstr(var->name, "Transpose");
> > + if (transpose_ptr == NULL) {
> > +entry->matrix = var;
> > + } else {
> 
> It's probably worth adding an assertion in case a built-in is ever
> added with something after Transpose.  The probability is very, very
> low, but I'd rather be safe.

Sure, I will add that.

>assert(transpose_ptr[9] == 0);
> 
> > +entry->transpose_matrix = var;
> >

Re: [Mesa-dev] [PATCH v2 04/23] glsl: Assign GLSL StreamIds to transform feedback outputs.

2014-06-18 Thread Iago Toral
On Wed, 2014-06-18 at 12:56 -0700, Ian Romanick wrote:
> On 06/18/2014 02:51 AM, Iago Toral Quiroga wrote:
> > Inter-shader outputs must be on stream 0, which is the default.
> > ---
> >  src/glsl/link_varyings.cpp | 12 +---
> >  src/glsl/link_varyings.h   |  7 +++
> >  2 files changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
> > index f765d37..9725a43 100644
> > --- a/src/glsl/link_varyings.cpp
> > +++ b/src/glsl/link_varyings.cpp
> > @@ -291,6 +291,7 @@ tfeedback_decl::init(struct gl_context *ctx, const void 
> > *mem_ctx,
> > this->skip_components = 0;
> > this->next_buffer_separator = false;
> > this->matched_candidate = NULL;
> > +   this->stream_id = 0;
> >  
> > if (ctx->Extensions.ARB_transform_feedback3) {
> >/* Parse gl_NextBuffer. */
> > @@ -355,8 +356,8 @@ tfeedback_decl::is_same(const tfeedback_decl &x, const 
> > tfeedback_decl &y)
> >  
> >  
> >  /**
> > - * Assign a location for this tfeedback_decl object based on the transform
> > - * feedback candidate found by find_candidate.
> > + * Assign a location and stream ID for this tfeedback_decl object based on 
> > the
> > + * transform feedback candidate found by find_candidate.
> >   *
> >   * If an error occurs, the error is reported through linker_error() and 
> > false
> >   * is returned.
> > @@ -437,6 +438,11 @@ tfeedback_decl::assign_location(struct gl_context *ctx,
> >return false;
> > }
> >  
> > +   /* Only transform feedback varyings can be assigned to non-zero streams,
> > +* so assign the stream id here.
> > +*/
> > +   this->stream_id = this->matched_candidate->toplevel_var->data.stream;
> > +
> > return true;
> >  }
> >  
> > @@ -495,7 +501,7 @@ tfeedback_decl::store(struct gl_context *ctx, struct 
> > gl_shader_program *prog,
> >info->Outputs[info->NumOutputs].ComponentOffset = location_frac;
> >info->Outputs[info->NumOutputs].OutputRegister = location;
> >info->Outputs[info->NumOutputs].NumComponents = output_size;
> > -  info->Outputs[info->NumOutputs].StreamId = 0;
> > +  info->Outputs[info->NumOutputs].StreamId = stream_id;
> >info->Outputs[info->NumOutputs].OutputBuffer = buffer;
> >info->Outputs[info->NumOutputs].DstOffset = 
> > info->BufferStride[buffer];
> >++info->NumOutputs;
> > diff --git a/src/glsl/link_varyings.h b/src/glsl/link_varyings.h
> > index 6fa2681..4e485c3 100644
> > --- a/src/glsl/link_varyings.h
> > +++ b/src/glsl/link_varyings.h
> > @@ -210,6 +210,13 @@ private:
> >  * data structure that was found.  Otherwise NULL.
> >  */
> > const tfeedback_candidate *matched_candidate;
> > +
> > +   /**
> > +* StreamId assigned to this varying (defaults to 0). Can only be set to
> > +* values other than 0 in geometry shaders that use the stream layout
> > +* modifier. Accepted values must be in the range [0, 
> > MAX_VERTEX_STREAMS].
> 
> MAX_VERTEX_STREAMS-1, right?

Ooops, right. I'll fix that.
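
A tiny sketch of the corrected bound, i.e. valid ids are [0, MAX_VERTEX_STREAMS-1]. The value 4 below is only an assumption for the example and `is_valid_stream_id` is a hypothetical helper, not something from the patch:

```cpp
#include <cassert>

// Assumed value for illustration only; the real limit comes from the
// implementation.
static const unsigned MAX_VERTEX_STREAMS = 4;

// A stream id is valid when it lies in [0, MAX_VERTEX_STREAMS - 1], which
// for an unsigned id is a strict less-than test against the limit.
static bool
is_valid_stream_id(unsigned stream_id)
{
   return stream_id < MAX_VERTEX_STREAMS;
}
```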

> > +*/
> > +   unsigned stream_id;
> >  };
> >  
> >  
> > 
> 
> 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 12/23] glsl: Validate vertex emission in geometry shaders.

2014-06-18 Thread Iago Toral
On Wed, 2014-06-18 at 13:38 -0700, Ian Romanick wrote:
> On 06/18/2014 02:51 AM, Iago Toral Quiroga wrote:
> > Check if non-zero streams are used. Fail to link if emitting to unsupported
> > streams or emitting to non-zero streams with output type other than 
> > GL_POINTS.
> > ---
> >  src/glsl/linker.cpp | 148 
> > +++-
> >  1 file changed, 134 insertions(+), 14 deletions(-)
> > 
> > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> > index 0b6a716..f8ff138 100644
> > --- a/src/glsl/linker.cpp
> > +++ b/src/glsl/linker.cpp
> > @@ -250,31 +250,100 @@ public:
> > }
> >  };
> >  
> > -
> >  /**
> > - * Visitor that determines whether or not a shader uses ir_end_primitive.
> > + * Visitor that determines the highest stream id to which a (geometry) 
> > shader
> > + * emits vertices. It also checks whether End{Stream}Primitive is ever 
> > called.
> >   */
> > -class find_end_primitive_visitor : public ir_hierarchical_visitor {
> > +class find_emit_vertex_visitor : public ir_hierarchical_visitor {
> >  public:
> > -   find_end_primitive_visitor()
> > -  : found(false)
> > +   find_emit_vertex_visitor(int max_allowed)
> > +  : max_stream_allowed(max_allowed),
> > +invalid_stream_id(0),
> > +invalid_stream_id_from_emit_vertex(false),
> > +end_primitive_found(false),
> > +uses_non_zero_stream(false)
> > {
> >/* empty */
> > }
> >  
> > -   virtual ir_visitor_status visit(ir_end_primitive *)
> > +   virtual ir_visitor_status visit_leave(ir_emit_vertex *ir)
> > {
> > -  found = true;
> > -  return visit_stop;
> > +  int stream_id = ir->stream_id();
> > +
> > +  if (stream_id < 0) {
> > + invalid_stream_id = stream_id;
> > + invalid_stream_id_from_emit_vertex = true;
> > + return visit_stop;
> > +  }
> > +
> > +  if (stream_id > max_stream_allowed) {
> > + invalid_stream_id = stream_id;
> > + invalid_stream_id_from_emit_vertex = true;
> > + return visit_stop;
> > +  }
> > +
> > +  if (stream_id != 0)
> > + uses_non_zero_stream = true;
> > +
> > +  return visit_continue;
> > }
> >  
> > -   bool end_primitive_found()
> > +   virtual ir_visitor_status visit_leave(ir_end_primitive *ir)
> > {
> > -  return found;
> > +  end_primitive_found = true;
> > +
> > +  int stream_id = ir->stream_id();
> > +
> > +  if (stream_id < 0) {
> > + invalid_stream_id = stream_id;
> > + invalid_stream_id_from_emit_vertex = false;
> > + return visit_stop;
> > +  }
> > +
> > +  if (stream_id > max_stream_allowed) {
> > + invalid_stream_id = stream_id;
> > + invalid_stream_id_from_emit_vertex = false;
> > + return visit_stop;
> > +  }
> > +
> > +  if (stream_id != 0)
> > + uses_non_zero_stream = true;
> > +
> > +  return visit_continue;
> > +   }
> > +
> > +   bool error()
> > +   {
> > +  return invalid_stream_id != 0;
> > +   }
> > +
> > +   const char *error_func()
> > +   {
> > +  return invalid_stream_id_from_emit_vertex ?
> > + "EmitStreamVertex" : "EndStreamPrimitive";
> > +   }
> > +
> > +   int error_stream()
> > +   {
> > +  return invalid_stream_id;
> > +   }
> > +
> > +   bool uses_streams()
> > +   {
> > +  return uses_non_zero_stream;
> > +   }
> > +
> > +   bool uses_end_primitive()
> > +   {
> > +  return end_primitive_found;
> > }
> >  
> >  private:
> > -   bool found;
> > +   int max_stream_allowed;
> > +   int invalid_stream_id;
> > +   bool invalid_stream_id_from_emit_vertex;
> > +   bool end_primitive_found;
> > +   bool uses_non_zero_stream;
> >  };
> >  
> >  } /* anonymous namespace */
> > @@ -551,10 +620,58 @@ validate_geometry_shader_executable(struct 
> > gl_shader_program *prog,
> >  
> > analyze_clip_usage(prog, shader, &prog->Geom.UsesClipDistance,
> >&prog->Geom.ClipDistanceArraySize);
> > +}
> > +
> > +/**
> > + * Check if geometry shaders emit to non-zero streams and do corresponding
> >

Re: [Mesa-dev] [PATCH v2 22/23] mesa: Init Geom.UsesEndPrimitive in shader programs.

2014-06-18 Thread Iago Toral
On Wed, 2014-06-18 at 13:48 -0700, Ian Romanick wrote:
> I think this patch and 23 should be moved first in the series... they can
> certainly land before the other patches in the series.

Right. I'll push these two tomorrow if nobody says otherwise today.

Since you marked patch 23 for stable: do I need to do anything other
than adding the CC tag in the commit message? Should I also send the
patch to mesa-sta...@lists.freedesktop.org?

Iago

> This patch is
> 
> Reviewed-by: Ian Romanick 
> 
> On 06/18/2014 02:51 AM, Iago Toral Quiroga wrote:
> > ---
> >  src/mesa/main/shaderobj.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/src/mesa/main/shaderobj.c b/src/mesa/main/shaderobj.c
> > index 03db862..b3d428c 100644
> > --- a/src/mesa/main/shaderobj.c
> > +++ b/src/mesa/main/shaderobj.c
> > @@ -248,6 +248,7 @@ _mesa_init_shader_program(struct gl_context *ctx, 
> > struct gl_shader_program *prog
> > prog->Geom.VerticesOut = 0;
> > prog->Geom.InputType = GL_TRIANGLES;
> > prog->Geom.OutputType = GL_TRIANGLE_STRIP;
> > +   prog->Geom.UsesEndPrimitive = GL_FALSE;
> > prog->Geom.UsesStreams = GL_FALSE;
> >  
> > prog->TransformFeedback.BufferMode = GL_INTERLEAVED_ATTRIBS;
> > 
> 
> 




Re: [Mesa-dev] [PATCH v2 09/23] glsl: Store info about geometry shaders that emit vertices to non-zero streams.

2014-06-19 Thread Iago Toral
On Wed, 2014-06-18 at 13:05 -0700, Ian Romanick wrote:
> On 06/18/2014 02:51 AM, Iago Toral Quiroga wrote:
> > On Intel hardware when a geometry shader outputs GL_POINTS primitives we
> > only need to emit vertex control bits if it emits vertices to non-zero
> > streams, so use a flag to track this.
> > 
> > This flag will be set to TRUE when a geometry shader calls 
> > EmitStreamVertex()
> > or EndStreamPrimitive() with a non-zero stream parameter in a later patch.
> > ---
> >  src/mesa/main/mtypes.h | 2 ++
> >  src/mesa/main/shaderapi.c  | 1 +
> >  src/mesa/main/shaderobj.c  | 1 +
> >  src/mesa/program/program.c | 1 +
> >  4 files changed, 5 insertions(+)
> > 
> > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> > index f45dde4..5bc710e 100644
> > --- a/src/mesa/main/mtypes.h
> > +++ b/src/mesa/main/mtypes.h
> > @@ -2178,6 +2178,7 @@ struct gl_geometry_program
> > GL_TRIANGLES, or GL_TRIANGLES_ADJACENCY_ARB */
> > GLenum OutputType; /**< GL_POINTS, GL_LINE_STRIP or GL_TRIANGLE_STRIP */
> > GLboolean UsesEndPrimitive;
> > +   GLboolean UsesStreams;
> 
> For things that are not visible to the GL API, we've been trying to
> transition away from GL types.  Unless Brian or Chris object, I think
> I'd rather have this be bool (and true / false below).

If Brian and Chris are okay with this change then I'll also change
UsesEndPrimitive to be bool.

> >  };
> >  
> >  
> > @@ -2681,6 +2682,7 @@ struct gl_shader_program
> >GLuint ClipDistanceArraySize; /**< Size of the gl_ClipDistance 
> > array, or
> >   0 if not present. */
> >GLboolean UsesEndPrimitive;
> > +  GLboolean UsesStreams;
> > } Geom;
> >  
> > /** Vertex shader state */
> > diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
> > index 2ec2444..2bbef35 100644
> > --- a/src/mesa/main/shaderapi.c
> > +++ b/src/mesa/main/shaderapi.c
> > @@ -1888,6 +1888,7 @@ _mesa_copy_linked_program_data(gl_shader_stage type,
> >dst_gp->OutputType = src->Geom.OutputType;
> >dst->UsesClipDistanceOut = src->Geom.UsesClipDistance;
> >dst_gp->UsesEndPrimitive = src->Geom.UsesEndPrimitive;
> > +  dst_gp->UsesStreams = src->Geom.UsesStreams;
> > }
> >break;
> > case MESA_SHADER_FRAGMENT: {
> > diff --git a/src/mesa/main/shaderobj.c b/src/mesa/main/shaderobj.c
> > index b0f0bfa..03db862 100644
> > --- a/src/mesa/main/shaderobj.c
> > +++ b/src/mesa/main/shaderobj.c
> > @@ -248,6 +248,7 @@ _mesa_init_shader_program(struct gl_context *ctx, 
> > struct gl_shader_program *prog
> > prog->Geom.VerticesOut = 0;
> > prog->Geom.InputType = GL_TRIANGLES;
> > prog->Geom.OutputType = GL_TRIANGLE_STRIP;
> > +   prog->Geom.UsesStreams = GL_FALSE;
> >  
> > prog->TransformFeedback.BufferMode = GL_INTERLEAVED_ATTRIBS;
> >  
> > diff --git a/src/mesa/program/program.c b/src/mesa/program/program.c
> > index b7332fc..1263cea 100644
> > --- a/src/mesa/program/program.c
> > +++ b/src/mesa/program/program.c
> > @@ -552,6 +552,7 @@ _mesa_clone_program(struct gl_context *ctx, const 
> > struct gl_program *prog)
> >   gpc->InputType = gp->InputType;
> >   gpc->Invocations = gp->Invocations;
> >   gpc->OutputType = gp->OutputType;
> > + gpc->UsesStreams = gp->UsesStreams;
> >}
> >break;
> > default:
> > 
> 
> 




Re: [Mesa-dev] [PATCH v2 11/23] glsl: Add support for EmitStreamVertex() and EndStreamPrimitive().

2014-06-19 Thread Iago Toral
On Wed, 2014-06-18 at 13:31 -0700, Ian Romanick wrote:
> This patch should be split into several patches:
> 
> 1. Modify ir_emit_vertex to have a stream.  This patch also needs to update
> ir_to_mesa.cpp and st_glsl_to_tgsi.cpp.
> 
> 2. Modify ir_end_primitive to have a stream.  This patch also needs to update
> ir_to_mesa.cpp and st_glsl_to_tgsi.cpp.
> 
> 3. Add the new built-in functions.

Ok, I will rework the patch accordingly.

> A couple other minor comments below...
(...)
> > +ir_function_signature *
> > +builtin_builder::_EmitStreamVertex(builtin_available_predicate avail,
> > +   const glsl_type *stream_type)
> > +{
> 
> Please add a spec quotation for this.  I had to go look it up to be sure
> ir_var_const_in was correct.
> 
>/* Section 8.12 (Geometry Shader Functions) of the OpenGL 4.0 spec says:
> *
> * "Completes the current output primitive on stream stream and starts
> * a new one. The argument to stream must be a constant integral
> * expression."
> */

I think you meant the GLSL 4.0 spec here.

> > +   ir_variable *stream =
> > +  new(mem_ctx) ir_variable(stream_type, "stream", ir_var_const_in);
> > +
> > +   MAKE_SIG(glsl_type::void_type, avail, 1, stream);
> > +
> > +   body.emit(new(mem_ctx) ir_emit_vertex(var_ref(stream)));
> >  
> > return sig;
> >  }
> > @@ -3882,7 +3915,22 @@ builtin_builder::_EndPrimitive()
> >  {
> > MAKE_SIG(glsl_type::void_type, gs_only, 0);
> >  
> > -   body.emit(new(mem_ctx) ir_end_primitive());
> > +   ir_rvalue *stream = new(mem_ctx) ir_constant(0, 1);
> > +   body.emit(new(mem_ctx) ir_end_primitive(stream));
> > +
> > +   return sig;
> > +}

(...)

Iago



Re: [Mesa-dev] [PATCH v2 01/23] glsl: Add parsing support for multi-stream output in geometry shaders.

2014-06-19 Thread Iago Toral
On Wed, 2014-06-18 at 11:16 -0700, Ian Romanick wrote:
(...)
> 
> Please replace this comment with the following spec quotation:
> 
>  /* Section 4.3.8.2 (Output Layout Qualifiers) of the OpenGL 4.00

I think you meant the GLSL spec.

>   * spec says:
>   *
>   * "If the block or variable is declared with the stream
>   * identifier, it is associated with the specified stream;
>   * otherwise, it is associated with the current default stream."
>   */
> 
> > +  $$.flags.q.stream = 1;
> > +  $$.flags.q.explicit_stream = 0;
> > +  $$.stream = state->out_qualifier->stream;
> > +  }

Iago



Re: [Mesa-dev] [PATCH v2 11/23] glsl: Add support for EmitStreamVertex() and EndStreamPrimitive().

2014-06-19 Thread Iago Toral
After having a quick look at ir_to_mesa.cpp and st_glsl_to_tgsi.cpp I
have some comments and questions about this:

On Wed, 2014-06-18 at 13:31 -0700, Ian Romanick wrote:
> This patch should be split into several patches:
> 
> 1. Modify ir_emit_vertex to have a stream.  This patch also needs to update
> ir_to_mesa.cpp and st_glsl_to_tgsi.cpp.

ir_to_mesa.cpp does not currently implement support for emit_vertex and
end_primitive at all:

void
ir_to_mesa_visitor::visit(ir_emit_vertex *)
{
   assert(!"Geometry shaders not supported.");
}

void
ir_to_mesa_visitor::visit(ir_end_primitive *)
{
   assert(!"Geometry shaders not supported.");
}

so doing this, as far as I can see, involves defining opcodes for
emit_vertex and end_primitive and then handling these opcodes properly in
other places of the code. I am not yet
familiar with these parts of the code base, so I guess I'll need some
time to figure out how to do this right. Since ir_to_mesa.cpp does not
currently support ir_emit_vertex and ir_end_primitive at all, would
it be okay if we do this in a later patch after this series has been
reviewed and merged?

Also, how can I debug this part of the code? Which drivers are using
this?

Regarding st_glsl_to_tgsi.cpp, I think I can use some guidance since I
am not familiar with gallium. Seeing what is being done for other IR
types, I think I need to do something like:

st_src_reg stream_src = ir->stream->accept(this);

and then emit the TGSI_OPCODE_EMIT passing stream_src to it.

However, I see that glsl_to_tgsi_visitor::emit() takes either no src/dst
arguments or at least one dst/src pair, so I am
not sure what should be done here... What do I have to use for the dst
parameter? Should I just pass undef_dst? Something else?

> 2. Modify ir_end_primitive to have a stream.  This patch also needs to update
> ir_to_mesa.cpp and st_glsl_to_tgsi.cpp.
> 
> 3. Add the new built-in functions.
> 
> A couple other minor comments below...
> 
> On 06/18/2014 02:51 AM, Iago Toral Quiroga wrote:
> > ---
> >  src/glsl/builtin_functions.cpp   | 52 
> > ++--
> >  src/glsl/ir.h| 34 +--
> >  src/glsl/ir_hierarchical_visitor.cpp | 50 
> > +-
> >  src/glsl/ir_hierarchical_visitor.h   |  6 +++--
> >  src/glsl/ir_hv_accept.cpp| 21 ---
> >  src/glsl/ir_rvalue_visitor.cpp   | 37 +
> >  src/glsl/ir_rvalue_visitor.h |  6 +
> >  src/glsl/lower_output_reads.cpp  |  4 +--
> >  src/glsl/lower_packed_varyings.cpp   |  4 +--
> >  src/glsl/opt_dead_code_local.cpp |  2 +-
> >  10 files changed, 178 insertions(+), 38 deletions(-)
> > 
> > diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
> > index f9f0686..07a0722 100644
> > --- a/src/glsl/builtin_functions.cpp
> > +++ b/src/glsl/builtin_functions.cpp
> > @@ -359,6 +359,12 @@ shader_image_load_store(const _mesa_glsl_parse_state 
> > *state)
> > state->ARB_shader_image_load_store_enable);
> >  }
> >  
> > +static bool
> > +gs_streams(const _mesa_glsl_parse_state *state)
> > +{
> > +   return gpu_shader5(state) && gs_only(state);
> > +}
> > +
> >  /** @} */
> >  
> >  
> > /**/
> > @@ -594,6 +600,10 @@ private:
> >  
> > B0(EmitVertex)
> > B0(EndPrimitive)
> > +   ir_function_signature *_EmitStreamVertex(builtin_available_predicate 
> > avail,
> > +const glsl_type *stream_type);
> > +   ir_function_signature *_EndStreamPrimitive(builtin_available_predicate 
> > avail,
> > +  const glsl_type 
> > *stream_type);
> >  
> > B2(textureQueryLod);
> > B1(textureQueryLevels);
> > @@ -1708,6 +1718,14 @@ builtin_builder::create_builtins()
> >  
> > add_function("EmitVertex",   _EmitVertex(),   NULL);
> > add_function("EndPrimitive", _EndPrimitive(), NULL);
> > +   add_function("EmitStreamVertex",
> > +_EmitStreamVertex(gs_streams, glsl_type::uint_type),
> > +_EmitStreamVertex(gs_streams, glsl_type::int_type),
> > +NULL);
> > +   add_function("EndStreamPrimitive",
> > +_EndStreamPrimitive(gs_streams, glsl_type::uint_type),
> > +_EndStreamPrimitive(gs_streams, glsl_type::int_type),
> > 

Re: [Mesa-dev] [PATCH v2 11/23] glsl: Add support for EmitStreamVertex() and EndStreamPrimitive().

2014-06-19 Thread Iago Toral
Samuel has reminded me that Ilia has already made some changes to
Gallium in order to check a piglit test for multi-stream support there,
so I am adding him to the CC.

On Thu, 2014-06-19 at 12:37 +0200, Iago Toral wrote:
> After having a quick look at ir_to_mesa.cpp and st_glsl_to_tgsi.cpp I
> have some comments and questions about this:
> 
> On Wed, 2014-06-18 at 13:31 -0700, Ian Romanick wrote:
> > This patch should be split into several patches:
> > 
> > 1. Modify ir_emit_vertex to have a stream.  This patch also needs to update
> > ir_to_mesa.cpp and st_glsl_to_tgsi.cpp.
(...)
> Regarding st_glsl_to_tgsi.cpp, I think I can use some guidance since I
> am not familiar with gallium. Seeing what is being done for other IR
> types, I think I need to do something like:
> 
> st_src_reg stream_src = ir->stream->accept(this);
> 
> and then emit the TGSI_OPCODE_EMIT passing stream_src to it.
> 
> However, I see that glsl_to_tgsi_visitor::emit() takes either no src/dst
> arguments or at least one dst/src pair, so I am
> not sure what should be done here... What do I have to use for the dst
> parameter? Should I just pass undef_dst? Something else?
> 
> > 2. Modify ir_end_primitive to have a stream.  This patch also needs to 
> > update
> > ir_to_mesa.cpp and st_glsl_to_tgsi.cpp.
> > 
> > 3. Add the new built-in functions.
> > 
> > A couple other minor comments below...
> > 
> > On 06/18/2014 02:51 AM, Iago Toral Quiroga wrote:
> > > ---
> > >  src/glsl/builtin_functions.cpp   | 52 
> > > ++--
> > >  src/glsl/ir.h| 34 +--
> > >  src/glsl/ir_hierarchical_visitor.cpp | 50 
> > > +-
> > >  src/glsl/ir_hierarchical_visitor.h   |  6 +++--
> > >  src/glsl/ir_hv_accept.cpp| 21 ---
> > >  src/glsl/ir_rvalue_visitor.cpp   | 37 +
> > >  src/glsl/ir_rvalue_visitor.h |  6 +
> > >  src/glsl/lower_output_reads.cpp  |  4 +--
> > >  src/glsl/lower_packed_varyings.cpp   |  4 +--
> > >  src/glsl/opt_dead_code_local.cpp |  2 +-
> > >  10 files changed, 178 insertions(+), 38 deletions(-)
> > > 
> > > diff --git a/src/glsl/builtin_functions.cpp 
> > > b/src/glsl/builtin_functions.cpp
> > > index f9f0686..07a0722 100644
> > > --- a/src/glsl/builtin_functions.cpp
> > > +++ b/src/glsl/builtin_functions.cpp
> > > @@ -359,6 +359,12 @@ shader_image_load_store(const _mesa_glsl_parse_state 
> > > *state)
> > > state->ARB_shader_image_load_store_enable);
> > >  }
> > >  
> > > +static bool
> > > +gs_streams(const _mesa_glsl_parse_state *state)
> > > +{
> > > +   return gpu_shader5(state) && gs_only(state);
> > > +}
> > > +
> > >  /** @} */
> > >  
> > >  
> > > /**/
> > > @@ -594,6 +600,10 @@ private:
> > >  
> > > B0(EmitVertex)
> > > B0(EndPrimitive)
> > > +   ir_function_signature *_EmitStreamVertex(builtin_available_predicate 
> > > avail,
> > > +const glsl_type 
> > > *stream_type);
> > > +   ir_function_signature 
> > > *_EndStreamPrimitive(builtin_available_predicate avail,
> > > +  const glsl_type 
> > > *stream_type);
> > >  
> > > B2(textureQueryLod);
> > > B1(textureQueryLevels);
> > > @@ -1708,6 +1718,14 @@ builtin_builder::create_builtins()
> > >  
> > > add_function("EmitVertex",   _EmitVertex(),   NULL);
> > > add_function("EndPrimitive", _EndPrimitive(), NULL);
> > > +   add_function("EmitStreamVertex",
> > > +_EmitStreamVertex(gs_streams, glsl_type::uint_type),
> > > +_EmitStreamVertex(gs_streams, glsl_type::int_type),
> > > +NULL);
> > > +   add_function("EndStreamPrimitive",
> > > +_EndStreamPrimitive(gs_streams, glsl_type::uint_type),
> > > +_EndStreamPrimitive(gs_streams, glsl_type::int_type),
> > > +NULL);
> > >  
> > > add_function("textureQueryLOD",
> > >  _textureQueryLod(glsl_type::sampler1D_type,  
> > > glsl_type::float_type),
> > > >

Re: [Mesa-dev] [PATCH v2 11/23] glsl: Add support for EmitStreamVertex() and EndStreamPrimitive().

2014-06-19 Thread Iago Toral
On Thu, 2014-06-19 at 15:18 -0700, Ian Romanick wrote:
> On 06/19/2014 03:37 AM, Iago Toral wrote:
> > After having a quick look at ir_to_mesa.cpp and st_glsl_to_tgsi.cpp I
> > have some comments and questions about this:
> > 
> > On Wed, 2014-06-18 at 13:31 -0700, Ian Romanick wrote:
> >> This patch should be split into several patches:
> >>
> >> 1. Modify ir_emit_vertex to have a stream.  This patch also needs to update
> >> ir_to_mesa.cpp and st_glsl_to_tgsi.cpp.
> > 
> > ir_to_mesa.cpp does not currently implement support for emit_vertex and
> > end_primitive at all:
> > 
> > void
> > ir_to_mesa_visitor::visit(ir_emit_vertex *)
> > {
> >assert(!"Geometry shaders not supported.");
> > }
> > 
> > void
> > ir_to_mesa_visitor::visit(ir_end_primitive *)
> > {
> >assert(!"Geometry shaders not supported.");
> > }
> 
> But the methods with these signatures have been deleted from
> ir_hierarchical_visitor.  They (and the st_glsl_to_tgsi counterparts)
> need to be updated to be either visit_enter or visit_leave.  That's all. :)

Actually, both ir_to_mesa_visitor and glsl_to_tgsi_visitor inherit from
ir_visitor which is a separate class hierarchy from
ir_hierarchical_visitor and only has visit methods so it is unaffected
by my patch which only alters hierarchical visitors... so I think this
is okay as it is, right?

FWIW, both files are still building fine with my patch and I can
actually run geometry shaders that use EmitVertex() with
LIBGL_ALWAYS_SOFTWARE=1, which goes through the st_glsl_to_tgsi.cpp
path.
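
The point about the two visitor hierarchies can be illustrated with a minimal sketch. The classes below are simplified stand-ins, not Mesa's real definitions (the real visit methods do not return strings, and backend_visitor and run_backend are hypothetical names). They only show why changing the hierarchical visitor's interface leaves a class derived from ir_visitor untouched:

```cpp
#include <string>

// Simplified stand-in for the IR node; not Mesa's real definition.
struct ir_emit_vertex { int stream; };

// Plain visitor hierarchy: a single visit() per node type.
struct ir_visitor {
   virtual ~ir_visitor() {}
   virtual std::string visit(ir_emit_vertex *ir) = 0;
};

// Hierarchical visitor: a separate base class with enter/leave hooks,
// which is what a node needs once it has children (the stream operand).
struct ir_hierarchical_visitor {
   virtual ~ir_hierarchical_visitor() {}
   virtual std::string visit_enter(ir_emit_vertex *) { return "enter"; }
   virtual std::string visit_leave(ir_emit_vertex *) { return "leave"; }
};

// Hypothetical backend visitor: it derives from ir_visitor only, so a
// change to ir_hierarchical_visitor's interface does not affect it.
struct backend_visitor : ir_visitor {
   std::string visit(ir_emit_vertex *ir) override {
      return "emit_vertex(stream=" + std::to_string(ir->stream) + ")";
   }
};

// Hypothetical helper that runs the backend visitor on one node.
std::string run_backend(int stream)
{
   ir_emit_vertex ir = { stream };
   backend_visitor v;
   return v.visit(&ir);
}
```

Here run_backend(0) keeps compiling and working no matter how the enter/leave methods of the other hierarchy are renamed.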

Iago

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 12/23] glsl: Validate vertex emission in geometry shaders.

2014-06-19 Thread Iago Toral
On Wed, 2014-06-18 at 13:38 -0700, Ian Romanick wrote:
> On 06/18/2014 02:51 AM, Iago Toral Quiroga wrote:
(...)
> > +  /* From the ARB_gpu_shader5 spec:
> > +   *
> > +   *   "Multiple vertex streams are supported only if the output 
> > primitive
> > +   *type is declared to be "points".  A program will fail to link 
> > if it
> > +   *contains a geometry shader calling EmitStreamVertex() or
> > +   *EndStreamPrimitive() if its output primitive type is not 
> > "points".
> > +   *
> > +   * However, in the same spec:
> > +   *
> > +   *   "The function EmitVertex() is equivalent to calling
> > +   *EmitStreamVertex() with <stream> set to zero."
> > +   *
> > +   * And:
> > +   *
> > +   *   "The function EndPrimitive() is equivalent to calling
> > +   *EndStreamPrimitive() with <stream> set to zero."
> > +   *
> > +   * Since we can call EmitVertex() and EndPrimitive() when we output
> > +   * primitives other than points, calling EmitStreamVertex(0) or
> > +   * EndStreamPrimitive(0) should not produce errors. This is also what
> > +   * Nvidia does. Currently we only set prog->Geom.UsesStreams to TRUE when
> > +   * EmitStreamVertex() or EndStreamPrimitive() are called with a non-zero
> > +   * stream.
> 
> Does AMD also behave this way?  If so, I can submit a spec bug to make
> these explicitly legal.  Otherwise, I think we should not allow it.  We
> probably ought to check Apple and the Intel Windows driver too...

I could test this on the proprietary AMD Linux driver:

A geometry shader with output triangle_strip using EmitStreamVertex(0)
and EndStreamPrimitive(0) works fine: no link errors, correct output.

A geometry shader with output triangle_strip using EmitStreamVertex(1)
does not produce a link error (which is wrong by the spec) and produces
incorrect output (blank screen).

A geometry shader with output triangle_strip using EmitStreamVertex(0)
that also calls EndStreamPrimitive(1) does not produce a link error
(which is wrong by the spec) and produces correct output (the end stream
primitive on stream 1 is really a no-op since there are no vertices
emitted on that stream).

So I'd say that AMD is not respecting the spec too much here, but it
looks like using EmitStreamVertex(0) or EndStreamPrimitive(0) with
output types other than points works on both Nvidia and AMD proprietary
drivers in the end.
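
The rule the patch comment describes (only calls with a non-zero stream count as "using streams", and using streams is a link error unless the output primitive type is points) can be sketched roughly as follows. The enum and function names are hypothetical and the list of stream arguments is a simplification; this is not the actual linker code:

```cpp
#include <vector>

// Hypothetical output primitive types for the sketch.
enum output_primitive { OUTPUT_POINTS, OUTPUT_LINE_STRIP, OUTPUT_TRIANGLE_STRIP };

// True if any EmitStreamVertex()/EndStreamPrimitive() call targets a
// non-zero stream. Calls with stream 0 are treated like EmitVertex() and
// EndPrimitive(), so they do not count.
bool uses_streams(const std::vector<int> &stream_args)
{
   for (int s : stream_args)
      if (s != 0)
         return true;
   return false;
}

// Linking succeeds if the shader outputs points, or if it never emits to
// a non-zero stream.
bool link_ok(output_primitive prim, const std::vector<int> &stream_args)
{
   return prim == OUTPUT_POINTS || !uses_streams(stream_args);
}
```

Under this sketch a triangle_strip shader that only calls EmitStreamVertex(0) links, while one that calls EmitStreamVertex(1) does not.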

So Ian, what do you think we should do?

Iago







Re: [Mesa-dev] [PATCH v2 00/23] Multi-stream support for geometry shaders

2014-06-24 Thread Iago Toral

Hi,

this is a summary of the review process for the multi-stream support
series. Most of the patches got a Reviewed-by from either Chris or Ian and,
if I am not mistaken, only 4 patches still need one, so hopefully we get
these 4 reviewed soon :). Here is the status of each of these patches:


Patch 11: glsl: Add support for EmitStreamVertex() and 
EndStreamPrimitive().
- Ian requested to split this in 3 patches which have already been sent 
to the mailing list as a reply.

  - glsl: Modify ir_emit_vertex to have a stream.
  - glsl: Modify ir_end_primitive to have a stream.
  - Add support for EmitStreamVertex() and EndStreamPrimitive().
- There was some discussion on whether ir_to_mesa.cpp and 
st_glsl_to_tgsi.cpp had to be updated too. Based on what was discussed I 
concluded that it was not necessary in the end.


Patch 12: glsl: Validate vertex emission in geometry shaders.
- We need to decide if we allow EmitStreamVertex(0) with outputs other 
than points. Since Nvidia allows this Ian asked if AMD allowed this too 
in order to decide if a bug to the spec should be filed. I confirmed 
that AMD allows this too. Maybe we want to test other systems before 
making a decision anyway.
- I can also change the patch to disallow this behavior until we decide 
otherwise if that is preferred.


Patch 18: i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams.
- Needs review

Patch 21: docs: mark "Geometry shader multiple streams" as done for i965
- Needs review

Iago


On 2014-06-18 11:51, Iago Toral Quiroga wrote:
This series addresses review comments of the previous version and adds a
few things. Summary of the changes in version 2:

Patch 1:
  - Ian: multiple layout(location=) qualifiers
+ This was not a problem in the end.
  - Chris: s/explicitely/explicitly/
  - Chris: Remove 'Id' suffix in variables.
  - Chris: !this->flags.q.streamId for consistency
  - Chris: checks should pass if state->is_version(400, 0)
  - Chris: Check against the real limit and not MAX_VERTEX_STREAMS
  + Plus fixed some other issues we found

Patch 2: already reviewed by Chris, unchanged.

Patch 3:
  - Chris: Replace magic 4's with appropriate constant
+ Also added necessary memset(so_decl, 0, sizeof(so_decl));

Patch 4: already reviewed by Chris, unchanged.

Patch 5: [DONE]
  - Chris: Not right for separable programs.
+ Original patch was okay after discussion in the mailing list, but
  I modified it to make the code more clear.

Patch 6: already reviewed by Chris, unchanged.

Patch 7: already reviewed by Chris, unchanged.

Patch 8: already reviewed by Chris, unchanged.

Patch 9:
  - Chris: Does not work with multiple geometry shaders
+ Now we only store info in the program and not individual shaders and
  we set this value after linking all the shaders together (see patch 12).


Patch 10: already reviewed by Chris, unchanged.

Patch 11:
  - Chris: EmitStreamVertex() and EndStreamPrimitive() should be normal
    builtins
+ Reimplemented this following Chris' recommendations in the mailing list.


Patch 12: already reviewed by Chris, but now expanded:
  + Includes original patch 12 reviewed-by Chris.
  + Sets prog->Geom.UsesStreams (this was being done in patch 11 in the
    original series).
  + Link error if emitting to wrong streams (this was being done in patch 11
    in the original series)

Patch 13: already reviewed by Chris, unchanged.

Patch 14: already reviewed by Chris, but modified to adapt to changes in the
implementation of ir_emit_vertex and ir_end_primitive from patch 11.

Patch 15: already reviewed by Chris, unchanged.

Patch 16:
  - Chris: Do not set Ctx->Const.MaxVertexStreams to MAX_VERTEX_STREAMS
    globally.
+ Now this is done in the i965 driver only (see patch 20).

Patch 17: already reviewed by Chris, unchanged.

Patch 18: NEW
  + Adds multi-stream querying for GL_PRIMITIVES_GENERATED

Patch 19 (was patch 18 in original series)
  - Chris: s/stream/index
  + Plus added the same logic for GL_PRIMITIVES_GENERATED

Patch 20: NEW
  + Enables up to MAX_VERTEX_STREAMS in i965.

Patch 21: NEW
  + Mark multiple streams done for i965.

Patch 22: NEW
Patch 23: NEW
  + Fixed incorrect initialization and cloning of UsesEndPrimitive flag.


Iago Toral Quiroga (20):
  mesa: add StreamId information to transform feedback outputs.
  i965: Enable transform feedback for streams > 0
  glsl: Assign GLSL StreamIds to transform feedback outputs.
  glsl: Fail to link if inter-stage input/outputs are not assigned to
stream 0
  glsl: Add methods to retrive a varying's name and streamId.
  glsl: Two varyings can't write to the same buffer from different
streams.
  glsl: Only geometry shader outputs can be associated with non-zero
streams.
  glsl: Store info about geometry shaders that emit vertices to non-zero
streams.
  i965/gs: Set number of control data bits for stream mode.
  glsl: Add support 

Re: [Mesa-dev] [PATCH v2 18/23] i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams.

2014-06-25 Thread Iago Toral
Hi Chris,

On Tue, 2014-06-24 at 22:12 +1200, Chris Forbes wrote:
> Continuing from this, I think you need a test to ensure that the
> points emitted in nonzero streams are not accidentally rendered when
> SO isn't active.
> 
> According to that same section of the Haswell PRM, if SO Function
> Enable is 0, it appears the StreamID bits are ignored entirely, rather
> than filtered by dw1.27-28 Render Stream Select.

Right... and this behavior seems to be specific to Haswell. IvyBridge
says that Render Stream Select is used even if SO Function Enable is 0,
so it should not be a problem there.

Since your suggestion to implement GL_PRIMITIVES_GENERATED (which I
already confirmed works well on Ivy) means that we will always enable SO
the above shouldn't really ever happen but adding that piglit test to
avoid regressions if we change that configuration in the future makes
sense. We will add that test to the batch we sent for review to piglit.
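
For reference, the 3DSTATE_STREAMOUT configuration Chris describes in the quoted message below (SOL enabled with statistics on and no buffers enabled) can be sketched as a small bit-packing helper. The bit positions are the ones given in this thread (dw1.31, dw1.25, dw1.27-28, dw1.8-11); the constant and function names are hypothetical and this is not the driver's actual code:

```cpp
#include <cstdint>

// Bit positions as quoted in this thread; names are illustrative only.
const uint32_t SO_FUNCTION_ENABLE         = 1u << 31;  // dw1.31
const uint32_t SO_STATISTICS_ENABLE       = 1u << 25;  // dw1.25
const unsigned RENDER_STREAM_SELECT_SHIFT = 27;        // dw1.27-28
const unsigned SO_BUFFER_ENABLE_SHIFT     = 8;         // dw1.8-11

// Build dw1 with SOL and its statistics counters always enabled.
// Passing buffer_enable_mask = 0 gives the "counting only" setup: nothing
// is written, but the per-stream statistics counters still increment.
uint32_t streamout_dw1(unsigned render_stream, unsigned buffer_enable_mask)
{
   uint32_t dw1 = SO_FUNCTION_ENABLE | SO_STATISTICS_ENABLE;
   dw1 |= (render_stream & 0x3u) << RENDER_STREAM_SELECT_SHIFT;
   dw1 |= (buffer_enable_mask & 0xfu) << SO_BUFFER_ENABLE_SHIFT;
   return dw1;
}
```

With streamout_dw1(0, 0) the unit stays enabled with no buffers, which is the case where Haswell would otherwise ignore the StreamID bits if SO Function Enable were 0.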

Iago

> On Tue, Jun 24, 2014 at 9:53 PM, Chris Forbes  wrote:
> > It looks like you can have the SOL stage increment that counter even
> > when not doing any actual streamout, which should give you the correct
> > semantics.
> >
> > See the definition of 3DSTATE_STREAMOUT in the Haswell PRM, Volume 2b. You 
> > want:
> >
> > - dw1.31 SO Function Enable = 1
> > - dw1.25 SO Statistics Enable = 1
> > - dw1.8-11 SO Buffer Enable [n] = 0
> >
> > This should behave just like having the unit disabled, except the
> > SO_PRIMITIVE_STORAGE_NEEDED registers will increment.
> >
> > -- Chris
> >
> >
> > On Wed, Jun 18, 2014 at 9:51 PM, Iago Toral Quiroga  
> > wrote:
> >> So far we have been using CL_INVOCATION_COUNT to resolve this query but 
> >> this
> >> is no good with streams, as only stream 0 reaches the clipping stage.
> >>
> >> From ARB_transform_feedback3:
> >>
> >> "When a generated primitive query for a vertex stream is active, the
> >>  primitives-generated count is incremented every time a primitive emitted 
> >> to
> >>  that stream reaches the Discarding Rasterization stage (see Section 3.x)
> >>  right before rasterization. This counter is incremented whether or not
> >>  transform feedback is active."
> >>
> >> Unfortunately, I don't see any registers that provide the number of 
> >> primitives
> >> written to a specific stream other than the ones that track the number of
> >> primitives written to transform feedback in the SOL stage, so I think we 
> >> can't
> >> implement this exactly as specified. This should not be a major issue 
> >> though,
> >> since non-zero streams are pointless unless they are bound to transform 
> >> feedback
> >> buffers anyway.
> >>
> >> Also in ARB_transform_feedback3:
> >>
> >> "These two types of queries can be used together to determine if all
> >>  primitives in a given vertex stream have been written to the bound
> >>  feedback buffers; if both queries are run simultaneously and the query
> >>  results are equal, all primitives have been written to the buffer(s).  If
> >>  the number of primitives written is less than the number of primitives
> >>  generated, one or more buffers overflowed.
> >>
> >> This refers to GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN and
> >> GL_PRIMITIVES_GENERATED. This behavior is achieved by implementing 
> >> primitives
> >> generated queries through GEN7_SO_PRIM_STORAGE_NEEDED for non-zero streams.
> >> ---
> >>  src/mesa/drivers/dri/i965/gen6_queryobj.c | 13 +
> >>  1 file changed, 9 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c 
> >> b/src/mesa/drivers/dri/i965/gen6_queryobj.c
> >> index 0cb64ca..ee33e1d 100644
> >> --- a/src/mesa/drivers/dri/i965/gen6_queryobj.c
> >> +++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c
> >> @@ -84,11 +84,16 @@ brw_store_register_mem64(struct brw_context *brw,
> >>
> >>  static void
> >>  write_primitives_generated(struct brw_context *brw,
> >> -   drm_intel_bo *query_bo, int idx)
> >> +   drm_intel_bo *query_bo, int stream, int idx)
> >>  {
> >> intel_batchbuffer_emit_mi_flush(brw);
> >>
> >> -   brw_store_register_mem64(brw, query_bo, CL_INVOCATION_COUNT, idx);
> >> +   if (brw->gen >= 7 && stream > 0) {
> >> +  brw_store_register_mem64(brw, query_bo,

[Mesa-dev] i965: possible hardware bug affecting multi-stream support

2014-06-26 Thread Iago Toral
Hello,

while testing various scenarios for multi-stream support in geometry
shaders I came across one that I think might be a hardware bug, or at
the very least, a hardware limitation that makes it difficult to
implement correct behavior according to ARB_transform_feedback3.

The problematic scenario is triggered with this setup:
  - Enable transform feedback.
  - Do not associate any varyings with one particular stream (let's call
this stream X).
  - Have the GS emit a vertex to stream X.

ARB_transform_feedback3 clarifies expected behavior in this case:

"If the set of varyings selected for transform feedback does not include
any belonging to the specified stream, nothing will be recorded when
primitives are emitted to that stream, and the corresponding vertex
count will be zero."

However, we get two possible outcomes with this setup:

1) If the vertex emitted to that stream is not the last vertex emitted
by the GS, then primitive count for that stream is incorrect (returns
0), but everything else works ok.

I think this behavior is expected as per the IvyBridge documentation:

"8.3 Stream Output Function:
...
If a stream has no SO_DECL state defined (NumEntries is 0), incoming
objects targeting that stream are effectively ignored. As there is no
attempt to perform stream output, overflow detection is neither required
nor performed."

Which means that we can't use SO_PRIMITIVE_STORAGE_NEEDED for the
primitive count in this case. We could still use CL_INVOCATION_COUNT for
stream 0, but that would not fix the problem for other streams.

2) If the vertex emitted to that stream is the last vertex emitted by
the GS, then transform feedback does not work for any stream (no values
are recorded in the TF buffers) and primitive queries for all streams
return 0. Rendering is okay though: stream 0 outputs are rendered
properly and outputs from other streams are discarded. This, I think, is
a hardware problem.
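
The counter choice discussed in case 1 can be sketched as follows. The selection condition mirrors the one in the GL_PRIMITIVES_GENERATED patch (gen >= 7 and a non-zero stream use the SOL counter, everything else uses the clipper counter); the enum and function names are hypothetical, and the "reliable" check only encodes the IvyBridge documentation quoted above:

```cpp
// Hypothetical labels for the two hardware counters mentioned in the thread.
enum counter_source {
   COUNTER_CL_INVOCATION_COUNT,
   COUNTER_SO_PRIM_STORAGE_NEEDED
};

// Which counter backs a primitives-generated query for a given stream.
counter_source primitives_generated_source(int gen, int stream)
{
   if (gen >= 7 && stream > 0)
      return COUNTER_SO_PRIM_STORAGE_NEEDED;
   return COUNTER_CL_INVOCATION_COUNT;
}

// Per the quoted documentation, objects targeting a stream with
// NumEntries == 0 are ignored by stream output, so the SOL counter cannot
// be trusted for such a stream.
bool counter_is_reliable(counter_source src, int num_decl_entries)
{
   return src == COUNTER_CL_INVOCATION_COUNT || num_decl_entries > 0;
}
```

For example, stream 1 on gen 7 with no varyings bound would select the SOL counter but fail the reliability check, which is the wrong-count situation described in case 1.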

With this setup, we are configuring the 3DSTATE_SO_DECL_LIST command for
stream X like this:

Buffer Selects (Stream X) = 0
Num Entries (Stream X) = 0

that is, that stream writes to no buffers and has no declarations to
write, which is correct.

Now comes the funny part: simply forcing Num Entries (Stream X) = 1, even
if there are no declarations, makes TF and primitive queries work again
for all streams but X. For stream X, the primitive count is okay, but TF
is not (which is kind of expected since we are not configuring it
properly). Moreover, if I also force Buffer Selects (Stream X) = N (where
N is the index of a disabled TF buffer), then TF also works as
expected for Stream X (primitives generated is okay, TF primitives
written is 0, and no TF data for that stream is written).

It looks like the hardware does not like setups where there are streams
that have 0 varyings to record after all, even less so if the last
vertex we emit is sent to such a stream.

Based on the above, there is a workaround for this, but I think it is
pretty ugly, so I would like to know other people's thoughts on whether
it is worth implementing. It would involve the following:

In upload_3dstate_streamout() we make sure we disable all transform
feedback buffers that are not going to record information (currently a
TF buffer is activated as long as the user has called
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, index, bufferName)). We
can know if a buffer is not going to be written by inspecting its
BufferStride: it should be 0 for buffers that won't get written. I think
this is probably good to do in any case.

Then the ugly part: in gen7_upload_3dstate_so_decl_list(), if we detect
a stream with no varyings bound to it (so num decls is 0) *and* there
are disabled TF buffers, we silently set num decls for that stream to 1
and set its buffer_mask to write to one of the disabled buffers (it
won't actually write because they are disabled).

I have a patch for this [1] and it seems to fix the problem (although it
only works as long as we have disabled TF buffers available).
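
The workaround described above can be sketched in isolation as follows. This is not the patch in [1]: the struct, the function names and the assumption of four streams and four buffers are illustrative, and a stride of 0 is taken to mean "disabled buffer" as explained earlier in this message:

```cpp
#include <cstdint>

const int MAX_STREAMS = 4;
const int MAX_BUFFERS = 4;

// Hypothetical per-stream stream-output configuration.
struct so_stream_config {
   int     num_decls[MAX_STREAMS];    // declarations per stream
   uint8_t buffer_mask[MAX_STREAMS];  // buffers each stream writes to
};

// Return the index of the first buffer with stride 0 (one that will never
// be written), or -1 if every buffer is in use.
int find_disabled_buffer(const int buffer_stride[MAX_BUFFERS])
{
   for (int i = 0; i < MAX_BUFFERS; i++)
      if (buffer_stride[i] == 0)
         return i;
   return -1;
}

// For every stream with no declarations, pretend it has one declaration
// that targets a disabled buffer. Returns true if any stream was patched;
// does nothing when no disabled buffer is available.
bool apply_empty_stream_workaround(so_stream_config &cfg,
                                   const int buffer_stride[MAX_BUFFERS])
{
   const int disabled = find_disabled_buffer(buffer_stride);
   if (disabled < 0)
      return false;

   bool patched = false;
   for (int s = 0; s < MAX_STREAMS; s++) {
      if (cfg.num_decls[s] == 0) {
         cfg.num_decls[s] = 1;
         cfg.buffer_mask[s] = static_cast<uint8_t>(1u << disabled);
         patched = true;
      }
   }
   return patched;
}
```

The early return makes the limitation explicit: when all buffers are enabled there is nothing safe to point the empty stream at.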

Opinions? Is there any other alternative to work around this issue?

The problem is particularly annoying because I think it hits a very
likely scenario: an application using stream 0 for rendering only (no
TF) and using other streams to capture TF.

Iago

[1] Patch:

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index d2c3ae3..1450dde 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -189,6 +189,27 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
  max_decls = decls[stream_id];
}
 
+   /* We need to inspect if we have streams for which we don't have any
+* varyings to record. The hardware does not handle this scenario well
+* and for TF to work in this case we need to configure such streams to
+* have at least one decl and write to some disabled buffer.
+*/
+   int disabled_buffer = -1;
+   for (int i = 0; i < 4

Re: [Mesa-dev] i965: possible hardware bug affecting multi-stream support

2014-06-26 Thread Iago Toral
If by not doing anything you mean not processing or removing the
ir_emit_vertex instructions for that stream this would have two problems
at least:

1) We won't get correct results for GL_PRIMITIVES_GENERATED in that
stream (it will always be 0). This may be a minor problem.

2) If that stream is stream 0 and rendering is enabled then we lose
rendering output, which would be a major problem.

So I think this is not a good thing to do.

Iago

On Fri, 2014-06-27 at 08:08 +1200, Chris Forbes wrote:
> As an alternative -- we know if we have this scenario at link time --
> could we perhaps just not do anything in EmitStreamVertex if there are
> no varyings captured to that stream?
> On Thu, Jun 26, 2014 at 10:26 PM, Iago Toral  wrote:
> > Hello,
> >
> > while testing various scenarios for multi-stream support in geometry
> > shaders I came across one that I think might be a hardware bug, or at
> > the very least, a hardware limitation that creates a problem to
> > implement correct behavior according to ARB_transform_feedback3.
> >
> > The conflictive scenario is activated with this setup:
> >   - Enable transform feedback.
> >   - Do not associate any varyings with one particular stream (let's call
> > this stream X).
> >   - Have the GS emit a vertex to stream X.
> >
> > ARB_transform_feedback3 clarifies expected behavior in this case:
> >
> > "If the set of varyings selected for transform feedback does not include
> > any belonging to the specified stream, nothing will be recorded when
> > primitives are emitted to that stream, and the corresponding vertex
> > count will be zero."
> >
> > However, we get two possible outcomes with this setup:
> >
> > 1) If the vertex emitted to that stream is not the last vertex emitted
> > by the GS, then primitive count for that stream is incorrect (returns
> > 0), but everything else works ok.
> >
> > I think this behavior is expected as per the IvyBridge documentation:
> >
> > "8.3 Stream Output Function:
> > ...
> > If a stream has no SO_DECL state defined (NumEntries is 0), incoming
> > objects targeting that stream are effectively ignored. As there is no
> > attempt to perform stream output, overflow detection is neither required
> > nor performed."
> >
> > Which means that we can't use SO_PRIMITIVE_STORAGE_NEEDED for the
> > primitive count in this case. We could still use CL_INVOCATION_COUNT for
> > stream 0, but that would not fix the problem for other streams.
> >
> > 2) If the vertex emitted to that stream is the last vertex emitted by
> > the GS, then transform feedback does not work for any stream (no values
> > are recorded in the TF buffers) and primitive queries for all streams
> > return 0. Rendering is okay though: stream 0 outputs are rendered
> > properly and outputs from other streams are discarded. This, I think, is
> > a hardware problem.
> >
> > With this setup, we are configuring the 3DSTATE_SO_DECL_LIST command for
> > stream X like this:
> >
> > Buffer Selects (Stream X) = 0
> > Num Entries (Stream X) = 0
> >
> > that is, that stream writes to no buffers and has no declarations to
> > write, which is correct.
> >
> > Now comes the funny part: simply forcing Num Entries(Stream X) = 1, even
> > if there are no declarations, makes TF and primitive queries work again
> > for all streams but X, and for stream X, primitive count is ok, but TF
> > is not (but that is kind of expected since we are not configuring it
> > properly). More over, if I also force Buffer Selects (Stream X) = N (so
> > that N is the index of a disabled TF buffer), then TF also works as
> > expected for Stream X (primitives generated is okay, TF primitives
> > written is 0, and no TF data for that stream is written).
> >
> > It looks like the hardware does not like setups where there are streams
> > that have 0 varyings to record after all, even less so if the last
> > vertex we emit is sent to such a stream.
> >
> > Based on the above, there is a work around for this but I think it is
> > pretty ugly so I would like to know other people's thoughts on whether
> > it is worth implementing. It would involve the following:
> >
> > In upload_3dstate_streamout() we make sure we disable all transform
> > feedback buffers that are not going to record information (currently a
> > TF buffer is activated as far as the user has called
> > glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, index bufferName)). We
> > can know if a buffer is not going to be written by inspecting its
> > 

Re: [Mesa-dev] [PATCH v2 00/23] Multi-stream support for geometry shaders

2014-06-29 Thread Iago Toral
Great, thanks Chris.

Ilia, I'll push the patches today.

Iago

On Sat, 2014-06-28 at 08:46 +1200, Chris Forbes wrote:
> Ilia,
> 
> That bikeshed is already done :)
> 
> I just send an r-b for v3 of patch 18, which is the last patch that lacked 
> one.
> 
> I think this is all good to land now.
> 
> -- Chris
> 
> On Sat, Jun 28, 2014 at 5:11 AM, Ilia Mirkin  wrote:
> > Iago,
> >
> > Not sure where you are with these patches... I guess some updates have
> > been requested to rename stream -> index in some places? Anyways, once
> > the generic bits are all sorted out and reviewed, I'd appreciate it if
> > they could be pushed out, even if you're still working out hw-related
> > issues with i965 so that the gallium mesa/st integration work can be
> > pushed out as well.
> >
> > Thanks,
> >
> >   -ilia
> >
> > On Tue, Jun 24, 2014 at 4:10 AM, Iago Toral  wrote:
> >> Hi,
> >>
> >> this is a summary of the review process for the multi-stream support 
> >> series.
> >> Most of the patches got a reviewed-by by either Chris or Ian and if I am 
> >> not
> >> mistaken only 4 patches still need a reviewed-by so I hopefully we get 
> >> these
> >> 4 reviewed soon :). Here is the situation of each of these patches:
> >>
> >> Patch 11: glsl: Add support for EmitStreamVertex() and 
> >> EndStreamPrimitive().
> >> - Ian requested to split this in 3 patches which have already been sent to
> >> the mailing list as a reply.
> >>   - glsl: Modify ir_emit_vertex to have a stream.
> >>   - glsl: Modify ir_end_primitive to have a stream.
> >>
> >>   - Add support for EmitStreamVertex() and EndStreamPrimitive().
> >> - There was some discussion on whether ir_to_mesa.cpp and
> >> st_glsl_to_tgsi.cpp had to be updated too. Based on what was discussed I
> >> concluded that it was not necessary in the end.
> >>
> >> Patch 12: glsl: Validate vertex emission in geometry shaders.
> >> - We need to decide if we allow EmitStreamVertex(0) with outputs other than
> >> points. Since Nvidia allows this Ian asked if AMD allowed this too in order
> >> to decide if a bug to the spec should be filed. I confirmed that AMD allows
> >> this too. Maybe we want to test other systems before making a decision
> >> anyway.
> >> - I can also change the patch to disallow this behavior until we decide
> >> otherwise if that is preferred.
> >>
> >> Patch 18: i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams.
> >> - Needs review
> >>
> >> Patch 21: docs: mark "Geometry shader multiple streams" as done for i965
> >> - Needs review
> >>
> >> Iago
> >>
> >>
> >> On 2014-06-18 11:51, Iago Toral Quiroga wrote:
> >>
> >>> This series addresses review comments of the previous version and adds a
> >>> few
> >>> things. Summary of the changes in version 2:
> >>>
> >>> Patch 1:
> >>>   - Ian: multiple layout(location=) qualifiers
> >>> + This was not a problem in the end.
> >>>   - Chris: s/explicitely/explicitly/
> >>>   - Chris: Remove 'Id' suffix in variables.
> >>>   - Chris: !this->flags.q.streamId for consistency
> >>>   - Chris: checks should pass if state->is_version(400, 0)
> >>>   - Chris: Check against the real limit and not MAX_VERTEX_STREAMS
> >>>   + Plus fixed some other issues we found
> >>>
> >>> Patch 2: already reviewed by Chris, unchanged.
> >>>
> >>> Patch 3:
> >>>   - Chris: Replace magic 4's with appropriate constant
> >>> + Also added necessary memset(so_decl, 0, sizeof(so_decl));
> >>>
> >>> Patch 4: already reviewed by Chris, unchanged.
> >>>
> >>> Patch 5: [DONE]
> >>>   - Chris: Not right for separable programs.
> >>> + Original patch was okay after discussion in the mailing list, but
> >>>   I modified it to make the code more clear.
> >>>
> >>> Patch 6: already reviewed by Chris, unchanged.
> >>>
> >>> Patch 7: already reviewed by Chris, unchanged.
> >>>
> >>> Patch 8: already reviewed by Chris, unchanged.
> >>>
> >>> Patch 9:
> >>>   - Chris: Does not work with multiple geometry shaders
> >>> + Now we only store info in the progr

Re: [Mesa-dev] [PATCH 2/2] i965: Disable SOL buffers and decls when not doing transform feedback.

2014-07-02 Thread Iago Toral
On Wed, 2014-07-02 at 08:49 +0200, Iago Toral wrote:
> Hello Kenneth, Steven:
> 
> On Tue, 2014-07-01 at 17:25 -0700, Kenneth Graunke wrote:
> > Previously, we only emitted 3DSTATE_SO_BUFFER and 3DSTATE_SO_DECL_LIST
> > when transform feedback was active.  When it was inactive, we disabled
> > the SOL stage in 3DSTATE_SOL so the other state would be ignored.
> > 
> > In commit 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5, Iago enabled the SOL
> > stage universally, so we could implement the GL_PRIMITIVES_GENERATED
> > statistics counter.  This caused every Piglit test to trigger assertions
> > in the simulator, and apparently caused GPU hangs for some users.
> > 
> > Apparently, we're supposed to program 3DSTATE_SO_DECL_LIST to zero when
> > output streams are inactive, but we hadn't been doing so.  Now that SOL
> > is on, we need to do that properly.
> 
> If that is the case then I guess the workaround I was trying to use for
> that hardware issue I mentioned in another e-mail may not be any good,
> since it required to configure a non-zero 3DSTATE_SO_DECL_LIST while
> outputting to a disabled stream... :(

My workaround patch works here in IvyBridge and keeps working with your
patch since it only activates when TF is active, but since it relies on
this stuff I think it would be best if someone tests it on other
hardware when it gets reviewed.

> I never get any GPU hangs though...
> 
> > Experimentally, it looks like we also need to program 3DSTATE_SO_BUFFER
> > to zero as well, or else I get many GPU hangs on Haswell.
> > 
> > Signed-off-by: Kenneth Graunke 
> > Cc: Iago Toral Quiroga 
> > Cc: Chris Forbes 
> > Cc: Steven Newbury 
> > ---
> >  src/mesa/drivers/dri/i965/brw_state.h  |  3 ++-
> >  src/mesa/drivers/dri/i965/gen7_sol_state.c | 34 
> > ++
> >  src/mesa/drivers/dri/i965/gen8_sol_state.c |  2 +-
> >  3 files changed, 28 insertions(+), 11 deletions(-)
> > 
> > Iago,
> > 
> > I noticed that pretty much every Piglit test broke in our simulation
> > environment after your patch to turn on SOL by default.  Something about
> > SO_DECLs not being right.  This patch seems to fix the issue.  Does it
> > look reasonable to you?
> 
> The patch is producing GPU hangs for me in the case where I have a
> geometry shader and no TF, so it seems that it introduces other
> problems. I'll see if I can find the reason for this.

It looks like we have to send at least one decl per stream even if we
are configuring the number of declarations per stream to 0, otherwise we
get this GPU hang on IvyBridge at least.

This seems to clearly contradict the documentation for IvyBridge. From
8.5 3DSTATE_SO_DECL_LIST Command, dw2, bits 7:0, Num Entries[0]:

"It is legal to specify Num Entries = 0 for all four streams
simultaneously. In this case there will be no SO_DECLs included in the
command (only DW 0-2). Note that all Stream to Buffer Selects bits must
be zero in this case (as no streams produce output)."

Which is exactly what Kenneth's patch does, but that produces a GPU hang
here.

At the same time, there is this other statement at the beginning of that
same chapter:

"Errata: All 128 decls for all four streams must be included whenever
this command is issued. The “Num Entries [n]” fields still contain the
actual numbers of valid decls."

Not sure if that sentence is the fix to a previous erratum or the erratum
itself... but in any case, we are never sending all 128 decls, yet it
looks like at least on IvyBridge we have to send one or we get a GPU
hang.

Kenneth, following the above, this change to your patch fixes the GPU
hang I was running into:

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index 2013406..6d1b93c 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -108,8 +108,10 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
*  all zero (no buffers written to) and the corresponding Num Entries
*  field to zero (no valid SO_DECLs)."
*/
-  BEGIN_BATCH(3);
-  OUT_BATCH(_3DSTATE_SO_DECL_LIST << 16 | (3 - 2));
+  BEGIN_BATCH(5);
+  OUT_BATCH(_3DSTATE_SO_DECL_LIST << 16 | (5 - 2));
+  OUT_BATCH(0);
+  OUT_BATCH(0);
   OUT_BATCH(0);
   OUT_BATCH(0);
   ADVANCE_BATCH();
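
The hunk above can be restated as a standalone sketch that builds the same dword sequence. Two things here are assumptions: the opcode constant is a placeholder (the real _3DSTATE_SO_DECL_LIST value lives in the driver headers), and the "2 dwords per entry" size is inferred from the command growing from 3 to 5 dwords when one zeroed entry is added:

```cpp
#include <cstdint>
#include <vector>

// Placeholder value, NOT the real _3DSTATE_SO_DECL_LIST opcode.
const uint32_t HYPOTHETICAL_SO_DECL_LIST_OPCODE = 0x1234;

// Build an "inactive" SO_DECL_LIST command: buffer selects and num
// entries are zero, followed by `zeroed_entries` all-zero decl entries.
// The header's length field is the total dword count minus 2, matching
// the (3 - 2) and (5 - 2) expressions in the hunk.
std::vector<uint32_t> empty_so_decl_list(unsigned zeroed_entries)
{
   const uint32_t len = 3 + 2 * zeroed_entries;
   std::vector<uint32_t> batch;
   batch.push_back(HYPOTHETICAL_SO_DECL_LIST_OPCODE << 16 | (len - 2));
   batch.push_back(0);  // stream to buffer selects: no buffers written
   batch.push_back(0);  // num entries: no valid SO_DECLs
   for (unsigned i = 0; i < zeroed_entries; i++) {
      batch.push_back(0);  // zeroed decl entry, first dword
      batch.push_back(0);  // zeroed decl entry, second dword
   }
   return batch;
}
```

empty_so_decl_list(0) corresponds to the removed lines and empty_so_decl_list(1) to the added ones.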

Steven, can you try Kenneth's patch with this change too?

Iago

> > Steven,
> > 
> > Does this fix your GPU hangs on Ivybridge?
> 
> Steven, can you provide more info on the use case that is triggering
> your GPU hang (specifically the setup for transform feedback and whether
> you are using a geometry shader or just a vertex shad
