[Mesa-dev] [PATCH] st/fbo: use pipe_surface_release instead of pipe_surface_reference

2015-10-14 Thread Krzysztof A. Sobiecki
From: Krzysztof Sobiecki 

pipe_surface_reference have problems with deleted contexts,
so use of pipe_surface_release might be more appropriate.

Fixes Wasteland 2 Director's Cut crash on start.
---
 src/mesa/state_tracker/st_cb_fbo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_cb_fbo.c 
b/src/mesa/state_tracker/st_cb_fbo.c
index ff703fa..2a2eb09 100644
--- a/src/mesa/state_tracker/st_cb_fbo.c
+++ b/src/mesa/state_tracker/st_cb_fbo.c
@@ -456,7 +456,7 @@ st_update_renderbuffer_surface(struct st_context *st,
   surf_tmpl.u.tex.first_layer = first_layer;
   surf_tmpl.u.tex.last_layer = last_layer;
 
-  pipe_surface_reference(>surface, NULL);
+  pipe_surface_release(pipe, >surface);
 
   strb->surface = pipe->create_surface(pipe, resource, _tmpl);
}
-- 
2.6.1.204.gc021fdf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [mesa-dev, mesa-demos][PATCH] sharedtex_mt: fix rendering thread hang

2015-10-14 Thread Brian Paul

I just pushed it.  Thanks.

-Brian

On 10/14/2015 02:07 AM, Belal, Awais wrote:

Hi Brian,

Do you want me to update this or is it good to go as is?

BR,
Awais


From: mesa-dev [mesa-dev-boun...@lists.freedesktop.org] on behalf of Belal, 
Awais
Sent: Monday, October 12, 2015 6:33 PM
To: Brian Paul; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [mesa-dev, mesa-demos][PATCH] sharedtex_mt: fix 
rendering thread hang

Hi Brian,

Thanks for your reply :)
The move of variable definition was just to make the code look a little cleaner.

BR,
Awais


From: Brian Paul [bri...@vmware.com]
Sent: Monday, October 12, 2015 6:31 PM
To: Belal, Awais; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [mesa-dev, mesa-demos][PATCH] sharedtex_mt: fix 
rendering thread hang

On 10/12/2015 05:25 AM, Belal, Awais wrote:

Hi,

Is there are a reservation against the below patch?


Looks OK, but one comment below.



BR,
Awais


From: mesa-dev [mesa-dev-boun...@lists.freedesktop.org] on behalf of Belal, 
Awais
Sent: Thursday, October 08, 2015 2:00 PM
To: mesa-dev@lists.freedesktop.org
Subject: [Mesa-dev] [mesa-dev,  mesa-demos][PATCH] sharedtex_mt: fix rendering 
thread hang

XNextEvent is a blocking call which locks up the display mutex
this causes the rendering threads to hang when they try call
glXSwapBuffers() as that tries to take the same mutex in
underlying calls through XCopyArea().
So we only go to XNextEvent when it has at least one event
and we wouldn't lock indefinitely.

Signed-off-by: Awais Belal 
---
   src/xdemos/sharedtex_mt.c | 9 +++--
   1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/xdemos/sharedtex_mt.c b/src/xdemos/sharedtex_mt.c
index a90903a..1d503c4 100644
--- a/src/xdemos/sharedtex_mt.c
+++ b/src/xdemos/sharedtex_mt.c
@@ -420,9 +420,14 @@ Resize(struct window *h, unsigned int width, unsigned int 
height)
   static void
   EventLoop(void)
   {
+   int i;
+   XEvent event;
  while (1) {
-  int i;
-  XEvent event;
+  /* Do we have an event? */
+  if (XPending(gDpy) == 0) {
+ usleep(1);
+ continue;
+  }
 XNextEvent(gDpy, );
 for (i = 0; i < NumWindows; i++) {
   struct window *h = [i];


Was there particular reason to move the i, event declarations?

In any case, I'll commit this in a bit.

-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev=BQIFAg=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=T0t4QG7chq2ZwJo6wilkFznRSFy-8uDKartPGbomVj8=OGzLAaED2FXBhL6Q1sLFgviEkbFMzCA_bTROi2XgqJA=o_lPvn53_4RWhqKWa6ZyyfO1l6b2SzQMpEL6OtZc6y8=



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Correctly handle GL_BGRA_EXT in ES3 format_and_type checks

2015-10-14 Thread Eduardo Lima Mitev
On 10/08/2015 12:58 AM, Jason Ekstrand wrote:
> The EXT_texture_format_BGRA extension (which mesa supports
> unconditionally) adds a new format and internal format called GL_BGRA_EXT.
> Previously, this was not really handled at all in
> _mesa_ex3_error_check_format_and_type.  When the checks were tightened in
> commit f15a7f3c, we accidentally tightened things too far and GL_BGRA_EXT
> would always cause an error to be thrown.
> 
> There were two primary issues here.  First, is that
> _mesa_es3_effective_internal_format_for_format_and_type didn't handle the
> GL_BGRA_EXT format.  Second is that it blindly uses _mesa_base_tex_format
> which returns GL_RGBA for GL_BGRA_EXT.  This commit fixes both of these
> issues as well as adds explicit checks that GL_BGRA_EXT is only ever used
> with GL_BGRA_EXT and GL_UNSIGNED_BYTE.
> 
> Signed-off-by: Jason Ekstrand 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92265
> Cc: Eduardo Lima Mitev 
> ---
>  src/mesa/main/glformats.c | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
> index 7dab33c..843ec02 100644
> --- a/src/mesa/main/glformats.c
> +++ b/src/mesa/main/glformats.c
> @@ -2678,6 +2678,7 @@ 
> _mesa_es3_effective_internal_format_for_format_and_type(GLenum format,
> * internal formats, they do not correspond to GL constants, so the 
> base
> * format is returned instead.
> */
> +  case GL_BGRA_EXT:
>case GL_LUMINANCE_ALPHA:
>case GL_LUMINANCE:
>case GL_ALPHA:
> @@ -2797,8 +2798,19 @@ _mesa_es3_error_check_format_and_type(const struct 
> gl_context *ctx,
>if (effectiveInternalFormat == GL_NONE)
>   return GL_INVALID_OPERATION;
>  
> -  GLenum baseInternalFormat =
> - _mesa_base_tex_format(ctx, effectiveInternalFormat);
> +  GLenum baseInternalFormat;
> +  if (internalFormat == GL_BGRA_EXT) {
> + /* Unfortunately, _mesa_base_tex_format returns a base format of
> +  * GL_RGBA for GL_BGRA_EXT.  This makes perfect sense if you're
> +  * asking the question, "what channels doe this format have?"
> +  * However, if we're trying to determine if two internal formats
> +  * match in the ES3 sense, we actually want GL_BGRA.
> +  */
> + baseInternalFormat = GL_BGRA_EXT;

What's wrong with having _mesa_base_tex_format() return GL_BGRA_EXT for
format GL_BGRA_EXT?

Looking through all usages of _mesa_base_tex_format(), I fail to see a
case where returning GL_BGRA_EXT instead of GL_RGBA would cause a
different behavior.
As I understand the EXT_texture_format_BGRA extension [1],
GL_BGRA_EXT becomes a valid base format at the same level of the others
in the table, which is what _mesa_base_tex_format() claims to return
(quoting the function doc)

"\return the corresponding \u base internal format (GL_ALPHA,
GL_LUMINANCE, GL_DEPTH_STENCIL, GL_LUMANCE_ALPHA, GL_INTENSITY, GL_RGB,
or GL_RGBA), or -1 if invalid enum."

I also tested dEQP and piglit with this modification and saw no regressions.

If we modify _mesa_base_tex_format() to return GL_BGRA_EXT for
GL_BGRA_EXT format, we simplify this code and avoid adding the specific
check inside the code that resolves the effective internal format.


Sorry to chime in late. I just returned from holidays.

> +  } else {
> + baseInternalFormat =
> +_mesa_base_tex_format(ctx, effectiveInternalFormat);
> +  }
>  
>if (internalFormat != baseInternalFormat)
>   return GL_INVALID_OPERATION;
> @@ -2807,6 +2819,11 @@ _mesa_es3_error_check_format_and_type(const struct 
> gl_context *ctx,
> }
>  
> switch (format) {
> +   case GL_BGRA_EXT:
> +  if (type != GL_UNSIGNED_BYTE || internalFormat != GL_BGRA)
> + return GL_INVALID_OPERATION;
> +  break;
> +
> case GL_RGBA:
>switch (type) {
>case GL_UNSIGNED_BYTE:
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/10] i965/meta: Remove fast_clear_color variable

2015-10-14 Thread Ben Widawsky
On Wed, Oct 14, 2015 at 11:52:03AM +0200, Neil Roberts wrote:
> This patch doesn't look right. See this sentence in “Render Target Fast
> Clear”:
> 
> “The pixel shader kernel requires no attributes, and delivers a value of
>  0x in all channels of the render target write message”
> 
> Presumably the fast_clear_color is trying to implement this restriction.
> 
> Regards,
> - Neil
> 

You're right. Originally the patch only touched the color in the resolve pass,
which I believe doesn't matter (nor does the actual shader we bind). However,
I'm sort of baffled now why I'd see no piglit regressions since the clear color
will *never* be all F.

Either way, I'll drop this patch - but my confusion level has increased.

> Ben Widawsky  writes:
> 
> > It doesn't actually serve a purpose AFAICT (in fact, I'm not certain what 
> > it's
> > meant to do).
> >
> > Cc: Kristian Høgsberg 
> > Signed-off-by: Ben Widawsky 
> > ---
> >  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 +++-
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > index 41afc9a..9c51ffb 100644
> > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > @@ -390,8 +390,6 @@ set_fast_clear_color(struct brw_context *brw,
> > }
> >  }
> >  
> > -static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 };
> > -
> >  static void
> >  set_fast_clear_op(struct brw_context *brw, uint32_t op)
> >  {
> > @@ -472,7 +470,7 @@ fast_clear_attachments(struct brw_context *brw,
> >  
> >use_rectlist(brw, true);
> >  
> > -  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> > +  brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
> >  
> >/* SKL+ also has a resolve mode for compressed render targets and 
> > thus more
> > * bits to let us select the type of resolve.  For fast clear 
> > resolves, it
> > @@ -670,7 +668,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
> > gl_framebuffer *fb,
> >fast_clear_attachments(brw, fb, fast_clear_buffers, fast_clear_rect);
> > } else if (fast_clear_buffers) {
> >_mesa_meta_drawbuffers_from_bitfield(fast_clear_buffers);
> > -  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> > +  brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
> >set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
> >brw_draw_rectlist(ctx, _clear_rect, layers);
> >set_fast_clear_op(brw, 0);
> > @@ -785,7 +783,7 @@ brw_meta_resolve_color(struct brw_context *brw,
> >  
> > use_rectlist(brw, true);
> >  
> > -   brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> > +   brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
> >  
> > /* SKL+ also has a resolve mode for compressed render targets and thus 
> > more
> >  * bits to let us select the type of resolve.  For fast clear resolves, 
> > it
> > -- 
> > 2.6.1
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/fbo: use pipe_surface_release instead of pipe_surface_reference

2015-10-14 Thread Brian Paul

On 10/14/2015 10:03 AM, Krzysztof A. Sobiecki wrote:

From: Krzysztof Sobiecki 

pipe_surface_reference have problems with deleted contexts,
so use of pipe_surface_release might be more appropriate.

Fixes Wasteland 2 Director's Cut crash on start.
---
  src/mesa/state_tracker/st_cb_fbo.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_cb_fbo.c 
b/src/mesa/state_tracker/st_cb_fbo.c
index ff703fa..2a2eb09 100644
--- a/src/mesa/state_tracker/st_cb_fbo.c
+++ b/src/mesa/state_tracker/st_cb_fbo.c
@@ -456,7 +456,7 @@ st_update_renderbuffer_surface(struct st_context *st,
surf_tmpl.u.tex.first_layer = first_layer;
surf_tmpl.u.tex.last_layer = last_layer;

-  pipe_surface_reference(>surface, NULL);
+  pipe_surface_release(pipe, >surface);

strb->surface = pipe->create_surface(pipe, resource, _tmpl);
 }



Reviewed-by: Brian Paul 

Do you need me to commit this for you?

-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-14 Thread Pohjolainen, Topi
On Wed, Oct 14, 2015 at 09:54:43AM -0700, Ben Widawsky wrote:
> On Wed, Oct 14, 2015 at 02:43:24PM +0300, Pohjolainen, Topi wrote:
> > On Wed, Oct 14, 2015 at 11:39:03AM +0200, Neil Roberts wrote:
> > > Ben Widawsky  writes:
> > > 
> > > > The impetus for this patch comes from a seemingly benign statement 
> > > > within the
> > > > spec (quoted within the patch). For me, this patch was at some point 
> > > > critical
> > > > for getting stable piglit results (though this did not seem to be the 
> > > > case on a
> > > > branch Chad was working on).
> > > >
> > > > It is very important for clearing multiple color buffer attachments and 
> > > > can be
> > > > observed in the following piglit tests:
> > > > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
> > > > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> > > >
> > > > Signed-off-by: Ben Widawsky 
> > > > ---
> > > >  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 
> > > > +
> > > >  1 file changed, 84 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> > > > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > > index 7bf52f0..9e6711e 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > > @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable)
> > > > brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
> > > >  }
> > > >  
> > > > +/**
> > > > + * Individually fast clear each color buffer attachment. On previous 
> > > > gens this
> > > > + * isn't required. The motivation for this comes from one line (which 
> > > > seems to
> > > > + * be specific to SKL+). The list item is in section titled _MCS 
> > > > Buffer for
> > > > + * Render Target(s)_
> > > > + *
> > > > + *   "Since only one RT is bound with a clear pass, only one RT can be 
> > > > cleared
> > > > + *   at a time. To clear multiple RTs, multiple clear passes are 
> > > > required."
> > > 
> > > This sentence also appears in the HSW PRM so it seems a bit odd if it's
> > > only causing problems on SKL. I guess if we get Piglit regressions
> > > without it then it makes sense to have the patch. It might be worth just
> > > double checking whether this patch is completely necessary. The wording
> > > in the commit message seems a little unsure.
> > 
> > The spec seems to be missing something as the section discussing "Render
> > Target Fast Clear" seems to suggest the opposite:
> > 
> > "The render target(s) is/are bound as they normally would be, with the MCS
> >  surface defined in SURFACE_STATE."
> 
> I am aware of all this. Neil, yes it is completely necessary for piglit (I 
> don't
> know if anything in the real world does this or not).
> 
> You are both asking to me to provide something which may be impossible, an
> explanation of why the docs and/or hardware are behaving this way. Let me
> respond in kind, please provide an alternate patch which fixes:
> spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> spec/arb_framebuffer_object/fbo-drawbuffers-none glclear (all subtests)
> 
> FWIW Topi, it's also contradicted in 3DSTATE_PS definition.

You misunderstood me I think, I'm not questioning your patch or your
interpretation, or asking you to provide some information that just isn't
there in the spec. We talked about this quite a bit. I'm just saying that I
feel that something is missing in the spec.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/fbo: use pipe_surface_release instead of pipe_surface_reference

2015-10-14 Thread Krzysztof A. Sobiecki
Brian Paul  writes:
> Reviewed-by: Brian Paul 
>
> Do you need me to commit this for you?
>
> -Brian
>
Yes, it would be nice.
Thanks.
-- 
X was an interactive protocol: 
alpha blending a full-screen image looked like slugs racing down the monitor. 
http://www.keithp.com/~keithp/talks/usenix2000/render.html
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: remove unused texUnit local in _mesa_BindTextureUnit()

2015-10-14 Thread Anuj Phogat
On Tue, Oct 13, 2015 at 7:10 PM, Brian Paul  wrote:
> The texture unit is error-checked before this and the texUnit var
> is unused, so remove it.
> ---
>  src/mesa/main/texobj.c | 7 ---
>  1 file changed, 7 deletions(-)
>
> diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c
> index b571b1b..3182920 100644
> --- a/src/mesa/main/texobj.c
> +++ b/src/mesa/main/texobj.c
> @@ -1759,19 +1759,12 @@ _mesa_BindTextureUnit(GLuint unit, GLuint texture)
>  {
> GET_CURRENT_CONTEXT(ctx);
> struct gl_texture_object *texObj;
> -   struct gl_texture_unit *texUnit;
>
> if (unit >= _mesa_max_tex_unit(ctx)) {
>_mesa_error(ctx, GL_INVALID_VALUE, "glBindTextureUnit(unit=%u)", unit);
>return;
> }
>
> -   texUnit = _mesa_get_tex_unit(ctx, unit);
> -   assert(texUnit);
> -   if (!texUnit) {
> -  return;
> -   }
> -
> if (MESA_VERBOSE & (VERBOSE_API|VERBOSE_TEXTURE))
>_mesa_debug(ctx, "glBindTextureUnit %s %d\n",
>_mesa_enum_to_string(GL_TEXTURE0+unit), (GLint) texture);
> --
> 1.9.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reviewed-by: Anuj Phogat 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-14 Thread Ben Widawsky
On Wed, Oct 14, 2015 at 08:04:48PM +0300, Pohjolainen, Topi wrote:
> On Wed, Oct 14, 2015 at 09:54:43AM -0700, Ben Widawsky wrote:
> > On Wed, Oct 14, 2015 at 02:43:24PM +0300, Pohjolainen, Topi wrote:
> > > On Wed, Oct 14, 2015 at 11:39:03AM +0200, Neil Roberts wrote:
> > > > Ben Widawsky  writes:
> > > > 
> > > > > The impetus for this patch comes from a seemingly benign statement 
> > > > > within the
> > > > > spec (quoted within the patch). For me, this patch was at some point 
> > > > > critical
> > > > > for getting stable piglit results (though this did not seem to be the 
> > > > > case on a
> > > > > branch Chad was working on).
> > > > >
> > > > > It is very important for clearing multiple color buffer attachments 
> > > > > and can be
> > > > > observed in the following piglit tests:
> > > > > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
> > > > > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> > > > >
> > > > > Signed-off-by: Ben Widawsky 
> > > > > ---
> > > > >  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 
> > > > > +
> > > > >  1 file changed, 84 insertions(+), 13 deletions(-)
> > > > >
> > > > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> > > > > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > > > index 7bf52f0..9e6711e 100644
> > > > > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > > > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > > > @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool 
> > > > > enable)
> > > > > brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
> > > > >  }
> > > > >  
> > > > > +/**
> > > > > + * Individually fast clear each color buffer attachment. On previous 
> > > > > gens this
> > > > > + * isn't required. The motivation for this comes from one line 
> > > > > (which seems to
> > > > > + * be specific to SKL+). The list item is in section titled _MCS 
> > > > > Buffer for
> > > > > + * Render Target(s)_
> > > > > + *
> > > > > + *   "Since only one RT is bound with a clear pass, only one RT can 
> > > > > be cleared
> > > > > + *   at a time. To clear multiple RTs, multiple clear passes are 
> > > > > required."
> > > > 
> > > > This sentence also appears in the HSW PRM so it seems a bit odd if it's
> > > > only causing problems on SKL. I guess if we get Piglit regressions
> > > > without it then it makes sense to have the patch. It might be worth just
> > > > double checking whether this patch is completely necessary. The wording
> > > > in the commit message seems a little unsure.
> > > 
> > > The spec seems to be missing something as the section discussing "Render
> > > Target Fast Clear" seems to suggest the opposite:
> > > 
> > > "The render target(s) is/are bound as they normally would be, with the MCS
> > >  surface defined in SURFACE_STATE."
> > 
> > I am aware of all this. Neil, yes it is completely necessary for piglit (I 
> > don't
> > know if anything in the real world does this or not).
> > 
> > You are both asking to me to provide something which may be impossible, an
> > explanation of why the docs and/or hardware are behaving this way. Let me
> > respond in kind, please provide an alternate patch which fixes:
> > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear (all subtests)
> > 
> > FWIW Topi, it's also contradicted in 3DSTATE_PS definition.
> 
> You misunderstood me I think, I'm not questioning your patch or your
> interpretation, or asking you to provide some information that just isn't
> there in the spec. We talked about this quite a bit. I'm just saying that I
> feel that something is missing in the spec.

I've been unable to get any issues addressed in this part of the spec. I have 4
open bugs filed against the doc on this section alone.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] i965: Extract can_change_source_types() functions.

2015-10-14 Thread Matt Turner
Make them members of fs_inst/vec4_instruction for use elsewhere.

Also fix the fs version to check that dst.type == src[1].type and for
!saturate.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp| 12 
 src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp   | 15 ++-
 src/mesa/drivers/dri/i965/brw_ir_fs.h   |  1 +
 src/mesa/drivers/dri/i965/brw_ir_vec4.h |  1 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp  | 12 
 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 16 ++--
 6 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index d000f16..3837bbc 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -338,6 +338,18 @@ fs_inst::can_do_source_mods(const struct brw_device_info 
*devinfo)
 }
 
 bool
+fs_inst::can_change_types() const
+{
+   return dst.type == src[0].type &&
+  !src[0].abs && !src[0].negate && !saturate &&
+  (opcode == BRW_OPCODE_MOV ||
+   (opcode == BRW_OPCODE_SEL &&
+dst.type == src[1].type &&
+predicate != BRW_PREDICATE_NONE &&
+!src[1].abs && !src[1].negate));
+}
+
+bool
 fs_inst::has_side_effects() const
 {
return this->eot || backend_instruction::has_side_effects();
diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 230b0ca..5589716 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -275,17 +275,6 @@ is_logic_op(enum opcode opcode)
opcode == BRW_OPCODE_NOT);
 }
 
-static bool
-can_change_source_types(fs_inst *inst)
-{
-   return !inst->src[0].abs && !inst->src[0].negate &&
-  inst->dst.type == inst->src[0].type &&
-  (inst->opcode == BRW_OPCODE_MOV ||
-   (inst->opcode == BRW_OPCODE_SEL &&
-inst->predicate != BRW_PREDICATE_NONE &&
-!inst->src[1].abs && !inst->src[1].negate));
-}
-
 bool
 fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
 {
@@ -368,7 +357,7 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
acp_entry *entry)
 
if (has_source_modifiers &&
entry->dst.type != inst->src[arg].type &&
-   !can_change_source_types(inst))
+   !inst->can_change_types())
   return false;
 
if (devinfo->gen >= 8 && (entry->src.negate || entry->src.abs) &&
@@ -438,7 +427,7 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
acp_entry *entry)
   * type.  If we got here, then we can just change the source and
   * destination types of the instruction and keep going.
   */
- assert(can_change_source_types(inst));
+ assert(inst->can_change_types());
  for (int i = 0; i < inst->sources; i++) {
 inst->src[i].type = entry->dst.type;
  }
diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
b/src/mesa/drivers/dri/i965/brw_ir_fs.h
index 97c6f8b..7726e4b 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
@@ -204,6 +204,7 @@ public:
unsigned components_read(unsigned i) const;
int regs_read(int arg) const;
bool can_do_source_mods(const struct brw_device_info *devinfo);
+   bool can_change_types() const;
bool has_side_effects() const;
 
bool reads_flag() const;
diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h 
b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
index 96dd633..1b57b65 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
@@ -179,6 +179,7 @@ public:
   int swizzle, int swizzle_mask);
void reswizzle(int dst_writemask, int swizzle);
bool can_do_source_mods(const struct brw_device_info *devinfo);
+   bool can_change_types() const;
 
bool reads_flag()
{
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 08f3e91..f5242d3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -280,6 +280,18 @@ vec4_instruction::can_do_source_mods(const struct 
brw_device_info *devinfo)
return true;
 }
 
+bool
+vec4_instruction::can_change_types() const
+{
+   return dst.type == src[0].type &&
+  !src[0].abs && !src[0].negate && !saturate &&
+  (opcode == BRW_OPCODE_MOV ||
+   (opcode == BRW_OPCODE_SEL &&
+dst.type == src[1].type &&
+predicate != BRW_PREDICATE_NONE &&
+!src[1].abs && !src[1].negate));
+}
+
 /**
  * Returns how many MRFs an opcode will write over.
  *
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index 610caef..db99ecb 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ 

Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-14 Thread Ben Widawsky
On Wed, Oct 14, 2015 at 02:43:24PM +0300, Pohjolainen, Topi wrote:
> On Wed, Oct 14, 2015 at 11:39:03AM +0200, Neil Roberts wrote:
> > Ben Widawsky  writes:
> > 
> > > The impetus for this patch comes from a seemingly benign statement within 
> > > the
> > > spec (quoted within the patch). For me, this patch was at some point 
> > > critical
> > > for getting stable piglit results (though this did not seem to be the 
> > > case on a
> > > branch Chad was working on).
> > >
> > > It is very important for clearing multiple color buffer attachments and 
> > > can be
> > > observed in the following piglit tests:
> > > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
> > > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> > >
> > > Signed-off-by: Ben Widawsky 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 
> > > +
> > >  1 file changed, 84 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> > > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > index 7bf52f0..9e6711e 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable)
> > > brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
> > >  }
> > >  
> > > +/**
> > > + * Individually fast clear each color buffer attachment. On previous 
> > > gens this
> > > + * isn't required. The motivation for this comes from one line (which 
> > > seems to
> > > + * be specific to SKL+). The list item is in section titled _MCS Buffer 
> > > for
> > > + * Render Target(s)_
> > > + *
> > > + *   "Since only one RT is bound with a clear pass, only one RT can be 
> > > cleared
> > > + *   at a time. To clear multiple RTs, multiple clear passes are 
> > > required."
> > 
> > This sentence also appears in the HSW PRM so it seems a bit odd if it's
> > only causing problems on SKL. I guess if we get Piglit regressions
> > without it then it makes sense to have the patch. It might be worth just
> > double checking whether this patch is completely necessary. The wording
> > in the commit message seems a little unsure.
> 
> The spec seems to be missing something as the section discussing "Render
> Target Fast Clear" seems to suggest the opposite:
> 
> "The render target(s) is/are bound as they normally would be, with the MCS
>  surface defined in SURFACE_STATE."

I am aware of all this. Neil, yes it is completely necessary for piglit (I don't
know if anything in the real world does this or not).

You are both asking to me to provide something which may be impossible, an
explanation of why the docs and/or hardware are behaving this way. Let me
respond in kind, please provide an alternate patch which fixes:
spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
spec/arb_framebuffer_object/fbo-drawbuffers-none glclear (all subtests)

FWIW Topi, it's also contradicted in 3DSTATE_PS definition.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] i965/fs: Consider type mismatches in saturate propagation.

2015-10-14 Thread Matt Turner
NIR considers bcsel to produce and consume unsigned types, leading to
SEL instructions operating on unsigned types when the data is really
floating-point. Previous to this patch, saturate propagation would
happily transform

   (+f0) sel  g20:UD, g30:UD, g40:UD
 mov.sat  g50:F,  g20:F

into

   (+f0) sel.sat  g20:UD, g30:UD, g40:UD
 mov  g50:F,  g20:F

But since the meaning of .sat is dependent on the type of the
destination register, this is not valid.

Instead, allow saturate propagation to change the types of dest/source
on instructions that are simply copying data in order to propagate the
saturate modifier.

Fixes bad code gen in 158 programs.
---
 src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
index e406c28..8792a8c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
@@ -52,11 +52,12 @@ opt_saturate_propagation_local(fs_visitor *v, bblock_t 
*block)
   ip--;
 
   if (inst->opcode != BRW_OPCODE_MOV ||
+  !inst->saturate ||
   inst->dst.file != GRF ||
+  inst->dst.type != inst->src[0].type ||
   inst->src[0].file != GRF ||
   inst->src[0].abs ||
-  inst->src[0].negate ||
-  !inst->saturate)
+  inst->src[0].negate)
  continue;
 
   int src_var = v->live_intervals->var_from_reg(inst->src[0]);
@@ -65,7 +66,9 @@ opt_saturate_propagation_local(fs_visitor *v, bblock_t *block)
   bool interfered = false;
   foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, 
block) {
  if (scan_inst->overwrites_reg(inst->src[0])) {
-if (scan_inst->is_partial_write())
+if (scan_inst->is_partial_write() ||
+(scan_inst->dst.type != inst->dst.type &&
+ !scan_inst->can_change_types()))
break;
 
 if (scan_inst->saturate) {
@@ -73,6 +76,12 @@ opt_saturate_propagation_local(fs_visitor *v, bblock_t 
*block)
progress = true;
 } else if (src_end_ip <= ip || inst->dst.equals(inst->src[0])) {
if (scan_inst->can_do_saturate()) {
+  if (scan_inst->dst.type != inst->dst.type) {
+ scan_inst->dst.type = inst->dst.type;
+ for (int i = 0; i < scan_inst->sources; i++) {
+scan_inst->src[i].type = inst->dst.type;
+ }
+  }
   scan_inst->saturate = true;
   inst->saturate = false;
   progress = true;
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 04/10] i965/skl: skip fast clears for certain surface formats

2015-10-14 Thread Ben Widawsky
On Wed, Oct 14, 2015 at 01:10:23PM +0200, Neil Roberts wrote:
> It would be nice if you could give some indication of where this list of
> formats came from.
> 
> Unless we expect the list to change with future generations, maybe it
> would be better to make it a static const table? It's a shame to grow
> the context size unnecessarily.
> 
> Regards,
> - Neil
> 

You are correct, I should have referenced it. It is in the section titled
"Render Target Surface Types [SKL+] - Surface Formats for Render Target Messages
[SKL+]"

The supported formats do change, for now we could certainly assume that SKL is
the base set and future GENs add things.

> Ben Widawsky  writes:
> 
> > Initially I had this planned as a patch to be squashed in to the enabling 
> > patch
> > because there is no point enabling fast clears without this. However, Chad
> > merged a patch which disables fast clears on gen9 explicitly, and so I can 
> > hide
> > this behind the revert of that patch. This is a nice I really wanted this 
> > patch
> > as a distinct patch for review. This is a new, weird, and poorly documented
> > restriction for SKL. (In fact, I am still not 100% certain the restriction 
> > is
> > entirely necessary, but there are around 30 piglit regressions without 
> > this).
> >
> > SKL adds compressible render targets and as a result mutates some of the
> > programming for fast clears and resolves. There is a new internal surface 
> > type
> > called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "The Auxiliary 
> > surface is
> > a CCS (Color Control Surface) with compression disabled or an MCS with
> > compression enabled, depending on number of multisamples. MCS (Multisample
> > Control Surface) is a special type of CCS."
> >
> > Signed-off-by: Ben Widawsky 
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.h |  1 +
> >  src/mesa/drivers/dri/i965/brw_surface_formats.c | 27 
> > +
> >  src/mesa/drivers/dri/i965/gen8_surface_state.c  |  8 ++--
> >  src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  3 +++
> >  4 files changed, 37 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> > b/src/mesa/drivers/dri/i965/brw_context.h
> > index e59478a..32b8250 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -1546,6 +1546,7 @@ struct brw_context
> >  
> > uint32_t render_target_format[MESA_FORMAT_COUNT];
> > bool format_supported_as_render_target[MESA_FORMAT_COUNT];
> > +   bool losslessly_compressable[MESA_FORMAT_COUNT];
> >  
> > /* Interpolation modes, one byte per vue slot.
> >  * Used Gen4/5 by the clip|sf|wm stages. Ignored on Gen6+.
> > diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
> > b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> > index 97fff60..d706ecc 100644
> > --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
> > +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> > @@ -693,6 +693,33 @@ brw_init_surface_formats(struct brw_context *brw)
> >}
> > }
> >  
> > +   if (brw->gen >= 9) {
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UNORM16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SNORM16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RG_SINT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RG_UINT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBX_FLOAT16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_B8G8R8A8_UNORM] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_UNORM] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_SNORM] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT8] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT8] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RG_SINT16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RG_UINT16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_R_UINT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_R_SINT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_R_FLOAT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_B8G8R8X8_UNORM] = true;
> > +   }
> > +
> > /* We will check this table for FBO completeness, but 

Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block

2015-10-14 Thread Francisco Jerez
Jordan Justen  writes:

> On 2015-10-13 22:49:08, Iago Toral wrote:
>> On Tue, 2015-10-13 at 09:44 -0700, Jordan Justen wrote:
>> > On 2015-10-13 05:17:37, Francisco Jerez wrote:
>> > > Iago Toral Quiroga  writes:
>> > > 
>> > > > This fixes the following test:
>> > > >
>> > > > [require]
>> > > > GL >= 3.3
>> > > > GLSL >= 3.30
>> > > > GL_ARB_shader_storage_buffer_object
>> > > >
>> > > > [fragment shader]
>> > > > #version 330
>> > > > #extension GL_ARB_shader_storage_buffer_object: require
>> > > >
>> > > > buffer SSBO {
>> > > > mat4 sm4;
>> > > > };
>> > > >
>> > > >
>> > > > mat4 um4;
>> > > >
>> > > > void main() {
>> > > > sm4 *= um4;
>> > > 
>> > > This is using the value of "um4", which is never assigned, so liveness
>> > > analysis will correctly extend its live interval up to the start of the
>> > > block.
>> > 
>> > This test was derived by simplifying a CTS test case.
>> > 
>> > Anyway, I'm not sure what happened on the way to the commit message,
>> > but um4 should be a uniform.
>> > 
>> > http://sprunge.us/cEUe
>> 
>> Oh yes, that was me playing around with the example. The patches also
>> fix the uniform version. Jordan, can you verify if this fixes the CTS
>> test case?
>
> Unfortunately, no. The CTS case has some control flow. I had removed
> it to minimize the test case.
>
> Here is a small shader_test that has control flow and still fails to
> compile with your patches:
>
> http://sprunge.us/LIjA
>
>> In any case, since Curro is working on a more general fix for this
>> stuff, I guess you'd rather wait for his patches...
>
> It depends how long we'd have to wait. :) Anyway, since we don't have
> a short-term fix anyhow, let's wait to see what curro has to say.
>
Assuming that at least some of the scalar writes in the shader are being
introduced by emit_uniformize(), an alternative hack that might get the
shader to compile for the moment would be to change emit_uniformize() to
emit a full SIMD-width BROADCAST instruction instead of a scalar one
(see attachment) -- Which is pretty useless in principle because only
the first component will ever be used but it might keep dataflow
analysis from getting confused.

> -Jordan

diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h
index df10a9d..5db0acf 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
+++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
@@ -392,12 +392,12 @@ namespace brw {
   {
  const fs_builder ubld = exec_all();
  const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0);
- const dst_reg dst = component(vgrf(src.type), 0);
+ const dst_reg dst = vgrf(src.type);
 
  ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
  ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index);
 
- return src_reg(dst);
+ return component(src_reg(dst), 0);
   }
 
   /**


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)

2015-10-14 Thread Daniel Vetter
On Tue, Oct 13, 2015 at 02:51:36PM -0700, Kristian Høgsberg wrote:
> On Tue, Oct 13, 2015 at 7:55 AM, Michel Thierry
>  wrote:
> > On 10/13/2015 3:13 PM, Emil Velikov wrote:
> >>
> >> On 13 October 2015 at 13:16, Michel Thierry 
> >> wrote:
> >>>
> >>> On 10/6/2015 2:12 PM, Michel Thierry wrote:
> 
> 
>  On 10/5/2015 7:06 PM, Kristian Høgsberg wrote:
> >
> >
> > On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry
> >  wrote:
> >>
> >>
> >> On 9/14/2015 2:54 PM, Michał Winiarski wrote:
> >>>
> >>>
> >>>
> >>> On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote:
> 
> 
> 
>  Gen8+ supports 48-bit virtual addresses, but some objects must
>  always be
>  allocated inside the 32-bit address range.
> 
>  In specific, any resource used with flat/heapless
>  (0x-0xf000)
>  General State Heap (GSH) or Instruction State Heap (ISH) must be in
>  a
>  32-bit range, because the General State Offset and Instruction State
>  Offset
>  are limited to 32-bits.
> 
>  The i915 driver has been modified to provide a flag to set when the
>  4GB
>  limit is not necessary in a given bo
>  (EXEC_OBJECT_SUPPORTS_48B_ADDRESS).
>  48-bit range will only be used when explicitly requested.
> 
>  Callers to the existing drm_intel_bo_emit_reloc function should set
>  the
>  use_48b_address_range flag beforehand, in order to use full ppgtt
>  range.
> 
>  v2: Make set/clear functions nops on pre-gen8 platforms, and use
>  them
> internally in emit_reloc functions (Ben)
> s/48BADDRESS/48B_ADDRESS/ (Dave)
>  v3: Keep set/clear functions internal, no-one needs to use them
>  directly.
>  v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type
> before enabling set/clear function, print full offsets in
>  debug
> statements, using port of lower_32_bits and upper_32_bits
>  from linux
> kernel (Michał)
> 
>  References:
> 
>  http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html
>  Cc: Ben Widawsky 
>  Cc: Michał Winiarski 
> >>>
> >>>
> >>>
> >>>
> >>> +Kristian
> >>>
> >>> LGTM.
> >>> Acked-by: Michał Winiarski 
> >>>
>  Signed-off-by: Michel Thierry 
> >>
> >>
> >>
> >>
> >>
> >> Hi Kristian,
> >>
> >> More comments on this?
> >> I've resent the mesa patch with the last changes
> >>
> >>
> >> (http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html).
> >>
> >>
> >> Michał has agreed with the interface and will use it for his work.
> >> If mesa doesn't plan to use this for now, it's ok. The kernel changes
> >> have
> >> been done to prevent any regressions when the support 48-bit flag is
> >> not
> >> set.
> >
> >
> >
> > I didn't get any replies to my last comments on this:
> >
> > http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html
> >
> > Basically, I tried explicitly placing buffers above 8G and didn't see
> > the HW problem described (ie it all worked fine).  I still think
> > there's some confusion as to what the W/A is about.
> >
> > Here's my take: the W/A is a SW-only workaround to handle the cases
> > where a driver uses heapless and 48-bit ppgtt. The problem is that the
> > heaps are limited to 4G but with 48bit ppgtt a buffer can be placed
> > anywhere it the 48 bit address space. If that happens it's consideredd
> > out-of-bounds for the heap and access fails. To prevent this we need
> > to make sure the bos we address in a heapless fashion are placed below
> > 4g.
> >
> > For mesa, we only configure general state base as heap-less, which
> > affects scratch bos. What this boils down to is that we need the 4G
> > restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS
> > and 3DSTATE_PS (and tesselation stage eventually). Look for the
> > OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c
> > and gen8_ps_state.c
> 
> 
> 
>  I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it
>  isn't exclusive to the scratch bos (the error state points to that
>  instruction).
> 
> 
> >>>
> >>> Hi,
> >>>
> >>> Anymore inputs about this patch?
> >>> AFAIK, if changes are required based on further comments from Kristian,
> >>> these will be in the 

Re: [Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler

2015-10-14 Thread Pohjolainen, Topi
On Wed, Oct 14, 2015 at 11:25:40AM +0300, Pohjolainen, Topi wrote:
> On Sat, Oct 10, 2015 at 08:09:01AM -0700, Jason Ekstrand wrote:
> > This commit removes all dependence on GL state by getting rid of the
> > brw_context parameter and the GL data structures.
> > 
> > v2 (Jason Ekstrand):
> >- Patch use_legacy_snorm_formula through as a function argument rather
> >  than trying to go through the shader key.
> > ---
> >  src/mesa/drivers/dri/i965/brw_vec4.cpp | 70 
> > +-
> >  src/mesa/drivers/dri/i965/brw_vs.c | 16 +++-
> >  src/mesa/drivers/dri/i965/brw_vs.h | 12 --
> >  3 files changed, 49 insertions(+), 49 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> > b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > index 4b8390f..8e38729 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > @@ -1937,51 +1937,42 @@ extern "C" {
> >   * Returns the final assembly and the program's size.
> >   */
> >  const unsigned *
> > -brw_vs_emit(struct brw_context *brw,
> > +brw_vs_emit(const struct brw_compiler *compiler, void *log_data,
> >  void *mem_ctx,
> >  const struct brw_vs_prog_key *key,
> >  struct brw_vs_prog_data *prog_data,
> > -struct gl_vertex_program *vp,
> > -struct gl_shader_program *prog,
> > +const nir_shader *shader,
> > +gl_clip_plane *clip_planes,
> > +bool use_legacy_snorm_formula,
> >  int shader_time_index,
> > -unsigned *final_assembly_size)
> > +unsigned *final_assembly_size,
> > +char **error_str)
> >  {
> > const unsigned *assembly = NULL;
> >  
> > -   if (brw->intelScreen->compiler->scalar_vs) {
> > +   if (compiler->scalar_vs) {
> >prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
> >  
> > -  fs_visitor v(brw->intelScreen->compiler, brw,
> > -   mem_ctx, key, _data->base.base,
> > +  fs_visitor v(compiler, log_data, mem_ctx, key, _data->base.base,
> > NULL, /* prog; Only used for TEXTURE_RECTANGLE on gen < 
> > 8 */
> > -   vp->Base.nir, 8, shader_time_index);
> > -  if (!v.run_vs(brw_select_clip_planes(>ctx))) {
> > - if (prog) {
> > -prog->LinkStatus = false;
> > -ralloc_strcat(>InfoLog, v.fail_msg);
> > - }
> > -
> > - _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
> > -   v.fail_msg);
> > +   shader, 8, shader_time_index);
> > +  if (!v.run_vs(clip_planes)) {
> > + if (error_str)
> > +*error_str = ralloc_strdup(mem_ctx, v.fail_msg);
> 
> I don't particularly like the complexity of the error reporting mechanism.
> First vec4_visitor::fail() uses ralloc_asprintf() to create one string, then
> we make a copy of it here and finally the caller of brw_vs_emit() makes yet
> another copy using ralloc_strcat().
> I wonder if we could pass the final destination all the way for the
> vec4_visitor::fail() to augment with ralloc_asprintf() and hence avoid all

Or more appropiately using ralloc_asprintf_append()...

> the indirection in the middle. What do you think?
> 
> >  
> >   return NULL;
> >}
> >  
> > -  fs_generator g(brw->intelScreen->compiler, brw,
> > - mem_ctx, (void *) key, _data->base.base,
> > - v.promoted_constants,
> > +  fs_generator g(compiler, log_data, mem_ctx, (void *) key,
> > + _data->base.base, v.promoted_constants,
> >   v.runtime_check_aads_emit, "VS");
> >if (INTEL_DEBUG & DEBUG_VS) {
> > - char *name;
> > - if (prog) {
> > -name = ralloc_asprintf(mem_ctx, "%s vertex shader %d",
> > -   prog->Label ? prog->Label : "unnamed",
> > -   prog->Name);
> > - } else {
> > -name = ralloc_asprintf(mem_ctx, "vertex program %d",
> > -   vp->Base.Id);
> > - }
> > - g.enable_debug(name);
> > + const char *debug_name =
> > +ralloc_asprintf(mem_ctx, "%s vertex shader %s",
> > +shader->info.label ? shader->info.label : 
> > "unnamed",
> > +shader->info.name);
> > +
> > + g.enable_debug(debug_name);
> >}
> >g.generate_code(v.cfg, 8);
> >assembly = g.get_assembly(final_assembly_size);
> > @@ -1990,26 +1981,19 @@ brw_vs_emit(struct brw_context *brw,
> > if (!assembly) {
> >prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
> >  
> > -  vec4_vs_visitor v(brw->intelScreen->compiler, brw, key, prog_data,
> > -vp->Base.nir, brw_select_clip_planes(>ctx),
> > -mem_ctx, 

Re: [Mesa-dev] [mesa-dev, mesa-demos][PATCH] sharedtex_mt: fix rendering thread hang

2015-10-14 Thread Belal, Awais
Hi Brian,

Do you want me to update this or is it good to go as is?

BR,
Awais


From: mesa-dev [mesa-dev-boun...@lists.freedesktop.org] on behalf of Belal, 
Awais
Sent: Monday, October 12, 2015 6:33 PM
To: Brian Paul; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [mesa-dev, mesa-demos][PATCH] sharedtex_mt: fix 
rendering thread hang

Hi Brian,

Thanks for your reply :)
The move of variable definition was just to make the code look a little cleaner.

BR,
Awais


From: Brian Paul [bri...@vmware.com]
Sent: Monday, October 12, 2015 6:31 PM
To: Belal, Awais; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [mesa-dev, mesa-demos][PATCH] sharedtex_mt: fix 
rendering thread hang

On 10/12/2015 05:25 AM, Belal, Awais wrote:
> Hi,
>
> Is there are a reservation against the below patch?

Looks OK, but one comment below.


> BR,
> Awais
>
> 
> From: mesa-dev [mesa-dev-boun...@lists.freedesktop.org] on behalf of Belal, 
> Awais
> Sent: Thursday, October 08, 2015 2:00 PM
> To: mesa-dev@lists.freedesktop.org
> Subject: [Mesa-dev] [mesa-dev,  mesa-demos][PATCH] sharedtex_mt: fix 
> rendering thread hang
>
> XNextEvent is a blocking call which locks up the display mutex
> this causes the rendering threads to hang when they try call
> glXSwapBuffers() as that tries to take the same mutex in
> underlying calls through XCopyArea().
> So we only go to XNextEvent when it has at least one event
> and we wouldn't lock indefinitely.
>
> Signed-off-by: Awais Belal 
> ---
>   src/xdemos/sharedtex_mt.c | 9 +++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/src/xdemos/sharedtex_mt.c b/src/xdemos/sharedtex_mt.c
> index a90903a..1d503c4 100644
> --- a/src/xdemos/sharedtex_mt.c
> +++ b/src/xdemos/sharedtex_mt.c
> @@ -420,9 +420,14 @@ Resize(struct window *h, unsigned int width, unsigned 
> int height)
>   static void
>   EventLoop(void)
>   {
> +   int i;
> +   XEvent event;
>  while (1) {
> -  int i;
> -  XEvent event;
> +  /* Do we have an event? */
> +  if (XPending(gDpy) == 0) {
> + usleep(1);
> + continue;
> +  }
> XNextEvent(gDpy, );
> for (i = 0; i < NumWindows; i++) {
>   struct window *h = [i];

Was there particular reason to move the i, event declarations?

In any case, I'll commit this in a bit.

-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler

2015-10-14 Thread Pohjolainen, Topi
On Sat, Oct 10, 2015 at 08:09:01AM -0700, Jason Ekstrand wrote:
> This commit removes all dependence on GL state by getting rid of the
> brw_context parameter and the GL data structures.
> 
> v2 (Jason Ekstrand):
>- Patch use_legacy_snorm_formula through as a function argument rather
>  than trying to go through the shader key.
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 70 
> +-
>  src/mesa/drivers/dri/i965/brw_vs.c | 16 +++-
>  src/mesa/drivers/dri/i965/brw_vs.h | 12 --
>  3 files changed, 49 insertions(+), 49 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 4b8390f..8e38729 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1937,51 +1937,42 @@ extern "C" {
>   * Returns the final assembly and the program's size.
>   */
>  const unsigned *
> -brw_vs_emit(struct brw_context *brw,
> +brw_vs_emit(const struct brw_compiler *compiler, void *log_data,
>  void *mem_ctx,
>  const struct brw_vs_prog_key *key,
>  struct brw_vs_prog_data *prog_data,
> -struct gl_vertex_program *vp,
> -struct gl_shader_program *prog,
> +const nir_shader *shader,
> +gl_clip_plane *clip_planes,
> +bool use_legacy_snorm_formula,
>  int shader_time_index,
> -unsigned *final_assembly_size)
> +unsigned *final_assembly_size,
> +char **error_str)
>  {
> const unsigned *assembly = NULL;
>  
> -   if (brw->intelScreen->compiler->scalar_vs) {
> +   if (compiler->scalar_vs) {
>prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
>  
> -  fs_visitor v(brw->intelScreen->compiler, brw,
> -   mem_ctx, key, _data->base.base,
> +  fs_visitor v(compiler, log_data, mem_ctx, key, _data->base.base,
> NULL, /* prog; Only used for TEXTURE_RECTANGLE on gen < 8 
> */
> -   vp->Base.nir, 8, shader_time_index);
> -  if (!v.run_vs(brw_select_clip_planes(>ctx))) {
> - if (prog) {
> -prog->LinkStatus = false;
> -ralloc_strcat(>InfoLog, v.fail_msg);
> - }
> -
> - _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
> -   v.fail_msg);
> +   shader, 8, shader_time_index);
> +  if (!v.run_vs(clip_planes)) {
> + if (error_str)
> +*error_str = ralloc_strdup(mem_ctx, v.fail_msg);

I don't particularly like the complexity of the error reporting mechanism.
First vec4_visitor::fail() uses ralloc_asprintf() to create one string, then
we make a copy of it here and finally the caller of brw_vs_emit() makes yet
another copy using ralloc_strcat().
I wonder if we could pass the final destination all the way for the
vec4_visitor::fail() to augment with ralloc_asprintf() and hence avoid all
the indirection in the middle. What do you think?

>  
>   return NULL;
>}
>  
> -  fs_generator g(brw->intelScreen->compiler, brw,
> - mem_ctx, (void *) key, _data->base.base,
> - v.promoted_constants,
> +  fs_generator g(compiler, log_data, mem_ctx, (void *) key,
> + _data->base.base, v.promoted_constants,
>   v.runtime_check_aads_emit, "VS");
>if (INTEL_DEBUG & DEBUG_VS) {
> - char *name;
> - if (prog) {
> -name = ralloc_asprintf(mem_ctx, "%s vertex shader %d",
> -   prog->Label ? prog->Label : "unnamed",
> -   prog->Name);
> - } else {
> -name = ralloc_asprintf(mem_ctx, "vertex program %d",
> -   vp->Base.Id);
> - }
> - g.enable_debug(name);
> + const char *debug_name =
> +ralloc_asprintf(mem_ctx, "%s vertex shader %s",
> +shader->info.label ? shader->info.label : 
> "unnamed",
> +shader->info.name);
> +
> + g.enable_debug(debug_name);
>}
>g.generate_code(v.cfg, 8);
>assembly = g.get_assembly(final_assembly_size);
> @@ -1990,26 +1981,19 @@ brw_vs_emit(struct brw_context *brw,
> if (!assembly) {
>prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
>  
> -  vec4_vs_visitor v(brw->intelScreen->compiler, brw, key, prog_data,
> -vp->Base.nir, brw_select_clip_planes(>ctx),
> -mem_ctx, shader_time_index,
> -!_mesa_is_gles3(>ctx));
> +  vec4_vs_visitor v(compiler, log_data, key, prog_data,
> +shader, clip_planes, mem_ctx,
> +shader_time_index, use_legacy_snorm_formula);
>if (!v.run()) {
> - if (prog) {
> -prog->LinkStatus = false;
> 

Re: [Mesa-dev] [PATCH v2 18/17 (was 10/17)] i965/vs: Move use_legacy_snorm_formula into the shader key

2015-10-14 Thread Pohjolainen, Topi
On Sat, Oct 10, 2015 at 08:05:59AM -0700, Jason Ekstrand wrote:
> This is really an input into the shader compiler so it kind of makes sense
> in the key.  Also, given where it's placed into the key, it doesn't
> actually make it any bigger.
> 
> v2 (Jason Ekstrand):
>- Rebase on top of the compiler clean-ups so the affects of this patch
>  can better be studied without being in the middle of a series.

I guess you are planning to check the fixed hangs before pushing. The patch
itself looks good:

Reviewed-by: Topi Pohjolainen 

> ---
>  src/mesa/drivers/dri/i965/brw_compiler.h  | 3 ++-
>  src/mesa/drivers/dri/i965/brw_vec4.cpp| 4 +---
>  src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp | 9 -
>  src/mesa/drivers/dri/i965/brw_vs.c| 3 ++-
>  src/mesa/drivers/dri/i965/brw_vs.h| 5 +
>  5 files changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
> b/src/mesa/drivers/dri/i965/brw_compiler.h
> index 4bc1caa..153e381 100644
> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
> @@ -161,6 +161,8 @@ struct brw_vs_prog_key {
>  
> bool clamp_vertex_color:1;
>  
> +   bool use_legacy_snorm_formula:1;
> +
> /**
>  * How many user clipping planes are being uploaded to the vertex shader 
> as
>  * push constants.
> @@ -585,7 +587,6 @@ brw_compile_vs(const struct brw_compiler *compiler, void 
> *log_data,
> struct brw_vs_prog_data *prog_data,
> const struct nir_shader *shader,
> gl_clip_plane *clip_planes,
> -   bool use_legacy_snorm_formula,
> int shader_time_index,
> unsigned *final_assembly_size,
> char **error_str);
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 8636323..5336590 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1943,7 +1943,6 @@ brw_compile_vs(const struct brw_compiler *compiler, 
> void *log_data,
> struct brw_vs_prog_data *prog_data,
> const nir_shader *shader,
> gl_clip_plane *clip_planes,
> -   bool use_legacy_snorm_formula,
> int shader_time_index,
> unsigned *final_assembly_size,
> char **error_str)
> @@ -1982,8 +1981,7 @@ brw_compile_vs(const struct brw_compiler *compiler, 
> void *log_data,
>prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
>  
>vec4_vs_visitor v(compiler, log_data, key, prog_data,
> -shader, clip_planes, mem_ctx,
> -shader_time_index, use_legacy_snorm_formula);
> +shader, clip_planes, mem_ctx, shader_time_index);
>if (!v.run()) {
>   if (error_str)
>  *error_str = ralloc_strdup(mem_ctx, v.fail_msg);
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> index 485a80e..9cf04cd 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> @@ -77,7 +77,8 @@ vec4_vs_visitor::emit_prolog()
>  /* ES 3.0 has different rules for converting signed normalized
>   * fixed-point numbers than desktop GL.
>   */
> -if ((wa_flags & BRW_ATTRIB_WA_SIGN) && 
> !use_legacy_snorm_formula) {
> +if ((wa_flags & BRW_ATTRIB_WA_SIGN) &&
> +!key->use_legacy_snorm_formula) {
> /* According to equation 2.2 of the ES 3.0 specification,
>  * signed normalization conversion is done by:
>  *
> @@ -304,14 +305,12 @@ vec4_vs_visitor::vec4_vs_visitor(const struct 
> brw_compiler *compiler,
>   const nir_shader *shader,
>   gl_clip_plane *clip_planes,
>   void *mem_ctx,
> - int shader_time_index,
> - bool use_legacy_snorm_formula)
> + int shader_time_index)
> : vec4_visitor(compiler, log_data, >tex, _prog_data->base, shader,
>mem_ctx, false /* no_spills */, shader_time_index),
>   key(key),
>   vs_prog_data(vs_prog_data),
> - clip_planes(clip_planes),
> - use_legacy_snorm_formula(use_legacy_snorm_formula)
> + clip_planes(clip_planes)
>  {
>  }
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
> b/src/mesa/drivers/dri/i965/brw_vs.c
> index 9c9b83b..3b3eb8b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vs.c
> +++ b/src/mesa/drivers/dri/i965/brw_vs.c
> @@ -184,7 +184,6 @@ brw_codegen_vs_prog(struct brw_context *brw,
> program = 

Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-14 Thread Alejandro Piñeiro
On 13/10/15 23:36, Matt Turner wrote:
> On Tue, Oct 13, 2015 at 1:49 AM, Alejandro Piñeiro  
> wrote:
>> On 13/10/15 03:10, Matt Turner wrote:
>>> Looks like this is causing an intermittent failure on HSW in our
>>> Jenkins system (but I'm not able to reproduce locally) and a
>>> consistent failure on G45. I'll investigate that.
>> Ok, will hold on, and meanwhile I will try to reproduce the problem on
>> HSW. Unfortunately I don't have any G45 available, so I will not be able
>> to help on that. Thanks for taking a look there.
> Okay, I think I see what's going on:
>
> --- good2015-10-13 13:34:40.753357253 -0700
> +++ bad 2015-10-13 13:36:06.114290094 -0700
> -and(8)  g6<1>.xDg6<4,4,1>.xD1D  { align16 };
> -cmp.nz.f0(8)null-g6<4,4,1>.xD   0D  { align16 };
> +and.nz.f0(8)g6<1>.xDg6<4,4,1>.xD1D  { align16 };
>
> We're propagating a .nz conditional mod from a CMP with a null dest
> (that has a writemask of .xyzw) to an AND that has a writemask of only
> .x, so only the .x channel of the flag is now being updated.

That is really common. And that is the idea behind the first patch of
the series that changes how if's are emitted, and allowing to optimize
if scan_inst->writemask is WRITEMASK_X. I will use the piglit test
glsl-1.30/execution/switch/vs-pervertex.shader_test to explain this.

If we use current master, we get the following assembly:
1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
align16 1Q compacted };
2: mov.nz.f0(8)nullg20<4,4,1>.xD   {
align16 1Q };
3: (+f0) if(8) JIP: 6  UIP: 6  {
align16 1Q };

If we just simplify out the mov.nz at #2, the test would fail, because
as in the example you mentioned, null has a XYZW writemask, but g19 a
writemask X. So this optimized version, f0 only have x updated, and #3
is doing a normal if, that uses all the channels.

But right now the if just needs to check against one component, the x.
So that is the reason (among others) that my patch 1 changes how if's
are emitted. So, using both my patches 1 and 2, the assembly outcome
would be the following:
1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
align16 1Q compacted };
2: (+f0.x) if(8)   JIP: 6  UIP: 6  {
align16 1Q };

It is true that now #1 only updates the x channel. But the if at #2 only
use that channel now, so the test still passes.

>
> I think for now the thing to do is add
>
>(inst->dst.writemask & ~scan_inst->dst.writemask) != 0)

This would bail out basically all if's.

So guessing about what is happening here, as I mentioned, we are
allowing to pass that condition if scan_inst->dst.writemask equal to
WRITEMASK_X assuming that they come from an if, so we don't need to
update all f0 channels. But as Jason mentioned in our off-list chat,
that would not be the case if we are dealing with a sel. So something
like this:
1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
align16 1Q compacted };
2: mov.nz.f0(8)nullg20<4,4,1>.xD   {
align16 1Q };
3: (+f0.x) if(8) JIP: 6  UIP: 6  {
align16 1Q };

#2 can be optimized out. But if we replace that if for an sel at #3, we
can't.

If that is the case, in addition to check scan_inst and inst, we would
need to check in which kind of instruction the flag register is used later.

But as I said, Im just guessing until I see the full failing test.

> to the list of conditions under which we bail out. That is, if the
> instruction we want to propagate the cmod onto writes fewer channels,
> we can't do the optimization.
>
> With that, the block should look like:
>
>  if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) ||
>  scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
>  (scan_inst->dst.writemask != WRITEMASK_X &&
>   scan_inst->dst.writemask != WRITEMASK_XYZW) ||
>  (scan_inst->dst.writemask == WRITEMASK_XYZW &&
>   inst->src[0].swizzle != BRW_SWIZZLE_XYZW) ||
>  (inst->dst.writemask & ~scan_inst->dst.writemask) != 0)
> break;
>
> The good news is that, unless I've done something wrong, this doesn't
> affect any shaders in shader-db on ILK or HSW (I only tested those
> two, but I expect the results are the same everywhere). Apparently
> this is a pretty rare case.

Are you sure? I have made a run adding your condition, and now comparing
master vs having the optimization I get this:
total instructions in shared programs: 6240631 -> 6240471 (-0.00%)
instructions in affected programs: 18965 -> 18805 (-0.84%)
helped:160
HURT:  0

That is a really small gain. Or put in other way, if we compare the
conditions I have on the original patches vs adding the condition you
are proposing, 

Re: [Mesa-dev] [PATCH V4 2/6] glsl: assign hidden uniforms their slot id earlier

2015-10-14 Thread Timothy Arceri
On Tue, 2015-10-13 at 12:20 +0200, Marek Olšák wrote:
> On Tue, Oct 13, 2015 at 10:13 AM, Timothy Arceri <
> t_arc...@yahoo.com.au> wrote:
> > On Mon, 2015-10-12 at 01:06 +0200, Marek Olšák wrote:
> > > On Sun, Oct 11, 2015 at 9:20 AM, Timothy Arceri <
> > > t_arc...@yahoo.com.au> wrote:
> > > > On Sat, 2015-10-10 at 18:06 +0200, Marek Olšák wrote:
> > > > > Hi Timothy,
> > > > > 
> > > > > One of these 3 commits breaks compilation for Talos shaders
> > > > > with
> > > > > gallium. My piglit patch "glsl-1.30/sampler-bug: ..."
> > > > > contains a
> > > > > minimal test case. I can't say which commit, because Mesa
> > > > > fails
> > > > > to
> > > > > build between them.
> > > > >  It has something to do with uniforms, structures,
> > > > > and samplers.
> > > > > 
> > > > > commit dcd9cd03837545055ce2a315e7e8840cc3254d1a
> > > > > Author: Timothy Arceri 
> > > > > Date:   Sun Aug 30 12:50:34 2015 +1000
> > > > > 
> > > > > glsl: store uniform slot id in var location field
> > > > > 
> > > > 
> > > > Hi Marek,
> > > > 
> > > > The piglit test passes on my intel based laptop I can't test on
> > > > anything else until later this week (as I'm traveling) but my
> > > > guess
> > > > is
> > > > its something to do with the above patch.
> > > > 
> > > > The other two patches shouldn't change anything for gallium
> > > > drivers
> > > > "glsl: assign hidden uniforms their slot id earlier" just
> > > > assigns
> > > > hidden uniforms their slot id earlier and there shouldn't be
> > > > any
> > > > difference once the IR gets to glsl_to_tgsi.
> > > > 
> > > > Also "glsl: order indices for samplers inside a struct array"
> > > > shouldn't
> > > > change things either as in your test the sampler are not inside
> > > > the
> > > > struct.
> > > > 
> > > > There is some code in the glsl_to_tgsi pass that looks like the
> > > > location field would have always been -1 for uniforms other the
> > > > UBO
> > > > and
> > > > UBO members maybe this has something to do with the problem now
> > > > that
> > > > all uniforms now get a non -1 value.
> > > > 
> > > >   case ir_var_uniform:
> > > >  entry = new(mem_ctx) variable_storage(var,
> > > > PROGRAM_UNIFORM,
> > > >var
> > > > ->data.location);
> > > >  this->variables.push_tail(entry);
> > > >  break;
> > > > 
> > > > I hope this helps get you started. If you haven't figured it
> > > > out by
> > > > later in the week than I'll take a look on my desktop once I
> > > > get
> > > > home.
> > > 
> > > The problem is that _mesa_get_sampler_uniform_value returns a
> > > sampler
> > > index which is greater than 16. If I allow large sampler indices,
> > > it
> > > starts to fail with sampler.file==TGSI_FILE_NULL in the TGSI
> > > codegen
> > > later. I didn't investigate further.
> > > 
> > > Marek
> > 
> > 
> > So from my bisect run your test was passing until:
> > 
> > a907b5dd162b7911b8c21f6d54837831bc078059 is the first bad commit
> > commit a907b5dd162b7911b8c21f6d54837831bc078059
> > Author: Marek Olšák 
> > Date:   Mon Oct 5 03:26:48 2015 +0200
> > 
> > st/mesa: translate fragment shaders into TGSI when we get them
> > 
> > Reviewed-by: Dave Airlie 
> > Reviewed-by: Brian Paul 
> > Tested-by: Brian Paul 
> 
> It's a linker test, so you really need to set ST_DEBUG=precompile to
> get that shader translated before this commit. Using --enable-debug
> should make it fail an assertion earlier without ST_DEBUG=precompile.
> 
> You didn't need to bisect. I had bisected the bug already and it's
> one
> of your 3 commits.

I did a bisect because the test passed at my commits but not in master.
It would have helped if you included the information about needing to
use ST_DEBUG=precompile in the original email. 

The test passed even with --enable-debug, anyway I now get the error
using ST_DEBUG=precompile so I'll take a better looking into the
problem tomorrow.

> 
> Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-14 Thread Francisco Jerez
Alejandro Piñeiro  writes:

> On 13/10/15 23:36, Matt Turner wrote:
>> On Tue, Oct 13, 2015 at 1:49 AM, Alejandro Piñeiro  
>> wrote:
>>> On 13/10/15 03:10, Matt Turner wrote:
 Looks like this is causing an intermittent failure on HSW in our
 Jenkins system (but I'm not able to reproduce locally) and a
 consistent failure on G45. I'll investigate that.
>>> Ok, will hold on, and meanwhile I will try to reproduce the problem on
>>> HSW. Unfortunately I don't have any G45 available, so I will not be able
>>> to help on that. Thanks for taking a look there.
>> Okay, I think I see what's going on:
>>
>> --- good2015-10-13 13:34:40.753357253 -0700
>> +++ bad 2015-10-13 13:36:06.114290094 -0700
>> -and(8)  g6<1>.xDg6<4,4,1>.xD1D  { align16 };
>> -cmp.nz.f0(8)null-g6<4,4,1>.xD   0D  { align16 };
>> +and.nz.f0(8)g6<1>.xDg6<4,4,1>.xD1D  { align16 };
>>
>> We're propagating a .nz conditional mod from a CMP with a null dest
>> (that has a writemask of .xyzw) to an AND that has a writemask of only
>> .x, so only the .x channel of the flag is now being updated.
>
> That is really common. And that is the idea behind the first patch of
> the series that changes how if's are emitted, and allowing to optimize
> if scan_inst->writemask is WRITEMASK_X. I will use the piglit test
> glsl-1.30/execution/switch/vs-pervertex.shader_test to explain this.
>
> If we use current master, we get the following assembly:
> 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
> align16 1Q compacted };
> 2: mov.nz.f0(8)nullg20<4,4,1>.xD   {
> align16 1Q };
> 3: (+f0) if(8) JIP: 6  UIP: 6  {
> align16 1Q };
>
> If we just simplify out the mov.nz at #2, the test would fail, because
> as in the example you mentioned, null has a XYZW writemask, but g19 a
> writemask X. So this optimized version, f0 only have x updated, and #3
> is doing a normal if, that uses all the channels.
>
> But right now the if just needs to check against one component, the x.
> So that is the reason (among others) that my patch 1 changes how if's
> are emitted. So, using both my patches 1 and 2, the assembly outcome
> would be the following:
> 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
> align16 1Q compacted };
> 2: (+f0.x) if(8)   JIP: 6  UIP: 6  {
> align16 1Q };
>
> It is true that now #1 only updates the x channel. But the if at #2 only
> use that channel now, so the test still passes.
>
>>
>> I think for now the thing to do is add
>>
>>(inst->dst.writemask & ~scan_inst->dst.writemask) != 0)
>
> This would bail out basically all if's.
>
I agree with Matt that this check is necessary, otherwise
cmod_propagation will replace the pair of instruction with another
instruction which doesn't have equivalent semantics.  If you still want
cases like the one you pasted below involving an if conditional that
only reads the x components to be simplified, I suggest you have a look
at dead code elimination and fix it so that writemask bits of an
instruction that writes a flag register (e.g. the "mov.nz.f0") are
turned off for components which are determined to be dead, kind of like
is done already for the vec4 components of GRFs.

With that change the assembly from your example would look like this
after DCE:

| 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD { 
align16 1Q compacted };
| 2: mov.nz.f0(8)null.x  g20<4,4,1>.xD { 
align16 1Q };
| 3: (+f0.x) if(8)   JIP: 6  UIP: 6{ 
align16 1Q };

Which cmod propagation would be allowed to simplify further.

> So guessing about what is happening here, as I mentioned, we are
> allowing to pass that condition if scan_inst->dst.writemask equal to
> WRITEMASK_X assuming that they come from an if, so we don't need to
> update all f0 channels. But as Jason mentioned in our off-list chat,
> that would not be the case if we are dealing with a sel. So something
> like this:
> 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
> align16 1Q compacted };
> 2: mov.nz.f0(8)nullg20<4,4,1>.xD   {
> align16 1Q };
> 3: (+f0.x) if(8) JIP: 6  UIP: 6  {
> align16 1Q };
>
> #2 can be optimized out. But if we replace that if for an sel at #3, we
> can't.
>
> If that is the case, in addition to check scan_inst and inst, we would
> need to check in which kind of instruction the flag register is used later.
>
> But as I said, Im just guessing until I see the full failing test.
>
>> to the list of conditions under which we bail out. That is, if the
>> instruction we want to propagate the cmod onto writes fewer channels,
>> we can't do the optimization.
>>
>> With that, the block should look like:

Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-14 Thread Neil Roberts
Ben Widawsky  writes:

> The impetus for this patch comes from a seemingly benign statement within the
> spec (quoted within the patch). For me, this patch was at some point critical
> for getting stable piglit results (though this did not seem to be the case on 
> a
> branch Chad was working on).
>
> It is very important for clearing multiple color buffer attachments and can be
> observed in the following piglit tests:
> spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
> spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
>
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 
> +
>  1 file changed, 84 insertions(+), 13 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index 7bf52f0..9e6711e 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable)
> brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
>  }
>  
> +/**
> + * Individually fast clear each color buffer attachment. On previous gens 
> this
> + * isn't required. The motivation for this comes from one line (which seems 
> to
> + * be specific to SKL+). The list item is in section titled _MCS Buffer for
> + * Render Target(s)_
> + *
> + *   "Since only one RT is bound with a clear pass, only one RT can be 
> cleared
> + *   at a time. To clear multiple RTs, multiple clear passes are required."

This sentence also appears in the HSW PRM so it seems a bit odd if it's
only causing problems on SKL. I guess if we get Piglit regressions
without it then it makes sense to have the patch. It might be worth just
double checking whether this patch is completely necessary. The wording
in the commit message seems a little unsure.

> + *
> + * The code follows the same idea as the resolve code which creates a fake 
> FBO
> + * to avoid interfering with too much of the GL state.
> + */

What state are you trying to avoid interfering with? Maybe the resolve
code only creates an FBO because it needs to work on an abritrary
intel_mipmap_tree which might not already have an FBO associated with
it. It seems like it should be enough just to call
_mesa_meta_drawbuffers_from_bitfield(1 << index) instead of creating an
FBO. If you did that you could also make the loop a bit nicer by just
looping over the bits set in fast_clear_buffers with ffs. The
glDrawBuffers state is already saved as part of the meta save state and
the existing code for gen<9 also modifies it.

> +static void
> +fast_clear_attachments(struct brw_context *brw,
> +   struct gl_framebuffer *fb,
> +   uint32_t fast_clear_buffers,
> +   struct rect fast_clear_rect)
> +{
> +   assert(brw->gen >= 9);
> +   struct gl_context *ctx = >ctx;
> +
> +   GLuint old_fb = ctx->DrawBuffer->Name;
> +
> +   for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) {
> +  struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf];
> +  struct intel_renderbuffer *irb = intel_renderbuffer(rb);
> +  GLuint fbo, rbo;
> +  int index = fb->_ColorDrawBufferIndexes[buf];
> +
> +  if (!((1 << index) & fast_clear_buffers))
> + continue;
> +
> +  _mesa_GenFramebuffers(1, );
> +  rbo = brw_get_rb_for_slice(brw, irb->mt, 0, 0, false);
> +
> +  _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
> +  _mesa_FramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER,
> +GL_COLOR_ATTACHMENT0,
> +GL_RENDERBUFFER, rbo);
> +  _mesa_DrawBuffer(GL_COLOR_ATTACHMENT0);
> +
> +  brw_fast_clear_init(brw);

Is is really necessary to call this for every buffer? It looks like it's
called once before fast_clear_attachments is called so maybe it doesn't
need to be in this function at all?

> +
> +  use_rectlist(brw, true);

Same for this.

> +
> +  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> +
> +  /* SKL+ also has a resolve mode for compressed render targets and thus 
> more
> +   * bits to let us select the type of resolve.  For fast clear 
> resolves, it
> +   * turns out we can use the same value as pre-SKL though.
> +   */
> +  set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
> +  brw_draw_rectlist(ctx, _clear_rect, MAX2(1, fb->MaxNumLayers));
> +  set_fast_clear_op(brw, 0);
> +  use_rectlist(brw, false);

And this.

> +
> +  _mesa_DeleteRenderbuffers(1, );
> +  _mesa_DeleteFramebuffers(1, );
> +
> +  /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll
> +   * resolve them eventually.
> +   */
> +  irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
> +   }
> +
> +   _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, 

Re: [Mesa-dev] [PATCH 09/10] i965/meta: Remove fast_clear_color variable

2015-10-14 Thread Neil Roberts
This patch doesn't look right. See this sentence in “Render Target Fast
Clear”:

“The pixel shader kernel requires no attributes, and delivers a value of
 0x in all channels of the render target write message”

Presumably the fast_clear_color is trying to implement this restriction.

Regards,
- Neil

Ben Widawsky  writes:

> It doesn't actually serve a purpose AFAICT (in fact, I'm not certain what it's
> meant to do).
>
> Cc: Kristian Høgsberg 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index 41afc9a..9c51ffb 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -390,8 +390,6 @@ set_fast_clear_color(struct brw_context *brw,
> }
>  }
>  
> -static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 };
> -
>  static void
>  set_fast_clear_op(struct brw_context *brw, uint32_t op)
>  {
> @@ -472,7 +470,7 @@ fast_clear_attachments(struct brw_context *brw,
>  
>use_rectlist(brw, true);
>  
> -  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> +  brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
>  
>/* SKL+ also has a resolve mode for compressed render targets and thus 
> more
> * bits to let us select the type of resolve.  For fast clear 
> resolves, it
> @@ -670,7 +668,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
> gl_framebuffer *fb,
>fast_clear_attachments(brw, fb, fast_clear_buffers, fast_clear_rect);
> } else if (fast_clear_buffers) {
>_mesa_meta_drawbuffers_from_bitfield(fast_clear_buffers);
> -  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> +  brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
>set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
>brw_draw_rectlist(ctx, _clear_rect, layers);
>set_fast_clear_op(brw, 0);
> @@ -785,7 +783,7 @@ brw_meta_resolve_color(struct brw_context *brw,
>  
> use_rectlist(brw, true);
>  
> -   brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> +   brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
>  
> /* SKL+ also has a resolve mode for compressed render targets and thus 
> more
>  * bits to let us select the type of resolve.  For fast clear resolves, it
> -- 
> 2.6.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/fbo: use pipe_surface_release instead of pipe_surface_reference

2015-10-14 Thread Roland Scheidegger
Am 14.10.2015 um 18:03 schrieb Krzysztof A. Sobiecki:
> From: Krzysztof Sobiecki 
> 
> pipe_surface_reference have problems with deleted contexts,
> so use of pipe_surface_release might be more appropriate.
> 
> Fixes Wasteland 2 Director's Cut crash on start.
> ---
>  src/mesa/state_tracker/st_cb_fbo.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/mesa/state_tracker/st_cb_fbo.c 
> b/src/mesa/state_tracker/st_cb_fbo.c
> index ff703fa..2a2eb09 100644
> --- a/src/mesa/state_tracker/st_cb_fbo.c
> +++ b/src/mesa/state_tracker/st_cb_fbo.c
> @@ -456,7 +456,7 @@ st_update_renderbuffer_surface(struct st_context *st,
>surf_tmpl.u.tex.first_layer = first_layer;
>surf_tmpl.u.tex.last_layer = last_layer;
>  
> -  pipe_surface_reference(>surface, NULL);
> +  pipe_surface_release(pipe, >surface);
>  
>strb->surface = pipe->create_surface(pipe, resource, _tmpl);
> }
> 

Sounds to me like the surface gets deleted in another context - since
surfaces are per context this is illegal.
So, in other words, you're exchanging one bug for another. Albeit this
might be preferable for now...

Roland

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] st/omx/dec/h264: fix field picture type 0 poc disorder

2015-10-14 Thread Leo Liu
Signed-off-by: Leo Liu 
Cc: "10.6 11.0" 
---
 src/gallium/state_trackers/omx/vid_dec_h264.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/gallium/state_trackers/omx/vid_dec_h264.c 
b/src/gallium/state_trackers/omx/vid_dec_h264.c
index 18d8803..f66ed89 100644
--- a/src/gallium/state_trackers/omx/vid_dec_h264.c
+++ b/src/gallium/state_trackers/omx/vid_dec_h264.c
@@ -753,10 +753,14 @@ static void slice_header(vid_dec_PrivateType *priv, 
struct vl_rbsp *rbsp,
  priv->codec_data.h264.delta_pic_order_cnt_bottom = 
delta_pic_order_cnt_bottom;
   }
 
-  priv->picture.h264.field_order_cnt[0] = pic_order_cnt_msb + 
pic_order_cnt_lsb;
-  priv->picture.h264.field_order_cnt[1] = pic_order_cnt_msb + 
pic_order_cnt_lsb;
-  if (!priv->picture.h264.field_pic_flag)
- priv->picture.h264.field_order_cnt[1] += 
priv->codec_data.h264.delta_pic_order_cnt_bottom;
+  if (!priv->picture.h264.field_pic_flag) {
+ priv->picture.h264.field_order_cnt[0] = pic_order_cnt_msb + 
pic_order_cnt_lsb;
+ priv->picture.h264.field_order_cnt[1] = 
priv->picture.h264.field_order_cnt [0] +
+  
priv->codec_data.h264.delta_pic_order_cnt_bottom;
+  } else if (!priv->picture.h264.bottom_field_flag)
+ priv->picture.h264.field_order_cnt[0] = pic_order_cnt_msb + 
pic_order_cnt_lsb;
+  else
+ priv->picture.h264.field_order_cnt[1] = pic_order_cnt_msb + 
pic_order_cnt_lsb;
 
} else if (sps->pic_order_cnt_type == 1) {
   unsigned MaxFrameNum = 1 << (sps->log2_max_frame_num_minus4 + 4);
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] main: Match DispatchCompute* API validation from main specification

2015-10-14 Thread Jordan Justen
There is a discrepancy between the ARB_compute_shader specification,
and the OpenGL 4.3 and OpenGLES 3.1 specifications. With regards to
the indirect dispatch parameter, unsupported value errors should
return INVALID_VALUE according to the main specifications, whereas the
extension specification indicated INVALID_OPERATION should be
returned.

Here we update the code to match the main specifications, and update
the citations use the main specification rather than the extension
specification.

Signed-off-by: Jordan Justen 
---
 Fixes ES31-CTS.compute_shader.api-indirect

 src/mesa/main/api_validate.c | 37 ++---
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index a46c194..c286945 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -895,6 +895,11 @@ check_valid_to_compute(struct gl_context *ctx, const char 
*function)
   return false;
}
 
+   /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
+*
+* "An INVALID_OPERATION error is generated if there is no active program
+*  for the compute shader stage."
+*/
prog = ctx->Shader.CurrentProgram[MESA_SHADER_COMPUTE];
if (prog == NULL || prog->_LinkedShaders[MESA_SHADER_COMPUTE] == NULL) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
@@ -917,6 +922,12 @@ _mesa_validate_DispatchCompute(struct gl_context *ctx,
   return GL_FALSE;
 
for (i = 0; i < 3; i++) {
+  /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
+   *
+   * "An INVALID_VALUE error is generated if any of num_groups_x,
+   *  num_groups_y and num_groups_z are greater than or equal to the
+   *  maximum work group count for the corresponding dimension."
+   */
   if (num_groups[i] > ctx->Const.MaxComputeWorkGroupCount[i]) {
  _mesa_error(ctx, GL_INVALID_VALUE,
  "glDispatchCompute(num_groups_%c)", 'x' + i);
@@ -937,24 +948,33 @@ valid_dispatch_indirect(struct gl_context *ctx,
if (!check_valid_to_compute(ctx, name))
   return GL_FALSE;
 
-   /* From the ARB_compute_shader specification:
+   /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
 *
-* "An INVALID_OPERATION error is generated [...] if  is less
-*  than zero or not a multiple of the size, in basic machine units, of
-*  uint."
+* "An INVALID_VALUE error is generated if indirect is negative or is not a
+*  multiple of four."
+*
+* Note that the OpenGLES 3.1 specification matches this, but this is
+* different than the extension specification which has a return of
+* INVALID_OPERATION instead.
 */
if ((GLintptr)indirect & (sizeof(GLuint) - 1)) {
-  _mesa_error(ctx, GL_INVALID_OPERATION,
+  _mesa_error(ctx, GL_INVALID_VALUE,
   "%s(indirect is not aligned)", name);
   return GL_FALSE;
}
 
if ((GLintptr)indirect < 0) {
-  _mesa_error(ctx, GL_INVALID_OPERATION,
+  _mesa_error(ctx, GL_INVALID_VALUE,
   "%s(indirect is less than zero)", name);
   return GL_FALSE;
}
 
+   /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
+*
+* "An INVALID_OPERATION error is generated if no buffer is bound to the
+*  DRAW_INDIRECT_BUFFER binding, or if the command would source data
+*  beyond the end of the buffer object."
+*/
if (!_mesa_is_bufferobj(ctx->DispatchIndirectBuffer)) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
   "%s: no buffer bound to DISPATCH_INDIRECT_BUFFER", name);
@@ -967,11 +987,6 @@ valid_dispatch_indirect(struct gl_context *ctx,
   return GL_FALSE;
}
 
-   /* From the ARB_compute_shader specification:
-*
-* "An INVALID_OPERATION error is generated if this command sources data
-*  beyond the end of the buffer object [...]"
-*/
if (ctx->DispatchIndirectBuffer->Size < end) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
   "%s(DISPATCH_INDIRECT_BUFFER too small)", name);
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] configure: show which gallium drivers/sts are built

2015-10-14 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 configure.ac | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 217281f..7112e43 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1576,6 +1576,8 @@ fi
 AM_CONDITIONAL(HAVE_EGL, test "x$enable_egl" = xyes)
 AC_SUBST([EGL_LIB_DEPS])
 
+gallium_st=""
+
 dnl
 dnl XA configuration
 dnl
@@ -1589,6 +1591,7 @@ if test "x$enable_xa" = xyes; then
   Example: ./configure --enable-xa --with-gallium-drivers=svga...])
 fi
 enable_gallium_loader=$enable_shared_pipe_drivers
+gallium_st="xa $gallium_st"
 fi
 AM_CONDITIONAL(HAVE_ST_XA, test "x$enable_xa" = xyes)
 
@@ -1634,24 +1637,28 @@ AM_CONDITIONAL(NEED_GALLIUM_VL_WINSYS, test 
"x$need_gallium_vl_winsys" = xyes)
 if test "x$enable_xvmc" = xyes; then
 PKG_CHECK_MODULES([XVMC], [xvmc >= $XVMC_REQUIRED])
 enable_gallium_loader=$enable_shared_pipe_drivers
+gallium_st="xvmc $gallium_st"
 fi
 AM_CONDITIONAL(HAVE_ST_XVMC, test "x$enable_xvmc" = xyes)
 
 if test "x$enable_vdpau" = xyes; then
 PKG_CHECK_MODULES([VDPAU], [vdpau >= $VDPAU_REQUIRED])
 enable_gallium_loader=$enable_shared_pipe_drivers
+gallium_st="vdpau $gallium_st"
 fi
 AM_CONDITIONAL(HAVE_ST_VDPAU, test "x$enable_vdpau" = xyes)
 
 if test "x$enable_omx" = xyes; then
 PKG_CHECK_MODULES([OMX], [libomxil-bellagio >= 
$LIBOMXIL_BELLAGIO_REQUIRED])
 enable_gallium_loader=$enable_shared_pipe_drivers
+gallium_st="omx $gallium_st"
 fi
 AM_CONDITIONAL(HAVE_ST_OMX, test "x$enable_omx" = xyes)
 
 if test "x$enable_va" = xyes; then
 PKG_CHECK_MODULES([VA], [libva >= $LIBVA_REQUIRED])
 enable_gallium_loader=$enable_shared_pipe_drivers
+gallium_st="va $gallium_st"
 fi
 AM_CONDITIONAL(HAVE_ST_VA, test "x$enable_va" = xyes)
 
@@ -1674,6 +1681,7 @@ if test "x$enable_nine" = xyes; then
 fi
 
 enable_gallium_loader=$enable_shared_pipe_drivers
+gallium_st="nine $gallium_st"
 fi
 AM_CONDITIONAL(HAVE_ST_NINE, test "x$enable_nine" = xyes)
 
@@ -1713,6 +1721,7 @@ if test "x$enable_opencl" = xyes; then
 
 # XXX: Use $enable_shared_pipe_drivers once converted to use static/shared 
pipe-drivers
 enable_gallium_loader=yes
+gallium_st="clover $gallium_st"
 
 if test "x$enable_opencl_icd" = xyes; then
 OPENCL_LIBNAME="MesaOpenCL"
@@ -2513,7 +2522,8 @@ fi
 
 echo ""
 if test -n "$with_gallium_drivers"; then
-echo "Gallium: yes"
+echo "Gallium drivers: $gallium_drivers"
+echo "Gallium st:  $gallium_st"
 else
 echo "Gallium: no"
 fi
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] i965/vec4: Add unit tests for cmod propagation pass

2015-10-14 Thread Alejandro Piñeiro
This include the same tests coming from test_fs_cmod_propagation, (non
vector glsl types included) plus some new with vec4 types, inspired on
the regressions found while the optimization was a work in progress.

Additionally, the check of number of instructions after the
optimization was changed from EXPECT_EQ to ASSERT_EQ. This was done to
avoid a crash on failing tests that expected no optimization, as after
checking the number of instructions, there were some checks related to
this last instruction opcode/conditional mod.

v2: include new tests to manage when inst and scan_inst has
different writemasks
---
 src/mesa/drivers/dri/i965/Makefile.am  |   7 +
 .../dri/i965/test_vec4_cmod_propagation.cpp| 820 +
 2 files changed, 827 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp

diff --git a/src/mesa/drivers/dri/i965/Makefile.am 
b/src/mesa/drivers/dri/i965/Makefile.am
index 2e24151..63228a5 100644
--- a/src/mesa/drivers/dri/i965/Makefile.am
+++ b/src/mesa/drivers/dri/i965/Makefile.am
@@ -58,6 +58,7 @@ TESTS = \
test_fs_saturate_propagation \
 test_eu_compact \
test_vf_float_conversions \
+   test_vec4_cmod_propagation \
 test_vec4_copy_propagation \
 test_vec4_register_coalesce
 
@@ -93,6 +94,12 @@ test_vec4_copy_propagation_LDADD = \
 $(top_builddir)/src/gtest/libgtest.la \
 $(TEST_LIBS)
 
+test_vec4_cmod_propagation_SOURCES = \
+   test_vec4_cmod_propagation.cpp
+test_vec4_cmod_propagation_LDADD = \
+   $(top_builddir)/src/gtest/libgtest.la \
+   $(TEST_LIBS)
+
 test_eu_compact_SOURCES = \
test_eu_compact.c
 nodist_EXTRA_test_eu_compact_SOURCES = dummy.cpp
diff --git a/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp 
b/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp
new file mode 100644
index 000..2d9f6c7
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp
@@ -0,0 +1,820 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Based on test_fs_cmod_propagation.cpp
+ */
+
+#include 
+#include "brw_vec4.h"
+#include "brw_vec4_builder.h"
+#include "brw_cfg.h"
+#include "program/program.h"
+
+using namespace brw;
+
+class cmod_propagation_test : public ::testing::Test {
+   virtual void SetUp();
+
+public:
+   struct brw_compiler *compiler;
+   struct brw_device_info *devinfo;
+   struct gl_context *ctx;
+   struct gl_shader_program *shader_prog;
+   struct brw_vertex_program *vp;
+   vec4_visitor *v;
+};
+
+class cmod_propagation_vec4_visitor : public vec4_visitor
+{
+public:
+   cmod_propagation_vec4_visitor(struct brw_compiler *compiler,
+ nir_shader *shader)
+  : vec4_visitor(compiler, NULL, NULL, NULL, shader, NULL,
+ false, -1) {}
+
+protected:
+   /* Dummy implementation for pure virtual methods */
+   virtual dst_reg *make_reg_for_system_value(int location,
+  const glsl_type *type)
+   {
+  unreachable("Not reached");
+   }
+
+   virtual void setup_payload()
+   {
+  unreachable("Not reached");
+   }
+
+   virtual void emit_prolog()
+   {
+  unreachable("Not reached");
+   }
+
+   virtual void emit_program_code()
+   {
+  unreachable("Not reached");
+   }
+
+   virtual void emit_thread_end()
+   {
+  unreachable("Not reached");
+   }
+
+   virtual void emit_urb_write_header(int mrf)
+   {
+  unreachable("Not reached");
+   }
+
+   virtual vec4_instruction *emit_urb_write_opcode(bool complete)
+   {
+  unreachable("Not reached");
+   }
+};
+
+
+void cmod_propagation_test::SetUp()
+{
+   ctx = (struct gl_context *)calloc(1, sizeof(*ctx));
+   compiler = (struct brw_compiler *)calloc(1, sizeof(*compiler));
+   devinfo = (struct brw_device_info *)calloc(1, 

Re: [Mesa-dev] [RFC 2/2] gallium: add tegra support

2015-10-14 Thread Christian Gmeiner
2015-10-14 5:58 GMT+02:00 Michel Dänzer :
> On 13.10.2015 12:44, Alexandre Courbot wrote:
>> On Mon, Oct 12, 2015 at 12:09 AM, Christian Gmeiner
>>  wrote:
>>>
>>> diff --git a/src/gallium/winsys/tegra/drm/tegra_drm_winsys.c 
>>> b/src/gallium/winsys/tegra/drm/tegra_drm_winsys.c
>>> new file mode 100644
>>> index 000..e172407
>>> --- /dev/null
>>> +++ b/src/gallium/winsys/tegra/drm/tegra_drm_winsys.c
>>> @@ -0,0 +1,51 @@
>>> +/*
>>> + * Copyright © 2014 NVIDIA Corporation
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the 
>>> "Software"),
>>> + * to deal in the Software without restriction, including without 
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice (including the 
>>> next
>>> + * paragraph) shall be included in all copies or substantial portions of 
>>> the
>>> + * Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>>> OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
>>> OTHER
>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
>>> DEALINGS
>>> + * IN THE SOFTWARE.
>>> + */
>>> +
>>> +#include "renderonly/renderonly_screen.h"
>>> +#include "../winsys/tegra/drm/tegra_drm_public.h"
>>> +#include "../winsys/nouveau/drm/nouveau_drm_public.h"
>>> +
>>> +#include 
>>
>> I had to change this line to
>>
>> #include 
>>
>> on my system for it to compile (drm's master installs tegra_drm.h in
>> $PREFIX/include/libdrm)
>
> #include 
>
> should work either way.
>

Will give it a try.

thanks
--
Christian Gmeiner, MSc

https://soundcloud.com/christian-gmeiner
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 2/2] gallium: add tegra support

2015-10-14 Thread Christian Gmeiner
2015-10-14 14:51 GMT+02:00 Erik Faye-Lund :
> On Sun, Oct 11, 2015 at 5:09 PM, Christian Gmeiner
>  wrote:
>> @@ -2181,6 +2188,13 @@ if test -n "$with_gallium_drivers"; then
>>  done
>>  fi
>>
>> +dnl We need to validate some needed dependencies for renderonly drivers.
>> +
>> +if test "x$HAVE_GALLIUM_NOUVEAU" != xyes -a "x$HAVE_GALLIUM_TEGRA" == xyes  
>> ; then
>> +AC_ERROR([Building with tegra requires that nouveau])
>
> Requires that nouveau what? :)

Yeah will improve working for V2.

thanks
--
Christian Gmeiner, MSc

https://soundcloud.com/christian-gmeiner
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/vec4: dead_code_eliminate: update writemask on null_regs based on flag_live

2015-10-14 Thread Alejandro Piñeiro
---

This patch implements the idea proposed by Francisco Jerez. With this
change, even adding the new condition pointed by Matt Turner on the
"2/5 i965/vec4: adding vec4_cmod_propagation optimization", the shader-db
numbers remain the same. So this patch would go before the optimization
(so in this series it would be the patch 1.5).

Note: Im not resending the patch 2/5, as Matt pointed that he granted
the reviewed status with his suggested change. I can send it if needed
in any case.

 .../drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp| 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp
index 8fc7a36..31ea128 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp
@@ -78,13 +78,19 @@ vec4_visitor::dead_code_eliminate()
  sizeof(BITSET_WORD));
 
   foreach_inst_in_block_reverse(vec4_instruction, inst, block) {
- if (inst->dst.file == GRF && !inst->has_side_effects()) {
+ if ((inst->dst.file == GRF && !inst->has_side_effects()) ||
+ (inst->dst.is_null() && inst->writes_flag())){
 bool result_live[4] = { false };
 
-for (unsigned i = 0; i < inst->regs_written; i++) {
-   for (int c = 0; c < 4; c++)
-  result_live[c] |= BITSET_TEST(
- live, var_from_reg(alloc, offset(inst->dst, i), c));
+if (inst->dst.file == GRF) {
+   for (unsigned i = 0; i < inst->regs_written; i++) {
+  for (int c = 0; c < 4; c++)
+ result_live[c] |= BITSET_TEST(
+live, var_from_reg(alloc, offset(inst->dst, i), c));
+   }
+} else {
+   for (unsigned c = 0; c < 4; c++)
+  result_live[c] |= BITSET_TEST(flag_live, c);
 }
 
 /* If the instruction can't do writemasking, then it's all or
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 2/2] gallium: add tegra support

2015-10-14 Thread Christian Gmeiner
Hi

2015-10-13 5:44 GMT+02:00 Alexandre Courbot :
> On Mon, Oct 12, 2015 at 12:09 AM, Christian Gmeiner
>  wrote:
>> This commit adds tegra support, which uses the renderonly driver
>> library.
>>
>> Signed-off-by: Christian Gmeiner 
>> ---
>>  configure.ac   | 19 +++-
>>  src/gallium/Makefile.am|  6 +++
>>  .../auxiliary/target-helpers/inline_drm_helper.h   | 29 
>>  src/gallium/drivers/tegra/Automake.inc | 10 +
>>  src/gallium/drivers/tegra/Makefile.am  |  9 
>>  src/gallium/targets/dri/Makefile.am|  2 +
>>  src/gallium/winsys/tegra/drm/Android.mk| 34 +++
>>  src/gallium/winsys/tegra/drm/Makefile.am   | 33 ++
>>  src/gallium/winsys/tegra/drm/Makefile.sources  |  3 ++
>>  src/gallium/winsys/tegra/drm/tegra_drm_public.h| 31 +
>>  src/gallium/winsys/tegra/drm/tegra_drm_winsys.c| 51 
>> ++
>>  11 files changed, 226 insertions(+), 1 deletion(-)
>>  create mode 100644 src/gallium/drivers/tegra/Automake.inc
>>  create mode 100644 src/gallium/drivers/tegra/Makefile.am
>>  create mode 100644 src/gallium/winsys/tegra/drm/Android.mk
>>  create mode 100644 src/gallium/winsys/tegra/drm/Makefile.am
>>  create mode 100644 src/gallium/winsys/tegra/drm/Makefile.sources
>>  create mode 100644 src/gallium/winsys/tegra/drm/tegra_drm_public.h
>>  create mode 100644 src/gallium/winsys/tegra/drm/tegra_drm_winsys.c
>>
>> diff --git a/configure.ac b/configure.ac
>> index ea485b1..9fb8244 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -75,6 +75,7 @@ LIBDRM_INTEL_REQUIRED=2.4.61
>>  LIBDRM_NVVIEUX_REQUIRED=2.4.33
>>  LIBDRM_NOUVEAU_REQUIRED=2.4.62
>>  LIBDRM_FREEDRENO_REQUIRED=2.4.65
>> +LIBDRM_TEGRA_REQUIRED=2.4.58
>>  DRI2PROTO_REQUIRED=2.6
>>  DRI3PROTO_REQUIRED=1.0
>>  PRESENTPROTO_REQUIRED=1.0
>> @@ -864,7 +865,7 @@ GALLIUM_DRIVERS_DEFAULT="r300,r600,svga,swrast"
>>  AC_ARG_WITH([gallium-drivers],
>>  [AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@],
>>  [comma delimited Gallium drivers list, e.g.
>> -"i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4"
>> +
>> "i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,tegra,vc4"
>>  @<:@default=r300,r600,svga,swrast@:>@])],
>>  [with_gallium_drivers="$withval"],
>>  [with_gallium_drivers="$GALLIUM_DRIVERS_DEFAULT"])
>> @@ -2166,6 +2167,12 @@ if test -n "$with_gallium_drivers"; then
>>  HAVE_GALLIUM_LLVMPIPE=yes
>>  fi
>>  ;;
>> +xtegra)
>> +HAVE_GALLIUM_TEGRA=yes
>> +PKG_CHECK_MODULES([TEGRA], [libdrm_tegra >= 
>> $LIBDRM_TEGRA_REQUIRED])
>> +gallium_require_drm "tegra"
>> +gallium_require_drm_loader
>> +;;
>>  xvc4)
>>  HAVE_GALLIUM_VC4=yes
>>  gallium_require_drm "vc4"
>> @@ -2181,6 +2188,13 @@ if test -n "$with_gallium_drivers"; then
>>  done
>>  fi
>>
>> +dnl We need to validate some needed dependencies for renderonly drivers.
>> +
>> +if test "x$HAVE_GALLIUM_NOUVEAU" != xyes -a "x$HAVE_GALLIUM_TEGRA" == xyes  
>> ; then
>> +AC_ERROR([Building with tegra requires that nouveau])
>> +fi
>> +
>> +
>>  dnl Set LLVM_LIBS - This is done after the driver configuration so
>>  dnl that drivers can add additional components to LLVM_COMPONENTS.
>>  dnl Previously, gallium drivers were updating LLVM_LIBS directly
>> @@ -2245,6 +2259,7 @@ AM_CONDITIONAL(HAVE_GALLIUM_FREEDRENO, test 
>> "x$HAVE_GALLIUM_FREEDRENO" = xyes)
>>  AM_CONDITIONAL(HAVE_GALLIUM_SOFTPIPE, test "x$HAVE_GALLIUM_SOFTPIPE" = xyes)
>>  AM_CONDITIONAL(HAVE_GALLIUM_LLVMPIPE, test "x$HAVE_GALLIUM_LLVMPIPE" = xyes)
>>  AM_CONDITIONAL(HAVE_GALLIUM_VC4, test "x$HAVE_GALLIUM_VC4" = xyes)
>> +AM_CONDITIONAL(HAVE_GALLIUM_TEGRA, test "x$HAVE_GALLIUM_TEGRA" = xyes)
>>
>>  AM_CONDITIONAL(HAVE_GALLIUM_STATIC_TARGETS, test 
>> "x$enable_shared_pipe_drivers" = xno)
>>
>> @@ -2364,6 +2379,7 @@ AC_CONFIG_FILES([Makefile
>> src/gallium/drivers/renderonly/Makefile
>> src/gallium/drivers/softpipe/Makefile
>> src/gallium/drivers/svga/Makefile
>> +   src/gallium/drivers/tegra/Makefile
>> src/gallium/drivers/trace/Makefile
>> src/gallium/drivers/vc4/Makefile
>> src/gallium/state_trackers/clover/Makefile
>> @@ -2406,6 +2422,7 @@ AC_CONFIG_FILES([Makefile
>> src/gallium/winsys/sw/wrapper/Makefile
>> src/gallium/winsys/sw/xlib/Makefile
>> src/gallium/winsys/vc4/drm/Makefile
>> +   src/gallium/winsys/tegra/drm/Makefile
>> src/gbm/Makefile
>> src/gbm/main/gbm.pc
>> src/glsl/Makefile
>> diff --git a/src/gallium/Makefile.am 

Re: [Mesa-dev] [PATCH 04/10] i965/skl: skip fast clears for certain surface formats

2015-10-14 Thread Neil Roberts
It would be nice if you could give some indication of where this list of
formats came from.

Unless we expect the list to change with future generations, maybe it
would be better to make it a static const table? It's a shame to grow
the context size unnecessarily.

Regards,
- Neil

Ben Widawsky  writes:

> Initially I had this planned as a patch to be squashed in to the enabling 
> patch
> because there is no point enabling fast clears without this. However, Chad
> merged a patch which disables fast clears on gen9 explicitly, and so I can 
> hide
> this behind the revert of that patch. This is a nice I really wanted this 
> patch
> as a distinct patch for review. This is a new, weird, and poorly documented
> restriction for SKL. (In fact, I am still not 100% certain the restriction is
> entirely necessary, but there are around 30 piglit regressions without this).
>
> SKL adds compressible render targets and as a result mutates some of the
> programming for fast clears and resolves. There is a new internal surface type
> called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "The Auxiliary surface 
> is
> a CCS (Color Control Surface) with compression disabled or an MCS with
> compression enabled, depending on number of multisamples. MCS (Multisample
> Control Surface) is a special type of CCS."
>
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_context.h |  1 +
>  src/mesa/drivers/dri/i965/brw_surface_formats.c | 27 
> +
>  src/mesa/drivers/dri/i965/gen8_surface_state.c  |  8 ++--
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  3 +++
>  4 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index e59478a..32b8250 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1546,6 +1546,7 @@ struct brw_context
>  
> uint32_t render_target_format[MESA_FORMAT_COUNT];
> bool format_supported_as_render_target[MESA_FORMAT_COUNT];
> +   bool losslessly_compressable[MESA_FORMAT_COUNT];
>  
> /* Interpolation modes, one byte per vue slot.
>  * Used Gen4/5 by the clip|sf|wm stages. Ignored on Gen6+.
> diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
> b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> index 97fff60..d706ecc 100644
> --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
> +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> @@ -693,6 +693,33 @@ brw_init_surface_formats(struct brw_context *brw)
>}
> }
>  
> +   if (brw->gen >= 9) {
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UNORM16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SNORM16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_SINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_UINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBX_FLOAT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_B8G8R8A8_UNORM] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_UNORM] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_SNORM] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT8] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT8] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_SINT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_UINT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R_UINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R_SINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R_FLOAT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_B8G8R8X8_UNORM] = true;
> +   }
> +
> /* We will check this table for FBO completeness, but the surface format
>  * table above only covered color rendering.
>  */
> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> index 995b4dd..b19b492 100644
> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> @@ -243,8 +243,10 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
> * "When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, 
> HALIGN
> *  16 must be used."
> 

Re: [Mesa-dev] [PATCH 00/10] Support Skylake MCS buffers (fast clears)

2015-10-14 Thread Neil Roberts
Looks good, it'll be great to get this landed. Patches 1-3 and 6-8 are:

Reviewed-by: Neil Roberts 

I've sent comments separately for 4, 5 and 9. Hopefully I can try to
help with patch 10 once my SKL machine arrives.

Regards,
- Neil

Ben Widawsky  writes:

> This patch series adds support for fast color clears on SKL as it exists on
> previous generations of hardware minus the new hardware restriction on surface
> formats. Additionally, it adds support for utilizing clear values with up to 
> 32b
> per color channel (see note at the bottom). It is based on work originally 
> done
> by Kristian, so thanks to him for that initial work as well as helping me 
> debug
> some of the issues.
>
> Additionally, thanks to Chad for helping track down the last bug in the 
> rectangle
> scaling code which was (for me) being masked by another bug (#3 below). I
> imagine it would have been several more weeks at least before I uncovered it.
>
> We knew that SKL added the extra DWORDs to the RENDER_SURFACE_STATE in order 
> to
> support the 32b per channel. As it turned out though, Skylake made other 
> changes
> to support this which caused weird failures which seemed to interfere with
> each other.
>
> 1. Not all surface formats support lossless compression.
> 2. Clearing multiple color buffer attachments must happen in n passes
> 3. Change to the scaling factors for the MCS surface - SKL has 2x height (this
> was the bug which Chad helped uncover, I had it correct in my patch from March
> http://lists.freedesktop.org/archives/mesa-dev/2015-March/079084.html, but we
> had other problems which prevented merge, including #1 and #2 above).
>
> I have no piglit, dEQP or CTS regressions (except for the last patch). I 
> haven't
> yet, but will collect perf data on this ASAP. Historically we've come to 
> expect
> this to provide large gains in tests which are memory bandwidth limited and
> doing many clears.
>
> Ben Widawsky (10):
>   i965/gen8+: Remove redundant zeroing of surface state
>   i965/gen8+: Extract color clear surface state
>   i965/skl: Enable fast color clears on SKL
>   i965/skl: skip fast clears for certain surface formats
>   i965/meta/gen9: Individually fast clear color attachments
>   Revert "i965/gen9: Disable MCS for 1x color surfaces"
>   Revert "i965/gen9: Enable rep clears on gen9"
>   i965/meta: Assert fast clears and rep clears never overlap
>   i965/meta: Remove fast_clear_color variable
>   i965/gen9: Support fast clears for 32b float
>
>  src/mesa/drivers/dri/i965/brw_context.h |   1 +
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 172 
> ++--
>  src/mesa/drivers/dri/i965/brw_surface_formats.c |  27 
>  src/mesa/drivers/dri/i965/gen8_surface_state.c  |  48 ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  20 +--
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |   7 +-
>  6 files changed, 205 insertions(+), 70 deletions(-)
>
> -- 
> 2.6.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH shader-db 3/3] docs: Add symbolic link generation step

2015-10-14 Thread Rhys Kidd
On 13 October 2015 at 08:54, Matt Turner  wrote:

> On Sat, Oct 10, 2015 at 10:30 PM, Rhys Kidd  wrote:
> > Signed-off-by: Rhys Kidd 
> > ---
> >  README | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/README b/README
> > index 6ed3244..03be4e7 100644
> > --- a/README
> > +++ b/README
> > @@ -60,6 +60,11 @@ Build with:
> >  make
> >  ```
> >
> > +run.py relies on a symbolic link to a built piglit bin directory, as
> follows:
> > +```
> > +ln -s /bin "$PWD"/bin
> > +```
>
> run.py prints an appropriate message, doesn't it? If we want to do
> anything at all, maybe we should just print a more verbose message
> from run.py?
>

Hello Matt,

There is a relevant distinction though here, between a user-reported error
(in the general sense) and a user-reported error due to a missing
dependency.

Having a local built copy of piglit is a dependency of the run.py script,
hence why I'm recommending including that fact prominently within README's
dependency section per this patch.

That said, I'm not wedded to this approach -- so will shortly send out a
re-roll of the two patches from this series that aren't upstream to the
mailing list for further comment.

Regards,
Rhys
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)

2015-10-14 Thread Michel Thierry

On 10/14/2015 8:19 AM, Daniel Vetter wrote:

On Tue, Oct 13, 2015 at 02:51:36PM -0700, Kristian Høgsberg wrote:

On Tue, Oct 13, 2015 at 7:55 AM, Michel Thierry
 wrote:

On 10/13/2015 3:13 PM, Emil Velikov wrote:


On 13 October 2015 at 13:16, Michel Thierry 
wrote:


On 10/6/2015 2:12 PM, Michel Thierry wrote:



On 10/5/2015 7:06 PM, Kristian Høgsberg wrote:



On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry
 wrote:



On 9/14/2015 2:54 PM, Michał Winiarski wrote:




On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote:




Gen8+ supports 48-bit virtual addresses, but some objects must
always be
allocated inside the 32-bit address range.

In specific, any resource used with flat/heapless
(0x-0xf000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in
a
32-bit range, because the General State Offset and Instruction State
Offset
are limited to 32-bits.

The i915 driver has been modified to provide a flag to set when the
4GB
limit is not necessary in a given bo
(EXEC_OBJECT_SUPPORTS_48B_ADDRESS).
48-bit range will only be used when explicitly requested.

Callers to the existing drm_intel_bo_emit_reloc function should set
the
use_48b_address_range flag beforehand, in order to use full ppgtt
range.

v2: Make set/clear functions nops on pre-gen8 platforms, and use
them
internally in emit_reloc functions (Ben)
s/48BADDRESS/48B_ADDRESS/ (Dave)
v3: Keep set/clear functions internal, no-one needs to use them
directly.
v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type
before enabling set/clear function, print full offsets in
debug
statements, using port of lower_32_bits and upper_32_bits
from linux
kernel (Michał)

References:

http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html
Cc: Ben Widawsky 
Cc: Michał Winiarski 





+Kristian

LGTM.
Acked-by: Michał Winiarski 


Signed-off-by: Michel Thierry 






Hi Kristian,

More comments on this?
I've resent the mesa patch with the last changes


(http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html).


Michał has agreed with the interface and will use it for his work.
If mesa doesn't plan to use this for now, it's ok. The kernel changes
have
been done to prevent any regressions when the support 48-bit flag is
not
set.




I didn't get any replies to my last comments on this:

http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html

Basically, I tried explicitly placing buffers above 8G and didn't see
the HW problem described (ie it all worked fine).  I still think
there's some confusion as to what the W/A is about.

Here's my take: the W/A is a SW-only workaround to handle the cases
where a driver uses heapless and 48-bit ppgtt. The problem is that the
heaps are limited to 4G but with 48bit ppgtt a buffer can be placed
anywhere it the 48 bit address space. If that happens it's consideredd
out-of-bounds for the heap and access fails. To prevent this we need
to make sure the bos we address in a heapless fashion are placed below
4g.

For mesa, we only configure general state base as heap-less, which
affects scratch bos. What this boils down to is that we need the 4G
restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS
and 3DSTATE_PS (and tesselation stage eventually). Look for the
OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c
and gen8_ps_state.c




I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it
isn't exclusive to the scratch bos (the error state points to that
instruction).




Hi,

Anymore inputs about this patch?
AFAIK, if changes are required based on further comments from Kristian,
these will be in the mesa patch[1], not libdrm. Also, Michał will use
this
flag in another project.


The comment seems quite brief and I'm not sure it fully addresses
Kristian's concern. I'd suspect that providing reference to the HW
documentation (confirming your assumption) might be beneficial.



Sure, attached is the hang I get if I don't set the restriction in
gen8_misc_state.c and try to use the full 48-bit range
(i915_error_state_nowa.txt). This is just running gfxbench t-rex.

I see the same hang signature when it is only applied to the scratch bos (in
gen8_vs_state.c, gen8_gs_state.c and gen8_ps_state.c -
i915_error_state_scratchbo.txt).

3DSTATE_VIEWPORT_STATE_POINTERS_CC (0x7823) is defined here:
https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol02a-commandreference-instructions_0.pdf
(page 256)


I applied your mesa and libdrm patches and then backed out the 4G
restriction in the STATE_BASE_ADDRESS relocations.  I was not able to
reproduce any hang with trex or glxgears. I confirmed (using
INTEL_DEBUG=bat) that gem actually placed BOs above the 4G mark and
got 

[Mesa-dev] [PATCH shader-db v2 2/2] docs: Add symbolic link generation step

2015-10-14 Thread Rhys Kidd
Signed-off-by: Rhys Kidd 
---
 README | 5 +
 1 file changed, 5 insertions(+)

diff --git a/README b/README
index 2f83987..cfbc362 100644
--- a/README
+++ b/README
@@ -55,6 +55,11 @@ Install necessary dependencies on Debian or Ubuntu:
 sudo apt-get install build-essentials libepoxy-dev libgbm-dev
 ```
 
+run.py relies on a symbolic link to a built piglit bin directory, as follows:
+```
+ln -s /bin "$PWD"/bin
+```
+
 === jemalloc ===
 Since run compiles shaders in different threads, malloc/free locking overhead
 from inside Mesa can be expensive. Preloading jemalloc can cut significant
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH shader-db v2 1/2] docs: Improve dependencies documentation

2015-10-14 Thread Rhys Kidd
v2: Reflect feedback from Kenneth Graunke  and
Matt Turner 

Signed-off-by: Rhys Kidd 
---
 README | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/README b/README
index e301d0e..2f83987 100644
--- a/README
+++ b/README
@@ -49,6 +49,12 @@ ST_DEBUG=precompile R600_DEBUG=ps,vs,gs,precompile ./run 
shaders -1 2> new-run
 run requires some GNU C extensions, render nodes (/dev/dri/renderD128),
 libepoxy, OpenMP, and Mesa configured with --with-egl-platforms=x11,drm
 
+Install necessary dependencies on Debian or Ubuntu:
+```
+# Developers will probably have a local build of Mesa
+sudo apt-get install build-essentials libepoxy-dev libgbm-dev
+```
+
 === jemalloc ===
 Since run compiles shaders in different threads, malloc/free locking overhead
 from inside Mesa can be expensive. Preloading jemalloc can cut significant
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 04/10] mesa: optimize no-change check in _mesa_BlendEquation()

2015-10-14 Thread Brian Paul
Same story as preceeding change to _mesa_BlendFuncSeparate().
---
 src/mesa/main/blend.c | 35 +++
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
index 98d2858..01b6919 100644
--- a/src/mesa/main/blend.c
+++ b/src/mesa/main/blend.c
@@ -345,33 +345,44 @@ void GLAPIENTRY
 _mesa_BlendEquation( GLenum mode )
 {
GLuint buf, numBuffers;
-   GLboolean changed;
+   bool changed = false;
GET_CURRENT_CONTEXT(ctx);
 
if (MESA_VERBOSE & VERBOSE_API)
   _mesa_debug(ctx, "glBlendEquation(%s)\n",
   _mesa_enum_to_string(mode));
 
-   if (!legal_blend_equation(ctx, mode)) {
-  _mesa_error(ctx, GL_INVALID_ENUM, "glBlendEquation");
-  return;
-   }
-
numBuffers = ctx->Extensions.ARB_draw_buffers_blend
   ? ctx->Const.MaxDrawBuffers : 1;
 
-   changed = GL_FALSE;
-   for (buf = 0; buf < numBuffers; buf++) {
-  if (ctx->Color.Blend[buf].EquationRGB != mode ||
-  ctx->Color.Blend[buf].EquationA != mode) {
- changed = GL_TRUE;
- break;
+   if (ctx->Color._BlendEquationPerBuffer) {
+  /* Check all per-buffer states */
+  for (buf = 0; buf < numBuffers; buf++) {
+ if (ctx->Color.Blend[buf].EquationRGB != mode ||
+ ctx->Color.Blend[buf].EquationA != mode) {
+changed = true;
+break;
+ }
+  }
+   }
+   else {
+  /* only need to check 0th per-buffer state */
+  if (ctx->Color.Blend[0].EquationRGB != mode ||
+  ctx->Color.Blend[0].EquationA != mode) {
+ changed = true;
   }
}
+
if (!changed)
   return;
 
+   if (!legal_blend_equation(ctx, mode)) {
+  _mesa_error(ctx, GL_INVALID_ENUM, "glBlendEquation");
+  return;
+   }
+
FLUSH_VERTICES(ctx, _NEW_COLOR);
+
for (buf = 0; buf < numBuffers; buf++) {
   ctx->Color.Blend[buf].EquationRGB = mode;
   ctx->Color.Blend[buf].EquationA = mode;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 02/10] mesa: short-cut new_state == _NEW_LINE in _mesa_update_state_locked()

2015-10-14 Thread Brian Paul
We can skip to the end of _mesa_update_state_locked() if only the
_NEW_LINE flag is set since none of the derived state depends on it
(just like _NEW_CURRENT_ATTRIB).  Note that we still call the
ctx->Driver.UpdateState() function, of course.
---
 src/mesa/main/state.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/state.c b/src/mesa/main/state.c
index d3b1c72..7fa7da2 100644
--- a/src/mesa/main/state.c
+++ b/src/mesa/main/state.c
@@ -392,7 +392,8 @@ _mesa_update_state_locked( struct gl_context *ctx )
GLbitfield prog_flags = _NEW_PROGRAM;
GLbitfield new_prog_state = 0x0;
 
-   if (new_state == _NEW_CURRENT_ATTRIB) 
+   if (new_state == _NEW_CURRENT_ATTRIB ||
+   new_state == _NEW_LINE)
   goto out;
 
if (MESA_VERBOSE & VERBOSE_STATE)
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/10] Support Skylake MCS buffers (fast clears)

2015-10-14 Thread Ben Widawsky
On Tue, Oct 13, 2015 at 08:50:17PM -0700, Ben Widawsky wrote:
> This patch series adds support for fast color clears on SKL as it exists on
> previous generations of hardware minus the new hardware restriction on surface
> formats. Additionally, it adds support for utilizing clear values with up to 
> 32b
> per color channel (see note at the bottom). It is based on work originally 
> done
> by Kristian, so thanks to him for that initial work as well as helping me 
> debug
> some of the issues.
> 
> Additionally, thanks to Chad for helping track down the last bug in the 
> rectangle
> scaling code which was (for me) being masked by another bug (#3 below). I
> imagine it would have been several more weeks at least before I uncovered it.
> 
> We knew that SKL added the extra DWORDs to the RENDER_SURFACE_STATE in order 
> to
> support the 32b per channel. As it turned out though, Skylake made other 
> changes
> to support this which caused weird failures which seemed to interfere with
> each other.
> 
> 1. Not all surface formats support lossless compression.
> 2. Clearing multiple color buffer attachments must happen in n passes
> 3. Change to the scaling factors for the MCS surface - SKL has 2x height (this
> was the bug which Chad helped uncover, I had it correct in my patch from March
> http://lists.freedesktop.org/archives/mesa-dev/2015-March/079084.html, but we
> had other problems which prevented merge, including #1 and #2 above).
> 
> I have no piglit, dEQP or CTS regressions (except for the last patch). I 
> haven't
> yet, but will collect perf data on this ASAP. Historically we've come to 
> expect
> this to provide large gains in tests which are memory bandwidth limited and
> doing many clears.

I left out the note here about 32b having two small regressions.

I did some very basic performance data collection. As expected, the rep_clears
which were already enabled by Chad seem to actually provide most of the gains. I
didn't actually run long enough to do much except prove to myself that there
aren't any performance regressions over the gen9 rep clears. These are the
results which shouldn't be taken too seriously (5 runs only).

Benchmark   % diff (master->full 32b fast clears)
OglBatch0 1.87   
OglBatch1 0.54   
OglBatch2 -0.44  
OglBatch3 0.11   
OglBatch4 -0.94  
OglBatch5 -2.11  
OglBatch6 1.18   
OglBatch7 7.02   
OglDeferred   3.05   
OglDeferredAA 3.6
OglFillPixel  0.07   
OglFillTexMulti   -0.01  
OglFillTexSingle  0.03   
OglGeomPoint  0.07   
OglGeomTriList0.74   
OglGeomTriStrip   -0.13  
OglHdrBloom   -1.93  
OglMultithread-0.96  
OglPSBump20.33   
OglPSBump80.31   
OglPSPhong0.18   
OglPSPom  -0.08  
OglShMapPcf   0.03   
OglShMapVsm   -0.3   
OglTerrainFlyInst 0.46   
OglTerrainPanInst 0.4
OglTexFilterAniso -0.08  
OglTexFilterTri   0.13   
OglTexMem128  0.2
OglTexMem512  -0.03  
OglVSDiffuse1 0.23   
OglVSDiffuse8 -0.23  
OglVSInstancing   -0.15  
OglVSTangent  -0.06  
OglZBuffer0.07   
fill  0.17   
filloff   -0.01  
fur   -0.19  
heaven0.56   
plot3d-0.18  
trex  4.51   
trexoff   3.69   
triangle  0.04   
valley1.86   
warsow0.18   
xonotic   0.4


BTW: the patches are here as well (with 32b support reverted):
http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=skl-fast-clear
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 07/10] mesa: fix incorrect error string in _mesa_BlendEquationiARB()

2015-10-14 Thread Eric Anholt
Brian Paul  writes:

> ---
>  src/mesa/main/blend.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
> index d225f3d..f14949f 100644
> --- a/src/mesa/main/blend.c
> +++ b/src/mesa/main/blend.c
> @@ -407,7 +407,7 @@ _mesa_BlendEquationiARB(GLuint buf, GLenum mode)
>buf, _mesa_enum_to_string(mode));
>  
> if (buf >= ctx->Const.MaxDrawBuffers) {
> -  _mesa_error(ctx, GL_INVALID_VALUE, "glBlendFuncSeparatei(buffer=%u)",
> +  _mesa_error(ctx, GL_INVALID_VALUE, "glBlendFuncEquationi(buffer=%u)",
>buf);
>return;

The other strings in this function say "glBlendEquationi".  I think you
meant that, instead?


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 02/10] mesa: short-cut new_state == _NEW_LINE in _mesa_update_state_locked()

2015-10-14 Thread Eric Anholt
Brian Paul  writes:

> We can skip to the end of _mesa_update_state_locked() if only the
> _NEW_LINE flag is set since none of the derived state depends on it
> (just like _NEW_CURRENT_ATTRIB).  Note that we still call the
> ctx->Driver.UpdateState() function, of course.
> ---
>  src/mesa/main/state.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/state.c b/src/mesa/main/state.c
> index d3b1c72..7fa7da2 100644
> --- a/src/mesa/main/state.c
> +++ b/src/mesa/main/state.c
> @@ -392,7 +392,8 @@ _mesa_update_state_locked( struct gl_context *ctx )
> GLbitfield prog_flags = _NEW_PROGRAM;
> GLbitfield new_prog_state = 0x0;
>  
> -   if (new_state == _NEW_CURRENT_ATTRIB) 
> +   if (new_state == _NEW_CURRENT_ATTRIB ||
> +   new_state == _NEW_LINE)
>goto out;

Perhaps something like:

GLbitfield computed_states = ~(_NEW_CURRENT_ATTRIB | _NEW_LINE);

if (!(new_state & computed_states))
   goto out;

making the optimization slightly more general and more self-documenting?
Either way, other than the comment on #7,

Reviewed-by: Eric Anholt 


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92265] Black windows in weston after update mesa to 11.0.2-1

2015-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92265

--- Comment #16 from Eduardo Lima Mitev  ---
(In reply to Mark Janes from comment #14)
> No piglit, dEQP, or CTS tests indicated this regression.  However, a major
> consumer of Mesa was debilitated due to this bug.
> 
> This bug cannot be marked fixed until there exists a piglit test which
> prevents future regressions of this type.

I agree we must have a piglit test for this to avoid regressions in the future.
I will provide one ASAP.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-14 Thread Alejandro Piñeiro
On 14/10/15 10:15, Francisco Jerez wrote:
> Alejandro Piñeiro  writes:
>
>> On 13/10/15 23:36, Matt Turner wrote:
>>> On Tue, Oct 13, 2015 at 1:49 AM, Alejandro Piñeiro  
>>> wrote:
 On 13/10/15 03:10, Matt Turner wrote:
> Looks like this is causing an intermittent failure on HSW in our
> Jenkins system (but I'm not able to reproduce locally) and a
> consistent failure on G45. I'll investigate that.
 Ok, will hold on, and meanwhile I will try to reproduce the problem on
 HSW. Unfortunately I don't have any G45 available, so I will not be able
 to help on that. Thanks for taking a look there.
>>> Okay, I think I see what's going on:
>>>
>>> --- good2015-10-13 13:34:40.753357253 -0700
>>> +++ bad 2015-10-13 13:36:06.114290094 -0700
>>> -and(8)  g6<1>.xDg6<4,4,1>.xD1D  { align16 
>>> };
>>> -cmp.nz.f0(8)null-g6<4,4,1>.xD   0D  { align16 
>>> };
>>> +and.nz.f0(8)g6<1>.xDg6<4,4,1>.xD1D  { align16 
>>> };
>>>
>>> We're propagating a .nz conditional mod from a CMP with a null dest
>>> (that has a writemask of .xyzw) to an AND that has a writemask of only
>>> .x, so only the .x channel of the flag is now being updated.
>> That is really common. And that is the idea behind the first patch of
>> the series that changes how if's are emitted, and allowing to optimize
>> if scan_inst->writemask is WRITEMASK_X. I will use the piglit test
>> glsl-1.30/execution/switch/vs-pervertex.shader_test to explain this.
>>
>> If we use current master, we get the following assembly:
>> 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
>> align16 1Q compacted };
>> 2: mov.nz.f0(8)nullg20<4,4,1>.xD   {
>> align16 1Q };
>> 3: (+f0) if(8) JIP: 6  UIP: 6  {
>> align16 1Q };
>>
>> If we just simplify out the mov.nz at #2, the test would fail, because
>> as in the example you mentioned, null has a XYZW writemask, but g19 a
>> writemask X. So this optimized version, f0 only have x updated, and #3
>> is doing a normal if, that uses all the channels.
>>
>> But right now the if just needs to check against one component, the x.
>> So that is the reason (among others) that my patch 1 changes how if's
>> are emitted. So, using both my patches 1 and 2, the assembly outcome
>> would be the following:
>> 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
>> align16 1Q compacted };
>> 2: (+f0.x) if(8)   JIP: 6  UIP: 6  {
>> align16 1Q };
>>
>> It is true that now #1 only updates the x channel. But the if at #2 only
>> use that channel now, so the test still passes.
>>
>>> I think for now the thing to do is add
>>>
>>>(inst->dst.writemask & ~scan_inst->dst.writemask) != 0)
>> This would bail out basically all if's.
>>
> I agree with Matt that this check is necessary, otherwise
> cmod_propagation will replace the pair of instruction with another
> instruction which doesn't have equivalent semantics.  

Ok.

> If you still want
> cases like the one you pasted below involving an if conditional that
> only reads the x components to be simplified, I suggest you have a look
> at dead code elimination and fix it so that writemask bits of an
> instruction that writes a flag register (e.g. the "mov.nz.f0") are
> turned off for components which are determined to be dead, kind of like
> is done already for the vec4 components of GRFs.
>
> With that change the assembly from your example would look like this
> after DCE:
>
> | 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD { 
> align16 1Q compacted };
> | 2: mov.nz.f0(8)null.x  g20<4,4,1>.xD { 
> align16 1Q };
> | 3: (+f0.x) if(8)   JIP: 6  UIP: 6{ 
> align16 1Q };
>
> Which cmod propagation would be allowed to simplify further.

Ok, thanks, that sounds like a good idea. Will work on that, and send a
new patch to this series.

>
>> So guessing about what is happening here, as I mentioned, we are
>> allowing to pass that condition if scan_inst->dst.writemask equal to
>> WRITEMASK_X assuming that they come from an if, so we don't need to
>> update all f0 channels. But as Jason mentioned in our off-list chat,
>> that would not be the case if we are dealing with a sel. So something
>> like this:
>> 1: cmp.z.f0(8) g20<1>.xD   g12<4,4,1>.xD   g19<4,4,1>.xD   {
>> align16 1Q compacted };
>> 2: mov.nz.f0(8)nullg20<4,4,1>.xD   {
>> align16 1Q };
>> 3: (+f0.x) if(8) JIP: 6  UIP: 6  {
>> align16 1Q };
>>
>> #2 can be optimized out. But if we replace that if for an sel at #3, we
>> can't.
>>
>> If that is the case, in addition to check scan_inst and inst, we would
>> need to check in which kind of instruction the flag register is used later.
>>
>> But as I 

Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-14 Thread Pohjolainen, Topi
On Wed, Oct 14, 2015 at 11:39:03AM +0200, Neil Roberts wrote:
> Ben Widawsky  writes:
> 
> > The impetus for this patch comes from a seemingly benign statement within 
> > the
> > spec (quoted within the patch). For me, this patch was at some point 
> > critical
> > for getting stable piglit results (though this did not seem to be the case 
> > on a
> > branch Chad was working on).
> >
> > It is very important for clearing multiple color buffer attachments and can 
> > be
> > observed in the following piglit tests:
> > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
> > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> >
> > Signed-off-by: Ben Widawsky 
> > ---
> >  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 
> > +
> >  1 file changed, 84 insertions(+), 13 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > index 7bf52f0..9e6711e 100644
> > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable)
> > brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
> >  }
> >  
> > +/**
> > + * Individually fast clear each color buffer attachment. On previous gens 
> > this
> > + * isn't required. The motivation for this comes from one line (which 
> > seems to
> > + * be specific to SKL+). The list item is in section titled _MCS Buffer for
> > + * Render Target(s)_
> > + *
> > + *   "Since only one RT is bound with a clear pass, only one RT can be 
> > cleared
> > + *   at a time. To clear multiple RTs, multiple clear passes are required."
> 
> This sentence also appears in the HSW PRM so it seems a bit odd if it's
> only causing problems on SKL. I guess if we get Piglit regressions
> without it then it makes sense to have the patch. It might be worth just
> double checking whether this patch is completely necessary. The wording
> in the commit message seems a little unsure.

The spec seems to be missing something as the section discussing "Render
Target Fast Clear" seems to suggest the opposite:

"The render target(s) is/are bound as they normally would be, with the MCS
 surface defined in SURFACE_STATE."
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: Enable split of lower UBOs and SSBO also for compute shaders

2015-10-14 Thread Lofstedt, Marta
> -Original Message-
> From: Francisco Jerez [mailto:curroje...@riseup.net]
> Sent: Wednesday, October 14, 2015 3:18 PM
> To: Lofstedt, Marta; Marta Lofstedt; mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH] glsl: Enable split of lower UBOs and SSBO
> also for compute shaders
> 
> "Lofstedt, Marta"  writes:
> 
> > I have found a couple of more places in linker.cpp where we loop up to
> MESA_SHADER_FRAGMENT.
> > Should these now also be up to MESA_SHADER_COMPUTE instead?
> >
> Some might be oversights like this, but I guess in some cases a loop up to
> MESA_SHADER_FRAGMENT might be the right thing to do when doing stuff
> like linking varyings that really only applies to the graphics pipeline and 
> not to
> compute shaders.
> 
Thanks Curro,

I will leave the potential other cases for now and send up a V2 with 
MESA_SHADER_STAGES instead.
This is blocking GLES 3.1 testing at the moment.

/Marta

> 
> > /Marta
> >
> >> -Original Message-
> >> From: Marta Lofstedt [mailto:marta.lofst...@linux.intel.com]
> >> Sent: Wednesday, October 14, 2015 2:56 PM
> >> To: mesa-dev@lists.freedesktop.org
> >> Cc: Lofstedt, Marta; Marta Lofstedt
> >> Subject: [PATCH] glsl: Enable split of lower UBOs and SSBO also for
> >> compute shaders
> >>
> >> From: Marta Lofstedt 
> >>
> >> The split of Uniform blocks and shader storage block only loops up to
> >> MESA_SHADER_FRAGMENT and igonres compute shaders.
> >> This cause segfault when running the OpenGL ES 3.1 CTS tests with
> >> GL_ARB_compute_shader enabled.
> >>
> >> Signed-off-by: Marta Lofstedt 
> >> ---
> >>  src/glsl/linker.cpp | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp index
> >> c61c76e..5b5d6e6
> >> 100644
> >> --- a/src/glsl/linker.cpp
> >> +++ b/src/glsl/linker.cpp
> >> @@ -4392,7 +4392,7 @@ link_shaders(struct gl_context *ctx, struct
> >> gl_shader_program *prog)
> >>  * for gl_shader_program and gl_shader, so that drivers that need
> separate
> >>  * index spaces for each set can have that.
> >>  */
> >> -   for (unsigned i = MESA_SHADER_VERTEX; i <=
> >> MESA_SHADER_FRAGMENT; i++) {
> >> +   for (unsigned i = MESA_SHADER_VERTEX; i <=
> MESA_SHADER_COMPUTE;
> >> i++)
> >
> > Remove new line.
> >
> >> + {
> >>if (prog->_LinkedShaders[i] != NULL) {
> >>   gl_shader *sh = prog->_LinkedShaders[i];
> >>   split_ubos_and_ssbos(sh,
> >> --
> >> 2.1.4
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] glsl: Enable split of lower UBOs and SSBO also for compute shaders

2015-10-14 Thread Marta Lofstedt
From: Marta Lofstedt 

The split of Uniform blocks and shader storage block only loops
up to MESA_SHADER_FRAGMENT and igonres compute shaders.
This cause segfault when running the OpenGL ES 3.1 CTS tests
with GL_ARB_compute_shader enabled.

V2: Changed to use MESA_SHADER_STAGES instead of
MESA_SHADER_COMPUTE

Signed-off-by: Marta Lofstedt 
---
 src/glsl/linker.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index c61c76e..c15034b 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -4392,7 +4392,7 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
 * for gl_shader_program and gl_shader, so that drivers that need separate
 * index spaces for each set can have that.
 */
-   for (unsigned i = MESA_SHADER_VERTEX; i <= MESA_SHADER_FRAGMENT; i++) {
+   for (unsigned i = MESA_SHADER_VERTEX; i < MESA_SHADER_STAGES; i++) {
   if (prog->_LinkedShaders[i] != NULL) {
  gl_shader *sh = prog->_LinkedShaders[i];
  split_ubos_and_ssbos(sh,
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] glsl: Enable split of lower UBOs and SSBO also for compute shaders

2015-10-14 Thread Francisco Jerez
Marta Lofstedt  writes:

> From: Marta Lofstedt 
>
> The split of Uniform blocks and shader storage block only loops
> up to MESA_SHADER_FRAGMENT and igonres compute shaders.
> This cause segfault when running the OpenGL ES 3.1 CTS tests
> with GL_ARB_compute_shader enabled.
>
> V2: Changed to use MESA_SHADER_STAGES instead of
> MESA_SHADER_COMPUTE
>
> Signed-off-by: Marta Lofstedt 

Reviewed-by: Francisco Jerez 

> ---
>  src/glsl/linker.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index c61c76e..c15034b 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -4392,7 +4392,7 @@ link_shaders(struct gl_context *ctx, struct 
> gl_shader_program *prog)
>  * for gl_shader_program and gl_shader, so that drivers that need separate
>  * index spaces for each set can have that.
>  */
> -   for (unsigned i = MESA_SHADER_VERTEX; i <= MESA_SHADER_FRAGMENT; i++) {
> +   for (unsigned i = MESA_SHADER_VERTEX; i < MESA_SHADER_STAGES; i++) {
>if (prog->_LinkedShaders[i] != NULL) {
>   gl_shader *sh = prog->_LinkedShaders[i];
>   split_ubos_and_ssbos(sh,
> -- 
> 2.1.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH V7 03/24] glsl: allow AoA to be sized by initializer or constructor

2015-10-14 Thread Timothy Arceri
On Fri, 2015-10-09 at 13:33 +0200, Samuel Iglesias Gonsálvez wrote:
> 
> On 09/10/15 13:25, Timothy Arceri wrote:
> > On Thu, 2015-10-08 at 11:08 +0200, Samuel Iglesias Gonsálvez wrote:
> > > On 07/10/15 00:47, Timothy Arceri wrote:
> > > > From Section 4.1.9 of the GLSL ES 3.10 spec:
> > > > 
> > > >  "Arrays are sized either at compile-time or at run-time.
> > > >   To size an array at compile-time, either the size
> > > >   must be specified within the brackets as above or
> > > >   must be inferred from the type of the initializer."
> > > > ---
> > > >  src/glsl/ast.h   | 15 ++---
> > > >  src/glsl/ast_array_index.cpp |  7 ++--
> > > >  src/glsl/ast_function.cpp| 33 +-
> > > >  src/glsl/ast_to_hir.cpp  | 79
> > > > ++--
> > > >  src/glsl/glsl_parser.yy  | 11 +++---
> > > >  5 files changed, 104 insertions(+), 41 deletions(-)
> > > > 
> > > > diff --git a/src/glsl/ast.h b/src/glsl/ast.h
> > > > index 4c31436..b43be24 100644
> > > > --- a/src/glsl/ast.h
> > > > +++ b/src/glsl/ast.h
> > > > @@ -181,6 +181,7 @@ enum ast_operators {
> > > > ast_post_dec,
> > > > ast_field_selection,
> > > > ast_array_index,
> > > > +   ast_unsized_array_dim,
> > > >  
> > > > ast_function_call,
> > > >  
> > > > @@ -318,16 +319,7 @@ public:
> > > >  
> > > >  class ast_array_specifier : public ast_node {
> > > >  public:
> > > > -   /** Unsized array specifier ([]) */
> > > > -   explicit ast_array_specifier(const struct YYLTYPE )
> > > > - : is_unsized_array(true)
> > > > -   {
> > > > -  set_location(locp);
> > > > -   }
> > > > -
> > > > -   /** Sized array specifier ([dim]) */
> > > > ast_array_specifier(const struct YYLTYPE ,
> > > > ast_expression
> > > > *dim)
> > > > - : is_unsized_array(false)
> > > > {
> > > >set_location(locp);
> > > >array_dimensions.push_tail(>link);
> > > > @@ -340,11 +332,8 @@ public:
> > > >  
> > > > virtual void print(void) const;
> > > >  
> > > > -   /* If true, this means that the array has an unsized
> > > > outermost
> > > > dimension. */
> > > > -   bool is_unsized_array;
> > > > -
> > > > /* This list contains objects of type ast_node containing
> > > > the
> > > > -* sized dimensions only, in outermost-to-innermost order.
> > > > +* array dimensions in outermost-to-innermost order.
> > > >  */
> > > > exec_list array_dimensions;
> > > >  };
> > > > diff --git a/src/glsl/ast_array_index.cpp
> > > > b/src/glsl/ast_array_index.cpp
> > > > index 5e8f49d..7855e0a 100644
> > > > --- a/src/glsl/ast_array_index.cpp
> > > > +++ b/src/glsl/ast_array_index.cpp
> > > > @@ -28,13 +28,10 @@
> > > >  void
> > > >  ast_array_specifier::print(void) const
> > > >  {
> > > > -   if (this->is_unsized_array) {
> > > > -  printf("[ ] ");
> > > > -   }
> > > > -
> > > > foreach_list_typed (ast_node, array_dimension, link, 
> > > > ->array_dimensions) {
> > > >printf("[ ");
> > > > -  array_dimension->print();
> > > > +  if (((ast_expression*)array_dimension)->oper !=
> > > > ast_unsized_array_dim)
> > > > + array_dimension->print();
> > > >printf("] ");
> > > > }
> > > >  }
> > > > diff --git a/src/glsl/ast_function.cpp
> > > > b/src/glsl/ast_function.cpp
> > > > index 26d4c62..cf4e64a 100644
> > > > --- a/src/glsl/ast_function.cpp
> > > > +++ b/src/glsl/ast_function.cpp
> > > > @@ -950,6 +950,7 @@ process_array_constructor(exec_list
> > > > *instructions,
> > > > }
> > > >  
> > > > bool all_parameters_are_constant = true;
> > > > +   const glsl_type *element_type = constructor_type
> > > > ->fields.array;
> > > >  
> > > > /* Type cast each parameter and, if possible, fold
> > > > constants.
> > > > */
> > > > foreach_in_list_safe(ir_rvalue, ir, _parameters) {
> > > > @@ -976,12 +977,34 @@ process_array_constructor(exec_list
> > > > *instructions,
> > > >  }
> > > >}
> > > >  
> > > > -  if (result->type != constructor_type->fields.array) {
> > > > +  if (constructor_type->fields.array->is_unsized_array())
> > > > {
> > > > + /* As the inner parameters of the constructor are
> > > > created
> > > > without
> > > > +  * knowledge of each other we need to check to make
> > > > sure
> > > > unsized
> > > > +  * parameters of unsized constructors all end up with
> > > > the
> > > > same size.
> > > > +  *
> > > > +  * e.g we make sure to fail for a constructor like
> > > > this:
> > > > +  * vec4[][] a = vec4[][](vec4[](vec4(0.0),
> > > > vec4(1.0)),
> > > > +  *   vec4[](vec4(0.0), vec4(1.0),
> > > > vec4(1.0)),
> > > > +  *   vec4[](vec4(0.0),
> > > > vec4(1.0)));
> > > > +  */
> > > > + if (element_type->is_unsized_array()) {
> > > > + /* This is the first parameter so just get the
> > > > type
> > > > */
> > > > +

Re: [Mesa-dev] [PATCH] i965/fs: Restore compute shader support in brw_nir_lower_inputs

2015-10-14 Thread Kenneth Graunke
On Tuesday, October 13, 2015 09:02:48 PM Jordan Justen wrote:
> On 2015-10-13 20:04:36, Kenneth Graunke wrote:
> > On Tuesday, October 13, 2015 01:44:55 PM Jordan Justen wrote:
> > > The commit shown below caused compute shaders to hit the unreachable
> > > in the default of the switch block. Restore compute shaders to use the
> > > fragment shader path.
> > > 
> > > Also, simplify the fragment/compute path to only support scalar mode.
> > > 
> > > commit 2953c3d76178d7589947e6ea1dbd902b7b02b3d4
> > > Author: Kenneth Graunke 
> > > Date:   Fri Aug 14 15:15:11 2015 -0700
> > > 
> > > i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.
> > > 
> > > Signed-off-by: Jordan Justen 
> > > Cc: Kenneth Graunke 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_nir.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
> > > b/src/mesa/drivers/dri/i965/brw_nir.c
> > > index 4f35d81..357ee4f 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_nir.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_nir.c
> > > @@ -96,8 +96,10 @@ brw_nir_lower_inputs(nir_shader *nir, bool is_scalar)
> > >}
> > >break;
> > > case MESA_SHADER_FRAGMENT:
> > > +   case MESA_SHADER_COMPUTE:
> > > +  assert(is_scalar);
> > >nir_assign_var_locations(>inputs, >num_inputs,
> > > -   is_scalar ? type_size_scalar : 
> > > type_size_vec4);
> > > +   type_size_scalar);
> > >break;
> > > default:
> > >unreachable("unsupported shader stage");
> > > 
> > 
> > I didn't think compute shaders had inputs...so it might make more sense
> > just to do:
> > 
> >case MESA_SHADER_COMPUTE:
> >   /* Compute shaders have no inputs. */
> >   break;
> 
> Huh. I looked at nir_assign_var_locations and it seemed to ignore
> UBO's and SSBO's, but not normal uniforms. So, I thought it might be
> needed for uniforms.
> 
> But, I didn't see a regression after skipping calling it. Can you
> confirm it is not needed for uniforms?
> 
> Strange, it looks like fs_visitor::nir_setup_uniforms will use
> var->data.driver_location which nir_assign_var_locations sets. So, I'm
> wondering why not calling it is not causing trouble.
> 
> > Either way, sorry for breaking this, and...
> > Reviewed-by: Kenneth Graunke 
> 
> No problem. Hitting unreachable is easy to debug. :)
> 
> -Jordan

brw_nir_lower_inputs() / brw_nir_lower_outputs() are just for shader
inputs/outputs.  Uniforms are handled by the caller one line below.

--Ken


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block

2015-10-14 Thread Jordan Justen
On 2015-10-13 22:49:08, Iago Toral wrote:
> On Tue, 2015-10-13 at 09:44 -0700, Jordan Justen wrote:
> > On 2015-10-13 05:17:37, Francisco Jerez wrote:
> > > Iago Toral Quiroga  writes:
> > > 
> > > > This fixes the following test:
> > > >
> > > > [require]
> > > > GL >= 3.3
> > > > GLSL >= 3.30
> > > > GL_ARB_shader_storage_buffer_object
> > > >
> > > > [fragment shader]
> > > > #version 330
> > > > #extension GL_ARB_shader_storage_buffer_object: require
> > > >
> > > > buffer SSBO {
> > > > mat4 sm4;
> > > > };
> > > >
> > > >
> > > > mat4 um4;
> > > >
> > > > void main() {
> > > > sm4 *= um4;
> > > 
> > > This is using the value of "um4", which is never assigned, so liveness
> > > analysis will correctly extend its live interval up to the start of the
> > > block.
> > 
> > This test was derived by simplifying a CTS test case.
> > 
> > Anyway, I'm not sure what happened on the way to the commit message,
> > but um4 should be a uniform.
> > 
> > http://sprunge.us/cEUe
> 
> Oh yes, that was me playing around with the example. The patches also
> fix the uniform version. Jordan, can you verify if this fixes the CTS
> test case?

Unfortunately, no. The CTS case has some control flow. I had removed
it to minimize the test case.

Here is a small shader_test that has control flow and still fails to
compile with your patches:

http://sprunge.us/LIjA

> In any case, since Curro is working on a more general fix for this
> stuff, I guess you'd rather wait for his patches...

It depends how long we'd have to wait. :) Anyway, since we don't have
a short-term fix anyhow, let's wait to see what curro has to say.

-Jordan
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: Enable split of lower UBOs and SSBO also for compute shaders

2015-10-14 Thread Marta Lofstedt
From: Marta Lofstedt 

The split of Uniform blocks and shader storage block only loops
up to MESA_SHADER_FRAGMENT and igonres compute shaders.
This cause segfault when running the OpenGL ES 3.1 CTS tests
with GL_ARB_compute_shader enabled.

Signed-off-by: Marta Lofstedt 
---
 src/glsl/linker.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index c61c76e..5b5d6e6 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -4392,7 +4392,7 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
 * for gl_shader_program and gl_shader, so that drivers that need separate
 * index spaces for each set can have that.
 */
-   for (unsigned i = MESA_SHADER_VERTEX; i <= MESA_SHADER_FRAGMENT; i++) {
+   for (unsigned i = MESA_SHADER_VERTEX; i <= MESA_SHADER_COMPUTE; i++) {
   if (prog->_LinkedShaders[i] != NULL) {
  gl_shader *sh = prog->_LinkedShaders[i];
  split_ubos_and_ssbos(sh,
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: Enable split of lower UBOs and SSBO also for compute shaders

2015-10-14 Thread Francisco Jerez
"Lofstedt, Marta"  writes:

> I have found a couple of more places in linker.cpp where we loop up to 
> MESA_SHADER_FRAGMENT.
> Should these now also be up to MESA_SHADER_COMPUTE instead?
>
Some might be oversights like this, but I guess in some cases a loop up
to MESA_SHADER_FRAGMENT might be the right thing to do when doing stuff
like linking varyings that really only applies to the graphics pipeline
and not to compute shaders.

> /Marta
>
>> -Original Message-
>> From: Marta Lofstedt [mailto:marta.lofst...@linux.intel.com]
>> Sent: Wednesday, October 14, 2015 2:56 PM
>> To: mesa-dev@lists.freedesktop.org
>> Cc: Lofstedt, Marta; Marta Lofstedt
>> Subject: [PATCH] glsl: Enable split of lower UBOs and SSBO also for compute
>> shaders
>> 
>> From: Marta Lofstedt 
>> 
>> The split of Uniform blocks and shader storage block only loops up to
>> MESA_SHADER_FRAGMENT and igonres compute shaders.
>> This cause segfault when running the OpenGL ES 3.1 CTS tests with
>> GL_ARB_compute_shader enabled.
>> 
>> Signed-off-by: Marta Lofstedt 
>> ---
>>  src/glsl/linker.cpp | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp index c61c76e..5b5d6e6
>> 100644
>> --- a/src/glsl/linker.cpp
>> +++ b/src/glsl/linker.cpp
>> @@ -4392,7 +4392,7 @@ link_shaders(struct gl_context *ctx, struct
>> gl_shader_program *prog)
>>  * for gl_shader_program and gl_shader, so that drivers that need 
>> separate
>>  * index spaces for each set can have that.
>>  */
>> -   for (unsigned i = MESA_SHADER_VERTEX; i <=
>> MESA_SHADER_FRAGMENT; i++) {
>> +   for (unsigned i = MESA_SHADER_VERTEX; i <= MESA_SHADER_COMPUTE;
>> i++)
>
> Remove new line.
>
>> + {
>>if (prog->_LinkedShaders[i] != NULL) {
>>   gl_shader *sh = prog->_LinkedShaders[i];
>>   split_ubos_and_ssbos(sh,
>> --
>> 2.1.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 2/2] gallium: add tegra support

2015-10-14 Thread Erik Faye-Lund
On Sun, Oct 11, 2015 at 5:09 PM, Christian Gmeiner
 wrote:
> @@ -2181,6 +2188,13 @@ if test -n "$with_gallium_drivers"; then
>  done
>  fi
>
> +dnl We need to validate some needed dependencies for renderonly drivers.
> +
> +if test "x$HAVE_GALLIUM_NOUVEAU" != xyes -a "x$HAVE_GALLIUM_TEGRA" == xyes  
> ; then
> +AC_ERROR([Building with tegra requires that nouveau])

Requires that nouveau what? :)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: Enable split of lower UBOs and SSBO also for compute shaders

2015-10-14 Thread Francisco Jerez
Hi Marta,

Marta Lofstedt  writes:

> From: Marta Lofstedt 
>
> The split of Uniform blocks and shader storage block only loops
> up to MESA_SHADER_FRAGMENT and igonres compute shaders.
> This cause segfault when running the OpenGL ES 3.1 CTS tests
> with GL_ARB_compute_shader enabled.
>
> Signed-off-by: Marta Lofstedt 
> ---
>  src/glsl/linker.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index c61c76e..5b5d6e6 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -4392,7 +4392,7 @@ link_shaders(struct gl_context *ctx, struct 
> gl_shader_program *prog)
>  * for gl_shader_program and gl_shader, so that drivers that need separate
>  * index spaces for each set can have that.
>  */
> -   for (unsigned i = MESA_SHADER_VERTEX; i <= MESA_SHADER_FRAGMENT; i++) {
> +   for (unsigned i = MESA_SHADER_VERTEX; i <= MESA_SHADER_COMPUTE; i++) {

Any reason not to use MESA_SHADER_STAGES as upper bound here so this
doesn't break again if new shader stages are added?

>if (prog->_LinkedShaders[i] != NULL) {
>   gl_shader *sh = prog->_LinkedShaders[i];
>   split_ubos_and_ssbos(sh,
> -- 
> 2.1.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: Enable split of lower UBOs and SSBO also for compute shaders

2015-10-14 Thread Lofstedt, Marta

I have found a couple of more places in linker.cpp where we loop up to 
MESA_SHADER_FRAGMENT.
Should these now also be up to MESA_SHADER_COMPUTE instead?

/Marta

> -Original Message-
> From: Marta Lofstedt [mailto:marta.lofst...@linux.intel.com]
> Sent: Wednesday, October 14, 2015 2:56 PM
> To: mesa-dev@lists.freedesktop.org
> Cc: Lofstedt, Marta; Marta Lofstedt
> Subject: [PATCH] glsl: Enable split of lower UBOs and SSBO also for compute
> shaders
> 
> From: Marta Lofstedt 
> 
> The split of Uniform blocks and shader storage block only loops up to
> MESA_SHADER_FRAGMENT and igonres compute shaders.
> This cause segfault when running the OpenGL ES 3.1 CTS tests with
> GL_ARB_compute_shader enabled.
> 
> Signed-off-by: Marta Lofstedt 
> ---
>  src/glsl/linker.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp index c61c76e..5b5d6e6
> 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -4392,7 +4392,7 @@ link_shaders(struct gl_context *ctx, struct
> gl_shader_program *prog)
>  * for gl_shader_program and gl_shader, so that drivers that need separate
>  * index spaces for each set can have that.
>  */
> -   for (unsigned i = MESA_SHADER_VERTEX; i <=
> MESA_SHADER_FRAGMENT; i++) {
> +   for (unsigned i = MESA_SHADER_VERTEX; i <= MESA_SHADER_COMPUTE;
> i++)

Remove new line.

> + {
>if (prog->_LinkedShaders[i] != NULL) {
>   gl_shader *sh = prog->_LinkedShaders[i];
>   split_ubos_and_ssbos(sh,
> --
> 2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: Improve handling of GL_BGRA format in es3 format_and_type checks

2015-10-14 Thread Eduardo Lima Mitev
We recently added support for GL_BGRA internal format when validating
combination of format+type+internal_format in Tex(Sub)ImageXD calls
(to fix https://bugs.freedesktop.org/show_bug.cgi?id=92265).

However, the current implementation handles it as a special case when
obtaining the effective internal format, treating GL_BGRA as if its
base format is GL_RGBA execpt for the case of validation.

This causes Mesa to accept a combination like:
internalFormat = GL_BGRA_EXT, format = GL_RGBA, type = GL_UNSIGNED_BYTE as
valid arguments to TexImage2D, when it is actually an invalid combination
per EXT_texture_format_BGRA


This patch makes _mesa_base_tex_format() return GL_BGRA_EXT as base format of
GL_BGRA_EXT internal format, which is consistent with the extension
spec. As a result, the code for handling GL_BGRA during validation gets
simplified.
---
 src/mesa/main/glformats.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
index faa6382..e0192fe 100644
--- a/src/mesa/main/glformats.c
+++ b/src/mesa/main/glformats.c
@@ -2148,6 +2148,9 @@ _mesa_es_error_check_format_and_type(GLenum format, 
GLenum type,
  *
  * \return the corresponding \u base internal format (GL_ALPHA, GL_LUMINANCE,
  * GL_LUMANCE_ALPHA, GL_INTENSITY, GL_RGB, or GL_RGBA), or -1 if invalid enum.
+ * When profile is GLES, it will also return GL_BGRA as base format of
+ * GL_BGRA internal format, as specified by extension
+ * EXT_texture_format_BGRA.
  *
  * This is the format which is used during texture application (i.e. the
  * texture format and env mode determine the arithmetic used.
@@ -2215,7 +2218,7 @@ _mesa_base_tex_format(const struct gl_context *ctx, GLint 
internalFormat)
if (_mesa_is_gles(ctx)) {
   switch (internalFormat) {
   case GL_BGRA:
- return GL_RGBA;
+ return GL_BGRA_EXT;
   default:
  ; /* fallthrough */
   }
@@ -2799,18 +2802,8 @@ _mesa_es3_error_check_format_and_type(const struct 
gl_context *ctx,
  return GL_INVALID_OPERATION;
 
   GLenum baseInternalFormat;
-  if (internalFormat == GL_BGRA_EXT) {
- /* Unfortunately, _mesa_base_tex_format returns a base format of
-  * GL_RGBA for GL_BGRA_EXT.  This makes perfect sense if you're
-  * asking the question, "what channels does this format have?"
-  * However, if we're trying to determine if two internal formats
-  * match in the ES3 sense, we actually want GL_BGRA.
-  */
- baseInternalFormat = GL_BGRA_EXT;
-  } else {
- baseInternalFormat =
-_mesa_base_tex_format(ctx, effectiveInternalFormat);
-  }
+  baseInternalFormat =
+ _mesa_base_tex_format(ctx, effectiveInternalFormat);
 
   if (internalFormat != baseInternalFormat)
  return GL_INVALID_OPERATION;
@@ -2820,7 +2813,7 @@ _mesa_es3_error_check_format_and_type(const struct 
gl_context *ctx,
 
switch (format) {
case GL_BGRA_EXT:
-  if (type != GL_UNSIGNED_BYTE || internalFormat != GL_BGRA)
+  if (type != GL_UNSIGNED_BYTE || internalFormat != format)
  return GL_INVALID_OPERATION;
   break;
 
-- 
2.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92265] Black windows in weston after update mesa to 11.0.2-1

2015-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92265

--- Comment #17 from Eduardo Lima Mitev  ---
I just sent for review a piglit test that checks that the combination of
internalFormat=GL_BGRA_EXT, format=GL_BGRA_EXT and type=GL_UNSIGNED_BYTE is
valid on TexImageXD and TexSubImageXD, as specified by the extension
:

http://lists.freedesktop.org/archives/piglit/2015-October/017535.html

This should prevent this regression in the future.

However, this test doesn't pass on master because current handling of GL_BGRA
format allows for this invalid combination (which is checked in the test):

internalFormat=GL_RGBA format=GL_BGRA_EXT and type=GL_UNSIGNED_BYTE

or

internalFormat=GL_BGRA_EXT format=GL_RGBA and type=GL_UNSIGNED_BYTE

So I also sent a patch to mesa-dev that improves this and make the test pass: 

http://lists.freedesktop.org/archives/mesa-dev/2015-October/097211.html

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: initialise record count to 1

2015-10-14 Thread Timothy Arceri
This was only being done in one of the two process methods.

Fixes issue with samplers using the array size of a previous record.

Cc: Marek Olšák 
Cc: Jason Ekstrand 
---
 src/glsl/link_uniforms.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp
index 0ccd9c8..e60b050 100644
--- a/src/glsl/link_uniforms.cpp
+++ b/src/glsl/link_uniforms.cpp
@@ -160,6 +160,7 @@ program_resource_visitor::process(ir_variable *var)
 false, record_array_count);
   ralloc_free(name);
} else {
+  this->set_record_array_count(record_array_count);
   this->visit_field(t, var->name, row_major, NULL, packing, false);
}
 }
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 08/10] mesa: optimize _UsesDualSrc blend flag setting

2015-10-14 Thread Brian Paul
For glBlendFunc and glBlendFuncSeparate(), the _UsesDualSrc flag
will be the same for all buffers, so no need to compute it N times.
---
 src/mesa/main/blend.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
index f14949f..6e27cf9 100644
--- a/src/mesa/main/blend.c
+++ b/src/mesa/main/blend.c
@@ -255,8 +255,13 @@ _mesa_BlendFuncSeparate( GLenum sfactorRGB, GLenum 
dfactorRGB,
   ctx->Color.Blend[buf].DstRGB = dfactorRGB;
   ctx->Color.Blend[buf].SrcA = sfactorA;
   ctx->Color.Blend[buf].DstA = dfactorA;
-  update_uses_dual_src(ctx, buf);
}
+
+   update_uses_dual_src(ctx, 0);
+   for (buf = 1; buf < numBuffers; buf++) {
+  ctx->Color.Blend[buf]._UsesDualSrc = ctx->Color.Blend[0]._UsesDualSrc;
+   }
+
ctx->Color._BlendFuncPerBuffer = GL_FALSE;
 
if (ctx->Driver.BlendFuncSeparate) {
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/10] mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()

2015-10-14 Thread Brian Paul
Changing the matrix mode alone has no effect on rendering and does
not need to trigger a flush or state validation.
---
 src/mesa/main/matrix.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/main/matrix.c b/src/mesa/main/matrix.c
index 2b8016a..5ff5ac5 100644
--- a/src/mesa/main/matrix.c
+++ b/src/mesa/main/matrix.c
@@ -151,7 +151,6 @@ _mesa_MatrixMode( GLenum mode )
 
if (ctx->Transform.MatrixMode == mode && mode != GL_TEXTURE)
   return;
-   FLUSH_VERTICES(ctx, _NEW_TRANSFORM);
 
switch (mode) {
case GL_MODELVIEW:
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 10/10] mesa: wrap a ridiculously long line in es1_conversion.c

2015-10-14 Thread Brian Paul
---
 src/mesa/main/es1_conversion.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/es1_conversion.c b/src/mesa/main/es1_conversion.c
index b254a6e..1dfe827 100644
--- a/src/mesa/main/es1_conversion.c
+++ b/src/mesa/main/es1_conversion.c
@@ -1,3 +1,4 @@
+
 #include 
 
 #include "api_loopback.h"
@@ -326,7 +327,24 @@ _mesa_GetTexEnvxv(GLenum target, GLenum pname, GLfixed 
*params)
   }
   break;
case GL_TEXTURE_ENV:
-  if (pname != GL_TEXTURE_ENV_COLOR && pname != GL_RGB_SCALE && pname != 
GL_ALPHA_SCALE && pname != GL_TEXTURE_ENV_MODE && pname != GL_COMBINE_RGB && 
pname != GL_COMBINE_ALPHA && pname != GL_SRC0_RGB && pname != GL_SRC1_RGB && 
pname != GL_SRC2_RGB && pname != GL_SRC0_ALPHA && pname != GL_SRC1_ALPHA && 
pname != GL_SRC2_ALPHA && pname != GL_OPERAND0_RGB && pname != GL_OPERAND1_RGB 
&& pname != GL_OPERAND2_RGB && pname != GL_OPERAND0_ALPHA && pname != 
GL_OPERAND1_ALPHA && pname != GL_OPERAND2_ALPHA) {
+  if (pname != GL_TEXTURE_ENV_COLOR &&
+  pname != GL_RGB_SCALE &&
+  pname != GL_ALPHA_SCALE &&
+  pname != GL_TEXTURE_ENV_MODE &&
+  pname != GL_COMBINE_RGB &&
+  pname != GL_COMBINE_ALPHA &&
+  pname != GL_SRC0_RGB &&
+  pname != GL_SRC1_RGB &&
+  pname != GL_SRC2_RGB &&
+  pname != GL_SRC0_ALPHA &&
+  pname != GL_SRC1_ALPHA &&
+  pname != GL_SRC2_ALPHA &&
+  pname != GL_OPERAND0_RGB &&
+  pname != GL_OPERAND1_RGB &&
+  pname != GL_OPERAND2_RGB &&
+  pname != GL_OPERAND0_ALPHA &&
+  pname != GL_OPERAND1_ALPHA &&
+  pname != GL_OPERAND2_ALPHA) {
  _mesa_error(_mesa_get_current_context(), GL_INVALID_ENUM,
  "glGetTexEnvxv(target=0x%x)", target);
  return;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 03/10] mesa: optimize no-change check in _mesa_BlendFuncSeparate()

2015-10-14 Thread Brian Paul
Streamline the checking for no state change in _mesa_BlendFuncSeparate()
(and _mesa_BlendFunc()).  If _BlendFuncPerBuffer is false, we only need
to check the 0th buffer state.  Move argument validation after the no-op
check.

I'm looking at an app that issues about 1000 redundant glBlendFunc()
calls per frame!
---
 src/mesa/main/blend.c | 43 ---
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
index dee5e29..98d2858 100644
--- a/src/mesa/main/blend.c
+++ b/src/mesa/main/blend.c
@@ -203,7 +203,7 @@ _mesa_BlendFuncSeparate( GLenum sfactorRGB, GLenum 
dfactorRGB,
 GLenum sfactorA, GLenum dfactorA )
 {
GLuint buf, numBuffers;
-   GLboolean changed;
+   bool changed = false;
GET_CURRENT_CONTEXT(ctx);
 
if (MESA_VERBOSE & VERBOSE_API)
@@ -213,28 +213,41 @@ _mesa_BlendFuncSeparate( GLenum sfactorRGB, GLenum 
dfactorRGB,
   _mesa_enum_to_string(sfactorA),
   _mesa_enum_to_string(dfactorA));
 
-   if (!validate_blend_factors(ctx, "glBlendFuncSeparate",
-   sfactorRGB, dfactorRGB,
-   sfactorA, dfactorA)) {
-  return;
-   }
-
numBuffers = ctx->Extensions.ARB_draw_buffers_blend
   ? ctx->Const.MaxDrawBuffers : 1;
 
-   changed = GL_FALSE;
-   for (buf = 0; buf < numBuffers; buf++) {
-  if (ctx->Color.Blend[buf].SrcRGB != sfactorRGB ||
-  ctx->Color.Blend[buf].DstRGB != dfactorRGB ||
-  ctx->Color.Blend[buf].SrcA != sfactorA ||
-  ctx->Color.Blend[buf].DstA != dfactorA) {
- changed = GL_TRUE;
- break;
+   /* Check if we're really changing any state.  If not, return early. */
+   if (ctx->Color._BlendFuncPerBuffer) {
+  /* Check all per-buffer states */
+  for (buf = 0; buf < numBuffers; buf++) {
+ if (ctx->Color.Blend[buf].SrcRGB != sfactorRGB ||
+ ctx->Color.Blend[buf].DstRGB != dfactorRGB ||
+ ctx->Color.Blend[buf].SrcA != sfactorA ||
+ ctx->Color.Blend[buf].DstA != dfactorA) {
+changed = true;
+break;
+ }
+  }
+   }
+   else {
+  /* only need to check 0th per-buffer state */
+  if (ctx->Color.Blend[0].SrcRGB != sfactorRGB ||
+  ctx->Color.Blend[0].DstRGB != dfactorRGB ||
+  ctx->Color.Blend[0].SrcA != sfactorA ||
+  ctx->Color.Blend[0].DstA != dfactorA) {
+ changed = true;
   }
}
+
if (!changed)
   return;
 
+   if (!validate_blend_factors(ctx, "glBlendFuncSeparate",
+   sfactorRGB, dfactorRGB,
+   sfactorA, dfactorA)) {
+  return;
+   }
+
FLUSH_VERTICES(ctx, _NEW_COLOR);
 
for (buf = 0; buf < numBuffers; buf++) {
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 09/10] mesa: add num_buffers() helper in blend.c

2015-10-14 Thread Brian Paul
---
 src/mesa/main/blend.c | 37 ++---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
index 6e27cf9..ccbac75 100644
--- a/src/mesa/main/blend.c
+++ b/src/mesa/main/blend.c
@@ -190,6 +190,19 @@ update_uses_dual_src(struct gl_context *ctx, int buf)
blend_factor_is_dual_src(ctx->Color.Blend[buf].DstA));
 }
 
+
+/**
+ * Return the number of per-buffer blend states to update in
+ * glBlendFunc, glBlendFuncSeparate, glBlendEquation, etc.
+ */
+static inline unsigned
+num_buffers(const struct gl_context *ctx)
+{
+   return ctx->Extensions.ARB_draw_buffers_blend
+  ? ctx->Const.MaxDrawBuffers : 1;
+}
+
+
 /**
  * Set the separate blend source/dest factors for all draw buffers.
  *
@@ -202,9 +215,10 @@ void GLAPIENTRY
 _mesa_BlendFuncSeparate( GLenum sfactorRGB, GLenum dfactorRGB,
 GLenum sfactorA, GLenum dfactorA )
 {
-   GLuint buf, numBuffers;
-   bool changed = false;
GET_CURRENT_CONTEXT(ctx);
+   const unsigned numBuffers = num_buffers(ctx);
+   unsigned buf;
+   bool changed = false;
 
if (MESA_VERBOSE & VERBOSE_API)
   _mesa_debug(ctx, "glBlendFuncSeparate %s %s %s %s\n",
@@ -213,9 +227,6 @@ _mesa_BlendFuncSeparate( GLenum sfactorRGB, GLenum 
dfactorRGB,
   _mesa_enum_to_string(sfactorA),
   _mesa_enum_to_string(dfactorA));
 
-   numBuffers = ctx->Extensions.ARB_draw_buffers_blend
-  ? ctx->Const.MaxDrawBuffers : 1;
-
/* Check if we're really changing any state.  If not, return early. */
if (ctx->Color._BlendFuncPerBuffer) {
   /* Check all per-buffer states */
@@ -349,17 +360,15 @@ legal_blend_equation(const struct gl_context *ctx, GLenum 
mode)
 void GLAPIENTRY
 _mesa_BlendEquation( GLenum mode )
 {
-   GLuint buf, numBuffers;
-   bool changed = false;
GET_CURRENT_CONTEXT(ctx);
+   const unsigned numBuffers = num_buffers(ctx);
+   unsigned buf;
+   bool changed = false;
 
if (MESA_VERBOSE & VERBOSE_API)
   _mesa_debug(ctx, "glBlendEquation(%s)\n",
   _mesa_enum_to_string(mode));
 
-   numBuffers = ctx->Extensions.ARB_draw_buffers_blend
-  ? ctx->Const.MaxDrawBuffers : 1;
-
if (ctx->Color._BlendEquationPerBuffer) {
   /* Check all per-buffer states */
   for (buf = 0; buf < numBuffers; buf++) {
@@ -436,18 +445,16 @@ _mesa_BlendEquationiARB(GLuint buf, GLenum mode)
 void GLAPIENTRY
 _mesa_BlendEquationSeparate( GLenum modeRGB, GLenum modeA )
 {
-   GLuint buf, numBuffers;
-   bool changed = false;
GET_CURRENT_CONTEXT(ctx);
+   const unsigned numBuffers = num_buffers(ctx);
+   unsigned buf;
+   bool changed = false;
 
if (MESA_VERBOSE & VERBOSE_API)
   _mesa_debug(ctx, "glBlendEquationSeparateEXT(%s %s)\n",
   _mesa_enum_to_string(modeRGB),
   _mesa_enum_to_string(modeA));
 
-   numBuffers = ctx->Extensions.ARB_draw_buffers_blend
-  ? ctx->Const.MaxDrawBuffers : 1;
-
if (ctx->Color._BlendEquationPerBuffer) {
   /* Check all per-buffer states */
   for (buf = 0; buf < numBuffers; buf++) {
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 05/10] mesa: optimize no-change check in _mesa_BlendEquationSeparate()

2015-10-14 Thread Brian Paul
---
 src/mesa/main/blend.c | 41 ++---
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
index 01b6919..14742d0 100644
--- a/src/mesa/main/blend.c
+++ b/src/mesa/main/blend.c
@@ -432,7 +432,7 @@ void GLAPIENTRY
 _mesa_BlendEquationSeparate( GLenum modeRGB, GLenum modeA )
 {
GLuint buf, numBuffers;
-   GLboolean changed;
+   bool changed = false;
GET_CURRENT_CONTEXT(ctx);
 
if (MESA_VERBOSE & VERBOSE_API)
@@ -440,6 +440,30 @@ _mesa_BlendEquationSeparate( GLenum modeRGB, GLenum modeA )
   _mesa_enum_to_string(modeRGB),
   _mesa_enum_to_string(modeA));
 
+   numBuffers = ctx->Extensions.ARB_draw_buffers_blend
+  ? ctx->Const.MaxDrawBuffers : 1;
+
+   if (ctx->Color._BlendEquationPerBuffer) {
+  /* Check all per-buffer states */
+  for (buf = 0; buf < numBuffers; buf++) {
+ if (ctx->Color.Blend[buf].EquationRGB != modeRGB ||
+ ctx->Color.Blend[buf].EquationA != modeA) {
+changed = true;
+break;
+ }
+  }
+   }
+   else {
+  /* only need to check 0th per-buffer state */
+  if (ctx->Color.Blend[0].EquationRGB != modeRGB ||
+  ctx->Color.Blend[0].EquationA != modeA) {
+ changed = true;
+  }
+   }
+
+   if (!changed)
+  return;
+
if ( (modeRGB != modeA) && !ctx->Extensions.EXT_blend_equation_separate ) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
  "glBlendEquationSeparateEXT not supported by driver");
@@ -456,21 +480,8 @@ _mesa_BlendEquationSeparate( GLenum modeRGB, GLenum modeA )
   return;
}
 
-   numBuffers = ctx->Extensions.ARB_draw_buffers_blend
-  ? ctx->Const.MaxDrawBuffers : 1;
-
-   changed = GL_FALSE;
-   for (buf = 0; buf < numBuffers; buf++) {
-  if (ctx->Color.Blend[buf].EquationRGB != modeRGB ||
-  ctx->Color.Blend[buf].EquationA != modeA) {
- changed = GL_TRUE;
- break;
-  }
-   }
-   if (!changed)
-  return;
-
FLUSH_VERTICES(ctx, _NEW_COLOR);
+
for (buf = 0; buf < numBuffers; buf++) {
   ctx->Color.Blend[buf].EquationRGB = modeRGB;
   ctx->Color.Blend[buf].EquationA = modeA;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 07/10] mesa: fix incorrect error string in _mesa_BlendEquationiARB()

2015-10-14 Thread Brian Paul
---
 src/mesa/main/blend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
index d225f3d..f14949f 100644
--- a/src/mesa/main/blend.c
+++ b/src/mesa/main/blend.c
@@ -407,7 +407,7 @@ _mesa_BlendEquationiARB(GLuint buf, GLenum mode)
   buf, _mesa_enum_to_string(mode));
 
if (buf >= ctx->Const.MaxDrawBuffers) {
-  _mesa_error(ctx, GL_INVALID_VALUE, "glBlendFuncSeparatei(buffer=%u)",
+  _mesa_error(ctx, GL_INVALID_VALUE, "glBlendFuncEquationi(buffer=%u)",
   buf);
   return;
}
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/10] mesa: move validate_blend_factors() call after no-change check

2015-10-14 Thread Brian Paul
A redundant call to glBlendFuncSeparateiARB() is more likely than getting
invalid values, so do the no-op check first.
---
 src/mesa/main/blend.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
index 14742d0..d225f3d 100644
--- a/src/mesa/main/blend.c
+++ b/src/mesa/main/blend.c
@@ -296,18 +296,18 @@ _mesa_BlendFuncSeparateiARB(GLuint buf, GLenum 
sfactorRGB, GLenum dfactorRGB,
   return;
}
 
-   if (!validate_blend_factors(ctx, "glBlendFuncSeparatei",
-   sfactorRGB, dfactorRGB,
-   sfactorA, dfactorA)) {
-  return;
-   }
-
if (ctx->Color.Blend[buf].SrcRGB == sfactorRGB &&
ctx->Color.Blend[buf].DstRGB == dfactorRGB &&
ctx->Color.Blend[buf].SrcA == sfactorA &&
ctx->Color.Blend[buf].DstA == dfactorA)
   return; /* no change */
 
+   if (!validate_blend_factors(ctx, "glBlendFuncSeparatei",
+   sfactorRGB, dfactorRGB,
+   sfactorA, dfactorA)) {
+  return;
+   }
+
FLUSH_VERTICES(ctx, _NEW_COLOR);
 
ctx->Color.Blend[buf].SrcRGB = sfactorRGB;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler

2015-10-14 Thread Pohjolainen, Topi
On Wed, Oct 14, 2015 at 11:53:37AM -0700, Jason Ekstrand wrote:
> On Wed, Oct 14, 2015 at 1:41 AM, Pohjolainen, Topi
>  wrote:
> > On Wed, Oct 14, 2015 at 11:25:40AM +0300, Pohjolainen, Topi wrote:
> >> On Sat, Oct 10, 2015 at 08:09:01AM -0700, Jason Ekstrand wrote:
> >> > This commit removes all dependence on GL state by getting rid of the
> >> > brw_context parameter and the GL data structures.
> >> >
> >> > v2 (Jason Ekstrand):
> >> >- Patch use_legacy_snorm_formula through as a function argument rather
> >> >  than trying to go through the shader key.
> >> > ---
> >> >  src/mesa/drivers/dri/i965/brw_vec4.cpp | 70 
> >> > +-
> >> >  src/mesa/drivers/dri/i965/brw_vs.c | 16 +++-
> >> >  src/mesa/drivers/dri/i965/brw_vs.h | 12 --
> >> >  3 files changed, 49 insertions(+), 49 deletions(-)
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> >> > b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >> > index 4b8390f..8e38729 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> >> > @@ -1937,51 +1937,42 @@ extern "C" {
> >> >   * Returns the final assembly and the program's size.
> >> >   */
> >> >  const unsigned *
> >> > -brw_vs_emit(struct brw_context *brw,
> >> > +brw_vs_emit(const struct brw_compiler *compiler, void *log_data,
> >> >  void *mem_ctx,
> >> >  const struct brw_vs_prog_key *key,
> >> >  struct brw_vs_prog_data *prog_data,
> >> > -struct gl_vertex_program *vp,
> >> > -struct gl_shader_program *prog,
> >> > +const nir_shader *shader,
> >> > +gl_clip_plane *clip_planes,
> >> > +bool use_legacy_snorm_formula,
> >> >  int shader_time_index,
> >> > -unsigned *final_assembly_size)
> >> > +unsigned *final_assembly_size,
> >> > +char **error_str)
> >> >  {
> >> > const unsigned *assembly = NULL;
> >> >
> >> > -   if (brw->intelScreen->compiler->scalar_vs) {
> >> > +   if (compiler->scalar_vs) {
> >> >prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
> >> >
> >> > -  fs_visitor v(brw->intelScreen->compiler, brw,
> >> > -   mem_ctx, key, _data->base.base,
> >> > +  fs_visitor v(compiler, log_data, mem_ctx, key, 
> >> > _data->base.base,
> >> > NULL, /* prog; Only used for TEXTURE_RECTANGLE on 
> >> > gen < 8 */
> >> > -   vp->Base.nir, 8, shader_time_index);
> >> > -  if (!v.run_vs(brw_select_clip_planes(>ctx))) {
> >> > - if (prog) {
> >> > -prog->LinkStatus = false;
> >> > -ralloc_strcat(>InfoLog, v.fail_msg);
> >> > - }
> >> > -
> >> > - _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
> >> > -   v.fail_msg);
> >> > +   shader, 8, shader_time_index);
> >> > +  if (!v.run_vs(clip_planes)) {
> >> > + if (error_str)
> >> > +*error_str = ralloc_strdup(mem_ctx, v.fail_msg);
> >>
> >> I don't particularly like the complexity of the error reporting mechanism.
> >> First vec4_visitor::fail() uses ralloc_asprintf() to create one string, 
> >> then
> >> we make a copy of it here and finally the caller of brw_vs_emit() makes yet
> >> another copy using ralloc_strcat().
> >> I wonder if we could pass the final destination all the way for the
> >> vec4_visitor::fail() to augment with ralloc_asprintf() and hence avoid all
> >
> > Or more appropiately using ralloc_asprintf_append()...
> >
> >> the indirection in the middle. What do you think?
> 
> I'd be moderately ok with just doing "*error_str = v.fail_msg" and
> avoiding the extra copy.  I'm not a big fan of the extra copy, but I
> decided to leave it in for a couple of reasons
> 
> 1) It only happens on the error path so it's not a big deal.

I wasn't concerned about the overhead either, as you said this is error path
only.

> 
> 2) Not copying it is kind of a layering violation.  You're grabbing a
> string from an object without copying it, destroying the object, and
> then handing it back to the thing that called you.  The only way this
> works is if you know that the class ralloc'd the string from the
> context you gave it.  We do, in this case, but it did seem like a bit
> of a layering violation.
> 
> 3) The first time I did this rework, I created a new memory context
> for *_emit and destroyed that memory context at the end.  Because
> fail_msg was allocated out of this temp context, I had to do something
> with it before returning it.  The objective there was to remove the
> mem_ctx input parameter and make it more self-contained.  Then I
> realized that we had to do something about the pointer that gets
> returned.  The only sane use of the emit function is to call it,
> immediately copy the result into some buffer, and destroy the context
> so it wasn't like having