date:20151021

Re: [Mesa-dev] [PATCH 1/2] glsl: Implement a SSBO load optimization pass

2015-10-21 Thread Iago Toral

On Wed, 2015-10-21 at 14:58 +0300, Francisco Jerez wrote:
> Iago Toral  writes:
> 
> > On Wed, 2015-10-21 at 13:00 +0300, Francisco Jerez wrote:
> >> Iago Toral  writes:
> >> 
> >> > Hi Curro,
> >> >
> >> > On Tue, 2015-10-20 at 14:18 +0300, Francisco Jerez wrote:
> >> >> Iago Toral  writes:
> >> >> 
> >> >> > On Tue, 2015-10-20 at 13:22 +0300, Francisco Jerez wrote:
> >> >> >> Iago Toral Quiroga  writes:
> >> >> >> 
> >> >> >> > This allows us to re-use the results of previous ssbo loads in 
> >> >> >> > situations
> >> >> >> > that are safe (i.e. when there are no stores, atomic operations or
> >> >> >> > memory barriers in between).
> >> >> >> >
> >> >> >> > This is particularly useful for things like matrix 
> >> >> >> > multiplications, where
> >> >> >> > for a mat4 buffer variable we cut the number of loads from 16 (4 
> >> >> >> > reads of
> >> >> >> > each column) down to 4 (1 read of each column).
> >> >> >> >
> >> >> >> > The pass can only cache ssbo loads that involve constant blocks and
> >> >> >> > offsets, but could be extended to compare sub-expressions for these
> >> >> >> > as well, similar to a CSE pass.
> >> >> >> >
> >> >> >> > The way the cache works is simple: ssbo loads with constant 
> >> >> >> > block/offset
> >> >> >> > are included in a cache as they are seen. Stores invalidate cache 
> >> >> >> > entries.
> >> >> >> > Stores with non-constant offset invalidate all cached loads for 
> >> >> >> > the block
> >> >> >> > and stores with non-constant block invalidate all cache entries. 
> >> >> >> > There is
> >> >> >> > room to improve this by using the actual variable name we are 
> >> >> >> > accessing to
> >> >> >> > limit the entries that should be invalidated. We also need to 
> >> >> >> > invalidate
> >> >> >> > cache entries when we exit the block in which they have been 
> >> >> >> > defined
> >> >> >> > (i.e. inside if/else blocks or loops).
> >> >> >> >
> >> >> >> > The cache optimization is built as a separate pass, instead of 
> >> >> >> > merging it
> >> >> >> > inside the lower_ubo_reference pass for a number of reasons:
> >> >> >> >
> >> >> >> > 1) The way we process assignments in visitors is that the LHS is
> >> >> >> > processed before the RHS. This creates a problem for an 
> >> >> >> > optimization
> >> >> >> > such as this when we do things like a = a + 1, since we would see 
> >> >> >> > the
> >> >> >> > store before the read when the actual execution order is reversed.
> >> >> >> > This could be fixed by re-implementing the logic in the visit_enter
> >> >> >> > method for ir_assignment in lower_ubo_reference and then returning
> >> >> >> > visit_continue_with_parent.
> >> >> >> >
> >> >> >> > 2) Some writes/reads need to be split into multiple smaller
> >> >> >> > writes/reads, and we need to handle caching for each one. This 
> >> >> >> > happens
> >> >> >> > deep inside the code that handles the lowering and some
> >> >> >> > of the information we need to do this is not available. This could 
> >> >> >> > also
> >> >> >> > be fixed by passing more data into the corresponding functions or 
> >> >> >> > by
> >> >> >> > making this data available as class members, but the current 
> >> >> >> > implementation
> >> >> >> > is already complex enough and  this would only contribute to the 
> >> >> >> > complexity.
> >> >> >> >
> >> >> >> > 3) We can have ssbo loads in the LHS too (e.g. a[a[0]] = ..). In these
> >> >> >> > cases the current code in lower_ubo_reference would see the store
> >> >> >> > before the read. Probably fixable, but again would add more
> >> >> >> > complexity to the lowering.
> >> >> >> >
> >> >> >> > On the other hand, a separate pass that runs after the lowering sees
> >> >> >> > all the individual loads and stores in the correct order (so we don't
> >> >> >> > need to do any tricks) and it allows us to separate the lowering
> >> >> >> > logic (which is already complex) from the caching logic. It also
> >> >> >> > gives us a chance to run it after other optimization passes have run
> >> >> >> > and turned constant expressions for block/offset into constants,
> >> >> >> > enabling more opportunities for caching.
> >> >> >> 
> >> >> >> Seems like a restricted form of CSE that only handles SSBO loads, and
> >> >> >> only the ones with constant arguments.  Why don't we CSE these? (and
> >> >> >> other memory access operations like image loads)
> >> >> >
> >> >> > There is not a CSE pass in GLSL IR any more so we would have to do it 
> >> >> > in
> >> >> > NIR and some drivers would lose the optimization. Doing it at GLSL IR
> >> >> > level seemed like a win from this perspective.
> >> >> >
> >> >> > Then there is the fact that we cannot just CSE these. We need to make
> >> >> > sure that we only CSE them when it is safe to do so (i.e. no
> >> >> > stores/atomics to the

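For illustration only (this is not code from the patch): a minimal sketch in C of the cache behaviour described in the commit message quoted above. All names (sketch_load, sketch_store, sketch_flush, ...) are hypothetical, every access is assumed to have the same size, and a mat4 column stride of 16 bytes is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_ENTRIES 32

/* One cached SSBO load, keyed by constant block index and byte offset. */
struct sketch_entry {
   bool valid;
   unsigned block;
   unsigned offset;
};

struct sketch_cache {
   struct sketch_entry entries[SKETCH_MAX_ENTRIES];
};

/* Returns true if an earlier load can be reused; otherwise records this
 * load (when a slot is free) and returns false.
 */
static bool
sketch_load(struct sketch_cache *c, unsigned block, unsigned offset)
{
   struct sketch_entry *free_slot = NULL;

   for (size_t i = 0; i < SKETCH_MAX_ENTRIES; i++) {
      struct sketch_entry *e = &c->entries[i];
      if (!e->valid) {
         if (!free_slot)
            free_slot = e;
      } else if (e->block == block && e->offset == offset) {
         return true;
      }
   }

   if (free_slot) {
      free_slot->valid = true;
      free_slot->block = block;
      free_slot->offset = offset;
   }
   return false;
}

/* A store invalidates the matching entry.  A non-constant offset drops
 * every entry of the block; a non-constant block drops everything.
 */
static void
sketch_store(struct sketch_cache *c, bool const_block, unsigned block,
             bool const_offset, unsigned offset)
{
   for (size_t i = 0; i < SKETCH_MAX_ENTRIES; i++) {
      struct sketch_entry *e = &c->entries[i];
      if (!e->valid)
         continue;
      if (!const_block ||
          (e->block == block && (!const_offset || e->offset == offset)))
         e->valid = false;
   }
}

/* Atomics, memory barriers and leaving an if/else or loop body drop
 * all entries.
 */
static void
sketch_flush(struct sketch_cache *c)
{
   for (size_t i = 0; i < SKETCH_MAX_ENTRIES; i++)
      c->entries[i].valid = false;
}

/* The mat4 case: each of the 4 columns is read 4 times.  Returns the
 * number of loads that actually have to be emitted.
 */
static unsigned
sketch_mat4_loads(void)
{
   struct sketch_cache c = { 0 };
   unsigned emitted = 0;

   for (unsigned read = 0; read < 4; read++)
      for (unsigned col = 0; col < 4; col++)
         if (!sketch_load(&c, 0, col * 16))
            emitted++;
   return emitted;
}
```

With these assumptions sketch_mat4_loads() counts 4 emitted loads instead of 16, which matches the reduction claimed in the commit message. A real pass would also have to consider access sizes and overlapping ranges, which this sketch ignores.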
Re: [Mesa-dev] [PATCH 1/4] mesa: Draw indirect is not allowed if the default VAO is bound.

2015-10-21 Thread Ian Romanick

On 10/20/2015 10:22 AM, Ilia Mirkin wrote:
> On Tue, Oct 20, 2015 at 10:19 AM, Marta Lofstedt
>  wrote:
>> From: Marta Lofstedt 
>>
>> From OpenGL ES 3.1 specification, section 10.5:
>> "DrawArraysIndirect requires that all data sourced for the
>> command, including the DrawArraysIndirectCommand
>> structure,  be in buffer objects,  and may not be called when
>> the default vertex array object is bound."
> 
> Is it possible to do this with desktop GL? AFAIK ARB_draw_indirect is
> only enabled for core profiles, and you can't draw at all in core
> without a VAO bound. So I think you can remove the _mesa_is_gles31
> check. [Might want to wait on confirmation for that, or double-check
> my claim yourself.]

Correct.  We do this check for core profile in check_valid_to_render
because a VAO is always required.

>> Signed-off-by: Marta Lofstedt 
>> ---
>>  src/mesa/main/api_validate.c | 13 +
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
>> index a46c194..c5628f5 100644
>> --- a/src/mesa/main/api_validate.c
>> +++ b/src/mesa/main/api_validate.c
>> @@ -698,6 +698,19 @@ valid_draw_indirect(struct gl_context *ctx,
>>  {
>> const GLsizeiptr end = (GLsizeiptr)indirect + size;
>>
>> +   /*
>> +* OpenGL ES 3.1 spec. section 10.5:
>> +* "DrawArraysIndirect requires that all data sourced for the
>> +* command, including the DrawArraysIndirectCommand
>> +* structure,  be in buffer objects,  and may not be called when
>> +* the default vertex array object is bound."
>> +*/
>> +   if (_mesa_is_gles31(ctx) && (ctx->Array.VAO == ctx->Array.DefaultVAO)) {
>> +  _mesa_error(ctx, GL_INVALID_OPERATION,
>> +  "%s(The default VAO is bound)", name);

We should use the same error here that is used in check_valid_to_render
for core profile:

 _mesa_error(ctx, GL_INVALID_OPERATION, "%s(no VAO bound)",
function);

>> +  return GL_FALSE;
>> +   }
>> +
>> if (!_mesa_valid_prim_mode(ctx, mode, name))
>>return GL_FALSE;
>>
>> --
>> 2.1.4
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/4] mesa: Draw Indirect is not allowed when no vertex array binding exists.

2015-10-21 Thread Marek Olšák

You still have to check all enabled vertex attributes. If you don't want to
loop, use bitmasks. See u_vbuf.c as an example of how to avoid loops.

Marek
On Oct 21, 2015 2:33 PM, "Lofstedt, Marta"  wrote:

> > -Original Message-
> > From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> > Behalf Of Tapani Pälli
> > Sent: Wednesday, October 21, 2015 1:25 PM
> > To: Marek Olšák
> > Cc: mesa-dev@lists.freedesktop.org
> > Subject: Re: [Mesa-dev] [PATCH 2/4] mesa: Draw Indirect is not allowed
> > when no vertex array binding exists.
> >
> > On 10/21/2015 01:41 PM, Marek Olšák wrote:
> > > On Wed, Oct 21, 2015 at 7:16 AM, Tapani Pälli 
> > wrote:
> > >> On 10/20/2015 08:54 PM, Marek Olšák wrote:
> > >>> On Tue, Oct 20, 2015 at 4:19 PM, Marta Lofstedt
> > >>>  wrote:
> > >>>> From: Marta Lofstedt
> > >>>>
> > >>>> OpenGL ES 3.1 spec. section 10.5:
> > >>>> "An INVALID_OPERATION error is generated if zero is bound to
> > >>>> VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to any enabled vertex
> > >>>> array."
> > >>>>
> > >>>> Signed-off-by: Marta Lofstedt
> > >>>> ---
> > >>>>  src/mesa/main/api_validate.c | 14 ++
> > >>>>  1 file changed, 14 insertions(+)
> > >>>>
> > >>>> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> > >>>> index c5628f5..7062cbd 100644
> > >>>> --- a/src/mesa/main/api_validate.c
> > >>>> +++ b/src/mesa/main/api_validate.c
> > >>>> @@ -711,6 +711,20 @@ valid_draw_indirect(struct gl_context *ctx,
> > >>>>        return GL_FALSE;
> > >>>>     }
> > >>>>
> > >>>> +   /*
> > >>>> +    * OpenGL ES 3.1 spec. section 10.5:
> > >>>> +    * "An INVALID_OPERATION error is generated if zero is bound to
> > >>>> +    * VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to any enabled
> > >>>> +    * vertex array."
> > >>>> +    * OpenGL 4.5 spec. section 10.4:
> > >>>> +    * "An INVALID_OPERATION error is generated if  zero is bound to
> > >>>> +    * DRAW_INDIRECT_BUFFER, or if  no element array buffer is bound"
> > >>>> +    */
> > >>>> +   if (!_mesa_is_bufferobj(ctx->Array.ArrayBufferObj)) {
> > >>>> +      _mesa_error(ctx, GL_INVALID_OPERATION,
> > >>>> +                  "%s(No VBO is bound)", name);
> > >>>> +   }
> > >>> NAK.
> > >>>
> > >>> VERTEX_ARRAY_BINDING is a VAO. Array.ArrayBufferObj is from
> > glBindBuffer.
> > >>
> > >> This check is valid, it is not against VERTEX_ARRAY_BINDING. Note
> > >> "any enabled vertex array", we hit this weird situation when client
> > >> has a VAO bound and has enabled vertex attrib array but has not bound
> > any VBO to it.
> > > No, it's invalid. The check has absolutely nothing to do with enabled
> > > vertex arrays and draw calls. Absolutely nothing. glBindBuffer changes
> > > a latched state, which means it doesn't do anything by itself, it only
> > > affects other functions that change states. The functions affected by
> > > glBindBuffer(GL_ARRAY_BUFFER, ..) are glVertexAttribPointer, etc. not
> > > glDraw*. If you called glBindBuffer(GL_ARRAY_BUFFER, ..) right before
> > > a Draw call, it wouldn't do anything to vertex arrays and buffers, but
> > > it would pass the check.
> >
> > OK, my understanding was that the reason this change fixes the bug is that
> > ctx->Array.ArrayBufferObj is 0 for the default VAO and never 0 when the
> > vertex array buffer binding has been set, and this would happen when we
> > have a VBO bound. I will spend some more time to understand this.
>
> If you have access to the CTS it is these tests that this fixed:
> ES31-CTS.draw_indirect.negative-noVBO-arrays
> ES31-CTS.draw_indirect.negative-noVBO-elements
>
> My understanding is the same as Tapani's above; I was trying to come up with
> a method that does not need to loop through the VertexAttribPointers.
> Also, I have misquoted the spec. I should have quoted only the
> "or any enabled vertex arrays" part and limited the check to GLES 3.1.
>
> >
> > > Now, where does this patch check "enabled vertex arrays"? Nowhere. It
> > > doesn't check VERTEX_ARRAY_BINDING, it doesn't check
> > > DRAW_INDIRECT_BUFFER, and it doesn't check enabled vertex arrays. That
> > > whole comment is completely useless there.
> > >
> > > Sorry if I'm too direct, but you should really think more before
> > > making such statements and giving Reviewed-by.
> >
> > // Tapani

[Mesa-dev] [Bug 92570] 10 bit h264 OMX UVD decode outputs NV12

2015-10-21 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=92570

Bug ID: 92570
   Summary: 10 bit h264 OMX UVD decode outputs NV12
   Product: Mesa
   Version: git
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: Other
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: adf.li...@gmail.com
QA Contact: mesa-dev@lists.freedesktop.org

No idea if this is a Mesa OMX issue or UVD or gstreamer -

GPU is R9 285 Tonga.

In theory this should be able to h/w decode 10 bit h264 - and the h/w does seem
to process it.

The problem is that something is assuming/expecting/indicating that the output
is NV12, so the output is corrupted.

Here's a snip of a debug output from doing -

GST_DEBUG=*:4 gst-launch-1.0 -f filesrc location=A-10bit-h264.mkv !
matroskademux ! h264parse !  omxh264dec ! filesink location=out.yuv

0:00:00.364695565   660  0x22310f0 INFO   GST_EVENT
gstevent.c:679:gst_event_new_caps: creating caps event video/x-h264,
level=(string)4.1, profile=(string)high-10, stream-format=(string)byte-stream,
alignment=(string)au, width=(int)1920, height=(int)1080,
framerate=(fraction)3/1001, parsed=(boolean)true



gstpad.c:5881:gst_pad_start_task: created task
0x22f85f0
0:00:00.366822661   660  0x2231590 INFO   GST_EVENT 

gstevent.c:679:gst_event_new_caps: creating caps event video/x-raw,
format=(string)NV12, width=(int)1920, height=(int)1080,
interlace-mode=(string)progressive, pixel-aspect-ratio=(fraction)1/1,
chroma-site=(string)mpeg2, colorimetry=(string)bt709,
framerate=(fraction)3/1001

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH 2/4] mesa: Draw Indirect is not allowed when no vertex array binding exists.

2015-10-21 Thread Ian Romanick

On 10/21/2015 07:32 AM, Lofstedt, Marta wrote:
>> -Original Message-
>> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
>> Behalf Of Tapani Pälli
>> Sent: Wednesday, October 21, 2015 1:25 PM
>> To: Marek Olšák
>> Cc: mesa-dev@lists.freedesktop.org
>> Subject: Re: [Mesa-dev] [PATCH 2/4] mesa: Draw Indirect is not allowed
>> when no vertex array binding exists.
>>
>> On 10/21/2015 01:41 PM, Marek Olšák wrote:
>>> On Wed, Oct 21, 2015 at 7:16 AM, Tapani Pälli 
>> wrote:
>>>> On 10/20/2015 08:54 PM, Marek Olšák wrote:
>>>>> On Tue, Oct 20, 2015 at 4:19 PM, Marta Lofstedt
>>>>>  wrote:
>>>>>> From: Marta Lofstedt
>>>>>>
>>>>>> OpenGL ES 3.1 spec. section 10.5:
>>>>>> "An INVALID_OPERATION error is generated if zero is bound to
>>>>>> VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to any enabled vertex
>>>>>> array."
>>>>>>
>>>>>> Signed-off-by: Marta Lofstedt
>>>>>> ---
>>>>>>  src/mesa/main/api_validate.c | 14 ++
>>>>>>  1 file changed, 14 insertions(+)
>>>>>>
>>>>>> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
>>>>>> index c5628f5..7062cbd 100644
>>>>>> --- a/src/mesa/main/api_validate.c
>>>>>> +++ b/src/mesa/main/api_validate.c
>>>>>> @@ -711,6 +711,20 @@ valid_draw_indirect(struct gl_context *ctx,
>>>>>>        return GL_FALSE;
>>>>>>     }
>>>>>>
>>>>>> +   /*
>>>>>> +    * OpenGL ES 3.1 spec. section 10.5:
>>>>>> +    * "An INVALID_OPERATION error is generated if zero is bound to
>>>>>> +    * VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to any enabled
>>>>>> +    * vertex array."
>>>>>> +    * OpenGL 4.5 spec. section 10.4:
>>>>>> +    * "An INVALID_OPERATION error is generated if  zero is bound to
>>>>>> +    * DRAW_INDIRECT_BUFFER, or if  no element array buffer is bound"
>>>>>> +    */
>>>>>> +   if (!_mesa_is_bufferobj(ctx->Array.ArrayBufferObj)) {
>>>>>> +      _mesa_error(ctx, GL_INVALID_OPERATION,
>>>>>> +                  "%s(No VBO is bound)", name);
>>>>>> +   }
>>>>> NAK.
>>>>>
>>>>> VERTEX_ARRAY_BINDING is a VAO. Array.ArrayBufferObj is from glBindBuffer.
>>>>
>>>> This check is valid, it is not against VERTEX_ARRAY_BINDING. Note
>>>> "any enabled vertex array", we hit this weird situation when client
>>>> has a VAO bound and has enabled vertex attrib array but has not bound
>>>> any VBO to it.
>>> No, it's invalid. The check has absolutely nothing to do with enabled
>>> vertex arrays and draw calls. Absolutely nothing. glBindBuffer changes
>>> a latched state, which means it doesn't do anything by itself, it only
>>> affects other functions that change states. The functions affected by
>>> glBindBuffer(GL_ARRAY_BUFFER, ..) are glVertexAttribPointer, etc. not
>>> glDraw*. If you called glBindBuffer(GL_ARRAY_BUFFER, ..) right before
>>> a Draw call, it wouldn't do anything to vertex arrays and buffers, but
>>> it would pass the check.
>>
>> OK, my understanding was that the reason this change fixes the bug is that
>> ctx->Array.ArrayBufferObj is 0 for the default VAO and never 0 when the
>> vertex array buffer binding has been set, and this would happen when we
>> have a VBO bound. I will spend some more time to understand this.

Core profile has the same sort of limitation. I really hope we're
enforcing it there. It's probably worth finding that check. I expected
to find it in either check_valid_to_render or _mesa_valid_to_render, but
I didn't see it in either place. Hmm... it may just happen in
_mesa_VertexAttribPointer.
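
To make the latched-state point quoted above concrete, here is a toy model in C. It is only a sketch: toy_context and the toy_* functions are hypothetical stand-ins, not Mesa structures, and it models nothing except the fact that the array buffer binding is captured when the attribute pointer is set.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ATTRIBS 16

struct toy_attrib {
   bool enabled;
   unsigned buffer;              /* 0 means "no buffer object" */
};

struct toy_context {
   unsigned array_buffer;        /* latched GL_ARRAY_BUFFER binding */
   struct toy_attrib attribs[TOY_MAX_ATTRIBS];
};

/* Like glBindBuffer(GL_ARRAY_BUFFER, ..): only changes the latch. */
static void
toy_bind_array_buffer(struct toy_context *ctx, unsigned buffer)
{
   ctx->array_buffer = buffer;
}

/* Like glVertexAttribPointer: captures the latched binding. */
static void
toy_vertex_attrib_pointer(struct toy_context *ctx, unsigned index)
{
   assert(index < TOY_MAX_ATTRIBS);
   ctx->attribs[index].buffer = ctx->array_buffer;
}

static void
toy_enable_attrib(struct toy_context *ctx, unsigned index)
{
   assert(index < TOY_MAX_ATTRIBS);
   ctx->attribs[index].enabled = true;
}

/* What the patch tests: only the latched binding. */
static bool
toy_patch_check(const struct toy_context *ctx)
{
   return ctx->array_buffer != 0;
}

/* What the spec text asks for: every enabled array has a buffer. */
static bool
toy_spec_check(const struct toy_context *ctx)
{
   for (unsigned i = 0; i < TOY_MAX_ATTRIBS; i++) {
      if (ctx->attribs[i].enabled && ctx->attribs[i].buffer == 0)
         return false;
   }
   return true;
}
```

If attribute 0 is enabled and its pointer is set while nothing is bound, and a buffer is bound afterwards, toy_patch_check() passes while toy_spec_check() fails. That is the case the patch would miss.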

> If you have access to the CTS it is these tests that this fixed:
> ES31-CTS.draw_indirect.negative-noVBO-arrays
> ES31-CTS.draw_indirect.negative-noVBO-elements
> 
> My understanding is the same as Tapani's above; I was trying to come up with
> a method that does not need to loop through the VertexAttribPointers.
> Also, I have misquoted the spec. I should have quoted only the
> "or any enabled vertex arrays" part and limited the check to GLES 3.1.

Since this is a hot path, avoiding a loop would be good.  We already
have such a loop in the driver, and it hurts.  See

http://patchwork.freedesktop.org/patch/56772/

If we were to track some bitmasks of enabled arrays and vbo-backed
arrays, we could simplify the check on non-core profile.  That should
help performance.
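
A rough sketch of what such bitmask tracking could look like, in C. The structure and function names are made up for illustration and do not correspond to existing Mesa fields; at most 32 attribute arrays are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_vao {
   uint32_t enabled;      /* bit i set: attribute array i is enabled */
   uint32_t vbo_backed;   /* bit i set: array i sources a buffer object */
};

/* Called where an array is enabled or disabled. */
static void
sketch_set_enabled(struct sketch_vao *vao, unsigned i, bool on)
{
   assert(i < 32);
   if (on)
      vao->enabled |= UINT32_C(1) << i;
   else
      vao->enabled &= ~(UINT32_C(1) << i);
}

/* Called where an array's buffer is (re)specified; 0 means no buffer. */
static void
sketch_set_buffer(struct sketch_vao *vao, unsigned i, unsigned buffer)
{
   assert(i < 32);
   if (buffer != 0)
      vao->vbo_backed |= UINT32_C(1) << i;
   else
      vao->vbo_backed &= ~(UINT32_C(1) << i);
}

/* Draw-time check without a loop: is any enabled array not VBO-backed? */
static bool
sketch_all_enabled_have_vbo(const struct sketch_vao *vao)
{
   return (vao->enabled & ~vao->vbo_backed) == 0;
}
```

The cost moves to the state-setting calls, which update one bit each, so the draw-time validation becomes a single AND and compare.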

>>> Now, where does this patch check "enabled vertex arrays"? Nowhere. It
>>> doesn't check VERTEX_ARRAY_BINDING, it doesn't check
>>> DRAW_INDIRECT_BUFFER, and it doesn't check enabled vertex arrays. That
>>> whole comment is completely useless there.
>>>
>>> Sorry if I'm too direct, but you should really think more before
>>> making such statements and giving Reviewed-by.
>>
>> // Tapani

Re: [Mesa-dev] [PATCH] mesa/glformats: Undo code changes from _mesa_base_tex_format() move

2015-10-21 Thread Emil Velikov

On 9 October 2015 at 23:35, Nanley Chery  wrote:
> From: Nanley Chery 
>
> The refactoring commit, c6bf1cd, accidentally reverted cd49b97
> and 99b1f47. These changes caused more code to be added to the
> function and removed the existing support for ASTC. This patch
> reverts those modifications.
>
> v2. Actually include ASTC support again.
>
Thanks Nanley.

I take it that with this in place the KHR_texture_compression_astc_ldr
piglits/tests are back online (pass) ?
I have double-checked and this patch does fix the erroneous reverts by c6bf1cd.

Reviewed-by: Emil Velikov 

-Emil

[Mesa-dev] [Bug 92570] 10 bit h264 OMX UVD decode outputs NV12

2015-10-21 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=92570

Christian König  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Severity|normal  |enhancement

--- Comment #1 from Christian König  ---
Yeah, that's a known issue/unimplemented feature.

On pre-Tonga hardware UVD can actually decode 10 bit h264, but still outputs
NV12.

And so far we haven't had the time to actually implement support for the 10bit
video surfaces used on Tonga, so your end result is corrupted.

BTW: If somebody wants to get his hands dirty this should be rather easy to
hack together, just not top priority for us.


[Mesa-dev] [PATCH] glsl: remove excess location qualifier validation

2015-10-21 Thread Timothy Arceri

Location has never been able to be a negative value because it has
always been validated in the parser.

Also the linker doesn't check for negatives like the comment claims.
---

 No piglit regressions and an extra negative test sent for
 ARB_explicit_uniform_location [1]

 [1] http://patchwork.freedesktop.org/patch/62573/

 src/glsl/ast_to_hir.cpp | 70 -
 1 file changed, 22 insertions(+), 48 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 8549d55..0306530 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2422,21 +2422,6 @@ validate_explicit_location(const struct 
ast_type_qualifier *qual,
   const struct gl_context *const ctx = state->ctx;
   unsigned max_loc = qual->location + var->type->uniform_locations() - 1;
 
-  /* ARB_explicit_uniform_location specification states:
-   *
-   * "The explicitly defined locations and the generated locations
-   * must be in the range of 0 to MAX_UNIFORM_LOCATIONS minus one."
-   *
-   * "Valid locations for default-block uniform variable locations
-   * are in the range of 0 to the implementation-defined maximum
-   * number of uniform locations."
-   */
-  if (qual->location < 0) {
- _mesa_glsl_error(loc, state,
-  "explicit location < 0 for uniform %s", var->name);
- return;
-  }
-
   if (max_loc >= ctx->Const.MaxUserAssignableUniformLocations) {
  _mesa_glsl_error(loc, state, "location(s) consumed by uniform %s "
   ">= MAX_UNIFORM_LOCATIONS (%u)", var->name,
@@ -2527,41 +2512,30 @@ validate_explicit_location(const struct 
ast_type_qualifier *qual,
} else {
   var->data.explicit_location = true;
 
-  /* This bit of silliness is needed because invalid explicit locations
-   * are supposed to be flagged during linking.  Small negative values
-   * biased by VERT_ATTRIB_GENERIC0 or FRAG_RESULT_DATA0 could alias
-   * built-in values (e.g., -16+VERT_ATTRIB_GENERIC0 = VERT_ATTRIB_POS).
-   * The linker needs to be able to differentiate these cases.  This
-   * ensures that negative values stay negative.
-   */
-  if (qual->location >= 0) {
- switch (state->stage) {
- case MESA_SHADER_VERTEX:
-var->data.location = (var->data.mode == ir_var_shader_in)
-   ? (qual->location + VERT_ATTRIB_GENERIC0)
-   : (qual->location + VARYING_SLOT_VAR0);
-break;
+  switch (state->stage) {
+  case MESA_SHADER_VERTEX:
+ var->data.location = (var->data.mode == ir_var_shader_in)
+? (qual->location + VERT_ATTRIB_GENERIC0)
+: (qual->location + VARYING_SLOT_VAR0);
+ break;
 
- case MESA_SHADER_TESS_CTRL:
- case MESA_SHADER_TESS_EVAL:
- case MESA_SHADER_GEOMETRY:
-if (var->data.patch)
-   var->data.location = qual->location + VARYING_SLOT_PATCH0;
-else
-   var->data.location = qual->location + VARYING_SLOT_VAR0;
-break;
+  case MESA_SHADER_TESS_CTRL:
+  case MESA_SHADER_TESS_EVAL:
+  case MESA_SHADER_GEOMETRY:
+ if (var->data.patch)
+var->data.location = qual->location + VARYING_SLOT_PATCH0;
+ else
+var->data.location = qual->location + VARYING_SLOT_VAR0;
+ break;
 
- case MESA_SHADER_FRAGMENT:
-var->data.location = (var->data.mode == ir_var_shader_out)
-   ? (qual->location + FRAG_RESULT_DATA0)
-   : (qual->location + VARYING_SLOT_VAR0);
-break;
- case MESA_SHADER_COMPUTE:
-assert(!"Unexpected shader type");
-break;
- }
-  } else {
- var->data.location = qual->location;
+  case MESA_SHADER_FRAGMENT:
+ var->data.location = (var->data.mode == ir_var_shader_out)
+? (qual->location + FRAG_RESULT_DATA0)
+: (qual->location + VARYING_SLOT_VAR0);
+ break;
+  case MESA_SHADER_COMPUTE:
+ assert(!"Unexpected shader type");
+ break;
   }
 
   if (qual->flags.q.explicit_index) {
-- 
2.4.3

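As an illustration of the aliasing that the removed comment describes, here is a small sketch in C. The enum values are assumptions taken from the arithmetic in that comment (-16 + VERT_ATTRIB_GENERIC0 = VERT_ATTRIB_POS, with POS taken as 0); the names are hypothetical and this is not the Mesa code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, mirroring the arithmetic in the removed comment. */
enum {
   SKETCH_ATTRIB_POS = 0,
   SKETCH_ATTRIB_GENERIC0 = 16
};

/* The parser-side rule the commit message relies on. */
static bool
sketch_parser_accepts(int location)
{
   return location >= 0;
}

/* Biasing without any check: a small negative value lands on a
 * built-in slot.
 */
static int
sketch_unchecked_bias(int location)
{
   return location + SKETCH_ATTRIB_GENERIC0;
}

/* Biasing after validation: the result can never fall below GENERIC0. */
static int
sketch_checked_bias(int location)
{
   assert(sketch_parser_accepts(location));
   return location + SKETCH_ATTRIB_GENERIC0;
}
```

Since negative locations are already rejected before this point, the "keep negatives negative" branch cannot be reached, which is the reason given above for deleting it.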

Re: [Mesa-dev] [PATCH v2] mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit

2015-10-21 Thread Ian Romanick

On 09/29/2015 07:57 AM, Neil Roberts wrote:
> Previously there was a problem in i965 where if 16x MSAA is used then
> some of the sample positions are exactly on the 0 x or y axis. When
> the MSAA copy blit shader interpolates the texture coordinates at
> these sample positions it was possible that it would jump to a
> neighboring texel due to rounding errors. It is likely that these
> positions would be used on 16x MSAA because that is where they are
> defined to be in D3D.
> 
> To fix that this patch makes it use interpolateAtOffset in the blit
> shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
> is available. This forces it to interpolate the texture coordinates at
> the pixel center to avoid these problematic positions.

Would it also work to use "centroid in" interpolation qualifier?  Do we
have any data about the relative cost of the three interpolation methods?

> This fixes ext_framebuffer_multisample-unaligned-blit and
> ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on
> SKL+.
> ---
>  src/mesa/drivers/common/meta_blit.c | 64 
> ++---
>  1 file changed, 52 insertions(+), 12 deletions(-)
> 
> diff --git a/src/mesa/drivers/common/meta_blit.c 
> b/src/mesa/drivers/common/meta_blit.c
> index 1f9515a..e812ecb 100644
> --- a/src/mesa/drivers/common/meta_blit.c
> +++ b/src/mesa/drivers/common/meta_blit.c
> @@ -352,17 +352,27 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
> shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_DEPTH_COPY ||
> shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_COPY) {
>char *sample_index;
> -  const char *arb_sample_shading_extension_string;
> +  const char *extra_extensions;
> +  const char *tex_coords = "texCoords";
>  
>if (dst_is_msaa) {
> - arb_sample_shading_extension_string = "#extension 
> GL_ARB_sample_shading : enable";
>   sample_index = "gl_SampleID";
>   name = "depth MSAA copy";
> +
> + if (ctx->Extensions.ARB_gpu_shader5 && samples >= 16) {
> +extra_extensions =
> +   "#extension GL_ARB_sample_shading : enable\n"
> +   "#extension GL_ARB_gpu_shader5 : enable\n";

You can unconditionally add the enables.  If the implementation doesn't
support the extension, enable will still succeed while require will not.

> +/* See comment below for the color copy */
> +tex_coords = "interpolateAtOffset(texCoords, vec2(0.0))";
> + } else {
> +extra_extensions = "#extension GL_ARB_sample_shading : enable\n";
> + }
>} else {
> - /* Don't need that extension, since we're drawing to a 
> single-sampled
> -  * destination.
> + /* Don't need any extra extensions, since we're drawing to a
> +  * single-sampled destination.
>*/
> - arb_sample_shading_extension_string = "";
> + extra_extensions = "";
>   /* From the GL 4.3 spec:
>*
>* "If there is a multisample buffer (the value of 
> SAMPLE_BUFFERS
> @@ -399,27 +409,57 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>"\n"
>"void main()\n"
>"{\n"
> -  "   gl_FragDepth = texelFetch(texSampler, 
> i%s(texCoords), %s).r;\n"
> +  "   gl_FragDepth = texelFetch(texSampler, 
> i%s(%s), %s).r;\n"
>"}\n",
> -  arb_sample_shading_extension_string,
> +  extra_extensions,
>sampler_array_suffix,
>texcoord_type,
>texcoord_type,
> +  tex_coords,
>sample_index);
> } else {
>/* You can create 2D_MULTISAMPLE textures with 0 sample count (meaning 
> 1
> * sample).  Yes, this is ridiculous.
> */
>char *sample_resolve;
> -  const char *arb_sample_shading_extension_string;
> +  const char *extra_extensions;
>const char *merge_function;
>name = ralloc_asprintf(mem_ctx, "%svec4 MSAA %s",
>   vec4_prefix,
>   dst_is_msaa ? "copy" : "resolve");
>  
>if (dst_is_msaa) {
> - arb_sample_shading_extension_string = "#extension 
> GL_ARB_sample_shading : enable";
> - sample_resolve = ralloc_asprintf(mem_ctx, "   out_color = 
> texelFetch(texSampler, i%s(texCoords), gl_SampleID);", texcoord_type);
> + const char *tex_coords;
> +
> + if (ctx->Extensions.ARB_gpu_shader5 && samples >= 16) {
> +/* If interpolateAtOffset is available then it will be used to
> + * force the interpolation to the center.

Re: [Mesa-dev] [PATCH 3/4] mesa: Draw Indirect return wrong error code on unalinged

2015-10-21 Thread Ian Romanick

On 10/20/2015 01:03 PM, Ilia Mirkin wrote:
> On Tue, Oct 20, 2015 at 10:19 AM, Marta Lofstedt
>  wrote:
>> From: Marta Lofstedt 
>>
>> From OpenGL 4.4 specification, section 10.4 and
>> OpenGL ES 3.1 section 10.5:
>> "An INVALID_VALUE error is generated if indirect is not a multiple
>> of the size, in basic machine units, of uint."
>>
>> However, the current code follows ARB_draw_indirect:
>> https://www.opengl.org/registry/specs/ARB/draw_indirect.txt
>> "INVALID_OPERATION is generated by DrawArraysIndirect and
>> DrawElementsIndirect if commands source data beyond the end
>> of a buffer object or if <indirect> is not word aligned."
>>
>> Signed-off-by: Marta Lofstedt 
>> ---
>>  src/mesa/main/api_validate.c | 13 +++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
>> index 7062cbd..a084362 100644
>> --- a/src/mesa/main/api_validate.c
>> +++ b/src/mesa/main/api_validate.c
>> @@ -732,10 +732,19 @@ valid_draw_indirect(struct gl_context *ctx,
>> /* From the ARB_draw_indirect specification:
>>  * "An INVALID_OPERATION error is generated [...] if <indirect> is no
>>  *  word aligned."
>> +* However, from OpenGL version 4.4. section 10.5
> 
> 4.4,
> 
> I double-checked and you're right -- it was INVALID_OPERATION in GL
> 4.3, but INVALID_VALUE in GL 4.4. Weird.

We (Khronos) changed the error because, in fact, the value is invalid.
Generating GL_INVALID_VALUE is more consistent with other similar errors. :)

Since Khronos doesn't usually update older specs, we (Mesa) usually
interpret such changes as clarifications that should be retroactively
applied.  I really hate having a mess of "if this API generate this
error otherwise generate that error."

Let's just always generate GL_INVALID_VALUE.

> Reviewed-by: Ilia Mirkin 
> 
> Should probably fix up the piglit test as well, if any.

Yes.  My recommendation would be to accept either error in desktop GL <=
4.3, but only accept GL_INVALID_VALUE in other versions / APIs.

>> +* and OpenGL ES 3.1, section 10.6:
>> +* "An INVALID_VALUE error is generated if indirect is not a multiple
>> +* of the size, in basic machine units, of uint."
>>  */
>> if ((GLsizeiptr)indirect & (sizeof(GLuint) - 1)) {
>> -  _mesa_error(ctx, GL_INVALID_OPERATION,
>> -  "%s(indirect is not aligned)", name);
>> +  if ((_mesa_is_desktop_gl(ctx) && ctx->Version >= 44) ||
>> +  _mesa_is_gles31(ctx))
>> + _mesa_error(ctx, GL_INVALID_VALUE,
>> + "%s(indirect is not aligned)", name);
>> +  else
>> + _mesa_error(ctx, GL_INVALID_OPERATION,
>> + "%s(indirect is not aligned)", name);
>>return GL_FALSE;
>> }
>>
>> --
>> 2.1.4
>>

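A sketch in C of the two recommendations in the reply above: always report the "invalid value" error for a misaligned indirect offset, and let a test accept either error only on desktop GL up to 4.3. The enum and function names are hypothetical; uint32_t stands in for GLuint, and the version is encoded as major * 10 + minor, as in the ctx->Version >= 44 test in the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_error {
   SKETCH_NO_ERROR,
   SKETCH_INVALID_VALUE,
   SKETCH_INVALID_OPERATION
};

/* Same mask test as in the patch: the offset must be a multiple of
 * sizeof(uint), which is a power of two.
 */
static bool
sketch_is_aligned(uintptr_t indirect)
{
   return (indirect & (sizeof(uint32_t) - 1)) == 0;
}

/* Driver side: one error for every API and version. */
static enum sketch_error
sketch_validate_indirect(uintptr_t indirect)
{
   return sketch_is_aligned(indirect) ? SKETCH_NO_ERROR
                                      : SKETCH_INVALID_VALUE;
}

/* Test side: which error may a misaligned offset produce? */
static bool
sketch_test_accepts(bool desktop_gl, unsigned version,
                    enum sketch_error err)
{
   if (err == SKETCH_INVALID_VALUE)
      return true;
   return err == SKETCH_INVALID_OPERATION && desktop_gl && version <= 43;
}
```

This keeps the version logic out of the validation path and confines it to the test, which is the split suggested above.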

[Mesa-dev] [PATCHv3 5/7] i965/fs: move the fs_reg::smear() from

2015-10-21 Thread Emil Velikov

From: Emil Velikov 

We're about to reuse get_timestamp() for the nir_intrinsic_shader_clock.
In the latter the generalisation does not apply, so move the smear()
where needed. This also makes the function analogous to the vec4 one.

v2: Tweak the comment - The caller -> We (Matt, Connor).
v3: More comment tweaks (Connor)

Signed-off-by: Emil Velikov 
Reviewed-by: Connor Abbott 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index da90467..6b1b54a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -533,18 +533,6 @@ fs_visitor::get_timestamp(const fs_builder &bld)
 */
bld.group(4, 0).exec_all().MOV(dst, ts);
 
-   /* The caller wants the low 32 bits of the timestamp.  Since it's running
-* at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds,
-* which is plenty of time for our purposes.  It is identical across the
-* EUs, but since it's tracking GPU core speed it will increment at a
-* varying rate as render P-states change.
-*
-* The caller could also check if render P-states have changed (or anything
-* else that might disrupt timing) by setting smear to 2 and checking if
-* that field is != 0.
-*/
-   dst.set_smear(0);
-
return dst;
 }
 
@@ -552,6 +540,14 @@ void
 fs_visitor::emit_shader_time_begin()
 {
shader_start_time = get_timestamp(bld.annotate("shader time start"));
+
+   /* We want only the low 32 bits of the timestamp.  Since it's running
+* at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds,
+* which is plenty of time for our purposes.  It is identical across the
+* EUs, but since it's tracking GPU core speed it will increment at a
+* varying rate as render P-states change.
+*/
+   shader_start_time.set_smear(0);
 }
 
 void
@@ -565,6 +561,15 @@ fs_visitor::emit_shader_time_end()
 
fs_reg shader_end_time = get_timestamp(ibld);
 
+   /* We only use the low 32 bits of the timestamp - see
+* emit_shader_time_begin()).
+*
+* We could also check if render P-states have changed (or anything
+* else that might disrupt timing) by setting smear to 2 and checking if
+* that field is != 0.
+*/
+   shader_end_time.set_smear(0);
+
/* Check that there weren't any timestamp reset events (assuming these
 * were the only two timestamp reads that happened).
 */
-- 
2.6.1
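
For reference, the "~3 seconds" figure in the comment above can be checked
with plain arithmetic: a 32-bit counter at roughly 1.2 GHz wraps after
2^32 / 1.2e9, about 3.6 seconds. The sketch below is only that arithmetic
plus a wrap-tolerant delta; the function names are invented here and this is
not what the driver emits.

```cpp
#include <cassert>
#include <cstdint>

// Seconds until a 32-bit counter ticking at `hz` wraps around.
double rollover_seconds(double hz)
{
   assert(hz > 0.0);
   return 4294967296.0 / hz;   // 2^32 ticks
}

// Ticks elapsed between two reads of the low 32 bits. Unsigned subtraction
// is modulo 2^32, so a single wrap between the reads is still handled.
uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
   return end - start;
}
```

More than one wrap between the two reads cannot be detected this way, which
is why the low 32 bits are only suitable for short intervals.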


Re: [Mesa-dev] [PATCH] gallivm: Translate all util_cpu_caps bits to LLVM attributes.

2015-10-21 Thread Gustaw Smolarczyk

I am just a bystander, but I have one suggestion to this patch.

2015-10-21 18:25 GMT+02:00 Jose Fonseca :
> This should prevent disparity between features Mesa and LLVM
> believe are supported by the CPU.
>
> http://lists.freedesktop.org/archives/mesa-dev/2015-October/thread.html#96990
>
> Tested on a i7-3720QM w/ LLVM 3.3 and 3.6.
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 34 
> ++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp 
> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> index 72fab8c..7073956 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> @@ -498,6 +498,32 @@ 
> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
> }
>
> llvm::SmallVector MAttrs;

Maybe increase the size of the SmallVector here?

Gustaw

> +   if (util_cpu_caps.has_sse) {
> +  MAttrs.push_back("+sse");
> +   }
> +   if (util_cpu_caps.has_sse2) {
> +  MAttrs.push_back("+sse2");
> +   }
> +   if (util_cpu_caps.has_sse3) {
> +  MAttrs.push_back("+sse3");
> +   }
> +   if (util_cpu_caps.has_ssse3) {
> +  MAttrs.push_back("+ssse3");
> +   }
> +   if (util_cpu_caps.has_sse4_1) {
> +#if HAVE_LLVM >= 0x0304
> +  MAttrs.push_back("+sse4.1");
> +#else
> +  MAttrs.push_back("+sse41");
> +#endif
> +   }
> +   if (util_cpu_caps.has_sse4_2) {
> +#if HAVE_LLVM >= 0x0304
> +  MAttrs.push_back("+sse4.2");
> +#else
> +  MAttrs.push_back("+sse42");
> +#endif
> +   }
> if (util_cpu_caps.has_avx) {
>/*
> * AVX feature is not automatically detected from CPUID by the X86 
> target
> @@ -509,8 +535,14 @@ 
> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>if (util_cpu_caps.has_f16c) {
>   MAttrs.push_back("+f16c");
>}
> -  builder.setMAttrs(MAttrs);
> +  if (util_cpu_caps.has_avx2) {
> + MAttrs.push_back("+avx2");
> +  }
> +   }
> +   if (util_cpu_caps.has_altivec) {
> +  MAttrs.push_back("+altivec");
> }
> +   builder.setMAttrs(MAttrs);
>
>  #if HAVE_LLVM >= 0x0305
> StringRef MCPU = llvm::sys::getHostCPUName();
> --
> 2.1.4
>

[Mesa-dev] [PATCH] gallivm: Translate all util_cpu_caps bits to LLVM attributes.

2015-10-21 Thread Jose Fonseca

This should prevent disparity between features Mesa and LLVM
believe are supported by the CPU.

http://lists.freedesktop.org/archives/mesa-dev/2015-October/thread.html#96990

Tested on a i7-3720QM w/ LLVM 3.3 and 3.6.
---
 src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 34 ++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp 
b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
index 72fab8c..7073956 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
@@ -498,6 +498,32 @@ 
lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
}
 
llvm::SmallVector MAttrs;
+   if (util_cpu_caps.has_sse) {
+  MAttrs.push_back("+sse");
+   }
+   if (util_cpu_caps.has_sse2) {
+  MAttrs.push_back("+sse2");
+   }
+   if (util_cpu_caps.has_sse3) {
+  MAttrs.push_back("+sse3");
+   }
+   if (util_cpu_caps.has_ssse3) {
+  MAttrs.push_back("+ssse3");
+   }
+   if (util_cpu_caps.has_sse4_1) {
+#if HAVE_LLVM >= 0x0304
+  MAttrs.push_back("+sse4.1");
+#else
+  MAttrs.push_back("+sse41");
+#endif
+   }
+   if (util_cpu_caps.has_sse4_2) {
+#if HAVE_LLVM >= 0x0304
+  MAttrs.push_back("+sse4.2");
+#else
+  MAttrs.push_back("+sse42");
+#endif
+   }
if (util_cpu_caps.has_avx) {
   /*
* AVX feature is not automatically detected from CPUID by the X86 target
@@ -509,8 +535,14 @@ 
lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
   if (util_cpu_caps.has_f16c) {
  MAttrs.push_back("+f16c");
   }
-  builder.setMAttrs(MAttrs);
+  if (util_cpu_caps.has_avx2) {
+ MAttrs.push_back("+avx2");
+  }
+   }
+   if (util_cpu_caps.has_altivec) {
+  MAttrs.push_back("+altivec");
}
+   builder.setMAttrs(MAttrs);
 
 #if HAVE_LLVM >= 0x0305
StringRef MCPU = llvm::sys::getHostCPUName();
-- 
2.1.4
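
To make the mapping easier to see outside the diff, here is a standalone
sketch of the same translation using std::vector instead of
llvm::SmallVector. CpuCaps is a trimmed-down, hypothetical stand-in for
util_cpu_caps, llvm_ge_34 stands in for the HAVE_LLVM >= 0x0304 check, and
the "+avx" entry is assumed to come from the part of the function that the
hunks do not show.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the util_cpu_caps bits touched by the patch.
struct CpuCaps {
   bool has_sse = false, has_sse2 = false, has_sse3 = false, has_ssse3 = false;
   bool has_sse4_1 = false, has_sse4_2 = false;
   bool has_avx = false, has_f16c = false, has_avx2 = false;
   bool has_altivec = false;
};

// Build the list of "+feature" strings in the same order as the patch.
std::vector<std::string> build_mattrs(const CpuCaps &caps, bool llvm_ge_34)
{
   std::vector<std::string> attrs;
   attrs.reserve(10);   // at most ten entries are ever pushed below

   if (caps.has_sse)    attrs.push_back("+sse");
   if (caps.has_sse2)   attrs.push_back("+sse2");
   if (caps.has_sse3)   attrs.push_back("+sse3");
   if (caps.has_ssse3)  attrs.push_back("+ssse3");
   // The SSE4 attribute spelling depends on the LLVM version.
   if (caps.has_sse4_1) attrs.push_back(llvm_ge_34 ? "+sse4.1" : "+sse41");
   if (caps.has_sse4_2) attrs.push_back(llvm_ge_34 ? "+sse4.2" : "+sse42");
   if (caps.has_avx) {
      attrs.push_back("+avx");   // assumed: lives in the elided context
      // f16c and avx2 are only considered when AVX itself is present.
      if (caps.has_f16c) attrs.push_back("+f16c");
      if (caps.has_avx2) attrs.push_back("+avx2");
   }
   if (caps.has_altivec) attrs.push_back("+altivec");

   assert(attrs.size() <= 10);
   return attrs;
}
```

The upper bound of ten entries is the number relevant to the inline-size
question raised in review: a container with a smaller inline capacity still
works, it just falls back to heap allocation once that capacity is exceeded.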


[Mesa-dev] [PATCH 4/4] i965/fs: Emit a single ADD instruction for SET_SAMPLE_ID on Gen8+.

2015-10-21 Thread Matt Turner

Gen8+ lifted the register region restriction that an instruction whose
destination spans two registers must have sources that also span two
registers.
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 15d0430..aed4adb 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1426,7 +1426,7 @@ fs_generator::generate_set_sample_id(fs_inst *inst,
   src0.type == BRW_REGISTER_TYPE_UD);
 
struct brw_reg reg = stride(src1, 1, 4, 0);
-   if (dispatch_width == 8) {
+   if (devinfo->gen >= 8 || dispatch_width == 8) {
   brw_ADD(p, dst, src0, reg);
} else if (dispatch_width == 16) {
   brw_push_insn_state(p);
-- 
2.4.9
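
The effect of the new condition can be summarised as a count of ADD
instructions per SET_SAMPLE_ID. The helper below is purely illustrative (the
name is made up) and only mirrors the if/else structure visible in the hunk.

```cpp
#include <cassert>

// Number of ADDs generated for SET_SAMPLE_ID with the patched condition:
// one on Gen8+ or in SIMD8, two (one per half) for SIMD16 on older parts.
int set_sample_id_add_count(int gen, int dispatch_width)
{
   assert(dispatch_width == 8 || dispatch_width == 16);
   return (gen >= 8 || dispatch_width == 8) ? 1 : 2;
}
```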


[Mesa-dev] [PATCH 1/4] i965/fs: Use type-W for immediate in SampleID setup.

2015-10-21 Thread Matt Turner

Not a functional difference, but the register is loaded with a signed
immediate (V) and added to a signed type (D), producing a signed result
(D).

Also change the type of g0 to allow for compaction.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 4 ++--
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 49323eb..f9c78df 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1273,7 +1273,7 @@ fs_visitor::emit_sampleid_setup()
if (key->compute_sample_id) {
   fs_reg t1 = vgrf(glsl_type::int_type);
   fs_reg t2 = vgrf(glsl_type::int_type);
-  t2.type = BRW_REGISTER_TYPE_UW;
+  t2.type = BRW_REGISTER_TYPE_W;
 
   /* The PS will be run in MSDISPMODE_PERSAMPLE. For example with
* 8x multisampling, subspan 0 will represent sample N (where N
@@ -1295,7 +1295,7 @@ fs_visitor::emit_sampleid_setup()
* subspan 1, and finally sample 1 of subspan 1.
*/
   abld.exec_all()
-  .AND(t1, fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UD)),
+  .AND(t1, fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)),
fs_reg(0xc0));
   abld.exec_all().SHR(t1, t1, fs_reg(5));
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 13c495c..9a5992e1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1429,7 +1429,7 @@ fs_generator::generate_set_sample_id(fs_inst *inst,
brw_set_default_exec_size(p, BRW_EXECUTE_8);
brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
brw_set_default_mask_control(p, BRW_MASK_DISABLE);
-   struct brw_reg reg = retype(stride(src1, 1, 4, 0), BRW_REGISTER_TYPE_UW);
+   struct brw_reg reg = stride(src1, 1, 4, 0);
if (dispatch_width == 8) {
   brw_ADD(p, dst, src0, reg);
} else if (dispatch_width == 16) {
-- 
2.4.9


[Mesa-dev] [PATCH 2/4] i965/fs: Trim unneeded channels in SampleID setup.

2015-10-21 Thread Matt Turner

The AND and SHR produce a scalar value that we had been replicating
across $dispatch_width channels. The immediate MOV produces only four
useful channels of data.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index f9c78df..1c075c3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1271,9 +1271,9 @@ fs_visitor::emit_sampleid_setup()
fs_reg *reg = new(this->mem_ctx) fs_reg(vgrf(glsl_type::int_type));
 
if (key->compute_sample_id) {
-  fs_reg t1 = vgrf(glsl_type::int_type);
-  fs_reg t2 = vgrf(glsl_type::int_type);
-  t2.type = BRW_REGISTER_TYPE_W;
+  fs_reg t1(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_D);
+  t1.set_smear(0);
+  fs_reg t2(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_W);
 
   /* The PS will be run in MSDISPMODE_PERSAMPLE. For example with
* 8x multisampling, subspan 0 will represent sample N (where N
@@ -1294,13 +1294,13 @@ fs_visitor::emit_sampleid_setup()
* are sample 1 of subspan 0; the third group is sample 0 of
* subspan 1, and finally sample 1 of subspan 1.
*/
-  abld.exec_all()
+  abld.exec_all().group(1, 0)
   .AND(t1, fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)),
fs_reg(0xc0));
-  abld.exec_all().SHR(t1, t1, fs_reg(5));
+  abld.exec_all().group(1, 0).SHR(t1, t1, fs_reg(5));
 
   /* This works for both SIMD8 and SIMD16 */
-  abld.exec_all()
+  abld.exec_all().group(4, 0)
   .MOV(t2, brw_imm_v(key->persample_2x ? 0x1010 : 0x3210));
 
   /* This special instruction takes care of setting vstride=1,
-- 
2.4.9
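
A CPU-side model of what the sequence computes may help when reading the
patch. It assumes the V immediate is eight packed signed 4-bit values (lowest
nibble first) and that SET_SAMPLE_ID reads t2 through a <1;4,0> region, so
each of the four useful elements is replicated across four channels. The
function names are invented for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Unpack a packed-nibble vector immediate: eight signed 4-bit values,
// lowest nibble first.
std::array<int, 8> unpack_imm_v(uint32_t imm)
{
   std::array<int, 8> out{};
   for (int i = 0; i < 8; i++) {
      const int nibble = (imm >> (4 * i)) & 0xf;
      out[i] = nibble >= 8 ? nibble - 16 : nibble;   // sign-extend 4 bits
   }
   return out;
}

// Per-channel sample IDs: the scalar t1 = (g0.0 & 0xc0) >> 5 plus t2 read
// with each element repeated for four consecutive channels.
std::vector<int> sample_ids(uint32_t g0_dword0, bool persample_2x,
                            int dispatch_width)
{
   assert(dispatch_width == 8 || dispatch_width == 16);

   const int t1 = static_cast<int>((g0_dword0 & 0xc0u) >> 5);
   const std::array<int, 8> t2 = unpack_imm_v(persample_2x ? 0x1010 : 0x3210);

   std::vector<int> ids(dispatch_width);
   for (int ch = 0; ch < dispatch_width; ch++)
      ids[ch] = t1 + t2[ch / 4];   // only t2[0..3] are ever read
   return ids;
}
```

Only t1 (one value) and the first four elements of t2 are consumed, which is
why group(1, 0) and group(4, 0) are sufficient for the setup instructions.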


[Mesa-dev] [PATCH 3/4] i965/fs: Drop unnecessary write-enable-all from SET_SAMPLE_ID.

2015-10-21 Thread Matt Turner

---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 9a5992e1..15d0430 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1425,18 +1425,18 @@ fs_generator::generate_set_sample_id(fs_inst *inst,
assert(src0.type == BRW_REGISTER_TYPE_D ||
   src0.type == BRW_REGISTER_TYPE_UD);
 
-   brw_push_insn_state(p);
-   brw_set_default_exec_size(p, BRW_EXECUTE_8);
-   brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
-   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
struct brw_reg reg = stride(src1, 1, 4, 0);
if (dispatch_width == 8) {
   brw_ADD(p, dst, src0, reg);
} else if (dispatch_width == 16) {
+  brw_push_insn_state(p);
+  brw_set_default_exec_size(p, BRW_EXECUTE_8);
+  brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
   brw_ADD(p, firsthalf(dst), firsthalf(src0), reg);
+  brw_set_default_compression_control(p, BRW_COMPRESSION_2NDHALF);
   brw_ADD(p, sechalf(dst), sechalf(src0), suboffset(reg, 2));
+  brw_pop_insn_state(p);
}
-   brw_pop_insn_state(p);
 }
 
 void
-- 
2.4.9


[Mesa-dev] New stable-branch 11.0 candidate pushed

2015-10-21 Thread Emil Velikov

Hello list,

The candidate for the Mesa 11.0.4 is now available. Currently we have:
 - 36 queued
 - 18 nominated (outstanding)
 - and 0 rejected/obsolete patches

The current queue consists of various mesa, glsl and driver fixes, a few
build related patches and an omx bugfix.


Take a look at section "Mesa stable queue" for more information.

Testing
-------
The following results are against piglit 4b6848c131c.


Changes - classic i965(snb)
---------------------------
None.


Changes - swrast classic
------------------------
None.


Changes - gallium softpipe
--------------------------
None.


Changes - gallium llvmpipe (LLVM 3.7)
-------------------------------------
None.


Testing reports/general approval
--------------------------------
Any testing reports (or general approval of the state of the branch)
will be greatly appreciated.


Trivial merge conflicts
-----------------------
commit 141109cc529b3a5d71c0023ad5c19c8844c05171
Author: Marek Olšák 

radeonsi: fix a GS copy shader leak

(cherry picked from commit aa060e276c203baf4691d4a4722accd5bdbb8526)


The plan is to have 11.0.4 this Friday (23rd of October) or shortly after.

If you have any questions or comments that you would like to share
before the release, please go ahead.


Cheers,
Emil


Mesa stable queue
-----------------

Nominated (18)
==============

Boyan Ding (1):
  i915: Add XRGB format to intel_screen_make_configs

Brian Paul (1):
  configure: don't try to build gallium DRI drivers if --disable-dri is set

Emil Velikov (3):
  i965: store reference to the context within struct brw_fence
  egl/dri2: expose srgb configs when KHR_gl_colorspace is available
  mesa; add get-extra-pick-list.sh script into bin/

Ian Romanick (1):
  i965: Fix is-renderable check in intel_image_target_renderbuffer_storage

Jean-Sébastien Pédron (1):
  ralloc: Use __attribute__((destructor)) instead of atexit(3)

Kenneth Graunke (3):
  i965/nir: Switch on shader stage in nir_lower_outputs().
  i965: Implement a new type_size_4x() function.
  i965: Fix scalar VS float[] and vec2[] output arrays.

Nanley Chery (2):
  mesa/texcompress: Restrict FXT1 format to desktop GL subset
  mesa/glformats: Undo code changes from _mesa_base_tex_format() move

Timothy Arceri (1):
  glsl: fix stream qualifier for blocks with an instance name

Tom Stellard (4):
  clover: Call clBuildProgram() notification function when build completes v2
  gallium/drivers: Add threadsafe wrappers for pipe_context v2
  clover: Use threadsafe wrappers for pipe_context v2
  clover: Properly initialize LLVM targets when linking with component libs



Queued (36)
===========

Alejandro Piñeiro (2):
  i965/vec4: check writemask when bailing out at register coalesce
  i965/vec4: fill src_reg type using the constructor type parameter

Brian Paul (2):
  vbo: fix incorrect switch statement in init_mat_currval()
  mesa: fix incorrect opcode in save_BlendFunci()

Chih-Wei Huang (3):
  mesa: android: Fix the incorrect path of sse_minmax.c
  nv50/ir: use C++11 standard std::unordered_map if possible
  nv30: include the header of ffs prototype

Chris Wilson (1):
  i965: Remove early release of DRI2 miptree

Dave Airlie (1):
  mesa/uniforms: fix get_uniform for doubles (v2)

Emil Velikov (1):
  docs: add sha256 checksums for 11.0.3

Francisco Jerez (5):
  i965: Don't tell the hardware about our UAV access.
  mesa: Expose function to calculate whether a shader image unit is valid.
  mesa: Skip redundant texture completeness checking during image validation.
  i965: Use _mesa_is_image_unit_valid() instead of gl_image_unit::_Valid.
  mesa: Get rid of texture-dependent image unit derived state.

Ian Romanick (8):
  glsl: Allow built-in functions as constant expressions in OpenGL ES 1.00
  ff_fragment_shader: Use binding to set the sampler unit
  glsl/linker: Use constant_initializer instead of constant_value to initialize uniforms
  glsl: Use constant_initializer instead of constant_value to determine whether to keep an unused uniform
  glsl: Only set ir_variable::constant_value for const-decorated variables
  glsl: Restrict initializers for global variables to constant expression in ES
  glsl: Add method to determine whether an expression contains the sequence operator
  glsl: In later GLSL versions, sequence operator is cannot be a constant expression

Ilia Mirkin (1):
  nouveau: make sure there's always room to emit a fence

Indrajit Das (1):
  st/va: Used correct parameter to derive the value of the "h" variable in vlVaCreateImage

Jonathan Gray (1):
  configure.ac: ensure RM is set

Krzysztof Sobiecki (1):
  st/fbo: use pipe_surface_release instead of pipe_surface_reference

Leo Liu (1):
  st/omx/dec/h264: fix field picture type 0 poc disorder

Marek Olšák (3):
  st/mesa: fix clip state

[Mesa-dev] [PATCHv5 4/7] nir: add shader_clock intrinsic

2015-10-21 Thread Emil Velikov

From: Emil Velikov 

v2: Add flags and inline comment/description.
v3: None of the input/outputs are variables
v4: Drop clockARB reference, relate code motion barrier comment wrt
intrinsic flag.
v5: Drop the "thus we can eliminate..." comment (Connor)

Signed-off-by: Emil Velikov 
Reviewed-by: Connor Abbott 
---
 src/glsl/nir/glsl_to_nir.cpp  | 6 ++
 src/glsl/nir/nir_intrinsics.h | 8 
 2 files changed, 14 insertions(+)

diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
index c9cdf35..5e724b1 100644
--- a/src/glsl/nir/glsl_to_nir.cpp
+++ b/src/glsl/nir/glsl_to_nir.cpp
@@ -714,6 +714,8 @@ nir_visitor::visit(ir_call *ir)
  op = nir_intrinsic_ssbo_atomic_exchange;
   } else if (strcmp(ir->callee_name(), 
"__intrinsic_ssbo_atomic_comp_swap_internal") == 0) {
  op = nir_intrinsic_ssbo_atomic_comp_swap;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_shader_clock") == 0) {
+ op = nir_intrinsic_shader_clock;
   } else {
  unreachable("not reached");
   }
@@ -818,6 +820,10 @@ nir_visitor::visit(ir_call *ir)
   case nir_intrinsic_memory_barrier:
  nir_instr_insert_after_cf_list(this->cf_node_list, >instr);
  break;
+  case nir_intrinsic_shader_clock:
+ nir_ssa_dest_init(>instr, >dest, 1, NULL);
+ nir_instr_insert_after_cf_list(this->cf_node_list, >instr);
+ break;
   case nir_intrinsic_store_ssbo: {
  exec_node *param = ir->actual_parameters.get_head();
  ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
index 49bf3b2..92cb25f 100644
--- a/src/glsl/nir/nir_intrinsics.h
+++ b/src/glsl/nir/nir_intrinsics.h
@@ -83,6 +83,14 @@ BARRIER(discard)
  */
 BARRIER(memory_barrier)
 
+/*
+ * Shader clock intrinsic with semantics analogous to the clock2x32ARB()
+ * GLSL intrinsic.
+ * The latter can be used as code motion barrier, which is currently not
+ * feasible with NIR.
+ */
+INTRINSIC(shader_clock, 0, ARR(), true, 1, 0, 0, NIR_INTRINSIC_CAN_ELIMINATE)
+
 /** A conditional discard, with a single boolean source. */
 INTRINSIC(discard_if, 1, ARR(1), false, 0, 0, 0, 0)
 
-- 
2.6.1


Re: [Mesa-dev] [PATCHv4 4/7] nir: add shader_clock intrinsic

2015-10-21 Thread Emil Velikov

On 20 October 2015 at 19:58, Connor Abbott  wrote:
> On Tue, Oct 20, 2015 at 12:55 PM, Emil Velikov  
> wrote:
[snip]
>> +/*
>> + * Shader clock intrinsic with semantics analogous to the clock2x32ARB()
>> + * GLSL intrinsic.
>> + * The latter can be used as code motion barrier, which is currently not
>> + * feasible with NIR, thus we can eliminate the intrinsic when the return
>> + * value is unused.
>
> Just a small bikeshedding thing: technically, even if we were to make
> shader_clock a code motion barrier like the spec asks for, we could
> still delete it if its result is unused because then nobody will
> notice if we move code around it. Get rid of the "thus we can
> eliminate..." bit and this is
>
Indeed you're correct. It won't make much sense to keep it around.

Thanks for the suggestions and review.
Emil

[Mesa-dev] [PATCHv3 6/7] i965: Implement nir_intrinsic_shader_clock

2015-10-21 Thread Emil Velikov

From: Emil Velikov 

v2:
 - Add a few const qualifiers for good measure.
 - Drop unneeded retype()s (Matt)
 - Convert timestamp to SIMD8/16, as fs_visitor::get_timestamp() returns
SIMD4 (Connor)

v3:
 - Remove unneeded temporary + MOV (Connor)

Signed-off-by: Emil Velikov 
Reviewed-by: Connor Abbott 
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  9 +
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 10 ++
 2 files changed, 19 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 792663f..a2b2097 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -1309,6 +1309,15 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
   break;
}
 
+   case nir_intrinsic_shader_clock: {
+  /* We cannot do anything if there is an event, so ignore it for now */
+  fs_reg shader_clock = get_timestamp(bld);
+  const fs_reg srcs[] = { shader_clock.set_smear(0), 
shader_clock.set_smear(1) };
+
+  bld.LOAD_PAYLOAD(dest, srcs, ARRAY_SIZE(srcs), 0);
+  break;
+   }
+
case nir_intrinsic_image_size: {
   /* Get the referenced image variable and type. */
   const nir_variable *var = instr->variables[0]->var;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index ea1e3e7..c401212 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -806,6 +806,16 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr 
*instr)
   break;
}
 
+   case nir_intrinsic_shader_clock: {
+  /* We cannot do anything if there is an event, so ignore it for now */
+  const src_reg shader_clock = get_timestamp();
+  const enum brw_reg_type type = 
brw_type_for_base_type(glsl_type::uvec2_type);
+
+  dest = get_nir_dest(instr->dest, type);
+  emit(MOV(dest, shader_clock));
+  break;
+   }
+
default:
   unreachable("Unknown intrinsic");
}
-- 
2.6.1
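
On the consumer side, the two components produced here (smear 0 and smear 1
of the timestamp) are meant to be read as one 64-bit value. The sketch below
assumes the ARB_shader_clock convention that component 0 holds the 32 least
significant bits and component 1 the 32 most significant bits; the helper
names are invented for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Combine the two halves of a clock2x32ARB()-style result into 64 bits.
uint64_t clock_from_uvec2(uint32_t lo, uint32_t hi)
{
   return (static_cast<uint64_t>(hi) << 32) | lo;
}

// Inverse: split a 64-bit timestamp into {low, high} 32-bit words.
std::array<uint32_t, 2> to_uvec2(uint64_t t)
{
   return { static_cast<uint32_t>(t & 0xffffffffu),
            static_cast<uint32_t>(t >> 32) };
}

// Tiny self-check: splitting and recombining must round-trip.
inline void self_check()
{
   const uint64_t t = 0x0000000123456789ull;
   const std::array<uint32_t, 2> v = to_uvec2(t);
   assert(clock_from_uvec2(v[0], v[1]) == t);
}
```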


Re: [Mesa-dev] [PATCH] gallivm: Translate all util_cpu_caps bits to LLVM attributes.

2015-10-21 Thread Roland Scheidegger

Thanks for fixing this up.

Reviewed-by: Roland Scheidegger 

Am 21.10.2015 um 18:25 schrieb Jose Fonseca:
> This should prevent disparity between features Mesa and LLVM
> believe are supported by the CPU.
> 
> http://lists.freedesktop.org/archives/mesa-dev/2015-October/thread.html#96990
> 
> Tested on a i7-3720QM w/ LLVM 3.3 and 3.6.
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 34 
> ++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp 
> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> index 72fab8c..7073956 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> @@ -498,6 +498,32 @@ 
> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
> }
>  
> llvm::SmallVector MAttrs;
> +   if (util_cpu_caps.has_sse) {
> +  MAttrs.push_back("+sse");
> +   }
> +   if (util_cpu_caps.has_sse2) {
> +  MAttrs.push_back("+sse2");
> +   }
> +   if (util_cpu_caps.has_sse3) {
> +  MAttrs.push_back("+sse3");
> +   }
> +   if (util_cpu_caps.has_ssse3) {
> +  MAttrs.push_back("+ssse3");
> +   }
> +   if (util_cpu_caps.has_sse4_1) {
> +#if HAVE_LLVM >= 0x0304
> +  MAttrs.push_back("+sse4.1");
> +#else
> +  MAttrs.push_back("+sse41");
> +#endif
> +   }
> +   if (util_cpu_caps.has_sse4_2) {
> +#if HAVE_LLVM >= 0x0304
> +  MAttrs.push_back("+sse4.2");
> +#else
> +  MAttrs.push_back("+sse42");
> +#endif
> +   }
> if (util_cpu_caps.has_avx) {
>/*
> * AVX feature is not automatically detected from CPUID by the X86 
> target
> @@ -509,8 +535,14 @@ 
> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>if (util_cpu_caps.has_f16c) {
>   MAttrs.push_back("+f16c");
>}
> -  builder.setMAttrs(MAttrs);
> +  if (util_cpu_caps.has_avx2) {
> + MAttrs.push_back("+avx2");
> +  }
> +   }
> +   if (util_cpu_caps.has_altivec) {
> +  MAttrs.push_back("+altivec");
> }
> +   builder.setMAttrs(MAttrs);
>  
>  #if HAVE_LLVM >= 0x0305
> StringRef MCPU = llvm::sys::getHostCPUName();
> 


Re: [Mesa-dev] [PATCH] mesa/glformats: Undo code changes from _mesa_base_tex_format() move

2015-10-21 Thread Nanley Chery

On Wed, Oct 21, 2015 at 7:23 AM, Emil Velikov 
wrote:

> On 9 October 2015 at 23:35, Nanley Chery  wrote:
> > From: Nanley Chery 
> >
> > The refactoring commit, c6bf1cd, accidentally reverted cd49b97
> > and 99b1f47. These changes caused more code to be added to the
> > function and removed the existing support for ASTC. This patch
> > reverts those modifications.
> >
> > v2. Actually include ASTC support again.
> >
> Thanks Nanley.
>
> I take it that with this in place the KHR_texture_compression_astc_ldr
> piglits/tests are back online (pass) ?
>

With this in place a dEQP failure is fixed. If I remember correctly, this
did also enable a piglit test to pass.


> I have double-checked and this patch does fix the erroneous reverts by
> c6bf1cd.
>
> Reviewed-by: Emil Velikov 
>
>
Thanks!
Nanley

> -Emil
>

Re: [Mesa-dev] [PATCH v2 01/11] i965: Introduce new SHADER_OPCODE_URB_WRITE_SIMD8_MASKED/PER_SLOT opcodes.

2015-10-21 Thread Jason Ekstrand

On Wed, Oct 21, 2015 at 1:29 AM, Kenneth Graunke  wrote:
> On Monday, October 12, 2015 02:49:03 PM Kenneth Graunke wrote:
>> In the vec4 backend, we have a vec4_instruction::urb_write_flags field.
>> There are many kinds of flags for SIMD4x2 messages.
>>
>> However, there are really only two (per-slot offset, use channel masks)
>> for SIMD8 messages.  Rather than adding a boolean flag for per-slot
>> offsets (polluting all instructions), I decided to just make three new
>> opcodes.
>>
>> Signed-off-by: Kenneth Graunke 
>> ---
>>  src/mesa/drivers/dri/i965/brw_defines.h|  3 +++
>>  src/mesa/drivers/dri/i965/brw_fs.cpp   |  9 +
>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 11 +++
>>  src/mesa/drivers/dri/i965/brw_inst.h   |  1 +
>>  src/mesa/drivers/dri/i965/brw_shader.cpp   |  9 +
>>  5 files changed, 33 insertions(+)
>>
>> Here's the rest of the series that didn't get reviewed last time,
>> rebased on Jason's compiler reworks.
>
> Jason landed yet more compiler reworks.  I've pushed a rebased copy
> to the 'simd8gs' branch of ~kwg/mesa.  Code got shuffled between
> functions or header files, so some of it didn't textually apply, but
> the new code isn't significantly different.  I've verified that it
> still builds and passes Piglit.
>
> Jason has yet *more* compiler reworks on the mailing list.  I've
> preemptively rebased on those and pushed that to my tree as well.
> It's the 'simd8gs-rebase-rebase' branch.  That branch doesn't
> compile, however - with the roundabout vec4/fs include hell, it's
> somehow getting an incomplete type for "struct brw_gs_compile".
> I didn't spend the time to figure out why.  Other work to do.

You didn't have to do that...

Anyway, I'll poke at it and get the include stuff sorted.

--Jason

Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-21 Thread Ben Widawsky

On Tue, Oct 20, 2015 at 02:48:41PM -0700, Matt Turner wrote:
> On Tue, Oct 20, 2015 at 2:41 PM, Ben Widawsky  wrote:
> > On Tue, Oct 20, 2015 at 11:56:15AM +0200, Neil Roberts wrote:
> >> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
> >> message types because I found by experimentation that it doesn't work.
> >> I wrote in the comment that I couldn't find any documentation for this
> >> problem. However I've now found the documentation and it has
> >> additional restrictions on further message types so this patch updates
> >> the comment and adds the others.
> >> ---
> >>
> >> That paragraph in the spec also mentions further restrictions that we
> >> should probably worry about like that the shader shouldn't combine
> >> this optimisation with any other render target data port read/writes.
> >>
> >> It also has a fairly pessimistic note saying the optimisation is only
> >> really good for large polygons in a GUI-like workload. I wonder
> >> whether we should be doing some more benchmarking to decide whether
> >> it's really a good idea to enable this as a general optimisation even
> >> for games.
> >
> > I remember seeing this before, but I cannot find it now. All I am seeing
> > regarding performance implications are the bits about requiring a header, 
> > and
> > writing to the same pixel from multiple threads. The latter one I assume is 
> > only
> > going to happen with MSAA?
> 
> No, I don't think so. As I understand it, the EUs can be executing
> fragment shaders for multiple primitives at the same time, and those
> primitives might overlap. The c in sendc means that it does some extra
> tracking to ensure that the render target writes land in the correct
> order.
> 
> Presumably by using sendc to texture directly to the render target, it
> adds some extra synchronization (before the texturing is done... or
> something?) that especially hurts when there's a lot of overlapping
> primitives (as in the case of lots of small primitives).

Ah, Neil pointed me to the blurb. Putting this here to remind myself... I think
a cheap way to measure things is to turn the sendc into a send. Things will
probably render wrong, but it should eliminate the bottleneck. If we can see a
measurable perf difference with send, it would certainly indicate we need to
spend time optimizing the optimization.

[Mesa-dev] [Bug 92221] Unintended code changes in _mesa_base_tex_format commit

2015-10-21 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=92221

Nanley Chery  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Nanley Chery  ---
The bug fix patch is now upstreamed.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH v2] mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit

2015-10-21 Thread Neil Roberts

Ian Romanick  writes:

>> To fix that this patch makes it use interpolateAtOffset in the blit
>> shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
>> is available. This forces it to interpolate the texture coordinates at
>> the pixel center to avoid these problematic positions.
>
> Would it also work to use "centroid in" interpolation qualifier?  Do we
> have any data about the relative cost of the three interpolation
> methods?

I don't think centroid interpolation does anything for per-sample
shading. Centroid interpolation is just meant to ensure that the
interpolated values are within the polygon (it's confusingly named and
has nothing to do with the center). For per-sample shading the sample
position will always be within the polygon so it will just use that for
the interpolation and we would be stuck with the same problem that some
of these positions are on the pixel boundary.

>> +
>> + if (ctx->Extensions.ARB_gpu_shader5 && samples >= 16) {
>> +extra_extensions =
>> +   "#extension GL_ARB_sample_shading : enable\n"
>> +   "#extension GL_ARB_gpu_shader5 : enable\n";
>
> You can unconditionally add the enables. If the implementation doesn't
> support the extension, enable will still succeed while require will
> not.

Ok, yes that is probably worth doing. The GL_ARB_sample_shading one was
already conditionally added before my patch, so maybe I can make a
second patch that first stops it from doing that.
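To make the enable-vs-require point concrete, here is a minimal sketch of what unconditional enables could look like. The helper names are hypothetical and not from the patch; the only thing taken from the hunk above is the pair of #extension lines and the "ARB_gpu_shader5 && samples >= 16" condition.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Mesa: builds the #extension preamble
// for the blit shader. Both lines use "enable" and are emitted
// unconditionally, relying on "enable" not being an error when the
// implementation lacks the extension.
std::string build_extension_preamble()
{
   return "#extension GL_ARB_sample_shading : enable\n"
          "#extension GL_ARB_gpu_shader5 : enable\n";
}

// Whether the shader body may call interpolateAtOffset() is still a
// runtime decision, mirroring the condition in the quoted hunk.
bool use_interpolate_at_offset(bool has_gpu_shader5, int samples)
{
   return has_gpu_shader5 && samples >= 16;
}
```

The preamble no longer depends on the context, only the shader body does.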

Thanks for looking at the patch.

Regards,
- Neil

Re: [Mesa-dev] [PATCH v2 01/11] i965: Introduce new SHADER_OPCODE_URB_WRITE_SIMD8_MASKED/PER_SLOT opcodes.

2015-10-21 Thread Kristian Høgsberg

On Wed, Oct 21, 2015 at 1:29 AM, Kenneth Graunke  wrote:
> On Monday, October 12, 2015 02:49:03 PM Kenneth Graunke wrote:
>> In the vec4 backend, we have a vec4_instruction::urb_write_flags field.
>> There are many kinds of flags for SIMD4x2 messages.
>>
>> However, there are really only two (per-slot offset, use channel masks)
>> for SIMD8 messages.  Rather than adding a boolean flag for per-slot
>> offsets (polluting all instructions), I decided to just make three new
>> opcodes.
>>
>> Signed-off-by: Kenneth Graunke 
>> ---
>>  src/mesa/drivers/dri/i965/brw_defines.h|  3 +++
>>  src/mesa/drivers/dri/i965/brw_fs.cpp   |  9 +
>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 11 +++
>>  src/mesa/drivers/dri/i965/brw_inst.h   |  1 +
>>  src/mesa/drivers/dri/i965/brw_shader.cpp   |  9 +
>>  5 files changed, 33 insertions(+)
>>
>> Here's the rest of the series that didn't get reviewed last time,
>> rebased on Jason's compiler reworks.
>
> Jason landed yet more compiler reworks.  I've pushed a rebased copy
> to the 'simd8gs' branch of ~kwg/mesa.  Code got shuffled between
> functions or header files, so some of it didn't textually apply, but
> the new code isn't significantly different.  I've verified that it
> still builds and passes Piglit.
>
> Jason has yet *more* compiler reworks on the mailing list.  I've
> preemptively rebased on those and pushed that to my tree as well.
> It's the 'simd8gs-rebase-rebase' branch.  That branch doesn't
> compile, however - with the roundabout vec4/fs include hell, it's
> somehow getting an incomplete type for "struct brw_gs_compile".
> I didn't spend the time to figure out why.  Other work to do.

I started reviewing the series and meant to give a Reviewed-by for the
whole series. I reviewed the first 11 patches, then I got stuck on the
last two (11/11) and (12/11). While I work my way through the last
two, for patches 1-10:

Reviewed-by: Kristian Høgsberg 

in the hope that landing those will reduce the clashing with Jason's work.

Kristian

Re: [Mesa-dev] [PATCH] glsl: remove excess location qualifier validation

2015-10-21 Thread Tapani Pälli


On 10/22/2015 08:29 AM, Timothy Arceri wrote:

> Location has never been able to be a negative value because it has
> always been validated in the parser.
>
> Also the linker doesn't check for negatives like the comment claims.


Neither does the parser. If one uses a negative explicit location, the
parser says:

error: syntax error, unexpected '-', expecting INTCONSTANT or UINTCONSTANT

I'm not sure if this is quite OK. Shouldn't it rather accept the negative
value and then fail here in this check you are about to remove?
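
For reference, the upper-bound check that the patch leaves in place can be sketched like this. The function name and parameters are made up for illustration; "slots" stands in for var->type->uniform_locations() and "max_locations" for ctx->Const.MaxUserAssignableUniformLocations.

```cpp
#include <cassert>

// Hypothetical stand-alone version of the range check kept by the patch.
bool uniform_location_in_range(unsigned location, unsigned slots,
                               unsigned max_locations)
{
   assert(slots >= 1);
   // Last location consumed by the uniform, as in the quoted max_loc.
   unsigned max_loc = location + slots - 1;
   return max_loc < max_locations;
}
```

With a limit of 1024, a 4-slot uniform at location 1020 still fits, while the same uniform at location 1021 does not.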



---

  No piglit regressions and an extra negative test sent for
  ARB_explicit_uniform_location [1]

  [1] http://patchwork.freedesktop.org/patch/62573/

  src/glsl/ast_to_hir.cpp | 70 -
  1 file changed, 22 insertions(+), 48 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 8549d55..0306530 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2422,21 +2422,6 @@ validate_explicit_location(const struct 
ast_type_qualifier *qual,
const struct gl_context *const ctx = state->ctx;
unsigned max_loc = qual->location + var->type->uniform_locations() - 1;
  
-  /* ARB_explicit_uniform_location specification states:

-   *
-   * "The explicitly defined locations and the generated locations
-   * must be in the range of 0 to MAX_UNIFORM_LOCATIONS minus one."
-   *
-   * "Valid locations for default-block uniform variable locations
-   * are in the range of 0 to the implementation-defined maximum
-   * number of uniform locations."
-   */
-  if (qual->location < 0) {
- _mesa_glsl_error(loc, state,
-  "explicit location < 0 for uniform %s", var->name);
- return;
-  }
-
if (max_loc >= ctx->Const.MaxUserAssignableUniformLocations) {
   _mesa_glsl_error(loc, state, "location(s) consumed by uniform %s "
">= MAX_UNIFORM_LOCATIONS (%u)", var->name,
@@ -2527,41 +2512,30 @@ validate_explicit_location(const struct 
ast_type_qualifier *qual,
 } else {
var->data.explicit_location = true;
  
-  /* This bit of silliness is needed because invalid explicit locations

-   * are supposed to be flagged during linking.  Small negative values
-   * biased by VERT_ATTRIB_GENERIC0 or FRAG_RESULT_DATA0 could alias
-   * built-in values (e.g., -16+VERT_ATTRIB_GENERIC0 = VERT_ATTRIB_POS).
-   * The linker needs to be able to differentiate these cases.  This
-   * ensures that negative values stay negative.
-   */
-  if (qual->location >= 0) {
- switch (state->stage) {
- case MESA_SHADER_VERTEX:
-var->data.location = (var->data.mode == ir_var_shader_in)
-   ? (qual->location + VERT_ATTRIB_GENERIC0)
-   : (qual->location + VARYING_SLOT_VAR0);
-break;
+  switch (state->stage) {
+  case MESA_SHADER_VERTEX:
+ var->data.location = (var->data.mode == ir_var_shader_in)
+? (qual->location + VERT_ATTRIB_GENERIC0)
+: (qual->location + VARYING_SLOT_VAR0);
+ break;
  
- case MESA_SHADER_TESS_CTRL:

- case MESA_SHADER_TESS_EVAL:
- case MESA_SHADER_GEOMETRY:
-if (var->data.patch)
-   var->data.location = qual->location + VARYING_SLOT_PATCH0;
-else
-   var->data.location = qual->location + VARYING_SLOT_VAR0;
-break;
+  case MESA_SHADER_TESS_CTRL:
+  case MESA_SHADER_TESS_EVAL:
+  case MESA_SHADER_GEOMETRY:
+ if (var->data.patch)
+var->data.location = qual->location + VARYING_SLOT_PATCH0;
+ else
+var->data.location = qual->location + VARYING_SLOT_VAR0;
+ break;
  
- case MESA_SHADER_FRAGMENT:

-var->data.location = (var->data.mode == ir_var_shader_out)
-   ? (qual->location + FRAG_RESULT_DATA0)
-   : (qual->location + VARYING_SLOT_VAR0);
-break;
- case MESA_SHADER_COMPUTE:
-assert(!"Unexpected shader type");
-break;
- }
-  } else {
- var->data.location = qual->location;
+  case MESA_SHADER_FRAGMENT:
+ var->data.location = (var->data.mode == ir_var_shader_out)
+? (qual->location + FRAG_RESULT_DATA0)
+: (qual->location + VARYING_SLOT_VAR0);
+ break;
+  case MESA_SHADER_COMPUTE:
+ assert(!"Unexpected shader type");
+ break;
}
  
if (qual->flags.q.explicit_index) {



[Mesa-dev] [PATCH 6.5/7] i965/gs: Use NIR info for setting up prog_data

2015-10-21 Thread Jason Ekstrand

Previously, we were pulling bits from GL data structures in order to set up
the prog_data.  However, in this brave new world of NIR, we want to be
pulling it out of the NIR shader whenever possible.  This way, we can move
all this setup code into brw_compile_gs without depending on the old GL
stuff.
---
 src/mesa/drivers/dri/i965/brw_gs.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
b/src/mesa/drivers/dri/i965/brw_gs.c
index d7ea2f0..f3d1e0b 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -58,6 +58,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
 struct brw_gs_prog_key *key)
 {
struct gl_shader *shader = prog->_LinkedShaders[MESA_SHADER_GEOMETRY];
+   nir_shader *nir = gp->program.Base.nir;
   struct brw_stage_state *stage_state = &brw->gs.base;
struct brw_gs_prog_data prog_data;
struct brw_gs_compile c;
@@ -66,9 +67,9 @@ brw_codegen_gs_prog(struct brw_context *brw,
c.key = *key;
 
prog_data.include_primitive_id =
-  (gp->program.Base.InputsRead & VARYING_BIT_PRIMITIVE_ID) != 0;
+  (nir->info.inputs_read & VARYING_BIT_PRIMITIVE_ID) != 0;
 
-   prog_data.invocations = gp->program.Invocations;
+   prog_data.invocations = nir->info.gs.invocations;
 
assign_gs_binding_table_offsets(brw->intelScreen->devinfo, prog,
                                   &gp->program.Base, &prog_data);
@@ -102,7 +103,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
}
 
if (brw->gen >= 7) {
-  if (gp->program.OutputType == GL_POINTS) {
+  if (nir->info.gs.output_primitive == GL_POINTS) {
  /* When the output type is points, the geometry shader may output data
   * to multiple streams, and EndPrimitive() has no effect.  So we
   * configure the hardware to interpret the control data as stream ID.
@@ -110,7 +111,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
  prog_data.control_data_format = GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID;
 
  /* We only have to emit control bits if we are using streams */
- if (prog->Geom.UsesStreams)
+ if (nir->info.gs.uses_streams)
 c.control_data_bits_per_vertex = 2;
  else
 c.control_data_bits_per_vertex = 0;
@@ -126,20 +127,21 @@ brw_codegen_gs_prog(struct brw_context *brw,
  /* We only need to output control data if the shader actually calls
   * EndPrimitive().
   */
- c.control_data_bits_per_vertex = gp->program.UsesEndPrimitive ? 1 : 0;
+ c.control_data_bits_per_vertex =
+nir->info.gs.uses_end_primitive ? 1 : 0;
   }
} else {
   /* There are no control data bits in gen6. */
   c.control_data_bits_per_vertex = 0;
 
   /* If it is using transform feedback, enable it */
-  if (prog->TransformFeedback.NumVarying)
+  if (nir->info.has_transform_feedback_varyings)
  prog_data.gen6_xfb_enabled = true;
   else
  prog_data.gen6_xfb_enabled = false;
}
c.control_data_header_size_bits =
-  gp->program.VerticesOut * c.control_data_bits_per_vertex;
+  nir->info.gs.vertices_out * c.control_data_bits_per_vertex;
 
/* 1 HWORD = 32 bytes = 256 bits */
prog_data.control_data_header_size_hwords =
@@ -240,7 +242,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
unsigned output_size_bytes;
if (brw->gen >= 7) {
   output_size_bytes =
- prog_data.output_vertex_size_hwords * 32 * gp->program.VerticesOut;
+ prog_data.output_vertex_size_hwords * 32 * nir->info.gs.vertices_out;
   output_size_bytes += 32 * prog_data.control_data_header_size_hwords;
} else {
   output_size_bytes = prog_data.output_vertex_size_hwords * 32;
@@ -269,7 +271,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
   prog_data.base.urb_entry_size = ALIGN(output_size_bytes, 128) / 128;
 
prog_data.output_topology =
-  get_hw_prim_for_gl_prim(gp->program.OutputType);
+  get_hw_prim_for_gl_prim(nir->info.gs.output_primitive);
 
/* The GLSL linker will have already matched up GS inputs and the outputs
 * of prior stages.  The driver does extend VS outputs in some cases, but
@@ -283,10 +285,10 @@ brw_codegen_gs_prog(struct brw_context *brw,
 * written by previous stages and shows up via payload magic.
 */
GLbitfield64 inputs_read =
-  gp->program.Base.InputsRead & ~VARYING_BIT_PRIMITIVE_ID;
+  nir->info.inputs_read & ~VARYING_BIT_PRIMITIVE_ID;
brw_compute_vue_map(brw->intelScreen->devinfo,
                       &c.input_vue_map, inputs_read,
-   prog->SeparateShader);
+   nir->info.separate_shader);
 
/* GS inputs are read from the VUE 256 bits (2 vec4's) at a time, so we
 * need to program a URB read length of ceiling(num_slots / 2).
-- 
2.5.0.400.gff86faf
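
The size arithmetic touched by this patch can be worked through in isolation. The following is only a sketch: gs_info and align_up are hypothetical stand-ins for nir->info.gs and Mesa's ALIGN() macro, and rounding the header up to whole HWORDs is an assumption, since the hunk above is cut before the right-hand side of that assignment.

```cpp
#include <cassert>

// Round value up to a multiple of alignment (a power of two).
constexpr unsigned align_up(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical stand-in for the nir->info.gs fields used by the patch.
struct gs_info {
   unsigned vertices_out;
   bool uses_end_primitive;
};

// Gen7+, non-points output: one control bit per vertex if the shader
// calls EndPrimitive(), otherwise none.
unsigned control_data_header_size_bits(const gs_info &info)
{
   unsigned bits_per_vertex = info.uses_end_primitive ? 1 : 0;
   return info.vertices_out * bits_per_vertex;
}

// 1 HWORD = 32 bytes = 256 bits. Rounding up is assumed here.
unsigned control_data_header_size_hwords(const gs_info &info)
{
   return align_up(control_data_header_size_bits(info), 256) / 256;
}

// URB entry size in 128-byte units, as in ALIGN(output_size_bytes, 128) / 128.
unsigned urb_entry_size(unsigned output_size_bytes)
{
   return align_up(output_size_bytes, 128) / 128;
}
```

For example, 300 output vertices with EndPrimitive() in use give 300 control bits, which under the rounding assumption occupy 2 HWORDs.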


[Mesa-dev] [PATCH] i965/vec4: Initialize LOD to 0.0f for textureQueryLevels() and texture().

2015-10-21 Thread Matt Turner

We implement textureQueryLevels (which takes no arguments, save the
sampler) using the resinfo message (which takes an argument of LOD).
Without initializing it, we'd generate a MOV from the null register to
load the LOD argument.

Essentially the same logic applies to texture. A vertex shader cannot
compute derivatives and so cannot produce an LOD, so TXL with an LOD of
0.0 is used.
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index c39f97e..b8f90f2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -882,6 +882,18 @@ vec4_visitor::emit_texture(ir_texture_opcode op,
uint32_t sampler,
src_reg sampler_reg)
 {
+   /* The sampler can only meaningfully compute LOD for fragment shader
+* messages. For all other stages, we change the opcode to TXL and hardcode
+* the LOD to 0.
+*
+* textureQueryLevels() is implemented in terms of TXS so we need to pass a
+* valid LOD argument.
+*/
+   if (op == ir_tex || op == ir_query_levels) {
+  assert(lod.file == BAD_FILE);
+  lod = src_reg(0.0f);
+   }
+
enum opcode opcode;
switch (op) {
case ir_tex: opcode = SHADER_OPCODE_TXL; break;
-- 
2.4.9
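
Stripped of the backend types, the logic added here amounts to the following sketch. tex_op, lod_arg and resolve_lod are hypothetical reductions for illustration; "valid == false" plays the role of lod.file == BAD_FILE.

```cpp
#include <cassert>

// Hypothetical, heavily reduced stand-ins for the types in the patch.
enum tex_op { TEX, TXL, TXS, QUERY_LEVELS };

struct lod_arg {
   bool valid;   // false plays the role of file == BAD_FILE
   float value;
};

// For plain texture() and textureQueryLevels() the shader supplies no
// LOD, so substitute an explicit 0.0f before the message is built.
lod_arg resolve_lod(tex_op op, lod_arg lod)
{
   if (op == TEX || op == QUERY_LEVELS) {
      assert(!lod.valid);
      lod = lod_arg{true, 0.0f};
   }
   return lod;
}
```

Operations that already carry an LOD, such as TXL, pass through unchanged.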


Re: [Mesa-dev] [PATCH 1/4] i965/fs: Use type-W for immediate in SampleID setup.

2015-10-21 Thread Kenneth Graunke

On Wednesday, October 21, 2015 10:05:27 AM Matt Turner wrote:
> Not a functional difference, but register is loaded with a signed
> immediate (V) and added to a signed type (D) producing a signed result
> (D).
> 
> Also change the type of g0 to allow for compaction.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp   | 4 ++--
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 49323eb..f9c78df 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1273,7 +1273,7 @@ fs_visitor::emit_sampleid_setup()
> if (key->compute_sample_id) {
>fs_reg t1 = vgrf(glsl_type::int_type);
>fs_reg t2 = vgrf(glsl_type::int_type);
> -  t2.type = BRW_REGISTER_TYPE_UW;
> +  t2.type = BRW_REGISTER_TYPE_W;
>  
>/* The PS will be run in MSDISPMODE_PERSAMPLE. For example with
> * 8x multisampling, subspan 0 will represent sample N (where N
> @@ -1295,7 +1295,7 @@ fs_visitor::emit_sampleid_setup()
> * subspan 1, and finally sample 1 of subspan 1.
> */
>abld.exec_all()
> -  .AND(t1, fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UD)),
> +  .AND(t1, fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)),
> fs_reg(0xc0));
>abld.exec_all().SHR(t1, t1, fs_reg(5));
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 13c495c..9a5992e1 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -1429,7 +1429,7 @@ fs_generator::generate_set_sample_id(fs_inst *inst,
> brw_set_default_exec_size(p, BRW_EXECUTE_8);
> brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
> brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> -   struct brw_reg reg = retype(stride(src1, 1, 4, 0), BRW_REGISTER_TYPE_UW);
> +   struct brw_reg reg = stride(src1, 1, 4, 0);
> if (dispatch_width == 8) {
>brw_ADD(p, dst, src0, reg);
> } else if (dispatch_width == 16) {
> 

Series looks right to me.
Reviewed-by: Kenneth Graunke 
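
As a side note, the AND/SHR pair that computes t1 in the quoted hunk is plain bit arithmetic and can be checked on its own. The function name below is hypothetical, and what the field means to the hardware is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the quoted AND/SHR pair: keep bits 7:6 of the first dword of
// g0, then shift right by 5 rather than 6, so the 2-bit field ends up
// multiplied by two.
constexpr uint32_t starting_sample(uint32_t g0_dw0)
{
   return (g0_dw0 & 0xc0u) >> 5;
}
```

The only possible results are 0, 2, 4 and 6.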



[Mesa-dev] [PATCH v2 1/7] nir/info: Add more information about geometry shaders

2015-10-21 Thread Jason Ekstrand

v2: Add a uses_streams boolean

---
 src/glsl/nir/glsl_to_nir.cpp |  4 
 src/glsl/nir/nir.h   | 12 
 2 files changed, 16 insertions(+)

diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
index c9cdf35..9b50a93 100644
--- a/src/glsl/nir/glsl_to_nir.cpp
+++ b/src/glsl/nir/glsl_to_nir.cpp
@@ -170,8 +170,12 @@ glsl_to_nir(const struct gl_shader_program *shader_prog,
 
switch (stage) {
case MESA_SHADER_GEOMETRY:
+  shader->info.gs.vertices_in = shader_prog->Geom.VerticesIn;
+  shader->info.gs.output_primitive = sh->Geom.OutputType;
   shader->info.gs.vertices_out = sh->Geom.VerticesOut;
   shader->info.gs.invocations = sh->Geom.Invocations;
+  shader->info.gs.uses_end_primitive = shader_prog->Geom.UsesEndPrimitive;
+  shader->info.gs.uses_streams = shader_prog->Geom.UsesStreams;
   break;
 
case MESA_SHADER_FRAGMENT: {
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 2ab48fb..f65d44c 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1495,11 +1495,23 @@ typedef struct nir_shader_info {
 
union {
   struct {
+ /** The number of vertices received per input primitive */
+ unsigned vertices_in;
+
+ /** The output primitive type (GL enum value) */
+ unsigned output_primitive;
+
  /** The maximum number of vertices the geometry shader might write. */
  unsigned vertices_out;
 
  /** 1 .. MAX_GEOMETRY_SHADER_INVOCATIONS */
  unsigned invocations;
+
+ /** Whether or not this shader uses EndPrimitive */
+ bool uses_end_primitive;
+
+ /** Whether or not this shader uses non-zero streams */
+ bool uses_streams;
   } gs;
 
   struct {
-- 
2.5.0.400.gff86faf


Re: [Mesa-dev] [PATCH 6.5/7] i965/gs: Use NIR info for setting up prog_data

2015-10-21 Thread Kenneth Graunke

On Wednesday, October 21, 2015 12:45:27 PM Jason Ekstrand wrote:
> Previously, we were pulling bits from GL data structures in order to set up
> the prog_data.  However, in this brave new world of NIR, we want to be
> pulling it out of the NIR shader whenever possible.  This way, we can move
> all this setup code into brw_compile_gs without depending on the old GL
> stuff.
> ---
>  src/mesa/drivers/dri/i965/brw_gs.c | 24 +---
>  1 file changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
> b/src/mesa/drivers/dri/i965/brw_gs.c
> index d7ea2f0..f3d1e0b 100644
> --- a/src/mesa/drivers/dri/i965/brw_gs.c
> +++ b/src/mesa/drivers/dri/i965/brw_gs.c
> @@ -58,6 +58,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
>  struct brw_gs_prog_key *key)
>  {
> struct gl_shader *shader = prog->_LinkedShaders[MESA_SHADER_GEOMETRY];
> +   nir_shader *nir = gp->program.Base.nir;
> struct brw_stage_state *stage_state = &brw->gs.base;
> struct brw_gs_prog_data prog_data;
> struct brw_gs_compile c;
> @@ -66,9 +67,9 @@ brw_codegen_gs_prog(struct brw_context *brw,
> c.key = *key;
>  
> prog_data.include_primitive_id =
> -  (gp->program.Base.InputsRead & VARYING_BIT_PRIMITIVE_ID) != 0;
> +  (nir->info.inputs_read & VARYING_BIT_PRIMITIVE_ID) != 0;
>  
> -   prog_data.invocations = gp->program.Invocations;
> +   prog_data.invocations = nir->info.gs.invocations;
>  
> assign_gs_binding_table_offsets(brw->intelScreen->devinfo, prog,
> &gp->program.Base, &prog_data);
> @@ -102,7 +103,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
> }
>  
> if (brw->gen >= 7) {
> -  if (gp->program.OutputType == GL_POINTS) {
> +  if (nir->info.gs.output_primitive == GL_POINTS) {
>   /* When the output type is points, the geometry shader may output 
> data
>* to multiple streams, and EndPrimitive() has no effect.  So we
>* configure the hardware to interpret the control data as stream 
> ID.
> @@ -110,7 +111,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
>   prog_data.control_data_format = 
> GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID;
>  
>   /* We only have to emit control bits if we are using streams */
> - if (prog->Geom.UsesStreams)
> + if (nir->info.gs.uses_streams)
>  c.control_data_bits_per_vertex = 2;
>   else
>  c.control_data_bits_per_vertex = 0;
> @@ -126,20 +127,21 @@ brw_codegen_gs_prog(struct brw_context *brw,
>   /* We only need to output control data if the shader actually calls
>* EndPrimitive().
>*/
> - c.control_data_bits_per_vertex = gp->program.UsesEndPrimitive ? 1 : 
> 0;
> + c.control_data_bits_per_vertex =
> +nir->info.gs.uses_end_primitive ? 1 : 0;
>}
> } else {
>/* There are no control data bits in gen6. */
>c.control_data_bits_per_vertex = 0;
>  
>/* If it is using transform feedback, enable it */
> -  if (prog->TransformFeedback.NumVarying)
> +  if (nir->info.has_transform_feedback_varyings)
>   prog_data.gen6_xfb_enabled = true;
>else
>   prog_data.gen6_xfb_enabled = false;
> }
> c.control_data_header_size_bits =
> -  gp->program.VerticesOut * c.control_data_bits_per_vertex;
> +  nir->info.gs.vertices_out * c.control_data_bits_per_vertex;
>  
> /* 1 HWORD = 32 bytes = 256 bits */
> prog_data.control_data_header_size_hwords =
> @@ -240,7 +242,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
> unsigned output_size_bytes;
> if (brw->gen >= 7) {
>output_size_bytes =
> - prog_data.output_vertex_size_hwords * 32 * gp->program.VerticesOut;
> + prog_data.output_vertex_size_hwords * 32 * 
> nir->info.gs.vertices_out;
>output_size_bytes += 32 * prog_data.control_data_header_size_hwords;
> } else {
>output_size_bytes = prog_data.output_vertex_size_hwords * 32;
> @@ -269,7 +271,7 @@ brw_codegen_gs_prog(struct brw_context *brw,
>prog_data.base.urb_entry_size = ALIGN(output_size_bytes, 128) / 128;
>  
> prog_data.output_topology =
> -  get_hw_prim_for_gl_prim(gp->program.OutputType);
> +  get_hw_prim_for_gl_prim(nir->info.gs.output_primitive);
>  
> /* The GLSL linker will have already matched up GS inputs and the outputs
>  * of prior stages.  The driver does extend VS outputs in some cases, but
> @@ -283,10 +285,10 @@ brw_codegen_gs_prog(struct brw_context *brw,
>  * written by previous stages and shows up via payload magic.
>  */
> GLbitfield64 inputs_read =
> -  gp->program.Base.InputsRead & ~VARYING_BIT_PRIMITIVE_ID;
> +  nir->info.inputs_read & ~VARYING_BIT_PRIMITIVE_ID;
> brw_compute_vue_map(brw->intelScreen->devinfo,
> &c.input_vue_map, inputs_read,
> -   prog->SeparateShader);
> +

Re: [Mesa-dev] [PATCH v2 1/7] nir/info: Add more information about geometry shaders

2015-10-21 Thread Kenneth Graunke

On Wednesday, October 21, 2015 12:44:31 PM Jason Ekstrand wrote:
> v2: Add a uses_streams boolean
> 
> ---
>  src/glsl/nir/glsl_to_nir.cpp |  4 
>  src/glsl/nir/nir.h   | 12 
>  2 files changed, 16 insertions(+)
> 
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index c9cdf35..9b50a93 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -170,8 +170,12 @@ glsl_to_nir(const struct gl_shader_program *shader_prog,
>  
> switch (stage) {
> case MESA_SHADER_GEOMETRY:
> +  shader->info.gs.vertices_in = shader_prog->Geom.VerticesIn;
> +  shader->info.gs.output_primitive = sh->Geom.OutputType;
>shader->info.gs.vertices_out = sh->Geom.VerticesOut;
>shader->info.gs.invocations = sh->Geom.Invocations;
> +  shader->info.gs.uses_end_primitive = 
> shader_prog->Geom.UsesEndPrimitive;
> +  shader->info.gs.uses_streams = shader_prog->Geom.UsesStreams;
>break;
>  
> case MESA_SHADER_FRAGMENT: {
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index 2ab48fb..f65d44c 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -1495,11 +1495,23 @@ typedef struct nir_shader_info {
>  
> union {
>struct {
> + /** The number of vertices received per input primitive */
> + unsigned vertices_in;
> +
> + /** The output primitive type (GL enum value) */
> + unsigned output_primitive;
> +
>   /** The maximum number of vertices the geometry shader might write. 
> */
>   unsigned vertices_out;
>  
>   /** 1 .. MAX_GEOMETRY_SHADER_INVOCATIONS */
>   unsigned invocations;
> +
> + /** Whether or not this shader uses EndPrimitive */
> + bool uses_end_primitive;
> +
> + /** Whether or not this shader uses non-zero streams */
> + bool uses_streams;
>} gs;
>  
>struct {
> 

Reviewed-by: Kenneth Graunke 



[Mesa-dev] [Bug 92570] 10 bit h264 OMX UVD decode outputs NV12

2015-10-21 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=92570

--- Comment #2 from Andy Furniss  ---
(In reply to Christian König from comment #1)
> Yeah, that's a known issue/unimplemented feature.
> 
> On pre Tonga hardware UVD can actually decode 10 bit h264, but still outputs
> NV12.

So you mean it produces correct output on other h/w, e.g. by using 10 bit
internally and truncating to 8 bit for output?

If so why not output nv16 or something else 10 bit?

> And so far we didn't have the time to actually implement support for 10bit
> video surfaces used on Tonga so your end result is corrupted.

OK - I guess it is not exactly a feature anyone needs; I'm just testing.

Are there any docs that list the capabilities, whether implemented or not, of the
various UVDs, VCEs and VSR (if that is h/w)?

> BTW: If somebody wants to get his hands dirty this should be rather easy to
> hack together, just not top priority for us.

Maybe easy for those who know what they are doing :-)

Where would someone start to look for inspiration?

On a slightly related note what version of bellagio/gstreamer do you use?

sf.net version needs a bit of patching to even compile and then seems to
install OMX headers that gst-omx doesn't like. I can get there in the end but
wondered whether I am missing some new version hiding somewhere.

I asked on #gstreamer and the only person that replied thought it was
old/broken and not needed - though after looking around he did admit he didn't
know about VCE.

I am just asking to double check that there really is no other way to use
gstreamer and gst-omx h/w accel.

TIA


Re: [Mesa-dev] [PATCH 1/9] i965/vec4: Don't emit MOVs for unused URB slots.

2015-10-21 Thread Emil Velikov

On 20 October 2015 at 05:08, Matt Turner  wrote:
> Otherwise we'd emit a MOV from the null register (which isn't allowed).
>
Would you say it's a good idea to push the check down to the MOV()
implementation ? If not perhaps we should add an assert() to easily
catch cases like these in the future ?

Thanks
Emil
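
To illustrate the assert() idea, here is a rough sketch. All types and functions below are hypothetical miniatures, not the real vec4 backend API; BAD_FILE stands for an unset register.

```cpp
#include <cassert>

// Hypothetical miniature of a source operand; only the file matters here.
enum reg_file { BAD_FILE, GRF, IMM };
struct src_reg { reg_file file; };

// A source that was never set up must not be read by a MOV.
bool mov_source_is_valid(const src_reg &src)
{
   return src.file != BAD_FILE;
}

// Hypothetical MOV() wrapper that traps an unset source in debug builds.
void emit_mov(const src_reg &src)
{
   assert(mov_source_is_valid(src) && "MOV from an unset register");
   // ... the real instruction would be built here ...
}
```

Callers that may legitimately see unused slots would still need to skip them before calling the wrapper.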

Re: [Mesa-dev] [PATCH 1/2] i965: remove cache_aux_free_func array

2015-10-21 Thread Matt Turner

On Wed, Oct 21, 2015 at 2:16 PM, Emil Velikov  wrote:
> On 21 October 2015 at 21:33, Kenneth Graunke  wrote:
>> On Monday, October 19, 2015 02:54:56 PM Emil Velikov wrote:
>>> Ping on these two trivial patches ?
>>>
>>> -Emil
>>
>> Oh, sorry, I thought I'd sent R-bs for these...
>>
>> Both are
>> Reviewed-by: Kenneth Graunke 
> Thanks Ken. I was wondering if people looked at them and went "meh ...
> too small, we need something beefier" :-P
>
> And now ... some C++ questions. I realise that templates (or is it
> only STL?) are out of the question, but how about
>  - Initialization upon object declaration, rather than copy
> constructors ? Rather trivial yet we have thousands of

You're talking about things like this?

   fs_reg reg = fs_reg(...)

I seem to remember trying at one point to replace those with just

   fs_reg reg(...)

and found that it made absolutely no difference in the compiled code.

The second's probably preferable if for no other reason than it's
shorter, but I don't know that there's anything to be gained...
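
A small experiment along those lines, as a sketch: counted_reg is a hypothetical stand-in for fs_reg that counts copy constructions. Compilers are allowed to elide the copy in the first form, and since C++17 they are required to.

```cpp
#include <cassert>

// Hypothetical stand-in for fs_reg that counts copy constructions.
struct counted_reg {
   static int copies;
   int nr;
   counted_reg(int n) : nr(n) {}
   counted_reg(const counted_reg &other) : nr(other.nr) { ++copies; }
};
int counted_reg::copies = 0;

// Constructs one register with each form and reports how many copy
// constructor calls were made in total.
int copies_made_by_both_forms()
{
   counted_reg::copies = 0;
   counted_reg a = counted_reg(1);   // copy-initialization from a temporary
   counted_reg b(2);                 // direct-initialization
   (void)a;
   (void)b;
   return counted_reg::copies;
}
```

With copy elision in effect, neither form ever calls the copy constructor.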

> duplicated/wasted CPU cycles due to it. One example is the memset()
> from {fs,src,dst}_reg. Does the compiler squash/optimize those for us
> ?

Not sure. Experiments and data are welcome.

I think people preferred the memset because there wasn't an
alternative safe way of ensuring everything was initialized? Not sure.

>  - Where is the line about "big enough to pass as reference" rather
> than a copy for i965 ? It seems that older code(?) and extremely
> common things such as the *_reg are passed around as copies.

We want to use const references for dst_reg/src_reg/fs_reg where
possible (see commit e58992aed and the three immediately before it for
data). I don't think there should be many more instances of this.

brw_reg is 8 bytes, so there's no reason to pass it as anything but by value.
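
As a sketch of that rule of thumb (small_reg and big_reg are hypothetical layouts, not the real brw_reg or fs_reg):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-byte register description, small enough to copy freely.
struct small_reg {
   uint32_t bits;
   uint32_t imm;
};
static_assert(sizeof(small_reg) == 8, "expected to be 8 bytes");

// Hypothetical larger operand that wraps the small one and adds
// backend-side bookkeeping.
struct big_reg {
   small_reg hw;
   int reg_offset;
   int stride;
   big_reg *reladdr;
   bool negate;
   bool abs;
};

// Small type: pass by value.
uint32_t imm_of(small_reg r) { return r.imm; }

// Larger type: pass by const reference to avoid copying it per call.
int stride_of(const big_reg &r) { return r.stride; }
```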
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] i965: remove cache_aux_free_func array

2015-10-21 Thread Kenneth Graunke

On Monday, October 19, 2015 02:54:56 PM Emil Velikov wrote:
> Ping on these two trivial patches ?
> 
> -Emil

Oh, sorry, I thought I'd sent R-bs for these...

Both are
Reviewed-by: Kenneth Graunke 


Re: [Mesa-dev] [PATCH v2 01/11] i965: Introduce new SHADER_OPCODE_URB_WRITE_SIMD8_MASKED/PER_SLOT opcodes.

2015-10-21 Thread Jason Ekstrand

On Oct 21, 2015 10:28 AM, "Jason Ekstrand"  wrote:
>
> On Wed, Oct 21, 2015 at 1:29 AM, Kenneth Graunke 
wrote:
> > On Monday, October 12, 2015 02:49:03 PM Kenneth Graunke wrote:
> >> In the vec4 backend, we have a vec4_instruction::urb_write_flags field.
> >> There are many kinds of flags for SIMD4x2 messages.
> >>
> >> However, there are really only two (per-slot offset, use channel masks)
> >> for SIMD8 messages.  Rather than adding a boolean flag for per-slot
> >> offsets (polluting all instructions), I decided to just make three new
> >> opcodes.
> >>
> >> Signed-off-by: Kenneth Graunke 
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_defines.h|  3 +++
> >>  src/mesa/drivers/dri/i965/brw_fs.cpp   |  9 +
> >>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 11 +++
> >>  src/mesa/drivers/dri/i965/brw_inst.h   |  1 +
> >>  src/mesa/drivers/dri/i965/brw_shader.cpp   |  9 +
> >>  5 files changed, 33 insertions(+)
> >>
> >> Here's the rest of the series that didn't get reviewed last time,
> >> rebased on Jason's compiler reworks.
> >
> > Jason landed yet more compiler reworks.  I've pushed a rebased copy
> > to the 'simd8gs' branch of ~kwg/mesa.  Code got shuffled between
> > functions or header files, so some of it didn't textually apply, but
> > the new code isn't significantly different.  I've verified that it
> > still builds and passes Piglit.
> >
> > Jason has yet *more* compiler reworks on the mailing list.  I've
> > preemptively rebased on those and pushed that to my tree as well.
> > It's the 'simd8gs-rebase-rebase' branch.  That branch doesn't
> > compile, however - with the roundabout vec4/fs include hell, it's
> > somehow getting an incomplete type for "struct brw_gs_compile".
> > I didn't spend the time to figure out why.  Other work to do.
>
> You didn't have to do that...
>
> Anyway, I'll poke at it and get the include stuff sorted.

I just pushed a rebased version to my freedesktop repo in the wip/simd8gs
branch.

--Jason
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/6] [v2] i965/fs: Enumerate logical fb writes arguments

2015-10-21 Thread Ben Widawsky

Gen9 adds the ability to write out a stencil value, so we need to expand the
virtual payload by one. Abstracting this now makes that change easier to read.

I was admittedly confused early on about some of the hardcoding. If people
believe the resulting code is inferior, I am not super attached to the patch.

v2:
Remove explicit numbering from the enumeration (Matt).
Use a real naming scheme, and reference it in the opcode definition (Curro)
  - LOGICAL_SRC_SRC_DEPTH kinda sucks... but it's consistent
Add a missed hardcoded logical position in get_lowered_simd_width (Ben)
Add an assertion to make sure the component numbering is correct (Ben)

Cc: Matt Turner 
Cc: Francisco Jerez 
Signed-off-by: Ben Widawsky 
---
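As a rough illustration of the idea (not the actual i965 code), here is a minimal C sketch of indexing sources by enumerator instead of by hardcoded number. The enumerator names mirror the patch; `struct fake_reg`, `FB_WRITE_LOGICAL_NUM_SRCS` and `fb_write_components()` are hypothetical stand-ins invented for this example.

```c
#include <assert.h>

/* Logical source positions, mirroring the enum added by the patch.
 * FB_WRITE_LOGICAL_NUM_SRCS is a hypothetical sentinel, not part of it. */
enum fb_write_logical_srcs {
   FB_WRITE_LOGICAL_SRC_COLOR0,      /* required */
   FB_WRITE_LOGICAL_SRC_COLOR1,      /* dual source blending */
   FB_WRITE_LOGICAL_SRC_SRC0_ALPHA,
   FB_WRITE_LOGICAL_SRC_SRC_DEPTH,   /* gl_FragDepth */
   FB_WRITE_LOGICAL_SRC_DST_DEPTH,   /* gen4-5 passthrough */
   FB_WRITE_LOGICAL_SRC_OMASK,       /* gl_SampleMask */
   FB_WRITE_LOGICAL_SRC_COMPONENTS,  /* required */
   FB_WRITE_LOGICAL_NUM_SRCS
};

/* Hypothetical stand-in for fs_reg: an immediate flag and a value. */
struct fake_reg {
   int is_imm;
   unsigned ud;
};

/* Read the component count by name instead of by a hardcoded index. */
static unsigned
fb_write_components(const struct fake_reg src[FB_WRITE_LOGICAL_NUM_SRCS])
{
   assert(src[FB_WRITE_LOGICAL_SRC_COMPONENTS].is_imm);
   return src[FB_WRITE_LOGICAL_SRC_COMPONENTS].ud;
}
```

With this layout, adding a new source ahead of the component count should only need a new enumerator; callers that use the names keep working without renumbering.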
 src/mesa/drivers/dri/i965/brw_defines.h  | 22 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp | 24 +---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  1 +
 3 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index e61ad54..a2f59ea 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -911,15 +911,9 @@ enum opcode {
 
/**
 * Same as FS_OPCODE_FB_WRITE but expects its arguments separately as
-* individual sources instead of as a single payload blob:
-*
-* Source 0: [required] Color 0.
-* Source 1: [optional] Color 1 (for dual source blend messages).
-* Source 2: [optional] Src0 Alpha.
-* Source 3: [optional] Source Depth (gl_FragDepth)
-* Source 4: [optional (gen4-5)] Destination Depth passthrough from thread
-* Source 5: [optional] Sample Mask (gl_SampleMask).
-* Source 6: [required] Number of color components (as a UD immediate).
+* individual sources instead of as a single payload blob. The
+* position/ordering of the arguments are defined by the enum
+* fb_write_logical_srcs.
 */
FS_OPCODE_FB_WRITE_LOGICAL,
 
@@ -1318,6 +1312,16 @@ enum brw_urb_write_flags {
   BRW_URB_WRITE_ALLOCATE | BRW_URB_WRITE_COMPLETE,
 };
 
+enum fb_write_logical_srcs {
+   FB_WRITE_LOGICAL_SRC_COLOR0,  /* REQUIRED */
+   FB_WRITE_LOGICAL_SRC_COLOR1,  /* for dual source blend messages */
+   FB_WRITE_LOGICAL_SRC_SRC0_ALPHA,
+   FB_WRITE_LOGICAL_SRC_SRC_DEPTH,   /* gl_FragDepth */
+   FB_WRITE_LOGICAL_SRC_DST_DEPTH,   /* GEN4-5: passthrough from thread */
+   FB_WRITE_LOGICAL_SRC_OMASK,   /* Sample Mask (gl_SampleMask) */
+   FB_WRITE_LOGICAL_SRC_COMPONENTS,  /* REQUIRED */
+};
+
 #ifdef __cplusplus
 /**
  * Allow brw_urb_write_flags enums to be ORed together.
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index da90467..ef06a70 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -695,10 +695,10 @@ fs_inst::components_read(unsigned i) const
   return 2;
 
case FS_OPCODE_FB_WRITE_LOGICAL:
-  assert(src[6].file == IMM);
+  assert(src[FB_WRITE_LOGICAL_SRC_COMPONENTS].file == IMM);
   /* First/second FB write color. */
   if (i < 2)
- return src[6].fixed_hw_reg.dw1.ud;
+ return src[FB_WRITE_LOGICAL_SRC_COMPONENTS].fixed_hw_reg.dw1.ud;
   else
  return 1;
 
@@ -3339,15 +3339,16 @@ lower_fb_write_logical_send(const fs_builder &bld, fs_inst *inst,
 const brw_wm_prog_key *key,
 const fs_visitor::thread_payload &payload)
 {
-   assert(inst->src[6].file == IMM);
+   assert(inst->src[FB_WRITE_LOGICAL_SRC_COMPONENTS].file == IMM);
const brw_device_info *devinfo = bld.shader->devinfo;
-   const fs_reg &color0 = inst->src[0];
-   const fs_reg &color1 = inst->src[1];
-   const fs_reg &src0_alpha = inst->src[2];
-   const fs_reg &src_depth = inst->src[3];
-   const fs_reg &dst_depth = inst->src[4];
-   fs_reg sample_mask = inst->src[5];
-   const unsigned components = inst->src[6].fixed_hw_reg.dw1.ud;
+   const fs_reg &color0 = inst->src[FB_WRITE_LOGICAL_SRC_COLOR0];
+   const fs_reg &color1 = inst->src[FB_WRITE_LOGICAL_SRC_COLOR1];
+   const fs_reg &src0_alpha = inst->src[FB_WRITE_LOGICAL_SRC_SRC0_ALPHA];
+   const fs_reg &src_depth = inst->src[FB_WRITE_LOGICAL_SRC_SRC_DEPTH];
+   const fs_reg &dst_depth = inst->src[FB_WRITE_LOGICAL_SRC_DST_DEPTH];
+   fs_reg sample_mask = inst->src[FB_WRITE_LOGICAL_SRC_OMASK];
+   const unsigned components =
+  inst->src[FB_WRITE_LOGICAL_SRC_COMPONENTS].fixed_hw_reg.dw1.ud;
 
/* We can potentially have a message length of up to 15, so we have to set
 * base_mrf to either 0 or 1 in order to fit in m0..m15.
@@ -4175,7 +4176,8 @@ get_lowered_simd_width(const struct brw_device_info *devinfo,
   /* Gen6 doesn't support SIMD16 depth writes but we cannot handle them
* here.
*/
-  assert(devinfo->gen != 6 || inst->src[3].file == BAD_FILE ||
+  assert(devinfo->gen != 6 ||
+

Re: [Mesa-dev] [PATCH 2/4] st/dri2: Add shared flag to missing locations

2015-10-21 Thread Marek Olšák

On Wed, Oct 21, 2015 at 12:28 PM, Axel Davy  wrote:
> The PIPE_BIND_SHARED flag should be added whenever
> the resource may be shared with another process.
>
> In particular if the resource is imported, or may
> be exported, the flag should be used.

This can't be enforced. EGL_MESA_image_dma_buf_export allows exporting
any texture. Mesa can't know in advance if a texture will be exported.

Marek

Re: [Mesa-dev] [PATCH 1/2] i965: remove cache_aux_free_func array

2015-10-21 Thread Emil Velikov

On 21 October 2015 at 21:33, Kenneth Graunke  wrote:
> On Monday, October 19, 2015 02:54:56 PM Emil Velikov wrote:
>> Ping on these two trivial patches ?
>>
>> -Emil
>
> Oh, sorry, I thought I'd sent R-bs for these...
>
> Both are
> Reviewed-by: Kenneth Graunke 
Thanks Ken. I was wondering if people looked at them and went "meh ...
too small, we need something beefier" :-P

And now ... some C++ questions. I realise that templates (or is it
only STL?) are out of the question, but how about
 - Initialization upon object declaration, rather than copy
constructors ? Rather trivial yet we have thousands of
duplicated/wasted CPU cycles due to it. One example is the memset()
from {fs,src,dst}_reg. Does the compiler squash/optimize those for us
?
 - Where is the line about "big enough to pass as reference" rather
than a copy for i965 ? It seems that older code(?) and extremely
common things such as the *_reg are passed around as copies.

Cheers,
Emil

Re: [Mesa-dev] [PATCH 1/9] i965/vec4: Don't emit MOVs for unused URB slots.

2015-10-21 Thread Matt Turner

On Wed, Oct 21, 2015 at 1:52 PM, Emil Velikov  wrote:
> On 20 October 2015 at 05:08, Matt Turner  wrote:
>> Otherwise we'd emit a MOV from the null register (which isn't allowed).
>>
> Would you say it's a good idea to push the check down to the MOV()
> implementation ? If not perhaps we should add an assert() to easily
> catch cases like these in the future ?

Sort of, yes. :)

This series arose by writing an assembly validator that I plan to send today.

[Mesa-dev] [PATCH] svga: fix clip plane regression after recent tgsi_scan change

2015-10-21 Thread Brian Paul

Before the change "tgsi/scan: use properties for clip/cull distance
writemasks", the tgsi_shader_info::num_written_culldistance field
was a multiple of four, now it's an accurate count.  In the svga
driver, we need a minor change to the loop test.
---
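To show why the loop test matters once the count is no longer a multiple of four, here is a small standalone C sketch. Both helper functions are hypothetical and only count iterations; they are not part of the svga driver.

```c
#include <assert.h>

/* Hypothetical helper: counts loop iterations the way the old test did.
 * With an unsigned counter that is not a multiple of four, "n -= 4"
 * wraps around to a huge value, so the test "n" stays true. */
static unsigned
iterations_unsigned(unsigned n)
{
   unsigned i, count = 0;
   for (i = 0; i < 2 && n; i++, n -= 4)
      count++;
   return count;
}

/* Same loop with a signed counter and an explicit "> 0" test. */
static unsigned
iterations_signed(int n)
{
   unsigned i, count = 0;
   for (i = 0; i < 2 && n > 0; i++, n -= 4)
      count++;
   return count;
}
```

For a single written clip distance, the unsigned variant runs twice while the signed variant runs once; for counts that are multiples of four the two agree.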
 src/gallium/drivers/svga/svga_tgsi_vgpu10.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
index d62f2bb..332904f 100644
--- a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
+++ b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
@@ -3097,7 +3097,7 @@ emit_clip_distance_instructions(struct svga_shader_emitter_v10 *emit)
unsigned i;
unsigned clip_plane_enable = emit->key.clip_plane_enable;
unsigned clip_dist_tmp_index = emit->clip_dist_tmp_index;
-   unsigned num_written_clipdist = emit->info.num_written_clipdistance;
+   int num_written_clipdist = emit->info.num_written_clipdistance;
 
assert(emit->clip_dist_out_index != INVALID_INDEX);
assert(emit->clip_dist_tmp_index != INVALID_INDEX);
@@ -3109,7 +3109,7 @@ emit_clip_distance_instructions(struct svga_shader_emitter_v10 *emit)
 */
emit->clip_dist_tmp_index = INVALID_INDEX;
 
-   for (i = 0; i < 2 && num_written_clipdist; i++, num_written_clipdist-=4) {
+   for (i = 0; i < 2 && num_written_clipdist > 0; i++, num_written_clipdist-=4) {
 
   tmp_clip_dist_src = make_src_temp_reg(clip_dist_tmp_index + i);
 
-- 
1.9.1


[Mesa-dev] [PATCH 2/4] drivers/common: use _mesa_RasterPos instead of _tnl_RasterPos

2015-10-21 Thread Brian Paul

---
 src/mesa/drivers/common/driverfuncs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/common/driverfuncs.c b/src/mesa/drivers/common/driverfuncs.c
index 3d1fccb..752aaf6 100644
--- a/src/mesa/drivers/common/driverfuncs.c
+++ b/src/mesa/drivers/common/driverfuncs.c
@@ -33,6 +33,7 @@
 #include "main/mipmap.h"
 #include "main/queryobj.h"
 #include "main/readpix.h"
+#include "main/rastpos.h"
 #include "main/renderbuffer.h"
 #include "main/shaderobj.h"
 #include "main/texcompress.h"
@@ -81,7 +82,7 @@ _mesa_init_driver_functions(struct dd_function_table *driver)
 
/* framebuffer/image functions */
driver->Clear = _swrast_Clear;
-   driver->RasterPos = _tnl_RasterPos;
+   driver->RasterPos = _mesa_RasterPos;
driver->DrawPixels = _swrast_DrawPixels;
driver->ReadPixels = _mesa_readpixels;
driver->CopyPixels = _swrast_CopyPixels;
-- 
1.9.1


[Mesa-dev] [PATCH 3/4] tnl: remove t_rasterpos.c

2015-10-21 Thread Brian Paul

---
 src/mesa/Makefile.sources  |   1 -
 src/mesa/tnl/t_rasterpos.c | 478 -
 2 files changed, 479 deletions(-)
 delete mode 100644 src/mesa/tnl/t_rasterpos.c

diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources
index 34fb446..4bcaa62 100644
--- a/src/mesa/Makefile.sources
+++ b/src/mesa/Makefile.sources
@@ -345,7 +345,6 @@ TNL_FILES = \
tnl/tnl.h \
tnl/t_pipeline.c \
tnl/t_pipeline.h \
-   tnl/t_rasterpos.c \
tnl/t_vb_cliptmp.h \
tnl/t_vb_fog.c \
tnl/t_vb_light.c \
diff --git a/src/mesa/tnl/t_rasterpos.c b/src/mesa/tnl/t_rasterpos.c
deleted file mode 100644
index 4bd9ac8..000
--- a/src/mesa/tnl/t_rasterpos.c
+++ /dev/null
@@ -1,478 +0,0 @@
-/*
- * Mesa 3-D graphics library
- *
- * Copyright (C) 1999-2007  Brian Paul   All Rights Reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included
- * in all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
- * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- */
-
-
-#include "c99_math.h"
-#include "main/glheader.h"
-#include "main/feedback.h"
-#include "main/light.h"
-#include "main/macros.h"
-#include "util/simple_list.h"
-#include "main/mtypes.h"
-#include "main/viewport.h"
-
-#include "math/m_matrix.h"
-#include "tnl/tnl.h"
-
-
-
-/**
- * Clip a point against the view volume.
- *
- * \param v vertex vector describing the point to clip.
- * 
- * \return zero if outside view volume, or one if inside.
- */
-static GLuint
-viewclip_point_xy( const GLfloat v[] )
-{
-   if (   v[0] > v[3] || v[0] < -v[3]
-   || v[1] > v[3] || v[1] < -v[3] ) {
-  return 0;
-   }
-   else {
-  return 1;
-   }
-}
-
-
-/**
- * Clip a point against the far/near Z clipping planes.
- *
- * \param v vertex vector describing the point to clip.
- * 
- * \return zero if outside view volume, or one if inside.
- */
-static GLuint
-viewclip_point_z( const GLfloat v[] )
-{
-   if (v[2] > v[3] || v[2] < -v[3] ) {
-  return 0;
-   }
-   else {
-  return 1;
-   }
-}
-
-
-/**
- * Clip a point against the user clipping planes.
- * 
- * \param ctx GL context.
- * \param v vertex vector describing the point to clip.
- * 
- * \return zero if the point was clipped, or one otherwise.
- */
-static GLuint
-userclip_point( struct gl_context *ctx, const GLfloat v[] )
-{
-   GLuint p;
-
-   for (p = 0; p < ctx->Const.MaxClipPlanes; p++) {
-  if (ctx->Transform.ClipPlanesEnabled & (1 << p)) {
-GLfloat dot = v[0] * ctx->Transform._ClipUserPlane[p][0]
-+ v[1] * ctx->Transform._ClipUserPlane[p][1]
-+ v[2] * ctx->Transform._ClipUserPlane[p][2]
-+ v[3] * ctx->Transform._ClipUserPlane[p][3];
- if (dot < 0.0F) {
-return 0;
- }
-  }
-   }
-
-   return 1;
-}
-
-
-/**
- * Compute lighting for the raster position.  RGB modes computed.
- * \param ctx the context
- * \param vertex vertex location
- * \param normal normal vector
- * \param Rcolor returned color
- * \param Rspec returned specular color (if separate specular enabled)
- */
-static void
-shade_rastpos(struct gl_context *ctx,
-  const GLfloat vertex[4],
-  const GLfloat normal[3],
-  GLfloat Rcolor[4],
-  GLfloat Rspec[4])
-{
-   /*const*/ GLfloat (*base)[3] = ctx->Light._BaseColor;
-   const struct gl_light *light;
-   GLfloat diffuseColor[4], specularColor[4];  /* for RGB mode only */
-
-   COPY_3V(diffuseColor, base[0]);
-   diffuseColor[3] = CLAMP( 
-  ctx->Light.Material.Attrib[MAT_ATTRIB_FRONT_DIFFUSE][3], 0.0F, 1.0F );
-   ASSIGN_4V(specularColor, 0.0, 0.0, 0.0, 1.0);
-
-   foreach (light, &ctx->Light.EnabledList) {
-  GLfloat attenuation = 1.0;
-  GLfloat VP[3]; /* vector from vertex to light pos */
-  GLfloat n_dot_VP;
-  GLfloat diffuseContrib[3], specularContrib[3];
-
-  if (!(light->_Flags & LIGHT_POSITIONAL)) {
- /* light at infinity */
-COPY_3V(VP, light->_VP_inf_norm);
-attenuation =

[Mesa-dev] [PATCH 1/4] mesa: copy rasterpos evaluation code into core Mesa

2015-10-21 Thread Brian Paul

We'll remove it from the tnl module next.  By lifting this code into core
Mesa we can use it from the gallium state tracker.
---
 src/mesa/main/rastpos.c | 441 
 src/mesa/main/rastpos.h |   3 +
 2 files changed, 444 insertions(+)

diff --git a/src/mesa/main/rastpos.c b/src/mesa/main/rastpos.c
index 54b2125..b468219 100644
--- a/src/mesa/main/rastpos.c
+++ b/src/mesa/main/rastpos.c
@@ -36,6 +36,447 @@
 #include "rastpos.h"
 #include "state.h"
 #include "main/dispatch.h"
+#include "main/viewport.h"
+#include "util/simple_list.h"
+
+
+
+/**
+ * Clip a point against the view volume.
+ *
+ * \param v vertex vector describing the point to clip.
+ *
+ * \return zero if outside view volume, or one if inside.
+ */
+static GLuint
+viewclip_point_xy( const GLfloat v[] )
+{
+   if (   v[0] > v[3] || v[0] < -v[3]
+   || v[1] > v[3] || v[1] < -v[3] ) {
+  return 0;
+   }
+   else {
+  return 1;
+   }
+}
+
+
+/**
+ * Clip a point against the far/near Z clipping planes.
+ *
+ * \param v vertex vector describing the point to clip.
+ *
+ * \return zero if outside view volume, or one if inside.
+ */
+static GLuint
+viewclip_point_z( const GLfloat v[] )
+{
+   if (v[2] > v[3] || v[2] < -v[3] ) {
+  return 0;
+   }
+   else {
+  return 1;
+   }
+}
+
+
+/**
+ * Clip a point against the user clipping planes.
+ *
+ * \param ctx GL context.
+ * \param v vertex vector describing the point to clip.
+ *
+ * \return zero if the point was clipped, or one otherwise.
+ */
+static GLuint
+userclip_point( struct gl_context *ctx, const GLfloat v[] )
+{
+   GLuint p;
+
+   for (p = 0; p < ctx->Const.MaxClipPlanes; p++) {
+  if (ctx->Transform.ClipPlanesEnabled & (1 << p)) {
+GLfloat dot = v[0] * ctx->Transform._ClipUserPlane[p][0]
++ v[1] * ctx->Transform._ClipUserPlane[p][1]
++ v[2] * ctx->Transform._ClipUserPlane[p][2]
++ v[3] * ctx->Transform._ClipUserPlane[p][3];
+ if (dot < 0.0F) {
+return 0;
+ }
+  }
+   }
+
+   return 1;
+}
+
+
+/**
+ * Compute lighting for the raster position.  RGB modes computed.
+ * \param ctx the context
+ * \param vertex vertex location
+ * \param normal normal vector
+ * \param Rcolor returned color
+ * \param Rspec returned specular color (if separate specular enabled)
+ */
+static void
+shade_rastpos(struct gl_context *ctx,
+  const GLfloat vertex[4],
+  const GLfloat normal[3],
+  GLfloat Rcolor[4],
+  GLfloat Rspec[4])
+{
+   /*const*/ GLfloat (*base)[3] = ctx->Light._BaseColor;
+   const struct gl_light *light;
+   GLfloat diffuseColor[4], specularColor[4];  /* for RGB mode only */
+
+   COPY_3V(diffuseColor, base[0]);
+   diffuseColor[3] = CLAMP(
+  ctx->Light.Material.Attrib[MAT_ATTRIB_FRONT_DIFFUSE][3], 0.0F, 1.0F );
+   ASSIGN_4V(specularColor, 0.0, 0.0, 0.0, 1.0);
+
+   foreach (light, &ctx->Light.EnabledList) {
+  GLfloat attenuation = 1.0;
+  GLfloat VP[3]; /* vector from vertex to light pos */
+  GLfloat n_dot_VP;
+  GLfloat diffuseContrib[3], specularContrib[3];
+
+  if (!(light->_Flags & LIGHT_POSITIONAL)) {
+ /* light at infinity */
+COPY_3V(VP, light->_VP_inf_norm);
+attenuation = light->_VP_inf_spot_attenuation;
+  }
+  else {
+ /* local/positional light */
+GLfloat d;
+
+ /* VP = vector from vertex pos to light[i].pos */
+SUB_3V(VP, light->_Position, vertex);
+ /* d = length(VP) */
+d = (GLfloat) LEN_3FV( VP );
+if (d > 1.0e-6F) {
+/* normalize VP */
+   GLfloat invd = 1.0F / d;
+   SELF_SCALE_SCALAR_3V(VP, invd);
+}
+
+ /* atti */
+attenuation = 1.0F / (light->ConstantAttenuation + d *
+  (light->LinearAttenuation + d *
+   light->QuadraticAttenuation));
+
+if (light->_Flags & LIGHT_SPOT) {
+   GLfloat PV_dot_dir = - DOT3(VP, light->_NormSpotDirection);
+
+   if (PV_dot_dir<light->_CosCutoff) {
+  continue;
+   }
+   else {
+   GLfloat spot = powf(PV_dot_dir, light->SpotExponent);
+  attenuation *= spot;
+   }
+}
+  }
+
+  if (attenuation < 1e-3F)
+continue;
+
+  n_dot_VP = DOT3( normal, VP );
+
+  if (n_dot_VP < 0.0F) {
+ACC_SCALE_SCALAR_3V(diffuseColor, attenuation, light->_MatAmbient[0]);
+continue;
+  }
+
+  /* Ambient + diffuse */
+  COPY_3V(diffuseContrib, light->_MatAmbient[0]);
+  ACC_SCALE_SCALAR_3V(diffuseContrib, n_dot_VP, light->_MatDiffuse[0]);
+
+  /* Specular */
+  {
+ const GLfloat *h;
+ GLfloat n_dot_h;
+
+ ASSIGN_3V(specularContrib, 0.0, 0.0, 0.0);
+
+if (ctx->Light.Model.LocalViewer) {
+   GLfloat v[3];
+   COPY_3V(v, vertex);
+

[Mesa-dev] [PATCH 4/4] st/mesa: use _mesa_RasterPos() when possible

2015-10-21 Thread Brian Paul

The st_RasterPos() function goes to great pains to implement the
rasterpos transformation.  It basically uses gallium's draw module to
execute the vertex shader to draw a point, then capture that point's
attributes.

But glRasterPos isn't typically used with a vertex shader so we can
usually use the old/fixed-function implementation which is a lot simpler
and faster.

This can add up for legacy apps that make a lot of calls to glRasterPos.
---
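The decision described above can be sketched in isolation as follows. This is only an illustration under simplifying assumptions: `struct fake_context`, `enum rastpos_path` and `choose_rastpos_path()` are hypothetical names, and the two pointers merely stand in for the vertex program fields tested in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down context: only the two pointers the check needs. */
struct fake_context {
   const void *current_vp;   /* stands in for ctx->VertexProgram._Current */
   const void *tnl_vp;       /* stands in for ctx->VertexProgram._TnlProgram */
};

enum rastpos_path {
   RASTPOS_FIXED_FUNCTION,   /* simple/fast path */
   RASTPOS_DRAW_MODULE       /* run the vertex shader through the draw module */
};

/* Pick the fast path when no user vertex shader/program is active. */
static enum rastpos_path
choose_rastpos_path(const struct fake_context *ctx)
{
   assert(ctx != NULL);
   if (ctx->current_vp == NULL || ctx->current_vp == ctx->tnl_vp)
      return RASTPOS_FIXED_FUNCTION;
   return RASTPOS_DRAW_MODULE;
}
```

Only a context whose current program is neither NULL nor the fixed-function TNL program takes the slower draw-module path.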
 src/mesa/state_tracker/st_cb_rasterpos.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/mesa/state_tracker/st_cb_rasterpos.c b/src/mesa/state_tracker/st_cb_rasterpos.c
index b9997da..747b414 100644
--- a/src/mesa/state_tracker/st_cb_rasterpos.c
+++ b/src/mesa/state_tracker/st_cb_rasterpos.c
@@ -39,6 +39,7 @@
 #include "main/imports.h"
 #include "main/macros.h"
 #include "main/feedback.h"
+#include "main/rastpos.h"
 
 #include "st_context.h"
 #include "st_atom.h"
@@ -224,6 +225,15 @@ st_RasterPos(struct gl_context *ctx, const GLfloat v[4])
struct rastpos_stage *rs;
const struct gl_client_array **saved_arrays = ctx->Array._DrawArrays;
 
+   if (ctx->VertexProgram._Current == NULL ||
+   ctx->VertexProgram._Current == ctx->VertexProgram._TnlProgram) {
+  /* No vertex shader/program is enabled, use the simple/fast fixed-
+   * function implementation of RasterPos.
+   */
+  _mesa_RasterPos(ctx, v);
+  return;
+   }
+
if (st->rastpos_stage) {
   /* get rastpos stage info */
   rs = rastpos_stage(st->rastpos_stage);
-- 
1.9.1


[Mesa-dev] [PATCH] vbo: optimize vertex copying when 'wrapping'

2015-10-21 Thread Brian Paul

Instead of calling memcpy() 'n' times, we can do it all at once since
the source and dest regions are all contiguous.
---
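A minimal standalone sketch of the equivalence, assuming plain `float` components rather than Mesa's `fi_type`; the two function names are hypothetical and exist only for this comparison.

```c
#include <assert.h>
#include <string.h>

/* Copy nr vertices of vertex_size floats each, one memcpy per vertex. */
static void
copy_per_vertex(float *dst, const float *src, unsigned nr, unsigned vertex_size)
{
   unsigned i;
   for (i = 0; i < nr; i++) {
      memcpy(dst, src, vertex_size * sizeof(float));
      dst += vertex_size;
      src += vertex_size;
   }
}

/* Same result with a single memcpy, valid because both regions are
 * contiguous and do not overlap. */
static void
copy_all_at_once(float *dst, const float *src, unsigned nr, unsigned vertex_size)
{
   memcpy(dst, src, (size_t)nr * vertex_size * sizeof(float));
}
```

Both functions leave the destination with identical contents; the second simply avoids the per-vertex call overhead and pointer bookkeeping.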
 src/mesa/vbo/vbo_exec_api.c | 16 +++-
 src/mesa/vbo/vbo_save_api.c | 15 +++
 2 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c
index a23d5aa..d70fc3b 100644
--- a/src/mesa/vbo/vbo_exec_api.c
+++ b/src/mesa/vbo/vbo_exec_api.c
@@ -132,8 +132,7 @@ static void vbo_exec_wrap_buffers( struct vbo_exec_context *exec )
 static void
 vbo_exec_vtx_wrap(struct vbo_exec_context *exec)
 {
-   fi_type *data = exec->vtx.copied.buffer;
-   GLuint i;
+   GLuint numComponents;
 
/* Run pipeline on current vertices, copy wrapped vertices
 * to exec->vtx.copied.
@@ -149,13 +148,12 @@ vbo_exec_vtx_wrap(struct vbo_exec_context *exec)
 */
assert(exec->vtx.max_vert - exec->vtx.vert_count > exec->vtx.copied.nr);
 
-   for (i = 0 ; i < exec->vtx.copied.nr ; i++) {
-  memcpy( exec->vtx.buffer_ptr, data, 
- exec->vtx.vertex_size * sizeof(GLfloat));
-  exec->vtx.buffer_ptr += exec->vtx.vertex_size;
-  data += exec->vtx.vertex_size;
-  exec->vtx.vert_count++;
-   }
+   numComponents = exec->vtx.copied.nr * exec->vtx.vertex_size;
+   memcpy(exec->vtx.buffer_ptr,
+  exec->vtx.copied.buffer,
+  numComponents * sizeof(fi_type));
+   exec->vtx.buffer_ptr += numComponents;
+   exec->vtx.vert_count += exec->vtx.copied.nr;
 
exec->vtx.copied.nr = 0;
 }
diff --git a/src/mesa/vbo/vbo_save_api.c b/src/mesa/vbo/vbo_save_api.c
index d49aa15..d5570c7 100644
--- a/src/mesa/vbo/vbo_save_api.c
+++ b/src/mesa/vbo/vbo_save_api.c
@@ -601,8 +601,7 @@ static void
 _save_wrap_filled_vertex(struct gl_context *ctx)
 {
struct vbo_save_context *save = &vbo_context(ctx)->save;
-   fi_type *data = save->copied.buffer;
-   GLuint i;
+   GLuint numComponents;
 
/* Emit a glEnd to close off the last vertex list.
 */
@@ -612,12 +611,12 @@ _save_wrap_filled_vertex(struct gl_context *ctx)
 */
assert(save->max_vert - save->vert_count > save->copied.nr);
 
-   for (i = 0; i < save->copied.nr; i++) {
-  memcpy(save->buffer_ptr, data, save->vertex_size * sizeof(GLfloat));
-  data += save->vertex_size;
-  save->buffer_ptr += save->vertex_size;
-  save->vert_count++;
-   }
+   numComponents = save->copied.nr * save->vertex_size;
+   memcpy(save->buffer_ptr,
+  save->copied.buffer,
+  numComponents * sizeof(fi_type));
+   save->buffer_ptr += numComponents;
+   save->vert_count += save->copied.nr;
 }
 
 
-- 
1.9.1


[Mesa-dev] [PATCH] mesa: check for unchanged line width before error checking

2015-10-21 Thread Brian Paul

---
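The reasoning behind the reordering can be sketched with a tiny state setter, assuming the stored width was validated when it was set. `struct fake_line_state` and `set_line_width()` are hypothetical and much simpler than the real `_mesa_LineWidth()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the line state; not the real gl_context. */
struct fake_line_state {
   float width;        /* current, already validated width */
   int error_raised;   /* set when an invalid value is rejected */
   int dirty;          /* set when state actually changes */
};

static void
set_line_width(struct fake_line_state *line, float width)
{
   assert(line != NULL);

   /* The stored width passed validation earlier, so an equal value
    * cannot be an error and there is nothing to update. */
   if (line->width == width)
      return;

   if (width <= 0.0f) {
      line->error_raised = 1;
      return;
   }

   line->dirty = 1;
   line->width = width;
}
```

Redundant calls with the current width return before any validation work, while invalid or new values still go through the usual checks.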
 src/mesa/main/lines.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/lines.c b/src/mesa/main/lines.c
index c020fb3..93b80af 100644
--- a/src/mesa/main/lines.c
+++ b/src/mesa/main/lines.c
@@ -45,6 +45,10 @@ _mesa_LineWidth( GLfloat width )
if (MESA_VERBOSE & VERBOSE_API)
   _mesa_debug(ctx, "glLineWidth %f\n", width);
 
+   /* If width is unchanged, there can't be an error */
+   if (ctx->Line.Width == width)
+  return;
+
if (width <= 0.0F) {
   _mesa_error( ctx, GL_INVALID_VALUE, "glLineWidth" );
   return;
@@ -68,9 +72,6 @@ _mesa_LineWidth( GLfloat width )
   return;
}
 
-   if (ctx->Line.Width == width)
-  return;
-
FLUSH_VERTICES(ctx, _NEW_LINE);
ctx->Line.Width = width;
 
-- 
1.9.1


[Mesa-dev] [PATCH 2/9] i965: Fill out instruction list.

2015-10-21 Thread Matt Turner

Add some instructions: illegal, movi, sends, sendsc.

Remove some instructions with reused opcodes: msave, mrestore, push,
pop, goto. I did have some gross code for disassembling opcodes
per-generation, but there's very little meaningful overlap so it's
probably not needed.
---
 src/mesa/drivers/dri/i965/brw_defines.h  | 37 ++--
 src/mesa/drivers/dri/i965/brw_disasm.c   | 16 --
 src/mesa/drivers/dri/i965/brw_shader.cpp |  2 +-
 3 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 393f17a..26fc0af 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -838,43 +838,62 @@ enum PACKED brw_horizontal_stride {
 
 enum opcode {
/* These are the actual hardware opcodes. */
+   BRW_OPCODE_ILLEGAL = 0,
BRW_OPCODE_MOV =1,
BRW_OPCODE_SEL =2,
+   BRW_OPCODE_MOVI =   3,   /**< G45+ */
BRW_OPCODE_NOT =4,
BRW_OPCODE_AND =5,
BRW_OPCODE_OR = 6,
BRW_OPCODE_XOR =7,
BRW_OPCODE_SHR =8,
BRW_OPCODE_SHL =9,
+   // BRW_OPCODE_DIM = 10,  /**< Gen7.5 only */ /* Reused */
+   // BRW_OPCODE_SMOV =10,  /**< Gen8+   */ /* Reused */
+   /* Reserved - 11 */
BRW_OPCODE_ASR =12,
+   /* Reserved - 13-15 */
BRW_OPCODE_CMP =16,
BRW_OPCODE_CMPN =   17,
BRW_OPCODE_CSEL =   18,  /**< Gen8+ */
BRW_OPCODE_F32TO16 = 19,  /**< Gen7 only */
BRW_OPCODE_F16TO32 = 20,  /**< Gen7 only */
+   /* Reserved - 21-22 */
BRW_OPCODE_BFREV =  23,  /**< Gen7+ */
BRW_OPCODE_BFE =24,  /**< Gen7+ */
BRW_OPCODE_BFI1 =   25,  /**< Gen7+ */
BRW_OPCODE_BFI2 =   26,  /**< Gen7+ */
+   /* Reserved - 27-31 */
BRW_OPCODE_JMPI =   32,
+   // BRW_OPCODE_BRD = 33,  /**< Gen7+ */
BRW_OPCODE_IF = 34,
-   BRW_OPCODE_IFF =35,  /**< Pre-Gen6 */
+   BRW_OPCODE_IFF =35,  /**< Pre-Gen6*/ /* Reused */
+   // BRW_OPCODE_BRC = 35,  /**< Gen7+   */ /* Reused */
BRW_OPCODE_ELSE =   36,
BRW_OPCODE_ENDIF =  37,
-   BRW_OPCODE_DO = 38,
+   BRW_OPCODE_DO = 38,  /**< Pre-Gen6*/ /* Reused */
+   // BRW_OPCODE_CASE =38,  /**< Gen6 only   */ /* Reused */
BRW_OPCODE_WHILE =  39,
BRW_OPCODE_BREAK =  40,
BRW_OPCODE_CONTINUE = 41,
BRW_OPCODE_HALT =   42,
-   BRW_OPCODE_MSAVE =  44,  /**< Pre-Gen6 */
-   BRW_OPCODE_MRESTORE = 45, /**< Pre-Gen6 */
-   BRW_OPCODE_PUSH =   46,  /**< Pre-Gen6 */
-   BRW_OPCODE_GOTO =   46,  /**< Gen8+*/
-   BRW_OPCODE_POP =47,  /**< Pre-Gen6 */
+   // BRW_OPCODE_CALLA =   43,  /**< Gen7.5+ */
+   // BRW_OPCODE_MSAVE =   44,  /**< Pre-Gen6*/ /* Reused */
+   // BRW_OPCODE_CALL =44,  /**< Gen6+   */ /* Reused */
+   // BRW_OPCODE_MREST =   45,  /**< Pre-Gen6*/ /* Reused */
+   // BRW_OPCODE_RET = 45,  /**< Gen6+   */ /* Reused */
+   // BRW_OPCODE_PUSH =46,  /**< Pre-Gen6*/ /* Reused */
+   // BRW_OPCODE_FORK =46,  /**< Gen6 only   */ /* Reused */
+   // BRW_OPCODE_GOTO =46,  /**< Gen8+   */ /* Reused */
+   // BRW_OPCODE_POP = 47,  /**< Pre-Gen6*/
BRW_OPCODE_WAIT =   48,
BRW_OPCODE_SEND =   49,
BRW_OPCODE_SENDC =  50,
+   BRW_OPCODE_SENDS =  51,  /**< Gen9+ */
+   BRW_OPCODE_SENDSC = 52,  /**< Gen9+ */
+   /* Reserved 53-55 */
BRW_OPCODE_MATH =   56,  /**< Gen6+ */
+   /* Reserved 57-63 */
BRW_OPCODE_ADD =64,
BRW_OPCODE_MUL =65,
BRW_OPCODE_AVG =66,
@@ -893,14 +912,18 @@ enum opcode {
BRW_OPCODE_SUBB =   79,  /**< Gen7+ */
BRW_OPCODE_SAD2 =   80,
BRW_OPCODE_SADA2 =  81,
+   /* Reserved 82-83 */
BRW_OPCODE_DP4 =84,
BRW_OPCODE_DPH =85,
BRW_OPCODE_DP3 =86,
BRW_OPCODE_DP2 =87,
+   /* Reserved 88 */
BRW_OPCODE_LINE =   89,
BRW_OPCODE_PLN =90,  /**< G45+ */
BRW_OPCODE_MAD =91,  /**< Gen6+ */
BRW_OPCODE_LRP =92,  /**< Gen6+ */
+   // BRW_OPCODE_MADM =93,  /**< Gen8+ */
+   /* Reserved 94-124 */
BRW_OPCODE_NENOP =  125, /**< G45 only */
BRW_OPCODE_NOP =126,
 
diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c
index db23a18..c2dac7c 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -34,6 +34,7 @@
 
 const struct opcode_desc opcode_descs[128] = {
[BRW_OPCODE_MOV]  = { .name = "mov", .nsrc = 1, .ndst = 1 },
+   [BRW_OPCODE_MOVI] = { .name = "movi",.nsrc = 2, .ndst = 1 },
[BRW_OPCODE_FRC]  = { .name = "frc", .nsrc = 1, .ndst = 1 },
[BRW_OPCODE_RNDU] = { .name = "rndu",.nsrc = 1, .ndst = 1 },
[BRW_OPCODE_RNDD] = { .name = "rndd",.nsrc = 1, .ndst = 1 },
@@ -83,6 +84,9 @@ const struct opcode_desc opcode_descs[128] = {
 
[BRW_OPCODE_SEND] = { .name = "send",.nsrc = 1, .ndst = 1 },
[BRW_OPCODE_SENDC]= { .name = "sendc",   .nsrc =

[Mesa-dev] [PATCH 7/9] i965: Add initial assembly validation pass.

2015-10-21 Thread Matt Turner

Initially just checks that sources are non-NULL, which would have
alerted us to the problem fixed by commit 6c846dc5.
---
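Since the validator builds its error text by repeated concatenation, here is a minimal sketch of a length-tracking append helper. It assumes a simplified interface: `string_append()` and its error return are hypothetical and differ from the `cat()`/`CAT()` pair in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length-tracking string, similar in spirit to the one in the patch. */
struct string {
   char *str;
   size_t len;
};

/* Append src to dest, growing the buffer and keeping it NUL-terminated.
 * Returns 0 on success, -1 if realloc fails (dest is left untouched). */
static int
string_append(struct string *dest, const char *src)
{
   size_t src_len = strlen(src);
   char *grown = realloc(dest->str, dest->len + src_len + 1);

   if (grown == NULL)
      return -1;

   memcpy(grown + dest->len, src, src_len);
   grown[dest->len + src_len] = '\0';   /* last byte of the allocation */
   dest->str = grown;
   dest->len += src_len;
   return 0;
}
```

Tracking the length avoids rescanning the accumulated message on every append; the caller starts from `{ NULL, 0 }` and frees `str` when done.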
 src/mesa/drivers/dri/i965/Makefile.sources   |   1 +
 src/mesa/drivers/dri/i965/brw_eu.h   |   4 +
 src/mesa/drivers/dri/i965/brw_eu_validate.c  | 150 +++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |   8 ++
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |   8 ++
 5 files changed, 171 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_eu_validate.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
index c2438bd..7cd9cc0 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -14,6 +14,7 @@ i965_compiler_FILES = \
brw_eu_emit.c \
brw_eu.h \
brw_eu_util.c \
+   brw_eu_validate.c \
brw_fs_builder.h \
brw_fs_channel_expressions.cpp \
brw_fs_cmod_propagation.cpp \
diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
index 1345db7..829e393 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -522,6 +522,10 @@ bool brw_try_compact_instruction(const struct brw_device_info *devinfo,
 void brw_debug_compact_uncompact(const struct brw_device_info *devinfo,
  brw_inst *orig, brw_inst *uncompacted);
 
+/* brw_eu_validate.c */
+bool brw_validate_instructions(const struct brw_codegen *p, int start_offset,
+   struct annotation_info *annotation);
+
 static inline int
 next_offset(const struct brw_device_info *devinfo, void *store, int offset)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_eu_validate.c b/src/mesa/drivers/dri/i965/brw_eu_validate.c
new file mode 100644
index 000..85d4c19
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_eu_validate.c
@@ -0,0 +1,150 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/** @file brw_eu_validate.c
+ *
+ * This file implements a pass that validates shader assembly.
+ */
+
+#include "brw_eu.h"
+
+/* We're going to do lots of string concatenation, so this should help. */
+struct string {
+   char *str;
+   size_t len;
+};
+
+static void
+cat(struct string *dest, const struct string src)
+{
+   dest->str = realloc(dest->str, dest->len + src.len + 1);
+   memcpy(dest->str + dest->len, src.str, src.len);
+   dest->str[dest->len + src.len] = '\0';
+   dest->len = dest->len + src.len;
+}
+#define CAT(dest, src) cat(&dest, (struct string){src, strlen(src)})
+
+#define error(str) "\tERROR: " str "\n"
+
+#define ERROR_IF(cond, msg)  \
+   do {  \
+  if (cond) {\
+ CAT(error_msg, error(msg)); \
+ valid = false;  \
+  }  \
+   } while(0)
+
+static bool
+src0_is_null(const struct brw_device_info *devinfo, const brw_inst *inst)
+{
+   return brw_inst_src0_reg_file(devinfo, inst) == BRW_ARCHITECTURE_REGISTER_FILE &&
+  brw_inst_src0_da_reg_nr(devinfo, inst) == BRW_ARF_NULL;
+}
+
+static bool
+src1_is_null(const struct brw_device_info *devinfo, const brw_inst *inst)
+{
+   return brw_inst_src1_reg_file(devinfo, inst) == BRW_ARCHITECTURE_REGISTER_FILE &&
+  brw_inst_src1_da_reg_nr(devinfo, inst) == BRW_ARF_NULL;
+}
+
+static unsigned
+num_sources_from_inst(const struct brw_device_info *devinfo,
+  const brw_inst *inst)
+{
+   unsigned math_function;
+
+   if (brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MATH) {
+  math_function = brw_inst_math_function(devinfo, inst);
+   } else if (devinfo->gen < 6 &&
+  brw_inst_opcode(devinfo, inst) == BRW_OPCODE_SEND) {
+  if (brw_inst_sfid(devinfo, inst) == BRW_SFID_MATH) {
+ math_function =
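
The cat() helper in this patch appends one length-tracked string to another, growing the destination with realloc(). A minimal standalone sketch of that idea follows. The names `struct lstring` and `lstring_cat` are hypothetical and not part of Mesa, and the allocation-failure check is an addition that the patch does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-tracked string; not a Mesa type. */
struct lstring {
   char *str;    /* NUL-terminated buffer, or NULL while empty */
   size_t len;   /* length excluding the terminator */
};

/* Append n bytes of src to dest. Returns false and leaves dest untouched
 * if the allocation fails.
 */
static bool
lstring_cat(struct lstring *dest, const char *src, size_t n)
{
   assert(dest != NULL && src != NULL);

   /* Room for the old contents, the new bytes and one terminator. */
   char *grown = realloc(dest->str, dest->len + n + 1);
   if (grown == NULL)
      return false;

   memcpy(grown + dest->len, src, n);
   grown[dest->len + n] = '\0';   /* last valid index of the allocation */
   dest->str = grown;
   dest->len += n;
   return true;
}
```

Keeping the length next to the pointer avoids calling strlen() on the accumulated message for every append, which is presumably the point of the "lots of string concatenation" comment.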

[Mesa-dev] [PATCH 9/9] i965: Check accumulator restrictions.

2015-10-21 Thread Matt Turner

---
 src/mesa/drivers/dri/i965/brw_eu_validate.c | 244 
 1 file changed, 244 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_validate.c b/src/mesa/drivers/dri/i965/brw_eu_validate.c
index eb57962..3d16f90 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_validate.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_validate.c
@@ -54,6 +54,16 @@ cat(struct string *dest, const struct string src)
   }  \
} while(0)
 
+#define CHECK(func) \
+   do { \
+  struct string __msg = func; \
+  if (__msg.str) { \
+ cat(&error_msg, __msg); \
+ free(__msg.str); \
+ valid = false; \
+  } \
+   } while (0)
+
 static bool
 src0_is_null(const struct brw_device_info *devinfo, const brw_inst *inst)
 {
@@ -68,6 +78,42 @@ src1_is_null(const struct brw_device_info *devinfo, const brw_inst *inst)
   brw_inst_src1_da_reg_nr(devinfo, inst) == BRW_ARF_NULL;
 }
 
+static bool
+dst_is_accumulator(const struct brw_device_info *devinfo, const brw_inst *inst)
+{
+   return brw_inst_dst_reg_file(devinfo, inst) == BRW_ARCHITECTURE_REGISTER_FILE &&
+  brw_inst_dst_da_reg_nr(devinfo, inst) == BRW_ARF_ACCUMULATOR;
+}
+
+static bool
+src0_is_accumulator(const struct brw_device_info *devinfo, const brw_inst *inst)
+{
+   return brw_inst_src0_reg_file(devinfo, inst) == BRW_ARCHITECTURE_REGISTER_FILE &&
+  brw_inst_src0_da_reg_nr(devinfo, inst) == BRW_ARF_ACCUMULATOR;
+}
+
+static bool
+src1_is_accumulator(const struct brw_device_info *devinfo, const brw_inst *inst)
+{
+   return brw_inst_src1_reg_file(devinfo, inst) == BRW_ARCHITECTURE_REGISTER_FILE &&
+  brw_inst_src1_da_reg_nr(devinfo, inst) == BRW_ARF_ACCUMULATOR;
+}
+
+static bool
+is_integer(enum brw_reg_type type)
+{
+   return type == BRW_REGISTER_TYPE_UD ||
+  type == BRW_REGISTER_TYPE_D ||
+  type == BRW_REGISTER_TYPE_UW ||
+  type == BRW_REGISTER_TYPE_W ||
+  type == BRW_REGISTER_TYPE_UB ||
+  type == BRW_REGISTER_TYPE_B ||
+  type == BRW_REGISTER_TYPE_V ||
+  type == BRW_REGISTER_TYPE_UV ||
+  type == BRW_REGISTER_TYPE_UQ ||
+  type == BRW_REGISTER_TYPE_Q;
+}
+
 enum gen {
GEN4  = (1 << 0),
GEN45 = (1 << 1),
@@ -83,40 +129,66 @@ enum gen {
 #define GEN_GE(gen) (~((gen) - 1) | gen)
 #define GEN_LE(gen) (((gen) - 1) | gen)
 
+enum acc {
+   ACC_NO_RESTRICTIONS = 0,
+   ACC_GEN_DEPENDENT = (1 << 0),
+   ACC_NO_EXPLICIT_SOURCE = (1 << 1),
+   ACC_NO_EXPLICIT_DESTINATION = (1 << 2),
+   ACC_NO_IMPLICIT_DESTINATION = (1 << 3),
+   ACC_NO_DESTINATION = ACC_NO_EXPLICIT_DESTINATION |
+ACC_NO_IMPLICIT_DESTINATION,
+   ACC_NO_ACCESS = ACC_NO_EXPLICIT_SOURCE |
+   ACC_NO_DESTINATION,
+   ACC_NO_SOURCE_MODIFIER = (1 << 4),
+   ACC_NO_INTEGER_SOURCE = (1 << 5),
+   ACC_IMPLICIT_WRITE_REQUIRED = (1 << 6),
+   ACC_NOT_BOTH_SOURCE_AND_DESTINATION = (1 << 7),
+};
+
 struct inst_info {
enum gen gen;
+   enum acc acc;
 };
 
 static const struct inst_info inst_info[128] = {
[BRW_OPCODE_ILLEGAL] = {
   .gen = GEN_ALL,
+  .acc = ACC_NO_ACCESS,
},
[BRW_OPCODE_MOV] = {
   .gen = GEN_ALL,
+  .acc = ACC_NOT_BOTH_SOURCE_AND_DESTINATION,
},
[BRW_OPCODE_SEL] = {
   .gen = GEN_ALL,
+  .acc = ACC_GEN_DEPENDENT,
},
[BRW_OPCODE_MOVI] = {
   .gen = GEN_GE(GEN45),
+  .acc = ACC_NO_EXPLICIT_SOURCE,
},
[BRW_OPCODE_NOT] = {
   .gen = GEN_ALL,
+  .acc = ACC_NO_SOURCE_MODIFIER,
},
[BRW_OPCODE_AND] = {
   .gen = GEN_ALL,
+  .acc = ACC_NO_SOURCE_MODIFIER,
},
[BRW_OPCODE_OR] = {
   .gen = GEN_ALL,
+  .acc = ACC_NO_SOURCE_MODIFIER,
},
[BRW_OPCODE_XOR] = {
   .gen = GEN_ALL,
+  .acc = ACC_NO_SOURCE_MODIFIER,
},
[BRW_OPCODE_SHR] = {
   .gen = GEN_ALL,
},
[BRW_OPCODE_SHL] = {
   .gen = GEN_ALL,
+  .acc = ACC_NO_DESTINATION,
},
/* BRW_OPCODE_DIM / BRW_OPCODE_SMOV */
/* Reserved - 11 */
@@ -126,63 +198,81 @@ static const struct inst_info inst_info[128] = {
/* Reserved - 13-15 */
[BRW_OPCODE_CMP] = {
   .gen = GEN_ALL,
+  .acc = ACC_GEN_DEPENDENT,
},
[BRW_OPCODE_CMPN] = {
   .gen = GEN_ALL,
+  .acc = ACC_GEN_DEPENDENT,
},
[BRW_OPCODE_CSEL] = {
   .gen = GEN_GE(GEN8),
},
[BRW_OPCODE_F32TO16] = {
   .gen = GEN7 | GEN75,
+  .acc = ACC_NO_ACCESS,
},
[BRW_OPCODE_F16TO32] = {
   .gen = GEN7 | GEN75,
+  .acc = ACC_NO_ACCESS,
},
/* Reserved - 21-22 */
[BRW_OPCODE_BFREV] = {
   .gen = GEN_GE(GEN7),
+  .acc = ACC_NO_ACCESS,
},
[BRW_OPCODE_BFE] = {
   .gen = GEN_GE(GEN7),
+  .acc = ACC_NO_IMPLICIT_DESTINATION,
},
[BRW_OPCODE_BFI1] = {
   .gen = GEN_GE(GEN7),
+  .acc = ACC_NO_ACCESS,
},
[BRW_OPCODE_BFI2] = {
   .gen = GEN_GE(GEN7),
+  .acc = ACC_NO_IMPLICIT_DESTINATION,
},
/* Reserved -
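
The `enum acc` values in this patch are bit flags, with ACC_NO_DESTINATION and ACC_NO_ACCESS built by OR-ing the simpler restrictions together. The archived message is cut off before the code that consumes them, so the following is only a sketch of how such a mask could be tested. `struct acc_use`, `acc_violation` and the message strings are hypothetical, and only a subset of the flags is reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Subset of the restriction flags, with the same values as in the patch. */
enum acc {
   ACC_NO_RESTRICTIONS         = 0,
   ACC_NO_EXPLICIT_SOURCE      = (1 << 1),
   ACC_NO_EXPLICIT_DESTINATION = (1 << 2),
   ACC_NO_IMPLICIT_DESTINATION = (1 << 3),
   ACC_NO_DESTINATION          = ACC_NO_EXPLICIT_DESTINATION |
                                 ACC_NO_IMPLICIT_DESTINATION,
   ACC_NO_ACCESS               = ACC_NO_EXPLICIT_SOURCE |
                                 ACC_NO_DESTINATION,
};

/* Hypothetical summary of how one instruction touches the accumulator. */
struct acc_use {
   bool explicit_src;
   bool explicit_dst;
   bool implicit_dst;
};

/* Return a message for the first violated restriction, or NULL if the
 * instruction respects every restriction in the mask.
 */
static const char *
acc_violation(unsigned restrictions, const struct acc_use *use)
{
   assert(use != NULL);

   if ((restrictions & ACC_NO_EXPLICIT_SOURCE) && use->explicit_src)
      return "accumulator not allowed as an explicit source";
   if ((restrictions & ACC_NO_EXPLICIT_DESTINATION) && use->explicit_dst)
      return "accumulator not allowed as an explicit destination";
   if ((restrictions & ACC_NO_IMPLICIT_DESTINATION) && use->implicit_dst)
      return "implicit accumulator write not allowed";
   return NULL;
}
```

Because the composite flags are plain ORs, a single table entry such as ACC_NO_ACCESS trips all three checks without any special casing.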

[Mesa-dev] [PATCH 6/9] i965: Add annotation_insert_error() and support for printing errors.

2015-10-21 Thread Matt Turner

Will allow annotations to contain error messages (indicating an
instruction violates a rule for instance) that are printed after the
disassembly of the block.
---
 src/mesa/drivers/dri/i965/intel_asm_annotation.c | 60 
 src/mesa/drivers/dri/i965/intel_asm_annotation.h |  7 +++
 2 files changed, 67 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_asm_annotation.c b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
index 58830db..eaee386 100644
--- a/src/mesa/drivers/dri/i965/intel_asm_annotation.c
+++ b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
@@ -69,6 +69,10 @@ dump_assembly(void *assembly, int num_annotations, struct annotation *annotation
 
   brw_disassemble(devinfo, assembly, start_offset, end_offset, stderr);
 
+  if (annotation[i].error) {
+ fputs(annotation[i].error, stderr);
+  }
+
   if (annotation[i].block_end) {
  fprintf(stderr, "   END B%d", annotation[i].block_end->num);
  foreach_list_typed(struct bblock_link, successor_link, link,
@@ -152,3 +156,59 @@ annotation_finalize(struct annotation_info *annotation,
}
annotation->ann[annotation->ann_count].offset = next_inst_offset;
 }
+
+void
+annotation_insert_error(struct annotation_info *annotation, unsigned offset,
+const char *error)
+{
+   struct annotation *ann = NULL;
+
+   if (!annotation->ann_count)
+  return;
+
+   /* We may have to split an annotation, so ensure we have enough space
+* allocated for that case up front.
+*/
+   if (annotation->ann_size <= annotation->ann_count) {
+  int old_size = annotation->ann_size;
+  annotation->ann_size = MAX2(1024, annotation->ann_size * 2);
+  annotation->ann = reralloc(annotation->mem_ctx, annotation->ann,
+ struct annotation, annotation->ann_size);
+  if (!annotation->ann)
+ return;
+
+  memset(annotation->ann + old_size, 0,
+ (annotation->ann_size - old_size) * sizeof(struct annotation));
+   }
+
+   for (int i = 0; i <= annotation->ann_count; i++) {
+  if (annotation->ann[i].offset <= offset)
+ continue;
+
+  struct annotation *cur = &annotation->ann[i - 1];
+  struct annotation *next = &annotation->ann[i];
+  ann = cur;
+
+  if (offset + sizeof(brw_inst) != next->offset) {
+ memmove(next, cur,
+ (annotation->ann_count - i + 2) * sizeof(struct annotation));
+ cur->error = NULL;
+ cur->error_length = 0;
+ cur->block_end = NULL;
+ next->offset = offset + sizeof(brw_inst);
+ next->block_start = NULL;
+ annotation->ann_count++;
+  }
+  break;
+   }
+
+   assume(ann != NULL);
+
+   ralloc_asprintf_rewrite_tail(&ann->error, &ann->error_length, error);
+
+   /* FIXME: ralloc_vasprintf_rewrite_tail() allocates memory out of the
+* null context. We have to reparent it if we want it to be freed
+* with the rest of the annotation context.
+*/
+   ralloc_steal(annotation->mem_ctx, ann->error);
+}
diff --git a/src/mesa/drivers/dri/i965/intel_asm_annotation.h b/src/mesa/drivers/dri/i965/intel_asm_annotation.h
index 6c72326..662a4b4 100644
--- a/src/mesa/drivers/dri/i965/intel_asm_annotation.h
+++ b/src/mesa/drivers/dri/i965/intel_asm_annotation.h
@@ -37,6 +37,9 @@ struct cfg_t;
 struct annotation {
int offset;
 
+   size_t error_length;
+   char *error;
+
/* Pointers to the basic block in the CFG if the instruction group starts
 * or ends a basic block.
 */
@@ -69,6 +72,10 @@ annotate(const struct brw_device_info *devinfo,
 void
 annotation_finalize(struct annotation_info *annotation, unsigned offset);
 
+void
+annotation_insert_error(struct annotation_info *annotation, unsigned offset,
+const char *error);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
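
The splitting step in annotation_insert_error() is easier to follow on a reduced data structure. The sketch below is an assumption-laden simplification, not the Mesa code: `struct group`, `struct group_list` and `insert_error` are hypothetical, the array is fixed-size instead of ralloc'ed, INST_SIZE stands in for sizeof(brw_inst), and each group keeps only one message where the patch appends to a string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define INST_SIZE  16   /* stand-in for sizeof(brw_inst); an assumption */
#define MAX_GROUPS 8

/* Hypothetical, pared-down annotation: a run of instructions starting at
 * `offset`, with at most one message printed after the run.
 */
struct group {
   int offset;
   const char *error;
};

struct group_list {
   struct group g[MAX_GROUPS + 1];   /* g[count] is an end sentinel */
   int count;                        /* number of real groups */
};

/* Attach `error` to the instruction at `offset`, splitting its group if
 * needed so that the instruction is the last one in the group. Returns
 * false if the offset is out of range or the list is full.
 */
static bool
insert_error(struct group_list *l, int offset, const char *error)
{
   assert(l != NULL && l->count >= 1 && l->count <= MAX_GROUPS);

   if (offset < l->g[0].offset)
      return false;

   for (int i = 1; i <= l->count; i++) {
      if (l->g[i].offset <= offset)
         continue;

      struct group *cur = &l->g[i - 1];

      if (offset + INST_SIZE != l->g[i].offset) {
         if (l->count == MAX_GROUPS)
            return false;

         /* Shift g[i..count], sentinel included, up by one slot. */
         memmove(&l->g[i + 1], &l->g[i],
                 (size_t)(l->count - i + 1) * sizeof(struct group));

         /* The new group covers the instructions after `offset` and
          * inherits whatever message the old group already carried.
          */
         l->g[i].offset = offset + INST_SIZE;
         l->g[i].error = cur->error;
         l->count++;
      }

      cur->error = error;
      return true;
   }

   return false;   /* offset is at or past the end sentinel */
}
```

With groups starting at 0 and 64 and a sentinel at 128, reporting an error at offset 16 yields groups at 0, 32 and 64, so the message is printed directly after the offending instruction.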

[Mesa-dev] [PATCH 1/9] ralloc: Set *start in ralloc_vasprintf_rewrite_tail() if str is NULL.

2015-10-21 Thread Matt Turner

We were leaving it undefined, even though we were writing a string to
*str.
---
 src/util/ralloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/util/ralloc.c b/src/util/ralloc.c
index e07fce7..bb4cf96 100644
--- a/src/util/ralloc.c
+++ b/src/util/ralloc.c
@@ -499,6 +499,7 @@ ralloc_vasprintf_rewrite_tail(char **str, size_t *start, const char *fmt,
if (unlikely(*str == NULL)) {
   // Assuming a NULL context is probably bad, but it's expected behavior.
   *str = ralloc_vasprintf(NULL, fmt, args);
+  *start = strlen(*str);
   return true;
}
 
-- 
2.4.9
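
The contract this one-line fix restores is that *start always ends up equal to the length of *str, including on the first call when *str is NULL. A standalone sketch of that contract, using malloc-family calls instead of ralloc, might look as follows. `rewrite_tail` is a hypothetical name and this is not the ralloc implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format into *str starting at index *start, growing the buffer as
 * needed. On success *start is the new string length, even when *str was
 * NULL on entry and the incoming *start value was stale.
 */
static bool
rewrite_tail(char **str, size_t *start, const char *fmt, ...)
{
   assert(str != NULL && start != NULL && fmt != NULL);

   va_list args, measure;
   va_start(args, fmt);

   /* First pass only measures the formatted length. */
   va_copy(measure, args);
   int n = vsnprintf(NULL, 0, fmt, measure);
   va_end(measure);
   if (n < 0) {
      va_end(args);
      return false;
   }

   /* A NULL string is treated as empty, whatever *start holds. */
   size_t old = (*str == NULL) ? 0 : *start;
   char *grown = realloc(*str, old + (size_t)n + 1);
   if (grown == NULL) {
      va_end(args);
      return false;
   }

   vsnprintf(grown + old, (size_t)n + 1, fmt, args);
   va_end(args);

   *str = grown;
   *start = old + (size_t)n;
   return true;
}
```

A caller that appends repeatedly, as annotation_insert_error() does with its error and error_length fields, relies on the length being correct after the very first call.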


[Mesa-dev] [PATCH 5/9] i965: Combine assembly annotations if possible.

2015-10-21 Thread Matt Turner

Often annotations are identical between sets of consecutive
instructions. We can perhaps avoid some memory allocations by reusing
the previous annotation.
---
 src/mesa/drivers/dri/i965/intel_asm_annotation.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_asm_annotation.c b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
index f87a9bb..58830db 100644
--- a/src/mesa/drivers/dri/i965/intel_asm_annotation.c
+++ b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
@@ -112,6 +112,20 @@ void annotate(const struct brw_device_info *devinfo,
   ann->block_start = cfg->blocks[annotation->cur_block];
}
 
+   if (bblock_end(cfg->blocks[annotation->cur_block]) == inst) {
+  ann->block_end = cfg->blocks[annotation->cur_block];
+  annotation->cur_block++;
+   }
+
+   /* Merge this annotation with the previous if possible. */
+   struct annotation *prev = &annotation->ann[annotation->ann_count - 2];
+   if (ann->ir == prev->ir &&
+   ann->annotation == prev->annotation &&
+   ann->block_start == NULL) {
+  annotation->ann_count--;
+  return;
+   }
+
/* There is no hardware DO instruction on Gen6+, so since DO always
 * starts a basic block, we need to set the .block_start of the next
 * instruction's annotation with a pointer to the bblock started by
@@ -123,11 +137,6 @@ void annotate(const struct brw_device_info *devinfo,
if (devinfo->gen >= 6 && inst->opcode == BRW_OPCODE_DO) {
   annotation->ann_count--;
}
-
-   if (bblock_end(cfg->blocks[annotation->cur_block]) == inst) {
-  ann->block_end = cfg->blocks[annotation->cur_block];
-  annotation->cur_block++;
-   }
 }
 
 void
-- 
2.4.9
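
The merge rule described here (reuse the previous annotation when the IR pointer and annotation string match and no basic block starts) can be sketched on a small array. Everything below is hypothetical naming, and the sketch tests the candidate before storing it, which also avoids looking at an element before the start of the array when the list is still empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NOTES 32

/* Hypothetical, reduced annotation record. */
struct note {
   int offset;
   const void *ir;       /* IR node the instruction came from */
   const char *text;     /* free-form annotation string */
   bool starts_block;    /* true if a basic block begins here */
};

struct note_list {
   struct note n[MAX_NOTES];
   int count;
};

/* Store `cand` unless it carries the same information as the previous
 * entry. Returns true if a new entry was stored, false if it was merged.
 * Pointers are compared by identity, as in the patch.
 */
static bool
push_note(struct note_list *l, struct note cand)
{
   assert(l != NULL && l->count >= 0 && l->count < MAX_NOTES);

   if (l->count > 0) {
      const struct note *prev = &l->n[l->count - 1];
      if (cand.ir == prev->ir && cand.text == prev->text &&
          !cand.starts_block)
         return false;
   }

   l->n[l->count++] = cand;
   return true;
}
```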


[Mesa-dev] [PATCH 3/9] i965: Don't consider control flow instructions to have sources.

2015-10-21 Thread Matt Turner

And why did IFF have a destination?

I suspect that once upon a time the disassembler used this information
to know which fields to find the jump targets in. The jump targets have
moved, so the disassembler has to know how to handle these
per-generation anyway.
---
 src/mesa/drivers/dri/i965/brw_disasm.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c
index c2dac7c..29056ed 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -90,20 +90,20 @@ const struct opcode_desc opcode_descs[128] = {
[BRW_OPCODE_NOP]  = { .name = "nop", .nsrc = 0, .ndst = 0 },
[BRW_OPCODE_NENOP]= { .name = "nenop",   .nsrc = 0, .ndst = 0 },
[BRW_OPCODE_JMPI] = { .name = "jmpi",.nsrc = 0, .ndst = 0 },
-   [BRW_OPCODE_IF]   = { .name = "if",  .nsrc = 2, .ndst = 0 },
-   [BRW_OPCODE_IFF]  = { .name = "iff", .nsrc = 2, .ndst = 1 },
-   [BRW_OPCODE_WHILE]= { .name = "while",   .nsrc = 2, .ndst = 0 },
-   [BRW_OPCODE_ELSE] = { .name = "else",.nsrc = 2, .ndst = 0 },
-   [BRW_OPCODE_BREAK]= { .name = "break",   .nsrc = 2, .ndst = 0 },
-   [BRW_OPCODE_CONTINUE] = { .name = "cont",.nsrc = 1, .ndst = 0 },
-   [BRW_OPCODE_HALT] = { .name = "halt",.nsrc = 1, .ndst = 0 },
+   [BRW_OPCODE_IF]   = { .name = "if",  .nsrc = 0, .ndst = 0 },
+   [BRW_OPCODE_IFF]  = { .name = "iff", .nsrc = 0, .ndst = 0 },
+   [BRW_OPCODE_WHILE]= { .name = "while",   .nsrc = 0, .ndst = 0 },
+   [BRW_OPCODE_ELSE] = { .name = "else",.nsrc = 0, .ndst = 0 },
+   [BRW_OPCODE_BREAK]= { .name = "break",   .nsrc = 0, .ndst = 0 },
+   [BRW_OPCODE_CONTINUE] = { .name = "cont",.nsrc = 0, .ndst = 0 },
+   [BRW_OPCODE_HALT] = { .name = "halt",.nsrc = 0, .ndst = 0 },
// [BRW_OPCODE_MSAVE]= { .name = "msave",   .nsrc = 1, .ndst = 1 },
// [BRW_OPCODE_PUSH] = { .name = "push",.nsrc = 1, .ndst = 1 },
// [BRW_OPCODE_MREST]= { .name = "mrest",   .nsrc = 1, .ndst = 1 },
// [BRW_OPCODE_POP]  = { .name = "pop", .nsrc = 2, .ndst = 0 },
[BRW_OPCODE_WAIT] = { .name = "wait",.nsrc = 1, .ndst = 0 },
[BRW_OPCODE_DO]   = { .name = "do",  .nsrc = 0, .ndst = 0 },
-   [BRW_OPCODE_ENDIF]= { .name = "endif",   .nsrc = 2, .ndst = 0 },
+   [BRW_OPCODE_ENDIF]= { .name = "endif",   .nsrc = 0, .ndst = 0 },
 };
 
 static bool
-- 
2.4.9


[Mesa-dev] [PATCH 0/9] i965 assembly validator

2015-10-21 Thread Matt Turner

Inspired by a bug this summer, I've written a basic assembly validation
pass. The series currently checks only three things:

   - that instruction sources are not null (when they shouldn't be);
   - that the Gen supports the instruction opcode; and
   - that the various accumulator restrictions are satisfied.

To do this, I add a bit of infrastructure to the annotation system, to
allow errors to be printed.

In debug builds, an assertion checks that the assembly was validated.
When using INTEL_DEBUG=, ERROR: ... will be emitted
following a bad instruction, with a hopefully helpful message describing
what's wrong. For example:

   mov(8)  g116<1>.xUD null
  ERROR: src0 is null
   [...]
   shader_runner: 
../../../../../../mesa/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp:1671: 
void brw::vec4_generator::generate_code(const cfg_t*, const nir_shader*): 
Assertion `validated' failed.

I'd like to extend the validator to cover many more cases, but that's
no reason to hold back what I've already written -- checking for null 
source would have saved three people a week or two after all.
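
To make the shape of such a check concrete, here is a rough sketch of a null-source test that emits messages in the "ERROR: ..." format shown above. It is an illustration under stated assumptions, not the validator from this series: `struct toy_inst`, `append_error` and `validate_toy_inst` are hypothetical, and a real brw_inst is a packed encoding read through accessor functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, decoded view of one instruction. */
struct toy_inst {
   const char *mnemonic;
   unsigned num_sources;   /* how many sources the opcode reads */
   bool src0_is_null;
   bool src1_is_null;
};

/* Append "\tERROR: <msg>\n" to the NUL-terminated buffer `buf` of
 * capacity `cap`, truncating if it does not fit.
 */
static void
append_error(char *buf, size_t cap, const char *msg)
{
   size_t used = strlen(buf);
   assert(used < cap);
   snprintf(buf + used, cap - used, "\tERROR: %s\n", msg);
}

/* Return true if the instruction passes; otherwise describe every
 * problem found in `errors`.
 */
static bool
validate_toy_inst(const struct toy_inst *inst, char *errors, size_t cap)
{
   bool valid = true;

   if (inst->num_sources >= 1 && inst->src0_is_null) {
      append_error(errors, cap, "src0 is null");
      valid = false;
   }
   if (inst->num_sources >= 2 && inst->src1_is_null) {
      append_error(errors, cap, "src1 is null");
      valid = false;
   }
   return valid;
}
```

Collecting all messages before returning, instead of stopping at the first failure, matches the way the output above lists errors under the offending instruction.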



Re: [Mesa-dev] [PATCH 1/6] i965: fix cycle estimates when there's a pipeline stall

2015-10-21 Thread Jason Ekstrand

I'm not 100% sure if this actually matches the hardware.  It's
possible that some of the issue time is used to determine interference
and do the thread switch in which case, there may be some overlap.
However, it's definitely better than what we had before since, before,
issue time would get completely over-written if we have a significant
unblocked_time and that doesn't seem right.

Reviewed-by: Jason Ekstrand 

On Fri, Oct 2, 2015 at 2:37 PM, Connor Abbott  wrote:
> The issue time for an instruction is how many cycles it takes to
> actually put it into the pipeline. If there's a pipeline stall that
> causes the instruction to be delayed, we should first take that into
> account to figure out when the instruction would start executing and
> *then* add the issue time. The old code had it backwards, and so we
> would underestimate the total time whenever we thought there would be a
> pipeline stall by up to the issue time of the instruction.
>
> Signed-off-by: Connor Abbott 
> ---
>  src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> index 4e43e5c..76d58e2 100644
> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> @@ -1405,18 +1405,19 @@ instruction_scheduler::schedule_instructions(bblock_t *block)
>instructions_to_schedule--;
>update_register_pressure(chosen->inst);
>
> +  /* If we expected a delay for scheduling, then bump the clock to reflect
> +   * that.  In reality, the hardware will switch to another hyperthread
> +   * and may not return to dispatching our thread for a while even after
> +   * we're unblocked.  After this, we have the time when the chosen
> +   * instruction will start executing.
> +   */
> +  time = MAX2(time, chosen->unblocked_time);
> +
>/* Update the clock for how soon an instruction could start after the
> * chosen one.
> */
>time += issue_time(chosen->inst);
>
> -  /* If we expected a delay for scheduling, then bump the clock to reflect
> -   * that as well.  In reality, the hardware will switch to another
> -   * hyperthread and may not return to dispatching our thread for a while
> -   * even after we're unblocked.
> -   */
> -  time = MAX2(time, chosen->unblocked_time);
> -
>if (debug) {
>   fprintf(stderr, "clock %4d, scheduled: ", time);
>   bs->dump_instruction(chosen->inst);
> --
> 2.1.0
>
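
The effect of swapping the two statements can be checked with a few numbers. The sketch below isolates the clock arithmetic from the patch. The function names are hypothetical and cycle counts are plain ints.

```c
#include <assert.h>

static int
max_int(int a, int b)
{
   return a > b ? a : b;
}

/* Order after the patch: wait until the instruction is unblocked, then
 * pay its issue time.
 */
static int
clock_after_issue(int time, int unblocked_time, int issue_cycles)
{
   assert(issue_cycles >= 0);
   time = max_int(time, unblocked_time);
   return time + issue_cycles;
}

/* Order before the patch: the issue time is absorbed by the stall. */
static int
clock_after_issue_old(int time, int unblocked_time, int issue_cycles)
{
   assert(issue_cycles >= 0);
   time += issue_cycles;
   return max_int(time, unblocked_time);
}
```

With time = 10, unblocked_time = 20 and an issue time of 2, the new order gives 22 and the old one 20, so the old estimate is short by the issue time. When there is no stall (unblocked_time = 5) both orders give 12.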

Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-10-21 Thread Rowley, Timothy O


> On Oct 20, 2015, at 2:03 PM, Roland Scheidegger  wrote:
> 
> Certainly looks interesting...
> From a high level point of view, seems quite similar to llvmpipe (both
> tile based, using llvm for jitting shaders, ...). Of course llvmpipe
> isn't well suited for these kind of workloads (the most important use
> case is desktop compositing, so a couple dozen vertices per frame but
> millions of pixels...). Making vertex loads scale is something which
> just wasn't worth the effort so far (there's not actually that many
> people working on llvmpipe), albeit we realize that the completely
> non-parallel nature of it currently actually can hinder scaling quite a
> bit even for "typical" workloads (not desktop compositing, but "simple"
> 3d apps) once you've got enough cores/threads (8 or so), but that's
> something we're not worried too much about.
> I think requiring llvm 3.6 probably isn't going to work if you want to
> upstream this, a minimum version of 3.6 is fine but the general rule is
> things should still work with newer versions (including current
> development version, seems like you're using c++ interface of llvm quite
> a bit so that's probably going to require some #ifdef mess). Albeit I
> guess if you just don't try to build the driver with non-released
> versions that's probably ok (but will limit the ability for some people
> to try out your driver).

Some differences between llvmpipe and swr based on my understanding of 
llvmpipe’s architecture:

threading model
    llvmpipe: single-threaded vertex processing, up to 16 rasterization threads
    swr: common thread pool that picks up frontend or backend work as available
vertex processing
    llvmpipe: entire draw call processed in a single pass
    swr: large draws chopped into chunks that can be processed in parallel
frontend/backend coupling
    llvmpipe: separate binning pass in single-threaded frontend
    swr: frontend vertex processing and binning combined in a single pass
primitive assembly and binning
    llvmpipe: scalar C code
    swr: x86 AVX/AVX2 working on a vector of primitives
fragment processing
    llvmpipe: single jitted shader combining depth/fragment/stencil/blend on a 16x16 block
    swr: separate jitted fragment and blend shaders, plus templated depth test
in-memory representation
    llvmpipe: direct access to render targets
    swr: hot-tile working representation with load and/or store at required times

As you say, we do use LLVM’s C++ API.  While that has some advantages, it’s not 
guaranteed to be stable and can/does make nontrivial changes.  3.6 to 3.7 made 
some change to at least the GEP instruction which we could work around if 
necessary for upstreaming.

-Tim

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/6] i965: always run the post-RA scheduler

2015-10-21 Thread Jason Ekstrand

Reviewed-by: Jason Ekstrand 

Let's add the perf numbers and get this pushed.

On Sat, Oct 3, 2015 at 6:13 PM, Jason Ekstrand  wrote:
> On Sat, Oct 3, 2015 at 11:13 AM, Jason Ekstrand  wrote:
>> On Fri, Oct 2, 2015 at 2:37 PM, Connor Abbott  wrote:
>>> Before, we would only do scheduling after register allocation if we
>>> spilled, despite the fact that the pre-RA scheduler was only supposed to
>>> be for register pressure and set the latencies of every instruction to
>>> 1. This meant that unless we spilled, which we rarely do, then we never
>>> considered instruction latencies at all, and we usually never bothered
>>> to try and hide texture fetch latency. Although a later commit removes
>>> the setting the latency to 1 part, we still want to always run the
>>> post-RA scheduler since it's able to take the false dependencies that
>>> the register allocator creates into account, and it can be more
>>> aggressive than the pre-RA scheduler since it doesn't have to worry
>>> about register pressure at all.
>>>
>>> XXX perf data
>>
>> Test                master   post-ra-sched  diff    %diff
>
> bench_heaven         25.179   25.254         0.074   +0.200%
>
>> bench_OglPSBump2    396.730  402.386        5.656   +1.400%
>> bench_OglPSBump8    244.370  247.591        3.221   +1.300%
>> bench_OglPSPhong    241.117  242.002        0.885   +0.300%
>> bench_OglPSPom      59.555   59.725         0.170   +0.200%
>> bench_OglShMapPcf   86.149   102.346        16.197  +18.800%
>> bench_OglVSTangent  388.849  395.489        6.640   +1.700%
>> bench_trex          65.471   65.862         0.390   +0.500%
>> bench_trexoff       69.562   70.150         0.588   +0.800%
>>
>> Unfortunately, neither of the Unigine benchmarks (Heaven or Valley)
>> seemed to render correctly.  I just got white on both master and your
>> branch.  Not sure if we have a bug or if they just weren't running
>> right.  In any case, ministat didn't notice any difference in them.
>>
>>> Signed-off-by: Connor Abbott 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> index b269ade..14a9fdf 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> @@ -4981,8 +4981,7 @@ fs_visitor::allocate_registers()
>>> if (failed)
>>>return;
>>>
>>> -   if (!allocated_without_spills)
>>> -  schedule_instructions(SCHEDULE_POST);
>>> +   schedule_instructions(SCHEDULE_POST);
>>>
>>> if (last_scratch > 0)
>>>prog_data->total_scratch = brw_get_scratch_size(last_scratch);
>>> --
>>> 2.1.0
>>>

[Mesa-dev] [PATCH 8/9] i965: Check instructions appear only on supported hardware.

2015-10-21 Thread Matt Turner

---
 src/mesa/drivers/dri/i965/brw_eu_validate.c | 257 
 1 file changed, 257 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_validate.c b/src/mesa/drivers/dri/i965/brw_eu_validate.c
index 85d4c19..eb57962 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_validate.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_validate.c
@@ -68,6 +68,234 @@ src1_is_null(const struct brw_device_info *devinfo, const brw_inst *inst)
   brw_inst_src1_da_reg_nr(devinfo, inst) == BRW_ARF_NULL;
 }
 
+enum gen {
+   GEN4  = (1 << 0),
+   GEN45 = (1 << 1),
+   GEN5  = (1 << 2),
+   GEN6  = (1 << 3),
+   GEN7  = (1 << 4),
+   GEN75 = (1 << 5),
+   GEN8  = (1 << 6),
+   GEN9  = (1 << 7),
+   GEN_ALL = ~0
+};
+
+#define GEN_GE(gen) (~((gen) - 1) | gen)
+#define GEN_LE(gen) (((gen) - 1) | gen)
+
+struct inst_info {
+   enum gen gen;
+};
+
+static const struct inst_info inst_info[128] = {
+   [BRW_OPCODE_ILLEGAL] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_MOV] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_SEL] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_MOVI] = {
+  .gen = GEN_GE(GEN45),
+   },
+   [BRW_OPCODE_NOT] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_AND] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_OR] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_XOR] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_SHR] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_SHL] = {
+  .gen = GEN_ALL,
+   },
+   /* BRW_OPCODE_DIM / BRW_OPCODE_SMOV */
+   /* Reserved - 11 */
+   [BRW_OPCODE_ASR] = {
+  .gen = GEN_ALL,
+   },
+   /* Reserved - 13-15 */
+   [BRW_OPCODE_CMP] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_CMPN] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_CSEL] = {
+  .gen = GEN_GE(GEN8),
+   },
+   [BRW_OPCODE_F32TO16] = {
+  .gen = GEN7 | GEN75,
+   },
+   [BRW_OPCODE_F16TO32] = {
+  .gen = GEN7 | GEN75,
+   },
+   /* Reserved - 21-22 */
+   [BRW_OPCODE_BFREV] = {
+  .gen = GEN_GE(GEN7),
+   },
+   [BRW_OPCODE_BFE] = {
+  .gen = GEN_GE(GEN7),
+   },
+   [BRW_OPCODE_BFI1] = {
+  .gen = GEN_GE(GEN7),
+   },
+   [BRW_OPCODE_BFI2] = {
+  .gen = GEN_GE(GEN7),
+   },
+   /* Reserved - 27-31 */
+   [BRW_OPCODE_JMPI] = {
+  .gen = GEN_ALL,
+   },
+   /* BRW_OPCODE_BRD */
+   [BRW_OPCODE_IF] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_IFF] = { /* also BRW_OPCODE_BRC */
+  .gen = GEN_LE(GEN5),
+   },
+   [BRW_OPCODE_ELSE] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_ENDIF] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_DO] = { /* also BRW_OPCODE_CASE */
+  .gen = GEN_LE(GEN5),
+   },
+   [BRW_OPCODE_WHILE] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_BREAK] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_CONTINUE] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_HALT] = {
+  .gen = GEN_ALL,
+   },
+   /* BRW_OPCODE_CALLA */
+   /* BRW_OPCODE_MSAVE / BRW_OPCODE_CALL */
+   /* BRW_OPCODE_MREST / BRW_OPCODE_RET */
+   /* BRW_OPCODE_PUSH / BRW_OPCODE_FORK / BRW_OPCODE_GOTO */
+   /* BRW_OPCODE_POP */
+   [BRW_OPCODE_WAIT] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_SEND] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_SENDC] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_SENDS] = {
+  .gen = GEN_GE(GEN9),
+   },
+   [BRW_OPCODE_SENDSC] = {
+  .gen = GEN_GE(GEN9),
+   },
+   /* Reserved 53-55 */
+   [BRW_OPCODE_MATH] = {
+  .gen = GEN_GE(GEN6),
+   },
+   /* Reserved 57-63 */
+   [BRW_OPCODE_ADD] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_MUL] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_AVG] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_FRC] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_RNDU] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_RNDD] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_RNDE] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_RNDZ] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_MAC] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_MACH] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_LZD] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_FBH] = {
+  .gen = GEN_GE(GEN7),
+   },
+   [BRW_OPCODE_FBL] = {
+  .gen = GEN_GE(GEN7),
+   },
+   [BRW_OPCODE_CBIT] = {
+  .gen = GEN_GE(GEN7),
+   },
+   [BRW_OPCODE_ADDC] = {
+  .gen = GEN_GE(GEN7),
+   },
+   [BRW_OPCODE_SUBB] = {
+  .gen = GEN_GE(GEN7),
+   },
+   [BRW_OPCODE_SAD2] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_SADA2] = {
+  .gen = GEN_ALL,
+   },
+   /* Reserved 82-83 */
+   [BRW_OPCODE_DP4] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_DPH] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_DP3] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_DP2] = {
+  .gen = GEN_ALL,
+   },
+   /* Reserved 88 */
+   [BRW_OPCODE_LINE] = {
+  .gen = GEN_ALL,
+   },
+   [BRW_OPCODE_PLN] = {
+  .gen = GEN_GE(GEN45),
+   },
+   [BRW_OPCODE_MAD] = {
+  .gen = GEN_GE(GEN6),
+   },
+   [BRW_OPCODE_LRP] = {
+  .gen = GEN_GE(GEN6),
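
The table above encodes hardware support as one bit per generation, so "Gen7 and later" or "Gen5 and earlier" become contiguous bit masks. The archived message is truncated before the lookup code, so the following is only a sketch of how such a mask might be queried. `opcode_supported` is a hypothetical helper, and the macros here parenthesize their argument and work on unsigned values, which differs slightly from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* One bit per hardware generation, same values as in the patch. */
enum gen {
   GEN4  = (1 << 0),
   GEN45 = (1 << 1),
   GEN5  = (1 << 2),
   GEN6  = (1 << 3),
   GEN7  = (1 << 4),
   GEN75 = (1 << 5),
   GEN8  = (1 << 6),
   GEN9  = (1 << 7),
};

/* All bits at or above (GE) and at or below (LE) the given generation. */
#define GEN_GE(gen) (~((unsigned)(gen) - 1u) | (unsigned)(gen))
#define GEN_LE(gen) (((unsigned)(gen) - 1u) | (unsigned)(gen))

/* True if an opcode whose table entry is `opcode_mask` exists on the
 * generation `device`, which must have exactly one bit set.
 */
static bool
opcode_supported(unsigned opcode_mask, unsigned device)
{
   assert(device != 0 && (device & (device - 1u)) == 0);
   return (opcode_mask & device) != 0;
}
```

For example, GEN_GE(GEN7) has every bit from GEN7 upward set, so a BFE-style entry matches GEN8 but not GEN6, while GEN_LE(GEN5) matches GEN4, GEN45 and GEN5 only.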

[Mesa-dev] [PATCH 4/9] i965: Set annotation_info's mem_ctx.

2015-10-21 Thread Matt Turner

It was being memset to 0 previously.
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 2 +-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 2 +-
 src/mesa/drivers/dri/i965/intel_asm_annotation.c | 3 +++
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index aed4adb..8ab57f7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -2187,7 +2187,7 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
 
   dump_assembly(p->store, annotation.ann_count, annotation.ann,
 p->devinfo);
-  ralloc_free(annotation.ann);
+  ralloc_free(annotation.mem_ctx);
}
 
compiler->shader_debug_log(log_data,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index a84f6c4..6ac8591 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1659,7 +1659,7 @@ vec4_generator::generate_code(const cfg_t *cfg, const nir_shader *nir)
 
   dump_assembly(p->store, annotation.ann_count, annotation.ann,
 p->devinfo);
-  ralloc_free(annotation.ann);
+  ralloc_free(annotation.mem_ctx);
}
 
compiler->shader_debug_log(log_data,
diff --git a/src/mesa/drivers/dri/i965/intel_asm_annotation.c b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
index b3d6324..f87a9bb 100644
--- a/src/mesa/drivers/dri/i965/intel_asm_annotation.c
+++ b/src/mesa/drivers/dri/i965/intel_asm_annotation.c
@@ -86,6 +86,9 @@ void annotate(const struct brw_device_info *devinfo,
   struct annotation_info *annotation, const struct cfg_t *cfg,
   struct backend_instruction *inst, unsigned offset)
 {
+   if (annotation->mem_ctx == NULL)
+  annotation->mem_ctx = ralloc_context(NULL);
+
if (annotation->ann_size <= annotation->ann_count) {
   int old_size = annotation->ann_size;
   annotation->ann_size = MAX2(1024, annotation->ann_size * 2);
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 4/6] i965: dump scheduling cycle estimates

2015-10-21 Thread Jason Ekstrand

On Fri, Oct 16, 2015 at 8:03 PM, Connor Abbott  wrote:
> The heuristic we're using is rather lame, since it assumes everything is
> non-uniform and loops execute 10 times, but it should be enough for
> measuring improvements in the scheduler that don't result in a change in
> the number of instructions.
>
> v2:
> - Switch loops and cycle counts to be compatible with older shader-db.
> - Make loop heuristic 10x to match with spilling code.
>
> Signed-off-by: Connor Abbott 
> ---
>  src/mesa/drivers/dri/i965/brw_cfg.h  |  4 
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 11 ++-
>  .../drivers/dri/i965/brw_schedule_instructions.cpp   | 20 
> 
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  9 +
>  4 files changed, 35 insertions(+), 9 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h 
> b/src/mesa/drivers/dri/i965/brw_cfg.h
> index a094917..d0bdb00 100644
> --- a/src/mesa/drivers/dri/i965/brw_cfg.h
> +++ b/src/mesa/drivers/dri/i965/brw_cfg.h
> @@ -90,6 +90,8 @@ struct bblock_t {
> struct exec_list parents;
> struct exec_list children;
> int num;
> +
> +   unsigned cycle_count;
>  };
>
>  static inline struct backend_instruction *
> @@ -285,6 +287,8 @@ struct cfg_t {
> int num_blocks;
>
> bool idom_dirty;
> +
> +   unsigned cycle_count;
>  };
>
>  /* Note that this is implemented with a double for loop -- break will
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 17e19cf..3bb0e7d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -2180,9 +2180,9 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
>
> if (unlikely(debug_flag)) {
>fprintf(stderr, "Native code for %s\n"
> -  "SIMD%d shader: %d instructions. %d loops. %d:%d spills:fills. 
> Promoted %u constants. Compacted %d to %d"
> +  "SIMD%d shader: %d instructions. %d loops. %u cycles. %d:%d 
> spills:fills. Promoted %u constants. Compacted %d to %d"
>" bytes (%.0f%%)\n",
> -  shader_name, dispatch_width, before_size / 16, loop_count,
> +  shader_name, dispatch_width, before_size / 16, loop_count, 
> cfg->cycle_count,
>spill_count, fill_count, promoted_constants, before_size, 
> after_size,
>100.0f * (before_size - after_size) / before_size);
>
> @@ -2192,12 +2192,13 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
> }
>
> compiler->shader_debug_log(log_data,
> -  "%s SIMD%d shader: %d inst, %d loops, "
> +  "%s SIMD%d shader: %d inst, %d loops, %u 
> cycles, "
>"%d:%d spills:fills, Promoted %u constants, "
>"compacted %d to %d bytes.\n",
>stage_abbrev, dispatch_width, before_size / 16,
> -  loop_count, spill_count, fill_count,
> -  promoted_constants, before_size, after_size);
> +  loop_count, cfg->cycle_count, spill_count,
> +  fill_count, promoted_constants, before_size,
> +  after_size);
>
> return start_offset;
>  }
> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp 
> b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> index 1652261..e14d041 100644
> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> @@ -1467,6 +1467,24 @@ instruction_scheduler::schedule_instructions(bblock_t 
> *block)
> if (block->end()->opcode == BRW_OPCODE_NOP)
>block->end()->remove(block);
> assert(instructions_to_schedule == 0);
> +
> +   block->cycle_count = time;
> +}
> +
> +static unsigned get_cycle_count(cfg_t *cfg)
> +{
> +   unsigned count = 0, multiplier = 1;
> +   foreach_block(block, cfg) {
> +  if (block->start()->opcode == BRW_OPCODE_DO)
> + multiplier *= 10; /* assume that loops execute ~10 times */
> +
> +  count += block->cycle_count * multiplier;

Unfortunately, I don't think this properly handles "if (...) { tex }
else { tex };" and similar things where the latency isn't necessarily
additive.  However, it's a good first-order approximation.

Reviewed-by: Jason Ekstrand 

> +
> +  if (block->end()->opcode == BRW_OPCODE_WHILE)
> + multiplier /= 10;
> +   }
> +
> +   return count;
>  }
>
>  void
> @@ -1507,6 +1525,8 @@ instruction_scheduler::run(cfg_t *cfg)
>post_reg_alloc);
>bs->dump_instructions();
> }
> +
> +   cfg->cycle_count = get_cycle_count(cfg);
>  }
>
>  void
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
>
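
For readers following the review comment above, here is a minimal sketch of the loop heuristic from the quoted get_cycle_count(), run on toy data. The struct toy_block type and the toy_* helpers are hypothetical stand-ins, not Mesa code; only the multiply-by-10 on DO and divide-by-10 on WHILE comes from the patch. The if/else example shows the limitation raised in the review: the estimates for both branches are simply summed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a basic block: only what the heuristic needs. */
struct toy_block {
   unsigned cycle_count;   /* scheduler's estimate for this block */
   int starts_loop;        /* block begins with a DO */
   int ends_loop;          /* block ends with a WHILE */
};

/* Same shape as the quoted get_cycle_count(), over a plain array. */
static unsigned
toy_cycle_count(const struct toy_block *blocks, size_t num_blocks)
{
   unsigned count = 0, multiplier = 1;
   size_t i;

   for (i = 0; i < num_blocks; i++) {
      if (blocks[i].starts_loop)
         multiplier *= 10;   /* assume that loops execute ~10 times */

      count += blocks[i].cycle_count * multiplier;

      if (blocks[i].ends_loop)
         multiplier /= 10;
   }
   return count;
}

/* A single-block loop body of 3 cycles between two straight-line blocks. */
static unsigned
toy_loop_example(void)
{
   const struct toy_block blocks[] = {
      { 2, 0, 0 },
      { 3, 1, 1 },   /* DO ... WHILE in one block */
      { 4, 0, 0 },
   };
   return toy_cycle_count(blocks, sizeof(blocks) / sizeof(blocks[0]));
}

/* if/else with 100 cycles per branch: both branches are added up. */
static unsigned
toy_if_else_example(void)
{
   const struct toy_block blocks[] = {
      { 5, 0, 0 },     /* header */
      { 100, 0, 0 },   /* then-branch */
      { 100, 0, 0 },   /* else-branch */
      { 5, 0, 0 },     /* join */
   };
   return toy_cycle_count(blocks, sizeof(blocks) / sizeof(blocks[0]));
}
```

With these inputs the loop case comes out as 2 + 3*10 + 4 and the if/else case as 5 + 100 + 100 + 5, regardless of whether the two branch latencies could overlap on real hardware.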

Re: [Mesa-dev] [PATCH 5/6] i965/fs: split out calculation of payload live ranges

2015-10-21 Thread Jason Ekstrand

Reviewed-by: Jason Ekstrand 

On Fri, Oct 2, 2015 at 2:37 PM, Connor Abbott  wrote:
> We'll need this for the scheduler too, since it wants to know when the
> live ranges of payload registers end in order to model them in our
> register pressure calculations.
>
> Signed-off-by: Connor Abbott 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.h|  2 +
>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 51 
> +--
>  2 files changed, 31 insertions(+), 22 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> b/src/mesa/drivers/dri/i965/brw_fs.h
> index a8b6726..160b09b 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -141,6 +141,8 @@ public:
> void assign_vs_urb_setup();
> bool assign_regs(bool allow_spilling);
> void assign_regs_trivial();
> +   void calculate_payload_ranges(int payload_node_count,
> + int *payload_last_use_ip);
> void setup_payload_interference(struct ra_graph *g, int payload_reg_count,
> int first_payload_node);
> int choose_spill_reg(struct ra_graph *g);
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> index 6900cee..3f00479 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> @@ -332,32 +332,12 @@ count_to_loop_end(const bblock_t *block)
> unreachable("not reached");
>  }
>
> -/**
> - * Sets up interference between thread payload registers and the virtual GRFs
> - * to be allocated for program temporaries.
> - *
> - * We want to be able to reallocate the payload for our virtual GRFs, notably
> - * because the setup coefficients for a full set of 16 FS inputs takes up 8 
> of
> - * our 128 registers.
> - *
> - * The layout of the payload registers is:
> - *
> - * 0..payload.num_regs-1: fixed function setup (including bary coordinates).
> - * payload.num_regs..payload.num_regs+curb_read_lengh-1: uniform data
> - * payload.num_regs+curb_read_lengh..first_non_payload_grf-1: setup 
> coefficients.
> - *
> - * And we have payload_node_count nodes covering these registers in order
> - * (note that in SIMD16, a node is two registers).
> - */
> -void
> -fs_visitor::setup_payload_interference(struct ra_graph *g,
> -   int payload_node_count,
> -   int first_payload_node)
> +void fs_visitor::calculate_payload_ranges(int payload_node_count,
> +  int *payload_last_use_ip)
>  {
> int loop_depth = 0;
> int loop_end_ip = 0;
>
> -   int payload_last_use_ip[payload_node_count];
> for (int i = 0; i < payload_node_count; i++)
>payload_last_use_ip[i] = -1;
>
> @@ -428,6 +408,33 @@ fs_visitor::setup_payload_interference(struct ra_graph 
> *g,
>
>ip++;
> }
> +}
> +
> +
> +/**
> + * Sets up interference between thread payload registers and the virtual GRFs
> + * to be allocated for program temporaries.
> + *
> + * We want to be able to reallocate the payload for our virtual GRFs, notably
> + * because the setup coefficients for a full set of 16 FS inputs takes up 8 
> of
> + * our 128 registers.
> + *
> + * The layout of the payload registers is:
> + *
> + * 0..payload.num_regs-1: fixed function setup (including bary coordinates).
> + * payload.num_regs..payload.num_regs+curb_read_lengh-1: uniform data
> + * payload.num_regs+curb_read_lengh..first_non_payload_grf-1: setup 
> coefficients.
> + *
> + * And we have payload_node_count nodes covering these registers in order
> + * (note that in SIMD16, a node is two registers).
> + */
> +void
> +fs_visitor::setup_payload_interference(struct ra_graph *g,
> +   int payload_node_count,
> +   int first_payload_node)
> +{
> +   int payload_last_use_ip[payload_node_count];
> +   calculate_payload_ranges(payload_node_count, payload_last_use_ip);
>
> for (int i = 0; i < payload_node_count; i++) {
>if (payload_last_use_ip[i] == -1)
> --
> 2.1.0
>

Re: [Mesa-dev] [PATCH v2 1/7] nvc0: fix crash when nv50_miptree_from_handle fails

2015-10-21 Thread Julien Isorce

Sorry, this patch should not have been included in the v2, since it had
already been reviewed by Emil. But thanks for your review.
I hit the crash while testing patch 5/7 of this series, around
"resource = pscreen->resource_from_handle" in the new vaCreateSurface2
function, just by passing a wrong fd.

I checked nv50 and nv30 as you suggested, and they do not perform this step.
From what I can see, nvc0 reuses nv50_miptree_from_handle from nv50 but still
has its own nvc0_miptree_vtbl. But that's just a guess :)

Cheers
Julien

On 20 October 2015 at 18:04, samuel.pitoiset 
wrote:

> Is there a particular situation where nv50_miptree_from_handle() fails?
> And did you check nv50?
>
> Anyway, this patch is:
> Reviewed-by: Samuel Pitoiset 
>
> On 20/10/2015 18:34, Julien Isorce wrote:
>
>> Signed-off-by: Julien Isorce 
>> ---
>>   src/gallium/drivers/nouveau/nvc0/nvc0_resource.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
>> b/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
>> index 12b5a02..15c803c 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
>> @@ -26,7 +26,8 @@ nvc0_resource_from_handle(struct pipe_screen * screen,
>>  } else {
>> struct pipe_resource *res = nv50_miptree_from_handle(screen,
>>  templ,
>> whandle);
>> -  nv04_resource(res)->vtbl = &nvc0_miptree_vtbl;
>> +  if (res)
>> + nv04_resource(res)->vtbl = &nvc0_miptree_vtbl;
>> return res;
>>  }
>>   }
>>
>

Re: [Mesa-dev] [PATCH 6/6] i965/sched: use liveness analysis for computing register pressure

2015-10-21 Thread Jason Ekstrand

On Fri, Oct 2, 2015 at 2:37 PM, Connor Abbott  wrote:
> Previously, we were using some heuristics to try and detect when a write
> was about to begin a live range, or when a read was about to end a live
> range. We never used the liveness analysis information used by the
> register allocator, though, which meant that the scheduler's and the
> allocator's ideas of when a live range began and ended were different.
> Not only did this make our estimate of the register pressure benefit of
> scheduling an instruction wrong in some cases, but it was preventing us
> from knowing the actual register pressure when scheduling each
> instruction, which we want to have in order to switch to register
> pressure scheduling only when the register pressure is too high.
>
> This commit rewrites the register pressure tracking code to use the same
> model as our register allocator currently uses. We use the results of
> liveness analysis, as well as the calculate_payload_ranges() function that
> we split out in the last commit. This means that we compute live ranges
> twice on each round through the register allocator, although we could
> speed it up by only recomputing the ranges and not the live in/live out
> sets after scheduling, since we only shuffle around instructions within
> a single basic block when we schedule.
>
> Shader-db results on bdw:
>
> total instructions in shared programs: 7130187 -> 7129880 (-0.00%)
> instructions in affected programs: 1744 -> 1437 (-17.60%)
> helped: 1
> HURT: 1
>
> total cycles in shared programs: 172535126 -> 172473226 (-0.04%)
> cycles in affected programs: 11338636 -> 11276736 (-0.55%)
> helped: 876
> HURT: 873
>
> LOST:   8
> GAINED: 0
> Signed-off-by: Connor Abbott 
> ---
> The results are a wash, but this is needed for a lot of the more
> experimental things I want to do. I can drop this if there are any
> objections.
>
>  .../drivers/dri/i965/brw_schedule_instructions.cpp | 300 
> +
>  1 file changed, 244 insertions(+), 56 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp 
> b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> index 22a493f..6b8792b 100644
> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> @@ -26,6 +26,7 @@
>   */
>
>  #include "brw_fs.h"
> +#include "brw_fs_live_variables.h"
>  #include "brw_vec4.h"
>  #include "brw_cfg.h"
>  #include "brw_shader.h"
> @@ -400,22 +401,49 @@ schedule_node::set_latency_gen7(bool is_haswell)
>  class instruction_scheduler {
>  public:
> instruction_scheduler(backend_shader *s, int grf_count,
> + int hw_reg_count, int block_count,
>   instruction_scheduler_mode mode)
> {
>this->bs = s;
>this->mem_ctx = ralloc_context(NULL);
>this->grf_count = grf_count;
> +  this->hw_reg_count = hw_reg_count;
>this->instructions.make_empty();
>this->instructions_to_schedule = 0;
>this->post_reg_alloc = (mode == SCHEDULE_POST);
>this->mode = mode;
>this->time = 0;
>if (!post_reg_alloc) {
> - this->remaining_grf_uses = rzalloc_array(mem_ctx, int, grf_count);
> - this->grf_active = rzalloc_array(mem_ctx, bool, grf_count);
> + this->reg_pressure_in = rzalloc_array(mem_ctx, int, block_count);
> +
> + this->livein = ralloc_array(mem_ctx, BITSET_WORD *, block_count);
> + for (int i = 0; i < block_count; i++)
> +this->livein[i] = rzalloc_array(mem_ctx, BITSET_WORD,
> +BITSET_WORDS(grf_count));
> +
> + this->liveout = ralloc_array(mem_ctx, BITSET_WORD *, block_count);
> + for (int i = 0; i < block_count; i++)
> +this->liveout[i] = rzalloc_array(mem_ctx, BITSET_WORD,
> + BITSET_WORDS(grf_count));
> +
> + this->hw_liveout = ralloc_array(mem_ctx, BITSET_WORD *, 
> block_count);
> + for (int i = 0; i < block_count; i++)
> +this->hw_liveout[i] = rzalloc_array(mem_ctx, BITSET_WORD,
> +BITSET_WORDS(hw_reg_count));
> +
> + this->written = rzalloc_array(mem_ctx, bool, grf_count);
> +
> + this->reads_remaining = rzalloc_array(mem_ctx, int, grf_count);
> +
> + this->hw_reads_remaining = rzalloc_array(mem_ctx, int, 
> hw_reg_count);
>} else {
> - this->remaining_grf_uses = NULL;
> - this->grf_active = NULL;
> + this->reg_pressure_in = NULL;
> + this->livein = NULL;
> + this->liveout = NULL;
> + this->hw_liveout = NULL;
> + this->written = NULL;
> + this->reads_remaining = NULL;
> + this->hw_reads_remaining = NULL;
>}
> }
>
> @@ -442,7 +470,8 @@ public:
>  */
> virtual int

[Mesa-dev] [PATCH] mesa: Enable ASTC in GLES' [NUM_]COMPRESSED_TEXTURE_FORMATS queries

2015-10-21 Thread Nanley Chery

From: Nanley Chery 

In OpenGL ES, the COMPRESSED_TEXTURE_FORMATS query returns the set of
supported specific compressed formats. Since ASTC formats fit within
that category, include them in the set and update the
NUM_COMPRESSED_TEXTURE_FORMATS query as well.

This enables GLES2-based ASTC dEQP tests to run. See the Bugzilla for
more info.

Cc: "11.0" 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92193
Reported-by: Tapani Pälli 
Suggested-by: Ian Romanick 
Signed-off-by: Nanley Chery 
---
 src/mesa/main/texcompress.c | 85 +
 1 file changed, 63 insertions(+), 22 deletions(-)

diff --git a/src/mesa/main/texcompress.c b/src/mesa/main/texcompress.c
index 84973d3..9d22586 100644
--- a/src/mesa/main/texcompress.c
+++ b/src/mesa/main/texcompress.c
@@ -243,28 +243,6 @@ _mesa_gl_compressed_format_base_format(GLenum format)
  *what GL_NUM_COMPRESSED_TEXTURE_FORMATS and
  *GL_COMPRESSED_TEXTURE_FORMATS return."
  *
- * The KHR_texture_compression_astc_hdr spec says:
- *
- *"Interactions with OpenGL 4.2
- *
- *OpenGL 4.2 supports the feature that compressed textures can be
- *compressed online, by passing the compressed texture format enum as
- *the internal format when uploading a texture using TexImage1D,
- *TexImage2D or TexImage3D (see Section 3.9.3, Texture Image
- *Specification, subsection Encoding of Special Internal Formats).
- *
- *Due to the complexity of the ASTC compression algorithm, it is not
- *usually suitable for online use, and therefore ASTC support will be
- *limited to pre-compressed textures only. Where on-device compression
- *is required, a domain-specific limited compressor will typically
- *be used, and this is therefore not suitable for implementation in
- *the driver.
- *
- *In particular, the ASTC format specifiers will not be added to
- *Table 3.14, and thus will not be accepted by the TexImage*D
- *functions, and will not be returned by the (already deprecated)
- *COMPRESSED_TEXTURE_FORMATS query."
- *
  * There is no formal spec for GL_ATI_texture_compression_3dc.  Since the
  * formats added by this extension are luminance-alpha formats, it is
  * reasonable to expect them to follow the same rules as
@@ -396,6 +374,69 @@ _mesa_get_compressed_formats(struct gl_context *ctx, GLint 
*formats)
  n += 10;
   }
}
+
+   /* The KHR_texture_compression_astc_hdr spec says:
+*
+*"Interactions with OpenGL 4.2
+*
+*OpenGL 4.2 supports the feature that compressed textures can be
+*compressed online, by passing the compressed texture format enum 
as
+*the internal format when uploading a texture using TexImage1D,
+*TexImage2D or TexImage3D (see Section 3.9.3, Texture Image
+*Specification, subsection Encoding of Special Internal Formats).
+*
+*Due to the complexity of the ASTC compression algorithm, it is not
+*usually suitable for online use, and therefore ASTC support will 
be
+*limited to pre-compressed textures only. Where on-device 
compression
+*is required, a domain-specific limited compressor will typically
+*be used, and this is therefore not suitable for implementation in
+*the driver.
+*
+*In particular, the ASTC format specifiers will not be added to
+*Table 3.14, and thus will not be accepted by the TexImage*D
+*functions, and will not be returned by the (already deprecated)
+*COMPRESSED_TEXTURE_FORMATS query."
+*
+* The ES and the desktop specs diverge here. In OpenGL ES, the 
COMPRESSED_TEXTURE_FORMATS
+* query returns the set of supported specific compressed formats.
+*/
+   if (ctx->API == API_OPENGLES2 &&
+   ctx->Extensions.KHR_texture_compression_astc_ldr) {
+  if (formats) {
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_4x4_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_5x4_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_5x5_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_6x5_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_6x6_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_8x5_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_8x6_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_8x8_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_10x5_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_10x6_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_10x8_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_10x10_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_12x10_KHR;
+ formats[n++] = GL_COMPRESSED_RGBA_ASTC_12x12_KHR;
+ formats[n++] =

Re: [Mesa-dev] [PATCH 2/6] i965/sched: write-after-read dependencies are free

2015-10-21 Thread Jason Ekstrand

On Fri, Oct 2, 2015 at 2:37 PM, Connor Abbott  wrote:
> Although write-after-write dependencies have the same latency as
> read-after-write dependencies due to how the register scoreboard works,
> write-after-read dependencies aren't checked by the EU at all, so
> they're purely a constraint on how the scheduler can order the
> instructions.
>
> Signed-off-by: Connor Abbott 
> ---
>  src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp 
> b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> index 76d58e2..1652261 100644
> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> @@ -927,10 +927,10 @@ fs_instruction_scheduler::calculate_deps()
>   if (inst->src[i].file == GRF) {
>  if (post_reg_alloc) {
> for (int r = 0; r < inst->regs_read(i); r++)
> -  add_dep(n, last_grf_write[inst->src[i].reg + r]);
> +  add_dep(n, last_grf_write[inst->src[i].reg + r], 0);
>  } else {
> for (int r = 0; r < inst->regs_read(i); r++) {
> -  add_dep(n, last_grf_write[inst->src[i].reg * 16 + 
> inst->src[i].reg_offset + r]);
> +  add_dep(n, last_grf_write[inst->src[i].reg * 16 + 
> inst->src[i].reg_offset + r], 0);
> }
>  }
>   } else if (inst->src[i].file == HW_REG &&
> @@ -941,9 +941,9 @@ fs_instruction_scheduler::calculate_deps()
> if (inst->src[i].fixed_hw_reg.vstride == 
> BRW_VERTICAL_STRIDE_0)
>size = 1;
> for (int r = 0; r < size; r++)
> -  add_dep(n, last_grf_write[inst->src[i].fixed_hw_reg.nr + 
> r]);
> +  add_dep(n, last_grf_write[inst->src[i].fixed_hw_reg.nr + 
> r], 0);
>  } else {
> -   add_dep(n, last_fixed_grf_write);
> +   add_dep(n, last_fixed_grf_write, 0);
>  }
>   } else if (inst->src[i].is_accumulator()) {
>  add_dep(n, last_accumulator_write);

You could probably change this one as well, but meh, it's the accumulator...

Reviewed-by: Jason Ekstrand 

> --
> 2.1.0
>
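
A minimal sketch of the latency rule described in the commit message, assuming a simplified model in which each dependency edge carries a latency and issue-slot time is ignored. The enum dep_kind type and both helper functions are hypothetical names for illustration, not the scheduler's real interface; the only thing taken from the patch is that write-after-read edges get a latency of 0.

```c
#include <assert.h>

/* Hypothetical classification of a dependency between two instructions. */
enum dep_kind { DEP_RAW, DEP_WAW, DEP_WAR };

/* Latency charged on an edge from an earlier instruction to a later one.
 * producer_latency is the latency of the earlier instruction. */
static int
dep_latency(enum dep_kind kind, int producer_latency)
{
   assert(producer_latency >= 0);

   switch (kind) {
   case DEP_RAW:
   case DEP_WAW:
      /* The register scoreboard stalls until the earlier write lands. */
      return producer_latency;
   case DEP_WAR:
      /* Not checked by the EU: an ordering constraint only. */
      return 0;
   }
   return producer_latency;
}

/* Earliest cycle at which the later instruction is modelled as ready,
 * given the cycle at which the earlier one issued. */
static int
earliest_issue(int before_issue_cycle, enum dep_kind kind,
               int producer_latency)
{
   return before_issue_cycle + dep_latency(kind, producer_latency);
}
```

In this model a write that follows a 14-cycle read of the same register is ready as soon as the read has issued, while a read or write that follows a 14-cycle write has to wait out the full latency.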

Re: [Mesa-dev] [PATCH] mesa: check for unchanged line width before error checking

2015-10-21 Thread Matt Turner

Reviewed-by: Matt Turner 

Re: [Mesa-dev] [PATCH] svga: fix clip plane regression after recent tgsi_scan change

2015-10-21 Thread Charmaine Lee

A minor nit: please change the "num_written_culldistance field was a multiple
of four" comment in the commit message to "num_written_clipdistance field was a "

Reviewed-by: Charmaine Lee 

 

From: Brian Paul 
Sent: Wednesday, October 21, 2015 3:25 PM
To: mesa-dev@lists.freedesktop.org
Cc: Charmaine Lee
Subject: [PATCH] svga: fix clip plane regression after recent tgsi_scan change

Before the change "tgsi/scan: use properties for clip/cull distance
writemasks", the tgsi_shader_info::num_written_culldistance field
was a multiple of four, now it's an accurate count.  In the svga
driver, we need a minor change to the loop test.
---
 src/gallium/drivers/svga/svga_tgsi_vgpu10.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c 
b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
index d62f2bb..332904f 100644
--- a/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
+++ b/src/gallium/drivers/svga/svga_tgsi_vgpu10.c
@@ -3097,7 +3097,7 @@ emit_clip_distance_instructions(struct 
svga_shader_emitter_v10 *emit)
unsigned i;
unsigned clip_plane_enable = emit->key.clip_plane_enable;
unsigned clip_dist_tmp_index = emit->clip_dist_tmp_index;
-   unsigned num_written_clipdist = emit->info.num_written_clipdistance;
+   int num_written_clipdist = emit->info.num_written_clipdistance;

assert(emit->clip_dist_out_index != INVALID_INDEX);
assert(emit->clip_dist_tmp_index != INVALID_INDEX);
@@ -3109,7 +3109,7 @@ emit_clip_distance_instructions(struct 
svga_shader_emitter_v10 *emit)
 */
emit->clip_dist_tmp_index = INVALID_INDEX;

-   for (i = 0; i < 2 && num_written_clipdist; i++, num_written_clipdist-=4) {
+   for (i = 0; i < 2 && num_written_clipdist > 0; i++, 
num_written_clipdist-=4) {

   tmp_clip_dist_src = make_src_temp_reg(clip_dist_tmp_index + i);

--
1.9.1
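
A minimal sketch of why the loop test in the patch above had to change, assuming the count is now anywhere in 0..8 rather than a multiple of four. Both helper functions are hypothetical; they only reproduce the two loop shapes from the diff and return how many vec4 clip-distance registers the loop visits.

```c
#include <assert.h>

/* Post-patch loop shape: signed counter and a "> 0" test. */
static unsigned
clip_dist_regs_fixed(int num_written_clipdist)
{
   unsigned i;

   assert(num_written_clipdist >= 0 && num_written_clipdist <= 8);

   for (i = 0; i < 2 && num_written_clipdist > 0;
        i++, num_written_clipdist -= 4)
      ;
   return i;
}

/* Pre-patch loop shape: unsigned counter and a "!= 0" test.  When the
 * count is not a multiple of four, the subtraction wraps around to a
 * large non-zero value and the loop runs one time too many. */
static unsigned
clip_dist_regs_old(unsigned num_written_clipdist)
{
   unsigned i;

   for (i = 0; i < 2 && num_written_clipdist;
        i++, num_written_clipdist -= 4)
      ;
   return i;
}
```

For counts of 4 and 8 both versions agree; for a count such as 1 or 3 the old shape visits two registers where one is enough.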


Re: [Mesa-dev] [PATCH] vbo: optimize vertex copying when 'wrapping'

2015-10-21 Thread Matt Turner

On Wed, Oct 21, 2015 at 3:41 PM, Brian Paul  wrote:
> Instead of calling memcpy() 'n' times, we can do it all at once since
> the source and dest regions are all contiguous.
> ---
>  src/mesa/vbo/vbo_exec_api.c | 16 +++-
>  src/mesa/vbo/vbo_save_api.c | 15 +++
>  2 files changed, 14 insertions(+), 17 deletions(-)
>
> diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c
> index a23d5aa..d70fc3b 100644
> --- a/src/mesa/vbo/vbo_exec_api.c
> +++ b/src/mesa/vbo/vbo_exec_api.c
> @@ -132,8 +132,7 @@ static void vbo_exec_wrap_buffers( struct 
> vbo_exec_context *exec )
>  static void
>  vbo_exec_vtx_wrap(struct vbo_exec_context *exec)
>  {
> -   fi_type *data = exec->vtx.copied.buffer;
> -   GLuint i;
> +   GLuint numComponents;

You might use unsigned here (and below).

Reviewed-by: Matt Turner 
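
A minimal sketch of the idea in the commit message, assuming vertices are stored back to back as plain floats. The function names and the use of float and unsigned (rather than the fi_type and GLuint types of the real code) are illustrative assumptions; the point is only that n per-vertex copies of vertex_size components equal one copy of n * vertex_size components when both regions are contiguous.

```c
#include <assert.h>
#include <string.h>

/* Per-vertex copy: one memcpy() call for each of the n vertices. */
static void
copy_vertices_loop(float *dst, const float *src,
                   unsigned n, unsigned vertex_size)
{
   unsigned i;

   for (i = 0; i < n; i++) {
      memcpy(dst, src, vertex_size * sizeof(float));
      dst += vertex_size;
      src += vertex_size;
   }
}

/* Single copy: valid because source and destination are each contiguous. */
static void
copy_vertices_once(float *dst, const float *src,
                   unsigned n, unsigned vertex_size)
{
   const unsigned num_components = n * vertex_size;

   memcpy(dst, src, num_components * sizeof(float));
}

/* Returns non-zero when both strategies produce identical output. */
static int
copies_match(void)
{
   const float src[6] = { 1, 2, 3, 4, 5, 6 };   /* 2 vertices x 3 floats */
   float a[6] = { 0 }, b[6] = { 0 };

   copy_vertices_loop(a, src, 2, 3);
   copy_vertices_once(b, src, 2, 3);
   return memcmp(a, b, sizeof(a)) == 0 &&
          memcmp(b, src, sizeof(src)) == 0;
}
```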

[Mesa-dev] [PATCH] nouveau: fix double free when screen_create fails

2015-10-21 Thread Julien Isorce

The real fix is in nouveau_drm_winsys.c: dev is set to 0 after the init
call, because ownership of dev has been passed to that call.
The other changes make the error paths consistent with what the
screen_create functions already do on errors.

Encountered this crash because nvc0_screen_create sometimes fails with:
nvc0_screen_create:717 - Error allocating PGRAPH context for M2MF: -16
Also see: https://bugs.freedesktop.org/show_bug.cgi?id=70354

Signed-off-by: Julien Isorce 
---
 src/gallium/drivers/nouveau/nv30/nv30_screen.c  | 5 -
 src/gallium/drivers/nouveau/nv50/nv50_screen.c  | 4 +++-
 src/gallium/winsys/nouveau/drm/nouveau_drm_winsys.c | 2 ++
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c 
b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
index 0330164..9b8ddac 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
@@ -425,8 +425,10 @@ nv30_screen_create(struct nouveau_device *dev)
unsigned oclass = 0;
int ret, i;
 
-   if (!screen)
+   if (!screen) {
+  nouveau_device_del(&dev);
   return NULL;
+   }
 
switch (dev->chipset & 0xf0) {
case 0x30:
@@ -456,6 +458,7 @@ nv30_screen_create(struct nouveau_device *dev)
 
if (!oclass) {
   NOUVEAU_ERR("unknown 3d class for 0x%02x\n", dev->chipset);
+  nouveau_device_del(&dev);
   FREE(screen);
   return NULL;
}
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index ec51d00..e9604d5 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -711,8 +711,10 @@ nv50_screen_create(struct nouveau_device *dev)
int ret;
 
screen = CALLOC_STRUCT(nv50_screen);
-   if (!screen)
+   if (!screen) {
+  nouveau_device_del(&dev);
   return NULL;
+   }
pscreen = &screen->base.base;
 
ret = nouveau_screen_init(&screen->base, dev);
diff --git a/src/gallium/winsys/nouveau/drm/nouveau_drm_winsys.c 
b/src/gallium/winsys/nouveau/drm/nouveau_drm_winsys.c
index c6603e3..bd1d761 100644
--- a/src/gallium/winsys/nouveau/drm/nouveau_drm_winsys.c
+++ b/src/gallium/winsys/nouveau/drm/nouveau_drm_winsys.c
@@ -117,6 +117,8 @@ nouveau_drm_screen_create(int fd)
}
 
screen = (struct nouveau_screen*)init(dev);
+   /* Previous init func took ownership of dev */
+   dev = 0;
if (!screen)
goto err;
 
-- 
1.9.1
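
A minimal sketch of the ownership handoff this patch relies on, using a toy device type instead of the real nouveau structures. Every toy_* name is hypothetical, and the live-device counter exists only so the behaviour can be checked; the pattern taken from the patch is that the caller clears its pointer once the init function has taken ownership, so its own error path cannot free the device a second time.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_device { int chipset; };

static int toy_live_devices;   /* devices allocated and not yet freed */

static struct toy_device *
toy_device_new(int chipset)
{
   struct toy_device *dev = malloc(sizeof(*dev));

   if (dev) {
      dev->chipset = chipset;
      toy_live_devices++;
   }
   return dev;
}

/* Frees *pdev and clears it; a NULL *pdev is a no-op. */
static void
toy_device_del(struct toy_device **pdev)
{
   if (*pdev) {
      free(*pdev);
      *pdev = NULL;
      toy_live_devices--;
   }
}

/* Takes ownership of dev: on failure it frees dev itself. */
static void *
toy_screen_create(struct toy_device *dev)
{
   if (dev->chipset == 0) {   /* simulated init failure */
      toy_device_del(&dev);
      return NULL;
   }
   return dev;                /* the "screen" now owns dev */
}

/* Caller: forget dev after the handoff so the error path is harmless. */
static int
toy_winsys_create(int chipset)
{
   struct toy_device *dev = toy_device_new(chipset);
   void *screen;

   if (!dev)
      return -1;

   screen = toy_screen_create(dev);
   dev = NULL;                /* ownership was passed to the call above */
   if (!screen)
      goto err;

   {
      /* Success: tear the toy screen down again through its owner. */
      struct toy_device *owned = screen;
      toy_device_del(&owned);
   }
   return 0;

err:
   toy_device_del(&dev);      /* no-op, dev is NULL */
   return -1;
}
```

Without the `dev = NULL;` line, the failure path would call toy_device_del() on memory that toy_screen_create() had already released.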


[Mesa-dev] [PATCH] gbm: Add a flag to enable creation of rotated scanout buffers

2015-10-21 Thread Vivek Kasireddy

Certain platforms support rotated scanout buffers, but there is currently
no way to create them through the GBM DRI interface. This flag
tells the DRI driver to use Y-tiling when creating a rotated
scanout buffer.

Cc: Kristian Hogsberg 
Signed-off-by: Vivek Kasireddy 
---
 include/GL/internal/dri_interface.h  | 1 +
 src/gbm/backends/dri/gbm_dri.c   | 9 +++--
 src/gbm/main/gbm.h   | 5 +
 src/mesa/drivers/dri/i965/intel_screen.c | 6 ++
 4 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index c827bb6..1a721d0 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -1098,6 +1098,7 @@ struct __DRIdri2ExtensionRec {
 #define __DRI_IMAGE_USE_SCANOUT0x0002
 #define __DRI_IMAGE_USE_CURSOR 0x0004 /* Depricated */
 #define __DRI_IMAGE_USE_LINEAR 0x0008
+#define __DRI_IMAGE_USE_SCANOUT_ROTATED_90_270 0x0010
 
 
 /**
diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c
index ccc3cc6..92b6573 100644
--- a/src/gbm/backends/dri/gbm_dri.c
+++ b/src/gbm/backends/dri/gbm_dri.c
@@ -539,7 +539,7 @@ gbm_dri_is_format_supported(struct gbm_device *gbm,
   break;
case GBM_BO_FORMAT_ARGB:
case GBM_FORMAT_ARGB:
-  if (usage & GBM_BO_USE_SCANOUT)
+  if (usage & (GBM_BO_USE_SCANOUT | GBM_BO_USE_SCANOUT_ROTATED_90_270))
  return 0;
   break;
default:
@@ -732,6 +732,8 @@ gbm_dri_bo_import(struct gbm_device *gbm,
 
if (usage & GBM_BO_USE_SCANOUT)
   dri_use |= __DRI_IMAGE_USE_SCANOUT;
+   if (usage & GBM_BO_USE_SCANOUT_ROTATED_90_270)
+  dri_use |= __DRI_IMAGE_USE_SCANOUT_ROTATED_90_270;
if (usage & GBM_BO_USE_CURSOR)
   dri_use |= __DRI_IMAGE_USE_CURSOR;
if (dri->image->base.version >= 2 &&
@@ -770,7 +772,8 @@ create_dumb(struct gbm_device *gbm,
 
is_cursor = (usage & GBM_BO_USE_CURSOR) != 0 &&
   format == GBM_FORMAT_ARGB;
-   is_scanout = (usage & GBM_BO_USE_SCANOUT) != 0 &&
+   is_scanout = (usage & (GBM_BO_USE_SCANOUT |
+  GBM_BO_USE_SCANOUT_ROTATED_90_270)) != 0 &&
   format == GBM_FORMAT_XRGB;
if (!is_cursor && !is_scanout) {
   errno = EINVAL;
@@ -864,6 +867,8 @@ gbm_dri_bo_create(struct gbm_device *gbm,
 
if (usage & GBM_BO_USE_SCANOUT)
   dri_use |= __DRI_IMAGE_USE_SCANOUT;
+   if (usage & GBM_BO_USE_SCANOUT_ROTATED_90_270)
+  dri_use |= __DRI_IMAGE_USE_SCANOUT_ROTATED_90_270;
if (usage & GBM_BO_USE_CURSOR)
   dri_use |= __DRI_IMAGE_USE_CURSOR;
if (usage & GBM_BO_USE_LINEAR)
diff --git a/src/gbm/main/gbm.h b/src/gbm/main/gbm.h
index 2708e50..2ef7bd8 100644
--- a/src/gbm/main/gbm.h
+++ b/src/gbm/main/gbm.h
@@ -213,6 +213,11 @@ enum gbm_bo_flags {
 * Buffer is linear, i.e. not tiled.
 */
GBM_BO_USE_LINEAR = (1 << 4),
+   /**
+* Buffer would be rotated and some platforms have additional tiling
+* requirements for 90/270 rotated buffers.
+*/
+   GBM_BO_USE_SCANOUT_ROTATED_90_270 = (1 << 5),
 };
 
 int
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 896a125..3c1dc9f 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -520,6 +520,12 @@ intel_create_image(__DRIscreen *screen,
 
if (use & __DRI_IMAGE_USE_LINEAR)
   tiling = I915_TILING_NONE;
+   else if (use & __DRI_IMAGE_USE_SCANOUT_ROTATED_90_270) {
+  if (intelScreen->devinfo->gen >= 9)
+ tiling = I915_TILING_Y;
+  else
+ return NULL;
+   }
 
image = intel_allocate_image(format, loaderPrivate);
if (image == NULL)
-- 
2.4.3
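
A minimal sketch of how the new flag flows from a GBM usage mask to a tiling choice, written against local toy constants so it compiles without the real headers. The TOY_ names are hypothetical; the LINEAR and ROTATED bit values mirror the ones in the diff above, while the SCANOUT bit on the GBM side and the X-tiled default are assumptions for illustration only.

```c
#include <assert.h>

/* Toy copies of the GBM usage bits (SCANOUT value is an assumption). */
enum {
   TOY_GBM_BO_USE_SCANOUT                = (1 << 0),
   TOY_GBM_BO_USE_LINEAR                 = (1 << 4),
   TOY_GBM_BO_USE_SCANOUT_ROTATED_90_270 = (1 << 5)
};

/* Toy copies of the DRI image use bits shown in the patch. */
enum {
   TOY_DRI_IMAGE_USE_SCANOUT                = 0x0002,
   TOY_DRI_IMAGE_USE_LINEAR                 = 0x0008,
   TOY_DRI_IMAGE_USE_SCANOUT_ROTATED_90_270 = 0x0010
};

enum toy_tiling {
   TOY_TILING_NONE,
   TOY_TILING_X,
   TOY_TILING_Y,
   TOY_TILING_UNSUPPORTED   /* stands in for "return NULL" in the driver */
};

/* Translate GBM usage bits into DRI use bits, one flag at a time. */
static unsigned
toy_gbm_to_dri_use(unsigned usage)
{
   unsigned dri_use = 0;

   if (usage & TOY_GBM_BO_USE_SCANOUT)
      dri_use |= TOY_DRI_IMAGE_USE_SCANOUT;
   if (usage & TOY_GBM_BO_USE_SCANOUT_ROTATED_90_270)
      dri_use |= TOY_DRI_IMAGE_USE_SCANOUT_ROTATED_90_270;
   if (usage & TOY_GBM_BO_USE_LINEAR)
      dri_use |= TOY_DRI_IMAGE_USE_LINEAR;
   return dri_use;
}

/* Same decision order as the intel_create_image() hunk: LINEAR wins,
 * then the rotated flag asks for Y-tiling on gen >= 9 and fails otherwise. */
static enum toy_tiling
toy_pick_tiling(unsigned dri_use, int gen)
{
   assert(gen > 0);

   if (dri_use & TOY_DRI_IMAGE_USE_LINEAR)
      return TOY_TILING_NONE;
   if (dri_use & TOY_DRI_IMAGE_USE_SCANOUT_ROTATED_90_270)
      return gen >= 9 ? TOY_TILING_Y : TOY_TILING_UNSUPPORTED;
   return TOY_TILING_X;   /* assumed default for this sketch */
}
```

Note that with this ordering a caller passing both the linear and the rotated flag gets a linear buffer, which may or may not be the intended behaviour.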


Re: [Mesa-dev] [PATCH 4/4] st/mesa: use _mesa_RasterPos() when possible

2015-10-21 Thread Roland Scheidegger

Ahhh legacy functionality from hell...

For the series:
Reviewed-by: Roland Scheidegger 

I'm wondering if it would be possible to omit the rasterPos execution
completely by incorporating it into drawPixels etc., at least in the
hopefully common case where no state changes affect the results in
between. Though I guess that, since the current raster pos always needs to
be maintained, we'd have to keep track of significant state for the case
where we really need to evaluate it...


Roland

Am 22.10.2015 um 00:41 schrieb Brian Paul:
> The st_RasterPos() function goes to great pains to implement the
> rasterpos transformation.  It basically uses gallium's draw module to
> execute the vertex shader to draw a point, then capture that point's
> attributes.
> 
> But glRasterPos isn't typically used with a vertex shader so we can
> usually use the old/fixed-function implementation which is a lot simpler
> and faster.
> 
> This can add up for legacy apps that make a lot of calls to glRasterPos.
> ---
>  src/mesa/state_tracker/st_cb_rasterpos.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/src/mesa/state_tracker/st_cb_rasterpos.c 
> b/src/mesa/state_tracker/st_cb_rasterpos.c
> index b9997da..747b414 100644
> --- a/src/mesa/state_tracker/st_cb_rasterpos.c
> +++ b/src/mesa/state_tracker/st_cb_rasterpos.c
> @@ -39,6 +39,7 @@
>  #include "main/imports.h"
>  #include "main/macros.h"
>  #include "main/feedback.h"
> +#include "main/rastpos.h"
>  
>  #include "st_context.h"
>  #include "st_atom.h"
> @@ -224,6 +225,15 @@ st_RasterPos(struct gl_context *ctx, const GLfloat v[4])
> struct rastpos_stage *rs;
> const struct gl_client_array **saved_arrays = ctx->Array._DrawArrays;
>  
> +   if (ctx->VertexProgram._Current == NULL ||
> +   ctx->VertexProgram._Current == ctx->VertexProgram._TnlProgram) {
> +  /* No vertex shader/program is enabled, used the simple/fast fixed-
> +   * function implementation of RasterPos.
> +   */
> +  _mesa_RasterPos(ctx, v);
> +  return;
> +   }
> +
> if (st->rastpos_stage) {
>/* get rastpos stage info */
>rs = rastpos_stage(st->rastpos_stage);
> 


Re: [Mesa-dev] [PATCH 1/9] ralloc: Set *start in ralloc_vasprintf_rewrite_tail() if str is NULL.

2015-10-21 Thread Kenneth Graunke

On Wednesday, October 21, 2015 03:58:09 PM Matt Turner wrote:
> We were leaving it undefined, even though we were writing a string to
> *str.
> ---
>  src/util/ralloc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/util/ralloc.c b/src/util/ralloc.c
> index e07fce7..bb4cf96 100644
> --- a/src/util/ralloc.c
> +++ b/src/util/ralloc.c
> @@ -499,6 +499,7 @@ ralloc_vasprintf_rewrite_tail(char **str, size_t *start, 
> const char *fmt,
> if (unlikely(*str == NULL)) {
>// Assuming a NULL context is probably bad, but it's expected behavior.
>*str = ralloc_vasprintf(NULL, fmt, args);
> +  *start = strlen(*str);
>return true;
> }

This patch is:
Reviewed-by: Kenneth Graunke 

Thanks for fixing my cheesy string library :)
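
For anyone reading along, the contract this fix restores can be sketched as
below. This is a minimal illustration and not the ralloc code:
sketch_rewrite_tail() is a hypothetical stand-in built on realloc() and
vsnprintf(), it takes variadic arguments instead of a va_list for brevity,
and it assumes *start is meant to hold the index of the terminating NUL
after every successful call, including the one where *str starts out NULL.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a "rewrite tail" helper: formats text at offset
 * *start of *str, growing the buffer, and updates *start to the new length.
 */
static bool
sketch_rewrite_tail(char **str, size_t *start, const char *fmt, ...)
{
   va_list args, copy;

   assert(str != NULL && start != NULL && fmt != NULL);

   va_start(args, fmt);

   /* First pass: measure how many characters the formatted text needs. */
   va_copy(copy, args);
   int len = vsnprintf(NULL, 0, fmt, copy);
   va_end(copy);
   if (len < 0) {
      va_end(args);
      return false;
   }

   /* A NULL string is treated as empty, so the incoming *start is ignored
    * in that case; it may be uninitialized in the caller.
    */
   size_t offset = (*str == NULL) ? 0 : *start;

   char *grown = realloc(*str, offset + (size_t) len + 1);
   if (grown == NULL) {
      va_end(args);
      return false;
   }

   /* Second pass: write the text over the old tail. */
   vsnprintf(grown + offset, (size_t) len + 1, fmt, args);
   va_end(args);

   *str = grown;
   *start = offset + (size_t) len;   /* set on every path, NULL case included */
   return true;
}
```

A caller that starts with a NULL string and an uninitialized offset can then
keep appending, since the offset is valid after the first call.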



Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-10-21 Thread Roland Scheidegger

Am 22.10.2015 um 00:41 schrieb Rowley, Timothy O:
> 
>> On Oct 20, 2015, at 2:03 PM, Roland Scheidegger  wrote:
>>
>> Certainly looks interesting...
>> From a high level point of view, seems quite similar to llvmpipe (both
>> tile based, using llvm for jitting shaders, ...). Of course llvmpipe
>> isn't well suited for these kind of workloads (the most important use
>> case is desktop compositing, so a couple dozen vertices per frame but
>> millions of pixels...). Making vertex loads scale is something which
>> just wasn't worth the effort so far (there's not actually that many
>> people working on llvmpipe), albeit we realize that the completely
>> non-parallel nature of it currently actually can hinder scaling quite a
>> bit even for "typical" workloads (not desktop compositing, but "simple"
>> 3d apps) once you've got enough cores/threads (8 or so), but that's
>> something we're not worried too much about.
>> I think requiring llvm 3.6 probably isn't going to work if you want to
>> upstream this, a minimum version of 3.6 is fine but the general rule is
>> things should still work with newer versions (including current
>> development version, seems like you're using c++ interface of llvm quite
>> a bit so that's probably going to require some #ifdef mess). Albeit I
>> guess if you just don't try to build the driver with non-released
>> versions that's probably ok (but will limit the ability for some people
>> to try out your driver).
> 
> Some differences between llvmpipe and swr based on my understanding of 
> llvmpipe’s architecture:
> 
> threading model
>   llvmpipe: single threaded vertex processing, up to 16 rasterization 
> threads
The limit is actually pretty much arbitrary. Though since vertex
processing is single threaded, there are definitely practical scaling
limits (and having more threads than render tiles wouldn't show any
advantage).

>   swr: common thread pool that pick up frontend or backend work as 
> available
> vertex processing
>   llvmpipe: entire draw call processed in a single pass
>   swr: large draws chopped into chunks that can be processed in parallel
> frontend/backend coupling
>   llvmpipe: separate binning pass in single threaded frontend
>   swr: frontend vertex processing and binning combined in a single pass
There are definite advantages to swr there. llvmpipe's binning pass
isn't really separate from vertex processing, so its being
single-threaded is more a result of vertex processing also being
handled in the same frontend thread (though of course if it were
multithreaded some extra logic would be needed for things to stay
correctly in order).
Part of it is due to draw really being separate from llvmpipe (it can be,
and is, used by other drivers), so the "interface" between vs and fs is
rather simple. But certainly it's not like this is set in stone; rather,
no one had the time to do something a bit more scalable there...

> primitive assembly and binning
>   llvmpipe: scalar c code
There's actually some jit code there plus some manual sse code (though
still with a c fallback). Albeit it is indeed not quite as parallel as I'd
like (it only works on a single primitive at a time).

>   swr: x86 avx/avx2 working on vector of primitives
> fragment processing
>   llvmpipe: single jitted shader combining depth/fragment/stencil/blend 
> on 16x16 block
It is working on a 4x4 block actually, but otherwise that's right.

>   swr: separate jitted fragment and blend shaders, plus templated depth 
> test
> in-memory representation
>   llvmpipe: direct access to render targets
>   swr: hot-tile working representation with load and/or store at required 
> times
This is actually an interesting difference, of course also tied to
llvmpipe integrating everything together into the fragment shader.

So yes, these are all definitely significant architectural differences
to llvmpipe. But most of it (ok, the combined fragment shader / backend
jit code is the exception) is not really due to a conscious design
decision - I'd happily accept patches to make it possible to do vertex
processing in parallel :-).


> As you say, we do use LLVM’s C++ API.  While that has some advantages, it’s 
> not guaranteed to be stable and can/does make nontrivial changes.  3.6 to 3.7 
> made some change to at least the GEP instruction which we could work around 
> if necessary for upstreaming.
IMHO you should really try to keep up at least with llvm releases (and
ideally llvm head). Otherwise you make it a pain to build not just for
users but developers alike (and if stuff doesn't get at least built, it
has a tendency to break quite often when there's gallium interface
changes etc.).


Roland

> 
> -Tim
> 


Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-10-21 Thread Rowley, Timothy O

> On Oct 20, 2015, at 5:58 PM, Jose Fonseca  wrote:
> 
> Thanks for the explanations.  It's closer now, but still a bit of gap:
> 
> $ KNOB_MAX_THREADS_PER_CORE=0 ./gloss
> SWR create screen!
> This processor supports AVX2.
> --> numThreads = 3
> 1102 frames in 5.002 seconds = 220.312 FPS
> 1133 frames in 5.001 seconds = 226.555 FPS
> 1130 frames in 5.002 seconds = 225.91 FPS
> ^C
> $ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=2 ./gloss
> 1456 frames in 5 seconds = 291.2 FPS
> 1617 frames in 5.003 seconds = 323.206 FPS
> 1571 frames in 5.002 seconds = 314.074 FPS

A bit more of an apples to apples comparison might be single-threaded llvmpipe 
(LP_NUM_THREADS=1) and single-threaded swr (KNOB_SINGLE_THREADED=1).  Running 
gloss and glxgears (another favorite “benchmark” :) ) under these conditions 
shows swr running a bit slower, though a little closer than your numbers.  
Examining performance traces, we think swr’s concept of hot-tiles, the working 
memory representation of the render target, and the associated load/store 
functions contribute to most of the difference.  We might be able to optimize 
those conversions; additionally fast clear would help these demos.  For larger 
workloads this small per-frame cost doesn’t really affect the performance.

> One final question: you said that one thread is reserved for the API, but I 
> see all threads (with top `H`) maxing up the CPU. So if the thread reserved 
> for the API is not doing vertex/fragment processing, then what is it using 
> 100% of a CPU thread for?

With a trivial application main loop and light api usage, the API thread is 
going to end up spending most of the time waiting for the other threads to 
finish work.

These initial observations from you and others regarding performance have been 
interesting.  Our performance work has been with large workloads on high core 
count configurations.  There, decisions such as dedicating a core to the 
application/API might cost a bit of performance, but the percentage is much 
smaller than on dual and quad core processors.  We’ll look into some 
changes/tuning that will benefit both extremes, though we might have to end up 
conceding that llvmpipe will be faster at glxgears. :-)  

> Final thoughts: I understand this project has its own history, but I echo 
> what Roland said -- it would be nice to unify with llvmpipe at one point, in 
> some way or fashion.  Our (VMware's) focus has been desktop composition, but 
> there's no reason why a single SW renderer can't satisfy both ends of the 
> spectrum, especially for JIT enable renderers, since they can emit at runtime 
> the code most suited for the workload.

We would be happy for someone to take some of the ideas from swr to speed up 
llvmpipe, but for now our development will continue on the swr core and driver.
We’re not planning on replacing llvmpipe - its intent of working on any 
architecture is admirable.  In the ideal world the solution would be something 
that combines the best traits of both rasterizers, but at this point the 
shortest path to having a performant solution for our customers is with swr. 

> That said, it's really nice seeing Mesa and Gallium enabling this sort of 
> experiments with SW rendering.

Yes, we were quite happy with how fast we were able to get a new driver 
functioning with gallium.  The major thing slowing us was the documentation, 
which is not uniform in coverage.  There was a lot of reading other drivers’ 
source to figure out how things were supposed to work.

-Tim


[Mesa-dev] [Bug 92437] osmesa: Expose GL entry points for Windows build, via .def file

2015-10-21 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=92437

Jose Fonseca  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Jose Fonseca  ---
Pushed. Thanks.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

[Mesa-dev] [PATCH] glsl: join calculate_array_size() and calculate_array_stride()

2015-10-21 Thread Juha-Pekka Heikkila

These helpers run the same loop for the same case. Join their
operation so the loop runs just once. Also handle an
out-of-memory condition here.

Signed-off-by: Juha-Pekka Heikkila 
---
 src/glsl/linker.cpp | 112 +---
 1 file changed, 37 insertions(+), 75 deletions(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 25ca928..175d90b 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -3471,11 +3471,12 @@ is_top_level_shader_storage_block_member(const char* 
name,
 }
 
 static void
-calculate_array_size(struct gl_shader_program *shProg,
- struct gl_uniform_storage *uni)
+calculate_array_size_and_stride(struct gl_shader_program *shProg,
+struct gl_uniform_storage *uni)
 {
int block_index = uni->block_index;
int array_size = -1;
+   int array_stride = -1;
char *var_name = get_top_level_name(uni->name);
char *interface_name =
   get_top_level_name(shProg->BufferInterfaceBlocks[block_index].Name);
@@ -3483,9 +3484,17 @@ calculate_array_size(struct gl_shader_program *shProg,
if (strcmp(var_name, interface_name) == 0) {
   /* Deal with instanced array of SSBOs */
   char *temp_name = get_var_name(uni->name);
+  if (!temp_name) {
+ linker_error(shProg, "Out of memory during linking.\n");
+ goto write_top_level_array_size_and_stride;
+  }
   free(var_name);
   var_name = get_top_level_name(temp_name);
   free(temp_name);
+  if (!var_name) {
+ linker_error(shProg, "Out of memory during linking.\n");
+ goto write_top_level_array_size_and_stride;
+  }
}
 
for (unsigned i = 0; i < shProg->NumShaders; i++) {
@@ -3508,76 +3517,7 @@ calculate_array_size(struct gl_shader_program *shProg,
 const glsl_struct_field *field = &interface->fields.structure[i];
 if (strcmp(field->name, var_name) != 0)
continue;
-/* From GL_ARB_program_interface_query spec:
- *
- * "For the property TOP_LEVEL_ARRAY_SIZE, a single integer
- * identifying the number of active array elements of the top-level
- * shader storage block member containing to the active variable is
- * written to <params>.  If the top-level block member is not
- * declared as an array, the value one is written to <params>.  If
- * the top-level block member is an array with no declared size,
- * the value zero is written to <params>.
- */
-if (is_top_level_shader_storage_block_member(uni->name,
- interface_name,
- var_name))
-   array_size = 1;
-else if (field->type->is_unsized_array())
-   array_size = 0;
-else if (field->type->is_array())
-   array_size = field->type->length;
-else
-   array_size = 1;
 
-goto found_top_level_array_size;
- }
-  }
-   }
-found_top_level_array_size:
-   free(interface_name);
-   free(var_name);
-   uni->top_level_array_size = array_size;
-}
-
-static void
-calculate_array_stride(struct gl_shader_program *shProg,
-   struct gl_uniform_storage *uni)
-{
-   int block_index = uni->block_index;
-   int array_stride = -1;
-   char *var_name = get_top_level_name(uni->name);
-   char *interface_name =
-  get_top_level_name(shProg->BufferInterfaceBlocks[block_index].Name);
-
-   if (strcmp(var_name, interface_name) == 0) {
-  /* Deal with instanced array of SSBOs */
-  char *temp_name = get_var_name(uni->name);
-  free(var_name);
-  var_name = get_top_level_name(temp_name);
-  free(temp_name);
-   }
-
-   for (unsigned i = 0; i < shProg->NumShaders; i++) {
-  if (shProg->Shaders[i] == NULL)
- continue;
-
-  const gl_shader *stage = shProg->Shaders[i];
-  foreach_in_list(ir_instruction, node, stage->ir) {
- ir_variable *var = node->as_variable();
- if (!var || !var->get_interface_type() ||
- var->data.mode != ir_var_shader_storage)
-continue;
-
- const glsl_type *interface = var->get_interface_type();
-
- if (strcmp(interface_name, interface->name) != 0) {
-continue;
- }
-
- for (unsigned i = 0; i < interface->length; i++) {
-const glsl_struct_field *field = &interface->fields.structure[i];
-if (strcmp(field->name, var_name) != 0)
-   continue;
 /* From GL_ARB_program_interface_query:
  *
  * "For the property TOP_LEVEL_ARRAY_STRIDE, a single integer
@@ -3617,14 +3557,37 @@ calculate_array_stride(struct gl_shader_program *shProg,
 } else {
array_stride = 0;
 }
-goto

Re: [Mesa-dev] [PATCH] gallivm: Translate all util_cpu_caps bits to LLVM attributes.

2015-10-21 Thread Roland Scheidegger

Oh and it probably should go to stable.

Roland

Am 21.10.2015 um 18:55 schrieb Roland Scheidegger:
> Thanks for fixing this up.
> 
> Reviewed-by: Roland Scheidegger 
> 
> Am 21.10.2015 um 18:25 schrieb Jose Fonseca:
>> This should prevent disparity between features Mesa and LLVM
>> believe are supported by the CPU.
>>
>> http://lists.freedesktop.org/archives/mesa-dev/2015-October/thread.html#96990
>>
>> Tested on a i7-3720QM w/ LLVM 3.3 and 3.6.
>> ---
>>  src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 34 
>> ++-
>>  1 file changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp 
>> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> index 72fab8c..7073956 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> @@ -498,6 +498,32 @@ 
>> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>> }
>>  
>> llvm::SmallVector MAttrs;
>> +   if (util_cpu_caps.has_sse) {
>> +  MAttrs.push_back("+sse");
>> +   }
>> +   if (util_cpu_caps.has_sse2) {
>> +  MAttrs.push_back("+sse2");
>> +   }
>> +   if (util_cpu_caps.has_sse3) {
>> +  MAttrs.push_back("+sse3");
>> +   }
>> +   if (util_cpu_caps.has_ssse3) {
>> +  MAttrs.push_back("+ssse3");
>> +   }
>> +   if (util_cpu_caps.has_sse4_1) {
>> +#if HAVE_LLVM >= 0x0304
>> +  MAttrs.push_back("+sse4.1");
>> +#else
>> +  MAttrs.push_back("+sse41");
>> +#endif
>> +   }
>> +   if (util_cpu_caps.has_sse4_2) {
>> +#if HAVE_LLVM >= 0x0304
>> +  MAttrs.push_back("+sse4.2");
>> +#else
>> +  MAttrs.push_back("+sse42");
>> +#endif
>> +   }
>> if (util_cpu_caps.has_avx) {
>>/*
>> * AVX feature is not automatically detected from CPUID by the X86 
>> target
>> @@ -509,8 +535,14 @@ 
>> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>if (util_cpu_caps.has_f16c) {
>>   MAttrs.push_back("+f16c");
>>}
>> -  builder.setMAttrs(MAttrs);
>> +  if (util_cpu_caps.has_avx2) {
>> + MAttrs.push_back("+avx2");
>> +  }
>> +   }
>> +   if (util_cpu_caps.has_altivec) {
>> +  MAttrs.push_back("+altivec");
>> }
>> +   builder.setMAttrs(MAttrs);
>>  
>>  #if HAVE_LLVM >= 0x0305
>> StringRef MCPU = llvm::sys::getHostCPUName();
>>
> 


Re: [Mesa-dev] [PATCH] gbm: Add a flag to enable creation of rotated scanout buffers

2015-10-21 Thread Michel Dänzer

On 22.10.2015 10:47, Vivek Kasireddy wrote:
> For certain platforms that support rotated scanout buffers, currently,
> there is no way to create them with the GBM DRI interface. This flag
> will tell the DRI driver to set Y-tiling while creating the rotated
> scanout buffer.

Please split up the GBM and i965 changes.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

Re: [Mesa-dev] [PATCH 4/9] i965: Compact acc_wr_control only on Gen6+.

2015-10-21 Thread Iago Toral

On Tue, 2015-10-20 at 11:38 -0700, Matt Turner wrote:
> On Tue, Oct 20, 2015 at 1:51 AM, Iago Toral  wrote:
> > On Mon, 2015-10-19 at 21:09 -0700, Matt Turner wrote:
> >> It only exists on Gen6+, and the next patches will add compaction
> >> support for the (unused) field in the same location on earlier
> >> platforms.
> >
> > The docs say that this exists also in ILK at least. See Page 131 of:
> > https://01.org/sites/default/files/documentation/ilk_ihd_os_vol4_part2_july_28_10_0.pdf
> >
> > However, I see some places in the i965 code where dealing with this is
> > surrounded by if (gen >= 6)...
> >
> > Is this a bug in the ILK documentation?
> 
> Yes. The ILK docs are terrible and contain more SNB documentation than
> ILK documentation. :(

Ugh! :( Anyway, in that case, patches 4-6 are:

Reviewed-by: Iago Toral Quiroga 

Iago


Re: [Mesa-dev] [PATCH 7/9] i965: Add mask_control_ex field and handle it in compaction.

2015-10-21 Thread Iago Toral

On Mon, 2015-10-19 at 21:09 -0700, Matt Turner wrote:
> Documentation is sparse, but it appears to have existed on G45 and ILK
> as a second bit extension of the mask_control field. Setting the pair of
> bits to 0b11 enables "NoCMask".

It shows up in the compacted table for g45 in bit 23, but bit 28 of
regular instructions seems to be unused as per this document:
https://01.org/sites/default/files/documentation/g45_vol_4_subsystem_0.pdf

There are references to that extension bit in the docs though, so I guess
this is another documentation bug? Of course the ILK docs do not mention
this at all, but I guess that is not surprising.

I have to ask though, how did you find/verify this? :)

Iago

> ---
>  src/mesa/drivers/dri/i965/brw_eu_compact.c | 4 
>  src/mesa/drivers/dri/i965/brw_inst.h   | 2 ++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_compact.c 
> b/src/mesa/drivers/dri/i965/brw_eu_compact.c
> index b122dec..f787ea3 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_compact.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_compact.c
> @@ -1018,6 +1018,8 @@ brw_try_compact_instruction(const struct 
> brw_device_info *devinfo,
>  
> if (devinfo->gen >= 6) {
>compact(acc_wr_control);
> +   } else {
> +  compact(mask_control_ex);
> }
>  
> compact(cond_modifier);
> @@ -1229,6 +1231,8 @@ brw_uncompact_instruction(const struct brw_device_info 
> *devinfo, brw_inst *dst,
>  
> if (devinfo->gen >= 6) {
>uncompact(acc_wr_control);
> +   } else {
> +  uncompact(mask_control_ex);
> }
>  
> uncompact(cond_modifier);
> diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
> b/src/mesa/drivers/dri/i965/brw_inst.h
> index cb3d7e6..819ce59 100644
> --- a/src/mesa/drivers/dri/i965/brw_inst.h
> +++ b/src/mesa/drivers/dri/i965/brw_inst.h
> @@ -182,6 +182,7 @@ F(debug_control,30,  30)
>  F(cmpt_control, 29,  29)
>  FC(branch_control,  28,  28, devinfo->gen >= 8)
>  FC(acc_wr_control,  28,  28, devinfo->gen >= 6)
> +FC(mask_control_ex, 28,  28, devinfo->is_g4x || devinfo->gen == 5)
>  F(cond_modifier,27,  24)
>  FC(math_function,   27,  24, devinfo->gen >= 6)
>  F(exec_size,23,  21)
> @@ -792,6 +793,7 @@ F(cmpt_control, 29, 29) /* Same location as brw_inst 
> */
>  FC(flag_subreg_nr,  28, 28, devinfo->gen <= 6)
>  F(cond_modifier,27, 24) /* Same location as brw_inst */
>  FC(acc_wr_control,  23, 23, devinfo->gen >= 6)
> +FC(mask_control_ex, 23, 23, devinfo->is_g4x || devinfo->gen == 5)
>  F(subreg_index, 22, 18)
>  F(datatype_index,   17, 13)
>  F(control_index,12,  8)



[Mesa-dev] [RFC PATCH] i965: book space at the end of p->store for SEND opcodes to avoid invalid memory access

2015-10-21 Thread Samuel Iglesias Gonsalvez

The caller to brw_next_insn() could be brw_send_indirect_message()
as a result of creating a SEND instruction after the OR
used to load the indirect descriptor to an address register.

In that case, the pointer to the OR instruction is
p->store[p->nr_insn - 1] and, as it will be saved to specify additional
descriptor bits later through brw_set_*_message() calls, we need that
pointer to refer to valid memory.

If we realloc when processing the SEND instruction from the same
brw_send_indirect_message() call, the old p->store could be free'd and
the saved OR instruction pointer becomes invalid. Accessing it to
specify the additional descriptor bits then gives undefined
results.

To avoid that, we reallocate early, when we are close to the end
of p->store and not generating a SEND instruction, or when we have really
run out of store space after several consecutive SENDs at the end of
the store table. That should be enough to avoid this invalid memory
access problem.
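
The growth policy described above can be sketched in isolation as follows.
This is only an illustration of the condition in the patch below, not driver
code: sketch_needs_grow(), the sketch_opcode enum and SKETCH_SEND_HEADROOM
are hypothetical names, and the value 32 is the arbitrary headroom used in
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the real opcode values. */
enum sketch_opcode { SKETCH_OPCODE_OR, SKETCH_OPCODE_SEND };

/* Slots kept free at the end of the store so that a SEND emitted right
 * after another instruction does not trigger a reallocation.
 */
#define SKETCH_SEND_HEADROOM 32

/* Returns true when the store should be grown before emitting the next
 * instruction with the given opcode.
 */
static bool
sketch_needs_grow(size_t nr_insn, size_t store_size, enum sketch_opcode opcode)
{
   assert(nr_insn <= store_size);

   /* Hard limit: there is no free slot left, so we must grow. */
   if (nr_insn + 1 > store_size)
      return true;

   /* Soft limit: grow early, but never while emitting a SEND, because a
    * pointer to the preceding instruction may still be in use.
    */
   return opcode != SKETCH_OPCODE_SEND &&
          nr_insn + SKETCH_SEND_HEADROOM > store_size;
}
```

With a store of 1024 entries, a non-SEND instruction triggers growth once
993 entries are in use, while a SEND only does so when all 1024 are taken.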

Fixes ~120 dEQP-GLES31.functional.ssbo.* tests.

Signed-off-by: Samuel Iglesias Gonsalvez 
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index bf2fee9..3b97cfa 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -810,7 +810,17 @@ brw_next_insn(struct brw_codegen *p, unsigned opcode)
const struct brw_device_info *devinfo = p->devinfo;
brw_inst *insn;
 
-   if (p->nr_insn + 1 > p->store_size) {
+
+   /* Book enough room for one or more consecutive SEND* instructions at the
+* end of the p->store table in order to avoid reallocating p->store in
+* the middle of brw_send_indirect_message().
+*
+* The 32 value was chosen arbitrary to make this problem less likely to
+* happen and because it has low impact in p->store (its initial size is
+* 1024).
+*/
+   if ((p->nr_insn + 32 > p->store_size && opcode != BRW_OPCODE_SEND) ||
+   (p->nr_insn + 1 > p->store_size)) {
   p->store_size <<= 1;
   p->store = reralloc(p->mem_ctx, p->store, brw_inst, p->store_size);
}
-- 
2.1.4


[Mesa-dev] i965: Invalid memory accesses after resizing brw_codegen's store table

2015-10-21 Thread Samuel Iglesias Gonsalvez

Hello,

I have found several invalid memory accesses when running
dEQP-GLES31.functional.ssbo.* tests on the i965 driver (gen7+). Those
invalid memory accesses happened when generating the assembly
instructions for SSBO stores in different compute shaders.

However it looks like this problem could happen to other shaders and
situations. Because of that, I am going to explain the problem here:

When generating an untyped surface write/read, the i965 driver will end up
calling brw_send_indirect_message() through
brw_send_indirect_surface_message(). At brw_send_indirect_message()'s
'else' branch, the code generates a load of the indirect descriptor to
an address register using an OR instruction and it also generates a new
SEND instruction; if this case happens, the OR instruction is returned.
brw_send_indirect_surface_message() uses that OR instruction to set mlen
and rlen's descriptor bits later.

Just to give more context, when generating instructions in fs/vec4
generators, i965 driver uses pointers to elements in the 'store' table
inside struct brw_codegen. That table has an initial size of 1024 but,
when it's full, it is resized (doubling its size each time,
see brw_next_insn()). This resize operation ends up calling
realloc(). However, the pointer returned by realloc() could be different
and the old allocated memory would be free'd as part of the process.

Back to the issue, if the p->store's resize happens when we get the
pointer to the SEND instruction at brw_send_indirect_message()'s 'else'
branch, we could have the following problem:

realloc() returns a new pointer and *frees* the old allocation, so
the pointer we previously saved for the OR instruction in
brw_send_indirect_surface_message() becomes invalid (because it points
into the old allocation). Then, we access that invalid pointer
when setting up rlen/mlen bits at the end of
brw_send_indirect_surface_message() and we would have undefined results.
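
To make the failure mode concrete, here is a minimal sketch of one possible
alternative: remembering the index of the OR instruction instead of a pointer
to it, so that a realloc() in between cannot invalidate it. This is not what
the work-around patch does and it is not i965 code. All names (sketch_insn,
sketch_codegen, sketch_next_insn, sketch_send_indirect, sketch_set_desc) are
hypothetical, the opcode numbers are made up, and allocation failure is
handled with an assert only to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified instruction and codegen state. */
struct sketch_insn {
   unsigned opcode;
   unsigned desc;
};

struct sketch_codegen {
   struct sketch_insn *store;
   size_t nr_insn;
   size_t store_size;
};

/* Append an instruction, growing the store if needed, and return its
 * index. The index stays valid even if the store moves in memory.
 */
static size_t
sketch_next_insn(struct sketch_codegen *p, unsigned opcode)
{
   if (p->nr_insn + 1 > p->store_size) {
      size_t new_size = p->store_size ? p->store_size * 2 : 4;
      struct sketch_insn *grown = realloc(p->store, new_size * sizeof(*grown));
      assert(grown != NULL);   /* sketch only: no error recovery */
      p->store = grown;        /* any old pointer into the store is now stale */
      p->store_size = new_size;
   }
   p->store[p->nr_insn].opcode = opcode;
   p->store[p->nr_insn].desc = 0;
   return p->nr_insn++;
}

/* Emit an OR followed by a SEND and hand back the index of the OR. */
static size_t
sketch_send_indirect(struct sketch_codegen *p)
{
   size_t or_idx = sketch_next_insn(p, 1 /* made-up OR opcode */);
   (void) sketch_next_insn(p, 2 /* made-up SEND opcode */);   /* may realloc */
   return or_idx;
}

/* Set descriptor bits later, resolving the index against the current store. */
static void
sketch_set_desc(struct sketch_codegen *p, size_t idx, unsigned desc)
{
   assert(idx < p->nr_insn);
   p->store[idx].desc = desc;
}
```

The trade-off is that every user of the returned value has to go through
p->store again, which touches more callers than reserving headroom does.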

This issue is quite unlikely to happen but it is reproducible on
~120 dEQP-GLES31.functional.ssbo.* tests, basically because they have
the same shaders except the buffer variable's data type. Those tests
were failing intermittently at different rates but valgrind helped to
find what was happening.

I would like to make the problem public and analyse possible
solutions for it together with the community. For the time being, a patch
is proposed to mitigate this issue, which is sent as a reply to this
one.

What this work-around patch does is book enough room for one or more
consecutive SEND* instructions at the end of the p->store table in order
to avoid reallocating p->store in the aforementioned case. The value 32
was chosen arbitrarily because it has low impact on p->store
(its initial size is 1024) and makes this issue much less likely to
happen. We could tune this number to a less conservative value if
needed. If you want to test it, that patch should be applied on top of
Curro's patch [0], as it fixes a lot of compute shader compilation
errors (~700 dEQP-GLES31.functional.ssbo.* tests). I have set up a branch
with both patches in [1].

Feel free to comment about other solutions, ideas, opinions, etc.

Thanks,

Sam

[0] See attachment at
http://lists.freedesktop.org/archives/mesa-dev/2015-October/097183.html
[1] $ git clone -b dEQP-functional-ssbo-fixes-v1 \
https://github.com/Igalia/mesa.git

Samuel Iglesias Gonsalvez (1):
  i965: book space at the end of p->store for SEND opcodes to avoid
invalid memory access

 src/mesa/drivers/dri/i965/brw_eu_emit.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

-- 
2.1.4


[Mesa-dev] [PATCH 2/4] st/dri2: Add shared flag to missing locations

2015-10-21 Thread Axel Davy

The PIPE_BIND_SHARED flag should be added whenever
the resource may be shared with another process.

In particular if the resource is imported, or may
be exported, the flag should be used.

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/dri/dri2.c | 9 +++--
 src/gallium/state_trackers/dri/dri_drawable.c | 3 ++-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/dri/dri2.c 
b/src/gallium/state_trackers/dri/dri2.c
index 019414b..5f5bc86 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -554,7 +554,8 @@ dri2_allocate_textures(struct dri_context *ctx,
 
  if (drawable->textures[statt]) {
 templ.format = drawable->textures[statt]->format;
-templ.bind = drawable->textures[statt]->bind & ~PIPE_BIND_SCANOUT;
+templ.bind = drawable->textures[statt]->bind &
+   ~(PIPE_BIND_SCANOUT | PIPE_BIND_SHARED);
 templ.nr_samples = drawable->stvis.samples;
 
 /* Try to reuse the resource.
@@ -717,7 +718,8 @@ dri2_create_image_from_winsys(__DRIscreen *_screen,
unsigned tex_usage;
enum pipe_format pf;
 
-   tex_usage = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW;
+   tex_usage = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
+  PIPE_BIND_SHARED;
 
switch (format) {
case __DRI_IMAGE_FORMAT_RGB565:
@@ -1089,6 +1091,9 @@ dri2_create_from_texture(__DRIcontext *context, int 
target, unsigned texture,
   return NULL;
}
 
+   /* TODO: The initial texture was not created with the PIPE_BIND_SHARED flag.
+* There should be a way to add this flag after creation. This flag is
+* needed for EGLImages. */
pipe_resource_reference(>texture, tex);
 
*error = __DRI_IMAGE_ERROR_SUCCESS;
diff --git a/src/gallium/state_trackers/dri/dri_drawable.c 
b/src/gallium/state_trackers/dri/dri_drawable.c
index f0cc4a2..04041d6 100644
--- a/src/gallium/state_trackers/dri/dri_drawable.c
+++ b/src/gallium/state_trackers/dri/dri_drawable.c
@@ -285,7 +285,8 @@ dri_drawable_get_format(struct dri_drawable *drawable,
* to use an sRGB format here.
*/
   *format = util_format_linear(drawable->stvis.color_format);
-  *bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW;
+  *bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
+ PIPE_BIND_SHARED;
   break;
case ST_ATTACHMENT_DEPTH_STENCIL:
   *format = drawable->stvis.depth_stencil_format;
-- 
2.6.1


[Mesa-dev] [PATCH 3/4] dri: Add backbuffer use flag

2015-10-21 Thread Axel Davy

Add __DRI_IMAGE_USE_BACKBUFFER to indicate the
image is going to be used as a backbuffer.

Backbuffers are going to be attached as
__DRI_BUFFER_BACK_LEFT or
__DRI_BUFFER_BACK_RIGHT.

This flag enables the driver to assume the
buffer will only be read by an external process after
a swapbuffer, in contrast to gbm buffers,
front buffers and fake front buffers, which could be
read after a flush.

Signed-off-by: Axel Davy 
---
 include/GL/internal/dri_interface.h | 1 +
 src/egl/drivers/dri2/platform_wayland.c | 3 ++-
 src/glx/dri3_glx.c  | 6 --
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index a0f155a..555894a 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -1091,6 +1091,7 @@ struct __DRIdri2ExtensionRec {
 #define __DRI_IMAGE_USE_SCANOUT0x0002
 #define __DRI_IMAGE_USE_CURSOR 0x0004 /* Depricated */
 #define __DRI_IMAGE_USE_LINEAR 0x0008
+#define __DRI_IMAGE_USE_BACKBUFFER 0x0010
 
 
 /**
diff --git a/src/egl/drivers/dri2/platform_wayland.c 
b/src/egl/drivers/dri2/platform_wayland.c
index 92ff2af..1fbc271 100644
--- a/src/egl/drivers/dri2/platform_wayland.c
+++ b/src/egl/drivers/dri2/platform_wayland.c
@@ -352,7 +352,8 @@ get_back_bo(struct dri2_egl_surface *dri2_surf)
if (dri2_surf->back == NULL)
   return -1;
 
-   use_flags = __DRI_IMAGE_USE_SHARE | __DRI_IMAGE_USE_SCANOUT;
+   use_flags = __DRI_IMAGE_USE_SHARE | __DRI_IMAGE_USE_SCANOUT |
+  __DRI_IMAGE_USE_BACKBUFFER;
 
if (dri2_dpy->is_different_gpu &&
dri2_surf->back->linear_copy == NULL) {
diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c
index 96f13e6..feee6e6 100644
--- a/src/glx/dri3_glx.c
+++ b/src/glx/dri3_glx.c
@@ -880,7 +880,8 @@ dri3_alloc_render_buffer(struct glx_screen *glx_screen, 
Drawable draw,
   width, height,
   format,
   __DRI_IMAGE_USE_SHARE |
-  __DRI_IMAGE_USE_SCANOUT,
+  __DRI_IMAGE_USE_SCANOUT |
+  __DRI_IMAGE_USE_BACKBUFFER,
   buffer);
   pixmap_buffer = buffer->image;
 
@@ -904,7 +905,8 @@ dri3_alloc_render_buffer(struct glx_screen *glx_screen, 
Drawable draw,
   width, height,
   format,
   
__DRI_IMAGE_USE_SHARE |
-  
__DRI_IMAGE_USE_LINEAR,
+  
__DRI_IMAGE_USE_LINEAR |
+  
__DRI_IMAGE_USE_BACKBUFFER,
   buffer);
   pixmap_buffer = buffer->linear_buffer;
 
-- 
2.6.1


[Mesa-dev] [PATCH 4/4] pipe: Add new bind flag for shared resources with flush_resource call

2015-10-21 Thread Axel Davy

Add a new bind flag to differentiate shared resources
that must be readable after any flush, or that can afford
being readable only after flush_resource.

Previously the two cases were mixed, and implicitly things were done
such that there would be no issues.

flush_resource is called for:
. st/nine back buffers
. dri2 and dri3 back buffers (both wayland and x11)

flush_resource is not called for:
. gbm buffers
. dri2 and dri3 x11 (fake/real) front buffers
. EGLImages (they can be shared)

I didn't look at what the other state trackers do, but a grep
said there is no flush_resource call outside dri2 and nine state
trackers.
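
As a rough illustration of the distinction described above, here is a minimal C sketch. It is not part of the patch: the SKETCH_* names and both helper functions are hypothetical, and only the two bit positions are taken from the p_defines.h hunk of this patch.

```c
#include <stdbool.h>

/* Stand-in constants (hypothetical names). The bit positions mirror the
 * PIPE_BIND_SHARED and PIPE_BIND_SHARED_FLUSH_RESOURCE values quoted in
 * the p_defines.h hunk of this patch. */
#define SKETCH_BIND_SHARED                (1u << 19)
#define SKETCH_BIND_SHARED_FLUSH_RESOURCE (1u << 21)

/* Hypothetical helper: must the resource be readable by an external
 * process after any ordinary flush? True when it is shared but was not
 * marked as flush_resource-only. */
static bool
sketch_readable_after_any_flush(unsigned bind)
{
   return (bind & SKETCH_BIND_SHARED) &&
          !(bind & SKETCH_BIND_SHARED_FLUSH_RESOURCE);
}

/* Hypothetical helper: may the driver keep the resource in a private
 * layout (for example compressed) until flush_resource is called?
 * Both flags must be set, since the new flag only refines the shared one. */
static bool
sketch_can_defer_to_flush_resource(unsigned bind)
{
   const unsigned both = SKETCH_BIND_SHARED |
                         SKETCH_BIND_SHARED_FLUSH_RESOURCE;
   return (bind & both) == both;
}
```

Under these assumptions, a back buffer created with both flags reports that it can wait for flush_resource, while a gbm buffer created with only the shared flag reports that it must be readable after any flush.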

Signed-off-by: Axel Davy 
---
 src/gallium/include/pipe/p_defines.h  |  8 
 src/gallium/state_trackers/dri/dri2.c | 17 +
 src/gallium/state_trackers/dri/dri_drawable.c |  9 +++--
 src/gallium/state_trackers/nine/swapchain9.c  | 10 +++---
 4 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/src/gallium/include/pipe/p_defines.h 
b/src/gallium/include/pipe/p_defines.h
index 1ad545a..f877893 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -399,6 +399,14 @@ enum pipe_flush_flags
 #define PIPE_BIND_SHARED  (1 << 19) /* get_texture_handle ??? */
 #define PIPE_BIND_LINEAR  (1 << 20)
 
+/* This flag indicates that in addition to being shared, the resource won't be
+ * read by any external process before we call flush_resource. This allows
+ * things like compressing the buffer when drawing, while uncompressing on
+ * flush_resource. The PIPE_BIND_SHARED must still be set with this flag.
+ * If PIPE_BIND_SHARED is specified but not
+ * PIPE_BIND_SHARED_FLUSH_RESOURCE, then the resource must be
+ * readable by external processes after any normal flush. */
+#define PIPE_BIND_SHARED_FLUSH_RESOURCE   (1 << 21)
 
 /**
  * Flags for the driver about resource behaviour:
diff --git a/src/gallium/state_trackers/dri/dri2.c 
b/src/gallium/state_trackers/dri/dri2.c
index 5f5bc86..74b398f 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -291,27 +291,25 @@ dri2_allocate_buffer(__DRIscreen *sPriv,
struct dri2_buffer *buffer;
struct pipe_resource templ;
enum pipe_format pf;
-   unsigned bind = 0;
+   unsigned bind = PIPE_BIND_SHARED; /* because we get the handle and stride */
struct winsys_handle whandle;
 
switch (attachment) {
   case __DRI_BUFFER_FRONT_LEFT:
   case __DRI_BUFFER_FAKE_FRONT_LEFT:
- bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW;
+ bind |= PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW;
  break;
   case __DRI_BUFFER_BACK_LEFT:
- bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW;
+ bind |= PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
+PIPE_BIND_SHARED_FLUSH_RESOURCE;
  break;
   case __DRI_BUFFER_DEPTH:
   case __DRI_BUFFER_DEPTH_STENCIL:
   case __DRI_BUFFER_STENCIL:
-bind = PIPE_BIND_DEPTH_STENCIL; /* XXX sampler? */
+bind |= PIPE_BIND_DEPTH_STENCIL; /* XXX sampler? */
  break;
}
 
-   /* because we get the handle and stride */
-   bind |= PIPE_BIND_SHARED;
-
switch (format) {
   case 32:
  pf = PIPE_FORMAT_BGRA_UNORM;
@@ -555,7 +553,8 @@ dri2_allocate_textures(struct dri_context *ctx,
  if (drawable->textures[statt]) {
 templ.format = drawable->textures[statt]->format;
 templ.bind = drawable->textures[statt]->bind &
-   ~(PIPE_BIND_SCANOUT | PIPE_BIND_SHARED);
+   ~(PIPE_BIND_SCANOUT | PIPE_BIND_SHARED |
+ PIPE_BIND_SHARED_FLUSH_RESOURCE);
 templ.nr_samples = drawable->stvis.samples;
 
 /* Try to reuse the resource.
@@ -834,6 +833,8 @@ dri2_create_image(__DRIscreen *_screen,
   tex_usage |= PIPE_BIND_SCANOUT;
if (use & __DRI_IMAGE_USE_SHARE)
   tex_usage |= PIPE_BIND_SHARED;
+   if (use & __DRI_IMAGE_USE_BACKBUFFER)
+  tex_usage |= PIPE_BIND_SHARED | PIPE_BIND_SHARED_FLUSH_RESOURCE;
if (use & __DRI_IMAGE_USE_LINEAR)
   tex_usage |= PIPE_BIND_LINEAR;
if (use & __DRI_IMAGE_USE_CURSOR) {
diff --git a/src/gallium/state_trackers/dri/dri_drawable.c 
b/src/gallium/state_trackers/dri/dri_drawable.c
index 04041d6..badd111 100644
--- a/src/gallium/state_trackers/dri/dri_drawable.c
+++ b/src/gallium/state_trackers/dri/dri_drawable.c
@@ -276,9 +276,7 @@ dri_drawable_get_format(struct dri_drawable *drawable,
 {
switch (statt) {
case ST_ATTACHMENT_FRONT_LEFT:
-   case ST_ATTACHMENT_BACK_LEFT:
case ST_ATTACHMENT_FRONT_RIGHT:
-   case ST_ATTACHMENT_BACK_RIGHT:
   /* Other pieces of the driver stack get confused and behave incorrectly
* when they get an sRGB drawable. st/mesa receives "drawable->stvis"
* though other means and handles it correctly, so we don't really need
@@ -288,6 +286,13 @@

Re: [Mesa-dev] [PATCH 2/4] mesa: Draw Indirect is not allowed when no vertex array binding exists.

2015-10-21 Thread Marek Olšák

On Wed, Oct 21, 2015 at 7:16 AM, Tapani Pälli  wrote:
> On 10/20/2015 08:54 PM, Marek Olšák wrote:
>>
>> On Tue, Oct 20, 2015 at 4:19 PM, Marta Lofstedt
>>  wrote:
>>>
>>> From: Marta Lofstedt 
>>>
>>> OpenGL ES 3.1 spec. section 10.5:
>>> "An INVALID_OPERATION error is generated if zero is bound
>>> to VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to
>>> any enabled vertex array."
>>>
>>> Signed-off-by: Marta Lofstedt 
>>> ---
>>>   src/mesa/main/api_validate.c | 14 ++
>>>   1 file changed, 14 insertions(+)
>>>
>>> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
>>> index c5628f5..7062cbd 100644
>>> --- a/src/mesa/main/api_validate.c
>>> +++ b/src/mesa/main/api_validate.c
>>> @@ -711,6 +711,20 @@ valid_draw_indirect(struct gl_context *ctx,
>>> return GL_FALSE;
>>>  }
>>>
>>> +   /*
>>> +* OpenGL ES 3.1 spec. section 10.5:
>>> +* "An INVALID_OPERATION error is generated if zero is bound to
>>> +* VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to any enabled
>>> +* vertex array."
>>> +* OpenGL 4.5 spec. section 10.4:
>>> +* "An INVALID_OPERATION error is generated if  zero is bound to
>>> +* DRAW_INDIRECT_BUFFER, or if  no element array buffer is bound"
>>> +*/
>>> +   if (!_mesa_is_bufferobj(ctx->Array.ArrayBufferObj)) {
>>> +  _mesa_error(ctx, GL_INVALID_OPERATION,
>>> +  "%s(No VBO is bound)", name);
>>> +   }
>>
>> NAK.
>>
>> VERTEX_ARRAY_BINDING is a VAO. Array.ArrayBufferObj is from glBindBuffer.
>
>
> This check is valid, it is not against VERTEX_ARRAY_BINDING. Note "any
> enabled vertex array", we hit this weird situation when client has a VAO
> bound and has enabled vertex attrib array but has not bound any VBO to it.

No, it's invalid. The check has absolutely nothing to do with enabled
vertex arrays and draw calls. Absolutely nothing. glBindBuffer changes
a latched state, which means it doesn't do anything by itself, it only
affects other functions that change states. The functions affected by
glBindBuffer(GL_ARRAY_BUFFER, ..) are glVertexAttribPointer, etc. not
glDraw*. If you called glBindBuffer(GL_ARRAY_BUFFER, ..) right before
a Draw call, it wouldn't do anything to vertex arrays and buffers, but
it would pass the check.

Now, where does this patch check "enabled vertex arrays"? Nowhere. It
doesn't check VERTEX_ARRAY_BINDING, it doesn't check
DRAW_INDIRECT_BUFFER, and it doesn't check enabled vertex arrays. That
whole comment is completely useless there.
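
To make the latched-state argument concrete, here is a toy C model. It is only a sketch: none of the toy_* names exist in Mesa or OpenGL, and the model assumes nothing beyond what is said above, namely that the array-buffer binding is captured when an attribute pointer is set and is not consulted at draw time.

```c
#include <stdbool.h>

#define TOY_MAX_ATTRIBS 4

/* Toy model only; none of these names exist in Mesa or OpenGL. */
struct toy_attrib {
   bool enabled;
   unsigned buffer;        /* buffer captured when the pointer was set, 0 = none */
};

struct toy_context {
   unsigned array_buffer;  /* latched array-buffer binding, 0 = none */
   struct toy_attrib attribs[TOY_MAX_ATTRIBS];
};

/* Models glBindBuffer(GL_ARRAY_BUFFER, buf): only the latch changes. */
static void
toy_bind_array_buffer(struct toy_context *ctx, unsigned buf)
{
   ctx->array_buffer = buf;
}

/* Models glVertexAttribPointer: the latched buffer is captured here. */
static void
toy_attrib_pointer(struct toy_context *ctx, unsigned index)
{
   ctx->attribs[index].buffer = ctx->array_buffer;
}

static void
toy_enable_attrib(struct toy_context *ctx, unsigned index)
{
   ctx->attribs[index].enabled = true;
}

/* A check that looks only at the latch, in the style of the one under review. */
static bool
toy_check_latch(const struct toy_context *ctx)
{
   return ctx->array_buffer != 0;
}

/* A check that looks at what the enabled arrays actually captured. */
static bool
toy_check_enabled_attribs(const struct toy_context *ctx)
{
   for (unsigned i = 0; i < TOY_MAX_ATTRIBS; i++) {
      if (ctx->attribs[i].enabled && ctx->attribs[i].buffer == 0)
         return false;
   }
   return true;
}
```

In this model, setting and enabling attribute 0 with no buffer bound and then binding a buffer just before the draw makes toy_check_latch() pass even though the enabled attribute still has no buffer, which toy_check_enabled_attribs() does catch.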

Sorry if I'm too direct, but you should really think more before
making such statements and giving Reviewed-by.

Marek

Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-21 Thread Neil Roberts

Ilia Mirkin  writes:

>> -   if (tex_inst->opcode == SHADER_OPCODE_TG4 ||
>> +   if (tex_inst->opcode == SHADER_OPCODE_TXS ||
>> +   tex_inst->opcode == SHADER_OPCODE_LOD ||
>> +   tex_inst->opcode == SHADER_OPCODE_TG4 ||
>> tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET)
>
> Do you also need to include SHADER_OPCODE_SAMPLEINFO?

Oops, yes, thanks! I've gone ahead and pushed it with this additional
change.

Regards,
- Neil

Re: [Mesa-dev] [PATCH v3 1/7] radeonsi: Allocate buffers for DCC.

2015-10-21 Thread Bas Nieuwenhuizen

On Wed, Oct 21, 2015 at 9:56 AM, Axel Davy  wrote:
> On 21/10/2015 00:10, Bas Nieuwenhuizen wrote:
>>
>>
>> DCC is disabled for textures that can be shared as sharing the
>> DCC buffers has not been implemented yet.
>>
>>
>>   +   surf->dcc_enabled =  !(surf->flags & RADEON_SURF_Z_OR_SBUFFER) &&
>> +!(surf->flags & RADEON_SURF_SCANOUT) &&
>> +!compressed && AddrDccIn.numSamples <= 1;
>> +
>>
>
> Testing if a surface is scanout is not enough to avoid shared surfaces.
>
> In practice, it may be true that mesa, and glamor via gbm, currently use
> the scanout flag for shared buffers. It seems, however, a bit weak to
> rely on that.
>
> I suggest rather to use the pipe shared bind flag.
>
> I noticed that in some cases of imported surfaces the bind flag is not
> advertised; I'm going to send a patch to fix that.

The commit message indeed does not agree with the commit.

Note that with the DCC decompress patches we essentially have the same
behavior as the CMASK fast clear for shared non-displayable surfaces:
after writing and before using it in another application you need to
call flush_resource.

- Bas

Re: [Mesa-dev] [PATCH 4/7] glsl: set image access qualifiers for AoA

2015-10-21 Thread Francisco Jerez

Timothy Arceri  writes:

> On Fri, 2015-10-16 at 10:28 +1100, Timothy Arceri wrote:
>> Cc: Francisco Jerez 
>
> Hi Curro,
>
> Just pinging you on this patch and patch 5. These are the final two
> patches remaining unreviewed before I can enable arrays of arrays.
>
> If you're not able to review these can you let me know so I can chase
> this up with someone else, as I'd like to enable this as soon as
> possible to limit breakage.
>

Depends when you want them reviewed, doesn't seem like the kind of thing
I could convince myself is correct during a break.  I'm unlikely to have
time to review it this week.  I might next week but I cannot give you
any guarantees.  If that's not acceptable for you feel free to look for
someone else to review them.

> Thanks,
> Tim
>
>
>> ---
>>  src/glsl/link_uniforms.cpp | 77 +---
>> --
>>  1 file changed, 49 insertions(+), 28 deletions(-)
>> 
>> diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp
>> index 647aa2b..2a1da07 100644
>> --- a/src/glsl/link_uniforms.cpp
>> +++ b/src/glsl/link_uniforms.cpp
>> @@ -1008,38 +1008,37 @@ link_update_uniform_buffer_variables(struct
>> gl_shader *shader)
>> }
>>  }
>>  
>> -/**
>> - * Scan the program for image uniforms and store image unit access
>> - * information into the gl_shader data structure.
>> - */
>>  static void
>> -link_set_image_access_qualifiers(struct gl_shader_program *prog)
>> +link_set_image_access_qualifiers(struct gl_shader_program *prog,
>> + gl_shader *sh, unsigned
>> shader_stage,
>> + ir_variable *var, const glsl_type
>> *type,
>> + char **name, size_t name_length)
>>  {
>> -   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
>> -  gl_shader *sh = prog->_LinkedShaders[i];
>> -
>> -  if (sh == NULL)
>> - continue;
>> +   /* Handle arrays of arrays */
>> +   if (type->is_array() && type->fields.array->is_array()) {
>> +  for (unsigned i = 0; i < type->length; i++) {
>> + size_t new_length = name_length;
>>  
>> -  foreach_in_list(ir_instruction, node, sh->ir) {
>> - ir_variable *var = node->as_variable();
>> + /* Append the subscript to the current variable name */
>> + ralloc_asprintf_rewrite_tail(name, &new_length, "[%u]", i);
>>  
>> - if (var && var->data.mode == ir_var_uniform &&
>> - var->type->contains_image()) {
>> -unsigned id = 0;
>> -bool found = prog->UniformHash->get(id, var->name);
>> -assert(found);
>> -(void) found;
>> -const gl_uniform_storage *storage = 
>> &prog->UniformStorage[id];
>> -const unsigned index = storage->opaque[i].index;
>> -const GLenum access = (var->data.image_read_only ?
>> GL_READ_ONLY :
>> -   var->data.image_write_only ?
>> GL_WRITE_ONLY :
>> -   GL_READ_WRITE);
>> -
>> -for (unsigned j = 0; j < MAX2(1, storage
>> ->array_elements); ++j)
>> -   sh->ImageAccess[index + j] = access;
>> - }
>> + link_set_image_access_qualifiers(prog, sh, shader_stage,
>> var,
>> +  type->fields.array, name,
>> +  new_length);
>>}
>> +   } else {
>> +  unsigned id = 0;
>> +  bool found = prog->UniformHash->get(id, *name);
>> +  assert(found);
>> +  (void) found;
>> +  const gl_uniform_storage *storage = &prog->UniformStorage[id];
>> +  const unsigned index = storage->opaque[shader_stage].index;
>> +  const GLenum access = (var->data.image_read_only ?
>> GL_READ_ONLY :
>> + var->data.image_write_only ?
>> GL_WRITE_ONLY :
>> + GL_READ_WRITE);
>> +
>> +  for (unsigned j = 0; j < MAX2(1, storage->array_elements);
>> ++j)
>> + sh->ImageAccess[index + j] = access;
>> }
>>  }
>>  
>> @@ -1300,7 +1299,29 @@ link_assign_uniform_locations(struct
>> gl_shader_program *prog,
>> prog->NumHiddenUniforms = hidden_uniforms;
>> prog->UniformStorage = uniforms;
>>  
>> -   link_set_image_access_qualifiers(prog);
>> +   /**
>> +* Scan the program for image uniforms and store image unit
>> access
>> +* information into the gl_shader data structure.
>> +*/
>> +   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
>> +  gl_shader *sh = prog->_LinkedShaders[i];
>> +
>> +  if (sh == NULL)
>> + continue;
>> +
>> +  foreach_in_list(ir_instruction, node, sh->ir) {
>> + ir_variable *var = node->as_variable();
>> +
>> + if (var && var->data.mode == ir_var_uniform &&
>> + var->type->contains_image()) {
>> +char *name_copy = ralloc_strdup(NULL, var->name);
>> +link_set_image_access_qualifiers(prog, sh, i, var, var
>> ->type,
>> +

[Mesa-dev] [PATCH] glsl: fix shader storage block member rules when adding program resources

2015-10-21 Thread Samuel Iglesias Gonsalvez

Commit f24e5e did not take into account arrays of named shader
storage blocks.

Fixes 20 dEQP-GLES31.functional.ssbo.* tests:

dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.2
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.29
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.33
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.3
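
The name comparison this patch introduces can be sketched as a standalone C function. This is a simplified illustration, not the code in linker.cpp: sketch_name_matches_block is a hypothetical name, and the example strings mentioned afterwards are made up.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does the buffer variable name `name` belong to the
 * interface block called `block_name`?
 *
 * - A trailing "[x]" in the block name (array of named blocks) is ignored.
 * - If the variable name contains a dot, the part before the first dot
 *   must have exactly the same length as the block name, so that a block
 *   called "Block" does not match a variable of "BlockB". */
static bool
sketch_name_matches_block(const char *block_name, const char *name)
{
   size_t block_name_len = strlen(block_name);

   /* Drop the "[x]" suffix from the length used for the comparison. */
   const char *block_bracket = strchr(block_name, '[');
   if (block_bracket)
      block_name_len -= strlen(block_bracket);

   /* Compare the prefix length before the first dot, if there is one. */
   const char *first_dot = strchr(name, '.');
   if (first_dot) {
      size_t prefix_len = (size_t)(first_dot - name);
      if (prefix_len != block_name_len)
         return false;
   }

   return strncmp(block_name, name, block_name_len) == 0;
}
```

Under these assumptions a block named "Block[0]" matches the variable name "Block.a", while a block named "Block" does not match "BlockB.a", which a plain prefix comparison would have accepted.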

Signed-off-by: Samuel Iglesias Gonsalvez 
---
 src/glsl/linker.cpp | 34 +-
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 07ea0e0..6593e58 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -3138,6 +3138,9 @@ should_add_buffer_variable(struct gl_shader_program 
*shProg,
 {
bool found_interface = false;
const char *block_name = NULL;
+   unsigned block_name_len = 0;
+   const char *first_dot = strchr(name, '.');
+   const char *first_square_bracket = strchr(name, '[');
 
/* These rules only apply to buffer variables. So we return
 * true for the rest of types.
@@ -3147,7 +3150,27 @@ should_add_buffer_variable(struct gl_shader_program 
*shProg,
 
for (unsigned i = 0; i < shProg->NumBufferInterfaceBlocks; i++) {
   block_name = shProg->BufferInterfaceBlocks[i].Name;
-  if (strncmp(block_name, name, strlen(block_name)) == 0) {
+  block_name_len = strlen(block_name);
+
+  const char *first_block_square_bracket = strchr(block_name, '[');
+  if (first_block_square_bracket) {
+ /* The block is part of an array of named interfaces,
+  * for the name comparison we ignore the "[x]" part.
+  */
+ block_name_len -= strlen(first_block_square_bracket);
+  }
+
+  if (first_dot) {
+ /* Check if the variable name starts with the interface
+  * name. The interface name (if present) should have the same
+  * length as the interface block name we are comparing to.
+  */
+ unsigned len = strlen(name) - strlen(first_dot);
+ if (len != block_name_len)
+continue;
+  }
+
+  if (strncmp(block_name, name, block_name_len) == 0) {
  found_interface = true;
  break;
   }
@@ -3156,8 +3179,11 @@ should_add_buffer_variable(struct gl_shader_program 
*shProg,
/* We remove the interface name from the buffer variable name,
 * including the dot that follows it.
 */
-   if (found_interface)
-  name = name + strlen(block_name) + 1;
+   if (found_interface) {
+  name = name + block_name_len + 1;
+  first_dot = strchr(name, '.');
+  first_square_bracket = strchr(name, '[');
+   }
 
/* From: ARB_program_interface_query extension:
 *
@@ -3166,8 +3192,6 @@ should_add_buffer_variable(struct gl_shader_program 
*shProg,
 *   of its type.  For arrays of aggregate types, the enumeration rules are
 *   applied recursively for the single enumerated array element.
 */
-   const char *first_dot = strchr(name, '.');
-   const char *first_square_bracket = strchr(name, '[');
 
/* The buffer variable is on top level and it is not an array */
if (!first_square_bracket) {
-- 
2.1.4


Re: [Mesa-dev] [PATCH 4/7] glsl: set image access qualifiers for AoA

2015-10-21 Thread Timothy Arceri

On Wed, 2015-10-21 at 13:06 +0300, Francisco Jerez wrote:
> Timothy Arceri  writes:
> 
> > On Fri, 2015-10-16 at 10:28 +1100, Timothy Arceri wrote:
> > > Cc: Francisco Jerez 
> > 
> > Hi Curro,
> > 
> > Just pinging you on this patch and patch 5. These are the final two
> > patches remaining unreviewed before I can enable arrays of arrays.
> > 
> > If you're not able to review these can you let me know so I can chase
> > this up with someone else, as I'd like to enable this as soon as
> > possible to limit breakage.
> > 
> 
> Depends when you want them reviewed, doesn't seem like the kind of
> thing
> I could convince myself is correct during a break.  I'm unlikely to
> have
> time to review it this week.  I might next week but I cannot give you
> any guarantees.  If that's not acceptable for you feel free to look
> for
> someone else to review them.

Not a problem. I thought you must have been back from your break as
I've seen a lot of email from you. No need to look at this while you're
meant to be on break; if no one has looked at it before you get back
then it would be great if you could take a look :)

Thanks,
Tim

> 
> > Thanks,
> > Tim
> > 
> > 
> > > ---
> > >  src/glsl/link_uniforms.cpp | 77 +---
> > > 
> > > --
> > >  1 file changed, 49 insertions(+), 28 deletions(-)
> > > 
> > > diff --git a/src/glsl/link_uniforms.cpp
> > > b/src/glsl/link_uniforms.cpp
> > > index 647aa2b..2a1da07 100644
> > > --- a/src/glsl/link_uniforms.cpp
> > > +++ b/src/glsl/link_uniforms.cpp
> > > @@ -1008,38 +1008,37 @@
> > > link_update_uniform_buffer_variables(struct
> > > gl_shader *shader)
> > > }
> > >  }
> > >  
> > > -/**
> > > - * Scan the program for image uniforms and store image unit
> > > access
> > > - * information into the gl_shader data structure.
> > > - */
> > >  static void
> > > -link_set_image_access_qualifiers(struct gl_shader_program *prog)
> > > +link_set_image_access_qualifiers(struct gl_shader_program *prog,
> > > + gl_shader *sh, unsigned
> > > shader_stage,
> > > + ir_variable *var, const
> > > glsl_type
> > > *type,
> > > + char **name, size_t
> > > name_length)
> > >  {
> > > -   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
> > > -  gl_shader *sh = prog->_LinkedShaders[i];
> > > -
> > > -  if (sh == NULL)
> > > -  continue;
> > > +   /* Handle arrays of arrays */
> > > +   if (type->is_array() && type->fields.array->is_array()) {
> > > +  for (unsigned i = 0; i < type->length; i++) {
> > > +  size_t new_length = name_length;
> > >  
> > > -  foreach_in_list(ir_instruction, node, sh->ir) {
> > > -  ir_variable *var = node->as_variable();
> > > +  /* Append the subscript to the current variable name */
> > > +  ralloc_asprintf_rewrite_tail(name, &new_length, "[%u]",
> > > i);
> > >  
> > > - if (var && var->data.mode == ir_var_uniform &&
> > > - var->type->contains_image()) {
> > > -unsigned id = 0;
> > > -bool found = prog->UniformHash->get(id, var->name);
> > > -assert(found);
> > > -(void) found;
> > > -const gl_uniform_storage *storage = 
> > > &prog->UniformStorage[id];
> > > -const unsigned index = storage->opaque[i].index;
> > > -const GLenum access = (var->data.image_read_only ?
> > > GL_READ_ONLY :
> > > -   var->data.image_write_only ?
> > > GL_WRITE_ONLY :
> > > -   GL_READ_WRITE);
> > > -
> > > -for (unsigned j = 0; j < MAX2(1, storage
> > > ->array_elements); ++j)
> > > -   sh->ImageAccess[index + j] = access;
> > > - }
> > > + link_set_image_access_qualifiers(prog, sh,
> > > shader_stage,
> > > var,
> > > +  type->fields.array,
> > > name,
> > > +  new_length);
> > >}
> > > +   } else {
> > > +  unsigned id = 0;
> > > +  bool found = prog->UniformHash->get(id, *name);
> > > +  assert(found);
> > > +  (void) found;
> > > +  const gl_uniform_storage *storage = 
> > > &prog->UniformStorage[id];
> > > +  const unsigned index = storage
> > > ->opaque[shader_stage].index;
> > > +  const GLenum access = (var->data.image_read_only ?
> > > GL_READ_ONLY :
> > > + var->data.image_write_only ?
> > > GL_WRITE_ONLY :
> > > + GL_READ_WRITE);
> > > +
> > > +  for (unsigned j = 0; j < MAX2(1, storage->array_elements);
> > > ++j)
> > > + sh->ImageAccess[index + j] = access;
> > > }
> > >  }
> > >  
> > > @@ -1300,7 +1299,29 @@ link_assign_uniform_locations(struct
> > > gl_shader_program *prog,
> > > prog->NumHiddenUniforms = hidden_uniforms;
> > > prog->UniformStorage = uniforms;
> > >  
> > > -

Re: [Mesa-dev] [PATCH] i915/aa: fixing anti-aliasing bug for thinnest width lines

2015-10-21 Thread Ville Syrjälä

On Tue, Oct 20, 2015 at 02:02:21PM +0300, Ville Syrjälä wrote:
> On Tue, Oct 20, 2015 at 08:15:32AM +, Predut, Marius wrote:
> > > -Original Message-
> > > From: Ville Syrjälä [mailto:ville.syrj...@linux.intel.com]
> > > Sent: Monday, October 19, 2015 6:04 PM
> > > To: Predut, Marius
> > > Cc: mesa-dev@lists.freedesktop.org; Iago Toral Quiroga
> > > Subject: Re: [Mesa-dev] [PATCH] i915/aa: fixing anti-aliasing bug for 
> > > thinnest
> > > width lines
> > > 
> > > On Thu, Oct 15, 2015 at 06:03:34PM +0300, Ville Syrjälä wrote:
> > > > On Thu, Oct 15, 2015 at 02:19:09PM +, Predut, Marius wrote:
> > > > > > -Original Message-
> > > > > > From: Ville Syrjälä [mailto:ville.syrj...@linux.intel.com]
> > > > > > Sent: Wednesday, October 07, 2015 1:53 PM
> > > > > > To: Predut, Marius
> > > > > > Cc: mesa-dev@lists.freedesktop.org
> > > > > > Subject: Re: [Mesa-dev] [PATCH] i915/aa: fixing anti-aliasing bug
> > > > > > for thinnest width lines
> > > > > >
> > > > > > On Mon, Oct 05, 2015 at 07:55:24PM +0300, Marius Predut wrote:
> > > > > > > On PNV platform, for 1 pixel line thickness or less, the general
> > > > > > > anti-aliasing algorithm gives up, and a garbage line is generated.
> > > > > > > Setting a Line Width of 0.0 specifies the rasterization of the
> > > > > > > "thinnest" (one-pixel-wide), non-antialiased lines.
> > > > > > > Lines rendered with zero Line Width are rasterized using Grid
> > > > > > > Intersection Quantization rules as specified by
> > > > > > > 2.8.4.1 Zero-Width (Cosmetic) Line Rasterization from volume 1f
> > > > > > > of the
> > > > > > > GEN3 docs.
> > > > > > > The patch was tested on Intel Atom CPU N455.
> > > > > > >
> > > > > > > This patch follows the same rules as patches fixing the
> > > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=28832
> > > > > > > bug.
> > > > > > >
> > > > > > > v1: Eduardo Lima Mitev:  Wrong indentation inside the if clause.
> > > > > > > v2: Ian Romanick: comments fix.
> > > > > > >
> > > > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90367
> > > > > > >
> > > > > > > Signed-off-by: Marius Predut 
> > > > > > > ---
> > > > > > >  src/mesa/drivers/dri/i915/i915_state.c | 15 +++
> > > > > > >  1 file changed, 15 insertions(+)
> > > > > > >
> > > > > > > diff --git a/src/mesa/drivers/dri/i915/i915_state.c
> > > > > > > b/src/mesa/drivers/dri/i915/i915_state.c
> > > > > > > index 4c83073..897eb59 100644
> > > > > > > --- a/src/mesa/drivers/dri/i915/i915_state.c
> > > > > > > +++ b/src/mesa/drivers/dri/i915/i915_state.c
> > > > > > > @@ -599,6 +599,21 @@ i915LineWidth(struct gl_context * ctx,
> > > > > > > GLfloat
> > > > > > > widthf)
> > > > > > >
> > > > > > > width = (int) (widthf * 2);
> > > > > > > width = CLAMP(width, 1, 0xf);
> > > > > > > +
> > > > > > > +   if (ctx->Line.Width < 1.5 || widthf < 1.5) {
> > > > > > > + /* For 1 pixel line thickness or less, the general
> > > > > > > +  * anti-aliasing algorithm gives up, and a garbage line is
> > > > > > > +  * generated.  Setting a Line Width of 0.0 specifies the
> > > > > > > +  * rasterization of the "thinnest" (one-pixel-wide),
> > > > > > > +  * non-antialiased lines.
> > > > > > > +  *
> > > > > > > +  * Lines rendered with zero Line Width are rasterized using
> > > > > > > +  * Grid Intersection Quantization rules as specified by
> > > > > > > +  * volume 1f of the GEN3 docs,
> > > > > > > +  * 2.8.4.1 Zero-Width (Cosmetic) Line Rasterization.
> > > > > > > +  */
> > > > > > > +  width = 0;
> > > > > > > +   }
> > > > > >
> > > > > > I went to do some spec reading, and while I can't confirm the AA
> > > > > > <= 1.0 problem (no mention in the spec about such things), I can
> > > > > > see this fix alone isn't sufficient to satisfy the spec (we lack
> > > > > > the round to nearest integer for non-aa for instance).
> > > > >
> > > > > Ville, thanks for the review!
> > > > > There does not seem to be much documentation on this; here we can
> > > > > use experiments or the docs for the next GEN+.
> > > > >
> > > > > >
> > > > > > I think what we'd want is a small helper. i965 has one, although
> > > > > > that one looks quite messy. I think this is how I'd write the
> > > > > > helper for
> > > > > > i915:
> > > > > >
> > > > > > unsigned intel_line_width(ctx)
> > > > > > {
> > > > > > float line_width = ctx->Line.Width;
> > > > > >
> > > > > > if (ctx->Line.SmoothFlag)
> > > > > > line_width = CLAMP(line_width, MinAA, MaxAA);
> > > > > > else
> > > > > > line_width = CLAMP(roundf(line_width), Min, Max);
> > > > > >
> > > > > > /*
> > > > > >  * blah
> > > > > >  */
> > > > > > if (line_width < 1.5f)
> > > > > > line_width = 0.0f;
> > > > > >
> > > > > > return U_FIXED(line_width, 1);
> > > > > > }
> > > > > >
> > > > > > and then use it for both gen2 and gen3 state setup.
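
The helper outlined in the quoted text could be fleshed out roughly as follows. This is a minimal sketch under stated assumptions, not driver code: the limits are passed in as a hypothetical struct because the quoted pseudocode leaves Min, Max, MinAA and MaxAA unspecified, sketch_u_fixed() is a stand-in for U_FIXED, and rounding is done by hand on the assumption that line widths are never negative.

```c
#include <stdbool.h>

/* Hypothetical container for the limits the quoted pseudocode calls
 * Min, Max, MinAA and MaxAA. */
struct sketch_line_limits {
   float min, max;         /* non-antialiased range */
   float min_aa, max_aa;   /* antialiased range */
};

static float
sketch_clampf(float v, float lo, float hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Round to nearest integer; assumes v is not negative. */
static float
sketch_round_nonneg(float v)
{
   return (float)(int)(v + 0.5f);
}

/* Stand-in for U_FIXED: unsigned fixed point with frac_bits fraction bits. */
static unsigned
sketch_u_fixed(float v, unsigned frac_bits)
{
   return (unsigned)(v * (float)(1u << frac_bits));
}

/* Sketch of the quoted helper: clamp (rounding first for non-AA lines),
 * map anything below 1.5 to the zero-width "thinnest" line, and return
 * the width with one fractional bit. */
static unsigned
sketch_line_width(float width, bool smooth,
                  const struct sketch_line_limits *lim)
{
   float w;

   if (smooth)
      w = sketch_clampf(width, lim->min_aa, lim->max_aa);
   else
      w = sketch_clampf(sketch_round_nonneg(width), lim->min, lim->max);

   if (w < 1.5f)
      w = 0.0f;

   return sketch_u_fixed(w, 1);
}
```

With made-up limits of 1 to 7 for both ranges, a non-antialiased width of 1.0 maps to 0 (the cosmetic line), and an antialiased width of 2.5 maps to 5 in the one-fractional-bit encoding.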
> > > > >
> > > > > Do you used this and it

[Mesa-dev] [PATCH 1/4] egl/wayland: Use scanout flag for backbuffers

2015-10-21 Thread Axel Davy

The back buffers need to be scanout-able in case
the compositor wants to use the buffer (once sent)
as display framebuffer.

Signed-off-by: Axel Davy 
---
 src/egl/drivers/dri2/platform_wayland.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_wayland.c 
b/src/egl/drivers/dri2/platform_wayland.c
index 0d161f6..92ff2af 100644
--- a/src/egl/drivers/dri2/platform_wayland.c
+++ b/src/egl/drivers/dri2/platform_wayland.c
@@ -305,7 +305,7 @@ get_back_bo(struct dri2_egl_surface *dri2_surf)
 {
struct dri2_egl_display *dri2_dpy =
   dri2_egl_display(dri2_surf->base.Resource.Display);
-   int i;
+   int i, use_flags;
unsigned int dri_image_format;
 
/* currently supports three WL DRM formats,
@@ -352,6 +352,8 @@ get_back_bo(struct dri2_egl_surface *dri2_surf)
if (dri2_surf->back == NULL)
   return -1;
 
+   use_flags = __DRI_IMAGE_USE_SHARE | __DRI_IMAGE_USE_SCANOUT;
+
if (dri2_dpy->is_different_gpu &&
dri2_surf->back->linear_copy == NULL) {
dri2_surf->back->linear_copy =
@@ -359,7 +361,7 @@ get_back_bo(struct dri2_egl_surface *dri2_surf)
   dri2_surf->base.Width,
   dri2_surf->base.Height,
   dri_image_format,
-  __DRI_IMAGE_USE_SHARE |
+  use_flags |
   __DRI_IMAGE_USE_LINEAR,
   NULL);
   if (dri2_surf->back->linear_copy == NULL)
@@ -373,7 +375,7 @@ get_back_bo(struct dri2_egl_surface *dri2_surf)
   dri2_surf->base.Height,
   dri_image_format,
   dri2_dpy->is_different_gpu ?
- 0 : __DRI_IMAGE_USE_SHARE,
+ 0 : use_flags,
   NULL);
   dri2_surf->back->age = 0;
}
-- 
2.6.1
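
The flag selection this patch ends up with can be sketched in isolation as below. The enum values, the struct, and pick_back_bo_flags() are illustrative assumptions; only the combination logic mirrors the diff (the real flags are the __DRI_IMAGE_USE_* defines).

```c
#include <stdbool.h>

/* Illustrative values standing in for the __DRI_IMAGE_USE_* flags. */
enum {
   USE_SHARE   = 1 << 0,
   USE_SCANOUT = 1 << 1,
   USE_LINEAR  = 1 << 3
};

struct back_bo_flags {
   unsigned linear_copy;  /* flags for the linear copy, 0 if none is created */
   unsigned dri_image;    /* flags for the buffer that is rendered to */
};

static struct back_bo_flags pick_back_bo_flags(bool is_different_gpu)
{
   const unsigned use_flags = USE_SHARE | USE_SCANOUT;
   struct back_bo_flags f = { 0, 0 };

   if (is_different_gpu)
      f.linear_copy = use_flags | USE_LINEAR;  /* this copy goes to the compositor */
   else
      f.dri_image = use_flags;                 /* the render buffer is sent directly */

   return f;
}
```

In both cases the buffer that is handed to the compositor carries the scanout flag; on a different GPU the render buffer itself keeps flags 0, as in the patch.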

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] glsl: Implement a SSBO load optimization pass

2015-10-21 Thread Francisco Jerez

Iago Toral  writes:

> Hi Curro,
>
> On Tue, 2015-10-20 at 14:18 +0300, Francisco Jerez wrote:
>> Iago Toral  writes:
>> 
>> > On Tue, 2015-10-20 at 13:22 +0300, Francisco Jerez wrote:
>> >> Iago Toral Quiroga  writes:
>> >> 
>> >> > This allows us to re-use the results of previous ssbo loads in 
>> >> > situations
>> >> > that are safe (i.e. when there are no stores, atomic operations or
>> >> > memory barriers in between).
>> >> >
>> >> > This is particularly useful for things like matrix multiplications, 
>> >> > where
>> >> > for a mat4 buffer variable we cut the number of loads from 16 (4 reads 
>> >> > of
>> >> > each column) down to 4 (1 read of each column).
>> >> >
>> >> > The pass can only cache ssbo loads that involve constant blocks and
>> >> > offsets, but could be extended to compare sub-expressions for these
>> >> > as well, similar to a CSE pass.
>> >> >
>> >> > The way the cache works is simple: ssbo loads with constant block/offset
>> >> > are included in a cache as they are seen. Stores invalidate cache 
>> >> > entries.
>> >> > Stores with non-constant offset invalidate all cached loads for the 
>> >> > block
>> >> > and stores with non-constant block invalidate all cache entries. There 
>> >> > is
>> >> > room to improve this by using the actual variable name we are accessing 
>> >> > to
>> >> > limit the entries that should be invalidated. We also need to invalidate
>> >> > cache entries when we exit the block in which they have been defined
>> >> > (i.e. inside if/else blocks or loops).
>> >> >
>> >> > The cache optimization is built as a separate pass, instead of merging 
>> >> > it
>> >> > inside the lower_ubo_reference pass for a number of reasons:
>> >> >
>> >> > 1) The way we process assignments in visitors is that the LHS is
>> >> > processed before the RHS. This creates a problem for an optimization
>> >> > such as this when we do things like a = a + 1, since we would see the
>> >> > store before the read when the actual execution order is reversed.
>> >> > This could be fixed by re-implementing the logic in the visit_enter
>> >> > method for ir_assignment in lower_ubo_reference and then returning
>> >> > visit_continue_with_parent.
>> >> >
>> >> > 2) Some writes/reads need to be split into multiple smaller
>> >> > writes/reads, and we need to handle caching for each one. This happens
>> >> > deep inside the code that handles the lowering and some
>> >> > of the information we need to do this is not available. This could also
>> >> > be fixed by passing more data into the corresponding functions or by
>> >> > making this data available as class members, but the current
>> >> > implementation is already complex enough and this would only
>> >> > contribute to the complexity.
>> >> >
>> >> > 3) We can have ssbo loads in the LHS too (i.e. a[a[0]] = ..). In these
>> >> > cases the current code in lower_ubo_reference would see the store
>> >> > before the read.
>> >> > Probably fixable, but again would add more complexity to the lowering.
>> >> >
>> >> > On the other hand, a separate pass that runs after the lowering sees
>> >> > all the individual loads and stores in the correct order (so we don't
>> >> > need to do any tricks) and it allows us to separate the lowering logic
>> >> > (which is already complex) from the caching logic. It also gives us a
>> >> > chance to
>> >> > run it after other optimization passes have run and turned constant
>> >> > expressions for block/offset into constants, enabling more opportunities
>> >> > for caching.
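
The cache rules described in the commit message above can be sketched as a small standalone data structure. This is not the code from the patch: the struct layout, the names, the fixed table size and the value_id handle are all assumptions. Entries are matched by exact block and offset, so partially overlapping accesses are ignored here, and atomics, memory barriers and leaving a control-flow block are assumed to simply drop every entry.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 32

/* One cached load; block and offset are known constants. */
struct load_entry {
   unsigned block;
   unsigned offset;
   int value_id;   /* hypothetical handle for the previously loaded value */
   bool valid;
};

struct load_cache {
   struct load_entry entries[MAX_ENTRIES];
   size_t count;
};

/* Returns the handle of a reusable load, or -1 on a miss. */
static int cache_lookup(const struct load_cache *c, unsigned block,
                        unsigned offset)
{
   for (size_t i = 0; i < c->count; i++) {
      const struct load_entry *e = &c->entries[i];
      if (e->valid && e->block == block && e->offset == offset)
         return e->value_id;
   }
   return -1;
}

static void cache_add(struct load_cache *c, unsigned block, unsigned offset,
                      int value_id)
{
   if (c->count < MAX_ENTRIES) {
      struct load_entry e = { block, offset, value_id, true };
      c->entries[c->count++] = e;
   }
}

/* A store invalidates: one entry (constant block and offset), every entry
 * of the block (non-constant offset), or everything (non-constant block). */
static void cache_store(struct load_cache *c, bool const_block, unsigned block,
                        bool const_offset, unsigned offset)
{
   for (size_t i = 0; i < c->count; i++) {
      struct load_entry *e = &c->entries[i];
      if (!const_block ||
          (e->block == block && (!const_offset || e->offset == offset)))
         e->valid = false;
   }
}

/* Assumed handling of atomics, barriers and leaving an if/else or loop. */
static void cache_clear(struct load_cache *c)
{
   c->count = 0;
}
```

For the mat4 case from the commit message, the first read of each column misses and is added with cache_add(); the three later reads of the same column hit in cache_lookup(), which is where the 16 to 4 reduction comes from.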
>> >> 
>> >> Seems like a restricted form of CSE that only handles SSBO loads, and
>> >> only the ones with constant arguments.  Why don't we CSE these? (and
>> >> other memory access operations like image loads)
>> >
>> > There is not a CSE pass in GLSL IR any more so we would have to do it in
>> > NIR and some drivers would lose the optimization. Doing it at GLSL IR
>> > level seemed like a win from this perspective.
>> >
>> > Then there is the fact that we cannot just CSE these. We need to make
>> > sure that we only CSE them when it is safe to do so (i.e. no
>> > stores/atomics to the same offsets in between, no memory barriers, etc).
>> > The current CSE pass in NIR does not support this as far as I can see. I
>> > suppose that we could look into changing the pass to accommodate
>> > restrictions such as this if we think it is worth it.
>> >
>> Not really sure if the NIR CSE pass would be adequate, but at least at
>> the i965 back-end level this could be handled easily in the CSE pass
>> (for typed and untyped surface read opcodes in general) in roughly the
>> same way that source variable interference is handled -- Just kill
>> potentially overlapping entries from the AEB whenever an atomic or write
>> instruction for the same surface is seen.
>
> I've been having a quick look at this option but I don't think this is
> going to work for us. The first

Re: [Mesa-dev] [PATCH 1/6] i965: Correct the comment about fb write payload

2015-10-21 Thread Francisco Jerez

Ben Widawsky  writes:

> Cc: Francisco Jerez 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
> index 393f17a..7a5ee1b 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -916,8 +916,8 @@ enum opcode {
>  * Source 0: [required] Color 0.
>  * Source 1: [optional] Color 1 (for dual source blend messages).
>  * Source 2: [optional] Src0 Alpha.
> -* Source 3: [optional] Source Depth (passthrough from the thread payload).
> -* Source 4: [optional] Destination Depth (gl_FragDepth).
> +* Source 3: [optional] Source Depth (gl_FragDepth)
> +* Source 4: [optional (gen4-5)] Destination Depth passthrough from thread
>  * Source 5: [optional] Sample Mask (gl_SampleMask).
>  * Source 6: [required] Number of color components (as a UD immediate).
>  */

Looks good, thanks,

Reviewed-by: Francisco Jerez 

> -- 
> 2.6.1

