date:20160505

[Mesa-dev] [Bug 93551] Divinity: Original Sin Enhanced Edition(Native) crash on start

2016-05-05 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=93551

--- Comment #20 from kilobug  ---
I just tried with latest Mesash->CompileStatus = GL_FALSE;

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 93551] Divinity: Original Sin Enhanced Edition(Native) crash on start

2016-05-05 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=93551

--- Comment #21 from kilobug  ---
(In reply to kilobug from comment #20)
> I just tried with latest Mesash->CompileStatus = GL_FALSE;

Hrm sorry didn't finish my comment, sorry for the noise :/

So, I just tried with latest Mesa from Oibaf's PPA on my Radeon R9 380X which
contains the patch for "sh->CompileStatus = GL_FALSE;" and I still get :

(kilobug@drizzt) ~/gog/Divinity Original Sin Enhanced Edition $
ALSOFT_DRIVERS=pulse MESA_GL_VERSION_OVERRIDE=4.2 ./start.sh 
Running Divinity: Original Sin - Enhanced Edition
Language detected: English
Thread "EoCApp" (2444249152)
received signal 11

Call stack:

(0) /lib/x86_64-linux-gnu/libpthread.so.0 : +0x10d30 [0x7f8d8d2ebd30]
(1) ./libOGLBinding.so : api::OpenGLRenderer::ApplyConstants()+0x65
[0x7f8d91b87845]
(2) ./libRenderFramework.so : rf::Renderer::Apply(bool)+0x57 [0x7f8d91b3e437]
(3) ./EoCApp : ig::IggyBinding::Swap(rf::Renderer*)+0xfc [0xed032c]
(4) ./libGameEngine.so : BaseApp::EndDrawGUI(rf::Renderer*)+0x9b
[0x7f8d8e01bfab]
(5) ./libGameEngine.so : BaseApp::MakeFrame()+0x3a4 [0x7f8d8e01c4d4]
(6) ./libGameEngine.so : BaseApp::OnIdle()+0xe0 [0x7f8d8e01acb0]
(7) ./EoCApp : main+0x170 [0x6d5180]
(8) /lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main+0xf0 [0x7f8d8cf53610]
(9) ./EoCApp : _start+0x29 [0x6d4ef9]
Segmentation fault

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 27/59] i965/fs: add a stride helper

2016-05-05 Thread Iago Toral

On Wed, 2016-05-04 at 13:59 -0700, Francisco Jerez wrote:
> Iago Toral  writes:
> 
> > On Wed, 2016-05-04 at 01:15 -0700, Francisco Jerez wrote:
> >> Iago Toral  writes:
> >> 
> >> > On Mon, 2016-05-02 at 18:48 -0700, Francisco Jerez wrote:
> >> >> Samuel Iglesias Gonsálvez  writes:
> >> >> 
> >> >> > From: Connor Abbott 
> >> >> >
> >> >> > Similar to retype() and offset().
> >> >> > ---
> >> >> >  src/mesa/drivers/dri/i965/brw_ir_fs.h | 8 
> >> >> >  1 file changed, 8 insertions(+)
> >> >> >
> >> >> > diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
> >> >> > b/src/mesa/drivers/dri/i965/brw_ir_fs.h
> >> >> > index e4f20f4..abda2c3 100644
> >> >> > --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
> >> >> > +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
> >> >> > @@ -78,6 +78,14 @@ retype(fs_reg reg, enum brw_reg_type type)
> >> >> >  }
> >> >> >  
> >> >> >  static inline fs_reg
> >> >> > +stride(fs_reg reg, unsigned stride)
> >> >> > +{
> >> >> > +   if (reg.stride != 0)
> >> >> > +  reg.stride = stride;
> >> >> > +   return reg;
> >> >> > +}
> >> >> > +
> >> >> 
> >> >> This only works if reg.stride == 0 or 1, we need to honour the stride of
> >> >> the original register (e.g. by doing reg.stride *= stride) or you'll end
> >> >> up taking components not part of the region given as argument.  It gets
> >> >> messy with ARF and HW registers because they represent the stride
> >> >> differently...  I suggest that instead of fixing this you take the
> >> >> following patch from my SIMD32 branch that introduces a somewhat easier
> >> >> to use helper: Instead of doing 'stride(horiz_offset(retype(reg, t), i),
> >> >> type_sz(reg.type) / type_sz(t))' to take the i-th subcomponent of type t
> >> >> of the original register you would just do 'subscript(reg, t, i)'. 
> >> >
> >> > Actually, I think this is not what we want in a number of places.
> >> >
> >> > Sometimes we just want to apply a stride to a register, no type changes
> >> > or anything. That is, we just want to do something like stride(reg, 2).
> >> > Of course, we can code that with subscript as subscript(reg, retype(reg,
> >> > type_that is_half_the_size), 0) but that is certainly forced. For
> >> > example, we have code such as:
> >> >
> >> > bld.MOV(tmp, stride(component_i, 2));
> >> > bld.MOV(horiz_offset(tmp, 8 * multiplier),
> >> > stride(horiz_offset(component_i, 1), 2));
> >> >
> >> Oh no, this is precisely the use-case I had in mind: component_i is a
> >> vector of 64-bit channels laid out contiguously, and you want to extract
> >> each 32-bit field from each channel separately.  You happen to have
> >> declared component_i as a 32-bit type but that's kind of a lie because
> >> each component of "component_i" only represents half a channel, it would
> >> definitely make more sense to make it a DF because that's what it
> >> contains...
> >
> > I think that whether component_i is really 64-bit or 32-bit is not
> > relevant, in the sense that what matters is how the algorithm
> > needs/wants to interpret the data it has to manipulate. The way I
> > thought about this is that we have our 32-bit channels mixed up and we
> > need to re-arrange them, so from my perspective, this is all about
> > 32-bit data that we need to move around in a particular fashion.
> 
> Keep in mind that channel masking is applied to component_i in multiples
> of 64 bits regardless of the type you declare it with, so if you forget
> the fact that component_i is a vector of elements 64 bits apart from
> each other and pretend it's a vector of 32 bit channels, you'll end up
> crossing channel boundaries illegally and miscompiling the program (or
> working it around with ->force_writemask_all hacks like you do in the
> SHUFFLE functions...). 

Yes, that is true.

>  subscript() guarantees that you never have to
> break channel boundaries not even temporarily -- If the register you
> give it as argument was channel-aligned the result is guaranteed
> channel-aligned.  stride() OTOH gives you a channel-misaligned result
> you need to compensate for with a matching retype(), so pretty much
> every use of the result from stride() other than as argument for
> retype() indicates a bug, that's why it makes sense to do both things in
> the same operation.  Otherwise you need to deal with a temporary value
> which is basically meaningless, and you need to manually make sure that
> the value provided as stride multiplier matches the ratio of the type
> sizes exactly (the snippet you paste below from PACK lowering
> illustrates the problem since it would miscompile the program silently
> if the instruction didn't have the exact number of sources you
> expected).

Ok, I see your point, that sounds like a good argument :)

> > That's why I find the stride() helper a lot more natural to use, where
> > I just tell the access type and the stride explicitly, while with
> > subscript() I have to think about a register type and an access type
> > instead and the stride is a co

Re: [Mesa-dev] [PATCH 27/59] i965/fs: add a stride helper

2016-05-05 Thread Iago Toral

On Thu, 2016-05-05 at 09:15 +0200, Iago Toral wrote:
> On Wed, 2016-05-04 at 13:59 -0700, Francisco Jerez wrote:
> > Iago Toral  writes:
> > 
> > > On Wed, 2016-05-04 at 01:15 -0700, Francisco Jerez wrote:
> > >> Iago Toral  writes:
> > >> 
> > >> > On Mon, 2016-05-02 at 18:48 -0700, Francisco Jerez wrote:
> > >> >> Samuel Iglesias Gonsálvez  writes:
> > >> >> 
> > >> >> > From: Connor Abbott 
> > >> >> >
> > >> >> > Similar to retype() and offset().
> > >> >> > ---
> > >> >> >  src/mesa/drivers/dri/i965/brw_ir_fs.h | 8 
> > >> >> >  1 file changed, 8 insertions(+)
> > >> >> >
> > >> >> > diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
> > >> >> > b/src/mesa/drivers/dri/i965/brw_ir_fs.h
> > >> >> > index e4f20f4..abda2c3 100644
> > >> >> > --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
> > >> >> > +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
> > >> >> > @@ -78,6 +78,14 @@ retype(fs_reg reg, enum brw_reg_type type)
> > >> >> >  }
> > >> >> >  
> > >> >> >  static inline fs_reg
> > >> >> > +stride(fs_reg reg, unsigned stride)
> > >> >> > +{
> > >> >> > +   if (reg.stride != 0)
> > >> >> > +  reg.stride = stride;
> > >> >> > +   return reg;
> > >> >> > +}
> > >> >> > +
> > >> >> 
> > >> >> This only works if reg.stride == 0 or 1, we need to honour the stride 
> > >> >> of
> > >> >> the original register (e.g. by doing reg.stride *= stride) or you'll 
> > >> >> end
> > >> >> up taking components not part of the region given as argument.  It 
> > >> >> gets
> > >> >> messy with ARF and HW registers because they represent the stride
> > >> >> differently...  I suggest that instead of fixing this you take the
> > >> >> following patch from my SIMD32 branch that introduces a somewhat 
> > >> >> easier
> > >> >> to use helper: Instead of doing 'stride(horiz_offset(retype(reg, t), 
> > >> >> i),
> > >> >> type_sz(reg.type) / type_sz(t))' to take the i-th subcomponent of 
> > >> >> type t
> > >> >> of the original register you would just do 'subscript(reg, t, i)'. 
> > >> >
> > >> > Actually, I think this is not what we want in a number of places.
> > >> >
> > >> > Sometimes we just want to apply a stride to a register, no type changes
> > >> > or anything. That is, we just want to do something like stride(reg, 2).
> > >> > Of course, we can code that with subscript as subscript(reg, 
> > >> > retype(reg,
> > >> > type_that is_half_the_size), 0) but that is certainly forced. For
> > >> > example, we have code such as:
> > >> >
> > >> > bld.MOV(tmp, stride(component_i, 2));
> > >> > bld.MOV(horiz_offset(tmp, 8 * multiplier),
> > >> > stride(horiz_offset(component_i, 1), 2));
> > >> >
> > >> Oh no, this is precisely the use-case I had in mind: component_i is a
> > >> vector of 64-bit channels laid out contiguously, and you want to extract
> > >> each 32-bit field from each channel separately.  You happen to have
> > >> declared component_i as a 32-bit type but that's kind of a lie because
> > >> each component of "component_i" only represents half a channel, it would
> > >> definitely make more sense to make it a DF because that's what it
> > >> contains...
> > >
> > > I think that whether component_i is really 64-bit or 32-bit is not
> > > relevant, in the sense that what matters is how the algorithm
> > > needs/wants to interpret the data it has to manipulate. The way I
> > > thought about this is that we have our 32-bit channels mixed up and we
> > > need to re-arrange them, so from my perspective, this is all about
> > > 32-bit data that we need to move around in a particular fashion.
> > 
> > Keep in mind that channel masking is applied to component_i in multiples
> > of 64 bits regardless of the type you declare it with, so if you forget
> > the fact that component_i is a vector of elements 64 bits apart from
> > each other and pretend it's a vector of 32 bit channels, you'll end up
> > crossing channel boundaries illegally and miscompiling the program (or
> > working it around with ->force_writemask_all hacks like you do in the
> > SHUFFLE functions...). 
> 
> Yes, that is true.
> 
> >  subscript() guarantees that you never have to
> > break channel boundaries not even temporarily -- If the register you
> > give it as argument was channel-aligned the result is guaranteed
> > channel-aligned.  stride() OTOH gives you a channel-misaligned result
> > you need to compensate for with a matching retype(), so pretty much
> > every use of the result from stride() other than as argument for
> > retype() indicates a bug, that's why it makes sense to do both things in
> > the same operation.  Otherwise you need to deal with a temporary value
> > which is basically meaningless, and you need to manually make sure that
> > the value provided as stride multiplier matches the ratio of the type
> > sizes exactly (the snippet you paste below from PACK lowering
> > illustrates the problem since it would miscompile the program silently
> > if the instruction didn't have the exact number of sources you
> > expect

Re: [Mesa-dev] [PATCH 2/2] mesa/compute: drop pointless casts.

2016-05-05 Thread Alejandro Piñeiro

LGTM:

Reviewed-by: Alejandro Piñeiro 

On 05/05/16 02:41, Dave Airlie wrote:
> From: Dave Airlie 
>
> We already are a GLintptr, casting won't help.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/mesa/main/api_validate.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index d455f19..421ae8b 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -1107,20 +1107,20 @@ valid_dispatch_indirect(struct gl_context *ctx,
>  GLintptr indirect,
>  GLsizei size, const char *name)
>  {
> -   GLintptr end = (GLintptr)indirect + size;
> +   GLintptr end = indirect + size;
>  
> /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
>  *
>  * "An INVALID_VALUE error is generated if indirect is negative or is not 
> a
>  *  multiple of four."
>  */
> -   if ((GLintptr)indirect & (sizeof(GLuint) - 1)) {
> +   if (indirect & (sizeof(GLuint) - 1)) {
>_mesa_error(ctx, GL_INVALID_VALUE,
>"%s(indirect is not aligned)", name);
>return GL_FALSE;
> }
>  
> -   if ((GLintptr)indirect < 0) {
> +   if (indirect < 0) {
>_mesa_error(ctx, GL_INVALID_VALUE,
>"%s(indirect is less than zero)", name);
>return GL_FALSE;

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] mesa/compute: move compute checks around for tests.

2016-05-05 Thread Alejandro Piñeiro

So as far as I understand, on that test there is no active program and
indirect length is wrong, and fails because it was expecting the second
error. Is that right?

Unless Im wrong, when the OpenGL spec specifies the Error cases, it
doesn't specify any kind of priority (which error should be raised first
if more of one condition happens).

Im somewhat biased to think that it is a problem with the test. What
would happen if a new test is written to check that INVALID_OPERATION is
generated when no active program, and it uses a wrong length? IMHO, if
the test is setting two wrong conditions, it should check for the two
possible errors.

On 05/05/16 02:41, Dave Airlie wrote:
> From: Dave Airlie 
>
> This fixes GL43-CTS.compute_shader.api-indirect
> which tests the length/4 before anything else.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/mesa/main/api_validate.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index 688408f..d455f19 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -1109,9 +1109,6 @@ valid_dispatch_indirect(struct gl_context *ctx,
>  {
> GLintptr end = (GLintptr)indirect + size;
>  
> -   if (!check_valid_to_compute(ctx, name))
> -  return GL_FALSE;
> -
> /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
>  *
>  * "An INVALID_VALUE error is generated if indirect is negative or is not 
> a
> @@ -1153,6 +1150,9 @@ valid_dispatch_indirect(struct gl_context *ctx,
>return GL_FALSE;
> }
>  
> +   if (!check_valid_to_compute(ctx, name))
> +  return GL_FALSE;
> +
> return GL_TRUE;
>  }
>  

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] nir: fix assert for wildcard pairs

2016-05-05 Thread Eduardo Lima Mitev

On 05/05/2016 04:37 AM, Thomas H.P. Andersen wrote:
> 
> 
> On Wed, May 4, 2016 at 7:46 AM, Eduardo Lima Mitev  > wrote:
> 
> Good catch!
> 
> Reviewed-by: Eduardo Lima Mitev  >
> 
> 
> Thanks! I do not have commit access. Can I ask you to push?
> 

Sure! Pushed this and the other 2 that I reviewed in your other series.

Be aware that this one remains unreviewed, so not merged:
[PATCH 1/3] coccinelle: remove null check before free

cheers,
Eduardo

> 
> On 05/04/2016 05:48 AM, Thomas Hindoe Paaboel Andersen wrote:
> > The assert was null checking dest_arr_parent twice. The intention
> > seems to be to check both dest_ and src_.
> >
> > Added in d3636da9
> > ---
> > v2:
> >   Fix the assert rather than checking both in the if(). Hat tip to
> Ilia.
> >  src/compiler/nir/nir_lower_var_copies.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/src/compiler/nir/nir_lower_var_copies.c
> b/src/compiler/nir/nir_lower_var_copies.c
> > index 707d5af..1a7e2ee 100644
> > --- a/src/compiler/nir/nir_lower_var_copies.c
> > +++ b/src/compiler/nir/nir_lower_var_copies.c
> > @@ -85,7 +85,7 @@ emit_copy_load_store(nir_intrinsic_instr
> *copy_instr,
> >
> > if (src_arr_parent || dest_arr_parent) {
> >/* Wildcards had better come in matched pairs */
> > -  assert(dest_arr_parent && dest_arr_parent);
> > +  assert(src_arr_parent && dest_arr_parent);
> >
> >nir_deref_array *src_arr =
> nir_deref_as_array(src_arr_parent->child);
> >nir_deref_array *dest_arr =
> nir_deref_as_array(dest_arr_parent->child);
> >
> 
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa/ubo: add missing compute cases for ubo/atomic buffers

2016-05-05 Thread Anuj Phogat

On Wed, May 4, 2016 at 8:12 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This fixes: GL43-CTS.compute_shader.resource-ubo
>
> Signed-off-by: Dave Airlie 
> ---
>  src/mesa/main/uniforms.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/mesa/main/uniforms.c b/src/mesa/main/uniforms.c
> index a9308d0..acb2d06 100644
> --- a/src/mesa/main/uniforms.c
> +++ b/src/mesa/main/uniforms.c
> @@ -1144,6 +1144,12 @@ mesa_bufferiv(struct gl_shader_program *shProg, GLenum 
> type,
>GL_REFERENCED_BY_FRAGMENT_SHADER, params,
>caller);
>return;
> +   case GL_UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER:
> +   case GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER:
> +  _mesa_program_resource_prop(shProg, res, index,
> +  GL_REFERENCED_BY_COMPUTE_SHADER, params,
> +  caller);
> +  return;
> default:
>_mesa_error(ctx, GL_INVALID_ENUM,
>"%s(pname 0x%x (%s))", caller, pname,
> --
> 2.5.5
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reviewed-by: Anuj Phogat 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] mesa/compute: drop pointless casts.

2016-05-05 Thread Anuj Phogat

On Wed, May 4, 2016 at 5:41 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> We already are a GLintptr, casting won't help.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/mesa/main/api_validate.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index d455f19..421ae8b 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -1107,20 +1107,20 @@ valid_dispatch_indirect(struct gl_context *ctx,
>  GLintptr indirect,
>  GLsizei size, const char *name)
>  {
> -   GLintptr end = (GLintptr)indirect + size;
> +   GLintptr end = indirect + size;
>
> /* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
>  *
>  * "An INVALID_VALUE error is generated if indirect is negative or is not 
> a
>  *  multiple of four."
>  */
> -   if ((GLintptr)indirect & (sizeof(GLuint) - 1)) {
> +   if (indirect & (sizeof(GLuint) - 1)) {
>_mesa_error(ctx, GL_INVALID_VALUE,
>"%s(indirect is not aligned)", name);
>return GL_FALSE;
> }
>
> -   if ((GLintptr)indirect < 0) {
> +   if (indirect < 0) {
>_mesa_error(ctx, GL_INVALID_VALUE,
>"%s(indirect is less than zero)", name);
>return GL_FALSE;
> --
> 2.5.5
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reviewed-by: Anuj Phogat 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/2] mesa/main: add a comment to clarify INTERNALFORMAT_PREFERRED

2016-05-05 Thread Alejandro Piñeiro

The comment clarifies that the driver is called only to try to get
a preferred internalformat, and that it was already checked if the
format is supported or not.
---
 src/mesa/main/formatquery.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/formatquery.c b/src/mesa/main/formatquery.c
index 215c14f..1f21d17 100644
--- a/src/mesa/main/formatquery.c
+++ b/src/mesa/main/formatquery.c
@@ -902,7 +902,10 @@ _mesa_GetInternalformativ(GLenum target, GLenum 
internalformat, GLenum pname,
* format for representing resources of the specified 
 is
* returned in .
*
-   * Therefore, we let the driver answer.
+   * Therefore, we let the driver answer. Note that if we reach this
+   * point, it means that the internalformat is supported, so the driver
+   * is called just to try to get a preferred format. If not supported,
+   * GL_NONE was already returned and the driver is not called.
*/
   ctx->Driver.QueryInternalFormat(ctx, target, internalformat, pname,
   buffer);
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] i965/formatquery: remove INTERNALFORMAT_PREFERRED implementation

2016-05-05 Thread Alejandro Piñeiro

Right now the implementation only checks if the internalformat is
supported or not. But that implementation is wrong, returning
unsupported for some internalformats. Additionally, checking if
the internalformat is supported or not is already done at mesa/main
before calling the driver hook, so this new check is not needed.
---
 src/mesa/drivers/dri/i965/brw_formatquery.c | 71 -
 1 file changed, 71 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_formatquery.c 
b/src/mesa/drivers/dri/i965/brw_formatquery.c
index 210109b..8f7a910 100644
--- a/src/mesa/drivers/dri/i965/brw_formatquery.c
+++ b/src/mesa/drivers/dri/i965/brw_formatquery.c
@@ -65,46 +65,6 @@ brw_query_samples_for_format(struct gl_context *ctx, GLenum 
target,
}
 }
 
-/**
- * Returns a generic GL type from an internal format, so that it can be used
- * together with the base format to obtain a mesa_format by calling
- * mesa_format_from_format_and_type().
- */
-static GLenum
-get_generic_type_for_internal_format(GLenum internalFormat)
-{
-   if (_mesa_is_color_format(internalFormat)) {
-  if (_mesa_is_enum_format_unsigned_int(internalFormat))
- return GL_UNSIGNED_BYTE;
-  else if (_mesa_is_enum_format_signed_int(internalFormat))
- return GL_BYTE;
-   } else {
-  switch (internalFormat) {
-  case GL_STENCIL_INDEX:
-  case GL_STENCIL_INDEX8:
- return GL_UNSIGNED_BYTE;
-  case GL_DEPTH_COMPONENT:
-  case GL_DEPTH_COMPONENT16:
- return GL_UNSIGNED_SHORT;
-  case GL_DEPTH_COMPONENT24:
-  case GL_DEPTH_COMPONENT32:
- return GL_UNSIGNED_INT;
-  case GL_DEPTH_COMPONENT32F:
- return GL_FLOAT;
-  case GL_DEPTH_STENCIL:
-  case GL_DEPTH24_STENCIL8:
- return GL_UNSIGNED_INT_24_8;
-  case GL_DEPTH32F_STENCIL8:
- return GL_FLOAT_32_UNSIGNED_INT_24_8_REV;
-  default:
- /* fall-through */
- break;
-  }
-   }
-
-   return GL_FLOAT;
-}
-
 void
 brw_query_internal_format(struct gl_context *ctx, GLenum target,
   GLenum internalFormat, GLenum pname, GLint *params)
@@ -129,37 +89,6 @@ brw_query_internal_format(struct gl_context *ctx, GLenum 
target,
   break;
}
 
-   case GL_INTERNALFORMAT_PREFERRED: {
-  params[0] = GL_NONE;
-
-  /* We need to resolve an internal format that is compatible with
-   * the passed internal format, and optimal to the driver. By now,
-   * we just validate that the passed internal format is supported by
-   * the driver, and if so return the same internal format, otherwise
-   * return GL_NONE.
-   *
-   * For validating the internal format, we use the
-   * ctx->TextureFormatSupported map to check that a BRW surface format
-   * exists, that can be derived from the internal format. But this
-   * expects a mesa_format, not an internal format. So we need to "come up"
-   * with a type that is generic enough, to resolve the mesa_format first.
-   */
-  GLenum type = get_generic_type_for_internal_format(internalFormat);
-
-  /* Get a mesa_format from the internal format and type. */
-  GLint base_format = _mesa_base_tex_format(ctx, internalFormat);
-  if (base_format != -1) {
- mesa_format mesa_format =
-_mesa_format_from_format_and_type(base_format, type);
-
- if (mesa_format < MESA_FORMAT_COUNT &&
- ctx->TextureFormatSupported[mesa_format]) {
-params[0] = internalFormat;
- }
-  }
-  break;
-   }
-
default:
   /* By default, we call the driver hook's fallback function from the 
frontend,
* which has generic implementation for all pnames.
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 01/13] i965: Add a dependency on libisl

2016-05-05 Thread Jonathan Gray

On Fri, Apr 22, 2016 at 11:19:04AM -0700, Chad Versace wrote:
> On 04/22/2016 10:50 AM, Jason Ekstrand wrote:
> > 
> > 
> > On Fri, Apr 22, 2016 at 10:15 AM, Jonathan Gray  > > wrote:
> > 
> > On Fri, Apr 22, 2016 at 05:31:29PM +0100, Emil Velikov wrote:
> >> On 22 April 2016 at 16:08, Jonathan Gray  >> > wrote:
> >>> It is worth noting that the isl code extensively requires
> >>> designated initialisers on anonymous structs.  It isn't clear to
> >>> me when gcc introduced support for this but it isn't in 4.2.
> >>> 
> >> I think it should work for GCC 4.2 with -fms-extensions. We used
> >> to set -std=gnu99 for pre 4.6 which effectively enables it the
> >> extension. Can you double-check ?
> > 
> > The part that sets gnu99 for < gcc 4.6 is still there, using
> > -fms-extensions does not help for these. 
> 
> 
> > At least for the errors you're seeing there, I see two options:  1)
> > Use the isl_extentNd constructor functions in isl.h.  2) Stop making
> > isl_extentNd have anonymous unions.  I'm not sure how much the
> > anonymous unions are really doing for us but I'd like chad to chip in
> > before we throw them out.
> 
> I like the anonymous unions because they make some equations more concise.
> But, they don't provide anymore than that. I'll accept patches to remove
> them, as other people have also commented on their awkwardness.

Mark Kettenis has modified the base compiler to accept designated
initialisers for anonymous types so there is now no need to patch them
out for OpenBSD.

http://marc.info/?l=openbsd-cvs&m=146244074408249&w=2

>  
> > Here's another question: I know BSD doesn't ship gcc newer than 4.2
> > for license issues, but do you have a recent version of clang
> > available?
> 
> GCC 4.2 was released in 2007, and BSD won't upgrade it due to GPLv3.
> I believe it's unreasonable to promise Mesa will indefinitely restrict
> all new code to the feature set provided by a 2007-era GCC. As the years
> roll by, such a promise would become untenable.
> 
> Please try clang.
> 
> >>> Would you accept patches to remove them?
> >> While I cannot comment if they're OK with the idea, there might be 
> >> some confusion on the topic. There is anonymous and named. I
> >> believe developers were against the latter. Examples form [1]
> >> 
> >> struct bar { int i; }; // (1) unnamed, but tagged, ie *not*
> >> anonymous struct { int j; }; // (2) unnamed, but anonymous 
> >> struct { int k; } baz; // (3) named, but not tagged
> >> 
> >> Fwiw it would be great to use the more portable solution. Would
> >> C11 buy us anything ?
> 
> I believe that C11 does fix the problem. It standardized anonymous unions and 
> structs.
> However, if we need to support GCC 4.2, then we can't use C11. In fact, GCC's 
> C11 support
> is still incomplete as 5.0. I've discovered serious bugs as recently as GCC 
> 4.9.
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965: Only enable ARB_query_buffer_object for newer kernels on Haswell.

2016-05-05 Thread Kenneth Graunke

On Haswell, we need version 6 of the kernel command parser in order to
write the math registers.  Our implementation of ARB_query_buffer_object
heavily relies on MI_MATH, so we should only advertise it when MI_MATH
is available.

We're going to want to use MI_MATH in more places in the future.
To make checking for it easier, introduce a screen->has_mi_math flag.

Cc: Jordan Justen 
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 2 +-
 src/mesa/drivers/dri/i965/intel_screen.c | 8 
 src/mesa/drivers/dri/i965/intel_screen.h | 5 +
 3 files changed, 14 insertions(+), 1 deletion(-)

Note: kernel command parser version 7 isn't actually a real thing yet.
You need the patch I just sent to intel-gfx:
[PATCH] drm/i915: Allow MI_LOAD_REGISTER_REG between whitelisted registers.

Also, Piglit's qbo test's
query-GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN-ASYNC subtest fails for me
on Haswell.  I haven't looked into why yet.

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 588df1a..7794c2f 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -366,7 +366,7 @@ intelInitExtensions(struct gl_context *ctx)
   }
}
 
-   if (brw->gen >= 8 || brw->is_haswell) {
+   if (brw->intelScreen->has_mi_math) {
   ctx->Extensions.ARB_query_buffer_object = true;
}
 
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 878901a..24fe523 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1541,6 +1541,14 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
if (ret == -1)
   intelScreen->cmd_parser_version = 0;
 
+   /* Haswell requires command parser version 6 in order to write to the
+* MI_MATH GPR registers, and version 7 in order to use
+* MI_LOAD_REGISTER_REG (which all users of MI_MATH use).
+*/
+   intelScreen->has_mi_math = intelScreen->devinfo->gen >= 8 ||
+  (intelScreen->devinfo->is_haswell &&
+   intelScreen->cmd_parser_version >= 7);
+
psp->extensions = !intelScreen->has_context_reset_notification
   ? intelScreenExtensions : intelRobustScreenExtensions;
 
diff --git a/src/mesa/drivers/dri/i965/intel_screen.h 
b/src/mesa/drivers/dri/i965/intel_screen.h
index 01d45d0..8000c4c 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.h
+++ b/src/mesa/drivers/dri/i965/intel_screen.h
@@ -56,6 +56,11 @@ struct intel_screen
bool has_resource_streamer;
 
/**
+* Does the current hardware and kernel support MI_MATH?
+*/
+   bool has_mi_math;
+
+   /**
 * Does the kernel support context reset notifications?
 */
bool has_context_reset_notification;
-- 
2.8.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-05 Thread Iago Toral

On Tue, 2016-05-03 at 17:28 -0700, Jordan Justen wrote:
> On 2016-05-03 05:21:55, Samuel Iglesias Gonsálvez wrote:
> > From: Iago Toral Quiroga 
> > 
> > We were not accounting for reg_suboffset in the check for the start
> > of the region. This meant that would allow copy-propagation even if
> > the dst wrote to sub_regoffset 4 and our source read from
> > sub_regoffset 0, which is not correct. This was observed in fp64 code,
> > since there we use reg_suboffset to select the high 32-bit of a double.
> > 
> > Also, fs_reg::regs_read() already takes the stride into account, so we
> > should not multiply its result by the stride again. This was making
> > copy-propagation fail to copy-propagate cases that would otherwise be
> > safe to copy-propagate. Again, this was observed in fp64 code, since
> > there we use stride > 1 often.
> > 
> > Incidentally, these fixes open up more possibilities for copy propagation
> > which uncovered new bugs in copy-propagation. The folowing patches address
> > each of these new issues.
> 
> Should this be moved after those fixes?

Yeah, I think that would be better.

> -Jordan
> 
> > ---
> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21 
> > +
> >  1 file changed, 13 insertions(+), 8 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
> > b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> > index 5fae10f..23df877 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> > @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
> > stride,
> > return true;
> >  }
> >  
> > +static inline bool
> > +region_match(fs_reg src, unsigned regs_read,
> > + fs_reg dst, unsigned regs_written)
> > +{
> > +   return src.reg_offset >= dst.reg_offset &&
> > +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) 
> > &&
> > +  src.subreg_offset >= dst.subreg_offset;
> > +}
> > +
> >  bool
> >  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
> >  {
> > @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
> > acp_entry *entry)
> > /* Bail if inst is reading a range that isn't contained in the range
> >  * that entry is writing.
> >  */
> > -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
> > -   (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
> > -inst->regs_read(arg) * inst->src[arg].stride * 32) >
> > -   (entry->dst.reg_offset + entry->regs_written) * 32)
> > +   if (!region_match(inst->src[arg], inst->regs_read(arg),
> > + entry->dst, entry->regs_written))
> >return false;
> >  
> > /* we can't generally copy-propagate UD negations because we
> > @@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst, 
> > acp_entry *entry)
> >/* Bail if inst is reading a range that isn't contained in the range
> > * that entry is writing.
> > */
> > -  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
> > -  (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
> > -   inst->regs_read(i) * inst->src[i].stride * 32) >
> > -  (entry->dst.reg_offset + entry->regs_written) * 32)
> > +  if (!region_match(inst->src[i], inst->regs_read(i),
> > +entry->dst, entry->regs_written))
> >   continue;
> >  
> >fs_reg val = entry->src;
> > -- 
> > 2.5.0
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/3] glsl: Apply memory qualifiers to vars inside named block interfaces

2016-05-05 Thread Eduardo Lima Mitev

This is missing and memory qualifiers are currently being ignored for SSBOs.
---
 src/compiler/glsl/ast_to_hir.cpp | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/src/compiler/glsl/ast_to_hir.cpp b/src/compiler/glsl/ast_to_hir.cpp
index c5cd48f..5a1fc9f 100644
--- a/src/compiler/glsl/ast_to_hir.cpp
+++ b/src/compiler/glsl/ast_to_hir.cpp
@@ -6963,6 +6963,16 @@ is_unsized_array_last_element(ir_variable *v)
return false;
 }
 
+static void
+apply_memory_qualifiers(ir_variable *var, glsl_struct_field field)
+{
+   var->data.image_read_only = field.image_read_only;
+   var->data.image_write_only = field.image_write_only;
+   var->data.image_coherent = field.image_coherent;
+   var->data.image_volatile = field.image_volatile;
+   var->data.image_restrict = field.image_restrict;
+}
+
 ir_rvalue *
 ast_interface_block::hir(exec_list *instructions,
  struct _mesa_glsl_parse_state *state)
@@ -7449,6 +7459,9 @@ ast_interface_block::hir(exec_list *instructions,
}
 }
  }
+
+ if (var->data.mode == ir_var_shader_storage)
+apply_memory_qualifiers(var, fields[i]);
   }
 
   if (ir_variable *earlier =
@@ -7523,13 +7536,8 @@ ast_interface_block::hir(exec_list *instructions,
 var->data.matrix_layout = fields[i].matrix_layout;
  }
 
- if (var->data.mode == ir_var_shader_storage) {
-var->data.image_read_only = fields[i].image_read_only;
-var->data.image_write_only = fields[i].image_write_only;
-var->data.image_coherent = fields[i].image_coherent;
-var->data.image_volatile = fields[i].image_volatile;
-var->data.image_restrict = fields[i].image_restrict;
- }
+ if (var->data.mode == ir_var_shader_storage)
+apply_memory_qualifiers(var, fields[i]);
 
  /* Examine var name here since var may get deleted in the next call */
  bool var_is_gl_id = is_gl_identifier(var->name);
-- 
2.7.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/3] nir: Put nir_variable's memory qualifiers in its own struct

2016-05-05 Thread Eduardo Lima Mitev

ARB_shader_storage_buffer_object allows for memory qualifiers defined for
ARB_shader_image_load_store, to be applied to SSBOs using the same semantics.
So, for clarity, we might want to take these qualifiers out into its own
category, rather than putting them inside the 'image' specific struct.

I agree that this renaming barely improves anything other than making
nir_variable data struct pedantically stricter, so I would be ok dropping
this patch if people think it is not worth it. Also, glsl data structs
still use 'image' namespace for all memory qualifiers, so we might want
to rename things there too.
---
 src/compiler/nir/glsl_to_nir.cpp | 10 +-
 src/compiler/nir/nir.h   |  7 ++-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp |  2 +-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/compiler/nir/glsl_to_nir.cpp b/src/compiler/nir/glsl_to_nir.cpp
index c8954ce..75760f4 100644
--- a/src/compiler/nir/glsl_to_nir.cpp
+++ b/src/compiler/nir/glsl_to_nir.cpp
@@ -414,11 +414,11 @@ nir_visitor::visit(ir_variable *ir)
var->data.index = ir->data.index;
var->data.binding = ir->data.binding;
var->data.offset = ir->data.offset;
-   var->data.image.read_only = ir->data.image_read_only;
-   var->data.image.write_only = ir->data.image_write_only;
-   var->data.image.coherent = ir->data.image_coherent;
-   var->data.image._volatile = ir->data.image_volatile;
-   var->data.image.restrict_flag = ir->data.image_restrict;
+   var->data.memory.read_only = ir->data.image_read_only;
+   var->data.memory.write_only = ir->data.image_write_only;
+   var->data.memory.coherent = ir->data.image_coherent;
+   var->data.memory._volatile = ir->data.image_volatile;
+   var->data.memory.restrict_flag = ir->data.image_restrict;
var->data.image.format = ir->data.image_format;
var->data.max_array_access = ir->data.max_array_access;
 
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 98451c6..b2645a3 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -283,7 +283,7 @@ typedef struct nir_variable {
   unsigned offset;
 
   /**
-   * ARB_shader_image_load_store qualifiers.
+   * Memory qualifiers.
*/
   struct {
  bool read_only; /**< "readonly" qualifier. */
@@ -291,7 +291,12 @@ typedef struct nir_variable {
  bool coherent;
  bool _volatile;
  bool restrict_flag;
+  } memory;
 
+  /**
+   * ARB_shader_image_load_store.
+   */
+  struct {
  /** Image internal format if specified explicitly, otherwise GL_NONE. 
*/
  GLenum format;
   } image;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 4d14fda..13a4751 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -2935,7 +2935,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
 
   else if (instr->intrinsic == nir_intrinsic_image_store)
  emit_image_store(bld, image, addr, src0, surf_dims, arr_dims,
-  var->data.image.write_only ? GL_NONE : format);
+  var->data.memory.write_only ? GL_NONE : format);
 
   else
  tmp = emit_image_atomic(bld, image, addr, src0, src1,
-- 
2.7.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/3] nir/print: Print memory qualifiers in a variable declaration

2016-05-05 Thread Eduardo Lima Mitev

---
 src/compiler/nir/nir_print.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
index 9a77faf..a36561e 100644
--- a/src/compiler/nir/nir_print.c
+++ b/src/compiler/nir/nir_print.c
@@ -354,6 +354,13 @@ print_var_decl(nir_variable *var, print_state *state)
cent, samp, patch, inv, get_variable_mode_str(var->data.mode),
glsl_interp_qualifier_name(var->data.interpolation));
 
+   const char *const coher = (var->data.image.coherent) ? "coherent " : "";
+   const char *const volat = (var->data.image._volatile) ? "volatile " : "";
+   const char *const restr = (var->data.image.restrict_flag) ? "restrict " : 
"";
+   const char *const ronly = (var->data.image.read_only) ? "readonly " : "";
+   const char *const wonly = (var->data.image.write_only) ? "writeonly " : "";
+   fprintf(fp, "%s%s%s%s%s", coher, volat, restr, ronly, wonly);
+
glsl_print_type(var->type, fp);
 
fprintf(fp, " %s", get_var_name(var, state));
-- 
2.7.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 56/59] i965/fs: align access to double-based uniforms in push constant buffer

2016-05-05 Thread Samuel Iglesias Gonsálvez



On 03/05/16 01:10, Kenneth Graunke wrote:
> On Friday, April 29, 2016 1:29:53 PM PDT Samuel Iglesias Gonsálvez wrote:
>> When there is a mix of definitions of uniforms with 32-bit or 64-bit
>> data type sizes, the driver ends up doing misaligned access to double
>> based variables in the push constant buffer.
>>
>> To fix this, the driver adds padding when needed in the push constant
>> buffer and takes it into account to avoid exceeding its limits and to
>> update the GRF registers to access uniform variables.
>>
>> v2 (Iago):
>>   - Renamed some variables for clarity.
>>   - Replace the boolean array of paddings that tracked if each param was
>> padded by an integer array that keeps count of accumnulated padding.
>> This allows us to remove a couple of loops that had to compute that.
>>   - Reworked things a bit so we can get rid of the nr_paddings field in
>> brw_stage_prog_data.
>>   - Use rzalloc_array instead or ralloc_array and memset.
>>   - Fixed wrong indentation.
>>
>> Signed-off-by: Samuel Iglesias Gonsálvez 
>> Signed-off-by: Iago Toral Quiroga 
>> ---
>>  src/mesa/drivers/dri/i965/brw_compiler.h|  8 
>>  src/mesa/drivers/dri/i965/brw_fs.cpp| 50 +
> +---
>>  src/mesa/drivers/dri/i965/brw_program.c |  1 +
>>  src/mesa/drivers/dri/i965/brw_wm.c  |  2 +
>>  src/mesa/drivers/dri/i965/gen6_constant_state.c | 12 --
>>  5 files changed, 64 insertions(+), 9 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h b/src/mesa/drivers/
> dri/i965/brw_compiler.h
>> index 5807305..bc15bf3 100644
>> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
>> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
>> @@ -333,6 +333,14 @@ struct brw_stage_prog_data {
>> } binding_table;
>>  
>> GLuint nr_params;   /**< number of float params/constants */
>> +   /**
>> +* Accumulated padding (in 32-bit units) in the push constant buffer for
>> +* each param. If param_padding[n] = P, it means that params 0..n 
> required
>> +* a total accumulated padding of P 32-bit slots. This is useful to 
> compute
>> +* the actual location, including padding, for each param.
>> +*/
>> +   uint32_t *param_padding;
>> +
>> GLuint nr_pull_params;
>> unsigned nr_image_params;
>>  
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/
> i965/brw_fs.cpp
>> index 53f709b..5aa6ded 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -1571,7 +1571,8 @@ fs_visitor::assign_curb_setup()
>>  int uniform_nr = inst->src[i].nr + inst->src[i].reg_offset;
>>  int constant_nr;
>>  if (uniform_nr >= 0 && uniform_nr < (int) uniforms) {
>> -   constant_nr = push_constant_loc[uniform_nr];
>> +   constant_nr = push_constant_loc[uniform_nr] +
>> + stage_prog_data->param_padding[uniform_nr];
>>  } else {
>> /* Section 5.11 of the OpenGL 4.1 spec says:
>>  * "Out-of-bounds reads return undefined values, which 
> include
>> @@ -2010,7 +2011,10 @@ fs_visitor::assign_constant_locations()
>>return;
>>  
>> bool is_live[uniforms];
>> +   bool is_64bit[uniforms];
>> +
>> memset(is_live, 0, sizeof(is_live));
>> +   memset(is_64bit, 0, sizeof(is_64bit));
>>  
>> /* For each uniform slot, a value of true indicates that the given slot 
> and
>>  * the next slot must remain contiguous.  This is used to keep us from
>> @@ -2047,8 +2051,11 @@ fs_visitor::assign_constant_locations()
>>  is_live[last] = true;
>>   } else {
>>  if (constant_nr >= 0 && constant_nr < (int) uniforms) {
>> -   for (int j = 0; j < inst->regs_read(i); j++)
>> +   for (int j = 0; j < inst->regs_read(i); j++) {
>>is_live[constant_nr + j] = true;
>> +  if (type_sz(inst->src[i].type) == 8)
>> + is_64bit[constant_nr + j] = true;
>> +   }
>>  }
>>   }
>>}
>> @@ -2072,14 +2079,18 @@ fs_visitor::assign_constant_locations()
>>  
>> unsigned int num_push_constants = 0;
>> unsigned int num_pull_constants = 0;
>> +   unsigned num_paddings = 0;
>> +   bool is_64bit_aligned = true;
>>  
>> push_constant_loc = ralloc_array(mem_ctx, int, uniforms);
>> pull_constant_loc = ralloc_array(mem_ctx, int, uniforms);
>> +   stage_prog_data->param_padding = rzalloc_array(NULL, uint32_t, 
> uniforms);
>>  
>> int chunk_start = -1;
>> for (unsigned u = 0; u < uniforms; u++) {
>>push_constant_loc[u] = -1;
>>pull_constant_loc[u] = -1;
>> +  stage_prog_data->param_padding[u] = num_paddings;
>>  
>>if (!is_live[u])
>>   continue;
>> @@ -2094,17 +2105,44 @@ fs_visitor::assign_constant_locations()
>> */
>>if (!contiguous[u]) {
>>   unsigned chunk_

Re: [Mesa-dev] [PATCH 05/23] i965/fs: fix copy-propagation with suboffset from constants

2016-05-05 Thread Iago Toral

On Tue, 2016-05-03 at 16:21 -0700, Jordan Justen wrote:
> On 2016-05-03 05:21:54, Samuel Iglesias Gonsálvez wrote:
> > From: Iago Toral Quiroga 
> > 
> > The current code ignores the suboffet in the instruction's source
> > and just uses the one from the constant. This is not correct
> > when the instruction's source is accessing the constant with a
> > different type and using the suboffset to select a specific
> > chunk of the constant. We generate this kind of code in fp64
> > when we want to select only the high 32-bit of a particular
> > double constant.
> > 
> > Instead, we should add any existing suboffset in the
> > instruction's source (modulo the size of the entry's type)
> > to the suboffset in the constant so we can preserve the orinal
> > semantics.
> > 
> > Prevents that we turn this:
> > 
> > mov(8) vgrf5:DF, u2<0>:DF
> > mov(8) vgrf7:UD, vgrf5+0.4<2>:UD
> > 
> > Into:
> > 
> > mov(8) vgrf7:UD, u2<0>:UD
> > 
> > And instead, with this patch, we produce:
> > 
> > mov(8) vgrf7:UD, u2+0.4<0>:UD
> > ---
> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 23 
> > --
> >  1 file changed, 21 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
> > b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> > index aa4c9c9..5fae10f 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> > @@ -445,8 +445,27 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
> > acp_entry *entry)
> > case BAD_FILE:
> > case ARF:
> > case FIXED_GRF:
> > -  inst->src[arg].reg_offset = entry->src.reg_offset;
> > -  inst->src[arg].subreg_offset = entry->src.subreg_offset;
> > +  {
> > + inst->src[arg].reg_offset = entry->src.reg_offset;
> > + inst->src[arg].subreg_offset = entry->src.subreg_offset;
> > +
> > + /* If we copy propagate from a larger type we have to be aware 
> > that
> > +  * the instruction might be using subreg_offset to select a 
> > particular
> > +  * chunk of the data in the entry. For example:
> > +  *
> > +  * mov(8) vgrf5:DF, u2<0>:DF
> > +  * mov(8) vgrf7:UD, vgrf5+0.4<2>:UD
> > +  *
> > +  * vgrf5+0.4<2>:UD is actually reading the high 32-bit of u2.0, 
> > so if
> > +  * we want to copy propagate here we have to do it from u2+0.4.
> > +  */
> > + int type_sz_src = type_sz(inst->src[arg].type);
> > + int type_sz_entry = type_sz(entry->src.type);
> > + if (type_sz_entry > type_sz_src) {
> > +inst->src[arg].subreg_offset +=
> > +   inst->src[arg].subreg_offset % type_sz_entry;
> 
> Seeing 'inst->src[arg].subreg_offset' on both sides of this += seems
> strange. Is this correct?

Yes, this looks wrong. I'll fix it, thanks!

> -Jordan
> 
> > + }
> > +  }
> >break;
> > case ATTR:
> > case VGRF:
> > -- 
> > 2.5.0
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-05 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #17 from David Lonie  ---
Thanks for the info Jose, I'll make a note of that for the future.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: Only enable ARB_query_buffer_object for newer kernels on Haswell.

2016-05-05 Thread Jordan Justen

On 2016-05-05 03:15:51, Kenneth Graunke wrote:
> On Haswell, we need version 6 of the kernel command parser in order to
> write the math registers.  Our implementation of ARB_query_buffer_object
> heavily relies on MI_MATH, so we should only advertise it when MI_MATH
> is available.
> 

Doh! Command Parser!! :(

> We're going to want to use MI_MATH in more places in the future.
> To make checking for it easier, introduce a screen->has_mi_math flag.
> 
> Cc: Jordan Justen 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/intel_extensions.c | 2 +-
>  src/mesa/drivers/dri/i965/intel_screen.c | 8 
>  src/mesa/drivers/dri/i965/intel_screen.h | 5 +
>  3 files changed, 14 insertions(+), 1 deletion(-)
> 
> Note: kernel command parser version 7 isn't actually a real thing yet.
> You need the patch I just sent to intel-gfx:
> [PATCH] drm/i915: Allow MI_LOAD_REGISTER_REG between whitelisted registers.
> 
> Also, Piglit's qbo test's
> query-GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN-ASYNC subtest fails for me
> on Haswell.  I haven't looked into why yet.
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
> b/src/mesa/drivers/dri/i965/intel_extensions.c
> index 588df1a..7794c2f 100644
> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> @@ -366,7 +366,7 @@ intelInitExtensions(struct gl_context *ctx)
>}
> }
>  
> -   if (brw->gen >= 8 || brw->is_haswell) {
> +   if (brw->intelScreen->has_mi_math) {

Should we bump the check to gen >= 8 for now?

With just changing the check to not include hsw:

Reviewed-by: Jordan Justen 

>ctx->Extensions.ARB_query_buffer_object = true;
> }
>  
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
> b/src/mesa/drivers/dri/i965/intel_screen.c
> index 878901a..24fe523 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -1541,6 +1541,14 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
> if (ret == -1)
>intelScreen->cmd_parser_version = 0;
>  
> +   /* Haswell requires command parser version 6 in order to write to the
> +* MI_MATH GPR registers, and version 7 in order to use
> +* MI_LOAD_REGISTER_REG (which all users of MI_MATH use).
> +*/
> +   intelScreen->has_mi_math = intelScreen->devinfo->gen >= 8 ||
> +  (intelScreen->devinfo->is_haswell &&
> +   intelScreen->cmd_parser_version >= 7);

Do you think has_mi_math should imply LRR? Command parser version 6
should allow MI_MATH to be used. I guess the Vulkan driver uses
MI_MATH without LRR.

-Jordan

> +
> psp->extensions = !intelScreen->has_context_reset_notification
>? intelScreenExtensions : intelRobustScreenExtensions;
>  
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.h 
> b/src/mesa/drivers/dri/i965/intel_screen.h
> index 01d45d0..8000c4c 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.h
> +++ b/src/mesa/drivers/dri/i965/intel_screen.h
> @@ -56,6 +56,11 @@ struct intel_screen
> bool has_resource_streamer;
>  
> /**
> +* Does the current hardware and kernel support MI_MATH?
> +*/
> +   bool has_mi_math;
> +
> +   /**
>  * Does the kernel support context reset notifications?
>  */
> bool has_context_reset_notification;
> -- 
> 2.8.2
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 00/14] RadeonSI: SDMA texture copy rewrite

2016-05-05 Thread Alex Deucher

On Wed, May 4, 2016 at 7:43 PM, Marek Olšák  wrote:
> Hi,
>
> This patch series completely rewrites texture copying with SDMA for CIK & VI. 
> It only uses the "partial" copy packets, which makes it a lot simpler (one 
> packet per layered/3D copy). Most of the complexity is in handling hw 
> limitations, but luckily the SDMA path can be used in the majority of cases.
>
> On top of that, the DMA IB support is improved thanks to extensive testing 
> and benchmarking.
>
> This is only enabled on VI for now. It can be enabled on CIK after enough 
> testing is done.
>
> There is one more hardware limitation that can cause VM faults with T2L 
> copies and needs a workaround. The exact workaround is still under 
> discussion, but I think this is good enough for review already.
>
> On Tonga, it beats the 3D engine in texture upload performance by 0-80%. On 
> APUs, the performance is mixed or slightly worse. The reason may be that the 
> 3D engine uses less bandwidth thanks to DCC.
>
> The fact that SDMA can't do DCC compression can be detrimental to texturing 
> performance on VI. The impact hasn't been measured yet.
>
> Texture uploads still use 2 copies (user memory -> linear, linear -> tiled). 
> Merging those 2 copies into 1 is not done in this series.
>
> Please review and opinions on the DCC issue are welcome,

Have you looked at using SDMA for buffer clears as well?

Patches are:
Reviewed-by: Alex Deucher 

>
> Marek
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 00/14] RadeonSI: SDMA texture copy rewrite

2016-05-05 Thread Marek Olšák

On Thu, May 5, 2016 at 5:57 PM, Alex Deucher  wrote:
> On Wed, May 4, 2016 at 7:43 PM, Marek Olšák  wrote:
>> Hi,
>>
>> This patch series completely rewrites texture copying with SDMA for CIK & 
>> VI. It only uses the "partial" copy packets, which makes it a lot simpler 
>> (one packet per layered/3D copy). Most of the complexity is in handling hw 
>> limitations, but luckily the SDMA path can be used in the majority of cases.
>>
>> On top of that, the DMA IB support is improved thanks to extensive testing 
>> and benchmarking.
>>
>> This is only enabled on VI for now. It can be enabled on CIK after enough 
>> testing is done.
>>
>> There is one more hardware limitation that can cause VM faults with T2L 
>> copies and needs a workaround. The exact workaround is still under 
>> discussion, but I think this is good enough for review already.
>>
>> On Tonga, it beats the 3D engine in texture upload performance by 0-80%. On 
>> APUs, the performance is mixed or slightly worse. The reason may be that the 
>> 3D engine uses less bandwidth thanks to DCC.
>>
>> The fact that SDMA can't do DCC compression can be detrimental to texturing 
>> performance on VI. The impact hasn't been measured yet.
>>
>> Texture uploads still use 2 copies (user memory -> linear, linear -> tiled). 
>> Merging those 2 copies into 1 is not done in this series.
>>
>> Please review and opinions on the DCC issue are welcome,
>
> Have you looked at using SDMA for buffer clears as well?

In general, I don't think SDMA is a good match for any buffer-related
operations if a GFX IB flush is required.

The only area where it might make sense to use SDMA is the initial
HTILE/DCC/CMASK fill when new textures are being created and there is
no dependency on the GFX IB.

We already use SDMA for all pipelined buffer uploads. I don't know if
that is helping us or hurting us. Probably the latter since CP DMA can
copy through (and thus preload) TC L2, while SDMA can't.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 00/14] RadeonSI: SDMA texture copy rewrite

2016-05-05 Thread Alex Deucher

On Thu, May 5, 2016 at 12:17 PM, Marek Olšák  wrote:
> On Thu, May 5, 2016 at 5:57 PM, Alex Deucher  wrote:
>> On Wed, May 4, 2016 at 7:43 PM, Marek Olšák  wrote:
>>> Hi,
>>>
>>> This patch series completely rewrites texture copying with SDMA for CIK & 
>>> VI. It only uses the "partial" copy packets, which makes it a lot simpler 
>>> (one packet per layered/3D copy). Most of the complexity is in handling hw 
>>> limitations, but luckily the SDMA path can be used in the majority of cases.
>>>
>>> On top of that, the DMA IB support is improved thanks to extensive testing 
>>> and benchmarking.
>>>
>>> This is only enabled on VI for now. It can be enabled on CIK after enough 
>>> testing is done.
>>>
>>> There is one more hardware limitation that can cause VM faults with T2L 
>>> copies and needs a workaround. The exact workaround is still under 
>>> discussion, but I think this is good enough for review already.
>>>
>>> On Tonga, it beats the 3D engine in texture upload performance by 0-80%. On 
>>> APUs, the performance is mixed or slightly worse. The reason may be that 
>>> the 3D engine uses less bandwidth thanks to DCC.
>>>
>>> The fact that SDMA can't do DCC compression can be detrimental to texturing 
>>> performance on VI. The impact hasn't been measured yet.
>>>
>>> Texture uploads still use 2 copies (user memory -> linear, linear -> 
>>> tiled). Merging those 2 copies into 1 is not done in this series.
>>>
>>> Please review and opinions on the DCC issue are welcome,
>>
>> Have you looked at using SDMA for buffer clears as well?
>
> In general, I don't think SDMA is a good match for any buffer-related
> operations if a GFX IB flush is required.
>
> The only area where it might make sense to use SDMA is the initial
> HTILE/DCC/CMASK fill when new textures are being created and there is
> no dependency on the GFX IB.

Yeah, I was thinking it might be useful for UVD and VCE since we do a
lot of clears in the state setup and we don't really have a dependency
on the GFX pipe.

Alex

>
> We already use SDMA for all pipelined buffer uploads. I don't know if
> that is helping us or hurting us. Probably the latter since CP DMA can
> copy through (and thus preload) TC L2, while SDMA can't.
>
> Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: set DECOMPRESS_Z_ON_FLUSH if nr_samples >= 4

2016-05-05 Thread Nicolai Hähnle


Reviewed-by: Nicolai Hähnle 

On 04.05.2016 17:35, Marek Olšák wrote:

From: Marek Olšák 

Vulkan always sets this. It only affects in-place Z decompression.
This is recommended for performance, but what app uses MSAA depth
texturing?
---
  src/gallium/drivers/radeonsi/si_state.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 81a4341..c6e10b7 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1104,7 +1104,8 @@ static void si_emit_db_render_state(struct si_context 
*sctx, struct r600_atom *s
/* DB_RENDER_OVERRIDE2 */
radeon_set_context_reg(cs, R_028010_DB_RENDER_OVERRIDE2,

S_028010_DISABLE_ZMASK_EXPCLEAR_OPTIMIZATION(sctx->db_depth_disable_expclear) |
-   
S_028010_DISABLE_SMEM_EXPCLEAR_OPTIMIZATION(sctx->db_stencil_disable_expclear));
+   
S_028010_DISABLE_SMEM_EXPCLEAR_OPTIMIZATION(sctx->db_stencil_disable_expclear) |
+   S_028010_DECOMPRESS_Z_ON_FLUSH(sctx->framebuffer.nr_samples >= 
4));

db_shader_control = 
S_02880C_ALPHA_TO_MASK_DISABLE(sctx->framebuffer.cb0_is_integer) |
sctx->ps_db_shader_control;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] PCI on PowerPC

2016-05-05 Thread Nicolai Hähnle


Hi Mathieu,

discussions of the kernel drivers should be sent to 
dri-de...@lists.freedesktop.org instead.


Thanks,
Nicolai

On 03.05.2016 01:42, Mathieu Malaterre wrote:

Dear all,

I've tested a patch against radeon_drc.c so that the default mode is
now PCI on PowerPC arch.

This is the result of the discussion on the following bug report:

https://bugs.freedesktop.org/show_bug.cgi?id=95017

I had prepared a less invasive patch (targetting only the specific
subsystem I had), but Ilia Mirkin mentionned that nouveau driver is
already doing the same invasive patch for all PowerPC arch.

Thanks for your comments,
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 00/14] RadeonSI: SDMA texture copy rewrite

2016-05-05 Thread Marek Olšák

> There is one more hardware limitation that can cause VM faults with T2L 
> copies and needs a workaround. The exact workaround is still under 
> discussion, but I think this is good enough for review already.

BTW, the T2L VM fault is completely harmless. It's caused by SDMA
reading more entries from the page directory than needed. The
workaround can be added in a follow-up patch.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] r600g: use the hw MSAA resolving if formats are compatible

2016-05-05 Thread Marek Olšák

From: Marek Olšák 

This allows resolving RGBA into RGBX.
This should improve HL2 Lost Coast performance.
---
 src/gallium/drivers/r600/r600_blit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_blit.c 
b/src/gallium/drivers/r600/r600_blit.c
index ed67cb8..9230b40 100644
--- a/src/gallium/drivers/r600/r600_blit.c
+++ b/src/gallium/drivers/r600/r600_blit.c
@@ -808,7 +808,8 @@ static bool do_hardware_msaa_resolve(struct pipe_context 
*ctx,
info->dst.resource->nr_samples <= 1 &&
util_max_layer(info->src.resource, 0) == 0 &&
util_max_layer(info->dst.resource, info->dst.level) == 0 &&
-   info->dst.format == info->src.format &&
+   util_is_format_compatible(util_format_description(info->src.format),
+ 
util_format_description(info->dst.format)) &&
!util_format_is_pure_integer(format) &&
!util_format_is_depth_or_stencil(format) &&
!info->scissor_enable &&
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] r600g: use the hw MSAA resolving if formats are compatible

2016-05-05 Thread Alex Deucher

On Thu, May 5, 2016 at 1:03 PM, Marek Olšák  wrote:
> From: Marek Olšák 
>
> This allows resolving RGBA into RGBX.
> This should improve HL2 Lost Coast performance.

Reviewed-by: Alex Deucher 

> ---
>  src/gallium/drivers/r600/r600_blit.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/r600/r600_blit.c 
> b/src/gallium/drivers/r600/r600_blit.c
> index ed67cb8..9230b40 100644
> --- a/src/gallium/drivers/r600/r600_blit.c
> +++ b/src/gallium/drivers/r600/r600_blit.c
> @@ -808,7 +808,8 @@ static bool do_hardware_msaa_resolve(struct pipe_context 
> *ctx,
> info->dst.resource->nr_samples <= 1 &&
> util_max_layer(info->src.resource, 0) == 0 &&
> util_max_layer(info->dst.resource, info->dst.level) == 0 &&
> -   info->dst.format == info->src.format &&
> +   
> util_is_format_compatible(util_format_description(info->src.format),
> + 
> util_format_description(info->dst.format)) &&
> !util_format_is_pure_integer(format) &&
> !util_format_is_depth_or_stencil(format) &&
> !info->scissor_enable &&
> --
> 2.7.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: Only enable ARB_query_buffer_object for newer kernels on Haswell.

2016-05-05 Thread Kenneth Graunke

On Thursday, May 5, 2016 8:17:03 AM PDT Jordan Justen wrote:
> On 2016-05-05 03:15:51, Kenneth Graunke wrote:
> > On Haswell, we need version 6 of the kernel command parser in order to
> > write the math registers.  Our implementation of ARB_query_buffer_object
> > heavily relies on MI_MATH, so we should only advertise it when MI_MATH
> > is available.
> > 
> 
> Doh! Command Parser!! :(
> 
> > We're going to want to use MI_MATH in more places in the future.
> > To make checking for it easier, introduce a screen->has_mi_math flag.
> > 
> > Cc: Jordan Justen 
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/mesa/drivers/dri/i965/intel_extensions.c | 2 +-
> >  src/mesa/drivers/dri/i965/intel_screen.c | 8 
> >  src/mesa/drivers/dri/i965/intel_screen.h | 5 +
> >  3 files changed, 14 insertions(+), 1 deletion(-)
> > 
> > Note: kernel command parser version 7 isn't actually a real thing yet.
> > You need the patch I just sent to intel-gfx:
> > [PATCH] drm/i915: Allow MI_LOAD_REGISTER_REG between whitelisted 
registers.
> > 
> > Also, Piglit's qbo test's
> > query-GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN-ASYNC subtest fails for me
> > on Haswell.  I haven't looked into why yet.
> > 
> > diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/
drivers/dri/i965/intel_extensions.c
> > index 588df1a..7794c2f 100644
> > --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> > +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> > @@ -366,7 +366,7 @@ intelInitExtensions(struct gl_context *ctx)
> >}
> > }
> >  
> > -   if (brw->gen >= 8 || brw->is_haswell) {
> > +   if (brw->intelScreen->has_mi_math) {
> 
> Should we bump the check to gen >= 8 for now?
> 
> With just changing the check to not include hsw:
> 
> Reviewed-by: Jordan Justen 
> 
> >ctx->Extensions.ARB_query_buffer_object = true;
> > }
> >  
> > diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/
dri/i965/intel_screen.c
> > index 878901a..24fe523 100644
> > --- a/src/mesa/drivers/dri/i965/intel_screen.c
> > +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> > @@ -1541,6 +1541,14 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
> > if (ret == -1)
> >intelScreen->cmd_parser_version = 0;
> >  
> > +   /* Haswell requires command parser version 6 in order to write to the
> > +* MI_MATH GPR registers, and version 7 in order to use
> > +* MI_LOAD_REGISTER_REG (which all users of MI_MATH use).
> > +*/
> > +   intelScreen->has_mi_math = intelScreen->devinfo->gen >= 8 ||
> > +  (intelScreen->devinfo->is_haswell &&
> > +   intelScreen->cmd_parser_version >= 7);
> 
> Do you think has_mi_math should imply LRR? Command parser version 6
> should allow MI_MATH to be used. I guess the Vulkan driver uses
> MI_MATH without LRR.
> 
> -Jordan

I'll admit, it's a little weird...originally this just checked for
version 6.  But then I discovered that we needed MI_LOAD_REGISTER_REG
too...both your query buffer code and my new ARB_transform_feedback2
implementation use both commands.

I suppose we could rename it to has_mi_math_and_load_reg_reg?


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] nir: fix nir_before/after_cf_node() for function impls

2016-05-05 Thread Connor Abbott

Signed-off-by: Connor Abbott 
---
 src/compiler/nir/nir.h | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 98451c6..8316b0d 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1995,19 +1995,33 @@ nir_after_block_before_jump(nir_block *block)
 static inline nir_cursor
 nir_before_cf_node(nir_cf_node *node)
 {
-   if (node->type == nir_cf_node_block)
+   switch(node->type) {
+   case nir_cf_node_block:
   return nir_before_block(nir_cf_node_as_block(node));
 
-   return nir_after_block(nir_cf_node_as_block(nir_cf_node_prev(node)));
+   case nir_cf_node_function:
+  return nir_before_block(
+ nir_start_block(nir_cf_node_as_function(node)));
+
+   default:
+  return nir_after_block(nir_cf_node_as_block(nir_cf_node_prev(node)));
+   }
 }
 
 static inline nir_cursor
 nir_after_cf_node(nir_cf_node *node)
 {
-   if (node->type == nir_cf_node_block)
+   switch(node->type) {
+   case nir_cf_node_block:
   return nir_after_block(nir_cf_node_as_block(node));
 
-   return nir_before_block(nir_cf_node_as_block(nir_cf_node_next(node)));
+   case nir_cf_node_function:
+  return nir_after_block(
+ nir_impl_last_block(nir_cf_node_as_function(node)));
+
+   default:
+  return nir_before_block(nir_cf_node_as_block(nir_cf_node_next(node)));
+   }
 }
 
 static inline nir_cursor
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: fix nir_before/after_cf_node() for function impls

2016-05-05 Thread Jason Ekstrand

On Thu, May 5, 2016 at 10:10 AM, Connor Abbott  wrote:

> Signed-off-by: Connor Abbott 
> ---
>  src/compiler/nir/nir.h | 22 ++
>  1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 98451c6..8316b0d 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -1995,19 +1995,33 @@ nir_after_block_before_jump(nir_block *block)
>  static inline nir_cursor
>  nir_before_cf_node(nir_cf_node *node)
>  {
> -   if (node->type == nir_cf_node_block)
> +   switch(node->type) {
> +   case nir_cf_node_block:
>return nir_before_block(nir_cf_node_as_block(node));
>
> -   return nir_after_block(nir_cf_node_as_block(nir_cf_node_prev(node)));
> +   case nir_cf_node_function:
> +  return nir_before_block(
> + nir_start_block(nir_cf_node_as_function(node)));
>

I'm not sure what I think about this.  Every other nir_before_cf_node case
that touches a "container" node puts it *before* the container not at the
top of it.  This is a substantial edge-case.  What's the use-case?


> +
> +   default:
> +  return
> nir_after_block(nir_cf_node_as_block(nir_cf_node_prev(node)));
> +   }
>  }
>
>  static inline nir_cursor
>  nir_after_cf_node(nir_cf_node *node)
>  {
> -   if (node->type == nir_cf_node_block)
> +   switch(node->type) {
> +   case nir_cf_node_block:
>return nir_after_block(nir_cf_node_as_block(node));
>
> -   return nir_before_block(nir_cf_node_as_block(nir_cf_node_next(node)));
> +   case nir_cf_node_function:
> +  return nir_after_block(
> + nir_impl_last_block(nir_cf_node_as_function(node)));
> +
> +   default:
> +  return
> nir_before_block(nir_cf_node_as_block(nir_cf_node_next(node)));
> +   }
>  }
>
>  static inline nir_cursor
> --
> 2.5.5
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] meta/blit: Don't blend integer values during MSAA resolves

2016-05-05 Thread Matt Turner

On Wed, May 4, 2016 at 4:22 PM, Jason Ekstrand  wrote:
> ---
>  src/mesa/drivers/common/meta_blit.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/common/meta_blit.c 
> b/src/mesa/drivers/common/meta_blit.c
> index 6761238..bb79c46 100644
> --- a/src/mesa/drivers/common/meta_blit.c
> +++ b/src/mesa/drivers/common/meta_blit.c
> @@ -458,8 +458,17 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>   int step;
>
>   if (src_datatype == GL_INT || src_datatype == GL_UNSIGNED_INT) {
> -merge_function =
> -   "gvec4 merge(gvec4 a, gvec4 b) { return (a >> gvec4(1)) + (b 
> >> gvec4(1)) + (a & b & gvec4(1)); }\n";
> +/* From the OpenGL ES 3.2 spec section 16.2.1:
> + *
> + *"If the source formats are integer types or stencil values,
> + *a single sample’s value is selected for each pixel."

Let's replace the smart quote with a regular old single quote.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965/borp: Don't blend integer values during MSAA resolves

2016-05-05 Thread Matt Turner

Typo in subject: s/borp/blorp/

On Wed, May 4, 2016 at 4:22 PM, Jason Ekstrand  wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 30 
> ++--
>  1 file changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
> b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> index 882f9ed..dc68eab 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> @@ -2160,6 +2160,8 @@ brw_blorp_blit_program::manual_blend_average(unsigned 
> num_samples)
> if (key->tex_layout == INTEL_MSAA_LAYOUT_CMS)
>mcs_fetch();
>
> +   assert(key->texture_data_type == BRW_REGISTER_TYPE_F);
> +
> /* We add together samples using a binary tree structure, e.g. for 4x 
> MSAA:
>  *
>  *   result = ((sample[0] + sample[1]) + (sample[2] + sample[3])) / 4
> @@ -2228,8 +2230,7 @@ brw_blorp_blit_program::manual_blend_average(unsigned 
> num_samples)
>
>   /* TODO: should use a smaller loop bound for non_RGBA formats */
>   for (int k = 0; k < 4; ++k) {
> -emit_combine(key->texture_data_type == BRW_REGISTER_TYPE_F ?
> -BRW_OPCODE_ADD : BRW_OPCODE_AVG,
> +emit_combine(BRW_OPCODE_ADD,
>   offset(texture_data[stack_depth - 1], 2*k),
>   offset(vec8(texture_data[stack_depth - 1]), 2*k),
>   offset(vec8(texture_data[stack_depth]), 2*k));
> @@ -2240,14 +2241,12 @@ brw_blorp_blit_program::manual_blend_average(unsigned 
> num_samples)
> /* We should have just 1 sample on the stack now. */
> assert(stack_depth == 1);
>
> -   if (key->texture_data_type == BRW_REGISTER_TYPE_F) {
> -  /* Scale the result down by a factor of num_samples */
> -  /* TODO: should use a smaller loop bound for non-RGBA formats */
> -  for (int j = 0; j < 4; ++j) {
> - emit_mul(offset(texture_data[0], 2*j),
> - offset(vec8(texture_data[0]), 2*j),
> - brw_imm_f(1.0f / num_samples));
> -  }
> +   /* Scale the result down by a factor of num_samples */
> +   /* TODO: should use a smaller loop bound for non-RGBA formats */
> +   for (int j = 0; j < 4; ++j) {
> +  emit_mul(offset(texture_data[0], 2*j),
> +  offset(vec8(texture_data[0]), 2*j),
> +  brw_imm_f(1.0f / num_samples));
> }
>
> if (key->tex_layout == INTEL_MSAA_LAYOUT_CMS)
> @@ -2869,8 +2868,17 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
> GLenum base_format = _mesa_get_format_base_format(src_mt->format);
> if (base_format != GL_DEPTH_COMPONENT && /* TODO: what about 
> depth/stencil? */
> base_format != GL_STENCIL_INDEX &&
> +   !_mesa_is_format_integer(src_mt->format) &&
> src_mt->num_samples > 1 && dst_mt->num_samples <= 1) {
> -  /* We are downsampling a color buffer, so blend. */
> +  /* We are downsampling a non-integer color buffer, so blend.
> +   *
> +   * Regarding integer color buffers, the OpenGL ES 3.2 spec says:
> +   *
> +   *"If the source formats are integer types or stencil values, a
> +   *single sample’s value is selected for each pixel."

Smart quote -> single quote.

With those couple of things fixed, these two are

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: Only enable ARB_query_buffer_object for newer kernels on Haswell.

2016-05-05 Thread Jordan Justen

On 2016-05-05 10:11:24, Kenneth Graunke wrote:
> On Thursday, May 5, 2016 8:17:03 AM PDT Jordan Justen wrote:
> > On 2016-05-05 03:15:51, Kenneth Graunke wrote:
> > >  
> > > +   /* Haswell requires command parser version 6 in order to write to the
> > > +* MI_MATH GPR registers, and version 7 in order to use
> > > +* MI_LOAD_REGISTER_REG (which all users of MI_MATH use).
> > > +*/
> > > +   intelScreen->has_mi_math = intelScreen->devinfo->gen >= 8 ||
> > > +  (intelScreen->devinfo->is_haswell &&
> > > +   intelScreen->cmd_parser_version >= 7);
> > 
> > Do you think has_mi_math should imply LRR? Command parser version 6
> > should allow MI_MATH to be used. I guess the Vulkan driver uses
> > MI_MATH without LRR.
> > 
> > -Jordan
> 
> I'll admit, it's a little weird...originally this just checked for
> version 6.  But then I discovered that we needed MI_LOAD_REGISTER_REG
> too...both your query buffer code and my new ARB_transform_feedback2
> implementation use both commands.
> 
> I suppose we could rename it to has_mi_math_and_load_reg_reg?

has_mi_math_and_lrr?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [v4 01/11] i965/blorp: Set full resolve for lossless compressed

2016-05-05 Thread Ben Widawsky

On Thu, Apr 21, 2016 at 02:58:56PM +0300, Topi Pohjolainen wrote:
> Signed-off-by: Topi Pohjolainen 
> ---
>  src/mesa/drivers/dri/i965/brw_blorp_clear.cpp | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp 
> b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> index 41ff2a5..b1da935 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> @@ -217,7 +217,10 @@ brw_blorp_rt_resolve_params::brw_blorp_rt_resolve_params(
>  
> brw_get_resolve_rect(brw, mt, &x0, &y0, &x1, &y1);
>  
> -   fast_clear_op = GEN7_PS_RENDER_TARGET_RESOLVE_ENABLE;
> +   if (intel_miptree_is_lossless_compressed(brw, mt))
> +  fast_clear_op = GEN9_PS_RENDER_TARGET_RESOLVE_FULL;
> +   else
> +  fast_clear_op = GEN7_PS_RENDER_TARGET_RESOLVE_ENABLE;
>  
> /* Note: there is no need to initialize push constants because it doesn't
>  * matter what data gets dispatched to the render target.  However, we 
> must

Reviewed-by: Ben Widawsky 



After rebasing this it becomes params.fast_clear_op and I would be in favor of:

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h 
b/src/mesa/drivers/dri/i965/brw_blorp.h
index c5c2c4e..d61e9d5 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.h
+++ b/src/mesa/drivers/dri/i965/brw_blorp.h
@@ -225,7 +225,10 @@ struct brw_blorp_params
struct brw_blorp_surface_info src;
struct brw_blorp_surface_info dst;
enum gen6_hiz_op hiz_op;
-   unsigned fast_clear_op;
+   union {
+  unsigned fast_clear_op;
+  unsigned resolve_type;
+   };
bool color_write_disable[4];
struct brw_blorp_wm_push_constants wm_push_consts;
unsigned num_varyings;
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index 6c48dfc..9b6c95b 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -411,9 +411,9 @@ brw_blorp_resolve_color(struct brw_context *brw, struct 
intel_mipmap_tree *mt)
 ¶ms.x1, ¶ms.y1);
 
if (intel_miptree_is_lossless_compressed(brw, mt))
-  params.fast_clear_op = GEN9_PS_RENDER_TARGET_RESOLVE_FULL;
+  params.resolve_type = GEN9_PS_RENDER_TARGET_RESOLVE_FULL;
else
-  params.fast_clear_op = GEN7_PS_RENDER_TARGET_RESOLVE_ENABLE;
+  params.resolve_type = GEN7_PS_RENDER_TARGET_RESOLVE_ENABLE;
 
/* Note: there is no need to initialize push constants because it doesn't
 * matter what data gets dispatched to the render target.  However, we must

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [v4 02/11] i965: Relax assertion of halign == 16 for lossless compressed aux

2016-05-05 Thread Ben Widawsky

On Thu, Apr 21, 2016 at 02:58:57PM +0300, Topi Pohjolainen wrote:
> Signed-off-by: Topi Pohjolainen 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index ae08300..b68575f 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -556,8 +556,13 @@ intel_miptree_create_layout(struct brw_context *brw,
> } else if (brw->gen >= 9 && num_samples > 1) {
>layout_flags |= MIPTREE_LAYOUT_FORCE_HALIGN16;
> } else {
> +  const bool is_lossless_compressed_aux =
> + brw->gen >= 9 && num_samples == 1 &&
> + mt->format == MESA_FORMAT_R_UINT32;
> +
>/* For now, nothing else has this requirement */
> -  assert((layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0);
> +  assert(is_lossless_compressed_aux ||
> + (layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0);
> }
>  
> brw_miptree_layout(brw, mt, layout_flags);

I'd just drop the else condition entirely instead. The else case assertion also
becomes invalid when we want to fast clear multi LOD and arrayed surfaces. I'm
slightly worried that the assertion is somewhat frail and not future proof.

Also the format check makes me wonder if you want to add some assertion or
condition to intel_miptree_is_lossless_compressed? ie.

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 17c87a2..467a60b 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -296,6 +296,9 @@ intel_miptree_is_lossless_compressed(const struct 
brw_context *brw,
if (mt->msaa_layout != INTEL_MSAA_LAYOUT_CMS)
   return false;
 
+   if (mt->format == MESA_FORMAT_R_UINT32)
+  return false;
+
/* And finally distinguish between msaa and single sample case. */
return mt->num_samples <= 1;
 }

Either way:
Reviewed-by: Ben Widawsky 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [v4 02/11] i965: Relax assertion of halign == 16 for lossless compressed aux

2016-05-05 Thread Ben Widawsky

On Thu, May 05, 2016 at 10:51:32AM -0700, Ben Widawsky wrote:
> On Thu, Apr 21, 2016 at 02:58:57PM +0300, Topi Pohjolainen wrote:
> > Signed-off-by: Topi Pohjolainen 
> > ---
> >  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > index ae08300..b68575f 100644
> > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > @@ -556,8 +556,13 @@ intel_miptree_create_layout(struct brw_context *brw,
> > } else if (brw->gen >= 9 && num_samples > 1) {
> >layout_flags |= MIPTREE_LAYOUT_FORCE_HALIGN16;
> > } else {
> > +  const bool is_lossless_compressed_aux =
> > + brw->gen >= 9 && num_samples == 1 &&
> > + mt->format == MESA_FORMAT_R_UINT32;
> > +
> >/* For now, nothing else has this requirement */
> > -  assert((layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0);
> > +  assert(is_lossless_compressed_aux ||
> > + (layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0);
> > }
> >  
> > brw_miptree_layout(brw, mt, layout_flags);
> 
> I'd just drop the else condition entirely instead. The else case assertion 
> also
> becomes invalid when we want to fast clear multi LOD and arrayed surfaces. I'm
> slightly worried that the assertion is somewhat frail and not future proof.
> 
> Also the format check makes me wonder if you want to add some assertion or
> condition to intel_miptree_is_lossless_compressed? ie.
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 17c87a2..467a60b 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -296,6 +296,9 @@ intel_miptree_is_lossless_compressed(const struct 
> brw_context *brw,
> if (mt->msaa_layout != INTEL_MSAA_LAYOUT_CMS)
>return false;
>  
> +   if (mt->format == MESA_FORMAT_R_UINT32)
> +  return false;
> +
> /* And finally distinguish between msaa and single sample case. */
> return mt->num_samples <= 1;
>  }
> 
> Either way:
> Reviewed-by: Ben Widawsky 

Hmm. On second thought, doesn't this still violate what the spec says?
"When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, HALIGN 16 must be
used."
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [v4 03/11] i965/gen9: Prepare surface state setup for lossless compression

2016-05-05 Thread Ben Widawsky

On Thu, Apr 21, 2016 at 02:58:58PM +0300, Topi Pohjolainen wrote:
> v2 (Ben): Use combination of msaa_layout and number of samples
>   instead of introducing explicit type for lossless
>   compression (intel_miptree_is_lossless_compressed()).
> v3 (Ben): Do not set fast claer state in surface state setup.
>   Moved into brw_postdraw_set_buffers_need_resolve()
>   using a separate patch.
> v4: Support for blorp
> 
> Signed-off-by: Topi Pohjolainen 
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h| 1 +
>  src/mesa/drivers/dri/i965/gen8_blorp.cpp   | 5 -
>  src/mesa/drivers/dri/i965/gen8_surface_state.c | 3 +++
>  3 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 60b696c..3193d32 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -656,6 +656,7 @@
>  #define GEN8_SURFACE_AUX_MODE_MCS   1
>  #define GEN8_SURFACE_AUX_MODE_APPEND2
>  #define GEN8_SURFACE_AUX_MODE_HIZ   3
> +#define GEN9_SURFACE_AUX_MODE_CCS_E 5
>  
>  /* Surface state DW7 */
>  #define GEN9_SURFACE_RT_COMPRESSION_SHIFT   30
> diff --git a/src/mesa/drivers/dri/i965/gen8_blorp.cpp 
> b/src/mesa/drivers/dri/i965/gen8_blorp.cpp
> index bbf724c..acab52d 100644
> --- a/src/mesa/drivers/dri/i965/gen8_blorp.cpp
> +++ b/src/mesa/drivers/dri/i965/gen8_blorp.cpp

s/cpp/c

> @@ -74,10 +74,13 @@ gen8_blorp_emit_surface_state(struct brw_context *brw,
>  surface->msaa_layout);
>  
> if (surface->mt->mcs_mt) {
> +  const uint32_t aux_mode =
> + intel_miptree_is_lossless_compressed(brw, mt) ?
> +GEN9_SURFACE_AUX_MODE_CCS_E : GEN8_SURFACE_AUX_MODE_MCS;
>surf[6] = SET_FIELD(surface->mt->qpitch / 4, GEN8_SURFACE_AUX_QPITCH) |
>  SET_FIELD((surface->mt->mcs_mt->pitch / 128) - 1,
>GEN8_SURFACE_AUX_PITCH) |
> -GEN8_SURFACE_AUX_MODE_MCS;
> +aux_mode;
> } else {
>surf[6] = 0;
> }
> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> index 5161d2b..e3593d3 100644
> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> @@ -217,6 +217,9 @@ gen8_get_aux_mode(const struct brw_context *brw,
> if (brw->gen >= 9 || mt->num_samples == 1)
>assert(mt->halign == 16);
>  
> +   if (intel_miptree_is_lossless_compressed(brw, mt))
> +  return GEN9_SURFACE_AUX_MODE_CCS_E;
> +
> return GEN8_SURFACE_AUX_MODE_MCS;
>  }
>  

Wouldn't mind if you extracted and used gen8_get_aux_mode() instead.

Either way:
Reviewed-by: Ben Widawsky 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: fix nir_before/after_cf_node() for function impls

2016-05-05 Thread Connor Abbott

On Thu, May 5, 2016 at 1:20 PM, Jason Ekstrand  wrote:
> On Thu, May 5, 2016 at 10:10 AM, Connor Abbott  wrote:
>>
>> Signed-off-by: Connor Abbott 
>> ---
>>  src/compiler/nir/nir.h | 22 ++
>>  1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index 98451c6..8316b0d 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -1995,19 +1995,33 @@ nir_after_block_before_jump(nir_block *block)
>>  static inline nir_cursor
>>  nir_before_cf_node(nir_cf_node *node)
>>  {
>> -   if (node->type == nir_cf_node_block)
>> +   switch(node->type) {
>> +   case nir_cf_node_block:
>>return nir_before_block(nir_cf_node_as_block(node));
>>
>> -   return nir_after_block(nir_cf_node_as_block(nir_cf_node_prev(node)));
>> +   case nir_cf_node_function:
>> +  return nir_before_block(
>> + nir_start_block(nir_cf_node_as_function(node)));
>
>
> I'm not sure what I think about this.  Every other nir_before_cf_node case
> that touches a "container" node puts it *before* the container not at the
> top of it.  This is a substantial edge-case.  What's the use-case?

That's a good point. On the other hand, this is kind of the only sane
thing that this can do. It was one of Rob's passes that was
segfaulting. Is there a branch somewhere with the code that was doing
this?

>
>>
>> +
>> +   default:
>> +  return
>> nir_after_block(nir_cf_node_as_block(nir_cf_node_prev(node)));
>> +   }
>>  }
>>
>>  static inline nir_cursor
>>  nir_after_cf_node(nir_cf_node *node)
>>  {
>> -   if (node->type == nir_cf_node_block)
>> +   switch(node->type) {
>> +   case nir_cf_node_block:
>>return nir_after_block(nir_cf_node_as_block(node));
>>
>> -   return nir_before_block(nir_cf_node_as_block(nir_cf_node_next(node)));
>> +   case nir_cf_node_function:
>> +  return nir_after_block(
>> + nir_impl_last_block(nir_cf_node_as_function(node)));
>> +
>> +   default:
>> +  return
>> nir_before_block(nir_cf_node_as_block(nir_cf_node_next(node)));
>> +   }
>>  }
>>
>>  static inline nir_cursor
>> --
>> 2.5.5
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [v4 04/11] i965: Add helper for lossless compression support

2016-05-05 Thread Ben Widawsky

On Thu, Apr 21, 2016 at 02:58:59PM +0300, Topi Pohjolainen wrote:
> v2: Check explicitly against base type of GL_FLOAT instead of
> using _mesa_is_format_integer_color(). Otherwise we miss
> GL_UNSIGNED_NORMALIZED.
> 
> Signed-off-by: Topi Pohjolainen 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 22 ++
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  3 +++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index b68575f..78717df 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -294,6 +294,28 @@ intel_miptree_is_lossless_compressed(const struct 
> brw_context *brw,
> return mt->num_samples <= 1;
>  }
>  
> +bool
> +intel_miptree_supports_lossless_compressed(mesa_format format)
> +{
> +   /* For now compression is only enabled for integer formats even though
> +* there exist supported floating point formats also. This is a heuristic
> +* decision based on current public benchmarks. In none of the cases these
> +* formats provided any improvement but a few cases were seen to regress.
> +* Hence these are left to to be enabled in the future when they are known
> +* to improve things.
> +*/
> +   if (_mesa_get_format_datatype(format) == GL_FLOAT)
> +  return false;
> +
> +   /* In principle, fast clear mechanism and lossless compression go hand in
> +* hand. However, fast clear can be also used to clear srgb surfaces by
> +* using equivalent linear format. This trick, however, can't be extended
> +* to be used with lossless compression and therefore a check is needed to
> +* see if the format really is linear.
> +*/
> +   return _mesa_get_srgb_format_linear(format) == format;
> +}
> +

Picking nits here, but I don't like this function because it doesn't do what it
purports to do in it's name. It's really an ancillary function to MCS support.
How about?

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 20c8a7f..17a6f7e 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -301,7 +301,9 @@ intel_miptree_is_lossless_compressed(const struct 
brw_context *brw,
 }
 
 bool
-intel_miptree_supports_lossless_compressed(mesa_format format)
+intel_miptree_supports_lossless_compressed(struct brw_context *brw,
+   const struct intel_mipmap_tree *mt,
+   mesa_format format)
 {
/* For now compression is only enabled for integer formats even though
 * there exist supported floating point formats also. This is a heuristic
@@ -313,11 +315,14 @@ intel_miptree_supports_lossless_compressed(mesa_format 
format)
if (_mesa_get_format_datatype(format) == GL_FLOAT)
   return false;
 
-   /* In principle, fast clear mechanism and lossless compression go hand in
-* hand. However, fast clear can be also used to clear srgb surfaces by
-* using equivalent linear format. This trick, however, can't be extended
-* to be used with lossless compression and therefore a check is needed to
-* see if the format really is linear.
+   /* Fast clear mechanism and lossless compression go hand in hand. */
+   if (!intel_miptree_supports_non_msrt_fast_clear(brw, mt))
+  return false;
+
+   /* Fast clear can be also used to clear srgb surfaces by using equivalent
+* linear format. This trick, however, can't be extended to be used with
+* lossless compression and therefore a check is needed to see if the format
+* really is linear.
 */
return _mesa_get_srgb_format_linear(format) == format;
 }
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index d4f8563..c859f6e 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -680,7 +680,9 @@ intel_miptree_supports_non_msrt_fast_clear(struct 
brw_context *brw,
const struct intel_mipmap_tree *mt);
 
 bool
-intel_miptree_supports_lossless_compressed(mesa_format format);
+intel_miptree_supports_lossless_compressed(struct brw_context *brw,
+   const struct intel_mipmap_tree *mt,
+   mesa_format format);
 
 bool
 intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw,

>  /**
>   * Determine depth format corresponding to a depth+stencil format,
>   * for separate stencil.
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> index 7862152..6c143c3 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> @@ -679,6 +679,9 @@ intel_miptree_supports_

[Mesa-dev] [PATCH] nir: fix block iterator to not forget fxn->end_block

2016-05-05 Thread Rob Clark

From: Rob Clark 

With the switch to new block iterator macro, we silently stopped
iterating over the end-block.  Which caused nir_index_blocks() to not
index the end-block.  Resulting in funny looking nir_print's like:

impl main {
block block_0:
/* preds: */
intrinsic copy_var () (gl_FragColor@out-temp, gl_Color) ()
intrinsic copy_var () (gl_FragColor, gl_FragColor@out-temp) ()
/* succs: block_0 */
block block_0:
}

I don't *think* there are any more serious consequences of not iterating
the end_block, but it's probably also a good idea to not subtly change
behavior compared to the old fxn-callback based iterator.

Signed-off-by: Rob Clark 
---
 src/compiler/nir/nir.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
index 867a43c..1f51b31 100644
--- a/src/compiler/nir/nir.c
+++ b/src/compiler/nir/nir.c
@@ -1512,12 +1512,18 @@ nir_block_cf_tree_next(nir_block *block)
   return NULL;
}
 
-   nir_cf_node *cf_next = nir_cf_node_next(&block->cf_node);
+   nir_cf_node *parent = block->cf_node.parent;
+   nir_cf_node *cf_next;
+
+   if ((parent->type == nir_cf_node_function) &&
+   (block == nir_cf_node_as_function(parent)->end_block))
+  cf_next = NULL;
+   else
+  cf_next = nir_cf_node_next(&block->cf_node);
+
if (cf_next)
   return nir_cf_node_cf_tree_first(cf_next);
 
-   nir_cf_node *parent = block->cf_node.parent;
-
switch (parent->type) {
case nir_cf_node_if: {
   /* Are we at the end of the if? Go to the beginning of the else */
@@ -1532,9 +1538,12 @@ nir_block_cf_tree_next(nir_block *block)
case nir_cf_node_loop:
   return nir_cf_node_as_block(nir_cf_node_next(parent));
 
-   case nir_cf_node_function:
+   case nir_cf_node_function: {
+  nir_function_impl *func = nir_cf_node_as_function(parent);
+  if (func->end_block != block)
+ return func->end_block;
   return NULL;
-
+   }
default:
   unreachable("unknown cf node type");
}
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 00/11] update swr rasterizer

2016-05-05 Thread Cherniak, Bruce

Reviewed-by: Bruce Cherniak  

> On May 3, 2016, at 11:13 AM, Tim Rowley  wrote:
> 
> Smallish changes this time around, some changes that hopefully
> start improving coverity's opinion of the rasterizer, and a small
> performance tweak.
> 
> Tim Rowley (11):
>  swr: [rasterizer] Whitespace cleanup and misc changes
>  swr: [rasterizer core] Fix threadviz support in buckets
>  swr: [rasterizer core] Fix thread allocation
>  swr: [rasterizer jitter] Add asserts for supported formats in fetch
>shader
>  swr: [rasterizer memory] Add missing store tiles function
>  swr: [rasterizer jitter] Fix printing bugs for tracing.
>  swr: [rasterizer] Add support for X24_TYPELESS_G8_UINT format
>  swr: [rasterizer] Miscellaneous backend changes
>  swr: [rasterizer] Add SWR_ASSUME / SWR_ASSUME_ASSERT macros
>  swr: [rasterizer] Small warning cleanup
>  swr: [rasterizer core] Faster modulo operator in ProcessVerts
> 
> .../drivers/swr/rasterizer/common/formats.cpp  |  25 +++--
> .../drivers/swr/rasterizer/common/formats.h|   1 +
> src/gallium/drivers/swr/rasterizer/common/os.h |   1 +
> .../swr/rasterizer/common/rdtsc_buckets.cpp|  12 +++
> .../drivers/swr/rasterizer/common/rdtsc_buckets.h  |  12 +--
> .../drivers/swr/rasterizer/common/simdintrin.h |  20 
> .../drivers/swr/rasterizer/common/swr_assert.h |  50 +-
> src/gallium/drivers/swr/rasterizer/core/arena.h|  16 ++--
> .../drivers/swr/rasterizer/core/backend.cpp|  26 ++---
> src/gallium/drivers/swr/rasterizer/core/backend.h  |   7 +-
> .../drivers/swr/rasterizer/core/format_traits.h|  22 +
> .../drivers/swr/rasterizer/core/frontend.cpp   |   5 +-
> src/gallium/drivers/swr/rasterizer/core/pa.h   |   5 +-
> .../drivers/swr/rasterizer/core/rdtsc_core.cpp |   2 +-
> .../drivers/swr/rasterizer/core/threads.cpp|  64 +
> .../drivers/swr/rasterizer/jitter/JitManager.h |   7 --
> .../drivers/swr/rasterizer/jitter/blend_jit.h  |   1 -
> .../drivers/swr/rasterizer/jitter/builder_misc.cpp | 105 +
> .../drivers/swr/rasterizer/jitter/fetch_jit.cpp|   2 +
> .../drivers/swr/rasterizer/jitter/jit_api.h|   1 -
> .../drivers/swr/rasterizer/memory/StoreTile.cpp|   5 +-
> .../rasterizer/scripts/templates/knobs.template|   3 -
> src/gallium/drivers/swr/swr_state.h|   1 +
> 23 files changed, 225 insertions(+), 168 deletions(-)
> 
> -- 
> 1.9.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] radeonsi: Compute correct LDS size for fragment shaders.

2016-05-05 Thread Bas Nieuwenhuizen

No sure where the 36 came from, but we clearly need at least
48 bytes per attribute per primitive.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/gallium/drivers/radeonsi/si_shader.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 49c498d..211db9f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -5640,15 +5640,18 @@ static void si_shader_dump_stats(struct si_screen 
*sscreen,
 
/* Compute LDS usage for PS. */
if (processor == PIPE_SHADER_FRAGMENT) {
-   /* The minimum usage per wave is (num_inputs * 36). The maximum
-* usage is (num_inputs * 36 * 16).
+   /* The minimum usage per wave is (num_inputs * 48). The maximum
+* usage is (num_inputs * 48 * 16).
 * We can get anything in between and it varies between waves.
 *
+* The 48 bytes per input for a single primitive is equal to
+* 4 bytes/component * 4 components/input * 3 points.
+*
 * Other stages don't know the size at compile time or don't
 * allocate LDS per wave, but instead they do it per thread 
group.
 */
lds_per_wave = conf->lds_size * lds_increment +
-  align(num_inputs * 36, lds_increment);
+  align(num_inputs * 48, lds_increment);
}
 
/* Compute the per-SIMD wave counts. */
-- 
2.8.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] glsl: trap atomic operations on illegal image formats in compiler.

2016-05-05 Thread Dave Airlie

From: Dave Airlie 

This fixes:
GL43-CTS.shader_image_load_store.negative-compileErrors

where shader 9 was being compiled with atomic operation on an r16i.

Signed-off-by: Dave Airlie 
---
 src/compiler/glsl/ast_function.cpp  | 21 +
 src/compiler/glsl/builtin_functions.cpp |  6 +++---
 src/compiler/glsl/glsl_parser_extras.h  |  7 +++
 src/compiler/glsl/ir.h  |  2 ++
 4 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/src/compiler/glsl/ast_function.cpp 
b/src/compiler/glsl/ast_function.cpp
index fe0df15..3ea6e05 100644
--- a/src/compiler/glsl/ast_function.cpp
+++ b/src/compiler/glsl/ast_function.cpp
@@ -139,6 +139,27 @@ verify_image_parameter(YYLTYPE *loc, 
_mesa_glsl_parse_state *state,
   return false;
}
 
+   if (formal->data.image_atomic) {
+ if (actual->data.image_format != GL_R32UI &&
+actual->data.image_format != GL_R32I) {
+   _mesa_glsl_error(loc, state,
+   "atomic operations can only happen on r32ui/r32i 
formats.");
+   return false;
+ }
+   }
+
+   if (formal->data.image_atomic_exchange) {
+ if ((actual->data.image_format != GL_R32UI &&
+ actual->data.image_format != GL_R32I)) {
+   /* check for the r32f special case. */
+   if (!(state->has_shader_image_atomic_exchange_float() &&
+actual->data.image_format == GL_R32F)) {
+_mesa_glsl_error(loc, state,
+ "atomic exchange operations can only happen on 
r32ui/r32i formats (or r32f in GLES).");
+return false;
+   }
+ }
+   }
return true;
 }
 
diff --git a/src/compiler/glsl/builtin_functions.cpp 
b/src/compiler/glsl/builtin_functions.cpp
index 25d914d..9d029bb 100644
--- a/src/compiler/glsl/builtin_functions.cpp
+++ b/src/compiler/glsl/builtin_functions.cpp
@@ -478,9 +478,7 @@ shader_image_atomic(const _mesa_glsl_parse_state *state)
 static bool
 shader_image_atomic_exchange_float(const _mesa_glsl_parse_state *state)
 {
-   return (state->is_version(450, 320) ||
-   state->ARB_ES3_1_compatibility_enable ||
-   state->OES_shader_image_atomic_enable);
+  return (state->has_shader_image_atomic_exchange_float());
 }
 
 static bool
@@ -5436,6 +5434,8 @@ builtin_builder::_image_prototype(const glsl_type 
*image_type,
image->data.image_coherent = true;
image->data.image_volatile = true;
image->data.image_restrict = true;
+   image->data.image_atomic = (flags & IMAGE_FUNCTION_AVAIL_ATOMIC) != 0;
+   image->data.image_atomic_exchange = (flags & 
IMAGE_FUNCTION_AVAIL_ATOMIC_EXCHANGE) != 0;
 
return sig;
 }
diff --git a/src/compiler/glsl/glsl_parser_extras.h 
b/src/compiler/glsl/glsl_parser_extras.h
index 7018347..9bf9245 100644
--- a/src/compiler/glsl/glsl_parser_extras.h
+++ b/src/compiler/glsl/glsl_parser_extras.h
@@ -270,6 +270,13 @@ struct _mesa_glsl_parse_state {
   return OES_geometry_shader_enable || is_version(150, 320);
}
 
+   bool has_shader_image_atomic_exchange_float() const
+   {
+  return (is_version(450, 320) ||
+ ARB_ES3_1_compatibility_enable ||
+ OES_shader_image_atomic_enable);
+   }
+
void process_version_directive(YYLTYPE *locp, int version,
   const char *ident);
 
diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
index 0c319ea..2bf04e68 100644
--- a/src/compiler/glsl/ir.h
+++ b/src/compiler/glsl/ir.h
@@ -811,6 +811,8 @@ public:
   unsigned image_volatile:1;
   unsigned image_restrict:1;
 
+  unsigned image_atomic:1;
+  unsigned image_atomic_exchange:1;
   /**
* ARB_shader_storage_buffer_object
*/
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl: trap atomic operations on illegal image formats in compiler.

2016-05-05 Thread Matt Turner

On Thu, May 5, 2016 at 3:19 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This fixes:
> GL43-CTS.shader_image_load_store.negative-compileErrors
>
> where shader 9 was being compiled with atomic operation on an r16i.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/compiler/glsl/ast_function.cpp  | 21 +
>  src/compiler/glsl/builtin_functions.cpp |  6 +++---
>  src/compiler/glsl/glsl_parser_extras.h  |  7 +++
>  src/compiler/glsl/ir.h  |  2 ++
>  4 files changed, 33 insertions(+), 3 deletions(-)
>
> diff --git a/src/compiler/glsl/ast_function.cpp 
> b/src/compiler/glsl/ast_function.cpp
> index fe0df15..3ea6e05 100644
> --- a/src/compiler/glsl/ast_function.cpp
> +++ b/src/compiler/glsl/ast_function.cpp
> @@ -139,6 +139,27 @@ verify_image_parameter(YYLTYPE *loc, 
> _mesa_glsl_parse_state *state,
>return false;
> }
>
> +   if (formal->data.image_atomic) {
> + if (actual->data.image_format != GL_R32UI &&
> +actual->data.image_format != GL_R32I) {
> +   _mesa_glsl_error(loc, state,
> +   "atomic operations can only happen on r32ui/r32i 
> formats.");

Bunch of tabs in this patch.

> +   return false;
> + }
> +   }
> +
> +   if (formal->data.image_atomic_exchange) {
> + if ((actual->data.image_format != GL_R32UI &&
> + actual->data.image_format != GL_R32I)) {
> +   /* check for the r32f special case. */
> +   if (!(state->has_shader_image_atomic_exchange_float() &&
> +actual->data.image_format == GL_R32F)) {
> +_mesa_glsl_error(loc, state,
> + "atomic exchange operations can only happen on 
> r32ui/r32i formats (or r32f in GLES).");

Probably would be nice to line wrap this.

> +return false;
> +   }
> + }
> +   }
> return true;
>  }
>
> diff --git a/src/compiler/glsl/builtin_functions.cpp 
> b/src/compiler/glsl/builtin_functions.cpp
> index 25d914d..9d029bb 100644
> --- a/src/compiler/glsl/builtin_functions.cpp
> +++ b/src/compiler/glsl/builtin_functions.cpp
> @@ -478,9 +478,7 @@ shader_image_atomic(const _mesa_glsl_parse_state *state)
>  static bool
>  shader_image_atomic_exchange_float(const _mesa_glsl_parse_state *state)
>  {
> -   return (state->is_version(450, 320) ||
> -   state->ARB_ES3_1_compatibility_enable ||
> -   state->OES_shader_image_atomic_enable);
> +  return (state->has_shader_image_atomic_exchange_float());
>  }
>
>  static bool
> @@ -5436,6 +5434,8 @@ builtin_builder::_image_prototype(const glsl_type 
> *image_type,
> image->data.image_coherent = true;
> image->data.image_volatile = true;
> image->data.image_restrict = true;
> +   image->data.image_atomic = (flags & IMAGE_FUNCTION_AVAIL_ATOMIC) != 0;
> +   image->data.image_atomic_exchange = (flags & 
> IMAGE_FUNCTION_AVAIL_ATOMIC_EXCHANGE) != 0;
>
> return sig;
>  }
> diff --git a/src/compiler/glsl/glsl_parser_extras.h 
> b/src/compiler/glsl/glsl_parser_extras.h
> index 7018347..9bf9245 100644
> --- a/src/compiler/glsl/glsl_parser_extras.h
> +++ b/src/compiler/glsl/glsl_parser_extras.h
> @@ -270,6 +270,13 @@ struct _mesa_glsl_parse_state {
>return OES_geometry_shader_enable || is_version(150, 320);
> }
>
> +   bool has_shader_image_atomic_exchange_float() const
> +   {
> +  return (is_version(450, 320) ||
> + ARB_ES3_1_compatibility_enable ||
> + OES_shader_image_atomic_enable);
> +   }
> +
> void process_version_directive(YYLTYPE *locp, int version,
>const char *ident);
>
> diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
> index 0c319ea..2bf04e68 100644
> --- a/src/compiler/glsl/ir.h
> +++ b/src/compiler/glsl/ir.h
> @@ -811,6 +811,8 @@ public:
>unsigned image_volatile:1;
>unsigned image_restrict:1;
>
> +  unsigned image_atomic:1;
> +  unsigned image_atomic_exchange:1;

Leave a newline after these.

>/**
> * ARB_shader_storage_buffer_object
> */
> --
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] [rfc] mesa/debug_output: fix returned message length.

2016-05-05 Thread Dave Airlie

On 4 May 2016 at 06:52, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This fixes both:
> GL43-CTS.khr_debug.receiveing_messages
> GL43-CTS.khr_debug.groups

Lols, this breaks
GL43-CTS.gtf43.GL2ExtensionTests.debug.debug

and the spec states the lengths include the NULL terminator, so ignore
this patch.

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/7] i965: Support textures with multiple planes

2016-05-05 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

---
 src/mesa/drivers/dri/i965/brw_compiler.h  |  1 +
 src/mesa/drivers/dri/i965/brw_context.h   |  2 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp  | 13 
 src/mesa/drivers/dri/i965/brw_shader.cpp  |  9 ++
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 38 +--
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |  3 +-
 src/mesa/drivers/dri/i965/gen8_surface_state.c| 12 ++-
 7 files changed, 59 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
b/src/mesa/drivers/dri/i965/brw_compiler.h
index 5807305..7d75202 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -329,6 +329,7 @@ struct brw_stage_prog_data {
   uint32_t abo_start;
   uint32_t image_start;
   uint32_t shader_time_start;
+  uint32_t plane_start[3];
   /** @} */
} binding_table;
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 1d3d5b2..5b6b3a2 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -722,7 +722,7 @@ struct brw_context
   void (*update_texture_surface)(struct gl_context *ctx,
  unsigned unit,
  uint32_t *surf_offset,
- bool for_gather);
+ bool for_gather, uint32_t plane);
   uint32_t (*update_renderbuffer_surface)(struct brw_context *brw,
   struct gl_renderbuffer *rb,
   bool layered, unsigned unit,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index cf4f782..08d00c9 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -3169,6 +3169,19 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, 
nir_tex_instr *instr)
  break;
   }
 
+  case nir_tex_src_plane: {
+ nir_const_value *const_plane =
+nir_src_as_const_value(instr->src[i].src);
+ const uint32_t plane = const_plane->u32[0];
+ const uint32_t texture_index =
+instr->texture_index +
+stage_prog_data->binding_table.plane_start[plane] -
+stage_prog_data->binding_table.texture_start;
+
+ texture_reg = fs_reg(brw_imm_ud(texture_index));
+ break;
+  }
+
   default:
  unreachable("unknown texture source");
   }
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 376cb25..4566597 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -1125,6 +1125,15 @@ brw_assign_common_binding_table_offsets(gl_shader_stage 
stage,
stage_prog_data->binding_table.pull_constants_start = 
next_binding_table_offset;
next_binding_table_offset++;
 
+   /* Plane 0 is just the regular texture section */
+   stage_prog_data->binding_table.plane_start[0] = 
stage_prog_data->binding_table.texture_start;
+
+   stage_prog_data->binding_table.plane_start[1] = next_binding_table_offset;
+   next_binding_table_offset += num_textures;
+
+   stage_prog_data->binding_table.plane_start[2] = next_binding_table_offset;
+   next_binding_table_offset += num_textures;
+
assert(next_binding_table_offset <= BRW_MAX_SURFACES);
 
/* prog_data->base.binding_table.size will be set by brw_mark_surface_used. 
*/
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index 218afab..ca7b32e 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -314,7 +314,8 @@ static void
 brw_update_texture_surface(struct gl_context *ctx,
unsigned unit,
uint32_t *surf_offset,
-   bool for_gather)
+   bool for_gather,
+   uint32_t plane)
 {
struct brw_context *brw = brw_context(ctx);
struct gl_texture_object *tObj = ctx->Texture.Unit[unit]._Current;
@@ -822,7 +823,7 @@ static void
 update_stage_texture_surfaces(struct brw_context *brw,
   const struct gl_program *prog,
   struct brw_stage_state *stage_state,
-  bool for_gather)
+  bool for_gather, uint32_t plane)
 {
if (!prog)
   return;
@@ -835,7 +836,7 @@ update_stage_texture_surfaces(struct brw_context *brw,
if (for_gather)
   surf_offset += 
stage_state->prog_data->binding_table.gather_texture_start;
else
-  surf_offset += stage_state->prog_data->binding_table.texture_start;
+  surf_offset += stage_state->prog_data

[Mesa-dev] [PATCH 6/7] i965: Allow creating planar YUV __DRIimages

2016-05-05 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

Lift the resctriction we had before and allow creation of images with
multiple planes. We still require all the planes to be within the same
bo.
---
 src/mesa/drivers/dri/i965/intel_screen.c | 33 ++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index db9d94d..580f6de 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -676,9 +676,14 @@ intel_create_image_from_fds(__DRIscreen *screen,
__DRIimage *image;
int i, index;
 
-   if (fds == NULL || num_fds != 1)
+   if (fds == NULL || num_fds < 1)
   return NULL;
 
+   /* We only support all planes from the same bo */
+   for (i = 0; i < num_fds; i++)
+  if (fds[0] != fds[i])
+ return NULL;
+
f = intel_image_format_lookup(fourcc);
if (f == NULL)
   return NULL;
@@ -691,22 +696,28 @@ intel_create_image_from_fds(__DRIscreen *screen,
if (image == NULL)
   return NULL;
 
-   image->bo = drm_intel_bo_gem_create_from_prime(intelScreen->bufmgr,
-  fds[0],
-  height * strides[0]);
-   if (image->bo == NULL) {
-  free(image);
-  return NULL;
-   }
image->width = width;
image->height = height;
image->pitch = strides[0];
 
image->planar_format = f;
+   int size = 0;
for (i = 0; i < f->nplanes; i++) {
   index = f->planes[i].buffer_index;
   image->offsets[index] = offsets[index];
   image->strides[index] = strides[index];
+
+  const int height = height >> f->planes[i].height_shift;
+  const int end = offsets[index] + height * strides[index];
+  if (size < end)
+ size = end;
+   }
+
+   image->bo = drm_intel_bo_gem_create_from_prime(intelScreen->bufmgr,
+  fds[0], size);
+   if (image->bo == NULL) {
+  free(image);
+  return NULL;
}
 
if (f->nplanes == 1) {
@@ -732,12 +743,6 @@ intel_create_image_from_dma_bufs(__DRIscreen *screen,
__DRIimage *image;
struct intel_image_format *f = intel_image_format_lookup(fourcc);
 
-   /* For now only packed formats that have native sampling are supported. */
-   if (!f || f->nplanes != 1) {
-  *error = __DRI_IMAGE_ERROR_BAD_MATCH;
-  return NULL;
-   }
-
image = intel_create_image_from_fds(screen, width, height, fourcc, fds,
num_fds, strides, offsets,
loaderPrivate);
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/7] nir: Add new 'plane' texture source type

2016-05-05 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

This will be used to select the plane to sample from for planar
textures.
---
 src/compiler/nir/nir.h   | 1 +
 src/compiler/nir/nir_print.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 4693ab5..d85ea17 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1072,6 +1072,7 @@ typedef enum {
nir_tex_src_ddy,
nir_tex_src_texture_offset, /* < dynamically uniform indirect offset */
nir_tex_src_sampler_offset, /* < dynamically uniform indirect offset */
+   nir_tex_src_plane,  /* < selects plane for planar textures */
nir_num_tex_src_types
 } nir_tex_src_type;
 
diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
index 229539d..407a71b 100644
--- a/src/compiler/nir/nir_print.c
+++ b/src/compiler/nir/nir_print.c
@@ -681,6 +681,9 @@ print_tex_instr(nir_tex_instr *instr, print_state *state)
   case nir_tex_src_sampler_offset:
  fprintf(fp, "(sampler_offset)");
  break;
+  case nir_tex_src_plane:
+ fprintf(fp, "(plane)");
+ break;
 
   default:
  unreachable("Invalid texture source type");
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/7] nir: Add a lowering pass for YUV textures

2016-05-05 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

This lowers sampling from YUV textures to 1) one or more texture
instructions to sample each plane and 2) color space conversion to RGB.
---
 src/compiler/nir/nir.h   |   7 +++
 src/compiler/nir/nir_lower_tex.c | 118 +++
 2 files changed, 125 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index d85ea17..cf6d49d 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2273,6 +2273,13 @@ typedef struct nir_lower_tex_options {
bool lower_rect;
 
/**
+* If true, convert yuv to rgb.
+*/
+   unsigned lower_y_uv_external;
+   unsigned lower_y_u_v_external;
+   unsigned lower_yx_xuxv_external;
+
+   /**
 * To emulate certain texture wrap modes, this can be used
 * to saturate the specified tex coord to [0.0, 1.0].  The
 * bits are according to sampler #, ie. if, for example:
diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/nir_lower_tex.c
index 16fee9a..c8aebb4 100644
--- a/src/compiler/nir/nir_lower_tex.c
+++ b/src/compiler/nir/nir_lower_tex.c
@@ -167,6 +167,108 @@ lower_rect(nir_builder *b, nir_tex_instr *tex)
tex->sampler_dim = GLSL_SAMPLER_DIM_2D;
 }
 
+static nir_ssa_def *
+sample_plane(nir_builder *b, nir_tex_instr *tex, int plane)
+{
+   assert(tex->dest.is_ssa);
+   assert(nir_tex_instr_dest_size(tex) == 4);
+   assert(nir_alu_type_get_base_type(tex->dest_type) == nir_type_float);
+   assert(tex->op == nir_texop_tex);
+   assert(tex->coord_components == 2);
+
+   nir_tex_instr *plane_tex = nir_tex_instr_create(b->shader, 2);
+   plane_tex->src[0].src = tex->src[0].src;
+   plane_tex->src[0].src_type = nir_tex_src_coord;
+   plane_tex->src[1].src = nir_src_for_ssa(nir_imm_int(b, plane));
+   plane_tex->src[1].src_type = nir_tex_src_plane;
+   plane_tex->op = nir_texop_tex;
+   plane_tex->sampler_dim = 2;
+   plane_tex->dest_type = nir_type_float;
+   plane_tex->coord_components = 2;
+
+   plane_tex->texture_index = tex->texture_index;
+   plane_tex->texture = tex->texture;
+   plane_tex->sampler_index = tex->sampler_index;
+   plane_tex->sampler = tex->sampler;
+
+   nir_ssa_dest_init(&plane_tex->instr, &plane_tex->dest, 4, 32, NULL);
+
+   nir_builder_instr_insert(b, &plane_tex->instr);
+
+   return &plane_tex->dest.ssa;
+}
+
+static void
+convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex,
+   nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v)
+{
+   nir_const_value m[3] = {
+  { .f32 = { 1.0f,  0.0f, 1.59602678f, 0.0f } },
+  { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } },
+  { .f32 = { 1.0f,  2.01723214f,  0.0f,0.0f } }
+   };
+
+   nir_ssa_def *yuv =
+  nir_vec4(b,
+   nir_fmul(b, nir_imm_float(b, 1.16438356f),
+nir_fadd(b, y, nir_imm_float(b, -0.0625f))),
+   nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -0.5f)), 0),
+   nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -0.5f)), 0),
+   nir_imm_float(b, 0.0));
+
+   nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, m[0]));
+   nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, m[1]));
+   nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, m[2]));
+
+   nir_ssa_def *result = nir_vec4(b, red, green, blue, nir_imm_float(b, 1.0f));
+
+   nir_ssa_def_rewrite_uses_after(&tex->dest.ssa, nir_src_for_ssa(result),
+  result->parent_instr);
+}
+
+static void
+lower_y_uv_external(nir_builder *b, nir_tex_instr *tex)
+{
+   b->cursor = nir_after_instr(&tex->instr);
+
+   nir_ssa_def *y = sample_plane(b, tex, 0);
+   nir_ssa_def *uv = sample_plane(b, tex, 1);
+
+   convert_yuv_to_rgb(b, tex,
+  nir_channel(b, y, 0),
+  nir_channel(b, uv, 0),
+  nir_channel(b, uv, 1));
+}
+
+static void
+lower_y_u_v_external(nir_builder *b, nir_tex_instr *tex)
+{
+   b->cursor = nir_after_instr(&tex->instr);
+
+   nir_ssa_def *y = sample_plane(b, tex, 0);
+   nir_ssa_def *u = sample_plane(b, tex, 1);
+   nir_ssa_def *v = sample_plane(b, tex, 2);
+
+   convert_yuv_to_rgb(b, tex,
+  nir_channel(b, y, 0),
+  nir_channel(b, u, 0),
+  nir_channel(b, v, 0));
+}
+
+static void
+lower_yx_xuxv_external(nir_builder *b, nir_tex_instr *tex)
+{
+   b->cursor = nir_after_instr(&tex->instr);
+
+   nir_ssa_def *y = sample_plane(b, tex, 0);
+   nir_ssa_def *xuxv = sample_plane(b, tex, 1);
+
+   convert_yuv_to_rgb(b, tex,
+  nir_channel(b, y, 0),
+  nir_channel(b, xuxv, 1),
+  nir_channel(b, xuxv, 3));
+}
+
 static void
 saturate_src(nir_builder *b, nir_tex_instr *tex, unsigned sat_mask)
 {
@@ -351,6 +453,22 @@ nir_lower_tex_block(nir_block *block, void *void_state)
  state->progress = true;
   }
 
+  if ((1 << tex->texture_index) & options->lower_y_uv_external) {
+

[Mesa-dev] [PATCH 3/7] i965: Create multiple miptrees planar YUV images

2016-05-05 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c |  3 ++
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  5 ++
 src/mesa/drivers/dri/i965/intel_tex_image.c   | 73 ++-
 src/mesa/drivers/dri/i965/intel_tex_obj.h |  2 +
 4 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 26c297d..e3e2a8f 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -990,6 +990,9 @@ intel_miptree_release(struct intel_mipmap_tree **mt)
   intel_miptree_release(&(*mt)->mcs_mt);
   intel_resolve_map_clear(&(*mt)->hiz_map);
 
+  intel_miptree_release(&(*mt)->plane[0]);
+  intel_miptree_release(&(*mt)->plane[1]);
+
   for (i = 0; i < MAX_TEXTURE_LEVELS; i++) {
 free((*mt)->level[i].slice);
   }
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 7862152..9ab4b23 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -625,6 +625,11 @@ struct intel_mipmap_tree
struct intel_mipmap_tree *mcs_mt;
 
/**
+* Planes 1 and 2 in case this is a planar surface.
+*/
+   struct intel_mipmap_tree *plane[2];
+
+   /**
 * Fast clear state for this buffer.
 */
enum intel_fast_clear_state fast_clear_state;
diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index bee8be1..ad7ee97 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -332,18 +332,67 @@ intel_image_target_texture_2d(struct gl_context *ctx, 
GLenum target,
   return;
}
 
-   /* Disable creation of the texture's aux buffers because the driver exposes
-* no EGL API to manage them. That is, there is no API for resolving the aux
-* buffer's content to the main buffer nor for invalidating the aux buffer's
-* content.
-*/
-   intel_set_texture_image_bo(ctx, texImage, image->bo,
-  target, image->internal_format,
-  image->format, image->offset,
-  image->width,  image->height,
-  image->pitch,
-  image->tile_x, image->tile_y,
-  MIPTREE_LAYOUT_DISABLE_AUX);
+   struct intel_texture_image *intel_image = intel_texture_image(texImage);
+   struct gl_texture_object *texobj = texImage->TexObject;
+   struct intel_texture_object *intel_texobj = intel_texture_object(texobj);
+   struct intel_image_format *f = image->planar_format;
+   struct intel_mipmap_tree *planar_mt;
+
+   for (int i = 0; i < f->nplanes; i++) {
+  int index = f->planes[i].buffer_index;
+  const uint32_t dri_format = f->planes[i].dri_format;
+  const mesa_format format = driImageFormatToGLFormat(dri_format);
+  const uint32_t width = image->width >> f->planes[i].width_shift;
+  const uint32_t height = image->height >> f->planes[i].height_shift;
+
+  /* Disable creation of the texture's aux buffers because the driver
+   * exposes no EGL API to manage them. That is, there is no API for
+   * resolving the aux buffer's content to the main buffer nor for
+   * invalidating the aux buffer's content.
+   */
+  struct intel_mipmap_tree *mt =
+ intel_miptree_create_for_bo(brw, image->bo, format,
+ image->offsets[index],
+ width, height, 1,
+ image->strides[index],
+ MIPTREE_LAYOUT_DISABLE_AUX);
+  if (mt == NULL)
+ return NULL;
+
+  mt->target = target;
+  mt->total_width = width;
+  mt->total_height = height;
+  mt->level[0].slice[0].x_offset = 0;
+  mt->level[0].slice[0].y_offset = 0;
+
+  if (i == 0)
+ planar_mt = mt;
+  else
+ planar_mt->plane[i - 1] = mt;
+   }
+
+   intel_texobj->dri_image = image;
+
+   const uint32_t internal_format = 
_mesa_get_format_base_format(planar_mt->format);
+   _mesa_init_teximage_fields(&brw->ctx, texImage,
+ image->width, image->height, 1,
+ 0, internal_format, planar_mt->format);
+
+   ctx->Driver.FreeTextureImageBuffer(ctx, texImage);
+
+   intel_miptree_reference(&intel_image->mt, planar_mt);
+
+   assert(planar_mt->pitch % planar_mt->cpp == 0);
+   intel_image->base.RowStride = planar_mt->pitch / planar_mt->cpp;
+
+   /* The miptree is in a validated state, so no need to check later. */
+   intel_miptree_reference(&intel_texobj->mt, intel_image->mt);
+   intel_texobj->validated_first_level = 0;
+   intel_texobj->validated_last_level = 0;
+   intel_texobj->_Format = planar_mt->format;
+   intel

[Mesa-dev] [PATCH 7/7] dri: Add YVU formats

2016-05-05 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

---
 include/GL/internal/dri_interface.h  |  5 +
 src/mesa/drivers/dri/i965/intel_screen.c | 26 ++
 2 files changed, 31 insertions(+)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index 84731a0..4049be6 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -1158,6 +1158,11 @@ struct __DRIdri2ExtensionRec {
 #define __DRI_IMAGE_FOURCC_NV160x3631564e
 #define __DRI_IMAGE_FOURCC_YUYV0x56595559
 
+#define __DRI_IMAGE_FOURCC_YVU410  0x39555659
+#define __DRI_IMAGE_FOURCC_YVU411  0x31315659
+#define __DRI_IMAGE_FOURCC_YVU420  0x32315659
+#define __DRI_IMAGE_FOURCC_YVU422  0x36315659
+#define __DRI_IMAGE_FOURCC_YVU444  0x34325659
 
 /**
  * Queryable on images created by createImageFromNames.
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 580f6de..a6a9fe4 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -224,6 +224,7 @@ static struct intel_image_format intel_image_formats[] = {
{ __DRI_IMAGE_FOURCC_XBGR, __DRI_IMAGE_COMPONENTS_RGB, 1,
  { { 0, 0, 0, __DRI_IMAGE_FORMAT_XBGR, 4 }, } },
 
+
{ __DRI_IMAGE_FOURCC_RGB565, __DRI_IMAGE_COMPONENTS_RGB, 1,
  { { 0, 0, 0, __DRI_IMAGE_FORMAT_RGB565, 2 } } },
 
@@ -258,6 +259,31 @@ static struct intel_image_format intel_image_formats[] = {
{ 1, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
{ 2, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 } } },
 
+   { __DRI_IMAGE_FOURCC_YVU410, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+ { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 2, 2, 2, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 1, 2, 2, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_YVU411, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+ { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 2, 2, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 1, 2, 0, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_YVU420, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+ { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 2, 1, 1, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 1, 1, 1, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_YVU422, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+ { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 2, 1, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 1, 1, 0, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
+   { __DRI_IMAGE_FOURCC_YVU444, __DRI_IMAGE_COMPONENTS_Y_U_V, 3,
+ { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 2, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
+   { 1, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 } } },
+
{ __DRI_IMAGE_FOURCC_NV12, __DRI_IMAGE_COMPONENTS_Y_UV, 2,
  { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
{ 1, 1, 1, __DRI_IMAGE_FORMAT_GR88, 2 } } },
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 0/7] Support for samplerExternalOES YUV EGLImages

2016-05-05 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

This patche series adds support for importing YUV buffers as EGLImages
and sampling from them using samplerExternalOES. This series
implements the sampling by lowering the texture operation to 1-3
texture operations from the YUV planes and inserts color space
conversion in the shader.

Kristian Høgsberg Kristensen (7):
  nir: Add new 'plane' texture source type
  nir: Add a lowering pass for YUV textures
  i965: Create multiple miptrees planar YUV images
  i965: Support textures with multiple planes
  i965: Invoke lowering pass for YUV textures
  i965: Allow creating planar YUV __DRIimages
  dri: Add YVU formats

 include/GL/internal/dri_interface.h   |   5 +
 src/compiler/nir/nir.h|   8 ++
 src/compiler/nir/nir_lower_tex.c  | 118 ++
 src/compiler/nir/nir_print.c  |   3 +
 src/mesa/drivers/dri/i965/brw_compiler.h  |   8 ++
 src/mesa/drivers/dri/i965/brw_context.h   |   2 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp  |  13 +++
 src/mesa/drivers/dri/i965/brw_nir.c   |   4 +
 src/mesa/drivers/dri/i965/brw_shader.cpp  |   9 ++
 src/mesa/drivers/dri/i965/brw_wm.c|  29 ++
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c  |  38 ---
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |   3 +-
 src/mesa/drivers/dri/i965/gen8_surface_state.c|  12 ++-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c |   3 +
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |   5 +
 src/mesa/drivers/dri/i965/intel_screen.c  |  59 ---
 src/mesa/drivers/dri/i965/intel_tex_image.c   |  73 ++---
 src/mesa/drivers/dri/i965/intel_tex_obj.h |   2 +
 18 files changed, 349 insertions(+), 45 deletions(-)

-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 5/7] i965: Invoke lowering pass for YUV textures

2016-05-05 Thread Kristian Høgsberg

From: Kristian Høgsberg Kristensen 

---
 src/mesa/drivers/dri/i965/brw_compiler.h |  7 +++
 src/mesa/drivers/dri/i965/brw_nir.c  |  4 
 src/mesa/drivers/dri/i965/brw_wm.c   | 29 +
 3 files changed, 40 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
b/src/mesa/drivers/dri/i965/brw_compiler.h
index 7d75202..e28711e 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -159,6 +159,13 @@ struct brw_sampler_prog_key_data {
 * For Sandybridge, which shader w/a we need for gather quirks.
 */
enum gen6_gather_sampler_wa gen6_gather_wa[MAX_SAMPLERS];
+
+   /**
+* Texture units that have a YUV image bound.
+*/
+   uint32_t y_u_v_image_mask;
+   uint32_t y_uv_image_mask;
+   uint32_t yx_xuxv_image_mask;
 };
 
 
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index f9bcba2..c5d5ea9 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -615,6 +615,10 @@ brw_nir_apply_sampler_key(nir_shader *nir,
  tex_options.swizzles[s][c] = GET_SWZ(key_tex->swizzles[s], c);
}
 
+   tex_options.lower_y_uv_external = key_tex->y_uv_image_mask;
+   tex_options.lower_y_u_v_external = key_tex->y_u_v_image_mask;
+   tex_options.lower_yx_xuxv_external = key_tex->yx_xuxv_image_mask;
+
if (nir_lower_tex(nir, &tex_options)) {
   nir_validate_shader(nir);
   nir = nir_optimize(nir, is_scalar);
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c 
b/src/mesa/drivers/dri/i965/brw_wm.c
index dbc626c..09aecc6 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -35,6 +35,7 @@
 #include "program/prog_parameter.h"
 #include "program/program.h"
 #include "intel_mipmap_tree.h"
+#include "intel_image.h"
 #include "brw_nir.h"
 #include "brw_program.h"
 
@@ -206,6 +207,16 @@ brw_debug_recompile_sampler_key(struct brw_context *brw,
   old_key->msaa_16,
   key->msaa_16);
 
+   found |= key_debug(brw, "y_uv image bound",
+  old_key->y_uv_image_mask,
+  key->y_uv_image_mask);
+   found |= key_debug(brw, "y_u_v image bound",
+  old_key->y_u_v_image_mask,
+  key->y_u_v_image_mask);
+   found |= key_debug(brw, "yx_xuxv image bound",
+  old_key->yx_xuxv_image_mask,
+  key->yx_xuxv_image_mask);
+
for (unsigned int i = 0; i < MAX_SAMPLERS; i++) {
   found |= key_debug(brw, "textureGather workarounds",
  old_key->gen6_gather_wa[i], key->gen6_gather_wa[i]);
@@ -370,6 +381,24 @@ brw_populate_sampler_prog_key_data(struct gl_context *ctx,
key->msaa_16 |= 1 << s;
 }
  }
+
+ if (t->Target == GL_TEXTURE_EXTERNAL_OES && intel_tex->dri_image) {
+__DRIimage *image = intel_tex->dri_image;
+switch (image->planar_format->components) {
+case __DRI_IMAGE_COMPONENTS_Y_UV:
+   key->y_uv_image_mask |= 1 << s;
+   break;
+case __DRI_IMAGE_COMPONENTS_Y_U_V:
+   key->y_u_v_image_mask |= 1 << s;
+   break;
+case __DRI_IMAGE_COMPONENTS_Y_XUXV:
+   key->yx_xuxv_image_mask |= 1 << s;
+   break;
+default:
+   break;
+}
+ }
+
   }
}
 }
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] mesa/shader_query: add missing subroutines cases

2016-05-05 Thread Dave Airlie

From: Dave Airlie 

ARRAY_SIZE and LOCATION should accept the SUBROUTINE_UNIFORM types.

Fixes:
GL43-CTS.program_interface_query.subroutines-vertex
GL43-CTS.program_interface_query.subroutines-tess-control
GL43-CTS.program_interface_query.subroutines-tess-eval
GL43-CTS.program_interface_query.subroutines-geometry
GL43-CTS.program_interface_query.subroutines-fragment
GL43-CTS.program_interface_query.subroutines-compute

Signed-off-by: Dave Airlie 
---
 src/mesa/main/shader_query.cpp | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp
index 020990a..c8c3df4 100644
--- a/src/mesa/main/shader_query.cpp
+++ b/src/mesa/main/shader_query.cpp
@@ -1125,6 +1125,13 @@ _mesa_program_resource_prop(struct gl_shader_program 
*shProg,
   switch (res->Type) {
   case GL_UNIFORM:
   case GL_BUFFER_VARIABLE:
+  case GL_VERTEX_SUBROUTINE_UNIFORM:
+  case GL_GEOMETRY_SUBROUTINE_UNIFORM:
+  case GL_FRAGMENT_SUBROUTINE_UNIFORM:
+  case GL_COMPUTE_SUBROUTINE_UNIFORM:
+  case GL_TESS_CONTROL_SUBROUTINE_UNIFORM:
+  case GL_TESS_EVALUATION_SUBROUTINE_UNIFORM:
+
  /* Test if a buffer variable is an array or an unsized array.
   * Unsized arrays return zero as array size.
   */
@@ -1207,6 +1214,12 @@ _mesa_program_resource_prop(struct gl_shader_program 
*shProg,
case GL_LOCATION:
   switch (res->Type) {
   case GL_UNIFORM:
+  case GL_VERTEX_SUBROUTINE_UNIFORM:
+  case GL_GEOMETRY_SUBROUTINE_UNIFORM:
+  case GL_FRAGMENT_SUBROUTINE_UNIFORM:
+  case GL_COMPUTE_SUBROUTINE_UNIFORM:
+  case GL_TESS_CONTROL_SUBROUTINE_UNIFORM:
+  case GL_TESS_EVALUATION_SUBROUTINE_UNIFORM:
   case GL_PROGRAM_INPUT:
   case GL_PROGRAM_OUTPUT:
  *val = program_resource_location(shProg, res,
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/7] nir: Add a lowering pass for YUV textures

2016-05-05 Thread Jason Ekstrand

On Thu, May 5, 2016 at 5:04 PM, Kristian Høgsberg  wrote:

> From: Kristian Høgsberg Kristensen 
>
> This lowers sampling from YUV textures to 1) one or more texture
> instructions to sample each plane and 2) color space conversion to RGB.
> ---
>  src/compiler/nir/nir.h   |   7 +++
>  src/compiler/nir/nir_lower_tex.c | 118
> +++
>  2 files changed, 125 insertions(+)
>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index d85ea17..cf6d49d 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2273,6 +2273,13 @@ typedef struct nir_lower_tex_options {
> bool lower_rect;
>
> /**
> +* If true, convert yuv to rgb.
> +*/
> +   unsigned lower_y_uv_external;
> +   unsigned lower_y_u_v_external;
> +   unsigned lower_yx_xuxv_external;
> +
> +   /**
>  * To emulate certain texture wrap modes, this can be used
>  * to saturate the specified tex coord to [0.0, 1.0].  The
>  * bits are according to sampler #, ie. if, for example:
> diff --git a/src/compiler/nir/nir_lower_tex.c
> b/src/compiler/nir/nir_lower_tex.c
> index 16fee9a..c8aebb4 100644
> --- a/src/compiler/nir/nir_lower_tex.c
> +++ b/src/compiler/nir/nir_lower_tex.c
> @@ -167,6 +167,108 @@ lower_rect(nir_builder *b, nir_tex_instr *tex)
> tex->sampler_dim = GLSL_SAMPLER_DIM_2D;
>  }
>
> +static nir_ssa_def *
> +sample_plane(nir_builder *b, nir_tex_instr *tex, int plane)
> +{
> +   assert(tex->dest.is_ssa);
> +   assert(nir_tex_instr_dest_size(tex) == 4);
> +   assert(nir_alu_type_get_base_type(tex->dest_type) == nir_type_float);
> +   assert(tex->op == nir_texop_tex);
> +   assert(tex->coord_components == 2);
> +
> +   nir_tex_instr *plane_tex = nir_tex_instr_create(b->shader, 2);
> +   plane_tex->src[0].src = tex->src[0].src;
>

You should use nir_src_copy here


> +   plane_tex->src[0].src_type = nir_tex_src_coord;
> +   plane_tex->src[1].src = nir_src_for_ssa(nir_imm_int(b, plane));
> +   plane_tex->src[1].src_type = nir_tex_src_plane;
> +   plane_tex->op = nir_texop_tex;
> +   plane_tex->sampler_dim = 2;
> +   plane_tex->dest_type = nir_type_float;
> +   plane_tex->coord_components = 2;
> +
> +   plane_tex->texture_index = tex->texture_index;
>

This should use nir_copy_deref.  Right now, the deref is supposed to be
ralloc parented to the instruciton so you have to make a copy.  One of
these days, I'm going to clean that up.


> +   plane_tex->texture = tex->texture;
> +   plane_tex->sampler_index = tex->sampler_index;
> +   plane_tex->sampler = tex->sampler;
>

Same here


> +
> +   nir_ssa_dest_init(&plane_tex->instr, &plane_tex->dest, 4, 32, NULL);
> +
> +   nir_builder_instr_insert(b, &plane_tex->instr);
> +
> +   return &plane_tex->dest.ssa;
> +}
> +
> +static void
> +convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex,
> +   nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v)
> +{
> +   nir_const_value m[3] = {
> +  { .f32 = { 1.0f,  0.0f, 1.59602678f, 0.0f } },
> +  { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } },
> +  { .f32 = { 1.0f,  2.01723214f,  0.0f,0.0f } }
> +   };
> +
> +   nir_ssa_def *yuv =
> +  nir_vec4(b,
> +   nir_fmul(b, nir_imm_float(b, 1.16438356f),
> +nir_fadd(b, y, nir_imm_float(b, -0.0625f))),
> +   nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -0.5f)), 0),
> +   nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -0.5f)), 0),
> +   nir_imm_float(b, 0.0));
> +
> +   nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, m[0]));
> +   nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, m[1]));
> +   nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, m[2]));
>

I didn't bother checking your math here, but I've seen the results so it's
probably ok.


> +
> +   nir_ssa_def *result = nir_vec4(b, red, green, blue, nir_imm_float(b,
> 1.0f));
> +
> +   nir_ssa_def_rewrite_uses_after(&tex->dest.ssa, nir_src_for_ssa(result),
> +  result->parent_instr);
>

You can just use ssa_def_rewrite_uses here.  No need to use the _after
version.

With those fixed, this and the other NIR patch are

Reviewed-by: Jason Ekstrand 


> +}
> +
> +static void
> +lower_y_uv_external(nir_builder *b, nir_tex_instr *tex)
> +{
> +   b->cursor = nir_after_instr(&tex->instr);
> +
> +   nir_ssa_def *y = sample_plane(b, tex, 0);
> +   nir_ssa_def *uv = sample_plane(b, tex, 1);
> +
> +   convert_yuv_to_rgb(b, tex,
> +  nir_channel(b, y, 0),
> +  nir_channel(b, uv, 0),
> +  nir_channel(b, uv, 1));
> +}
> +
> +static void
> +lower_y_u_v_external(nir_builder *b, nir_tex_instr *tex)
> +{
> +   b->cursor = nir_after_instr(&tex->instr);
> +
> +   nir_ssa_def *y = sample_plane(b, tex, 0);
> +   nir_ssa_def *u = sample_plane(b, tex, 1);
> +   nir_ssa_def *v = sample_plane(b, tex, 2);
> +
> +   convert_yuv_to_rgb(b, tex,
> +

[Mesa-dev] [PATCH] mesa/compute: Fix indirect dispatch buffer size check on 32-bit systems

2016-05-05 Thread Jordan Justen

2655265fcba9017e793026c76e490e04db088c8f, but for compute.

Signed-off-by: Jordan Justen 
Cc: Kenneth Graunke 
---

 I don't know if there is a test that this fixes, but it seems like we
 should handle it the same as the draw case.

 Note that the draw indirect param is a pointer, whereas it is a
 GLintptr for dispatch indirect.

 src/mesa/main/api_validate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index 6ec65e5..9d29f8d 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -1107,7 +1107,7 @@ valid_dispatch_indirect(struct gl_context *ctx,
 GLintptr indirect,
 GLsizei size, const char *name)
 {
-   GLintptr end = indirect + size;
+   const uint64_t end = (uint64_t) indirect + size;
 
if (!check_valid_to_compute(ctx, name))
   return GL_FALSE;
-- 
2.8.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa/compute: Fix indirect dispatch buffer size check on 32-bit systems

2016-05-05 Thread Jason Ekstrand

Reviewed-by: Jason Ekstrand 

On Thu, May 5, 2016 at 5:38 PM, Jordan Justen 
wrote:

> 2655265fcba9017e793026c76e490e04db088c8f, but for compute.
>
> Signed-off-by: Jordan Justen 
> Cc: Kenneth Graunke 
> ---
>
>  I don't know if there is a test that this fixes, but it seems like we
>  should handle it the same as the draw case.
>
>  Note that the draw indirect param is a pointer, whereas it is a
>  GLintptr for dispatch indirect.
>
>  src/mesa/main/api_validate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index 6ec65e5..9d29f8d 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -1107,7 +1107,7 @@ valid_dispatch_indirect(struct gl_context *ctx,
>  GLintptr indirect,
>  GLsizei size, const char *name)
>  {
> -   GLintptr end = indirect + size;
> +   const uint64_t end = (uint64_t) indirect + size;
>
> if (!check_valid_to_compute(ctx, name))
>return GL_FALSE;
> --
> 2.8.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/3] nir/algebraic: Separate ffma lowering from fusing

2016-05-05 Thread Jason Ekstrand

The i965 driver has its own pass for fusing mul+add combinations that's
much smarter than what nir_opt_algebraic can do so we don't want to get the
nir_opt_algebraic one just because we didn't set lower_ffma.
---
 src/compiler/nir/nir.h  | 1 +
 src/compiler/nir/nir_opt_algebraic.py   | 2 +-
 src/gallium/drivers/freedreno/ir3/ir3_nir.c | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 8a616d4..ba9d311 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1624,6 +1624,7 @@ typedef struct nir_function {
 typedef struct nir_shader_compiler_options {
bool lower_fdiv;
bool lower_ffma;
+   bool fuse_ffma;
bool lower_flrp32;
/** Lowers flrp when it does not support doubles */
bool lower_flrp64;
diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index 0a95725..f8db2b6 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -108,7 +108,7 @@ optimizations = [
(('~fadd@32', a, ('fmul', c , ('fadd', b, ('fneg', a, ('flrp', 
a, b, c), '!options->lower_flrp32'),
(('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a, ('flrp', 
a, b, c), '!options->lower_flrp64'),
(('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
-   (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), '!options->lower_ffma'),
+   (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
# Comparison simplifications
(('~inot', ('flt', a, b)), ('fge', a, b)),
(('~inot', ('fge', a, b)), ('flt', a, b)),
diff --git a/src/gallium/drivers/freedreno/ir3/ir3_nir.c 
b/src/gallium/drivers/freedreno/ir3/ir3_nir.c
index 99c2ecb..5662927 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_nir.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_nir.c
@@ -44,6 +44,7 @@ ir3_tgsi_to_nir(const struct tgsi_token *tokens)
.lower_scmp = true,
.lower_flrp32 = true,
.lower_ffract = true,
+   .fuse_ffma = true,
.native_integers = true,
.vertex_id_zero_based = true,
.lower_extract_byte = true,
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/3] ptn: Emit mul+add for MAD

2016-05-05 Thread Jason Ekstrand

Unlike fma() in GLSL, MAD in ARB programs is 100% splittable.  Just emit
the split version and let the optimizer fuse them later.

Shader-db results on Haswell:

   total instructions in shared programs: 7560379 -> 7560300 (-0.00%)
   instructions in affected programs: 143928 -> 143849 (-0.05%)
   helped: 443
   HURT: 250
---
 src/mesa/program/prog_to_nir.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/program/prog_to_nir.c b/src/mesa/program/prog_to_nir.c
index a6119ae..aa51dec 100644
--- a/src/mesa/program/prog_to_nir.c
+++ b/src/mesa/program/prog_to_nir.c
@@ -706,7 +706,7 @@ static const nir_op op_trans[MAX_OPCODE] = {
[OPCODE_LIT] = 0,
[OPCODE_LOG] = 0,
[OPCODE_LRP] = 0,
-   [OPCODE_MAD] = nir_op_ffma,
+   [OPCODE_MAD] = 0,
[OPCODE_MAX] = nir_op_fmax,
[OPCODE_MIN] = nir_op_fmin,
[OPCODE_MOV] = nir_op_fmov,
@@ -801,6 +801,10 @@ ptn_emit_instruction(struct ptn_compile *c, struct 
prog_instruction *prog_inst)
   ptn_lrp(b, dest, src);
   break;
 
+   case OPCODE_MAD:
+  ptn_move_dest(b, dest, nir_fadd(b, nir_fmul(b, src[0], src[1]), src[2]));
+  break;
+
case OPCODE_DST:
   ptn_dst(b, dest, src);
   break;
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/3] i965: Stop splitting fma() prior to optimization

2016-05-05 Thread Jason Ekstrand

According to the GLSL spec, if the user uses fma directly and you have it
in your hardware, you shouldn't split it.  For a while now, we've been
splitting all fma's up-front and then planned to fuse them later.  The only
reason why this possibly helped before was for ARB programs which is
handled by the previous commit. This fixes rendering corruptions in Tomb
Raider.

Shader-db results on Haswell:

   total instructions in shared programs: 7560300 -> 7561510 (0.02%)
   instructions in affected programs: 56265 -> 57475 (2.15%)
   helped: 86
   HURT: 291

The only shaders in the database that are affected are from Shadow of
Mordor which is the first app in our database to use fma().

Reported-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_compiler.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c 
b/src/mesa/drivers/dri/i965/brw_compiler.c
index 0ea5e8b..7969543 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.c
+++ b/src/mesa/drivers/dri/i965/brw_compiler.c
@@ -72,7 +72,6 @@ shader_perf_log_mesa(void *data, const char *fmt, ...)
 * split all ffma instructions during opt_algebraic and we then re-combine \
 * them as a later step.   \
 */\
-   .lower_ffma = true,\
.lower_sub = true, \
.lower_fdiv = true,\
.lower_scmp = true,\
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] i965: Stop splitting fma() prior to optimization

2016-05-05 Thread Ilia Mirkin

On Thu, May 5, 2016 at 8:51 PM, Jason Ekstrand  wrote:
> According to the GLSL spec, if the user uses fma directly and you have it
> in your hardware, you shouldn't split it.  For a while now, we've been
> splitting all fma's up-front and then planned to fuse them later.  The only
> reason why this possibly helped before was for ARB programs which is
> handled by the previous commit. This fixes rendering corruptions in Tomb
> Raider.
>
> Shader-db results on Haswell:
>
>total instructions in shared programs: 7560300 -> 7561510 (0.02%)
>instructions in affected programs: 56265 -> 57475 (2.15%)
>helped: 86
>HURT: 291
>
> The only shaders in the database that are affected are from Shadow of
> Mordor which is the first app in our database to use fma().
>
> Reported-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_compiler.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c 
> b/src/mesa/drivers/dri/i965/brw_compiler.c
> index 0ea5e8b..7969543 100644
> --- a/src/mesa/drivers/dri/i965/brw_compiler.c
> +++ b/src/mesa/drivers/dri/i965/brw_compiler.c
> @@ -72,7 +72,6 @@ shader_perf_log_mesa(void *data, const char *fmt, ...)
>  * split all ffma instructions during opt_algebraic and we then 
> re-combine \
>  * them as a later step.  
>  \
>  */   
>  \
> -   .lower_ffma = true,   
>  \

Looks like maybe there's a comment above in need of updating?

> .lower_sub = true,
>  \
> .lower_fdiv = true,   
>  \
> .lower_scmp = true,   
>  \
> --
> 2.5.0.400.gff86faf
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] i965: Stop splitting fma() prior to optimization

2016-05-05 Thread Jason Ekstrand

On May 5, 2016 6:13 PM, "Ilia Mirkin"  wrote:
>
> On Thu, May 5, 2016 at 8:51 PM, Jason Ekstrand 
wrote:
> > According to the GLSL spec, if the user uses fma directly and you have
it
> > in your hardware, you shouldn't split it.  For a while now, we've been
> > splitting all fma's up-front and then planned to fuse them later.  The
only
> > reason why this possibly helped before was for ARB programs which is
> > handled by the previous commit. This fixes rendering corruptions in Tomb
> > Raider.
> >
> > Shader-db results on Haswell:
> >
> >total instructions in shared programs: 7560300 -> 7561510 (0.02%)
> >instructions in affected programs: 56265 -> 57475 (2.15%)
> >helped: 86
> >HURT: 291
> >
> > The only shaders in the database that are affected are from Shadow of
> > Mordor which is the first app in our database to use fma().
> >
> > Reported-by: Kenneth Graunke 
> > ---
> >  src/mesa/drivers/dri/i965/brw_compiler.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c
b/src/mesa/drivers/dri/i965/brw_compiler.c
> > index 0ea5e8b..7969543 100644
> > --- a/src/mesa/drivers/dri/i965/brw_compiler.c
> > +++ b/src/mesa/drivers/dri/i965/brw_compiler.c
> > @@ -72,7 +72,6 @@ shader_perf_log_mesa(void *data, const char *fmt, ...)
> >  * split all ffma instructions during opt_algebraic and we then
re-combine \
> >  * them as a later step.
   \
> >  */
\
> > -   .lower_ffma = true,
\
>
> Looks like maybe there's a comment above in need of updating?

Good call. :-)

> > .lower_sub = true,
   \
> > .lower_fdiv = true,
\
> > .lower_scmp = true,
\
> > --
> > 2.5.0.400.gff86faf
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] i965: Stop splitting fma() prior to optimization

2016-05-05 Thread Jason Ekstrand

On Thu, May 5, 2016 at 6:22 PM, Jason Ekstrand  wrote:

>
> On May 5, 2016 6:13 PM, "Ilia Mirkin"  wrote:
> >
> > On Thu, May 5, 2016 at 8:51 PM, Jason Ekstrand 
> wrote:
> > > According to the GLSL spec, if the user uses fma directly and you have
> it
> > > in your hardware, you shouldn't split it.  For a while now, we've been
> > > splitting all fma's up-front and then planned to fuse them later.  The
> only
> > > reason why this possibly helped before was for ARB programs which is
> > > handled by the previous commit. This fixes rendering corruptions in
> Tomb
> > > Raider.
> > >
> > > Shader-db results on Haswell:
> > >
> > >total instructions in shared programs: 7560300 -> 7561510 (0.02%)
> > >instructions in affected programs: 56265 -> 57475 (2.15%)
> > >helped: 86
> > >HURT: 291
> > >
> > > The only shaders in the database that are affected are from Shadow of
> > > Mordor which is the first app in our database to use fma().
> > >
> > > Reported-by: Kenneth Graunke 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_compiler.c | 1 -
> > >  1 file changed, 1 deletion(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c
> b/src/mesa/drivers/dri/i965/brw_compiler.c
> > > index 0ea5e8b..7969543 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_compiler.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_compiler.c
> > > @@ -72,7 +72,6 @@ shader_perf_log_mesa(void *data, const char *fmt,
> ...)
> > >  * split all ffma instructions during opt_algebraic and we then
> re-combine \
> > >  * them as a later step.
>  \
> > >  */
> \
> > > -   .lower_ffma = true,
> \
> >
> > Looks like maybe there's a comment above in need of updating?
>
> Good call. :-)
>

Fixed locally.

> > .lower_sub = true,
>\
> > > .lower_fdiv = true,
> \
> > > .lower_scmp = true,
> \
> > > --
> > > 2.5.0.400.gff86faf
> > >
> > > ___
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] glsl: validate subroutine types match function signature.

2016-05-05 Thread Dave Airlie

From: Dave Airlie 

This fixes:
GL43-CTS.shader_subroutine.subroutines_incompatible_with_subroutine_type

It just makes sure the signatures match as well as the return
types.

Signed-off-by: Dave Airlie 
---
 src/compiler/glsl/ast_to_hir.cpp | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/compiler/glsl/ast_to_hir.cpp b/src/compiler/glsl/ast_to_hir.cpp
index dd99893..e54e8d2 100644
--- a/src/compiler/glsl/ast_to_hir.cpp
+++ b/src/compiler/glsl/ast_to_hir.cpp
@@ -5504,6 +5504,24 @@ ast_function::hir(exec_list *instructions,
  if (!type) {
 _mesa_glsl_error(& loc, state, "unknown type '%s' in subroutine 
function definition", decl->identifier);
  }
+
+ for (int i = 0; i < state->num_subroutine_types; i++) {
+ir_function *fn = state->subroutine_types[i];
+ir_function_signature *tsig = NULL;
+
+if (strcmp(fn->name, decl->identifier))
+   continue;
+
+tsig = fn->matching_signature(state, &sig->parameters,
+  false);
+if (!tsig) {
+   _mesa_glsl_error(& loc, state, "subroutine type mismatch '%s' - 
signatures do not match\n", decl->identifier);
+} else {
+   if (tsig->return_type != sig->return_type) {
+  _mesa_glsl_error(& loc, state, "subroutine type mismatch 
'%s' - return types do not match\n", decl->identifier);
+   }
+}
+ }
  f->subroutine_types[idx++] = type;
   }
   state->subroutines = (ir_function **)reralloc(state, state->subroutines,
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] i965: Stop splitting fma() prior to optimization

2016-05-05 Thread Connor Abbott

On Thu, May 5, 2016 at 8:51 PM, Jason Ekstrand  wrote:
> According to the GLSL spec, if the user uses fma directly and you have it
> in your hardware, you shouldn't split it.  For a while now, we've been
> splitting all fma's up-front and then planned to fuse them later.  The only
> reason why this possibly helped before was for ARB programs which is
> handled by the previous commit. This fixes rendering corruptions in Tomb
> Raider.

Where in the GLSL spec does it say this? In the definition of fma(), it says:

"Otherwise, in the absence of precise consumption, there are no
special constraints on the number of operations or difference in
precision between fma() and the expression
'a * b + c'."

So in other words, splitting them and then opportunistically combining
them is fine. I think the real problem is that the lower_ffma
transform isn't marked as inexact, when doing that transform does
violate the GLSL spec. That is, we shouldn't be splitting ffma
instructions marked as exact.

>
> Shader-db results on Haswell:
>
>total instructions in shared programs: 7560300 -> 7561510 (0.02%)
>instructions in affected programs: 56265 -> 57475 (2.15%)
>helped: 86
>HURT: 291
>
> The only shaders in the database that are affected are from Shadow of
> Mordor which is the first app in our database to use fma().
>
> Reported-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_compiler.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c 
> b/src/mesa/drivers/dri/i965/brw_compiler.c
> index 0ea5e8b..7969543 100644
> --- a/src/mesa/drivers/dri/i965/brw_compiler.c
> +++ b/src/mesa/drivers/dri/i965/brw_compiler.c
> @@ -72,7 +72,6 @@ shader_perf_log_mesa(void *data, const char *fmt, ...)
>  * split all ffma instructions during opt_algebraic and we then 
> re-combine \
>  * them as a later step.  
>  \
>  */   
>  \
> -   .lower_ffma = true,   
>  \
> .lower_sub = true,
>  \
> .lower_fdiv = true,   
>  \
> .lower_scmp = true,   
>  \
> --
> 2.5.0.400.gff86faf
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] glsl: produce a linker error for a subroutine uniform with no functions.

2016-05-05 Thread Dave Airlie

From: Dave Airlie 

If a subroutine uniform is declared with no functions backing it,
that isn't legal, so we should fail to link.

Fixes:
GL43-CTS.shader_subroutine.subroutine_uniform_wo_matching_subroutines

Signed-off-by: Dave Airlie 
---
 src/compiler/glsl/linker.cpp | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index 9c72478..daf9016 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -3056,6 +3056,10 @@ link_calculate_subroutine_compat(struct 
gl_shader_program *prog)
 continue;
 
  count = 0;
+ if (sh->NumSubroutineFunctions == 0) {
+linker_error(prog, "subroutine uniform %s defined but no valid 
functions found\n", uni->type->name);
+continue;
+ }
  for (unsigned f = 0; f < sh->NumSubroutineFunctions; f++) {
 struct gl_subroutine_function *fn = &sh->SubroutineFunctions[f];
 for (int k = 0; k < fn->num_compat_types; k++) {
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl: trap atomic operations on illegal image formats in compiler.

2016-05-05 Thread Francisco Jerez

Dave Airlie  writes:

> From: Dave Airlie 
>
> This fixes:
> GL43-CTS.shader_image_load_store.negative-compileErrors
>
> where shader 9 was being compiled with atomic operation on an r16i.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/compiler/glsl/ast_function.cpp  | 21 +
>  src/compiler/glsl/builtin_functions.cpp |  6 +++---
>  src/compiler/glsl/glsl_parser_extras.h  |  7 +++
>  src/compiler/glsl/ir.h  |  2 ++
>  4 files changed, 33 insertions(+), 3 deletions(-)
>
> diff --git a/src/compiler/glsl/ast_function.cpp 
> b/src/compiler/glsl/ast_function.cpp
> index fe0df15..3ea6e05 100644
> --- a/src/compiler/glsl/ast_function.cpp
> +++ b/src/compiler/glsl/ast_function.cpp
> @@ -139,6 +139,27 @@ verify_image_parameter(YYLTYPE *loc, 
> _mesa_glsl_parse_state *state,
>return false;
> }
>  
> +   if (formal->data.image_atomic) {
> + if (actual->data.image_format != GL_R32UI &&
> +  actual->data.image_format != GL_R32I) {
> +   _mesa_glsl_error(loc, state,
> + "atomic operations can only happen on r32ui/r32i 
> formats.");
> +   return false;
> + }
> +   }
> +
> +   if (formal->data.image_atomic_exchange) {
> + if ((actual->data.image_format != GL_R32UI &&
> +   actual->data.image_format != GL_R32I)) {
> +   /* check for the r32f special case. */
> +   if (!(state->has_shader_image_atomic_exchange_float() &&
> +  actual->data.image_format == GL_R32F)) {

AFAICT the state->has_shader_image_atomic_exchange_float() check (and
related changes below, including the definition of
ir_variable_data::image_atomic_exchange) shouldn't be necessary because
the floating point imageAtomicExchange overload shouldn't be introduced
into the program symbol table if the r32f variant is not supported, so
passing an r32f image would lead to a type mismatch either way
regardless of what you do here.

> +  _mesa_glsl_error(loc, state,
> +   "atomic exchange operations can only happen on 
> r32ui/r32i formats (or r32f in GLES).");
> +  return false;
> +   }
> + }
> +   }

FTR the reason why I didn't bother to introduce any format qualifier
checks whatsoever in verify_image_parameter() is the following GLSL spec
text from section 4.10 "Memory Qualifiers":

| Layout qualifiers cannot be used on formal function parameters, but
| they are not included in parameter matching.

which sounds like it's legal to pass image variables of *any* format as
parameter for any user-defined function -- Which brings me to the
question of what should happen when you have e.g. an image atomic call
inside a function passing an image formal parameter as argument to the
atomic call -- Is that supposed to be an error?  If not how do you tell
whether the format matches to emit the error above?  You don't?  If you
don't how come it must be an error to do it at the top level?  Or is it
not necessarily an error either and the CTS test is just checking
out-of-spec behaviour?  The spec wording seems *really* sketchy
honestly...

> return true;
>  }
>  
> diff --git a/src/compiler/glsl/builtin_functions.cpp 
> b/src/compiler/glsl/builtin_functions.cpp
> index 25d914d..9d029bb 100644
> --- a/src/compiler/glsl/builtin_functions.cpp
> +++ b/src/compiler/glsl/builtin_functions.cpp
> @@ -478,9 +478,7 @@ shader_image_atomic(const _mesa_glsl_parse_state *state)
>  static bool
>  shader_image_atomic_exchange_float(const _mesa_glsl_parse_state *state)
>  {
> -   return (state->is_version(450, 320) ||
> -   state->ARB_ES3_1_compatibility_enable ||
> -   state->OES_shader_image_atomic_enable);
> +  return (state->has_shader_image_atomic_exchange_float());
>  }
>  
>  static bool
> @@ -5436,6 +5434,8 @@ builtin_builder::_image_prototype(const glsl_type 
> *image_type,
> image->data.image_coherent = true;
> image->data.image_volatile = true;
> image->data.image_restrict = true;
> +   image->data.image_atomic = (flags & IMAGE_FUNCTION_AVAIL_ATOMIC) != 0;
> +   image->data.image_atomic_exchange = (flags & 
> IMAGE_FUNCTION_AVAIL_ATOMIC_EXCHANGE) != 0;
>  
> return sig;
>  }
> diff --git a/src/compiler/glsl/glsl_parser_extras.h 
> b/src/compiler/glsl/glsl_parser_extras.h
> index 7018347..9bf9245 100644
> --- a/src/compiler/glsl/glsl_parser_extras.h
> +++ b/src/compiler/glsl/glsl_parser_extras.h
> @@ -270,6 +270,13 @@ struct _mesa_glsl_parse_state {
>return OES_geometry_shader_enable || is_version(150, 320);
> }
>  
> +   bool has_shader_image_atomic_exchange_float() const
> +   {
> +  return (is_version(450, 320) ||
> +   ARB_ES3_1_compatibility_enable ||
> +   OES_shader_image_atomic_enable);
> +   }
> +
> void process_version_directive(YYLTYPE *locp, int version,
>const char *ident);
>  
> diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
> index 0c319ea..2bf04e68 100644
> --- a/src/compiler/glsl/ir.h
> +++ b/src/c

Re: [Mesa-dev] [04.5/11] i965: Add flag telling if miptree is for client consumption

2016-05-05 Thread Ben Widawsky

On Mon, Apr 25, 2016 at 08:10:01PM +0300, Topi Pohjolainen wrote:
> Signed-off-by: Topi Pohjolainen 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 7 ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 9 +
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index b11fafd..cdf2fbd 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -394,6 +394,7 @@ intel_miptree_create_layout(struct brw_context *brw,
> mt->logical_depth0 = depth0;
> mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_NO_MCS;
> mt->disable_aux_buffers = (layout_flags & MIPTREE_LAYOUT_DISABLE_AUX) != 
> 0;
> +   mt->is_scanout = (layout_flags & MIPTREE_LAYOUT_FOR_SCANOUT) != 0;
> exec_list_make_empty(&mt->hiz_map);
> mt->cpp = _mesa_get_format_bytes(format);
> mt->num_samples = num_samples;
> @@ -900,7 +901,7 @@ intel_update_winsys_renderbuffer_miptree(struct 
> brw_context *intel,
>   height,
>   1,
>   pitch,
> - 0);
> + MIPTREE_LAYOUT_FOR_SCANOUT);
> if (!singlesample_mt)
>goto fail;
>  
> @@ -959,8 +960,8 @@ intel_miptree_create_for_renderbuffer(struct brw_context 
> *brw,
> bool ok;
> GLenum target = num_samples > 1 ? GL_TEXTURE_2D_MULTISAMPLE : 
> GL_TEXTURE_2D;
> const uint32_t layout_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD |
> - MIPTREE_LAYOUT_TILING_ANY;
> -
> + MIPTREE_LAYOUT_TILING_ANY |
> + MIPTREE_LAYOUT_FOR_SCANOUT;
>  
> mt = intel_miptree_create(brw, target, format, 0, 0,
>   width, height, depth, num_samples,
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> index d4f8563..bb06522 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> @@ -657,6 +657,13 @@ struct intel_mipmap_tree
>  */
> bool disable_aux_buffers;
>  
> +   /**
> +* Tells if the underlying buffer is to be also consumed by entities other
> +* than the driver. This allows logic to turn off features such as 
> lossless
> +* compression which is not currently understood by client applications.
> +*/
> +   bool is_scanout;
> +
> /* These are also refcounted:
>  */
> GLuint refcount;
> @@ -697,6 +704,8 @@ enum {
> MIPTREE_LAYOUT_TILING_NONE  = 1 << 6,
> MIPTREE_LAYOUT_TILING_ANY   = MIPTREE_LAYOUT_TILING_Y |
>   MIPTREE_LAYOUT_TILING_NONE,
> +
> +   MIPTREE_LAYOUT_FOR_SCANOUT  = 1 << 7,
>  };
>  
>  struct intel_mipmap_tree *intel_miptree_create(struct brw_context *brw,

Since we hope that we one day can use these buffers for scanout, I'd go for
something more generic, like (requires other changes as well):

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 21e4718..deea30d 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -687,11 +687,14 @@ enum {
MIPTREE_LAYOUT_ACCELERATED_UPLOAD   = 1 << 0,
MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD   = 1 << 1,
MIPTREE_LAYOUT_FOR_BO   = 1 << 2,
-   MIPTREE_LAYOUT_DISABLE_AUX  = 1 << 3,
-   MIPTREE_LAYOUT_FORCE_HALIGN16   = 1 << 4,
-
-   MIPTREE_LAYOUT_TILING_Y = 1 << 5,
-   MIPTREE_LAYOUT_TILING_NONE  = 1 << 6,
+   MIPTREE_LAYOUT_DISABLE_AUX_MCS /* CCS_D */ = 1 << 3,
+   MIPTREE_LAYOUT_DISABLE_AUX_CCS_E= 1 << 4,
+   MIPTREE_LAYOUT_DISABLE_AUX  = MIPTREE_LAYOUT_DISABLE_AUX_MCS |
+ MIPTREE_LAYOUT_DISABLE_AUX_CCS_E,
+   MIPTREE_LAYOUT_FORCE_HALIGN16   = 1 << 5,
+
+   MIPTREE_LAYOUT_TILING_Y = 1 << 6,
+   MIPTREE_LAYOUT_TILING_NONE  = 1 << 7,
MIPTREE_LAYOUT_TILING_ANY   = MIPTREE_LAYOUT_TILING_Y |
  MIPTREE_LAYOUT_TILING_NONE,
 };


I do like mt->is_scanout though. That seems useful. Either way:
Reviewed-by: Ben Widawsky 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [v5 05/11] i965: Deferred allocation of mcs for lossless compressed

2016-05-05 Thread Ben Widawsky

On Mon, Apr 25, 2016 at 08:10:02PM +0300, Topi Pohjolainen wrote:
> Until not mcs was associated to single sampled buffers only for
 ^ now

> fast clear purposes and it was therefore the responsibility of the
> clear logic to allocate the aux buffer when needed. Now that normal
> 3D render or blorp blit may render with mcs enabled also, they need
> to prepare the mcs just as well.
> 
> v2: Do not enable for scanout buffers
> 
> Signed-off-by: Topi Pohjolainen 
> ---
>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp  |  2 ++
>  src/mesa/drivers/dri/i965/brw_draw.c  | 17 ++
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 34 
> +++
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  4 
>  4 files changed, 57 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
> b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> index 7556d6a..30206f2 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> @@ -82,6 +82,8 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
> intel_miptree_slice_resolve_depth(brw, src_mt, src_level, src_layer);
> intel_miptree_slice_resolve_depth(brw, dst_mt, dst_level, dst_layer);
>  
> +   intel_miptree_prepare_mcs(brw, dst_mt);
> +
> DBG("%s from %dx %s mt %p %d %d (%f,%f) (%f,%f)"
> "to %dx %s mt %p %d %d (%f,%f) (%f,%f) (flip %d,%d)\n",
> __func__,
> diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
> b/src/mesa/drivers/dri/i965/brw_draw.c
> index afa8a4e..4fa0f4e 100644
> --- a/src/mesa/drivers/dri/i965/brw_draw.c
> +++ b/src/mesa/drivers/dri/i965/brw_draw.c
> @@ -371,6 +371,22 @@ brw_postdraw_set_buffers_need_resolve(struct brw_context 
> *brw)
> }
>  }
>  
> +static void
> +brw_predraw_set_aux_buffers(struct brw_context *brw)
> +{
> +   struct gl_context *ctx = &brw->ctx;
> +   struct gl_framebuffer *fb = ctx->DrawBuffer;
> +
> +   for (unsigned i = 0; i < fb->_NumColorDrawBuffers; i++) {
> +  struct intel_renderbuffer *irb =
> + intel_renderbuffer(fb->_ColorDrawBuffers[i]);
> +
> +  if (irb) {
> + intel_miptree_prepare_mcs(brw, irb->mt);
> +  }
> +   }
> +}
> +

I wonder if it makes sense to put the gen check in this function to avoid
unnecessary pointer derefs.

>  /* May fail if out of video memory for texture or vbo upload, or on
>   * fallback conditions.
>   */
> @@ -416,6 +432,7 @@ brw_try_draw_prims(struct gl_context *ctx,
>_mesa_fls(ctx->VertexProgram._Current->Base.SamplersUsed);
>  
> intel_prepare_render(brw);
> +   brw_predraw_set_aux_buffers(brw);
>  
> /* This workaround has to happen outside of brw_upload_render_state()
>  * because it may flush the batchbuffer for a blit, affecting the state
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index cdf2fbd..758b9dd 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -1624,6 +1624,40 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context 
> *brw,
> return mt->mcs_mt;
>  }
>  
> +void
> +intel_miptree_prepare_mcs(struct brw_context *brw,
> +  struct intel_mipmap_tree *mt)
> +{
> +   if (mt->mcs_mt)
> +  return;
> +

I'm in favor of easily readable early gen checks, but up to you:
if (brw->gen < 9)
   return;

> +   /* Consider if lossless compression is supported but the needed
> +* auxiliary buffer doesn't exist yet.
> +*/
> +   if (brw->gen >= 9 &&
> +   intel_tiling_supports_non_msrt_mcs(brw, mt->tiling) &&
> +   intel_miptree_supports_non_msrt_fast_clear(brw, mt) &&
> +   intel_miptree_supports_lossless_compressed(mt->format)) {
> +
> +  /* Clients are not currently capable of consuming compressed
> +   * single-sampled buffers.
> +   */
> +  if (mt->is_scanout)
> + return;
> +
> +  /* Failing to allocate the auxiliary buffer means running out of
> +   * memory. The pointer to the aux miptree is left NULL which should
> +   * signal non-compressed behavior.
   ^ uncompressed
> +   */
> +  if (!intel_miptree_alloc_non_msrt_mcs(brw, mt)) {
> + _mesa_warning(NULL,
> +   "Failed to allocated aux buffer for lossless"
> +   " compressed %p %u:%u %s\n",
> +   mt, mt->logical_width0, mt->logical_height0,
> +   _mesa_get_format_name(mt->format));
> +  }
> +   }
> +}
>  
>  /**
>   * Helper for intel_miptree_alloc_hiz() that sets
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> index bb06522..e0d543b 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> @@ -693,6 +693,10 @@ bool
>  intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw,
>

Re: [Mesa-dev] [PATCH 3/3] i965: Stop splitting fma() prior to optimization

2016-05-05 Thread Jason Ekstrand

On May 5, 2016 6:35 PM, "Connor Abbott"  wrote:
>
> On Thu, May 5, 2016 at 8:51 PM, Jason Ekstrand 
wrote:
> > According to the GLSL spec, if the user uses fma directly and you have
it
> > in your hardware, you shouldn't split it.  For a while now, we've been
> > splitting all fma's up-front and then planned to fuse them later.  The
only
> > reason why this possibly helped before was for ARB programs which is
> > handled by the previous commit. This fixes rendering corruptions in Tomb
> > Raider.
>
> Where in the GLSL spec does it say this? In the definition of fma(), it
says:
>
> "Otherwise, in the absence of precise consumption, there are no
> special constraints on the number of operations or difference in
> precision between fma() and the expression
> 'a * b + c'."
>
> So in other words, splitting them and then opportunistically combining
> them is fine. I think the real problem is that the lower_ffma
> transform isn't marked as inexact, when doing that transform does
> violate the GLSL spec. That is, we shouldn't be splitting ffma
> instructions marked as exact.

Sure, but that isn't lowering.  Drivers that need lowering want it split
regardless.  If we want to just split imprecise things then we need a new
split_imprecise_ffma option.

I should at least update the comment.

> >
> > Shader-db results on Haswell:
> >
> >total instructions in shared programs: 7560300 -> 7561510 (0.02%)
> >instructions in affected programs: 56265 -> 57475 (2.15%)
> >helped: 86
> >HURT: 291
> >
> > The only shaders in the database that are affected are from Shadow of
> > Mordor which is the first app in our database to use fma().
> >
> > Reported-by: Kenneth Graunke 
> > ---
> >  src/mesa/drivers/dri/i965/brw_compiler.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c
b/src/mesa/drivers/dri/i965/brw_compiler.c
> > index 0ea5e8b..7969543 100644
> > --- a/src/mesa/drivers/dri/i965/brw_compiler.c
> > +++ b/src/mesa/drivers/dri/i965/brw_compiler.c
> > @@ -72,7 +72,6 @@ shader_perf_log_mesa(void *data, const char *fmt, ...)
> >  * split all ffma instructions during opt_algebraic and we then
re-combine \
> >  * them as a later step.
   \
> >  */
\
> > -   .lower_ffma = true,
\
> > .lower_sub = true,
   \
> > .lower_fdiv = true,
\
> > .lower_scmp = true,
\
> > --
> > 2.5.0.400.gff86faf
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using gallivm.

2016-05-05 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

Ilia Mirkin  changed:

   What|Removed |Added

Summary|Geometry missing from   |Geometry missing from
   |rendering, only when using  |rendering, only when using
   |Mesa.   |gallivm.

--- Comment #18 from Ilia Mirkin  ---
And to further confirm, trace replays fine on both a GF108 and GT215 with
nouveau's nvc0 and nv50 drivers. So this is a gallivm issue, not mesa.
Adjusting subject.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/7] nir: Add a lowering pass for YUV textures

2016-05-05 Thread Kristian Høgsberg

On Thu, May 5, 2016 at 5:37 PM, Jason Ekstrand  wrote:
>
>
> On Thu, May 5, 2016 at 5:04 PM, Kristian Høgsberg  wrote:
>>
>> From: Kristian Høgsberg Kristensen 
>>
>> This lowers sampling from YUV textures to 1) one or more texture
>> instructions to sample each plane and 2) color space conversion to RGB.
>> ---
>>  src/compiler/nir/nir.h   |   7 +++
>>  src/compiler/nir/nir_lower_tex.c | 118
>> +++
>>  2 files changed, 125 insertions(+)
>>
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index d85ea17..cf6d49d 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -2273,6 +2273,13 @@ typedef struct nir_lower_tex_options {
>> bool lower_rect;
>>
>> /**
>> +* If true, convert yuv to rgb.
>> +*/
>> +   unsigned lower_y_uv_external;
>> +   unsigned lower_y_u_v_external;
>> +   unsigned lower_yx_xuxv_external;
>> +
>> +   /**
>>  * To emulate certain texture wrap modes, this can be used
>>  * to saturate the specified tex coord to [0.0, 1.0].  The
>>  * bits are according to sampler #, ie. if, for example:
>> diff --git a/src/compiler/nir/nir_lower_tex.c
>> b/src/compiler/nir/nir_lower_tex.c
>> index 16fee9a..c8aebb4 100644
>> --- a/src/compiler/nir/nir_lower_tex.c
>> +++ b/src/compiler/nir/nir_lower_tex.c
>> @@ -167,6 +167,108 @@ lower_rect(nir_builder *b, nir_tex_instr *tex)
>> tex->sampler_dim = GLSL_SAMPLER_DIM_2D;
>>  }
>>
>> +static nir_ssa_def *
>> +sample_plane(nir_builder *b, nir_tex_instr *tex, int plane)
>> +{
>> +   assert(tex->dest.is_ssa);
>> +   assert(nir_tex_instr_dest_size(tex) == 4);
>> +   assert(nir_alu_type_get_base_type(tex->dest_type) == nir_type_float);
>> +   assert(tex->op == nir_texop_tex);
>> +   assert(tex->coord_components == 2);
>> +
>> +   nir_tex_instr *plane_tex = nir_tex_instr_create(b->shader, 2);
>> +   plane_tex->src[0].src = tex->src[0].src;
>
>
> You should use nir_src_copy here
>
>>
>> +   plane_tex->src[0].src_type = nir_tex_src_coord;
>> +   plane_tex->src[1].src = nir_src_for_ssa(nir_imm_int(b, plane));
>> +   plane_tex->src[1].src_type = nir_tex_src_plane;
>> +   plane_tex->op = nir_texop_tex;
>> +   plane_tex->sampler_dim = 2;
>> +   plane_tex->dest_type = nir_type_float;
>> +   plane_tex->coord_components = 2;
>> +
>> +   plane_tex->texture_index = tex->texture_index;
>
>
> This should use nir_copy_deref.  Right now, the deref is supposed to be
> ralloc parented to the instruciton so you have to make a copy.  One of these
> days, I'm going to clean that up.
>
>>
>> +   plane_tex->texture = tex->texture;
>> +   plane_tex->sampler_index = tex->sampler_index;
>> +   plane_tex->sampler = tex->sampler;
>
>
> Same here
>
>>
>> +
>> +   nir_ssa_dest_init(&plane_tex->instr, &plane_tex->dest, 4, 32, NULL);
>> +
>> +   nir_builder_instr_insert(b, &plane_tex->instr);
>> +
>> +   return &plane_tex->dest.ssa;
>> +}
>> +
>> +static void
>> +convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex,
>> +   nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v)
>> +{
>> +   nir_const_value m[3] = {
>> +  { .f32 = { 1.0f,  0.0f, 1.59602678f, 0.0f } },
>> +  { .f32 = { 1.0f, -0.39176229f, -0.81296764f, 0.0f } },
>> +  { .f32 = { 1.0f,  2.01723214f,  0.0f,0.0f } }
>> +   };
>> +
>> +   nir_ssa_def *yuv =
>> +  nir_vec4(b,
>> +   nir_fmul(b, nir_imm_float(b, 1.16438356f),
>> +nir_fadd(b, y, nir_imm_float(b, -0.0625f))),
>> +   nir_channel(b, nir_fadd(b, u, nir_imm_float(b, -0.5f)),
>> 0),
>> +   nir_channel(b, nir_fadd(b, v, nir_imm_float(b, -0.5f)),
>> 0),
>> +   nir_imm_float(b, 0.0));
>> +
>> +   nir_ssa_def *red = nir_fdot4(b, yuv, nir_build_imm(b, 4, m[0]));
>> +   nir_ssa_def *green = nir_fdot4(b, yuv, nir_build_imm(b, 4, m[1]));
>> +   nir_ssa_def *blue = nir_fdot4(b, yuv, nir_build_imm(b, 4, m[2]));
>
>
> I didn't bother checking your math here, but I've seen the results so it's
> probably ok.
>
>>
>> +
>> +   nir_ssa_def *result = nir_vec4(b, red, green, blue, nir_imm_float(b,
>> 1.0f));
>> +
>> +   nir_ssa_def_rewrite_uses_after(&tex->dest.ssa,
>> nir_src_for_ssa(result),
>> +  result->parent_instr);
>
>
> You can just use ssa_def_rewrite_uses here.  No need to use the _after
> version.
>
> With those fixed, this and the other NIR patch are

Ah yes, we talked about those and then I forgot in all the other refactoring.

> Reviewed-by: Jason Ekstrand 

Thanks!

>
>>
>> +}
>> +
>> +static void
>> +lower_y_uv_external(nir_builder *b, nir_tex_instr *tex)
>> +{
>> +   b->cursor = nir_after_instr(&tex->instr);
>> +
>> +   nir_ssa_def *y = sample_plane(b, tex, 0);
>> +   nir_ssa_def *uv = sample_plane(b, tex, 1);
>> +
>> +   convert_yuv_to_rgb(b, tex,
>> +  nir_channel(b, y, 0),
>> +  nir_channel(b, uv, 0),
>> +  nir_channel(b, uv, 1));
>> +}
>> +
>> +static void
>> +lo

Re: [Mesa-dev] [PATCH] glsl: trap atomic operations on illegal image formats in compiler.

2016-05-05 Thread Dave Airlie

On 6 May 2016 at 11:41, Francisco Jerez  wrote:
> Dave Airlie  writes:
>
>> From: Dave Airlie 
>>
>> This fixes:
>> GL43-CTS.shader_image_load_store.negative-compileErrors
>>
>> where shader 9 was being compiled with atomic operation on an r16i.
>>
>> Signed-off-by: Dave Airlie 
>> ---
>>  src/compiler/glsl/ast_function.cpp  | 21 +
>>  src/compiler/glsl/builtin_functions.cpp |  6 +++---
>>  src/compiler/glsl/glsl_parser_extras.h  |  7 +++
>>  src/compiler/glsl/ir.h  |  2 ++
>>  4 files changed, 33 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/compiler/glsl/ast_function.cpp 
>> b/src/compiler/glsl/ast_function.cpp
>> index fe0df15..3ea6e05 100644
>> --- a/src/compiler/glsl/ast_function.cpp
>> +++ b/src/compiler/glsl/ast_function.cpp
>> @@ -139,6 +139,27 @@ verify_image_parameter(YYLTYPE *loc, 
>> _mesa_glsl_parse_state *state,
>>return false;
>> }
>>
>> +   if (formal->data.image_atomic) {
>> + if (actual->data.image_format != GL_R32UI &&
>> +  actual->data.image_format != GL_R32I) {
>> +   _mesa_glsl_error(loc, state,
>> + "atomic operations can only happen on r32ui/r32i 
>> formats.");
>> +   return false;
>> + }
>> +   }
>> +
>> +   if (formal->data.image_atomic_exchange) {
>> + if ((actual->data.image_format != GL_R32UI &&
>> +   actual->data.image_format != GL_R32I)) {
>> +   /* check for the r32f special case. */
>> +   if (!(state->has_shader_image_atomic_exchange_float() &&
>> +  actual->data.image_format == GL_R32F)) {
>
> AFAICT the state->has_shader_image_atomic_exchange_float() check (and
> related changes below, including the definition of
> ir_variable_data::image_atomic_exchange) shouldn't be necessary because
> the floating point imageAtomicExchange overload shouldn't be introduced
> into the program symbol table if the r32f variant is not supported, so
> passing an r32f image would lead to a type mismatch either way
> regardless of what you do here.
>

Cool I can try that.

This bit of spec seems clear to me:

From GLSL 4.50: Section 8.12

Atomic memory operations are supported on only a subset of all image
variable types; image must be
either:
• a signed integer image variable (type starts “iimage”) and a format
qualifier of r32i, used with a
data argument of type int, or
• an unsigned image variable (type starts “uimage”) and a format
qualifier of r32ui, used with a
data argument of type uint.

So there is another CTS test that does:

GLSL source for fragment shader 3:
: #version 420 core
layout(location = 0) out vec4 o_color;
layout(rgba32f) coherent volatile restrict uniform image2D g_image_layer0;
layout(rgba32f) volatile uniform image2D g_image_layer1;
vec4 Load(layout(rgba32f) volatile image2D image, ivec2 coord) {
  return imageLoad(image, coord);
}
void main() {
  ivec2 coord = ivec2(gl_FragCoord.xy);
  imageStore(g_image_layer0, coord, vec4(1.0));
  memoryBarrier();
  imageStore(g_image_layer0, coord, vec4(2.0));
  memoryBarrier();
  imageStore(g_image_layer0, coord, vec4(0.0, 1.0, 0.0, 1.0));
  memoryBarrier();
  o_color = imageLoad(g_image_layer0, coord) + Load(g_image_layer1, coord);
}


>
> FTR the reason why I didn't bother to introduce any format qualifier
> checks whatsoever in verify_image_parameter() is the following GLSL spec
> text from section 4.10 "Memory Qualifiers":
>
> | Layout qualifiers cannot be used on formal function parameters, but
> | they are not included in parameter matching.
>
> which sounds like it's legal to pass image variables of *any* format as
> parameter for any user-defined function -- Which brings me to the
> question of what should happen when you have e.g. an image atomic call
> inside a function passing an image formal parameter as argument to the
> atomic call -- Is that supposed to be an error?  If not how do you tell
> whether the format matches to emit the error above?  You don't?  If you
> don't how come it must be an error to do it at the top level?  Or is it
> not necessarily an error either and the CTS test is just checking
> out-of-spec behaviour?  The spec wording seems *really* sketchy
> honestly...
>

So it might be that CTS is out of spec, I'll see if I can dig up any
more info on it.
Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [v4 06/11] i965/blorp: Prepare blits for lossless compression

2016-05-05 Thread Ben Widawsky

On Thu, Apr 21, 2016 at 02:59:01PM +0300, Topi Pohjolainen wrote:
> Signed-off-by: Topi Pohjolainen 
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.cpp  |  2 +-
>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 26 +-
>  2 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp 
> b/src/mesa/drivers/dri/i965/brw_blorp.cpp
> index 5ffc5b4..d23d212 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
> @@ -58,7 +58,7 @@ brw_blorp_mip_info::set(struct intel_mipmap_tree *mt,
>  */
> if (mt->msaa_layout == INTEL_MSAA_LAYOUT_UMS ||
> mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS) {
> -  assert(layer % mt->num_samples == 0);
> +  assert(mt->num_samples <= 1 || layer % mt->num_samples == 0);

Hmm. Can you explain this hunk to me? It /seems/ wrong, but I am not as
intimately familiar as you are.

> }
>  
> intel_miptree_check_level_layer(mt, level, layer);
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
> b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> index 30206f2..f33f4e1 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> @@ -72,12 +72,7 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
>  * the destination buffer because we use the standard render path to 
> render
>  * to destination color buffers, and the standard render path is
>  * fast-color-aware.
> -* Lossless compression is only introduced for gen9 onwards whereas
> -* blorp is not supported even for gen8. Therefore it should be impossible
> -* to end up here with single sampled compressed surfaces.
>  */
> -   assert(!intel_miptree_is_lossless_compressed(brw, src_mt));
> -   assert(!intel_miptree_is_lossless_compressed(brw, dst_mt));
> intel_miptree_resolve_color(brw, src_mt, 0);
> intel_miptree_slice_resolve_depth(brw, src_mt, src_level, src_layer);
> intel_miptree_slice_resolve_depth(brw, dst_mt, dst_level, dst_layer);
> @@ -1894,6 +1889,14 @@ compute_msaa_layout_for_pipeline(struct brw_context 
> *brw, unsigned num_samples,
>   intel_msaa_layout true_layout)
>  {
> if (num_samples <= 1) {
> +  /* Layout is used to determine if ld2dms is needed for sampling. In
> +   * single sampled case normal ld is enough avoiding also the need to
> +   * fetch mcs. Therefore simply set the layout to none.
> +   */
> +  if (brw->gen >= 9 && true_layout == INTEL_MSAA_LAYOUT_CMS) {
> + return INTEL_MSAA_LAYOUT_NONE;
> +  }
> +
>/* When configuring the GPU for non-MSAA, we can still accommodate IMS
> * format buffers, by transforming coordinates appropriately.
> */
> @@ -2072,6 +2075,19 @@ brw_blorp_blit_params::brw_blorp_blit_params(struct 
> brw_context *brw,
> wm_prog_key.src_layout = src_mt->msaa_layout;
> wm_prog_key.dst_layout = dst_mt->msaa_layout;
>  
> +   /* On gen9+ compressed single sampled buffers carry the same layout type 
> as
> +* multisampled. The difference is that they can be sampled using normal
> +* ld message and as render target behave just like non-compressed surface
> +* from compiler point of view. Therefore override the type in the program
> +* key.
> +*/
> +   if (brw->gen >= 9 && src.num_samples <= 1 &&
> +   src_mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS)
> +  wm_prog_key.src_layout = INTEL_MSAA_LAYOUT_NONE;
> +   if (brw->gen >= 9 && dst.num_samples <= 1 &&
> +   dst_mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS)
> +  wm_prog_key.dst_layout = INTEL_MSAA_LAYOUT_NONE;
> +
> wm_prog_key.src_tiled_w = src.map_stencil_as_y_tiled;
> wm_prog_key.dst_tiled_w = dst.map_stencil_as_y_tiled;
> /* Round floating point values to nearest integer to avoid "off by one 
> texel"

With sufficient answer to my question:
Reviewed-by: Ben Widawsky 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2] nir: Put nir_variable's memory qualifiers in its own struct

2016-05-05 Thread Eduardo Lima Mitev

ARB_shader_storage_buffer_object allows for memory qualifiers defined for
ARB_shader_image_load_store, to be applied to SSBOs using the same semantics.
So, for clarity, we might want to take these qualifiers out into its own
category, rather than putting them inside the 'image' specific struct.

I agree that this renaming barely improves anything other than making
nir_variable data struct pedantically stricter, so I would be ok dropping
this patch if people think it is not worth it. Also, glsl data structs
still use 'image' namespace for all memory qualifiers, so we might want
to rename things there too.

v2: Added a missing patch chunk that updated nir_print.
---
 src/compiler/nir/glsl_to_nir.cpp | 10 +-
 src/compiler/nir/nir.h   |  7 ++-
 src/compiler/nir/nir_print.c | 10 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp |  2 +-
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/src/compiler/nir/glsl_to_nir.cpp b/src/compiler/nir/glsl_to_nir.cpp
index c8954ce..75760f4 100644
--- a/src/compiler/nir/glsl_to_nir.cpp
+++ b/src/compiler/nir/glsl_to_nir.cpp
@@ -414,11 +414,11 @@ nir_visitor::visit(ir_variable *ir)
var->data.index = ir->data.index;
var->data.binding = ir->data.binding;
var->data.offset = ir->data.offset;
-   var->data.image.read_only = ir->data.image_read_only;
-   var->data.image.write_only = ir->data.image_write_only;
-   var->data.image.coherent = ir->data.image_coherent;
-   var->data.image._volatile = ir->data.image_volatile;
-   var->data.image.restrict_flag = ir->data.image_restrict;
+   var->data.memory.read_only = ir->data.image_read_only;
+   var->data.memory.write_only = ir->data.image_write_only;
+   var->data.memory.coherent = ir->data.image_coherent;
+   var->data.memory._volatile = ir->data.image_volatile;
+   var->data.memory.restrict_flag = ir->data.image_restrict;
var->data.image.format = ir->data.image_format;
var->data.max_array_access = ir->data.max_array_access;
 
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 8a616d4..57158d1 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -283,7 +283,7 @@ typedef struct nir_variable {
   unsigned offset;
 
   /**
-   * ARB_shader_image_load_store qualifiers.
+   * Memory qualifiers.
*/
   struct {
  bool read_only; /**< "readonly" qualifier. */
@@ -291,7 +291,12 @@ typedef struct nir_variable {
  bool coherent;
  bool _volatile;
  bool restrict_flag;
+  } memory;
 
+  /**
+   * ARB_shader_image_load_store.
+   */
+  struct {
  /** Image internal format if specified explicitly, otherwise GL_NONE. 
*/
  GLenum format;
   } image;
diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
index a36561e..1aa1ef4 100644
--- a/src/compiler/nir/nir_print.c
+++ b/src/compiler/nir/nir_print.c
@@ -354,11 +354,11 @@ print_var_decl(nir_variable *var, print_state *state)
cent, samp, patch, inv, get_variable_mode_str(var->data.mode),
glsl_interp_qualifier_name(var->data.interpolation));
 
-   const char *const coher = (var->data.image.coherent) ? "coherent " : "";
-   const char *const volat = (var->data.image._volatile) ? "volatile " : "";
-   const char *const restr = (var->data.image.restrict_flag) ? "restrict " : 
"";
-   const char *const ronly = (var->data.image.read_only) ? "readonly " : "";
-   const char *const wonly = (var->data.image.write_only) ? "writeonly " : "";
+   const char *const coher = (var->data.memory.coherent) ? "coherent " : "";
+   const char *const volat = (var->data.memory._volatile) ? "volatile " : "";
+   const char *const restr = (var->data.memory.restrict_flag) ? "restrict " : 
"";
+   const char *const ronly = (var->data.memory.read_only) ? "readonly " : "";
+   const char *const wonly = (var->data.memory.write_only) ? "writeonly " : "";
fprintf(fp, "%s%s%s%s%s", coher, volat, restr, ronly, wonly);
 
glsl_print_type(var->type, fp);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 905f5c1..c9015a4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -2935,7 +2935,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
 
   else if (instr->intrinsic == nir_intrinsic_image_store)
  emit_image_store(bld, image, addr, src0, surf_dims, arr_dims,
-  var->data.image.write_only ? GL_NONE : format);
+  var->data.memory.write_only ? GL_NONE : format);
 
   else
  tmp = emit_image_atomic(bld, image, addr, src0, src1,
-- 
2.7.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [v4 06/11] i965/blorp: Prepare blits for lossless compression

2016-05-05 Thread Pohjolainen, Topi

On Thu, May 05, 2016 at 11:01:54PM -0700, Ben Widawsky wrote:
> On Thu, Apr 21, 2016 at 02:59:01PM +0300, Topi Pohjolainen wrote:
> > Signed-off-by: Topi Pohjolainen 
> > ---
> >  src/mesa/drivers/dri/i965/brw_blorp.cpp  |  2 +-
> >  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 26 
> > +-
> >  2 files changed, 22 insertions(+), 6 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp 
> > b/src/mesa/drivers/dri/i965/brw_blorp.cpp
> > index 5ffc5b4..d23d212 100644
> > --- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
> > @@ -58,7 +58,7 @@ brw_blorp_mip_info::set(struct intel_mipmap_tree *mt,
> >  */
> > if (mt->msaa_layout == INTEL_MSAA_LAYOUT_UMS ||
> > mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS) {
> > -  assert(layer % mt->num_samples == 0);
> > +  assert(mt->num_samples <= 1 || layer % mt->num_samples == 0);
> 
> Hmm. Can you explain this hunk to me? It /seems/ wrong, but I am not as
> intimately familiar as you are.

As we re-use CMS layout also for lossless compression, we end up here. I need
to re-check the details, but I hit this with piglit tests. I'm just rebasing
with your comments, I'l get back here once I have it running again.

> 
> > }
> >  
> > intel_miptree_check_level_layer(mt, level, layer);
> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
> > b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> > index 30206f2..f33f4e1 100644
> > --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> > @@ -72,12 +72,7 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
> >  * the destination buffer because we use the standard render path to 
> > render
> >  * to destination color buffers, and the standard render path is
> >  * fast-color-aware.
> > -* Lossless compression is only introduced for gen9 onwards whereas
> > -* blorp is not supported even for gen8. Therefore it should be 
> > impossible
> > -* to end up here with single sampled compressed surfaces.
> >  */
> > -   assert(!intel_miptree_is_lossless_compressed(brw, src_mt));
> > -   assert(!intel_miptree_is_lossless_compressed(brw, dst_mt));
> > intel_miptree_resolve_color(brw, src_mt, 0);
> > intel_miptree_slice_resolve_depth(brw, src_mt, src_level, src_layer);
> > intel_miptree_slice_resolve_depth(brw, dst_mt, dst_level, dst_layer);
> > @@ -1894,6 +1889,14 @@ compute_msaa_layout_for_pipeline(struct brw_context 
> > *brw, unsigned num_samples,
> >   intel_msaa_layout true_layout)
> >  {
> > if (num_samples <= 1) {
> > +  /* Layout is used to determine if ld2dms is needed for sampling. In
> > +   * single sampled case normal ld is enough avoiding also the need to
> > +   * fetch mcs. Therefore simply set the layout to none.
> > +   */
> > +  if (brw->gen >= 9 && true_layout == INTEL_MSAA_LAYOUT_CMS) {
> > + return INTEL_MSAA_LAYOUT_NONE;
> > +  }
> > +
> >/* When configuring the GPU for non-MSAA, we can still accommodate 
> > IMS
> > * format buffers, by transforming coordinates appropriately.
> > */
> > @@ -2072,6 +2075,19 @@ brw_blorp_blit_params::brw_blorp_blit_params(struct 
> > brw_context *brw,
> > wm_prog_key.src_layout = src_mt->msaa_layout;
> > wm_prog_key.dst_layout = dst_mt->msaa_layout;
> >  
> > +   /* On gen9+ compressed single sampled buffers carry the same layout 
> > type as
> > +* multisampled. The difference is that they can be sampled using normal
> > +* ld message and as render target behave just like non-compressed 
> > surface
> > +* from compiler point of view. Therefore override the type in the 
> > program
> > +* key.
> > +*/
> > +   if (brw->gen >= 9 && src.num_samples <= 1 &&
> > +   src_mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS)
> > +  wm_prog_key.src_layout = INTEL_MSAA_LAYOUT_NONE;
> > +   if (brw->gen >= 9 && dst.num_samples <= 1 &&
> > +   dst_mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS)
> > +  wm_prog_key.dst_layout = INTEL_MSAA_LAYOUT_NONE;
> > +
> > wm_prog_key.src_tiled_w = src.map_stencil_as_y_tiled;
> > wm_prog_key.dst_tiled_w = dst.map_stencil_as_y_tiled;
> > /* Round floating point values to nearest integer to avoid "off by one 
> > texel"
> 
> With sufficient answer to my question:
> Reviewed-by: Ben Widawsky 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 56/59] i965/fs: push first double-based uniforms in push constant buffer

2016-05-05 Thread Samuel Iglesias Gonsálvez

When there is a mix of definitions of uniforms with 32-bit or 64-bit
data type sizes, the driver ends up doing misaligned access to double
based variables in the push constant buffer.

To fix this, this patch pushes first all the 64-bit variables and
then the rest. Then, all the variables would be aligned to
its data type size.

Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 113 +--
 1 file changed, 83 insertions(+), 30 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 65aa3c7..7eaf1b4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -39,6 +39,7 @@
 #include "brw_program.h"
 #include "brw_dead_control_flow.h"
 #include "compiler/glsl_types.h"
+#include "program/prog_parameter.h"
 
 using namespace brw;
 
@@ -2004,6 +2005,45 @@ fs_visitor::compact_virtual_grfs()
return progress;
 }
 
+static void
+set_push_pull_constant_loc(unsigned uniform, int &chunk_start, bool contiguous,
+   int *push_constant_loc, int *pull_constant_loc,
+   unsigned &num_push_constants,
+   unsigned &num_pull_constants,
+   const unsigned max_push_components,
+   const unsigned max_chunk_size,
+   struct brw_stage_prog_data *stage_prog_data)
+{
+   /* This is the first live uniform in the chunk */
+   if (chunk_start < 0)
+  chunk_start = uniform;
+
+   /* If this element does not need to be contiguous with the next, we
+* split at this point and everthing between chunk_start and u forms a
+* single chunk.
+*/
+   if (!contiguous) {
+  unsigned chunk_size = uniform - chunk_start + 1;
+
+  /* Decide whether we should push or pull this parameter.  In the
+   * Vulkan driver, push constants are explicitly exposed via the API
+   * so we push everything.  In GL, we only push small arrays.
+   */
+  if (stage_prog_data->pull_param == NULL ||
+  (num_push_constants + chunk_size <= max_push_components &&
+   chunk_size <= max_chunk_size)) {
+ assert(num_push_constants + chunk_size <= max_push_components);
+ for (unsigned j = chunk_start; j <= uniform; j++)
+push_constant_loc[j] = num_push_constants++;
+  } else {
+ for (unsigned j = chunk_start; j <= uniform; j++)
+pull_constant_loc[j] = num_pull_constants++;
+  }
+
+  chunk_start = -1;
+   }
+}
+
 /**
  * Assign UNIFORM file registers to either push constants or pull constants.
  *
@@ -2022,6 +2062,8 @@ fs_visitor::assign_constant_locations()
 
bool is_live[uniforms];
memset(is_live, 0, sizeof(is_live));
+   bool is_live_64bit[uniforms];
+   memset(is_live_64bit, 0, sizeof(is_live_64bit));
 
/* For each uniform slot, a value of true indicates that the given slot and
 * the next slot must remain contiguous.  This is used to keep us from
@@ -2054,14 +2096,21 @@ fs_visitor::assign_constant_locations()
 for (unsigned j = constant_nr; j < last; j++) {
is_live[j] = true;
contiguous[j] = true;
+   if (type_sz(inst->src[i].type) == 8) {
+  is_live_64bit[j] = true;
+   }
 }
 is_live[last] = true;
  } else {
 if (constant_nr >= 0 && constant_nr < (int) uniforms) {
int regs_read = inst->components_read(i) *
   type_sz(inst->src[i].type) / 4;
-   for (int j = 0; j < regs_read; j++)
+   for (int j = 0; j < regs_read; j++) {
   is_live[constant_nr + j] = true;
+  if (type_sz(inst->src[i].type) == 8) {
+ is_live_64bit[constant_nr + j] = true;
+  }
+   }
 }
  }
   }
@@ -2090,43 +2139,46 @@ fs_visitor::assign_constant_locations()
pull_constant_loc = ralloc_array(mem_ctx, int, uniforms);
 
int chunk_start = -1;
+
+
+   /* First push 64-bit uniforms */
for (unsigned u = 0; u < uniforms; u++) {
-  push_constant_loc[u] = -1;
+  if (!is_live[u] || !is_live_64bit[u])
+ continue;
+
   pull_constant_loc[u] = -1;
+  push_constant_loc[u] = -1;
 
-  if (!is_live[u])
- continue;
+  set_push_pull_constant_loc(u, chunk_start, contiguous[u],
+ push_constant_loc, pull_constant_loc,
+ num_push_constants, num_pull_constants,
+ max_push_components, max_chunk_size,
+ stage_prog_data);
 
-  /* This is the first live uniform in the chunk */
-  if (chunk_start < 0)
- chunk_start = u;
+   }
 
-  /* If this element does not need to be contiguous with the next, we
-   * split at this point and everth

Re: [Mesa-dev] [v4 06/11] i965/blorp: Prepare blits for lossless compression

2016-05-05 Thread Pohjolainen, Topi

On Fri, May 06, 2016 at 09:30:27AM +0300, Pohjolainen, Topi wrote:
> On Thu, May 05, 2016 at 11:01:54PM -0700, Ben Widawsky wrote:
> > On Thu, Apr 21, 2016 at 02:59:01PM +0300, Topi Pohjolainen wrote:
> > > Signed-off-by: Topi Pohjolainen 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_blorp.cpp  |  2 +-
> > >  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 26 
> > > +-
> > >  2 files changed, 22 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp 
> > > b/src/mesa/drivers/dri/i965/brw_blorp.cpp
> > > index 5ffc5b4..d23d212 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
> > > @@ -58,7 +58,7 @@ brw_blorp_mip_info::set(struct intel_mipmap_tree *mt,
> > >  */
> > > if (mt->msaa_layout == INTEL_MSAA_LAYOUT_UMS ||
> > > mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS) {
> > > -  assert(layer % mt->num_samples == 0);
> > > +  assert(mt->num_samples <= 1 || layer % mt->num_samples == 0);
> > 
> > Hmm. Can you explain this hunk to me? It /seems/ wrong, but I am not as
> > intimately familiar as you are.
> 
> As we re-use CMS layout also for lossless compression, we end up here. I need
> to re-check the details, but I hit this with piglit tests. I'm just rebasing
> with your comments, I'l get back here once I have it running again.

It was as I suspected, we use both mt->num_samples == 1 and == 0 for
single sampled - latter won't work with %-operand.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

80 matches

Mail list logo