date:20160504

Re: [Mesa-dev] [PATCH 00/59] Initial arb_gpu_shader_fp64 support to the i965 scalar backend

2016-05-04 Thread Samuel Iglesias Gonsálvez

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256



On 04/05/16 21:28, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez  writes:
> 
> 
> 
> On 03/05/16 20:59, Kenneth Graunke wrote:
 Other than patches 37, 56, and ones you agreed to drop, the
 series is: Reviewed-by: Kenneth Graunke
 
 
 I think you can go ahead and land all except those, and we
 can land new solutions for those problems afterwards.
 
 We still need to fix the horiz_offset problem, and I think
 Curro's subscript() helper is a great way to do that.  I
 would be fine with fixing that after this series lands - it
 would probably mean less rebasing :)
 
> 
> We prepared a patch series without patches 11, 22 (both were
> dropped), 37, 40 (it has a NACK from Curro) and 56. Our idea was to
> push the patch series once we check there is no piglit
> regressions.
> 
> We got 0 piglit regressions on ILK, SNB, IVB, HSW and BDW, the
> latter with the extension enabled with the environment variable.
> 
> Sadly, on SKL we had regressions (including GPU hangs) only if we 
> *enable* the arb_gpu_shader_fp64 extension. Those regressions are 
> fixed by patch 40 but we need still to investigate why they are
> happening.
> 
>> That sounds like PATCH 40 was most likely working around
>> regs_read() being interpreted with incorrect units in some other
>> place.  The use of regs_read() in PATCH 36 is obviously wrong
>> which might explain the problem.  Something along the lines of
>> 'inst->components_read(i) * type_sz(inst->src[i]) / 4' would do
>> what you want -- With that changed PATCH 36 is:
> 
>> Reviewed-by: Francisco Jerez 

This change fixes the SKL regressions. Thanks Curro!

Sam

> 
> For that reason, we have not pushed anything yet. Tomorrow we will 
> investigate why we have those regressions only on SKL and which
> tests are affected.
> 
>> No worries, you still have a couple of weeks until the merge
>> deadline, I don't see the need to rush in any changes that still
>> have outstanding comments.
> 
> JFTR, if we disable the extension on SKL (i.e. we don't set the 
> environment variable), we have 0 piglit regressions on that
> generation.
> 
> Thanks,
> 
> Sam
> 
> 
>> ___ mesa-dev mailing
>> list mesa-dev@lists.freedesktop.org 
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-BEGIN PGP SIGNATURE-
Version: GnuPG v2

iQIcBAEBCAAGBQJXKuxFAAoJEH/0ujLxfcNDtx4QAMvFo5qXCyK9hoigUV38J8nL
tTtsveHH/aIQzhFzhPwU8q2ABoIuGfhFJ0vuusFKVmWGXVt93/0S+QMLToGTXKQS
2m5ySc+w+K8bO7PD0/S4VDRcEck2MKM7rNP+ErdTM1F8elB+vqasgfIRVMlGUf6S
pjT4wuiRamL5boIGBhLdch8eVZjap2t07DZ9TkqqPltkhHmtMeBJyACzLKNMH44w
2QD0J0OzJ268oRr6PPlVHisk/LbtoQSsCmtQciNwg1u5LPxF72pNPn3xpnPtZ3R2
PWCF14KedgxHfswes4qGqJ7hWm9B0Zd6PHSPXfzdqkIu4Nq8n4uJw2EAim092QVm
OWyjhSGgUthoWwBJCf2yWqtufTp2kdwaTia+l1oRTNOzGnXRurTxu3CNrLJ42Cju
8JwXkKaNjkk47dM85vFftBzQh82lYfAuKceIOgv37BqrvYPbinYM53/hPYaRnQxW
EViDfDTWrNBdTi5OxTP3HZ87w3LQQEngxK7kW/N8bLalKlCEMVQ2PPu9Nmg+TKHm
YlO8z+lWoPgda2dEzJjoIu6J9GOX1WCKKrwHtwbPxBTrt/PtDlj4TiuD+nsA34Pk
RXgOJHLGFBoxgOcIZBrFlwdkVv8avE80nH3fj6BMlERhloZv+KF6zSv0tx02lRq1
J5GVIh0plzqZDF+uVSLJ
=G05c
-END PGP SIGNATURE-
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] mesa/ubo: add missing compute cases for ubo/atomic buffers

2016-05-04 Thread Dave Airlie

From: Dave Airlie 

This fixes: GL43-CTS.compute_shader.resource-ubo

Signed-off-by: Dave Airlie 
---
 src/mesa/main/uniforms.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/main/uniforms.c b/src/mesa/main/uniforms.c
index a9308d0..acb2d06 100644
--- a/src/mesa/main/uniforms.c
+++ b/src/mesa/main/uniforms.c
@@ -1144,6 +1144,12 @@ mesa_bufferiv(struct gl_shader_program *shProg, GLenum 
type,
   GL_REFERENCED_BY_FRAGMENT_SHADER, params,
   caller);
   return;
+   case GL_UNIFORM_BLOCK_REFERENCED_BY_COMPUTE_SHADER:
+   case GL_ATOMIC_COUNTER_BUFFER_REFERENCED_BY_COMPUTE_SHADER:
+  _mesa_program_resource_prop(shProg, res, index,
+  GL_REFERENCED_BY_COMPUTE_SHADER, params,
+  caller);
+  return;
default:
   _mesa_error(ctx, GL_INVALID_ENUM,
   "%s(pname 0x%x (%s))", caller, pname,
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] mesa: remove null check before free

2016-05-04 Thread Thomas H.P. Andersen

On Wed, May 4, 2016 at 7:54 AM, Eduardo Lima Mitev  wrote:

> Patch 2 and 3 are:
>
> Reviewed-by: Eduardo Lima Mitev 
>

Thanks! I do not have commit access. Can I ask you to push (again)?


>
> I barely know about coccinelle scripts so I will let others look at
> first patch.
>
> Thanks
> Eduardo
>
> On 05/04/2016 06:15 AM, Thomas Hindoe Paaboel Andersen wrote:
> > ---
> >  src/mesa/main/readpix.c | 3 +--
> >  src/mesa/main/texgetimage.c | 3 +--
> >  2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
> > index 882d863..1cb06c7 100644
> > --- a/src/mesa/main/readpix.c
> > +++ b/src/mesa/main/readpix.c
> > @@ -608,8 +608,7 @@ read_rgba_pixels( struct gl_context *ctx,
> >   dst, format, type);
> > }
> >
> > -   if (rgba)
> > -  free(rgba);
> > +   free(rgba);
> >
> >  done_swap:
> > /* Handle byte swapping if required */
> > diff --git a/src/mesa/main/texgetimage.c b/src/mesa/main/texgetimage.c
> > index 4ac0ad4..fc3cc6b 100644
> > --- a/src/mesa/main/texgetimage.c
> > +++ b/src/mesa/main/texgetimage.c
> > @@ -557,8 +557,7 @@ get_tex_rgba_uncompressed(struct gl_context *ctx,
> GLuint dimensions,
> > }
> >
> >  done:
> > -   if (rgba)
> > -  free(rgba);
> > +   free(rgba);
> >  }
> >
> >
> >
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] nir: fix assert for wildcard pairs

2016-05-04 Thread Thomas H.P. Andersen

On Wed, May 4, 2016 at 7:46 AM, Eduardo Lima Mitev  wrote:

> Good catch!
>
> Reviewed-by: Eduardo Lima Mitev 
>

Thanks! I do not have commit access. Can I ask you to push?

>
> On 05/04/2016 05:48 AM, Thomas Hindoe Paaboel Andersen wrote:
> > The assert was null checking dest_arr_parent twice. The intention
> > seems to be to check both dest_ and src_.
> >
> > Added in d3636da9
> > ---
> > v2:
> >   Fix the assert rather than checking both in the if(). Hat tip to Ilia.
> >  src/compiler/nir/nir_lower_var_copies.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/src/compiler/nir/nir_lower_var_copies.c
> b/src/compiler/nir/nir_lower_var_copies.c
> > index 707d5af..1a7e2ee 100644
> > --- a/src/compiler/nir/nir_lower_var_copies.c
> > +++ b/src/compiler/nir/nir_lower_var_copies.c
> > @@ -85,7 +85,7 @@ emit_copy_load_store(nir_intrinsic_instr *copy_instr,
> >
> > if (src_arr_parent || dest_arr_parent) {
> >/* Wildcards had better come in matched pairs */
> > -  assert(dest_arr_parent && dest_arr_parent);
> > +  assert(src_arr_parent && dest_arr_parent);
> >
> >nir_deref_array *src_arr =
> nir_deref_as_array(src_arr_parent->child);
> >nir_deref_array *dest_arr =
> nir_deref_as_array(dest_arr_parent->child);
> >
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] mesa/compute: drop pointless casts.

2016-05-04 Thread Dave Airlie

From: Dave Airlie 

We already are a GLintptr, casting won't help.

Signed-off-by: Dave Airlie 
---
 src/mesa/main/api_validate.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index d455f19..421ae8b 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -1107,20 +1107,20 @@ valid_dispatch_indirect(struct gl_context *ctx,
 GLintptr indirect,
 GLsizei size, const char *name)
 {
-   GLintptr end = (GLintptr)indirect + size;
+   GLintptr end = indirect + size;
 
/* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
 *
 * "An INVALID_VALUE error is generated if indirect is negative or is not a
 *  multiple of four."
 */
-   if ((GLintptr)indirect & (sizeof(GLuint) - 1)) {
+   if (indirect & (sizeof(GLuint) - 1)) {
   _mesa_error(ctx, GL_INVALID_VALUE,
   "%s(indirect is not aligned)", name);
   return GL_FALSE;
}
 
-   if ((GLintptr)indirect < 0) {
+   if (indirect < 0) {
   _mesa_error(ctx, GL_INVALID_VALUE,
   "%s(indirect is less than zero)", name);
   return GL_FALSE;
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/3] i965: Add infrastucture for sample lod-zero operations.

2016-05-04 Thread Kenneth Graunke

On Wednesday, May 4, 2016 3:54:13 PM PDT Matt Turner wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h |  5 +
>  src/mesa/drivers/dri/i965/brw_disasm.c  |  3 +++
>  src/mesa/drivers/dri/i965/brw_fs.cpp|  3 +++
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp  | 14 ++
>  src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp |  2 ++
>  src/mesa/drivers/dri/i965/brw_shader.cpp|  6 ++
>  6 files changed, 33 insertions(+)

Reviewed-by: Kenneth Graunke 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/2] mesa/compute: move compute checks around for tests.

2016-05-04 Thread Dave Airlie

From: Dave Airlie 

This fixes GL43-CTS.compute_shader.api-indirect
which tests the length/4 before anything else.

Signed-off-by: Dave Airlie 
---
 src/mesa/main/api_validate.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index 688408f..d455f19 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -1109,9 +1109,6 @@ valid_dispatch_indirect(struct gl_context *ctx,
 {
GLintptr end = (GLintptr)indirect + size;
 
-   if (!check_valid_to_compute(ctx, name))
-  return GL_FALSE;
-
/* From the OpenGL 4.3 Core Specification, Chapter 19, Compute Shaders:
 *
 * "An INVALID_VALUE error is generated if indirect is negative or is not a
@@ -1153,6 +1150,9 @@ valid_dispatch_indirect(struct gl_context *ctx,
   return GL_FALSE;
}
 
+   if (!check_valid_to_compute(ctx, name))
+  return GL_FALSE;
+
return GL_TRUE;
 }
 
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] i965/fs: Recognize and emit ld_lz, sample_lz, sample_c_lz.

2016-05-04 Thread Kenneth Graunke

On Wednesday, May 4, 2016 3:54:14 PM PDT Matt Turner wrote:
> Ken suggested instead of a big and complicated optimization pass, to
> just recognize the operations here. It's certainly less code and a lot
> prettier, but it seems to actually perform worse for currently unknown
> reasons.

One potential pitfall is that is_zero() currently returns false for
-0.0f.  The fp64 series that's supposedly landing fixes this.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/3] i965/fs: Add and use get_nir_src_imm().

2016-05-04 Thread Kenneth Graunke

On Wednesday, May 4, 2016 3:54:12 PM PDT Matt Turner wrote:
> Terrible name. Suggest something else, or even a better way to do this.
> 
> Basically, the next patch wants to inspect the LOD argument and do
> something different if it's 0.0f. But at that point we've emitted a MOV
> for it and we just have a register to look at.
> ---
> This is an alternative approach to the 4 patch series I sent ast night.
> 
>  src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 36 ++
+-
>  2 files changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/
brw_fs.h
> index ba6bd3f..29fa593 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -273,6 +273,7 @@ public:
> void nir_emit_jump(const brw::fs_builder &bld,
>nir_jump_instr *instr);
> fs_reg get_nir_src(nir_src src);
> +   fs_reg get_nir_src_imm(nir_src src);
> fs_reg get_nir_dest(nir_dest dest);
> fs_reg get_nir_image_deref(const nir_deref_var *deref);
> fs_reg get_indirect_offset(nir_intrinsic_instr *instr);
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/
dri/i965/brw_fs_nir.cpp
> index 4d14fda..fb59800 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1245,6 +1245,32 @@ fs_visitor::get_nir_src(nir_src src)
>  }
>  
>  fs_reg
> +fs_visitor::get_nir_src_imm(nir_src src)
> +{
> +   fs_reg reg;
> +   if (src.is_ssa) {
> +  if (src.ssa->parent_instr->type == nir_instr_type_load_const) {
> + nir_load_const_instr *load_const =
> +nir_instr_as_load_const(src.ssa->parent_instr);
> + return brw_imm_ud(load_const->value.u32[0]);
> +  }
> +
> +  reg = nir_ssa_values[src.ssa->index];
> +   } else {
> +  /* We don't handle indirects on locals */
> +  assert(src.reg.indirect == NULL);
> +  reg = offset(nir_locals[src.reg.reg->index], bld,
> +   src.reg.base_offset * src.reg.reg->num_components);
> +   }
> +
> +   /* to avoid floating-point denorm flushing problems, set the type by
> +* default to D - instructions that need floating point semantics will 
set
> +* this to F if they need to
> +*/
> +   return retype(reg, BRW_REGISTER_TYPE_D);
> +}

I actually wrote one of these in commit ce1511c627da2aa2a of my tree,
and it's much simpler:

/**
 * Return an IMM for constants; otherwise call get_nir_src() as normal.
 */
fs_reg
fs_visitor::get_nir_src_imm(nir_src src)
{
   nir_const_value *val = nir_src_as_const_value(src);
   return val ? fs_reg(val->i[0]) : get_nir_src(src);
}

I ended up dropping the patch because I just used nir_src_as_const_value()
in the callers directly, but it's probably nice to have one of these.

> +
> +fs_reg
>  fs_visitor::get_nir_dest(nir_dest dest)
>  {
> if (dest.is_ssa) {
> @@ -3444,7 +3470,7 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, 
nir_tex_instr *instr)
>fs_reg src = get_nir_src(instr->src[i].src);
>switch (instr->src[i].src_type) {
>case nir_tex_src_bias:
> - lod = retype(src, BRW_REGISTER_TYPE_F);
> + lod = retype(get_nir_src_imm(instr->src[i].src), 
BRW_REGISTER_TYPE_F);
>   break;
>case nir_tex_src_comparitor:
>   shadow_comparitor = retype(src, BRW_REGISTER_TYPE_F);
> @@ -3462,7 +3488,7 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, 
nir_tex_instr *instr)
>   }
>   break;
>case nir_tex_src_ddx:
> - lod = retype(src, BRW_REGISTER_TYPE_F);
> + lod = retype(get_nir_src_imm(instr->src[i].src), 
BRW_REGISTER_TYPE_F);
>   lod_components = nir_tex_instr_src_size(instr, i);
>   break;
>case nir_tex_src_ddy:
> @@ -3471,13 +3497,13 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, 
nir_tex_instr *instr)
>case nir_tex_src_lod:
>   switch (instr->op) {
>   case nir_texop_txs:
> -lod = retype(src, BRW_REGISTER_TYPE_UD);
> +lod = retype(get_nir_src_imm(instr->src[i].src), 
BRW_REGISTER_TYPE_UD);
>  break;
>   case nir_texop_txf:
> -lod = retype(src, BRW_REGISTER_TYPE_D);
> +lod = retype(get_nir_src_imm(instr->src[i].src), 
BRW_REGISTER_TYPE_D);
>  break;
>   default:
> -lod = retype(src, BRW_REGISTER_TYPE_F);
> +lod = retype(get_nir_src_imm(instr->src[i].src), 
BRW_REGISTER_TYPE_F);
>  break;
>   }
>   break;
> 



signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/4] i965/fs: Merge nir_emit_texture and emit_texture

2016-05-04 Thread Kenneth Graunke

On Tuesday, May 3, 2016 3:00:24 PM PDT Jason Ekstrand wrote:
> The fs_visitor::emit_texture helper originated when we still had both NIR
> and IR visitors for the FS backend.  Since the old visitor was removed,
> emit_texture serves no real purpose beyond arbitrarily splitting
> heavily-linked code across two functions.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.h   |  18 ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 223 ++
+
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 159 ---
>  3 files changed, 162 insertions(+), 238 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/
brw_fs.h
> index a5c3297..925e4b7 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -200,21 +200,6 @@ public:
> void emit_interpolation_setup_gen4();
> void emit_interpolation_setup_gen6();
> void compute_sample_position(fs_reg dst, fs_reg int_sample_pos);
> -   void emit_texture(ir_texture_opcode op,
> - const glsl_type *dest_type,
> - fs_reg coordinate, int components,
> - fs_reg shadow_c,
> - fs_reg lod, fs_reg dpdy, int grad_components,
> - fs_reg sample_index,
> - fs_reg offset,
> - fs_reg mcs,
> - int gather_component,
> - bool is_cube_array,
> - uint32_t surface,
> - fs_reg surface_reg,
> - uint32_t sampler,
> - fs_reg sampler_reg,
> - unsigned return_channels);
> fs_reg emit_mcs_fetch(const fs_reg &coordinate, unsigned components,
>   const fs_reg &sampler);
> void emit_gen6_gather_wa(uint8_t wa, fs_reg dst);
> @@ -375,9 +360,6 @@ public:
> bool simd16_unsupported;
> char *no16_msg;
>  
> -   /* Result of last visit() method. Still used by emit_texture() */
> -   fs_reg result;
> -
> /** Register numbers for thread payload fields. */
> struct thread_payload {
>uint8_t source_depth_reg;
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/
dri/i965/brw_fs_nir.cpp
> index 360e2c9..ebc54ad 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -3068,65 +3068,61 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, 
nir_tex_instr *instr)
>  {
> unsigned texture = instr->texture_index;
> unsigned sampler = instr->sampler_index;
> -   fs_reg texture_reg(brw_imm_ud(texture));
> -   fs_reg sampler_reg(brw_imm_ud(sampler));
>  
> -   int gather_component = instr->component;
> +   fs_reg srcs[TEX_LOGICAL_NUM_SRCS];
>  
> -   bool is_cube_array = instr->sampler_dim == GLSL_SAMPLER_DIM_CUBE &&
> -instr->is_array;
> +   srcs[TEX_LOGICAL_SRC_SURFACE] = brw_imm_ud(texture);
> +   srcs[TEX_LOGICAL_SRC_SAMPLER] = brw_imm_ud(sampler);
>  
> int lod_components = 0;
>  
> -   fs_reg coordinate, shadow_comparitor, lod, lod2, sample_index, mcs, 
tex_offset;
> -
> /* The hardware requires a LOD for buffer textures */
> if (instr->sampler_dim == GLSL_SAMPLER_DIM_BUF)
> -  lod = brw_imm_d(0);
> +  srcs[TEX_LOGICAL_SRC_LOD] = brw_imm_d(0);
>  
> for (unsigned i = 0; i < instr->num_srcs; i++) {
>fs_reg src = get_nir_src(instr->src[i].src);
>switch (instr->src[i].src_type) {
>case nir_tex_src_bias:
> - lod = retype(src, BRW_REGISTER_TYPE_F);
> + srcs[TEX_LOGICAL_SRC_LOD] = retype(src, BRW_REGISTER_TYPE_F);
>   break;
>case nir_tex_src_comparitor:
> - shadow_comparitor = retype(src, BRW_REGISTER_TYPE_F);
> + srcs[TEX_LOGICAL_SRC_SHADOW_C] = retype(src, BRW_REGISTER_TYPE_F);
>   break;
>case nir_tex_src_coord:
>   switch (instr->op) {
>   case nir_texop_txf:
>   case nir_texop_txf_ms:
>   case nir_texop_samples_identical:
> -coordinate = retype(src, BRW_REGISTER_TYPE_D);
> +srcs[TEX_LOGICAL_SRC_COORDINATE] = retype(src, 
BRW_REGISTER_TYPE_D);
>  break;
>   default:
> -coordinate = retype(src, BRW_REGISTER_TYPE_F);
> +srcs[TEX_LOGICAL_SRC_COORDINATE] = retype(src, 
BRW_REGISTER_TYPE_F);
>  break;
>   }
>   break;
>case nir_tex_src_ddx:
> - lod = retype(src, BRW_REGISTER_TYPE_F);
> + srcs[TEX_LOGICAL_SRC_LOD] = retype(src, BRW_REGISTER_TYPE_F);
>   lod_components = nir_tex_instr_src_size(instr, i);
>   break;
>case nir_tex_src_ddy:
> - lod2 = retype(src, BRW_REGISTER_TYPE_F);
> + srcs[TEX_LOGICAL_SRC_LOD2] = retype(src, BRW_REGISTER_TYPE_F);
>   break;
>case nir_tex_src_lod:
>   switch (instr->op) {
>   case nir_texop_txs:
> -

[Mesa-dev] [PATCH 13/14] radeonsi: consolidate radeon_add_to_buffer_list calls for DMA

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeon/r600_pipe_common.c | 14 ++
 src/gallium/drivers/radeonsi/cik_sdma.c   | 23 ---
 src/gallium/drivers/radeonsi/si_dma.c | 10 --
 3 files changed, 14 insertions(+), 33 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index c0579da..3eb12f3 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -166,6 +166,20 @@ void r600_need_dma_space(struct r600_common_context *ctx, 
unsigned num_dw,
ctx->dma.flush(ctx, RADEON_FLUSH_ASYNC, NULL);
assert((num_dw + ctx->dma.cs->cdw) <= ctx->dma.cs->max_dw);
}
+
+   /* If GPUVM is not supported, the CS checker needs 2 entries
+* in the buffer list per packet, which has to be done manually.
+*/
+   if (ctx->screen->info.has_virtual_memory) {
+   if (dst)
+   radeon_add_to_buffer_list(ctx, &ctx->dma, dst,
+ RADEON_USAGE_WRITE,
+ RADEON_PRIO_SDMA_BUFFER);
+   if (src)
+   radeon_add_to_buffer_list(ctx, &ctx->dma, src,
+ RADEON_USAGE_READ,
+ RADEON_PRIO_SDMA_BUFFER);
+   }
 }
 
 /* This is required to prevent read-after-write hazards. */
diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c 
b/src/gallium/drivers/radeonsi/cik_sdma.c
index 29d9a00..17c3a83 100644
--- a/src/gallium/drivers/radeonsi/cik_sdma.c
+++ b/src/gallium/drivers/radeonsi/cik_sdma.c
@@ -49,11 +49,6 @@ static void cik_sdma_do_copy_buffer(struct si_context *ctx,
ncopy = DIV_ROUND_UP(size, CIK_SDMA_COPY_MAX_SIZE);
r600_need_dma_space(&ctx->b, ncopy * 7, rdst, rsrc);
 
-   radeon_add_to_buffer_list(&ctx->b, &ctx->b.dma, rsrc, RADEON_USAGE_READ,
- RADEON_PRIO_SDMA_BUFFER);
-   radeon_add_to_buffer_list(&ctx->b, &ctx->b.dma, rdst, 
RADEON_USAGE_WRITE,
- RADEON_PRIO_SDMA_BUFFER);
-
for (i = 0; i < ncopy; i++) {
csize = MIN2(size, CIK_SDMA_COPY_MAX_SIZE);
cs->buf[cs->cdw++] = CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY,
@@ -213,12 +208,6 @@ static bool cik_sdma_copy_texture(struct si_context *sctx,
struct radeon_winsys_cs *cs = sctx->b.dma.cs;
 
r600_need_dma_space(&sctx->b, 13, &rdst->resource, 
&rsrc->resource);
-   radeon_add_to_buffer_list(&sctx->b, &sctx->b.dma, 
&rsrc->resource,
- RADEON_USAGE_READ,
- RADEON_PRIO_SDMA_TEXTURE);
-   radeon_add_to_buffer_list(&sctx->b, &sctx->b.dma, 
&rdst->resource,
- RADEON_USAGE_WRITE,
- RADEON_PRIO_SDMA_TEXTURE);
 
radeon_emit(cs, CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY,

CIK_SDMA_COPY_SUB_OPCODE_LINEAR_SUB_WINDOW, 0) |
@@ -381,12 +370,6 @@ static bool cik_sdma_copy_texture(struct si_context *sctx,
struct radeon_winsys_cs *cs = sctx->b.dma.cs;
 
r600_need_dma_space(&sctx->b, 14, &rdst->resource, 
&rsrc->resource);
-   radeon_add_to_buffer_list(&sctx->b, &sctx->b.dma, 
&rsrc->resource,
- RADEON_USAGE_READ,
- RADEON_PRIO_SDMA_TEXTURE);
-   radeon_add_to_buffer_list(&sctx->b, &sctx->b.dma, 
&rdst->resource,
- RADEON_USAGE_WRITE,
- RADEON_PRIO_SDMA_TEXTURE);
 
radeon_emit(cs, CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY,

CIK_SDMA_COPY_SUB_OPCODE_TILED_SUB_WINDOW, 0) |
@@ -483,12 +466,6 @@ static bool cik_sdma_copy_texture(struct si_context *sctx,
struct radeon_winsys_cs *cs = sctx->b.dma.cs;
 
r600_need_dma_space(&sctx->b, 15, &rdst->resource, 
&rsrc->resource);
-   radeon_add_to_buffer_list(&sctx->b, &sctx->b.dma, 
&rsrc->resource,
- RADEON_USAGE_READ,
- RADEON_PRIO_SDMA_TEXTURE);
-   radeon_add_to_buffer_list(&sctx->b, &sctx->b.dma, 
&rdst->resource,
- RADEON_USAGE_WRITE,
- RADEON_PRIO_SDMA_TEXTURE);
 
radeon_emit(cs, CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY,

[Mesa-dev] [PATCH 14/14] gallium/radeon: don't flush the GFX IB if DMA doesn't depend on it

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeon/r600_pipe_common.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 3eb12f3..4845587 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -154,8 +154,14 @@ void r600_need_dma_space(struct r600_common_context *ctx, 
unsigned num_dw,
gtt += src->buf->size;
}
 
-   /* Flush the GFX IB if it's not empty. */
-   if (ctx->gfx.cs->cdw > ctx->initial_gfx_cs_size)
+   /* Flush the GFX IB if DMA depends on it. */
+   if (ctx->gfx.cs->cdw > ctx->initial_gfx_cs_size &&
+   ((dst &&
+ ctx->ws->cs_is_buffer_referenced(ctx->gfx.cs, dst->buf,
+  RADEON_USAGE_READWRITE)) ||
+(src &&
+ ctx->ws->cs_is_buffer_referenced(ctx->gfx.cs, src->buf,
+  RADEON_USAGE_WRITE
ctx->gfx.flush(ctx, RADEON_FLUSH_ASYNC, NULL);
 
/* Flush if there's not enough space, or if the memory usage per IB
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 11/14] gallium/radeon: flush if DMA IB memory usage is too high

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

This prevents IB rejections due to insane memory usage from
many concecutive texture uploads.
---
 src/gallium/drivers/r600/evergreen_hw_context.c |  2 +-
 src/gallium/drivers/r600/evergreen_state.c  |  2 +-
 src/gallium/drivers/r600/r600_hw_context.c  |  2 +-
 src/gallium/drivers/r600/r600_state.c   |  2 +-
 src/gallium/drivers/radeon/r600_pipe_common.c   | 27 +
 src/gallium/drivers/radeon/r600_pipe_common.h   |  3 ++-
 src/gallium/drivers/radeonsi/cik_sdma.c |  8 
 src/gallium/drivers/radeonsi/si_dma.c   |  4 ++--
 8 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
b/src/gallium/drivers/r600/evergreen_hw_context.c
index c2dba8c..cd07319 100644
--- a/src/gallium/drivers/r600/evergreen_hw_context.c
+++ b/src/gallium/drivers/r600/evergreen_hw_context.c
@@ -60,7 +60,7 @@ void evergreen_dma_copy_buffer(struct r600_context *rctx,
}
ncopy = (size / EG_DMA_COPY_MAX_SIZE) + !!(size % EG_DMA_COPY_MAX_SIZE);
 
-   r600_need_dma_space(&rctx->b, ncopy * 5);
+   r600_need_dma_space(&rctx->b, ncopy * 5, rdst, rsrc);
for (i = 0; i < ncopy; i++) {
csize = size < EG_DMA_COPY_MAX_SIZE ? size : 
EG_DMA_COPY_MAX_SIZE;
/* emit reloc before writing cs so that cs is always in 
consistent state */
diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index acf60c6..62152c0 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -3442,7 +3442,7 @@ static void evergreen_dma_copy_tile(struct r600_context 
*rctx,
 
size = (copy_height * pitch) / 4;
ncopy = (size / EG_DMA_COPY_MAX_SIZE) + !!(size % EG_DMA_COPY_MAX_SIZE);
-   r600_need_dma_space(&rctx->b, ncopy * 9);
+   r600_need_dma_space(&rctx->b, ncopy * 9, &rdst->resource, 
&rsrc->resource);
 
for (i = 0; i < ncopy; i++) {
cheight = copy_height;
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index fa1028b..857da7f 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -467,7 +467,7 @@ void r600_dma_copy_buffer(struct r600_context *rctx,
size >>= 2; /* convert to dwords */
ncopy = (size / R600_DMA_COPY_MAX_SIZE_DW) + !!(size % 
R600_DMA_COPY_MAX_SIZE_DW);
 
-   r600_need_dma_space(&rctx->b, ncopy * 5);
+   r600_need_dma_space(&rctx->b, ncopy * 5, rdst, rsrc);
for (i = 0; i < ncopy; i++) {
csize = size < R600_DMA_COPY_MAX_SIZE_DW ? size : 
R600_DMA_COPY_MAX_SIZE_DW;
/* emit reloc before writing cs so that cs is always in 
consistent state */
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 715c6f1..ab0cf5c 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -2918,7 +2918,7 @@ static boolean r600_dma_copy_tile(struct r600_context 
*rctx,
 */
cheight = ((R600_DMA_COPY_MAX_SIZE_DW * 4) / pitch) & 0xfff8;
ncopy = (copy_height / cheight) + !!(copy_height % cheight);
-   r600_need_dma_space(&rctx->b, ncopy * 7);
+   r600_need_dma_space(&rctx->b, ncopy * 7, &rdst->resource, 
&rsrc->resource);
 
for (i = 0; i < ncopy; i++) {
cheight = cheight > copy_height ? copy_height : cheight;
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 02e17ba..eac7812 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -136,14 +136,33 @@ void r600_draw_rectangle(struct blitter_context *blitter,
pipe_resource_reference(&buf, NULL);
 }
 
-void r600_need_dma_space(struct r600_common_context *ctx, unsigned num_dw)
+void r600_need_dma_space(struct r600_common_context *ctx, unsigned num_dw,
+ struct r600_resource *dst, struct r600_resource *src)
 {
+   uint64_t vram = 0, gtt = 0;
+
+   if (dst) {
+   if (dst->domains & RADEON_DOMAIN_VRAM)
+   vram += dst->buf->size;
+   else if (dst->domains & RADEON_DOMAIN_GTT)
+   gtt += dst->buf->size;
+   }
+   if (src) {
+   if (src->domains & RADEON_DOMAIN_VRAM)
+   vram += src->buf->size;
+   else if (src->domains & RADEON_DOMAIN_GTT)
+   gtt += src->buf->size;
+   }
+
/* Flush the GFX IB if it's not empty. */
if (ctx->gfx.cs->cdw > ctx->initial_gfx_cs_size)
ctx->gfx.flush(ctx, RADEON_FLUSH_ASYNC, NULL);
 
-   /* Flush if there's not enough space. */
-   if ((num_dw + ctx->dma.cs->cdw) > ctx->dma.cs->max_dw) {
+   /* Flush if there's not enough space, or if the

[Mesa-dev] [PATCH 12/14] gallium/radeon: add a heuristic for better (S)DMA performance

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeon/r600_pipe_common.c | 16 
 src/gallium/drivers/radeon/radeon_winsys.h|  2 ++
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c |  8 
 src/gallium/winsys/radeon/drm/radeon_drm_cs.c |  8 
 4 files changed, 34 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index eac7812..c0579da 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -176,6 +176,22 @@ void r600_dma_emit_wait_idle(struct r600_common_context 
*rctx)
/* done at the end of DMA calls, so increment this. */
rctx->num_dma_calls++;
 
+   /* IBs using too little memory are limited by the IB submission 
overhead.
+* IBs using too much memory are limited by the kernel/TTM overhead.
+* Too long IBs create CPU-GPU pipeline bubbles and add latency.
+*
+* This heuristic makes sure that DMA requests are executed
+* very soon or immediately after the call is made and lowers memory
+* usage. It improves texture upload performance by keeping the DMA
+* engine busy while uploads are being submitted.
+*
+* 512 KB of copied data (1 MB / 2) is enough to submit an IB.
+*/
+   if (rctx->ws->cs_query_memory_usage(rctx->dma.cs) > 1024 * 1024) {
+   rctx->dma.flush(rctx, RADEON_FLUSH_ASYNC, NULL);
+   return;
+   }
+
r600_need_dma_space(rctx, 1, NULL, NULL);
 
if (cs->cdw == 0) /* empty queue */
diff --git a/src/gallium/drivers/radeon/radeon_winsys.h 
b/src/gallium/drivers/radeon/radeon_winsys.h
index 79c548c..2e1848e 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -682,6 +682,8 @@ struct radeon_winsys {
  */
 boolean (*cs_memory_below_limit)(struct radeon_winsys_cs *cs, uint64_t 
vram, uint64_t gtt);
 
+uint64_t (*cs_query_memory_usage)(struct radeon_winsys_cs *cs);
+
 /**
  * Return the buffer list.
  *
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 03e45a9..a6156fc 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -562,6 +562,13 @@ static boolean amdgpu_cs_memory_below_limit(struct 
radeon_winsys_cs *rcs, uint64
return gtt < ws->info.gart_size * 0.7;
 }
 
+static uint64_t amdgpu_cs_query_memory_usage(struct radeon_winsys_cs *rcs)
+{
+   struct amdgpu_cs *cs = amdgpu_cs(rcs);
+
+   return cs->used_vram + cs->used_gart;
+}
+
 static unsigned amdgpu_cs_get_buffer_list(struct radeon_winsys_cs *rcs,
   struct radeon_bo_list_item *list)
 {
@@ -827,6 +834,7 @@ void amdgpu_cs_init_functions(struct amdgpu_winsys *ws)
ws->base.cs_lookup_buffer = amdgpu_cs_lookup_buffer;
ws->base.cs_validate = amdgpu_cs_validate;
ws->base.cs_memory_below_limit = amdgpu_cs_memory_below_limit;
+   ws->base.cs_query_memory_usage = amdgpu_cs_query_memory_usage;
ws->base.cs_get_buffer_list = amdgpu_cs_get_buffer_list;
ws->base.cs_flush = amdgpu_cs_flush;
ws->base.cs_is_buffer_referenced = amdgpu_bo_is_referenced;
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
index 7a901a1..9ac5081 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
@@ -398,6 +398,13 @@ static boolean radeon_drm_cs_memory_below_limit(struct 
radeon_winsys_cs *rcs, ui
 return gtt < cs->ws->info.gart_size * 0.7;
 }
 
+static uint64_t radeon_drm_cs_query_memory_usage(struct radeon_winsys_cs *rcs)
+{
+   struct radeon_drm_cs *cs = radeon_drm_cs(rcs);
+
+   return cs->csc->used_vram + cs->csc->used_gart;
+}
+
 static unsigned radeon_drm_cs_get_buffer_list(struct radeon_winsys_cs *rcs,
   struct radeon_bo_list_item *list)
 {
@@ -671,6 +678,7 @@ void radeon_drm_cs_init_functions(struct radeon_drm_winsys 
*ws)
 ws->base.cs_lookup_buffer = radeon_drm_cs_lookup_buffer;
 ws->base.cs_validate = radeon_drm_cs_validate;
 ws->base.cs_memory_below_limit = radeon_drm_cs_memory_below_limit;
+ws->base.cs_query_memory_usage = radeon_drm_cs_query_memory_usage;
 ws->base.cs_get_buffer_list = radeon_drm_cs_get_buffer_list;
 ws->base.cs_flush = radeon_drm_cs_flush;
 ws->base.cs_is_buffer_referenced = radeon_bo_is_referenced;
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 09/14] gallium/radeon: fix (S)DMA read-after-write hazards

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/r600/evergreen_hw_context.c |  1 +
 src/gallium/drivers/r600/evergreen_state.c  |  1 +
 src/gallium/drivers/r600/r600_hw_context.c  |  1 +
 src/gallium/drivers/r600/r600_state.c   |  1 +
 src/gallium/drivers/radeon/r600_pipe_common.c   | 24 
 src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
 src/gallium/drivers/radeonsi/cik_sdma.c |  1 +
 src/gallium/drivers/radeonsi/si_dma.c   |  2 ++
 8 files changed, 32 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
b/src/gallium/drivers/r600/evergreen_hw_context.c
index a0f4680..c2dba8c 100644
--- a/src/gallium/drivers/r600/evergreen_hw_context.c
+++ b/src/gallium/drivers/r600/evergreen_hw_context.c
@@ -77,6 +77,7 @@ void evergreen_dma_copy_buffer(struct r600_context *rctx,
src_offset += csize << shift;
size -= csize;
}
+   r600_dma_emit_wait_idle(&rctx->b);
 }
 
 /* The max number of bytes to copy per packet. */
diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 672ddd4..acf60c6 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -3470,6 +3470,7 @@ static void evergreen_dma_copy_tile(struct r600_context 
*rctx,
addr += cheight * pitch;
y += cheight;
}
+   r600_dma_emit_wait_idle(&rctx->b);
 }
 
 static void evergreen_dma_copy(struct pipe_context *ctx,
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 2bc6d3f..fa1028b 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -484,4 +484,5 @@ void r600_dma_copy_buffer(struct r600_context *rctx,
src_offset += csize << 2;
size -= csize;
}
+   r600_dma_emit_wait_idle(&rctx->b);
 }
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 48e3663..715c6f1 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -2941,6 +2941,7 @@ static boolean r600_dma_copy_tile(struct r600_context 
*rctx,
addr += cheight * pitch;
y += cheight;
}
+   r600_dma_emit_wait_idle(&rctx->b);
return TRUE;
 }
 
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index e1b0b63..02e17ba 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -149,6 +149,30 @@ void r600_need_dma_space(struct r600_common_context *ctx, 
unsigned num_dw)
}
 }
 
+/* This is required to prevent read-after-write hazards. */
+void r600_dma_emit_wait_idle(struct r600_common_context *rctx)
+{
+   struct radeon_winsys_cs *cs = rctx->dma.cs;
+
+   /* done at the end of DMA calls, so increment this. */
+   rctx->num_dma_calls++;
+
+   r600_need_dma_space(rctx, 1);
+
+   if (cs->cdw == 0) /* empty queue */
+   return;
+
+   /* NOP waits for idle on Evergreen and later. */
+   if (rctx->chip_class >= CIK)
+   radeon_emit(cs, 0x); /* NOP */
+   else if (rctx->chip_class >= EVERGREEN)
+   radeon_emit(cs, 0xf000); /* NOP */
+   else {
+   /* TODO: R600-R700 should use the FENCE packet.
+* CS checker support is required. */
+   }
+}
+
 static void r600_memory_barrier(struct pipe_context *ctx, unsigned flags)
 {
 }
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index cc2d33e..a6afc43 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -598,6 +598,7 @@ struct pipe_resource *r600_resource_create_common(struct 
pipe_screen *screen,
  const struct pipe_resource 
*templ);
 const char *r600_get_llvm_processor_name(enum radeon_family family);
 void r600_need_dma_space(struct r600_common_context *ctx, unsigned num_dw);
+void r600_dma_emit_wait_idle(struct r600_common_context *rctx);
 
 /* r600_gpu_load.c */
 void r600_gpu_load_kill_thread(struct r600_common_screen *rscreen);
diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c 
b/src/gallium/drivers/radeonsi/cik_sdma.c
index 88a994e..5021578 100644
--- a/src/gallium/drivers/radeonsi/cik_sdma.c
+++ b/src/gallium/drivers/radeonsi/cik_sdma.c
@@ -87,6 +87,7 @@ static void cik_sdma_copy_buffer(struct si_context *ctx,
   dst_offset + size);
 
cik_sdma_do_copy_buffer(ctx, dst, src, dst_offset, src_offset, size);
+   r600_dma_emit_wait_idle(&ctx->b);
 }
 
 static void cik_sdma_copy(struct pipe_context *ctx,
diff --git a/src/gallium/drivers/radeonsi/si_dma.c 
b/src/gallium/drivers/radeonsi/si_dma.c
index 033eb

[Mesa-dev] [PATCH 01/14] gallium/radeon: rename r600_texture_disable_cmask -> discard_cmask

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

because it doesn't decompress
---
 src/gallium/drivers/radeon/r600_texture.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index 67b2a34..6df013a 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -267,7 +267,7 @@ static void r600_eliminate_fast_color_clear(struct 
r600_common_screen *rscreen,
pipe_mutex_unlock(rscreen->aux_context_lock);
 }
 
-static void r600_texture_disable_cmask(struct r600_common_screen *rscreen,
+static void r600_texture_discard_cmask(struct r600_common_screen *rscreen,
   struct r600_texture *rtex)
 {
if (!rtex->cmask.size)
@@ -351,7 +351,7 @@ static boolean r600_texture_get_handle(struct pipe_screen* 
screen,
/* Disable CMASK if flush_resource isn't going
 * to be called.
 */
-   r600_texture_disable_cmask(rscreen, rtex);
+   r600_texture_discard_cmask(rscreen, rtex);
update_metadata = true;
}
 
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 10/14] radeonsi: add new SDMA texture copy code

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

This implements:
- Linear-to-linear partial copies. (unaligned)
- Tiled-to-linear and linear-to-tiled partial copies.
  (unaligned except 1-2 Bpp)
- Tiled-to-tiled partial copies aligned to 8x8.
---
 src/gallium/drivers/radeonsi/cik_sdma.c | 436 
 1 file changed, 436 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c 
b/src/gallium/drivers/radeonsi/cik_sdma.c
index 5021578..ea6a122 100644
--- a/src/gallium/drivers/radeonsi/cik_sdma.c
+++ b/src/gallium/drivers/radeonsi/cik_sdma.c
@@ -90,6 +90,438 @@ static void cik_sdma_copy_buffer(struct si_context *ctx,
r600_dma_emit_wait_idle(&ctx->b);
 }
 
+static unsigned minify_as_blocks(unsigned width, unsigned level, unsigned 
blk_w)
+{
+   width = u_minify(width, level);
+   return DIV_ROUND_UP(width, blk_w);
+}
+
+static unsigned encode_tile_info(struct si_context *sctx,
+struct r600_texture *tex, unsigned level,
+bool set_bpp)
+{
+   struct radeon_info *info = &sctx->screen->b.info;
+   unsigned tile_index = tex->surface.tiling_index[level];
+   unsigned macro_tile_index = tex->surface.macro_tile_index;
+   unsigned tile_mode = info->si_tile_mode_array[tile_index];
+   unsigned macro_tile_mode = 
info->cik_macrotile_mode_array[macro_tile_index];
+
+   return (set_bpp ? util_logbase2(tex->surface.bpe) : 0) |
+   (G_009910_ARRAY_MODE(tile_mode) << 3) |
+   (G_009910_MICRO_TILE_MODE_NEW(tile_mode) << 8) |
+   /* Non-depth modes don't have TILE_SPLIT set. */
+   ((util_logbase2(tex->surface.tile_split >> 6)) << 11) |
+   (G_009990_BANK_WIDTH(macro_tile_mode) << 15) |
+   (G_009990_BANK_HEIGHT(macro_tile_mode) << 18) |
+   (G_009990_NUM_BANKS(macro_tile_mode) << 21) |
+   (G_009990_MACRO_TILE_ASPECT(macro_tile_mode) << 24) |
+   (G_009910_PIPE_CONFIG(tile_mode) << 26);
+}
+
+static bool cik_sdma_copy_texture(struct si_context *sctx,
+ struct pipe_resource *dst,
+ unsigned dst_level,
+ unsigned dstx, unsigned dsty, unsigned dstz,
+ struct pipe_resource *src,
+ unsigned src_level,
+ const struct pipe_box *src_box)
+{
+   struct radeon_info *info = &sctx->screen->b.info;
+   struct r600_texture *rsrc = (struct r600_texture*)src;
+   struct r600_texture *rdst = (struct r600_texture*)dst;
+   unsigned bpp = rdst->surface.bpe;
+   uint64_t dst_address = rdst->resource.gpu_address +
+  rdst->surface.level[dst_level].offset;
+   uint64_t src_address = rsrc->resource.gpu_address +
+  rsrc->surface.level[src_level].offset;
+   unsigned dst_mode = rdst->surface.level[dst_level].mode;
+   unsigned src_mode = rsrc->surface.level[src_level].mode;
+   unsigned dst_tile_index = rdst->surface.tiling_index[dst_level];
+   unsigned src_tile_index = rsrc->surface.tiling_index[src_level];
+   unsigned dst_tile_mode = info->si_tile_mode_array[dst_tile_index];
+   unsigned src_tile_mode = info->si_tile_mode_array[src_tile_index];
+   unsigned dst_micro_mode = G_009910_MICRO_TILE_MODE_NEW(dst_tile_mode);
+   unsigned src_micro_mode = G_009910_MICRO_TILE_MODE_NEW(src_tile_mode);
+   unsigned dst_pitch = rdst->surface.level[dst_level].pitch_bytes / bpp;
+   unsigned src_pitch = rsrc->surface.level[src_level].pitch_bytes / bpp;
+   uint64_t dst_slice_pitch = rdst->surface.level[dst_level].slice_size / 
bpp;
+   uint64_t src_slice_pitch = rsrc->surface.level[src_level].slice_size / 
bpp;
+   unsigned dst_width = minify_as_blocks(rdst->resource.b.b.width0,
+ dst_level, rdst->surface.blk_w);
+   unsigned src_width = minify_as_blocks(rsrc->resource.b.b.width0,
+ src_level, rsrc->surface.blk_w);
+   unsigned dst_height = minify_as_blocks(rdst->resource.b.b.height0,
+  dst_level, rdst->surface.blk_h);
+   unsigned src_height = minify_as_blocks(rsrc->resource.b.b.height0,
+  src_level, rsrc->surface.blk_h);
+   unsigned srcx = src_box->x / rsrc->surface.blk_w;
+   unsigned srcy = src_box->y / rsrc->surface.blk_h;
+   unsigned srcz = src_box->z;
+   unsigned copy_width = DIV_ROUND_UP(src_box->width, rsrc->surface.blk_w);
+   unsigned copy_height = DIV_ROUND_UP(src_box->height, 
rsrc->surface.blk_h);
+   unsigned copy_depth = src_box->depth;
+
+   assert(src_level <= src->last_level);
+   assert(dst_level <= dst->last_level);
+   assert(rdst->surface.level[dst_level].offset +
+

[Mesa-dev] [PATCH 05/14] gallium/radeon: implement randomized SDMA texture copy testing (v2)

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

v2: - adjustments for exercising all important SDMA code paths
- decrease the probability of getting huge sizes (faster testing)
- increase the probability of getting power-of-two dimensions
- change the memory cap to 128MB (faster testing)
- better detect which engine has been used
---
 src/gallium/drivers/r600/r600_pipe.c  |   3 +
 src/gallium/drivers/radeon/Makefile.sources   |   1 +
 src/gallium/drivers/radeon/r600_pipe_common.c |   2 +
 src/gallium/drivers/radeon/r600_pipe_common.h |   6 +
 src/gallium/drivers/radeon/r600_test_dma.c| 404 ++
 src/gallium/drivers/radeonsi/si_pipe.c|   3 +
 6 files changed, 419 insertions(+)
 create mode 100644 src/gallium/drivers/radeon/r600_test_dma.c

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 52f0986..98c5a89 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -707,5 +707,8 @@ struct pipe_screen *r600_screen_create(struct radeon_winsys 
*ws)
}
 #endif
 
+   if (rscreen->b.debug_flags & DBG_TEST_DMA)
+   r600_test_dma(&rscreen->b);
+
return &rscreen->b.b;
 }
diff --git a/src/gallium/drivers/radeon/Makefile.sources 
b/src/gallium/drivers/radeon/Makefile.sources
index f993d75..6fbed81 100644
--- a/src/gallium/drivers/radeon/Makefile.sources
+++ b/src/gallium/drivers/radeon/Makefile.sources
@@ -10,6 +10,7 @@ C_SOURCES := \
r600_query.c \
r600_query.h \
r600_streamout.c \
+   r600_test_dma.c \
r600_texture.c \
r600_viewport.c \
radeon_uvd.c \
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index dae2369..e1b0b63 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -383,6 +383,8 @@ static const struct debug_named_value 
common_debug_options[] = {
{ "noasm", DBG_NO_ASM, "Don't print disassembled shaders"},
{ "preoptir", DBG_PREOPT_IR, "Print the LLVM IR before initial 
optimizations" },
 
+   { "testdma", DBG_TEST_DMA, "Invoke SDMA tests and exit." },
+
/* features */
{ "nodma", DBG_NO_ASYNC_DMA, "Disable asynchronous DMA" },
{ "nohyperz", DBG_NO_HYPERZ, "Disable Hyper-Z" },
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index 803c89c..cc2d33e 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -77,6 +77,8 @@
 #define DBG_NO_TGSI(1 << 13)
 #define DBG_NO_ASM (1 << 14)
 #define DBG_PREOPT_IR  (1 << 15)
+/* gaps */
+#define DBG_TEST_DMA   (1 << 20)
 /* Bits 21-31 are reserved for the r600g driver. */
 /* features */
 #define DBG_NO_ASYNC_DMA   (1llu << 32)
@@ -484,6 +486,7 @@ struct r600_common_context {
unsignedmax_db; /* for OQ */
/* Misc stats. */
unsignednum_draw_calls;
+   unsignednum_dma_calls;
 
/* Render condition. */
struct r600_atomrender_cond_atom;
@@ -622,6 +625,9 @@ void r600_update_prims_generated_query_state(struct 
r600_common_context *rctx,
 unsigned type, int diff);
 void r600_streamout_init(struct r600_common_context *rctx);
 
+/* r600_test_dma.c */
+void r600_test_dma(struct r600_common_screen *rscreen);
+
 /* r600_texture.c */
 bool r600_prepare_for_dma_blit(struct r600_common_context *rctx,
struct r600_texture *rdst,
diff --git a/src/gallium/drivers/radeon/r600_test_dma.c 
b/src/gallium/drivers/radeon/r600_test_dma.c
new file mode 100644
index 000..c203b4d
--- /dev/null
+++ b/src/gallium/drivers/radeon/r600_test_dma.c
@@ -0,0 +1,404 @@
+/*
+ * Copyright 2016 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRAC

[Mesa-dev] [PATCH 03/14] gallium/radeon: use a common function for DMA blit preparation

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

this is more robust and probably fixes some bugs already
---
 src/gallium/drivers/r600/evergreen_state.c| 10 ++---
 src/gallium/drivers/r600/r600_state.c |  5 ++-
 src/gallium/drivers/radeon/r600_pipe_common.h |  7 
 src/gallium/drivers/radeon/r600_texture.c | 55 +++
 src/gallium/drivers/radeonsi/cik_sdma.c   | 14 +--
 src/gallium/drivers/radeonsi/si_dma.c | 13 ++-
 6 files changed, 73 insertions(+), 31 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 5224e42..672ddd4 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -3497,14 +3497,10 @@ static void evergreen_dma_copy(struct pipe_context *ctx,
return;
}
 
-   if (src->format != dst->format || src_box->depth > 1 ||
-   (rdst->dirty_level_mask | rdst->stencil_dirty_level_mask) & (1 << 
dst_level)) {
+   if (src_box->depth > 1 ||
+   !r600_prepare_for_dma_blit(&rctx->b, rdst, dst_level, dstx, dsty,
+   dstz, rsrc, src_level, src_box))
goto fallback;
-   }
-
-   if (rsrc->dirty_level_mask & (1 << src_level)) {
-   ctx->flush_resource(ctx, src);
-   }
 
src_x = util_format_get_nblocksx(src->format, src_box->x);
dst_x = util_format_get_nblocksx(src->format, dst_x);
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 2291096..48e3663 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -2972,9 +2972,10 @@ static void r600_dma_copy(struct pipe_context *ctx,
return;
}
 
-   if (src->format != dst->format || src_box->depth > 1) {
+   if (src_box->depth > 1 ||
+   !r600_prepare_for_dma_blit(&rctx->b, rdst, dst_level, dstx, dsty,
+   dstz, rsrc, src_level, src_box))
goto fallback;
-   }
 
src_x = util_format_get_nblocksx(src->format, src_box->x);
dst_x = util_format_get_nblocksx(src->format, dst_x);
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index 74eefbb..803c89c 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -623,6 +623,13 @@ void r600_update_prims_generated_query_state(struct 
r600_common_context *rctx,
 void r600_streamout_init(struct r600_common_context *rctx);
 
 /* r600_texture.c */
+bool r600_prepare_for_dma_blit(struct r600_common_context *rctx,
+   struct r600_texture *rdst,
+   unsigned dst_level, unsigned dstx,
+   unsigned dsty, unsigned dstz,
+   struct r600_texture *rsrc,
+   unsigned src_level,
+   const struct pipe_box *src_box);
 void r600_texture_get_fmask_info(struct r600_common_screen *rscreen,
 struct r600_texture *rtex,
 unsigned nr_samples,
diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index ac11380..0dba045 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -32,6 +32,61 @@
 #include 
 #include 
 
+bool r600_prepare_for_dma_blit(struct r600_common_context *rctx,
+  struct r600_texture *rdst,
+  unsigned dst_level, unsigned dstx,
+  unsigned dsty, unsigned dstz,
+  struct r600_texture *rsrc,
+  unsigned src_level,
+  const struct pipe_box *src_box)
+{
+   if (!rctx->dma.cs)
+   return false;
+
+   if (util_format_get_blocksizebits(rdst->resource.b.b.format) !=
+   util_format_get_blocksizebits(rsrc->resource.b.b.format))
+   return false;
+
+   /* MSAA: Blits don't exist in the real world. */
+   if (rsrc->resource.b.b.nr_samples > 1 ||
+   rdst->resource.b.b.nr_samples > 1)
+   return false;
+
+   /* Depth-stencil surfaces:
+*   When dst is linear, the DB->CB copy preserves HTILE.
+*   When dst is tiled, the 3D path must be used to update HTILE.
+*/
+   if (rsrc->is_depth || rdst->is_depth)
+   return false;
+
+   /* DCC as:
+*   src: Use the 3D path. DCC decompression is expensive.
+*   dst: If overwriting the whole texture, disable DCC and use SDMA.
+*Otherwise, use the 3D path.
+* TODO: handle the case when the dst box covers the whole texture
+*/
+   if (rsrc->dcc_offset || rdst->dcc_offset)
+

[Mesa-dev] [PATCH 02/14] gallium/radeon: split out code for discarding DCC

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeon/r600_texture.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index 6df013a..ac11380 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -292,6 +292,17 @@ static void r600_texture_discard_cmask(struct 
r600_common_screen *rscreen,
p_atomic_inc(&rscreen->compressed_colortex_counter);
 }
 
+static void r600_texture_discard_dcc(struct r600_common_screen *rscreen,
+struct r600_texture *rtex)
+{
+   /* Disable DCC. */
+   rtex->dcc_offset = 0;
+   rtex->cb_color_info &= ~VI_S_028C70_DCC_ENABLE(1);
+
+   /* Notify all contexts about the change. */
+   r600_dirty_all_framebuffer_states(rscreen);
+}
+
 void r600_texture_disable_dcc(struct r600_common_screen *rscreen,
  struct r600_texture *rtex)
 {
@@ -307,12 +318,7 @@ void r600_texture_disable_dcc(struct r600_common_screen 
*rscreen,
rctx->b.flush(&rctx->b, NULL, 0);
pipe_mutex_unlock(rscreen->aux_context_lock);
 
-   /* Disable DCC. */
-   rtex->dcc_offset = 0;
-   rtex->cb_color_info &= ~VI_S_028C70_DCC_ENABLE(1);
-
-   /* Notify all contexts about the change. */
-   r600_dirty_all_framebuffer_states(rscreen);
+   r600_texture_discard_dcc(rscreen, rtex);
 }
 
 static boolean r600_texture_get_handle(struct pipe_screen* screen,
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 04/14] gallium/radeon: discard CMASK or DCC if overwriting a whole texture by DMA

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeon/r600_texture.c | 46 ++-
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index 0dba045..d6d95af 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -32,6 +32,23 @@
 #include 
 #include 
 
+static void r600_texture_discard_dcc(struct r600_common_screen *rscreen,
+struct r600_texture *rtex);
+static void r600_texture_discard_cmask(struct r600_common_screen *rscreen,
+  struct r600_texture *rtex);
+
+
+static bool range_covers_whole_texture(struct pipe_resource *tex,
+  unsigned level, unsigned x, unsigned y,
+  unsigned z, unsigned width,
+  unsigned height, unsigned depth)
+{
+   return x == 0 && y == 0 && z == 0 &&
+  width == u_minify(tex->width0, level) &&
+  height == u_minify(tex->height0, level) &&
+  depth == util_max_layer(tex, level) + 1;
+}
+
 bool r600_prepare_for_dma_blit(struct r600_common_context *rctx,
   struct r600_texture *rdst,
   unsigned dst_level, unsigned dstx,
@@ -61,21 +78,36 @@ bool r600_prepare_for_dma_blit(struct r600_common_context 
*rctx,
 
/* DCC as:
 *   src: Use the 3D path. DCC decompression is expensive.
-*   dst: If overwriting the whole texture, disable DCC and use SDMA.
+*   dst: If overwriting the whole texture, discard DCC and use SDMA.
 *Otherwise, use the 3D path.
-* TODO: handle the case when the dst box covers the whole texture
 */
-   if (rsrc->dcc_offset || rdst->dcc_offset)
+   if (rsrc->dcc_offset)
return false;
 
+   if (rdst->dcc_offset) {
+   /* We can't discard DCC if the texture has been exported. */
+   if (!rdst->resource.is_shared &&
+   range_covers_whole_texture(&rdst->resource.b.b, dst_level,
+  dstx, dsty, dstz, src_box->width,
+  src_box->height, src_box->depth))
+   r600_texture_discard_dcc(rctx->screen, rdst);
+   else
+   return false;
+   }
+
/* CMASK as:
 *   src: Both texture and SDMA paths need decompression. Use SDMA.
-*   dst: If overwriting the whole texture, deallocate CMASK and use
+*   dst: If overwriting the whole texture, discard CMASK and use
 *SDMA. Otherwise, use the 3D path.
-* TODO: handle the case when the dst box covers the whole texture
 */
-   if (rdst->cmask.size && rdst->dirty_level_mask & (1 << dst_level))
-   return false;
+   if (rdst->cmask.size && rdst->dirty_level_mask & (1 << dst_level)) {
+   if (range_covers_whole_texture(&rdst->resource.b.b, dst_level,
+  dstx, dsty, dstz, src_box->width,
+  src_box->height, src_box->depth))
+   r600_texture_discard_cmask(rctx->screen, rdst);
+   else
+   return false;
+   }
 
/* All requirements are met. Prepare textures for SDMA. */
if (rsrc->cmask.size && rsrc->dirty_level_mask & (1 << src_level))
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 06/14] radeonsi: only expose _init_dma_functions from (S)DMA files

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

just normalizing the interfaces
---
 src/gallium/drivers/radeonsi/cik_sdma.c | 19 ---
 src/gallium/drivers/radeonsi/si_dma.c   | 19 ---
 src/gallium/drivers/radeonsi/si_pipe.c  |  5 +
 src/gallium/drivers/radeonsi/si_pipe.h  | 16 ++--
 src/gallium/drivers/radeonsi/si_state.c |  6 --
 5 files changed, 31 insertions(+), 34 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c 
b/src/gallium/drivers/radeonsi/cik_sdma.c
index 7f7db91..a8caf04 100644
--- a/src/gallium/drivers/radeonsi/cik_sdma.c
+++ b/src/gallium/drivers/radeonsi/cik_sdma.c
@@ -193,13 +193,13 @@ static void cik_sdma_copy_tile(struct si_context *ctx,
}
 }
 
-void cik_sdma_copy(struct pipe_context *ctx,
-  struct pipe_resource *dst,
-  unsigned dst_level,
-  unsigned dstx, unsigned dsty, unsigned dstz,
-  struct pipe_resource *src,
-  unsigned src_level,
-  const struct pipe_box *src_box)
+static void cik_sdma_copy(struct pipe_context *ctx,
+ struct pipe_resource *dst,
+ unsigned dst_level,
+ unsigned dstx, unsigned dsty, unsigned dstz,
+ struct pipe_resource *src,
+ unsigned src_level,
+ const struct pipe_box *src_box)
 {
struct si_context *sctx = (struct si_context *)ctx;
struct r600_texture *rsrc = (struct r600_texture*)src;
@@ -324,3 +324,8 @@ fallback:
si_resource_copy_region(ctx, dst, dst_level, dstx, dsty, dstz,
src, src_level, src_box);
 }
+
+void cik_init_sdma_functions(struct si_context *sctx)
+{
+   sctx->b.dma_copy = cik_sdma_copy;
+}
diff --git a/src/gallium/drivers/radeonsi/si_dma.c 
b/src/gallium/drivers/radeonsi/si_dma.c
index 84961d5..033eb7b 100644
--- a/src/gallium/drivers/radeonsi/si_dma.c
+++ b/src/gallium/drivers/radeonsi/si_dma.c
@@ -190,13 +190,13 @@ static void si_dma_copy_tile(struct si_context *ctx,
}
 }
 
-void si_dma_copy(struct pipe_context *ctx,
-struct pipe_resource *dst,
-unsigned dst_level,
-unsigned dstx, unsigned dsty, unsigned dstz,
-struct pipe_resource *src,
-unsigned src_level,
-const struct pipe_box *src_box)
+static void si_dma_copy(struct pipe_context *ctx,
+   struct pipe_resource *dst,
+   unsigned dst_level,
+   unsigned dstx, unsigned dsty, unsigned dstz,
+   struct pipe_resource *src,
+   unsigned src_level,
+   const struct pipe_box *src_box)
 {
struct si_context *sctx = (struct si_context *)ctx;
struct r600_texture *rsrc = (struct r600_texture*)src;
@@ -293,3 +293,8 @@ fallback:
si_resource_copy_region(ctx, dst, dst_level, dstx, dsty, dstz,
src, src_level, src_box);
 }
+
+void si_init_dma_functions(struct si_context *sctx)
+{
+   sctx->b.dma_copy = si_dma_copy;
+}
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 1a69f43..f2692dc 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -192,6 +192,11 @@ static struct pipe_context *si_create_context(struct 
pipe_screen *screen,
si_init_state_functions(sctx);
si_init_shader_functions(sctx);
 
+   if (sctx->b.chip_class >= CIK)
+   cik_init_sdma_functions(sctx);
+   else
+   si_init_dma_functions(sctx);
+
if (sscreen->b.debug_flags & DBG_FORCE_DMA)
sctx->b.b.resource_copy_region = sctx->b.dma_copy;
 
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index d31e9a9..33d3d25 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -325,13 +325,7 @@ struct si_context {
 };
 
 /* cik_sdma.c */
-void cik_sdma_copy(struct pipe_context *ctx,
-  struct pipe_resource *dst,
-  unsigned dst_level,
-  unsigned dstx, unsigned dsty, unsigned dstz,
-  struct pipe_resource *src,
-  unsigned src_level,
-  const struct pipe_box *src_box);
+void cik_init_sdma_functions(struct si_context *sctx);
 
 /* si_blit.c */
 void si_init_blit_functions(struct si_context *sctx);
@@ -357,13 +351,7 @@ void si_check_vm_faults(struct si_context *sctx);
 bool si_replace_shader(unsigned num, struct radeon_shader_binary *binary);
 
 /* si_dma.c */
-void si_dma_copy(struct pipe_context *ctx,
-struct pipe_resource *dst,
-unsigned dst_level,
-unsigned dstx, unsigned dsty, unsigned dstz,
-struct pipe_resource *src,
-

[Mesa-dev] [PATCH 07/14] radeonsi: remove SDMA texture copy code

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

Most of this has never worked according to the new test.

The new code will be radically different.
---
 src/gallium/drivers/radeonsi/cik_sdma.c | 217 +---
 1 file changed, 2 insertions(+), 215 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c 
b/src/gallium/drivers/radeonsi/cik_sdma.c
index a8caf04..ec3446a 100644
--- a/src/gallium/drivers/radeonsi/cik_sdma.c
+++ b/src/gallium/drivers/radeonsi/cik_sdma.c
@@ -89,110 +89,6 @@ static void cik_sdma_copy_buffer(struct si_context *ctx,
cik_sdma_do_copy_buffer(ctx, dst, src, dst_offset, src_offset, size);
 }
 
-static void cik_sdma_copy_tile(struct si_context *ctx,
-  struct pipe_resource *dst,
-  unsigned dst_level,
-  struct pipe_resource *src,
-  unsigned src_level,
-  unsigned y,
-  unsigned copy_height,
-  unsigned y_align,
-  unsigned pitch,
-  unsigned bpe)
-{
-   struct radeon_winsys_cs *cs = ctx->b.dma.cs;
-   struct r600_texture *rsrc = (struct r600_texture*)src;
-   struct r600_texture *rdst = (struct r600_texture*)dst;
-   unsigned dst_mode = rdst->surface.level[dst_level].mode;
-   unsigned src_mode = rsrc->surface.level[src_level].mode;
-   bool detile = dst_mode == RADEON_SURF_MODE_LINEAR_ALIGNED;
-   struct r600_texture *rlinear = detile ? rdst : rsrc;
-   struct r600_texture *rtiled = detile ? rsrc : rdst;
-   unsigned linear_lvl = detile ? dst_level : src_level;
-   unsigned tiled_lvl = detile ? src_level : dst_level;
-   struct radeon_info *info = &ctx->screen->b.info;
-   unsigned index = rtiled->surface.tiling_index[tiled_lvl];
-   unsigned macro_index = rtiled->surface.macro_tile_index;
-   unsigned tile_mode = info->si_tile_mode_array[index];
-   unsigned macro_mode = info->cik_macrotile_mode_array[macro_index];
-   unsigned array_mode, lbpe, pitch_tile_max, slice_tile_max, size;
-   unsigned ncopy, height, cheight, i;
-   unsigned sub_op, bank_h, bank_w, mt_aspect, nbanks, tile_split, mt;
-   uint64_t base, addr;
-   unsigned pipe_config;
-
-   assert(dst_mode != src_mode);
-   assert(src_mode == RADEON_SURF_MODE_LINEAR_ALIGNED || dst_mode == 
RADEON_SURF_MODE_LINEAR_ALIGNED);
-
-   sub_op = CIK_SDMA_COPY_SUB_OPCODE_TILED;
-   lbpe = util_logbase2(bpe);
-   pitch_tile_max = ((pitch / bpe) / 8) - 1;
-
-   assert(!util_format_is_depth_and_stencil(rtiled->resource.b.b.format));
-
-   array_mode = G_009910_ARRAY_MODE(tile_mode);
-   slice_tile_max = (rtiled->surface.level[tiled_lvl].nblk_x *
- rtiled->surface.level[tiled_lvl].nblk_y) / (8*8) - 1;
-   height = rlinear->surface.level[linear_lvl].nblk_y;
-   base = rtiled->surface.level[tiled_lvl].offset;
-   addr = rlinear->surface.level[linear_lvl].offset;
-   bank_h = G_009990_BANK_HEIGHT(macro_mode);
-   bank_w = G_009990_BANK_WIDTH(macro_mode);
-   mt_aspect = G_009990_MACRO_TILE_ASPECT(macro_mode);
-   /* Non-depth modes don't have TILE_SPLIT set. */
-   tile_split = util_logbase2(rtiled->surface.tile_split >> 6);
-   nbanks = G_009990_NUM_BANKS(macro_mode);
-   base += rtiled->resource.gpu_address;
-   addr += rlinear->resource.gpu_address;
-
-   pipe_config = G_009910_PIPE_CONFIG(tile_mode);
-   mt = G_009910_MICRO_TILE_MODE_NEW(tile_mode);
-
-   size = (copy_height * pitch) / 4;
-   cheight = copy_height;
-   if (((cheight * pitch) / 4) > CIK_SDMA_COPY_MAX_SIZE) {
-   cheight = (CIK_SDMA_COPY_MAX_SIZE * 4) / pitch;
-   cheight &= ~(y_align - 1);
-   }
-   ncopy = (copy_height + cheight - 1) / cheight;
-   r600_need_dma_space(&ctx->b, ncopy * 12);
-
-   radeon_add_to_buffer_list(&ctx->b, &ctx->b.dma, &rsrc->resource,
- RADEON_USAGE_READ, RADEON_PRIO_SDMA_TEXTURE);
-   radeon_add_to_buffer_list(&ctx->b, &ctx->b.dma, &rdst->resource,
- RADEON_USAGE_WRITE, RADEON_PRIO_SDMA_TEXTURE);
-
-   copy_height = size * 4 / pitch;
-   for (i = 0; i < ncopy; i++) {
-   cheight = copy_height;
-   if (((cheight * pitch) / 4) > CIK_SDMA_COPY_MAX_SIZE) {
-   cheight = (CIK_SDMA_COPY_MAX_SIZE * 4) / pitch;
-   cheight &= ~(y_align - 1);
-   }
-   size = (cheight * pitch) / 4;
-
-   cs->buf[cs->cdw++] = CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY,
-sub_op, detile << 15);
-   cs->buf[cs->cdw++] = base;
-   cs->buf[cs->cdw++] = base >> 32;
-   cs->buf[cs->cdw++] = ((height - 1) << 16) | pitch_tile_max;
-

[Mesa-dev] [PATCH 08/14] radeonsi: raise the max size for SDMA buffer copies

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/cik_sdma.c | 4 ++--
 src/gallium/drivers/radeonsi/sid.h  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c 
b/src/gallium/drivers/radeonsi/cik_sdma.c
index ec3446a..88a994e 100644
--- a/src/gallium/drivers/radeonsi/cik_sdma.c
+++ b/src/gallium/drivers/radeonsi/cik_sdma.c
@@ -46,7 +46,7 @@ static void cik_sdma_do_copy_buffer(struct si_context *ctx,
dst_offset += r600_resource(dst)->gpu_address;
src_offset += r600_resource(src)->gpu_address;
 
-   ncopy = (size + CIK_SDMA_COPY_MAX_SIZE - 1) / CIK_SDMA_COPY_MAX_SIZE;
+   ncopy = DIV_ROUND_UP(size, CIK_SDMA_COPY_MAX_SIZE);
r600_need_dma_space(&ctx->b, ncopy * 7);
 
radeon_add_to_buffer_list(&ctx->b, &ctx->b.dma, rsrc, RADEON_USAGE_READ,
@@ -55,7 +55,7 @@ static void cik_sdma_do_copy_buffer(struct si_context *ctx,
  RADEON_PRIO_SDMA_BUFFER);
 
for (i = 0; i < ncopy; i++) {
-   csize = size < CIK_SDMA_COPY_MAX_SIZE ? size : 
CIK_SDMA_COPY_MAX_SIZE;
+   csize = MIN2(size, CIK_SDMA_COPY_MAX_SIZE);
cs->buf[cs->cdw++] = CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY,
 
CIK_SDMA_COPY_SUB_OPCODE_LINEAR,
 0);
diff --git a/src/gallium/drivers/radeonsi/sid.h 
b/src/gallium/drivers/radeonsi/sid.h
index 9daefdb..a388c5b 100644
--- a/src/gallium/drivers/radeonsi/sid.h
+++ b/src/gallium/drivers/radeonsi/sid.h
@@ -9021,7 +9021,7 @@
 #defineCIK_SDMA_PACKET_SEMAPHORE   0x7
 #defineCIK_SDMA_PACKET_CONSTANT_FILL   0xb
 #defineCIK_SDMA_PACKET_SRBM_WRITE  0xe
-#defineCIK_SDMA_COPY_MAX_SIZE  0x1f
+#defineCIK_SDMA_COPY_MAX_SIZE  0x3fffe0
 
 #endif /* _SID_H */
 
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 00/14] RadeonSI: SDMA texture copy rewrite

2016-05-04 Thread Marek Olšák

Hi,

This patch series completely rewrites texture copying with SDMA for CIK & VI. 
It only uses the "partial" copy packets, which makes it a lot simpler (one 
packet per layered/3D copy). Most of the complexity is in handling hw 
limitations, but luckily the SDMA path can be used in the majority of cases.

On top of that, the DMA IB support is improved thanks to extensive testing and 
benchmarking.

This is only enabled on VI for now. It can be enabled on CIK after enough 
testing is done.

There is one more hardware limitation that can cause VM faults with T2L copies 
and needs a workaround. The exact workaround is still under discussion, but I 
think this is good enough for review already.

On Tonga, it beats the 3D engine in texture upload performance by 0-80%. On 
APUs, the performance is mixed or slightly worse. The reason may be that the 3D 
engine uses less bandwidth thanks to DCC.

The fact that SDMA can't do DCC compression can be detrimental to texturing 
performance on VI. The impact hasn't been measured yet.

Texture uploads still use 2 copies (user memory -> linear, linear -> tiled). 
Merging those 2 copies into 1 is not done in this series.

Please review and opinions on the DCC issue are welcome,

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/fs: Don't follow pow with an instruction with two dest regs.

2016-05-04 Thread Francisco Jerez

Matt Turner  writes:

> Beginning with commit 7b208a73, Unigine Valley began hanging the GPU on
> Gen >= 8 platforms.
>
> Evidently that commit allowed the scheduler to make different choices
> that somehow finally ran afoul of a hardware bug in which POW and FDIV
> instructions may not be followed by an instruction with two destination
> registers (including compressed instructions). I presume the conditions
> are more complex than that, but the internal hardware bug report (BDWGFX
> bug_de 1696294) does not contain much more information.
>
> Cc: Francisco Jerez 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94924
> Reviewed-by: Topi Pohjolainen  [v1]
> Tested-by:  Mark Janes  [v1]
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 18 ++
>  1 file changed, 18 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 8654ca4..ff8d37e 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -1726,6 +1726,24 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
>unsigned int last_insn_offset = p->next_insn_offset;
>bool multiple_instructions_emitted = false;
>  
> +  /* From the Broadwell PRM, Volume 7, "3D-Media-GPGPU", in the
> +   * "Register Region Restrictions" section: for BDW, SKL:
> +   *
> +   *"A POW/FDIV operation must not be followed by an instruction
> +   * that requires two destination registers."
> +   *
> +   * The documentation is often lacking annotations for Atom parts,
> +   * and empirically this affects CHV as well.
> +   */
> +  if (devinfo->gen >= 8 &&
> +  p->nr_insn > 1 &&
> +  brw_inst_opcode(devinfo, brw_last_inst) == BRW_OPCODE_MATH &&
> +  brw_inst_math_function(devinfo, brw_last_inst) == 
> BRW_MATH_FUNCTION_POW &&
> +  inst->dst.component_size(inst->exec_size) > REG_SIZE) {
> + brw_NOP(p);
> + last_insn_offset = p->next_insn_offset;
> +  }
> +

Looks good.  This would likely be a good time to check whether the math
function is FDIV in addition -- Just to be sure that this hardware bug
doesn't bite us again in the future if we ever start using FDIV.  Either
way patch is:

Reviewed-by: Francisco Jerez 

>if (unlikely(debug_flag))
>   annotate(p->devinfo, &annotation, cfg, inst, p->next_insn_offset);
>  
> -- 
> 2.7.3


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965/fs: Don't follow pow with an instruction with two dest regs.

2016-05-04 Thread Matt Turner

Beginning with commit 7b208a73, Unigine Valley began hanging the GPU on
Gen >= 8 platforms.

Evidently that commit allowed the scheduler to make different choices
that somehow finally ran afoul of a hardware bug in which POW and FDIV
instructions may not be followed by an instruction with two destination
registers (including compressed instructions). I presume the conditions
are more complex than that, but the internal hardware bug report (BDWGFX
bug_de 1696294) does not contain much more information.

Cc: Francisco Jerez 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94924
Reviewed-by: Topi Pohjolainen  [v1]
Tested-by:  Mark Janes  [v1]
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 8654ca4..ff8d37e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1726,6 +1726,24 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
   unsigned int last_insn_offset = p->next_insn_offset;
   bool multiple_instructions_emitted = false;
 
+  /* From the Broadwell PRM, Volume 7, "3D-Media-GPGPU", in the
+   * "Register Region Restrictions" section: for BDW, SKL:
+   *
+   *"A POW/FDIV operation must not be followed by an instruction
+   * that requires two destination registers."
+   *
+   * The documentation is often lacking annotations for Atom parts,
+   * and empirically this affects CHV as well.
+   */
+  if (devinfo->gen >= 8 &&
+  p->nr_insn > 1 &&
+  brw_inst_opcode(devinfo, brw_last_inst) == BRW_OPCODE_MATH &&
+  brw_inst_math_function(devinfo, brw_last_inst) == 
BRW_MATH_FUNCTION_POW &&
+  inst->dst.component_size(inst->exec_size) > REG_SIZE) {
+ brw_NOP(p);
+ last_insn_offset = p->next_insn_offset;
+  }
+
   if (unlikely(debug_flag))
  annotate(p->devinfo, &annotation, cfg, inst, p->next_insn_offset);
 
-- 
2.7.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 37/59] i965/fs: generalize SIMD16 interference workaround

2016-05-04 Thread Connor Abbott

On Curro's suggestion, I tried reproducing the problem that the patch
fixes. Not only could I not reproduce it, but it seems like the
original workaround that Jason added back when he reworked the fs
backend ages ago is unnecessary at least on my ivybridge and broadwell
machines. Since reproducing it involves specific register-allocation
choices, I hacked the fs visitor to skip emit_nir_code() and instead
produce a specific program with some hardcoded register assignments,
then picked a random piglit test
(tests/shaders/glsl-fs-abs-neg.shader_test) and ran it with
INTEL_DEBUG=no8. The resulting assembly looked like:

Native code for unnamed fragment shader GLSL3
SIMD16 shader: 7 instructions. 0 loops. 58 cycles. 0:0 spills:fills.
Promoted 0 constants. Compacted 112 to 96 bytes (14%)
   START B0
mov(16) g10<1>F -1F { align1 1H };
mov(16) g122<1>F1F  { align1 1H };
mov(16) g124<1>F[0F, 0F, 0F, 0F]VF  {
align1 1H compacted };
mov(16) g126<1>F1F  { align1 1H };
add(16) g11<1>F g10<8,8,1>F 1F  { align1 1H };
mov(16) g120<1>Fg11<8,8,1>F {
align1 1H compacted };
sendc(16)   null<1>UW   g120<8,8,1>F
render RT write SIMD16 LastRT Surface = 0
mlen 8 rlen 0 { align1 1H EOT };
   END B0

If the comment Jason added is correct, then

   add(16) g11<1>F g10<8,8,1>F 1F  {
align1 1H };

will get expanded into

   add(8) g11<1>F g10<8,8,1>F 1F  { align1 1Q };
   add(8) g12<1>F g11<8,8,1>F 1F  { align1 2Q };

And the add will happen twice for g11, making some of the pixels
yellow instead of green. But on the machines I tested, this resulted
in pure green. I also tried a similar thing for fp64, but there was
still no problem. I Also tested the similar problem described in
fs_inst::has_source_and_destination_hazard() with this program:

Native code for unnamed fragment shader GLSL3
SIMD16 shader: 7 instructions. 0 loops. 56 cycles. 0:0 spills:fills.
Promoted 0 constants. Compacted 112 to 96 bytes (14%)
   START B0
mov(1)  g10<1>F -1F {
align1 WE_all };
mov(16) g122<1>F1F  { align1 1H };
mov(16) g124<1>F[0F, 0F, 0F, 0F]VF  {
align1 1H compacted };
mov(16) g126<1>F1F  { align1 1H };
add(16) g10<1>F g10<0,1,0>F 1F  { align1 1H };
mov(16) g120<1>Fg10<8,8,1>F {
align1 1H compacted };
sendc(16)   null<1>UW   g120<8,8,1>F
render RT write SIMD16 LastRT Surface = 0
mlen 8 rlen 0 { align1 1H EOT };
   END B0

and again the comment seems to be wrong. Also, there were no piglit
regressions with the patch reverted, and the only tests broken with
the original workarounds reverted are ones that use math instructions
which we split to SIMD8 in the generator (but probably shouldn't be).
So for now, we can probably drop this patch, and if we can do more
testing, then we can probably drop the original workaround or restrict
it to older gens.

If anyone wants to test this on other platforms than broadwell and
ivybridge, the hacks I made are available on my fdo repo on the
test-compressed-wa and test-fp64-compressed-wa (the former has two
commits, go back to the first to produce the first assembly).

On Fri, Apr 29, 2016 at 7:29 AM, Samuel Iglesias Gonsálvez
 wrote:
> From: Connor Abbott 
>
> Work based on registers read/written instead of dispatch_width, so that
> the interferences are added for 64-bit sources/destinations as well.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 67 
> ---
>  1 file changed, 48 insertions(+), 19 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> index 2347cd5..f0e96f9 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> @@ -541,6 +541,34 @@ setup_mrf_hack_interference(fs_visitor *v, struct 
> ra_graph *g,
> }
>  }
>
> +static unsigned
> +get_reg_node(const fs_reg& reg, unsigned first_payload_node)
> +{
> +   switch (reg.file) {
> +   case VGRF:
> +  return reg.nr;
> +   case ARF:
> +   case FIXED_GRF:
> +  return first_payload_node + reg.nr;
> +   case MRF:
> +   default:
> +  unreachable("unhandled register file");
> +   }
> +
> +   return 0;
> +}
> +
> +static void
> +add_reg_interference(struct ra_graph *g, const fs_reg& reg1,
> + const fs_reg& reg2, unsigned first_payload_node)
> +{
> +   if ((reg1.file == VGRF || reg1.file == ARF || reg1.file == FIXED_GRF) &&
> +   (reg2

[Mesa-dev] [PATCH 1/2] meta/blit: Don't blend integer values during MSAA resolves

2016-05-04 Thread Jason Ekstrand

---
 src/mesa/drivers/common/meta_blit.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/common/meta_blit.c 
b/src/mesa/drivers/common/meta_blit.c
index 6761238..bb79c46 100644
--- a/src/mesa/drivers/common/meta_blit.c
+++ b/src/mesa/drivers/common/meta_blit.c
@@ -458,8 +458,17 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
  int step;
 
  if (src_datatype == GL_INT || src_datatype == GL_UNSIGNED_INT) {
-merge_function =
-   "gvec4 merge(gvec4 a, gvec4 b) { return (a >> gvec4(1)) + (b >> 
gvec4(1)) + (a & b & gvec4(1)); }\n";
+/* From the OpenGL ES 3.2 spec section 16.2.1:
+ *
+ *"If the source formats are integer types or stencil values,
+ *a single sample’s value is selected for each pixel."
+ *
+ * The OpenGL 4.4 spec contains exactly the same language.
+ *
+ * We can accomplish this by making the merge function return just
+ * one of the two samples.  The compiler should do the rest.
+ */
+merge_function = "gvec4 merge(gvec4 a, gvec4 b) { return a; }\n";
  } else {
 /* The divide will happen at the end for floats. */
 merge_function =
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] i965/borp: Don't blend integer values during MSAA resolves

2016-05-04 Thread Jason Ekstrand

---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 30 ++--
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index 882f9ed..dc68eab 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -2160,6 +2160,8 @@ brw_blorp_blit_program::manual_blend_average(unsigned 
num_samples)
if (key->tex_layout == INTEL_MSAA_LAYOUT_CMS)
   mcs_fetch();
 
+   assert(key->texture_data_type == BRW_REGISTER_TYPE_F);
+
/* We add together samples using a binary tree structure, e.g. for 4x MSAA:
 *
 *   result = ((sample[0] + sample[1]) + (sample[2] + sample[3])) / 4
@@ -2228,8 +2230,7 @@ brw_blorp_blit_program::manual_blend_average(unsigned 
num_samples)
 
  /* TODO: should use a smaller loop bound for non_RGBA formats */
  for (int k = 0; k < 4; ++k) {
-emit_combine(key->texture_data_type == BRW_REGISTER_TYPE_F ?
-BRW_OPCODE_ADD : BRW_OPCODE_AVG,
+emit_combine(BRW_OPCODE_ADD,
  offset(texture_data[stack_depth - 1], 2*k),
  offset(vec8(texture_data[stack_depth - 1]), 2*k),
  offset(vec8(texture_data[stack_depth]), 2*k));
@@ -2240,14 +2241,12 @@ brw_blorp_blit_program::manual_blend_average(unsigned 
num_samples)
/* We should have just 1 sample on the stack now. */
assert(stack_depth == 1);
 
-   if (key->texture_data_type == BRW_REGISTER_TYPE_F) {
-  /* Scale the result down by a factor of num_samples */
-  /* TODO: should use a smaller loop bound for non-RGBA formats */
-  for (int j = 0; j < 4; ++j) {
- emit_mul(offset(texture_data[0], 2*j),
- offset(vec8(texture_data[0]), 2*j),
- brw_imm_f(1.0f / num_samples));
-  }
+   /* Scale the result down by a factor of num_samples */
+   /* TODO: should use a smaller loop bound for non-RGBA formats */
+   for (int j = 0; j < 4; ++j) {
+  emit_mul(offset(texture_data[0], 2*j),
+  offset(vec8(texture_data[0]), 2*j),
+  brw_imm_f(1.0f / num_samples));
}
 
if (key->tex_layout == INTEL_MSAA_LAYOUT_CMS)
@@ -2869,8 +2868,17 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
GLenum base_format = _mesa_get_format_base_format(src_mt->format);
if (base_format != GL_DEPTH_COMPONENT && /* TODO: what about depth/stencil? 
*/
base_format != GL_STENCIL_INDEX &&
+   !_mesa_is_format_integer(src_mt->format) &&
src_mt->num_samples > 1 && dst_mt->num_samples <= 1) {
-  /* We are downsampling a color buffer, so blend. */
+  /* We are downsampling a non-integer color buffer, so blend.
+   *
+   * Regarding integer color buffers, the OpenGL ES 3.2 spec says:
+   *
+   *"If the source formats are integer types or stencil values, a
+   *single sample’s value is selected for each pixel."
+   *
+   * This implies we should not blend in that case.
+   */
   wm_prog_key.blend = true;
}
 
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/3] i965: Add infrastucture for sample lod-zero operations.

2016-05-04 Thread Matt Turner

---
 src/mesa/drivers/dri/i965/brw_defines.h |  5 +
 src/mesa/drivers/dri/i965/brw_disasm.c  |  3 +++
 src/mesa/drivers/dri/i965/brw_fs.cpp|  3 +++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp  | 14 ++
 src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp |  2 ++
 src/mesa/drivers/dri/i965/brw_shader.cpp|  6 ++
 6 files changed, 33 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 60b696c..ba22363 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -977,8 +977,10 @@ enum opcode {
SHADER_OPCODE_TXD_LOGICAL,
SHADER_OPCODE_TXF,
SHADER_OPCODE_TXF_LOGICAL,
+   SHADER_OPCODE_TXF_LZ,
SHADER_OPCODE_TXL,
SHADER_OPCODE_TXL_LOGICAL,
+   SHADER_OPCODE_TXL_LZ,
SHADER_OPCODE_TXS,
SHADER_OPCODE_TXS_LOGICAL,
FS_OPCODE_TXB,
@@ -1636,6 +1638,9 @@ enum brw_message_target {
 #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO   17
 #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C 18
 #define HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE 20
+#define GEN9_SAMPLER_MESSAGE_SAMPLE_LZ   24
+#define GEN9_SAMPLER_MESSAGE_SAMPLE_C_LZ 25
+#define GEN9_SAMPLER_MESSAGE_SAMPLE_LD_LZ26
 #define GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W 28
 #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS   29
 #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS   30
diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c 
b/src/mesa/drivers/dri/i965/brw_disasm.c
index 1778419..5d25774 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -550,6 +550,9 @@ static const char *const gen5_sampler_msg_type[] = {
[GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO]   = "gather4_po",
[GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C] = "gather4_po_c",
[HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE] = "sample_d_c",
+   [GEN9_SAMPLER_MESSAGE_SAMPLE_LZ]   = "sample_lz",
+   [GEN9_SAMPLER_MESSAGE_SAMPLE_C_LZ] = "sample_c_lz",
+   [GEN9_SAMPLER_MESSAGE_SAMPLE_LD_LZ]= "ld_lz",
[GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W] = "ld2dms_w",
[GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS]   = "ld_mcs",
[GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS]   = "ld2dms",
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 18760dd..d233265 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -949,12 +949,14 @@ fs_visitor::implied_mrf_writes(fs_inst *inst)
case FS_OPCODE_TXB:
case SHADER_OPCODE_TXD:
case SHADER_OPCODE_TXF:
+   case SHADER_OPCODE_TXF_LZ:
case SHADER_OPCODE_TXF_CMS:
case SHADER_OPCODE_TXF_CMS_W:
case SHADER_OPCODE_TXF_MCS:
case SHADER_OPCODE_TG4:
case SHADER_OPCODE_TG4_OFFSET:
case SHADER_OPCODE_TXL:
+   case SHADER_OPCODE_TXL_LZ:
case SHADER_OPCODE_TXS:
case SHADER_OPCODE_LOD:
case SHADER_OPCODE_SAMPLEINFO:
@@ -4156,6 +4158,7 @@ lower_sampler_logical_send_gen7(const fs_builder &bld, 
fs_inst *inst, opcode op,
 
   coordinate_done = true;
   break;
+
case SHADER_OPCODE_TXF_CMS:
case SHADER_OPCODE_TXF_CMS_W:
case SHADER_OPCODE_TXF_UMS:
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 8654ca4..5867a48 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -792,6 +792,14 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LOD;
 }
 break;
+  case SHADER_OPCODE_TXL_LZ:
+ assert(devinfo->gen >= 9);
+if (inst->shadow_compare) {
+msg_type = GEN9_SAMPLER_MESSAGE_SAMPLE_C_LZ;
+ } else {
+msg_type = GEN9_SAMPLER_MESSAGE_SAMPLE_LZ;
+ }
+ break;
   case SHADER_OPCODE_TXS:
 msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO;
 break;
@@ -807,6 +815,10 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
   case SHADER_OPCODE_TXF:
 msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
 break;
+  case SHADER_OPCODE_TXF_LZ:
+ assert(devinfo->gen >= 9);
+ msg_type = GEN9_SAMPLER_MESSAGE_SAMPLE_LD_LZ;
+ break;
   case SHADER_OPCODE_TXF_CMS_W:
  assert(devinfo->gen >= 9);
  msg_type = GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W;
@@ -2115,11 +2127,13 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
   case FS_OPCODE_TXB:
   case SHADER_OPCODE_TXD:
   case SHADER_OPCODE_TXF:
+  case SHADER_OPCODE_TXF_LZ:
   case SHADER_OPCODE_TXF_CMS:
   case SHADER_OPCODE_TXF_CMS_W:
   case SHADER_OPCODE_TXF_UMS:
   case SHADER_OPCODE_TXF_MCS:
   case SHADER_OPCODE_TXL:
+  case SHADER_OPCODE_TXL_LZ:
   case

[Mesa-dev] [PATCH 3/3] i965/fs: Recognize and emit ld_lz, sample_lz, sample_c_lz.

2016-05-04 Thread Matt Turner

Ken suggested instead of a big and complicated optimization pass, to
just recognize the operations here. It's certainly less code and a lot
prettier, but it seems to actually perform worse for currently unknown
reasons.

total instructions in shared programs: 8514403 -> 8495373 (-0.22%)
instructions in affected programs: 809512 -> 790482 (-2.35%)
helped: 3316
HURT: 10

total cycles in shared programs: 64259830 -> 63979404 (-0.44%)
cycles in affected programs: 10511460 -> 10231034 (-2.67%)
helped: 2339
HURT: 771

total spills in shared programs: 1707 -> 1695 (-0.70%)
spills in affected programs: 90 -> 78 (-13.33%)
helped: 4

total fills in shared programs: 2647 -> 2620 (-1.02%)
fills in affected programs: 174 -> 147 (-15.52%)
helped: 4

LOST:   12
GAINED: 36
---
Here's a word diff of the shader-db results from the previous series (before)
compared with this series (after)

diff --git a/before b/after
index e856a62..e2de982 100644
--- a/before
+++ b/after
@@ -1,27 +1,27 @@
total instructions in shared programs: [-8522163-]{+8514403+} -> [-8502879 
(-0.23%)-]{+8495373 (-0.22%)+}
instructions in affected programs: [-801772-]{+809512+} -> [-782488 
(-2.41%)-]{+790482 (-2.35%)+}
helped: 3316
HURT: [-0-]{+10+}

total cycles in shared programs: [-64374090-]{+64259830+} -> [-64070042 
(-0.47%)-]{+63979404 (-0.44%)+}
cycles in affected programs: [-10100336-]{+10511460+} -> [-9796288 
(-3.01%)-]{+10231034 (-2.67%)+}
helped: [-2351-]{+2339+}
HURT: [-735-]{+771+}

total loops in shared programs: [-2209-]{+2199+} -> [-2209-]{+2199+} (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 1707 -> [-1683 (-1.41%)-]{+1695 (-0.70%)+}
spills in affected programs: 90 -> [-66 (-26.67%)-]{+78 (-13.33%)+}
helped: 4
HURT: 0

total fills in shared programs: 2647 -> [-2606 (-1.55%)-]{+2620 (-1.02%)+}
fills in affected programs: 174 -> [-133 (-23.56%)-]{+147 (-15.52%)+}
helped: 4
HURT: 0

LOST:   [-2-]{+12+}
GAINED: 36


 src/mesa/drivers/dri/i965/brw_fs.cpp | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index d233265..202e0f7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4096,6 +4096,10 @@ lower_sampler_logical_send_gen7(const fs_builder &bld, 
fs_inst *inst, opcode op,
switch (op) {
case FS_OPCODE_TXB:
case SHADER_OPCODE_TXL:
+  if (devinfo->gen >= 9 && op == SHADER_OPCODE_TXL && lod.is_zero()) {
+ op = SHADER_OPCODE_TXL_LZ;
+ break;
+  }
   bld.MOV(sources[length], lod);
   length++;
   break;
@@ -4147,8 +4151,12 @@ lower_sampler_logical_send_gen7(const fs_builder &bld, 
fs_inst *inst, opcode op,
  length++;
   }
 
-  bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_D), lod);
-  length++;
+  if (devinfo->gen >= 9 && lod.is_zero()) {
+ op = SHADER_OPCODE_TXF_LZ;
+  } else {
+ bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_D), lod);
+ length++;
+  }
 
   for (unsigned i = devinfo->gen >= 9 ? 2 : 1; i < coord_components; i++) {
  bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_D), coordinate);
-- 
2.7.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/3] i965/fs: Add and use get_nir_src_imm().

2016-05-04 Thread Matt Turner

Terrible name. Suggest something else, or even a better way to do this.

Basically, the next patch wants to inspect the LOD argument and do
something different if it's 0.0f. But at that point we've emitted a MOV
for it and we just have a register to look at.
---
This is an alternative approach to the 4 patch series I sent ast night.

 src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 36 +++-
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index ba6bd3f..29fa593 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -273,6 +273,7 @@ public:
void nir_emit_jump(const brw::fs_builder &bld,
   nir_jump_instr *instr);
fs_reg get_nir_src(nir_src src);
+   fs_reg get_nir_src_imm(nir_src src);
fs_reg get_nir_dest(nir_dest dest);
fs_reg get_nir_image_deref(const nir_deref_var *deref);
fs_reg get_indirect_offset(nir_intrinsic_instr *instr);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 4d14fda..fb59800 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -1245,6 +1245,32 @@ fs_visitor::get_nir_src(nir_src src)
 }
 
 fs_reg
+fs_visitor::get_nir_src_imm(nir_src src)
+{
+   fs_reg reg;
+   if (src.is_ssa) {
+  if (src.ssa->parent_instr->type == nir_instr_type_load_const) {
+ nir_load_const_instr *load_const =
+nir_instr_as_load_const(src.ssa->parent_instr);
+ return brw_imm_ud(load_const->value.u32[0]);
+  }
+
+  reg = nir_ssa_values[src.ssa->index];
+   } else {
+  /* We don't handle indirects on locals */
+  assert(src.reg.indirect == NULL);
+  reg = offset(nir_locals[src.reg.reg->index], bld,
+   src.reg.base_offset * src.reg.reg->num_components);
+   }
+
+   /* to avoid floating-point denorm flushing problems, set the type by
+* default to D - instructions that need floating point semantics will set
+* this to F if they need to
+*/
+   return retype(reg, BRW_REGISTER_TYPE_D);
+}
+
+fs_reg
 fs_visitor::get_nir_dest(nir_dest dest)
 {
if (dest.is_ssa) {
@@ -3444,7 +3470,7 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, 
nir_tex_instr *instr)
   fs_reg src = get_nir_src(instr->src[i].src);
   switch (instr->src[i].src_type) {
   case nir_tex_src_bias:
- lod = retype(src, BRW_REGISTER_TYPE_F);
+ lod = retype(get_nir_src_imm(instr->src[i].src), BRW_REGISTER_TYPE_F);
  break;
   case nir_tex_src_comparitor:
  shadow_comparitor = retype(src, BRW_REGISTER_TYPE_F);
@@ -3462,7 +3488,7 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, 
nir_tex_instr *instr)
  }
  break;
   case nir_tex_src_ddx:
- lod = retype(src, BRW_REGISTER_TYPE_F);
+ lod = retype(get_nir_src_imm(instr->src[i].src), BRW_REGISTER_TYPE_F);
  lod_components = nir_tex_instr_src_size(instr, i);
  break;
   case nir_tex_src_ddy:
@@ -3471,13 +3497,13 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, 
nir_tex_instr *instr)
   case nir_tex_src_lod:
  switch (instr->op) {
  case nir_texop_txs:
-lod = retype(src, BRW_REGISTER_TYPE_UD);
+lod = retype(get_nir_src_imm(instr->src[i].src), 
BRW_REGISTER_TYPE_UD);
 break;
  case nir_texop_txf:
-lod = retype(src, BRW_REGISTER_TYPE_D);
+lod = retype(get_nir_src_imm(instr->src[i].src), 
BRW_REGISTER_TYPE_D);
 break;
  default:
-lod = retype(src, BRW_REGISTER_TYPE_F);
+lod = retype(get_nir_src_imm(instr->src[i].src), 
BRW_REGISTER_TYPE_F);
 break;
  }
  break;
-- 
2.7.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] radeonsi: set DECOMPRESS_Z_ON_FLUSH if nr_samples >= 4

2016-05-04 Thread Marek Olšák

From: Marek Olšák 

Vulkan always sets this. It only affects in-place Z decompression.
This is recommended for performance, but what app uses MSAA depth
texturing?
---
 src/gallium/drivers/radeonsi/si_state.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 81a4341..c6e10b7 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1104,7 +1104,8 @@ static void si_emit_db_render_state(struct si_context 
*sctx, struct r600_atom *s
/* DB_RENDER_OVERRIDE2 */
radeon_set_context_reg(cs, R_028010_DB_RENDER_OVERRIDE2,

S_028010_DISABLE_ZMASK_EXPCLEAR_OPTIMIZATION(sctx->db_depth_disable_expclear) |
-   
S_028010_DISABLE_SMEM_EXPCLEAR_OPTIMIZATION(sctx->db_stencil_disable_expclear));
+   
S_028010_DISABLE_SMEM_EXPCLEAR_OPTIMIZATION(sctx->db_stencil_disable_expclear) |
+   S_028010_DECOMPRESS_Z_ON_FLUSH(sctx->framebuffer.nr_samples >= 
4));
 
db_shader_control = 
S_02880C_ALPHA_TO_MASK_DISABLE(sctx->framebuffer.cb0_is_integer) |
sctx->ps_db_shader_control;
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #16 from Jose Fonseca  ---
(In reply to David Lonie from comment #12)
> Just tried that here, no such luck: 
> 
> $ ln -s libGL.so libGL.so.1
> $ LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH apitrace trace --api gl
> ./vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling" 
> 
> apitrace: loaded into /usr/bin/apitrace 

apitrace is being injected into apitrace...

This can only happen if you have stuff you shouldn't have on LD_PRELOAD and
LD_LIBRARY_PATH variables...  :-/

Please start from a clean terminal.  Don't do anything other than I described. 
In particular don't set LD_PRELOAD, and only set LD_LIBRARY_PATH=$PWD.

Honestly it's better not to fiddle with LD_PRELOAD directly yourself at all. 
It just makes everything unnecesarily harder.  Just let the `apitrace trace`
command do it.


I repeat: 

- the right way of loading libGL.so is to set
LD_LIBRARY_PATH=/path/to/where/libGL.so.1/lives/

- the right way of tracing is to `apitrace trace myapp`

- and the right way of doing both is to

  export LD_LIBRARY_PATH=/path/to/where/libGL.so.1/lives/
  apitrace trace myapp


(Also note, the right name for libGL is not libGL.so as you had in the tarball.
 It needs to be libGL.so.1.  I suspect that this mismatch is what lead you to
resort to LD_PRELOAD to start with.  But if the name is libGL.so.1 that should
never be necessary.)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #15 from Emil Velikov  ---
Just pointing out something that may not be that obvious - David is using the
xlib powered gallium libGL. That one uses the (iirc) unsupported split shared
LLVM libraries.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #14 from David Lonie  ---
(In reply to Ilia Mirkin from comment #13)
> FWIW this is what it looks like on i965: http://i.imgur.com/LMmcGkM.png
> 
> I'm guessing that's correct, but the "correct" screenshot is inside an
> archive and I'm lazy. It is indeed totally messed up with llvmpipe.

Yep, that's what it should look like.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCHv2 10/23] i965/fs: Stop using the LOAD_PAYLOAD instruction in lower_simd_width.

2016-05-04 Thread Francisco Jerez

Michael Schellenberger Costa 
writes:

> Hi Curro,
>
> Am 04.05.2016 um 06:26 schrieb Francisco Jerez:
>> Instead of using the LOAD_PAYLOAD instruction (emitted through the
>> emit_transpose() helper that is no longer useful and this commit
>> removes) which had to be marked force_writemask_all in some cases,
>> emit a series of moves to apply proper channel enable signals to the
>> destination.  Until now lower_simd_width() had mainly been used to
>> lower things that invariably had a basic block-local temporary as
>> destination so it didn't seem like a big deal, but I found it to be
>> the reason for several Piglit regressions in my SIMD32 branch and
>> Igalia discovered the same issue independently while working on FP64
>> support.
>> ---
>> This is taken from the following WIP series:
>>   
>> https://cgit.freedesktop.org/~currojerez/mesa/log/?h=i965-late-simd-lowering
>>
>> See also:
>>   https://lists.freedesktop.org/archives/mesa-dev/2016-May/115596.html
>>
>>  src/mesa/drivers/dri/i965/brw_fs.cpp | 58 
>> +++-
>>  1 file changed, 18 insertions(+), 40 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index 4b6aa67..34599cb 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -4625,29 +4625,6 @@ get_lowered_simd_width(const struct brw_device_info 
>> *devinfo,
>> }
>>  }
>>  
>> -/**
>> - * The \p rows array of registers represents a \p num_rows by \p num_columns
>> - * matrix in row-major order, write it in column-major order into the 
>> register
>> - * passed as destination.  \p stride gives the separation between matrix
>> - * elements in the input in fs_builder::dispatch_width() units.
>> - */
>> -static void
>> -emit_transpose(const fs_builder &bld,
>> -   const fs_reg &dst, const fs_reg *rows,
>> -   unsigned num_rows, unsigned num_columns, unsigned stride)
>> -{
>> -   fs_reg *const components = new fs_reg[num_rows * num_columns];
>> -
>> -   for (unsigned i = 0; i < num_columns; ++i) {
>> -  for (unsigned j = 0; j < num_rows; ++j)
>> - components[num_rows * i + j] = offset(rows[j], bld, stride * i);
>> -   }
>> -
>> -   bld.LOAD_PAYLOAD(dst, components, num_rows * num_columns, 0);
>> -
>> -   delete[] components;
>> -}
>> -
>>  bool
>>  fs_visitor::lower_simd_width()
>>  {
>> @@ -4698,16 +4675,19 @@ fs_visitor::lower_simd_width()
>> if (inst->src[j].file != BAD_FILE &&
>> !is_uniform(inst->src[j])) {
>>/* Get the i-th copy_width-wide chunk of the source. */
>> -  const fs_reg src = horiz_offset(inst->src[j], copy_width 
>> * i);
>> +  const fs_builder cbld = lbld.group(copy_width, 0);
>> +  const fs_reg src = offset(inst->src[j], cbld, i);
>>const unsigned src_size = inst->components_read(j);
>>  
>> -  /* Use a trivial transposition to copy one every n
>> -   * copy_width-wide components of the register into a
>> -   * temporary passed as source to the lowered instruction.
>> +  /* Copy one every n copy_width-wide components of the
> That first sentence is really unclear. Shouldnt it sound "Copy each
> copy_width-wide component of..."
>
Nope, it really is copying one every n copy_width chunks of the
register, any suggestions to make the wording more clear?

> Michael
>> +   * register into a temporary passed as source to the 
>> lowered
>> +   * instruction.
>> */
>>split_inst.src[j] = lbld.vgrf(inst->src[j].type, 
>> src_size);
>> -  emit_transpose(lbld.group(copy_width, 0),
>> - split_inst.src[j], &src, 1, src_size, n);
>> +
>> +  for (unsigned k = 0; k < src_size; ++k)
>> + cbld.MOV(offset(split_inst.src[j], lbld, k),
>> +  offset(src, cbld, n * k));
>> }
>>  }
>>  
>> @@ -4726,20 +4706,18 @@ fs_visitor::lower_simd_width()
>>   }
>>  
>>   if (inst->regs_written) {
>> -/* Distance between useful channels in the temporaries, skipping
>> - * garbage if the lowered instruction is wider than the 
>> original.
>> - */
>> -const unsigned m = lower_width / copy_width;
>> +const fs_builder lbld = ibld.group(lower_width, 0);
>>  
>>  /* Interleave the components of the result from the lowered
>> - * instructions.  We need to set exec_all() when copying more 
>> than
>> - * one half per component, because LOAD_PAYLOAD (in terms of 
>> which
>> - * emit_transpose is implemented) can only use the same channel
>> - * enable signals for all of its non-header sources.
>> + * ins

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #13 from Ilia Mirkin  ---
FWIW this is what it looks like on i965: http://i.imgur.com/LMmcGkM.png

I'm guessing that's correct, but the "correct" screenshot is inside an archive
and I'm lazy. It is indeed totally messed up with llvmpipe.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #12 from David Lonie  ---
(In reply to Jose Fonseca from comment #10)
> The proper way of using a custom libGL.so is to simply set LD_LIBRARY_PATH
> to the directory that has it:
> 
>   ln -sf libGL.so libGL.so,1
>   LD_LIBRARY_PATH=$PWD ./vtkRenderingCoreCxxTests
> "TestTranslucentLUTDepthPeeling" -I

Just tried that here, no such luck: 

$ ln -s libGL.so libGL.so.1
$ LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH apitrace trace --api gl
./vtkRenderingCoreCxxTests "TestTranslucentLUTDepthPeeling" 
apitrace: loaded into /usr/bin/apitrace 
apitrace: unloaded from /usr/bin/apitrace   
apitrace: loaded into /ssd/src/VTK/build-tmp/mesa_bug/vtkRenderingCoreCxxTests  
apitrace: tracing to
/ssd/src/VTK/build-tmp/mesa_bug/vtkRenderingCoreCxxTests.18.trace   
vtkRenderingCoreCxxTests:
/build/apitrace/src/apitrace-7.1/build/wrappers/glxtrace.cpp:97195: void (*
_wrapProcAddress(const GLubyte*, __GLXextFuncPtr))(): Assertion `procPtr !=
(__GLXextFuncPtr)&glXCreateContextAttribsARB' failed.

Looks like you did get a trace from it, though, so I wonder what's different.
Thanks for uploading that, hopefully that will make it easier to track down!

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #11 from Jose Fonseca  ---
Created attachment 123472
  --> https://bugs.freedesktop.org/attachment.cgi?id=123472&action=edit
TestTranslucentLUTDepthPeeling.trace

Trace.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 27/59] i965/fs: add a stride helper

2016-05-04 Thread Francisco Jerez

Iago Toral  writes:

> On Wed, 2016-05-04 at 01:15 -0700, Francisco Jerez wrote:
>> Iago Toral  writes:
>> 
>> > On Mon, 2016-05-02 at 18:48 -0700, Francisco Jerez wrote:
>> >> Samuel Iglesias Gonsálvez  writes:
>> >> 
>> >> > From: Connor Abbott 
>> >> >
>> >> > Similar to retype() and offset().
>> >> > ---
>> >> >  src/mesa/drivers/dri/i965/brw_ir_fs.h | 8 
>> >> >  1 file changed, 8 insertions(+)
>> >> >
>> >> > diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
>> >> > b/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> >> > index e4f20f4..abda2c3 100644
>> >> > --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> >> > +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> >> > @@ -78,6 +78,14 @@ retype(fs_reg reg, enum brw_reg_type type)
>> >> >  }
>> >> >  
>> >> >  static inline fs_reg
>> >> > +stride(fs_reg reg, unsigned stride)
>> >> > +{
>> >> > +   if (reg.stride != 0)
>> >> > +  reg.stride = stride;
>> >> > +   return reg;
>> >> > +}
>> >> > +
>> >> 
>> >> This only works if reg.stride == 0 or 1, we need to honour the stride of
>> >> the original register (e.g. by doing reg.stride *= stride) or you'll end
>> >> up taking components not part of the region given as argument.  It gets
>> >> messy with ARF and HW registers because they represent the stride
>> >> differently...  I suggest that instead of fixing this you take the
>> >> following patch from my SIMD32 branch that introduces a somewhat easier
>> >> to use helper: Instead of doing 'stride(horiz_offset(retype(reg, t), i),
>> >> type_sz(reg.type) / type_sz(t))' to take the i-th subcomponent of type t
>> >> of the original register you would just do 'subscript(reg, t, i)'. 
>> >
>> > Actually, I think this is not what we want in a number of places.
>> >
>> > Sometimes we just want to apply a stride to a register, no type changes
>> > or anything. That is, we just want to do something like stride(reg, 2).
>> > Of course, we can code that with subscript as subscript(reg, retype(reg,
>> > type_that is_half_the_size), 0) but that is certainly forced. For
>> > example, we have code such as:
>> >
>> > bld.MOV(tmp, stride(component_i, 2));
>> > bld.MOV(horiz_offset(tmp, 8 * multiplier),
>> > stride(horiz_offset(component_i, 1), 2));
>> >
>> Oh no, this is precisely the use-case I had in mind: component_i is a
>> vector of 64-bit channels laid out contiguously, and you want to extract
>> each 32-bit field from each channel separately.  You happen to have
>> declared component_i as a 32-bit type but that's kind of a lie because
>> each component of "component_i" only represents half a channel, it would
>> definitely make more sense to make it a DF because that's what it
>> contains...
>
> I think that whether component_i is really 64-bit or 32-bit is not
> relevant, in the sense that what matters is how the algorithm
> needs/wants to interpret the data it has to manipulate. The way I
> thought about this is that we have our 32-bit channels mixed up and we
> need to re-arrange them, so from my perspective, this is all about
> 32-bit data that we need to move around in a particular fashion.

Keep in mind that channel masking is applied to component_i in multiples
of 64 bits regardless of the type you declare it with, so if you forget
the fact that component_i is a vector of elements 64 bits apart from
each other and pretend it's a vector of 32 bit channels, you'll end up
crossing channel boundaries illegally and miscompiling the program (or
working it around with ->force_writemask_all hacks like you do in the
SHUFFLE functions...).  subscript() guarantees that you never have to
break channel boundaries not even temporarily -- If the register you
give it as argument was channel-aligned the result is guaranteed
channel-aligned.  stride() OTOH gives you a channel-misaligned result
you need to compensate for with a matching retype(), so pretty much
every use of the result from stride() other than as argument for
retype() indicates a bug, that's why it makes sense to do both things in
the same operation.  Otherwise you need to deal with a temporary value
which is basically meaningless, and you need to manually make sure that
the value provided as stride multiplier matches the ratio of the type
sizes exactly (the snippet you paste below from PACK lowering
illustrates the problem since it would miscompile the program silently
if the instruction didn't have the exact number of sources you
expected).

> That's why I find the stride() helper a lot more natural to use, where
> I just tell the access type and the stride explicitly, while with
> subscript() I have to think about a register type and an access type
> instead and the stride is a consequence of that relation. I admit this
> is very subjective though.
>
Sorry, but this wasn't just a codestyle suggestion, most of the uses of
stride() and horiz_offset() you have in both series will be broken for a
certain input of fully well-formed IR due to at least one out of three
different reasons (horiz_offset() not b

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #10 from Jose Fonseca  ---
(In reply to Jose Fonseca from comment #8)
> It looks like apitrace is about to enter an infinite loop: the address of
> the supposedly "real" glXCreateContextAttribsARB function is actually
> pointing to the fake glXCreateContextAttribsARB wrapper function.
> 
> I'll take a look.  Thanks for trying.
> 
> 
> It looks like doing `MESA_GL_VERSION_OVERRIDE=4.5 glretrace -S frame
> vtkRenderingOpenGL2CxxTests.4075.trim.trace` works.

I suspect problem is that you're loading Mesa libGL.so via LD_PRELOAD. From
attached mesa_bug.tar.bz2:

  $ cat run-mesa.sh 
  #!/bin/bash

  echo "Running test against mesa libGL."
  LD_PRELOAD=libGL.so ./vtkRenderingCoreCxxTests
"TestTranslucentLUTDepthPeeling" -I

This will defintely interfere with apitrace since apitrace needs to use
LD_PRELOAD to inject.

The proper way of using a custom libGL.so is to simply set LD_LIBRARY_PATH to
the directory that has it:

  ln -sf libGL.so libGL.so,1
  LD_LIBRARY_PATH=$PWD ./vtkRenderingCoreCxxTests
"TestTranslucentLUTDepthPeeling" -I

and all should work fine.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #9 from Roland Scheidegger  ---
FWIW it was not the samplers returning NaN for the float texture, but rather
the _coords_ being NaNs (thus the results from sampling could be anything,
though it is possible some implementations would use 0.0 as coord instead). But
those coords came from the vs, which seemed simple enough. So possibly the NaNs
might have been the result of a previous render pass. (It is of course also
possible some shader is simply bogus and relying on NaNs getting flushed to
zero somewhere on output.)
It would indeed be helpful to know if hw drivers are affected too.
(Using vtk sources to reproduce is pretty easy.)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #8 from Jose Fonseca  ---
(In reply to David Lonie from comment #7)
> Is this issue familiar to anyone? I'll keep trying to get this working, but
> would appreciate any tips if this is a known issue :)

It looks like apitrace is about to enter an infinite loop: the address of the
supposedly "real" glXCreateContextAttribsARB function is actually pointing to
the fake glXCreateContextAttribsARB wrapper function.

I'll take a look.  Thanks for trying.


It looks like doing `MESA_GL_VERSION_OVERRIDE=4.5 glretrace -S frame
vtkRenderingOpenGL2CxxTests.4075.trim.trace` works.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glapi: fix parameter type for GetSamplerParameterIuivEXT() in es_EXT.xml

2016-05-04 Thread Brian Paul


Missed that.  Will do.

I should probably also cc this for 11.2

Thanks.

-Brian

On 05/04/2016 02:05 PM, Ilia Mirkin wrote:

My bad (iirc)... copy/paste error. Also need to fix the OES variant. With that,

Reviewed-by: Ilia Mirkin 

On Wed, May 4, 2016 at 4:03 PM, Brian Paul  wrote:

The function returns GLuint, not GLfloat values.
---
  src/mapi/glapi/gen/es_EXT.xml | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
index bf67eae..e453401 100644
--- a/src/mapi/glapi/gen/es_EXT.xml
+++ b/src/mapi/glapi/gen/es_EXT.xml
@@ -901,7 +901,7 @@
  


-  
+  
  

  
--
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=BQIBaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=T0t4QG7chq2ZwJo6wilkFznRSFy-8uDKartPGbomVj8&m=AXoK8Bn0E4KSNMa9kHkvvoyH6_V9nNSXmrZwVVtLrU4&s=a8BcTju1sst0Dp04U3N4IgadHfryV8dptFRg_ldFeek&e=


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glapi: fix parameter type for GetSamplerParameterIuivEXT() in es_EXT.xml

2016-05-04 Thread Charmaine Lee

Reviewed-by: Charmaine Lee 

From: Brian Paul 
Sent: Wednesday, May 4, 2016 1:03 PM
To: mesa-dev@lists.freedesktop.org
Cc: Charmaine Lee; Sinclair Yeh
Subject: [PATCH] glapi: fix parameter type for GetSamplerParameterIuivEXT() in 
es_EXT.xml

The function returns GLuint, not GLfloat values.
---
 src/mapi/glapi/gen/es_EXT.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
index bf67eae..e453401 100644
--- a/src/mapi/glapi/gen/es_EXT.xml
+++ b/src/mapi/glapi/gen/es_EXT.xml
@@ -901,7 +901,7 @@
 
   
   
-  
+  
 

 
--
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glapi: fix parameter type for GetSamplerParameterIuivEXT() in es_EXT.xml

2016-05-04 Thread Ilia Mirkin

My bad (iirc)... copy/paste error. Also need to fix the OES variant. With that,

Reviewed-by: Ilia Mirkin 

On Wed, May 4, 2016 at 4:03 PM, Brian Paul  wrote:
> The function returns GLuint, not GLfloat values.
> ---
>  src/mapi/glapi/gen/es_EXT.xml | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
> index bf67eae..e453401 100644
> --- a/src/mapi/glapi/gen/es_EXT.xml
> +++ b/src/mapi/glapi/gen/es_EXT.xml
> @@ -901,7 +901,7 @@
>   alias="GetSamplerParameterIuiv">
>
>
> -  
> +  
>  
>
>  
> --
> 1.9.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa: include texture format in glGenerateMipmap error message

2016-05-04 Thread Charmaine Lee


Reviewed-by: Charmaine Lee 

From: Brian Paul 
Sent: Wednesday, May 4, 2016 1:03 PM
To: mesa-dev@lists.freedesktop.org
Cc: Charmaine Lee; Sinclair Yeh
Subject: [PATCH] mesa: include texture format in glGenerateMipmap error message

---
 src/mesa/main/genmipmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/genmipmap.c b/src/mesa/main/genmipmap.c
index 1a6ae9a..d917220 100644
--- a/src/mesa/main/genmipmap.c
+++ b/src/mesa/main/genmipmap.c
@@ -139,7 +139,8 @@ _mesa_generate_texture_mipmap(struct gl_context *ctx,
   
srcImage->InternalFormat)) {
   _mesa_unlock_texture(ctx, texObj);
   _mesa_error(ctx, GL_INVALID_OPERATION,
-  "glGenerate%sMipmap(invalid internal format)", suffix);
+  "glGenerate%sMipmap(invalid internal format %s)", suffix,
+  _mesa_enum_to_string(srcImage->InternalFormat));
   return;
}

--
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] glapi: fix parameter type for GetSamplerParameterIuivEXT() in es_EXT.xml

2016-05-04 Thread Brian Paul

The function returns GLuint, not GLfloat values.
---
 src/mapi/glapi/gen/es_EXT.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
index bf67eae..e453401 100644
--- a/src/mapi/glapi/gen/es_EXT.xml
+++ b/src/mapi/glapi/gen/es_EXT.xml
@@ -901,7 +901,7 @@
 
   
   
-  
+  
 
 
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] mesa: include texture format in glGenerateMipmap error message

2016-05-04 Thread Brian Paul

---
 src/mesa/main/genmipmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/genmipmap.c b/src/mesa/main/genmipmap.c
index 1a6ae9a..d917220 100644
--- a/src/mesa/main/genmipmap.c
+++ b/src/mesa/main/genmipmap.c
@@ -139,7 +139,8 @@ _mesa_generate_texture_mipmap(struct gl_context *ctx,
   
srcImage->InternalFormat)) {
   _mesa_unlock_texture(ctx, texObj);
   _mesa_error(ctx, GL_INVALID_OPERATION,
-  "glGenerate%sMipmap(invalid internal format)", suffix);
+  "glGenerate%sMipmap(invalid internal format %s)", suffix,
+  _mesa_enum_to_string(srcImage->InternalFormat));
   return;
}
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 00/59] Initial arb_gpu_shader_fp64 support to the i965 scalar backend

2016-05-04 Thread Francisco Jerez

Samuel Iglesias Gonsálvez  writes:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
>
>
> On 03/05/16 20:59, Kenneth Graunke wrote:
>> Other than patches 37, 56, and ones you agreed to drop, the series
>> is: Reviewed-by: Kenneth Graunke 
>> 
>> I think you can go ahead and land all except those, and we can
>> land new solutions for those problems afterwards.
>> 
>> We still need to fix the horiz_offset problem, and I think Curro's 
>> subscript() helper is a great way to do that.  I would be fine
>> with fixing that after this series lands - it would probably mean
>> less rebasing :)
>> 
>
> We prepared a patch series without patches 11, 22 (both were dropped),
> 37, 40 (it has a NACK from Curro) and 56. Our idea was to push the
> patch series once we check there is no piglit regressions.
>
> We got 0 piglit regressions on ILK, SNB, IVB, HSW and BDW, the latter
> with the extension enabled with the environment variable.
>
> Sadly, on SKL we had regressions (including GPU hangs) only if we
> *enable* the arb_gpu_shader_fp64 extension. Those regressions are
> fixed by patch 40 but we need still to investigate why they are happening.
>
That sounds like PATCH 40 was most likely working around regs_read()
being interpreted with incorrect units in some other place.  The use of
regs_read() in PATCH 36 is obviously wrong which might explain the
problem.  Something along the lines of 'inst->components_read(i) *
type_sz(inst->src[i]) / 4' would do what you want -- With that changed
PATCH 36 is:

 Reviewed-by: Francisco Jerez 

> For that reason, we have not pushed anything yet. Tomorrow we will
> investigate why we have those regressions only on SKL and which tests
> are affected.
>
No worries, you still have a couple of weeks until the merge deadline, I
don't see the need to rush in any changes that still have outstanding
comments.

> JFTR, if we disable the extension on SKL (i.e. we don't set the
> environment variable), we have 0 piglit regressions on that generation.
>
> Thanks,
>
> Sam
>
>
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v2
>
> iQIcBAEBCAAGBQJXKgPaAAoJEH/0ujLxfcNDtHgP/0JBd96UX2026Y0zBydrE1Ta
> U7FV5e0SJXY5JnxWqdV4ZhtdiqQ0EDLj2G0NlFq0u29c+lQX//x9KHMANxbt2tdc
> jStRm4NMNXds4rxFYaiRs+3p4XLquV4aZ33Px+0CT6m6Hjsm6sNJZViarB41SjAq
> HVRuKAxBbVTjqqPXBKAoXXxp34OUym5uZydXFjavCs1lU70DVwZq9t/vzIAGUunT
> NCbehScgtdVupTDEzORZ7QRXvGyqQoJa+HBeuEK6HvvhonfezwrB1hawIRGP34V/
> /k+UvL/DA3GEenpCqHfOIwnxVRYN6iXYNlHSNK+LolsjR/CUxccBs67tcQiGAPBA
> g8V02KHbuIN2JyXLSwzJ0ovX5aA+vr5Fuc8e+0PZ7raa7GxIsDg3LbUNnyT8dT83
> fQAM6IMFYhZUOKKF8ynqw5tvRAaUhWJwageYqRkdHROSs8toero7RDAabEdBKWhG
> scGkKEu3S7vsGLMBzeSch/UqyAXdaBcJzRSTKCi2UHyWT/5aPczqf4eacyLgWpa/
> 3r+qqzljZ2F8MUbnyP71CyTufx+UQzoBAlB4kM09UcF9RC95XTSQrMBMB/SqR2Om
> +ebdE/d1e5ubExNQTbQPrb2OGCIiQhRGQo8MFcuFtgl2DabJFvW4QvsuJKinhQLR
> zcfupbRUPDegesKRU+es
> =phid
> -END PGP SIGNATURE-
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #7 from David Lonie  ---
I'm trying to get a trace, but apitrace is failing an assertion:

$ apitrace trace --api gl bin/vtkRenderingCoreCxxTests
"TestTranslucentLUTDepthPeeling"

vtkRenderingCoreCxxTests:
/build/apitrace/src/apitrace-7.1/build/wrappers/glxtrace.cpp:97195: void (*
_wrapProcAddress(const GLubyte*, __GLXextFuncPtr))(): Assertion `procPtr !=
(__GLXextFuncPtr)&glXCreateContextAttribsARB' failed.

This is only happening on Mesa (I get a successful trace using my system nVidia
drivers), and it also happens when following the "Trace manually" instructions
here:

https://github.com/apitrace/apitrace/blob/master/docs/USAGE.markdown#tracing-manually

Is this issue familiar to anyone? I'll keep trying to get this working, but
would appreciate any tips if this is a known issue :)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/6] glapi: Harden GLX request size processing (v2)

2016-05-04 Thread Adam Jackson

v2: Use == not is for equality testing (Dylan Baker)

Signed-off-by: Adam Jackson 
---
 src/mapi/glapi/gen/glX_XML.py|  2 +-
 src/mapi/glapi/gen/glX_proto_recv.py |  2 --
 src/mapi/glapi/gen/glX_proto_send.py |  2 --
 src/mapi/glapi/gen/glX_proto_size.py | 24 +++-
 src/mapi/glapi/gen/gl_XML.py |  2 +-
 5 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/src/mapi/glapi/gen/glX_XML.py b/src/mapi/glapi/gen/glX_XML.py
index 12ff291..c659e93 100644
--- a/src/mapi/glapi/gen/glX_XML.py
+++ b/src/mapi/glapi/gen/glX_XML.py
@@ -357,7 +357,7 @@ class glx_function(gl_XML.gl_function):
 # FIXME adds some extra diffs to the generated
 # FIXME code.
 
-size_string = size_string + " + __GLX_PAD(%s)" % 
(p.size_string(1))
+size_string = size_string + " + safe_pad(%s)" % 
(p.size_string(1))
 
 return size_string
 
diff --git a/src/mapi/glapi/gen/glX_proto_recv.py 
b/src/mapi/glapi/gen/glX_proto_recv.py
index afee388..09cf05d 100644
--- a/src/mapi/glapi/gen/glX_proto_recv.py
+++ b/src/mapi/glapi/gen/glX_proto_recv.py
@@ -89,8 +89,6 @@ class 
PrintGlxDispatchFunctions(glX_proto_common.glx_print_proto):
 print '#include "indirect_util.h"'
 print '#include "singlesize.h"'
 print ''
-print '#define __GLX_PAD(x)  (((x) + 3) & ~3)'
-print ''
 print 'typedef struct {'
 print '__GLX_PIXEL_3D_HDR;'
 print '} __GLXpixel3DHeader;'
diff --git a/src/mapi/glapi/gen/glX_proto_send.py 
b/src/mapi/glapi/gen/glX_proto_send.py
index 8b3d8d7..10abcff 100644
--- a/src/mapi/glapi/gen/glX_proto_send.py
+++ b/src/mapi/glapi/gen/glX_proto_send.py
@@ -177,8 +177,6 @@ class PrintGlxProtoStubs(glX_proto_common.glx_print_proto):
 print '#include '
 
 print ''
-print '#define __GLX_PAD(n) (((n) + 3) & ~3)'
-print ''
 self.printFastcall()
 self.printNoinline()
 print ''
diff --git a/src/mapi/glapi/gen/glX_proto_size.py 
b/src/mapi/glapi/gen/glX_proto_size.py
index 75fc26f..3a1c554 100644
--- a/src/mapi/glapi/gen/glX_proto_size.py
+++ b/src/mapi/glapi/gen/glX_proto_size.py
@@ -291,7 +291,7 @@ class glx_server_enum_function(glx_enum_function):
 print ''
 print 'compsize = __gl%s_size(%s);' % (f.name, 
string.join(f.count_parameter_list, ","))
 p = f.variable_length_parameter()
-print 'return __GLX_PAD(%s);' % (p.size_string())
+print 'return safe_pad(%s);' % (p.size_string())
 
 print '}'
 print ''
@@ -428,7 +428,7 @@ class PrintGlxReqSize_h(PrintGlxReqSize_common):
 def printBody(self, api):
 for func in api.functionIterateGlx():
 if not func.ignore and func.has_variable_size_request():
-print 'extern PURE _X_HIDDEN int __glX%sReqSize(const GLbyte 
*pc, Bool swap);' % (func.name)
+print 'extern PURE _X_HIDDEN int __glX%sReqSize(const GLbyte 
*pc, Bool swap, int reqlen);' % (func.name)
 
 
 class PrintGlxReqSize_c(PrintGlxReqSize_common):
@@ -452,20 +452,18 @@ class PrintGlxReqSize_c(PrintGlxReqSize_common):
 print '#include "indirect_size.h"'
 print '#include "indirect_reqsize.h"'
 print ''
-print '#define __GLX_PAD(x)  (((x) + 3) & ~3)'
-print ''
 print '#if defined(__CYGWIN__) || defined(__MINGW32__)'
 print '#  undef HAVE_ALIAS'
 print '#endif'
 print '#ifdef HAVE_ALIAS'
 print '#  define ALIAS2(from,to) \\'
-print 'GLint __glX ## from ## ReqSize( const GLbyte * pc, Bool 
swap ) \\'
+print 'GLint __glX ## from ## ReqSize( const GLbyte * pc, Bool 
swap, int reqlen ) \\'
 print '__attribute__ ((alias( # to )));'
 print '#  define ALIAS(from,to) ALIAS2( from, __glX ## to ## ReqSize )'
 print '#else'
 print '#  define ALIAS(from,to) \\'
-print 'GLint __glX ## from ## ReqSize( const GLbyte * pc, Bool 
swap ) \\'
-print '{ return __glX ## to ## ReqSize( pc, swap ); }'
+print 'GLint __glX ## from ## ReqSize( const GLbyte * pc, Bool 
swap, int reqlen ) \\'
+print '{ return __glX ## to ## ReqSize( pc, swap, reqlen ); }'
 print '#endif'
 print ''
 print ''
@@ -547,7 +545,7 @@ class PrintGlxReqSize_c(PrintGlxReqSize_common):
 
 def common_func_print_just_header(self, f):
 print 'int'
-print '__glX%sReqSize( const GLbyte * pc, Bool swap )' % (f.name)
+print '__glX%sReqSize( const GLbyte * pc, Bool swap, int reqlen )' % 
(f.name)
 print '{'
 
 
@@ -602,7 +600,6 @@ class PrintGlxReqSize_c(PrintGlxReqSize_common):
 offset = 0
 fixup = []
 params = []
-plus = ''
 size = ''
 param_offsets = {}
 
@@ -620,9 +617,10 @@ class PrintGlxReqSize_c(PrintGlxReqSize_common):
 if s == 0: s = 1

[Mesa-dev] [PATCH 5/6] glapi: Define PURE for Sun Studio as well

2016-05-04 Thread Adam Jackson

Signed-off-by: Adam Jackson 
---
 src/mapi/glapi/gen/gl_XML.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mapi/glapi/gen/gl_XML.py b/src/mapi/glapi/gen/gl_XML.py
index 4f35343..8036a02 100644
--- a/src/mapi/glapi/gen/gl_XML.py
+++ b/src/mapi/glapi/gen/gl_XML.py
@@ -183,7 +183,7 @@ class gl_print_base(object):
 The name is also added to the file's undef_list.
 """
 self.undef_list.append("PURE")
-print """#  if defined(__GNUC__)
+print """#  if defined(__GNUC__) || (defined(__SUNPRO_C) && 
(__SUNPRO_C >= 0x590))
 #define PURE __attribute__((pure))
 #  else
 #define PURE
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/6] glapi/glx: Mark byteswap functions as _X_UNUSED (v2)

2016-05-04 Thread Adam Jackson

Squashes the one remaining warning in the xserver build.

v2: Also clean up some non-standard whitespace (Ian Romanick)

Reviewed-by: Ian Romanick 
Signed-off-by: Adam Jackson 
---
 src/mapi/glapi/gen/glX_proto_recv.py | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/mapi/glapi/gen/glX_proto_recv.py 
b/src/mapi/glapi/gen/glX_proto_recv.py
index 09cf05d..5433288 100644
--- a/src/mapi/glapi/gen/glX_proto_recv.py
+++ b/src/mapi/glapi/gen/glX_proto_recv.py
@@ -171,11 +171,11 @@ class 
PrintGlxDispatchFunctions(glX_proto_common.glx_print_proto):
 if t.glx_name not in already_done:
 real_name = self.real_types[t_size]
 
-print 'static %s' % (t_name)
-print 'bswap_%s( const void * src )' % (t.glx_name)
+print 'static _X_UNUSED %s' % (t_name)
+print 'bswap_%s(const void * src)' % (t.glx_name)
 print '{'
 print 'union { %s dst; %s ret; } x;' % (real_name, 
t_name)
-print 'x.dst = bswap_%u( *(%s *) src );' % (t_size * 
8, real_name)
+print 'x.dst = bswap_%u(*(%s *) src);' % (t_size * 8, 
real_name)
 print 'return x.ret;'
 print '}'
 print ''
@@ -183,12 +183,12 @@ class 
PrintGlxDispatchFunctions(glX_proto_common.glx_print_proto):
 
 for bits in [16, 32, 64]:
 print 'static void *'
-print 'bswap_%u_array( uint%u_t * src, unsigned count )' % (bits, 
bits)
+print 'bswap_%u_array(uint%u_t * src, unsigned count)' % (bits, 
bits)
 print '{'
 print 'unsigned  i;'
 print ''
-print 'for ( i = 0 ; i < count ; i++ ) {'
-print 'uint%u_t temp = bswap_%u( src[i] );' % (bits, bits)
+print 'for (i = 0 ; i < count ; i++) {'
+print 'uint%u_t temp = bswap_%u(src[i]);' % (bits, bits)
 print 'src[i] = temp;'
 print '}'
 print ''
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 0/6] Update generated GLX server code

2016-05-04 Thread Adam Jackson

Another attempt at syncing the GLX generator scripts with xserver.

Jon mentioned a couple of issues in the last series, namely that these two
patches were still necessary:

https://lists.x.org/archives/xorg-devel/2014-April/041597.html
https://lists.x.org/archives/xorg-devel/2014-April/041919.html

The former doesn't work at all: 'api' isn't a global, nor is it a parameter
to emit_function_call(). So that's not included here, and anyone updating
the xserver side will just need to be careful about the ARB_multitexture
bits. It's also fairly ugly! Turns out there really isn't enough
information attached to the entrypoint itself to be able to walk backward
to the GL version or extension name directly. I've been fiddling with
getting it working and not having a lot of fun.

The latter is addressed in a different way, I've just dropped the "skip
that which is marked for skipping" patch from the series. This means we
will generate protocol code for NV_*_program again, which is harmless
enough. But it does mean xserver will need to revert the commit that
removes the open-coded size logic for GetProgramString.

- ajax

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 6/6] glapi/gen: Copy some GL 1.0 enum details into ARB_viewport_array

2016-05-04 Thread Adam Jackson

Otherwise the instances in the extension XML override the core
definitions, and we stop knowing their sizes in indirect_size_get.c

Signed-off-by: Adam Jackson 
---
 src/mapi/glapi/gen/ARB_viewport_array.xml | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/mapi/glapi/gen/ARB_viewport_array.xml 
b/src/mapi/glapi/gen/ARB_viewport_array.xml
index b20cf61..ebd5b99 100644
--- a/src/mapi/glapi/gen/ARB_viewport_array.xml
+++ b/src/mapi/glapi/gen/ARB_viewport_array.xml
@@ -12,10 +12,18 @@
 
 
 
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
 
 
 
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/6] glapi: Add the safe_{add, mul, pad} functions from xserver

2016-05-04 Thread Adam Jackson

We're about to update the generator scripts to use these, easier not to
vary between client and server.

Signed-off-by: Adam Jackson 
---
 src/mapi/glapi/gen/glX_proto_send.py | 24 
 1 file changed, 24 insertions(+)

diff --git a/src/mapi/glapi/gen/glX_proto_send.py 
b/src/mapi/glapi/gen/glX_proto_send.py
index 2b33030..8b3d8d7 100644
--- a/src/mapi/glapi/gen/glX_proto_send.py
+++ b/src/mapi/glapi/gen/glX_proto_send.py
@@ -174,6 +174,7 @@ class PrintGlxProtoStubs(glX_proto_common.glx_print_proto):
 print '#include '
 print '#include '
 print '#include '
+print '#include '
 
 print ''
 print '#define __GLX_PAD(n) (((n) + 3) & ~3)'
@@ -181,6 +182,29 @@ class PrintGlxProtoStubs(glX_proto_common.glx_print_proto):
 self.printFastcall()
 self.printNoinline()
 print ''
+
+print 'static _X_INLINE int safe_add(int a, int b)'
+print '{'
+print 'if (a < 0 || b < 0) return -1;'
+print 'if (INT_MAX - a < b) return -1;'
+print 'return a + b;'
+print '}'
+print 'static _X_INLINE int safe_mul(int a, int b)'
+print '{'
+print 'if (a < 0 || b < 0) return -1;'
+print 'if (a == 0 || b == 0) return 0;'
+print 'if (a > INT_MAX / b) return -1;'
+print 'return a * b;'
+print '}'
+print 'static _X_INLINE int safe_pad(int a)'
+print '{'
+print 'int ret;'
+print 'if (a < 0) return -1;'
+print 'if ((ret = safe_add(a, 3)) < 0) return -1;'
+print 'return ret & (GLuint)~3;'
+print '}'
+print ''
+
 print '#ifndef __GNUC__'
 print '#  define __builtin_expect(x, y) x'
 print '#endif'
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/6] glapi: Fix whitespace droppings when printing the license header

2016-05-04 Thread Adam Jackson

Signed-off-by: Adam Jackson 
---
 src/mapi/glapi/gen/gl_XML.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mapi/glapi/gen/gl_XML.py b/src/mapi/glapi/gen/gl_XML.py
index 2e7123e..e11f6fc 100644
--- a/src/mapi/glapi/gen/gl_XML.py
+++ b/src/mapi/glapi/gen/gl_XML.py
@@ -130,7 +130,7 @@ class gl_print_base(object):
 % (self.name)
 print ''
 print '/*'
-print ' * ' + self.license.replace('\n', '\n * ')
+print (' * ' + self.license.replace('\n', '\n * ')).replace(' \n', 
'\n')
 print ' */'
 print ''
 if self.header_tag:
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 02/15] i965: Configure how to store 64PASSTHRU vertex components

2016-05-04 Thread Kenneth Graunke

On Wednesday, May 4, 2016 1:01:33 PM PDT Antía Puentes wrote:
> Hi Kenneth,
> 
> thanks for reviewing.
> 
> On mié, 2016-05-04 at 03:36 -0700, Kenneth Graunke wrote:
> > On Thursday, April 28, 2016 1:40:32 PM PDT Antia Puentes wrote:
> > > 
> > > From the Broadwell specification, structure VERTEX_ELEMENT_STATE
> > > description:
> > > 
> > >"When SourceElementFormat is set to one of the *64*_PASSTHRU
> > > formats,  64-bit components are stored in the URB without any
> > > conversion. In this case, vertex elements must be written as 128
> > > or 256 bits, with VFCOMP_STORE_0 being used to pad the output
> > > as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red 
> > component into
> > > 
> > > the URB, Component 1 must be specified as VFCOMP_STORE_0 (with
> > > Components 2,3 set to VFCOMP_NOSTORE) in order to output a 128-bit
> > > vertex element, or Components 1-3 must be specified as 
VFCOMP_STORE_0
> > > in order to output a 256-bit vertex element. Likewise, use of
> > > R64G64B64_PASSTHRU requires Component 3 to be specified as 
> > VFCOMP_STORE_0
> > > 
> > > in order to output a 256-bit vertex element."
> > > 
> > > Uses 128-bits to write double and dvec2 vertex elements, and 256-bits 
for
> > > dvec3 and dvec4 vertex elements.
> > > 
> > > Signed-off-by: Juan A. Suarez Romero 
> > > Signed-off-by: Antia Puentes 
> > > ---
> > >  src/mesa/drivers/dri/i965/gen8_draw_upload.c | 34 +

> > +++
> > > 
> > >  1 file changed, 34 insertions(+)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c b/src/mesa/
> > drivers/dri/i965/gen8_draw_upload.c
> > > 
> > > index fe5ed35..14f7091 100644
> > > --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c
> > > +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c
> > > @@ -217,6 +217,40 @@ gen8_emit_vertices(struct brw_context *brw)
> > >   break;
> > >}
> > >  
> > > +  /* From the BDW PRM, Volume 2d, page 586 (VERTEX_ELEMENT_STATE):
> > > +   *
> > > +   * "When SourceElementFormat is set to one of the 
*64*_PASSTHRU
> > > +   * formats, 64-bit components are stored in the URB without 
any
> > > +   * conversion. In this case, vertex elements must be written 
as 
> > 128
> > > 
> > > +   * or 256 bits, with VFCOMP_STORE_0 being used to pad the 
output
> > > +   * as required. E.g., if R64_PASSTHRU is used to copy a 64-
bit 
> > Red
> > > 
> > > +   * component into the URB, Component 1 must be specified as
> > > +   * VFCOMP_STORE_0 (with Components 2,3 set to VFCOMP_NOSTORE)
> > > +   * in order to output a 128-bit vertex element, or Components 
1-3 
> > must
> > > 
> > > +   * be specified as VFCOMP_STORE_0 in order to output a 256-
bit 
> > vertex
> > > 
> > > +   * element. Likewise, use of R64G64B64_PASSTHRU requires 
> > Component 3
> > > 
> > > +   * to be specified as VFCOMP_STORE_0 in order to output a 
256-bit 
> > vertex
> > > 
> > > +   * element."
> > > +   */
> > The above comment seems to indicate that R64 needs component 1 set to
> > STORE_0, and that looks to be missing.  I'm guessing you need to add:
> > 
> > > 
> > > +  if (input->glarray->Doubles) {
> > > + switch (input->glarray->Size) {
> > > + case 0:
> >comp0 = BRW_VE1_COMPONENT_STORE_0;
> >/* fallthrough */
> > 
> > > 
> > > + case 1:
> >comp1 = BRW_VE1_COMPONENT_STORE_0;
> >/* fallthrough */
> > 
> 
> I have not added them because comp0 and comp1 are initialized in the
> code immediately above the switch I added. That already existing code
> sets the default values for the components:
> 
> switch (input->glarray->Size) {
>   case 0: comp0 = BRW_VE1_COMPONENT_STORE_0;
>   case 1: comp1 = BRW_VE1_COMPONENT_STORE_0;
>   case 2: comp2 = BRW_VE1_COMPONENT_STORE_0;
>   case 3: comp3 = input->glarray->Integer ?
>BRW_VE1_COMPONENT_STORE_1_INT
>  : BRW_VE1_COMPONENT_STORE_1_FLT;
>  break;
>   }
> 
> In my switch I just override the values that are not right for the
> double-precision floating point cases.

Right, sorry :)  That seems totally fine then.

Patches 1-3 are:
Reviewed-by: Kenneth Graunke 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 01/15] i965: get the proper vertex surface type for doubles on gen8+

2016-05-04 Thread Kenneth Graunke

On Wednesday, May 4, 2016 3:30:13 PM PDT Alejandro Piñeiro wrote:
> 
> On 04/05/16 12:31, Kenneth Graunke wrote:
> > On Thursday, April 28, 2016 1:40:31 PM PDT Antia Puentes wrote:
> >> From: Alejandro Piñeiro 
> >>
> >> This commit adds support for PASSTHRU format when pushing
> >> double-precision attributes.
> >>
> >> Check glarray->Doubles in order to know if we should choose a format
> >> that does a conversion to float, or just passthru the 64-bit double.
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_draw_upload.c | 30 +++
++
> > +---
> >>  1 file changed, 27 insertions(+), 3 deletions(-)
> > What's the motivation for using the PASSTHRU formats?  Is there some
> > advantage over just using the R64*FLOAT formats?
> 
> We can't use R64*FLOAT because it does a double to float conversion, as
> explained at the bdw PRM (vol 7, section 10.15, page 470, Table "Source
> Element Formats Supported in VF Unit"). PASSTHRU ensures that the
> 64-bits are passed without any conversion, that is what we want. If it
> is not clear reading the commit message, I could expand it.

Ahh...thanks.  For some reason, I just assumed that R64_FLOAT meant
double precision float, not "reads a 64 bit float and outputs a 32 bit
float".  Heh :)

Thanks for the explanation!


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/nine: clean up WINAPI definition

2016-05-04 Thread Christian Schmidbauer

On Sun, May 1, 2016 at 6:03 PM, Axel Davy  wrote:
> Do we need the #ifndef WINAPI part ?
>
> Axel
>
>
> On 29/04/2016 20:53, Christian Schmidbauer wrote:
>>
>> As Emil pointed out, only gcc, clang and MSVC compatibility is required.
>> Hence the check for GNUC can be skipped, as __i386__ and __x86_64__ are
>> only defined for gcc/clang, not for MSVC.
>>
>> Remove the #undef which has been there for historic reasons, when wine
>> dlls for nine have been built inside mesa. Instead use #ifndef in order
>> to avoid redefining WINAPI from MSVC's headers.
>> ---
>>   include/D3D9/d3d9types.h | 16 +---
>>   1 file changed, 5 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/D3D9/d3d9types.h b/include/D3D9/d3d9types.h
>> index e0b8652..88f22b9 100644
>> --- a/include/D3D9/d3d9types.h
>> +++ b/include/D3D9/d3d9types.h
>> @@ -173,22 +173,16 @@ typedef struct _RGNDATA {
>>   #define D3DPRESENTFLAG_RESTRICTED_CONTENT  0x0400
>>   #define D3DPRESENTFLAG_RESTRICT_SHARED_RESOURCE_DRIVER 0x0800
>>   -
>> -#ifdef WINAPI
>> -#undef WINAPI
>> -#endif /* WINAPI*/
>> -
>> -#ifdef __GNUC__
>> -  #if (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64)
>> +/* Windows calling convention */
>> +#ifndef WINAPI
>> +  #if defined(__x86_64__) && !defined(__ILP32__)
>>   #define WINAPI __attribute__((ms_abi))
>> -  #elif defined(__i386) || defined(_M_IX86)
>> +  #elif defined(__i386__)
>>   #define WINAPI __attribute__((__stdcall__))
>> #else /* neither amd64 nor i386 */
>>   #define WINAPI
>> #endif
>> -#else /* __GNUC__ */
>> -  #define WINAPI
>> -#endif
>> +#endif /* WINAPI */
>> /* Implementation caps */
>>   #define D3DPRESENT_BACK_BUFFERS_MAX3
>

If you do not care about compatibility with MSVC, you can skip it.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/nine: clean up WINAPI definition

2016-05-04 Thread Christian Schmidbauer

On Sun, May 1, 2016 at 12:02 PM, Emil Velikov  wrote:
> Hi Christian,
>
> On 29 April 2016 at 19:53, Christian Schmidbauer
>  wrote:
>> As Emil pointed out, only gcc, clang and MSVC compatibility is required.
>> Hence the check for GNUC can be skipped, as __i386__ and __x86_64__ are
>> only defined for gcc/clang, not for MSVC.
>>
>> Remove the #undef which has been there for historic reasons, when wine
>> dlls for nine have been built inside mesa. Instead use #ifndef in order
>> to avoid redefining WINAPI from MSVC's headers.
> Reviewed-by: Emil Velikov 
>
> Thanks for picking this up. Should we apply this for stable as well
> (considering the patch it 'fixes' is already in there) ?
> I'm slightly concerned about the removal of undef WINAPI - I take it
> that you've tested the patch ?
>
> Humbly waiting for an ack/r-b/t-b from someone (else) involved in Nine ;-)
>
> Thanks
> Emil

I didn't CC stable as the patch just cleans up the code and does not
really fix anything. I don't mind applying this to stable though.

I also talked to zhasha on IRC and he also thinks it was only
necessary in the past. I did test the patch with wine (on amd64) and
Xnine (on x32, i386).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #6 from Jose Fonseca  ---
(In reply to David Lonie from comment #3)
> I included an apitrace on the last bug, and was told that it was not
> suitable for reproducing bugs (something about apitrace forcing a 4.5
> context IIRC). I can make one for this issue if anyone needs it.

I agree a trace would be useful.

If the issue repros with Mesa + llvmpipe, then just record the trace with Mesa
+ llvmpipe.

If you don't record the trace with the exact same OpenGL implementation, then
all sort of things can go wrong.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #4 from Ilia Mirkin  ---
(In reply to David Lonie from comment #3)
> I included an apitrace on the last bug, and was told that it was not
> suitable for reproducing bugs (something about apitrace forcing a 4.5
> context IIRC). I can make one for this issue if anyone needs it.

Looking at it...

2 @0 glXCreateContextAttribsARB(dpy = 0x6c4f20, config = 0x72c1c0,
share_context = NULL, direct = True, attrib_list =
[GLX_CONTEXT_MAJOR_VERSION_ARB, 4, GLX_CONTEXT_MINOR_VERSION_ARB, 5, 0]) =
0x74cf10

don't do that? :) Mesa does not support GL 4.5, was that trace captured on some
proprietary driver that does? Can you change VTK (even temporarily) to only
request GL 3.0? Or alternatively flip to requesting a core context (which that
trace isn't doing) and request GL 3.2?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #5 from Ilia Mirkin  ---
FWIW I was able to run your trace from the other bug with

MESA_GL_VERSION_OVERRIDE=4.5 MESA_GLSL_VERSION_OVERRIDE=450

so I suspect that a trace of this issue would be nice to have as well.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] About tolerance calculation on specific (builtin) functions

2016-05-04 Thread Jan Vesely

On Wed, 2016-05-04 at 13:48 -0400, Ilia Mirkin wrote:
> On Wed, May 4, 2016 at 1:41 PM, Connor Abbott 
> wrote:
> > 
> > On Wed, May 4, 2016 at 1:05 PM, Andres Gomez 
> > wrote:
> > > 
> > > Hi,
> > > 
> > > as part of the work done to "Add FP64 support to the i965 shader
> > > backends" at:
> > > https://bugs.freedesktop.org/show_bug.cgi?id=92760
> > > 
> > > I've been working to add piglit tests that would check the new
> > > features
> > > added by this addition.
> > > 
> > > Due to this, I've been checking and making modifications into the
> > > builtin_functions*py module used by some generators. These
> > > modules use
> > > automatic calculation of the tolerance when distance checking the
> > > result of a function.
> > > 
> > > As already stated in the good documentation of the module, the
> > > tolerance is computed following what it is stated in OpenGL's
> > > specs:
> > > 
> > > From the OpenGL 1.4 spec (2.1.1 "Floating-Point
> > > Computation"):
> > > 
> > >   "We require simply that numbers' floating-point parts
> > > contain
> > >   enough bits ... so that individual results of floating-
> > > point
> > >   operations are accurate to about 1 part in 10^5."
> > > 
> > > Although the text is open to interpretation, and for specific
> > > operations we take a little bit more flexible approach,
> > > basically, the
> > > tolerance is calculated as:
> > > 
> > > tolerance =  / 10⁵
> > > 
> > > This makes sense since the precision of a floating point value
> > > gets
> > > reduced while the number gets bigger[1].
> > > 
> > > Following this approach, for a number in the order of 40*10⁵, the
> > > tolerance used is ~40. While this should be OK for most of the
> > > functions, it seems to me that such a high tolerance should not
> > > be used
> > > with specific functions, if any tolerance should be used at all.
> > > 
> > > For example, when testing the "sign()" function, seems pretty
> > > obvious
> > > that using a value of 40 in the tolerance of a function that
> > > should
> > > return either 1.0, 0.0 or -1.0 doesn't make much sense.
> > > 
> > > A similar case is the "trunc" function and probably others, like
> > > "floor", "ceil", "abs", etc.
> > > 
> > > My conclusion is that it should be safe to assume no tolerance in
> > > this
> > > functions and I could modify the algorithm used for them in the
> > > python
> > > module but I wanted to have some feedback in case I'm not taking
> > > into
> > > account something that advices against doing these modifications.
> > > 
> > > Opinions?
> > Hi,
> > 
> > If you look at the GLSL 4.40 spec, in section 4.7.1 ("Range and
> > Precision") you'll find a table listing the precision of various
> > operations. Your intuition about floor(), ceil(), etc. needing the
> > exact result is correct, as you can see. For doubles, it says "The
> > precision of double-precision operations is at least that of single
> > precision." Now, it's up for interpretation whether that means that
> > they must have the same *absolute* precision or the same ULP's (if
> > the
> > operation is not exact). For example inversesqrt() is listed at 2
> > ULP
> > for single precision, which means that there must be 24 - 2 = 22
> > bits
> > of precision. For doubles, are there still 22 bits of precision
> > required, or is the requirement really that there still be 2 ULP's
> > of
> > precision in which case there are 53 - 2 = 51 bits of precision. I
> > wrote the original lowering pass for inversesqrt() and friends
> > assuming the latter was correct, since it seems like the most sane
> > to
> > me (or else doubles would have no advantage over floats for
> > anything
> > except addition and multiplication).
> All of this is added by ARB_shader_precision (part of GLSL 4.10). It
> also punts on doubles, good to know that the latest specs have
> maintained the status quo. One interpretation of it is "you can
> implement doubles with float ops". Another is that the ULP's apply to
> doubles as well.

Just out of curiosity.
Is OpenGL's ULP defined anywhere?
The OpenCL's definition is "distance between the two non-equal finite
floating-point numbers nearest x". the required values are same as the
ones in ARB_shader_precision which would hint to use the same
definition. defining ULP as digits in binary representation gives you
exponentially higher errors.

Jan

> 
> Prior to that ext, the precision of everything was to 10^-5.
> 
>   -ilia
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-- 
Jan Vesely 


signature.asc
Description: This is a digitally signed message part
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #3 from David Lonie  ---
I included an apitrace on the last bug, and was told that it was not suitable
for reproducing bugs (something about apitrace forcing a 4.5 context IIRC). I
can make one for this issue if anyone needs it.

I share concerns about running random binaries, and have included instructions
for building VTK from official sources to reproduce the issue for the more wary
developers among us :)

I've only tested using the mesa configuration options provided in the report.
If there are other configurations you'd like me to test, I'd be happy to do so.
Just let me know the exact options to specify -- mesa's configure options are a
bit of a black box to me.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] About tolerance calculation on specific (builtin) functions

2016-05-04 Thread Eirik Byrkjeflot Anonsen

Andres Gomez  writes:

> Hi,
>
> as part of the work done to "Add FP64 support to the i965 shader
> backends" at:
> https://bugs.freedesktop.org/show_bug.cgi?id=92760
>
> I've been working to add piglit tests that would check the new features
> added by this addition.
>
> Due to this, I've been checking and making modifications into the
> builtin_functions*py module used by some generators. These modules use
> automatic calculation of the tolerance when distance checking the
> result of a function.
>
> As already stated in the good documentation of the module, the
> tolerance is computed following what it is stated in OpenGL's specs:
>
> From the OpenGL 1.4 spec (2.1.1 "Floating-Point Computation"):
>
>   "We require simply that numbers' floating-point parts contain
>   enough bits ... so that individual results of floating-point
>   operations are accurate to about 1 part in 10^5."
>
> Although the text is open to interpretation, and for specific
> operations we take a little bit more flexible approach, basically, the
> tolerance is calculated as:
>
> tolerance =  / 10⁵
>
> This makes sense since the precision of a floating point value gets
> reduced while the number gets bigger[1].
>
> Following this approach, for a number in the order of 40*10⁵, the
> tolerance used is ~40. While this should be OK for most of the
> functions, it seems to me that such a high tolerance should not be used
> with specific functions, if any tolerance should be used at all.
>
> For example, when testing the "sign()" function, seems pretty obvious
> that using a value of 40 in the tolerance of a function that should
> return either 1.0, 0.0 or -1.0 doesn't make much sense.

That's an interesting interpretation. I would assume that a tolerance of
1 part in 10^5 for any of these three values would be somewhat less than
40 :)

Although that does bring up an interesting point: To get things correct,
probably the tolerance has to apply to all values in the chain, both the
inputs, outputs and any intermediates. Consider for example the case of

a = 40 * 10^5 + 1
b = 40 * 10^5 - 1
c = a - b

What is the correct tolerance for c?

eirik

> A similar case is the "trunc" function and probably others, like
> "floor", "ceil", "abs", etc.
>
> My conclusion is that it should be safe to assume no tolerance in this
> functions and I could modify the algorithm used for them in the python
> module but I wanted to have some feedback in case I'm not taking into
> account something that advices against doing these modifications.
>
> Opinions?
>
> [1] https://en.wikipedia.org/wiki/IEEE_floating_point#Basic_and_interch
> ange_formats
> -- 
> Br,
>
> Andres
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

Ilia Mirkin  changed:

   What|Removed |Added

 Attachment #123465|text/plain  |application/bzip2
  mime type||

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #2 from Ilia Mirkin  ---
You might consider including an apitrace that reproduces the issue. People are
probably going to be less inclined to run some random binary off the interwebs.

Also, is this only a bug with llvmpipe (and/or gallivm), or have you tested
this with various hw drivers in mesa? (Sorry, your wrote a lot of text and I
only skimmed it.)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 94955] Uninitialized variables leads to random segfaults (valgrind log, apitrace attached)

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=94955

--- Comment #23 from David Lonie  ---
Opened a new bug:

https://bugs.freedesktop.org/show_bug.cgi?id=95266

Feel free to close this one.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

--- Comment #1 from David Lonie  ---
The "Reproduction of issue" attachment is a tar.bz2 file, btw. When I try to
download it it just dumps the binary as character data to my browser...Let me
know if I should re-upload it and do something differently.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95266] Geometry missing from rendering, only when using Mesa.

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95266

Bug ID: 95266
   Summary: Geometry missing from rendering, only when using Mesa.
   Product: Mesa
   Version: git
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: david.lo...@kitware.com
QA Contact: mesa-dev@lists.freedesktop.org

Created attachment 123465
  --> https://bugs.freedesktop.org/attachment.cgi?id=123465&action=edit
Reproduction of issue

VTK's new depth peeling implementation has been tested on numerous platforms
(mac, windows, linux, iOS/android ES3) and various OpenGL implementations
(nVidia, ATI, Intel, and Mesa), but only fails when using mesa.

This issue was first described in another bug report: 

https://bugs.freedesktop.org/show_bug.cgi?id=94955

but the issue has moved away from the original topic of that report. From the
discussion there, it appears to be some issue in the floating point textures
used in the new peeling code, as the samplers are returning NaNs (If I
understand the discussion correctly). Indeed, the old peeling implementation
(which works on Mesa) only used 32-bit RGBA textures.

I've provided an executable from VTK that reproduces the issue, a compiled mesa
libGL.so, and sample outputs using nVidia and Mesa OpenGL implementations,
along with details on how VTK and Mesa were configured.

The README included in the tarball is pasted below for further details.

-

This directory contains a reproduction of a Mesa OpenGL bug that
affects VTK's new depth peeling code. It contains the following files:

libGL.so - A compiled mesa OpenGL implementation, built against 38fcf7cb.
run-mesa.sh - A script to run the test executable with the included mesa
library.
run-sysGL.sh - A script to run the test executable using the system OpenGL
library.
TestTranslucentLUTDepthPeeling-valid.png - Example valid output for test.
TestTranslucentLUTDepthPeeling-mesa.png - Incorrect output for test from mesa.
vtkRenderingCoreCxxTests -- The test executable.

# Reproducing the bug

Execute run-sysGL.sh. This will run the TestTranslucentLUTDepthPeeling
test in the vtkRenderingCoreCxxTests executable using the libGL.so
found in LD_LIBRARY_PATH. This code has been successfully tested on
windows, mac, iOS/android GL ES3, and linux, using nVidia, intel, and
ATI OpenGL implementations. The output should resemble the included
'valid' png.

Execute run-mesa.sh. This runs the test using the included libGL.so
(change the LD_PRELOAD value in the script to test a different
library). The output will resemble the included 'mesa' png -- just the
background, with no visible geometry.

# Notes: Building Mesa

The included mesa libGL.so was compiled with the options:

./autogen.sh \
--enable-debug \
--prefix=/ssd/src/mesa-master-install-nomangle \
--disable-dri \
--disable-egl \
--disable-gles1 \
--disable-gles2 \
--disable-shared-glapi \
--enable-xlib-glx \
--enable-gallium-osmesa \
--with-gallium-drivers=swrast \
--enable-gallium-llvm=yes \
LLVM_CONFIG=/path/to/llvm-3.8.0/llvm-config \
--enable-llvm-shared-libs

with debugging symbols stripped later to reduce file size.

# Notes: Building VTK

The VTK test can be built by downloading the VTK source code from our
git repo:

https://gitlab.kitware.com/vtk/vtk.git

I built against HEAD acd85fe48d.

There is currently a workaround that falls back to an older depth
peeling implementation for Mesa 11.3.0(-devel). To reproduce the bug,
edit Rendering/OpenGL2/vtkOpenGLRenderer.cxx and remove the version check in
vtkOpenGLRenderer::DeviceRenderTranslucentPolygonalGeometry, around line 307:

  std::string glVersion =
  reinterpret_cast(glGetString(GL_VERSION));
  if (glVersion.find("Mesa 11.3.0") != std::string::npos)
{
vtkDebugMacro("Disabling dual depth peeling -- mesa bug detected. "
  "GL_VERSION = " << glVersion);
dualDepthPeelingSupported = false;
}  
}

Configure using cmake:

mkdir vtk-build
cd vtk-build
cmake path/to/vtk/sourcecode/ \
  -DVTK_RENDERING_BACKEND=OpenGL2 \
  -DOPENGL_INCLUDE_DIR=/path/to/installed/mesa/include
  -DOPENGL_gl_LIBRARY=/path/to/installed/mesa/lib/libGL.so
  -DOPENGL_glu_LIBRARY=""

Run 'make' to build, and run the depth peeling tests:

cd vtk-build
ctest -R DepthPeel

(use "ctest -V -R 'SomeTestName'" to see the actual
executables/arguments/output)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] About tolerance calculation on specific (builtin) functions

2016-05-04 Thread Ilia Mirkin

On Wed, May 4, 2016 at 1:41 PM, Connor Abbott  wrote:
> On Wed, May 4, 2016 at 1:05 PM, Andres Gomez  wrote:
>> Hi,
>>
>> as part of the work done to "Add FP64 support to the i965 shader
>> backends" at:
>> https://bugs.freedesktop.org/show_bug.cgi?id=92760
>>
>> I've been working to add piglit tests that would check the new features
>> added by this addition.
>>
>> Due to this, I've been checking and making modifications into the
>> builtin_functions*py module used by some generators. These modules use
>> automatic calculation of the tolerance when distance checking the
>> result of a function.
>>
>> As already stated in the good documentation of the module, the
>> tolerance is computed following what it is stated in OpenGL's specs:
>>
>> From the OpenGL 1.4 spec (2.1.1 "Floating-Point Computation"):
>>
>>   "We require simply that numbers' floating-point parts contain
>>   enough bits ... so that individual results of floating-point
>>   operations are accurate to about 1 part in 10^5."
>>
>> Although the text is open to interpretation, and for specific
>> operations we take a little bit more flexible approach, basically, the
>> tolerance is calculated as:
>>
>> tolerance =  / 10⁵
>>
>> This makes sense since the precision of a floating point value gets
>> reduced while the number gets bigger[1].
>>
>> Following this approach, for a number in the order of 40*10⁵, the
>> tolerance used is ~40. While this should be OK for most of the
>> functions, it seems to me that such a high tolerance should not be used
>> with specific functions, if any tolerance should be used at all.
>>
>> For example, when testing the "sign()" function, seems pretty obvious
>> that using a value of 40 in the tolerance of a function that should
>> return either 1.0, 0.0 or -1.0 doesn't make much sense.
>>
>> A similar case is the "trunc" function and probably others, like
>> "floor", "ceil", "abs", etc.
>>
>> My conclusion is that it should be safe to assume no tolerance in this
>> functions and I could modify the algorithm used for them in the python
>> module but I wanted to have some feedback in case I'm not taking into
>> account something that advices against doing these modifications.
>>
>> Opinions?
>
> Hi,
>
> If you look at the GLSL 4.40 spec, in section 4.7.1 ("Range and
> Precision") you'll find a table listing the precision of various
> operations. Your intuition about floor(), ceil(), etc. needing the
> exact result is correct, as you can see. For doubles, it says "The
> precision of double-precision operations is at least that of single
> precision." Now, it's up for interpretation whether that means that
> they must have the same *absolute* precision or the same ULP's (if the
> operation is not exact). For example inversesqrt() is listed at 2 ULP
> for single precision, which means that there must be 24 - 2 = 22 bits
> of precision. For doubles, are there still 22 bits of precision
> required, or is the requirement really that there still be 2 ULP's of
> precision in which case there are 53 - 2 = 51 bits of precision. I
> wrote the original lowering pass for inversesqrt() and friends
> assuming the latter was correct, since it seems like the most sane to
> me (or else doubles would have no advantage over floats for anything
> except addition and multiplication).

All of this is added by ARB_shader_precision (part of GLSL 4.10). It
also punts on doubles, good to know that the latest specs have
maintained the status quo. One interpretation of it is "you can
implement doubles with float ops". Another is that the ULP's apply to
doubles as well.

Prior to that ext, the precision of everything was to 10^-5.

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] About tolerance calculation on specific (builtin) functions

2016-05-04 Thread Connor Abbott

On Wed, May 4, 2016 at 1:05 PM, Andres Gomez  wrote:
> Hi,
>
> as part of the work done to "Add FP64 support to the i965 shader
> backends" at:
> https://bugs.freedesktop.org/show_bug.cgi?id=92760
>
> I've been working to add piglit tests that would check the new features
> added by this addition.
>
> Due to this, I've been checking and making modifications into the
> builtin_functions*py module used by some generators. These modules use
> automatic calculation of the tolerance when distance checking the
> result of a function.
>
> As already stated in the good documentation of the module, the
> tolerance is computed following what it is stated in OpenGL's specs:
>
> From the OpenGL 1.4 spec (2.1.1 "Floating-Point Computation"):
>
>   "We require simply that numbers' floating-point parts contain
>   enough bits ... so that individual results of floating-point
>   operations are accurate to about 1 part in 10^5."
>
> Although the text is open to interpretation, and for specific
> operations we take a little bit more flexible approach, basically, the
> tolerance is calculated as:
>
> tolerance =  / 10⁵
>
> This makes sense since the precision of a floating point value gets
> reduced while the number gets bigger[1].
>
> Following this approach, for a number in the order of 40*10⁵, the
> tolerance used is ~40. While this should be OK for most of the
> functions, it seems to me that such a high tolerance should not be used
> with specific functions, if any tolerance should be used at all.
>
> For example, when testing the "sign()" function, seems pretty obvious
> that using a value of 40 in the tolerance of a function that should
> return either 1.0, 0.0 or -1.0 doesn't make much sense.
>
> A similar case is the "trunc" function and probably others, like
> "floor", "ceil", "abs", etc.
>
> My conclusion is that it should be safe to assume no tolerance in this
> functions and I could modify the algorithm used for them in the python
> module but I wanted to have some feedback in case I'm not taking into
> account something that advices against doing these modifications.
>
> Opinions?

Hi,

If you look at the GLSL 4.40 spec, in section 4.7.1 ("Range and
Precision") you'll find a table listing the precision of various
operations. Your intuition about floor(), ceil(), etc. needing the
exact result is correct, as you can see. For doubles, it says "The
precision of double-precision operations is at least that of single
precision." Now, it's up for interpretation whether that means that
they must have the same *absolute* precision or the same ULP's (if the
operation is not exact). For example inversesqrt() is listed at 2 ULP
for single precision, which means that there must be 24 - 2 = 22 bits
of precision. For doubles, are there still 22 bits of precision
required, or is the requirement really that there still be 2 ULP's of
precision in which case there are 53 - 2 = 51 bits of precision. I
wrote the original lowering pass for inversesqrt() and friends
assuming the latter was correct, since it seems like the most sane to
me (or else doubles would have no advantage over floats for anything
except addition and multiplication).

Also, you might want to modify the FP64 tests to test the additional
range that doubles provide. Last time I looked at them, they didn't do
that.

Connor

>
> [1] https://en.wikipedia.org/wiki/IEEE_floating_point#Basic_and_interch
> ange_formats
> --
> Br,
>
> Andres
>
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] About tolerance calculation on specific (builtin) functions

2016-05-04 Thread Matt Turner

On Wed, May 4, 2016 at 10:05 AM, Andres Gomez  wrote:
> Hi,
>
> as part of the work done to "Add FP64 support to the i965 shader
> backends" at:
> https://bugs.freedesktop.org/show_bug.cgi?id=92760
>
> I've been working to add piglit tests that would check the new features
> added by this addition.
>
> Due to this, I've been checking and making modifications into the
> builtin_functions*py module used by some generators. These modules use
> automatic calculation of the tolerance when distance checking the
> result of a function.
>
> As already stated in the good documentation of the module, the
> tolerance is computed following what it is stated in OpenGL's specs:
>
> From the OpenGL 1.4 spec (2.1.1 "Floating-Point Computation"):
>
>   "We require simply that numbers' floating-point parts contain
>   enough bits ... so that individual results of floating-point
>   operations are accurate to about 1 part in 10^5."
>
> Although the text is open to interpretation, and for specific
> operations we take a little bit more flexible approach, basically, the
> tolerance is calculated as:
>
> tolerance =  / 10⁵
>
> This makes sense since the precision of a floating point value gets
> reduced while the number gets bigger[1].
>
> Following this approach, for a number in the order of 40*10⁵, the
> tolerance used is ~40. While this should be OK for most of the
> functions, it seems to me that such a high tolerance should not be used
> with specific functions, if any tolerance should be used at all.
>
> For example, when testing the "sign()" function, seems pretty obvious
> that using a value of 40 in the tolerance of a function that should
> return either 1.0, 0.0 or -1.0 doesn't make much sense.
>
> A similar case is the "trunc" function and probably others, like
> "floor", "ceil", "abs", etc.
>
> My conclusion is that it should be safe to assume no tolerance in this
> functions and I could modify the algorithm used for them in the python
> module but I wanted to have some feedback in case I'm not taking into
> account something that advices against doing these modifications.
>
> Opinions?

Yes, I agree. I noticed the "trunc" test provided some tolerance, and
really... it shouldn't. We should be able to calculate trunc (and a
bunch of other functions like you mention) exactly.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC] add MAINTAINERS and get_maintainer.pl script

2016-05-04 Thread Rob Clark

On Wed, May 4, 2016 at 1:04 PM, Eric Engestrom
 wrote:
> On Tue, May 03, 2016 at 11:11:56AM -0400, Rob Clark wrote:
>>
>> Ok, I've renamed to REVIEWERS and scripts/get_reviewer.pl and updated
>> the verbage appropriately.
>
> scripts/get_reviewer.pl still obeys .get_maintainer.conf and
> .get_maintainer.ignore. Is that something we want to make more
> consistent?

maybe?

tbh, I only tried to make minimal changes compared to linux get_maintainer.pl

>> I think I'll go ahead and push it this afternoon if nobody screams.
>> No formal ack-by's, but a handful of yeah-thats-a-good-idea-by's and
>> no objections so far.
>
> I just tested it, and it barfed some UTF8-as-ASCII at me, along with
> an error message and some text that shouldn't be there:
>
>   Bad divisor in main::vcs_assign: 0
>   (cc-cmd) Adding cc: "GitAuthor: Jason Ekstrand"  
> from: 'scripts/get_reviewer.pl'
>   (cc-cmd) Adding cc: =?UTF-8?q?GitAuthor=3A=20Kristian=20H=C3=B8gsberg?= 
>  from: 'scripts/get_reviewer.pl'
>   (cc-cmd) Adding cc: Chad Versace  from: 
> 'scripts/get_reviewer.pl'
>   (cc-cmd) Adding cc: =?UTF-8?q?Kristian=20H=C3=B8gsberg?= 
>  from: 'scripts/get_reviewer.pl'
>
> The UTF-8 markup *might* be necessary if whatever consumes this doesn't
> understand actual UTF-8, but the error and the extra "GitAuthor: "
> definitely don't belong :]

I think the UTF-8 is necessary when encoding email addresses.  This
has been how it has always worked for me in the past.  No idea about
the "GitAuthor:" bits..  in the end the CC's that it has added when
used from git-send-email have seemed correct to me.

BR,
-R


> Cheers,
>   Eric
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] About tolerance calculation on specific (builtin) functions

2016-05-04 Thread Andres Gomez

Hi,

as part of the work done to "Add FP64 support to the i965 shader
backends" at:
https://bugs.freedesktop.org/show_bug.cgi?id=92760

I've been working to add piglit tests that would check the new features
added by this addition.

Due to this, I've been checking and making modifications into the
builtin_functions*py module used by some generators. These modules use
automatic calculation of the tolerance when distance checking the
result of a function.

As already stated in the good documentation of the module, the
tolerance is computed following what it is stated in OpenGL's specs:

From the OpenGL 1.4 spec (2.1.1 "Floating-Point Computation"):

  "We require simply that numbers' floating-point parts contain
  enough bits ... so that individual results of floating-point
  operations are accurate to about 1 part in 10^5."

Although the text is open to interpretation, and for specific
operations we take a little bit more flexible approach, basically, the
tolerance is calculated as:

tolerance =  / 10⁵

This makes sense since the precision of a floating point value gets
reduced while the number gets bigger[1].

Following this approach, for a number in the order of 40*10⁵, the
tolerance used is ~40. While this should be OK for most of the
functions, it seems to me that such a high tolerance should not be used
with specific functions, if any tolerance should be used at all.

For example, when testing the "sign()" function, seems pretty obvious
that using a value of 40 in the tolerance of a function that should
return either 1.0, 0.0 or -1.0 doesn't make much sense.

A similar case is the "trunc" function and probably others, like
"floor", "ceil", "abs", etc.

My conclusion is that it should be safe to assume no tolerance in this
functions and I could modify the algorithm used for them in the python
module but I wanted to have some feedback in case I'm not taking into
account something that advices against doing these modifications.

Opinions?

[1] https://en.wikipedia.org/wiki/IEEE_floating_point#Basic_and_interch
ange_formats
-- 
Br,

Andres



signature.asc
Description: This is a digitally signed message part
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC] add MAINTAINERS and get_maintainer.pl script

2016-05-04 Thread Eric Engestrom

On Tue, May 03, 2016 at 11:11:56AM -0400, Rob Clark wrote:
> 
> Ok, I've renamed to REVIEWERS and scripts/get_reviewer.pl and updated
> the verbage appropriately.

scripts/get_reviewer.pl still obeys .get_maintainer.conf and
.get_maintainer.ignore. Is that something we want to make more
consistent?

> I think I'll go ahead and push it this afternoon if nobody screams.
> No formal ack-by's, but a handful of yeah-thats-a-good-idea-by's and
> no objections so far.

I just tested it, and it barfed some UTF8-as-ASCII at me, along with
an error message and some text that shouldn't be there:

  Bad divisor in main::vcs_assign: 0
  (cc-cmd) Adding cc: "GitAuthor: Jason Ekstrand"  
from: 'scripts/get_reviewer.pl'
  (cc-cmd) Adding cc: =?UTF-8?q?GitAuthor=3A=20Kristian=20H=C3=B8gsberg?= 
 from: 'scripts/get_reviewer.pl'
  (cc-cmd) Adding cc: Chad Versace  from: 
'scripts/get_reviewer.pl'
  (cc-cmd) Adding cc: =?UTF-8?q?Kristian=20H=C3=B8gsberg?= 
 from: 'scripts/get_reviewer.pl'

The UTF-8 markup *might* be necessary if whatever consumes this doesn't
understand actual UTF-8, but the error and the extra "GitAuthor: "
definitely don't belong :]

Cheers,
  Eric
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] main: uses casts to silence some _mesa_debug() format warnings

2016-05-04 Thread Sinclair Yeh

Reviewed-by: Sinclair Yeh 

On Mon, May 02, 2016 at 07:15:11PM -0600, Brian Paul wrote:
> Silences warnings with 32-bit Linux gcc builds and MinGW which doesn't
> recognize the ‘t’ conversion character.
> ---
>  src/mesa/main/bufferobj.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c
> index 731b62e..e60a8ea 100644
> --- a/src/mesa/main/bufferobj.c
> +++ b/src/mesa/main/bufferobj.c
> @@ -2487,8 +2487,9 @@ _mesa_map_buffer_range(struct gl_context *ctx,
>  
> if (offset + length > bufObj->Size) {
>_mesa_error(ctx, GL_INVALID_VALUE,
> -  "%s(offset %td + length %td > buffer_size %td)", func,
> -  offset, length, bufObj->Size);
> +  "%s(offset %lu + length %lu > buffer_size %lu)", func,
> +  (unsigned long) offset, (unsigned long) length,
> +  (unsigned long) bufObj->Size);
>return NULL;
> }
>  
> @@ -3749,8 +3750,9 @@ _mesa_BindBufferRange(GLenum target, GLuint index,
> struct gl_buffer_object *bufObj;
>  
> if (MESA_VERBOSE & VERBOSE_API) {
> -  _mesa_debug(ctx, "glBindBufferRange(%s, %u, %u, %ld, %ld)\n",
> -  _mesa_enum_to_string(target), index, buffer, offset, size);
> +  _mesa_debug(ctx, "glBindBufferRange(%s, %u, %u, %lu, %lu)\n",
> +  _mesa_enum_to_string(target), index, buffer,
> +  (unsigned long) offset, (unsigned long) size);
> }
>  
> if (buffer == 0) {
> -- 
> 1.9.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] swr: Remove stall waiting for core query counters.

2016-05-04 Thread Kyriazis, George

Reviewed-By: George Kyriazis 

> -Original Message-
> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> Behalf Of BruceCherniak
> Sent: Thursday, April 28, 2016 12:13 PM
> To: mesa-dev@lists.freedesktop.org
> Subject: [Mesa-dev] [PATCH] swr: Remove stall waiting for core query
> counters.
> 
> When gathering query results, swr_gather_stats was
> unnecessarily stalling the entire pipeline.  Results are now
> collected asynchronously, with a fence marking completion.
> ---
>  src/gallium/drivers/swr/swr_fence.cpp |6 -
>  src/gallium/drivers/swr/swr_fence.h   |8 ++
>  src/gallium/drivers/swr/swr_query.cpp |  180 --
> ---
>  src/gallium/drivers/swr/swr_query.h   |   11 ++-
>  4 files changed, 81 insertions(+), 124 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/swr_fence.cpp
> b/src/gallium/drivers/swr/swr_fence.cpp
> index 2e95b39..8a8e864 100644
> --- a/src/gallium/drivers/swr/swr_fence.cpp
> +++ b/src/gallium/drivers/swr/swr_fence.cpp
> @@ -105,12 +105,6 @@ swr_fence_reference(struct pipe_screen *screen,
>swr_fence_destroy(old);
>  }
> 
> -static INLINE boolean
> -swr_is_fence_done(struct pipe_fence_handle *fence_handle)
> -{
> -   struct swr_fence *fence = swr_fence(fence_handle);
> -   return (fence->read == fence->write);
> -}
> 
>  /*
>   * Wait for the fence to finish.
> diff --git a/src/gallium/drivers/swr/swr_fence.h
> b/src/gallium/drivers/swr/swr_fence.h
> index df3776e..47f4d2e 100644
> --- a/src/gallium/drivers/swr/swr_fence.h
> +++ b/src/gallium/drivers/swr/swr_fence.h
> @@ -45,6 +45,14 @@ swr_fence(struct pipe_fence_handle *fence)
> return (struct swr_fence *)fence;
>  }
> 
> +
> +static INLINE boolean
> +swr_is_fence_done(struct pipe_fence_handle *fence_handle)
> +{
> +   struct swr_fence *fence = swr_fence(fence_handle);
> +   return (fence->read == fence->write);
> +}
> +
>  static INLINE boolean
>  swr_is_fence_pending(struct pipe_fence_handle *fence_handle)
>  {
> diff --git a/src/gallium/drivers/swr/swr_query.cpp
> b/src/gallium/drivers/swr/swr_query.cpp
> index f038a6e..5c59965 100644
> --- a/src/gallium/drivers/swr/swr_query.cpp
> +++ b/src/gallium/drivers/swr/swr_query.cpp
> @@ -62,10 +62,8 @@ swr_destroy_query(struct pipe_context *pipe, struct
> pipe_query *q)
> struct swr_query *pq = swr_query(q);
> 
> if (pq->fence) {
> -  if (!swr_is_fence_pending(pq->fence)) {
> - swr_fence_submit(swr_context(pipe), pq->fence);
> +  if (swr_is_fence_pending(pq->fence))
>   swr_fence_finish(pipe->screen, pq->fence, 0);
> -  }
>swr_fence_reference(pipe->screen, &pq->fence, NULL);
> }
> 
> @@ -73,100 +71,45 @@ swr_destroy_query(struct pipe_context *pipe,
> struct pipe_query *q)
>  }
> 
> 
> -// XXX Create a fence callback, rather than stalling SwrWaitForIdle
>  static void
>  swr_gather_stats(struct pipe_context *pipe, struct swr_query *pq)
>  {
> struct swr_context *ctx = swr_context(pipe);
> 
> assert(pq->result);
> -   union pipe_query_result *result = pq->result;
> +   struct swr_query_result *result = pq->result;
> boolean enable_stats = pq->enable_stats;
> -   SWR_STATS swr_stats = {0};
> -
> -   if (pq->fence) {
> -  if (!swr_is_fence_pending(pq->fence)) {
> - swr_fence_submit(ctx, pq->fence);
> - swr_fence_finish(pipe->screen, pq->fence, 0);
> -  }
> -  swr_fence_reference(pipe->screen, &pq->fence, NULL);
> -   }
> 
> -   /*
> -* These queries don't need SWR Stats enabled in the core
> -* Set and return.
> -*/
> +   /* A few results don't require the core, so don't involve it */
> switch (pq->type) {
> case PIPE_QUERY_TIMESTAMP:
> case PIPE_QUERY_TIME_ELAPSED:
> -  result->u64 = swr_get_timestamp(pipe->screen);
> -  return;
> +  result->timestamp = swr_get_timestamp(pipe->screen);
>break;
> case PIPE_QUERY_TIMESTAMP_DISJOINT:
> -  /* nothing to do here */
> -  return;
> -  break;
> case PIPE_QUERY_GPU_FINISHED:
> -  result->b = TRUE; /* XXX TODO Add an api func to SWR to compare
> drawId
> -   vs LastRetiredId? */
> -  return;
> +  /* nothing to do here */
>break;
> default:
> -  /* Any query that needs SwrCore stats */
> -  break;
> -   }
> -
> -   /*
> -* All other results are collected from SwrCore counters
> -*/
> +  /*
> +   * All other results are collected from SwrCore counters via
> +   * SwrGetStats. This returns immediately, but results are later filled
> +   * in by the backend.  Fence status is the only indication of
> +   * completion.  */
> +  SwrGetStats(ctx->swrContext, &result->core);
> +
> +  if (!pq->fence) {
> + struct swr_screen *screen = swr_screen(pipe->screen);
> + swr_fence_reference(pipe->screen, &pq->fence, screen-
> >flush_fence);
> +  }
> +  swr_fence_submit(ctx, pq->fence);
> 
> -   /* XXX, Should turn this into a fence

Re: [Mesa-dev] [PATCH 08/11] glapi: Harden GLX request size processing

2016-05-04 Thread Adam Jackson

On Mon, 2016-03-28 at 11:10 -0700, Ian Romanick wrote:

> > @@ -428,7 +428,7 @@ class PrintGlxReqSize_h(PrintGlxReqSize_common):
>   def printBody(self, api):
> >  for func in api.functionIterateGlx():
> >  if not func.ignore and func.has_variable_size_request():
> > -print 'extern PURE _X_HIDDEN int __glX%sReqSize(const 
> > GLbyte *pc, Bool swap);' % (func.name)
> > +print 'extern PURE _X_HIDDEN int __glX%sReqSize(const 
> > GLbyte *pc, Bool swap, int reqlen);' % (func.name)
> 
> I don't see where reqlen is used.  Did I miss something?

You didn't. It's only used (or needed) in __glXDrawArraysReqSize, which
isn't generated. But since the prototype needs to match...

 -ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [RFC demos] a mesa equivalent to vulkaninfo

2016-05-04 Thread Alejandro Piñeiro

Hi,

some weeks ago Dave Airlie made the review for the
ARB_internalformat_query2 extension support (again, thanks a lot), and
at the end he suggested to create a "mesa-demos glxinfo super query app.
Something that does what vulkaninfo kinda does" [1]. As I mentioned at
the moment, I had a heavily modified piglit test that was doing
something similar when working on the extension. So after some cleaning
I have a standalone version of that program. But before start to work to
integrate it with mesa demos, and propose it to the list as a patch, I
have some doubts, and I hope someone is interested in answering me:

 * Formatting: right now I decided to format the outcome as csv. So
something like this:

64 bit, GL_SAMPLES, GL_RENDERBUFFER, GL_DEPTH_COMPONENT, "8,4"
64 bit, GL_SAMPLES, GL_RENDERBUFFER, GL_DEPTH_STENCIL, "8,4"
64 bit, GL_SAMPLES, GL_RENDERBUFFER, GL_RED, "8,4"
64 bit, GL_SAMPLES, GL_RENDERBUFFER, GL_RG, "8,4"

The value between quotes, because the number of possible values is
variable (although right now only SAMPLES can return more that one value).

The reason is that with so many possible target/internalformat/pname
combinations, there isn't a simple way to fit it on a table. I also
tried to not repeat the information on any line, but then you need to go
back and forth to know what you are seeing. I assume that this format is
good enough to dump the outcome to a file if you want to check something
in particular, and that any kind of pretty-print is out of the scope of
this little demo. I also guess that csv would be good enough if someone
wants to implement that pretty-printer as a something external.

In any case I'm open to any other kind of formatting.

* Getting the gl enum name: ok, now the hack. In order to get the name
for all the gl enums I just copied a method from
piglit-util-gl-enum-gen.c. That is a generated file. The correct thing
would be do as piglit, and generate that source file from the khronos
xml. But that sounded like an overkill to me. Additionally that khronos
xml is like 2.5M. vulkaninfo try to do some define/# magic, but I don't
think that the solution would be better for this case [2], where we have
more enums to take into account.

* For the initialization I used glut and glew, instead of something more
low-level like glxinfo. There are already several mesa-demos using both,
so I assumed that it would not be any problem to do the same here.

* Right now the name is query2-info. It is not really impressive, but I
can't think on anything better. Suggestion are welcome.

If someone wants to take a look to the standalone program, you can find
it here [3].

Thanks in advance.

BR

[1] https://lists.freedesktop.org/archives/mesa-dev/2016-March/108954.html
[2]
https://github.com/LunarG/VulkanTools/blob/master/demos/vulkaninfo.c#L204
[3] https://github.com/infapi00/query2-info
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 00/59] Initial arb_gpu_shader_fp64 support to the i965 scalar backend

2016-05-04 Thread Samuel Iglesias Gonsálvez

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 03/05/16 20:59, Kenneth Graunke wrote:
> Other than patches 37, 56, and ones you agreed to drop, the series
> is: Reviewed-by: Kenneth Graunke 
> 
> I think you can go ahead and land all except those, and we can
> land new solutions for those problems afterwards.
> 
> We still need to fix the horiz_offset problem, and I think Curro's 
> subscript() helper is a great way to do that.  I would be fine
> with fixing that after this series lands - it would probably mean
> less rebasing :)
> 

We prepared a patch series without patches 11, 22 (both were dropped),
37, 40 (it has a NACK from Curro) and 56. Our idea was to push the
patch series once we check there is no piglit regressions.

We got 0 piglit regressions on ILK, SNB, IVB, HSW and BDW, the latter
with the extension enabled with the environment variable.

Sadly, on SKL we had regressions (including GPU hangs) only if we
*enable* the arb_gpu_shader_fp64 extension. Those regressions are
fixed by patch 40 but we need still to investigate why they are happening.

For that reason, we have not pushed anything yet. Tomorrow we will
investigate why we have those regressions only on SKL and which tests
are affected.

JFTR, if we disable the extension on SKL (i.e. we don't set the
environment variable), we have 0 piglit regressions on that generation.

Thanks,

Sam

-BEGIN PGP SIGNATURE-
Version: GnuPG v2

iQIcBAEBCAAGBQJXKgPaAAoJEH/0ujLxfcNDtHgP/0JBd96UX2026Y0zBydrE1Ta
U7FV5e0SJXY5JnxWqdV4ZhtdiqQ0EDLj2G0NlFq0u29c+lQX//x9KHMANxbt2tdc
jStRm4NMNXds4rxFYaiRs+3p4XLquV4aZ33Px+0CT6m6Hjsm6sNJZViarB41SjAq
HVRuKAxBbVTjqqPXBKAoXXxp34OUym5uZydXFjavCs1lU70DVwZq9t/vzIAGUunT
NCbehScgtdVupTDEzORZ7QRXvGyqQoJa+HBeuEK6HvvhonfezwrB1hawIRGP34V/
/k+UvL/DA3GEenpCqHfOIwnxVRYN6iXYNlHSNK+LolsjR/CUxccBs67tcQiGAPBA
g8V02KHbuIN2JyXLSwzJ0ovX5aA+vr5Fuc8e+0PZ7raa7GxIsDg3LbUNnyT8dT83
fQAM6IMFYhZUOKKF8ynqw5tvRAaUhWJwageYqRkdHROSs8toero7RDAabEdBKWhG
scGkKEu3S7vsGLMBzeSch/UqyAXdaBcJzRSTKCi2UHyWT/5aPczqf4eacyLgWpa/
3r+qqzljZ2F8MUbnyP71CyTufx+UQzoBAlB4kM09UcF9RC95XTSQrMBMB/SqR2Om
+ebdE/d1e5ubExNQTbQPrb2OGCIiQhRGQo8MFcuFtgl2DabJFvW4QvsuJKinhQLR
zcfupbRUPDegesKRU+es
=phid
-END PGP SIGNATURE-
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 95211] scons TypeError: 'tuple' object is not callable

2016-05-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=95211

Jose Fonseca  changed:

   What|Removed |Added

   Assignee|mesa-dev@lists.freedesktop. |jfons...@vmware.com
   |org |

--- Comment #1 from Jose Fonseca  ---
It looks like SCons changed some of its internals.

Most likely it won't be an easy fix.  I'm not sure when I'll be able to look
into this.

My suggestion is to use SCons 2.4.x for now.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCHv2 10/23] i965/fs: Stop using the LOAD_PAYLOAD instruction in lower_simd_width.

2016-05-04 Thread Michael Schellenberger Costa

Hi Curro,

Am 04.05.2016 um 06:26 schrieb Francisco Jerez:
> Instead of using the LOAD_PAYLOAD instruction (emitted through the
> emit_transpose() helper that is no longer useful and this commit
> removes) which had to be marked force_writemask_all in some cases,
> emit a series of moves to apply proper channel enable signals to the
> destination.  Until now lower_simd_width() had mainly been used to
> lower things that invariably had a basic block-local temporary as
> destination so it didn't seem like a big deal, but I found it to be
> the reason for several Piglit regressions in my SIMD32 branch and
> Igalia discovered the same issue independently while working on FP64
> support.
> ---
> This is taken from the following WIP series:
>   https://cgit.freedesktop.org/~currojerez/mesa/log/?h=i965-late-simd-lowering
>
> See also:
>   https://lists.freedesktop.org/archives/mesa-dev/2016-May/115596.html
>
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 58 
> +++-
>  1 file changed, 18 insertions(+), 40 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 4b6aa67..34599cb 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -4625,29 +4625,6 @@ get_lowered_simd_width(const struct brw_device_info 
> *devinfo,
> }
>  }
>  
> -/**
> - * The \p rows array of registers represents a \p num_rows by \p num_columns
> - * matrix in row-major order, write it in column-major order into the 
> register
> - * passed as destination.  \p stride gives the separation between matrix
> - * elements in the input in fs_builder::dispatch_width() units.
> - */
> -static void
> -emit_transpose(const fs_builder &bld,
> -   const fs_reg &dst, const fs_reg *rows,
> -   unsigned num_rows, unsigned num_columns, unsigned stride)
> -{
> -   fs_reg *const components = new fs_reg[num_rows * num_columns];
> -
> -   for (unsigned i = 0; i < num_columns; ++i) {
> -  for (unsigned j = 0; j < num_rows; ++j)
> - components[num_rows * i + j] = offset(rows[j], bld, stride * i);
> -   }
> -
> -   bld.LOAD_PAYLOAD(dst, components, num_rows * num_columns, 0);
> -
> -   delete[] components;
> -}
> -
>  bool
>  fs_visitor::lower_simd_width()
>  {
> @@ -4698,16 +4675,19 @@ fs_visitor::lower_simd_width()
> if (inst->src[j].file != BAD_FILE &&
> !is_uniform(inst->src[j])) {
>/* Get the i-th copy_width-wide chunk of the source. */
> -  const fs_reg src = horiz_offset(inst->src[j], copy_width * 
> i);
> +  const fs_builder cbld = lbld.group(copy_width, 0);
> +  const fs_reg src = offset(inst->src[j], cbld, i);
>const unsigned src_size = inst->components_read(j);
>  
> -  /* Use a trivial transposition to copy one every n
> -   * copy_width-wide components of the register into a
> -   * temporary passed as source to the lowered instruction.
> +  /* Copy one every n copy_width-wide components of the
That first sentence is really unclear. Shouldnt it sound "Copy each
copy_width-wide component of..."

Michael
> +   * register into a temporary passed as source to the 
> lowered
> +   * instruction.
> */
>split_inst.src[j] = lbld.vgrf(inst->src[j].type, src_size);
> -  emit_transpose(lbld.group(copy_width, 0),
> - split_inst.src[j], &src, 1, src_size, n);
> +
> +  for (unsigned k = 0; k < src_size; ++k)
> + cbld.MOV(offset(split_inst.src[j], lbld, k),
> +  offset(src, cbld, n * k));
> }
>  }
>  
> @@ -4726,20 +4706,18 @@ fs_visitor::lower_simd_width()
>   }
>  
>   if (inst->regs_written) {
> -/* Distance between useful channels in the temporaries, skipping
> - * garbage if the lowered instruction is wider than the original.
> - */
> -const unsigned m = lower_width / copy_width;
> +const fs_builder lbld = ibld.group(lower_width, 0);
>  
>  /* Interleave the components of the result from the lowered
> - * instructions.  We need to set exec_all() when copying more 
> than
> - * one half per component, because LOAD_PAYLOAD (in terms of 
> which
> - * emit_transpose is implemented) can only use the same channel
> - * enable signals for all of its non-header sources.
> + * instructions.
>   */
> -emit_transpose(ibld.exec_all(inst->exec_size > copy_width)
> -   .group(copy_width, 0),
> -   inst->dst, dsts, n, dst_size, m);
> +for (unsigned i = 0; i < dst_size; ++i) {
> +

Re: [Mesa-dev] [PATCH v2 03/25] anv: tweak the %.json rule

2016-05-04 Thread Knut Andre Tidemann

On fredag 22. april 2016 19.55.10 CEST Emil Velikov wrote:
> From: Emil Velikov 
> 
> It's used only by dev_icd.json so just call it that way. While we're
> here, manually expand $< (as it might cause issue on some systems)
> and drop the unneeded install_libdir substitution.
> 
> Cc: Jason Ekstrand 
> Signed-off-by: Emil Velikov 
> ---
> This commit replaces "anv: remove no longer needed %.json rule"
> ---
>  src/intel/vulkan/Makefile.am | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/src/intel/vulkan/Makefile.am b/src/intel/vulkan/Makefile.am
> index 916f171..6d93906 100644
> --- a/src/intel/vulkan/Makefile.am
> +++ b/src/intel/vulkan/Makefile.am
> @@ -190,11 +190,10 @@ icdconf_DATA = intel_icd.json
>  # The following is used for development purposes, by setting
> VK_ICD_FILENAMES. noinst_DATA = dev_icd.json
> 
> -%.json : %.json.in
> +dev_icd.json : dev_icd.json.in
>   $(AM_V_GEN) $(SED) \
> - -e "s#@build_libdir@#${abs_top_builddir}/${LIB_DIR}#" \
> - -e "s#@install_libdir@#${libdir}#" < $< > $@
> -
> + -e "s#@build_libdir@#${abs_top_builddir}/${LIB_DIR}#"
> + < $(srcdir)/dev_icd.json.in > $@
> 
>  # Libvulkan with dummy gem. Used for unit tests.

Hello. 

This patch breaks the build for me, as it hangs on the sed line. I've sent a 
small patch that fixes the issue.

--
Knut
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] anv: fix hang during generation of dev_icd.json.

2016-05-04 Thread Knut Andre Tidemann

---
 src/intel/vulkan/Makefile.am | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/Makefile.am b/src/intel/vulkan/Makefile.am
index fd8aae0..662d720 100644
--- a/src/intel/vulkan/Makefile.am
+++ b/src/intel/vulkan/Makefile.am
@@ -156,7 +156,7 @@ noinst_DATA = dev_icd.json
 
 dev_icd.json : dev_icd.json.in
$(AM_V_GEN) $(SED) \
-   -e "s#@build_libdir@#${abs_top_builddir}/${LIB_DIR}#"
+   -e "s#@build_libdir@#${abs_top_builddir}/${LIB_DIR}#" \
< $(srcdir)/dev_icd.json.in > $@
 
 # Libvulkan with dummy gem. Used for unit tests.
-- 
2.8.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 01/15] i965: get the proper vertex surface type for doubles on gen8+

2016-05-04 Thread Alejandro Piñeiro

On 04/05/16 12:31, Kenneth Graunke wrote:
> On Thursday, April 28, 2016 1:40:31 PM PDT Antia Puentes wrote:
>> From: Alejandro Piñeiro 
>>
>> This commit adds support for PASSTHRU format when pushing
>> double-precision attributes.
>>
>> Check glarray->Doubles in order to know if we should choose a format
>> that does a conversion to float, or just passthru the 64-bit double.
>> ---
>>  src/mesa/drivers/dri/i965/brw_draw_upload.c | 30 +
> +---
>>  1 file changed, 27 insertions(+), 3 deletions(-)
> What's the motivation for using the PASSTHRU formats?  Is there some
> advantage over just using the R64*FLOAT formats?

We can't use R64*FLOAT because it does a double to float conversion, as
explained at the bdw PRM (vol 7, section 10.15, page 470, Table "Source
Element Formats Supported in VF Unit"). PASSTHRU ensures that the
64-bits are passed without any conversion, that is what we want. If it
is not clear reading the commit message, I could expand it.

FWIW, the R64*FLOAT format is used when calling the previous API
(without *L*). You can call glVertexAttribPointer using GL_DOUBLE as
type, but it will feed the shaders with floats. That conversion is done
using R64*FLOAT formats when uploading the data. When using
glVertexAttrib with GL_DOUBLE, the casting to float is done on the call
itself, at mesa/main.

Finally, the alternative to R64*PASSTHRU formats would be using
R32*FLOAT formats, and doing two uploads if needed (for dvec3/dvec4). We
needed to do that for the in-progress gen7 support, and means some extra
code compared to just use PASSTHRU. If you are curious:
https://github.com/Igalia/mesa/commit/7d4b0c5cff33a3b555a9c5d38b1af0aeba697cac

Best regards

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] swr: Fix resource leak in object.

2016-05-04 Thread Robert Foss




On 05/04/2016 08:52 AM, Rowley, Timothy O wrote:

After shader compilation, the gallivm_state is owned by ShaderVariant<> and 
destroyed by its destructor (swr_state.h) so that the generated JIT code and 
gallivm_state can be freed when no longer needed.



Oh. Splendid.
Thanks for looking into it.


Rob.


On May 3, 2016, at 5:45 PM, robert.f...@collabora.com wrote:

From: Robert Foss 

Make sure that memory allocated is free'd.
Previously only the contents of the variable
galliumvm was free'd, not the actual memory
it points to.

Coverity: 1358907
Signed-off-by: Robert Foss 
---
src/gallium/drivers/swr/swr_shader.cpp | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
b/src/gallium/drivers/swr/swr_shader.cpp
index f693f51..5b1b1ee 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -134,6 +134,7 @@ struct BuilderSWR : public Builder {

~BuilderSWR() {
   gallivm_free_ir(gallivm);
+  FREE(gallivm);
}

struct gallivm_state *gallivm;
--
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v3] vulkan: Fix use of uninitialized variable.

2016-05-04 Thread robert . foss

From: Robert Foss 

The return variable was not set for failure paths.
It has now been changed to VK_ERROR_INITIALIZATION_FAILED
for failure paths.

Coverity: 1358944
Reviewed-by: Eric Engestrom 
Signed-off-by: Robert Foss 
---

Changes since v1:
- Started using vk_error().

Changes since v2:
- Added Reviewed-by: Eric Engestrom 

 src/intel/vulkan/anv_wsi_wayland.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_wsi_wayland.c 
b/src/intel/vulkan/anv_wsi_wayland.c
index 6f25eaf..43606f6 100644
--- a/src/intel/vulkan/anv_wsi_wayland.c
+++ b/src/intel/vulkan/anv_wsi_wayland.c
@@ -776,12 +776,16 @@ wsi_wl_surface_create_swapchain(VkIcdSurfaceBase 
*icd_surface,
chain->queue = NULL;
 
chain->display = wsi_wl_get_display(device->instance, surface->display);
-   if (!chain->display)
+   if (!chain->display) {
+  result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
   goto fail;
+   }
 
chain->queue = wl_display_create_queue(chain->display->display);
-   if (!chain->queue)
+   if (!chain->queue) {
+  result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
   goto fail;
+   }
 
for (uint32_t i = 0; i < chain->image_count; i++) {
   result = wsi_wl_image_init(chain, &chain->images[i], pAllocator);
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2] vulkan: Fix use of uninitialized variable.

2016-05-04 Thread robert . foss

From: Robert Foss 

The return variable was not set for failure paths.
It has now been changed to VK_ERROR_INITIALIZATION_FAILED
for failure paths.

Coverity: 1358944
Signed-off-by: Robert Foss 
---

Changes since v1:
- Started using vk_error().

 src/intel/vulkan/anv_wsi_wayland.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_wsi_wayland.c 
b/src/intel/vulkan/anv_wsi_wayland.c
index 6f25eaf..43606f6 100644
--- a/src/intel/vulkan/anv_wsi_wayland.c
+++ b/src/intel/vulkan/anv_wsi_wayland.c
@@ -776,12 +776,16 @@ wsi_wl_surface_create_swapchain(VkIcdSurfaceBase 
*icd_surface,
chain->queue = NULL;
 
chain->display = wsi_wl_get_display(device->instance, surface->display);
-   if (!chain->display)
+   if (!chain->display) {
+  result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
   goto fail;
+   }
 
chain->queue = wl_display_create_queue(chain->display->display);
-   if (!chain->queue)
+   if (!chain->queue) {
+  result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
   goto fail;
+   }
 
for (uint32_t i = 0; i < chain->image_count; i++) {
   result = wsi_wl_image_init(chain, &chain->images[i], pAllocator);
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] swr: Fix resource leak in object.

2016-05-04 Thread Rowley, Timothy O

After shader compilation, the gallivm_state is owned by ShaderVariant<> and 
destroyed by its destructor (swr_state.h) so that the generated JIT code and 
gallivm_state can be freed when no longer needed.

> On May 3, 2016, at 5:45 PM, robert.f...@collabora.com wrote:
> 
> From: Robert Foss 
> 
> Make sure that memory allocated is free'd.
> Previously only the contents of the variable
> galliumvm was free'd, not the actual memory
> it points to.
> 
> Coverity: 1358907
> Signed-off-by: Robert Foss 
> ---
> src/gallium/drivers/swr/swr_shader.cpp | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
> b/src/gallium/drivers/swr/swr_shader.cpp
> index f693f51..5b1b1ee 100644
> --- a/src/gallium/drivers/swr/swr_shader.cpp
> +++ b/src/gallium/drivers/swr/swr_shader.cpp
> @@ -134,6 +134,7 @@ struct BuilderSWR : public Builder {
> 
>~BuilderSWR() {
>   gallivm_free_ir(gallivm);
> +  FREE(gallivm);
>}
> 
>struct gallivm_state *gallivm;
> -- 
> 2.5.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] configure.ac: error out when building from git without python3

2016-05-04 Thread Emil Velikov

On 4 May 2016 at 13:08, Ilia Mirkin  wrote:
> Only if building Intel vulkan, no?
I believe the idea is to move from python2 so this will do no harm as is.

> Could that script just be fixed to work
> with python2 BTW?
>
Possibly. The above 'moving away from python2' topic still applies imho.

If we have a brave soul to port it and (more importantly) others are
happy I would love to drop this patch.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] configure.ac: error out when building from git without python3

2016-05-04 Thread Ilia Mirkin

Only if building Intel vulkan, no? Could that script just be fixed to work
with python2 BTW?
On May 4, 2016 6:44 AM, "Emil Velikov"  wrote:

> From: Emil Velikov 
>
> Bail early, as opposed to later on during the build.
>
> Signed-off-by: Emil Velikov 
> ---
>  configure.ac | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/configure.ac b/configure.ac
> index 5f75c60..0932871 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -142,6 +142,12 @@ else
>  fi
>  fi
>
> +if test -z "$PYTHON3"; then
> +if test ! -f "$srcdir/src/intel/genxml/gen9_pack.h"; then
> +AC_MSG_ERROR([Python3 not found - unable to generate sources])
> +fi
> +fi
> +
>  AC_PROG_INSTALL
>
>  dnl We need a POSIX shell for parts of the build. Assume we have one
> --
> 2.6.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 03/59] freedreno/ir3: lower lrp when operating with double operands

2016-05-04 Thread Rob Clark

On Fri, Apr 29, 2016 at 7:29 AM, Samuel Iglesias Gonsálvez
 wrote:
> Lower lrp when operating with double operands because float version of
> lrp is also lowered.
>
> Signed-off-by: Samuel Iglesias Gonsálvez 
> CC: Rob Clark 
> ---
>  src/gallium/drivers/freedreno/ir3/ir3_nir.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/gallium/drivers/freedreno/ir3/ir3_nir.c 
> b/src/gallium/drivers/freedreno/ir3/ir3_nir.c
> index 364e92b..0635dfb 100644
> --- a/src/gallium/drivers/freedreno/ir3/ir3_nir.c
> +++ b/src/gallium/drivers/freedreno/ir3/ir3_nir.c
> @@ -43,6 +43,7 @@ ir3_tgsi_to_nir(const struct tgsi_token *tokens)
> .lower_fsat = true,
> .lower_scmp = true,
> .lower_flrp32 = true,
> +   .lower_flrp64 = true,
> .lower_ffract = true,
> .native_integers = true,
> .vertex_id_zero_based = true,

I guess there are probably other things missing for fp64 support, but
I think we probably want to figure that out at some point.  According
to opencl level supported by a4xx+ it should support doubles, but I'm
not sure yet if that is just via lowering.  Anyways, I guess this
makes sense for now, so:

Reviewed-by: Rob Clark 



> --
> 2.5.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 12/15] i965: abort linking if URB read length greater than 15

2016-05-04 Thread Juan A. Suarez Romero

On Wed, 2016-05-04 at 03:43 -0700, Kenneth Graunke wrote:
> On Thursday, April 28, 2016 1:40:42 PM PDT Antia Puentes wrote:
> > 
> > From: "Juan A. Suarez Romero" 
> > 
> > In scalar mode, URB read length limit is 15. Abort if we go beyond
> > it.
> > ---
> >  src/mesa/drivers/dri/i965/brw_vec4.cpp | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > b/src/mesa/drivers/dri/
> i965/brw_vec4.cpp
> > 
> > index 9816f0d..04287fb 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> > @@ -2148,6 +2148,14 @@ brw_compile_vs(const struct brw_compiler
> > *compiler, 
> void *log_data,
> > 
> >    prog_data->base.urb_read_length =
> >   DIV_ROUND_UP(MAX2(nr_attribute_slots, 1), 2);
> >  
> > +   if (is_scalar && prog_data->base.urb_read_length > 15) {
> > +  if (error_str)
> > + *error_str = ralloc_strdup(mem_ctx,
> > +"Too many attributes. Try to
> > reduce the 
> "
> > 
> > +"number of attributes or their
> > size");
> > +  return NULL;
> > +   }
> > +
> > prog_data->nr_attributes = nr_attributes;
> > prog_data->nr_attribute_slots = nr_attribute_slots;
> >  
> > 
> Have you hit this limit?  I'm probably doing my math wrong, but it
> doesn't seem like we should ever hit it...
> 

Yes. We have a piglit test (not in master yet) that hits the limit. You
can find it in http://pastebin.com/k96EyYun

This links with a comment made by Ian[1] in a different patch, but that
also applies here.

https://lists.freedesktop.org/archives/mesa-dev/2016-April/114885.html

dvec3/dmat*3/dvec4/dmat*4 can be counted as consuming the same
attributes as the single precision version, or consuming twice. Current
code follows the former option. If we switch to the second, then we
won't hit this limit. But this has other cons, as explained in the
thread.

> I'm not sure we can legally just say no - we may need to resort to
> pulling URB data...
> 

Maybe. Another option could be to switch to vec4... if gen8+vec4 were
working.

As this seems an extreme case (when we use lot of attributes), I
thought that dealing with it was more like a task to improve in the
future.

J.A.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 02/15] i965: Configure how to store 64PASSTHRU vertex components

2016-05-04 Thread Antía Puentes

Hi Kenneth,

thanks for reviewing.

On mié, 2016-05-04 at 03:36 -0700, Kenneth Graunke wrote:
> On Thursday, April 28, 2016 1:40:32 PM PDT Antia Puentes wrote:
> > 
> > From the Broadwell specification, structure VERTEX_ELEMENT_STATE
> > description:
> > 
> >    "When SourceElementFormat is set to one of the *64*_PASSTHRU
> > formats,  64-bit components are stored in the URB without any
> > conversion. In this case, vertex elements must be written as 128
> > or 256 bits, with VFCOMP_STORE_0 being used to pad the output
> > as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red 
> component into
> > 
> > the URB, Component 1 must be specified as VFCOMP_STORE_0 (with
> > Components 2,3 set to VFCOMP_NOSTORE) in order to output a 128-bit
> > vertex element, or Components 1-3 must be specified as VFCOMP_STORE_0
> > in order to output a 256-bit vertex element. Likewise, use of
> > R64G64B64_PASSTHRU requires Component 3 to be specified as 
> VFCOMP_STORE_0
> > 
> > in order to output a 256-bit vertex element."
> > 
> > Uses 128-bits to write double and dvec2 vertex elements, and 256-bits for
> > dvec3 and dvec4 vertex elements.
> > 
> > Signed-off-by: Juan A. Suarez Romero 
> > Signed-off-by: Antia Puentes 
> > ---
> >  src/mesa/drivers/dri/i965/gen8_draw_upload.c | 34 +
> +++
> > 
> >  1 file changed, 34 insertions(+)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c b/src/mesa/
> drivers/dri/i965/gen8_draw_upload.c
> > 
> > index fe5ed35..14f7091 100644
> > --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c
> > +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c
> > @@ -217,6 +217,40 @@ gen8_emit_vertices(struct brw_context *brw)
> >   break;
> >    }
> >  
> > +  /* From the BDW PRM, Volume 2d, page 586 (VERTEX_ELEMENT_STATE):
> > +   *
> > +   * "When SourceElementFormat is set to one of the *64*_PASSTHRU
> > +   * formats, 64-bit components are stored in the URB without any
> > +   * conversion. In this case, vertex elements must be written as 
> 128
> > 
> > +   * or 256 bits, with VFCOMP_STORE_0 being used to pad the output
> > +   * as required. E.g., if R64_PASSTHRU is used to copy a 64-bit 
> Red
> > 
> > +   * component into the URB, Component 1 must be specified as
> > +   * VFCOMP_STORE_0 (with Components 2,3 set to VFCOMP_NOSTORE)
> > +   * in order to output a 128-bit vertex element, or Components 
> > 1-3 
> must
> > 
> > +   * be specified as VFCOMP_STORE_0 in order to output a 256-bit 
> vertex
> > 
> > +   * element. Likewise, use of R64G64B64_PASSTHRU requires 
> Component 3
> > 
> > +   * to be specified as VFCOMP_STORE_0 in order to output a 
> > 256-bit 
> vertex
> > 
> > +   * element."
> > +   */
> The above comment seems to indicate that R64 needs component 1 set to
> STORE_0, and that looks to be missing.  I'm guessing you need to add:
> 
> > 
> > +  if (input->glarray->Doubles) {
> > + switch (input->glarray->Size) {
> > + case 0:
>    comp0 = BRW_VE1_COMPONENT_STORE_0;
>    /* fallthrough */
> 
> > 
> > + case 1:
>    comp1 = BRW_VE1_COMPONENT_STORE_0;
>    /* fallthrough */
> 

I have not added them because comp0 and comp1 are initialized in the
code immediately above the switch I added. That already existing code
sets the default values for the components:

switch (input->glarray->Size) {
  case 0: comp0 = BRW_VE1_COMPONENT_STORE_0;
  case 1: comp1 = BRW_VE1_COMPONENT_STORE_0;
  case 2: comp2 = BRW_VE1_COMPONENT_STORE_0;
  case 3: comp3 = input->glarray->Integer ?
   BRW_VE1_COMPONENT_STORE_1_INT
 : BRW_VE1_COMPONENT_STORE_1_FLT;
 break;
  }

In my switch I just override the values that are not right for the
double-precision floating point cases.

> > + case 2:
> > +/*  Use 128-bits instead of 256-bits to write double and dvec2
> > + *  vertex elements.
> > + */
> > +comp2 = BRW_VE1_COMPONENT_NOSTORE;
> > +comp3 = BRW_VE1_COMPONENT_NOSTORE;
> > +break;
> > + case 3:
> > +/* Pad the output using VFCOMP_STORE_0 as suggested
> > + * by the BDW PRM.
> > + */
> > +comp3 = BRW_VE1_COMPONENT_STORE_0;
> > + }
> > +  }
> > +
> >    OUT_BATCH((input->buffer << GEN6_VE0_INDEX_SHIFT) |
> >  GEN6_VE0_VALID |
> >  (format << BRW_VE0_FORMAT_SHIFT) |
> > 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

1 2 >

1 - 100 of 119 matches

Mail list logo