date:20160909

Re: [Mesa-dev] [PATCH 30/33] anv: Use blorp for CopyBuffer and UpdateBuffer

2016-09-09 Thread Jason Ekstrand

On Fri, Sep 9, 2016 at 5:47 PM, Nanley Chery  wrote:

> On Wed, Aug 31, 2016 at 02:22:49PM -0700, Jason Ekstrand wrote:
> > ---
> >  src/intel/vulkan/Makefile.sources |   1 -
> >  src/intel/vulkan/anv_blorp.c  | 184 ++
> 
> >  src/intel/vulkan/anv_meta_copy.c  | 180 --
> ---
> >  3 files changed, 184 insertions(+), 181 deletions(-)
> >  delete mode 100644 src/intel/vulkan/anv_meta_copy.c
> >
> > diff --git a/src/intel/vulkan/Makefile.sources
> b/src/intel/vulkan/Makefile.sources
> > index 35e15f6..6c9853b 100644
> > --- a/src/intel/vulkan/Makefile.sources
> > +++ b/src/intel/vulkan/Makefile.sources
> > @@ -35,7 +35,6 @@ VULKAN_FILES := \
> >   anv_meta.h \
> >   anv_meta_blit2d.c \
> >   anv_meta_clear.c \
> > - anv_meta_copy.c \
> >   anv_meta_resolve.c \
> >   anv_nir.h \
> >   anv_nir_apply_dynamic_offsets.c \
> > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> > index 89ff3b3..e2b6672 100644
> > --- a/src/intel/vulkan/anv_blorp.c
> > +++ b/src/intel/vulkan/anv_blorp.c
> > @@ -479,3 +479,187 @@ void anv_CmdBlitImage(
> >
> > blorp_batch_finish();
> >  }
> > +
> > +static void
> > +do_buffer_copy(struct blorp_batch *batch,
> > +   struct anv_bo *src, uint64_t src_offset,
> > +   struct anv_bo *dst, uint64_t dst_offset,
> > +   int width, int height, int block_size)
> > +{
> > +   struct anv_device *device = batch->blorp->driver_ctx;
> > +
> > +   /* The actual format we pick doesn't matter as blorp will throw it
> away.
> > +* The only thing that actually matters is the size.
> > +*/
> > +   enum isl_format format;
> > +   switch (block_size) {
> > +   case 1:  format = ISL_FORMAT_R8_UINT;  break;
> > +   case 2:  format = ISL_FORMAT_R8G8_UINT;break;
> > +   case 4:  format = ISL_FORMAT_R8G8B8A8_UNORM;   break;
> > +   case 8:  format = ISL_FORMAT_R16G16B16A16_UNORM;   break;
> > +   case 16: format = ISL_FORMAT_R32G32B32A32_UINT;break;
> > +   default:
> > +  unreachable("Not a power-of-two format size");
> > +   }
> > +
> > +   struct isl_surf surf;
> > +   isl_surf_init(>isl_dev, ,
> > + .dim = ISL_SURF_DIM_2D,
> > + .format = format,
> > + .width = width,
> > + .height = height,
> > + .depth = 1,
> > + .levels = 1,
> > + .array_len = 1,
> > + .samples = 1,
> > + .usage = ISL_SURF_USAGE_TEXTURE_BIT,
>
> Shouldn't we also OR in ISL_SURF_USAGE_RENDER_TARGET_BIT?
>

Yes we should.


> > + .tiling_flags = ISL_TILING_LINEAR_BIT);
> > +   assert(surf.row_pitch == width * block_size);
> > +
> > +   struct blorp_surf src_blorp_surf = {
> > +  .surf = ,
> > +  .addr = {
> > + .buffer = src,
> > + .offset = src_offset,
> > +  },
> > +   };
> > +
> > +   struct blorp_surf dst_blorp_surf = {
> > +  .surf = ,
> > +  .addr = {
> > + .buffer = dst,
> > + .offset = dst_offset,
> > +  },
> > +   };
> > +
> > +   blorp_copy(batch, _blorp_surf, 0, 0, _blorp_surf, 0, 0,
> > +  0, 0, 0, 0, width, height);
> > +}
> > +
> > +void anv_CmdCopyBuffer(
> > +VkCommandBuffer commandBuffer,
> > +VkBuffersrcBuffer,
> > +VkBufferdstBuffer,
> > +uint32_tregionCount,
> > +const VkBufferCopy* pRegions)
> > +{
> > +   ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
> > +   ANV_FROM_HANDLE(anv_buffer, src_buffer, srcBuffer);
> > +   ANV_FROM_HANDLE(anv_buffer, dst_buffer, dstBuffer);
> > +
> > +   struct blorp_batch batch;
> > +   blorp_batch_init(_buffer->device->blorp, , cmd_buffer);
> > +
> > +   for (unsigned r = 0; r < regionCount; r++) {
> > +  uint64_t src_offset = src_buffer->offset + pRegions[r].srcOffset;
> > +  uint64_t dst_offset = dst_buffer->offset + pRegions[r].dstOffset;
> > +  uint64_t copy_size = pRegions[r].size;
> > +
> > +  /* First, we compute the biggest format that can be used with the
> > +   * given offsets and size.
> > +   */
> > +  int bs = 16;
> > +
> > +  int fs = ffs(src_offset) - 1;
> > +  if (fs != -1)
> > + bs = MIN2(bs, 1 << fs);
> > +  assert(src_offset % bs == 0);
>
> I think we could replace this math and some math in
> anv_CmdUpdateBuffer() with logic that's a bit simpler:
>
> /* Get the greatest power-of-two multiple between a number and a
>  * power-of-two upper-bound.
>  */
> #define gcm_pow2(val, max_pow2m) (1 << (ffs((val)|(max_pow2m)) - 1))
>

That's a good idea.  I just pushed to the branch with two new patches for
you to look at.  One which uses gcm_pow2 and one which sets the usage bits
as requested

Re: [Mesa-dev] [PATCH] mesa: Implement ARB_shader_viewport_layer_array for i965

2016-09-09 Thread Ilia Mirkin

On Fri, Sep 9, 2016 at 8:59 PM, Dylan Baker  wrote:
>> Some piglit tests to check the TES behaviour would be nice.
>
> Yeah, I figured. I'll have to go read up on how tessellation works.

While I would hate to discourage you from such a joyous learning
experience, if you want the quickest way to success, mash up the AMD_*
tests with the ARB_tessellation_shader nop.shader_test or
sanity.shader_test.

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa: Implement ARB_shader_viewport_layer_array for i965

2016-09-09 Thread Dylan Baker

Quoting Ilia Mirkin (2016-09-09 16:40:15)
> On Fri, Sep 9, 2016 at 7:14 PM, Dylan Baker  wrote:
> > +EXT(ARB_shader_viewport_layer_array , 
> > ARB_shader_viewport_layer_array, GLL, GLC,  x ,  x , 2015)
> 
> This relies on GL 3.2 at least, so I think this should just be GLC, not GLL.

The descriptions of what GLL, GLC, etc do are a bit... vague.

I'll change that before I push anything.

> 
> Also, you mention it requires GL 3.1, but really it just requires
> GL_ARB_viewport_array (which became core in GL 4.1). I think it'd be
> safe to enable this on SNB+. Your call.

The spec explicitly calls out that it requires GL 4.1, so I guess the
question is whether we want to to honor that requirement or not.

> 
> Core bits are
> 
> Reviewed-by: Ilia Mirkin 
> 
> Some piglit tests to check the TES behaviour would be nice.

Yeah, I figured. I'll have to go read up on how tessellation works.


signature.asc
Description: signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: don't preload constants at the beginning of shaders

2016-09-09 Thread Edward O'Callaghan



On 09/10/2016 10:02 AM, Bas Nieuwenhuizen wrote:
> Reviewed-by: Bas Nieuwenhuizen 

Reviewed-by: Edward O'Callaghan 

> 
> On Sat, Sep 10, 2016 at 12:40 AM, Marek Olšák  wrote:
>> From: Marek Olšák 
>>
>> LLVM can CSE the loads, thus we can always re-load constants before each
>> use. The decrease in SGPR spilling is huge.
>>
>> The best improvements are the dumbest ones.
>>
>> 26011 shaders in 14651 tests
>> Totals:
>> SGPRS: 1453346 -> 1251920 (-13.86 %)
>> VGPRS: 742576 -> 728421 (-1.91 %)
>> Spilled SGPRs: 52298 -> 16644 (-68.17 %)
>> Spilled VGPRs: 397 -> 369 (-7.05 %)
>> Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread
>> Code Size: 36136488 -> 36001064 (-0.37 %) bytes
>> LDS: 767 -> 767 (0.00 %) blocks
>> Max Waves: 219315 -> 21 (1.33 %)
>> ---
>>  src/gallium/drivers/radeonsi/si_shader.c | 30 +++---
>>  1 file changed, 11 insertions(+), 19 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
>> b/src/gallium/drivers/radeonsi/si_shader.c
>> index 0b7de18..08e3cee 100644
>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>> @@ -1874,26 +1874,33 @@ static LLVMValueRef fetch_constant(
>> for (chan = 0; chan < TGSI_NUM_CHANNELS; ++chan)
>> values[chan] = fetch_constant(bld_base, reg, type, 
>> chan);
>>
>> return lp_build_gather_values(bld_base->base.gallivm, 
>> values, 4);
>> }
>>
>> buf = reg->Register.Dimension ? reg->Dimension.Index : 0;
>> idx = reg->Register.Index * 4 + swizzle;
>>
>> if (!reg->Register.Indirect && !reg->Dimension.Indirect) {
>> +   LLVMValueRef c0, c1;
>> +
>> +   c0 = buffer_load_const(ctx, ctx->const_buffers[buf],
>> +  LLVMConstInt(ctx->i32, idx * 4, 0));
>> +
>> if (!tgsi_type_is_64bit(type))
>> -   return bitcast(bld_base, type, 
>> ctx->constants[buf][idx]);
>> +   return bitcast(bld_base, type, c0);
>> else {
>> +   c1 = buffer_load_const(ctx, ctx->const_buffers[buf],
>> +  LLVMConstInt(ctx->i32,
>> +   (idx + 1) * 4, 
>> 0));
>> return radeon_llvm_emit_fetch_64bit(bld_base, type,
>> -   
>> ctx->constants[buf][idx],
>> -   
>> ctx->constants[buf][idx + 1]);
>> +   c0, c1);
>> }
>> }
>>
>> if (reg->Register.Dimension && reg->Dimension.Indirect) {
>> LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, 
>> SI_PARAM_CONST_BUFFERS);
>> LLVMValueRef index;
>> index = get_bounded_indirect_index(ctx, >DimIndirect,
>>reg->Dimension.Index,
>>SI_NUM_CONST_BUFFERS);
>> bufp = build_indexed_load_const(ctx, ptr, index);
>> @@ -5789,39 +5796,26 @@ static void create_function(struct si_shader_context 
>> *ctx)
>>
>>  static void preload_constants(struct si_shader_context *ctx)
>>  {
>> struct lp_build_tgsi_context *bld_base = 
>> >radeon_bld.soa.bld_base;
>> struct gallivm_state *gallivm = bld_base->base.gallivm;
>> const struct tgsi_shader_info *info = bld_base->info;
>> unsigned buf;
>> LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, 
>> SI_PARAM_CONST_BUFFERS);
>>
>> for (buf = 0; buf < SI_NUM_CONST_BUFFERS; buf++) {
>> -   unsigned i, num_const = info->const_file_max[buf] + 1;
>> -
>> -   if (num_const == 0)
>> +   if (info->const_file_max[buf] == -1)
>> continue;
>>
>> -   /* Allocate space for the constant values */
>> -   ctx->constants[buf] = CALLOC(num_const * 4, 
>> sizeof(LLVMValueRef));
>> -
>> /* Load the resource descriptor */
>> ctx->const_buffers[buf] =
>> build_indexed_load_const(ctx, ptr, 
>> lp_build_const_int32(gallivm, buf));
>> -
>> -   /* Load the constants, we rely on the code sinking to do the 
>> rest */
>> -   for (i = 0; i < num_const * 4; ++i) {
>> -   ctx->constants[buf][i] =
>> -   buffer_load_const(ctx,
>> -   ctx->const_buffers[buf],
>> -   lp_build_const_int32(gallivm, i * 
>> 4));
>> -   }
>> }
>>  }
>>
>>  static void preload_shader_buffers(struct si_shader_context *ctx)
>>  {

Re: [Mesa-dev] [PATCH 30/33] anv: Use blorp for CopyBuffer and UpdateBuffer

2016-09-09 Thread Nanley Chery

On Wed, Aug 31, 2016 at 02:22:49PM -0700, Jason Ekstrand wrote:
> ---
>  src/intel/vulkan/Makefile.sources |   1 -
>  src/intel/vulkan/anv_blorp.c  | 184 
> ++
>  src/intel/vulkan/anv_meta_copy.c  | 180 -
>  3 files changed, 184 insertions(+), 181 deletions(-)
>  delete mode 100644 src/intel/vulkan/anv_meta_copy.c
> 
> diff --git a/src/intel/vulkan/Makefile.sources 
> b/src/intel/vulkan/Makefile.sources
> index 35e15f6..6c9853b 100644
> --- a/src/intel/vulkan/Makefile.sources
> +++ b/src/intel/vulkan/Makefile.sources
> @@ -35,7 +35,6 @@ VULKAN_FILES := \
>   anv_meta.h \
>   anv_meta_blit2d.c \
>   anv_meta_clear.c \
> - anv_meta_copy.c \
>   anv_meta_resolve.c \
>   anv_nir.h \
>   anv_nir_apply_dynamic_offsets.c \
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index 89ff3b3..e2b6672 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -479,3 +479,187 @@ void anv_CmdBlitImage(
>  
> blorp_batch_finish();
>  }
> +
> +static void
> +do_buffer_copy(struct blorp_batch *batch,
> +   struct anv_bo *src, uint64_t src_offset,
> +   struct anv_bo *dst, uint64_t dst_offset,
> +   int width, int height, int block_size)
> +{
> +   struct anv_device *device = batch->blorp->driver_ctx;
> +
> +   /* The actual format we pick doesn't matter as blorp will throw it away.
> +* The only thing that actually matters is the size.
> +*/
> +   enum isl_format format;
> +   switch (block_size) {
> +   case 1:  format = ISL_FORMAT_R8_UINT;  break;
> +   case 2:  format = ISL_FORMAT_R8G8_UINT;break;
> +   case 4:  format = ISL_FORMAT_R8G8B8A8_UNORM;   break;
> +   case 8:  format = ISL_FORMAT_R16G16B16A16_UNORM;   break;
> +   case 16: format = ISL_FORMAT_R32G32B32A32_UINT;break;
> +   default:
> +  unreachable("Not a power-of-two format size");
> +   }
> +
> +   struct isl_surf surf;
> +   isl_surf_init(>isl_dev, ,
> + .dim = ISL_SURF_DIM_2D,
> + .format = format,
> + .width = width,
> + .height = height,
> + .depth = 1,
> + .levels = 1,
> + .array_len = 1,
> + .samples = 1,
> + .usage = ISL_SURF_USAGE_TEXTURE_BIT,

Shouldn't we also OR in ISL_SURF_USAGE_RENDER_TARGET_BIT?

> + .tiling_flags = ISL_TILING_LINEAR_BIT);
> +   assert(surf.row_pitch == width * block_size);
> +
> +   struct blorp_surf src_blorp_surf = {
> +  .surf = ,
> +  .addr = {
> + .buffer = src,
> + .offset = src_offset,
> +  },
> +   };
> +
> +   struct blorp_surf dst_blorp_surf = {
> +  .surf = ,
> +  .addr = {
> + .buffer = dst,
> + .offset = dst_offset,
> +  },
> +   };
> +
> +   blorp_copy(batch, _blorp_surf, 0, 0, _blorp_surf, 0, 0,
> +  0, 0, 0, 0, width, height);
> +}
> +
> +void anv_CmdCopyBuffer(
> +VkCommandBuffer commandBuffer,
> +VkBuffersrcBuffer,
> +VkBufferdstBuffer,
> +uint32_tregionCount,
> +const VkBufferCopy* pRegions)
> +{
> +   ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
> +   ANV_FROM_HANDLE(anv_buffer, src_buffer, srcBuffer);
> +   ANV_FROM_HANDLE(anv_buffer, dst_buffer, dstBuffer);
> +
> +   struct blorp_batch batch;
> +   blorp_batch_init(_buffer->device->blorp, , cmd_buffer);
> +
> +   for (unsigned r = 0; r < regionCount; r++) {
> +  uint64_t src_offset = src_buffer->offset + pRegions[r].srcOffset;
> +  uint64_t dst_offset = dst_buffer->offset + pRegions[r].dstOffset;
> +  uint64_t copy_size = pRegions[r].size;
> +
> +  /* First, we compute the biggest format that can be used with the
> +   * given offsets and size.
> +   */
> +  int bs = 16;
> +
> +  int fs = ffs(src_offset) - 1;
> +  if (fs != -1)
> + bs = MIN2(bs, 1 << fs);
> +  assert(src_offset % bs == 0);

I think we could replace this math and some math in
anv_CmdUpdateBuffer() with logic that's a bit simpler:

/* Get the greatest power-of-two multiple between a number and a
 * power-of-two upper-bound.
 */
#define gcm_pow2(val, max_pow2m) (1 << (ffs((val)|(max_pow2m)) - 1))

  bs = gcm_pow2(src_offset, bs);
  assert(src_offset % bs == 0);

> +
> +  fs = ffs(dst_offset) - 1;
> +  if (fs != -1)
> + bs = MIN2(bs, 1 << fs);
> +  assert(dst_offset % bs == 0);
> +
> +  fs = ffs(pRegions[r].size) - 1;
> +  if (fs != -1)
> + bs = MIN2(bs, 1 << fs);
> +  assert(pRegions[r].size % bs == 0);
> +
> +  /* This is maximum possible width/height our HW can handle */
> +  uint64_t max_surface_dim = 1 << 14;
> +
> +  /* First,

Re: [Mesa-dev] [PATCH] radeonsi: don't preload constants at the beginning of shaders

2016-09-09 Thread Bas Nieuwenhuizen

Reviewed-by: Bas Nieuwenhuizen 

On Sat, Sep 10, 2016 at 12:40 AM, Marek Olšák  wrote:
> From: Marek Olšák 
>
> LLVM can CSE the loads, thus we can always re-load constants before each
> use. The decrease in SGPR spilling is huge.
>
> The best improvements are the dumbest ones.
>
> 26011 shaders in 14651 tests
> Totals:
> SGPRS: 1453346 -> 1251920 (-13.86 %)
> VGPRS: 742576 -> 728421 (-1.91 %)
> Spilled SGPRs: 52298 -> 16644 (-68.17 %)
> Spilled VGPRs: 397 -> 369 (-7.05 %)
> Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread
> Code Size: 36136488 -> 36001064 (-0.37 %) bytes
> LDS: 767 -> 767 (0.00 %) blocks
> Max Waves: 219315 -> 21 (1.33 %)
> ---
>  src/gallium/drivers/radeonsi/si_shader.c | 30 +++---
>  1 file changed, 11 insertions(+), 19 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
> b/src/gallium/drivers/radeonsi/si_shader.c
> index 0b7de18..08e3cee 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -1874,26 +1874,33 @@ static LLVMValueRef fetch_constant(
> for (chan = 0; chan < TGSI_NUM_CHANNELS; ++chan)
> values[chan] = fetch_constant(bld_base, reg, type, 
> chan);
>
> return lp_build_gather_values(bld_base->base.gallivm, values, 
> 4);
> }
>
> buf = reg->Register.Dimension ? reg->Dimension.Index : 0;
> idx = reg->Register.Index * 4 + swizzle;
>
> if (!reg->Register.Indirect && !reg->Dimension.Indirect) {
> +   LLVMValueRef c0, c1;
> +
> +   c0 = buffer_load_const(ctx, ctx->const_buffers[buf],
> +  LLVMConstInt(ctx->i32, idx * 4, 0));
> +
> if (!tgsi_type_is_64bit(type))
> -   return bitcast(bld_base, type, 
> ctx->constants[buf][idx]);
> +   return bitcast(bld_base, type, c0);
> else {
> +   c1 = buffer_load_const(ctx, ctx->const_buffers[buf],
> +  LLVMConstInt(ctx->i32,
> +   (idx + 1) * 4, 
> 0));
> return radeon_llvm_emit_fetch_64bit(bld_base, type,
> -   
> ctx->constants[buf][idx],
> -   
> ctx->constants[buf][idx + 1]);
> +   c0, c1);
> }
> }
>
> if (reg->Register.Dimension && reg->Dimension.Indirect) {
> LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, 
> SI_PARAM_CONST_BUFFERS);
> LLVMValueRef index;
> index = get_bounded_indirect_index(ctx, >DimIndirect,
>reg->Dimension.Index,
>SI_NUM_CONST_BUFFERS);
> bufp = build_indexed_load_const(ctx, ptr, index);
> @@ -5789,39 +5796,26 @@ static void create_function(struct si_shader_context 
> *ctx)
>
>  static void preload_constants(struct si_shader_context *ctx)
>  {
> struct lp_build_tgsi_context *bld_base = 
> >radeon_bld.soa.bld_base;
> struct gallivm_state *gallivm = bld_base->base.gallivm;
> const struct tgsi_shader_info *info = bld_base->info;
> unsigned buf;
> LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, 
> SI_PARAM_CONST_BUFFERS);
>
> for (buf = 0; buf < SI_NUM_CONST_BUFFERS; buf++) {
> -   unsigned i, num_const = info->const_file_max[buf] + 1;
> -
> -   if (num_const == 0)
> +   if (info->const_file_max[buf] == -1)
> continue;
>
> -   /* Allocate space for the constant values */
> -   ctx->constants[buf] = CALLOC(num_const * 4, 
> sizeof(LLVMValueRef));
> -
> /* Load the resource descriptor */
> ctx->const_buffers[buf] =
> build_indexed_load_const(ctx, ptr, 
> lp_build_const_int32(gallivm, buf));
> -
> -   /* Load the constants, we rely on the code sinking to do the 
> rest */
> -   for (i = 0; i < num_const * 4; ++i) {
> -   ctx->constants[buf][i] =
> -   buffer_load_const(ctx,
> -   ctx->const_buffers[buf],
> -   lp_build_const_int32(gallivm, i * 4));
> -   }
> }
>  }
>
>  static void preload_shader_buffers(struct si_shader_context *ctx)
>  {
> struct gallivm_state *gallivm = >radeon_bld.gallivm;
> LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, 
> SI_PARAM_SHADER_BUFFERS);
> int buf, maxbuf;
>
> maxbuf =

Re: [Mesa-dev] [PATCH] mesa: Implement ARB_shader_viewport_layer_array for i965

2016-09-09 Thread Ilia Mirkin

On Fri, Sep 9, 2016 at 7:14 PM, Dylan Baker  wrote:
> +EXT(ARB_shader_viewport_layer_array , 
> ARB_shader_viewport_layer_array, GLL, GLC,  x ,  x , 2015)

This relies on GL 3.2 at least, so I think this should just be GLC, not GLL.

Also, you mention it requires GL 3.1, but really it just requires
GL_ARB_viewport_array (which became core in GL 4.1). I think it'd be
safe to enable this on SNB+. Your call.

Core bits are

Reviewed-by: Ilia Mirkin 

Some piglit tests to check the TES behaviour would be nice.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] Check about recent changes in amdgpu

2016-09-09 Thread Mauro Rossi

Hi Dave,

while checking mesa-dev with android build,
I'm porting the necessary changes to to have amd/addrlib in Android build,

but I have also encountered a building error related to commit
https://cgit.freedesktop.org/mesa/mesa/commit/?id=1add3562e33f0234da50e54dda8cfa6dac613125

In the attachment the full build log, under my signatue short error log.

If I revert 1add3562e33f0234da50e54dda8cfa6dac613125
and after that I modify src/amd/common/amdgpu_id.h
in this way...

-#include "util/u_endian.h"
+#include "pipe/p_config.h"

...I can build mesa and complete Android 7.0, build,
so there may be some part missing in the new util/u_endian.h
header compared to the old one.

Would you like to check when you have time, please?

Pardon me if I may have wrote something wrong, beacuse I'm more involved
android  build system than in coding.

After your feedback, I will submit the final version of patch to fix
android build for review,
while in the following brach I have the workaround I've mentioned:
https://github.com/maurossi/mesa/commits/12.1.0devel_nougat-x86_10-sep-2016

Mauro Rossi
issor.or...@gmail.com


In file included from external/mesa/src/amd/addrlib/r800/ciaddrlib.cpp:36:
external/mesa/src/amd/addrlib/inc/chip/r800/si_gb_reg.h:36:2: error:
"BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined"
#error "BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined"
 ^
external/mesa/src/amd/addrlib/inc/chip/r800/si_gb_reg.h:91:6: error:
unknown type name 'GB_ADDR_CONFIG_T'
 GB_ADDR_CONFIG_T f;
 ^
external/mesa/src/amd/addrlib/inc/chip/r800/si_gb_reg.h:146:6: error:
unknown type name 'GB_TILE_MODE_T'
 GB_TILE_MODE_T f;
 ^
external/mesa/src/amd/addrlib/inc/chip/r800/si_gb_reg.h:151:6: error:
unknown type name 'GB_MACROTILE_MODE_T'
 GB_MACROTILE_MODE_T f;
 ^
4 errors generated.
[ 20% 391/1939] target  C++: libmesa_amdgpu_addrlib <=
external/mesa/src/amd/addrlib/r800/egbaddrlib.cpp
ninja: build stopped: subcommand failed.
build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed
make: *** [ninja_wrapper] Error 1

 make failed to build some targets (01:33 (mm:ss)) 
utente@utente-MS-7576:~/nougat-x86$ make -j5 iso_img 
TARGET_KERNEL_CONFIG=480_defconfig FOR_RELEASE=1

PLATFORM_VERSION_CODENAME=REL
PLATFORM_VERSION=7.0
TARGET_PRODUCT=android_x86
TARGET_BUILD_VARIANT=eng
TARGET_BUILD_TYPE=release
TARGET_BUILD_APPS=
TARGET_ARCH=x86
TARGET_ARCH_VARIANT=x86
TARGET_CPU_VARIANT=
TARGET_2ND_ARCH=
TARGET_2ND_ARCH_VARIANT=
TARGET_2ND_CPU_VARIANT=
HOST_ARCH=x86_64
HOST_2ND_ARCH=x86
HOST_OS=linux
HOST_OS_EXTRA=Linux-4.4.0-38-generic-x86_64-with-Ubuntu-16.04-xenial
HOST_CROSS_OS=windows
HOST_CROSS_ARCH=x86
HOST_CROSS_2ND_ARCH=x86_64
HOST_BUILD_TYPE=release
BUILD_ID=NRD90M
OUT_DIR=out

Running kati to generate build-android_x86.ninja...
external/mesa/src/gallium/winsys/amdgpu/drm/Android.mk was modified, 
regenerating...

PLATFORM_VERSION_CODENAME=REL
PLATFORM_VERSION=7.0
TARGET_PRODUCT=android_x86
TARGET_BUILD_VARIANT=eng
TARGET_BUILD_TYPE=release
TARGET_BUILD_APPS=
TARGET_ARCH=x86
TARGET_ARCH_VARIANT=x86
TARGET_CPU_VARIANT=
TARGET_2ND_ARCH=
TARGET_2ND_ARCH_VARIANT=
TARGET_2ND_CPU_VARIANT=
HOST_ARCH=x86_64
HOST_2ND_ARCH=x86
HOST_OS=linux
HOST_OS_EXTRA=Linux-4.4.0-38-generic-x86_64-with-Ubuntu-16.04-xenial
HOST_CROSS_OS=windows
HOST_CROSS_ARCH=x86
HOST_CROSS_2ND_ARCH=x86_64
HOST_BUILD_TYPE=release
BUILD_ID=NRD90M
OUT_DIR=out

including ./abi/cpp/Android.mk ...
including ./art/Android.mk ...
including ./bionic/Android.mk ...
including ./bootable/newinstaller/Android.mk ...
including ./bootable/recovery/Android.mk ...
including ./build/libs/host/Android.mk ...
including ./build/target/board/Android.mk ...
including ./build/target/product/security/Android.mk ...
including ./build/tools/Android.mk ...
including ./dalvik/Android.mk ...
including ./development/apps/BluetoothDebug/Android.mk ...
including ./development/apps/BuildWidget/Android.mk ...
including ./development/apps/CustomLocale/Android.mk ...
including ./development/apps/Development/Android.mk ...
including ./development/apps/DevelopmentSettings/Android.mk ...
including ./development/apps/Fallback/Android.mk ...
including ./development/apps/GestureBuilder/Android.mk ...
including ./development/apps/NinePatchLab/Android.mk ...
including ./development/apps/OBJViewer/Android.mk ...
including ./development/apps/SdkSetup/Android.mk ...
including ./development/apps/SettingInjectorSample/Android.mk ...
including ./development/apps/WidgetPreview/Android.mk ...
including ./development/apps/launchperf/Android.mk ...
including ./development/build/Android.mk ...
including ./development/cmds/monkey/Android.mk ...
including ./development/host/windows/prebuilt/usb/Android.mk ...
including ./development/ndk/Android.mk ...
including ./development/perftests/panorama/Android.mk ...

[Mesa-dev] [PATCH] mesa: Implement ARB_shader_viewport_layer_array for i965

2016-09-09 Thread Dylan Baker

This extension is a combination of AMD_vertex_shader_viewport_index and
AMD_vertex_shader_layer, making it rather trivial to implement.

For gallium I *think* this needs a new cap because of the addition of
support in tessellation evaluation shaders, and since I don't have any
hardware to test it on, I've left that for someone else to wire up.

Since this requires GL 4.1, this is only available on gen8+.

Signed-off-by: Dylan Baker 
---
 docs/features.txt|  2 +-
 docs/relnotes/12.1.0.html|  1 +
 src/compiler/glsl/builtin_variables.cpp  | 14 --
 src/compiler/glsl/glsl_parser_extras.cpp |  1 +
 src/compiler/glsl/glsl_parser_extras.h   |  2 ++
 src/mesa/drivers/dri/i965/intel_extensions.c |  1 +
 src/mesa/main/extensions_table.h |  1 +
 src/mesa/main/mtypes.h   |  1 +
 8 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index d6c3240..4b39a1c 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -296,7 +296,7 @@ Khronos, ARB, and OES extensions that are not part of any 
OpenGL or OpenGL ES ve
   GL_ARB_shader_draw_parameters DONE (i965, nvc0, 
radeonsi)
   GL_ARB_shader_group_vote  DONE (nvc0)
   GL_ARB_shader_stencil_export  DONE (i965/gen9+, 
radeonsi, softpipe, llvmpipe, swr)
-  GL_ARB_shader_viewport_layer_arraynot started
+  GL_ARB_shader_viewport_layer_arrayDONE (i965/gen8+)
   GL_ARB_sparse_buffer  not started
   GL_ARB_sparse_texture not started
   GL_ARB_sparse_texture2not started
diff --git a/docs/relnotes/12.1.0.html b/docs/relnotes/12.1.0.html
index bb20e4f..1a0177c 100644
--- a/docs/relnotes/12.1.0.html
+++ b/docs/relnotes/12.1.0.html
@@ -52,6 +52,7 @@ Note: some of the new features are only available with 
certain drivers.
 GL_ARB_indirect_parameters on radeonsi
 GL_ARB_shader_draw_parameters on radeonsi
 GL_ARB_shader_group_vote on nvc0
+GL_ARB_shader_viewport_layer_array on i965/gen8+
 GL_ARB_stencil_texturing on i965/hsw
 GL_ARB_texture_stencil8 on i965/hsw
 GL_EXT_window_rectangles on nv50, nvc0
diff --git a/src/compiler/glsl/builtin_variables.cpp 
b/src/compiler/glsl/builtin_variables.cpp
index 90278d6..8d6413e 100644
--- a/src/compiler/glsl/builtin_variables.cpp
+++ b/src/compiler/glsl/builtin_variables.cpp
@@ -1000,11 +1000,13 @@ builtin_variable_generator::generate_vs_special_vars()
   add_system_value(SYSTEM_VALUE_BASE_INSTANCE, int_t, 
"gl_BaseInstanceARB");
   add_system_value(SYSTEM_VALUE_DRAW_ID, int_t, "gl_DrawIDARB");
}
-   if (state->AMD_vertex_shader_layer_enable) {
+   if (state->AMD_vertex_shader_layer_enable ||
+   state->ARB_shader_viewport_layer_array_enable) {
   var = add_output(VARYING_SLOT_LAYER, int_t, "gl_Layer");
   var->data.interpolation = INTERP_MODE_FLAT;
}
-   if (state->AMD_vertex_shader_viewport_index_enable) {
+   if (state->AMD_vertex_shader_viewport_index_enable ||
+   state->ARB_shader_viewport_layer_array_enable) {
   var = add_output(VARYING_SLOT_VIEWPORT, int_t, "gl_ViewportIndex");
   var->data.interpolation = INTERP_MODE_FLAT;
}
@@ -1066,6 +1068,8 @@ builtin_variable_generator::generate_tcs_special_vars()
 void
 builtin_variable_generator::generate_tes_special_vars()
 {
+   ir_variable *var;
+
add_system_value(SYSTEM_VALUE_PRIMITIVE_ID, int_t, "gl_PrimitiveID");
add_system_value(SYSTEM_VALUE_VERTICES_IN, int_t, "gl_PatchVerticesIn");
add_system_value(SYSTEM_VALUE_TESS_COORD, vec3_t, "gl_TessCoord");
@@ -1073,6 +1077,12 @@ builtin_variable_generator::generate_tes_special_vars()
 "gl_TessLevelOuter");
add_system_value(SYSTEM_VALUE_TESS_LEVEL_INNER, array(float_t, 2),
 "gl_TessLevelInner");
+   if (state->ARB_shader_viewport_layer_array_enable) {
+  var = add_output(VARYING_SLOT_LAYER, int_t, "gl_Layer");
+  var->data.interpolation = INTERP_MODE_FLAT;
+  var = add_output(VARYING_SLOT_VIEWPORT, int_t, "gl_ViewportIndex");
+  var->data.interpolation = INTERP_MODE_FLAT;
+   }
 }
 
 
diff --git a/src/compiler/glsl/glsl_parser_extras.cpp 
b/src/compiler/glsl/glsl_parser_extras.cpp
index 436ddd0..a21ce50 100644
--- a/src/compiler/glsl/glsl_parser_extras.cpp
+++ b/src/compiler/glsl/glsl_parser_extras.cpp
@@ -608,6 +608,7 @@ static const _mesa_glsl_extension 
_mesa_glsl_supported_extensions[] = {
EXT(ARB_shader_subroutine),
EXT(ARB_shader_texture_image_samples),
EXT(ARB_shader_texture_lod),
+   EXT(ARB_shader_viewport_layer_array),
EXT(ARB_shading_language_420pack),
EXT(ARB_shading_language_packing),
EXT(ARB_tessellation_shader),
diff --git a/src/compiler/glsl/glsl_parser_extras.h 
b/src/compiler/glsl/glsl_parser_extras.h
index

[Mesa-dev] [PATCH] radeonsi: don't preload constants at the beginning of shaders

2016-09-09 Thread Marek Olšák

From: Marek Olšák 

LLVM can CSE the loads, thus we can always re-load constants before each
use. The decrease in SGPR spilling is huge.

The best improvements are the dumbest ones.

26011 shaders in 14651 tests
Totals:
SGPRS: 1453346 -> 1251920 (-13.86 %)
VGPRS: 742576 -> 728421 (-1.91 %)
Spilled SGPRs: 52298 -> 16644 (-68.17 %)
Spilled VGPRs: 397 -> 369 (-7.05 %)
Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread
Code Size: 36136488 -> 36001064 (-0.37 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 219315 -> 21 (1.33 %)
---
 src/gallium/drivers/radeonsi/si_shader.c | 30 +++---
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 0b7de18..08e3cee 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1874,26 +1874,33 @@ static LLVMValueRef fetch_constant(
for (chan = 0; chan < TGSI_NUM_CHANNELS; ++chan)
values[chan] = fetch_constant(bld_base, reg, type, 
chan);
 
return lp_build_gather_values(bld_base->base.gallivm, values, 
4);
}
 
buf = reg->Register.Dimension ? reg->Dimension.Index : 0;
idx = reg->Register.Index * 4 + swizzle;
 
if (!reg->Register.Indirect && !reg->Dimension.Indirect) {
+   LLVMValueRef c0, c1;
+
+   c0 = buffer_load_const(ctx, ctx->const_buffers[buf],
+  LLVMConstInt(ctx->i32, idx * 4, 0));
+
if (!tgsi_type_is_64bit(type))
-   return bitcast(bld_base, type, 
ctx->constants[buf][idx]);
+   return bitcast(bld_base, type, c0);
else {
+   c1 = buffer_load_const(ctx, ctx->const_buffers[buf],
+  LLVMConstInt(ctx->i32,
+   (idx + 1) * 4, 0));
return radeon_llvm_emit_fetch_64bit(bld_base, type,
-   
ctx->constants[buf][idx],
-   
ctx->constants[buf][idx + 1]);
+   c0, c1);
}
}
 
if (reg->Register.Dimension && reg->Dimension.Indirect) {
LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, 
SI_PARAM_CONST_BUFFERS);
LLVMValueRef index;
index = get_bounded_indirect_index(ctx, >DimIndirect,
   reg->Dimension.Index,
   SI_NUM_CONST_BUFFERS);
bufp = build_indexed_load_const(ctx, ptr, index);
@@ -5789,39 +5796,26 @@ static void create_function(struct si_shader_context 
*ctx)
 
 static void preload_constants(struct si_shader_context *ctx)
 {
struct lp_build_tgsi_context *bld_base = >radeon_bld.soa.bld_base;
struct gallivm_state *gallivm = bld_base->base.gallivm;
const struct tgsi_shader_info *info = bld_base->info;
unsigned buf;
LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, 
SI_PARAM_CONST_BUFFERS);
 
for (buf = 0; buf < SI_NUM_CONST_BUFFERS; buf++) {
-   unsigned i, num_const = info->const_file_max[buf] + 1;
-
-   if (num_const == 0)
+   if (info->const_file_max[buf] == -1)
continue;
 
-   /* Allocate space for the constant values */
-   ctx->constants[buf] = CALLOC(num_const * 4, 
sizeof(LLVMValueRef));
-
/* Load the resource descriptor */
ctx->const_buffers[buf] =
build_indexed_load_const(ctx, ptr, 
lp_build_const_int32(gallivm, buf));
-
-   /* Load the constants, we rely on the code sinking to do the 
rest */
-   for (i = 0; i < num_const * 4; ++i) {
-   ctx->constants[buf][i] =
-   buffer_load_const(ctx,
-   ctx->const_buffers[buf],
-   lp_build_const_int32(gallivm, i * 4));
-   }
}
 }
 
 static void preload_shader_buffers(struct si_shader_context *ctx)
 {
struct gallivm_state *gallivm = >radeon_bld.gallivm;
LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, 
SI_PARAM_SHADER_BUFFERS);
int buf, maxbuf;
 
maxbuf = MIN2(ctx->shader->selector->info.file_max[TGSI_FILE_BUFFER],
@@ -6898,22 +6892,20 @@ int si_compile_tgsi_shader(struct si_screen *sscreen,
ctx.shader = shader->gs_copy_shader;
if ((r = si_generate_gs_copy_shader(sscreen, ,
shader, debug))) {

[Mesa-dev] [PATCH] nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.

2016-09-09 Thread Eric Anholt

VC4 was running into a major performance regression from enabling control
flow in the glmark2 conditionals test, because of short if statements
containing an ffract.

This pass seems like it was was trying to ensure that we only flattened
IFs that should be entirely a win by guaranteeing that there would be
fewer bcsels than there were MOVs otherwise.  However, if the number of
ALU ops is small, we can avoid the overhead of branching (which itself
costs cycles) and still get a win, even if it means moving real
instructions out of the THEN/ELSE blocks.

For now, just turn on aggressive flattening on vc4.  i965 will need some
tuning to avoid regressions.  It does looks like this may be useful to
replace freedreno code.

Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47
fps to 95 fps on vc4.

vc4 shader-db:
total instructions in shared programs: 83220 -> 78887 (-5.21%)
instructions in affected programs: 21095 -> 16762 (-20.54%)
total uniforms in shared programs: 25977 -> 25606 (-1.43%)
uniforms in affected programs: 4196 -> 3825 (-8.84%)
total estimated cycles in shared programs: 195563 -> 192153 (-1.74%)
estimated cycles in affected programs: 30171 -> 26761 (-11.30%)

(This doesn't even count the test being optimized, due to a limitation in
shader-db-2)
---
 src/compiler/nir/nir.h |  2 +-
 src/compiler/nir/nir_opt_peephole_select.c | 82 --
 src/gallium/drivers/vc4/vc4_program.c  |  2 +-
 src/mesa/drivers/dri/i965/brw_nir.c|  2 +-
 4 files changed, 57 insertions(+), 31 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index c1cf94001f65..f56f67690d34 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2589,7 +2589,7 @@ bool nir_opt_dead_cf(nir_shader *shader);
 
 void nir_opt_gcm(nir_shader *shader);
 
-bool nir_opt_peephole_select(nir_shader *shader);
+bool nir_opt_peephole_select(nir_shader *shader, unsigned limit);
 
 bool nir_opt_remove_phis(nir_shader *shader);
 
diff --git a/src/compiler/nir/nir_opt_peephole_select.c 
b/src/compiler/nir/nir_opt_peephole_select.c
index 633e9f486c08..6a73d737077c 100644
--- a/src/compiler/nir/nir_opt_peephole_select.c
+++ b/src/compiler/nir/nir_opt_peephole_select.c
@@ -32,23 +32,33 @@
  * Implements a small peephole optimization that looks for
  *
  * if (cond) {
- *
+ *
  * } else {
- *
+ *
  * }
  * phi
  * ...
  * phi
  *
- * and replaces it with a series of selects.  It can also handle the case
- * where, instead of being empty, the if may contain some move operations
- * whose only use is one of the following phi nodes.  This happens all the
- * time when the SSA form comes from a conditional assignment with a
- * swizzle.
+ * and replaces it with:
+ *
+ * 
+ * 
+ * bcsel
+ * ...
+ * bcsel
+ *
+ * where the SSA defs are ALU operations or other cheap instructions (not
+ * texturing, for example).
+ *
+ * If the number of ALU operations in the branches is greater than the limit
+ * parameter, then the optimization is skipped.  In limit=0 mode, the SSA defs
+ * must only be MOVs which we expect to get copy-propagated away once they're
+ * out of the inner blocks.
  */
 
 static bool
-block_check_for_allowed_instrs(nir_block *block)
+block_check_for_allowed_instrs(nir_block *block, unsigned *count, bool alu_ok)
 {
nir_foreach_instr(instr, block) {
   switch (instr->type) {
@@ -67,6 +77,11 @@ block_check_for_allowed_instrs(nir_block *block)
 }
 break;
 
+ case nir_intrinsic_load_uniform:
+if (!alu_ok)
+   return false;
+break;
+
  default:
 return false;
  }
@@ -89,29 +104,36 @@ block_check_for_allowed_instrs(nir_block *block)
  case nir_op_vec2:
  case nir_op_vec3:
  case nir_op_vec4:
-/* It must be a move-like operation. */
 break;
  default:
-return false;
+if (!alu_ok) {
+   /* It must be a move-like operation. */
+   return false;
+}
+break;
  }
 
- /* Can't handle saturate */
- if (mov->dest.saturate)
-return false;
-
  /* It must be SSA */
  if (!mov->dest.dest.is_ssa)
 return false;
 
- /* It cannot have any if-uses */
- if (!list_empty(>dest.dest.ssa.if_uses))
-return false;
+ if (alu_ok) {
+(*count)++;
+ } else {
+/* Can't handle saturate */
+if (mov->dest.saturate)
+   return false;
 
- /* The only uses of this definition must be phi's in the successor */
- nir_foreach_use(use, >dest.dest.ssa) {
-if (use->parent_instr->type != nir_instr_type_phi ||
-use->parent_instr->block != block->successors[0])
+/* It cannot have any if-uses */
+if

Re: [Mesa-dev] [PATCH 29/33] anv: Use blorp for CopyImage

2016-09-09 Thread Jason Ekstrand

On Fri, Sep 9, 2016 at 2:35 PM, Nanley Chery  wrote:

> On Wed, Aug 31, 2016 at 02:22:48PM -0700, Jason Ekstrand wrote:
> > ---
> >  src/intel/vulkan/anv_blorp.c |  67 +
> >  src/intel/vulkan/anv_meta_copy.c | 158 --
> -
> >  2 files changed, 67 insertions(+), 158 deletions(-)
> >
> > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> > index 5fa6699..89ff3b3 100644
> > --- a/src/intel/vulkan/anv_blorp.c
> > +++ b/src/intel/vulkan/anv_blorp.c
> > @@ -168,6 +168,73 @@ get_blorp_surf_for_anv_image(const struct
> anv_image *image,
> > };
> >  }
> >
> > +void anv_CmdCopyImage(
> > +VkCommandBuffer commandBuffer,
> > +VkImage srcImage,
> > +VkImageLayout   srcImageLayout,
> > +VkImage dstImage,
> > +VkImageLayout   dstImageLayout,
> > +uint32_tregionCount,
> > +const VkImageCopy*  pRegions)
> > +{
> > +   ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
> > +   ANV_FROM_HANDLE(anv_image, src_image, srcImage);
> > +   ANV_FROM_HANDLE(anv_image, dst_image, dstImage);
> > +
> > +   struct blorp_batch batch;
> > +   blorp_batch_init(_buffer->device->blorp, , cmd_buffer);
> > +
> > +   for (unsigned r = 0; r < regionCount; r++) {
> > +  VkOffset3D srcOffset =
> > + anv_sanitize_image_offset(src_image->type,
> pRegions[r].srcOffset);
> > +  VkOffset3D dstOffset =
> > + anv_sanitize_image_offset(src_image->type,
> pRegions[r].dstOffset);
> > +  VkExtent3D extent =
> > + anv_sanitize_image_extent(src_image->type,
> pRegions[r].extent);
> > +
> > +  unsigned dst_base_layer, layer_count;
> > +  if (dst_image->type == VK_IMAGE_TYPE_3D) {
> > + dst_base_layer = dstOffset.z;
> > + layer_count = extent.depth;
>
> In the patch titled, "anv/blorp: Use the correct coordinates in
> CmdCopyImage,"
> you replace extent.depth with pRegions[r].extent.depth. I think it also
> makes
> sense to change the assignment of src_base_layer and dest_base_layer with
> the
> offsets straight from user when we know the image type is VK_IMAGE_TYPE_3D.
> When the type is 3D, the sanitize functions do nothing.
>

Sounds reasonable.


> With or without that change, and assuming the fixup patch is amended,
> this patch is,
>
> Reviewed-by: Nanley Chery 
>

Thanks!

> +  } else {
> > + dst_base_layer = pRegions[r].dstSubresource.baseArrayLayer;
> > + layer_count = pRegions[r].dstSubresource.layerCount;
> > +  }
> > +
> > +  unsigned src_base_layer;
> > +  if (src_image->type == VK_IMAGE_TYPE_3D) {
> > + src_base_layer = srcOffset.z;
> > +  } else {
> > + src_base_layer = pRegions[r].srcSubresource.baseArrayLayer;
> > + assert(pRegions[r].srcSubresource.layerCount == layer_count);
> > +  }
> > +
> > +  assert(pRegions[r].srcSubresource.aspectMask ==
> > + pRegions[r].dstSubresource.aspectMask);
> > +
> > +  uint32_t a;
> > +  for_each_bit(a, pRegions[r].dstSubresource.aspectMask) {
> > + VkImageAspectFlagBits aspect = (1 << a);
> > +
> > + struct blorp_surf src_surf, dst_surf;
> > + get_blorp_surf_for_anv_image(src_image, aspect, _surf);
> > + get_blorp_surf_for_anv_image(dst_image, aspect, _surf);
> > +
> > + for (unsigned i = 0; i < layer_count; i++) {
> > +blorp_copy(, _surf, pRegions[r].srcSubresource.
> mipLevel,
> > +   src_base_layer + i,
> > +   _surf, pRegions[r].dstSubresource.mipLevel,
> > +   dst_base_layer + i,
> > +   srcOffset.x, srcOffset.y,
> > +   dstOffset.x, dstOffset.y,
> > +   extent.width, extent.height);
> > + }
> > +  }
> > +   }
> > +
> > +   blorp_batch_finish();
> > +}
> > +
> >  static void
> >  copy_buffer_to_image(struct anv_cmd_buffer *cmd_buffer,
> >   struct anv_buffer *anv_buffer,
> > diff --git a/src/intel/vulkan/anv_meta_copy.c
> b/src/intel/vulkan/anv_meta_copy.c
> > index 5df04e6..b33273e 100644
> > --- a/src/intel/vulkan/anv_meta_copy.c
> > +++ b/src/intel/vulkan/anv_meta_copy.c
> > @@ -23,63 +23,6 @@
> >
> >  #include "anv_meta.h"
> >
> > -static VkExtent3D
> > -meta_image_block_size(const struct anv_image *image)
> > -{
> > -   if (image->aspects == VK_IMAGE_ASPECT_COLOR_BIT) {
> > -  const struct isl_format_layout *isl_layout =
> > - isl_format_get_layout(image->color_surface.isl.format);
> > -  return (VkExtent3D) { isl_layout->bw, isl_layout->bh,
> isl_layout->bd };
> > -   } else {
> > -  return (VkExtent3D) { 1, 1, 1 };
> > -   }
> > -}
> > -
> > -/* Returns

Re: [Mesa-dev] [PATCH 29/33] anv: Use blorp for CopyImage

2016-09-09 Thread Nanley Chery

On Wed, Aug 31, 2016 at 02:22:48PM -0700, Jason Ekstrand wrote:
> ---
>  src/intel/vulkan/anv_blorp.c |  67 +
>  src/intel/vulkan/anv_meta_copy.c | 158 
> ---
>  2 files changed, 67 insertions(+), 158 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index 5fa6699..89ff3b3 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -168,6 +168,73 @@ get_blorp_surf_for_anv_image(const struct anv_image 
> *image,
> };
>  }
>  
> +void anv_CmdCopyImage(
> +VkCommandBuffer commandBuffer,
> +VkImage srcImage,
> +VkImageLayout   srcImageLayout,
> +VkImage dstImage,
> +VkImageLayout   dstImageLayout,
> +uint32_tregionCount,
> +const VkImageCopy*  pRegions)
> +{
> +   ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
> +   ANV_FROM_HANDLE(anv_image, src_image, srcImage);
> +   ANV_FROM_HANDLE(anv_image, dst_image, dstImage);
> +
> +   struct blorp_batch batch;
> +   blorp_batch_init(_buffer->device->blorp, , cmd_buffer);
> +
> +   for (unsigned r = 0; r < regionCount; r++) {
> +  VkOffset3D srcOffset =
> + anv_sanitize_image_offset(src_image->type, pRegions[r].srcOffset);
> +  VkOffset3D dstOffset =
> + anv_sanitize_image_offset(src_image->type, pRegions[r].dstOffset);
> +  VkExtent3D extent =
> + anv_sanitize_image_extent(src_image->type, pRegions[r].extent);
> +
> +  unsigned dst_base_layer, layer_count;
> +  if (dst_image->type == VK_IMAGE_TYPE_3D) {
> + dst_base_layer = dstOffset.z;
> + layer_count = extent.depth;

In the patch titled, "anv/blorp: Use the correct coordinates in CmdCopyImage,"
you replace extent.depth with pRegions[r].extent.depth. I think it also makes
sense to change the assignment of src_base_layer and dest_base_layer with the
offsets straight from user when we know the image type is VK_IMAGE_TYPE_3D.
When the type is 3D, the sanitize functions do nothing.

With or without that change, and assuming the fixup patch is amended,
this patch is,

Reviewed-by: Nanley Chery 

> +  } else {
> + dst_base_layer = pRegions[r].dstSubresource.baseArrayLayer;
> + layer_count = pRegions[r].dstSubresource.layerCount;
> +  }
> +
> +  unsigned src_base_layer;
> +  if (src_image->type == VK_IMAGE_TYPE_3D) {
> + src_base_layer = srcOffset.z;
> +  } else {
> + src_base_layer = pRegions[r].srcSubresource.baseArrayLayer;
> + assert(pRegions[r].srcSubresource.layerCount == layer_count);
> +  }
> +
> +  assert(pRegions[r].srcSubresource.aspectMask ==
> + pRegions[r].dstSubresource.aspectMask);
> +
> +  uint32_t a;
> +  for_each_bit(a, pRegions[r].dstSubresource.aspectMask) {
> + VkImageAspectFlagBits aspect = (1 << a);
> +
> + struct blorp_surf src_surf, dst_surf;
> + get_blorp_surf_for_anv_image(src_image, aspect, _surf);
> + get_blorp_surf_for_anv_image(dst_image, aspect, _surf);
> +
> + for (unsigned i = 0; i < layer_count; i++) {
> +blorp_copy(, _surf, 
> pRegions[r].srcSubresource.mipLevel,
> +   src_base_layer + i,
> +   _surf, pRegions[r].dstSubresource.mipLevel,
> +   dst_base_layer + i,
> +   srcOffset.x, srcOffset.y,
> +   dstOffset.x, dstOffset.y,
> +   extent.width, extent.height);
> + }
> +  }
> +   }
> +
> +   blorp_batch_finish();
> +}
> +
>  static void
>  copy_buffer_to_image(struct anv_cmd_buffer *cmd_buffer,
>   struct anv_buffer *anv_buffer,
> diff --git a/src/intel/vulkan/anv_meta_copy.c 
> b/src/intel/vulkan/anv_meta_copy.c
> index 5df04e6..b33273e 100644
> --- a/src/intel/vulkan/anv_meta_copy.c
> +++ b/src/intel/vulkan/anv_meta_copy.c
> @@ -23,63 +23,6 @@
>  
>  #include "anv_meta.h"
>  
> -static VkExtent3D
> -meta_image_block_size(const struct anv_image *image)
> -{
> -   if (image->aspects == VK_IMAGE_ASPECT_COLOR_BIT) {
> -  const struct isl_format_layout *isl_layout =
> - isl_format_get_layout(image->color_surface.isl.format);
> -  return (VkExtent3D) { isl_layout->bw, isl_layout->bh, isl_layout->bd };
> -   } else {
> -  return (VkExtent3D) { 1, 1, 1 };
> -   }
> -}
> -
> -/* Returns the user-provided VkBufferImageCopy::imageExtent in units of
> - * elements rather than texels. One element equals one texel or one block
> - * if Image is uncompressed or compressed, respectively.
> - */
> -static struct VkExtent3D
> -meta_region_extent_el(const struct anv_image *image,
> -  const struct VkExtent3D

Re: [Mesa-dev] [PATCH 00/12] radeon/winsyses: various cleanups

2016-09-09 Thread Marek Olšák

On Fri, Sep 9, 2016 at 7:34 PM, Nicolai Hähnle  wrote:
> Hi,
>
> this is a bunch of random cleanups in and related to radeon and amdgpu
> winsyses.
>
> There is one significant behavioral change, which is that amdgpu now only
> keeps a single fence around for each buffer object. See the explanation
> at that commit. Please review!

It's not really a behavioral change, is it?

For the series:

Reviewed-by: Marek Olšák 

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 2/3] egl: return corresponding offset of EGLImage instead of 0.

2016-09-09 Thread Axel Davy

I'm not sure calling queryImage with an unsupported attribute is legal, 
thus I think a small check doesn't hurt.


It'd give

if (offsets) {
   offsets[0] = 0;
   if (dri2_dpy->image->base.version >= 13) {
  EGLint img_offset = 0;
  bool ret = dri2_dpy->image->queryImage(dri2_img->dri_image,
 __DRI_IMAGE_ATTRIB_OFFSET, 
_offset);

  if (ret)
 offsets[0] = img_offset;
   }
}
return EGL_TRUE;

or alternatively, if you think not being able to give the offset 
indicates an error,


if (offsets) {
   offsets[0] = 0;
   if (dri2_dpy->image->base.version >= 13) {
  EGLint img_offset = 0;
  bool ret = dri2_dpy->image->queryImage(dri2_img->dri_image,
 __DRI_IMAGE_ATTRIB_OFFSET, 
_offset);

  if (!ret)
 return EGL_FALSE;
  offsets[0] = img_offset;
   }
}
return EGL_TRUE;

Axel Davy

On 09/09/2016 17:34, Weng, Chuanbo wrote:

The comments from Emil can be summarized into the following code:

if (offsets) {
   offsets[0] = 0;
   EGLint img_offset = 0;
   bool ret = dri2_dpy->image->queryImage(dri2_img->dri_image,
__DRI_IMAGE_ATTRIB_OFFSET, _offset);
   if(ret == true)
  offsets[0] = img_offset;
}

return EGL_TRUE;

Emil, is it right?

-Original Message-
From: Emil Velikov [mailto:emil.l.veli...@gmail.com]
Sent: Friday, September 9, 2016 8:32 PM
To: Weng, Chuanbo 
Cc: ML mesa-dev ; Axel Davy 
Subject: Re: [PATCH v2 2/3] egl: return corresponding offset of EGLImage 
instead of 0.

On 9 September 2016 at 08:58, Chuanbo Weng  wrote:

The offset should not always be 0. For example, if EGLImage is created
from a 2D texture with EGL_GL_TEXTURE_LEVEL=1, then the offset should
be the actual start of miplevel 1 in bo.

v2: Add version check of __DRIimageExtension implementation (Suggested
by Axel Davy).


Just to elaborate what Axel was saying from another perspective:

With current master the offset is explicitly zeroed, while with the
(v2) patch you'll _only_ set the value when you have v13 driver.
Thus using the loader + v12 driver you'll get a regression since a) the offset 
will remain garbage and b) the function
(dri2_export_dma_buf_image_mesa) will fail.



Signed-off-by: Chuanbo Weng 
---
  src/egl/drivers/dri2/egl_dri2.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c
b/src/egl/drivers/dri2/egl_dri2.c index 859612f..c529e55 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -2249,6 +2249,7 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, 
_EGLDisplay *disp, _EGLImage *im
 struct dri2_egl_image *dri2_img = dri2_egl_image(img);

 (void) drv;
+   EGLint img_offset = 0;

 /* rework later to provide multiple fds/strides/offsets */
 if (fds)
@@ -2259,8 +2260,14 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, 
_EGLDisplay *disp, _EGLImage *im
dri2_dpy->image->queryImage(dri2_img->dri_image,
   __DRI_IMAGE_ATTRIB_STRIDE, strides);

-   if (offsets)
+   if (offsets){
offsets[0] = 0;
+  if(dri2_dpy->image->base.version >= 13){

Please move the variable declaration in local scope and add space between ){

Functionality wise we don't need the version check, esp. since you want to set 
the offset only when queryImage() succeeds...


+ dri2_dpy->image->queryImage(dri2_img->dri_image,
+ __DRI_IMAGE_ATTRIB_OFFSET, _offset);

... which you don't seem to be checking here, so you'll effectively store/use 
garbage.

So unless Alex feels strongly for the version check I'd omit it, and fix the 
rest of this patch.

-Emil



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 28/57] i965/fs: Take into account copy register offset during compute-to-mrf.

2016-09-09 Thread Francisco Jerez

Iago Toral  writes:

> On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote:
>> This was dropping 'inst->dst.offset' on the floor.  Nothing in the
>> code above seems to guarantee that it's zero and in that case the
>> offset of the register being coalesced into wouldn't be taken into
>> account while rewriting the generating instruction.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index 98da06c..5b7ca98 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -2821,7 +2821,7 @@ fs_visitor::compute_to_mrf()
>>  }
>>  
>>  scan_inst->dst.file = MRF;
>> -scan_inst->dst.offset %= REG_SIZE;
>> +scan_inst->dst.offset = inst->dst.offset + scan_inst-
>> >dst.offset % REG_SIZE;
>
> So if we had something like this:
>
> 0: mov(4) r1.4:F r0.0:F
> 1: mov(4) m1:F r1.4:F
>
> This pass is trying to get us:
>
> mov(4) m1:F r0.0:F
>

Yeah, that's what it should be doing, but the current code (already
before this series) would give you:

| mov(4) m1.4:F r0.0:F

which is pretty bogus.

> In that case, the offset into the destination of the instruction we are 
> propagating from should not affect the offset into the MRF we are writing to. 
> Or maybe I am missing what is going on here :)

Yeah, that's exactly the problem fixed in the next patch.  Note that
this commit doesn't actually introduce the bug, it's preserving the
buggy behavior of preexisting code, which happens to become more obvious
due to the explicit 'scan_inst->dst.offset % REG_SIZE' term instead of
the '%=' operator.

>
>>  scan_inst->saturate |= inst->saturate;
>>  if (!regs_left)
>> break;


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 27/57] i965/vec4: Drop backend_reg::in_range() in favor of regions_overlap().

2016-09-09 Thread Francisco Jerez

Iago Toral  writes:

> On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote:
>> This makes sure that overlap checks are done correctly throughout the
>> back-end when the '*this' register starts before the register/size
>> pair provided as argument, and is actually less annoying to use than
>> in_range() at this point since regions_overlap() takes its size
>> arguments in bytes.
>> ---
>>  src/mesa/drivers/dri/i965/brw_shader.cpp|  9 -
>> 
>>  src/mesa/drivers/dri/i965/brw_shader.h  |  1 -
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp  | 14
>> --
>>  src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp |  4 ++--
>>  src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp |  4 ++--
>>  5 files changed, 12 insertions(+), 20 deletions(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp
>> b/src/mesa/drivers/dri/i965/brw_shader.cpp
>> index e599235..951e6b2 100644
>> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
>> @@ -745,15 +745,6 @@ backend_reg::is_accumulator() const
>>  }
>>  
>>  bool
>> -backend_reg::in_range(const backend_reg , unsigned n) const
>> -{
>> -   return (file == r.file &&
>> -   nr == r.nr &&
>> -   offset >= r.offset &&
>> -   offset < r.offset + n * REG_SIZE);
>> -}
>> -
>> -bool
>>  backend_instruction::is_commutative() const
>>  {
>> switch (opcode) {
>> diff --git a/src/mesa/drivers/dri/i965/brw_shader.h
>> b/src/mesa/drivers/dri/i965/brw_shader.h
>> index 0de0808..ba2404a 100644
>> --- a/src/mesa/drivers/dri/i965/brw_shader.h
>> +++ b/src/mesa/drivers/dri/i965/brw_shader.h
>> @@ -62,7 +62,6 @@ struct backend_reg : private brw_reg
>> bool is_negative_one() const;
>> bool is_null() const;
>> bool is_accumulator() const;
>> -   bool in_range(const backend_reg , unsigned n) const;
>>  
>> /** Offset from the start of the (virtual) register in bytes. */
>> uint16_t offset;
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index 561170c..eb2888f 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -1143,7 +1143,8 @@ vec4_visitor::opt_register_coalesce()
>>    inst) {
>>   _scan_inst = scan_inst;
>>  
>> - if (inst->src[0].in_range(scan_inst->dst,
>> DIV_ROUND_UP(scan_inst->size_written, REG_SIZE))) {
>> + if (regions_overlap(inst->src[0], inst->size_read(0),
>> + scan_inst->dst, scan_inst-
>> >size_written)) {
>>  /* Found something writing to the reg we want to
>> coalesce away. */
>>  if (to_mrf) {
>> /* SEND instructions can't have MRF as a destination.
>> */
>> @@ -1197,8 +1198,8 @@ vec4_visitor::opt_register_coalesce()
>>    */
>>   bool interfered = false;
>>   for (int i = 0; i < 3; i++) {
>> -if (inst->src[0].in_range(scan_inst->src[i],
>> -  DIV_ROUND_UP(scan_inst-
>> >size_read(i), REG_SIZE)))
>> +if (regions_overlap(inst->src[0], inst->size_read(0),
>> +scan_inst->src[i], scan_inst-
>> >size_read(i)))
>>     interfered = true;
>>   }
>>   if (interfered)
>> @@ -1207,7 +1208,8 @@ vec4_visitor::opt_register_coalesce()
>>   /* If somebody else writes the same channels of our
>> destination here,
>>    * we can't coalesce before that.
>>    */
>> - if (inst->dst.in_range(scan_inst->dst,
>> DIV_ROUND_UP(scan_inst->size_written, REG_SIZE)) &&
>> + if (regions_overlap(inst->dst, inst->size_written,
>> + scan_inst->dst, scan_inst-
>> >size_written) &&
>>   (inst->dst.writemask & scan_inst->dst.writemask) != 0)
>> {
>>  break;
>>   }
>> @@ -1223,8 +1225,8 @@ vec4_visitor::opt_register_coalesce()
>>  }
>>   } else {
>>  for (int i = 0; i < 3; i++) {
>> -   if (inst->dst.in_range(scan_inst->src[i],
>> -  DIV_ROUND_UP(scan_inst-
>> >size_read(i), REG_SIZE)))
>> +   if (regions_overlap(inst->dst, inst->size_written,
>> +   scan_inst->src[i], scan_inst-
>> >size_read(i)))
>>    interfered = true;
>>  }
>>  if (interfered)
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>> b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>> index e74bc15..9977317 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>> @@ -68,8 +68,8 @@ opt_cmod_propagation_local(bblock_t *block)
>>  
>>    bool read_flag = false;
>>

Re: [Mesa-dev] [PATCH 16/57] i965/fs: Take into account trailing padding in regs_written() and regs_read().

2016-09-09 Thread Francisco Jerez

Iago Toral  writes:

> On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote:
>> This fixes regs_written() and regs_read() to return a more accurate
>> value when the padding left between components due to a stride value
>> greater than one causes the region bounds given by size_written or
>> size_read to overflow into the next register.  This could become a
>> problem in optimization passes that keep track of dataflow using
>> fixed-size arrays with register granularity, because the overflow
>> register (not actually accessed by the region) may not have been
>> allocated at all which could lead to undefined memory access.
>
> ah, nice catch! I am curious: did you find any cases where this was
> creating a real problem or just something you noticed by inspection? 
>

Yeah, it definitely fixed real problems in a bunch of FP64 tests (I
forgot which ones but I could dig it up if you're interested) which
would have regressed from the next commits that fix regs_written and
regs_read to take into account misalignment -- The only reason this
wasn't a problem before is that we weren't rounding up the register
count correctly in most places...

>> An alternative to this would be to subtract the trailing padding
>> already during the calculation of fs_inst::size_read and
>> ::size_written, but that would break code that currently assumes that
>> ::size_read and _written are whole multiples of the component size,
>> and would be hard to maintain looking forward because size_written is
>> assigned from a bunch of different places.
>
> Yeah, this would be a more problematic solution.
>
>> ---
>>  src/mesa/drivers/dri/i965/brw_ir_fs.h | 22 --
>>  1 file changed, 20 insertions(+), 2 deletions(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> b/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> index 4ac9baa..0be67b7 100644
>> --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> @@ -192,6 +192,20 @@ reg_offset(const fs_reg )
>>  }
>>  
>>  /**
>> + * Return the amount of padding in bytes left unused between
>> individual
>> + * components of register \p r due to a (horizontal) stride value
>> greater than
>> + * one, or zero if components are tightly packed in the register
>> file.
>> + */
>> +static inline unsigned
>> +reg_padding(const fs_reg )
>> +{
>> +   const unsigned stride = ((r.file != ARF && r.file != FIXED_GRF) ?
>> r.stride :
>> +r.hstride == 0 ? 0 :
>> +1 << (r.hstride - 1));
>> +   return (MAX2(1, stride) - 1) * type_sz(r.type);
>> +}
>> +
>> +/**
>>   * Return whether the register region starting at \p r and spanning
>> \p dr
>>   * bytes could potentially overlap the register region starting at
>> \p s and
>>   * spanning \p ds bytes.
>> @@ -423,7 +437,9 @@ regs_written(const fs_inst *inst)
>>  {
>> /* XXX - Take into account register-misaligned offsets correctly.
>> */
>> assert(inst->dst.file != UNIFORM && inst->dst.file != IMM);
>> -   return DIV_ROUND_UP(inst->size_written, REG_SIZE);
>> +   return DIV_ROUND_UP(inst->size_written -
>> +   MIN2(inst->size_written, reg_padding(inst-
>> >dst)),
>> +   REG_SIZE);
>>  }
>>  
>>  /**
>> @@ -438,7 +454,9 @@ regs_read(const fs_inst *inst, unsigned i)
>> /* XXX - Take into account register-misaligned offsets correctly.
>> */
>> const unsigned reg_size =
>>    inst->src[i].file == UNIFORM || inst->src[i].file == IMM ? 4 :
>> REG_SIZE;
>> -   return DIV_ROUND_UP(inst->size_read(i), reg_size);
>> +   return DIV_ROUND_UP(inst->size_read(i) -
>> +   MIN2(inst->size_read(i), reg_padding(inst-
>> >src[i])),
>> +   reg_size);
>>  }
>>  
>>  #endif


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 15/57] i965/fs: Handle fixed HW GRF subnr in reg_offset().

2016-09-09 Thread Francisco Jerez

Iago Toral  writes:

> On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote:
>> This will be useful later on when we start using reg_offset() on
>> fixed
>> hardware registers.
>> ---
>>  src/mesa/drivers/dri/i965/brw_ir_fs.h | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> b/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> index 2e5c8e5..4ac9baa 100644
>> --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
>> @@ -187,7 +187,8 @@ static inline unsigned
>>  reg_offset(const fs_reg )
>>  {
>> return (r.file == VGRF || r.file == IMM ? 0 : r.nr) *
>> -  (r.file == UNIFORM ? 4 : REG_SIZE) + r.offset;
>> +  (r.file == UNIFORM ? 4 : REG_SIZE) + r.offset +
>> +  (r.file == ARF || r.file == FIXED_GRF ? r.subnr : 0);
>
> I guess that for ARF and FIXED_GRF r.offset is expected to be 0, right?

It doesn't really matter as far as this function is concerned: If it's
zero it won't affect the result of the function, if it's non-zero it
will be taken into account correctly.  Some other back-end code may
break though if a fixed grf register has non-zero offset, but I'd call
that code responsible for asserting "offset == 0" if it's unable to deal
with it.

> In that case, should we make it more explicit here by adding an assert,
> and/or maybe separating the offset calculation for these register types
> (so it only uses nr and subnr) from the others?
>
> Iago
>
>>  }
>>  
>>  /**


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 25/57] i965/fs: Stop using fs_reg::in_range() in favor of regions_overlap().

2016-09-09 Thread Francisco Jerez

Iago Toral  writes:

> On Wed, 2016-09-07 at 18:48 -0700, Francisco Jerez wrote:
>> Its only use left in the FS back-end should be using
>> regions_overlap()
>> instead to avoid getting a false negative result in cases where
>> source
>> and destination overlap but the former starts before the latter in
>> the
>> VGRF file.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index 42ed131..b06606b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -4820,7 +4820,8 @@ shuffle_64bit_data_for_32bit_write(const
>> fs_builder ,
>> assert(type_sz(src.type) == 8);
>> assert(type_sz(dst.type) == 4);
>>  
>> -   assert(!src.in_range(dst, 2 * components * bld.dispatch_width() /
>> 8));
>> +   assert(!regions_overlap(dst, 2 *
>> dst.component_size(bld.dispatch_width()),
>> +   src,
>> src.component_size(bld.dispatch_width(;
>
> I think you need to multiply the sizes of the regions for dst and src
> by the number of components being shuffled, like in the original
> assert, otherwise we would only be testing if there is overlap for the
> first component.
>
Good point, thanks!

>> for (unsigned i = 0; i < components; i++) {
>>    const fs_reg component_i = offset(src, bld, i);


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 4/4] i965: Use blorp_copy for all copy_image operations on gen6+

2016-09-09 Thread Chad Versace

On Fri 09 Sep 2016, Jason Ekstrand wrote:
> 
> On Sep 9, 2016 12:23 PM, "Chad Versace"  wrote:

> > Acked-by: Chad Versace 
> 
> Sure, review the easy patch and ack the interesting ones...

Yep!

> > Just an ack, because I'm not very familiar with the new blorp code.
> 
> The blorp_copy entrypoint does exactly what it looks like it does... Including
> handling all the crazy edge-cases... If you're willing to trust it, 3 and 4 
> are
> trivial. :-)

I was unsure exactly what blorp_copy did. Is it a bit-for-bit copy? Does
it do format translation?

I just read the blorp_copy code, and revisited the glCopyTexImage2D
manpage, and yeah... it's not scary. I trust this code now.

For patches 3 and 4,
Reviewed-by: Chad Versace 

Patch 2, though, is still just an ack. I don't have time to review all
the implications of patch 2.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/4] intel/isl: Add support for RGB formats in X and Y-tiled memory

2016-09-09 Thread Jason Ekstrand

On Sep 9, 2016 12:05 PM, "Chad Versace"  wrote:
>
> On Fri 09 Sep 2016, Jason Ekstrand wrote:
> > Normally, using a non-linear tiling format helps improve cache locality
by
> > ensuring that neighboring pixels are usually close-by in memory.  For
RGB
> > formats, this still sort-of holds, but it can also lead to rather
terrible
> > memory access patterns where a single RGB pixel value crosses a tile
> > boundary and gets split into two pieces in different 4K pages.  It also
> > makes for some rather awkward calculations because your tile size is no
> > longer an even multiple of surface element size.  For these reasons, we
> > chose to simply never create tiled RGB images in the Vulkan driver.
> >
> > The GL driver, however, is not so kind so we need to support it
somehow.  I
> > briefly toyed with a couple of different schemes but this is the best
one I
> > could come up with.  The fundamental problem is that a tile no longer
> > contains an integer number of surface elements.  I briefly considered a
> > couple other options but found them wanting:
> >
> >  1) Using floats for the logical tile size.  This leads to potential
> > rounding error problems.
>
> Let's stay far away from floats With floats, there is just too much risk
> for weird bugs.
>
> Floats are for representing ℝ. Our calculations are in ℚ, so let's
> restrict ourselves to ints.  (Does GMail display the blackboard bold
> UTF-8 correctly?)

It does!  Even on my phone.

> >  2) When presented with a RGB format, just make the tile 3-times as
wide.
> > This isn't so nice because now our tiles are no longer power-of-two
> > size.  Also, it can force the row_pitch to be larger than needed
which,
> > while not strictly a problem for ISL, causes incompatibility
problems
> > with the way the GL driver chooses surface pitches.
> >
> > The chosen method requires that you pay attention and not just assume
that
> > your tile_info is in the units you think it is.  However, it's nice
because
> > it provides a nice "these are the units" declaration in isl_tile_info
> > itself.  Previously, the tile_info wasn't usable as a stand-alone
structure
> > because you had to also know the format.  It also forces figuring out
how
> > to deal with inconsistencies between tiling and format back to the
caller
> > which is good because the two different consumers of isl_tile_info
really
> > want to deal with it differently:  Computation of the surface size wants
> > the fewest number of horizontal tiles possible while
get_intratile_offset
> > is far more concerned with things aligning nicely.
> >
> > Signed-off-by: Jason Ekstrand 
> > Cc: Chad Versace 
> > ---
> >  src/intel/isl/isl.c | 49
-
> >  src/intel/isl/isl.h | 10 +-
> >  2 files changed, 45 insertions(+), 14 deletions(-)
> >
> > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> > index 34245f4..8c72186 100644
> > --- a/src/intel/isl/isl.c
> > +++ b/src/intel/isl/isl.c
> > @@ -113,7 +113,16 @@ isl_tiling_get_info(const struct isl_device *dev,
> > const uint32_t bs = format_bpb / 8;
> > struct isl_extent2d logical_el, phys_B;
> >
> > -   assert(tiling == ISL_TILING_LINEAR || isl_is_pow2(format_bpb));
> > +   if (tiling != ISL_TILING_LINEAR && ! isl_is_pow2(format_bpb)) {
> -^^^
> 1. Weirdo space after '!'
>
> > +  /* It is possible to have non-power-of-two formats in a tiled
buffer.
> > +   * The easiest way to handle this is to treat the tile as if it
is three
> > +   * times as wide.  This way no pixel will ever cross a tile
boundary.
> > +   * This really only works on legacy X and Y tiling formats.
> > +   */
> > +  assert(tiling == ISL_TILING_X || tiling == ISL_TILING_Y0);
> > +  assert(bs % 3 == 0 && isl_is_pow2(format_bpb / 3));
> > +  return isl_tiling_get_info(dev, tiling, format_bpb / 3,
tile_info);
> > +   }
> >
> > switch (tiling) {
> > case ISL_TILING_LINEAR:
> > @@ -209,6 +218,7 @@ isl_tiling_get_info(const struct isl_device *dev,
> >
> > *tile_info = (struct isl_tile_info) {
> >.tiling = tiling,
> > +  .format_bpb = format_bpb,
> >.logical_extent_el = logical_el,
> >.phys_extent_B = phys_B,
> > };
> > @@ -1215,14 +1225,20 @@ isl_surf_init_s(const struct isl_device *dev,
> >}
> >base_alignment = isl_round_up_to_power_of_two(base_alignment);
> > } else {
> > +  assert(fmtl->bpb % tile_info.format_bpb == 0);
> > +  const uint32_t tile_el_scale = fmtl->bpb / tile_info.format_bpb;
>
> > +  assert(tile_el_scale == 1 || tile_el_scale == 3);
>
> 2. Without the context of this patch, the assertion is very mysterious.
> Future people will be afraid to touch it, as there's no clue to the
> magic 3's justification. Please add a comment.

Yes, it can be dropped.  At one point I was trying harder to be clever

Re: [Mesa-dev] [PATCH 4/4] i965: Use blorp_copy for all copy_image operations on gen6+

2016-09-09 Thread Jason Ekstrand

On Sep 9, 2016 12:23 PM, "Chad Versace"  wrote:
>
> Patches 3 and 4, despite their craziness, are

They're not that crazy...

> Acked-by: Chad Versace 

Sure, review the easy patch and ack the interesting ones...

> Just an ack, because I'm not very familiar with the new blorp code.

The blorp_copy entrypoint does exactly what it looks like it does...
Including handling all the crazy edge-cases... If you're willing to trust
it, 3 and 4 are trivial. :-)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] egl: Document why EGL_OPENGL{, _ES}_API are mostly identical

2016-09-09 Thread Chad Versace

On Thu 08 Sep 2016, Alejandro Piñeiro wrote:
> 
> 
> On 07/09/16 23:41, Adam Jackson wrote:
> > Signed-off-by: Adam Jackson 
> > ---
> >  src/egl/main/eglcontext.c | 10 ++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/src/egl/main/eglcontext.c b/src/egl/main/eglcontext.c
> > index ebc004d..057b60f 100644
> > --- a/src/egl/main/eglcontext.c
> > +++ b/src/egl/main/eglcontext.c
> > @@ -457,6 +457,16 @@ _eglParseContextAttribList(_EGLContext *ctx, 
> > _EGLDisplay *dpy,
> >  /**
> >   * Initialize the given _EGLContext object to defaults and/or the values
> >   * in the attrib_list.
> > + *
> > + * According to EGL 1.5 Section 3.7:
> > + *
> > + * "EGL_OPENGL_API and EGL_OPENGL_ES_API are interchangeable for all
> > + *  purposes except eglCreateContext."
> 
> Usually spec quotations add some indentation. Take a look to line 109 on
> the same file.
> 
> > + *
> > + * And since we only support GL and GLES, this is the only place where the
> > + * bound API matters at all. We look up the current API from the current
> > + * thread, and stash that in the context we're initializing. Our caller is
> > + * responsible for determining whether that's an API it supports.
> >   */
> >  EGLBoolean
> >  _eglInitContext(_EGLContext *ctx, _EGLDisplay *dpy, _EGLConfig *conf,
> > 
> 
> 
> With that fixed:
> Reviewed-by: Alejandro Piñeiro 

Same indentation request.
Reviewed-by: Chad Versace 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 4/4] i965: Use blorp_copy for all copy_image operations on gen6+

2016-09-09 Thread Chad Versace

Patches 3 and 4, despite their craziness, are
Acked-by: Chad Versace 

Just an ack, because I'm not very familiar with the new blorp code.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl: Add a citation for uniform precision matching.

2016-09-09 Thread Eric Anholt

Ian Romanick  writes:

> On 09/06/2016 03:24 PM, Kenneth Graunke wrote:
>> On Tuesday, September 6, 2016 1:04:43 PM PDT Eric Anholt wrote:
>>> Kenneth Graunke  writes:
>>>
 Ian added this check in commit 259fc505454ea6a67aeacf6cdebf1398d9947759.
 While reviewing the rules, I found a citation which spells this out
 clearly, so I figured I'd send a patch to add it as a comment.

 Cc: i...@freedesktop.org
 Signed-off-by: Kenneth Graunke 
>>>
>>> Reviewed-by: Eric Anholt 
>>>
>>> I was wondering about this change, because glmark2 is failing to compile
>>> its terrain shaders now.
>
> Out of curiosity... is the uniform used in all the stages?

In glmark2 -b terrain, it's defined but not used in VS.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] nvc0/ir: joins get a sched of 0x2f

2016-09-09 Thread Karol Herbst

slightly improves performance for GpuTest /test=pixmark_piano /benchmark
/no_scorebox /msaa=0 /benchmark_duration_ms=6 /width=1024 /height=640

score: 1031 -> 1033

observed from the binary generated by nvidia

Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index d83028c..a8e9a33 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -3033,7 +3033,7 @@ SchedDataCalculator::setDelay(Instruction *insn, int 
delay, Instruction *next)
   insn->sched = 0xc2;
} else
if (insn->op == OP_JOIN || insn->join) {
-  insn->sched = 0x00;
+  insn->sched = 0x2f;
} else
if (delay >= 0 || prevData == 0x04 ||
!next || !targ->canDualIssue(insn, next)) {
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/4] intel/isl: Add support for RGB formats in X and Y-tiled memory

2016-09-09 Thread Chad Versace

On Fri 09 Sep 2016, Jason Ekstrand wrote:
> Normally, using a non-linear tiling format helps improve cache locality by
> ensuring that neighboring pixels are usually close-by in memory.  For RGB
> formats, this still sort-of holds, but it can also lead to rather terrible
> memory access patterns where a single RGB pixel value crosses a tile
> boundary and gets split into two pieces in different 4K pages.  It also
> makes for some rather awkward calculations because your tile size is no
> longer an even multiple of surface element size.  For these reasons, we
> chose to simply never create tiled RGB images in the Vulkan driver.
> 
> The GL driver, however, is not so kind so we need to support it somehow.  I
> briefly toyed with a couple of different schemes but this is the best one I
> could come up with.  The fundamental problem is that a tile no longer
> contains an integer number of surface elements.  I briefly considered a
> couple other options but found them wanting:
> 
>  1) Using floats for the logical tile size.  This leads to potential
> rounding error problems.

Let's stay far away from floats With floats, there is just too much risk
for weird bugs.

Floats are for representing ℝ. Our calculations are in ℚ, so let's
restrict ourselves to ints.  (Does GMail display the blackboard bold
UTF-8 correctly?)

>  2) When presented with a RGB format, just make the tile 3-times as wide.
> This isn't so nice because now our tiles are no longer power-of-two
> size.  Also, it can force the row_pitch to be larger than needed which,
> while not strictly a problem for ISL, causes incompatibility problems
> with the way the GL driver chooses surface pitches.
> 
> The chosen method requires that you pay attention and not just assume that
> your tile_info is in the units you think it is.  However, it's nice because
> it provides a nice "these are the units" declaration in isl_tile_info
> itself.  Previously, the tile_info wasn't usable as a stand-alone structure
> because you had to also know the format.  It also forces figuring out how
> to deal with inconsistencies between tiling and format back to the caller
> which is good because the two different consumers of isl_tile_info really
> want to deal with it differently:  Computation of the surface size wants
> the fewest number of horizontal tiles possible while get_intratile_offset
> is far more concerned with things aligning nicely.
> 
> Signed-off-by: Jason Ekstrand 
> Cc: Chad Versace 
> ---
>  src/intel/isl/isl.c | 49 -
>  src/intel/isl/isl.h | 10 +-
>  2 files changed, 45 insertions(+), 14 deletions(-)
> 
> diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> index 34245f4..8c72186 100644
> --- a/src/intel/isl/isl.c
> +++ b/src/intel/isl/isl.c
> @@ -113,7 +113,16 @@ isl_tiling_get_info(const struct isl_device *dev,
> const uint32_t bs = format_bpb / 8;
> struct isl_extent2d logical_el, phys_B;
>  
> -   assert(tiling == ISL_TILING_LINEAR || isl_is_pow2(format_bpb));
> +   if (tiling != ISL_TILING_LINEAR && ! isl_is_pow2(format_bpb)) {
-^^^
1. Weirdo space after '!'

> +  /* It is possible to have non-power-of-two formats in a tiled buffer.
> +   * The easiest way to handle this is to treat the tile as if it is 
> three
> +   * times as wide.  This way no pixel will ever cross a tile boundary.
> +   * This really only works on legacy X and Y tiling formats.
> +   */
> +  assert(tiling == ISL_TILING_X || tiling == ISL_TILING_Y0);
> +  assert(bs % 3 == 0 && isl_is_pow2(format_bpb / 3));
> +  return isl_tiling_get_info(dev, tiling, format_bpb / 3, tile_info);
> +   }
>  
> switch (tiling) {
> case ISL_TILING_LINEAR:
> @@ -209,6 +218,7 @@ isl_tiling_get_info(const struct isl_device *dev,
>  
> *tile_info = (struct isl_tile_info) {
>.tiling = tiling,
> +  .format_bpb = format_bpb,
>.logical_extent_el = logical_el,
>.phys_extent_B = phys_B,
> };
> @@ -1215,14 +1225,20 @@ isl_surf_init_s(const struct isl_device *dev,
>}
>base_alignment = isl_round_up_to_power_of_two(base_alignment);
> } else {
> +  assert(fmtl->bpb % tile_info.format_bpb == 0);
> +  const uint32_t tile_el_scale = fmtl->bpb / tile_info.format_bpb;

> +  assert(tile_el_scale == 1 || tile_el_scale == 3);

2. Without the context of this patch, the assertion is very mysterious.
Future people will be afraid to touch it, as there's no clue to the
magic 3's justification. Please add a comment.

> +
>assert(phys_slice0_sa.w % fmtl->bw == 0);
>const uint32_t total_w_el = phys_slice0_sa.width / fmtl->bw;
>const uint32_t total_w_tl =
> - isl_align_div(total_w_el, tile_info.logical_extent_el.width);
> + isl_align_div(total_w_el * tile_el_scale,
> +

Re: [Mesa-dev] [PATCH 0/3] Implement EGL_KHR_no_config_context

2016-09-09 Thread Eric Anholt

Adam Jackson  writes:

> KHR_no_config_context is virtually identical to MESA_configless_context,
> where they differ is only where the Mesa spec is silent, and for the
> most part the code already did what the Khronos spec wants. Fix up the
> one corner case, and mark the Mesa extension as superseded.

This series is:

Reviewed-by: Eric Anholt 


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 07/11] glsl: add gl_LocalGroupSizeARB as a system value

2016-09-09 Thread Ilia Mirkin

On Thu, Sep 8, 2016 at 4:31 PM, Samuel Pitoiset
 wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/compiler/glsl/builtin_variables.cpp | 2 ++
>  src/compiler/shader_enums.h | 1 +
>  2 files changed, 3 insertions(+)
>
> diff --git a/src/compiler/glsl/builtin_variables.cpp 
> b/src/compiler/glsl/builtin_variables.cpp
> index f47daab..a1768fc 100644
> --- a/src/compiler/glsl/builtin_variables.cpp
> +++ b/src/compiler/glsl/builtin_variables.cpp
> @@ -1236,6 +1236,8 @@ builtin_variable_generator::generate_cs_special_vars()
>  "gl_LocalInvocationID");
> add_system_value(SYSTEM_VALUE_WORK_GROUP_ID, uvec3_t, "gl_WorkGroupID");
> add_system_value(SYSTEM_VALUE_NUM_WORK_GROUPS, uvec3_t, 
> "gl_NumWorkGroups");
> +   add_system_value(SYSTEM_VALUE_LOCAL_GROUP_SIZE,
> +uvec3_t, "gl_LocalGroupSizeARB");

if (state->ARB_bla_bla_bla_enable) { } I think

> if (state->ctx->Const.LowerCsDerivedVariables) {
>add_variable("gl_GlobalInvocationID", uvec3_t, ir_var_auto, 0);
>add_variable("gl_LocalInvocationIndex", uint_t, ir_var_auto, 0);
> diff --git a/src/compiler/shader_enums.h b/src/compiler/shader_enums.h
> index c3a62e0..b6e048e 100644
> --- a/src/compiler/shader_enums.h
> +++ b/src/compiler/shader_enums.h
> @@ -472,6 +472,7 @@ typedef enum
> SYSTEM_VALUE_GLOBAL_INVOCATION_ID,
> SYSTEM_VALUE_WORK_GROUP_ID,
> SYSTEM_VALUE_NUM_WORK_GROUPS,
> +   SYSTEM_VALUE_LOCAL_GROUP_SIZE,
> /*@}*/
>
> /**
> --
> 2.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 07/11] glsl: add gl_LocalGroupSizeARB as a system value

2016-09-09 Thread Ian Romanick

This patch is

Reviewed-by: Ian Romanick 

On 09/08/2016 01:31 PM, Samuel Pitoiset wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/compiler/glsl/builtin_variables.cpp | 2 ++
>  src/compiler/shader_enums.h | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/src/compiler/glsl/builtin_variables.cpp 
> b/src/compiler/glsl/builtin_variables.cpp
> index f47daab..a1768fc 100644
> --- a/src/compiler/glsl/builtin_variables.cpp
> +++ b/src/compiler/glsl/builtin_variables.cpp
> @@ -1236,6 +1236,8 @@ builtin_variable_generator::generate_cs_special_vars()
>  "gl_LocalInvocationID");
> add_system_value(SYSTEM_VALUE_WORK_GROUP_ID, uvec3_t, "gl_WorkGroupID");
> add_system_value(SYSTEM_VALUE_NUM_WORK_GROUPS, uvec3_t, 
> "gl_NumWorkGroups");
> +   add_system_value(SYSTEM_VALUE_LOCAL_GROUP_SIZE,
> +uvec3_t, "gl_LocalGroupSizeARB");
> if (state->ctx->Const.LowerCsDerivedVariables) {
>add_variable("gl_GlobalInvocationID", uvec3_t, ir_var_auto, 0);
>add_variable("gl_LocalInvocationIndex", uint_t, ir_var_auto, 0);
> diff --git a/src/compiler/shader_enums.h b/src/compiler/shader_enums.h
> index c3a62e0..b6e048e 100644
> --- a/src/compiler/shader_enums.h
> +++ b/src/compiler/shader_enums.h
> @@ -472,6 +472,7 @@ typedef enum
> SYSTEM_VALUE_GLOBAL_INVOCATION_ID,
> SYSTEM_VALUE_WORK_GROUP_ID,
> SYSTEM_VALUE_NUM_WORK_GROUPS,
> +   SYSTEM_VALUE_LOCAL_GROUP_SIZE,
> /*@}*/
>  
> /**
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 06/11] glsl/linker: handle errors when a variable local size is used

2016-09-09 Thread Ian Romanick

On 09/08/2016 01:31 PM, Samuel Pitoiset wrote:
> Compute shaders can now include a fixed local size as defined by
> ARB_compute_shader or a variable size as defined by
> ARB_compute_variable_group_size.
> 
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/compiler/glsl/linker.cpp | 23 +--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
> index c95edf3..e909455 100644
> --- a/src/compiler/glsl/linker.cpp
> +++ b/src/compiler/glsl/linker.cpp
> @@ -2074,6 +2074,7 @@ link_cs_input_layout_qualifiers(struct 
> gl_shader_program *prog,
>  {
> for (int i = 0; i < 3; i++)
>linked_shader->info.Comp.LocalSize[i] = 0;

Blank line here.

> +   linked_shader->info.Comp.LocalSizeVariable = false;
>  
> /* This function is called for all shader stages, but it only has an 
> effect
>  * for compute shaders.
> @@ -2109,6 +2110,20 @@ link_cs_input_layout_qualifiers(struct 
> gl_shader_program *prog,
>  linked_shader->info.Comp.LocalSize[i] =
> shader->info.Comp.LocalSize[i];
>   }
> +  } else if (shader->info.Comp.LocalSizeVariable) {
> + if (linked_shader->info.Comp.LocalSize[0] != 0) {
> +/* From the ARB_compute_variable_group_size spec:

Spec quote.

> + *
> + * If one compute shader attached to a program declares a
> + * variable local group size and a second compute shader
> + * attached to the same program declares a fixed local group
> + * size, a link-time error results.
> + */
> +linker_error(prog, "computer shader defined with both fixed and "
   compute  :)

> + "variable local group size\n");
> +return;
> + }
> + linked_shader->info.Comp.LocalSizeVariable = true;
>}
> }
>  
> @@ -2116,12 +2131,16 @@ link_cs_input_layout_qualifiers(struct 
> gl_shader_program *prog,
>  * since we already know we're in the right type of shader program
>  * for doing it.
>  */
> -   if (linked_shader->info.Comp.LocalSize[0] == 0) {
> -  linker_error(prog, "compute shader didn't declare local size\n");
> +   if (linked_shader->info.Comp.LocalSize[0] == 0 &&
> +   !linked_shader->info.Comp.LocalSizeVariable) {
> +  linker_error(prog, "compute shader must contain a fixed or a variable "
> + "local group size\n");
>return;
> }
> for (int i = 0; i < 3; i++)
>prog->Comp.LocalSize[i] = linked_shader->info.Comp.LocalSize[i];

Blank line here.

With those 4 nits fixed, this patch is

Reviewed-by: Ian Romanick 

> +   prog->Comp.LocalSizeVariable =
> +  linked_shader->info.Comp.LocalSizeVariable;
>  }
>  
>  
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 05/11] glsl: reject compute shaders with fixed and variable local size

2016-09-09 Thread Ian Romanick

On 09/08/2016 01:31 PM, Samuel Pitoiset wrote:
> The ARB_compute_variable_group_size specification explains that
> when a compute shader includes both a fixed and a variable local
> size, a compile-time error occurs.

I probably would have squashed this in with the previous commit, but
it's fine either way.

> Signed-off-by: Samuel Pitoiset 
> ---
>  src/compiler/glsl/ast_to_hir.cpp | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/src/compiler/glsl/ast_to_hir.cpp 
> b/src/compiler/glsl/ast_to_hir.cpp
> index 4fc4c5c..a53a82e 100644
> --- a/src/compiler/glsl/ast_to_hir.cpp
> +++ b/src/compiler/glsl/ast_to_hir.cpp
> @@ -8013,6 +8013,20 @@ ast_cs_input_layout::hir(exec_list *instructions,
>}
> }
>  
> +   /* From the ARB_compute_variable_group_size specification:

Same comment about spec quotes.

It's hard to tell from just the patch... is this check order dependent?
Does it correct handle all the cases:

// case 1
layout(local_size_variable, local_size_x = 32, local_size_y = 32);

// case 2
layout(local_size_x = 32, local_size_y = 32, local_size_variable);

// case 3
layout(local_size_variable);
layout(local_size_x = 32, local_size_y = 32);

// case 4
layout(local_size_x = 32, local_size_y = 32);
layout(local_size_variable);

> +*
> +* If a compute shader including a *local_size_variable* qualifier 
> also
> +* declares a fixed local group size using the *local_size_x*,
> +* *local_size_y*, or *local_size_z* qualifiers, a compile-time error
> +* results
> +*/
> +   if (state->cs_input_local_size_variable_specified) {
> +  _mesa_glsl_error(, state,
> +   "compute shader can't include both a variable and a "
> +   "fixed local group size");
> +  return NULL;
> +   }
> +
> state->cs_input_local_size_specified = true;
> for (int i = 0; i < 3; i++)
>state->cs_input_local_size[i] = qual_local_size[i];
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 04/11] glsl: process local_size_variable input qualifier

2016-09-09 Thread Ian Romanick

On 09/08/2016 01:31 PM, Samuel Pitoiset wrote:
> This is the new layout qualifier introduced by
> ARB_compute_variable_group_size which allows to use a variable work
> group size.
> 
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/compiler/glsl/ast.h  |  5 +
>  src/compiler/glsl/ast_type.cpp   |  6 ++
>  src/compiler/glsl/glsl_parser.yy | 13 +
>  src/compiler/glsl/glsl_parser_extras.cpp |  5 +
>  src/compiler/glsl/glsl_parser_extras.h   |  6 ++
>  5 files changed, 35 insertions(+)
> 
> diff --git a/src/compiler/glsl/ast.h b/src/compiler/glsl/ast.h
> index 4c648d0..55f009a 100644
> --- a/src/compiler/glsl/ast.h
> +++ b/src/compiler/glsl/ast.h
> @@ -553,6 +553,11 @@ struct ast_type_qualifier {
>*/
>   unsigned local_size:3;
>  
> +  /** \name Layout qualifiers for ARB_compute_variable_group_size. */
> +  /** \{ */
> +  unsigned local_size_variable:1;
> +  /** \} */
> +
>/** \name Layout and memory qualifiers for 
> ARB_shader_image_load_store. */
>/** \{ */
>unsigned early_fragment_tests:1;
> diff --git a/src/compiler/glsl/ast_type.cpp b/src/compiler/glsl/ast_type.cpp
> index f3f6b29..3f19f1f 100644
> --- a/src/compiler/glsl/ast_type.cpp
> +++ b/src/compiler/glsl/ast_type.cpp
> @@ -503,6 +503,7 @@ ast_type_qualifier::merge_in_qualifier(YYLTYPE *loc,
>   state->in_qualifier->flags.q.local_size == 0;
>  
>valid_in_mask.flags.q.local_size = 7;
> +  valid_in_mask.flags.q.local_size_variable = 1;
>break;
> default:
>_mesa_glsl_error(loc, state,
> @@ -580,6 +581,10 @@ ast_type_qualifier::merge_in_qualifier(YYLTYPE *loc,
>this->point_mode = q.point_mode;
> }
>  
> +   if (q.flags.q.local_size_variable) {
> +  state->cs_input_local_size_variable_specified = true;
> +   }
> +
> if (create_node) {
>if (create_gs_ast) {
>   node = new(mem_ctx) ast_gs_input_layout(*loc, q.prim_type);
> @@ -653,6 +658,7 @@ ast_type_qualifier::validate_flags(YYLTYPE *loc,
>  bad.flags.q.prim_type ? " prim_type" : "",
>  bad.flags.q.max_vertices ? " max_vertices" : "",
>  bad.flags.q.local_size ? " local_size" : "",
> +bad.flags.q.local_size_variable ? " local_size_variable" 
> : "",
>  bad.flags.q.early_fragment_tests ? " 
> early_fragment_tests" : "",
>  bad.flags.q.explicit_image_format ? " image_format" : "",
>  bad.flags.q.coherent ? " coherent" : "",
> diff --git a/src/compiler/glsl/glsl_parser.yy 
> b/src/compiler/glsl/glsl_parser.yy
> index 9e1fd9e..38cbd3f 100644
> --- a/src/compiler/glsl/glsl_parser.yy
> +++ b/src/compiler/glsl/glsl_parser.yy
> @@ -1491,6 +1491,19 @@ layout_qualifier_id:
>   }
>}
>  
> +  /* Layout qualifiers for ARB_compute_variable_group_size. */
> +  if (!$$.flags.i) {
> + if (match_layout_qualifier($1, "local_size_variable", state) == 0) {
> +$$.flags.q.local_size_variable = 1;
> + }
> +
> + if ($$.flags.i && !state->ARB_compute_variable_group_size_enable) {
> +_mesa_glsl_error(& @1, state,
> + "qualifier `local_size_variable` requires "
> + "ARB_compute_variable_group_size");
> + }
> +  }
> +
>if (!$$.flags.i) {
>   _mesa_glsl_error(& @1, state, "unrecognized layout identifier "
>"`%s'", $1);
> diff --git a/src/compiler/glsl/glsl_parser_extras.cpp 
> b/src/compiler/glsl/glsl_parser_extras.cpp
> index bcbe623..5a7153f 100644
> --- a/src/compiler/glsl/glsl_parser_extras.cpp
> +++ b/src/compiler/glsl/glsl_parser_extras.cpp
> @@ -297,6 +297,8 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct 
> gl_context *_ctx,
>sizeof(this->atomic_counter_offsets));
> this->allow_extension_directive_midshader =
>ctx->Const.AllowGLSLExtensionDirectiveMidShader;
> +
> +   this->cs_input_local_size_variable_specified = false;
>  }
>  
>  /**
> @@ -1648,6 +1650,7 @@ set_shader_inout_layout(struct gl_shader *shader,
> if (shader->Stage != MESA_SHADER_COMPUTE) {
>/* Should have been prevented by the parser. */
>assert(!state->cs_input_local_size_specified);
> +  assert(!state->cs_input_local_size_variable_specified);
> }
>  
> if (shader->Stage != MESA_SHADER_FRAGMENT) {
> @@ -1763,6 +1766,8 @@ set_shader_inout_layout(struct gl_shader *shader,
>   for (int i = 0; i < 3; i++)
>  shader->info.Comp.LocalSize[i] = 0;
>}

Blank line here.  With that fixed, this patch is

Reviewed-by: Ian Romanick 

> +  shader->info.Comp.LocalSizeVariable =
> + state->cs_input_local_size_variable_specified;
>break;
>  
> case MESA_SHADER_FRAGMENT:
> diff --git

Re: [Mesa-dev] [PATCH 0/3] Implement EGL_KHR_no_config_context

2016-09-09 Thread Kristian Høgsberg

On Fri, Sep 9, 2016 at 9:25 AM, Adam Jackson  wrote:
> KHR_no_config_context is virtually identical to MESA_configless_context,
> where they differ is only where the Mesa spec is silent, and for the
> most part the code already did what the Khronos spec wants. Fix up the
> one corner case, and mark the Mesa extension as superseded.

Looks good to me:

Reviewed-by: Kristian Høgsberg 

> - ajax
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/4] intel/isl: Allow valign2 for texture-only Y-tiled surfaces on gen7

2016-09-09 Thread Jason Ekstrand

On Fri, Sep 9, 2016 at 11:18 AM, Chad Versace 
wrote:

> On Fri 09 Sep 2016, Jason Ekstrand wrote:
> > The restriction that Y-tiled surfaces must have valign == 4 only aplies
> to
> > render targets but we were applying it universally.  This causes problems
> > if R32G32B32_SFLOAT is used because it requires valign == 2; this should
> be
>  
> That isl format doesn't exist. Did you mean R32B32B32_FLOAT (sans S)?
>

I did.


>
> > okay because you can't render to that format.
> >
> > Signed-off-by: Jason Ekstrand 
> > Cc: Chad Versace 
> > ---
> >  src/intel/isl/isl_gen7.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/intel/isl/isl_gen7.c b/src/intel/isl/isl_gen7.c
> > index 02273f8..f3d8428 100644
> > --- a/src/intel/isl/isl_gen7.c
> > +++ b/src/intel/isl/isl_gen7.c
> > @@ -354,7 +354,8 @@ gen7_choose_valign_el(const struct isl_device *dev,
> >  */
> > if (isl_surf_usage_is_depth(info->usage) ||
> > info->samples > 1 ||
> > -   tiling == ISL_TILING_Y0) {
> > +   ((info->usage & ISL_SURF_USAGE_RENDER_TARGET_BIT) &&
> > +tiling == ISL_TILING_Y0)) {
> >require_valign4 = true;
> > }
>
> Did the buggy code trigger the assertion on line 390?
> assert(!require_valign2 || !require_valign4);
> If not, then something is suspicious.
>

Yes, that's exactly what happened


> Fix the commit message and this is
> Reviewed-by: Chad Versace 
> 


Thanks!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 03/11] glsl: add enable flags for ARB_compute_variable_group_size

2016-09-09 Thread Ian Romanick

This patch is

Reviewed-by: Ian Romanick 

I'm going to try to work through the rest of the glsl: patches today,
but I can't make any promises.

On 09/08/2016 01:31 PM, Samuel Pitoiset wrote:
> This also initializes the default values for the standalone compiler.
> 
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/compiler/glsl/glsl_parser_extras.cpp | 1 +
>  src/compiler/glsl/glsl_parser_extras.h   | 2 ++
>  src/compiler/glsl/standalone.cpp | 4 
>  src/compiler/glsl/standalone_scaffolding.cpp | 5 +
>  4 files changed, 12 insertions(+)
> 
> diff --git a/src/compiler/glsl/glsl_parser_extras.cpp 
> b/src/compiler/glsl/glsl_parser_extras.cpp
> index 436ddd0..bcbe623 100644
> --- a/src/compiler/glsl/glsl_parser_extras.cpp
> +++ b/src/compiler/glsl/glsl_parser_extras.cpp
> @@ -580,6 +580,7 @@ static const _mesa_glsl_extension 
> _mesa_glsl_supported_extensions[] = {
> EXT(ARB_ES3_2_compatibility),
> EXT(ARB_arrays_of_arrays),
> EXT(ARB_compute_shader),
> +   EXT(ARB_compute_variable_group_size),
> EXT(ARB_conservative_depth),
> EXT(ARB_cull_distance),
> EXT(ARB_derivative_control),
> diff --git a/src/compiler/glsl/glsl_parser_extras.h 
> b/src/compiler/glsl/glsl_parser_extras.h
> index e146fe1..8e0dafe 100644
> --- a/src/compiler/glsl/glsl_parser_extras.h
> +++ b/src/compiler/glsl/glsl_parser_extras.h
> @@ -576,6 +576,8 @@ struct _mesa_glsl_parse_state {
> bool ARB_arrays_of_arrays_warn;
> bool ARB_compute_shader_enable;
> bool ARB_compute_shader_warn;
> +   bool ARB_compute_variable_group_size_enable;
> +   bool ARB_compute_variable_group_size_warn;
> bool ARB_conservative_depth_enable;
> bool ARB_conservative_depth_warn;
> bool ARB_cull_distance_enable;
> diff --git a/src/compiler/glsl/standalone.cpp 
> b/src/compiler/glsl/standalone.cpp
> index 88fe5fd..cb2da03 100644
> --- a/src/compiler/glsl/standalone.cpp
> +++ b/src/compiler/glsl/standalone.cpp
> @@ -58,6 +58,10 @@ initialize_context(struct gl_context *ctx, gl_api api)
> ctx->Const.MaxComputeWorkGroupSize[2] = 64;
> ctx->Const.MaxComputeWorkGroupInvocations = 1024;
> ctx->Const.MaxComputeSharedMemorySize = 32768;
> +   ctx->Const.MaxComputeVariableGroupSize[0] = 512;
> +   ctx->Const.MaxComputeVariableGroupSize[1] = 512;
> +   ctx->Const.MaxComputeVariableGroupSize[2] = 64;
> +   ctx->Const.MaxComputeVariableGroupInvocations = 512;
> ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 16;
> ctx->Const.Program[MESA_SHADER_COMPUTE].MaxUniformComponents = 1024;
> ctx->Const.Program[MESA_SHADER_COMPUTE].MaxCombinedUniformComponents = 
> 1024;
> diff --git a/src/compiler/glsl/standalone_scaffolding.cpp 
> b/src/compiler/glsl/standalone_scaffolding.cpp
> index b0fb4b7..decff5f 100644
> --- a/src/compiler/glsl/standalone_scaffolding.cpp
> +++ b/src/compiler/glsl/standalone_scaffolding.cpp
> @@ -144,6 +144,7 @@ void initialize_context_to_defaults(struct gl_context 
> *ctx, gl_api api)
> ctx->Extensions.dummy_false = false;
> ctx->Extensions.dummy_true = true;
> ctx->Extensions.ARB_compute_shader = true;
> +   ctx->Extensions.ARB_compute_variable_group_size = true;
> ctx->Extensions.ARB_conservative_depth = true;
> ctx->Extensions.ARB_draw_instanced = true;
> ctx->Extensions.ARB_ES2_compatibility = true;
> @@ -207,6 +208,10 @@ void initialize_context_to_defaults(struct gl_context 
> *ctx, gl_api api)
> ctx->Const.MaxComputeWorkGroupSize[1] = 1024;
> ctx->Const.MaxComputeWorkGroupSize[2] = 64;
> ctx->Const.MaxComputeWorkGroupInvocations = 1024;
> +   ctx->Const.MaxComputeVariableGroupSize[0] = 512;
> +   ctx->Const.MaxComputeVariableGroupSize[1] = 512;
> +   ctx->Const.MaxComputeVariableGroupSize[2] = 64;
> +   ctx->Const.MaxComputeVariableGroupInvocations = 512;
> ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits = 16;
> ctx->Const.Program[MESA_SHADER_COMPUTE].MaxUniformComponents = 1024;
> ctx->Const.Program[MESA_SHADER_COMPUTE].MaxInputComponents = 0; /* not 
> used */
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: support TGSI compute shaders with variable block size

2016-09-09 Thread Ian Romanick

On 09/09/2016 05:29 AM, Marek Olšák wrote:
> On Fri, Sep 9, 2016 at 10:12 AM, Nicolai Hähnle  wrote:
>> From: Nicolai Hähnle 
>>
>> Not sure if it's possible to avoid programming the block size twice (once for
>> the userdata and once for the dispatch).
>>
>> Since the shaders are compiled with a pessimistic upper limit on the number 
>> of
>> registers, asynchronously compiling variants may be worth considering in the
>> future if we observe the shaders to be dispatched with small block sizes.
>> ---
>> I think this is sufficient to support variable group sizes on radeonsi, but
>> it's completely untested. Do you keep the latest version of your series in a
>> public repository somewhere?
>>
>>  src/gallium/drivers/radeonsi/si_compute.c | 10 +-
>>  src/gallium/drivers/radeonsi/si_shader.c  | 29 -
>>  src/gallium/drivers/radeonsi/si_shader.h  |  4 +++-
>>  3 files changed, 32 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
>> b/src/gallium/drivers/radeonsi/si_compute.c
>> index 5041761..26e096c 100644
>> --- a/src/gallium/drivers/radeonsi/si_compute.c
>> +++ b/src/gallium/drivers/radeonsi/si_compute.c
>> @@ -379,25 +379,33 @@ static void si_setup_tgsi_grid(struct si_context *sctx,
>> for (i = 0; i < 3; ++i) {
>> radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));
>> radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
>> COPY_DATA_DST_SEL(COPY_DATA_REG));
>> radeon_emit(cs, (va +  4 * i));
>> radeon_emit(cs, (va + 4 * i) >> 32);
>> radeon_emit(cs, (grid_size_reg >> 2) + i);
>> radeon_emit(cs, 0);
>> }
>> } else {
>> +   struct si_compute *program = sctx->cs_shader_state.program;
>> +   bool variable_group_size =
>> +   
>> program->shader.selector->info.properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH]
>>  == 0;
>>
>> -   radeon_set_sh_reg_seq(cs, grid_size_reg, 3);
>> +   radeon_set_sh_reg_seq(cs, grid_size_reg, variable_group_size 
>> ? 6 : 3);
>> radeon_emit(cs, info->grid[0]);
>> radeon_emit(cs, info->grid[1]);
>> radeon_emit(cs, info->grid[2]);
>> +   if (variable_group_size) {
>> +   radeon_emit(cs, info->block[0]);
>> +   radeon_emit(cs, info->block[1]);
>> +   radeon_emit(cs, info->block[2]);
>> +   }
>> }
>>  }
>>
>>  static void si_emit_dispatch_packets(struct si_context *sctx,
>>   const struct pipe_grid_info *info)
>>  {
>> struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
>> bool render_cond_bit = sctx->b.render_cond && 
>> !sctx->b.render_cond_force_off;
>> unsigned waves_per_threadgroup =
>> DIV_ROUND_UP(info->block[0] * info->block[1] * 
>> info->block[2], 64);
>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
>> b/src/gallium/drivers/radeonsi/si_shader.c
>> index 0b7de18..730ee21 100644
>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>> @@ -1783,30 +1783,35 @@ static void declare_system_value(
>>
>> case TGSI_SEMANTIC_GRID_SIZE:
>> value = LLVMGetParam(radeon_bld->main_fn, 
>> SI_PARAM_GRID_SIZE);
>> break;
>>
>> case TGSI_SEMANTIC_BLOCK_SIZE:
>> {
>> LLVMValueRef values[3];
>> unsigned i;
>> unsigned *properties = 
>> ctx->shader->selector->info.properties;
>> -   unsigned sizes[3] = {
>> -   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH],
>> -   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT],
>> -   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH]
>> -   };
>>
>> -   for (i = 0; i < 3; ++i)
>> -   values[i] = lp_build_const_int32(gallivm, sizes[i]);
>> +   if (properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] != 0) {
>> +   unsigned sizes[3] = {
>> +   
>> properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH],
>> +   
>> properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT],
>> +   
>> properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH]
>> +   };
>> +
>> +   for (i = 0; i < 3; ++i)
>> +   values[i] = lp_build_const_int32(gallivm, 
>> sizes[i]);
>>
>> -   value = lp_build_gather_values(gallivm, values, 3);
>> +   value = lp_build_gather_values(gallivm, values, 3);
>> +   } else {
>> +

Re: [Mesa-dev] [PATCH 01/11] glapi: add entry points for GL_ARB_compute_variable_group_size

2016-09-09 Thread Ian Romanick

On 09/09/2016 08:46 AM, Samuel Pitoiset wrote:
> 
> 
> On 09/08/2016 10:58 PM, Ian Romanick wrote:
>> On 09/08/2016 01:31 PM, Samuel Pitoiset wrote:
>>> Signed-off-by: Samuel Pitoiset 
>>> ---
>>>  .../glapi/gen/ARB_compute_variable_group_size.xml  | 25
>>> ++
>>>  src/mapi/glapi/gen/Makefile.am |  1 +
>>>  src/mapi/glapi/gen/gl_API.xml  |  2 ++
>>>  src/mesa/main/compute.c|  8 +++
>>>  src/mesa/main/compute.h|  5 +
>>>  src/mesa/main/tests/dispatch_sanity.cpp|  3 +++
>>>  6 files changed, 44 insertions(+)
>>>  create mode 100644
>>> src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
>>>
>>> diff --git a/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
>>> b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
>>> new file mode 100644
>>> index 000..b21c52f
>>> --- /dev/null
>>> +++ b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
>>> @@ -0,0 +1,25 @@
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +  >> value="0x9344"/>
>>> +  >> value="0x90EB"/>
>>> +  >> value="0x9345"/>
>>> +  >> value="0x91BF"/>
>>> +
>>> +  
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +  
>>> +
>>> +
>>> +
>>> diff --git a/src/mapi/glapi/gen/Makefile.am
>>> b/src/mapi/glapi/gen/Makefile.am
>>> index 0d7c338..49fdfe3 100644
>>> --- a/src/mapi/glapi/gen/Makefile.am
>>> +++ b/src/mapi/glapi/gen/Makefile.am
>>> @@ -117,6 +117,7 @@ API_XML = \
>>>  ARB_color_buffer_float.xml \
>>>  ARB_compressed_texture_pixel_storage.xml \
>>>  ARB_compute_shader.xml \
>>> +ARB_compute_variable_group_size.xml \
>>>  ARB_copy_buffer.xml \
>>>  ARB_copy_image.xml \
>>>  ARB_debug_output.xml \
>>> diff --git a/src/mapi/glapi/gen/gl_API.xml
>>> b/src/mapi/glapi/gen/gl_API.xml
>>> index c39aa22..9ad3b60 100644
>>> --- a/src/mapi/glapi/gen/gl_API.xml
>>> +++ b/src/mapi/glapi/gen/gl_API.xml
>>> @@ -8258,6 +8258,8 @@
>>>
>>>  >> xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>>
>>> +>> xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>> +
>>
>> This (extension #153) should go before ARB_indirect_parameters
>> (extension #154), and the "ARB extensions 149 - 153" should be changed
>> to "ARB extensions 149 - 152".
> 
> Presumably, you meant "ARB extensions 149 - 154"? Because the next
> section is "ARB extension 155 - 159".

I meant what I said.  The 'ARB extensions ...' comments are placeholders
for extensions that are not there.  Previously only 154 was there, hence
the 149-153 and 155-159.  You're adding 153, so those become 149-152 and
155-159.

Looking at this more closely... there are some other errors here
unrelated to your change.  I know we support
GL_ARB_texture_mirror_clamp_to_edge (extension #149),
GL_ARB_texture_stencil8 (extension #150), and
GL_ARB_vertex_type_10f_11f_11f_rev (extension #151).  I think these are
all extensions that just enable pre-existing enums in new places, so
they don't add anything new for this file.  A reasonable follow-up patch
would be to add blocks for these extensions like the existing
GL_ARB_texture_non_power_of_two block:





It may not be worth the effort.

>> With that fixed, this patch is
>>
>> Reviewed-by: Ian Romanick 
>>
>>>  
>>>
>>>  >> xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>> diff --git a/src/mesa/main/compute.c b/src/mesa/main/compute.c
>>> index b71430f..b052bae 100644
>>> --- a/src/mesa/main/compute.c
>>> +++ b/src/mesa/main/compute.c
>>> @@ -60,3 +60,11 @@ _mesa_DispatchComputeIndirect(GLintptr indirect)
>>>
>>> ctx->Driver.DispatchComputeIndirect(ctx, indirect);
>>>  }
>>> +
>>> +void GLAPIENTRY
>>> +_mesa_DispatchComputeGroupSizeARB(GLuint num_groups_x, GLuint
>>> num_groups_y,
>>> +  GLuint num_groups_z, GLuint
>>> group_size_x,
>>> +  GLuint group_size_y, GLuint
>>> group_size_z)
>>> +{
>>> +
>>> +}
>>> diff --git a/src/mesa/main/compute.h b/src/mesa/main/compute.h
>>> index 0cc034f..8018bbb 100644
>>> --- a/src/mesa/main/compute.h
>>> +++ b/src/mesa/main/compute.h
>>> @@ -35,4 +35,9 @@ _mesa_DispatchCompute(GLuint num_groups_x,
>>>  extern void GLAPIENTRY
>>>  _mesa_DispatchComputeIndirect(GLintptr indirect);
>>>
>>> +extern void GLAPIENTRY
>>> +_mesa_DispatchComputeGroupSizeARB(GLuint num_groups_x, GLuint
>>> num_groups_y,
>>> +  GLuint num_groups_z, GLuint
>>> group_size_x,
>>> +  GLuint group_size_y, GLuint
>>> group_size_z);
>>> +
>>>  #endif
>>> diff --git a/src/mesa/main/tests/dispatch_sanity.cpp
>>> b/src/mesa/main/tests/dispatch_sanity.cpp
>>> index 42fe61a..7faeabe 100644
>>> --- a/src/mesa/main/tests/dispatch_sanity.cpp
>>> +++ b/src/mesa/main/tests/dispatch_sanity.cpp
>>> @@ -942,6 +942,9 @@ const struct function
>>> common_desktop_functions_possible[] = {
>>> { "glDispatchCompute",

Re: [Mesa-dev] [PATCH 1/4] intel/isl: Allow valign2 for texture-only Y-tiled surfaces on gen7

2016-09-09 Thread Chad Versace

On Fri 09 Sep 2016, Jason Ekstrand wrote:
> The restriction that Y-tiled surfaces must have valign == 4 only aplies to
> render targets but we were applying it universally.  This causes problems
> if R32G32B32_SFLOAT is used because it requires valign == 2; this should be
 
That isl format doesn't exist. Did you mean R32B32B32_FLOAT (sans S)?

> okay because you can't render to that format.
> 
> Signed-off-by: Jason Ekstrand 
> Cc: Chad Versace 
> ---
>  src/intel/isl/isl_gen7.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/isl/isl_gen7.c b/src/intel/isl/isl_gen7.c
> index 02273f8..f3d8428 100644
> --- a/src/intel/isl/isl_gen7.c
> +++ b/src/intel/isl/isl_gen7.c
> @@ -354,7 +354,8 @@ gen7_choose_valign_el(const struct isl_device *dev,
>  */
> if (isl_surf_usage_is_depth(info->usage) ||
> info->samples > 1 ||
> -   tiling == ISL_TILING_Y0) {
> +   ((info->usage & ISL_SURF_USAGE_RENDER_TARGET_BIT) &&
> +tiling == ISL_TILING_Y0)) {
>require_valign4 = true;
> }

Did the buggy code trigger the assertion on line 390?
assert(!require_valign2 || !require_valign4);
If not, then something is suspicious.

Fix the commit message and this is
Reviewed-by: Chad Versace 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] anv: Link to libX11-xcb only when unneeded

2016-09-09 Thread Chad Versace

On Thu 08 Sep 2016, Nanley Chery wrote:
> Should the title the say needed instead of unneeded?

Oops :/ It's already pushed.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [v3 4/6] i965/rbc: Consult rb settings for texture surface setup

2016-09-09 Thread Pohjolainen, Topi

On Fri, Sep 09, 2016 at 07:11:16AM -0700, Jason Ekstrand wrote:
>On Sep 8, 2016 11:13 PM, "Pohjolainen, Topi"
><[1]topi.pohjolai...@gmail.com> wrote:
>>
>> On Thu, Sep 08, 2016 at 08:49:56AM -0700, Jason Ekstrand wrote:
>> >On Sep 7, 2016 9:30 PM, "Pohjolainen, Topi"
>> ><[1][2]topi.pohjolai...@gmail.com> wrote:
>> >>
>> >> On Wed, Sep 07, 2016 at 03:25:30PM -0700, Jason Ekstrand
>wrote:
>> >> >On Sep 7, 2016 10:24 AM, "Topi Pohjolainen"
>> >> ><[1][2][3]topi.pohjolai...@gmail.com> wrote:
>> >> >>
>> >> >> Once mcs buffer gets allocated without delay for
>lossless
>> >> >> compression (same as we do for msaa), one gets
>regression in:
>> >> >>
>> >> >> GL45-CTS.texture_barrier_ARB.same-texel-rw
>> >> >>
>> >> >> Setting the auxiliary surface for both sampling engine
>and
>> >data
>> >> >> port seems to fix this. I haven't found any hardware
>> >documentation
>> >> >> backing this though.
>> >> >>
>> >> >> v2 (Jason): Prepare also for the case where surface is
>sampled
>> >with
>> >> >> non-compressible format forcing also
>rendering
>> >without
>> >> >> compression.
>> >> >> v3: Split asserts and decision making.
>> >> >>
>> >> >> Signed-off-by: Topi Pohjolainen
>> ><[2][3][4]topi.pohjolai...@intel.com>
>> >> >> ---
>> >> >>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 63
>> >> >+---
>> >> >>  1 file changed, 56 insertions(+), 7 deletions(-)
>> >> >>
>> >> >> diff --git
>a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
>> >> >b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
>> >> >> index c1273c5..054c5c8 100644
>> >> >> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
>> >> >> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
>> >> >> @@ -140,9 +140,7 @@ brw_emit_surface_state(struct
>brw_context
>> >*brw,
>> >> >> struct isl_surf *aux_surf = NULL, aux_surf_s;
>> >> >> uint64_t aux_offset = 0;
>> >> >> enum isl_aux_usage aux_usage = ISL_AUX_USAGE_NONE;
>> >> >> -   if (mt->mcs_mt &&
>> >> >> -   ((view.usage &
>ISL_SURF_USAGE_RENDER_TARGET_BIT) ||
>> >> >> -mt->fast_clear_state !=
>> >INTEL_FAST_CLEAR_STATE_RESOLVED)) {
>> >> >> +   if (mt->mcs_mt && !(flags &
>INTEL_AUX_BUFFER_DISABLED)) {
>> >> >>intel_miptree_get_aux_isl_surf(brw, mt,
>_surf_s,
>> >> >_usage);
>> >> >>aux_surf = _surf_s;
>> >> >>assert(mt->mcs_mt->offset == 0);
>> >> >> @@ -425,6 +423,54 @@ swizzle_to_scs(GLenum swizzle,
>bool
>> >> >need_green_to_blue)
>> >> >> return (need_green_to_blue && scs == HSW_SCS_GREEN)
>?
>> >> >HSW_SCS_BLUE : scs;
>> >> >>  }
>> >> >>
>> >> >> +static unsigned
>> >> >> +brw_find_matching_rb(const struct gl_framebuffer *fb,
>> >> >> + const struct intel_mipmap_tree
>*mt)
>> >> >> +{
>> >> >> +   for (unsigned i = 0; i < fb->_NumColorDrawBuffers;
>i++) {
>> >> >> +  const struct intel_renderbuffer *irb =
>> >> >> + intel_renderbuffer(fb->_ColorDrawBuffers[i]);
>> >> >> +
>> >> >> +  if (irb->mt == mt)
>> >> >> + return i;
>> >> >> +   }
>> >> >> +
>> >> >> +   return fb->_NumColorDrawBuffers;
>> >> >> +}
>> >> >> +
>> >> >> +static bool
>> >> >> +brw_disable_aux_surface(const struct brw_context *brw,
>> >> >> +const struct intel_mipmap_tree
>*mt)
>> >> >> +{
>> >> >> +   /* Nothing to disable. */
>> >> >> +   if (!mt->mcs_mt)
>> >> >> +  return false;
>> >> >> +
>> >> >> +   /* There are special cases only for lossless
>compression.
>> >*/
>> >> >> +   if (!intel_miptree_is_lossless_compressed(brw, mt))
>> >> >> +  return mt->fast_clear_state ==
>> >> >INTEL_FAST_CLEAR_STATE_RESOLVED;
>> >> >> +
>> >> >> +   const struct gl_framebuffer *fb =
>brw->ctx.DrawBuffer;
>> >> >> +   const unsigned rb_index = brw_find_matching_rb(fb,
>mt);
>> >> >> +
>> >> >> +   /* In practise it looks that setting the same
>lossless
>

Re: [Mesa-dev] [PATCH 2/7] EGL: Implement eglLabelObjectKHR

2016-09-09 Thread Adam Jackson

On Fri, 2016-09-09 at 11:27 +0100, Emil Velikov wrote:
> > On 8 September 2016 at 18:46, Adam Jackson  wrote:
> > From: Kyle Brenneman 
> 
> Added a label to the _EGLThreadInfo, _EGLDisplay, and EGLResource
> structs. Implemented the function eglLabelObjectKHR.
> 
> 
> Coding style of the new hunk follows the GLVND one, which is _not_
> what we use in mesa/egl. Please don't do that ?

Fixed in next version.

> b) eglTerminate
> It detaches/unlinks only contexts and surfaces (bug?). Thus even when
> the display is no longer initialized we will get get to this point and
> _eglCheckResource() will return true.

Almost certainly a bug, patch in next version of the series. Not sure
if any tests cover this bug, but I guess I'll find out.

> > +/**
> > + * Returns the label set for the current thread.
> > + */
> > +EGLLabelKHR _eglGetThreadLabel(void)
> > +{
> > +   _EGLThreadInfo *t = _eglGetCurrentThread();
> > +   return t->Label;
> 
> Shouldn't the label be cleared in eglReleaseThread ?
> 

It isn't, not explicitly, but eglReleaseThread ->
_eglDestroyCurrentThread -> _eglDestroyThreadInfo which frees the
struct containing the label.

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/7] EGL: Implement eglLabelObjectKHR

2016-09-09 Thread Kyle Brenneman


On 09/09/2016 04:27 AM, Emil Velikov wrote:

On 8 September 2016 at 18:46, Adam Jackson  wrote:

From: Kyle Brenneman 

Added a label to the _EGLThreadInfo, _EGLDisplay, and EGLResource
structs. Implemented the function eglLabelObjectKHR.


Coding style of the new hunk follows the GLVND one, which is _not_
what we use in mesa/egl. Please don't do that ?


Reviewed-by: Adam Jackson 
---
  src/egl/main/eglapi.c | 63 +++
  src/egl/main/eglcurrent.c |  9 +++
  src/egl/main/eglcurrent.h |  4 +++
  src/egl/main/egldisplay.h |  4 +++
  4 files changed, 80 insertions(+)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index df2dcd6..31b842f 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -1791,6 +1791,68 @@ eglExportDMABUFImageMESA(EGLDisplay dpy, EGLImage image,
 RETURN_EGL_EVAL(disp, ret);
  }

+static EGLint EGLAPIENTRY
+eglLabelObjectKHR(
+  EGLDisplay dpy,
+  EGLenum objectType,
+  EGLObjectKHR object,
+  EGLLabelKHR label)

Incorrect alignment.


+{
+   if (objectType == EGL_OBJECT_THREAD_KHR) {
+  _EGLThreadInfo *t = _eglGetCurrentThread();

Add blank newline after variable declarations.


+  if (!_eglIsCurrentThreadDummy()) {
+ t->Label = label;
+  }

Drop {} when they encapsulate a single line on code (unless the other
side of the conditional has them).


+  return EGL_SUCCESS;
+   } else {

This else can do. This and the above two apply for the rest of the patch.


+  _EGLDisplay *disp = _eglLookupDisplay(dpy);
+  if (disp == NULL) {
+ _eglError(EGL_BAD_DISPLAY, "eglLabelObjectKHR");
+ return EGL_BAD_DISPLAY;
+  }
+
+  if (objectType == EGL_OBJECT_DISPLAY_KHR) {
+ if (dpy != (EGLDisplay) object) {
+_eglError(EGL_BAD_PARAMETER, "eglLabelObjectKHR");
+return EGL_BAD_PARAMETER;
+ }
+ disp->Label = label;
+ return EGL_SUCCESS;
+  } else {
+ _EGLResourceType type;
+ switch (objectType)
+ {
+case EGL_OBJECT_CONTEXT_KHR:
+   type = _EGL_RESOURCE_CONTEXT;
+   break;
+case EGL_OBJECT_SURFACE_KHR:
+   type = _EGL_RESOURCE_SURFACE;
+   break;
+case EGL_OBJECT_IMAGE_KHR:
+   type = _EGL_RESOURCE_IMAGE;
+   break;
+case EGL_OBJECT_SYNC_KHR:
+   type = _EGL_RESOURCE_SYNC;
+   break;
+case EGL_OBJECT_STREAM_KHR:
+default:
+_eglError(EGL_BAD_PARAMETER, "eglLabelObjectKHR");
+   return EGL_BAD_PARAMETER;
+ }
+

 The  does not need to be initialized for 
 EGL_OBJECT_THREAD_KHR, or EGL_OBJECT_DISPLAY_KHR; However for all other
 types it must be initialized in order to validate the  for
 attaching a label.


+ if (_eglCheckResource(object, type, disp)) {

With the above said, I'm not sure of this will be sufficient.
_eglCheckResource checks if the resource is linked, with unlinking
happening via
a) destroy
Mildly related: we seem to have a bug for EGL_OBJECT_SYNC_KHR, where
we unconditionally unblock and destroy, while we should flag for
deletion.

b) eglTerminate
It detaches/unlinks only contexts and surfaces (bug?). Thus even when
the display is no longer initialized we will get get to this point and
_eglCheckResource() will return true.

c) eglReleaseThread
Takes care of the error, bound API and current ctx state.
Would it be sufficient to call _eglLockDisplay instead of 
_eglLookupDisplay, and then check the disp->Initialized flag for the 
other object types? That way, it could just return EGL_NOT_INITIALIZED 
for an uninitialized display, even if the EGLObjectKHR handle points to 
some object that's still lingering after an eglTerminate call.





+/**
+ * Returns the label set for the current thread.
+ */
+EGLLabelKHR _eglGetThreadLabel(void)
+{
+   _EGLThreadInfo *t = _eglGetCurrentThread();
+   return t->Label;

Shouldn't the label be cleared in eglReleaseThread ?
It is, albeit indirectly. eglReleaseThread frees the whole 
_EGLThreadInfo struct, so the next call to _eglGetCurrentThread will 
return an _EGLThreadInfo struct without a thread label. Though, 
_eglGetThreadLabel probably should call _eglIsCurrentThreadDummy first 
so that it doesn't try to create create a new _EGLThreadInfo if it 
doesn't have to.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 12/12] winsys/radeon: rename nrelocs, crelocs to max_relocs, num_relocs

2016-09-09 Thread Nicolai Hähnle

From: Nicolai Hähnle 

---
 src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 48 +--
 src/gallium/winsys/radeon/drm/radeon_drm_cs.h |  6 ++--
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
index f0dcac0..b277379 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
@@ -118,27 +118,27 @@ static bool radeon_init_cs_context(struct 
radeon_cs_context *csc,
 for (i = 0; i < ARRAY_SIZE(csc->reloc_indices_hashlist); i++) {
 csc->reloc_indices_hashlist[i] = -1;
 }
 return true;
 }
 
 static void radeon_cs_context_cleanup(struct radeon_cs_context *csc)
 {
 unsigned i;
 
-for (i = 0; i < csc->crelocs; i++) {
+for (i = 0; i < csc->num_relocs; i++) {
 p_atomic_dec(>relocs_bo[i].bo->num_cs_references);
 radeon_bo_reference(>relocs_bo[i].bo, NULL);
 }
 
-csc->crelocs = 0;
-csc->validated_crelocs = 0;
+csc->num_relocs = 0;
+csc->num_validated_relocs = 0;
 csc->chunks[0].length_dw = 0;
 csc->chunks[1].length_dw = 0;
 
 for (i = 0; i < ARRAY_SIZE(csc->reloc_indices_hashlist); i++) {
 csc->reloc_indices_hashlist[i] = -1;
 }
 }
 
 static void radeon_destroy_cs_context(struct radeon_cs_context *csc)
 {
@@ -207,21 +207,21 @@ static inline void update_reloc(struct 
drm_radeon_cs_reloc *reloc,
 int radeon_lookup_buffer(struct radeon_cs_context *csc, struct radeon_bo *bo)
 {
 unsigned hash = bo->handle & (ARRAY_SIZE(csc->reloc_indices_hashlist)-1);
 int i = csc->reloc_indices_hashlist[hash];
 
 /* not found or found */
 if (i == -1 || csc->relocs_bo[i].bo == bo)
 return i;
 
 /* Hash collision, look for the BO in the list of relocs linearly. */
-for (i = csc->crelocs - 1; i >= 0; i--) {
+for (i = csc->num_relocs - 1; i >= 0; i--) {
 if (csc->relocs_bo[i].bo == bo) {
 /* Put this reloc in the hash list.
  * This will prevent additional hash collisions if there are
  * several consecutive lookup_buffer calls for the same buffer.
  *
  * Example: Assuming buffers A,B,C collide in the hash list,
  * the following sequence of relocs:
  * AAABB
  * will collide here: ^ and here:   ^,
  * meaning that we should get very few collisions in the end. */
@@ -265,50 +265,50 @@ static unsigned radeon_add_buffer(struct radeon_drm_cs 
*cs,
  *
  * This doesn't have to be done if virtual memory is enabled,
  * because there is no offset patching with virtual memory.
  */
 if (cs->ring_type != RING_DMA || cs->ws->info.has_virtual_memory) {
 return i;
 }
 }
 
 /* New relocation, check if the backing array is large enough. */
-if (csc->crelocs >= csc->nrelocs) {
+if (csc->num_relocs >= csc->max_relocs) {
 uint32_t size;
-csc->nrelocs = MAX2(csc->nrelocs + 16, (unsigned)(csc->nrelocs * 1.3));
+csc->max_relocs = MAX2(csc->max_relocs + 16, 
(unsigned)(csc->max_relocs * 1.3));
 
-size = csc->nrelocs * sizeof(csc->relocs_bo[0]);
+size = csc->max_relocs * sizeof(csc->relocs_bo[0]);
 csc->relocs_bo = realloc(csc->relocs_bo, size);
 
-size = csc->nrelocs * sizeof(struct drm_radeon_cs_reloc);
+size = csc->max_relocs * sizeof(struct drm_radeon_cs_reloc);
 csc->relocs = realloc(csc->relocs, size);
 
 csc->chunks[1].chunk_data = (uint64_t)(uintptr_t)csc->relocs;
 }
 
 /* Initialize the new relocation. */
-csc->relocs_bo[csc->crelocs].bo = NULL;
-csc->relocs_bo[csc->crelocs].priority_usage = 1llu << priority;
-radeon_bo_reference(>relocs_bo[csc->crelocs].bo, bo);
+csc->relocs_bo[csc->num_relocs].bo = NULL;
+csc->relocs_bo[csc->num_relocs].priority_usage = 1llu << priority;
+radeon_bo_reference(>relocs_bo[csc->num_relocs].bo, bo);
 p_atomic_inc(>num_cs_references);
-reloc = >relocs[csc->crelocs];
+reloc = >relocs[csc->num_relocs];
 reloc->handle = bo->handle;
 reloc->read_domains = rd;
 reloc->write_domain = wd;
 reloc->flags = priority / 4;
 
-csc->reloc_indices_hashlist[hash] = csc->crelocs;
+csc->reloc_indices_hashlist[hash] = csc->num_relocs;
 
 csc->chunks[1].length_dw += RELOC_DWORDS;
 
 *added_domains = rd | wd;
-return csc->crelocs++;
+return csc->num_relocs++;
 }
 
 static unsigned radeon_drm_cs_add_buffer(struct radeon_winsys_cs *rcs,
 struct pb_buffer *buf,
 enum radeon_bo_usage usage,
 enum radeon_bo_domain domains,
 enum radeon_bo_priority priority)
 {
 struct

[Mesa-dev] [PATCH 11/12] winsys/radeon: don't pre-allocate the relocations array

2016-09-09 Thread Nicolai Hähnle

From: Nicolai Hähnle 

It's really not necessary. Switch to an exponential resizing strategy.
---
 src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
index 612a876..f0dcac0 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
@@ -91,33 +91,20 @@ static void radeon_drm_ctx_destroy(struct radeon_winsys_ctx 
*ctx)
 {
 /* No context support here. */
 }
 
 static bool radeon_init_cs_context(struct radeon_cs_context *csc,
struct radeon_drm_winsys *ws)
 {
 int i;
 
 csc->fd = ws->fd;
-csc->nrelocs = 512;
-csc->relocs_bo = (struct radeon_bo_item*)
- CALLOC(1, csc->nrelocs * sizeof(csc->relocs_bo[0]));
-if (!csc->relocs_bo) {
-return false;
-}
-
-csc->relocs = (struct drm_radeon_cs_reloc*)
-  CALLOC(1, csc->nrelocs * sizeof(struct drm_radeon_cs_reloc));
-if (!csc->relocs) {
-FREE(csc->relocs_bo);
-return false;
-}
 
 csc->chunks[0].chunk_id = RADEON_CHUNK_ID_IB;
 csc->chunks[0].length_dw = 0;
 csc->chunks[0].chunk_data = (uint64_t)(uintptr_t)csc->buf;
 csc->chunks[1].chunk_id = RADEON_CHUNK_ID_RELOCS;
 csc->chunks[1].length_dw = 0;
 csc->chunks[1].chunk_data = (uint64_t)(uintptr_t)csc->relocs;
 csc->chunks[2].chunk_id = RADEON_CHUNK_ID_FLAGS;
 csc->chunks[2].length_dw = 2;
 csc->chunks[2].chunk_data = (uint64_t)(uintptr_t)>flags;
@@ -280,21 +267,21 @@ static unsigned radeon_add_buffer(struct radeon_drm_cs 
*cs,
  * because there is no offset patching with virtual memory.
  */
 if (cs->ring_type != RING_DMA || cs->ws->info.has_virtual_memory) {
 return i;
 }
 }
 
 /* New relocation, check if the backing array is large enough. */
 if (csc->crelocs >= csc->nrelocs) {
 uint32_t size;
-csc->nrelocs += 10;
+csc->nrelocs = MAX2(csc->nrelocs + 16, (unsigned)(csc->nrelocs * 1.3));
 
 size = csc->nrelocs * sizeof(csc->relocs_bo[0]);
 csc->relocs_bo = realloc(csc->relocs_bo, size);
 
 size = csc->nrelocs * sizeof(struct drm_radeon_cs_reloc);
 csc->relocs = realloc(csc->relocs, size);
 
 csc->chunks[1].chunk_data = (uint64_t)(uintptr_t)csc->relocs;
 }
 
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 03/12] winsys/amdgpu: clean up error paths in amdgpu_winsys_create

2016-09-09 Thread Nicolai Hähnle

From: Nicolai Hähnle 

No need to call pb_cache_deinit, because the cache hasn't been initialized
at that point.
---
 src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
index 33aa492..521a78a 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
@@ -520,31 +520,29 @@ amdgpu_winsys_create(int fd, radeon_screen_create_t 
screen_create)
/* Lookup a winsys if we have already created one for this device. */
ws = util_hash_table_get(dev_tab, dev);
if (ws) {
   pipe_reference(NULL, >reference);
   pipe_mutex_unlock(dev_tab_mutex);
   return >base;
}
 
/* Create a new winsys. */
ws = CALLOC_STRUCT(amdgpu_winsys);
-   if (!ws) {
-  pipe_mutex_unlock(dev_tab_mutex);
-  return NULL;
-   }
+   if (!ws)
+  goto fail;
 
ws->dev = dev;
ws->info.drm_major = drm_major;
ws->info.drm_minor = drm_minor;
 
if (!do_winsys_init(ws, fd))
-  goto fail;
+  goto fail_alloc;
 
/* Create managers. */
pb_cache_init(>bo_cache, 50, ws->check_vm ? 1.0f : 2.0f, 0,
  (ws->info.vram_size + ws->info.gart_size) / 8,
  amdgpu_bo_destroy, amdgpu_bo_can_reclaim);
 
/* init reference */
pipe_reference_init(>reference, 1);
 
/* Set functions. */
@@ -580,16 +578,16 @@ amdgpu_winsys_create(int fd, radeon_screen_create_t 
screen_create)
 
util_hash_table_set(dev_tab, dev, ws);
 
/* We must unlock the mutex once the winsys is fully initialized, so that
 * other threads attempting to create the winsys from the same fd will
 * get a fully initialized winsys and not just half-way initialized. */
pipe_mutex_unlock(dev_tab_mutex);
 
return >base;
 
+fail_alloc:
+   FREE(ws);
 fail:
pipe_mutex_unlock(dev_tab_mutex);
-   pb_cache_deinit(>bo_cache);
-   FREE(ws);
return NULL;
 }
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 04/12] winsys/amdgpu: add do_winsys_deinit function

2016-09-09 Thread Nicolai Hähnle

From: Nicolai Hähnle 

The idea is to have matching init/deinit functions so that deinit can be
re-used for cleanup in the error path of amdgpu_winsys_create.
---
 src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
index 521a78a..3961ee3 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
@@ -361,32 +361,37 @@ static bool do_winsys_init(struct amdgpu_winsys *ws, int 
fd)
return true;
 
 fail:
if (ws->addrlib)
   AddrDestroy(ws->addrlib);
amdgpu_device_deinitialize(ws->dev);
ws->dev = NULL;
return false;
 }
 
+static void do_winsys_deinit(struct amdgpu_winsys *ws)
+{
+   AddrDestroy(ws->addrlib);
+   amdgpu_device_deinitialize(ws->dev);
+}
+
 static void amdgpu_winsys_destroy(struct radeon_winsys *rws)
 {
struct amdgpu_winsys *ws = (struct amdgpu_winsys*)rws;
 
if (util_queue_is_initialized(>cs_queue))
   util_queue_destroy(>cs_queue);
 
pipe_mutex_destroy(ws->bo_fence_lock);
pb_cache_deinit(>bo_cache);
pipe_mutex_destroy(ws->global_bo_list_lock);
-   AddrDestroy(ws->addrlib);
-   amdgpu_device_deinitialize(ws->dev);
+   do_winsys_deinit(ws);
FREE(rws);
 }
 
 static void amdgpu_winsys_query_info(struct radeon_winsys *rws,
  struct radeon_info *info)
 {
*info = ((struct amdgpu_winsys *)rws)->info;
 }
 
 static bool amdgpu_cs_request_feature(struct radeon_winsys_cs *rcs,
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 02/12] gallium/radeon: page alignment for buffers is unnecessary

2016-09-09 Thread Nicolai Hähnle

From: Nicolai Hähnle 

In some places (e.g. shader program pointers) we require 256 bytes alignment.
---
 src/gallium/drivers/radeon/r600_pipe_common.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 6d7cc1b..6fe3183 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -1077,22 +1077,21 @@ static void r600_query_memory_info(struct pipe_screen 
*screen,
/* Just return the number of evicted 64KB pages. */
info->nr_device_memory_evictions = info->device_memory_evicted 
/ 64;
 }
 
 struct pipe_resource *r600_resource_create_common(struct pipe_screen *screen,
  const struct pipe_resource 
*templ)
 {
struct r600_common_screen *rscreen = (struct r600_common_screen*)screen;
 
if (templ->target == PIPE_BUFFER) {
-   return r600_buffer_create(screen, templ,
- rscreen->info.gart_page_size);
+   return r600_buffer_create(screen, templ, 256);
} else {
return r600_texture_create(screen, templ);
}
 }
 
 bool r600_common_screen_init(struct r600_common_screen *rscreen,
 struct radeon_winsys *ws)
 {
char llvm_string[32] = {}, kernel_version[128] = {};
struct utsname uname_data;
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 10/12] winsys/radeon: remove unused radeon_cs_context::priority_usage

2016-09-09 Thread Nicolai Hähnle

From: Nicolai Hähnle 

---
 src/gallium/winsys/radeon/drm/radeon_drm_cs.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.h 
b/src/gallium/winsys/radeon/drm/radeon_drm_cs.h
index 76004c5..208452d 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.h
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.h
@@ -42,21 +42,20 @@ struct radeon_cs_context {
 struct drm_radeon_cs_chunk  chunks[3];
 uint64_tchunk_array[3];
 uint32_tflags[2];
 
 /* Buffers. */
 unsignednrelocs;
 unsignedcrelocs;
 unsignedvalidated_crelocs;
 struct radeon_bo_item   *relocs_bo;
 struct drm_radeon_cs_reloc  *relocs;
-uint64_t*priority_usage;
 
 int reloc_indices_hashlist[4096];
 };
 
 struct radeon_drm_cs {
 struct radeon_winsys_cs base;
 enum ring_type  ring_type;
 
 /* We flip between these two CS. While one is being consumed
  * by the kernel in another thread, the other one is being filled
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 05/12] winsys/amdgpu: use only one fence per BO

2016-09-09 Thread Nicolai Hähnle

From: Nicolai Hähnle 

The fence that is added to the BO during flush is guaranteed to be
signaled after all the fences that were in the fences array of the BO
before the flush, because those fences are added as dependencies for the
submission (and all this happens atomically under the bo_fence_lock).

Therefore, keeping only the last fence around is sufficient.
---
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 54 +++--
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.h |  4 +-
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 66 +++
 3 files changed, 56 insertions(+), 68 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
index 32df0be..a6d4aa4 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
@@ -37,21 +37,20 @@
 #include 
 #include 
 #include 
 
 static bool amdgpu_bo_wait(struct pb_buffer *_buf, uint64_t timeout,
enum radeon_bo_usage usage)
 {
struct amdgpu_winsys_bo *bo = amdgpu_winsys_bo(_buf);
struct amdgpu_winsys *ws = bo->ws;
int64_t abs_timeout;
-   int i;
 
if (timeout == 0) {
   if (p_atomic_read(>num_active_ioctls))
  return false;
 
} else {
   abs_timeout = os_time_get_absolute_timeout(timeout);
 
   /* Wait if any ioctl is being submitted with this buffer. */
   if (!os_wait_until_zero_abs_timeout(>num_active_ioctls, abs_timeout))
@@ -68,91 +67,82 @@ static bool amdgpu_bo_wait(struct pb_buffer *_buf, uint64_t 
timeout,
 
   r = amdgpu_bo_wait_for_idle(bo->bo, timeout, _busy);
   if (r)
  fprintf(stderr, "%s: amdgpu_bo_wait_for_idle failed %i\n", __func__,
  r);
   return !buffer_busy;
}
 
if (timeout == 0) {
   pipe_mutex_lock(ws->bo_fence_lock);
-  for (i = 0; i < RING_LAST; i++)
- if (bo->fence[i]) {
-if (amdgpu_fence_wait(bo->fence[i], 0, false)) {
-   /* Release the idle fence to avoid checking it again later. */
-   amdgpu_fence_reference(>fence[i], NULL);
-} else {
-   pipe_mutex_unlock(ws->bo_fence_lock);
-   return false;
-}
+  if (bo->fence) {
+ if (amdgpu_fence_wait(bo->fence, 0, false)) {
+/* Release the idle fence to avoid checking it again later. */
+amdgpu_fence_reference(>fence, NULL);
+ } else {
+pipe_mutex_unlock(ws->bo_fence_lock);
+return false;
  }
+  }
   pipe_mutex_unlock(ws->bo_fence_lock);
   return true;
 
} else {
-  struct pipe_fence_handle *fence[RING_LAST] = {};
-  bool fence_idle[RING_LAST] = {};
+  struct pipe_fence_handle *fence = NULL;
+  bool fence_idle = false;
   bool buffer_idle = true;
 
-  /* Take references to all fences, so that we can wait for them
+  /* Take a reference to the fences, so that we can wait for it
* without the lock. */
   pipe_mutex_lock(ws->bo_fence_lock);
-  for (i = 0; i < RING_LAST; i++)
- amdgpu_fence_reference([i], bo->fence[i]);
+  amdgpu_fence_reference(, bo->fence);
   pipe_mutex_unlock(ws->bo_fence_lock);
 
-  /* Now wait for the fences. */
-  for (i = 0; i < RING_LAST; i++) {
- if (fence[i]) {
-if (amdgpu_fence_wait(fence[i], abs_timeout, true))
-   fence_idle[i] = true;
-else
-   buffer_idle = false;
- }
+  /* Now wait for the fence. */
+  if (fence) {
+ if (amdgpu_fence_wait(fence, abs_timeout, true))
+fence_idle = true;
+ else
+buffer_idle = false;
   }
 
   /* Release idle fences to avoid checking them again later. */
   pipe_mutex_lock(ws->bo_fence_lock);
-  for (i = 0; i < RING_LAST; i++) {
- if (fence[i] == bo->fence[i] && fence_idle[i])
-amdgpu_fence_reference(>fence[i], NULL);
-
- amdgpu_fence_reference([i], NULL);
-  }
+  if (fence == bo->fence && fence_idle)
+ amdgpu_fence_reference(>fence, NULL);
+  amdgpu_fence_reference(, NULL);
   pipe_mutex_unlock(ws->bo_fence_lock);
 
   return buffer_idle;
}
 }
 
 static enum radeon_bo_domain amdgpu_bo_get_initial_domain(
   struct pb_buffer *buf)
 {
return ((struct amdgpu_winsys_bo*)buf)->initial_domain;
 }
 
 void amdgpu_bo_destroy(struct pb_buffer *_buf)
 {
struct amdgpu_winsys_bo *bo = amdgpu_winsys_bo(_buf);
-   int i;
 
pipe_mutex_lock(bo->ws->global_bo_list_lock);
LIST_DEL(>global_list_item);
bo->ws->num_buffers--;
pipe_mutex_unlock(bo->ws->global_bo_list_lock);
 
amdgpu_bo_va_op(bo->bo, 0, bo->base.size, bo->va, 0, AMDGPU_VA_OP_UNMAP);
amdgpu_va_range_free(bo->va_handle);
amdgpu_bo_free(bo->bo);
 
-   for (i = 0; i < RING_LAST; i++)
-  amdgpu_fence_reference(>fence[i], NULL);
+

[Mesa-dev] [PATCH 09/12] winsys/amdgpu: remove amdgpu_cs_lookup_buffer

2016-09-09 Thread Nicolai Hähnle

From: Nicolai Hähnle 

The radeonsi driver doesn't and shouldn't care about the buffer index.
Only the virtual addresses matter.
---
 src/gallium/drivers/radeon/radeon_winsys.h | 3 +++
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c  | 9 -
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_winsys.h 
b/src/gallium/drivers/radeon/radeon_winsys.h
index 9693032..91f6e89 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -596,20 +596,23 @@ struct radeon_winsys {
  */
 unsigned (*cs_add_buffer)(struct radeon_winsys_cs *cs,
  struct pb_buffer *buf,
  enum radeon_bo_usage usage,
  enum radeon_bo_domain domain,
  enum radeon_bo_priority priority);
 
 /**
  * Return the index of an already-added buffer.
  *
+ * Not supported on amdgpu. Drivers with GPUVM should not care about
+ * buffer indices.
+ *
  * \param csCommand stream
  * \param buf   Buffer
  * \return  The buffer index, or -1 if the buffer has not been 
added.
  */
 int (*cs_lookup_buffer)(struct radeon_winsys_cs *cs,
 struct pb_buffer *buf);
 
 /**
  * Return true if there is enough memory in VRAM and GTT for the buffers
  * added so far. If the validation fails, all buffers which have
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index df9859b..e2229ff 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -692,28 +692,20 @@ amdgpu_cs_add_const_preamble_ib(struct radeon_winsys_cs 
*rcs)
cs->csc->request.ibs = >csc->ib[IB_CONST_PREAMBLE];
 
cs->cst->request.number_of_ibs = 3;
cs->cst->request.ibs = >cst->ib[IB_CONST_PREAMBLE];
 
return >const_preamble_ib.base;
 }
 
 #define OUT_CS(cs, value) (cs)->current.buf[(cs)->current.cdw++] = (value)
 
-static int amdgpu_cs_lookup_buffer(struct radeon_winsys_cs *rcs,
-   struct pb_buffer *buf)
-{
-   struct amdgpu_cs *cs = amdgpu_cs(rcs);
-
-   return amdgpu_lookup_buffer(cs->csc, (struct amdgpu_winsys_bo*)buf);
-}
-
 static bool amdgpu_cs_validate(struct radeon_winsys_cs *rcs)
 {
return true;
 }
 
 static bool amdgpu_cs_check_space(struct radeon_winsys_cs *rcs, unsigned dw)
 {
struct amdgpu_ib *ib = amdgpu_ib(rcs);
struct amdgpu_cs *cs = amdgpu_cs_from_ib(ib);
unsigned requested_size = rcs->prev_dw + rcs->current.cdw + dw;
@@ -,21 +1103,20 @@ static bool amdgpu_bo_is_referenced(struct 
radeon_winsys_cs *rcs,
 void amdgpu_cs_init_functions(struct amdgpu_winsys *ws)
 {
ws->base.ctx_create = amdgpu_ctx_create;
ws->base.ctx_destroy = amdgpu_ctx_destroy;
ws->base.ctx_query_reset_status = amdgpu_ctx_query_reset_status;
ws->base.cs_create = amdgpu_cs_create;
ws->base.cs_add_const_ib = amdgpu_cs_add_const_ib;
ws->base.cs_add_const_preamble_ib = amdgpu_cs_add_const_preamble_ib;
ws->base.cs_destroy = amdgpu_cs_destroy;
ws->base.cs_add_buffer = amdgpu_cs_add_buffer;
-   ws->base.cs_lookup_buffer = amdgpu_cs_lookup_buffer;
ws->base.cs_validate = amdgpu_cs_validate;
ws->base.cs_check_space = amdgpu_cs_check_space;
ws->base.cs_get_buffer_list = amdgpu_cs_get_buffer_list;
ws->base.cs_flush = amdgpu_cs_flush;
ws->base.cs_get_next_fence = amdgpu_cs_get_next_fence;
ws->base.cs_is_buffer_referenced = amdgpu_bo_is_referenced;
ws->base.cs_sync_flush = amdgpu_cs_sync_flush;
ws->base.fence_wait = amdgpu_fence_wait_rel_timeout;
ws->base.fence_reference = amdgpu_fence_reference;
 }
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 06/12] winsys/amdgpu: extract adding a new buffer list entry into its own function

2016-09-09 Thread Nicolai Hähnle

From: Nicolai Hähnle 

While at it, try to be a little more robust in the face of memory allocation
failure.
---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 113 ++
 1 file changed, 70 insertions(+), 43 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 7cbf19b..638c2d5 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -284,77 +284,104 @@ int amdgpu_lookup_buffer(struct amdgpu_cs_context *cs, 
struct amdgpu_winsys_bo *
   * AAABB
   * will collide here: ^ and here:   ^,
   * meaning that we should get very few collisions in the end. */
  cs->buffer_indices_hashlist[hash] = i;
  return i;
   }
}
return -1;
 }
 
-static unsigned amdgpu_add_buffer(struct amdgpu_cs *acs,
- struct amdgpu_winsys_bo *bo,
- enum radeon_bo_usage usage,
- enum radeon_bo_domain domains,
- unsigned priority,
- enum radeon_bo_domain *added_domains)
+static int
+amdgpu_lookup_or_add_buffer(struct amdgpu_cs_context *cs, struct 
amdgpu_winsys_bo *bo)
 {
-   struct amdgpu_cs_context *cs = acs->csc;
struct amdgpu_cs_buffer *buffer;
-   unsigned hash = bo->unique_id & (ARRAY_SIZE(cs->buffer_indices_hashlist)-1);
-   int i = -1;
+   unsigned hash;
+   int idx = amdgpu_lookup_buffer(cs, bo);
 
-   assert(priority < 64);
-   *added_domains = 0;
-
-   i = amdgpu_lookup_buffer(cs, bo);
-
-   if (i >= 0) {
-  buffer = >buffers[i];
-  buffer->priority_usage |= 1llu << priority;
-  buffer->usage |= usage;
-  *added_domains = domains & ~buffer->domains;
-  buffer->domains |= domains;
-  cs->flags[i] = MAX2(cs->flags[i], priority / 4);
-  return i;
-   }
+   if (idx >= 0)
+  return idx;
 
/* New buffer, check if the backing array is large enough. */
if (cs->num_buffers >= cs->max_num_buffers) {
-  uint32_t size;
-  cs->max_num_buffers += 10;
+  unsigned new_max =
+ MAX2(cs->max_num_buffers + 16, (unsigned)(cs->max_num_buffers * 1.3));
+  struct amdgpu_cs_buffer *new_buffers;
+  amdgpu_bo_handle *new_handles;
+  uint8_t *new_flags;
+
+  new_buffers = MALLOC(new_max * sizeof(*new_buffers));
+  new_handles = MALLOC(new_max * sizeof(*new_handles));
+  new_flags = MALLOC(new_max * sizeof(*new_flags));
+
+  if (!new_buffers || !new_handles || !new_flags) {
+ fprintf(stderr, "amdgpu_lookup_or_add_buffer: allocation failed\n");
+ FREE(new_buffers);
+ FREE(new_handles);
+ FREE(new_flags);
+ return -1;
+  }
 
-  size = cs->max_num_buffers * sizeof(struct amdgpu_cs_buffer);
-  cs->buffers = realloc(cs->buffers, size);
+  memcpy(new_buffers, cs->buffers, cs->num_buffers * sizeof(*new_buffers));
+  memcpy(new_handles, cs->handles, cs->num_buffers * sizeof(*new_handles));
+  memcpy(new_flags, cs->flags, cs->num_buffers * sizeof(*new_flags));
 
-  size = cs->max_num_buffers * sizeof(amdgpu_bo_handle);
-  cs->handles = realloc(cs->handles, size);
+  FREE(cs->buffers);
+  FREE(cs->handles);
+  FREE(cs->flags);
 
-  cs->flags = realloc(cs->flags, cs->max_num_buffers);
+  cs->max_num_buffers = new_max;
+  cs->buffers = new_buffers;
+  cs->handles = new_handles;
+  cs->flags = new_flags;
}
 
-   /* Initialize the new buffer. */
-   cs->buffers[cs->num_buffers].bo = NULL;
-   amdgpu_winsys_bo_reference(>buffers[cs->num_buffers].bo, bo);
-   cs->handles[cs->num_buffers] = bo->bo;
-   cs->flags[cs->num_buffers] = priority / 4;
+   idx = cs->num_buffers;
+   buffer = >buffers[idx];
+   memset(buffer, 0, sizeof(*buffer));
+   amdgpu_winsys_bo_reference(>bo, bo);
+   cs->handles[idx] = bo->bo;
+   cs->flags[idx] = 0;
p_atomic_inc(>num_cs_references);
-   buffer = >buffers[cs->num_buffers];
-   buffer->bo = bo;
-   buffer->priority_usage = 1llu << priority;
-   buffer->usage = usage;
-   buffer->domains = domains;
+   cs->num_buffers++;
 
-   cs->buffer_indices_hashlist[hash] = cs->num_buffers;
+   hash = bo->unique_id & (ARRAY_SIZE(cs->buffer_indices_hashlist)-1);
+   cs->buffer_indices_hashlist[hash] = idx;
+
+   return idx;
+}
+
+static unsigned amdgpu_add_buffer(struct amdgpu_cs *acs,
+ struct amdgpu_winsys_bo *bo,
+ enum radeon_bo_usage usage,
+ enum radeon_bo_domain domains,
+ unsigned priority,
+ enum radeon_bo_domain *added_domains)
+{
+   struct amdgpu_cs_context *cs = acs->csc;
+   struct amdgpu_cs_buffer *buffer;
+   int i = amdgpu_lookup_or_add_buffer(cs, bo);
+
+   assert(priority < 64);
+

[Mesa-dev] [PATCH 08/12] winsys/amdgpu: remove unused field domains from amdgpu_cs_buffer

2016-09-09 Thread Nicolai Hähnle

From: Nicolai Hähnle 

---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 53 ++-
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.h |  1 -
 2 files changed, 17 insertions(+), 37 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 10a4416..df9859b 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -285,22 +285,23 @@ int amdgpu_lookup_buffer(struct amdgpu_cs_context *cs, 
struct amdgpu_winsys_bo *
   * will collide here: ^ and here:   ^,
   * meaning that we should get very few collisions in the end. */
  cs->buffer_indices_hashlist[hash] = i;
  return i;
   }
}
return -1;
 }
 
 static int
-amdgpu_lookup_or_add_buffer(struct amdgpu_cs_context *cs, struct 
amdgpu_winsys_bo *bo)
+amdgpu_lookup_or_add_buffer(struct amdgpu_cs *acs, struct amdgpu_winsys_bo *bo)
 {
+   struct amdgpu_cs_context *cs = acs->csc;
struct amdgpu_cs_buffer *buffer;
unsigned hash;
int idx = amdgpu_lookup_buffer(cs, bo);
 
if (idx >= 0)
   return idx;
 
/* New buffer, check if the backing array is large enough. */
if (cs->num_buffers >= cs->max_num_buffers) {
   unsigned new_max =
@@ -340,70 +341,50 @@ amdgpu_lookup_or_add_buffer(struct amdgpu_cs_context *cs, 
struct amdgpu_winsys_b
memset(buffer, 0, sizeof(*buffer));
amdgpu_winsys_bo_reference(>bo, bo);
cs->handles[idx] = bo->bo;
cs->flags[idx] = 0;
p_atomic_inc(>num_cs_references);
cs->num_buffers++;
 
hash = bo->unique_id & (ARRAY_SIZE(cs->buffer_indices_hashlist)-1);
cs->buffer_indices_hashlist[hash] = idx;
 
-   return idx;
-}
-
-static unsigned amdgpu_add_buffer(struct amdgpu_cs *acs,
- struct amdgpu_winsys_bo *bo,
- enum radeon_bo_usage usage,
- enum radeon_bo_domain domains,
- unsigned priority,
- enum radeon_bo_domain *added_domains)
-{
-   struct amdgpu_cs_context *cs = acs->csc;
-   struct amdgpu_cs_buffer *buffer;
-   int i = amdgpu_lookup_or_add_buffer(cs, bo);
-
-   assert(priority < 64);
-
-   if (i < 0) {
-  *added_domains = 0;
-  return ~0;
-   }
+   if (bo->initial_domain & RADEON_DOMAIN_VRAM)
+  acs->main.base.used_vram += bo->base.size;
+   else if (bo->initial_domain & RADEON_DOMAIN_GTT)
+  acs->main.base.used_gart += bo->base.size;
 
-   buffer = >buffers[i];
-   buffer->priority_usage |= 1llu << priority;
-   buffer->usage |= usage;
-   *added_domains = domains & ~buffer->domains;
-   buffer->domains |= domains;
-   cs->flags[i] = MAX2(cs->flags[i], priority / 4);
-   return i;
+   return idx;
 }
 
 static unsigned amdgpu_cs_add_buffer(struct radeon_winsys_cs *rcs,
 struct pb_buffer *buf,
 enum radeon_bo_usage usage,
 enum radeon_bo_domain domains,
 enum radeon_bo_priority priority)
 {
/* Don't use the "domains" parameter. Amdgpu doesn't support changing
 * the buffer placement during command submission.
 */
-   struct amdgpu_cs *cs = amdgpu_cs(rcs);
+   struct amdgpu_cs *acs = amdgpu_cs(rcs);
+   struct amdgpu_cs_context *cs = acs->csc;
struct amdgpu_winsys_bo *bo = (struct amdgpu_winsys_bo*)buf;
-   enum radeon_bo_domain added_domains;
-   unsigned index = amdgpu_add_buffer(cs, bo, usage, bo->initial_domain,
- priority, _domains);
+   struct amdgpu_cs_buffer *buffer;
+   int index = amdgpu_lookup_or_add_buffer(acs, bo);
 
-   if (added_domains & RADEON_DOMAIN_VRAM)
-  cs->main.base.used_vram += bo->base.size;
-   else if (added_domains & RADEON_DOMAIN_GTT)
-  cs->main.base.used_gart += bo->base.size;
+   if (index < 0)
+  return 0;
 
+   buffer = >buffers[index];
+   buffer->priority_usage |= 1llu << priority;
+   buffer->usage |= usage;
+   cs->flags[index] = MAX2(cs->flags[index], priority / 4);
return index;
 }
 
 static bool amdgpu_ib_new_buffer(struct amdgpu_winsys *ws, struct amdgpu_ib 
*ib)
 {
struct pb_buffer *pb;
uint8_t *mapped;
unsigned buffer_size;
 
/* Always create a buffer that is at least as large as the maximum seen IB
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
index 7455061..51753db 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
@@ -40,21 +40,20 @@ struct amdgpu_ctx {
amdgpu_context_handle ctx;
amdgpu_bo_handle user_fence_bo;
uint64_t *user_fence_cpu_address_base;
int refcount;
 };
 
 struct amdgpu_cs_buffer {
struct amdgpu_winsys_bo *bo;
uint64_t priority_usage;
enum radeon_bo_usage usage;
-   enum radeon_bo_domain domains;
 };
 
 enum

[Mesa-dev] [PATCH 01/12] gallium/radeon/winsyses: remove #includes of pb_bufmgr.h

2016-09-09 Thread Nicolai Hähnle

From: Nicolai Hähnle 

---
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.h | 1 -
 src/gallium/winsys/radeon/drm/radeon_drm_bo.h | 1 -
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 1 -
 3 files changed, 3 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.h 
b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.h
index e2ee049..70d9854 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.h
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.h
@@ -27,21 +27,20 @@
  */
 /*
  * Authors:
  *  Marek Olšák 
  */
 
 #ifndef AMDGPU_BO_H
 #define AMDGPU_BO_H
 
 #include "amdgpu_winsys.h"
-#include "pipebuffer/pb_bufmgr.h"
 
 struct amdgpu_winsys_bo {
struct pb_buffer base;
struct pb_cache_entry cache_entry;
 
struct amdgpu_winsys *ws;
void *user_ptr; /* from buffer_from_ptr */
 
amdgpu_bo_handle bo;
int map_count;
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.h 
b/src/gallium/winsys/radeon/drm/radeon_drm_bo.h
index f7f4ce3..a9f31c0 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.h
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.h
@@ -26,21 +26,20 @@
  */
 /*
  * Authors:
  *  Jérôme Glisse 
  *  Marek Olšák 
  */
 #ifndef RADEON_DRM_BO_H
 #define RADEON_DRM_BO_H
 
 #include "radeon_drm_winsys.h"
-#include "pipebuffer/pb_bufmgr.h"
 #include "os/os_thread.h"
 
 struct radeon_bo {
 struct pb_buffer base;
 struct pb_cache_entry cache_entry;
 
 struct radeon_drm_winsys *rws;
 void *user_ptr; /* from buffer_from_ptr */
 
 void *ptr;
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
index d73b7f4..aa4bf5f 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
@@ -28,21 +28,20 @@
  * Authors:
  *  Corbin Simpson 
  *  Joakim Sindholt 
  *  Marek Olšák 
  */
 
 #include "radeon_drm_bo.h"
 #include "radeon_drm_cs.h"
 #include "radeon_drm_public.h"
 
-#include "pipebuffer/pb_bufmgr.h"
 #include "util/u_memory.h"
 #include "util/u_hash_table.h"
 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 07/12] winsys/amdgpu: remove initial buffer list allocation

2016-09-09 Thread Nicolai Hähnle

From: Nicolai Hähnle 

It's really not necessary.
---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 20 
 1 file changed, 20 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 638c2d5..10a4416 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -568,40 +568,20 @@ static bool amdgpu_init_cs_context(struct 
amdgpu_cs_context *cs,
case RING_COMPUTE:
   cs->request.ip_type = AMDGPU_HW_IP_COMPUTE;
   break;
 
default:
case RING_GFX:
   cs->request.ip_type = AMDGPU_HW_IP_GFX;
   break;
}
 
-   cs->max_num_buffers = 512;
-   cs->buffers = (struct amdgpu_cs_buffer*)
-  CALLOC(1, cs->max_num_buffers * sizeof(struct 
amdgpu_cs_buffer));
-   if (!cs->buffers) {
-  return false;
-   }
-
-   cs->handles = CALLOC(1, cs->max_num_buffers * sizeof(amdgpu_bo_handle));
-   if (!cs->handles) {
-  FREE(cs->buffers);
-  return false;
-   }
-
-   cs->flags = CALLOC(1, cs->max_num_buffers);
-   if (!cs->flags) {
-  FREE(cs->handles);
-  FREE(cs->buffers);
-  return false;
-   }
-
for (i = 0; i < ARRAY_SIZE(cs->buffer_indices_hashlist); i++) {
   cs->buffer_indices_hashlist[i] = -1;
}
 
cs->request.number_of_ibs = 1;
cs->request.ibs = >ib[IB_MAIN];
 
cs->ib[IB_CONST].flags = AMDGPU_IB_FLAG_CE;
cs->ib[IB_CONST_PREAMBLE].flags = AMDGPU_IB_FLAG_CE |
  AMDGPU_IB_FLAG_PREAMBLE;
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 00/12] radeon/winsyses: various cleanups

2016-09-09 Thread Nicolai Hähnle

Hi,

this is a bunch of random cleanups in and related to radeon and amdgpu
winsyses.

There is one significant behavioral change, which is that amdgpu now only
keeps a single fence around for each buffer object. See the explanation
at that commit. Please review!

Thanks,
Nicolai
--
 .../drivers/radeon/r600_pipe_common.c|   3 +-
 src/gallium/drivers/radeon/radeon_winsys.h   |   3 +
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c|  54 ++---
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.h|   5 +-
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c| 203 -
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.h|   1 -
 .../winsys/amdgpu/drm/amdgpu_winsys.c|  21 +-
 .../winsys/radeon/drm/radeon_drm_bo.h|   1 -
 .../winsys/radeon/drm/radeon_drm_cs.c|  61 ++---
 .../winsys/radeon/drm/radeon_drm_cs.h|   7 +-
 .../winsys/radeon/drm/radeon_drm_winsys.c|   1 -

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] gallium/winsys/kms: Close drm device filedescriptor on kms_dri_sw_winsys release

2016-09-09 Thread Łukasz Spintzyk

Hi,

Can I do something to have it merged to the source tree?

Fix was already tested, works fine :)

regards
Łukasz Spintzyk

2016-09-05 18:48 GMT+02:00 Lukasz Spintzyk :

> This closes filedescriptor owned by kms_dri_sw_winsys struct. It fixes
> issue
> where removal of udl or evdi module used by DisplayLink devices was
> impossible
> due to not closed filedescriptors.
>
> When this file descriptor was not closed then command
> rmmod udl was returning error "Module udl is in use".
> By this fix xserver does not prevent module removal when usb device
> is unplugged.
>
> Signed-off-by: Lukasz Spintzyk 
> ---
>  src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> index 07eca99..f06ccef 100644
> --- a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> +++ b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> @@ -371,6 +371,9 @@ kms_sw_displaytarget_display(struct sw_winsys *ws,
>  static void
>  kms_destroy_sw_winsys(struct sw_winsys *winsys)
>  {
> +   struct kms_sw_winsys *kms_sw = kms_sw_winsys(winsys);
> +
> +   close(kms_sw->fd);
> FREE(winsys);
>  }
>
> --
> 2.7.4
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 0/4] i965: Use blorp_copy to implement glCopyImageSubData

2016-09-09 Thread Jason Ekstrand

Now that we have the shiny new blorp_copy entrypoint that was added to
implement VkCopy* in the Vulkan driver, it's almost trivial to hook the GL
driver up to use it for glCopyImageSubData.  What's nice about this new
entrypoint is that it can handle everything that glCopyImageSubData can
throw at it instead of being restricted to just uncompressed destinations.
Also, the old blorp and meta-based approaches are broken for compressed
textures with multiple miplevels and fixing them isn't really practical.

Jason Ekstrand (4):
  intel/isl: Allow valign2 for texture-only Y-tiled surfaces on gen7
  intel/isl: Add support for RGB formats in X and Y-tiled memory
  i965/blorp: Add a copy_miptrees helper
  i965: Use blorp_copy for all copy_image operations on gen6+

 src/intel/isl/isl.c  | 49 ++-
 src/intel/isl/isl.h  | 10 +++-
 src/intel/isl/isl_gen7.c |  3 +-
 src/mesa/drivers/dri/i965/brw_blorp.c| 71 
 src/mesa/drivers/dri/i965/brw_blorp.h| 10 
 src/mesa/drivers/dri/i965/intel_copy_image.c | 28 +++
 6 files changed, 134 insertions(+), 37 deletions(-)

-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/4] i965/blorp: Add a copy_miptrees helper

2016-09-09 Thread Jason Ekstrand

---
 src/mesa/drivers/dri/i965/brw_blorp.c | 71 +++
 src/mesa/drivers/dri/i965/brw_blorp.h | 10 +
 2 files changed, 81 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
b/src/mesa/drivers/dri/i965/brw_blorp.c
index 193c7a4..626c33c 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -288,6 +288,23 @@ physical_to_logical_layer(struct intel_mipmap_tree *mt,
}
 }
 
+static void
+miptree_check_level_logical_layer(struct intel_mipmap_tree *mt,
+  unsigned level,
+  unsigned logical_layer)
+{
+   if (mt->msaa_layout == INTEL_MSAA_LAYOUT_UMS ||
+   mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS) {
+  const unsigned num_samples = MAX2(1, mt->num_samples);
+  for (unsigned s = 0; s < num_samples; s++) {
+ const unsigned physical_layer = (logical_layer * num_samples) + s;
+ intel_miptree_check_level_layer(mt, level, physical_layer);
+  }
+   } else {
+  intel_miptree_check_level_layer(mt, level, logical_layer);
+   }
+}
+
 /**
  * Note: if the src (or dst) is a 2D multisample array texture on Gen7+ using
  * INTEL_MSAA_LAYOUT_UMS or INTEL_MSAA_LAYOUT_CMS, src_layer (dst_layer) is
@@ -392,6 +409,60 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
   dst_mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_UNRESOLVED;
 }
 
+void
+brw_blorp_copy_miptrees(struct brw_context *brw,
+struct intel_mipmap_tree *src_mt,
+unsigned src_level, unsigned src_layer,
+struct intel_mipmap_tree *dst_mt,
+unsigned dst_level, unsigned dst_layer,
+unsigned src_x, unsigned src_y,
+unsigned dst_x, unsigned dst_y,
+unsigned src_width, unsigned src_height)
+{
+   /* Get ready to blit.  This includes depth resolving the src and dst
+* buffers if necessary.  Note: it's not necessary to do a color resolve on
+* the destination buffer because we use the standard render path to render
+* to destination color buffers, and the standard render path is
+* fast-color-aware.
+*/
+   intel_miptree_resolve_color(brw, src_mt, INTEL_MIPTREE_IGNORE_CCS_E);
+   intel_miptree_slice_resolve_depth(brw, src_mt, src_level, src_layer);
+   intel_miptree_slice_resolve_depth(brw, dst_mt, dst_level, dst_layer);
+
+   intel_miptree_prepare_mcs(brw, dst_mt);
+
+   DBG("%s from %dx %s mt %p %d %d (%d,%d) %dx%d"
+   "to %dx %s mt %p %d %d (%d,%d)\n",
+   __func__,
+   src_mt->num_samples, _mesa_get_format_name(src_mt->format), src_mt,
+   src_level, src_layer, src_x, src_y, src_width, src_height,
+   dst_mt->num_samples, _mesa_get_format_name(dst_mt->format), dst_mt,
+   dst_level, dst_layer, dst_x, dst_y);
+
+   miptree_check_level_logical_layer(src_mt, src_level, src_layer);
+   miptree_check_level_logical_layer(dst_mt, dst_level, dst_layer);
+   intel_miptree_used_for_rendering(dst_mt);
+
+   struct isl_surf tmp_surfs[4];
+   struct blorp_surf src_surf, dst_surf;
+   blorp_surf_for_miptree(brw, _surf, src_mt, false,
+  _level, _surfs[0]);
+   blorp_surf_for_miptree(brw, _surf, dst_mt, true,
+  _level, _surfs[2]);
+
+   struct blorp_batch batch;
+   blorp_batch_init(>blorp, , brw);
+   blorp_copy(, _surf, src_level, src_layer,
+  _surf, dst_level, dst_layer,
+  src_x, src_y, dst_x, dst_y, src_width, src_height);
+   blorp_batch_finish();
+
+   intel_miptree_slice_set_needs_hiz_resolve(dst_mt, dst_level, dst_layer);
+
+   if (intel_miptree_is_lossless_compressed(brw, dst_mt))
+  dst_mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_UNRESOLVED;
+}
+
 static struct intel_mipmap_tree *
 find_miptree(GLbitfield buffer_bit, struct intel_renderbuffer *irb)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h 
b/src/mesa/drivers/dri/i965/brw_blorp.h
index afbf68f..abf3956 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.h
+++ b/src/mesa/drivers/dri/i965/brw_blorp.h
@@ -48,6 +48,16 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
 GLenum filter, bool mirror_x, bool mirror_y,
 bool decode_srgb, bool encode_srgb);
 
+void
+brw_blorp_copy_miptrees(struct brw_context *brw,
+struct intel_mipmap_tree *src_mt,
+unsigned src_level, unsigned src_logical_layer,
+struct intel_mipmap_tree *dst_mt,
+unsigned dst_level, unsigned dst_logical_layer,
+unsigned src_x, unsigned src_y,
+unsigned dst_x, unsigned dst_y,
+unsigned src_width, unsigned src_height);
+
 bool
 brw_blorp_clear_color(struct brw_context *brw, struct gl_framebuffer *fb,
   GLbitfield mask, bool partial_clear, bool

[Mesa-dev] [PATCH 2/4] intel/isl: Add support for RGB formats in X and Y-tiled memory

2016-09-09 Thread Jason Ekstrand

Normally, using a non-linear tiling format helps improve cache locality by
ensuring that neighboring pixels are usually close-by in memory.  For RGB
formats, this still sort-of holds, but it can also lead to rather terrible
memory access patterns where a single RGB pixel value crosses a tile
boundary and gets split into two pieces in different 4K pages.  It also
makes for some rather awkward calculations because your tile size is no
longer an even multiple of surface element size.  For these reasons, we
chose to simply never create tiled RGB images in the Vulkan driver.

The GL driver, however, is not so kind so we need to support it somehow.  I
briefly toyed with a couple of different schemes but this is the best one I
could come up with.  The fundamental problem is that a tile no longer
contains an integer number of surface elements.  I briefly considered a
couple other options but found them wanting:

 1) Using floats for the logical tile size.  This leads to potential
rounding error problems.

 2) When presented with a RGB format, just make the tile 3-times as wide.
This isn't so nice because now our tiles are no longer power-of-two
size.  Also, it can force the row_pitch to be larger than needed which,
while not strictly a problem for ISL, causes incompatibility problems
with the way the GL driver chooses surface pitches.

The chosen method requires that you pay attention and not just assume that
your tile_info is in the units you think it is.  However, it's nice because
it provides a nice "these are the units" declaration in isl_tile_info
itself.  Previously, the tile_info wasn't usable as a stand-alone structure
because you had to also know the format.  It also forces figuring out how
to deal with inconsistencies between tiling and format back to the caller
which is good because the two different consumers of isl_tile_info really
want to deal with it differently:  Computation of the surface size wants
the fewest number of horizontal tiles possible while get_intratile_offset
is far more concerned with things aligning nicely.

Signed-off-by: Jason Ekstrand 
Cc: Chad Versace 
---
 src/intel/isl/isl.c | 49 -
 src/intel/isl/isl.h | 10 +-
 2 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
index 34245f4..8c72186 100644
--- a/src/intel/isl/isl.c
+++ b/src/intel/isl/isl.c
@@ -113,7 +113,16 @@ isl_tiling_get_info(const struct isl_device *dev,
const uint32_t bs = format_bpb / 8;
struct isl_extent2d logical_el, phys_B;
 
-   assert(tiling == ISL_TILING_LINEAR || isl_is_pow2(format_bpb));
+   if (tiling != ISL_TILING_LINEAR && ! isl_is_pow2(format_bpb)) {
+  /* It is possible to have non-power-of-two formats in a tiled buffer.
+   * The easiest way to handle this is to treat the tile as if it is three
+   * times as wide.  This way no pixel will ever cross a tile boundary.
+   * This really only works on legacy X and Y tiling formats.
+   */
+  assert(tiling == ISL_TILING_X || tiling == ISL_TILING_Y0);
+  assert(bs % 3 == 0 && isl_is_pow2(format_bpb / 3));
+  return isl_tiling_get_info(dev, tiling, format_bpb / 3, tile_info);
+   }
 
switch (tiling) {
case ISL_TILING_LINEAR:
@@ -209,6 +218,7 @@ isl_tiling_get_info(const struct isl_device *dev,
 
*tile_info = (struct isl_tile_info) {
   .tiling = tiling,
+  .format_bpb = format_bpb,
   .logical_extent_el = logical_el,
   .phys_extent_B = phys_B,
};
@@ -1215,14 +1225,20 @@ isl_surf_init_s(const struct isl_device *dev,
   }
   base_alignment = isl_round_up_to_power_of_two(base_alignment);
} else {
+  assert(fmtl->bpb % tile_info.format_bpb == 0);
+  const uint32_t tile_el_scale = fmtl->bpb / tile_info.format_bpb;
+  assert(tile_el_scale == 1 || tile_el_scale == 3);
+
   assert(phys_slice0_sa.w % fmtl->bw == 0);
   const uint32_t total_w_el = phys_slice0_sa.width / fmtl->bw;
   const uint32_t total_w_tl =
- isl_align_div(total_w_el, tile_info.logical_extent_el.width);
+ isl_align_div(total_w_el * tile_el_scale,
+   tile_info.logical_extent_el.width);
 
   row_pitch = total_w_tl * tile_info.phys_extent_B.width;
   if (row_pitch < info->min_pitch) {
- row_pitch = isl_align(info->min_pitch, tile_info.phys_extent_B.width);
+ row_pitch = isl_align_npot(info->min_pitch,
+tile_info.phys_extent_B.width);
   }
 
   total_h_el += isl_align_div_npot(pad_bytes, row_pitch);
@@ -1678,14 +1694,6 @@ isl_tiling_get_intratile_offset_el(const struct 
isl_device *dev,
uint32_t *x_offset_el,
uint32_t *y_offset_el)
 {
-   /* This function only really works for power-of-two surfaces.  In
-* theory, we could make it work for

[Mesa-dev] [PATCH 4/4] i965: Use blorp_copy for all copy_image operations on gen6+

2016-09-09 Thread Jason Ekstrand

---
 src/mesa/drivers/dri/i965/intel_copy_image.c | 28 ++--
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_copy_image.c 
b/src/mesa/drivers/dri/i965/intel_copy_image.c
index 1ca6003..7698d8e 100644
--- a/src/mesa/drivers/dri/i965/intel_copy_image.c
+++ b/src/mesa/drivers/dri/i965/intel_copy_image.c
@@ -210,30 +210,14 @@ copy_miptrees(struct brw_context *brw,
   int dst_x, int dst_y, int dst_z, unsigned dst_level,
   int src_width, int src_height)
 {
-   struct gl_context *ctx = >ctx;
unsigned bw, bh;
 
-   if (brw->gen >= 6 &&
-   brw->format_supported_as_render_target[dst_mt->format] &&
-   !_mesa_is_format_compressed(src_mt->format)) {
-
-  /* We'll use the destination format for both images */
-  mesa_format format = dst_mt->format;
-
-  brw_blorp_blit_miptrees(brw,
-  src_mt, src_level, src_z, format, SWIZZLE_XYZW,
-  dst_mt, dst_level, dst_z, format,
-  src_x, src_y,
-  src_x + src_width, src_y + src_height,
-  dst_x, dst_y,
-  dst_x + src_width, dst_y + src_height,
-  GL_NEAREST, false, false, /* mirror */
-  false, false);
-  return;
-   }
-
-   if (src_mt->num_samples > 0 || dst_mt->num_samples > 0) {
-  _mesa_problem(ctx, "Failed to copy multisampled texture with BLORP\n");
+   if (brw->gen >= 6) {
+  brw_blorp_copy_miptrees(brw,
+  src_mt, src_level, src_z,
+  dst_mt, dst_level, dst_z,
+  src_x, src_y, dst_x, dst_y,
+  src_width, src_height);
   return;
}
 
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/4] intel/isl: Allow valign2 for texture-only Y-tiled surfaces on gen7

2016-09-09 Thread Jason Ekstrand

The restriction that Y-tiled surfaces must have valign == 4 only aplies to
render targets but we were applying it universally.  This causes problems
if R32G32B32_SFLOAT is used because it requires valign == 2; this should be
okay because you can't render to that format.

Signed-off-by: Jason Ekstrand 
Cc: Chad Versace 
---
 src/intel/isl/isl_gen7.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/intel/isl/isl_gen7.c b/src/intel/isl/isl_gen7.c
index 02273f8..f3d8428 100644
--- a/src/intel/isl/isl_gen7.c
+++ b/src/intel/isl/isl_gen7.c
@@ -354,7 +354,8 @@ gen7_choose_valign_el(const struct isl_device *dev,
 */
if (isl_surf_usage_is_depth(info->usage) ||
info->samples > 1 ||
-   tiling == ISL_TILING_Y0) {
+   ((info->usage & ISL_SURF_USAGE_RENDER_TARGET_BIT) &&
+tiling == ISL_TILING_Y0)) {
   require_valign4 = true;
}
 
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 5/5] radeonsi: flush TC L2 before using a compute indirect buffer

2016-09-09 Thread Nicolai Hähnle


This series is

Reviewed-by: Nicolai Hähnle 

On 09.09.2016 17:05, Marek Olšák wrote:

From: Marek Olšák 

There is no known test for this.

Cc: 12.0 
---
 src/gallium/drivers/radeonsi/si_compute.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index f43c616..d988214 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -457,24 +457,32 @@ static void si_launch_grid(
info->block[0] * info->block[1] * info->block[2] > 256;

if (cs_regalloc_hang)
sctx->b.flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
 SI_CONTEXT_CS_PARTIAL_FLUSH;

si_decompress_compute_textures(sctx);

/* Add buffer sizes for memory checking in need_cs_space. */
r600_context_add_resource_size(ctx, >shader.bo->b.b);
-   if (info->indirect)
-   r600_context_add_resource_size(ctx, info->indirect);
/* TODO: add the scratch buffer */

+   if (info->indirect) {
+   r600_context_add_resource_size(ctx, info->indirect);
+
+   /* The hw doesn't read the indirect buffer via TC L2. */
+   if (r600_resource(info->indirect)->TC_L2_dirty) {
+   sctx->b.flags |= SI_CONTEXT_INV_GLOBAL_L2;
+   r600_resource(info->indirect)->TC_L2_dirty = false;
+   }
+   }
+
si_need_cs_space(sctx);

if (!sctx->cs_shader_state.initialized)
si_initialize_compute(sctx);

if (sctx->b.flags)
si_emit_cache_flush(sctx);

if (!si_switch_compute_shader(sctx, program, >shader, 
info->pc))
return;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 01/11] glapi: add entry points for GL_ARB_compute_variable_group_size

2016-09-09 Thread Marek Olšák

On Fri, Sep 9, 2016 at 5:46 PM, Samuel Pitoiset
 wrote:
>
>
> On 09/08/2016 10:58 PM, Ian Romanick wrote:
>>
>> On 09/08/2016 01:31 PM, Samuel Pitoiset wrote:
>>>
>>> Signed-off-by: Samuel Pitoiset 
>>> ---
>>>  .../glapi/gen/ARB_compute_variable_group_size.xml  | 25
>>> ++
>>>  src/mapi/glapi/gen/Makefile.am |  1 +
>>>  src/mapi/glapi/gen/gl_API.xml  |  2 ++
>>>  src/mesa/main/compute.c|  8 +++
>>>  src/mesa/main/compute.h|  5 +
>>>  src/mesa/main/tests/dispatch_sanity.cpp|  3 +++
>>>  6 files changed, 44 insertions(+)
>>>  create mode 100644
>>> src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
>>>
>>> diff --git a/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
>>> b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
>>> new file mode 100644
>>> index 000..b21c52f
>>> --- /dev/null
>>> +++ b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
>>> @@ -0,0 +1,25 @@
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +  >> value="0x9344"/>
>>> +  >> value="0x90EB"/>
>>> +  >> value="0x9345"/>
>>> +  >> value="0x91BF"/>
>>> +
>>> +  
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +  
>>> +
>>> +
>>> +
>>> diff --git a/src/mapi/glapi/gen/Makefile.am
>>> b/src/mapi/glapi/gen/Makefile.am
>>> index 0d7c338..49fdfe3 100644
>>> --- a/src/mapi/glapi/gen/Makefile.am
>>> +++ b/src/mapi/glapi/gen/Makefile.am
>>> @@ -117,6 +117,7 @@ API_XML = \
>>> ARB_color_buffer_float.xml \
>>> ARB_compressed_texture_pixel_storage.xml \
>>> ARB_compute_shader.xml \
>>> +   ARB_compute_variable_group_size.xml \
>>> ARB_copy_buffer.xml \
>>> ARB_copy_image.xml \
>>> ARB_debug_output.xml \
>>> diff --git a/src/mapi/glapi/gen/gl_API.xml
>>> b/src/mapi/glapi/gen/gl_API.xml
>>> index c39aa22..9ad3b60 100644
>>> --- a/src/mapi/glapi/gen/gl_API.xml
>>> +++ b/src/mapi/glapi/gen/gl_API.xml
>>> @@ -8258,6 +8258,8 @@
>>>
>>>  >> xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>>
>>> +>> xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>> +
>>
>>
>> This (extension #153) should go before ARB_indirect_parameters
>> (extension #154), and the "ARB extensions 149 - 153" should be changed
>> to "ARB extensions 149 - 152".
>
>
> Presumably, you meant "ARB extensions 149 - 154"? Because the next section
> is "ARB extension 155 - 159".

The range indicates which extensions are NOT defined by the file.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 01/11] glapi: add entry points for GL_ARB_compute_variable_group_size

2016-09-09 Thread Samuel Pitoiset




On 09/09/2016 06:31 PM, Marek Olšák wrote:

On Fri, Sep 9, 2016 at 5:46 PM, Samuel Pitoiset
 wrote:



On 09/08/2016 10:58 PM, Ian Romanick wrote:


On 09/08/2016 01:31 PM, Samuel Pitoiset wrote:


Signed-off-by: Samuel Pitoiset 
---
 .../glapi/gen/ARB_compute_variable_group_size.xml  | 25
++
 src/mapi/glapi/gen/Makefile.am |  1 +
 src/mapi/glapi/gen/gl_API.xml  |  2 ++
 src/mesa/main/compute.c|  8 +++
 src/mesa/main/compute.h|  5 +
 src/mesa/main/tests/dispatch_sanity.cpp|  3 +++
 6 files changed, 44 insertions(+)
 create mode 100644
src/mapi/glapi/gen/ARB_compute_variable_group_size.xml

diff --git a/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
new file mode 100644
index 000..b21c52f
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
@@ -0,0 +1,25 @@
+
+
+
+
+
+
+
+
+
+  
+  
+  
+  
+
+  
+
+
+
+
+
+
+  
+
+
+
diff --git a/src/mapi/glapi/gen/Makefile.am
b/src/mapi/glapi/gen/Makefile.am
index 0d7c338..49fdfe3 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -117,6 +117,7 @@ API_XML = \
ARB_color_buffer_float.xml \
ARB_compressed_texture_pixel_storage.xml \
ARB_compute_shader.xml \
+   ARB_compute_variable_group_size.xml \
ARB_copy_buffer.xml \
ARB_copy_image.xml \
ARB_debug_output.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml
b/src/mapi/glapi/gen/gl_API.xml
index c39aa22..9ad3b60 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8258,6 +8258,8 @@

 http://www.w3.org/2001/XInclude"/>

+http://www.w3.org/2001/XInclude"/>
+



This (extension #153) should go before ARB_indirect_parameters
(extension #154), and the "ARB extensions 149 - 153" should be changed
to "ARB extensions 149 - 152".



Presumably, you meant "ARB extensions 149 - 154"? Because the next section
is "ARB extension 155 - 159".


The range indicates which extensions are NOT defined by the file.


Ah okay.



Marek


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/3] docs: Note MESA_configless_context as superseded

2016-09-09 Thread Adam Jackson

Signed-off-by: Adam Jackson 
---
 docs/specs/MESA_configless_context.spec | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/docs/specs/MESA_configless_context.spec 
b/docs/specs/MESA_configless_context.spec
index f2fafb3..d7ba62d 100644
--- a/docs/specs/MESA_configless_context.spec
+++ b/docs/specs/MESA_configless_context.spec
@@ -12,11 +12,12 @@ Contact
 
 Status
 
-Proposal
+Superseded by the functionally identical EGL_KHR_no_config_context
+extension.
 
 Version
 
-Version 1, February 28, 2014
+Version 2, September 9, 2016
 
 Number
 
@@ -121,5 +122,8 @@ Issues
 
 Revision History
 
+Version 2, September 9, 2016
+Defer to EGL_KHR_no_config_context (Adam Jackson)
+
 Version 1, February 28, 2014
 Initial draft (Neil Roberts)
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/3] egl: Rename MESA_configless_context bit to KHR_no_config_context

2016-09-09 Thread Adam Jackson

Keep the old name in the extension string, but refer to the KHR
extension internally.

Signed-off-by: Adam Jackson 
---
 src/egl/drivers/dri2/egl_dri2.c | 2 +-
 src/egl/main/eglapi.c   | 6 --
 src/egl/main/eglcontext.c   | 4 ++--
 src/egl/main/egldisplay.h   | 2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 859612f..bbc457c 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -603,8 +603,8 @@ dri2_setup_screen(_EGLDisplay *disp)
   disp->ClientAPIs |= EGL_OPENGL_ES3_BIT_KHR;
 
assert(dri2_dpy->image_driver || dri2_dpy->dri2 || dri2_dpy->swrast);
+   disp->Extensions.KHR_no_config_context = EGL_TRUE;
disp->Extensions.KHR_surfaceless_context = EGL_TRUE;
-   disp->Extensions.MESA_configless_context = EGL_TRUE;
 
if (dri2_renderer_query_integer(dri2_dpy,
__DRI2_RENDERER_HAS_FRAMEBUFFER_SRGB))
diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index df2dcd6..ba8305e 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -406,11 +406,13 @@ _eglCreateExtensionsString(_EGLDisplay *dpy)
   _eglAppendExtension(, "EGL_KHR_image");
_EGL_CHECK_EXTENSION(KHR_image_base);
_EGL_CHECK_EXTENSION(KHR_image_pixmap);
+   _EGL_CHECK_EXTENSION(KHR_no_config_context);
_EGL_CHECK_EXTENSION(KHR_reusable_sync);
_EGL_CHECK_EXTENSION(KHR_surfaceless_context);
_EGL_CHECK_EXTENSION(KHR_wait_sync);
 
-   _EGL_CHECK_EXTENSION(MESA_configless_context);
+   if (dpy->Extensions.KHR_no_config_context)
+  _eglAppendExtension(, "EGL_MESA_configless_context");
_EGL_CHECK_EXTENSION(MESA_drm_image);
_EGL_CHECK_EXTENSION(MESA_image_dma_buf_export);
 
@@ -627,7 +629,7 @@ eglCreateContext(EGLDisplay dpy, EGLConfig config, 
EGLContext share_list,
 
_EGL_CHECK_DISPLAY(disp, EGL_NO_CONTEXT, drv);
 
-   if (!config && !disp->Extensions.MESA_configless_context)
+   if (!config && !disp->Extensions.KHR_no_config_context)
   RETURN_EGL_ERROR(disp, EGL_BAD_CONFIG, EGL_NO_CONTEXT);
 
if (!share && share_list != EGL_NO_CONTEXT)
diff --git a/src/egl/main/eglcontext.c b/src/egl/main/eglcontext.c
index 7eac79a..60625f6 100644
--- a/src/egl/main/eglcontext.c
+++ b/src/egl/main/eglcontext.c
@@ -642,9 +642,9 @@ _eglCheckMakeCurrent(_EGLContext *ctx, _EGLSurface *draw, 
_EGLSurface *read)
   (read && read->Config != ctx->Config))
  return _eglError(EGL_BAD_MATCH, "eglMakeCurrent");
} else {
-  /* Otherwise we must be using the EGL_MESA_configless_context
+  /* Otherwise we must be using the EGL_KHR_no_config_context
* extension */
-  assert(dpy->Extensions.MESA_configless_context);
+  assert(dpy->Extensions.KHR_no_config_context);
 
   /* The extension doesn't permit binding draw and read buffers with
* differing contexts */
diff --git a/src/egl/main/egldisplay.h b/src/egl/main/egldisplay.h
index 6bfc858..6f3340e 100644
--- a/src/egl/main/egldisplay.h
+++ b/src/egl/main/egldisplay.h
@@ -112,11 +112,11 @@ struct _egl_extensions
EGLBoolean KHR_gl_texture_cubemap_image;
EGLBoolean KHR_image_base;
EGLBoolean KHR_image_pixmap;
+   EGLBoolean KHR_no_config_context;
EGLBoolean KHR_reusable_sync;
EGLBoolean KHR_surfaceless_context;
EGLBoolean KHR_wait_sync;
 
-   EGLBoolean MESA_configless_context;
EGLBoolean MESA_drm_image;
EGLBoolean MESA_image_dma_buf_export;
 
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 0/3] Implement EGL_KHR_no_config_context

2016-09-09 Thread Adam Jackson

KHR_no_config_context is virtually identical to MESA_configless_context,
where they differ is only where the Mesa spec is silent, and for the
most part the code already did what the Khronos spec wants. Fix up the
one corner case, and mark the Mesa extension as superseded.

- ajax

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/3] egl: QueryContext on a configless context returns zero

2016-09-09 Thread Adam Jackson

MESA_configless_context does not specify the interaction with
QueryContext at all, and the code to generate an error in this case
predates the Mesa extension. Since EGL_NO_CONFIG_{KHR,MESA} are
numerically identical there's no way to distinguish which one the
application asked for, so use the KHR behaviour.

Signed-off-by: Adam Jackson 
---
 src/egl/main/eglcontext.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/egl/main/eglcontext.c b/src/egl/main/eglcontext.c
index 58740c3..7eac79a 100644
--- a/src/egl/main/eglcontext.c
+++ b/src/egl/main/eglcontext.c
@@ -538,9 +538,14 @@ _eglQueryContext(_EGLDriver *drv, _EGLDisplay *dpy, 
_EGLContext *c,
 
switch (attribute) {
case EGL_CONFIG_ID:
-  if (!c->Config)
- return _eglError(EGL_BAD_ATTRIBUTE, "eglQueryContext");
-  *value = c->Config->ConfigID;
+  /*
+   * From EGL_KHR_no_config_context:
+   *
+   *"Querying EGL_CONFIG_ID returns the ID of the EGLConfig with
+   * respect to which the context was created, or zero if created
+   * without respect to an EGLConfig."
+   */
+  *value = c->Config ? c->Config->ConfigID : 0;
   break;
case EGL_CONTEXT_CLIENT_VERSION:
   *value = c->ClientMajorVersion;
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: support TGSI compute shaders with variable block size

2016-09-09 Thread Nicolai Hähnle


On 09.09.2016 18:07, Samuel Pitoiset wrote:



On 09/09/2016 02:37 PM, Ilia Mirkin wrote:

On Fri, Sep 9, 2016 at 8:29 AM, Marek Olšák  wrote:

On Fri, Sep 9, 2016 at 10:12 AM, Nicolai Hähnle 
wrote:

From: Nicolai Hähnle 

Not sure if it's possible to avoid programming the block size twice
(once for
the userdata and once for the dispatch).

Since the shaders are compiled with a pessimistic upper limit on the
number of
registers, asynchronously compiling variants may be worth
considering in the
future if we observe the shaders to be dispatched with small block
sizes.
---
I think this is sufficient to support variable group sizes on
radeonsi, but
it's completely untested. Do you keep the latest version of your
series in a
public repository somewhere?

 src/gallium/drivers/radeonsi/si_compute.c | 10 +-
 src/gallium/drivers/radeonsi/si_shader.c  | 29
-
 src/gallium/drivers/radeonsi/si_shader.h  |  4 +++-
 3 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c
b/src/gallium/drivers/radeonsi/si_compute.c
index 5041761..26e096c 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -379,25 +379,33 @@ static void si_setup_tgsi_grid(struct
si_context *sctx,
for (i = 0; i < 3; ++i) {
radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));
radeon_emit(cs,
COPY_DATA_SRC_SEL(COPY_DATA_MEM) |

COPY_DATA_DST_SEL(COPY_DATA_REG));
radeon_emit(cs, (va +  4 * i));
radeon_emit(cs, (va + 4 * i) >> 32);
radeon_emit(cs, (grid_size_reg >> 2) + i);
radeon_emit(cs, 0);
}
} else {
+   struct si_compute *program =
sctx->cs_shader_state.program;
+   bool variable_group_size =
+
program->shader.selector->info.properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH]
== 0;

-   radeon_set_sh_reg_seq(cs, grid_size_reg, 3);
+   radeon_set_sh_reg_seq(cs, grid_size_reg,
variable_group_size ? 6 : 3);
radeon_emit(cs, info->grid[0]);
radeon_emit(cs, info->grid[1]);
radeon_emit(cs, info->grid[2]);
+   if (variable_group_size) {
+   radeon_emit(cs, info->block[0]);
+   radeon_emit(cs, info->block[1]);
+   radeon_emit(cs, info->block[2]);
+   }
}
 }

 static void si_emit_dispatch_packets(struct si_context *sctx,
  const struct pipe_grid_info
*info)
 {
struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
bool render_cond_bit = sctx->b.render_cond &&
!sctx->b.render_cond_force_off;
unsigned waves_per_threadgroup =
DIV_ROUND_UP(info->block[0] * info->block[1] *
info->block[2], 64);
diff --git a/src/gallium/drivers/radeonsi/si_shader.c
b/src/gallium/drivers/radeonsi/si_shader.c
index 0b7de18..730ee21 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1783,30 +1783,35 @@ static void declare_system_value(

case TGSI_SEMANTIC_GRID_SIZE:
value = LLVMGetParam(radeon_bld->main_fn,
SI_PARAM_GRID_SIZE);
break;

case TGSI_SEMANTIC_BLOCK_SIZE:
{
LLVMValueRef values[3];
unsigned i;
unsigned *properties =
ctx->shader->selector->info.properties;
-   unsigned sizes[3] = {
-   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH],
-
properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT],
-   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH]
-   };

-   for (i = 0; i < 3; ++i)
-   values[i] = lp_build_const_int32(gallivm,
sizes[i]);
+   if (properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH]
!= 0) {
+   unsigned sizes[3] = {
+
properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH],
+
properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT],
+
properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH]
+   };
+
+   for (i = 0; i < 3; ++i)
+   values[i] =
lp_build_const_int32(gallivm, sizes[i]);

-   value = lp_build_gather_values(gallivm, values, 3);
+   value = lp_build_gather_values(gallivm,
values, 3);
+   } else {
+   value = LLVMGetParam(radeon_bld->main_fn,
SI_PARAM_BLOCK_SIZE);
+   }
break;
}

case TGSI_SEMANTIC_BLOCK_ID:
value = LLVMGetParam(radeon_bld->main_fn,
SI_PARAM_BLOCK_ID);
break;

case TGSI_SEMANTIC_THREAD_ID:
value = LLVMGetParam(radeon_bld->main_fn,
SI_PARAM_THREAD_ID);

Re: [Mesa-dev] [PATCH] radeonsi: support TGSI compute shaders with variable block size

2016-09-09 Thread Samuel Pitoiset




On 09/09/2016 02:37 PM, Ilia Mirkin wrote:

On Fri, Sep 9, 2016 at 8:29 AM, Marek Olšák  wrote:

On Fri, Sep 9, 2016 at 10:12 AM, Nicolai Hähnle  wrote:

From: Nicolai Hähnle 

Not sure if it's possible to avoid programming the block size twice (once for
the userdata and once for the dispatch).

Since the shaders are compiled with a pessimistic upper limit on the number of
registers, asynchronously compiling variants may be worth considering in the
future if we observe the shaders to be dispatched with small block sizes.
---
I think this is sufficient to support variable group sizes on radeonsi, but
it's completely untested. Do you keep the latest version of your series in a
public repository somewhere?

 src/gallium/drivers/radeonsi/si_compute.c | 10 +-
 src/gallium/drivers/radeonsi/si_shader.c  | 29 -
 src/gallium/drivers/radeonsi/si_shader.h  |  4 +++-
 3 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index 5041761..26e096c 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -379,25 +379,33 @@ static void si_setup_tgsi_grid(struct si_context *sctx,
for (i = 0; i < 3; ++i) {
radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));
radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
COPY_DATA_DST_SEL(COPY_DATA_REG));
radeon_emit(cs, (va +  4 * i));
radeon_emit(cs, (va + 4 * i) >> 32);
radeon_emit(cs, (grid_size_reg >> 2) + i);
radeon_emit(cs, 0);
}
} else {
+   struct si_compute *program = sctx->cs_shader_state.program;
+   bool variable_group_size =
+   
program->shader.selector->info.properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] 
== 0;

-   radeon_set_sh_reg_seq(cs, grid_size_reg, 3);
+   radeon_set_sh_reg_seq(cs, grid_size_reg, variable_group_size ? 
6 : 3);
radeon_emit(cs, info->grid[0]);
radeon_emit(cs, info->grid[1]);
radeon_emit(cs, info->grid[2]);
+   if (variable_group_size) {
+   radeon_emit(cs, info->block[0]);
+   radeon_emit(cs, info->block[1]);
+   radeon_emit(cs, info->block[2]);
+   }
}
 }

 static void si_emit_dispatch_packets(struct si_context *sctx,
  const struct pipe_grid_info *info)
 {
struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
bool render_cond_bit = sctx->b.render_cond && 
!sctx->b.render_cond_force_off;
unsigned waves_per_threadgroup =
DIV_ROUND_UP(info->block[0] * info->block[1] * info->block[2], 
64);
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 0b7de18..730ee21 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1783,30 +1783,35 @@ static void declare_system_value(

case TGSI_SEMANTIC_GRID_SIZE:
value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_GRID_SIZE);
break;

case TGSI_SEMANTIC_BLOCK_SIZE:
{
LLVMValueRef values[3];
unsigned i;
unsigned *properties = ctx->shader->selector->info.properties;
-   unsigned sizes[3] = {
-   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH],
-   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT],
-   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH]
-   };

-   for (i = 0; i < 3; ++i)
-   values[i] = lp_build_const_int32(gallivm, sizes[i]);
+   if (properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] != 0) {
+   unsigned sizes[3] = {
+   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH],
+   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT],
+   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH]
+   };
+
+   for (i = 0; i < 3; ++i)
+   values[i] = lp_build_const_int32(gallivm, 
sizes[i]);

-   value = lp_build_gather_values(gallivm, values, 3);
+   value = lp_build_gather_values(gallivm, values, 3);
+   } else {
+   value = LLVMGetParam(radeon_bld->main_fn, 
SI_PARAM_BLOCK_SIZE);
+   }
break;
}

case TGSI_SEMANTIC_BLOCK_ID:
value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_BLOCK_ID);

Re: [Mesa-dev] [PATCH] radeonsi: support TGSI compute shaders with variable block size

2016-09-09 Thread Samuel Pitoiset




On 09/09/2016 10:12 AM, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

Not sure if it's possible to avoid programming the block size twice (once for
the userdata and once for the dispatch).

Since the shaders are compiled with a pessimistic upper limit on the number of
registers, asynchronously compiling variants may be worth considering in the
future if we observe the shaders to be dispatched with small block sizes.
---
I think this is sufficient to support variable group sizes on radeonsi, but
it's completely untested. Do you keep the latest version of your series in a
public repository somewhere?


Yes, you can find the current version here:
https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_variable_group_size

The next one will be in the arb_compute_variable_group_size_v1 branch 
(once I have fixed all the things locally).




 src/gallium/drivers/radeonsi/si_compute.c | 10 +-
 src/gallium/drivers/radeonsi/si_shader.c  | 29 -
 src/gallium/drivers/radeonsi/si_shader.h  |  4 +++-
 3 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index 5041761..26e096c 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -379,25 +379,33 @@ static void si_setup_tgsi_grid(struct si_context *sctx,
for (i = 0; i < 3; ++i) {
radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));
radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
COPY_DATA_DST_SEL(COPY_DATA_REG));
radeon_emit(cs, (va +  4 * i));
radeon_emit(cs, (va + 4 * i) >> 32);
radeon_emit(cs, (grid_size_reg >> 2) + i);
radeon_emit(cs, 0);
}
} else {
+   struct si_compute *program = sctx->cs_shader_state.program;
+   bool variable_group_size =
+   
program->shader.selector->info.properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] 
== 0;

-   radeon_set_sh_reg_seq(cs, grid_size_reg, 3);
+   radeon_set_sh_reg_seq(cs, grid_size_reg, variable_group_size ? 
6 : 3);
radeon_emit(cs, info->grid[0]);
radeon_emit(cs, info->grid[1]);
radeon_emit(cs, info->grid[2]);
+   if (variable_group_size) {
+   radeon_emit(cs, info->block[0]);
+   radeon_emit(cs, info->block[1]);
+   radeon_emit(cs, info->block[2]);
+   }
}
 }

 static void si_emit_dispatch_packets(struct si_context *sctx,
  const struct pipe_grid_info *info)
 {
struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
bool render_cond_bit = sctx->b.render_cond && 
!sctx->b.render_cond_force_off;
unsigned waves_per_threadgroup =
DIV_ROUND_UP(info->block[0] * info->block[1] * info->block[2], 
64);
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 0b7de18..730ee21 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1783,30 +1783,35 @@ static void declare_system_value(

case TGSI_SEMANTIC_GRID_SIZE:
value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_GRID_SIZE);
break;

case TGSI_SEMANTIC_BLOCK_SIZE:
{
LLVMValueRef values[3];
unsigned i;
unsigned *properties = ctx->shader->selector->info.properties;
-   unsigned sizes[3] = {
-   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH],
-   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT],
-   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH]
-   };

-   for (i = 0; i < 3; ++i)
-   values[i] = lp_build_const_int32(gallivm, sizes[i]);
+   if (properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] != 0) {
+   unsigned sizes[3] = {
+   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH],
+   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT],
+   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH]
+   };
+
+   for (i = 0; i < 3; ++i)
+   values[i] = lp_build_const_int32(gallivm, 
sizes[i]);

-   value = lp_build_gather_values(gallivm, values, 3);
+   value = lp_build_gather_values(gallivm, values, 3);
+   } else {
+   value = LLVMGetParam(radeon_bld->main_fn, 
SI_PARAM_BLOCK_SIZE);
+   }
break;
}

case

Re: [Mesa-dev] [PATCH 00/11] add support for ARB_compute_variable_group_size

2016-09-09 Thread Samuel Pitoiset




On 09/09/2016 09:43 AM, Nicolai Hähnle wrote:

Hi Samuel,


Hi,



thanks for working on this! I've only skimmed the series so far without
looking at each patch in detail:

For GCN, we need to pass the (maximum) work group size (aka block size /
local group size) into the compiler, because it affects occupancy and
hence register limits. And the lowering of TGSI_SEMANTIC_BLOCK_SIZE also
needs to be changed. This means more work will definitely be needed for
radeonsi.


I will add a new cap for MaxComputeVariableGroupInvocations as suggested 
by Marek. And enable the extension based on that cap.


That way, you will be able to fix up radeonsi in a separate patch.



It would be great if you could add docs for
TGSI_SEMANTIC_{GROUP,BLOCK}_SIZE to tgsi.rst. I know they were already
missing before.


That would be great yeah, but in a separate patch out of that series I 
think.




Also, the nouveau enablement patch should come _before_ the patch that
turns on the extension...


Good catch, thanks. :)



Cheers,
Nicolai

On 08.09.2016 22:31, Samuel Pitoiset wrote:

Hi,

This series implements ARB_compute_variable_group_size written against
GL 4.3.
This extension allows to dispatch variable work group size via a new
function
called glDispatchComputeGroupSizeARB().

Because this extension is pretty similar to ARB_compute_shader, all
Gallium
drivers which already support compute shaders will expose
ARB_compute_variable_group_size with that series.

I did write a bunch of piglit tests, have a look here if you want:
https://lists.freedesktop.org/archives/piglit/2016-September/020755.html

All tests pass on Fermi (GF119) as well as all previous compute
shaders tests.

Marek, Nicolai and other AMD folks, I don't know if radeonsi will need
a fix
somewhere for handling a variable work group size, but as I don't have
the
hardware, I can't test. Let me know if something needs to be slighty
updated.

Please review,
Thanks!

Samuel Pitoiset (11):
  glapi: add entry points for GL_ARB_compute_variable_group_size
  mesa/main: add support for ARB_compute_variable_groups_size
  glsl: add enable flags for ARB_compute_variable_group_size
  glsl: process local_size_variable input qualifier
  glsl: reject compute shaders with fixed and variable local size
  glsl/linker: handle errors when a variable local size is used
  glsl: add gl_LocalGroupSizeARB as a system value
  st/mesa: add mapping for SYSTEM_VALUE_LOCAL_GROUP_SIZE
  st/mesa: add support for dispatching a variable local size
  st/mesa: expose ARB_compute_variable_group_size
  nv50/ir: use 1024 threads/block for variable local size

 src/compiler/glsl/ast.h|  5 ++
 src/compiler/glsl/ast_to_hir.cpp   | 14 
 src/compiler/glsl/ast_type.cpp |  6 ++
 src/compiler/glsl/builtin_variables.cpp|  2 +
 src/compiler/glsl/glsl_parser.yy   | 13 +++
 src/compiler/glsl/glsl_parser_extras.cpp   |  6 ++
 src/compiler/glsl/glsl_parser_extras.h |  8 ++
 src/compiler/glsl/linker.cpp   | 23 +-
 src/compiler/glsl/standalone.cpp   |  4 +
 src/compiler/glsl/standalone_scaffolding.cpp   |  5 ++
 src/compiler/shader_enums.h|  1 +
 .../drivers/nouveau/codegen/nv50_ir_target.h   |  3 +-
 .../glapi/gen/ARB_compute_variable_group_size.xml  | 25 ++
 src/mapi/glapi/gen/Makefile.am |  1 +
 src/mapi/glapi/gen/gl_API.xml  |  2 +
 src/mesa/main/api_validate.c   | 94
++
 src/mesa/main/api_validate.h   |  4 +
 src/mesa/main/compute.c| 25 ++
 src/mesa/main/compute.h|  5 ++
 src/mesa/main/context.c|  6 ++
 src/mesa/main/dd.h |  9 +++
 src/mesa/main/extensions_table.h   |  1 +
 src/mesa/main/get.c| 12 +++
 src/mesa/main/get_hash_params.py   |  3 +
 src/mesa/main/mtypes.h | 23 +-
 src/mesa/main/shaderapi.c  |  1 +
 src/mesa/main/shaderobj.c  |  2 +
 src/mesa/main/tests/dispatch_sanity.cpp|  3 +
 src/mesa/state_tracker/st_cb_compute.c | 15 +++-
 src/mesa/state_tracker/st_extensions.c | 13 +++
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  2 +
 31 files changed, 329 insertions(+), 7 deletions(-)
 create mode 100644
src/mapi/glapi/gen/ARB_compute_variable_group_size.xml


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 10/11] st/mesa: expose ARB_compute_variable_group_size

2016-09-09 Thread Samuel Pitoiset




On 09/09/2016 04:02 PM, Marek Olšák wrote:

For patches 8, 9:

Reviewed-by: Marek Olšák 

Patch 10 won't work for us, because radeonsi (and presumably softpipe
as well) don't support this feature at the moment. Also, I would
prefer a PIPE_CAP for MaxComputeVariableGroupInvocations and the
extension can be exposed based on that CAP.


Fine by me, I will add this new cap.
Thanks for reviewing.



Marek


On Thu, Sep 8, 2016 at 10:31 PM, Samuel Pitoiset
 wrote:

This extension is only exposed if the underlying driver supports
ARB_compute_shader.

Signed-off-by: Samuel Pitoiset 
---
 src/mesa/state_tracker/st_extensions.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index 807fbfb..dc2e60a 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -1196,6 +1196,19 @@ void st_init_extensions(struct pipe_screen *screen,
  extensions->ARB_compute_shader =
   extensions->ARB_shader_image_load_store 
&&
   extensions->ARB_shader_atomic_counters;
+
+ if (extensions->ARB_compute_shader) {
+/* Because the minimum values required by
+ * ARB_compute_variable_group_size are less (or equal) than the
+ * ones defined by ARB_compute_shader we can re-use them. */
+for (i = 0; i < 3; i++) {
+   consts->MaxComputeVariableGroupSize[i] =
+  consts->MaxComputeWorkGroupSize[i];
+}
+consts->MaxComputeVariableGroupInvocations =
+   consts->MaxComputeWorkGroupInvocations;
+extensions->ARB_compute_variable_group_size = true;
+ }
   }
}

--
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 02/11] mesa/main: add support for ARB_compute_variable_groups_size

2016-09-09 Thread Samuel Pitoiset




On 09/08/2016 10:58 PM, Ian Romanick wrote:

On 09/08/2016 01:31 PM, Samuel Pitoiset wrote:

Signed-off-by: Samuel Pitoiset 
---
 src/mesa/main/api_validate.c | 94 
 src/mesa/main/api_validate.h |  4 ++
 src/mesa/main/compute.c  | 17 
 src/mesa/main/context.c  |  6 +++
 src/mesa/main/dd.h   |  9 
 src/mesa/main/extensions_table.h |  1 +
 src/mesa/main/get.c  | 12 +
 src/mesa/main/get_hash_params.py |  3 ++
 src/mesa/main/mtypes.h   | 23 +-
 src/mesa/main/shaderapi.c|  1 +
 src/mesa/main/shaderobj.c|  2 +
 11 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index b35751e..9379015 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -1096,6 +1096,7 @@ GLboolean
 _mesa_validate_DispatchCompute(struct gl_context *ctx,
const GLuint *num_groups)
 {
+   struct gl_shader_program *prog;
int i;
FLUSH_CURRENT(ctx, 0);

@@ -1128,6 +1129,86 @@ _mesa_validate_DispatchCompute(struct gl_context *ctx,
   }
}

+   /* From the ARB_compute_variable_group_size specification:
+*
+* "An INVALID_OPERATION error is generated by DispatchCompute if the active
+* program for the compute shader stage has a variable work group size."
+*/


There has been a lot of debate about formatting spec quotations.  The
one thing where I think everyone agrees is formatting the first like.
Please use

The ARB_compute_variable_group_size spec says:

That makes it much easier to grep the code for spec quotations.

This comment applies to the spec quotations below too.


Fine by me.




+   prog = ctx->_Shader->CurrentProgram[MESA_SHADER_COMPUTE];
+   if (prog->Comp.LocalSizeVariable) {
+  _mesa_error(ctx, GL_INVALID_OPERATION,
+  "glDispatchCompute(variable work group size forbidden)");
+  return GL_FALSE;
+   }
+
+   return GL_TRUE;
+}
+
+GLboolean
+_mesa_validate_DispatchComputeGroupSizeARB(struct gl_context *ctx,
+   const GLuint *num_groups,
+   const GLuint *group_size)
+{
+   struct gl_shader_program *prog;
+   GLuint64 total_invocations = 1;
+   int i;
+
+   FLUSH_CURRENT(ctx, 0);
+
+   if (!check_valid_to_compute(ctx, "glDispatchComputeGroupSizeARB"))
+  return GL_FALSE;
+
+   for (i = 0; i < 3; i++) {
+  /* From the ARB_compute_variable_group_size specification:
+   *
+   * "An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if
+   * any of , , or  is less than
+   * or equal to zero or greater than the maximum local work group size for
+   * compute shaders with variable group size
+   * (MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB) in the corresponding dimension."
+   *
+   * However, the "less than" is a spec bug because they are declared as
+   * unsigned integers.
+   */
+  if (group_size[i] == 0 ||
+  group_size[i] > ctx->Const.MaxComputeVariableGroupSize[i]) {
+ _mesa_error(ctx, GL_INVALID_VALUE,
+ "glDispatchComputeGroupSizeARB(group_size_%c)", 'x' + i);
+ return GL_FALSE;
+  }
+
+  /* From the ARB_compute_variable_group_size specification:
+   *
+   * "An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if
+   * the product of , , and 
+   * exceeds the implementation-dependent maximum local work group
+   * invocation count for compute shaders with variable group size
+   * (MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB)."
+   */
+  total_invocations *= group_size[i];
+  if (total_invocations > ctx->Const.MaxComputeVariableGroupInvocations) {
+ _mesa_error(ctx, GL_INVALID_VALUE,
+ "glDispatchComputeGroupSizeARB(product of local_sizes "
+ "exceeds MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB 
(%d))",
+ ctx->Const.MaxComputeVariableGroupInvocations);
+ return GL_FALSE;
+  }


This check should happen after the loop, and you should also log the
value of total_invocations.  Something like:

   _mesa_error(ctx, GL_INVALID_VALUE,
   "glDispatchComputeGroupSizeARB(product of local_sizes "
   "exceeds MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB (%d < %d))",
total_invocations,
ctx->Const.MaxComputeVariableGroupInvocations);


Yes, the message is more informative.




+
+  /* From the ARB_compute_variable_group_size specification:
+   *
+   * "An INVALID_OPERATION error is generated by
+   * DispatchComputeGroupSizeARB if the active program for the compute
+   * shader stage has a fixed work group size."
+   */
+  prog = ctx->_Shader->CurrentProgram[MESA_SHADER_COMPUTE];
+  if

[Mesa-dev] [PATCH v3] vl/dri3: handle the case of different GPU

2016-09-09 Thread Nayan Deshmukh

In case of prime when rendering is done on GPU other then the
server GPU, use a seprate linear buffer for each back buffer
which will be displayed using present extension.

v2: Use a seprate linear buffer for each back buffer (Michel)
v3: change variable names and fix coding style (Leo and Emil)

Signed-off-by: Nayan Deshmukh 
---
 src/gallium/auxiliary/vl/vl_winsys_dri3.c | 63 ---
 1 file changed, 50 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index 3d596a6..3607f86 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -49,6 +49,7 @@
 struct vl_dri3_buffer
 {
struct pipe_resource *texture;
+   struct pipe_resource *linear_texture;
 
uint32_t pixmap;
uint32_t sync_fence;
@@ -69,6 +70,8 @@ struct vl_dri3_screen
xcb_present_event_t eid;
xcb_special_event_t *special_event;
 
+   struct pipe_context *pipe;
+
struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM];
int cur_back;
 
@@ -82,6 +85,7 @@ struct vl_dri3_screen
int64_t last_ust, ns_frame, last_msc, next_msc;
 
bool flushed;
+   bool is_different_gpu;
 };
 
 static void
@@ -102,6 +106,8 @@ dri3_free_back_buffer(struct vl_dri3_screen *scrn,
xcb_sync_destroy_fence(scrn->conn, buffer->sync_fence);
xshmfence_unmap_shm(buffer->shm_fence);
pipe_resource_reference(>texture, NULL);
+   if (buffer->linear_texture)
+   pipe_resource_reference(>linear_texture, NULL);
FREE(buffer);
 }
 
@@ -209,7 +215,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
xcb_sync_fence_t sync_fence;
struct xshmfence *shm_fence;
int buffer_fd, fence_fd;
-   struct pipe_resource templ;
+   struct pipe_resource templ, *pixmap_buffer_texture;
struct winsys_handle whandle;
unsigned usage;
 
@@ -226,8 +232,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
   goto close_fd;
 
memset(, 0, sizeof(templ));
-   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
-PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+   templ.bind = PIPE_BIND_RENDER_TARGET;
templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
templ.target = PIPE_TEXTURE_2D;
templ.last_level = 0;
@@ -235,16 +240,36 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
templ.height0 = scrn->height;
templ.depth0 = 1;
templ.array_size = 1;
-   buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
- );
-   if (!buffer->texture)
-  goto unmap_shm;
 
+   if (scrn->is_different_gpu) {
+  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+);
+  if (!buffer->texture)
+ goto unmap_shm;
+
+  templ.bind |= PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SCANOUT |
+PIPE_BIND_SHARED | PIPE_BIND_LINEAR;
+  buffer->linear_texture = 
scrn->base.pscreen->resource_create(scrn->base.pscreen,
+  );
+  pixmap_buffer_texture = buffer->linear_texture;
+
+  if (!buffer->linear_texture)
+ goto no_linear_texture;
+
+   } else {
+  templ.bind |= PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SCANOUT |
+PIPE_BIND_SHARED;
+  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+);
+  if (!buffer->texture)
+ goto unmap_shm;
+  pixmap_buffer_texture = buffer->texture;
+   }
memset(, 0, sizeof(whandle));
whandle.type= DRM_API_HANDLE_TYPE_FD;
usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ;
scrn->base.pscreen->resource_get_handle(scrn->base.pscreen, NULL,
-   buffer->texture, ,
+   pixmap_buffer_texture, ,
usage);
buffer_fd = whandle.handle;
buffer->pitch = whandle.stride;
@@ -271,6 +296,8 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
 
return buffer;
 
+no_linear_texture:
+   pipe_resource_reference(>texture, NULL);
 unmap_shm:
xshmfence_unmap_shm(shm_fence);
 close_fd:
@@ -474,6 +501,7 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)context_private;
uint32_t options = XCB_PRESENT_OPTION_NONE;
struct vl_dri3_buffer *back;
+   struct pipe_box src_box;
 
back = scrn->back_buffers[scrn->cur_back];
if (!back)
@@ -485,6 +513,16 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
 return;
}
 
+   if (scrn->is_different_gpu) {
+  u_box_origin_2d(scrn->width, scrn->height, _box);
+  scrn->pipe->resource_copy_region(scrn->pipe,
+   back->linear_texture,

Re: [Mesa-dev] [PATCH 01/11] glapi: add entry points for GL_ARB_compute_variable_group_size

2016-09-09 Thread Samuel Pitoiset




On 09/08/2016 10:58 PM, Ian Romanick wrote:

On 09/08/2016 01:31 PM, Samuel Pitoiset wrote:

Signed-off-by: Samuel Pitoiset 
---
 .../glapi/gen/ARB_compute_variable_group_size.xml  | 25 ++
 src/mapi/glapi/gen/Makefile.am |  1 +
 src/mapi/glapi/gen/gl_API.xml  |  2 ++
 src/mesa/main/compute.c|  8 +++
 src/mesa/main/compute.h|  5 +
 src/mesa/main/tests/dispatch_sanity.cpp|  3 +++
 6 files changed, 44 insertions(+)
 create mode 100644 src/mapi/glapi/gen/ARB_compute_variable_group_size.xml

diff --git a/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml 
b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
new file mode 100644
index 000..b21c52f
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_compute_variable_group_size.xml
@@ -0,0 +1,25 @@
+
+
+
+
+
+
+
+
+
+  
+  
+  
+  
+
+  
+
+
+
+
+
+
+  
+
+
+
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index 0d7c338..49fdfe3 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -117,6 +117,7 @@ API_XML = \
ARB_color_buffer_float.xml \
ARB_compressed_texture_pixel_storage.xml \
ARB_compute_shader.xml \
+   ARB_compute_variable_group_size.xml \
ARB_copy_buffer.xml \
ARB_copy_image.xml \
ARB_debug_output.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index c39aa22..9ad3b60 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8258,6 +8258,8 @@

 http://www.w3.org/2001/XInclude"/>

+http://www.w3.org/2001/XInclude"/>
+


This (extension #153) should go before ARB_indirect_parameters
(extension #154), and the "ARB extensions 149 - 153" should be changed
to "ARB extensions 149 - 152".


Presumably, you meant "ARB extensions 149 - 154"? Because the next 
section is "ARB extension 155 - 159".




With that fixed, this patch is

Reviewed-by: Ian Romanick 


 

 http://www.w3.org/2001/XInclude"/>
diff --git a/src/mesa/main/compute.c b/src/mesa/main/compute.c
index b71430f..b052bae 100644
--- a/src/mesa/main/compute.c
+++ b/src/mesa/main/compute.c
@@ -60,3 +60,11 @@ _mesa_DispatchComputeIndirect(GLintptr indirect)

ctx->Driver.DispatchComputeIndirect(ctx, indirect);
 }
+
+void GLAPIENTRY
+_mesa_DispatchComputeGroupSizeARB(GLuint num_groups_x, GLuint num_groups_y,
+  GLuint num_groups_z, GLuint group_size_x,
+  GLuint group_size_y, GLuint group_size_z)
+{
+
+}
diff --git a/src/mesa/main/compute.h b/src/mesa/main/compute.h
index 0cc034f..8018bbb 100644
--- a/src/mesa/main/compute.h
+++ b/src/mesa/main/compute.h
@@ -35,4 +35,9 @@ _mesa_DispatchCompute(GLuint num_groups_x,
 extern void GLAPIENTRY
 _mesa_DispatchComputeIndirect(GLintptr indirect);

+extern void GLAPIENTRY
+_mesa_DispatchComputeGroupSizeARB(GLuint num_groups_x, GLuint num_groups_y,
+  GLuint num_groups_z, GLuint group_size_x,
+  GLuint group_size_y, GLuint group_size_z);
+
 #endif
diff --git a/src/mesa/main/tests/dispatch_sanity.cpp 
b/src/mesa/main/tests/dispatch_sanity.cpp
index 42fe61a..7faeabe 100644
--- a/src/mesa/main/tests/dispatch_sanity.cpp
+++ b/src/mesa/main/tests/dispatch_sanity.cpp
@@ -942,6 +942,9 @@ const struct function common_desktop_functions_possible[] = 
{
{ "glDispatchCompute", 43, -1 },
{ "glDispatchComputeIndirect", 43, -1 },

+   /* GL_ARB_compute_variable_group_size */
+   { "glDispatchComputeGroupSizeARB", 43, -1 },
+
/* GL_EXT_polygon_offset_clamp */
{ "glPolygonOffsetClampEXT", 11, -1 },





___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 2/3] egl: return corresponding offset of EGLImage instead of 0.

2016-09-09 Thread Weng, Chuanbo

The comments from Emil can be summarized into the following code:

   if (offsets) {
  offsets[0] = 0;
  EGLint img_offset = 0;
  bool ret = dri2_dpy->image->queryImage(dri2_img->dri_image,
   __DRI_IMAGE_ATTRIB_OFFSET, _offset);
  if(ret == true)
 offsets[0] = img_offset;
   }

   return EGL_TRUE;

Emil, is it right?

-Original Message-
From: Emil Velikov [mailto:emil.l.veli...@gmail.com] 
Sent: Friday, September 9, 2016 8:32 PM
To: Weng, Chuanbo 
Cc: ML mesa-dev ; Axel Davy 
Subject: Re: [PATCH v2 2/3] egl: return corresponding offset of EGLImage 
instead of 0.

On 9 September 2016 at 08:58, Chuanbo Weng  wrote:
> The offset should not always be 0. For example, if EGLImage is created 
> from a 2D texture with EGL_GL_TEXTURE_LEVEL=1, then the offset should 
> be the actual start of miplevel 1 in bo.
>
> v2: Add version check of __DRIimageExtension implementation (Suggested 
> by Axel Davy).
>
Just to elaborate what Axel was saying from another perspective:

With current master the offset is explicitly zeroed, while with the
(v2) patch you'll _only_ set the value when you have v13 driver.
Thus using the loader + v12 driver you'll get a regression since a) the offset 
will remain garbage and b) the function
(dri2_export_dma_buf_image_mesa) will fail.


> Signed-off-by: Chuanbo Weng 
> ---
>  src/egl/drivers/dri2/egl_dri2.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/src/egl/drivers/dri2/egl_dri2.c 
> b/src/egl/drivers/dri2/egl_dri2.c index 859612f..c529e55 100644
> --- a/src/egl/drivers/dri2/egl_dri2.c
> +++ b/src/egl/drivers/dri2/egl_dri2.c
> @@ -2249,6 +2249,7 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, 
> _EGLDisplay *disp, _EGLImage *im
> struct dri2_egl_image *dri2_img = dri2_egl_image(img);
>
> (void) drv;
> +   EGLint img_offset = 0;
>
> /* rework later to provide multiple fds/strides/offsets */
> if (fds)
> @@ -2259,8 +2260,14 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, 
> _EGLDisplay *disp, _EGLImage *im
>dri2_dpy->image->queryImage(dri2_img->dri_image,
>   __DRI_IMAGE_ATTRIB_STRIDE, strides);
>
> -   if (offsets)
> +   if (offsets){
>offsets[0] = 0;
> +  if(dri2_dpy->image->base.version >= 13){
Please move the variable declaration in local scope and add space between ){

Functionality wise we don't need the version check, esp. since you want to set 
the offset only when queryImage() succeeds...

> + dri2_dpy->image->queryImage(dri2_img->dri_image,
> + __DRI_IMAGE_ATTRIB_OFFSET, _offset);
... which you don't seem to be checking here, so you'll effectively store/use 
garbage.

So unless Alex feels strongly for the version check I'd omit it, and fix the 
rest of this patch.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/5] radeonsi: remove the cache_flush atom

2016-09-09 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_compute.c|  2 +-
 src/gallium/drivers/radeonsi/si_cp_dma.c |  2 +-
 src/gallium/drivers/radeonsi/si_hw_context.c |  2 +-
 src/gallium/drivers/radeonsi/si_pipe.h   |  1 -
 src/gallium/drivers/radeonsi/si_state.c  |  1 -
 src/gallium/drivers/radeonsi/si_state.h  |  3 +--
 src/gallium/drivers/radeonsi/si_state_draw.c | 10 +-
 7 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index 5041761..f43c616 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -467,21 +467,21 @@ static void si_launch_grid(
if (info->indirect)
r600_context_add_resource_size(ctx, info->indirect);
/* TODO: add the scratch buffer */
 
si_need_cs_space(sctx);
 
if (!sctx->cs_shader_state.initialized)
si_initialize_compute(sctx);
 
if (sctx->b.flags)
-   si_emit_cache_flush(sctx, NULL);
+   si_emit_cache_flush(sctx);
 
if (!si_switch_compute_shader(sctx, program, >shader, 
info->pc))
return;
 
si_upload_compute_shader_descriptors(sctx);
si_emit_compute_shader_userdata(sctx);
 
if (si_is_atom_dirty(sctx, sctx->atoms.s.render_cond)) {
sctx->atoms.s.render_cond->emit(>b,
sctx->atoms.s.render_cond);
diff --git a/src/gallium/drivers/radeonsi/si_cp_dma.c 
b/src/gallium/drivers/radeonsi/si_cp_dma.c
index 7d4edc0..08d3dfe 100644
--- a/src/gallium/drivers/radeonsi/si_cp_dma.c
+++ b/src/gallium/drivers/radeonsi/si_cp_dma.c
@@ -163,21 +163,21 @@ static void si_cp_dma_prepare(struct si_context *sctx, 
struct pipe_resource *dst
  RADEON_USAGE_WRITE, RADEON_PRIO_CP_DMA);
if (src)
radeon_add_to_buffer_list(>b, >b.gfx,
  (struct r600_resource*)src,
  RADEON_USAGE_READ, 
RADEON_PRIO_CP_DMA);
 
/* Flush the caches for the first copy only.
 * Also wait for the previous CP DMA operations.
 */
if (sctx->b.flags) {
-   si_emit_cache_flush(sctx, NULL);
+   si_emit_cache_flush(sctx);
*flags |= SI_CP_DMA_RAW_WAIT;
}
 
/* Do the synchronization after the last dma, so that all data
 * is written to memory.
 */
if (byte_count == remaining_size)
*flags |= R600_CP_DMA_SYNC;
 }
 
diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c 
b/src/gallium/drivers/radeonsi/si_hw_context.c
index 24b0360..67e8352 100644
--- a/src/gallium/drivers/radeonsi/si_hw_context.c
+++ b/src/gallium/drivers/radeonsi/si_hw_context.c
@@ -108,21 +108,21 @@ void si_context_gfx_flush(void *context, unsigned flags,
r600_preflush_suspend_features(>b);
 
ctx->b.flags |= SI_CONTEXT_CS_PARTIAL_FLUSH |
SI_CONTEXT_PS_PARTIAL_FLUSH;
 
/* DRM 3.1.0 doesn't flush TC for VI correctly. */
if (ctx->b.chip_class == VI && ctx->b.screen->info.drm_minor <= 1)
ctx->b.flags |= SI_CONTEXT_INV_GLOBAL_L2 |
SI_CONTEXT_INV_VMEM_L1;
 
-   si_emit_cache_flush(ctx, NULL);
+   si_emit_cache_flush(ctx);
 
if (ctx->trace_buf)
si_trace_emit(ctx);
 
if (ctx->is_debug) {
/* Save the IB for debug contexts. */
radeon_clear_saved_cs(>last_gfx);
radeon_save_cs(ws, cs, >last_gfx);
r600_resource_reference(>last_trace_buf, ctx->trace_buf);
r600_resource_reference(>trace_buf, NULL);
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index a648d86..1080e72 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -222,21 +222,20 @@ struct si_context {
boolcompute_is_busy;
 
/* Atoms (direct states). */
union si_state_atomsatoms;
unsigneddirty_atoms; /* mask */
/* PM4 states (precomputed immutable states) */
union si_state  queued;
union si_state  emitted;
 
/* Atom declarations. */
-   struct r600_atomcache_flush;
struct si_framebuffer   framebuffer;
struct si_sample_locs   msaa_sample_locs;
struct r600_atomdb_render_state;
struct r600_atommsaa_config;
struct si_sample_mask   sample_mask;
struct r600_atomcb_render_state;
struct si_blend_color   blend_color;
struct r600_atom

[Mesa-dev] [PATCH 1/5] winsys/radeon: replace OUT_CS with radeon_emit

2016-09-09 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
index 612a876..9de00c2 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
@@ -195,22 +195,20 @@ radeon_drm_cs_create(struct radeon_winsys_ctx *ctx,
 cs->csc = >csc1;
 cs->cst = >csc2;
 cs->base.current.buf = cs->csc->buf;
 cs->base.current.max_dw = ARRAY_SIZE(cs->csc->buf);
 cs->ring_type = ring_type;
 
 p_atomic_inc(>num_cs);
 return >base;
 }
 
-#define OUT_CS(cs, value) (cs)->current.buf[(cs)->current.cdw++] = (value)
-
 static inline void update_reloc(struct drm_radeon_cs_reloc *reloc,
 enum radeon_bo_domain rd,
 enum radeon_bo_domain wd,
 unsigned priority,
 enum radeon_bo_domain *added_domains)
 {
 *added_domains = (rd | wd) & ~(reloc->read_domains | reloc->write_domain);
 
 reloc->read_domains |= rd;
 reloc->write_domain |= wd;
@@ -454,41 +452,41 @@ static int radeon_drm_cs_flush(struct radeon_winsys_cs 
*rcs,
struct pipe_fence_handle **fence)
 {
 struct radeon_drm_cs *cs = radeon_drm_cs(rcs);
 struct radeon_cs_context *tmp;
 
 switch (cs->ring_type) {
 case RING_DMA:
 /* pad DMA ring to 8 DWs */
 if (cs->ws->info.chip_class <= SI) {
 while (rcs->current.cdw & 7)
-OUT_CS(>base, 0xf000); /* NOP packet */
+radeon_emit(>base, 0xf000); /* NOP packet */
 } else {
 while (rcs->current.cdw & 7)
-OUT_CS(>base, 0x); /* NOP packet */
+radeon_emit(>base, 0x); /* NOP packet */
 }
 break;
 case RING_GFX:
 /* pad GFX ring to 8 DWs to meet CP fetch alignment requirements
  * r6xx, requires at least 4 dw alignment to avoid a hw bug.
  */
 if (cs->ws->info.gfx_ib_pad_with_type2) {
 while (rcs->current.cdw & 7)
-OUT_CS(>base, 0x8000); /* type2 nop packet */
+radeon_emit(>base, 0x8000); /* type2 nop packet */
 } else {
 while (rcs->current.cdw & 7)
-OUT_CS(>base, 0x1000); /* type3 nop packet */
+radeon_emit(>base, 0x1000); /* type3 nop packet */
 }
 break;
 case RING_UVD:
 while (rcs->current.cdw & 15)
-OUT_CS(>base, 0x8000); /* type2 nop packet */
+radeon_emit(>base, 0x8000); /* type2 nop packet */
 break;
 default:
 break;
 }
 
 if (rcs->current.cdw > rcs->current.max_dw) {
fprintf(stderr, "radeon: command stream overflowed\n");
 }
 
 if (fence) {
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 5/5] radeonsi: flush TC L2 before using a compute indirect buffer

2016-09-09 Thread Marek Olšák

From: Marek Olšák 

There is no known test for this.

Cc: 12.0 
---
 src/gallium/drivers/radeonsi/si_compute.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index f43c616..d988214 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -457,24 +457,32 @@ static void si_launch_grid(
info->block[0] * info->block[1] * info->block[2] > 256;
 
if (cs_regalloc_hang)
sctx->b.flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
 SI_CONTEXT_CS_PARTIAL_FLUSH;
 
si_decompress_compute_textures(sctx);
 
/* Add buffer sizes for memory checking in need_cs_space. */
r600_context_add_resource_size(ctx, >shader.bo->b.b);
-   if (info->indirect)
-   r600_context_add_resource_size(ctx, info->indirect);
/* TODO: add the scratch buffer */
 
+   if (info->indirect) {
+   r600_context_add_resource_size(ctx, info->indirect);
+
+   /* The hw doesn't read the indirect buffer via TC L2. */
+   if (r600_resource(info->indirect)->TC_L2_dirty) {
+   sctx->b.flags |= SI_CONTEXT_INV_GLOBAL_L2;
+   r600_resource(info->indirect)->TC_L2_dirty = false;
+   }
+   }
+
si_need_cs_space(sctx);
 
if (!sctx->cs_shader_state.initialized)
si_initialize_compute(sctx);
 
if (sctx->b.flags)
si_emit_cache_flush(sctx);
 
if (!si_switch_compute_shader(sctx, program, >shader, 
info->pc))
return;
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/5] winsys/amdgpu: replace OUT_CS with radeon_emit

2016-09-09 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 73c8a97..0bb916e 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -702,22 +702,20 @@ amdgpu_cs_add_const_preamble_ib(struct radeon_winsys_cs 
*rcs)
 
cs->csc->request.number_of_ibs = 3;
cs->csc->request.ibs = >csc->ib[IB_CONST_PREAMBLE];
 
cs->cst->request.number_of_ibs = 3;
cs->cst->request.ibs = >cst->ib[IB_CONST_PREAMBLE];
 
return >const_preamble_ib.base;
 }
 
-#define OUT_CS(cs, value) (cs)->current.buf[(cs)->current.cdw++] = (value)
-
 static int amdgpu_cs_lookup_buffer(struct radeon_winsys_cs *rcs,
struct pb_buffer *buf)
 {
struct amdgpu_cs *cs = amdgpu_cs(rcs);
 
return amdgpu_lookup_buffer(cs->csc, (struct amdgpu_winsys_bo*)buf);
 }
 
 static bool amdgpu_cs_validate(struct radeon_winsys_cs *rcs)
 {
@@ -765,28 +763,28 @@ static bool amdgpu_cs_check_space(struct radeon_winsys_cs 
*rcs, unsigned dw)
 
assert(ib->used_ib_space == 0);
va = amdgpu_winsys_bo(ib->big_ib_buffer)->va;
 
/* This space was originally reserved. */
rcs->current.max_dw += 4;
assert(ib->used_ib_space + 4 * rcs->current.max_dw <= 
ib->big_ib_buffer->size);
 
/* Pad with NOPs and add INDIRECT_BUFFER packet */
while ((rcs->current.cdw & 7) != 4)
-  OUT_CS(rcs, 0x1000); /* type3 nop packet */
+  radeon_emit(rcs, 0x1000); /* type3 nop packet */
 
-   OUT_CS(rcs, PKT3(ib->ib_type == IB_MAIN ? PKT3_INDIRECT_BUFFER_CIK
+   radeon_emit(rcs, PKT3(ib->ib_type == IB_MAIN ? PKT3_INDIRECT_BUFFER_CIK
: PKT3_INDIRECT_BUFFER_CONST, 2, 
0));
-   OUT_CS(rcs, va);
-   OUT_CS(rcs, va >> 32);
+   radeon_emit(rcs, va);
+   radeon_emit(rcs, va >> 32);
new_ptr_ib_size = >current.buf[rcs->current.cdw];
-   OUT_CS(rcs, S_3F2_CHAIN(1) | S_3F2_VALID(1));
+   radeon_emit(rcs, S_3F2_CHAIN(1) | S_3F2_VALID(1));
 
assert((rcs->current.cdw & 7) == 0);
assert(rcs->current.cdw <= rcs->current.max_dw);
 
*ib->ptr_ib_size |= rcs->current.cdw;
ib->ptr_ib_size = new_ptr_ib_size;
 
/* Hook up the new chunk */
rcs->prev[rcs->num_prev].buf = rcs->current.buf;
rcs->prev[rcs->num_prev].cdw = rcs->current.cdw;
@@ -976,48 +974,48 @@ static int amdgpu_cs_flush(struct radeon_winsys_cs *rcs,
struct amdgpu_winsys *ws = cs->ctx->ws;
int error_code = 0;
 
rcs->current.max_dw += amdgpu_cs_epilog_dws(cs->ring_type);
 
switch (cs->ring_type) {
case RING_DMA:
   /* pad DMA ring to 8 DWs */
   if (ws->info.chip_class <= SI) {
  while (rcs->current.cdw & 7)
-OUT_CS(rcs, 0xf000); /* NOP packet */
+radeon_emit(rcs, 0xf000); /* NOP packet */
   } else {
  while (rcs->current.cdw & 7)
-OUT_CS(rcs, 0x); /* NOP packet */
+radeon_emit(rcs, 0x); /* NOP packet */
   }
   break;
case RING_GFX:
   /* pad GFX ring to 8 DWs to meet CP fetch alignment requirements */
   if (ws->info.gfx_ib_pad_with_type2) {
  while (rcs->current.cdw & 7)
-OUT_CS(rcs, 0x8000); /* type2 nop packet */
+radeon_emit(rcs, 0x8000); /* type2 nop packet */
   } else {
  while (rcs->current.cdw & 7)
-OUT_CS(rcs, 0x1000); /* type3 nop packet */
+radeon_emit(rcs, 0x1000); /* type3 nop packet */
   }
 
   /* Also pad the const IB. */
   if (cs->const_ib.ib_mapped)
  while (!cs->const_ib.base.current.cdw || 
(cs->const_ib.base.current.cdw & 7))
-OUT_CS(>const_ib.base, 0x1000); /* type3 nop packet */
+radeon_emit(>const_ib.base, 0x1000); /* type3 nop packet */
 
   if (cs->const_preamble_ib.ib_mapped)
  while (!cs->const_preamble_ib.base.current.cdw || 
(cs->const_preamble_ib.base.current.cdw & 7))
-OUT_CS(>const_preamble_ib.base, 0x1000);
+radeon_emit(>const_preamble_ib.base, 0x1000);
   break;
case RING_UVD:
   while (rcs->current.cdw & 15)
- OUT_CS(rcs, 0x8000); /* type2 nop packet */
+ radeon_emit(rcs, 0x8000); /* type2 nop packet */
   break;
default:
   break;
}
 
if (rcs->current.cdw > rcs->current.max_dw) {
   fprintf(stderr, "amdgpu: command stream overflowed\n");
}
 
/* If the CS is not empty or overflowed */
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/5] radeonsi: fix the VGT performance tweak for small instances

2016-09-09 Thread Marek Olšák

From: Marek Olšák 

Based on the VGT spec.

The Vulkan driver doesn't do it optimally and they plan to fix it.
---
 src/gallium/drivers/radeonsi/si_state_draw.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index d3e6e1a..e44147f 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -311,28 +311,29 @@ static unsigned si_get_ia_multi_vgt_param(struct 
si_context *sctx,
wd_switch_on_eop = true;
 
/* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP 
is 0.
 * We don't know that for indirect drawing, so treat it as
 * always problematic. */
if (sctx->b.family == CHIP_HAWAII &&
(info->indirect || info->instance_count > 1))
wd_switch_on_eop = true;
 
/* Performance recommendation for 4 SE Gfx7-8 parts if
-* instances are smaller than a primgroup. Ignore the fact
-* primgroup_size is a primitive count, not vertex count.
-* Don't do anything for indirect draws.
+* instances are smaller than a primgroup.
+* Assume indirect draws always use small instances.
+* This is needed for good VS wave utilization.
 */
if (sctx->b.chip_class <= VI &&
sctx->b.screen->info.max_se >= 4 &&
-   !info->indirect &&
-   info->instance_count > 1 && info->count < primgroup_size)
+   (info->indirect ||
+(info->instance_count > 1 &&
+ si_num_prims_for_vertices(info) < primgroup_size)))
wd_switch_on_eop = true;
 
/* Required on CIK and later. */
if (sctx->b.screen->info.max_se > 2 && !wd_switch_on_eop)
ia_switch_on_eoi = true;
 
/* Required by Hawaii and, for some special cases, by VI. */
if (ia_switch_on_eoi &&
(sctx->b.family == CHIP_HAWAII ||
 (sctx->b.chip_class == VI &&
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/7] EGL: Update eglext.h

2016-09-09 Thread Emil Velikov

On 9 September 2016 at 15:26, Kyle Brenneman  wrote:
> Would it be better to use the new upstream version of eglext.h and add the
> #includes, or to keep Mesa's existing eglext.h and add the missing
> declarations for EGL_KHR_debug?
>
Afaict we've been doing the former every(?) time.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] vl/dri3: handle the case of different GPU

2016-09-09 Thread Leo Liu


Hi Nayan,

Besides Emil's comments, here is some others in lines.

On 09/09/2016 09:17 AM, Nayan Deshmukh wrote:

use a linear buffer in case of back buffer

v2: Use a seprate linear buffer for each back buffer

Signed-off-by: Nayan Deshmukh 
---
  src/gallium/auxiliary/vl/vl_winsys_dri3.c | 82 ---
  1 file changed, 63 insertions(+), 19 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index 3d596a6..e168e36 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -49,6 +49,7 @@
  struct vl_dri3_buffer
  {
 struct pipe_resource *texture;
+   struct pipe_resource *linear_buffer;


Better name it as linear_texture so it matches above.

  
 uint32_t pixmap;

 uint32_t sync_fence;
@@ -69,6 +70,8 @@ struct vl_dri3_screen
 xcb_present_event_t eid;
 xcb_special_event_t *special_event;
  
+   struct pipe_context *pipe;

+
 struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM];
 int cur_back;
  
@@ -82,6 +85,7 @@ struct vl_dri3_screen

 int64_t last_ust, ns_frame, last_msc, next_msc;
  
 bool flushed;

+   bool is_different_gpu;
  };
  
  static void

@@ -209,7 +213,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
 xcb_sync_fence_t sync_fence;
 struct xshmfence *shm_fence;
 int buffer_fd, fence_fd;
-   struct pipe_resource templ;
+   struct pipe_resource templ, *pixmap_buffer;


Here is same, maybe just name as texture for pixmap_buffer.


 struct winsys_handle whandle;
 unsigned usage;
  
@@ -226,25 +230,49 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)

goto close_fd;
  
 memset(, 0, sizeof(templ));

-   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
-PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
-   templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
-   templ.target = PIPE_TEXTURE_2D;
-   templ.last_level = 0;
-   templ.width0 = scrn->width;
-   templ.height0 = scrn->height;
-   templ.depth0 = 1;
-   templ.array_size = 1;
-   buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
- );
-   if (!buffer->texture)
-  goto unmap_shm;
-
+   if (scrn->is_different_gpu) {
+  templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
+  templ.target = PIPE_TEXTURE_2D;
+  templ.last_level = 0;
+  templ.width0 = scrn->width;
+  templ.height0 = scrn->height;
+  templ.depth0 = 1;
+  templ.array_size = 1;
+  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+);
+  if (!buffer->texture)
+ goto unmap_shm;
+
+  templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
+   PIPE_BIND_SCANOUT | PIPE_BIND_SHARED | PIPE_BIND_LINEAR;
+  buffer->linear_buffer = 
scrn->base.pscreen->resource_create(scrn->base.pscreen,
+  );


You have to remember to free it later.


+  pixmap_buffer = buffer->linear_buffer;
+
+  if (!buffer->linear_buffer)
+ goto no_linear_buffer;
+
+   } else {
+  templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
+   PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+  templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
+  templ.target = PIPE_TEXTURE_2D;
+  templ.last_level = 0;
+  templ.width0 = scrn->width;
+  templ.height0 = scrn->height;
+  templ.depth0 = 1;
+  templ.array_size = 1;
+  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+);
+  if (!buffer->texture)
+ goto unmap_shm;
+  pixmap_buffer = buffer->texture;
+   }
 memset(, 0, sizeof(whandle));
 whandle.type= DRM_API_HANDLE_TYPE_FD;
 usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ;
 scrn->base.pscreen->resource_get_handle(scrn->base.pscreen, NULL,
-   buffer->texture, ,
+   pixmap_buffer, ,
 usage);
 buffer_fd = whandle.handle;
 buffer->pitch = whandle.stride;
@@ -271,6 +299,8 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
  
 return buffer;
  
+no_linear_buffer:

+   pipe_resource_reference(>texture, NULL);
  unmap_shm:
 xshmfence_unmap_shm(shm_fence);
  close_fd:
@@ -474,6 +504,7 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
 struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)context_private;
 uint32_t options = XCB_PRESENT_OPTION_NONE;
 struct vl_dri3_buffer *back;
+   struct pipe_box src_box;
  
 back = scrn->back_buffers[scrn->cur_back];

 if (!back)
@@ -485,6 +516,17 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,

Re: [Mesa-dev] [PATCH 1/7] EGL: Update eglext.h

2016-09-09 Thread Kyle Brenneman

Would it be better to use the new upstream version of eglext.h and add 
the #includes, or to keep Mesa's existing eglext.h and add the missing 
declarations for EGL_KHR_debug?


-Kyle

On 09/09/2016 03:31 AM, Emil Velikov wrote:

On 8 September 2016 at 18:46, Adam Jackson  wrote:

From: Kyle Brenneman 

Updated eglext.h to revision 32074 from the Khronos repository.

Added two #includes to egltypedefs.h. Both were in the previous version
of eglext.h but not in the new one.
-#include 
-#include 
-

AFAICT neither of these was in the upstream eglext.h, ever. Then again
we must upstreaming/standardise the remaining extensions, since
removing this hunk will break applications.
Most/all companies working on mesa are members of Khronos so it
shouldn't be that hard to get things sorted :-)

NACK on this hunk - please keep it as-is for the time being.
-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [v3 4/6] i965/rbc: Consult rb settings for texture surface setup

2016-09-09 Thread Jason Ekstrand

On Sep 8, 2016 11:13 PM, "Pohjolainen, Topi" 
wrote:
>
> On Thu, Sep 08, 2016 at 08:49:56AM -0700, Jason Ekstrand wrote:
> >On Sep 7, 2016 9:30 PM, "Pohjolainen, Topi"
> ><[1]topi.pohjolai...@gmail.com> wrote:
> >>
> >> On Wed, Sep 07, 2016 at 03:25:30PM -0700, Jason Ekstrand wrote:
> >> >On Sep 7, 2016 10:24 AM, "Topi Pohjolainen"
> >> ><[1][2]topi.pohjolai...@gmail.com> wrote:
> >> >>
> >> >> Once mcs buffer gets allocated without delay for lossless
> >> >> compression (same as we do for msaa), one gets regression
in:
> >> >>
> >> >> GL45-CTS.texture_barrier_ARB.same-texel-rw
> >> >>
> >> >> Setting the auxiliary surface for both sampling engine and
> >data
> >> >> port seems to fix this. I haven't found any hardware
> >documentation
> >> >> backing this though.
> >> >>
> >> >> v2 (Jason): Prepare also for the case where surface is
sampled
> >with
> >> >> non-compressible format forcing also rendering
> >without
> >> >> compression.
> >> >> v3: Split asserts and decision making.
> >> >>
> >> >> Signed-off-by: Topi Pohjolainen
> ><[2][3]topi.pohjolai...@intel.com>
> >> >> ---
> >> >>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 63
> >> >+---
> >> >>  1 file changed, 56 insertions(+), 7 deletions(-)
> >> >>
> >> >> diff --git
a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> >> >b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> >> >> index c1273c5..054c5c8 100644
> >> >> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> >> >> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> >> >> @@ -140,9 +140,7 @@ brw_emit_surface_state(struct
brw_context
> >*brw,
> >> >> struct isl_surf *aux_surf = NULL, aux_surf_s;
> >> >> uint64_t aux_offset = 0;
> >> >> enum isl_aux_usage aux_usage = ISL_AUX_USAGE_NONE;
> >> >> -   if (mt->mcs_mt &&
> >> >> -   ((view.usage & ISL_SURF_USAGE_RENDER_TARGET_BIT) ||
> >> >> -mt->fast_clear_state !=
> >INTEL_FAST_CLEAR_STATE_RESOLVED)) {
> >> >> +   if (mt->mcs_mt && !(flags & INTEL_AUX_BUFFER_DISABLED))
{
> >> >>intel_miptree_get_aux_isl_surf(brw, mt, _surf_s,
> >> >_usage);
> >> >>aux_surf = _surf_s;
> >> >>assert(mt->mcs_mt->offset == 0);
> >> >> @@ -425,6 +423,54 @@ swizzle_to_scs(GLenum swizzle, bool
> >> >need_green_to_blue)
> >> >> return (need_green_to_blue && scs == HSW_SCS_GREEN) ?
> >> >HSW_SCS_BLUE : scs;
> >> >>  }
> >> >>
> >> >> +static unsigned
> >> >> +brw_find_matching_rb(const struct gl_framebuffer *fb,
> >> >> + const struct intel_mipmap_tree *mt)
> >> >> +{
> >> >> +   for (unsigned i = 0; i < fb->_NumColorDrawBuffers; i++)
{
> >> >> +  const struct intel_renderbuffer *irb =
> >> >> + intel_renderbuffer(fb->_ColorDrawBuffers[i]);
> >> >> +
> >> >> +  if (irb->mt == mt)
> >> >> + return i;
> >> >> +   }
> >> >> +
> >> >> +   return fb->_NumColorDrawBuffers;
> >> >> +}
> >> >> +
> >> >> +static bool
> >> >> +brw_disable_aux_surface(const struct brw_context *brw,
> >> >> +const struct intel_mipmap_tree *mt)
> >> >> +{
> >> >> +   /* Nothing to disable. */
> >> >> +   if (!mt->mcs_mt)
> >> >> +  return false;
> >> >> +
> >> >> +   /* There are special cases only for lossless
compression.
> >*/
> >> >> +   if (!intel_miptree_is_lossless_compressed(brw, mt))
> >> >> +  return mt->fast_clear_state ==
> >> >INTEL_FAST_CLEAR_STATE_RESOLVED;
> >> >> +
> >> >> +   const struct gl_framebuffer *fb = brw->ctx.DrawBuffer;
> >> >> +   const unsigned rb_index = brw_find_matching_rb(fb, mt);
> >> >> +
> >> >> +   /* In practise it looks that setting the same lossless
> >compressed
> >> >surface
> >> >> +* to be sampled without auxiliary surface and to be
> >written with
> >> >auxiliary
> >> >> +* surface confuses the hardware. Therefore sampler
engine
> >must
> >> >be provided
> >> >> +* with auxiliary buffer regardless of the fast clear
> >state if
> >> >the same
> >> >> +* surface is also going to be written during the same
> >rendering
> >> >pass with
> >> >> +* auxiliary buffer enabled.
> >> >> +*/
> >> >> +   if (rb_index < fb->_NumColorDrawBuffers) {
> >>

Re: [Mesa-dev] [PATCH v2] vl/dri3: handle the case of different GPU

2016-09-09 Thread Nayan Deshmukh

On Fri, Sep 9, 2016 at 7:03 PM, Emil Velikov 
wrote:

> Hi Nayan,
>
> Just a couple of fly-by comments. As always don't read too much into them.
>
> On 9 September 2016 at 14:17, Nayan Deshmukh 
> wrote:
> > use a linear buffer in case of back buffer
> >
> You might want to mention a bit more about the implementation and/or
> why doing things like X won't work/is bad idea.
>
> Yeah I will add more detail to the commit message.


> > +   if (scrn->is_different_gpu) {
> > +  templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
> > +  templ.target = PIPE_TEXTURE_2D;
> > +  templ.last_level = 0;
> > +  templ.width0 = scrn->width;
> > +  templ.height0 = scrn->height;
> > +  templ.depth0 = 1;
> > +  templ.array_size = 1;
> bind == 0 here, that doesn't seem right ?
>
> I was trying to replicate the code used in loader_dri3_helper.c. I was not
sure
which bind flags should be used, I was hoping that michel can have a look
and suggest the necessary changes. I tested it on my hardware and it was
working fine. On second thought it should have a PIPE_BIND_RENDER_TARGET
flag.

> +  buffer->texture = scrn->base.pscreen->resource_
> create(scrn->base.pscreen,
> > +);
> > +  if (!buffer->texture)
> > + goto unmap_shm;
> > +
> With the bind fixed the above hunk is common so can be kept prior to
> the conditional, no ?
>
> With bind flag fixed this common code could be taken out of the
conditional.

>
> > @@ -485,6 +516,17 @@ vl_dri3_flush_frontbuffer(struct pipe_screen
> *screen,
> >  return;
> > }
> >
> > +   if (scrn->is_different_gpu) {
> > +  u_box_origin_2d(scrn->width,
> > +  scrn->height,
> > +  _box);
> > +  scrn->pipe->resource_copy_region(scrn->pipe,
> > +   back->linear_buffer,
> > +   0, 0, 0, 0,
> > +   back->texture,
> > +   0, _box);
> > +
> Please try to use the full width of the line.
>
> The entire file uses this kind of wraping so I was trying to follow the
style.
It would be better if entire file has the same kind of wraping.

Thanks for the review. I'll make the necessary changes in the next version
once michel has a look and approves it.


> Regards,
> Emil
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] tgsi/scan: don't set interp flags for inputs only used by INTERP instructions

2016-09-09 Thread Marek Olšák

From: Marek Olšák 

radeonsi depends on the interp flags a little bit too much.

This fixes 9 randomly failing tests:
  GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.*
---
 src/gallium/auxiliary/tgsi/tgsi_scan.c | 105 ++---
 1 file changed, 57 insertions(+), 48 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c 
b/src/gallium/auxiliary/tgsi/tgsi_scan.c
index a3b0d9f..39e4dd9 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c
@@ -95,20 +95,21 @@ computes_derivative(unsigned opcode)
 }
 
 
 static void
 scan_instruction(struct tgsi_shader_info *info,
  const struct tgsi_full_instruction *fullinst,
  unsigned *current_depth)
 {
unsigned i;
bool is_mem_inst = false;
+   bool is_interp_instruction = false;
 
assert(fullinst->Instruction.Opcode < TGSI_OPCODE_LAST);
info->opcode_count[fullinst->Instruction.Opcode]++;
 
switch (fullinst->Instruction.Opcode) {
case TGSI_OPCODE_IF:
case TGSI_OPCODE_UIF:
case TGSI_OPCODE_BGNLOOP:
   (*current_depth)++;
   info->max_depth = MAX2(info->max_depth, *current_depth);
@@ -120,20 +121,22 @@ scan_instruction(struct tgsi_shader_info *info,
default:
   break;
}
 
if (fullinst->Instruction.Opcode == TGSI_OPCODE_INTERP_CENTROID ||
fullinst->Instruction.Opcode == TGSI_OPCODE_INTERP_OFFSET ||
fullinst->Instruction.Opcode == TGSI_OPCODE_INTERP_SAMPLE) {
   const struct tgsi_full_src_register *src0 = >Src[0];
   unsigned input;
 
+  is_interp_instruction = true;
+
   if (src0->Register.Indirect && src0->Indirect.ArrayID)
  input = info->input_array_first[src0->Indirect.ArrayID];
   else
  input = src0->Register.Index;
 
   /* For the INTERP opcodes, the interpolation is always
* PERSPECTIVE unless LINEAR is specified.
*/
   switch (info->input_interpolate[input]) {
   case TGSI_INTERPOLATE_COLOR:
@@ -183,43 +186,91 @@ scan_instruction(struct tgsi_shader_info *info,
  if (src->Register.Indirect) {
 for (ind = 0; ind < info->num_inputs; ++ind) {
info->input_usage_mask[ind] |= usage_mask;
 }
  } else {
 assert(ind >= 0);
 assert(ind < PIPE_MAX_SHADER_INPUTS);
 info->input_usage_mask[ind] |= usage_mask;
  }
 
- if (info->processor == PIPE_SHADER_FRAGMENT &&
- !src->Register.Indirect) {
-unsigned name =
-   info->input_semantic_name[src->Register.Index];
-unsigned index =
-   info->input_semantic_index[src->Register.Index];
+ if (info->processor == PIPE_SHADER_FRAGMENT) {
+unsigned name, index, input;
+
+if (src->Register.Indirect && src->Indirect.ArrayID)
+   input = info->input_array_first[src->Indirect.ArrayID];
+else
+   input = src->Register.Index;
+
+name = info->input_semantic_name[input];
+index = info->input_semantic_index[input];
 
 if (name == TGSI_SEMANTIC_POSITION &&
 (src->Register.SwizzleX == TGSI_SWIZZLE_Z ||
  src->Register.SwizzleY == TGSI_SWIZZLE_Z ||
  src->Register.SwizzleZ == TGSI_SWIZZLE_Z ||
  src->Register.SwizzleW == TGSI_SWIZZLE_Z))
info->reads_z = TRUE;
 
 if (name == TGSI_SEMANTIC_COLOR) {
unsigned mask =
   (1 << src->Register.SwizzleX) |
   (1 << src->Register.SwizzleY) |
   (1 << src->Register.SwizzleZ) |
   (1 << src->Register.SwizzleW);
 
info->colors_read |= mask << (index * 4);
 }
+
+/* Process only interpolated varyings. Don't include POSITION.
+ * Don't include integer varyings, because they are not
+ * interpolated. Don't process inputs interpolated by INTERP
+ * opcodes. Those are tracked separately.
+ */
+if ((!is_interp_instruction || i != 0) &&
+(name == TGSI_SEMANTIC_GENERIC ||
+ name == TGSI_SEMANTIC_TEXCOORD ||
+ name == TGSI_SEMANTIC_COLOR ||
+ name == TGSI_SEMANTIC_BCOLOR ||
+ name == TGSI_SEMANTIC_FOG ||
+ name == TGSI_SEMANTIC_CLIPDIST)) {
+   switch (info->input_interpolate[index]) {
+   case TGSI_INTERPOLATE_COLOR:
+   case TGSI_INTERPOLATE_PERSPECTIVE:
+  switch (info->input_interpolate_loc[index]) {
+  case TGSI_INTERPOLATE_LOC_CENTER:
+ info->uses_persp_center = TRUE;
+ break;
+  case TGSI_INTERPOLATE_LOC_CENTROID:
+ info->uses_persp_centroid = TRUE;
+

[Mesa-dev] [PATCH] mesa: fix glGetFramebufferAttachmentParameteriv w/ on-demand FRONT_BACK alloc

2016-09-09 Thread Marek Olšák

From: Marek Olšák 

This fixes 66 CTS tests on st/mesa.

Cc: 12.0 
---
 src/mesa/main/fbobject.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
index 2c01526..09da6b7 100644
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -296,23 +296,35 @@ _mesa_get_fb0_attachment(struct gl_context *ctx, struct 
gl_framebuffer *fb,
  return >Attachment[BUFFER_FRONT_LEFT];
   case GL_DEPTH:
   return >Attachment[BUFFER_DEPTH];
   case GL_STENCIL:
  return >Attachment[BUFFER_STENCIL];
   }
}
 
switch (attachment) {
case GL_FRONT_LEFT:
-  return >Attachment[BUFFER_FRONT_LEFT];
+  /* Front buffers can be allocated on the first use, but
+   * glGetFramebufferAttachmentParameteriv must work even if that
+   * allocation hasn't happened yet. In such case, use the back buffer,
+   * which should be the same.
+   */
+  if (fb->Attachment[BUFFER_FRONT_LEFT].Type == GL_NONE)
+ return >Attachment[BUFFER_BACK_LEFT];
+  else
+ return >Attachment[BUFFER_FRONT_LEFT];
case GL_FRONT_RIGHT:
-  return >Attachment[BUFFER_FRONT_RIGHT];
+  /* Same as above. */
+  if (fb->Attachment[BUFFER_FRONT_RIGHT].Type == GL_NONE)
+ return >Attachment[BUFFER_BACK_RIGHT];
+  else
+ return >Attachment[BUFFER_FRONT_RIGHT];
case GL_BACK_LEFT:
   return >Attachment[BUFFER_BACK_LEFT];
case GL_BACK_RIGHT:
   return >Attachment[BUFFER_BACK_RIGHT];
case GL_AUX0:
   if (fb->Visual.numAuxBuffers == 1) {
  return >Attachment[BUFFER_AUX0];
   }
   return NULL;
 
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] glsl: remove interpolateAt* instructions for demoted inputs

2016-09-09 Thread Marek Olšák

From: Marek Olšák 

This fixes 8 fs-interpolateat* piglit crashes on radeonsi, because it can't
handle non-input operands in interpolateAt*.
---
 src/compiler/glsl/link_varyings.cpp |  5 +
 src/compiler/glsl/opt_algebraic.cpp | 10 ++
 2 files changed, 15 insertions(+)

diff --git a/src/compiler/glsl/link_varyings.cpp 
b/src/compiler/glsl/link_varyings.cpp
index 1fad310..85ca044 100644
--- a/src/compiler/glsl/link_varyings.cpp
+++ b/src/compiler/glsl/link_varyings.cpp
@@ -586,20 +586,25 @@ remove_unused_shader_inputs_and_outputs(bool 
is_separate_shader_object,
 
   if (var == NULL || var->data.mode != int(mode))
  continue;
 
   /* A shader 'in' or 'out' variable is only really an input or output if
* its value is used by other shader stages. This will cause the
* variable to have a location assigned.
*/
   if (var->data.is_unmatched_generic_inout && !var->data.is_xfb_only) {
  assert(var->data.mode != ir_var_temporary);
+
+ /* Assign zeros to demoted inputs to allow more optimizations. */
+ if (var->data.mode == ir_var_shader_in && !var->constant_value)
+var->constant_value = ir_constant::zero(var, var->type);
+
  var->data.mode = ir_var_auto;
   }
}
 
/* Eliminate code that is now dead due to unused inputs/outputs being
 * demoted.
 */
while (do_dead_code(sh->ir, false))
   ;
 
diff --git a/src/compiler/glsl/opt_algebraic.cpp 
b/src/compiler/glsl/opt_algebraic.cpp
index f5858c8..2829a78 100644
--- a/src/compiler/glsl/opt_algebraic.cpp
+++ b/src/compiler/glsl/opt_algebraic.cpp
@@ -954,20 +954,30 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
   }
   break;
 
case ir_triop_csel:
   if (is_vec_one(op_const[0]))
 return ir->operands[1];
   if (is_vec_zero(op_const[0]))
 return ir->operands[2];
   break;
 
+   /* Remove interpolateAt* instructions for demoted inputs. They are
+* assigned a constant expression to facilitate this.
+*/
+   case ir_unop_interpolate_at_centroid:
+   case ir_binop_interpolate_at_offset:
+   case ir_binop_interpolate_at_sample:
+  if (op_const[0])
+ return ir->operands[0];
+  break;
+
default:
   break;
}
 
return ir;
 }
 
 void
 ir_algebraic_visitor::handle_rvalue(ir_rvalue **rvalue)
 {
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 10/11] st/mesa: expose ARB_compute_variable_group_size

2016-09-09 Thread Marek Olšák

For patches 8, 9:

Reviewed-by: Marek Olšák 

Patch 10 won't work for us, because radeonsi (and presumably softpipe
as well) don't support this feature at the moment. Also, I would
prefer a PIPE_CAP for MaxComputeVariableGroupInvocations and the
extension can be exposed based on that CAP.

Marek


On Thu, Sep 8, 2016 at 10:31 PM, Samuel Pitoiset
 wrote:
> This extension is only exposed if the underlying driver supports
> ARB_compute_shader.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/mesa/state_tracker/st_extensions.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/src/mesa/state_tracker/st_extensions.c 
> b/src/mesa/state_tracker/st_extensions.c
> index 807fbfb..dc2e60a 100644
> --- a/src/mesa/state_tracker/st_extensions.c
> +++ b/src/mesa/state_tracker/st_extensions.c
> @@ -1196,6 +1196,19 @@ void st_init_extensions(struct pipe_screen *screen,
>   extensions->ARB_compute_shader =
>
> extensions->ARB_shader_image_load_store &&
>extensions->ARB_shader_atomic_counters;
> +
> + if (extensions->ARB_compute_shader) {
> +/* Because the minimum values required by
> + * ARB_compute_variable_group_size are less (or equal) than the
> + * ones defined by ARB_compute_shader we can re-use them. */
> +for (i = 0; i < 3; i++) {
> +   consts->MaxComputeVariableGroupSize[i] =
> +  consts->MaxComputeWorkGroupSize[i];
> +}
> +consts->MaxComputeVariableGroupInvocations =
> +   consts->MaxComputeWorkGroupInvocations;
> +extensions->ARB_compute_variable_group_size = true;
> + }
>}
> }
>
> --
> 2.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] vl/dri3: handle the case of different GPU

2016-09-09 Thread Emil Velikov

Hi Nayan,

Just a couple of fly-by comments. As always don't read too much into them.

On 9 September 2016 at 14:17, Nayan Deshmukh  wrote:
> use a linear buffer in case of back buffer
>
You might want to mention a bit more about the implementation and/or
why doing things like X won't work/is bad idea.

> +   if (scrn->is_different_gpu) {
> +  templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
> +  templ.target = PIPE_TEXTURE_2D;
> +  templ.last_level = 0;
> +  templ.width0 = scrn->width;
> +  templ.height0 = scrn->height;
> +  templ.depth0 = 1;
> +  templ.array_size = 1;
bind == 0 here, that doesn't seem right ?

> +  buffer->texture = 
> scrn->base.pscreen->resource_create(scrn->base.pscreen,
> +);
> +  if (!buffer->texture)
> + goto unmap_shm;
> +
With the bind fixed the above hunk is common so can be kept prior to
the conditional, no ?


> @@ -485,6 +516,17 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
>  return;
> }
>
> +   if (scrn->is_different_gpu) {
> +  u_box_origin_2d(scrn->width,
> +  scrn->height,
> +  _box);
> +  scrn->pipe->resource_copy_region(scrn->pipe,
> +   back->linear_buffer,
> +   0, 0, 0, 0,
> +   back->texture,
> +   0, _box);
> +
Please try to use the full width of the line.

Regards,
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2] vl/dri3: handle the case of different GPU

2016-09-09 Thread Nayan Deshmukh

use a linear buffer in case of back buffer

v2: Use a seprate linear buffer for each back buffer

Signed-off-by: Nayan Deshmukh 
---
 src/gallium/auxiliary/vl/vl_winsys_dri3.c | 82 ---
 1 file changed, 63 insertions(+), 19 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index 3d596a6..e168e36 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -49,6 +49,7 @@
 struct vl_dri3_buffer
 {
struct pipe_resource *texture;
+   struct pipe_resource *linear_buffer;
 
uint32_t pixmap;
uint32_t sync_fence;
@@ -69,6 +70,8 @@ struct vl_dri3_screen
xcb_present_event_t eid;
xcb_special_event_t *special_event;
 
+   struct pipe_context *pipe;
+
struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM];
int cur_back;
 
@@ -82,6 +85,7 @@ struct vl_dri3_screen
int64_t last_ust, ns_frame, last_msc, next_msc;
 
bool flushed;
+   bool is_different_gpu;
 };
 
 static void
@@ -209,7 +213,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
xcb_sync_fence_t sync_fence;
struct xshmfence *shm_fence;
int buffer_fd, fence_fd;
-   struct pipe_resource templ;
+   struct pipe_resource templ, *pixmap_buffer;
struct winsys_handle whandle;
unsigned usage;
 
@@ -226,25 +230,49 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
   goto close_fd;
 
memset(, 0, sizeof(templ));
-   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
-PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
-   templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
-   templ.target = PIPE_TEXTURE_2D;
-   templ.last_level = 0;
-   templ.width0 = scrn->width;
-   templ.height0 = scrn->height;
-   templ.depth0 = 1;
-   templ.array_size = 1;
-   buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
- );
-   if (!buffer->texture)
-  goto unmap_shm;
-
+   if (scrn->is_different_gpu) {
+  templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
+  templ.target = PIPE_TEXTURE_2D;
+  templ.last_level = 0;
+  templ.width0 = scrn->width;
+  templ.height0 = scrn->height;
+  templ.depth0 = 1;
+  templ.array_size = 1;
+  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+);
+  if (!buffer->texture)
+ goto unmap_shm;
+
+  templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
+   PIPE_BIND_SCANOUT | PIPE_BIND_SHARED | PIPE_BIND_LINEAR;
+  buffer->linear_buffer = 
scrn->base.pscreen->resource_create(scrn->base.pscreen,
+  );
+  pixmap_buffer = buffer->linear_buffer;
+
+  if (!buffer->linear_buffer)
+ goto no_linear_buffer;
+
+   } else {
+  templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
+   PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+  templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
+  templ.target = PIPE_TEXTURE_2D;
+  templ.last_level = 0;
+  templ.width0 = scrn->width;
+  templ.height0 = scrn->height;
+  templ.depth0 = 1;
+  templ.array_size = 1;
+  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+);
+  if (!buffer->texture)
+ goto unmap_shm;
+  pixmap_buffer = buffer->texture;
+   }
memset(, 0, sizeof(whandle));
whandle.type= DRM_API_HANDLE_TYPE_FD;
usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ;
scrn->base.pscreen->resource_get_handle(scrn->base.pscreen, NULL,
-   buffer->texture, ,
+   pixmap_buffer, ,
usage);
buffer_fd = whandle.handle;
buffer->pitch = whandle.stride;
@@ -271,6 +299,8 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
 
return buffer;
 
+no_linear_buffer:
+   pipe_resource_reference(>texture, NULL);
 unmap_shm:
xshmfence_unmap_shm(shm_fence);
 close_fd:
@@ -474,6 +504,7 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)context_private;
uint32_t options = XCB_PRESENT_OPTION_NONE;
struct vl_dri3_buffer *back;
+   struct pipe_box src_box;
 
back = scrn->back_buffers[scrn->cur_back];
if (!back)
@@ -485,6 +516,17 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
 return;
}
 
+   if (scrn->is_different_gpu) {
+  u_box_origin_2d(scrn->width,
+  scrn->height,
+  _box);
+  scrn->pipe->resource_copy_region(scrn->pipe,
+   back->linear_buffer,
+   0, 0, 0, 0,
+

Re: [Mesa-dev] [PATCH] Disable the code that allocates W|X memory on OpenBSD

2016-09-09 Thread Emil Velikov

On 8 September 2016 at 19:46, Mark Kettenis  wrote:
>> From: Emil Velikov 
>> Date: Thu, 8 Sep 2016 18:57:44 +0100
>>
>> On 1 September 2016 at 18:23, Jonathan Gray  wrote:
>> > OpenBSD now has strict W^X enforcement.  Processes that violate
>> > the policy get killed by the kernel.  Don't attempt to use
>> > executable memory on OpenBSD to avoid this.
>> >
>> > Patch from Mark Kettenis.
>
> Jonathan, Emil,
>
> That diff should probably not land "upstream".  We're still tweaking
> the W^X policy, trying to find the code in ports that violates W^X.
> We recently changed things around such that
>
> mmap(..., PROT_WRITE | PROT_EXEC)
>
> will simply fail instead of aborting the process.  Since Mesa has
> fallback code for that case, the changes I made (at least most of
> them) are no longer necessary.  But I didn't revert them yet in order
> to reduce the amount of false positives.
>
Fwiw I'm for upstreaming things where they don't seem too hacky. When
they do I try to elaborate why and how they could be addressed.

Here... thinking about the mmap solution (workaround?) I'm wondering
if it isn't the better option. Using it one won't need to patch X
projects and at the same time everyone should handle mmap() failure.
If they don't then that's a bug which should be fixed regardless of
anything else.

>> > --- a/src/gallium/auxiliary/rtasm/rtasm_execmem.c
>> > +++ b/src/gallium/auxiliary/rtasm/rtasm_execmem.c
>> > @@ -69,6 +69,16 @@ static struct mem_block *exec_heap = NULL;
>> >  static unsigned char *exec_mem = NULL;
>> >
>> >
>> > +#ifdef __OpenBSD__
>> > +
>> > +static int
>> > +init_heap(void)
>> > +{
>> > +   return 0;
>> > +}
>> Afaict this is equivalent to using the #else path in translate_see.c.
>> In general I'm wondering if we can/should not have a configure toggle
>> for this. Then again please look below.
>
> Right.  We basically prefer the slow code paths over the unsecure
> codepaths.  There are ways around W^X enforcement though.  One
> possibility is to generate the code in pages that are writable, but
> not executable, and then flip the permissions to make the page
> read-only but executable.  Other techniques include using aliased
> pages where one mapping is writable and the other mapping is
> executable.  Although we don't really want to encourage using aliasing
> since it is a bit of a cheat.
>
>> > --- a/src/mapi/u_execmem.c
>> > +++ b/src/mapi/u_execmem.c
>> > @@ -45,8 +45,15 @@ static unsigned int head = 0;
>> >
>> >  static unsigned char *exec_mem = (unsigned char *)0;
>> >
>> > +#if defined(__OpenBSD__)
>> >
>> > -#if defined(__linux__) || defined(__OpenBSD__) || defined(_NetBSD__) || 
>> > defined(__sun) || defined(__HAIKU__)
>> > +static int
>> > +init_map(void)
>> > +{
>> > +  return 0;
>> > +}
>> > +
>> And this one to --disable-glx-tls and/or --disable-asm. Which reminds
>> me of - have you guys tried enabling either/both of them. Has there
>> been (m)any issues ?
>
> Pilip Guenther is working on enabling TLS support in OpenBSD.  He is
> getting close, but he was grumbling that Mesa was uncovering some
> toolchain bugs.
>
Ack, looking forward to it. Hope that if there's problems on mesa
side, he'll shout early as he finds them.

>> For a long while the intent has been to use --enable-glx-tls by
>> default and kill off the other codepaths. But with the write xor
>> execute policy, it's going to be (close to) impossible.
>
> Not entirely sure what TLS has to do with W^X enforcement.
>
Things are slightly overloaded, but the gist is that the TLS/ASM
codepaths constructs a jump table(?) which is used instead of calls
into the respective GL functions.

Linux has been explicitly using only the TLS codepath for 3+ years (if
not a lot more), with the only obstacle for keeping the other two
codepaths being BSD and maybe Solaris.
To make it even better the code (all three codepaths) has close to
zero fixes/updates since it was introduced and is quite dense.

>> Have you guys considered a way to disable the restriction for usecases
>> that need the behaviour ?
>
> "no comment", erh, yes, there is a way.  But not for what we consider
> the base of our OS.  Which includes X.  Mapping pages to be both
> writable and executable is really bad from a security perspective.
> And the goal is to eliminate that practice completely from the
> ecosystem.
Yes even though I'm not a security person, I'm aware that using W|X is
a bad idea.
The key point imho is that a) there must always be a contingency plan,
which involves toggling off and/or other methods that you've mentioned
and b) considering the limited amount of devs/interest in the area it
will take a while to untangle things.

> Please note that we're not alone here.  iOS for example
> does not allow you to create mappings like that either.
>
I see your point about iOS and/or other platforms using such policy
and I'm not against it. Merely pointing out be lots of divergent,

Re: [Mesa-dev] [PATCH 7/7] egl: Fix DebugMessageControl(callback, NULL)

2016-09-09 Thread Emil Velikov

On 8 September 2016 at 18:47, Adam Jackson  wrote:
> Treat a null attribute list as meaning "don't change attributes". This
> is semantically equivalent to a list consisting of just EGL_NONE.
>
Same as before - please don't introduce known buggy code with one
patch only to fix it with a latter one in the series.

git commit --fixup && git rebase --autosquash is our friend.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 6/7] EGL: Fix some command names for EGL_KHR_debug

2016-09-09 Thread Emil Velikov

On 8 September 2016 at 18:47, Adam Jackson  wrote:
> From: Kyle Brenneman 
>
> Change a few EGL entrypoints to call a common internal function instead
> of forwarding to another entrypoint.
>
> If one EGL entrypoint calls another, then the second entrypoint would
> overwrite the current function name in the _EGLThreadInfo struct. That
> would cause it to pass the wrong function name to the EGL_KHR_debug
> callback.
>
This should _really_ be squashed with the respective patches. Ideally
with the minor refactoring split out as separate patch(es).

Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: support TGSI compute shaders with variable block size

2016-09-09 Thread Ilia Mirkin

On Fri, Sep 9, 2016 at 8:29 AM, Marek Olšák  wrote:
> On Fri, Sep 9, 2016 at 10:12 AM, Nicolai Hähnle  wrote:
>> From: Nicolai Hähnle 
>>
>> Not sure if it's possible to avoid programming the block size twice (once for
>> the userdata and once for the dispatch).
>>
>> Since the shaders are compiled with a pessimistic upper limit on the number 
>> of
>> registers, asynchronously compiling variants may be worth considering in the
>> future if we observe the shaders to be dispatched with small block sizes.
>> ---
>> I think this is sufficient to support variable group sizes on radeonsi, but
>> it's completely untested. Do you keep the latest version of your series in a
>> public repository somewhere?
>>
>>  src/gallium/drivers/radeonsi/si_compute.c | 10 +-
>>  src/gallium/drivers/radeonsi/si_shader.c  | 29 -
>>  src/gallium/drivers/radeonsi/si_shader.h  |  4 +++-
>>  3 files changed, 32 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
>> b/src/gallium/drivers/radeonsi/si_compute.c
>> index 5041761..26e096c 100644
>> --- a/src/gallium/drivers/radeonsi/si_compute.c
>> +++ b/src/gallium/drivers/radeonsi/si_compute.c
>> @@ -379,25 +379,33 @@ static void si_setup_tgsi_grid(struct si_context *sctx,
>> for (i = 0; i < 3; ++i) {
>> radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));
>> radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
>> COPY_DATA_DST_SEL(COPY_DATA_REG));
>> radeon_emit(cs, (va +  4 * i));
>> radeon_emit(cs, (va + 4 * i) >> 32);
>> radeon_emit(cs, (grid_size_reg >> 2) + i);
>> radeon_emit(cs, 0);
>> }
>> } else {
>> +   struct si_compute *program = sctx->cs_shader_state.program;
>> +   bool variable_group_size =
>> +   
>> program->shader.selector->info.properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH]
>>  == 0;
>>
>> -   radeon_set_sh_reg_seq(cs, grid_size_reg, 3);
>> +   radeon_set_sh_reg_seq(cs, grid_size_reg, variable_group_size 
>> ? 6 : 3);
>> radeon_emit(cs, info->grid[0]);
>> radeon_emit(cs, info->grid[1]);
>> radeon_emit(cs, info->grid[2]);
>> +   if (variable_group_size) {
>> +   radeon_emit(cs, info->block[0]);
>> +   radeon_emit(cs, info->block[1]);
>> +   radeon_emit(cs, info->block[2]);
>> +   }
>> }
>>  }
>>
>>  static void si_emit_dispatch_packets(struct si_context *sctx,
>>   const struct pipe_grid_info *info)
>>  {
>> struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
>> bool render_cond_bit = sctx->b.render_cond && 
>> !sctx->b.render_cond_force_off;
>> unsigned waves_per_threadgroup =
>> DIV_ROUND_UP(info->block[0] * info->block[1] * 
>> info->block[2], 64);
>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
>> b/src/gallium/drivers/radeonsi/si_shader.c
>> index 0b7de18..730ee21 100644
>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>> @@ -1783,30 +1783,35 @@ static void declare_system_value(
>>
>> case TGSI_SEMANTIC_GRID_SIZE:
>> value = LLVMGetParam(radeon_bld->main_fn, 
>> SI_PARAM_GRID_SIZE);
>> break;
>>
>> case TGSI_SEMANTIC_BLOCK_SIZE:
>> {
>> LLVMValueRef values[3];
>> unsigned i;
>> unsigned *properties = 
>> ctx->shader->selector->info.properties;
>> -   unsigned sizes[3] = {
>> -   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH],
>> -   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT],
>> -   properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH]
>> -   };
>>
>> -   for (i = 0; i < 3; ++i)
>> -   values[i] = lp_build_const_int32(gallivm, sizes[i]);
>> +   if (properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] != 0) {
>> +   unsigned sizes[3] = {
>> +   
>> properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH],
>> +   
>> properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT],
>> +   
>> properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH]
>> +   };
>> +
>> +   for (i = 0; i < 3; ++i)
>> +   values[i] = lp_build_const_int32(gallivm, 
>> sizes[i]);
>>
>> -   value = lp_build_gather_values(gallivm, values, 3);
>> +   value = lp_build_gather_values(gallivm, values, 3);
>> +   } else {

Re: [Mesa-dev] [PATCH v2 2/3] egl: return corresponding offset of EGLImage instead of 0.

2016-09-09 Thread Emil Velikov

On 9 September 2016 at 08:58, Chuanbo Weng  wrote:
> The offset should not always be 0. For example, if EGLImage is
> created from a 2D texture with EGL_GL_TEXTURE_LEVEL=1, then the
> offset should be the actual start of miplevel 1 in bo.
>
> v2: Add version check of __DRIimageExtension implementation
> (Suggested by Axel Davy).
>
Just to elaborate what Axel was saying from another perspective:

With current master the offset is explicitly zeroed, while with the
(v2) patch you'll _only_ set the value when you have v13 driver.
Thus using the loader + v12 driver you'll get a regression since a)
the offset will remain garbage and b) the function
(dri2_export_dma_buf_image_mesa) will fail.


> Signed-off-by: Chuanbo Weng 
> ---
>  src/egl/drivers/dri2/egl_dri2.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
> index 859612f..c529e55 100644
> --- a/src/egl/drivers/dri2/egl_dri2.c
> +++ b/src/egl/drivers/dri2/egl_dri2.c
> @@ -2249,6 +2249,7 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, 
> _EGLDisplay *disp, _EGLImage *im
> struct dri2_egl_image *dri2_img = dri2_egl_image(img);
>
> (void) drv;
> +   EGLint img_offset = 0;
>
> /* rework later to provide multiple fds/strides/offsets */
> if (fds)
> @@ -2259,8 +2260,14 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, 
> _EGLDisplay *disp, _EGLImage *im
>dri2_dpy->image->queryImage(dri2_img->dri_image,
>   __DRI_IMAGE_ATTRIB_STRIDE, strides);
>
> -   if (offsets)
> +   if (offsets){
>offsets[0] = 0;
> +  if(dri2_dpy->image->base.version >= 13){
Please move the variable declaration in local scope and add space between ){

Functionality wise we don't need the version check, esp. since you
want to set the offset only when queryImage() succeeds...

> + dri2_dpy->image->queryImage(dri2_img->dri_image,
> + __DRI_IMAGE_ATTRIB_OFFSET, _offset);
... which you don't seem to be checking here, so you'll effectively
store/use garbage.

So unless Alex feels strongly for the version check I'd omit it, and
fix the rest of this patch.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

1 2 >

1 - 100 of 137 matches

Mail list logo