Re: [Mesa-dev] [PATCH v2 22/42] glsl: Don't lower_variable_index_to_cond_assign for shared variables

2015-11-25 Thread Iago Toral
On Tue, 2015-11-17 at 21:54 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/glsl/lower_variable_index_to_cond_assign.cpp | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/glsl/lower_variable_index_to_cond_assign.cpp 
> b/src/glsl/lower_variable_index_to_cond_assign.cpp
> index 1ab3afe..a1ba934 100644
> --- a/src/glsl/lower_variable_index_to_cond_assign.cpp
> +++ b/src/glsl/lower_variable_index_to_cond_assign.cpp
> @@ -378,6 +378,9 @@ public:
>case ir_var_shader_storage:
>return this->lower_uniforms;
>  
> +  case ir_var_shader_shared:
> +  return false;

I suppose the right thing to do here is to add a lower_shared_variables
parameter to this and take its value from a compiler option that we set
to false, like we do with the other types, but I guess this is good
enough for now:

Reviewed-by: Iago Toral Quiroga 

>case ir_var_function_in:
>case ir_var_const_in:
>   return this->lower_temps;
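
The suggestion above (a dedicated option instead of a hard-coded "return false") could look roughly like the following. This is only a sketch in plain C: the enum, the struct and the lower_shared field are hypothetical stand-ins for ir_variable_mode and the visitor's members, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ir_variable_mode; only a few modes are listed. */
enum var_mode {
   VAR_MODE_TEMPORARY,
   VAR_MODE_UNIFORM,
   VAR_MODE_SHADER_STORAGE,
   VAR_MODE_SHADER_SHARED,
   VAR_MODE_FUNCTION_IN,
};

/* Per-storage-class switches, as they might be filled in from the
 * driver's compiler options. */
struct lowering_options {
   bool lower_temps;
   bool lower_uniforms;
   bool lower_shared; /* assumed new option, not part of the posted patch */
};

static bool
storage_type_needs_lowering(const struct lowering_options *opts,
                            enum var_mode mode)
{
   assert(opts != NULL);

   switch (mode) {
   case VAR_MODE_UNIFORM:
   case VAR_MODE_SHADER_STORAGE:
      return opts->lower_uniforms;
   case VAR_MODE_SHADER_SHARED:
      /* Driven by an option instead of a hard-coded "false". */
      return opts->lower_shared;
   case VAR_MODE_TEMPORARY:
   case VAR_MODE_FUNCTION_IN:
      return opts->lower_temps;
   }
   return false;
}
```

With lower_shared left at false this behaves like the patch as posted, while keeping the decision in the same place as the other storage classes.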


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 1/2] st/va: if h264 then delay decoder creation until max_references is known

2015-11-25 Thread Julien Isorce
From: Julien Isorce 

In general max_references cannot be based on num_render_targets.

This patch allows buffers to be allocated with accurate sizes.
For other codecs max_references is fixed at 2.

This is similar to the behaviour of vaapi/vdpau-driver.

XXX: do the same for HEVC

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/context.c  | 41 ++--
 src/gallium/state_trackers/va/picture.c  | 37 ++---
 src/gallium/state_trackers/va/picture_h264.c | 29 +++-
 src/gallium/state_trackers/va/va_private.h   |  4 +--
 4 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/src/gallium/state_trackers/va/context.c 
b/src/gallium/state_trackers/va/context.c
index f0051e5..985007b 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -187,7 +187,6 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID 
config_id, int picture_width,
   int picture_height, int flag, VASurfaceID *render_targets,
   int num_render_targets, VAContextID *context_id)
 {
-   struct pipe_video_codec templat = {};
vlVaDriver *drv;
vlVaContext *context;
int is_vpp;
@@ -213,27 +212,28 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID 
config_id, int picture_width,
  return VA_STATUS_ERROR_INVALID_CONTEXT;
   }
} else {
-  templat.profile = config_id;
-  templat.entrypoint = PIPE_VIDEO_ENTRYPOINT_BITSTREAM;
-  templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420;
-  templat.width = picture_width;
-  templat.height = picture_height;
-  templat.max_references = num_render_targets;
-  templat.expect_chunked_decode = true;
-
-  if (u_reduce_video_profile(templat.profile) ==
-PIPE_VIDEO_FORMAT_MPEG4_AVC)
-templat.level = u_get_h264_level(templat.width, templat.height,
- &templat.max_references);
-
-  context->decoder = drv->pipe->create_video_codec(drv->pipe, &templat);
-  if (!context->decoder) {
- FREE(context);
- return VA_STATUS_ERROR_ALLOCATION_FAILED;
+  context->templat.profile = config_id;
+  context->templat.entrypoint = PIPE_VIDEO_ENTRYPOINT_BITSTREAM;
+  context->templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420;
+  context->templat.width = picture_width;
+  context->templat.height = picture_height;
+  context->templat.max_references = 2;
+  context->templat.expect_chunked_decode = true;
+
+  /* Can only create decoders for which max_references is known. */
+  if (u_reduce_video_profile(context->templat.profile) !=
+ PIPE_VIDEO_FORMAT_MPEG4_AVC) {
+ context->decoder = drv->pipe->create_video_codec(drv->pipe,
+&context->templat);
+ if (!context->decoder) {
+FREE(context);
+return VA_STATUS_ERROR_ALLOCATION_FAILED;
+ }
   }
 
-  if (u_reduce_video_profile(context->decoder->profile) ==
+  if (u_reduce_video_profile(context->templat.profile) ==
  PIPE_VIDEO_FORMAT_MPEG4_AVC) {
+ context->templat.max_references = 0;
  context->desc.h264.pps = CALLOC_STRUCT(pipe_h264_pps);
  if (!context->desc.h264.pps) {
 FREE(context);
@@ -247,8 +247,9 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID 
config_id, int picture_width,
  }
   }
 
-  if (u_reduce_video_profile(context->decoder->profile) ==
+  if (u_reduce_video_profile(context->templat.profile) ==
 PIPE_VIDEO_FORMAT_HEVC) {
+ context->templat.max_references = num_render_targets;
  context->desc.h265.pps = CALLOC_STRUCT(pipe_h265_pps);
  if (!context->desc.h265.pps) {
 FREE(context);
diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 25d2940..e80873b 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -60,6 +60,12 @@ vlVaBeginPicture(VADriverContextP ctx, VAContextID 
context_id, VASurfaceID rende
 
context->target = surf->buffer;
if (!context->decoder) {
+  /* Decoder creation is delayed until max_references is set. */
+  if (u_reduce_video_profile(context->templat.profile) ==
+  PIPE_VIDEO_FORMAT_MPEG4_AVC)
+ return context->templat.max_references == 0 ?
+VA_STATUS_SUCCESS : VA_STATUS_ERROR_INVALID_CONTEXT;
+
   /* VPP */
   if ((context->target->buffer_format != PIPE_FORMAT_B8G8R8A8_UNORM  &&
context->target->buffer_format != PIPE_FORMAT_R8G8B8A8_UNORM  &&
@@ -67,6 +73,7 @@ vlVaBeginPicture(VADriverContextP ctx, VAContextID 
context_id, VASurfaceID rende
context->target->buffer_format != PIPE_FORMAT_R8G8B8X8_UNORM) ||
context->target->interlaced)
  return VA_STATUS_ERROR_UNIMPLEMENTED;
+
   return VA_STATUS_SUCCESS;
}
 
@@ -86,16 +93,18 @@ vlVaGetReferenceFrame(vlVaDriver *drv, VASur
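
The idea of the patch (create the decoder right away only when max_references is already known, otherwise wait for the picture parameters) can be sketched in isolation as follows. All types and function names here are hypothetical and much simpler than the real state tracker; the clamp to 16 follows the MIN2() used in picture_h264.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the state tracker types. */
enum codec { CODEC_OTHER, CODEC_H264 };

struct decoder {
   unsigned max_references;
};

struct context {
   enum codec codec;
   unsigned max_references; /* 0 means "not known yet" */
   struct decoder *decoder; /* NULL until the decoder exists */
};

static struct decoder *
create_decoder(unsigned max_references)
{
   struct decoder *dec = malloc(sizeof(*dec));
   if (dec)
      dec->max_references = max_references;
   return dec;
}

/* Context creation: only H.264 defers decoder creation. */
static int
context_init(struct context *ctx, enum codec codec)
{
   assert(ctx != NULL);
   ctx->codec = codec;
   ctx->decoder = NULL;

   if (codec == CODEC_H264) {
      ctx->max_references = 0;
      return 0;
   }

   ctx->max_references = 2;
   ctx->decoder = create_decoder(ctx->max_references);
   return ctx->decoder ? 0 : -1;
}

/* Picture parameters: create the decoder once the count is known. */
static int
context_set_num_ref_frames(struct context *ctx, unsigned num_ref_frames)
{
   assert(ctx != NULL);
   if (ctx->decoder || num_ref_frames == 0)
      return 0; /* already created, or still unknown */

   ctx->max_references = num_ref_frames < 16 ? num_ref_frames : 16;
   ctx->decoder = create_decoder(ctx->max_references);
   return ctx->decoder ? 0 : -1;
}

static void
context_fini(struct context *ctx)
{
   free(ctx->decoder);
   ctx->decoder = NULL;
}
```

Any entry point that can run before the picture parameters arrive has to tolerate a NULL decoder, which is what the new check in vlVaBeginPicture is for.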

Re: [Mesa-dev] [PATCH v2 1/2] st/va: if h264 then delay decoder creation until max_references is known

2015-11-25 Thread Julien Isorce
For commit message please read:

"HEVC case is left unchanged since delaying decoder creation is not needed
on AMD hardware."

instead of

"XXX: do the same for HEVC"

On 25 November 2015 at 09:07, Julien Isorce  wrote:

> From: Julien Isorce 
>
> In general max_references cannot be based on num_render_targets.
>
> This patch allows buffers to be allocated with accurate sizes.
> For other codecs max_references is fixed at 2.
>
> This is similar to the behaviour of vaapi/vdpau-driver.
>
> XXX: do the same for HEVC
>
> Signed-off-by: Julien Isorce 
> ---
>  src/gallium/state_trackers/va/context.c  | 41
> ++--
>  src/gallium/state_trackers/va/picture.c  | 37
> ++---
>  src/gallium/state_trackers/va/picture_h264.c | 29 +++-
>  src/gallium/state_trackers/va/va_private.h   |  4 +--
>  4 files changed, 78 insertions(+), 33 deletions(-)
>
> diff --git a/src/gallium/state_trackers/va/context.c
> b/src/gallium/state_trackers/va/context.c
> index f0051e5..985007b 100644
> --- a/src/gallium/state_trackers/va/context.c
> +++ b/src/gallium/state_trackers/va/context.c
> @@ -187,7 +187,6 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID
> config_id, int picture_width,
>int picture_height, int flag, VASurfaceID
> *render_targets,
>int num_render_targets, VAContextID *context_id)
>  {
> -   struct pipe_video_codec templat = {};
> vlVaDriver *drv;
> vlVaContext *context;
> int is_vpp;
> @@ -213,27 +212,28 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID
> config_id, int picture_width,
>   return VA_STATUS_ERROR_INVALID_CONTEXT;
>}
> } else {
> -  templat.profile = config_id;
> -  templat.entrypoint = PIPE_VIDEO_ENTRYPOINT_BITSTREAM;
> -  templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420;
> -  templat.width = picture_width;
> -  templat.height = picture_height;
> -  templat.max_references = num_render_targets;
> -  templat.expect_chunked_decode = true;
> -
> -  if (u_reduce_video_profile(templat.profile) ==
> -PIPE_VIDEO_FORMAT_MPEG4_AVC)
> -templat.level = u_get_h264_level(templat.width, templat.height,
> - &templat.max_references);
> -
> -  context->decoder = drv->pipe->create_video_codec(drv->pipe,
> &templat);
> -  if (!context->decoder) {
> - FREE(context);
> - return VA_STATUS_ERROR_ALLOCATION_FAILED;
> +  context->templat.profile = config_id;
> +  context->templat.entrypoint = PIPE_VIDEO_ENTRYPOINT_BITSTREAM;
> +  context->templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420;
> +  context->templat.width = picture_width;
> +  context->templat.height = picture_height;
> +  context->templat.max_references = 2;
> +  context->templat.expect_chunked_decode = true;
> +
> +  /* Can only create decoders for which max_references is known. */
> +  if (u_reduce_video_profile(context->templat.profile) !=
> + PIPE_VIDEO_FORMAT_MPEG4_AVC) {
> + context->decoder = drv->pipe->create_video_codec(drv->pipe,
> +&context->templat);
> + if (!context->decoder) {
> +FREE(context);
> +return VA_STATUS_ERROR_ALLOCATION_FAILED;
> + }
>}
>
> -  if (u_reduce_video_profile(context->decoder->profile) ==
> +  if (u_reduce_video_profile(context->templat.profile) ==
>   PIPE_VIDEO_FORMAT_MPEG4_AVC) {
> + context->templat.max_references = 0;
>   context->desc.h264.pps = CALLOC_STRUCT(pipe_h264_pps);
>   if (!context->desc.h264.pps) {
>  FREE(context);
> @@ -247,8 +247,9 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID
> config_id, int picture_width,
>   }
>}
>
> -  if (u_reduce_video_profile(context->decoder->profile) ==
> +  if (u_reduce_video_profile(context->templat.profile) ==
>  PIPE_VIDEO_FORMAT_HEVC) {
> + context->templat.max_references = num_render_targets;
>   context->desc.h265.pps = CALLOC_STRUCT(pipe_h265_pps);
>   if (!context->desc.h265.pps) {
>  FREE(context);
> diff --git a/src/gallium/state_trackers/va/picture.c
> b/src/gallium/state_trackers/va/picture.c
> index 25d2940..e80873b 100644
> --- a/src/gallium/state_trackers/va/picture.c
> +++ b/src/gallium/state_trackers/va/picture.c
> @@ -60,6 +60,12 @@ vlVaBeginPicture(VADriverContextP ctx, VAContextID
> context_id, VASurfaceID rende
>
> context->target = surf->buffer;
> if (!context->decoder) {
> +  /* Decoder creation is delayed until max_references is set. */
> +  if (u_reduce_video_profile(context->templat.profile) ==
> +  PIPE_VIDEO_FORMAT_MPEG4_AVC)
> + return context->templat.max_references == 0 ?
> +VA_STATUS_SUCCESS : VA_STATUS_ERROR_INVALID_CONTEXT;
> +
>/* VPP */
>if ((context->target->buffer_format != PIPE_FORMAT_B8G8R8A8_UNORM
> &&
> co

[Mesa-dev] [PATCH v2 2/2] st/va: also retrieve reference frames info for h264

2015-11-25 Thread Julien Isorce
From: Julien Isorce 

Hardware other than AMD requires parsing:
VAPictureParameterBufferH264.ReferenceFrames[16]

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/picture_h264.c | 53 
 1 file changed, 53 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture_h264.c 
b/src/gallium/state_trackers/va/picture_h264.c
index e9a8825..f811add 100644
--- a/src/gallium/state_trackers/va/picture_h264.c
+++ b/src/gallium/state_trackers/va/picture_h264.c
@@ -29,9 +29,24 @@
 #include "util/u_video.h"
 #include "va_private.h"
 
+
+static void resetReferencePictureDesc(struct pipe_h264_picture_desc *h264,
+  unsigned int i)
+{
+   h264->ref[i] = NULL;
+   h264->frame_num_list[i] = 0;
+   h264->is_long_term[i] = 0;
+   h264->top_is_reference[i] = 0;
+   h264->bottom_is_reference[i] = 0;
+   h264->field_order_cnt_list[i][0] = 0;
+   h264->field_order_cnt_list[i][1] = 0;
+}
+
 VAStatus vlVaHandlePictureParameterBufferH264(vlVaDriver *drv, vlVaContext 
*context, vlVaBuffer *buf)
 {
VAPictureParameterBufferH264 *h264 = buf->data;
+   unsigned int top_or_bottom_field;
+   unsigned i;
 
assert(buf->size >= sizeof(VAPictureParameterBufferH264) && 
buf->num_elements == 1);
/*CurrPic*/
@@ -91,10 +106,48 @@ VAStatus vlVaHandlePictureParameterBufferH264(vlVaDriver 
*drv, vlVaContext *cont
   h264->pic_fields.bits.redundant_pic_cnt_present_flag;
/*reference_pic_flag*/
context->desc.h264.frame_num = h264->frame_num;
+   context->desc.h264.is_reference = h264->pic_fields.bits.reference_pic_flag;
+   context->desc.h264.bottom_field_flag =
+  h264->pic_fields.bits.field_pic_flag &&
+  (h264->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD) != 0;
 
if (!context->decoder && context->desc.h264.num_ref_frames > 0)
   context->templat.max_references = 
MIN2(context->desc.h264.num_ref_frames, 16);
 
+   for (i = 0; i < context->templat.max_references; ++i) {
+  if ((h264->ReferenceFrames[i].flags & VA_PICTURE_H264_INVALID) ||
+  (h264->ReferenceFrames[i].picture_id == VA_INVALID_SURFACE)) {
+ resetReferencePictureDesc(&context->desc.h264, i);
+ break;
+  }
+
+  vlVaGetReferenceFrame(drv, h264->ReferenceFrames[i].picture_id, 
&context->desc.h264.ref[i]);
+  context->desc.h264.frame_num_list[i] = 
h264->ReferenceFrames[i].frame_idx;
+
+  top_or_bottom_field = h264->ReferenceFrames[i].flags &
+ (VA_PICTURE_H264_TOP_FIELD | VA_PICTURE_H264_BOTTOM_FIELD);
+  context->desc.h264.is_long_term[i] = (h264->ReferenceFrames[i].flags &
+ (VA_PICTURE_H264_SHORT_TERM_REFERENCE |
+ VA_PICTURE_H264_LONG_TERM_REFERENCE)) !=
+ VA_PICTURE_H264_SHORT_TERM_REFERENCE;
+  context->desc.h264.top_is_reference[i] =
+ !context->desc.h264.is_long_term[i] ||
+ !!(h264->ReferenceFrames[i].flags & VA_PICTURE_H264_TOP_FIELD);
+  context->desc.h264.bottom_is_reference[i] =
+ !context->desc.h264.is_long_term[i] ||
+ !!(h264->ReferenceFrames[i].flags & VA_PICTURE_H264_BOTTOM_FIELD);
+  context->desc.h264.field_order_cnt_list[i][0] =
+ top_or_bottom_field != VA_PICTURE_H264_BOTTOM_FIELD ?
+ h264->ReferenceFrames[i].TopFieldOrderCnt: INT_MAX;
+  context->desc.h264.field_order_cnt_list[i][1] =
+ top_or_bottom_field != VA_PICTURE_H264_TOP_FIELD ?
+ h264->ReferenceFrames[i].BottomFieldOrderCnt: INT_MAX;
+   }
+
+   /* Make sure remaining elements are clean */
+   for (; i < 16; ++i)
+  resetReferencePictureDesc(&context->desc.h264, i);
+
/* Create the decoder once max_references is known. */
if (!context->decoder) {
   if (!context->target)
-- 
1.9.1
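
The per-reference flag handling in the loop above can be read in isolation as the following sketch. The PIC_* constants and both structs are local stand-ins so that the block compiles without libva; they are assumptions for illustration and not the VA_PICTURE_H264_* definitions themselves.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the VA_PICTURE_H264_* flags (assumed values). */
#define PIC_TOP_FIELD    0x02u
#define PIC_BOTTOM_FIELD 0x04u
#define PIC_SHORT_TERM   0x08u
#define PIC_LONG_TERM    0x10u

struct ref_in {
   unsigned flags;
   int top_field_order_cnt;
   int bottom_field_order_cnt;
};

struct ref_out {
   bool is_long_term;
   bool top_is_reference;
   bool bottom_is_reference;
   int field_order_cnt[2];
};

static void
translate_reference(const struct ref_in *in, struct ref_out *out)
{
   assert(in != NULL && out != NULL);

   unsigned field = in->flags & (PIC_TOP_FIELD | PIC_BOTTOM_FIELD);
   unsigned term = in->flags & (PIC_SHORT_TERM | PIC_LONG_TERM);

   /* Anything that is not exactly "short term" counts as long term. */
   out->is_long_term = term != PIC_SHORT_TERM;

   /* Short-term references use both fields; long-term ones only the
    * fields that are flagged explicitly. */
   out->top_is_reference =
      !out->is_long_term || (in->flags & PIC_TOP_FIELD) != 0;
   out->bottom_is_reference =
      !out->is_long_term || (in->flags & PIC_BOTTOM_FIELD) != 0;

   /* A field that is absent gets INT_MAX as its order count. */
   out->field_order_cnt[0] =
      field != PIC_BOTTOM_FIELD ? in->top_field_order_cnt : INT_MAX;
   out->field_order_cnt[1] =
      field != PIC_TOP_FIELD ? in->bottom_field_order_cnt : INT_MAX;
}
```

A short-term frame keeps both order counts, while a top-field-only long-term reference ends up with INT_MAX in the bottom slot.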



Re: [Mesa-dev] [PATCH] i965: Don't inline intel_batchbuffer_require_space().

2015-11-25 Thread Chris Wilson
On Tue, Nov 24, 2015 at 11:07:54PM -0800, Kenneth Graunke wrote:
> On Tuesday, November 24, 2015 05:17:29 PM Matt Turner wrote:
> > It's called by the inline intel_batchbuffer_begin() function which
> > itself is used in BEGIN_BATCH. So in a sequence of code emitting multiple
> > packets, we have inlined this ~200 byte function multiple times. Making
> > it an out-of-line function presumably improved icache usage.
> > 
> > Improves performance of Gl32Batch7 by 3.39898% +/- 0.358674% (n=155) on
> > Ivybridge.
> 
> That's kind of sad.  When I added the render ring prelude code to this
> function, Eric was concerned about overhead like this.  I do wonder
> whether we'd be better off doing explicit ring switching, like I did
> on the 'ringswitch' branch of my tree.  That kills a bunch of the code
> on every BEGIN_BATCH().

Also note that I sent a bunch of patches earlier to remove the extra
code from BEGIN_BATCH and a ton of unnecessary work in relocation and
batch construction. In total they gain about 20% on *Batch7 (more on
byt/bsw, where we couldn't emit batches fast enough to keep the GPU busy).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
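
For readers following the inlining discussion: the pattern is a tiny inline wrapper that is expanded at every call site, plus a larger space check that exists only once. Below is a minimal sketch with a toy batch structure; the names, the 64-dword size and the "flush" behaviour are made up for illustration and this is not the i965 code.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

#define BATCH_DWORDS 64

/* Toy batch buffer, not the real intel_batchbuffer. */
struct batch {
   uint32_t map[BATCH_DWORDS];
   unsigned used;
   unsigned flushes;
};

/* Out-of-line: the body is emitted once, however many packets are
 * built. Here "flushing" just resets the write position. */
static NOINLINE void
batch_require_space(struct batch *b, unsigned dwords)
{
   assert(dwords <= BATCH_DWORDS);
   if (b->used + dwords > BATCH_DWORDS) {
      b->used = 0;
      b->flushes++;
   }
}

/* Inline wrapper: cheap to duplicate at every call site. */
static inline void
batch_begin(struct batch *b, unsigned dwords)
{
   batch_require_space(b, dwords);
}

static inline void
batch_emit(struct batch *b, uint32_t dword)
{
   assert(b->used < BATCH_DWORDS);
   b->map[b->used++] = dword;
}
```

Whether this layout wins in practice depends on how much code ends up at each call site, which is the point the messages above are arguing about.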


Re: [Mesa-dev] [PATCH v2 23/42] glsl: Add lowering pass for shared variable references

2015-11-25 Thread Iago Toral
On Tue, 2015-11-17 at 21:54 -0800, Jordan Justen wrote:
> In this lowering pass, shared variables are decomposed into intrinsic
> calls.
> 
> v2:
>  * Send mem_ctx as a parameter (Iago)
> 
> Signed-off-by: Jordan Justen 
> ---
>  src/glsl/Makefile.sources   |   1 +
>  src/glsl/ir_optimization.h  |   1 +
>  src/glsl/linker.cpp |   4 +
>  src/glsl/lower_shared_reference.cpp | 365 
> 
>  src/mesa/main/mtypes.h  |   7 +
>  5 files changed, 378 insertions(+)
>  create mode 100644 src/glsl/lower_shared_reference.cpp
> 
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index 1b51116..d73cfc5 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -181,6 +181,7 @@ LIBGLSL_FILES = \
>   lower_vector_insert.cpp \
>   lower_vertex_id.cpp \
>   lower_output_reads.cpp \
> + lower_shared_reference.cpp \
>   lower_ubo_reference.cpp \
>   opt_algebraic.cpp \
>   opt_array_splitting.cpp \
> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
> index 2fee81c..dabd80a 100644
> --- a/src/glsl/ir_optimization.h
> +++ b/src/glsl/ir_optimization.h
> @@ -124,6 +124,7 @@ bool lower_const_arrays_to_uniforms(exec_list 
> *instructions);
>  bool lower_clip_distance(gl_shader *shader);
>  void lower_output_reads(unsigned stage, exec_list *instructions);
>  bool lower_packing_builtins(exec_list *instructions, int op_mask);
> +void lower_shared_reference(struct gl_shader *shader, unsigned *shared_size);
>  void lower_ubo_reference(struct gl_shader *shader);
>  void lower_packed_varyings(void *mem_ctx,
> unsigned locations_used, ir_variable_mode mode,
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index db00f8f..dedcf77 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -4452,6 +4452,10 @@ link_shaders(struct gl_context *ctx, struct 
> gl_shader_program *prog)
>if (ctx->Const.ShaderCompilerOptions[i].LowerBufferInterfaceBlocks)
>   lower_ubo_reference(prog->_LinkedShaders[i]);
>  
> +  if (ctx->Const.ShaderCompilerOptions[i].LowerShaderSharedVariables)
> + lower_shared_reference(prog->_LinkedShaders[i],
> +&prog->Comp.SharedSize);
> +
>lower_vector_derefs(prog->_LinkedShaders[i]);
> }
>  
> diff --git a/src/glsl/lower_shared_reference.cpp 
> b/src/glsl/lower_shared_reference.cpp
> new file mode 100644
> index 000..21bc5d5
> --- /dev/null
> +++ b/src/glsl/lower_shared_reference.cpp
> @@ -0,0 +1,365 @@
> +/*
> + * Copyright (c) 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +/**
> + * \file lower_shared_reference.cpp
> + *
> + * IR lower pass to replace dereferences of compute shader shared variables
> + * with intrinsic function calls.
> + *
> + * This relieves drivers of the responsibility of allocating space for the
> + * shared variables in the shared memory region.
> + */
> +
> +#include "ir.h"
> +#include "ir_builder.h"
> +#include "ir_rvalue_visitor.h"
> +#include "main/macros.h"
> +#include "util/list.h"
> +#include "glsl_parser_extras.h"
> +#include "lower_buffer_access.h"
> +
> +using namespace ir_builder;
> +
> +namespace {
> +
> +struct var_offset {
> +   struct list_head node;
> +   const ir_variable *var;
> +   unsigned offset;
> +};
> +
> +class lower_shared_reference_visitor :
> +  public lower_buffer_access::lower_buffer_access {
> +public:
> +
> +   lower_shared_reference_visitor(struct gl_shader *shader)
> +  : 
> lower_buffer_access::lower_buffer_access(GLSL_MATRIX_LAYOUT_COLUMN_MAJOR),
> +list_ctx(ralloc_context(NULL)), shader(shader),
> +shared_size(0u)
> +   {
> +  list_inithead(&var_offsets);
> +   }
> +
> +   ~lower_shared_reference_visitor()
> +   {
> +   

Re: [Mesa-dev] [PATCH v2 21/42] glsl: Add default matrix ordering in lower_buffer_access

2015-11-25 Thread Iago Toral

On Tue, 2015-11-17 at 21:54 -0800, Jordan Justen wrote:
> For compute shader shared variables we will set a default of column
> major.
> 
> Signed-off-by: Jordan Justen 
> ---
>  src/glsl/lower_buffer_access.cpp |  5 +++--
>  src/glsl/lower_buffer_access.h   | 10 ++
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/src/glsl/lower_buffer_access.cpp 
> b/src/glsl/lower_buffer_access.cpp
> index 297ed69..66e7abe 100644
> --- a/src/glsl/lower_buffer_access.cpp
> +++ b/src/glsl/lower_buffer_access.cpp
> @@ -281,8 +281,9 @@ 
> lower_buffer_access::is_dereferenced_thing_row_major(const ir_rvalue *deref)
>  
>   switch (matrix_layout) {
>   case GLSL_MATRIX_LAYOUT_INHERITED:
> -assert(!matrix);
> -return false;
> +assert(default_matrix_layout != GLSL_MATRIX_LAYOUT_INHERITED ||
> +   !matrix);
> +return default_matrix_layout == GLSL_MATRIX_LAYOUT_ROW_MAJOR;

I am not sure I understand this. If shared variables are column major by
default, then isn't that the same behavior we have for ubos and ssbos?
In what case is this needed?

>   case GLSL_MATRIX_LAYOUT_COLUMN_MAJOR:
>  return false;
>   case GLSL_MATRIX_LAYOUT_ROW_MAJOR:
> diff --git a/src/glsl/lower_buffer_access.h b/src/glsl/lower_buffer_access.h
> index f8e1070..82b35ed 100644
> --- a/src/glsl/lower_buffer_access.h
> +++ b/src/glsl/lower_buffer_access.h
> @@ -39,6 +39,14 @@ namespace lower_buffer_access {
>  
>  class lower_buffer_access : public ir_rvalue_enter_visitor {
>  public:
> +   lower_buffer_access() :
> +  default_matrix_layout(GLSL_MATRIX_LAYOUT_INHERITED)
> +   {}
> +
> +   lower_buffer_access(enum glsl_matrix_layout default_matrix_layout) :
> +  default_matrix_layout(default_matrix_layout)
> +   {}
> +
> virtual void
> insert_buffer_access(void *mem_ctx, ir_dereference *deref,
>  const glsl_type *type, ir_rvalue *offset,
> @@ -55,6 +63,8 @@ public:
>  ir_rvalue **offset, unsigned *const_offset,
>  bool *row_major, int *matrix_columns,
>  unsigned packing);
> +
> +   enum glsl_matrix_layout default_matrix_layout;
>  };
>  
>  } /* namespace lower_buffer_access */
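
One way to read the hunk in question, written as a plain C sketch with hypothetical names: with a column-major default, the value returned for an inherited layout is the same as before (false), and what changes is only when the assertion allows a matrix to reach that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for enum glsl_matrix_layout. */
enum matrix_layout {
   LAYOUT_INHERITED,
   LAYOUT_COLUMN_MAJOR,
   LAYOUT_ROW_MAJOR,
};

/* Mirrors the condition of the new assert: an inherited layout on a
 * matrix is only acceptable when a real default has been configured. */
static bool
inherited_layout_is_valid(bool is_matrix, enum matrix_layout default_layout)
{
   return default_layout != LAYOUT_INHERITED || !is_matrix;
}

/* Mirrors the switch: an inherited layout falls back to the default. */
static bool
is_row_major(enum matrix_layout declared, enum matrix_layout default_layout)
{
   switch (declared) {
   case LAYOUT_INHERITED:
      return default_layout == LAYOUT_ROW_MAJOR;
   case LAYOUT_COLUMN_MAJOR:
      return false;
   case LAYOUT_ROW_MAJOR:
      return true;
   }
   return false;
}
```

With the default left at LAYOUT_INHERITED both helpers behave like the code before the patch.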





Re: [Mesa-dev] [PATCH 1/3] dri_interface: Introduce __DRI_IMAGE_USE_SCANOUT_ROTATION_* flags (v3)

2015-11-25 Thread Michel Dänzer
On 21.11.2015 12:38, Vivek Kasireddy wrote:
> These flags can be used by the DRI driver to set additional requirements
> such as tiling while creating buffers.
> 
> v2: Added a brief comment to explain the rotation orientation.
> 
> v3: Corrected a typo in the comment added in v2 and removed an empty
> line.
> 
> Cc: Michel Danzer 
> Signed-off-by: Vivek Kasireddy 

The series is

Reviewed-by: Michel Dänzer 


Do you need somebody to push the patches for you?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


[Mesa-dev] [Bug 93100] [IVB]piglit/spec/opengl_3.3/gl-3.2-layered-rendering-framebuffertexture fails

2015-11-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93100

Bug ID: 93100
   Summary: [IVB]piglit/spec/opengl_3.3/gl-3.2-layered-rendering-framebuffertexture fails
   Product: Mesa
   Version: unspecified
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: marius.pre...@intel.com
QA Contact: mesa-dev@lists.freedesktop.org

Using mesa master commit 685335639a982b398d305b8f314fc3857fcdbead
and piglit master commit 566627da9f3c728036212ef0dad9e12549da586d.

Steps to reproduce:
/bin/gl-3.2-layered-rendering-framebuffertexture -auto -fbo



Re: [Mesa-dev] [PATCH v2 24/42] nir: Translate glsl shared var load intrinsic to nir intrinsic

2015-11-25 Thread Iago Toral
On Tue, 2015-11-17 at 21:54 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/glsl/nir/glsl_to_nir.cpp  | 29 +
>  src/glsl/nir/nir_intrinsics.h |  1 +
>  2 files changed, 30 insertions(+)
> 
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index 6d24341..a59d09c 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -727,6 +727,8 @@ nir_visitor::visit(ir_call *ir)
>   op = nir_intrinsic_memory_barrier_image;
>} else if (strcmp(ir->callee_name(), 
> "__intrinsic_memory_barrier_shared") == 0) {
>   op = nir_intrinsic_memory_barrier_shared;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_load_shared") == 0) {
> + op = nir_intrinsic_load_shared;
>} else {
>   unreachable("not reached");
>}
> @@ -974,6 +976,33 @@ nir_visitor::visit(ir_call *ir)
>   nir_builder_instr_insert(&b, &instr->instr);
>   break;
>}
> +  case nir_intrinsic_load_shared: {
> + exec_node *param = ir->actual_parameters.get_head();
> + ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
> +
> + /* Check if we need the indirect version */
> + ir_constant *const_offset = offset->as_constant();
> + if (!const_offset) {
> +op = nir_intrinsic_load_shared_indirect;
> +ralloc_free(instr);
> +instr = nir_intrinsic_instr_create(shader, op);
> +instr->src[0] = nir_src_for_ssa(evaluate_rvalue(offset));
> +instr->const_index[0] = 0;
> +dest = &instr->dest;
> + } else {
> +instr->const_index[0] = const_offset->value.u[0];
> + }
> +
> + const glsl_type *type = ir->return_deref->var->type;
> + instr->num_components = type->vector_elements;
> +
> + /* Setup destination register */
> + nir_ssa_dest_init(&instr->instr, &instr->dest,
> +   type->vector_elements, NULL);
> +
> + nir_builder_instr_insert(&b, &instr->instr);

For UBO and SSBO loads we need to fix up bool types; it looks like you
would need to do the same here, right?

> + break;
> +  }
>default:
>   unreachable("not reached");
>}
> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
> index b8d7d6c..de15128 100644
> --- a/src/glsl/nir/nir_intrinsics.h
> +++ b/src/glsl/nir/nir_intrinsics.h
> @@ -257,6 +257,7 @@ LOAD(per_vertex_input, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE 
> | NIR_INTRINSIC_CAN_REO
>  LOAD(ssbo, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE)
>  LOAD(output, 0, 1, NIR_INTRINSIC_CAN_ELIMINATE)
>  LOAD(per_vertex_output, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE)
> +LOAD(shared, 0, 1, NIR_INTRINSIC_CAN_ELIMINATE)
>  
>  /*
>   * Stores work the same way as loads, except now the first register input is




Re: [Mesa-dev] [PATCH v2 1/2] st/va: if h264 then delay decoder creation until max_references is known

2015-11-25 Thread Christian König

On 25.11.2015 10:12, Julien Isorce wrote:

For commit message please read:

"HEVC case is left unchanged since delaying decoder creation is not 
needed on AMD hardware."


In this case please update the commit message, but honestly I am not sure 
that we don't use max_references somewhere in the DPB calculation for 
HEVC.


Some more comments below.



instead of

"XXX: do the same for HEVC"

On 25 November 2015 at 09:07, Julien Isorce wrote:


From: Julien Isorce <julien.iso...@gmail.com>

In general max_references cannot be based on num_render_targets.

This patch allows buffers to be allocated with accurate sizes.
For other codecs max_references is fixed at 2.

This is similar to the behaviour of vaapi/vdpau-driver.

XXX: do the same for HEVC

Signed-off-by: Julien Isorce <j.iso...@samsung.com>
---
 src/gallium/state_trackers/va/context.c  | 41
++--
 src/gallium/state_trackers/va/picture.c  | 37
++---
 src/gallium/state_trackers/va/picture_h264.c | 29
+++-
 src/gallium/state_trackers/va/va_private.h   |  4 +--
 4 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/src/gallium/state_trackers/va/context.c
b/src/gallium/state_trackers/va/context.c
index f0051e5..985007b 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -187,7 +187,6 @@ vlVaCreateContext(VADriverContextP ctx,
VAConfigID config_id, int picture_width,
   int picture_height, int flag, VASurfaceID
*render_targets,
   int num_render_targets, VAContextID *context_id)
 {
-   struct pipe_video_codec templat = {};
vlVaDriver *drv;
vlVaContext *context;
int is_vpp;
@@ -213,27 +212,28 @@ vlVaCreateContext(VADriverContextP ctx,
VAConfigID config_id, int picture_width,
  return VA_STATUS_ERROR_INVALID_CONTEXT;
   }
} else {
-  templat.profile = config_id;
-  templat.entrypoint = PIPE_VIDEO_ENTRYPOINT_BITSTREAM;
-  templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420;
-  templat.width = picture_width;
-  templat.height = picture_height;
-  templat.max_references = num_render_targets;
-  templat.expect_chunked_decode = true;
-
-  if (u_reduce_video_profile(templat.profile) ==
-PIPE_VIDEO_FORMAT_MPEG4_AVC)
-templat.level = u_get_h264_level(templat.width,
templat.height,
- &templat.max_references);
-
-  context->decoder = drv->pipe->create_video_codec(drv->pipe,
&templat);
-  if (!context->decoder) {
- FREE(context);
- return VA_STATUS_ERROR_ALLOCATION_FAILED;
+  context->templat.profile = config_id;
+  context->templat.entrypoint = PIPE_VIDEO_ENTRYPOINT_BITSTREAM;
+  context->templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420;
+  context->templat.width = picture_width;
+  context->templat.height = picture_height;
+  context->templat.max_references = 2;
+  context->templat.expect_chunked_decode = true;
+
+  /* Can only create decoders for which max_references is
known. */
+  if (u_reduce_video_profile(context->templat.profile) !=
+ PIPE_VIDEO_FORMAT_MPEG4_AVC) {
+ context->decoder = drv->pipe->create_video_codec(drv->pipe,
+&context->templat);
+ if (!context->decoder) {
+FREE(context);
+return VA_STATUS_ERROR_ALLOCATION_FAILED;
+ }
   }

-  if (u_reduce_video_profile(context->decoder->profile) ==
+  if (u_reduce_video_profile(context->templat.profile) ==
  PIPE_VIDEO_FORMAT_MPEG4_AVC) {



Please join this check with the one above, or maybe even turn it into a
switch statement.


Apart from that, this looks good to me.


+ context->templat.max_references = 0;
  context->desc.h264.pps = CALLOC_STRUCT(pipe_h264_pps);
  if (!context->desc.h264.pps) {
 FREE(context);
@@ -247,8 +247,9 @@ vlVaCreateContext(VADriverContextP ctx,
VAConfigID config_id, int picture_width,
  }
   }

-  if (u_reduce_video_profile(context->decoder->profile) ==
+  if (u_reduce_video_profile(context->templat.profile) ==
 PIPE_VIDEO_FORMAT_HEVC) {
+ context->templat.max_references = num_render_targets;
  context->desc.h265.pps = CALLOC_STRUCT(pipe_h265_pps);
  if (!context->desc.h265.pps) {
 FREE(context);
diff --git a/src/gallium/state_trackers/va/picture.c
b/src/gallium/state_trackers/va/picture.c
index 25d2940..e80873b 100644
--- a/src/gallium/state_trackers/va/picture.c
++
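
The review comment about joining the checks into a switch statement could be sketched as below. The enum, the struct and the function are hypothetical simplifications. Note that this sketch sets the HEVC value before any decoder would be created, which is an assumption about the intended order and not what the posted hunk does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the reduced video profile. */
enum video_format { FORMAT_OTHER, FORMAT_MPEG4_AVC, FORMAT_HEVC };

struct codec_template {
   enum video_format format;
   unsigned max_references;
};

/* Fills in max_references per codec in a single switch and reports
 * whether the decoder can be created immediately. */
static bool
prepare_template(struct codec_template *t, unsigned num_render_targets)
{
   assert(t != NULL);

   switch (t->format) {
   case FORMAT_MPEG4_AVC:
      /* Unknown until the picture parameters arrive: defer creation. */
      t->max_references = 0;
      return false;
   case FORMAT_HEVC:
      t->max_references = num_render_targets;
      return true;
   default:
      t->max_references = 2;
      return true;
   }
}
```

The per-codec allocations (the pps structures in the real code) would then live in the same case labels.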

Re: [Mesa-dev] [PATCH v2 2/2] st/va: also retrieve reference frames info for h264

2015-11-25 Thread Christian König

On 25.11.2015 10:13, Julien Isorce wrote:

From: Julien Isorce 

Hardware other than AMD requires parsing:
VAPictureParameterBufferH264.ReferenceFrames[16]

Signed-off-by: Julien Isorce 


Reviewed-by: Christian König 


---
  src/gallium/state_trackers/va/picture_h264.c | 53 
  1 file changed, 53 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture_h264.c 
b/src/gallium/state_trackers/va/picture_h264.c
index e9a8825..f811add 100644
--- a/src/gallium/state_trackers/va/picture_h264.c
+++ b/src/gallium/state_trackers/va/picture_h264.c
@@ -29,9 +29,24 @@
  #include "util/u_video.h"
  #include "va_private.h"
  
+

+static void resetReferencePictureDesc(struct pipe_h264_picture_desc *h264,
+  unsigned int i)
+{
+   h264->ref[i] = NULL;
+   h264->frame_num_list[i] = 0;
+   h264->is_long_term[i] = 0;
+   h264->top_is_reference[i] = 0;
+   h264->bottom_is_reference[i] = 0;
+   h264->field_order_cnt_list[i][0] = 0;
+   h264->field_order_cnt_list[i][1] = 0;
+}
+
  VAStatus vlVaHandlePictureParameterBufferH264(vlVaDriver *drv, vlVaContext 
*context, vlVaBuffer *buf)
  {
 VAPictureParameterBufferH264 *h264 = buf->data;
+   unsigned int top_or_bottom_field;
+   unsigned i;
  
 assert(buf->size >= sizeof(VAPictureParameterBufferH264) && buf->num_elements == 1);

 /*CurrPic*/
@@ -91,10 +106,48 @@ VAStatus vlVaHandlePictureParameterBufferH264(vlVaDriver 
*drv, vlVaContext *cont
h264->pic_fields.bits.redundant_pic_cnt_present_flag;
 /*reference_pic_flag*/
 context->desc.h264.frame_num = h264->frame_num;
+   context->desc.h264.is_reference = h264->pic_fields.bits.reference_pic_flag;
+   context->desc.h264.bottom_field_flag =
+  h264->pic_fields.bits.field_pic_flag &&
+  (h264->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD) != 0;
  
 if (!context->decoder && context->desc.h264.num_ref_frames > 0)

context->templat.max_references = 
MIN2(context->desc.h264.num_ref_frames, 16);
  
+   for (i = 0; i < context->templat.max_references; ++i) {

+  if ((h264->ReferenceFrames[i].flags & VA_PICTURE_H264_INVALID) ||
+  (h264->ReferenceFrames[i].picture_id == VA_INVALID_SURFACE)) {
+ resetReferencePictureDesc(&context->desc.h264, i);
+ break;
+  }
+
+  vlVaGetReferenceFrame(drv, h264->ReferenceFrames[i].picture_id, 
&context->desc.h264.ref[i]);
+  context->desc.h264.frame_num_list[i] = 
h264->ReferenceFrames[i].frame_idx;
+
+  top_or_bottom_field = h264->ReferenceFrames[i].flags &
+ (VA_PICTURE_H264_TOP_FIELD | VA_PICTURE_H264_BOTTOM_FIELD);
+  context->desc.h264.is_long_term[i] = (h264->ReferenceFrames[i].flags &
+ (VA_PICTURE_H264_SHORT_TERM_REFERENCE |
+ VA_PICTURE_H264_LONG_TERM_REFERENCE)) !=
+ VA_PICTURE_H264_SHORT_TERM_REFERENCE;
+  context->desc.h264.top_is_reference[i] =
+ !context->desc.h264.is_long_term[i] ||
+ !!(h264->ReferenceFrames[i].flags & VA_PICTURE_H264_TOP_FIELD);
+  context->desc.h264.bottom_is_reference[i] =
+ !context->desc.h264.is_long_term[i] ||
+ !!(h264->ReferenceFrames[i].flags & VA_PICTURE_H264_BOTTOM_FIELD);
+  context->desc.h264.field_order_cnt_list[i][0] =
+ top_or_bottom_field != VA_PICTURE_H264_BOTTOM_FIELD ?
+ h264->ReferenceFrames[i].TopFieldOrderCnt: INT_MAX;
+  context->desc.h264.field_order_cnt_list[i][1] =
+ top_or_bottom_field != VA_PICTURE_H264_TOP_FIELD ?
+ h264->ReferenceFrames[i].BottomFieldOrderCnt: INT_MAX;
+   }
+
+   /* Make sure remaining elements are clean */
+   for (; i < 16; ++i)
+  resetReferencePictureDesc(&context->desc.h264, i);
+
 /* Create the decoder once max_references is known. */
 if (!context->decoder) {
if (!context->target)


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] SSO fixes V3

2015-11-25 Thread Timothy Arceri
V3:
- Fixed explicit location optimisation (Patch 1)
- Added fix for assigning varyings to the same location when
 using arrays (Patch 6)
- Pushed some clean-up patches reviewed in v2

V2: I've dropped most of my fix from V1 and instead included the fixes
from Gregory which I have cleaned up slightly. I've also added a
patch to remove inactive varyings with an explicit location.

The removal of inactive varyings with an explicit location results
in no changes in the public shader-db, although it's unlikely that many,
if any, shaders in there make use of SSO.

I also ran the series on Intel's CI system with no regressions.

The series fixes the following piglit tests:

piglit.spec.arb_separate_shader_objects.rendezvous by name
piglit.spec.arb_separate_shader_objects.rendezvous by name with multiple 
interpolation qualifier
http://patchwork.freedesktop.org/patch/66094/



[Mesa-dev] [PATCH 1/6] glsl: optimise inputs/outputs with explicit locations

2015-11-25 Thread Timothy Arceri
This change allows user-defined inputs/outputs with explicit locations
to be removed if they are detected to not be used between shaders
at link time.

To enable this we change the is_unmatched_generic_inout field to be
flagged when we have a user-defined varying. Previously
explicit_location was assumed to be set only in builtins; however, SSO
allows the user to set an explicit location.

We then add a function to match explicit locations between shaders.
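
The matching idea can be sketched roughly as follows. This is a simplified illustration, not the linker code: Var stands in for the few ir_variable fields involved, and kMaxVarying and match_explicit_locations are hypothetical names. Outputs are recorded in a table indexed by generic location, and any input that finds a producer at its location clears the unmatched flag on both sides:

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the ir_variable fields the patch touches.
struct Var {
   bool explicit_location;
   int location;        // generic slot, i.e. location - VARYING_SLOT_VAR0
   bool is_unmatched;   // mirrors is_unmatched_generic_inout
};

constexpr int kMaxVarying = 32;   // assumed size, stands in for MAX_VARYING

// Clear the unmatched flag on every output/input pair that shares an
// explicit location, so neither side is optimised away.
inline void match_explicit_locations(std::vector<Var> &outputs,
                                     std::vector<Var> &inputs)
{
   std::array<Var *, kMaxVarying> by_location{};   // all nullptr

   // Record the first output seen at each explicit location.
   for (Var &out : outputs) {
      if (!out.explicit_location)
         continue;
      assert(out.location >= 0 && out.location < kMaxVarying);
      if (by_location[out.location] == nullptr)
         by_location[out.location] = &out;
   }

   // Look up each explicitly located input in the table.
   for (Var &in : inputs) {
      if (!in.explicit_location)
         continue;
      assert(in.location >= 0 && in.location < kMaxVarying);
      Var *out = by_location[in.location];
      if (out != nullptr) {
         in.is_unmatched = false;
         out->is_unmatched = false;
      }
   }
}
```

Variables left flagged as unmatched after this step are the ones that later passes are free to remove.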

V2: call match_explicit_outputs_to_inputs() after
is_unmatched_generic_inout has been initialised.

Cc: Gregory Hainaut 
---
 src/glsl/link_varyings.cpp |  6 ++--
 src/glsl/linker.cpp| 82 +++---
 2 files changed, 74 insertions(+), 14 deletions(-)

diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index c0b4b3e..ac2755f 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -896,8 +896,10 @@ varying_matches::record(ir_variable *producer_var, 
ir_variable *consumer_var)
 {
assert(producer_var != NULL || consumer_var != NULL);
 
-   if ((producer_var && !producer_var->data.is_unmatched_generic_inout)
-   || (consumer_var && !consumer_var->data.is_unmatched_generic_inout)) {
+   if ((producer_var && (!producer_var->data.is_unmatched_generic_inout ||
+   producer_var->data.explicit_location)) ||
+   (consumer_var && (!consumer_var->data.is_unmatched_generic_inout ||
+   consumer_var->data.explicit_location))) {
   /* Either a location already exists for this variable (since it is part
* of fixed functionality), or it has already been recorded as part of a
* previous match.
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 5ff433c..930b585 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -631,20 +631,12 @@ link_invalidate_variable_locations(exec_list *ir)
 
   /* ir_variable::is_unmatched_generic_inout is used by the linker while
* connecting outputs from one stage to inputs of the next stage.
-   *
-   * There are two implicit assumptions here.  First, we assume that any
-   * built-in variable (i.e., non-generic in or out) will have
-   * explicit_location set.  Second, we assume that any generic in or out
-   * will not have explicit_location set.
-   *
-   * This second assumption will only be valid until
-   * GL_ARB_separate_shader_objects is supported.  When that extension is
-   * implemented, this function will need some modifications.
*/
-  if (!var->data.explicit_location) {
- var->data.is_unmatched_generic_inout = 1;
-  } else {
+  if (var->data.explicit_location &&
+  var->data.location < VARYING_SLOT_VAR0) {
  var->data.is_unmatched_generic_inout = 0;
+  } else {
+ var->data.is_unmatched_generic_inout = 1;
   }
}
 }
@@ -2421,6 +2413,7 @@ assign_attribute_or_color_locations(gl_shader_program 
*prog,
 continue;
 
   if (var->data.explicit_location) {
+ var->data.is_unmatched_generic_inout = 0;
 if ((var->data.location >= (int)(max_index + generic_base))
 || (var->data.location < 0)) {
linker_error(prog,
@@ -2690,6 +2683,61 @@ assign_attribute_or_color_locations(gl_shader_program 
*prog,
return true;
 }
 
+/**
+ * Match explicit locations of outputs to inputs and deactivate the
+ * unmatch flag if found so we don't optimise them alway.
+ */
+void
+match_explicit_outputs_to_inputs(struct gl_shader_program *prog,
+ gl_shader *producer,
+ gl_shader *consumer)
+{
+   glsl_symbol_table parameters;
+   ir_variable *explicit_locations[MAX_VARYING] = { NULL, };
+
+   /* Find all shader outputs in the "producer" stage.
+*/
+   foreach_in_list(ir_instruction, node, producer->ir) {
+  ir_variable *const var = node->as_variable();
+
+  if ((var == NULL) || (var->data.mode != ir_var_shader_out))
+continue;
+
+  /* Mark output as matched if separte shader with no linked consumer */
+  if (consumer == NULL)
+ var->data.is_unmatched_generic_inout = 0;
+
+  if (var->data.explicit_location &&
+  var->data.location >= VARYING_SLOT_VAR0) {
+ const unsigned idx = var->data.location - VARYING_SLOT_VAR0;
+ if (explicit_locations[idx] == NULL)
+explicit_locations[idx] = var;
+  }
+   }
+
+   /* Match inputs to outputs */
+   foreach_in_list(ir_instruction, node, consumer->ir) {
+  ir_variable *const input = node->as_variable();
+
+  if ((input == NULL) || (input->data.mode != ir_var_shader_in))
+continue;
+
+  /* Mark input as matched if separte shader with no linked producer */
+  if (producer == NULL)
+ input->data.is_unmatched_generic_inout = 0;
+
+  ir_variable *output = NULL;
+  if (input->data.explicit_location
+  && input->data.location >= VARYING_SLOT_VAR0) {
+   

[Mesa-dev] [PATCH 2/6] glsl: copy how_declared when lowering interface blocks

2015-11-25 Thread Timothy Arceri
Cc: Gregory Hainaut 
---
 src/glsl/lower_named_interface_blocks.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/glsl/lower_named_interface_blocks.cpp 
b/src/glsl/lower_named_interface_blocks.cpp
index 114bb58..f29eba4 100644
--- a/src/glsl/lower_named_interface_blocks.cpp
+++ b/src/glsl/lower_named_interface_blocks.cpp
@@ -187,6 +187,7 @@ flatten_named_interface_blocks_declarations::run(exec_list 
*instructions)
 new_var->data.sample = iface_t->fields.structure[i].sample;
 new_var->data.patch = iface_t->fields.structure[i].patch;
 new_var->data.stream = var->data.stream;
+new_var->data.how_declared = var->data.how_declared;
 
 new_var->init_interface_type(iface_t);
 hash_table_insert(interface_namespace, new_var,
-- 
2.4.3



[Mesa-dev] [PATCH 6/6] glsl: fix overlapping of varying locations for arrays

2015-11-25 Thread Timothy Arceri
Previously we were not reserving the full array for explicit locations.

We also didn't take into account implicit locations clashing with
explicit locations when assigning locations for their arrays.

This patch fixes both issues.
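
A minimal sketch of the first fix, reserving every slot of an explicitly located array instead of only the first one. The names reserve_slots, overlaps_reserved and kMaxVarying are hypothetical, and the sketch uses a 64-bit shift throughout, which is an assumption of this example rather than what the patch does:

```cpp
#include <cassert>
#include <cstdint>

constexpr int kMaxVarying = 32;   // assumed, stands in for MAX_VARYING

// Build a mask with one bit per slot occupied by a varying that starts at
// first_slot (relative to the first generic varying). num_elements is 1
// for non-arrays and the array length otherwise.
inline uint64_t reserve_slots(int first_slot, unsigned num_elements)
{
   assert(num_elements >= 1);

   uint64_t slots = 0;
   for (unsigned i = 0; i < num_elements; i++) {
      const int slot = first_slot + static_cast<int>(i);
      // Slots outside the generic range are simply not recorded.
      if (slot >= 0 && slot < kMaxVarying)
         slots |= UINT64_C(1) << slot;
   }
   return slots;
}

// True if any slot the candidate varying would occupy is already taken.
inline bool overlaps_reserved(uint64_t reserved, int first_slot,
                              unsigned num_elements)
{
   return (reserved & reserve_slots(first_slot, num_elements)) != 0;
}
```

An implicit varying can then be placed only at a location where overlaps_reserved() is false for its whole length, which is the second issue described above.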

There is no effort to make arrays of arrays work here because we should
just add a lowering pass for inputs and outputs that turns
arrays of arrays into single-dimension arrays with consecutive locations,
which would allow this change to work with arrays of arrays as well.

Cc: Gregory Hainaut 
---
 src/glsl/link_varyings.cpp | 68 --
 1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index 71750d1..44909c2 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -766,7 +766,8 @@ public:
gl_shader_stage consumer_stage);
~varying_matches();
void record(ir_variable *producer_var, ir_variable *consumer_var);
-   unsigned assign_locations(uint64_t reserved_slots, bool separate_shader);
+   unsigned assign_locations(struct gl_shader_program *prog,
+ uint64_t reserved_slots, bool separate_shader);
void store_locations() const;
 
 private:
@@ -988,7 +989,9 @@ varying_matches::record(ir_variable *producer_var, 
ir_variable *consumer_var)
  * passed to varying_matches::record().
  */
 unsigned
-varying_matches::assign_locations(uint64_t reserved_slots, bool 
separate_shader)
+varying_matches::assign_locations(struct gl_shader_program *prog,
+  uint64_t reserved_slots,
+  bool separate_shader)
 {
/* We disable varying sorting for separate shader programs for the
 * following reasons:
@@ -1040,9 +1043,53 @@ varying_matches::assign_locations(uint64_t 
reserved_slots, bool separate_shader)
   != this->matches[i].packing_class) {
  *location = ALIGN(*location, 4);
   }
-  while ((*location < MAX_VARYING * 4u) &&
-(reserved_slots & (1u << *location / 4u))) {
- *location = ALIGN(*location + 1, 4);
+
+  const ir_variable *var =
+ matches[i].consumer_var ? matches[i].consumer_var :
+matches[i].producer_var;
+
+  unsigned num_elements =
+ var->type->is_array() ? var->type->length : 1;
+
+  /* This will hold the location of the last component in the varying */
+  unsigned last_component = this->disable_varying_packing ? 4 :
+ var->type->without_array()->vector_elements;
+  last_component += *location - 1;
+
+  /* FIXME: We could be smarter in the below code and loop back over
+   * trying to fill any locations that we skipped because we couldn't pack
+   * the varying between an explicit location. For now just let the user
+   * hit the linking error if we run out of room and suggest they use
+   * explicit locations.
+   */
+  for (unsigned j = 0; j < num_elements; j++) {
+ while ((last_component < MAX_VARYING * 4u) &&
+((reserved_slots & (1u << *location / 4u) ||
+ (reserved_slots & (1u << last_component / 4u))))) {
+
+*location = ALIGN(*location + 1, 4);
+last_component = *location - 1;
+
+/* reset the counter and try again */
+j = 0;
+ }
+
+ /* Increase the last component location to make sure there is enough
+  * room for next array element, or if we are trying a new location
+  * we are bumping this to the last component.
+  */
+ if (this->disable_varying_packing)
+last_component += 4;
+ else
+last_component += var->type->without_array()->vector_elements;
+  }
+
+  if (*location >= MAX_VARYING * 4u) {
+ linker_error(prog, "insufficient contiguous locations available for "
+  "%s it is likely an array could not be packed between "
+  "varyings with explicit locations. Try using an "
+  "explicit location for arrays.",
+  var->name);
   }
 
   this->matches[i].generic_location = *location;
@@ -1430,8 +1477,13 @@ reserved_varying_slot(struct gl_shader *stage, 
ir_variable_mode io_mode)
  continue;
 
   var_slot = var->data.location - VARYING_SLOT_VAR0;
-  if (var_slot >= 0 && var_slot < MAX_VARYING)
- slots |= 1u << var_slot;
+  unsigned num_elements =
+ var->type->is_array() ? var->type->length : 1;
+  for (unsigned i = 0; i < num_elements; i++) {
+ if (var_slot >= 0 && var_slot < MAX_VARYING)
+slots |= 1u << var_slot;
+ var_slot += 1;
+  }
}
 
return slots;
@@ -1617,7 +1669,7 @@ assign_varying_locations(struct gl_context *ctx,
   reserved_varying_slot(producer, ir_var_shader_out) |
   reserved_varying_slot(consumer, ir_var_shader_in);
 
-   const unsigned slots_u

[Mesa-dev] [PATCH 3/6] glsl: add always_active_io attribute to ir_variable

2015-11-25 Thread Timothy Arceri
From: Gregory Hainaut 

The value will be set in a separate-shader program when an input/output
must remain active, e.g. when dead code removal isn't allowed because
it would create an interface location/name-matching mismatch.
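
The rule for which interfaces must stay active can be sketched as below. This is an illustration under simplifying assumptions: the Stage enum, KeepActive and keep_active_io are hypothetical names, and the sketch only computes the decision rather than walking any IR. Only the inputs of the first stage and the outputs of the last stage can face another program, and vertex inputs and fragment outputs are exempt:

```cpp
#include <array>
#include <cassert>

// Hypothetical stage order mirroring the graphics pipeline.
enum Stage { VERTEX, TESS_CTRL, TESS_EVAL, GEOMETRY, FRAGMENT, STAGE_COUNT };

struct KeepActive {
   bool inputs = false;    // inputs must stay active
   bool outputs = false;   // outputs must stay active
};

// For a separable program containing the given stages, decide which side
// of each stage's interface may face another program and must be kept.
inline std::array<KeepActive, STAGE_COUNT>
keep_active_io(const std::array<bool, STAGE_COUNT> &present)
{
   std::array<KeepActive, STAGE_COUNT> result{};

   // Find the first and last stage that are present.
   int first = -1, last = -1;
   for (int s = 0; s < STAGE_COUNT; s++) {
      if (!present[s])
         continue;
      if (first < 0)
         first = s;
      last = s;
   }
   if (first < 0)
      return result;   // empty program, nothing to keep
   assert(first <= last);

   // Interfaces between stages inside the program can still be optimised.
   if (first != VERTEX)
      result[first].inputs = true;
   if (last != FRAGMENT)
      result[last].outputs = true;
   return result;
}
```

For a program with only a geometry shader this keeps both its inputs and outputs, while a vertex plus fragment program keeps nothing extra.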

v3:
* Rename the attribute
* Use ir_variable directly instead of ir_variable_refcount_visitor
* Move the foreach IR code in the linker file

v4:
* Fix variable name in assert

v5 (by Timothy Arceri):
* Rename functions and reword comments
* Don't set always active on builtins

Signed-off-by: Gregory Hainaut 
Reviewed-by: Timothy Arceri 
---
 src/glsl/ir.cpp |  1 +
 src/glsl/ir.h   |  7 +
 src/glsl/linker.cpp | 74 +
 3 files changed, 82 insertions(+)

diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
index ca520f5..f989e9b 100644
--- a/src/glsl/ir.cpp
+++ b/src/glsl/ir.cpp
@@ -1669,6 +1669,7 @@ ir_variable::ir_variable(const struct glsl_type *type, 
const char *name,
this->data.pixel_center_integer = false;
this->data.depth_layout = ir_depth_layout_none;
this->data.used = false;
+   this->data.always_active_io = false;
this->data.read_only = false;
this->data.centroid = false;
this->data.sample = false;
diff --git a/src/glsl/ir.h b/src/glsl/ir.h
index 20c94a1..717b036 100644
--- a/src/glsl/ir.h
+++ b/src/glsl/ir.h
@@ -659,6 +659,13 @@ public:
   unsigned assigned:1;
 
   /**
+   * When separate shader programs are enabled, only input/outputs between
+   * the stages of a multi-stage separate program can be safely removed
+   * from the shader interface. Other input/outputs must remains active.
+   */
+  unsigned always_active_io:1;
+
+  /**
* Enum indicating how the variable was declared.  See
* ir_var_declaration_type.
*
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 930b585..4745e86 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -3988,6 +3988,77 @@ split_ubos_and_ssbos(void *mem_ctx,
assert(*num_ubos + *num_ssbos == num_blocks);
 }
 
+static void
+set_always_active_io(exec_list *ir, ir_variable_mode io_mode)
+{
+   assert(io_mode == ir_var_shader_in || io_mode == ir_var_shader_out);
+
+   foreach_in_list(ir_instruction, node, ir) {
+  ir_variable *const var = node->as_variable();
+
+  if (var == NULL || var->data.mode != io_mode)
+ continue;
+
+  /* Don't set alway active on builtins that haven't been redeclared */
+  if(var->data.how_declared == ir_var_declared_implicitly)
+ continue;
+
+  var->data.always_active_io = true;
+   }
+}
+
+/**
+ * When separate shader programs are enabled, only input/outputs between
+ * the stages of a multi-stage separate program can be safely removed
+ * from the shader interface. Other input/outputs must remains active.
+ */
+static void
+disable_varying_optimizations_for_sso(struct gl_shader_program *prog)
+{
+   unsigned first, last;
+   assert(prog->SeparateShader);
+
+   first = MESA_SHADER_STAGES;
+   last = 0;
+
+   /* Determine first and last stage. Excluding the compute stage */
+   for (unsigned i = 0; i < MESA_SHADER_COMPUTE; i++) {
+  if (!prog->_LinkedShaders[i])
+ continue;
+  if (first == MESA_SHADER_STAGES)
+ first = i;
+  last = i;
+   }
+
+   if (first == MESA_SHADER_STAGES)
+  return;
+
+   for (unsigned stage = 0; stage < MESA_SHADER_STAGES; stage++) {
+  gl_shader *sh = prog->_LinkedShaders[stage];
+  if (!sh)
+ continue;
+
+  if (first == last) {
+ /* For a single shader program only allow inputs to the vertex shader
+  * and outputs from the fragment shader to be removed.
+  */
+ if (stage != MESA_SHADER_VERTEX)
+set_always_active_io(sh->ir, ir_var_shader_in);
+ if (stage != MESA_SHADER_FRAGMENT)
+set_always_active_io(sh->ir, ir_var_shader_out);
+  } else {
+ /* For multi-stage separate shader programs only allow inputs and
+  * outputs between the shader stages to be removed as well as inputs
+  * to the vertex shader and outputs from the fragment shader.
+  */
+ if (stage == first && stage != MESA_SHADER_VERTEX)
+set_always_active_io(sh->ir, ir_var_shader_in);
+ else if (stage == last && stage != MESA_SHADER_FRAGMENT)
+set_always_active_io(sh->ir, ir_var_shader_out);
+  }
+   }
+}
+
 void
 link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
 {
@@ -4255,6 +4326,9 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
   }
}
 
+   if (prog->SeparateShader)
+  disable_varying_optimizations_for_sso(prog);
+
if (!interstage_cross_validate_uniform_blocks(prog))
   goto done;
 
-- 
2.4.3



[Mesa-dev] [PATCH 4/6] glsl: don't dead code remove SSO varyings marked as active

2015-11-25 Thread Timothy Arceri
From: Gregory Hainaut 

GL_ARB_separate_shader_objects allows matching by variable name or block
interface. Input varyings can't be removed because doing so would impact the
location assignment.

This fixes bug 79783 and likely any application that uses the
GL_ARB_separate_shader_objects extension.

V2 (by Timothy Arceri):
* simplify now that builtins are not set as always active

Signed-off-by: Gregory Hainaut 
Reviewed-by: Timothy Arceri 
https://bugs.freedesktop.org/show_bug.cgi?id=79783
---
 src/glsl/opt_dead_code.cpp | 16 
 1 file changed, 16 insertions(+)

diff --git a/src/glsl/opt_dead_code.cpp b/src/glsl/opt_dead_code.cpp
index c5be166..d2316bc 100644
--- a/src/glsl/opt_dead_code.cpp
+++ b/src/glsl/opt_dead_code.cpp
@@ -75,6 +75,22 @@ do_dead_code(exec_list *instructions, bool 
uniform_locations_assigned)
  || !entry->declaration)
 continue;
 
+  /* Section 7.4.1 (Shader Interface Matching) of the OpenGL 4.5
+   * (Core Profile) spec says:
+   *
+   *"With separable program objects, interfaces between shader
+   *stages may involve the outputs from one program object and the
+   *inputs from a second program object.  For such interfaces, it is
+   *not possible to detect mismatches at link time, because the
+   *programs are linked separately. When each such program is
+   *linked, all inputs or outputs interfacing with another program
+   *stage are treated as active."
+   */
+  if (entry->var->data.always_active_io &&
+  (entry->var->data.mode == ir_var_shader_in ||
+   entry->var->data.mode == ir_var_shader_out))
+ continue;
+
   if (!entry->assign_list.is_empty()) {
 /* Remove all the dead assignments to the variable we found.
  * Don't do so if it's a shader or function output, though.
-- 
2.4.3



[Mesa-dev] [PATCH 5/6] glsl: don't sort varying in separate shader mode

2015-11-25 Thread Timothy Arceri
From: Gregory Hainaut 

This fixes an issue where the addition of the FLAT qualifier in
varying_matches::record() can break the expected varying order.

It also avoids a future issue with the relaxing of interpolation
qualifier matching constraints in GLSL 4.50.
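
A rough sketch of the behaviour change, assuming a simplified Match record (Match and order_matches are hypothetical names, and std::stable_sort stands in for the qsort call in the patch): in separate shader mode the matches keep their recorded order so that independently linked programs agree on the layout, otherwise they are sorted for packing.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, reduced form of a varying match record.
struct Match {
   int packing_class;       // varyings of the same class can be packed together
   int declaration_order;   // order in which the match was recorded
};

// Sort matches for packing unless the program is a separate shader
// program, in which case the recorded order must be preserved.
inline void order_matches(std::vector<Match> &matches, bool separate_shader)
{
   if (separate_shader)
      return;   // every program must see the same order

   auto by_class = [](const Match &a, const Match &b) {
      return a.packing_class < b.packing_class;
   };
   std::stable_sort(matches.begin(), matches.end(), by_class);
   assert(std::is_sorted(matches.begin(), matches.end(), by_class));
}
```

The cost of skipping the sort is potentially less efficient packing, which the patch accepts for separate shader programs.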

V2: (by Timothy Arceri)
* reworked comment slightly

Signed-off-by: Gregory Hainaut 
Reviewed-by: Timothy Arceri 
---
 src/glsl/link_varyings.cpp | 38 --
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index ac2755f..71750d1 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -766,7 +766,7 @@ public:
gl_shader_stage consumer_stage);
~varying_matches();
void record(ir_variable *producer_var, ir_variable *consumer_var);
-   unsigned assign_locations(uint64_t reserved_slots);
+   unsigned assign_locations(uint64_t reserved_slots, bool separate_shader);
void store_locations() const;
 
 private:
@@ -988,11 +988,36 @@ varying_matches::record(ir_variable *producer_var, 
ir_variable *consumer_var)
  * passed to varying_matches::record().
  */
 unsigned
-varying_matches::assign_locations(uint64_t reserved_slots)
+varying_matches::assign_locations(uint64_t reserved_slots, bool 
separate_shader)
 {
-   /* Sort varying matches into an order that makes them easy to pack. */
-   qsort(this->matches, this->num_matches, sizeof(*this->matches),
- &varying_matches::match_comparator);
+   /* We disable varying sorting for separate shader programs for the
+* following reasons:
+*
+* 1/ All programs must sort the code in the same order to guarantee the
+*interface matching. However varying_matches::record() will change the
+*interpolation qualifier of some stages.
+*
+* 2/ GLSL version 4.50 removes the matching constrain on the interpolation
+*qualifier.
+*
+* From Section 4.5 (Interpolation Qualifiers) of the GLSL 4.40 spec:
+*
+*"The type and presence of interpolation qualifiers of variables with
+*the same name declared in all linked shaders for the same cross-stage
+*interface must match, otherwise the link command will fail.
+*
+*When comparing an output from one stage to an input of a subsequent
+*stage, the input and output don't match if their interpolation
+*qualifiers (or lack thereof) are not the same."
+*
+*"It is a link-time error if, within the same stage, the interpolation
+*qualifiers of variables of the same name do not match."
+*/
+   if (!separate_shader) {
+  /* Sort varying matches into an order that makes them easy to pack. */
+  qsort(this->matches, this->num_matches, sizeof(*this->matches),
+&varying_matches::match_comparator);
+   }
 
unsigned generic_location = 0;
unsigned generic_patch_location = MAX_VARYING*4;
@@ -1592,7 +1617,8 @@ assign_varying_locations(struct gl_context *ctx,
   reserved_varying_slot(producer, ir_var_shader_out) |
   reserved_varying_slot(consumer, ir_var_shader_in);
 
-   const unsigned slots_used = matches.assign_locations(reserved_slots);
+   const unsigned slots_used = matches.assign_locations(reserved_slots,
+prog->SeparateShader);
matches.store_locations();
 
for (unsigned i = 0; i < num_tfeedback_decls; ++i) {
-- 
2.4.3



Re: [Mesa-dev] [PATCH v2 25/42] nir: Translate glsl shared var store intrinsic to nir intrinsic

2015-11-25 Thread Iago Toral
On Tue, 2015-11-17 at 21:54 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/glsl/nir/glsl_to_nir.cpp  | 33 +
>  src/glsl/nir/nir_intrinsics.h |  3 ++-
>  2 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index a59d09c..a832570 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -729,6 +729,8 @@ nir_visitor::visit(ir_call *ir)
>   op = nir_intrinsic_memory_barrier_shared;
>} else if (strcmp(ir->callee_name(), "__intrinsic_load_shared") == 0) {
>   op = nir_intrinsic_load_shared;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_store_shared") == 0) 
> {
> + op = nir_intrinsic_store_shared;
>} else {
>   unreachable("not reached");
>}
> @@ -1003,6 +1005,37 @@ nir_visitor::visit(ir_call *ir)
>   nir_builder_instr_insert(&b, &instr->instr);
>   break;
>}
> +  case nir_intrinsic_store_shared: {
> + exec_node *param = ir->actual_parameters.get_head();
> + ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
> +
> + param = param->get_next();
> + ir_rvalue *val = ((ir_instruction *)param)->as_rvalue();
> +
> + param = param->get_next();
> + ir_constant *write_mask = ((ir_instruction *)param)->as_constant();
> + assert(write_mask);
> +
> + /* Check if we need the indirect version */
> + ir_constant *const_offset = offset->as_constant();
> + if (!const_offset) {
> +op = nir_intrinsic_store_shared_indirect;
> +ralloc_free(instr);
> +instr = nir_intrinsic_instr_create(shader, op);
> +instr->src[1] = nir_src_for_ssa(evaluate_rvalue(offset));
> +instr->const_index[0] = 0;
> + } else {
> +instr->const_index[0] = const_offset->value.u[0];
> + }
> +
> + instr->const_index[1] = write_mask->value.u[0];
> +
> + instr->src[0] = nir_src_for_ssa(evaluate_rvalue(val));
> + instr->num_components = val->type->vector_elements;
> +
> + nir_builder_instr_insert(&b, &instr->instr);
> + break;
> +  }
>default:
>   unreachable("not reached");
>}
> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
> index de15128..6912258 100644
> --- a/src/glsl/nir/nir_intrinsics.h
> +++ b/src/glsl/nir/nir_intrinsics.h
> @@ -277,5 +277,6 @@ LOAD(shared, 0, 1, NIR_INTRINSIC_CAN_ELIMINATE)
>  STORE(output, 0, 0, 0, 0)
>  STORE(per_vertex_output, 1, 1, 0, 0)
>  STORE(ssbo, 1, 1, 1, 0)
> +STORE(shared, 0, 0, 0, 0)

This should be:

STORE(shared, 0, 0, 1, 0)

to account for the writemask, which always goes in
const_index[1]. With that change:

Reviewed-by: Iago Toral Quiroga 

> -LAST_INTRINSIC(store_ssbo_indirect)
> +LAST_INTRINSIC(store_shared_indirect)




[Mesa-dev] [Bug 93100] [IVB]piglit/spec/opengl_3.3/gl-3.2-layered-rendering-framebuffertexture fails

2015-11-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93100

marius predut  changed:

   What|Removed |Added

 Depends on||91670

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH v2 26/42] i965: Disable vector splitting on shared variables

2015-11-25 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Tue, 2015-11-17 at 21:54 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp
> index cab5af3..2c7e0dc 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp
> @@ -107,6 +107,7 @@ 
> ir_vector_reference_visitor::get_variable_entry(ir_variable *var)
> switch (var->data.mode) {
> case ir_var_uniform:
> case ir_var_shader_storage:
> +   case ir_var_shader_shared:
> case ir_var_shader_in:
> case ir_var_shader_out:
> case ir_var_system_value:




[Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample needs to pass OpenGL3.3 conformance test.

2015-11-25 Thread Marius Predut
OpenGL 3.3 reference document says:
samples must be in the range zero to GL_MAX_TEXTURE_SIZE - 1.
OpenGL 4 clearly states:
An INVALID_VALUE error is generated if samples is zero.

Fixing the piglit test case gl-3.2-layered-rendering-framebuffertexture.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93100

Signed-off-by: Marius Predut 
---
 src/mesa/main/teximage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index d9453e3..69634ff 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -5211,7 +5211,7 @@ texture_image_multisample(struct gl_context *ctx, GLuint 
dims,
   return;
}
 
-   if (samples < 1) {
+   if (samples < 1 && ctx->API == API_OPENGL_CORE && ctx->Version >= 40) {
   _mesa_error(ctx, GL_INVALID_VALUE, "%s(samples < 1)", func);
   return;
}
-- 
1.9.1



[Mesa-dev] [Bug 93100] [IVB]piglit/spec/opengl_3.3/gl-3.2-layered-rendering-framebuffertexture fails

2015-11-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93100

--- Comment #1 from marius predut  ---

The pair of patches that fixes this issue:

Patch sent to the piglit mailing list:
http://lists.freedesktop.org/archives/piglit/2015-November/018182.html

Patch sent to the mesa mailing list:
http://lists.freedesktop.org/archives/mesa-dev/2015-November/101403.html

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample needs to pass OpenGL3.3 conformance test.

2015-11-25 Thread Tapani Pälli

Hi;

On 11/25/2015 01:15 PM, Marius Predut wrote:

OpenGL 3.3 reference document says:
samples must be in the range zero to GL_MAX_TEXTURE_SIZE - 1.
OpenGL 4 clearly states:
An INVALID_VALUE error is generated if samples is zero.


OpenGL ES 3.1 spec also says "An INVALID_VALUE error is generated if 
samples is zero.". You'll need to change your check below to also include 
ES 3.1.
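
One possible shape for the extended check, as a sketch only: the Api enum and zero_samples_is_error are hypothetical names that mirror the condition in the patch plus the ES 3.1 case, not the actual Mesa context API. Versions are encoded as major * 10 + minor, as the patch's comparison against 40 suggests.

```cpp
#include <cassert>

// Hypothetical API kinds, loosely modelled on the context API values.
enum class Api { OpenGLCompat, OpenGLCore, OpenGLES2 };

// Decide whether samples == 0 should raise GL_INVALID_VALUE.
// version is major * 10 + minor, e.g. 31 for 3.1.
inline bool zero_samples_is_error(Api api, int version)
{
   assert(version >= 10);

   // Condition from the patch: core profile, version 4.0 or later.
   const bool gl_core_4 = api == Api::OpenGLCore && version >= 40;
   // Additional case raised in this review: OpenGL ES 3.1 or later.
   const bool gles_31 = api == Api::OpenGLES2 && version >= 31;
   return gl_core_4 || gles_31;
}
```

Like the patch, this sketch does not treat compatibility profile contexts specially, which may or may not be what is wanted.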



Fixing the piglit test case gl-3.2-layered-rendering-framebuffertexture.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93100

Signed-off-by: Marius Predut 
---
  src/mesa/main/teximage.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index d9453e3..69634ff 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -5211,7 +5211,7 @@ texture_image_multisample(struct gl_context *ctx, GLuint 
dims,
return;
 }
  
-   if (samples < 1) {

+   if (samples < 1 && ctx->API == API_OPENGL_CORE && ctx->Version >= 40) {
_mesa_error(ctx, GL_INVALID_VALUE, "%s(samples < 1)", func);
return;
 }




Re: [Mesa-dev] [PATCH v2 27/42] i965/fs: Handle nir shared variable load intrinsic

2015-11-25 Thread Iago Toral
On Tue, 2015-11-17 at 21:54 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h  |  2 ++
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 33 
> 
>  2 files changed, 35 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index ade3ede..8189c08 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -2867,4 +2867,6 @@ enum brw_wm_barycentric_interp_mode {
>  # define GPGPU_WALKER_THREAD_WIDTH_MAX_SHIFT0
>  # define GPGPU_WALKER_THREAD_WIDTH_MAX_MASK INTEL_MASK(5, 0)
>  
> +#define BRW_SLM_SURFACE_INDEX 0xfe
> +

I imagine that shared memory is always fixed at this index and we don't
need to setup the surface in the driver? I have not been able to find
the reference to that in the docs though.

Iago

>  #endif
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index c282f83..e9336fd 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -33,6 +33,9 @@
>  #include "brw_fs_surface_builder.h"
>  #include "brw_vec4_gs_visitor.h"
>  
> +#include "brw_fs_builder.h"
> +#include "brw_fs_surface_builder.h"

brw_fs_surface_builder.h is already included below and that already
includes brw_fs_builder.h, so no need to include them here.

>  using namespace brw;
>  using namespace brw::surface_access;
>  
> @@ -2297,6 +2300,36 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
> nir_intrinsic_instr *instr
>break;
> }
>  
> +   case nir_intrinsic_load_shared_indirect:
> +  has_indirect = true;
> +  /* fallthrough */
> +   case nir_intrinsic_load_shared: {
> +  assert(devinfo->gen >= 7);
> +
> +  fs_reg surf_index;
> +  unsigned index = BRW_SLM_SURFACE_INDEX;
> +  surf_index = fs_reg(index);
> +
> +  /* Get the offset to read from */
> +  fs_reg offset_reg;
> +  if (has_indirect) {
> + offset_reg = get_nir_src(instr->src[0]);
> +  } else {
> + offset_reg = fs_reg(instr->const_index[0]);
> +  }
> +
> +  /* Read the vector */
> +  fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
> + 1 /* dims */,
> + instr->num_components,
> + BRW_PREDICATE_NONE);
> +  read_result.type = dest.type;
> +  for (int i = 0; i < instr->num_components; i++)
> + bld.MOV(offset(dest, bld, i), offset(read_result, bld, i));
> +
> +  break;
> +   }
> +
> case nir_intrinsic_load_input_indirect:
>has_indirect = true;
>/* fallthrough */




Re: [Mesa-dev] [PATCH] gallium/util: Fix util_blitter_clear_render_target() for num_layers>1

2015-11-25 Thread Marek Olšák
On Tue, Nov 24, 2015 at 1:03 PM, Edward O'Callaghan
 wrote:
> Previously util_blitter_clear_render_target() could not clear more
> than the first layer. We need to generalise this so that
> ARB_clear_texture can pass the 3d piglit test.
>
> Signed-off-by: Edward O'Callaghan 
> ---
>  src/gallium/auxiliary/util/u_blitter.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_blitter.c 
> b/src/gallium/auxiliary/util/u_blitter.c
> index fccc92c..37daefe 100644
> --- a/src/gallium/auxiliary/util/u_blitter.c
> +++ b/src/gallium/auxiliary/util/u_blitter.c
> @@ -1852,10 +1852,19 @@ void util_blitter_clear_render_target(struct 
> blitter_context *blitter,
> pipe->set_framebuffer_state(pipe, &fb_state);
> pipe->set_sample_mask(pipe, ~0);
>
> -   blitter_set_common_draw_rect_state(ctx, FALSE, FALSE);
> blitter_set_dst_dimensions(ctx, dstsurf->width, dstsurf->height);
> -   blitter->draw_rectangle(blitter, dstx, dsty, dstx+width, dsty+height, 0,
> -   UTIL_BLITTER_ATTRIB_COLOR, color);
> +
> +   unsigned num_layers = util_framebuffer_get_num_layers(&fb_state);

num_layers can be obtained more easily:
dstsurf->u.tex.last_layer - dstsurf->u.tex.first_layer + 1

Also, I guess clear_depth_stencil needs the same treatment.

Marek
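
The layer count in the expression above is the length of an inclusive range.
A minimal sketch, using a stand-in struct (the sketch_ names are hypothetical;
only first_layer and last_layer come from the expression quoted above):

```c
#include <assert.h>

/* Stand-in for the layer range of a surface (hypothetical type). */
struct sketch_surface_layers {
   unsigned first_layer;
   unsigned last_layer;
};

/* Number of layers in the inclusive range [first_layer, last_layer]. */
static unsigned
sketch_num_layers(const struct sketch_surface_layers *surf)
{
   assert(surf->last_layer >= surf->first_layer);
   return surf->last_layer - surf->first_layer + 1;
}
```

A surface covering layers 2 through 5 has four layers; a single-layer surface
(first == last) has one.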


[Mesa-dev] [Bug 93100] [IVB]piglit/spec/opengl_3.3/gl-3.2-layered-rendering-framebuffertexture fails

2015-11-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93100

--- Comment #2 from Timothy Arceri  ---
I think this is just a bug in the reference pages; see my comments in bug 91670.



Re: [Mesa-dev] [PATCH v2 28/42] i965/fs: Handle nir shared variable store intrinsic function

2015-11-25 Thread Iago Toral
On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 54 
> 
>  1 file changed, 54 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index e9336fd..c8c6370 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -2330,6 +2330,60 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
> nir_intrinsic_instr *instr
>break;
> }
>  
> +   case nir_intrinsic_store_shared_indirect:
> +  has_indirect = true;
> +  /* fallthrough */
> +   case nir_intrinsic_store_shared: {
> +  assert(devinfo->gen >= 7);
> +
> +  /* Block index */
> +  fs_reg surf_index;
> +  unsigned index = BRW_SLM_SURFACE_INDEX;
> +  surf_index = fs_reg(index);

We don't need the index variable here. Also, this needs to be rebased on
top of Matt's changes, so you can just do:

fs_reg surf_index = brw_imm_ud(BRW_SLM_SURFACE_INDEX);

Also, you need to do the same in the previous patch.

> +  /* Offset */
> +  fs_reg offset_reg = vgrf(glsl_type::uint_type);
> +  unsigned const_offset_bytes = 0;
> +  if (has_indirect) {
> + bld.MOV(offset_reg, get_nir_src(instr->src[1]));
> +  } else {
> + const_offset_bytes = instr->const_index[0];
> + bld.MOV(offset_reg, fs_reg(const_offset_bytes));
> +  }
> +
> +  /* Value */
> +  fs_reg val_reg = get_nir_src(instr->src[0]);
> +
> +  /* Writemask */
> +  unsigned writemask = instr->const_index[1];
> +
> +  /* Write each component present in the writemask */

The loop below is exactly the same I wrote in the initial implementation
of ssbo stores, but Kristian optimized it later so we can group
consecutive enabled channels in a single write message. See
0cb7d7b4b7c32246. I believe we should do the same here.

Iago

> +  unsigned skipped_channels = 0;
> +  for (int i = 0; i < instr->num_components; i++) {
> + int component_mask = 1 << i;
> + if (writemask & component_mask) {
> +if (skipped_channels) {
> +   if (!has_indirect) {
> +  const_offset_bytes += 4 * skipped_channels;
> +  bld.MOV(offset_reg, fs_reg(const_offset_bytes));
> +   } else {
> +  bld.ADD(offset_reg, offset_reg,
> +   brw_imm_ud(4 * skipped_channels));
> +   }
> +   skipped_channels = 0;
> +}
> +
> +emit_untyped_write(bld, surf_index, offset_reg,
> +   offset(val_reg, bld, i),
> +   1 /* dims */, 1 /* size */,
> +   BRW_PREDICATE_NONE);
> + }
> +
> + skipped_channels++;
> +  }
> +  break;
> +   }
> +
> case nir_intrinsic_load_input_indirect:
>has_indirect = true;
>/* fallthrough */
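
The grouping idea from the review comment (one write per run of consecutive
enabled channels, instead of one write per channel) can be sketched in
isolation. This is not the code from commit 0cb7d7b4b7c32246; it is a
standalone illustration, and every sketch_ name is hypothetical.

```c
#include <assert.h>

#define SKETCH_MAX_COMPONENTS 4

/* One run of consecutive enabled channels in a writemask. */
struct sketch_run {
   unsigned first;   /* index of the first enabled channel in the run */
   unsigned length;  /* number of consecutive enabled channels */
};

/* Split a writemask into runs of consecutive enabled channels.
 * Returns the number of runs written to 'runs'. */
static unsigned
sketch_group_writemask(unsigned writemask, unsigned num_components,
                       struct sketch_run runs[SKETCH_MAX_COMPONENTS])
{
   unsigned count = 0;
   unsigned i = 0;

   assert(num_components <= SKETCH_MAX_COMPONENTS);

   while (i < num_components) {
      if (!(writemask & (1u << i))) {
         i++;                      /* channel disabled: skip it */
         continue;
      }

      unsigned first = i;          /* start of a new run */
      while (i < num_components && (writemask & (1u << i)))
         i++;

      runs[count].first = first;
      runs[count].length = i - first;
      count++;
   }

   return count;
}
```

For a writemask of 0xb (x, y and w enabled) this yields two runs, so two write
messages instead of three. Each run would start at byte offset 4 * first,
matching the 4-byte channel stride used by the quoted loop.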




Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample needs to pass OpenGL3.3 conformance test.

2015-11-25 Thread Timothy Arceri
On Wed, 2015-11-25 at 12:47 +0200, Tapani Pälli wrote:
> Hi;
> 
> On 11/25/2015 01:15 PM, Marius Predut wrote:
> > Open GL 3.3 reference document says:
> > samples must be in the range zero to GL_MAX_TEXTURE_SIZE - 1.
> > Open GL.4 clearly states:
> > An INVALID_VALUE error is generated if samples is zero.

See my comment in bugzilla [1]. I believe this is just a bug in the
reference pages. We implement things in Mesa going by what the spec
says, and the 3.2 spec says nothing about samples being 0. In fact,
the 4.0 spec, which you have changed the check to, doesn't say
anything about it either.

The 4.5 reference pages also conflict with the spec, so this is
even more reason I think this change is wrong.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=91670

> 
> OpenGL ES 3.1 spec also says "An INVALID_VALUE error is generated if 
> samples is zero.". You'll need to change your check below to include
> also 
> ES 3.1.
> 
> > Fixing the piglit test case gl-3.2-layered-rendering
> > -framebuffertexture.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93100
> > 
> > Signed-off-by: Marius Predut 
> > ---
> >   src/mesa/main/teximage.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
> > index d9453e3..69634ff 100644
> > --- a/src/mesa/main/teximage.c
> > +++ b/src/mesa/main/teximage.c
> > @@ -5211,7 +5211,7 @@ texture_image_multisample(struct gl_context
> > *ctx, GLuint dims,
> > return;
> >  }
> >   
> > -   if (samples < 1) {
> > +   if (samples < 1 && ctx->API == API_OPENGL_CORE && ctx->Version
> > >= 40) {
> > _mesa_error(ctx, GL_INVALID_VALUE, "%s(samples < 1)",
> > func);
> > return;
> >  }
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] mesa: remove ARB_geometry_shader4

2015-11-25 Thread Marta Lofstedt
From: Marta Lofstedt 

No drivers currently implement ARB_geometry_shader4, nor are there
any plans to implement it.  We only support the version of geometry
shaders that was incorporated into OpenGL 3.2 / GLSL 1.50.

Signed-off-by: Marta Lofstedt 
---
 src/mapi/glapi/gen/ARB_geometry_shader4.xml | 57 -
 src/mapi/glapi/gen/Makefile.am  |  1 -
 src/mapi/glapi/gen/gl_API.xml   |  2 +-
 src/mesa/main/api_validate.c|  2 +-
 src/mesa/main/config.h  |  2 +-
 src/mesa/main/context.h |  3 +-
 src/mesa/main/dlist.c   | 55 
 src/mesa/main/get.c |  7 
 src/mesa/main/get_hash_params.py| 12 ++
 src/mesa/main/mtypes.h  |  3 +-
 src/mesa/main/tests/enum_strings.cpp|  6 ---
 11 files changed, 9 insertions(+), 141 deletions(-)
 delete mode 100644 src/mapi/glapi/gen/ARB_geometry_shader4.xml

diff --git a/src/mapi/glapi/gen/ARB_geometry_shader4.xml 
b/src/mapi/glapi/gen/ARB_geometry_shader4.xml
deleted file mode 100644
index 280e7a0..000
--- a/src/mapi/glapi/gen/ARB_geometry_shader4.xml
+++ /dev/null
@@ -1,57 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index a5a26a6..40b0e65 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -133,7 +133,6 @@ API_XML = \
ARB_ES3_compatibility.xml \
ARB_framebuffer_no_attachments.xml \
ARB_framebuffer_object.xml \
-   ARB_geometry_shader4.xml \
ARB_get_program_binary.xml \
ARB_get_texture_sub_image.xml \
ARB_gpu_shader_fp64.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index ec83cd4..6243bdd 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -7975,7 +7975,7 @@
 
 
 
-http://www.w3.org/2001/XInclude"/>
+
 
 
 
diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index a490189..cbfb6b5 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -170,7 +170,7 @@ _mesa_valid_prim_mode(struct gl_context *ctx, GLenum mode, 
const char *name)
   return GL_FALSE;
}
 
-   /* From the ARB_geometry_shader4 spec:
+   /* From the OpenGL 4.5 specification, section 11.3.1:
 *
 * The error INVALID_OPERATION is generated if Begin, or any command that
 * implicitly calls Begin, is called when a geometry shader is active and:
diff --git a/src/mesa/main/config.h b/src/mesa/main/config.h
index f29de5f..2d53e2f 100644
--- a/src/mesa/main/config.h
+++ b/src/mesa/main/config.h
@@ -246,7 +246,7 @@
 #define MAX_FEEDBACK_BUFFERS 4
 #define MAX_FEEDBACK_ATTRIBS 32
 
-/** For GL_ARB_geometry_shader4 */
+/** For geometry shader */
 /*@{*/
 #define MAX_GEOMETRY_UNIFORM_COMPONENTS  512
 #define MAX_GEOMETRY_OUTPUT_VERTICES 256
diff --git a/src/mesa/main/context.h b/src/mesa/main/context.h
index 4798b1f..8b64f45 100644
--- a/src/mesa/main/context.h
+++ b/src/mesa/main/context.h
@@ -330,8 +330,7 @@ _mesa_is_gles31(const struct gl_context *ctx)
 static inline bool
 _mesa_has_geometry_shaders(const struct gl_context *ctx)
 {
-   return _mesa_is_desktop_gl(ctx) &&
-  (ctx->Version >= 32 || ctx->Extensions.ARB_geometry_shader4);
+   return _mesa_is_desktop_gl(ctx) && ctx->Version >= 32;
 }
 
 
diff --git a/src/mesa/main/dlist.c b/src/mesa/main/dlist.c
index 2b65b2e..ba2e670 100644
--- a/src/mesa/main/dlist.c
+++ b/src/mesa/main/dlist.c
@@ -457,11 +457,6 @@ typedef enum
OPCODE_SAMPLER_PARAMETERIIV,
OPCODE_SAMPLER_PARAMETERUIV,
 
-   /* GL_ARB_geometry_shader4 */
-   OPCODE_PROGRAM_PARAMETERI,
-   OPCODE_FRAMEBUFFER_TEXTURE,
-   OPCODE_FRAMEBUFFER_TEXTURE_FACE,
-
/* GL_ARB_sync */
OPCODE_WAIT_SYNC,
 
@@ -7554,44 +7549,6 @@ save_SamplerParameterIuiv(GLuint sampler, GLenum pname, 
const GLuint *params)
}
 }
 
-/* GL_ARB_geometry_shader4 */
-static void GLAPIENTRY
-save_ProgramParameteri(GLuint program, GLenum pname, GLint value)
-{
-   Node *n;
-   GET_CURRENT_CONTEXT(ctx);
-   ASSERT_OUTSIDE_SAVE_BEGIN_END_AND_FLUSH(ctx);
-   n = alloc_instruction(ctx, OPCODE_PROGRAM_PARAMETERI, 3);
-   if (n) {
-  n[1].ui = program;
-  n[2].e = pname;
-  n[3].i = value;
-   }
-   if (ctx->ExecuteFlag) {
-  CALL_ProgramParameteri(ctx->Exec, (program, pname, value));
-   }
-}
-
-static void GLAPIENTRY
-save_FramebufferTexture(GLenum target, GLenum attachment,
-GLuint texture, GLint level)
-{
-   Node *n;
-   GET_CURRENT_CONTEXT(ctx

Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample needs to pass OpenGL3.3 conformance test.

2015-11-25 Thread Tapani Pälli

On 11/25/2015 01:11 PM, Timothy Arceri wrote:

> On Wed, 2015-11-25 at 12:47 +0200, Tapani Pälli wrote:
>> Hi;
>>
>> On 11/25/2015 01:15 PM, Marius Predut wrote:
>>> Open GL 3.3 reference document says:
>>> samples must be in the range zero to GL_MAX_TEXTURE_SIZE - 1.
>>> Open GL.4 clearly states:
>>> An INVALID_VALUE error is generated if samples is zero.
> See my comment in bugzilla [1] I believe this is just a bug in the
> reference pages, we implement things in Mesa going by what the spec
> says and the spec says nothing about samples being 0 in the 3.2 spec in
> fact it doesn't even say anything in the 4.0 spec which you have changed
> the check to.
>
> Also the 4.5 reference pages also conflict with the spec so this is
> even more reason I think this change is wrong.
>
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=91670

I believe there is some background to this change. AFAIK the reason to
push this would be to get the desktop GL CTS test suite (which we currently
run against GL 3.x) to behave better. Marius, is this right?

>> OpenGL ES 3.1 spec also says "An INVALID_VALUE error is generated if
>> samples is zero.". You'll need to change your check below to include
>> also ES 3.1.
>>
>>> Fixing the piglit test case gl-3.2-layered-rendering
>>> -framebuffertexture.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93100
>>>
>>> Signed-off-by: Marius Predut
>>> ---
>>>   src/mesa/main/teximage.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
>>> index d9453e3..69634ff 100644
>>> --- a/src/mesa/main/teximage.c
>>> +++ b/src/mesa/main/teximage.c
>>> @@ -5211,7 +5211,7 @@ texture_image_multisample(struct gl_context
>>> *ctx, GLuint dims,
>>>  return;
>>>   }
>>>
>>> -   if (samples < 1) {
>>> +   if (samples < 1 && ctx->API == API_OPENGL_CORE && ctx->Version
>>> >= 40) {
>>>  _mesa_error(ctx, GL_INVALID_VALUE, "%s(samples < 1)",
>>> func);
>>>  return;
>>>   }



Re: [Mesa-dev] [PATCH 11/11] winsys/amdgpu/addrlib: do not wrap header inclusion in extern "C"

2015-11-25 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Tue, Nov 24, 2015 at 5:29 PM, Emil Velikov  wrote:
> From: Emil Velikov 
>
> Signed-off-by: Emil Velikov 
> ---
>  src/gallium/winsys/amdgpu/drm/addrlib/addrinterface.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/winsys/amdgpu/drm/addrlib/addrinterface.h 
> b/src/gallium/winsys/amdgpu/drm/addrlib/addrinterface.h
> index 03fbf2b..ead6033 100644
> --- a/src/gallium/winsys/amdgpu/drm/addrlib/addrinterface.h
> +++ b/src/gallium/winsys/amdgpu/drm/addrlib/addrinterface.h
> @@ -33,13 +33,13 @@
>  #ifndef __ADDR_INTERFACE_H__
>  #define __ADDR_INTERFACE_H__
>
> +#include "addrtypes.h"
> +
>  #if defined(__cplusplus)
>  extern "C"
>  {
>  #endif
>
> -#include "addrtypes.h"
> -
>  #define ADDRLIB_VERSION_MAJOR 5
>  #define ADDRLIB_VERSION_MINOR 25
>  #define ADDRLIB_VERSION ((ADDRLIB_VERSION_MAJOR << 16) | 
> ADDRLIB_VERSION_MINOR)
> --
> 2.6.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 29/42] i965: Enable shared local memory for CS shared variables

2015-11-25 Thread Iago Toral
On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_cs.c|  2 ++
>  src/mesa/drivers/dri/i965/brw_defines.h   |  2 ++
>  src/mesa/drivers/dri/i965/gen7_cs_state.c | 12 
>  3 files changed, 16 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_cs.c 
> b/src/mesa/drivers/dri/i965/brw_cs.c
> index 263d224..704b00d 100644
> --- a/src/mesa/drivers/dri/i965/brw_cs.c
> +++ b/src/mesa/drivers/dri/i965/brw_cs.c
> @@ -69,6 +69,8 @@ brw_codegen_cs_prog(struct brw_context *brw,
>  
> memset(&prog_data, 0, sizeof(prog_data));
>  
> +   prog_data.base.total_shared = prog->Comp.SharedSize;
> +
> assign_cs_binding_table_offsets(brw->intelScreen->devinfo, prog,
> &cp->program.Base, &prog_data);
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 8189c08..ca5378a 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -2846,6 +2846,8 @@ enum brw_wm_barycentric_interp_mode {
>  /* GEN7 DW5, GEN8+ DW6 */
>  # define MEDIA_BARRIER_ENABLE_SHIFT 21
>  # define MEDIA_BARRIER_ENABLE_MASK  INTEL_MASK(21, 21)
> +# define MEDIA_SHARED_LOCAL_MEMORY_SIZE_SHIFT   16
> +# define MEDIA_SHARED_LOCAL_MEMORY_SIZE_MASKINTEL_MASK(20, 16)
>  # define MEDIA_GPGPU_THREAD_COUNT_SHIFT 0
>  # define MEDIA_GPGPU_THREAD_COUNT_MASK  INTEL_MASK(7, 0)
>  # define GEN8_MEDIA_GPGPU_THREAD_COUNT_SHIFT0
> diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c 
> b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> index 2d7c04f..344ea5a 100644
> --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> @@ -164,8 +164,20 @@ brw_upload_cs_state(struct brw_context *brw)
>SET_FIELD(threads, GEN8_MEDIA_GPGPU_THREAD_COUNT) :
>SET_FIELD(threads, MEDIA_GPGPU_THREAD_COUNT);
> assert(threads <= brw->max_cs_threads);
> +
> +   assert(prog_data->total_shared <= 64 * 1024);
> +   uint32_t slm_size = 0;
> +   if (prog_data->total_shared > 0) {
> +  /* slm_size is in 4k increments, but must be a power of 2. */
> +  slm_size = 4 * 1024;
> +  while (slm_size < 64 * 1024 && slm_size < prog_data->total_shared)

total_shared is guaranteed to be <= 64KB because of the assert above so
we should not need the "slm_size < 64 * 1024" part of the condition
here.

Reviewed-by: Iago Toral Quiroga 

> + slm_size <<= 1;
> +  slm_size /= 4 * 1024;
> +   }
> +
> desc[dw++] =
>SET_FIELD(cs_prog_data->uses_barrier, MEDIA_BARRIER_ENABLE) |
> +  SET_FIELD(slm_size, MEDIA_SHARED_LOCAL_MEMORY_SIZE) |
>media_threads;
>  
> BEGIN_BATCH(4);
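
The size computation in the quoted hunk, with the redundant bound dropped as
suggested in the review comment, can be sketched as a standalone function.
sketch_encode_slm_size is a hypothetical name; the logic follows the quoted
loop, and the 64KB limit is the one asserted in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round a shared-memory size in bytes up to a power of two (minimum 4KB)
 * and return it in units of 4KB. Zero bytes encodes as zero. */
static uint32_t
sketch_encode_slm_size(uint32_t total_shared)
{
   uint32_t slm_size;

   assert(total_shared <= 64 * 1024);

   if (total_shared == 0)
      return 0;

   slm_size = 4 * 1024;
   /* The assert above already bounds total_shared, so the loop cannot
    * run past 64KB and needs no second condition. */
   while (slm_size < total_shared)
      slm_size <<= 1;

   return slm_size / (4 * 1024);
}
```

For example, 20000 bytes rounds up to 32KB and encodes as 8, and the maximum
of 64KB encodes as 16.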




Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample needs to pass OpenGL3.3 conformance test.

2015-11-25 Thread Lofstedt, Marta

> -Original Message-
> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> Behalf Of Tapani Pälli
> Sent: Wednesday, November 25, 2015 12:24 PM
> To: Timothy Arceri; Predut, Marius; mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample
> needs to pass OpenGL3.3 conformance test.
> 
> On 11/25/2015 01:11 PM, Timothy Arceri wrote:
> > On Wed, 2015-11-25 at 12:47 +0200, Tapani Pälli wrote:
> >> Hi;
> >>
> >> On 11/25/2015 01:15 PM, Marius Predut wrote:
> >>> Open GL 3.3 reference document says:
> >>> samples must be in the range zero to GL_MAX_TEXTURE_SIZE - 1.
> >>> Open GL.4 clearly states:
> >>> An INVALID_VALUE error is generated if samples is zero.
> > See my comment in bugzilla [1] I believe this is just a bug in the
> > reference pages, we implement things in Mesa going by what the spec
> > says and the spec says nothing about samples being 0 in the 3.2 spec
> > in fact it doesn't even say anything in the 4.0 spec which you have
> > changed the check to.
> >
> > Also the 4.5 reference pages also conflict with the spec so this is
> > even more reason I think this change is wrong.
> >
> > [1] https://bugs.freedesktop.org/show_bug.cgi?id=91670
> 
> I believe there is some background to this change, AFAIK the reason to push
> this would be to get desktop GL CTS test suite (which we currently run
> against GL 3.x)  behave better. Marius, is this right?
> 

There is also the option of filing a bug against the CTS.

> 
> >> OpenGL ES 3.1 spec also says "An INVALID_VALUE error is generated if
> >> samples is zero.". You'll need to change your check below to include
> >> also ES 3.1.
> >>
> >>> Fixing the piglit test case gl-3.2-layered-rendering
> >>> -framebuffertexture.
> >>>
> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93100
> >>>
> >>> Signed-off-by: Marius Predut 
> >>> ---
> >>>src/mesa/main/teximage.c | 2 +-
> >>>1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
> >>> index d9453e3..69634ff 100644
> >>> --- a/src/mesa/main/teximage.c
> >>> +++ b/src/mesa/main/teximage.c
> >>> @@ -5211,7 +5211,7 @@ texture_image_multisample(struct gl_context
> >>> *ctx, GLuint dims,
> >>>  return;
> >>>   }
> >>>
> >>> -   if (samples < 1) {
> >>> +   if (samples < 1 && ctx->API == API_OPENGL_CORE && ctx->Version
> >>> >= 40) {
> >>>  _mesa_error(ctx, GL_INVALID_VALUE, "%s(samples < 1)",
> >>> func);
> >>>  return;
> >>>   }
> >> ___
> >> mesa-dev mailing list
> >> mesa-dev@lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample needs to pass OpenGL3.3 conformance test.

2015-11-25 Thread Predut, Marius

-Original Message-
From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of 
Tapani Pälli
Sent: Wednesday, November 25, 2015 1:23 PM
To: Timothy Arceri; Predut, Marius; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample needs to pass 
OpenGL3.3 conformance test.

On 11/25/2015 01:11 PM, Timothy Arceri wrote:
> On Wed, 2015-11-25 at 12:47 +0200, Tapani Pälli wrote:
>> Hi;
>>
>> On 11/25/2015 01:15 PM, Marius Predut wrote:
>>> Open GL 3.3 reference document says:
>>> samples must be in the range zero to GL_MAX_TEXTURE_SIZE - 1.
>>> Open GL.4 clearly states:
>>> An INVALID_VALUE error is generated if samples is zero.
> See my comment in bugzilla [1] I believe this is just a bug in the 
> reference pages, we implement things in Mesa going by what the spec 
> says and the spec says nothing about samples being 0 in the 3.2 spec 
> in fact it doesn't even say anything in the 4.0 spec which you have
> changed the check to.
>
> Also the 4.5 reference pages also conflict with the spec so this is 
> even more reason I think this change is wrong.
>
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=91670

I believe there is some background to this change, AFAIK the reason to push 
this would be to get desktop GL CTS test suite (which we currently run against 
GL 3.x)  behave better. Marius, is this right?

Yes, Tapani.

>> OpenGL ES 3.1 spec also says "An INVALID_VALUE error is generated if 
>> samples is zero.". You'll need to change your check below to include
>> also ES 3.1.
>>
>>> Fixing the piglit test case gl-3.2-layered-rendering 
>>> -framebuffertexture.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93100
>>>
>>> Signed-off-by: Marius Predut 
>>> ---
>>>src/mesa/main/teximage.c | 2 +-
>>>1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c 
>>> index d9453e3..69634ff 100644
>>> --- a/src/mesa/main/teximage.c
>>> +++ b/src/mesa/main/teximage.c
>>> @@ -5211,7 +5211,7 @@ texture_image_multisample(struct gl_context 
>>> *ctx, GLuint dims,
>>>  return;
>>>   }
>>>
>>> -   if (samples < 1) {
>>> +   if (samples < 1 && ctx->API == API_OPENGL_CORE && ctx->Version
>>> >= 40) {
>>>  _mesa_error(ctx, GL_INVALID_VALUE, "%s(samples < 1)", 
>>> func);
>>>  return;
>>>   }
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH v2 31/42] glsl: Allow atomic functions to be used with shared variables

2015-11-25 Thread Iago Toral
On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> Reviewed-by: Timothy Arceri 

Reviewed-by: Iago Toral Quiroga 

> ---
>  src/glsl/ast_function.cpp | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp
> index 466ece6..da1167a 100644
> --- a/src/glsl/ast_function.cpp
> +++ b/src/glsl/ast_function.cpp
> @@ -143,19 +143,21 @@ verify_image_parameter(YYLTYPE *loc, 
> _mesa_glsl_parse_state *state,
>  }
>  
>  static bool
> -verify_first_atomic_ssbo_parameter(YYLTYPE *loc, _mesa_glsl_parse_state 
> *state,
> +verify_first_atomic_parameter(YYLTYPE *loc, _mesa_glsl_parse_state *state,
> ir_variable *var)
>  {
> -   if (!var || !var->is_in_shader_storage_block()) {
> +   if (!var ||
> +   (!var->is_in_shader_storage_block() &&
> +var->data.mode != ir_var_shader_shared)) {
>_mesa_glsl_error(loc, state, "First argument to atomic function "
> -   "must be a buffer variable");
> +   "must be a buffer or shared variable");
>return false;
> }
> return true;
>  }
>  
>  static bool
> -is_atomic_ssbo_function(const char *func_name)
> +is_atomic_function(const char *func_name)
>  {
> return !strcmp(func_name, "atomicAdd") ||
>!strcmp(func_name, "atomicMin") ||
> @@ -276,16 +278,16 @@ verify_parameter_modes(_mesa_glsl_parse_state *state,
>  
> /* The first parameter of atomic functions must be a buffer variable */
> const char *func_name = sig->function_name();
> -   bool is_atomic_ssbo = is_atomic_ssbo_function(func_name);
> -   if (is_atomic_ssbo) {
> +   bool is_atomic = is_atomic_function(func_name);
> +   if (is_atomic) {
>const ir_rvalue *const actual = (ir_rvalue *) 
> actual_ir_parameters.head;
>  
>const ast_expression *const actual_ast =
>   exec_node_data(ast_expression, actual_ast_parameters.head, link);
>YYLTYPE loc = actual_ast->get_location();
>  
> -  if (!verify_first_atomic_ssbo_parameter(&loc, state,
> -  
> actual->variable_referenced())) {
> +  if (!verify_first_atomic_parameter(&loc, state,
> + actual->variable_referenced())) {
>   return false;
>}
> }




Re: [Mesa-dev] [PATCH v2 30/42] i965: Lower shared variable references to intrinsic calls

2015-11-25 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_shader.cpp | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
> b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index c4a567f..1365609 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -137,6 +137,9 @@ brw_compiler_create(void *mem_ctx, const struct 
> brw_device_info *devinfo)
>compiler->glsl_compiler_options[i].LowerBufferInterfaceBlocks = true;
> }
>  
> +   compiler->glsl_compiler_options[MESA_SHADER_COMPUTE]
> +  .LowerShaderSharedVariables = true;
> +
> return compiler;
>  }
>  




[Mesa-dev] [PATCH 0/3] mesa: delayed initialization of performance monitor groups

2015-11-25 Thread Nicolai Hähnle
Hi,

since performance monitors are almost never used, we want to avoid the startup
cost associated with setting up the group information. This mostly affects
Gallium, where that initialization is rather inefficient right now.

For radeonsi with my performance counter series, the first two patches shave
off almost 200ms from the time taken by glxinfo. The last patch is a smaller
win on top for radeon only (harder to measure, in the single-digit ms range at
most). (The remaining overhead for radeonsi in enabling performance counters
vs. not doing so when unused is not measurable with a naive use of `time`
on the command line.)

The 200ms are actually largely caused by the fact that the state tracker
queries the driver rather inefficiently, with the time of initialization
being proportional to # groups * # driver queries rather than proportional to
# driver queries (and then, inside of that, yet another inefficiency in
the radeon driver). It would certainly be nice to improve that as well,
but even with such an improvement, it's time we don't need to spend in the
vast majority of cases, hence the approach I've taken in these patches.

Please review!

Thanks,
Nicolai
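
For readers unfamiliar with the pattern, the delayed initialization described
above can be sketched outside of Mesa as follows. All sketch_ names are
hypothetical stand-ins; the only idea taken from the series is that group
information is set up by a driver hook the first time it is actually needed.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_group {
   const char *name;
};

struct sketch_context {
   const struct sketch_group *groups;   /* NULL until first use */
   unsigned num_groups;
   unsigned init_calls;                 /* counts hook invocations */
   void (*init_perf_monitor_groups)(struct sketch_context *ctx);
};

/* Stand-in for the (potentially expensive) driver hook. */
static void
sketch_driver_init_groups(struct sketch_context *ctx)
{
   static const struct sketch_group groups[] = {
      { "group0" },
      { "group1" },
   };

   ctx->groups = groups;
   ctx->num_groups = 2;
   ctx->init_calls++;
}

/* Run the driver hook only if the groups have not been set up yet. */
static void
sketch_init_groups(struct sketch_context *ctx)
{
   assert(ctx->init_perf_monitor_groups != NULL);
   if (!ctx->groups)
      ctx->init_perf_monitor_groups(ctx);
}

/* An entry point that can be reached before any monitor object exists,
 * so it must trigger the delayed initialization itself. */
static unsigned
sketch_get_num_groups(struct sketch_context *ctx)
{
   sketch_init_groups(ctx);
   return ctx->num_groups;
}
```

A context that never queries the groups never pays for the hook, and repeated
queries invoke it only once.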



[Mesa-dev] [PATCH 1/3] mesa/main: allow delayed initialization of performance monitors

2015-11-25 Thread Nicolai Hähnle
Most applications never use performance counters, so allow drivers to
skip potentially expensive initialization steps.

A driver that wants to use this must enable the appropriate extension(s)
at context initialization and set the InitPerfMonitorGroups driver function
which will be called the first time information about the performance monitor
groups is actually used.

The init_groups helper is called for API functions that can be called before
a monitor object exists. Functions that require an existing monitor object
can rely on init_groups having been called before.
---
 src/mesa/main/dd.h  |  1 +
 src/mesa/main/performance_monitor.c | 39 +
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 496a14f..e5281ce 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -727,6 +727,7 @@ struct dd_function_table {
 * \name Performance monitors
 */
/*@{*/
+   void (*InitPerfMonitorGroups)(struct gl_context *ctx);
struct gl_perf_monitor_object * (*NewPerfMonitor)(struct gl_context *ctx);
void (*DeletePerfMonitor)(struct gl_context *ctx,
  struct gl_perf_monitor_object *m);
diff --git a/src/mesa/main/performance_monitor.c 
b/src/mesa/main/performance_monitor.c
index 2d740da..98dfbea 100644
--- a/src/mesa/main/performance_monitor.c
+++ b/src/mesa/main/performance_monitor.c
@@ -53,6 +53,13 @@ _mesa_init_performance_monitors(struct gl_context *ctx)
ctx->PerfMonitor.Groups = NULL;
 }
 
+static inline void
+init_groups(struct gl_context *ctx)
+{
+   if (unlikely(!ctx->PerfMonitor.Groups))
+  ctx->Driver.InitPerfMonitorGroups(ctx);
+}
+
 static struct gl_perf_monitor_object *
 new_performance_monitor(struct gl_context *ctx, GLuint index)
 {
@@ -171,6 +178,7 @@ _mesa_GetPerfMonitorGroupsAMD(GLint *numGroups, GLsizei 
groupsSize,
   GLuint *groups)
 {
GET_CURRENT_CONTEXT(ctx);
+   init_groups(ctx);
 
if (numGroups != NULL)
   *numGroups = ctx->PerfMonitor.NumGroups;
@@ -191,7 +199,11 @@ _mesa_GetPerfMonitorCountersAMD(GLuint group, GLint 
*numCounters,
 GLsizei countersSize, GLuint *counters)
 {
GET_CURRENT_CONTEXT(ctx);
-   const struct gl_perf_monitor_group *group_obj = get_group(ctx, group);
+   const struct gl_perf_monitor_group *group_obj;
+
+   init_groups(ctx);
+
+   group_obj = get_group(ctx, group);
if (group_obj == NULL) {
   _mesa_error(ctx, GL_INVALID_VALUE,
   "glGetPerfMonitorCountersAMD(invalid group)");
@@ -219,9 +231,11 @@ _mesa_GetPerfMonitorGroupStringAMD(GLuint group, GLsizei 
bufSize,
GLsizei *length, GLchar *groupString)
 {
GET_CURRENT_CONTEXT(ctx);
+   const struct gl_perf_monitor_group *group_obj;
 
-   const struct gl_perf_monitor_group *group_obj = get_group(ctx, group);
+   init_groups(ctx);
 
+   group_obj = get_group(ctx, group);
if (group_obj == NULL) {
   _mesa_error(ctx, GL_INVALID_VALUE, "glGetPerfMonitorGroupStringAMD");
   return;
@@ -251,6 +265,8 @@ _mesa_GetPerfMonitorCounterStringAMD(GLuint group, GLuint 
counter,
const struct gl_perf_monitor_group *group_obj;
const struct gl_perf_monitor_counter *counter_obj;
 
+   init_groups(ctx);
+
group_obj = get_group(ctx, group);
 
if (group_obj == NULL) {
@@ -290,6 +306,8 @@ _mesa_GetPerfMonitorCounterInfoAMD(GLuint group, GLuint 
counter, GLenum pname,
const struct gl_perf_monitor_group *group_obj;
const struct gl_perf_monitor_counter *counter_obj;
 
+   init_groups(ctx);
+
group_obj = get_group(ctx, group);
 
if (group_obj == NULL) {
@@ -353,6 +371,8 @@ _mesa_GenPerfMonitorsAMD(GLsizei n, GLuint *monitors)
if (MESA_VERBOSE & VERBOSE_API)
   _mesa_debug(ctx, "glGenPerfMonitorsAMD(%d)\n", n);
 
+   init_groups(ctx);
+
if (n < 0) {
   _mesa_error(ctx, GL_INVALID_VALUE, "glGenPerfMonitorsAMD(n < 0)");
   return;
@@ -673,6 +693,8 @@ _mesa_GetFirstPerfQueryIdINTEL(GLuint *queryId)
GET_CURRENT_CONTEXT(ctx);
unsigned numGroups;
 
+   init_groups(ctx);
+
/* The GL_INTEL_performance_query spec says:
 *
 *"If queryId pointer is equal to 0, INVALID_VALUE error is generated."
@@ -705,6 +727,7 @@ extern void GLAPIENTRY
 _mesa_GetNextPerfQueryIdINTEL(GLuint queryId, GLuint *nextQueryId)
 {
GET_CURRENT_CONTEXT(ctx);
+   init_groups(ctx);
 
/* The GL_INTEL_performance_query spec says:
 *
@@ -744,6 +767,8 @@ _mesa_GetPerfQueryIdByNameINTEL(char *queryName, GLuint 
*queryId)
GET_CURRENT_CONTEXT(ctx);
unsigned i;
 
+   init_groups(ctx);
+
/* The GL_INTEL_performance_query spec says:
 *
 *"If queryName does not reference a valid query name, an INVALID_VALUE
@@ -783,9 +808,11 @@ _mesa_GetPerfQueryInfoINTEL(GLuint queryId,
GET_CURRENT_CONTEXT(ctx);
unsigned i;
 
-   const struct gl_perf_monitor_group *group_obj =
-  get_group

[Mesa-dev] [PATCH 2/3] st/mesa: delay initialization of performance counters

2015-11-25 Thread Nicolai Hähnle
---
 src/mesa/state_tracker/st_cb_perfmon.c | 28 +---
 src/mesa/state_tracker/st_cb_perfmon.h |  2 +-
 src/mesa/state_tracker/st_context.c|  2 +-
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_perfmon.c 
b/src/mesa/state_tracker/st_cb_perfmon.c
index 8fdf0e8..cd7fdc0 100644
--- a/src/mesa/state_tracker/st_cb_perfmon.c
+++ b/src/mesa/state_tracker/st_cb_perfmon.c
@@ -329,8 +329,20 @@ st_GetPerfMonitorResult(struct gl_context *ctx,
 
 
 bool
-st_init_perfmon(struct st_context *st)
+st_have_perfmon(struct st_context *st)
 {
+   struct pipe_screen *screen = st->pipe->screen;
+
+   if (!screen->get_driver_query_info || !screen->get_driver_query_group_info)
+  return false;
+
+   return screen->get_driver_query_group_info(screen, 0, NULL) != 0;
+}
+
+static void
+st_InitPerfMonitorGroups(struct gl_context *ctx)
+{
+   struct st_context *st = st_context(ctx);
struct gl_perf_monitor_state *perfmon = &st->ctx->PerfMonitor;
struct pipe_screen *screen = st->pipe->screen;
struct gl_perf_monitor_group *groups = NULL;
@@ -338,20 +350,14 @@ st_init_perfmon(struct st_context *st)
int num_counters, num_groups;
int gid, cid;
 
-   if (!screen->get_driver_query_info || !screen->get_driver_query_group_info)
-  return false;
-
/* Get the number of available queries. */
num_counters = screen->get_driver_query_info(screen, 0, NULL);
-   if (!num_counters)
-  return false;
 
/* Get the number of available groups. */
num_groups = screen->get_driver_query_group_info(screen, 0, NULL);
-   if (num_groups)
-  groups = CALLOC(num_groups, sizeof(*groups));
+   groups = CALLOC(num_groups, sizeof(*groups));
if (!groups)
-  return false;
+  return;
 
stgroups = CALLOC(num_groups, sizeof(*stgroups));
if (!stgroups)
@@ -432,7 +438,7 @@ st_init_perfmon(struct st_context *st)
perfmon->Groups = groups;
st->perfmon = stgroups;
 
-   return true;
+   return;
 
 fail:
for (gid = 0; gid < num_groups; gid++) {
@@ -442,7 +448,6 @@ fail:
FREE(stgroups);
 fail_only_groups:
FREE(groups);
-   return false;
 }
 
 void
@@ -461,6 +466,7 @@ st_destroy_perfmon(struct st_context *st)
 
 void st_init_perfmon_functions(struct dd_function_table *functions)
 {
+   functions->InitPerfMonitorGroups = st_InitPerfMonitorGroups;
functions->NewPerfMonitor = st_NewPerfMonitor;
functions->DeletePerfMonitor = st_DeletePerfMonitor;
functions->BeginPerfMonitor = st_BeginPerfMonitor;
diff --git a/src/mesa/state_tracker/st_cb_perfmon.h 
b/src/mesa/state_tracker/st_cb_perfmon.h
index 2973286..98202f2 100644
--- a/src/mesa/state_tracker/st_cb_perfmon.h
+++ b/src/mesa/state_tracker/st_cb_perfmon.h
@@ -73,7 +73,7 @@ st_perf_monitor_object(struct gl_perf_monitor_object *q)
 }
 
 bool
-st_init_perfmon(struct st_context *st);
+st_have_perfmon(struct st_context *st);
 
 void
 st_destroy_perfmon(struct st_context *st);
diff --git a/src/mesa/state_tracker/st_context.c 
b/src/mesa/state_tracker/st_context.c
index 6e20fd1..eea3de0 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -255,7 +255,7 @@ st_create_context_priv( struct gl_context *ctx, struct 
pipe_context *pipe,
st_init_extensions(st->pipe->screen, &ctx->Const,
   &ctx->Extensions, &st->options, ctx->Mesa_DXTn);
 
-   if (st_init_perfmon(st)) {
+   if (st_have_perfmon(st)) {
   /* GL_AMD_performance_monitor is only enabled when the underlying
* driver expose GPU hardware performance counters. */
   ctx->Extensions.AMD_performance_monitor = GL_TRUE;
-- 
2.5.0



[Mesa-dev] [PATCH 3/3] radeon: delay the generation of driver query names until first use

2015-11-25 Thread Nicolai Hähnle
This shaves a bit more time off the startup of programs that don't
actually use performance counters.
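
As a rough illustration of the idea (not the driver code), the sketch below builds a fixed-stride table of "<basename>_NNN" names on first lookup. struct demo_block and demo_selector_name are hypothetical names, and the layout is simplified to a single basename without the shader, SE or instance suffixes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified block: one basename, no suffix flags. */
struct demo_block {
   const char *basename;
   unsigned num_selectors;
   unsigned name_stride;   /* bytes reserved per name, including NUL */
   char *names;            /* NULL until the first lookup */
};

/* Return the name of selector 'index', generating the whole table on the
 * first call. Returns NULL on a bad index or allocation failure. */
static const char *
demo_selector_name(struct demo_block *block, unsigned index)
{
   /* "%03u" only yields exactly three digits for values below 1000. */
   if (index >= block->num_selectors || block->num_selectors > 1000)
      return NULL;

   if (!block->names) {
      /* basename + '_' + three digits + terminating NUL */
      unsigned stride = (unsigned)strlen(block->basename) + 5;
      char *names = malloc((size_t)block->num_selectors * stride);
      if (!names)
         return NULL;

      for (unsigned i = 0; i < block->num_selectors; i++)
         snprintf(names + (size_t)i * stride, stride, "%s_%03u",
                  block->basename, i);

      block->name_stride = stride;
      block->names = names;
   }

   return block->names + (size_t)index * block->name_stride;
}
```

Because every name occupies the same number of bytes, a lookup is a single multiplication and no per-name pointer array is needed.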
---
 src/gallium/drivers/radeon/r600_perfcounter.c | 192 ++
 src/gallium/drivers/radeon/r600_query.h   |  10 +-
 src/gallium/drivers/radeonsi/si_perfcounter.c |  15 +-
 3 files changed, 113 insertions(+), 104 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_perfcounter.c 
b/src/gallium/drivers/radeon/r600_perfcounter.c
index a74cc89..962c41d 100644
--- a/src/gallium/drivers/radeon/r600_perfcounter.c
+++ b/src/gallium/drivers/radeon/r600_perfcounter.c
@@ -424,6 +424,89 @@ error:
return NULL;
 }
 
+static boolean r600_init_block_names(struct r600_common_screen *screen,
+struct r600_perfcounter_block *block)
+{
+   unsigned i, j, k;
+   unsigned groups_shader = 1, groups_se = 1, groups_instance = 1;
+   unsigned namelen;
+   char *groupname;
+   char *p;
+
+   if (block->flags & R600_PC_BLOCK_INSTANCE_GROUPS)
+   groups_instance = block->num_instances;
+   if (block->flags & R600_PC_BLOCK_SE_GROUPS)
+   groups_se = screen->info.max_se;
+   if (block->flags & R600_PC_BLOCK_SHADER)
+   groups_shader = ARRAY_SIZE(r600_pc_shader_suffix);
+
+   namelen = strlen(block->basename);
+   block->group_name_stride = namelen + 1;
+   if (block->flags & R600_PC_BLOCK_SHADER)
+   block->group_name_stride += 3;
+   if (block->flags & R600_PC_BLOCK_SE_GROUPS) {
+   assert(groups_se <= 10);
+   block->group_name_stride += 1;
+
+   if (block->flags & R600_PC_BLOCK_INSTANCE_GROUPS)
+   block->group_name_stride += 1;
+   }
+   if (block->flags & R600_PC_BLOCK_INSTANCE_GROUPS) {
+   assert(groups_instance <= 100);
+   block->group_name_stride += 2;
+   }
+
+   block->group_names = MALLOC(block->num_groups * 
block->group_name_stride);
+   if (!block->group_names)
+   return FALSE;
+
+   groupname = block->group_names;
+   for (i = 0; i < groups_shader; ++i) {
+   unsigned shaderlen = strlen(r600_pc_shader_suffix[i]);
+   for (j = 0; j < groups_se; ++j) {
+   for (k = 0; k < groups_instance; ++k) {
+   strcpy(groupname, block->basename);
+   p = groupname + namelen;
+
+   if (block->flags & R600_PC_BLOCK_SHADER) {
+   strcpy(p, r600_pc_shader_suffix[i]);
+   p += shaderlen;
+   }
+
+   if (block->flags & R600_PC_BLOCK_SE_GROUPS) {
+   p += sprintf(p, "%d", j);
+   if (block->flags & 
R600_PC_BLOCK_INSTANCE_GROUPS)
+   *p++ = '_';
+   }
+
+   if (block->flags & 
R600_PC_BLOCK_INSTANCE_GROUPS)
+   p += sprintf(p, "%d", k);
+
+   groupname += block->group_name_stride;
+   }
+   }
+   }
+
+   assert(block->num_selectors <= 1000);
+   block->selector_name_stride = block->group_name_stride + 4;
+   block->selector_names = MALLOC(block->num_groups * block->num_selectors 
*
+  block->selector_name_stride);
+   if (!block->selector_names)
+   return FALSE;
+
+   groupname = block->group_names;
+   p = block->selector_names;
+   for (i = 0; i < block->num_groups; ++i) {
+   for (j = 0; j < block->num_selectors; ++j) {
+   sprintf(p, "%s_%03d", groupname, j);
+   p += block->selector_name_stride;
+   }
+   groupname += block->group_name_stride;
+   }
+
+   return TRUE;
+}
+
 int r600_get_perfcounter_info(struct r600_common_screen *screen,
  unsigned index,
  struct pipe_driver_query_info *info)
@@ -450,6 +533,10 @@ int r600_get_perfcounter_info(struct r600_common_screen 
*screen,
if (!block)
return 0;
 
+   if (!block->selector_names) {
+   if (!r600_init_block_names(screen, block))
+   return 0;
+   }
info->name = block->selector_names + sub * block->selector_name_stride;
info->query_type = R600_QUERY_FIRST_PERFCOUNTER + index;
info->max_value.u64 = 0;
@@ -478,6 +565,11 @@ int r600_get_perfcounter_group_info(struct 
r600_common_screen *screen,
block = lookup_group(pc, &index);
if (!block)
return 0;
+
+   if (!block->group_names) {
+   if (!r600_init_block_names(screen, block))
+

Re: [Mesa-dev] [PATCH v2 35/42] glsl: Translate atomic intrinsic functions on shared variables

2015-11-25 Thread Iago Toral
On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> When an intrinsic atomic operation is used on a shared variable, we
> translate it to a new 'share variable' specific intrinsic function

s/share/shared

> call.
> 
> For example, add call to __intrinsic_atomic_add when used on a shared
> variable will be translated to a call to
> __intrinsic_atomic_add_shared.

s/add/a

> 
> Signed-off-by: Jordan Justen 
> ---
>  src/glsl/lower_shared_reference.cpp | 151 
> 
>  1 file changed, 151 insertions(+)
> 
> diff --git a/src/glsl/lower_shared_reference.cpp 
> b/src/glsl/lower_shared_reference.cpp
> index 21bc5d5..7da32a5 100644
> --- a/src/glsl/lower_shared_reference.cpp
> +++ b/src/glsl/lower_shared_reference.cpp
> @@ -80,6 +80,10 @@ public:
> ir_visitor_status visit_enter(ir_assignment *ir);
> void handle_assignment(ir_assignment *ir);
>  
> +   ir_call *lower_shared_atomic_intrinsic(ir_call *ir);
> +   ir_call *check_for_shared_atomic_intrinsic(ir_call *ir);
> +   ir_visitor_status visit_enter(ir_call *ir);
> +
> unsigned get_shared_offset(const ir_variable *);
>  
> ir_call *shared_load(void *mem_ctx, const struct glsl_type *type,
> @@ -342,6 +346,153 @@ lower_shared_reference_visitor::shared_load(void 
> *mem_ctx,
> return new(mem_ctx) ir_call(sig, deref_result, &call_params);
>  }
>  
> +/* Lowers the intrinsic call to a new internal intrinsic that swaps the
> + * access to the buffer variable in the first parameter by an offset

s/buffer/shared

> + * and block index. This involves creating the new internal intrinsic

the block_index part does not apply to shared variables.

> + * (i.e. the new function signature).
> + */
> +ir_call *
> +lower_shared_reference_visitor::lower_shared_atomic_intrinsic(ir_call *ir)
> +{
> +   /* Shared atomics usually have 2 parameters, the shared variable and an
> +* integer argument. The exception is CompSwap, that has an additional
> +* integer parameter.
> +*/
> +   int param_count = ir->actual_parameters.length();
> +   assert(param_count == 2 || param_count == 3);
> +
> +   /* First argument must be a scalar integer buffer variable */

s/buffer/shared

> +   exec_node *param = ir->actual_parameters.get_head();
> +   ir_instruction *inst = (ir_instruction *) param;
> +   assert(inst->ir_type == ir_type_dereference_variable ||
> +  inst->ir_type == ir_type_dereference_array ||
> +  inst->ir_type == ir_type_dereference_record ||
> +  inst->ir_type == ir_type_swizzle);
> +
> +   ir_rvalue *deref = (ir_rvalue *) inst;
> +   assert(deref->type->is_scalar() && deref->type->is_integer());
> +
> +   ir_variable *var = deref->variable_referenced();
> +   assert(var);
> +
> +   /* Compute the offset to the start if the dereference and the
> +* block index

Only the offset, not the block index.

> +*/
> +   void *mem_ctx = ralloc_parent(shader->ir);
> +
> +   ir_rvalue *offset = NULL;
> +   unsigned const_offset = get_shared_offset(var);
> +   bool row_major;
> +   int matrix_columns;
> +   const glsl_type *iface = var->get_interface_type();
> +   unsigned packing =
> +  iface ? iface->interface_packing : GLSL_INTERFACE_PACKING_STD430;
> +   buffer_access_type = shared_atomic_access;

Same question about shared variables in interface blocks I asked in a
previous patch.

> +   setup_buffer_access(mem_ctx, var, deref,
> +   &offset, &const_offset,
> +   &row_major, &matrix_columns, packing);
> +
> +   assert(offset);
> +   assert(!row_major);
> +   assert(matrix_columns == 1);
> +
> +   ir_rvalue *deref_offset =
> +  add(offset, new(mem_ctx) ir_constant(const_offset));
> +
> +   /* Create the new internal function signature that will take a block
> +* index and offset instead of a buffer variable

Only an offset, not a block index

> +*/
> +   exec_list sig_params;
> +   ir_variable *sig_param = new(mem_ctx)
> +  ir_variable(glsl_type::uint_type, "offset" , ir_var_function_in);
> +   sig_params.push_tail(sig_param);
> +
> +   const glsl_type *type = deref->type->base_type == GLSL_TYPE_INT ?
> +  glsl_type::int_type : glsl_type::uint_type;
> +   sig_param = new(mem_ctx)
> + ir_variable(type, "data1", ir_var_function_in);
> +   sig_params.push_tail(sig_param);
> +
> +   if (param_count == 3) {
> +  sig_param = new(mem_ctx)
> +ir_variable(type, "data2", ir_var_function_in);
> +  sig_params.push_tail(sig_param);
> +   }
> +
> +   ir_function_signature *sig =
> +  new(mem_ctx) ir_function_signature(deref->type,
> + compute_shader_enabled);
> +   assert(sig);
> +   sig->replace_parameters(&sig_params);
> +   sig->is_intrinsic = true;
> +
> +   char func_name[64];
> +   sprintf(func_name, "%s_shared", ir->callee_name());
> +   ir_function *f = new(mem_ctx) ir_function(func_name);
> +   f->add_signature(sig);
> +
> +   /* Now, create the call to the inter

Re: [Mesa-dev] [PATCH 0/8] radeonsi: Implement AMD_performance_monitor

2015-11-25 Thread Nicolai Hähnle

On 24.11.2015 17:20, Marek Olšák wrote:
> On Tue, Nov 24, 2015 at 5:08 PM, Ilia Mirkin wrote:
>> On Tue, Nov 24, 2015 at 11:00 AM, Nicolai Hähnle wrote:
>>> One annoyance of the series as it currently stands is that exposing thousands
>>> of counters affects the context creation time due to the way the Mesa state
>>> tracker initializes itself, especially given that this is a feature that
>>> isn't used in production. Dealing with that is on my todo-list.
>>
>> Just a thought - only expose the (majority of the) counters in debug contexts?
>
> Deferring the initialization until the first use seems better.

I agree.

> If piglit takes longer to run now, the issue should be addressed
> before the patches land.

I haven't tested the running time of piglit, but with the patches I've
just posted the overhead should be essentially zero now.

Cheers,
Nicolai

> Marek




Re: [Mesa-dev] [PATCH v2 36/42] glsl: Buffer atomics are supported for compute shaders

2015-11-25 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/glsl/builtin_functions.cpp | 70 
> +++---
>  1 file changed, 38 insertions(+), 32 deletions(-)
> 
> diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
> index 3e767e8..bd4c5a3 100644
> --- a/src/glsl/builtin_functions.cpp
> +++ b/src/glsl/builtin_functions.cpp
> @@ -465,6 +465,12 @@ compute_shader(const _mesa_glsl_parse_state *state)
>  }
>  
>  static bool
> +buffer_atomics_supported(const _mesa_glsl_parse_state *state)
> +{
> +   return compute_shader(state) || shader_storage_buffer_object(state);
> +}
> +
> +static bool
>  barrier_supported(const _mesa_glsl_parse_state *state)
>  {
> return compute_shader(state) ||
> @@ -916,51 +922,51 @@ builtin_builder::create_intrinsics()
>  NULL);
>  
> add_function("__intrinsic_atomic_add",
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::uint_type),
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::int_type),
>  NULL);
> add_function("__intrinsic_atomic_min",
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::uint_type),
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::int_type),
>  NULL);
> add_function("__intrinsic_atomic_max",
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::uint_type),
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::int_type),
>  NULL);
> add_function("__intrinsic_atomic_and",
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::uint_type),
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::int_type),
>  NULL);
> add_function("__intrinsic_atomic_or",
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::uint_type),
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::int_type),
>  NULL);
> add_function("__intrinsic_atomic_xor",
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::uint_type),
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::int_type),
>  NULL);
> add_function("__intrinsic_atomic_exchange",
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::uint_type),
> -_atomic_intrinsic2(shader_storage_buffer_object,
> +_atomic_intrinsic2(buffer_atomics_supported,
> glsl_type::int_type),
>  NULL);
> add_function("__intrinsic_atomic_comp_swap",
> -_atomic_intrinsic3(shader_storage_buffer_object,
> +_atomic_intrinsic3(buffer_atomics_supported,
> glsl_type::uint_type),
> -_atomic_intrinsic3(shader_storage_buffer_object,
> +_atomic_intrinsic3(buffer_atomics_supported,
> glsl_type::int_type),
>  NULL);
>  
> @@ -2658,66 +2664,66 @@ builtin_builder::create_builtins()
>  
> add_function("atomicAdd",
>  _atomic_op2("__intrinsic_atomic_add",
> -shader_storage_buffer_object,
> +buffer_atomics_supported,
>  glsl_type::uint_type),
>  _atomic_op2("__intrinsic_atomic_add",
> -shade

Re: [Mesa-dev] [PATCH v2 37/42] glsl: Disable several optimizations on shared variables

2015-11-25 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> Shared variables can be accessed by other threads within the same
> local workgroup. This prevents us from performing certain
> optimizations with shared variables.
> 
> Signed-off-by: Jordan Justen 
> ---
>  src/glsl/opt_constant_propagation.cpp | 3 ++-
>  src/glsl/opt_constant_variable.cpp| 3 ++-
>  src/glsl/opt_copy_propagation.cpp | 3 ++-
>  3 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/src/glsl/opt_constant_propagation.cpp 
> b/src/glsl/opt_constant_propagation.cpp
> index 184aaa1..fb24a4f 100644
> --- a/src/glsl/opt_constant_propagation.cpp
> +++ b/src/glsl/opt_constant_propagation.cpp
> @@ -500,7 +500,8 @@ 
> ir_constant_propagation_visitor::add_constant(ir_assignment *ir)
>  * the variable value isn't modified between this assignment and the next
>  * instruction where its value is read.
>  */
> -   if (deref->var->data.mode == ir_var_shader_storage)
> +   if (deref->var->data.mode == ir_var_shader_storage ||
> +   deref->var->data.mode == ir_var_shader_shared)
>return;
>  
> entry = new(this->mem_ctx) acp_entry(deref->var, ir->write_mask, 
> constant);
> diff --git a/src/glsl/opt_constant_variable.cpp 
> b/src/glsl/opt_constant_variable.cpp
> index cdfbc34..56f6a81 100644
> --- a/src/glsl/opt_constant_variable.cpp
> +++ b/src/glsl/opt_constant_variable.cpp
> @@ -120,7 +120,8 @@ ir_constant_variable_visitor::visit_enter(ir_assignment 
> *ir)
>  * and we can't be sure that this variable won't be written by another
>  * thread.
>  */
> -   if (var->data.mode == ir_var_shader_storage)
> +   if (var->data.mode == ir_var_shader_storage ||
> +   var->data.mode == ir_var_shader_shared)
>return visit_continue;
>  
> constval = ir->rhs->constant_expression_value();
> diff --git a/src/glsl/opt_copy_propagation.cpp 
> b/src/glsl/opt_copy_propagation.cpp
> index f206995..5d4cb4f 100644
> --- a/src/glsl/opt_copy_propagation.cpp
> +++ b/src/glsl/opt_copy_propagation.cpp
> @@ -330,7 +330,8 @@ ir_copy_propagation_visitor::add_copy(ir_assignment *ir)
> */
>ir->condition = new(ralloc_parent(ir)) ir_constant(false);
>this->progress = true;
> -  } else if (lhs_var->data.mode != ir_var_shader_storage) {
> +  } else if (lhs_var->data.mode != ir_var_shader_storage &&
> + lhs_var->data.mode != ir_var_shader_shared) {
>entry = new(this->acp) acp_entry(lhs_var, rhs_var);
>this->acp->push_tail(entry);
>}




[Mesa-dev] [PATCH 0/1] Do not loose steps when dumping to file with INTEL_DEBUG=optimizer

2015-11-25 Thread Juan A. Suarez Romero

The BRW functions that run the optimizations in the FS
(fs_visitor::optimize()) and in the VS (vec4_visitor::run()) can be called
more than once.

This means that, with INTEL_DEBUG=optimizer, the output files holding the
result of each optimization step are overwritten by successive calls.

The quickest and easiest fix is to add a new variable that is part of the
filename (like iteration and pass_num) but is never re-initialized to 0, so
that it indicates the call number. This way we don't lose any intermediate
files.

Adding a new variable, instead of no longer resetting one of the existing ones
(like iteration) to 0, makes it easier to identify, for each file, the call in
which it was generated.
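
The naming scheme can be sketched in isolation like this. It is only an illustration: demo_next_call_number and demo_dump_filename are hypothetical helpers, and the stage and shader names used with them are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 on the first call, 2 on the second, and so on. The counter is
 * a function-local static, so it is never reset for the life of the
 * process. */
static int
demo_next_call_number(void)
{
   static int call_number = 0;
   return ++call_number;
}

/* Build a dump filename that includes the call number in front of the
 * per-call iteration and pass counters. Returns what snprintf returns. */
static int
demo_dump_filename(char *buf, size_t size, const char *stage,
                   const char *shader_name, int call_number,
                   int iteration, int pass_num, const char *pass)
{
   return snprintf(buf, size, "%s-%s-%02d-%02d-%02d-%s",
                   stage, shader_name, call_number, iteration, pass_num,
                   pass);
}
```

Two calls that reach the same iteration and pass number now produce different filenames, because only the leading call number differs.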

As an example, on unmodified master I get 138 files for FS/VS optimizations
when running a piglit test, while now I get 225, meaning I'm recovering 87
missing steps.


Juan A. Suarez Romero (1):
  i965: Do not overwrite optimizer dumps

 src/mesa/drivers/dri/i965/brw_fs.cpp   | 13 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +++
 2 files changed, 16 insertions(+), 8 deletions(-)

-- 
2.5.0



[Mesa-dev] [PATCH 1/1] i965: Do not overwrite optimizer dumps

2015-11-25 Thread Juan A. Suarez Romero
When using INTEL_DEBUG=optimizer, each optimization step is dumped to disk
in a separate file.

But as fs_visitor::optimize() and vec4_visitor::run() are called more
than once, the files already on disk end up being overwritten, losing
the previous optimizer steps.

To avoid this, add a new static variable that tracks the global
iteration across the entire lifetime of the running program.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 13 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +++
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 29f19cc..9520a62 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4947,6 +4947,8 @@ fs_visitor::calculate_register_pressure()
 void
 fs_visitor::optimize()
 {
+   static int global_iteration = 0;
+
/* Start by validating the shader we currently have. */
validate();
 
@@ -4977,8 +4979,9 @@ fs_visitor::optimize()
 \
   if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER) && this_progress) {   \
  char filename[64]; \
- snprintf(filename, 64, "%s%d-%s-%02d-%02d-" #pass,  \
-  stage_abbrev, dispatch_width, nir->info.name, iteration, 
pass_num); \
+ snprintf(filename, 64, "%s%d-%s-%02d-%02d-%02d-" #pass,\
+  stage_abbrev, dispatch_width, nir->info.name, \
+  global_iteration, iteration, pass_num);   \
 \
  backend_shader::dump_instructions(filename);   \
   } \
@@ -4989,10 +4992,12 @@ fs_visitor::optimize()
   this_progress;\
})
 
+   global_iteration++;
+
if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) {
   char filename[64];
-  snprintf(filename, 64, "%s%d-%s-00-start",
-   stage_abbrev, dispatch_width, nir->info.name);
+  snprintf(filename, 64, "%s%d-%s-%02d-00-00-start",
+   stage_abbrev, dispatch_width, nir->info.name, global_iteration);
 
   backend_shader::dump_instructions(filename);
}
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 9a79d67..689ed2a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1779,6 +1779,8 @@ vec4_visitor::convert_to_hw_regs()
 bool
 vec4_visitor::run()
 {
+   static int global_iteration = 0;
+
if (shader_time_index >= 0)
   emit_shader_time_begin();
 
@@ -1812,8 +1814,8 @@ vec4_visitor::run()
\
   if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER) && this_progress) {  \
  char filename[64];\
- snprintf(filename, 64, "%s-%s-%02d-%02d-" #pass,  \
-  stage_abbrev, nir->info.name, iteration, pass_num);  \
+ snprintf(filename, 64, "%s-%s-%02d-%02d-%02d-" #pass, \
+  stage_abbrev, nir->info.name, global_iteration, iteration, 
pass_num); \
\
  backend_shader::dump_instructions(filename);  \
   }\
@@ -1822,11 +1824,12 @@ vec4_visitor::run()
   this_progress;   \
})
 
+   global_iteration++;
 
if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) {
   char filename[64];
-  snprintf(filename, 64, "%s-%s-00-start",
-   stage_abbrev, nir->info.name);
+  snprintf(filename, 64, "%s-%s-%02d-00-00-start",
+   stage_abbrev, nir->info.name, global_iteration);
 
   backend_shader::dump_instructions(filename);
}
-- 
2.5.0



Re: [Mesa-dev] [PATCH v2 06/42] i965: Import tables enumerating the set of validated L3 configurations.

2015-11-25 Thread Samuel Iglesias Gonsálvez


On 18/11/15 06:54, Jordan Justen wrote:
> From: Francisco Jerez 
> 
> It should be possible to use additional L3 configurations other than
> the ones listed in the tables of validated allocations ("BSpec »
> 3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
> » L3 Allocation and Programming"), but it seems sensible for now to
> hard-code the tables in order to stick to the hardware docs.  Instead
> of setting up the arbitrary L3 partitioning given as input, the
> closest validated L3 configuration will be looked up in these tables
> and used to program the hardware.
> 
> The included tables should work for Gen7-9.  Note that the quantities
> are specified in ways rather than in KB, this is because the L3
> control registers expect the value in ways, and because by doing that
> we can re-use a single table for all GT variants of the same
> generation (and in the case of IVB/HSW and CHV/SKL across different
> generations) which generally have different L3 way sizes but allow the
> same combinations of way allocations.
> ---
>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>  src/mesa/drivers/dri/i965/gen7_l3_state.c  | 163 
> +
>  2 files changed, 164 insertions(+)
>  create mode 100644 src/mesa/drivers/dri/i965/gen7_l3_state.c
> 
> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
> b/src/mesa/drivers/dri/i965/Makefile.sources
> index 5a88d66..91901ad 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.sources
> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> @@ -184,6 +184,7 @@ i965_FILES = \
>   gen7_cs_state.c \
>   gen7_disable.c \
>   gen7_gs_state.c \
> + gen7_l3_state.c \
>   gen7_misc_state.c \
>   gen7_sf_state.c \
>   gen7_sol_state.c \
> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
> b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> new file mode 100644
> index 000..8f9ba5b
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> @@ -0,0 +1,163 @@
> +/*
> + * Copyright (c) 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "brw_context.h"
> +#include "brw_defines.h"
> +#include "brw_state.h"
> +#include "intel_batchbuffer.h"
> +
> +/**
> + * Chunk of L3 cache reserved for some specific purpose.
> + */
> +enum brw_l3_partition {
> +   /** Shared local memory. */
> +   L3P_SLM = 0,
> +   /** Unified return buffer. */
> +   L3P_URB,
> +   /** Union of DC and RO. */
> +   L3P_ALL,
> +   /** Data cluster RW partition. */
> +   L3P_DC,
> +   /** Union of IS, C and T. */
> +   L3P_RO,
> +   /** Instruction and state cache. */
> +   L3P_IS,
> +   /** Constant cache. */
> +   L3P_C,
> +   /** Texture cache. */
> +   L3P_T,
> +   /** Number of supported L3 partitions. */
> +   NUM_L3P
> +};
> +
> +/**
> + * L3 configuration represented as the number of ways allocated for each
> + * partition.  \sa get_l3_way_size().
> + */
> +struct brw_l3_config {
> +   unsigned n[NUM_L3P];
> +};
> +
> +/**
> + * IVB/HSW validated L3 configurations.
> + */
> +static const struct brw_l3_config ivb_l3_configs[] = {
> +   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
> +   {{  0, 32,  0, 16, 16,  0,  0,  0 }},
> +   {{  0, 32,  0,  4,  0,  8,  4, 16 }},
> +   {{  0, 28,  0,  8,  0,  8,  4, 16 }},
> +   {{  0, 28,  0, 16,  0,  8,  4,  8 }},
> +   {{  0, 28,  0,  8,  0, 16,  4,  8 }},
> +   {{  0, 28,  0,  0,  0, 16,  4, 16 }},
> +   {{  0, 32,  0,  0,  0, 16,  0, 16 }},
> +   {{  0, 28,  0,  4, 32,  0,  0,  0 }},
> +   {{ 16, 16,  0, 16, 16,  0,  0,  0 }},
> +   {{ 16, 16,  0,  8,  0,  8,  8,  8 }},
> +   {{ 16, 16,  0,  4,  0,  8,  4, 16 }},
> +   {{ 16, 16,  0,  4,  0, 16,  4,  8 }},
> +   {{ 16, 16,  0,  0, 32,  0,  0,  0 }},
> +   {{ 0 }}
> +};
> +

This definition is wrong. It defines 15 elements when NUM_L3P value is 9.
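
For reference, the dimensions in question can be checked with a small
standalone sketch. The enum and the table are copied from the quoted
patch; ARRAY_LEN and the three helper functions are illustrative names
that are not part of the patch. With the enum as quoted, NUM_L3P
evaluates to 8, each row lists 8 values, and the outer array holds 15
rows once the zero terminator is counted.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the quoted patch: one enumerator per L3 partition. */
enum brw_l3_partition {
   L3P_SLM = 0, L3P_URB, L3P_ALL, L3P_DC,
   L3P_RO, L3P_IS, L3P_C, L3P_T,
   NUM_L3P
};

struct brw_l3_config {
   unsigned n[NUM_L3P];
};

/* Copied from the quoted patch: 14 configurations plus a terminator. */
static const struct brw_l3_config ivb_l3_configs[] = {
   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
   {{  0, 32,  0, 16, 16,  0,  0,  0 }},
   {{  0, 32,  0,  4,  0,  8,  4, 16 }},
   {{  0, 28,  0,  8,  0,  8,  4, 16 }},
   {{  0, 28,  0, 16,  0,  8,  4,  8 }},
   {{  0, 28,  0,  8,  0, 16,  4,  8 }},
   {{  0, 28,  0,  0,  0, 16,  4, 16 }},
   {{  0, 32,  0,  0,  0, 16,  0, 16 }},
   {{  0, 28,  0,  4, 32,  0,  0,  0 }},
   {{ 16, 16,  0, 16, 16,  0,  0,  0 }},
   {{ 16, 16,  0,  8,  0,  8,  8,  8 }},
   {{ 16, 16,  0,  4,  0,  8,  4, 16 }},
   {{ 16, 16,  0,  4,  0, 16,  4,  8 }},
   {{ 16, 16,  0,  0, 32,  0,  0,  0 }},
   {{ 0 }}
};

/* Illustrative helper macro, not part of the patch. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Number of rows in the table, terminator included. */
static size_t
l3_table_rows(void)
{
   return ARRAY_LEN(ivb_l3_configs);
}

/* Number of per-partition entries in one row. */
static size_t
l3_table_columns(void)
{
   return ARRAY_LEN(ivb_l3_configs[0].n);
}

/* A row made only of zeros marks the end of the table. */
static int
l3_row_is_terminator(const struct brw_l3_config *cfg)
{
   assert(cfg != NULL);
   for (size_t i = 0; i < NUM_L3P; i++) {
      if (cfg->n[i] != 0)
         return 0;
   }
   return 1;
}
```

The row count and the per-row element count are independent here: the
first comes from the number of initializers, the second from NUM_L3P.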

> +/**
> + * VLV validated L3 configurations.
> + */

Re: [Mesa-dev] [PATCH v2 38/42] nir: Add nir intrinsics for shared variable atomic operations

2015-11-25 Thread Iago Toral
On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/glsl/nir/glsl_to_nir.cpp  | 53 
> +++
>  src/glsl/nir/nir_intrinsics.h | 25 
>  2 files changed, 78 insertions(+)
> 
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index 83724d3..a7ee4be 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -731,6 +731,22 @@ nir_visitor::visit(ir_call *ir)
>   op = nir_intrinsic_load_shared;
>} else if (strcmp(ir->callee_name(), "__intrinsic_store_shared") == 0) 
> {
>   op = nir_intrinsic_store_shared;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_add_shared") 
> == 0) {
> + op = nir_intrinsic_shared_atomic_add;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_and_shared") 
> == 0) {
> + op = nir_intrinsic_shared_atomic_and;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_or_shared") 
> == 0) {
> + op = nir_intrinsic_shared_atomic_or;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_xor_shared") 
> == 0) {
> + op = nir_intrinsic_shared_atomic_xor;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_min_shared") 
> == 0) {
> + op = nir_intrinsic_shared_atomic_min;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_max_shared") 
> == 0) {
> + op = nir_intrinsic_shared_atomic_max;

This is based on our early SSBO NIR code. However, for min/max we need
to do something different, because we have to differentiate between the
unsigned and signed versions, and our initial NIR port did not get that
right. See the NIR SSBO implementation in master.

Iago
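
To illustrate why a single min (or max) opcode cannot cover both cases,
here is a minimal standalone sketch. The function names are made up for
the example and have nothing to do with the NIR or i965 code; the point
is only that the same pair of 32-bit patterns gives different results
depending on whether they are read as signed or unsigned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a signed integer without a cast. */
static int32_t
as_signed(uint32_t bits)
{
   int32_t value;
   memcpy(&value, &bits, sizeof value);
   return value;
}

/* Minimum when both patterns are read as signed integers. */
static uint32_t
imin_bits(uint32_t a, uint32_t b)
{
   return as_signed(a) < as_signed(b) ? a : b;
}

/* Minimum when both patterns are read as unsigned integers. */
static uint32_t
umin_bits(uint32_t a, uint32_t b)
{
   return a < b ? a : b;
}

/* 0xffffffff is -1 when signed, but the largest value when unsigned. */
static void
check_min_semantics(void)
{
   assert(imin_bits(0xffffffffu, 1u) == 0xffffffffu);
   assert(umin_bits(0xffffffffu, 1u) == 1u);
}
```

So the signedness has to be known at the point where the operation is
selected; it cannot be recovered from the bit patterns alone.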

> +  } else if (strcmp(ir->callee_name(), 
> "__intrinsic_atomic_exchange_shared") == 0) {
> + op = nir_intrinsic_shared_atomic_exchange;
> +  } else if (strcmp(ir->callee_name(), 
> "__intrinsic_atomic_comp_swap_shared") == 0) {
> + op = nir_intrinsic_shared_atomic_comp_swap;
>} else {
>   unreachable("not reached");
>}
> @@ -1036,6 +1052,43 @@ nir_visitor::visit(ir_call *ir)
>   nir_builder_instr_insert(&b, &instr->instr);
>   break;
>}
> +  case nir_intrinsic_shared_atomic_add:
> +  case nir_intrinsic_shared_atomic_min:
> +  case nir_intrinsic_shared_atomic_max:
> +  case nir_intrinsic_shared_atomic_and:
> +  case nir_intrinsic_shared_atomic_or:
> +  case nir_intrinsic_shared_atomic_xor:
> +  case nir_intrinsic_shared_atomic_exchange:
> +  case nir_intrinsic_shared_atomic_comp_swap: {
> + int param_count = ir->actual_parameters.length();
> + assert(param_count == 2 || param_count == 3);
> +
> + /* Offset */
> + exec_node *param = ir->actual_parameters.get_head();
> + ir_instruction *inst = (ir_instruction *) param;
> + instr->src[0] = nir_src_for_ssa(evaluate_rvalue(inst->as_rvalue()));
> +
> + /* data1 parameter (this is always present) */
> + param = param->get_next();
> + inst = (ir_instruction *) param;
> + instr->src[1] = nir_src_for_ssa(evaluate_rvalue(inst->as_rvalue()));
> +
> + /* data2 parameter (only with atomic_comp_swap) */
> + if (param_count == 3) {
> +assert(op == nir_intrinsic_shared_atomic_comp_swap);
> +param = param->get_next();
> +inst = (ir_instruction *) param;
> +instr->src[2] =
> +   nir_src_for_ssa(evaluate_rvalue(inst->as_rvalue()));
> + }
> +
> + /* Atomic result */
> + assert(ir->return_deref);
> + nir_ssa_dest_init(&instr->instr, &instr->dest,
> +   ir->return_deref->type->vector_elements, NULL);
> + nir_builder_instr_insert(&b, &instr->instr);
> + break;
> +  }
>default:
>   unreachable("not reached");
>}
> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
> index 6912258..31b83fe 100644
> --- a/src/glsl/nir/nir_intrinsics.h
> +++ b/src/glsl/nir/nir_intrinsics.h
> @@ -203,6 +203,31 @@ INTRINSIC(ssbo_atomic_xor, 3, ARR(1, 1, 1), true, 1, 0, 
> 0, 0)
>  INTRINSIC(ssbo_atomic_exchange, 3, ARR(1, 1, 1), true, 1, 0, 0, 0)
>  INTRINSIC(ssbo_atomic_comp_swap, 4, ARR(1, 1, 1, 1), true, 1, 0, 0, 0)
>  
> +/*
> + * CS shared variable atomic intrinsics
> + *
> + * All of the shared variable atomic memory operations read a value from
> + * memory, compute a new value using one of the operations below, write the
> + * new value to memory, and return the original value read.
> + *
> + * All operations take 2 sources except CompSwap that takes 3. These
> + * sources represent:
> + *
> + * 0: The offset into the shared variable storage region that the atomic
> + *operation will operate on.
> + * 1: The data parameter to the atomic function (i.e. the

Re: [Mesa-dev] [PATCH v2 39/42] i965/nir: Implement shared variable atomic operations

2015-11-25 Thread Iago Toral
On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.h   |  2 ++
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 60 
> 
>  2 files changed, 62 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> b/src/mesa/drivers/dri/i965/brw_fs.h
> index cbfc07f..1c9b4c3 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -276,6 +276,8 @@ public:
> nir_intrinsic_instr *instr);
> void nir_emit_ssbo_atomic(const brw::fs_builder &bld,
>   int op, nir_intrinsic_instr *instr);
> +   void nir_emit_shared_atomic(const brw::fs_builder &bld,
> +   int op, nir_intrinsic_instr *instr);
> void nir_emit_texture(const brw::fs_builder &bld,
>   nir_tex_instr *instr);
> void nir_emit_jump(const brw::fs_builder &bld,
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index c8c6370..792fda7 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1967,6 +1967,37 @@ fs_visitor::nir_emit_cs_intrinsic(const fs_builder 
> &bld,
>break;
> }
>  
> +   case nir_intrinsic_shared_atomic_add:
> +  nir_emit_shared_atomic(bld, BRW_AOP_ADD, instr);
> +  break;
> +   case nir_intrinsic_shared_atomic_min:
> +  if (dest.type == BRW_REGISTER_TYPE_D)
> + nir_emit_shared_atomic(bld, BRW_AOP_IMIN, instr);
> +  else
> + nir_emit_shared_atomic(bld, BRW_AOP_UMIN, instr);
> +  break;
> +   case nir_intrinsic_shared_atomic_max:
> +  if (dest.type == BRW_REGISTER_TYPE_D)
> + nir_emit_shared_atomic(bld, BRW_AOP_IMAX, instr);
> +  else
> + nir_emit_shared_atomic(bld, BRW_AOP_UMAX, instr);
> +  break;

As explained in the previous patch, this implementation of min/max won't
work. Check the ssbo implementation in master.

> +   case nir_intrinsic_shared_atomic_and:
> +  nir_emit_shared_atomic(bld, BRW_AOP_AND, instr);
> +  break;
> +   case nir_intrinsic_shared_atomic_or:
> +  nir_emit_shared_atomic(bld, BRW_AOP_OR, instr);
> +  break;
> +   case nir_intrinsic_shared_atomic_xor:
> +  nir_emit_shared_atomic(bld, BRW_AOP_XOR, instr);
> +  break;
> +   case nir_intrinsic_shared_atomic_exchange:
> +  nir_emit_shared_atomic(bld, BRW_AOP_MOV, instr);
> +  break;
> +   case nir_intrinsic_shared_atomic_comp_swap:
> +  nir_emit_shared_atomic(bld, BRW_AOP_CMPWR, instr);
> +  break;
> +
> default:
>nir_emit_intrinsic(bld, instr);
>break;
> @@ -2607,6 +2638,35 @@ fs_visitor::nir_emit_ssbo_atomic(const fs_builder &bld,
>  }
>  
>  void
> +fs_visitor::nir_emit_shared_atomic(const fs_builder &bld,
> +   int op, nir_intrinsic_instr *instr)
> +{
> +   fs_reg dest;
> +   if (nir_intrinsic_infos[instr->intrinsic].has_dest)
> +  dest = get_nir_dest(instr->dest);
> +
> +   unsigned index = BRW_SLM_SURFACE_INDEX;
> +   fs_reg surface = fs_reg(index);

Please write this as fs_reg surface = brw_imm_ud(BRW_SLM_SURFACE_INDEX)
and drop the index variable.

Iago

> +   fs_reg offset = get_nir_src(instr->src[0]);
> +   fs_reg data1 = get_nir_src(instr->src[1]);
> +   fs_reg data2;
> +   if (op == BRW_AOP_CMPWR)
> +  data2 = get_nir_src(instr->src[2]);
> +
> +   /* Emit the actual atomic operation */
> +
> +   fs_reg atomic_result =
> +  surface_access::emit_untyped_atomic(bld, surface, offset,
> +  data1, data2,
> +  1 /* dims */, 1 /* rsize */,
> +  op,
> +  BRW_PREDICATE_NONE);
> +   dest.type = atomic_result.type;
> +   bld.MOV(dest, atomic_result);
> +}
> +
> +void
>  fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr)
>  {
> unsigned sampler = instr->sampler_index;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/1] i965: Do not overwrite optimizer dumps

2015-11-25 Thread Juan A. Suarez Romero
On Wed, 2015-11-25 at 13:15 +0100, Juan A. Suarez Romero wrote:
> When using INTEL_DEBUG=optimizer, each optimization step is dumped to
> disk, in a separate file.
> 
> But as fs_visitor::optimize() and vec4_visitor::run() are called more
> than once, the files already on disk end up being overwritten, losing
> the previous optimizer steps.
> 
> To avoid this, add a new static variable that tracks the global
> iteration across the entire lifetime of the running program.
> ---

I forgot to add the Signed-off-by line.

J.A.



[Mesa-dev] [PATCH] i965: Do not overwrite optimizer dumps

2015-11-25 Thread Juan A. Suarez Romero
When using INTEL_DEBUG=optimizer, each optimization step is dumped to
disk, in a separate file.

But as fs_visitor::optimize() and vec4_visitor::run() are called more
than once, the files already on disk end up being overwritten, losing
the previous optimizer steps.

To avoid this, add a new static variable that tracks the global
iteration across the entire lifetime of the running program.

Signed-off-by: Juan A. Suarez Romero 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 13 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +++
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 29f19cc..9520a62 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4947,6 +4947,8 @@ fs_visitor::calculate_register_pressure()
 void
 fs_visitor::optimize()
 {
+   static int global_iteration = 0;
+
/* Start by validating the shader we currently have. */
validate();
 
@@ -4977,8 +4979,9 @@ fs_visitor::optimize()
 \
   if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER) && this_progress) {   \
  char filename[64]; \
- snprintf(filename, 64, "%s%d-%s-%02d-%02d-" #pass,  \
-  stage_abbrev, dispatch_width, nir->info.name, iteration, pass_num); \
+ snprintf(filename, 64, "%s%d-%s-%02d-%02d-%02d-" #pass,\
+  stage_abbrev, dispatch_width, nir->info.name, \
+  global_iteration, iteration, pass_num);   \
 \
  backend_shader::dump_instructions(filename);   \
   } \
@@ -4989,10 +4992,12 @@ fs_visitor::optimize()
   this_progress;\
})
 
+   global_iteration++;
+
if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) {
   char filename[64];
-  snprintf(filename, 64, "%s%d-%s-00-start",
-   stage_abbrev, dispatch_width, nir->info.name);
+  snprintf(filename, 64, "%s%d-%s-%02d-00-00-start",
+   stage_abbrev, dispatch_width, nir->info.name, global_iteration);
 
   backend_shader::dump_instructions(filename);
}
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 9a79d67..689ed2a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1779,6 +1779,8 @@ vec4_visitor::convert_to_hw_regs()
 bool
 vec4_visitor::run()
 {
+   static int global_iteration = 0;
+
if (shader_time_index >= 0)
   emit_shader_time_begin();
 
@@ -1812,8 +1814,8 @@ vec4_visitor::run()
\
   if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER) && this_progress) {  \
  char filename[64];\
- snprintf(filename, 64, "%s-%s-%02d-%02d-" #pass,  \
-  stage_abbrev, nir->info.name, iteration, pass_num);  \
+ snprintf(filename, 64, "%s-%s-%02d-%02d-%02d-" #pass, \
+  stage_abbrev, nir->info.name, global_iteration, iteration, pass_num); \
\
  backend_shader::dump_instructions(filename);  \
   }\
@@ -1822,11 +1824,12 @@ vec4_visitor::run()
   this_progress;   \
})
 
+   global_iteration++;
 
if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) {
   char filename[64];
-  snprintf(filename, 64, "%s-%s-00-start",
-   stage_abbrev, nir->info.name);
+  snprintf(filename, 64, "%s-%s-%02d-00-00-start",
+   stage_abbrev, nir->info.name, global_iteration);
 
   backend_shader::dump_instructions(filename);
}
-- 
2.5.0
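
The naming scheme used by the patch above can be sketched outside the
driver as follows. This is only an illustration: make_dump_name() and
runs_get_distinct_names() are hypothetical helpers, and the stage,
shader and pass strings are made-up values. The format string mirrors
the vec4 variant, with the pass name passed as an argument instead of
being pasted in by the preprocessor.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds
 * "<stage>-<shader>-<global>-<iteration>-<pass number>-<pass name>".
 * Returns what snprintf() returns, i.e. the untruncated length.
 */
static int
make_dump_name(char *buf, size_t len, const char *stage, const char *shader,
               int global_iteration, int iteration, int pass_num,
               const char *pass)
{
   assert(buf != NULL && len > 0);
   return snprintf(buf, len, "%s-%s-%02d-%02d-%02d-%s",
                   stage, shader, global_iteration, iteration, pass_num,
                   pass);
}

/* Returns nonzero when the same pass, in the same local iteration, gets
 * different file names for two values of the global counter.
 */
static int
runs_get_distinct_names(int first_run, int second_run)
{
   char a[64], b[64];

   make_dump_name(a, sizeof a, "VS", "shader", first_run, 1, 1, "opt");
   make_dump_name(b, sizeof b, "VS", "shader", second_run, 1, 1, "opt");
   return strcmp(a, b) != 0;
}
```

Without the global counter the two names in runs_get_distinct_names()
would be identical, which is the overwrite described in the commit
message. Note that the extra field also makes the names longer, so with
a 64-byte buffer snprintf() may truncate names of long shaders.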



Re: [Mesa-dev] [PATCH v2 40/42] i965: Enable ARB_compute_shader extension on supported hardware

2015-11-25 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> Enable ARB_compute_shader on gen7+, on hardware that supports the
> OpenGL 4.3 requirements of a local group size of 1024.
> 
> With SIMD16 support, this is limited to Ivy Bridge and Haswell.
> 
> Broadwell will work with a local group size up to 896 on SIMD16
> meaning programs that use this size or lower should run when setting
> MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader.
> 
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_context.c  | 11 ++-
>  src/mesa/drivers/dri/i965/intel_extensions.c |  2 ++
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
> b/src/mesa/drivers/dri/i965/brw_context.c
> index 2ea0a9e..fc32a2f 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -330,7 +330,9 @@ brw_initialize_context_constants(struct brw_context *brw)
>[MESA_SHADER_TESS_EVAL] = false,
>[MESA_SHADER_GEOMETRY] = brw->gen >= 6,
>[MESA_SHADER_FRAGMENT] = true,
> -  [MESA_SHADER_COMPUTE] = 
> _mesa_extension_override_enables.ARB_compute_shader,
> +  [MESA_SHADER_COMPUTE] =
> + (ctx->Const.MaxComputeWorkGroupSize[0] >= 1024) ||
> + _mesa_extension_override_enables.ARB_compute_shader,
> };
>  
> unsigned num_stages = 0;
> @@ -613,7 +615,7 @@ brw_initialize_context_constants(struct brw_context *brw)
>  }
>  
>  static void
> -brw_adjust_cs_context_constants(struct brw_context *brw)
> +brw_initialize_cs_context_constants(struct brw_context *brw, unsigned 
> max_threads)
>  {
> struct gl_context *ctx = &brw->ctx;
>  
> @@ -627,7 +629,7 @@ brw_adjust_cs_context_constants(struct brw_context *brw)
>  */
> const int simd_size = ctx->API == API_OPENGL_CORE ? 16 : 8;
>  
> -   const uint32_t max_invocations = simd_size * brw->max_cs_threads;
> +   const uint32_t max_invocations = simd_size * max_threads;
> ctx->Const.MaxComputeWorkGroupSize[0] = max_invocations;
> ctx->Const.MaxComputeWorkGroupSize[1] = max_invocations;
> ctx->Const.MaxComputeWorkGroupSize[2] = max_invocations;
> @@ -817,6 +819,7 @@ brwCreateContext(gl_api api,
> if (INTEL_DEBUG & DEBUG_PERF)
>brw->perf_debug = true;
>  
> +   brw_initialize_cs_context_constants(brw, devinfo->max_cs_threads);
> brw_initialize_context_constants(brw);
>  
> ctx->Const.ResetStrategy = notify_reset
> @@ -871,8 +874,6 @@ brwCreateContext(gl_api api,
> brw->urb.max_ds_entries = devinfo->urb.max_ds_entries;
> brw->urb.max_gs_entries = devinfo->urb.max_gs_entries;
>  
> -   brw_adjust_cs_context_constants(brw);
> -
> /* Estimate the size of the mappable aperture into the GTT.  There's an
>  * ioctl to get the whole GTT size, but not one to get the mappable 
> subset.
>  * It turns out it's basically always 256MB, though some ancient hardware
> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
> b/src/mesa/drivers/dri/i965/intel_extensions.c
> index 81215db..6326cfc 100644
> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> @@ -356,6 +356,8 @@ intelInitExtensions(struct gl_context *ctx)
>   ctx->Extensions.ARB_viewport_array = true;
>   ctx->Extensions.AMD_vertex_shader_viewport_index = true;
>   ctx->Extensions.ARB_shader_subroutine = true;
> + if (ctx->Const.MaxComputeWorkGroupSize[0] >= 1024)
> +ctx->Extensions.ARB_compute_shader = true;
>}
> }
>  
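
The enabling condition in the quoted patch boils down to simple
arithmetic, sketched here as standalone C. The helper names are
hypothetical, and the thread counts used in the usage note (64 and 56)
are illustrative values: 56 is just what a limit of 896 at SIMD16
implies, and 64 is the smallest count that reaches 1024.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest local group size: one invocation per SIMD channel per thread. */
static uint32_t
max_cs_invocations(unsigned simd_size, unsigned max_threads)
{
   assert(simd_size == 8 || simd_size == 16);
   return simd_size * max_threads;
}

/* Compute is exposed when the OpenGL 4.3 minimum of 1024 invocations is
 * reachable, or when the user forces the extension on.
 */
static bool
exposes_compute_shader(unsigned simd_size, unsigned max_threads,
                       bool override_enabled)
{
   return max_cs_invocations(simd_size, max_threads) >= 1024 ||
          override_enabled;
}
```

For example, exposes_compute_shader(16, 64, false) is true, while
exposes_compute_shader(16, 56, false) is false because 16 * 56 is 896;
passing true as the last argument models the
MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader case.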




Re: [Mesa-dev] [PATCH v2 41/42] docs: Mark ARB_compute_shader as done for i965

2015-11-25 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  docs/GL3.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/GL3.txt b/docs/GL3.txt
> index b768eea..cb4415f 100644
> --- a/docs/GL3.txt
> +++ b/docs/GL3.txt
> @@ -152,7 +152,7 @@ GL 4.3, GLSL 4.30:
>GL_ARB_arrays_of_arrays  DONE (i965)
>GL_ARB_ES3_compatibility DONE (all drivers 
> that support GLSL 3.30)
>GL_ARB_clear_buffer_object   DONE (all drivers)
> -  GL_ARB_compute_shaderin progress (jljusten)
> +  GL_ARB_compute_shaderDONE (i965)
>GL_ARB_copy_imageDONE (i965, nv50, 
> nvc0, radeonsi)
>GL_KHR_debug DONE (all drivers)
>GL_ARB_explicit_uniform_location DONE (all drivers 
> that support GLSL)
> @@ -210,7 +210,7 @@ GL 4.5, GLSL 4.50:
>  These are the extensions cherry-picked to make GLES 3.1
>  GLES3.1, GLSL ES 3.1
>GL_ARB_arrays_of_arrays  DONE (i965)
> -  GL_ARB_compute_shaderin progress (jljusten)
> +  GL_ARB_compute_shaderDONE (i965)
>GL_ARB_draw_indirect DONE (i965, nvc0, 
> r600, radeonsi, llvmpipe, softpipe)
>GL_ARB_explicit_uniform_location DONE (all drivers 
> that support GLSL)
>GL_ARB_framebuffer_no_attachmentsDONE (i965)




Re: [Mesa-dev] [PATCH v2 42/42] docs: Add ARB_compute_shader to 11.1.0 release notes

2015-11-25 Thread Iago Toral
I think we are too late for 11.1.0.

On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  docs/relnotes/11.1.0.html | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
> index 6654311..c89b822 100644
> --- a/docs/relnotes/11.1.0.html
> +++ b/docs/relnotes/11.1.0.html
> @@ -47,6 +47,7 @@ Note: some of the new features are only available with 
> certain drivers.
>  GL_ARB_arrays_of_arrays on i965
>  GL_ARB_blend_func_extended on freedreno (a3xx)
>  GL_ARB_clear_texture on nv50, nvc0
> +GL_ARB_compute_shader on i965
>  GL_ARB_copy_image on nv50, nvc0, radeonsi
>  GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips
>  GL_ARB_gpu_shader5 on r600 for Evergreen and later chips




Re: [Mesa-dev] [PATCH v2 06/42] i965: Import tables enumerating the set of validated L3 configurations.

2015-11-25 Thread Samuel Iglesias Gonsálvez


On 25/11/15 13:20, Samuel Iglesias Gonsálvez wrote:
> 
> 
> On 18/11/15 06:54, Jordan Justen wrote:
>> From: Francisco Jerez 
>>
>> It should be possible to use additional L3 configurations other than
>> the ones listed in the tables of validated allocations ("BSpec »
>> 3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
>> » L3 Allocation and Programming"), but it seems sensible for now to
>> hard-code the tables in order to stick to the hardware docs.  Instead
>> of setting up the arbitrary L3 partitioning given as input, the
>> closest validated L3 configuration will be looked up in these tables
>> and used to program the hardware.
>>
>> The included tables should work for Gen7-9.  Note that the quantities
>> are specified in ways rather than in KB, this is because the L3
>> control registers expect the value in ways, and because by doing that
>> we can re-use a single table for all GT variants of the same
>> generation (and in the case of IVB/HSW and CHV/SKL across different
>> generations) which generally have different L3 way sizes but allow the
>> same combinations of way allocations.
>> ---
>>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>>  src/mesa/drivers/dri/i965/gen7_l3_state.c  | 163 
>> +
>>  2 files changed, 164 insertions(+)
>>  create mode 100644 src/mesa/drivers/dri/i965/gen7_l3_state.c
>>
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
>> b/src/mesa/drivers/dri/i965/Makefile.sources
>> index 5a88d66..91901ad 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>> @@ -184,6 +184,7 @@ i965_FILES = \
>>  gen7_cs_state.c \
>>  gen7_disable.c \
>>  gen7_gs_state.c \
>> +gen7_l3_state.c \
>>  gen7_misc_state.c \
>>  gen7_sf_state.c \
>>  gen7_sol_state.c \
>> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
>> b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>> new file mode 100644
>> index 000..8f9ba5b
>> --- /dev/null
>> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>> @@ -0,0 +1,163 @@
>> +/*
>> + * Copyright (c) 2015 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>> OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
>> OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
>> DEALINGS
>> + * IN THE SOFTWARE.
>> + */
>> +
>> +#include "brw_context.h"
>> +#include "brw_defines.h"
>> +#include "brw_state.h"
>> +#include "intel_batchbuffer.h"
>> +
>> +/**
>> + * Chunk of L3 cache reserved for some specific purpose.
>> + */
>> +enum brw_l3_partition {
>> +   /** Shared local memory. */
>> +   L3P_SLM = 0,
>> +   /** Unified return buffer. */
>> +   L3P_URB,
>> +   /** Union of DC and RO. */
>> +   L3P_ALL,
>> +   /** Data cluster RW partition. */
>> +   L3P_DC,
>> +   /** Union of IS, C and T. */
>> +   L3P_RO,
>> +   /** Instruction and state cache. */
>> +   L3P_IS,
>> +   /** Constant cache. */
>> +   L3P_C,
>> +   /** Texture cache. */
>> +   L3P_T,
>> +   /** Number of supported L3 partitions. */
>> +   NUM_L3P
>> +};
>> +
>> +/**
>> + * L3 configuration represented as the number of ways allocated for each
>> + * partition.  \sa get_l3_way_size().
>> + */
>> +struct brw_l3_config {
>> +   unsigned n[NUM_L3P];
>> +};
>> +
>> +/**
>> + * IVB/HSW validated L3 configurations.
>> + */
>> +static const struct brw_l3_config ivb_l3_configs[] = {
>> +   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
>> +   {{  0, 32,  0, 16, 16,  0,  0,  0 }},
>> +   {{  0, 32,  0,  4,  0,  8,  4, 16 }},
>> +   {{  0, 28,  0,  8,  0,  8,  4, 16 }},
>> +   {{  0, 28,  0, 16,  0,  8,  4,  8 }},
>> +   {{  0, 28,  0,  8,  0, 16,  4,  8 }},
>> +   {{  0, 28,  0,  0,  0, 16,  4, 16 }},
>> +   {{  0, 32,  0,  0,  0, 16,  0, 16 }},
>> +   {{  0, 28,  0,  4, 32,  0,  0,  0 }},
>> +   {{ 16, 16,  0, 16, 16,  0,  0,  0 }},
>> +   {{ 16, 16,  0,  8,  0,  8,  8,  8 }},
>> +   {{ 16, 16,  0,  4,  0,  8,  4, 16 }},
>> +   {{ 16, 16,  0,  4,  0, 16,  4,  8 }},
>> +   {{ 16

Re: [Mesa-dev] [PATCH v2 06/42] i965: Import tables enumerating the set of validated L3 configurations.

2015-11-25 Thread Francisco Jerez

Re: [Mesa-dev] [PATCH v3] mesa: error out in indirect draw when vertex bindings mismatch

2015-11-25 Thread Tapani Pälli

On 11/25/2015 08:21 AM, Fredrik Höglund wrote:

On Tuesday 24 November 2015, Tapani Pälli wrote:

Patch adds additional mask for tracking which vertex arrays have
associated vertex buffer binding set. This array can be directly
compared to which vertex arrays are enabled and should match when
drawing.

Fixes following CTS tests:

ES31-CTS.draw_indirect.negative-noVBO-arrays
ES31-CTS.draw_indirect.negative-noVBO-elements

v2: update mask in vertex_array_attrib_binding
v3: rename mask and make it track _BoundArrays which matches what
 was actually originally wanted (Fredrik Höglund)

Signed-off-by: Tapani Pälli 
---
  src/mesa/main/api_validate.c | 13 +
  src/mesa/main/mtypes.h   |  3 +++
  src/mesa/main/varray.c   | 10 ++
  3 files changed, 26 insertions(+)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index a490189..c13c177 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -710,6 +710,19 @@ valid_draw_indirect(struct gl_context *ctx,
return GL_FALSE;
 }
  
+   /* From OpenGL ES 3.1 spec. section 10.5:

+* "An INVALID_OPERATION error is generated if zero is bound to
+* VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to any enabled
+* vertex array."
+*
+* Here we check that for each enabled vertex array we have a vertex
+* buffer bound.
+*/

This error is specific to OpenGL ES AFAICT, so you should probably also
check the API here.


Did not notice that before, good catch.


+   if (ctx->Array.VAO->_Enabled != ctx->Array.VAO->VertexAttribBufferMask) {
+  _mesa_error(ctx, GL_INVALID_OPERATION, "%s(No VBO bound)", name);
+  return GL_FALSE;
+   }
+
 if (!_mesa_valid_prim_mode(ctx, mode, name))
return GL_FALSE;
  
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h

index d425571..242efe8 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -1419,6 +1419,9 @@ struct gl_vertex_array_object
 /** Vertex buffer bindings */
 struct gl_vertex_buffer_binding VertexBinding[VERT_ATTRIB_MAX];
  
+   /** Mask indicating which vertex arrays have vertex buffer associated. */

+   GLbitfield64 VertexAttribBufferMask;
+
 /** Mask of VERT_BIT_* values indicating which arrays are enabled */
 GLbitfield64 _Enabled;
  
diff --git a/src/mesa/main/varray.c b/src/mesa/main/varray.c

index 887d0c0..0cd8e33 100644
--- a/src/mesa/main/varray.c
+++ b/src/mesa/main/varray.c
@@ -135,6 +135,11 @@ vertex_attrib_binding(struct gl_context *ctx,
  {
 struct gl_vertex_attrib_array *array = &vao->VertexAttrib[attribIndex];
  
+   if (vao->VertexBinding[bindingIndex].BufferObj == ctx->Shared->NullBufferObj)

Make this if (!_mesa_is_bufferobj(...))


+ vao->VertexAttribBufferMask &= ~VERT_BIT(attribIndex);
+   else
+ vao->VertexAttribBufferMask |= VERT_BIT(attribIndex);
+
 if (array->VertexBinding != bindingIndex) {
const GLbitfield64 array_bit = VERT_BIT(attribIndex);
  
@@ -174,6 +179,11 @@ bind_vertex_buffer(struct gl_context *ctx,

binding->Offset = offset;
binding->Stride = stride;
  
+  if (vbo == ctx->Shared->NullBufferObj)

And here as well.

With those nitpicks fixed, this patch is:

Reviewed-by: Fredrik Höglund 


Thanks!


+ vao->VertexAttribBufferMask &= ~binding->_BoundArrays;
+  else
+ vao->VertexAttribBufferMask |= binding->_BoundArrays;
+
vao->NewArrays |= binding->_BoundArrays;
 }
  }
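
The check discussed in this thread can be sketched in isolation as below. This is only a minimal illustration, not the Mesa code: struct vao_sketch, set_buffer_bound() and indirect_draw_allowed() are hypothetical names, and the is_gles flag stands in for the API check suggested in the review. Note that it uses a subset test (every enabled array must have a buffer) rather than the equality comparison in the patch; that choice is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per vertex attribute, in the spirit of Mesa's VERT_BIT(). */
#define VERT_BIT(i) (UINT64_C(1) << (i))

/* Hypothetical stand-in for the relevant parts of a vertex array object. */
struct vao_sketch {
   uint64_t enabled;      /* arrays enabled for drawing */
   uint64_t buffer_mask;  /* arrays whose binding has a real buffer object */
};

/* Update the mask whenever a binding gains or loses its buffer object. */
static void
set_buffer_bound(struct vao_sketch *vao, unsigned attrib, bool has_buffer)
{
   if (has_buffer)
      vao->buffer_mask |= VERT_BIT(attrib);
   else
      vao->buffer_mask &= ~VERT_BIT(attrib);
}

/* Return false when an enabled array has no buffer bound (GLES only). */
static bool
indirect_draw_allowed(const struct vao_sketch *vao, bool is_gles)
{
   if (!is_gles)
      return true;   /* the error is assumed to be specific to OpenGL ES */

   return (vao->enabled & ~vao->buffer_mask) == 0;
}
```

Keeping the mask up to date at bind time makes the draw-time validation a single bitwise operation instead of a loop over all attributes.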





Re: [Mesa-dev] [PATCH 1/3] mesa/main: allow delayed initialization of performance monitors

2015-11-25 Thread Samuel Pitoiset
This is definitely a good performance improvement at initialization 
time. This is not going to affect Nouveau because we don't have as 
many performance counters as Radeon, but it's great anyway.


One comment below.

Reviewed-by: Samuel Pitoiset 

On 11/25/2015 01:00 PM, Nicolai Hähnle wrote:

Most applications never use performance counters, so allow drivers to
skip potentially expensive initialization steps.

A driver that wants to use this must enable the appropriate extension(s)
at context initialization and set the InitPerfMonitorGroups driver function
which will be called the first time information about the performance monitor
groups is actually used.

The init_groups helper is called for API functions that can be called before
a monitor object exists. Functions that require an existing monitor object
can rely on init_groups having been called before.
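
The lazy initialization described above can be sketched outside of Mesa roughly as follows. This is a simplified illustration under stated assumptions: struct ctx_sketch, struct driver_sketch, fake_driver_init() and get_num_groups() are hypothetical names made up for this example; only the shape of init_groups() mirrors the helper in the patch.

```c
#include <assert.h>
#include <stddef.h>

struct ctx_sketch;

/* Hypothetical driver hook table with a single lazy-init callback. */
struct driver_sketch {
   void (*init_groups)(struct ctx_sketch *ctx);
};

/* Hypothetical context: groups stays NULL until first use. */
struct ctx_sketch {
   struct driver_sketch driver;
   const int *groups;
   unsigned num_groups;
   unsigned init_calls;   /* only here to observe the behaviour */
};

static const int fake_groups[] = { 10, 20, 30 };

/* Stand-in for the expensive, driver-specific initialization. */
static void
fake_driver_init(struct ctx_sketch *ctx)
{
   ctx->groups = fake_groups;
   ctx->num_groups = 3;
   ctx->init_calls++;
}

/* Call the driver hook only if the groups have not been set up yet. */
static void
init_groups(struct ctx_sketch *ctx)
{
   if (!ctx->groups)
      ctx->driver.init_groups(ctx);
}

/* Every entry point that can run before a monitor exists calls
 * init_groups() first. */
static unsigned
get_num_groups(struct ctx_sketch *ctx)
{
   init_groups(ctx);
   return ctx->num_groups;
}
```

In this sketch, if the hook leaves groups at NULL (for example after an allocation failure), the next entry point simply tries again.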
---
  src/mesa/main/dd.h  |  1 +
  src/mesa/main/performance_monitor.c | 39 +
  2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 496a14f..e5281ce 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -727,6 +727,7 @@ struct dd_function_table {
  * \name Performance monitors
  */
 /*@{*/
+   void (*InitPerfMonitorGroups)(struct gl_context *ctx);


Just a minor comment that you are free to accept or reject. :-)
Why not just call that function InitPerfMonitor()?


 struct gl_perf_monitor_object * (*NewPerfMonitor)(struct gl_context *ctx);
 void (*DeletePerfMonitor)(struct gl_context *ctx,
   struct gl_perf_monitor_object *m);
diff --git a/src/mesa/main/performance_monitor.c 
b/src/mesa/main/performance_monitor.c
index 2d740da..98dfbea 100644
--- a/src/mesa/main/performance_monitor.c
+++ b/src/mesa/main/performance_monitor.c
@@ -53,6 +53,13 @@ _mesa_init_performance_monitors(struct gl_context *ctx)
 ctx->PerfMonitor.Groups = NULL;
  }

+static inline void
+init_groups(struct gl_context *ctx)
+{
+   if (unlikely(!ctx->PerfMonitor.Groups))
+  ctx->Driver.InitPerfMonitorGroups(ctx);
+}
+
  static struct gl_perf_monitor_object *
  new_performance_monitor(struct gl_context *ctx, GLuint index)
  {
@@ -171,6 +178,7 @@ _mesa_GetPerfMonitorGroupsAMD(GLint *numGroups, GLsizei 
groupsSize,
GLuint *groups)
  {
 GET_CURRENT_CONTEXT(ctx);
+   init_groups(ctx);

 if (numGroups != NULL)
*numGroups = ctx->PerfMonitor.NumGroups;
@@ -191,7 +199,11 @@ _mesa_GetPerfMonitorCountersAMD(GLuint group, GLint 
*numCounters,
  GLsizei countersSize, GLuint *counters)
  {
 GET_CURRENT_CONTEXT(ctx);
-   const struct gl_perf_monitor_group *group_obj = get_group(ctx, group);
+   const struct gl_perf_monitor_group *group_obj;
+
+   init_groups(ctx);
+
+   group_obj = get_group(ctx, group);
 if (group_obj == NULL) {
_mesa_error(ctx, GL_INVALID_VALUE,
"glGetPerfMonitorCountersAMD(invalid group)");
@@ -219,9 +231,11 @@ _mesa_GetPerfMonitorGroupStringAMD(GLuint group, GLsizei 
bufSize,
 GLsizei *length, GLchar *groupString)
  {
 GET_CURRENT_CONTEXT(ctx);
+   const struct gl_perf_monitor_group *group_obj;

-   const struct gl_perf_monitor_group *group_obj = get_group(ctx, group);
+   init_groups(ctx);

+   group_obj = get_group(ctx, group);
 if (group_obj == NULL) {
_mesa_error(ctx, GL_INVALID_VALUE, "glGetPerfMonitorGroupStringAMD");
return;
@@ -251,6 +265,8 @@ _mesa_GetPerfMonitorCounterStringAMD(GLuint group, GLuint 
counter,
 const struct gl_perf_monitor_group *group_obj;
 const struct gl_perf_monitor_counter *counter_obj;

+   init_groups(ctx);
+
 group_obj = get_group(ctx, group);

 if (group_obj == NULL) {
@@ -290,6 +306,8 @@ _mesa_GetPerfMonitorCounterInfoAMD(GLuint group, GLuint 
counter, GLenum pname,
 const struct gl_perf_monitor_group *group_obj;
 const struct gl_perf_monitor_counter *counter_obj;

+   init_groups(ctx);
+
 group_obj = get_group(ctx, group);

 if (group_obj == NULL) {
@@ -353,6 +371,8 @@ _mesa_GenPerfMonitorsAMD(GLsizei n, GLuint *monitors)
 if (MESA_VERBOSE & VERBOSE_API)
_mesa_debug(ctx, "glGenPerfMonitorsAMD(%d)\n", n);

+   init_groups(ctx);
+
 if (n < 0) {
_mesa_error(ctx, GL_INVALID_VALUE, "glGenPerfMonitorsAMD(n < 0)");
return;
@@ -673,6 +693,8 @@ _mesa_GetFirstPerfQueryIdINTEL(GLuint *queryId)
 GET_CURRENT_CONTEXT(ctx);
 unsigned numGroups;

+   init_groups(ctx);
+
 /* The GL_INTEL_performance_query spec says:
  *
  *"If queryId pointer is equal to 0, INVALID_VALUE error is generated."
@@ -705,6 +727,7 @@ extern void GLAPIENTRY
  _mesa_GetNextPerfQueryIdINTEL(GLuint queryId, GLuint *nextQueryId)
  {
 GET_CURRENT_CONTEXT(ctx);
+   init_groups(ctx);

 /* The GL_INTEL_performance_query spec says:
  *
@@ -744,6 +

Re: [Mesa-dev] [PATCH 2/3] st/mesa: delay initialization of performance counters

2015-11-25 Thread Samuel Pitoiset

One comment below.

Reviewed-by: Samuel Pitoiset 

On 11/25/2015 01:00 PM, Nicolai Hähnle wrote:

---
  src/mesa/state_tracker/st_cb_perfmon.c | 28 +---
  src/mesa/state_tracker/st_cb_perfmon.h |  2 +-
  src/mesa/state_tracker/st_context.c|  2 +-
  3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_perfmon.c 
b/src/mesa/state_tracker/st_cb_perfmon.c
index 8fdf0e8..cd7fdc0 100644
--- a/src/mesa/state_tracker/st_cb_perfmon.c
+++ b/src/mesa/state_tracker/st_cb_perfmon.c
@@ -329,8 +329,20 @@ st_GetPerfMonitorResult(struct gl_context *ctx,


  bool
-st_init_perfmon(struct st_context *st)
+st_have_perfmon(struct st_context *st)
  {
+   struct pipe_screen *screen = st->pipe->screen;
+
+   if (!screen->get_driver_query_info || !screen->get_driver_query_group_info)
+  return false;
+
+   return screen->get_driver_query_group_info(screen, 0, NULL) != 0;
+}
+
+static void
+st_InitPerfMonitorGroups(struct gl_context *ctx)
+{
+   struct st_context *st = st_context(ctx);
 struct gl_perf_monitor_state *perfmon = &st->ctx->PerfMonitor;
 struct pipe_screen *screen = st->pipe->screen;
 struct gl_perf_monitor_group *groups = NULL;
@@ -338,20 +350,14 @@ st_init_perfmon(struct st_context *st)
 int num_counters, num_groups;
 int gid, cid;

-   if (!screen->get_driver_query_info || !screen->get_driver_query_group_info)
-  return false;
-
 /* Get the number of available queries. */
 num_counters = screen->get_driver_query_info(screen, 0, NULL);
-   if (!num_counters)
-  return false;

 /* Get the number of available groups. */
 num_groups = screen->get_driver_query_group_info(screen, 0, NULL);
-   if (num_groups)
-  groups = CALLOC(num_groups, sizeof(*groups));
+   groups = CALLOC(num_groups, sizeof(*groups));
 if (!groups)
-  return false;
+  return;

 stgroups = CALLOC(num_groups, sizeof(*stgroups));
 if (!stgroups)
@@ -432,7 +438,7 @@ st_init_perfmon(struct st_context *st)
 perfmon->Groups = groups;
 st->perfmon = stgroups;

-   return true;
+   return;

  fail:
 for (gid = 0; gid < num_groups; gid++) {
@@ -442,7 +448,6 @@ fail:
 FREE(stgroups);
  fail_only_groups:
 FREE(groups);
-   return false;
  }

  void
@@ -461,6 +466,7 @@ st_destroy_perfmon(struct st_context *st)

  void st_init_perfmon_functions(struct dd_function_table *functions)
  {
+   functions->InitPerfMonitorGroups = st_InitPerfMonitorGroups;
 functions->NewPerfMonitor = st_NewPerfMonitor;
 functions->DeletePerfMonitor = st_DeletePerfMonitor;
 functions->BeginPerfMonitor = st_BeginPerfMonitor;
diff --git a/src/mesa/state_tracker/st_cb_perfmon.h 
b/src/mesa/state_tracker/st_cb_perfmon.h
index 2973286..98202f2 100644
--- a/src/mesa/state_tracker/st_cb_perfmon.h
+++ b/src/mesa/state_tracker/st_cb_perfmon.h
@@ -73,7 +73,7 @@ st_perf_monitor_object(struct gl_perf_monitor_object *q)
  }

  bool
-st_init_perfmon(struct st_context *st);
+st_have_perfmon(struct st_context *st);

  void
  st_destroy_perfmon(struct st_context *st);
diff --git a/src/mesa/state_tracker/st_context.c 
b/src/mesa/state_tracker/st_context.c
index 6e20fd1..eea3de0 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -255,7 +255,7 @@ st_create_context_priv( struct gl_context *ctx, struct 
pipe_context *pipe,
 st_init_extensions(st->pipe->screen, &ctx->Const,
&ctx->Extensions, &st->options, ctx->Mesa_DXTn);

-   if (st_init_perfmon(st)) {
+   if (st_have_perfmon(st)) {
/* GL_AMD_performance_monitor is only enabled when the underlying
 * driver expose GPU hardware performance counters. */


Mmmh, we should drop that comment since you removed that notion of 
GPU/CPU counters.



ctx->Extensions.AMD_performance_monitor = GL_TRUE;



--
-Samuel


Re: [Mesa-dev] [PATCH v2 06/42] i965: Import tables enumerating the set of validated L3 configurations.

2015-11-25 Thread Samuel Iglesias Gonsálvez


On 25/11/15 13:56, Samuel Iglesias Gonsálvez wrote:
> 
> 
> On 25/11/15 13:20, Samuel Iglesias Gonsálvez wrote:
>>
>>
>> On 18/11/15 06:54, Jordan Justen wrote:
>>> From: Francisco Jerez 
>>>
>>> It should be possible to use additional L3 configurations other than
>>> the ones listed in the tables of validated allocations ("BSpec »
>>> 3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
>>> » L3 Allocation and Programming"), but it seems sensible for now to
>>> hard-code the tables in order to stick to the hardware docs.  Instead
>>> of setting up the arbitrary L3 partitioning given as input, the
>>> closest validated L3 configuration will be looked up in these tables
>>> and used to program the hardware.
>>>
>>> The included tables should work for Gen7-9.  Note that the quantities
>>> are specified in ways rather than in KB, this is because the L3
>>> control registers expect the value in ways, and because by doing that
>>> we can re-use a single table for all GT variants of the same
>>> generation (and in the case of IVB/HSW and CHV/SKL across different
>>> generations) which generally have different L3 way sizes but allow the
>>> same combinations of way allocations.
>>> ---
>>>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>>>  src/mesa/drivers/dri/i965/gen7_l3_state.c  | 163 
>>> +
>>>  2 files changed, 164 insertions(+)
>>>  create mode 100644 src/mesa/drivers/dri/i965/gen7_l3_state.c
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
>>> b/src/mesa/drivers/dri/i965/Makefile.sources
>>> index 5a88d66..91901ad 100644
>>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>>> @@ -184,6 +184,7 @@ i965_FILES = \
>>> gen7_cs_state.c \
>>> gen7_disable.c \
>>> gen7_gs_state.c \
>>> +   gen7_l3_state.c \
>>> gen7_misc_state.c \
>>> gen7_sf_state.c \
>>> gen7_sol_state.c \
>>> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
>>> b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> new file mode 100644
>>> index 000..8f9ba5b
>>> --- /dev/null
>>> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> @@ -0,0 +1,163 @@
>>> +/*
>>> + * Copyright (c) 2015 Intel Corporation
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the 
>>> "Software"),
>>> + * to deal in the Software without restriction, including without 
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice (including the 
>>> next
>>> + * paragraph) shall be included in all copies or substantial portions of 
>>> the
>>> + * Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>>> OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
>>> OTHER
>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
>>> DEALINGS
>>> + * IN THE SOFTWARE.
>>> + */
>>> +
>>> +#include "brw_context.h"
>>> +#include "brw_defines.h"
>>> +#include "brw_state.h"
>>> +#include "intel_batchbuffer.h"
>>> +
>>> +/**
>>> + * Chunk of L3 cache reserved for some specific purpose.
>>> + */
>>> +enum brw_l3_partition {
>>> +   /** Shared local memory. */
>>> +   L3P_SLM = 0,
>>> +   /** Unified return buffer. */
>>> +   L3P_URB,
>>> +   /** Union of DC and RO. */
>>> +   L3P_ALL,
>>> +   /** Data cluster RW partition. */
>>> +   L3P_DC,
>>> +   /** Union of IS, C and T. */
>>> +   L3P_RO,
>>> +   /** Instruction and state cache. */
>>> +   L3P_IS,
>>> +   /** Constant cache. */
>>> +   L3P_C,
>>> +   /** Texture cache. */
>>> +   L3P_T,
>>> +   /** Number of supported L3 partitions. */
>>> +   NUM_L3P
>>> +};
>>> +
>>> +/**
>>> + * L3 configuration represented as the number of ways allocated for each
>>> + * partition.  \sa get_l3_way_size().
>>> + */
>>> +struct brw_l3_config {
>>> +   unsigned n[NUM_L3P];
>>> +};
>>> +
>>> +/**
>>> + * IVB/HSW validated L3 configurations.
>>> + */
>>> +static const struct brw_l3_config ivb_l3_configs[] = {
>>> +   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
>>> +   {{  0, 32,  0, 16, 16,  0,  0,  0 }},
>>> +   {{  0, 32,  0,  4,  0,  8,  4, 16 }},
>>> +   {{  0, 28,  0,  8,  0,  8,  4, 16 }},
>>> +   {{  0, 28,  0, 16,  0,  8,  4,  8 }},
>>> +   {{  0, 28,  0,  8,  0, 16,  4,  8 }},
>>> +   {{  0, 28,  0,  0,  0, 16,  4, 16 }},
>>> +   {{  0, 32,  0,  0,  0, 16,  0, 16 }},
>>> +   {{  0, 28,  0,  4, 32,  0,  0,  0 }},


Re: [Mesa-dev] [PATCH 1/3] mesa/main: allow delayed initialization of performance monitors

2015-11-25 Thread Nicolai Hähnle

On 25.11.2015 14:10, Samuel Pitoiset wrote:

This is definitely a good performance improvement at initialization
time. This is not going to affect Nouveau because we don't have as
many performance counters as Radeon, but it's great anyway.

One comment below.

Reviewed-by: Samuel Pitoiset 

On 11/25/2015 01:00 PM, Nicolai Hähnle wrote:

Most applications never use performance counters, so allow drivers to
skip potentially expensive initialization steps.

A driver that wants to use this must enable the appropriate extension(s)
at context initialization and set the InitPerfMonitorGroups driver
function
which will be called the first time information about the performance
monitor
groups is actually used.

The init_groups helper is called for API functions that can be called
before
a monitor object exists. Functions that require an existing monitor
object
can rely on init_groups having been called before.
---
  src/mesa/main/dd.h  |  1 +
  src/mesa/main/performance_monitor.c | 39
+
  2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 496a14f..e5281ce 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -727,6 +727,7 @@ struct dd_function_table {
  * \name Performance monitors
  */
 /*@{*/
+   void (*InitPerfMonitorGroups)(struct gl_context *ctx);


Just a minor comment that you are free to accept or reject. :-)
Why not just call that function InitPerfMonitor()?


Mesa has a function _mesa_init_performance_monitors which initializes 
the monitor object hash, and so I chose the name to make it obvious that 
there is a distinction. I don't feel strongly either way, though.


Cheers,
Nicolai


 struct gl_perf_monitor_object * (*NewPerfMonitor)(struct
gl_context *ctx);
 void (*DeletePerfMonitor)(struct gl_context *ctx,
   struct gl_perf_monitor_object *m);
diff --git a/src/mesa/main/performance_monitor.c
b/src/mesa/main/performance_monitor.c
index 2d740da..98dfbea 100644
--- a/src/mesa/main/performance_monitor.c
+++ b/src/mesa/main/performance_monitor.c
@@ -53,6 +53,13 @@ _mesa_init_performance_monitors(struct gl_context
*ctx)
 ctx->PerfMonitor.Groups = NULL;
  }

+static inline void
+init_groups(struct gl_context *ctx)
+{
+   if (unlikely(!ctx->PerfMonitor.Groups))
+  ctx->Driver.InitPerfMonitorGroups(ctx);
+}
+
  static struct gl_perf_monitor_object *
  new_performance_monitor(struct gl_context *ctx, GLuint index)
  {
@@ -171,6 +178,7 @@ _mesa_GetPerfMonitorGroupsAMD(GLint *numGroups,
GLsizei groupsSize,
GLuint *groups)
  {
 GET_CURRENT_CONTEXT(ctx);
+   init_groups(ctx);

 if (numGroups != NULL)
*numGroups = ctx->PerfMonitor.NumGroups;
@@ -191,7 +199,11 @@ _mesa_GetPerfMonitorCountersAMD(GLuint group,
GLint *numCounters,
  GLsizei countersSize, GLuint *counters)
  {
 GET_CURRENT_CONTEXT(ctx);
-   const struct gl_perf_monitor_group *group_obj = get_group(ctx,
group);
+   const struct gl_perf_monitor_group *group_obj;
+
+   init_groups(ctx);
+
+   group_obj = get_group(ctx, group);
 if (group_obj == NULL) {
_mesa_error(ctx, GL_INVALID_VALUE,
"glGetPerfMonitorCountersAMD(invalid group)");
@@ -219,9 +231,11 @@ _mesa_GetPerfMonitorGroupStringAMD(GLuint group,
GLsizei bufSize,
 GLsizei *length, GLchar
*groupString)
  {
 GET_CURRENT_CONTEXT(ctx);
+   const struct gl_perf_monitor_group *group_obj;

-   const struct gl_perf_monitor_group *group_obj = get_group(ctx,
group);
+   init_groups(ctx);

+   group_obj = get_group(ctx, group);
 if (group_obj == NULL) {
_mesa_error(ctx, GL_INVALID_VALUE,
"glGetPerfMonitorGroupStringAMD");
return;
@@ -251,6 +265,8 @@ _mesa_GetPerfMonitorCounterStringAMD(GLuint group,
GLuint counter,
 const struct gl_perf_monitor_group *group_obj;
 const struct gl_perf_monitor_counter *counter_obj;

+   init_groups(ctx);
+
 group_obj = get_group(ctx, group);

 if (group_obj == NULL) {
@@ -290,6 +306,8 @@ _mesa_GetPerfMonitorCounterInfoAMD(GLuint group,
GLuint counter, GLenum pname,
 const struct gl_perf_monitor_group *group_obj;
 const struct gl_perf_monitor_counter *counter_obj;

+   init_groups(ctx);
+
 group_obj = get_group(ctx, group);

 if (group_obj == NULL) {
@@ -353,6 +371,8 @@ _mesa_GenPerfMonitorsAMD(GLsizei n, GLuint *monitors)
 if (MESA_VERBOSE & VERBOSE_API)
_mesa_debug(ctx, "glGenPerfMonitorsAMD(%d)\n", n);

+   init_groups(ctx);
+
 if (n < 0) {
_mesa_error(ctx, GL_INVALID_VALUE, "glGenPerfMonitorsAMD(n <
0)");
return;
@@ -673,6 +693,8 @@ _mesa_GetFirstPerfQueryIdINTEL(GLuint *queryId)
 GET_CURRENT_CONTEXT(ctx);
 unsigned numGroups;

+   init_groups(ctx);
+
 /* The GL_INTEL_performance_query spec says:
  *
  *"If queryId pointer is equal to 0, INVALI

Re: [Mesa-dev] [PATCH 2/3] st/mesa: delay initialization of performance counters

2015-11-25 Thread Nicolai Hähnle

On 25.11.2015 14:11, Samuel Pitoiset wrote:

One comment below.

Reviewed-by: Samuel Pitoiset 

On 11/25/2015 01:00 PM, Nicolai Hähnle wrote:

---
  src/mesa/state_tracker/st_cb_perfmon.c | 28
+---
  src/mesa/state_tracker/st_cb_perfmon.h |  2 +-
  src/mesa/state_tracker/st_context.c|  2 +-
  3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_perfmon.c
b/src/mesa/state_tracker/st_cb_perfmon.c
index 8fdf0e8..cd7fdc0 100644
--- a/src/mesa/state_tracker/st_cb_perfmon.c
+++ b/src/mesa/state_tracker/st_cb_perfmon.c
@@ -329,8 +329,20 @@ st_GetPerfMonitorResult(struct gl_context *ctx,


  bool
-st_init_perfmon(struct st_context *st)
+st_have_perfmon(struct st_context *st)
  {
+   struct pipe_screen *screen = st->pipe->screen;
+
+   if (!screen->get_driver_query_info ||
!screen->get_driver_query_group_info)
+  return false;
+
+   return screen->get_driver_query_group_info(screen, 0, NULL) != 0;
+}
+
+static void
+st_InitPerfMonitorGroups(struct gl_context *ctx)
+{
+   struct st_context *st = st_context(ctx);
 struct gl_perf_monitor_state *perfmon = &st->ctx->PerfMonitor;
 struct pipe_screen *screen = st->pipe->screen;
 struct gl_perf_monitor_group *groups = NULL;
@@ -338,20 +350,14 @@ st_init_perfmon(struct st_context *st)
 int num_counters, num_groups;
 int gid, cid;

-   if (!screen->get_driver_query_info ||
!screen->get_driver_query_group_info)
-  return false;
-
 /* Get the number of available queries. */
 num_counters = screen->get_driver_query_info(screen, 0, NULL);
-   if (!num_counters)
-  return false;

 /* Get the number of available groups. */
 num_groups = screen->get_driver_query_group_info(screen, 0, NULL);
-   if (num_groups)
-  groups = CALLOC(num_groups, sizeof(*groups));
+   groups = CALLOC(num_groups, sizeof(*groups));
 if (!groups)
-  return false;
+  return;

 stgroups = CALLOC(num_groups, sizeof(*stgroups));
 if (!stgroups)
@@ -432,7 +438,7 @@ st_init_perfmon(struct st_context *st)
 perfmon->Groups = groups;
 st->perfmon = stgroups;

-   return true;
+   return;

  fail:
 for (gid = 0; gid < num_groups; gid++) {
@@ -442,7 +448,6 @@ fail:
 FREE(stgroups);
  fail_only_groups:
 FREE(groups);
-   return false;
  }

  void
@@ -461,6 +466,7 @@ st_destroy_perfmon(struct st_context *st)

  void st_init_perfmon_functions(struct dd_function_table *functions)
  {
+   functions->InitPerfMonitorGroups = st_InitPerfMonitorGroups;
 functions->NewPerfMonitor = st_NewPerfMonitor;
 functions->DeletePerfMonitor = st_DeletePerfMonitor;
 functions->BeginPerfMonitor = st_BeginPerfMonitor;
diff --git a/src/mesa/state_tracker/st_cb_perfmon.h
b/src/mesa/state_tracker/st_cb_perfmon.h
index 2973286..98202f2 100644
--- a/src/mesa/state_tracker/st_cb_perfmon.h
+++ b/src/mesa/state_tracker/st_cb_perfmon.h
@@ -73,7 +73,7 @@ st_perf_monitor_object(struct gl_perf_monitor_object
*q)
  }

  bool
-st_init_perfmon(struct st_context *st);
+st_have_perfmon(struct st_context *st);

  void
  st_destroy_perfmon(struct st_context *st);
diff --git a/src/mesa/state_tracker/st_context.c
b/src/mesa/state_tracker/st_context.c
index 6e20fd1..eea3de0 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -255,7 +255,7 @@ st_create_context_priv( struct gl_context *ctx,
struct pipe_context *pipe,
 st_init_extensions(st->pipe->screen, &ctx->Const,
&ctx->Extensions, &st->options, ctx->Mesa_DXTn);

-   if (st_init_perfmon(st)) {
+   if (st_have_perfmon(st)) {
/* GL_AMD_performance_monitor is only enabled when the underlying
 * driver expose GPU hardware performance counters. */


Mmmh, we should drop that comment since you removed that notion of
GPU/CPU counters.


Good point. I'll put this in a separate patch, since it's somewhat 
unrelated. Thanks for the review!


Cheers,
Nicolai


ctx->Extensions.AMD_performance_monitor = GL_TRUE;







Re: [Mesa-dev] [PATCH 1/3] mesa/main: allow delayed initialization of performance monitors

2015-11-25 Thread Samuel Pitoiset



On 11/25/2015 02:17 PM, Nicolai Hähnle wrote:

On 25.11.2015 14:10, Samuel Pitoiset wrote:

This is definitely a good performance improvement at initialization
time. This is not going to affect Nouveau because we don't have as
many performance counters as Radeon, but it's great anyway.

One comment below.

Reviewed-by: Samuel Pitoiset 

On 11/25/2015 01:00 PM, Nicolai Hähnle wrote:

Most applications never use performance counters, so allow drivers to
skip potentially expensive initialization steps.

A driver that wants to use this must enable the appropriate extension(s)
at context initialization and set the InitPerfMonitorGroups driver
function
which will be called the first time information about the performance
monitor
groups is actually used.

The init_groups helper is called for API functions that can be called
before
a monitor object exists. Functions that require an existing monitor
object
can rely on init_groups having been called before.
---
  src/mesa/main/dd.h  |  1 +
  src/mesa/main/performance_monitor.c | 39
+
  2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 496a14f..e5281ce 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -727,6 +727,7 @@ struct dd_function_table {
  * \name Performance monitors
  */
 /*@{*/
+   void (*InitPerfMonitorGroups)(struct gl_context *ctx);


Just a minor comment that you are free to accept or reject. :-)
Why not just call that function InitPerfMonitor()?


Mesa has a function _mesa_init_performance_monitors which initializes
the monitor object hash, and so I chose the name to make it obvious that
there is a distinction. I don't feel strongly either way, though.


Fine by me. This doesn't matter.



Cheers,
Nicolai


 struct gl_perf_monitor_object * (*NewPerfMonitor)(struct
gl_context *ctx);
 void (*DeletePerfMonitor)(struct gl_context *ctx,
   struct gl_perf_monitor_object *m);
diff --git a/src/mesa/main/performance_monitor.c
b/src/mesa/main/performance_monitor.c
index 2d740da..98dfbea 100644
--- a/src/mesa/main/performance_monitor.c
+++ b/src/mesa/main/performance_monitor.c
@@ -53,6 +53,13 @@ _mesa_init_performance_monitors(struct gl_context
*ctx)
 ctx->PerfMonitor.Groups = NULL;
  }

+static inline void
+init_groups(struct gl_context *ctx)
+{
+   if (unlikely(!ctx->PerfMonitor.Groups))
+  ctx->Driver.InitPerfMonitorGroups(ctx);
+}
+
  static struct gl_perf_monitor_object *
  new_performance_monitor(struct gl_context *ctx, GLuint index)
  {
@@ -171,6 +178,7 @@ _mesa_GetPerfMonitorGroupsAMD(GLint *numGroups,
GLsizei groupsSize,
GLuint *groups)
  {
 GET_CURRENT_CONTEXT(ctx);
+   init_groups(ctx);

 if (numGroups != NULL)
*numGroups = ctx->PerfMonitor.NumGroups;
@@ -191,7 +199,11 @@ _mesa_GetPerfMonitorCountersAMD(GLuint group,
GLint *numCounters,
  GLsizei countersSize, GLuint
*counters)
  {
 GET_CURRENT_CONTEXT(ctx);
-   const struct gl_perf_monitor_group *group_obj = get_group(ctx,
group);
+   const struct gl_perf_monitor_group *group_obj;
+
+   init_groups(ctx);
+
+   group_obj = get_group(ctx, group);
 if (group_obj == NULL) {
_mesa_error(ctx, GL_INVALID_VALUE,
"glGetPerfMonitorCountersAMD(invalid group)");
@@ -219,9 +231,11 @@ _mesa_GetPerfMonitorGroupStringAMD(GLuint group,
GLsizei bufSize,
 GLsizei *length, GLchar
*groupString)
  {
 GET_CURRENT_CONTEXT(ctx);
+   const struct gl_perf_monitor_group *group_obj;

-   const struct gl_perf_monitor_group *group_obj = get_group(ctx,
group);
+   init_groups(ctx);

+   group_obj = get_group(ctx, group);
 if (group_obj == NULL) {
_mesa_error(ctx, GL_INVALID_VALUE,
"glGetPerfMonitorGroupStringAMD");
return;
@@ -251,6 +265,8 @@ _mesa_GetPerfMonitorCounterStringAMD(GLuint group,
GLuint counter,
 const struct gl_perf_monitor_group *group_obj;
 const struct gl_perf_monitor_counter *counter_obj;

+   init_groups(ctx);
+
 group_obj = get_group(ctx, group);

 if (group_obj == NULL) {
@@ -290,6 +306,8 @@ _mesa_GetPerfMonitorCounterInfoAMD(GLuint group,
GLuint counter, GLenum pname,
 const struct gl_perf_monitor_group *group_obj;
 const struct gl_perf_monitor_counter *counter_obj;

+   init_groups(ctx);
+
 group_obj = get_group(ctx, group);

 if (group_obj == NULL) {
@@ -353,6 +371,8 @@ _mesa_GenPerfMonitorsAMD(GLsizei n, GLuint
*monitors)
 if (MESA_VERBOSE & VERBOSE_API)
_mesa_debug(ctx, "glGenPerfMonitorsAMD(%d)\n", n);

+   init_groups(ctx);
+
 if (n < 0) {
_mesa_error(ctx, GL_INVALID_VALUE, "glGenPerfMonitorsAMD(n <
0)");
return;
@@ -673,6 +693,8 @@ _mesa_GetFirstPerfQueryIdINTEL(GLuint *queryId)
 GET_CURRENT_CONTEXT(ctx);
 unsigned numGroups;

+   init_groups(ctx);
+
 /* The GL_INTEL_perfor

[Mesa-dev] [PATCH] st/mesa: remove outdated comment

2015-11-25 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Enabling AMD_performance_monitor no longer depends on whether
queries are run by the GPU since the commit mentioned below.

Suggested-by: Samuel Pitoiset 

commit ddf27a3dd062c78ff49a69a1396be4de9c1b5d37
Author: Nicolai Hähnle 
Date:   Tue Nov 10 13:35:01 2015 +0100

gallium: remove pipe_driver_query_group_info field type
---
 src/mesa/state_tracker/st_context.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_context.c 
b/src/mesa/state_tracker/st_context.c
index eea3de0..1459f25 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -256,8 +256,6 @@ st_create_context_priv( struct gl_context *ctx, struct 
pipe_context *pipe,
   &ctx->Extensions, &st->options, ctx->Mesa_DXTn);
 
if (st_have_perfmon(st)) {
-  /* GL_AMD_performance_monitor is only enabled when the underlying
-   * driver expose GPU hardware performance counters. */
   ctx->Extensions.AMD_performance_monitor = GL_TRUE;
}
 
-- 
2.5.0



Re: [Mesa-dev] [PATCH] st/mesa: remove outdated comment

2015-11-25 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset 

Thanks Nicolai.

On 11/25/2015 02:29 PM, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

Enabling AMD_performance_monitor no longer depends on whether
queries are run by the GPU since the commit mentioned below.

Suggested-by: Samuel Pitoiset 

commit ddf27a3dd062c78ff49a69a1396be4de9c1b5d37
Author: Nicolai Hähnle 
Date:   Tue Nov 10 13:35:01 2015 +0100

 gallium: remove pipe_driver_query_group_info field type
---
  src/mesa/state_tracker/st_context.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_context.c 
b/src/mesa/state_tracker/st_context.c
index eea3de0..1459f25 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -256,8 +256,6 @@ st_create_context_priv( struct gl_context *ctx, struct 
pipe_context *pipe,
&ctx->Extensions, &st->options, ctx->Mesa_DXTn);

 if (st_have_perfmon(st)) {
-  /* GL_AMD_performance_monitor is only enabled when the underlying
-   * driver expose GPU hardware performance counters. */
ctx->Extensions.AMD_performance_monitor = GL_TRUE;
 }




--
-Samuel


[Mesa-dev] [PATCH 0/2] Run post opt_vector_float() opts in a loop

2015-11-25 Thread Juan A. Suarez Romero

When checking the VS output of the glsl-mat-from-int-ctor-03 piglit test, I got
the following code (excerpt).

mov(8)             g19<1>.xyzF  g6<4,4,1>.xyzzD                     { align16 1Q };
dp4(8)             g115<1>.wF   g4<4,4,1>F        g2.4<0,4,1>F      { align16 NoDDChk 1Q };
cmp.nz.f0(8)       null<1>F     g11<4,4,1>.xyzzF  g19<4,4,1>.xyzzF  { align16 1Q switch };
cmp.nz.f0(8)       null<1>D     g7<4,4,1>D        0D                { align16 1Q switch };
(+f0.any4h) mov(8) g21<1>.xUD   0xUD                                { align16 1Q };

Clearly the first cmp can be removed, because its result is overwritten by the
second one.

Investigating why this does not happen, I saw that in brw_vec4, after running
opt_vector_float(), we run the follow-up optimizations just once, instead of in
a loop until no more progress is made.

I am not sure if there is a reason to keep them separate from the previous
loop. Tracing back through the history, it seems that opt_vector_float() was
originally left out because it wasn't written as an optimization, and no
optimizations were run after it. Later someone suggested running some
optimizations if opt_vector_float() succeeds, which led to the conditional. In
the final commits opt_vector_float() was converted into a true optimization,
but it was still kept out of the loop. At this point I'm not sure whether there
was a good reason (I found no explanation) or it was just to minimize the code
changes. Maybe someone can shed some light here.

So I merged those optimizations into the previous loop (second commit). But
this made some piglit tests never finish. Looking into it, I saw that some
optimizations were being reverted by others (specifically, CSE was reverted by
copy propagation). So I added a minor change to CSE (first commit) that
prevents it from being applied when the common expression is just an immediate
(it saves nothing, and adds a new instruction).
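
The interaction between the two commits can be reproduced with a toy model.
The sketch below is not the i965 code: toy_shader, toy_cse(),
toy_copy_propagation() and run_to_fixed_point() are hypothetical names, and the
single boolean only stands in for "an immediate MOV was hoisted into a
temporary". The iteration cap is likewise an assumption added so that the
non-converging case can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: has an immediate MOV been hoisted into a temporary by CSE? */
struct toy_shader {
   bool imm_hoisted;
};

/* Toy CSE: hoists the immediate unless told to skip immediates. */
static bool
toy_cse(struct toy_shader *s, bool skip_immediates)
{
   if (skip_immediates || s->imm_hoisted)
      return false;
   s->imm_hoisted = true;
   return true;
}

/* Toy copy propagation: folds the temporary back into its users. */
static bool
toy_copy_propagation(struct toy_shader *s)
{
   if (!s->imm_hoisted)
      return false;
   s->imm_hoisted = false;
   return true;
}

/* Runs both passes until neither reports progress.  Returns the number of
 * iterations needed, or -1 if the cap was reached without converging. */
static int
run_to_fixed_point(struct toy_shader *s, bool skip_immediates,
                   int max_iterations)
{
   for (int i = 1; i <= max_iterations; i++) {
      bool progress = false;
      progress |= toy_cse(s, skip_immediates);
      progress |= toy_copy_propagation(s);
      if (!progress)
         return i;
   }
   return -1;
}
```

With skip_immediates set to false the two passes undo each other on every
iteration, so the loop only stops at the cap; with it set to true the loop
settles on the first iteration.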

The free shaders in shader-db are neither helped nor hurt. But testing against
the non-free shaders we get a small improvement:
   
total instructions in shared programs: 6819828 -> 6819468 (-0.01%)
instructions in affected programs:     30516 -> 30156 (-1.18%)
total loops in shared programs:        1971 -> 1971 (0.00%)
helped:                                154
HURT:                                  0
GAINED:                                0
LOST:                                  0


Juan A. Suarez Romero (2):
  i965: Do not apply CSE opt to MOV immediate
  i965: run brw_vec4 optimizations in loop

 src/mesa/drivers/dri/i965/brw_vec4.cpp | 10 +++---
 src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 13 -
 2 files changed, 15 insertions(+), 8 deletions(-)

-- 
2.5.0



[Mesa-dev] [PATCH 2/2] i965: run brw_vec4 optimizations in loop

2015-11-25 Thread Juan A. Suarez Romero
When analyzing the output of the glsl-mat-from-int-ctor-03 piglit test, I found
that the following piece of generated asm code:

mov(8)             g19<1>.xyzF   g6<4,4,1>.xyzzD                     { align16 1Q };
dp4(8)             g115<1>.wF    g4<4,4,1>F        g2.4<0,4,1>F      { align16 NoDDChk 1Q };
cmp.nz.f0(8)       null<1>F      g11<4,4,1>.xyzzF  g19<4,4,1>.xyzzF  { align16 1Q switch };
cmp.nz.f0(8)       null<1>D      g7<4,4,1>D        0D                { align16 1Q switch };
(+f0.any4h) mov(8) g21<1>.xUD    0xUD                                { align16 1Q };

can be further optimized by removing the first cmp instruction, as its
output is overwritten by the second one (it is clearly dead code).
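
Why the first cmp is dead can be seen with a backward liveness scan over the
flag register. The sketch below is only an illustration, not the pass in
brw_vec4: toy_inst and mark_dead_flag_writes() are hypothetical names, the
model tracks a single flag register, and it assumes the flag is not live at
the end of the block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_inst {
   bool writes_flag;   /* e.g. cmp.nz.f0 */
   bool reads_flag;    /* e.g. a (+f0) predicated instruction */
   bool has_other_dst; /* also writes a non-null destination register */
   bool dead;          /* set by the scan below */
};

/* Walks the block backwards and marks instructions whose only effect is a
 * flag write that is overwritten before anything reads it.  Returns the
 * number of instructions marked dead. */
static size_t
mark_dead_flag_writes(struct toy_inst *insts, size_t count)
{
   bool flag_live = false; /* assumption: flag is not live out of the block */
   size_t removed = 0;

   for (size_t i = count; i-- > 0;) {
      struct toy_inst *inst = &insts[i];

      if (inst->writes_flag && !flag_live && !inst->has_other_dst) {
         inst->dead = true;
         removed++;
         continue;
      }

      /* An instruction reads its inputs before it writes its outputs, so
       * going backwards the write is handled before the read. */
      if (inst->writes_flag)
         flag_live = false;
      if (inst->reads_flag)
         flag_live = true;
   }

   return removed;
}
```

Applied to the five instructions above (mov, dp4, cmp, cmp, predicated mov),
only the first cmp is marked: the second cmp is kept because the predicated
mov reads f0 after it.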

The reason why this was not optimized is in vec4_visitor::run()
(brw_vec4.cpp). If opt_vector_float() succeeds, we apply several
optimizations once, including dead-code elimination.

But it turns out that running this DCE pass leads to the code above, and
as we don't run DCE again (the passes are not run in a loop), the code
stays as it is.

So this commit runs all those optimizations in a loop. More precisely,
instead of running them in a separate loop, it runs them in the
previous loop, together with all the other optimizations.

Testing against shader-db shows no change (neither gains nor losses).
But when also testing against the non-free shaders in shader-db, we get
the following improvement:

total instructions in shared programs: 6819828 -> 6819468 (-0.01%)
instructions in affected programs:     30516 -> 30156 (-1.18%)
total loops in shared programs:        1971 -> 1971 (0.00%)
helped:                                154
HURT:                                  0
GAINED:                                0
LOST:                                  0

Signed-off-by: Juan A. Suarez Romero 
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 9a79d67..39e525d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1846,19 +1846,15 @@ vec4_visitor::run()
   OPT(opt_copy_propagation);
   OPT(opt_cmod_propagation);
   OPT(opt_cse);
-  OPT(opt_algebraic);
   OPT(opt_register_coalesce);
   OPT(eliminate_find_live_channel);
-   } while (progress);
-
-   pass_num = 0;
-
-   if (OPT(opt_vector_float)) {
+  OPT(opt_algebraic);
+  OPT(opt_vector_float);
   OPT(opt_cse);
   OPT(opt_copy_propagation, false);
   OPT(opt_copy_propagation, true);
   OPT(dead_code_eliminate);
-   }
+   } while (progress);
 
if (failed)
   return false;
-- 
2.5.0



[Mesa-dev] [PATCH 1/2] i965: Do not apply CSE opt to MOV immediate

2015-11-25 Thread Juan A. Suarez Romero
If the shader asm code is something like:

mov vgrf2767.0:F, [13F, 14F, 15F, 16F]
mov vgrf2768.0:F, [9F, 10F, 11F, 12F]
mov m8:F, [13F, 14F, 15F, 16F]
mov m7:F, [9F, 10F, 11F, 12F]

and we apply the Common Subexpression Elimination optimization, we get the
following:

mov vgrf2771.0:F, [13F, 14F, 15F, 16F]
mov vgrf2767.0:F, vgrf2771.xyzw:F
mov vgrf2772.0:F, [9F, 10F, 11F, 12F]
mov vgrf2768.0:F, vgrf2772.xyzw:F
mov m8:F, vgrf2771.xyzw:F
mov m7:F, vgrf2772.xyzw:F

The problem is that we later apply the Copy Propagation optimization, which
reverts the code to its original form. If we run the optimizations in
a loop, there is always progress, so we end up in a never-ending loop.

Usually, when we have a statement of the form:

X = exp

we apply CSE if "exp" is actually an expression. But if it is a constant
we do not apply it, as the point of CSE is to avoid running the same
computation more than once, which doesn't happen with a constant.

So this commit ensures CSE is not applied to a MOV immediate (as it
provides no gain, and it is reverted later by the copy-propagation
optimization).
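
The "no gain" claim can be checked by counting instructions in the example
above. The sketch below is a toy model, not driver code: movs_without_cse()
and movs_with_cse() are hypothetical helpers, and the model assumes CSE emits
one load into a temporary plus one copy per original MOV, as in the listing.

```c
#include <assert.h>

/* Instructions needed when each user loads the immediate directly. */
static unsigned
movs_without_cse(unsigned uses)
{
   return uses;
}

/* Instructions needed after CSE: one MOV of the immediate into a temporary,
 * plus one MOV from the temporary per original user.  CSE only triggers
 * when the same value is loaded at least twice. */
static unsigned
movs_with_cse(unsigned uses)
{
   return uses >= 2 ? uses + 1 : uses;
}
```

For the listing above (two distinct immediates, each loaded twice) this gives
2 * 2 = 4 instructions before CSE and 2 * 3 = 6 after it, matching the two
code snippets.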

Signed-off-by: Juan A. Suarez Romero 
---
 src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
index 85cbf24..7ed7654 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
@@ -93,6 +93,17 @@ is_expression(const vec4_instruction *const inst)
 }
 
 static bool
+is_mov_constant(const vec4_instruction *const inst)
+{
+   if (inst->opcode == BRW_OPCODE_MOV &&
+   inst->src[0].file == BRW_IMMEDIATE_VALUE) {
+  return true;
+   } else {
+  return false;
+   }
+}
+
+static bool
 operands_match(const vec4_instruction *a, const vec4_instruction *b)
 {
const src_reg *xs = a->src;
@@ -142,7 +153,7 @@ vec4_visitor::opt_cse_local(bblock_t *block)
int ip = block->start_ip;
foreach_inst_in_block (vec4_instruction, inst, block) {
   /* Skip some cases. */
-  if (is_expression(inst) && !inst->predicate && inst->mlen == 0 &&
+  if (is_expression(inst) && !inst->predicate && inst->mlen == 0 && !is_mov_constant(inst) &&
   ((inst->dst.file != ARF && inst->dst.file != FIXED_GRF) ||
inst->dst.is_null()))
   {
-- 
2.5.0



Re: [Mesa-dev] [PATCH v2 06/42] i965: Import tables enumerating the set of validated L3 configurations.

2015-11-25 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> On 25/11/15 13:56, Samuel Iglesias Gonsálvez wrote:
>> 
>> 
>> On 25/11/15 13:20, Samuel Iglesias Gonsálvez wrote:
>>>
>>>
>>> On 18/11/15 06:54, Jordan Justen wrote:
 From: Francisco Jerez 

 It should be possible to use additional L3 configurations other than
 the ones listed in the tables of validated allocations ("BSpec »
 3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
 » L3 Allocation and Programming"), but it seems sensible for now to
 hard-code the tables in order to stick to the hardware docs.  Instead
 of setting up the arbitrary L3 partitioning given as input, the
 closest validated L3 configuration will be looked up in these tables
 and used to program the hardware.

 The included tables should work for Gen7-9.  Note that the quantities
 are specified in ways rather than in KB, this is because the L3
 control registers expect the value in ways, and because by doing that
 we can re-use a single table for all GT variants of the same
 generation (and in the case of IVB/HSW and CHV/SKL across different
 generations) which generally have different L3 way sizes but allow the
 same combinations of way allocations.
 ---
  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
  src/mesa/drivers/dri/i965/gen7_l3_state.c  | 163 
 +
  2 files changed, 164 insertions(+)
  create mode 100644 src/mesa/drivers/dri/i965/gen7_l3_state.c

 diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
 b/src/mesa/drivers/dri/i965/Makefile.sources
 index 5a88d66..91901ad 100644
 --- a/src/mesa/drivers/dri/i965/Makefile.sources
 +++ b/src/mesa/drivers/dri/i965/Makefile.sources
 @@ -184,6 +184,7 @@ i965_FILES = \
gen7_cs_state.c \
gen7_disable.c \
gen7_gs_state.c \
 +  gen7_l3_state.c \
gen7_misc_state.c \
gen7_sf_state.c \
gen7_sol_state.c \
 diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
 b/src/mesa/drivers/dri/i965/gen7_l3_state.c
 new file mode 100644
 index 000..8f9ba5b
 --- /dev/null
 +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
 @@ -0,0 +1,163 @@
 +/*
 + * Copyright (c) 2015 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the 
 "Software"),
 + * to deal in the Software without restriction, including without 
 limitation
 + * the rights to use, copy, modify, merge, publish, distribute, 
 sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the 
 next
 + * paragraph) shall be included in all copies or substantial portions of 
 the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
 EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
 MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT 
 SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
 OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
 DEALINGS
 + * IN THE SOFTWARE.
 + */
 +
 +#include "brw_context.h"
 +#include "brw_defines.h"
 +#include "brw_state.h"
 +#include "intel_batchbuffer.h"
 +
 +/**
 + * Chunk of L3 cache reserved for some specific purpose.
 + */
 +enum brw_l3_partition {
 +   /** Shared local memory. */
 +   L3P_SLM = 0,
 +   /** Unified return buffer. */
 +   L3P_URB,
 +   /** Union of DC and RO. */
 +   L3P_ALL,
 +   /** Data cluster RW partition. */
 +   L3P_DC,
 +   /** Union of IS, C and T. */
 +   L3P_RO,
 +   /** Instruction and state cache. */
 +   L3P_IS,
 +   /** Constant cache. */
 +   L3P_C,
 +   /** Texture cache. */
 +   L3P_T,
 +   /** Number of supported L3 partitions. */
 +   NUM_L3P
 +};
 +
 +/**
 + * L3 configuration represented as the number of ways allocated for each
 + * partition.  \sa get_l3_way_size().
 + */
 +struct brw_l3_config {
 +   unsigned n[NUM_L3P];
 +};
 +
 +/**
 + * IVB/HSW validated L3 configurations.
 + */
 +static const struct brw_l3_config ivb_l3_configs[] = {
 +   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
 +   {{  0, 32,  0, 16, 16,  0,  0,  0 }},
 +   {{  0, 32,  0,  4,  0,  8,  4, 16 }},
 +   {{  0, 28,  0,  8,  0,  8,  4, 16 }},
 +   {{  0, 28,  0, 16,  0,  8,  4,  8 }},
 +   {{

Re: [Mesa-dev] [PATCH 7/8] radeonsi: implement AMD_performance_monitor for CIK+

2015-11-25 Thread Marek Olšák
R600_PC_SHADER_WINDOWING seems unused. What is it good for?

BTW, I think the EVENT_WRITE_EOP workaround is indeed required.

The series is:
Reviewed-by: Marek Olšák 

Marek

On Tue, Nov 24, 2015 at 5:00 PM, Nicolai Hähnle  wrote:
> Expose most of the performance counter groups that are exposed by Catalyst.
> Ideally, the driver will work with GPUPerfStudio at some point, but we are not
> quite there yet. In any case, this is the reason for grouping multiple
> instances of hardware blocks in the way it is implemented.
>
> The counters can also be shown using the Gallium HUD. If one is interested to
> see how work is distributed across multiple shader engines, one can set the
> environment variable RADEON_PC_SEPARATE_SE=1 to obtain finer-grained 
> performance
> counter groups.
>
> Part of the implementation is in radeon because an implementation for
> older hardware would largely follow along the same lines, but exposing
> a different set of blocks which are programmed slightly differently.
> ---
>  src/gallium/drivers/radeon/Makefile.sources   |   1 +
>  src/gallium/drivers/radeon/r600_perfcounter.c | 633 +++
>  src/gallium/drivers/radeon/r600_pipe_common.c |   1 +
>  src/gallium/drivers/radeon/r600_pipe_common.h |   7 +
>  src/gallium/drivers/radeon/r600_query.c   |  21 +-
>  src/gallium/drivers/radeon/r600_query.h   | 121 +
>  src/gallium/drivers/radeonsi/Makefile.sources |   1 +
>  src/gallium/drivers/radeonsi/si_perfcounter.c | 697 
> ++
>  src/gallium/drivers/radeonsi/si_pipe.c|   3 +
>  src/gallium/drivers/radeonsi/si_pipe.h|   3 +
>  10 files changed, 1485 insertions(+), 3 deletions(-)
>  create mode 100644 src/gallium/drivers/radeon/r600_perfcounter.c
>  create mode 100644 src/gallium/drivers/radeonsi/si_perfcounter.c
>
> diff --git a/src/gallium/drivers/radeon/Makefile.sources 
> b/src/gallium/drivers/radeon/Makefile.sources
> index 1dbad2f..eb171f7 100644
> --- a/src/gallium/drivers/radeon/Makefile.sources
> +++ b/src/gallium/drivers/radeon/Makefile.sources
> @@ -4,6 +4,7 @@ C_SOURCES := \
> r600_cs.h \
> r600d_common.h \
> r600_gpu_load.c \
> +   r600_perfcounter.c \
> r600_pipe_common.c \
> r600_pipe_common.h \
> r600_query.c \
> diff --git a/src/gallium/drivers/radeon/r600_perfcounter.c 
> b/src/gallium/drivers/radeon/r600_perfcounter.c
> new file mode 100644
> index 000..b4915d0
> --- /dev/null
> +++ b/src/gallium/drivers/radeon/r600_perfcounter.c
> @@ -0,0 +1,633 @@
> +/*
> + * Copyright 2015 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
> THE
> + * SOFTWARE.
> + *
> + * Authors:
> + *  Nicolai Hähnle 
> + *
> + */
> +
> +#include "util/u_memory.h"
> +#include "r600_query.h"
> +#include "r600_pipe_common.h"
> +#include "r600d_common.h"
> +
> +/* Max counters per HW block */
> +#define R600_QUERY_MAX_COUNTERS 16
> +
> +static const char * const r600_pc_shader_suffix[] = {
> +   "", "_PS", "_VS", "_GS", "_ES", "_HS", "_LS", "_CS"
> +};
> +
> +static struct r600_perfcounter_block *
> +lookup_counter(struct r600_perfcounters *pc, unsigned index,
> +  unsigned *base_gid, unsigned *sub_index)
> +{
> +   struct r600_perfcounter_block *block = pc->blocks;
> +   unsigned bid;
> +
> +   *base_gid = 0;
> +   for (bid = 0; bid < pc->num_blocks; ++bid, ++block) {
> +   unsigned total = block->num_groups * block->num_selectors;
> +
> +   if (index < total) {
> +   *sub_index = index;
> +   return block;
> +   }
> +
> +   index -= total;
> +   *base_gid += block->num_groups;
> +   }
> +
> +   return NULL;
> +}
> +
> +static struct r600_perfcounter_block *
> +lookup_group(struct r600_perfcounters *pc, unsigned *index)
> +{
> + 

Re: [Mesa-dev] [PATCH v2 06/42] i965: Import tables enumerating the set of validated L3 configurations.

2015-11-25 Thread Samuel Iglesias Gonsálvez


On 25/11/15 14:45, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez  writes:
> 
>> On 25/11/15 13:56, Samuel Iglesias Gonsálvez wrote:
>>>
>>>
>>> On 25/11/15 13:20, Samuel Iglesias Gonsálvez wrote:


 On 18/11/15 06:54, Jordan Justen wrote:
> From: Francisco Jerez 
>
> It should be possible to use additional L3 configurations other than
> the ones listed in the tables of validated allocations ("BSpec »
> 3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
> » L3 Allocation and Programming"), but it seems sensible for now to
> hard-code the tables in order to stick to the hardware docs.  Instead
> of setting up the arbitrary L3 partitioning given as input, the
> closest validated L3 configuration will be looked up in these tables
> and used to program the hardware.
>
> The included tables should work for Gen7-9.  Note that the quantities
> are specified in ways rather than in KB, this is because the L3
> control registers expect the value in ways, and because by doing that
> we can re-use a single table for all GT variants of the same
> generation (and in the case of IVB/HSW and CHV/SKL across different
> generations) which generally have different L3 way sizes but allow the
> same combinations of way allocations.
> ---
>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>  src/mesa/drivers/dri/i965/gen7_l3_state.c  | 163 
> +
>  2 files changed, 164 insertions(+)
>  create mode 100644 src/mesa/drivers/dri/i965/gen7_l3_state.c
>
> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
> b/src/mesa/drivers/dri/i965/Makefile.sources
> index 5a88d66..91901ad 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.sources
> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> @@ -184,6 +184,7 @@ i965_FILES = \
>   gen7_cs_state.c \
>   gen7_disable.c \
>   gen7_gs_state.c \
> + gen7_l3_state.c \
>   gen7_misc_state.c \
>   gen7_sf_state.c \
>   gen7_sol_state.c \
> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
> b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> new file mode 100644
> index 000..8f9ba5b
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> @@ -0,0 +1,163 @@
> +/*
> + * Copyright (c) 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining 
> a
> + * copy of this software and associated documentation files (the 
> "Software"),
> + * to deal in the Software without restriction, including without 
> limitation
> + * the rights to use, copy, modify, merge, publish, distribute, 
> sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the 
> next
> + * paragraph) shall be included in all copies or substantial portions of 
> the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
> MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT 
> SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
> OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
> ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "brw_context.h"
> +#include "brw_defines.h"
> +#include "brw_state.h"
> +#include "intel_batchbuffer.h"
> +
> +/**
> + * Chunk of L3 cache reserved for some specific purpose.
> + */
> +enum brw_l3_partition {
> +   /** Shared local memory. */
> +   L3P_SLM = 0,
> +   /** Unified return buffer. */
> +   L3P_URB,
> +   /** Union of DC and RO. */
> +   L3P_ALL,
> +   /** Data cluster RW partition. */
> +   L3P_DC,
> +   /** Union of IS, C and T. */
> +   L3P_RO,
> +   /** Instruction and state cache. */
> +   L3P_IS,
> +   /** Constant cache. */
> +   L3P_C,
> +   /** Texture cache. */
> +   L3P_T,
> +   /** Number of supported L3 partitions. */
> +   NUM_L3P
> +};
> +
> +/**
> + * L3 configuration represented as the number of ways allocated for each
> + * partition.  \sa get_l3_way_size().
> + */
> +struct brw_l3_config {
> +   unsigned n[NUM_L3P];
> +};
> +
> +/**
> + * IVB/HSW validated L3 configurations.
> + */
> +static const struct brw_l3_config ivb_l3_configs[] = {
> +   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
> +   {{  0, 32

Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample needs to pass OpenGL3.3 conformance test.

2015-11-25 Thread Predut, Marius

> -Original Message-
> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of
> Timothy Arceri
> Sent: Wednesday, November 25, 2015 1:12 PM
> To: Palli, Tapani; Predut, Marius; mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample needs to pass
> OpenGL3.3 conformance test.
> 
> On Wed, 2015-11-25 at 12:47 +0200, Tapani Pälli wrote:
> > Hi;
> >
> > On 11/25/2015 01:15 PM, Marius Predut wrote:
> > > Open GL 3.3 reference document says:
> > > samples must be in the range zero to GL_MAX_TEXTURE_SIZE - 1.
> > > Open GL.4 clearly states:
> > > An INVALID_VALUE error is generated if samples is zero.
> 
> See my comment in bugzilla [1]. I believe this is just a bug in the
> reference pages. We implement things in Mesa going by what the spec
> says, and the 3.2 spec says nothing about samples being 0. In fact it
> doesn't even say anything in the 4.0 spec, which you have changed
> the check to.
> 
> Also the 4.5 reference pages also conflict with the spec so this is
> even more reason I think this change is wrong.
> 
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=91670
> 

I don't think it is a bug in the specs, because in that case the CTS and the
piglit test would be wrong as well.

This patch fixes two things:
1. The Khronos CTS conformance tests for OpenGL 3.3
2. The piglit test https://bugs.freedesktop.org/show_bug.cgi?id=93100 (or
https://bugs.freedesktop.org/show_bug.cgi?id=91670)

The patch is based on this spec:
https://www.opengl.org/sdk/docs/man3/xhtml/glTexImage3DMultisample.xml

We can't just believe or "suppose" something here; the specs need to be treated
as axioms. Can someone confirm that this reference page is wrong?


> >
> > OpenGL ES 3.1 spec also says "An INVALID_VALUE error is generated if
> > samples is zero.". You'll need to change your check below to also
> > include ES 3.1.
> >
> > > Fixing the piglit test case gl-3.2-layered-rendering
> > > -framebuffertexture.
> > >
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93100
> > >
> > > Signed-off-by: Marius Predut 
> > > ---
> > >   src/mesa/main/teximage.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
> > > index d9453e3..69634ff 100644
> > > --- a/src/mesa/main/teximage.c
> > > +++ b/src/mesa/main/teximage.c
> > > @@ -5211,7 +5211,7 @@ texture_image_multisample(struct gl_context
> > > *ctx, GLuint dims,
> > > return;
> > >  }
> > >
> > > -   if (samples < 1) {
> > > +   if (samples < 1 && ctx->API == API_OPENGL_CORE && ctx->Version
> > > >= 40) {
> > > _mesa_error(ctx, GL_INVALID_VALUE, "%s(samples < 1)",
> > > func);
> > > return;
> > >  }
> >


[Mesa-dev] [PATCH v2 5.5/42] i965: Add slice count to the brw_device_info structure.

2015-11-25 Thread Francisco Jerez
---
 src/mesa/drivers/dri/i965/brw_device_info.c | 20 
 src/mesa/drivers/dri/i965/brw_device_info.h |  5 +
 2 files changed, 25 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_device_info.c 
b/src/mesa/drivers/dri/i965/brw_device_info.c
index 4550550..bf31898 100644
--- a/src/mesa/drivers/dri/i965/brw_device_info.c
+++ b/src/mesa/drivers/dri/i965/brw_device_info.c
@@ -28,6 +28,7 @@
 static const struct brw_device_info brw_device_info_i965 = {
.gen = 4,
.has_negative_rhw_bug = true,
+   .num_slices = 1,
.max_vs_threads = 16,
.max_gs_threads = 2,
.max_wm_threads = 8 * 4,
@@ -42,6 +43,7 @@ static const struct brw_device_info brw_device_info_g4x = {
.has_compr4 = true,
.has_surface_tile_offset = true,
.is_g4x = true,
+   .num_slices = 1,
.max_vs_threads = 32,
.max_gs_threads = 2,
.max_wm_threads = 10 * 5,
@@ -55,6 +57,7 @@ static const struct brw_device_info brw_device_info_ilk = {
.has_pln = true,
.has_compr4 = true,
.has_surface_tile_offset = true,
+   .num_slices = 1,
.max_vs_threads = 72,
.max_gs_threads = 32,
.max_wm_threads = 12 * 6,
@@ -71,6 +74,7 @@ static const struct brw_device_info brw_device_info_snb_gt1 = 
{
.has_pln = true,
.has_surface_tile_offset = true,
.needs_unlit_centroid_workaround = true,
+   .num_slices = 1,
.max_vs_threads = 24,
.max_gs_threads = 21, /* conservative; 24 if rendering disabled. */
.max_wm_threads = 40,
@@ -90,6 +94,7 @@ static const struct brw_device_info brw_device_info_snb_gt2 = 
{
.has_pln = true,
.has_surface_tile_offset = true,
.needs_unlit_centroid_workaround = true,
+   .num_slices = 1,
.max_vs_threads = 60,
.max_gs_threads = 60,
.max_wm_threads = 80,
@@ -112,6 +117,7 @@ static const struct brw_device_info brw_device_info_snb_gt2 
= {
 static const struct brw_device_info brw_device_info_ivb_gt1 = {
GEN7_FEATURES, .is_ivybridge = true, .gt = 1,
.needs_unlit_centroid_workaround = true,
+   .num_slices = 1,
.max_vs_threads = 36,
.max_hs_threads = 36,
.max_ds_threads = 36,
@@ -131,6 +137,7 @@ static const struct brw_device_info brw_device_info_ivb_gt1 
= {
 static const struct brw_device_info brw_device_info_ivb_gt2 = {
GEN7_FEATURES, .is_ivybridge = true, .gt = 2,
.needs_unlit_centroid_workaround = true,
+   .num_slices = 1,
.max_vs_threads = 128,
.max_hs_threads = 128,
.max_ds_threads = 128,
@@ -150,6 +157,7 @@ static const struct brw_device_info brw_device_info_ivb_gt2 
= {
 static const struct brw_device_info brw_device_info_byt = {
GEN7_FEATURES, .is_baytrail = true, .gt = 1,
.needs_unlit_centroid_workaround = true,
+   .num_slices = 1,
.has_llc = false,
.max_vs_threads = 36,
.max_hs_threads = 36,
@@ -175,6 +183,7 @@ static const struct brw_device_info brw_device_info_byt = {
 
 static const struct brw_device_info brw_device_info_hsw_gt1 = {
HSW_FEATURES, .gt = 1,
+   .num_slices = 1,
.max_vs_threads = 70,
.max_hs_threads = 70,
.max_ds_threads = 70,
@@ -193,6 +202,7 @@ static const struct brw_device_info brw_device_info_hsw_gt1 
= {
 
 static const struct brw_device_info brw_device_info_hsw_gt2 = {
HSW_FEATURES, .gt = 2,
+   .num_slices = 1,
.max_vs_threads = 280,
.max_hs_threads = 256,
.max_ds_threads = 280,
@@ -211,6 +221,7 @@ static const struct brw_device_info brw_device_info_hsw_gt2 
= {
 
 static const struct brw_device_info brw_device_info_hsw_gt3 = {
HSW_FEATURES, .gt = 3,
+   .num_slices = 2,
.max_vs_threads = 280,
.max_hs_threads = 256,
.max_ds_threads = 280,
@@ -243,6 +254,7 @@ static const struct brw_device_info brw_device_info_hsw_gt3 
= {
 
 static const struct brw_device_info brw_device_info_bdw_gt1 = {
GEN8_FEATURES, .gt = 1,
+   .num_slices = 1,
.max_cs_threads = 42,
.urb = {
   .size = 192,
@@ -256,6 +268,7 @@ static const struct brw_device_info brw_device_info_bdw_gt1 
= {
 
 static const struct brw_device_info brw_device_info_bdw_gt2 = {
GEN8_FEATURES, .gt = 2,
+   .num_slices = 1,
.max_cs_threads = 56,
.urb = {
   .size = 384,
@@ -269,6 +282,7 @@ static const struct brw_device_info brw_device_info_bdw_gt2 
= {
 
 static const struct brw_device_info brw_device_info_bdw_gt3 = {
GEN8_FEATURES, .gt = 3,
+   .num_slices = 2,
.max_cs_threads = 56,
.urb = {
   .size = 384,
@@ -283,6 +297,7 @@ static const struct brw_device_info brw_device_info_bdw_gt3 
= {
 static const struct brw_device_info brw_device_info_chv = {
GEN8_FEATURES, .is_cherryview = 1, .gt = 1,
.has_llc = false,
+   .num_slices = 1,
.max_vs_threads = 80,
.max_hs_threads = 80,
.max_ds_threads = 80,
@@ -324,19 +339,23 @@ static const struct brw_device_info brw_device_info_chv = 
{
 
 static const struct brw_device_info brw_device_info_skl_gt1 = {
GEN9_FEATURES, .gt = 1,
+   .num_slices = 1,
.urb.size = 192,
 };
 
 static const struct brw_device_info brw_device

[Mesa-dev] [PATCH v3 06/42] i965: Import tables enumerating the set of validated L3 configurations.

2015-11-25 Thread Francisco Jerez
It should be possible to use additional L3 configurations other than
the ones listed in the tables of validated allocations ("BSpec »
3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
» L3 Allocation and Programming"), but it seems sensible for now to
hard-code the tables in order to stick to the hardware docs.  Instead
of setting up the arbitrary L3 partitioning given as input, the
closest validated L3 configuration will be looked up in these tables
and used to program the hardware.

The included tables should work for Gen7-9.  Note that the quantities
are specified in ways rather than in KB.  This is because the L3
control registers expect the value in ways, and because doing so lets
us re-use a single table for all GT variants of the same generation
(and, in the case of IVB/HSW and CHV/SKL, across different
generations), which generally have different L3 way sizes but allow the
same combinations of way allocations.
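
As a rough illustration of the table lookup and of why ways are a convenient
unit, here is a minimal sketch. It is not the code added by this series: the
toy_* names are hypothetical, the distance metric (sum of absolute
differences in ways) is an assumption chosen for simplicity, and the way size
passed to toy_ways_to_kb() is a made-up value rather than a real hardware
figure. Only the table layout (eight per-partition way counts) follows the
brw_l3_config struct in the patch.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_NUM_L3P = 8 };

/* Same layout as brw_l3_config: ways allocated to each partition. */
struct toy_l3_config {
   unsigned n[TOY_NUM_L3P];
};

/* Assumed metric: sum of per-partition absolute differences, in ways. */
static unsigned
toy_l3_distance(const struct toy_l3_config *a, const struct toy_l3_config *b)
{
   unsigned d = 0;
   for (size_t i = 0; i < TOY_NUM_L3P; i++)
      d += a->n[i] > b->n[i] ? a->n[i] - b->n[i] : b->n[i] - a->n[i];
   return d;
}

/* Returns the index of the table entry closest to the requested
 * partitioning.  The table must have at least one entry. */
static size_t
toy_closest_l3_config(const struct toy_l3_config *table, size_t count,
                      const struct toy_l3_config *wanted)
{
   size_t best = 0;
   for (size_t i = 1; i < count; i++) {
      if (toy_l3_distance(&table[i], wanted) <
          toy_l3_distance(&table[best], wanted))
         best = i;
   }
   return best;
}

/* The same way count maps to different sizes depending on the way size of
 * the particular GT variant, which is why one table can serve all of them. */
static unsigned
toy_ways_to_kb(unsigned ways, unsigned way_size_kb)
{
   return ways * way_size_kb;
}
```

For example, asking for { 0, 32, 0, 12, 20, 0, 0, 0 } against the first three
IVB/HSW rows of the table picks the second row, { 0, 32, 0, 16, 16, 0, 0, 0 },
under this metric.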

v3: Use slice count from the devinfo structure instead of the gt
number to implement get_l3_way_size().
---
 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/gen7_l3_state.c  | 163 +
 2 files changed, 164 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/gen7_l3_state.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 5a88d66..91901ad 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -184,6 +184,7 @@ i965_FILES = \
gen7_cs_state.c \
gen7_disable.c \
gen7_gs_state.c \
+   gen7_l3_state.c \
gen7_misc_state.c \
gen7_sf_state.c \
gen7_sol_state.c \
diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
b/src/mesa/drivers/dri/i965/gen7_l3_state.c
new file mode 100644
index 000..8765b11
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_context.h"
+#include "brw_defines.h"
+#include "brw_state.h"
+#include "intel_batchbuffer.h"
+
+/**
+ * Chunk of L3 cache reserved for some specific purpose.
+ */
+enum brw_l3_partition {
+   /** Shared local memory. */
+   L3P_SLM = 0,
+   /** Unified return buffer. */
+   L3P_URB,
+   /** Union of DC and RO. */
+   L3P_ALL,
+   /** Data cluster RW partition. */
+   L3P_DC,
+   /** Union of IS, C and T. */
+   L3P_RO,
+   /** Instruction and state cache. */
+   L3P_IS,
+   /** Constant cache. */
+   L3P_C,
+   /** Texture cache. */
+   L3P_T,
+   /** Number of supported L3 partitions. */
+   NUM_L3P
+};
+
+/**
+ * L3 configuration represented as the number of ways allocated for each
+ * partition.  \sa get_l3_way_size().
+ */
+struct brw_l3_config {
+   unsigned n[NUM_L3P];
+};
+
+/**
+ * IVB/HSW validated L3 configurations.
+ */
+static const struct brw_l3_config ivb_l3_configs[] = {
+   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
+   {{  0, 32,  0, 16, 16,  0,  0,  0 }},
+   {{  0, 32,  0,  4,  0,  8,  4, 16 }},
+   {{  0, 28,  0,  8,  0,  8,  4, 16 }},
+   {{  0, 28,  0, 16,  0,  8,  4,  8 }},
+   {{  0, 28,  0,  8,  0, 16,  4,  8 }},
+   {{  0, 28,  0,  0,  0, 16,  4, 16 }},
+   {{  0, 32,  0,  0,  0, 16,  0, 16 }},
+   {{  0, 28,  0,  4, 32,  0,  0,  0 }},
+   {{ 16, 16,  0, 16, 16,  0,  0,  0 }},
+   {{ 16, 16,  0,  8,  0,  8,  8,  8 }},
+   {{ 16, 16,  0,  4,  0,  8,  4, 16 }},
+   {{ 16, 16,  0,  4,  0, 16,  4,  8 }},
+   {{ 16, 16,  0,  0, 32,  0,  0,  0 }},
+   {{ 0 }}
+};
+
+/**
+ * VLV validated L3 configurations.
+ */
+static const struct brw_l3_config vlv_l3_configs[] = {
+   {{  0, 80,  0,  0, 16,  0,  0,  0 }},
+   {{  0, 80,  0,  8,  8,  0,  0,  0 }},
+   {{  0, 64,  0, 16, 16,  0,  0,  0 }},
+   {{  0, 64,  0,  0, 32,  0,  0,  0 }},
+   {{  0, 60,  0,  4, 32,  0,  0,  0 }},
+   {{ 32, 32, 

[Mesa-dev] [PATCH v3 11/42] i965: Implement L3 state atom.

2015-11-25 Thread Francisco Jerez
The L3 state atom calculates the target L3 partition weights when the
program bound to some shader stage is modified and, if they are far
enough from the current partitioning, makes sure that the L3 state is
re-emitted.
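
As a rough sketch of what "far enough" means, the decision can be
reduced to the function below.  The two threshold values are the ones
emit_l3_state() uses in this patch; the function name is made up for
illustration, and the can_do_pipelined_register_writes check done by
the real code is left out.

```c
#include <assert.h>
#include <math.h>      /* HUGE_VALF, the distance of incompatible configs */
#include <stdbool.h>

/* Threshold values as used by emit_l3_state() in this patch. */
static const float small_dw_threshold = 0.5f;
static const float large_dw_threshold = 2.0f;

/*
 * Decide whether the L3 should be reprogrammed, given the distance dw
 * between the wanted weights and the weights of the current
 * configuration.  At the start of a batch the small threshold applies;
 * mid-batch only an infinite distance (incompatible configuration) can
 * exceed the large one, since compatible weight vectors are at most 2.0
 * apart.
 */
bool
l3_needs_reprogram(float dw, bool new_batch)
{
   assert(dw >= 0.0f);
   const float threshold = new_batch ? small_dw_threshold : large_dw_threshold;
   return dw > threshold;
}
```

So a distance of 0.7 would trigger a transition at a batch boundary but
not in the middle of a batch.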

v3: Fix for inconsistent units the context URB size is expressed in.
Clamp URB size to 1008 KB on SKL due to FF hardware limitation.
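
The arithmetic behind the clamp, as a standalone sketch.  It follows
the expression in update_urb_size() below, but takes the URB
allocation in KB directly instead of ways times way size; the function
name is hypothetical.

```c
#include <assert.h>

/*
 * URB size in the units the context stores it in: the allocation in KB,
 * clamped to 1008 KB on Gen9, divided by the slice count on Gen8+.
 */
unsigned
clamped_urb_size(unsigned urb_kb, int gen, unsigned num_slices)
{
   assert(num_slices > 0);
   const unsigned max_kb = (gen == 9 ? 1008 : ~0u);
   const unsigned scale = (gen >= 8 ? num_slices : 1);
   const unsigned kb = (urb_kb < max_kb ? urb_kb : max_kb);

   return kb / scale;
}
```

Taking the GT4 figure from the SKL documentation quoted in the patch
(3*384KB=1152KB), and assuming the factor of three there is the slice
count, the allocation is clamped to 1008 KB and stored as 1008 / 3 =
336.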
---
 src/mesa/drivers/dri/i965/brw_context.h   |  6 +++
 src/mesa/drivers/dri/i965/brw_state.h |  1 +
 src/mesa/drivers/dri/i965/gen7_l3_state.c | 81 +++
 3 files changed, 88 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 9983454..b4f8de6 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -678,6 +678,8 @@ enum brw_predicate_state {
 
 struct shader_times;
 
+struct brw_l3_config;
+
 /**
  * brw_context is derived from gl_context.
  */
@@ -1220,6 +1222,10 @@ struct brw_context
int basevertex;
 
struct {
+  const struct brw_l3_config *config;
+   } l3;
+
+   struct {
   drm_intel_bo *bo;
   const char **names;
   int *ids;
diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index 94734ba..49f301a 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -129,6 +129,7 @@ extern const struct brw_tracked_state gen7_depthbuffer;
 extern const struct brw_tracked_state gen7_clip_state;
 extern const struct brw_tracked_state gen7_disable_stages;
 extern const struct brw_tracked_state gen7_gs_state;
+extern const struct brw_tracked_state gen7_l3_state;
 extern const struct brw_tracked_state gen7_ps_state;
 extern const struct brw_tracked_state gen7_push_constant_space;
 extern const struct brw_tracked_state gen7_sbe_state;
diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
b/src/mesa/drivers/dri/i965/gen7_l3_state.c
index 70a61ae..4c314f5 100644
--- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -418,3 +418,84 @@ setup_l3_config(struct brw_context *brw, const struct 
brw_l3_config *cfg)
   }
}
 }
+
+/**
+ * Return the unit brw_context::urb::size is expressed in, in KB.  \sa
+ * brw_device_info::urb::size.
+ */
+static unsigned
+get_urb_size_scale(const struct brw_device_info *devinfo)
+{
+   return (devinfo->gen >= 8 ? devinfo->num_slices : 1);
+}
+
+/**
+ * Update the URB size in the context state for the specified L3
+ * configuration.
+ */
+static void
+update_urb_size(struct brw_context *brw, const struct brw_l3_config *cfg)
+{
+   const struct brw_device_info *devinfo = brw->intelScreen->devinfo;
+   /* From the SKL "L3 Allocation and Programming" documentation:
+*
+* "URB is limited to 1008KB due to programming restrictions.  This is not
+* a restriction of the L3 implementation, but of the FF and other clients.
+* Therefore, in a GT4 implementation it is possible for the programmed
+* allocation of the L3 data array to provide 3*384KB=1152KB for URB, but
+* only 1008KB of this will be used."
+*/
+   const unsigned max = (devinfo->gen == 9 ? 1008 : ~0);
+   const unsigned sz =
+  MIN2(max, cfg->n[L3P_URB] * get_l3_way_size(devinfo)) /
+  get_urb_size_scale(devinfo);
+
+   if (brw->urb.size != sz) {
+  brw->urb.size = sz;
+  brw->ctx.NewDriverState |= BRW_NEW_URB_SIZE;
+   }
+}
+
+static void
+emit_l3_state(struct brw_context *brw)
+{
+   const struct brw_l3_weights w = get_pipeline_state_l3_weights(brw);
+   const float dw = diff_l3_weights(w, get_config_l3_weights(brw->l3.config));
+   /* The distance between any two compatible weight vectors cannot exceed two
+* due to the triangle inequality.
+*/
+   const float large_dw_threshold = 2.0;
+   /* Somewhat arbitrary, simply makes sure that there will be no repeated
+* transitions to the same L3 configuration, could probably do better here.
+*/
+   const float small_dw_threshold = 0.5;
+   /* If we're emitting a new batch the caches should already be clean and the
+* transition should be relatively cheap, so it shouldn't hurt much to use
+* the smaller threshold.  Otherwise use the larger threshold so that we
+* only reprogram the L3 mid-batch if the most recently programmed
+* configuration is incompatible with the current pipeline state.
+*/
+   const float dw_threshold = (brw->ctx.NewDriverState & BRW_NEW_BATCH ?
+   small_dw_threshold : large_dw_threshold);
+
+   if (dw > dw_threshold && brw->can_do_pipelined_register_writes) {
+  const struct brw_l3_config *const cfg =
+ get_l3_config(brw->intelScreen->devinfo, w);
+
+  setup_l3_config(brw, cfg);
+  update_urb_size(brw, cfg);
+  brw->l3.config = cfg;
+   }
+}
+
+const struct brw_tracked_state gen7_l3_state = {
+   .dirty = {
+  .mesa = 0,
+  .brw = BRW_NEW_BATCH |
+ BRW_NEW_VS_PROG_DATA 

Re: [Mesa-dev] [PATCH v2 06/42] i965: Import tables enumerating the set of validated L3 configurations.

2015-11-25 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> On 25/11/15 14:45, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez  writes:
>> 
>>> On 25/11/15 13:56, Samuel Iglesias Gonsálvez wrote:


 On 25/11/15 13:20, Samuel Iglesias Gonsálvez wrote:
>
>
> On 18/11/15 06:54, Jordan Justen wrote:
>> From: Francisco Jerez 
>>
>> It should be possible to use additional L3 configurations other than
>> the ones listed in the tables of validated allocations ("BSpec »
>> 3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
>> » L3 Allocation and Programming"), but it seems sensible for now to
>> hard-code the tables in order to stick to the hardware docs.  Instead
>> of setting up the arbitrary L3 partitioning given as input, the
>> closest validated L3 configuration will be looked up in these tables
>> and used to program the hardware.
>>
>> The included tables should work for Gen7-9.  Note that the quantities
>> are specified in ways rather than in KB, this is because the L3
>> control registers expect the value in ways, and because by doing that
>> we can re-use a single table for all GT variants of the same
>> generation (and in the case of IVB/HSW and CHV/SKL across different
>> generations) which generally have different L3 way sizes but allow the
>> same combinations of way allocations.
>> ---
>>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>>  src/mesa/drivers/dri/i965/gen7_l3_state.c  | 163 
>> +
>>  2 files changed, 164 insertions(+)
>>  create mode 100644 src/mesa/drivers/dri/i965/gen7_l3_state.c
>>
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
>> b/src/mesa/drivers/dri/i965/Makefile.sources
>> index 5a88d66..91901ad 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>> @@ -184,6 +184,7 @@ i965_FILES = \
>>  gen7_cs_state.c \
>>  gen7_disable.c \
>>  gen7_gs_state.c \
>> +gen7_l3_state.c \
>>  gen7_misc_state.c \
>>  gen7_sf_state.c \
>>  gen7_sol_state.c \
>> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
>> b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>> new file mode 100644
>> index 000..8f9ba5b
>> --- /dev/null
>> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>> @@ -0,0 +1,163 @@
>> +/*
>> + * Copyright (c) 2015 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person 
>> obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without 
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, 
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the 
>> next
>> + * paragraph) shall be included in all copies or substantial portions 
>> of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT 
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
>> OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
>> ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
>> DEALINGS
>> + * IN THE SOFTWARE.
>> + */
>> +
>> +#include "brw_context.h"
>> +#include "brw_defines.h"
>> +#include "brw_state.h"
>> +#include "intel_batchbuffer.h"
>> +
>> +/**
>> + * Chunk of L3 cache reserved for some specific purpose.
>> + */
>> +enum brw_l3_partition {
>> +   /** Shared local memory. */
>> +   L3P_SLM = 0,
>> +   /** Unified return buffer. */
>> +   L3P_URB,
>> +   /** Union of DC and RO. */
>> +   L3P_ALL,
>> +   /** Data cluster RW partition. */
>> +   L3P_DC,
>> +   /** Union of IS, C and T. */
>> +   L3P_RO,
>> +   /** Instruction and state cache. */
>> +   L3P_IS,
>> +   /** Constant cache. */
>> +   L3P_C,
>> +   /** Texture cache. */
>> +   L3P_T,
>> +   /** Number of supported L3 partitions. */
>> +   NUM_L3P
>> +};
>> +
>> +/**
>> + * L3 configuration represented as the number of ways allocated for each
>> + * partition.  \sa get_l3_way_size().
>> + */
>> +struct brw_l3_config {
>> +   unsigned n[NUM_L3P];
>> 

Re: [Mesa-dev] [PATCH v2 06/42] i965: Import tables enumerating the set of validated L3 configurations.

2015-11-25 Thread Samuel Iglesias Gonsálvez


On 25/11/15 15:21, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez  writes:
> 
>> On 25/11/15 14:45, Francisco Jerez wrote:
>>> Samuel Iglesias Gonsálvez  writes:
>>>
 On 25/11/15 13:56, Samuel Iglesias Gonsálvez wrote:
>
>
> On 25/11/15 13:20, Samuel Iglesias Gonsálvez wrote:
>>
>>
>> On 18/11/15 06:54, Jordan Justen wrote:
>>> From: Francisco Jerez 
>>>
>>> It should be possible to use additional L3 configurations other than
>>> the ones listed in the tables of validated allocations ("BSpec »
>>> 3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
>>> » L3 Allocation and Programming"), but it seems sensible for now to
>>> hard-code the tables in order to stick to the hardware docs.  Instead
>>> of setting up the arbitrary L3 partitioning given as input, the
>>> closest validated L3 configuration will be looked up in these tables
>>> and used to program the hardware.
>>>
>>> The included tables should work for Gen7-9.  Note that the quantities
>>> are specified in ways rather than in KB, this is because the L3
>>> control registers expect the value in ways, and because by doing that
>>> we can re-use a single table for all GT variants of the same
>>> generation (and in the case of IVB/HSW and CHV/SKL across different
>>> generations) which generally have different L3 way sizes but allow the
>>> same combinations of way allocations.
>>> ---
>>>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>>>  src/mesa/drivers/dri/i965/gen7_l3_state.c  | 163 
>>> +
>>>  2 files changed, 164 insertions(+)
>>>  create mode 100644 src/mesa/drivers/dri/i965/gen7_l3_state.c
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
>>> b/src/mesa/drivers/dri/i965/Makefile.sources
>>> index 5a88d66..91901ad 100644
>>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>>> @@ -184,6 +184,7 @@ i965_FILES = \
>>> gen7_cs_state.c \
>>> gen7_disable.c \
>>> gen7_gs_state.c \
>>> +   gen7_l3_state.c \
>>> gen7_misc_state.c \
>>> gen7_sf_state.c \
>>> gen7_sol_state.c \
>>> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
>>> b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> new file mode 100644
>>> index 000..8f9ba5b
>>> --- /dev/null
>>> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> @@ -0,0 +1,163 @@
>>> +/*
>>> + * Copyright (c) 2015 Intel Corporation
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person 
>>> obtaining a
>>> + * copy of this software and associated documentation files (the 
>>> "Software"),
>>> + * to deal in the Software without restriction, including without 
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, 
>>> sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom 
>>> the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice (including 
>>> the next
>>> + * paragraph) shall be included in all copies or substantial portions 
>>> of the
>>> + * Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>>> EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>>> MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT 
>>> SHALL
>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES 
>>> OR OTHER
>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
>>> ARISING
>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
>>> DEALINGS
>>> + * IN THE SOFTWARE.
>>> + */
>>> +
>>> +#include "brw_context.h"
>>> +#include "brw_defines.h"
>>> +#include "brw_state.h"
>>> +#include "intel_batchbuffer.h"
>>> +
>>> +/**
>>> + * Chunk of L3 cache reserved for some specific purpose.
>>> + */
>>> +enum brw_l3_partition {
>>> +   /** Shared local memory. */
>>> +   L3P_SLM = 0,
>>> +   /** Unified return buffer. */
>>> +   L3P_URB,
>>> +   /** Union of DC and RO. */
>>> +   L3P_ALL,
>>> +   /** Data cluster RW partition. */
>>> +   L3P_DC,
>>> +   /** Union of IS, C and T. */
>>> +   L3P_RO,
>>> +   /** Instruction and state cache. */
>>> +   L3P_IS,
>>> +   /** Constant cache. */
>>> +   L3P_C,
>>> +   /** Texture cache. */
>>> +   L3P_T,
>>> +   /** Number of supported L3 partitions. */
>>> +   NUM_L3P
>>> +};
>>> +
>>> +/**
>>> + * L3 configuration represented

[Mesa-dev] [PATCH] i965: Document inconsistent units the URB size is represented in.

2015-11-25 Thread Francisco Jerez
 src/mesa/drivers/dri/i965/brw_context.h | 8 +++-
 src/mesa/drivers/dri/i965/brw_device_info.h | 5 +
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index ee3b6a3..30e5c48 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1000,7 +1000,13 @@ struct brw_context
   GLuint clip_start;
   GLuint sf_start;
   GLuint cs_start;
-  GLuint size; /* Hardware URB size, in KB. */
+  /**
+   * URB size in the current configuration.  The units this is expressed
+   * in are somewhat inconsistent, see brw_device_info::urb::size.
+   *
+   * FINISHME: Represent the URB size consistently in KB on all platforms.
+   */
+  GLuint size;
 
   /* True if the most recently sent _3DSTATE_URB message allocated
* URB space for the GS.
diff --git a/src/mesa/drivers/dri/i965/brw_device_info.h 
b/src/mesa/drivers/dri/i965/brw_device_info.h
index 8d3af62..324c074 100644
--- a/src/mesa/drivers/dri/i965/brw_device_info.h
+++ b/src/mesa/drivers/dri/i965/brw_device_info.h
@@ -81,6 +81,11 @@ struct brw_device_info
unsigned max_cs_threads;
 
struct {
+  /**
+   * Hardware default URB size.  The units this is expressed in are
+   * somewhat inconsistent: 512b units on Gen4-5, KB on Gen6-7, and KB
+   * times the slice count on Gen8+.
+   */
   unsigned size;
   unsigned min_vs_entries;
   unsigned max_vs_entries;
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
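
For illustration, the unit conventions documented in the patch above
could be folded into one conversion helper along the following lines.
This is only a sketch: the function name is made up, and it ASSUMES
that "512b" means 512 bits (64 bytes), which the comment in the patch
does not spell out.

```c
#include <assert.h>

/*
 * Convert a URB size expressed in the per-generation units described in
 * brw_device_info::urb::size to bytes.
 */
unsigned
urb_size_to_bytes(unsigned size, int gen, unsigned num_slices)
{
   assert(gen >= 4 && num_slices > 0);

   if (gen <= 5)
      return size * 64;                  /* ASSUMPTION: 512-bit units. */
   else if (gen <= 7)
      return size * 1024;                /* Plain KB. */
   else
      return size * num_slices * 1024;   /* Units of (slice count) KB. */
}
```

A helper like this is roughly what the FINISHME in brw_context.h asks
to make unnecessary by storing the size in KB everywhere.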


Re: [Mesa-dev] [PATCH v2 09/42] i965: Implement selection of the closest L3 configuration based on a vector of weights.

2015-11-25 Thread Samuel Iglesias Gonsálvez

Reviewed-by: Samuel Iglesias Gonsálvez 

On 18/11/15 06:54, Jordan Justen wrote:
> From: Francisco Jerez 
> 
> The input of the L3 set-up code is a vector giving the approximate
> desired relative size of each partition.  This implements logic to
> compare the input vector against the table of validated configurations
> for the device and pick the closest compatible one.
> ---
>  src/mesa/drivers/dri/i965/gen7_l3_state.c | 95 
> +++
>  1 file changed, 95 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
> b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> index c863b7f..4d0cfcd 100644
> --- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> @@ -163,6 +163,101 @@ get_l3_way_size(const struct brw_device_info *devinfo)
>  }
>  
>  /**
> + * L3 configuration represented as a vector of weights giving the desired
> + * relative size of each partition.  The scale is arbitrary, only the ratios
> + * between weights will have an influence on the selection of the closest L3
> + * configuration.
> + */
> +struct brw_l3_weights {
> +   float w[NUM_L3P];
> +};
> +
> +/**
> + * L1-normalize a vector of L3 partition weights.
> + */
> +static struct brw_l3_weights
> +norm_l3_weights(struct brw_l3_weights w)
> +{
> +   float sz = 0;
> +
> +   for (unsigned i = 0; i < NUM_L3P; i++)
> +  sz += w.w[i];
> +
> +   for (unsigned i = 0; i < NUM_L3P; i++)
> +  w.w[i] /= sz;
> +
> +   return w;
> +}
> +
> +/**
> + * Get the relative partition weights of the specified L3 configuration.
> + */
> +static struct brw_l3_weights
> +get_config_l3_weights(const struct brw_l3_config *cfg)
> +{
> +   if (cfg) {
> +  struct brw_l3_weights w;
> +
> +  for (unsigned i = 0; i < NUM_L3P; i++)
> + w.w[i] = cfg->n[i];
> +
> +  return norm_l3_weights(w);
> +   } else {
> +  const struct brw_l3_weights w = { { 0 } };
> +  return w;
> +   }
> +}
> +
> +/**
> + * Distance between two L3 configurations represented as vectors of weights.
> + * Usually just the L1 metric except when the two configurations are
> + * considered incompatible in which case the distance will be infinite.  Note
> + * that the compatibility condition is asymmetric -- They will be considered
> + * incompatible whenever the reference configuration \p w0 requires SLM, DC,
> + * or URB but \p w1 doesn't provide it.
> + */
> +static float
> +diff_l3_weights(struct brw_l3_weights w0, struct brw_l3_weights w1)
> +{
> +   if ((w0.w[L3P_SLM] && !w1.w[L3P_SLM]) ||
> +   (w0.w[L3P_DC] && !w1.w[L3P_DC] && !w1.w[L3P_ALL]) ||
> +   (w0.w[L3P_URB] && !w1.w[L3P_URB])) {
> +  return HUGE_VALF;
> +
> +   } else {
> +  float dw = 0;
> +
> +  for (unsigned i = 0; i < NUM_L3P; i++)
> + dw += fabs(w0.w[i] - w1.w[i]);
> +
> +  return dw;
> +   }
> +}
> +
> +/**
> + * Return the closest validated L3 configuration for the specified device and
> + * weight vector.
> + */
> +static const struct brw_l3_config *
> +get_l3_config(const struct brw_device_info *devinfo, struct brw_l3_weights 
> w0)
> +{
> +   const struct brw_l3_config *const cfgs = get_l3_configs(devinfo);
> +   const struct brw_l3_config *cfg_best = NULL;
> +   float dw_best = HUGE_VALF;
> +
> +   for (const struct brw_l3_config *cfg = cfgs; cfg->n[L3P_URB]; cfg++) {
> +  const float dw = diff_l3_weights(w0, get_config_l3_weights(cfg));
> +
> +  if (dw < dw_best) {
> + cfg_best = cfg;
> + dw_best = dw;
> +  }
> +   }
> +
> +   return cfg_best;
> +}
> +
> +/**
>   * Program the hardware to use the specified L3 configuration.
>   */
>  static void
> 
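
For readers following along, the selection logic quoted above can be
tried outside the driver with a sketch like the one below.  It is not
the driver code: the type and function names are shortened stand-ins,
the table holds only three rows copied from the IVB/HSW table, and
unlike get_l3_config() it normalizes the input weights itself and
returns a row index.

```c
#include <assert.h>
#include <math.h>   /* HUGE_VALF */

enum { P_SLM, P_URB, P_ALL, P_DC, P_RO, P_IS, P_C, P_T, NUM_P };

struct config  { unsigned n[NUM_P]; };
struct weights { float w[NUM_P]; };

/* Three rows copied from the IVB/HSW table, plus the zero terminator. */
const struct config example_configs[] = {
   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
   {{  0, 32,  0, 16, 16,  0,  0,  0 }},
   {{ 16, 16,  0, 16, 16,  0,  0,  0 }},
   {{ 0 }}
};

/* L1-normalize: scale the weights so that they add up to one. */
struct weights
normalize(struct weights w)
{
   float sum = 0;

   for (unsigned i = 0; i < NUM_P; i++)
      sum += w.w[i];

   assert(sum > 0);

   for (unsigned i = 0; i < NUM_P; i++)
      w.w[i] /= sum;

   return w;
}

/* L1 distance, or infinity if w1 lacks SLM, DC or URB required by w0. */
float
distance(struct weights w0, struct weights w1)
{
   if ((w0.w[P_SLM] && !w1.w[P_SLM]) ||
       (w0.w[P_DC] && !w1.w[P_DC] && !w1.w[P_ALL]) ||
       (w0.w[P_URB] && !w1.w[P_URB]))
      return HUGE_VALF;

   float dw = 0;

   for (unsigned i = 0; i < NUM_P; i++) {
      const float d = w0.w[i] - w1.w[i];
      dw += (d < 0 ? -d : d);
   }

   return dw;
}

/* Index of the closest compatible row, or -1 if none is compatible. */
int
closest_config(const struct config *cfgs, struct weights w0)
{
   int best = -1;
   float dw_best = HUGE_VALF;

   w0 = normalize(w0);

   for (int i = 0; cfgs[i].n[P_URB]; i++) {
      struct weights w1;

      for (unsigned j = 0; j < NUM_P; j++)
         w1.w[j] = cfgs[i].n[j];

      const float dw = distance(w0, normalize(w1));

      if (dw < dw_best) {
         best = i;
         dw_best = dw;
      }
   }

   return best;
}
```

Asking for equal parts of SLM, URB, DC and RO can only be satisfied by
the third row, since the first two provide no SLM and are therefore at
infinite distance.  Weights without SLM but with DC end up on the
second row, whose ratios they can match exactly.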


Re: [Mesa-dev] [PATCH 7/8] radeonsi: implement AMD_performance_monitor for CIK+

2015-11-25 Thread Nicolai Hähnle

On 25.11.2015 14:56, Marek Olšák wrote:

R600_PC_SHADER_WINDOWING seems unused. What is it good for?


Ah yes, that's totally non-obvious in the code, sorry about that.  It 
makes query->shader non-zero, which will cause SQ_PERFCOUNTER_CTRL to be 
written. I've added a short comment about this.


The long version is this: (Most of) the performance counters in the 
texturing blocks (TA/TD/TCP) are windowed such that they only count 
cycles/events for loads (or stores) issued by shader waves for which 
performance counting is enabled.


So when you simultaneously listen for e.g. SQ_VS_xxx and TAx_xxx, most 
of the TA-counters only reflect loads issued by vertex shaders.


The setting of R600_PC_SHADER_WINDOWING guarantees that when you *only* 
listen to the texture blocks, SQ_PERFCOUNTER_CTRL is reset such that 
accesses by all shader types are counted.



BTW, I think the EVENT_WRITE_EOP workaround is indeed required.


Okay. I've removed the XXX part of the comment.

Cheers,
Nicolai



The series is:
Reviewed-by: Marek Olšák 

Marek

On Tue, Nov 24, 2015 at 5:00 PM, Nicolai Hähnle  wrote:

Expose most of the performance counter groups that are exposed by Catalyst.
Ideally, the driver will work with GPUPerfStudio at some point, but we are not
quite there yet. In any case, this is the reason for grouping multiple
instances of hardware blocks in the way it is implemented.

The counters can also be shown using the Gallium HUD. To see how work is
distributed across multiple shader engines, one can set the environment
variable RADEON_PC_SEPARATE_SE=1 to obtain finer-grained performance
counter groups.

Part of the implementation is in radeon because an implementation for
older hardware would largely follow along the same lines, but exposing
a different set of blocks which are programmed slightly differently.
---
  src/gallium/drivers/radeon/Makefile.sources   |   1 +
  src/gallium/drivers/radeon/r600_perfcounter.c | 633 +++
  src/gallium/drivers/radeon/r600_pipe_common.c |   1 +
  src/gallium/drivers/radeon/r600_pipe_common.h |   7 +
  src/gallium/drivers/radeon/r600_query.c   |  21 +-
  src/gallium/drivers/radeon/r600_query.h   | 121 +
  src/gallium/drivers/radeonsi/Makefile.sources |   1 +
  src/gallium/drivers/radeonsi/si_perfcounter.c | 697 ++
  src/gallium/drivers/radeonsi/si_pipe.c|   3 +
  src/gallium/drivers/radeonsi/si_pipe.h|   3 +
  10 files changed, 1485 insertions(+), 3 deletions(-)
  create mode 100644 src/gallium/drivers/radeon/r600_perfcounter.c
  create mode 100644 src/gallium/drivers/radeonsi/si_perfcounter.c

diff --git a/src/gallium/drivers/radeon/Makefile.sources 
b/src/gallium/drivers/radeon/Makefile.sources
index 1dbad2f..eb171f7 100644
--- a/src/gallium/drivers/radeon/Makefile.sources
+++ b/src/gallium/drivers/radeon/Makefile.sources
@@ -4,6 +4,7 @@ C_SOURCES := \
 r600_cs.h \
 r600d_common.h \
 r600_gpu_load.c \
+   r600_perfcounter.c \
 r600_pipe_common.c \
 r600_pipe_common.h \
 r600_query.c \
diff --git a/src/gallium/drivers/radeon/r600_perfcounter.c 
b/src/gallium/drivers/radeon/r600_perfcounter.c
new file mode 100644
index 000..b4915d0
--- /dev/null
+++ b/src/gallium/drivers/radeon/r600_perfcounter.c
@@ -0,0 +1,633 @@
+/*
+ * Copyright 2015 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
THE
+ * SOFTWARE.
+ *
+ * Authors:
+ *  Nicolai Hähnle 
+ *
+ */
+
+#include "util/u_memory.h"
+#include "r600_query.h"
+#include "r600_pipe_common.h"
+#include "r600d_common.h"
+
+/* Max counters per HW block */
+#define R600_QUERY_MAX_COUNTERS 16
+
+static const char * const r600_pc_shader_suffix[] = {
+   "", "_PS", "_VS", "_GS", "_ES", "_HS", "_LS", "_CS"
+};
+
+static struct r600_perfcounter_block *
+lookup_counter(struct r600_perfcounters *pc, unsigned index,
+ 

Re: [Mesa-dev] [PATCH v2 1/2] st/va: if h264 then delay decoder creation until max_references is known

2015-11-25 Thread Emil Velikov
Hi Julien,

On 25 November 2015 at 09:07, Julien Isorce  wrote:
> From: Julien Isorce 
>
> In general max_references cannot be based on num_render_targets.
>
> This patch allow to allocate accurate sizes for buffers.
> For other codecs it is a fixed value to 2.
>
> This is similar behaviour as vaapi/vdpau-driver.
>
> XXX: do the same for HEVC
>
> Signed-off-by: Julien Isorce 
> ---
>  src/gallium/state_trackers/va/context.c  | 41 
> ++--
>  src/gallium/state_trackers/va/picture.c  | 37 ++---
>  src/gallium/state_trackers/va/picture_h264.c | 29 +++-
>  src/gallium/state_trackers/va/va_private.h   |  4 +--
>  4 files changed, 78 insertions(+), 33 deletions(-)
>
> diff --git a/src/gallium/state_trackers/va/context.c 
> b/src/gallium/state_trackers/va/context.c
> index f0051e5..985007b 100644
> --- a/src/gallium/state_trackers/va/context.c
> +++ b/src/gallium/state_trackers/va/context.c

> +  /* Can only create decoders for which max_references is known. */
> +  if (u_reduce_video_profile(context->templat.profile) !=
> + PIPE_VIDEO_FORMAT_MPEG4_AVC) {
> + context->decoder = drv->pipe->create_video_codec(drv->pipe,
> +&context->templat);
> + if (!context->decoder) {
> +FREE(context);
> +return VA_STATUS_ERROR_ALLOCATION_FAILED;
> + }
>}
What is the benefit of delaying only h264? Won't this make things
less obvious, with more subtle checks needed throughout?

> --- a/src/gallium/state_trackers/va/picture.c
> +++ b/src/gallium/state_trackers/va/picture.c
> @@ -60,6 +60,12 @@ vlVaBeginPicture(VADriverContextP ctx, VAContextID 
> context_id, VASurfaceID rende
>
> context->target = surf->buffer;
> if (!context->decoder) {
> +  /* Decoder creation is delayed until max_references is set. */
> +  if (u_reduce_video_profile(context->templat.profile) ==
> +  PIPE_VIDEO_FORMAT_MPEG4_AVC)
> + return context->templat.max_references == 0 ?
> +VA_STATUS_SUCCESS : VA_STATUS_ERROR_INVALID_CONTEXT;
> +
Can we really hit the VA_STATUS_ERROR_INVALID_CONTEXT case here? The
only way that I can think of is consecutive BeginPicture calls without
an EndPicture. Surely that can cause issues even without this patch.

-Emil


Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample needs to pass OpenGL3.3 conformance test.

2015-11-25 Thread Tapani Pälli

On 11/25/2015 04:00 PM, Predut, Marius wrote:

-Original Message-
From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of
Timothy Arceri
Sent: Wednesday, November 25, 2015 1:12 PM
To: Palli, Tapani; Predut, Marius; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample needs to pass
OpenGL3.3 conformance test.

On Wed, 2015-11-25 at 12:47 +0200, Tapani Pälli wrote:

Hi;

On 11/25/2015 01:15 PM, Marius Predut wrote:

Open GL 3.3 reference document says:
samples must be in the range zero to GL_MAX_TEXTURE_SIZE - 1.
Open GL.4 clearly states:
An INVALID_VALUE error is generated if samples is zero.

See my comment in bugzilla [1]. I believe this is just a bug in the
reference pages. We implement things in Mesa going by what the spec
says, and the 3.2 spec says nothing about samples being 0. In fact,
even the 4.0 spec, which you have changed the check to, doesn't say
anything about it.

The 4.5 reference pages also conflict with the spec, which is even
more reason to think this change is wrong.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=91670


I don't think it is a bug in the specs, because in that case the CTS and the
piglit test would also be wrong:

With this patch 2 things are fixed:
1. Khronos CTS conformance tests for OpenGL 3.3
2. The piglit test https://bugs.freedesktop.org/show_bug.cgi?id=93100 (Or
https://bugs.freedesktop.org/show_bug.cgi?id=91670)

The patch is based on this spec:
https://www.opengl.org/sdk/docs/man3/xhtml/glTexImage3DMultisample.xml

We can't just believe or "suppose" something here; the specs need to be treated as axioms.
Can someone confirm that this reference page contains the wrong information?


That's not a specification but a manual page. Timothy is pointing to the
OpenGL specifications (available at www.opengl.org/registry). It is true
that, for example, 3.3 Core does not mention this error case, which means
using 0 was allowed there. IMO we should either allow 0 (and bump it to
1?) when running on a 3.x context, since it's not forbidden, or maybe
just patch this locally whenever running 3.x conformance.


(It seems OpenGL 4.2 is the first spec to state the INVALID_VALUE error 
case for 0.)
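
A rough sketch of what a version-dependent check could look like,
going only by the versions mentioned in this thread (4.2 for desktop
GL, 3.1 for ES).  The enum and function names are made up for
illustration and are not Mesa's; versions are encoded as major * 10 +
minor like ctx->Version in the patch, negative values are assumed to
be rejected everywhere as the existing "samples < 1" test already
does, and the upper limit on samples is not checked here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the API kind found in the context. */
enum gl_api_kind { KIND_GL, KIND_GLES };

/*
 * Whether the given samples value should raise GL_INVALID_VALUE.
 * Zero is only treated as an error on versions whose spec says so.
 */
bool
samples_is_invalid(enum gl_api_kind kind, unsigned version, int samples)
{
   if (samples < 0)
      return true;

   if (samples > 0)
      return false;

   /* samples == 0 */
   switch (kind) {
   case KIND_GL:
      return version >= 42;
   case KIND_GLES:
      return version >= 31;
   }

   return false;
}
```

With this, a 3.3 context would accept zero while 4.2 and ES 3.1
contexts would reject it.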






The OpenGL ES 3.1 spec also says "An INVALID_VALUE error is generated if
samples is zero.". You'll need to change your check below to also
include ES 3.1.


Fixing the piglit test case gl-3.2-layered-rendering-framebuffertexture.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93100

Signed-off-by: Marius Predut 
---
   src/mesa/main/teximage.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index d9453e3..69634ff 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -5211,7 +5211,7 @@ texture_image_multisample(struct gl_context *ctx, GLuint dims,
       return;
    }
 
-   if (samples < 1) {
+   if (samples < 1 && ctx->API == API_OPENGL_CORE && ctx->Version >= 40) {
       _mesa_error(ctx, GL_INVALID_VALUE, "%s(samples < 1)", func);
       return;
    }

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH v2 24/42] nir: Translate glsl shared var load intrinsic to nir intrinsic

2015-11-25 Thread Connor Abbott
On Wed, Nov 25, 2015 at 4:48 AM, Iago Toral  wrote:
> On Tue, 2015-11-17 at 21:54 -0800, Jordan Justen wrote:
>> Signed-off-by: Jordan Justen 
>> ---
>>  src/glsl/nir/glsl_to_nir.cpp  | 29 +
>>  src/glsl/nir/nir_intrinsics.h |  1 +
>>  2 files changed, 30 insertions(+)
>>
>> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
>> index 6d24341..a59d09c 100644
>> --- a/src/glsl/nir/glsl_to_nir.cpp
>> +++ b/src/glsl/nir/glsl_to_nir.cpp
>> @@ -727,6 +727,8 @@ nir_visitor::visit(ir_call *ir)
>>   op = nir_intrinsic_memory_barrier_image;
>>} else if (strcmp(ir->callee_name(), 
>> "__intrinsic_memory_barrier_shared") == 0) {
>>   op = nir_intrinsic_memory_barrier_shared;
>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_load_shared") == 0) 
>> {
>> + op = nir_intrinsic_load_shared;
>>} else {
>>   unreachable("not reached");
>>}
>> @@ -974,6 +976,33 @@ nir_visitor::visit(ir_call *ir)
>>   nir_builder_instr_insert(&b, &instr->instr);
>>   break;
>>}
>> +  case nir_intrinsic_load_shared: {
>> + exec_node *param = ir->actual_parameters.get_head();
>> + ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
>> +
>> + /* Check if we need the indirect version */
>> + ir_constant *const_offset = offset->as_constant();
>> + if (!const_offset) {
>> +op = nir_intrinsic_load_shared_indirect;
>> +ralloc_free(instr);
>> +instr = nir_intrinsic_instr_create(shader, op);
>> +instr->src[0] = nir_src_for_ssa(evaluate_rvalue(offset));
>> +instr->const_index[0] = 0;
>> +dest = &instr->dest;
>> + } else {
>> +instr->const_index[0] = const_offset->value.u[0];
>> + }
>> +
>> + const glsl_type *type = ir->return_deref->var->type;
>> + instr->num_components = type->vector_elements;
>> +
>> + /* Setup destination register */
>> + nir_ssa_dest_init(&instr->instr, &instr->dest,
>> +   type->vector_elements, NULL);
>> +
>> + nir_builder_instr_insert(&b, &instr->instr);
>
> In ubos and ssbo loads we need to fixup bool types, it looks like you
> would need to do the same here, right?

I don't think that's necessary, because shared variables are never
read/written on the CPU, so we just always use ~0/0 for true/false and
never have to convert to what the API expects.
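
A small sketch in C of the distinction being drawn here, under stated assumptions: the internal boolean encoding is ~0/0 as described above, while the encoding visible through API buffers (UBO/SSBO) is assumed to be "non-zero means true" on load and 1/0 on store. All names are hypothetical and do not correspond to real NIR or Mesa helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Internal encoding described in the thread: all bits set for true. */
#define SKETCH_TRUE  (~(uint32_t)0)
#define SKETCH_FALSE ((uint32_t)0)

/* Load from an API-visible buffer: assumed to treat any non-zero as true. */
static uint32_t
sketch_bool_from_buffer(uint32_t raw)
{
   return raw != 0 ? SKETCH_TRUE : SKETCH_FALSE;
}

/* Store to an API-visible buffer: assumed to expect 1 for true. */
static uint32_t
sketch_bool_to_buffer(uint32_t internal)
{
   return internal != 0 ? 1u : 0u;
}

/*
 * Shared variables are only ever touched by shader code, so the internal
 * encoding can be stored and loaded unchanged.
 */
static uint32_t
sketch_bool_shared_roundtrip(uint32_t internal)
{
   return internal;
}

static void
sketch_bool_self_check(void)
{
   assert(sketch_bool_from_buffer(1u) == SKETCH_TRUE);
   assert(sketch_bool_to_buffer(SKETCH_TRUE) == 1u);
   assert(sketch_bool_shared_roundtrip(SKETCH_TRUE) == SKETCH_TRUE);
}
```

The point is only that the fixup exists to bridge two encodings; where both the writer and the reader are the shader itself, there is nothing to bridge.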

>
>> + break;
>> +  }
>>default:
>>   unreachable("not reached");
>>}
>> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
>> index b8d7d6c..de15128 100644
>> --- a/src/glsl/nir/nir_intrinsics.h
>> +++ b/src/glsl/nir/nir_intrinsics.h
>> @@ -257,6 +257,7 @@ LOAD(per_vertex_input, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE 
>> | NIR_INTRINSIC_CAN_REO
>>  LOAD(ssbo, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE)
>>  LOAD(output, 0, 1, NIR_INTRINSIC_CAN_ELIMINATE)
>>  LOAD(per_vertex_output, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE)
>> +LOAD(shared, 0, 1, NIR_INTRINSIC_CAN_ELIMINATE)
>>
>>  /*
>>   * Stores work the same way as loads, except now the first register input is
>
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] question on GL_RED and gles

2015-11-25 Thread Julien Isorce
Hi,

EXT_texture_rg.txt mentions GL_RED_EXT for gles 2.0.

However, glformats.c::_mesa_es_error_check_format_and_type returns
GL_INVALID_VALUE for GL_RED_EXT (as it reaches the default case),
so glTexImage2D(..., GL_RED_EXT, GL_UNSIGNED_BYTE, data) fails.

This happens even though GL_EXTENSIONS contains GL_EXT_texture_rg.

So it seems that GL_RED_EXT is actually only possible with gles >= 3.0.

What am I missing ?

Thx in advance for any clarification.
Julien
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] docs: minimum required python mako version is 0.3.4

2015-11-25 Thread Samuel Iglesias Gonsálvez
Signed-off-by: Samuel Iglesias Gonsálvez 
---
 docs/install.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/install.html b/docs/install.html
index a90c2b2..c826d64 100644
--- a/docs/install.html
+++ b/docs/install.html
@@ -39,7 +39,7 @@ Version 2.6.4 or later should work.
 
 
 http://www.makotemplates.org/";>Python Mako module -
-Python Mako module is required. Version 0.7.3 or later should work.
+Python Mako module is required. Version 0.3.4 or later should work.
 
 
 http://www.scons.org/";>SCons is required for building on
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 2/2] targets: use the non-inline sw helpers

2015-11-25 Thread Nick Sarnie
On Mon, Nov 23, 2015 at 10:28 AM, Emil Velikov 
wrote:

> Previously (with the inline ones) things were embedded into the
> pipe-loader, which means that we cannot control/select what we want in
> each target.
>
> That also meant that at runtime we ended up with the empty
> sw_screen_create() as the GALLIUM_SOFTPIPE/LLVMPIPE were not set.
>
> v2: Cover all the targets, not just dri.
>
> Cc: "11.1" 
> Cc: Oded Gabbay 
> Cc: Ilia Mirkin 
> Cc: Edward O'Callaghan 
> Signed-off-by: Emil Velikov 
> ---
>  src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c | 3 ++-
>  src/gallium/targets/d3dadapter9/drm.c  | 1 +
>  src/gallium/targets/dri/target.c   | 1 +
>  src/gallium/targets/omx/target.c   | 1 +
>  src/gallium/targets/va/target.c| 1 +
>  src/gallium/targets/vdpau/target.c | 1 +
>  src/gallium/targets/xa/target.c| 1 +
>  7 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c
> b/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c
> index 5539a73..1c38750 100644
> --- a/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c
> +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c
> @@ -33,9 +33,10 @@
>  #include "sw/kms-dri/kms_dri_sw_winsys.h"
>  #include "sw/null/null_sw_winsys.h"
>  #include "sw/wrapper/wrapper_sw_winsys.h"
> -#include "target-helpers/inline_sw_helper.h"
> +#include "target-helpers/sw_helper_public.h"
>  #include "state_tracker/drisw_api.h"
>  #include "state_tracker/sw_driver.h"
> +#include "state_tracker/sw_winsys.h"
>
>  struct pipe_loader_sw_device {
> struct pipe_loader_device base;
> diff --git a/src/gallium/targets/d3dadapter9/drm.c
> b/src/gallium/targets/d3dadapter9/drm.c
> index ad712db..5cd1ba7 100644
> --- a/src/gallium/targets/d3dadapter9/drm.c
> +++ b/src/gallium/targets/d3dadapter9/drm.c
> @@ -31,6 +31,7 @@
>  #include "pipe/p_state.h"
>
>  #include "target-helpers/drm_helper.h"
> +#include "target-helpers/sw_helper.h"
>  #include "state_tracker/drm_driver.h"
>
>  #include "d3dadapter/d3dadapter9.h"
> diff --git a/src/gallium/targets/dri/target.c
> b/src/gallium/targets/dri/target.c
> index d6fbd01..01532e2 100644
> --- a/src/gallium/targets/dri/target.c
> +++ b/src/gallium/targets/dri/target.c
> @@ -1,4 +1,5 @@
>  #include "target-helpers/drm_helper.h"
> +#include "target-helpers/sw_helper.h"
>
>  #include "dri_screen.h"
>
> diff --git a/src/gallium/targets/omx/target.c
> b/src/gallium/targets/omx/target.c
> index 42b1346..308e23b 100644
> --- a/src/gallium/targets/omx/target.c
> +++ b/src/gallium/targets/omx/target.c
> @@ -1 +1,2 @@
>  #include "target-helpers/drm_helper.h"
> +#include "target-helpers/sw_helper.h"
> diff --git a/src/gallium/targets/va/target.c
> b/src/gallium/targets/va/target.c
> index 42b1346..308e23b 100644
> --- a/src/gallium/targets/va/target.c
> +++ b/src/gallium/targets/va/target.c
> @@ -1 +1,2 @@
>  #include "target-helpers/drm_helper.h"
> +#include "target-helpers/sw_helper.h"
> diff --git a/src/gallium/targets/vdpau/target.c
> b/src/gallium/targets/vdpau/target.c
> index 42b1346..308e23b 100644
> --- a/src/gallium/targets/vdpau/target.c
> +++ b/src/gallium/targets/vdpau/target.c
> @@ -1 +1,2 @@
>  #include "target-helpers/drm_helper.h"
> +#include "target-helpers/sw_helper.h"
> diff --git a/src/gallium/targets/xa/target.c
> b/src/gallium/targets/xa/target.c
> index 42b1346..308e23b 100644
> --- a/src/gallium/targets/xa/target.c
> +++ b/src/gallium/targets/xa/target.c
> @@ -1 +1,2 @@
>  #include "target-helpers/drm_helper.h"
> +#include "target-helpers/sw_helper.h"
> --
> 2.6.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Hi,

This fixes Gallium Nine for me.

For the series:
Tested-by: Nick Sarnie 

Thanks
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] docs: minimum required python mako version is 0.3.4

2015-11-25 Thread Emil Velikov
On 25 November 2015 at 15:35, Samuel Iglesias Gonsálvez
 wrote:
> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>  docs/install.html | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/docs/install.html b/docs/install.html
> index a90c2b2..c826d64 100644
> --- a/docs/install.html
> +++ b/docs/install.html
> @@ -39,7 +39,7 @@ Version 2.6.4 or later should work.
>  
>  
>  http://www.makotemplates.org/";>Python Mako module -
> -Python Mako module is required. Version 0.7.3 or later should work.
> +Python Mako module is required. Version 0.3.4 or later should work.
>  
>  
>  http://www.scons.org/";>SCons is required for building on

Looks like it was missed with commit 6d43a4c3386 "configure: change
required Python Mako version to 0.3.4"

Reviewed-by: Emil Velikov 

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 93103] llvm symbols leak through, cause trouble with software rendering in llvm-linked software

2015-11-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93103

Bug ID: 93103
   Summary: llvm symbols leak through, cause trouble with software
rendering in llvm-linked software
   Product: Mesa
   Version: 10.1
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Drivers/X11
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: t...@schlueters.de
QA Contact: mesa-dev@lists.freedesktop.org

libmesa uses llvm in its software rendering path.  I'm trying to run via X
forwarding (and thus using software rendering) a software project on ubuntu
14.04 that uses OpenGL for visualization and links against its own copy of llvm
(via the root libraries, I don't think this matters, and they have recently put
in place a workaround for this problem, their bug report is here
https://sft.its.cern.ch/jira/browse/ROOT-7744).

When loading the mesa libraries I get a backtrace that looks as follows
(libCling is the interface to llvm):
#5  0x7f3eff60deb1 in llvm::cl::AddLiteralOption(llvm::cl::Option&, char
const*) () from
/home/schlueter/src/belle2/externals/v01-01-01/Linux_x86_64/opt/root/lib/libCling.so
#6  0x7f3ecadcfdcc in ?? () from /usr/lib/x86_64-linux-gnu/libLLVM-3.4.so.1
#7  0x7f3f0867913a in ?? () from /lib64/ld-linux-x86-64.so.2
#8  0x7f3f08679223 in ?? () from /lib64/ld-linux-x86-64.so.2
#9  0x7f3f0867dc70 in ?? () from /lib64/ld-linux-x86-64.so.2
#10 0x7f3f08678ff4 in ?? () from /lib64/ld-linux-x86-64.so.2
#11 0x7f3f0867d3bb in ?? () from /lib64/ld-linux-x86-64.so.2
#12 0x7f3f06cd902b in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
#13 0x7f3f08678ff4 in ?? () from /lib64/ld-linux-x86-64.so.2
#14 0x7f3f06cd962d in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
#15 0x7f3f06cd90c1 in dlopen () from /lib/x86_64-linux-gnu/libdl.so.2
#16 0x7f3edc84801c in ?? () from /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1
#17 0x7f3edc847719 in ?? () from /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1
#18 0x7f3edc828558 in ?? () from /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1
#19 0x7f3edc824f3b in ?? () from /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1
#20 0x7f3edc8256d0 in glXChooseVisual () from
/usr/lib/x86_64-linux-gnu/mesa/libGL.so.1
#21 0x7f3eddc7ffc1 in TGLWidget::CreateWindow(TGWindow const*, TGLFormat
const&, unsigned int, unsigned int, std::pair&) () from
/home/schlueter/src/belle2/externals/v01-01-01/Linux_x86_64/opt/root/lib/libRGL.so
#22 0x7f3eddc80500 in TGLWidget::Create(TGLFormat const&, TGWindow const*,
bool, bool, TGLPaintDevice const*, unsigned int, unsigned int) () from
/home/schlueter/src/belle2/externals/v01-01-01/Linux_x86_64/opt/root/lib/libRGL.so
#23 0x7f3eddc80723 in TGLWidget::Create(TGWindow const*, bool, bool,
TGLPaintDevice const*, unsigned int, unsigned int) () from
/home/schlueter/src/belle2/externals/v01-01-01/Linux_x86_64/opt/root/lib/libRGL.so
#24 0x7f3eddbf4801 in TGLEmbeddedViewer::CreateFrames() () from
/home/schlueter/src/belle2/externals/v01-01-01/Linux_x86_64/opt/root/lib/libRGL.so

The reason is that the symbols from the libllvm that libmesa is linked against
collide with the symbols of my already loaded llvm library.  I don't think
libmesa should load libraries in a way that pollutes the symbol namespace,
depending on the execution path taken.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 5.5/42] i965: Add slice count to the brw_device_info structure.

2015-11-25 Thread Samuel Iglesias Gonsálvez
Reviewed-by: Samuel Iglesias Gonsálvez 

On 25/11/15 15:22, Francisco Jerez wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_device_info.c | 20 
>  src/mesa/drivers/dri/i965/brw_device_info.h |  5 +
>  2 files changed, 25 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_device_info.c 
> b/src/mesa/drivers/dri/i965/brw_device_info.c
> index 4550550..bf31898 100644
> --- a/src/mesa/drivers/dri/i965/brw_device_info.c
> +++ b/src/mesa/drivers/dri/i965/brw_device_info.c
> @@ -28,6 +28,7 @@
>  static const struct brw_device_info brw_device_info_i965 = {
> .gen = 4,
> .has_negative_rhw_bug = true,
> +   .num_slices = 1,
> .max_vs_threads = 16,
> .max_gs_threads = 2,
> .max_wm_threads = 8 * 4,
> @@ -42,6 +43,7 @@ static const struct brw_device_info brw_device_info_g4x = {
> .has_compr4 = true,
> .has_surface_tile_offset = true,
> .is_g4x = true,
> +   .num_slices = 1,
> .max_vs_threads = 32,
> .max_gs_threads = 2,
> .max_wm_threads = 10 * 5,
> @@ -55,6 +57,7 @@ static const struct brw_device_info brw_device_info_ilk = {
> .has_pln = true,
> .has_compr4 = true,
> .has_surface_tile_offset = true,
> +   .num_slices = 1,
> .max_vs_threads = 72,
> .max_gs_threads = 32,
> .max_wm_threads = 12 * 6,
> @@ -71,6 +74,7 @@ static const struct brw_device_info brw_device_info_snb_gt1 
> = {
> .has_pln = true,
> .has_surface_tile_offset = true,
> .needs_unlit_centroid_workaround = true,
> +   .num_slices = 1,
> .max_vs_threads = 24,
> .max_gs_threads = 21, /* conservative; 24 if rendering disabled. */
> .max_wm_threads = 40,
> @@ -90,6 +94,7 @@ static const struct brw_device_info brw_device_info_snb_gt2 
> = {
> .has_pln = true,
> .has_surface_tile_offset = true,
> .needs_unlit_centroid_workaround = true,
> +   .num_slices = 1,
> .max_vs_threads = 60,
> .max_gs_threads = 60,
> .max_wm_threads = 80,
> @@ -112,6 +117,7 @@ static const struct brw_device_info 
> brw_device_info_snb_gt2 = {
>  static const struct brw_device_info brw_device_info_ivb_gt1 = {
> GEN7_FEATURES, .is_ivybridge = true, .gt = 1,
> .needs_unlit_centroid_workaround = true,
> +   .num_slices = 1,
> .max_vs_threads = 36,
> .max_hs_threads = 36,
> .max_ds_threads = 36,
> @@ -131,6 +137,7 @@ static const struct brw_device_info 
> brw_device_info_ivb_gt1 = {
>  static const struct brw_device_info brw_device_info_ivb_gt2 = {
> GEN7_FEATURES, .is_ivybridge = true, .gt = 2,
> .needs_unlit_centroid_workaround = true,
> +   .num_slices = 1,
> .max_vs_threads = 128,
> .max_hs_threads = 128,
> .max_ds_threads = 128,
> @@ -150,6 +157,7 @@ static const struct brw_device_info 
> brw_device_info_ivb_gt2 = {
>  static const struct brw_device_info brw_device_info_byt = {
> GEN7_FEATURES, .is_baytrail = true, .gt = 1,
> .needs_unlit_centroid_workaround = true,
> +   .num_slices = 1,
> .has_llc = false,
> .max_vs_threads = 36,
> .max_hs_threads = 36,
> @@ -175,6 +183,7 @@ static const struct brw_device_info brw_device_info_byt = 
> {
>  
>  static const struct brw_device_info brw_device_info_hsw_gt1 = {
> HSW_FEATURES, .gt = 1,
> +   .num_slices = 1,
> .max_vs_threads = 70,
> .max_hs_threads = 70,
> .max_ds_threads = 70,
> @@ -193,6 +202,7 @@ static const struct brw_device_info 
> brw_device_info_hsw_gt1 = {
>  
>  static const struct brw_device_info brw_device_info_hsw_gt2 = {
> HSW_FEATURES, .gt = 2,
> +   .num_slices = 1,
> .max_vs_threads = 280,
> .max_hs_threads = 256,
> .max_ds_threads = 280,
> @@ -211,6 +221,7 @@ static const struct brw_device_info 
> brw_device_info_hsw_gt2 = {
>  
>  static const struct brw_device_info brw_device_info_hsw_gt3 = {
> HSW_FEATURES, .gt = 3,
> +   .num_slices = 2,
> .max_vs_threads = 280,
> .max_hs_threads = 256,
> .max_ds_threads = 280,
> @@ -243,6 +254,7 @@ static const struct brw_device_info 
> brw_device_info_hsw_gt3 = {
>  
>  static const struct brw_device_info brw_device_info_bdw_gt1 = {
> GEN8_FEATURES, .gt = 1,
> +   .num_slices = 1,
> .max_cs_threads = 42,
> .urb = {
>.size = 192,
> @@ -256,6 +268,7 @@ static const struct brw_device_info 
> brw_device_info_bdw_gt1 = {
>  
>  static const struct brw_device_info brw_device_info_bdw_gt2 = {
> GEN8_FEATURES, .gt = 2,
> +   .num_slices = 1,
> .max_cs_threads = 56,
> .urb = {
>.size = 384,
> @@ -269,6 +282,7 @@ static const struct brw_device_info 
> brw_device_info_bdw_gt2 = {
>  
>  static const struct brw_device_info brw_device_info_bdw_gt3 = {
> GEN8_FEATURES, .gt = 3,
> +   .num_slices = 2,
> .max_cs_threads = 56,
> .urb = {
>.size = 384,
> @@ -283,6 +297,7 @@ static const struct brw_device_info 
> brw_device_info_bdw_gt3 = {
>  static const struct brw_device_info brw_device_info_chv = {
> GEN8_FEATURES, .is_cherryview = 1, .gt = 1,
> .has_llc = false,
> +   .num_s

Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample needs to pass OpenGL3.3 conformance test.

2015-11-25 Thread Timothy Arceri
On Wed, 2015-11-25 at 17:13 +0200, Tapani Pälli wrote:
> On 11/25/2015 04:00 PM, Predut, Marius wrote:
> > > -Original Message-
> > > From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> > > Behalf Of
> > > Timothy Arceri
> > > Sent: Wednesday, November 25, 2015 1:12 PM
> > > To: Palli, Tapani; Predut, Marius; mesa-dev@lists.freedesktop.org
> > > Subject: Re: [Mesa-dev] [PATCH] mesa/main: TexImage2DMultisample
> > > needs to pass
> > > OpenGL3.3 conformance test.
> > > 
> > > On Wed, 2015-11-25 at 12:47 +0200, Tapani Pälli wrote:
> > > > Hi;
> > > > 
> > > > On 11/25/2015 01:15 PM, Marius Predut wrote:
> > > > > Open GL 3.3 reference document says:
> > > > > samples must be in the range zero to GL_MAX_TEXTURE_SIZE - 1.
> > > > > Open GL.4 clearly states:
> > > > > An INVALID_VALUE error is generated if samples is zero.
> > > See my comment in bugzilla [1]. I believe this is just a bug in
> > > the reference pages. We implement things in Mesa going by what the
> > > spec says, and the 3.2 spec says nothing about samples being 0; in
> > > fact, even the 4.0 spec, which you have changed the check to,
> > > doesn't say anything.
> > > 
> > > The 4.5 reference pages also conflict with the spec, which is
> > > even more reason to think this change is wrong.
> > > 
> > > [1] https://bugs.freedesktop.org/show_bug.cgi?id=91670
> > > 
> > I don't think it is a bug in the specs, because in that case the
> > CTS and the piglit test would also be wrong:
> > 
> > With this patch 2 things are fixed:
> > 1. Khronos CTS conformance tests for OpenGL 3.3
> > 2. The piglit test 
> > https://bugs.freedesktop.org/show_bug.cgi?id=93100 (Or 
> > https://bugs.freedesktop.org/show_bug.cgi?id=91670)
> > 
> > The patch is based on this spec:
> > https://www.opengl.org/sdk/docs/man3/xhtml/glTexImage3DMultisample.xml
> > 
> > We can't "believe" or "suppose" something here; the specs need to be
> > treated as axioms.
> > Can someone confirm that this reference page states the wrong requirement?
> 
> That's not a specification but a manual page. Timothy is pointing to
> the OpenGL specifications (available at www.opengl.org/registry). It is
> true that, for example, 3.3 Core does not mention this error case, which
> means using 0 was allowed there. IMO we should either allow 0 (and bump
> it to 1?) when running on a 3.x context, since it's not forbidden, or
> maybe just locally patch this whenever running 3.x conformance.

Hi Marius,

Please provide the CTS test that you are trying to fix so that we
can take a look at what the CTS is trying to test.

While the OpenGL 3.2 -> 4.2 specs seem to allow zero simply by not
mentioning it, it seems odd to me that the CTS would specifically test
for this. As far as I understand it, a value of 0 would result in
undefined behaviour, so it doesn't seem right to allow this for these
versions of OpenGL. IMO we should be going with the later specs that fix
this oversight for all versions of GL.

One thing I can see the CTS doing in various tests is querying the GL
implementation for values. It's possible we are passing a value of 0
back to the tests somewhere and it's trying to use this with the
multisample functions. The only way to know what's going on is if you
tell us which test you are trying to fix.

Tim

> 
> (It seems OpenGL 4.2 is the first spec to state the INVALID_VALUE
> error 
> case for 0.)
> > > > OpenGL ES 3.1 spec also says "An INVALID_VALUE error is
> > > > generated if samples is zero.". You'll need to change your check
> > > > below to also include ES 3.1.
> > > > 
> > > > > Fixing the piglit test case gl-3.2-layered-rendering-framebuffertexture.
> > > > > 
> > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93100
> > > > > 
> > > > > Signed-off-by: Marius Predut 
> > > > > ---
> > > > >src/mesa/main/teximage.c | 2 +-
> > > > >1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/src/mesa/main/teximage.c
> > > > > b/src/mesa/main/teximage.c
> > > > > index d9453e3..69634ff 100644
> > > > > --- a/src/mesa/main/teximage.c
> > > > > +++ b/src/mesa/main/teximage.c
> > > > > @@ -5211,7 +5211,7 @@ texture_image_multisample(struct gl_context *ctx, GLuint dims,
> > > > >        return;
> > > > >     }
> > > > > 
> > > > > -   if (samples < 1) {
> > > > > +   if (samples < 1 && ctx->API == API_OPENGL_CORE && ctx->Version >= 40) {
> > > > >        _mesa_error(ctx, GL_INVALID_VALUE, "%s(samples < 1)", func);
> > > > >        return;
> > > > >     }
> > > > ___
> > > > mesa-dev mailing list
> > > > mesa-dev@lists.freedesktop.org
> > > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > > ___
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailma

Re: [Mesa-dev] [PATCH v3 06/42] i965: Import tables enumerating the set of validated L3 configurations.

2015-11-25 Thread Samuel Iglesias Gonsálvez
Just to make it explicit,

Reviewed-by: Samuel Iglesias Gonsálvez 

Sam

On 25/11/15 15:24, Francisco Jerez wrote:
> It should be possible to use additional L3 configurations other than
> the ones listed in the tables of validated allocations ("BSpec »
> 3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
> » L3 Allocation and Programming"), but it seems sensible for now to
> hard-code the tables in order to stick to the hardware docs.  Instead
> of setting up the arbitrary L3 partitioning given as input, the
> closest validated L3 configuration will be looked up in these tables
> and used to program the hardware.
> 
> The included tables should work for Gen7-9.  Note that the quantities
> are specified in ways rather than in KB, this is because the L3
> control registers expect the value in ways, and because by doing that
> we can re-use a single table for all GT variants of the same
> generation (and in the case of IVB/HSW and CHV/SKL across different
> generations) which generally have different L3 way sizes but allow the
> same combinations of way allocations.
> 
> v3: Use slice count from the devinfo structure instead of the gt
> number to implement get_l3_way_size().
> ---
>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>  src/mesa/drivers/dri/i965/gen7_l3_state.c  | 163 
> +
>  2 files changed, 164 insertions(+)
>  create mode 100644 src/mesa/drivers/dri/i965/gen7_l3_state.c
> 
> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
> b/src/mesa/drivers/dri/i965/Makefile.sources
> index 5a88d66..91901ad 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.sources
> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> @@ -184,6 +184,7 @@ i965_FILES = \
>   gen7_cs_state.c \
>   gen7_disable.c \
>   gen7_gs_state.c \
> + gen7_l3_state.c \
>   gen7_misc_state.c \
>   gen7_sf_state.c \
>   gen7_sol_state.c \
> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
> b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> new file mode 100644
> index 000..8765b11
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> @@ -0,0 +1,163 @@
> +/*
> + * Copyright (c) 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "brw_context.h"
> +#include "brw_defines.h"
> +#include "brw_state.h"
> +#include "intel_batchbuffer.h"
> +
> +/**
> + * Chunk of L3 cache reserved for some specific purpose.
> + */
> +enum brw_l3_partition {
> +   /** Shared local memory. */
> +   L3P_SLM = 0,
> +   /** Unified return buffer. */
> +   L3P_URB,
> +   /** Union of DC and RO. */
> +   L3P_ALL,
> +   /** Data cluster RW partition. */
> +   L3P_DC,
> +   /** Union of IS, C and T. */
> +   L3P_RO,
> +   /** Instruction and state cache. */
> +   L3P_IS,
> +   /** Constant cache. */
> +   L3P_C,
> +   /** Texture cache. */
> +   L3P_T,
> +   /** Number of supported L3 partitions. */
> +   NUM_L3P
> +};
> +
> +/**
> + * L3 configuration represented as the number of ways allocated for each
> + * partition.  \sa get_l3_way_size().
> + */
> +struct brw_l3_config {
> +   unsigned n[NUM_L3P];
> +};
> +
> +/**
> + * IVB/HSW validated L3 configurations.
> + */
> +static const struct brw_l3_config ivb_l3_configs[] = {
> +   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
> +   {{  0, 32,  0, 16, 16,  0,  0,  0 }},
> +   {{  0, 32,  0,  4,  0,  8,  4, 16 }},
> +   {{  0, 28,  0,  8,  0,  8,  4, 16 }},
> +   {{  0, 28,  0, 16,  0,  8,  4,  8 }},
> +   {{  0, 28,  0,  8,  0, 16,  4,  8 }},
> +   {{  0, 28,  0,  0,  0, 16,  4, 16 }},
> +   {{  0, 32,  0,  0,  0, 16,  0, 16 }},
> +   {{  0, 28,  0,  4, 32,  0,  0,  0 }},
> +   {{ 16, 16,  0, 16, 16,  0,  0,  0 }},
> +   {{ 16, 16,  0,  8,  0,  8,  8,  8 }},
> +   {{ 16, 16,  0,  4,  0,  8,  4, 16 }},
> +   {{ 16, 16,  0,  4,  0, 16,  4,  8 }},
> +   {{ 16, 16,  0,  0, 32,  0,  0, 

Re: [Mesa-dev] [PATCH] gallium/tests: fix build with clang compiler

2015-11-25 Thread Samuel Pitoiset



On 11/25/2015 05:16 PM, Eirik Byrkjeflot Anonsen wrote:

Samuel Pitoiset  writes:


Nested functions are supported as an extension in GNU C, but Clang
doesn't support them.

This fixes compilation errors when (manually) building compute.c,
or when passing --enable-gallium-tests to the configure script.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75165

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/tests/trivial/compute.c | 643 +---
  1 file changed, 370 insertions(+), 273 deletions(-)

diff --git a/src/gallium/tests/trivial/compute.c 
b/src/gallium/tests/trivial/compute.c
index bcdfb11..4cb32e5 100644
--- a/src/gallium/tests/trivial/compute.c
+++ b/src/gallium/tests/trivial/compute.c
@@ -428,6 +428,35 @@ static void launch_grid(struct context *ctx, const uint 
*block_layout,
  pipe->launch_grid(pipe, block_layout, grid_layout, pc, input);
  }

+/* test_system_values */
+static void test_system_values_init(void *p, int s, int x, int y)
+{
+*(uint32_t *)p = 0xdeadbeef;
+}
+
+static void test_sytem_values_expect(void *p, int s, int x, int y)


s/sytem/system/, I assume?


Your assumption is correct. Good catch, thanks!
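
For readers who have not seen the patch body, the general shape of this kind of change can be sketched as follows: a callback that used to be a GNU C nested function becomes a file-scope static function and is passed by pointer. The callback signature mirrors the one visible in the hunk above; the driver loop and all "sketch_" names are hypothetical and are not taken from compute.c.

```c
#include <assert.h>
#include <stdint.h>

/* Same parameter list as the callbacks in the quoted hunk. */
typedef void (*sketch_cell_fn)(void *p, int s, int x, int y);

/*
 * File-scope replacement for what would otherwise be a nested function.
 * Portable C: accepted by both GCC and Clang.
 */
static void
sketch_init_cell(void *p, int s, int x, int y)
{
   (void)s; (void)x; (void)y;   /* unused in this simple initialiser */
   *(uint32_t *)p = 0xdeadbeef;
}

/* Hypothetical driver: calls `fn` once per element of a w x h buffer. */
static void
sketch_for_each_cell(uint32_t *buf, int w, int h, sketch_cell_fn fn)
{
   for (int y = 0; y < h; y++)
      for (int x = 0; x < w; x++)
         fn(&buf[y * w + x], (int)sizeof(buf[0]), x, y);
}

static void
sketch_nested_self_check(void)
{
   uint32_t buf[6] = { 0 };

   sketch_for_each_cell(buf, 3, 2, sketch_init_cell);
   for (int i = 0; i < 6; i++)
      assert(buf[i] == 0xdeadbeef);
}
```

The trade-off is that a nested function can read the enclosing function's locals, while a file-scope callback needs any such state passed in explicitly, which would account for a refactor like this adding more lines than it removes.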



eirik


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 93103] llvm symbols leak through, cause trouble with software rendering in llvm-linked software

2015-11-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93103

Jose Fonseca  changed:

   What|Removed |Added

 CC||jfons...@vmware.com

--- Comment #1 from Jose Fonseca  ---
I said over and over again that building Mesa drivers with shared LLVM library
(as opposed to statically linked) was a bad idea.

You need to build with

  --disable-llvm-shared-libs

and lobby the Linux distributions to not do it.


In addition to that, we probably also need to use a LD version script to ensure
that LLVM symbols don't pop in the dynamic symbol table.
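
A version script is linker input rather than C, so the following is only a sketch of a related, per-symbol mechanism that GCC and Clang offer on ELF targets: visibility attributes. It is not what the comment above proposes, and it only affects symbols defined in the code being compiled; symbols pulled in from a separately built LLVM archive would still need something like the version script. All names are hypothetical.

```c
#include <assert.h>

#if defined(__GNUC__)
#define SKETCH_PUBLIC __attribute__((visibility("default")))
#define SKETCH_HIDDEN __attribute__((visibility("hidden")))
#else
/* Fallback so the sketch still compiles with other compilers. */
#define SKETCH_PUBLIC
#define SKETCH_HIDDEN
#endif

/*
 * Internal helper: when built into a shared object, a hidden symbol is
 * left out of the dynamic symbol table, so it cannot collide with a
 * same-named symbol in another loaded library.
 */
SKETCH_HIDDEN int
sketch_internal_helper(int x)
{
   return x * 2;
}

/* Intended entry point: the only symbol meant to be exported. */
SKETCH_PUBLIC int
sketch_public_entry(int x)
{
   return sketch_internal_helper(x) + 1;
}

static void
sketch_visibility_self_check(void)
{
   assert(sketch_public_entry(20) == 41);
}
```

Inside a single program the attributes make no observable difference; their effect only shows up in the dynamic symbol table of a shared object built from this code.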

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 93103] llvm symbols leak through, cause trouble with software rendering in llvm-linked software

2015-11-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93103

--- Comment #2 from Emil Velikov  ---
Hmm, I'm pretty sure that I removed all of those a year or two ago.

And looking at the patches in said report, it seems that it was a problem on
their end -> they were not hiding the (should be) internal symbols.

The only thing that can remotely cause problems is that we dlopen(RTLD_GLOBAL)
the module which internally references LLVM. You can try the following patch,
although we cannot get it upstream without some serious work or we'll break a
lot of applications.



diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c
index 8a56385..2c2eef6 100644
--- a/src/glx/dri_common.c
+++ b/src/glx/dri_common.c
@@ -103,7 +103,7 @@ driOpenDriver(const char *driverName)
    int len;

    /* Attempt to make sure libGL symbols will be visible to the driver */
-   glhandle = dlopen(GL_LIB_NAME, RTLD_NOW | RTLD_GLOBAL);
+   glhandle = dlopen(GL_LIB_NAME, RTLD_NOW | RTLD_LOCAL);

    libPaths = NULL;
    if (geteuid() == getuid()) {
@@ -131,14 +131,14 @@ driOpenDriver(const char *driverName)
       snprintf(realDriverName, sizeof realDriverName,
                "%.*s/tls/%s_dri.so", len, p, driverName);
       InfoMessageF("OpenDriver: trying %s\n", realDriverName);
-      handle = dlopen(realDriverName, RTLD_NOW | RTLD_GLOBAL);
+      handle = dlopen(realDriverName, RTLD_NOW | RTLD_LOCAL);
 #endif

       if (handle == NULL) {
          snprintf(realDriverName, sizeof realDriverName,
                   "%.*s/%s_dri.so", len, p, driverName);
          InfoMessageF("OpenDriver: trying %s\n", realDriverName);
-         handle = dlopen(realDriverName, RTLD_NOW | RTLD_GLOBAL);
+         handle = dlopen(realDriverName, RTLD_NOW | RTLD_LOCAL);
       }

       if (handle != NULL)
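
For anyone following the RTLD_GLOBAL vs RTLD_LOCAL point, here is a minimal standalone sketch. It is not part of the patch: open_private() and lookup() are hypothetical helper names, and libm.so.6 is only a stand-in library assumed to be present on a typical glibc system. With RTLD_LOCAL, the symbols of the loaded object are not made available to resolve references in objects loaded later, so the caller has to fetch them explicitly through the handle.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load `path` so that its symbols stay private to the
 * returned handle (RTLD_LOCAL) instead of joining the global scope. */
void *open_private(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);

    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, err ? err : "unknown");
    }
    return handle;
}

/* Hypothetical helper: resolve `name` explicitly through the handle.
 * Returns NULL when the symbol cannot be found. */
void *lookup(void *handle, const char *name)
{
    void *sym;

    dlerror();                      /* clear any stale error state */
    sym = dlsym(handle, name);
    if (sym == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym(%s) failed: %s\n", name, err ? err : "unknown");
    }
    return sym;
}
```

A caller would do something like open_private("libm.so.6") followed by lookup(handle, "cos"), and dlclose() the handle when done. On older glibc versions this needs -ldl at link time. The catch mentioned above still applies: anything that silently relied on the driver's symbols being globally visible stops resolving once the flag is switched.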



Re: [Mesa-dev] [PATCH] gallium/tests: fix build with clang compiler

2015-11-25 Thread Eirik Byrkjeflot Anonsen
Samuel Pitoiset  writes:

> Nested functions are supported as an extension in GNU C, but Clang
> doesn't support them.
>
> This fixes compilation errors when (manually) building compute.c,
> or by setting --enable-gallium-tests to the configure script.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75165
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/tests/trivial/compute.c | 643 +---
>  1 file changed, 370 insertions(+), 273 deletions(-)
>
> diff --git a/src/gallium/tests/trivial/compute.c b/src/gallium/tests/trivial/compute.c
> index bcdfb11..4cb32e5 100644
> --- a/src/gallium/tests/trivial/compute.c
> +++ b/src/gallium/tests/trivial/compute.c
> @@ -428,6 +428,35 @@ static void launch_grid(struct context *ctx, const uint *block_layout,
>  pipe->launch_grid(pipe, block_layout, grid_layout, pc, input);
>  }
>  
> +/* test_system_values */
> +static void test_system_values_init(void *p, int s, int x, int y)
> +{
> +*(uint32_t *)p = 0xdeadbeef;
> +}
> +
> +static void test_sytem_values_expect(void *p, int s, int x, int y)

s/sytem/system/, I assume?

eirik
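
As a side note on the approach in the quoted patch, here is a rough sketch of the general pattern of hoisting a GNU C nested function to file scope so that Clang accepts it. Only the callback signature (void *p, int s, int x, int y) and the 0xdeadbeef fill value come from the hunk above; cell_fn, example_init and for_each_cell are hypothetical names, and the actual compute.c helpers may well look different.

```c
#include <stdint.h>

/* Hypothetical callback type, mirroring the signature seen in the patch. */
typedef void (*cell_fn)(void *p, int s, int x, int y);

/* Formerly this kind of helper would be a nested function inside the test.
 * At file scope it compiles with both GCC and Clang. */
void example_init(void *p, int s, int x, int y)
{
    (void)s; (void)x; (void)y;      /* unused in this sketch */
    *(uint32_t *)p = 0xdeadbeef;
}

/* Hypothetical driver: call `fn` once per element of a w*h grid. */
void for_each_cell(uint32_t *buf, int w, int h, cell_fn fn)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            fn(&buf[y * w + x], 0, x, y);
}
```

The trade-off is that a file-scope function can no longer capture locals of the enclosing test, so anything it needs has to be passed in as a parameter or through a context pointer.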


[Mesa-dev] [Bug 93103] llvm symbols leak through, cause trouble with software rendering in llvm-linked software

2015-11-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93103

--- Comment #3 from Emil Velikov  ---
(In reply to Jose Fonseca from comment #1)

> In addition to that, we probably also need to use a LD version script to
> ensure that LLVM symbols don't pop in the dynamic symbol table.
We have had those for a while. At the moment only the autotools build uses them
(hint, hint: scons).



[Mesa-dev] [Bug 93091] [opencl] segfault when running any opencl programs (like clinfo)

2015-11-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93091

--- Comment #7 from Aaron Watry  ---
Bah, ignore me.

I could still reproduce the issue yesterday to the best of my knowledge, but
after an llvm/mesa rebuild with the patch applied this morning, things are
working correctly...



Re: [Mesa-dev] [Mesa-announce] Mesa 11.1.0 release candidate 1

2015-11-25 Thread Mark Janes
Hi Emil,

I noticed that this branchpoint is after the KHR_DEBUG patches which
broke GL conformance.

Is the plan to resolve this bug before release?

https://bugs.freedesktop.org/show_bug.cgi?id=93048

-Mark

Emil Velikov  writes:

> On 23 November 2015 at 09:18, Thierry Vignaud wrote:
>> On 21 November 2015 at 15:28, Emil Velikov  wrote:
>>> The first release candidate for Mesa 11.1.0 is now available.
>>>
>>> The plan is to have one release candidate every Friday, until the final
>>> release on December 11th 2015.
>>>
>>> As a reminder, with the 11.1 branch now created, patches nominated with:
>>>
>>> CC: 
>>>
>>> will now be candidates only for the new 11.1 branch. To nominate patches
>>> for the older 11.0 branch as well, please use:
>>>
>>> CC: "11.0 11.1" 
>>>
>>> The expectation is that the 11.0 branch will remain alive with bi-weekly
>>> releases until after 11.1.1 release.
>>
>> It failed to build when we tried it yesterday on Mageia.
>> See
>> http://pkgsubmit.mageia.org/uploads/failure/cauldron/core/release/20151122205943.tv.valstar.4312/log/mesa-11.1.0~rc1-1.mga6/build.0.20151122210002.log
>>
>>   CC   libloader_la-loader.lo
>> In file included from loader.c:93:0:
>> pci_id_driver_map.h:16:34: fatal error: pci_ids/i915_pci_ids.h: No
>> such file or directory
>> compilation terminated.
>> Makefile:628: recipe for target 'libloader_la-loader.lo' failed
>
> Just pushed three patches which should resolve this + related issues.
>
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=8a6d4765880a2c9000970b2b6e584291090b1dc3
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=b9b0a1f58e41ba4027a16300393835dc8c632f50
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=b89d1b2ccf16c0bee3708560b69e99b6dbb212f0
>
> They will make it into rc2 coming later this week.
>
> Thanks
> Emil

